Fossil SCM
Continuing work on the VCCP spec. This is an incremental check-in.
Commit
60794e993c9e3dad82a475203c63a1dda5c4b134f35292f712c43b82f300a9d8
Parent
b37bb7dc79aacde…
1 file changed
+385
-11
| --- www/vccp/intro.md | ||
| +++ www/vccp/intro.md | ||
| @@ -1,11 +1,11 @@ | ||
| 1 | 1 | Version Control Collaboration Protocol |
| 2 | 2 | ====================================== |
| 3 | 3 | |
| 4 | 4 | <blockquote><center style='background: yellow; border: 1px solid black;'> |
| 5 | 5 | This document is a work in progress.<br> |
| 6 | -The last update was on 2019-03-09.<br> | |
| 6 | +The last update was on 2019-03-13.<br> | |
| 7 | 7 | Check back later for updates. |
| 8 | 8 | </center></blockquote> |
| 9 | 9 | |
| 10 | 10 | 1.0 Introduction |
| 11 | 11 | ---------------- |
| @@ -12,22 +12,22 @@ | ||
| 12 | 12 | |
| 13 | 13 | The Version Control Collaboration Protocol or VCCP is an attempt to make |
| 14 | 14 | it easier for developers to collaborate even when they are using different |
| 15 | 15 | version control systems. |
| 16 | 16 | |
| 17 | -For example, suppose Alice, the founder and | |
| 18 | -[BDFL](https://en.wikipedia.org/wiki/Benevolent_dictator_for_life) | |
| 17 | +For example, suppose Alice, the founder and principal maintainer | |
| 19 | 18 | for the fictional "BambooCoffee" project, prefers using the |
| 20 | 19 | [Mercurial](https://www.mercurial-scm.org/) version control system, |
| 21 | 20 | but two of her clients, Bob and Cindy, know nothing but |
| 22 | 21 | [Git](https://www.git-scm.org/) and steadfastly refuse to |
| 23 | -type any command that begins with "hg", and an important | |
| 22 | +type any command that begins with "hg". | |
| 23 | +Further suppose that an important | |
| 24 | 24 | collaborator, Dave, really prefers [Bazaar](bazaar.canonical.com/). |
| 25 | 25 | The VCCP is designed to make it relatively easy and painless |
| 26 | 26 | for Alice to set up Git and Bazaar mirrors of her Mercurial |
| 27 | -repository so that Bob, Cindy, and Dave can all use the tools with | |
| 28 | -which they are most familiar. | |
| 27 | +repository so that Bob, Cindy, and Dave can all use the tools | |
| 28 | +they are most familiar with. | |
| 29 | 29 | |
| 30 | 30 | <center></center> |
| 31 | 31 | |
| 32 | 32 | Assuming all the servers speak VCCP (which is not the case at the |
| 33 | 33 | time of this writing, but we hope to encourage that for the future) |
| @@ -38,11 +38,11 @@ | ||
| 38 | 38 | ### 1.1 Bidirectional Collaboration |
| 39 | 39 | |
| 40 | 40 | The diagram above shows that all changes originate from Alice and |
| 41 | 41 | that Bob, Cindy, and Dave are only consumers. If Cindy wanted to |
| 42 | 42 | make a change to BambooCoffee, she would have to do that with a backchannel, |
| 43 | -such as sending a patch via email to Alice and then get Alice to check | |
| 43 | +such as sending a patch via email to Alice and asking Alice to check | |
| 44 | 44 | in the change. |
| 45 | 45 | |
| 46 | 46 | But VCCP also supports bidirectional collaboration. |
| 47 | 47 | |
| 48 | 48 | <center></center> |
| @@ -55,26 +55,27 @@ | ||
| 55 | 55 | VCCP message back to Truth containing Cindy's changes. Truth would then |
| 56 | 56 | relay those changes over to Mirror-2 where Dave could see them as well. |
| 57 | 57 | |
| 58 | 58 | ### 1.2 Client-Mirror versus Server-Mirror |
| 59 | 59 | |
| 60 | -VCCP allows the mirrors to be set up as either clients or server. | |
| 60 | +VCCP allows the mirrors to be set up as either clients or servers. | |
| 61 | 61 | |
| 62 | 62 | In the client-mirror approach, the mirrors periodically poll Truth asking |
| 63 | 63 | for changes. In the server-mirror approach, Truth sends changes to the |
| 64 | 64 | mirrors as they occur. |
| 65 | 65 | |
| 66 | 66 | In the first example above, the implication was that the server-mirror |
| 67 | 67 | approach was being used. The Truth repository would take the initiative |
| 68 | 68 | to send changes to the mirrors. But it does not have to be that way. |
| 69 | 69 | Suppose Dave is unknown to Alice. Suppose he just likes Alice's work and |
| 70 | -wants to keep his own mirror for his own convenience. Dave could set up | |
| 70 | +wants to keep his own mirror of her work for his own convenience. | |
| 71 | +Dave could set up | |
| 71 | 72 | Mirror-2 as a client-mirror that periodically polls Truth for changes. |
| 72 | 73 | |
| 73 | 74 | In the second example above, Truth and Mirror-1 could be configured to |
| 74 | 75 | have a Peer-to-Peer relationship rather than a Truth-to-Mirror relationship. |
| 75 | -When new content arrives a Truth (because Alice did an "hg commit"), | |
| 76 | +When new content arrives at Truth (because Alice did an "hg commit"), | |
| 76 | 77 | Truth acts as a client to initiate a transfer of that new information |
| 77 | 78 | over to Mirror-1. When new content originates at Mirror-1 (because |
| 78 | 79 | Cindy did "git commit") then Mirror-1 acts as a client to send the new |
| 79 | 80 | content over to Truth. Or, they could set it up so that Truth is always |
| 80 | 81 | the client and it periodically polls Mirror-1 looking for new content |
| @@ -81,10 +82,39 @@ | ||
| 81 | 82 | coming from Cindy. Or, they could set it up so that Mirror-1 is always |
| 82 | 83 | the client and it periodically polls Truth looking for changes from Alice. |
| 83 | 84 | |
| 84 | 85 | The point is that VCCP works in all of these scenarios. |
| 85 | 86 | |
| 87 | +### 1.3 Name Mapping | |
| 88 | + | |
| 89 | +Different version control systems use different names to refer to the same | |
| 90 | +object. For example, Fossil names files using a SHA3-256 hash of the | |
| 91 | +unmodified file content, whereas Git uses a hardened-SHA1 hash of the file | |
| 92 | +content with an added prefix. Mercurial, Monotone, Bazaar, and others all | |
| 93 | +use different naming schemes, so that the same check-in in any particular | |
| 94 | +version control system will have a different name in all other version | |
| 95 | +control systems. | |
| 96 | + | |
| 97 | +When mirroring a project between two version control systems, somebody | |
| 98 | +needs to keep track of the mapping between names. | |
| 99 | + | |
| 100 | +For example, in the second diagram above, if Mirror-1 wants to tell Truth | |
| 101 | +that it has a new check-in "Q" that is a child of "P", then it has to send | |
| 102 | +the name of check-in "P". Does it send the Git-name of "P" or the | |
| 103 | +Mercurial-name of "P"? If Mirror-1 sends Truth the Git-name of "P" then | |
| 104 | +Truth must be the system that does the name mapping. If Mirror-1 sends | |
| 105 | +Truth the Mercurial-name of "P", then Mirror-1 is the system that maintains | |
| 106 | +the mapping. | |
| 107 | + | |
| 108 | +The VCCP is designed such that both names for a | |
| 109 | +particular check-in or file can be sent. One of the collaborating systems | |
| 110 | +must still take responsibility for translating the names, but it does not | |
| 111 | +matter which system. As long as one or the other of the two systems | |
| 112 | +maintains a name mapping, the collaboration will work. Of course, it | |
| 113 | +also works for both systems to maintain the name map, and for maximum | |
| 114 | +flexibility, perhaps that should be the preferred approach. | |
| 115 | + | |
| 86 | 116 | 2.0 Minimum Requirements |
| 87 | 117 | ------------------------ |
| 88 | 118 | |
| 89 | 119 | The VCCP is modeled after the Git fast-export and fast-import protocol. |
| 90 | 120 | That is to say, VCCP thinks in terms of "check-ins" with each check-in |
| @@ -101,11 +131,11 @@ | ||
| 101 | 131 | a parent, but all the others should. Check-ins may also identify |
| 102 | 132 | zero or more "merge" parents, and zero or more "cherrypick" ancestors. |
| 103 | 133 | But the merges and cherrypicks can be ignored on systems that do not |
| 104 | 134 | support those concepts. |
| 105 | 135 | |
| 106 | -VCCP assumes that every distinct version of a file, and every check-in has | |
| 136 | +VCCP assumes that every distinct version of a file and every check-in has | |
| 107 | 137 | a unique name. In Git and Mercurial, those names are SHA1 hashes |
| 108 | 138 | (computed in different ways). Fossil uses SHA3-256 hashes. I'm not sure |
| 109 | 139 | what Bazaar uses. VCCP does not care how the names are derived, as long |
| 110 | 140 | as they always uniquely identify the file or check-in. |
| 111 | 141 | |
| @@ -121,5 +151,349 @@ | ||
| 121 | 151 | |
| 122 | 152 | The VCCP is a client-server protocol. |
| 123 | 153 | A client formats a VCCP message and sends it to the server. |
| 124 | 154 | The server acts upon that message, formulates a reply, and sends |
| 125 | 155 | the reply back to the client. |
| 156 | + | |
| 157 | +It does not matter what transport mechanism is used to send the VCCP | |
| 158 | +messages from client to server and back again. | |
| 159 | +But for maximum flexibility, it is suggested that HTTP (or HTTPS) be | |
| 160 | +used. The client sends an HTTP request to the server with the | |
| 161 | +VCCP message as the request content and a MIME-type of "application/x-vccp". | |
| 162 | +The HTTP response is another VCCP message with the same MIME-type. | |
| 163 | +The use of HTTP means that firewalls and proxies are not an | |
| 164 | +impediment to collaboration and that collaboration connection information | |
| 165 | +can be described by a simple URL. | |
| 166 | + | |
| 167 | +There are provisions in the VCCP design to allow authentication | |
| 168 | +in the body of the VCCP message itself. Or, two systems can, by | |
| 169 | +mutual agreement, authenticate by some external mechanism. | |
| 170 | + | |
| 171 | +### 3.1 Message Content | |
| 172 | + | |
| 173 | +A single VCCP message round-trip can be a "push" if the client is sending | |
| 174 | +new check-in information to the server, or it can be a "pull" if the | |
| 175 | +client is polling the server to see if new check-in information is available | |
| 176 | +for download, or it can be both at once. | |
| 177 | + | |
| 178 | +The basic design of a VCCP message is inspired by the Git fast-export | |
| 179 | +protocol, but with enhancements to support incremental updates and | |
| 180 | +bidirectional updates and to make the message format more robust and | |
| 181 | +portable and simpler to generate and parse. A single message may contain | |
| 182 | +multiple "files", check-in descriptions that reference those files, and "tag" | |
| 183 | +descriptions. A "message description" section contains authentication | |
| 184 | +data, error codes, and other meta-data. Every request and every | |
| 185 | +reply contains, at a minimum, a message description. | |
| 186 | + | |
| 187 | +For a push, the request contains a message description with | |
| 188 | +authentication information, and the new files, check-ins, and tags | |
| 189 | +that are being pushed to the server. The reply to a push contains | |
| 190 | +success codes, and the names that the server assigned to the new objects, | |
| 191 | +so that the client can maintain a name map. | |
| 192 | + | |
| 193 | +For pull, the request contains only a message description with | |
| 194 | +authentication information and a description of what content the | |
| 195 | +client desires to pull. | |
| 196 | +The reply to a pull contains the files, check-ins, and tags requested. | |
| 197 | + | |
| 198 | +For a pull request, there is no mechanism (currently defined) for the | |
| 199 | +server to learn the client-side names for files and check-ins. Hence, | |
| 200 | +for a collaboration arrangement where the client polls the server for | |
| 201 | +updates, the client must maintain the name map. | |
| 202 | + | |
| 203 | +### 3.2 Message Format Overview | |
| 204 | + | |
| 205 | +The format of a VCCP message is an ordinary SQLite database file with | |
| 206 | +a two-table schema. | |
| 207 | +The DATA table contains file, check-in, and tag content and the | |
| 208 | +message description. The DATA.CONTENT column contains either raw | |
| 209 | +file content or check-in and tag descriptions formatted as JSON. | |
| 210 | +The message description is also JSON contained in a specially | |
| 211 | +designated row of the DATA table. The NAME table of the schema | |
| 212 | +is used to transmit name mappings. The NAME table serves the same | |
| 213 | +role as the "marks" file of git-fast-export. | |
| 214 | + | |
| 215 | +### 3.3 Why Use A Database As The Message Format? | |
| 216 | + | |
| 217 | +Why does a VCCP message consist of an SQLite database instead of a | |
| 218 | +bespoke format like git-fast-export? | |
| 219 | + | |
| 220 | + 1. Some of the content to be transferred will typically be binary. | |
| 221 | + Most projects have at least a few images or other binary files | |
| 222 | + in their tree somewhere. Other files will be pure text. Check-in | |
| 223 | + and tag descriptions will also be pure text (JSON). That means | |
| 224 | + that the VCCP message will be a mix of text and binary content. | |
| 225 | + An SQLite database file is a convenient and efficient way | |
| 226 | + to encapsulate both binary and text content into a single container | |
| 227 | + which is easily created and accessed. | |
| 228 | + | |
| 229 | + 2. Robust, cross-platform libraries for reading and writing SQLite database | |
| 230 | + files already exist on every computer. No custom parser or generator | |
| 231 | + code needs to be written, debugged, managed, or maintained. | |
| 232 | + | |
| 233 | + 3. The SQLite database file format is well defined, cross-platform | |
| 234 | + (32-bit, 64-bit, big-endian, and little-endian) and is endorsed | |
| 235 | + by the US Library of Congress as a recommended file format for | |
| 236 | + archival data storage. | |
| 237 | + | |
| 238 | + 4. Unlike a serial format (such as git-fast-export) which must | |
| 239 | + normally be written and read sequentially from beginning to end, | |
| 240 | + elements of an SQLite database can be constructed and read in any | |
| 241 | + order. This gives extra implementation flexibility to both readers | |
| 242 | + and writers. | |
| 243 | + | |
| 244 | +### 3.4 Database Schema | |
| 245 | + | |
| 246 | +The database schema for a VCCP message is as follows: | |
| 247 | + | |
| 248 | +> | |
| 249 | + CREATE TABLE data( | |
| 250 | + id INTEGER PRIMARY KEY, | |
| 251 | + dclass INT, | |
| 252 | + sz INT, | |
| 253 | + calg INT, | |
| 254 | + cref INT, | |
| 255 | + content ANY | |
| 256 | + ); | |
| 257 | + CREATE TABLE name( | |
| 258 | + nameid INT, | |
| 259 | + nametype INT, | |
| 260 | + name TEXT, | |
| 261 | + PRIMARY KEY(nameid,nametype) | |
| 262 | + ) WITHOUT ROWID; | |
| 263 | + | |
| 264 | +The DATA table holds the message description, the content of files, and JSON | |
| 265 | +descriptions of check-ins and tags. The NAME table is used to transmit | |
| 266 | +names. The DATA table corresponds to the body of a git-fast-export stream | |
| 267 | +and the NAME table corresponds to the "marks" file that is read and | |
| 268 | +written by the "--import-marks" and "--export-marks" options of the | |
| 269 | +"git fast-export" command. | |
| 270 | + | |
| 271 | +Each file, check-in, and tag is normally a single distinct entry in | |
| 272 | +the DATA table. (Exception: very large files, greater than 1GB in size, | |
| 273 | +can be split across multiple DATA table rows - see below.) Entries in | |
| 274 | +the DATA table can occur in any order. It is not required that files | |
| 275 | +referenced by check-ins have a smaller DATA.ID value, for example. | |
| 276 | +Free ordering does not impede data extraction (see the algorithm descriptions | |
| 277 | +below) but it does give considerable freedom to the message generator | |
| 278 | +logic. | |
| 279 | + | |
| 280 | +Each DATA row has a class identified by a small integer in the DATA.DCLASS | |
| 281 | +column. | |
| 282 | + | |
| 283 | +> | |
| 284 | +| 0: | A check-in | | |
| 285 | +| 1: | A file | | |
| 286 | +| 2: | A tag | | |
| 287 | +| 3: | The VCCP message description | | |
| 288 | +| 4: | Application-defined-1 | | |
| 289 | +| 5: | Application-defined-2 | | |
| 290 | + | |
| 291 | +Every well-formed VCCP message has exactly one message description entry | |
| 292 | +with DATA.ID=0 and DATA.DCLASS=3. No other DATA table entries should have | |
| 293 | +DATA.DCLASS=3. | |
| 294 | + | |
| 295 | +The application-defined values are reserved for extended uses of the | |
| 296 | +VCCP message format. In particular, there are plans to enhance | |
| 297 | +Fossil so that it uses VCCP as its sync protocol, replacing its | |
| 298 | +current bespoke protocol. But Fossil needs to send information about other | |
| 299 | +kinds of objects, such as wiki pages and tickets, that are not known | |
| 300 | +to Git and most other version control systems. A few | |
| 301 | +"application defined" values are available at strategic points in | |
| 302 | +the message format description to accommodate these extended use cases. | |
| 303 | +New application-defined values may be defined in the future. | |
| 304 | +Portable VCCP messages between different version control systems | |
| 305 | +should never use the application-defined values. | |
| 306 | + | |
| 307 | +The DATA.CONTENT field can be either text or binary, as appropriate. | |
| 308 | +For files, the DATA.CONTENT is binary. For check-ins and tags and for | |
| 309 | +the message description, the DATA.CONTENT is a text JSON object. | |
| 310 | + | |
| 311 | +The DATA.CONTENT field can optionally be compressed. The DATA.SZ field | |
| 312 | +is the uncompressed size of the content in bytes. The compression method | |
| 313 | +is determined by the DATA.CALG field: | |
| 314 | + | |
| 315 | +> | |
| 316 | +| 0: | No compression | | |
| 317 | +| 1: | ZLib compression | | |
| 318 | +| 2: | Multi-blob | | |
| 319 | +| 3: | Application-defined-1 | | |
| 320 | +| 4: | Application-defined-2 | | |
| 321 | + | |
| 322 | +The "multi-blob" compression method means that the content is the | |
| 323 | +concatenation of the content in other DATA table rows. This | |
| 324 | +allows for content that exceeds the 1GB size limit for an SQLite | |
| 325 | +BLOB column. If the DATA.CALG field is 2, then DATA.CONTENT will | |
| 326 | +be a JSON array of integer values, where each integer is the DATA.ID | |
| 327 | +of another DATA table entry that contains part of the content. | |
| 328 | +The actual data content is the concatenation of the other DATA table | |
| 329 | +entries. The secondary DATA table entries can also be compressed, | |
| 330 | +though not with multi-blob. In other words, the multi-blob | |
| 331 | +compression method may not be nested. This effectively limits the | |
| 332 | +maximum size of a file in the VCCP to the maximum size of an SQLite | |
| 333 | +database, which is 140 terabytes. | |
| 334 | + | |
| 335 | +Portable VCCP files should only use compression methods 0, 1, and 2, | |
| 336 | +and preferably only method 0 (no compression). But application-defined | |
| 337 | +compression methods are available for proprietary uses of the | |
| 338 | +VCCP message format. The DATA.CREF field is auxiliary data intended | |
| 339 | +for use with these application-defined compression methods. In | |
| 340 | +particular, DATA.CREF is intended to be the DATA.ID of a "base" | |
| 341 | +entry for delta-compression methods. For a portable VCCP file, | |
| 342 | +the DATA.CREF field should always be NULL. | |
| 343 | + | |
| 344 | +The DATA.ID field provides an integer identifier for files and | |
| 345 | +check-ins. The scope of that name is the single VCCP message | |
| 346 | +in which the DATA table entry appears, however. The NAME table | |
| 347 | +is used to provide a mapping from these internal integer names | |
| 348 | +to the persistent global hash names of the various version | |
| 349 | +control systems. | |
| 350 | + | |
| 351 | +A single object can have different names, depending on which | |
| 352 | +version control system stores it. For this reason, the NAME | |
| 353 | +table is designed to allow storage of multiple names for the | |
| 354 | +same object. If NAME.NAMETYPE is 0, that means that the name | |
| 355 | +is appropriate for use on the client. If NAME.NAMETYPE is 1, | |
| 356 | +that means the name is appropriate for use on the server. | |
| 357 | + | |
| 358 | +To simplify the implementation of VCCP on diverse systems, | |
| 359 | +names should be sent as text. If the names for a particular system | |
| 360 | +are binary hashes, then the NAME table should store them as | |
| 361 | +the hexadecimal representation. | |
| 362 | + | |
| 363 | +#### 3.4.1 NAME Table Example 1 | |
| 364 | + | |
| 365 | +Suppose a client is pushing a new check-in to the server and the | |
| 366 | +check-in text is stored in the DATA.ID=1 row. Then the request | |
| 367 | +should contain a NAME table row with NAME.NAMEID=1 (to match the | |
| 368 | +DATA table ID value) and NAME.NAMETYPE=0 (because client names | |
| 369 | +have NAMETYPE 0) and with the name of that check-in according to | |
| 370 | +the client stored in NAME.NAME. The server will recode the | |
| 371 | +check-in according to its own format, and store the server-side | |
| 372 | +name in a new NAME table row with NAME.NAMEID=1 and NAME.NAMETYPE=1. | |
| 373 | +The server then includes the complete NAME table in its reply | |
| 374 | +back to the client. In this way, the client is able to discover | |
| 375 | +the name of the check-in on the server. The server can also | |
| 376 | +remember the client check-in name, if desired. | |
| 377 | + | |
| 378 | +### 3.5 Check-in JSON Format | |
| 379 | + | |
| 380 | +Check-ins are described by DATA table rows where the content is a | |
| 381 | +single JSON object, as follows: | |
| 382 | + | |
| 383 | +> | |
| 384 | + { | |
| 385 | + "time": DATETIME, -- Date and time of the check-in | |
| 386 | + "comment": TEXT, -- The original check-in comment | |
| 387 | + "mimetype": TEXT, -- The mimetype of the comment text | |
| 388 | + "branch": TEXT, -- Branch this check-in belongs to | |
| 389 | + "from": INT, -- NAME.NAMEID for the primary parent | |
| 390 | + "merge": [INT], -- Merge parents | |
| 391 | + "cherrypick": [INT], -- Cherrypick merges | |
| 392 | + "author": { -- Author of the change | |
| 393 | + "name": TEXT, -- Name or handle | |
| 394 | + "email": TEXT, -- Email address | |
| 395 | + "time": DATETIME -- Override for $.time | |
| 396 | + }, | |
| 397 | + "committer": { -- Committer of the change | |
| 398 | + "name": TEXT, -- Name or handle | |
| 399 | + "email": TEXT, -- Email address | |
| 400 | + "time": DATETIME -- Override for $.time | |
| 401 | + }, | |
| 402 | + "tag": [{ -- Tags and properties for this check-in | |
| 403 | + "name": TEXT, -- tag name | |
| 404 | + "value": TEXT, -- value (if it is a property) | |
| 405 | + "delete": 1, -- If present, delete this tag | |
| 406 | + "propagate": 1 -- Means propagate to descendants | |
| 407 | + }], | |
| 408 | + "reset": 1, -- All files included, not just changes | |
| 409 | + "file": [{ -- File in this check-in | |
| 410 | + "fname": TEXT, -- filename | |
| 411 | + "id": INT, -- DATA.ID or NAME.NAMEID. Omitted to delete | |
| 412 | + "mode": TEXT, -- "x" for executable. "l" for symlink | |
| 413 | + "oldname": TEXT -- Prior name if the file is renamed | |
| 414 | + }] | |
| 415 | + } | |
| 416 | + | |
| 417 | +The $.time element defines the moment in time when the check-in | |
| 418 | +occurred. The $.time field is required. Times are always Coordinated | |
| 419 | +Universal Time (UTC). DATETIME can be represented in multiple ways: | |
| 420 | + | |
| 421 | + 1. If the DATETIME is an integer, then it is the number of seconds | |
| 422 | + since 1970 (also known as "unix time"). | |
| 423 | + | |
| 424 | + 2. If the DATETIME is text, then it is ISO8601 as follows: | |
| 425 | + "YYYY-MM-DD HH:MM:SS.SSS". The fractional seconds may be | |
| 426 | + omitted. | |
| 427 | + | |
| 428 | + 3. If the DATETIME is a real number, then it is the fractional | |
| 429 | + julian day number. | |
| 430 | + | |
| 431 | +The $.comment element is the check-in comment. The $.comment field is | |
| 432 | +required. The mimetype for $.comment defaults to "text/plain" but can | |
| 433 | +be some other MIME-type if the $.mimetype field is present. | |
| 434 | + | |
| 435 | +The $.branch element defines the name of the branch that this check-in | |
| 436 | +belongs to. If omitted, the branch of the check-in is the same as | |
| 437 | +the branch of its primary parent check-in. | |
| 438 | + | |
| 439 | +The $.from element defines the primary parent check-in. Every | |
| 440 | +check-in other than the first check-in of the project has a primary | |
| 441 | +parent. The integer value of the $.from element is either the | |
| 442 | +DATA.ID value for another check-in in the same VCCP message or is | |
| 443 | +the NAME.NAMEID value for a NAME table entry that identifies the | |
| 444 | +parent check-in, or both. If the information sender is relying on the | |
| 445 | +other side to do name mapping, then only the local name will be provided. | |
| 446 | +But if the information sender has a name map, it should provide both | |
| 447 | +its local name and the remote name for the check-in, so that the receiver | |
| 448 | +can update its name map. | |
| 449 | + | |
| 450 | +The $.merge element is an array of integers for additional check-ins | |
| 451 | +that are merged into the current check-in. The $.cherrypick element | |
| 452 | +is an array of integer values that are check-ins that are cherrypick-merged | |
| 453 | +into the current check-in. Systems that do not record cherrypick merges | |
| 454 | +can ignore the $.cherrypick value. | |
| 455 | + | |
| 456 | +The $.author and $.committer elements define who created the check-in. | |
| 457 | +The $.committer element is required. The $.author element may be omitted | |
| 458 | +in the common case where the author and committer are the same. The | |
| 459 | +$.committer.time and $.author.time subelements should only be included | |
| 460 | +if they are different from $.time. | |
| 461 | + | |
| 462 | +The $.reset element, if present, should have an integer value of "1". | |
| 463 | +The presence of the $.reset element is a flag that affects the meaning | |
| 464 | +of the $.file element. | |
| 465 | + | |
| 466 | +The $.file element is an array of JSON objects that define the files | |
| 467 | +associated with the check-in. If the $.reset flag is present, then there | |
| 468 | +must be one entry in $.file for every file in the check-in. If the | |
| 469 | +$.reset flag is omitted (the common case) then there is one entry | |
| 470 | +in $.file for every file that changes relative to the primary parent | |
| 471 | +in $.from. If there is no primary parent, then the presence of the | |
| 472 | +$.reset flag is assumed even if it is omitted. | |
| 473 | + | |
| 474 | +The $.file[].fname element is the name of the file. | |
| 475 | +The $.file[].id element corresponds to a DATA.ID or NAME.NAMEID | |
| 476 | +that is the content of the file. If the file is being removed | |
| 477 | +by this check-in, then the $.file[].id element is omitted. | |
| 478 | +The $.file[].mode element is text containing one or more ASCII | |
| 479 | +characters. If the "x" character is included in $.file[].mode | |
| 480 | +then the file is executable. If the "l" character is included | |
| 481 | +in $.file[].mode then the file is a symbolic link (and the content | |
| 482 | +of the file is the target of the link). The $.file[].mode may | |
| 483 | +be blank or omitted for a normal read/write file. If a file | |
| 484 | +is being renamed, the $.file[].oldname field may be included | |
| 485 | +to show the previous name of the file, if that information is | |
| 486 | +available. | |
| 487 | + | |
| 488 | +Some version control systems allow tags and properties to be | |
| 489 | +associated with a check-in. The $.tag element supports this | |
| 490 | +feature. Each element of the $.tag array is a separate tag | |
| 491 | +or property. If the $.tag[].propagate field exists and has | |
| 492 | +a value of "1", then the tag/property propagates to all | |
| 493 | +non-merge children. If the $.tag[].delete field exists and | |
| 494 | +has a value of "1", then a propagating tag or property with | |
| 495 | +the given name that was set by some ancestor check-in is | |
| 496 | +stopped and omitted from this check-in. Version control | |
| 497 | +systems that do not support tags and/or properties on check-ins | |
| 498 | +or that do not support tag propagation can ignore all of these | |
| 499 | +attributes. | |
| 126 | 500 |
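
The message format added in sections 3.2 through 3.5 of this check-in can be illustrated with a short Python sketch using only the standard library. It is not part of the check-in and is not a reference implementation. The schema and the DCLASS, CALG, and NAMETYPE codes are taken from the spec text above. The helper names (`add_row`, `read_content`, `add_server_names`, `build_push`), the empty message description, and the sample names and check-in values are hypothetical. The sketch also assumes that "ZLib compression" means a plain zlib stream, which the spec text does not state.

```python
import json
import sqlite3
import zlib

# Schema copied from section 3.4 of the spec text.
SCHEMA = """
CREATE TABLE data(
  id INTEGER PRIMARY KEY, dclass INT, sz INT, calg INT, cref INT, content ANY
);
CREATE TABLE name(
  nameid INT, nametype INT, name TEXT, PRIMARY KEY(nameid,nametype)
) WITHOUT ROWID;
"""

CHECKIN, FILE, TAG, DESCRIPTION = 0, 1, 2, 3      # DATA.DCLASS codes
CALG_NONE, CALG_ZLIB, CALG_MULTI = 0, 1, 2        # DATA.CALG codes
CLIENT_NAME, SERVER_NAME = 0, 1                   # NAME.NAMETYPE codes


def add_row(db, rowid, dclass, payload, compress=False):
    """Insert one DATA row. payload is bytes (file) or str (JSON text)."""
    raw = payload if isinstance(payload, bytes) else payload.encode("utf-8")
    if compress:
        # Assumption: a plain zlib stream with no extra framing.
        content, calg = zlib.compress(raw), CALG_ZLIB
    else:
        content, calg = payload, CALG_NONE
    # DATA.SZ is always the uncompressed size; DATA.CREF stays NULL when portable.
    db.execute(
        "INSERT INTO data(id,dclass,sz,calg,cref,content) VALUES(?,?,?,?,NULL,?)",
        (rowid, dclass, len(raw), calg, content),
    )


def read_content(db, rowid, _nested=False):
    """Return the uncompressed bytes of a DATA row, following multi-blob parts."""
    sz, calg, content = db.execute(
        "SELECT sz, calg, content FROM data WHERE id=?", (rowid,)
    ).fetchone()
    if calg == CALG_ZLIB:
        raw = zlib.decompress(content)
    elif calg == CALG_MULTI:
        if _nested:
            raise ValueError("multi-blob rows may not be nested")
        # CONTENT is a JSON array of DATA.ID values to concatenate in order.
        raw = b"".join(read_content(db, part, True) for part in json.loads(content))
    else:
        raw = content if isinstance(content, bytes) else content.encode("utf-8")
    if len(raw) != sz:
        raise ValueError("DATA.SZ does not match the uncompressed content")
    return raw


def add_server_names(db, make_name):
    """Server side of section 3.4.1: add a NAMETYPE=1 row per client-named object.

    make_name is a hypothetical callable mapping a NAMEID to the server's name.
    """
    rows = db.execute(
        "SELECT nameid FROM name WHERE nametype=?", (CLIENT_NAME,)
    ).fetchall()
    for (nameid,) in rows:
        db.execute(
            "INSERT OR REPLACE INTO name VALUES(?,?,?)",
            (nameid, SERVER_NAME, make_name(nameid)),
        )


def build_push():
    """Build a tiny push request: description, one check-in, one file."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    # The description fields are not defined in this part of the spec,
    # so an empty JSON object stands in for them here.
    add_row(db, 0, DESCRIPTION, json.dumps({}))
    add_row(db, 1, CHECKIN, json.dumps({
        "time": "2019-03-13 12:00:00",
        "comment": "Add README",
        "committer": {"name": "alice", "email": "alice@example.com"},
        "file": [{"fname": "README.txt", "id": 2}],
    }))
    add_row(db, 2, FILE, b"Hello, BambooCoffee!\n" * 50, compress=True)
    # Made-up client-side name for the check-in stored in DATA.ID=1.
    db.execute("INSERT INTO name VALUES(1,?,?)", (CLIENT_NAME, "client-name-of-checkin-1"))
    db.commit()
    return db
```

In a real exchange the database would be written to a file and sent as the HTTP request body with the "application/x-vccp" MIME-type described in the spec text. The in-memory database above only keeps the sketch self-contained.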
| --- www/vccp/intro.md | |
| +++ www/vccp/intro.md | |
| @@ -1,11 +1,11 @@ | |
| 1 | Version Control Collaboration Protocol |
| 2 | ====================================== |
| 3 | |
| 4 | <blockquote><center style='background: yellow; border: 1px solid black;'> |
| 5 | This document is a work in progress.<br> |
| 6 | The last update was on 2019-03-09.<br> |
| 7 | Check back later for updates. |
| 8 | </center></blockquote> |
| 9 | |
| 10 | 1.0 Introduction |
| 11 | ---------------- |
| @@ -12,22 +12,22 @@ | |
| 12 | |
| 13 | The Version Control Collaboration Protocol or VCCP is an attempt to make |
| 14 | it easier for developers to collaborate even when they are using different |
| 15 | version control systems. |
| 16 | |
| 17 | For example, suppose Alice, the founder and |
| 18 | [BDFL](https://en.wikipedia.org/wiki/Benevolent_dictator_for_life) |
| 19 | for the fictional "BambooCoffee" project, prefers using the |
| 20 | [Mercurial](https://www.mercurial-scm.org/) version control system, |
| 21 | but two of her clients, Bob and Cindy, know nothing but |
| 22 | [Git](https://www.git-scm.org/) and steadfastly refuse to |
| 23 | type any command that begins with "hg", and an important |
| 24 | collaborator, Dave, really prefers [Bazaar](bazaar.canonical.com/). |
| 25 | The VCCP is designed to make it relatively easy and painless |
| 26 | for Alice to set up Git and Bazaar mirrors of her Mercurial |
| 27 | repository so that Bob, Cindy, and Dave can all use the tools with |
| 28 | which they are most familiar. |
| 29 | |
| 30 | <center></center> |
| 31 | |
| 32 | Assuming all the servers speak VCCP (which is not the case at the |
| 33 | time of this writing, but we hope to encourage that for the future) |
| @@ -38,11 +38,11 @@ | |
| 38 | ### 1.1 Bidirectional Collaboration |
| 39 | |
| 40 | The diagram above shows that all changes originate from Alice and |
| 41 | that Bob, Cindy, and David are only consumers. If Cindy wanted to |
| 42 | make a change to BambooCoffee, she would have to do that with a backchannel, |
| 43 | such as sending a patch via email to Alice and then get Alice to check |
| 44 | in the change. |
| 45 | |
| 46 | But VCCP also support bidirectional collaboration. |
| 47 | |
| 48 | <center></center> |
| @@ -55,26 +55,27 @@ | |
| 55 | VCCP message back to Truth containing Cindy's changes. Truth would then |
| 56 | relay those changes over to Mirror-2 where Dave could see them as well. |
| 57 | |
| 58 | ### 1.2 Client-Mirror versus Server-Mirror |
| 59 | |
| 60 | VCCP allows the mirrors to be set up as either clients or server. |
| 61 | |
| 62 | In the client-mirror approach, the mirrors periodically poll Truth asking |
| 63 | for changes. In the server-mirror approach, Truth sends changes to the |
| 64 | mirrors as they occur. |
| 65 | |
| 66 | In the first example above, the implication was that the server-mirror |
| 67 | approach was being used. The Truth repository would take the initiative |
| 68 | to send changes to the mirrors. But it does not have to be that way. |
| 69 | Suppose Dave is unknown to Alice. Suppose he just likes Alice's work and |
| 70 | wants to keep his own mirror for his own convenience. Dave could set up |
| 71 | Mirror-2 as a client-mirror that periodically polls Truth for changes. |
| 72 | |
| 73 | In the second example above, Truth and Mirror-1 could be configured to |
| 74 | have a Peer-to-Peer relationship rather than a Truth-to-Mirror relationship. |
| 75 | When new content arrives a Truth (because Alice did an "hg commit"), |
| 76 | Truth acts as a client to initiate a transfer of that new information |
| 77 | over to Mirror-1. When new content originates at Mirror-1 (because |
| 78 | Cindy did "git commit") then Mirror-1 acts as a client to send the new |
| 79 | content over to Truth. Or, they could set it up so that Truth is always |
| 80 | the client and it periodically polls Mirror-1 looking for new content |
| @@ -81,10 +82,39 @@ | |
| 81 | coming from Cindy. Or, they could set it up so that Mirror-1 is always |
| 82 | the client and it periodically polls Truth looking for changes from Alice. |
| 83 | |
| 84 | The point is that VCCP works in all of these scenarios. |
| 85 | |
| 86 | 2.0 Minimum Requirements |
| 87 | ------------------------ |
| 88 | |
| 89 | The VCCP is modeled after the Git fast-export and fast-import protocol. |
| 90 | That is to say, VCCP thinks in terms of "check-ins" with each check-in |
| @@ -101,11 +131,11 @@ | |
| 101 | a parent, but all the others should. Check-ins may also identify |
| 102 | zero or more "merge" parents, and zero or more "cherrypick" ancestors. |
| 103 | But the merges and cherrypicks can be ignored on systems that do not |
| 104 | support those concepts. |
| 105 | |
| 106 | VCCP assumes that every distinct version of a file, and every check-in has |
| 107 | a unique name. In Git and Mercurial, those names are SHA1 hashes |
| 108 | (computed in different ways). Fossil uses SHA3-256 hashes. I'm not sure |
| 109 | what Bazaar uses. VCCP does not care how the names are derived, as long |
| 110 | as they always uniquely identify the file or check-in. |
| 111 | |
| @@ -121,5 +151,349 @@ | |
| 121 | |
| 122 | The VCCP is a client-server protocol. |
| 123 | A client formats a VCCP message and sends it to the server. |
| 124 | The server acts upon that message, formulates a reply, and sends |
| 125 | the reply back to the client. |
| 126 |
| --- www/vccp/intro.md | |
| +++ www/vccp/intro.md | |
| @@ -1,11 +1,11 @@ | |
| 1 | Version Control Collaboration Protocol |
| 2 | ====================================== |
| 3 | |
| 4 | <blockquote><center style='background: yellow; border: 1px solid black;'> |
| 5 | This document is a work in progress.<br> |
| 6 | The last update was on 2019-03-13.<br> |
| 7 | Check back later for updates. |
| 8 | </center></blockquote> |
| 9 | |
| 10 | 1.0 Introduction |
| 11 | ---------------- |
| @@ -12,22 +12,22 @@ | |
| 12 | |
| 13 | The Version Control Collaboration Protocol or VCCP is an attempt to make |
| 14 | it easier for developers to collaborate even when they are using different |
| 15 | version control systems. |
| 16 | |
| 17 | For example, suppose Alice, the founder and principal maintainer |
| 18 | for the fictional "BambooCoffee" project, prefers using the |
| 19 | [Mercurial](https://www.mercurial-scm.org/) version control system, |
| 20 | but two of her clients, Bob and Cindy, know nothing but |
| 21 | [Git](https://www.git-scm.org/) and steadfastly refuse to |
| 22 | type any command that begins with "hg". |
| 23 | Further suppose that an important |
| 24 | collaborator, Dave, really prefers [Bazaar](bazaar.canonical.com/). |
| 25 | The VCCP is designed to make it relatively easy and painless |
| 26 | for Alice to set up Git and Bazaar mirrors of her Mercurial |
| 27 | repository so that Bob, Cindy, and Dave can all use the tools |
| 28 | they are most familiar with. |
| 29 | |
| 30 | <center></center> |
| 31 | |
| 32 | Assuming all the servers speak VCCP (which is not the case at the |
| 33 | time of this writing, but we hope to encourage that for the future) |
| @@ -38,11 +38,11 @@ | |
| 38 | ### 1.1 Bidirectional Collaboration |
| 39 | |
| 40 | The diagram above shows that all changes originate from Alice and |
| 41 | that Bob, Cindy, and Dave are only consumers. If Cindy wanted to |
| 42 | make a change to BambooCoffee, she would have to do that with a backchannel, |
| 43 | such as sending a patch via email to Alice and asking Alice to check |
| 44 | in the change. |
| 45 | |
| 46 | But VCCP also supports bidirectional collaboration. |
| 47 | |
| 48 | <center></center> |
| @@ -55,26 +55,27 @@ | |
| 55 | VCCP message back to Truth containing Cindy's changes. Truth would then |
| 56 | relay those changes over to Mirror-2 where Dave could see them as well. |
| 57 | |
| 58 | ### 1.2 Client-Mirror versus Server-Mirror |
| 59 | |
| 60 | VCCP allows the mirrors to be set up as either clients or servers. |
| 61 | |
| 62 | In the client-mirror approach, the mirrors periodically poll Truth asking |
| 63 | for changes. In the server-mirror approach, Truth sends changes to the |
| 64 | mirrors as they occur. |
| 65 | |
| 66 | In the first example above, the implication was that the server-mirror |
| 67 | approach was being used. The Truth repository would take the initiative |
| 68 | to send changes to the mirrors. But it does not have to be that way. |
| 69 | Suppose Dave is unknown to Alice. Suppose he just likes Alice's work and |
| 70 | wants to keep his own mirror of her work for his own convenience. |
| 71 | Dave could set up |
| 72 | Mirror-2 as a client-mirror that periodically polls Truth for changes. |
| 73 | |
| 74 | In the second example above, Truth and Mirror-1 could be configured to |
| 75 | have a Peer-to-Peer relationship rather than a Truth-to-Mirror relationship. |
| 76 | When new content arrives at Truth (because Alice did an "hg commit"), |
| 77 | Truth acts as a client to initiate a transfer of that new information |
| 78 | over to Mirror-1. When new content originates at Mirror-1 (because |
| 79 | Cindy did "git commit") then Mirror-1 acts as a client to send the new |
| 80 | content over to Truth. Or, they could set it up so that Truth is always |
| 81 | the client and it periodically polls Mirror-1 looking for new content |
| @@ -81,10 +82,39 @@ | |
| 82 | coming from Cindy. Or, they could set it up so that Mirror-1 is always |
| 83 | the client and it periodically polls Truth looking for changes from Alice. |
| 84 | |
| 85 | The point is that VCCP works in all of these scenarios. |
| 86 | |
| 87 | ### 1.3 Name Mapping |
| 88 | |
| 89 | Different version control systems use different names to refer to the same |
| 90 | object. For example, Fossil names files using a SHA3-256 hash of the |
| 91 | unmodified file content, whereas Git uses a hardened-SHA1 hash of the file |
| 92 | content with an added prefix. Mercurial, Monotone, Bazaar, and others all |
| 93 | use different naming schemes, so that the same check-in in any particular |
| 94 | version control system will have a different name in all other version |
| 95 | control systems. |
| 96 | |
| 97 | When mirroring a project between two version control systems, somebody |
| 98 | needs to keep track of the mapping between names. |
| 99 | |
| 100 | For example, in the second diagram above, if Mirror-1 wants to tell Truth |
| 101 | that it has a new check-in "Q" that is a child of "P", then it has to send |
| 102 | the name of check-in "P". Does it send the Git-name of "P" or the |
| 103 | Mercurial-name of "P"? If Mirror-1 sends Truth the Git-name of "P" then |
| 104 | Truth must be the system that does the name mapping. If Mirror-1 sends |
| 105 | Truth the Mercurial-name of "P", then Mirror-1 is the system that maintains |
| 106 | the mapping. |
| 107 | |
| 108 | The VCCP is designed such that both names for a |
| 109 | particular check-in or file can be sent. One of the collaborating systems |
| 110 | must still take responsibility for translating the names, but it does not |
| 111 | matter which system. As long as one or the other of the two systems |
| 112 | maintains a name mapping, the collaboration will work. Of course, it |
| 113 | also works for both systems to maintain the name map, and for maximum |
| 114 | flexibility, perhaps that should be the preferred approach. |
| 115 | |
| 116 | 2.0 Minimum Requirements |
| 117 | ------------------------ |
| 118 | |
| 119 | The VCCP is modeled after the Git fast-export and fast-import protocol. |
| 120 | That is to say, VCCP thinks in terms of "check-ins" with each check-in |
| @@ -101,11 +131,11 @@ | |
| 131 | a parent, but all the others should. Check-ins may also identify |
| 132 | zero or more "merge" parents, and zero or more "cherrypick" ancestors. |
| 133 | But the merges and cherrypicks can be ignored on systems that do not |
| 134 | support those concepts. |
| 135 | |
| 136 | VCCP assumes that every distinct version of a file and every check-in has |
| 137 | a unique name. In Git and Mercurial, those names are SHA1 hashes |
| 138 | (computed in different ways). Fossil uses SHA3-256 hashes. I'm not sure |
| 139 | what Bazaar uses. VCCP does not care how the names are derived, as long |
| 140 | as they always uniquely identify the file or check-in. |
| 141 | |
| @@ -121,5 +151,349 @@ | |
| 151 | |
| 152 | The VCCP is a client-server protocol. |
| 153 | A client formats a VCCP message and sends it to the server. |
| 154 | The server acts upon that message, formulates a reply, and sends |
| 155 | the reply back to the client. |
| 156 | |
| 157 | It does not matter what transport mechanism is used to send the VCCP |
| 158 | messages from client to server and back again. |
| 159 | But for maximum flexibility, it is suggested that HTTP (or HTTPS) be |
| 160 | used. The client sends an HTTP request to the server with the |
| 161 | VCCP message as the request content and a MIME-type of "application/x-vccp". |
| 162 | The HTTP response is another VCCP message with the same MIME-type. |
| 163 | The use of HTTP means that firewalls and proxies are not an |
| 164 | impediment to collaboration and that collaboration connection information |
| 165 | can be described by a simple URL. |
| 166 | |
| 167 | There are provisions in the VCCP design to allow authentication |
| 168 | in the body of the VCCP message itself. Or, two systems can, by |
| 169 | mutual agreement, authenticate by some external mechanism. |
| 170 | |
| 171 | ### 3.1 Message Content |
| 172 | |
| 173 | A single VCCP message round-trip can be a "push" if the client is sending |
| 174 | new check-in information to the server, or it can be a "pull" if the |
| 175 | client is polling the server to see if new check-in information is available |
| 176 | for download, or it can be both at once. |
| 177 | |
| 178 | The basic design of a VCCP message is inspired by the Git fast-export |
| 179 | protocol, but with enhancements to support incremental updates and |
| 180 | bidirectional updates and to make the message format more robust and |
| 181 | portable and simpler to generate and parse. A single message may contain |
| 182 | multiple "files", check-in descriptions that reference those files, and "tag" |
| 183 | descriptions. A "message description" section contains authentication |
| 184 | data, error codes, and other meta-data. Every request and every |
| 185 | reply contains, at a minimum, a message description. |
| 186 | |
| 187 | For a push, the request contains a message description with |
| 188 | authentication information, and the new files, check-ins, and tags |
| 189 | that are being pushed to the server. The reply to a push contains |
| 190 | success codes, and the names that the server assigned to the new objects, |
| 191 | so that the client can maintain a name map. |
| 192 | |
| 193 | For pull, the request contains only a message description with |
| 194 | authentication information and a description of what content the |
| 195 | client desires to pull. |
| 196 | The reply to a pull contains the files, check-ins, and tags requested. |
| 197 | |
| 198 | For a pull request, there is no mechanism (currently defined) for the |
| 199 | server to learn the client-side names for files and check-ins. Hence, |
| 200 | for a collaboration arrangement where the client polls the server for |
| 201 | updates, the client must maintain the name map. |
| 202 | |
| 203 | ### 3.2 Message Format Overview |
| 204 | |
| 205 | The format of a VCCP message is an ordinary SQLite database file with |
| 206 | a two-table schema. |
| 207 | The DATA table contains file, check-in, and tag content and the |
| 208 | message description. The DATA.CONTENT column contains either raw |
| 209 | file content or check-ins and tags descriptions formatted as JSON. |
| 210 | The message description is also JSON contained in a specially |
| 211 | designated row of the DATA table. The NAME table of the schema |
| 212 | is used to transmit name mappings. The NAME table serves the same |
| 213 | role as the "marks" file of git-fast-export. |
| 214 | |
| 215 | ### 3.3 Why Use A Database As The Message Format? |
| 216 | |
| 217 | Why does a VCCP message consist of an SQLite database instead of a |
| 218 | bespoke format like git-fast-export? |
| 219 | |
| 220 | 1. Some of the content to be transferred will typically be binary. |
| 221 | Most projects have at least a few images or other binary files |
| 222 | in their tree somewhere. Other files will be pure text. Check-in |
| 223 | and tag descriptions will also be pure text (JSON). That means |
| 224 | that the VCCP message will be a mix of text and binary content. |
| 225 | An SQLite database file is a convenient and efficient way |
| 226 | to encapsulate both binary and text content into a single container |
| 227 | which is easily created and accessed. |
| 228 | |
| 229 | 2. Robust, cross-platform libraries for reading and writing SQLite database |
| 230 | files already exist on every computer. No custom parser or generator |
| 231 | code needs to be written, debugged, managed, or maintained. |
| 232 | |
| 233 | 3. The SQLite database file format is well defined, cross-platform |
| 234 | (32-bit, 64-bit, big-endian, and little-endian) and is endorsed |
| 235 | by the US Library of Congress as a recommended file format for |
| 236 | archival data storage. |
| 237 | |
| 238 | 4. Unlike a serial format (such as git-fast-export) which must |
| 239 | normally be written and read sequentially from beginning to end, |
| 240 | elements of an SQLite database can be constructed and read in any |
| 241 | order. This gives extra implementation flexibility to both readers |
| 242 | and writers. |
| 243 | |
| 244 | ### 3.4 Database Schema |
| 245 | |
| 246 | The database schema for a VCCP message is as follows: |
| 247 | |
| 248 | > |
| 249 | CREATE TABLE data( |
| 250 | id INTEGER PRIMARY KEY, |
| 251 | dclass INT, |
| 252 | sz INT, |
| 253 | calg INT, |
| 254 | cref INT, |
| 255 | content ANY |
| 256 | ); |
| 257 | CREATE TABLE name( |
| 258 | nameid INT, |
| 259 | nametype INT, |
| 260 | name TEXT, |
| 261 | PRIMARY KEY(nameid,nametype) |
| 262 | ) WITHOUT ROWID; |
| 263 | |
| 264 | The DATA table holds the message description, the content of files, and JSON |
| 265 | descriptions of check-ins and tags. The NAME table is used to transmit |
| 266 | names. The DATA table corresponds to the body of a git-fast-export stream |
| 267 | and the NAME table corresponds to the "marks" file that is read and |
| 268 | written by the "--import-marks" and "--export-marks" options of the |
| 269 | "git fast-export" command. |
| 270 | |
| 271 | Each file, check-in, and tag is normally a single distinct entry in |
| 272 | the DATA table. (Exception: very large files, greater than 1GB in size, |
| 273 | can be split across multiple DATA table rows - see below.) Entries in |
| 274 | the DATA table can occur in any order. It is not required that files |
| 275 | referenced by check-ins have a smaller DATA.ID value, for example. |
| 276 | Free ordering does not impede data extraction (see the algorithm descriptions |
| 277 | below) but it does give considerable freedom to the message generator |
| 278 | logic. |
| 279 | |
| 280 | Each DATA row has a class identified by a small integer in the DATA.DCLASS |
| 281 | column. |
| 282 | |
| 283 | > |
| 284 | | 0: | A check-in | |
| 285 | | 1: | A file | |
| 286 | | 2: | A tag | |
| 287 | | 3: | The VCCP message description | |
| 288 | | 4: | Application-defined-1 | |
| 289 | | 5: | Application-defined-2 | |
| 290 | |
| 291 | Every well-formed VCCP message has exactly one message description entry |
| 292 | with DATA.ID=0 and DATA.DCLASS=3. No other DATA table entries should have |
| 293 | DATA.DCLASS=3. |
| 294 | |
| 295 | The application-defined values are reserved for extended uses of the |
| 296 | VCCP message format. In particular, there are plans to enhance |
| 297 | Fossil so that it uses VCCP as its sync protocol, replacing its |
| 298 | current bespoke protocol. But Fossil needs to send information about other |
| 299 | kinds of objects, such as wiki pages and tickets, that are not known |
| 300 | to Git and most other version control systems. A few |
| 301 | "application defined" values are available at strategic points in |
| 302 | the message format description to accommodate these extended use cases. |
| 303 | New application-defined values may be defined in the future. |
| 304 | Portable VCCP messages between different version control systems |
| 305 | should never use the application-defined values. |
| 306 | |
| 307 | The DATA.CONTENT field can be either text or binary, as appropriate. |
| 308 | For files, the DATA.CONTENT is binary. For check-ins and tags and for |
| 309 | the message description, the DATA.CONTENT is a text JSON object. |
| 310 | |
| 311 | The DATA.CONTENT field can optionally be compressed. The DATA.SZ field |
| 312 | is the uncompressed size of the content in bytes. The compression method |
| 313 | is determined by the DATA.CALG field: |
| 314 | |
| 315 | > |
| 316 | | 0: | No compression | |
| 317 | | 1: | ZLib compression | |
| 318 | | 2: | Multi-blob | |
| 319 | | 3: | Application-defined-1 | |
| 320 | | 4: | Application-defined-2 | |
| 321 | |
| 322 | The "multi-blob" compression method means that the content is the |
| 323 | concatenation of the content in other DATA table rows. This |
| 324 | allows for content that exceeds the 1GB size limit for an SQLite |
| 325 | BLOB column. If the DATA.CALG field is 2, then DATA.CONTENT will |
| 326 | be a JSON array of integer values, where each integer is the DATA.ID |
| 327 | of another DATA table entry that contains part of the content. |
| 328 | The actual data content is the concatenation of the other DATA table |
| 329 | entries. The secondary DATA table entries can also be compressed, |
| 330 | though not with multi-blob. In other words, the multi-blob |
| 331 | compression method may not be nested. This effectively limits the |
| 332 | maximum size of a file in the VCCP to the maximum size of an SQLite |
| 333 | database, which is 140 terabytes. |
| 334 | |
| 335 | Portable VCCP files should only use compression methods 0, 1, and 2, |
| 336 | and preferably only method 0 (no compression). But application-defined |
| 337 | compression methods are available for proprietary uses of the |
| 338 | VCCP message format. The DATA.CREF field is auxiliary data intended |
| 339 | for use with these application-defined compression methods. In |
| 340 | particular, DATA.CREF is intended to be the DATA.ID of a "base" |
| 341 | entry for delta-compression methods. For a portable VCCP file, |
| 342 | the DATA.CREF field should always be NULL. |
| 343 | |
| 344 | The DATA.ID field provides an integer identifier for files and |
| 345 | check-ins. The scope of that name is the single VCCP message |
| 346 | in which the DATA table entry appears, however. The NAME table |
| 347 | is used to provide a mapping from these internal integer names |
| 348 | to the persistent global hash names of the various version |
| 349 | control systems. |
| 350 | |
| 351 | A single object can have different names, depending on which |
| 352 | version control system stores it. For this reason, the NAME |
| 353 | table is designed to allow storage of multiple names for the |
| 354 | same object. If NAME.NAMETYPE is 0, that means that the name |
| 355 | is appropriate for use on the client. If NAME.NAMETYPE is 1, |
| 356 | that means the name is appropriate for use on the server. |
| 357 | |
| 358 | To simplify the implementation of VCCP on diverse systems, |
| 359 | names should be sent as text. If the names for a particular system |
| 360 | are binary hashes, then the NAME table should store them as |
| 361 | the hexadecimal representation. |
| 362 | |
| 363 | #### 3.4.1 NAME Table Example 1 |
| 364 | |
| 365 | Suppose a client is pushing a new check-in to the server and the |
| 366 | check-in text is stored in the DATA.ID=1 row. Then the request |
| 367 | should contain a NAME table row with NAME.NAMEID=1 (to match the |
| 368 | DATA table ID value) and NAME.NAMETYPE=0 (because client names |
| 369 | have NAMETYPE 0) and with the name of that check-in according to |
| 370 | the client stored in NAME.NAME. The server will recode the |
| 371 | check-in according to its own format, and store the server-side |
| 372 | name in a new NAME table row with NAME.NAMEID=1 and NAME.NAMETYPE=1. |
| 373 | The server then includes the complete NAME table in its reply |
| 374 | back to the client. In this way, the client is able to discover |
| 375 | the name of the check-in on the server. The server can also |
| 376 | remember the client check-in name, if desired. |
| 377 | |
| 378 | ### 3.5 Check-in JSON Format |
| 379 | |
| 380 | Check-ins are described by DATA table rows where the content is a |
| 381 | single JSON object, as follows: |
| 382 | |
| 383 | > |
| 384 | { |
| 385 | "time": DATETIME, -- Date and time of the check-in |
| 386 | "comment": TEXT, -- The original check-in comment |
| 387 | "mimetype": TEXT, -- The mimetype of the comment text |
| 388 | "branch": TEXT, -- Branch this check-in belongs to |
| 389 | "from": INT, -- NAME.NAMEID for the primary parent |
| 390 | "merge": [INT], -- Merge parents |
| 391 | "cherrypick": [INT], -- Cherrypick merges |
| 392 | "author": { -- Author of the change |
| 393 | "name": TEXT, -- Name or handle |
| 394 | "email": TEXT, -- Email address |
| 395 | "time": DATETIME -- Override for $.time |
| 396 | }, |
| 397 | "committer": { -- Committer of the change |
| 398 | "name": TEXT, -- Name or handle |
| 399 | "email": TEXT, -- Email address |
| 400 | "time": DATETIME -- Override for $.time |
| 401 | }, |
| 402 | "tag": [{ -- Tags and properties for this check-in |
| 403 | "name": TEXT, -- tag name |
| 404 | "value": TEXT, -- value (if it is a property) |
| 405 | "delete": 1, -- If present, delete this tag |
| 406 | "propagate": 1 -- Means propagate to descendants |
| 407 | }], |
| 408 | "reset": 1, -- All files included, not just changes |
| 409 | "file": [{ -- File in this check-in |
| 410 | "fname": TEXT, -- filename |
| 411 | "id": INT, -- DATA.ID or NAME.NAMEID. Omitted to delete |
| 412 | "mode": TEXT, -- "x" for executable. "l" for symlink |
| 413 | "oldname": TEXT -- Prior name if the file is renamed |
| 414 | }] |
| 415 | } |
| 416 | |
| 417 | The $.time element defines the moment in time when the check-in |
| 418 | occurred. The $.time field is required. Times are always Coordinated |
| 419 | Universal Time (UTC). DATETIME can be represented in multiple ways: |
| 420 | |
| 421 | 1. If the DATETIME is an integer, then it is the number of seconds |
| 422 | since 1970 (also known as "unix time"). |
| 423 | |
| 424 | 2. If the DATETIME is text, then it is ISO8601 as follows: |
| 425 | "YYYY-MM-DD HH:MM:SS.SSS". The fractional seconds may be |
| 426 | omitted. |
| 427 | |
| 428 | 3. If the DATETIME is a real number, then it is the fractional |
| 429 | julian day number. |
| 430 | |
| 431 | The $.comment element is the check-in comment. The $.comment field is |
| 432 | required. The mimetype for $.comment defaults to "text/plain" but can |
| 433 | be some other MIME-type if the $.mimetype field is present. |
| 434 | |
| 435 | The $.branch element defines the name of the branch that this check-in |
| 436 | belongs to. If omitted, the branch of the check-in is the same as |
| 437 | the branch of its primary parent check-in. |
| 438 | |
| 439 | The $.from element defines the primary parent check-in. Every |
| 440 | check-in other than the first check-in of the project has a primary |
| 441 | parent. The integer value of the $.from element is either the |
| 442 | DATA.ID value for another check-in in the same VCCP message or is |
| 443 | the NAME.NAMEID value for a NAME table entry that identifies the |
| 444 | parent check-in, or both. If the information sender is relying on the |
| 445 | other side to do name mapping, then only the local name will be provided. |
| 446 | But if the information sender has a name map, it should provide both |
| 447 | its local name and the remote name for the check-in, so that the receiver |
| 448 | can update its name map. |
| 449 | |
| 450 | The $.merge element is an array of integers for additional check-ins |
| 451 | that are merged into the current check-in. The $.cherrypick element |
| 452 | is an array of integer values that are check-ins that are cherrypick-merged |
| 453 | into the current check-in. Systems that do not record cherrypick merges |
| 454 | can ignore the $.cherrypick value. |
| 455 | |
| 456 | The $.author and $.committer elements define who created the check-in. |
| 457 | The $.committer element is required. The $.author element may be omitted |
| 458 | in the common case where the author and committer are the same. The |
| 459 | $.committer.time and $.author.time subelements should only be included |
| 460 | if they are different from $.time. |
| 461 | |
| 462 | The $.reset element, if present, should have an integer value of "1". |
| 463 | The presence of the $.reset element is a flag that affects the meaning |
| 464 | of the $.file element. |
| 465 | |
| 466 | The $.file element is an array of JSON objects that define the files |
| 467 | associated with the check-in. If the $.reset flag is present, then there |
| 468 | must be one entry in $.file for every file in the check-in. If the |
| 469 | $.reset flag is omitted (the common case) then there is one entry |
| 470 | in $.file for every file that changes relative to the primary parent |
| 471 | in $.from. If there is no primary parent, then the presence of the |
| 472 | $.reset flag is assumed even if it is omitted. |
| 473 | |
| 474 | The $.file[].fname element is the name of the file. |
| 475 | The $.file[].id element corresponds to a DATA.ID or NAME.NAMEID |
| 476 | that is the content of the file. If the file is being removed |
| 477 | by this check-in, then the $.file[].id element is omitted. |
| 478 | The $.file[].mode element is text containing one or more ASCII |
| 479 | characters. If the "x" character is included in $.file[].mode |
| 480 | then the file is executable. If the "l" character is included |
| 481 | in $.file[].mode then the file is a symbolic link (and the content |
| 482 | of the file is the target of the link). The $.file[].mode may |
| 483 | be blank or omitted for a normal read/write file. If a file |
| 484 | is being renamed, the $.file[].oldname field may be included |
| 485 | to show the previous name of the file, if that information is |
| 486 | available. |
| 487 | |
| 488 | Some version control systems allow tags and properties to be |
| 489 | associated with a check-in. The $.tag element supports this |
| 490 | feature. Each element of the $.tag array is a separate tag |
| 491 | or property. If the $.tag[].propagate field exists and has |
| 492 | a value of "1", then the tag/property propagates to all |
| 493 | non-merge children. If the $.tag[].delete field exists and |
| 494 | has a value of "1", then a propagating tag or property with |
| 495 | the given name that was set by some ancestor check-in is |
| 496 | stopped and omitted from this check-in. Version control |
| 497 | systems that do not support tags and/or properties on check-ins |
| 498 | or that do not support tag propagation can ignore all of these |
| 499 | attributes. |
| 500 |
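
The message format described in sections 3.4 and 3.5 of the new text can be exercised with a short script. What follows is only a minimal sketch, not part of the check-in and not a reference implementation. It uses Python's standard `sqlite3`, `zlib`, and `json` modules, copies the schema from section 3.4, and makes several assumptions that the spec text above does not settle: the helper names (`add_row`, `read_content`, `build_demo_message`) are hypothetical, "ZLib compression" is taken to mean a plain zlib stream, the secondary rows of a multi-blob entry are given DCLASS 1, the message description is left as an empty JSON object because its fields are not defined in this part of the document, and the client-side name `c0ffee` is a placeholder.

```python
import json
import sqlite3
import zlib

# Schema copied from section 3.4 of the spec text.
SCHEMA = """
CREATE TABLE data(
  id INTEGER PRIMARY KEY,
  dclass INT,
  sz INT,
  calg INT,
  cref INT,
  content ANY
);
CREATE TABLE name(
  nameid INT,
  nametype INT,
  name TEXT,
  PRIMARY KEY(nameid,nametype)
) WITHOUT ROWID;
"""

CHECKIN, FILE, TAG, DESCRIPTION = 0, 1, 2, 3   # DATA.DCLASS values
RAW, ZLIB, MULTI_BLOB = 0, 1, 2                # DATA.CALG values


def add_row(db, rowid, dclass, content, calg=RAW, sz=None):
    """Insert one DATA row. DATA.SZ is always the uncompressed size in bytes."""
    raw = content.encode("utf-8") if isinstance(content, str) else content
    # Assumption: CALG=1 means a plain zlib stream.
    stored = zlib.compress(raw) if calg == ZLIB else content
    if sz is None:
        sz = len(raw)
    db.execute(
        "INSERT INTO data(id, dclass, sz, calg, cref, content) "
        "VALUES(?, ?, ?, ?, NULL, ?)",
        (rowid, dclass, sz, calg, stored),
    )


def read_content(db, rowid, nested=False):
    """Return the uncompressed bytes for DATA.ID=rowid."""
    calg, sz, content = db.execute(
        "SELECT calg, sz, content FROM data WHERE id=?", (rowid,)
    ).fetchone()
    if isinstance(content, str):
        content = content.encode("utf-8")
    if calg == RAW:
        out = content
    elif calg == ZLIB:
        out = zlib.decompress(content)
    elif calg == MULTI_BLOB:
        if nested:
            raise ValueError("multi-blob entries may not be nested")
        # CONTENT is a JSON array of DATA.ID values to concatenate in order.
        parts = json.loads(content)
        out = b"".join(read_content(db, part, nested=True) for part in parts)
    else:
        raise ValueError("application-defined compression is not portable")
    if len(out) != sz:
        raise ValueError("DATA.SZ does not match the uncompressed size")
    return out


def build_demo_message():
    """Build a tiny push request in memory: one check-in with one file."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    # Exactly one message description, with DATA.ID=0 and DATA.DCLASS=3.
    # Its fields are not defined in this section, so it is left empty here.
    add_row(db, 0, DESCRIPTION, json.dumps({}))
    # A file split over two rows to show multi-blob (normally only for >1GB).
    # Assumption: the secondary rows use DCLASS 1 like any other file.
    add_row(db, 2, FILE, b"hello ", calg=ZLIB)
    add_row(db, 3, FILE, b"world\n")
    add_row(db, 4, FILE, json.dumps([2, 3]), calg=MULTI_BLOB, sz=12)
    checkin = {
        "time": 1552435200,                 # integer DATETIME (unix time)
        "comment": "Initial check-in",
        "committer": {"name": "alice", "email": "alice@example.com"},
        "file": [{"fname": "hello.txt", "id": 4}],
    }
    add_row(db, 1, CHECKIN, json.dumps(checkin))
    # NAMETYPE 0 is the client-side name; "c0ffee" is a placeholder hash.
    db.execute("INSERT INTO name(nameid, nametype, name) VALUES(1, 0, ?)",
               ("c0ffee",))
    db.commit()
    return db
```

A receiver would call `read_content(db, 1)` to obtain the check-in JSON, follow each `$.file[].id` to another `read_content` call, and then add its own NAME rows with NAMETYPE=1 before replying, as in the example of section 3.4.1. Because rows can be read in any order, nothing in the sketch depends on the files having smaller DATA.ID values than the check-in that references them.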