Fossil SCM

Continuing work on the VCCP spec. This is an incremental check-in.

drh 2019-03-13 15:34 vccp
Commit 60794e993c9e3dad82a475203c63a1dda5c4b134f35292f712c43b82f300a9d8
1 file changed +385 -11
--- www/vccp/intro.md
+++ www/vccp/intro.md
@@ -1,11 +1,11 @@
11
Version Control Collaboration Protocol
22
======================================
33
44
<blockquote><center style='background: yellow; border: 1px solid black;'>
55
This document is a work in progress.<br>
6
-The last update was on 2019-03-09.<br>
6
+The last update was on 2019-03-13.<br>
77
Check back later for updates.
88
</center></blockquote>
99
1010
1.0 Introduction
1111
----------------
@@ -12,22 +12,22 @@
1212
1313
The Version Control Collaboration Protocol or VCCP is an attempt to make
1414
it easier for developers to collaborate even when they are using different
1515
version control systems.
1616
17
-For example, suppose Alice, the founder and
18
-[BDFL](https://en.wikipedia.org/wiki/Benevolent_dictator_for_life)
17
+For example, suppose Alice, the founder and principal maintainer
1918
for the fictional "BambooCoffee" project, prefers using the
2019
[Mercurial](https://www.mercurial-scm.org/) version control system,
2120
but two of her clients, Bob and Cindy, know nothing but
2221
[Git](https://www.git-scm.org/) and steadfastly refuse to
23
-type any command that begins with "hg", and an important
22
+type any command that begins with "hg".
23
+Further suppose that an important
2424
collaborator, Dave, really prefers [Bazaar](bazaar.canonical.com/).
2525
The VCCP is designed to make it relatively easy and painless
2626
for Alice to set up Git and Bazaar mirrors of her Mercurial
27
-repository so that Bob, Cindy, and Dave can all use the tools with
28
-which they are most familiar.
27
+repository so that Bob, Cindy, and Dave can all use the tools
28
+they are most familiar with.
2929
3030
<center>![](diagram-1.jpg)</center>
3131
3232
Assuming all the servers speak VCCP (which is not the case at the
3333
time of this writing, but we hope to encourage that for the future)
@@ -38,11 +38,11 @@
3838
### 1.1 Bidirectional Collaboration
3939
4040
The diagram above shows that all changes originate from Alice and
4141
that Bob, Cindy, and David are only consumers. If Cindy wanted to
4242
make a change to BambooCoffee, she would have to do that with a backchannel,
43
-such as sending a patch via email to Alice and then get Alice to check
43
+such as sending a patch via email to Alice and asking Alice to check
4444
in the change.
4545
4646
But VCCP also supports bidirectional collaboration.
4747
4848
<center>![](diagram-2.jpg)</center>
@@ -55,26 +55,27 @@
5555
VCCP message back to Truth containing Cindy's changes. Truth would then
5656
relay those changes over to Mirror-2 where Dave could see them as well.
5757
5858
### 1.2 Client-Mirror versus Server-Mirror
5959
60
-VCCP allows the mirrors to be set up as either clients or server.
60
+VCCP allows the mirrors to be set up as either clients or servers.
6161
6262
In the client-mirror approach, the mirrors periodically poll Truth asking
6363
for changes. In the server-mirror approach, Truth sends changes to the
6464
mirrors as they occur.
6565
6666
In the first example above, the implication was that the server-mirror
6767
approach was being used. The Truth repository would take the initiative
6868
to send changes to the mirrors. But it does not have to be that way.
6969
Suppose Dave is unknown to Alice. Suppose he just likes Alice's work and
70
-wants to keep his own mirror for his own convenience. Dave could set up
70
+wants to keep his own mirror of her work for his own convenience.
71
+Dave could set up
7172
Mirror-2 as a client-mirror that periodically polls Truth for changes.
7273
7374
In the second example above, Truth and Mirror-1 could be configured to
7475
have a Peer-to-Peer relationship rather than a Truth-to-Mirror relationship.
75
-When new content arrives a Truth (because Alice did an "hg commit"),
76
+When new content arrives at Truth (because Alice did an "hg commit"),
7677
Truth acts as a client to initiate a transfer of that new information
7778
over to Mirror-1. When new content originates at Mirror-1 (because
7879
Cindy did "git commit") then Mirror-1 acts as a client to send the new
7980
content over to Truth. Or, they could set it up so that Truth is always
8081
the client and it periodically polls Mirror-1 looking for new content
@@ -81,10 +82,39 @@
8182
coming from Cindy. Or, they could set it up so that Mirror-1 is always
8283
the client and it periodically polls Truth looking for changes from Alice.
8384
8485
The point is that VCCP works in all of these scenarios.
8586
87
+### 1.3 Name Mapping
88
+
89
+Different version control systems use different names to refer to the same
90
+object. For example, Fossil names files using a SHA3-256 hash of the
91
+unmodified file content, whereas Git uses a hardened-SHA1 hash of the file
92
+content with an added prefix. Mercurial, Monotone, Bazaar, and others all
93
+use different naming schemes, so that the same check-in in any particular
94
+version control system will have a different name in all other version
95
+control systems.
96
+
97
+When mirroring a project between two version control systems, somebody
98
+needs to keep track of the mapping between names.
99
+
100
+For example, in the second diagram above, if Mirror-1 wants to tell Truth
101
+that it has a new check-in "Q" that is a child of "P", then it has to send
102
+the name of check-in "P". Does it send the Git-name of "P" or the
103
+Mercurial-name of "P"? If Mirror-1 sends Truth the Git-name of "P" then
104
+Truth must be the system that does the name mapping. If Mirror-1 sends
105
+Truth the Mercurial-name of "P", then Mirror-1 is the system that maintains
106
+the mapping.
107
+
108
+The VCCP is designed such that both names for a
109
+particular check-in or file can be sent. One of the collaborating systems
110
+must still take responsibility for translating the names, but it does not
111
+matter which system. As long as one or the other of the two systems
112
+maintains a name mapping, the collaboration will work. Of course, it
113
+also works for both systems to maintain the name map, and for maximum
114
+flexibility, perhaps that should be the preferred approach.
115
+
86116
2.0 Minimum Requirements
87117
------------------------
88118
89119
The VCCP is modeled after the Git fast-export and fast-import protocol.
90120
That is to say, VCCP thinks in terms of "check-ins" with each check-in
@@ -101,11 +131,11 @@
101131
a parent, but all the others should. Check-ins may also identify
102132
zero or more "merge" parents, and zero or more "cherrypick" ancestors.
103133
But the merges and cherrypicks can be ignored on systems that do not
104134
support those concepts.
105135
106
-VCCP assumes that every distinct version of a file, and every check-in has
136
+VCCP assumes that every distinct version of a file and every check-in has
107137
a unique name. In Git and Mercurial, those names are SHA1 hashes
108138
(computed in different ways). Fossil uses SHA3-256 hashes. I'm not sure
109139
what Bazaar uses. VCCP does not care how the names are derived, as long
110140
as they always uniquely identify the file or check-in.
111141
@@ -121,5 +151,349 @@
121151
122152
The VCCP is a client-server protocol.
123153
A client formats a VCCP message and sends it to the server.
124154
The server acts upon that message, formulates a reply, and sends
125155
the reply back to the client.
156
+
157
+It does not matter what transport mechanism is used to send the VCCP
158
+messages from client to server and back again.
159
+But for maximum flexibility, it is suggested that HTTP (or HTTPS) be
160
+used. The client sends an HTTP request to the server with the
161
+VCCP message as the request content and a MIME-type of "application/x-vccp".
162
+The HTTP response is another VCCP message with the same MIME-type.
163
+The use of HTTP means that firewalls and proxies are not an
164
+impediment to collaboration and that collaboration connection information
165
+can be described by a simple URL.
166
+
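As a rough illustration of the suggested HTTP transport, the Python sketch below builds (but never sends) a request whose body is a VCCP message. Only the "application/x-vccp" MIME-type comes from the text above; the endpoint URL, the helper name `build_vccp_request`, and the use of POST are assumptions made for this example.

```python
import urllib.request

VCCP_MIME_TYPE = "application/x-vccp"  # MIME-type named in the text above


def build_vccp_request(url: str, message: bytes) -> urllib.request.Request:
    """Wrap the raw bytes of a VCCP message in an HTTP request.

    Assumption: the message is carried as the body of a POST request.
    """
    return urllib.request.Request(
        url,
        data=message,
        headers={"Content-Type": VCCP_MIME_TYPE},
        method="POST",
    )


# Hypothetical endpoint and placeholder payload; nothing is sent here.
request = build_vccp_request("https://example.org/vccp", b"placeholder")
```

Passing `request` to `urllib.request.urlopen` would perform the round-trip, and the response body would then be the reply message.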
167
+There are provisions in the VCCP design to allow authentication
168
+in the body of the VCCP message itself. Or, two systems can, by
169
+mutual agreement, authenticate by some external mechanism.
170
+
171
+### 3.1 Message Content
172
+
173
+A single VCCP message round-trip can be a "push" if the client is sending
174
+new check-in information to the server, or it can be a "pull" if the
175
+client is polling the server to see if new check-in information is available
176
+for download, or it can be both at once.
177
+
178
+The basic design of a VCCP message is inspired by the Git fast-export
179
+protocol, but with enhancements to support incremental updates and
180
+bidirectional updates and to make the message format more robust and
181
+portable and simpler to generate and parse. A single message may contain
182
+multiple "files", check-in descriptions that reference those files, and "tag"
183
+descriptions. A "message description" section contains authentication
184
+data, error codes, and other meta-data. Every request and every
185
+reply contains, at a minimum, a message description.
186
+
187
+For a push, the request contains a message description with
188
+authentication information, and the new files, check-ins, and tags
189
+that are being pushed to the server. The reply to a push contains
190
+success codes, and the names that the server assigned to the new objects,
191
+so that the client can maintain a name map.
192
+
193
+For pull, the request contains only a message description with
194
+authentication information and a description of what content the
195
+client desires to pull.
196
+The reply to a pull contains the files, check-ins, and tags requested.
197
+
198
+For a pull request, there is no mechanism (currently defined) for the
199
+server to learn the client-side names for files and check-ins. Hence,
200
+for a collaboration arrangement where the client polls the server for
201
+updates, the client must maintain the name map.
202
+
203
+### 3.2 Message Format Overview
204
+
205
+The format of a VCCP message is an ordinary SQLite database file with
206
+a two-table schema.
207
+The DATA table contains file, check-in, and tag content and the
208
+message description. The DATA.CONTENT column contains either raw
209
+file content or check-in and tag descriptions formatted as JSON.
210
+The message description is also JSON contained in a specially
211
+designated row of the DATA table. The NAME table of the schema
212
+is used to transmit name mappings. The NAME table serves the same
213
+role as the "marks" file of git-fast-export.
214
+
215
+### 3.3 Why Use A Database As The Message Format?
216
+
217
+Why does a VCCP message consist of an SQLite database instead of a
218
+bespoke format like git-fast-export?
219
+
220
+ 1. Some of the content to be transferred will typically be binary.
221
+ Most projects have at least a few images or other binary files
222
+ in their tree somewhere. Other files will be pure text. Check-in
223
+ and tag descriptions will also be pure text (JSON). That means
224
+ that the VCCP message will be a mix of text and binary content.
225
+ An SQLite database file is a convenient and efficient way
226
+ to encapsulate both binary and text content into a single container
227
+ which is easily created and accessed.
228
+
229
+ 2. Robust, cross-platform libraries for reading and writing SQLite database
230
+ files already exist on every computer. No custom parser or generator
231
+ code needs to be written, debugged, managed, or maintained.
232
+
233
+ 3. The SQLite database file format is well defined, cross-platform
234
+ (32-bit, 64-bit, big-endian, and little-endian) and is endorsed
235
+ by the US Library of Congress as a recommended file format for
236
+ archival data storage.
237
+
238
+ 4. Unlike a serial format (such as git-fast-export) which must
239
+ normally be written and read sequentially from beginning to end,
240
+ elements of an SQLite database can be constructed and read in any
241
+ order. This gives extra implementation flexibility to both readers
242
+ and writers.
243
+
244
+### 3.4 Database Schema
245
+
246
+The database schema for a VCCP message is as follows:
247
+
248
+>
249
+ CREATE TABLE data(
250
+ id INTEGER PRIMARY KEY,
251
+ dclass INT,
252
+ sz INT,
253
+ calg INT,
254
+ cref INT,
255
+ content ANY
256
+ );
257
+ CREATE TABLE name(
258
+ nameid INT,
259
+ nametype INT,
260
+ name TEXT,
261
+ PRIMARY KEY(nameid,nametype)
262
+ ) WITHOUT ROWID;
263
+
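A minimal sketch of how a sender might create a message with this schema, using Python's standard `sqlite3` module. The helper names `new_message` and `add_row` are hypothetical, the empty JSON object stands in for a message description whose fields are not specified here, and the DCLASS codes (3 for the message description, 1 for a file) are the ones listed in the class table of this section.

```python
import json
import sqlite3

# Schema copied from the text above.
VCCP_SCHEMA = """
CREATE TABLE data(
  id INTEGER PRIMARY KEY,
  dclass INT,
  sz INT,
  calg INT,
  cref INT,
  content ANY
);
CREATE TABLE name(
  nameid INT,
  nametype INT,
  name TEXT,
  PRIMARY KEY(nameid,nametype)
) WITHOUT ROWID;
"""


def new_message(path: str = ":memory:") -> sqlite3.Connection:
    """Create an empty VCCP message database (hypothetical helper)."""
    db = sqlite3.connect(path)
    db.executescript(VCCP_SCHEMA)
    return db


def add_row(db: sqlite3.Connection, rowid: int, dclass: int, content) -> None:
    """Store uncompressed content (calg=0); sz is the size in bytes."""
    raw = content.encode("utf-8") if isinstance(content, str) else content
    db.execute(
        "INSERT INTO data(id, dclass, sz, calg, cref, content) "
        "VALUES(?, ?, ?, 0, NULL, ?)",
        (rowid, dclass, len(raw), content),
    )


db = new_message()
add_row(db, 0, 3, json.dumps({}))    # placeholder message description (JSON text)
add_row(db, 1, 1, b"\x00\x01\x02")   # a file, stored as a BLOB
```

Passing `bytes` for file content keeps it a BLOB, while JSON passed as `str` is stored as text.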
264
+The DATA table holds the message description, the content of files, and JSON
265
+descriptions of check-ins and tags. The NAME table is used to transmit
266
+names. The DATA table corresponds to the body of a git-fast-export stream
267
+and the NAME table corresponds to the "marks" file that is read and
268
+written by the "--import-marks" and "--export-marks" options of the
269
+"git fast-export" command.
270
+
271
+Each file, check-in, and tag is normally a single distinct entry in
272
+the DATA table. (Exception: very large files, greater than 1GB in size,
273
+can be split across multiple DATA table rows - see below.) Entries in
274
+the DATA table can occur in any order. It is not required that files
275
+referenced by check-ins have a smaller DATA.ID value, for example.
276
+Free ordering does not impede data extraction (see the algorithm descriptions
277
+below) but it does give considerable freedom to the message generator
278
+logic.
279
+
280
+Each DATA row has a class identified by a small integer in the DATA.DCLASS
281
+column.
282
+
283
+>
284
+| 0: | A check-in |
285
+| 1: | A file |
286
+| 2: | A tag |
287
+| 3: | The VCCP message description |
288
+| 4: | Application-defined-1 |
289
+| 5: | Application-defined-2 |
290
+
291
+Every well-formed VCCP message has exactly one message description entry
292
+with DATA.ID=0 and DATA.DCLASS=3. No other DATA table entries should have
293
+DATA.DCLASS=3.
294
+
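The well-formedness rule above could be checked by a receiver roughly as follows. This is only a sketch: the function name `has_valid_description` is made up for the example, and the two sample rows are illustrative.

```python
import sqlite3


def has_valid_description(db: sqlite3.Connection) -> bool:
    """True if exactly one DATA row has dclass=3 and that row has id=0."""
    ids = [row[0] for row in db.execute("SELECT id FROM data WHERE dclass=3")]
    return ids == [0]


# Illustrative message: one description row and one file row.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE data(id INTEGER PRIMARY KEY, dclass INT, sz INT, "
    "calg INT, cref INT, content ANY)"
)
db.execute("INSERT INTO data VALUES(0, 3, 2, 0, NULL, '{}')")
db.execute("INSERT INTO data VALUES(1, 1, 5, 0, NULL, x'68656c6c6f')")
```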
295
+The application-defined values are reserved for extended uses of the
296
+VCCP message format. In particular, there are plans to enhance
297
+Fossil so that it uses VCCP as its sync protocol, replacing its
298
+current bespoke protocol. But Fossil needs to send information other
299
+kinds of objects, such as wiki pages and tickets, that are not known
300
+to Git and most other version control systems. A few
301
+"application defined" values are available at strategic points in
302
+the message format description to accommodate these extended use cases.
303
+New application-defined values may be defined in the future.
304
+Portable VCCP messages between different version control systems
305
+should never use the application-defined values.
306
+
307
+The DATA.CONTENT field can be either text or binary, as appropriate.
308
+For files, the DATA.CONTENT is binary. For check-ins and tags and for
309
+the message description, the DATA.CONTENT is a text JSON object.
310
+
311
+The DATA.CONTENT field can optionally be compressed. The DATA.SZ field
312
+is the uncompressed size of the content in bytes. The compression method
313
+is determined by the DATA.CALG field:
314
+
315
+>
316
+| 0: | No compression |
317
+| 1: | ZLib compression |
318
+| 2: | Multi-blob |
319
+| 3: | Application-defined-1 |
320
+| 4: | Application-defined-2 |
321
+
322
+The "multi-blob" compression method means that the content is the
323
+concatenation of the content in other DATA table rows. This
324
+allows for content that exceeds the 1GB size limit for an SQLite
325
+BLOB column. If the DATA.CALG field is 2, then DATA.CONTENT will
326
+be a JSON array of integer values, where each integer is the DATA.ID
327
+of another DATA table entry that contains part of the content.
328
+The actual data content is the concatenation of the other DATA table
329
+entries. The secondary DATA table entries can also be compressed,
330
+though not with multi-blob. In other words, the multi-blob
331
+compression method may not be nested. This effectively limits the
332
+maximum size of a file in the VCCP to the maximum size of an SQLite
333
+database, which is 140 terabytes.
334
+
335
+Portable VCCP files should only use compression methods 0, 1, and 2,
336
+and preferably only method 0 (no compression). But application-defined
337
+compression methods are available for proprietary uses of the
338
+VCCP message format. The DATA.CREF field is auxiliary data intended
339
+for use with these application-defined compression methods. In
340
+particular, DATA.CREF is intended to be the DATA.ID of a "base"
341
+entry for delta-compression methods. For a portable VCCP file,
342
+the DATA.CREF field should always be NULL.
343
+
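The portable compression methods (0, 1, and 2) could be decoded along the following lines. This is a sketch under stated assumptions: "ZLib compression" is taken to mean a standard zlib stream as produced by Python's `zlib.compress`, the secondary rows of a multi-blob entry are given DCLASS 1 purely for illustration, and the function names are hypothetical.

```python
import json
import sqlite3
import zlib

CALG_NONE, CALG_ZLIB, CALG_MULTI_BLOB = 0, 1, 2


def as_bytes(value) -> bytes:
    """SQLite may return TEXT as str; normalise everything to bytes."""
    return value.encode("utf-8") if isinstance(value, str) else bytes(value)


def load_content(db: sqlite3.Connection, data_id: int, allow_multi: bool = True) -> bytes:
    """Return the uncompressed content of one DATA row."""
    row = db.execute(
        "SELECT calg, content FROM data WHERE id=?", (data_id,)
    ).fetchone()
    if row is None:
        raise KeyError(f"no DATA row with id={data_id}")
    calg, content = row
    if calg == CALG_NONE:
        return as_bytes(content)
    if calg == CALG_ZLIB:
        # Assumption: the content is a standard zlib stream.
        return zlib.decompress(as_bytes(content))
    if calg == CALG_MULTI_BLOB:
        if not allow_multi:
            raise ValueError("multi-blob entries may not be nested")
        part_ids = json.loads(content)  # JSON array of DATA.ID values
        return b"".join(load_content(db, pid, allow_multi=False) for pid in part_ids)
    raise ValueError(f"unsupported compression method {calg}")


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE data(id INTEGER PRIMARY KEY, dclass INT, sz INT, "
    "calg INT, cref INT, content ANY)"
)
db.execute("INSERT INTO data VALUES(1, 1, 11, 2, NULL, '[2, 3]')")
db.execute("INSERT INTO data VALUES(2, 1, 6, 0, NULL, ?)", (b"hello ",))
db.execute("INSERT INTO data VALUES(3, 1, 5, 1, NULL, ?)", (zlib.compress(b"world"),))
```

Here `load_content(db, 1)` joins rows 2 and 3 in the order given by the JSON array. A secondary row that is itself multi-blob is rejected, which mirrors the no-nesting rule.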
344
+The DATA.ID field provides an integer identifier for files and
345
+check-ins. The scope of that name is the single VCCP message
346
+in which the DATA table entry appears, however. The NAME table
347
+is used to provide a mapping from these internal integer names
348
+to the persistent global hash names of the various version
349
+control systems.
350
+
351
+A single object can have different names, depending on which
352
+version control system stores it. For this reason, the NAME
353
+table is designed to allow storage of multiple names for the
354
+same object. If NAME.NAMETYPE is 0, that means that the name
355
+is appropriate for use on the client. If NAME.NAMETYPE is 1,
356
+that means the name is appropriate for use on the server.
357
+
358
+To simplify the implementation of VCCP on diverse systems,
359
+names should be sent as text. If the names for a particular system
360
+are binary hashes, then the NAME table should store them as
361
+the hexadecimal representation.
362
+
363
+#### 3.4.1 NAME Table Example 1
364
+
365
+Suppose a client is pushing a new check-in to the server and the
366
+check-in text is stored in the DATA.ID=1 row. Then the request
367
+should contain a NAME table row with NAME.NAMEID=1 (to match the
368
+DATA table ID value) and NAME.NAMETYPE=0 (because client names
369
+have NAMETYPE 0) and with the name of that check-in according to
370
+the client stored in NAME.NAME. The server will recode the
371
+check-in according to its own format, and store the server-side
372
+name in a new NAME table row with NAME.NAMEID=1 and NAME.NAMETYPE=1.
373
+The server then includes the complete NAME table in its reply
374
+back to the client. In this way, the client is able to discover
375
+the name of the check-in on the server. The server can also
376
+remember the client check-in name, if desired.
377
+
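The exchange in this example can be sketched in Python as follows. A single in-memory database stands in for both the request and the reply, the two hexadecimal names are made up, and `name_map` is a hypothetical helper rather than part of the protocol.

```python
import sqlite3

CLIENT_NAME, SERVER_NAME = 0, 1  # NAME.NAMETYPE values from the text above

msg = sqlite3.connect(":memory:")
msg.executescript("""
CREATE TABLE name(
  nameid INT,
  nametype INT,
  name TEXT,
  PRIMARY KEY(nameid,nametype)
) WITHOUT ROWID;
""")

# Client: the check-in stored in DATA.ID=1 has this (made-up) client-side name.
msg.execute("INSERT INTO name VALUES(1, ?, ?)", (CLIENT_NAME, "aaaa1111"))

# Server: after recoding the check-in, record the (made-up) server-side name.
msg.execute("INSERT INTO name VALUES(1, ?, ?)", (SERVER_NAME, "bbbb2222"))


def name_map(db: sqlite3.Connection) -> dict:
    """Map each client-side name to the matching server-side name."""
    rows = db.execute(
        "SELECT c.name, s.name FROM name AS c JOIN name AS s "
        "ON c.nameid = s.nameid "
        "WHERE c.nametype = ? AND s.nametype = ?",
        (CLIENT_NAME, SERVER_NAME),
    ).fetchall()
    return dict(rows)
```

Once the reply arrives, the client can merge `name_map(msg)` into its persistent name map.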
378
+### 3.5 Check-in JSON Format
379
+
380
+Check-ins are described by DATA table rows where the content is a
381
+single JSON object, as follows:
382
+
383
+>
384
+ {
385
+ "time": DATETIME, -- Date and time of the check-in
386
+ "comment": TEXT, -- The original check-in comment
387
+ "mimetype": TEXT, -- The mimetype of the comment text
388
+ "branch": TEXT, -- Branch this check-in belongs to
389
+ "from": INT, -- NAME.NAMEID for the primary parent
390
+ "merge": [INT], -- Merge parents
391
+ "cherrypick": [INT], -- Cherrypick merges
392
+ "author": { -- Author of the change
393
+ "name": TEXT, -- Name or handle
394
+ "email": TEXT, -- Email address
395
+ "time": DATETIME -- Override for $.time
396
+ },
397
+ "committer": { -- Committer of the change
398
+ "name": TEXT, -- Name or handle
399
+ "email": TEXT, -- Email address
400
+ "time": DATETIME -- Override for $.time
401
+ },
402
+ "tag": [{ -- Tags and properties for this check-in
403
+ "name": TEXT, -- tag name
404
+ "value": TEXT, -- value (if it is a property)
405
+ "delete": 1, -- If present, delete this tag
406
+ "propagate": 1 -- Means propagate to descendants
407
+ }],
408
+ "reset": 1, -- All files included, not just changes
409
+ "file": [{ -- File in this check-in
410
+ "fname": TEXT, -- filename
411
+ "id": INT, -- DATA.ID or NAME.NAMEID. Omitted to delete
412
+ "mode": TEXT, -- "x" for executable. "l" for symlink
413
+ "oldname": TEXT -- Prior name if the file is renamed
414
+ }]
415
+ }
416
+
417
+The $.time element defines the moment in time when the check-in
418
+occurred. The $.time field is required. Times are always Coordinated
419
+Universal Time (UTC). DATETIME can be represented in multiple ways:
420
+
421
+ 1. If the DATETIME is an integer, then it is the number of seconds
422
+ since 1970 (also known as "unix time").
423
+
424
+ 2. If the DATETIME is text, then it is ISO8601 as follows:
425
+ "YYYY-MM-DD HH:MM:SS.SSS". The fractional seconds may be
426
+ omitted.
427
+
428
+ 3. If the DATETIME is a real number, then it is the fractional
429
+ julian day number.
430
+
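The three DATETIME representations can be normalized to a single type. The sketch below assumes the value has already been decoded from JSON, so integers arrive as `int`, real numbers as `float`, and text as `str`. The function name `parse_datetime` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
JULIAN_DAY_OF_UNIX_EPOCH = 2440587.5  # Julian day number of 1970-01-01 00:00 UTC


def parse_datetime(value) -> datetime:
    """Convert a VCCP DATETIME (int, str, or float) to an aware UTC datetime."""
    if isinstance(value, bool):  # bool is a subclass of int in Python
        raise TypeError("booleans are not valid DATETIME values")
    if isinstance(value, int):  # seconds since 1970 ("unix time")
        return UNIX_EPOCH + timedelta(seconds=value)
    if isinstance(value, float):  # fractional Julian day number
        return UNIX_EPOCH + timedelta(days=value - JULIAN_DAY_OF_UNIX_EPOCH)
    if isinstance(value, str):  # "YYYY-MM-DD HH:MM:SS" with optional ".SSS"
        fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in value else "%Y-%m-%d %H:%M:%S"
        return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
    raise TypeError(f"unsupported DATETIME value: {value!r}")
```

For instance, `parse_datetime(0)` and `parse_datetime(2440587.5)` both yield midnight UTC on 1970-01-01.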
431
+The $.comment element is the check-in comment. The $.comment field is
432
+required. The mimetype for $.comment defaults to "text/plain" but can
433
+be some other MIME-type if the $.mimetype field is present.
434
+
435
+The $.branch element defines the name of the branch that this check-in
436
+belongs to. If omitted, the branch of the check-in is the same as
437
+the branch of its primary parent check-in.
438
+
439
+The $.from element defines the primary parent check-in. Every
440
+check-in other than the first check-in of the project has a primary
441
+parent. The integer value of the $.from element is either the
442
+DATA.ID value for another check-in in the same VCCP message or is
443
+the NAME.NAMEID value for a NAME table entry that identifies the
444
+parent check-in, or both. If the information sender is relying on the
445
+other side to do name mapping, then only the local name will be provided.
446
+But if the information sender has a name map, it should provide both
447
+its local name and the remote name for the check-in, so that the receiver
448
+can update its name map.
449
+
450
+The $.merge element is an array of integers for additional check-ins
451
+that are merged into the current check-in. The $.cherrypick element
452
+is an array of integer values that are check-ins that are cherrypick-merged
453
+into the current check-in. Systems that do not record cherrypick merges
454
+can ignore the $.cherrypick value.
455
+
456
+The $.author and $.committer elements define who created the check-in.
457
+The $.committer element is required. The $.author element may be omitted
458
+in the common case where the author and committer are the same. The
459
+$.committer.time and $.author.time subelements should only be included
460
+if they are different from $.time.
461
+
462
+The $.reset element, if present, should have an integer value of "1".
463
+The presence of the $.reset element is a flag that affects the meaning
464
+of the $.file element.
465
+
466
+The $.file element is an array of JSON objects that define the files
467
+associated with the check-in. If the $.reset flag is present, then there
468
+must be one entry in $.file for every file in the check-in. If the
469
+$.reset flag is omitted (the common case) then there is one entry
470
+in $.file for every file that changes relative to the primary parent
471
+in $.from. If there is no primary parent, then the presence of the
472
+$.reset flag is assumed even if it is omitted.
473
+
474
+The $.file[].fname element is the name of the file.
475
+The $.file[].id element corresponds to a DATA.ID or NAME.NAMEID
476
+that is the content of the file. If the file is being removed
477
+by this check-in, then the $.file[].id element is omitted.
478
+The $.file[].mode element is text containing one or more ASCII
479
+characters. If the "x" character is included in $.file[].mode
480
+then the file is executable. If the "l" character is included
481
+in $.file[].mode then the file is a symbolic link (and the content
482
+of the file is the target of the link). The $.file[].mode may
483
+be blank or omitted for a normal read/write file. If a file
484
+is being renamed, the $.file[].oldname field may be included
485
+to show the previous name of the file, if that information is
486
+available.
487
+
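To make the interaction between $.reset, $.from, and $.file concrete, here is a sketch of how a receiver might compute the full file list of a check-in from that of its primary parent. The function `apply_checkin` and the dictionary layout are hypothetical, and treating $.file[].oldname as "drop the old name" is an assumption about rename handling that the text does not spell out.

```python
def apply_checkin(parent_files: dict, checkin: dict) -> dict:
    """Return {filename: (content id, mode)} for a check-in.

    `parent_files` is the same mapping for the primary parent, or an
    empty dict when there is no primary parent.
    """
    # $.reset is implied when there is no primary parent ($.from absent).
    reset = "reset" in checkin or "from" not in checkin
    files = {} if reset else dict(parent_files)
    for entry in checkin.get("file", []):
        old = entry.get("oldname")
        if old is not None:
            files.pop(old, None)  # assumption: a rename drops the old name
        if "id" in entry:
            files[entry["fname"]] = (entry["id"], entry.get("mode", ""))
        else:
            files.pop(entry["fname"], None)  # no id: file removed by this check-in
    return files


# Illustrative data; the ids are arbitrary integers.
parent = {"README": (10, ""), "run.sh": (11, "x"), "old.c": (12, "")}
checkin = {
    "time": 0,
    "comment": "demo",
    "from": 5,
    "file": [
        {"fname": "README", "id": 20},                    # changed content
        {"fname": "run.sh"},                              # removed
        {"fname": "new.c", "id": 12, "oldname": "old.c"}, # renamed
    ],
}
result = apply_checkin(parent, checkin)
```

With the sample data, `result` holds only `README` (with the new content id) and `new.c`.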
488
+Some version control systems allow tags and properties to be
489
+associated with a check-in. The $.tag element supports this
490
+feature. Each element of the $.tag array is a separate tag
491
+or property. If the $.tag[].propagate field exists and has
492
+a value of "1", then the tag/property propagates to all
493
+non-merge children. If the $.tag[].delete field exists and
494
+has a value of "1", then a propagating tag or property with
495
+the given name that was set by some ancestor check-in is
496
+stopped and omitted from this check-in. Version control
497
+systems that do not support tags and/or properties on check-ins
498
+or that do not support tag propagation can ignore all of these
499
+attributes.
126500
--- www/vccp/intro.md
+++ www/vccp/intro.md
@@ -1,11 +1,11 @@
1 Version Control Collaboration Protocol
2 ======================================
3
4 <blockquote><center style='background: yellow; border: 1px solid black;'>
5 This document is a work in progress.<br>
6 The last update was on 2019-03-13.<br>
7 Check back later for updates.
8 </center></blockquote>
9
10 1.0 Introduction
11 ----------------
@@ -12,22 +12,22 @@
12
13 The Version Control Collaboration Protocol or VCCP is an attempt to make
14 it easier for developers to collaborate even when they are using different
15 version control systems.
16
17 For example, suppose Alice, the founder and principal maintainer
 
18 for the fictional "BambooCoffee" project, prefers using the
19 [Mercurial](https://www.mercurial-scm.org/) version control system,
20 but two of her clients, Bob and Cindy, know nothing but
21 [Git](https://www.git-scm.org/) and steadfastly refuse to
22 type any command that begins with "hg".
23 Further suppose that an important
24 collaborator, Dave, really prefers [Bazaar](bazaar.canonical.com/).
25 The VCCP is designed to make it relatively easy and painless
26 for Alice to set up Git and Bazaar mirrors of her Mercurial
27 repository so that Bob, Cindy, and Dave can all use the tools
28 they are most familiar with.
29
30 <center>![](diagram-1.jpg)</center>
31
32 Assuming all the servers speak VCCP (which is not the case at the
33 time of this writing, but we hope to encourage that for the future)
@@ -38,11 +38,11 @@
38 ### 1.1 Bidirectional Collaboration
39
40 The diagram above shows that all changes originate from Alice and
41 that Bob, Cindy, and Dave are only consumers. If Cindy wanted to
42 make a change to BambooCoffee, she would have to do that with a backchannel,
43 such as sending a patch via email to Alice and asking Alice to check
44 in the change.
45
46 But VCCP also supports bidirectional collaboration.
47
48 <center>![](diagram-2.jpg)</center>
@@ -55,26 +55,27 @@
55 VCCP message back to Truth containing Cindy's changes. Truth would then
56 relay those changes over to Mirror-2 where Dave could see them as well.
57
58 ### 1.2 Client-Mirror versus Server-Mirror
59
60 VCCP allows the mirrors to be set up as either clients or servers.
61
62 In the client-mirror approach, the mirrors periodically poll Truth asking
63 for changes. In the server-mirror approach, Truth sends changes to the
64 mirrors as they occur.
65
66 In the first example above, the implication was that the server-mirror
67 approach was being used. The Truth repository would take the initiative
68 to send changes to the mirrors. But it does not have to be that way.
69 Suppose Dave is unknown to Alice. Suppose he just likes Alice's work and
70 wants to keep his own mirror of her work for his own convenience.
71 Dave could set up
72 Mirror-2 as a client-mirror that periodically polls Truth for changes.
73
74 In the second example above, Truth and Mirror-1 could be configured to
75 have a Peer-to-Peer relationship rather than a Truth-to-Mirror relationship.
76 When new content arrives at Truth (because Alice did an "hg commit"),
77 Truth acts as a client to initiate a transfer of that new information
78 over to Mirror-1. When new content originates at Mirror-1 (because
79 Cindy did "git commit") then Mirror-1 acts as a client to send the new
80 content over to Truth. Or, they could set it up so that Truth is always
81 the client and it periodically polls Mirror-1 looking for new content
@@ -81,10 +82,39 @@
82 coming from Cindy. Or, they could set it up so that Mirror-1 is always
83 the client and it periodically polls Truth looking for changes from Alice.
84
85 The point is that VCCP works in all of these scenarios.
86
87 ### 1.3 Name Mapping
88
89 Different version control systems use different names to refer to the same
90 object. For example, Fossil names files using a SHA3-256 hash of the
91 unmodified file content, whereas Git uses a hardened-SHA1 hash of the file
92 content with an added prefix. Mercurial, Monotone, Bazaar, and others all
93 use different naming schemes, so that the same check-in in any particular
94 version control system will have a different name in all other version
95 control systems.
96
97 When mirroring a project between two version control systems, somebody
98 needs to keep track of the mapping between names.
99
100 For example, in the second diagram above, if Mirror-1 wants to tell Truth
101 that it has a new check-in "Q" that is a child of "P", then it has to send
102 the name of check-in "P". Does it send the Git-name of "P" or the
103 Mercurial-name of "P"? If Mirror-1 sends Truth the Git-name of "P" then
104 Truth must be the system that does the name mapping. If Mirror-1 sends
105 Truth the Mercurial-name of "P", then Mirror-1 is the system that maintains
106 the mapping.
107
108 The VCCP is designed such that both names for a
109 particular check-in or file can be sent. One of the collaborating systems
110 must still take responsibility for translating the names, but it does not
111 matter which system. As long as one or the other of the two systems
112 maintains a name mapping, the collaboration will work. Of course, it
113 also works for both systems to maintain the name map, and for maximum
114 flexibility, perhaps that should be the preferred approach.
115
116 2.0 Minimum Requirements
117 ------------------------
118
119 The VCCP is modeled after the Git fast-export and fast-import protocol.
120 That is to say, VCCP thinks in terms of "check-ins" with each check-in
@@ -101,11 +131,11 @@
131 a parent, but all the others should. Check-ins may also identify
132 zero or more "merge" parents, and zero or more "cherrypick" ancestors.
133 But the merges and cherrypicks can be ignored on systems that do not
134 support those concepts.
135
136 VCCP assumes that every distinct version of a file and every check-in has
137 a unique name. In Git and Mercurial, those names are SHA1 hashes
138 (computed in different ways). Fossil uses SHA3-256 hashes. I'm not sure
139 what Bazaar uses. VCCP does not care how the names are derived, as long
140 as they always uniquely identify the file or check-in.
141
@@ -121,5 +151,349 @@
151
152 The VCCP is a client-server protocol.
153 A client formats a VCCP message and sends it to the server.
154 The server acts upon that message, formulates a reply, and sends
155 the reply back to the client.
156
157 It does not matter what transport mechanism is used to send the VCCP
158 messages from client to server and back again.
159 But for maximum flexibility, it is suggested that HTTP (or HTTPS) be
160 used. The client sends an HTTP request to the server with the
161 VCCP message as the request content and a MIME-type of "application/x-vccp".
162 The HTTP response is another VCCP message with the same MIME-type.
163 The use of HTTP means that firewalls and proxies are not an
164 impediment to collaboration and that collaboration connection information
165 can be described by a simple URL.
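
The HTTP transport described above can be sketched as follows. This is only an illustration, not part of the specification: the use of the POST method, the helper names `build_vccp_request` and `send_vccp`, and the example URL are all assumptions. The only detail taken from the text is the "application/x-vccp" MIME-type.

```python
import urllib.request

# MIME-type named in the text above.
VCCP_MIME_TYPE = "application/x-vccp"


def build_vccp_request(url: str, message: bytes) -> urllib.request.Request:
    """Wrap the raw bytes of a VCCP message in an HTTP request.

    Assumption: POST is used. The text only says that the message is
    sent as the request content.
    """
    return urllib.request.Request(
        url,
        data=message,
        headers={"Content-Type": VCCP_MIME_TYPE},
        method="POST",
    )


def send_vccp(url: str, message: bytes) -> bytes:
    """Send one VCCP message and return the reply message bytes."""
    with urllib.request.urlopen(build_vccp_request(url, message)) as reply:
        return reply.read()
```

A client would call something like `send_vccp("https://example.org/vccp", message_bytes)`, where the URL is a placeholder for whatever address the two collaborating systems agree on.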
166
167 There are provisions in the VCCP design to allow authentication
168 in the body of the VCCP message itself. Or, two systems can, by
169 mutual agreement, authenticate by some external mechanism.
170
171 ### 3.1 Message Content
172
173 A single VCCP message round-trip can be a "push" if the client is sending
174 new check-in information to the server, or it can be a "pull" if the
175 client is polling the server to see if new check-in information is available
176 for download, or it can be both at once.
177
178 The basic design of a VCCP message is inspired by the Git fast-export
179 protocol, but with enhancements to support incremental updates and
180 bidirectional updates and to make the message format more robust and
181 portable and simpler to generate and parse. A single message may contain
182 multiple "files", check-in descriptions that reference those files, and "tag"
183 descriptions. A "message description" section contains authentication
184 data, error codes, and other meta-data. Every request and every
185 reply contains, at a minimum, a message description.
186
187 For a push, the request contains a message description with
188 authentication information, and the new files, check-ins, and tags
189 that are being pushed to the server. The reply to a push contains
190 success codes, and the names that the server assigned to the new objects,
191 so that the client can maintain a name map.
192
193 For pull, the request contains only a message description with
194 authentication information and a description of what content the
195 client desires to pull.
196 The reply to a pull contains the files, check-ins, and tags requested.
197
198 For a pull request, there is no mechanism (currently defined) for the
199 server to learn the client-side names for files and check-ins. Hence,
200 for a collaboration arrangement where the client polls the server for
201 updates, the client must maintain the name map.
202
203 ### 3.2 Message Format Overview
204
205 The format of a VCCP message is an ordinary SQLite database file with
206 a two-table schema.
207 The DATA table contains file, check-in, and tag content and the
208 message description. The DATA.CONTENT column contains either raw
209 file content or check-in and tag descriptions formatted as JSON.
210 The message description is also JSON contained in a specially
211 designated row of the DATA table. The NAME table of the schema
212 is used to transmit name mappings. The NAME table serves the same
213 role as the "marks" file of git-fast-export.
214
215 ### 3.3 Why Use A Database As The Message Format?
216
217 Why does a VCCP message consist of an SQLite database instead of a
218 bespoke format like git-fast-export?
219
220 1. Some of the content to be transferred will typically be binary.
221 Most projects have at least a few images or other binary files
222 in their tree somewhere. Other files will be pure text. Check-in
223 and tag descriptions will also be pure text (JSON). That means
224 that the VCCP message will be a mix of text and binary content.
225 An SQLite database file is a convenient and efficient way
226 to encapsulate both binary and text content into a single container
227 which is easily created and accessed.
228
229 2. Robust, cross-platform libraries for reading and writing SQLite database
230 files already exist on every computer. No custom parser or generator
231 code needs to be written, debugged, managed, or maintained.
232
233 3. The SQLite database file format is well defined, cross-platform
234 (32-bit, 64-bit, big-endian, and little-endian) and is endorsed
235 by the US Library of Congress as a recommended file format for
236 archival data storage.
237
238 4. Unlike a serial format (such as git-fast-export) which must
239 normally be written and read sequentially from beginning to end,
240 elements of an SQLite database can be constructed and read in any
241 order. This gives extra implementation flexibility to both readers
242 and writers.
243
244 ### 3.4 Database Schema
245
246 The database schema for a VCCP message is as follows:
247
248 >
249 CREATE TABLE data(
250 id INTEGER PRIMARY KEY,
251 dclass INT,
252 sz INT,
253 calg INT,
254 cref INT,
255 content ANY
256 );
257 CREATE TABLE name(
258 nameid INT,
259 nametype INT,
260 name TEXT,
261 PRIMARY KEY(nameid,nametype)
262 ) WITHOUT ROWID;
263
264 The DATA table holds the message description, the content of files, and JSON
265 descriptions of check-ins and tags. The NAME table is used to transmit
266 names. The DATA table corresponds to the body of a git-fast-export stream
267 and the NAME table corresponds to the "marks" file that is read and
268 written by the "--import-marks" and "--export-marks" options of the
269 "git fast-export" command.
270
271 Each file, check-in, and tag is normally a single distinct entry in
272 the DATA table. (Exception: very large files, greater than 1GB in size,
273 can be split across multiple DATA table rows - see below.) Entries in
274 the DATA table can occur in any order. It is not required that files
275 referenced by check-ins have a smaller DATA.ID value, for example.
276 Free ordering does not impede data extraction (see the algorithm descriptions
277 below) but it does give considerable freedom to the message generator
278 logic.
279
280 Each DATA row has a class identified by a small integer in the DATA.DCLASS
281 column.
282
283 >
284 | 0: | A check-in |
285 | 1: | A file |
286 | 2: | A tag |
287 | 3: | The VCCP message description |
288 | 4: | Application-defined-1 |
289 | 5: | Application-defined-2 |
290
291 Every well-formed VCCP message has exactly one message description entry
292 with DATA.ID=0 and DATA.DCLASS=3. No other DATA table entries should have
293 DATA.DCLASS=3.
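
As a rough sketch of the rules above, the following Python creates an empty message using the schema from this section and checks the "exactly one message description" rule. The function names `new_message` and `is_well_formed` are hypothetical, and an empty JSON object stands in for the message description because its fields are not defined in this part of the document.

```python
import json
import sqlite3

# Schema copied from section 3.4.
SCHEMA = """
CREATE TABLE data(
  id INTEGER PRIMARY KEY,
  dclass INT,
  sz INT,
  calg INT,
  cref INT,
  content ANY
);
CREATE TABLE name(
  nameid INT,
  nametype INT,
  name TEXT,
  PRIMARY KEY(nameid,nametype)
) WITHOUT ROWID;
"""

# DATA.DCLASS values from the table above.
DCLASS_CHECKIN, DCLASS_FILE, DCLASS_TAG, DCLASS_DESCRIPTION = 0, 1, 2, 3


def new_message(description=None):
    """Create an in-memory VCCP message holding only a message description.

    The description defaults to an empty JSON object, which is a
    placeholder and not a format defined by the spec.
    """
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    text = json.dumps(description if description is not None else {})
    db.execute(
        "INSERT INTO data(id, dclass, sz, calg, cref, content) "
        "VALUES(0, ?, ?, 0, NULL, ?)",
        (DCLASS_DESCRIPTION, len(text.encode("utf-8")), text),
    )
    return db


def is_well_formed(db):
    """True if exactly one row has DCLASS=3 and that row has ID=0."""
    rows = db.execute(
        "SELECT id FROM data WHERE dclass = ?", (DCLASS_DESCRIPTION,)
    ).fetchall()
    return rows == [(0,)]
```

Files, check-ins, and tags would then be added as further DATA rows with any unused DATA.ID, in any order.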
294
295 The application-defined values are reserved for extended uses of the
296 VCCP message format. In particular, there are plans to enhance
297 Fossil so that it uses VCCP as its sync protocol, replacing its
298 current bespoke protocol. But Fossil needs to send information about other
299 kinds of objects, such as wiki pages and tickets, that are not known
300 to Git and most other version control systems. A few
301 "application defined" values are available at strategic points in
302 the message format description to accommodate these extended use cases.
303 New application-defined values may be defined in the future.
304 Portable VCCP messages between different version control systems
305 should never use the application-defined values.
306
307 The DATA.CONTENT field can be either text or binary, as appropriate.
308 For files, the DATA.CONTENT is binary. For check-ins and tags and for
309 the message description, the DATA.CONTENT is a text JSON object.
310
311 The DATA.CONTENT field can optionally be compressed. The DATA.SZ field
312 is the uncompressed size of the content in bytes. The compression method
313 is determined by the DATA.CALG field:
314
315 >
316 | 0: | No compression |
317 | 1: | ZLib compression |
318 | 2: | Multi-blob |
319 | 3: | Application-defined-1 |
320 | 4: | Application-defined-2 |
321
322 The "multi-blob" compression method means that the content is the
323 concatenation of the content in other DATA table rows. This
324 allows for content that exceeds the 1GB size limit for an SQLite
325 BLOB column. If the DATA.CALG field is 2, then DATA.CONTENT will
326 be a JSON array of integer values, where each integer is the DATA.ID
327 of another DATA table entry that contains part of the content.
328 The actual data content is the concatenation of the other DATA table
329 entries. The secondary DATA table entries can also be compressed,
330 though not with multi-blob. In other words, the multi-blob
331 compression method may not be nested. This effectively limits the
332 maximum size of a file in the VCCP to the maximum size of an SQLite
333 database, which is 140 terabytes.
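
One way a receiver might recover the content of a DATA row under the three portable DATA.CALG methods is sketched below. The function name `read_content` is hypothetical, and the sketch assumes that "ZLib compression" means a plain zlib stream as produced by `zlib.compress`; the text does not spell out the exact framing.

```python
import json
import sqlite3
import zlib


def read_content(db, data_id, _inside_multiblob=False):
    """Return the uncompressed content of one DATA row as bytes."""
    row = db.execute(
        "SELECT sz, calg, content FROM data WHERE id = ?", (data_id,)
    ).fetchone()
    if row is None:
        raise KeyError(data_id)
    sz, calg, content = row
    if isinstance(content, str):
        content = content.encode("utf-8")  # JSON text is stored as TEXT
    if calg == 0:  # no compression
        out = content
    elif calg == 1:  # assumed to be a plain zlib stream
        out = zlib.decompress(content)
    elif calg == 2:  # multi-blob: JSON array of DATA.ID values
        if _inside_multiblob:
            raise ValueError("multi-blob entries may not be nested")
        out = b"".join(
            read_content(db, int(part), _inside_multiblob=True)
            for part in json.loads(content)
        )
    else:
        raise ValueError("non-portable compression method %r" % (calg,))
    if sz is not None and len(out) != sz:
        raise ValueError("DATA.SZ does not match the uncompressed size")
    return out
```

The parts of a multi-blob entry are read with the same function, so each part may itself be uncompressed or zlib-compressed, but a part that is again multi-blob is rejected.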
334
335 Portable VCCP files should only use compression methods 0, 1, and 2,
336 and preferably only method 0 (no compression). But application-defined
337 compression methods are available for proprietary uses of the
338 VCCP message format. The DATA.CREF field is auxiliary data intended
339 for use with these application-defined compression methods. In
340 particular, DATA.CREF is intended to be the DATA.ID of a "base"
341 entry for delta-compression methods. For a portable VCCP file,
342 the DATA.CREF field should always be NULL.
343
344 The DATA.ID field provides an integer identifier for files and
345 check-ins. The scope of that name is the single VCCP message
346 in which the DATA table entry appears, however. The NAME table
347 is used to provide a mapping from these internal integer names
348 to the persistent global hash names of the various version
349 control systems.
350
351 A single object can have different names, depending on which
352 version control system stores it. For this reason, the NAME
353 table is designed to allow storage of multiple names for the
354 same object. If NAME.NAMETYPE is 0, that means that the name
355 is appropriate for use on the client. If NAME.NAMETYPE is 1,
356 that means the name is appropriate for use on the server.
357
358 To simplify the implementation of VCCP on diverse systems,
359 names should be sent as text. If the names for a particular system
360 are binary hashes, then the NAME table should store them as
361 the hexadecimal representation.
362
363 #### 3.4.1 NAME Table Example 1
364
365 Suppose a client is pushing a new check-in to the server and the
366 check-in text is stored in the DATA.ID=1 row. Then the request
367 should contain a NAME table row with NAME.NAMEID=1 (to match the
368 DATA table ID value) and NAME.NAMETYPE=0 (because client names
369 have NAMETYPE 0) and with the name of that check-in according to
370 the client stored in NAME.NAME. The server will recode the
371 check-in according to its own format, and store the server-side
372 name in a new NAME table row with NAME.NAMEID=1 and NAME.NAMETYPE=1.
373 The server then includes the complete NAME table in its reply
374 back to the client. In this way, the client is able to discover
375 the name of the check-in on the server. The server can also
376 remember the client check-in name, if desired.
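
The exchange in this example can be sketched in Python as shown below. The helper names and the placeholder name strings are made up for illustration; real names would be the hexadecimal hashes used by each version control system.

```python
import sqlite3

# NAME.NAMETYPE values from the text above.
CLIENT, SERVER = 0, 1


def open_name_table():
    """Create an in-memory database holding only the NAME table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE name(nameid INT, nametype INT, name TEXT, "
        "PRIMARY KEY(nameid,nametype)) WITHOUT ROWID"
    )
    return db


def record_name(db, nameid, nametype, name):
    """Add one name for the object whose DATA.ID equals nameid."""
    db.execute(
        "INSERT INTO name(nameid, nametype, name) VALUES(?, ?, ?)",
        (nameid, nametype, name),
    )


def name_pairs(db):
    """Return (client_name, server_name) for objects that have both."""
    return db.execute(
        "SELECT c.name, s.name FROM name AS c "
        "JOIN name AS s ON s.nameid = c.nameid "
        "WHERE c.nametype = 0 AND s.nametype = 1 "
        "ORDER BY c.nameid"
    ).fetchall()


db = open_name_table()
# Request: the client names the check-in stored in the DATA.ID=1 row.
record_name(db, 1, CLIENT, "client-name-of-checkin")
# Reply: the server adds its own name for the same check-in.
record_name(db, 1, SERVER, "server-name-of-checkin")
```

After the reply arrives, `name_pairs(db)` gives the client the rows it needs to extend its name map.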
377
378 ### 3.5 Check-in JSON Format
379
380 Check-ins are described by DATA table rows where the content is a
381 single JSON object, as follows:
382
383 >
384 {
385 "time": DATETIME, -- Date and time of the check-in
386 "comment": TEXT, -- The original check-in comment
387 "mimetype": TEXT, -- The mimetype of the comment text
388 "branch": TEXT, -- Branch this check-in belongs to
389 "from": INT, -- NAME.NAMEID for the primary parent
390 "merge": [INT], -- Merge parents
391 "cherrypick": [INT], -- Cherrypick merges
392 "author": { -- Author of the change
393 "name": TEXT, -- Name or handle
394 "email": TEXT, -- Email address
395 "time": DATETIME -- Override for $.time
396 },
397 "committer": { -- Committer of the change
398 "name": TEXT, -- Name or handle
399 "email": TEXT, -- Email address
400 "time": DATETIME -- Override for $.time
401 },
402 "tag": [{ -- Tags and properties for this check-in
403 "name": TEXT, -- tag name
404 "value": TEXT, -- value (if it is a property)
405 "delete": 1, -- If present, delete this tag
406 "propagate": 1 -- Means propagate to descendants
407 }],
408 "reset": 1, -- All files included, not just changes
409 "file": [{ -- File in this check-in
410 "fname": TEXT, -- filename
411 "id": INT, -- DATA.ID or NAME.NAMEID. Omitted to delete
412 "mode": TEXT, -- "x" for executable. "l" for symlink
413 "oldname": TEXT -- Prior name if the file is renamed
414 }]
415 }
416
417 The $.time element defines the moment in time when the check-in
418 occurred. The $.time field is required. Times are always Coordinated
419 Universal Time (UTC). DATETIME can be represented in multiple ways:
420
421 1. If the DATETIME is an integer, then it is the number of seconds
422 since 1970 (also known as "unix time").
423
424 2. If the DATETIME is text, then it is ISO8601 as follows:
425 "YYYY-MM-DD HH:MM:SS.SSS". The fractional seconds may be
426 omitted.
427
428 3. If the DATETIME is a real number, then it is the fractional
429 julian day number.
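
A receiver could normalize the three DATETIME representations roughly as follows. The function name `parse_datetime` is hypothetical. The sketch relies on the JSON parser keeping integers and real numbers apart, as Python's `json` module does, and on the fact that the Unix epoch falls on julian day 2440587.5.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
JULIAN_DAY_OF_UNIX_EPOCH = 2440587.5


def parse_datetime(value):
    """Convert a VCCP DATETIME value to a timezone-aware UTC datetime."""
    if isinstance(value, bool):
        raise TypeError("DATETIME must not be a boolean")
    if isinstance(value, int):  # seconds since 1970
        return UNIX_EPOCH + timedelta(seconds=value)
    if isinstance(value, float):  # fractional julian day number
        return UNIX_EPOCH + timedelta(days=value - JULIAN_DAY_OF_UNIX_EPOCH)
    if isinstance(value, str):  # "YYYY-MM-DD HH:MM:SS" with optional ".SSS"
        fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in value else "%Y-%m-%d %H:%M:%S"
        return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
    raise TypeError("unsupported DATETIME value: %r" % (value,))
```

For instance, `parse_datetime(1552491240)` and `parse_datetime("2019-03-13 15:34:00")` refer to the same moment.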
430
431 The $.comment element is the check-in comment. The $.comment field is
432 required. The mimetype for $.comment defaults to "text/plain" but can
433 be some other MIME-type if the $.mimetype field is present.
434
435 The $.branch element defines the name of the branch that this check-in
436 belongs to. If omitted, the branch of the check-in is the same as
437 the branch of its primary parent check-in.
438
439 The $.from element defines the primary parent check-in. Every
440 check-in other than the first check-in of the project has a primary
441 parent. The integer value of the $.from element is either the
442 DATA.ID value for another check-in in the same VCCP message or is
443 the NAME.NAMEID value for a NAME table entry that identifies the
444 parent check-in, or both. If the information sender is relying on the
445 other side to do name mapping, then only the local name will be provided.
446 But if the information sender has a name map, it should provide both
447 its local name and the remote name for the check-in, so that the receiver
448 can update its name map.
449
450 The $.merge element is an array of integers for additional check-ins
451 that are merged into the current check-in. The $.cherrypick element
452 is an array of integer values that are check-ins that are cherrypick-merged
453 into the current check-in. Systems that do not record cherrypick merges
454 can ignore the $.cherrypick value.
455
456 The $.author and $.committer elements define who created the check-in.
457 The $.committer element is required. The $.author element may be omitted
458 in the common case where the author and committer are the same. The
459 $.committer.time and $.author.time subelements should only be included
460 if they are different from $.time.
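
The required-field and fallback rules stated so far could be checked along these lines. The helper names are hypothetical, and the sample check-in uses made-up values.

```python
# Fields the text above marks as required.
REQUIRED_FIELDS = ("time", "comment", "committer")


def missing_required(checkin):
    """List the required top-level fields absent from a check-in object."""
    return [key for key in REQUIRED_FIELDS if key not in checkin]


def effective_author(checkin):
    """$.author may be omitted when it is the same as $.committer."""
    return checkin.get("author", checkin.get("committer"))


def effective_time(checkin, who):
    """Return $.<who>.time if present, otherwise fall back to $.time."""
    person = checkin.get(who) or {}
    return person.get("time", checkin["time"])


sample = {
    "time": 1552491240,
    "comment": "Fix a typo in the introduction",
    "committer": {"name": "alice", "email": "alice@example.com"},
}
```

Here `missing_required(sample)` is empty, `effective_author(sample)` is the committer, and `effective_time(sample, "author")` falls back to `$.time`.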
461
462 The $.reset element, if present, should have an integer value of "1".
463 The presence of the $.reset element is a flag that affects the meaning
464 of the $.file element.
465
466 The $.file element is an array of JSON objects that define the files
467 associated with the check-in. If the $.reset flag is present, then there
468 must be one entry in $.file for every file in the check-in. If the
469 $.reset flag is omitted (the common case) then there is one entry
470 in $.file for every file that changes relative to the primary parent
471 in $.from. If there is no primary parent, then the presence of the
472 $.reset flag is assumed even if it is omitted.
473
474 The $.file[].fname element is the name of the file.
475 The $.file[].id element corresponds to a DATA.ID or NAME.NAMEID
476 that is the content of the file. If the file is being removed
477 by this check-in, then the $.file[].id element is omitted.
478 The $.file[].mode element is text containing one or more ASCII
479 characters. If the "x" character is included in $.file[].mode
480 then the file is executable. If the "l" character is included
481 in $.file[].mode then the file is a symbolic link (and the content
482 of the file is the target of the link). The $.file[].mode may
483 be blank or omitted for a normal read/write file. If a file
484 is being renamed, the $.file[].oldname field may be included
485 to show the previous name of the file, if that information is
486 available.
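
The way $.reset and $.file combine with the primary parent can be sketched as below. The function name `apply_files` and the manifest representation (a dict from filename to an (id, mode) pair) are assumptions made for illustration. The ids are treated as opaque values here; a real implementation would first resolve them to persistent names through the NAME table. Removing the old name on a rename is also an assumption, since the text only says that $.file[].oldname records the prior name.

```python
def apply_files(parent_manifest, checkin):
    """Compute the file list of a check-in from its primary parent.

    parent_manifest maps filename -> (id, mode). It is ignored when the
    check-in carries $.reset or has no primary parent ($.from).
    """
    full_listing = checkin.get("reset") == 1 or "from" not in checkin
    manifest = {} if full_listing else dict(parent_manifest)
    for entry in checkin.get("file", []):
        old_name = entry.get("oldname")
        if old_name is not None:
            manifest.pop(old_name, None)  # assumed: a rename drops the old name
        if "id" in entry:
            manifest[entry["fname"]] = (entry["id"], entry.get("mode", ""))
        else:
            manifest.pop(entry["fname"], None)  # omitted id means removal
    return manifest
```

With the parent `{"README.md": (10, ""), "run.sh": (11, "x")}` and a check-in that changes `README.md` and lists `run.sh` without an id, the result contains only the new `README.md` entry.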
487
488 Some version control systems allow tags and properties to be
489 associated with a check-in. The $.tag element supports this
490 feature. Each element of the $.tag array is a separate tag
491 or property. If the $.tag[].propagate field exists and has
492 a value of "1", then the tag/property propagates to all
493 non-merge children. If the $.tag[].delete field exists and
494 has a value of "1", then a propagating tag or property with
495 the given name that was set by some ancestor check-in is
496 stopped and omitted from this check-in. Version control
497 systems that do not support tags and/or properties on check-ins
498 or that do not support tag propagation can ignore all of these
499 attributes.
500
