Version Control Collaboration Protocol
======================================

<blockquote><center style='background: yellow; border: 1px solid black;'>
This document is a work in progress.<br>
The last update was on 2019-03-13.<br>
Check back later for updates.
</center></blockquote>

1.0 Introduction
----------------

The Version Control Collaboration Protocol or VCCP is an attempt to make
it easier for developers to collaborate even when they are using different
version control systems.

For example, suppose Alice, the founder and principal maintainer
for the fictional "BambooCoffee" project, prefers using the
[Mercurial](https://www.mercurial-scm.org/) version control system,
but two of her clients, Bob and Cindy, know nothing but
[Git](https://www.git-scm.org/) and steadfastly refuse to
type any command that begins with "hg".
Further suppose that an important
collaborator, Dave, really prefers [Bazaar](http://bazaar.canonical.com/).
The VCCP is designed to make it relatively easy and painless
for Alice to set up Git and Bazaar mirrors of her Mercurial
repository so that Bob, Cindy, and Dave can all use the tools
they are most familiar with.

<center>![](diagram-1.jpg)</center>

Assuming all the servers speak VCCP (which is not the case at the
time of this writing, but we hope to encourage that for the future),
whenever Alice checks in a new change to her primary repository
(here labeled "Truth"), that repository sends a VCCP message to the
two mirrors, which causes them to pick up the changes as well.

### 1.1 Bidirectional Collaboration

The diagram above shows that all changes originate from Alice and
that Bob, Cindy, and Dave are only consumers. If Cindy wanted to
make a change to BambooCoffee, she would have to do that with a backchannel,
such as sending a patch via email to Alice and asking Alice to check
in the change.

But VCCP also supports bidirectional collaboration.

<center>![](diagram-2.jpg)</center>

If Cindy is a frequent contributor, and assuming that Git and Mercurial
are compatible version control systems (which I believe they are), then
VCCP can be used to move information from Truth to Mirror-1 and from Mirror-1
back to Truth. In that configuration, Cindy would be able to check in her
changes using the "git" command. The Mirror-1 server would then send a
VCCP message back to Truth containing Cindy's changes. Truth would then
relay those changes over to Mirror-2 where Dave could see them as well.

### 1.2 Client-Mirror versus Server-Mirror

VCCP allows the mirrors to be set up as either clients or servers.

In the client-mirror approach, the mirrors periodically poll Truth asking
for changes. In the server-mirror approach, Truth sends changes to the
mirrors as they occur.

In the first example above, the implication was that the server-mirror
approach was being used. The Truth repository would take the initiative
to send changes to the mirrors. But it does not have to be that way.
Suppose Dave is unknown to Alice. Suppose he just likes Alice's work and
wants to keep his own mirror of her work for his own convenience.
Dave could set up
Mirror-2 as a client-mirror that periodically polls Truth for changes.

In the second example above, Truth and Mirror-1 could be configured to
have a peer-to-peer relationship rather than a Truth-to-Mirror relationship.
When new content arrives at Truth (because Alice did an "hg commit"),
Truth acts as a client to initiate a transfer of that new information
over to Mirror-1. When new content originates at Mirror-1 (because
Cindy did a "git commit"), Mirror-1 acts as a client to send the new
content over to Truth. Or, they could set it up so that Truth is always
the client and it periodically polls Mirror-1 looking for new content
coming from Cindy. Or, they could set it up so that Mirror-1 is always
the client and it periodically polls Truth looking for changes from Alice.

The point is that VCCP works in all of these scenarios.

### 1.3 Name Mapping

Different version control systems use different names to refer to the same
object. For example, Fossil names files using a SHA3-256 hash of the
unmodified file content, whereas Git uses a hardened-SHA1 hash of the file
content with an added prefix. Mercurial, Monotone, Bazaar, and others all
use different naming schemes, so that the same check-in in any particular
version control system will have a different name in all other version
control systems.

When mirroring a project between two version control systems, somebody
needs to keep track of the mapping between names.

For example, in the second diagram above, if Mirror-1 wants to tell Truth
that it has a new check-in "Q" that is a child of "P", then it has to send
the name of check-in "P". Does it send the Git-name of "P" or the
Mercurial-name of "P"? If Mirror-1 sends Truth the Git-name of "P" then
Truth must be the system that does the name mapping. If Mirror-1 sends
Truth the Mercurial-name of "P", then Mirror-1 is the system that maintains
the mapping.

The VCCP is designed such that both names for a
particular check-in or file can be sent. One of the collaborating systems
must still take responsibility for translating the names, but it does not
matter which system. As long as one or the other of the two systems
maintains a name mapping, the collaboration will work. Of course, it
also works for both systems to maintain the name map, and for maximum
flexibility, perhaps that should be the preferred approach.

2.0 Minimum Requirements
------------------------

The VCCP is modeled after the Git fast-export and fast-import protocol.
That is to say, VCCP thinks in terms of "check-ins" with each check-in
consisting of a number of files (or "blobs" in git-speak). Any version
control system that wants to use VCCP needs to also be able to think
in those terms.

Since VCCP is modeled after fast-import, it has the concept of a tag.
But the use of tags is optional and
VCCP will work with systems that do not support tags.

VCCP assumes that most check-ins have a parent check-in from which they
were derived. Obviously, the first check-in for a project does not have
a parent, but all the others should. Check-ins may also identify
zero or more "merge" parents, and zero or more "cherrypick" ancestors.
But the merges and cherrypicks can be ignored on systems that do not
support those concepts.

VCCP assumes that every distinct version of a file and every check-in has
a unique name. In Git and Mercurial, those names are SHA1 hashes
(computed in different ways). Fossil uses SHA3-256 hashes. I'm not sure
what Bazaar uses. VCCP does not care how the names are derived, as long
as they always uniquely identify the file or check-in.

VCCP assumes that each check-in has a commit comment, a "committer",
and a timestamp for when the commit occurred. We hope that the timestamps
are well-ordered in the sense that each check-in comes after its predecessor,
though this is not a requirement. VCCP will continue to work even if
the timestamps are out of order, perhaps due to a misconfigured system clock
on the workstation of one of the collaborators.

3.0 Protocol Overview
---------------------

The VCCP is a client-server protocol.
A client formats a VCCP message and sends it to the server.
The server acts upon that message, formulates a reply, and sends
the reply back to the client.

It does not matter what transport mechanism is used to send the VCCP
messages from client to server and back again.
But for maximum flexibility, it is suggested that HTTP (or HTTPS) be
used. The client sends an HTTP request to the server with the
VCCP message as the request content and a MIME-type of "application/x-vccp".
The HTTP response is another VCCP message with the same MIME-type.
The use of HTTP means that firewalls and proxies are not an
impediment to collaboration and that collaboration connection information
can be described by a simple URL.

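
As a rough illustration of the suggested HTTP transport, the following Python sketch wraps an already-serialized VCCP message in an HTTP request. The function names and the choice of the POST method are assumptions made for this example; the text above only fixes the MIME-type and says that the message travels as the request content.

```python
import urllib.request

VCCP_MIME_TYPE = "application/x-vccp"  # MIME-type named in the text above


def build_vccp_request(url: str, message: bytes) -> urllib.request.Request:
    """Wrap a serialized VCCP message in an HTTP request (nothing is sent yet).

    POST is an assumption; the draft only says the message is the request content.
    """
    return urllib.request.Request(
        url,
        data=message,
        headers={"Content-Type": VCCP_MIME_TYPE},
        method="POST",
    )


def send_vccp(url: str, message: bytes) -> bytes:
    """Send the request and return the reply body, which is another VCCP message."""
    with urllib.request.urlopen(build_vccp_request(url, message)) as reply:
        return reply.read()
```

Because the whole exchange is one request and one response, a collaboration endpoint can be configured with nothing more than the URL passed to `send_vccp`.
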
There are provisions in the VCCP design to allow authentication
in the body of the VCCP message itself. Or, two systems can, by
mutual agreement, authenticate by some external mechanism.

### 3.1 Message Content

A single VCCP message round-trip can be a "push" if the client is sending
new check-in information to the server, or it can be a "pull" if the
client is polling the server to see if new check-in information is available
for download, or it can be both at once.

The basic design of a VCCP message is inspired by the Git fast-export
protocol, but with enhancements to support incremental updates and
bidirectional updates and to make the message format more robust and
portable and simpler to generate and parse. A single message may contain
multiple "files", check-in descriptions that reference those files, and "tag"
descriptions. A "message description" section contains authentication
data, error codes, and other meta-data. Every request and every
reply contains, at a minimum, a message description.

For a push, the request contains a message description with
authentication information, and the new files, check-ins, and tags
that are being pushed to the server. The reply to a push contains
success codes, and the names that the server assigned to the new objects,
so that the client can maintain a name map.

For a pull, the request contains only a message description with
authentication information and a description of what content the
client desires to pull.
The reply to a pull contains the files, check-ins, and tags requested.

For a pull request, there is no mechanism (currently defined) for the
server to learn the client-side names for files and check-ins. Hence,
for a collaboration arrangement where the client polls the server for
updates, the client must maintain the name map.

### 3.2 Message Format Overview

The format of a VCCP message is an ordinary SQLite database file with
a two-table schema.
The DATA table contains file, check-in, and tag content and the
message description. The DATA.CONTENT column contains either raw
file content or check-in and tag descriptions formatted as JSON.
The message description is also JSON contained in a specially
designated row of the DATA table. The NAME table of the schema
is used to transmit name mappings. The NAME table serves the same
role as the "marks" file of git-fast-export.

### 3.3 Why Use A Database As The Message Format?

Why does a VCCP message consist of an SQLite database instead of a
bespoke format like git-fast-export?

1. Some of the content to be transferred will typically be binary.
   Most projects have at least a few images or other binary files
   in their tree somewhere. Other files will be pure text. Check-in
   and tag descriptions will also be pure text (JSON). That means
   that the VCCP message will be a mix of text and binary content.
   An SQLite database file is a convenient and efficient way
   to encapsulate both binary and text content into a single container
   which is easily created and accessed.

2. Robust, cross-platform libraries for reading and writing SQLite database
   files already exist on every computer. No custom parser or generator
   code needs to be written, debugged, managed, or maintained.

3. The SQLite database file format is well defined, cross-platform
   (32-bit, 64-bit, big-endian, and little-endian) and is endorsed
   by the US Library of Congress as a recommended file format for
   archival data storage.

4. Unlike a serial format (such as git-fast-export) which must
   normally be written and read sequentially from beginning to end,
   elements of an SQLite database can be constructed and read in any
   order. This gives extra implementation flexibility to both readers
   and writers.

### 3.4 Database Schema

The database schema for a VCCP message is as follows:

>
    CREATE TABLE data(
      id INTEGER PRIMARY KEY,
      dclass INT,
      sz INT,
      calg INT,
      cref INT,
      content ANY
    );
    CREATE TABLE name(
      nameid INT,
      nametype INT,
      name TEXT,
      PRIMARY KEY(nameid,nametype)
    ) WITHOUT ROWID;

The DATA table holds the message description, the content of files, and JSON
descriptions of check-ins and tags. The NAME table is used to transmit
names. The DATA table corresponds to the body of a git-fast-export stream
and the NAME table corresponds to the "marks" file that is read and
written by the "--import-marks" and "--export-marks" options of the
"git fast-export" command.

Each file, check-in, and tag is normally a single distinct entry in
the DATA table. (Exception: very large files, greater than 1GB in size,
can be split across multiple DATA table rows - see below.) Entries in
the DATA table can occur in any order. It is not required that files
referenced by check-ins have a smaller DATA.ID value, for example.
Free ordering does not impede data extraction (see the algorithm descriptions
below) but it does give considerable freedom to the message generator
logic.

Each DATA row has a class identified by a small integer in the DATA.DCLASS
column.

>
| 0: | A check-in |
| 1: | A file |
| 2: | A tag |
| 3: | The VCCP message description |
| 4: | Application-defined-1 |
| 5: | Application-defined-2 |

Every well-formed VCCP message has exactly one message description entry
with DATA.ID=0 and DATA.DCLASS=3. No other DATA table entries should have
DATA.DCLASS=3.

The application-defined values are reserved for extended uses of the
VCCP message format. In particular, there are plans to enhance
Fossil so that it uses VCCP as its sync protocol, replacing its
current bespoke protocol. But Fossil needs to send other
kinds of objects, such as wiki pages and tickets, that are not known
to Git and most other version control systems. A few
"application-defined" values are available at strategic points in
the message format description to accommodate these extended use cases.
New application-defined values may be defined in the future.
Portable VCCP messages between different version control systems
should never use the application-defined values.

The DATA.CONTENT field can be either text or binary, as appropriate.
For files, the DATA.CONTENT is binary. For check-ins and tags and for
the message description, the DATA.CONTENT is a text JSON object.

The DATA.CONTENT field can optionally be compressed. The DATA.SZ field
is the uncompressed size of the content in bytes. The compression method
is determined by the DATA.CALG field:

>
| 0: | No compression |
| 1: | ZLib compression |
| 2: | Multi-blob |
| 3: | Application-defined-1 |
| 4: | Application-defined-2 |

The "multi-blob" compression method means that the content is the
concatenation of the content in other DATA table rows. This
allows for content that exceeds the 1GB size limit for an SQLite
BLOB column. If the DATA.CALG field is 2, then DATA.CONTENT will
be a JSON array of integer values, where each integer is the DATA.ID
of another DATA table entry that contains part of the content.
The actual data content is the concatenation of the other DATA table
entries. The secondary DATA table entries can also be compressed,
though not with multi-blob. In other words, the multi-blob
compression method may not be nested. This effectively limits the
maximum size of a file in the VCCP to the maximum size of an SQLite
database, which is 140 terabytes.

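
A reader of the DATA table might recover the content of a row along the following lines. This Python sketch handles the three portable DATA.CALG values. Two things are assumed here rather than stated above: that "ZLib compression" means a plain zlib stream as produced by `zlib.compress`, and that DATA.SZ on a multi-blob row is the total size of the reassembled content. The function name `read_content` is hypothetical.

```python
import json
import sqlite3
import zlib


def read_content(db: sqlite3.Connection, data_id: int, nested: bool = False) -> bytes:
    """Return the uncompressed content of one DATA row as bytes."""
    row = db.execute(
        "SELECT sz, calg, content FROM data WHERE id=?", (data_id,)
    ).fetchone()
    if row is None:
        raise KeyError(f"no DATA row with id {data_id}")
    sz, calg, content = row
    if isinstance(content, str):
        content = content.encode("utf-8")  # text rows are handled as UTF-8 bytes
    if calg == 0:                          # no compression
        out = content
    elif calg == 1:                        # assumed: a plain zlib stream
        out = zlib.decompress(content)
    elif calg == 2:                        # multi-blob: JSON array of DATA.IDs
        if nested:
            raise ValueError("multi-blob entries may not be nested")
        part_ids = json.loads(content)
        out = b"".join(read_content(db, part, nested=True) for part in part_ids)
    else:
        raise ValueError(f"unsupported compression method {calg}")
    if sz is not None and len(out) != sz:
        raise ValueError(f"DATA.SZ mismatch for DATA.ID {data_id}")
    return out
```

The `nested` flag enforces the rule that a multi-blob entry may only refer to rows that are themselves stored with method 0 or 1.
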
Portable VCCP files should only use compression methods 0, 1, and 2,
and preferably only method 0 (no compression). But application-defined
compression methods are available for proprietary uses of the
VCCP message format. The DATA.CREF field is auxiliary data intended
for use with these application-defined compression methods. In
particular, DATA.CREF is intended to be the DATA.ID of a "base"
entry for delta-compression methods. For a portable VCCP file,
the DATA.CREF field should always be NULL.

The DATA.ID field provides an integer identifier for files and
check-ins. The scope of that name is the single VCCP message
in which the DATA table entry appears, however. The NAME table
is used to provide a mapping from these internal integer names
to the persistent global hash names of the various version
control systems.

A single object can have different names, depending on which
version control system stores it. For this reason, the NAME
table is designed to allow storage of multiple names for the
same object. If NAME.NAMETYPE is 0, that means that the name
is appropriate for use on the client. If NAME.NAMETYPE is 1,
that means the name is appropriate for use on the server.

To simplify the implementation of VCCP on diverse systems,
names should be sent as text. If the names for a particular system
are binary hashes, then the NAME table should store them as
the hexadecimal representation.

The NAME table can also be used for error messages in a VCCP
reply message. If the request contains an error associated with
a particular row in the DATA table, or with a particular NAMEID,
then an error message is added to the NAME table with NAMEID
set to the offending DATA.ID or NAMEID, NAMETYPE set to 2,
and the English error message text in NAME.NAME. Hence,
the allowed values for NAME.NAMETYPE are:

>
| 0: | Name of the object as known to the client |
| 1: | Name of the object as known to the server |
| 2: | Error message text for the object |

#### 3.4.1 NAME Table Example 1

Suppose a client is pushing a new check-in to the server and the
check-in text is stored in the DATA.ID=1 row. Then the request
should contain a NAME table row with NAME.NAMEID=1 (to match the
DATA table ID value) and NAME.NAMETYPE=0 (because client names
have NAMETYPE 0) and with the name of that check-in according to
the client stored in NAME.NAME. The server will recode the
check-in according to its own format, and store the server-side
name in a new NAME table row with NAME.NAMEID=1 and NAME.NAMETYPE=1.
The server then includes the complete NAME table in its reply
back to the client. In this way, the client is able to discover
the name of the check-in on the server. The server can also
remember the client check-in name, if desired.

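
The exchange in this example can be sketched in Python as two small helpers, one for each side. The function names are made up for illustration, and a real server would compute its names by recoding the check-in rather than receiving them in a dictionary.

```python
import sqlite3

CLIENT_NAME, SERVER_NAME, ERROR_TEXT = 0, 1, 2  # NAME.NAMETYPE values


def record_server_names(db: sqlite3.Connection, assigned: dict) -> None:
    """Server side: add a NAMETYPE=1 row for each NAMEID the server has named."""
    db.executemany(
        "INSERT INTO name(nameid, nametype, name) VALUES(?, ?, ?)",
        [(nameid, SERVER_NAME, name) for nameid, name in assigned.items()],
    )


def name_pairs(db: sqlite3.Connection) -> dict:
    """Client side: map each client name to the server name found in the reply."""
    rows = db.execute(
        "SELECT c.name, s.name FROM name AS c"
        " JOIN name AS s ON s.nameid = c.nameid"
        " WHERE c.nametype = ? AND s.nametype = ?",
        (CLIENT_NAME, SERVER_NAME),
    )
    return dict(rows.fetchall())
```

After the reply arrives, the dictionary returned by `name_pairs` is exactly the increment that the client would merge into its persistent name map.
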
### 3.5 Check-in JSON Format

Check-ins are described by DATA table rows where the content is a
single JSON object, as follows:

>
    {
      "time": DATETIME,       -- Date and time of the check-in
      "comment": TEXT,        -- The original check-in comment
      "mimetype": TEXT,       -- The mimetype of the comment text
      "branch": TEXT,         -- Branch this check-in belongs to
      "from": INT,            -- NAME.NAMEID for the primary parent
      "merge": [INT],         -- Merge parents
      "cherrypick": [INT],    -- Cherrypick merges
      "author": {             -- Author of the change
        "name": TEXT,           -- Name or handle
        "email": TEXT,          -- Email address
        "time": DATETIME        -- Override for $.time
      },
      "committer": {          -- Committer of the change
        "name": TEXT,           -- Name or handle
        "email": TEXT,          -- Email address
        "time": DATETIME        -- Override for $.time
      },
      "tag": [{               -- Tags and properties for this check-in
        "name": TEXT,           -- tag name
        "value": TEXT,          -- value (if it is a property)
        "delete": 1,            -- If present, delete this tag
        "propagate": 1          -- Means propagate to descendants
      }],
      "reset": 1,             -- All files included, not just changes
      "file": [{              -- File in this check-in
        "fname": TEXT,          -- filename
        "id": INT,              -- DATA.ID or NAME.NAMEID. Omitted to delete
        "mode": TEXT,           -- "x" for executable. "l" for symlink
        "oldname": TEXT         -- Prior name if the file is renamed
      }]
    }

The $.time element defines the moment in time when the check-in
occurred. The $.time field is required. Times are always Coordinated
Universal Time (UTC). DATETIME can be represented in multiple ways:

1. If the DATETIME is an integer, then it is the number of seconds
   since 1970 (also known as "unix time").

2. If the DATETIME is text, then it is ISO8601 as follows:
   "YYYY-MM-DD HH:MM:SS.SSS". The fractional seconds may be
   omitted.

3. If the DATETIME is a real number, then it is the fractional
   Julian day number.

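
The three DATETIME encodings can be normalized to a single representation as in the following Python sketch. The function name `parse_datetime` is hypothetical. The sketch relies on the JSON parser to distinguish integers from real numbers, which is an assumption about how a receiver tells encodings 1 and 3 apart.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
UNIX_EPOCH_JULIAN_DAY = 2440587.5  # Julian day number of 1970-01-01 00:00:00 UTC


def parse_datetime(value) -> datetime:
    """Convert any of the three DATETIME encodings to an aware UTC datetime."""
    if isinstance(value, bool):  # bool is a subclass of int in Python
        raise TypeError("DATETIME must be an integer, text, or a real number")
    if isinstance(value, int):   # seconds since 1970
        return UNIX_EPOCH + timedelta(seconds=value)
    if isinstance(value, float):  # fractional Julian day number
        return UNIX_EPOCH + timedelta(days=value - UNIX_EPOCH_JULIAN_DAY)
    if isinstance(value, str):   # "YYYY-MM-DD HH:MM:SS" with optional ".SSS"
        fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in value else "%Y-%m-%d %H:%M:%S"
        return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
    raise TypeError("DATETIME must be an integer, text, or a real number")
```

A sender that uses Julian day numbers would need to make sure the value is always written with a fractional part, since a whole-number Julian day serialized without a decimal point would be read by this sketch as unix time.
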
The $.comment element is the check-in comment. The $.comment field is
required. The mimetype for $.comment defaults to "text/plain" but can
be some other MIME-type if the $.mimetype field is present.

The $.branch element defines the name of the branch that this check-in
belongs to. If omitted, the branch of the check-in is the same as
the branch of its primary parent check-in.

The $.from element defines the primary parent check-in. Every
check-in other than the first check-in of the project has a primary
parent. The integer value of the $.from element is either the
DATA.ID value for another check-in in the same VCCP message or is
the NAME.NAMEID value for a NAME table entry that identifies the
parent check-in, or both. If the information sender is relying on the
other side to do name mapping, then only the local name will be provided.
But if the information sender has a name map, it should provide both
its local name and the remote name for the check-in, so that the receiver
can update its name map.

The $.merge element is an array of integers for additional check-ins
that are merged into the current check-in. The $.cherrypick element
is an array of integer values that are check-ins that are cherrypick-merged
into the current check-in. Systems that do not record cherrypick merges
can ignore the $.cherrypick value.

The $.author and $.committer elements define who created the check-in.
The $.committer element is required. The $.author element may be omitted
in the common case where the author and committer are the same. The
$.committer.time and $.author.time subelements should only be included
if they are different from $.time.

The $.reset element, if present, should have an integer value of "1".
The presence of the $.reset element is a flag that affects the meaning
of the $.file element.

The $.file element is an array of JSON objects that define the files
associated with the check-in. If the $.reset flag is present, then there
must be one entry in $.file for every file in the check-in. If the
$.reset flag is omitted (the common case) then there is one entry
in $.file for every file that changes relative to the primary parent
in $.from. If there is no primary parent, then the presence of the
$.reset flag is assumed even if it is omitted.

The $.file[].fname element is the name of the file.
The $.file[].id element corresponds to a DATA.ID or NAME.NAMEID
that is the content of the file. If the file is being removed
by this check-in, then the $.file[].id element is omitted.
The $.file[].mode element is text containing one or more ASCII
characters. If the "x" character is included in $.file[].mode
then the file is executable. If the "l" character is included
in $.file[].mode then the file is a symbolic link (and the content
of the file is the target of the link). The $.file[].mode may
be blank or omitted for a normal read/write file. If a file
is being renamed, the $.file[].oldname field may be included
to show the previous name of the file, if that information is
available.

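
To make the interaction between $.reset, $.from, and $.file concrete, here is a minimal Python sketch that computes the full file listing of a check-in from the listing of its primary parent. The function name and the dictionary representation of a listing are assumptions for illustration. The sketch treats $.file[].oldname as informational only and does not remove the old name, since the text above does not say whether a rename implies a deletion. It also keeps the message-local ids as given, whereas a real implementation would resolve them to persistent names.

```python
def apply_files(parent_files: dict, checkin: dict) -> dict:
    """Return the {filename: (content_id, mode)} listing for a check-in.

    `parent_files` is the same kind of listing for the primary parent ($.from).
    It is ignored when $.reset is set or when there is no primary parent.
    """
    full_listing = checkin.get("reset") == 1 or "from" not in checkin
    files = {} if full_listing else dict(parent_files)
    for entry in checkin.get("file", []):
        if "id" in entry:
            # Added or changed file; a missing mode means a normal file.
            files[entry["fname"]] = (entry["id"], entry.get("mode", ""))
        else:
            # An omitted id means the file is removed by this check-in.
            files.pop(entry["fname"], None)
    return files
```

For example, a parent listing of `{"README": (10, ""), "run.sh": (11, "x")}` combined with a check-in whose $.file array replaces `README` and lists `run.sh` without an id yields a listing that contains only the new `README`.
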
Some version control systems allow tags and properties to be
part of the check-in itself rather than a separate entity.
The $.tag element supports this
feature. Each element of the $.tag array is a separate tag
or property. If the $.tag[].propagate field exists and has
a value of "1", then the tag/property propagates to all
non-merge children. If the $.tag[].delete field exists and
has a value of "1", then a propagating tag or property with
the given name that was set by some ancestor check-in is
stopped and omitted from this check-in. Version control
systems that do not support tags and/or properties on check-ins
or that do not support tag propagation can ignore all of these
attributes.

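
One possible reading of the propagation rules is sketched below in Python. Given the tags that propagate in from the primary parent and the $.tag array of a check-in, it returns the tags in effect on that check-in and the tags handed on to its non-merge children. The function name is hypothetical, and the handling of a non-propagating tag that reuses the name of an inherited one (it applies to this check-in only) is a guess, since the text above does not cover that case.

```python
def propagating_tags(inherited: dict, checkin: dict) -> tuple:
    """Return (tags on this check-in, tags passed to non-merge children).

    `inherited` maps tag name -> value for tags that propagate in from the
    primary parent. A tag without a "value" field is stored with value None.
    """
    own = dict(inherited)
    passed_on = dict(inherited)
    for tag in checkin.get("tag", []):
        name = tag["name"]
        if tag.get("delete") == 1:
            # Stop a propagating tag: omit it here and from descendants.
            own.pop(name, None)
            passed_on.pop(name, None)
            continue
        own[name] = tag.get("value")
        if tag.get("propagate") == 1:
            passed_on[name] = tag.get("value")
    return own, passed_on
```

Walking a chain of check-ins from the root and feeding each `passed_on` result into the next call reproduces the effective tag set along that line of descent.
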
### 3.6 Tag JSON Format

A new tag is created using the following JSON syntax:

>
    {
      "time": DATETIME,       -- Time when the tag was created
      "name": TEXT,           -- Name of the tag
      "from": INT,            -- Check-in being tagged
      "comment": TEXT,        -- Message associated with the tag
      "mimetype": TEXT,       -- Mimetype of the message
      "value": TEXT,          -- Value of the tag if it is really a property
      "delete": 1,            -- Stop propagating this tag
      "propagate": 1,         -- Propagate this tag to direct children
      "tagger": {             -- Person who created this tag
        "name": TEXT,           -- Name or handle
        "email": TEXT,          -- Email address
        "time": DATETIME        -- Override for $.time
      }
    }

### 3.7 Message Description JSON Format

>
    {
      "version": INT,         -- protocol version number
      "features": [TEXT],     -- supported optional features
      "client_vcs": TEXT,
      "server_vcs": TEXT,
      "credentials": {
        "username": TEXT,
        "password": TEXT
      },
      "pull": [{
        "branch": TEXT,
        "decendents_of": TEXT,
        "after": DATETIME
      }],
      "max_size_hint": INT,
      "done": 1,
      "more_available": 1,
      "continue_with": JSON
    }
