Fossil SCM

Continuing work on the VCCP spec. This is an incremental check-in.

drh 2019-03-13 15:34 vccp

Commit 60794e993c9e3dad82a475203c63a1dda5c4b134f35292f712c43b82f300a9d8

Parent b37bb7dc79aacde…

1 file changed +385 -11

M www/vccp/intro.md

		--- www/vccp/intro.md
		+++ www/vccp/intro.md
		@@ -1,11 +1,11 @@
1	1	Version Control Collaboration Protocol
2	2	======================================
3	3
4	4	<blockquote><center style='background: yellow; border: 1px solid black;'>
5	5	This document is a work in progress.<br>
6		-The last update was on 2019-03-09.<br>
	6	+The last update was on 2019-03-13.<br>
7	7	Check back later for updates.
8	8	</center></blockquote>
9	9
10	10	1.0 Introduction
11	11	----------------
		@@ -12,22 +12,22 @@
12	12
13	13	The Version Control Collaboration Protocol or VCCP is an attempt to make
14	14	it easier for developers to collaborate even when they are using different
15	15	version control systems.
16	16
17		-For example, suppose Alice, the founder and
18		-[BDFL](https://en.wikipedia.org/wiki/Benevolent_dictator_for_life)
	17	+For example, suppose Alice, the founder and principal maintainer
19	18	for the fictional "BambooCoffee" project, prefers using the
20	19	[Mercurial](https://www.mercurial-scm.org/) version control system,
21	20	but two of her clients, Bob and Cindy, know nothing but
22	21	[Git](https://www.git-scm.org/) and steadfastly refuse to
23		-type any command that begins with "hg", and an important
	22	+type any command that begins with "hg".
	23	+Further suppose that an important
24	24	collaborator, Dave, really prefers [Bazaar](bazaar.canonical.com/).
25	25	The VCCP is designed to make it relatively easy and painless
26	26	for Alice to set up Git and Bazaar mirrors of her Mercurial
27		-repository so that Bob, Cindy, and Dave can all use the tools with
28		-which they are most familiar.
	27	+repository so that Bob, Cindy, and Dave can all use the tools
	28	+they are most familiar with.
29	29
30	30	<center>![](diagram-1.jpg)</center>
31	31
32	32	Assuming all the servers speak VCCP (which is not the case at the
33	33	time of this writing, but we hope to encourage that for the future)
		@@ -38,11 +38,11 @@
38	38	### 1.1 Bidirectional Collaboration
39	39
40	40	The diagram above shows that all changes originate from Alice and
41	41	that Bob, Cindy, and Dave are only consumers. If Cindy wanted to
42	42	make a change to BambooCoffee, she would have to do that with a backchannel,
43		-such as sending a patch via email to Alice and then get Alice to check
	43	+such as sending a patch via email to Alice and asking Alice to check
44	44	in the change.
45	45
46	46	But VCCP also supports bidirectional collaboration.
47	47
48	48	<center>![](diagram-2.jpg)</center>
		@@ -55,26 +55,27 @@
55	55	VCCP message back to Truth containing Cindy's changes. Truth would then
56	56	relay those changes over to Mirror-2 where Dave could see them as well.
57	57
58	58	### 1.2 Client-Mirror versus Server-Mirror
59	59
60		-VCCP allows the mirrors to be set up as either clients or server.
	60	+VCCP allows the mirrors to be set up as either clients or servers.
61	61
62	62	In the client-mirror approach, the mirrors periodically poll Truth asking
63	63	for changes. In the server-mirror approach, Truth sends changes to the
64	64	mirrors as they occur.
65	65
66	66	In the first example above, the implication was that the server-mirror
67	67	approach was being used. The Truth repository would take the initiative
68	68	to send changes to the mirrors. But it does not have to be that way.
69	69	Suppose Dave is unknown to Alice. Suppose he just likes Alice's work and
70		-wants to keep his own mirror for his own convenience. Dave could set up
	70	+wants to keep his own mirror of her work for his own convenience.
	71	+Dave could set up
71	72	Mirror-2 as a client-mirror that periodically polls Truth for changes.
72	73
73	74	In the second example above, Truth and Mirror-1 could be configured to
74	75	have a Peer-to-Peer relationship rather than a Truth-to-Mirror relationship.
75		-When new content arrives a Truth (because Alice did an "hg commit"),
	76	+When new content arrives at Truth (because Alice did an "hg commit"),
76	77	Truth acts as a client to initiate a transfer of that new information
77	78	over to Mirror-1. When new content originates at Mirror-1 (because
78	79	Cindy did "git commit") then Mirror-1 acts as a client to send the new
79	80	content over to Truth. Or, they could set it up so that Truth is always
80	81	the client and it periodically polls Mirror-1 looking for new content
		@@ -81,10 +82,39 @@
81	82	coming from Cindy. Or, they could set it up so that Mirror-1 is always
82	83	the client and it periodically polls Truth looking for changes from Alice.
83	84
84	85	The point is that VCCP works in all of these scenarios.
85	86
	87	+### 1.3 Name Mapping
	88	+
	89	+Different version control systems use different names to refer to the same
	90	+object. For example, Fossil names files using a SHA3-256 hash of the
	91	+unmodified file content, whereas Git uses a hardened-SHA1 hash of the file
	92	+content with an added prefix. Mercurial, Monotone, Bazaar, and others all
	93	+use different naming schemes, so that the same check-in in any particular
	94	+version control system will have a different name in all other version
	95	+control systems.
	96	+
	97	+When mirroring a project between two version control systems, somebody
	98	+needs to keep track of the mapping between names.
	99	+
	100	+For example, in the second diagram above, if Mirror-1 wants to tell Truth
	101	+that it has a new check-in "Q" that is a child of "P", then it has to send
	102	+the name of check-in "P". Does it send the Git-name of "P" or the
	103	+Mercurial-name of "P"? If Mirror-1 sends Truth the Git-name of "P" then
	104	+Truth must be the system that does the name mapping. If Mirror-1 sends
	105	+Truth the Mercurial-name of "P", then Mirror-1 is the system that maintains
	106	+the mapping.
	107	+
	108	+The VCCP is designed such that both names for a
	109	+particular check-in or file can be sent. One of the collaborating systems
	110	+must still take responsibility for translating the names, but it does not
	111	+matter which system. As long as one or the other of the two systems
	112	+maintains a name mapping, the collaboration will work. Of course, it
	113	+also works for both systems to maintain the name map, and for maximum
	114	+flexibility, perhaps that should be the preferred approach.
	115	+
86	116	2.0 Minimum Requirements
87	117	------------------------
88	118
89	119	The VCCP is modeled after the Git fast-export and fast-import protocol.
90	120	That is to say, VCCP thinks in terms of "check-ins" with each check-in
		@@ -101,11 +131,11 @@
101	131	a parent, but all the others should. Check-ins may also identify
102	132	zero or more "merge" parents, and zero or more "cherrypick" ancestors.
103	133	But the merges and cherrypicks can be ignored on systems that do not
104	134	support those concepts.
105	135
106		-VCCP assumes that every distinct version of a file, and every check-in has
	136	+VCCP assumes that every distinct version of a file and every check-in has
107	137	a unique name. In Git and Mercurial, those names are SHA1 hashes
108	138	(computed in different ways). Fossil uses SHA3-256 hashes. I'm not sure
109	139	what Bazaar uses. VCCP does not care how the names are derived, as long
110	140	as they always uniquely identify the file or check-in.
111	141
		@@ -121,5 +151,349 @@
121	151
122	152	The VCCP is a client-server protocol.
123	153	A client formats a VCCP message and sends it to the server.
124	154	The server acts upon that message, formulates a reply, and sends
125	155	the reply back to the client.
	156	+
	157	+It does not matter what transport mechanism is used to send the VCCP
	158	+messages from client to server and back again.
	159	+But for maximum flexibility, it is suggested that HTTP (or HTTPS) be
	160	+used. The client sends an HTTP request to the server with the
	161	+VCCP message as the request content and a MIME-type of "application/x-vccp".
	162	+The HTTP response is another VCCP message with the same MIME-type.
	163	+The use of HTTP means that firewalls and proxies are not an
	164	+impediment to collaboration and that collaboration connection information
	165	+can be described by a simple URL.
	166	+
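A minimal sketch of the suggested HTTP transport, assuming the message is already available as the raw bytes of an SQLite database file. The use of POST, the helper names `build_request` and `send`, and the URL in the usage note are illustrative assumptions; the text above specifies only the MIME-type and that the message travels as the request content.

```python
import urllib.request

VCCP_MIME_TYPE = "application/x-vccp"  # MIME-type named in the text above


def build_request(url: str, message: bytes) -> urllib.request.Request:
    """Wrap a VCCP message (raw SQLite file bytes) in an HTTP request.

    Assumption: POST is used; the spec text only says the message is
    the request content.
    """
    return urllib.request.Request(
        url,
        data=message,
        headers={"Content-Type": VCCP_MIME_TYPE},
        method="POST",
    )


def send(url: str, message: bytes) -> bytes:
    """Send the message and return the reply body (another VCCP message)."""
    with urllib.request.urlopen(build_request(url, message)) as reply:
        return reply.read()
```

A caller might use `send("https://example.org/vccp", message_bytes)`, where the URL is a placeholder for whatever endpoint the two systems agree on.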
	167	+There are provisions in the VCCP design to allow authentication
	168	+in the body of the VCCP message itself. Or, two systems can, by
	169	+mutual agreement, authenticate by some external mechanism.
	170	+
	171	+### 3.1 Message Content
	172	+
	173	+A single VCCP message round-trip can be a "push" if the client is sending
	174	+new check-in information to the server, or it can be a "pull" if the
	175	+client is polling the server to see if new check-in information is available
	176	+for download, or it can be both at once.
	177	+
	178	+The basic design of a VCCP message is inspired by the Git fast-export
	179	+protocol, but with enhancements to support incremental updates and
	180	+bidirectional updates and to make the message format more robust and
	181	+portable and simpler to generate and parse. A single message may contain
	182	+multiple "files", check-in descriptions that reference those files, and "tag"
	183	+descriptions. A "message description" section contains authentication
	184	+data, error codes, and other meta-data. Every request and every
	185	+reply contains, at a minimum, a message description.
	186	+
	187	+For a push, the request contains a message description with
	188	+authentication information, and the new files, check-ins, and tags
	189	+that are being pushed to the server. The reply to a push contains
	190	+success codes, and the names that the server assigned to the new objects,
	191	+so that the client can maintain a name map.
	192	+
	193	+For pull, the request contains only a message description with
	194	+authentication information and a description of what content the
	195	+client desires to pull.
	196	+The reply to a pull contains the files, check-ins, and tags requested.
	197	+
	198	+For a pull request, there is no mechanism (currently defined) for the
	199	+server to learn the client-side names for files and check-ins. Hence,
	200	+for a collaboration arrangement where the client polls the server for
	201	+updates, the client must maintain the name map.
	202	+
	203	+### 3.2 Message Format Overview
	204	+
	205	+The format of a VCCP message is an ordinary SQLite database file with
	206	+a two-table schema.
	207	+The DATA table contains file, check-in, and tag content and the
	208	+message description. The DATA.CONTENT column contains either raw
	209	+file content or check-in and tag descriptions formatted as JSON.
	210	+The message description is also JSON contained in a specially
	211	+designated row of the DATA table. The NAME table of the schema
	212	+is used to transmit name mappings. The NAME table serves the same
	213	+role as the "marks" file of git-fast-export.
	214	+
	215	+### 3.3 Why Use A Database As The Message Format?
	216	+
	217	+Why does a VCCP message consist of an SQLite database instead of a
	218	+bespoke format like git-fast-export?
	219	+
	220	+ 1. Some of the content to be transferred will typically be binary.
	221	+ Most projects have at least a few images or other binary files
	222	+ in their tree somewhere. Other files will be pure text. Check-in
	223	+ and tag descriptions will also be pure text (JSON). That means
	224	+ that the VCCP message will be a mix of text and binary content.
	225	+ An SQLite database file is a convenient and efficient way
	226	+ to encapsulate both binary and text content into a single container
	227	+ which is easily created and accessed.
	228	+
	229	+ 2. Robust, cross-platform libraries for reading and writing SQLite database
	230	+ files already exist on every computer. No custom parser or generator
	231	+ code needs to be written, debugged, managed, or maintained.
	232	+
	233	+ 3. The SQLite database file format is well defined, cross-platform
	234	+ (32-bit, 64-bit, big-endian, and little-endian) and is endorsed
	235	+ by the US Library of Congress as a recommended file format for
	236	+ archival data storage.
	237	+
	238	+ 4. Unlike a serial format (such as git-fast-export) which must
	239	+ normally be written and read sequentially from beginning to end,
	240	+ elements of an SQLite database can be constructed and read in any
	241	+ order. This gives extra implementation flexibility to both readers
	242	+ and writers.
	243	+
	244	+### 3.4 Database Schema
	245	+
	246	+The database schema for a VCCP message is as follows:
	247	+
	248	+>
	249	+ CREATE TABLE data(
	250	+ id INTEGER PRIMARY KEY,
	251	+ dclass INT,
	252	+ sz INT,
	253	+ calg INT,
	254	+ cref INT,
	255	+ content ANY
	256	+ );
	257	+ CREATE TABLE name(
	258	+ nameid INT,
	259	+ nametype INT,
	260	+ name TEXT,
	261	+ PRIMARY KEY(nameid,nametype)
	262	+ ) WITHOUT ROWID;
	263	+
	264	+The DATA table holds the message description, the content of files, and JSON
	265	+descriptions of check-ins and tags. The NAME table is used to transmit
	266	+names. The DATA table corresponds to the body of a git-fast-export stream
	267	+and the NAME table corresponds to the "marks" file that is read and
	268	+written by the "--import-marks" and "--export-marks" options of the
	269	+"git fast-export" command.
	270	+
	271	+Each file, check-in, and tag is normally a single distinct entry in
	272	+the DATA table. (Exception: very large files, greater than 1GB in size,
	273	+can be split across multiple DATA table rows - see below.) Entries in
	274	+the DATA table can occur in any order. It is not required that files
	275	+referenced by check-ins have a smaller DATA.ID value, for example.
	276	+Free ordering does not impede data extraction (see the algorithm descriptions
	277	+below) but it does give considerable freedom to the message generator
	278	+logic.
	279	+
	280	+Each DATA row has a class identified by a small integer in the DATA.DCLASS
	281	+column.
	282	+
	283	+>
	284	+| 0: | A check-in |
	285	+| 1: | A file |
	286	+| 2: | A tag |
	287	+| 3: | The VCCP message description |
	288	+| 4: | Application-defined-1 |
	289	+| 5: | Application-defined-2 |
	290	+
	291	+Every well-formed VCCP message has exactly one message description entry
	292	+with DATA.ID=0 and DATA.DCLASS=3. No other DATA table entries should have
	293	+DATA.DCLASS=3.
	294	+
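The schema and the DCLASS rules above can be exercised with a short sketch using Python's standard `sqlite3` module. The helper names (`new_message`, `add_file`, `is_well_formed`) and the contents of the message description are hypothetical, since the description fields are not defined in this part of the document.

```python
import json
import sqlite3

# Schema copied from section 3.4.
SCHEMA = """
CREATE TABLE data(
  id INTEGER PRIMARY KEY,
  dclass INT,
  sz INT,
  calg INT,
  cref INT,
  content ANY
);
CREATE TABLE name(
  nameid INT,
  nametype INT,
  name TEXT,
  PRIMARY KEY(nameid,nametype)
) WITHOUT ROWID;
"""

DCLASS_CHECKIN, DCLASS_FILE, DCLASS_TAG, DCLASS_DESCRIPTION = 0, 1, 2, 3


def new_message(description: dict) -> sqlite3.Connection:
    """Create an in-memory message holding only the description row."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    text = json.dumps(description)
    db.execute(
        "INSERT INTO data(id, dclass, sz, calg, content) VALUES(0, ?, ?, 0, ?)",
        (DCLASS_DESCRIPTION, len(text.encode("utf-8")), text),
    )
    return db


def add_file(db: sqlite3.Connection, content: bytes) -> int:
    """Store uncompressed file content; return its message-local DATA.ID."""
    cur = db.execute(
        "INSERT INTO data(dclass, sz, calg, content) VALUES(?, ?, 0, ?)",
        (DCLASS_FILE, len(content), content),
    )
    return cur.lastrowid


def is_well_formed(db: sqlite3.Connection) -> bool:
    """True if exactly one description row exists and it has DATA.ID=0."""
    rows = db.execute(
        "SELECT id FROM data WHERE dclass=?", (DCLASS_DESCRIPTION,)
    ).fetchall()
    return rows == [(0,)]
```

File content is passed as `bytes` so that SQLite stores it as a BLOB regardless of the column's declared type.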
	295	+The application-defined values are reserved for extended uses of the
	296	+VCCP message format. In particular, there are plans to enhance
	297	+Fossil so that it uses VCCP as its sync protocol, replacing its
	298	+current bespoke protocol. But Fossil needs to send information about other
	299	+kinds of objects, such as wiki pages and tickets, that are not known
	300	+to Git and most other version control systems. A few
	301	+"application defined" values are available at strategic points in
	302	+the message format description to accommodate these extended use cases.
	303	+New application-defined values may be defined in the future.
	304	+Portable VCCP messages between different version control systems
	305	+should never use the application-defined values.
	306	+
	307	+The DATA.CONTENT field can be either text or binary, as appropriate.
	308	+For files, the DATA.CONTENT is binary. For check-ins and tags and for
	309	+the message description, the DATA.CONTENT is a text JSON object.
	310	+
	311	+The DATA.CONTENT field can optionally be compressed. The DATA.SZ field
	312	+is the uncompressed size of the content in bytes. The compression method
	313	+is determined by the DATA.CALG field:
	314	+
	315	+>
	316	+| 0: | No compression |
	317	+| 1: | ZLib compression |
	318	+| 2: | Multi-blob |
	319	+| 3: | Application-defined-1 |
	320	+| 4: | Application-defined-2 |
	321	+
	322	+The "multi-blob" compression method means that the content is the
	323	+concatenation of the content in other DATA table rows. This
	324	+allows for content that exceeds the 1GB size limit for an SQLite
	325	+BLOB column. If the DATA.CALG field is 2, then DATA.CONTENT will
	326	+be a JSON array of integer values, where each integer is the DATA.ID
	327	+of another DATA table entry that contains part of the content.
	328	+The actual data content is the concatenation of the other DATA table
	329	+entries. The secondary DATA table entries can also be compressed,
	330	+though not with multi-blob. In other words, the multi-blob
	331	+compression method may not be nested. This effectively limits the
	332	+maximum size of a file in the VCCP to the maximum size of an SQLite
	333	+database, which is 140 terabytes.
	334	+
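The three portable DATA.CALG values can be decoded roughly as follows. This is a sketch under stated assumptions: method 1 is taken to be a plain zlib stream (as produced by Python's `zlib.compress`), and DATA.SZ on a multi-blob row is taken to be the total reassembled size; neither detail is spelled out in the text above. The function name `read_content` is hypothetical.

```python
import json
import sqlite3
import zlib


def read_content(db: sqlite3.Connection, data_id: int,
                 allow_multiblob: bool = True) -> bytes:
    """Return the uncompressed content of one DATA row as bytes."""
    row = db.execute(
        "SELECT sz, calg, content FROM data WHERE id=?", (data_id,)
    ).fetchone()
    if row is None:
        raise KeyError(data_id)
    sz, calg, content = row
    if calg == 0:
        # No compression: text rows come back as str, file rows as bytes.
        raw = content.encode("utf-8") if isinstance(content, str) else bytes(content)
    elif calg == 1:
        # Assumption: a plain zlib stream with no extra framing.
        raw = zlib.decompress(content)
    elif calg == 2:
        # Multi-blob: a JSON array of DATA.ID values, which may not nest.
        if not allow_multiblob:
            raise ValueError("multi-blob entries may not be nested")
        raw = b"".join(
            read_content(db, part, allow_multiblob=False)
            for part in json.loads(content)
        )
    else:
        raise ValueError(f"non-portable compression method {calg}")
    if sz is not None and sz != len(raw):
        raise ValueError("DATA.SZ does not match the uncompressed size")
    return raw
```

Because the parts are looked up by DATA.ID, they can appear anywhere in the DATA table, consistent with the free ordering described earlier.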
	335	+Portable VCCP files should only use compression methods 0, 1, and 2,
	336	+and preferably only method 0 (no compression). But application-defined
	337	+compression methods are available for proprietary uses of the
	338	+VCCP message format. The DATA.CREF field is auxiliary data intended
	339	+for use with these application-defined compression methods. In
	340	+particular, DATA.CREF is intended to be the DATA.ID of a "base"
	341	+entry for delta-compression methods. For a portable VCCP file,
	342	+the DATA.CREF field should always be NULL.
	343	+
	344	+The DATA.ID field provides an integer identifier for files and
	345	+check-ins. The scope of that name is the single VCCP message
	346	+in which the DATA table entry appears, however. The NAME table
	347	+is used to provide a mapping from these internal integer names
	348	+to the persistent global hash names of the various version
	349	+control systems.
	350	+
	351	+A single object can have different names, depending on which
	352	+version control system stores it. For this reason, the NAME
	353	+table is designed to allow storage of multiple names for the
	354	+same object. If NAME.NAMETYPE is 0, that means that the name
	355	+is appropriate for use on the client. If NAME.NAMETYPE is 1,
	356	+that means the name is appropriate for use on the server.
	357	+
	358	+To simplify the implementation of VCCP on diverse systems,
	359	+names should be sent as text. If the names for a particular system
	360	+are binary hashes, then the NAME table should store them as
	361	+the hexadecimal representation.
	362	+
	363	+#### 3.4.1 NAME Table Example 1
	364	+
	365	+Suppose a client is pushing a new check-in to the server and the
	366	+check-in text is stored in the DATA.ID=1 row. Then the request
	367	+should contain a NAME table row with NAME.NAMEID=1 (to match the
	368	+DATA table ID value) and NAME.NAMETYPE=0 (because client names
	369	+have NAMETYPE 0) and with the name of that check-in according to
	370	+the client stored in NAME.NAME. The server will recode the
	371	+check-in according to its own format, and store the server-side
	372	+name in a new NAME table row with NAME.NAMEID=1 and NAME.NAMETYPE=1.
	373	+The server then includes the complete NAME table in its reply
	374	+back to the client. In this way, the client is able to discover
	375	+the name of the check-in on the server. The server can also
	376	+remember the client check-in name, if desired.
	377	+
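The exchange in Example 1 might look like this on the server side, as a sketch. The function names and the sample hash strings are hypothetical; only the NAME table layout and the NAMETYPE values 0 (client) and 1 (server) come from the text above.

```python
import sqlite3

CLIENT_NAME, SERVER_NAME = 0, 1  # NAME.NAMETYPE values from section 3.4


def record_server_name(db: sqlite3.Connection, nameid: int, name: str) -> None:
    """Add (or overwrite) the server-side name for one object."""
    db.execute(
        "INSERT OR REPLACE INTO name(nameid, nametype, name) VALUES(?, ?, ?)",
        (nameid, SERVER_NAME, name),
    )


def name_map(db: sqlite3.Connection) -> dict:
    """Return {client_name: server_name} for objects that have both names."""
    rows = db.execute(
        "SELECT c.name, s.name FROM name AS c"
        " JOIN name AS s ON s.nameid = c.nameid"
        " WHERE c.nametype = ? AND s.nametype = ?",
        (CLIENT_NAME, SERVER_NAME),
    ).fetchall()
    return dict(rows)
```

After the server calls `record_server_name` for each pushed object, the reply's NAME table carries both names, and either side can build its map with `name_map`.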
	378	+### 3.5 Check-in JSON Format
	379	+
	380	+Check-ins are described by DATA table rows where the content is a
	381	+single JSON object, as follows:
	382	+
	383	+>
	384	+ {
	385	+ "time": DATETIME, -- Date and time of the check-in
	386	+ "comment": TEXT, -- The original check-in comment
	387	+ "mimetype": TEXT, -- The mimetype of the comment text
	388	+ "branch": TEXT, -- Branch this check-in belongs to
	389	+ "from": INT, -- NAME.NAMEID for the primary parent
	390	+ "merge": [INT], -- Merge parents
	391	+ "cherrypick": [INT], -- Cherrypick merges
	392	+ "author": { -- Author of the change
	393	+ "name": TEXT, -- Name or handle
	394	+ "email": TEXT, -- Email address
	395	+ "time": DATETIME -- Override for $.time
	396	+ },
	397	+ "committer": { -- Committer of the change
	398	+ "name": TEXT, -- Name or handle
	399	+ "email": TEXT, -- Email address
	400	+ "time": DATETIME -- Override for $.time
	401	+ },
	402	+ "tag": [{ -- Tags and properties for this check-in
	403	+ "name": TEXT, -- tag name
	404	+ "value": TEXT, -- value (if it is a property)
	405	+ "delete": 1, -- If present, delete this tag
	406	+ "propagate": 1 -- Means propagate to descendants
	407	+ }],
	408	+ "reset": 1, -- All files included, not just changes
	409	+ "file": [{ -- File in this check-in
	410	+ "fname": TEXT, -- filename
	411	+ "id": INT, -- DATA.ID or NAME.NAMEID. Omitted to delete
	412	+ "mode": TEXT, -- "x" for executable. "l" for symlink
	413	+ "oldname": TEXT -- Prior name if the file is renamed
	414	+ }]
	415	+ }
	416	+
	417	+The $.time element defines the moment in time when the check-in
	418	+occurred. The $.time field is required. Times are always Coordinated
	419	+Universal Time (UTC). DATETIME can be represented in multiple ways:
	420	+
	421	+ 1. If the DATETIME is an integer, then it is the number of seconds
	422	+ since 1970 (also known as "unix time").
	423	+
	424	+ 2. If the DATETIME is text, then it is ISO8601 as follows:
	425	+ "YYYY-MM-DD HH:MM:SS.SSS". The fractional seconds may be
	426	+ omitted.
	427	+
	428	+ 3. If the DATETIME is a real number, then it is the fractional
	429	+ julian day number.
	430	+
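The three DATETIME representations can be distinguished by JSON type alone. The sketch below assumes the value has already been decoded from JSON, so integers, reals, and strings arrive as Python `int`, `float`, and `str`; the function name `parse_datetime` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
UNIX_EPOCH_JD = 2440587.5  # Julian day number of 1970-01-01 00:00:00 UTC


def parse_datetime(value) -> datetime:
    """Convert a VCCP DATETIME to a timezone-aware UTC datetime."""
    if isinstance(value, bool):
        raise TypeError("DATETIME must be an integer, real, or text")
    if isinstance(value, int):
        # Form 1: seconds since 1970 ("unix time").
        return UNIX_EPOCH + timedelta(seconds=value)
    if isinstance(value, float):
        # Form 3: fractional Julian day number.
        return UNIX_EPOCH + timedelta(days=value - UNIX_EPOCH_JD)
    if isinstance(value, str):
        # Form 2: "YYYY-MM-DD HH:MM:SS.SSS", fractional seconds optional.
        fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in value else "%Y-%m-%d %H:%M:%S"
        return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
    raise TypeError("DATETIME must be an integer, real, or text")
```

One consequence of typing by representation: a unix time written with a decimal point (for example `1552491240.0`) decodes as a real and would be read as a Julian day number, so senders need to emit unix times as JSON integers.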
	431	+The $.comment element is the check-in comment. The $.comment field is
	432	+required. The mimetype for $.comment defaults to "text/plain" but can
	433	+be some other MIME-type if the $.mimetype field is present.
	434	+
	435	+The $.branch element defines the name of the branch that this check-in
	436	+belongs to. If omitted, the branch of the check-in is the same as
	437	+the branch of its primary parent check-in.
	438	+
	439	+The $.from element defines the primary parent check-in. Every
	440	+check-in other than the first check-in of the project has a primary
	441	+parent. The integer value of the $.from element is either the
	442	+DATA.ID value for another check-in in the same VCCP message or is
	443	+the NAME.NAMEID value for a NAME table entry that identifies the
	444	+parent check-in, or both. If the information sender is relying on the
	445	+other side to do name mapping, then only the local name will be provided.
	446	+But if the information sender has a name map, it should provide both
	447	+its local name and the remote name for the check-in, so that the receiver
	448	+can update its name map.
	449	+
	450	+The $.merge element is an array of integers for additional check-ins
	451	+that are merged into the current check-in. The $.cherrypick element
	452	+is an array of integer values that are check-ins that are cherrypick-merged
	453	+into the current check-in. Systems that do not record cherrypick merges
	454	+can ignore the $.cherrypick value.
	455	+
	456	+The $.author and $.committer elements define who created the check-in.
	457	+The $.committer element is required. The $.author element may be omitted
	458	+in the common case where the author and committer are the same. The
	459	+$.committer.time and $.author.time subelements should only be included
	460	+if they are different from $.time.
	461	+
	462	+The $.reset element, if present, should have an integer value of "1".
	463	+The presence of the $.reset element is a flag that affects the meaning
	464	+of the $.file element.
	465	+
	466	+The $.file element is an array of JSON objects that define the files
	467	+associated with the check-in. If the $.reset flag is present, then there
	468	+must be one entry in $.file for every file in the check-in. If the
	469	+$.reset flag is omitted (the common case) then there is one entry
	470	+in $.file for every file that changes relative to the primary parent
	471	+in $.from. If there is no primary parent, then the presence of the
	472	+$.reset flag is assumed even if it is omitted.
	473	+
	474	+The $.file[].fname element is the name of the file.
	475	+The $.file[].id element corresponds to a DATA.ID or NAME.NAMEID
	476	+that is the content of the file. If the file is being removed
	477	+by this check-in, then the $.file[].id element is omitted.
	478	+The $.file[].mode element is text containing one or more ASCII
	479	+characters. If the "x" character is included in $.file[].mode
	480	+then the file is executable. If the "l" character is included
	481	+in $.file[].mode then the file is a symbolic link (and the content
	482	+of the file is the target of the link). The $.file[].mode may
	483	+be blank or omitted for a normal read/write file. If a file
	484	+is being renamed, the $.file[].oldname field may be included
	485	+to show the previous name of the file, if that information is
	486	+available.
	487	+
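The interaction between $.reset, $.from, and $.file can be sketched as a function that derives a check-in's full file list from its parent's. The representation of a file list as a dictionary, the function name `apply_checkin`, and the choice to drop the old name on a rename are assumptions made for illustration; the ids are left as message-local integers rather than resolved through the NAME table.

```python
def apply_checkin(parent_files: dict, checkin: dict) -> dict:
    """Return {fname: (id, mode)} for a check-in.

    parent_files is the same mapping for the primary parent; it is
    ignored when $.reset is set or when there is no $.from.
    """
    # With no primary parent, $.reset is assumed even if omitted.
    reset = checkin.get("reset") == 1 or "from" not in checkin
    files = {} if reset else dict(parent_files)
    for entry in checkin.get("file", []):
        old = entry.get("oldname")
        if old is not None:
            # Assumption: a rename removes the entry under the prior name.
            files.pop(old, None)
        if "id" in entry:
            files[entry["fname"]] = (entry["id"], entry.get("mode", ""))
        else:
            # An omitted $.file[].id means the file is removed.
            files.pop(entry["fname"], None)
    return files
```

For example, a check-in whose $.file array holds one entry with an id and one without yields the parent's list with the first file updated and the second removed.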
	488	+Some version control systems allow tags and properties to be
	489	+associated with a check-in. The $.tag element supports this
	490	+feature. Each element of the $.tag array is a separate tag
	491	+or property. If the $.tag[].propagate field exists and has
	492	+a value of "1", then the tag/property propagates to all
	493	+non-merge children. If the $.tag[].delete field exists and
	494	+has a value of "1", then a propagating tag or property with
	495	+the given name that was set by some ancestor check-in is
	496	+stopped and omitted from this check-in. Version control
	497	+systems that do not support tags and/or properties on check-ins
	498	+or that do not support tag propagation can ignore all of these
	499	+attributes.
126	500


	--- www/vccp/intro.md
	+++ www/vccp/intro.md
	@@ -1,11 +1,11 @@
1	Version Control Collaboration Protocol
2	======================================
3
4	<blockquote><center style='background: yellow; border: 1px solid black;'>
5	This document is a work in progress.<br>
6	The last update was on 2019-03-13.<br>
7	Check back later for updates.
8	</center></blockquote>
9
10	1.0 Introduction
11	----------------
	@@ -12,22 +12,22 @@
12
13	The Version Control Collaboration Protocol or VCCP is an attempt to make
14	it easier for developers to collaborate even when they are using different
15	version control systems.
16
17	For example, suppose Alice, the founder and principal maintainer

18	for the fictional "BambooCoffee" project, prefers using the
19	[Mercurial](https://www.mercurial-scm.org/) version control system,
20	but two of her clients, Bob and Cindy, know nothing but
21	[Git](https://www.git-scm.org/) and steadfastly refuse to
22	type any command that begins with "hg".
23	Further suppose that an important
24	collaborator, Dave, really prefers [Bazaar](bazaar.canonical.com/).
25	The VCCP is designed to make it relatively easy and painless
26	for Alice to set up Git and Bazaar mirrors of her Mercurial
27	repository so that Bob, Cindy, and Dave can all use the tools
28	they are most familiar with.
29
30	<center>![](diagram-1.jpg)</center>
31
32	Assuming all the servers speak VCCP (which is not the case at the
33	time of this writing, but we hope to encourage that for the future)
	@@ -38,11 +38,11 @@
38	### 1.1 Bidirectional Collaboration
39
40	The diagram above shows that all changes originate from Alice and
41	that Bob, Cindy, and Dave are only consumers. If Cindy wanted to
42	make a change to BambooCoffee, she would have to do that with a backchannel,
43	such as sending a patch via email to Alice and asking Alice to check
44	in the change.
45
46	But VCCP also supports bidirectional collaboration.
47
48	<center>![](diagram-2.jpg)</center>
	@@ -55,26 +55,27 @@
55	VCCP message back to Truth containing Cindy's changes. Truth would then
56	relay those changes over to Mirror-2 where Dave could see them as well.
57
58	### 1.2 Client-Mirror versus Server-Mirror
59
60	VCCP allows the mirrors to be set up as either clients or servers.
61
62	In the client-mirror approach, the mirrors periodically poll Truth asking
63	for changes. In the server-mirror approach, Truth sends changes to the
64	mirrors as they occur.
65
66	In the first example above, the implication was that the server-mirror
67	approach was being used. The Truth repository would take the initiative
68	to send changes to the mirrors. But it does not have to be that way.
69	Suppose Dave is unknown to Alice. Suppose he just likes Alice's work and
70	wants to keep his own mirror of her work for his own convenience.
71	Dave could set up
72	Mirror-2 as a client-mirror that periodically polls Truth for changes.
73
74	In the second example above, Truth and Mirror-1 could be configured to
75	have a Peer-to-Peer relationship rather than a Truth-to-Mirror relationship.
76	When new content arrives at Truth (because Alice did an "hg commit"),
77	Truth acts as a client to initiate a transfer of that new information
78	over to Mirror-1. When new content originates at Mirror-1 (because
79	Cindy did "git commit") then Mirror-1 acts as a client to send the new
80	content over to Truth. Or, they could set it up so that Truth is always
81	the client and it periodically polls Mirror-1 looking for new content
	@@ -81,10 +82,39 @@
82	coming from Cindy. Or, they could set it up so that Mirror-1 is always
83	the client and it periodically polls Truth looking for changes from Alice.
84
85	The point is that VCCP works in all of these scenarios.
86
87	### 1.3 Name Mapping
88
89	Different version control systems use different names to refer to the same
90	object. For example, Fossil names files using a SHA3-256 hash of the
91	unmodified file content, whereas Git uses a hardened-SHA1 hash of the file
92	content with an added prefix. Mercurial, Monotone, Bazaar, and others all
93	use different naming schemes, so that the same check-in in any particular
94	version control system will have a different name in all other version
95	control systems.
96
97	When mirroring a project between two version control systems, somebody
98	needs to keep track of the mapping between names.
99
100	For example, in the second diagram above, if Mirror-1 wants to tell Truth
101	that it has a new check-in "Q" that is a child of "P", then it has to send
102	the name of check-in "P". Does it send the Git-name of "P" or the
103	Mercurial-name of "P"? If Mirror-1 sends Truth the Git-name of "P" then
104	Truth must be the system that does the name mapping. If Mirror-1 sends
105	Truth the Mercurial-name of "P", then Mirror-1 is the system that maintains
106	the mapping.
107
108	The VCCP is designed such that both names for a
109	particular check-in or file can be sent. One of the collaborating systems
110	must still take responsibility for translating the names, but it does not
111	matter which system. As long as one or the other of the two systems
112	maintains a name mapping, the collaboration will work. Of course, it
113	also works for both systems to maintain the name map, and for maximum
114	flexibility, perhaps that should be the preferred approach.
115
116	2.0 Minimum Requirements
117	------------------------
118
119	The VCCP is modeled after the Git fast-export and fast-import protocol.
120	That is to say, VCCP thinks in terms of "check-ins" with each check-in
	@@ -101,11 +131,11 @@
131	a parent, but all the others should. Check-ins may also identify
132	zero or more "merge" parents, and zero or more "cherrypick" ancestors.
133	But the merges and cherrypicks can be ignored on systems that do not
134	support those concepts.
135
136	VCCP assumes that every distinct version of a file and every check-in has
137	a unique name. In Git and Mercurial, those names are SHA1 hashes
138	(computed in different ways). Fossil uses SHA3-256 hashes. I'm not sure
139	what Bazaar uses. VCCP does not care how the names are derived, as long
140	as they always uniquely identify the file or check-in.
141
	@@ -121,5 +151,349 @@
151
152	The VCCP is a client-server protocol.
153	A client formats a VCCP message and sends it to the server.
154	The server acts upon that message, formulates a reply, and sends
155	the reply back to the client.
156
157	It does not matter what transport mechanism is used to send the VCCP
158	messages from client to server and back again.
159	But for maximum flexibility, it is suggested that HTTP (or HTTPS) be
160	used. The client sends an HTTP request to the server with the
161	VCCP message as the request content and a MIME-type of "application/x-vccp".
162	The HTTP response is another VCCP message with the same MIME-type.
163	The use of HTTP means that firewalls and proxies are not an
164	impediment to collaboration and that collaboration connection information
165	can be described by a simple URL.
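
The suggested HTTP transport can be sketched as follows. This is only an illustration, not part of the specification: the POST method, the function name `build_vccp_request`, and the URL are assumptions. Only the "application/x-vccp" MIME-type comes from the text above.

```python
import urllib.request

VCCP_MIME = "application/x-vccp"  # MIME-type named in the text above


def build_vccp_request(url: str, message: bytes) -> urllib.request.Request:
    """Wrap the bytes of a VCCP message in an HTTP request.

    The use of POST is an assumption; the text only says that the
    message is sent as the request content.
    """
    return urllib.request.Request(
        url,
        data=message,
        headers={"Content-Type": VCCP_MIME},
        method="POST",
    )


# Placeholder URL and payload. Nothing is sent over the network until
# the request is passed to urllib.request.urlopen().
req = build_vccp_request("https://example.org/vccp", b"SQLite format 3\x00")
```

The reply body would be read back as bytes in the same way and handled as another VCCP message.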
166
167	There are provisions in the VCCP design to allow authentication
168	in the body of the VCCP message itself. Or, two systems can, by
169	mutual agreement, authenticate by some external mechanism.
170
171	### 3.1 Message Content
172
173	A single VCCP message round-trip can be a "push" if the client is sending
174	new check-in information to the server, or it can be a "pull" if the
175	client is polling the server to see if new check-in information is available
176	for download, or it can be both at once.
177
178	The basic design of a VCCP message is inspired by the Git fast-export
179	protocol, but with enhancements to support incremental updates and
180	bidirectional updates and to make the message format more robust and
181	portable and simpler to generate and parse. A single message may contain
182	multiple "files", check-in descriptions that reference those files, and "tag"
183	descriptions. A "message description" section contains authentication
184	data, error codes, and other meta-data. Every request and every
185	reply contains, at a minimum, a message description.
186
187	For a push, the request contains a message description with
188	authentication information, and the new files, check-ins, and tags
189	that are being pushed to the server. The reply to a push contains
190	success codes, and the names that the server assigned to the new objects,
191	so that the client can maintain a name map.
192
193	For pull, the request contains only a message description with
194	authentication information and a description of what content the
195	client desires to pull.
196	The reply to a pull contains the files, check-ins, and tags requested.
197
198	For a pull request, there is no mechanism (currently defined) for the
199	server to learn the client-side names for files and check-ins. Hence,
200	for a collaboration arrangement where the client polls the server for
201	updates, the client must maintain the name map.
202
203	### 3.2 Message Format Overview
204
205	The format of a VCCP message is an ordinary SQLite database file with
206	a two-table schema.
207	The DATA table contains file, check-in, and tag content and the
208	message description. The DATA.CONTENT column contains either raw
209	file content or check-in and tag descriptions formatted as JSON.
210	The message description is also JSON contained in a specially
211	designated row of the DATA table. The NAME table of the schema
212	is used to transmit name mappings. The NAME table serves the same
213	role as the "marks" file of git-fast-export.
214
215	### 3.3 Why Use A Database As The Message Format?
216
217	Why does a VCCP message consist of an SQLite database instead of a
218	bespoke format like git-fast-export?
219
220	1. Some of the content to be transferred will typically be binary.
221	Most projects have at least a few images or other binary files
222	in their tree somewhere. Other files will be pure text. Check-in
223	and tag descriptions will also be pure text (JSON). That means
224	that the VCCP message will be a mix of text and binary content.
225	An SQLite database file is a convenient and efficient way
226	to encapsulate both binary and text content into a single container
227	which is easily created and accessed.
228
229	2. Robust, cross-platform libraries for reading and writing SQLite database
230	files already exist on every computer. No custom parser or generator
231	code needs to be written, debugged, managed, or maintained.
232
233	3. The SQLite database file format is well defined, cross-platform
234	(32-bit, 64-bit, big-endian, and little-endian) and is endorsed
235	by the US Library of Congress as a recommended file format for
236	archival data storage.
237
238	4. Unlike a serial format (such as git-fast-export) which must
239	normally be written and read sequentially from beginning to end,
240	elements of an SQLite database can be constructed and read in any
241	order. This gives extra implementation flexibility to both readers
242	and writers.
243
244	### 3.4 Database Schema
245
246	The database schema for a VCCP message is as follows:
247
248	>
249	CREATE TABLE data(
250	id INTEGER PRIMARY KEY,
251	dclass INT,
252	sz INT,
253	calg INT,
254	cref INT,
255	content ANY
256	);
257	CREATE TABLE name(
258	nameid INT,
259	nametype INT,
260	name TEXT,
261	PRIMARY KEY(nameid,nametype)
262	) WITHOUT ROWID;
263
264	The DATA table holds the message description, the content of files, and JSON
265	descriptions of check-ins and tags. The NAME table is used to transmit
266	names. The DATA table corresponds to the body of a git-fast-export stream
267	and the NAME table corresponds to the "marks" file that is read and
268	written by the "--import-marks" and "--export-marks" options of the
269	"git fast-export" command.
270
271	Each file, check-in, and tag is normally a single distinct entry in
272	the DATA table. (Exception: very large files, greater than 1GB in size,
273	can be split across multiple DATA table rows - see below.) Entries in
274	the DATA table can occur in any order. It is not required that files
275	referenced by check-ins have a smaller DATA.ID value, for example.
276	Free ordering does not impede data extraction (see the algorithm descriptions
277	below) but it does give considerable freedom to the message generator
278	logic.
279
280	Each DATA row has a class identified by a small integer in the DATA.DCLASS
281	column.
282
283	>
284	\| 0: \| A check-in \|
285	\| 1: \| A file \|
286	\| 2: \| A tag \|
287	\| 3: \| The VCCP message description \|
288	\| 4: \| Application-defined-1 \|
289	\| 5: \| Application-defined-2 \|
290
291	Every well-formed VCCP message has exactly one message description entry
292	with DATA.ID=0 and DATA.DCLASS=3. No other DATA table entries should have
293	DATA.DCLASS=3.
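
As a rough sketch of the rules so far, the following Python builds an in-memory message with the schema above and checks the message-description rule. The helper names are hypothetical, and the `"{}"` description is a placeholder because the layout of the message description JSON is not covered in this section.

```python
import sqlite3

SCHEMA = """
CREATE TABLE data(
  id INTEGER PRIMARY KEY,
  dclass INT,
  sz INT,
  calg INT,
  cref INT,
  content ANY
);
CREATE TABLE name(
  nameid INT,
  nametype INT,
  name TEXT,
  PRIMARY KEY(nameid,nametype)
) WITHOUT ROWID;
"""

# DATA.DCLASS values from the table above
DCLASS_CHECKIN, DCLASS_FILE, DCLASS_TAG, DCLASS_DESCRIPTION = 0, 1, 2, 3


def new_message() -> sqlite3.Connection:
    """Create an empty VCCP message as an in-memory SQLite database."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def add_row(db, rowid, dclass, content):
    """Insert uncompressed content (calg=0, cref=NULL) as a DATA row."""
    size = len(content.encode("utf-8")) if isinstance(content, str) else len(content)
    db.execute(
        "INSERT INTO data(id, dclass, sz, calg, cref, content) "
        "VALUES (?, ?, ?, 0, NULL, ?)",
        (rowid, dclass, size, content),
    )


def has_valid_description(db) -> bool:
    """True if exactly one row has dclass=3 and that row has id=0."""
    rows = db.execute(
        "SELECT id FROM data WHERE dclass = ?", (DCLASS_DESCRIPTION,)
    ).fetchall()
    return rows == [(0,)]


db = new_message()
add_row(db, 1, DCLASS_FILE, b"hello\n")    # a file; row order is free
before = has_valid_description(db)         # description not added yet
add_row(db, 0, DCLASS_DESCRIPTION, "{}")   # placeholder description JSON
after = has_valid_description(db)
```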
294
295	The application-defined values are reserved for extended uses of the
296	VCCP message format. In particular, there are plans to enhance
297	Fossil so that it uses VCCP as its sync protocol, replacing its
298	current bespoke protocol. But Fossil needs to send information about other
299	kinds of objects, such as wiki pages and tickets, that are not known
300	to Git and most other version control systems. A few
301	"application defined" values are available at strategic points in
302	the message format description to accommodate these extended use cases.
303	New application-defined values may be defined in the future.
304	Portable VCCP messages between different version control systems
305	should never use the application-defined values.
306
307	The DATA.CONTENT field can be either text or binary, as appropriate.
308	For files, the DATA.CONTENT is binary. For check-ins and tags and for
309	the message description, the DATA.CONTENT is a text JSON object.
310
311	The DATA.CONTENT field can optionally be compressed. The DATA.SZ field
312	is the uncompressed size of the content in bytes. The compression method
313	is determined by the DATA.CALG field:
314
315	>
316	\| 0: \| No compression \|
317	\| 1: \| ZLib compression \|
318	\| 2: \| Multi-blob \|
319	\| 3: \| Application-defined-1 \|
320	\| 4: \| Application-defined-2 \|
321
322	The "multi-blob" compression method means that the content is the
323	concatenation of the content in other DATA table rows. This
324	allows for content that exceeds the 1GB size limit for an SQLite
325	BLOB column. If the DATA.CALG field is 2, then DATA.CONTENT will
326	be a JSON array of integer values, where each integer is the DATA.ID
327	of another DATA table entry that contains part of the content.
328	The actual data content is the concatenation of the other DATA table
329	entries. The secondary DATA table entries can also be compressed,
330	though not with multi-blob. In other words, the multi-blob
331	compression method may not be nested. This effectively limits the
332	maximum size of a file in the VCCP to the maximum size of an SQLite
333	database, which is 140 terabytes.
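
A reader might recover the content of a DATA row along the following lines. This is a sketch under stated assumptions: "ZLib compression" is taken to mean a stream that Python's `zlib.decompress` accepts, DATA.SZ of a multi-blob row is taken to be the total size of the assembled content, and the DCLASS used for the secondary rows in the demo is arbitrary. The function name is hypothetical.

```python
import json
import sqlite3
import zlib

CALG_NONE, CALG_ZLIB, CALG_MULTI = 0, 1, 2  # DATA.CALG values from the table above


def read_content(db, rowid, nested=False) -> bytes:
    """Return the uncompressed content of the DATA row with the given id."""
    row = db.execute(
        "SELECT sz, calg, content FROM data WHERE id = ?", (rowid,)
    ).fetchone()
    if row is None:
        raise KeyError(f"no DATA row with id {rowid}")
    sz, calg, content = row
    if isinstance(content, str):
        content = content.encode("utf-8")
    if calg == CALG_NONE:
        out = content
    elif calg == CALG_ZLIB:
        out = zlib.decompress(content)
    elif calg == CALG_MULTI:
        if nested:
            raise ValueError("multi-blob may not be nested")
        # The content is a JSON array of DATA.ID values, in order.
        parts = json.loads(content)
        out = b"".join(read_content(db, part, nested=True) for part in parts)
    else:
        raise ValueError(f"unsupported compression method {calg}")
    if len(out) != sz:  # assumes sz describes the assembled content
        raise ValueError("size mismatch")
    return out


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE data(id INTEGER PRIMARY KEY, dclass INT, sz INT, "
    "calg INT, cref INT, content ANY)"
)
rows = [
    (1, 1, 11, CALG_MULTI, None, "[2, 3]"),               # the logical file
    (2, 1, 6, CALG_NONE, None, b"hello "),                # first part, raw
    (3, 1, 5, CALG_ZLIB, None, zlib.compress(b"world")),  # second part, zlib
]
db.executemany("INSERT INTO data VALUES (?, ?, ?, ?, ?, ?)", rows)
whole = read_content(db, 1)
```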
334
335	Portable VCCP files should only use compression methods 0, 1, and 2,
336	and preferably only method 0 (no compression). But application-defined
337	compression methods are available for proprietary uses of the
338	VCCP message format. The DATA.CREF field is auxiliary data intended
339	for use with these application-defined compression methods. In
340	particular, DATA.CREF is intended to be the DATA.ID of a "base"
341	entry for delta-compression methods. For a portable VCCP file,
342	the DATA.CREF field should always be NULL.
343
344	The DATA.ID field provides an integer identifier for files and
345	check-ins. The scope of that name is the single VCCP message
346	in which the DATA table entry appears, however. The NAME table
347	is used to provide a mapping from these internal integer names
348	to the persistent global hash names of the various version
349	control systems.
350
351	A single object can have different names, depending on which
352	version control system stores it. For this reason, the NAME
353	table is designed to allow storage of multiple names for the
354	same object. If NAME.NAMETYPE is 0, that means that the name
355	is appropriate for use on the client. If NAME.NAMETYPE is 1,
356	that means the name is appropriate for use on the server.
357
358	To simplify the implementation of VCCP on diverse systems,
359	names should be sent as text. If the names for a particular system
360	are binary hashes, then the NAME table should store them as
361	the hexadecimal representation.
362
363	#### 3.4.1 NAME Table Example 1
364
365	Suppose a client is pushing a new check-in to the server and the
366	check-in text is stored in the DATA.ID=1 row. Then the request
367	should contain a NAME table row with NAME.NAMEID=1 (to match the
368	DATA table ID value) and NAME.NAMETYPE=0 (because client names
369	have NAMETYPE 0) and with the name of that check-in according to
370	the client stored in NAME.NAME. The server will recode the
371	check-in according to its own format, and store the server-side
372	name in a new NAME table row with NAME.NAMEID=1 and NAME.NAMETYPE=1.
373	The server then includes the complete NAME table in its reply
374	back to the client. In this way, the client is able to discover
375	the name of the check-in on the server. The server can also
376	remember the client check-in name, if desired.
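
The exchange in this example can be sketched with the NAME table alone. The two hexadecimal names and the helper functions below are hypothetical placeholders, not real hashes or part of the specification.

```python
import sqlite3

NAMETYPE_CLIENT, NAMETYPE_SERVER = 0, 1  # NAME.NAMETYPE values

CLIENT_NAME = "a1b2c3"  # hypothetical client-side name of the check-in
SERVER_NAME = "9f8e7d"  # hypothetical name assigned by the server


def server_assign_name(db, nameid, server_name):
    """Server side: record the server's name for an object in the reply."""
    db.execute(
        "INSERT INTO name(nameid, nametype, name) VALUES (?, ?, ?)",
        (nameid, NAMETYPE_SERVER, server_name),
    )


def name_pairs(db):
    """Return {client_name: server_name} for objects that have both names."""
    return dict(
        db.execute(
            "SELECT c.name, s.name FROM name AS c "
            "JOIN name AS s ON s.nameid = c.nameid "
            "WHERE c.nametype = 0 AND s.nametype = 1"
        )
    )


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE name(nameid INT, nametype INT, name TEXT, "
    "PRIMARY KEY(nameid,nametype)) WITHOUT ROWID"
)
# Request: the client names the check-in stored in the DATA.ID=1 row.
db.execute(
    "INSERT INTO name(nameid, nametype, name) VALUES (1, ?, ?)",
    (NAMETYPE_CLIENT, CLIENT_NAME),
)
# Reply: the server adds its own name for the same NAMEID.
server_assign_name(db, 1, SERVER_NAME)
mapping = name_pairs(db)
```

Either side can merge `mapping` into its persistent name map after the round-trip.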
377
378	### 3.5 Check-in JSON Format
379
380	Check-ins are described by DATA table rows where the content is a
381	single JSON object, as follows:
382
383	>
384	{
385	"time": DATETIME, -- Date and time of the check-in
386	"comment": TEXT, -- The original check-in comment
387	"mimetype": TEXT, -- The mimetype of the comment text
388	"branch": TEXT, -- Branch this check-in belongs to
389	"from": INT, -- NAME.NAMEID for the primary parent
390	"merge": [INT], -- Merge parents
391	"cherrypick": [INT], -- Cherrypick merges
392	"author": { -- Author of the change
393	"name": TEXT, -- Name or handle
394	"email": TEXT, -- Email address
395	"time": DATETIME -- Override for $.time
396	},
397	"committer": { -- Committer of the change
398	"name": TEXT, -- Name or handle
399	"email": TEXT, -- Email address
400	"time": DATETIME -- Override for $.time
401	},
402	"tag": [{ -- Tags and properties for this check-in
403	"name": TEXT, -- tag name
404	"value": TEXT, -- value (if it is a property)
405	"delete": 1, -- If present, delete this tag
406	"propagate": 1 -- Means propagate to descendants
407	}],
408	"reset": 1, -- All files included, not just changes
409	"file": [{ -- File in this check-in
410	"fname": TEXT, -- filename
411	"id": INT, -- DATA.ID or NAME.NAMEID. Omitted to delete
412	"mode": TEXT, -- "x" for executable. "l" for symlink
413	"oldname": TEXT -- Prior name if the file is renamed
414	}]
415	}
416
417	The $.time element defines the moment in time when the check-in
418	occurred. The $.time field is required. Times are always Coordinated
419	Universal Time (UTC). DATETIME can be represented in multiple ways:
420
421	1. If the DATETIME is an integer, then it is the number of seconds
422	since 1970 (also known as "unix time").
423
424	2. If the DATETIME is text, then it is ISO8601 as follows:
425	"YYYY-MM-DD HH:MM:SS.SSS". The fractional seconds may be
426	omitted.
427
428	3. If the DATETIME is a real number, then it is the fractional
429	julian day number.
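
The three DATETIME representations can be normalized with a small helper such as the one below. This is a sketch: the function name is hypothetical, and it relies on the JSON parser keeping integers and real numbers distinct, as Python's `json` module does.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
UNIX_EPOCH_JULIAN_DAY = 2440587.5  # julian day number of 1970-01-01 00:00 UTC


def parse_datetime(value) -> datetime:
    """Convert a VCCP DATETIME value into a timezone-aware UTC datetime."""
    if isinstance(value, bool):  # bool is a subclass of int in Python
        raise TypeError("DATETIME may not be a boolean")
    if isinstance(value, int):   # seconds since 1970 ("unix time")
        return UNIX_EPOCH + timedelta(seconds=value)
    if isinstance(value, float):  # fractional julian day number
        return UNIX_EPOCH + timedelta(days=value - UNIX_EPOCH_JULIAN_DAY)
    if isinstance(value, str):   # "YYYY-MM-DD HH:MM:SS" with optional ".SSS"
        fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in value else "%Y-%m-%d %H:%M:%S"
        return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
    raise TypeError(f"unsupported DATETIME value: {value!r}")
```

Julian day values lose some precision when converted through floating point, so a receiver that needs exact seconds may want to round the result.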
430
431	The $.comment element is the check-in comment. The $.comment field is
432	required. The mimetype for $.comment defaults to "text/plain" but can
433	be some other MIME-type if the $.mimetype field is present.
434
435	The $.branch element defines the name of the branch that this check-in
436	belongs to. If omitted, the branch of the check-in is the same as
437	the branch of its primary parent check-in.
438
439	The $.from element defines the primary parent check-in. Every
440	check-in other than the first check-in of the project has a primary
441	parent. The integer value of the $.from element is either the
442	DATA.ID value for another check-in in the same VCCP message or is
443	the NAME.NAMEID value for a NAME table entry that identifies the
444	parent check-in, or both. If the information sender is relying on the
445	other side to do name mapping, then only the local name will be provided.
446	But if the information sender has a name map, it should provide both
447	its local name and the remote name for the check-in, so that the receiver
448	can update its name map.
449
450	The $.merge element is an array of integers for additional check-ins
451	that are merged into the current check-in. The $.cherrypick element
452	is an array of integer values that are check-ins that are cherrypick-merged
453	into the current check-in. Systems that do not record cherrypick merges
454	can ignore the $.cherrypick value.
455
456	The $.author and $.committer elements define who created the check-in.
457	The $.committer element is required. The $.author element may be omitted
458	in the common case where the author and committer are the same. The
459	$.committer.time and $.author.time subelements should only be included
460	if they are different from $.time.
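
As an illustration of the required fields described so far ($.time, $.comment, and $.committer), a receiver could run a check like the one below before processing a check-in. The function name and the sample values are hypothetical.

```python
import json

REQUIRED_FIELDS = ("time", "comment", "committer")  # stated as required above


def missing_fields(checkin_json: str) -> list:
    """Return the required top-level fields absent from a check-in object."""
    obj = json.loads(checkin_json)
    return [key for key in REQUIRED_FIELDS if key not in obj]


# A minimal check-in; $.author is omitted because it equals the committer.
sample = json.dumps(
    {
        "time": "2019-03-13 15:34:00",
        "comment": "Fix a typo in the introduction",
        "committer": {"name": "alice", "email": "alice@example.org"},
        "from": 7,
        "file": [{"fname": "intro.md", "id": 8}],
    }
)
problems = missing_fields(sample)
```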
461
462	The $.reset element, if present, should have an integer value of "1".
463	The presence of the $.reset element is a flag that affects the meaning
464	of the $.file element.
465
466	The $.file element is an array of JSON objects that define the files
467	associated with the check-in. If the $.reset flag is present, then there
468	must be one entry in $.file for every file in the check-in. If the
469	$.reset flag is omitted (the common case) then there is one entry
470	in $.file for every file that changes relative to the primary parent
471	in $.from. If there is no primary parent, then the presence of the
472	$.reset flag is assumed even if it is omitted.
473
474	The $.file[].fname element is the name of the file.
475	The $.file[].id element corresponds to a DATA.ID or NAME.NAMEID
476	that is the content of the file. If the file is being removed
477	by this check-in, then the $.file[].id element is omitted.
478	The $.file[].mode element is text containing one or more ASCII
479	characters. If the "x" character is included in $.file[].mode
480	then the file is executable. If the "l" character is included
481	in $.file[].mode then the file is a symbolic link (and the content
482	of the file is the target of the link). The $.file[].mode may
483	be blank or omitted for a normal read/write file. If a file
484	is being renamed, the $.file[].oldname field may be included
485	to show the previous name of the file, if that information is
486	available.
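
The $.reset and $.file rules above amount to the following procedure for computing the complete file list of a check-in. This is a sketch with hypothetical names. It treats $.file[].oldname as informational only, which is an assumption: the text does not say whether a rename implicitly removes the old name.

```python
def apply_files(parent_files, checkin):
    """Compute the full {fname: (id, mode)} map for a check-in.

    parent_files is the map for the primary parent, or None if there is
    no primary parent. checkin is the decoded check-in JSON object.
    """
    has_parent = "from" in checkin and parent_files is not None
    # With $.reset, or without a primary parent, $.file is the complete list.
    full_list = bool(checkin.get("reset")) or not has_parent
    files = {} if full_list else dict(parent_files)
    for entry in checkin.get("file", []):
        if "id" in entry:
            # Added or changed file; mode may be blank or omitted.
            files[entry["fname"]] = (entry["id"], entry.get("mode", ""))
        else:
            # An omitted id means the file is removed by this check-in.
            files.pop(entry["fname"], None)
    return files


parent = {"a.c": (10, ""), "b.sh": (11, "x")}
delta = {
    "from": 5,
    "file": [
        {"fname": "a.c", "id": 20},               # changed
        {"fname": "b.sh"},                        # removed
        {"fname": "run", "id": 21, "mode": "x"},  # added, executable
    ],
}
result = apply_files(parent, delta)
```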
487
488	Some version control systems allow tags and properties to be
489	associated with a check-in. The $.tag element supports this
490	feature. Each element of the $.tag array is a separate tag
491	or property. If the $.tag[].propagate field exists and has
492	a value of "1", then the tag/property propagates to all
493	non-merge children. If the $.tag[].delete field exists and
494	has a value of "1", then a propagating tag or property with
495	the given name that was set by some ancestor check-in is
496	stopped and omitted from this check-in. Version control
497	systems that do not support tags and/or properties on check-ins
498	or that do not support tag propagation can ignore all of these
499	attributes.
500
