Fossil SCM

Updated my work list, added first notes about 'cvs import' functionality.

aku 2007-08-28 03:34 trunk
Commit 103c397e4b36b1ddfb4afe52d120489361cac9c8
+50
--- a/ci_cvs.txt
+++ b/ci_cvs.txt
@@ -0,0 +1,50 @@
+ M {Wed Nov 22 09:28:49 AM PST 2000} ericm 1.7 tcllib/modules/ftpd/ftpd.tcl
+ files: 2
+ delta: 0
+ range: 0 seconds
+ =============================/cmsg
+ M {Wed Nov 29 02:14:33 PM PST 2000} ericm 1.3 tcllib/aclocal.m4
+ files: 1
+ delta:
+ range: 0 seconds
+ =============================/cmsg
+ M {Sun Feb 04 12:28:35 AM PST 2001} ericm 1.9 tcllib/modules/mime/ChangeLog
+ M {Sun Feb 04 12:28:35 AM PST 2001} ericm 1.12 tcllib/modules/mime/mime.tcl
+ files: 2
+ delta: 0
+ range: 0 seconds
+
+All csets modify files which already have several revisions. We have
+no csets from before that in the history, but these csets are in the
+RCS files.
+
+I wonder, is SF maybe removing old entries from the history when it
+grows too large?
+
+This also affects incremental import ... I cannot assume that the
+history always grows. It may shrink ... I cannot keep an offset; I will
+have to record the time of the last entry, or even the full entry
+processed last, to allow me to skip ahead to anything not known yet.
+
+I might have to try to implement the algorithm outlined below,
+matching the revision trees of the individual RCS files to each other
+to form the global tree of revisions. Maybe we can use the history to
+help in the matchup, for the parts where we do have it.
+
+Wait. This might be easier ... Take the delta information from the RCS
+files and generate a fake history ... Actually, this might even allow
+us to create a total history ... No, not quite: the merge entries the
+actual history may contain will be missing. These we can mix in from
+the actual history, as much of it as we have.
+
+Still, let's try that: a fake history, and then run this script on it
+to see if/where there are differences.
+
+===============================================================================
+
+
+Notes about CVS import.
+
+- Problem: CVS does not really track changesets, but only individual
+ revisions of files. To recover changesets it is necessary to look at
+ author, branch, timestamp information, and the commit
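The incremental-import concern above (the history may shrink, so a stored byte offset is useless) suggests remembering the full last-processed entry and scanning forward until it reappears. A minimal sketch in Python; the entry strings and helper name are illustrative assumptions, not anything CVS defines:

```python
def entries_after(history, last_seen):
    """Return the entries that follow last_seen in history.

    history   -- list of history entry strings, oldest first
    last_seen -- the full entry processed in the previous run,
                 or None on the very first run
    If last_seen is no longer present (the history shrank or was
    pruned), fall back to reprocessing everything.
    """
    if last_seen is None:
        return history
    try:
        # Find the LAST occurrence, in case an entry repeats.
        idx = len(history) - 1 - history[::-1].index(last_seen)
    except ValueError:
        # Entry vanished: history was truncated; reprocess all.
        return history
    return history[idx + 1:]

old = ["M 2000 ericm 1.7 ftpd.tcl", "M 2000 ericm 1.3 aclocal.m4"]
new = old + ["M 2001 ericm 1.9 ChangeLog"]
print(entries_after(new, old[-1]))  # → ['M 2001 ericm 1.9 ChangeLog']
```

Storing the full entry rather than a timestamp also disambiguates several commits that share the same second.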
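The changeset-recovery problem described at the end of ci_cvs.txt can be attacked by sorting the per-file revisions by time and grouping runs that share author, branch, and (presumably) commit message, splitting whenever the timestamps drift too far apart. A hedged sketch; the `Rev` record fields and the 300-second window are assumptions, not anything CVS specifies:

```python
from dataclasses import dataclass

@dataclass
class Rev:
    time: int      # commit time, seconds since the epoch
    author: str
    branch: str
    msg: str       # commit message text
    file: str
    rev: str       # per-file revision, e.g. "1.7"

def group_changesets(revs, fuzz=300):
    """Group individual file revisions into candidate changesets.

    Revisions with the same (author, branch, msg) whose timestamp lies
    within `fuzz` seconds of the previous member join that changeset;
    anything else starts a new one.
    """
    csets = []
    for r in sorted(revs, key=lambda r: r.time):
        if (csets
                and csets[-1][0].author == r.author
                and csets[-1][0].branch == r.branch
                and csets[-1][0].msg == r.msg
                and r.time - csets[-1][-1].time <= fuzz):
            csets[-1].append(r)
        else:
            csets.append([r])
    return csets
```

With this, the two mime files committed at the same second above would fall into one changeset, and the ftpd commit into another.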
--- a/ci_fossil.txt
+++ b/ci_fossil.txt
@@ -0,0 +1,49 @@
+
+To perform CVS imports for fossil we need at least the ability to
+parse CVS files, i.e. RCS files, with slight differences.
+
+For the general architecture of the import facility we have two major
+paths to choose between.
+
+One is to use an external tool which processes a CVS repository and
+drives fossil through its CLI to insert the found changesets.
+
+The other is to integrate the whole facility into the fossil binary
+itself.
+
+I dislike the second choice. It may be faster, as the implementation
+can use all internal functionality of fossil to perform the import;
+however, it will also bloat the binary with functionality not needed
+most of the time. This becomes especially obvious if more importers
+are to be written, like for monotone, bazaar, mercurial, bitkeeper,
+git, SVN, Arc, etc. Keeping all this out of the core fossil binary is
+IMHO more beneficial in the long term, also from a maintenance point
+of view. The tools can evolve separately. This is especially important
+for CVS, as it will have to deal with lots of broken repositories, all
+different.
+
+However, nothing speaks against looking for common parts in all
+possible import tools, and having these in the fossil core, as a
+general backend all importer macollection of files, some of which may be manifests, others are data
+files, and if it imports them in a random order it might find that
+file X, which was imported first and therefore has no delta
+compression, is actually somewhere in the middle of a line of
+revisions and should be delta-compressed, and then it has to find out
+the predecessor and do the compression, etc.
+
+So depending on how the internal logic of delta-compression is done,
+reconstruct might need more logic to help the lower level achieve good
+compression.
+
+Like, in a first pass determine which files are manifests, and read
+enough of them to determine their parent/child structure, and in a
+second pass actually import them, in topological order, with all
+relevant non-manifest files for a manifest imported at that time
+too. With that the underlying engine would see the files basically in
+the same order as generated by a regular series of commits.
+
+Problems for reconstruct: Files referenced, but not present, and,
+conversely, files present, but not referenced. This can be done as part
+of the second pass, aborting when a missing file is encountered, with
+(un)marking of used files, and at the end we know the unused
+files. Could also be a separate pass between first and second.
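The two-pass scheme sketched above (classify manifests first, then import in topological order) amounts to a topological sort of the manifest parent/child graph. A rough illustration, assuming the hypothetical parent links have already been extracted from the manifests in the first pass; no cycle detection, since a commit DAG should have none:

```python
def import_order(parents):
    """Topologically order manifests so parents precede children.

    parents -- dict mapping manifest id -> list of parent ids
    Returns ids in an order a delta-compressing backend can consume:
    each manifest appears only after every manifest it derives from.
    """
    order, seen = [], set()

    def visit(m):
        if m in seen:
            return
        seen.add(m)
        for p in parents.get(m, []):
            visit(p)          # emit ancestors first
        order.append(m)

    for m in parents:
        visit(m)
    return order

# Tiny lineage: m1 <- m2 <- m3, with m3 also merging m1 directly.
print(import_order({"m1": [], "m2": ["m1"], "m3": ["m2", "m1"]}))
# → ['m1', 'm2', 'm3']
```

Feeding files to the backend in this order mimics a regular series of commits, which is exactly what the note above asks for.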
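The consistency problems named for reconstruct (files referenced but not present, and files present but not referenced) reduce to two set differences, which fits the suggested extra pass between classification and import. A sketch with assumed inputs:

```python
def check_files(referenced, present):
    """Compare the files the manifests reference against the files
    actually found in the repository.

    Returns (missing, unused): referenced-but-absent files, on which
    an import must abort, and present-but-unreferenced files, which
    only need to be reported at the end.
    """
    referenced, present = set(referenced), set(present)
    return sorted(referenced - present), sorted(present - referenced)

missing, unused = check_files(
    referenced=["mime.tcl", "ChangeLog", "ftpd.tcl"],
    present=["mime.tcl", "ChangeLog", "README"])
print(missing)  # → ['ftpd.tcl']
print(unused)   # → ['README']
```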
+2 -2
--- todo-ak.txt
+++ todo-ak.txt
@@ -11,15 +11,15 @@
 
 * Think about exposure of functionality as libraries, i.e. Tcl
   packages. Foundations like delta, etc. first, work up to
   higher-levels.
 
-* Document delta format, delta encoder.
-
 * Document the merge algorithm.
 
 * Document the xfer protocol.
+
+* CVS import. Testcases: Tcl, Tk, Tcllib.
 
 Questions
 
 * In the timeline seen at http://fossil-scm.hwaci.com/fossil/timeline
   the manifest uuids are links to pages providing additional
