Fossil SCM

Reworked my notes regarding 'reconstruct' based on my reading of content.c, checkin.c, and manifest.c

aku 2007-08-28 05:01 trunk
Commit 10062df2fa9b02069f0d3c90dd1d990717a921a9
1 file changed +65 -38
--- ci_fossil.txt
+++ ci_fossil.txt
@@ -58,43 +58,70 @@
 
 ----------------------------------
 
 reconstruct
 
-It is currently not clear to me when and how fossil does
-delta-compression. Does it use deltas, or reverse deltas, or a
-combination of both? And when does it generate the deltas?
-
-The trivial solution is that it uses deltas, i.e. the first revision
-of a file is stored without delta compression and all future versions
-are deltas from that, and the delta is generated when the new revision
-is committed. With the obvious disadvantage that newer revisions take
-more and more time to be decompressed as the set of deltas to apply
-grows.
-
-And during xfer it simply sends the deltas as is, making for easy
-integration on the remote side.
-
-However reconstruct, with that method, sees initially only an unordered
-collection of files, some of which may be manifests, others data
-files, and if it imports them in a random order it might find that
-file X, which was imported first and therefore has no delta
-compression, is actually somewhere in the middle of a line of
-revisions and should be delta-compressed, and then it has to find
-the predecessor and do the compression, etc.
-
-So depending on how the internal logic of delta-compression is done,
-reconstruct might need more logic to help the lower level achieve good
-compression.
-
-Like, in a first pass determine which files are manifests, and read
-enough of them to determine their parent/child structure, and in a
-second pass actually import them, in topological order, with all
-relevant non-manifest files for a manifest imported at that time
-too. With that the underlying engine would see the files basically in
-the same order as generated by a regular series of commits.
-
-Problems for reconstruct: Files referenced, but not present, and,
-conversely, files present, but not referenced. This can be done as part
-of the second pass, aborting when a missing file is encountered, with
-(un)marking of used files, so that at the end we know the unused
-files. Could also be a separate pass between first and second.
+The core logic for handling content is in the file "content.c", in
+particular the functions 'content_put' and 'content_deltify'. One of
+the main users of these functions is in the file "checkin.c", see the
+function 'commit_cmd'.
+
+The logic is clear. The new and modified files are simply stored without
+delta-compression, using 'content_put'. And should fossil have an id
+for the _previous_ revision of the committed file, it uses
+'content_deltify' to convert the already stored data for that revision
+into a delta with the just stored new revision as origin.
+
+In other words, fossil produces reverse deltas, with leaf revisions
+stored just zip-compressed (plain) and older revisions using both zip-
+and delta-compression.
+
+Of note is that the underlying logic in 'content_deltify' gives up on
+delta compression if the involved files are either not large enough,
+or if the achieved compression factor is not high enough. In that
+case the old revision of the file is left plain.
+
+The scheme can thus be called a 'truncated reverse delta'.
+
+The manifest is created and committed after the modified files. It
+uses the same logic as for the regular files. The new leaf is stored
+plain, and the storage of the parent manifest is modified to be a delta
+with the current one as origin.
+
+Further note that for a checkin of a merge result only the primary
+parent is handled in that way. The secondary parent, the one merged
+into the current revision, is not touched. I.e. from the storage layer
+point of view that revision is still a leaf and its data is kept
+stored plain, not delta-compressed.
+
+
+
+Now the "reconstruct" can be done like so:
+
+- Scan the files in the indicated directory, and look for a manifest.
+
+- When a manifest has been found, parse its contents and follow the
+  chain of parent links to locate the root manifest (no parent).
+
+- Import the files referenced by the root manifest, then the manifest
+  itself. This can be done using a modified form of 'commit_cmd'
+  which does not have to construct a manifest on its own from vfile,
+  vmerge, etc.
+
+- After that, recursively apply the import of the previous step to the
+  children of the root, and so on.
+
+For an incremental "reconstruct" the collection of files would not be
+a single tree with a root, but a forest, and the roots to look for are
+not manifests without a parent, but those whose parent is already
+present in the repository. After one such root has been found and
+processed, the unprocessed files have to be searched further for more
+roots, and only when no more are found will the remaining files be
+considered superfluous.
+
+We can use the functions in "manifest.c" for the parsing and for
+following the parental chain.
+
+Hm. But we have no direct child information. So the above algorithm
+has to be modified: we have to scan all manifests before we start
+importing, and we have to create a reverse index, from manifest to
+children, so that we can perform the import from root to leaves.
 
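The commit sequence described in the new notes (store the new revision plain via 'content_put', then turn the previous revision into a reverse delta via 'content_deltify') can be modelled with a small sketch. This is an illustrative Python model, not fossil's C code: the in-memory 'store' dict and 'commit_file' helper are invented, only the function names mirror content.c/checkin.c.

```python
# Illustrative model of fossil's reverse-delta storage. Records are
# either ("plain", data) or ("delta", origin_id). Real fossil keeps
# zlib-compressed blobs in an SQLite database; this dict is a stand-in.

store = {}
next_id = 0

def content_put(data):
    """Store a new revision plain; return its record id."""
    global next_id
    next_id += 1
    store[next_id] = ("plain", data)
    return next_id

def content_deltify(old_id, new_id):
    """Re-store the old revision as a delta whose origin is the new
    revision (a reverse delta), unless it is already a delta."""
    kind, _ = store[old_id]
    if kind == "plain":
        store[old_id] = ("delta", new_id)

def commit_file(prev_id, new_data):
    """The commit_cmd pattern: put the new content plain, then
    deltify the previous revision against it (if one exists)."""
    new_id = content_put(new_data)
    if prev_id is not None:
        content_deltify(prev_id, new_id)
    return new_id

# Three revisions of one file: only the leaf stays plain.
r1 = commit_file(None, b"v1")
r2 = commit_file(r1, b"v2")
r3 = commit_file(r2, b"v3")
assert store[r3] == ("plain", b"v3")  # leaf revision kept plain
assert store[r1] == ("delta", r2)     # older revisions are reverse deltas
assert store[r2] == ("delta", r3)
```

Note how a merge's secondary parent never appears as 'prev_id' here, which is exactly why it stays a plain-stored leaf in the storage layer.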
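The "truncated" part of the scheme, where 'content_deltify' gives up and leaves the old revision plain, amounts to a bail-out check like the following sketch. The thresholds (50 bytes minimum size, 25% minimum saving) are invented for illustration; fossil's actual cut-offs live in content.c.

```python
def should_deltify(old_data, delta):
    """Mimic content_deltify's bail-out: keep the old revision plain
    when the file is too small or the delta saves too little.
    Thresholds are hypothetical, not fossil's real values."""
    if len(old_data) < 50:                  # not large enough
        return False
    if len(delta) > len(old_data) * 0.75:   # compression factor too low
        return False
    return True

assert not should_deltify(b"x" * 10, b"d" * 5)    # too small: stays plain
assert not should_deltify(b"x" * 100, b"d" * 90)  # poor ratio: stays plain
assert should_deltify(b"x" * 100, b"d" * 20)      # worthwhile delta
```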
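The two-pass "reconstruct" ordering from the notes (scan all manifests first, build the missing reverse index from parent to children, then import from the root toward the leaves) can be sketched as below. The data layout is hypothetical: manifests are reduced to an id -> parent-id map, standing in for what manifest.c's parser would extract.

```python
from collections import deque

def reconstruct_order(manifests):
    """manifests: dict mapping manifest id -> parent id (None for a root).
    Returns the ids in import order, root first, so every parent is
    imported before its children -- the same order a regular series
    of commits would have produced."""
    # Pass 1: build the reverse index, parent -> children, and find roots.
    children = {}
    roots = []
    for mid, parent in manifests.items():
        if parent is None:
            roots.append(mid)
        else:
            children.setdefault(parent, []).append(mid)
    # Pass 2: breadth-first walk from the root(s) down to the leaves.
    order = []
    queue = deque(sorted(roots))
    while queue:
        mid = queue.popleft()
        order.append(mid)
        queue.extend(sorted(children.get(mid, [])))
    return order

# A small history: m1 is the root, m2 and m3 branch off it, m4 follows m2.
history = {"m1": None, "m2": "m1", "m3": "m1", "m4": "m2"}
assert reconstruct_order(history) == ["m1", "m2", "m3", "m4"]
```

For the incremental case, the root test would change from "parent is None" to "parent already present in the repository", and any ids never reached by the walk are the superfluous files.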
