|
1
|
# Delta Manifests |
|
2
|
|
|
3
|
<div class="sidebar">Do not confuse these with the core [Fossil delta |
|
4
|
format](./delta_format.wiki). This document describes an optional |
|
5
|
feature not enabled by default.</div> |
|
6
|
|
|
7
|
This article describes "delta manifests," a special-case form of |
|
8
|
checkin manifest which is intended to take up far less space than |
|
9
|
a normal checkin manifest, in particular for repositories with |
|
10
|
many files. We'll see, however, that the space savings, if indeed |
|
11
|
there are any, come with some caveats. |
|
12
|
|
|
13
|
This article assumes that the reader is at least moderately familiar |
|
14
|
with Fossil's [artifact file format](./fileformat.wiki), in particular |
|
15
|
the structure of checkin manifests, and it won't make much sense to |
|
16
|
readers unfamiliar with that topic. |
|
17
|
|
|
18
|
# Background and Motivation of Delta Manifests |
|
19
|
|
|
20
|
A checkin manifest includes a list of every file in that checkin. A |
|
21
|
moderately-sized project can easily have a thousand files, and every |
|
22
|
checkin manifest will include those thousand files. As of this writing |
|
23
|
Fossil's own checkins contain 989 files and the manifests are 80kb |
|
24
|
each. Thus a checkin which changes only 2 bytes of source code |
|
25
|
ostensibly costs another 80kb of storage for the manifest for that |
|
26
|
change. |
|
27
|
|
|
28
|
Delta manifests were conceived as a mechanism to help combat that |
|
29
|
storage overhead. |
|
30
|
|
|
31
|
# Makeup of a Delta Manifest |
|
32
|
|
|
33
|
A delta manifest is structured like a normal manifest (called a |
|
34
|
"baseline" manifest) except that it has *two types of parents*: the |
|
35
|
P-card which is part of (nearly) every manifest and a so-called |
|
36
|
baseline (denoted by a B-card). The P-card tells us which artifact(s) |
|
37
|
is/are the parents for purposes of the SCM version DAG. The B-card |
|
38
|
tells us which manifest to use as a basis for this delta. The B-card |
|
39
|
need not be, and often is not, the same as the P-card. Here's an |
|
40
|
example: |
|
41
|
|
|
42
|
``` |
|
43
|
B c04ce8aaf1170966c6f8abcce8b57e72a0fa2b81 |
|
44
|
C Minor\sdoc\supdates... |
|
45
|
D 2021-03-11T18:56:24.686 |
|
46
|
F bindings/s2/shell_extend.c 6d8354c693120a48cfe4798812cd24499be174b2 |
|
47
|
<15 F-cards snipped for brevity> |
|
48
|
F src/repo.c 2f224cb0e59ccd90ba89c597e40b8e8d87506638 |
|
49
|
P 61d3e64e6fb1a93d4a7b0182e4c6b94d178d66d9 |
|
50
|
R a84ec2e8e1eb37ff0d94cac262795e23 |
|
51
|
U stephan |
|
52
|
Z 536e6d26dd8dbe2779d9e5f52a15518e |
|
53
|
``` |
|
54
|
|
|
55
|
The B-card names another manifest, by its unique ID, the same way that |
|
56
|
a P-card does. A manifest may have multiple P-card parents (the second |
|
57
|
and subsequent ones denoting merge parents) but B-cards always refer |
|
58
|
to exactly one parent. |
|
59
|
|
|
60
|
What unambiguously distinguishes this as a delta is the existence of |
|
61
|
the B-card. All deltas have a B-card and no other type of artifact has |
|
62
|
one. What also, but not unambiguously, distinguishes it as a delta is |
|
63
|
that it has only 17 F-cards, whereas a baseline manifest in that same |
|
64
|
repository has (as of this writing) 291 F-cards. In this particular |
|
65
|
case, the delta manifest is 1363 bytes, compared to 20627 bytes for |
|
66
|
the next checkin - a baseline manifest. That's a significant saving in |
|
67
|
F-cards, especially if a repository contains thousands of files. That |
|
68
|
savings, however, comes with caveats which we'll address below. |
|
69
|
|
|
70
|
Trivia regarding the B-card: |
|
71
|
|
|
72
|
- The B-card always refers to a baseline manifest, not another delta. |
|
73
|
- Deltas may not chain with another delta, but any number of deltas |
|
74
|
may have the same B-card. It is quite common for a series of delta |
|
75
|
manifest checkins, each of which derives (in the P-card sense) from |
|
76
|
the one before it, to have the same B-card. |
|
77
|
|
|
78
|
A delta manifest is functionally identical to a normal manifest except |
|
79
|
that it has a B-card and how it records F-cards. Namely, it only |
|
80
|
records F-cards which have changed at some point between this delta |
|
81
|
and the version represented by the delta's B-card. This recording of |
|
82
|
F-card *differences* also means that delta manifests, unlike normal |
|
83
|
manifests, have to explicitly record deleted F-cards. Baseline |
|
84
|
manifests do not record deletions. Instead, they include a list of |
|
85
|
every file which is part of that checkin. Deltas, however, record the |
|
86
|
differences between their own version and a baseline version, and thus |
|
87
|
have to record deletions. They do this by including F-cards which have |
|
88
|
only a file name and no hash. |
|
89
|
|
|
90
|
Iterating over F-cards in a manifest is something several important |
|
91
|
internal parts of Fossil have to do. Iterating over a baseline |
|
92
|
manifest, e.g. when performing a checkout, is straightforward: simply |
|
93
|
walk through the list in the order the cards are listed. A delta, |
|
94
|
however, introduces a significant wrinkle to that process. In short, |
|
95
|
when iterating over a delta's F-cards, code has to compare the delta's |
|
96
|
list to the baseline's list. If the delta has an entry the parent does |
|
97
|
not have, or which is a newer entry for the same file, the delta's |
|
98
|
entry is used. If the delta is missing an entry which the baseline |
|
99
|
has, the baseline's entry is used. When a deletion F-card is |
|
100
|
discovered in the delta (recall that baselines do not record |
|
101
|
deletions), iteration over that card is skipped - the internal |
|
102
|
algorithms which iterate over F-cards never report deletions to the |
|
103
|
code iterating over those cards. The reason for that is consistency: |
|
104
|
only deltas record file deletions, but the fact that it's a delta is |
|
105
|
an internal detail, not something which higher-level code should |
|
106
|
concern itself with. If higher-level iteration code were shown file |
|
107
|
deletions, they would effectively be dealing with a leaky abstraction |
|
108
|
and special-case handling which only applies to delta manifests. The |
|
109
|
F-card iteration API hides such details from its users (other |
|
110
|
Fossil-internal APIs). |
|
111
|
|
|
112
|
|
|
113
|
# When does Fossil Create Deltas? |
|
114
|
|
|
115
|
By default, Fossil never creates delta manifests. It can be told to do |
|
116
|
so using the `--delta` flag to the [`commit` |
|
117
|
command](/help/commit). (Before doing so in your own repositories, |
|
118
|
please read the section below about the caveats!) When a given |
|
119
|
repository gets a delta manifest for the first time, Fossil records |
|
120
|
that fact in the repository's `config` table with an entry named |
|
121
|
`seen-delta-manifest`. If, in later sessions, Fossil sees that that |
|
122
|
setting has a true value, it will *consider* creating delta manifests |
|
123
|
by default. |
|
124
|
|
|
125
|
Conversely, the [`forbid-delta-manifests` repository config |
|
126
|
setting](/help/forbid-delta-manifests) may be used to force Fossil to |
|
127
|
*never* create deltas. That setting will propagate to other repository |
|
128
|
clones via the sync process, to try to ensure that no clone introduces |
|
129
|
a delta manifests. We'll cover reasons why one might want to use that |
|
130
|
setting later on. |
|
131
|
|
|
132
|
After creating a delta manifest during the commit process, Fossil |
|
133
|
examines the size of the delta. If, in Fossil's opinion, the space |
|
134
|
savings are not significant enough to warrant the delta's own |
|
135
|
overhead, it will discard the delta and create a new baseline manifest |
|
136
|
instead. (The heuristic it uses for that purpose is tucked away in |
|
137
|
Fossil's checkin algorithm.) |
|
138
|
|
|
139
|
|
|
140
|
# Caveats |
|
141
|
|
|
142
|
Delta manifests may appear, on the surface, to be a great way to save |
|
143
|
a few bytes of repository space. There are, however, caveats... |
|
144
|
|
|
145
|
## Space Savings? |
|
146
|
|
|
147
|
Though deltas were conceived as a way to save storage space, that |
|
148
|
benefit is *not truly achieved* because... |
|
149
|
|
|
150
|
When a manifest is created, Fossil stores its parent version as a |
|
151
|
[fossil delta](./delta_format.wiki) (as opposed to a delta manifest) |
|
152
|
which succinctly descibes the differences between the parent and its |
|
153
|
new child. This form of compression is extremely space-efficient and |
|
154
|
can reduce the real storage space requirements of a manifest from tens |
|
155
|
or hundreds of kilobytes down to a kilobyte or less for checkins which |
|
156
|
modify only a few files. As an example, as of this writing, Fossil's |
|
157
|
[tip checkin baseline manifest](/artifact/decd537016bf) is 80252 bytes |
|
158
|
(uncompressed), and the delta-compressed baseline manifest of the |
|
159
|
[previous checkin](/artifact/2f7c93f49c0e) is stored as a mere 726 |
|
160
|
bytes of Fossil-delta'd data (not counting the z-lib compression which |
|
161
|
gets applied on top of that). In this case, the tip version modified 7 |
|
162
|
files compared to its parent version. |
|
163
|
|
|
164
|
Thus delta manifests do not *actually* save much storage space. They |
|
165
|
save *some*, in particular in the tip checkin version: Fossil |
|
166
|
delta-compresses *older* versions of checkins against the child |
|
167
|
versions, as opposed to delta-compressing the children against the |
|
168
|
parents. The reason is to speed up access for the most common case - |
|
169
|
the latest version. Thus tip-version delta manifests are more |
|
170
|
storage-space efficient than tip-version baseline manifests. Once the |
|
171
|
next version is committed, though, and Fossil deltification is applied |
|
172
|
to those manifests, that difference in space efficiency shrinks |
|
173
|
tremendously, often to the point of insignificance. |
|
174
|
|
|
175
|
We can observe the Fossil-delta compression savings using a bit of |
|
176
|
3rd-party code which can extract Fossil-format blobs both with and |
|
177
|
without applying their deltas: |
|
178
|
|
|
179
|
``` |
|
180
|
$ f-acat tip > A # tip version's manifest |
|
181
|
$ f-acat prev --raw > B # previous manifest in its raw fossil-deltified form |
|
182
|
$ f-acat prev > C # previous manifest fossil-undelta'd |
|
183
|
$ ls -la A B C |
|
184
|
-rw-rw-r-- 1 user user 80252 Mar 12 07:09 A # tip |
|
185
|
-rw-rw-r-- 1 user user 726 Mar 12 07:09 B # previous: delta'd |
|
186
|
-rw-rw-r-- 1 user user 80256 Mar 12 07:09 C # previous: undelta'd |
|
187
|
``` |
|
188
|
|
|
189
|
For comparison's sake, when looking at a separate repository which |
|
190
|
uses delta manifests, a delta-compressed delta manifest takes up |
|
191
|
approximately the same space as a delta-compressed baseline manifest |
|
192
|
(to within 10 bytes for the test samples). |
|
193
|
|
|
194
|
i.e. delta manifests may not save any storage space except for the tip |
|
195
|
version! (*Surprise!*) |
|
196
|
|
|
197
|
In terms of RAM costs, deltas usually cost more memory than baseline |
|
198
|
manifests. The reason is because traversing a delta requires having |
|
199
|
not only that delta in memory, but also its baseline version. Delta |
|
200
|
manifests are seldom used in ways which do not require also loading |
|
201
|
their baselines. Thus Fossil internally requires two manifest objects |
|
202
|
for most operations with a delta manifest, whereas a baseline has but |
|
203
|
one. The difference in RAM cost is directly proportional to the size |
|
204
|
of the delta manifest. |
|
205
|
|
|
206
|
## Manifests as Proof of Code Integrity |
|
207
|
|
|
208
|
Delta manifests have at least one more notable caveat, this one |
|
209
|
arguably more significant than an apparent lack of space savings: |
|
210
|
they're useless for purposes of publishing a manifest which downstream |
|
211
|
clients can use to verify the integrity of their copy of the software. |
|
212
|
|
|
213
|
Consider this use case: [the SQLite project](https://sqlite.org) |
|
214
|
publishes source code to many thousands of downstream consumers, many |
|
215
|
of whom would like to be able to verify that the copy they have |
|
216
|
downloaded is actually the copy published by the project. This is |
|
217
|
easily achieved by providing a copy of the downloaded version's |
|
218
|
manifest, as it contains a hash of every single file the project |
|
219
|
published and the manifest itself has a well-known hash and is |
|
220
|
cryptographically tamper-proof. It's mathematically extremely improbable for a |
|
221
|
malicious party to modify such a manifest and re-publish it as an |
|
222
|
"official" one, as the various hashes (F-cards, R-card, Z-card, *and* |
|
223
|
the hash of the manifest itself) would not line up. A collision-based |
|
224
|
attack would have to defeat *all four of those hashes*, which is |
|
225
|
practically impossible to do. Thus a Fossil checkin manifest can be used |
|
226
|
to provide strong assurances that a given copy of the software has not |
|
227
|
been tampered with since being exported by Fossil. |
|
228
|
|
|
229
|
*However*, that use case is *only possible with baseline manifests*. |
|
230
|
A delta manifest is *essentially useless* for that purpose. The |
|
231
|
algorithm for traversing F-cards of a delta manifest is not trivial |
|
232
|
for arbitrary clients to reproduce, e.g. using a shell script. While |
|
233
|
it *could* be done in any higher-level programming language (or some |
|
234
|
truly unsightly shell code), it would be an onerous burden on |
|
235
|
downstream consumers and would not be without risks of having bugs |
|
236
|
which invalidate the strong guarantees provided by the manifest. |
|
237
|
|
|
238
|
It's worth noting that the core Fossil project repository does not use |
|
239
|
delta manifests, at least in part for the same reason the SQLite |
|
240
|
project does not: the ability to provide a manifest which clients can |
|
241
|
easily use to verify the integrity of the code they've downloaded. The |
|
242
|
[`forbid-delta-manifests` config |
|
243
|
setting](/help/forbid-delta-manifests) is used to ensure that none are |
|
244
|
introduced into the repository beyond the few which were introduced |
|
245
|
solely for testing purposes. |
|
246
|
|
|
247
|
|