Fossil SCM

fossil-scm / www / delta-manifests.md
1
# Delta Manifests
2
3
<div class="sidebar">Do not confuse these with the core [Fossil delta
4
format](./delta_format.wiki). This document describes an optional
5
feature not enabled by default.</div>
6
7
This article describes "delta manifests," a special-case form of
8
checkin manifest which is intended to take up far less space than
9
a normal checkin manifest, in particular for repositories with
10
many files. We'll see, however, that the space savings, if indeed
11
there are any, come with some caveats.
12
13
This article assumes that the reader is at least moderately familiar
14
with Fossil's [artifact file format](./fileformat.wiki), in particular
15
the structure of checkin manifests, and it won't make much sense to
16
readers unfamiliar with that topic.
17
18
# Background and Motivation of Delta Manifests
19
20
A checkin manifest includes a list of every file in that checkin. A
21
moderately-sized project can easily have a thousand files, and every
22
checkin manifest will include those thousand files. As of this writing
23
Fossil's own checkins contain 989 files and the manifests are 80kb
24
each. Thus a checkin which changes only 2 bytes of source code
25
ostensibly costs another 80kb of storage for the manifest for that
26
change.
27
28
Delta manifests were conceived as a mechanism to help combat that
29
storage overhead.
30
31
# Makeup of a Delta Manifest
32
33
A delta manifest is structured like a normal manifest (called a
34
"baseline" manifest) except that it has *two types of parents*: the
35
P-card which is part of (nearly) every manifest and a so-called
36
baseline (denoted by a B-card). The P-card tells us which artifact(s)
37
is/are the parents for purposes of the SCM version DAG. The B-card
38
tells us which manifest to use as a basis for this delta. The B-card
39
need not be, and often is not, the same as the P-card. Here's an
40
example:
41
42
```
43
B c04ce8aaf1170966c6f8abcce8b57e72a0fa2b81
44
C Minor\sdoc\supdates...
45
D 2021-03-11T18:56:24.686
46
F bindings/s2/shell_extend.c 6d8354c693120a48cfe4798812cd24499be174b2
47
<15 F-cards snipped for brevity>
48
F src/repo.c 2f224cb0e59ccd90ba89c597e40b8e8d87506638
49
P 61d3e64e6fb1a93d4a7b0182e4c6b94d178d66d9
50
R a84ec2e8e1eb37ff0d94cac262795e23
51
U stephan
52
Z 536e6d26dd8dbe2779d9e5f52a15518e
53
```
54
55
The B-card names another manifest, by its unique ID, the same way that
56
a P-card does. A manifest may have multiple P-card parents (the second
57
and subsequent ones denoting merge parents) but B-cards always refer
58
to exactly one parent.
59
60
What unambiguously distinguishes this as a delta is the existence of
61
the B-card. All deltas have a B-card and no other type of artifact has
62
one. What also, but not unambiguously, distinguishes it as a delta is
63
that it has only 17 F-cards, whereas a baseline manifest in that same
64
repository has (as of this writing) 291 F-cards. In this particular
65
case, the delta manifest is 1363 bytes, compared to 20627 bytes for
66
the next checkin - a baseline manifest. That's a significant saving in
67
F-cards, especially if a repository contains thousands of files. That
68
savings, however, comes with caveats which we'll address below.
69
70
Trivia regarding the B-card:
71
72
- The B-card always refers to a baseline manifest, not another delta.
73
- Deltas may not chain with another delta, but any number of deltas
74
may have the same B-card. It is quite common for a series of delta
75
manifest checkins, each of which derives (in the P-card sense) from
76
the one before it, to have the same B-card.
77
78
A delta manifest is functionally identical to a normal manifest except
79
that it has a B-card and how it records F-cards. Namely, it only
80
records F-cards which have changed at some point between this delta
81
and the version represented by the delta's B-card. This recording of
82
F-card *differences* also means that delta manifests, unlike normal
83
manifests, have to explicitly record deleted F-cards. Baseline
84
manifests do not record deletions. Instead, they include a list of
85
every file which is part of that checkin. Deltas, however, record the
86
differences between their own version and a baseline version, and thus
87
have to record deletions. They do this by including F-cards which have
88
only a file name and no hash.
89
90
Iterating over F-cards in a manifest is something several important
91
internal parts of Fossil have to do. Iterating over a baseline
92
manifest, e.g. when performing a checkout, is straightforward: simply
93
walk through the list in the order the cards are listed. A delta,
94
however, introduces a significant wrinkle to that process. In short,
95
when iterating over a delta's F-cards, code has to compare the delta's
96
list to the baseline's list. If the delta has an entry the parent does
97
not have, or which is a newer entry for the same file, the delta's
98
entry is used. If the delta is missing an entry which the baseline
99
has, the baseline's entry is used. When a deletion F-card is
100
discovered in the delta (recall that baselines do not record
101
deletions), iteration over that card is skipped - the internal
102
algorithms which iterate over F-cards never report deletions to the
103
code iterating over those cards. The reason for that is consistency:
104
only deltas record file deletions, but the fact that it's a delta is
105
an internal detail, not something which higher-level code should
106
concern itself with. If higher-level iteration code were shown file
107
deletions, they would effectively be dealing with a leaky abstraction
108
and special-case handling which only applies to delta manifests. The
109
F-card iteration API hides such details from its users (other
110
Fossil-internal APIs).
111
112
113
# When does Fossil Create Deltas?
114
115
By default, Fossil never creates delta manifests. It can be told to do
116
so using the `--delta` flag to the [`commit`
117
command](/help/commit). (Before doing so in your own repositories,
118
please read the section below about the caveats!) When a given
119
repository gets a delta manifest for the first time, Fossil records
120
that fact in the repository's `config` table with an entry named
121
`seen-delta-manifest`. If, in later sessions, Fossil sees that that
122
setting has a true value, it will *consider* creating delta manifests
123
by default.
124
125
Conversely, the [`forbid-delta-manifests` repository config
126
setting](/help/forbid-delta-manifests) may be used to force Fossil to
127
*never* create deltas. That setting will propagate to other repository
128
clones via the sync process, to try to ensure that no clone introduces
129
a delta manifests. We'll cover reasons why one might want to use that
130
setting later on.
131
132
After creating a delta manifest during the commit process, Fossil
133
examines the size of the delta. If, in Fossil's opinion, the space
134
savings are not significant enough to warrant the delta's own
135
overhead, it will discard the delta and create a new baseline manifest
136
instead. (The heuristic it uses for that purpose is tucked away in
137
Fossil's checkin algorithm.)
138
139
140
# Caveats
141
142
Delta manifests may appear, on the surface, to be a great way to save
143
a few bytes of repository space. There are, however, caveats...
144
145
## Space Savings?
146
147
Though deltas were conceived as a way to save storage space, that
148
benefit is *not truly achieved* because...
149
150
When a manifest is created, Fossil stores its parent version as a
151
[fossil delta](./delta_format.wiki) (as opposed to a delta manifest)
152
which succinctly descibes the differences between the parent and its
153
new child. This form of compression is extremely space-efficient and
154
can reduce the real storage space requirements of a manifest from tens
155
or hundreds of kilobytes down to a kilobyte or less for checkins which
156
modify only a few files. As an example, as of this writing, Fossil's
157
[tip checkin baseline manifest](/artifact/decd537016bf) is 80252 bytes
158
(uncompressed), and the delta-compressed baseline manifest of the
159
[previous checkin](/artifact/2f7c93f49c0e) is stored as a mere 726
160
bytes of Fossil-delta'd data (not counting the z-lib compression which
161
gets applied on top of that). In this case, the tip version modified 7
162
files compared to its parent version.
163
164
Thus delta manifests do not *actually* save much storage space. They
165
save *some*, in particular in the tip checkin version: Fossil
166
delta-compresses *older* versions of checkins against the child
167
versions, as opposed to delta-compressing the children against the
168
parents. The reason is to speed up access for the most common case -
169
the latest version. Thus tip-version delta manifests are more
170
storage-space efficient than tip-version baseline manifests. Once the
171
next version is committed, though, and Fossil deltification is applied
172
to those manifests, that difference in space efficiency shrinks
173
tremendously, often to the point of insignificance.
174
175
We can observe the Fossil-delta compression savings using a bit of
176
3rd-party code which can extract Fossil-format blobs both with and
177
without applying their deltas:
178
179
```
180
$ f-acat tip > A # tip version's manifest
181
$ f-acat prev --raw > B # previous manifest in its raw fossil-deltified form
182
$ f-acat prev > C # previous manifest fossil-undelta'd
183
$ ls -la A B C
184
-rw-rw-r-- 1 user user 80252 Mar 12 07:09 A # tip
185
-rw-rw-r-- 1 user user 726 Mar 12 07:09 B # previous: delta'd
186
-rw-rw-r-- 1 user user 80256 Mar 12 07:09 C # previous: undelta'd
187
```
188
189
For comparison's sake, when looking at a separate repository which
190
uses delta manifests, a delta-compressed delta manifest takes up
191
approximately the same space as a delta-compressed baseline manifest
192
(to within 10 bytes for the test samples).
193
194
i.e. delta manifests may not save any storage space except for the tip
195
version! (*Surprise!*)
196
197
In terms of RAM costs, deltas usually cost more memory than baseline
198
manifests. The reason is because traversing a delta requires having
199
not only that delta in memory, but also its baseline version. Delta
200
manifests are seldom used in ways which do not require also loading
201
their baselines. Thus Fossil internally requires two manifest objects
202
for most operations with a delta manifest, whereas a baseline has but
203
one. The difference in RAM cost is directly proportional to the size
204
of the delta manifest.
205
206
## Manifests as Proof of Code Integrity
207
208
Delta manifests have at least one more notable caveat, this one
209
arguably more significant than an apparent lack of space savings:
210
they're useless for purposes of publishing a manifest which downstream
211
clients can use to verify the integrity of their copy of the software.
212
213
Consider this use case: [the SQLite project](https://sqlite.org)
214
publishes source code to many thousands of downstream consumers, many
215
of whom would like to be able to verify that the copy they have
216
downloaded is actually the copy published by the project. This is
217
easily achieved by providing a copy of the downloaded version's
218
manifest, as it contains a hash of every single file the project
219
published and the manifest itself has a well-known hash and is
220
cryptographically tamper-proof. It's mathematically extremely improbable for a
221
malicious party to modify such a manifest and re-publish it as an
222
"official" one, as the various hashes (F-cards, R-card, Z-card, *and*
223
the hash of the manifest itself) would not line up. A collision-based
224
attack would have to defeat *all four of those hashes*, which is
225
practically impossible to do. Thus a Fossil checkin manifest can be used
226
to provide strong assurances that a given copy of the software has not
227
been tampered with since being exported by Fossil.
228
229
*However*, that use case is *only possible with baseline manifests*.
230
A delta manifest is *essentially useless* for that purpose. The
231
algorithm for traversing F-cards of a delta manifest is not trivial
232
for arbitrary clients to reproduce, e.g. using a shell script. While
233
it *could* be done in any higher-level programming language (or some
234
truly unsightly shell code), it would be an onerous burden on
235
downstream consumers and would not be without risks of having bugs
236
which invalidate the strong guarantees provided by the manifest.
237
238
It's worth noting that the core Fossil project repository does not use
239
delta manifests, at least in part for the same reason the SQLite
240
project does not: the ability to provide a manifest which clients can
241
easily use to verify the integrity of the code they've downloaded. The
242
[`forbid-delta-manifests` config
243
setting](/help/forbid-delta-manifests) is used to ensure that none are
244
introduced into the repository beyond the few which were introduced
245
solely for testing purposes.
246
247

Keyboard Shortcuts

Open search /
Next entry (timeline) j
Previous entry (timeline) k
Open focused entry Enter
Show this help ?
Toggle theme Top nav button