Fossil SCM

fossil-scm / www / delta_format.wiki
1
<title>Fossil Delta Format</title>
2
<h1>1.0 Overview</h1>
3
4
Fossil achieves efficient storage and low-bandwidth synchronization
5
through the use of delta-compression. Instead of storing
6
or transmitting the complete content of an artifact, Fossil stores or
7
transmits only the changes relative to a related artifact.
8
9
This document describes the delta-encoding format used by Fossil.
10
The intended audience is developers working on either
11
<a href="index.wiki">Fossil</a> itself, or on tools compatible with
12
Fossil. Understanding of this document is <em>not</em> required for
13
ordinary users of Fossil. This document is an implementation detail.
14
15
This document only describes the delta file format. A
16
[./delta_encoder_algorithm.wiki|separate document] describes one possible
17
algorithm for generating deltas in this format.
18
19
<h2>1.1 Sample Software And Analysis Tools</h2>
20
21
The core routines used to generate and apply deltas in this format
22
are contained in the [../src/delta.c|delta.c] source file. Interface
23
logic, including "test-*" commands to directly manipulate deltas are
24
contained in the [../src/deltacmd.c|deltacmd.c] source file. SQL functions
25
to create, apply, and analyze deltas are implemented by code in the
26
[../src/deltafunc.c|deltafunc.c] source file.
27
28
The following command-line tools are available to create and apply
29
deltas and to test the delta logic:
30
31
* [/help/test-delta|fossil test-delta] &rarr; Run self-tests of
32
the delta logic
33
34
* [/help/test-delta-create|fossil test-delta-create X Y] &rarr; compute
35
a delta that converts file X into file Y. Output that delta.
36
37
* [/help/test-delta-apply|fossil test-delta-apply X D] &rarr; apply
38
delta D to input file X and output the result.
39
40
* [/help/test-delta-analyze|fossil test-delta-analyze X Y] &rarr; compute
41
and delta that converts file X into file Y but instead of writing the
42
delta to output, write performance information about the delta.
43
44
When running the [/help/sqlite3|fossil sql] command to get an
45
interactive SQL session connected to the repository, the following
46
additional SQL functions are provided:
47
48
* <b>delta_create(</b><i>X</i><b>,</b><i>Y</i><b>)</b> &rarr;
49
Compute a data that carries blob X into blob Y and return that delta
50
as a blob.
51
52
* <b>delta_apply(</b><i>X</i><b>,</b><i>D</i><b>)</b> &rarr;
53
Apply delta D to input blob X return a new blob which is the result.
54
55
56
* <b>delta_output_size(</b><i>D</i>)</b> &rarr;
57
Return the size of the output that would result from applying delta D.
58
59
* <b>delta_parse(</b><i>D</i>)</b> &rarr; This is a table-valued function
60
that returns one row for the header, for the trailer, and for each segment
61
in delta D.
62
63
64
<h1 id="structure">2.0 Structure</h1>
65
<verbatim type="pikchr">
66
leftmargin = 0.1
67
box height 50% "Header"
68
box same "Segments"
69
box same "Trailer"
70
</verbatim>
71
72
A delta consists of three parts, a "header", a "trailer", and a
73
"segment-list" between them.
74
75
Both header and trailer provide information about the target
76
helping the decoder, and the segment-list describes how the target can
77
be constructed from the original.
78
79
<h2 id="header">2.1 Header</h2>
80
<verbatim type="pikchr">
81
leftmargin = 0.1
82
box height 50% "Size"
83
box same "\"\\n\""
84
</verbatim>
85
86
The header consists of a single number followed by a newline
87
character (ASCII 0x0a). The number is the length of the target in
88
bytes.
89
90
This means that, given a delta, the decoder can compute the size of
91
the target (and allocate any necessary memory based on that) by simply
92
reading the first line of the delta and decoding the number found
93
there. In other words, before it has to decode everything else.
94
95
<h2 id="trailer">2.2 Trailer</h2>
96
<verbatim type="pikchr">
97
leftmargin = 0.1
98
box height 50% "Checksum"
99
box same "\";\""
100
</verbatim>
101
102
The trailer consists of a single number followed by a semicolon (ASCII
103
0x3b). This number is a checksum of the target and can be used by a
104
decoder to verify that the delta applied correctly, reconstructing the
105
target from the original.
106
107
The checksum is computed by treating the target as a series of
108
32-bit integer numbers (MSB first), and summing these up, modulo
109
2^32-1. A target whose length is not a multiple of 4 is padded with
110
0-bytes (ASCII 0x00) at the end.
111
112
By putting this information at the end of the delta a decoder has
113
it available immediately after the target has been reconstructed
114
fully.
115
116
<h2 id="slist">2.3 Segment-List</h2>
117
<verbatim type="pikchr">
118
leftmargin = 0.1
119
PART1: [
120
B1: box height 50% width 15% ""
121
B2: box same ""
122
B3: box same ""
123
"***"
124
box height 50% width 15% ""
125
I1: line down 50% from B2 .s
126
arrow right until even with B3.e
127
box "Insert Literal" height 50%
128
line down 75% from I1 .s
129
arrow right until even with B3.e
130
box "Copy Range" height 50%
131
]
132
down
133
PART2: [
134
""
135
box "Length" height 50%
136
right
137
box "\":\"" same
138
box "Bytes" same
139
] with .nw at previous.sw
140
</verbatim>
141
142
The segment-list of a delta describes how to create the target from
143
the original by a combination of inserting literal byte-sequences and
144
copying ranges of bytes from the original. This is where the
145
compression takes place, by encoding the large common parts of
146
original and target in small copy instructions.
147
148
The target is constructed from beginning to end, with the data
149
generated by each instruction appended after the data of all previous
150
instructions, with no gaps.
151
152
<h3 id="insertlit">2.3.1 Insert Literal</h3>
153
154
A literal is specified by two elements, the size of the literal in
155
bytes, and the bytes of the literal itself.
156
157
<verbatim type="pikchr">
158
leftmargin = 0.1
159
box "Length" height 50%
160
box "\":\"" same
161
box "Bytes" same
162
</verbatim>
163
164
The length is written first, followed by a colon character (ASCII
165
0x3a), followed by the bytes of the literal.
166
167
<h3 id="copyrange">2.3.2 Copy Range</h3>
168
169
A range to copy is specified by two numbers, the offset of the
170
first byte in the original to copy, and the size of the range, in
171
bytes. The size zero is special, its usage indicates that the range
172
extends to the end of the original.
173
174
<verbatim type="pikchr">
175
leftmargin = 0.1
176
box "Length" height 50%
177
box "\"@\"" same
178
box "Offset" same
179
box "\",\"" same
180
</verbatim>
181
182
183
The length is written first, followed by an "at" character (ASCII
184
0x40), then the offset, followed by a comma (ASCII 0x2c).
185
186
<h1 id="intcoding">3.0 Encoding of integers</h1>
187
188
The format currently handles only 32 bit integer numbers. They are
189
written base-64 encoded, MSB first, and without leading
190
"0"-characters, except if they are significant (i.e. 0 => "0").
191
192
The base-64 encoding uses one character for each 6 bits of
193
the integer to be encoded. The encoding characters are:
194
195
<pre>
196
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~
197
</pre>
198
199
The least significant 6 bits of the integer are encoded by
200
the first character, followed by
201
the next 6 bits, and so on until all non-zero bits of the integer
202
are encoded. The minimum number of encoding characters is used.
203
Note that for integers less than 10, the base-64 coding is a
204
ASCII decimal rendering of the number itself.
205
206
<h1 id="examples">4.0 Examples</h1>
207
208
<h2 id="examplesint">4.1 Integer encoding</h2>
209
210
<table>
211
<tr>
212
<th>Value</th>
213
<th>Encoding</th>
214
</tr>
215
<tr>
216
<td>0</td>
217
<td>0</td>
218
</tr>
219
<tr>
220
<td>6246</td>
221
<td>1Xb</td>
222
</tr>
223
<tr>
224
<td>-1101438770</td>
225
<td>2zMM3E</td>
226
</tr>
227
</table>
228
229
<h2 id="examplesdelta">4.2 Delta encoding</h2>
230
231
An example of a delta using the specified encoding is:
232
233
<pre>
234
1Xb
235
4E@0,2:thFN@4C,6:scenda1B@Jd,6:scenda5x@Kt,6:pieces79@Qt,F: Example: eskil~E@Y0,2zMM3E;</pre>
236
</pre>
237
238
This can be taken apart into the following parts:
239
240
<table>
241
<tr><th>What </th> <th>Encoding </th><th>Meaning </th><th>Details</th></tr>
242
<tr><td>Header</td> <td>1Xb </td><td>Size </td><td> 6246 </td></tr>
243
<tr><td>S-List</td> <td>4E@0, </td><td>Copy </td><td> 270 @ 0 </td></tr>
244
<tr><td>&nbsp;</td> <td>2:th </td><td>Literal </td><td> 2 'th' </td></tr>
245
<tr><td>&nbsp;</td> <td>FN@4C, </td><td>Copy </td><td> 983 @ 268 </td></tr>
246
<tr><td>&nbsp;</td> <td>6:scenda </td><td>Literal </td><td> 6 'scenda' </td></tr>
247
<tr><td>&nbsp;</td> <td>1B@Jd, </td><td>Copy </td><td> 75 @ 1256 </td></tr>
248
<tr><td>&nbsp;</td> <td>6:scenda </td><td>Literal </td><td> 6 'scenda' </td></tr>
249
<tr><td>&nbsp;</td> <td>5x@Kt, </td><td>Copy </td><td> 380 @ 1336 </td></tr>
250
<tr><td>&nbsp;</td> <td>6:pieces </td><td>Literal </td><td> 6 'pieces' </td></tr>
251
<tr><td>&nbsp;</td> <td>79@Qt, </td><td>Copy </td><td> 457 @ 1720 </td></tr>
252
<tr><td>&nbsp;</td> <td>F: Example: eskil</td><td>Literal </td><td> 15 ' Example: eskil'</td></tr>
253
<tr><td>&nbsp;</td> <td>~E@Y0, </td><td>Copy </td><td> 4046 @ 2176 </td></tr>
254
<tr><td>Trailer</td><td>2zMM3E </td><td>Checksum</td><td> -1101438770 </td></tr>
255
</table>
256
257
The unified diff behind the above delta is
258
259
<verbatim>
260
bluepeak:(761) ~/Projects/Tcl/Fossil/Devel/devel > diff -u ../DELTA/old ../DELTA/new
261
--- ../DELTA/old 2007-08-23 21:14:40.000000000 -0700
262
+++ ../DELTA/new 2007-08-23 21:14:33.000000000 -0700
263
@@ -5,7 +5,7 @@
264
265
* If the server does not have write permission on the database
266
file, or on the directory containing the database file (and
267
- it is thus unable to update database because it cannot create
268
+ it is thus unable to update the database because it cannot create
269
a rollback journal) then it currently fails silently on a push.
270
It needs to return a helpful error.
271
272
@@ -27,8 +27,8 @@
273
* Additional information displayed for the "vinfo" page:
274
275
+ All leaves of this version that are not included in the
276
- descendant list. With date, user, comment, and hyperlink.
277
- Leaves in the descendant table should be marked as such.
278
+ descendant list. With date, user, comment, and hyperlink.
279
+ Leaves in the descendant table should be marked as such.
280
See the compute_leaves() function to see how to find all
281
leaves.
282
+ Add file diff links to the file change list.
283
@@ -37,7 +37,7 @@
284
285
* The /xfer handler (for push, pull, and clone) does not do
286
delta compression. This results in excess bandwidth usage.
287
- There are some code in xfer.c that are sketches of ideas on
288
+ There are some pieces in xfer.c that are sketches of ideas on
289
how to do delta compression, but nothing has been implemented.
290
291
* Enhancements to the diff and tkdiff commands in the cli.
292
@@ -45,7 +45,7 @@
293
single file. Allow diffs against any two arbitrary versions,
294
not just diffs against the current check-out. Allow
295
configuration options to replace tkdiff with some other
296
- visual differ of the users choice.
297
+ visual differ of the users choice. Example: eskil.
298
299
* Ticketing interface (expand this bullet)
300
</verbatim>
301
302
303
304
<h1 id="notes">Notes</h1>
305
306
<ul>
307
<li>Pure text files generate a pure text delta.
308
</li>
309
<li>Binary files generate a delta that may contain some binary data.
310
</li>
311
<li>The delta encoding does not attempt to compress the content.
312
It was considered to be much
313
more sensible to do compression using a separate general-purpose
314
compression library, like <a href="http://www.zlib.net">zlib</a>.
315
</li>
316
</ul>
317

Keyboard Shortcuts

Open search /
Next entry (timeline) j
Previous entry (timeline) k
Open focused entry Enter
Show this help ?
Toggle theme Top nav button