Fossil SCM

Improvements to the document that describes the delta format.

drh 2019-03-02 20:33 trunk
Commit 81e61d78fdfdd2ee471217054b6de89e26d244ebf9671e6a842c47d252ba187d
1 file changed +67 -29
--- www/delta_format.wiki
+++ www/delta_format.wiki
@@ -1,42 +1,80 @@
11
<title>Fossil Delta Format</title>
2
-<nowiki>
3
-<h2>Abstract</h2>
2
+<h1>1.0 Overview</h1>
43
54
<p>Fossil achieves efficient storage and low-bandwidth synchronization
65
through the use of delta-compression. Instead of storing
7
-or transmitting the complete content of an artifact, fossil stores or
6
+or transmitting the complete content of an artifact, Fossil stores or
87
transmits only the changes relative to a related artifact.
98
</p>
109
11
-<p>This document describes the delta-encoding format used by fossil.
10
+<p>This document describes the delta-encoding format used by Fossil.
1211
The intended audience is developers working on either
13
-<a href="index.wiki">fossil</a> itself, or on tools compatible with
14
-fossil.</p>
15
-
16
-<p>Note that the delta-encoding is not a fundamental element of the
17
-state of a fossil repository. A state of a fossil repository is
18
-defined by the uncompressed and undeltaed content of all artifacts.
19
-The fact the artifacts
20
-are stored on disk using this delta-encoding format is merely an
21
-optimization. One could, in theory, create an entirely new and
22
-compatible implementation of fossil that used a different delta-encoding
23
-or did no delta-encoding at all. However, experience has shown that
24
-the delta-encoding described here is both efficient to compute and
25
-results in very small deltas, so its continued use is recommended.</p>
26
-
27
-<a name="structure"></a><h2>1.0 Structure</h2>
12
+<a href="index.wiki">Fossil</a> itself, or on tools compatible with
13
+Fossil. Understanding of this document is <em>not</em> required for
14
+ordinary users of Fossil. This document is an implementation detail.</p>
15
+
16
+<p>This document only describes the delta file format. A
17
+[./delta_encoder_algorithm.wiki|separate document] describes one possible
18
+algorithm for generating deltas in this format.</p>
19
+
20
+<h2>1.1 Sample Software And Analysis Tools</h2>
21
+
22
+<p>The core routines used to generate and apply deltas in this format
23
+are contained in the [../src/delta.c|delta.c] source file. Interface
24
+logic, including "test-*" commands to directly manipulate deltas are
25
+contained in the [../src/deltacmd.c|deltacmd.c] source file. SQL functions
26
+to create, apply, and analyze deltas are implemented by code in the
27
+[../src/deltafunc.c|deltafunc.c] source file.
28
+
29
+<p>The following command-line tools are available to create and apply
30
+deltas and to test the delta logic:
31
+
32
+ * [/help?cmd=test-delta|fossil test-delta] &rarr; Run self-tests of
33
+ the delta logic
34
+
35
+ * [/help?cmd=test-delta-create|fossil test-delta-create X Y] &rarr; compute
36
+ a delta that converts file X into file Y. Output that delta.
37
+
38
+ * [/help?cmd=test-delta-apply|fossil test-delta-apply X D] &rarr; apply
39
+ delta D to input file X and output the result.
40
+
41
+ * [/help?cmd=test-delta-analyze|fossil test-delta-analyze X Y] &rarr; compute
42
+ and delta that converts file X into file Y but instead of writing the
43
+ delta to output, write performance information about the delta.
44
+
45
+<p>When running the [/help?cmd=sqlite3|fossil sql] command to get an
46
+interactive SQL session connected to the repository, the following
47
+additional SQL functions are provided:
48
+
49
+ * <b>delta_create(</b><i>X</i><b>,</b><i>Y</i><b>)</b> &rarr;
50
+ Compute a data that carries blob X into blob Y and return that delta
51
+ as a blob.
52
+
53
+ * <b>delta_apply(</b><i>X</i><b>,</b><i>D</i><b>)</b> &rarr;
54
+ Apply delta D to input blob X return a new blob which is the result.
55
+
56
+
57
+ * <b>delta_output_size(</b><i>D</i>)</b> &rarr;
58
+ Return the size of the output that would result from applying delta D.
59
+
60
+ * <b>delta_parse(</b><i>D</i>)</b> &rarr; This is a table-valued function
61
+ that returns one row for the header, for the trailer, and for each segment
62
+ in delta D.
63
+
64
+
65
+<a name="structure"></a><h1>2.0 Structure</h1>
2866
<img src="delta1.gif" align="left" hspace="10">
2967
3068
<p>A delta consists of three parts, a "header", a "trailer", and a
3169
"segment-list" between them.</p>
3270
3371
<p>Both header and trailer provide information about the target
3472
helping the decoder, and the segment-list describes how the target can
3573
be constructed from the original.</p>
3674
37
-<a name="header"></a><h3>1.1 Header</h3>
75
+<a name="header"></a><h2>2.1 Header</h2>
3876
<img src="delta6.gif" align="left" hspace="10">
3977
4078
<p>The header consists of a single number followed by a newline
4179
character (ASCII 0x0a). The number is the length of the target in
4280
bytes.</p>
@@ -44,11 +82,11 @@
4482
<p>This means that, given a delta, the decoder can compute the size of
4583
the target (and allocate any necessary memory based on that) by simply
4684
reading the first line of the delta and decoding the number found
4785
there. In other words, before it has to decode everything else.</p>
4886
49
-<a name="trailer"></a><h3>1.2 Trailer</h3>
87
+<a name="trailer"></a><h2>2.2 Trailer</h2>
5088
<img src="delta5.gif" align="left" hspace="10">
5189
5290
<p>The trailer consists of a single number followed by a semicolon (ASCII
5391
0x3b). This number is a checksum of the target and can be used by a
5492
decoder to verify that the delta applied correctly, reconstructing the
@@ -61,11 +99,11 @@
6199
62100
<p>By putting this information at the end of the delta a decoder has
63101
it available immediately after the target has been reconstructed
64102
fully.</p>
65103
66
-<a name="slist"></a><h3>1.3 Segment-List</h3>
104
+<a name="slist"></a><h2>2.3 Segment-List</h2>
67105
<img src="delta2.gif" align="left" hspace="10">
68106
69107
<p>The segment-list of a delta describes how to create the target from
70108
the original by a combination of inserting literal byte-sequences and
71109
copying ranges of bytes from the original. This is where the
@@ -74,20 +112,20 @@
74112
75113
<p>The target is constructed from beginning to end, with the data
76114
generated by each instruction appended after the data of all previous
77115
instructions, with no gaps.</p>
78116
79
-<a name="insertlit"></a><h4>1.3.1 Insert Literal</h4>
117
+<a name="insertlit"></a><h3>2.3.1 Insert Literal</h3>
80118
81119
<p>A literal is specified by two elements, the size of the literal in
82120
bytes, and the bytes of the literal itself.</p>
83121
84122
<img src="delta4.gif" align="left" hspace="10">
85123
<p>The length is written first, followed by a colon character (ASCII
86124
0x3a), followed by the bytes of the literal.</p>
87125
88
-<a name="copyrange"></a><h4>1.3.2 Copy Range</h4>
126
+<a name="copyrange"></a><h3>2.3.2 Copy Range</h3>
89127
90128
<p>A range to copy is specified by two numbers, the offset of the
91129
first byte in the original to copy, and the size of the range, in
92130
bytes. The size zero is special, its usage indicates that the range
93131
extends to the end of the original.</p>
@@ -94,11 +132,11 @@
94132
95133
<img src="delta3.gif" align="left" hspace="10">
96134
<p>The length is written first, followed by an "at" character (ASCII
97135
0x40), then the offset, followed by a comma (ASCII 0x2c).</p>
98136
99
-<a name="intcoding"></a><h2>2.0 Encoding of integers</h2>
137
+<a name="intcoding"></a><h1>3.0 Encoding of integers</h1>
100138
101139
<p>
102140
The format currently handles only 32 bit integer numbers. They are
103141
written base-64 encoded, MSB first, and without leading
104142
"0"-characters, except if they are significant (i.e. 0 => "0").
@@ -107,13 +145,13 @@
107145
<p>
108146
The base-64 coding is described in
109147
<a href="http://www.ietf.org/rfc/rfc3548.txt">RFC 3548</a>.
110148
</p>
111149
112
-<a name="examples"></a><h2>3.0 Examples</h2>
150
+<a name="examples"></a><h1>4.0 Examples</h1>
113151
114
-<a name="examplesint"></a><h3>3.1 Integer encoding</h3>
152
+<a name="examplesint"></a><h2>4.1 Integer encoding</h2>
115153
116154
<table border=1>
117155
<tr>
118156
<th>Value</th>
119157
<th>Encoding</th>
@@ -130,11 +168,11 @@
130168
<td>-1101438770</td>
131169
<td>2zMM3E</td>
132170
</tr>
133171
</table>
134172
135
-<a name="examplesdelta"></a><h3>3.2 Delta encoding</h3>
173
+<a name="examplesdelta"></a><h2>4.2 Delta encoding</h2>
136174
137175
<p>An example of a delta using the specified encoding is:</p>
138176
139177
<table border=1><tr><td><pre>
140178
1Xb
@@ -206,11 +244,11 @@
206244
207245
</pre></td></tr></table>
208246
209247
210248
211
-<a name="notes"></a><h2>Notes</h2>
249
+<a name="notes"></a><h1>Notes</h1>
212250
213251
<ul>
214252
<li>Pure text files generate a pure text delta.
215253
</li>
216254
<li>Binary files generate a delta that may contain some binary data.
217255
--- www/delta_format.wiki
+++ www/delta_format.wiki
@@ -1,42 +1,80 @@
1 <title>Fossil Delta Format</title>
2 <nowiki>
3 <h2>Abstract</h2>
4
5 <p>Fossil achieves efficient storage and low-bandwidth synchronization
6 through the use of delta-compression. Instead of storing
7 or transmitting the complete content of an artifact, fossil stores or
8 transmits only the changes relative to a related artifact.
9 </p>
10
11 <p>This document describes the delta-encoding format used by fossil.
12 The intended audience is developers working on either
13 <a href="index.wiki">fossil</a> itself, or on tools compatible with
14 fossil.</p>
15
16 <p>Note that the delta-encoding is not a fundamental element of the
17 state of a fossil repository. A state of a fossil repository is
18 defined by the uncompressed and undeltaed content of all artifacts.
19 The fact the artifacts
20 are stored on disk using this delta-encoding format is merely an
21 optimization. One could, in theory, create an entirely new and
22 compatible implementation of fossil that used a different delta-encoding
23 or did no delta-encoding at all. However, experience has shown that
24 the delta-encoding described here is both efficient to compute and
25 results in very small deltas, so its continued use is recommended.</p>
26
27 <a name="structure"></a><h2>1.0 Structure</h2>
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
28 <img src="delta1.gif" align="left" hspace="10">
29
30 <p>A delta consists of three parts, a "header", a "trailer", and a
31 "segment-list" between them.</p>
32
33 <p>Both header and trailer provide information about the target
34 helping the decoder, and the segment-list describes how the target can
35 be constructed from the original.</p>
36
37 <a name="header"></a><h3>1.1 Header</h3>
38 <img src="delta6.gif" align="left" hspace="10">
39
40 <p>The header consists of a single number followed by a newline
41 character (ASCII 0x0a). The number is the length of the target in
42 bytes.</p>
@@ -44,11 +82,11 @@
44 <p>This means that, given a delta, the decoder can compute the size of
45 the target (and allocate any necessary memory based on that) by simply
46 reading the first line of the delta and decoding the number found
47 there. In other words, before it has to decode everything else.</p>
48
49 <a name="trailer"></a><h3>1.2 Trailer</h3>
50 <img src="delta5.gif" align="left" hspace="10">
51
52 <p>The trailer consists of a single number followed by a semicolon (ASCII
53 0x3b). This number is a checksum of the target and can be used by a
54 decoder to verify that the delta applied correctly, reconstructing the
@@ -61,11 +99,11 @@
61
62 <p>By putting this information at the end of the delta a decoder has
63 it available immediately after the target has been reconstructed
64 fully.</p>
65
66 <a name="slist"></a><h3>1.3 Segment-List</h3>
67 <img src="delta2.gif" align="left" hspace="10">
68
69 <p>The segment-list of a delta describes how to create the target from
70 the original by a combination of inserting literal byte-sequences and
71 copying ranges of bytes from the original. This is where the
@@ -74,20 +112,20 @@
74
75 <p>The target is constructed from beginning to end, with the data
76 generated by each instruction appended after the data of all previous
77 instructions, with no gaps.</p>
78
79 <a name="insertlit"></a><h4>1.3.1 Insert Literal</h4>
80
81 <p>A literal is specified by two elements, the size of the literal in
82 bytes, and the bytes of the literal itself.</p>
83
84 <img src="delta4.gif" align="left" hspace="10">
85 <p>The length is written first, followed by a colon character (ASCII
86 0x3a), followed by the bytes of the literal.</p>
87
88 <a name="copyrange"></a><h4>1.3.2 Copy Range</h4>
89
90 <p>A range to copy is specified by two numbers, the offset of the
91 first byte in the original to copy, and the size of the range, in
92 bytes. The size zero is special, its usage indicates that the range
93 extends to the end of the original.</p>
@@ -94,11 +132,11 @@
94
95 <img src="delta3.gif" align="left" hspace="10">
96 <p>The length is written first, followed by an "at" character (ASCII
97 0x40), then the offset, followed by a comma (ASCII 0x2c).</p>
98
99 <a name="intcoding"></a><h2>2.0 Encoding of integers</h2>
100
101 <p>
102 The format currently handles only 32 bit integer numbers. They are
103 written base-64 encoded, MSB first, and without leading
104 "0"-characters, except if they are significant (i.e. 0 => "0").
@@ -107,13 +145,13 @@
107 <p>
108 The base-64 coding is described in
109 <a href="http://www.ietf.org/rfc/rfc3548.txt">RFC 3548</a>.
110 </p>
111
112 <a name="examples"></a><h2>3.0 Examples</h2>
113
114 <a name="examplesint"></a><h3>3.1 Integer encoding</h3>
115
116 <table border=1>
117 <tr>
118 <th>Value</th>
119 <th>Encoding</th>
@@ -130,11 +168,11 @@
130 <td>-1101438770</td>
131 <td>2zMM3E</td>
132 </tr>
133 </table>
134
135 <a name="examplesdelta"></a><h3>3.2 Delta encoding</h3>
136
137 <p>An example of a delta using the specified encoding is:</p>
138
139 <table border=1><tr><td><pre>
140 1Xb
@@ -206,11 +244,11 @@
206
207 </pre></td></tr></table>
208
209
210
211 <a name="notes"></a><h2>Notes</h2>
212
213 <ul>
214 <li>Pure text files generate a pure text delta.
215 </li>
216 <li>Binary files generate a delta that may contain some binary data.
217
--- www/delta_format.wiki
+++ www/delta_format.wiki
@@ -1,42 +1,80 @@
1 <title>Fossil Delta Format</title>
2 <h1>1.0 Overview</h1>
 
3
4 <p>Fossil achieves efficient storage and low-bandwidth synchronization
5 through the use of delta-compression. Instead of storing
6 or transmitting the complete content of an artifact, Fossil stores or
7 transmits only the changes relative to a related artifact.
8 </p>
9
10 <p>This document describes the delta-encoding format used by Fossil.
11 The intended audience is developers working on either
12 <a href="index.wiki">Fossil</a> itself, or on tools compatible with
13 Fossil. Understanding of this document is <em>not</em> required for
14 ordinary users of Fossil. This document is an implementation detail.</p>
15
16 <p>This document only describes the delta file format. A
17 [./delta_encoder_algorithm.wiki|separate document] describes one possible
18 algorithm for generating deltas in this format.</p>
19
20 <h2>1.1 Sample Software And Analysis Tools</h2>
21
22 <p>The core routines used to generate and apply deltas in this format
23 are contained in the [../src/delta.c|delta.c] source file. Interface
24 logic, including "test-*" commands to directly manipulate deltas are
25 contained in the [../src/deltacmd.c|deltacmd.c] source file. SQL functions
26 to create, apply, and analyze deltas are implemented by code in the
27 [../src/deltafunc.c|deltafunc.c] source file.
28
29 <p>The following command-line tools are available to create and apply
30 deltas and to test the delta logic:
31
32 * [/help?cmd=test-delta|fossil test-delta] &rarr; Run self-tests of
33 the delta logic
34
35 * [/help?cmd=test-delta-create|fossil test-delta-create X Y] &rarr; compute
36 a delta that converts file X into file Y. Output that delta.
37
38 * [/help?cmd=test-delta-apply|fossil test-delta-apply X D] &rarr; apply
39 delta D to input file X and output the result.
40
41 * [/help?cmd=test-delta-analyze|fossil test-delta-analyze X Y] &rarr; compute
42 and delta that converts file X into file Y but instead of writing the
43 delta to output, write performance information about the delta.
44
45 <p>When running the [/help?cmd=sqlite3|fossil sql] command to get an
46 interactive SQL session connected to the repository, the following
47 additional SQL functions are provided:
48
49 * <b>delta_create(</b><i>X</i><b>,</b><i>Y</i><b>)</b> &rarr;
50 Compute a data that carries blob X into blob Y and return that delta
51 as a blob.
52
53 * <b>delta_apply(</b><i>X</i><b>,</b><i>D</i><b>)</b> &rarr;
54 Apply delta D to input blob X return a new blob which is the result.
55
56
57 * <b>delta_output_size(</b><i>D</i>)</b> &rarr;
58 Return the size of the output that would result from applying delta D.
59
60 * <b>delta_parse(</b><i>D</i>)</b> &rarr; This is a table-valued function
61 that returns one row for the header, for the trailer, and for each segment
62 in delta D.
63
64
65 <a name="structure"></a><h1>2.0 Structure</h1>
66 <img src="delta1.gif" align="left" hspace="10">
67
68 <p>A delta consists of three parts, a "header", a "trailer", and a
69 "segment-list" between them.</p>
70
71 <p>Both header and trailer provide information about the target
72 helping the decoder, and the segment-list describes how the target can
73 be constructed from the original.</p>
74
75 <a name="header"></a><h2>2.1 Header</h2>
76 <img src="delta6.gif" align="left" hspace="10">
77
78 <p>The header consists of a single number followed by a newline
79 character (ASCII 0x0a). The number is the length of the target in
80 bytes.</p>
@@ -44,11 +82,11 @@
82 <p>This means that, given a delta, the decoder can compute the size of
83 the target (and allocate any necessary memory based on that) by simply
84 reading the first line of the delta and decoding the number found
85 there. In other words, before it has to decode everything else.</p>
86
87 <a name="trailer"></a><h2>2.2 Trailer</h2>
88 <img src="delta5.gif" align="left" hspace="10">
89
90 <p>The trailer consists of a single number followed by a semicolon (ASCII
91 0x3b). This number is a checksum of the target and can be used by a
92 decoder to verify that the delta applied correctly, reconstructing the
@@ -61,11 +99,11 @@
99
100 <p>By putting this information at the end of the delta a decoder has
101 it available immediately after the target has been reconstructed
102 fully.</p>
103
104 <a name="slist"></a><h2>2.3 Segment-List</h2>
105 <img src="delta2.gif" align="left" hspace="10">
106
107 <p>The segment-list of a delta describes how to create the target from
108 the original by a combination of inserting literal byte-sequences and
109 copying ranges of bytes from the original. This is where the
@@ -74,20 +112,20 @@
112
113 <p>The target is constructed from beginning to end, with the data
114 generated by each instruction appended after the data of all previous
115 instructions, with no gaps.</p>
116
117 <a name="insertlit"></a><h3>2.3.1 Insert Literal</h3>
118
119 <p>A literal is specified by two elements, the size of the literal in
120 bytes, and the bytes of the literal itself.</p>
121
122 <img src="delta4.gif" align="left" hspace="10">
123 <p>The length is written first, followed by a colon character (ASCII
124 0x3a), followed by the bytes of the literal.</p>
125
126 <a name="copyrange"></a><h3>2.3.2 Copy Range</h3>
127
128 <p>A range to copy is specified by two numbers, the offset of the
129 first byte in the original to copy, and the size of the range, in
130 bytes. The size zero is special, its usage indicates that the range
131 extends to the end of the original.</p>
@@ -94,11 +132,11 @@
132
133 <img src="delta3.gif" align="left" hspace="10">
134 <p>The length is written first, followed by an "at" character (ASCII
135 0x40), then the offset, followed by a comma (ASCII 0x2c).</p>
136
137 <a name="intcoding"></a><h1>3.0 Encoding of integers</h1>
138
139 <p>
140 The format currently handles only 32 bit integer numbers. They are
141 written base-64 encoded, MSB first, and without leading
142 "0"-characters, except if they are significant (i.e. 0 => "0").
@@ -107,13 +145,13 @@
145 <p>
146 The base-64 coding is described in
147 <a href="http://www.ietf.org/rfc/rfc3548.txt">RFC 3548</a>.
148 </p>
149
150 <a name="examples"></a><h1>4.0 Examples</h1>
151
152 <a name="examplesint"></a><h2>4.1 Integer encoding</h2>
153
154 <table border=1>
155 <tr>
156 <th>Value</th>
157 <th>Encoding</th>
@@ -130,11 +168,11 @@
168 <td>-1101438770</td>
169 <td>2zMM3E</td>
170 </tr>
171 </table>
172
173 <a name="examplesdelta"></a><h2>4.2 Delta encoding</h2>
174
175 <p>An example of a delta using the specified encoding is:</p>
176
177 <table border=1><tr><td><pre>
178 1Xb
@@ -206,11 +244,11 @@
244
245 </pre></td></tr></table>
246
247
248
249 <a name="notes"></a><h1>Notes</h1>
250
251 <ul>
252 <li>Pure text files generate a pure text delta.
253 </li>
254 <li>Binary files generate a delta that may contain some binary data.
255

Keyboard Shortcuts

Open search /
Next entry (timeline) j
Previous entry (timeline) k
Open focused entry Enter
Show this help ?
Toggle theme Top nav button