Fossil SCM

Merged from trunk to verify fix in [62352847].

rberteig 2017-03-13 21:53 rkb-2.0-tests merge
Commit 4077357a38ddb2f17c466960c473fb2e1e1408d10f774b28ef0104519fa4f580
+1 -1
--- VERSION
+++ VERSION
@@ -1,1 +1,1 @@
-2.0
+2.1
DELETED compat/zlib/doc/algorithm.txt
DELETED compat/zlib/doc/rfc1950.txt
DELETED compat/zlib/doc/rfc1951.txt
DELETED compat/zlib/doc/rfc1952.txt
DELETED compat/zlib/doc/txtvsbin.txt
D compat/zlib/doc/algorithm.txt
-209
--- a/compat/zlib/doc/algorithm.txt
+++ b/compat/zlib/doc/algorithm.txt
@@ -1,209 +0,0 @@
-1. Compression algorithm (deflate)
-
-The deflation algorithm used by gzip (also zip and zlib) is a variation of
-LZ77 (Lempel-Ziv 1977, see reference below). It finds duplicated strings in
-the input data. The second occurrence of a string is replaced by a
-pointer to the previous string, in the form of a pair (distance,
-length). Distances are limited to 32K bytes, and lengths are limited
-to 258 bytes. When a string does not occur anywhere in the previous
-32K bytes, it is emitted as a sequence of literal bytes. (In this
-description, `string' must be taken as an arbitrary sequence of bytes,
-and is not restricted to printable characters.)
-
-Literals or match lengths are compressed with one Huffman tree, and
-match distances are compressed with another tree. The trees are stored
-in a compact form at the start of each block. The blocks can have any
-size (except that the compressed data for one block must fit in
-available memory). A block is terminated when deflate() determines that
-it would be useful to start another block with fresh trees. (This is
-somewhat similar to the behavior of LZW-based _compress_.)
-
-Duplicated strings are found using a hash table. All input strings of
-length 3 are inserted in the hash table. A hash index is computed for
-the next 3 bytes. If the hash chain for this index is not empty, all
-strings in the chain are compared with the current input string, and
-the longest match is selected.
-
-The hash chains are searched starting with the most recent strings, to
-favor small distances and thus take advantage of the Huffman encoding.
-The hash chains are singly linked. There are no deletions from the
-hash chains, the algorithm simply discards matches that are too old.
-
-To avoid a worst-case situation, very long hash chains are arbitrarily
-truncated at a certain length, determined by a runtime option (level
-parameter of deflateInit). So deflate() does not always find the longest
-possible match but generally finds a match which is long enough.
-
-deflate() also defers the selection of matches with a lazy evaluation
-mechanism. After a match of length N has been found, deflate() searches for
-a longer match at the next input byte. If a longer match is found, the
-previous match is truncated to a length of one (thus producing a single
-literal byte) and the process of lazy evaluation begins again. Otherwise,
-the original match is kept, and the next match search is attempted only N
-steps later.
-
-The lazy match evaluation is also subject to a runtime parameter. If
-the current match is long enough, deflate() reduces the search for a longer
-match, thus speeding up the whole process. If compression ratio is more
-important than speed, deflate() attempts a complete second search even if
-the first match is already long enough.
-
-The lazy match evaluation is not performed for the fastest compression
-modes (level parameter 1 to 3). For these fast modes, new strings
-are inserted in the hash table only when no match was found, or
-when the match is not too long. This degrades the compression ratio
-but saves time since there are both fewer insertions and fewer searches.
-
-
-2. Decompression algorithm (inflate)
-
-2.1 Introduction
-
-The key question is how to represent a Huffman code (or any prefix code) so
-that you can decode fast. The most important characteristic is that shorter
-codes are much more common than longer codes, so pay attention to decoding the
-short codes fast, and let the long codes take longer to decode.
-
-inflate() sets up a first level table that covers some number of bits of
-input less than the length of longest code. It gets that many bits from the
-stream, and looks it up in the table. The table will tell if the next
-code is that many bits or less and how many, and if it is, it will tell
-the value, else it will point to the next level table for which inflate()
-grabs more bits and tries to decode a longer code.
-
-How many bits to make the first lookup is a tradeoff between the time it
-takes to decode and the time it takes to build the table. If building the
-table took no time (and if you had infinite memory), then there would only
-be a first level table to cover all the way to the longest code. However,
-building the table ends up taking a lot longer for more bits since short
-codes are replicated many times in such a table. What inflate() does is
-simply to make the number of bits in the first table a variable, and then
-to set that variable for the maximum speed.
-
-For inflate, which has 286 possible codes for the literal/length tree, the size
-of the first table is nine bits. Also the distance trees have 30 possible
-values, and the size of the first table is six bits. Note that for each of
-those cases, the table ended up one bit longer than the ``average'' code
-length, i.e. the code length of an approximately flat code which would be a
-little more than eight bits for 286 symbols and a little less than five bits
-for 30 symbols.
-
-
-2.2 More details on the inflate table lookup
-
-Ok, you want to know what this cleverly obfuscated inflate tree actually
-looks like. You are correct that it's not a Huffman tree. It is simply a
-lookup table for the first, let's say, nine bits of a Huffman symbol. The
-symbol could be as short as one bit or as long as 15 bits. If a particular
-symbol is shorter than nine bits, then that symbol's translation is duplicated
-in all those entries that start with that symbol's bits. For example, if the
-symbol is four bits, then it's duplicated 32 times in a nine-bit table. If a
-symbol is nine bits long, it appears in the table once.
-
-If the symbol is longer than nine bits, then that entry in the table points
-to another similar table for the remaining bits. Again, there are duplicated
-entries as needed. The idea is that most of the time the symbol will be short
-and there will only be one table look up. (That's whole idea behind data
-compression in the first place.) For the less frequent long symbols, there
-will be two lookups. If you had a compression method with really long
-symbols, you could have as many levels of lookups as is efficient. For
-inflate, two is enough.
-
-So a table entry either points to another table (in which case nine bits in
-the above example are gobbled), or it contains the translation for the symbol
-and the number of bits to gobble. Then you start again with the next
-ungobbled bit.
-
-You may wonder: why not just have one lookup table for how ever many bits the
-longest symbol is? The reason is that if you do that, you end up spending
-more time filling in duplicate symbol entries than you do actually decoding.
-At least for deflate's output that generates new trees every several 10's of
-kbytes. You can imagine that filling in a 2^15 entry table for a 15-bit code
-would take too long if you're only decoding several thousand symbols. At the
-other extreme, you could make a new table for every bit in the code. In fact,
-that's essentially a Huffman tree. But then you spend too much time
-traversing the tree while decoding, even for short symbols.
-
-So the number of bits for the first lookup table is a trade of the time to
-fill out the table vs. the time spent looking at the second level and above of
-the table.
-
-Here is an example, scaled down:
-
-The code being decoded, with 10 symbols, from 1 to 6 bits long:
-
-A: 0
-B: 10
-C: 1100
-D: 11010
-E: 11011
-F: 11100
-G: 11101
-H: 11110
-I: 111110
-J: 111111
-
-Let's make the first table three bits long (eight entries):
-
-000: A,1
-001: A,1
-010: A,1
-011: A,1
-100: B,2
-101: B,2
-110: -> table X (gobble 3 bits)
-111: -> table Y (gobble 3 bits)
-
-Each entry is what the bits decode as and how many bits that is, i.e. how
-many bits to gobble. Or the entry points to another table, with the number of
-bits to gobble implicit in the size of the table.
-
-Table X is two bits long since the longest code starting with 110 is five bits
-long:
-
-00: C,1
-01: C,1
-10: D,2
-11: E,2
-
-Table Y is three bits long since the longest code starting with 111 is six
-bits long:
-
-000: F,2
-001: F,2
-010: G,2
-011: G,2
-100: H,2
-101: H,2
-110: I,3
-111: J,3
-
-So what we have here are three tables with a total of 20 entries that had to
-be constructed. That's compared to 64 entries for a single table. Or
-compared to 16 entries for a Huffman tree (six two entry tables and one four
-entry table). Assuming that the code ideally represents the probability of
-the symbols, it takes on the average 1.25 lookups per symbol. That's compared
-to one lookup for the single table, or 1.66 lookups per symbol for the
-Huffman tree.
-
-There, I think that gives you a picture of what's going on. For inflate, the
-meaning of a particular symbol is often more than just a letter. It can be a
-byte (a "literal"), or it can be either a length or a distance which
-indicates a base value and a number of bits to fetch after the code that is
-added to the base value. Or it might be the special end-of-block code. The
-data structures created in inftrees.c try to encode all that information
-compactly in the tables.
-
-
-Jean-loup Gailly Mark Adler
-[email protected] [email protected]
-
-
-References:
-
-[LZ77] Ziv J., Lempel A., ``A Universal Algorithm for Sequential Data
-Compression,'' IEEE Transactions on Information Theory, Vol. 23, No. 3,
-pp. 337-343.
-
-``DEFLATE Compressed Data Format Specification'' available in
-http://tools.ietf.org/html/rfc1951
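
Reviewer note on the deleted algorithm.txt: the scaled-down two-level lookup example it described (first table of three bits, sub-tables X and Y) can be sketched in C roughly as follows. This is only an illustration of that worked example, not zlib's inftrees.c code. The names Entry, peek and decode are hypothetical, and the input is assumed to be a string of '0'/'1' characters holding a whole number of codes, read left to right.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One table entry: either a decoded symbol or a link to a sub-table. */
typedef struct Entry {
    char sym;                 /* decoded symbol, or 0 when this is a link  */
    int bits;                 /* symbol: bits to gobble in this table;     */
                              /* link: index width of the sub-table        */
    const struct Entry *sub;  /* sub-table for a link, NULL for a symbol   */
} Entry;

/* Table X: two bits wide, reached after the prefix 110. */
static const Entry table_x[4] = {
    {'C', 1, NULL}, {'C', 1, NULL}, {'D', 2, NULL}, {'E', 2, NULL}
};

/* Table Y: three bits wide, reached after the prefix 111. */
static const Entry table_y[8] = {
    {'F', 2, NULL}, {'F', 2, NULL}, {'G', 2, NULL}, {'G', 2, NULL},
    {'H', 2, NULL}, {'H', 2, NULL}, {'I', 3, NULL}, {'J', 3, NULL}
};

/* First-level table: three bits wide (eight entries). */
#define ROOT_BITS 3
static const Entry root[8] = {
    {'A', 1, NULL}, {'A', 1, NULL}, {'A', 1, NULL}, {'A', 1, NULL},
    {'B', 2, NULL}, {'B', 2, NULL},
    {0, 2, table_x}, {0, 3, table_y}
};

/* Read n bits starting at pos, padding with zeros past the end of input. */
static unsigned peek(const char *bits, size_t len, size_t pos, int n)
{
    unsigned v = 0;
    for (int i = 0; i < n; i++) {
        v <<= 1;
        if (pos + (size_t)i < len && bits[pos + (size_t)i] == '1')
            v |= 1u;
    }
    return v;
}

/* Decode a '0'/'1' string into out (capacity cap, including the NUL).
   Returns the number of symbols written. */
size_t decode(const char *bits, char *out, size_t cap)
{
    size_t len = strlen(bits), pos = 0, n = 0;
    assert(cap > 0);
    while (pos < len && n + 1 < cap) {
        const Entry *e = &root[peek(bits, len, pos, ROOT_BITS)];
        if (e->sub != NULL) {
            pos += ROOT_BITS;                   /* gobble the 3 prefix bits */
            e = &e->sub[peek(bits, len, pos, e->bits)];
        }
        pos += (size_t)e->bits;                 /* gobble the symbol's bits */
        out[n++] = e->sym;
    }
    out[n] = '\0';
    return n;
}
```

Short codes (A, B) resolve in one lookup, while everything starting with 110 or 111 takes a second lookup in X or Y, which is the trade-off the deleted text describes.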
D compat/zlib/doc/rfc1950.txt
-630
--- a/compat/zlib/doc/rfc1950.txt
+++ b/compat/zlib/doc/rfc1950.txt
@@ -1,630 +0,0 @@
-
-
-
-
-
-
-Network Working Group P. Deutsch
-Request for Comments: 1950 Aladdin Enterprises
-Category: Informational J-L. Gailly
- Info-ZIP
- May 1996
-
-
- ZLIB Compressed Data Format Specification version 3.3
-
-Status of This Memo
-
- This memo provides information for the Internet community. This memo
- does not specify an Internet standard of any kind. Distribution of
- this memo is unlimited.
-
-IESG Note:
-
- The IESG takes no position on the validity of any Intellectual
- Property Rights statements contained in this document.
-
-Notices
-
- Copyright (c) 1996 L. Peter Deutsch and Jean-Loup Gailly
-
- Permission is granted to copy and distribute this document for any
- purpose and without charge, including translations into other
- languages and incorporation into compilations, provided that the
- copyright notice and this notice are preserved, and that any
- substantive changes or deletions from the original are clearly
- marked.
-
- A pointer to the latest version of this and related documentation in
- HTML format can be found at the URL
- <ftp://ftp.uu.net/graphics/png/documents/zlib/zdoc-index.html>.
-
-Abstract
-
- This specification defines a lossless compressed data format. The
- data can be produced or consumed, even for an arbitrarily long
- sequentially presented input data stream, using only an a priori
- bounded amount of intermediate storage. The format presently uses
- the DEFLATE compression method but can be easily extended to use
- other compression methods. It can be implemented readily in a manner
- not covered by patents. This specification also defines the ADLER-32
- checksum (an extension and improvement of the Fletcher checksum),
- used for detection of data corruption, and provides an algorithm for
- computing it.
-
-
-
-
-Deutsch & Gailly Informational [Page 1]
-
-
-RFC 1950 ZLIB Compressed Data Format Specification May 1996
-
-
-Table of Contents
-
- 1. Introduction ................................................... 2
- 1.1. Purpose ................................................... 2
- 1.2. Intended audience ......................................... 3
- 1.3. Scope ..................................................... 3
- 1.4. Compliance ................................................ 3
- 1.5. Definitions of terms and conventions used ................ 3
- 1.6. Changes from previous versions ............................ 3
- 2. Detailed specification ......................................... 3
- 2.1. Overall conventions ....................................... 3
- 2.2. Data format ............................................... 4
- 2.3. Compliance ................................................ 7
- 3. References ..................................................... 7
- 4. Source code .................................................... 8
- 5. Security Considerations ........................................ 8
- 6. Acknowledgements ............................................... 8
- 7. Authors' Addresses ............................................. 8
- 8. Appendix: Rationale ............................................ 9
- 9. Appendix: Sample code ..........................................10
-
-1. Introduction
-
- 1.1. Purpose
-
- The purpose of this specification is to define a lossless
- compressed data format that:
-
- * Is independent of CPU type, operating system, file system,
- and character set, and hence can be used for interchange;
-
- * Can be produced or consumed, even for an arbitrarily long
- sequentially presented input data stream, using only an a
- priori bounded amount of intermediate storage, and hence can
- be used in data communications or similar structures such as
- Unix filters;
-
- * Can use a number of different compression methods;
-
- * Can be implemented readily in a manner not covered by
- patents, and hence can be practiced freely.
-
- The data format defined by this specification does not attempt to
- allow random access to compressed data.
-
-
-
-
-
-
-
-Deutsch & Gailly Informational [Page 2]
-
-
-RFC 1950 ZLIB Compressed Data Format Specification May 1996
-
-
- 1.2. Intended audience
-
- This specification is intended for use by implementors of software
- to compress data into zlib format and/or decompress data from zlib
- format.
-
- The text of the specification assumes a basic background in
- programming at the level of bits and other primitive data
- representations.
-
- 1.3. Scope
-
- The specification specifies a compressed data format that can be
- used for in-memory compression of a sequence of arbitrary bytes.
-
- 1.4. Compliance
-
- Unless otherwise indicated below, a compliant decompressor must be
- able to accept and decompress any data set that conforms to all
- the specifications presented here; a compliant compressor must
- produce data sets that conform to all the specifications presented
- here.
-
- 1.5. Definitions of terms and conventions used
-
- byte: 8 bits stored or transmitted as a unit (same as an octet).
- (For this specification, a byte is exactly 8 bits, even on
- machines which store a character on a number of bits different
- from 8.) See below, for the numbering of bits within a byte.
-
- 1.6. Changes from previous versions
-
- Version 3.1 was the first public release of this specification.
- In version 3.2, some terminology was changed and the Adler-32
- sample code was rewritten for clarity. In version 3.3, the
- support for a preset dictionary was introduced, and the
- specification was converted to RFC style.
-
-2. Detailed specification
-
- 2.1. Overall conventions
-
- In the diagrams below, a box like this:
-
- +---+
- | | <-- the vertical bars might be missing
- +---+
-
-
-
-
-Deutsch & Gailly Informational [Page 3]
-
-
-RFC 1950 ZLIB Compressed Data Format Specification May 1996
-
-
- represents one byte; a box like this:
-
- +==============+
- | |
- +==============+
-
- represents a variable number of bytes.
-
- Bytes stored within a computer do not have a "bit order", since
- they are always treated as a unit. However, a byte considered as
- an integer between 0 and 255 does have a most- and least-
- significant bit, and since we write numbers with the most-
- significant digit on the left, we also write bytes with the most-
- significant bit on the left. In the diagrams below, we number the
- bits of a byte so that bit 0 is the least-significant bit, i.e.,
- the bits are numbered:
-
- +--------+
- |76543210|
- +--------+
-
- Within a computer, a number may occupy multiple bytes. All
- multi-byte numbers in the format described here are stored with
- the MOST-significant byte first (at the lower memory address).
- For example, the decimal number 520 is stored as:
-
- 0 1
- +--------+--------+
- |00000010|00001000|
- +--------+--------+
- ^ ^
- | |
- | + less significant byte = 8
- + more significant byte = 2 x 256
-
- 2.2. Data format
-
- A zlib stream has the following structure:
-
- 0 1
- +---+---+
- |CMF|FLG| (more-->)
- +---+---+
-
-
-
-
-
-
-
-
-Deutsch & Gailly Informational [Page 4]
-
-
-RFC 1950 ZLIB Compressed Data Format Specification May 1996
-
-
- (if FLG.FDICT set)
-
- 0 1 2 3
- +---+---+---+---+
- | DICTID | (more-->)
- +---+---+---+---+
-
- +=====================+---+---+---+---+
- |...compressed data...| ADLER32 |
- +=====================+---+---+---+---+
-
- Any data which may appear after ADLER32 are not part of the zlib
- stream.
-
- CMF (Compression Method and flags)
- This byte is divided into a 4-bit compression method and a 4-
- bit information field depending on the compression method.
-
- bits 0 to 3 CM Compression method
- bits 4 to 7 CINFO Compression info
-
- CM (Compression method)
- This identifies the compression method used in the file. CM = 8
- denotes the "deflate" compression method with a window size up
- to 32K. This is the method used by gzip and PNG (see
- references [1] and [2] in Chapter 3, below, for the reference
- documents). CM = 15 is reserved. It might be used in a future
- version of this specification to indicate the presence of an
- extra field before the compressed data.
-
- CINFO (Compression info)
- For CM = 8, CINFO is the base-2 logarithm of the LZ77 window
- size, minus eight (CINFO=7 indicates a 32K window size). Values
- of CINFO above 7 are not allowed in this version of the
- specification. CINFO is not defined in this specification for
- CM not equal to 8.
-
- FLG (FLaGs)
- This flag byte is divided as follows:
-
- bits 0 to 4 FCHECK (check bits for CMF and FLG)
- bit 5 FDICT (preset dictionary)
- bits 6 to 7 FLEVEL (compression level)
-
- The FCHECK value must be such that CMF and FLG, when viewed as
- a 16-bit unsigned integer stored in MSB order (CMF*256 + FLG),
- is a multiple of 31.
-
-
-
-
-Deutsch & Gailly Informational [Page 5]
-
-
289
-RFC 1950 ZLIB Compressed Data Format Specification May 1996
290
-
291
-
292
- FDICT (Preset dictionary)
293
- If FDICT is set, a DICTID dictionary identifier is present
294
- immediately after the FLG byte. The dictionary is a sequence of
295
- bytes which are initially fed to the compressor without
296
- producing any compressed output. DICTID is the Adler-32 checksum
297
- of this sequence of bytes (see the definition of ADLER32
298
- below). The decompressor can use this identifier to determine
299
- which dictionary has been used by the compressor.
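
A decompressor might resolve the dictionary identifier along the following
lines. This is a sketch only: the preset_dict type, the example_dicts table
and its contents, and the functions dict_adler32 and find_preset_dict are
all hypothetical, since this specification does not define the contents of
any dictionary.

```c
#include <assert.h>   /* only needed for the checks in the usage note */
#include <stddef.h>
#include <stdint.h>

/* Plain per-byte Adler-32: s1 starts at 1, s2 at 0, both modulo 65521. */
uint32_t dict_adler32(const unsigned char *buf, size_t len)
{
    uint32_t s1 = 1, s2 = 0;
    size_t n;

    for (n = 0; n < len; n++) {
        s1 = (s1 + buf[n]) % 65521u;
        s2 = (s2 + s1) % 65521u;
    }
    return (s2 << 16) | s1;
}

/* Hypothetical description of one preset dictionary known to the decoder. */
typedef struct {
    const unsigned char *data;
    size_t len;
} preset_dict;

/* Made-up dictionary contents, for illustration only. */
const preset_dict example_dicts[] = {
    { (const unsigned char *)"abc", 3 },
    { (const unsigned char *)"Wikipedia", 9 },
};

/* Return the index of the dictionary whose Adler-32 equals the identifier
   read from the stream, or -1 if no known dictionary matches (in which
   case a compliant decompressor must report an error). */
int find_preset_dict(const preset_dict *dicts, size_t count, uint32_t id)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (dict_adler32(dicts[i].data, dicts[i].len) == id)
            return (int)i;
    }
    return -1;
}
```

With the table above, the identifier 0x11E60398 selects the second entry,
and an identifier that matches no entry yields -1.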
300
-
301
- FLEVEL (Compression level)
302
- These flags are available for use by specific compression
303
- methods. The "deflate" method (CM = 8) sets these flags as
304
- follows:
305
-
306
- 0 - compressor used fastest algorithm
307
- 1 - compressor used fast algorithm
308
- 2 - compressor used default algorithm
309
- 3 - compressor used maximum compression, slowest algorithm
310
-
311
- The information in FLEVEL is not needed for decompression; it
312
- is there to indicate if recompression might be worthwhile.
313
-
314
- compressed data
315
- For compression method 8, the compressed data is stored in the
316
- deflate compressed data format as described in the document
317
- "DEFLATE Compressed Data Format Specification" by L. Peter
318
- Deutsch. (See reference [3] in Chapter 3, below)
319
-
320
- Other compressed data formats are not specified in this version
321
- of the zlib specification.
322
-
323
- ADLER32 (Adler-32 checksum)
324
- This contains a checksum value of the uncompressed data
325
- (excluding any dictionary data) computed according to the Adler-32
326
- algorithm. This algorithm is a 32-bit extension and improvement
327
- of the Fletcher algorithm, used in the ITU-T X.224 / ISO 8073
328
- standard. (See references [4] and [5] in Chapter 3, below.)
329
-
330
- Adler-32 is composed of two sums accumulated per byte: s1 is
331
- the sum of all bytes, s2 is the sum of all s1 values. Both sums
332
- are done modulo 65521. s1 is initialized to 1, s2 to zero. The
333
- Adler-32 checksum is stored as s2*65536 + s1 in most-
334
- significant-byte first (network) order.
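
The storage rule above (s2*65536 + s1, most-significant byte first) can be
sketched in C as follows. The helper names put_adler32_be and
get_adler32_be are hypothetical and are not part of this specification.

```c
#include <assert.h>   /* only needed for the checks in the usage note */
#include <stdint.h>

/* Combine the two sums and write them as four bytes, MSB first. */
void put_adler32_be(uint8_t out[4], uint32_t s1, uint32_t s2)
{
    uint32_t adler = s2 * 65536u + s1;

    out[0] = (uint8_t)(adler >> 24);
    out[1] = (uint8_t)(adler >> 16);
    out[2] = (uint8_t)(adler >> 8);
    out[3] = (uint8_t)adler;
}

/* Read the four trailing ADLER32 bytes back into a 32-bit value. */
uint32_t get_adler32_be(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) |
           ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |
            (uint32_t)in[3];
}
```

For example, s1 = 0x0398 and s2 = 0x11E6 are stored as the bytes
11 E6 03 98, which read back as 0x11E60398.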
335
-
349
- 2.3. Compliance
350
-
351
- A compliant compressor must produce streams with correct CMF, FLG
352
- and ADLER32, but need not support preset dictionaries. When the
353
- zlib data format is used as part of another standard data format,
354
- the compressor may use only preset dictionaries that are specified
355
- by this other data format. If this other format does not use the
356
- preset dictionary feature, the compressor must not set the FDICT
357
- flag.
358
-
359
- A compliant decompressor must check CMF, FLG, and ADLER32, and
360
- provide an error indication if any of these have incorrect values.
361
- A compliant decompressor must give an error indication if CM is
362
- not one of the values defined in this specification (only the
363
- value 8 is permitted in this version), since another value could
364
- indicate the presence of new features that would cause subsequent
365
- data to be interpreted incorrectly. A compliant decompressor must
366
- give an error indication if FDICT is set and DICTID is not the
367
- identifier of a known preset dictionary. A decompressor may
368
- ignore FLEVEL and still be compliant. When the zlib data format
369
- is being used as a part of another standard format, a compliant
370
- decompressor must support all the preset dictionaries specified by
371
- the other format. When the other format does not use the preset
372
- dictionary feature, a compliant decompressor must reject any
373
- stream in which the FDICT flag is set.
374
-
375
-3. References
376
-
377
- [1] Deutsch, L.P., "GZIP Compressed Data Format Specification",
378
- available in ftp://ftp.uu.net/pub/archiving/zip/doc/
379
-
380
- [2] Thomas Boutell, "PNG (Portable Network Graphics) specification",
381
- available in ftp://ftp.uu.net/graphics/png/documents/
382
-
383
- [3] Deutsch, L.P., "DEFLATE Compressed Data Format Specification",
384
- available in ftp://ftp.uu.net/pub/archiving/zip/doc/
385
-
386
- [4] Fletcher, J. G., "An Arithmetic Checksum for Serial
387
- Transmissions," IEEE Transactions on Communications, Vol. COM-30,
388
- No. 1, January 1982, pp. 247-252.
389
-
390
- [5] ITU-T Recommendation X.224, Annex D, "Checksum Algorithms,"
391
- November, 1993, pp. 144, 145. (Available from
392
- gopher://info.itu.ch). ITU-T X.224 is the same as ISO 8073.
393
-
406
-4. Source code
407
-
408
- Source code for a C language implementation of a "zlib" compliant
409
- library is available at ftp://ftp.uu.net/pub/archiving/zip/zlib/.
410
-
411
-5. Security Considerations
412
-
413
- A decoder that fails to check the ADLER32 checksum value may be
414
- subject to undetected data corruption.
415
-
416
-6. Acknowledgements
417
-
418
- Trademarks cited in this document are the property of their
419
- respective owners.
420
-
421
- Jean-Loup Gailly and Mark Adler designed the zlib format and wrote
422
- the related software described in this specification. Glenn
423
- Randers-Pehrson converted this document to RFC and HTML format.
424
-
425
-7. Authors' Addresses
426
-
427
- L. Peter Deutsch
428
- Aladdin Enterprises
429
- 203 Santa Margarita Ave.
430
- Menlo Park, CA 94025
431
-
432
- Phone: (415) 322-0103 (AM only)
433
- FAX: (415) 322-1734
434
- EMail: <[email protected]>
435
-
436
-
437
- Jean-Loup Gailly
438
-
439
- EMail: <[email protected]>
440
-
441
- Questions about the technical content of this specification can be
442
- sent by email to
443
-
444
- Jean-Loup Gailly <[email protected]> and
445
- Mark Adler <[email protected]>
446
-
447
- Editorial comments on this specification can be sent by email to
448
-
449
- L. Peter Deutsch <[email protected]> and
450
- Glenn Randers-Pehrson <[email protected]>
451
-
463
-8. Appendix: Rationale
464
-
465
- 8.1. Preset dictionaries
466
-
467
- A preset dictionary is especially useful to compress short input
468
- sequences. The compressor can take advantage of the dictionary
469
- context to encode the input in a more compact manner. The
470
- decompressor can be initialized with the appropriate context by
471
- virtually decompressing a compressed version of the dictionary
472
- without producing any output. However, for certain compression
473
- algorithms such as the deflate algorithm this operation can be
474
- achieved without actually performing any decompression.
475
-
476
- The compressor and the decompressor must use exactly the same
477
- dictionary. The dictionary may be fixed or may be chosen among a
478
- certain number of predefined dictionaries, according to the kind
479
- of input data. The decompressor can determine which dictionary has
480
- been chosen by the compressor by checking the dictionary
481
- identifier. This document does not specify the contents of
482
- predefined dictionaries, since the optimal dictionaries are
483
- application specific. Standard data formats using this feature of
484
- the zlib specification must precisely define the allowed
485
- dictionaries.
486
-
487
- 8.2. The Adler-32 algorithm
488
-
489
- The Adler-32 algorithm is much faster than the CRC32 algorithm yet
490
- still provides an extremely low probability of undetected errors.
491
-
492
- The modulo on unsigned long accumulators can be delayed for 5552
493
- bytes, so the modulo operation time is negligible. If the bytes
494
- are a, b, c, the second sum is 3a + 2b + c + 3, and so is position
495
- and order sensitive, unlike the first sum, which is just a
496
- checksum. That 65521 is prime is important to avoid a possible
497
- large class of two-byte errors that leave the check unchanged.
498
- (The Fletcher checksum uses 255, which is not prime and which also
499
- makes the Fletcher check insensitive to single byte changes 0 <->
500
- 255.)
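
The deferred modulo described above can be sketched as follows, assuming
32-bit unsigned accumulators (uint32_t). The names adler32_simple,
adler32_deferred and NMAX are illustrative and are not part of this
specification; the per-byte version is included only for comparison.

```c
#include <assert.h>   /* only needed for the checks in the usage note */
#include <stddef.h>
#include <stdint.h>

#define BASE 65521u   /* largest prime smaller than 65536 */
#define NMAX 5552     /* bytes that can be summed before a modulo is needed */

/* Reference version: reduce modulo BASE after every byte. */
uint32_t adler32_simple(uint32_t adler, const unsigned char *buf, size_t len)
{
    uint32_t s1 = adler & 0xffffu;
    uint32_t s2 = (adler >> 16) & 0xffffu;
    size_t n;

    for (n = 0; n < len; n++) {
        s1 = (s1 + buf[n]) % BASE;
        s2 = (s2 + s1) % BASE;
    }
    return (s2 << 16) | s1;
}

/* Deferred version: accumulate up to NMAX bytes, then reduce once.
   With s1, s2 < BASE at the start of a chunk and every byte at most 255,
   s2 stays below 2^32 for chunks of up to 5552 bytes. */
uint32_t adler32_deferred(uint32_t adler, const unsigned char *buf, size_t len)
{
    uint32_t s1 = adler & 0xffffu;
    uint32_t s2 = (adler >> 16) & 0xffffu;

    while (len > 0) {
        size_t chunk = len < NMAX ? len : NMAX;

        len -= chunk;
        while (chunk-- > 0) {
            s1 += *buf++;
            s2 += s1;
        }
        s1 %= BASE;
        s2 %= BASE;
    }
    return (s2 << 16) | s1;
}
```

For the three bytes 'a', 'b', 'c' (97, 98, 99), s1 = 1 + 97 + 98 + 99 = 295
and s2 = 3*97 + 2*98 + 99 + 3 = 589, so both functions return 0x024D0127
when started from 1.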
501
-
502
- The sum s1 is initialized to 1 instead of zero to make the length
503
- of the sequence part of s2, so that the length does not have to be
504
- checked separately. (Any sequence of zeroes has a Fletcher
505
- checksum of zero.)
506
-
520
-9. Appendix: Sample code
521
-
522
- The following C code computes the Adler-32 checksum of a data buffer.
523
- It is written for clarity, not for speed. The sample code is in the
524
- ANSI C programming language. Non-C users may find it easier to read
525
- with these hints:
526
-
527
- & Bitwise AND operator.
528
- >> Bitwise right shift operator. When applied to an
529
- unsigned quantity, as here, right shift inserts zero bit(s)
530
- at the left.
531
- << Bitwise left shift operator. Left shift inserts zero
532
- bit(s) at the right.
533
- ++ "n++" increments the variable n.
534
- % modulo operator: a % b is the remainder of a divided by b.
535
-
-   #define BASE 65521 /* largest prime smaller than 65536 */
-
-   /*
-      Update a running Adler-32 checksum with the bytes buf[0..len-1]
-    and return the updated checksum. The Adler-32 checksum should be
-    initialized to 1.
-
-    Usage example:
-
-      unsigned long adler = 1L;
-
-      while (read_buffer(buffer, length) != EOF) {
-        adler = update_adler32(adler, buffer, length);
-      }
-      if (adler != original_adler) error();
-   */
-   unsigned long update_adler32(unsigned long adler,
-      unsigned char *buf, int len)
-   {
-     unsigned long s1 = adler & 0xffff;
-     unsigned long s2 = (adler >> 16) & 0xffff;
-     int n;
-
-     for (n = 0; n < len; n++) {
-       s1 = (s1 + buf[n]) % BASE;
-       s2 = (s2 + s1)     % BASE;
-     }
-     return (s2 << 16) + s1;
-   }
-
-   /* Return the adler32 of the bytes buf[0..len-1] */
-   unsigned long adler32(unsigned char *buf, int len)
-   {
-     return update_adler32(1L, buf, len);
-   }
D compat/zlib/doc/rfc1951.txt
-972
--- a/compat/zlib/doc/rfc1951.txt
+++ b/compat/zlib/doc/rfc1951.txt
@@ -1,972 +0,0 @@
1
-
2
-
3
-
4
-
5
-
6
-
7
-Network Working Group P. Deutsch
8
-Request for Comments: 1951 Aladdin Enterprises
9
-Category: Informational May 1996
10
-
11
-
12
- DEFLATE Compressed Data Format Specification version 1.3
13
-
14
-Status of This Memo
15
-
16
- This memo provides information for the Internet community. This memo
17
- does not specify an Internet standard of any kind. Distribution of
18
- this memo is unlimited.
19
-
20
-IESG Note:
21
-
22
- The IESG takes no position on the validity of any Intellectual
23
- Property Rights statements contained in this document.
24
-
25
-Notices
26
-
27
- Copyright (c) 1996 L. Peter Deutsch
28
-
29
- Permission is granted to copy and distribute this document for any
30
- purpose and without charge, including translations into other
31
- languages and incorporation into compilations, provided that the
32
- copyright notice and this notice are preserved, and that any
33
- substantive changes or deletions from the original are clearly
34
- marked.
35
-
36
- A pointer to the latest version of this and related documentation in
37
- HTML format can be found at the URL
38
- <ftp://ftp.uu.net/graphics/png/documents/zlib/zdoc-index.html>.
39
-
40
-Abstract
41
-
42
- This specification defines a lossless compressed data format that
43
- compresses data using a combination of the LZ77 algorithm and Huffman
44
- coding, with efficiency comparable to the best currently available
45
- general-purpose compression methods. The data can be produced or
46
- consumed, even for an arbitrarily long sequentially presented input
47
- data stream, using only an a priori bounded amount of intermediate
48
- storage. The format can be implemented readily in a manner not
49
- covered by patents.
50
-
51
-
52
-
53
-
54
-
55
-
56
-
57
-
58
-Deutsch Informational [Page 1]
59
-
60
-
61
-RFC 1951 DEFLATE Compressed Data Format Specification May 1996
62
-
63
-
64
-Table of Contents
65
-
66
- 1. Introduction ................................................... 2
67
- 1.1. Purpose ................................................... 2
68
- 1.2. Intended audience ......................................... 3
69
- 1.3. Scope ..................................................... 3
70
- 1.4. Compliance ................................................ 3
71
- 1.5. Definitions of terms and conventions used ................ 3
72
- 1.6. Changes from previous versions ............................ 4
73
- 2. Compressed representation overview ............................. 4
74
- 3. Detailed specification ......................................... 5
75
- 3.1. Overall conventions ....................................... 5
76
- 3.1.1. Packing into bytes .................................. 5
77
- 3.2. Compressed block format ................................... 6
78
- 3.2.1. Synopsis of prefix and Huffman coding ............... 6
79
- 3.2.2. Use of Huffman coding in the "deflate" format ....... 7
80
- 3.2.3. Details of block format ............................. 9
81
- 3.2.4. Non-compressed blocks (BTYPE=00) ................... 11
82
- 3.2.5. Compressed blocks (length and distance codes) ...... 11
83
- 3.2.6. Compression with fixed Huffman codes (BTYPE=01) .... 12
84
- 3.2.7. Compression with dynamic Huffman codes (BTYPE=10) .. 13
85
- 3.3. Compliance ............................................... 14
86
- 4. Compression algorithm details ................................. 14
87
- 5. References .................................................... 16
88
- 6. Security Considerations ....................................... 16
89
- 7. Source code ................................................... 16
90
- 8. Acknowledgements .............................................. 16
91
- 9. Author's Address .............................................. 17
92
-
93
-1. Introduction
94
-
95
- 1.1. Purpose
96
-
97
- The purpose of this specification is to define a lossless
98
- compressed data format that:
99
- * Is independent of CPU type, operating system, file system,
100
- and character set, and hence can be used for interchange;
101
- * Can be produced or consumed, even for an arbitrarily long
102
- sequentially presented input data stream, using only an a
103
- priori bounded amount of intermediate storage, and hence
104
- can be used in data communications or similar structures
105
- such as Unix filters;
106
- * Compresses data with efficiency comparable to the best
107
- currently available general-purpose compression methods,
108
- and in particular considerably better than the "compress"
109
- program;
110
- * Can be implemented readily in a manner not covered by
111
- patents, and hence can be practiced freely;
121
- * Is compatible with the file format produced by the current
122
- widely used gzip utility, in that conforming decompressors
123
- will be able to read data produced by the existing gzip
124
- compressor.
125
-
126
- The data format defined by this specification does not attempt to:
127
-
128
- * Allow random access to compressed data;
129
- * Compress specialized data (e.g., raster graphics) as well
130
- as the best currently available specialized algorithms.
131
-
132
- A simple counting argument shows that no lossless compression
133
- algorithm can compress every possible input data set. For the
134
- format defined here, the worst case expansion is 5 bytes per 32K-
135
- byte block, i.e., a size increase of 0.015% for large data sets.
136
- English text usually compresses by a factor of 2.5 to 3;
137
- executable files usually compress somewhat less; graphical data
138
- such as raster images may compress much more.
139
-
140
- 1.2. Intended audience
141
-
142
- This specification is intended for use by implementors of software
143
- to compress data into "deflate" format and/or decompress data from
144
- "deflate" format.
145
-
146
- The text of the specification assumes a basic background in
147
- programming at the level of bits and other primitive data
148
- representations. Familiarity with the technique of Huffman coding
149
- is helpful but not required.
150
-
151
- 1.3. Scope
152
-
153
- The specification specifies a method for representing a sequence
154
- of bytes as a (usually shorter) sequence of bits, and a method for
155
- packing the latter bit sequence into bytes.
156
-
157
- 1.4. Compliance
158
-
159
- Unless otherwise indicated below, a compliant decompressor must be
160
- able to accept and decompress any data set that conforms to all
161
- the specifications presented here; a compliant compressor must
162
- produce data sets that conform to all the specifications presented
163
- here.
164
-
165
- 1.5. Definitions of terms and conventions used
166
-
167
- Byte: 8 bits stored or transmitted as a unit (same as an octet).
168
- For this specification, a byte is exactly 8 bits, even on machines
178
- which store a character on a number of bits different from eight.
179
- See below, for the numbering of bits within a byte.
180
-
181
- String: a sequence of arbitrary bytes.
182
-
183
- 1.6. Changes from previous versions
184
-
185
- There have been no technical changes to the deflate format since
186
- version 1.1 of this specification. In version 1.2, some
187
- terminology was changed. Version 1.3 is a conversion of the
188
- specification to RFC style.
189
-
190
-2. Compressed representation overview
191
-
192
- A compressed data set consists of a series of blocks, corresponding
193
- to successive blocks of input data. The block sizes are arbitrary,
194
- except that non-compressible blocks are limited to 65,535 bytes.
195
-
196
- Each block is compressed using a combination of the LZ77 algorithm
197
- and Huffman coding. The Huffman trees for each block are independent
198
- of those for previous or subsequent blocks; the LZ77 algorithm may
199
- use a reference to a duplicated string occurring in a previous block,
200
- up to 32K input bytes before.
201
-
202
- Each block consists of two parts: a pair of Huffman code trees that
203
- describe the representation of the compressed data part, and a
204
- compressed data part. (The Huffman trees themselves are compressed
205
- using Huffman encoding.) The compressed data consists of a series of
206
- elements of two types: literal bytes (of strings that have not been
207
- detected as duplicated within the previous 32K input bytes), and
208
- pointers to duplicated strings, where a pointer is represented as a
209
- pair <length, backward distance>. The representation used in the
210
- "deflate" format limits distances to 32K bytes and lengths to 258
211
- bytes, but does not limit the size of a block, except for
212
- uncompressible blocks, which are limited as noted above.
213
-
214
- Each type of value (literals, distances, and lengths) in the
215
- compressed data is represented using a Huffman code, using one code
216
- tree for literals and lengths and a separate code tree for distances.
217
- The code trees for each block appear in a compact form just before
218
- the compressed data for that block.
219
-
235
-3. Detailed specification
236
-
237
- 3.1. Overall conventions
-
- In the diagrams below, a box like this:
238
-
239
- +---+
240
- | | <-- the vertical bars might be missing
241
- +---+
242
-
243
- represents one byte; a box like this:
244
-
245
- +==============+
246
- | |
247
- +==============+
248
-
249
- represents a variable number of bytes.
250
-
251
- Bytes stored within a computer do not have a "bit order", since
252
- they are always treated as a unit. However, a byte considered as
253
- an integer between 0 and 255 does have a most- and least-
254
- significant bit, and since we write numbers with the most-
255
- significant digit on the left, we also write bytes with the most-
256
- significant bit on the left. In the diagrams below, we number the
257
- bits of a byte so that bit 0 is the least-significant bit, i.e.,
258
- the bits are numbered:
259
-
260
- +--------+
261
- |76543210|
262
- +--------+
263
-
264
- Within a computer, a number may occupy multiple bytes. All
265
- multi-byte numbers in the format described here are stored with
266
- the least-significant byte first (at the lower memory address).
267
- For example, the decimal number 520 is stored as:
268
-
269
- 0 1
270
- +--------+--------+
271
- |00001000|00000010|
272
- +--------+--------+
273
- ^ ^
274
- | |
275
- | + more significant byte = 2 x 256
276
- + less significant byte = 8
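The byte order described above can be sketched in C. This is a minimal illustration, not code from zlib; the helper name `read_le16` is hypothetical.

```c
#include <stdint.h>

/* Read a 16-bit value stored least-significant byte first.
 * read_le16 is an illustrative name, not part of any library. */
static uint16_t read_le16(const unsigned char *p)
{
    /* p[0] is the less significant byte, p[1] the more significant one. */
    return (uint16_t)(p[0] | (p[1] << 8));
}
```

With the two bytes from the diagram (00001000 then 00000010, that is 0x08 then 0x02), the function yields 8 + 2 x 256 = 520.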
277
-
278
- 3.1.1. Packing into bytes
279
-
280
- This document does not address the issue of the order in which
281
- bits of a byte are transmitted on a bit-sequential medium,
282
- since the final data format described here is byte- rather than
292
- bit-oriented. However, we describe the compressed block format
293
- below as a sequence of data elements of various bit
294
- lengths, not a sequence of bytes. We must therefore specify
295
- how to pack these data elements into bytes to form the final
296
- compressed byte sequence:
297
-
298
- * Data elements are packed into bytes in order of
299
- increasing bit number within the byte, i.e., starting
300
- with the least-significant bit of the byte.
301
- * Data elements other than Huffman codes are packed
302
- starting with the least-significant bit of the data
303
- element.
304
- * Huffman codes are packed starting with the most-
305
- significant bit of the code.
306
-
307
- In other words, if one were to print out the compressed data as
308
- a sequence of bytes, starting with the first byte at the
309
- *right* margin and proceeding to the *left*, with the most-
310
- significant bit of each byte on the left as usual, one would be
311
- able to parse the result from right to left, with fixed-width
312
- elements in the correct MSB-to-LSB order and Huffman codes in
313
- bit-reversed order (i.e., with the first bit of the code in the
314
- relative LSB position).
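The packing rules above can be sketched as a small bit reader. This is only an illustration under stated assumptions: the type `bit_reader` and the functions `next_bit`, `read_bits`, and `read_code_bits` are hypothetical names, and reads past the end of the buffer simply return zero bits.

```c
#include <stddef.h>

/* Hypothetical bit reader; names are illustrative, not from zlib. */
typedef struct {
    const unsigned char *data;
    size_t len;     /* number of bytes in data */
    size_t bitpos;  /* index of the next bit to read, counted from 0 */
} bit_reader;

/* Bits are consumed starting with the least-significant bit of each byte. */
static unsigned next_bit(bit_reader *br)
{
    size_t byte = br->bitpos >> 3;
    if (byte >= br->len)
        return 0;                       /* assumption: pad with zero bits */
    unsigned bit = (br->data[byte] >> (br->bitpos & 7)) & 1u;
    br->bitpos++;
    return bit;
}

/* Fixed-width element: the first bit read is the LSB of the value. */
static unsigned read_bits(bit_reader *br, int n)
{
    unsigned v = 0;
    for (int i = 0; i < n; i++)
        v |= next_bit(br) << i;
    return v;
}

/* Huffman code bits: the first bit read is the MSB of the code. */
static unsigned read_code_bits(bit_reader *br, int n)
{
    unsigned v = 0;
    for (int i = 0; i < n; i++)
        v = (v << 1) | next_bit(br);
    return v;
}
```

Reading three bits from the byte 00000011 gives 3 when treated as a fixed-width element and 6 (binary 110) when treated as the leading bits of a Huffman code, which is the bit reversal the paragraph describes.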
315
-
316
- 3.2. Compressed block format
317
-
318
- 3.2.1. Synopsis of prefix and Huffman coding
319
-
320
- Prefix coding represents symbols from an a priori known
321
- alphabet by bit sequences (codes), one code for each symbol, in
322
- a manner such that different symbols may be represented by bit
323
- sequences of different lengths, but a parser can always parse
324
- an encoded string unambiguously symbol-by-symbol.
325
-
326
- We define a prefix code in terms of a binary tree in which the
327
- two edges descending from each non-leaf node are labeled 0 and
328
- 1 and in which the leaf nodes correspond one-for-one with (are
329
- labeled with) the symbols of the alphabet; then the code for a
330
- symbol is the sequence of 0's and 1's on the edges leading from
331
- the root to the leaf labeled with that symbol. For example:
332
-
349
- /\ Symbol Code
350
- 0 1 ------ ----
351
- / \ A 00
352
- /\ B B 1
353
- 0 1 C 011
354
- / \ D 010
355
- A /\
356
- 0 1
357
- / \
358
- D C
359
-
360
- A parser can decode the next symbol from an encoded input
361
- stream by walking down the tree from the root, at each step
362
- choosing the edge corresponding to the next input bit.
363
-
364
- Given an alphabet with known symbol frequencies, the Huffman
365
- algorithm allows the construction of an optimal prefix code
366
- (one which represents strings with those symbol frequencies
367
- using the fewest bits of any possible prefix codes for that
368
- alphabet). Such a code is called a Huffman code. (See
369
- reference [1] in Chapter 5, References, for additional
370
- information on Huffman codes.)
371
-
372
- Note that in the "deflate" format, the Huffman codes for the
373
- various alphabets must not exceed certain maximum code lengths.
374
- This constraint complicates the algorithm for computing code
375
- lengths from symbol frequencies. Again, see Chapter 5,
376
- References, for details.
377
-
378
- 3.2.2. Use of Huffman coding in the "deflate" format
379
-
380
- The Huffman codes used for each alphabet in the "deflate"
381
- format have two additional rules:
382
-
383
- * All codes of a given bit length have lexicographically
384
- consecutive values, in the same order as the symbols
385
- they represent;
386
-
387
- * Shorter codes lexicographically precede longer codes.
388
-
406
- We could recode the example above to obey these two rules as
407
- follows, assuming that the order of the alphabet is ABCD:
408
-
409
- Symbol Code
410
- ------ ----
411
- A 10
412
- B 0
413
- C 110
414
- D 111
415
-
416
- I.e., 0 precedes 10 which precedes 11x, and 110 and 111 are
417
- lexicographically consecutive.
418
-
419
- Given this rule, we can define the Huffman code for an alphabet
420
- just by giving the bit lengths of the codes for each symbol of
421
- the alphabet in order; this is sufficient to determine the
422
- actual codes. In our example, the code is completely defined
423
- by the sequence of bit lengths (2, 1, 3, 3). The following
424
- algorithm generates the codes as integers, intended to be read
425
- from most- to least-significant bit. The code lengths are
426
- initially in tree[I].Len; the codes are produced in
427
- tree[I].Code.
428
-
429
- 1) Count the number of codes for each code length. Let
430
- bl_count[N] be the number of codes of length N, N >= 1.
431
-
432
- 2) Find the numerical value of the smallest code for each
433
- code length:
434
-
435
- code = 0;
436
- bl_count[0] = 0;
437
- for (bits = 1; bits <= MAX_BITS; bits++) {
438
- code = (code + bl_count[bits-1]) << 1;
439
- next_code[bits] = code;
440
- }
441
-
442
- 3) Assign numerical values to all codes, using consecutive
443
- values for all codes of the same length with the base
444
- values determined at step 2. Codes that are never used
445
- (which have a bit length of zero) must not be assigned a
446
- value.
447
-
448
- for (n = 0; n <= max_code; n++) {
449
- len = tree[n].Len;
450
- if (len != 0) {
451
- tree[n].Code = next_code[len];
452
- next_code[len]++;
453
- }
463
- }
464
-
465
- Example:
466
-
467
- Consider the alphabet ABCDEFGH, with bit lengths (3, 3, 3, 3,
468
- 3, 2, 4, 4). After step 1, we have:
469
-
470
- N bl_count[N]
471
- - -----------
472
- 2 1
473
- 3 5
474
- 4 2
475
-
476
- Step 2 computes the following next_code values:
477
-
478
- N next_code[N]
479
- - ------------
480
- 1 0
481
- 2 0
482
- 3 2
483
- 4 14
484
-
485
- Step 3 produces the following code values:
486
-
487
- Symbol Length Code
488
- ------ ------ ----
489
- A 3 010
490
- B 3 011
491
- C 3 100
492
- D 3 101
493
- E 3 110
494
- F 2 00
495
- G 4 1110
496
- H 4 1111
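Steps 1 to 3 above can be combined into one routine. The following is a sketch rather than the zlib implementation: the function name `assign_codes` and the use of plain arrays instead of the `tree[]` structure are assumptions made for illustration, and every length is assumed to be at most MAX_BITS.

```c
#define MAX_BITS 15

/* Assign canonical codes from code lengths, following steps 1-3.
 * len[i] is the bit length for symbol i (0 means unused);
 * the resulting code for symbol i is written to code[i].
 * Assumes 0 <= len[i] <= MAX_BITS. */
static void assign_codes(const int *len, unsigned *code, int n)
{
    int bl_count[MAX_BITS + 1] = {0};
    unsigned next_code[MAX_BITS + 1] = {0};

    /* Step 1: count the number of codes for each code length. */
    for (int i = 0; i < n; i++)
        bl_count[len[i]]++;
    bl_count[0] = 0;

    /* Step 2: smallest code value for each code length. */
    unsigned c = 0;
    for (int bits = 1; bits <= MAX_BITS; bits++) {
        c = (c + (unsigned)bl_count[bits - 1]) << 1;
        next_code[bits] = c;
    }

    /* Step 3: consecutive values within each length; skip unused symbols. */
    for (int i = 0; i < n; i++) {
        if (len[i] != 0)
            code[i] = next_code[len[i]]++;
    }
}
```

Called with the lengths (3, 3, 3, 3, 3, 2, 4, 4) for ABCDEFGH, it produces the values 2, 3, 4, 5, 6, 0, 14, 15, which are the codes 010 through 1111 listed in the table when written with the given bit lengths.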
497
-
498
- 3.2.3. Details of block format
499
-
500
- Each block of compressed data begins with 3 header bits
501
- containing the following data:
502
-
503
- first bit BFINAL
504
- next 2 bits BTYPE
505
-
506
- Note that the header bits do not necessarily begin on a byte
507
- boundary, since a block does not necessarily occupy an integral
508
- number of bytes.
509
-
520
- BFINAL is set if and only if this is the last block of the data
521
- set.
522
-
523
- BTYPE specifies how the data are compressed, as follows:
524
-
525
- 00 - no compression
526
- 01 - compressed with fixed Huffman codes
527
- 10 - compressed with dynamic Huffman codes
528
- 11 - reserved (error)
529
-
530
- The only difference between the two compressed cases is how the
531
- Huffman codes for the literal/length and distance alphabets are
532
- defined.
533
-
534
- In all cases, the decoding algorithm for the actual data is as
535
- follows:
536
-
537
- do
538
- read block header from input stream.
539
- if stored with no compression
540
- skip any remaining bits in current partially
541
- processed byte
542
- read LEN and NLEN (see next section)
543
- copy LEN bytes of data to output
544
- otherwise
545
- if compressed with dynamic Huffman codes
546
- read representation of code trees (see
547
- subsection below)
548
- loop (until end of block code recognized)
549
- decode literal/length value from input stream
550
- if value < 256
551
- copy value (literal byte) to output stream
552
- otherwise
553
- if value = end of block (256)
554
- break from loop
555
- otherwise (value = 257..285)
556
- decode distance from input stream
557
-
558
- move backwards distance bytes in the output
559
- stream, and copy length bytes from this
560
- position to the output stream.
561
- end loop
562
- while not last block
563
-
564
- Note that a duplicated string reference may refer to a string
565
- in a previous block; i.e., the backward distance may cross one
566
- or more block boundaries. However a distance cannot refer past
567
- the beginning of the output stream. (An application using a
577
- preset dictionary might discard part of the output stream; a
578
- distance can refer to that part of the output stream anyway.)
579
- Note also that the referenced string may overlap the current
580
- position; for example, if the last 2 bytes decoded have values
581
- X and Y, a string reference with <length = 5, distance = 2>
582
- adds X,Y,X,Y,X to the output stream.
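The overlapping case works naturally if the copy proceeds one byte at a time in the forward direction. A minimal sketch follows; the function name `copy_match` and its bounds-checking convention are assumptions for illustration, not taken from zlib.

```c
#include <stddef.h>

/* Append `length` bytes to out[], each copied from `distance` bytes
 * behind the current write position `pos`. Copying byte by byte lets
 * the source range overlap the bytes being written.
 * Returns the new position, or `pos` unchanged if the reference is
 * invalid or would not fit in `cap` bytes. */
static size_t copy_match(unsigned char *out, size_t pos, size_t cap,
                         size_t length, size_t distance)
{
    if (pos > cap || distance == 0 || distance > pos || length > cap - pos)
        return pos;
    for (size_t i = 0; i < length; i++) {
        out[pos] = out[pos - distance];
        pos++;
    }
    return pos;
}
```

Starting from an output of X, Y and applying length = 5, distance = 2 appends X, Y, X, Y, X, as in the example above. A block copy such as `memcpy` would not be a safe substitute here, because its behavior is undefined for overlapping ranges.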
583
-
584
- We now specify each compression method in turn.
585
-
586
- 3.2.4. Non-compressed blocks (BTYPE=00)
587
-
588
- Any bits of input up to the next byte boundary are ignored.
589
- The rest of the block consists of the following information:
590
-
591
- 0 1 2 3 4...
592
- +---+---+---+---+================================+
593
- | LEN | NLEN |... LEN bytes of literal data...|
594
- +---+---+---+---+================================+
595
-
596
- LEN is the number of data bytes in the block. NLEN is the
597
- one's complement of LEN.
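A decoder can use NLEN as a consistency check on LEN. The sketch below assumes both fields are two bytes stored least-significant byte first, per the general convention in 3.1; the function name `stored_block_len` is hypothetical.

```c
#include <stdint.h>

/* p points at the four bytes LEN, NLEN of a stored block,
 * each a 16-bit value with the least-significant byte first.
 * Returns LEN if NLEN is its one's complement, otherwise -1. */
static long stored_block_len(const unsigned char *p)
{
    uint16_t len  = (uint16_t)(p[0] | (p[1] << 8));
    uint16_t nlen = (uint16_t)(p[2] | (p[3] << 8));
    return ((uint16_t)~len == nlen) ? (long)len : -1L;
}
```

For a block of 5 bytes, the header bytes would be 05 00 FA FF, since the one's complement of 0x0005 is 0xFFFA.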
598
-
599
- 3.2.5. Compressed blocks (length and distance codes)
600
-
601
- As noted above, encoded data blocks in the "deflate" format
602
- consist of sequences of symbols drawn from three conceptually
603
- distinct alphabets: either literal bytes, from the alphabet of
604
- byte values (0..255), or <length, backward distance> pairs,
605
- where the length is drawn from (3..258) and the distance is
606
- drawn from (1..32,768). In fact, the literal and length
607
- alphabets are merged into a single alphabet (0..285), where
608
- values 0..255 represent literal bytes, the value 256 indicates
609
- end-of-block, and values 257..285 represent length codes
610
- (possibly in conjunction with extra bits following the symbol
611
- code) as follows:
612
-
634
- Extra Extra Extra
635
- Code Bits Length(s) Code Bits Length(s) Code Bits Length(s)
636
- ---- ---- ------ ---- ---- ------- ---- ---- -------
637
- 257 0 3 267 1 15,16 277 4 67-82
638
- 258 0 4 268 1 17,18 278 4 83-98
639
- 259 0 5 269 2 19-22 279 4 99-114
640
- 260 0 6 270 2 23-26 280 4 115-130
641
- 261 0 7 271 2 27-30 281 5 131-162
642
- 262 0 8 272 2 31-34 282 5 163-194
643
- 263 0 9 273 3 35-42 283 5 195-226
644
- 264 0 10 274 3 43-50 284 5 227-257
645
- 265 1 11,12 275 3 51-58 285 0 258
646
- 266 1 13,14 276 3 59-66
647
-
648
- The extra bits should be interpreted as a machine integer
649
- stored with the most-significant bit first, e.g., bits 1110
650
- represent the value 14.
651
-
652
- Extra Extra Extra
653
- Code Bits Dist Code Bits Dist Code Bits Distance
654
- ---- ---- ---- ---- ---- ------ ---- ---- --------
655
- 0 0 1 10 4 33-48 20 9 1025-1536
656
- 1 0 2 11 4 49-64 21 9 1537-2048
657
- 2 0 3 12 5 65-96 22 10 2049-3072
658
- 3 0 4 13 5 97-128 23 10 3073-4096
659
- 4 1 5,6 14 6 129-192 24 11 4097-6144
660
- 5 1 7,8 15 6 193-256 25 11 6145-8192
661
- 6 2 9-12 16 7 257-384 26 12 8193-12288
662
- 7 2 13-16 17 7 385-512 27 12 12289-16384
663
- 8 3 17-24 18 8 513-768 28 13 16385-24576
664
- 9 3 25-32 19 8 769-1024 29 13 24577-32768
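The two tables above are commonly stored as a base value plus an extra-bit count per code. The sketch below transcribes them that way; the array and function names are illustrative assumptions, and the `extra` argument is taken to be the already-decoded integer value of the extra bits.

```c
/* Base length and extra-bit count for length symbols 257..285. */
static const unsigned short length_base[29] = {
    3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31,
    35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258 };
static const unsigned char length_extra[29] = {
    0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2,
    3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 0 };

/* Base distance and extra-bit count for distance codes 0..29. */
static const unsigned short dist_base[30] = {
    1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193,
    257, 385, 513, 769, 1025, 1537, 2049, 3073, 4097, 6145,
    8193, 12289, 16385, 24577 };
static const unsigned char dist_extra[30] = {
    0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6,
    7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13 };

/* Length for a symbol in 257..285; returns 0 if the input is invalid.
 * Note: symbol 284 with extra value 31 would give 258, which the table
 * assigns to symbol 285; this sketch does not special-case it. */
static unsigned match_length(unsigned symbol, unsigned extra)
{
    if (symbol < 257 || symbol > 285)
        return 0;
    unsigned i = symbol - 257;
    if (extra >> length_extra[i])
        return 0;                 /* value does not fit in the extra bits */
    return length_base[i] + extra;
}

/* Distance for a code in 0..29; returns 0 if the input is invalid. */
static unsigned match_distance(unsigned code, unsigned extra)
{
    if (code > 29)
        return 0;
    if (extra >> dist_extra[code])
        return 0;
    return dist_base[code] + extra;
}
```

For instance, length symbol 266 with extra value 1 maps to 14, and distance code 29 with all 13 extra bits set maps to 32768, the largest distance the format allows.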
665
-
666
- 3.2.6. Compression with fixed Huffman codes (BTYPE=01)
667
-
668
- The Huffman codes for the two alphabets are fixed, and are not
669
- represented explicitly in the data. The Huffman code lengths
670
- for the literal/length alphabet are:
671
-
672
- Lit Value Bits Codes
673
- --------- ---- -----
674
- 0 - 143 8 00110000 through
675
- 10111111
676
- 144 - 255 9 110010000 through
677
- 111111111
678
- 256 - 279 7 0000000 through
679
- 0010111
680
- 280 - 287 8 11000000 through
681
- 11000111
682
-
691
- The code lengths are sufficient to generate the actual codes,
692
- as described above; we show the codes in the table for added
693
- clarity. Literal/length values 286-287 will never actually
694
- occur in the compressed data, but participate in the code
695
- construction.
696
-
697
- Distance codes 0-31 are represented by (fixed-length) 5-bit
698
- codes, with possible additional bits as shown in the table
699
- shown in Paragraph 3.2.5, above. Note that distance codes 30-
700
- 31 will never actually occur in the compressed data.
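Because the fixed code is canonical, each range in the literal/length table is a run of consecutive code values, so the code for a symbol can be computed directly. This is a sketch with a hypothetical function name, derived only from the table in this section.

```c
/* Return the fixed Huffman code for literal/length symbol `sym`
 * (0..287) and store its bit length in *nbits. For an out-of-range
 * symbol, *nbits is set to 0 and 0 is returned. */
static unsigned fixed_litlen_code(unsigned sym, int *nbits)
{
    if (sym <= 143) { *nbits = 8; return 0x30 + sym; }           /* 00110000.. */
    if (sym <= 255) { *nbits = 9; return 0x190 + (sym - 144); }  /* 110010000.. */
    if (sym <= 279) { *nbits = 7; return sym - 256; }            /* 0000000.. */
    if (sym <= 287) { *nbits = 8; return 0xC0 + (sym - 280); }   /* 11000000.. */
    *nbits = 0;
    return 0;
}
```

The returned value is the code read from most- to least-significant bit, so an encoder still has to emit it starting with the most-significant bit, as required in 3.1.1.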
701
-
702
- 3.2.7. Compression with dynamic Huffman codes (BTYPE=10)
703
-
704
- The Huffman codes for the two alphabets appear in the block
705
- immediately after the header bits and before the actual
706
- compressed data, first the literal/length code and then the
707
- distance code. Each code is defined by a sequence of code
708
- lengths, as discussed in Paragraph 3.2.2, above. For even
709
- greater compactness, the code length sequences themselves are
710
- compressed using a Huffman code. The alphabet for code lengths
711
- is as follows:
712
-
713
- 0 - 15: Represent code lengths of 0 - 15
714
- 16: Copy the previous code length 3 - 6 times.
715
- The next 2 bits indicate repeat length
716
- (0 = 3, ... , 3 = 6)
717
- Example: Codes 8, 16 (+2 bits 11),
718
- 16 (+2 bits 10) will expand to
719
- 12 code lengths of 8 (1 + 6 + 5)
720
- 17: Repeat a code length of 0 for 3 - 10 times.
721
- (3 bits of length)
722
- 18: Repeat a code length of 0 for 11 - 138 times
723
- (7 bits of length)
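The repeat codes 16, 17, and 18 can be expanded as sketched below. The struct `cl_item` and the function `expand_code_lengths` are hypothetical names; the sketch assumes each symbol has already been Huffman-decoded and paired with the integer value of its extra bits.

```c
/* One decoded code-length symbol (0..18) with its extra-bits value. */
typedef struct {
    unsigned char sym;
    unsigned char extra;
} cl_item;

/* Expand code-length symbols into a flat sequence of code lengths.
 * Returns the number of lengths written to out[], or -1 on invalid
 * input or if more than `cap` lengths would be produced. */
static int expand_code_lengths(const cl_item *in, int n_in,
                               unsigned char *out, int cap)
{
    int n = 0;
    for (int i = 0; i < n_in; i++) {
        unsigned sym = in[i].sym, extra = in[i].extra;
        int repeat;
        unsigned char value;

        if (sym <= 15) {                  /* literal code length */
            repeat = 1;
            value = (unsigned char)sym;
        } else if (sym == 16) {           /* copy previous length 3..6 times */
            if (n == 0 || extra > 3) return -1;
            repeat = 3 + (int)extra;
            value = out[n - 1];
        } else if (sym == 17) {           /* 3..10 zeros */
            if (extra > 7) return -1;
            repeat = 3 + (int)extra;
            value = 0;
        } else if (sym == 18) {           /* 11..138 zeros */
            if (extra > 127) return -1;
            repeat = 11 + (int)extra;
            value = 0;
        } else {
            return -1;
        }
        if (repeat > cap - n) return -1;
        while (repeat-- > 0)
            out[n++] = value;
    }
    return n;
}
```

Feeding it the example from the list (8, then 16 with extra bits 11, then 16 with extra bits 10) yields twelve code lengths of 8.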
724
-
725
- A code length of 0 indicates that the corresponding symbol in
726
- the literal/length or distance alphabet will not occur in the
727
- block, and should not participate in the Huffman code
728
- construction algorithm given earlier. If only one distance
729
- code is used, it is encoded using one bit, not zero bits; in
730
- this case there is a single code length of one, with one unused
731
- code. One distance code of zero bits means that there are no
732
- distance codes used at all (the data is all literals).
733
-
734
- We can now define the format of the block:
735
-
736
- 5 Bits: HLIT, # of Literal/Length codes - 257 (257 - 286)
737
- 5 Bits: HDIST, # of Distance codes - 1 (1 - 32)
738
- 4 Bits: HCLEN, # of Code Length codes - 4 (4 - 19)
739
-
748
- (HCLEN + 4) x 3 bits: code lengths for the code length
749
- alphabet given just above, in the order: 16, 17, 18,
750
- 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15
751
-
752
- These code lengths are interpreted as 3-bit integers
753
- (0-7); as above, a code length of 0 means the
754
- corresponding symbol (literal/length or distance code
755
- length) is not used.
756
-
757
- HLIT + 257 code lengths for the literal/length alphabet,
758
- encoded using the code length Huffman code
759
-
760
- HDIST + 1 code lengths for the distance alphabet,
761
- encoded using the code length Huffman code
762
-
763
- The actual compressed data of the block,
764
- encoded using the literal/length and distance Huffman
765
- codes
766
-
767
- The literal/length symbol 256 (end of data),
768
- encoded using the literal/length Huffman code
769
-
770
- The code length repeat codes can cross from HLIT + 257 to the
771
- HDIST + 1 code lengths. In other words, all code lengths form
772
- a single sequence of HLIT + HDIST + 258 values.
773
-
774
- 3.3. Compliance
775
-
776
- A compressor may limit further the ranges of values specified in
777
- the previous section and still be compliant; for example, it may
778
- limit the range of backward pointers to some value smaller than
779
- 32K. Similarly, a compressor may limit the size of blocks so that
780
- a compressible block fits in memory.
781
-
782
- A compliant decompressor must accept the full range of possible
783
- values defined in the previous section, and must accept blocks of
784
- arbitrary size.
785
-
786
-4. Compression algorithm details
787
-
788
- While it is the intent of this document to define the "deflate"
789
- compressed data format without reference to any particular
790
- compression algorithm, the format is related to the compressed
791
- formats produced by LZ77 (Lempel-Ziv 1977, see reference [2] below);
792
- since many variations of LZ77 are patented, it is strongly
793
- recommended that the implementor of a compressor follow the general
794
- algorithm presented here, which is known not to be patented per se.
795
- The material in this section is not part of the definition of the
805
- specification per se, and a compressor need not follow it in order to
806
- be compliant.
807
-
808
- The compressor terminates a block when it determines that starting a
809
- new block with fresh trees would be useful, or when the block size
810
- fills up the compressor's block buffer.
811
-
812
- The compressor uses a chained hash table to find duplicated strings,
813
- using a hash function that operates on 3-byte sequences. At any
814
- given point during compression, let XYZ be the next 3 input bytes to
815
- be examined (not necessarily all different, of course). First, the
816
- compressor examines the hash chain for XYZ. If the chain is empty,
817
- the compressor simply writes out X as a literal byte and advances one
818
- byte in the input. If the hash chain is not empty, indicating that
819
- the sequence XYZ (or, if we are unlucky, some other 3 bytes with the
820
- same hash function value) has occurred recently, the compressor
821
- compares all strings on the XYZ hash chain with the actual input data
822
- sequence starting at the current point, and selects the longest
823
- match.
824
-
825
- The compressor searches the hash chains starting with the most recent
826
- strings, to favor small distances and thus take advantage of the
827
- Huffman encoding. The hash chains are singly linked. There are no
828
- deletions from the hash chains; the algorithm simply discards matches
829
- that are too old. To avoid a worst-case situation, very long hash
830
- chains are arbitrarily truncated at a certain length, determined by a
831
- run-time parameter.
832
-
833
- To improve overall compression, the compressor optionally defers the
834
- selection of matches ("lazy matching"): after a match of length N has
835
- been found, the compressor searches for a longer match starting at
836
- the next input byte. If it finds a longer match, it truncates the
837
- previous match to a length of one (thus producing a single literal
838
- byte) and then emits the longer match. Otherwise, it emits the
839
- original match, and, as described above, advances N bytes before
840
- continuing.
841
-
842
- Run-time parameters also control this "lazy match" procedure. If
843
- compression ratio is most important, the compressor attempts a
844
- complete second search regardless of the length of the first match.
845
- In the normal case, if the current match is "long enough", the
846
- compressor reduces the search for a longer match, thus speeding up
847
- the process. If speed is most important, the compressor inserts new
848
- strings in the hash table only when no match was found, or when the
849
- match is not "too long". This degrades the compression ratio but
850
- saves time since there are both fewer insertions and fewer searches.
851
-
862
-5. References
863
-
864
- [1] Huffman, D. A., "A Method for the Construction of Minimum
865
- Redundancy Codes", Proceedings of the Institute of Radio
866
- Engineers, September 1952, Volume 40, Number 9, pp. 1098-1101.
867
-
868
- [2] Ziv J., Lempel A., "A Universal Algorithm for Sequential Data
869
- Compression", IEEE Transactions on Information Theory, Vol. 23,
870
- No. 3, pp. 337-343.
871
-
872
- [3] Gailly, J.-L., and Adler, M., ZLIB documentation and sources,
873
- available in ftp://ftp.uu.net/pub/archiving/zip/doc/
874
-
875
- [4] Gailly, J.-L., and Adler, M., GZIP documentation and sources,
876
- available as gzip-*.tar in ftp://prep.ai.mit.edu/pub/gnu/
877
-
878
- [5] Schwartz, E. S., and Kallick, B. "Generating a canonical prefix
879
- encoding." Comm. ACM, 7,3 (Mar. 1964), pp. 166-169.
880
-
881
- [6] Hirschberg and Lelewer, "Efficient decoding of prefix codes,"
882
- Comm. ACM, 33,4, April 1990, pp. 449-459.
883
-
884
-6. Security Considerations
885
-
886
- Any data compression method involves the reduction of redundancy in
887
- the data. Consequently, any corruption of the data is likely to have
888
- severe effects and be difficult to correct. Uncompressed text, on
889
- the other hand, will probably still be readable despite the presence
890
- of some corrupted bytes.
891
-
892
- It is recommended that systems using this data format provide some
893
- means of validating the integrity of the compressed data. See
894
- reference [3], for example.
895
-
896
-7. Source code
897
-
898
- Source code for a C language implementation of a "deflate" compliant
899
- compressor and decompressor is available within the zlib package at
900
- ftp://ftp.uu.net/pub/archiving/zip/zlib/.
901
-
902
-8. Acknowledgements
903
-
904
- Trademarks cited in this document are the property of their
905
- respective owners.
906
-
907
- Phil Katz designed the deflate format. Jean-Loup Gailly and Mark
908
- Adler wrote the related software described in this specification.
909
- Glenn Randers-Pehrson converted this document to RFC and HTML format.
910
-
911
-
912
-
913
-Deutsch Informational [Page 16]
914
-
915
-
916
-RFC 1951 DEFLATE Compressed Data Format Specification May 1996
917
-
918
-
919
-9. Author's Address
920
-
921
- L. Peter Deutsch
922
- Aladdin Enterprises
923
- 203 Santa Margarita Ave.
924
- Menlo Park, CA 94025
925
-
926
- Phone: (415) 322-0103 (AM only)
927
- FAX: (415) 322-1734
928
- EMail: <[email protected]>
929
-
930
- Questions about the technical content of this specification can be
931
- sent by email to:
932
-
933
- Jean-Loup Gailly <[email protected]> and
934
- Mark Adler <[email protected]>
935
-
936
- Editorial comments on this specification can be sent by email to:
937
-
938
- L. Peter Deutsch <[email protected]> and
939
- Glenn Randers-Pehrson <[email protected]>
940
-
941
-
942
-
943
-
944
-
945
-
946
-
947
-
948
-
949
-
950
-
951
-
952
-
953
-
954
-
955
-
956
-
957
-
958
-
959
-
960
-
961
-
962
-
963
-
964
-
965
-
966
-
967
-
968
-
969
-
970
-Deutsch Informational [Page 17]
971
-
972
-
--- a/compat/zlib/doc/rfc1951.txt
+++ b/compat/zlib/doc/rfc1951.txt
@@ -1,972 +0,0 @@
1
2
3
4
5
6
7 Network Working Group P. Deutsch
8 Request for Comments: 1951 Aladdin Enterprises
9 Category: Informational May 1996
10
11
12 DEFLATE Compressed Data Format Specification version 1.3
13
14 Status of This Memo
15
16 This memo provides information for the Internet community. This memo
17 does not specify an Internet standard of any kind. Distribution of
18 this memo is unlimited.
19
20 IESG Note:
21
22 The IESG takes no position on the validity of any Intellectual
23 Property Rights statements contained in this document.
24
25 Notices
26
27 Copyright (c) 1996 L. Peter Deutsch
28
29 Permission is granted to copy and distribute this document for any
30 purpose and without charge, including translations into other
31 languages and incorporation into compilations, provided that the
32 copyright notice and this notice are preserved, and that any
33 substantive changes or deletions from the original are clearly
34 marked.
35
36 A pointer to the latest version of this and related documentation in
37 HTML format can be found at the URL
38 <ftp://ftp.uu.net/graphics/png/documents/zlib/zdoc-index.html>.
39
40 Abstract
41
42 This specification defines a lossless compressed data format that
43 compresses data using a combination of the LZ77 algorithm and Huffman
44 coding, with efficiency comparable to the best currently available
45 general-purpose compression methods. The data can be produced or
46 consumed, even for an arbitrarily long sequentially presented input
47 data stream, using only an a priori bounded amount of intermediate
48 storage. The format can be implemented readily in a manner not
49 covered by patents.
50
51
52
53
54
55
56
57
58 Deutsch Informational [Page 1]
59
60
61 RFC 1951 DEFLATE Compressed Data Format Specification May 1996
62
63
64 Table of Contents
65
66 1. Introduction ................................................... 2
67 1.1. Purpose ................................................... 2
68 1.2. Intended audience ......................................... 3
69 1.3. Scope ..................................................... 3
70 1.4. Compliance ................................................ 3
71 1.5. Definitions of terms and conventions used ................ 3
72 1.6. Changes from previous versions ............................ 4
73 2. Compressed representation overview ............................. 4
74 3. Detailed specification ......................................... 5
75 3.1. Overall conventions ....................................... 5
76 3.1.1. Packing into bytes .................................. 5
77 3.2. Compressed block format ................................... 6
78 3.2.1. Synopsis of prefix and Huffman coding ............... 6
79 3.2.2. Use of Huffman coding in the "deflate" format ....... 7
80 3.2.3. Details of block format ............................. 9
81 3.2.4. Non-compressed blocks (BTYPE=00) ................... 11
82 3.2.5. Compressed blocks (length and distance codes) ...... 11
83 3.2.6. Compression with fixed Huffman codes (BTYPE=01) .... 12
84 3.2.7. Compression with dynamic Huffman codes (BTYPE=10) .. 13
85 3.3. Compliance ............................................... 14
86 4. Compression algorithm details ................................. 14
87 5. References .................................................... 16
88 6. Security Considerations ....................................... 16
89 7. Source code ................................................... 16
90 8. Acknowledgements .............................................. 16
91 9. Author's Address .............................................. 17
92
93 1. Introduction
94
95 1.1. Purpose
96
97 The purpose of this specification is to define a lossless
98 compressed data format that:
99 * Is independent of CPU type, operating system, file system,
100 and character set, and hence can be used for interchange;
101 * Can be produced or consumed, even for an arbitrarily long
102 sequentially presented input data stream, using only an a
103 priori bounded amount of intermediate storage, and hence
104 can be used in data communications or similar structures
105 such as Unix filters;
106 * Compresses data with efficiency comparable to the best
107 currently available general-purpose compression methods,
108 and in particular considerably better than the "compress"
109 program;
110 * Can be implemented readily in a manner not covered by
111 patents, and hence can be practiced freely;
112
113
114
115 Deutsch Informational [Page 2]
116
117
118 RFC 1951 DEFLATE Compressed Data Format Specification May 1996
119
120
121 * Is compatible with the file format produced by the current
122 widely used gzip utility, in that conforming decompressors
123 will be able to read data produced by the existing gzip
124 compressor.
125
126 The data format defined by this specification does not attempt to:
127
128 * Allow random access to compressed data;
129 * Compress specialized data (e.g., raster graphics) as well
130 as the best currently available specialized algorithms.
131
132 A simple counting argument shows that no lossless compression
133 algorithm can compress every possible input data set. For the
134 format defined here, the worst case expansion is 5 bytes per 32K-
135 byte block, i.e., a size increase of 0.015% for large data sets.
136 English text usually compresses by a factor of 2.5 to 3;
137 executable files usually compress somewhat less; graphical data
138 such as raster images may compress much more.
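The worst-case figure quoted above can be checked with a few lines of arithmetic. This is only a sketch: it assumes the 5 bytes of overhead per 32K-byte block stated in the text, and the variable names are illustrative.

```python
# Worst-case expansion: 5 bytes of overhead for every 32K-byte block,
# as stated in the text above.
OVERHEAD_BYTES = 5
BLOCK_BYTES = 32 * 1024

expansion_percent = 100.0 * OVERHEAD_BYTES / BLOCK_BYTES
print("worst-case expansion: %.3f%%" % expansion_percent)
```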
139
140 1.2. Intended audience
141
142 This specification is intended for use by implementors of software
143 to compress data into "deflate" format and/or decompress data from
144 "deflate" format.
145
146 The text of the specification assumes a basic background in
147 programming at the level of bits and other primitive data
148 representations. Familiarity with the technique of Huffman coding
149 is helpful but not required.
150
151 1.3. Scope
152
153 The specification specifies a method for representing a sequence
154 of bytes as a (usually shorter) sequence of bits, and a method for
155 packing the latter bit sequence into bytes.
156
157 1.4. Compliance
158
159 Unless otherwise indicated below, a compliant decompressor must be
160 able to accept and decompress any data set that conforms to all
161 the specifications presented here; a compliant compressor must
162 produce data sets that conform to all the specifications presented
163 here.
164
165 1.5. Definitions of terms and conventions used
166
167 Byte: 8 bits stored or transmitted as a unit (same as an octet).
168 For this specification, a byte is exactly 8 bits, even on machines
169
170
171
172 Deutsch Informational [Page 3]
173
174
175 RFC 1951 DEFLATE Compressed Data Format Specification May 1996
176
177
178 which store a character on a number of bits different from eight.
179 See below, for the numbering of bits within a byte.
180
181 String: a sequence of arbitrary bytes.
182
183 1.6. Changes from previous versions
184
185 There have been no technical changes to the deflate format since
186 version 1.1 of this specification. In version 1.2, some
187 terminology was changed. Version 1.3 is a conversion of the
188 specification to RFC style.
189
190 2. Compressed representation overview
191
192 A compressed data set consists of a series of blocks, corresponding
193 to successive blocks of input data. The block sizes are arbitrary,
194 except that non-compressible blocks are limited to 65,535 bytes.
195
196 Each block is compressed using a combination of the LZ77 algorithm
197 and Huffman coding. The Huffman trees for each block are independent
198 of those for previous or subsequent blocks; the LZ77 algorithm may
199 use a reference to a duplicated string occurring in a previous block,
200 up to 32K input bytes before.
201
202 Each block consists of two parts: a pair of Huffman code trees that
203 describe the representation of the compressed data part, and a
204 compressed data part. (The Huffman trees themselves are compressed
205 using Huffman encoding.) The compressed data consists of a series of
206 elements of two types: literal bytes (of strings that have not been
207 detected as duplicated within the previous 32K input bytes), and
208 pointers to duplicated strings, where a pointer is represented as a
209 pair <length, backward distance>. The representation used in the
210 "deflate" format limits distances to 32K bytes and lengths to 258
211 bytes, but does not limit the size of a block, except for
212 uncompressible blocks, which are limited as noted above.
213
214 Each type of value (literals, distances, and lengths) in the
215 compressed data is represented using a Huffman code, using one code
216 tree for literals and lengths and a separate code tree for distances.
217 The code trees for each block appear in a compact form just before
218 the compressed data for that block.
219
220
221
222
223
224
225
226
227
228
229 Deutsch Informational [Page 4]
230
231
232 RFC 1951 DEFLATE Compressed Data Format Specification May 1996
233
234
235 3. Detailed specification
236
237 3.1. Overall conventions

    In the diagrams below, a box like this:
238
239 +---+
240 | | <-- the vertical bars might be missing
241 +---+
242
243 represents one byte; a box like this:
244
245 +==============+
246 | |
247 +==============+
248
249 represents a variable number of bytes.
250
251 Bytes stored within a computer do not have a "bit order", since
252 they are always treated as a unit. However, a byte considered as
253 an integer between 0 and 255 does have a most- and least-
254 significant bit, and since we write numbers with the most-
255 significant digit on the left, we also write bytes with the most-
256 significant bit on the left. In the diagrams below, we number the
257 bits of a byte so that bit 0 is the least-significant bit, i.e.,
258 the bits are numbered:
259
260 +--------+
261 |76543210|
262 +--------+
263
264 Within a computer, a number may occupy multiple bytes. All
265 multi-byte numbers in the format described here are stored with
266 the least-significant byte first (at the lower memory address).
267 For example, the decimal number 520 is stored as:
268
269           0        1
270       +--------+--------+
271       |00001000|00000010|
272       +--------+--------+
273        ^        ^
274        |        |
275        |        + more significant byte = 2 x 256
276        + less significant byte = 8
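As a small illustration of the byte order described above, the following sketch stores and reloads the value 520 least-significant byte first. The helper names `to_le16` and `from_le16` are hypothetical and not part of the specification.

```python
def to_le16(value: int) -> bytes:
    """Store a 16-bit number with the least-significant byte first."""
    return value.to_bytes(2, "little")


def from_le16(data: bytes) -> int:
    """Rebuild the number: byte 0 is the low byte, byte 1 the high byte."""
    return data[0] | (data[1] << 8)


stored = to_le16(520)
print([format(b, "08b") for b in stored])
```

The two bytes printed correspond to the boxes in the diagram: 8 in the first byte and 2 (worth 2 x 256) in the second.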
277
278 3.1.1. Packing into bytes
279
280 This document does not address the issue of the order in which
281 bits of a byte are transmitted on a bit-sequential medium,
282 since the final data format described here is byte- rather than
283
284
285
286 Deutsch Informational [Page 5]
287
288
289 RFC 1951 DEFLATE Compressed Data Format Specification May 1996
290
291
292 bit-oriented. However, we describe the compressed block format
293 below, as a sequence of data elements of various bit
294 lengths, not a sequence of bytes. We must therefore specify
295 how to pack these data elements into bytes to form the final
296 compressed byte sequence:
297
298 * Data elements are packed into bytes in order of
299 increasing bit number within the byte, i.e., starting
300 with the least-significant bit of the byte.
301 * Data elements other than Huffman codes are packed
302 starting with the least-significant bit of the data
303 element.
304 * Huffman codes are packed starting with the most-
305 significant bit of the code.
306
307 In other words, if one were to print out the compressed data as
308 a sequence of bytes, starting with the first byte at the
309 *right* margin and proceeding to the *left*, with the most-
310 significant bit of each byte on the left as usual, one would be
311 able to parse the result from right to left, with fixed-width
312 elements in the correct MSB-to-LSB order and Huffman codes in
313 bit-reversed order (i.e., with the first bit of the code in the
314 relative LSB position).
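The packing rules above can be sketched as a small bit reader. This is an illustrative sketch only: the class name `BitReader` and its method names are assumptions, not part of the format. It consumes bits starting from the least-significant bit of each byte, assembles fixed-width fields LSB-first, and assembles Huffman code bits MSB-first.

```python
class BitReader:
    """Reads bits from a byte string, LSB of each byte first (hypothetical helper)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # absolute bit position in the stream

    def read_bit(self) -> int:
        byte = self.data[self.pos >> 3]
        bit = (byte >> (self.pos & 7)) & 1
        self.pos += 1
        return bit

    def read_bits(self, count: int) -> int:
        """Fixed-width element: the first bit read is the least-significant bit."""
        value = 0
        for i in range(count):
            value |= self.read_bit() << i
        return value

    def read_code_bits(self, count: int) -> int:
        """Huffman code bits: the first bit read is the most-significant bit."""
        value = 0
        for _ in range(count):
            value = (value << 1) | self.read_bit()
        return value
```

For example, a first byte of 00000101 would yield a 1-bit field of 1 followed by a 2-bit field of 2 when read with `read_bits`.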
315
316 3.2. Compressed block format
317
318 3.2.1. Synopsis of prefix and Huffman coding
319
320 Prefix coding represents symbols from an a priori known
321 alphabet by bit sequences (codes), one code for each symbol, in
322 a manner such that different symbols may be represented by bit
323 sequences of different lengths, but a parser can always parse
324 an encoded string unambiguously symbol-by-symbol.
325
326 We define a prefix code in terms of a binary tree in which the
327 two edges descending from each non-leaf node are labeled 0 and
328 1 and in which the leaf nodes correspond one-for-one with (are
329 labeled with) the symbols of the alphabet; then the code for a
330 symbol is the sequence of 0's and 1's on the edges leading from
331 the root to the leaf labeled with that symbol. For example:
332
333
334
335
336
337
338
339
340
341
342
343 Deutsch Informational [Page 6]
344
345
346 RFC 1951 DEFLATE Compressed Data Format Specification May 1996
347
348
349                   /\              Symbol    Code
350                  0  1             ------    ----
351                 /    \                A      00
352                /\     B               B       1
353               0  1                    C     011
354              /    \                   D     010
355             A     /\
356                  0  1
357                 /    \
358                D      C
359
360 A parser can decode the next symbol from an encoded input
361 stream by walking down the tree from the root, at each step
362 choosing the edge corresponding to the next input bit.
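The tree walk described above can be sketched as follows, using the example code table (A=00, B=1, C=011, D=010). The nested-dictionary tree and the function names `build_tree` and `decode` are illustrative assumptions; a real decoder would normally use lookup tables instead.

```python
def build_tree(codes):
    """Build a binary tree as nested dicts; leaves are the symbol strings."""
    root = {}
    for symbol, code in codes.items():
        node = root
        for bit in code[:-1]:
            node = node.setdefault(bit, {})
        node[code[-1]] = symbol
    return root


def decode(bits, root):
    """Walk from the root, following one edge per input bit; emit at leaves."""
    out = []
    node = root
    for bit in bits:
        node = node[bit]
        if isinstance(node, str):  # reached a leaf
            out.append(node)
            node = root
    return "".join(out)


EXAMPLE_CODES = {"A": "00", "B": "1", "C": "011", "D": "010"}
tree = build_tree(EXAMPLE_CODES)
print(decode("001011010", tree))
```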
363
364 Given an alphabet with known symbol frequencies, the Huffman
365 algorithm allows the construction of an optimal prefix code
366 (one which represents strings with those symbol frequencies
367 using the fewest bits of any possible prefix codes for that
368 alphabet). Such a code is called a Huffman code. (See
369 reference [1] in Chapter 5, References, for additional
370 information on Huffman codes.)
371
372 Note that in the "deflate" format, the Huffman codes for the
373 various alphabets must not exceed certain maximum code lengths.
374 This constraint complicates the algorithm for computing code
375 lengths from symbol frequencies. Again, see Chapter 5,
376 References, for details.
377
378 3.2.2. Use of Huffman coding in the "deflate" format
379
380 The Huffman codes used for each alphabet in the "deflate"
381 format have two additional rules:
382
383 * All codes of a given bit length have lexicographically
384 consecutive values, in the same order as the symbols
385 they represent;
386
387 * Shorter codes lexicographically precede longer codes.
388
389
390
391
392
393
394
395
396
397
398
399
400 Deutsch Informational [Page 7]
401
402
403 RFC 1951 DEFLATE Compressed Data Format Specification May 1996
404
405
406 We could recode the example above to follow these rules as
407 follows, assuming that the order of the alphabet is ABCD:
408
409 Symbol Code
410 ------ ----
411 A 10
412 B 0
413 C 110
414 D 111
415
416 I.e., 0 precedes 10 which precedes 11x, and 110 and 111 are
417 lexicographically consecutive.
418
419 Given these rules, we can define the Huffman code for an alphabet
420 just by giving the bit lengths of the codes for each symbol of
421 the alphabet in order; this is sufficient to determine the
422 actual codes. In our example, the code is completely defined
423 by the sequence of bit lengths (2, 1, 3, 3). The following
424 algorithm generates the codes as integers, intended to be read
425 from most- to least-significant bit. The code lengths are
426 initially in tree[I].Len; the codes are produced in
427 tree[I].Code.
428
429 1) Count the number of codes for each code length. Let
430 bl_count[N] be the number of codes of length N, N >= 1.
431
432 2) Find the numerical value of the smallest code for each
433 code length:
434
435          code = 0;
436          bl_count[0] = 0;
437          for (bits = 1; bits <= MAX_BITS; bits++) {
438              code = (code + bl_count[bits-1]) << 1;
439              next_code[bits] = code;
440          }
441
442 3) Assign numerical values to all codes, using consecutive
443 values for all codes of the same length with the base
444 values determined at step 2. Codes that are never used
445 (which have a bit length of zero) must not be assigned a
446 value.
447
448          for (n = 0; n <= max_code; n++) {
449              len = tree[n].Len;
450              if (len != 0) {
451                  tree[n].Code = next_code[len];
452                  next_code[len]++;
453              }
463          }
464
465 Example:
466
467 Consider the alphabet ABCDEFGH, with bit lengths (3, 3, 3, 3,
468 3, 2, 4, 4). After step 1, we have:
469
470 N bl_count[N]
471 - -----------
472 2 1
473 3 5
474 4 2
475
476 Step 2 computes the following next_code values:
477
478 N next_code[N]
479 - ------------
480 1 0
481 2 0
482 3 2
483 4 14
484
485 Step 3 produces the following code values:
486
487 Symbol Length Code
488 ------ ------ ----
489 A 3 010
490 B 3 011
491 C 3 100
492 D 3 101
493 E 3 110
494 F 2 00
495 G 4 1110
496 H 4 1111
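The three steps above can be sketched in runnable form and checked against the worked example. The function name `canonical_codes` and the choice to return codes as bit strings are assumptions made for illustration.

```python
def canonical_codes(lengths):
    """Return the code for each symbol as a bit string (None if length is 0)."""
    max_bits = max(lengths)

    # Step 1: count the number of codes for each code length.
    bl_count = [0] * (max_bits + 1)
    for length in lengths:
        if length:
            bl_count[length] += 1

    # Step 2: find the smallest code value for each code length.
    next_code = [0] * (max_bits + 1)
    code = 0
    for bits in range(1, max_bits + 1):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code

    # Step 3: assign consecutive values to codes of the same length.
    codes = []
    for length in lengths:
        if length == 0:
            codes.append(None)  # unused symbols get no code
        else:
            codes.append(format(next_code[length], "0{}b".format(length)))
            next_code[length] += 1
    return codes


for symbol, code in zip("ABCDEFGH", canonical_codes([3, 3, 3, 3, 3, 2, 4, 4])):
    print(symbol, code)
```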
497
498 3.2.3. Details of block format
499
500 Each block of compressed data begins with 3 header bits
501 containing the following data:
502
503 first bit BFINAL
504 next 2 bits BTYPE
505
506 Note that the header bits do not necessarily begin on a byte
507 boundary, since a block does not necessarily occupy an integral
508 number of bytes.
509
510
511
512
513
514 Deutsch Informational [Page 9]
515
516
517 RFC 1951 DEFLATE Compressed Data Format Specification May 1996
518
519
520 BFINAL is set if and only if this is the last block of the data
521 set.
522
523 BTYPE specifies how the data are compressed, as follows:
524
525 00 - no compression
526 01 - compressed with fixed Huffman codes
527 10 - compressed with dynamic Huffman codes
528 11 - reserved (error)
529
530 The only difference between the two compressed cases is how the
531 Huffman codes for the literal/length and distance alphabets are
532 defined.
533
534 In all cases, the decoding algorithm for the actual data is as
535 follows:
536
537    do
538       read block header from input stream.
539       if stored with no compression
540          skip any remaining bits in current partially
541             processed byte
542          read LEN and NLEN (see next section)
543          copy LEN bytes of data to output
544       otherwise
545          if compressed with dynamic Huffman codes
546             read representation of code trees (see
547                subsection below)
548          loop (until end of block code recognized)
549             decode literal/length value from input stream
550             if value < 256
551                copy value (literal byte) to output stream
552             otherwise
553                if value = end of block (256)
554                   break from loop
555                otherwise (value = 257..285)
556                   decode distance from input stream
557
558                   move backwards distance bytes in the output
559                   stream, and copy length bytes from this
560                   position to the output stream.
561          end loop
562    while not last block
563
564 Note that a duplicated string reference may refer to a string
565 in a previous block; i.e., the backward distance may cross one
566 or more block boundaries. However, a distance cannot refer past
567 the beginning of the output stream. (An application using a
568
569
570
571 Deutsch Informational [Page 10]
572
573
574 RFC 1951 DEFLATE Compressed Data Format Specification May 1996
575
576
577 preset dictionary might discard part of the output stream; a
578 distance can refer to that part of the output stream anyway.)
579 Note also that the referenced string may overlap the current
580 position; for example, if the last 2 bytes decoded have values
581 X and Y, a string reference with <length = 5, distance = 2>
582 adds X,Y,X,Y,X to the output stream.
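The overlapping copy described above only works if bytes are copied one at a time, so that bytes just written can be read again. A minimal sketch follows; `copy_match` is a hypothetical helper name.

```python
def copy_match(out: bytearray, length: int, distance: int) -> None:
    """Append `length` bytes taken from `distance` bytes back in the output."""
    if distance < 1 or distance > len(out):
        raise ValueError("distance refers before the start of the output")
    start = len(out) - distance
    for i in range(length):
        # Byte-by-byte copy: when length > distance, this reads bytes
        # that were appended earlier in this same loop.
        out.append(out[start + i])


output = bytearray(b"XY")
copy_match(output, length=5, distance=2)
print(output)
```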
583
584 We now specify each compression method in turn.
585
586 3.2.4. Non-compressed blocks (BTYPE=00)
587
588 Any bits of input up to the next byte boundary are ignored.
589 The rest of the block consists of the following information:
590
591        0   1   2   3   4...
592      +---+---+---+---+================================+
593      |  LEN  | NLEN  |... LEN bytes of literal data...|
594      +---+---+---+---+================================+
595
596 LEN is the number of data bytes in the block. NLEN is the
597 one's complement of LEN.
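A reader for this layout might look like the sketch below. It assumes the input is already positioned on the byte boundary that follows the block header, and the function name `read_stored_block` is hypothetical.

```python
def read_stored_block(data: bytes, offset: int):
    """Return (literal bytes, offset just past the block) for a stored block."""
    if offset + 4 > len(data):
        raise ValueError("truncated stored block header")
    length = int.from_bytes(data[offset:offset + 2], "little")
    nlen = int.from_bytes(data[offset + 2:offset + 4], "little")
    # NLEN must be the one's complement of LEN over 16 bits.
    if length ^ nlen != 0xFFFF:
        raise ValueError("NLEN is not the one's complement of LEN")
    start = offset + 4
    end = start + length
    if end > len(data):
        raise ValueError("truncated stored block data")
    return data[start:end], end


block = bytes([0x03, 0x00, 0xFC, 0xFF]) + b"abc"
print(read_stored_block(block, 0))
```

Checking NLEN against LEN is a cheap way to detect a corrupted or misaligned stream before copying any data.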
598
599 3.2.5. Compressed blocks (length and distance codes)
600
601 As noted above, encoded data blocks in the "deflate" format
602 consist of sequences of symbols drawn from three conceptually
603 distinct alphabets: either literal bytes, from the alphabet of
604 byte values (0..255), or <length, backward distance> pairs,
605 where the length is drawn from (3..258) and the distance is
606 drawn from (1..32,768). In fact, the literal and length
607 alphabets are merged into a single alphabet (0..285), where
608 values 0..255 represent literal bytes, the value 256 indicates
609 end-of-block, and values 257..285 represent length codes
610 (possibly in conjunction with extra bits following the symbol
611 code) as follows:
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628 Deutsch Informational [Page 11]
629
630
631 RFC 1951 DEFLATE Compressed Data Format Specification May 1996
632
633
634           Extra               Extra               Extra
635      Code Bits Length(s) Code Bits Length(s) Code Bits Length(s)
636      ---- ---- --------- ---- ---- --------- ---- ---- ---------
637       257   0      3      267   1   15,16     277   4    67-82
638       258   0      4      268   1   17,18     278   4    83-98
639       259   0      5      269   2   19-22     279   4   99-114
640       260   0      6      270   2   23-26     280   4  115-130
641       261   0      7      271   2   27-30     281   5  131-162
642       262   0      8      272   2   31-34     282   5  163-194
643       263   0      9      273   3   35-42     283   5  195-226
644       264   0     10      274   3   43-50     284   5  227-257
645       265   1   11,12     275   3   51-58     285   0     258
646       266   1   13,14     276   3   59-66
647
648 The extra bits should be interpreted as a machine integer
649 stored with the most-significant bit first, e.g., bits 1110
650 represent the value 14.
651
652           Extra               Extra                Extra
653      Code Bits Distance  Code Bits Distance   Code Bits  Distance
654      ---- ---- --------  ---- ---- --------   ---- ---- -----------
655        0   0       1      10   4     33-48     20   9    1025-1536
656        1   0       2      11   4     49-64     21   9    1537-2048
657        2   0       3      12   5     65-96     22  10    2049-3072
658        3   0       4      13   5    97-128     23  10    3073-4096
659        4   1     5,6      14   6   129-192     24  11    4097-6144
660        5   1     7,8      15   6   193-256     25  11    6145-8192
661        6   2    9-12      16   7   257-384     26  12   8193-12288
662        7   2   13-16      17   7   385-512     27  12  12289-16384
663        8   3   17-24      18   8   513-768     28  13  16385-24576
664        9   3   25-32      19   8  769-1024     29  13  24577-32768
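Both tables follow a regular pattern: each code covers a range that starts at a base value and spans 2^(extra bits) values, with length code 285 as the one special case. The sketch below rebuilds the base values from the extra-bit counts. All names are illustrative, and `extra_value` is assumed to be the already-decoded integer value of the extra bits.

```python
def build_bases(extra_bits, first):
    """Each code's base is the previous base plus 2**(previous extra bits)."""
    bases, base = [], first
    for extra in extra_bits:
        bases.append(base)
        base += 1 << extra
    return bases


# Length codes 257..285: eight codes with 0 extra bits, then groups of four.
LENGTH_EXTRA = [0] * 8 + [e for e in range(1, 6) for _ in range(4)] + [0]
# Code 285 is special: it always means length 258.
LENGTH_BASE = build_bases(LENGTH_EXTRA[:-1], 3) + [258]

# Distance codes 0..29: four codes with 0 extra bits, then pairs.
DIST_EXTRA = [0] * 4 + [e for e in range(1, 14) for _ in range(2)]
DIST_BASE = build_bases(DIST_EXTRA, 1)


def decode_length(code: int, extra_value: int) -> int:
    return LENGTH_BASE[code - 257] + extra_value


def decode_distance(code: int, extra_value: int) -> int:
    return DIST_BASE[code] + extra_value
```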
665
666 3.2.6. Compression with fixed Huffman codes (BTYPE=01)
667
668 The Huffman codes for the two alphabets are fixed, and are not
669 represented explicitly in the data. The Huffman code lengths
670 for the literal/length alphabet are:
671
672      Lit Value    Bits        Codes
673      ---------    ----        -----
674        0 - 143     8          00110000 through
675                               10111111
676      144 - 255     9          110010000 through
677                               111111111
678      256 - 279     7          0000000 through
679                               0010111
680      280 - 287     8          11000000 through
681                               11000111
682
683
684
685 Deutsch Informational [Page 12]
686
687
688 RFC 1951 DEFLATE Compressed Data Format Specification May 1996
689
690
691 The code lengths are sufficient to generate the actual codes,
692 as described above; we show the codes in the table for added
693 clarity. Literal/length values 286-287 will never actually
694 occur in the compressed data, but participate in the code
695 construction.
696
697 Distance codes 0-31 are represented by (fixed-length) 5-bit
698 codes, with possible additional bits as shown in the table
699 in Paragraph 3.2.5, above. Note that distance codes 30-31
700 will never actually occur in the compressed data.
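As a cross-check, the fixed literal/length codes in the table can be regenerated from the code lengths alone, using the code assignment algorithm of Paragraph 3.2.2. This is a sketch; the function names are assumptions, and codes are returned as bit strings for readability.

```python
def fixed_literal_length_lengths():
    """Code lengths for literal/length symbols 0..287 in fixed-code blocks."""
    return [8] * 144 + [9] * 112 + [7] * 24 + [8] * 8


def canonical_codes(lengths):
    """Assign codes from lengths (see Paragraph 3.2.2); returns bit strings."""
    max_bits = max(lengths)
    bl_count = [0] * (max_bits + 1)
    for length in lengths:
        if length:
            bl_count[length] += 1
    next_code = [0] * (max_bits + 1)
    code = 0
    for bits in range(1, max_bits + 1):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code
    codes = []
    for length in lengths:
        if length == 0:
            codes.append(None)
        else:
            codes.append(format(next_code[length], "0{}b".format(length)))
            next_code[length] += 1
    return codes


fixed_codes = canonical_codes(fixed_literal_length_lengths())
for symbol in (0, 143, 144, 255, 256, 279, 280, 287):
    print(symbol, fixed_codes[symbol])
```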
701
702 3.2.7. Compression with dynamic Huffman codes (BTYPE=10)
703
704 The Huffman codes for the two alphabets appear in the block
705 immediately after the header bits and before the actual
706 compressed data, first the literal/length code and then the
707 distance code. Each code is defined by a sequence of code
708 lengths, as discussed in Paragraph 3.2.2, above. For even
709 greater compactness, the code length sequences themselves are
710 compressed using a Huffman code. The alphabet for code lengths
711 is as follows:
712
713      0 - 15: Represent code lengths of 0 - 15
714          16: Copy the previous code length 3 - 6 times.
715              The next 2 bits indicate repeat length
716                    (0 = 3, ... , 3 = 6)
717                 Example:  Codes 8, 16 (+2 bits 11),
718                           16 (+2 bits 10) will expand to
719                           12 code lengths of 8 (1 + 6 + 5)
720          17: Repeat a code length of 0 for 3 - 10 times.
721              (3 bits of length)
722          18: Repeat a code length of 0 for 11 - 138 times
723              (7 bits of length)
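The run-length scheme above can be sketched as an expansion function. It assumes the code length symbols have already been Huffman-decoded into (symbol, extra value) pairs; `expand_code_lengths` is a hypothetical name.

```python
def expand_code_lengths(symbols):
    """Expand (symbol, extra_value) pairs into a flat list of code lengths."""
    lengths = []
    for symbol, extra in symbols:
        if 0 <= symbol <= 15:
            lengths.append(symbol)                         # literal code length
        elif symbol == 16:
            if not lengths:
                raise ValueError("symbol 16 with no previous code length")
            lengths.extend([lengths[-1]] * (3 + extra))    # copy previous 3 - 6 times
        elif symbol == 17:
            lengths.extend([0] * (3 + extra))              # 3 - 10 zeros
        elif symbol == 18:
            lengths.extend([0] * (11 + extra))             # 11 - 138 zeros
        else:
            raise ValueError("invalid code length symbol")
    return lengths


# The example from the text: 8, 16 (+2 bits 11), 16 (+2 bits 10).
print(expand_code_lengths([(8, 0), (16, 0b11), (16, 0b10)]))
```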
724
725 A code length of 0 indicates that the corresponding symbol in
726 the literal/length or distance alphabet will not occur in the
727 block, and should not participate in the Huffman code
728 construction algorithm given earlier. If only one distance
729 code is used, it is encoded using one bit, not zero bits; in
730 this case there is a single code length of one, with one unused
731 code. One distance code of zero bits means that there are no
732 distance codes used at all (the data is all literals).
733
734 We can now define the format of the block:
735
736 5 Bits: HLIT, # of Literal/Length codes - 257 (257 - 286)
737 5 Bits: HDIST, # of Distance codes - 1 (1 - 32)
738 4 Bits: HCLEN, # of Code Length codes - 4 (4 - 19)
739
740
741
742 Deutsch Informational [Page 13]
743
744
745 RFC 1951 DEFLATE Compressed Data Format Specification May 1996
746
747
748 (HCLEN + 4) x 3 bits: code lengths for the code length
749 alphabet given just above, in the order: 16, 17, 18,
750 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15
751
752 These code lengths are interpreted as 3-bit integers
753 (0-7); as above, a code length of 0 means the
754 corresponding symbol (literal/length or distance code
755 length) is not used.
756
757 HLIT + 257 code lengths for the literal/length alphabet,
758 encoded using the code length Huffman code
759
760 HDIST + 1 code lengths for the distance alphabet,
761 encoded using the code length Huffman code
762
763 The actual compressed data of the block,
764 encoded using the literal/length and distance Huffman
765 codes
766
767 The literal/length symbol 256 (end of data),
768 encoded using the literal/length Huffman code
769
770 The code length repeat codes can cross from HLIT + 257 to the
771 HDIST + 1 code lengths. In other words, all code lengths form
772 a single sequence of HLIT + HDIST + 258 values.
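The header fields and the permuted order of the code length code lengths can be sketched as follows. The names are illustrative, and the 3-bit values are assumed to have been read from the stream already.

```python
# Order in which the (HCLEN + 4) 3-bit code lengths appear in the block.
CODE_LENGTH_ORDER = [16, 17, 18, 0, 8, 7, 9, 6, 10, 5,
                     11, 4, 12, 3, 13, 2, 14, 1, 15]


def dynamic_header_counts(hlit: int, hdist: int, hclen: int):
    """Turn the raw 5/5/4-bit header fields into the three code counts."""
    return hlit + 257, hdist + 1, hclen + 4


def code_length_code_lengths(raw_values):
    """Place the 3-bit values at their symbol positions; the rest stay 0."""
    if not 4 <= len(raw_values) <= 19:
        raise ValueError("expected between 4 and 19 code length values")
    lengths = [0] * 19
    for position, value in enumerate(raw_values):
        lengths[CODE_LENGTH_ORDER[position]] = value
    return lengths


print(dynamic_header_counts(0, 0, 0))
print(code_length_code_lengths([3, 0, 2, 1]))
```

The order puts the symbols most likely to be used first, so a short HCLEN can leave the rarely used ones at an implied length of 0.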
773
774 3.3. Compliance
775
776 A compressor may limit further the ranges of values specified in
777 the previous section and still be compliant; for example, it may
778 limit the range of backward pointers to some value smaller than
779 32K. Similarly, a compressor may limit the size of blocks so that
780 a compressible block fits in memory.
781
782 A compliant decompressor must accept the full range of possible
783 values defined in the previous section, and must accept blocks of
784 arbitrary size.
785
786 4. Compression algorithm details
787
788 While it is the intent of this document to define the "deflate"
789 compressed data format without reference to any particular
790 compression algorithm, the format is related to the compressed
791 formats produced by LZ77 (Lempel-Ziv 1977, see reference [2] below);
792 since many variations of LZ77 are patented, it is strongly
793 recommended that the implementor of a compressor follow the general
794 algorithm presented here, which is known not to be patented per se.
795 The material in this section is not part of the definition of the
796
797
798
799 Deutsch Informational [Page 14]
800
801
802 RFC 1951 DEFLATE Compressed Data Format Specification May 1996
803
804
805 specification per se, and a compressor need not follow it in order to
806 be compliant.
807
808 The compressor terminates a block when it determines that starting a
809 new block with fresh trees would be useful, or when the block size
810 fills up the compressor's block buffer.
811
812 The compressor uses a chained hash table to find duplicated strings,
813 using a hash function that operates on 3-byte sequences. At any
814 given point during compression, let XYZ be the next 3 input bytes to
815 be examined (not necessarily all different, of course). First, the
816 compressor examines the hash chain for XYZ. If the chain is empty,
817 the compressor simply writes out X as a literal byte and advances one
818 byte in the input. If the hash chain is not empty, indicating that
819 the sequence XYZ (or, if we are unlucky, some other 3 bytes with the
820 same hash function value) has occurred recently, the compressor
821 compares all strings on the XYZ hash chain with the actual input data
822 sequence starting at the current point, and selects the longest
823 match.
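The search described above can be sketched as a greedy tokenizer. This is a simplification under stated assumptions, not the zlib implementation: a Python dict keyed by the 3 input bytes stands in for the hash table (so there are no hash collisions), each chain is a plain list of positions, and `max_chain` plays the role of the run-time chain-length limit.

```python
MIN_MATCH, MAX_MATCH, WINDOW = 3, 258, 32768


def find_longest_match(data, pos, chains, max_chain=128):
    """Return (length, distance) of the longest match for data[pos:], or (0, 0)."""
    best_len, best_dist = 0, 0
    candidates = chains.get(data[pos:pos + 3], [])
    # Most recent positions first, to favor small distances.
    for candidate in reversed(candidates[-max_chain:]):
        if pos - candidate > WINDOW:
            break  # everything older is out of range too
        length = 0
        while (length < MAX_MATCH and pos + length < len(data)
               and data[candidate + length] == data[pos + length]):
            length += 1
        if length > best_len:
            best_len, best_dist = length, pos - candidate
    return best_len, best_dist


def greedy_tokens(data: bytes):
    """Emit literal byte values and (length, distance) pairs."""
    chains, tokens, pos = {}, [], 0
    while pos < len(data):
        length, dist = 0, 0
        if pos + MIN_MATCH <= len(data):
            length, dist = find_longest_match(data, pos, chains)
        if length >= MIN_MATCH:
            tokens.append((length, dist))
            step = length
        else:
            tokens.append(data[pos])
            step = 1
        # Insert every position just consumed into its chain.
        for i in range(pos, pos + step):
            if i + MIN_MATCH <= len(data):
                chains.setdefault(data[i:i + 3], []).append(i)
        pos += step
    return tokens


print(greedy_tokens(b"abcabcabc"))
```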
824
825 The compressor searches the hash chains starting with the most recent
826 strings, to favor small distances and thus take advantage of the
827 Huffman encoding. The hash chains are singly linked. There are no
828 deletions from the hash chains; the algorithm simply discards matches
829 that are too old. To avoid a worst-case situation, very long hash
830 chains are arbitrarily truncated at a certain length, determined by a
831 run-time parameter.
832
833 To improve overall compression, the compressor optionally defers the
834 selection of matches ("lazy matching"): after a match of length N has
835 been found, the compressor searches for a longer match starting at
836 the next input byte. If it finds a longer match, it truncates the
837 previous match to a length of one (thus producing a single literal
838 byte) and then emits the longer match. Otherwise, it emits the
839 original match, and, as described above, advances N bytes before
840 continuing.
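Lazy matching can be sketched as below. To keep the example short, a brute-force search stands in for the hash-chain search, and the deferral is simplified: after emitting the single literal, the loop simply re-evaluates at the next byte, so a match may be deferred more than once. The function names are assumptions.

```python
MIN_MATCH, MAX_MATCH, WINDOW = 3, 258, 32768


def longest_match(data, pos):
    """Brute-force longest earlier match for data[pos:]; nearer matches win ties."""
    best_len, best_dist = 0, 0
    for start in range(max(0, pos - WINDOW), pos):
        length = 0
        while (length < MAX_MATCH and pos + length < len(data)
               and data[start + length] == data[pos + length]):
            length += 1
        if length > 0 and length >= best_len:
            best_len, best_dist = length, pos - start
    return best_len, best_dist


def lazy_tokens(data: bytes):
    """Emit literal byte values and (length, distance) pairs with lazy matching."""
    tokens, pos = [], 0
    while pos < len(data):
        length, dist = longest_match(data, pos)
        if length < MIN_MATCH:
            tokens.append(data[pos])
            pos += 1
            continue
        next_len, _ = longest_match(data, pos + 1)
        if next_len > length:
            # A longer match starts at the next byte: reduce the current
            # match to a single literal and look again from there.
            tokens.append(data[pos])
            pos += 1
            continue
        tokens.append((length, dist))
        pos += length
    return tokens


print(lazy_tokens(b"abc_bcde_abcde"))
```

In this input, a greedy compressor would take the 3-byte match "abc" at the final "abcde" and then emit "d" and "e" as literals; deferring by one byte finds the 4-byte match "bcde" instead.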
841
842 Run-time parameters also control this "lazy match" procedure. If
843 compression ratio is most important, the compressor attempts a
844 complete second search regardless of the length of the first match.
845 In the normal case, if the current match is "long enough", the
846 compressor reduces the search for a longer match, thus speeding up
847 the process. If speed is most important, the compressor inserts new
848 strings in the hash table only when no match was found, or when the
849 match is not "too long". This degrades the compression ratio but
850 saves time since there are both fewer insertions and fewer searches.
851
852
853
854
855
856 Deutsch Informational [Page 15]
857
858
859 RFC 1951 DEFLATE Compressed Data Format Specification May 1996
860
861
862 5. References
863
864 [1] Huffman, D. A., "A Method for the Construction of Minimum
865 Redundancy Codes", Proceedings of the Institute of Radio
866 Engineers, September 1952, Volume 40, Number 9, pp. 1098-1101.
867
868 [2] Ziv J., Lempel A., "A Universal Algorithm for Sequential Data
869 Compression", IEEE Transactions on Information Theory, Vol. 23,
870 No. 3, pp. 337-343.
871
872 [3] Gailly, J.-L., and Adler, M., ZLIB documentation and sources,
873 available in ftp://ftp.uu.net/pub/archiving/zip/doc/
874
875 [4] Gailly, J.-L., and Adler, M., GZIP documentation and sources,
876 available as gzip-*.tar in ftp://prep.ai.mit.edu/pub/gnu/
877
878 [5] Schwartz, E. S., and Kallick, B. "Generating a canonical prefix
879 encoding." Comm. ACM, 7,3 (Mar. 1964), pp. 166-169.
880
881 [6] Hirschberg and Lelewer, "Efficient decoding of prefix codes,"
882 Comm. ACM, 33,4, April 1990, pp. 449-459.
883
884 6. Security Considerations
885
886 Any data compression method involves the reduction of redundancy in
887 the data. Consequently, any corruption of the data is likely to have
888 severe effects and be difficult to correct. Uncompressed text, on
889 the other hand, will probably still be readable despite the presence
890 of some corrupted bytes.
891
892 It is recommended that systems using this data format provide some
893 means of validating the integrity of the compressed data. See
894 reference [3], for example.
895
896 7. Source code
897
898 Source code for a C language implementation of a "deflate" compliant
899 compressor and decompressor is available within the zlib package at
900 ftp://ftp.uu.net/pub/archiving/zip/zlib/.
901
902 8. Acknowledgements
903
904 Trademarks cited in this document are the property of their
905 respective owners.
906
907 Phil Katz designed the deflate format. Jean-Loup Gailly and Mark
908 Adler wrote the related software described in this specification.
909 Glenn Randers-Pehrson converted this document to RFC and HTML format.
910
911
912
913 Deutsch Informational [Page 16]
914
915
916 RFC 1951 DEFLATE Compressed Data Format Specification May 1996
917
918
919 9. Author's Address
920
921 L. Peter Deutsch
922 Aladdin Enterprises
923 203 Santa Margarita Ave.
924 Menlo Park, CA 94025
925
926 Phone: (415) 322-0103 (AM only)
927 FAX: (415) 322-1734
928 EMail: <[email protected]>
929
930 Questions about the technical content of this specification can be
931 sent by email to:
932
933 Jean-Loup Gailly <[email protected]> and
934 Mark Adler <[email protected]>
935
936 Editorial comments on this specification can be sent by email to:
937
938 L. Peter Deutsch <[email protected]> and
939 Glenn Randers-Pehrson <[email protected]>
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970 Deutsch Informational [Page 17]
971
972
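The match search described in the rfc1951.txt text above (hash chains over 3-byte sequences, newest entries first, no deletions, chains cut off at a run-time limit) can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the zlib implementation: the names match_finder, hash3, mf_init, mf_insert and mf_longest_match, the 15-bit table size, and the multiplicative hash are all hypothetical choices made for this example. Only the 3-byte hash input, the 32K distance limit, and the 258-byte length limit come from the text.

```c
#include <stddef.h>
#include <stdint.h>

#define WINDOW_SIZE 32768u      /* largest distance DEFLATE can encode */
#define MIN_MATCH   3u          /* the hash covers 3-byte sequences */
#define MAX_MATCH   258u        /* largest length DEFLATE can encode */
#define HASH_BITS   15u         /* assumption: table size is a free choice */
#define HASH_SIZE   (1u << HASH_BITS)
#define NIL         UINT32_MAX  /* marks the end of a chain */

typedef struct {
    uint32_t head[HASH_SIZE];   /* newest position seen for each hash value */
    uint32_t prev[WINDOW_SIZE]; /* singly linked chain: next older position */
} match_finder;

/* Hypothetical hash of the 3 bytes at p; any reasonable mixing function works. */
static uint32_t hash3(const unsigned char *p)
{
    uint32_t v = (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
    return (uint32_t)(v * 2654435761u) >> (32u - HASH_BITS);
}

void mf_init(match_finder *mf)
{
    for (size_t i = 0; i < HASH_SIZE; i++)
        mf->head[i] = NIL;
}

/* Record position pos. The caller must guarantee that buf[pos..pos+2] exists
   and must insert positions in increasing order. */
void mf_insert(match_finder *mf, const unsigned char *buf, uint32_t pos)
{
    uint32_t h = hash3(buf + pos);
    mf->prev[pos % WINDOW_SIZE] = mf->head[h];
    mf->head[h] = pos;
}

/* Return the length of the longest match for buf[pos..] among earlier
   positions, or 0 if there is none of at least MIN_MATCH bytes. On success
   *match_pos holds the start of the match. At most max_chain candidates are
   examined; pos itself must not have been inserted yet. */
uint32_t mf_longest_match(const match_finder *mf, const unsigned char *buf,
                          uint32_t len, uint32_t pos, uint32_t max_chain,
                          uint32_t *match_pos)
{
    uint32_t best = 0, limit, cand, tries;

    if (pos >= len || len - pos < MIN_MATCH)
        return 0;
    limit = (len - pos < MAX_MATCH) ? len - pos : MAX_MATCH;
    cand = mf->head[hash3(buf + pos)];
    for (tries = 0; cand != NIL && tries < max_chain; tries++) {
        uint32_t n = 0;
        /* Chains run from newest to oldest, so the first candidate that is
           too old ends the search; nothing is ever deleted explicitly. */
        if (cand >= pos || pos - cand > WINDOW_SIZE)
            break;
        while (n < limit && buf[cand + n] == buf[pos + n])
            n++;
        if (n > best) {         /* strict '>' keeps the nearest of equal matches */
            best = n;
            *match_pos = cand;
        }
        cand = mf->prev[cand % WINDOW_SIZE];
    }
    return (best >= MIN_MATCH) ? best : 0;
}
```

A compressor loop built on this sketch would call mf_longest_match at the current position, emit either a literal or a (distance, length) pair, and call mf_insert for each position it advances over. Lazy matching, as the text describes it, would repeat the search at pos + 1 before committing to the first match.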
D compat/zlib/doc/rfc1952.txt
-687
--- a/compat/zlib/doc/rfc1952.txt
+++ b/compat/zlib/doc/rfc1952.txt
@@ -1,687 +0,0 @@
1
-
2
-
3
-
4
-
5
-
6
-
7
-Network Working Group P. Deutsch
8
-Request for Comments: 1952 Aladdin Enterprises
9
-Category: Informational May 1996
10
-
11
-
12
- GZIP file format specification version 4.3
13
-
14
-Status of This Memo
15
-
16
- This memo provides information for the Internet community. This memo
17
- does not specify an Internet standard of any kind. Distribution of
18
- this memo is unlimited.
19
-
20
-IESG Note:
21
-
22
- The IESG takes no position on the validity of any Intellectual
23
- Property Rights statements contained in this document.
24
-
25
-Notices
26
-
27
- Copyright (c) 1996 L. Peter Deutsch
28
-
29
- Permission is granted to copy and distribute this document for any
30
- purpose and without charge, including translations into other
31
- languages and incorporation into compilations, provided that the
32
- copyright notice and this notice are preserved, and that any
33
- substantive changes or deletions from the original are clearly
34
- marked.
35
-
36
- A pointer to the latest version of this and related documentation in
37
- HTML format can be found at the URL
38
- <ftp://ftp.uu.net/graphics/png/documents/zlib/zdoc-index.html>.
39
-
40
-Abstract
41
-
42
- This specification defines a lossless compressed data format that is
43
- compatible with the widely used GZIP utility. The format includes a
44
- cyclic redundancy check value for detecting data corruption. The
45
- format presently uses the DEFLATE method of compression but can be
46
- easily extended to use other compression methods. The format can be
47
- implemented readily in a manner not covered by patents.
48
-
49
-
50
-
51
-
52
-
53
-
54
-
55
-
56
-
57
-
58
-Deutsch Informational [Page 1]
59
-
60
-
61
-RFC 1952 GZIP File Format Specification May 1996
62
-
63
-
64
-Table of Contents
65
-
66
- 1. Introduction ................................................... 2
67
- 1.1. Purpose ................................................... 2
68
- 1.2. Intended audience ......................................... 3
69
- 1.3. Scope ..................................................... 3
70
- 1.4. Compliance ................................................ 3
71
- 1.5. Definitions of terms and conventions used ................. 3
72
- 1.6. Changes from previous versions ............................ 3
73
- 2. Detailed specification ......................................... 4
74
- 2.1. Overall conventions ....................................... 4
75
- 2.2. File format ............................................... 5
76
- 2.3. Member format ............................................. 5
77
- 2.3.1. Member header and trailer ........................... 6
78
- 2.3.1.1. Extra field ................................... 8
79
- 2.3.1.2. Compliance .................................... 9
80
- 3. References .................................................. 9
81
- 4. Security Considerations .................................... 10
82
- 5. Acknowledgements ........................................... 10
83
- 6. Author's Address ........................................... 10
84
- 7. Appendix: Jean-Loup Gailly's gzip utility .................. 11
85
- 8. Appendix: Sample CRC Code .................................. 11
86
-
87
-1. Introduction
88
-
89
- 1.1. Purpose
90
-
91
- The purpose of this specification is to define a lossless
92
- compressed data format that:
93
-
94
- * Is independent of CPU type, operating system, file system,
95
- and character set, and hence can be used for interchange;
96
- * Can compress or decompress a data stream (as opposed to a
97
- randomly accessible file) to produce another data stream,
98
- using only an a priori bounded amount of intermediate
99
- storage, and hence can be used in data communications or
100
- similar structures such as Unix filters;
101
- * Compresses data with efficiency comparable to the best
102
- currently available general-purpose compression methods,
103
- and in particular considerably better than the "compress"
104
- program;
105
- * Can be implemented readily in a manner not covered by
106
- patents, and hence can be practiced freely;
107
- * Is compatible with the file format produced by the current
108
- widely used gzip utility, in that conforming decompressors
109
- will be able to read data produced by the existing gzip
110
- compressor.
111
-
112
-
113
-
114
-
115
-Deutsch Informational [Page 2]
116
-
117
-
118
-RFC 1952 GZIP File Format Specification May 1996
119
-
120
-
121
- The data format defined by this specification does not attempt to:
122
-
123
- * Provide random access to compressed data;
124
- * Compress specialized data (e.g., raster graphics) as well as
125
- the best currently available specialized algorithms.
126
-
127
- 1.2. Intended audience
128
-
129
- This specification is intended for use by implementors of software
130
- to compress data into gzip format and/or decompress data from gzip
131
- format.
132
-
133
- The text of the specification assumes a basic background in
134
- programming at the level of bits and other primitive data
135
- representations.
136
-
137
- 1.3. Scope
138
-
139
- The specification specifies a compression method and a file format
140
- (the latter assuming only that a file can store a sequence of
141
- arbitrary bytes). It does not specify any particular interface to
142
- a file system or anything about character sets or encodings
143
- (except for file names and comments, which are optional).
144
-
145
- 1.4. Compliance
146
-
147
- Unless otherwise indicated below, a compliant decompressor must be
148
- able to accept and decompress any file that conforms to all the
149
- specifications presented here; a compliant compressor must produce
150
- files that conform to all the specifications presented here. The
151
- material in the appendices is not part of the specification per se
152
- and is not relevant to compliance.
153
-
154
- 1.5. Definitions of terms and conventions used
155
-
156
- byte: 8 bits stored or transmitted as a unit (same as an octet).
157
- (For this specification, a byte is exactly 8 bits, even on
158
- machines which store a character on a number of bits different
159
- from 8.) See below for the numbering of bits within a byte.
160
-
161
- 1.6. Changes from previous versions
162
-
163
- There have been no technical changes to the gzip format since
164
- version 4.1 of this specification. In version 4.2, some
165
- terminology was changed, and the sample CRC code was rewritten for
166
- clarity and to eliminate the requirement for the caller to do pre-
167
- and post-conditioning. Version 4.3 is a conversion of the
168
- specification to RFC style.
169
-
170
-
171
-
172
-Deutsch Informational [Page 3]
173
-
174
-
175
-RFC 1952 GZIP File Format Specification May 1996
176
-
177
-
178
-2. Detailed specification
179
-
180
- 2.1. Overall conventions
181
-
182
- In the diagrams below, a box like this:
183
-
184
- +---+
185
- | | <-- the vertical bars might be missing
186
- +---+
187
-
188
- represents one byte; a box like this:
189
-
190
- +==============+
191
- | |
192
- +==============+
193
-
194
- represents a variable number of bytes.
195
-
196
- Bytes stored within a computer do not have a "bit order", since
197
- they are always treated as a unit. However, a byte considered as
198
- an integer between 0 and 255 does have a most- and least-
199
- significant bit, and since we write numbers with the most-
200
- significant digit on the left, we also write bytes with the most-
201
- significant bit on the left. In the diagrams below, we number the
202
- bits of a byte so that bit 0 is the least-significant bit, i.e.,
203
- the bits are numbered:
204
-
205
- +--------+
206
- |76543210|
207
- +--------+
208
-
209
- This document does not address the issue of the order in which
210
- bits of a byte are transmitted on a bit-sequential medium, since
211
- the data format described here is byte- rather than bit-oriented.
212
-
213
- Within a computer, a number may occupy multiple bytes. All
214
- multi-byte numbers in the format described here are stored with
215
- the least-significant byte first (at the lower memory address).
216
- For example, the decimal number 520 is stored as:
217
-
218
- 0 1
219
- +--------+--------+
220
- |00001000|00000010|
221
- +--------+--------+
222
- ^ ^
223
- | |
224
- | + more significant byte = 2 x 256
225
- + less significant byte = 8
226
-
227
-
228
-
229
-Deutsch Informational [Page 4]
230
-
231
-
232
-RFC 1952 GZIP File Format Specification May 1996
233
-
234
-
235
- 2.2. File format
236
-
237
- A gzip file consists of a series of "members" (compressed data
238
- sets). The format of each member is specified in the following
239
- section. The members simply appear one after another in the file,
240
- with no additional information before, between, or after them.
241
-
242
- 2.3. Member format
243
-
244
- Each member has the following structure:
245
-
246
- +---+---+---+---+---+---+---+---+---+---+
247
- |ID1|ID2|CM |FLG| MTIME |XFL|OS | (more-->)
248
- +---+---+---+---+---+---+---+---+---+---+
249
-
250
- (if FLG.FEXTRA set)
251
-
252
- +---+---+=================================+
253
- | XLEN |...XLEN bytes of "extra field"...| (more-->)
254
- +---+---+=================================+
255
-
256
- (if FLG.FNAME set)
257
-
258
- +=========================================+
259
- |...original file name, zero-terminated...| (more-->)
260
- +=========================================+
261
-
262
- (if FLG.FCOMMENT set)
263
-
264
- +===================================+
265
- |...file comment, zero-terminated...| (more-->)
266
- +===================================+
267
-
268
- (if FLG.FHCRC set)
269
-
270
- +---+---+
271
- | CRC16 |
272
- +---+---+
273
-
274
- +=======================+
275
- |...compressed blocks...| (more-->)
276
- +=======================+
277
-
278
- 0 1 2 3 4 5 6 7
279
- +---+---+---+---+---+---+---+---+
280
- | CRC32 | ISIZE |
281
- +---+---+---+---+---+---+---+---+
282
-
283
-
284
-
285
-
286
-Deutsch Informational [Page 5]
287
-
288
-
289
-RFC 1952 GZIP File Format Specification May 1996
290
-
291
-
292
- 2.3.1. Member header and trailer
293
-
294
- ID1 (IDentification 1)
295
- ID2 (IDentification 2)
296
- These have the fixed values ID1 = 31 (0x1f, \037), ID2 = 139
297
- (0x8b, \213), to identify the file as being in gzip format.
298
-
299
- CM (Compression Method)
300
- This identifies the compression method used in the file. CM
301
- = 0-7 are reserved. CM = 8 denotes the "deflate"
302
- compression method, which is the one customarily used by
303
- gzip and which is documented elsewhere.
304
-
305
- FLG (FLaGs)
306
- This flag byte is divided into individual bits as follows:
307
-
308
- bit 0 FTEXT
309
- bit 1 FHCRC
310
- bit 2 FEXTRA
311
- bit 3 FNAME
312
- bit 4 FCOMMENT
313
- bit 5 reserved
314
- bit 6 reserved
315
- bit 7 reserved
316
-
317
- If FTEXT is set, the file is probably ASCII text. This is
318
- an optional indication, which the compressor may set by
319
- checking a small amount of the input data to see whether any
320
- non-ASCII characters are present. In case of doubt, FTEXT
321
- is cleared, indicating binary data. For systems which have
322
- different file formats for ascii text and binary data, the
323
- decompressor can use FTEXT to choose the appropriate format.
324
- We deliberately do not specify the algorithm used to set
325
- this bit, since a compressor always has the option of
326
- leaving it cleared and a decompressor always has the option
327
- of ignoring it and letting some other program handle issues
328
- of data conversion.
329
-
330
- If FHCRC is set, a CRC16 for the gzip header is present,
331
- immediately before the compressed data. The CRC16 consists
332
- of the two least significant bytes of the CRC32 for all
333
- bytes of the gzip header up to and not including the CRC16.
334
- [The FHCRC bit was never set by versions of gzip up to
335
- 1.2.4, even though it was documented with a different
336
- meaning in gzip 1.2.4.]
337
-
338
- If FEXTRA is set, optional extra fields are present, as
339
- described in a following section.
340
-
341
-
342
-
343
-Deutsch Informational [Page 6]
344
-
345
-
346
-RFC 1952 GZIP File Format Specification May 1996
347
-
348
-
349
- If FNAME is set, an original file name is present,
350
- terminated by a zero byte. The name must consist of ISO
351
- 8859-1 (LATIN-1) characters; on operating systems using
352
- EBCDIC or any other character set for file names, the name
353
- must be translated to the ISO LATIN-1 character set. This
354
- is the original name of the file being compressed, with any
355
- directory components removed, and, if the file being
356
- compressed is on a file system with case insensitive names,
357
- forced to lower case. There is no original file name if the
358
- data was compressed from a source other than a named file;
359
- for example, if the source was stdin on a Unix system, there
360
- is no file name.
361
-
362
- If FCOMMENT is set, a zero-terminated file comment is
363
- present. This comment is not interpreted; it is only
364
- intended for human consumption. The comment must consist of
365
- ISO 8859-1 (LATIN-1) characters. Line breaks should be
366
- denoted by a single line feed character (10 decimal).
367
-
368
- Reserved FLG bits must be zero.
369
-
370
- MTIME (Modification TIME)
371
- This gives the most recent modification time of the original
372
- file being compressed. The time is in Unix format, i.e.,
373
- seconds since 00:00:00 GMT, Jan. 1, 1970. (Note that this
374
- may cause problems for MS-DOS and other systems that use
375
- local rather than Universal time.) If the compressed data
376
- did not come from a file, MTIME is set to the time at which
377
- compression started. MTIME = 0 means no time stamp is
378
- available.
379
-
380
- XFL (eXtra FLags)
381
- These flags are available for use by specific compression
382
- methods. The "deflate" method (CM = 8) sets these flags as
383
- follows:
384
-
385
- XFL = 2 - compressor used maximum compression,
386
- slowest algorithm
387
- XFL = 4 - compressor used fastest algorithm
388
-
389
- OS (Operating System)
390
- This identifies the type of file system on which compression
391
- took place. This may be useful in determining end-of-line
392
- convention for text files. The currently defined values are
393
- as follows:
394
-
395
-
396
-
397
-
398
-
399
-
400
-Deutsch Informational [Page 7]
401
-
402
-
403
-RFC 1952 GZIP File Format Specification May 1996
404
-
405
-
406
- 0 - FAT filesystem (MS-DOS, OS/2, NT/Win32)
407
- 1 - Amiga
408
- 2 - VMS (or OpenVMS)
409
- 3 - Unix
410
- 4 - VM/CMS
411
- 5 - Atari TOS
412
- 6 - HPFS filesystem (OS/2, NT)
413
- 7 - Macintosh
414
- 8 - Z-System
415
- 9 - CP/M
416
- 10 - TOPS-20
417
- 11 - NTFS filesystem (NT)
418
- 12 - QDOS
419
- 13 - Acorn RISCOS
420
- 255 - unknown
421
-
422
- XLEN (eXtra LENgth)
423
- If FLG.FEXTRA is set, this gives the length of the optional
424
- extra field. See below for details.
425
-
426
- CRC32 (CRC-32)
427
- This contains a Cyclic Redundancy Check value of the
428
- uncompressed data computed according to CRC-32 algorithm
429
- used in the ISO 3309 standard and in section 8.1.1.6.2 of
430
- ITU-T recommendation V.42. (See http://www.iso.ch for
431
- ordering ISO documents. See gopher://info.itu.ch for an
432
- online version of ITU-T V.42.)
433
-
434
- ISIZE (Input SIZE)
435
- This contains the size of the original (uncompressed) input
436
- data modulo 2^32.
437
-
438
- 2.3.1.1. Extra field
439
-
440
- If the FLG.FEXTRA bit is set, an "extra field" is present in
441
- the header, with total length XLEN bytes. It consists of a
442
- series of subfields, each of the form:
443
-
444
- +---+---+---+---+==================================+
445
- |SI1|SI2| LEN |... LEN bytes of subfield data ...|
446
- +---+---+---+---+==================================+
447
-
448
- SI1 and SI2 provide a subfield ID, typically two ASCII letters
449
- with some mnemonic value. Jean-Loup Gailly
450
- <[email protected]> is maintaining a registry of subfield
451
- IDs; please send him any subfield ID you wish to use. Subfield
452
- IDs with SI2 = 0 are reserved for future use. The following
453
- IDs are currently defined:
454
-
455
-
456
-
457
-Deutsch Informational [Page 8]
458
-
459
-
460
-RFC 1952 GZIP File Format Specification May 1996
461
-
462
-
463
- SI1 SI2 Data
464
- ---------- ---------- ----
465
- 0x41 ('A') 0x70 ('P') Apollo file type information
466
-
467
- LEN gives the length of the subfield data, excluding the 4
468
- initial bytes.
469
-
470
- 2.3.1.2. Compliance
471
-
472
- A compliant compressor must produce files with correct ID1,
473
- ID2, CM, CRC32, and ISIZE, but may set all the other fields in
474
- the fixed-length part of the header to default values (255 for
475
- OS, 0 for all others). The compressor must set all reserved
476
- bits to zero.
477
-
478
- A compliant decompressor must check ID1, ID2, and CM, and
479
- provide an error indication if any of these have incorrect
480
- values. It must examine FEXTRA/XLEN, FNAME, FCOMMENT and FHCRC
481
- at least so it can skip over the optional fields if they are
482
- present. It need not examine any other part of the header or
483
- trailer; in particular, a decompressor may ignore FTEXT and OS
484
- and always produce binary output, and still be compliant. A
485
- compliant decompressor must give an error indication if any
486
- reserved bit is non-zero, since such a bit could indicate the
487
- presence of a new field that would cause subsequent data to be
488
- interpreted incorrectly.
489
-
490
-3. References
491
-
492
- [1] "Information Processing - 8-bit single-byte coded graphic
493
- character sets - Part 1: Latin alphabet No.1" (ISO 8859-1:1987).
494
- The ISO 8859-1 (Latin-1) character set is a superset of 7-bit
495
- ASCII. Files defining this character set are available as
496
- iso_8859-1.* in ftp://ftp.uu.net/graphics/png/documents/
497
-
498
- [2] ISO 3309
499
-
500
- [3] ITU-T recommendation V.42
501
-
502
- [4] Deutsch, L.P.,"DEFLATE Compressed Data Format Specification",
503
- available in ftp://ftp.uu.net/pub/archiving/zip/doc/
504
-
505
- [5] Gailly, J.-L., GZIP documentation, available as gzip-*.tar in
506
- ftp://prep.ai.mit.edu/pub/gnu/
507
-
508
- [6] Sarwate, D.V., "Computation of Cyclic Redundancy Checks via Table
509
- Look-Up", Communications of the ACM, 31(8), pp.1008-1013.
510
-
511
-
512
-
513
-
514
-Deutsch Informational [Page 9]
515
-
516
-
517
-RFC 1952 GZIP File Format Specification May 1996
518
-
519
-
520
- [7] Schwaderer, W.D., "CRC Calculation", April 85 PC Tech Journal,
521
- pp.118-133.
522
-
523
- [8] ftp://ftp.adelaide.edu.au/pub/rocksoft/papers/crc_v3.txt,
524
- describing the CRC concept.
525
-
526
-4. Security Considerations
527
-
528
- Any data compression method involves the reduction of redundancy in
529
- the data. Consequently, any corruption of the data is likely to have
530
- severe effects and be difficult to correct. Uncompressed text, on
531
- the other hand, will probably still be readable despite the presence
532
- of some corrupted bytes.
533
-
534
- It is recommended that systems using this data format provide some
535
- means of validating the integrity of the compressed data, such as by
536
- setting and checking the CRC-32 check value.
537
-
538
-5. Acknowledgements
539
-
540
- Trademarks cited in this document are the property of their
541
- respective owners.
542
-
543
- Jean-Loup Gailly designed the gzip format and wrote, with Mark Adler,
544
- the related software described in this specification. Glenn
545
- Randers-Pehrson converted this document to RFC and HTML format.
546
-
547
-6. Author's Address
548
-
549
- L. Peter Deutsch
550
- Aladdin Enterprises
551
- 203 Santa Margarita Ave.
552
- Menlo Park, CA 94025
553
-
554
- Phone: (415) 322-0103 (AM only)
555
- FAX: (415) 322-1734
556
- EMail: <[email protected]>
557
-
558
- Questions about the technical content of this specification can be
559
- sent by email to:
560
-
561
- Jean-Loup Gailly <[email protected]> and
562
- Mark Adler <[email protected]>
563
-
564
- Editorial comments on this specification can be sent by email to:
565
-
566
- L. Peter Deutsch <[email protected]> and
567
- Glenn Randers-Pehrson <[email protected]>
568
-
569
-
570
-
571
-Deutsch Informational [Page 10]
572
-
573
-
574
-RFC 1952 GZIP File Format Specification May 1996
575
-
576
-
577
-7. Appendix: Jean-Loup Gailly's gzip utility
578
-
579
- The most widely used implementation of gzip compression, and the
580
- original documentation on which this specification is based, were
581
- created by Jean-Loup Gailly <[email protected]>. Since this
582
- implementation is a de facto standard, we mention some more of its
583
- features here. Again, the material in this section is not part of
584
- the specification per se, and implementations need not follow it to
585
- be compliant.
586
-
587
- When compressing or decompressing a file, gzip preserves the
588
- protection, ownership, and modification time attributes on the local
589
- file system, since there is no provision for representing protection
590
- attributes in the gzip file format itself. Since the file format
591
- includes a modification time, the gzip decompressor provides a
592
- command line switch that assigns the modification time from the file,
593
- rather than the local modification time of the compressed input, to
594
- the decompressed output.
595
-
596
-8. Appendix: Sample CRC Code
597
-
598
- The following sample code represents a practical implementation of
599
- the CRC (Cyclic Redundancy Check). (See also ISO 3309 and ITU-T V.42
600
- for a formal specification.)
601
-
602
- The sample code is in the ANSI C programming language. Non C users
603
- may find it easier to read with these hints:
604
-
605
- & Bitwise AND operator.
606
- ^ Bitwise exclusive-OR operator.
607
- >> Bitwise right shift operator. When applied to an
608
- unsigned quantity, as here, right shift inserts zero
609
- bit(s) at the left.
610
- ! Logical NOT operator.
611
- ++ "n++" increments the variable n.
612
- 0xNNN 0x introduces a hexadecimal (base 16) constant.
613
- Suffix L indicates a long value (at least 32 bits).
614
-
615
- /* Table of CRCs of all 8-bit messages. */
616
- unsigned long crc_table[256];
617
-
618
- /* Flag: has the table been computed? Initially false. */
619
- int crc_table_computed = 0;
620
-
621
- /* Make the table for a fast CRC. */
622
- void make_crc_table(void)
623
- {
624
- unsigned long c;
625
-
626
-
627
-
628
-Deutsch Informational [Page 11]
629
-
630
-
631
-RFC 1952 GZIP File Format Specification May 1996
632
-
633
-
634
- int n, k;
635
- for (n = 0; n < 256; n++) {
636
- c = (unsigned long) n;
637
- for (k = 0; k < 8; k++) {
638
- if (c & 1) {
639
- c = 0xedb88320L ^ (c >> 1);
640
- } else {
641
- c = c >> 1;
642
- }
643
- }
644
- crc_table[n] = c;
645
- }
646
- crc_table_computed = 1;
647
- }
648
-
649
- /*
650
- Update a running crc with the bytes buf[0..len-1] and return
651
- the updated crc. The crc should be initialized to zero. Pre- and
652
- post-conditioning (one's complement) is performed within this
653
- function so it shouldn't be done by the caller. Usage example:
654
-
655
- unsigned long crc = 0L;
656
-
657
- while (read_buffer(buffer, length) != EOF) {
658
- crc = update_crc(crc, buffer, length);
659
- }
660
- if (crc != original_crc) error();
661
- */
662
- unsigned long update_crc(unsigned long crc,
663
- unsigned char *buf, int len)
664
- {
665
- unsigned long c = crc ^ 0xffffffffL;
666
- int n;
667
-
668
- if (!crc_table_computed)
669
- make_crc_table();
670
- for (n = 0; n < len; n++) {
671
- c = crc_table[(c ^ buf[n]) & 0xff] ^ (c >> 8);
672
- }
673
- return c ^ 0xffffffffL;
674
- }
675
-
676
- /* Return the CRC of the bytes buf[0..len-1]. */
677
- unsigned long crc(unsigned char *buf, int len)
678
- {
679
- return update_crc(0L, buf, len);
680
- }
681
-
682
-
683
-
684
-
685
-Deutsch Informational [Page 12]
686
-
687
-
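The member header layout from section 2.3 of the rfc1952.txt text above, together with the decompressor rules of section 2.3.1.2, can be sketched as a small C parser. This is an illustrative sketch only, not code from gzip or zlib: the names gzip_header and gzip_parse_header and the return convention are assumptions made for this example. The field order, the flag bits, the little-endian byte order, and the requirement to reject non-zero reserved bits follow the text. Rejecting every CM other than 8 is a simplification chosen here.

```c
#include <stddef.h>
#include <stdint.h>

/* FLG bits as listed in section 2.3.1. */
enum {
    GZ_FTEXT     = 0x01,
    GZ_FHCRC     = 0x02,
    GZ_FEXTRA    = 0x04,
    GZ_FNAME     = 0x08,
    GZ_FCOMMENT  = 0x10,
    GZ_FRESERVED = 0xE0   /* bits 5-7 must be zero */
};

typedef struct {
    uint8_t  cm, flg, xfl, os;
    uint32_t mtime;        /* seconds since the Unix epoch, 0 = not available */
    size_t   data_offset;  /* index of the first byte of compressed blocks */
} gzip_header;

/* Skip a zero-terminated field starting at *i. Returns 0 on success,
   -1 if the terminator is missing from the buffer. */
static int skip_zstring(const unsigned char *p, size_t len, size_t *i)
{
    while (*i < len && p[*i] != 0)
        (*i)++;
    if (*i == len)
        return -1;
    (*i)++;                /* step over the terminating zero byte */
    return 0;
}

/* Parse one member header from p[0..len-1].
   Returns 0 on success, -1 if the header is invalid or truncated. */
int gzip_parse_header(const unsigned char *p, size_t len, gzip_header *out)
{
    size_t i = 10;         /* size of the fixed-length part */

    if (len < 10)
        return -1;
    if (p[0] != 0x1f || p[1] != 0x8b)   /* ID1, ID2 */
        return -1;
    if (p[2] != 8)                       /* CM: only "deflate" is handled here */
        return -1;
    if (p[3] & GZ_FRESERVED)             /* reserved bits must be zero */
        return -1;

    out->cm  = p[2];
    out->flg = p[3];
    /* Multi-byte numbers are stored least-significant byte first. */
    out->mtime = (uint32_t)p[4] | ((uint32_t)p[5] << 8) |
                 ((uint32_t)p[6] << 16) | ((uint32_t)p[7] << 24);
    out->xfl = p[8];
    out->os  = p[9];

    if (out->flg & GZ_FEXTRA) {          /* XLEN, then XLEN bytes */
        size_t xlen;
        if (len - i < 2)
            return -1;
        xlen = (size_t)p[i] | ((size_t)p[i + 1] << 8);
        i += 2;
        if (len - i < xlen)
            return -1;
        i += xlen;
    }
    if ((out->flg & GZ_FNAME) && skip_zstring(p, len, &i) != 0)
        return -1;
    if ((out->flg & GZ_FCOMMENT) && skip_zstring(p, len, &i) != 0)
        return -1;
    if (out->flg & GZ_FHCRC) {           /* CRC16 of the header, not verified here */
        if (len - i < 2)
            return -1;
        i += 2;
    }
    out->data_offset = i;
    return 0;
}
```

For example, a header whose MTIME bytes are 08 02 00 00 decodes to 520, matching the byte-order example in section 2.1. The sketch skips the optional fields without interpreting them, which is the minimum the compliance section asks of a decompressor.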
--- a/compat/zlib/doc/rfc1952.txt
+++ b/compat/zlib/doc/rfc1952.txt
@@ -1,687 +0,0 @@
1
2
3
4
5
6
7 Network Working Group P. Deutsch
8 Request for Comments: 1952 Aladdin Enterprises
9 Category: Informational May 1996
10
11
12 GZIP file format specification version 4.3
13
14 Status of This Memo
15
16 This memo provides information for the Internet community. This memo
17 does not specify an Internet standard of any kind. Distribution of
18 this memo is unlimited.
19
20 IESG Note:
21
22 The IESG takes no position on the validity of any Intellectual
23 Property Rights statements contained in this document.
24
25 Notices
26
27 Copyright (c) 1996 L. Peter Deutsch
28
29 Permission is granted to copy and distribute this document for any
30 purpose and without charge, including translations into other
31 languages and incorporation into compilations, provided that the
32 copyright notice and this notice are preserved, and that any
33 substantive changes or deletions from the original are clearly
34 marked.
35
36 A pointer to the latest version of this and related documentation in
37 HTML format can be found at the URL
38 <ftp://ftp.uu.net/graphics/png/documents/zlib/zdoc-index.html>.
39
40 Abstract
41
42 This specification defines a lossless compressed data format that is
43 compatible with the widely used GZIP utility. The format includes a
44 cyclic redundancy check value for detecting data corruption. The
45 format presently uses the DEFLATE method of compression but can be
46 easily extended to use other compression methods. The format can be
47 implemented readily in a manner not covered by patents.
48
49
50
51
52
53
54
55
56
57
58 Deutsch Informational [Page 1]
59
60
61 RFC 1952 GZIP File Format Specification May 1996
62
63
64 Table of Contents
65
66 1. Introduction
67 1.1. Purpose
68 1.2. Intended audience
69 1.3. Scope
70 1.4. Compliance
71 1.5. Definitions of terms and conventions used
72 1.6. Changes from previous versions
73 2. Detailed specification
74 2.1. Overall conventions
75 2.2. File format
76 2.3. Member format
77 2.3.1. Member header and trailer
78 2.3.1.1. Extra field
79 2.3.1.2. Compliance
80 3. References
81 4. Security Considerations
82 5. Acknowledgements
83 6. Author's Address
84 7. Appendix: Jean-Loup Gailly's gzip utility
85 8. Appendix: Sample CRC Code
86
87 1. Introduction
88
89 1.1. Purpose
90
91 The purpose of this specification is to define a lossless
92 compressed data format that:
93
94 * Is independent of CPU type, operating system, file system,
95 and character set, and hence can be used for interchange;
96 * Can compress or decompress a data stream (as opposed to a
97 randomly accessible file) to produce another data stream,
98 using only an a priori bounded amount of intermediate
99 storage, and hence can be used in data communications or
100 similar structures such as Unix filters;
101 * Compresses data with efficiency comparable to the best
102 currently available general-purpose compression methods,
103 and in particular considerably better than the "compress"
104 program;
105 * Can be implemented readily in a manner not covered by
106 patents, and hence can be practiced freely;
107 * Is compatible with the file format produced by the current
108 widely used gzip utility, in that conforming decompressors
109 will be able to read data produced by the existing gzip
110 compressor.
111
121 The data format defined by this specification does not attempt to:
122
123 * Provide random access to compressed data;
124 * Compress specialized data (e.g., raster graphics) as well as
125 the best currently available specialized algorithms.
126
127 1.2. Intended audience
128
129 This specification is intended for use by implementors of software
130 to compress data into gzip format and/or decompress data from gzip
131 format.
132
133 The text of the specification assumes a basic background in
134 programming at the level of bits and other primitive data
135 representations.
136
137 1.3. Scope
138
139 The specification specifies a compression method and a file format
140 (the latter assuming only that a file can store a sequence of
141 arbitrary bytes). It does not specify any particular interface to
142 a file system or anything about character sets or encodings
143 (except for file names and comments, which are optional).
144
145 1.4. Compliance
146
147 Unless otherwise indicated below, a compliant decompressor must be
148 able to accept and decompress any file that conforms to all the
149 specifications presented here; a compliant compressor must produce
150 files that conform to all the specifications presented here. The
151 material in the appendices is not part of the specification per se
152 and is not relevant to compliance.
153
154 1.5. Definitions of terms and conventions used
155
156 byte: 8 bits stored or transmitted as a unit (same as an octet).
157 (For this specification, a byte is exactly 8 bits, even on
158 machines which store a character on a number of bits different
159 from 8.) See below for the numbering of bits within a byte.
160
161 1.6. Changes from previous versions
162
163 There have been no technical changes to the gzip format since
164 version 4.1 of this specification. In version 4.2, some
165 terminology was changed, and the sample CRC code was rewritten for
166 clarity and to eliminate the requirement for the caller to do pre-
167 and post-conditioning. Version 4.3 is a conversion of the
168 specification to RFC style.
169
178 2. Detailed specification
179
180 2.1. Overall conventions
181
182 In the diagrams below, a box like this:
183
184 +---+
185 |   | <-- the vertical bars might be missing
186 +---+
187
188 represents one byte; a box like this:
189
190 +==============+
191 |              |
192 +==============+
193
194 represents a variable number of bytes.
195
196 Bytes stored within a computer do not have a "bit order", since
197 they are always treated as a unit. However, a byte considered as
198 an integer between 0 and 255 does have a most- and least-
199 significant bit, and since we write numbers with the most-
200 significant digit on the left, we also write bytes with the most-
201 significant bit on the left. In the diagrams below, we number the
202 bits of a byte so that bit 0 is the least-significant bit, i.e.,
203 the bits are numbered:
204
205 +--------+
206 |76543210|
207 +--------+
208
209 This document does not address the issue of the order in which
210 bits of a byte are transmitted on a bit-sequential medium, since
211 the data format described here is byte- rather than bit-oriented.
212
213 Within a computer, a number may occupy multiple bytes. All
214 multi-byte numbers in the format described here are stored with
215 the least-significant byte first (at the lower memory address).
216 For example, the decimal number 520 is stored as:
217
218     0        1
219 +--------+--------+
220 |00001000|00000010|
221 +--------+--------+
222  ^        ^
223  |        |
224  |        + more significant byte = 2 x 256
225  + less significant byte = 8
226
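As an illustration of the byte order just described, the following C sketch reads 16-bit and 32-bit numbers stored least-significant byte first. The helper names read_le16 and read_le32 are hypothetical and are not part of the specification; this is one possible way to do it, not a required implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 16-bit number stored least-significant byte first. */
static uint16_t read_le16(const unsigned char *p)
{
    assert(p != NULL);
    return (uint16_t)(p[0] | ((unsigned)p[1] << 8));
}

/* Read a 32-bit number stored least-significant byte first. */
static uint32_t read_le32(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

With the two bytes from the diagram (0x08 followed by 0x02), read_le16 yields 8 + 2 x 256 = 520. Assembling the value from individual bytes keeps the result independent of the byte order of the host CPU.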
235 2.2. File format
236
237 A gzip file consists of a series of "members" (compressed data
238 sets). The format of each member is specified in the following
239 section. The members simply appear one after another in the file,
240 with no additional information before, between, or after them.
241
242 2.3. Member format
243
244 Each member has the following structure:
245
246 +---+---+---+---+---+---+---+---+---+---+
247 |ID1|ID2|CM |FLG|     MTIME     |XFL|OS | (more-->)
248 +---+---+---+---+---+---+---+---+---+---+
249
250 (if FLG.FEXTRA set)
251
252 +---+---+=================================+
253 | XLEN  |...XLEN bytes of "extra field"...| (more-->)
254 +---+---+=================================+
255
256 (if FLG.FNAME set)
257
258 +=========================================+
259 |...original file name, zero-terminated...| (more-->)
260 +=========================================+
261
262 (if FLG.FCOMMENT set)
263
264 +===================================+
265 |...file comment, zero-terminated...| (more-->)
266 +===================================+
267
268 (if FLG.FHCRC set)
269
270 +---+---+
271 | CRC16 |
272 +---+---+
273
274 +=======================+
275 |...compressed blocks...| (more-->)
276 +=======================+
277
278   0   1   2   3   4   5   6   7
279 +---+---+---+---+---+---+---+---+
280 |     CRC32     |     ISIZE     |
281 +---+---+---+---+---+---+---+---+
282
292 2.3.1. Member header and trailer
293
294 ID1 (IDentification 1)
295 ID2 (IDentification 2)
296 These have the fixed values ID1 = 31 (0x1f, \037), ID2 = 139
297 (0x8b, \213), to identify the file as being in gzip format.
298
299 CM (Compression Method)
300 This identifies the compression method used in the file. CM
301 = 0-7 are reserved. CM = 8 denotes the "deflate"
302 compression method, which is the one customarily used by
303 gzip and which is documented elsewhere.
304
305 FLG (FLaGs)
306 This flag byte is divided into individual bits as follows:
307
308 bit 0 FTEXT
309 bit 1 FHCRC
310 bit 2 FEXTRA
311 bit 3 FNAME
312 bit 4 FCOMMENT
313 bit 5 reserved
314 bit 6 reserved
315 bit 7 reserved
316
317 If FTEXT is set, the file is probably ASCII text. This is
318 an optional indication, which the compressor may set by
319 checking a small amount of the input data to see whether any
320 non-ASCII characters are present. In case of doubt, FTEXT
321 is cleared, indicating binary data. For systems which have
322 different file formats for ASCII text and binary data, the
323 decompressor can use FTEXT to choose the appropriate format.
324 We deliberately do not specify the algorithm used to set
325 this bit, since a compressor always has the option of
326 leaving it cleared and a decompressor always has the option
327 of ignoring it and letting some other program handle issues
328 of data conversion.
329
330 If FHCRC is set, a CRC16 for the gzip header is present,
331 immediately before the compressed data. The CRC16 consists
332 of the two least significant bytes of the CRC32 for all
333 bytes of the gzip header up to and not including the CRC16.
334 [The FHCRC bit was never set by versions of gzip up to
335 1.2.4, even though it was documented with a different
336 meaning in gzip 1.2.4.]
337
338 If FEXTRA is set, optional extra fields are present, as
339 described in a following section.
340
349 If FNAME is set, an original file name is present,
350 terminated by a zero byte. The name must consist of ISO
351 8859-1 (LATIN-1) characters; on operating systems using
352 EBCDIC or any other character set for file names, the name
353 must be translated to the ISO LATIN-1 character set. This
354 is the original name of the file being compressed, with any
355 directory components removed, and, if the file being
356 compressed is on a file system with case insensitive names,
357 forced to lower case. There is no original file name if the
358 data was compressed from a source other than a named file;
359 for example, if the source was stdin on a Unix system, there
360 is no file name.
361
362 If FCOMMENT is set, a zero-terminated file comment is
363 present. This comment is not interpreted; it is only
364 intended for human consumption. The comment must consist of
365 ISO 8859-1 (LATIN-1) characters. Line breaks should be
366 denoted by a single line feed character (10 decimal).
367
368 Reserved FLG bits must be zero.
369
370 MTIME (Modification TIME)
371 This gives the most recent modification time of the original
372 file being compressed. The time is in Unix format, i.e.,
373 seconds since 00:00:00 GMT, Jan. 1, 1970. (Note that this
374 may cause problems for MS-DOS and other systems that use
375 local rather than Universal time.) If the compressed data
376 did not come from a file, MTIME is set to the time at which
377 compression started. MTIME = 0 means no time stamp is
378 available.
379
380 XFL (eXtra FLags)
381 These flags are available for use by specific compression
382 methods. The "deflate" method (CM = 8) sets these flags as
383 follows:
384
385 XFL = 2 - compressor used maximum compression,
386 slowest algorithm
387 XFL = 4 - compressor used fastest algorithm
388
389 OS (Operating System)
390 This identifies the type of file system on which compression
391 took place. This may be useful in determining end-of-line
392 convention for text files. The currently defined values are
393 as follows:
394
406 0 - FAT filesystem (MS-DOS, OS/2, NT/Win32)
407 1 - Amiga
408 2 - VMS (or OpenVMS)
409 3 - Unix
410 4 - VM/CMS
411 5 - Atari TOS
412 6 - HPFS filesystem (OS/2, NT)
413 7 - Macintosh
414 8 - Z-System
415 9 - CP/M
416 10 - TOPS-20
417 11 - NTFS filesystem (NT)
418 12 - QDOS
419 13 - Acorn RISCOS
420 255 - unknown
421
422 XLEN (eXtra LENgth)
423 If FLG.FEXTRA is set, this gives the length of the optional
424 extra field. See below for details.
425
426 CRC32 (CRC-32)
427 This contains a Cyclic Redundancy Check value of the
428 uncompressed data computed according to CRC-32 algorithm
429 used in the ISO 3309 standard and in section 8.1.1.6.2 of
430 ITU-T recommendation V.42. (See http://www.iso.ch for
431 ordering ISO documents. See gopher://info.itu.ch for an
432 online version of ITU-T V.42.)
433
434 ISIZE (Input SIZE)
435 This contains the size of the original (uncompressed) input
436 data modulo 2^32.
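The fixed ten-byte part of the member header described above could be decoded along the following lines. This is a sketch under stated assumptions: the struct gz_fixed_header, the GZ_* constants, and the function gz_decode_fixed are hypothetical names introduced here for illustration, and only the field layout itself comes from the specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FLG bit masks, following the bit numbering in the FLG description. */
enum {
    GZ_FTEXT    = 1 << 0,
    GZ_FHCRC    = 1 << 1,
    GZ_FEXTRA   = 1 << 2,
    GZ_FNAME    = 1 << 3,
    GZ_FCOMMENT = 1 << 4,
    GZ_RESERVED = 0xE0      /* bits 5..7, which must be zero */
};

/* Hypothetical holder for the ten fixed header bytes. */
struct gz_fixed_header {
    uint8_t  id1, id2, cm, flg;
    uint32_t mtime;         /* seconds since the Unix epoch, 0 = none */
    uint8_t  xfl, os;
};

/* Decode the fixed part of a member header.  Returns 0 on success,
   -1 if fewer than ten bytes are available. */
static int gz_decode_fixed(const unsigned char *p, size_t n,
                           struct gz_fixed_header *h)
{
    assert(p != NULL && h != NULL);
    if (n < 10) return -1;
    h->id1 = p[0];
    h->id2 = p[1];
    h->cm  = p[2];
    h->flg = p[3];
    /* MTIME is a four-byte number, least-significant byte first. */
    h->mtime = (uint32_t)p[4] | ((uint32_t)p[5] << 8) |
               ((uint32_t)p[6] << 16) | ((uint32_t)p[7] << 24);
    h->xfl = p[8];
    h->os  = p[9];
    return 0;
}
```

The sketch only copies the fields out; checking ID1, ID2, CM, and the reserved FLG bits is left to the caller, since those checks are a compliance matter rather than a layout matter.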
437
438 2.3.1.1. Extra field
439
440 If the FLG.FEXTRA bit is set, an "extra field" is present in
441 the header, with total length XLEN bytes. It consists of a
442 series of subfields, each of the form:
443
444 +---+---+---+---+==================================+
445 |SI1|SI2|  LEN  |... LEN bytes of subfield data ...|
446 +---+---+---+---+==================================+
447
448 SI1 and SI2 provide a subfield ID, typically two ASCII letters
449 with some mnemonic value. Jean-Loup Gailly
450 <[email protected]> is maintaining a registry of subfield
451 IDs; please send him any subfield ID you wish to use. Subfield
452 IDs with SI2 = 0 are reserved for future use. The following
453 IDs are currently defined:
454
463 SI1         SI2         Data
464 ----------  ----------  ----
465 0x41 ('A')  0x70 ('P')  Apollo file type information
466
467 LEN gives the length of the subfield data, excluding the 4
468 initial bytes.
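The subfield layout above lends itself to a simple walk over the extra field. The following is a minimal sketch, assuming the caller has already located the XLEN bytes of the extra field; the callback type gz_subfield_fn and the function gz_walk_extra are hypothetical names, not anything defined by the specification.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: receives one subfield at a time. */
typedef void (*gz_subfield_fn)(unsigned char si1, unsigned char si2,
                               const unsigned char *data, size_t len,
                               void *ctx);

/* Walk the subfields in an extra field of xlen bytes.  Returns the
   number of subfields visited, or -1 if a subfield overruns xlen. */
static int gz_walk_extra(const unsigned char *extra, size_t xlen,
                         gz_subfield_fn fn, void *ctx)
{
    size_t pos = 0;
    int count = 0;

    assert(extra != NULL || xlen == 0);
    while (pos < xlen) {
        size_t len;
        if (xlen - pos < 4) return -1;        /* truncated SI1/SI2/LEN */
        /* LEN is two bytes, least-significant byte first. */
        len = (size_t)extra[pos + 2] | ((size_t)extra[pos + 3] << 8);
        if (len > xlen - pos - 4) return -1;  /* data runs past XLEN */
        if (fn) fn(extra[pos], extra[pos + 1], extra + pos + 4, len, ctx);
        pos += 4 + len;
        count++;
    }
    return count;
}
```

Because LEN excludes the four initial bytes, each step advances by 4 + LEN. Treating an overrun as an error is a design choice of this sketch; the specification does not say how a decompressor should react to a malformed extra field.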
469
470 2.3.1.2. Compliance
471
472 A compliant compressor must produce files with correct ID1,
473 ID2, CM, CRC32, and ISIZE, but may set all the other fields in
474 the fixed-length part of the header to default values (255 for
475 OS, 0 for all others). The compressor must set all reserved
476 bits to zero.
477
478 A compliant decompressor must check ID1, ID2, and CM, and
479 provide an error indication if any of these have incorrect
480 values. It must examine FEXTRA/XLEN, FNAME, FCOMMENT and FHCRC
481 at least so it can skip over the optional fields if they are
482 present. It need not examine any other part of the header or
483 trailer; in particular, a decompressor may ignore FTEXT and OS
484 and always produce binary output, and still be compliant. A
485 compliant decompressor must give an error indication if any
486 reserved bit is non-zero, since such a bit could indicate the
487 presence of a new field that would cause subsequent data to be
488 interpreted incorrectly.
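The decompressor requirements above can be sketched as a single routine that validates the fixed header and skips the optional fields. The function name gz_data_offset and its convention of returning 0 on error are assumptions made for this example; the sketch also assumes that CM = 8 is the only acceptable compression method, since that is the only one this document defines.

```c
#include <assert.h>
#include <stddef.h>

/* Return the offset of the first compressed byte in the member that
   starts at p, or 0 if the header is invalid or truncated. */
static size_t gz_data_offset(const unsigned char *p, size_t n)
{
    size_t pos = 10;
    unsigned flg;

    assert(p != NULL);
    if (n < 10) return 0;
    if (p[0] != 0x1f || p[1] != 0x8b) return 0;  /* ID1, ID2 */
    if (p[2] != 8) return 0;                     /* CM: deflate only */
    flg = p[3];
    if (flg & 0xE0) return 0;                    /* reserved bits set */

    if (flg & 0x04) {                            /* FEXTRA: XLEN + data */
        size_t xlen;
        if (n - pos < 2) return 0;
        xlen = (size_t)p[pos] | ((size_t)p[pos + 1] << 8);
        pos += 2;
        if (n - pos < xlen) return 0;
        pos += xlen;
    }
    if (flg & 0x08) {                            /* FNAME: zero-terminated */
        while (pos < n && p[pos] != 0) pos++;
        if (pos == n) return 0;
        pos++;
    }
    if (flg & 0x10) {                            /* FCOMMENT: zero-terminated */
        while (pos < n && p[pos] != 0) pos++;
        if (pos == n) return 0;
        pos++;
    }
    if (flg & 0x02) {                            /* FHCRC: two-byte CRC16 */
        if (n - pos < 2) return 0;
        pos += 2;
    }
    return pos;
}
```

A return value of 0 is safe as an error marker because a valid header is always at least ten bytes long. The optional fields are skipped in the order in which the member format lays them out: extra field, file name, comment, header CRC.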
489
490 3. References
491
492 [1] "Information Processing - 8-bit single-byte coded graphic
493 character sets - Part 1: Latin alphabet No.1" (ISO 8859-1:1987).
494 The ISO 8859-1 (Latin-1) character set is a superset of 7-bit
495 ASCII. Files defining this character set are available as
496 iso_8859-1.* in ftp://ftp.uu.net/graphics/png/documents/
497
498 [2] ISO 3309
499
500 [3] ITU-T recommendation V.42
501
502 [4] Deutsch, L.P.,"DEFLATE Compressed Data Format Specification",
503 available in ftp://ftp.uu.net/pub/archiving/zip/doc/
504
505 [5] Gailly, J.-L., GZIP documentation, available as gzip-*.tar in
506 ftp://prep.ai.mit.edu/pub/gnu/
507
508 [6] Sarwate, D.V., "Computation of Cyclic Redundancy Checks via Table
509 Look-Up", Communications of the ACM, 31(8), pp.1008-1013.
510
520 [7] Schwaderer, W.D., "CRC Calculation", April 85 PC Tech Journal,
521 pp.118-133.
522
523 [8] ftp://ftp.adelaide.edu.au/pub/rocksoft/papers/crc_v3.txt,
524 describing the CRC concept.
525
526 4. Security Considerations
527
528 Any data compression method involves the reduction of redundancy in
529 the data. Consequently, any corruption of the data is likely to have
530 severe effects and be difficult to correct. Uncompressed text, on
531 the other hand, will probably still be readable despite the presence
532 of some corrupted bytes.
533
534 It is recommended that systems using this data format provide some
535 means of validating the integrity of the compressed data, such as by
536 setting and checking the CRC-32 check value.
537
538 5. Acknowledgements
539
540 Trademarks cited in this document are the property of their
541 respective owners.
542
543 Jean-Loup Gailly designed the gzip format and wrote, with Mark Adler,
544 the related software described in this specification. Glenn
545 Randers-Pehrson converted this document to RFC and HTML format.
546
547 6. Author's Address
548
549 L. Peter Deutsch
550 Aladdin Enterprises
551 203 Santa Margarita Ave.
552 Menlo Park, CA 94025
553
554 Phone: (415) 322-0103 (AM only)
555 FAX: (415) 322-1734
556 EMail: <[email protected]>
557
558 Questions about the technical content of this specification can be
559 sent by email to:
560
561 Jean-Loup Gailly <[email protected]> and
562 Mark Adler <[email protected]>
563
564 Editorial comments on this specification can be sent by email to:
565
566 L. Peter Deutsch <[email protected]> and
567 Glenn Randers-Pehrson <[email protected]>
568
577 7. Appendix: Jean-Loup Gailly's gzip utility
578
579 The most widely used implementation of gzip compression, and the
580 original documentation on which this specification is based, were
581 created by Jean-Loup Gailly <[email protected]>. Since this
582 implementation is a de facto standard, we mention some more of its
583 features here. Again, the material in this section is not part of
584 the specification per se, and implementations need not follow it to
585 be compliant.
586
587 When compressing or decompressing a file, gzip preserves the
588 protection, ownership, and modification time attributes on the local
589 file system, since there is no provision for representing protection
590 attributes in the gzip file format itself. Since the file format
591 includes a modification time, the gzip decompressor provides a
592 command line switch that assigns the modification time from the file,
593 rather than the local modification time of the compressed input, to
594 the decompressed output.
595
596 8. Appendix: Sample CRC Code
597
598 The following sample code represents a practical implementation of
599 the CRC (Cyclic Redundancy Check). (See also ISO 3309 and ITU-T V.42
600 for a formal specification.)
601
602 The sample code is in the ANSI C programming language. Non-C users
603 may find it easier to read with these hints:
604
605 & Bitwise AND operator.
606 ^ Bitwise exclusive-OR operator.
607 >> Bitwise right shift operator. When applied to an
608 unsigned quantity, as here, right shift inserts zero
609 bit(s) at the left.
610 ! Logical NOT operator.
611 ++ "n++" increments the variable n.
612 0xNNN 0x introduces a hexadecimal (base 16) constant.
613 Suffix L indicates a long value (at least 32 bits).
614
/* Table of CRCs of all 8-bit messages. */
unsigned long crc_table[256];

/* Flag: has the table been computed? Initially false. */
int crc_table_computed = 0;

/* Make the table for a fast CRC. */
void make_crc_table(void)
{
  unsigned long c;
  int n, k;

  for (n = 0; n < 256; n++) {
    c = (unsigned long) n;
    for (k = 0; k < 8; k++) {
      if (c & 1) {
        c = 0xedb88320L ^ (c >> 1);
      } else {
        c = c >> 1;
      }
    }
    crc_table[n] = c;
  }
  crc_table_computed = 1;
}

/*
   Update a running crc with the bytes buf[0..len-1] and return
   the updated crc. The crc should be initialized to zero. Pre- and
   post-conditioning (one's complement) is performed within this
   function so it shouldn't be done by the caller. Usage example:

     unsigned long crc = 0L;

     while (read_buffer(buffer, length) != EOF) {
       crc = update_crc(crc, buffer, length);
     }
     if (crc != original_crc) error();
*/
unsigned long update_crc(unsigned long crc,
                         unsigned char *buf, int len)
{
  unsigned long c = crc ^ 0xffffffffL;
  int n;

  if (!crc_table_computed)
    make_crc_table();
  for (n = 0; n < len; n++) {
    c = crc_table[(c ^ buf[n]) & 0xff] ^ (c >> 8);
  }
  return c ^ 0xffffffffL;
}

/* Return the CRC of the bytes buf[0..len-1]. */
unsigned long crc(unsigned char *buf, int len)
{
  return update_crc(0L, buf, len);
}
681
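As a cross-check on the table-driven sample code above, the same CRC can be computed one bit at a time with no table. The function name crc32_bitwise is hypothetical; the polynomial constant and the pre- and post-conditioning are taken from the sample code. This variant is slower and is offered only as a sketch for testing, not as a replacement.

```c
#include <assert.h>
#include <stddef.h>

/* Bit-at-a-time CRC-32 using the same polynomial constant
   (0xedb88320) and the same one's-complement pre- and
   post-conditioning as the table-driven sample code. */
static unsigned long crc32_bitwise(const unsigned char *buf, size_t len)
{
    unsigned long c = 0xffffffffUL;
    size_t n;
    int k;

    assert(buf != NULL || len == 0);
    for (n = 0; n < len; n++) {
        c ^= buf[n];
        for (k = 0; k < 8; k++) {
            if (c & 1) {
                c = 0xedb88320UL ^ (c >> 1);
            } else {
                c = c >> 1;
            }
        }
    }
    return (c ^ 0xffffffffUL) & 0xffffffffUL;
}
```

A common sanity test is the nine ASCII bytes "123456789", for which this CRC is 0xcbf43926; an empty input gives 0. An implementation of the table-driven code that disagrees with these values most likely has a byte-order or conditioning mistake.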
D compat/zlib/doc/txtvsbin.txt
-103
--- a/compat/zlib/doc/txtvsbin.txt
+++ b/compat/zlib/doc/txtvsbin.txt
@@ -1,107 +0,0 @@
1 A Fast Method for Identifying Plain Text Files
2 ==============================================
3
4
5 Introduction
-------------
6
7 Given a file coming from an unknown source, it is sometimes desirable
8 to find out whether the format of that file is plain text. Although
9 this may appear like a simple task, a fully accurate detection of the
10 file type requires heavy-duty semantic analysis on the file contents.
11 It is, however, possible to obtain satisfactory results by employing
12 various heuristics.
13
14 Previous versions of PKZip and other zip-compatible compression tools
15 were using a crude detection scheme: if more than 80% (4/5) of the bytes
16 found in a certain buffer are within the range [7..127], the file is
17 labeled as plain text, otherwise it is labeled as binary. A prominent
18 limitation of this scheme is the restriction to Latin-based alphabets.
19 Other alphabets, like Greek, Cyrillic or Asian, make extensive use of
20 the bytes within the range [128..255], and texts using these alphabets
21 are most often misidentified by this scheme; in other words, the rate
22 of false negatives is sometimes too high, which means that the recall
23 is low. Another weakness of this scheme is a reduced precision, due to
24 the false positives that may occur when binary files containing large
25 amounts of textual characters are misidentified as plain text.
26
27 In this article we propose a new, simple detection scheme that features
28 a much increased precision and a near-100% recall. This scheme is
29 designed to work on ASCII, Unicode and other ASCII-derived alphabets,
30 and it handles single-byte encodings (ISO-8859, MacRoman, KOI8, etc.)
31 and variable-sized encodings (ISO-2022, UTF-8, etc.). Wider encodings
32 (UCS-2/UTF-16 and UCS-4/UTF-32) are not handled, however.
33
34
35 The Algorithm
--------------
36
37 The algorithm works by dividing the set of bytecodes [0..255] into three
38 categories:
39 - The white list of textual bytecodes:
40 9 (TAB), 10 (LF), 13 (CR), 32 (SPACE) to 255.
41 - The gray list of tolerated bytecodes:
42 7 (BEL), 8 (BS), 11 (VT), 12 (FF), 26 (SUB), 27 (ESC).
43 - The black list of undesired, non-textual bytecodes:
44 0 (NUL) to 6, 14 to 31.
45
46 If a file contains at least one byte that belongs to the white list and
47 no byte that belongs to the black list, then the file is categorized as
48 plain text; otherwise, it is categorized as binary. (The boundary case,
49 when the file is empty, automatically falls into the latter category.)
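The classification rule above can be sketched in C as follows. The function name looks_like_text is hypothetical. One assumption is labeled in the code: the black list is stated as "14 to 31", which overlaps the gray-list codes 26 (SUB) and 27 (ESC), so this sketch lets the gray list take precedence and treats 0..6, 14..25 and 28..31 as black.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if buf looks like plain text under the white/gray/black
   list rule, 0 if it should be treated as binary. */
static int looks_like_text(const unsigned char *buf, size_t len)
{
    int seen_white = 0;
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++) {
        unsigned char b = buf[i];
        if (b == 9 || b == 10 || b == 13 || b >= 32) {
            seen_white = 1;     /* white list: TAB, LF, CR, 32..255 */
        } else if (b == 7 || b == 8 || b == 11 || b == 12 ||
                   b == 26 || b == 27) {
            /* gray list: tolerated, but does not count as text
               (assumption: gray wins over the "14 to 31" range) */
        } else {
            return 0;           /* black list: 0..6, 14..25, 28..31 */
        }
    }
    return seen_white;          /* empty or gray-only input is binary */
}
```

Since the rule depends only on the presence or absence of byte values, the function can stop at the first black-listed byte, and no counting is needed. An empty buffer returns 0, matching the boundary case described above.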
50
51
52 Rationale
----------
53
54 The idea behind this algorithm relies on two observations.
55
56 The first observation is that, although the full range of 7-bit codes
57 [0..127] is properly specified by the ASCII standard, most control
58 characters in the range [0..31] are not used in practice. The only
59 widely-used, almost universally-portable control codes are 9 (TAB),
60 10 (LF) and 13 (CR). There are a few more control codes that are
61 recognized on a reduced range of platforms and text viewers/editors:
62 7 (BEL), 8 (BS), 11 (VT), 12 (FF), 26 (SUB) and 27 (ESC); but these
63 codes are rarely (if ever) used alone, without being accompanied by
64 some printable text. Even the newer, portable text formats such as
65 XML avoid using control characters outside the list mentioned here.
66
67 The second observation is that most of the binary files tend to contain
68 control characters, especially 0 (NUL). Even though the older text
69 detection schemes observe the presence of non-ASCII codes from the range
70 [128..255], the precision rarely has to suffer if this upper range is
71 labeled as textual, because the files that are genuinely binary tend to
72 contain both control characters and codes from the upper range. On the
73 other hand, the upper range needs to be labeled as textual, because it
74 is used by virtually all ASCII extensions. In particular, this range is
75 used for encoding non-Latin scripts.
76
77 Since there is no counting involved, other than simply observing the
78 presence or the absence of some byte values, the algorithm produces
79 consistent results, regardless of what alphabet encoding is being used.
80 (If counting were involved, it could be possible to obtain different
81 results on a text encoded, say, using ISO-8859-16 versus UTF-8.)
82
83 There is an extra category of plain text files that are "polluted" with
84 one or more black-listed codes, either by mistake or by peculiar design
85 considerations. In such cases, a scheme that tolerates a small fraction
86 of black-listed codes would provide an increased recall (i.e. more true
87 positives). This, however, incurs a reduced precision overall, since
88 false positives are more likely to appear in binary files that contain
89 large chunks of textual data. Furthermore, "polluted" plain text should
90 be regarded as binary by general-purpose text detection schemes, because
91 general-purpose text processing algorithms might not be applicable.
92 Under this premise, it is safe to say that our detection method provides
93 a near-100% recall.
94
95 Experiments have been run on many files coming from various platforms
96 and applications. We tried plain text files, system logs, source code,
97 formatted office documents, compiled object code, etc. The results
98 confirm the optimistic assumptions about the capabilities of this
99 algorithm.
100
101
---
102 Cosmin Truta
103 Last updated: 2006-May-28
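The classification rule that this deleted document describes can be sketched in C. This is only an illustration, not zlib's code: the function name `looks_like_plain_text` is hypothetical, and the byte classes are an assumption based on the Rationale above (TAB, LF, CR and every code from 32 upward count as textual; BEL, BS, VT, FF, SUB and ESC are tolerated; the remaining control codes are black-listed).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if the buffer looks like plain text,
** 0 if it looks binary.  An empty buffer is classified as binary. */
int looks_like_plain_text(const unsigned char *buf, size_t n){
  int seenWhite = 0;   /* set once a white-listed byte has been seen */
  size_t i;
  for(i=0; i<n; i++){
    unsigned char c = buf[i];
    if( c==9 || c==10 || c==13 || c>=32 ){
      seenWhite = 1;   /* white list: TAB, LF, CR, 32..255 */
    }else if( c==7 || c==8 || c==11 || c==12 || c==26 || c==27 ){
      /* gray list: tolerated, but does not count as text by itself */
    }else{
      return 0;        /* black list: any other control code, e.g. NUL */
    }
  }
  return seenWhite;
}
```

Because the function only records presence or absence of byte classes and never counts them, the result does not depend on how often a given byte occurs.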
--- src/clone.c
+++ src/clone.c
@@ -175,10 +175,11 @@
     db_initial_setup(0, 0, zDefaultUser);
     user_select();
     db_set("content-schema", CONTENT_SCHEMA, 0);
     db_set("aux-schema", AUX_SCHEMA_MAX, 0);
     db_set("rebuilt", get_version(), 0);
+    db_unset("hash-policy", 0);
     remember_or_get_http_auth(zHttpAuth, urlFlags & URL_REMEMBER, g.argv[2]);
     url_remember();
     if( g.zSSLIdentity!=0 ){
       /* If the --ssl-identity option was specified, store it as a setting */
       Blob fn;
--- src/configure.c
+++ src/configure.c
@@ -129,10 +129,11 @@
   { "empty-dirs", CONFIGSET_PROJ },
   { "allow-symlinks", CONFIGSET_PROJ },
   { "dotfiles", CONFIGSET_PROJ },
   { "parent-project-code", CONFIGSET_PROJ },
   { "parent-project-name", CONFIGSET_PROJ },
+  { "hash-policy", CONFIGSET_PROJ },
 
 #ifdef FOSSIL_ENABLE_LEGACY_MV_RM
   { "mv-rm-files", CONFIGSET_PROJ },
 #endif
 
--- src/content.c
+++ src/content.c
@@ -528,10 +528,14 @@
       blob_reset(&hash);
       hname_hash(pBlob, 0, &hash);
     }
   }else{
     blob_init(&hash, zUuid, -1);
+  }
+  if( g.eHashPolicy==HPOLICY_AUTO && blob_size(&hash)>HNAME_LEN_SHA1 ){
+    g.eHashPolicy = HPOLICY_SHA3;
+    db_set_int("hash-policy", HPOLICY_SHA3, 0);
   }
   if( nBlob ){
     size = nBlob;
   }else{
     size = blob_size(pBlob);
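The auto-promotion rule added to src/content.c can be illustrated in isolation. The sketch below is not Fossil code: `policy_after_hash` and the `POLICY_*` and `SHA1_HEX_LEN` names are hypothetical stand-ins, and it assumes that a SHA1 artifact name is 40 hexadecimal digits long while anything longer is a SHA3 name.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the HPOLICY_* values in the diff */
enum { POLICY_SHA1 = 0, POLICY_AUTO = 1, POLICY_SHA3 = 2 };

#define SHA1_HEX_LEN 40   /* assumed length of a SHA1 name in hex digits */

/* Return the policy in effect after an artifact named zHash is stored.
** Only the "auto" policy is promoted, and only by a longer-than-SHA1 name. */
int policy_after_hash(int ePolicy, const char *zHash){
  if( ePolicy==POLICY_AUTO && strlen(zHash)>SHA1_HEX_LEN ){
    return POLICY_SHA3;
  }
  return ePolicy;
}
```

The promotion is one-way in this sketch: once the policy is "sha3", later SHA1-length names leave it unchanged.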
+17 -3
--- src/db.c
+++ src/db.c
@@ -1485,10 +1485,15 @@
   g.repositoryOpen = 1;
   /* Cache "allow-symlinks" option, because we'll need it on every stat call */
   g.allowSymlinks = db_get_boolean("allow-symlinks",
                                    db_allow_symlinks_by_default());
   g.zAuxSchema = db_get("aux-schema","");
+  g.eHashPolicy = db_get_int("hash-policy",-1);
+  if( g.eHashPolicy<0 ){
+    g.eHashPolicy = hname_default_policy();
+    db_set_int("hash-policy", g.eHashPolicy, 0);
+  }
 
   /* If the ALIAS table is not present, then some on-the-fly schema
   ** updates might be required.
   */
   rebuild_schema_update_2_0(); /* Do the Fossil-2.0 schema updates */
@@ -1828,10 +1833,11 @@
       " AND name NOT GLOB 'project-*'"
       " AND name NOT GLOB 'short-project-*';",
       configure_inop_rhs(CONFIGSET_ALL),
       db_setting_inop_rhs()
     );
+    g.eHashPolicy = db_get_int("hash-policy", g.eHashPolicy);
     db_multi_exec(
       "REPLACE INTO reportfmt SELECT * FROM settingSrc.reportfmt;"
     );
 
     /*
@@ -1900,13 +1906,14 @@
 ** their associated permissions will not be copied; however, the system
 ** default users "anonymous", "nobody", "reader", "developer", and their
 ** associated permissions will be copied.
 **
 ** Options:
-**    --template      FILE         copy settings from repository file
-**    --admin-user|-A USERNAME     select given USERNAME as admin user
-**    --date-override DATETIME     use DATETIME as time of the initial check-in
+**    --template      FILE         Copy settings from repository file
+**    --admin-user|-A USERNAME     Select given USERNAME as admin user
+**    --date-override DATETIME     Use DATETIME as time of the initial check-in
+**    --sha1                       Use a initial hash policy of "sha1"
 **
 ** DATETIME may be "now" or "YYYY-MM-DDTHH:MM:SS.SSS". If in
 ** year-month-day form, it may be truncated, the "T" may be replaced by
 ** a space, and it may also name a timezone offset from UTC as "-HH:MM"
 ** (westward) or "+HH:MM" (eastward). Either no timezone suffix or "Z"
@@ -1917,14 +1924,17 @@
 void create_repository_cmd(void){
   char *zPassword;
   const char *zTemplate;      /* Repository from which to copy settings */
   const char *zDate;          /* Date of the initial check-in */
   const char *zDefaultUser;   /* Optional name of the default user */
+  int bUseSha1 = 0;           /* True to set the hash-policy to sha1 */
+
 
   zTemplate = find_option("template",0,1);
   zDate = find_option("date-override",0,1);
   zDefaultUser = find_option("admin-user","A",1);
+  bUseSha1 = find_option("sha1",0,0)!=0;
   /* We should be done with options.. */
   verify_all_options();
 
   if( g.argc!=3 ){
     usage("REPOSITORY-NAME");
@@ -1937,10 +1947,14 @@
   db_create_repository(g.argv[2]);
   db_open_repository(g.argv[2]);
   db_open_config(0, 0);
   if( zTemplate ) db_attach(zTemplate, "settingSrc");
   db_begin_transaction();
+  if( bUseSha1 ){
+    g.eHashPolicy = HPOLICY_SHA1;
+    db_set_int("hash-policy", HPOLICY_SHA1, 0);
+  }
   if( zDate==0 ) zDate = "now";
   db_initial_setup(zTemplate, zDate, zDefaultUser);
   db_end_transaction(0);
   if( zTemplate ) db_detach("settingSrc");
   fossil_print("project-id: %s\n", db_get("project-code", 0));
+21 -7
--- src/diffcmd.c
+++ src/diffcmd.c
@@ -151,10 +151,13 @@
 /*
 ** Show the difference between two files, one in memory and one on disk.
 **
 ** The difference is the set of edits needed to transform pFile1 into
 ** zFile2. The content of pFile1 is in memory. zFile2 exists on disk.
+**
+** If fSwapDiff is 1, show the set of edits to transform zFile2 into pFile1
+** instead of the opposite.
 **
 ** Use the internal diff logic if zDiffCmd is NULL. Otherwise call the
 ** command zDiffCmd to do the diffing.
 **
 ** When using an external diff program, zBinGlob contains the GLOB patterns
@@ -167,11 +170,12 @@
   const char *zFile2,       /* On disk content to compare to */
   const char *zName,        /* Display name of the file */
   const char *zDiffCmd,     /* Command for comparison */
   const char *zBinGlob,     /* Treat file names matching this as binary */
   int fIncludeBinary,       /* Include binary files for external diff */
-  u64 diffFlags             /* Flags to control the diff */
+  u64 diffFlags,            /* Flags to control the diff */
+  int fSwapDiff             /* Diff from Zfile2 to Pfile1 */
 ){
   if( zDiffCmd==0 ){
     Blob out;                 /* Diff output text */
     Blob file2;               /* Content of zFile2 */
     const char *zName2;       /* Name of zFile2 for display */
@@ -194,11 +198,15 @@
       if( blob_compare(pFile1, &file2) ){
         fossil_print("CHANGED %s\n", zName);
       }
     }else{
       blob_zero(&out);
-      text_diff(pFile1, &file2, &out, 0, diffFlags);
+      if( fSwapDiff ){
+        text_diff(&file2, pFile1, &out, 0, diffFlags);
+      }else{
+        text_diff(pFile1, &file2, &out, 0, diffFlags);
+      }
       if( blob_size(&out) ){
         diff_print_filenames(zName, zName2, diffFlags);
         fossil_print("%s\n", blob_str(&out));
       }
       blob_reset(&out);
@@ -252,13 +260,19 @@
     blob_write_to_file(pFile1, blob_str(&nameFile1));
 
     /* Construct the external diff command */
     blob_zero(&cmd);
     blob_appendf(&cmd, "%s ", zDiffCmd);
-    shell_escape(&cmd, blob_str(&nameFile1));
-    blob_append(&cmd, " ", 1);
-    shell_escape(&cmd, zFile2);
+    if( fSwapDiff ){
+      shell_escape(&cmd, zFile2);
+      blob_append(&cmd, " ", 1);
+      shell_escape(&cmd, blob_str(&nameFile1));
+    }else{
+      shell_escape(&cmd, blob_str(&nameFile1));
+      blob_append(&cmd, " ", 1);
+      shell_escape(&cmd, zFile2);
+    }
 
     /* Run the external diff command */
     fossil_system(blob_str(&cmd));
 
     /* Delete the temporary file and clean up memory used */
@@ -482,11 +496,11 @@
         blob_zero(&content);
       }
       isBin = fIncludeBinary ? 0 : looks_like_binary(&content);
       diff_print_index(zPathname, diffFlags);
       diff_file(&content, isBin, zFullName, zPathname, zDiffCmd,
-                zBinGlob, fIncludeBinary, diffFlags);
+                zBinGlob, fIncludeBinary, diffFlags, 0);
       blob_reset(&content);
     }
     blob_reset(&fname);
   }
   db_finalize(&q);
@@ -519,11 +533,11 @@
     const char *zFile = (const char*)db_column_text(&q, 0);
     if( !file_dir_match(pFileDir, zFile) ) continue;
     zFullName = mprintf("%s%s", g.zLocalRoot, zFile);
     db_column_blob(&q, 1, &content);
     diff_file(&content, 0, zFullName, zFile,
-              zDiffCmd, zBinGlob, fIncludeBinary, diffFlags);
+              zDiffCmd, zBinGlob, fIncludeBinary, diffFlags, 0);
     fossil_free(zFullName);
     blob_reset(&content);
   }
   db_finalize(&q);
 }
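The effect of fSwapDiff on the external diff command amounts to reversing the order of the two file arguments. A minimal sketch of that idea follows; `build_diff_cmd` is a hypothetical helper, not part of Fossil, and unlike the code in the diff it applies no shell escaping and writes into a fixed-size static buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format "CMD FILE1 FILE2", or "CMD FILE2 FILE1"
** when fSwap is nonzero.  The result lives in a static buffer that is
** overwritten by the next call.  No shell escaping is done here. */
const char *build_diff_cmd(const char *zCmd, const char *zFile1,
                           const char *zFile2, int fSwap){
  static char zOut[256];
  const char *zFrom = fSwap ? zFile2 : zFile1;   /* left-hand side */
  const char *zTo   = fSwap ? zFile1 : zFile2;   /* right-hand side */
  snprintf(zOut, sizeof(zOut), "%s %s %s", zCmd, zFrom, zTo);
  return zOut;
}
```

Both existing callers in the diff pass 0 for the new argument, so their output is unchanged by this commit.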
+1 -1
--- src/doc.c
+++ src/doc.c
@@ -735,11 +735,11 @@
 
   /* Jump here when unable to locate the document */
 doc_not_found:
   db_end_transaction(0);
   if( isUV && P("name")==0 ){
-    uvstat_page();
+    uvlist_page();
     return;
   }
   cgi_set_status(404, "Not Found");
   style_header("Not Found");
   @ <p>Document %h(zOrigName) not found
+90
--- src/encode.c
+++ src/encode.c
@@ -336,10 +336,100 @@
     z[j++] = c;
   }
   if( z[j] ) z[j] = 0;
 }
 
+
+/*
+** The *pz variable points to a UTF8 string. Read the next character
+** off of that string and return its codepoint value. Advance *pz to the
+** next character
+*/
+u32 fossil_utf8_read(
+  const unsigned char **pz    /* Pointer to string from which to read char */
+){
+  unsigned int c;
+
+  /*
+  ** This lookup table is used to help decode the first byte of
+  ** a multi-byte UTF8 character.
+  */
+  static const unsigned char utf8Trans1[] = {
+    0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+    0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
+    0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
+    0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f,
+    0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+    0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
+    0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+    0x00, 0x01, 0x02, 0x03, 0x00, 0x01, 0x00, 0x00,
+  };
+
+  c = *((*pz)++);
+  if( c>=0xc0 ){
+    c = utf8Trans1[c-0xc0];
+    while( (*(*pz) & 0xc0)==0x80 ){
+      c = (c<<6) + (0x3f & *((*pz)++));
+    }
+    if( c<0x80
+        || (c&0xFFFFF800)==0xD800
+        || (c&0xFFFFFFFE)==0xFFFE ){ c = 0xFFFD; }
+  }
+  return c;
+}
+
+/*
+** Encode a UTF8 string for JSON. All special characters are escaped.
+*/
+void blob_append_json_string(Blob *pBlob, const char *zStr){
+  const unsigned char *z;
+  char *zOut;
+  u32 c;
+  int n, i, j;
+  z = (const unsigned char*)zStr;
+  n = 0;
+  while( (c = fossil_utf8_read(&z))!=0 ){
+    if( c=='\\' || c=='"' ){
+      n += 2;
+    }else if( c<' ' || c>=0x7f ){
+      if( c=='\n' || c=='\r' ){
+        n += 2;
+      }else{
+        n += 6;
+      }
+    }else{
+      n++;
+    }
+  }
+  i = blob_size(pBlob);
+  blob_resize(pBlob, i+n);
+  zOut = blob_buffer(pBlob);
+  z = (const unsigned char*)zStr;
+  while( (c = fossil_utf8_read(&z))!=0 ){
+    if( c=='\\' ){
+      zOut[i++] = '\\';
+      zOut[i++] = c;
+    }else if( c<' ' || c>=0x7f ){
+      zOut[i++] = '\\';
+      if( c=='\n' ){
+        zOut[i++] = 'n';
+      }else if( c=='\r' ){
+        zOut[i++] = 'r';
+      }else{
+        zOut[i++] = 'u';
+        for(j=3; j>=0; j--){
+          zOut[i+j] = "0123456789abcdef"[c&0xf];
+          c >>= 4;
+        }
+        i += 4;
+      }
+    }else{
+      zOut[i++] = c;
+    }
+  }
+  zOut[i] = 0;
+}
 
 /*
 ** The characters used for HTTP base64 encoding.
 */
 static unsigned char zBase[] =
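The JSON escaping idea behind blob_append_json_string() can be sketched without the Blob type. The helper below is hypothetical and is not Fossil's implementation: it writes into a fixed static buffer, silently truncates long input, and copies bytes at or above 0x80 through unchanged (raw UTF-8 is permitted inside JSON strings) instead of emitting \uXXXX for them as the diff does. Note also that in the hunk as shown, the sizing pass reserves two bytes for a double quote while the writing pass tests only for a backslash; this sketch escapes both.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return zIn escaped for use as the body of a JSON
** string.  The result is held in a static buffer and is overwritten by
** the next call.  Input that does not fit is truncated. */
const char *json_escape(const char *zIn){
  static char zOut[512];
  size_t i = 0;
  const unsigned char *z = (const unsigned char*)zIn;
  /* Keep 7 bytes free: the longest escape (\uXXXX) plus the terminator */
  for(; *z && i+7<sizeof(zOut); z++){
    unsigned char c = *z;
    if( c=='\\' || c=='"' ){
      zOut[i++] = '\\';
      zOut[i++] = (char)c;
    }else if( c=='\n' ){
      zOut[i++] = '\\';
      zOut[i++] = 'n';
    }else if( c=='\r' ){
      zOut[i++] = '\\';
      zOut[i++] = 'r';
    }else if( c<' ' || c==0x7f ){
      /* Other control codes become a six-character \u00XX escape */
      i += (size_t)snprintf(zOut+i, 7, "\\u%04x", (unsigned)c);
    }else{
      zOut[i++] = (char)c;
    }
  }
  zOut[i] = 0;
  return zOut;
}
```

A single pass suffices here because the output buffer has a fixed size; the two-pass structure in the diff exists so that the Blob can be resized exactly once before writing.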
+137 -20
--- src/hname.c
+++ src/hname.c
@@ -16,11 +16,13 @@
1616
*******************************************************************************
1717
**
1818
** This file contains generic code for dealing with hashes used for
1919
** naming artifacts. Specific hash algorithms are implemented separately
2020
** (for example in sha1.c and sha3.c). This file contains the generic
21
-** interface code.
21
+** interface logic.
22
+**
23
+** "hname" is intended to be an abbreviation of "hash name".
2224
*/
2325
#include "config.h"
2426
#include "hname.h"
2527
2628
@@ -47,10 +49,19 @@
4749
/*
4850
** The number of distinct hash algorithms:
4951
*/
5052
#define HNAME_COUNT 2 /* Just SHA1 and SHA3-256. Let's keep it that way! */
5153
54
+/*
55
+** Hash naming policies
56
+*/
57
+#define HPOLICY_SHA1 0 /* Use SHA1 hashes */
58
+#define HPOLICY_AUTO 1 /* SHA1 but auto-promote to SHA3 */
59
+#define HPOLICY_SHA3 2 /* Use SHA3 hashes */
60
+#define HPOLICY_SHA3_ONLY 3 /* Use SHA3 hashes exclusively */
61
+#define HPOLICY_SHUN_SHA1 4 /* Shun all SHA1 objects */
62
+
5263
#endif /* INTERFACE */
5364
5465
/*
5566
** Return a human-readable name for the hash algorithm given a hash with
5667
** a length of nHash hexadecimal digits.
@@ -142,26 +153,132 @@

 /*
 ** Compute a hash on blob pContent. Write the hash into blob pHashOut.
 ** This routine assumes that pHashOut is uninitialized.
 **
-** The preferred hash is used for iHType==0, and various alternative hashes
-** are used for iHType>0 && iHType<NHAME_COUNT.
+** The preferred hash is used for iHType==0 and the alternative hash is
+** used if iHType==1. (The interface is designed to accommodate more than
+** just two hashes, but HNAME_COUNT is currently fixed at 2.)
+**
+** Depending on the hash policy, the alternative hash may be disallowed.
+** If the alternative hash is disallowed, the routine returns 0. This
+** routine returns 1 if iHType>0 and the alternative hash is allowed,
+** and it always returns 1 when iHType==0.
+**
+** Alternative hash is disallowed for all hash policies except auto,
+** sha1 and sha3.
+*/
+int hname_hash(const Blob *pContent, unsigned int iHType, Blob *pHashOut){
+  assert( iHType==0 || iHType==1 );
+  if( iHType==1 ){
+    switch( g.eHashPolicy ){
+      case HPOLICY_AUTO:
+      case HPOLICY_SHA1:
+        sha3sum_blob(pContent, 256, pHashOut);
+        return 1;
+      case HPOLICY_SHA3:
+        sha1sum_blob(pContent, pHashOut);
+        return 1;
+    }
+  }
+  if( iHType==0 ){
+    switch( g.eHashPolicy ){
+      case HPOLICY_SHA1:
+      case HPOLICY_AUTO:
+        sha1sum_blob(pContent, pHashOut);
+        return 1;
+      case HPOLICY_SHA3:
+      case HPOLICY_SHA3_ONLY:
+      case HPOLICY_SHUN_SHA1:
+        sha3sum_blob(pContent, 256, pHashOut);
+        return 1;
+    }
+  }
+  blob_init(pHashOut, 0, 0);
+  return 0;
+}
+
+/*
+** Return the default hash policy for repositories that do not currently
+** have an assigned hash policy.
+**
+** Make the default HPOLICY_AUTO if there are SHA1 artifacts but no SHA3
+** artifacts in the repository. Make the default HPOLICY_SHA3 if there
+** are one or more SHA3 artifacts or if the repository is initially empty.
+*/
+int hname_default_policy(void){
+  if( db_exists("SELECT 1 FROM blob WHERE length(uuid)>40")
+   || !db_exists("SELECT 1 FROM blob WHERE length(uuid)==40")
+  ){
+    return HPOLICY_SHA3;
+  }else{
+    return HPOLICY_AUTO;
+  }
+}
+
+/*
+** Names of the hash policies.
+*/
+static const char *azPolicy[] = {
+  "sha1", "auto", "sha3", "sha3-only", "shun-sha1"
+};
+
+/* Return the name of the current hash policy.
+*/
+const char *hpolicy_name(void){
+  return azPolicy[g.eHashPolicy];
+}
+
+
+/*
+** COMMAND: hash-policy*
+**
+** Usage: fossil hash-policy ?NEW-POLICY?
+**
+** Query or set the hash policy for the current repository. Available hash
+** policies are as follows:
+**
+**   sha1       New artifact names are created using SHA1
+**
+**   auto       New artifact names are created using SHA1, but
+**              automatically change the policy to "sha3" when
+**              any SHA3 artifact enters the repository.
+**
+**   sha3       New artifact names are created using SHA3, but
+**              older artifacts with SHA1 names may be reused.
+**
+**   sha3-only  Use only SHA3 artifact names. Do not reuse legacy
+**              SHA1 names.
+**
+**   shun-sha1  Shun any SHA1 artifacts received by sync operations
+**              other than clones. Older legacy SHA1 artifacts are
+**              allowed during a clone.
+**
+** The default hash policy for existing repositories is "auto", which will
+** immediately promote to "sha3" if the repository contains one or more
+** artifacts with SHA3 names. The default hash policy for new repositories
+** is "shun-sha1".
 */
-void hname_hash(const Blob *pContent, unsigned int iHType, Blob *pHashOut){
-#if RELEASE_VERSION_NUMBER>=20100
-  /* For Fossil 2.1 and later, the preferred hash algorithm is SHA3-256 and
-  ** SHA1 is the secondary hash algorithm. */
-  switch( iHType ){
-    case 0: sha3sum_blob(pContent, 256, pHashOut); break;
-    case 1: sha1sum_blob(pContent, pHashOut); break;
-  }
-#else
-  /* Prior to Fossil 2.1, the preferred hash algorithm is SHA1 (for backwards
-  ** compatibility with Fossil 1.x) and SHA3-256 is the only auxiliary
-  ** algorithm */
-  switch( iHType ){
-    case 0: sha1sum_blob(pContent, pHashOut); break;
-    case 1: sha3sum_blob(pContent, 256, pHashOut); break;
-  }
-#endif
+void hash_policy_command(void){
+  int i;
+  db_find_and_open_repository(0, 0);
+  if( g.argc!=2 && g.argc!=3 ) usage("?NEW-POLICY?");
+  if( g.argc==2 ){
+    fossil_print("%s\n", azPolicy[g.eHashPolicy]);
+    return;
+  }
+  for(i=HPOLICY_SHA1; i<=HPOLICY_SHUN_SHA1; i++){
+    if( fossil_strcmp(g.argv[2],azPolicy[i])==0 ){
+      if( i==HPOLICY_AUTO
+       && db_exists("SELECT 1 FROM blob WHERE length(uuid)>40")
+      ){
+        i = HPOLICY_SHA3;
+      }
+      g.eHashPolicy = i;
+      db_set_int("hash-policy", i, 0);
+      fossil_print("%s\n", azPolicy[i]);
+      return;
+    }
+  }
+  fossil_fatal("unknown hash policy \"%s\" - should be one of: sha1 auto"
+               " sha3 sha3-only shun-sha1", g.argv[2]);
 }
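Review note: the two switch statements in the new hname_hash() amount to a lookup from (policy, iHType) to an algorithm. The sketch below restates that mapping as a standalone function so it can be checked in isolation. It is an illustration only, not Fossil code: the HashAlg enum and pick_hash_alg() are hypothetical names, and only the HPOLICY_* values are copied from the diff.

```c
/* Policy values copied from the diff above. */
#define HPOLICY_SHA1      0
#define HPOLICY_AUTO      1
#define HPOLICY_SHA3      2
#define HPOLICY_SHA3_ONLY 3
#define HPOLICY_SHUN_SHA1 4

/* Hypothetical result type: which algorithm would be run, if any. */
typedef enum { ALG_NONE = 0, ALG_SHA1 = 1, ALG_SHA3_256 = 2 } HashAlg;

/* Mirror of the two switch statements in hname_hash().
** iHType==0 asks for the preferred hash, iHType==1 for the alternative.
** ALG_NONE corresponds to the case where hname_hash() returns 0. */
HashAlg pick_hash_alg(int ePolicy, unsigned int iHType){
  if( iHType==0 ){
    switch( ePolicy ){
      case HPOLICY_SHA1:
      case HPOLICY_AUTO:
        return ALG_SHA1;
      case HPOLICY_SHA3:
      case HPOLICY_SHA3_ONLY:
      case HPOLICY_SHUN_SHA1:
        return ALG_SHA3_256;
    }
  }else if( iHType==1 ){
    switch( ePolicy ){
      case HPOLICY_SHA1:
      case HPOLICY_AUTO:
        return ALG_SHA3_256;
      case HPOLICY_SHA3:
        return ALG_SHA1;
    }
  }
  return ALG_NONE;
}
```

Read this way, "sha3-only" and "shun-sha1" differ from "sha3" only in that they have no alternative hash; the preferred hash is SHA3-256 for all three.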
+4 -1
--- src/main.c
+++ src/main.c
@@ -140,10 +140,11 @@
   char *zLocalDbName;     /* Name of the local database file */
   char *zOpenRevision;    /* Check-in version to use during database open */
   int localOpen;          /* True if the local database is open */
   char *zLocalRoot;       /* The directory holding the local database */
   int minPrefix;          /* Number of digits needed for a distinct UUID */
+  int eHashPolicy;        /* Current hash policy. One of HPOLICY_* */
   int fNoDirSymlinks;     /* True if --no-dir-symlinks flag is present */
   int fSqlTrace;          /* True if --sqltrace flag is present */
   int fSqlStats;          /* True if --sqltrace or --sqlstats are present */
   int fSqlPrint;          /* True if -sqlprint flag is present */
   int fQuiet;             /* True if -quiet flag is present */
@@ -2005,11 +2006,11 @@
 ** the name of that directory and the specific repository will be
 ** opened later by process_one_web_page() based on the content of
 ** the PATH_INFO variable.
 **
 ** If the fCreate flag is set, then create the repository if it
-** does not already exist.
+** does not already exist. Always use "auto" hash-policy in this case.
 */
 static void find_server_repository(int arg, int fCreate){
   if( g.argc<=arg ){
     db_must_be_within_tree();
   }else{
@@ -2022,10 +2023,12 @@
     if( isDir==0 && fCreate ){
       const char *zPassword;
       db_create_repository(zRepo);
       db_open_repository(zRepo);
       db_begin_transaction();
+      g.eHashPolicy = HPOLICY_AUTO;
+      db_set_int("hash-policy", HPOLICY_AUTO, 0);
       db_initial_setup(0, "now", g.zLogin);
       db_end_transaction(0);
       fossil_print("project-id: %s\n", db_get("project-code", 0));
       fossil_print("server-id: %s\n", db_get("server-code", 0));
       zPassword = db_text(0, "SELECT pw FROM user WHERE login=%Q", g.zLogin);
+3 -3
--- src/sha3.c
+++ src/sha3.c
@@ -378,18 +378,18 @@
 }

 /*
 ** Initialize a new hash. iSize determines the size of the hash
 ** in bits and should be one of 224, 256, 384, or 512. Or iSize
-** can be zero to use the default hash size of 224 bits.
+** can be zero to use the default hash size of 256 bits.
 */
 static void SHA3Init(SHA3Context *p, int iSize){
   memset(p, 0, sizeof(*p));
   if( iSize>=128 && iSize<=512 ){
     p->nRate = (1600 - ((iSize + 31)&~31)*2)/8;
   }else{
-    p->nRate = 144;
+    p->nRate = (1600 - 2*256)/8;
   }
 #if SHA3_BYTEORDER==1234
   /* Known to be little-endian at compile-time. No-op */
 #elif SHA3_BYTEORDER==4321
   p->ixMask = 7; /* Big-endian */
@@ -428,11 +428,11 @@
       }
     }
   }
 #endif
   for(; i<nData; i++){
-#if SHA1_BYTEORDER==1234
+#if SHA3_BYTEORDER==1234
     p->u.x[p->nLoaded] ^= aData[i];
 #elif SHA3_BYTEORDER==4321
     p->u.x[p->nLoaded^0x07] ^= aData[i];
 #else
     p->u.x[p->nLoaded^p->ixMask] ^= aData[i];
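Review note: the nRate change in SHA3Init() can be checked by hand. The state is 1600 bits wide and the capacity is twice the digest size, so the rate in bytes is (1600 - 2*bits)/8. The old fallback of 144 is the rate for a 224-bit digest, while the new expression gives 136, the rate for 256 bits, which matches the corrected comment. The helper below is a hypothetical standalone copy of that arithmetic (sha3_rate_bytes() is not a function in sha3.c), offered only as a way to verify the numbers.

```c
/* Hypothetical helper mirroring the nRate computation in SHA3Init(). */
int sha3_rate_bytes(int iSize){
  if( iSize>=128 && iSize<=512 ){
    /* Round the digest size up to a multiple of 32 bits.  The capacity
    ** is twice that size; the rate is what remains of the 1600-bit state,
    ** expressed in bytes. */
    return (1600 - ((iSize + 31)&~31)*2)/8;
  }
  /* Out-of-range sizes (including 0) fall back to the 256-bit default. */
  return (1600 - 2*256)/8;
}
```

With this helper, sizes 224, 256, 384 and 512 give 144, 136, 104 and 72 bytes respectively, and a size of 0 gives 136.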
+1
--- src/shun.c
+++ src/shun.c
@@ -26,10 +26,11 @@
 */
 int uuid_is_shunned(const char *zUuid){
   static Stmt q;
   int rc;
   if( zUuid==0 || zUuid[0]==0 ) return 0;
+  if( g.eHashPolicy==HPOLICY_SHUN_SHA1 && zUuid[HNAME_LEN_SHA1]==0 ) return 1;
   db_static_prepare(&q, "SELECT 1 FROM shun WHERE uuid=:uuid");
   db_bind_text(&q, ":uuid", zUuid);
   rc = db_step(&q);
   db_reset(&q);
   return rc==SQLITE_ROW;
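Review note: the added line in uuid_is_shunned() classifies a name as SHA1 purely by length, by testing for the terminator at index HNAME_LEN_SHA1. That test presumes the caller passes a string of at least that many characters. The sketch below shows an equivalent length check that is also safe for shorter strings. It assumes HNAME_LEN_SHA1 is 40 (consistent with the length(uuid)==40 query in hname.c); the constant SHA1_HEX_LEN and the function looks_like_sha1_name() are hypothetical names, not part of Fossil.

```c
#include <string.h>

#define SHA1_HEX_LEN 40   /* assumed value of HNAME_LEN_SHA1 */

/* Return 1 if zUuid has exactly the length of a SHA1 hex name, else 0.
** Unlike indexing zUuid[SHA1_HEX_LEN] directly, strlen() never reads
** past the terminator of a shorter string. */
int looks_like_sha1_name(const char *zUuid){
  if( zUuid==0 ) return 0;
  return strlen(zUuid)==SHA1_HEX_LEN;
}
```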
--- src/sqlcmd.c
+++ src/sqlcmd.c
@@ -212,10 +212,13 @@
 */
 void cmd_sqlite3(void){
   int noRepository;
   const char *zConfigDb;
   extern int sqlite3_shell(int, char**);
+#ifdef FOSSIL_ENABLE_TH1_HOOKS
+  g.fNoThHook = 1;
+#endif
   noRepository = find_option("no-repository", 0, 0)!=0;
   if( !noRepository ){
     db_find_and_open_repository(OPEN_ANY_SCHEMA, 0);
   }
   db_open_config(1,0);
+35 -32
--- src/stash.c
+++ src/stash.c
@@ -332,52 +332,45 @@
       isBin2 = fIncludeBinary ? 0 : looks_like_binary(&a);
       diff_file_mem(&empty, &a, isBin1, isBin2, zNew, zDiffCmd,
                     zBinGlob, fIncludeBinary, diffFlags);
     }else if( isRemoved ){
       fossil_print("DELETE %s\n", zOrig);
-      if( fBaseline==0 ){
-        if( file_wd_islink(zOPath) ){
-          blob_read_link(&a, zOPath);
-        }else{
-          blob_read_from_file(&a, zOPath);
-        }
-      }else{
-        content_get(rid, &a);
-      }
-      diff_print_index(zNew, diffFlags);
-      isBin1 = fIncludeBinary ? 0 : looks_like_binary(&a);
-      isBin2 = 0;
-      diff_file_mem(&a, &empty, isBin1, isBin2, zOrig, zDiffCmd,
-                    zBinGlob, fIncludeBinary, diffFlags);
-    }else{
-      Blob delta, disk;
+      diff_print_index(zNew, diffFlags);
+      isBin2 = 0;
+      if( fBaseline ){
+        content_get(rid, &a);
+        isBin1 = fIncludeBinary ? 0 : looks_like_binary(&a);
+        diff_file_mem(&a, &empty, isBin1, isBin2, zOrig, zDiffCmd,
+                      zBinGlob, fIncludeBinary, diffFlags);
+      }else{
+      }
+    }else{
+      Blob delta;
       int isOrigLink = file_wd_islink(zOPath);
       db_ephemeral_blob(&q, 6, &delta);
-      if( fBaseline==0 ){
-        if( isOrigLink ){
-          blob_read_link(&disk, zOPath);
-        }else{
-          blob_read_from_file(&disk, zOPath);
-        }
-      }
       fossil_print("CHANGED %s\n", zNew);
       if( !isOrigLink != !isLink ){
         diff_print_index(zNew, diffFlags);
         diff_print_filenames(zOrig, zNew, diffFlags);
         printf(DIFF_CANNOT_COMPUTE_SYMLINK);
       }else{
-        Blob *pBase = fBaseline ? &a : &disk;
         content_get(rid, &a);
         blob_delta_apply(&a, &delta, &b);
-        isBin1 = fIncludeBinary ? 0 : looks_like_binary(pBase);
+        isBin1 = fIncludeBinary ? 0 : looks_like_binary(&a);
         isBin2 = fIncludeBinary ? 0 : looks_like_binary(&b);
-        diff_file_mem(fBaseline? &a : &disk, &b, isBin1, isBin2, zNew,
-                      zDiffCmd, zBinGlob, fIncludeBinary, diffFlags);
+        if( fBaseline ){
+          diff_file_mem(&a, &b, isBin1, isBin2, zNew,
+                        zDiffCmd, zBinGlob, fIncludeBinary, diffFlags);
+        }else{
+          /*Diff with file on disk using fSwapDiff=1 to show the diff in the
+            same direction as if fBaseline=1.*/
+          diff_file(&b, isBin2, zOPath, zNew, zDiffCmd,
+                    zBinGlob, fIncludeBinary, diffFlags, 1);
+        }
         blob_reset(&a);
         blob_reset(&b);
       }
-      if( !fBaseline ) blob_reset(&disk);
       blob_reset(&delta);
     }
   }
   db_finalize(&q);
 }
@@ -433,12 +426,15 @@
 **
 ** List all changes sets currently stashed. Show information about
 ** individual files in each changeset if -v or --verbose is used.
 **
 ** fossil stash show|cat ?STASHID? ?DIFF-OPTIONS?
+** fossil stash gshow|gcat ?STASHID? ?DIFF-OPTIONS?
 **
-** Show the contents of a stash.
+** Show the contents of a stash as a diff against its baseline.
+** With gshow and gcat, gdiff-command is used instead of internal
+** diff logic.
 **
 ** fossil stash pop
 ** fossil stash apply ?STASHID?
 **
 ** Apply STASHID or the most recently create stash to the current
@@ -460,18 +456,20 @@
 **
 ** fossil stash diff ?STASHID? ?DIFF-OPTIONS?
 ** fossil stash gdiff ?STASHID? ?DIFF-OPTIONS?
 **
 ** Show diffs of the current working directory and what that
-** directory would be if STASHID were applied.
+** directory would be if STASHID were applied. With gdiff,
+** gdiff-command is used instead of internal diff logic.
 **
 ** SUMMARY:
 ** fossil stash
 ** fossil stash save ?-m|--comment COMMENT? ?FILES...?
 ** fossil stash snapshot ?-m|--comment COMMENT? ?FILES...?
 ** fossil stash list|ls ?-v|--verbose? ?-W|--width <num>?
 ** fossil stash show|cat ?STASHID? ?DIFF-OPTIONS?
+** fossil stash gshow|gcat ?STASHID? ?DIFF-OPTIONS?
 ** fossil stash pop
 ** fossil stash apply|goto ?STASHID?
 ** fossil stash drop|rm ?STASHID? ?-a|--all?
 ** fossil stash diff ?STASHID? ?DIFF-OPTIONS?
 ** fossil stash gdiff ?STASHID? ?DIFF-OPTIONS?
@@ -654,25 +652,30 @@
     undo_finish();
   }else
   if( memcmp(zCmd, "diff", nCmd)==0
    || memcmp(zCmd, "gdiff", nCmd)==0
    || memcmp(zCmd, "show", nCmd)==0
+   || memcmp(zCmd, "gshow", nCmd)==0
    || memcmp(zCmd, "cat", nCmd)==0
+   || memcmp(zCmd, "gcat", nCmd)==0
   ){
     const char *zDiffCmd = 0;
     const char *zBinGlob = 0;
     int fIncludeBinary = 0;
-    int fBaseline = zCmd[0]=='s' || zCmd[0]=='c';
+    int fBaseline = 0;
     u64 diffFlags;

+    if( strstr(zCmd,"show")!=0 || strstr(zCmd,"cat")!=0 ){
+      fBaseline = 1;
+    }
     if( find_option("tk",0,0)!=0 ){
       db_close(0);
       diff_tk(fBaseline ? "stash show" : "stash diff", 3);
       return;
     }
     if( find_option("internal","i",0)==0 ){
-      zDiffCmd = diff_command_external(memcmp(zCmd, "gdiff", nCmd)==0);
+      zDiffCmd = diff_command_external(zCmd[0]=='g');
     }
     diffFlags = diff_options();
     if( find_option("verbose","v",0)!=0 ) diffFlags |= DIFF_VERBOSE;
     if( g.argc>4 ) usage(mprintf("%s ?STASHID? ?DIFF-OPTIONS?", zCmd));
     if( zDiffCmd ){
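Review note: after this change the subcommand name drives two independent flags: whether the diff is taken against the stash baseline (show/cat and their g-variants) and whether the graphical diff command is requested (any name starting with 'g'). The sketch below isolates that derivation so the four spellings can be compared side by side. StashDiffMode and stash_diff_mode() are hypothetical names for illustration; the two expressions are taken from the added lines in the diff.

```c
#include <string.h>

/* Hypothetical holder for the two flags derived from the subcommand name. */
typedef struct {
  int fBaseline;   /* 1: diff the stash against its baseline (show/cat) */
  int fGui;        /* 1: name starts with 'g', so gdiff-command is wanted */
} StashDiffMode;

/* Derive both flags from the subcommand name as typed by the user. */
StashDiffMode stash_diff_mode(const char *zCmd){
  StashDiffMode m;
  m.fBaseline = strstr(zCmd,"show")!=0 || strstr(zCmd,"cat")!=0;
  m.fGui = zCmd[0]=='g';
  return m;
}
```

One consequence visible in the sketch: because strstr() looks for the full words "show" and "cat", an abbreviated subcommand that passes the memcmp() prefix match would not set fBaseline.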
439 **
440 ** Apply STASHID or the most recently create stash to the current
@@ -460,18 +456,20 @@
456 **
457 ** fossil stash diff ?STASHID? ?DIFF-OPTIONS?
458 ** fossil stash gdiff ?STASHID? ?DIFF-OPTIONS?
459 **
460 ** Show diffs of the current working directory and what that
461 ** directory would be if STASHID were applied. With gdiff,
462 ** gdiff-command is used instead of internal diff logic.
463 **
464 ** SUMMARY:
465 ** fossil stash
466 ** fossil stash save ?-m|--comment COMMENT? ?FILES...?
467 ** fossil stash snapshot ?-m|--comment COMMENT? ?FILES...?
468 ** fossil stash list|ls ?-v|--verbose? ?-W|--width <num>?
469 ** fossil stash show|cat ?STASHID? ?DIFF-OPTIONS?
470 ** fossil stash gshow|gcat ?STASHID? ?DIFF-OPTIONS?
471 ** fossil stash pop
472 ** fossil stash apply|goto ?STASHID?
473 ** fossil stash drop|rm ?STASHID? ?-a|--all?
474 ** fossil stash diff ?STASHID? ?DIFF-OPTIONS?
475 ** fossil stash gdiff ?STASHID? ?DIFF-OPTIONS?
@@ -654,25 +652,30 @@
652 undo_finish();
653 }else
654 if( memcmp(zCmd, "diff", nCmd)==0
655 || memcmp(zCmd, "gdiff", nCmd)==0
656 || memcmp(zCmd, "show", nCmd)==0
657 || memcmp(zCmd, "gshow", nCmd)==0
658 || memcmp(zCmd, "cat", nCmd)==0
659 || memcmp(zCmd, "gcat", nCmd)==0
660 ){
661 const char *zDiffCmd = 0;
662 const char *zBinGlob = 0;
663 int fIncludeBinary = 0;
664 int fBaseline = 0;
665 u64 diffFlags;
666
667 if( strstr(zCmd,"show")!=0 || strstr(zCmd,"cat")!=0 ){
668 fBaseline = 1;
669 }
670 if( find_option("tk",0,0)!=0 ){
671 db_close(0);
672 diff_tk(fBaseline ? "stash show" : "stash diff", 3);
673 return;
674 }
675 if( find_option("internal","i",0)==0 ){
676 zDiffCmd = diff_command_external(zCmd[0]=='g');
677 }
678 diffFlags = diff_options();
679 if( find_option("verbose","v",0)!=0 ) diffFlags |= DIFF_VERBOSE;
680 if( g.argc>4 ) usage(mprintf("%s ?STASHID? ?DIFF-OPTIONS?", zCmd));
681 if( zDiffCmd ){
682
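The new-side hunk above selects the diff-like stash subcommands by prefix match, sets `fBaseline` for the show/cat family, and chooses the external gdiff-command when the name starts with `g`. A minimal C sketch of that classification, under stated assumptions: `StashDiffMode` and `stash_diff_mode()` are hypothetical names used only for illustration, and `strncmp` is substituted for the hunk's `memcmp` so the comparison stops at the end of the shorter string.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical result type, for illustration only; not part of Fossil. */
typedef struct StashDiffMode {
  int matched;     /* 1 if zCmd is a prefix of a diff-like subcommand */
  int fBaseline;   /* 1 for the show/cat family: diff against the baseline */
  int fGraphical;  /* 1 for g-prefixed names: use the external gdiff-command */
} StashDiffMode;

/* Classify a subcommand name the way the hunk's condition chain does. */
StashDiffMode stash_diff_mode(const char *zCmd){
  static const char *const azCmd[] = {
    "diff", "gdiff", "show", "gshow", "cat", "gcat"
  };
  StashDiffMode m = {0, 0, 0};
  size_t nCmd = strlen(zCmd);
  size_t i;
  assert( nCmd>0 );
  for(i=0; i<sizeof(azCmd)/sizeof(azCmd[0]); i++){
    /* Prefix match: only the first nCmd bytes are compared. */
    if( strncmp(zCmd, azCmd[i], nCmd)==0 ){
      m.matched = 1;
      break;
    }
  }
  if( !m.matched ) return m;
  /* Same test as the hunk: the full word must appear in zCmd. */
  if( strstr(zCmd, "show")!=0 || strstr(zCmd, "cat")!=0 ){
    m.fBaseline = 1;
  }
  m.fGraphical = zCmd[0]=='g';
  return m;
}
```

In this sketch `stash_diff_mode("gshow")` sets all three fields, while an abbreviation such as `sh` still matches by prefix but leaves `fBaseline` at 0, because `strstr` looks for the whole word. Whether the real command is reached with such abbreviations depends on code outside the hunk.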
+6 -1
--- src/stat.c
+++ src/stat.c
@@ -183,11 +183,16 @@
183183
@ (%h(RELEASE_VERSION)) <a href='version?verbose=1'>(details)</a>
184184
@ </td></tr>
185185
@ <tr><th>SQLite&nbsp;Version:</th><td>%.19s(sqlite3_sourceid())
186186
@ [%.10s(&sqlite3_sourceid()[20])] (%s(sqlite3_libversion()))
187187
@ <a href='version?verbose=2'>(details)</a></td></tr>
188
- @ <tr><th>Schema&nbsp;Version:</th><td>%h(g.zAuxSchema)</td></tr>
188
+ if( g.eHashPolicy!=HPOLICY_AUTO ){
189
+ @ <tr><th>Schema&nbsp;Version:</th><td>%h(g.zAuxSchema),
190
+ @ %s(hpolicy_name())</td></tr>
191
+ }else{
192
+ @ <tr><th>Schema&nbsp;Version:</th><td>%h(g.zAuxSchema)</td></tr>
193
+ }
189194
@ <tr><th>Repository Rebuilt:</th><td>
190195
@ %h(db_get_mtime("rebuilt","%Y-%m-%d %H:%M:%S","Never"))
191196
@ By Fossil %h(db_get("rebuilt","Unknown"))</td></tr>
192197
@ <tr><th>Database&nbsp;Stats:</th><td>
193198
@ %d(db_int(0, "PRAGMA repository.page_count")) pages,
194199
--- src/stat.c
+++ src/stat.c
@@ -183,11 +183,16 @@
183 @ (%h(RELEASE_VERSION)) <a href='version?verbose=1'>(details)</a>
184 @ </td></tr>
185 @ <tr><th>SQLite&nbsp;Version:</th><td>%.19s(sqlite3_sourceid())
186 @ [%.10s(&sqlite3_sourceid()[20])] (%s(sqlite3_libversion()))
187 @ <a href='version?verbose=2'>(details)</a></td></tr>
188 @ <tr><th>Schema&nbsp;Version:</th><td>%h(g.zAuxSchema)</td></tr>
 
 
 
 
 
189 @ <tr><th>Repository Rebuilt:</th><td>
190 @ %h(db_get_mtime("rebuilt","%Y-%m-%d %H:%M:%S","Never"))
191 @ By Fossil %h(db_get("rebuilt","Unknown"))</td></tr>
192 @ <tr><th>Database&nbsp;Stats:</th><td>
193 @ %d(db_int(0, "PRAGMA repository.page_count")) pages,
194
--- src/stat.c
+++ src/stat.c
@@ -183,11 +183,16 @@
183 @ (%h(RELEASE_VERSION)) <a href='version?verbose=1'>(details)</a>
184 @ </td></tr>
185 @ <tr><th>SQLite&nbsp;Version:</th><td>%.19s(sqlite3_sourceid())
186 @ [%.10s(&sqlite3_sourceid()[20])] (%s(sqlite3_libversion()))
187 @ <a href='version?verbose=2'>(details)</a></td></tr>
188 if( g.eHashPolicy!=HPOLICY_AUTO ){
189 @ <tr><th>Schema&nbsp;Version:</th><td>%h(g.zAuxSchema),
190 @ %s(hpolicy_name())</td></tr>
191 }else{
192 @ <tr><th>Schema&nbsp;Version:</th><td>%h(g.zAuxSchema)</td></tr>
193 }
194 @ <tr><th>Repository Rebuilt:</th><td>
195 @ %h(db_get_mtime("rebuilt","%Y-%m-%d %H:%M:%S","Never"))
196 @ By Fossil %h(db_get("rebuilt","Unknown"))</td></tr>
197 @ <tr><th>Database&nbsp;Stats:</th><td>
198 @ %d(db_int(0, "PRAGMA repository.page_count")) pages,
199
--- src/unversioned.c
+++ src/unversioned.c
@@ -456,11 +456,11 @@
456456
** Query parameters:
457457
**
458458
** byage=1 Order the initial display be decreasing age
459459
** showdel=0 Show deleted files
460460
*/
461
-void uvstat_page(void){
461
+void uvlist_page(void){
462462
Stmt q;
463463
sqlite3_int64 iNow;
464464
sqlite3_int64 iTotalSz = 0;
465465
int cnt = 0;
466466
int n = 0;
@@ -554,5 +554,62 @@
554554
}else{
555555
@ No unversioned files on this server.
556556
}
557557
style_footer();
558558
}
559
+
560
+/*
561
+** WEBPAGE: juvlist
562
+**
563
+** Return a complete list of unversioned files as JSON. The JSON
564
+** looks like this:
565
+**
566
+** [{"name":NAME,
567
+** "mtime":MTIME,
568
+** "hash":HASH,
569
+** "size":SIZE,
570
+** "user":USER}]
571
+*/
572
+void uvlist_json_page(void){
573
+ Stmt q;
574
+ char *zSep = "[";
575
+ Blob json;
576
+
577
+ login_check_credentials();
578
+ if( !g.perm.Read ){ login_needed(g.anon.Read); return; }
579
+ cgi_set_content_type("text/json");
580
+ if( !db_table_exists("repository","unversioned") ){
581
+ blob_init(&json, "[]", -1);
582
+ cgi_set_content(&json);
583
+ return;
584
+ }
585
+ blob_init(&json, 0, 0);
586
+ db_prepare(&q,
587
+ "SELECT"
588
+ " name,"
589
+ " mtime,"
590
+ " hash,"
591
+ " sz,"
592
+ " (SELECT login FROM rcvfrom, user"
593
+ " WHERE user.uid=rcvfrom.uid AND rcvfrom.rcvid=unversioned.rcvid)"
594
+ " FROM unversioned WHERE hash IS NOT NULL"
595
+ );
596
+ while( db_step(&q)==SQLITE_ROW ){
597
+ const char *zName = db_column_text(&q, 0);
598
+ sqlite3_int64 mtime = db_column_int(&q, 1);
599
+ const char *zHash = db_column_text(&q, 2);
600
+ int fullSize = db_column_int(&q, 3);
601
+ const char *zLogin = db_column_text(&q, 4);
602
+ if( zLogin==0 ) zLogin = "";
603
+ blob_appendf(&json, "%s{\"name\":\"", zSep);
604
+ zSep = ",\n ";
605
+ blob_append_json_string(&json, zName);
606
+ blob_appendf(&json, "\",\n \"mtime\":%lld,\n \"hash\":\"", mtime);
607
+ blob_append_json_string(&json, zHash);
608
+ blob_appendf(&json, "\",\n \"size\":%d,\n \"user\":\"", fullSize);
609
+ blob_append_json_string(&json, zLogin);
610
+ blob_appendf(&json, "\"}");
611
+ }
612
+ db_finalize(&q);
613
+ blob_appendf(&json,"]\n");
614
+ cgi_set_content(&json);
615
+}
559616
--- src/unversioned.c
+++ src/unversioned.c
@@ -456,11 +456,11 @@
456 ** Query parameters:
457 **
458 ** byage=1 Order the initial display be decreasing age
459 ** showdel=0 Show deleted files
460 */
461 void uvstat_page(void){
462 Stmt q;
463 sqlite3_int64 iNow;
464 sqlite3_int64 iTotalSz = 0;
465 int cnt = 0;
466 int n = 0;
@@ -554,5 +554,62 @@
554 }else{
555 @ No unversioned files on this server.
556 }
557 style_footer();
558 }
559
--- src/unversioned.c
+++ src/unversioned.c
@@ -456,11 +456,11 @@
456 ** Query parameters:
457 **
458 ** byage=1 Order the initial display be decreasing age
459 ** showdel=0 Show deleted files
460 */
461 void uvlist_page(void){
462 Stmt q;
463 sqlite3_int64 iNow;
464 sqlite3_int64 iTotalSz = 0;
465 int cnt = 0;
466 int n = 0;
@@ -554,5 +554,62 @@
554 }else{
555 @ No unversioned files on this server.
556 }
557 style_footer();
558 }
559
560 /*
561 ** WEBPAGE: juvlist
562 **
563 ** Return a complete list of unversioned files as JSON. The JSON
564 ** looks like this:
565 **
566 ** [{"name":NAME,
567 ** "mtime":MTIME,
568 ** "hash":HASH,
569 ** "size":SIZE,
570 ** "user":USER}]
571 */
572 void uvlist_json_page(void){
573 Stmt q;
574 char *zSep = "[";
575 Blob json;
576
577 login_check_credentials();
578 if( !g.perm.Read ){ login_needed(g.anon.Read); return; }
579 cgi_set_content_type("text/json");
580 if( !db_table_exists("repository","unversioned") ){
581 blob_init(&json, "[]", -1);
582 cgi_set_content(&json);
583 return;
584 }
585 blob_init(&json, 0, 0);
586 db_prepare(&q,
587 "SELECT"
588 " name,"
589 " mtime,"
590 " hash,"
591 " sz,"
592 " (SELECT login FROM rcvfrom, user"
593 " WHERE user.uid=rcvfrom.uid AND rcvfrom.rcvid=unversioned.rcvid)"
594 " FROM unversioned WHERE hash IS NOT NULL"
595 );
596 while( db_step(&q)==SQLITE_ROW ){
597 const char *zName = db_column_text(&q, 0);
598 sqlite3_int64 mtime = db_column_int(&q, 1);
599 const char *zHash = db_column_text(&q, 2);
600 int fullSize = db_column_int(&q, 3);
601 const char *zLogin = db_column_text(&q, 4);
602 if( zLogin==0 ) zLogin = "";
603 blob_appendf(&json, "%s{\"name\":\"", zSep);
604 zSep = ",\n ";
605 blob_append_json_string(&json, zName);
606 blob_appendf(&json, "\",\n \"mtime\":%lld,\n \"hash\":\"", mtime);
607 blob_append_json_string(&json, zHash);
608 blob_appendf(&json, "\",\n \"size\":%d,\n \"user\":\"", fullSize);
609 blob_append_json_string(&json, zLogin);
610 blob_appendf(&json, "\"}");
611 }
612 db_finalize(&q);
613 blob_appendf(&json,"]\n");
614 cgi_set_content(&json);
615 }
616
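The `/juvlist` handler above builds its JSON by hand: a separator string starts as `[` and becomes a comma after the first row, and every string field goes through an escaping helper. A minimal self-contained sketch of the same output shape follows. `Buf`, `UvEntry`, `buf_append()`, `buf_append_json()` and `uv_list_to_json()` are hypothetical stand-ins (not Fossil APIs), and the escaper handles only quotes, backslashes and control characters. Unlike the loop in the hunk, the sketch writes `[` before iterating, so an empty list still yields `[]`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-capacity output buffer (stands in for a Blob). */
typedef struct Buf { char *z; size_t n; size_t cap; } Buf;

/* Hypothetical record standing in for one row of the query. */
typedef struct UvEntry {
  const char *zName;
  long long mtime;
  const char *zHash;
  int size;
  const char *zUser;   /* may be NULL, like zLogin in the hunk */
} UvEntry;

/* Append a NUL-terminated string; the caller guarantees capacity. */
static void buf_append(Buf *p, const char *z){
  size_t len = strlen(z);
  assert( p->n + len < p->cap );
  memcpy(p->z + p->n, z, len+1);
  p->n += len;
}

/* Append z with minimal JSON string escaping. */
static void buf_append_json(Buf *p, const char *z){
  char tmp[8];
  for(; *z; z++){
    unsigned char c = (unsigned char)*z;
    if( c=='"' || c=='\\' ){
      tmp[0] = '\\'; tmp[1] = (char)c; tmp[2] = 0;
    }else if( c<0x20 ){
      snprintf(tmp, sizeof(tmp), "\\u%04x", c);
    }else{
      tmp[0] = (char)c; tmp[1] = 0;
    }
    buf_append(p, tmp);
  }
}

/* Render n entries in the layout described by the WEBPAGE comment. */
void uv_list_to_json(Buf *p, const UvEntry *a, int n){
  char num[32];
  int i;
  buf_append(p, "[");
  for(i=0; i<n; i++){
    if( i>0 ) buf_append(p, ",\n ");
    buf_append(p, "{\"name\":\"");
    buf_append_json(p, a[i].zName);
    snprintf(num, sizeof(num), "%lld", a[i].mtime);
    buf_append(p, "\",\n \"mtime\":");
    buf_append(p, num);
    buf_append(p, ",\n \"hash\":\"");
    buf_append_json(p, a[i].zHash);
    snprintf(num, sizeof(num), "%d", a[i].size);
    buf_append(p, "\",\n \"size\":");
    buf_append(p, num);
    buf_append(p, ",\n \"user\":\"");
    buf_append_json(p, a[i].zUser ? a[i].zUser : "");
    buf_append(p, "\"}");
  }
  buf_append(p, "]\n");
}
```

A caller would supply a buffer large enough for the whole list. A growable buffer, as a Blob presumably provides, avoids that sizing step.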
+1 -1
--- src/wiki.c
+++ src/wiki.c
@@ -1122,11 +1122,11 @@
11221122
*/
11231123
int wiki_technote_to_rid(const char *zETime) {
11241124
int rid=0; /* Artifact ID of the tech note */
11251125
int nETime = strlen(zETime);
11261126
Stmt q;
1127
- if( nETime>=4 && hname_validate(zETime, nETime) ){
1127
+ if( nETime>=4 && nETime<=HNAME_MAX && validate16(zETime, nETime) ){
11281128
char zUuid[HNAME_MAX+1];
11291129
memcpy(zUuid, zETime, nETime+1);
11301130
canonical16(zUuid, nETime);
11311131
db_prepare(&q,
11321132
"SELECT e.objid"
11331133
--- src/wiki.c
+++ src/wiki.c
@@ -1122,11 +1122,11 @@
1122 */
1123 int wiki_technote_to_rid(const char *zETime) {
1124 int rid=0; /* Artifact ID of the tech note */
1125 int nETime = strlen(zETime);
1126 Stmt q;
1127 if( nETime>=4 && hname_validate(zETime, nETime) ){
1128 char zUuid[HNAME_MAX+1];
1129 memcpy(zUuid, zETime, nETime+1);
1130 canonical16(zUuid, nETime);
1131 db_prepare(&q,
1132 "SELECT e.objid"
1133
--- src/wiki.c
+++ src/wiki.c
@@ -1122,11 +1122,11 @@
1122 */
1123 int wiki_technote_to_rid(const char *zETime) {
1124 int rid=0; /* Artifact ID of the tech note */
1125 int nETime = strlen(zETime);
1126 Stmt q;
1127 if( nETime>=4 && nETime<=HNAME_MAX && validate16(zETime, nETime) ){
1128 char zUuid[HNAME_MAX+1];
1129 memcpy(zUuid, zETime, nETime+1);
1130 canonical16(zUuid, nETime);
1131 db_prepare(&q,
1132 "SELECT e.objid"
1133
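The changed condition in `wiki_technote_to_rid()` bounds `nETime` by `HNAME_MAX` before `nETime+1` bytes are copied into a buffer of `HNAME_MAX+1` bytes. The sketch below illustrates why the upper bound matters. It is not Fossil's code: `is_hex()` is a hypothetical stand-in for `validate16()`, lowercasing stands in for `canonical16()` (assumed to normalize case), and the value 64 for the maximum hash length is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define HNAME_MAX_SKETCH 64   /* assumed: hex digits in the longest hash name */

/* Hypothetical stand-in for validate16(): 1 if all n bytes are hex digits. */
static int is_hex(const char *z, size_t n){
  size_t i;
  for(i=0; i<n; i++){
    if( !isxdigit((unsigned char)z[i]) ) return 0;
  }
  return 1;
}

/*
** Copy a candidate hash prefix into zOut, which must hold at least
** HNAME_MAX_SKETCH+1 bytes.  Returns 1 on success, or 0 if the input
** is too short, too long, or not hexadecimal.  Nothing is written to
** zOut when the input is rejected.
*/
int copy_hash_prefix(const char *zIn, char *zOut){
  size_t n, i;
  assert( zIn!=0 && zOut!=0 );
  n = strlen(zIn);
  /* The upper bound is what keeps the copy inside zOut. */
  if( n<4 || n>HNAME_MAX_SKETCH || !is_hex(zIn, n) ) return 0;
  for(i=0; i<n; i++){
    zOut[i] = (char)tolower((unsigned char)zIn[i]);
  }
  zOut[n] = 0;
  return 1;
}
```

With the length test in place, an over-long argument is rejected before any byte is copied, and a non-hex argument such as a timestamp falls through to whatever other lookup the caller performs.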
+1
--- src/xfer.c
+++ src/xfer.c
@@ -1768,10 +1768,11 @@
17681768
memset(&xfer, 0, sizeof(xfer));
17691769
xfer.pIn = &recv;
17701770
xfer.pOut = &send;
17711771
xfer.mxSend = db_get_int("max-upload", 250000);
17721772
xfer.maxTime = -1;
1773
+ xfer.clientVersion = RELEASE_VERSION_NUMBER;
17731774
if( syncFlags & SYNC_PRIVATE ){
17741775
g.perm.Private = 1;
17751776
xfer.syncPrivate = 1;
17761777
}
17771778
17781779
--- src/xfer.c
+++ src/xfer.c
@@ -1768,10 +1768,11 @@
1768 memset(&xfer, 0, sizeof(xfer));
1769 xfer.pIn = &recv;
1770 xfer.pOut = &send;
1771 xfer.mxSend = db_get_int("max-upload", 250000);
1772 xfer.maxTime = -1;
 
1773 if( syncFlags & SYNC_PRIVATE ){
1774 g.perm.Private = 1;
1775 xfer.syncPrivate = 1;
1776 }
1777
1778
--- src/xfer.c
+++ src/xfer.c
@@ -1768,10 +1768,11 @@
1768 memset(&xfer, 0, sizeof(xfer));
1769 xfer.pIn = &recv;
1770 xfer.pOut = &send;
1771 xfer.mxSend = db_get_int("max-upload", 250000);
1772 xfer.maxTime = -1;
1773 xfer.clientVersion = RELEASE_VERSION_NUMBER;
1774 if( syncFlags & SYNC_PRIVATE ){
1775 g.perm.Private = 1;
1776 xfer.syncPrivate = 1;
1777 }
1778
1779
--- win/Makefile.mingw.mistachkin
+++ win/Makefile.mingw.mistachkin
@@ -461,10 +461,11 @@
461461
$(SRCDIR)/fshell.c \
462462
$(SRCDIR)/fusefs.c \
463463
$(SRCDIR)/glob.c \
464464
$(SRCDIR)/graph.c \
465465
$(SRCDIR)/gzip.c \
466
+ $(SRCDIR)/hname.c \
466467
$(SRCDIR)/http.c \
467468
$(SRCDIR)/http_socket.c \
468469
$(SRCDIR)/http_ssl.c \
469470
$(SRCDIR)/http_transport.c \
470471
$(SRCDIR)/import.c \
@@ -511,10 +512,12 @@
511512
$(SRCDIR)/rss.c \
512513
$(SRCDIR)/schema.c \
513514
$(SRCDIR)/search.c \
514515
$(SRCDIR)/setup.c \
515516
$(SRCDIR)/sha1.c \
517
+ $(SRCDIR)/sha1hard.c \
518
+ $(SRCDIR)/sha3.c \
516519
$(SRCDIR)/shun.c \
517520
$(SRCDIR)/sitemap.c \
518521
$(SRCDIR)/skins.c \
519522
$(SRCDIR)/sqlcmd.c \
520523
$(SRCDIR)/stash.c \
@@ -636,10 +639,11 @@
636639
$(OBJDIR)/fshell_.c \
637640
$(OBJDIR)/fusefs_.c \
638641
$(OBJDIR)/glob_.c \
639642
$(OBJDIR)/graph_.c \
640643
$(OBJDIR)/gzip_.c \
644
+ $(OBJDIR)/hname_.c \
641645
$(OBJDIR)/http_.c \
642646
$(OBJDIR)/http_socket_.c \
643647
$(OBJDIR)/http_ssl_.c \
644648
$(OBJDIR)/http_transport_.c \
645649
$(OBJDIR)/import_.c \
@@ -686,10 +690,12 @@
686690
$(OBJDIR)/rss_.c \
687691
$(OBJDIR)/schema_.c \
688692
$(OBJDIR)/search_.c \
689693
$(OBJDIR)/setup_.c \
690694
$(OBJDIR)/sha1_.c \
695
+ $(OBJDIR)/sha1hard_.c \
696
+ $(OBJDIR)/sha3_.c \
691697
$(OBJDIR)/shun_.c \
692698
$(OBJDIR)/sitemap_.c \
693699
$(OBJDIR)/skins_.c \
694700
$(OBJDIR)/sqlcmd_.c \
695701
$(OBJDIR)/stash_.c \
@@ -760,10 +766,11 @@
760766
$(OBJDIR)/fshell.o \
761767
$(OBJDIR)/fusefs.o \
762768
$(OBJDIR)/glob.o \
763769
$(OBJDIR)/graph.o \
764770
$(OBJDIR)/gzip.o \
771
+ $(OBJDIR)/hname.o \
765772
$(OBJDIR)/http.o \
766773
$(OBJDIR)/http_socket.o \
767774
$(OBJDIR)/http_ssl.o \
768775
$(OBJDIR)/http_transport.o \
769776
$(OBJDIR)/import.o \
@@ -810,10 +817,12 @@
810817
$(OBJDIR)/rss.o \
811818
$(OBJDIR)/schema.o \
812819
$(OBJDIR)/search.o \
813820
$(OBJDIR)/setup.o \
814821
$(OBJDIR)/sha1.o \
822
+ $(OBJDIR)/sha1hard.o \
823
+ $(OBJDIR)/sha3.o \
815824
$(OBJDIR)/shun.o \
816825
$(OBJDIR)/sitemap.o \
817826
$(OBJDIR)/skins.o \
818827
$(OBJDIR)/sqlcmd.o \
819828
$(OBJDIR)/stash.o \
@@ -1095,10 +1104,11 @@
10951104
$(OBJDIR)/fshell_.c:$(OBJDIR)/fshell.h \
10961105
$(OBJDIR)/fusefs_.c:$(OBJDIR)/fusefs.h \
10971106
$(OBJDIR)/glob_.c:$(OBJDIR)/glob.h \
10981107
$(OBJDIR)/graph_.c:$(OBJDIR)/graph.h \
10991108
$(OBJDIR)/gzip_.c:$(OBJDIR)/gzip.h \
1109
+ $(OBJDIR)/hname_.c:$(OBJDIR)/hname.h \
11001110
$(OBJDIR)/http_.c:$(OBJDIR)/http.h \
11011111
$(OBJDIR)/http_socket_.c:$(OBJDIR)/http_socket.h \
11021112
$(OBJDIR)/http_ssl_.c:$(OBJDIR)/http_ssl.h \
11031113
$(OBJDIR)/http_transport_.c:$(OBJDIR)/http_transport.h \
11041114
$(OBJDIR)/import_.c:$(OBJDIR)/import.h \
@@ -1145,10 +1155,12 @@
11451155
$(OBJDIR)/rss_.c:$(OBJDIR)/rss.h \
11461156
$(OBJDIR)/schema_.c:$(OBJDIR)/schema.h \
11471157
$(OBJDIR)/search_.c:$(OBJDIR)/search.h \
11481158
$(OBJDIR)/setup_.c:$(OBJDIR)/setup.h \
11491159
$(OBJDIR)/sha1_.c:$(OBJDIR)/sha1.h \
1160
+ $(OBJDIR)/sha1hard_.c:$(OBJDIR)/sha1hard.h \
1161
+ $(OBJDIR)/sha3_.c:$(OBJDIR)/sha3.h \
11501162
$(OBJDIR)/shun_.c:$(OBJDIR)/shun.h \
11511163
$(OBJDIR)/sitemap_.c:$(OBJDIR)/sitemap.h \
11521164
$(OBJDIR)/skins_.c:$(OBJDIR)/skins.h \
11531165
$(OBJDIR)/sqlcmd_.c:$(OBJDIR)/sqlcmd.h \
11541166
$(OBJDIR)/stash_.c:$(OBJDIR)/stash.h \
@@ -1498,10 +1510,18 @@
14981510
14991511
$(OBJDIR)/gzip.o: $(OBJDIR)/gzip_.c $(OBJDIR)/gzip.h $(SRCDIR)/config.h
15001512
$(XTCC) -o $(OBJDIR)/gzip.o -c $(OBJDIR)/gzip_.c
15011513
15021514
$(OBJDIR)/gzip.h: $(OBJDIR)/headers
1515
+
1516
+$(OBJDIR)/hname_.c: $(SRCDIR)/hname.c $(TRANSLATE)
1517
+ $(TRANSLATE) $(SRCDIR)/hname.c >$@
1518
+
1519
+$(OBJDIR)/hname.o: $(OBJDIR)/hname_.c $(OBJDIR)/hname.h $(SRCDIR)/config.h
1520
+ $(XTCC) -o $(OBJDIR)/hname.o -c $(OBJDIR)/hname_.c
1521
+
1522
+$(OBJDIR)/hname.h: $(OBJDIR)/headers
15031523
15041524
$(OBJDIR)/http_.c: $(SRCDIR)/http.c $(TRANSLATE)
15051525
$(TRANSLATE) $(SRCDIR)/http.c >$@
15061526
15071527
$(OBJDIR)/http.o: $(OBJDIR)/http_.c $(OBJDIR)/http.h $(SRCDIR)/config.h
@@ -1898,10 +1918,26 @@
18981918
18991919
$(OBJDIR)/sha1.o: $(OBJDIR)/sha1_.c $(OBJDIR)/sha1.h $(SRCDIR)/config.h
19001920
$(XTCC) -o $(OBJDIR)/sha1.o -c $(OBJDIR)/sha1_.c
19011921
19021922
$(OBJDIR)/sha1.h: $(OBJDIR)/headers
1923
+
1924
+$(OBJDIR)/sha1hard_.c: $(SRCDIR)/sha1hard.c $(TRANSLATE)
1925
+ $(TRANSLATE) $(SRCDIR)/sha1hard.c >$@
1926
+
1927
+$(OBJDIR)/sha1hard.o: $(OBJDIR)/sha1hard_.c $(OBJDIR)/sha1hard.h $(SRCDIR)/config.h
1928
+ $(XTCC) -o $(OBJDIR)/sha1hard.o -c $(OBJDIR)/sha1hard_.c
1929
+
1930
+$(OBJDIR)/sha1hard.h: $(OBJDIR)/headers
1931
+
1932
+$(OBJDIR)/sha3_.c: $(SRCDIR)/sha3.c $(TRANSLATE)
1933
+ $(TRANSLATE) $(SRCDIR)/sha3.c >$@
1934
+
1935
+$(OBJDIR)/sha3.o: $(OBJDIR)/sha3_.c $(OBJDIR)/sha3.h $(SRCDIR)/config.h
1936
+ $(XTCC) -o $(OBJDIR)/sha3.o -c $(OBJDIR)/sha3_.c
1937
+
1938
+$(OBJDIR)/sha3.h: $(OBJDIR)/headers
19031939
19041940
$(OBJDIR)/shun_.c: $(SRCDIR)/shun.c $(TRANSLATE)
19051941
$(TRANSLATE) $(SRCDIR)/shun.c >$@
19061942
19071943
$(OBJDIR)/shun.o: $(OBJDIR)/shun_.c $(OBJDIR)/shun.h $(SRCDIR)/config.h
19081944
--- win/Makefile.mingw.mistachkin
+++ win/Makefile.mingw.mistachkin
@@ -461,10 +461,11 @@
461 $(SRCDIR)/fshell.c \
462 $(SRCDIR)/fusefs.c \
463 $(SRCDIR)/glob.c \
464 $(SRCDIR)/graph.c \
465 $(SRCDIR)/gzip.c \
 
466 $(SRCDIR)/http.c \
467 $(SRCDIR)/http_socket.c \
468 $(SRCDIR)/http_ssl.c \
469 $(SRCDIR)/http_transport.c \
470 $(SRCDIR)/import.c \
@@ -511,10 +512,12 @@
511 $(SRCDIR)/rss.c \
512 $(SRCDIR)/schema.c \
513 $(SRCDIR)/search.c \
514 $(SRCDIR)/setup.c \
515 $(SRCDIR)/sha1.c \
 
 
516 $(SRCDIR)/shun.c \
517 $(SRCDIR)/sitemap.c \
518 $(SRCDIR)/skins.c \
519 $(SRCDIR)/sqlcmd.c \
520 $(SRCDIR)/stash.c \
@@ -636,10 +639,11 @@
636 $(OBJDIR)/fshell_.c \
637 $(OBJDIR)/fusefs_.c \
638 $(OBJDIR)/glob_.c \
639 $(OBJDIR)/graph_.c \
640 $(OBJDIR)/gzip_.c \
 
641 $(OBJDIR)/http_.c \
642 $(OBJDIR)/http_socket_.c \
643 $(OBJDIR)/http_ssl_.c \
644 $(OBJDIR)/http_transport_.c \
645 $(OBJDIR)/import_.c \
@@ -686,10 +690,12 @@
686 $(OBJDIR)/rss_.c \
687 $(OBJDIR)/schema_.c \
688 $(OBJDIR)/search_.c \
689 $(OBJDIR)/setup_.c \
690 $(OBJDIR)/sha1_.c \
 
 
691 $(OBJDIR)/shun_.c \
692 $(OBJDIR)/sitemap_.c \
693 $(OBJDIR)/skins_.c \
694 $(OBJDIR)/sqlcmd_.c \
695 $(OBJDIR)/stash_.c \
@@ -760,10 +766,11 @@
760 $(OBJDIR)/fshell.o \
761 $(OBJDIR)/fusefs.o \
762 $(OBJDIR)/glob.o \
763 $(OBJDIR)/graph.o \
764 $(OBJDIR)/gzip.o \
 
765 $(OBJDIR)/http.o \
766 $(OBJDIR)/http_socket.o \
767 $(OBJDIR)/http_ssl.o \
768 $(OBJDIR)/http_transport.o \
769 $(OBJDIR)/import.o \
@@ -810,10 +817,12 @@
810 $(OBJDIR)/rss.o \
811 $(OBJDIR)/schema.o \
812 $(OBJDIR)/search.o \
813 $(OBJDIR)/setup.o \
814 $(OBJDIR)/sha1.o \
 
 
815 $(OBJDIR)/shun.o \
816 $(OBJDIR)/sitemap.o \
817 $(OBJDIR)/skins.o \
818 $(OBJDIR)/sqlcmd.o \
819 $(OBJDIR)/stash.o \
@@ -1095,10 +1104,11 @@
1095 $(OBJDIR)/fshell_.c:$(OBJDIR)/fshell.h \
1096 $(OBJDIR)/fusefs_.c:$(OBJDIR)/fusefs.h \
1097 $(OBJDIR)/glob_.c:$(OBJDIR)/glob.h \
1098 $(OBJDIR)/graph_.c:$(OBJDIR)/graph.h \
1099 $(OBJDIR)/gzip_.c:$(OBJDIR)/gzip.h \
 
1100 $(OBJDIR)/http_.c:$(OBJDIR)/http.h \
1101 $(OBJDIR)/http_socket_.c:$(OBJDIR)/http_socket.h \
1102 $(OBJDIR)/http_ssl_.c:$(OBJDIR)/http_ssl.h \
1103 $(OBJDIR)/http_transport_.c:$(OBJDIR)/http_transport.h \
1104 $(OBJDIR)/import_.c:$(OBJDIR)/import.h \
@@ -1145,10 +1155,12 @@
1145 $(OBJDIR)/rss_.c:$(OBJDIR)/rss.h \
1146 $(OBJDIR)/schema_.c:$(OBJDIR)/schema.h \
1147 $(OBJDIR)/search_.c:$(OBJDIR)/search.h \
1148 $(OBJDIR)/setup_.c:$(OBJDIR)/setup.h \
1149 $(OBJDIR)/sha1_.c:$(OBJDIR)/sha1.h \
 
 
1150 $(OBJDIR)/shun_.c:$(OBJDIR)/shun.h \
1151 $(OBJDIR)/sitemap_.c:$(OBJDIR)/sitemap.h \
1152 $(OBJDIR)/skins_.c:$(OBJDIR)/skins.h \
1153 $(OBJDIR)/sqlcmd_.c:$(OBJDIR)/sqlcmd.h \
1154 $(OBJDIR)/stash_.c:$(OBJDIR)/stash.h \
@@ -1498,10 +1510,18 @@
1498
1499 $(OBJDIR)/gzip.o: $(OBJDIR)/gzip_.c $(OBJDIR)/gzip.h $(SRCDIR)/config.h
1500 $(XTCC) -o $(OBJDIR)/gzip.o -c $(OBJDIR)/gzip_.c
1501
1502 $(OBJDIR)/gzip.h: $(OBJDIR)/headers
1503
1504 $(OBJDIR)/http_.c: $(SRCDIR)/http.c $(TRANSLATE)
1505 $(TRANSLATE) $(SRCDIR)/http.c >$@
1506
1507 $(OBJDIR)/http.o: $(OBJDIR)/http_.c $(OBJDIR)/http.h $(SRCDIR)/config.h
@@ -1898,10 +1918,26 @@
1898
1899 $(OBJDIR)/sha1.o: $(OBJDIR)/sha1_.c $(OBJDIR)/sha1.h $(SRCDIR)/config.h
1900 $(XTCC) -o $(OBJDIR)/sha1.o -c $(OBJDIR)/sha1_.c
1901
1902 $(OBJDIR)/sha1.h: $(OBJDIR)/headers
1903
1904 $(OBJDIR)/shun_.c: $(SRCDIR)/shun.c $(TRANSLATE)
1905 $(TRANSLATE) $(SRCDIR)/shun.c >$@
1906
1907 $(OBJDIR)/shun.o: $(OBJDIR)/shun_.c $(OBJDIR)/shun.h $(SRCDIR)/config.h
1908
--- win/Makefile.mingw.mistachkin
+++ win/Makefile.mingw.mistachkin
@@ -461,10 +461,11 @@
461 $(SRCDIR)/fshell.c \
462 $(SRCDIR)/fusefs.c \
463 $(SRCDIR)/glob.c \
464 $(SRCDIR)/graph.c \
465 $(SRCDIR)/gzip.c \
466 $(SRCDIR)/hname.c \
467 $(SRCDIR)/http.c \
468 $(SRCDIR)/http_socket.c \
469 $(SRCDIR)/http_ssl.c \
470 $(SRCDIR)/http_transport.c \
471 $(SRCDIR)/import.c \
@@ -511,10 +512,12 @@
512 $(SRCDIR)/rss.c \
513 $(SRCDIR)/schema.c \
514 $(SRCDIR)/search.c \
515 $(SRCDIR)/setup.c \
516 $(SRCDIR)/sha1.c \
517 $(SRCDIR)/sha1hard.c \
518 $(SRCDIR)/sha3.c \
519 $(SRCDIR)/shun.c \
520 $(SRCDIR)/sitemap.c \
521 $(SRCDIR)/skins.c \
522 $(SRCDIR)/sqlcmd.c \
523 $(SRCDIR)/stash.c \
@@ -636,10 +639,11 @@
639 $(OBJDIR)/fshell_.c \
640 $(OBJDIR)/fusefs_.c \
641 $(OBJDIR)/glob_.c \
642 $(OBJDIR)/graph_.c \
643 $(OBJDIR)/gzip_.c \
644 $(OBJDIR)/hname_.c \
645 $(OBJDIR)/http_.c \
646 $(OBJDIR)/http_socket_.c \
647 $(OBJDIR)/http_ssl_.c \
648 $(OBJDIR)/http_transport_.c \
649 $(OBJDIR)/import_.c \
@@ -686,10 +690,12 @@
690 $(OBJDIR)/rss_.c \
691 $(OBJDIR)/schema_.c \
692 $(OBJDIR)/search_.c \
693 $(OBJDIR)/setup_.c \
694 $(OBJDIR)/sha1_.c \
695 $(OBJDIR)/sha1hard_.c \
696 $(OBJDIR)/sha3_.c \
697 $(OBJDIR)/shun_.c \
698 $(OBJDIR)/sitemap_.c \
699 $(OBJDIR)/skins_.c \
700 $(OBJDIR)/sqlcmd_.c \
701 $(OBJDIR)/stash_.c \
@@ -760,10 +766,11 @@
766 $(OBJDIR)/fshell.o \
767 $(OBJDIR)/fusefs.o \
768 $(OBJDIR)/glob.o \
769 $(OBJDIR)/graph.o \
770 $(OBJDIR)/gzip.o \
771 $(OBJDIR)/hname.o \
772 $(OBJDIR)/http.o \
773 $(OBJDIR)/http_socket.o \
774 $(OBJDIR)/http_ssl.o \
775 $(OBJDIR)/http_transport.o \
776 $(OBJDIR)/import.o \
@@ -810,10 +817,12 @@
817 $(OBJDIR)/rss.o \
818 $(OBJDIR)/schema.o \
819 $(OBJDIR)/search.o \
820 $(OBJDIR)/setup.o \
821 $(OBJDIR)/sha1.o \
822 $(OBJDIR)/sha1hard.o \
823 $(OBJDIR)/sha3.o \
824 $(OBJDIR)/shun.o \
825 $(OBJDIR)/sitemap.o \
826 $(OBJDIR)/skins.o \
827 $(OBJDIR)/sqlcmd.o \
828 $(OBJDIR)/stash.o \
@@ -1095,10 +1104,11 @@
1104 $(OBJDIR)/fshell_.c:$(OBJDIR)/fshell.h \
1105 $(OBJDIR)/fusefs_.c:$(OBJDIR)/fusefs.h \
1106 $(OBJDIR)/glob_.c:$(OBJDIR)/glob.h \
1107 $(OBJDIR)/graph_.c:$(OBJDIR)/graph.h \
1108 $(OBJDIR)/gzip_.c:$(OBJDIR)/gzip.h \
1109 $(OBJDIR)/hname_.c:$(OBJDIR)/hname.h \
1110 $(OBJDIR)/http_.c:$(OBJDIR)/http.h \
1111 $(OBJDIR)/http_socket_.c:$(OBJDIR)/http_socket.h \
1112 $(OBJDIR)/http_ssl_.c:$(OBJDIR)/http_ssl.h \
1113 $(OBJDIR)/http_transport_.c:$(OBJDIR)/http_transport.h \
1114 $(OBJDIR)/import_.c:$(OBJDIR)/import.h \
@@ -1145,10 +1155,12 @@
1155 $(OBJDIR)/rss_.c:$(OBJDIR)/rss.h \
1156 $(OBJDIR)/schema_.c:$(OBJDIR)/schema.h \
1157 $(OBJDIR)/search_.c:$(OBJDIR)/search.h \
1158 $(OBJDIR)/setup_.c:$(OBJDIR)/setup.h \
1159 $(OBJDIR)/sha1_.c:$(OBJDIR)/sha1.h \
1160 $(OBJDIR)/sha1hard_.c:$(OBJDIR)/sha1hard.h \
1161 $(OBJDIR)/sha3_.c:$(OBJDIR)/sha3.h \
1162 $(OBJDIR)/shun_.c:$(OBJDIR)/shun.h \
1163 $(OBJDIR)/sitemap_.c:$(OBJDIR)/sitemap.h \
1164 $(OBJDIR)/skins_.c:$(OBJDIR)/skins.h \
1165 $(OBJDIR)/sqlcmd_.c:$(OBJDIR)/sqlcmd.h \
1166 $(OBJDIR)/stash_.c:$(OBJDIR)/stash.h \
@@ -1498,10 +1510,18 @@
1510
1511 $(OBJDIR)/gzip.o: $(OBJDIR)/gzip_.c $(OBJDIR)/gzip.h $(SRCDIR)/config.h
1512 $(XTCC) -o $(OBJDIR)/gzip.o -c $(OBJDIR)/gzip_.c
1513
1514 $(OBJDIR)/gzip.h: $(OBJDIR)/headers
1515
1516 $(OBJDIR)/hname_.c: $(SRCDIR)/hname.c $(TRANSLATE)
1517 $(TRANSLATE) $(SRCDIR)/hname.c >$@
1518
1519 $(OBJDIR)/hname.o: $(OBJDIR)/hname_.c $(OBJDIR)/hname.h $(SRCDIR)/config.h
1520 $(XTCC) -o $(OBJDIR)/hname.o -c $(OBJDIR)/hname_.c
1521
1522 $(OBJDIR)/hname.h: $(OBJDIR)/headers
1523
1524 $(OBJDIR)/http_.c: $(SRCDIR)/http.c $(TRANSLATE)
1525 $(TRANSLATE) $(SRCDIR)/http.c >$@
1526
1527 $(OBJDIR)/http.o: $(OBJDIR)/http_.c $(OBJDIR)/http.h $(SRCDIR)/config.h
@@ -1898,10 +1918,26 @@
1918
1919 $(OBJDIR)/sha1.o: $(OBJDIR)/sha1_.c $(OBJDIR)/sha1.h $(SRCDIR)/config.h
1920 $(XTCC) -o $(OBJDIR)/sha1.o -c $(OBJDIR)/sha1_.c
1921
1922 $(OBJDIR)/sha1.h: $(OBJDIR)/headers
1923
1924 $(OBJDIR)/sha1hard_.c: $(SRCDIR)/sha1hard.c $(TRANSLATE)
1925 $(TRANSLATE) $(SRCDIR)/sha1hard.c >$@
1926
1927 $(OBJDIR)/sha1hard.o: $(OBJDIR)/sha1hard_.c $(OBJDIR)/sha1hard.h $(SRCDIR)/config.h
1928 $(XTCC) -o $(OBJDIR)/sha1hard.o -c $(OBJDIR)/sha1hard_.c
1929
1930 $(OBJDIR)/sha1hard.h: $(OBJDIR)/headers
1931
1932 $(OBJDIR)/sha3_.c: $(SRCDIR)/sha3.c $(TRANSLATE)
1933 $(TRANSLATE) $(SRCDIR)/sha3.c >$@
1934
1935 $(OBJDIR)/sha3.o: $(OBJDIR)/sha3_.c $(OBJDIR)/sha3.h $(SRCDIR)/config.h
1936 $(XTCC) -o $(OBJDIR)/sha3.o -c $(OBJDIR)/sha3_.c
1937
1938 $(OBJDIR)/sha3.h: $(OBJDIR)/headers
1939
1940 $(OBJDIR)/shun_.c: $(SRCDIR)/shun.c $(TRANSLATE)
1941 $(TRANSLATE) $(SRCDIR)/shun.c >$@
1942
1943 $(OBJDIR)/shun.o: $(OBJDIR)/shun_.c $(OBJDIR)/shun.h $(SRCDIR)/config.h
1944
--- www/changes.wiki
+++ www/changes.wiki
@@ -1,6 +1,17 @@
11
<title>Change Log</title>
2
+
3
+<a name='v2_1'></a>
4
+<h2>Changes for Version 2.1 (2017-03-??)</h2>
5
+
6
+ * Add support for [./hashpolicy.wiki|hash policies] that control which
7
+ of the Hardened-SHA1 or SHA3-256 algorithms is used to name new
8
+ artifacts.
9
+ * Add the "gshow" and "gcat" subcommands to [/help?cmd=stash|fossil stash].
10
+ * Add the [/help?cmd=/juvlist|/juvlist] web page and use it to construct
11
+ the [/uv/download.html|Download Page] of the Fossil self-hosting website
12
+ using Ajax.
213
314
<a name='v2_0'></a>
415
<h2>Changes for Version 2.0 (2017-03-03)</h2>
516
617
* Use the
718
819
ADDED www/hashpolicy.wiki
--- www/changes.wiki
+++ www/changes.wiki
@@ -1,6 +1,17 @@
1 <title>Change Log</title>
2
3 <a name='v2_0'></a>
4 <h2>Changes for Version 2.0 (2017-03-03)</h2>
5
6 * Use the
7
8 ADDED www/hashpolicy.wiki
--- www/changes.wiki
+++ www/changes.wiki
@@ -1,6 +1,17 @@
1 <title>Change Log</title>
2
3 <a name='v2_1'></a>
4 <h2>Changes for Version 2.1 (2017-03-??)</h2>
5
6 * Add support for [./hashpolicy.wiki|hash policies] that control which
7 of the Hardened-SHA1 or SHA3-256 algorithms is used to name new
8 artifacts.
9 * Add the "gshow" and "gcat" subcommands to [/help?cmd=stash|fossil stash].
10 * Add the [/help?cmd=/juvlist|/juvlist] web page and use it to construct
11 the [/uv/download.html|Download Page] of the Fossil self-hosting website
12 using Ajax.
13
14 <a name='v2_0'></a>
15 <h2>Changes for Version 2.0 (2017-03-03)</h2>
16
17 * Use the
18
19 ADDED www/hashpolicy.wiki
--- a/www/hashpolicy.wiki
+++ b/www/hashpolicy.wiki
@@ -0,0 +1,20 @@
1
+<title>Hash Policy</title>
2
+
3
+<h2> Executive Summary, Orcutive Summary</h2>
4
+
5
+<b>Or: How To </h2>
6
+
7
+There i This Article</b>
8
+
9
+Thham now
10
+upgraded to
11
+change texpected to be
12
+replaced ot expected to be
13
+replaced until Ma
14
+out o
15
+Debian 9 is implement0 or later
16
+
17
+work and
18
+Hash Policy</title>
19
+
20
+<h2>< Introduction ha", not generic SHA1sequel
--- a/www/hashpolicy.wiki
+++ b/www/hashpolicy.wiki
@@ -0,0 +1,20 @@
1 <title>Hash Policy</title>
2
3 <h2> Executive Summary, Orcutive Summary</h2>
4
5 <b>Or: How To </h2>
6
7 There i This Article</b>
8
9 Thham now
10 upgraded to
11 change texpected to be
12 replaced ot expected to be
13 replaced until Ma
14 out o
15 Debian 9 is implement0 or later
16
17 work and
18 Hash Policy</title>
19
20 <h2>< Introduction ha", not generic SHA1sequel
--- www/mkdownload.tcl
+++ www/mkdownload.tcl
@@ -37,12 +37,12 @@
3737
set avers($version) 1
3838
}
3939
}
4040
close $in
4141
42
+set vdate(2.0) 2017-03-03
4243
set vdate(1.37) 2017-01-15
43
-set vdate(1.36) 2016-10-24
4444
4545
# Do all versions from newest to oldest
4646
#
4747
foreach vers [lsort -decr -real [array names avers]] {
4848
# set hr "../timeline?c=version-$vers;y=ci"
@@ -57,11 +57,11 @@
5757
puts $out "</b></center>"
5858
puts $out "</td></tr>"
5959
puts $out "<tr>"
6060
6161
foreach {prefix img desc} {
62
- fossil-linux-x86 linux.gif {Linux 3.x x86}
62
+ fossil-linux linux.gif {Linux 3.x x64}
6363
fossil-macosx mac.gif {Mac 10.x x86}
6464
fossil-openbsd-x86 openbsd.gif {OpenBSD 5.x x86}
6565
fossil-w32 win32.gif {Windows}
6666
fossil-src src.gif {Source Tarball}
6767
} {
6868
--- www/mkdownload.tcl
+++ www/mkdownload.tcl
@@ -37,12 +37,12 @@
37 set avers($version) 1
38 }
39 }
40 close $in
41
 
42 set vdate(1.37) 2017-01-15
43 set vdate(1.36) 2016-10-24
44
45 # Do all versions from newest to oldest
46 #
47 foreach vers [lsort -decr -real [array names avers]] {
48 # set hr "../timeline?c=version-$vers;y=ci"
@@ -57,11 +57,11 @@
57 puts $out "</b></center>"
58 puts $out "</td></tr>"
59 puts $out "<tr>"
60
61 foreach {prefix img desc} {
62 fossil-linux-x86 linux.gif {Linux 3.x x86}
63 fossil-macosx mac.gif {Mac 10.x x86}
64 fossil-openbsd-x86 openbsd.gif {OpenBSD 5.x x86}
65 fossil-w32 win32.gif {Windows}
66 fossil-src src.gif {Source Tarball}
67 } {
68
--- www/mkdownload.tcl
+++ www/mkdownload.tcl
@@ -37,12 +37,12 @@
37 set avers($version) 1
38 }
39 }
40 close $in
41
42 set vdate(2.0) 2017-03-03
43 set vdate(1.37) 2017-01-15
 
44
45 # Do all versions from newest to oldest
46 #
47 foreach vers [lsort -decr -real [array names avers]] {
48 # set hr "../timeline?c=version-$vers;y=ci"
@@ -57,11 +57,11 @@
57 puts $out "</b></center>"
58 puts $out "</td></tr>"
59 puts $out "<tr>"
60
61 foreach {prefix img desc} {
62 fossil-linux linux.gif {Linux 3.x x64}
63 fossil-macosx mac.gif {Mac 10.x x86}
64 fossil-openbsd-x86 openbsd.gif {OpenBSD 5.x x86}
65 fossil-w32 win32.gif {Windows}
66 fossil-src src.gif {Source Tarball}
67 } {
68
--- www/mkindex.tcl
+++ www/mkindex.tcl
@@ -36,10 +36,11 @@
 fiveminutes.wiki {Update and Running in 5 Minutes as a Single User}
 foss-cklist.wiki {Checklist For Successful Open-Source Projects}
 fossil-from-msvc.wiki {Integrating Fossil in the Microsoft Express 2010 IDE}
 fossil-v-git.wiki {Fossil Versus Git}
 hacker-howto.wiki {Hacker How-To}
+ hashpolicy.wiki {Hash Policy: Choosing Between SHA1 and SHA3-256}
 /help {Lists of Commands and Webpages}
 hints.wiki {Fossil Tips And Usage Hints}
 index.wiki {Home Page}
 inout.wiki {Import And Export To And From Git}
 makefile.wiki {The Fossil Build Process}
--- www/permutedindex.html
+++ www/permutedindex.html
@@ -29,10 +29,11 @@
 <li><a href="blame.wiki">Annotate/Blame Algorithm Of Fossil &mdash; The</a></li>
 <li><a href="customskin.md">Appearance of Web Pages &mdash; Theming: Customizing The</a></li>
 <li><a href="faq.wiki">Asked Questions &mdash; Frequently</a></li>
 <li><a href="password.wiki">Authentication &mdash; Password Management And</a></li>
 <li><a href="whyusefossil.wiki"><b>Benefits Of Version Control</b></a></li>
+<li><a href="hashpolicy.wiki">Between SHA1 and SHA3-256 &mdash; Hash Policy: Choosing</a></li>
 <li><a href="antibot.wiki">Bots &mdash; Defense against Spiders and</a></li>
 <li><a href="private.wiki">Branches &mdash; Creating, Syncing, and Deleting Private</a></li>
 <li><a href="branching.wiki"><b>Branching, Forking, Merging, and Tagging</b></a></li>
 <li><a href="bugtheory.wiki"><b>Bug Tracking In Fossil</b></a></li>
 <li><a href="makefile.wiki">Build Process &mdash; The Fossil</a></li>
@@ -43,10 +44,11 @@
 <li><a href="checkin.wiki">Checklist &mdash; Check-in</a></li>
 <li><a href="../test/release-checklist.wiki">Checklist &mdash; Pre-Release Testing</a></li>
 <li><a href="foss-cklist.wiki"><b>Checklist For Successful Open-Source Projects</b></a></li>
 <li><a href="selfcheck.wiki">Checks &mdash; Fossil Repository Integrity Self</a></li>
 <li><a href="childprojects.wiki"><b>Child Projects</b></a></li>
+<li><a href="hashpolicy.wiki">Choosing Between SHA1 and SHA3-256 &mdash; Hash Policy:</a></li>
 <li><a href="contribute.wiki">Code or Documentation To The Fossil Project &mdash; Contributing</a></li>
 <li><a href="style.wiki">Code Style Guidelines &mdash; Source</a></li>
 <li><a href="../../../help">Commands and Webpages &mdash; Lists of</a></li>
 <li><a href="build.wiki"><b>Compiling and Installing Fossil</b></a></li>
 <li><a href="concepts.wiki">Concepts &mdash; Fossil Core</a></li>
@@ -111,10 +113,11 @@
 <li><a href="customgraph.md">Graph &mdash; Theming: Customizing the Timeline</a></li>
 <li><a href="quickstart.wiki">Guide &mdash; Fossil Quick Start</a></li>
 <li><a href="style.wiki">Guidelines &mdash; Source Code Style</a></li>
 <li><a href="hacker-howto.wiki"><b>Hacker How-To</b></a></li>
 <li><a href="adding_code.wiki"><b>Hacking Fossil</b></a></li>
+<li><a href="hashpolicy.wiki"><b>Hash Policy: Choosing Between SHA1 and SHA3-256</b></a></li>
 <li><a href="hints.wiki">Hints &mdash; Fossil Tips And Usage</a></li>
 <li><a href="index.wiki"><b>Home Page</b></a></li>
 <li><a href="selfhost.wiki">Hosting Repositories &mdash; Fossil Self</a></li>
 <li><a href="aboutcgi.wiki"><b>How CGI Works In Fossil</b></a></li>
 <li><a href="server.wiki"><b>How To Configure A Fossil Server</b></a></li>
@@ -147,10 +150,11 @@
 <li><a href="index.wiki">Page &mdash; Home</a></li>
 <li><a href="customskin.md">Pages &mdash; Theming: Customizing The Appearance of Web</a></li>
 <li><a href="password.wiki"><b>Password Management And Authentication</b></a></li>
 <li><a href="quotes.wiki">People Are Saying About Fossil, Git, and DVCSes in General &mdash; Quotes: What</a></li>
 <li><a href="stats.wiki"><b>Performance Statistics</b></a></li>
+<li><a href="hashpolicy.wiki">Policy: Choosing Between SHA1 and SHA3-256 &mdash; Hash</a></li>
 <li><a href="../test/release-checklist.wiki"><b>Pre-Release Testing Checklist</b></a></li>
 <li><a href="pop.wiki"><b>Principles Of Operation</b></a></li>
 <li><a href="private.wiki">Private Branches &mdash; Creating, Syncing, and Deleting</a></li>
 <li><a href="makefile.wiki">Process &mdash; The Fossil Build</a></li>
 <li><a href="contribute.wiki">Project &mdash; Contributing Code or Documentation To The Fossil</a></li>
@@ -174,10 +178,12 @@
 <li><a href="th1.md">Scripting Language &mdash; The TH1</a></li>
 <li><a href="selfcheck.wiki">Self Checks &mdash; Fossil Repository Integrity</a></li>
 <li><a href="selfhost.wiki">Self Hosting Repositories &mdash; Fossil</a></li>
 <li><a href="server.wiki">Server &mdash; How To Configure A Fossil</a></li>
 <li><a href="settings.wiki">Settings &mdash; Fossil</a></li>
+<li><a href="hashpolicy.wiki">SHA1 and SHA3-256 &mdash; Hash Policy: Choosing Between</a></li>
+<li><a href="hashpolicy.wiki">SHA3-256 &mdash; Hash Policy: Choosing Between SHA1 and</a></li>
 <li><a href="shunning.wiki"><b>Shunning: Deleting Content From Fossil</b></a></li>
 <li><a href="fiveminutes.wiki">Single User &mdash; Update and Running in 5 Minutes as a</a></li>
 <li><a href="../../../sitemap"><b>Site Map</b></a></li>
 <li><a href="style.wiki"><b>Source Code Style Guidelines</b></a></li>
 <li><a href="antibot.wiki">Spiders and Bots &mdash; Defense against</a></li>
