Fossil SCM

Merged from trunk to verify fix in [62352847].

rberteig 2017-03-13 21:53 rkb-2.0-tests merge

Commit 4077357a38ddb2f17c466960c473fb2e1e1408d10f774b28ef0104519fa4f580

Parent 5ee57d84b07781a…

29 files changed +1 -1 -209 -630 -972 -687 -103 +1 +1 +4 +17 -3 +21 -7 +1 -1 +90 +137 -20 +4 -1 +3 -3 +1 +3 +35 -32 +6 -1 +58 -1 +1 -1 +1 +36 +11 +20 +2 -2 +1 +6

~ VERSION
- compat/zlib/doc/algorithm.txt
- compat/zlib/doc/rfc1950.txt
- compat/zlib/doc/rfc1951.txt
- compat/zlib/doc/rfc1952.txt
- compat/zlib/doc/txtvsbin.txt
~ src/clone.c
~ src/configure.c
~ src/content.c
~ src/db.c
~ src/diffcmd.c
~ src/doc.c
~ src/encode.c
~ src/hname.c
~ src/main.c
~ src/sha3.c
~ src/shun.c
~ src/sqlcmd.c
~ src/stash.c
~ src/stat.c
~ src/unversioned.c
~ src/wiki.c
~ src/xfer.c
~ win/Makefile.mingw.mistachkin
~ www/changes.wiki
~ www/hashpolicy.wiki
~ www/mkdownload.tcl
~ www/mkindex.tcl
~ www/permutedindex.html

M VERSION

+1 -1

    --- VERSION
    +++ VERSION
    @@ -1,1 +1,1 @@
    -2.0
    +2.1

D compat/zlib/doc/algorithm.txt

-209

    --- a/compat/zlib/doc/algorithm.txt
    +++ b/compat/zlib/doc/algorithm.txt
    @@ -1,209 +0,0 @@
    -1. Compression algorithm (deflate)
    -
    -The deflation algorithm used by gzip (also zip and zlib) is a variation of
    -LZ77 (Lempel-Ziv 1977, see reference below). It finds duplicated strings in
    -the input data. The second occurrence of a string is replaced by a
    -pointer to the previous string, in the form of a pair (distance,
    -length). Distances are limited to 32K bytes, and lengths are limited
    -to 258 bytes. When a string does not occur anywhere in the previous
    -32K bytes, it is emitted as a sequence of literal bytes. (In this
    -description, `string' must be taken as an arbitrary sequence of bytes,
    -and is not restricted to printable characters.)
    -
    -Literals or match lengths are compressed with one Huffman tree, and
    -match distances are compressed with another tree. The trees are stored
    -in a compact form at the start of each block. The blocks can have any
    -size (except that the compressed data for one block must fit in
    -available memory). A block is terminated when deflate() determines that
    -it would be useful to start another block with fresh trees. (This is
    -somewhat similar to the behavior of LZW-based _compress_.)
    -
    -Duplicated strings are found using a hash table. All input strings of
    -length 3 are inserted in the hash table. A hash index is computed for
    -the next 3 bytes. If the hash chain for this index is not empty, all
    -strings in the chain are compared with the current input string, and
    -the longest match is selected.
    -
    -The hash chains are searched starting with the most recent strings, to
    -favor small distances and thus take advantage of the Huffman encoding.
    -The hash chains are singly linked. There are no deletions from the
    -hash chains, the algorithm simply discards matches that are too old.
    -
    -To avoid a worst-case situation, very long hash chains are arbitrarily
    -truncated at a certain length, determined by a runtime option (level
    -parameter of deflateInit). So deflate() does not always find the longest
    -possible match but generally finds a match which is long enough.
    -
    -deflate() also defers the selection of matches with a lazy evaluation
    -mechanism. After a match of length N has been found, deflate() searches for
    -a longer match at the next input byte. If a longer match is found, the
    -previous match is truncated to a length of one (thus producing a single
    -literal byte) and the process of lazy evaluation begins again. Otherwise,
    -the original match is kept, and the next match search is attempted only N
    -steps later.
    -
    -The lazy match evaluation is also subject to a runtime parameter. If
    -the current match is long enough, deflate() reduces the search for a longer
    -match, thus speeding up the whole process. If compression ratio is more
    -important than speed, deflate() attempts a complete second search even if
    -the first match is already long enough.
    -
    -The lazy match evaluation is not performed for the fastest compression
    -modes (level parameter 1 to 3). For these fast modes, new strings
    -are inserted in the hash table only when no match was found, or
    -when the match is not too long. This degrades the compression ratio
    -but saves time since there are both fewer insertions and fewer searches.
    -
    -
    -2. Decompression algorithm (inflate)
    -
    -2.1 Introduction
    -
    -The key question is how to represent a Huffman code (or any prefix code) so
    -that you can decode fast. The most important characteristic is that shorter
    -codes are much more common than longer codes, so pay attention to decoding the
    -short codes fast, and let the long codes take longer to decode.
    -
    -inflate() sets up a first level table that covers some number of bits of
    -input less than the length of longest code. It gets that many bits from the
    -stream, and looks it up in the table. The table will tell if the next
    -code is that many bits or less and how many, and if it is, it will tell
    -the value, else it will point to the next level table for which inflate()
    -grabs more bits and tries to decode a longer code.
    -
    -How many bits to make the first lookup is a tradeoff between the time it
    -takes to decode and the time it takes to build the table. If building the
    -table took no time (and if you had infinite memory), then there would only
    -be a first level table to cover all the way to the longest code. However,
    -building the table ends up taking a lot longer for more bits since short
    -codes are replicated many times in such a table. What inflate() does is
    -simply to make the number of bits in the first table a variable, and then
    -to set that variable for the maximum speed.
    -
    -For inflate, which has 286 possible codes for the literal/length tree, the size
    -of the first table is nine bits. Also the distance trees have 30 possible
    -values, and the size of the first table is six bits. Note that for each of
    -those cases, the table ended up one bit longer than the ``average'' code
    -length, i.e. the code length of an approximately flat code which would be a
    -little more than eight bits for 286 symbols and a little less than five bits
    -for 30 symbols.
    -
    -
    -2.2 More details on the inflate table lookup
    -
    -Ok, you want to know what this cleverly obfuscated inflate tree actually
    -looks like. You are correct that it's not a Huffman tree. It is simply a
    -lookup table for the first, let's say, nine bits of a Huffman symbol. The
    -symbol could be as short as one bit or as long as 15 bits. If a particular
    -symbol is shorter than nine bits, then that symbol's translation is duplicated
    -in all those entries that start with that symbol's bits. For example, if the
    -symbol is four bits, then it's duplicated 32 times in a nine-bit table. If a
    -symbol is nine bits long, it appears in the table once.
    -
    -If the symbol is longer than nine bits, then that entry in the table points
    -to another similar table for the remaining bits. Again, there are duplicated
    -entries as needed. The idea is that most of the time the symbol will be short
    -and there will only be one table look up. (That's whole idea behind data
    -compression in the first place.) For the less frequent long symbols, there
    -will be two lookups. If you had a compression method with really long
    -symbols, you could have as many levels of lookups as is efficient. For
    -inflate, two is enough.
    -
    -So a table entry either points to another table (in which case nine bits in
    -the above example are gobbled), or it contains the translation for the symbol
    -and the number of bits to gobble. Then you start again with the next
    -ungobbled bit.
    -
    -You may wonder: why not just have one lookup table for how ever many bits the
    -longest symbol is? The reason is that if you do that, you end up spending
    -more time filling in duplicate symbol entries than you do actually decoding.
    -At least for deflate's output that generates new trees every several 10's of
    -kbytes. You can imagine that filling in a 2^15 entry table for a 15-bit code
    -would take too long if you're only decoding several thousand symbols. At the
    -other extreme, you could make a new table for every bit in the code. In fact,
    -that's essentially a Huffman tree. But then you spend too much time
    -traversing the tree while decoding, even for short symbols.
    -
    -So the number of bits for the first lookup table is a trade of the time to
    -fill out the table vs. the time spent looking at the second level and above of
    -the table.
    -
    -Here is an example, scaled down:
    -
    -The code being decoded, with 10 symbols, from 1 to 6 bits long:
    -
    -A: 0
    -B: 10
    -C: 1100
    -D: 11010
    -E: 11011
    -F: 11100
    -G: 11101
    -H: 11110
    -I: 111110
    -J: 111111
    -
    -Let's make the first table three bits long (eight entries):
    -
    -000: A,1
    -001: A,1
    -010: A,1
    -011: A,1
    -100: B,2
    -101: B,2
    -110: -> table X (gobble 3 bits)
    -111: -> table Y (gobble 3 bits)
    -
    -Each entry is what the bits decode as and how many bits that is, i.e. how
    -many bits to gobble. Or the entry points to another table, with the number of
    -bits to gobble implicit in the size of the table.
    -
    -Table X is two bits long since the longest code starting with 110 is five bits
    -long:
    -
    -00: C,1
    -01: C,1
    -10: D,2
    -11: E,2
    -
    -Table Y is three bits long since the longest code starting with 111 is six
    -bits long:
    -
    -000: F,2
    -001: F,2
    -010: G,2
    -011: G,2
    -100: H,2
    -101: H,2
    -110: I,3
    -111: J,3
    -
    -So what we have here are three tables with a total of 20 entries that had to
    -be constructed. That's compared to 64 entries for a single table. Or
    -compared to 16 entries for a Huffman tree (six two entry tables and one four
    -entry table). Assuming that the code ideally represents the probability of
    -the symbols, it takes on the average 1.25 lookups per symbol. That's compared
    -to one lookup for the single table, or 1.66 lookups per symbol for the
    -Huffman tree.
    -
    -There, I think that gives you a picture of what's going on. For inflate, the
    -meaning of a particular symbol is often more than just a letter. It can be a
    -byte (a "literal"), or it can be either a length or a distance which
    -indicates a base value and a number of bits to fetch after the code that is
    -added to the base value. Or it might be the special end-of-block code. The
    -data structures created in inftrees.c try to encode all that information
    -compactly in the tables.
    -
    -
    -Jean-loup Gailly        Mark Adler
    -[email protected]          [email protected]
    -
    -
    -References:
    -
    -[LZ77] Ziv J., Lempel A., ``A Universal Algorithm for Sequential Data
    -Compression,'' IEEE Transactions on Information Theory, Vol. 23, No. 3,
    -pp. 337-343.
    -
    -``DEFLATE Compressed Data Format Specification'' available in
    -http://tools.ietf.org/html/rfc1951


D compat/zlib/doc/rfc1950.txt

-630

		--- a/compat/zlib/doc/rfc1950.txt
		+++ b/compat/zlib/doc/rfc1950.txt
		@@ -1,630 +0,0 @@
1		-
2		-
3		-
4		-
5		-
6		-
7		-Network Working Group P. Deutsch
8		-Request for Comments: 1950 Aladdin Enterprises
9		-Category: Informational J-L. Gailly
10		- Info-ZIP
11		- May 1996
12		-
13		-
14		- ZLIB Compressed Data Format Specification version 3.3
15		-
16		-Status of This Memo
17		-
18		- This memo provides information for the Internet community. This memo
19		- does not specify an Internet standard of any kind. Distribution of
20		- this memo is unlimited.
21		-
22		-IESG Note:
23		-
24		- The IESG takes no position on the validity of any Intellectual
25		- Property Rights statements contained in this document.
26		-
27		-Notices
28		-
29		- Copyright (c) 1996 L. Peter Deutsch and Jean-Loup Gailly
30		-
31		- Permission is granted to copy and distribute this document for any
32		- purpose and without charge, including translations into other
33		- languages and incorporation into compilations, provided that the
34		- copyright notice and this notice are preserved, and that any
35		- substantive changes or deletions from the original are clearly
36		- marked.
37		-
38		- A pointer to the latest version of this and related documentation in
39		- HTML format can be found at the URL
40		- <ftp://ftp.uu.net/graphics/png/documents/zlib/zdoc-index.html>.
41		-
42		-Abstract
43		-
44		- This specification defines a lossless compressed data format. The
45		- data can be produced or consumed, even for an arbitrarily long
46		- sequentially presented input data stream, using only an a priori
47		- bounded amount of intermediate storage. The format presently uses
48		- the DEFLATE compression method but can be easily extended to use
49		- other compression methods. It can be implemented readily in a manner
50		- not covered by patents. This specification also defines the ADLER-32
51		- checksum (an extension and improvement of the Fletcher checksum),
52		- used for detection of data corruption, and provides an algorithm for
53		- computing it.
54		-
55		-
56		-
57		-
58		-Deutsch & Gailly Informational [Page 1]
59		-
60		-
61		-RFC 1950 ZLIB Compressed Data Format Specification May 1996
62		-
63		-
64		-Table of Contents
65		-
66		- 1. Introduction ................................................... 2
67		- 1.1. Purpose ................................................... 2
68		- 1.2. Intended audience ......................................... 3
69		- 1.3. Scope ..................................................... 3
70		- 1.4. Compliance ................................................ 3
71		- 1.5. Definitions of terms and conventions used ................ 3
72		- 1.6. Changes from previous versions ............................ 3
73		- 2. Detailed specification ......................................... 3
74		- 2.1. Overall conventions ....................................... 3
75		- 2.2. Data format ............................................... 4
76		- 2.3. Compliance ................................................ 7
77		- 3. References ..................................................... 7
78		- 4. Source code .................................................... 8
79		- 5. Security Considerations ........................................ 8
80		- 6. Acknowledgements ............................................... 8
81		- 7. Authors' Addresses ............................................. 8
82		- 8. Appendix: Rationale ............................................ 9
83		- 9. Appendix: Sample code ..........................................10
84		-
85		-1. Introduction
86		-
87		- 1.1. Purpose
88		-
89		- The purpose of this specification is to define a lossless
90		- compressed data format that:
91		-
92		- * Is independent of CPU type, operating system, file system,
93		- and character set, and hence can be used for interchange;
94		-
95		- * Can be produced or consumed, even for an arbitrarily long
96		- sequentially presented input data stream, using only an a
97		- priori bounded amount of intermediate storage, and hence can
98		- be used in data communications or similar structures such as
99		- Unix filters;
100		-
101		- * Can use a number of different compression methods;
102		-
103		- * Can be implemented readily in a manner not covered by
104		- patents, and hence can be practiced freely.
105		-
106		- The data format defined by this specification does not attempt to
107		- allow random access to compressed data.
108		-
109		-
110		-
111		-
112		-
113		-
114		-
115		-Deutsch & Gailly Informational [Page 2]
116		-
117		-
118		-RFC 1950 ZLIB Compressed Data Format Specification May 1996
119		-
120		-
121		- 1.2. Intended audience
122		-
123		- This specification is intended for use by implementors of software
124		- to compress data into zlib format and/or decompress data from zlib
125		- format.
126		-
127		- The text of the specification assumes a basic background in
128		- programming at the level of bits and other primitive data
129		- representations.
130		-
131		- 1.3. Scope
132		-
133		- The specification specifies a compressed data format that can be
134		- used for in-memory compression of a sequence of arbitrary bytes.
135		-
136		- 1.4. Compliance
137		-
138		- Unless otherwise indicated below, a compliant decompressor must be
139		- able to accept and decompress any data set that conforms to all
140		- the specifications presented here; a compliant compressor must
141		- produce data sets that conform to all the specifications presented
142		- here.
143		-
144		- 1.5. Definitions of terms and conventions used
145		-
146		- byte: 8 bits stored or transmitted as a unit (same as an octet).
147		- (For this specification, a byte is exactly 8 bits, even on
148		- machines which store a character on a number of bits different
149		- from 8.) See below, for the numbering of bits within a byte.
150		-
151		- 1.6. Changes from previous versions
152		-
153		- Version 3.1 was the first public release of this specification.
154		- In version 3.2, some terminology was changed and the Adler-32
155		- sample code was rewritten for clarity. In version 3.3, the
156		- support for a preset dictionary was introduced, and the
157		- specification was converted to RFC style.
158		-
159		-2. Detailed specification
160		-
161		- 2.1. Overall conventions
162		-
163		- In the diagrams below, a box like this:
164		-
165		- +---+
166		- \| \| <-- the vertical bars might be missing
167		- +---+
168		-
169		-
170		-
171		-
172		-Deutsch & Gailly Informational [Page 3]
173		-
174		-
175		-RFC 1950 ZLIB Compressed Data Format Specification May 1996
176		-
177		-
178		- represents one byte; a box like this:
179		-
180		- +==============+
181		- \| \|
182		- +==============+
183		-
184		- represents a variable number of bytes.
185		-
186		- Bytes stored within a computer do not have a "bit order", since
187		- they are always treated as a unit. However, a byte considered as
188		- an integer between 0 and 255 does have a most- and least-
189		- significant bit, and since we write numbers with the most-
190		- significant digit on the left, we also write bytes with the most-
191		- significant bit on the left. In the diagrams below, we number the
192		- bits of a byte so that bit 0 is the least-significant bit, i.e.,
193		- the bits are numbered:
194		-
195		- +--------+
196		- \|76543210\|
197		- +--------+
198		-
199		- Within a computer, a number may occupy multiple bytes. All
200		- multi-byte numbers in the format described here are stored with
201		- the MOST-significant byte first (at the lower memory address).
202		- For example, the decimal number 520 is stored as:
203		-
204		- 0 1
205		- +--------+--------+
206		- \|00000010\|00001000\|
207		- +--------+--------+
208		- ^ ^
209		- \| \|
210		- \| + less significant byte = 8
211		- + more significant byte = 2 x 256
212		-
213		- 2.2. Data format
214		-
215		- A zlib stream has the following structure:
216		-
217		- 0 1
218		- +---+---+
219		- \|CMF\|FLG\| (more-->)
220		- +---+---+
221		-
222		-
223		-
224		-
225		-
226		-
227		-
228		-
229		-Deutsch & Gailly Informational [Page 4]
230		-
231		-
232		-RFC 1950 ZLIB Compressed Data Format Specification May 1996
233		-
234		-
235		- (if FLG.FDICT set)
236		-
237		- 0 1 2 3
238		- +---+---+---+---+
239		- \| DICTID \| (more-->)
240		- +---+---+---+---+
241		-
242		- +=====================+---+---+---+---+
243		- \|...compressed data...\| ADLER32 \|
244		- +=====================+---+---+---+---+
245		-
246		- Any data which may appear after ADLER32 are not part of the zlib
247		- stream.
248		-
249		- CMF (Compression Method and flags)
250		- This byte is divided into a 4-bit compression method and a 4-
251		- bit information field depending on the compression method.
252		-
253		- bits 0 to 3 CM Compression method
254		- bits 4 to 7 CINFO Compression info
255		-
256		- CM (Compression method)
257		- This identifies the compression method used in the file. CM = 8
258		- denotes the "deflate" compression method with a window size up
259		- to 32K. This is the method used by gzip and PNG (see
260		- references [1] and [2] in Chapter 3, below, for the reference
261		- documents). CM = 15 is reserved. It might be used in a future
262		- version of this specification to indicate the presence of an
263		- extra field before the compressed data.
264		-
265		- CINFO (Compression info)
266		- For CM = 8, CINFO is the base-2 logarithm of the LZ77 window
267		- size, minus eight (CINFO=7 indicates a 32K window size). Values
268		- of CINFO above 7 are not allowed in this version of the
269		- specification. CINFO is not defined in this specification for
270		- CM not equal to 8.
271		-
272		- FLG (FLaGs)
273		- This flag byte is divided as follows:
274		-
275		- bits 0 to 4  FCHECK  (check bits for CMF and FLG)
276		- bit  5       FDICT   (preset dictionary)
277		- bits 6 to 7  FLEVEL  (compression level)
278		-
279		- The FCHECK value must be such that CMF and FLG, when viewed as
280		- a 16-bit unsigned integer stored in MSB order (CMF*256 + FLG),
281		- is a multiple of 31.
282		-
292		- FDICT (Preset dictionary)
293		- If FDICT is set, a DICT dictionary identifier is present
294		- immediately after the FLG byte. The dictionary is a sequence of
295		- bytes which are initially fed to the compressor without
296		- producing any compressed output. DICT is the Adler-32 checksum
297		- of this sequence of bytes (see the definition of ADLER32
298		- below). The decompressor can use this identifier to determine
299		- which dictionary has been used by the compressor.
300		-
301		- FLEVEL (Compression level)
302		- These flags are available for use by specific compression
303		- methods. The "deflate" method (CM = 8) sets these flags as
304		- follows:
305		-
306		- 0 - compressor used fastest algorithm
307		- 1 - compressor used fast algorithm
308		- 2 - compressor used default algorithm
309		- 3 - compressor used maximum compression, slowest algorithm
310		-
311		- The information in FLEVEL is not needed for decompression; it
312		- is there to indicate if recompression might be worthwhile.
313		-
314		- compressed data
315		- For compression method 8, the compressed data is stored in the
316		- deflate compressed data format as described in the document
317		- "DEFLATE Compressed Data Format Specification" by L. Peter
318		- Deutsch. (See reference [3] in Chapter 3, below)
319		-
320		- Other compressed data formats are not specified in this version
321		- of the zlib specification.
322		-
323		- ADLER32 (Adler-32 checksum)
324		- This contains a checksum value of the uncompressed data
325		- (excluding any dictionary data) computed according to Adler-32
326		- algorithm. This algorithm is a 32-bit extension and improvement
327		- of the Fletcher algorithm, used in the ITU-T X.224 / ISO 8073
328		- standard. See references [4] and [5] in Chapter 3, below)
329		-
330		- Adler-32 is composed of two sums accumulated per byte: s1 is
331		- the sum of all bytes, s2 is the sum of all s1 values. Both sums
332		- are done modulo 65521. s1 is initialized to 1, s2 to zero. The
333		- Adler-32 checksum is stored as s2*65536 + s1 in most-
334		- significant-byte first (network) order.
335		-
349		- 2.3. Compliance
350		-
351		- A compliant compressor must produce streams with correct CMF, FLG
352		- and ADLER32, but need not support preset dictionaries. When the
353		- zlib data format is used as part of another standard data format,
354		- the compressor may use only preset dictionaries that are specified
355		- by this other data format. If this other format does not use the
356		- preset dictionary feature, the compressor must not set the FDICT
357		- flag.
358		-
359		- A compliant decompressor must check CMF, FLG, and ADLER32, and
360		- provide an error indication if any of these have incorrect values.
361		- A compliant decompressor must give an error indication if CM is
362		- not one of the values defined in this specification (only the
363		- value 8 is permitted in this version), since another value could
364		- indicate the presence of new features that would cause subsequent
365		- data to be interpreted incorrectly. A compliant decompressor must
366		- give an error indication if FDICT is set and DICTID is not the
367		- identifier of a known preset dictionary. A decompressor may
368		- ignore FLEVEL and still be compliant. When the zlib data format
369		- is being used as a part of another standard format, a compliant
370		- decompressor must support all the preset dictionaries specified by
371		- the other format. When the other format does not use the preset
372		- dictionary feature, a compliant decompressor must reject any
373		- stream in which the FDICT flag is set.
374		-
375		-3. References
376		-
377		- [1] Deutsch, L.P.,"GZIP Compressed Data Format Specification",
378		- available in ftp://ftp.uu.net/pub/archiving/zip/doc/
379		-
380		- [2] Thomas Boutell, "PNG (Portable Network Graphics) specification",
381		- available in ftp://ftp.uu.net/graphics/png/documents/
382		-
383		- [3] Deutsch, L.P.,"DEFLATE Compressed Data Format Specification",
384		- available in ftp://ftp.uu.net/pub/archiving/zip/doc/
385		-
386		- [4] Fletcher, J. G., "An Arithmetic Checksum for Serial
387		- Transmissions," IEEE Transactions on Communications, Vol. COM-30,
388		- No. 1, January 1982, pp. 247-252.
389		-
390		- [5] ITU-T Recommendation X.224, Annex D, "Checksum Algorithms,"
391		- November, 1993, pp. 144, 145. (Available from
392		- gopher://info.itu.ch). ITU-T X.244 is also the same as ISO 8073.
393		-
406		-4. Source code
407		-
408		- Source code for a C language implementation of a "zlib" compliant
409		- library is available at ftp://ftp.uu.net/pub/archiving/zip/zlib/.
410		-
411		-5. Security Considerations
412		-
413		- A decoder that fails to check the ADLER32 checksum value may be
414		- subject to undetected data corruption.
415		-
416		-6. Acknowledgements
417		-
418		- Trademarks cited in this document are the property of their
419		- respective owners.
420		-
421		- Jean-Loup Gailly and Mark Adler designed the zlib format and wrote
422		- the related software described in this specification. Glenn
423		- Randers-Pehrson converted this document to RFC and HTML format.
424		-
425		-7. Authors' Addresses
426		-
427		- L. Peter Deutsch
428		- Aladdin Enterprises
429		- 203 Santa Margarita Ave.
430		- Menlo Park, CA 94025
431		-
432		- Phone: (415) 322-0103 (AM only)
433		- FAX: (415) 322-1734
434		- EMail: <[email protected]>
435		-
436		-
437		- Jean-Loup Gailly
438		-
439		- EMail: <[email protected]>
440		-
441		- Questions about the technical content of this specification can be
442		- sent by email to
443		-
444		- Jean-Loup Gailly <[email protected]> and
445		- Mark Adler <[email protected]>
446		-
447		- Editorial comments on this specification can be sent by email to
448		-
449		- L. Peter Deutsch <[email protected]> and
450		- Glenn Randers-Pehrson <[email protected]>
451		-
463		-8. Appendix: Rationale
464		-
465		- 8.1. Preset dictionaries
466		-
467		- A preset dictionary is specially useful to compress short input
468		- sequences. The compressor can take advantage of the dictionary
469		- context to encode the input in a more compact manner. The
470		- decompressor can be initialized with the appropriate context by
471		- virtually decompressing a compressed version of the dictionary
472		- without producing any output. However for certain compression
473		- algorithms such as the deflate algorithm this operation can be
474		- achieved without actually performing any decompression.
475		-
476		- The compressor and the decompressor must use exactly the same
477		- dictionary. The dictionary may be fixed or may be chosen among a
478		- certain number of predefined dictionaries, according to the kind
479		- of input data. The decompressor can determine which dictionary has
480		- been chosen by the compressor by checking the dictionary
481		- identifier. This document does not specify the contents of
482		- predefined dictionaries, since the optimal dictionaries are
483		- application specific. Standard data formats using this feature of
484		- the zlib specification must precisely define the allowed
485		- dictionaries.
486		-
487		- 8.2. The Adler-32 algorithm
488		-
489		- The Adler-32 algorithm is much faster than the CRC32 algorithm yet
490		- still provides an extremely low probability of undetected errors.
491		-
492		- The modulo on unsigned long accumulators can be delayed for 5552
493		- bytes, so the modulo operation time is negligible. If the bytes
494		- are a, b, c, the second sum is 3a + 2b + c + 3, and so is position
495		- and order sensitive, unlike the first sum, which is just a
496		- checksum. That 65521 is prime is important to avoid a possible
497		- large class of two-byte errors that leave the check unchanged.
498		- (The Fletcher checksum uses 255, which is not prime and which also
499		- makes the Fletcher check insensitive to single byte changes 0 <->
500		- 255.)
501		-
502		- The sum s1 is initialized to 1 instead of zero to make the length
503		- of the sequence part of s2, so that the length does not have to be
504		- checked separately. (Any sequence of zeroes has a Fletcher
505		- checksum of zero.)
506		-
520		-9. Appendix: Sample code
521		-
522		- The following C code computes the Adler-32 checksum of a data buffer.
523		- It is written for clarity, not for speed. The sample code is in the
524		- ANSI C programming language. Non C users may find it easier to read
525		- with these hints:
526		-
527		- & Bitwise AND operator.
528		- >> Bitwise right shift operator. When applied to an
529		- unsigned quantity, as here, right shift inserts zero bit(s)
530		- at the left.
531		- << Bitwise left shift operator. Left shift inserts zero
532		- bit(s) at the right.
533		- ++ "n++" increments the variable n.
534		- % modulo operator: a % b is the remainder of a divided by b.
535		-
536		- #define BASE 65521 /* largest prime smaller than 65536 */
537		-
538		- /*
539		- Update a running Adler-32 checksum with the bytes buf[0..len-1]
540		- and return the updated checksum. The Adler-32 checksum should be
541		- initialized to 1.
542		-
543		- Usage example:
544		-
545		-   unsigned long adler = 1L;
546		-
547		-   while (read_buffer(buffer, length) != EOF) {
548		-     adler = update_adler32(adler, buffer, length);
549		-   }
550		-   if (adler != original_adler) error();
551		- */
552		- unsigned long update_adler32(unsigned long adler,
553		-    unsigned char *buf, int len)
554		- {
555		-   unsigned long s1 = adler & 0xffff;
556		-   unsigned long s2 = (adler >> 16) & 0xffff;
557		-   int n;
558		-
559		-   for (n = 0; n < len; n++) {
560		-     s1 = (s1 + buf[n]) % BASE;
561		-     s2 = (s2 + s1) % BASE;
562		-   }
563		-   return (s2 << 16) + s1;
564		- }
565		-
566		- /* Return the adler32 of the bytes buf[0..len-1] */
577		- unsigned long adler32(unsigned char *buf, int len)
578		- {
579		-   return update_adler32(1L, buf, len);
580		- }
581		-
630		-


D compat/zlib/doc/rfc1951.txt

-972

		--- a/compat/zlib/doc/rfc1951.txt
		+++ b/compat/zlib/doc/rfc1951.txt
		@@ -1,972 +0,0 @@
1		-
2		-
3		-
4		-
5		-
6		-
7		-Network Working Group P. Deutsch
8		-Request for Comments: 1951 Aladdin Enterprises
9		-Category: Informational May 1996
10		-
11		-
12		- DEFLATE Compressed Data Format Specification version 1.3
13		-
14		-Status of This Memo
15		-
16		- This memo provides information for the Internet community. This memo
17		- does not specify an Internet standard of any kind. Distribution of
18		- this memo is unlimited.
19		-
20		-IESG Note:
21		-
22		- The IESG takes no position on the validity of any Intellectual
23		- Property Rights statements contained in this document.
24		-
25		-Notices
26		-
27		- Copyright (c) 1996 L. Peter Deutsch
28		-
29		- Permission is granted to copy and distribute this document for any
30		- purpose and without charge, including translations into other
31		- languages and incorporation into compilations, provided that the
32		- copyright notice and this notice are preserved, and that any
33		- substantive changes or deletions from the original are clearly
34		- marked.
35		-
36		- A pointer to the latest version of this and related documentation in
37		- HTML format can be found at the URL
38		- <ftp://ftp.uu.net/graphics/png/documents/zlib/zdoc-index.html>.
39		-
40		-Abstract
41		-
42		- This specification defines a lossless compressed data format that
43		- compresses data using a combination of the LZ77 algorithm and Huffman
44		- coding, with efficiency comparable to the best currently available
45		- general-purpose compression methods. The data can be produced or
46		- consumed, even for an arbitrarily long sequentially presented input
47		- data stream, using only an a priori bounded amount of intermediate
48		- storage. The format can be implemented readily in a manner not
49		- covered by patents.
50		-
51		-
52		-
53		-
54		-
55		-
56		-
57		-
58		-Deutsch Informational [Page 1]
59		-
60		-
61		-RFC 1951 DEFLATE Compressed Data Format Specification May 1996
62		-
63		-
64		-Table of Contents
65		-
66		- 1. Introduction ................................................... 2
67		- 1.1. Purpose ................................................... 2
68		- 1.2. Intended audience ......................................... 3
69		- 1.3. Scope ..................................................... 3
70		- 1.4. Compliance ................................................ 3
71		- 1.5. Definitions of terms and conventions used ................ 3
72		- 1.6. Changes from previous versions ............................ 4
73		- 2. Compressed representation overview ............................. 4
74		- 3. Detailed specification ......................................... 5
75		- 3.1. Overall conventions ....................................... 5
76		- 3.1.1. Packing into bytes .................................. 5
77		- 3.2. Compressed block format ................................... 6
78		- 3.2.1. Synopsis of prefix and Huffman coding ............... 6
79		- 3.2.2. Use of Huffman coding in the "deflate" format ....... 7
80		- 3.2.3. Details of block format ............................. 9
81		- 3.2.4. Non-compressed blocks (BTYPE=00) ................... 11
82		- 3.2.5. Compressed blocks (length and distance codes) ...... 11
83		- 3.2.6. Compression with fixed Huffman codes (BTYPE=01) .... 12
84		- 3.2.7. Compression with dynamic Huffman codes (BTYPE=10) .. 13
85		- 3.3. Compliance ............................................... 14
86		- 4. Compression algorithm details ................................. 14
87		- 5. References .................................................... 16
88		- 6. Security Considerations ....................................... 16
89		- 7. Source code ................................................... 16
90		- 8. Acknowledgements .............................................. 16
91		- 9. Author's Address .............................................. 17
92		-
93		-1. Introduction
94		-
95		- 1.1. Purpose
96		-
97		- The purpose of this specification is to define a lossless
98		- compressed data format that:
99		- * Is independent of CPU type, operating system, file system,
100		- and character set, and hence can be used for interchange;
101		- * Can be produced or consumed, even for an arbitrarily long
102		- sequentially presented input data stream, using only an a
103		- priori bounded amount of intermediate storage, and hence
104		- can be used in data communications or similar structures
105		- such as Unix filters;
106		- * Compresses data with efficiency comparable to the best
107		- currently available general-purpose compression methods,
108		- and in particular considerably better than the "compress"
109		- program;
110		- * Can be implemented readily in a manner not covered by
111		- patents, and hence can be practiced freely;
121		- * Is compatible with the file format produced by the current
122		- widely used gzip utility, in that conforming decompressors
123		- will be able to read data produced by the existing gzip
124		- compressor.
125		-
126		- The data format defined by this specification does not attempt to:
127		-
128		- * Allow random access to compressed data;
129		- * Compress specialized data (e.g., raster graphics) as well
130		- as the best currently available specialized algorithms.
131		-
132		- A simple counting argument shows that no lossless compression
133		- algorithm can compress every possible input data set. For the
134		- format defined here, the worst case expansion is 5 bytes per 32K-
135		- byte block, i.e., a size increase of 0.015% for large data sets.
136		- English text usually compresses by a factor of 2.5 to 3;
137		- executable files usually compress somewhat less; graphical data
138		- such as raster images may compress much more.
139		-
140		- 1.2. Intended audience
141		-
142		- This specification is intended for use by implementors of software
143		- to compress data into "deflate" format and/or decompress data from
144		- "deflate" format.
145		-
146		- The text of the specification assumes a basic background in
147		- programming at the level of bits and other primitive data
148		- representations. Familiarity with the technique of Huffman coding
149		- is helpful but not required.
150		-
151		- 1.3. Scope
152		-
153		- The specification specifies a method for representing a sequence
154		- of bytes as a (usually shorter) sequence of bits, and a method for
155		- packing the latter bit sequence into bytes.
156		-
157		- 1.4. Compliance
158		-
159		- Unless otherwise indicated below, a compliant decompressor must be
160		- able to accept and decompress any data set that conforms to all
161		- the specifications presented here; a compliant compressor must
162		- produce data sets that conform to all the specifications presented
163		- here.
164		-
165		- 1.5. Definitions of terms and conventions used
166		-
167		- Byte: 8 bits stored or transmitted as a unit (same as an octet).
168		- For this specification, a byte is exactly 8 bits, even on machines
178		- which store a character on a number of bits different from eight.
179		- See below, for the numbering of bits within a byte.
180		-
181		- String: a sequence of arbitrary bytes.
182		-
183		- 1.6. Changes from previous versions
184		-
185		- There have been no technical changes to the deflate format since
186		- version 1.1 of this specification. In version 1.2, some
187		- terminology was changed. Version 1.3 is a conversion of the
188		- specification to RFC style.
189		-
190		-2. Compressed representation overview
191		-
192		- A compressed data set consists of a series of blocks, corresponding
193		- to successive blocks of input data. The block sizes are arbitrary,
194		- except that non-compressible blocks are limited to 65,535 bytes.
195		-
196		- Each block is compressed using a combination of the LZ77 algorithm
197		- and Huffman coding. The Huffman trees for each block are independent
198		- of those for previous or subsequent blocks; the LZ77 algorithm may
199		- use a reference to a duplicated string occurring in a previous block,
200		- up to 32K input bytes before.
201		-
202		- Each block consists of two parts: a pair of Huffman code trees that
203		- describe the representation of the compressed data part, and a
204		- compressed data part. (The Huffman trees themselves are compressed
205		- using Huffman encoding.) The compressed data consists of a series of
206		- elements of two types: literal bytes (of strings that have not been
207		- detected as duplicated within the previous 32K input bytes), and
208		- pointers to duplicated strings, where a pointer is represented as a
209		- pair <length, backward distance>. The representation used in the
210		- "deflate" format limits distances to 32K bytes and lengths to 258
211		- bytes, but does not limit the size of a block, except for
212		- uncompressible blocks, which are limited as noted above.
213		-
214		- Each type of value (literals, distances, and lengths) in the
215		- compressed data is represented using a Huffman code, using one code
216		- tree for literals and lengths and a separate code tree for distances.
217		- The code trees for each block appear in a compact form just before
218		- the compressed data for that block.
219		-
235		-3. Detailed specification
236		-
237		- 3.1. Overall conventions
238		- In the diagrams below, a box like this:
239		-    +---+
240		-    |   |   <-- the vertical bars might be missing
241		-    +---+
242		-
243		- represents one byte; a box like this:
244		-
245		-    +==============+
246		-    |              |
247		-    +==============+
248		-
249		- represents a variable number of bytes.
250		-
251		- Bytes stored within a computer do not have a "bit order", since
252		- they are always treated as a unit. However, a byte considered as
253		- an integer between 0 and 255 does have a most- and least-
254		- significant bit, and since we write numbers with the most-
255		- significant digit on the left, we also write bytes with the most-
256		- significant bit on the left. In the diagrams below, we number the
257		- bits of a byte so that bit 0 is the least-significant bit, i.e.,
258		- the bits are numbered:
259		-
260		-    +--------+
261		-    |76543210|
262		-    +--------+
263		-
264		- Within a computer, a number may occupy multiple bytes. All
265		- multi-byte numbers in the format described here are stored with
266		- the least-significant byte first (at the lower memory address).
267		- For example, the decimal number 520 is stored as:
268		-
269		-         0        1
270		-     +--------+--------+
271		-     |00001000|00000010|
272		-     +--------+--------+
273		-      ^        ^
274		-      |        |
275		-      |        + more significant byte = 2 x 256
276		-      + less significant byte = 8
278		- 3.1.1. Packing into bytes
279		-
280		- This document does not address the issue of the order in which
281		- bits of a byte are transmitted on a bit-sequential medium,
282		- since the final data format described here is byte- rather than
292		- bit-oriented. However, we describe the compressed block format
293		- below, as a sequence of data elements of various bit
294		- lengths, not a sequence of bytes. We must therefore specify
295		- how to pack these data elements into bytes to form the final
296		- compressed byte sequence:
297		-
298		- * Data elements are packed into bytes in order of
299		- increasing bit number within the byte, i.e., starting
300		- with the least-significant bit of the byte.
301		- * Data elements other than Huffman codes are packed
302		- starting with the least-significant bit of the data
303		- element.
304		- * Huffman codes are packed starting with the most-
305		- significant bit of the code.
306		-
307		- In other words, if one were to print out the compressed data as
308		- a sequence of bytes, starting with the first byte at the
309		- right margin and proceeding to the left, with the most-
310		- significant bit of each byte on the left as usual, one would be
311		- able to parse the result from right to left, with fixed-width
312		- elements in the correct MSB-to-LSB order and Huffman codes in
313		- bit-reversed order (i.e., with the first bit of the code in the
314		- relative LSB position).
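The packing rules can be sketched with a tiny bit writer, assuming a zero-initialized output buffer. Ordinary data elements are written starting with their least-significant bit. A Huffman code must go out most-significant bit first, which is the same as bit-reversing it and then writing it LSB first. BitWriter, put_bits, reverse_bits and put_huffman are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* LSB-first bit writer; buf must start zeroed. */
typedef struct {
    unsigned char *buf;
    size_t bitpos;            /* number of bits written so far */
} BitWriter;

/* Write the low n bits of value, least-significant bit first, filling
   each byte from its least-significant bit upward. */
void put_bits(BitWriter *w, unsigned value, int n)
{
    for (int i = 0; i < n; i++) {
        if ((value >> i) & 1u)
            w->buf[w->bitpos >> 3] |= (unsigned char)(1u << (w->bitpos & 7));
        w->bitpos++;
    }
}

/* Reverse the order of the low n bits of code. */
unsigned reverse_bits(unsigned code, int n)
{
    unsigned r = 0;
    for (int i = 0; i < n; i++) {
        r = (r << 1) | (code & 1u);
        code >>= 1;
    }
    return r;
}

/* Huffman codes are packed starting with their most-significant bit. */
void put_huffman(BitWriter *w, unsigned code, int len)
{
    assert(len > 0 && len <= 15);
    put_bits(w, reverse_bits(code, len), len);
}
```

Consider writing BFINAL = 1, BTYPE = 01 (as the 2-bit value 1) and then the 8-bit code 00110000, which is the fixed code for literal 0 in 3.2.6. This produces the first byte 0x63, followed by three zero bits in the next byte.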
315		-
316		- 3.2. Compressed block format
317		-
318		- 3.2.1. Synopsis of prefix and Huffman coding
319		-
320		- Prefix coding represents symbols from an a priori known
321		- alphabet by bit sequences (codes), one code for each symbol, in
322		- a manner such that different symbols may be represented by bit
323		- sequences of different lengths, but a parser can always parse
324		- an encoded string unambiguously symbol-by-symbol.
325		-
326		- We define a prefix code in terms of a binary tree in which the
327		- two edges descending from each non-leaf node are labeled 0 and
328		- 1 and in which the leaf nodes correspond one-for-one with (are
329		- labeled with) the symbols of the alphabet; then the code for a
330		- symbol is the sequence of 0's and 1's on the edges leading from
331		- the root to the leaf labeled with that symbol. For example:
332		-
349		-            /\              Symbol    Code
350		-           0  1             ------    ----
351		-          /    \              A         00
352		-         /\     B             B          1
353		-        0  1                  C        011
354		-       /    \                 D        010
355		-      A     /\
356		-           0  1
357		-          /    \
358		-         D      C
359		-
360		- A parser can decode the next symbol from an encoded input
361		- stream by walking down the tree from the root, at each step
362		- choosing the edge corresponding to the next input bit.
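To make the tree walk concrete, here is a sketch that builds the example tree from its code strings and then decodes a bit string one edge at a time. For readability, bits are given as '0'/'1' characters rather than read from packed bytes. CodeTree, tree_init, tree_add and tree_decode are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define MAX_NODES 32

/* Node 0 is the root; child[b] == 0 means "no edge labeled b". */
typedef struct {
    int child[2];
    char symbol;              /* meaningful only at leaves */
} Node;

typedef struct {
    Node nodes[MAX_NODES];
    int count;
} CodeTree;

void tree_init(CodeTree *t)
{
    memset(t, 0, sizeof *t);
    t->count = 1;             /* just the root */
}

/* Create the path for 'bits' (a '0'/'1' string) and label its leaf. */
void tree_add(CodeTree *t, const char *bits, char symbol)
{
    int n = 0;
    for (; *bits; bits++) {
        int b = *bits - '0';
        if (t->nodes[n].child[b] == 0) {
            assert(t->count < MAX_NODES);
            t->nodes[n].child[b] = t->count++;
        }
        n = t->nodes[n].child[b];
    }
    t->nodes[n].symbol = symbol;
}

/* Walk from the root, taking the edge for each input bit; at a leaf,
   output its symbol and restart at the root. Returns symbols decoded. */
int tree_decode(const CodeTree *t, const char *bits, char *out)
{
    int n = 0, count = 0;
    for (; *bits; bits++) {
        n = t->nodes[n].child[*bits - '0'];
        assert(n != 0);       /* bit sequence not in the code */
        if (t->nodes[n].child[0] == 0 && t->nodes[n].child[1] == 0) {
            out[count++] = t->nodes[n].symbol;
            n = 0;
        }
    }
    out[count] = '\0';
    return count;
}
```

With A = 00, B = 1, C = 011 and D = 010 added, decoding "001011010" yields "ABCD".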
363		-
364		- Given an alphabet with known symbol frequencies, the Huffman
365		- algorithm allows the construction of an optimal prefix code
366		- (one which represents strings with those symbol frequencies
367		- using the fewest bits of any possible prefix codes for that
368		- alphabet). Such a code is called a Huffman code. (See
369		- reference [1] in Chapter 5, References, for additional
370		- information on Huffman codes.)
371		-
372		- Note that in the "deflate" format, the Huffman codes for the
373		- various alphabets must not exceed certain maximum code lengths.
374		- This constraint complicates the algorithm for computing code
375		- lengths from symbol frequencies. Again, see Chapter 5,
376		- References, for details.
377		-
378		- 3.2.2. Use of Huffman coding in the "deflate" format
379		-
380		- The Huffman codes used for each alphabet in the "deflate"
381		- format have two additional rules:
382		-
383		- * All codes of a given bit length have lexicographically
384		- consecutive values, in the same order as the symbols
385		- they represent;
386		-
387		- * Shorter codes lexicographically precede longer codes.
388		-
406		- We could recode the example above to follow this rule as
407		- follows, assuming that the order of the alphabet is ABCD:
408		-
409		-          Symbol  Code
410		-          ------  ----
411		-          A       10
412		-          B       0
413		-          C       110
414		-          D       111
415		-
416		- I.e., 0 precedes 10 which precedes 11x, and 110 and 111 are
417		- lexicographically consecutive.
418		-
419		- Given this rule, we can define the Huffman code for an alphabet
420		- just by giving the bit lengths of the codes for each symbol of
421		- the alphabet in order; this is sufficient to determine the
422		- actual codes. In our example, the code is completely defined
423		- by the sequence of bit lengths (2, 1, 3, 3). The following
424		- algorithm generates the codes as integers, intended to be read
425		- from most- to least-significant bit. The code lengths are
426		- initially in tree[I].Len; the codes are produced in
427		- tree[I].Code.
428		-
429		- 1) Count the number of codes for each code length. Let
430		- bl_count[N] be the number of codes of length N, N >= 1.
431		-
432		- 2) Find the numerical value of the smallest code for each
433		- code length:
434		-
435		-     code = 0;
436		-     bl_count[0] = 0;
437		-     for (bits = 1; bits <= MAX_BITS; bits++) {
438		-         code = (code + bl_count[bits-1]) << 1;
439		-         next_code[bits] = code;
440		-     }
441		-
442		- 3) Assign numerical values to all codes, using consecutive
443		- values for all codes of the same length with the base
444		- values determined at step 2. Codes that are never used
445		- (which have a bit length of zero) must not be assigned a
446		- value.
447		-
448		-     for (n = 0; n <= max_code; n++) {
449		-         len = tree[n].Len;
450		-         if (len != 0) {
451		-             tree[n].Code = next_code[len];
452		-             next_code[len]++;
453		-         }
463		-     }
464		-
465		- Example:
466		-
467		- Consider the alphabet ABCDEFGH, with bit lengths (3, 3, 3, 3,
468		- 3, 2, 4, 4). After step 1, we have:
469		-
470		-          N      bl_count[N]
471		-          -      -----------
472		-          2      1
473		-          3      5
474		-          4      2
475		-
476		- Step 2 computes the following next_code values:
477		-
478		-          N      next_code[N]
479		-          -      ------------
480		-          1      0
481		-          2      0
482		-          3      2
483		-          4      14
484		-
485		- Step 3 produces the following code values:
486		-
487		-          Symbol  Length  Code
488		-          ------  ------  ----
489		-          A       3       010
490		-          B       3       011
491		-          C       3       100
492		-          D       3       101
493		-          E       3       110
494		-          F       2       00
495		-          G       4       1110
496		-          H       4       1111
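Steps 1 to 3 translate almost directly into C. This sketch keeps the RFC's names (bl_count, next_code, MAX_BITS) but stores lengths and codes in two plain arrays instead of the tree[] structure. canonical_codes is a hypothetical name. MAX_BITS = 15 is assumed, which is the longest code length the format allows.

```c
#include <assert.h>

#define MAX_BITS 15

/* Assign canonical Huffman codes from code lengths (steps 1-3 above).
   len[n] == 0 marks an unused symbol; it consumes no code value. */
void canonical_codes(const int *len, unsigned *code, int num_symbols)
{
    int bl_count[MAX_BITS + 1] = {0};
    unsigned next_code[MAX_BITS + 1] = {0};
    unsigned c = 0;

    /* Step 1: count the codes of each length. */
    for (int n = 0; n < num_symbols; n++) {
        assert(len[n] >= 0 && len[n] <= MAX_BITS);
        bl_count[len[n]]++;
    }
    bl_count[0] = 0;

    /* Step 2: smallest code value for each length. */
    for (int bits = 1; bits <= MAX_BITS; bits++) {
        c = (c + bl_count[bits - 1]) << 1;
        next_code[bits] = c;
    }

    /* Step 3: consecutive values within a length, in symbol order. */
    for (int n = 0; n < num_symbols; n++)
        code[n] = len[n] != 0 ? next_code[len[n]]++ : 0; /* 0 = placeholder */
}
```

For the lengths (3, 3, 3, 3, 3, 2, 4, 4) it produces 2, 3, 4, 5, 6, 0, 14, 15, which are the codes in the table above. For the ABCD lengths (2, 1, 3, 3) it produces 10, 0, 110 and 111 in binary.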
497		-
498		- 3.2.3. Details of block format
499		-
500		- Each block of compressed data begins with 3 header bits
501		- containing the following data:
502		-
503		- first bit BFINAL
504		- next 2 bits BTYPE
505		-
506		- Note that the header bits do not necessarily begin on a byte
507		- boundary, since a block does not necessarily occupy an integral
508		- number of bytes.
509		-
520		- BFINAL is set if and only if this is the last block of the data
521		- set.
522		-
523		- BTYPE specifies how the data are compressed, as follows:
524		-
525		- 00 - no compression
526		- 01 - compressed with fixed Huffman codes
527		- 10 - compressed with dynamic Huffman codes
528		- 11 - reserved (error)
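Reading the 3-bit header follows the packing rules of 3.1.1. BFINAL is the first bit read. BTYPE is a 2-bit value whose first bit read is its least-significant bit. This is a minimal sketch with a hypothetical LSB-first BitReader. The helper names and the -1 error convention are illustrative.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const unsigned char *buf;
    size_t len;               /* bytes available */
    size_t bitpos;            /* next bit to read */
} BitReader;

/* Read n bits (n <= 16); the first bit read is the least significant.
   Returns -1 if the input runs out. */
int get_bits(BitReader *r, int n)
{
    int value = 0;
    assert(n >= 0 && n <= 16);
    for (int i = 0; i < n; i++) {
        if ((r->bitpos >> 3) >= r->len)
            return -1;
        int bit = (r->buf[r->bitpos >> 3] >> (r->bitpos & 7)) & 1;
        value |= bit << i;
        r->bitpos++;
    }
    return value;
}

enum { BTYPE_STORED = 0, BTYPE_FIXED = 1, BTYPE_DYNAMIC = 2, BTYPE_RESERVED = 3 };

/* Read BFINAL and BTYPE; reject short input and the reserved type 11. */
int read_block_header(BitReader *r, int *bfinal, int *btype)
{
    int f = get_bits(r, 1);
    int t = get_bits(r, 2);
    if (f < 0 || t < 0 || t == BTYPE_RESERVED)
        return -1;
    *bfinal = f;
    *btype = t;
    return 0;
}
```

A first byte of 0x63 decodes as BFINAL = 1, BTYPE = 01 (fixed Huffman codes). A first byte of 0x07 carries the reserved BTYPE 11 and is rejected.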
529		-
530		- The only difference between the two compressed cases is how the
531		- Huffman codes for the literal/length and distance alphabets are
532		- defined.
533		-
534		- In all cases, the decoding algorithm for the actual data is as
535		- follows:
536		-
537		-     do
538		-        read block header from input stream.
539		-        if stored with no compression
540		-           skip any remaining bits in current partially
541		-              processed byte
542		-           read LEN and NLEN (see next section)
543		-           copy LEN bytes of data to output
544		-        otherwise
545		-           if compressed with dynamic Huffman codes
546		-              read representation of code trees (see
547		-                 subsection below)
548		-           loop (until end of block code recognized)
549		-              decode literal/length value from input stream
550		-              if value < 256
551		-                 copy value (literal byte) to output stream
552		-              otherwise
553		-                 if value = end of block (256)
554		-                    break from loop
555		-                 otherwise (value = 257..285)
556		-                    decode distance from input stream
557		-
558		-                    move backwards distance bytes in the output
559		-                    stream, and copy length bytes from this
560		-                    position to the output stream.
561		-           end loop
562		-     while not last block
563		-
564		- Note that a duplicated string reference may refer to a string
565		- in a previous block; i.e., the backward distance may cross one
566		- or more block boundaries. However, a distance cannot refer past
567		- the beginning of the output stream. (An application using a
577		- preset dictionary might discard part of the output stream; a
578		- distance can refer to that part of the output stream anyway.)
579		- Note also that the referenced string may overlap the current
580		- position; for example, if the last 2 bytes decoded have values
581		- X and Y, a string reference with <length = 5, distance = 2>
582		- adds X,Y,X,Y,X to the output stream.
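The overlap rule is easy to get wrong with a block copy. memcpy has undefined behavior for overlapping regions, and memmove copies as if through a temporary, which is not what is wanted here. Copying one byte at a time gives the required result. This sketch assumes the output buffer has room for the copied bytes. copy_match is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Append 'length' bytes taken from 'distance' bytes back in 'out'.
   'pos' is the current output size; returns the new size. Bytes
   produced earlier in the same copy are re-read when distance < length. */
size_t copy_match(unsigned char *out, size_t pos, size_t length, size_t distance)
{
    assert(distance >= 1 && distance <= pos);  /* no reference before the start */
    for (size_t i = 0; i < length; i++, pos++)
        out[pos] = out[pos - distance];
    return pos;
}
```

Starting from "XY", `copy_match(buf, 2, 5, 2)` appends X, Y, X, Y, X, giving "XYXYXYX".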
583		-
584		- We now specify each compression method in turn.
585		-
586		- 3.2.4. Non-compressed blocks (BTYPE=00)
587		-
588		- Any bits of input up to the next byte boundary are ignored.
589		- The rest of the block consists of the following information:
590		-
591		-        0   1   2   3   4...
592		-      +---+---+---+---+================================+
593		-      |  LEN  | NLEN  |... LEN bytes of literal data...|
594		-      +---+---+---+---+================================+
595		-
596		- LEN is the number of data bytes in the block. NLEN is the
597		- one's complement of LEN.
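Once the reader has skipped to a byte boundary, a stored block can be parsed as in the following sketch, which includes the LEN/NLEN consistency check. LEN and NLEN are 2-byte numbers stored least-significant byte first, per 3.1. parse_stored_block is a hypothetical name, and returning 0 on error is an illustrative convention (a successful parse always returns at least 4).

```c
#include <assert.h>
#include <stddef.h>

/* Parse a stored block body starting at byte offset 'off' of 'in'.
   On success, points *data at the LEN literal bytes and returns the
   offset just past the block; returns 0 on a LEN/NLEN mismatch or
   truncated input. */
size_t parse_stored_block(const unsigned char *in, size_t in_len, size_t off,
                          const unsigned char **data, size_t *data_len)
{
    assert(in != NULL && data != NULL && data_len != NULL);
    if (in_len < off || in_len - off < 4)
        return 0;
    unsigned len  = in[off]     | (in[off + 1] << 8);  /* LSB first */
    unsigned nlen = in[off + 2] | (in[off + 3] << 8);
    if ((len ^ 0xffffu) != nlen)                       /* NLEN == ~LEN */
        return 0;
    if (in_len - off - 4 < len)
        return 0;
    *data = in + off + 4;
    *data_len = len;
    return off + 4 + len;
}
```

Take the bytes 01 05 00 FA FF 'h' 'e' 'l' 'l' 'o'. The first byte is the header (BFINAL = 1, BTYPE = 00), followed by LEN = 5 and NLEN = 0xFFFA. Calling the function with off = 1 returns 10 and points at "hello".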
598		-
599		- 3.2.5. Compressed blocks (length and distance codes)
600		-
601		- As noted above, encoded data blocks in the "deflate" format
602		- consist of sequences of symbols drawn from three conceptually
603		- distinct alphabets: either literal bytes, from the alphabet of
604		- byte values (0..255), or <length, backward distance> pairs,
605		- where the length is drawn from (3..258) and the distance is
606		- drawn from (1..32,768). In fact, the literal and length
607		- alphabets are merged into a single alphabet (0..285), where
608		- values 0..255 represent literal bytes, the value 256 indicates
609		- end-of-block, and values 257..285 represent length codes
610		- (possibly in conjunction with extra bits following the symbol
611		- code) as follows:
612		-
634		-         Extra                   Extra                   Extra
635		-    Code  Bits  Length(s)   Code  Bits  Length(s)   Code  Bits  Length(s)
636		-    ----  ----  ---------   ----  ----  ---------   ----  ----  ---------
637		-     257     0  3            267     1  15,16        277     4  67-82
638		-     258     0  4            268     1  17,18        278     4  83-98
639		-     259     0  5            269     2  19-22        279     4  99-114
640		-     260     0  6            270     2  23-26        280     4  115-130
641		-     261     0  7            271     2  27-30        281     5  131-162
642		-     262     0  8            272     2  31-34        282     5  163-194
643		-     263     0  9            273     3  35-42        283     5  195-226
644		-     264     0  10           274     3  43-50        284     5  227-257
645		-     265     1  11,12        275     3  51-58        285     0  258
646		-     266     1  13,14        276     3  59-66
647		-
648		- The extra bits should be interpreted as a machine integer
649		- stored with the most-significant bit first, e.g., bits 1110
650		- represent the value 14.
651		-
652		-         Extra                   Extra                   Extra
653		-    Code  Bits  Distance    Code  Bits  Distance    Code  Bits  Distance
654		-    ----  ----  --------    ----  ----  --------    ----  ----  --------
655		-       0     0  1             10     4  33-48         20     9  1025-1536
656		-       1     0  2             11     4  49-64         21     9  1537-2048
657		-       2     0  3             12     5  65-96         22    10  2049-3072
658		-       3     0  4             13     5  97-128        23    10  3073-4096
659		-       4     1  5,6           14     6  129-192       24    11  4097-6144
660		-       5     1  7,8           15     6  193-256       25    11  6145-8192
661		-       6     2  9-12          16     7  257-384       26    12  8193-12288
662		-       7     2  13-16         17     7  385-512       27    12  12289-16384
663		-       8     3  17-24         18     8  513-768       28    13  16385-24576
664		-       9     3  25-32         19     8  769-1024      29    13  24577-32768
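The two tables above can be stored as base values plus extra-bit counts, so that decoding a length or a distance becomes a table lookup plus the value of the extra bits. In this sketch the arrays are transcribed from the tables. The parameter 'extra' is assumed to be the numeric value of the extra bits, already read from the input. All names are hypothetical.

```c
#include <assert.h>

/* Literal/length symbols 257..285. */
static const unsigned short len_base[29] = {
    3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31,
    35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258 };
static const unsigned char len_extra[29] = {
    0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2,
    3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 0 };

/* Distance codes 0..29. */
static const unsigned short dist_base[30] = {
    1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193,
    257, 385, 513, 769, 1025, 1537, 2049, 3073, 4097, 6145,
    8193, 12289, 16385, 24577 };
static const unsigned char dist_extra[30] = {
    0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6,
    7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13 };

int decode_length(int symbol, unsigned extra)
{
    assert(symbol >= 257 && symbol <= 285);
    int i = symbol - 257;
    assert(extra < (1u << len_extra[i]));   /* fits in the extra-bit count */
    return len_base[i] + (int)extra;
}

int decode_distance(int code, unsigned extra)
{
    assert(code >= 0 && code <= 29);
    assert(extra < (1u << dist_extra[code]));
    return dist_base[code] + (int)extra;
}
```

For example, decode_length(265, 1) is 12, decode_length(284, 30) is 257, and decode_distance(29, 8191) is 32768.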
665		-
666		- 3.2.6. Compression with fixed Huffman codes (BTYPE=01)
667		-
668		- The Huffman codes for the two alphabets are fixed, and are not
669		- represented explicitly in the data. The Huffman code lengths
670		- for the literal/length alphabet are:
671		-
672		-     Lit Value    Bits        Codes
673		-     ---------    ----        -----
674		-       0 - 143     8          00110000 through
675		-                              10111111
676		-     144 - 255     9          110010000 through
677		-                              111111111
678		-     256 - 279     7          0000000 through
679		-                              0010111
680		-     280 - 287     8          11000000 through
681		-                              11000111
682		-
691		- The code lengths are sufficient to generate the actual codes,
692		- as described above; we show the codes in the table for added
693		- clarity. Literal/length values 286-287 will never actually
694		- occur in the compressed data, but participate in the code
695		- construction.
696		-
697		- Distance codes 0-31 are represented by (fixed-length) 5-bit
698		- codes, with possible additional bits as shown in the table
699		- in Paragraph 3.2.5, above. Note that distance codes 30-31
700		- will never actually occur in the compressed data.
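Feeding the fixed code lengths to the canonical construction of 3.2.2 regenerates the codes in the table. The sketch below repeats that construction so it compiles on its own. fixed_lengths and assign_codes are hypothetical names.

```c
#include <assert.h>

#define MAX_BITS 15

/* Fixed code lengths: 288 literal/length symbols, 32 distance codes. */
void fixed_lengths(int litlen[288], int dist[32])
{
    for (int n = 0; n < 288; n++)
        litlen[n] = n < 144 ? 8 : n < 256 ? 9 : n < 280 ? 7 : 8;
    for (int n = 0; n < 32; n++)
        dist[n] = 5;
}

/* Canonical code assignment from lengths (RFC 1951, 3.2.2). */
void assign_codes(const int *len, unsigned *code, int num_symbols)
{
    int bl_count[MAX_BITS + 1] = {0};
    unsigned next_code[MAX_BITS + 1] = {0};
    unsigned c = 0;

    for (int n = 0; n < num_symbols; n++) {
        assert(len[n] >= 0 && len[n] <= MAX_BITS);
        bl_count[len[n]]++;
    }
    bl_count[0] = 0;
    for (int bits = 1; bits <= MAX_BITS; bits++) {
        c = (c + bl_count[bits - 1]) << 1;
        next_code[bits] = c;
    }
    for (int n = 0; n < num_symbols; n++)
        code[n] = len[n] != 0 ? next_code[len[n]]++ : 0;
}
```

Among the results are 00110000 (0x30) for literal 0, 110010000 (0x190) for 144, 0000000 for 256 and 11000000 (0xC0) for 280, plus the 5-bit codes 0 to 31 for distances. All of these match the table and the paragraph above.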
701		-
702		- 3.2.7. Compression with dynamic Huffman codes (BTYPE=10)
703		-
704		- The Huffman codes for the two alphabets appear in the block
705		- immediately after the header bits and before the actual
706		- compressed data, first the literal/length code and then the
707		- distance code. Each code is defined by a sequence of code
708		- lengths, as discussed in Paragraph 3.2.2, above. For even
709		- greater compactness, the code length sequences themselves are
710		- compressed using a Huffman code. The alphabet for code lengths
711		- is as follows:
712		-
713		-     0 - 15: Represent code lengths of 0 - 15
714		-         16: Copy the previous code length 3 - 6 times.
715		-             The next 2 bits indicate repeat length
716		-                   (0 = 3, ... , 3 = 6)
717		-                Example:  Codes 8, 16 (+2 bits 11),
718		-                          16 (+2 bits 10) will expand to
719		-                          12 code lengths of 8 (1 + 6 + 5)
720		-         17: Repeat a code length of 0 for 3 - 10 times.
721		-             (3 bits of length)
722		-         18: Repeat a code length of 0 for 11 - 138 times
723		-             (7 bits of length)
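Expanding the code length alphabet into a flat list of code lengths can be sketched as follows. Each input symbol is paired with the already-read value of its extra bits. CLSym and expand_code_lengths are hypothetical names.

```c
#include <assert.h>

/* One code length symbol (0..18) and the value of its extra bits
   (0 when the symbol has none). */
typedef struct {
    int symbol;
    int extra;
} CLSym;

/* Returns the number of code lengths written, or -1 on bad input. */
int expand_code_lengths(const CLSym *in, int n_in, int *out, int max_out)
{
    int n = 0;
    for (int i = 0; i < n_in; i++) {
        int sym = in[i].symbol, extra = in[i].extra, value, repeat;
        if (sym < 0 || sym > 18)
            return -1;
        if (sym <= 15) {               /* a code length as-is */
            value = sym;
            repeat = 1;
        } else if (sym == 16) {        /* previous length, 3-6 times */
            if (n == 0)
                return -1;             /* nothing to copy yet */
            assert(extra >= 0 && extra <= 3);
            value = out[n - 1];
            repeat = 3 + extra;
        } else if (sym == 17) {        /* zero, 3-10 times */
            assert(extra >= 0 && extra <= 7);
            value = 0;
            repeat = 3 + extra;
        } else {                       /* 18: zero, 11-138 times */
            assert(extra >= 0 && extra <= 127);
            value = 0;
            repeat = 11 + extra;
        }
        if (n + repeat > max_out)
            return -1;
        while (repeat-- > 0)
            out[n++] = value;
    }
    return n;
}
```

The example above (8, 16 (+2 bits 11), 16 (+2 bits 10)) becomes {8, 0}, {16, 3}, {16, 2} and expands to 12 lengths of 8. Because the output is one flat array, a repeat can carry over from the literal/length lengths into the distance lengths.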
724		-
725		- A code length of 0 indicates that the corresponding symbol in
726		- the literal/length or distance alphabet will not occur in the
727		- block, and should not participate in the Huffman code
728		- construction algorithm given earlier. If only one distance
729		- code is used, it is encoded using one bit, not zero bits; in
730		- this case there is a single code length of one, with one unused
731		- code. One distance code of zero bits means that there are no
732		- distance codes used at all (the data is all literals).
733		-
734		- We can now define the format of the block:
735		-
736		- 5 Bits: HLIT, # of Literal/Length codes - 257 (257 - 286)
737		- 5 Bits: HDIST, # of Distance codes - 1 (1 - 32)
738		- 4 Bits: HCLEN, # of Code Length codes - 4 (4 - 19)
739		-
748		- (HCLEN + 4) x 3 bits: code lengths for the code length
749		- alphabet given just above, in the order: 16, 17, 18,
750		- 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15
751		-
752		- These code lengths are interpreted as 3-bit integers
753		- (0-7); as above, a code length of 0 means the
754		- corresponding symbol (literal/length or distance code
755		- length) is not used.
756		-
757		- HLIT + 257 code lengths for the literal/length alphabet,
758		- encoded using the code length Huffman code
759		-
760		- HDIST + 1 code lengths for the distance alphabet,
761		- encoded using the code length Huffman code
762		-
763		- The actual compressed data of the block,
764		- encoded using the literal/length and distance Huffman
765		- codes
766		-
767		- The literal/length symbol 256 (end of data),
768		- encoded using the literal/length Huffman code
769		-
770		- The code length repeat codes can cross from HLIT + 257 to the
771		- HDIST + 1 code lengths. In other words, all code lengths form
772		- a single sequence of HLIT + HDIST + 258 values.
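The start of a dynamic block can be sketched in two steps: decode the HLIT, HDIST and HCLEN fields, then place the (HCLEN + 4) 3-bit lengths using the permuted order above. The sketch assumes the 14 header bits have already been read LSB-first into one integer, so HLIT occupies bits 0-4, HDIST bits 5-9 and HCLEN bits 10-13. DynHeader, parse_dyn_header and place_cl_lengths are hypothetical names.

```c
#include <assert.h>

/* Transmission order of the code length code lengths. */
static const int cl_order[19] = {
    16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15 };

typedef struct {
    int num_litlen;   /* HLIT + 257 */
    int num_dist;     /* HDIST + 1 */
    int num_cl;       /* HCLEN + 4 */
} DynHeader;

DynHeader parse_dyn_header(unsigned bits)
{
    DynHeader h;
    h.num_litlen = (int)(bits & 0x1f) + 257;
    h.num_dist = (int)((bits >> 5) & 0x1f) + 1;
    h.num_cl = (int)((bits >> 10) & 0x0f) + 4;
    return h;
}

/* Scatter the num_cl values (given in stream order) to their symbol
   positions; in this sketch, symbols not transmitted get length 0. */
void place_cl_lengths(const int *stream_vals, int num_cl, int cl_lens[19])
{
    assert(num_cl >= 4 && num_cl <= 19);
    for (int i = 0; i < 19; i++)
        cl_lens[i] = 0;
    for (int i = 0; i < num_cl; i++) {
        assert(stream_vals[i] >= 0 && stream_vals[i] <= 7);
        cl_lens[cl_order[i]] = stream_vals[i];
    }
}
```

Setting the untransmitted code length symbols to 0 marks them as unused, which matches the meaning of a code length of 0 given above.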
773		-
774		- 3.3. Compliance
775		-
776		- A compressor may limit further the ranges of values specified in
777		- the previous section and still be compliant; for example, it may
778		- limit the range of backward pointers to some value smaller than
779		- 32K. Similarly, a compressor may limit the size of blocks so that
780		- a compressible block fits in memory.
781		-
782		- A compliant decompressor must accept the full range of possible
783		- values defined in the previous section, and must accept blocks of
784		- arbitrary size.
785		-
786		-4. Compression algorithm details
787		-
788		- While it is the intent of this document to define the "deflate"
789		- compressed data format without reference to any particular
790		- compression algorithm, the format is related to the compressed
791		- formats produced by LZ77 (Lempel-Ziv 1977, see reference [2] below);
792		- since many variations of LZ77 are patented, it is strongly
793		- recommended that the implementor of a compressor follow the general
794		- algorithm presented here, which is known not to be patented per se.
795		- The material in this section is not part of the definition of the
805		- specification per se, and a compressor need not follow it in order to
806		- be compliant.
807		-
808		- The compressor terminates a block when it determines that starting a
809		- new block with fresh trees would be useful, or when the block size
810		- fills up the compressor's block buffer.
811		-
812		- The compressor uses a chained hash table to find duplicated strings,
813		- using a hash function that operates on 3-byte sequences. At any
814		- given point during compression, let XYZ be the next 3 input bytes to
815		- be examined (not necessarily all different, of course). First, the
816		- compressor examines the hash chain for XYZ. If the chain is empty,
817		- the compressor simply writes out X as a literal byte and advances one
818		- byte in the input. If the hash chain is not empty, indicating that
819		- the sequence XYZ (or, if we are unlucky, some other 3 bytes with the
820		- same hash function value) has occurred recently, the compressor
821		- compares all strings on the XYZ hash chain with the actual input data
822		- sequence starting at the current point, and selects the longest
823		- match.
824		-
825		- The compressor searches the hash chains starting with the most recent
826		- strings, to favor small distances and thus take advantage of the
827		- Huffman encoding. The hash chains are singly linked. There are no
828		- deletions from the hash chains; the algorithm simply discards matches
829		- that are too old. To avoid a worst-case situation, very long hash
830		- chains are arbitrarily truncated at a certain length, determined by a
831		- run-time parameter.
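The hash-chain search described above can be sketched as follows. The hash function, table size and NIL marker are illustrative choices, not zlib's. head[] and prev[] model the singly linked chains, and candidates are visited newest first. Each candidate is verified against the actual bytes, because different 3-byte sequences can share a hash value. The search stops at a chain-length limit or once a candidate is more than 32K bytes back. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define WSIZE     32768u            /* maximum backward distance */
#define HASH_BITS 12                /* illustrative table size */
#define HASH_SIZE (1u << HASH_BITS)
#define MIN_MATCH 3
#define MAX_MATCH 258
#define NIL       ((size_t)-1)      /* end-of-chain marker */

/* head[h]: most recent position whose 3-byte hash is h.
   prev[p]: next older position on the same chain as position p. */
typedef struct {
    size_t head[HASH_SIZE];
    size_t *prev;                   /* caller-supplied, one slot per input byte */
} HashChains;

unsigned hash3(const unsigned char *p)
{
    return (((unsigned)p[0] << 8) ^ ((unsigned)p[1] << 4) ^ p[2]) & (HASH_SIZE - 1);
}

void chains_init(HashChains *hc, size_t *prev_storage)
{
    for (size_t i = 0; i < HASH_SIZE; i++)
        hc->head[i] = NIL;
    hc->prev = prev_storage;
}

/* Link position pos (which needs 3 input bytes) to the front of its chain. */
void chains_insert(HashChains *hc, const unsigned char *in, size_t pos)
{
    unsigned h = hash3(in + pos);
    hc->prev[pos] = hc->head[h];
    hc->head[h] = pos;
}

/* Return the longest verified match at pos (0 if shorter than MIN_MATCH). */
size_t longest_match(const HashChains *hc, const unsigned char *in,
                     size_t n, size_t pos, int max_chain, size_t *distance)
{
    size_t best = 0;
    assert(pos + MIN_MATCH <= n);
    size_t limit = n - pos < MAX_MATCH ? n - pos : MAX_MATCH;

    for (size_t cand = hc->head[hash3(in + pos)];
         cand != NIL && max_chain-- > 0; cand = hc->prev[cand]) {
        if (pos - cand > WSIZE)
            break;                  /* older entries are even farther back */
        size_t len = 0;
        while (len < limit && in[cand + len] == in[pos + len])
            len++;                  /* a hash collision just gives a short match */
        if (len > best) {           /* ties keep the newer, closer match */
            best = len;
            *distance = pos - cand;
        }
    }
    return best >= MIN_MATCH ? best : 0;
}
```

For "abcabcabc", after positions 0 to 2 are inserted, longest_match at position 3 finds a length-6 match at distance 3.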
832		-
833		- To improve overall compression, the compressor optionally defers the
834		- selection of matches ("lazy matching"): after a match of length N has
835		- been found, the compressor searches for a longer match starting at
836		- the next input byte. If it finds a longer match, it truncates the
837		- previous match to a length of one (thus producing a single literal
838		- byte) and then emits the longer match. Otherwise, it emits the
839		- original match, and, as described above, advances N bytes before
840		- continuing.
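The following is a small sketch of the lazy-matching decision. To keep the example short, it uses a brute-force longest-match search as a stand-in for the hash chains. Unlike the description above, this sketch re-checks at every step, so a deferred match can itself be deferred again. Token, find_match, tokenize_lazy and detokenize are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_MATCH 3
#define MAX_MATCH 258
#define WINDOW    32768

typedef struct {
    int is_match;
    unsigned char literal;          /* valid when !is_match */
    size_t length, distance;        /* valid when is_match */
} Token;

/* Brute-force longest match at pos; ties prefer the smaller distance. */
size_t find_match(const unsigned char *in, size_t n, size_t pos, size_t *dist)
{
    size_t best = 0;
    size_t start = pos > WINDOW ? pos - WINDOW : 0;
    for (size_t cand = start; cand < pos; cand++) {
        size_t len = 0;
        while (pos + len < n && len < MAX_MATCH && in[cand + len] == in[pos + len])
            len++;
        if (len >= MIN_MATCH && len >= best) {
            best = len;
            *dist = pos - cand;
        }
    }
    return best;
}

/* If a longer match starts at the next byte, emit in[pos] as a literal
   instead of taking the match found at pos. */
size_t tokenize_lazy(const unsigned char *in, size_t n, Token *out)
{
    size_t pos = 0, count = 0;
    while (pos < n) {
        size_t dist = 0, dist2 = 0;
        size_t len = find_match(in, n, pos, &dist);
        if (len > 0 && pos + 1 < n && find_match(in, n, pos + 1, &dist2) > len)
            len = 0;                /* defer: better match one byte later */
        if (len == 0) {
            out[count].is_match = 0;
            out[count].literal = in[pos++];
        } else {
            out[count].is_match = 1;
            out[count].length = len;
            out[count].distance = dist;
            pos += len;
        }
        count++;
    }
    return count;
}

/* Rebuild the input from tokens (byte-by-byte copy handles overlap). */
size_t detokenize(const Token *t, size_t count, unsigned char *out)
{
    size_t pos = 0;
    for (size_t i = 0; i < count; i++) {
        if (!t[i].is_match) {
            out[pos++] = t[i].literal;
        } else {
            assert(t[i].distance >= 1 && t[i].distance <= pos);
            for (size_t k = 0; k < t[i].length; k++, pos++)
                out[pos] = out[pos - t[i].distance];
        }
    }
    return pos;
}
```

Consider the input "abcXbcdeYabcde". A greedy parser would take a 3-byte match for "abc" and then emit two literals. The lazy parser instead emits 'a' as a literal and then a single 4-byte match for "bcde" at distance 6, giving 11 tokens instead of 12.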
841		-
842		- Run-time parameters also control this "lazy match" procedure. If
843		- compression ratio is most important, the compressor attempts a
844		- complete second search regardless of the length of the first match.
845		- In the normal case, if the current match is "long enough", the
846		- compressor reduces the search for a longer match, thus speeding up
847		- the process. If speed is most important, the compressor inserts new
848		- strings in the hash table only when no match was found, or when the
849		- match is not "too long". This degrades the compression ratio but
850		- saves time since there are both fewer insertions and fewer searches.
851		-
862		-5. References
863		-
864		- [1] Huffman, D. A., "A Method for the Construction of Minimum
865		- Redundancy Codes", Proceedings of the Institute of Radio
866		- Engineers, September 1952, Volume 40, Number 9, pp. 1098-1101.
867		-
868		- [2] Ziv J., Lempel A., "A Universal Algorithm for Sequential Data
869		- Compression", IEEE Transactions on Information Theory, Vol. 23,
870		- No. 3, pp. 337-343.
871		-
872		- [3] Gailly, J.-L., and Adler, M., ZLIB documentation and sources,
873		- available in ftp://ftp.uu.net/pub/archiving/zip/doc/
874		-
875		- [4] Gailly, J.-L., and Adler, M., GZIP documentation and sources,
876		- available as gzip-*.tar in ftp://prep.ai.mit.edu/pub/gnu/
877		-
878		- [5] Schwartz, E. S., and Kallick, B. "Generating a canonical prefix
879		- encoding." Comm. ACM, 7,3 (Mar. 1964), pp. 166-169.
880		-
881		- [6] Hirschberg and Lelewer, "Efficient decoding of prefix codes,"
882		- Comm. ACM, 33,4, April 1990, pp. 449-459.
883		-
884		-6. Security Considerations
885		-
886		- Any data compression method involves the reduction of redundancy in
887		- the data. Consequently, any corruption of the data is likely to have
888		- severe effects and be difficult to correct. Uncompressed text, on
889		- the other hand, will probably still be readable despite the presence
890		- of some corrupted bytes.
891		-
892		- It is recommended that systems using this data format provide some
893		- means of validating the integrity of the compressed data. See
894		- reference [3], for example.
895		-
896		-7. Source code
897		-
898		- Source code for a C language implementation of a "deflate" compliant
899		- compressor and decompressor is available within the zlib package at
900		- ftp://ftp.uu.net/pub/archiving/zip/zlib/.
901		-
902		-8. Acknowledgements
903		-
904		- Trademarks cited in this document are the property of their
905		- respective owners.
906		-
907		- Phil Katz designed the deflate format. Jean-Loup Gailly and Mark
908		- Adler wrote the related software described in this specification.
909		- Glenn Randers-Pehrson converted this document to RFC and HTML format.
910		-
919		-9. Author's Address
920		-
921		- L. Peter Deutsch
922		- Aladdin Enterprises
923		- 203 Santa Margarita Ave.
924		- Menlo Park, CA 94025
925		-
926		- Phone: (415) 322-0103 (AM only)
927		- FAX: (415) 322-1734
928		- EMail: <[email protected]>
929		-
930		- Questions about the technical content of this specification can be
931		- sent by email to:
932		-
933		- Jean-Loup Gailly <[email protected]> and
934		- Mark Adler <[email protected]>
935		-
936		- Editorial comments on this specification can be sent by email to:
937		-
938		- L. Peter Deutsch <[email protected]> and
939		- Glenn Randers-Pehrson <[email protected]>
940		-

	--- a/compat/zlib/doc/rfc1951.txt
	+++ b/compat/zlib/doc/rfc1951.txt
	@@ -1,972 +0,0 @@
1
2
3
4
5
6
7	Network Working Group P. Deutsch
8	Request for Comments: 1951 Aladdin Enterprises
9	Category: Informational May 1996
10
11
12	DEFLATE Compressed Data Format Specification version 1.3
13
14	Status of This Memo
15
16	This memo provides information for the Internet community. This memo
17	does not specify an Internet standard of any kind. Distribution of
18	this memo is unlimited.
19
20	IESG Note:
21
22	The IESG takes no position on the validity of any Intellectual
23	Property Rights statements contained in this document.
24
25	Notices
26
27	Copyright (c) 1996 L. Peter Deutsch
28
29	Permission is granted to copy and distribute this document for any
30	purpose and without charge, including translations into other
31	languages and incorporation into compilations, provided that the
32	copyright notice and this notice are preserved, and that any
33	substantive changes or deletions from the original are clearly
34	marked.
35
36	A pointer to the latest version of this and related documentation in
37	HTML format can be found at the URL
38	<ftp://ftp.uu.net/graphics/png/documents/zlib/zdoc-index.html>.
39
40	Abstract
41
42	This specification defines a lossless compressed data format that
43	compresses data using a combination of the LZ77 algorithm and Huffman
44	coding, with efficiency comparable to the best currently available
45	general-purpose compression methods. The data can be produced or
46	consumed, even for an arbitrarily long sequentially presented input
47	data stream, using only an a priori bounded amount of intermediate
48	storage. The format can be implemented readily in a manner not
49	covered by patents.
50
64	Table of Contents
65
66	1. Introduction
67	   1.1. Purpose
68	   1.2. Intended audience
69	   1.3. Scope
70	   1.4. Compliance
71	   1.5. Definitions of terms and conventions used
72	   1.6. Changes from previous versions
73	2. Compressed representation overview
74	3. Detailed specification
75	   3.1. Overall conventions
76	      3.1.1. Packing into bytes
77	   3.2. Compressed block format
78	      3.2.1. Synopsis of prefix and Huffman coding
79	      3.2.2. Use of Huffman coding in the "deflate" format
80	      3.2.3. Details of block format
81	      3.2.4. Non-compressed blocks (BTYPE=00)
82	      3.2.5. Compressed blocks (length and distance codes)
83	      3.2.6. Compression with fixed Huffman codes (BTYPE=01)
84	      3.2.7. Compression with dynamic Huffman codes (BTYPE=10)
85	   3.3. Compliance
86	4. Compression algorithm details
87	5. References
88	6. Security Considerations
89	7. Source code
90	8. Acknowledgements
91	9. Author's Address
92
93	1. Introduction
94
95	1.1. Purpose
96
97	The purpose of this specification is to define a lossless
98	compressed data format that:
99	* Is independent of CPU type, operating system, file system,
100	and character set, and hence can be used for interchange;
101	* Can be produced or consumed, even for an arbitrarily long
102	sequentially presented input data stream, using only an a
103	priori bounded amount of intermediate storage, and hence
104	can be used in data communications or similar structures
105	such as Unix filters;
106	* Compresses data with efficiency comparable to the best
107	currently available general-purpose compression methods,
108	and in particular considerably better than the "compress"
109	program;
110	* Can be implemented readily in a manner not covered by
111	patents, and hence can be practiced freely;
121	* Is compatible with the file format produced by the current
122	widely used gzip utility, in that conforming decompressors
123	will be able to read data produced by the existing gzip
124	compressor.
125
126	The data format defined by this specification does not attempt to:
127
128	* Allow random access to compressed data;
129	* Compress specialized data (e.g., raster graphics) as well
130	as the best currently available specialized algorithms.
131
132	A simple counting argument shows that no lossless compression
133	algorithm can compress every possible input data set. For the
134	format defined here, the worst case expansion is 5 bytes per 32K-
135	byte block, i.e., a size increase of 0.015% for large data sets.
136	English text usually compresses by a factor of 2.5 to 3;
137	executable files usually compress somewhat less; graphical data
138	such as raster images may compress much more.
139
140	1.2. Intended audience
141
142	This specification is intended for use by implementors of software
143	to compress data into "deflate" format and/or decompress data from
144	"deflate" format.
145
146	The text of the specification assumes a basic background in
147	programming at the level of bits and other primitive data
148	representations. Familiarity with the technique of Huffman coding
149	is helpful but not required.
150
151	1.3. Scope
152
153	The specification specifies a method for representing a sequence
154	of bytes as a (usually shorter) sequence of bits, and a method for
155	packing the latter bit sequence into bytes.
156
157	1.4. Compliance
158
159	Unless otherwise indicated below, a compliant decompressor must be
160	able to accept and decompress any data set that conforms to all
161	the specifications presented here; a compliant compressor must
162	produce data sets that conform to all the specifications presented
163	here.
164
165	1.5. Definitions of terms and conventions used
166
167	Byte: 8 bits stored or transmitted as a unit (same as an octet).
168	For this specification, a byte is exactly 8 bits, even on machines
178	which store a character on a number of bits different from eight.
179	See below, for the numbering of bits within a byte.
180
181	String: a sequence of arbitrary bytes.
182
183	1.6. Changes from previous versions
184
185	There have been no technical changes to the deflate format since
186	version 1.1 of this specification. In version 1.2, some
187	terminology was changed. Version 1.3 is a conversion of the
188	specification to RFC style.
189
190	2. Compressed representation overview
191
192	A compressed data set consists of a series of blocks, corresponding
193	to successive blocks of input data. The block sizes are arbitrary,
194	except that non-compressible blocks are limited to 65,535 bytes.
195
196	Each block is compressed using a combination of the LZ77 algorithm
197	and Huffman coding. The Huffman trees for each block are independent
198	of those for previous or subsequent blocks; the LZ77 algorithm may
199	use a reference to a duplicated string occurring in a previous block,
200	up to 32K input bytes before.
201
202	Each block consists of two parts: a pair of Huffman code trees that
203	describe the representation of the compressed data part, and a
204	compressed data part. (The Huffman trees themselves are compressed
205	using Huffman encoding.) The compressed data consists of a series of
206	elements of two types: literal bytes (of strings that have not been
207	detected as duplicated within the previous 32K input bytes), and
208	pointers to duplicated strings, where a pointer is represented as a
209	pair <length, backward distance>. The representation used in the
210	"deflate" format limits distances to 32K bytes and lengths to 258
211	bytes, but does not limit the size of a block, except for
212	uncompressible blocks, which are limited as noted above.
213
214	Each type of value (literals, distances, and lengths) in the
215	compressed data is represented using a Huffman code, using one code
216	tree for literals and lengths and a separate code tree for distances.
217	The code trees for each block appear in a compact form just before
218	the compressed data for that block.
219
235	3. Detailed specification
236
237	3.1. Overall conventions

	In the diagrams below, a box like this:
238
239	   +---+
240	   |   |   <-- the vertical bars might be missing
241	   +---+
242
243	represents one byte; a box like this:
244
245	   +==============+
246	   |              |
247	   +==============+
248
249	represents a variable number of bytes.
250
251	Bytes stored within a computer do not have a "bit order", since
252	they are always treated as a unit. However, a byte considered as
253	an integer between 0 and 255 does have a most- and least-
254	significant bit, and since we write numbers with the most-
255	significant digit on the left, we also write bytes with the most-
256	significant bit on the left. In the diagrams below, we number the
257	bits of a byte so that bit 0 is the least-significant bit, i.e.,
258	the bits are numbered:
259
260	   +--------+
261	   |76543210|
262	   +--------+
263
264	Within a computer, a number may occupy multiple bytes. All
265	multi-byte numbers in the format described here are stored with
266	the least-significant byte first (at the lower memory address).
267	For example, the decimal number 520 is stored as:
268
269	    0        1
270	   +--------+--------+
271	   |00001000|00000010|
272	   +--------+--------+
273	    ^        ^
274	    |        |
275	    |        + more significant byte = 2 x 256
276	    + less significant byte = 8
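A minimal C sketch of reading such a two-byte value. `read_le16` is a hypothetical helper name and is not part of zlib or Fossil. Applied to the bytes 00001000 and 00000010, it yields 8 + 2 x 256 = 520.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 16-bit value stored least-significant byte first, as all
   multi-byte numbers in the deflate format are. */
static uint16_t read_le16(const uint8_t *p)
{
    assert(p != NULL);
    return (uint16_t)(p[0] | (p[1] << 8));
}
```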
277
278	3.1.1. Packing into bytes
279
280	This document does not address the issue of the order in which
281	bits of a byte are transmitted on a bit-sequential medium,
282	since the final data format described here is byte- rather than
292	bit-oriented. However, we describe the compressed block format
293	below as a sequence of data elements of various bit
294	lengths, not a sequence of bytes. We must therefore specify
295	how to pack these data elements into bytes to form the final
296	compressed byte sequence:
297
298	* Data elements are packed into bytes in order of
299	increasing bit number within the byte, i.e., starting
300	with the least-significant bit of the byte.
301	* Data elements other than Huffman codes are packed
302	starting with the least-significant bit of the data
303	element.
304	* Huffman codes are packed starting with the most-
305	significant bit of the code.
306
307	In other words, if one were to print out the compressed data as
308	a sequence of bytes, starting with the first byte at the
309	right margin and proceeding to the left, with the most-
310	significant bit of each byte on the left as usual, one would be
311	able to parse the result from right to left, with fixed-width
312	elements in the correct MSB-to-LSB order and Huffman codes in
313	bit-reversed order (i.e., with the first bit of the code in the
314	relative LSB position).
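The three packing rules can be expressed as a small LSB-first bit reader. This is only a sketch. The `BitReader` type and its functions are hypothetical names, and the reader does no error handling beyond an assert. `get_bits` builds fixed-width elements starting from their least-significant bit. `get_code_bits` accumulates Huffman code bits starting from the most-significant bit. That difference is why Huffman codes appear bit-reversed when the bytes are printed right to left.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical LSB-first bit reader over an in-memory buffer. */
typedef struct {
    const uint8_t *buf;
    size_t len;     /* buffer length in bytes */
    size_t bitpos;  /* absolute bit position: byte bitpos/8, bit bitpos%8 */
} BitReader;

/* Read one bit, starting with the least-significant bit of each byte. */
static unsigned get_bit(BitReader *br)
{
    assert(br->bitpos / 8 < br->len);
    unsigned bit = (br->buf[br->bitpos / 8] >> (br->bitpos % 8)) & 1u;
    br->bitpos++;
    return bit;
}

/* Non-Huffman data element: the first bit read is the element's LSB. */
static unsigned get_bits(BitReader *br, int n)
{
    unsigned v = 0;
    for (int i = 0; i < n; i++)
        v |= get_bit(br) << i;
    return v;
}

/* Huffman code: the first bit read is the code's MSB. */
static unsigned get_code_bits(BitReader *br, int n)
{
    unsigned v = 0;
    for (int i = 0; i < n; i++)
        v = (v << 1) | get_bit(br);
    return v;
}
```

For the byte 0xB4 (10110100), reading a 3-bit element, then a 3-bit Huffman code, then a 2-bit element yields 4, 3 (binary 011), and 2.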
315
316	3.2. Compressed block format
317
318	3.2.1. Synopsis of prefix and Huffman coding
319
320	Prefix coding represents symbols from an a priori known
321	alphabet by bit sequences (codes), one code for each symbol, in
322	a manner such that different symbols may be represented by bit
323	sequences of different lengths, but a parser can always parse
324	an encoded string unambiguously symbol-by-symbol.
325
326	We define a prefix code in terms of a binary tree in which the
327	two edges descending from each non-leaf node are labeled 0 and
328	1 and in which the leaf nodes correspond one-for-one with (are
329	labeled with) the symbols of the alphabet; then the code for a
330	symbol is the sequence of 0's and 1's on the edges leading from
331	the root to the leaf labeled with that symbol. For example:
332
349	         /\              Symbol    Code
350	        0  1             ------    ----
351	       /    \                A       00
352	      /\     B               B        1
353	     0  1                    C      011
354	    /    \                   D      010
355	   A     /\
356	        0  1
357	       /    \
358	      D      C
359
360	A parser can decode the next symbol from an encoded input
361	stream by walking down the tree from the root, at each step
362	choosing the edge corresponding to the next input bit.
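Walking the example tree can be sketched in C as follows. The node array is one hypothetical way to store the tree, using child indices with -1 for "none". To keep the example short, the input is a string of '0'/'1' characters rather than a real bit stream.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node layout: child[0]/child[1] index into tree[], or -1;
   a leaf has a nonzero symbol. */
typedef struct { int child[2]; char symbol; } Node;

/* The example tree above: A = 00, B = 1, D = 010, C = 011. */
static const Node tree[] = {
    /* 0 root  */ {{1, 2}, 0},
    /* 1 "0"   */ {{3, 4}, 0},
    /* 2 "1"   */ {{-1, -1}, 'B'},
    /* 3 "00"  */ {{-1, -1}, 'A'},
    /* 4 "01"  */ {{5, 6}, 0},
    /* 5 "010" */ {{-1, -1}, 'D'},
    /* 6 "011" */ {{-1, -1}, 'C'},
};

/* Decode '0'/'1' characters into symbols. Returns the number of symbols
   written to out (NUL-terminated), or -1 on bad or truncated input. */
static int decode(const char *bits, char *out, size_t cap)
{
    int node = 0, n = 0;
    assert(cap > 0);
    for (; *bits; bits++) {
        assert(*bits == '0' || *bits == '1');
        node = tree[node].child[*bits - '0'];   /* follow the edge */
        if (node < 0)
            return -1;
        if (tree[node].symbol) {                /* reached a leaf */
            if ((size_t)n + 1 >= cap)
                return -1;
            out[n++] = tree[node].symbol;
            node = 0;                           /* restart at the root */
        }
    }
    out[n] = '\0';
    return node == 0 ? n : -1;   /* must stop on a symbol boundary */
}
```

Decoding "001011010" produces "ABCD".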
363
364	Given an alphabet with known symbol frequencies, the Huffman
365	algorithm allows the construction of an optimal prefix code
366	(one which represents strings with those symbol frequencies
367	using the fewest bits of any possible prefix codes for that
368	alphabet). Such a code is called a Huffman code. (See
369	reference [1] in Chapter 5, References, for additional
370	information on Huffman codes.)
371
372	Note that in the "deflate" format, the Huffman codes for the
373	various alphabets must not exceed certain maximum code lengths.
374	This constraint complicates the algorithm for computing code
375	lengths from symbol frequencies. Again, see Chapter 5,
376	References, for details.
377
378	3.2.2. Use of Huffman coding in the "deflate" format
379
380	The Huffman codes used for each alphabet in the "deflate"
381	format have two additional rules:
382
383	* All codes of a given bit length have lexicographically
384	consecutive values, in the same order as the symbols
385	they represent;
386
387	* Shorter codes lexicographically precede longer codes.
388
406	We could recode the example above to follow this rule as
407	follows, assuming that the order of the alphabet is ABCD:
408
409	   Symbol  Code
410	   ------  ----
411	   A       10
412	   B       0
413	   C       110
414	   D       111
415
416	I.e., 0 precedes 10 which precedes 11x, and 110 and 111 are
417	lexicographically consecutive.
418
419	Given this rule, we can define the Huffman code for an alphabet
420	just by giving the bit lengths of the codes for each symbol of
421	the alphabet in order; this is sufficient to determine the
422	actual codes. In our example, the code is completely defined
423	by the sequence of bit lengths (2, 1, 3, 3). The following
424	algorithm generates the codes as integers, intended to be read
425	from most- to least-significant bit. The code lengths are
426	initially in tree[I].Len; the codes are produced in
427	tree[I].Code.
428
429	1) Count the number of codes for each code length. Let
430	   bl_count[N] be the number of codes of length N, N >= 1.
431
432	2) Find the numerical value of the smallest code for each
433	   code length:
434
435	      code = 0;
436	      bl_count[0] = 0;
437	      for (bits = 1; bits <= MAX_BITS; bits++) {
438	          code = (code + bl_count[bits-1]) << 1;
439	          next_code[bits] = code;
440	      }
441
442	3) Assign numerical values to all codes, using consecutive
443	   values for all codes of the same length with the base
444	   values determined at step 2. Codes that are never used
445	   (which have a bit length of zero) must not be assigned a
446	   value.
447
448	      for (n = 0; n <= max_code; n++) {
449	          len = tree[n].Len;
450	          if (len != 0) {
451	              tree[n].Code = next_code[len];
452	              next_code[len]++;
453	          }
463	      }
464
465	Example:
466
467	Consider the alphabet ABCDEFGH, with bit lengths (3, 3, 3, 3,
468	3, 2, 4, 4). After step 1, we have:
469
470	   N      bl_count[N]
471	   -      -----------
472	   2      1
473	   3      5
474	   4      2
475
476	Step 2 computes the following next_code values:
477
478	   N      next_code[N]
479	   -      ------------
480	   1      0
481	   2      0
482	   3      2
483	   4      14
484
485	Step 3 produces the following code values:
486
487	   Symbol Length   Code
488	   ------ ------   ----
489	   A       3        010
490	   B       3        011
491	   C       3        100
492	   D       3        101
493	   E       3        110
494	   F       2         00
495	   G       4       1110
496	   H       4       1111
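The three steps translate directly into C. This sketch assumes the code lengths arrive as an int array with one entry per symbol, and it writes the codes to a parallel array. `assign_codes` is a hypothetical name. MAX_BITS is 15, the longest code length deflate allows.

```c
#include <assert.h>

#define MAX_BITS 15

/* Assign canonical codes from code lengths (steps 1-3 above).
   len[i] == 0 marks an unused symbol; its code is left as 0. */
static void assign_codes(const int *len, int n, unsigned *code)
{
    int bl_count[MAX_BITS + 1] = {0};
    unsigned next_code[MAX_BITS + 1] = {0};

    /* Step 1: count codes of each length; length 0 is not counted,
       so bl_count[0] stays 0. */
    for (int i = 0; i < n; i++) {
        assert(len[i] >= 0 && len[i] <= MAX_BITS);
        if (len[i] != 0)
            bl_count[len[i]]++;
    }

    /* Step 2: smallest code value for each length. */
    unsigned c = 0;
    for (int bits = 1; bits <= MAX_BITS; bits++) {
        c = (c + bl_count[bits - 1]) << 1;
        next_code[bits] = c;
    }

    /* Step 3: consecutive values for codes of the same length. */
    for (int i = 0; i < n; i++) {
        code[i] = 0;
        if (len[i] != 0)
            code[i] = next_code[len[i]]++;
    }
}
```

With the lengths (3, 3, 3, 3, 3, 2, 4, 4), it produces 2, 3, 4, 5, 6, 0, 14, 15, which are the codes in the table. With (2, 1, 3, 3), it produces 10, 0, 110, 111 (binary) for ABCD.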
497
498	3.2.3. Details of block format
499
500	Each block of compressed data begins with 3 header bits
501	containing the following data:
502
503	first bit BFINAL
504	next 2 bits BTYPE
505
506	Note that the header bits do not necessarily begin on a byte
507	boundary, since a block does not necessarily occupy an integral
508	number of bytes.
509
520	BFINAL is set if and only if this is the last block of the data
521	set.
522
523	BTYPE specifies how the data are compressed, as follows:
524
525	00 - no compression
526	01 - compressed with fixed Huffman codes
527	10 - compressed with dynamic Huffman codes
528	11 - reserved (error)
529
530	The only difference between the two compressed cases is how the
531	Huffman codes for the literal/length and distance alphabets are
532	defined.
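As a sketch, the header fields can be taken from a byte when the block happens to start on a byte boundary. That is always true for the first block of a stream, but not in general. BFINAL is the first bit. BTYPE is a non-Huffman element, so it is packed starting with its least-significant bit. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { unsigned bfinal, btype; } BlockHeader;

/* Header of a block whose first bit is bit 0 of byte b. */
static BlockHeader parse_header_byte(uint8_t b)
{
    BlockHeader h;
    h.bfinal = b & 1u;          /* first bit */
    h.btype = (b >> 1) & 3u;    /* next 2 bits, LSB first */
    return h;
}

static const char *btype_name(unsigned btype)
{
    static const char *const names[] = {
        "no compression", "fixed Huffman", "dynamic Huffman", "reserved (error)"
    };
    assert(btype < 4);
    return names[btype];
}
```

For example, a first byte of 0x03 begins a final block compressed with fixed Huffman codes (BFINAL = 1, BTYPE = 01).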
533
534	In all cases, the decoding algorithm for the actual data is as
535	follows:
536
537	   do
538	      read block header from input stream.
539	      if stored with no compression
540	         skip any remaining bits in current partially
541	            processed byte
542	         read LEN and NLEN (see next section)
543	         copy LEN bytes of data to output
544	      otherwise
545	         if compressed with dynamic Huffman codes
546	            read representation of code trees (see
547	               subsection below)
548	         loop (until end of block code recognized)
549	            decode literal/length value from input stream
550	            if value < 256
551	               copy value (literal byte) to output stream
552	            otherwise
553	               if value = end of block (256)
554	                  break from loop
555	               otherwise (value = 257..285)
556	                  decode distance from input stream
557
558	                  move backwards distance bytes in the output
559	                  stream, and copy length bytes from this
560	                  position to the output stream.
561	         end loop
562	   while not last block
563
564	Note that a duplicated string reference may refer to a string
565	in a previous block; i.e., the backward distance may cross one
566	or more block boundaries. However a distance cannot refer past
567	the beginning of the output stream. (An application using a
577	preset dictionary might discard part of the output stream; a
578	distance can refer to that part of the output stream anyway.)
579	Note also that the referenced string may overlap the current
580	position; for example, if the last 2 bytes decoded have values
581	X and Y, a string reference with <length = 5, distance = 2>
582	adds X,Y,X,Y,X to the output stream.
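The overlap case is why a decoder typically copies a match one byte at a time. The following is a minimal sketch with hypothetical names. A bulk copy would not reproduce the repeating pattern: memcpy is undefined for overlapping regions, and memmove copies as if through a temporary buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Append `length` bytes copied from `distance` bytes back in out[].
   Byte-at-a-time copying lets a reference overlap the bytes it is
   producing. Returns the new output position. */
static size_t copy_match(unsigned char *out, size_t pos, size_t cap,
                         unsigned length, unsigned distance)
{
    assert(distance >= 1 && distance <= pos);  /* cannot precede the start */
    assert(pos + length <= cap);
    for (unsigned i = 0; i < length; i++, pos++)
        out[pos] = out[pos - distance];
    return pos;
}
```

Starting from "XY", the reference <length = 5, distance = 2> yields "XYXYXYX".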
583
584	We now specify each compression method in turn.
585
586	3.2.4. Non-compressed blocks (BTYPE=00)
587
588	Any bits of input up to the next byte boundary are ignored.
589	The rest of the block consists of the following information:
590
591	     0   1   2   3   4...
592	   +---+---+---+---+================================+
593	   |  LEN  | NLEN  |... LEN bytes of literal data...|
594	   +---+---+---+---+================================+
595
596	LEN is the number of data bytes in the block. NLEN is the
597	one's complement of LEN.
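The following sketch validates the stored-block header. It assumes the reader has already skipped to the byte boundary, so `p` points at LEN. The function name and the -1 error convention are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return LEN for a stored block starting at p, or -1 if NLEN is not the
   one's complement of LEN or fewer than LEN data bytes follow. */
static long stored_block_len(const uint8_t *p, size_t avail)
{
    assert(p != NULL);
    if (avail < 4)
        return -1;
    unsigned len  = p[0] | (p[1] << 8);   /* little-endian (see 3.1) */
    unsigned nlen = p[2] | (p[3] << 8);
    if ((len ^ 0xFFFFu) != nlen)          /* one's complement check */
        return -1;
    if (avail - 4 < len)
        return -1;
    return (long)len;
}
```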
598
599	3.2.5. Compressed blocks (length and distance codes)
600
601	As noted above, encoded data blocks in the "deflate" format
602	consist of sequences of symbols drawn from three conceptually
603	distinct alphabets: either literal bytes, from the alphabet of
604	byte values (0..255), or <length, backward distance> pairs,
605	where the length is drawn from (3..258) and the distance is
606	drawn from (1..32,768). In fact, the literal and length
607	alphabets are merged into a single alphabet (0..285), where
608	values 0..255 represent literal bytes, the value 256 indicates
609	end-of-block, and values 257..285 represent length codes
610	(possibly in conjunction with extra bits following the symbol
611	code) as follows:
612
634	     Extra                Extra                Extra
635	Code Bits Length(s)  Code Bits Length(s)  Code Bits Length(s)
636	---- ---- ---------  ---- ---- ---------  ---- ---- ---------
637	 257    0         3   267    1     15,16   277    4     67-82
638	 258    0         4   268    1     17,18   278    4     83-98
639	 259    0         5   269    2     19-22   279    4    99-114
640	 260    0         6   270    2     23-26   280    4   115-130
641	 261    0         7   271    2     27-30   281    5   131-162
642	 262    0         8   272    2     31-34   282    5   163-194
643	 263    0         9   273    3     35-42   283    5   195-226
644	 264    0        10   274    3     43-50   284    5   227-257
645	 265    1     11,12   275    3     51-58   285    0       258
646	 266    1     13,14   276    3     59-66
647
648	The extra bits should be interpreted as a machine integer
649	stored with the most-significant bit first, e.g., bits 1110
650	represent the value 14.
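The length table can be encoded as two arrays indexed by `code - 257`. This sketch transcribes the table above. The array and function names are hypothetical, and the sketch assumes the extra-bit value has already been read.

```c
#include <assert.h>

/* Base length and number of extra bits for codes 257..285. */
static const unsigned short len_base[29] = {
    3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31,
    35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258
};
static const unsigned char len_extra[29] = {
    0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2,
    3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 0
};

/* Match length for a length code plus the value of its extra bits. */
static unsigned match_length(unsigned code, unsigned extra)
{
    assert(code >= 257 && code <= 285);
    unsigned i = code - 257;
    assert(extra < (1u << len_extra[i]));
    return len_base[i] + extra;
}
```

For instance, code 265 with extra bit 1 gives length 12, code 284 with extra value 30 gives 257, and code 285 always gives 258.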
651
652	     Extra                  Extra                  Extra
653	Code Bits    Distance  Code Bits    Distance  Code Bits    Distance
654	---- ---- -----------  ---- ---- -----------  ---- ---- -----------
655	   0    0           1    10    4       33-48    20    9   1025-1536
656	   1    0           2    11    4       49-64    21    9   1537-2048
657	   2    0           3    12    5       65-96    22   10   2049-3072
658	   3    0           4    13    5      97-128    23   10   3073-4096
659	   4    1         5,6    14    6     129-192    24   11   4097-6144
660	   5    1         7,8    15    6     193-256    25   11   6145-8192
661	   6    2        9-12    16    7     257-384    26   12  8193-12288
662	   7    2       13-16    17    7     385-512    27   12 12289-16384
663	   8    3       17-24    18    8     513-768    28   13 16385-24576
664	   9    3       25-32    19    8    769-1024    29   13 24577-32768
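Distances can be handled the same way, with a sketch built on a transcribed base table. From code 4 upward, the number of extra bits equals code / 2 - 1 (integer division), which matches every row of the table. The names are hypothetical.

```c
#include <assert.h>

/* Base distance for distance codes 0..29. */
static const unsigned short dist_base[30] = {
    1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193,
    257, 385, 513, 769, 1025, 1537, 2049, 3073, 4097, 6145,
    8193, 12289, 16385, 24577
};

/* Distance for a distance code plus the value of its extra bits. */
static unsigned match_distance(unsigned code, unsigned extra)
{
    assert(code <= 29);   /* codes 30-31 never occur in valid data */
    unsigned nbits = code < 4 ? 0 : code / 2 - 1;
    assert(extra < (1u << nbits));
    return dist_base[code] + extra;
}
```

Code 29 with extra value 8191 yields the maximum distance, 24577 + 8191 = 32768.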
665
666	3.2.6. Compression with fixed Huffman codes (BTYPE=01)
667
668	The Huffman codes for the two alphabets are fixed, and are not
669	represented explicitly in the data. The Huffman code lengths
670	for the literal/length alphabet are:
671
672	   Lit Value    Bits        Codes
673	   ---------    ----        -----
674	     0 - 143     8          00110000 through
675	                            10111111
676	   144 - 255     9          110010000 through
677	                            111111111
678	   256 - 279     7          0000000 through
679	                            0010111
680	   280 - 287     8          11000000 through
681	                            11000111
682
691	The code lengths are sufficient to generate the actual codes,
692	as described above; we show the codes in the table for added
693	clarity. Literal/length values 286-287 will never actually
694	occur in the compressed data, but participate in the code
695	construction.
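Because only the lengths matter, the fixed codes can be regenerated with the algorithm from 3.2.2. The sketch below repeats that algorithm compactly so that it stands alone. The function names are hypothetical. Its output matches the first and last code of each row in the table: for example, literal 0 maps to 00110000 (0x30) and value 280 maps to 11000000 (0xC0).

```c
#include <assert.h>

/* Fixed literal/length code lengths for BTYPE=01 (values 0..287). */
static void fixed_litlen_lengths(int len[288])
{
    for (int i = 0; i <= 143; i++) len[i] = 8;
    for (int i = 144; i <= 255; i++) len[i] = 9;
    for (int i = 256; i <= 279; i++) len[i] = 7;
    for (int i = 280; i <= 287; i++) len[i] = 8;
}

/* Canonical code assignment from 3.2.2, repeated compactly. */
static void assign_codes(const int *len, int n, unsigned *code)
{
    int bl_count[16] = {0};
    unsigned next_code[16] = {0}, c = 0;
    for (int i = 0; i < n; i++) {
        assert(len[i] >= 0 && len[i] <= 15);
        if (len[i])
            bl_count[len[i]]++;
    }
    for (int bits = 1; bits <= 15; bits++)
        next_code[bits] = c = (c + bl_count[bits - 1]) << 1;
    for (int i = 0; i < n; i++)
        code[i] = len[i] ? next_code[len[i]]++ : 0;
}
```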
696
697	Distance codes 0-31 are represented by (fixed-length) 5-bit
698	codes, with possible additional bits as shown in the table in
699	Paragraph 3.2.5, above. Note that distance codes 30-31 will
700	never actually occur in the compressed data.
701
702	3.2.7. Compression with dynamic Huffman codes (BTYPE=10)
703
704	The Huffman codes for the two alphabets appear in the block
705	immediately after the header bits and before the actual
706	compressed data, first the literal/length code and then the
707	distance code. Each code is defined by a sequence of code
708	lengths, as discussed in Paragraph 3.2.2, above. For even
709	greater compactness, the code length sequences themselves are
710	compressed using a Huffman code. The alphabet for code lengths
711	is as follows:
712
713	       0 - 15: Represent code lengths of 0 - 15
714	           16: Copy the previous code length 3 - 6 times.
715	               The next 2 bits indicate repeat length
716	                     (0 = 3, ... , 3 = 6)
717	                  Example:  Codes  8, 16 (+2 bits 11),
718	                            16 (+2 bits 10) will expand to
719	                            12 code lengths of 8 (1 + 6 + 5)
720	           17: Repeat a code length of 0 for 3 - 10 times.
721	               (3 bits of length)
722	           18: Repeat a code length of 0 for 11 - 138 times
723	               (7 bits of length)
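Here is a sketch of expanding code-length symbols into the final sequence of lengths. For simplicity, it takes the symbols and the values of their extra bits as two parallel arrays. That input format is an assumption for illustration, not how a decoder reads them from the stream. The function name is hypothetical. Rejecting a leading 16 is a choice made in this sketch: with no previous length, there is nothing to copy.

```c
#include <assert.h>

/* Expand code-length symbols (0-18) into code lengths. For 16/17/18,
   extra[i] holds the value of the symbol's 2/3/7 extra bits.
   Returns the number of lengths written, or -1 on malformed input. */
static int expand_code_lengths(const int *sym, const int *extra, int nsym,
                               int *out, int cap)
{
    int n = 0;
    for (int i = 0; i < nsym; i++) {
        int value, repeat;
        if (sym[i] >= 0 && sym[i] <= 15) {        /* literal length */
            value = sym[i];
            repeat = 1;
        } else if (sym[i] == 16) {                /* previous, 3-6 times */
            if (n == 0)
                return -1;                        /* nothing to copy */
            assert(extra[i] >= 0 && extra[i] <= 3);
            value = out[n - 1];
            repeat = 3 + extra[i];
        } else if (sym[i] == 17) {                /* zero, 3-10 times */
            assert(extra[i] >= 0 && extra[i] <= 7);
            value = 0;
            repeat = 3 + extra[i];
        } else if (sym[i] == 18) {                /* zero, 11-138 times */
            assert(extra[i] >= 0 && extra[i] <= 127);
            value = 0;
            repeat = 11 + extra[i];
        } else {
            return -1;
        }
        if (n + repeat > cap)
            return -1;
        while (repeat--)
            out[n++] = value;
    }
    return n;
}
```

The example above (8, then 16 with extra bits 11, then 16 with extra bits 10) expands to 12 lengths of 8.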
724
725	A code length of 0 indicates that the corresponding symbol in
726	the literal/length or distance alphabet will not occur in the
727	block, and should not participate in the Huffman code
728	construction algorithm given earlier. If only one distance
729	code is used, it is encoded using one bit, not zero bits; in
730	this case there is a single code length of one, with one unused
731	code. One distance code of zero bits means that there are no
732	distance codes used at all (the data is all literals).
733
734	We can now define the format of the block:
735
736	5 Bits: HLIT, # of Literal/Length codes - 257 (257 - 286)
737	5 Bits: HDIST, # of Distance codes - 1 (1 - 32)
738	4 Bits: HCLEN, # of Code Length codes - 4 (4 - 19)
739
748	(HCLEN + 4) x 3 bits: code lengths for the code length
749	alphabet given just above, in the order: 16, 17, 18,
750	0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15
751
752	These code lengths are interpreted as 3-bit integers
753	(0-7); as above, a code length of 0 means the
754	corresponding symbol (literal/length or distance code
755	length) is not used.
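This sketch places the (HCLEN + 4) transmitted 3-bit lengths into an array indexed by code-length symbol, using the permutation above. Symbols that are not transmitted keep length 0 (unused). The names are hypothetical, and the raw values are assumed to have been read already.

```c
#include <assert.h>

/* Transmission order of the code-length-code lengths. */
static const unsigned char clen_order[19] = {
    16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15
};

/* raw[] holds the HCLEN + 4 lengths in stream order; lens[] receives
   them indexed by code-length symbol 0..18. */
static void place_clen_lengths(const int *raw, int hclen, int lens[19])
{
    assert(hclen >= 0 && hclen <= 15);        /* 4-bit field */
    for (int i = 0; i < 19; i++)
        lens[i] = 0;
    for (int i = 0; i < hclen + 4; i++) {
        assert(raw[i] >= 0 && raw[i] <= 7);   /* 3-bit values */
        lens[clen_order[i]] = raw[i];
    }
}
```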
756
757	HLIT + 257 code lengths for the literal/length alphabet,
758	encoded using the code length Huffman code
759
760	HDIST + 1 code lengths for the distance alphabet,
761	encoded using the code length Huffman code
762
763	The actual compressed data of the block,
764	encoded using the literal/length and distance Huffman
765	codes
766
767	The literal/length symbol 256 (end of data),
768	encoded using the literal/length Huffman code
769
770	The code length repeat codes can cross from HLIT + 257 to the
771	HDIST + 1 code lengths. In other words, all code lengths form
772	a single sequence of HLIT + HDIST + 258 values.
773
774	3.3. Compliance
775
776	A compressor may limit further the ranges of values specified in
777	the previous section and still be compliant; for example, it may
778	limit the range of backward pointers to some value smaller than
779	32K. Similarly, a compressor may limit the size of blocks so that
780	a compressible block fits in memory.
781
782	A compliant decompressor must accept the full range of possible
783	values defined in the previous section, and must accept blocks of
784	arbitrary size.
785
786	4. Compression algorithm details
787
788	While it is the intent of this document to define the "deflate"
789	compressed data format without reference to any particular
790	compression algorithm, the format is related to the compressed
791	formats produced by LZ77 (Lempel-Ziv 1977, see reference [2] below);
792	since many variations of LZ77 are patented, it is strongly
793	recommended that the implementor of a compressor follow the general
794	algorithm presented here, which is known not to be patented per se.
795	The material in this section is not part of the definition of the
805	specification per se, and a compressor need not follow it in order to
806	be compliant.
807
808	The compressor terminates a block when it determines that starting a
809	new block with fresh trees would be useful, or when the block size
810	fills up the compressor's block buffer.
811
812	The compressor uses a chained hash table to find duplicated strings,
813	using a hash function that operates on 3-byte sequences. At any
814	given point during compression, let XYZ be the next 3 input bytes to
815	be examined (not necessarily all different, of course). First, the
816	compressor examines the hash chain for XYZ. If the chain is empty,
817	the compressor simply writes out X as a literal byte and advances one
818	byte in the input. If the hash chain is not empty, indicating that
819	the sequence XYZ (or, if we are unlucky, some other 3 bytes with the
820	same hash function value) has occurred recently, the compressor
821	compares all strings on the XYZ hash chain with the actual input data
822	sequence starting at the current point, and selects the longest
823	match.
824
825	The compressor searches the hash chains starting with the most recent
826	strings, to favor small distances and thus take advantage of the
827	Huffman encoding. The hash chains are singly linked. There are no
828	deletions from the hash chains; the algorithm simply discards matches
829	that are too old. To avoid a worst-case situation, very long hash
830	chains are arbitrarily truncated at a certain length, determined by a
831	run-time parameter.
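The hash-chain search described in the two paragraphs above can be sketched in C as follows. This is a minimal illustration, not zlib's implementation. The hash function hash3, the matcher structure, the HASH_BITS table size and the max_chain argument are all assumptions chosen for clarity. Chains are newest-first, so the walk stops at the first entry older than the 32 KiB window, and at most max_chain candidates are compared.

```c
#include <stddef.h>

#define WINDOW_SIZE 32768u          /* largest distance DEFLATE can encode */
#define MIN_MATCH   3u              /* chains are keyed on 3-byte sequences */
#define MAX_MATCH   258u            /* longest length DEFLATE can encode */
#define HASH_BITS   15u             /* hypothetical table size: 2^15 chains */
#define HASH_SIZE   (1u << HASH_BITS)
#define NIL         ((size_t)-1)    /* end-of-chain marker */

/* Hypothetical matcher state. head[h] is the most recent position whose next
   3 bytes hash to h; prev[pos] links each position to the previous one with
   the same hash, so every chain is a singly linked, newest-first list. */
typedef struct {
    size_t head[HASH_SIZE];
    size_t *prev;                   /* caller-supplied, one slot per input byte */
} matcher;

/* Illustrative hash of the 3 bytes at p; different strings may collide. */
static unsigned hash3(const unsigned char *p) {
    return (((unsigned)p[0] << 10) ^ ((unsigned)p[1] << 5) ^ p[2]) & (HASH_SIZE - 1);
}

/* Start with every chain empty. */
static void matcher_init(matcher *m, size_t *prev_storage) {
    for (size_t i = 0; i < HASH_SIZE; i++)
        m->head[i] = NIL;
    m->prev = prev_storage;
}

/* Push position pos (which needs 3 bytes of input) onto the front of its
   chain. Nothing is ever deleted; old entries are skipped by the search. */
static void matcher_insert(matcher *m, const unsigned char *buf, size_t pos) {
    unsigned h = hash3(buf + pos);
    m->prev[pos] = m->head[h];
    m->head[h] = pos;
}

/* Compare the input at pos with the earlier positions on its chain, most
   recent first, examining at most max_chain candidates. Returns the longest
   match of at least MIN_MATCH bytes (0 if none) and stores its distance in
   *dist. Comparing actual bytes also rejects mere hash collisions. */
static size_t longest_match(const matcher *m, const unsigned char *buf,
                            size_t len, size_t pos, unsigned max_chain,
                            size_t *dist) {
    size_t best = 0;
    if (len < pos + MIN_MATCH)
        return 0;
    size_t limit = len - pos < MAX_MATCH ? len - pos : MAX_MATCH;
    for (size_t cand = m->head[hash3(buf + pos)];
         cand != NIL && max_chain > 0; cand = m->prev[cand], max_chain--) {
        if (cand >= pos)
            continue;               /* never match the current position itself */
        if (pos - cand > WINDOW_SIZE)
            break;                  /* this entry and all later ones are too old */
        size_t n = 0;
        while (n < limit && buf[cand + n] == buf[pos + n])
            n++;
        if (n >= MIN_MATCH && n > best) {
            best = n;
            *dist = pos - cand;
        }
    }
    return best;
}
```

A compressor loop would search at the current position and then call matcher_insert for each position it consumes that still has 3 bytes of input ahead of it.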
832
833	To improve overall compression, the compressor optionally defers the
834	selection of matches ("lazy matching"): after a match of length N has
835	been found, the compressor searches for a longer match starting at
836	the next input byte. If it finds a longer match, it truncates the
837	previous match to a length of one (thus producing a single literal
838	byte) and then emits the longer match. Otherwise, it emits the
839	original match, and, as described above, advances N bytes before
840	continuing.
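The lazy-matching rule can be sketched in C as below. To keep the example self-contained, find_match is a brute-force stand-in for the hash-chain search, and the token type and function names are hypothetical. For simplicity the parser repeats the search at pos + 1 on the next iteration instead of remembering the result.

```c
#include <stddef.h>

#define WINDOW_SIZE 32768u
#define MIN_MATCH   3u
#define MAX_MATCH   258u

/* One output token: a literal byte (len == 0) or a (length, distance) match. */
typedef struct {
    size_t len, dist;
    unsigned char lit;
} token;

/* Brute-force stand-in for the hash-chain search: longest match for
   buf[pos..] within the window, preferring the most recent on ties. */
static size_t find_match(const unsigned char *buf, size_t n, size_t pos,
                         size_t *dist) {
    size_t best = 0;
    size_t start = pos > WINDOW_SIZE ? pos - WINDOW_SIZE : 0;
    size_t limit = n - pos < MAX_MATCH ? n - pos : MAX_MATCH;
    for (size_t c = pos; c-- > start;) {        /* newest candidate first */
        size_t k = 0;
        while (k < limit && buf[c + k] == buf[pos + k])
            k++;
        if (k >= MIN_MATCH && k > best) {
            best = k;
            *dist = pos - c;
        }
    }
    return best;
}

/* Lazy parse: after finding a match of length N at pos, look for a longer
   match at pos + 1. If there is one, emit buf[pos] as a literal (the current
   match truncated to length one); otherwise emit the match and advance N
   bytes. Returns the number of tokens written to out (at most n). */
static size_t lazy_parse(const unsigned char *buf, size_t n, token *out) {
    size_t pos = 0, count = 0;
    while (pos < n) {
        size_t dist = 0, len = find_match(buf, n, pos, &dist);
        if (len > 0 && pos + 1 < n) {
            size_t next_dist = 0;
            if (find_match(buf, n, pos + 1, &next_dist) > len)
                len = 0;                        /* defer to the longer match */
        }
        if (len == 0) {
            out[count++] = (token){0, 0, buf[pos]};
            pos += 1;
        } else {
            out[count++] = (token){len, dist, 0};
            pos += len;
        }
    }
    return count;
}
```

On the input abcXbcdeYabcde, a greedy parser would take the 3-byte match for "abc" at position 9. The lazy parser instead emits 'a' as a literal and then a 4-byte match for "bcde" at distance 6, which uses one token fewer.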
841
842	Run-time parameters also control this "lazy match" procedure. If
843	compression ratio is most important, the compressor attempts a
844	complete second search regardless of the length of the first match.
845	In the normal case, if the current match is "long enough", the
846	compressor reduces the search for a longer match, thus speeding up
847	the process. If speed is most important, the compressor inserts new
848	strings in the hash table only when no match was found, or when the
849	match is not "too long". This degrades the compression ratio but
850	saves time since there are both fewer insertions and fewer searches.
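The run-time parameters described above can be expressed as two small policy functions. The parameter names, the divide-by-four reduction and max_insert are assumptions for illustration. zlib exposes comparable knobs through deflateTune(), but this is not its code. Setting good_length and max_lazy above 258, the longest DEFLATE match, requests a complete second search regardless of the first match length.

```c
/* Hypothetical tuning parameters for the lazy match procedure. */
typedef struct {
    unsigned max_chain;    /* normal cap on hash-chain candidates examined */
    unsigned good_length;  /* first match this long: reduce the second search */
    unsigned max_lazy;     /* first match this long: skip the second search */
} lazy_params;

/* Return how many chain candidates the second search may examine after a
   first match of length cur_len; 0 means emit the first match at once. */
static unsigned lazy_budget(const lazy_params *p, unsigned cur_len) {
    if (cur_len >= p->max_lazy)
        return 0;                       /* "long enough": no second search */
    if (cur_len >= p->good_length)
        return p->max_chain / 4;        /* reduced search, to save time */
    return p->max_chain;                /* complete second search */
}

/* Speed-oriented mode: insert the strings covered by a match into the hash
   table only when no match was found or the match is not "too long". */
static int insert_match_strings(unsigned match_len, unsigned max_insert) {
    return match_len == 0 || match_len <= max_insert;
}
```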
851
862	5. References
863
864	[1] Huffman, D. A., "A Method for the Construction of Minimum
865	Redundancy Codes", Proceedings of the Institute of Radio
866	Engineers, September 1952, Volume 40, Number 9, pp. 1098-1101.
867
868	[2] Ziv J., Lempel A., "A Universal Algorithm for Sequential Data
869	Compression", IEEE Transactions on Information Theory, Vol. 23,
870	No. 3, pp. 337-343.
871
872	[3] Gailly, J.-L., and Adler, M., ZLIB documentation and sources,
873	available in ftp://ftp.uu.net/pub/archiving/zip/doc/
874
875	[4] Gailly, J.-L., and Adler, M., GZIP documentation and sources,
876	available as gzip-*.tar in ftp://prep.ai.mit.edu/pub/gnu/
877
878	[5] Schwartz, E. S., and Kallick, B. "Generating a canonical prefix
879	encoding." Comm. ACM, 7,3 (Mar. 1964), pp. 166-169.
880
881	[6] Hirschberg and Lelewer, "Efficient decoding of prefix codes,"
882	Comm. ACM, 33,4, April 1990, pp. 449-459.
883
884	6. Security Considerations
885
886	Any data compression method involves the reduction of redundancy in
887	the data. Consequently, any corruption of the data is likely to have
888	severe effects and be difficult to correct. Uncompressed text, on
889	the other hand, will probably still be readable despite the presence
890	of some corrupted bytes.
891
892	It is recommended that systems using this data format provide some
893	means of validating the integrity of the compressed data. See
894	reference [3], for example.
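As one example of the recommended integrity check, the zlib format described in reference [3] appends an Adler-32 checksum of the uncompressed data. The C sketch below computes it in the most direct way, reducing modulo 65521 after every byte. zlib's own adler32() defers the reductions for speed, so treat this as an illustration rather than its implementation. The function name adler32_simple is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u   /* largest prime below 2^16 */

/* Adler-32: A is 1 plus the sum of the bytes, B is the sum of the successive
   A values, both modulo 65521; the checksum is B * 65536 + A. */
static uint32_t adler32_simple(const unsigned char *buf, size_t len) {
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % ADLER_MOD;
        b = (b + a) % ADLER_MOD;
    }
    return (b << 16) | a;
}
```

For example, the nine bytes of the ASCII string "Wikipedia" give A = 920 and B = 4582, so the checksum is 0x11E60398.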
895
896	7. Source code
897
898	Source code for a C language implementation of a "deflate" compliant
899	compressor and decompressor is available within the zlib package at
900	ftp://ftp.uu.net/pub/archiving/zip/zlib/.
901
902	8. Acknowledgements
903
904	Trademarks cited in this document are the property of their
905	respective owners.
906
907	Phil Katz designed the deflate format. Jean-Loup Gailly and Mark
908	Adler wrote the related software described in this specification.
909	Glenn Randers-Pehrson converted this document to RFC and HTML format.
910
919	9. Author's Address
920
921	L. Peter Deutsch
922	Aladdin Enterprises
923	203 Santa Margarita Ave.
924	Menlo Park, CA 94025
925
926	Phone: (415) 322-0103 (AM only)
927	FAX: (415) 322-1734
928	EMail: <[email protected]>
929
930	Questions about the technical content of this specification can be
931	sent by email to:
932
933	Jean-Loup Gailly <[email protected]> and
934	Mark Adler <[email protected]>
935
936	Editorial comments on this specification can be sent by email to:
937
938	L. Peter Deutsch <[email protected]> and
939	Glenn Randers-Pehrson <[email protected]>
940
972


D compat/zlib/doc/rfc1952.txt

-687

		--- a/compat/zlib/doc/rfc1952.txt
		+++ b/compat/zlib/doc/rfc1952.txt
		@@ -1,687 +0,0 @@
1		-
2		-
3		-
4		-
5		-
6		-
7		-Network Working Group P. Deutsch
8		-Request for Comments: 1952 Aladdin Enterprises
9		-Category: Informational May 1996
10		-
11		-
12		- GZIP file format specification version 4.3
13		-
14		-Status of This Memo
15		-
16		- This memo provides information for the Internet community. This memo
17		- does not specify an Internet standard of any kind. Distribution of
18		- this memo is unlimited.
19		-
20		-IESG Note:
21		-
22		- The IESG takes no position on the validity of any Intellectual
23		- Property Rights statements contained in this document.
24		-
25		-Notices
26		-
27		- Copyright (c) 1996 L. Peter Deutsch
28		-
29		- Permission is granted to copy and distribute this document for any
30		- purpose and without charge, including translations into other
31		- languages and incorporation into compilations, provided that the
32		- copyright notice and this notice are preserved, and that any
33		- substantive changes or deletions from the original are clearly
34		- marked.
35		-
36		- A pointer to the latest version of this and related documentation in
37		- HTML format can be found at the URL
38		- <ftp://ftp.uu.net/graphics/png/documents/zlib/zdoc-index.html>.
39		-
40		-Abstract
41		-
42		- This specification defines a lossless compressed data format that is
43		- compatible with the widely used GZIP utility. The format includes a
44		- cyclic redundancy check value for detecting data corruption. The
45		- format presently uses the DEFLATE method of compression but can be
46		- easily extended to use other compression methods. The format can be
47		- implemented readily in a manner not covered by patents.
48		-
49		-
50		-
51		-
52		-
53		-
54		-
55		-
56		-
57		-
58		-Deutsch Informational [Page 1]
59		-
60		-
61		-RFC 1952 GZIP File Format Specification May 1996
62		-
63		-
64		-Table of Contents
65		-
66		- 1. Introduction ................................................... 2
67		- 1.1. Purpose ................................................... 2
68		- 1.2. Intended audience ......................................... 3
69		- 1.3. Scope ..................................................... 3
70		- 1.4. Compliance ................................................ 3
71		- 1.5. Definitions of terms and conventions used ................. 3
72		- 1.6. Changes from previous versions ............................ 3
73		- 2. Detailed specification ......................................... 4
74		- 2.1. Overall conventions ....................................... 4
75		- 2.2. File format ............................................... 5
76		- 2.3. Member format ............................................. 5
77		- 2.3.1. Member header and trailer ........................... 6
78		- 2.3.1.1. Extra field ................................... 8
79		- 2.3.1.2. Compliance .................................... 9
80		- 3. References .................................................. 9
81		- 4. Security Considerations .................................... 10
82		- 5. Acknowledgements ........................................... 10
83		- 6. Author's Address ........................................... 10
84		- 7. Appendix: Jean-Loup Gailly's gzip utility .................. 11
85		- 8. Appendix: Sample CRC Code .................................. 11
86		-
87		-1. Introduction
88		-
89		- 1.1. Purpose
90		-
91		- The purpose of this specification is to define a lossless
92		- compressed data format that:
93		-
94		- * Is independent of CPU type, operating system, file system,
95		- and character set, and hence can be used for interchange;
96		- * Can compress or decompress a data stream (as opposed to a
97		- randomly accessible file) to produce another data stream,
98		- using only an a priori bounded amount of intermediate
99		- storage, and hence can be used in data communications or
100		- similar structures such as Unix filters;
101		- * Compresses data with efficiency comparable to the best
102		- currently available general-purpose compression methods,
103		- and in particular considerably better than the "compress"
104		- program;
105		- * Can be implemented readily in a manner not covered by
106		- patents, and hence can be practiced freely;
107		- * Is compatible with the file format produced by the current
108		- widely used gzip utility, in that conforming decompressors
109		- will be able to read data produced by the existing gzip
110		- compressor.
111		-
121		- The data format defined by this specification does not attempt to:
122		-
123		- * Provide random access to compressed data;
124		- * Compress specialized data (e.g., raster graphics) as well as
125		- the best currently available specialized algorithms.
126		-
127		- 1.2. Intended audience
128		-
129		- This specification is intended for use by implementors of software
130		- to compress data into gzip format and/or decompress data from gzip
131		- format.
132		-
133		- The text of the specification assumes a basic background in
134		- programming at the level of bits and other primitive data
135		- representations.
136		-
137		- 1.3. Scope
138		-
139		- The specification specifies a compression method and a file format
140		- (the latter assuming only that a file can store a sequence of
141		- arbitrary bytes). It does not specify any particular interface to
142		- a file system or anything about character sets or encodings
143		- (except for file names and comments, which are optional).
144		-
145		- 1.4. Compliance
146		-
147		- Unless otherwise indicated below, a compliant decompressor must be
148		- able to accept and decompress any file that conforms to all the
149		- specifications presented here; a compliant compressor must produce
150		- files that conform to all the specifications presented here. The
151		- material in the appendices is not part of the specification per se
152		- and is not relevant to compliance.
153		-
154		- 1.5. Definitions of terms and conventions used
155		-
156		- byte: 8 bits stored or transmitted as a unit (same as an octet).
157		- (For this specification, a byte is exactly 8 bits, even on
158		- machines which store a character on a number of bits different
159		- from 8.) See below for the numbering of bits within a byte.
160		-
161		- 1.6. Changes from previous versions
162		-
163		- There have been no technical changes to the gzip format since
164		- version 4.1 of this specification. In version 4.2, some
165		- terminology was changed, and the sample CRC code was rewritten for
166		- clarity and to eliminate the requirement for the caller to do pre-
167		- and post-conditioning. Version 4.3 is a conversion of the
168		- specification to RFC style.
169		-
178		-2. Detailed specification
179		-
180		- 2.1. Overall conventions
181		-
182		- In the diagrams below, a box like this:
183		-
184		- +---+
185		- |   | <-- the vertical bars might be missing
186		- +---+
187		-
188		- represents one byte; a box like this:
189		-
190		- +==============+
191		- |              |
192		- +==============+
193		-
194		- represents a variable number of bytes.
195		-
196		- Bytes stored within a computer do not have a "bit order", since
197		- they are always treated as a unit. However, a byte considered as
198		- an integer between 0 and 255 does have a most- and least-
199		- significant bit, and since we write numbers with the most-
200		- significant digit on the left, we also write bytes with the most-
201		- significant bit on the left. In the diagrams below, we number the
202		- bits of a byte so that bit 0 is the least-significant bit, i.e.,
203		- the bits are numbered:
204		-
205		- +--------+
206		- |76543210|
207		- +--------+
208		-
209		- This document does not address the issue of the order in which
210		- bits of a byte are transmitted on a bit-sequential medium, since
211		- the data format described here is byte- rather than bit-oriented.
212		-
213		- Within a computer, a number may occupy multiple bytes. All
214		- multi-byte numbers in the format described here are stored with
215		- the least-significant byte first (at the lower memory address).
216		- For example, the decimal number 520 is stored as:
217		-
218		-      0        1
219		- +--------+--------+
220		- |00001000|00000010|
221		- +--------+--------+
222		-  ^        ^
223		-  |        |
224		-  |        + more significant byte = 2 x 256
225		-  + less significant byte = 8
226		-
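A minimal C sketch of the byte order described above follows. get_le16, get_le32 and put_le16 are hypothetical helper names. They assemble values with shifts, so the result does not depend on the host CPU's own byte order.

```c
#include <stdint.h>

/* Read a 16-bit value stored least-significant byte first. */
static uint16_t get_le16(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Read a 32-bit value stored least-significant byte first
   (the layout used for MTIME, CRC32 and ISIZE). */
static uint32_t get_le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Store a 16-bit value least-significant byte first. */
static void put_le16(unsigned char *p, uint16_t v) {
    p[0] = (unsigned char)(v & 0xffu);
    p[1] = (unsigned char)(v >> 8);
}
```

For the example above, put_le16(b, 520) stores b[0] = 8 and b[1] = 2, and get_le16(b) reads 520 back.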
235		- 2.2. File format
236		-
237		- A gzip file consists of a series of "members" (compressed data
238		- sets). The format of each member is specified in the following
239		- section. The members simply appear one after another in the file,
240		- with no additional information before, between, or after them.
241		-
242		- 2.3. Member format
243		-
244		- Each member has the following structure:
245		-
246		- +---+---+---+---+---+---+---+---+---+---+
247		- |ID1|ID2|CM |FLG|     MTIME     |XFL|OS | (more-->)
248		- +---+---+---+---+---+---+---+---+---+---+
249		-
250		- (if FLG.FEXTRA set)
251		-
252		- +---+---+=================================+
253		- | XLEN  |...XLEN bytes of "extra field"...| (more-->)
254		- +---+---+=================================+
255		-
256		- (if FLG.FNAME set)
257		-
258		- +=========================================+
259		- |...original file name, zero-terminated...| (more-->)
260		- +=========================================+
261		-
262		- (if FLG.FCOMMENT set)
263		-
264		- +===================================+
265		- |...file comment, zero-terminated...| (more-->)
266		- +===================================+
267		-
268		- (if FLG.FHCRC set)
269		-
270		- +---+---+
271		- | CRC16 |
272		- +---+---+
273		-
274		- +=======================+
275		- |...compressed blocks...| (more-->)
276		- +=======================+
277		-
278		-   0   1   2   3   4   5   6   7
279		- +---+---+---+---+---+---+---+---+
280		- |     CRC32     |     ISIZE     |
281		- +---+---+---+---+---+---+---+---+
282		-
292		- 2.3.1. Member header and trailer
293		-
294		- ID1 (IDentification 1)
295		- ID2 (IDentification 2)
296		- These have the fixed values ID1 = 31 (0x1f, \037), ID2 = 139
297		- (0x8b, \213), to identify the file as being in gzip format.
298		-
299		- CM (Compression Method)
300		- This identifies the compression method used in the file. CM
301		- = 0-7 are reserved. CM = 8 denotes the "deflate"
302		- compression method, which is the one customarily used by
303		- gzip and which is documented elsewhere.
304		-
305		- FLG (FLaGs)
306		- This flag byte is divided into individual bits as follows:
307		-
308		- bit 0 FTEXT
309		- bit 1 FHCRC
310		- bit 2 FEXTRA
311		- bit 3 FNAME
312		- bit 4 FCOMMENT
313		- bit 5 reserved
314		- bit 6 reserved
315		- bit 7 reserved
316		-
317		- If FTEXT is set, the file is probably ASCII text. This is
318		- an optional indication, which the compressor may set by
319		- checking a small amount of the input data to see whether any
320		- non-ASCII characters are present. In case of doubt, FTEXT
321		- is cleared, indicating binary data. For systems which have
322		- different file formats for ascii text and binary data, the
323		- decompressor can use FTEXT to choose the appropriate format.
324		- We deliberately do not specify the algorithm used to set
325		- this bit, since a compressor always has the option of
326		- leaving it cleared and a decompressor always has the option
327		- of ignoring it and letting some other program handle issues
328		- of data conversion.
329		-
330		- If FHCRC is set, a CRC16 for the gzip header is present,
331		- immediately before the compressed data. The CRC16 consists
332		- of the two least significant bytes of the CRC32 for all
333		- bytes of the gzip header up to and not including the CRC16.
334		- [The FHCRC bit was never set by versions of gzip up to
335		- 1.2.4, even though it was documented with a different
336		- meaning in gzip 1.2.4.]
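The FHCRC rule can be sketched as follows. crc32_bitwise and header_crc16 are hypothetical names. The CRC-32 here uses the same reflected polynomial (0xedb88320) and one's-complement pre- and post-conditioning as the table-driven sample code in the appendix, but it processes one bit at a time to stay short.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32, equivalent to the table-driven appendix code. */
static uint32_t crc32_bitwise(const unsigned char *buf, size_t len) {
    uint32_t c = 0xffffffffu;                 /* pre-conditioning */
    for (size_t n = 0; n < len; n++) {
        c ^= buf[n];
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? 0xedb88320u ^ (c >> 1) : c >> 1;
    }
    return c ^ 0xffffffffu;                   /* post-conditioning */
}

/* FHCRC value: the two least-significant bytes of the CRC-32 of every
   header byte that precedes the CRC16 field. */
static uint16_t header_crc16(const unsigned char *hdr, size_t hdr_len) {
    return (uint16_t)(crc32_bitwise(hdr, hdr_len) & 0xffffu);
}
```

As a sanity check, the standard CRC-32 check value for the ASCII string "123456789" is 0xCBF43926, so the low two bytes over the same input would be 0x3926.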
337		-
338		- If FEXTRA is set, optional extra fields are present, as
339		- described in a following section.
340		-
349		- If FNAME is set, an original file name is present,
350		- terminated by a zero byte. The name must consist of ISO
351		- 8859-1 (LATIN-1) characters; on operating systems using
352		- EBCDIC or any other character set for file names, the name
353		- must be translated to the ISO LATIN-1 character set. This
354		- is the original name of the file being compressed, with any
355		- directory components removed, and, if the file being
356		- compressed is on a file system with case insensitive names,
357		- forced to lower case. There is no original file name if the
358		- data was compressed from a source other than a named file;
359		- for example, if the source was stdin on a Unix system, there
360		- is no file name.
361		-
362		- If FCOMMENT is set, a zero-terminated file comment is
363		- present. This comment is not interpreted; it is only
364		- intended for human consumption. The comment must consist of
365		- ISO 8859-1 (LATIN-1) characters. Line breaks should be
366		- denoted by a single line feed character (10 decimal).
367		-
368		- Reserved FLG bits must be zero.
369		-
370		- MTIME (Modification TIME)
371		- This gives the most recent modification time of the original
372		- file being compressed. The time is in Unix format, i.e.,
373		- seconds since 00:00:00 GMT, Jan. 1, 1970. (Note that this
374		- may cause problems for MS-DOS and other systems that use
375		- local rather than Universal time.) If the compressed data
376		- did not come from a file, MTIME is set to the time at which
377		- compression started. MTIME = 0 means no time stamp is
378		- available.
379		-
380		- XFL (eXtra FLags)
381		- These flags are available for use by specific compression
382		- methods. The "deflate" method (CM = 8) sets these flags as
383		- follows:
384		-
385		- XFL = 2 - compressor used maximum compression,
386		-           slowest algorithm
387		- XFL = 4 - compressor used fastest algorithm
388		-
389		- OS (Operating System)
390		- This identifies the type of file system on which compression
391		- took place. This may be useful in determining end-of-line
392		- convention for text files. The currently defined values are
393		- as follows:
394		-
406		-   0 - FAT filesystem (MS-DOS, OS/2, NT/Win32)
407		-   1 - Amiga
408		-   2 - VMS (or OpenVMS)
409		-   3 - Unix
410		-   4 - VM/CMS
411		-   5 - Atari TOS
412		-   6 - HPFS filesystem (OS/2, NT)
413		-   7 - Macintosh
414		-   8 - Z-System
415		-   9 - CP/M
416		-  10 - TOPS-20
417		-  11 - NTFS filesystem (NT)
418		-  12 - QDOS
419		-  13 - Acorn RISCOS
420		- 255 - unknown
421		-
422		- XLEN (eXtra LENgth)
423		- If FLG.FEXTRA is set, this gives the length of the optional
424		- extra field. See below for details.
425		-
426		- CRC32 (CRC-32)
427		- This contains a Cyclic Redundancy Check value of the
428		- uncompressed data computed according to CRC-32 algorithm
429		- used in the ISO 3309 standard and in section 8.1.1.6.2 of
430		- ITU-T recommendation V.42. (See http://www.iso.ch for
431		- ordering ISO documents. See gopher://info.itu.ch for an
432		- online version of ITU-T V.42.)
433		-
434		- ISIZE (Input SIZE)
435		- This contains the size of the original (uncompressed) input
436		- data modulo 2^32.
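A decompressor can check the 8-byte trailer against what it actually produced. In this hypothetical check_trailer helper, crc is the CRC-32 that the decompressor computed over the uncompressed output and total is the number of bytes it wrote. Only the low 32 bits of total are compared, because ISIZE is stored modulo 2^32.

```c
#include <stdint.h>

/* Check a member trailer: CRC32 then ISIZE, both little-endian.
   Returns 1 if both fields match, 0 otherwise. */
static int check_trailer(const unsigned char t[8], uint32_t crc, uint64_t total) {
    uint32_t stored_crc = (uint32_t)t[0] | ((uint32_t)t[1] << 8) |
                          ((uint32_t)t[2] << 16) | ((uint32_t)t[3] << 24);
    uint32_t isize = (uint32_t)t[4] | ((uint32_t)t[5] << 8) |
                     ((uint32_t)t[6] << 16) | ((uint32_t)t[7] << 24);
    /* ISIZE keeps only the input size modulo 2^32. */
    return stored_crc == crc && isize == (uint32_t)(total & 0xffffffffu);
}
```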
437		-
438		- 2.3.1.1. Extra field
439		-
440		- If the FLG.FEXTRA bit is set, an "extra field" is present in
441		- the header, with total length XLEN bytes. It consists of a
442		- series of subfields, each of the form:
443		-
444		- +---+---+---+---+==================================+
445		- |SI1|SI2|  LEN  |... LEN bytes of subfield data ...|
446		- +---+---+---+---+==================================+
447		-
448		- SI1 and SI2 provide a subfield ID, typically two ASCII letters
449		- with some mnemonic value. Jean-Loup Gailly
450		- <[email protected]> is maintaining a registry of subfield
451		- IDs; please send him any subfield ID you wish to use. Subfield
452		- IDs with SI2 = 0 are reserved for future use. The following
453		- IDs are currently defined:
454		-
463		- SI1         SI2         Data
464		- ----------  ----------  ----
465		- 0x41 ('A')  0x70 ('P')  Apollo file type information
466		-
467		- LEN gives the length of the subfield data, excluding the 4
468		- initial bytes.
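Walking the subfields of an extra field could look like the sketch below. find_subfield is a hypothetical helper. It rejects any subfield whose LEN would run past the XLEN bytes of the extra field.

```c
#include <stddef.h>

/* Search an extra field of xlen bytes for the subfield with ID (si1, si2).
   Each subfield is SI1, SI2, a 2-byte little-endian LEN, then LEN data
   bytes. Returns a pointer to the data and stores its length in *out_len,
   or returns NULL if the ID is absent or a subfield overruns XLEN. */
static const unsigned char *find_subfield(const unsigned char *extra,
                                          size_t xlen, unsigned char si1,
                                          unsigned char si2, size_t *out_len) {
    size_t pos = 0;
    while (xlen - pos >= 4) {
        size_t len = (size_t)extra[pos + 2] | ((size_t)extra[pos + 3] << 8);
        if (xlen - pos - 4 < len)
            return NULL;                        /* malformed subfield */
        if (extra[pos] == si1 && extra[pos + 1] == si2) {
            *out_len = len;
            return extra + pos + 4;
        }
        pos += 4 + len;                         /* skip header and data */
    }
    return NULL;
}
```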
469		-
470		- 2.3.1.2. Compliance
471		-
472		- A compliant compressor must produce files with correct ID1,
473		- ID2, CM, CRC32, and ISIZE, but may set all the other fields in
474		- the fixed-length part of the header to default values (255 for
475		- OS, 0 for all others). The compressor must set all reserved
476		- bits to zero.
477		-
478		- A compliant decompressor must check ID1, ID2, and CM, and
479		- provide an error indication if any of these have incorrect
480		- values. It must examine FEXTRA/XLEN, FNAME, FCOMMENT and FHCRC
481		- at least so it can skip over the optional fields if they are
482		- present. It need not examine any other part of the header or
483		- trailer; in particular, a decompressor may ignore FTEXT and OS
484		- and always produce binary output, and still be compliant. A
485		- compliant decompressor must give an error indication if any
486		- reserved bit is non-zero, since such a bit could indicate the
487		- presence of a new field that would cause subsequent data to be
488		- interpreted incorrectly.
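The decompressor requirements above can be sketched as a header parser. parse_gzip_header and its return convention are assumptions for illustration. It checks ID1, ID2 and CM, rejects reserved FLG bits, and skips any FEXTRA, FNAME, FCOMMENT and FHCRC fields. The CRC16 is skipped, not verified. The function returns the offset of the compressed data, or 0 on error.

```c
#include <stddef.h>
#include <stdint.h>

/* FLG bits, numbered with bit 0 as the least-significant bit. */
#define FTEXT     0x01
#define FHCRC     0x02
#define FEXTRA    0x04
#define FNAME     0x08
#define FCOMMENT  0x10
#define FRESERVED 0xE0   /* bits 5-7 must be zero */

static size_t parse_gzip_header(const unsigned char *buf, size_t len,
                                uint32_t *mtime, unsigned *os) {
    size_t pos = 10;                                /* fixed-length part */
    if (len < 10) return 0;
    if (buf[0] != 31 || buf[1] != 139) return 0;   /* ID1, ID2 */
    if (buf[2] != 8) return 0;                      /* CM = deflate */
    unsigned flg = buf[3];
    if (flg & FRESERVED) return 0;                  /* unknown field may follow */
    *mtime = (uint32_t)buf[4] | ((uint32_t)buf[5] << 8) |
             ((uint32_t)buf[6] << 16) | ((uint32_t)buf[7] << 24);
    *os = buf[9];                                   /* reported, not validated */
    if (flg & FEXTRA) {                             /* XLEN, then XLEN bytes */
        if (len - pos < 2) return 0;
        size_t xlen = (size_t)buf[pos] | ((size_t)buf[pos + 1] << 8);
        pos += 2;
        if (len - pos < xlen) return 0;
        pos += xlen;
    }
    if (flg & FNAME) {                              /* zero-terminated name */
        while (pos < len && buf[pos] != 0) pos++;
        if (pos == len) return 0;
        pos++;
    }
    if (flg & FCOMMENT) {                           /* zero-terminated comment */
        while (pos < len && buf[pos] != 0) pos++;
        if (pos == len) return 0;
        pos++;
    }
    if (flg & FHCRC) {                              /* CRC16, skipped only */
        if (len - pos < 2) return 0;
        pos += 2;
    }
    return pos;
}
```

A valid header is always at least 10 bytes long, so 0 can never be a legitimate offset. That lets 0 double as the error value.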
489		-
490		-3. References
491		-
492		- [1] "Information Processing - 8-bit single-byte coded graphic
493		- character sets - Part 1: Latin alphabet No.1" (ISO 8859-1:1987).
494		- The ISO 8859-1 (Latin-1) character set is a superset of 7-bit
495		- ASCII. Files defining this character set are available as
496		- iso_8859-1.* in ftp://ftp.uu.net/graphics/png/documents/
497		-
498		- [2] ISO 3309
499		-
500		- [3] ITU-T recommendation V.42
501		-
502		- [4] Deutsch, L.P., "DEFLATE Compressed Data Format Specification",
503		- available in ftp://ftp.uu.net/pub/archiving/zip/doc/
504		-
505		- [5] Gailly, J.-L., GZIP documentation, available as gzip-*.tar in
506		- ftp://prep.ai.mit.edu/pub/gnu/
507		-
508		- [6] Sarwate, D.V., "Computation of Cyclic Redundancy Checks via Table
509		- Look-Up", Communications of the ACM, 31(8), pp.1008-1013.
510		-
520		- [7] Schwaderer, W.D., "CRC Calculation", April 85 PC Tech Journal,
521		- pp.118-133.
522		-
523		- [8] ftp://ftp.adelaide.edu.au/pub/rocksoft/papers/crc_v3.txt,
524		- describing the CRC concept.
525		-
526		-4. Security Considerations
527		-
528		- Any data compression method involves the reduction of redundancy in
529		- the data. Consequently, any corruption of the data is likely to have
530		- severe effects and be difficult to correct. Uncompressed text, on
531		- the other hand, will probably still be readable despite the presence
532		- of some corrupted bytes.
533		-
534		- It is recommended that systems using this data format provide some
535		- means of validating the integrity of the compressed data, such as by
536		- setting and checking the CRC-32 check value.
537		-
538		-5. Acknowledgements
539		-
540		- Trademarks cited in this document are the property of their
541		- respective owners.
542		-
543		- Jean-Loup Gailly designed the gzip format and wrote, with Mark Adler,
544		- the related software described in this specification. Glenn
545		- Randers-Pehrson converted this document to RFC and HTML format.
546		-
547		-6. Author's Address
548		-
549		- L. Peter Deutsch
550		- Aladdin Enterprises
551		- 203 Santa Margarita Ave.
552		- Menlo Park, CA 94025
553		-
554		- Phone: (415) 322-0103 (AM only)
555		- FAX: (415) 322-1734
556		- EMail: <[email protected]>
557		-
558		- Questions about the technical content of this specification can be
559		- sent by email to:
560		-
561		- Jean-Loup Gailly <[email protected]> and
562		- Mark Adler <[email protected]>
563		-
564		- Editorial comments on this specification can be sent by email to:
565		-
566		- L. Peter Deutsch <[email protected]> and
567		- Glenn Randers-Pehrson <[email protected]>
568		-
577		-7. Appendix: Jean-Loup Gailly's gzip utility
578		-
579		- The most widely used implementation of gzip compression, and the
580		- original documentation on which this specification is based, were
581		- created by Jean-Loup Gailly <[email protected]>. Since this
582		- implementation is a de facto standard, we mention some more of its
583		- features here. Again, the material in this section is not part of
584		- the specification per se, and implementations need not follow it to
585		- be compliant.
586		-
587		- When compressing or decompressing a file, gzip preserves the
588		- protection, ownership, and modification time attributes on the local
589		- file system, since there is no provision for representing protection
590		- attributes in the gzip file format itself. Since the file format
591		- includes a modification time, the gzip decompressor provides a
592		- command line switch that assigns the modification time from the file,
593		- rather than the local modification time of the compressed input, to
594		- the decompressed output.
595		-
596		-8. Appendix: Sample CRC Code
597		-
598		- The following sample code represents a practical implementation of
599		- the CRC (Cyclic Redundancy Check). (See also ISO 3309 and ITU-T V.42
600		- for a formal specification.)
601		-
602		- The sample code is in the ANSI C programming language. Non-C users
603		- may find it easier to read with these hints:
604		-
605		- &      Bitwise AND operator.
606		- ^      Bitwise exclusive-OR operator.
607		- >>     Bitwise right shift operator. When applied to an
608		-        unsigned quantity, as here, right shift inserts zero
609		-        bit(s) at the left.
610		- !      Logical NOT operator.
611		- ++     "n++" increments the variable n.
612		- 0xNNN  0x introduces a hexadecimal (base 16) constant.
613		-        Suffix L indicates a long value (at least 32 bits).
614		-
615		- /* Table of CRCs of all 8-bit messages. */
616		- unsigned long crc_table[256];
617		-
618		- /* Flag: has the table been computed? Initially false. */
619		- int crc_table_computed = 0;
620		-
621		- /* Make the table for a fast CRC. */
622		- void make_crc_table(void)
623		- {
624		-   unsigned long c;
634		-   int n, k;
635		-   for (n = 0; n < 256; n++) {
636		-     c = (unsigned long) n;
637		-     for (k = 0; k < 8; k++) {
638		-       if (c & 1) {
639		-         c = 0xedb88320L ^ (c >> 1);
640		-       } else {
641		-         c = c >> 1;
642		-       }
643		-     }
644		-     crc_table[n] = c;
645		-   }
646		-   crc_table_computed = 1;
647		- }
648		-
649		- /*
650		-    Update a running crc with the bytes buf[0..len-1] and return
651		-    the updated crc. The crc should be initialized to zero. Pre- and
652		-    post-conditioning (one's complement) is performed within this
653		-    function so it shouldn't be done by the caller. Usage example:
654		-
655		-      unsigned long crc = 0L;
656		-
657		-      while (read_buffer(buffer, length) != EOF) {
658		-        crc = update_crc(crc, buffer, length);
659		-      }
660		-      if (crc != original_crc) error();
661		- */
662		- unsigned long update_crc(unsigned long crc,
663		-                          unsigned char *buf, int len)
664		- {
665		-   unsigned long c = crc ^ 0xffffffffL;
666		-   int n;
667		-
668		-   if (!crc_table_computed)
669		-     make_crc_table();
670		-   for (n = 0; n < len; n++) {
671		-     c = crc_table[(c ^ buf[n]) & 0xff] ^ (c >> 8);
672		-   }
673		-   return c ^ 0xffffffffL;
674		- }
675		-
676		- /* Return the CRC of the bytes buf[0..len-1]. */
677		- unsigned long crc(unsigned char *buf, int len)
678		- {
679		-   return update_crc(0L, buf, len);
680		- }
681		-

	--- a/compat/zlib/doc/rfc1952.txt
	+++ b/compat/zlib/doc/rfc1952.txt
	@@ -1,687 +0,0 @@
1
2
3
4
5
6
7	Network Working Group P. Deutsch
8	Request for Comments: 1952 Aladdin Enterprises
9	Category: Informational May 1996
10
11
12	GZIP file format specification version 4.3
13
14	Status of This Memo
15
16	This memo provides information for the Internet community. This memo
17	does not specify an Internet standard of any kind. Distribution of
18	this memo is unlimited.
19
20	IESG Note:
21
22	The IESG takes no position on the validity of any Intellectual
23	Property Rights statements contained in this document.
24
25	Notices
26
27	Copyright (c) 1996 L. Peter Deutsch
28
29	Permission is granted to copy and distribute this document for any
30	purpose and without charge, including translations into other
31	languages and incorporation into compilations, provided that the
32	copyright notice and this notice are preserved, and that any
33	substantive changes or deletions from the original are clearly
34	marked.
35
36	A pointer to the latest version of this and related documentation in
37	HTML format can be found at the URL
38	<ftp://ftp.uu.net/graphics/png/documents/zlib/zdoc-index.html>.
39
40	Abstract
41
42	This specification defines a lossless compressed data format that is
43	compatible with the widely used GZIP utility. The format includes a
44	cyclic redundancy check value for detecting data corruption. The
45	format presently uses the DEFLATE method of compression but can be
46	easily extended to use other compression methods. The format can be
47	implemented readily in a manner not covered by patents.
48
49
50
51
52
53
54
55
56
57
58	Deutsch Informational [Page 1]
59
60
61	RFC 1952 GZIP File Format Specification May 1996
62
63
64	Table of Contents
65
66	1. Introduction ................................................... 2
67	1.1. Purpose ................................................... 2
68	1.2. Intended audience ......................................... 3
69	1.3. Scope ..................................................... 3
70	1.4. Compliance ................................................ 3
71	1.5. Definitions of terms and conventions used ................. 3
72	1.6. Changes from previous versions ............................ 3
73	2. Detailed specification ......................................... 4
74	2.1. Overall conventions ....................................... 4
75	2.2. File format ............................................... 5
76	2.3. Member format ............................................. 5
77	2.3.1. Member header and trailer ........................... 6
78	2.3.1.1. Extra field ................................... 8
79	2.3.1.2. Compliance .................................... 9
80	3. References .................................................. 9
81	4. Security Considerations .................................... 10
82	5. Acknowledgements ........................................... 10
83	6. Author's Address ........................................... 10
84	7. Appendix: Jean-Loup Gailly's gzip utility .................. 11
85	8. Appendix: Sample CRC Code .................................. 11
86
87	1. Introduction
88
89	1.1. Purpose
90
91	The purpose of this specification is to define a lossless
92	compressed data format that:
93
94	* Is independent of CPU type, operating system, file system,
95	and character set, and hence can be used for interchange;
96	* Can compress or decompress a data stream (as opposed to a
97	randomly accessible file) to produce another data stream,
98	using only an a priori bounded amount of intermediate
99	storage, and hence can be used in data communications or
100	similar structures such as Unix filters;
101	* Compresses data with efficiency comparable to the best
102	currently available general-purpose compression methods,
103	and in particular considerably better than the "compress"
104	program;
105	* Can be implemented readily in a manner not covered by
106	patents, and hence can be practiced freely;
107	* Is compatible with the file format produced by the current
108	widely used gzip utility, in that conforming decompressors
109	will be able to read data produced by the existing gzip
110	compressor.
111
121	The data format defined by this specification does not attempt to:
122
123	* Provide random access to compressed data;
124	* Compress specialized data (e.g., raster graphics) as well as
125	the best currently available specialized algorithms.
126
127	1.2. Intended audience
128
129	This specification is intended for use by implementors of software
130	to compress data into gzip format and/or decompress data from gzip
131	format.
132
133	The text of the specification assumes a basic background in
134	programming at the level of bits and other primitive data
135	representations.
136
137	1.3. Scope
138
139	The specification specifies a compression method and a file format
140	(the latter assuming only that a file can store a sequence of
141	arbitrary bytes). It does not specify any particular interface to
142	a file system or anything about character sets or encodings
143	(except for file names and comments, which are optional).
144
145	1.4. Compliance
146
147	Unless otherwise indicated below, a compliant decompressor must be
148	able to accept and decompress any file that conforms to all the
149	specifications presented here; a compliant compressor must produce
150	files that conform to all the specifications presented here. The
151	material in the appendices is not part of the specification per se
152	and is not relevant to compliance.
153
154	1.5. Definitions of terms and conventions used
155
156	byte: 8 bits stored or transmitted as a unit (same as an octet).
157	(For this specification, a byte is exactly 8 bits, even on
158	machines which store a character on a number of bits different
159	from 8.) See below for the numbering of bits within a byte.
160
161	1.6. Changes from previous versions
162
163	There have been no technical changes to the gzip format since
164	version 4.1 of this specification. In version 4.2, some
165	terminology was changed, and the sample CRC code was rewritten for
166	clarity and to eliminate the requirement for the caller to do pre-
167	and post-conditioning. Version 4.3 is a conversion of the
168	specification to RFC style.
169
178	2. Detailed specification
179
180	2.1. Overall conventions
181
182	In the diagrams below, a box like this:
183
184	+---+
185	|   | <-- the vertical bars might be missing
186	+---+
187
188	represents one byte; a box like this:
189
190	+==============+
191	|              |
192	+==============+
193
194	represents a variable number of bytes.
195
196	Bytes stored within a computer do not have a "bit order", since
197	they are always treated as a unit. However, a byte considered as
198	an integer between 0 and 255 does have a most- and least-
199	significant bit, and since we write numbers with the most-
200	significant digit on the left, we also write bytes with the most-
201	significant bit on the left. In the diagrams below, we number the
202	bits of a byte so that bit 0 is the least-significant bit, i.e.,
203	the bits are numbered:
204
205	+--------+
206	|76543210|
207	+--------+
208
209	This document does not address the issue of the order in which
210	bits of a byte are transmitted on a bit-sequential medium, since
211	the data format described here is byte- rather than bit-oriented.
212
213	Within a computer, a number may occupy multiple bytes. All
214	multi-byte numbers in the format described here are stored with
215	the least-significant byte first (at the lower memory address).
216	For example, the decimal number 520 is stored as:
217
218	    0        1
219	+--------+--------+
220	|00001000|00000010|
221	+--------+--------+
222	 ^        ^
223	 |        |
224	 |        + more significant byte = 2 x 256
225	 + less significant byte = 8
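The byte-order rule above can be illustrated with a few C helpers. This is a minimal sketch, not code from RFC 1952, zlib or Fossil; the names get_le16, get_le32 and put_le16 are hypothetical.

```c
#include <stdint.h>

/* Read a 16-bit field stored least-significant byte first. */
static uint16_t get_le16(const unsigned char *p)
{
  return (uint16_t)(p[0] | (p[1] << 8));
}

/* Read a 32-bit field (e.g. MTIME, CRC32, ISIZE) stored LSB first. */
static uint32_t get_le32(const unsigned char *p)
{
  return (uint32_t)p[0]
       | ((uint32_t)p[1] << 8)
       | ((uint32_t)p[2] << 16)
       | ((uint32_t)p[3] << 24);
}

/* Write a 16-bit value LSB first, e.g. 520 -> {0x08, 0x02}. */
static void put_le16(unsigned char *p, uint16_t v)
{
  p[0] = (unsigned char)(v & 0xff);
  p[1] = (unsigned char)(v >> 8);
}
```

These helpers build each value with shifts rather than casting the buffer to a wider integer type. As a result, the output does not depend on the host's own byte order.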
226
235	2.2. File format
236
237	A gzip file consists of a series of "members" (compressed data
238	sets). The format of each member is specified in the following
239	section. The members simply appear one after another in the file,
240	with no additional information before, between, or after them.
241
242	2.3. Member format
243
244	Each member has the following structure:
245
246	+---+---+---+---+---+---+---+---+---+---+
247	|ID1|ID2|CM |FLG|     MTIME     |XFL|OS | (more-->)
248	+---+---+---+---+---+---+---+---+---+---+
249
250	(if FLG.FEXTRA set)
251
252	+---+---+=================================+
253	| XLEN  |...XLEN bytes of "extra field"...| (more-->)
254	+---+---+=================================+
255
256	(if FLG.FNAME set)
257
258	+=========================================+
259	|...original file name, zero-terminated...| (more-->)
260	+=========================================+
261
262	(if FLG.FCOMMENT set)
263
264	+===================================+
265	|...file comment, zero-terminated...| (more-->)
266	+===================================+
267
268	(if FLG.FHCRC set)
269
270	+---+---+
271	| CRC16 |
272	+---+---+
273
274	+=======================+
275	|...compressed blocks...| (more-->)
276	+=======================+
277
278	  0   1   2   3   4   5   6   7
279	+---+---+---+---+---+---+---+---+
280	|     CRC32     |     ISIZE     |
281	+---+---+---+---+---+---+---+---+
282
292	2.3.1. Member header and trailer
293
294	ID1 (IDentification 1)
295	ID2 (IDentification 2)
296	These have the fixed values ID1 = 31 (0x1f, \037), ID2 = 139
297	(0x8b, \213), to identify the file as being in gzip format.
298
299	CM (Compression Method)
300	This identifies the compression method used in the file. CM
301	= 0-7 are reserved. CM = 8 denotes the "deflate"
302	compression method, which is the one customarily used by
303	gzip and which is documented elsewhere.
304
305	FLG (FLaGs)
306	This flag byte is divided into individual bits as follows:
307
308	bit 0 FTEXT
309	bit 1 FHCRC
310	bit 2 FEXTRA
311	bit 3 FNAME
312	bit 4 FCOMMENT
313	bit 5 reserved
314	bit 6 reserved
315	bit 7 reserved
316
317	If FTEXT is set, the file is probably ASCII text. This is
318	an optional indication, which the compressor may set by
319	checking a small amount of the input data to see whether any
320	non-ASCII characters are present. In case of doubt, FTEXT
321	is cleared, indicating binary data. For systems which have
322	different file formats for ascii text and binary data, the
323	decompressor can use FTEXT to choose the appropriate format.
324	We deliberately do not specify the algorithm used to set
325	this bit, since a compressor always has the option of
326	leaving it cleared and a decompressor always has the option
327	of ignoring it and letting some other program handle issues
328	of data conversion.
329
330	If FHCRC is set, a CRC16 for the gzip header is present,
331	immediately before the compressed data. The CRC16 consists
332	of the two least significant bytes of the CRC32 for all
333	bytes of the gzip header up to and not including the CRC16.
334	[The FHCRC bit was never set by versions of gzip up to
335	1.2.4, even though it was documented with a different
336	meaning in gzip 1.2.4.]
337
338	If FEXTRA is set, optional extra fields are present, as
339	described in a following section.
340
349	If FNAME is set, an original file name is present,
350	terminated by a zero byte. The name must consist of ISO
351	8859-1 (LATIN-1) characters; on operating systems using
352	EBCDIC or any other character set for file names, the name
353	must be translated to the ISO LATIN-1 character set. This
354	is the original name of the file being compressed, with any
355	directory components removed, and, if the file being
356	compressed is on a file system with case insensitive names,
357	forced to lower case. There is no original file name if the
358	data was compressed from a source other than a named file;
359	for example, if the source was stdin on a Unix system, there
360	is no file name.
361
362	If FCOMMENT is set, a zero-terminated file comment is
363	present. This comment is not interpreted; it is only
364	intended for human consumption. The comment must consist of
365	ISO 8859-1 (LATIN-1) characters. Line breaks should be
366	denoted by a single line feed character (10 decimal).
367
368	Reserved FLG bits must be zero.
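The following is a minimal C sketch of decoding the FLG byte. It assumes bit 0 is the least-significant bit, as defined in section 2.1. The GZ_* mask names, struct gz_flags and gz_decode_flags are hypothetical and chosen only for illustration. Rejecting any set reserved bit follows the compliance rule in section 2.3.1.2.

```c
#include <stdbool.h>

/* Illustrative bit masks for the FLG byte (bit 0 = least significant). */
enum {
  GZ_FTEXT     = 0x01,
  GZ_FHCRC     = 0x02,
  GZ_FEXTRA    = 0x04,
  GZ_FNAME     = 0x08,
  GZ_FCOMMENT  = 0x10,
  GZ_FRESERVED = 0xE0   /* bits 5, 6 and 7 */
};

/* Hypothetical decoded view of the FLG byte. */
struct gz_flags {
  bool text, hcrc, extra, name, comment;
};

/* Decode FLG into *out; returns false if any reserved bit is set. */
static bool gz_decode_flags(unsigned char flg, struct gz_flags *out)
{
  if (flg & GZ_FRESERVED)
    return false;
  out->text    = (flg & GZ_FTEXT) != 0;
  out->hcrc    = (flg & GZ_FHCRC) != 0;
  out->extra   = (flg & GZ_FEXTRA) != 0;
  out->name    = (flg & GZ_FNAME) != 0;
  out->comment = (flg & GZ_FCOMMENT) != 0;
  return true;
}
```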
369
370	MTIME (Modification TIME)
371	This gives the most recent modification time of the original
372	file being compressed. The time is in Unix format, i.e.,
373	seconds since 00:00:00 GMT, Jan. 1, 1970. (Note that this
374	may cause problems for MS-DOS and other systems that use
375	local rather than Universal time.) If the compressed data
376	did not come from a file, MTIME is set to the time at which
377	compression started. MTIME = 0 means no time stamp is
378	available.
379
380	XFL (eXtra FLags)
381	These flags are available for use by specific compression
382	methods. The "deflate" method (CM = 8) sets these flags as
383	follows:
384
385	XFL = 2 - compressor used maximum compression,
386	          slowest algorithm
387	XFL = 4 - compressor used fastest algorithm
388
389	OS (Operating System)
390	This identifies the type of file system on which compression
391	took place. This may be useful in determining end-of-line
392	convention for text files. The currently defined values are
393	as follows:
394
406	0 - FAT filesystem (MS-DOS, OS/2, NT/Win32)
407	1 - Amiga
408	2 - VMS (or OpenVMS)
409	3 - Unix
410	4 - VM/CMS
411	5 - Atari TOS
412	6 - HPFS filesystem (OS/2, NT)
413	7 - Macintosh
414	8 - Z-System
415	9 - CP/M
416	10 - TOPS-20
417	11 - NTFS filesystem (NT)
418	12 - QDOS
419	13 - Acorn RISCOS
420	255 - unknown
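For display or diagnostics, the OS byte can be mapped to a name with a simple table. This sketch uses a hypothetical function name. It reports any value not listed above as unknown, just as it reports 255.

```c
/* Map the OS header byte to a name, following the table above. */
static const char *gz_os_name(unsigned char os)
{
  static const char *const names[] = {
    "FAT filesystem (MS-DOS, OS/2, NT/Win32)", /* 0 */
    "Amiga",                                    /* 1 */
    "VMS (or OpenVMS)",                         /* 2 */
    "Unix",                                     /* 3 */
    "VM/CMS",                                   /* 4 */
    "Atari TOS",                                /* 5 */
    "HPFS filesystem (OS/2, NT)",               /* 6 */
    "Macintosh",                                /* 7 */
    "Z-System",                                 /* 8 */
    "CP/M",                                     /* 9 */
    "TOPS-20",                                  /* 10 */
    "NTFS filesystem (NT)",                     /* 11 */
    "QDOS",                                     /* 12 */
    "Acorn RISCOS"                              /* 13 */
  };
  if (os < (unsigned)(sizeof names / sizeof names[0]))
    return names[os];
  return "unknown";  /* 255, and any value not defined above */
}
```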
421
422	XLEN (eXtra LENgth)
423	If FLG.FEXTRA is set, this gives the length of the optional
424	extra field. See below for details.
425
426	CRC32 (CRC-32)
427	This contains a Cyclic Redundancy Check value of the
428	uncompressed data computed according to CRC-32 algorithm
429	used in the ISO 3309 standard and in section 8.1.1.6.2 of
430	ITU-T recommendation V.42. (See http://www.iso.ch for
431	ordering ISO documents. See gopher://info.itu.ch for an
432	online version of ITU-T V.42.)
433
434	ISIZE (Input SIZE)
435	This contains the size of the original (uncompressed) input
436	data modulo 2^32.
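Once a member has been decompressed, its trailer can be checked against values computed along the way. This sketch assumes the caller has already computed the CRC-32 of the output and counted its bytes. The names gz_trailer_ok and le32 are hypothetical. The conversion to uint32_t performs the modulo 2^32 reduction that ISIZE requires.

```c
#include <stdbool.h>
#include <stdint.h>

/* Read a little-endian 32-bit value (section 2.1). */
static uint32_t le32(const unsigned char *p)
{
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
       | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Compare an 8-byte member trailer (CRC32, then ISIZE) with the
   CRC-32 and byte count computed while decompressing. ISIZE keeps
   only the length modulo 2^32, so the count is truncated first. */
static bool gz_trailer_ok(const unsigned char trailer[8],
                          uint32_t crc, uint64_t total_len)
{
  return le32(trailer) == crc
      && le32(trailer + 4) == (uint32_t)total_len;
}
```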
437
438	2.3.1.1. Extra field
439
440	If the FLG.FEXTRA bit is set, an "extra field" is present in
441	the header, with total length XLEN bytes. It consists of a
442	series of subfields, each of the form:
443
444	+---+---+---+---+==================================+
445	|SI1|SI2|  LEN  |... LEN bytes of subfield data ...|
446	+---+---+---+---+==================================+
447
448	SI1 and SI2 provide a subfield ID, typically two ASCII letters
449	with some mnemonic value. Jean-Loup Gailly
450	<[email protected]> is maintaining a registry of subfield
451	IDs; please send him any subfield ID you wish to use. Subfield
452	IDs with SI2 = 0 are reserved for future use. The following
453	IDs are currently defined:
454
463	SI1         SI2         Data
464	----------  ----------  ----
465	0x41 ('A')  0x70 ('P')  Apollo file type information
466
467	LEN gives the length of the subfield data, excluding the 4
468	initial bytes.
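LEN excludes the four SI1/SI2/LEN bytes, so each subfield occupies LEN + 4 bytes of the XLEN total. The sketch below looks up one subfield by ID and bounds-checks each step against XLEN. The name gz_find_subfield is hypothetical. Treating an overrun as "not found" is an assumption, not something the specification prescribes.

```c
#include <stddef.h>

/* Search an extra field of xlen bytes for subfield (si1, si2).
   On success, returns a pointer to its data and stores the data
   length in *len; returns NULL if the ID is absent or the list
   of subfields is malformed. */
static const unsigned char *gz_find_subfield(const unsigned char *extra,
                                             size_t xlen,
                                             unsigned char si1,
                                             unsigned char si2,
                                             size_t *len)
{
  size_t pos = 0;
  while (pos < xlen) {
    if (xlen - pos < 4)
      return NULL;                  /* truncated SI1/SI2/LEN header */
    size_t n = (size_t)extra[pos + 2] | ((size_t)extra[pos + 3] << 8);
    if (xlen - pos - 4 < n)
      return NULL;                  /* LEN runs past the end of XLEN */
    if (extra[pos] == si1 && extra[pos + 1] == si2) {
      *len = n;
      return extra + pos + 4;
    }
    pos += 4 + n;                   /* LEN excludes the 4 header bytes */
  }
  return NULL;
}
```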
469
470	2.3.1.2. Compliance
471
472	A compliant compressor must produce files with correct ID1,
473	ID2, CM, CRC32, and ISIZE, but may set all the other fields in
474	the fixed-length part of the header to default values (255 for
475	OS, 0 for all others). The compressor must set all reserved
476	bits to zero.
477
478	A compliant decompressor must check ID1, ID2, and CM, and
479	provide an error indication if any of these have incorrect
480	values. It must examine FEXTRA/XLEN, FNAME, FCOMMENT and FHCRC
481	at least so it can skip over the optional fields if they are
482	present. It need not examine any other part of the header or
483	trailer; in particular, a decompressor may ignore FTEXT and OS
484	and always produce binary output, and still be compliant. A
485	compliant decompressor must give an error indication if any
486	reserved bit is non-zero, since such a bit could indicate the
487	presence of a new field that would cause subsequent data to be
488	interpreted incorrectly.
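The decompressor requirements above can be sketched in C as one function. It validates the fixed header and skips the optional fields to find where the compressed blocks begin. This is an illustrative sketch, not zlib's implementation. The name gz_header_length and its -1 error convention are hypothetical, and the function only skips the CRC16 without verifying it.

```c
#include <stddef.h>

/* Return the offset of the compressed blocks in a gzip member that
   starts at buf[0], or -1 if the header is invalid or truncated.
   Checks ID1, ID2, CM and the reserved FLG bits, and skips FEXTRA,
   FNAME, FCOMMENT and FHCRC without interpreting them. */
static long gz_header_length(const unsigned char *buf, size_t n)
{
  size_t pos = 10;                          /* fixed part: ID1 .. OS */
  if (n < pos || buf[0] != 31 || buf[1] != 139 || buf[2] != 8)
    return -1;
  unsigned char flg = buf[3];
  if (flg & 0xE0)                           /* reserved bits 5..7 */
    return -1;
  if (flg & 0x04) {                         /* FEXTRA: XLEN + data */
    if (n - pos < 2)
      return -1;
    size_t xlen = (size_t)buf[pos] | ((size_t)buf[pos + 1] << 8);
    pos += 2;
    if (n - pos < xlen)
      return -1;
    pos += xlen;
  }
  for (int bit = 0x08; bit <= 0x10; bit <<= 1) { /* FNAME, FCOMMENT */
    if (flg & bit) {
      while (pos < n && buf[pos] != 0)
        pos++;
      if (pos == n)
        return -1;                          /* missing zero terminator */
      pos++;                                /* skip the zero byte */
    }
  }
  if (flg & 0x02) {                         /* FHCRC: two bytes */
    if (n - pos < 2)
      return -1;
    pos += 2;
  }
  return (long)pos;
}
```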
489
490	3. References
491
492	[1] "Information Processing - 8-bit single-byte coded graphic
493	character sets - Part 1: Latin alphabet No.1" (ISO 8859-1:1987).
494	The ISO 8859-1 (Latin-1) character set is a superset of 7-bit
495	ASCII. Files defining this character set are available as
496	iso_8859-1.* in ftp://ftp.uu.net/graphics/png/documents/
497
498	[2] ISO 3309
499
500	[3] ITU-T recommendation V.42
501
502	[4] Deutsch, L.P.,"DEFLATE Compressed Data Format Specification",
503	available in ftp://ftp.uu.net/pub/archiving/zip/doc/
504
505	[5] Gailly, J.-L., GZIP documentation, available as gzip-*.tar in
506	ftp://prep.ai.mit.edu/pub/gnu/
507
508	[6] Sarwate, D.V., "Computation of Cyclic Redundancy Checks via Table
509	Look-Up", Communications of the ACM, 31(8), pp.1008-1013.
510
520	[7] Schwaderer, W.D., "CRC Calculation", April 85 PC Tech Journal,
521	pp.118-133.
522
523	[8] ftp://ftp.adelaide.edu.au/pub/rocksoft/papers/crc_v3.txt,
524	describing the CRC concept.
525
526	4. Security Considerations
527
528	Any data compression method involves the reduction of redundancy in
529	the data. Consequently, any corruption of the data is likely to have
530	severe effects and be difficult to correct. Uncompressed text, on
531	the other hand, will probably still be readable despite the presence
532	of some corrupted bytes.
533
534	It is recommended that systems using this data format provide some
535	means of validating the integrity of the compressed data, such as by
536	setting and checking the CRC-32 check value.
537
538	5. Acknowledgements
539
540	Trademarks cited in this document are the property of their
541	respective owners.
542
543	Jean-Loup Gailly designed the gzip format and wrote, with Mark Adler,
544	the related software described in this specification. Glenn
545	Randers-Pehrson converted this document to RFC and HTML format.
546
547	6. Author's Address
548
549	L. Peter Deutsch
550	Aladdin Enterprises
551	203 Santa Margarita Ave.
552	Menlo Park, CA 94025
553
554	Phone: (415) 322-0103 (AM only)
555	FAX: (415) 322-1734
556	EMail: <[email protected]>
557
558	Questions about the technical content of this specification can be
559	sent by email to:
560
561	Jean-Loup Gailly <[email protected]> and
562	Mark Adler <[email protected]>
563
564	Editorial comments on this specification can be sent by email to:
565
566	L. Peter Deutsch <[email protected]> and
567	Glenn Randers-Pehrson <[email protected]>
568
577	7. Appendix: Jean-Loup Gailly's gzip utility
578
579	The most widely used implementation of gzip compression, and the
580	original documentation on which this specification is based, were
581	created by Jean-Loup Gailly <[email protected]>. Since this
582	implementation is a de facto standard, we mention some more of its
583	features here. Again, the material in this section is not part of
584	the specification per se, and implementations need not follow it to
585	be compliant.
586
587	When compressing or decompressing a file, gzip preserves the
588	protection, ownership, and modification time attributes on the local
589	file system, since there is no provision for representing protection
590	attributes in the gzip file format itself. Since the file format
591	includes a modification time, the gzip decompressor provides a
592	command line switch that assigns the modification time from the file,
593	rather than the local modification time of the compressed input, to
594	the decompressed output.
595
596	8. Appendix: Sample CRC Code
597
598	The following sample code represents a practical implementation of
599	the CRC (Cyclic Redundancy Check). (See also ISO 3309 and ITU-T V.42
600	for a formal specification.)
601
602	The sample code is in the ANSI C programming language. Non C users
603	may find it easier to read with these hints:
604
605	&      Bitwise AND operator.
606	^      Bitwise exclusive-OR operator.
607	>>     Bitwise right shift operator. When applied to an
608	       unsigned quantity, as here, right shift inserts zero
609	       bit(s) at the left.
610	!      Logical NOT operator.
611	++     "n++" increments the variable n.
612	0xNNN  0x introduces a hexadecimal (base 16) constant.
613	       Suffix L indicates a long value (at least 32 bits).
614
615	/* Table of CRCs of all 8-bit messages. */
616	unsigned long crc_table[256];
617
618	/* Flag: has the table been computed? Initially false. */
619	int crc_table_computed = 0;
620
621	/* Make the table for a fast CRC. */
622	void make_crc_table(void)
623	{
624	  unsigned long c;
634	  int n, k;
635	  for (n = 0; n < 256; n++) {
636	    c = (unsigned long) n;
637	    for (k = 0; k < 8; k++) {
638	      if (c & 1) {
639	        c = 0xedb88320L ^ (c >> 1);
640	      } else {
641	        c = c >> 1;
642	      }
643	    }
644	    crc_table[n] = c;
645	  }
646	  crc_table_computed = 1;
647	}
648
649	/*
650	   Update a running crc with the bytes buf[0..len-1] and return
651	   the updated crc. The crc should be initialized to zero. Pre- and
652	   post-conditioning (one's complement) is performed within this
653	   function so it shouldn't be done by the caller. Usage example:
654
655	     unsigned long crc = 0L;
656
657	     while (read_buffer(buffer, length) != EOF) {
658	       crc = update_crc(crc, buffer, length);
659	     }
660	     if (crc != original_crc) error();
661	*/
662	unsigned long update_crc(unsigned long crc,
663	                         unsigned char *buf, int len)
664	{
665	  unsigned long c = crc ^ 0xffffffffL;
666	  int n;
667
668	  if (!crc_table_computed)
669	    make_crc_table();
670	  for (n = 0; n < len; n++) {
671	    c = crc_table[(c ^ buf[n]) & 0xff] ^ (c >> 8);
672	  }
673	  return c ^ 0xffffffffL;
674	}
675
676	/* Return the CRC of the bytes buf[0..len-1]. */
677	unsigned long crc(unsigned char *buf, int len)
678	{
679	  return update_crc(0L, buf, len);
680	}
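To cross-check the table-driven code above, the same CRC can also be computed one bit at a time. This sketch assumes a C99 compiler and uses uint32_t in place of unsigned long. The name crc32_update_bitwise is hypothetical. For the ASCII string "123456789", both versions should produce 0xCBF43926, the standard check value for this CRC-32.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-at-a-time CRC-32 using the same reflected polynomial
   (0xEDB88320) and pre/post-conditioning as update_crc() above.
   It is slower than the table-driven version but needs no table. */
static uint32_t crc32_update_bitwise(uint32_t crc,
                                     const unsigned char *buf, size_t len)
{
  uint32_t c = crc ^ 0xFFFFFFFFu;          /* pre-conditioning */
  for (size_t i = 0; i < len; i++) {
    c ^= buf[i];
    for (int k = 0; k < 8; k++)
      c = (c & 1) ? 0xEDB88320u ^ (c >> 1) : c >> 1;
  }
  return c ^ 0xFFFFFFFFu;                  /* post-conditioning */
}
```

Feeding a message in pieces gives the same result as a single call. Each call first undoes the previous call's post-conditioning and then continues from the raw register value.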
681

D compat/zlib/doc/txtvsbin.txt

-103

		--- a/compat/zlib/doc/txtvsbin.txt
		+++ b/compat/zlib/doc/txtvsbin.txt
		@@ -1,107 +0,0 @@
1		-A Fast Method for Identifying Plain Text Files
2		-==============================================
3		-
4		-
5		-Introduction
		-------------
6		-
7		-Given a file coming from an unknown source, it is sometimes desirable
8		-to find out whether the format of that file is plain text. Although
9		-this may appear like a simple task, a fully accurate detection of the
10		-file type requires heavy-duty semantic analysis on the file contents.
11		-It is, however, possible to obtain satisfactory results by employing
12		-various heuristics.
13		-
14		-Previous versions of PKZip and other zip-compatible compression tools
15		-were using a crude detection scheme: if more than 80% (4/5) of the bytes
16		-found in a certain buffer are within the range [7..127], the file is
17		-labeled as plain text, otherwise it is labeled as binary. A prominent
18		-limitation of this scheme is the restriction to Latin-based alphabets.
19		-Other alphabets, like Greek, Cyrillic or Asian, make extensive use of
20		-the bytes within the range [128..255], and texts using these alphabets
21		-are most often misidentified by this scheme; in other words, the rate
22		-of false negatives is sometimes too high, which means that the recall
23		-is low. Another weakness of this scheme is a reduced precision, due to
24		-the false positives that may occur when binary files containing large
25		-amounts of textual characters are misidentified as plain text.
26		-
27		-In this article we propose a new, simple detection scheme that features
28		-a much increased precision and a near-100% recall. This scheme is
29		-designed to work on ASCII, Unicode and other ASCII-derived alphabets,
30		-and it handles single-byte encodings (ISO-8859, MacRoman, KOI8, etc.)
31		-and variable-sized encodings (ISO-2022, UTF-8, etc.). Wider encodings
32		-(UCS-2/UTF-16 and UCS-4/UTF-32) are not handled, however.
33		-
34		-
35		-The Algorithm
		--------------
36		-
37		-The algorithm works by dividing the set of bytecodes [0..255] into three
38		-categories:
39		-- The white list of textual bytecodes:
40		- 9 (TAB), 10 (LF), 13 (CR), 32 (SPACE) to 255.
41		-- The gray list of tolerated bytecodes:
42		- 7 (BEL), 8 (BS), 11 (VT), 12 (FF), 26 (SUB), 27 (ESC).
43		-- The black list of undesired, non-textual bytecodes:
44		- 0 (NUL) to 6, 14 to 31.
45		-
46		-If a file contains at least one byte that belongs to the white list and
47		-no byte that belongs to the black list, then the file is categorized as
48		-plain text; otherwise, it is categorized as binary. (The boundary case,
49		-when the file is empty, automatically falls into the latter category.)
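A minimal C sketch of this classification rule follows. The function name looks_like_text is hypothetical. The sketch assumes the gray list takes precedence for 26 (SUB) and 27 (ESC), since both codes also fall inside the 14-to-31 range given for the black list. Gray-listed bytes are simply ignored. A buffer containing only gray-listed bytes is therefore classified as binary, because it has no white-listed byte.

```c
#include <stdbool.h>
#include <stddef.h>

/* Text if the buffer has at least one white-listed byte and no
   black-listed byte; an empty buffer counts as binary. */
static bool looks_like_text(const unsigned char *buf, size_t len)
{
  bool saw_white = false;
  for (size_t i = 0; i < len; i++) {
    unsigned char b = buf[i];
    if (b == 9 || b == 10 || b == 13 || b >= 32)
      saw_white = true;                    /* white list */
    else if (b == 7 || b == 8 || b == 11 || b == 12 || b == 26 || b == 27)
      continue;                            /* gray list: tolerated */
    else
      return false;                        /* black list: 0-6, 14-25, 28-31 */
  }
  return saw_white;
}
```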
50		-
51		-
52		-Rationale
		----------
53		-
54		-The idea behind this algorithm relies on two observations.
55		-
56		-The first observation is that, although the full range of 7-bit codes
57		-[0..127] is properly specified by the ASCII standard, most control
58		-characters in the range [0..31] are not used in practice. The only
59		-widely-used, almost universally-portable control codes are 9 (TAB),
60		-10 (LF) and 13 (CR). There are a few more control codes that are
61		-recognized on a reduced range of platforms and text viewers/editors:
62		-7 (BEL), 8 (BS), 11 (VT), 12 (FF), 26 (SUB) and 27 (ESC); but these
63		-codes are rarely (if ever) used alone, without being accompanied by
64		-some printable text. Even the newer, portable text formats such as
65		-XML avoid using control characters outside the list mentioned here.
66		-
67		-The second observation is that most of the binary files tend to contain
68		-control characters, especially 0 (NUL). Even though the older text
69		-detection schemes observe the presence of non-ASCII codes from the range
70		-[128..255], the precision rarely has to suffer if this upper range is
71		-labeled as textual, because the files that are genuinely binary tend to
72		-contain both control characters and codes from the upper range. On the
73		-other hand, the upper range needs to be labeled as textual, because it
74		-is used by virtually all ASCII extensions. In particular, this range is
75		-used for encoding non-Latin scripts.
76		-
77		-Since there is no counting involved, other than simply observing the
78		-presence or the absence of some byte values, the algorithm produces
79		-consistent results, regardless what alphabet encoding is being used.
80		-(If counting were involved, it could be possible to obtain different
81		-results on a text encoded, say, using ISO-8859-16 versus UTF-8.)
82		-
83		-There is an extra category of plain text files that are "polluted" with
84		-one or more black-listed codes, either by mistake or by peculiar design
85		-considerations. In such cases, a scheme that tolerates a small fraction
86		-of black-listed codes would provide an increased recall (i.e. more true
87		-positives). This, however, incurs a reduced precision overall, since
88		-false positives are more likely to appear in binary files that contain
89		-large chunks of textual data. Furthermore, "polluted" plain text should
90		-be regarded as binary by general-purpose text detection schemes, because
91		-general-purpose text processing algorithms might not be applicable.
92		-Under this premise, it is safe to say that our detection method provides
93		-a near-100% recall.
94		-
95		-Experiments have been run on many files coming from various platforms
96		-and applications. We tried plain text files, system logs, source code,
97		-formatted office documents, compiled object code, etc. The results
98		-confirm the optimistic assumptions about the capabilities of this
99		-algorithm.
100		-
101		-
		---
102		-Cosmin Truta
103		-Last updated: 2006-May-28

	--- a/compat/zlib/doc/txtvsbin.txt
	+++ b/compat/zlib/doc/txtvsbin.txt
	@@ -1,107 +0,0 @@
1	A Fast Method for Identifying Plain Text Files
2	==============================================
3
4
5	Introduction
	-------------
6
7	Given a file coming from an unknown source, it is sometimes desirable
8	to find out whether the format of that file is plain text. Although
9	this may appear like a simple task, a fully accurate detection of the
10	file type requires heavy-duty semantic analysis on the file contents.
11	It is, however, possible to obtain satisfactory results by employing
12	various heuristics.
13
14	Previous versions of PKZip and other zip-compatible compression tools
15	were using a crude detection scheme: if more than 80% (4/5) of the bytes
16	found in a certain buffer are within the range [7..127], the file is
17	labeled as plain text, otherwise it is labeled as binary. A prominent
18	limitation of this scheme is the restriction to Latin-based alphabets.
19	Other alphabets, like Greek, Cyrillic or Asian, make extensive use of
20	the bytes within the range [128..255], and texts using these alphabets
21	are most often misidentified by this scheme; in other words, the rate
22	of false negatives is sometimes too high, which means that the recall
23	is low. Another weakness of this scheme is a reduced precision, due to
24	the false positives that may occur when binary files containing large
25	amounts of textual characters are misidentified as plain text.
26
27	In this article we propose a new, simple detection scheme that features
28	a much increased precision and a near-100% recall. This scheme is
29	designed to work on ASCII, Unicode and other ASCII-derived alphabets,
30	and it handles single-byte encodings (ISO-8859, MacRoman, KOI8, etc.)
31	and variable-sized encodings (ISO-2022, UTF-8, etc.). Wider encodings
32	(UCS-2/UTF-16 and UCS-4/UTF-32) are not handled, however.
33
34
35	The Algorithm
	--------------
36
37	The algorithm works by dividing the set of bytecodes [0..255] into three
38	categories:
39	- The white list of textual bytecodes:
40	9 (TAB), 10 (LF), 13 (CR), 32 (SPACE) to 255.
41	- The gray list of tolerated bytecodes:
42	7 (BEL), 8 (BS), 11 (VT), 12 (FF), 26 (SUB), 27 (ESC).
43	- The black list of undesired, non-textual bytecodes:
44	0 (NUL) to 6, 14 to 31.
45
46	If a file contains at least one byte that belongs to the white list and
47	no byte that belongs to the black list, then the file is categorized as
48	plain text; otherwise, it is categorized as binary. (The boundary case,
49	when the file is empty, automatically falls into the latter category.)
50
51
52	Rationale
	----------
53
54	The idea behind this algorithm relies on two observations.
55
56	The first observation is that, although the full range of 7-bit codes
57	[0..127] is properly specified by the ASCII standard, most control
58	characters in the range [0..31] are not used in practice. The only
59	widely-used, almost universally-portable control codes are 9 (TAB),
60	10 (LF) and 13 (CR). There are a few more control codes that are
61	recognized on a reduced range of platforms and text viewers/editors:
62	7 (BEL), 8 (BS), 11 (VT), 12 (FF), 26 (SUB) and 27 (ESC); but these
63	codes are rarely (if ever) used alone, without being accompanied by
64	some printable text. Even the newer, portable text formats such as
65	XML avoid using control characters outside the list mentioned here.
66
67	The second observation is that most of the binary files tend to contain
68	control characters, especially 0 (NUL). Even though the older text
69	detection schemes observe the presence of non-ASCII codes from the range
70	[128..255], the precision rarely has to suffer if this upper range is
71	labeled as textual, because the files that are genuinely binary tend to
72	contain both control characters and codes from the upper range. On the
73	other hand, the upper range needs to be labeled as textual, because it
74	is used by virtually all ASCII extensions. In particular, this range is
75	used for encoding non-Latin scripts.
76
77	Since there is no counting involved, other than simply observing the
78	presence or the absence of some byte values, the algorithm produces
79	consistent results, regardless what alphabet encoding is being used.
80	(If counting were involved, it could be possible to obtain different
81	results on a text encoded, say, using ISO-8859-16 versus UTF-8.)
82
83	There is an extra category of plain text files that are "polluted" with
84	one or more black-listed codes, either by mistake or by peculiar design
85	considerations. In such cases, a scheme that tolerates a small fraction
86	of black-listed codes would provide an increased recall (i.e. more true
87	positives). This, however, incurs a reduced precision overall, since
88	false positives are more likely to appear in binary files that contain
89	large chunks of textual data. Furthermore, "polluted" plain text should
90	be regarded as binary by general-purpose text detection schemes, because
91	general-purpose text processing algorithms might not be applicable.
92	Under this premise, it is safe to say that our detection method provides
93	a near-100% recall.
94
95	Experiments have been run on many files coming from various platforms
96	and applications. We tried plain text files, system logs, source code,
97	formatted office documents, compiled object code, etc. The results
98	confirm the optimistic assumptions about the capabilities of this
99	algorithm.
100
101
	---
102	Cosmin Truta
103	Last updated: 2006-May-28


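The deleted txtvsbin.txt note classifies data only by whether certain byte values are present; it never counts them. The C sketch below illustrates that rule. It assumes zlib's three byte lists:

- white: 9, 10, 13 and 32..255
- gray, which is tolerated: 7, 8, 11, 12, 26, 27
- black: 0..6, 14..25, 28..31

The name `looks_like_text()` is hypothetical. It is not Fossil's `looks_like_binary()`.

```c
#include <stddef.h>

/* Sketch of the presence-only rule from txtvsbin.txt (assumed lists).
** Text = at least one white-listed byte and no black-listed byte.
** Everything else is binary, including empty or gray-only input.
** looks_like_text() is a hypothetical name used only here. */
static int is_black(unsigned char c){
  /* Black list: 0..6, 14..25, 28..31 (26 SUB and 27 ESC are gray). */
  return c<=6 || (c>=14 && c<=25) || (c>=28 && c<=31);
}

static int is_white(unsigned char c){
  /* White list: TAB, LF, CR, and SPACE..255, so the upper range
  ** 128..255 used by ASCII extensions and UTF-8 counts as text. */
  return c==9 || c==10 || c==13 || c>=32;
}

int looks_like_text(const unsigned char *z, size_t n){
  int sawWhite = 0;
  size_t i;
  for(i=0; i<n; i++){
    if( is_black(z[i]) ) return 0;   /* one black byte => binary */
    if( is_white(z[i]) ) sawWhite = 1;
  }
  return sawWhite;                   /* empty or gray-only => binary */
}
```

The loop records only whether a byte class occurs, never how often. As a result, a file and its re-encoding (for example ISO-8859-16 versus UTF-8) receive the same label, which is the consistency the note claims.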
M src/clone.c

		--- src/clone.c
		+++ src/clone.c
		@@ -175,10 +175,11 @@
175	175	db_initial_setup(0, 0, zDefaultUser);
176	176	user_select();
177	177	db_set("content-schema", CONTENT_SCHEMA, 0);
178	178	db_set("aux-schema", AUX_SCHEMA_MAX, 0);
179	179	db_set("rebuilt", get_version(), 0);
	180	+ db_unset("hash-policy", 0);
180	181	remember_or_get_http_auth(zHttpAuth, urlFlags & URL_REMEMBER, g.argv[2]);
181	182	url_remember();
182	183	if( g.zSSLIdentity!=0 ){
183	184	/* If the --ssl-identity option was specified, store it as a setting */
184	185	Blob fn;
185	186


M src/configure.c

		--- src/configure.c
		+++ src/configure.c
		@@ -129,10 +129,11 @@
129	129	{ "empty-dirs", CONFIGSET_PROJ },
130	130	{ "allow-symlinks", CONFIGSET_PROJ },
131	131	{ "dotfiles", CONFIGSET_PROJ },
132	132	{ "parent-project-code", CONFIGSET_PROJ },
133	133	{ "parent-project-name", CONFIGSET_PROJ },
	134	+ { "hash-policy", CONFIGSET_PROJ },
134	135
135	136	#ifdef FOSSIL_ENABLE_LEGACY_MV_RM
136	137	{ "mv-rm-files", CONFIGSET_PROJ },
137	138	#endif
138	139
139	140


M src/content.c

		--- src/content.c
		+++ src/content.c
		@@ -528,10 +528,14 @@
528	528	blob_reset(&hash);
529	529	hname_hash(pBlob, 0, &hash);
530	530	}
531	531	}else{
532	532	blob_init(&hash, zUuid, -1);
	533	+ }
	534	+ if( g.eHashPolicy==HPOLICY_AUTO && blob_size(&hash)>HNAME_LEN_SHA1 ){
	535	+ g.eHashPolicy = HPOLICY_SHA3;
	536	+ db_set_int("hash-policy", HPOLICY_SHA3, 0);
533	537	}
534	538	if( nBlob ){
535	539	size = nBlob;
536	540	}else{
537	541	size = blob_size(pBlob);
538	542


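The content.c hunk promotes an `auto` repository to `sha3` the first time it sees an artifact name longer than a SHA1 name. The sketch below isolates that state transition.

- The `HPOLICY_*` values are copied from the hname.c hunk of this commit.
- `HNAME_LEN_SHA1` is assumed to be 40 hex digits.
- `next_hash_policy()` is a hypothetical helper. It reports the change instead of calling `db_set_int()`.

```c
#include <string.h>

/* Policy codes, as defined in the hname.c hunk of this commit. */
#define HPOLICY_SHA1      0
#define HPOLICY_AUTO      1
#define HPOLICY_SHA3      2
#define HPOLICY_SHA3_ONLY 3
#define HPOLICY_SHUN_SHA1 4

/* Assumed: a SHA1 artifact name is 40 hexadecimal digits. */
#define HNAME_LEN_SHA1 40

/* Hypothetical helper: given the current policy and the hex name of an
** incoming artifact, return the policy in effect afterwards. Only
** "auto" ever changes, and only for names longer than a SHA1 name.
** *pChanged tells the caller to persist the new value. */
int next_hash_policy(int ePolicy, const char *zHash, int *pChanged){
  *pChanged = 0;
  if( ePolicy==HPOLICY_AUTO && strlen(zHash)>HNAME_LEN_SHA1 ){
    *pChanged = 1;
    return HPOLICY_SHA3;
  }
  return ePolicy;
}
```

In the commit itself, the promotion is also written with `db_set_int("hash-policy", HPOLICY_SHA3, 0)`, so it outlives the current process.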
M src/db.c

+17 -3

		--- src/db.c
		+++ src/db.c
		@@ -1485,10 +1485,15 @@
1485	1485	g.repositoryOpen = 1;
1486	1486	/* Cache "allow-symlinks" option, because we'll need it on every stat call */
1487	1487	g.allowSymlinks = db_get_boolean("allow-symlinks",
1488	1488	db_allow_symlinks_by_default());
1489	1489	g.zAuxSchema = db_get("aux-schema","");
	1490	+ g.eHashPolicy = db_get_int("hash-policy",-1);
	1491	+ if( g.eHashPolicy<0 ){
	1492	+ g.eHashPolicy = hname_default_policy();
	1493	+ db_set_int("hash-policy", g.eHashPolicy, 0);
	1494	+ }
1490	1495
1491	1496	/* If the ALIAS table is not present, then some on-the-fly schema
1492	1497	** updates might be required.
1493	1498	*/
1494	1499	rebuild_schema_update_2_0(); /* Do the Fossil-2.0 schema updates */
		@@ -1828,10 +1833,11 @@
1828	1833	" AND name NOT GLOB 'project-*'"
1829	1834	" AND name NOT GLOB 'short-project-*';",
1830	1835	configure_inop_rhs(CONFIGSET_ALL),
1831	1836	db_setting_inop_rhs()
1832	1837	);
	1838	+ g.eHashPolicy = db_get_int("hash-policy", g.eHashPolicy);
1833	1839	db_multi_exec(
1834	1840	"REPLACE INTO reportfmt SELECT * FROM settingSrc.reportfmt;"
1835	1841	);
1836	1842
1837	1843	/*
		@@ -1900,13 +1906,14 @@
1900	1906	** their associated permissions will not be copied; however, the system
1901	1907	** default users "anonymous", "nobody", "reader", "developer", and their
1902	1908	** associated permissions will be copied.
1903	1909	**
1904	1910	** Options:
1905		-** --template FILE copy settings from repository file
1906		-** --admin-user|-A USERNAME select given USERNAME as admin user
1907		-** --date-override DATETIME use DATETIME as time of the initial check-in
	1911	+** --template FILE Copy settings from repository file
	1912	+** --admin-user|-A USERNAME Select given USERNAME as admin user
	1913	+** --date-override DATETIME Use DATETIME as time of the initial check-in
	1914	+** --sha1 Use a initial hash policy of "sha1"
1908	1915	**
1909	1916	** DATETIME may be "now" or "YYYY-MM-DDTHH:MM:SS.SSS". If in
1910	1917	** year-month-day form, it may be truncated, the "T" may be replaced by
1911	1918	** a space, and it may also name a timezone offset from UTC as "-HH:MM"
1912	1919	** (westward) or "+HH:MM" (eastward). Either no timezone suffix or "Z"
		@@ -1917,14 +1924,17 @@
1917	1924	void create_repository_cmd(void){
1918	1925	char *zPassword;
1919	1926	const char *zTemplate; /* Repository from which to copy settings */
1920	1927	const char *zDate; /* Date of the initial check-in */
1921	1928	const char *zDefaultUser; /* Optional name of the default user */
	1929	+ int bUseSha1 = 0; /* True to set the hash-policy to sha1 */
	1930	+
1922	1931
1923	1932	zTemplate = find_option("template",0,1);
1924	1933	zDate = find_option("date-override",0,1);
1925	1934	zDefaultUser = find_option("admin-user","A",1);
	1935	+ bUseSha1 = find_option("sha1",0,0)!=0;
1926	1936	/* We should be done with options.. */
1927	1937	verify_all_options();
1928	1938
1929	1939	if( g.argc!=3 ){
1930	1940	usage("REPOSITORY-NAME");
		@@ -1937,10 +1947,14 @@
1937	1947	db_create_repository(g.argv[2]);
1938	1948	db_open_repository(g.argv[2]);
1939	1949	db_open_config(0, 0);
1940	1950	if( zTemplate ) db_attach(zTemplate, "settingSrc");
1941	1951	db_begin_transaction();
	1952	+ if( bUseSha1 ){
	1953	+ g.eHashPolicy = HPOLICY_SHA1;
	1954	+ db_set_int("hash-policy", HPOLICY_SHA1, 0);
	1955	+ }
1942	1956	if( zDate==0 ) zDate = "now";
1943	1957	db_initial_setup(zTemplate, zDate, zDefaultUser);
1944	1958	db_end_transaction(0);
1945	1959	if( zTemplate ) db_detach("settingSrc");
1946	1960	fossil_print("project-id: %s\n", db_get("project-code", 0));
1947	1961


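When a repository is opened, the db.c hunk reads the `hash-policy` setting. If the setting is missing, the code derives a value with `hname_default_policy()` and stores it. The sketch below models that decision over an in-memory list of artifact names instead of the `blob` table. It makes two assumptions:

- A stored value of -1 stands for "setting absent".
- `default_policy()` and `effective_policy()` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#define HPOLICY_AUTO 1
#define HPOLICY_SHA3 2

/* Models hname_default_policy(): "sha3" if any name is longer than 40
** hex digits, or if there are no 40-digit (SHA1) names at all, which
** covers an empty repository; otherwise "auto". */
int default_policy(const char **azName, size_t nName){
  int hasSha1 = 0, hasLonger = 0;
  size_t i;
  for(i=0; i<nName; i++){
    size_t len = strlen(azName[i]);
    if( len>40 ) hasLonger = 1;
    if( len==40 ) hasSha1 = 1;
  }
  return (hasLonger || !hasSha1) ? HPOLICY_SHA3 : HPOLICY_AUTO;
}

/* Models the open-time logic: use the stored value when present,
** otherwise compute a default (the real code also saves it). */
int effective_policy(int stored, const char **azName, size_t nName){
  if( stored>=0 ) return stored;
  return default_policy(azName, nName);
}
```

Under this logic, an existing SHA1-only repository keeps producing SHA1 names until a SHA3 artifact arrives. A new, empty repository starts at `sha3` unless `fossil new --sha1` overrides it.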
M src/diffcmd.c

+21 -7

		--- src/diffcmd.c
		+++ src/diffcmd.c
		@@ -151,10 +151,13 @@
151	151	/*
152	152	** Show the difference between two files, one in memory and one on disk.
153	153	**
154	154	** The difference is the set of edits needed to transform pFile1 into
155	155	** zFile2. The content of pFile1 is in memory. zFile2 exists on disk.
	156	+**
	157	+** If fSwapDiff is 1, show the set of edits to transform zFile2 into pFile1
	158	+** instead of the opposite.
156	159	**
157	160	** Use the internal diff logic if zDiffCmd is NULL. Otherwise call the
158	161	** command zDiffCmd to do the diffing.
159	162	**
160	163	** When using an external diff program, zBinGlob contains the GLOB patterns
		@@ -167,11 +170,12 @@
167	170	const char *zFile2, /* On disk content to compare to */
168	171	const char *zName, /* Display name of the file */
169	172	const char *zDiffCmd, /* Command for comparison */
170	173	const char *zBinGlob, /* Treat file names matching this as binary */
171	174	int fIncludeBinary, /* Include binary files for external diff */
172		- u64 diffFlags /* Flags to control the diff */
	175	+ u64 diffFlags, /* Flags to control the diff */
	176	+ int fSwapDiff /* Diff from Zfile2 to Pfile1 */
173	177	){
174	178	if( zDiffCmd==0 ){
175	179	Blob out; /* Diff output text */
176	180	Blob file2; /* Content of zFile2 */
177	181	const char *zName2; /* Name of zFile2 for display */
		@@ -194,11 +198,15 @@
194	198	if( blob_compare(pFile1, &file2) ){
195	199	fossil_print("CHANGED %s\n", zName);
196	200	}
197	201	}else{
198	202	blob_zero(&out);
199		- text_diff(pFile1, &file2, &out, 0, diffFlags);
	203	+ if( fSwapDiff ){
	204	+ text_diff(&file2, pFile1, &out, 0, diffFlags);
	205	+ }else{
	206	+ text_diff(pFile1, &file2, &out, 0, diffFlags);
	207	+ }
200	208	if( blob_size(&out) ){
201	209	diff_print_filenames(zName, zName2, diffFlags);
202	210	fossil_print("%s\n", blob_str(&out));
203	211	}
204	212	blob_reset(&out);
		@@ -252,13 +260,19 @@
252	260	blob_write_to_file(pFile1, blob_str(&nameFile1));
253	261
254	262	/* Construct the external diff command */
255	263	blob_zero(&cmd);
256	264	blob_appendf(&cmd, "%s ", zDiffCmd);
257		- shell_escape(&cmd, blob_str(&nameFile1));
258		- blob_append(&cmd, " ", 1);
259		- shell_escape(&cmd, zFile2);
	265	+ if( fSwapDiff ){
	266	+ shell_escape(&cmd, zFile2);
	267	+ blob_append(&cmd, " ", 1);
	268	+ shell_escape(&cmd, blob_str(&nameFile1));
	269	+ }else{
	270	+ shell_escape(&cmd, blob_str(&nameFile1));
	271	+ blob_append(&cmd, " ", 1);
	272	+ shell_escape(&cmd, zFile2);
	273	+ }
260	274
261	275	/* Run the external diff command */
262	276	fossil_system(blob_str(&cmd));
263	277
264	278	/* Delete the temporary file and clean up memory used */
		@@ -482,11 +496,11 @@
482	496	blob_zero(&content);
483	497	}
484	498	isBin = fIncludeBinary ? 0 : looks_like_binary(&content);
485	499	diff_print_index(zPathname, diffFlags);
486	500	diff_file(&content, isBin, zFullName, zPathname, zDiffCmd,
487		- zBinGlob, fIncludeBinary, diffFlags);
	501	+ zBinGlob, fIncludeBinary, diffFlags, 0);
488	502	blob_reset(&content);
489	503	}
490	504	blob_reset(&fname);
491	505	}
492	506	db_finalize(&q);
		@@ -519,11 +533,11 @@
519	533	const char *zFile = (const char*)db_column_text(&q, 0);
520	534	if( !file_dir_match(pFileDir, zFile) ) continue;
521	535	zFullName = mprintf("%s%s", g.zLocalRoot, zFile);
522	536	db_column_blob(&q, 1, &content);
523	537	diff_file(&content, 0, zFullName, zFile,
524		- zDiffCmd, zBinGlob, fIncludeBinary, diffFlags);
	538	+ zDiffCmd, zBinGlob, fIncludeBinary, diffFlags, 0);
525	539	fossil_free(zFullName);
526	540	blob_reset(&content);
527	541	}
528	542	db_finalize(&q);
529	543	}
530	544


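When `fSwapDiff` is set, `diff_file()` hands the two sides to `text_diff()`, or to the external command, in reverse order. The output then describes how to turn the on-disk file into the in-memory content. The sketch below shows only the operand ordering for the external command. Its limits:

- It uses plain `snprintf()` and does no shell escaping, which Fossil handles with `shell_escape()`.
- `build_diff_cmd()` is a hypothetical name.

```c
#include <stdio.h>

/* Hypothetical helper: build "<cmd> <first> <second>" for an external
** diff tool. zTmpFile1 holds the in-memory content written to a temp
** file; zFile2 is the file on disk. With fSwapDiff nonzero the operands
** are reversed, so the tool reports edits from zFile2 to the in-memory
** content. No shell escaping is done in this sketch. */
int build_diff_cmd(char *zOut, size_t nOut, const char *zDiffCmd,
                   const char *zTmpFile1, const char *zFile2, int fSwapDiff){
  const char *zFirst  = fSwapDiff ? zFile2 : zTmpFile1;
  const char *zSecond = fSwapDiff ? zTmpFile1 : zFile2;
  return snprintf(zOut, nOut, "%s %s %s", zDiffCmd, zFirst, zSecond);
}
```

The two existing callers in this hunk pass `0`, so their behavior is unchanged. Only a caller that passes `1` gets the reversed direction.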
M src/doc.c

+1 -1

		--- src/doc.c
		+++ src/doc.c
		@@ -735,11 +735,11 @@
735	735
736	736	/* Jump here when unable to locate the document */
737	737	doc_not_found:
738	738	db_end_transaction(0);
739	739	if( isUV && P("name")==0 ){
740		- uvstat_page();
	740	+ uvlist_page();
741	741	return;
742	742	}
743	743	cgi_set_status(404, "Not Found");
744	744	style_header("Not Found");
745	745	@ <p>Document %h(zOrigName) not found
746	746


M src/encode.c

+90

		--- src/encode.c
		+++ src/encode.c
		@@ -336,10 +336,100 @@
336	336	z[j++] = c;
337	337	}
338	338	if( z[j] ) z[j] = 0;
339	339	}
340	340
	341	+
	342	+/*
	343	+** The *pz variable points to a UTF8 string. Read the next character
	344	+** off of that string and return its codepoint value. Advance *pz to the
	345	+** next character
	346	+*/
	347	+u32 fossil_utf8_read(
	348	+ const unsigned char **pz /* Pointer to string from which to read char */
	349	+){
	350	+ unsigned int c;
	351	+
	352	+ /*
	353	+ ** This lookup table is used to help decode the first byte of
	354	+ ** a multi-byte UTF8 character.
	355	+ */
	356	+ static const unsigned char utf8Trans1[] = {
	357	+ 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
	358	+ 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
	359	+ 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
	360	+ 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f,
	361	+ 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
	362	+ 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
	363	+ 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
	364	+ 0x00, 0x01, 0x02, 0x03, 0x00, 0x01, 0x00, 0x00,
	365	+ };
	366	+
	367	+ c = *((*pz)++);
	368	+ if( c>=0xc0 ){
	369	+ c = utf8Trans1[c-0xc0];
	370	+ while( (*(*pz) & 0xc0)==0x80 ){
	371	+ c = (c<<6) + (0x3f & *((*pz)++));
	372	+ }
	373	+ if( c<0x80
	374	+ || (c&0xFFFFF800)==0xD800
	375	+ || (c&0xFFFFFFFE)==0xFFFE ){ c = 0xFFFD; }
	376	+ }
	377	+ return c;
	378	+}
	379	+
	380	+/*
	381	+** Encode a UTF8 string for JSON. All special characters are escaped.
	382	+*/
	383	+void blob_append_json_string(Blob *pBlob, const char *zStr){
	384	+ const unsigned char *z;
	385	+ char *zOut;
	386	+ u32 c;
	387	+ int n, i, j;
	388	+ z = (const unsigned char*)zStr;
	389	+ n = 0;
	390	+ while( (c = fossil_utf8_read(&z))!=0 ){
	391	+ if( c=='\\' || c=='"' ){
	392	+ n += 2;
	393	+ }else if( c<' ' || c>=0x7f ){
	394	+ if( c=='\n' \|\| c=='\r' ){
	395	+ n += 2;
	396	+ }else{
	397	+ n += 6;
	398	+ }
	399	+ }else{
	400	+ n++;
	401	+ }
	402	+ }
	403	+ i = blob_size(pBlob);
	404	+ blob_resize(pBlob, i+n);
	405	+ zOut = blob_buffer(pBlob);
	406	+ z = (const unsigned char*)zStr;
	407	+ while( (c = fossil_utf8_read(&z))!=0 ){
	408	+ if( c=='\\' ){
	409	+ zOut[i++] = '\\';
	410	+ zOut[i++] = c;
	411	+ }else if( c<' ' || c>=0x7f ){
	412	+ zOut[i++] = '\\';
	413	+ if( c=='\n' ){
	414	+ zOut[i++] = 'n';
	415	+ }else if( c=='\r' ){
	416	+ zOut[i++] = 'r';
	417	+ }else{
	418	+ zOut[i++] = 'u';
	419	+ for(j=3; j>=0; j--){
	420	+ zOut[i+j] = "0123456789abcdef"[c&0xf];
	421	+ c >>= 4;
	422	+ }
	423	+ i += 4;
	424	+ }
	425	+ }else{
	426	+ zOut[i++] = c;
	427	+ }
	428	+ }
	429	+ zOut[i] = 0;
	430	+}
341	431
342	432	/*
343	433	** The characters used for HTTP base64 encoding.
344	434	*/
345	435	static unsigned char zBase[] =
346	436

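`fossil_utf8_read()` decodes one code point per call:

1. The lead byte indexes `utf8Trans1` to strip its length prefix.
2. Each `10xxxxxx` continuation byte shifts in six more bits.
3. Overlong forms, UTF-16 surrogates, and U+FFFE/U+FFFF are replaced with U+FFFD.

The sketch below is a standalone copy of that logic. It is renamed `utf8_read_sketch()` so it is not mistaken for the Fossil symbol, and it uses `uint32_t` in place of Fossil's `u32`.

```c
#include <stdint.h>

/* Decode one UTF-8 character from *pz, advance *pz past it, and return
** its code point. Same approach as fossil_utf8_read() in this hunk. */
uint32_t utf8_read_sketch(const unsigned char **pz){
  /* Payload bits of lead bytes 0xc0..0xff (2-, 3-, 4-byte forms, ...). */
  static const unsigned char utf8Trans1[] = {
    0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
    0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
    0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
    0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f,
    0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
    0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
    0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
    0x00, 0x01, 0x02, 0x03, 0x00, 0x01, 0x00, 0x00,
  };
  uint32_t c = *((*pz)++);
  if( c>=0xc0 ){
    c = utf8Trans1[c-0xc0];
    while( (**pz & 0xc0)==0x80 ){        /* continuation byte */
      c = (c<<6) + (0x3f & *((*pz)++));
    }
    /* Overlong encodings, surrogates and U+FFFE/U+FFFF are invalid. */
    if( c<0x80 || (c&0xFFFFF800)==0xD800 || (c&0xFFFFFFFE)==0xFFFE ){
      c = 0xFFFD;
    }
  }
  /* ASCII, and a stray continuation byte 0x80..0xbf, pass through. */
  return c;
}
```

For example, the bytes `C3 A9` decode to U+00E9 and `E2 82 AC` to U+20AC. The overlong pair `C0 80` becomes U+FFFD.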

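`blob_append_json_string()` makes two passes over its input. The first computes the exact output length so the blob can be resized once. The second writes the escaped characters into the reserved space.

The sketch below is a simplified illustration of that size-then-fill pattern. Unlike the hunk's function, it copies non-ASCII UTF-8 bytes through unchanged (JSON permits raw UTF-8), so it needs no code-point decoding. It escapes both `\` and `"`. `json_escape_sketch()` is a hypothetical helper, not a Fossil API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc()ed, NUL-terminated JSON-escaped
** copy of zIn (without surrounding quotes), or NULL if malloc() fails.
** Pass 1 sizes the output exactly; pass 2 fills it. */
char *json_escape_sketch(const char *zIn){
  const unsigned char *z;
  size_t n = 0, i = 0;
  char *zOut;
  for(z=(const unsigned char*)zIn; *z; z++){      /* pass 1: size */
    if( *z=='"' || *z=='\\' || *z=='\n' || *z=='\r' ) n += 2;
    else if( *z<0x20 ) n += 6;                    /* \u00XX */
    else n += 1;                                  /* ASCII or UTF-8 byte */
  }
  zOut = malloc(n+1);
  if( zOut==0 ) return 0;
  for(z=(const unsigned char*)zIn; *z; z++){      /* pass 2: fill */
    if( *z=='"' || *z=='\\' ){
      zOut[i++] = '\\'; zOut[i++] = (char)*z;
    }else if( *z=='\n' ){
      zOut[i++] = '\\'; zOut[i++] = 'n';
    }else if( *z=='\r' ){
      zOut[i++] = '\\'; zOut[i++] = 'r';
    }else if( *z<0x20 ){
      static const char hex[] = "0123456789abcdef";
      zOut[i++] = '\\'; zOut[i++] = 'u'; zOut[i++] = '0'; zOut[i++] = '0';
      zOut[i++] = hex[*z>>4]; zOut[i++] = hex[*z&0xf];
    }else{
      zOut[i++] = (char)*z;
    }
  }
  zOut[i] = 0;                                    /* i == n here */
  return zOut;
}
```

Sizing first means only one allocation is needed, at the cost of scanning the input twice. The hunk's function makes the same trade by calling `blob_resize()` once before writing.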
M src/hname.c

+137 -20

		--- src/hname.c
		+++ src/hname.c
		@@ -16,11 +16,13 @@
16	16	*******************************************************************************
17	17	**
18	18	** This file contains generic code for dealing with hashes used for
19	19	** naming artifacts. Specific hash algorithms are implemented separately
20	20	** (for example in sha1.c and sha3.c). This file contains the generic
21		-** interface code.
	21	+** interface logic.
	22	+**
	23	+** "hname" is intended to be an abbreviation of "hash name".
22	24	*/
23	25	#include "config.h"
24	26	#include "hname.h"
25	27
26	28
		@@ -47,10 +49,19 @@
47	49	/*
48	50	** The number of distinct hash algorithms:
49	51	*/
50	52	#define HNAME_COUNT 2 /* Just SHA1 and SHA3-256. Let's keep it that way! */
51	53
	54	+/*
	55	+** Hash naming policies
	56	+*/
	57	+#define HPOLICY_SHA1 0 /* Use SHA1 hashes */
	58	+#define HPOLICY_AUTO 1 /* SHA1 but auto-promote to SHA3 */
	59	+#define HPOLICY_SHA3 2 /* Use SHA3 hashes */
	60	+#define HPOLICY_SHA3_ONLY 3 /* Use SHA3 hashes exclusively */
	61	+#define HPOLICY_SHUN_SHA1 4 /* Shun all SHA1 objects */
	62	+
52	63	#endif /* INTERFACE */
53	64
54	65	/*
55	66	** Return a human-readable name for the hash algorithm given a hash with
56	67	** a length of nHash hexadecimal digits.
		@@ -142,26 +153,132 @@
142	153
143	154	/*
144	155	** Compute a hash on blob pContent. Write the hash into blob pHashOut.
145	156	** This routine assumes that pHashOut is uninitialized.
146	157	**
147		-** The preferred hash is used for iHType==0, and various alternative hashes
148		-** are used for iHType>0 && iHType<NHAME_COUNT.
	158	+** The preferred hash is used for iHType==0 and the alternative hash is
	159	+** used if iHType==1. (The interface is designed to accommodate more than
	160	+** just two hashes, but HNAME_COUNT is currently fixed at 2.)
	161	+**
	162	+** Depending on the hash policy, the alternative hash may be disallowed.
	163	+** If the alterative hash is disallowed, the routine returns 0. This
	164	+** routine returns 1 if iHType>0 and the alternative hash is allowed,
	165	+** and it always returns 1 when iHType==0.
	166	+**
	167	+** Alternative hash is disallowed for all hash policies except auto,
	168	+** sha1 and sha3.
	169	+*/
	170	+int hname_hash(const Blob *pContent, unsigned int iHType, Blob *pHashOut){
	171	+ assert( iHType==0 || iHType==1 );
	172	+ if( iHType==1 ){
	173	+ switch( g.eHashPolicy ){
	174	+ case HPOLICY_AUTO:
	175	+ case HPOLICY_SHA1:
	176	+ sha3sum_blob(pContent, 256, pHashOut);
	177	+ return 1;
	178	+ case HPOLICY_SHA3:
	179	+ sha1sum_blob(pContent, pHashOut);
	180	+ return 1;
	181	+ }
	182	+ }
	183	+ if( iHType==0 ){
	184	+ switch( g.eHashPolicy ){
	185	+ case HPOLICY_SHA1:
	186	+ case HPOLICY_AUTO:
	187	+ sha1sum_blob(pContent, pHashOut);
	188	+ return 1;
	189	+ case HPOLICY_SHA3:
	190	+ case HPOLICY_SHA3_ONLY:
	191	+ case HPOLICY_SHUN_SHA1:
	192	+ sha3sum_blob(pContent, 256, pHashOut);
	193	+ return 1;
	194	+ }
	195	+ }
	196	+ blob_init(pHashOut, 0, 0);
	197	+ return 0;
	198	+}
	199	+
	200	+/*
	201	+** Return the default hash policy for repositories that do not currently
	202	+** have an assigned hash policy.
	203	+**
	204	+** Make the default HPOLICY_AUTO if there are SHA1 artifacts but no SHA3
	205	+** artifacts in the repository. Make the default HPOLICY_SHA3 if there
	206	+** are one or more SHA3 artifacts or if the repository is initially empty.
	207	+*/
	208	+int hname_default_policy(void){
	209	+ if( db_exists("SELECT 1 FROM blob WHERE length(uuid)>40")
	210	+ || !db_exists("SELECT 1 FROM blob WHERE length(uuid)==40")
	211	+ ){
	212	+ return HPOLICY_SHA3;
	213	+ }else{
	214	+ return HPOLICY_AUTO;
	215	+ }
	216	+}
	217	+
	218	+/*
	219	+** Names of the hash policies.
	220	+*/
	221	+static const char *azPolicy[] = {
	222	+ "sha1", "auto", "sha3", "sha3-only", "shun-sha1"
	223	+};
	224	+
	225	+/* Return the name of the current hash policy.
	226	+*/
	227	+const char *hpolicy_name(void){
	228	+ return azPolicy[g.eHashPolicy];
	229	+}
	230	+
	231	+
	232	+/*
	233	+** COMMAND: hash-policy*
	234	+**
	235	+** Usage: fossil hash-policy ?NEW-POLICY?
	236	+**
	237	+** Query or set the hash policy for the current repository. Available hash
	238	+** policies are as follows:
	239	+**
	240	+** sha1 New artifact names are created using SHA1
	241	+**
	242	+** auto New artifact names are created using SHA1, but
	243	+** automatically change the policy to "sha3" when
	244	+** any SHA3 artifact enters the repository.
	245	+**
	246	+** sha3 New artifact names are created using SHA3, but
	247	+** older artifacts with SHA1 names may be reused.
	248	+**
	249	+** sha3-only Use only SHA3 artifact names. Do not reuse legacy
	250	+** SHA1 names.
	251	+**
	252	+** shun-sha1 Shun any SHA1 artifacts received by sync operations
	253	+** other than clones. Older legacy SHA1 artifacts
	254	+** are allowed during a clone.
	255	+**
	256	+** The default hash policy for existing repositories is "auto", which will
	257	+** immediately promote to "sha3" if the repository contains one or more
	258	+** artifacts with SHA3 names. The default hash policy for new repositories
	259	+** is "shun-sha1".
149	260	*/
150		-void hname_hash(const Blob *pContent, unsigned int iHType, Blob *pHashOut){
151		-#if RELEASE_VERSION_NUMBER>=20100
152		- /* For Fossil 2.1 and later, the preferred hash algorithm is SHA3-256 and
153		- ** SHA1 is the secondary hash algorithm. */
154		- switch( iHType ){
155		- case 0: sha3sum_blob(pContent, 256, pHashOut); break;
156		- case 1: sha1sum_blob(pContent, pHashOut); break;
157		- }
158		-#else
159		- /* Prior to Fossil 2.1, the preferred hash algorithm is SHA1 (for backwards
160		- ** compatibility with Fossil 1.x) and SHA3-256 is the only auxiliary
161		- ** algorithm */
162		- switch( iHType ){
163		- case 0: sha1sum_blob(pContent, pHashOut); break;
164		- case 1: sha3sum_blob(pContent, 256, pHashOut); break;
165		- }
166		-#endif
	261	+void hash_policy_command(void){
	262	+ int i;
	263	+ db_find_and_open_repository(0, 0);
	264	+ if( g.argc!=2 && g.argc!=3 ) usage("?NEW-POLICY?");
	265	+ if( g.argc==2 ){
	266	+ fossil_print("%s\n", azPolicy[g.eHashPolicy]);
	267	+ return;
	268	+ }
	269	+ for(i=HPOLICY_SHA1; i<=HPOLICY_SHUN_SHA1; i++){
	270	+ if( fossil_strcmp(g.argv[2],azPolicy[i])==0 ){
	271	+ if( i==HPOLICY_AUTO
	272	+ && db_exists("SELECT 1 FROM blob WHERE length(uuid)>40")
	273	+ ){
	274	+ i = HPOLICY_SHA3;
	275	+ }
	276	+ g.eHashPolicy = i;
	277	+ db_set_int("hash-policy", i, 0);
	278	+ fossil_print("%s\n", azPolicy[i]);
	279	+ return;
	280	+ }
	281	+ }
	282	+ fossil_fatal("unknown hash policy \"%s\" - should be one of: sha1 auto"
	283	+ " sha3 sha3-only shun-sha1", g.argv[2]);
167	284	}
168	285

M src/main.c

+4 -1

		--- src/main.c
		+++ src/main.c
		@@ -140,10 +140,11 @@
140	140	char *zLocalDbName; /* Name of the local database file */
141	141	char *zOpenRevision; /* Check-in version to use during database open */
142	142	int localOpen; /* True if the local database is open */
143	143	char *zLocalRoot; /* The directory holding the local database */
144	144	int minPrefix; /* Number of digits needed for a distinct UUID */
	145	+ int eHashPolicy; /* Current hash policy. One of HPOLICY_* */
145	146	int fNoDirSymlinks; /* True if --no-dir-symlinks flag is present */
146	147	int fSqlTrace; /* True if --sqltrace flag is present */
147	148	int fSqlStats; /* True if --sqltrace or --sqlstats are present */
148	149	int fSqlPrint; /* True if -sqlprint flag is present */
149	150	int fQuiet; /* True if -quiet flag is present */
		@@ -2005,11 +2006,11 @@
2005	2006	** the name of that directory and the specific repository will be
2006	2007	** opened later by process_one_web_page() based on the content of
2007	2008	** the PATH_INFO variable.
2008	2009	**
2009	2010	** If the fCreate flag is set, then create the repository if it
2010		-** does not already exist.
	2011	+** does not already exist. Always use "auto" hash-policy in this case.
2011	2012	*/
2012	2013	static void find_server_repository(int arg, int fCreate){
2013	2014	if( g.argc<=arg ){
2014	2015	db_must_be_within_tree();
2015	2016	}else{
		@@ -2022,10 +2023,12 @@
2022	2023	if( isDir==0 && fCreate ){
2023	2024	const char *zPassword;
2024	2025	db_create_repository(zRepo);
2025	2026	db_open_repository(zRepo);
2026	2027	db_begin_transaction();
	2028	+ g.eHashPolicy = HPOLICY_AUTO;
	2029	+ db_set_int("hash-policy", HPOLICY_AUTO, 0);
2027	2030	db_initial_setup(0, "now", g.zLogin);
2028	2031	db_end_transaction(0);
2029	2032	fossil_print("project-id: %s\n", db_get("project-code", 0));
2030	2033	fossil_print("server-id: %s\n", db_get("server-code", 0));
2031	2034	zPassword = db_text(0, "SELECT pw FROM user WHERE login=%Q", g.zLogin);
2032	2035

M src/sha3.c

+3 -3

		--- src/sha3.c
		+++ src/sha3.c
		@@ -378,18 +378,18 @@
378	378	}
379	379
380	380	/*
381	381	** Initialize a new hash. iSize determines the size of the hash
382	382	** in bits and should be one of 224, 256, 384, or 512. Or iSize
383		-** can be zero to use the default hash size of 224 bits.
	383	+** can be zero to use the default hash size of 256 bits.
384	384	*/
385	385	static void SHA3Init(SHA3Context *p, int iSize){
386	386	memset(p, 0, sizeof(*p));
387	387	if( iSize>=128 && iSize<=512 ){
388	388	p->nRate = (1600 - ((iSize + 31)&~31)*2)/8;
389	389	}else{
390		- p->nRate = 144;
	390	+ p->nRate = (1600 - 2*256)/8;
391	391	}
392	392	#if SHA3_BYTEORDER==1234
393	393	/* Known to be little-endian at compile-time. No-op */
394	394	#elif SHA3_BYTEORDER==4321
395	395	p->ixMask = 7; /* Big-endian */
		@@ -428,11 +428,11 @@
428	428	}
429	429	}
430	430	}
431	431	#endif
432	432	for(; i<nData; i++){
433		-#if SHA1_BYTEORDER==1234
	433	+#if SHA3_BYTEORDER==1234
434	434	p->u.x[p->nLoaded] ^= aData[i];
435	435	#elif SHA3_BYTEORDER==4321
436	436	p->u.x[p->nLoaded^0x07] ^= aData[i];
437	437	#else
438	438	p->u.x[p->nLoaded^p->ixMask] ^= aData[i];
439	439

M src/shun.c

		--- src/shun.c
		+++ src/shun.c
		@@ -26,10 +26,11 @@
26	26	*/
27	27	int uuid_is_shunned(const char *zUuid){
28	28	static Stmt q;
29	29	int rc;
30	30	if( zUuid==0 || zUuid[0]==0 ) return 0;
	31	+ if( g.eHashPolicy==HPOLICY_SHUN_SHA1 && zUuid[HNAME_LEN_SHA1]==0 ) return 1;
31	32	db_static_prepare(&q, "SELECT 1 FROM shun WHERE uuid=:uuid");
32	33	db_bind_text(&q, ":uuid", zUuid);
33	34	rc = db_step(&q);
34	35	db_reset(&q);
35	36	return rc==SQLITE_ROW;
36	37

M src/sqlcmd.c

		--- src/sqlcmd.c
		+++ src/sqlcmd.c
		@@ -212,10 +212,13 @@
212	212	*/
213	213	void cmd_sqlite3(void){
214	214	int noRepository;
215	215	const char *zConfigDb;
216	216	extern int sqlite3_shell(int, char**);
	217	+#ifdef FOSSIL_ENABLE_TH1_HOOKS
	218	+ g.fNoThHook = 1;
	219	+#endif
217	220	noRepository = find_option("no-repository", 0, 0)!=0;
218	221	if( !noRepository ){
219	222	db_find_and_open_repository(OPEN_ANY_SCHEMA, 0);
220	223	}
221	224	db_open_config(1,0);
222	225

M src/stash.c

+35 -32

		--- src/stash.c
		+++ src/stash.c
		@@ -332,52 +332,45 @@
332	332	isBin2 = fIncludeBinary ? 0 : looks_like_binary(&a);
333	333	diff_file_mem(&empty, &a, isBin1, isBin2, zNew, zDiffCmd,
334	334	zBinGlob, fIncludeBinary, diffFlags);
335	335	}else if( isRemoved ){
336	336	fossil_print("DELETE %s\n", zOrig);
337		- if( fBaseline==0 ){
338		- if( file_wd_islink(zOPath) ){
339		- blob_read_link(&a, zOPath);
340		- }else{
341		- blob_read_from_file(&a, zOPath);
342		- }
343		- }else{
344		- content_get(rid, &a);
345		- }
346		- diff_print_index(zNew, diffFlags);
347		- isBin1 = fIncludeBinary ? 0 : looks_like_binary(&a);
348		- isBin2 = 0;
349		- diff_file_mem(&a, &empty, isBin1, isBin2, zOrig, zDiffCmd,
350		- zBinGlob, fIncludeBinary, diffFlags);
351		- }else{
352		- Blob delta, disk;
	337	+ diff_print_index(zNew, diffFlags);
	338	+ isBin2 = 0;
	339	+ if( fBaseline ){
	340	+ content_get(rid, &a);
	341	+ isBin1 = fIncludeBinary ? 0 : looks_like_binary(&a);
	342	+ diff_file_mem(&a, &empty, isBin1, isBin2, zOrig, zDiffCmd,
	343	+ zBinGlob, fIncludeBinary, diffFlags);
	344	+ }else{
	345	+ }
	346	+ }else{
	347	+ Blob delta;
353	348	int isOrigLink = file_wd_islink(zOPath);
354	349	db_ephemeral_blob(&q, 6, &delta);
355		- if( fBaseline==0 ){
356		- if( isOrigLink ){
357		- blob_read_link(&disk, zOPath);
358		- }else{
359		- blob_read_from_file(&disk, zOPath);
360		- }
361		- }
362	350	fossil_print("CHANGED %s\n", zNew);
363	351	if( !isOrigLink != !isLink ){
364	352	diff_print_index(zNew, diffFlags);
365	353	diff_print_filenames(zOrig, zNew, diffFlags);
366	354	printf(DIFF_CANNOT_COMPUTE_SYMLINK);
367	355	}else{
368		- Blob *pBase = fBaseline ? &a : &disk;
369	356	content_get(rid, &a);
370	357	blob_delta_apply(&a, &delta, &b);
371		- isBin1 = fIncludeBinary ? 0 : looks_like_binary(pBase);
	358	+ isBin1 = fIncludeBinary ? 0 : looks_like_binary(&a);
372	359	isBin2 = fIncludeBinary ? 0 : looks_like_binary(&b);
373		- diff_file_mem(fBaseline? &a : &disk, &b, isBin1, isBin2, zNew,
374		- zDiffCmd, zBinGlob, fIncludeBinary, diffFlags);
	360	+ if( fBaseline ){
	361	+ diff_file_mem(&a, &b, isBin1, isBin2, zNew,
	362	+ zDiffCmd, zBinGlob, fIncludeBinary, diffFlags);
	363	+ }else{
	364	+ /* Diff with the file on disk, using fSwapDiff=1 so that the diff is
	365	+ ** shown in the same direction as when fBaseline=1. */
	366	+ diff_file(&b, isBin2, zOPath, zNew, zDiffCmd,
	367	+ zBinGlob, fIncludeBinary, diffFlags, 1);
	368	+ }
375	369	blob_reset(&a);
376	370	blob_reset(&b);
377	371	}
378		- if( !fBaseline ) blob_reset(&disk);
379	372	blob_reset(&delta);
380	373	}
381	374	}
382	375	db_finalize(&q);
383	376	}
		@@ -433,12 +426,15 @@
433	426	**
434	427	** List all change sets currently stashed. Show information about
435	428	** individual files in each changeset if -v or --verbose is used.
436	429	**
437	430	** fossil stash show|cat ?STASHID? ?DIFF-OPTIONS?
	431	+** fossil stash gshow|gcat ?STASHID? ?DIFF-OPTIONS?
438	432	**
439		-** Show the contents of a stash.
	433	+** Show the contents of a stash as a diff against its baseline.
	434	+** With gshow and gcat, gdiff-command is used instead of internal
	435	+** diff logic.
440	436	**
441	437	** fossil stash pop
442	438	** fossil stash apply ?STASHID?
443	439	**
444	440	** Apply STASHID or the most recently created stash to the current
		@@ -460,18 +456,20 @@
460	456	**
461	457	** fossil stash diff ?STASHID? ?DIFF-OPTIONS?
462	458	** fossil stash gdiff ?STASHID? ?DIFF-OPTIONS?
463	459	**
464	460	** Show diffs of the current working directory and what that
465		-** directory would be if STASHID were applied.
	461	+** directory would be if STASHID were applied. With gdiff,
	462	+** gdiff-command is used instead of internal diff logic.
466	463	**
467	464	** SUMMARY:
468	465	** fossil stash
469	466	** fossil stash save ?-m|--comment COMMENT? ?FILES...?
470	467	** fossil stash snapshot ?-m|--comment COMMENT? ?FILES...?
471	468	** fossil stash list|ls ?-v|--verbose? ?-W|--width <num>?
472	469	** fossil stash show|cat ?STASHID? ?DIFF-OPTIONS?
	470	+** fossil stash gshow|gcat ?STASHID? ?DIFF-OPTIONS?
473	471	** fossil stash pop
474	472	** fossil stash apply|goto ?STASHID?
475	473	** fossil stash drop|rm ?STASHID? ?-a|--all?
476	474	** fossil stash diff ?STASHID? ?DIFF-OPTIONS?
477	475	** fossil stash gdiff ?STASHID? ?DIFF-OPTIONS?
		@@ -654,25 +652,30 @@
654	652	undo_finish();
655	653	}else
656	654	if( memcmp(zCmd, "diff", nCmd)==0
657	655	|| memcmp(zCmd, "gdiff", nCmd)==0
658	656	|| memcmp(zCmd, "show", nCmd)==0
	657	+ || memcmp(zCmd, "gshow", nCmd)==0
659	658	|| memcmp(zCmd, "cat", nCmd)==0
	659	+ || memcmp(zCmd, "gcat", nCmd)==0
660	660	){
661	661	const char *zDiffCmd = 0;
662	662	const char *zBinGlob = 0;
663	663	int fIncludeBinary = 0;
664		- int fBaseline = zCmd[0]=='s' || zCmd[0]=='c';
	664	+ int fBaseline = 0;
665	665	u64 diffFlags;
666	666
	667	+ if( strstr(zCmd,"show")!=0 || strstr(zCmd,"cat")!=0 ){
	668	+ fBaseline = 1;
	669	+ }
667	670	if( find_option("tk",0,0)!=0 ){
668	671	db_close(0);
669	672	diff_tk(fBaseline ? "stash show" : "stash diff", 3);
670	673	return;
671	674	}
672	675	if( find_option("internal","i",0)==0 ){
673		- zDiffCmd = diff_command_external(memcmp(zCmd, "gdiff", nCmd)==0);
	676	+ zDiffCmd = diff_command_external(zCmd[0]=='g');
674	677	}
675	678	diffFlags = diff_options();
676	679	if( find_option("verbose","v",0)!=0 ) diffFlags |= DIFF_VERBOSE;
677	680	if( g.argc>4 ) usage(mprintf("%s ?STASHID? ?DIFF-OPTIONS?", zCmd));
678	681	if( zDiffCmd ){
679	682

	--- src/stash.c
	+++ src/stash.c
	@@ -332,52 +332,45 @@
332	isBin2 = fIncludeBinary ? 0 : looks_like_binary(&a);
333	diff_file_mem(&empty, &a, isBin1, isBin2, zNew, zDiffCmd,
334	zBinGlob, fIncludeBinary, diffFlags);
335	}else if( isRemoved ){
336	fossil_print("DELETE %s\n", zOrig);
337	if( fBaseline==0 ){
338	if( file_wd_islink(zOPath) ){
339	blob_read_link(&a, zOPath);
340	}else{
341	blob_read_from_file(&a, zOPath);
342	}
343	}else{
344	content_get(rid, &a);
345	}
346	diff_print_index(zNew, diffFlags);
347	isBin1 = fIncludeBinary ? 0 : looks_like_binary(&a);
348	isBin2 = 0;
349	diff_file_mem(&a, &empty, isBin1, isBin2, zOrig, zDiffCmd,
350	zBinGlob, fIncludeBinary, diffFlags);
351	}else{
352	Blob delta, disk;
353	int isOrigLink = file_wd_islink(zOPath);
354	db_ephemeral_blob(&q, 6, &delta);
355	if( fBaseline==0 ){
356	if( isOrigLink ){
357	blob_read_link(&disk, zOPath);
358	}else{
359	blob_read_from_file(&disk, zOPath);
360	}
361	}
362	fossil_print("CHANGED %s\n", zNew);
363	if( !isOrigLink != !isLink ){
364	diff_print_index(zNew, diffFlags);
365	diff_print_filenames(zOrig, zNew, diffFlags);
366	printf(DIFF_CANNOT_COMPUTE_SYMLINK);
367	}else{
368	Blob *pBase = fBaseline ? &a : &disk;
369	content_get(rid, &a);
370	blob_delta_apply(&a, &delta, &b);
371	isBin1 = fIncludeBinary ? 0 : looks_like_binary(pBase);
372	isBin2 = fIncludeBinary ? 0 : looks_like_binary(&b);
373	diff_file_mem(fBaseline? &a : &disk, &b, isBin1, isBin2, zNew,
374	zDiffCmd, zBinGlob, fIncludeBinary, diffFlags);







375	blob_reset(&a);
376	blob_reset(&b);
377	}
378	if( !fBaseline ) blob_reset(&disk);
379	blob_reset(&delta);
380	}
381	}
382	db_finalize(&q);
383	}
	@@ -433,12 +426,15 @@
433	**
434	** List all changes sets currently stashed. Show information about
435	** individual files in each changeset if -v or --verbose is used.
436	**
437	** fossil stash show\|cat ?STASHID? ?DIFF-OPTIONS?

438	**
439	** Show the contents of a stash.


440	**
441	** fossil stash pop
442	** fossil stash apply ?STASHID?
443	**
444	** Apply STASHID or the most recently create stash to the current
	@@ -460,18 +456,20 @@
460	**
461	** fossil stash diff ?STASHID? ?DIFF-OPTIONS?
462	** fossil stash gdiff ?STASHID? ?DIFF-OPTIONS?
463	**
464	** Show diffs of the current working directory and what that
465	** directory would be if STASHID were applied.

466	**
467	** SUMMARY:
468	** fossil stash
469	** fossil stash save ?-m\|--comment COMMENT? ?FILES...?
470	** fossil stash snapshot ?-m\|--comment COMMENT? ?FILES...?
471	** fossil stash list\|ls ?-v\|--verbose? ?-W\|--width <num>?
472	** fossil stash show\|cat ?STASHID? ?DIFF-OPTIONS?

473	** fossil stash pop
474	** fossil stash apply\|goto ?STASHID?
475	** fossil stash drop\|rm ?STASHID? ?-a\|--all?
476	** fossil stash diff ?STASHID? ?DIFF-OPTIONS?
477	** fossil stash gdiff ?STASHID? ?DIFF-OPTIONS?
	@@ -654,25 +652,30 @@
654	undo_finish();
655	}else
656	if( memcmp(zCmd, "diff", nCmd)==0
657	\|\| memcmp(zCmd, "gdiff", nCmd)==0
658	\|\| memcmp(zCmd, "show", nCmd)==0

659	\|\| memcmp(zCmd, "cat", nCmd)==0

660	){
661	const char *zDiffCmd = 0;
662	const char *zBinGlob = 0;
663	int fIncludeBinary = 0;
664	int fBaseline = zCmd[0]=='s' \|\| zCmd[0]=='c';
665	u64 diffFlags;
666



667	if( find_option("tk",0,0)!=0 ){
668	db_close(0);
669	diff_tk(fBaseline ? "stash show" : "stash diff", 3);
670	return;
671	}
672	if( find_option("internal","i",0)==0 ){
673	zDiffCmd = diff_command_external(memcmp(zCmd, "gdiff", nCmd)==0);
674	}
675	diffFlags = diff_options();
676	if( find_option("verbose","v",0)!=0 ) diffFlags \|= DIFF_VERBOSE;
677	if( g.argc>4 ) usage(mprintf("%s ?STASHID? ?DIFF-OPTIONS?", zCmd));
678	if( zDiffCmd ){
679

M src/stat.c

+6 -1

		--- src/stat.c
		+++ src/stat.c
		@@ -183,11 +183,16 @@
183	183	@ (%h(RELEASE_VERSION)) <a href='version?verbose=1'>(details)</a>
184	184	@ </td></tr>
185	185	@ <tr><th>SQLite Version:</th><td>%.19s(sqlite3_sourceid())
186	186	@ [%.10s(&sqlite3_sourceid()[20])] (%s(sqlite3_libversion()))
187	187	@ <a href='version?verbose=2'>(details)</a></td></tr>
188		- @ <tr><th>Schema Version:</th><td>%h(g.zAuxSchema)</td></tr>
	188	+ if( g.eHashPolicy!=HPOLICY_AUTO ){
	189	+ @ <tr><th>Schema Version:</th><td>%h(g.zAuxSchema),
	190	+ @ %s(hpolicy_name())</td></tr>
	191	+ }else{
	192	+ @ <tr><th>Schema Version:</th><td>%h(g.zAuxSchema)</td></tr>
	193	+ }
189	194	@ <tr><th>Repository Rebuilt:</th><td>
190	195	@ %h(db_get_mtime("rebuilt","%Y-%m-%d %H:%M:%S","Never"))
191	196	@ By Fossil %h(db_get("rebuilt","Unknown"))</td></tr>
192	197	@ <tr><th>Database Stats:</th><td>
193	198	@ %d(db_int(0, "PRAGMA repository.page_count")) pages,
194	199
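The `/stat` change above prints the hash policy next to the schema version only when the policy is not the default `auto`. A minimal sketch of that conditional, assuming the five policy names `sha1`, `auto`, `sha3`, `sha3-only`, and `shun-sha1`; the `HashPolicy` enum, `policy_name`, and `format_schema_row` are hypothetical stand-ins for `g.eHashPolicy`, `hpolicy_name()`, and Fossil's `@` HTML output lines.

```c
#include <stdio.h>

/* Hypothetical mirror of Fossil's hash-policy values. */
typedef enum { HP_SHA1, HP_AUTO, HP_SHA3, HP_SHA3_ONLY, HP_SHUN_SHA1 } HashPolicy;

static const char *policy_name(HashPolicy e){
  switch( e ){
    case HP_SHA1:      return "sha1";
    case HP_AUTO:      return "auto";
    case HP_SHA3:      return "sha3";
    case HP_SHA3_ONLY: return "sha3-only";
    case HP_SHUN_SHA1: return "shun-sha1";
  }
  return "unknown";
}

/* Write the "Schema Version" row into zBuf, appending the policy name
** only when it differs from "auto".  zSchema is assumed to be already
** HTML-escaped, as the %h conversion does in stat.c. */
int format_schema_row(char *zBuf, size_t nBuf, const char *zSchema, HashPolicy e){
  if( e!=HP_AUTO ){
    return snprintf(zBuf, nBuf,
        "<tr><th>Schema Version:</th><td>%s,\n%s</td></tr>",
        zSchema, policy_name(e));
  }
  return snprintf(zBuf, nBuf,
      "<tr><th>Schema Version:</th><td>%s</td></tr>", zSchema);
}
```

The schema string passed in the test is a sample value only.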

M src/unversioned.c

+58 -1

		--- src/unversioned.c
		+++ src/unversioned.c
		@@ -456,11 +456,11 @@
456	456	** Query parameters:
457	457	**
458	458	** byage=1 Order the initial display by decreasing age
459	459	** showdel=0 Show deleted files
460	460	*/
461		-void uvstat_page(void){
	461	+void uvlist_page(void){
462	462	Stmt q;
463	463	sqlite3_int64 iNow;
464	464	sqlite3_int64 iTotalSz = 0;
465	465	int cnt = 0;
466	466	int n = 0;
		@@ -554,5 +554,62 @@
554	554	}else{
555	555	@ No unversioned files on this server.
556	556	}
557	557	style_footer();
558	558	}
	559	+
	560	+/*
	561	+** WEBPAGE: juvlist
	562	+**
	563	+** Return a complete list of unversioned files as JSON. The JSON
	564	+** looks like this:
	565	+**
	566	+** [{"name":NAME,
	567	+** "mtime":MTIME,
	568	+** "hash":HASH,
	569	+** "size":SIZE,
	570	+** "user":USER}]
	571	+*/
	572	+void uvlist_json_page(void){
	573	+ Stmt q;
	574	+ char *zSep = "[";
	575	+ Blob json;
	576	+
	577	+ login_check_credentials();
	578	+ if( !g.perm.Read ){ login_needed(g.anon.Read); return; }
	579	+ cgi_set_content_type("text/json");
	580	+ if( !db_table_exists("repository","unversioned") ){
	581	+ blob_init(&json, "[]", -1);
	582	+ cgi_set_content(&json);
	583	+ return;
	584	+ }
	585	+ blob_init(&json, 0, 0);
	586	+ db_prepare(&q,
	587	+ "SELECT"
	588	+ " name,"
	589	+ " mtime,"
	590	+ " hash,"
	591	+ " sz,"
	592	+ " (SELECT login FROM rcvfrom, user"
	593	+ " WHERE user.uid=rcvfrom.uid AND rcvfrom.rcvid=unversioned.rcvid)"
	594	+ " FROM unversioned WHERE hash IS NOT NULL"
	595	+ );
	596	+ while( db_step(&q)==SQLITE_ROW ){
	597	+ const char *zName = db_column_text(&q, 0);
	598	+ sqlite3_int64 mtime = db_column_int(&q, 1);
	599	+ const char *zHash = db_column_text(&q, 2);
	600	+ int fullSize = db_column_int(&q, 3);
	601	+ const char *zLogin = db_column_text(&q, 4);
	602	+ if( zLogin==0 ) zLogin = "";
	603	+ blob_appendf(&json, "%s{\"name\":\"", zSep);
	604	+ zSep = ",\n ";
	605	+ blob_append_json_string(&json, zName);
	606	+ blob_appendf(&json, "\",\n \"mtime\":%lld,\n \"hash\":\"", mtime);
	607	+ blob_append_json_string(&json, zHash);
	608	+ blob_appendf(&json, "\",\n \"size\":%d,\n \"user\":\"", fullSize);
	609	+ blob_append_json_string(&json, zLogin);
	610	+ blob_appendf(&json, "\"}");
	611	+ }
	612	+ db_finalize(&q);
	613	+ blob_appendf(&json,"]\n");
	614	+ cgi_set_content(&json);
	615	+}
559	616
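The new `/juvlist` handler builds its JSON array by writing a separator before each object (`[` the first time, `,\n ` afterwards), escaping every string value, and closing with `]`. The sketch below reproduces that shape over an in-memory array instead of an SQL query. `UvFile`, `json_escape_append`, and `uv_list_json` are hypothetical, and the escaping rules (quote, backslash, control characters as `\uXXXX`) are an assumption about what `blob_append_json_string` does. One difference is deliberate: reading the diff, when the `unversioned` table exists but no row has a non-NULL hash, the loop never runs and the handler appears to emit only `]\n`; the sketch emits `[]\n` in that case.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for one row of the unversioned table. */
typedef struct UvFile {
  const char *zName;
  long long mtime;
  const char *zHash;
  int size;
  const char *zUser;   /* may be NULL; emitted as "" like zLogin */
} UvFile;

/* Append zIn to the NUL-terminated buffer zOut (capacity nOut) as the
** body of a JSON string.  Output is truncated if the buffer fills up. */
static void json_escape_append(char *zOut, size_t nOut, const char *zIn){
  size_t n = strlen(zOut);
  for(; *zIn && n+7<nOut; zIn++){    /* room for "\u00XX" plus NUL */
    unsigned char c = (unsigned char)*zIn;
    if( c=='"' || c=='\\' ){
      zOut[n++] = '\\'; zOut[n++] = (char)c;
    }else if( c<0x20 ){
      n += (size_t)snprintf(zOut+n, nOut-n, "\\u%04x", c);
    }else{
      zOut[n++] = (char)c;
    }
  }
  zOut[n] = 0;
}

/* Build the /juvlist-style JSON array for aFile[0..nFile-1] into zOut. */
void uv_list_json(char *zOut, size_t nOut, const UvFile *aFile, int nFile){
  const char *zSep = "[";
  zOut[0] = 0;
  if( nFile==0 ){ snprintf(zOut, nOut, "[]\n"); return; }
  for(int i=0; i<nFile; i++){
    size_t n = strlen(zOut);
    snprintf(zOut+n, nOut-n, "%s{\"name\":\"", zSep);
    zSep = ",\n ";
    json_escape_append(zOut, nOut, aFile[i].zName);
    n = strlen(zOut);
    snprintf(zOut+n, nOut-n, "\",\n \"mtime\":%lld,\n \"hash\":\"", aFile[i].mtime);
    json_escape_append(zOut, nOut, aFile[i].zHash);
    n = strlen(zOut);
    snprintf(zOut+n, nOut-n, "\",\n \"size\":%d,\n \"user\":\"", aFile[i].size);
    json_escape_append(zOut, nOut, aFile[i].zUser ? aFile[i].zUser : "");
    n = strlen(zOut);
    snprintf(zOut+n, nOut-n, "\"}");
  }
  size_t n = strlen(zOut);
  snprintf(zOut+n, nOut-n, "]\n");
}
```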

M src/wiki.c

+1 -1

		--- src/wiki.c
		+++ src/wiki.c
		@@ -1122,11 +1122,11 @@
1122	1122	*/
1123	1123	int wiki_technote_to_rid(const char *zETime) {
1124	1124	int rid=0; /* Artifact ID of the tech note */
1125	1125	int nETime = strlen(zETime);
1126	1126	Stmt q;
1127		- if( nETime>=4 && hname_validate(zETime, nETime) ){
	1127	+ if( nETime>=4 && nETime<=HNAME_MAX && validate16(zETime, nETime) ){
1128	1128	char zUuid[HNAME_MAX+1];
1129	1129	memcpy(zUuid, zETime, nETime+1);
1130	1130	canonical16(zUuid, nETime);
1131	1131	db_prepare(&q,
1132	1132	"SELECT e.objid"
1133	1133
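The `wiki.c` change adds an explicit upper bound, `nETime<=HNAME_MAX`, before `memcpy(zUuid, zETime, nETime+1)` copies into `char zUuid[HNAME_MAX+1]`, so an over-long argument can no longer overflow that stack buffer. A minimal sketch of the same check-then-copy order, assuming `HNAME_MAX` is 64 (the hex length of a SHA3-256 name) and that canonicalization means lower-casing; `SKETCH_HNAME_MAX` and `hex_prefix_to_canonical` are hypothetical stand-ins for `HNAME_MAX`, `validate16`, and `canonical16`.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_HNAME_MAX 64   /* assumed: hex length of a SHA3-256 name */

/* Copy zIn into zOut (SKETCH_HNAME_MAX+1 bytes) in lower case, but only
** if zIn is 4..SKETCH_HNAME_MAX hex digits.  The length check comes
** first, so the copy can never write past the end of zOut. */
bool hex_prefix_to_canonical(const char *zIn, char zOut[SKETCH_HNAME_MAX+1]){
  size_t n = strlen(zIn);
  if( n<4 || n>SKETCH_HNAME_MAX ) return false;
  for(size_t i=0; i<n; i++){
    if( !isxdigit((unsigned char)zIn[i]) ) return false;
  }
  for(size_t i=0; i<=n; i++){   /* <= also copies the terminating NUL */
    zOut[i] = (char)tolower((unsigned char)zIn[i]);
  }
  return true;
}
```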

M src/xfer.c

		--- src/xfer.c
		+++ src/xfer.c
		@@ -1768,10 +1768,11 @@
1768	1768	memset(&xfer, 0, sizeof(xfer));
1769	1769	xfer.pIn = &recv;
1770	1770	xfer.pOut = &send;
1771	1771	xfer.mxSend = db_get_int("max-upload", 250000);
1772	1772	xfer.maxTime = -1;
	1773	+ xfer.clientVersion = RELEASE_VERSION_NUMBER;
1773	1774	if( syncFlags & SYNC_PRIVATE ){
1774	1775	g.perm.Private = 1;
1775	1776	xfer.syncPrivate = 1;
1776	1777	}
1777	1778
1778	1779
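The `xfer.c` change records the client's own `RELEASE_VERSION_NUMBER` in the transfer state, which lets version checks compare plain integers. The sketch below shows one plausible encoding, `MAJOR*10000 + MINOR*100 + PATCH`; this is an assumption for illustration, and the value the build actually generates for `RELEASE_VERSION_NUMBER` may be encoded differently. `version_number` is a hypothetical helper.

```c
#include <stdio.h>

/* Hypothetical: encode "MAJOR.MINOR[.PATCH]" as
** MAJOR*10000 + MINOR*100 + PATCH, so "2.1" -> 20100.
** Returns -1 if the string does not start with MAJOR.MINOR. */
int version_number(const char *zVers){
  int major = 0, minor = 0, patch = 0;
  int n = sscanf(zVers, "%d.%d.%d", &major, &minor, &patch);
  if( n<2 || minor<0 || minor>99 || patch<0 || patch>99 ) return -1;
  return major*10000 + minor*100 + patch;
}
```

An integer encoding also avoids the trap of comparing versions as decimal numbers, where 1.4 would rank above 1.37.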

M win/Makefile.mingw.mistachkin

+36

		--- win/Makefile.mingw.mistachkin
		+++ win/Makefile.mingw.mistachkin
		@@ -461,10 +461,11 @@
461	461	$(SRCDIR)/fshell.c \
462	462	$(SRCDIR)/fusefs.c \
463	463	$(SRCDIR)/glob.c \
464	464	$(SRCDIR)/graph.c \
465	465	$(SRCDIR)/gzip.c \
	466	+ $(SRCDIR)/hname.c \
466	467	$(SRCDIR)/http.c \
467	468	$(SRCDIR)/http_socket.c \
468	469	$(SRCDIR)/http_ssl.c \
469	470	$(SRCDIR)/http_transport.c \
470	471	$(SRCDIR)/import.c \
		@@ -511,10 +512,12 @@
511	512	$(SRCDIR)/rss.c \
512	513	$(SRCDIR)/schema.c \
513	514	$(SRCDIR)/search.c \
514	515	$(SRCDIR)/setup.c \
515	516	$(SRCDIR)/sha1.c \
	517	+ $(SRCDIR)/sha1hard.c \
	518	+ $(SRCDIR)/sha3.c \
516	519	$(SRCDIR)/shun.c \
517	520	$(SRCDIR)/sitemap.c \
518	521	$(SRCDIR)/skins.c \
519	522	$(SRCDIR)/sqlcmd.c \
520	523	$(SRCDIR)/stash.c \
		@@ -636,10 +639,11 @@
636	639	$(OBJDIR)/fshell_.c \
637	640	$(OBJDIR)/fusefs_.c \
638	641	$(OBJDIR)/glob_.c \
639	642	$(OBJDIR)/graph_.c \
640	643	$(OBJDIR)/gzip_.c \
	644	+ $(OBJDIR)/hname_.c \
641	645	$(OBJDIR)/http_.c \
642	646	$(OBJDIR)/http_socket_.c \
643	647	$(OBJDIR)/http_ssl_.c \
644	648	$(OBJDIR)/http_transport_.c \
645	649	$(OBJDIR)/import_.c \
		@@ -686,10 +690,12 @@
686	690	$(OBJDIR)/rss_.c \
687	691	$(OBJDIR)/schema_.c \
688	692	$(OBJDIR)/search_.c \
689	693	$(OBJDIR)/setup_.c \
690	694	$(OBJDIR)/sha1_.c \
	695	+ $(OBJDIR)/sha1hard_.c \
	696	+ $(OBJDIR)/sha3_.c \
691	697	$(OBJDIR)/shun_.c \
692	698	$(OBJDIR)/sitemap_.c \
693	699	$(OBJDIR)/skins_.c \
694	700	$(OBJDIR)/sqlcmd_.c \
695	701	$(OBJDIR)/stash_.c \
		@@ -760,10 +766,11 @@
760	766	$(OBJDIR)/fshell.o \
761	767	$(OBJDIR)/fusefs.o \
762	768	$(OBJDIR)/glob.o \
763	769	$(OBJDIR)/graph.o \
764	770	$(OBJDIR)/gzip.o \
	771	+ $(OBJDIR)/hname.o \
765	772	$(OBJDIR)/http.o \
766	773	$(OBJDIR)/http_socket.o \
767	774	$(OBJDIR)/http_ssl.o \
768	775	$(OBJDIR)/http_transport.o \
769	776	$(OBJDIR)/import.o \
		@@ -810,10 +817,12 @@
810	817	$(OBJDIR)/rss.o \
811	818	$(OBJDIR)/schema.o \
812	819	$(OBJDIR)/search.o \
813	820	$(OBJDIR)/setup.o \
814	821	$(OBJDIR)/sha1.o \
	822	+ $(OBJDIR)/sha1hard.o \
	823	+ $(OBJDIR)/sha3.o \
815	824	$(OBJDIR)/shun.o \
816	825	$(OBJDIR)/sitemap.o \
817	826	$(OBJDIR)/skins.o \
818	827	$(OBJDIR)/sqlcmd.o \
819	828	$(OBJDIR)/stash.o \
		@@ -1095,10 +1104,11 @@
1095	1104	$(OBJDIR)/fshell_.c:$(OBJDIR)/fshell.h \
1096	1105	$(OBJDIR)/fusefs_.c:$(OBJDIR)/fusefs.h \
1097	1106	$(OBJDIR)/glob_.c:$(OBJDIR)/glob.h \
1098	1107	$(OBJDIR)/graph_.c:$(OBJDIR)/graph.h \
1099	1108	$(OBJDIR)/gzip_.c:$(OBJDIR)/gzip.h \
	1109	+ $(OBJDIR)/hname_.c:$(OBJDIR)/hname.h \
1100	1110	$(OBJDIR)/http_.c:$(OBJDIR)/http.h \
1101	1111	$(OBJDIR)/http_socket_.c:$(OBJDIR)/http_socket.h \
1102	1112	$(OBJDIR)/http_ssl_.c:$(OBJDIR)/http_ssl.h \
1103	1113	$(OBJDIR)/http_transport_.c:$(OBJDIR)/http_transport.h \
1104	1114	$(OBJDIR)/import_.c:$(OBJDIR)/import.h \
		@@ -1145,10 +1155,12 @@
1145	1155	$(OBJDIR)/rss_.c:$(OBJDIR)/rss.h \
1146	1156	$(OBJDIR)/schema_.c:$(OBJDIR)/schema.h \
1147	1157	$(OBJDIR)/search_.c:$(OBJDIR)/search.h \
1148	1158	$(OBJDIR)/setup_.c:$(OBJDIR)/setup.h \
1149	1159	$(OBJDIR)/sha1_.c:$(OBJDIR)/sha1.h \
	1160	+ $(OBJDIR)/sha1hard_.c:$(OBJDIR)/sha1hard.h \
	1161	+ $(OBJDIR)/sha3_.c:$(OBJDIR)/sha3.h \
1150	1162	$(OBJDIR)/shun_.c:$(OBJDIR)/shun.h \
1151	1163	$(OBJDIR)/sitemap_.c:$(OBJDIR)/sitemap.h \
1152	1164	$(OBJDIR)/skins_.c:$(OBJDIR)/skins.h \
1153	1165	$(OBJDIR)/sqlcmd_.c:$(OBJDIR)/sqlcmd.h \
1154	1166	$(OBJDIR)/stash_.c:$(OBJDIR)/stash.h \
		@@ -1498,10 +1510,18 @@
1498	1510
1499	1511	$(OBJDIR)/gzip.o: $(OBJDIR)/gzip_.c $(OBJDIR)/gzip.h $(SRCDIR)/config.h
1500	1512	$(XTCC) -o $(OBJDIR)/gzip.o -c $(OBJDIR)/gzip_.c
1501	1513
1502	1514	$(OBJDIR)/gzip.h: $(OBJDIR)/headers
	1515	+
	1516	+$(OBJDIR)/hname_.c: $(SRCDIR)/hname.c $(TRANSLATE)
	1517	+ $(TRANSLATE) $(SRCDIR)/hname.c >$@
	1518	+
	1519	+$(OBJDIR)/hname.o: $(OBJDIR)/hname_.c $(OBJDIR)/hname.h $(SRCDIR)/config.h
	1520	+ $(XTCC) -o $(OBJDIR)/hname.o -c $(OBJDIR)/hname_.c
	1521	+
	1522	+$(OBJDIR)/hname.h: $(OBJDIR)/headers
1503	1523
1504	1524	$(OBJDIR)/http_.c: $(SRCDIR)/http.c $(TRANSLATE)
1505	1525	$(TRANSLATE) $(SRCDIR)/http.c >$@
1506	1526
1507	1527	$(OBJDIR)/http.o: $(OBJDIR)/http_.c $(OBJDIR)/http.h $(SRCDIR)/config.h
		@@ -1898,10 +1918,26 @@
1898	1918
1899	1919	$(OBJDIR)/sha1.o: $(OBJDIR)/sha1_.c $(OBJDIR)/sha1.h $(SRCDIR)/config.h
1900	1920	$(XTCC) -o $(OBJDIR)/sha1.o -c $(OBJDIR)/sha1_.c
1901	1921
1902	1922	$(OBJDIR)/sha1.h: $(OBJDIR)/headers
	1923	+
	1924	+$(OBJDIR)/sha1hard_.c: $(SRCDIR)/sha1hard.c $(TRANSLATE)
	1925	+ $(TRANSLATE) $(SRCDIR)/sha1hard.c >$@
	1926	+
	1927	+$(OBJDIR)/sha1hard.o: $(OBJDIR)/sha1hard_.c $(OBJDIR)/sha1hard.h $(SRCDIR)/config.h
	1928	+ $(XTCC) -o $(OBJDIR)/sha1hard.o -c $(OBJDIR)/sha1hard_.c
	1929	+
	1930	+$(OBJDIR)/sha1hard.h: $(OBJDIR)/headers
	1931	+
	1932	+$(OBJDIR)/sha3_.c: $(SRCDIR)/sha3.c $(TRANSLATE)
	1933	+ $(TRANSLATE) $(SRCDIR)/sha3.c >$@
	1934	+
	1935	+$(OBJDIR)/sha3.o: $(OBJDIR)/sha3_.c $(OBJDIR)/sha3.h $(SRCDIR)/config.h
	1936	+ $(XTCC) -o $(OBJDIR)/sha3.o -c $(OBJDIR)/sha3_.c
	1937	+
	1938	+$(OBJDIR)/sha3.h: $(OBJDIR)/headers
1903	1939
1904	1940	$(OBJDIR)/shun_.c: $(SRCDIR)/shun.c $(TRANSLATE)
1905	1941	$(TRANSLATE) $(SRCDIR)/shun.c >$@
1906	1942
1907	1943	$(OBJDIR)/shun.o: $(OBJDIR)/shun_.c $(OBJDIR)/shun.h $(SRCDIR)/config.h
1908	1944
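Each module added to the MinGW makefile above (`hname`, `sha1hard`, `sha3`) is listed in the `$(SRCDIR)/*.c`, `$(OBJDIR)/*_.c`, `$(OBJDIR)/*.o`, and `_.c:.h` lists, and gets the same three rules: translate, compile, and header dependency. As a sketch of that template, the hypothetical `mingw_rules` helper below prints the three rules for a module name. In a real makefile the recipe lines must start with a tab, which the diff rendering above does not show.

```c
#include <stdio.h>

/* Write into zOut the three per-module rules used in
** win/Makefile.mingw.mistachkin for module zMod (e.g. "sha3"). */
int mingw_rules(char *zOut, size_t nOut, const char *zMod){
  return snprintf(zOut, nOut,
    "$(OBJDIR)/%s_.c: $(SRCDIR)/%s.c $(TRANSLATE)\n"
    "\t$(TRANSLATE) $(SRCDIR)/%s.c >$@\n"
    "\n"
    "$(OBJDIR)/%s.o: $(OBJDIR)/%s_.c $(OBJDIR)/%s.h $(SRCDIR)/config.h\n"
    "\t$(XTCC) -o $(OBJDIR)/%s.o -c $(OBJDIR)/%s_.c\n"
    "\n"
    "$(OBJDIR)/%s.h: $(OBJDIR)/headers\n",
    zMod, zMod, zMod, zMod, zMod, zMod, zMod, zMod, zMod);
}
```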

M www/changes.wiki

+11

		--- www/changes.wiki
		+++ www/changes.wiki
		@@ -1,6 +1,17 @@
1	1	<title>Change Log</title>
	2	+
	3	+<a name='v2_1'></a>
	4	+<h2>Changes for Version 2.1 (2017-03-??)</h2>
	5	+
	6	+ * Add support for [./hashpolicy.wiki|hash policies] that control which
	7	+ of the Hardened-SHA1 or SHA3-256 algorithms is used to name new
	8	+ artifacts.
	9	+ * Add the "gshow" and "gcat" subcommands to [/help?cmd=stash|fossil stash].
	10	+ * Add the [/help?cmd=/juvlist|/juvlist] web page and use it to construct
	11	+ the [/uv/download.html|Download Page] of the Fossil self-hosting website
	12	+ using Ajax.
2	13
3	14	<a name='v2_0'></a>
4	15	<h2>Changes for Version 2.0 (2017-03-03)</h2>
5	16
6	17	* Use the
7	18
8	19	ADDED www/hashpolicy.wiki
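The change log above notes that new artifacts may be named with either Hardened-SHA1 or SHA3-256. The two kinds of full artifact name differ in length: 40 hex digits for a SHA1 (160-bit) name, which Hardened-SHA1 shares, and 64 for a SHA3-256 name. A minimal classification sketch follows; the enum values and `classify_hash_name` are hypothetical and are not the constants or functions defined in `src/hname.c`.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical labels, not Fossil's own hname.c constants. */
enum { HASH_INVALID = 0, HASH_SHA1 = 1, HASH_SHA3_256 = 2 };

/* Classify a full artifact name by its length: 40 hex digits is a
** (Hardened-)SHA1 name, 64 hex digits is a SHA3-256 name. */
int classify_hash_name(const char *z){
  size_t n = strlen(z);
  for(size_t i=0; i<n; i++){
    if( !isxdigit((unsigned char)z[i]) ) return HASH_INVALID;
  }
  if( n==40 ) return HASH_SHA1;
  if( n==64 ) return HASH_SHA3_256;
  return HASH_INVALID;
}
```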

M www/hashpolicy.wiki

+20

		--- a/www/hashpolicy.wiki
		+++ b/www/hashpolicy.wiki
		@@ -0,0 +1,20 @@
	1	+<title>Hash Policy</title>
	2	+
	3	+<h2> Executive Summary, Orcutive Summary</h2>
	4	+
	5	+<b>Or: How To </h2>
	6	+
	7	+There i This Article</b>
	8	+
	9	+Thham now
	10	+upgraded to
	11	+change texpected to be
	12	+replaced ot expected to be
	13	+replaced until Ma
	14	+out o
	15	+Debian 9 is implement0 or later
	16	+
	17	+work and
	18	+Hash Policy</title>
	19	+
	20	+<h2>< Introduction ha", not generic SHA1sequel

M www/mkdownload.tcl

+2 -2

		--- www/mkdownload.tcl
		+++ www/mkdownload.tcl
		@@ -37,12 +37,12 @@
37	37	set avers($version) 1
38	38	}
39	39	}
40	40	close $in
41	41
	42	+set vdate(2.0) 2017-03-03
42	43	set vdate(1.37) 2017-01-15
43		-set vdate(1.36) 2016-10-24
44	44
45	45	# Do all versions from newest to oldest
46	46	#
47	47	foreach vers [lsort -decr -real [array names avers]] {
48	48	# set hr "../timeline?c=version-$vers;y=ci"
		@@ -57,11 +57,11 @@
57	57	puts $out "</b></center>"
58	58	puts $out "</td></tr>"
59	59	puts $out "<tr>"
60	60
61	61	foreach {prefix img desc} {
62		- fossil-linux-x86 linux.gif {Linux 3.x x86}
	62	+ fossil-linux linux.gif {Linux 3.x x64}
63	63	fossil-macosx mac.gif {Mac 10.x x86}
64	64	fossil-openbsd-x86 openbsd.gif {OpenBSD 5.x x86}
65	65	fossil-w32 win32.gif {Windows}
66	66	fossil-src src.gif {Source Tarball}
67	67	} {
68	68

M www/mkindex.tcl

		--- www/mkindex.tcl
		+++ www/mkindex.tcl
		@@ -36,10 +36,11 @@
36	36	fiveminutes.wiki {Update and Running in 5 Minutes as a Single User}
37	37	foss-cklist.wiki {Checklist For Successful Open-Source Projects}
38	38	fossil-from-msvc.wiki {Integrating Fossil in the Microsoft Express 2010 IDE}
39	39	fossil-v-git.wiki {Fossil Versus Git}
40	40	hacker-howto.wiki {Hacker How-To}
	41	+ hashpolicy.wiki {Hash Policy: Choosing Between SHA1 and SHA3-256}
41	42	/help {Lists of Commands and Webpages}
42	43	hints.wiki {Fossil Tips And Usage Hints}
43	44	index.wiki {Home Page}
44	45	inout.wiki {Import And Export To And From Git}
45	46	makefile.wiki {The Fossil Build Process}
46	47

M www/permutedindex.html

		--- www/permutedindex.html
		+++ www/permutedindex.html
		@@ -29,10 +29,11 @@
29	29	<li><a href="blame.wiki">Annotate/Blame Algorithm Of Fossil — The</a></li>
30	30	<li><a href="customskin.md">Appearance of Web Pages — Theming: Customizing The</a></li>
31	31	<li><a href="faq.wiki">Asked Questions — Frequently</a></li>
32	32	<li><a href="password.wiki">Authentication — Password Management And</a></li>
33	33	<li><a href="whyusefossil.wiki"><b>Benefits Of Version Control</b></a></li>
	34	+<li><a href="hashpolicy.wiki">Between SHA1 and SHA3-256 — Hash Policy: Choosing</a></li>
34	35	<li><a href="antibot.wiki">Bots — Defense against Spiders and</a></li>
35	36	<li><a href="private.wiki">Branches — Creating, Syncing, and Deleting Private</a></li>
36	37	<li><a href="branching.wiki"><b>Branching, Forking, Merging, and Tagging</b></a></li>
37	38	<li><a href="bugtheory.wiki"><b>Bug Tracking In Fossil</b></a></li>
38	39	<li><a href="makefile.wiki">Build Process — The Fossil</a></li>
		@@ -43,10 +44,11 @@
43	44	<li><a href="checkin.wiki">Checklist — Check-in</a></li>
44	45	<li><a href="../test/release-checklist.wiki">Checklist — Pre-Release Testing</a></li>
45	46	<li><a href="foss-cklist.wiki"><b>Checklist For Successful Open-Source Projects</b></a></li>
46	47	<li><a href="selfcheck.wiki">Checks — Fossil Repository Integrity Self</a></li>
47	48	<li><a href="childprojects.wiki"><b>Child Projects</b></a></li>
	49	+<li><a href="hashpolicy.wiki">Choosing Between SHA1 and SHA3-256 — Hash Policy:</a></li>
48	50	<li><a href="contribute.wiki">Code or Documentation To The Fossil Project — Contributing</a></li>
49	51	<li><a href="style.wiki">Code Style Guidelines — Source</a></li>
50	52	<li><a href="../../../help">Commands and Webpages — Lists of</a></li>
51	53	<li><a href="build.wiki"><b>Compiling and Installing Fossil</b></a></li>
52	54	<li><a href="concepts.wiki">Concepts — Fossil Core</a></li>
		@@ -111,10 +113,11 @@
111	113	<li><a href="customgraph.md">Graph — Theming: Customizing the Timeline</a></li>
112	114	<li><a href="quickstart.wiki">Guide — Fossil Quick Start</a></li>
113	115	<li><a href="style.wiki">Guidelines — Source Code Style</a></li>
114	116	<li><a href="hacker-howto.wiki"><b>Hacker How-To</b></a></li>
115	117	<li><a href="adding_code.wiki"><b>Hacking Fossil</b></a></li>
	118	+<li><a href="hashpolicy.wiki"><b>Hash Policy: Choosing Between SHA1 and SHA3-256</b></a></li>
116	119	<li><a href="hints.wiki">Hints — Fossil Tips And Usage</a></li>
117	120	<li><a href="index.wiki"><b>Home Page</b></a></li>
118	121	<li><a href="selfhost.wiki">Hosting Repositories — Fossil Self</a></li>
119	122	<li><a href="aboutcgi.wiki"><b>How CGI Works In Fossil</b></a></li>
120	123	<li><a href="server.wiki"><b>How To Configure A Fossil Server</b></a></li>
		@@ -147,10 +150,11 @@
147	150	<li><a href="index.wiki">Page — Home</a></li>
148	151	<li><a href="customskin.md">Pages — Theming: Customizing The Appearance of Web</a></li>
149	152	<li><a href="password.wiki"><b>Password Management And Authentication</b></a></li>
150	153	<li><a href="quotes.wiki">People Are Saying About Fossil, Git, and DVCSes in General — Quotes: What</a></li>
151	154	<li><a href="stats.wiki"><b>Performance Statistics</b></a></li>
	155	+<li><a href="hashpolicy.wiki">Policy: Choosing Between SHA1 and SHA3-256 — Hash</a></li>
152	156	<li><a href="../test/release-checklist.wiki"><b>Pre-Release Testing Checklist</b></a></li>
153	157	<li><a href="pop.wiki"><b>Principles Of Operation</b></a></li>
154	158	<li><a href="private.wiki">Private Branches — Creating, Syncing, and Deleting</a></li>
155	159	<li><a href="makefile.wiki">Process — The Fossil Build</a></li>
156	160	<li><a href="contribute.wiki">Project — Contributing Code or Documentation To The Fossil</a></li>
		@@ -174,10 +178,12 @@
174	178	<li><a href="th1.md">Scripting Language — The TH1</a></li>
175	179	<li><a href="selfcheck.wiki">Self Checks — Fossil Repository Integrity</a></li>
176	180	<li><a href="selfhost.wiki">Self Hosting Repositories — Fossil</a></li>
177	181	<li><a href="server.wiki">Server — How To Configure A Fossil</a></li>
178	182	<li><a href="settings.wiki">Settings — Fossil</a></li>
	183	+<li><a href="hashpolicy.wiki">SHA1 and SHA3-256 — Hash Policy: Choosing Between</a></li>
	184	+<li><a href="hashpolicy.wiki">SHA3-256 — Hash Policy: Choosing Between SHA1 and</a></li>
179	185	<li><a href="shunning.wiki"><b>Shunning: Deleting Content From Fossil</b></a></li>
180	186	<li><a href="fiveminutes.wiki">Single User — Update and Running in 5 Minutes as a</a></li>
181	187	<li><a href="../../../sitemap"><b>Site Map</b></a></li>
182	188	<li><a href="style.wiki"><b>Source Code Style Guidelines</b></a></li>
183	189	<li><a href="antibot.wiki">Spiders and Bots — Defense against</a></li>
184	190
