<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
  "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>zlib Usage Example</title>
<!-- Copyright (c) 2004-2026 Mark Adler. -->
</head>
<body bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#00A000">
<h2 align="center"> zlib Usage Example </h2>
We often get questions about how the <tt>deflate()</tt> and <tt>inflate()</tt> functions should be used.
Users wonder when they should provide more input, when they should use more output,
what to do with a <tt>Z_BUF_ERROR</tt>, how to make sure the process terminates properly, and
so on. So for those who have read <tt>zlib.h</tt> (a few times), and
would like further edification, below is an annotated example in C of simple routines to compress and decompress
from an input file to an output file using <tt>deflate()</tt> and <tt>inflate()</tt> respectively. The
annotations are interspersed between lines of the code. So please read between the lines.
We hope this helps explain some of the intricacies of <em>zlib</em>.
<p>
Without further ado, here is the program <a href="zpipe.c"><tt>zpipe.c</tt></a>:
<pre><b>
/* zpipe.c: example of proper use of zlib's inflate() and deflate()
   Not copyrighted -- provided to the public domain
   Version 1.5  11 February 2026  Mark Adler */

/* Version history:
   1.0  30 Oct 2004  First version
   1.1   8 Nov 2004  Add void casting for unused return values
                     Use switch statement for inflate() return values
   1.2   9 Nov 2004  Add assertions to document zlib guarantees
   1.3   6 Apr 2005  Remove incorrect assertion in inf()
   1.4  11 Dec 2005  Add hack to avoid MSDOS end-of-line conversions
                     Avoid some compiler warnings for input and output buffers
   1.5  11 Feb 2026  Use underscores for Windows POSIX names
 */
</b></pre><!-- -->
We now include the header files for the required definitions. From
<tt>stdio.h</tt> we use <tt>fopen()</tt>, <tt>fread()</tt>, <tt>fwrite()</tt>,
<tt>feof()</tt>, <tt>ferror()</tt>, and <tt>fclose()</tt> for file i/o, and
<tt>fputs()</tt> for error messages. From <tt>string.h</tt> we use
<tt>strcmp()</tt> for command line argument processing.
From <tt>assert.h</tt> we use the <tt>assert()</tt> macro.
From <tt>zlib.h</tt>
we use the basic compression functions <tt>deflateInit()</tt>,
<tt>deflate()</tt>, and <tt>deflateEnd()</tt>, and the basic decompression
functions <tt>inflateInit()</tt>, <tt>inflate()</tt>, and
<tt>inflateEnd()</tt>.
<pre><b>
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;
#include &lt;assert.h&gt;
#include "zlib.h"
</b></pre><!-- -->
This is an ugly hack required to avoid corruption of the input and output data on
Windows/MS-DOS systems. Without this, those systems would assume that the input and output
files are text, and try to convert the end-of-line characters from one standard to
another. That would corrupt binary data, and in particular would render the compressed data unusable.
This sets the input and output to binary mode, which suppresses the end-of-line conversions.
<tt>SET_BINARY_MODE()</tt> will be used later on <tt>stdin</tt> and <tt>stdout</tt>, at the beginning of <tt>main()</tt>.
<pre><b>
#if defined(MSDOS) || defined(OS2) || defined(WIN32) || defined(__CYGWIN__)
#  include &lt;fcntl.h&gt;
#  include &lt;io.h&gt;
#  define SET_BINARY_MODE(file) _setmode(_fileno(file), _O_BINARY)
#else
#  define SET_BINARY_MODE(file)
#endif
</b></pre><!-- -->
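<p>
<tt>SET_BINARY_MODE()</tt> is needed for <tt>stdin</tt> and <tt>stdout</tt> because they are
already open when the program starts. When an application opens named files itself, standard C
already provides the remedy: a <tt>b</tt> in the <tt>fopen()</tt> mode string selects binary mode,
which suppresses end-of-line translation on Windows and makes no difference on POSIX systems.
Here is a minimal sketch of that approach; <tt>copy_binary()</tt> is a hypothetical helper, not
part of <tt>zpipe.c</tt>.

```c
#include <stdio.h>

/* Hypothetical helper: copy the file src to the file dst byte for byte.
   The "rb" and "wb" modes open both files in binary mode, so no
   end-of-line translation is applied on Windows. Returns 0 on success,
   -1 on any open, read, or write error. */
int copy_binary(const char *src, const char *dst)
{
    unsigned char buf[4096];
    size_t n;
    int err = 0;
    FILE *in = fopen(src, "rb");
    FILE *out;

    if (in == NULL)
        return -1;
    out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            err = 1;
            break;
        }
    }
    if (ferror(in))
        err = 1;
    fclose(in);
    if (fclose(out) != 0)           /* catches errors on the final flush */
        err = 1;
    return err ? -1 : 0;
}
```
<p>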
<tt>CHUNK</tt> is simply the buffer size for feeding data to and pulling data
from the <em>zlib</em> routines. Larger buffer sizes would be more efficient,
especially for <tt>inflate()</tt>. If the memory is available, buffer sizes
on the order of 128K or 256K bytes should be used.
<pre><b>
#define CHUNK 16384
</b></pre><!-- -->
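<p>
The <tt>in</tt> and <tt>out</tt> arrays that <tt>def()</tt> and <tt>inf()</tt> declare are
automatic (stack) variables, which is fine at 16K each. At 128K or 256K each, allocating them on
the heap is the safer choice, since large automatic arrays can exceed the stack limit of some
platforms and threads. Here is a minimal sketch, assuming heap allocation is acceptable to the
application; <tt>struct zbufs</tt>, <tt>zbufs_alloc()</tt>, and <tt>zbufs_free()</tt> are
hypothetical names.

```c
#include <stdlib.h>

/* Hypothetical pair of heap-allocated input and output buffers. */
struct zbufs {
    unsigned char *in;
    unsigned char *out;
    size_t size;                /* size of each buffer in bytes */
};

/* Allocate both buffers. Returns 0 on success, or -1 if either
   allocation fails, in which case nothing is left allocated. */
int zbufs_alloc(struct zbufs *b, size_t size)
{
    b->in = malloc(size);
    b->out = malloc(size);
    b->size = size;
    if (b->in == NULL || b->out == NULL) {
        free(b->in);            /* free(NULL) is a no-op */
        free(b->out);
        b->in = b->out = NULL;
        b->size = 0;
        return -1;
    }
    return 0;
}

/* Release both buffers and reset the structure. */
void zbufs_free(struct zbufs *b)
{
    free(b->in);
    free(b->out);
    b->in = b->out = NULL;
    b->size = 0;
}
```

With such buffers, <tt>def()</tt> could use <tt>b.in</tt>, <tt>b.out</tt>, and <tt>b.size</tt>
where it now uses <tt>in</tt>, <tt>out</tt>, and <tt>CHUNK</tt>, returning <tt>Z_MEM_ERROR</tt>
if the allocation fails. Because <tt>avail_in</tt> and <tt>avail_out</tt> have type
<tt>uInt</tt>, the buffer size must fit in an <tt>unsigned int</tt>.
<p>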
The <tt>def()</tt> routine compresses data from an input file to an output file. The output data
will be in the <em>zlib</em> format, which is different from the <em>gzip</em> or <em>zip</em>
formats. The <em>zlib</em> format has a very small header of only two bytes to identify it as
a <em>zlib</em> stream and to provide decoding information, and a four-byte trailer with a fast
check value to verify the integrity of the uncompressed data after decoding.
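<p>
The two header bytes are defined in RFC 1950. In the first byte (CMF), the low four bits give the
compression method, 8 for deflate, and the high four bits give the base-two logarithm of the
window size minus eight. In the second byte (FLG), bit 5 says whether a preset dictionary is
needed, bits 6 and 7 hint at the compression level, and the low five bits are chosen so that
CMF * 256 + FLG is a multiple of 31. The sketch below checks those rules.
<tt>zlib_header_ok()</tt> is a hypothetical name; <tt>inflate()</tt> already performs these
checks itself, so an application would only need something like this to sniff the format of
unknown data.

```c
/* Hypothetical helper: returns 1 if cmf and flg form a valid zlib header
   per RFC 1950, else 0. On success, *window receives the window size in
   bytes and *fdict is 1 if a preset dictionary is required. */
int zlib_header_ok(unsigned char cmf, unsigned char flg,
                   unsigned long *window, int *fdict)
{
    unsigned cm = cmf & 0x0f;       /* compression method */
    unsigned cinfo = cmf >> 4;      /* log2(window size) - 8 */

    if (cm != 8 || cinfo > 7)       /* deflate, window of at most 32K */
        return 0;
    if (((unsigned)cmf * 256 + flg) % 31 != 0)  /* FCHECK bits */
        return 0;
    *window = 1UL << (cinfo + 8);
    *fdict = (flg >> 5) & 1;        /* FDICT bit */
    return 1;
}
```

For example, <tt>deflateInit()</tt> at the default level writes the header bytes <tt>78 9c</tt>,
which pass, while <tt>1f 8b</tt>, the start of a <em>gzip</em> stream, fails because its method
field is 15 rather than 8.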
<pre><b>
/* Compress from file source to file dest until EOF on source.
   def() returns Z_OK on success, Z_MEM_ERROR if memory could not be
   allocated for processing, Z_STREAM_ERROR if an invalid compression
   level is supplied, Z_VERSION_ERROR if the version of zlib.h and the
   version of the library linked do not match, or Z_ERRNO if there is
   an error reading or writing the files. */
int def(FILE *source, FILE *dest, int level)
{
</b></pre>
Here are the local variables for <tt>def()</tt>. <tt>ret</tt> will be used for <em>zlib</em>
return codes. <tt>flush</tt> will keep track of the current flushing state for <tt>deflate()</tt>,
which is either no flushing, or flush to completion after the end of the input file is reached.
<tt>have</tt> is the amount of data returned from <tt>deflate()</tt>. The <tt>strm</tt> structure
is used to pass information to and from the <em>zlib</em> routines, and to maintain the
<tt>deflate()</tt> state. <tt>in</tt> and <tt>out</tt> are the input and output buffers for
<tt>deflate()</tt>.
<pre><b>
    int ret, flush;
    unsigned have;
    z_stream strm;
    unsigned char in[CHUNK];
    unsigned char out[CHUNK];
</b></pre><!-- -->
The first thing we do is to initialize the <em>zlib</em> state for compression using
<tt>deflateInit()</tt>. This must be done before the first use of <tt>deflate()</tt>.
The <tt>zalloc</tt>, <tt>zfree</tt>, and <tt>opaque</tt> fields in the <tt>strm</tt>
structure must be initialized before calling <tt>deflateInit()</tt>. Here they are
set to the <em>zlib</em> constant <tt>Z_NULL</tt> to request that <em>zlib</em> use
the default memory allocation routines. An application may also choose to provide
custom memory allocation routines here. <tt>deflateInit()</tt> will allocate on the
order of 256K bytes for the internal state.
(See <a href="zlib_tech.html"><em>zlib Technical Details</em></a>.)
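<p>
The 256K figure comes from the memory formulas given in <tt>zconf.h</tt> and in <em>zlib
Technical Details</em>: <tt>deflate()</tt> needs <tt>(1 &lt;&lt; (windowBits + 2)) + (1 &lt;&lt; (memLevel + 9))</tt>
bytes and <tt>inflate()</tt> needs <tt>1 &lt;&lt; windowBits</tt> bytes for its window, each plus a few
kilobytes for small objects. <tt>deflateInit()</tt> uses <tt>windowBits</tt> = 15 and
<tt>memLevel</tt> = 8, giving 128K + 128K. The functions below simply evaluate those formulas;
their names are hypothetical, and the results are estimates rather than exact allocation counts.

```c
/* Approximate memory used by deflate() and inflate(), excluding the few
   kilobytes of small objects that zlib also allocates.
   windowBits is 9..15 and memLevel is 1..9. */
unsigned long deflate_mem(int windowBits, int memLevel)
{
    return (1UL << (windowBits + 2)) + (1UL << (memLevel + 9));
}

unsigned long inflate_mem(int windowBits)
{
    return 1UL << windowBits;   /* the sliding window */
}
```

For instance, <tt>deflateInit2()</tt> with <tt>windowBits</tt> = 9 and <tt>memLevel</tt> = 1
would need about 3K plus small objects, at a considerable cost in compression.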
<p>
<tt>deflateInit()</tt> is called with a pointer to the structure to be initialized and
the compression level, which is an integer in the range of -1 to 9. Lower compression
levels result in faster execution, but less compression. Higher levels result in
greater compression, but slower execution. The <em>zlib</em> constant <tt>Z_DEFAULT_COMPRESSION</tt>,
equal to -1,
provides a good compromise between compression and speed and is equivalent to level 6.
Level 0 actually does no compression at all, and in fact expands the data slightly to produce
the <em>zlib</em> format (it is not a byte-for-byte copy of the input).
More advanced applications of <em>zlib</em>
may use <tt>deflateInit2()</tt> here instead. Such an application may want to reduce how
much memory will be used, at some price in compression. Or it may need to request a
<em>gzip</em> header and trailer instead of a <em>zlib</em> header and trailer, or raw
encoding with no header or trailer at all.
<p>
We must check the return value of <tt>deflateInit()</tt> against the <em>zlib</em> constant
<tt>Z_OK</tt> to make sure that it was able to
allocate memory for the internal state, and that the provided arguments were valid.
<tt>deflateInit()</tt> will also check that the version of <em>zlib</em> that the <tt>zlib.h</tt>
file came from matches the version of <em>zlib</em> actually linked with the program. This
is especially important for environments in which <em>zlib</em> is a shared library.
<p>
Note that an application can initialize multiple, independent <em>zlib</em> streams, which can
operate in parallel. The state information maintained in the structure allows the <em>zlib</em>
routines to be reentrant.
<pre><b>
    /* allocate deflate state */
    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;
    strm.opaque = Z_NULL;
    ret = deflateInit(&amp;strm, level);
    if (ret != Z_OK)
        return ret;
</b></pre><!-- -->
With the pleasantries out of the way, now we can get down to business. The outer <tt>do</tt>-loop
reads all of the input file and exits at the bottom of the loop once end-of-file is reached.
This loop contains the only call of <tt>deflate()</tt>. So we must make sure that all of the
input data has been processed and that all of the output data has been generated and consumed
before we fall out of the loop at the bottom.
<pre><b>
    /* compress until end of file */
    do {
</b></pre>
We start off by reading data from the input file. The number of bytes read is put directly
into <tt>avail_in</tt>, and a pointer to those bytes is put into <tt>next_in</tt>. We also
check to see if end-of-file on the input has been reached using <tt>feof()</tt>.
If we are at the end of file, then <tt>flush</tt> is set to the
<em>zlib</em> constant <tt>Z_FINISH</tt>, which is later passed to <tt>deflate()</tt> to
indicate that this is the last chunk of input data to compress.
If we are not yet at the end of the input, then the <em>zlib</em>
constant <tt>Z_NO_FLUSH</tt> will be passed to <tt>deflate()</tt> to indicate that we are still
in the middle of the uncompressed data. If the length of the input file happens to be an exact
multiple of <tt>CHUNK</tt>, the last read may return zero bytes along with end-of-file. That is
fine: <tt>deflate()</tt> called with no input and <tt>Z_FINISH</tt> simply completes the stream.
<p>
If there is an error in reading from the input file, the process is aborted with
<tt>deflateEnd()</tt> being called to free the allocated <em>zlib</em> state before returning
the error. We wouldn't want a memory leak, now would we? <tt>deflateEnd()</tt> can be called
at any time after the state has been initialized. Once that's done, <tt>deflateInit()</tt> (or
<tt>deflateInit2()</tt>) would have to be called to start a new compression process. There is
no point here in checking the <tt>deflateEnd()</tt> return code. The deallocation can't fail.
<pre><b>
        strm.avail_in = fread(in, 1, CHUNK, source);
        if (ferror(source)) {
            (void)deflateEnd(&amp;strm);
            return Z_ERRNO;
        }
        flush = feof(source) ? Z_FINISH : Z_NO_FLUSH;
        strm.next_in = in;
</b></pre><!-- -->
The inner <tt>do</tt>-loop passes our chunk of input data to <tt>deflate()</tt>, and then
keeps calling <tt>deflate()</tt> until it is done producing output. Once there is no more
new output, <tt>deflate()</tt> is guaranteed to have consumed all of the input, i.e.,
<tt>avail_in</tt> will be zero.
<pre><b>
        /* run deflate() on input until output buffer not full, finish
           compression if all of source has been read in */
        do {
</b></pre>
Output space is provided to <tt>deflate()</tt> by setting <tt>avail_out</tt> to the number
of available output bytes and <tt>next_out</tt> to a pointer to that space.
<pre><b>
            strm.avail_out = CHUNK;
            strm.next_out = out;
</b></pre>
Now we call the compression engine itself, <tt>deflate()</tt>. It takes as many of the
<tt>avail_in</tt> bytes at <tt>next_in</tt> as it can process, and writes up to
<tt>avail_out</tt> bytes to <tt>next_out</tt>. Those counters and pointers are then
updated past the input data consumed and the output data written. It is the amount of
output space available that may limit how much input is consumed.
Hence the inner loop to make sure that
all of the input is consumed by providing more output space each time. Since <tt>avail_in</tt>
and <tt>next_in</tt> are updated by <tt>deflate()</tt>, we don't have to mess with those
between <tt>deflate()</tt> calls until it's all used up.
<p>
The parameters to <tt>deflate()</tt> are a pointer to the <tt>strm</tt> structure containing
the input and output information and the internal compression engine state, and a parameter
indicating whether and how to flush data to the output. Normally <tt>deflate()</tt> will consume
several K bytes of input data before producing any output (except for the header), in order
to accumulate statistics on the data for optimum compression. It will then put out a burst of
compressed data, and proceed to consume more input before the next burst. Eventually,
<tt>deflate()</tt>
must be told to terminate the stream, complete the compression with provided input data, and
write out the trailer check value. <tt>deflate()</tt> will continue to compress normally as long
as the flush parameter is <tt>Z_NO_FLUSH</tt>. Once the <tt>Z_FINISH</tt> parameter is provided,
<tt>deflate()</tt> will begin to complete the compressed output stream. However, depending on how
much output space is provided, <tt>deflate()</tt> may have to be called several times until it
has provided the complete compressed stream, even after it has consumed all of the input. The flush
parameter must continue to be <tt>Z_FINISH</tt> for those subsequent calls.
<p>
There are other values of the flush parameter that are used in more advanced applications. You can
force <tt>deflate()</tt> to produce a burst of output that encodes all of the input data provided
so far, even if it wouldn't have otherwise, for example to control data latency on a link with
compressed data (<tt>Z_SYNC_FLUSH</tt>). You can also ask that <tt>deflate()</tt> do that as well
as erase any history up to that point so that what follows can be decompressed independently, for
example for random access applications (<tt>Z_FULL_FLUSH</tt>). Both requests will degrade
compression by an amount depending on how often such requests are made.
<p>
<tt>deflate()</tt> has a return value that can indicate errors, yet we do not check it here. Why
not? Well, it turns out that <tt>deflate()</tt> can do no wrong here. Let's go through
<tt>deflate()</tt>'s return values and dispense with them one by one. The possible values are
<tt>Z_OK</tt>, <tt>Z_STREAM_END</tt>, <tt>Z_STREAM_ERROR</tt>, or <tt>Z_BUF_ERROR</tt>. <tt>Z_OK</tt>
is, well, ok. <tt>Z_STREAM_END</tt> is also ok and will be returned for the last call of
<tt>deflate()</tt>. This is already guaranteed by calling <tt>deflate()</tt> with <tt>Z_FINISH</tt>
until it has no more output. <tt>Z_STREAM_ERROR</tt> is only possible if the stream is not
initialized properly, but we did initialize it properly. There is no harm in checking for
<tt>Z_STREAM_ERROR</tt> here, for example to check for the possibility that some
other part of the application inadvertently clobbered the memory containing the <em>zlib</em> state.
<tt>Z_BUF_ERROR</tt> will be explained further below, but
suffice it to say that this is simply an indication that <tt>deflate()</tt> could not consume
more input or produce more output. <tt>deflate()</tt> can be called again with more output space
or more available input, which it will be in this code.
<pre><b>
            ret = deflate(&amp;strm, flush);    /* no bad return value */
            assert(ret != Z_STREAM_ERROR);  /* state not clobbered */
</b></pre>
Now we compute how much output <tt>deflate()</tt> provided on the last call, which is the
difference between how much space was provided before the call, and how much output space
is still available after the call. Then that data, if any, is written to the output file.
We can then reuse the output buffer for the next call of <tt>deflate()</tt>. Again, if there
is a file i/o error, we call <tt>deflateEnd()</tt> before returning to avoid a memory leak.
<pre><b>
            have = CHUNK - strm.avail_out;
            if (fwrite(out, 1, have, dest) != have || ferror(dest)) {
                (void)deflateEnd(&amp;strm);
                return Z_ERRNO;
            }
</b></pre>
The inner <tt>do</tt>-loop is repeated until the last <tt>deflate()</tt> call fails to fill the
provided output buffer. Then we know that <tt>deflate()</tt> has done as much as it can with
the provided input, and that all of that input has been consumed. We can then fall out of this
loop and reuse the input buffer.
<p>
The way we tell that <tt>deflate()</tt> has no more output is by seeing that it did not fill
the output buffer, leaving <tt>avail_out</tt> greater than zero. However, suppose that
<tt>deflate()</tt> has no more output, but just so happened to exactly fill the output buffer!
<tt>avail_out</tt> is zero, and we can't tell that <tt>deflate()</tt> has done all it can.
As far as we know, <tt>deflate()</tt>
has more output for us. So we call it again. But now <tt>deflate()</tt> produces no output
at all, and <tt>avail_out</tt> remains unchanged as <tt>CHUNK</tt>. That <tt>deflate()</tt> call
wasn't able to do anything, neither consume input nor produce output, and so it returns
<tt>Z_BUF_ERROR</tt>. (See, I told you I'd cover this later.) However, this is not a problem at
all. Now we finally have the desired indication that <tt>deflate()</tt> is really done,
and so we drop out of the inner loop to provide more input to <tt>deflate()</tt>.
<p>
With <tt>flush</tt> set to <tt>Z_FINISH</tt>, this final set of <tt>deflate()</tt> calls will
complete the output stream. Once that is done, subsequent calls of <tt>deflate()</tt> would return
<tt>Z_STREAM_ERROR</tt> if the flush parameter is not <tt>Z_FINISH</tt>, and do no more processing
until the state is reinitialized.
<p>
Some applications of <em>zlib</em> have two loops that call <tt>deflate()</tt>
instead of the single inner loop we have here. The first loop would call <tt>deflate()</tt>
without flushing and feed all of the data to it. The second loop would call
<tt>deflate()</tt> with no more
data and the <tt>Z_FINISH</tt> parameter to complete the process. As you can see from this
example, that can be avoided by simply keeping track of the current flush state.
<pre><b>
        } while (strm.avail_out == 0);
        assert(strm.avail_in == 0);     /* all input will be used */
</b></pre><!-- -->
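<p>
The termination argument above can be checked without <em>zlib</em> at all. The sketch below
replaces <tt>deflate()</tt> with a hypothetical stand-in, <tt>mock_produce()</tt>, that just
copies out whatever output is still pending, and runs the same <tt>do</tt>-<tt>while</tt>
structure as the inner loop. When the pending output is an exact multiple of the buffer size,
the loop makes one extra call that produces nothing, which is the case where the real
<tt>deflate()</tt> returns <tt>Z_BUF_ERROR</tt>. Both function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for deflate(): emits up to *avail_out of the
   *pending output bytes and returns how many it produced. */
static size_t mock_produce(size_t *pending, size_t *avail_out)
{
    size_t n = *pending < *avail_out ? *pending : *avail_out;

    *pending -= n;
    *avail_out -= n;
    return n;
}

/* Drain `pending` output bytes through a `chunk`-byte buffer using the
   same loop condition as def(). Returns the number of producer calls;
   *empty_calls receives how many of them produced no output. */
int drain_calls(size_t pending, size_t chunk, int *empty_calls)
{
    size_t avail_out;
    int calls = 0;

    *empty_calls = 0;
    do {
        avail_out = chunk;                  /* strm.avail_out = CHUNK */
        if (mock_produce(&pending, &avail_out) == 0)
            (*empty_calls)++;               /* no progress: Z_BUF_ERROR case */
        calls++;
    } while (avail_out == 0);               /* same test as def() */
    return calls;
}
```

Draining exactly two buffers' worth takes three calls, the last producing nothing, while one
byte more than a buffer takes two calls and never hits the empty case.
<p>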
Now we check to see if we have already processed all of the input file. That information was
saved in the <tt>flush</tt> variable, so we see if that was set to <tt>Z_FINISH</tt>. If so,
then we're done and we fall out of the outer loop. We're guaranteed to get <tt>Z_STREAM_END</tt>
from the last <tt>deflate()</tt> call, since we ran it until the last chunk of input was
consumed and all of the output was generated.
<pre><b>
        /* done when last data in file processed */
    } while (flush != Z_FINISH);
    assert(ret == Z_STREAM_END);        /* stream will be complete */
</b></pre><!-- -->
The process is complete, but we still need to deallocate the state to avoid a memory leak
(or rather more like a memory hemorrhage if you didn't do this). Then
finally we can return with a happy return value.
<pre><b>
    /* clean up and return */
    (void)deflateEnd(&amp;strm);
    return Z_OK;
}
</b></pre><!-- -->
Now we do the same thing for decompression in the <tt>inf()</tt> routine. <tt>inf()</tt>
decompresses what is hopefully a valid <em>zlib</em> stream from the input file and writes the
uncompressed data to the output file. Much of the discussion above for <tt>def()</tt>
applies to <tt>inf()</tt> as well, so the discussion here will focus on the differences between
the two.
<pre><b>
/* Decompress from file source to file dest until stream ends or EOF.
   inf() returns Z_OK on success, Z_MEM_ERROR if memory could not be
   allocated for processing, Z_DATA_ERROR if the deflate data is
   invalid or incomplete, Z_VERSION_ERROR if the version of zlib.h and
   the version of the library linked do not match, or Z_ERRNO if there
   is an error reading or writing the files. */
int inf(FILE *source, FILE *dest)
{
</b></pre>
The local variables have the same functionality as they do for <tt>def()</tt>. The
only difference is that there is no <tt>flush</tt> variable, since <tt>inflate()</tt>
can tell from the <em>zlib</em> stream itself when the stream is complete.
<pre><b>
    int ret;
    unsigned have;
    z_stream strm;
    unsigned char in[CHUNK];
    unsigned char out[CHUNK];
</b></pre><!-- -->
The initialization of the state is the same, except that there is no compression level,
of course, and two more elements of the structure are initialized. <tt>avail_in</tt>
and <tt>next_in</tt> must be initialized before calling <tt>inflateInit()</tt>. This
is because the application has the option to provide the start of the <em>zlib</em> stream in
order for <tt>inflateInit()</tt> to have access to information about the compression
method to aid in memory allocation. In the current implementation of <em>zlib</em>
(up through versions 1.2.x), the method-dependent memory allocations are deferred to the first call of
<tt>inflate()</tt> anyway. However, those fields must be initialized since later versions
of <em>zlib</em> that provide more compression methods may take advantage of this interface.
In any case, no decompression is performed by <tt>inflateInit()</tt>, so the
<tt>avail_out</tt> and <tt>next_out</tt> fields do not need to be initialized before calling.
<p>
Here <tt>avail_in</tt> is set to zero and <tt>next_in</tt> is set to <tt>Z_NULL</tt> to
indicate that no input data is being provided.
<pre><b>
    /* allocate inflate state */
    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;
    strm.opaque = Z_NULL;
    strm.avail_in = 0;
    strm.next_in = Z_NULL;
    ret = inflateInit(&amp;strm);
    if (ret != Z_OK)
        return ret;
</b></pre><!-- -->
The outer <tt>do</tt>-loop decompresses input until <tt>inflate()</tt> indicates
that it has reached the end of the compressed data and has produced all of the uncompressed
output. This is in contrast to <tt>def()</tt> which processes all of the input file.
If end-of-file is reached before the compressed data self-terminates, then the compressed
data is incomplete and an error is returned.
<pre><b>
    /* decompress until deflate stream ends or end of file */
    do {
</b></pre>
We read input data and set the <tt>strm</tt> structure accordingly. If we've reached the
end of the input file, then we leave the outer loop and report an error, since the
compressed data is incomplete. Note that we may read more data than is eventually consumed
by <tt>inflate()</tt>, if the input file continues past the <em>zlib</em> stream.
For applications where <em>zlib</em> streams are embedded in other data, this routine would
need to be modified to return the unused data, or at least indicate how much of the input
data was not used, so the application would know where to pick up after the <em>zlib</em> stream.
<pre><b>
        strm.avail_in = fread(in, 1, CHUNK, source);
        if (ferror(source)) {
            (void)inflateEnd(&amp;strm);
            return Z_ERRNO;
        }
        if (strm.avail_in == 0)
            break;
        strm.next_in = in;
</b></pre><!-- -->
The inner <tt>do</tt>-loop has the same function it did in <tt>def()</tt>, which is to
keep calling <tt>inflate()</tt> until it has generated all of the output it can with the
provided input.
<pre><b>
        /* run inflate() on input until output buffer not full */
        do {
</b></pre>
Just like in <tt>def()</tt>, the same output space is provided for each call of <tt>inflate()</tt>.
<pre><b>
            strm.avail_out = CHUNK;
            strm.next_out = out;
</b></pre>
Now we run the decompression engine itself. There is no need to adjust the flush parameter, since
the <em>zlib</em> format is self-terminating. The main difference here is that there are
return values that we need to pay attention to. <tt>Z_DATA_ERROR</tt>
indicates that <tt>inflate()</tt> detected an error in the <em>zlib</em> compressed data format,
which means that either the data is not a <em>zlib</em> stream to begin with, or that the data was
corrupted somewhere along the way since it was compressed. The other error to be processed is
<tt>Z_MEM_ERROR</tt>, which can occur since memory allocation is deferred until <tt>inflate()</tt>
needs it, unlike <tt>deflate()</tt>, whose memory is allocated at the start by <tt>deflateInit()</tt>.
<p>
Advanced applications may use
<tt>deflateSetDictionary()</tt> to prime <tt>deflate()</tt> with a set of likely data to improve the
first 32K or so of compression. This is noted in the <em>zlib</em> header, so <tt>inflate()</tt>
requests that the dictionary be provided before it can start to decompress. Without the dictionary,
correct decompression is not possible. For this routine, we have no idea what the dictionary is,
so the <tt>Z_NEED_DICT</tt> indication is converted to a <tt>Z_DATA_ERROR</tt>.
<p>
<tt>inflate()</tt> can also return <tt>Z_STREAM_ERROR</tt>, which should not be possible here,
but could be checked for as noted above for <tt>def()</tt>. <tt>Z_BUF_ERROR</tt> does not need to be
checked for here, for the same reasons noted for <tt>def()</tt>. <tt>Z_STREAM_END</tt> will be
checked for later.
<pre><b>
            ret = inflate(&amp;strm, Z_NO_FLUSH);
            assert(ret != Z_STREAM_ERROR);  /* state not clobbered */
            switch (ret) {
            case Z_NEED_DICT:
                ret = Z_DATA_ERROR;     /* and fall through */
            case Z_DATA_ERROR:
            case Z_MEM_ERROR:
                (void)inflateEnd(&amp;strm);
                return ret;
            }
</b></pre>
7ef7284… drh 431 The output of <tt>inflate()</tt> is handled identically to that of <tt>deflate()</tt>.
7ef7284… drh 432 <pre><b>
7ef7284… drh 433 have = CHUNK - strm.avail_out;
7ef7284… drh 434 if (fwrite(out, 1, have, dest) != have || ferror(dest)) {
7ef7284… drh 435 (void)inflateEnd(&amp;strm);
7ef7284… drh 436 return Z_ERRNO;
7ef7284… drh 437 }
7ef7284… drh 438 </b></pre>
The inner <tt>do</tt>-loop ends when <tt>inflate()</tt> has no more output, as indicated
by its not filling the output buffer, just as for <tt>deflate()</tt>. In this case, we cannot
assert that <tt>strm.avail_in</tt> will be zero, since the deflate stream may end before the file
does.
<pre><b>
        } while (strm.avail_out == 0);
</b></pre><!-- -->
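When <tt>inflate()</tt> returns <tt>Z_STREAM_END</tt>, <tt>strm.avail_in</tt> is the number of input bytes
it did not consume, that is, the bytes in the current chunk that follow the end of the <em>zlib</em>
stream. <tt>zpipe</tt> ignores any such trailing data. An application that wants to notice it could
check that count, as in this minimal sketch. It is not part of <tt>zpipe.c</tt>. The names
<tt>trailing_bytes()</tt> and <tt>trailing_demo()</tt> are hypothetical, and the input is assumed to be
in memory rather than in a file.
<pre>
/* Sketch only, not part of zpipe.c: count bytes after the zlib stream. */
#include &lt;string.h&gt;
#include &lt;zlib.h&gt;

/* Hypothetical helper: decompress the zlib stream at the start of buf,
   discarding its output, and return how many bytes of buf follow the end
   of that stream, or -1 if buf does not hold a complete, valid stream. */
long trailing_bytes(const unsigned char *buf, uInt len)
{
    unsigned char out[256];             /* scratch output, thrown away */
    z_stream strm;
    long left;
    int ret;

    memset(&amp;strm, 0, sizeof(strm));
    if (inflateInit(&amp;strm) != Z_OK)
        return -1;
    strm.next_in = (Bytef *)buf;
    strm.avail_in = len;
    do {
        strm.next_out = out;
        strm.avail_out = sizeof(out);
        ret = inflate(&amp;strm, Z_NO_FLUSH);
    } while (ret == Z_OK);              /* stop at end, error, or no progress */
    left = ret == Z_STREAM_END ? (long)strm.avail_in : -1;
    (void)inflateEnd(&amp;strm);
    return left;
}

/* Hypothetical demo: compress text, append junk after the stream, and
   report how many trailing bytes trailing_bytes() finds. */
long trailing_demo(const char *text, const char *junk)
{
    unsigned char buf[256];
    uLongf clen = sizeof(buf);
    size_t jlen = strlen(junk);

    if (compress2(buf, &amp;clen, (const Bytef *)text, (uLong)strlen(text),
                  Z_DEFAULT_COMPRESSION) != Z_OK || clen + jlen &gt; sizeof(buf))
        return -1;
    memcpy(buf + clen, junk, jlen);
    return trailing_bytes(buf, (uInt)(clen + jlen));
}
</pre>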
The outer <tt>do</tt>-loop ends when <tt>inflate()</tt> reports that it has reached the
end of the input <em>zlib</em> stream, has completed the decompression and integrity
check, and has provided all of the output. This is indicated by the <tt>inflate()</tt>
return value <tt>Z_STREAM_END</tt>. The inner loop is guaranteed to leave <tt>ret</tt>
equal to <tt>Z_STREAM_END</tt> if the last chunk read from the input file contained the end
of the <em>zlib</em> stream. So if the return value is not <tt>Z_STREAM_END</tt>, the
loop continues to read more input.
<pre><b>
        /* done when inflate() says it's done */
    } while (ret != Z_STREAM_END);
</b></pre><!-- -->
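The termination test depends only on what <tt>inflate()</tt> reports, not on how the input happens
to be divided into chunks. One way to convince yourself of this is an in-memory analogue of
<tt>inf()</tt>, sketched below under assumptions of our own. The names <tt>inf_mem()</tt> and
<tt>slice_demo()</tt> are hypothetical, and a memory buffer stands in for the input file. The
<tt>slice</tt> parameter plays the role of <tt>CHUNK</tt> on input, and a deliberately tiny 16-byte
buffer is used for output so that the inner loop runs many times. Whatever the slice size, the
same data should come back.
<pre>
/* Sketch only, not part of zpipe.c: inf()'s two loops over memory. */
#include &lt;string.h&gt;
#include &lt;zlib.h&gt;

/* Hypothetical analogue of inf(): src stands in for the input file and is
   handed to inflate() at most slice bytes at a time. Returns the number of
   bytes written to dst, or -1 on error or if the stream is incomplete. */
long inf_mem(const unsigned char *src, size_t srclen, size_t slice,
             unsigned char *dst, size_t dstlen)
{
    unsigned char out[16];              /* tiny output chunk on purpose */
    size_t pos = 0, total = 0, n, have;
    z_stream strm;
    int ret = Z_OK;

    memset(&amp;strm, 0, sizeof(strm));
    if (inflateInit(&amp;strm) != Z_OK)
        return -1;
    do {                                /* outer loop: "read" more input */
        n = srclen - pos &lt; slice ? srclen - pos : slice;
        if (n == 0)
            break;                      /* like fread() at end of file */
        strm.next_in = (Bytef *)src + pos;
        strm.avail_in = (uInt)n;
        pos += n;
        do {                            /* inner loop: drain the output */
            strm.avail_out = sizeof(out);
            strm.next_out = out;
            ret = inflate(&amp;strm, Z_NO_FLUSH);
            have = sizeof(out) - strm.avail_out;
            if (ret == Z_NEED_DICT || ret == Z_DATA_ERROR ||
                ret == Z_MEM_ERROR || ret == Z_STREAM_ERROR ||
                have &gt; dstlen - total) {
                (void)inflateEnd(&amp;strm);
                return -1;
            }
            memcpy(dst + total, out, have);
            total += have;
        } while (strm.avail_out == 0);
    } while (ret != Z_STREAM_END);      /* done when inflate() says so */

    (void)inflateEnd(&amp;strm);
    return ret == Z_STREAM_END ? (long)total : -1;
}

/* Hypothetical demo: compress 1000 bytes of repeated text, decompress them
   with inf_mem() using the given slice size, and return 1 on a match. */
int slice_demo(size_t slice)
{
    const char *pattern = "zpipe ";
    unsigned char text[1000], comp[1100], back[1000];
    uLongf clen = sizeof(comp);
    size_t i;

    for (i = 0; i &lt; sizeof(text); i++)
        text[i] = (unsigned char)pattern[i % 6];
    if (compress2(comp, &amp;clen, text, sizeof(text),
                  Z_DEFAULT_COMPRESSION) != Z_OK)
        return 0;
    return inf_mem(comp, clen, slice, back, sizeof(back)) == (long)sizeof(text) &amp;&amp;
           memcmp(back, text, sizeof(text)) == 0;
}
</pre>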
At this point, either decompression completed successfully, or we broke out of the loop because
no more data was available from the input file. If the last <tt>inflate()</tt> return value
is not <tt>Z_STREAM_END</tt>, then the <em>zlib</em> stream was incomplete and a data error
is returned. Otherwise, we return with a happy return value, <tt>Z_OK</tt>. Of course,
<tt>inflateEnd()</tt> is called first to avoid a memory leak.
<pre><b>
    /* clean up and return */
    (void)inflateEnd(&amp;strm);
    return ret == Z_STREAM_END ? Z_OK : Z_DATA_ERROR;
}
</b></pre><!-- -->
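A truncated stream is not reported as an error by <tt>inflate()</tt> itself. It returns <tt>Z_OK</tt>
while it is still making progress, or <tt>Z_BUF_ERROR</tt> once it cannot, and only the absence of
<tt>Z_STREAM_END</tt> reveals the problem. That is why the final test in <tt>inf()</tt> is needed. The
sketch below, which is not part of <tt>zpipe.c</tt>, shows this by chopping the four-byte Adler-32
check value off the end of a stream. The names <tt>inflate_verdict()</tt> and
<tt>verdict_after_cut()</tt> are hypothetical, and the whole stream is assumed to fit in memory.
<pre>
/* Sketch only, not part of zpipe.c: why inf() checks for Z_STREAM_END. */
#include &lt;stddef.h&gt;
#include &lt;string.h&gt;
#include &lt;zlib.h&gt;

/* Hypothetical helper: decompress all of src in one inflate() call (the
   output is assumed to fit in 512 bytes), store that call's return value
   in *raw unless raw is NULL, and then apply inf()'s final test. */
int inflate_verdict(const unsigned char *src, uInt len, int *raw)
{
    unsigned char out[512];
    z_stream strm;
    int ret;

    memset(&amp;strm, 0, sizeof(strm));
    ret = inflateInit(&amp;strm);
    if (ret != Z_OK)
        return ret;
    strm.next_in = (Bytef *)src;
    strm.avail_in = len;
    strm.next_out = out;
    strm.avail_out = sizeof(out);
    ret = inflate(&amp;strm, Z_NO_FLUSH);
    if (raw != NULL)
        *raw = ret;
    (void)inflateEnd(&amp;strm);
    return ret == Z_STREAM_END ? Z_OK : Z_DATA_ERROR;
}

/* Hypothetical demo: compress a short text, drop the last cut bytes of the
   resulting zlib stream, and return inflate_verdict()'s answer. */
int verdict_after_cut(uLong cut, int *raw)
{
    const char *text = "a zlib stream ends with an Adler-32 check value";
    unsigned char comp[128];
    uLongf clen = sizeof(comp);
    int ret;

    ret = compress2(comp, &amp;clen, (const Bytef *)text, (uLong)strlen(text),
                    Z_DEFAULT_COMPRESSION);
    if (ret != Z_OK)
        return ret;
    if (cut &gt; clen)
        return Z_DATA_ERROR;
    return inflate_verdict(comp, (uInt)(clen - cut), raw);
}
</pre>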
That ends the routines that directly use <em>zlib</em>. The following routines make this
a command-line program by running data through the above routines from <tt>stdin</tt> to
<tt>stdout</tt>, and handling any errors reported by <tt>def()</tt> or <tt>inf()</tt>.
<p>
<tt>zerr()</tt> is used to interpret the possible error codes from <tt>def()</tt>
and <tt>inf()</tt>, as detailed in their comments above, and to print an error message.
Note that these are only a subset of the possible return values from <tt>deflate()</tt>
and <tt>inflate()</tt>.
<pre><b>
/* report a zlib or i/o error */
void zerr(int ret)
{
    fputs("zpipe: ", stderr);
    switch (ret) {
    case Z_ERRNO:
        if (ferror(stdin))
            fputs("error reading stdin\n", stderr);
        if (ferror(stdout))
            fputs("error writing stdout\n", stderr);
        break;
    case Z_STREAM_ERROR:
        fputs("invalid compression level\n", stderr);
        break;
    case Z_DATA_ERROR:
        fputs("invalid or incomplete deflate data\n", stderr);
        break;
    case Z_MEM_ERROR:
        fputs("out of memory\n", stderr);
        break;
    case Z_VERSION_ERROR:
        fputs("zlib version mismatch!\n", stderr);
    }
}
</b></pre><!-- -->
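<tt>zlib.h</tt> also declares <tt>zError()</tt>, listed there among the undocumented functions. It
returns a short generic description of a <em>zlib</em> return code, such as "data error" for
<tt>Z_DATA_ERROR</tt>. <tt>zerr()</tt> writes its own messages because it knows the context, for
example that a <tt>Z_STREAM_ERROR</tt> from <tt>def()</tt> means an invalid compression level. A more
generic reporter could fall back on <tt>zError()</tt>, as in this sketch. The names <tt>zmsg()</tt>
and <tt>zerr_generic()</tt> are hypothetical and not part of <tt>zpipe.c</tt>.
<pre>
/* Sketch only, not part of zpipe.c: a reporter built on zError(). */
#include &lt;stdio.h&gt;
#include &lt;zlib.h&gt;

/* Hypothetical helper: text for any zlib return code. zError() alone only
   says "file error" for Z_ERRNO, so first check which stream failed. */
const char *zmsg(int ret)
{
    if (ret == Z_ERRNO) {
        if (ferror(stdin))
            return "error reading stdin";
        if (ferror(stdout))
            return "error writing stdout";
    }
    return zError(ret);
}

/* Hypothetical drop-in alternative to zerr() */
void zerr_generic(int ret)
{
    fprintf(stderr, "zpipe: %s\n", zmsg(ret));
}
</pre>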
Here is the <tt>main()</tt> routine used to test <tt>def()</tt> and <tt>inf()</tt>. If no arguments
are given, the <tt>zpipe</tt> command is simply a compression pipe from <tt>stdin</tt> to
<tt>stdout</tt>. If <tt>zpipe -d</tt> is used, it is a decompression pipe. If any other
arguments are provided, no compression or decompression is performed. Instead, a usage
message is displayed. Examples are <tt>zpipe &lt; foo.txt &gt; foo.txt.z</tt> to compress, and
<tt>zpipe -d &lt; foo.txt.z &gt; foo.txt</tt> to decompress.
<pre><b>
/* compress or decompress from stdin to stdout */
int main(int argc, char **argv)
{
    int ret;

    /* avoid end-of-line conversions */
    SET_BINARY_MODE(stdin);
    SET_BINARY_MODE(stdout);

    /* do compression if no arguments */
    if (argc == 1) {
        ret = def(stdin, stdout, Z_DEFAULT_COMPRESSION);
        if (ret != Z_OK)
            zerr(ret);
        return ret;
    }

    /* do decompression if -d specified */
    else if (argc == 2 &amp;&amp; strcmp(argv[1], "-d") == 0) {
        ret = inf(stdin, stdout);
        if (ret != Z_OK)
            zerr(ret);
        return ret;
    }

    /* otherwise, report usage */
    else {
        fputs("zpipe usage: zpipe [-d] &lt; source &gt; dest\n", stderr);
        return 1;
    }
}
</b></pre>
<hr>
<i>Last modified 12 February 2026<br>
Copyright &#169; 2004-2026 Mark Adler</i><br>
<a rel="license" href="https://creativecommons.org/licenses/by-nd/4.0/">
<img alt="Creative Commons License" style="border-width:0"
src="https://i.creativecommons.org/l/by-nd/4.0/88x31.png"></a>
<a rel="license" href="https://creativecommons.org/licenses/by-nd/4.0/">
Creative Commons Attribution-NoDerivatives 4.0 International License</a>.
</body>
</html>