|
05f95db…
|
wyoung
|
1 |
# Image Format vs Fossil Repo Size |
|
05f95db…
|
wyoung
|
2 |
|
|
05f95db…
|
wyoung
|
3 |
## The Problem |
|
05f95db…
|
wyoung
|
4 |
|
|
05f95db…
|
wyoung
|
5 |
Fossil has a [delta compression][dc] feature which removes redundant |
|
389e3fb…
|
wyoung
|
6 |
information from a file relative to its parent on check-in.[^delta-prgs] |
|
f281b3d…
|
wyoung
|
7 |
That delta is then [zlib][zl]-compressed before being stored |
|
41e5237…
|
wyoung
|
8 |
in the Fossil repository database file. |
|
41e5237…
|
wyoung
|
9 |
|
|
41e5237…
|
wyoung
|
10 |
Storing pre-compressed data files in a Fossil repository defeats both of |
|
41e5237…
|
wyoung
|
11 |
these space-saving measures: |
|
41e5237…
|
wyoung
|
12 |
|
|
f281b3d…
|
wyoung
|
13 |
1. Binary data compression algorithms turn the file data into |
|
389e3fb…
|
wyoung
|
14 |
pseudorandom noise.[^prn] |
|
fd1282e…
|
drh
|
15 |
|
|
41e5237…
|
wyoung
|
16 |
Typical data compression algorithms are not [hash functions][hf], |
|
41e5237…
|
wyoung
|
17 |
where the goal is that a change to each bit in the input has a |
|
41e5237…
|
wyoung
|
18 |
statistically even chance of changing every bit in the output, but |
|
41e5237…
|
wyoung
|
19 |
because they do approach that pathological condition, pre-compressed |
|
41e5237…
|
wyoung
|
20 |
data tends to defeat Fossil’s delta compression algorithm, there |
|
41e5237…
|
wyoung
|
21 |
being so little correlation between two different outputs from the |
|
41e5237…
|
wyoung
|
22 |
binary data compression algorithm. |
|
05f95db…
|
wyoung
|
23 |
|
|
05f95db…
|
wyoung
|
24 |
2. An ideal lossless binary data compression algorithm cannot be |
|
05f95db…
|
wyoung
|
25 |
applied more than once to make the data even smaller, since random |
|
05f95db…
|
wyoung
|
26 |
noise is incompressible. The consequence for our purposes here is |
|
05f95db…
|
wyoung
|
27 |
that pre-compressed data doesn’t benefit from Fossil’s zlib |
|
05f95db…
|
wyoung
|
28 |
compression. |
|
05f95db…
|
wyoung
|
29 |
|
|
41e5237…
|
wyoung
|
30 |
You might then ask, what does it matter if the space savings comes from |
|
f281b3d…
|
wyoung
|
31 |
the application file format (e.g. JPEG, DOCX, Zip, etc.) or from Fossil |
|
41e5237…
|
wyoung
|
32 |
itself? It really doesn’t, as far as point 2 above goes, but point 1 |
|
41e5237…
|
wyoung
|
33 |
causes the Fossil repository to balloon out of proportion to the size of |
|
41e5237…
|
wyoung
|
34 |
the input data change on each checkin. This article will illustrate that |
|
41e5237…
|
wyoung
|
35 |
problem, quantify it, and give a solution to it. |
|
41e5237…
|
wyoung
|
36 |
|
|
05f95db…
|
wyoung
|
37 |
[dc]: ./delta_format.wiki |
|
05f95db…
|
wyoung
|
38 |
[hf]: https://en.wikipedia.org/wiki/Hash_function |
|
05f95db…
|
wyoung
|
39 |
[zl]: http://www.zlib.net/ |
|
05f95db…
|
wyoung
|
40 |
|
|
05f95db…
|
wyoung
|
41 |
|
|
7de2410…
|
wyoung
|
42 |
## <a id="formats"></a>Affected File Formats |
|
41e5237…
|
wyoung
|
43 |
|
|
41e5237…
|
wyoung
|
44 |
In this article’s core experiment, we use 2D image file formats, but |
|
41e5237…
|
wyoung
|
45 |
this article’s advice also applies to many other file types. For just a |
|
41e5237…
|
wyoung
|
46 |
few examples out of what must be thousands: |
|
41e5237…
|
wyoung
|
47 |
|
|
41e5237…
|
wyoung
|
48 |
* **Microsoft Office**: The [OOXML document format][oox] used from |
|
41e5237…
|
wyoung
|
49 |
Office 2003 onward (`.docx`, `.xlsx`, `.pptx`, etc.) are Zip files |
|
41e5237…
|
wyoung
|
50 |
containing an XML document file and several collateral files. |
|
41e5237…
|
wyoung
|
51 |
|
|
f281b3d…
|
wyoung
|
52 |
* **Libre Office**: For the purposes of this article, its |
|
f281b3d…
|
wyoung
|
53 |
[OpenDocument Format][odf] is designed the same basic way as OOXML. |
|
41e5237…
|
wyoung
|
54 |
|
|
41e5237…
|
wyoung
|
55 |
* **Java**: A Java [`.jar` file][jcl] is a Zip file containing JVM |
|
05f95db…
|
wyoung
|
56 |
`.class` files, manifest files, and more. |
|
05f95db…
|
wyoung
|
57 |
|
|
41e5237…
|
wyoung
|
58 |
* **Windows Installer:** An [`*.msi` file][wi] is a proprietary |
|
05f95db…
|
wyoung
|
59 |
database format that contains, among other things, [Microsoft |
|
05f95db…
|
wyoung
|
60 |
Cabinet][cab]-compressed files, which in turn may hold Windows |
|
05f95db…
|
wyoung
|
61 |
executables, which [may themselves be compressed][exc]. |
|
05f95db…
|
wyoung
|
62 |
|
|
41e5237…
|
wyoung
|
63 |
* **SVG, PDF, TIFF, etc.**: Many file formats are available in both |
|
41e5237…
|
wyoung
|
64 |
compressed and uncompressed forms. You should use the uncompressed |
|
41e5237…
|
wyoung
|
65 |
form with Fossil wherever practical, as we will show below. |
|
05f95db…
|
wyoung
|
66 |
|
|
05f95db…
|
wyoung
|
67 |
|
|
05f95db…
|
wyoung
|
68 |
[cab]: https://en.wikipedia.org/wiki/Cabinet_(file_format) |
|
05f95db…
|
wyoung
|
69 |
[exc]: https://en.wikipedia.org/wiki/Executable_compression |
|
05f95db…
|
wyoung
|
70 |
[jcl]: https://en.wikipedia.org/wiki/Java_(programming_language) |
|
05f95db…
|
wyoung
|
71 |
[odf]: https://en.wikipedia.org/wiki/OpenDocument |
|
05f95db…
|
wyoung
|
72 |
[oox]: https://en.wikipedia.org/wiki/Office_Open_XML |
|
05f95db…
|
wyoung
|
73 |
[wi]: https://en.wikipedia.org/wiki/Windows_Installer |
|
05f95db…
|
wyoung
|
74 |
|
|
05f95db…
|
wyoung
|
75 |
|
|
05f95db…
|
wyoung
|
76 |
|
|
7de2410…
|
wyoung
|
77 |
## <a id="demo"></a>Demonstration |
|
05f95db…
|
wyoung
|
78 |
|
|
f52d63e…
|
wyoung
|
79 |
The companion `image-format-vs-repo-size.ipynb` file ([download][nbd], |
|
f281b3d…
|
wyoung
|
80 |
[preview][nbp]) is a [JupyterLab][jl] notebook implementing the following |
|
f52d63e…
|
wyoung
|
81 |
experiment: |
|
05f95db…
|
wyoung
|
82 |
|
|
f281b3d…
|
wyoung
|
83 |
1. Create a new minimum-size Fossil repository. Save this initial size. |
|
05f95db…
|
wyoung
|
84 |
|
|
05f95db…
|
wyoung
|
85 |
2. Use [ImageMagick][im] via [Wand][wp] to generate a JPEG file of a |
|
05f95db…
|
wyoung
|
86 |
particular size — currently 256 px² — filled with Gaussian noise to |
|
41e5237…
|
wyoung
|
87 |
make data compression more difficult than with a solid-color image. |
|
05f95db…
|
wyoung
|
88 |
|
|
05f95db…
|
wyoung
|
89 |
3. Check that image into the new Fossil repo, and remember that size. |
|
05f95db…
|
wyoung
|
90 |
|
|
05f95db…
|
wyoung
|
91 |
4. Change a random pixel in the image to a random RGB value, save that |
|
05f95db…
|
wyoung
|
92 |
image, check it in, and remember the new Fossil repo size. |
|
05f95db…
|
wyoung
|
93 |
|
|
05f95db…
|
wyoung
|
94 |
5. Iterate on step 4 some number of times — currently 10 — and remember |
|
05f95db…
|
wyoung
|
95 |
the Fossil repo size at each step. |
|
05f95db…
|
wyoung
|
96 |
|
|
389e3fb…
|
wyoung
|
97 |
6. Repeat the above steps for BMP, PNG, and TIFF.[^tiff-cmp] |
|
05f95db…
|
wyoung
|
98 |
|
|
05f95db…
|
wyoung
|
99 |
7. Create a bar chart showing how the Fossil repository size changes |
|
05f95db…
|
wyoung
|
100 |
with each checkin. |
|
05f95db…
|
wyoung
|
101 |
|
|
f281b3d…
|
wyoung
|
102 |
We chose to use JupyterLab for this because it makes it easy for you to |
|
05f95db…
|
wyoung
|
103 |
modify the notebook to try different things. Want to see how the |
|
05f95db…
|
wyoung
|
104 |
results change with a different image size? Easy, change the `size` |
|
05f95db…
|
wyoung
|
105 |
value in the second cell of the notebook. Want to try more image |
|
05f95db…
|
wyoung
|
106 |
formats? You can put anything ImageMagick can recognize into the |
|
05f95db…
|
wyoung
|
107 |
`formats` list. Want to find the break-even point for images like those |
|
79c2cb0…
|
wyoung
|
108 |
in your own repository? Easily done with a small amount of code. |
|
f52d63e…
|
wyoung
|
109 |
|
|
f52d63e…
|
wyoung
|
110 |
[im]: https://www.imagemagick.org/ |
|
f281b3d…
|
wyoung
|
111 |
[jl]: https://jupyter.org/ |
|
f52d63e…
|
wyoung
|
112 |
[nbd]: ./image-format-vs-repo-size.ipynb |
|
f52d63e…
|
wyoung
|
113 |
[nbp]: https://nbviewer.jupyter.org/urls/fossil-scm.org/fossil/doc/trunk/www/image-format-vs-repo-size.ipynb |
|
f52d63e…
|
wyoung
|
114 |
[wp]: http://wand-py.org/ |
|
f52d63e…
|
wyoung
|
115 |
|
|
f52d63e…
|
wyoung
|
116 |
|
|
7de2410…
|
wyoung
|
117 |
## <a id="results"></a>Results |
|
f52d63e…
|
wyoung
|
118 |
|
|
389e3fb…
|
wyoung
|
119 |
Running the notebook gives a bar chart something like[^variance] this: |
|
05f95db…
|
wyoung
|
120 |
|
|
05f95db…
|
wyoung
|
121 |
 |
|
05f95db…
|
wyoung
|
122 |
|
|
41e5237…
|
wyoung
|
123 |
There are a few key things we want to draw your attention to in that |
|
41e5237…
|
wyoung
|
124 |
chart: |
|
05f95db…
|
wyoung
|
125 |
|
|
05f95db…
|
wyoung
|
126 |
* BMP and uncompressed TIFF are nearly identical in size for all |
|
f281b3d…
|
wyoung
|
127 |
checkins, and the repository growth rate is negligible past the |
|
389e3fb…
|
wyoung
|
128 |
first commit.[^size-jump] We owe this economy to Fossil’s delta compression |
|
f281b3d…
|
wyoung
|
129 |
feature: it is encoding each of those single-pixel changes in a very |
|
f281b3d…
|
wyoung
|
130 |
small amount of repository space. |
|
f281b3d…
|
wyoung
|
131 |
|
|
309af34…
|
andygoth
|
132 |
* The JPEG and PNG bars increase by large amounts on most checkins |
|
f281b3d…
|
wyoung
|
133 |
even though each checkin *also* encodes only a *single-pixel change*. |
|
41e5237…
|
wyoung
|
134 |
|
|
f281b3d…
|
wyoung
|
135 |
* The size of the first checkin in the BMP and TIFF cases is roughly |
|
f281b3d…
|
wyoung
|
136 |
the same as that for the PNG case, because both PNG and Fossil use |
|
f281b3d…
|
wyoung
|
137 |
the zlib binary data compression algorithm. This shows that for |
|
f281b3d…
|
wyoung
|
138 |
repos where the image files are committed only once, there is |
|
f281b3d…
|
wyoung
|
139 |
virtually no penalty to using BMP or TIFF over PNG. The file sizes |
|
f281b3d…
|
wyoung
|
140 |
likely differ only because of differences in zlib settings between |
|
f281b3d…
|
wyoung
|
141 |
the cases. |
|
05f95db…
|
wyoung
|
142 |
|
|
05f95db…
|
wyoung
|
143 |
* Because JPEG’s lossy nature allows it to start smaller and have |
|
e755561…
|
danield
|
144 |
smaller size increases than PNG, the crossover point with |
|
41e5237…
|
wyoung
|
145 |
BMP/TIFF isn’t until 7-9 checkins in typical runs of this [Monte |
|
41e5237…
|
wyoung
|
146 |
Carlo experiment][mce]. Given a choice among these four file |
|
41e5237…
|
wyoung
|
147 |
formats and a willingness to use lossy image compression, a rational |
|
41e5237…
|
wyoung
|
148 |
tradeoff is to choose JPEG for repositories where each image will |
|
41e5237…
|
wyoung
|
149 |
change fewer than that number of times. |
|
41e5237…
|
wyoung
|
150 |
|
|
f281b3d…
|
wyoung
|
151 |
|
|
05f95db…
|
wyoung
|
152 |
[mce]: https://en.wikipedia.org/wiki/Monte_Carlo_method |
|
05f95db…
|
wyoung
|
153 |
|
|
05f95db…
|
wyoung
|
154 |
|
|
7de2410…
|
wyoung
|
155 |
## <a id="makefile"></a>Automated Recompression |
|
05f95db…
|
wyoung
|
156 |
|
|
05f95db…
|
wyoung
|
157 |
Since programs that produce and consume binary-compressed data files |
|
05f95db…
|
wyoung
|
158 |
often make it either difficult or impossible to work with the |
|
05f95db…
|
wyoung
|
159 |
uncompressed form, we want an automated method for producing the |
|
05f95db…
|
wyoung
|
160 |
uncompressed form to make Fossil happy while still having the compressed |
|
05f95db…
|
wyoung
|
161 |
form to keep our content creation applications happy. This `Makefile` |
|
389e3fb…
|
wyoung
|
162 |
should[^makefile] do that for BMP, PNG, SVG, and XLSX files: |
|
389e3fb…
|
wyoung
|
163 |
|
|
8a1ba49…
|
wyoung
|
164 |
.SUFFIXES: .bmp .png .svg .svgz |
|
8a1ba49…
|
wyoung
|
165 |
|
|
8a1ba49…
|
wyoung
|
166 |
.svgz.svg: |
|
8a1ba49…
|
wyoung
|
167 |
gzip -dc < $< > $@ |
|
8a1ba49…
|
wyoung
|
168 |
|
|
8a1ba49…
|
wyoung
|
169 |
.svg.svgz: |
|
8a1ba49…
|
wyoung
|
170 |
gzip -9c < $< > $@ |
|
8a1ba49…
|
wyoung
|
171 |
|
|
8a1ba49…
|
wyoung
|
172 |
.bmp.png: |
|
8a1ba49…
|
wyoung
|
173 |
convert -quality 95 $< $@ |
|
8a1ba49…
|
wyoung
|
174 |
|
|
8a1ba49…
|
wyoung
|
175 |
.png.bmp: |
|
8a1ba49…
|
wyoung
|
176 |
convert $< $@ |
|
8a1ba49…
|
wyoung
|
177 |
|
|
8a1ba49…
|
wyoung
|
178 |
SS_FILES := $(wildcard spreadsheet/*) |
|
8a1ba49…
|
wyoung
|
179 |
|
|
8a1ba49…
|
wyoung
|
180 |
|
|
8a1ba49…
|
wyoung
|
181 |
all: $(SS_FILES) illus.svg image.bmp doc-big.pdf |
|
8a1ba49…
|
wyoung
|
182 |
|
|
8a1ba49…
|
wyoung
|
183 |
reconstitute: illus.svgz image.png |
|
8a1ba49…
|
wyoung
|
184 |
( cd spreadsheet ; zip -9 ../spreadsheet.xlsx) * ) |
|
8a1ba49…
|
wyoung
|
185 |
qpdf doc-big.pdf doc-small.pdf |
|
8a1ba49…
|
wyoung
|
186 |
|
|
8a1ba49…
|
wyoung
|
187 |
|
|
8a1ba49…
|
wyoung
|
188 |
$(SS_FILES): spreadsheet.xlsx |
|
8a1ba49…
|
wyoung
|
189 |
unzip $@ -d $< |
|
8a1ba49…
|
wyoung
|
190 |
|
|
8a1ba49…
|
wyoung
|
191 |
doc-big.pdf: doc-small.pdf |
|
8a1ba49…
|
wyoung
|
192 |
qpdf --stream-data=uncompress $@ $< |
|
05f95db…
|
wyoung
|
193 |
|
|
05f95db…
|
wyoung
|
194 |
This `Makefile` allows you to treat the compressed version as the |
|
05f95db…
|
wyoung
|
195 |
process input, but to actually check in only the changes against the |
|
41e5237…
|
wyoung
|
196 |
uncompressed version by typing “`make`” before “`fossil ci`”. This is |
|
41e5237…
|
wyoung
|
197 |
not actually an extra step in practice, since if you’ve got a |
|
389e3fb…
|
wyoung
|
198 |
`Makefile`-based project, you should be building — and testing — it |
|
41e5237…
|
wyoung
|
199 |
before checking each change in anyway! |
|
05f95db…
|
wyoung
|
200 |
|
|
41e5237…
|
wyoung
|
201 |
Because this technique is based on dependency rules, only the necessary |
|
41e5237…
|
wyoung
|
202 |
files are generated on each `make` command. |
|
05f95db…
|
wyoung
|
203 |
|
|
05f95db…
|
wyoung
|
204 |
You only have to run “`make reconstitute`” *once* after opening a fresh |
|
05f95db…
|
wyoung
|
205 |
Fossil checkout to produce those compressed sources. After that, you |
|
41e5237…
|
wyoung
|
206 |
work with the compressed files in your content creation programs. Your |
|
41e5237…
|
wyoung
|
207 |
build system might include some kind of bootstrapping or |
|
41e5237…
|
wyoung
|
208 |
auto-configuration step that you could attach this to, so that it |
|
41e5237…
|
wyoung
|
209 |
doesn’t need to be run by hand. |
|
05f95db…
|
wyoung
|
210 |
|
|
41e5237…
|
wyoung
|
211 |
This `Makefile` illustrates two primary strategies: |
|
05f95db…
|
wyoung
|
212 |
|
|
05f95db…
|
wyoung
|
213 |
|
|
79c2cb0…
|
wyoung
|
214 |
### Input and Output File Formats Differ by Extension |
|
05f95db…
|
wyoung
|
215 |
|
|
05f95db…
|
wyoung
|
216 |
In the case of SVG and the bitmap image formats, the file name extension |
|
05f95db…
|
wyoung
|
217 |
differs between the cases, so we can use `make` suffix rules to get the |
|
05f95db…
|
wyoung
|
218 |
behavior we want. The top half of the `Makefile` just tells `make` how |
|
05f95db…
|
wyoung
|
219 |
to map from `*.svg` to `*.svgz` and vice versa, and the same for `*.bmp` |
|
05f95db…
|
wyoung
|
220 |
to/from `*.png`. |
|
05f95db…
|
wyoung
|
221 |
|
|
05f95db…
|
wyoung
|
222 |
|
|
41e5237…
|
wyoung
|
223 |
### Input and Output Use the Same Extension |
|
05f95db…
|
wyoung
|
224 |
|
|
41e5237…
|
wyoung
|
225 |
We don’t have that luxury for Excel and PDF files, each for a different |
|
41e5237…
|
wyoung
|
226 |
reason: |
|
05f95db…
|
wyoung
|
227 |
|
|
05f95db…
|
wyoung
|
228 |
* **Excel:** Excel has no way to work with the unpacked Zip file |
|
05f95db…
|
wyoung
|
229 |
contents at all, so we have to unpack it into a subdirectory, which |
|
05f95db…
|
wyoung
|
230 |
is what we check into Fossil. On making a fresh Fossil checkout, we |
|
05f95db…
|
wyoung
|
231 |
have to pack that subdirectory’s contents back up into an `*.xlsx` |
|
05f95db…
|
wyoung
|
232 |
file with “`make reconstitute`” so we can edit it with Excel again. |
|
05f95db…
|
wyoung
|
233 |
|
|
05f95db…
|
wyoung
|
234 |
* **PDF:** All PDF readers can display an uncompressed PDF file, but |
|
05f95db…
|
wyoung
|
235 |
many PDF-*producing* programs have no option for uncompressed |
|
05f95db…
|
wyoung
|
236 |
output. Since the file name extension is the same either way, we |
|
05f95db…
|
wyoung
|
237 |
treat the compressed PDF as the source to the process, yielding an |
|
05f95db…
|
wyoung
|
238 |
automatically-uncompressed PDF for the benefit of Fossil. Unlike |
|
05f95db…
|
wyoung
|
239 |
with the Excel case, there is no simple “file base name to directory |
|
05f95db…
|
wyoung
|
240 |
name” mapping, so we just created the `-big` to `-small` name scheme |
|
05f95db…
|
wyoung
|
241 |
here. |
|
05f95db…
|
wyoung
|
242 |
|
|
f281b3d…
|
wyoung
|
243 |
|
|
389e3fb…
|
wyoung
|
244 |
[^delta-prgs]: |
|
389e3fb…
|
wyoung
|
245 |
This problem is not Fossil-specific. Several other programs also do |
|
f281b3d…
|
wyoung
|
246 |
delta compression, so they’ll also be affected by this problem: |
|
f281b3d…
|
wyoung
|
247 |
[rsync][rs], [Unison][us], [Git][git], etc. You should take this |
|
f281b3d…
|
wyoung
|
248 |
article’s advice when using all such programs, not just Fossil. |
|
f281b3d…
|
wyoung
|
249 |
When using file copying and synchronization programs *without* delta |
|
f281b3d…
|
wyoung
|
250 |
compression, on the other hand, it’s best to use the most |
|
f281b3d…
|
wyoung
|
251 |
highly-compressed file format you can tolerate, since they copy the |
|
f281b3d…
|
wyoung
|
252 |
whole file any time any bit of it changes. |
|
f281b3d…
|
wyoung
|
253 |
|
|
389e3fb…
|
wyoung
|
254 |
[^prn]: |
|
389e3fb…
|
wyoung
|
255 |
In fact, a good way to gauge the effectiveness of a given |
|
41e5237…
|
wyoung
|
256 |
compression scheme is to run its output through the same sort of |
|
41e5237…
|
wyoung
|
257 |
tests we use to gauge how “random” a given [PRNG][prng] is. Another |
|
41e5237…
|
wyoung
|
258 |
way to look at it is that if there is a discernible pattern in the |
|
f281b3d…
|
wyoung
|
259 |
output of a compression scheme, that constitutes *information* (in |
|
f281b3d…
|
wyoung
|
260 |
[the technical sense of that word][ith]) that could be further |
|
f281b3d…
|
wyoung
|
261 |
compressed. |
|
41e5237…
|
wyoung
|
262 |
|
|
389e3fb…
|
wyoung
|
263 |
[^tiff-cmp]: |
|
389e3fb…
|
wyoung
|
264 |
We're using *uncompressed* TIFF here, not [LZW][lzw]- or |
|
05f95db…
|
wyoung
|
265 |
Zip-compressed TIFF, either of which would give similar results to |
|
05f95db…
|
wyoung
|
266 |
PNG, which is always zlib-compressed. |
|
05f95db…
|
wyoung
|
267 |
|
|
389e3fb…
|
wyoung
|
268 |
[^variance]: |
|
389e3fb…
|
wyoung
|
269 |
The raw data changes somewhat from one run to the next due to the |
|
05f95db…
|
wyoung
|
270 |
use of random noise in the image to make the zlib/PNG compression |
|
05f95db…
|
wyoung
|
271 |
more difficult, and the random pixel changes. Those test design |
|
79c2cb0…
|
wyoung
|
272 |
choices make this a [Monte Carlo experiment][mce]. We’ve found that |
|
41e5237…
|
wyoung
|
273 |
the overall character of the results doesn’t change from one run to |
|
41e5237…
|
wyoung
|
274 |
the next. |
|
41e5237…
|
wyoung
|
275 |
|
|
389e3fb…
|
wyoung
|
276 |
[^size-jump]: |
|
389e3fb…
|
wyoung
|
277 |
It’s not clear to me why there is a one-time jump in size for BMP |
|
f281b3d…
|
wyoung
|
278 |
and TIFF past the first commit. I suspect it is due to the SQLite |
|
f281b3d…
|
wyoung
|
279 |
indices being initialized for the first time. |
|
f281b3d…
|
wyoung
|
280 |
Page size inflation might have something to do with it as well, |
|
f281b3d…
|
wyoung
|
281 |
though we tried to control that by rebuilding the initial DB with a |
|
f281b3d…
|
wyoung
|
282 |
minimal page size. If you re-run the program often enough, you will |
|
f281b3d…
|
wyoung
|
283 |
sometimes see the BMP or TIFF bar jump higher than the other, again |
|
f281b3d…
|
wyoung
|
284 |
likely due to one of the repos crossing a page boundary. |
|
f281b3d…
|
wyoung
|
285 |
Another curious artifact in the data is that the BMP is slightly |
|
f281b3d…
|
wyoung
|
286 |
larger than for the TIFF. This goes against expectation because a |
|
f281b3d…
|
wyoung
|
287 |
low-tech format like BMP should have a small edge in this test |
|
41e5237…
|
wyoung
|
288 |
because TIFF metadata includes the option for multiple timestamps, |
|
41e5237…
|
wyoung
|
289 |
UUIDs, etc., which bloat the checkin size by creating many small |
|
f281b3d…
|
wyoung
|
290 |
deltas. |
|
82ad591…
|
wyoung
|
291 |
|
|
389e3fb…
|
wyoung
|
292 |
[^makefile]: |
|
389e3fb…
|
wyoung
|
293 |
The `Makefile` above is not battle-tested. Please report bugs and |
|
82ad591…
|
wyoung
|
294 |
needed extensions [on the forum][for]. |
|
82ad591…
|
wyoung
|
295 |
|
|
41e5237…
|
wyoung
|
296 |
[for]: https://fossil-scm.org/forum/forumpost/15e677f2c8 |
|
41e5237…
|
wyoung
|
297 |
[git]: https://git-scm.com/ |
|
f281b3d…
|
wyoung
|
298 |
[ith]: https://en.wikipedia.org/wiki/Information_theory |
|
41e5237…
|
wyoung
|
299 |
[lzw]: https://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Welch |
|
41e5237…
|
wyoung
|
300 |
[prng]: https://en.wikipedia.org/wiki/Pseudorandom_number_generator |
|
41e5237…
|
wyoung
|
301 |
[rs]: https://rsync.samba.org/ |
|
41e5237…
|
wyoung
|
302 |
[us]: http://www.cis.upenn.edu/~bcpierce/unison/ |