Fossil SCM
Converted the hand-crafted footnotes in the "Image Format vs Fossil Repo Size" doc to use the new Markdown affordance.
Commit
389e3fb976ea4a23d04b847c9c2bea77e73312be27f17ab564ac961783d9593e
Parent
5c8f5575657c0e3…
1 file changed
+19
-20
+19
-20
| --- www/image-format-vs-repo-size.md | ||
| +++ www/image-format-vs-repo-size.md | ||
| @@ -1,19 +1,19 @@ | ||
| 1 | 1 | # Image Format vs Fossil Repo Size |
| 2 | 2 | |
| 3 | 3 | ## The Problem |
| 4 | 4 | |
| 5 | 5 | Fossil has a [delta compression][dc] feature which removes redundant |
| 6 | -information from a file relative to its parent on check-in.¹ | |
| 6 | +information from a file relative to its parent on check-in.[^delta-prgs] | |
| 7 | 7 | That delta is then [zlib][zl]-compressed before being stored |
| 8 | 8 | in the Fossil repository database file. |
| 9 | 9 | |
| 10 | 10 | Storing pre-compressed data files in a Fossil repository defeats both of |
| 11 | 11 | these space-saving measures: |
| 12 | 12 | |
| 13 | 13 | 1. Binary data compression algorithms turn the file data into |
| 14 | - [pseudorandom noise][prn].² | |
| 14 | + pseudorandom noise.[^prn] | |
| 15 | 15 | |
| 16 | 16 | Typical data compression algorithms are not [hash functions][hf], |
| 17 | 17 | where the goal is that a change to each bit in the input has a |
| 18 | 18 | statistically even chance of changing every bit in the output, but |
| 19 | 19 | because they do approach that pathological condition, pre-compressed |
| @@ -34,11 +34,10 @@ | ||
| 34 | 34 | the input data change on each checkin. This article will illustrate that |
| 35 | 35 | problem, quantify it, and give a solution to it. |
| 36 | 36 | |
| 37 | 37 | [dc]: ./delta_format.wiki |
| 38 | 38 | [hf]: https://en.wikipedia.org/wiki/Hash_function |
| 39 | -[prn]: https://en.wikipedia.org/wiki/Pseudorandomness | |
| 40 | 39 | [zl]: http://www.zlib.net/ |
| 41 | 40 | |
| 42 | 41 | |
| 43 | 42 | ## <a id="formats"></a>Affected File Formats |
| 44 | 43 | |
| @@ -93,11 +92,11 @@ | ||
| 93 | 92 | image, check it in, and remember the new Fossil repo size. |
| 94 | 93 | |
| 95 | 94 | 5. Iterate on step 4 some number of times — currently 10 — and remember |
| 96 | 95 | the Fossil repo size at each step. |
| 97 | 96 | |
| 98 | -6. Repeat the above steps for BMP, TIFF,³ and PNG. | |
| 97 | +6. Repeat the above steps for BMP, PNG, and TIFF.[^tiff-cmp] | |
| 99 | 98 | |
| 100 | 99 | 7. Create a bar chart showing how the Fossil repository size changes |
| 101 | 100 | with each checkin. |
| 102 | 101 | |
| 103 | 102 | We chose to use JupyterLab for this because it makes it easy for you to |
| @@ -115,20 +114,20 @@ | ||
| 115 | 114 | [wp]: http://wand-py.org/ |
| 116 | 115 | |
| 117 | 116 | |
| 118 | 117 | ## <a id="results"></a>Results |
| 119 | 118 | |
| 120 | -Running the notebook gives a bar chart something like⁴ this: | |
| 119 | +Running the notebook gives a bar chart something like[^variance] this: | |
| 121 | 120 | |
| 122 | 121 |  |
| 123 | 122 | |
| 124 | 123 | There are a few key things we want to draw your attention to in that |
| 125 | 124 | chart: |
| 126 | 125 | |
| 127 | 126 | * BMP and uncompressed TIFF are nearly identical in size for all |
| 128 | 127 | checkins, and the repository growth rate is negligible past the |
| 129 | - first commit.⁵ We owe this economy to Fossil’s delta compression | |
| 128 | + first commit.[^size-jump] We owe this economy to Fossil’s delta compression | |
| 130 | 129 | feature: it is encoding each of those single-pixel changes in a very |
| 131 | 130 | small amount of repository space. |
| 132 | 131 | |
| 133 | 132 | * The JPEG and PNG bars increase by large amounts on most checkins |
| 134 | 133 | even though each checkin *also* encodes only a *single-pixel change*. |
| @@ -158,11 +157,11 @@ | ||
| 158 | 157 | Since programs that produce and consume binary-compressed data files |
| 159 | 158 | often make it either difficult or impossible to work with the |
| 160 | 159 | uncompressed form, we want an automated method for producing the |
| 161 | 160 | uncompressed form to make Fossil happy while still having the compressed |
| 162 | 161 | form to keep our content creation applications happy. This `Makefile` |
| 163 | -should⁶ do that for BMP, PNG, SVG, and XLSX files: | |
| 162 | +should[^makefile] do that for BMP, PNG, SVG, and XLSX files: | |
| 164 | 163 | |
| 165 | 164 | .SUFFIXES: .bmp .png .svg .svgz |
| 166 | 165 | |
| 167 | 166 | .svgz.svg: |
| 168 | 167 | gzip -dc < $< > $@ |
| @@ -194,11 +193,11 @@ | ||
| 194 | 193 | |
| 195 | 194 | This `Makefile` allows you to treat the compressed version as the |
| 196 | 195 | process input, but to actually check in only the changes against the |
| 197 | 196 | uncompressed version by typing “`make`” before “`fossil ci`”. This is |
| 198 | 197 | not actually an extra step in practice, since if you’ve got a |
| 199 | -`Makefile`-based project, you should be building (and testing!) it | |
| 198 | +`Makefile`-based project, you should be building — and testing — it | |
| 200 | 199 | before checking each change in anyway! |
| 201 | 200 | |
| 202 | 201 | Because this technique is based on dependency rules, only the necessary |
| 203 | 202 | files are generated on each `make` command. |
| 204 | 203 | |
| @@ -239,66 +238,65 @@ | ||
| 239 | 238 | automatically-uncompressed PDF for the benefit of Fossil. Unlike |
| 240 | 239 | with the Excel case, there is no simple “file base name to directory |
| 241 | 240 | name” mapping, so we just created the `-big` to `-small` name scheme |
| 242 | 241 | here. |
| 243 | 242 | |
| ----- | ||
| 244 | 243 | |
| 245 | - | |
| 246 | -## <a id="notes"></a>Footnotes and Digressions | |
| 247 | - | |
| 248 | -1. This problem is not Fossil-specific. Several other programs also do | |
| 244 | +[^delta-prgs]: | |
| 245 | + This problem is not Fossil-specific. Several other programs also do | |
| 249 | 246 | delta compression, so they’ll also be affected by this problem: |
| 250 | 247 | [rsync][rs], [Unison][us], [Git][git], etc. You should take this |
| 251 | 248 | article’s advice when using all such programs, not just Fossil. |
| 252 | - | |
| 253 | 249 | When using file copying and synchronization programs *without* delta |
| 254 | 250 | compression, on the other hand, it’s best to use the most |
| 255 | 251 | highly-compressed file format you can tolerate, since they copy the |
| 256 | 252 | whole file any time any bit of it changes. |
| 257 | 253 | |
| 258 | -2. In fact, a good way to gauge the effectiveness of a given | |
| 254 | +[^prn]: | |
| 255 | + In fact, a good way to gauge the effectiveness of a given | |
| 259 | 256 | compression scheme is to run its output through the same sort of |
| 260 | 257 | tests we use to gauge how “random” a given [PRNG][prng] is. Another |
| 261 | 258 | way to look at it is that if there is a discernible pattern in the |
| 262 | 259 | output of a compression scheme, that constitutes *information* (in |
| 263 | 260 | [the technical sense of that word][ith]) that could be further |
| 264 | 261 | compressed. |
| 265 | 262 | |
| 266 | -3. We're using *uncompressed* TIFF here, not [LZW][lzw]- or | |
| 263 | +[^tiff-cmp]: | |
| 264 | + We're using *uncompressed* TIFF here, not [LZW][lzw]- or | |
| 267 | 265 | Zip-compressed TIFF, either of which would give similar results to |
| 268 | 266 | PNG, which is always zlib-compressed. |
| 269 | 267 | |
| 270 | -4. The raw data changes somewhat from one run to the next due to the | |
| 268 | +[^variance]: | |
| 269 | + The raw data changes somewhat from one run to the next due to the | |
| 271 | 270 | use of random noise in the image to make the zlib/PNG compression |
| 272 | 271 | more difficult, and the random pixel changes. Those test design |
| 273 | 272 | choices make this a [Monte Carlo experiment][mce]. We’ve found that |
| 274 | 273 | the overall character of the results doesn’t change from one run to |
| 275 | 274 | the next. |
| 276 | 275 | |
| 277 | -5. It’s not clear to me why there is a one-time jump in size for BMP | |
| 276 | +[^size-jump]: | |
| 277 | + It’s not clear to me why there is a one-time jump in size for BMP | |
| 278 | 278 | and TIFF past the first commit. I suspect it is due to the SQLite |
| 279 | 279 | indices being initialized for the first time. |
| 280 | - | |
| 281 | 280 | Page size inflation might have something to do with it as well, |
| 282 | 281 | though we tried to control that by rebuilding the initial DB with a |
| 283 | 282 | minimal page size. If you re-run the program often enough, you will |
| 284 | 283 | sometimes see the BMP or TIFF bar jump higher than the other, again |
| 285 | 284 | likely due to one of the repos crossing a page boundary. |
| 286 | - | |
| 287 | 285 | Another curious artifact in the data is that the BMP is slightly |
| 288 | 286 | larger than for the TIFF. This goes against expectation because a |
| 289 | 287 | low-tech format like BMP should have a small edge in this test |
| 290 | 288 | because TIFF metadata includes the option for multiple timestamps, |
| 291 | 289 | UUIDs, etc., which bloat the checkin size by creating many small |
| 292 | 290 | deltas. |
| 293 | 291 | |
| 294 | -6. The `Makefile` above is not battle-tested. Please report bugs and | |
| 292 | +[^makefile]: | |
| 293 | + The `Makefile` above is not battle-tested. Please report bugs and | |
| 295 | 294 | needed extensions [on the forum][for]. |
| 296 | 295 | |
| 297 | 296 | [for]: https://fossil-scm.org/forum/forumpost/15e677f2c8 |
| 298 | 297 | [git]: https://git-scm.com/ |
| 299 | 298 | [ith]: https://en.wikipedia.org/wiki/Information_theory |
| 300 | 299 | [lzw]: https://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Welch |
| 301 | 300 | [prng]: https://en.wikipedia.org/wiki/Pseudorandom_number_generator |
| 302 | 301 | [rs]: https://rsync.samba.org/ |
| 303 | 302 | [us]: http://www.cis.upenn.edu/~bcpierce/unison/ |
| 304 | 303 |
| --- www/image-format-vs-repo-size.md | |
| +++ www/image-format-vs-repo-size.md | |
| @@ -1,19 +1,19 @@ | |
| 1 | # Image Format vs Fossil Repo Size |
| 2 | |
| 3 | ## The Problem |
| 4 | |
| 5 | Fossil has a [delta compression][dc] feature which removes redundant |
| 6 | information from a file relative to its parent on check-in.¹ |
| 7 | That delta is then [zlib][zl]-compressed before being stored |
| 8 | in the Fossil repository database file. |
| 9 | |
| 10 | Storing pre-compressed data files in a Fossil repository defeats both of |
| 11 | these space-saving measures: |
| 12 | |
| 13 | 1. Binary data compression algorithms turn the file data into |
| 14 | [pseudorandom noise][prn].² |
| 15 | |
| 16 | Typical data compression algorithms are not [hash functions][hf], |
| 17 | where the goal is that a change to each bit in the input has a |
| 18 | statistically even chance of changing every bit in the output, but |
| 19 | because they do approach that pathological condition, pre-compressed |
| @@ -34,11 +34,10 @@ | |
| 34 | the input data change on each checkin. This article will illustrate that |
| 35 | problem, quantify it, and give a solution to it. |
| 36 | |
| 37 | [dc]: ./delta_format.wiki |
| 38 | [hf]: https://en.wikipedia.org/wiki/Hash_function |
| 39 | [prn]: https://en.wikipedia.org/wiki/Pseudorandomness |
| 40 | [zl]: http://www.zlib.net/ |
| 41 | |
| 42 | |
| 43 | ## <a id="formats"></a>Affected File Formats |
| 44 | |
| @@ -93,11 +92,11 @@ | |
| 93 | image, check it in, and remember the new Fossil repo size. |
| 94 | |
| 95 | 5. Iterate on step 4 some number of times — currently 10 — and remember |
| 96 | the Fossil repo size at each step. |
| 97 | |
| 98 | 6. Repeat the above steps for BMP, TIFF,³ and PNG. |
| 99 | |
| 100 | 7. Create a bar chart showing how the Fossil repository size changes |
| 101 | with each checkin. |
| 102 | |
| 103 | We chose to use JupyterLab for this because it makes it easy for you to |
| @@ -115,20 +114,20 @@ | |
| 115 | [wp]: http://wand-py.org/ |
| 116 | |
| 117 | |
| 118 | ## <a id="results"></a>Results |
| 119 | |
| 120 | Running the notebook gives a bar chart something like⁴ this: |
| 121 | |
| 122 |  |
| 123 | |
| 124 | There are a few key things we want to draw your attention to in that |
| 125 | chart: |
| 126 | |
| 127 | * BMP and uncompressed TIFF are nearly identical in size for all |
| 128 | checkins, and the repository growth rate is negligible past the |
| 129 | first commit.⁵ We owe this economy to Fossil’s delta compression |
| 130 | feature: it is encoding each of those single-pixel changes in a very |
| 131 | small amount of repository space. |
| 132 | |
| 133 | * The JPEG and PNG bars increase by large amounts on most checkins |
| 134 | even though each checkin *also* encodes only a *single-pixel change*. |
| @@ -158,11 +157,11 @@ | |
| 158 | Since programs that produce and consume binary-compressed data files |
| 159 | often make it either difficult or impossible to work with the |
| 160 | uncompressed form, we want an automated method for producing the |
| 161 | uncompressed form to make Fossil happy while still having the compressed |
| 162 | form to keep our content creation applications happy. This `Makefile` |
| 163 | should⁶ do that for BMP, PNG, SVG, and XLSX files: |
| 164 | |
| 165 | .SUFFIXES: .bmp .png .svg .svgz |
| 166 | |
| 167 | .svgz.svg: |
| 168 | gzip -dc < $< > $@ |
| @@ -194,11 +193,11 @@ | |
| 194 | |
| 195 | This `Makefile` allows you to treat the compressed version as the |
| 196 | process input, but to actually check in only the changes against the |
| 197 | uncompressed version by typing “`make`” before “`fossil ci`”. This is |
| 198 | not actually an extra step in practice, since if you’ve got a |
| 199 | `Makefile`-based project, you should be building (and testing!) it |
| 200 | before checking each change in anyway! |
| 201 | |
| 202 | Because this technique is based on dependency rules, only the necessary |
| 203 | files are generated on each `make` command. |
| 204 | |
| @@ -239,66 +238,65 @@ | |
| 239 | automatically-uncompressed PDF for the benefit of Fossil. Unlike |
| 240 | with the Excel case, there is no simple “file base name to directory |
| 241 | name” mapping, so we just created the `-big` to `-small` name scheme |
| 242 | here. |
| 243 | |
| ----- | |
| 244 | |
| 245 | |
| 246 | ## <a id="notes"></a>Footnotes and Digressions |
| 247 | |
| 248 | 1. This problem is not Fossil-specific. Several other programs also do |
| 249 | delta compression, so they’ll also be affected by this problem: |
| 250 | [rsync][rs], [Unison][us], [Git][git], etc. You should take this |
| 251 | article’s advice when using all such programs, not just Fossil. |
| 252 | |
| 253 | When using file copying and synchronization programs *without* delta |
| 254 | compression, on the other hand, it’s best to use the most |
| 255 | highly-compressed file format you can tolerate, since they copy the |
| 256 | whole file any time any bit of it changes. |
| 257 | |
| 258 | 2. In fact, a good way to gauge the effectiveness of a given |
| 259 | compression scheme is to run its output through the same sort of |
| 260 | tests we use to gauge how “random” a given [PRNG][prng] is. Another |
| 261 | way to look at it is that if there is a discernible pattern in the |
| 262 | output of a compression scheme, that constitutes *information* (in |
| 263 | [the technical sense of that word][ith]) that could be further |
| 264 | compressed. |
| 265 | |
| 266 | 3. We're using *uncompressed* TIFF here, not [LZW][lzw]- or |
| 267 | Zip-compressed TIFF, either of which would give similar results to |
| 268 | PNG, which is always zlib-compressed. |
| 269 | |
| 270 | 4. The raw data changes somewhat from one run to the next due to the |
| 271 | use of random noise in the image to make the zlib/PNG compression |
| 272 | more difficult, and the random pixel changes. Those test design |
| 273 | choices make this a [Monte Carlo experiment][mce]. We’ve found that |
| 274 | the overall character of the results doesn’t change from one run to |
| 275 | the next. |
| 276 | |
| 277 | 5. It’s not clear to me why there is a one-time jump in size for BMP |
| 278 | and TIFF past the first commit. I suspect it is due to the SQLite |
| 279 | indices being initialized for the first time. |
| 280 | |
| 281 | Page size inflation might have something to do with it as well, |
| 282 | though we tried to control that by rebuilding the initial DB with a |
| 283 | minimal page size. If you re-run the program often enough, you will |
| 284 | sometimes see the BMP or TIFF bar jump higher than the other, again |
| 285 | likely due to one of the repos crossing a page boundary. |
| 286 | |
| 287 | Another curious artifact in the data is that the BMP is slightly |
| 288 | larger than for the TIFF. This goes against expectation because a |
| 289 | low-tech format like BMP should have a small edge in this test |
| 290 | because TIFF metadata includes the option for multiple timestamps, |
| 291 | UUIDs, etc., which bloat the checkin size by creating many small |
| 292 | deltas. |
| 293 | |
| 294 | 6. The `Makefile` above is not battle-tested. Please report bugs and |
| 295 | needed extensions [on the forum][for]. |
| 296 | |
| 297 | [for]: https://fossil-scm.org/forum/forumpost/15e677f2c8 |
| 298 | [git]: https://git-scm.com/ |
| 299 | [ith]: https://en.wikipedia.org/wiki/Information_theory |
| 300 | [lzw]: https://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Welch |
| 301 | [prng]: https://en.wikipedia.org/wiki/Pseudorandom_number_generator |
| 302 | [rs]: https://rsync.samba.org/ |
| 303 | [us]: http://www.cis.upenn.edu/~bcpierce/unison/ |
| 304 |
| --- www/image-format-vs-repo-size.md | |
| +++ www/image-format-vs-repo-size.md | |
| @@ -1,19 +1,19 @@ | |
| 1 | # Image Format vs Fossil Repo Size |
| 2 | |
| 3 | ## The Problem |
| 4 | |
| 5 | Fossil has a [delta compression][dc] feature which removes redundant |
| 6 | information from a file relative to its parent on check-in.[^delta-prgs] |
| 7 | That delta is then [zlib][zl]-compressed before being stored |
| 8 | in the Fossil repository database file. |
| 9 | |
| 10 | Storing pre-compressed data files in a Fossil repository defeats both of |
| 11 | these space-saving measures: |
| 12 | |
| 13 | 1. Binary data compression algorithms turn the file data into |
| 14 | pseudorandom noise.[^prn] |
| 15 | |
| 16 | Typical data compression algorithms are not [hash functions][hf], |
| 17 | where the goal is that a change to each bit in the input has a |
| 18 | statistically even chance of changing every bit in the output, but |
| 19 | because they do approach that pathological condition, pre-compressed |
| @@ -34,11 +34,10 @@ | |
| 34 | the input data change on each checkin. This article will illustrate that |
| 35 | problem, quantify it, and give a solution to it. |
| 36 | |
| 37 | [dc]: ./delta_format.wiki |
| 38 | [hf]: https://en.wikipedia.org/wiki/Hash_function |
| 39 | [zl]: http://www.zlib.net/ |
| 40 | |
| 41 | |
| 42 | ## <a id="formats"></a>Affected File Formats |
| 43 | |
| @@ -93,11 +92,11 @@ | |
| 92 | image, check it in, and remember the new Fossil repo size. |
| 93 | |
| 94 | 5. Iterate on step 4 some number of times — currently 10 — and remember |
| 95 | the Fossil repo size at each step. |
| 96 | |
| 97 | 6. Repeat the above steps for BMP, PNG, and TIFF.[^tiff-cmp] |
| 98 | |
| 99 | 7. Create a bar chart showing how the Fossil repository size changes |
| 100 | with each checkin. |
| 101 | |
| 102 | We chose to use JupyterLab for this because it makes it easy for you to |
| @@ -115,20 +114,20 @@ | |
| 114 | [wp]: http://wand-py.org/ |
| 115 | |
| 116 | |
| 117 | ## <a id="results"></a>Results |
| 118 | |
| 119 | Running the notebook gives a bar chart something like[^variance] this: |
| 120 | |
| 121 |  |
| 122 | |
| 123 | There are a few key things we want to draw your attention to in that |
| 124 | chart: |
| 125 | |
| 126 | * BMP and uncompressed TIFF are nearly identical in size for all |
| 127 | checkins, and the repository growth rate is negligible past the |
| 128 | first commit.[^size-jump] We owe this economy to Fossil’s delta compression |
| 129 | feature: it is encoding each of those single-pixel changes in a very |
| 130 | small amount of repository space. |
| 131 | |
| 132 | * The JPEG and PNG bars increase by large amounts on most checkins |
| 133 | even though each checkin *also* encodes only a *single-pixel change*. |
| @@ -158,11 +157,11 @@ | |
| 157 | Since programs that produce and consume binary-compressed data files |
| 158 | often make it either difficult or impossible to work with the |
| 159 | uncompressed form, we want an automated method for producing the |
| 160 | uncompressed form to make Fossil happy while still having the compressed |
| 161 | form to keep our content creation applications happy. This `Makefile` |
| 162 | should[^makefile] do that for BMP, PNG, SVG, and XLSX files: |
| 163 | |
| 164 | .SUFFIXES: .bmp .png .svg .svgz |
| 165 | |
| 166 | .svgz.svg: |
| 167 | gzip -dc < $< > $@ |
| @@ -194,11 +193,11 @@ | |
| 193 | |
| 194 | This `Makefile` allows you to treat the compressed version as the |
| 195 | process input, but to actually check in only the changes against the |
| 196 | uncompressed version by typing “`make`” before “`fossil ci`”. This is |
| 197 | not actually an extra step in practice, since if you’ve got a |
| 198 | `Makefile`-based project, you should be building — and testing — it |
| 199 | before checking each change in anyway! |
| 200 | |
| 201 | Because this technique is based on dependency rules, only the necessary |
| 202 | files are generated on each `make` command. |
| 203 | |
| @@ -239,66 +238,65 @@ | |
| 238 | automatically-uncompressed PDF for the benefit of Fossil. Unlike |
| 239 | with the Excel case, there is no simple “file base name to directory |
| 240 | name” mapping, so we just created the `-big` to `-small` name scheme |
| 241 | here. |
| 242 | |
| ----- | |
| 243 | |
| 244 | [^delta-prgs]: |
| 245 | This problem is not Fossil-specific. Several other programs also do |
| 246 | delta compression, so they’ll also be affected by this problem: |
| 247 | [rsync][rs], [Unison][us], [Git][git], etc. You should take this |
| 248 | article’s advice when using all such programs, not just Fossil. |
| 249 | When using file copying and synchronization programs *without* delta |
| 250 | compression, on the other hand, it’s best to use the most |
| 251 | highly-compressed file format you can tolerate, since they copy the |
| 252 | whole file any time any bit of it changes. |
| 253 | |
| 254 | [^prn]: |
| 255 | In fact, a good way to gauge the effectiveness of a given |
| 256 | compression scheme is to run its output through the same sort of |
| 257 | tests we use to gauge how “random” a given [PRNG][prng] is. Another |
| 258 | way to look at it is that if there is a discernible pattern in the |
| 259 | output of a compression scheme, that constitutes *information* (in |
| 260 | [the technical sense of that word][ith]) that could be further |
| 261 | compressed. |
| 262 | |
| 263 | [^tiff-cmp]: |
| 264 | We're using *uncompressed* TIFF here, not [LZW][lzw]- or |
| 265 | Zip-compressed TIFF, either of which would give similar results to |
| 266 | PNG, which is always zlib-compressed. |
| 267 | |
| 268 | [^variance]: |
| 269 | The raw data changes somewhat from one run to the next due to the |
| 270 | use of random noise in the image to make the zlib/PNG compression |
| 271 | more difficult, and the random pixel changes. Those test design |
| 272 | choices make this a [Monte Carlo experiment][mce]. We’ve found that |
| 273 | the overall character of the results doesn’t change from one run to |
| 274 | the next. |
| 275 | |
| 276 | [^size-jump]: |
| 277 | It’s not clear to me why there is a one-time jump in size for BMP |
| 278 | and TIFF past the first commit. I suspect it is due to the SQLite |
| 279 | indices being initialized for the first time. |
| 280 | Page size inflation might have something to do with it as well, |
| 281 | though we tried to control that by rebuilding the initial DB with a |
| 282 | minimal page size. If you re-run the program often enough, you will |
| 283 | sometimes see the BMP or TIFF bar jump higher than the other, again |
| 284 | likely due to one of the repos crossing a page boundary. |
| 285 | Another curious artifact in the data is that the BMP is slightly |
| 286 | larger than for the TIFF. This goes against expectation because a |
| 287 | low-tech format like BMP should have a small edge in this test |
| 288 | because TIFF metadata includes the option for multiple timestamps, |
| 289 | UUIDs, etc., which bloat the checkin size by creating many small |
| 290 | deltas. |
| 291 | |
| 292 | [^makefile]: |
| 293 | The `Makefile` above is not battle-tested. Please report bugs and |
| 294 | needed extensions [on the forum][for]. |
| 295 | |
| 296 | [for]: https://fossil-scm.org/forum/forumpost/15e677f2c8 |
| 297 | [git]: https://git-scm.com/ |
| 298 | [ith]: https://en.wikipedia.org/wiki/Information_theory |
| 299 | [lzw]: https://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Welch |
| 300 | [prng]: https://en.wikipedia.org/wiki/Pseudorandom_number_generator |
| 301 | [rs]: https://rsync.samba.org/ |
| 302 | [us]: http://www.cis.upenn.edu/~bcpierce/unison/ |
| 303 |