Fossil SCM
Fix capitalization and statements on hash collisions in delta-manifest.md
Commit: 36f90c7b9e9f7697e8f931f7b1896e46ddda506240ebe0a52d85c653f1d3d36a
Parent: 749d9daf03e76e2…
1 file changed, +24 -24
````diff
--- www/delta-manifests.md
+++ www/delta-manifests.md
@@ -9,11 +9,11 @@
 This article assumes that the reader is at least moderately familiar
 with Fossil's [artifact file format](./fileformat.wiki), in particular
 the structure of checkin manifests, and it won't make much sense to
 readers unfamiliar with that topic.
 
-Sidebar: delta manifests are not to be confused with the core [fossil
+Sidebar: delta manifests are not to be confused with the core [Fossil
 delta format](./delta_format.wiki). The former is a special-case form
 of delta which applies *only* to checkin manifests whereas the latter
 is a general-purpose delta compression which can apply to any
 Fossil-stored data (including delta manifests).
 
@@ -20,11 +20,11 @@
 # Background and Motivation of Delta Manifests
 
 A checkin manifest includes a list of every file in that checkin. A
 moderately-sized project can easily have a thousand files, and every
 checkin manifest will include those thousand files. As of this writing
-fossil's own checkins contain 989 files and the manifests are 80kb
+Fossil's own checkins contain 989 files and the manifests are 80kb
 each. Thus a checkin which changes only 2 bytes of sourse code
 ostensibly costs another 80kb of storage for the manifest for that
 change.
 
 Delta manifests were conceived as a mechanism to help combat that
@@ -89,11 +89,11 @@
 differences between their own version and a baseline version, and thus
 have to record deletions. They do this by including F-cards which have
 only a file name and no hash.
 
 Iterating over F-cards in a manifest is something several important
-internal parts of fossil have to do. Iterating over a baseline
+internal parts of Fossil have to do. Iterating over a baseline
 manifest, e.g. when performing a checkout, is straightforward: simply
 walk through the list in the order the cards are listed. A delta,
 however, introduces a significant wrinkle to that process. In short,
 when iterating over a delta's F-cards, code has to compare the delta's
 list to the baseline's list. If the delta has an entry the parent does
@@ -108,38 +108,38 @@
 an internal detail, not something which higher-level code should
 concern itself with. If higher-level iteration code were shown file
 deletions, they would effectively be dealing with a leaky abstraction
 and special-case handling which only applies to delta manifests. The
 F-card iteration API hides such details from its users (other
-fossil-internal APIs).
+Fossil-internal APIs).
 
 
 # When does Fossil Create Deltas?
 
-By default, fossil never creates delta manifests. It can be told to do
+By default, Fossil never creates delta manifests. It can be told to do
 so using the `--delta` flag to the [`commit`
 command](/help/commit). (Before doing so in your own repositories,
 please read the section below about the caveats!) When a given
-repository gets a delta manifest for the first time, fossil records
+repository gets a delta manifest for the first time, Fossil records
 that fact in the repository's `config` table with an entry named
-`seen-delta-manifest`. If, in later sessions, fossil sees that that
+`seen-delta-manifest`. If, in later sessions, Fossil sees that that
 setting has a true value, it will *consider* creating delta manifests
 by default.
 
 Conversely, the [`forbid-delta-manifests` repository config
-setting](/help/forbid-delta-manifests) may be used to force fossil to
+setting](/help/forbid-delta-manifests) may be used to force Fossil to
 *never* create deltas. That setting will propagate to other repository
 clones via the sync process, to try to ensure that no clone introduces
 a delta manifests. We'll cover reasons why one might want to use that
 setting later on.
 
-After creating a delta manifest during the commit process, fossil
-examines the size of the delta. If, in fossil's opinion, the space
+After creating a delta manifest during the commit process, Fossil
+examines the size of the delta. If, in Fossil's opinion, the space
 savings are not significant enough to warrant the delta's own
 overhead, it will discard the delta and create a new baseline manifest
 instead. (The heuristic it uses for that purpose is tucked away in
-fossil's checkin algorithm.)
+Fossil's checkin algorithm.)
 
 
 # Caveats
 
 Delta manifests may appear, on the surface, to be a great way to save
@@ -148,37 +148,37 @@
 ## Space Savings?
 
 Though deltas were conceived as a way to save storage space, that
 benefit is *not truly achieved* because...
 
-When a manifest is created, fossil stores its parent version as a
+When a manifest is created, Fossil stores its parent version as a
 [fossil delta](./delta_format.wiki) (as opposed to a delta manifest)
 which succinctly descibes the differences between the parent and its
 new child. This form of compression is extremely space-efficient and
 can reduce the real storage space requirements of a manifest from tens
 or hundreds of kilobytes down to a kilobyte or less for checkins which
-modify only a few files. As an example, as of this writing, fossil's
+modify only a few files. As an example, as of this writing, Fossil's
 [tip checkin baseline manifest](/artifact/decd537016bf) is 80252 bytes
 (uncompressed), and the delta-compressed baseline manifest of the
 [previous checkin](/artifact/2f7c93f49c0e) is stored as a mere 726
-bytes of fossil-delta'd data (not counting the z-lib compression which
+bytes of Fossil-delta'd data (not counting the z-lib compression which
 gets applied on top of that). In this case, the tip version modified 7
 files compared to its parent version.
 
 Thus delta manifests do not *actually* save much storage space. They
-save *some*, in particular in the tip checkin version: fossil
+save *some*, in particular in the tip checkin version: Fossil
 delta-compresses *older* versions of checkins against the child
 versions, as opposed to delta-compressing the children against the
 parents. The reason is to speed up access for the most common case -
 the latest version. Thus tip-version delta manifests are more
 storage-space efficient than tip-version baseline manifests. Once the
-next version is committed, though, and fossil deltification is applied
+next version is committed, though, and Fossil deltification is applied
 to those manifests, that difference in space efficiency shrinks
 tremendously, often to the point of insignificance.
 
-We can observe the fossil-delta compression savings using a bit of
-3rd-party code which can extract fossil-format blobs both with and
+We can observe the Fossil-delta compression savings using a bit of
+3rd-party code which can extract Fossil-format blobs both with and
 without applying their deltas:
 
 ```
 $ f-acat tip > A # tip version's manifest
 $ f-acat prev --raw > B # previous manifest in its raw deltified form
@@ -199,11 +199,11 @@
 
 In terms of RAM costs, deltas usually cost more memory than baseline
 manifests. The reason is because traversing a delta requires having
 not only that delta in memory, but also its baseline version. Delta
 manifests are seldom used in ways which do not require also loading
-their baselines. Thus fossil internally requires two manifest objects
+their baselines. Thus Fossil internally requires two manifest objects
 for most operations with a delta manifest, whereas a baseline has but
 one. The difference in RAM cost is directly proportional to the size
 of the delta manifest.
 
 ## Manifests as Proof of Code Integrity
@@ -211,25 +211,25 @@
 Delta manifests have at least one more notable caveat, this one
 arguably more significant than an apparent lack of space savings:
 they're useless for purposes of publishing a manifest which downstream
 clients can use to verify the integrity of their copy of the software.
 
-Consider this use case: [the sqlite3 project](https://sqlite.org)
+Consider this use case: [the SQLite project](https://sqlite.org)
 publishes source code to many thousands of downstream consumers, many
 of whom would like to be able to verify that the copy they have
 downloaded is actually the copy published by the project. This is
 easily achieved by providing a copy of the downloaded version's
 manifest, as it contains a hash of every single file the project
 published and the manifest itself has a well-known hash and is
-cryptographically tamper-proof. It's mathematically impossible for a
+cryptographically tamper-proof. It's mathematically extremely improbable for a
 malicious party to modify such a manifest and re-publish it as an
 "official" one, as the various hashes (F-cards, R-card, Z-card, *and*
 the hash of the manifest itself) would not line up. A collision-based
 attack would have to defeat *all four of those hashes*, which is
-literally impossible to do. Thus a fossil checkin manifest can be used
+practically impossible to do. Thus a Fossil checkin manifest can be used
 to provide strong assurances that a given copy of the software has not
-been tampered with since being exported by fossil.
+been tampered with since being exported by Fossil.
 
 *However*, that use case is *only possible with baseline manifests*.
 A delta manifest is *essentially useless* for that purpose. The
 algorithm for traversing F-cards of a delta manifest is not trivial
 for arbitrary clients to reproduce, e.g. using a shell script. While
@@ -237,13 +237,13 @@
 truly unsightly shell code), it would be an onerous burden on
 downstream consumers and would not be without risks of having bugs
 which invalidate the strong guarantees provided by the manifest.
 
 It's worth noting that the core Fossil project repository does not use
-delta manifests, at least in part for the same reason the sqlite
+delta manifests, at least in part for the same reason the SQLite
 project does not: the ability to provide a manifest which clients can
 easily use to verify the integrity of the code they've downloaded. The
 [`forbid-delta-manifests` config
 setting](/help/forbid-delta-manifests) is used to ensure that none are
 introduced into the repository beyond the few which were introduced
 solely for testing purposes.
 
````
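Note on the F-card iteration described around line 93 of www/delta-manifests.md: the comparison of a delta's F-card list against its baseline's list can be sketched roughly as below. This is an illustrative Python sketch only, not Fossil's C implementation. It assumes that each F-card is modelled as a hypothetical `(name, hash)` tuple, that a deletion is a delta entry whose hash is `None` (the "file name and no hash" case from the text), that a delta entry replaces a baseline entry of the same name, and that both lists are sorted by file name. The name `iter_effective_fcards` is invented for this example.

```python
from typing import Iterator, List, Optional, Tuple

# Hypothetical model of an F-card: (file name, content hash).
# A hash of None stands for a delta's "deleted file" card.
FCard = Tuple[str, Optional[str]]


def iter_effective_fcards(
    baseline: List[FCard], delta: List[FCard]
) -> Iterator[FCard]:
    """Yield the files visible in the checkin described by `delta`.

    Both lists are assumed to be sorted by file name. Deletion
    cards are consumed here and never shown to the caller.
    """
    b = d = 0
    while b < len(baseline) or d < len(delta):
        if d >= len(delta):
            # Delta exhausted: the rest of the baseline is unchanged.
            yield baseline[b]
            b += 1
        elif b >= len(baseline) or delta[d][0] < baseline[b][0]:
            # Entry exists only in the delta: a newly added file.
            name, file_hash = delta[d]
            d += 1
            if file_hash is not None:
                yield (name, file_hash)
        elif baseline[b][0] < delta[d][0]:
            # Entry exists only in the baseline: file is unchanged.
            yield baseline[b]
            b += 1
        else:
            # Same name in both: the delta overrides the baseline,
            # or deletes the file when the delta card has no hash.
            name, file_hash = delta[d]
            b += 1
            d += 1
            if file_hash is not None:
                yield (name, file_hash)


baseline = [("a.c", "h1"), ("b.c", "h2"), ("c.c", "h3")]
delta = [("b.c", "h2x"), ("c.c", None), ("d.c", "h4")]
effective = list(iter_effective_fcards(baseline, delta))
print(effective)
```

With the sample data, `b.c` is replaced, `c.c` is dropped, and `d.c` is added, while the caller never sees a deletion card. This mirrors the point made in the hunk at line 108 that deletions are an internal detail of delta manifests. It also shows why both manifests must be in memory at once, as the RAM-cost paragraph in the hunk at line 199 describes.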