Fossil SCM
Updates to the doc/hashes.md document.
Commit
364307951cf5224d20c19b93bb53c7dacd5c98fa14220f88cd528083b27c4183
Parent
e714b8427c8dd10…
1 file changed
+54
-45
| --- www/hashes.md | ||
| +++ www/hashes.md | ||
| @@ -3,14 +3,15 @@ | ||
| 3 | 3 | All artifacts in Fossil are identified by a unique hash, currently using |
| 4 | 4 | [the SHA3 algorithm by default][hpol], but historically using the SHA1 |
| 5 | 5 | algorithm. Therefore, there are two full-length hash formats used by |
| 6 | 6 | Fossil: |
| 7 | 7 | |
| 8 | -| Algorithm | Raw Bits | Hex ASCII Bytes | | |
| 9 | -|-----------|----------|-----------------| | |
| 10 | -| SHA3-256 | 256 | 64 | | |
| 11 | -| SHA1 | 160 | 40 | | |
| 8 | +<table border="1" cellspacing="0" cellpadding="10"> | |
| 9 | +<tr><th>Algorithm<th>Raw Bits<th>Hexadecimal digits | |
| 10 | +<tr><td>SHA3-256<td>256<td>64 | |
| 11 | +<tr><td>SHA1<td>160<td>40 | |
| 12 | +</table> | |
| 12 | 13 | |
| 13 | 14 | There are many types of artifacts in Fossil: commits (a.k.a. check-ins), |
| 14 | 15 | tickets, ticket comments, wiki articles, forum postings, file data |
| 15 | 16 | belonging to check-ins, etc. ([More info...](./concepts.wiki#artifacts)). |
| 16 | 17 | |
| @@ -47,85 +48,95 @@ | ||
| 47 | 48 | “abc123”, then that is a valid version string as long as it remains |
| 48 | 49 | unambiguous. |
| 49 | 50 | |
| 50 | 51 | |
| 51 | 52 | |
| 52 | -## <a id="uvh"></a>UUIDs: An Unfortunate Historical Artifact | |
| 53 | - | |
| 54 | -Historically, Fossil incorrectly used the term “[UUID][uuid]” where it | |
| 55 | -should use the term “artifact hash” instead. There are two primary | |
| 56 | -problems with miscalling Fossil artifact hashes UUIDs: | |
| 57 | - | |
| 58 | -1. UUIDs are always 128 bits in length — 32 hex ASCII bytes — making | |
| 59 | - them shorter than any actual Fossil artifact hash. | |
| 60 | - | |
| 61 | -2. Artifact hashes are necessarily highly pseudorandom blobs, but only | |
| 62 | - [version 4 UUIDs][v4] are pseudorandom in the same way. Other UUID | |
| 63 | - types have non-random meanings for certain subgroups of the bits, | |
| 64 | - restrictions that Fossil artifact hashes do not meet. | |
| 65 | - | |
| 66 | -Therefore, no Fossil hash can ever be a proper UUID. | |
| 67 | - | |
| 68 | -Nevertheless, there are several places in Fossil where we still use the | |
| 69 | -term UUID, primarily for backwards compatibility: | |
| 53 | +## <a id="uvh"></a>Unconventional Use Of The Term "UUID" | |
| 54 | + | |
| 55 | +"UUID" is an acronym for "Universally Unique Identifier". Hashes | |
| 56 | +generated by SHA1 or SHA3-256 are universally unique (in practice, | |
| 57 | +if not in theory) and they identify a particular artifact, and so | |
| 58 | +it seems reasonable to refer to artifact hashes as UUIDs. | |
| 59 | + | |
| 60 | +However, the term UUID has acquired a much stricter meaning than its | |
| 61 | +name alone implies. Purists insist that UUIDs must be *exactly* 128 bits, | |
| 62 | +that they must be displayed in a particular hexadecimal format that includes | |
| 63 | +dashes at prescribed intervals, and that they must have four particular bits | |
| 64 | +set aside to indicate the "type" of UUID. Fossil artifact hashes do not | |
| 65 | +comply with any of these supplemental requirements, and so are not UUIDs | |
| 66 | +in the strictest sense of the word. But the artifact hashes in Fossil are | |
| 67 | +literally "universally unique identifiers", and so they are sometimes | |
| 68 | +called "UUIDs" anyhow. | |
| 69 | + | |
| 70 | +Some readers are greatly annoyed by Fossil's use of "UUID" in its most | |
| 71 | +literal sense. To those readers, the designer apologizes, and seeks your | |
| 72 | +mercy by noting that when the term "UUID" first began to be used by Fossil, | |
| 73 | +only SHA1 was supported and so all the artifact hashes were 160 bits, making | |
| 74 | +them close to, if not exactly, in compliance with the rigid definition | |
| 75 | +of the term. For his misuse of the term "UUID", the designer has been | |
| 76 | +frequently rebuked. | |
| 77 | +Some efforts have been made, over the ensuing years, to avoid and replace | |
| 78 | +"UUID" in newer code and documentation. | |
| 79 | +But it does not seem like such a serious issue as to require an immediate | |
| 80 | +purge of the term from existing documentation, code, and database schemas, | |
| 81 | +as some have suggested. Hence, the unconventional use of the term "UUID" | |
| 82 | +lingers on in Fossil. Let new readers beware. | |
| 83 | + | |
| 84 | +Places where the non-conforming use of "UUID" persists in Fossil are | |
| 85 | +discussed in the sequel. | |
| 70 | 86 | |
| 71 | 87 | |
| 72 | 88 | ### Repository DB Schema |
| 73 | 89 | |
| 74 | -Almost all of these uses flow from the `blob.uuid` table column. This is | |
| 90 | +Almost all remaining uses of the term "UUID" in Fossil derive | |
| 91 | +from the `blob.uuid` table column. This is | |
| 75 | 92 | a key lookup column in the most important persistent Fossil DB table, so |
| 76 | 93 | it influences broad swaths of the Fossil internals. |
| 77 | 94 | |
| 78 | -Someday we may rename this column and those it has influenced (e.g. | |
| 79 | -`purgeitem.uuid`, `shun.uuid`, and `ticket.tkt_uuid`) by making Fossil | |
| 80 | -detect the outdated schema and silently upgrade it, coincident with | |
| 81 | -updating all of the SQL in Fossil that refers to these columns. Until | |
| 82 | -then, Fossil will continue to have “UUID” all through its internals. | |
| 95 | +It is theoretically possible to rename this column and those it has | |
| 96 | +influenced (e.g. `purgeitem.uuid`, `shun.uuid`, and `ticket.tkt_uuid`) | |
| 97 | +by making Fossil detect the outdated schema and silently upgrade it, | |
| 98 | +coincident with updating all of the SQL in Fossil that refers to these | |
| 99 | +columns. But that is a large and error-prone edit that does not | |
| 100 | +serve any pressing need, and so is unlikely to happen any time soon. | |
| 101 | +Hence, Fossil will likely continue to have “UUID” all through its internals. | |
| 83 | 102 | |
| 84 | 103 | In order to avoid needless terminology conflicts, Fossil code that |
| 85 | -refers to these misnamed columns also uses some variant of “UUID.” For | |
| 104 | +refers to these columns also uses some variant of “UUID.” For | |
| 86 | 105 | example, C code that refers to SQL result data on `blob.uuid` usually |
| 87 | 106 | calls the variable `zUuid`. Another example is the internal function |
| 88 | -`uuid_to_rid()`. Until and unless we decide to rename these DB columns, | |
| 89 | -we will keep these associated internal identifiers unchanged. | |
| 107 | +`uuid_to_rid()`. Until and unless the columns are renamed, | |
| 108 | +these associated function names will likely also go unchanged. | |
| 90 | 109 | |
| 91 | 110 | You may have local SQL code that digs into the repository DB using these |
| 92 | 111 | column names. If so, be warned: we are not inclined to consider |
| 93 | 112 | existence of such code sufficient reason to avoid renaming the columns. |
| 94 | 113 | The Fossil repository DB schema is not considered an external user |
| 95 | 114 | interface, and internal interfaces are subject to change at any time. We |
| 96 | 115 | suggest switching to a more stable API: the JSON API, `/timeline.rss`, |
| 97 | 116 | TH1, etc. |
| 98 | 117 | |
| 99 | -There are also some temporary tables that misuse “UUID” in this way. | |
| 100 | -(`description.uuid`, `timeline.uuid`, `xmark.uuid`, etc.) There’s a good | |
| 101 | -chance we’ll fix these before we fix the on-disk DB schema since no | |
| 102 | -other code can depend on them. | |
| 103 | - | |
| 104 | 118 | |
| 105 | 119 | ### TH1 Scripting Interfaces |
| 106 | 120 | |
| 107 | 121 | Some [TH1](./th1.md) interfaces use “UUID” where they actually mean some |
| 108 | 122 | kind of hash. For example, the `$tkt_uuid` variable, available via TH1 |
| 109 | 123 | when [customizing Fossil’s ticket system][ctkt]. |
| 110 | 124 | |
| 111 | 125 | Because this is considered a public programming interface, we are |
| 112 | 126 | unwilling to unilaterally rename such TH1 variables, even though they |
| 113 | -are “wrong.” For now, we are simply documenting the misuse. Later, we | |
| 114 | -may provide a parallel interface — e.g. `$tkt_hash` in this case — and | |
| 115 | -drop mention of the old interface from the documentation, but still | |
| 116 | -support it. | |
| 127 | +are "wrong". For now, we are simply documenting the unconventional | |
| 128 | +terminology. | |
| 117 | 129 | |
| 118 | 130 | |
| 119 | 131 | ### JSON API Parameters and Outputs |
| 120 | 132 | |
| 121 | -The JSON API frequently misuses the term “UUID” in the same sort of way, | |
| 133 | +The JSON API frequently uses the term “UUID” in the same sort of way, | |
| 122 | 134 | most commonly in [artifact][jart] and [timeline][jtim] APIs. As with the |
| 123 | 135 | prior case, we can’t fix these without breaking code that uses the JSON |
| 124 | 136 | API as originally designed, so our solutions are the same: document the |
| 125 | -misuse here for now, then possibly provide a backwards-compatible fix | |
| 126 | -later. | |
| 137 | +unconventional usage. | |
| 127 | 138 | |
| 128 | 139 | |
| 129 | 140 | ### `manifest.uuid` |
| 130 | 141 | |
| 131 | 142 | If you have [the `manifest` setting][mset] enabled, Fossil writes a file |
| @@ -139,7 +150,5 @@ | ||
| 139 | 150 | [hpol]: ./hashpolicy.wiki |
| 140 | 151 | [jart]: ./json-api/api-artifact.md |
| 141 | 152 | [jtim]: ./json-api/api-timeline.md |
| 142 | 153 | [mset]: /help?cmd=manifest |
| 143 | 154 | [tvb]: ./branching.wiki |
| 144 | -[uuid]: https://en.wikipedia.org/wiki/Universally_unique_identifier | |
| 145 | -[v4]: https://en.wikipedia.org/wiki/Universally_unique_identifier#Version_4_(random) | |
| 146 | 155 |
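
The hash sizes listed in the table touched by this change, and the contrast with strict UUIDs drawn in the new text, can be checked with a short script. This is only an illustrative sketch using Python's standard `hashlib` and `uuid` modules, not code from Fossil; the helper name `describe_digest` and the sample payload are assumptions made up for the example.

```python
import hashlib
import uuid


def describe_digest(name, data):
    """Return (raw bits, hex digit count) for the named hashlib algorithm."""
    digest = hashlib.new(name, data)
    return digest.digest_size * 8, len(digest.hexdigest())


# Hypothetical payload; any bytes give the same digest lengths.
content = b"example artifact content"

for name in ("sha3_256", "sha1"):
    bits, hex_digits = describe_digest(name, content)
    print(f"{name}: {bits} bits, {hex_digits} hex digits")

# A strict (RFC 4122 style) UUID for comparison: 128 bits, written as
# 32 hex digits plus 4 dashes, with a few bits reserved for the version.
u = uuid.uuid4()
print(f"uuid4: {len(u.hex) * 4} bits, text form has {len(str(u))} characters, "
      f"version field = {u.version}")
```

Both hash formats come out longer than the 128 bits of a strict UUID, and neither reserves bits for a version field, which is why the text calls them UUIDs only in the literal sense.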