Fossil SCM

Updates to the doc/hashes.md document.

drh 2020-05-28 17:37 trunk
Commit 364307951cf5224d20c19b93bb53c7dacd5c98fa14220f88cd528083b27c4183
1 file changed +54 -45
--- www/hashes.md
+++ www/hashes.md
@@ -3,14 +3,15 @@
 All artifacts in Fossil are identified by a unique hash, currently using
 [the SHA3 algorithm by default][hpol], but historically using the SHA1
 algorithm. Therefore, there are two full-length hash formats used by
 Fossil:
 
-| Algorithm | Raw Bits | Hex ASCII Bytes |
-|-----------|----------|-----------------|
-| SHA3-256 | 256 | 64 |
-| SHA1 | 160 | 40 |
+<table border="1" cellspacing="0" cellpadding="10">
+<tr><th>Algorithm<th>Raw Bits<th>Hexadecimal digits
+<tr><td>SHA3-256<td>256<td>64
+<tr><td>SHA1<td>160<td>40
+</table>
 
 There are many types of artifacts in Fossil: commits (a.k.a. check-ins),
 tickets, ticket comments, wiki articles, forum postings, file data
 belonging to check-ins, etc. ([More info...](./concepts.wiki#artifacts)).
 
@@ -47,85 +48,95 @@
 “abc123”, then that is a valid version string as long as it remains
 unambiguous.
 
 
 
-## <a id="uvh"></a>UUIDs: An Unfortunate Historical Artifact
-
-Historically, Fossil incorrectly used the term “[UUID][uuid]” where it
-should use the term “artifact hash” instead. There are two primary
-problems with miscalling Fossil artifact hashes UUIDs:
-
-1. UUIDs are always 128 bits in length — 32 hex ASCII bytes — making
- them shorter than any actual Fossil artifact hash.
-
-2. Artifact hashes are necessarily highly pseudorandom blobs, but only
- [version 4 UUIDs][v4] are pseudorandom in the same way. Other UUID
- types have non-random meanings for certain subgroups of the bits,
- restrictions that Fossil artifact hashes do not meet.
-
-Therefore, no Fossil hash can ever be a proper UUID.
-
-Nevertheless, there are several places in Fossil where we still use the
-term UUID, primarily for backwards compatibility:
+## <a id="uvh"></a>Unconventional Use Of The Term "UUID"
+
+"UUID" is an acronym for "Univerially Unique Identifier". Hashes
+generated by SHA1 or SHA3-256 are universally unique (in practice,
+if not in theory) and they identify a particular artifact, and so
+it seems reasonable to refer to artifact hashes as UUIDs.
+
+However, the term UUID has acquired a much stricter meaning than its
+name alone implies. Purists insist that UUIDs must be *exactly* 128 bits,
+that they must be displayed in a particular hexadecimal format that includes
+dashes at proscribed intervals, and that they must have four particular bits
+set aside to indicate the "type" of UUID. Fossil artifact hashes do not
+comply with any of these supplemental requirements, and so are not UUIDs
+in the strictest sense of the word. But the artifact hashes in Fossil are
+literally "univerally unique identifiers", and so they are sometimes
+called "UUIDs" anyhow.
+
+Some readers are greatly annoyed by Fossil's use of "UUID" in its most
+literal sense. To those readers, the designer apologizes, and seeks your
+mercy by noting that when the term "UUID" first began to be used by Fossil,
+only SHA1 was supported and so all the artifact hashes were 128 bits, making
+them close to, if not exactly, in compliance with the rigid definition
+of the term. For his misuse of the term "UUID", the designer has been
+frequently rebuked.
+Some efforts have been made, over the ensuing years, to avoid and replace
+"UUID" in newer code and documentation.
+But it does not seem like such a serious issue as to require an immediate
+purge of the term from existing documentation, code, and database schemas,
+as some have suggested. Hence, the unconventional use of the term "UUID"
+lingers on in Fossil. Let new readers beware.
+
+Places where the non-conforming use of "UUID" persists in Fossil are
+discussed in the sequel.
 
 
 ### Repository DB Schema
 
-Almost all of these uses flow from the `blob.uuid` table column. This is
+Almost all remaining uses of the term "UUID" in Fossil derive
+from the `blob.uuid` table column. This is
 a key lookup column in the most important persistent Fossil DB table, so
 it influences broad swaths of the Fossil internals.
 
-Someday we may rename this column and those it has influenced (e.g.
-`purgeitem.uuid`, `shun.uuid`, and `ticket.tkt_uuid`) by making Fossil
-detect the outdated schema and silently upgrade it, coincident with
-updating all of the SQL in Fossil that refers to these columns. Until
-then, Fossil will continue to have “UUID” all through its internals.
+It is theoretically possible to rename this column and those it has
+influenced (e.g. `purgeitem.uuid`, `shun.uuid`, and `ticket.tkt_uuid`)
+by making Fossil detect the outdated schema and silently upgrade it,
+coincident with updating all of the SQL in Fossil that refers to these
+columns. But that is a large and error-prone edit that does
+serve any pressing need, and so is unlikely to happen any time soon.
+Hence, Fossil will likely continue to have “UUID” all through its internals.
 
 In order to avoid needless terminology conflicts, Fossil code that
-refers to these misnamed columns also uses some variant of “UUID.” For
+refers to these columns also uses some variant of “UUID.” For
 example, C code that refers to SQL result data on `blob.uuid` usually
 calls the variable `zUuid`. Another example is the internal function
-`uuid_to_rid()`. Until and unless we decide to rename these DB columns,
-we will keep these associated internal identifiers unchanged.
+`uuid_to_rid()`. Until and unless the columns are renamed,
+these associated function names will likely also go unchanged.
 
 You may have local SQL code that digs into the repository DB using these
 column names. If so, be warned: we are not inclined to consider
 existence of such code sufficient reason to avoid renaming the columns.
 The Fossil repository DB schema is not considered an external user
 interface, and internal interfaces are subject to change at any time. We
 suggest switching to a more stable API: the JSON API, `/timeline.rss`,
 TH1, etc.
 
-There are also some temporary tables that misuse “UUID” in this way.
-(`description.uuid`, `timeline.uuid`, `xmark.uuid`, etc.) There’s a good
-chance we’ll fix these before we fix the on-disk DB schema since no
-other code can depend on them.
-
 
 ### TH1 Scripting Interfaces
 
 Some [TH1](./th1.md) interfaces use “UUID” where they actually mean some
 kind of hash. For example, the `$tkt_uuid` variable, available via TH1
 when [customizing Fossil’s ticket system][ctkt].
 
 Because this is considered a public programming interface, we are
 unwilling to unilaterally rename such TH1 variables, even though they
-are “wrong.” For now, we are simply documenting the misuse. Later, we
-may provide a parallel interface — e.g. `$tkt_hash` in this case — and
-drop mention of the old interface from the documentation, but still
-support it.
+are "wrong". For now, we are simply documenting the unconventional
+terminology.
 
 
 ### JSON API Parameters and Outputs
 
-The JSON API frequently misuses the term “UUID” in the same sort of way,
+The JSON API frequently uses the term “UUID” in the same sort of way,
 most commonly in [artifact][jart] and [timeline][jtim] APIs. As with the
 prior case, we can’t fix these without breaking code that uses the JSON
 API as originally designed, so our solutions are the same: document the
-misuse here for now, then possibly provide a backwards-compatible fix
-later.
+unconventional usage.
 
 
 ### `manifest.uuid`
 
 If you have [the `manifest` setting][mset] enabled, Fossil writes a file
@@ -139,7 +150,5 @@
 [hpol]: ./hashpolicy.wiki
 [jart]: ./json-api/api-artifact.md
 [jtim]: ./json-api/api-timeline.md
 [mset]: /help?cmd=manifest
 [tvb]: ./branching.wiki
-[uuid]: https://en.wikipedia.org/wiki/Universally_unique_identifier
-[v4]: https://en.wikipedia.org/wiki/Universally_unique_identifier#Version_4_(random)
 
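
Reviewer's note: the table touched by this change gives the digest sizes as 160 bits / 40 hexadecimal digits for SHA1 and 256 bits / 64 hexadecimal digits for SHA3-256, while a strict UUID is 128 bits / 32 hexadecimal digits. The following is a minimal sketch that checks those lengths with Python's standard `hashlib` and `uuid` modules. The `sample` bytes and the `hex_digits` helper are illustrative assumptions and are not part of Fossil.

```python
import hashlib
import uuid


def hex_digits(algorithm: str, data: bytes) -> int:
    """Return how many hexadecimal digits the digest of `data` occupies."""
    return len(hashlib.new(algorithm, data).hexdigest())


# Hypothetical content; not a real Fossil artifact.
sample = b"example artifact content"

sha1_len = hex_digits("sha1", sample)      # 160-bit digest
sha3_len = hex_digits("sha3_256", sample)  # 256-bit digest
uuid_len = len(uuid.uuid4().hex)           # 128-bit strict UUID, dashes omitted

# Each hexadecimal digit encodes 4 bits.
for name, digits in (("SHA1", sha1_len), ("SHA3-256", sha3_len), ("UUID", uuid_len)):
    print(f"{name}: {digits * 4} bits, {digits} hex digits")
```

Neither digest length matches the 32 hexadecimal digits of a strict UUID, which is the size mismatch the old and new wording of the section both turn on.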
