Fossil SCM

Added fossil-is-not-relational.md.

stephan 2021-02-12 03:45 trunk
Commit 8da2f2ae84cec51c19fdaac71e009fcdf77ea5e876ec848ae79ecfde26735e44
--- a/www/fossil-is-not-relational.md
+++ b/www/fossil-is-not-relational.md
@@ -0,0 +1,386 @@
1
+# Fossil is not Relational
2
+
3
+***An Introduction to the Fossil Data Model***
4
+
5
+Upon hearing that Fossil is based on sqlite, it's natural for people
6
+unfamiliar with its internals to assume that Fossil stores its
7
+SCM-relevant data in a database-friendly way and that the SCM history
8
+could be modified via SQL. The truth, however, is *far stranger than
9
+that.*
10
+
11
+This document introduces, at a relatively high level:
12
+
13
+1) The underlying enduring and immutable data format, which is
14
+ independent of any specific storage engine.
15
+
16
+2) The `blob` table: Fossil's single point of SCM-relevant data
17
+ storage.
18
+
19
+3) The transformation of (1) from its immutable raw form to a
20
+ *transient* database-friendly form.
21
+
22
+4) Some of the consequences of this model.
23
+
39
+
40
+
41
+# Part 1: Artifacts
42
+
43
+```pikchr center
44
+AllObjects: [
45
+A: file "Artifacts" fill lightskyblue;
46
+down; move to A.s; move 50%;
47
+F: file "Client" "files";
48
+right; move 1; up; move 50%;
49
+B: cylinder "blob table"
50
+right;
51
+arrow from A.e to B.w;
52
+arrow from F.e to B.w;
53
+arrow dashed from B.e;
54
+C: box rad 0.1 "Crosslink" "process";
55
+arrow
56
+AUX: cylinder "Auxiliary" "tables"
57
+arc -> cw dotted from AUX.s to B.s;
58
+] # end of AllObjects
59
+```
60
+
61
+
62
+The centerpiece of Fossil's architecture is a data format which
63
+describes what we call "artifacts." Each artifact represents the state
64
+of one atomic unit of SCM-relevant data, such as a single checkin, a
65
+single wiki page edit, a single modification to a ticket, creation or
66
+cancellation of tags, and similar SCM constructs. In the cases of
67
+checkins and ticket updates, an artifact may record changes to
68
+multiple files or ticket fields, respectively, but the change as a whole
69
+is atomic. Though we often refer to both fossil-specific SCM data
70
+and client-side content as artifacts, this document uses the term
71
+artifact solely for the former purpose.
72
+
73
+From [the data format's main documentation][dataformat]:
74
+
75
+> The global state of a fossil repository is kept simple so that it
76
+> can endure in useful form for decades or centuries. A fossil
77
+> repository is intended to be readable, searchable, and extensible by
78
+> people not yet born.
79
+
80
+[dataformat]: ./fileformat.wiki
81
+
82
+This format has the following major properties:
83
+
84
+- It is <u>**syntactically simple**</u>, easily and efficiently
85
+ parsable in any programming language. It is also entirely
86
+ human-readable.
87
+
88
+- It is <u>**immutable**</u>. An artifact is identified by its unique
89
+ hash value. Any modification to an artifact changes that hash,
90
+ thereby changing its identity.
91
+
92
+- It is <u>**not generic**</u>. It is custom-made for its purpose and
93
+ makes no attempt at providing a generic format. It contains *only*
94
+ what it *needs* to function, with zero bloat.
95
+
96
+- It <u>**holds all SCM-relevant data except for client-level file
97
+ content**</u>, the latter instead being referenced by their unique
98
+ hash values. Storage of the client-side content is an implementation
140
+ detail delegated to higher-level applications.
141
+
142
+- <u>**Auditability**</u>. By following the hash references in
143
+ artifacts it is possible to unambiguously trace the origin of any
144
+ modification to the SCM state. Combined with higher-level tools
145
+ (specifically, Fossil's database), this audit trail can easily be
146
+ traced both backwards and forwards in time, using any given version
147
+ in the SCM history as a starting point.
148
+
149
+Notably, the artifact file format <u>does not</u>...
150
+
151
+- Specify any specific storage mechanism for the SCM's raw bytes,
152
+ which includes both artifacts themselves and client-side file
153
+ content. The file format refers to all such content solely by its
154
+ unique hash value.
155
+
156
+- Specify any optimizations such as storing file-level changes as
157
+ deltas between two versions of that content.
158
+
159
+Such aspects are all considered to be implementation details of
160
+higher-level applications (be they in the main fossil binary or a
161
+hypothetical 3rd-party application), and have no effect on the
162
+underlying artifact data model. That said, in Fossil:
163
+
164
+- All raw byte content (artifacts and client files) is stored in
165
+ the `blob` database table.
166
+
167
+- Fossil uses delta and zlib compression to keep the storage size of
168
+ changes from one version of a piece of content to the next to a
169
+ minimum.
170
+
171
+
172
+## Sidebar: SCM-relevant vs Non-SCM-relevant State
173
+
174
+Certain data in Fossil are "SCM-relevant" and certain data are not. In
175
+short, SCM-relevant data are managed in a way consistent with
176
+controlled versioning of that data. Conversely, non-SCM-relevant data
177
+are essentially any state neither specified by nor unambiguously
178
+referenced by the artifact file format and are therefore not
179
+versioned.
180
+
181
+SCM-relevant state includes:
182
+
183
+- Any and all data stored in the bodies of artifacts. This includes,
184
+ but is not limited to: wiki/ticket/forum content, tags, file names
185
+ and Fossil-side permissions, the name of each user who introduces
186
+ any given artifact into the data store, the timestamp of each such
187
+ change, the inheritance tree of checkins, and many other pieces of
188
+ metadata.
189
+
190
+- Raw file content of versioned files. These data are external to
191
+ artifacts, which refer to them by their hashes. How they are stored
192
+ is not the concern of the data model, but (spoiler alert!) Fossil
193
+ stores them in an sqlite database, one record per distinct hash, in
194
+ its `blob` table (which we will cover more very soon).
+
+Non-SCM-relevant state includes:
+
+- Fossil's list of users and their metadata (permissions, email
195
+ address, etc.). Artifacts themselves reference users only by their
196
+ user names. Artifacts neither care whether, nor guarantee that, user
197
+ "drh" in one artifact is in fact the same "drh" referenced in
198
+ another artifact.
199
+
200
+- All Fossil UI configuration, e.g. the site's skin, config settings,
201
+ and project name.
202
+
203
+- In short, any tables in a Fossil repository file except for the
204
+ `blob` table. Most, but not all, of these tables are transient
205
+ caches for the data specified by the artifact files (which are
206
+ stored in the `blob` table), and can safely be destroyed and rebuilt
207
+ from the collection of artifacts with no loss of state to the
208
+ repository. *All* of them, except for `blob` and `delta`, can be
209
+ destroyed with no loss of *SCM-relevant* data.
210
+
211
+## Terminology Hair-splitting: Manifest vs. Artifact
212
+
213
+We sometimes refer to artifacts as "manifests," which is technically a
214
+term for artifacts which record checkins. The various other artifact
215
+types are arguably not "manifests," but are sometimes referred to as
216
+such because the internal APIs use that term.
217
+
218
+
219
+## A Very Basic Example
220
+
221
+The following artifact, truncated for brevity, represents a typical
222
+checkin artifact (a.k.a. a manifest):
223
+
224
+```
225
+C Bug\sfix\sin\sthe\slocal\sdatabase\sfinder.
226
+D 2007-07-30T13:01:08
227
+F src/VERSION 24bbb3aad63325ff33c56d777007d7cd63dc19ea
228
+F src/add.c 1a5dfcdbfd24c65fa04da865b2e21486d075e154
229
+F src/blob.c 8ec1e279a6cd0cfd5f1e3f8a39f2e9a1682e0113
230
+<SNIP>
231
+F www/selfcheck.html 849df9860df602dc2c55163d658c6b138213122f
232
+P 01e7596a984e2cd2bc12abc0a741415b902cbeea
233
+R 74a0432d81b956bfc3ff5a1a2bb46eb5
234
+U drh
235
+Z c9dcc06ecead312b1c310711cb360bc3
236
+```
237
+
238
+Each line is a single data record called a "card." The first letter of
239
+each line tells us the type of data stored on that line and the
240
+following space-separated tokens contain the data for that
241
+line. Tokens which themselves contain spaces (notably the checkin
242
+comment) have those escaped as `\s`. The raw text of wiki
243
+pages/comments, forum posts, and ticket bodies/comments is stored
244
+directly in the corresponding artifact, but is stored in a way which
245
+makes such escaping unnecessary.
246
+
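To make the card structure concrete, here is a minimal Python sketch that splits an artifact into cards and undoes the `\s` escaping. It is not Fossil's own parser, which is far stricter. The names `unescape_token` and `parse_cards` are ours, and the sketch handles only the space escape mentioned above; any other escape sequences are left untouched.

```python
from typing import List, Tuple

Card = Tuple[str, List[str]]  # (card type letter, list of tokens)


def unescape_token(token: str) -> str:
    """Undo the `\\s` escaping of spaces (the only escape handled here)."""
    return token.replace("\\s", " ")


def parse_cards(artifact: str) -> List[Card]:
    """Split artifact text into (type, tokens) pairs, one per line."""
    cards: List[Card] = []
    for line in artifact.splitlines():
        if not line:
            continue
        card_type, _, rest = line.partition(" ")
        tokens = [unescape_token(t) for t in rest.split(" ")] if rest else []
        cards.append((card_type, tokens))
    return cards


# A shortened copy of the example above, without the R and Z cards.
SAMPLE = (
    "C Bug\\sfix\\sin\\sthe\\slocal\\sdatabase\\sfinder.\n"
    "D 2007-07-30T13:01:08\n"
    "F src/VERSION 24bbb3aad63325ff33c56d777007d7cd63dc19ea\n"
    "F src/add.c 1a5dfcdbfd24c65fa04da865b2e21486d075e154\n"
    "P 01e7596a984e2cd2bc12abc0a741415b902cbeea\n"
    "U drh\n"
)

cards = parse_cards(SAMPLE)
# Map each file name to the hash of its content, taken from the F cards.
files = {tokens[0]: tokens[1] for kind, tokens in cards if kind == "F"}
print(cards[0][1][0])
```

Note that the `F` cards yield only names and hashes: the file content itself is nowhere in the artifact.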
247
+The hashes seen above are a critical component of the architecture:
248
+
249
+- The `F` (file) records refer to the content of those files by the
250
+hash of that content. Where that content is stored is *not* specified
251
+by the data model.
252
+
253
+- The `P` (parent) line is the hash code of the parent version (itself
254
+ an artifact).
255
+
256
+- The `Z` line is a hash of all of the content of *this artifact*
257
+ which precedes the `Z` line. Thus any change to the content of an
258
+ artifact changes both the artifact's identity (its hash) and its `Z`
259
+ value, making it impossible to inject modified artifacts into an
260
+ existing artifact tree.
261
+
262
+- The `R` line is yet another consistency-checking hash which we won't
263
+ go into here except to say that it's an internal consistency
264
+ check/line of defense against modification of file content
265
+ referenced by the artifact.
266
+
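The effect of the `Z` card and of hash-based identity can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Fossil's code: we assume the 32-digit `Z` value is an MD5 and that the artifact's identity is a SHA-1 of its full text, as the hash lengths in the example suggest, and the helper names are hypothetical.

```python
import hashlib


def append_z_card(body: str) -> str:
    """Append a Z card holding the MD5 of everything that precedes it."""
    checksum = hashlib.md5(body.encode("utf-8")).hexdigest()
    return f"{body}Z {checksum}\n"


def z_card_is_valid(artifact: str) -> bool:
    """Recompute the checksum over all content before the final Z line."""
    body, sep, tail = artifact.rpartition("\nZ ")
    if not sep:
        return False  # no Z card at all
    body += "\n"  # the newline before "Z" belongs to the hashed content
    return hashlib.md5(body.encode("utf-8")).hexdigest() == tail.strip()


def artifact_id(artifact: str) -> str:
    """Identity of an artifact: the hash of its complete text (SHA-1 assumed)."""
    return hashlib.sha1(artifact.encode("utf-8")).hexdigest()


original = append_z_card("D 2007-07-30T13:01:08\nU drh\n")
tampered = original.replace("U drh", "U eve")

print(z_card_is_valid(original), z_card_is_valid(tampered))
print(artifact_id(original) != artifact_id(tampered))
```

Someone editing an artifact could of course recompute the `Z` card as well, but the artifact's own hash would still change, so any `P` card in a child artifact that names the original would no longer refer to the edited copy.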
267
+# Part 2: The `blob` Table
268
+
269
+```pikchr center
270
+AllObjects: [
271
+A: file "Artifacts";
272
+down; move to A.s; move 50%;
273
+F: file "Client" "files" fill lightskyblue;
274
+right; move 1; up; move 50%;
275
+B: cylinder "blob table" fill lightskyblue;
276
+right;
277
+arrow from A.e to B.w;
278
+arrow from F.e to B.w;
279
+arrow dashed from B.e;
280
+C: box rad 0.1 "Crosslink" "process";
281
+arrow
282
+AUX: cylinder "Auxiliary" "tables"
283
+arc -> cw dotted from AUX.s to B.s;
284
+] # end of AllObjects
285
+```
286
+
287
+
288
+The `blob` table is the core-most storage of a Fossil repository
289
+database, storing all SCM-relevant data (and *only* SCM-relevant
290
+data). Each row of this table holds a single artifact or the content
291
+for a single version of a single client-side file. Slightly truncated
292
+for clarity, its schema contains the following fields:
293
+
294
+- **`uuid`**: the hash code of the blob's contents.
295
+- **`rid`**: a unique integer key for this record. This is how the
296
+ blob table is mapped to other (transient) tables, but the RIDs are
297
+ specific to one given copy of a repository and must not be used for
298
+ cross-repository referencing. The RID is a private/internal value of
299
+ no use to a user unless they're building SQL queries for use with
300
+ the Fossil db schema.
301
+- **`size`**: the size, in bytes, of the blob's contents, or -1 for
302
+ "phantom" blobs (those which Fossil knows should exist because it's
303
+ seen them referenced somewhere, but for which it has not been given
304
+ any content).
305
+- **`content`**: the blob's raw content bytes, with the caveat that
306
+ Fossil is free to store it in an "alternate representation."
307
+ Specifically, the `content` field often holds a zlib-compressed
308
+ delta from a previous version of the blob's content (a separate
309
+ entry in the `blob` table), and an auxiliary table named `delta`
310
+ maps such blobs to their previous versions, such that Fossil can
311
+ reconstruct the real content from them by applying the delta to its
312
+ previous version (and such deltas may be chained). Thus extraction
313
+ of the content from this field cannot be performed via vanilla SQL,
314
+ and requires a Fossil-specific function which knows how to convert
315
+ any internal representations of the content to its original form.
316
+
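The following Python sketch shows why the `content` field is opaque to plain SQL. It uses a deliberately simplified stand-in for the `blob` table, not Fossil's real schema or on-disk encoding: content is merely zlib-compressed, deltas and phantoms are not modeled, SHA-1 is assumed for `uuid`, and the `store` and `load` helpers are hypothetical.

```python
import hashlib
import sqlite3
import zlib

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE blob("
    " rid INTEGER PRIMARY KEY,"  # repository-local integer key
    " uuid TEXT UNIQUE,"         # hash of the uncompressed content
    " size INTEGER,"             # size of the uncompressed content
    " content BLOB)"             # stored in an alternate representation
)


def store(data: bytes) -> int:
    """Insert content in compressed form and return its rid."""
    cur = db.execute(
        "INSERT INTO blob(uuid, size, content) VALUES (?, ?, ?)",
        (hashlib.sha1(data).hexdigest(), len(data), zlib.compress(data)),
    )
    return cur.lastrowid


def load(rid: int) -> bytes:
    """Application-level step that plain SQL cannot perform on its own."""
    (stored,) = db.execute(
        "SELECT content FROM blob WHERE rid = ?", (rid,)
    ).fetchone()
    return zlib.decompress(stored)


text = b"hello, fossil\n" * 20
rid = store(text)

# A plain SELECT returns the stored representation, not the original bytes.
raw = db.execute("SELECT content FROM blob WHERE rid = ?", (rid,)).fetchone()[0]
size = db.execute("SELECT size FROM blob WHERE rid = ?", (rid,)).fetchone()[0]
print(len(raw), size, load(rid) == text)
```

With deltas in play the situation is stricter still, since recovering one row's content may require first recovering the content of other rows.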
317
+
318
+## Sidebar: How does `blob` Distinguish Between Artifacts and Client Content?
319
+
320
+Notice that the `blob` table has no flag saying "this record is an
321
+artifact" or "this record is client data." Similarly, there is no
322
+place in the database dedicated to keeping track of which `blob`
323
+records are artifacts and which are file content.
324
+
325
+That said, (A) the type of a blob can be implied via certain table
326
+relationships and (B) the `event` table (the `/timeline`'s main data
327
+source) incidentally has a list of artifacts and their sub-types
328
+(checkin, wiki, tag, etc.). However, given that all of those
329
+relationships, including the timeline, are *transient*, how can Fossil
330
+distinguish between the two types of data?
331
+
332
+Fossil's artifact format is extremely rigid and is *strictly* enforced
333
+internally, with zero room provided for leniency. Every artifact which
334
+is internally created is re-parsed for validity before it is committed
335
+to the database, making it impossible for Fossil to inject an
336
+invalid artifact into the repository. Because of the strictness of the
337
+artifact parser, the chances that any given piece of arbitrary
+client-side data could be successfully parsed as an artifact, even if it is
338
+syntactically 99% similar to an artifact, are *effectively zero*.
339
+
340
+Thus Fossil's rule of interpreting the contents of the blob table is:
341
+if it can be parsed as an artifact, it *is* an artifact, else it is
342
+opaque client-side data.
343
+
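The rule can be sketched as follows. The checker here is a toy stand-in for Fossil's parser and is far more lenient than the real thing: it only understands artifacts made of one card per line, requires the card letters to be in sorted order, and assumes the final `Z` card is an MD5 of the preceding content. The function names are hypothetical.

```python
import hashlib
import re

# One card per line: an uppercase type letter, a space, then the payload.
CARD_LINE = re.compile(rb"[A-Z] [^\n]*")


def parses_as_artifact(blob: bytes) -> bool:
    """Toy strict check: only cards, in sorted order, ending in a valid Z card."""
    if not blob.endswith(b"\n"):
        return False
    lines = blob[:-1].split(b"\n")
    if len(lines) < 2 or any(not CARD_LINE.fullmatch(line) for line in lines):
        return False
    letters = [line[:1] for line in lines]
    if letters != sorted(letters) or letters[-1] != b"Z":
        return False
    # Hash everything up to and including the newline before the Z card.
    body = blob[: blob.rindex(b"\nZ ") + 1]
    return hashlib.md5(body).hexdigest().encode("ascii") == lines[-1][2:]


def classify(blob: bytes) -> str:
    """If it parses as an artifact it is one, else it is opaque client data."""
    return "artifact" if parses_as_artifact(blob) else "client data"


body = b"D 2007-07-30T13:01:08\nU drh\n"
artifact = body + b"Z " + hashlib.md5(body).hexdigest().encode("ascii") + b"\n"
near_miss = artifact.replace(b"U drh", b"U eve")  # looks right, checksum is stale
source_file = b"int main(void){ return 0; }\n"

for blob in (artifact, near_miss, source_file):
    print(classify(blob))
```

Even this lenient checker rejects the near miss, which hints at why a file that merely resembles an artifact is so unlikely to survive the real parser.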
344
+That rule is most often relevant in operations like `rebuild` and
345
+`reconstruct`, both of which necessarily have to sort out artifacts
346
+and non-artifact blobs from arbitrary collections of blobs.
347
+
348
+It is, in fact, possible to store an artifact unrelated to the current
349
+repository in that repository, and it *will be parsed and processed as
350
+an artifact* (see below), but it likely refers to other artifacts or
351
+blobs which are not part of the current repository, thereby possibly
352
+introducing "strange" data into the UI. If this happens, it's
353
+potentially slightly confusing but is functionally harmless.
354
+
355
+
356
+# Part 3: Crosslinking
357
+
358
+```pikchr center
359
+AllObjects: [
360
+A: file "Artifacts";
361
+down; move to A.s; move 50%;
362
+F: file "Client" "files";
363
+right; move 1; up; move 50%;
364
+B: cylinder "blob table"
365
+right;
366
+arrow from A.e to B.w;
367
+arrow from F.e to B.w;
368
+arrow dashed from B.e;
369
+C: box rad 0.1 "Crosslink" "process" fill lightskyblue;
370
+arrow
371
+AUX: cylinder "Auxiliary" "tables" fill lightskyblue;
372
+arc -> cw dotted from AUX.s to B.s;
373
+] # end of AllObjects
374
+```
375
+
376
+Once an artifact is stored in the `blob` table, how does one perform
377
+SQL queries against its plain-text format? In short: *One Does Not
378
+Simply Query the Artifacts*.
379
+
380
+Crosslinking, as it's colloquially known, is a one-way processing step
381
+which transforms an immutable artifact's state into something
382
+database-friendly. Crosslinking happens automatically every time
383
+Fossil generates, or is given, a new artifact. Crosslinking of any
384
+given artifact may update many different auxiliary tables, *all* of
385
+which are transient in the sense that they may be destroyed and then
386
+recreated by crosslinking all artifacts
--- a/www/fossil-is-not-relational.md
+++ b/www/fossil-is-not-relational.md
@@ -0,0 +1,386 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
--- a/www/fossil-is-not-relational.md
+++ b/www/fossil-is-not-relational.md
@@ -0,0 +1,386 @@
1 # Fossil is not Relational
2
3 ***An Introduction to the Fossil Data Model***
4
5 Upon hearing that Fossil is based on sqlite, it's natural for people
6 unfamiliar with its internals to assume that Fossil stores its
7 SCM-relevant data in a database-friendly way and that the SCM history
8 could be modified via SQL. The truth, however, is *far presentation than
9 that.*
10
11 This document introduces, at a relatively high level:
12
13 1) The underlying enduring and immutable data format, which is
14 independent of any specific storage engine.
15
16 2) The `blob` table: Fossil's single point of SCM-relevant data
17 storage.
18
19 3) The transformation of (1) from its immutable raw form to a
20 *transient* database-friendly form.
21
22 4) Some of the consequences of this model.
23
24 <!--ral for people
25 unfamiliar with its internals to assume that Fossil stores its
26 S# Fossil is not Relational
27
28 %;
29 B: cylinder "blob table"
30 rightAUX: cylinder "Auxiliary" "tables"
31 arc -> cw dotted from AUX.s to B.s;
32 ] # end of AllObjects
33 ```
34
35
36 The `blob` table is the core-most storage of a F// text "Architecture Overview" big bold at .1cm above north of AllObjects
37 ```
38 -->consequences of this model.
39
40
41 # Part 1: Artifacts
42
43 ```pikchr center
44 AllObjects: [
45 A: file "Artifacts" fill lightskyblue;
46 down; move to A.s; move 50%;
47 F: file "Client" "files";
48 right; move 1; up; move 50%;
49 B: cylinder "blob table"
50 right;
51 arrow from A.e to B.w;
52 arrow from F.e to B.w;
53 arrow dashed from B.e;
54 C: box rad 0.1 "Crosslink" "process";
55 arrow
56 AUX: cylinder "Auxiliary" "tables"
57 arc -> cw dotted from AUX.s to B.s;
58 ] # end of AllObjects
59 ```
60
61
62 The centerpiece of Fossil's architecture is a data format which
63 describes what we call "artifacts." Each artifact represents the state
64 of one atomic unit of SCM-relevant data, such as a single checkin, a
65 single wiki page edit, a single modification to a ticket, creation or
66 cancellation of tags, and similar SCM constructs. In the cases of
67 checkins and ticket updates, an artifact may record changes to
68 multiple files resp. ticket fields, but the change as a whole
69 is atomic. Though we often refer to both fossil-specific SCM data
70 and client-side content as artifacts, this document uses the term
71 artifact solely for the former purpose.
72
73 From [the data format's main documentation][dataformat]:
74
75 > The globobal state of a fossil repository is kept simple so that it
76 > can endure in useful form for decades or centuries. A fossil
77 > repository is intended to be readable, searchable, and extensible by
78 > people not yet born.
79
80 [dataformat]: ./fileformat.wiki
81
82 This format has the following major properties:
83
84 - It is <u>**syntactically simple**</u>, easily and efficiently
85 parsable in any programming language. It is also entirely
86 human-readable.
87
88 - It is <u>**immutable**</u>. An artifact is identified by its unique
89 hash value. Any modification to an artifact changes that hash,
90 thereby changing its identity.
91
92 - It is <u>**not generic**</u>. It is custom-made for its purpose and
93 makes no attempt at providing a generic format. It contains *only*
94 what it *needs* to function, with zero bloat.
95
96 - It <u>**holds all SCM-relevant data except for client-level file
97 content**</u>, the latter instead being referenced by their unique
98 hash values. Storagaarc -> cw dotted from AUX.s to B.s;
99 ] # end of AllObjects
100 ```
101
102
103 The centerpiece of Fossil's architecture is a data format which
104 describes what we call "artifacts." Each artifact represents the state
105 of one atomic unit of SCM-relevant data, such as a single checkin, a
106 single wiki page edit, a single modification to a ticket, creation or
107 cancellation of tags, and similar SCM constructs. In the cases of
108 checkins and ticket updates, an artifact may record changes to
109 multiple files resp. ticket fields, but the change as a whole
110 is atomic. Though we often refer to both fossil-specific SCM data
111 and client-side content as artifacts, this document uses the term
112 artifact solely for the former purpose.
113
114 From [the data format's main documentation][dataformat]:
115
116 > The globobal state of a fossil repository is kept simple so that it
117 > can endure in useful form for decades or centuries. A fossil
118 > repository is intended to be readable, searchable, and extensible by
119 > people not yet born.
120
121 [dataformat]: ./fileformat.wiki
122
123 This format has the following major properties:
124
125 - It is <u>**syntactically simple**</u>, easily and efficiently
126 parsable in any programming language. It is also entirely
127 human-readable.
128
129 - It is <u>**immutable**</u>. An artifact is identified by its unique
130 hash value. Any modification to an artifact changes that hash,
131 thereby changing its identity.
132
133 - It is <u>**not generic**</u>. It is custom-made for its purpose and
134 makes no attempt at providing a generic format. It contains *only*
135 what it *needs* to function, with zero bloat.
136
137 - It <u>**holds all SCM-relevant data except for client-level file
138 content**</u>, the latter instead being referenced by their unique
139 hash values. Storage of the client-side content is an implementation
140 detail delegated to higher-level applications.
141
142 - <u>**Auditability**</u>. By following the hash references in
143 artifacts it is possible to unambiguously trace the origin of any
144 modification to the SCM state. Combined with higher-level tools
145 (specifically, Fossil's database), this audit trail can easily be
146 traced both backwards and forwards in time, using any given version
147 in the SCM history as a starting point.
148
149 Notably, the artifact file format <u>does not</u>...
150
151 - Specify any specific storage mechanism for the SCM's raw bytes,
152 which includes both artifacts themselves and client-side file
153 content. The file format refers to all such content solely by its
154 unique hash value.
155
156 - Specify any optimizations such as storing file-level changes as
157 deltas between two versions of that content.
158
159 Such aspects are all considered to be implementation details of
160 higher-level applications (be they in the main fossil binary or a
161 hypothetical 3rd-party application), and have no effect on the
162 underlying artifact data model. That said, in Fossil:
163
164 - All raw byte content (artifacts and client files) is stored in
165 the `blob` database table.
166
167 - Fossil uses delta and zlib compression to keep the storage size of
168 changes from one version of a piece of content to the next to a
169 minimum.
170
171
172 ## Sidebar: SCM-relevant vs Non-SCM-relevant State
173
174 Certain data in Fossil are "SCM-relevant" and certain data are not. In
175 short, SCM-relevant data are managed in a way consistent with
176 controlled versioning of that data. Conversely, non-SCM-relevant data
177 are essentially any state neither specified by nor unambiguously
178 refererenced by the artifact file format and are therefore not
179 versioned.
180
181 SCM-relevant state includes:
182
183 - Any and all data stored in the bodies of artifacts. This includes,
184 but is not limited to: wiki/ticket/forum content, tags, file names
185 and Fossil-side permissions, the name of each user who introduces
186 any given artifact into the data store, the timestamp of each such
187 change, the inheritance tree of checkins, and many other pieces of
188 metadata.
189
190 - Raw file content of versioned files. These data are external to
191 artifacts, which refer to them by their hashes. How they are stored
192 is not the concern of the data model, but (spoilin themt!) Fossil
193 stores them in an sqlite database, one record per distinct hash, in
194 its `blob` table (which we will cover more very soon)".users and their metadata (permissions, email
195 address, etc.). Artifacts themselves reference users only by their
196 user names. Artifacts neither care whether, nor guaranty that, user
197 "drh" in one artifact is in fact the same "drh" referenced in
198 another artifact.
199
200 - All Fossil UI configuration, e.g. the site's skin, config settings,
201 and project name.
202
203 - In short, any tables in a Fossil repository file except for the
204 `blob` table. Most, but not all, of these tables are transient
205 caches for the data specified by the artifact files (which are
206 stored in the `blob` table), and can safely be destroyed and rebuilt
207 from the collection of artifacts with no loss of state to the
208 repository. *All* of them, except for `blob` and `delta`, can be
209 destroyed with no loss of *SCM-relevant* data.
210
211 ## Terminology Hair-splitting: Manifest vs. Artifact
212
213 We sometimes refer to artifacts as "manifests," which is technically a
214 term for artifacts which record checkins. The various other artifact
215 types are arguably not "manifests," but are sometimes referred to as
216 such because the internal APIs use that term.
217
218
219 ## A Very Basic Example
220
221 The following artifact, truncated for brevity, represents a typical
222 checkin artifact (a.k.a. a manifest):
223
224 ```
225 C Bug\sfix\sin\sthe\slocal\sdatabase\sfinder.
226 D 2007-07-30T13:01:08
227 F src/VERSION 24bbb3aad63325ff33c56d777007d7cd63dc19ea
228 F src/add.c 1a5dfcdbfd24c65fa04da865b2e21486d075e154
229 F src/blob.c 8ec1e279a6cd0cfd5f1e3f8a39f2e9a1682e0113
230 <SNIP>
231 F www/selfcheck.html 849df9860df602dc2c55163d658c6b138213122f
232 P 01e7596a984e2cd2bc12abc0a741415b902cbeea
233 R 74a0432d81b956bfc3ff5a1a2bb46eb5
234 U drh
235 Z c9dcc06ecead312b1c310711cb360bc3
236 ```
237
238 Each line is a single data record called a "card." The first letter of
239 each line tells us the type of data stored on that line and the
240 following space-separated tokens contain the data for that
241 line. Tokens which themselves contain spaces (notably the checkin
242 comment) have those escaped as `\s`. The raw text of wiki
243 pages/comments, forum posts, and ticket bodies/comments is stored
244 directly in the corresponding artifact, but is stored in a way which
245 makes such escaping unnecessary.
246
247 The hashes seen above are a critical component of the architecture:
248
249 - The `F` (file) records refer to the content of those files by the
250 hash of that content. Where that content is stored is *not* specified
251 by the data model.
252
253 - The `P` (parent) line is the hash code of the parent version (itself
254 an artifact).
255
256 - The `Z` line is a hash of all of the content of *this artifact*
257 which precedes the `Z` line. Thus any change to the content of an
258 artifact changes both the artifact's identity (its hash) and its `Z`
259 value, making it impossible to inject modified artifacts into an
260 existing artifact tree.
261
262 - The `R` line is yet another consistency-checking hash which we won't
263 go into here except to say that it's an internal consistency
264 check/line of defense against modification of file content
265 referenced by the artifact.
266
267 # Part 2: The `blob` Table
268
269 ```pikchr center
270 AllObjects: [
271 A: file "Artifacts";
272 down; move to A.s; move 50%;
273 F: file "Client" "files" fill lightskyblue;
274 right; move 1; up; move 50%;
275 B: cylinder "blob table" fill lightskyblue;
276 right;
277 arrow from A.e to B.w;
278 arrow from F.e to B.w;
279 arrow dashed from B.e;
280 C: box rad 0.1 "Crosslink" "process";
281 arrow
282 AUX: cylinder "Auxiliary" "tables"
283 arc -> cw dotted from AUX.s to B.s;
284 ] # end of AllObjects
285 ```
286
287
288 The `blob` table is the core-most storage of a Fossil repository
289 database, storing all SCM-relevant data (and *only* SCM-relevant
290 data). Each row of this table holds a single artifact or the content
291 for a single version of a single client-side file. Slightly truncated
292 for clarity, its schema contains the following fields:
293
294 - **`uuid`**: the hash code of the blob's contents.
295 - **`rid`**: a unique integer key for this record. This is how the
296 blob table is mapped to other (transient) tables, but the RIDs are
297 specific to one given copy of a repository and must not be used for
298 cross-repository referencing. The RID is a private/internal value of
299 no use to a user unless they're building SQL queries for use with
300 the Fossil db schema.
301 - **`size`**: the size, in bytes, of the blob's contents, or -1 for
302 "phantom" blobs (those which Fossil knows should exist because it's
303 seen them referenced somewhere, but for which it has not been given
304 any content).
305 - **`content`**: the blob's raw content bytes, with the caveat that
306 Fossil is free to store it in an "alternate representation."
307 Specifically, the `content` field often holds a zlib-compressed
308 delta from a previous version of the blob's content (a separate
309 entry in the `blob` table), and an auxiliary table named `delta`
310 maps such blobs to their previous versions, such that Fossil can
311 reconstruct the real content from them by applying the delta to its
312 previous version (and such deltas may be chained). Thus extraction
313 of the content from this field cannot be performed via vanilla SQL,
314 and requires a Fossil-specific function which knows how to convert
315 any internal representations of the content to its original form.
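
To make the caveat about `content` concrete, here is a minimal Python sketch; it is not Fossil's implementation. It assumes a cut-down `blob` table holding only the four fields listed above and a two-column `delta` table (`rid`, `srcid`). The helper names `put`, `get`, `toy_delta` and `toy_apply` are hypothetical, and the prefix-based toy delta merely stands in for Fossil's real delta encoding.

```python
import hashlib
import sqlite3
import zlib

# Simplified stand-ins for the repository tables (assumption: the real
# schema has more columns and constraints than shown here).
SCHEMA = """
CREATE TABLE blob(
  rid     INTEGER PRIMARY KEY,  -- repository-local key
  uuid    TEXT UNIQUE NOT NULL, -- hash of the *original* content
  size    INTEGER,              -- original size in bytes, -1 for phantoms
  content BLOB                  -- stored form: compressed, possibly a delta
);
CREATE TABLE delta(
  rid   INTEGER PRIMARY KEY,    -- blob whose content is stored as a delta
  srcid INTEGER NOT NULL        -- blob the delta must be applied to
);
"""

def toy_delta(old: bytes, new: bytes) -> bytes:
    """Toy delta: length of the shared prefix, a newline, then the new tail."""
    n = 0
    while n < min(len(old), len(new)) and old[n] == new[n]:
        n += 1
    return str(n).encode() + b"\n" + new[n:]

def toy_apply(old: bytes, delta: bytes) -> bytes:
    """Inverse of toy_delta()."""
    head, tail = delta.split(b"\n", 1)
    return old[: int(head)] + tail

def get(db, rid: int) -> bytes:
    """Recover a blob's original bytes, following the delta chain if any."""
    (raw,) = db.execute("SELECT content FROM blob WHERE rid=?", (rid,)).fetchone()
    if raw is None:
        raise LookupError("phantom blob: no content has been received yet")
    data = zlib.decompress(raw)
    src = db.execute("SELECT srcid FROM delta WHERE rid=?", (rid,)).fetchone()
    return data if src is None else toy_apply(get(db, src[0]), data)

def put(db, data: bytes, src_rid=None) -> int:
    """Store data, optionally as a delta against an existing blob."""
    stored = data if src_rid is None else toy_delta(get(db, src_rid), data)
    cur = db.execute(
        "INSERT INTO blob(uuid, size, content) VALUES(?, ?, ?)",
        (hashlib.sha3_256(data).hexdigest(), len(data), zlib.compress(stored)),
    )
    if src_rid is not None:
        db.execute("INSERT INTO delta(rid, srcid) VALUES(?, ?)",
                   (cur.lastrowid, src_rid))
    return cur.lastrowid

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
v1 = put(db, b"hello world\n")
v2 = put(db, b"hello world, again\n", src_rid=v1)
# Plain SQL sees only the stored representation of v2...
stored_form = db.execute("SELECT content FROM blob WHERE rid=?", (v2,)).fetchone()[0]
# ...whereas get() decompresses it and applies the delta to v1.
original_form = get(db, v2)
```

In this sketch `stored_form` is a compressed delta that is meaningless on its own, which is the reason a plain `SELECT content` cannot substitute for a decoding function.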
316
317
318 ## Sidebar: How does `blob` Distinguish Between Artifacts and Client Content?
319
320 Notice that the `blob` table has no flag saying "this record is an
321 artifact" or "this record is client data." Similarly, there is no
322 place in the database dedicated to keeping track of which `blob`
323 records are artifacts and which are file content.
324
325 That said, (A) the type of a blob can be implied via certain table
326 relationships and (B) the `event` table (the `/timeline`'s main data
327 source) incidentally has a list of artifacts and their sub-types
328 (checkin, wiki, tag, etc.). However, given that all of those
329 relationships, including the timeline, are *transient*, how can Fossil
330 distinguish between the two types of data?
331
332 Fossil's artifact format is extremely rigid and is *strictly* enforced
333 internally, with zero room provided for leniency. Every artifact which
334 is internally created is re-parsed for validity before it is committed
335 to the database, making it impossible for Fossil to inject an
336 invalid artifact into the repository. Because of the strictness of the
337 artifact parser, the chances that any given piece of arbitrary client data could be successfully parsed as an artifact, even if it is
338 syntactically 99% similar to an artifact, are *effectively zero*.
339
340 Thus Fossil's rule for interpreting the contents of the blob table is:
341 if it can be parsed as an artifact, it *is* an artifact, else it is
342 opaque client-side data.
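
The rule above can be sketched in a few lines of Python. This is a deliberately tiny stand-in for Fossil's real artifact parser, which is far more thorough: it assumes that an artifact consists only of single-line "cards" (an uppercase letter, a space, and a value), that the cards appear in sorted order, and that the final `Z` card carries an MD5 checksum of everything before it. The function names `seal`, `looks_like_artifact` and `classify` are hypothetical.

```python
import hashlib
import re

# One card per line: an uppercase letter, a space, then a non-empty value.
CARD = re.compile(rb"[A-Z] \S.*")

def seal(body: bytes) -> bytes:
    """Append a Z card holding the MD5 checksum of the preceding cards."""
    return body + b"Z " + hashlib.md5(body).hexdigest().encode() + b"\n"

def looks_like_artifact(blob: bytes) -> bool:
    """Strict toy parser: any deviation at all means 'not an artifact'."""
    if not blob.endswith(b"\n"):
        return False
    lines = blob[:-1].split(b"\n")
    if len(lines) < 2:
        return False
    if any(CARD.fullmatch(line) is None for line in lines):
        return False
    letters = [line[0] for line in lines]
    if letters != sorted(letters):  # cards must appear in sorted order
        return False
    body = blob[: len(blob) - len(lines[-1]) - 1]  # everything before the Z card
    return lines[-1] == b"Z " + hashlib.md5(body).hexdigest().encode()

def classify(blob: bytes) -> str:
    """If it parses as an artifact it is one; otherwise it is opaque data."""
    return "artifact" if looks_like_artifact(blob) else "opaque client data"

sample = seal(b"C example\\scomment\nD 2021-02-12T03:45:00\nU stephan\n")
kinds = [classify(sample), classify(b"int main(void){ return 0; }\n")]
```

Changing even a single byte of `sample` invalidates its checksum, so the near-miss is classified as opaque client data rather than as an artifact.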
343
344 That rule is most often relevant in operations like `rebuild` and
345 `reconstruct`, both of which necessarily have to sort out artifacts
346 and non-artifact blobs from arbitrary collections of blobs.
347
348 It is, in fact, possible to store an artifact unrelated to the current
349 repository in that repository, and it *will be parsed and processed as
350 an artifact* (see below), but it likely refers to other artifacts or
351 blobs which are not part of the current repository, thereby possibly
352 introducing "strange" data into the UI. If this happens, it's
353 potentially slightly confusing but is functionally harmless.
354
355
356 # Part 3: Crosslinking
357
358 ```pikchr center
359 AllObjects: [
360 A: file "Artifacts";
361 down; move to A.s; move 50%;
362 F: file "Client" "files";
363 right; move 1; up; move 50%;
364 B: cylinder "blob table"
365 right;
366 arrow from A.e to B.w;
367 arrow from F.e to B.w;
368 arrow dashed from B.e;
369 C: box rad 0.1 "Crosslink" "process" fill lightskyblue;
370 arrow
371 AUX: cylinder "Auxiliary" "tables" fill lightskyblue;
372 arc -> cw dotted from AUX.s to B.s;
373 ] # end of AllObjects
374 ```
375
376 Once an artifact is stored in the `blob` table, how does one perform
377 SQL queries against its plain-text format? In short: *One Does Not
378 Simply Query the Artifacts*.
379
380 Crosslinking, as it's colloquially known, is a one-way processing step
381 which transforms an immutable artifact's state into something
382 database-friendly. Crosslinking happens automatically every time
383 Fossil generates, or is given, a new artifact. Crosslinking of any
384 given artifact may update many different auxiliary tables, *all* of
385 which are transient in the sense that they may be destroyed and then
386 recreated by crosslinking all artifacts
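
The transient nature of the auxiliary tables can be illustrated with a small Python sketch. Everything here is a simplifying assumption rather than Fossil's actual code: the `blob` table stores plain text, the `event` table has a made-up four-column layout, `parse_cards` is a toy parser that accepts only single-line cards, and the function names `crosslink` and `rebuild` are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE blob(rid INTEGER PRIMARY KEY, size INTEGER, content TEXT);
-- Transient: everything in this table is derived from blob.
CREATE TABLE event(rid INTEGER PRIMARY KEY, mtime TEXT, user TEXT, comment TEXT);
"""

def parse_cards(text):
    """Toy parser: return {letter: value}, or None if text is not an artifact."""
    cards = {}
    for line in text.splitlines():
        if len(line) < 3 or not line[0].isupper() or line[1] != " ":
            return None
        cards[line[0]] = line[2:]
    return cards if {"D", "U"} <= cards.keys() else None

def crosslink(db, rid, text):
    """One-way step: copy an artifact's state into the auxiliary table."""
    cards = parse_cards(text)
    if cards is None:
        return False  # opaque client data: nothing to crosslink
    db.execute(
        "INSERT OR REPLACE INTO event(rid, mtime, user, comment) VALUES(?,?,?,?)",
        (rid, cards["D"], cards["U"], cards.get("C", "")),
    )
    return True

def rebuild(db):
    """Destroy the auxiliary data, then recreate it from the blob table."""
    db.execute("DELETE FROM event")
    rows = db.execute("SELECT rid, content FROM blob WHERE size >= 0").fetchall()
    return sum(crosslink(db, rid, content) for rid, content in rows)

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
artifact = "C initial\nD 2021-02-12T03:45:00\nU stephan\n"
source = "int main(void){ return 0; }\n"
db.execute("INSERT INTO blob(size, content) VALUES(?, ?)", (len(artifact), artifact))
db.execute("INSERT INTO blob(size, content) VALUES(?, ?)", (len(source), source))
crosslinked = rebuild(db)  # number of blobs that parsed as artifacts
```

Because `event` is derived entirely from `blob`, deleting its rows and calling `rebuild` again yields the same result, whereas the reverse direction (editing `event` and expecting the artifact to change) has no counterpart in this model.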
--- www/mkindex.tcl
+++ www/mkindex.tcl
@@ -54,10 +54,11 @@
  fileformat.wiki {Fossil File Format}
  fiveminutes.wiki {Up and Running in 5 Minutes as a Single User}
  forum.wiki {Fossil Forums}
  foss-cklist.wiki {Checklist For Successful Open-Source Projects}
  fossil-from-msvc.wiki {Integrating Fossil in the Microsoft Express 2010 IDE}
+ fossil-is-not-relational.md {Introduction to the Fossil Data Model}
  fossil_prompt.wiki {Fossilized Bash Prompt}
  fossil-v-git.wiki {Fossil Versus Git}
  globs.md {File Name Glob Patterns}
  gitusers.md {Git to Fossil Translation Guide}
  grep.md {Fossil grep vs POSIX grep}
--- www/permutedindex.html
+++ www/permutedindex.html
@@ -91,10 +91,11 @@
 <li><a href="css-tricks.md">CSS Tips and Tricks &mdash; Fossil</a></li>
 <li><a href="customskin.md"><b>Custom Skins</b></a></li>
 <li><a href="customskin.md">Customizing The Appearance of Web Pages &mdash; Theming:</a></li>
 <li><a href="custom_ticket.wiki"><b>Customizing The Ticket System</b></a></li>
 <li><a href="customgraph.md">Customizing the Timeline Graph &mdash; Theming:</a></li>
+<li><a href="fossil-is-not-relational.md">Data Model &mdash; Introduction to the Fossil</a></li>
 <li><a href="tech_overview.wiki">Databases Used By Fossil &mdash; SQLite</a></li>
 <li><a href="defcsp.md">Default Content Security Policy &mdash; The</a></li>
 <li><a href="antibot.wiki"><b>Defense against Spiders and Bots</b></a></li>
 <li><a href="shunning.wiki">Deleting Content From Fossil &mdash; Shunning:</a></li>
 <li><a href="private.wiki">Deleting Private Branches &mdash; Creating, Syncing, and</a></li>
@@ -195,10 +196,11 @@
 <li><a href="build.wiki">Installing Fossil &mdash; Compiling and</a></li>
 <li><a href="fossil-from-msvc.wiki"><b>Integrating Fossil in the Microsoft Express 2010 IDE</b></a></li>
 <li><a href="selfcheck.wiki">Integrity Self Checks &mdash; Fossil Repository</a></li>
 <li><a href="webui.wiki">Interface &mdash; The Fossil Web</a></li>
 <li><a href="interwiki.md"><b>Interwiki Links</b></a></li>
+<li><a href="fossil-is-not-relational.md"><b>Introduction to the Fossil Data Model</b></a></li>
 <li><a href="blockchain.md"><b>Is Fossil A Blockchain?</b></a></li>
 <li><a href="chroot.md">Jail &mdash; Server Chroot</a></li>
 <li><a href="javascript.md">JavaScript in Fossil &mdash; Use of</a></li>
 <li><a href="pikchr.md">Language &mdash; The Pikchr Diagram</a></li>
 <li><a href="th1.md">Language &mdash; The TH1 Scripting</a></li>
@@ -215,10 +217,11 @@
 <li><a href="branching.wiki">Merging, and Tagging &mdash; Branching, Forking,</a></li>
 <li><a href="fossil-from-msvc.wiki">Microsoft Express 2010 IDE &mdash; Integrating Fossil in the</a></li>
 <li><a href="fiveminutes.wiki">Minutes as a Single User &mdash; Up and Running in 5</a></li>
 <li><a href="mirrortogithub.md">Mirror A Fossil Repository On GitHub &mdash; How To</a></li>
 <li><a href="mirrorlimitations.md">Mirrors &mdash; Limitations On Git</a></li>
+<li><a href="fossil-is-not-relational.md">Model &mdash; Introduction to the Fossil Data</a></li>
 <li><a href="globs.md">Name Glob Patterns &mdash; File</a></li>
 <li><a href="checkin_names.wiki">Names &mdash; Check-in And Version</a></li>
 <li><a href="adding_code.wiki">New Features To Fossil &mdash; Adding</a></li>
 <li><a href="newrepo.wiki">New Fossil Repository &mdash; How To Create A</a></li>
 <li><a href="alerts.md">Notifications &mdash; Email Alerts And</a></li>
