# Fossil is not Relational

***An Introduction to the Fossil Data Model***

Upon hearing that Fossil is based on SQLite, it's natural for people
unfamiliar with its internals to assume that Fossil stores its
SCM-relevant data in a database-friendly way and that the SCM history
can be modified via SQL. The truth, however, is *far stranger than
that.*

This document introduces, at a relatively high level:

1) The underlying enduring and immutable data format, which is
   independent of any specific storage engine.

2) The `blob` table: Fossil's single point of SCM-relevant data
   storage.

3) The transformation of (1) from its immutable raw form to a
   *transient* database-friendly form.

4) Some of the consequences of this model.


# Part 1: Artifacts

```pikchr center
AllObjects: [
A: file "Artifacts" fill lightskyblue;
down; move to A.s; move 50%;
F: file "Client" "files";
right; move 1; up; move 50%;
B: cylinder "blob table"
right;
arrow from A.e to B.w;
arrow from F.e to B.w;
arrow dashed from B.e;
C: box rad 0.1 "Crosslink" "process";
arrow
AUX: cylinder "Auxiliary" "tables"
arc -> cw dotted from AUX.s to B.s;
] # end of AllObjects
```


The centerpiece of Fossil's architecture is a data format which
describes what we call "artifacts." Each artifact represents the state
of one atomic unit of SCM-relevant data, such as a single checkin, a
single wiki page edit, a single modification to a ticket, creation or
cancellation of tags, and similar SCM constructs. In the cases of
checkins and ticket updates, an artifact may record changes to
multiple files or ticket fields, respectively, but the change as a
whole is atomic. Though we often refer to both Fossil-specific SCM data
and client-side content as artifacts, this document uses the term
"artifact" solely for the former.

From [the data format's main documentation][dataformat]:

> The global state of a fossil repository is kept simple so that it
> can endure in useful form for decades or centuries. A fossil
> repository is intended to be readable, searchable, and extensible by
> people not yet born.

[dataformat]: ./fileformat.wiki

This format has the following major properties:

- It is <u>**syntactically simple**</u>, easily and efficiently
  parsable in any programming language. It is also entirely
  human-readable.

- It is <u>**immutable**</u>. An artifact is identified by its unique
  hash value. Any modification to an artifact changes that hash,
  thereby changing its identity.

- It is <u>**not generic**</u>. It is custom-made for its purpose and
  makes no attempt at providing a generic format. It contains *only*
  what it *needs* to function, with zero bloat.

- It <u>**holds all SCM-relevant data except for client-level file
  content**</u>, the latter instead being referenced by their unique
  hash values. Storage of the client-side content is an implementation
  detail delegated to higher-level applications.

- <u>**Auditability**</u>. By following the hash references in
  artifacts it is possible to unambiguously trace the origin of any
  modification to the SCM state. Combined with higher-level tools
  (specifically, Fossil's database), this audit trail can easily be
  traced both backwards and forwards in time, using any given version
  in the SCM history as a starting point.
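
The identity-by-hash property can be illustrated with a minimal Python
sketch. The `artifact_id` helper and the sample bytes are hypothetical,
made up for illustration; SHA3-256 and SHA1 are used only because they
are the two hash families this document mentions. Nothing here
reproduces Fossil's own code.

```python
import hashlib


def artifact_id(content: bytes, algo: str = "sha3_256") -> str:
    """Return the hex digest that names a piece of content (hypothetical helper)."""
    return hashlib.new(algo, content).hexdigest()


# Made-up content for illustration; this is not a valid Fossil artifact.
original = b"D 2007-07-30T13:01:08\nU drh\n"
tampered = original.replace(b"drh", b"drj")

# Changing a single byte yields a different name, i.e. a different artifact.
print(artifact_id(original) == artifact_id(tampered))  # False
print(len(artifact_id(original)))                      # 64 hex digits (SHA3-256)
print(len(artifact_id(original, "sha1")))              # 40 hex digits (SHA1)
```

Because the name is derived from the bytes, "editing" an artifact can
only ever produce a second artifact alongside the first.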

Notably, the artifact file format <u>does not</u>...

- Specify any specific storage mechanism for the SCM's raw bytes,
  which includes both artifacts themselves and client-side file
  content. The file format refers to all such content solely by its
  unique hash value.

- Specify any optimizations such as storing file-level changes as
  deltas between two versions of that content.

Such aspects are all considered to be implementation details of
higher-level applications (be they in the main fossil binary or a
hypothetical 3rd-party application), and have no effect on the
underlying artifact data model. That said, in Fossil:

- All raw byte content (artifacts and client files) is stored in
  the `blob` database table.

- Fossil uses delta and zlib compression to keep the storage size of
  changes from one version of a piece of content to the next to a
  minimum.


## Sidebar: SCM-relevant vs Non-SCM-relevant State

Certain data in Fossil are "SCM-relevant" and certain data are not. In
short, SCM-relevant data are managed in a way consistent with
controlled versioning of that data. Conversely, non-SCM-relevant data
are essentially any state neither specified by nor unambiguously
referenced by the artifact file format and are therefore not
versioned.

SCM-relevant state includes:

- Any and all data stored in the bodies of artifacts. This includes,
  but is not limited to: wiki/ticket/forum content, tags, file names
  and Fossil-side permissions, the name of each user who introduces
  any given artifact into the data store, the timestamp of each such
  change, the inheritance tree of checkins, and many other pieces of
  metadata.

- Raw file content of versioned files. These data are external to
  artifacts, which refer to them by their hashes. How they are stored
  is not the concern of the data model, but (spoiler alert!) Fossil
  stores them in an SQLite database, one record per distinct hash, in
  its `blob` table (which we will cover more very soon).

Non-SCM-relevant state includes:

- Fossil's list of users and their metadata (permissions, email
  address, etc.). Artifacts themselves reference users only by their
  user names. Artifacts neither care whether, nor guarantee that, user
  "drh" in one artifact is in fact the same "drh" referenced in
  another artifact.

- All Fossil UI configuration, e.g. the site's skin, config settings,
  and project name.

- In short, any tables in a Fossil repository file except for the
  `blob` table. Most, but not all, of these tables are transient
  caches for the data specified by the artifact files (which are
  stored in the `blob` table), and can safely be destroyed and rebuilt
  from the collection of artifacts with no loss of state to the
  repository. *All* of them, except for `blob` and `delta`, can be
  destroyed with no loss of *SCM-relevant* data.

## Terminology Hair-splitting: Manifest vs. Artifact

We sometimes refer to artifacts as "manifests," which is technically a
term for artifacts which record checkins. The various other artifact
types are arguably not "manifests," but are sometimes referred to as
such because the internal APIs use that term.


## A Very Basic Example

The following artifact, truncated for brevity, represents a typical
checkin artifact (a.k.a. a manifest):

```
C Bug\sfix\sin\sthe\slocal\sdatabase\sfinder.
D 2007-07-30T13:01:08
F src/VERSION 24bbb3aad63325ff33c56d777007d7cd63dc19ea
F src/add.c 1a5dfcdbfd24c65fa04da865b2e21486d075e154
F src/blob.c 8ec1e279a6cd0cfd5f1e3f8a39f2e9a1682e0113
<SNIP>
F www/selfcheck.html 849df9860df602dc2c55163d658c6b138213122f
P 01e7596a984e2cd2bc12abc0a741415b902cbeea
R 74a0432d81b956bfc3ff5a1a2bb46eb5
U drh
Z c9dcc06ecead312b1c310711cb360bc3
```

Each line is a single data record called a "card." The first letter of
each line tells us the type of data stored on that line and the
following space-separated tokens contain the data for that
line. Tokens which themselves contain spaces (notably the checkin
comment) have those escaped as `\s`. The raw text of wiki
pages/comments, forum posts, and ticket bodies/comments is stored
directly in the corresponding artifact, but is stored in a way which
makes such escaping unnecessary.
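
The card layout described above is simple enough to split apart in a
few lines. The following Python sketch is a rough illustration only:
`parse_cards` and `unescape` are hypothetical helpers, the sample is a
shortened copy of the manifest shown earlier, and only the `\s` escape
mentioned in the text is handled. It does not cope with cards that
embed raw text, and it performs none of the validation a real parser
would.

```python
def unescape(token: str) -> str:
    """Undo the `\\s` escaping of spaces (other escapes are ignored here)."""
    return token.replace("\\s", " ")


def parse_cards(artifact: str) -> list:
    """Split artifact text into (card_type, [tokens]) pairs."""
    cards = []
    for line in artifact.splitlines():
        if not line:
            continue
        card_type, _, rest = line.partition(" ")
        tokens = [unescape(t) for t in rest.split(" ")] if rest else []
        cards.append((card_type, tokens))
    return cards


# Shortened copy of the example manifest (no R or Z cards).
SAMPLE = (
    "C Bug\\sfix\\sin\\sthe\\slocal\\sdatabase\\sfinder.\n"
    "D 2007-07-30T13:01:08\n"
    "F src/VERSION 24bbb3aad63325ff33c56d777007d7cd63dc19ea\n"
    "P 01e7596a984e2cd2bc12abc0a741415b902cbeea\n"
    "U drh\n"
)

cards = parse_cards(SAMPLE)
for card_type, tokens in cards:
    print(card_type, tokens)
```

The `C` card comes back as a single token with its spaces restored,
while the `F` card yields two tokens: the file name and the hash of the
file's content.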

The hashes seen above are a critical component of the architecture:

- The `F` (file) records refer to the content of those files by the
  hash of that content. Where that content is stored is *not* specified
  by the data model.

- The `P` (parent) line is the hash code of the parent version (itself
  an artifact).

- The `Z` line is a hash of all of the content of *this artifact*
  which precedes the `Z` line. Thus any change to the content of an
  artifact changes both the artifact's identity (its hash) and its `Z`
  value, making it impossible to inject modified artifacts into an
  existing artifact tree.

- The `R` line is yet another consistency-checking hash which we won't
  go into here except to say that it's an internal consistency
  check/line of defense against modification of file content
  referenced by the artifact.
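
The self-check provided by the `Z` line can be sketched as follows.
This is a minimal illustration under stated assumptions: it assumes the
`Z` value is an MD5 digest, which is consistent with the 32 hex digits
in the example but is not spelled out in the text above, and the
`seal` and `z_card_ok` helpers and the toy artifact are hypothetical.

```python
import hashlib


def seal(body: str) -> str:
    """Append a Z card computed over everything that precedes it."""
    digest = hashlib.md5(body.encode("utf-8")).hexdigest()
    return body + "Z " + digest + "\n"


def z_card_ok(artifact: str) -> bool:
    """Recompute the hash of the text before the Z card and compare."""
    body, sep, tail = artifact.rpartition("\nZ ")
    if not sep:
        return False  # no Z card at all
    body += "\n"      # the newline before the Z card is part of the hashed text
    return hashlib.md5(body.encode("utf-8")).hexdigest() == tail.rstrip("\n")


# Toy artifact, far smaller than a real manifest.
good = seal("C toy\\scheckin\nD 2007-07-30T13:01:08\nU drh\n")
bad = good.replace("drh", "eve")  # edit the body, keep the stale Z card

print(z_card_ok(good), z_card_ok(bad))  # True False
```

An edited artifact therefore fails its own checksum unless the `Z` card
is recomputed, at which point its overall hash, and thus its identity,
has changed as well.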

# Part 2: The `blob` Table

```pikchr center
AllObjects: [
A: file "Artifacts";
down; move to A.s; move 50%;
F: file "Client" "files" fill lightskyblue;
right; move 1; up; move 50%;
B: cylinder "blob table" fill lightskyblue;
right;
arrow from A.e to B.w;
arrow from F.e to B.w;
arrow dashed from B.e;
C: box rad 0.1 "Crosslink" "process";
arrow
AUX: cylinder "Auxiliary" "tables"
arc -> cw dotted from AUX.s to B.s;
] # end of AllObjects
```


The `blob` table is the core-most storage of a Fossil repository
database, storing all SCM-relevant data (and *only* SCM-relevant
data). Each row of this table holds a single artifact or the content
for a single version of a single client-side file. Slightly truncated
for clarity, its schema contains the following fields:

- **`uuid`**: the hash code of the blob's contents.
- **`rid`**: a unique integer key for this record. This is how the
  blob table is mapped to other (transient) tables, but the RIDs are
  specific to one given copy of a repository and must not be used for
  cross-repository referencing. The RID is a private/internal value of
  no use to a user unless they're building SQL queries for use with
  the Fossil db schema.
- **`size`**: the size, in bytes, of the blob's contents, or -1 for
  "phantom" blobs (those which Fossil knows should exist because it's
  seen them referenced somewhere, but for which it has not been given
  any content).
- **`content`**: the blob's raw content bytes, with the caveat that
  Fossil is free to store it in an "alternate representation."
  Specifically, the `content` field often holds a zlib-compressed
  delta from a previous version of the blob's content (a separate
  entry in the `blob` table), and an auxiliary table named `delta`
  maps such blobs to their previous versions, such that Fossil can
  reconstruct the real content from them by applying the delta to its
  previous version (and such deltas may be chained). Thus extraction
  of the content from this field cannot be performed via vanilla SQL,
  and requires a Fossil-specific function which knows how to convert
  any internal representations of the content to its original form.
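
To make the four fields concrete, here is a toy model of such a table
in Python with an in-memory SQLite database. It is a sketch under loud
assumptions: the table below is a simplification and not Fossil's
actual schema, the `put`, `put_phantom`, and `get` helpers are
hypothetical, and content is stored as plain zlib output with no
deltas, which is not Fossil's real on-disk encoding.

```python
import hashlib
import sqlite3
import zlib

db = sqlite3.connect(":memory:")
db.execute(
    """
    CREATE TABLE blob(
      rid INTEGER PRIMARY KEY,    -- repository-local integer key
      uuid TEXT UNIQUE NOT NULL,  -- hash of the original content
      size INTEGER,               -- byte count, or -1 for a phantom
      content BLOB                -- stored form (toy: plain zlib, no deltas)
    )
    """
)


def put(content: bytes) -> int:
    """Store content under its hash and return the new RID."""
    uuid = hashlib.sha3_256(content).hexdigest()
    cur = db.execute(
        "INSERT INTO blob(uuid, size, content) VALUES (?, ?, ?)",
        (uuid, len(content), zlib.compress(content)),
    )
    return cur.lastrowid


def put_phantom(uuid: str) -> int:
    """Record a hash we have seen referenced but hold no content for."""
    cur = db.execute(
        "INSERT INTO blob(uuid, size, content) VALUES (?, -1, NULL)", (uuid,)
    )
    return cur.lastrowid


def get(rid: int):
    """Return the original bytes, or None for a phantom."""
    size, stored = db.execute(
        "SELECT size, content FROM blob WHERE rid = ?", (rid,)
    ).fetchone()
    return None if size < 0 else zlib.decompress(stored)


file_rid = put(b"int main(void) { return 0; }\n")
phantom_rid = put_phantom("01e7596a984e2cd2bc12abc0a741415b902cbeea")

print(get(file_rid))
print(get(phantom_rid))
phantoms = [u for (u,) in db.execute("SELECT uuid FROM blob WHERE size = -1")]
print(phantoms)
```

Even in this toy, selecting the `content` column directly returns
compressed bytes rather than the file, which hints at why the real
table, with its chained deltas, cannot be read with vanilla SQL.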


## Sidebar: How does `blob` Distinguish Between Artifacts and Client Content?

Notice that the `blob` table has no flag saying "this record is an
artifact" or "this record is client data." Similarly, there is no
place in the database dedicated to keeping track of which `blob`
records are artifacts and which are file content.

That said, (A) the type of a blob can be implied via certain table
relationships and (B) the `event` table (the `/timeline`'s main data
source) incidentally has a list of artifacts and their sub-types
(checkin, wiki, tag, etc.). However, given that all of those
relationships, including the timeline, are *transient*, how can Fossil
distinguish between the two types of data?

Fossil's artifact format is extremely rigid and is *strictly* enforced
internally, with zero room provided for leniency. Every artifact which
is internally created is re-parsed for validity before it is committed
to the database, making it impossible for Fossil to inject an
invalid artifact into the repository. Because of the strictness of the
artifact parser, the chances that any given piece of arbitrary client
data could be successfully parsed as an artifact, even if it is
syntactically 99% similar to an artifact, are *effectively zero*.

Thus Fossil's rule of interpreting the contents of the blob table is:
if it can be parsed as an artifact, it *is* an artifact, else it is
opaque client-side data.

That rule is most often relevant in operations like `rebuild` and
`reconstruct`, both of which necessarily have to sort out artifacts
and non-artifact blobs from arbitrary collections of blobs.
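
The parse-or-opaque rule can be sketched with a deliberately crude
stand-in for the real parser. Everything in this Python example is an
assumption made for illustration: `is_artifact` checks only that every
line looks like a card and that a trailing `Z` card carries a matching
MD5 digest, which is far looser than Fossil's actual validation, and
the sample blobs are invented.

```python
import hashlib
import re

CARD_LINE = re.compile(r"[A-Z] \S")  # toy rule: card letter, one space, data


def is_artifact(blob: bytes) -> bool:
    """Toy stand-in for Fossil's far stricter artifact parser."""
    try:
        text = blob.decode("utf-8")
    except UnicodeDecodeError:
        return False
    if not text.endswith("\n"):
        return False
    lines = text[:-1].split("\n")
    if len(lines) < 2 or not all(CARD_LINE.match(line) for line in lines):
        return False
    if not lines[-1].startswith("Z "):
        return False
    body = "\n".join(lines[:-1]) + "\n"
    return hashlib.md5(body.encode("utf-8")).hexdigest() == lines[-1][2:]


def seal(body: str) -> bytes:
    """Build a toy artifact by appending a matching Z card."""
    digest = hashlib.md5(body.encode("utf-8")).hexdigest()
    return (body + "Z " + digest + "\n").encode("utf-8")


checkin = seal("C toy\\scheckin\nD 2007-07-30T13:01:08\nU drh\n")
near_miss = checkin.replace(b"U drh", b"U eve")  # almost identical, stale Z card
c_source = b"#include <stdio.h>\nint main(void) { return 0; }\n"

for blob in (checkin, near_miss, c_source):
    print("artifact" if is_artifact(blob) else "opaque client data")
```

Only the first blob is classified as an artifact; the near-miss fails
its checksum and the C source fails the card check, so both are treated
as opaque content.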

It is, in fact, possible to store an artifact unrelated to the current
repository in that repository, and it *will be parsed and processed as
an artifact* (see below), but it likely refers to other artifacts or
blobs which are not part of the current repository, thereby possibly
introducing "strange" data into the UI. If this happens, it's
potentially slightly confusing but is functionally harmless.


# Part 3: Crosslinking

```pikchr center
AllObjects: [
A: file "Artifacts";
down; move to A.s; move 50%;
F: file "Client" "files";
right; move 1; up; move 50%;
B: cylinder "blob table"
right;
arrow from A.e to B.w;
arrow from F.e to B.w;
arrow dashed from B.e;
C: box rad 0.1 "Crosslink" "process" fill lightskyblue;
arrow
AUX: cylinder "Auxiliary" "tables" fill lightskyblue;
arc -> cw dotted from AUX.s to B.s;
] # end of AllObjects
```

Once an artifact is stored in the `blob` table, how does one perform
SQL queries against its plain-text format? In short: *One Does Not
Simply Query the Artifacts*.

Crosslinking, as it's colloquially known, is a one-way processing step
which transforms an immutable artifact's state into something
database-friendly. Crosslinking happens automatically every time
Fossil generates, or is given, a new artifact. Crosslinking of any
given artifact may update many different auxiliary tables, *all* of
which are transient in the sense that they may be destroyed and then
recreated by crosslinking all artifacts from the `blob` table (which
is exactly what the `rebuild` command does). The overwhelming majority
of individual database records in any Fossil repository are found in
these transient auxiliary tables, though the `blob` table tends to
account for the overwhelming majority of a repository's disk space.

This approach to mapping data from artifacts to the db gives Fossil
the freedom to change its database model, effectively at will, with
minimal client-side disruption (at most, a call to `rebuild`). This
allows, for example, Fossil to take advantage of new improvements in
SQLite without affecting compatibility with older repositories.

Auxiliary tables hold data mappings such as:

- Child/parent relationships of checkins. (The `plink` table.)
- Records of file names and changes to files. (The `mlink` and `filename` tables.)
- Timeline entries. (The `event` table.)

And numerous other bits and pieces.

The many auxiliary tables maintained by the app-level code reference
the `blob` table via its RID field, as that's far more efficient than
using hashes (`blob.uuid`) as foreign keys. The contexts of those
auxiliary data unambiguously tell us whether the referenced blobs are
artifacts or file content, so there is no efficiency penalty there for
hosting both opaque blobs and artifacts in the `blob` table.
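
The derive-then-discard nature of crosslinking can be sketched in
Python with an in-memory SQLite database. This is a toy under explicit
assumptions: the two-column `plink(pid, cid)` table and the plain-text
`blob.content` column are simplifications rather than Fossil's schema,
the `store`, `crosslink`, and `rebuild` functions are hypothetical, and
the sample checkins omit most cards, including `Z`.

```python
import hashlib
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE blob(rid INTEGER PRIMARY KEY, uuid TEXT UNIQUE, content TEXT);
    CREATE TABLE plink(pid INTEGER, cid INTEGER, UNIQUE(pid, cid));
    """
)


def store(content: str) -> int:
    """Add content to the toy blob table and return its RID."""
    uuid = hashlib.sha1(content.encode("utf-8")).hexdigest()
    return db.execute(
        "INSERT INTO blob(uuid, content) VALUES (?, ?)", (uuid, content)
    ).lastrowid


def crosslink(rid: int) -> None:
    """Derive parent/child rows from the P card of one stored blob."""
    (content,) = db.execute(
        "SELECT content FROM blob WHERE rid = ?", (rid,)
    ).fetchone()
    for line in content.splitlines():
        if not line.startswith("P "):
            continue
        for parent_uuid in line[2:].split(" "):
            row = db.execute(
                "SELECT rid FROM blob WHERE uuid = ?", (parent_uuid,)
            ).fetchone()
            if row is not None:
                # Auxiliary rows point at blobs by RID, not by hash.
                db.execute(
                    "INSERT OR IGNORE INTO plink(pid, cid) VALUES (?, ?)",
                    (row[0], rid),
                )


def rebuild() -> None:
    """Throw away the derived table and regenerate it from the blobs."""
    db.execute("DELETE FROM plink")
    for (rid,) in db.execute("SELECT rid FROM blob ORDER BY rid").fetchall():
        crosslink(rid)


root_text = "C root\nD 2007-07-29T00:00:00\nU drh\n"
root = store(root_text)
root_uuid = hashlib.sha1(root_text.encode("utf-8")).hexdigest()
child = store("C child\nD 2007-07-30T13:01:08\nP " + root_uuid + "\nU drh\n")

crosslink(root)
crosslink(child)
before = db.execute("SELECT pid, cid FROM plink").fetchall()
rebuild()
after = db.execute("SELECT pid, cid FROM plink").fetchall()
print(before == after, after)  # True [(1, 2)]
```

The `plink` rows carry no information that is not already present in
the stored artifacts, which is what makes it safe to drop and
regenerate them.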

The complete SQL schemas for the core-most auxiliary tables can be found
at:

[](/finfo/src/schema.c?ci=trunk)

Note, however, that all database tables are effectively internal
APIs, with no API stability guarantees and subject to change at any
time. Thus their structures generally should not be relied upon in
client-side scripts.


# Part 4: Implications and Consequences of the Model

*Some* of the implications and consequences of Fossil's data model
combined with the higher-level access via SQL include:

- **Provable immutability of history.** Fossil offers only one option
  for modifying history: "shunning," the forceful removal of an
  artifact from the `blob` table and the creation of a db record
  stating that the shunned hash may no longer be synced into this
  repository. Shunning effectively leaves a hole in the SCM history,
  and is only intended to be used for removal of illegal, dangerous,
  or private information which should never have been added to the
  repository.

- **Complete separation of SCM-relevant data and app-level data
  structures**. This allows the application to update its structures
  at will without significant backwards-compatibility concerns. In
  Fossil's case, "data structures" primarily refers to the SQL
  schema. Bringing a given repository schema up to date vis-à-vis a
  given fossil binary version simply means rebuilding the repository
  with that fossil binary. There are exceptionally rare cases, namely
  the switch from SHA1 to SHA3-256 ushered in with Fossil 2.0, which
  can lead to true incompatibility: a Fossil 1.x client cannot
  use a repository database which contains SHA3 hashes, regardless of
  a rebuild.

- **Two-way compatibility with other hypothetical clients** which also
  implement the same underlying data model. So far there are none, but
  it's conceivably possible.

- **Provides a solid basis for reporting.** Fossil's real-time metrics
  and reporting options are arguably the most powerful and flexible
  yet seen in an SCM.

- Very probably several more things.