<title>A Technical Overview of Fossil's Design & Implementation</title>

<h2>1.0 Introduction</h2>

At its lowest level, a Fossil repository consists of an unordered set
of immutable "artifacts". You might think of these artifacts as "files",
since in many cases the artifacts are exactly that.
But other "structural artifacts" are also included in the mix.
These structural artifacts define the relationships
between artifacts - which files go together to form a particular
version of the project, who checked in that version and when, what was
the check-in comment, what wiki pages are included with the project, what
are the edit histories of each wiki page, what bug reports or tickets are
included, who contributed to the evolution of each ticket, and so forth.
This low-level file format is called the "global state" of
the repository, since this is the information that is synced to peer
repositories using push and pull operations. The low-level file format
is also called "enduring" since it is intended to last for many years.
The details of the low-level, enduring, global file format
are [./fileformat.wiki | described separately].

This article is about how Fossil is currently implemented. Instead of
dealing with vague abstractions of "enduring file formats" as the
[./fileformat.wiki | other document] does, this article provides
some detail on how Fossil actually stores information on disk.

<h2>2.0 Three Databases</h2>

Fossil stores state information in
[http://www.sqlite.org/ | SQLite] database files.
SQLite keeps an entire relational database, including multiple tables and
indices, in a single disk file. The SQLite library allows the database
files to be efficiently queried and updated using the industry-standard
SQL language. SQLite updates are atomic, so even in the event of
a system crash or power failure the repository content is protected.

Fossil uses three separate classes of SQLite databases:

<ol>
<li>The configuration database
<li>Repository databases
<li>Checkout databases
</ol>

The configuration database is a one-per-user database that holds
global configuration information used by Fossil. There is one
repository database per project. The repository database is the
file that people are normally referring to when they say
"a Fossil repository". The checkout database is found in the working
checkout for a project and contains state information that is unique
to that working checkout.

Fossil does not always use all three database files. The web interface,
for example, typically only uses the repository database. And the
[/help/settings | fossil settings] command only opens the configuration database
when the --global option is used. But other commands use all three
databases at once. For example, the [/help/status | fossil status]
command will first locate the checkout database, then use the checkout
database to find the repository database, then open the configuration
database. Whenever multiple databases are used at the same time,
they are all opened on the same SQLite database connection using
SQLite's [http://www.sqlite.org/lang_attach.html | ATTACH] command.

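As a rough illustration of the ATTACH technique, the following Python
sketch opens one SQLite connection and attaches two more database files
to it. The file names, the schema aliases ("repo" and "cfg"), and the
table names are invented for this example; they are not claimed to match
what Fossil itself uses.

```python
import os
import sqlite3
import tempfile


def open_attached(checkout_path, repo_path, config_path):
    """Open one connection and ATTACH two more database files to it."""
    conn = sqlite3.connect(checkout_path)
    # The schema aliases "repo" and "cfg" are illustrative names only.
    conn.execute("ATTACH DATABASE ? AS repo", (repo_path,))
    conn.execute("ATTACH DATABASE ? AS cfg", (config_path,))
    return conn


def demo():
    """Create three throwaway database files and query across them."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, n) for n in ("ckout.db", "repo.db", "cfg.db")]
        conn = open_attached(*paths)
        # Hypothetical tables, one per database file.
        conn.execute("CREATE TABLE main.state(name TEXT, value TEXT)")
        conn.execute("CREATE TABLE repo.project(name TEXT)")
        conn.execute("CREATE TABLE cfg.pref(name TEXT, value TEXT)")
        conn.execute("INSERT INTO repo.project VALUES ('demo')")
        conn.commit()
        # PRAGMA database_list reports every database on this connection.
        names = [row[1] for row in conn.execute("PRAGMA database_list")]
        project = conn.execute("SELECT name FROM repo.project").fetchone()[0]
        conn.close()
        return names, project
```

Because all three files share one connection, a single SQL statement can
refer to tables in any of them by prefixing the table name with the
schema alias.
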
The chart below provides a quick summary of how each of these
database files is used by Fossil, with detailed discussion following.

<table align="center">
<tr valign="bottom">
<th style="text-align:center">Configuration Database<br>"~/.fossil" or<br>
"~/.config/fossil.db"
<th style="text-align:center">Repository Database<br>"<i>project</i>.fossil"
<th style="text-align:center">Checkout Database<br>"_FOSSIL_" or ".fslckout"
<tr valign="top">
<td><ul>
<li>Global [/help/settings |settings]
<li>List of active repositories used by the [/help/all | all] command
</ul></td>
<td><ul>
<li>[./fileformat.wiki | Global state of the project]
encoded using delta-compression
<li>Local [/help/settings|settings]
<li>Web interface display preferences
<li>User credentials and permissions
<li>Metadata about the global state to facilitate rapid
queries
</ul></td>
<td><ul>
<li>The repository database used by this checkout
<li>The version currently checked out
<li>Other versions [/help/merge | merged] in but not
yet [/help/commit | committed]
<li>Changes from the [/help/add | add], [/help/delete | delete],
and [/help/rename | rename] commands that have not yet been committed
<li>"mtime" values and other information used to efficiently detect
local edits
<li>The "[/help/stash | stash]"
<li>Information needed to "[/help/undo|undo]" or "[/help/redo|redo]"
</ul></td>
</tr>
</table>

<h3 id="configdb">2.1 The Configuration Database</h3>

The configuration database holds cross-repository preferences and a list of all
repositories for a single user.

The [/help/settings | fossil settings] command can be used to specify various
operating parameters and preferences for Fossil repositories. Settings can
apply to a single repository, or they can apply globally to all repositories
for a user. If both a global and a repository value exist for a setting,
then the repository-specific value takes precedence. All of the settings
have reasonable defaults, and so many users will never need to change them.
But if changes to settings are desired, the configuration database provides
a way to change settings for all repositories with a single command, rather
than having to change the setting individually on each repository.

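The precedence rule can be pictured with a small Python sketch. The
function name and the use of plain dictionaries as stand-ins for the
repository database, the configuration database, and the built-in
defaults are assumptions made for illustration; this is not how Fossil
is actually coded.

```python
def effective_setting(name, repo_settings, global_settings, defaults):
    """Return the value for `name`: repository first, then global, then default."""
    for source in (repo_settings, global_settings, defaults):
        if name in source:
            return source[name]
    raise KeyError(name)


# Illustrative values only.
repo_settings = {"autosync": "off"}
global_settings = {"autosync": "on", "editor": "vi"}
defaults = {"autosync": "on", "editor": "", "crlf-glob": ""}

autosync = effective_setting("autosync", repo_settings, global_settings, defaults)
editor = effective_setting("editor", repo_settings, global_settings, defaults)
```

Here the repository-specific value wins for "autosync", while "editor"
falls back to the global value because the repository does not override it.
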
The configuration database also maintains a list of repositories. This
list is used by the [/help/all | fossil all] command in order to run various
operations such as "sync" or "rebuild" on all repositories managed by a user.

<h4 id="configloc">2.1.1 Location Of The Configuration Database</h4>

On Unix systems, the configuration database is named by the following
algorithm:

<table>
<tr><td>1. if environment variable FOSSIL_HOME exists
<td> → <td>$FOSSIL_HOME/.fossil
<tr><td>2. if file ~/.fossil exists
<td> →<td>~/.fossil
<tr><td>3. if environment variable XDG_CONFIG_HOME exists
<td> →<td>$XDG_CONFIG_HOME/fossil.db
<tr><td>4. if the directory ~/.config exists
<td> →<td>~/.config/fossil.db
<tr><td>5. Otherwise<td> →<td>~/.fossil
</table>

Another way of thinking of this algorithm is the following:

* Use "$FOSSIL_HOME/.fossil" if the FOSSIL_HOME variable is defined
* Use the XDG-compatible name (usually ~/.config/fossil.db) on XDG systems
if the ~/.fossil file does not already exist
* Otherwise, use the traditional Unix name of "~/.fossil"

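The five numbered rules above translate fairly directly into code. The
Python function below is only a sketch of that decision order: the
function name and its parameters are hypothetical, and the existence
checks are passed in as callables so the logic can be exercised without
touching a real home directory.

```python
import os
import posixpath


def unix_config_db_path(env, home, exists=os.path.exists, isdir=os.path.isdir):
    """Follow the five numbered rules for naming the configuration database."""
    legacy = posixpath.join(home, ".fossil")
    if "FOSSIL_HOME" in env:                          # rule 1
        return posixpath.join(env["FOSSIL_HOME"], ".fossil")
    if exists(legacy):                                # rule 2
        return legacy
    if "XDG_CONFIG_HOME" in env:                      # rule 3
        return posixpath.join(env["XDG_CONFIG_HOME"], "fossil.db")
    if isdir(posixpath.join(home, ".config")):        # rule 4
        return posixpath.join(home, ".config", "fossil.db")
    return legacy                                     # rule 5
```

On a real system this could be called as
<tt>unix_config_db_path(os.environ, os.path.expanduser("~"))</tt>.
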
This algorithm is complex due to the need for historical compatibility.
Originally, the database was always just "~/.fossil". Then support
for the FOSSIL_HOME environment variable was added. Later, support for the
[https://specifications.freedesktop.org/basedir-spec/basedir-spec-latest.html|XDG-compatible configuration filenames]
was added. Each of these changes needed to continue to support legacy
installations.

On Windows, the configuration database is the first of the following
for which the corresponding environment variables exist:

* %FOSSIL_HOME%/_fossil
* %LOCALAPPDATA%/_fossil
* %APPDATA%/_fossil
* %USERPROFILE%/_fossil
* %HOMEDRIVE%%HOMEPATH%/_fossil

The second case is the one that usually determines the name. Note that the
FOSSIL_HOME environment variable can always be set to determine the
location of the configuration database. Note also that the configuration
database file itself is called ".fossil" or "fossil.db" on Unix but
"_fossil" on Windows.

The [/help/info|fossil info] command will show the location of
the configuration database on a line that starts with "config-db:".

<h3>2.2 Repository Databases</h3>

The repository database is the file that is commonly referred to as
"the repository". This is because the repository database contains,
among other things, the complete revision, ticket, and wiki history for
a project. It is customary to name the repository database after the
name of the project, with a ".fossil" suffix. For example, the repository
database for the self-hosting Fossil repository is called "fossil.fossil"
and the repository database for SQLite is called "sqlite.fossil".

<h4>2.2.1 Global Project State</h4>

The bulk of the repository database (typically 75 to 85%) consists
of the artifacts that comprise the
[./fileformat.wiki | enduring, global, shared state] of the project.
The artifacts are stored as BLOBs, compressed using
[http://www.zlib.net/ | zlib compression] and, where applicable,
using [./delta_encoder_algorithm.wiki | delta compression].
The combination of zlib and delta compression results in considerable
space savings. For the SQLite project (when this paragraph was last
updated on 2020-02-08)
the total size of all artifacts is over 7.1 GB but thanks to the
combined zlib and delta compression, that content takes less than
97 MB of space in the repository database, for a compression ratio
of about 74:1. The median size of all content BLOBs after delta
and zlib compression have been applied is 156 bytes.
The median size of BLOBs without compression is 45,312 bytes.

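The quoted ratio can be checked with a little arithmetic. The sizes in
the text are rounded ("over 7.1 GB", "less than 97 MB"), and it is not
stated whether decimal or binary units are meant, so the Python sketch
below simply computes both readings as an assumption-laden sanity check.

```python
def compression_ratio(raw_bytes, stored_bytes):
    """Ratio of uncompressed size to stored size."""
    return raw_bytes / stored_bytes


# Decimal units: 1 GB = 10**9 bytes, 1 MB = 10**6 bytes.
decimal_ratio = compression_ratio(7.1 * 10**9, 97 * 10**6)

# Binary units: 1 GB = 2**30 bytes, 1 MB = 2**20 bytes.
binary_ratio = compression_ratio(7.1 * 2**30, 97 * 2**20)
```

Both readings land in the low-to-mid 70s, which agrees with the figure
of about 74:1 given the rounding in the inputs.
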
Note that the zlib and delta compression is not an inherent part of the
Fossil file format; it is just an optimization.
The enduring file format for Fossil is the unordered
set of artifacts. The compression techniques are just a detail of
how the current implementation of Fossil happens to store these artifacts
efficiently on disk.

All of the original uncompressed and un-delta'd artifacts can be extracted
from a Fossil repository database using
the [/help/deconstruct | fossil deconstruct]
command. Individual artifacts can be extracted using the
[/help/artifact | fossil artifact] command.
When accessing the repository database using raw SQL and the
[/help/sqlite3 | fossil sql] command, the extension function
"<tt>content()</tt>", called with a single argument that is the SHA1 or
SHA3-256 hash
of an artifact, returns the complete uncompressed
content of that artifact.

Going the other way, the [/help/reconstruct | fossil reconstruct]
command will scan a directory hierarchy and add all files found to
a new repository database. The [/help/import | fossil import] command
works by reading the input git-fast-export stream and using it to construct
corresponding artifacts which are then written into the repository database.

<h4>2.2.2 Project Metadata</h4>

The global project state information in the repository database is
supplemented by computed metadata that makes querying the project state
more efficient. Metadata includes information such as the following:

* The names for all files found in any check-in.
* All check-ins that modify a given file.
* Parents and children of each check-in.
* Potential timeline rows.
* The names of all symbolic tags and the check-ins they apply to.
* The names of all wiki pages and the artifacts that comprise each
wiki page.
* Attachments and the wiki pages or tickets they apply to.
* Current content of each ticket.
* Cross-references between tickets, check-ins, and wiki pages.

The metadata is held in various SQL tables in the repository database.
The metadata is designed to facilitate queries for the various timelines and
reports that Fossil generates.
As the functionality of Fossil evolves,
the schema for the metadata can and does change.
But schema changes do not invalidate the repository. Remember that the
metadata contains no new information - only information that has been
extracted from the canonical artifacts and saved in a more useful form.
Hence, when the metadata schema changes, the prior metadata can be discarded
and the entire metadata corpus can be recomputed from the canonical
artifacts. That is what the
[/help/rebuild | fossil rebuild] command does.

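The idea that metadata is disposable can be shown with a toy example.
In the Python sketch below, the "artifact" table is a made-up stand-in
for the canonical artifacts and "file_index" is a made-up derived table;
neither name nor layout is taken from Fossil's real schema. Dropping the
derived table and recomputing it yields exactly the same rows.

```python
import sqlite3


def rebuild_metadata(conn):
    """Discard the derived table and recompute it from the artifact table."""
    conn.execute("DROP TABLE IF EXISTS file_index")
    conn.execute("CREATE TABLE file_index(filename TEXT, checkin_id INTEGER)")
    rows = conn.execute("SELECT id, files FROM artifact ORDER BY id").fetchall()
    for checkin_id, files in rows:
        for filename in files.split():
            conn.execute(
                "INSERT INTO file_index VALUES (?, ?)", (filename, checkin_id)
            )
    conn.commit()


def demo():
    """Build the derived table twice and return both snapshots."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical canonical data: each row lists files touched by a check-in.
    conn.execute("CREATE TABLE artifact(id INTEGER PRIMARY KEY, files TEXT)")
    conn.executemany(
        "INSERT INTO artifact(files) VALUES (?)",
        [("README main.c",), ("main.c util.c",)],
    )
    query = "SELECT filename, checkin_id FROM file_index ORDER BY 1, 2"
    rebuild_metadata(conn)
    first = conn.execute(query).fetchall()
    rebuild_metadata(conn)  # throwing the old metadata away loses nothing
    second = conn.execute(query).fetchall()
    conn.close()
    return first, second
```

The derived table could also be given a different schema in a later
version of the program, since everything in it can be regenerated from
the canonical rows.
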
<h4>2.2.3 Display And Processing Preferences</h4>

The repository database also holds information used to help format
the display of web pages and configuration settings that override the
global configuration settings for the specific repository. All of
this information (and the user credentials and privileges too) is
local to each repository database; it is not shared between repositories
by [/help/sync | fossil sync]. That is because it is entirely reasonable
that two different websites for the same project might have completely
different display preferences and user communities. One instance of the
project might be a fork of the other, for example, which pulls from the
other but never pushes and extends the project in ways that the keepers of
the other website disapprove of.

Display and processing information includes the following:

* The name and description of the project
* The CSS file, header, and footer used by all web pages
* The project logo image
* Fields of tickets that are considered "significant" and which are
therefore collected from artifacts and made available for display
* Templates for screens to view, edit, and create tickets
* Ticket report formats and display preferences
* Local values for [/help/settings | settings] that override the
global values defined in the per-user configuration database.

Though the display and processing preferences do not move between
repository instances using [/help/sync | fossil sync], this information
can be shared between repositories using the
[/help/config | fossil config push] and
[/help/config | fossil config pull] commands.
The display and processing information is also copied into new
repositories when they are created using
[/help/clone | fossil clone].

<h4>2.2.4 User Credentials And Privileges</h4>

Just because two development teams are collaborating on a project and allow
push and/or pull between their repositories does not mean that they
trust each other enough to share passwords and access privileges.
Hence the names and emails and passwords and privileges of users are
considered private information that is kept locally in each repository.

Each repository database has a table holding the username, privileges,
and login credentials for users authorized to interact with that particular
database. In addition, there is a table named "concealed" that maps the
SHA1 hash of each user's email address back into their true email address.
The concealed table allows just the SHA1 hash of email addresses to
be stored in tickets, and thus prevents actual email addresses from falling
into the hands of spammers who happen to clone the repository.

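The following Python sketch shows the general shape of such a mapping.
A dictionary stands in for the "concealed" table, the function names are
hypothetical, and hashing the UTF-8 bytes of the address exactly as
typed is an assumption; the text does not say how Fossil prepares the
address before hashing.

```python
import hashlib


def conceal(email, concealed):
    """Record hash -> email in `concealed` and return the hash for public use."""
    digest = hashlib.sha1(email.encode("utf-8")).hexdigest()
    concealed[digest] = email
    return digest


def reveal(digest, concealed):
    """Map a hash back to the address, or None if it is unknown locally."""
    return concealed.get(digest)


concealed_table = {}  # stand-in for the local-only "concealed" table
public_value = conceal("user@example.com", concealed_table)
```

Only <tt>public_value</tt> would appear in shared content. A clone that
lacks the local table has no row to look the hash up in.
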
The content of the user and concealed tables can be pushed and pulled using the
[/help/config | fossil config push] and
[/help/config | fossil config pull] commands with "user" and
"email" as the AREA argument, but only if you have administrative
privileges on the remote repository.

<h4>2.2.5 Shunned Artifact List</h4>

The set of canonical artifacts for a project - the global state for the
project - is intended to be an append-only database. In other words,
new artifacts can be added but artifacts can never be removed. But
it sometimes happens that inappropriate content is mistakenly or
maliciously added to a repository. The only way to get rid of
the undesired content is to [./shunning.wiki | "shun"] it.
The "shun" table in the repository database records the hash values for
all shunned artifacts.

The shun table can be pushed or pulled using
the [/help/config | fossil config] command with the "shun" AREA argument.
The shun table is also copied during a [/help/clone | clone].

<h3 id="localdb">2.3 Checkout Databases</h3>

Fossil allows a single repository
to have multiple working checkouts. Each working checkout has a single
database in its root directory that records the state of that checkout.
The checkout database is named "_FOSSIL_" or ".fslckout".
The checkout database records information such as the following:

* The name of the repository database file.
* The version that is currently checked out.
* Files that have been [/help/add | added],
[/help/rm | removed], or [/help/mv | renamed] but not
yet committed.
* The mtime and size of files as they were originally checked out,
in order to expedite checking which files have been edited.
* Other check-ins that have been [/help/merge | merged] into the
working checkout but not yet committed.
* Copies of files prior to the most recent undoable operation - needed to
implement the [/help/undo | undo] and [/help/redo | redo] commands.
* The [/help/stash | stash].
* State information for the [/help/bisect | bisect] command.

For Fossil commands that run from within a working checkout, the
first thing that happens is that Fossil locates the checkout database.
Fossil first looks in the current directory. If not found there, it
looks in the parent directory. If not found there, the parent of the
parent. And so forth until either the checkout database is found
or the search reaches the root of the file system. (In the latter case,
Fossil returns an error, of course.) Once the checkout database is
located, it is used to locate the repository database.

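The upward search just described is a simple loop. The Python function
below is a sketch of that search, not Fossil's own code: the function
name is hypothetical, and the order in which the two file names are
tried within a directory is an assumption.

```python
import os

# The two checkout database names mentioned in the text.
CHECKOUT_DB_NAMES = ("_FOSSIL_", ".fslckout")


def find_checkout_db(start_dir):
    """Walk from start_dir toward the filesystem root looking for a checkout database."""
    current = os.path.abspath(start_dir)
    while True:
        for name in CHECKOUT_DB_NAMES:
            candidate = os.path.join(current, name)
            if os.path.isfile(candidate):
                return candidate
        parent = os.path.dirname(current)
        if parent == current:  # reached the root without finding anything
            return None
        current = parent
```

A caller would typically pass <tt>os.getcwd()</tt> and treat a result of
<tt>None</tt> as the error case mentioned above.
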
Notice that the checkout database contains a pointer to the repository
database but that the repository database has no record of the checkout
databases. That means that a working checkout directory tree can be
freely renamed or copied or deleted without consequence. But the
repository database file, on the other hand, has to stay in the same
place with the same name or else the open checkout databases will not
be able to find it.

A checkout database is created by the [/help/open | fossil open] command.
A checkout database is deleted by [/help/close | fossil close]. The
fossil close command really isn't needed; one can accomplish the same
thing simply by deleting the checkout database.

Note that the stash, the undo stack, and the state of the bisect command
are all contained within the checkout database. That means that the
fossil close command will delete all stash content, the undo stack, and
the bisect state. The close command is not undoable. Use it with care.

<h2>3.0 See Also</h2>

* [./makefile.wiki | The Fossil Build Process]
* [./contribute.wiki | How To Contribute Code To Fossil]
* [./adding_code.wiki | Adding New Features To Fossil]
