<title>A Technical Overview of Fossil's Design & Implementation</title>

<h2>1.0 Introduction</h2>

At its lowest level, a Fossil repository consists of an unordered set
of immutable "artifacts". You might think of these artifacts as "files",
since in many cases the artifacts are exactly that.
But other "structural artifacts" are also included in the mix.
These structural artifacts define the relationships
between artifacts - which files go together to form a particular
version of the project, who checked in that version and when, what was
the check-in comment, what wiki pages are included with the project, what
are the edit histories of each wiki page, what bug reports or tickets are
included, who contributed to the evolution of each ticket, and so forth.
This low-level file format is called the "global state" of
the repository, since this is the information that is synced to peer
repositories using push and pull operations. The low-level file format
is also called "enduring" since it is intended to last for many years.
The details of the low-level, enduring, global file format
are [./fileformat.wiki | described separately].

This article is about how Fossil is currently implemented. Instead of
dealing with vague abstractions of "enduring file formats" as the
[./fileformat.wiki | other document] does, this article provides
some detail on how Fossil actually stores information on disk.

<h2>2.0 Three Databases</h2>

Fossil stores state information in
[http://www.sqlite.org/ | SQLite] database files.
SQLite keeps an entire relational database, including multiple tables and
indices, in a single disk file. The SQLite library allows the database
files to be efficiently queried and updated using the industry-standard
SQL language. SQLite updates are atomic, so even in the event of
a system crash or power failure the repository content is protected.

Fossil uses three separate classes of SQLite databases:

<ol>
<li>The configuration database
<li>Repository databases
<li>Checkout databases
</ol>

The configuration database is a one-per-user database that holds
global configuration information used by Fossil. There is one
repository database per project. The repository database is the
file that people are normally referring to when they say
"a Fossil repository". The checkout database is found in the working
checkout for a project and contains state information that is unique
to that working checkout.

Fossil does not always use all three database files. The web interface,
for example, typically only uses the repository database. And the
[/help/settings | fossil settings] command only opens the configuration database
when the --global option is used. But other commands use all three
databases at once. For example, the [/help/status | fossil status]
command will first locate the checkout database, then use the checkout
database to find the repository database, then open the configuration
database. Whenever multiple databases are used at the same time,
they are all opened on the same SQLite database connection using
SQLite's [http://www.sqlite.org/lang_attach.html | ATTACH] command.
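
To illustrate what opening several database files on one connection
looks like, here is a minimal Python sketch using the standard sqlite3
module. It is not Fossil's own code: the file names, the schema name
"repo", and the "note" table are all hypothetical and chosen only for
the demonstration.

```python
import os
import sqlite3
import tempfile


def open_attached(main_path, extra):
    """Open main_path, then ATTACH each (schema_name, path) pair in `extra`
    to the same connection so one SQL statement can span all the files."""
    conn = sqlite3.connect(main_path)
    for schema_name, path in extra:
        # The schema name is an SQL identifier and cannot be a bound
        # parameter, so only trusted names should be passed here.
        conn.execute(f"ATTACH DATABASE ? AS {schema_name}", (path,))
    return conn


def demo():
    """Create two throwaway database files and query both in one statement."""
    with tempfile.TemporaryDirectory() as tmp:
        checkout_db = os.path.join(tmp, "checkout.db")  # hypothetical name
        repo_db = os.path.join(tmp, "repo.db")          # hypothetical name
        conn = open_attached(checkout_db, [("repo", repo_db)])
        conn.execute("CREATE TABLE main.note(msg TEXT)")
        conn.execute("CREATE TABLE repo.note(msg TEXT)")
        conn.execute("INSERT INTO main.note VALUES ('from checkout')")
        conn.execute("INSERT INTO repo.note VALUES ('from repository')")
        conn.commit()
        rows = conn.execute(
            "SELECT msg FROM main.note UNION ALL SELECT msg FROM repo.note"
        ).fetchall()
        conn.close()
        return [row[0] for row in rows]
```

Calling demo() returns one row from each file. Because both files share
a connection, a single query can read from both.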

The chart below provides a quick summary of how each of these
database files is used by Fossil, with detailed discussion following.

<table align="center">
<tr valign="bottom">
<th style="text-align:center">Configuration&nbsp;Database<br>"~/.fossil" or<br>
"~/.config/fossil.db"
<th style="text-align:center">Repository Database<br>"<i>project</i>.fossil"
<th style="text-align:center">Checkout Database<br>"_FOSSIL_" or ".fslckout"
<tr valign="top">
<td><ul>
<li>Global [/help/settings | settings]
<li>List of active repositories used by the [/help/all | all] command
</ul></td>
<td><ul>
<li>[./fileformat.wiki | Global state of the project]
encoded using delta-compression
<li>Local [/help/settings | settings]
<li>Web interface display preferences
<li>User credentials and permissions
<li>Metadata about the global state to facilitate rapid
queries
</ul></td>
<td><ul>
<li>The repository database used by this checkout
<li>The version currently checked out
<li>Other versions [/help/merge | merged] in but not
yet [/help/commit | committed]
<li>Changes from the [/help/add | add], [/help/delete | delete],
and [/help/rename | rename] commands that have not yet been committed
<li>"mtime" values and other information used to efficiently detect
local edits
<li>The "[/help/stash | stash]"
<li>Information needed to "[/help/undo|undo]" or "[/help/redo|redo]"
</ul></td>
</tr>
</table>

<h3 id="configdb">2.1 The Configuration Database</h3>

The configuration database holds cross-repository preferences and a list of all
repositories for a single user.

The [/help/settings | fossil settings] command can be used to specify various
operating parameters and preferences for Fossil repositories. Settings can
apply to a single repository, or they can apply globally to all repositories
for a user. If both a global and a repository value exist for a setting,
then the repository-specific value takes precedence. All of the settings
have reasonable defaults, and so many users will never need to change them.
But if changes to settings are desired, the configuration database provides
a way to change settings for all repositories with a single command, rather
than having to change the setting individually on each repository.
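
The precedence rule described above can be sketched as a simple lookup
chain. This is only an illustration of the rule, not Fossil's
implementation. The function name, the use of plain dictionaries, and
the sample values are assumptions made for the example.

```python
def effective_setting(name, repo_settings, global_settings, defaults):
    """Pick a value in the order the text describes: the repository-specific
    value wins, then the per-user global value, then the built-in default."""
    for source in (repo_settings, global_settings, defaults):
        if name in source:
            return source[name]
    raise KeyError(name)


# Hypothetical values, for illustration only.
defaults = {"autosync": "on", "editor": "vi"}
global_settings = {"editor": "nano"}
repo_settings = {"autosync": "off"}
```

With these sample values, looking up "autosync" yields the repository
value "off", while "editor" falls back to the global value "nano".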

The configuration database also maintains a list of repositories. This
list is used by the [/help/all | fossil all] command in order to run various
operations such as "sync" or "rebuild" on all repositories managed by a user.

<h4 id="configloc">2.1.1 Location Of The Configuration Database</h4>

On Unix systems, the configuration database is named by the following
algorithm:

<table>
<tr><td>1. if environment variable FOSSIL_HOME exists
<td>&nbsp;&rarr;&nbsp;<td>$FOSSIL_HOME/.fossil
<tr><td>2. if file ~/.fossil exists
<td>&nbsp;&rarr;<td>~/.fossil
<tr><td>3. if environment variable XDG_CONFIG_HOME exists
<td>&nbsp;&rarr;<td>$XDG_CONFIG_HOME/fossil.db
<tr><td>4. if the directory ~/.config exists
<td>&nbsp;&rarr;<td>~/.config/fossil.db
<tr><td>5. Otherwise<td>&nbsp;&rarr;<td>~/.fossil
</table>

Another way of thinking of this algorithm is the following:

* Use "$FOSSIL_HOME/.fossil" if the FOSSIL_HOME variable is defined
* Use the XDG-compatible name (usually ~/.config/fossil.db) on XDG systems
if the ~/.fossil file does not already exist
* Otherwise, use the traditional unix name of "~/.fossil"
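
The five-step Unix lookup can be written out as a small function. The
following Python sketch mirrors the table row by row. It is not taken
from the Fossil source. The function name and the injectable "exists"
and "isdir" callbacks are assumptions added so the logic can be tried
without touching a real home directory, and the sketch treats any
defined variable, even an empty one, as "existing".

```python
import os
import posixpath


def unix_config_db(env, home, exists=os.path.exists, isdir=os.path.isdir):
    """Return the configuration database path chosen by the five-step table.

    env   -- mapping of environment variables (for example os.environ)
    home  -- the user's home directory, standing in for "~"
    """
    legacy = posixpath.join(home, ".fossil")
    if "FOSSIL_HOME" in env:                          # step 1
        return posixpath.join(env["FOSSIL_HOME"], ".fossil")
    if exists(legacy):                                # step 2
        return legacy
    if "XDG_CONFIG_HOME" in env:                      # step 3
        return posixpath.join(env["XDG_CONFIG_HOME"], "fossil.db")
    if isdir(posixpath.join(home, ".config")):        # step 4
        return posixpath.join(home, ".config", "fossil.db")
    return legacy                                     # step 5
```

For a real lookup one might call unix_config_db(os.environ,
os.path.expanduser("~")). The
[/help/info|fossil info] command remains the authoritative way to see
which file Fossil actually uses.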

This algorithm is complex due to the need for historical compatibility.
Originally, the database was always just "~/.fossil". Then support
for the FOSSIL_HOME environment variable was added. Later, support for the
[https://specifications.freedesktop.org/basedir-spec/basedir-spec-latest.html|XDG-compatible configuration filenames]
was added. Each of these changes needed to continue to support legacy
installations.

On Windows, the configuration database is the first of the following
for which the corresponding environment variables exist:

* %FOSSIL_HOME%/_fossil
* %LOCALAPPDATA%/_fossil
* %APPDATA%/_fossil
* %USERPROFILE%/_fossil
* %HOMEDRIVE%%HOMEPATH%/_fossil

The second case is the one that usually determines the name. Note that the
FOSSIL_HOME environment variable can always be set to determine the
location of the configuration database. Note also that the configuration
database file itself is called ".fossil" or "fossil.db" on Unix but
"_fossil" on Windows.
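
The Windows rule is a "first match wins" scan over a fixed list. A
minimal Python sketch of that scan follows. The function and constant
names are hypothetical, and the fourth entry is taken to be the
standard Windows variable USERPROFILE. Returning None when nothing
matches is a choice made for this sketch, since the text does not say
what Fossil does in that case.

```python
# Each entry lists the variables that must all be defined for that
# candidate; their values are concatenated to form the directory.
WINDOWS_CANDIDATES = [
    ("FOSSIL_HOME",),
    ("LOCALAPPDATA",),
    ("APPDATA",),
    ("USERPROFILE",),
    ("HOMEDRIVE", "HOMEPATH"),
]


def windows_config_db(env):
    """Return the first candidate path whose variables are all defined,
    or None if no candidate applies (behavior assumed for this sketch)."""
    for names in WINDOWS_CANDIDATES:
        if all(name in env for name in names):
            directory = "".join(env[name] for name in names)
            return directory + "/_fossil"
    return None
```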

The [/help/info|fossil info] command will show the location of
the configuration database on a line that starts with "config-db:".

<h3>2.2 Repository Databases</h3>

The repository database is the file that is commonly referred to as
"the repository". This is because the repository database contains,
among other things, the complete revision, ticket, and wiki history for
a project. It is customary to name the repository database after the
name of the project, with a ".fossil" suffix. For example, the repository
database for the self-hosting Fossil repository is called "fossil.fossil"
and the repository database for SQLite is called "sqlite.fossil".

<h4>2.2.1 Global Project State</h4>

The bulk of the repository database (typically 75 to 85%) consists
of the artifacts that comprise the
[./fileformat.wiki | enduring, global, shared state] of the project.
The artifacts are stored as BLOBs, compressed using
[http://www.zlib.net/ | zlib compression] and, where applicable,
using [./delta_encoder_algorithm.wiki | delta compression].
The combination of zlib and delta compression results in considerable
space savings. For the SQLite project (when this paragraph was last
updated on 2020-02-08)
the total size of all artifacts is over 7.1 GB but thanks to the
combined zlib and delta compression, that content takes less than
97 MB of space in the repository database, for a compression ratio
of about 74:1. The median size of all content BLOBs after delta
and zlib compression have been applied is 156 bytes.
The median size of BLOBs without compression is 45,312 bytes.
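
As a quick sanity check on the quoted ratio, the arithmetic can be
reproduced in a few lines of Python. The figures in the text are bounds
("over 7.1 GB", "less than 97 MB"), and the text does not say whether
the units are decimal or binary, so this sketch computes both readings
and only confirms that they bracket the stated "about 74:1".

```python
def compression_ratio(raw_bytes, stored_bytes):
    """Ratio of uncompressed size to stored size."""
    return raw_bytes / stored_bytes


# Decimal units: 1 GB = 10**9 bytes, 1 MB = 10**6 bytes.
decimal_ratio = compression_ratio(7.1e9, 97e6)

# Binary units: 1 GB = 2**30 bytes, 1 MB = 2**20 bytes.
binary_ratio = compression_ratio(7.1 * 2**30, 97 * 2**20)
```

The decimal reading gives a little over 73:1 and the binary reading a
little under 75:1.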

Note that zlib and delta compression are not an inherent part of the
Fossil file format; they are just an optimization.
The enduring file format for Fossil is the unordered
set of artifacts. The compression techniques are just a detail of
how the current implementation of Fossil happens to store these artifacts
efficiently on disk.

All of the original uncompressed and un-delta'd artifacts can be extracted
from a Fossil repository database using
the [/help/deconstruct | fossil deconstruct]
command. Individual artifacts can be extracted using the
[/help/artifact | fossil artifact] command.
When accessing the repository database using raw SQL and the
[/help/sqlite3 | fossil sql] command, the extension function
"<tt>content()</tt>" with a single argument which is the SHA1 or
SHA3-256 hash
of an artifact will return the complete uncompressed
content of that artifact.

Going the other way, the [/help/reconstruct | fossil reconstruct]
command will scan a directory hierarchy and add all files found to
a new repository database. The [/help/import | fossil import] command
works by reading the input git-fast-export stream and using it to construct
corresponding artifacts which are then written into the repository database.

<h4>2.2.2 Project Metadata</h4>

The global project state information in the repository database is
supplemented by computed metadata that makes querying the project state
more efficient. Metadata includes information such as the following:

* The names for all files found in any check-in.
* All check-ins that modify a given file.
* Parents and children of each check-in.
* Potential timeline rows.
* The names of all symbolic tags and the check-ins they apply to.
* The names of all wiki pages and the artifacts that comprise each
wiki page.
* Attachments and the wiki pages or tickets they apply to.
* Current content of each ticket.
* Cross-references between tickets, check-ins, and wiki pages.

The metadata is held in various SQL tables in the repository database.
The metadata is designed to facilitate queries for the various timelines and
reports that Fossil generates.
As the functionality of Fossil evolves,
the schema for the metadata can and does change.
But schema changes do not invalidate the repository. Remember that the
metadata contains no new information - only information that has been
extracted from the canonical artifacts and saved in a more useful form.
Hence, when the metadata schema changes, the prior metadata can be discarded
and the entire metadata corpus can be recomputed from the canonical
artifacts. That is what the
[/help/rebuild | fossil rebuild] command does.

<h4>2.2.3 Display And Processing Preferences</h4>

The repository database also holds information used to help format
the display of web pages and configuration settings that override the
global configuration settings for the specific repository. All of
this information (and the user credentials and privileges too) is
local to each repository database; it is not shared between repositories
by [/help/sync | fossil sync]. That is because it is entirely reasonable
that two different websites for the same project might have completely
different display preferences and user communities. One instance of the
project might be a fork of the other, for example, which pulls from the
other but never pushes and extends the project in ways that the keepers of
the other website disapprove of.

Display and processing information includes the following:

* The name and description of the project
* The CSS file, header, and footer used by all web pages
* The project logo image
* Fields of tickets that are considered "significant" and which are
therefore collected from artifacts and made available for display
* Templates for screens to view, edit, and create tickets
* Ticket report formats and display preferences
* Local values for [/help/settings | settings] that override the
global values defined in the per-user configuration database.

Though the display and processing preferences do not move between
repository instances using [/help/sync | fossil sync], this information
can be shared between repositories using the
[/help/config | fossil config push] and
[/help/config | fossil config pull] commands.
The display and processing information is also copied into new
repositories when they are created using
[/help/clone | fossil clone].

<h4>2.2.4 User Credentials And Privileges</h4>

Just because two development teams are collaborating on a project and allow
push and/or pull between their repositories does not mean that they
trust each other enough to share passwords and access privileges.
Hence the names and emails and passwords and privileges of users are
considered private information that is kept locally in each repository.

Each repository database has a table holding the username, privileges,
and login credentials for users authorized to interact with that particular
database. In addition, there is a table named "concealed" that maps the
SHA1 hash of each user's email address back into their true email address.
The concealed table allows just the SHA1 hash of email addresses to
be stored in tickets, and thus prevents actual email addresses from falling
into the hands of spammers who happen to clone the repository.
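
The idea behind the concealed table can be sketched with Python's
hashlib and sqlite3 modules. Only the table name "concealed" comes from
the text above. The column names, the helper functions, and the choice
to hash the UTF-8 bytes of the address exactly as typed are assumptions
made for this illustration and may differ from what Fossil does.

```python
import hashlib
import sqlite3


def new_db():
    """Create an in-memory database with a stand-in concealed table.
    The column names here are assumptions, not Fossil's actual schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE concealed(hash TEXT PRIMARY KEY, content TEXT)")
    return conn


def conceal(conn, email):
    """Record the hash -> email mapping locally and return the hash that
    would be stored in a ticket in place of the address."""
    digest = hashlib.sha1(email.encode("utf-8")).hexdigest()
    conn.execute(
        "INSERT OR IGNORE INTO concealed(hash, content) VALUES (?, ?)",
        (digest, email),
    )
    return digest


def reveal(conn, digest):
    """Map a hash back to the address, if this repository knows it."""
    row = conn.execute(
        "SELECT content FROM concealed WHERE hash = ?", (digest,)
    ).fetchone()
    return row[0] if row else None
```

A clone that lacks the mapping sees only the 40-character hash and
cannot look up the address from it.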

The content of the user and concealed tables can be pushed and pulled using the
[/help/config | fossil config push] and
[/help/config | fossil config pull] commands with "user" and
"email" as the AREA argument, but only if you have administrative
privileges on the remote repository.

<h4>2.2.5 Shunned Artifact List</h4>

The set of canonical artifacts for a project - the global state for the
project - is intended to be an append-only database. In other words,
new artifacts can be added but artifacts can never be removed. But
it sometimes happens that inappropriate content is mistakenly or
maliciously added to a repository. The only way to get rid of
the undesired content is to [./shunning.wiki | "shun"] it.
The "shun" table in the repository database records the hash values for
all shunned artifacts.

The shun table can be pushed or pulled using
the [/help/config | fossil config] command with the "shun" AREA argument.
The shun table is also copied during a [/help/clone | clone].

<h3 id="localdb">2.3 Checkout Databases</h3>

Fossil allows a single repository
to have multiple working checkouts. Each working checkout has a single
database in its root directory that records the state of that checkout.
The checkout database is named "_FOSSIL_" or ".fslckout".
The checkout database records information such as the following:

* The name of the repository database file.
* The version that is currently checked out.
* Files that have been [/help/add | added],
[/help/rm | removed], or [/help/mv | renamed] but not
yet committed.
* The mtime and size of files as they were originally checked out,
in order to expedite checking which files have been edited.
* Other check-ins that have been [/help/merge | merged] into the
working checkout but not yet committed.
* Copies of files prior to the most recent undoable operation - needed to
implement the [/help/undo | undo] and [/help/redo | redo] commands.
* The [/help/stash | stash].
* State information for the [/help/bisect | bisect] command.

For Fossil commands that run from within a working checkout, the
first thing that happens is that Fossil locates the checkout database.
Fossil first looks in the current directory. If not found there, it
looks in the parent directory. If not found there, the parent of the
parent. And so forth until either the checkout database is found
or the search reaches the root of the file system. (In the latter case,
Fossil returns an error, of course.) Once the checkout database is
located, it is used to locate the repository database.
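
The upward search described above is straightforward to express in
code. The following Python sketch walks from a starting directory
toward the file system root, looking for either of the two file names
mentioned earlier. It is an illustration rather than Fossil's own
implementation. The function names are hypothetical, and returning None
where Fossil reports an error is a simplification.

```python
import tempfile
from pathlib import Path

# The two checkout database names given in the text.
CHECKOUT_DB_NAMES = ("_FOSSIL_", ".fslckout")


def find_checkout_db(start):
    """Search `start` and then each ancestor directory for a checkout
    database. Return its path, or None if the root is reached first."""
    directory = Path(start).resolve()
    for candidate_dir in (directory, *directory.parents):
        for name in CHECKOUT_DB_NAMES:
            candidate = candidate_dir / name
            if candidate.is_file():
                return candidate
    return None


def demo():
    """Build a throwaway checkout layout and search from a nested folder."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp).resolve()
        (root / ".fslckout").touch()   # empty stand-in for a real database
        nested = root / "src" / "lib"
        nested.mkdir(parents=True)
        found = find_checkout_db(nested)
        return found is not None and found.parent == root
```

In demo(), the search starts two levels below the checkout root and
still finds the database there, which is why Fossil commands work from
any subdirectory of a checkout.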

Notice that the checkout database contains a pointer to the repository
database but that the repository database has no record of the checkout
databases. That means that a working checkout directory tree can be
freely renamed or copied or deleted without consequence. But the
repository database file, on the other hand, has to stay in the same
place with the same name or else the open checkout databases will not
be able to find it.

A checkout database is created by the [/help/open | fossil open] command.
A checkout database is deleted by [/help/close | fossil close]. The
fossil close command really isn't needed; one can accomplish the same
thing simply by deleting the checkout database.

Note that the stash, the undo stack, and the state of the bisect command
are all contained within the checkout database. That means that the
fossil close command will delete all stash content, the undo stack, and
the bisect state. The close command is not undoable. Use it with care.

<h2>3.0 See Also</h2>

* [./makefile.wiki | The Fossil Build Process]
* [./contribute.wiki | How To Contribute Code To Fossil]
* [./adding_code.wiki | Adding New Features To Fossil]