<title>Fossil Repository Integrity Self-Checks</title>

Fossil is designed with features to give it a high level
of integrity so that users can have confidence that content will
never be mangled or lost by Fossil.
This note describes the defensive measures that
Fossil uses to help prevent information loss due to bugs.

Fossil has been hosting itself and many other projects for
years now. Many bugs have been encountered. But, thanks in large
part to the defensive measures described here, no data has been
lost. The integrity checks are doing their job well.

<h2>Atomic Check-ins With Rollback</h2>

The Fossil repository is stored in an
<a href="http://www.sqlite.org/">SQLite</a> database file.
([./tech_overview.wiki | Additional information] about the repository
file format.)
SQLite is very mature and stable and has been in widespread use for many
years, so we are confident it will not cause repository
corruption. SQLite
databases are not corrupted even if a program crash, system crash, or power
failure occurs in the middle of an update. If some kind of crash
does occur in the middle of a change, then all the changes are rolled
back the next time that the database is accessed.

A check-in operation in Fossil makes many changes to the repository
database. But all these changes happen within a single transaction.
If something goes wrong in the middle of the commit, even if that something
is a power failure or OS crash, then the transaction
is rolled back and the database is unchanged.
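
The all-or-nothing behavior described above can be sketched with Python's
standard <code>sqlite3</code> module. This is an illustration only, not
Fossil's code: the <code>blob</code> table, the function names, and the
<code>fail_after</code> parameter are hypothetical, and a raised exception
stands in for a failure in the middle of the commit.

```python
import sqlite3

def open_repo():
    """Create an in-memory stand-in for a repository database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE blob(name TEXT PRIMARY KEY, content BLOB)")
    conn.commit()
    return conn

def atomic_check_in(conn, files, fail_after=None):
    """Insert every file in one transaction; any error undoes all of them."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for count, (name, content) in enumerate(files.items(), start=1):
                conn.execute(
                    "INSERT INTO blob(name, content) VALUES (?, ?)",
                    (name, content),
                )
                if count == fail_after:
                    # Hypothetical failure injected part-way through.
                    raise RuntimeError("simulated failure in mid-commit")
        return True
    except RuntimeError:
        return False

def blob_count(conn):
    """Number of rows currently visible in the blob table."""
    return conn.execute("SELECT count(*) FROM blob").fetchone()[0]
```

Calling <code>atomic_check_in()</code> with <code>fail_after=1</code> inserts
one row and then fails, yet <code>blob_count()</code> reports the same number
of rows as before the call, because the partial work was rolled back.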

<h2>Verification Of Delta Encodings Prior To Transaction Commit</h2>

The content files that comprise the global state of a Fossil repository
are stored in the repository as a tree. The leaves of the tree are
stored as zlib-compressed BLOBs. Interior nodes are deltas from their
descendants. A lot of encoding is going on. There is
zlib compression, which is relatively well-tested but still might
cause corruption if used improperly. And there is the relatively
new [./delta_encoder_algorithm.wiki | delta-encoding mechanism] designed
expressly for Fossil. We want
to make sure that bugs in these encoding mechanisms do not lead to
loss of data.

To increase our confidence that everything in the repository is
recoverable, Fossil makes sure it can extract an exact replica
of every content file that it changes just prior to transaction
commit. During the course of a check-in (or other repository
operation), many different files
in the repository might be modified. Some files are simply
compressed. Other files are delta encoded and then compressed.
While all this is going on, Fossil makes a record of every file
and the SHA1 or SHA3-256 hash of the original content of that
file. Then just before transaction commit, Fossil re-extracts
the original content of all files that were written, recomputes
the hash, and verifies that the recomputed hash still matches.
If anything does not match up, an error
message is printed and the transaction rolls back.

In other words, Fossil always checks to make sure it can
re-extract a file before it commits a change to that file.
Hence bugs in Fossil are unlikely to corrupt the repository in
a way that prevents us from extracting historical versions of
files.
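
The verify-before-commit idea can be sketched as follows. This is a
simplified model under stated assumptions, not Fossil's implementation:
the <code>blob</code> table layout, the <code>pending</code> dictionary, the
function names, and the <code>corrupt</code> flag (which simulates an encoder
bug) are all hypothetical, and only zlib compression is modeled, not delta
encoding.

```python
import hashlib
import sqlite3
import zlib

class VerifyError(Exception):
    """Raised when stored content cannot be re-extracted exactly."""

def store(conn, pending, rid, content, corrupt=False):
    """Compress content into the blob table and record its original hash."""
    encoded = zlib.compress(content)
    if corrupt:
        # Simulate an encoder bug that silently drops the last byte.
        encoded = zlib.compress(content[:-1])
    conn.execute("INSERT INTO blob(rid, data) VALUES (?, ?)", (rid, encoded))
    pending[rid] = hashlib.sha3_256(content).hexdigest()

def verify_before_commit(conn, pending):
    """Re-extract everything written and compare against the recorded hashes."""
    for rid, expected in pending.items():
        (data,) = conn.execute(
            "SELECT data FROM blob WHERE rid = ?", (rid,)
        ).fetchone()
        actual = hashlib.sha3_256(zlib.decompress(data)).hexdigest()
        if actual != expected:
            raise VerifyError(f"artifact {rid} does not re-extract correctly")

def verified_check_in(conn, files, corrupt_rid=None):
    """Write all files, verify them, and commit only if every hash matches."""
    pending = {}
    try:
        with conn:  # rolls back if verification raises
            for rid, content in files.items():
                store(conn, pending, rid, content, corrupt=(rid == corrupt_rid))
            verify_before_commit(conn, pending)
        return True
    except VerifyError as err:
        print(f"error: {err}; transaction rolled back")
        return False
```

Because verification runs inside the same transaction as the writes, a
mismatch leaves the database exactly as it was before the check-in began.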

<h2>Checksum Over All Files In A Check-in</h2>

Manifest artifacts that define a check-in have two fields, the
R-card and the Z-card, that record MD5 hashes. The Z-card covers
the manifest itself and the R-card covers all files in the
check-in. Prior to any check-in
commit, these checksums are verified to ensure that the check-in
agrees exactly with what is on disk. Similarly,
the repository checksum is verified after a checkout to make
sure that the entire repository was checked out correctly.
Note that these added checks use a different hash algorithm (MD5)
in order to avoid common-mode failures in the hash
algorithm implementation.
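
A single checksum over every file in a check-in might look like the sketch
below. The byte layout that is hashed here (name, size, and content of each
file in sorted name order) is an assumption made for illustration; the
authoritative definition of the R-card is in the
[./fileformat.wiki | file format] document. The function names are
hypothetical.

```python
import hashlib

def files_checksum(files):
    """MD5 over every file, visited in sorted name order.

    Each file contributes its name, its size, and its content, so a
    renamed, truncated, or altered file changes the result.
    """
    md5 = hashlib.md5()
    for name in sorted(files):
        content = files[name]
        md5.update(f"{name} {len(content)}\n".encode("utf-8"))
        md5.update(content)
    return md5.hexdigest()

def matches_manifest(recorded_checksum, files_on_disk):
    """True when the files on disk agree with the recorded checksum."""
    return files_checksum(files_on_disk) == recorded_checksum
```

The same function serves both directions: it is computed over the working
files before a commit and recomputed over the files written by a checkout.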

<h2>Checksums On Structural Artifacts And Deltas</h2>

Every [./fileformat.wiki | structural artifact] in a Fossil repository
contains a "Z-card" bearing an MD5 checksum over the rest of the
artifact. Any mismatch causes the structural artifact to be ignored.
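
The Z-card check can be sketched as below. The sketch assumes that the
Z-card is the final line of the artifact, written as "Z" followed by a space
and the hexadecimal MD5 of all preceding text; treat that layout, the sample
cards, and the function names as illustrative assumptions and consult the
[./fileformat.wiki | file format] document for the exact rules.

```python
import hashlib

def add_z_card(body):
    """Append a Z-card holding the MD5 of everything that precedes it."""
    return body + "Z " + hashlib.md5(body.encode("utf-8")).hexdigest() + "\n"

def checked_body(text):
    """Return the artifact body if its Z-card checks out, otherwise None."""
    if not text.endswith("\n"):
        return None
    body, sep, z_line = text[:-1].rpartition("\n")
    if not sep or not z_line.startswith("Z "):
        return None
    body += "\n"  # the newline before the Z-card belongs to the body
    if hashlib.md5(body.encode("utf-8")).hexdigest() != z_line[2:]:
        return None  # mismatch: the artifact is ignored
    return body
```

Returning <code>None</code> on any mismatch models the rule that a damaged
structural artifact is ignored rather than trusted.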

The [./delta_format.wiki | file delta format] includes a 32-bit
checksum of the target file. Whenever a file is reconstructed from
a delta, that checksum is verified to make sure the reconstruction
was done correctly.
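
The following toy example shows the principle of checking a reconstructed
file against a checksum carried in the delta. It does not use Fossil's delta
format: the copy/insert operation list is a hypothetical stand-in, and
<code>zlib.crc32</code> is swapped in for the 32-bit checksum that the real
format defines.

```python
import zlib

def make_delta(ops, target):
    """Bundle the edit operations with a 32-bit checksum of the target."""
    return {"ops": ops, "checksum": zlib.crc32(target)}

def apply_delta(source, delta):
    """Rebuild the target from source and verify it before returning it."""
    out = bytearray()
    for op in delta["ops"]:
        if op[0] == "copy":
            # ("copy", offset, length): reuse bytes from the source file.
            _, offset, length = op
            out += source[offset:offset + length]
        else:
            # ("insert", data): literal bytes carried in the delta.
            out += op[1]
    target = bytes(out)
    if zlib.crc32(target) != delta["checksum"]:
        raise ValueError("delta checksum mismatch: reconstruction is wrong")
    return target
```

Applying a delta to the wrong source, or applying a damaged delta, is
therefore detected at reconstruction time instead of yielding silently
corrupted content.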

<h2>Reliability Versus Performance</h2>

Some version control systems make a big deal out of being "high performance"
or the "fastest version control system". Fossil makes no such claims and has
no such ambition. Indeed, profiling indicates that Fossil bears a
substantial performance cost for
doing all of the checksumming and verification outlined above.
Fossil takes the philosophy of the
<a href="http://en.wikipedia.org/wiki/The_Tortoise_and_the_Hare">tortoise</a>:
reliability is more important than raw speed. The developers of
Fossil see no merit in getting the wrong answer quickly.

Fossil may not be the fastest versioning system, but it is <i>fast enough</i>.
Fossil runs quickly enough to stay out of the developer's way.
Most operations complete in milliseconds, faster than you can press
the "Enter" key.