Fossil SCM

Updates to the technical overview document.

drh 2012-02-04 13:55 trunk
Commit e255caa2c71a3b35db7163ea42bfe4ba9e34e43c
1 file changed +16 -18
--- www/tech_overview.wiki
+++ www/tech_overview.wiki
@@ -5,41 +5,38 @@
 
 <h2>1.0 Introduction</h2>
 
 At its lowest level, a Fossil repository consists of an unordered set
 of immutable "artifacts". You might think of these artifacts as "files",
-since in many cases the artifacts exactly correspond to source code files
-that are stored in the Fossil repository. But other "control artifacts"
+since in many cases the artifacts exactly that. But other "control artifacts"
 are also included in the mix. These control artifacts define the relationships
 between artifacts - which files go together to form a particular
 version of the project, who checked in that version and when, what was
 the check-in comment, what wiki pages are included with the project, what
 are the edit histories of each wiki page, what bug reports or tickets are
-included, who contributed to the evolution of each ticket, and so forth,
-and so on. This low-level file format is called the "global state" of
+included, who contributed to the evolution of each ticket, and so forth.
+This low-level file format is called the "global state" of
 the repository, since this is the information that is synced to peer
 repositories using push and pull operations. The low-level file format
 is also called "enduring" since it is intended to last for many years.
 The details of the low-level, enduring, global file format
 are [./fileformat.wiki | described separately].
 
 This article is about how Fossil is currently implemented. Instead of
 dealing with vague abstractions of "enduring file formats" as the
-[./fileformat.wiki | that other document] does, this article provides
+[./fileformat.wiki | other document] does, this article provides
 some detail on how Fossil actually stores information on disk.
 
 <h2>2.0 Three Databases</h2>
 
 Fossil stores state information in
 [http://www.sqlite.org/ | SQLite] database files.
 SQLite keeps an entire relational database, including multiple tables and
 indices, in a single disk file. The SQLite library allows the database
 files to be efficiently queried and updated using the industry-standard
-SQL language. And SQLite makes updates to these database files atomic,
-even if a system crashes or power failure occurs in the middle of the
-update, meaning that repository content is protected even during severe
-malfunctions.
+SQL language. SQLite updates are atomic, so even in the event of
+a system crashes or power failure the repository content is protected.
 
 Fossil uses three separate classes of SQLite databases:
 
 <ol>
 <li>The configuration database
@@ -152,14 +149,15 @@
 The artifacts are stored as BLOBs, compressed using
 [http://www.zlib.net/ | zlib compression] and, where applicable,
 using [./delta_encoder_algorithm.wiki | delta compression].
 The combination of zlib and delta compression results in a considerable
 space savings. For the SQLite project, at the time of this writing,
-the total size of all artifacts is over 1.7 GB but thanks to the
+the total size of all artifacts is over 2.0 GB but thanks to the
 combined zlib and delta compression, that content only takes up
-51.4 MB of space in the repository database, for a compression ratio
-of about 33:1.
+32 MB of space in the repository database, for a compression ratio
+of about 64:1. The average size of a content BLOB in the database
+is around 500 bytes.
 
 Note that the zlib and delta compression is not an inherent part of the
 Fossil file format; it is just an optimization.
 The enduring file format for Fossil is the unordered
 set of artifacts. The compression techniques are just a detail of
@@ -185,11 +183,11 @@
 
 <h4>2.2.2 Project Metadata</h4>
 
 The global project state information in the repository database is
 supplemented by computed metadata that makes querying the project state
-more efficient. Metadata includes but information such as the following:
+more efficient. Metadata includes information such as the following:
 
 * The names for all files found in any checkin.
 * All check-ins that modify a given file
 * Parents and children of each checkin.
 * Potential timeline rows.
@@ -200,13 +198,13 @@
 * Current content of each ticket.
 * Cross-references between tickets, checkins, and wiki pages.
 
 The metadata is held in various SQL tables in the repository database.
 The metadata is designed to facilitate queries for the various timelines and
-reports that Fossil generates.
+reports that Fossil generates.
 As the functionality of Fossil evolves,
-the schema for the metadata can and does change from time to time.
+the schema for the metadata can and does change.
 But schema changes do no invalidate the repository. Remember that the
 metadata contains no new information - only information that has been
 extracted from the canonical artifacts and saved in a more useful form.
 Hence, when the metadata schema changes, the prior metadata can be discarded
 and the entire metadata corpus can be recomputed from the canonical
@@ -273,13 +271,13 @@
 <h4>2.2.5 Shunned Artifact List</h4>
 
 The set of canonical artifacts for a project - the global state for the
 project - is intended to be an append-only database. In other words,
 new artifacts can be added but artifacts can never be removed. But
-it sometimes happens that inappropriate content can be mistakenly or
-maliciously added to a repository. When that happens, the only way
-to get rid of the content is to [./shunning.wiki | "shun"] it.
+it sometimes happens that inappropriate content is mistakenly or
+maliciously added to a repository. The only way to get rid of
+the undesired content is to [./shunning.wiki | "shun"] it.
 The "shun" table in the repository database records the SHA1 hash of
 all shunned artifacts.
 
 The shun table can be pushed or pulled using
 the [/help/config | fossil config] command with the "shun" AREA argument.
