Fossil SCM

fossil-scm / www / theory1.wiki

1	`<title>Thoughts On The Design Of The Fossil DVCS</title>`
2
3	`Two questions (or criticisms) that arise frequently regarding Fossil`
4	`can be summarized as follows:`
5
6	`1. Why is Fossil based on SQLite instead of a distributed NoSQL database?`
7
8	`2. Why is Fossil written in C instead of a modern high-level language?`
9
10	`Neither question can be answered directly because they are both`
11	`based on false assumptions. We claim that Fossil is not based on SQLite`
12	`at all and that Fossil is not based on a distributed NoSQL database`
13	`because Fossil is a distributed NoSQL database. And, Fossil does use`
14	`a modern high-level language for its implementation, namely SQL.`
15
16	`<h2>Fossil Is A NoSQL Database</h2>`
17
18	`We begin with the first question: Fossil is not based on a distributed`
19	`NoSQL database because Fossil <u><i>is</i></u> a distributed NoSQL database.`
20	`Fossil is <u>not</u> based on SQLite.`
21	`The current implementation of Fossil uses`
22	`SQLite as a local store for the content of the distributed database and as`
23	`a cache for meta-information about the distributed database that is precomputed`
24	`for quick and easy presentation. But the use of SQLite in this role is an`
25	`implementation detail and is not fundamental to the design. Some future`
26	`version of Fossil might do away with SQLite and substitute a pile-of-files or`
27	`a key/value database in place of SQLite.`
28	`(Actually, that is very unlikely`
29	`to happen since SQLite works amazingly well in its current role, but the point`
30	`is that omitting SQLite from Fossil is a theoretical possibility.)`
31
32	`The underlying database that Fossil implements has nothing to do with`
33	`SQLite, or SQL, or even relational database theory. The underlying`
34	`database is very simple: it is an unordered collection of "artifacts".`
35	`An artifact is a list of bytes - a "file" in the usual manner of thinking.`
36	`Many artifacts are simply the content of source files that have`
37	`been checked into the Fossil repository. Call these "content artifacts".`
38	`Other artifacts, known as`
39	`"control artifacts", contain ASCII text in a particular format that`
40	`defines relationships between other artifacts, such as which`
41	`content artifacts go together to form a particular version of the`
42	`project. Each artifact is named by its SHA1 or SHA3-256 hash and is`
43	`thus immutable.`
44	`Artifacts can be added to the database but not removed (if we ignore`
45	`the exceptional case of [./shunning.wiki | shunning].) Repositories`
46	`synchronize by computing the union of their artifact sets. SQL and`
47	`relational theory play no role in any of this.`
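
To make the bag-of-artifacts model above concrete, here is a minimal Python sketch. It is not how Fossil is implemented; the class `ArtifactBag` and its methods are hypothetical names invented for illustration, and only SHA3-256 naming is shown. It assumes nothing beyond what the paragraph states: artifacts are byte strings named by their hash, they can be added but not removed, and two repositories synchronize by taking the union of their sets.

```python
import hashlib


class ArtifactBag:
    """Hypothetical in-memory model of an unordered, append-only artifact set."""

    def __init__(self):
        self._artifacts = {}  # hex hash name -> immutable bytes

    def add(self, content: bytes) -> str:
        # The name is derived from the content, so the same bytes always
        # get the same name and an existing artifact is never overwritten.
        name = hashlib.sha3_256(content).hexdigest()
        self._artifacts.setdefault(name, content)
        return name

    def get(self, name: str) -> bytes:
        return self._artifacts[name]

    def names(self) -> set:
        return set(self._artifacts)

    def sync_with(self, other: "ArtifactBag") -> None:
        """Bring both bags to the union of their artifact sets."""
        for name in other.names() - self.names():
            self._artifacts[name] = other.get(name)
        for name in self.names() - other.names():
            other._artifacts[name] = self.get(name)


def demo():
    a, b = ArtifactBag(), ArtifactBag()
    h1 = a.add(b"int main(void){return 0;}\n")
    h2 = b.add(b"README for the project\n")
    a.sync_with(b)  # afterwards both bags hold both artifacts
    return a, b, h1, h2
```

Note that the sketch has no delete method and no notion of ordering, and that nothing in it involves SQL or tables.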
48
49	`SQL enters the picture only in the implementation details. The current`
50	`implementation of Fossil stores each artifact as a BLOB in a SQLite`
51	`database.`
52	`The current implementation also parses each control artifact as it`
53	`arrives and stores the information discovered from that parse in various`
54	`other SQLite tables to facilitate rapid generation of reports such as`
55	`timelines, file histories, file lists, branch lists, and so forth. Note`
56	`that all of this additional information is derived from the artifacts.`
57	`The artifacts are canonical. The relational tables serve only as a cache.`
58	`Everything in the relational tables can be recomputed`
59	`from the artifacts, and in fact that is exactly what happens when one runs`
60	`the "fossil rebuild" command on a repository.`
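
The relationship between canonical artifacts and the derived tables can be sketched with Python's built-in `sqlite3` module. Everything here is an assumption made for illustration: the table names `artifact` and `timeline`, the `TOY-CHECKIN` text format, and the function names are all hypothetical, and none of them reflects Fossil's actual schema or control-artifact syntax. The point is only that the derived table can be thrown away and recomputed from the artifacts, in the spirit of "fossil rebuild".

```python
import hashlib
import sqlite3


def open_repo():
    db = sqlite3.connect(":memory:")
    # Canonical store: each artifact is a BLOB keyed by its hash name.
    db.execute("CREATE TABLE artifact(name TEXT PRIMARY KEY, content BLOB)")
    return db


def add_artifact(db, content: bytes) -> str:
    name = hashlib.sha3_256(content).hexdigest()
    db.execute("INSERT OR IGNORE INTO artifact VALUES(?, ?)", (name, content))
    return name


def rebuild(db):
    """Drop the derived table and recompute it from the artifacts alone."""
    db.execute("DROP TABLE IF EXISTS timeline")
    db.execute("CREATE TABLE timeline(date TEXT, comment TEXT, name TEXT)")
    db.execute("CREATE INDEX timeline_date ON timeline(date)")
    rows = db.execute("SELECT name, content FROM artifact").fetchall()
    for name, content in rows:
        if not content.startswith(b"TOY-CHECKIN\n"):
            continue  # a content artifact: nothing to derive from it
        lines = content.decode("ascii").splitlines()[1:]
        fields = dict(line.split("=", 1) for line in lines)
        db.execute(
            "INSERT INTO timeline VALUES(?, ?, ?)",
            (fields["date"], fields["comment"], name),
        )


def timeline(db):
    return db.execute(
        "SELECT date, comment FROM timeline ORDER BY date DESC"
    ).fetchall()


def demo():
    db = open_repo()
    add_artifact(db, b"print('hello')\n")
    add_artifact(db, b"TOY-CHECKIN\ndate=2024-01-01\ncomment=initial import\n")
    add_artifact(db, b"TOY-CHECKIN\ndate=2024-02-15\ncomment=fix typo\n")
    rebuild(db)
    before = timeline(db)
    db.execute("DROP TABLE timeline")  # lose the cache entirely
    rebuild(db)                        # and recover it from the artifacts
    return before, timeline(db)
```

In this sketch the `timeline` table carries no information that is not already present in the `artifact` table; it only arranges that information so a report can be produced without re-parsing every artifact.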
61
62	`So really, Fossil works with two separate databases. There is the`
63	`bag-of-artifacts database which is non-relational and distributed (like`
64	`a NoSQL database) and there is the local relational database. The`
65	`bag-of-artifacts database has a fixed format and is what defines a Fossil`
66	`repository. Fossil will never modify the file format of the bag-of-artifacts`
67	`database in an incompatible way because to do so would be to make something`
68	`that is no longer "Fossil". The local relational database, on the other hand,`
69	`is a cache that contains information derived from the bag-of-artifacts.`
70	`The schema of the local relational database changes from time to time as`
71	`the Fossil implementation is enhanced, and the content is recomputed from`
72	`the unchanging bag of artifacts. The local relational database is an`
73	`implementation detail which currently happens to use SQLite.`
74
75	`Another way to think of the relational tables in a Fossil repository is`
76	`as an index for the artifacts. Without the relational tables,`
77	`to generate a report like a timeline would require scanning every artifact -`
78	`the equivalent of a full table scan. The relational tables hold pointers to`
79	`the relevant artifacts in presorted order so that generating a timeline`
80	`is much more efficient. So like an index in a relational database, the`
81	`relational tables in a Fossil repository do not add any new information,`
82	`they merely make the information in the artifacts faster and easier to`
83	`look up.`
84
85	`Fossil is not "based" on SQLite. Fossil simply exploits SQLite as`
86	`a powerful tool to make the implementation easier.`
87	`And Fossil doesn't use a distributed`
88	`NoSQL database because Fossil is a distributed NoSQL database. That answers`
89	`the first question.`
90
91	`<h2>SQL Is A High-Level Scripting Language</h2>`
92
93	`The second concern states that Fossil does not use a high-level scripting`
94	`language. But that is not true. Fossil uses SQL (as implemented by SQLite)`
95	`as its scripting language.`
96
97	`This misunderstanding likely arises because people fail`
98	`to appreciate that SQL is a programming language. People are taught that SQL`
99	`is a "query language" as if that were somehow different from a`
100	`"programming language". But they really are two different flavors of the`
101	`same thing. I find that people do better with SQL if they think of`
102	`SQL as a programming language and of each SQL statement`
103	`as a separate program. SQL is a peculiar programming language`
104	`in that one uses SQL to specify <i>what</i> to compute whereas in`
105	`most other programming languages one specifies <i>how</i>`
106	`to carry out the computation.`
107	`This difference means that SQL`
108	`is an extraordinarily high-level programming language, but it is still`
109	`just a programming language.`
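
The what-versus-how distinction can be illustrated with a small Python comparison. The data, the `checkin` table, and the function names are hypothetical and exist only for this example. The first function spells out how to count check-ins per user step by step; the second hands SQLite a single SQL statement that says what result is wanted and leaves the procedure to the database engine.

```python
import sqlite3

# Hypothetical sample data: (user, branch) for each check-in.
ROWS = [
    ("alice", "trunk"),
    ("bob", "trunk"),
    ("alice", "feature"),
    ("alice", "trunk"),
]


def counts_imperative(rows):
    """Say HOW: loop, accumulate, then sort by count and name."""
    counts = {}
    for user, _branch in rows:
        counts[user] = counts.get(user, 0) + 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def counts_sql(rows):
    """Say WHAT: one SQL statement describes the desired result."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE checkin(user TEXT, branch TEXT)")
    db.executemany("INSERT INTO checkin VALUES(?, ?)", rows)
    return db.execute(
        "SELECT user, count(*) FROM checkin "
        "GROUP BY user ORDER BY count(*) DESC, user"
    ).fetchall()
```

Both functions return the same list of (user, count) pairs for `ROWS`. The SQL version is itself a complete small program, just one written at a higher level than the loop.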
110
111	`For certain types of problems, SQL has a huge advantage over other`
112	`programming languages because it`