Fossil SCM

1	`<title>A Technical Overview of Fossil's Design & Implementation</title>`
2
3	`<h2>1.0 Introduction</h2>`
4
5	`At its lowest level, a Fossil repository consists of an unordered set`
6	`of immutable "artifacts". You might think of these artifacts as "files",`
7	`since in many cases the artifacts are exactly that.`
8	`But other "structural artifacts" are also included in the mix.`
9	`These structural artifacts define the relationships`
10	`between artifacts - which files go together to form a particular`
11	`version of the project, who checked in that version and when, what was`
12	`the check-in comment, what wiki pages are included with the project, what`
13	`are the edit histories of each wiki page, what bug reports or tickets are`
14	`included, who contributed to the evolution of each ticket, and so forth.`
15	`This low-level file format is called the "global state" of`
16	`the repository, since this is the information that is synced to peer`
17	`repositories using push and pull operations. The low-level file format`
18	`is also called "enduring" since it is intended to last for many years.`
19	`The details of the low-level, enduring, global file format`
20	`are [./fileformat.wiki \| described separately].`
21
22	`This article is about how Fossil is currently implemented. Instead of`
23	`dealing with vague abstractions of "enduring file formats" as the`
24	`[./fileformat.wiki \| other document] does, this article provides`
25	`some detail on how Fossil actually stores information on disk.`
26
27	`<h2>2.0 Three Databases</h2>`
28
29	`Fossil stores state information in`
30	`[http://www.sqlite.org/ \| SQLite] database files.`
31	`SQLite keeps an entire relational database, including multiple tables and`
32	`indices, in a single disk file. The SQLite library allows the database`
33	`files to be efficiently queried and updated using the industry-standard`
34	`SQL language. SQLite updates are atomic, so even in the event of`
35	`a system crash or power failure the repository content is protected.`
36
37	`Fossil uses three separate classes of SQLite databases:`
38
39	`<ol>`
40	`<li>The configuration database`
41	`<li>Repository databases`
42	`<li>Checkout databases`
43	`</ol>`
44
45	`The configuration database is a one-per-user database that holds`
46	`global configuration information used by Fossil. There is one`
47	`repository database per project. The repository database is the`
48	`file that people are normally referring to when they say`
49	`"a Fossil repository". The checkout database is found in the working`
50	`checkout for a project and contains state information that is unique`
51	`to that working checkout.`
52
53	`Fossil does not always use all three database files. The web interface,`
54	`for example, typically only uses the repository database. And the`
55	`[/help/settings \| fossil settings] command only opens the configuration database`
56	`when the --global option is used. But other commands use all three`
57	`databases at once. For example, the [/help/status \| fossil status]`
58	`command will first locate the checkout database, then use the checkout`
59	`database to find the repository database, then open the configuration`
60	`database. Whenever multiple databases are used at the same time,`
61	`they are all opened on the same SQLite database connection using`
62	`SQLite's [http://www.sqlite.org/lang_attach.html \| ATTACH] command.`
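
The single-connection arrangement described above can be sketched with Python's standard sqlite3 module. This is only an illustration under stated assumptions: the file names, the schema names "ckout" and "cfg", and the toy tables are hypothetical and are not Fossil's actual schema.

```python
import os
import sqlite3
import tempfile


def open_attached(repo_path, checkout_path, config_path):
    """Open one connection and ATTACH the other two database files to it."""
    conn = sqlite3.connect(repo_path)
    # The schema names "ckout" and "cfg" are illustrative only.
    conn.execute("ATTACH DATABASE ? AS ckout", (checkout_path,))
    conn.execute("ATTACH DATABASE ? AS cfg", (config_path,))
    return conn


def demo():
    """Create three toy database files and query them through one connection."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, n) for n in ("repo.db", "ckout.db", "cfg.db")]
        conn = open_attached(*paths)
        try:
            # One hypothetical table per database file.
            conn.execute("CREATE TABLE main.project(name TEXT)")
            conn.execute("CREATE TABLE ckout.state(version TEXT)")
            conn.execute("CREATE TABLE cfg.setting(name TEXT, value TEXT)")
            conn.execute("INSERT INTO main.project VALUES ('demo')")
            conn.execute("INSERT INTO ckout.state VALUES ('abc123')")
            conn.execute("INSERT INTO cfg.setting VALUES ('editor', 'vi')")
            conn.commit()
            # A single query can now read from all three files at once.
            row = conn.execute(
                "SELECT p.name, s.version, c.value "
                "FROM main.project AS p, ckout.state AS s, cfg.setting AS c"
            ).fetchone()
        finally:
            conn.close()
        return row
```

Because all three files share one connection, a single SQL statement can join data across them, which is the practical benefit of using ATTACH instead of three separate connections.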
63
64	`The chart below provides a quick summary of how each of these`
65	`database files is used by Fossil, with detailed discussion following.`
66
67	`<table align="center">`
68	`<tr valign="bottom">`
69	`<th style="text-align:center">Configuration Database<br>"~/.fossil" or<br>`
70	`"~/.config/fossil.db"`
71	`<th style="text-align:center">Repository Database<br>"<i>project</i>.fossil"`
72	`<th style="text-align:center">Checkout Database<br>"_FOSSIL_" or ".fslckout"`
73	`<tr valign="top">`
74	`<td><ul>`
75	`<li>Global [/help/settings \|settings]`
76	`<li>List of active repositories used by the [/help/all \| all] command`
77	`</ul></td>`
78	`<td><ul>`
79	`<li>[./fileformat.wiki \| Global state of the project]`
80	`encoded using delta-compression`
81	`<li>Local [/help/settings\|settings]`
82	`<li>Web interface display preferences`
83	`<li>User credentials and permissions`
84	`<li>Metadata about the global state to facilitate rapid`
85	`queries`
86	`</ul></td>`
87	`<td><ul>`
88	`<li>The repository database used by this checkout`
89	`<li>The version currently checked out`
90	`<li>Other versions [/help/merge \| merged] in but not`
91	`yet [/help/commit \| committed]`
92	`<li>Changes from the [/help/add \| add], [/help/delete \| delete],`
93	`and [/help/rename \| rename] commands that have not yet been committed`
94	`<li>"mtime" values and other information used to efficiently detect`
95	`local edits`
96	`<li>The "[/help/stash \| stash]"`
97	`<li>Information needed to "[/help/undo\|undo]" or "[/help/redo\|redo]"`
98	`</ul></td>`
99	`</tr>`
100	`</table>`
101
102	`<h3 id="configdb">2.1 The Configuration Database</h3>`
103
104	`The configuration database holds cross-repository preferences and a list of all`
105	`repositories for a single user.`
106
107	`The [/help/settings \| fossil settings] command can be used to specify various`
108	`operating parameters and preferences for Fossil repositories. Settings can`
109	`apply to a single repository, or they can apply globally to all repositories`
110	`for a user. If both a global and a repository value exist for a setting,`
111	`then the repository-specific value takes precedence. All of the settings`
112	`have reasonable defaults, and so many users will never need to change them.`
113	`But if changes to settings are desired, the configuration database provides`
114	`a way to change settings for all repositories with a single command, rather`
115	`than having to change the setting individually on each repository.`
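
The precedence rule described above (repository value first, then global value, then built-in default) can be sketched as a small lookup. The function name, the dictionaries, and the sample values are hypothetical and only illustrate the ordering; this is not how Fossil itself stores or reads settings.

```python
def effective_setting(name, repo_settings, global_settings, defaults):
    """Return the value for `name`: repository first, then global, then default."""
    if name in repo_settings:
        return repo_settings[name]
    if name in global_settings:
        return global_settings[name]
    return defaults.get(name)


# Hypothetical sample data, used only to exercise the lookup order.
DEFAULTS = {"autosync": "on", "editor": "vi"}
GLOBAL = {"editor": "emacs"}
REPO = {"autosync": "off"}

resolved = {
    name: effective_setting(name, REPO, GLOBAL, DEFAULTS)
    for name in ("autosync", "editor")
}
```

Here the repository value wins for "autosync", while "editor" falls back to the global value because the repository does not define it.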
116
117	`The configuration database also maintains a list of repositories. This`
118	`list is used by the [/help/all \| fossil all] command in order to run various`
119	`operations such as "sync" or "rebuild" on all repositories managed by a user.`
120
121	`<h4 id="configloc">2.1.1 Location Of The Configuration Database</h4>`
122
123	`On Unix systems, the configuration database is named by the following`
124	`algorithm:`
125
126	`<table>`
127	`<tr><td>1. if environment variable FOSSIL_HOME exists`
128	`<td> → <td>$FOSSIL_HOME/.fossil`
129	`<tr><td>2. if file ~/.fossil exists`
130	`<td> →<td>~/.fossil`
131	`<tr><td>3. if environment variable XDG_CONFIG_HOME exists`
132	`<td> →<td>$XDG_CONFIG_HOME/fossil.db`
133	`<tr><td>4. if the directory ~/.config exists`
134	`<td> →<td>~/.config/fossil.db`
135	`<tr><td>5. Otherwise<td> →<td>~/.fossil`
136	`</table>`
137
138	`Another way of thinking of this algorithm is the following:`
139
140	`* Use "$FOSSIL_HOME/.fossil" if the FOSSIL_HOME variable is defined`
141	`* Use the XDG-compatible name (usually ~/.config/fossil.db) on XDG systems`
142	`if the ~/.fossil file does not already exist`
143	`* Otherwise, use the traditional Unix name of "~/.fossil"`
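
The five-step Unix rule above can be written out as a short function. This is a minimal sketch, not Fossil's own code: the function name and its parameters are hypothetical, and the file-system checks are passed in as callables so that the logic can be exercised without touching a real home directory.

```python
import posixpath


def config_db_path(env, home, exists, isdir):
    """Pick the configuration database name using the five steps listed above.

    env    -- mapping of environment variables
    home   -- the user's home directory (the "~" in the table)
    exists -- callable: does this file exist?
    isdir  -- callable: is this path a directory?
    """
    # Step 1: FOSSIL_HOME always wins.
    fossil_home = env.get("FOSSIL_HOME")
    if fossil_home is not None:
        return posixpath.join(fossil_home, ".fossil")
    # Step 2: an existing legacy file is kept.
    legacy = posixpath.join(home, ".fossil")
    if exists(legacy):
        return legacy
    # Step 3: explicit XDG configuration directory.
    xdg = env.get("XDG_CONFIG_HOME")
    if xdg is not None:
        return posixpath.join(xdg, "fossil.db")
    # Step 4: default XDG location, if the directory exists.
    config_dir = posixpath.join(home, ".config")
    if isdir(config_dir):
        return posixpath.join(config_dir, "fossil.db")
    # Step 5: fall back to the traditional name.
    return legacy
```

For example, calling the function with an empty environment, no existing "~/.fossil" file, and an existing "~/.config" directory selects "~/.config/fossil.db".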
144
145	`This algorithm is complex due to the need for historical compatibility.`
146	`Originally, the database was always just "~/.fossil". Then support`
147	`for the FOSSIL_HOME environment variable was added. Later, support for the`
148	`[https://specifications.freedesktop.org/basedir-spec/basedir-spec-latest.html\|XDG-compatible configuration filenames]`
149	`was added. Each of these changes needed to continue to support legacy`
150	`installations.`
151
152	`On Windows, the configuration database is the first of the following`
153	`for which the corresponding environment variables exist:`
154
155	`* %FOSSIL_HOME%/_fossil`
156	`* %LOCALAPPDATA%/_fossil`
157	`* %APPDATA%/_fossil`
158	`* %USERPROFILE%/_fossil`
159	`* %HOMEDRIVE%%HOMEPATH%/_fossil`
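
The Windows rule is a "first match wins" scan over a fixed list, which can be sketched as follows. The function and constant names are hypothetical, the environment is passed in as a plain mapping, and the fourth entry is spelled USERPROFILE, the standard Windows variable name. The sketch assumes the last entry needs both HOMEDRIVE and HOMEPATH to be set.

```python
# Candidate environment variables, in priority order. The last entry is
# built from two variables that are concatenated.
WINDOWS_CANDIDATES = (
    ("FOSSIL_HOME",),
    ("LOCALAPPDATA",),
    ("APPDATA",),
    ("USERPROFILE",),
    ("HOMEDRIVE", "HOMEPATH"),
)


def windows_config_db_path(env):
    """Return the first candidate whose variables are all defined, else None."""
    for names in WINDOWS_CANDIDATES:
        if all(name in env for name in names):
            base = "".join(env[name] for name in names)
            return base + "/_fossil"
    return None
```

With both LOCALAPPDATA and APPDATA defined and FOSSIL_HOME unset, the LOCALAPPDATA entry is chosen because it comes first in the list.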
160
161	`The second case is the one that usually determines the name. Note that the`
162	`FOSSIL_HOME environment variable can always be set to determine the`
163	`location of the configuration database. Note also that the configuration`
164	`database file itself is called ".fossil" or "fossil.db" on Unix but`
165	`"_fossil" on Windows.`
166
167	`The [/help/info\|fossil info] command will show the location of`
168	`the configuration database on a line that starts with "config-db:".`
169
170	`<h3>2.2 Repository Databases</h3>`
171
172	`The repository database is the file that is commonly referred to as`
173	`"the repository". This is because the repository database contains,`
174	`among other things, the complete revision, ticket, and wiki history for`
175	`a project. It is customary to name the repository database after the`
176	`name of the project, with a ".fossil" suffix. For example, the repository`
177	`database for the self-hosting Fossil repository is called "fossil.fossil"`
178	`and the repository database for SQLite is called "sqlite.fossil".`
179
180	`<h4>2.2.1 Global Project State</h4>`
181
182	`The bulk of the repository database (typically 75 to 85%) consists`
183	`of the artifacts that comprise the`
184	`[./fileformat.wiki \| enduring, global, shared state] of the project.`
185	`The artifacts are stored as BLOBs, compressed using`
186	`[http://www.zlib.net/ \| zlib compression] and, where applicable,`
187	`using [./delta_encoder_algorithm.wiki \| delta compression].`
188	`The combination of zlib and delta compression results in considerable`
189	`space savings. For the SQLite project (when this paragraph was last`
190	`updated on 2020-02-08)`
191	`the total size of all artifacts is over 7.1 GB but thanks to the`
192	`combined zlib and delta compression, that content takes less than`
193	`97 MB of space in the repository database, for a compression ratio`
194	`of about 74:1. The median size of all content BLOBs after delta`
195	`and zlib compression have been applied is 156 bytes.`
196	`The median size of BLOBs without compression is 45,312 bytes.`
197
198	`Note that the zlib and delta compression is not an inherent part of the`
199	`Fossil file format; it is just an optimization.`
200	`The enduring file format for Fossil is the unordered`
201	`set of artifacts. The compression techniques are just a detail of`
202	`how the current implementation of Fossil happens to store these artifacts`
203	`efficiently on disk.`
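
The point that compression is only a storage detail can be illustrated with a short round trip. This sketch uses plain zlib from the Python standard library and leaves out delta compression entirely. The helper names are hypothetical, and the byte layout is not Fossil's actual on-disk format. It assumes the artifact is named by the hash of its uncompressed content.

```python
import hashlib
import zlib


def store(artifact: bytes):
    """Return (name, stored_bytes); the name is the hash of the raw content."""
    name = hashlib.sha3_256(artifact).hexdigest()
    return name, zlib.compress(artifact)


def load(stored: bytes) -> bytes:
    """Recover the original artifact bytes from their stored form."""
    return zlib.decompress(stored)


# A repetitive toy artifact compresses well, as source files often do.
artifact = b"a line of project text\n" * 1000
name, stored = store(artifact)
restored = load(stored)
```

The restored bytes hash to the same name as the original, so a different storage scheme could be substituted without changing the set of artifacts.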
204
205	`All of the original uncompressed and un-delta'd artifacts can be extracted`
206	`from a Fossil repository database using`
207	`the [/help/deconstruct \| fossil deconstruct]`
208	`command. Individual artifacts can be extracted using the`
209	`[/help/artifact \| fossil artifact] command.`
210	`When accessing the repository database using raw SQL and the`
211	`[/help/sqlite3 \| fossil sql] command, the extension function`
212	`"<tt>content()</tt>" with a single argument which is the SHA1 or`
213	`SHA3-256 hash`
214	`of an artifact will return the complete uncompressed`
215	`content of that artifact.`
216
217	`Going the other way, the [/help/reconstruct \| fossil reconstruct]`
218	`command will scan a directory hierarchy and add all files found to`
219	`a new repository database. The [/help/import \| fossil import] command`
220	`works by reading the input git-fast-export stream and using it to construct`
221	`corresponding artifacts which are then written into the repository database.`
222
223	`<h4>2.2.2 Project Metadata</h4>`
224
225	`The global project state information in the repository database is`
226	`supplemented by computed metadata that makes querying the project state`
227	`more efficient. Metadata includes information such as the following:`
228
229	`* The names for all files found in any check-in.`
230	`* All check-ins that modify a given file`
231	`* Parents and children of each check-in.`
232	`* Potential timeline rows.`
233	`* The names of all symbolic tags and the check-ins they apply to.`
234	`* The names of all wiki pages and the artifacts that comprise each`
235	`wiki page.`
236	`* Attachments and the wiki pages or tickets they apply to.`
237	`* Current content of each ticket.`
238	`* Cross-references between tickets, check-ins, and wiki pages.`
239
240	`The metadata is held in various SQL tables in the repository database.`
241	`The metadata is designed to facilitate queries for the various timelines and`
242	`reports that Fossil generates.`
243	`As the functionality of Fossil evolves,`
244	`the schema for the metadata can and does change.`
245	`But schema changes do not invalidate the repository. Remember that the`
246	`metadata contains no new information - only information that has been`
247	`extracted from the canonical artifacts and saved in a more useful form.`
248	`Hence, when the metadata schema changes, the prior metadata can be discarded`
249	`and the entire metadata corpus can be recomputed from the canonical`
250	`artifacts. That is what the`
251	`[/help/rebuild \| fossil rebuild] command does.`
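
The idea that metadata is disposable and can always be recomputed can be sketched with toy data. The record layout below is hypothetical and much simpler than real Fossil artifacts. It only shows that the derived lookup tables contain nothing that is not already present in the canonical records.

```python
# Toy "canonical" check-in records: parents and the files each one modifies.
CHECKINS = {
    "c1": {"parents": [], "files": ["README"]},
    "c2": {"parents": ["c1"], "files": ["README", "main.c"]},
    "c3": {"parents": ["c1"], "files": ["main.c"]},
}


def rebuild_metadata(checkins):
    """Derive lookup tables (children, check-ins per file) from the records."""
    children = {}
    checkins_by_file = {}
    for cid in sorted(checkins):
        record = checkins[cid]
        for parent in record["parents"]:
            children.setdefault(parent, []).append(cid)
        for filename in record["files"]:
            checkins_by_file.setdefault(filename, []).append(cid)
    return {"children": children, "checkins_by_file": checkins_by_file}


metadata = rebuild_metadata(CHECKINS)
# Discard the derived data and compute it again from the same records.
metadata_again = rebuild_metadata(CHECKINS)
```

Because the second computation yields the same tables as the first, throwing the metadata away loses no information.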
252
253	`<h4>2.2.3 Display And Processing Preferences</h4>`
254
255	`The repository database also holds information used to help format`
256	`the display of web pages and configuration settings that override the`
257	`global configuration settings for the specific repository. All of`
258	`this information (and the user credentials and privileges too) is`
259	`local to each repository database; it is not shared between repositories`
260	`by [/help/sync \| fossil sync]. That is because it is entirely reasonable`
261	`that two different websites for the same project might have completely`
262	`different display preferences and user communities. One instance of the`
263	`project might be a fork of the other, for example, which pulls from the`
264	`other but never pushes and extends the project in ways that the keepers of`
265	`the other website disapprove of.`
266
267	`Display and processing information includes the following:`
268
269	`* The name and description of the project`
270	`* The CSS file, header, and footer used by all web pages`
271	`* The project logo image`
272	`* Fields of tickets that are considered "significant" and which are`
273	`therefore collected from artifacts and made available for display`
274	`* Templates for screens to view, edit, and create tickets`
275	`* Ticket report formats and display preferences`
276	`* Local values for [/help/settings \| settings] that override the`
277	`global values defined in the per-user configuration database.`
278
279	`Though the display and processing preferences do not move between`
280	`repository instances using [/help/sync \| fossil sync], this information`
281	`can be shared between repositories using the`
282	`[/help/config \| fossil config push] and`
283	`[/help/config \| fossil config pull] commands.`
284	`The display and processing information is also copied into new`
285	`repositories when they are created using`
286	`[/help/clone \| fossil clone].`
287
288	`<h4>2.2.4 User Credentials And Privileges</h4>`
289
290	`Just because two development teams are collaborating on a project and allow`
291	`push and/or pull between their repositories does not mean that they`
292	`trust each other enough to share passwords and access privileges.`
293	`Hence the names and emails and passwords and privileges of users are`
294	`considered private information that is kept locally in each repository.`
295
296	`Each repository database has a table holding the username, privileges,`
297	`and login credentials for users authorized to interact with that particular`
298	`database. In addition, there is a table named "concealed" that maps the`
299	`SHA1 hash of each user's email address back into their true email address.`
300	`The concealed table allows just the SHA1 hash of email addresses to`
301	`be stored in tickets, and thus prevents actual email addresses from falling`
302	`into the hands of spammers who happen to clone the repository.`
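
The role of the concealed table can be sketched as a private mapping from hash to address. The class and method names are hypothetical, and the exact bytes that Fossil feeds to SHA1 are an assumption here (the UTF-8 form of the address). Only the overall idea is taken from the text above.

```python
import hashlib


class ConcealedTable:
    """Local-only mapping from the SHA1 hash of an address to the address."""

    def __init__(self):
        self._by_hash = {}

    def conceal(self, email: str) -> str:
        """Record the address locally and return the hash to publish."""
        digest = hashlib.sha1(email.encode("utf-8")).hexdigest()
        self._by_hash[digest] = email
        return digest

    def reveal(self, digest: str):
        """Return the address for a hash, or None if it is not known here."""
        return self._by_hash.get(digest)


origin = ConcealedTable()
published = origin.conceal("alice@example.com")
# A clone that received only the ticket text has an empty table.
clone = ConcealedTable()
```

The originating repository can map the published hash back to the address, while a clone without the table cannot.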
303
304	`The content of the user and concealed tables can be pushed and pulled using the`
305	`[/help/config \| fossil config push] and`
306	`[/help/config \| fossil config pull] commands with "user" or`
307	`"email" as the AREA argument, but only if you have administrative`
308	`privileges on the remote repository.`
309
310	`<h4>2.2.5 Shunned Artifact List</h4>`
311
312	`The set of canonical artifacts for a project - the global state for the`
313	`project - is intended to be an append-only database. In other words,`
314	`new artifacts can be added but artifacts can never be removed. But`
315	`it sometimes happens that inappropriate content is mistakenly or`
316	`maliciously added to a repository. The only way to get rid of`
317	`the undesired content is to [./shunning.wiki \| "shun"] it.`
318	`The "shun" table in the repository database records the hash values for`
319	`all shunned artifacts.`
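
How a shun list interacts with an otherwise append-only store can be sketched as follows. This is an illustration under an assumption: that shunning both drops any stored copy and causes the same artifact to be refused if it is offered again. The class and method names are hypothetical.

```python
import hashlib


class ArtifactStore:
    """Append-only artifact set with a list of shunned hashes."""

    def __init__(self):
        self.artifacts = {}
        self.shunned = set()

    def add(self, content: bytes):
        """Store an artifact under its hash; refuse it if the hash is shunned."""
        name = hashlib.sha3_256(content).hexdigest()
        if name in self.shunned:
            return None
        self.artifacts[name] = content
        return name

    def shun(self, name: str):
        """Record the hash as shunned and drop any stored copy."""
        self.shunned.add(name)
        self.artifacts.pop(name, None)


store = ArtifactStore()
good = store.add(b"useful content")
bad = store.add(b"inappropriate content")
store.shun(bad)
readded = store.add(b"inappropriate content")
```

After the shun, the unwanted artifact is gone and a later attempt to add the same bytes is refused, while other artifacts are unaffected.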
320
321	`The shun table can be pushed or pulled using`
322	`the [/help/config \| fossil config] command with the "shun" AREA argument.`
323	`The shun table is also copied during a [/help/clone \| clone].`
324
325	`<h3 id="localdb">2.3 Checkout Databases</h3>`
326
327	`Fossil allows a single repository`
328	`to have multiple working checkouts. Each working checkout has a single`
329	`database in its root directory that records the state of that checkout.`
330	`The checkout database is named "_FOSSIL_" or ".fslckout".`
331	`The checkout database records information such as the following:`
332
333	`* The name of the repository database file.`
334	`* The version that is currently checked out.`
335	`* Files that have been [/help/add \| added],`
336	`[/help/rm \| removed], or [/help/mv \| renamed] but not`
337	`yet committed.`
338	`* The mtime and size of files as they were originally checked out,`
339	`in order to expedite checking which files have been edited.`
340	`* Other check-ins that have been [/help/merge \| merged] into the`
341	`working checkout but not yet committed.`
342	`* Copies of files prior to the most recent undoable operation - needed to`
343	`implement the [/help/undo \| undo] and [/help/redo \| redo] commands.`
344	`* The [/help/stash \| stash].`
345	`* State information for the [/help/bisect \| bisect] command.`
346
347	`For Fossil commands that run from within a working checkout, the`
348	`first thing that happens is that Fossil locates the checkout database.`
349	`Fossil first looks in the current directory. If not found there, it`
350	`looks in the parent directory, then in the parent of the`
351	`parent, and so forth until either the checkout database is found`
352	`or the search reaches the root of the file system. (In the latter case,`
353	`Fossil returns an error, of course.) Once the checkout database is`
354	`located, it is used to locate the repository database.`
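
The upward search described above can be sketched in a few lines. The function name is hypothetical and the sketch returns None at the file system root instead of reporting an error. The two file names are the ones given earlier in this section.

```python
import os
import tempfile

CHECKOUT_DB_NAMES = ("_FOSSIL_", ".fslckout")


def find_checkout_db(start_dir):
    """Walk from start_dir toward the root until a checkout database is found."""
    directory = os.path.abspath(start_dir)
    while True:
        for name in CHECKOUT_DB_NAMES:
            candidate = os.path.join(directory, name)
            if os.path.isfile(candidate):
                return candidate
        parent = os.path.dirname(directory)
        if parent == directory:
            # Reached the root of the file system without finding anything.
            return None
        directory = parent


def demo():
    """Place a checkout database two levels above the starting directory."""
    with tempfile.TemporaryDirectory() as root:
        root = os.path.abspath(root)
        db = os.path.join(root, ".fslckout")
        open(db, "w").close()
        nested = os.path.join(root, "src", "lib")
        os.makedirs(nested)
        return find_checkout_db(nested) == db
```

The demo starts the search in a nested subdirectory and finds the database file in the checkout root, which mirrors running a command from deep inside a working checkout.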
355
356	`Notice that the checkout database contains a pointer to the repository`
357	`database but that the repository database has no record of the checkout`
358	`databases. That means that a working checkout directory tree can be`
359	`freely renamed or copied or deleted without consequence. But the`
360	`repository database file, on the other hand, has to stay in the same`
361	`place with the same name or else the open checkout databases will not`
362	`be able to find it.`
363
364	`A checkout database is created by the [/help/open \| fossil open] command.`
365	`A checkout database is deleted by [/help/close \| fossil close]. The`
366	`fossil close command really isn't needed; one can accomplish the same`
367	`thing simply by deleting the checkout database.`
368
369	`Note that the stash, the undo stack, and the state of the bisect command`
370	`are all contained within the checkout database. That means that the`
371	`fossil close command will delete all stash content, the undo stack, and`
372	`the bisect state. The close command is not undoable. Use it with care.`
373
374	`<h2>3.0 See Also</h2>`
375
376	`* [./makefile.wiki \| The Fossil Build Process]`
377	`* [./contribute.wiki \| How To Contribute Code To Fossil]`
378	`* [./adding_code.wiki \| Adding New Features To Fossil]`
379
