# Backing Up a Remote Fossil Repository

One of the great benefits of Fossil and other [distributed version control systems][dvcs]
is that cloning a repository makes a backup. If you are running a project with multiple
developers who share their work using a [central server][server] and the server hardware
catches fire, the clones of the repository on each developer
workstation *may* serve as a suitable backup.

[dvcs]: wikipedia:/wiki/Distributed_version_control
[server]: ./server/whyuseaserver.wiki

We say “may” because
it turns out not everything in a Fossil repository is copied when cloning. You
don’t even always get copies of all historical file artifacts. More than
that, a Fossil repository typically contains
other useful information that is not always shared as part of a clone, which might need
to be backed up separately. To wit:


## <a id="pii"></a> Sensitive Information

Fossil purposefully does not clone certain sensitive information unless
you’re logged in as a user with [Setup] capability. As an example, a local clone
may have a different `user` table than the remote, because only a
Setup user is allowed to see the full version for privacy and security
reasons.


## <a id="config"></a> Configuration Drift

Fossil allows the local configuration to differ in several areas from
that of the remote. You get a copy
of *some* of these configuration areas on initial clone — not all! — but after that,
remote configuration changes mostly do not sync down automatically.


#### <a id="skin"></a> Skin

Changes to the remote’s skin don’t sync down, on purpose, since you may
want to have a different skin on the local clone than on the remote. You
can ask for updates with [`fossil config pull skin`][cfg], but that does
not happen automatically during the course of normal development.


#### <a id="alerts"></a> Email Alerts

The Admin → Notification settings do not get copied on clone or sync,
and it is not possible to push such settings from one repository to
another. We did this on purpose because you may have a network of peer
repositories, and you only want one repository sending email alerts. If
Fossil were to automatically replicate the email alert settings to a
separate repository, subscribers would get multiple alerts for each
event, which would be *bad.*

The only element of the email alert configuration that can be pulled
over the sync protocol on demand is the subscriber list, via
[`fossil config pull subscriber`][cfg].


#### <a id="project"></a> Project Configuration

This is normally generated once during `fossil init` and never changed,
so Fossil doesn’t pull this information without being forced, on
purpose. You could accidentally merge two separate Fossil repos by
pushing one repo’s project config up to another, for example.


#### <a id="other-cfg"></a> Others

A repo’s URL aliases, [interwiki configuration](./interwiki.md), and
[ticket customizations](./custom_ticket.wiki) also do not normally sync.

[cfg]: /help/configuration


## <a id="private"></a> Private Branches

The very nature of Fossil’s [private branch feature][pbr] ensures that
remote clones don’t get a copy of those branches. Normally this is
exactly what you want, but in the case of making backups, you probably
want to back up these branches as well. One of the two backup methods below
provides this.


## <a id="shun"></a> Shunned Artifacts

Fossil purposefully doesn’t sync [shunned artifacts][shun]. If you want
your local clone to be a precise match to the remote, it needs to track
changes to the shun table as well.


## <a id="uv"></a> Unversioned Artifacts

Data in Fossil’s [unversioned artifacts table][uv] doesn’t sync down
unless you specifically ask for it. Like local configuration
data, it doesn’t get pulled as part of a normal `fossil sync`, and
*unlike* the config data, you don’t get unversioned files as part of the
initial clone unless you ask for them by passing the `--unversioned/-u`
flag.
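
As a sketch of what asking for them looks like, assuming a hypothetical
clone URL and repository path, and assuming `fossil uv sync` is given the
usual `-R` repository option:

```shell
# Hypothetical URL and paths. -u/--unversioned brings unversioned files
# down at clone time; "fossil uv sync" keeps them current afterward.
fossil clone -u https://example.com/repo ~/museum/repo.fossil
fossil uv sync -R ~/museum/repo.fossil
```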


## <a id="ait"></a> Autosync Is Intransitive

If you’re using Fossil in a truly distributed mode, rather than the
simple central-and-clones model that is more common, there may be no
single source of truth in the network because Fossil’s autosync feature
isn’t transitive.

That is, if you clone from server A and then stand that clone up as a
server B, and I then clone from your server as my repository C, your changes to B
autosync up to A, but not down to me on C until I do something locally
that triggers autosync. The inverse is also true: if I commit something
on C, it will autosync up to B, but A won’t get a copy until someone on
B does something to trigger a sync there.

An easy way to run into this problem is to set up failover servers
`svr1` thru `svr3.example.com`, then set `svr2` and `svr3` up to sync
with the first. If all of the users normally clone from `svr1`, their
commits don’t get to `svr2` and `svr3` until something on one of the
servers pushes or pulls the changes down to the next server in the sync
chain.

Likewise, if `svr1` falls over and all of the users re-point their local
clones at `svr2`, and `svr1` later reappears, `svr1` is likely to
remain a stale copy of the old version of the repository until someone
causes it to sync with `svr2` or `svr3` to catch up again. And if
you originally designed the sync scheme to treat `svr1` as the primary
source of truth, those users still syncing with `svr2` won’t have their
commits pushed up to `svr1` unless you’ve set up bidirectional sync
rather than having the two backup servers do `pull` only.
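
One hedge against this is to run an explicit sync periodically on the
backup servers themselves, so that no user activity is needed to keep
them current. A minimal sketch, with an illustrative repository path and
URL rather than anything prescribed by this document:

```shell
#!/bin/sh
# Run from cron (or a timer) on svr2 and svr3 so they stay current with
# svr1 even when nothing else triggers autosync.
fossil sync -R /home/fossil/museum/repo.fossil https://svr1.example.com/
```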


# <a id="sync-solution"></a> Solution 1: Explicit Pulls

The following script solves most of the above problems for the use case
where you want a *nearly-complete* clone of the remote repository using nothing
but the normal Fossil sync protocol. It only does so if you are logged into
the remote as a user with Setup capability, however.

``` shell
#!/bin/sh
fossil sync --unversioned        # sync commits plus unversioned artifacts
fossil configuration pull all    # pull skin, project config, shun list, etc.
fossil rebuild                   # apply the pulled shun list locally
```

The last step is needed to ensure that shunned artifacts on the remote
are removed from the local clone. The second step includes
`fossil conf pull shun`, but until those artifacts are actually rebuilt
out of existence, your backup will be “more than complete” in the sense
that it will continue to have information that the remote says should
not exist any more. That would be not so much a “backup” as an
“archive,” which might not be what you want.
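
If you want this to happen unattended, one possible variant operates on a
stored clone via `-R` instead of an open checkout, making it suitable for
a nightly cron job. The repository path here is illustrative, and it
assumes the clone already remembers a remote URL with Setup-capable
stored credentials:

```shell
#!/bin/sh
# Unattended sketch of the same three steps against a stored clone.
repo=$HOME/museum/backup.fossil
fossil sync --unversioned -R "$repo"
fossil configuration pull all -R "$repo"
fossil rebuild "$repo"
```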


# <a id="sql-solution"></a> Solution 2: SQL-Level Backup

The first method doesn’t get you a copy of the remote’s
[private branches][pbr], on purpose. It may also miss other info on the
remote, such as SQL-level customizations that the sync protocol can’t
see. (Some [ticket system customization][tkt] schemes rely on this ability, for example.)
You can solve such problems if you have access to the remote server:
[the `backup` command][bu] handles locking and transaction isolation for
you, so you can safely take a SQL-level backup of an in-use repository.

If you have SSH access to the remote server, something like this will work:

``` shell
#!/bin/bash
bf=repo-$(date +%Y-%m-%d).fossil
ssh example.com "cd museum ; fossil backup -R repo.fossil backups/$bf" &&
scp example.com:museum/backups/$bf ~/museum/backups
```

Beware that this method does not solve [the intransitive sync
problem](#ait), in and of itself: if you do a SQL-level backup of a
stale repo DB, you have a *stale backup!* You should therefore run this
on every node that may need to serve as a backup so that at least *one*
of the backups is also up-to-date.
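
One way to act on that advice is to take the same backup from every such
server in one pass. A sketch, where the hostnames and paths are examples
only, not part of any scheme described above:

```shell
#!/bin/bash
# Pull a same-day SQL-level backup from each server that might need to
# serve as a backup source. Hostnames and paths are examples only.
bf=repo-$(date +%Y-%m-%d).fossil
for host in svr1.example.com svr2.example.com svr3.example.com ; do
    ssh "$host" "cd museum ; fossil backup -R repo.fossil backups/$bf" &&
        scp "$host:museum/backups/$bf" ~/museum/backups/"$host-$bf"
done
```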


# <a id="enc"></a> Encrypted Off-Site Backups

A useful refinement that you can apply to both methods above is
encrypted off-site backups. You may wish to store backups of your
repositories off-site on a service such as Dropbox, Google Drive, iCloud,
or Microsoft OneDrive, where you don’t fully trust the service not to
leak your information. This addition to the prior scripts will encrypt
the resulting backup in such a way that the cloud copy is a useless blob
of noise to anyone without the key:

```shell
iter=152830
pass="h8TixP6Mt6edJ3d6COaexiiFlvAM54auF2AjT7ZYYn"
gd="$HOME/Google Drive/Fossil Backups/$bf.xz.enc"
fossil sql -R ~/museum/backups/"$bf" .dump | xz -9 |
openssl enc -e -aes-256-cbc -pbkdf2 -iter $iter -pass pass:"$pass" -out "$gd"
```

If you’re adding this to the first script above, remove the
“`-R repo-name`” bit so you get a dump of the repository backing the
current working directory.

Change the `pass` value to some other long random string, and change the
`iter` value to something in the hundreds of thousands range. A good source for
the first is [here][grcp], and for the second, [here][rint].
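
If you would rather generate both values locally than fetch them from a
web service, something along these lines works; it assumes `openssl` and
the usual POSIX tools are available:

```shell
# 42 random characters drawn from OpenSSL’s CSPRNG, filtered to a
# roughly 6-bits-per-character alphabet.
pass=$(openssl rand -base64 64 | tr -dc 'A-Za-z0-9' | head -c 42)
# An iteration count in the 100,000–999,999 range.
iter=$(( 100000 + $(od -An -N4 -tu4 /dev/urandom) % 900000 ))
echo "$pass $iter"
```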

You may find posts online written by people recommending millions of
iterations for PBKDF2, but they’re generally talking about this in the
context of memorizable passwords, where adding even one more character
to the password is a significant burden. Given our script’s purely
random maximum-length passphrase, there isn’t much more that increasing
the key derivation iteration count can do for us.

Conversely, if you were to reduce the passphrase to 41 characters, that
would drop the key strength by roughly 2⁶, the entropy value per
character when using most of printable ASCII in a passphrase. To make
up that lost strength on the PBKDF2 end, you’d have to multiply your
iteration count by 2⁶ = 64. It’s easier to use a max-length passphrase
in this situation than to get crazy with key derivation iteration counts.

(This, by the way, is why the example passphrase above is 42 characters:
with 6 bits of entropy per character, that gives you a key size of 252
bits, as close as we can get to our chosen encryption algorithm’s
256-bit key size without going over. If it pleases you to give it 43
random characters for a passphrase in order to pick up those last four
bits of security, you’re welcome to do so.)

Compressing the data before encrypting it removes redundancies that can
make decryption easier, and it results in a smaller backup than you get
with the previous script alone, at the expense of a lot of CPU time
during the backup. You may wish to switch to a less space-efficient
compression algorithm that takes less CPU power, such as [`lz4`][lz4].
Changing up the compression algorithm also provides some
security-thru-obscurity, which is useless on its own, but it *is* a
useful adjunct to strong encryption.

This requires OpenSSL 1.1 or higher. If you’re on 1.0 or older, you
won’t have the `-pbkdf2` and `-iter` options, and you may have to choose
a different cipher algorithm; both changes are likely to weaken the
encryption significantly, so you should install a newer version rather
than work around the lack of these features.

Beware that macOS ships a fork of OpenSSL called [LibreSSL][lssl] that
lacked this capability until Ventura (13.0). If you’re on Monterey (12)
or older, we recommend use of the [Homebrew][hb] OpenSSL package rather
than give up on the security afforded by use of configurable-iteration
PBKDF2. To avoid a conflict with the platform’s `openssl` binary,
Homebrew’s installation is [unlinked][hbul] by default, so you have to
give an explicit path to it, one of:

    /usr/local/opt/openssl/bin/openssl ...    # Intel x86 Macs
    /opt/homebrew/opt/openssl/bin/openssl ... # ARM Macs (“Apple silicon”)

[lssl]: https://www.libressl.org/


## <a id="rest"></a> Restoring From An Encrypted Backup

The “restore” script for the above fragment is basically an inverse of
it, but it’s worth showing because there are some subtleties to take
care of. If all variables defined in earlier scripts are available, then
restoration is:

``` shell
openssl enc -d -aes-256-cbc -pbkdf2 -iter $iter -pass pass:"$pass" -in "$gd" |
xz -d | fossil sql --no-repository ~/museum/restored-repo.fossil
```

We changed the `-e` to `-d` on the `openssl` command to get decryption,
and we changed the `-out` to `-in` so it reads from the encrypted backup
file and writes the result to stdout.

The decompression step is trivial.

The last change is tricky: we used `fossil sql` above to ensure that
we’re using the same version of SQLite to write the encrypted backup DB
as was used to maintain the repository. We must also do that on
restoration. Fossil serves as a dogfooding project for SQLite,
often making use of the latest features, so it is quite likely that a given
random `sqlite3` binary in your `PATH` will be unable to understand the
file created by “`fossil sql .dump`”! The tricky bit is, you can’t just
pipe the decrypted SQL dump into `fossil sql`, because on startup, Fossil
normally goes looking for tables created by `fossil init`, and it won’t
find them in a newly-created repo DB. We get around this by passing
the `--no-repository` flag, which suppresses this behavior. Doing it
this way saves you from needing to go and build a matching version of
`sqlite3` just to restore the backup.
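
Before trusting the pipeline with a real repository, you can sanity-check
the compress-and-encrypt half against its inverse on a throwaway file.
The passphrase and paths here are scratch values, and it assumes
`openssl` 1.1+ and `xz` are installed:

```shell
#!/bin/sh
iter=152830
pass="throwaway-test-passphrase"   # not the real backup passphrase
printf 'hello, fossil\n' > /tmp/plain.txt
xz -9 < /tmp/plain.txt |
openssl enc -e -aes-256-cbc -pbkdf2 -iter $iter -pass pass:"$pass" -out /tmp/blob.enc
openssl enc -d -aes-256-cbc -pbkdf2 -iter $iter -pass pass:"$pass" -in /tmp/blob.enc |
xz -d > /tmp/round.txt
cmp -s /tmp/plain.txt /tmp/round.txt && echo "round trip OK"
```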

[bu]: /help/backup
[grcp]: https://www.grc.com/passwords.htm
[hb]: https://brew.sh
[hbul]: https://docs.brew.sh/FAQ#what-does-keg-only-mean
[lz4]: https://lz4.github.io/lz4/
[pbr]: ./private.wiki
[rint]: https://www.random.org/integers/?num=1&min=100000&max=1000000&col=5&base=10&format=html&rnd=new
[Setup]: ./caps/admin-v-setup.md#apsu
[shun]: ./shunning.wiki
[tkt]: ./tickets.wiki
[uv]: ./unvers.wiki