# OCI Containers

This document shows how to build Fossil into [OCI] compatible containers
and how to use those containers in interesting ways. We start off using
the original and still most popular container development and runtime
platform, [Docker], but since you have more options than that, we will
show some of the alternatives later on.

[Docker]: https://www.docker.com/
[OCI]: https://opencontainers.org/


## 1. Quick Start

Fossil ships a `Dockerfile` at the top of its source tree,
[here][DF], which you can build like so:

    $ docker build -t fossil .

If the image built successfully, you can create a container from it and
test that it runs:

    $ docker run --name fossil -p 9999:8080/tcp fossil

This shows us remapping the internal TCP listening port to 9999 on the
host. This feature of OCI runtimes means there’s little point in using
the “`fossil server --port`” feature inside the container. We can let
Fossil default to 8080 internally, then remap it to wherever we want it
on the host instead.

Our stock `Dockerfile` configures Fossil with the default feature set,
so you may wish to modify the `Dockerfile` to add configuration options,
add APK packages to support those options, and so forth.

The Fossil `Makefile` provides two convenience targets,
“`make container-image`” and “`make container-run`”. The first creates a
versioned container image, and the second does that and then launches a
fresh container based on that image. You can pass extra arguments to the
first command via the Makefile’s `DBFLAGS` variable and to the second
with the `DCFLAGS` variable. (DB is short for “`docker build`”, and DC
is short for “`docker create`”, a sub-step of the “run” target.) To get
the custom port setting as in the second command above, say:

    $ make container-run DCFLAGS='-p 9999:8080/tcp'

Contrast the raw “`docker`” commands above, which create an
_unversioned_ image called `fossil:latest` and from that a container
simply called `fossil`. The unversioned names are more convenient for
interactive use, while the versioned ones are good for CI/CD type
applications since they avoid conflicts with past versions; this lets
you keep old containers around for quick roll-backs while replacing
them with fresh ones.

[DF]: /file/Dockerfile


## 2. <a id="storage"></a>Repository Storage Options

If you want the container to serve an existing repository, there are at
least two right ways to do it.

The wrong way is to use the Dockerfile `COPY` command, because by baking
the repo into the image at build time, it will become one of the image’s
base layers. The end result is that each time you build a container from
that image, the repo will be reset to its build-time state. Worse,
restarting the container will do the same thing, since the base image
layers are immutable. This is almost certainly not what you want.

The correct ways put the repo into the _container_ created from the
_image_, not into the image itself.


### <a id="repo-inside"></a> 2.1 Storing the Repo Inside the Container

The simplest method is to stop the container if it was running, then
say:

    $ docker cp /path/to/my-project.fossil fossil:/museum/repo.fossil
    $ docker start fossil
    $ docker exec fossil chown -R 499 /museum

That copies the local Fossil repo into the container where the server
expects to find it, so that the “start” command causes it to serve from
that copied-in file instead. Since it lives atop the immutable base
layers, it persists as part of the container proper, surviving restarts.

Notice that the copy command changes the name of the repository
database. The container configuration expects it to be called
`repo.fossil`, which it almost certainly was not out on the host system.
This is because there is only one repository inside this container, so
we don’t have to name it after the project it contains, as is
traditional. A generic name lets us hard-code the server start command.

If you skip the “chown” command above and put “`http://localhost:9999/`”
into your browser, expecting to see the copied-in repo’s home page, you
will get an opaque “Not Found” error. This is because the user and group
ID of the file will be that of your local user on the container’s host
machine, which is unlikely to map to anything in the container’s
`/etc/passwd` and `/etc/group` files, effectively preventing the server
from reading the copied-in repository file. 499 is the default “`fossil`”
user ID inside the container, causing Fossil to run with that user’s
privileges after it enters the chroot. (See [below](#args) for how to
change this default.) You don’t have to restart the server after fixing
this with `chown`: simply reload the browser, and Fossil will try again.


### 2.2 <a id="bind-mount"></a>Storing the Repo Outside the Container

The simple storage method above has a problem: containers are
designed to be killed off at the slightest cause, rebuilt, and
redeployed. If you do that with the repo inside the container, it gets
destroyed, too. The solution is to replace the “run” command above with
the following:

    $ docker run \
        --publish 9999:8080 \
        --name fossil-bind-mount \
        --volume ~/museum:/museum \
        fossil

Because this bind mount maps a host-side directory (`~/museum`) into the
container, you don’t need to `docker cp` the repo into the container at
all. It still expects to find the repository as `repo.fossil` under that
directory, but now both the host and the container can see that repo DB.

Instead of a bind mount, you could set up a separate
[volume](https://docs.docker.com/storage/volumes/), at which point you
_would_ need to `docker cp` the repo file into the container.

Either way, files in these mounted directories have a lifetime
independent of the container(s) they’re mounted into. When you need to
rebuild the container or its underlying image — such as to upgrade to a
newer version of Fossil — the external directory remains behind and gets
remapped into the new container when you recreate it with `--volume/-v`.
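
Putting that together, a typical upgrade cycle under the bind-mount
arrangement looks something like the following sketch. It reuses the
image and container names from the examples above and assumes a working
Docker installation; since the repo lives in `~/museum` on the host,
destroying the container loses nothing:

```shell
$ docker stop fossil-bind-mount
$ docker rm fossil-bind-mount
$ docker build -t fossil .          # rebuild with the newer Fossil
$ docker run \
    --publish 9999:8080 \
    --name fossil-bind-mount \
    --volume ~/museum:/museum \
    fossil
```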
#### 2.2.1 <a id="wal-mode"></a>WAL Mode Interactions
140
141
You might be aware that OCI containers allow mapping a single file into
142
the repository rather than a whole directory. Since Fossil repositories
143
are specially-formatted SQLite databases, you might be wondering why we
144
don’t say things like:
145
146
--volume ~/museum/my-project.fossil:/museum/repo.fossil
147
148
That lets us have a convenient file name for the project outside the
149
container while letting the configuration inside the container refer to
150
the generic “`/museum/repo.fossil`” name. Why should we have to name
151
the repo generically on the outside merely to placate the container?
152
153
The reason is, you might be serving that repo with [WAL mode][wal]
154
enabled. If you map the repo DB alone into the container, the Fossil
155
instance inside the container will write the `-journal` and `-wal` files
156
alongside the mapped-in repository inside the container. That’s fine as
157
far as it goes, but if you then try using the same repo DB from outside
158
the container while there’s an active WAL, the Fossil instance outside
159
won’t know about it. It will think it needs to write *its own*
160
`-journal` and `-wal` files *outside* the container, creating a high
161
risk of [database corruption][dbcorr].
162
163
If we map a whole directory, both sides see the same set of WAL files.
164
[Testing](https://tangentsoft.com/sqlite/dir/walbanger?ci=trunk)
165
gives us a reasonable level of confidence that using WAL across a
166
container boundary is safe when used in this manner.
167
168
[dbcorr]: https://www.sqlite.org/howtocorrupt.html#_deleting_a_hot_journal
169
[wal]: https://www.sqlite.org/wal.html
170
171
172
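
The sidecar-file behavior described above is easy to observe outside any
container. This sketch uses `python3`’s built-in `sqlite3` module purely
as a convenient SQLite client; it is not how Fossil itself does
anything, just a demonstration of the WAL file layout:

```shell
# A write to a WAL-mode SQLite DB creates "-wal" and "-shm" sidecar
# files next to it, which is why we map the directory, not the file.
cd "$(mktemp -d)"
python3 - <<'EOF'
import os, sqlite3
db = sqlite3.connect("repo.fossil")
db.execute("PRAGMA journal_mode=WAL")
db.execute("CREATE TABLE t(x)")     # first write creates the sidecars
print(sorted(f for f in os.listdir(".") if f.startswith("repo.fossil")))
db.close()
EOF
# → ['repo.fossil', 'repo.fossil-shm', 'repo.fossil-wal']
```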
## 3. <a id="security"></a>Security

### 3.1 <a id="chroot"></a>Why Not Chroot?

Prior to 2023.03.26, the stock Fossil container relied on [the chroot
jail feature](./chroot.md) to wall away the shell and other tools
provided by [BusyBox]. It included that as a bare-bones operating system
inside the container on the off chance that someone might need it for
debugging, but the thing is, Fossil is self-contained, needing none of
that power in the main-line use cases.

Our weak “you might need it” justification collapsed when we realized
you could restore this basic shell environment with a one-line change to
the `Dockerfile`, as shown [below](#run).

[BusyBox]: https://www.busybox.net/BusyBox.html


### 3.2 <a id="caps"></a>Dropping Unnecessary Capabilities

The example commands above create the container with [a default set of
Linux kernel capabilities][defcap]. Although Docker strips away almost
all of the traditional root capabilities by default, and Fossil needs
none of the ones it takes away, Docker does leave some enabled that
Fossil doesn’t actually need. You can tighten the scope of capabilities
by adding “`--cap-drop`” options to your container creation commands.

Specifically:

*   **`AUDIT_WRITE`**: Fossil doesn’t write to the kernel’s auditing
    log, and we can’t see any reason you’d want to be able to do that as
    an administrator shelled into the container, either. Auditing is
    something done on the host, not from inside each individual
    container.

*   **`CHOWN`**: The Fossil server never even calls `chown(2)`, and our
    image build process sets up all file ownership properly, to the
    extent that this is possible under the limitations of our
    automation.

    Curiously, stripping this capability doesn’t affect your ability to
    run commands like “`chown -R fossil:fossil /museum`” when
    you’re using bind mounts or external volumes — as we recommend
    [above](#bind-mount) — because it’s the host OS’s kernel
    capabilities that affect the underlying `chown(2)` call in that
    case, not those of the container.

    If for some reason you did have to change file ownership of
    in-container files, it’s best to do that by changing the
    `Dockerfile` to suit, then rebuilding the container, since that
    bakes the need for the change into your reproducible build process.
    If you had to do it without rebuilding the container, [there’s a
    workaround][capchg] for the fact that capabilities are a create-time
    change, baked semi-indelibly into the container configuration.

*   **`FSETID`**: Fossil doesn’t use the SUID and SGID bits itself, and
    our build process doesn’t set those flags on any of the files.
    Although the second fact means we can’t see any harm from leaving
    this enabled, we also can’t see any good reason to allow it, so we
    strip it.

*   **`KILL`**: The only place Fossil calls `kill(2)` is in the
    [backoffice], and then only for processes it created on earlier
    runs; it doesn’t need the ability to kill processes created by other
    users. You might wish for this ability as an administrator shelled
    into the container, but you can pass the “`docker exec --user`”
    option to run commands within your container as the legitimate owner
    of the process, removing the need for this capability.

*   **`MKNOD`**: As of 2023.03.26, the stock container uses the
    runtime’s default `/dev` node tree. Prior to this, we had to create
    `/dev/null` and `/dev/urandom` inside [the chroot jail](#chroot),
    but even then, these device nodes were created at build time and
    were never changed at run time, so we didn’t need this run-time
    capability even then.

*   **`NET_BIND_SERVICE`**: With containerized deployment, Fossil never
    needs the ability to bind the server to low-numbered TCP ports, not
    even if you’re running the server in production with TLS enabled and
    want the service bound to port 443. It’s perfectly fine to let the
    Fossil instance inside the container bind to its default port (8080)
    because you can rebind it on the host with the
    “`docker create --publish 443:8080`” option. It’s the container’s
    _host_ that needs this ability, not the container itself.

    (Even the container runtime might not need that capability if you’re
    [terminating TLS with a front-end proxy](./ssl.wiki#server). You’re
    more likely to say something like “`-p localhost:12345:8080`” and then
    configure the reverse proxy to translate external HTTPS calls into
    HTTP directed at this internal port 12345.)

*   **`NET_RAW`**: Fossil itself doesn’t use raw sockets, and while
    you could [swap out the run layer](#run) for something more
    functional that *does* make use of raw sockets, there’s little call
    for it. The best reason I can come up with is to be able to run
    utilities like `ping` and `traceroute`, but since we aren’t doing
    anything clever with the networking configuration, there’s no
    particularly compelling reason to run these from inside the
    container. If you need to ping something, do it on the host.

    If we did not take this hard-line stance, an attacker that broke
    into the container and gained root privileges might use raw sockets
    to do a wide array of bad things to any network the container is
    bound to.

*   **`SETFCAP, SETPCAP`**: There isn’t much call for file permission
    granularity beyond the classic Unix ones inside the container, so we
    drop root’s ability to change them.

Altogether, we recommend adding the following options to your
“`docker run`” commands, as well as to any “`docker create`” command
that will be followed by “`docker start`”:

    --cap-drop AUDIT_WRITE \
    --cap-drop CHOWN \
    --cap-drop FSETID \
    --cap-drop KILL \
    --cap-drop MKNOD \
    --cap-drop NET_BIND_SERVICE \
    --cap-drop NET_RAW \
    --cap-drop SETFCAP \
    --cap-drop SETPCAP
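
Combined with the bind mount from the storage discussion
[above](#bind-mount), a fully locked-down creation command might
therefore look like this sketch, reusing the names from the earlier
examples:

```shell
$ docker run \
    --name fossil-bind-mount \
    --publish 9999:8080/tcp \
    --volume ~/museum:/museum \
    --cap-drop AUDIT_WRITE \
    --cap-drop CHOWN \
    --cap-drop FSETID \
    --cap-drop KILL \
    --cap-drop MKNOD \
    --cap-drop NET_BIND_SERVICE \
    --cap-drop NET_RAW \
    --cap-drop SETFCAP \
    --cap-drop SETPCAP \
    fossil
```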

In the next section, we’ll show a case where you create a container
without ever running it, making these options pointless.

[backoffice]: ./backoffice.md
[defcap]: https://docs.docker.com/engine/security/#linux-kernel-capabilities
[capchg]: https://stackoverflow.com/a/45752205/142454


## 4. <a id="static"></a>Extracting a Static Binary

Our two-stage build process uses Alpine Linux only as a build host. Once
we’ve got everything reduced to a single static Fossil binary,
we throw all the rest of it away.

A secondary benefit falls out of this process for free: it’s arguably
the easiest way to build a purely static Fossil binary for Linux. Most
modern Linux distros make this [surprisingly difficult][lsl], but Alpine’s
back-to-basics nature makes static builds work the way they used to,
back in the day. If that’s all you’re after, you can do it as easily as
this:

    $ docker build -t fossil .
    $ docker create --name fossil-static-tmp fossil
    $ docker cp fossil-static-tmp:/bin/fossil .
    $ docker container rm fossil-static-tmp

The result is six or seven megs, depending on the CPU architecture you
build for. It’s built stripped.

[lsl]: https://stackoverflow.com/questions/3430400/linux-static-linking-is-dead


## 5. <a id="custom" name="args"></a>Customization Points

### <a id="pkg-vers"></a> 5.1 Fossil Version

The default version of Fossil fetched in the build is the version in the
checkout directory at the time you run it. You could override it to get
a release build like so:

    $ docker build -t fossil --build-arg FSLVER=version-2.20 .

Or equivalently, using Fossil’s `Makefile` convenience target:

    $ make container-image DBFLAGS='--build-arg FSLVER=version-2.20'

While you could instead use the generic
“`release`” tag here, it’s better to use a specific version number
since container builders cache downloaded files, hoping to
reuse them across builds. If you ask for “`release`” before a new
version is tagged and then immediately after, you might expect to get
two different tarballs, but because the underlying source tarball URL
remains the same when you do that, you’ll end up reusing the
old tarball from cache. This will occur
even if you pass the “`docker build --no-cache`” option.

This is why we default to pulling the Fossil tarball by checkin ID
rather than let it default to the generic “`trunk`” tag: so the URL will
change each time you update your Fossil source tree, forcing the builder
to pull a fresh tarball.


### 5.2 <a id="uids"></a>User & Group IDs

The “`fossil`” user and group IDs inside the container default to 499.
Why? Regular user IDs start at 500 or 1000 on most Unix type systems,
leaving those below them for system users like this Fossil daemon owner.
Since it’s typical for these system IDs to start at 0 and go upward, we
started at 500 and went *down* one instead to reduce the chance of a
conflict to as close to zero as we can manage.

To change it to something else, say:

    $ make container-image DBFLAGS='--build-arg UID=501'

This is particularly useful if you’re putting your repository on a
separate volume, since the IDs “leak” out into the host environment via
file permissions. You may therefore wish them to mean something on both
sides of the container barrier rather than have “499” appear on the host
in “`ls -l`” output.
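
To see concretely what that build argument changes, here is a small
sketch reconstructing the in-container `/etc/passwd` entry. The entry
format comes from the Dockerfile stage excerpted in the run-layer
discussion later in this document; the `FSL_UID` variable name is ours:

```shell
# Reconstruct the /etc/passwd entry the image build writes for the
# "fossil" user. 499 is the stock default; 501 is the override above.
FSL_UID=501
printf 'fossil:x:%d:%d:User:/museum:/false\n' "$FSL_UID" "$FSL_UID"
# → fossil:x:501:501:User:/museum:/false
```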
### 5.3 <a id="cengine"></a>Container Engine

Although the Fossil container build system defaults to Docker, we allow
for use of any OCI container system that implements the same interfaces.
We go into more detail about this [below](#light), but
for now, it suffices to point out that you can switch to Podman while
using our `Makefile` convenience targets unchanged by saying:

    $ make CENGINE=podman container-run


### 5.4 <a id="config"></a>Fossil Configuration Options

You can use this same mechanism to enable non-default Fossil
configuration options in your build. For instance, to turn on
the JSON API and the TH1 docs extension:

    $ make container-image \
        DBFLAGS='--build-arg FSLCFG="--json --with-th1-docs"'

If you also wanted [the Tcl evaluation extension](./th1.md#tclEval),
that brings us to [the next point](#run).


### 5.5 <a id="run"></a>Elaborating the Run Layer

If you want a basic shell environment for temporary debugging of the
running container, that’s easily added. Simply change this line in the
`Dockerfile`…

    FROM scratch AS run

…to this:

    FROM busybox AS run

Rebuild and redeploy to give your Fossil container a [BusyBox]-based
shell environment that you can get into via:

    $ docker exec -it -u fossil $(make container-version) sh

That command assumes you built it via “`make container`” and are
therefore using its versioning scheme.

You will likely want to remove the `PATH` override in the “RUN” stage
when doing this, since it’s written for the case where everything is in
`/bin`, and that will no longer be the case with a more full-featured
“`run`” layer. As long as the parent layer’s `PATH` value contains
`/bin`, delegating to it is more likely to be the correct thing.

Another useful case to consider is that you’ve installed a [server
extension](./serverext.wiki) and you need an interpreter for that
script. The first option above won’t work except in the unlikely case that
it’s written for one of the bare-bones script interpreters that BusyBox
ships.(^[BusyBox]’s `/bin/sh` is based on the old 4.4BSD Lite Almquist
shell, implementing little more than what POSIX specified in 1989, plus
equally stripped-down versions of `awk` and `sed`.)

Let’s say the extension is written in Python. Because this is one of the
most popular programming languages in the world, we have many options
for achieving this. For instance, there is a whole class of
“[distroless]” images that will do this efficiently by changing
“`STAGE 2`” in the `Dockerfile` to this:

    ## ---------------------------------------------------------------------
    ## STAGE 2: Pare that back to the bare essentials, plus Python.
    ## ---------------------------------------------------------------------
    FROM cgr.dev/chainguard/python:latest
    USER root
    ARG UID=499
    ENV PATH "/sbin:/usr/sbin:/bin:/usr/bin"
    COPY --from=builder /tmp/fossil /bin/
    COPY --from=builder /bin/busybox.static /bin/busybox
    RUN [ "/bin/busybox", "--install", "/bin" ]
    RUN set -x \
        && echo "fossil:x:${UID}:${UID}:User:/museum:/false" >> /etc/passwd \
        && echo "fossil:x:${UID}:fossil" >> /etc/group \
        && install -d -m 700 -o fossil -g fossil log museum

You will also have to add `busybox-static` to the APK package list in
STAGE 1 for the `RUN` script at the end of that stage to work, since the
[Chainguard Python image][cgimgs] lacks a shell, on purpose. The need to
install root-level binaries is why we change `USER` temporarily here.

Build it and test that it works like so:

    $ make container-run &&
      docker exec -i $(make container-version) python --version
    3.11.2

The compensation for the hassle of using Chainguard over something more
general purpose like changing the `run` layer to Alpine and then adding
an “`apk add python`” command to the `Dockerfile`
is huge: we no longer leave a package manager sitting around inside the
container, waiting for some malefactor to figure out how to abuse it.

Beware that there’s a limit to this über-jail’s ability to save you when
you go and provide a more capable runtime layer like this. The container
layer should stop an attacker from accessing any files out on the host
that you haven’t explicitly mounted into the container’s namespace, but
it can’t stop them from making outbound network connections or modifying
the repo DB inside the container.

[cgimgs]: https://github.com/chainguard-images/images/tree/main/images
[distroless]: https://www.chainguard.dev/unchained/minimal-container-images-towards-a-more-secure-future
[MTA]: https://en.wikipedia.org/wiki/Message_transfer_agent


### 5.6 <a id="alerts"></a>Email Alerts

The nature of our single static binary container precludes two of the
options for [sending email alerts](./alerts.md) from Fossil:

*   pipe to a command
*   SMTP relay host

There is no `/usr/sbin/sendmail` inside the container, and the container
cannot connect out to a TCP service on the host by default.

While it is possible to get around the first lack by [elaborating the
run layer](#run), injecting a full-blown Sendmail setup into the
container would go against the whole idea of containerization.
Forwarding an SMTP relay port into the container isn’t nearly as bad,
but it’s still bending the intent behind containers out of shape.

A far better option in this case is the “store emails in database”
method, since the containerized Fossil binary knows perfectly well how
to write SQLite DB files without relying on any external code. Using the
paths in the configuration recommended above, the database path should
be set to something like `/museum/mail.db`. This, along with the use of
[bind mounts](#bind-mount), means you can have a process running outside
the container that passes the emails along to the host-side [MTA].

The included [`email-sender.tcl`](/file/tools/email-sender.tcl) script
works reasonably well for this, though in my own usage, I had to make
two changes to it:

1.  The shebang line at the top has to be `#!/usr/bin/tclsh` on my
    server.

2.  I parameterized the `DBFILE` variable at the top thus:

        set DBFILE [lindex $argv 0]

I then wanted a way to start this Tcl script on startup and keep it
running, which made me reach for systemd. My server is set to allow user
services to run at boot(^”Desktop” class Linuxes tend to disable that by
default under the theory that you don’t want those services to run until
you’ve logged into the GUI as that user. If you find yourself running
into this, [enable linger
mode](https://www.freedesktop.org/software/systemd/man/loginctl.html).)
so I was able to create a unit file called
`~/.local/share/systemd/user/alert-sender@.service` with these contents:

    [Unit]
    Description=Fossil email alert sender for %I

    [Service]
    WorkingDirectory=/home/fossil/museum
    ExecStart=/home/fossil/bin/alert-sender %I/mail.db
    Restart=always
    RestartSec=3

    [Install]
    WantedBy=default.target

I was then able to enable email alert forwarding for select repositories
after configuring them per [the docs](./alerts.md) by saying:

    $ systemctl --user daemon-reload
    $ systemctl --user enable --now alert-sender@myproject

Because this is a parameterized script and we’ve set our repository
paths predictably, you can do this for as many repositories as you need
to by passing their names after the “`@`” sign in the commands above.


## 6. <a id="light"></a>Lightweight Alternatives to Docker

Those afflicted with sticker shock at seeing the size of a [Docker
Desktop][DD] installation — 1.65 GB here — might’ve immediately
“noped” out of the whole concept of containers. The first thing to
realize is that when it comes to actually serving simple containers like
the ones shown above, [Docker Engine][DE] suffices, at about a
quarter of the size.

Yet on a small server — say, a $4/month ten gig Digital Ocean droplet —
that’s still a big chunk of your storage budget. It takes ~60:1 overhead
merely to run a Fossil server container? Once again, I wouldn’t
blame you if you noped right on out of here, but if you will be patient,
you will find that there are ways to run Fossil inside a container even
on entry-level cloud VPSes. These are well-suited to running Fossil; you
don’t have to resort to [raw Fossil service][srv] to succeed,
leaving the benefits of containerization to those with bigger budgets.

For the sake of simple examples in this section, we’ll assume you’re
integrating Fossil into a larger web site, such as with our [Debian +
nginx + TLS][DNT] plan. This is why all of the examples below create
the container with this option:

    --publish 127.0.0.1:9999:8080

The assumption is that there’s a reverse proxy running somewhere that
redirects public web hits to localhost port 9999, which in turn goes to
port 8080 inside the container. This use of port
publishing effectively replaces the use of the
“`fossil server --localhost`” option.

For the nginx case, you need to add `--scgi` to these commands, and you
might also need to specify `--baseurl`.
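
To make the proxy side of that concrete, the nginx stanza might look
something like this sketch; the `/code` URL prefix is an assumption for
illustration, and the port matches the `--publish` option above (see the
[Debian + nginx + TLS][DNT] guide for the full setup):

```nginx
# Hypothetical nginx stanza forwarding SCGI traffic to the container
# published on localhost:9999. "/code" is an assumed URL prefix.
location /code/ {
    include scgi_params;
    scgi_pass 127.0.0.1:9999;
    scgi_param SCRIPT_NAME "/code";
}
```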

Containers are a fine addition to such a scheme as they isolate the
Fossil sections of the site from the rest of the back-end resources,
thus greatly reducing the chance that they’ll ever be used to break into
the host as a whole.

(If you wanted to be double-safe, you could put the web server into
another container, restricting it to reading from the static web
site directory and connecting across localhost to back-end dynamic
content servers such as Fossil. That’s way outside the scope of this
document, but you can find ready advice for that elsewhere. Seeing how
we do this with Fossil should help you bridge the gap in extending
this idea to the rest of your site.)

[DD]: https://www.docker.com/products/docker-desktop/
[DE]: https://docs.docker.com/engine/
[DNT]: ./server/debian/nginx.md
[srv]: ./server/


### 6.1 <a id="nerdctl" name="containerd"></a>Stripping Docker Engine Down

The core of Docker Engine is its [`containerd`][ctrd] daemon and the
[`runc`][runc] container runtime. Add to this the out-of-core CLI program
[`nerdctl`][nerdctl] and you have enough of the engine to run Fossil
containers. The big things you’re missing are:

*   **BuildKit**: The container build engine, which doesn’t matter if
    you’re building elsewhere and shipping the images to the target.
    A good example is using a container registry as an
    intermediary between the build and deployment hosts.

*   **SwarmKit**: A powerful yet simple orchestrator for Docker that you
    probably aren’t using with Fossil anyway.

In exchange, you get a runtime that’s about half the size of Docker
Engine. The commands are essentially the same as above, but you say
“`nerdctl`” instead of “`docker`”. You might alias one to the other,
because you’re still going to be using Docker to build and ship your
container images.

[ctrd]: https://containerd.io/
[nerdctl]: https://github.com/containerd/nerdctl
[runc]: https://github.com/opencontainers/runc


### 6.2 <a id="podman"></a>Podman

A lighter-weight [rootless][rl] [drop-in replacement][whatis] that
doesn’t give up the image builder is [Podman]. Initially created by
Red Hat and thus popular on that family of OSes, it will run on
any flavor of Linux. It can even be made to run [on macOS via Homebrew][pmmac]
or [on Windows via WSL2][pmwin].

On Ubuntu 22.04, the installation size is about 38&nbsp;MiB, roughly a
tenth the size of Docker Engine.

For our purposes here, the only things that change relative to the
examples at the top of this document are the initial commands:

    $ podman build -t fossil .
    $ podman run --name fossil -p 9999:8080/tcp fossil

Your Linux package repo may have a `podman-docker` package which
provides a “`docker`” script that calls “`podman`” for you, eliminating
even the command name difference. With that installed, the `make`
commands above will work with Podman as-is.

The only difference that matters here is that Podman doesn’t have the
same [default Linux kernel capability set](#caps) as Docker, which
changes the `--cap-drop` flags recommended above to:

    $ podman create \
        --name fossil \
        --cap-drop CHOWN \
        --cap-drop FSETID \
        --cap-drop KILL \
        --cap-drop NET_BIND_SERVICE \
        --cap-drop SETFCAP \
        --cap-drop SETPCAP \
        --publish 127.0.0.1:9999:8080 \
        localhost/fossil
    $ podman start fossil

[pmmac]: https://podman.io/getting-started/installation.html#macos
[pmwin]: https://github.com/containers/podman/blob/main/docs/tutorials/podman-for-windows.md
[Podman]: https://podman.io/
[rl]: https://github.com/containers/podman/blob/main/docs/tutorials/rootless_tutorial.md
[whatis]: https://docs.podman.io/en/latest/index.html


### 6.3 <a id="nspawn"></a>`systemd-container`
678
679
If even the Podman stack is too big for you, the next-best option I’m
680
aware of is the `systemd-container` infrastructure on modern Linuxes,
681
available since version 239 or so. Its runtime tooling requires only
682
about 1.4 MiB of disk space:
683
684
$ sudo apt install systemd-container btrfs-tools
685
686
That command assumes the primary test environment for
687
this guide, Ubuntu 22.04 LTS with `systemd` 249. For best
688
results, `/var/lib/machines` should be a btrfs volume, because
689
[`$REASONS`][mcfad]. For CentOS Stream 9 and other Red Hattish
690
systems, you will have to make several adjustments, which we’ve
691
collected [below](#nspawn-centos) to keep these examples clear.
692

We’ll assume your Fossil repository stores something called
“`myproject`” within `~/museum/myproject/repo.fossil`, named according
to the reasons given [above](#repo-inside). We’ll make consistent use of
this naming scheme in the examples below so that you will be able to
replace the “`myproject`” element of the various file and path names.

If you use [the stock `Dockerfile`][DF] to generate your
base image, `nspawn` won’t recognize it as containing an OS unless you
change the “`FROM scratch AS os`” line at the top of the second stage
to something like this:

    FROM gcr.io/distroless/static-debian11 AS os

Using that as a base image provides all the files `nspawn` checks for to
determine whether the container is sufficiently close to a Linux VM for
the following step to proceed:

    $ make container-image
    $ docker container export $(make container-version) |
      machinectl import-tar - myproject
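
Before going on, you can verify that the import landed where the
`machinectl` tooling keeps its images:

    $ machinectl list-images

The new `myproject` entry should appear in the listing, backed by a
directory (or btrfs subvolume) under `/var/lib/machines`.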

Next, create `/etc/systemd/nspawn/myproject.nspawn`:

----

    [Exec]
    WorkingDirectory=/
    Parameters=bin/fossil server \
        --baseurl https://example.com/myproject \
        --create \
        --jsmode bundled \
        --localhost \
        --port 9000 \
        --scgi \
        --user admin \
        museum/repo.fossil
    DropCapability= \
        CAP_AUDIT_WRITE \
        CAP_CHOWN \
        CAP_FSETID \
        CAP_KILL \
        CAP_MKNOD \
        CAP_NET_BIND_SERVICE \
        CAP_NET_RAW \
        CAP_SETFCAP \
        CAP_SETPCAP
    ProcessTwo=yes
    LinkJournal=no
    Timezone=no

    [Files]
    Bind=/home/fossil/museum/myproject:/museum

    [Network]
    VirtualEthernet=no

----

If you recognize most of that from the `Dockerfile` discussion above,
congratulations, you’ve been paying attention. The rest should also
be clear from context.

Some of this is expected to vary:

*   The references to `example.com` and `myproject` are stand-ins for
    your actual web site and repository name.

*   The command given in the `Parameters` directive assumes you’re
    setting up [SCGI proxying via nginx][DNT], but with adjustment,
    it’ll work with the other repository service methods we’ve
    [documented][srv].

*   The path in the host-side part of the `Bind` value must point at the
    directory containing the `repo.fossil` file referenced in said
    command so that `/museum/repo.fossil` refers to your repo out
    on the host, for the reasons given [above](#bind-mount).

That being done, we also need a generic `systemd` unit file called
`/etc/systemd/system/fossil@.service`, containing:

----

    [Unit]
    Description=Fossil %i Repo Service
    [email protected] [email protected]
    After=network.target systemd-resolved.service [email protected] [email protected]

    [Service]
    ExecStart=systemd-nspawn --settings=override --read-only --machine=%i bin/fossil

    [Install]
    WantedBy=multi-user.target

----

You shouldn’t have to change any of this because we’ve given the
`--settings=override` flag, meaning any setting in the nspawn file
overrides the corresponding setting passed to `systemd-nspawn`. This
arrangement not only keeps the unit file simple, it allows multiple
services to share the base configuration, varying on a per-repo level
through adjustments to their individual `*.nspawn` files.

You may then start the service in the normal way:

    $ sudo systemctl enable fossil@myproject
    $ sudo systemctl start fossil@myproject

You should then find it running on localhost port 9000 per the nspawn
configuration file above, suitable for proxying Fossil out to the
public using nginx via SCGI. If you aren’t using a front-end proxy
and want Fossil exposed to the world via HTTPS, you might say this
instead in the `*.nspawn` file:

    Parameters=bin/fossil server \
        --cert /path/to/cert.pem \
        --create \
        --jsmode bundled \
        --port 443 \
        --user admin \
        museum/repo.fossil

You would also need to un-drop the `CAP_NET_BIND_SERVICE` capability
to allow Fossil to bind to this low-numbered port.
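
Concretely, that means deleting one line from the `DropCapability` list
in the `*.nspawn` file shown earlier:

    DropCapability= \
        CAP_AUDIT_WRITE \
        CAP_CHOWN \
        CAP_FSETID \
        CAP_KILL \
        CAP_MKNOD \
        CAP_NET_RAW \
        CAP_SETFCAP \
        CAP_SETPCAP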

We use the `systemd` template file feature to allow multiple Fossil
servers to run on a single machine, each on a different TCP port,
as when proxying them out as subdirectories of a larger site.
To add another project, you must first clone the base “machine” layer:

    $ sudo machinectl clone myproject otherthing

That will not only create a clone of `/var/lib/machines/myproject`
as `../otherthing`, it will also create a matching `otherthing.nspawn`
file for you as a copy of the first one. Adjust its contents to suit,
then enable and start it as above.
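
For instance, to serve a second repository alongside the first from
`~/museum/otherthing/repo.fossil`, the cloned `otherthing.nspawn` needs
only a different port, URL, and bind path. (The `9001` port and the
`otherthing` names here are illustrative stand-ins, as above.)

    [Exec]
    Parameters=bin/fossil server \
        --baseurl https://example.com/otherthing \
        --create \
        --jsmode bundled \
        --localhost \
        --port 9001 \
        --scgi \
        --user admin \
        museum/repo.fossil

    [Files]
    Bind=/home/fossil/museum/otherthing:/museum

Then “`sudo systemctl enable --now fossil@otherthing`” brings it up
next to the first instance.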

[mcfad]: https://www.freedesktop.org/software/systemd/man/machinectl.html#Files%20and%20Directories

### 6.3.1 <a id="nspawn-rhel"></a>Getting It Working on a RHEL Clone

The biggest difference between doing this on OSes like CentOS versus
Ubuntu is that RHEL (thus also its clones) doesn’t ship btrfs in
its kernel, and so it packages no `mkfs.btrfs` either, which
[`machinectl`][mctl] depends on for achieving its various purposes.

Fortunately, there are workarounds.

First, the `apt install` command above becomes:

    $ sudo dnf install systemd-container

Second, you have to hack around the lack of `machinectl import-tar`:

    $ rootfs=/var/lib/machines/fossil
    $ sudo mkdir -p $rootfs
    $ docker container export fossil | sudo tar -xf - -C $rootfs

The parent directory path in the `rootfs` variable is important,
because although we aren’t able to use `machinectl` on such systems, the
`systemd-nspawn` developers assume you’re using them together; when you
give `--machine`, it assumes the `machinectl` directory scheme. You
could instead use `--directory`, allowing you to store the rootfs
wherever you like, but why make things difficult? It’s a perfectly
sensible default, consistent with the [FHS] rules.

The final element &mdash; the machine name &mdash; can be anything
you like so long as it matches the nspawn file’s base name.

Finally, since you can’t use `machinectl clone`, you have to make
a wasteful copy of `/var/lib/machines/myproject` when standing up
multiple Fossil repo services on a single machine. (This is one
of the reasons `machinectl` depends on `btrfs`: cheap copy-on-write
subvolumes.) Because we give the `--read-only` flag, you can simply
`cp -r` one machine to a new name rather than go through the
export-and-import dance you used to create the first one.

[FHS]: https://refspecs.linuxfoundation.org/FHS_3.0/fhs/index.html
[mctl]: https://www.freedesktop.org/software/systemd/man/machinectl.html


### 6.3.2 <a id="nspawn-weaknesses"></a>What Am I Missing Out On?

For all the runtime size savings in this method, you may be wondering
what you’re missing out on relative to Podman, which takes up
roughly 27× more disk space. Short answer: lots. Long answer:

1.  **Build system.** You’ll have to build and test your containers
    some other way. This method is only suitable for running them
    once they’re built.

2.  **Orchestration.** All of the higher-level things like
    “compose” files, Docker Swarm mode, and Kubernetes are
    unavailable to you at this level. You can run multiple
    instances of Fossil, but on a single machine only and with a
    static configuration.

3.  **Image layer sharing.** When you update an image using one of the
    above methods, Docker and Podman are smart enough to copy only
    the changed layers. Furthermore, when you base multiple containers
    on a single image, they don’t make copies of the base layers;
    they can share them, because base layers are immutable, thus
    cannot cross-contaminate.

    Because we use `systemd-nspawn --read-only`, we get *some*
    of this benefit, particularly when using `machinectl` with
    `/var/lib/machines` as a btrfs volume. Even so, the disk space
    and network I/O optimizations go deeper in the Docker and Podman
    worlds.

4.  **Tooling.** Hand-creating and modifying those `systemd`
    files sucks compared to “`podman container create ...`”. This
    is but one of many affordances you will find in the runtimes
    aimed at daily-use devops warriors.

5.  **Network virtualization.** In the scheme above, we turn off the
    `systemd` private networking support because in its default mode, it
    wants to hide containerized services entirely. While there are
    [ways][ndcmp] to expose Fossil’s single network service port under
    that scheme, it adds a lot of administration complexity. In the
    big-boy container runtimes, `docker create --publish` fixes all this
    up in a single option, whereas `systemd-nspawn --port` does
    approximately *none* of that despite the command’s superficial
    similarity.

    From a purely functional point of view, this isn’t a huge problem if
    you consider the inbound service direction only, meaning external
    connections to the Fossil service we’re providing. Since we do want
    this Fossil service to be exposed — else why are we running it? — we
    get all the control we need via `fossil server --localhost` and
    similar options.

    The complexity of the `systemd` networking infrastructure’s
    interactions with containers makes more sense when you consider the
    outbound path. Consider what happens if you enable Fossil’s
    optional TH1 docs feature plus its Tcl evaluation feature. That
    would give anyone with the right to commit to your repository the
    ability to make arbitrary network connections from the Fossil host.
    Then, let us say you have a client-server DBMS server on that same
    host, bound to localhost for private use by other services on the
    machine. Now that DBMS is open to access by a rogue Fossil committer
    because the host’s loopback interface is mapped directly into the
    container’s network namespace.

    Proper network virtualization would protect you in this instance.

This author expects that the set of considerations is broader than
presented here, but that it suffices to make our case as it is: if you
can afford the space of Podman or Docker, we strongly recommend using
either of them over the much lower-level `systemd-container`
infrastructure. You’re getting a considerable amount of value for the
higher runtime cost; it isn’t pointless overhead.

(Incidentally, these are essentially the same reasons why we no longer
talk about the `crun` tool underpinning Podman in this document. It’s
even more limited than `nspawn`, making it even more difficult to
administer while providing no runtime size advantage. The `runc` tool
underpinning Docker is even worse on this score, being scarcely easier
to use than `crun` while having a much larger footprint.)

[ndcmp]: https://wiki.archlinux.org/title/systemd-networkd#Usage_with_containers


### 6.3.3 <a id="nspawn-assumptions"></a>Violated Assumptions

The `systemd-container` infrastructure has a bunch of hard-coded
assumptions baked into it. We papered over these problems above,
but if you’re using these tools for other purposes on the machine
you’re serving Fossil from, you may need to know which assumptions
our container violates and the resulting consequences.

Some of it we discussed above already, but there’s one big class of
problems we haven’t covered yet. It stems from the fact that our stock
container starts a single static executable inside a bare-bones
container rather than “booting” an OS image. That causes a bunch of
commands to fail:

*   **`machinectl poweroff`** will fail because the container
    isn’t running dbus.

*   **`machinectl start`** will try to find an `/sbin/init`
    program in the rootfs, which we haven’t got. We could
    rename `/bin/fossil` to `/sbin/init` and then hack
    the chroot scheme to match, but ick. (This, incidentally,
    is why we set `ProcessTwo=yes` above even though Fossil is
    perfectly capable of running as PID 1, a fact we depend on
    in the other methods above.)

*   **`machinectl shell`** will fail because there is no login
    daemon running, which we purposefully avoided adding by
    creating a “`FROM scratch`” container. (If you need a
    shell, say: `sudo systemd-nspawn --machine=myproject /bin/sh`)

*   **`machinectl status`** won’t give you the container logs
    because we disabled the shared journal, which was in turn
    necessary because we don’t run `systemd` *inside* the
    container, just outside.
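
Not all is lost on the logging front, though: because the unit file runs
`systemd-nspawn` on the host, whatever Fossil writes to its standard
output is captured by the *host’s* journal under the unit’s name, so
you can say:

    $ sudo journalctl -u fossil@myproject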

If these are problems for you, you may wish to build a
fatter container using `debootstrap` or similar. ([External
tutorial][medtut].)

[medtut]: https://medium.com/@huljar/setting-up-containers-with-systemd-nspawn-b719cff0fb8d

<div style="height:50em" id="this-space-intentionally-left-blank"></div>
