Fossil SCM

Worked out how to get systemd-container (a.k.a. nspawn + machinectl) working with the stock Fossil container. Following the above commits, it's pure documentation. Removed the runc and crun docs at the same time, since this method is as small as crun while being more functional; there's zero reason to push through all the additional complexity of those even lower-level tools now that this method is debugged and documented.

wyoung 2022-11-30 23:09 trunk
Commit 930a655a14e9b040fcb4156bdda544ea9f6b684e2756ed15eb2ffb2bd4f6306a
1 file changed +319 -245
--- www/containers.md
+++ www/containers.md
@@ -484,11 +484,11 @@
484484
that’s still a big chunk of your storage budget. It takes 100:1 overhead
485485
just to run a 4 MiB Fossil server container? Once again, I wouldn’t
486486
blame you if you noped right on out of here, but if you will be patient,
487487
you will find that there are ways to run Fossil inside a container even
488488
on entry-level cloud VPSes. These are well-suited to running Fossil; you
489
-don’t have to resort to [raw Fossil service](./server/) to succeed,
489
+don’t have to resort to [raw Fossil service][srv] to succeed,
490490
leaving the benefits of containerization to those with bigger budgets.
491491
492492
For the sake of simple examples in this section, we’ll assume you’re
493493
integrating Fossil into a larger web site, such as with our [Debian +
494494
nginx + TLS][DNT] plan. This is why all of the examples below create
@@ -521,10 +521,11 @@
521521
this idea to the rest of your site.)
522522
523523
[DD]: https://www.docker.com/products/docker-desktop/
524524
[DE]: https://docs.docker.com/engine/
525525
[DNT]: ./server/debian/nginx.md
526
+[srv]: ./server/
526527
527528
528529
### 6.1 <a id="nerdctl" name="containerd"></a>Stripping Docker Engine Down
529530
530531
The core of Docker Engine is its [`containerd`][ctrd] daemon and the
@@ -556,12 +557,12 @@
556557
give up the image builder is [Podman]. Initially created by
557558
Red Hat and thus popular on that family of OSes, it will run on
558559
any flavor of Linux. It can even be made to run [on macOS via Homebrew][pmmac]
559560
or [on Windows via WSL2][pmwin].
560561
561
-On Ubuntu 22.04, it’s about a quarter the size of Docker Engine, or half
562
-that of the “full” distribution of `nerdctl` and all its dependencies.
562
+On Ubuntu 22.04, the installation size is about 38&nbsp;MiB, roughly a
563
+tenth the size of Docker Engine.
563564
564565
Although Podman [bills itself][whatis] as a drop-in replacement for the
565566
`docker` command and everything that sits behind it, some of the tool’s
566567
design decisions affect how our Fossil containers run, as compared to
567568
using Docker. The most important of these is that, by default, Podman
@@ -703,251 +704,322 @@
703704
container images across the Internet, it can be a net win in terms of
704705
build time.
705706
706707
707708
708
-### 6.3 <a id="barebones"></a>Bare-Bones OCI Bundle Runners
709
-
710
-If even the Podman stack is too big for you, you still have options for
711
-running containers that are considerably slimmer, at a high cost to
712
-administration complexity and loss of features.
713
-
714
-Part of the OCI standard is the notion of a “bundle,” being a consistent
715
-way to present a pre-built and configured container to the runtime.
716
-Essentially, it consists of a directory containing a `config.json` file
717
-and a `rootfs/` subdirectory containing the root filesystem image. Many
718
-tools can produce these for you. We’ll show only one method in the first
719
-section below, then reuse that in the following sections.
720
-
721
-
722
-#### 6.3.1 <a id="runc"></a>`runc`
723
-
724
-We mentioned `runc` [above](#nerdctl), but it’s possible to use it
725
-standalone, without `containerd` or its CLI frontend `nerdctl`. You also
726
-lose the build engine, intelligent image layer sharing, image registry
727
-connections, and much more. The plus side is that `runc` alone is
728
-18 MiB.
729
-
730
-Using it without all the support tooling isn’t complicated, but it *is*
731
-cryptic enough to want a shell script. Let’s say we want to build on our
732
-big desktop machine but ship the resulting container to a small remote
733
-host. This should serve:
734
-
-----
735
-
736
-```shell
737
-#!/bin/bash -ex
738
-c=fossil
739
-b=/var/lib/machines/$c
740
-h=my-host.example.com
741
-m=/run/containerd/io.containerd.runtime.v2.task/moby
742
-t=$(mktemp -d /tmp/$c-bundle.XXXXXX)
743
-
744
-if [ -d "$t" ]
745
-then
746
- docker container start $c
747
- docker container export $c > $t/rootfs.tar
748
- id=$(docker inspect --format="{{.Id}}" $c)
749
- sudo cat $m/$id/config.json \
750
- | jq '.root.path = "'$b/rootfs'"'
751
- | jq '.linux.cgroupsPath = ""'
752
- | jq 'del(.linux.sysctl)'
753
- | jq 'del(.linux.namespaces[] | select(.type == "network"))'
754
- | jq 'del(.mounts[] | select(.destination == "/etc/hostname"))'
755
- | jq 'del(.mounts[] | select(.destination == "/etc/resolv.conf"))'
756
- | jq 'del(.mounts[] | select(.destination == "/etc/hosts"))'
757
- | jq 'del(.hooks)' > $t/config.json
758
- scp -r $t $h:tmp
759
- ssh -t $h "{
760
- mv ./$t/config.json $b &&
761
- sudo tar -C $b/rootfs -xf ./$t/rootfs.tar &&
762
- rm -r ./$t
763
- }"
764
- rm -r $t
765
-fi
766
-```
767
-
-----
768
-
769
-The first several lines list configurables:
770
-
771
-* **`c`**: the name of the Docker container you’re bundling up for use
772
- with `runc`
773
-* **`b`**: the path of the exported container, called the “bundle” in
774
- OCI jargon; we’re using the [`nspawn`](#nspawn) convention, a
775
- reasonable choice under the [Linux FHS rules][LFHS]
776
-* **`h`**: the remote host name
777
-* **`m`**: the local directory holding the running machines, configurable
778
- because:
779
- * the path name is longer than we want to use inline
780
- * it’s been known to change from one version of Docker to the next
781
- * you might be building and testing with [Podman](#podman), so it
782
- has to be “`/run/user/$UID/crun`” instead
783
-* **`t`**: the temporary bundle directory we populate locally, then
784
- `scp` to the remote machine, where it’s unpacked
785
-
786
-[LFHS]: https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard
787
-
788
-
789
-##### Why All That `sudo` Stuff?
790
-
791
-This script uses `sudo` for two different purposes:
792
-
793
-1. To read the local `config.json` file out of the `containerd` managed
794
- directory, which is owned by `root` on Docker systems. Additionally,
795
- that input file is only available while the container is started, so
796
- we must ensure that before extracting it.
797
-
798
-2. To unpack the bundle onto the remote machine. If you try to get
799
- clever and unpack it locally, then `rsync` it to the remote host to
800
- avoid re-copying files that haven’t changed since the last update,
801
- you’ll find that it fails when it tries to copy device nodes, to
802
- create files owned only by the remote root user, and so forth. If the
803
- container bundle is small, it’s simpler to re-copy and unpack it
804
- fresh each time.
805
-
806
-I point all this out because it might ask for your password twice: once for
807
-the local sudo command, and once for the remote.
808
-
809
-
810
-
811
-##### Why All That `jq` Stuff?
812
-
813
-We’re using [jq] for two separate purposes:
814
-
815
-1. To automatically transmogrify Docker’s container configuration so it
816
- will work with `runc`:
817
-
818
- * point it where we unpacked the container’s exported rootfs
819
- * accede to its wish to [manage cgroups by itself][ecg]
820
- * remove the `sysctl` calls that will break after…
821
- * …we remove the network namespace to allow Fossil’s TCP listening
822
- port to be available on the host; `runc` doesn’t offer the
823
- equivalent of `docker create --publish`, and we can’t be
824
- bothered to set up a manual mapping from the host port into the
825
- container
826
- * remove file bindings that point into the local runtime managed
827
- directories; one of the things we give up by using a bare
828
- container runner is automatic management of these files
829
- * remove the hooks for essentially the same reason
830
-
831
-2. To make the Docker-managed machine-readable `config.json` more
832
- human-readable, in case there are other things you want changed in
833
- this version of the container. Exposing the `config.json` file like
834
- this means you don’t have to rebuild the container merely to change
835
- a value like a mount point, the kernel capability set, and so forth.
836
-
837
-
838
-##### Running the Bundle
839
-
840
-With the container exported to a bundle like this, you can start it as:
841
-
842
-```
843
- $ cd /path/to/bundle
844
- $ c=fossil-runc ← …or anything else you prefer
845
- $ sudo runc create $c
846
- $ sudo runc start $c
847
- $ sudo runc exec $c -t sh -l
848
- ~ $ ls museum
849
- repo.fossil
850
- ~ $ ps -eaf
851
- PID USER TIME COMMAND
852
- 1 fossil 0:00 bin/fossil server --create …
853
- ~ $ exit
854
- $ sudo runc kill $c
855
- $ sudo runc delete $c
856
-```
857
-
858
-If you’re doing this on the export host, the first command is “`cd $b`”
859
-if we’re using the variables from the shell script above. Alternately,
860
-the `runc` subcommands that need to read the bundle files take a
861
-`--bundle/-b` flag to let you avoid switching directories.
862
-
863
-The rest should be straightforward: create and start the container as
864
-root so the `chroot(2)` call inside the container will succeed, then get
865
-into it with a login shell and poke around to prove to ourselves that
866
-everything is working properly. It is. Yay!
867
-
868
-The remaining commands show shutting the container down and destroying
869
-it, simply to show how these commands change relative to using the
870
-Docker Engine commands. It’s “kill,” not “stop,” and it’s “delete,” not
871
-“rm.”
872
-
873
-[ecg]: https://github.com/opencontainers/runc/pull/3131
874
-[jq]: https://stedolan.github.io/jq/
875
-
876
-
877
-##### Lack of Layer Sharing
878
-
879
-The bundle export process collapses Docker’s union filesystem down to a
880
-single layer. Atop that, it makes all files mutable.
881
-
882
-All of this is fine for tiny remote hosts with a single container, or at
883
-least one where none of the containers share base layers. Where it
884
-becomes a problem is when you have multiple Fossil containers on a
885
-single host, since they all derive from the same base image.
886
-
887
-The full-featured container runtimes above will intelligently share
888
-these immutable base layers among the containers, storing only the
889
-differences in each individual container. More, when pulling images from
890
-a registry host, they’ll transfer only the layers you don’t have copies
891
-of locally, so you don’t have to burn bandwidth sending copies of Alpine
892
-and BusyBox each time, even though they’re unlikely to change from one
893
-build to the next.
894
-
895
-
896
-#### 6.3.2 <a id="crun"></a>`crun`
897
-
898
-In the same way that [Docker Engine is based on `runc`](#runc), Podman’s
899
-engine is based on [`crun`][crun], a lighter-weight alternative to
900
-`runc`. It’s only 1.4 MiB on the system I tested it on, yet it will run
901
-the same container bundles as in my `runc` examples above. We saved
902
-more than that by compressing the container’s Fossil executable with
903
-UPX, making the runtime virtually free in this case. The only question
904
-is whether you can put up with its limitations, which are the same as
905
-for `runc`.
906
-
907
-[crun]: https://github.com/containers/crun
908
-
909
-
910
-#### 6.3.3 <a id="nspawn"></a>`systemd-nspawn`
911
-
912
-As of `systemd` version 242, its optional `nspawn` piece
913
-[reportedly](https://www.phoronix.com/news/Systemd-Nspawn-OCI-Runtime)
914
-got the ability to run OCI bundles directly. You might
915
-have it installed already, but if not, it’s only about 2 MiB. It’s
916
-in the `systemd-containers` package as of Ubuntu 22.04 LTS:
917
-
918
-```
919
- $ sudo apt install systemd-containers
920
-```
921
-
922
-It’s also in CentOS Stream 9, under the same name.
923
-
924
-You create the bundles the same way as with [the `runc` method
925
-above](#runc). The only thing that changes are the top-level management
926
-commands:
927
-
928
-```
929
- $ sudo systemd-nspawn \
930
- --oci-bundle=/var/lib/machines/fossil \
931
- --machine=fossil \
932
- --network-veth \
933
- --port=127.0.0.1:127.0.0.1:9999:8080
934
- $ sudo machinectl list
935
- No machines.
936
-```
937
-
938
-This is why I wrote “reportedly” above: I couldn’t get it to work on two different
939
-Linux distributions, and I can’t see why. I’m leaving this here to give
940
-someone else a leg up, with the hope that they will work out what’s
941
-needed to get the container running and registered with `machinectl`.
942
-
943
-As of this writing, the tool expects an OCI container version of
944
-“1.0.0”. I had to edit this at the top of my `config.json` file to get
945
-the first command to read the bundle. The fact that it errored out when
946
-I had “`1.0.2-dev`” in there proves it’s reading the file, but it
947
-doesn’t seem able to make sense of what it finds there, and it doesn’t
948
-give any diagnostics to say why.
949
-
709
+### 6.3 <a id="nspawn"></a>`systemd-container`
710
+
711
+If even the Podman stack is too big for you, the next-best option I’m
712
+aware of is the `systemd-container` infrastructure on modern Linuxes,
713
+available since version 239 or so. Its runtime tooling requires only
714
+about 1.4 MiB of disk space:
715
+
716
+```
717
+ $ sudo apt install systemd-container btrfs-progs
718
+```
719
+
720
+That command assumes the primary test environment for
721
+this guide, Ubuntu 22.04 LTS with `systemd` 249. For best
722
+results, `/var/lib/machines` should be a btrfs volume, because
723
+[`$REASONS`][mcfad]. (For CentOS Stream 9 and other Red Hattish
724
+systems, you will have to make several adjustments, which we’ve
725
+collected [below](#nspawn-rhel) to keep these examples clear.)
726
+
727
+The first configuration step is to convert the Docker container into
728
+a “machine”, as systemd calls it. The easiest method is:
729
+
730
+```
731
+ $ make container-run
732
+ $ docker container export fossil-e119d5983620 |
733
+ machinectl import-tar - myproject
734
+```
735
+
736
+Copy the container name from the first step to the second. Yours will
737
+almost certainly be named after a different Fossil commit ID.
738
+
739
+It’s important that the name of the machine you create &mdash;
740
+“`myproject`” in this example &mdash; matches the base name
741
+of the nspawn configuration file you create as the next step.
742
+Therefore, to extend the example, the following file needs to be
743
+called `/etc/systemd/nspawn/myproject.nspawn`, and it will contain
744
+something like:
745
+
746
+----
747
+
748
+```
749
+[Exec]
750
+WorkingDirectory=/jail
751
+Parameters=bin/fossil server \
752
+ --baseurl https://example.com/myproject \
753
+ --chroot /jail \
754
+ --create \
755
+ --jsmode bundled \
756
+ --localhost \
757
+ --port 9000 \
758
+ --scgi \
759
+ --user admin \
760
+ museum/repo.fossil
761
+DropCapability= \
762
+ CAP_AUDIT_WRITE \
763
+ CAP_CHOWN \
764
+ CAP_FSETID \
765
+ CAP_KILL \
766
+ CAP_MKNOD \
767
+ CAP_NET_BIND_SERVICE \
768
+ CAP_NET_RAW \
769
+ CAP_SETFCAP \
770
+ CAP_SETPCAP
771
+ProcessTwo=yes
772
+LinkJournal=no
773
+Timezone=no
774
+
775
+[Files]
776
+Bind=/home/fossil/museum/myproject:/jail/museum
777
+
778
+[Network]
779
+VirtualEthernet=no
780
+```
781
+
782
+----
783
+
784
+If you recognize most of that from the `Dockerfile` discussion above,
785
+congratulations, you’ve been paying attention. The rest should also
786
+be clear from context.
787
+
788
+Some of this is expected to vary. For one, the command given in the
789
+`Parameters` directive assumes [SCGI proxying via nginx][DNT]. For
790
+other use cases, see our collection of [Fossil server configuration
791
+guides][srv], then adjust the command to your local needs.
792
+For another, you will likely have to adjust the `Bind` value to
793
+point at the directory containing the `repo.fossil` file referenced
794
+in the command.
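For example, if your repositories lived under a hypothetical `/srv/fossil` tree instead, the `[Files]` section would become:

```
[Files]
Bind=/srv/fossil/myproject:/jail/museum
```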
795
+
796
+We also need a generic systemd unit file called
797
+`/etc/systemd/system/fossil@.service`, containing:
798
+
799
+----
800
+
801
+```
802
+[Unit]
803
+Description=Fossil %i Repo Service
804
+Wants=modprobe@tun.service modprobe@loop.service
805
+After=network.target systemd-resolved.service modprobe@tun.service modprobe@loop.service
806
+
807
+[Service]
808
+ExecStart=systemd-nspawn --settings=override --read-only --machine=%i bin/fossil
809
+
810
+[Install]
811
+WantedBy=multi-user.target
812
+```
813
+
814
+----
815
+
816
+You shouldn’t have to change any of this because we’ve given the
817
+`--settings=override` flag, meaning any setting in the nspawn file
818
+overrides the setting passed to `systemd-nspawn`. This arrangement
819
+not only keeps the unit file simple, it allows multiple services to
820
+share the base configuration, varying on a per-repo level.
821
+
822
+Start the service in the normal way:
823
+
824
+```
825
+ $ sudo systemctl enable fossil@myproject
826
+ $ sudo systemctl start fossil@myproject
827
+```
828
+
829
+You should find it running on localhost port 9000 per the nspawn
830
+configuration file above, suitable for proxying Fossil out to the
831
+public using nginx, via SCGI. If you aren’t using a front-end proxy
832
+and want Fossil exposed to the world, you might say this instead in
833
+the `nspawn` file:
834
+
835
+```
836
+Parameters=bin/fossil server \
837
+ --cert /path/to/my/fullchain.pem \
838
+ --chroot /jail \
839
+ --create \
840
+ --jsmode bundled \
841
+ --port 443 \
842
+ --user admin \
843
+ museum/repo.fossil
844
+```
845
+
846
+You would also need to un-drop the `CAP_NET_BIND_SERVICE` capability
847
+to allow Fossil to bind to this low-numbered port.
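Concretely, that means deleting the `CAP_NET_BIND_SERVICE` line from the nspawn file’s `DropCapability=` list shown earlier, leaving:

```
DropCapability= \
    CAP_AUDIT_WRITE \
    CAP_CHOWN \
    CAP_FSETID \
    CAP_KILL \
    CAP_MKNOD \
    CAP_NET_RAW \
    CAP_SETFCAP \
    CAP_SETPCAP
```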
848
+
849
+We use systemd’s template file feature to allow multiple Fossil
850
+servers running on a single machine, each on a different TCP port,
851
+as when proxying them out as subdirectories of a larger site.
852
+To add another project, you must first clone the base “machine” layer:
853
+
854
+```
855
+ $ sudo machinectl clone myproject otherthing
856
+```
857
+
858
+That will not only create a clone of `/var/lib/machines/myproject`
859
+as `../otherthing`, it will create a matching `nspawn` file for you
860
+as a copy of the first one. Adjust its contents to suit, then enable
861
+and start it as above.
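The adjustment usually amounts to the SCGI port and the bind path. A minimal sketch of the port bump, run here against a scratch stand-in for the cloned `/etc/systemd/nspawn/otherthing.nspawn` file (file name and port numbers are carried over from the examples above; the real file needs `sudo` to edit):

```shell
# Scratch stand-in for the cloned nspawn file.
f=$(mktemp)
printf '%s\n' '    --port 9000 \' > "$f"

# Give the second project its own localhost port so the two
# Fossil instances don't collide.
sed -i 's/--port 9000/--port 9001/' "$f"
cat "$f"
```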
862
+
863
+[mcfad]: https://www.freedesktop.org/software/systemd/man/machinectl.html#Files%20and%20Directories
864
+
865
+
866
+#### 6.3.1 <a id="nspawn-rhel"></a>Getting It Working on a RHEL Clone
867
+
868
+The biggest difference between doing this on OSes like CentOS versus
869
+Ubuntu is that RHEL (thus also its clones) doesn’t ship btrfs in
870
+its kernel, thus has no option for installing `mkfs.btrfs`, which
871
+[`machinectl`][mctl] needs for various purposes.
872
+
873
+Fortunately, there are workarounds.
874
+
875
+First, the `apt install` command above becomes:
876
+
877
+```
878
+ $ sudo dnf install systemd-container
879
+```
880
+
881
+Second, you have to hack around the lack of `machinectl import-tar` like so:
882
+
883
+```
884
+ $ rootfs=/var/lib/machines/fossil
885
+ $ sudo mkdir -p $rootfs
886
+ $ docker container export fossil | sudo tar -C $rootfs -xf -
887
+```
888
+
889
+The parent directory path in the `rootfs` variable is important,
890
+because although we aren’t using `machinectl`, the `systemd-nspawn`
891
+developers assume you’re using them together. Thus, when you give
892
+`--machine`, it assumes the `machinectl` directory scheme. You could
893
+instead use `--directory`, allowing you to store the rootfs wherever
894
+you like, but why make things difficult? It’s a perfectly sensible
895
+default, consistent with the [LHS] rules.
896
+
897
+The final element &mdash; the machine name &mdash; can be anything
898
+you like so long as it matches the nspawn file’s base name.
899
+
900
+Finally, since you can’t use `machinectl clone`, you have to make
901
+a wasteful copy of `/var/lib/machines/myproject` when standing up
902
+multiple Fossil repo services on a single machine. (This is one
903
+of the reasons `machinectl` depends on `btrfs`: cheap copy-on-write
904
+subvolumes.) Because we give the `--read-only` flag, you can simply
905
+`cp -r` one machine to a new name rather than go through the
906
+export-and-import dance you used to create the first one.
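That copy can be sketched as follows. The example runs in a throwaway directory so it can be tried without root; in real use the source and destination sit under `/var/lib/machines/`, and every command needs `sudo`:

```shell
# Scratch stand-ins for /var/lib/machines/{myproject,otherthing}.
machines=$(mktemp -d)
mkdir -p "$machines/myproject/bin"
echo fossil > "$machines/myproject/bin/marker"

# cp -a preserves ownership, modes, and symlinks, which matters
# when copying a real rootfs.
cp -a "$machines/myproject" "$machines/otherthing"
cat "$machines/otherthing/bin/marker"
```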
907
+
908
+[LHS]: https://refspecs.linuxfoundation.org/FHS_3.0/fhs/index.html
909
+[mctl]: https://www.freedesktop.org/software/systemd/man/machinectl.html
910
+
911
+
912
+#### 6.3.2 <a id="nspawn-weaknesses"></a>What Am I Missing Out On?
913
+
914
+For all the runtime size savings in this method, you may be wondering
915
+what you’re missing out on relative to Podman, which takes up
916
+roughly 27× more disk space. Short answer: lots. Long answer:
917
+
918
+1. **Build system.** You’ll have to build and test your containers
919
+ some other way. This method is only suitable for running them
920
+ once they’re built.
921
+
922
+2. **Orchestration.** All of the higher-level things like
923
+ “compose” files, Docker Swarm mode, and Kubernetes are
924
+ unavailable to you at this level. You can run multiple
925
+ instances of Fossil, but on a single machine only and with a
926
+ static configuration.
927
+
928
+3. **Image layer sharing.** When you update an image using one of the
929
+ above methods, Docker and Podman are smart enough to copy only
930
+ changed layers. Furthermore, when you base multiple containers
931
+ on a single image, they don’t make copies of the base layers;
932
+ they can share them, because base layers are immutable, thus
933
+ cannot cross-contaminate.
934
+
935
+   Because we use `systemd-nspawn --read-only`, we get *some*
936
+ of this benefit, particularly when using `machinectl` with
937
+ `/var/lib/machines` as a btrfs volume. Even so, the disk space
938
+ and network I/O optimizations go deeper in the Docker and Podman
939
+ worlds.
940
+
941
+4. **Tooling.** Hand-creating and modifying those systemd
942
+   files sucks compared to “`podman container create ...`”. This
943
+ is but one of many affordances you will find in the runtimes
944
+ aimed at daily-use devops warriors.
945
+
946
+5. **Network virtualization.** In the scheme above, we turn off the
947
+   `systemd` virtual networking support because in its default mode,
948
+ it wants to hide the service entirely.
949
+
950
+ Another way to put this is that `systemd-nspawn --port` does
951
+ approximately *nothing* of what `docker create --publish` does
952
+ despite their superficial similarities.
953
+
954
+ For this container, it doesn’t much matter, since it exposes
955
+ only a single port, and we do want that one port exposed, one way
956
+ or another. Beyond that, we get all the control we need using
957
+ Fossil options like `--localhost`. I point this out because in
958
+ more complex situations, the automatic network setup features of
959
+ the more featureful runtimes can save a lot of time and hassle.
960
+ They aren’t doing anything you couldn’t do by hand, but why
961
+ would you want to, given the choice?
962
+
963
+I expect there’s a lot more I neglected to think of when creating
964
+this list, but I think it suffices to make my case as it is. If you
965
+can afford the space of Podman or Docker, I strongly recommend using
966
+either of them over the much lower-level `systemd-container`
967
+infrastructure.
968
+
969
+(Incidentally, these are essentially the same reasons why we no longer
970
+talk about the `crun` tool underpinning Podman in this document. It’s
971
+even more limited, making it even more difficult to administer while
972
+providing no runtime size advantage. The `runc` tool underpinning
973
+Docker is even worse on this score, being scarcely easier to use than
974
+`crun` while having a much larger footprint.)
975
+
976
+
977
+#### 6.3.3 <a id="nspawn-assumptions"></a>Violated Assumptions
978
+
979
+The `systemd-container` infrastructure has a bunch of hard-coded
980
+assumptions baked into it. We papered over these problems above,
981
+but if you’re using these tools for other purposes on the machine
982
+you’re serving Fossil from, you may need to know which assumptions
983
+our container violates and the resulting consequences:
984
+
985
+1. `systemd-nspawn` works best with `machinectl`, but if you haven’t
986
+ got `btrfs` available, you run into [trouble](#nspawn-rhel).
987
+
988
+2. Our stock container starts a single static executable inside
989
+ a stripped-to-the-bones container rather than “boot” an OS
990
+ image, causing a bunch of commands to fail:
991
+
992
+ * **`machinectl poweroff`** will fail because the container
993
+ isn’t running dbus.
994
+ * **`machinectl start`** will try to find an `/sbin/init`
995
+ program in the rootfs, which we haven’t got. We could
996
+ rename `/jail/bin/fossil` to `/sbin/init` and then hack
997
+ the chroot scheme to match, but ick. (This, incidentally,
998
+ is why we set `ProcessTwo=yes` above even though Fossil is
999
+ perfectly capable of running as PID 1, a fact we depend on
1000
+ in the other methods above.)
1001
+ * **`machinectl shell`** will fail because there is no login
1002
+ daemon running, which we purposefully avoided adding by
1003
+ creating a “`FROM scratch`” container. (If you need a
1004
+ shell, say: `sudo systemd-nspawn --machine=myproject /bin/sh`)
1005
+ * **`machinectl status`** won’t give you the container logs
1006
+ because we disabled the shared journal, which was in turn
1007
+ necessary because we don’t run `systemd` *inside* the
1008
+ container, just outside.
1009
+
1010
+ If these are problems for you, you may wish to build a
1011
+ fatter container using `debootstrap` or similar. ([External
1012
+ tutorial][medtut].)
1013
+
1014
+3. We disable the “private networking” feature since the whole
1015
+ point of this container is to expose a network service to the
1016
+ public, one way or another. If you do things the way the defaults
1017
+ (and thus the official docs) expect, you must push through
1018
+ [a whole lot of complexity][ndcmp] to re-expose this single
1019
+ network port. That complexity is justified only if your service
1020
+ is itself complex, having both private and public service ports.
1021
+
1022
+[medtut]: https://medium.com/@huljar/setting-up-containers-with-systemd-nspawn-b719cff0fb8d
1023
+[ndcmp]: https://wiki.archlinux.org/title/systemd-networkd#Usage_with_containers
9501024
9511025
<div style="height:50em" id="this-space-intentionally-left-blank"></div>
9521026
--- www/containers.md
+++ www/containers.md
@@ -484,11 +484,11 @@
484 that’s still a big chunk of your storage budget. It takes 100:1 overhead
485 just to run a 4 MiB Fossil server container? Once again, I wouldn’t
486 blame you if you noped right on out of here, but if you will be patient,
487 you will find that there are ways to run Fossil inside a container even
488 on entry-level cloud VPSes. These are well-suited to running Fossil; you
489 don’t have to resort to [raw Fossil service](./server/) to succeed,
490 leaving the benefits of containerization to those with bigger budgets.
491
492 For the sake of simple examples in this section, we’ll assume you’re
493 integrating Fossil into a larger web site, such as with our [Debian +
494 nginx + TLS][DNT] plan. This is why all of the examples below create
@@ -521,10 +521,11 @@
521 this idea to the rest of your site.)
522
523 [DD]: https://www.docker.com/products/docker-desktop/
524 [DE]: https://docs.docker.com/engine/
525 [DNT]: ./server/debian/nginx.md
 
526
527
528 ### 6.1 <a id="nerdctl" name="containerd"></a>Stripping Docker Engine Down
529
530 The core of Docker Engine is its [`containerd`][ctrd] daemon and the
@@ -556,12 +557,12 @@
556 give up the image builder is [Podman]. Initially created by
557 Red Hat and thus popular on that family of OSes, it will run on
558 any flavor of Linux. It can even be made to run [on macOS via Homebrew][pmmac]
559 or [on Windows via WSL2][pmwin].
560
561 On Ubuntu 22.04, it’s about a quarter the size of Docker Engine, or half
562 that of the “full” distribution of `nerdctl` and all its dependencies.
563
564 Although Podman [bills itself][whatis] as a drop-in replacement for the
565 `docker` command and everything that sits behind it, some of the tool’s
566 design decisions affect how our Fossil containers run, as compared to
567 using Docker. The most important of these is that, by default, Podman
@@ -703,251 +704,322 @@
703 container images across the Internet, it can be a net win in terms of
704 build time.
705
706
707
708 ### 6.3 <a id="barebones"></a>Bare-Bones OCI Bundle Runners
709
710 If even the Podman stack is too big for you, you still have options for
711 running containers that are considerably slimmer, at a high cost to
712 administration complexity and loss of features.
713
714 Part of the OCI standard is the notion of a “bundle,” being a consistent
715 way to present a pre-built and configured container to the runtime.
716 Essentially, it consists of a directory containing a `config.json` file
717 and a `rootfs/` subdirectory containing the root filesystem image. Many
718 tools can produce these for you. We’ll show only one method in the first
719 section below, then reuse that in the following sections.
720
721
722 #### 6.3.1 <a id="runc"></a>`runc`
723
724 We mentioned `runc` [above](#nerdctl), but it’s possible to use it
725 standalone, without `containerd` or its CLI frontend `nerdctl`. You also
726 lose the build engine, intelligent image layer sharing, image registry
727 connections, and much more. The plus side is that `runc` alone is
728 18 MiB.
729
730 Using it without all the support tooling isn’t complicated, but it *is*
731 cryptic enough to want a shell script. Let’s say we want to build on our
732 big desktop machine but ship the resulting container to a small remote
733 host. This should serve:
734
-----
735
736 ```shell
737 #!/bin/bash -ex
738 c=fossil
739 b=/var/lib/machines/$c
740 h=my-host.example.com
741 m=/run/containerd/io.containerd.runtime.v2.task/moby
742 t=$(mktemp -d /tmp/$c-bundle.XXXXXX)
743
744 if [ -d "$t" ]
745 then
746 docker container start $c
747 docker container export $c > $t/rootfs.tar
748 id=$(docker inspect --format="{{.Id}}" $c)
749 sudo cat $m/$id/config.json \
750 | jq '.root.path = "'$b/rootfs'"'
751 | jq '.linux.cgroupsPath = ""'
752 | jq 'del(.linux.sysctl)'
753 | jq 'del(.linux.namespaces[] | select(.type == "network"))'
754 | jq 'del(.mounts[] | select(.destination == "/etc/hostname"))'
755 | jq 'del(.mounts[] | select(.destination == "/etc/resolv.conf"))'
756 | jq 'del(.mounts[] | select(.destination == "/etc/hosts"))'
757 | jq 'del(.hooks)' > $t/config.json
758 scp -r $t $h:tmp
759 ssh -t $h "{
760 mv ./$t/config.json $b &&
761 sudo tar -C $b/rootfs -xf ./$t/rootfs.tar &&
762 rm -r ./$t
763 }"
764 rm -r $t
765 fi
766 ```
767
-----
768
769 The first several lines list configurables:
770
771 * **`c`**: the name of the Docker container you’re bundling up for use
772 with `runc`
773 * **`b`**: the path of the exported container, called the “bundle” in
774 OCI jargon; we’re using the [`nspawn`](#nspawn) convention, a
775 reasonable choice under the [Linux FHS rules][LFHS]
776 * **`h`**: the remote host name
777 * **`m`**: the local directory holding the running machines, configurable
778 because:
779 * the path name is longer than we want to use inline
780 * it’s been known to change from one version of Docker to the next
781 * you might be building and testing with [Podman](#podman), so it
782 has to be “`/run/user/$UID/crun`” instead
783 * **`t`**: the temporary bundle directory we populate locally, then
784 `scp` to the remote machine, where it’s unpacked
785
786 [LFHS]: https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard
787
788
789 ##### Why All That `sudo` Stuff?
790
791 This script uses `sudo` for two different purposes:
792
793 1. To read the local `config.json` file out of the `containerd` managed
794 directory, which is owned by `root` on Docker systems. Additionally,
795 that input file is only available while the container is started, so
796 we must ensure that before extracting it.
797
798 2. To unpack the bundle onto the remote machine. If you try to get
799 clever and unpack it locally, then `rsync` it to the remote host to
800 avoid re-copying files that haven’t changed since the last update,
801 you’ll find that it fails when it tries to copy device nodes, to
802 create files owned only by the remote root user, and so forth. If the
803 container bundle is small, it’s simpler to re-copy and unpack it
804 fresh each time.
805
806 I point all this out because the script may ask for your password twice: once
807 for the local `sudo` command, and once for the remote one.
808
809
810
811 ##### Why All That `jq` Stuff?
812
813 We’re using [jq] for two separate purposes:
814
815 1. To automatically transmogrify Docker’s container configuration so it
816 will work with `runc`:
817
818 * point it where we unpacked the container’s exported rootfs
819 * accede to its wish to [manage cgroups by itself][ecg]
820 * remove the `sysctl` calls that will break after…
821 * …we remove the network namespace to allow Fossil’s TCP listening
822 port to be available on the host; `runc` doesn’t offer the
823 equivalent of `docker create --publish`, and we can’t be
824 bothered to set up a manual mapping from the host port into the
825 container
826 * remove file bindings that point into the local runtime managed
827 directories; one of the things we give up by using a bare
828 container runner is automatic management of these files
829 * remove the hooks for essentially the same reason
830
831 2. To make the Docker-managed machine-readable `config.json` more
832 human-readable, in case there are other things you want changed in
833 this version of the container. Exposing the `config.json` file like
834 this means you don’t have to rebuild the container merely to change
835 a value like a mount point, the kernel capability set, and so forth.
836
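To make that concrete, here is a sketch of the kind of post-export edit this
enables. The file contents and mount paths below are illustrative stand-ins,
not taken from a real bundle, and it assumes you have [jq] installed:

```shell
#!/bin/sh -e
# Work in a scratch directory with a minimal stand-in for an
# exported config.json. (Illustrative contents, not a real bundle.)
cd "$(mktemp -d)"
cat > config.json <<'EOF'
{"mounts": [{"destination": "/jail/museum", "type": "bind",
             "source": "/home/fossil/museum", "options": ["rbind", "rw"]}]}
EOF

# Add a read-only bind mount without rebuilding the container image.
jq '.mounts += [{
      "destination": "/etc/localtime",
      "type": "bind",
      "source": "/etc/localtime",
      "options": ["rbind", "ro"]
    }]' config.json > config.json.new && mv config.json.new config.json

jq -r '.mounts | length' config.json   # → 2
```

The same pattern covers capability tweaks, mount removals, and similar
adjustments: edit the JSON, not the image.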
837
838 ##### Running the Bundle
839
840 With the container exported to a bundle like this, you can start it as:
841
842 ```
843 $ cd /path/to/bundle
844 $ c=fossil-runc ← …or anything else you prefer
845 $ sudo runc create $c
846 $ sudo runc start $c
847 $ sudo runc exec -t $c sh -l
848 ~ $ ls museum
849 repo.fossil
850 ~ $ ps -eaf
851 PID USER TIME COMMAND
852 1 fossil 0:00 bin/fossil server --create …
853 ~ $ exit
854 $ sudo runc kill $c
855 $ sudo runc delete $c
856 ```
857
858 If you’re doing this on the export host, the first command is “`cd $b`”
859 if we’re using the variables from the shell script above. Alternately,
860 the `runc` subcommands that need to read the bundle files take a
861 `--bundle/-b` flag to let you avoid switching directories.
862
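For example, using the paths and names from the bundling script above, the
create step could be run from anywhere as (a sketch, not a tested transcript):

```
$ sudo runc create --bundle /var/lib/machines/fossil fossil-runc
$ sudo runc start fossil-runc
```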
863 The rest should be straightforward: create and start the container as
864 root so the `chroot(2)` call inside the container will succeed, then get
865 into it with a login shell and poke around to prove to ourselves that
866 everything is working properly. It is. Yay!
867
868 The remaining commands show shutting the container down and destroying
869 it, simply to show how these commands change relative to using the
870 Docker Engine commands. It’s “kill,” not “stop,” and it’s “delete,” not
871 “rm.”
872
873 [ecg]: https://github.com/opencontainers/runc/pull/3131
874 [jq]: https://stedolan.github.io/jq/
875
876
877 ##### Lack of Layer Sharing
878
879 The bundle export process collapses Docker’s union filesystem down to a
880 single layer. Atop that, it makes all files mutable.
881
882 All of this is fine for tiny remote hosts with a single container, or at
883 least one where none of the containers share base layers. Where it
884 becomes a problem is when you have multiple Fossil containers on a
885 single host, since they all derive from the same base image.
886
887 The full-featured container runtimes above will intelligently share
888 these immutable base layers among the containers, storing only the
889 differences in each individual container. More, when pulling images from
890 a registry host, they’ll transfer only the layers you don’t have copies
891 of locally, so you don’t have to burn bandwidth sending copies of Alpine
892 and BusyBox each time, even though they’re unlikely to change from one
893 build to the next.
894
895
896 #### 6.3.2 <a id="crun"></a>`crun`
897
898 In the same way that [Docker Engine is based on `runc`](#runc), Podman’s
899 engine is based on [`crun`][crun], a lighter-weight alternative to
900 `runc`. It’s only 1.4 MiB on the system I tested it on, yet it will run
901 the same container bundles as in my `runc` examples above. We saved
902 more than that by compressing the container’s Fossil executable with
903 UPX, making the runtime virtually free in this case. The only question
904 is whether you can put up with its limitations, which are the same as
905 for `runc`.
906
907 [crun]: https://github.com/containers/crun
908
909
910 #### 6.3.3 <a id="nspawn"></a>`systemd-nspawn`
911
912 As of `systemd` version 242, its optional `nspawn` piece
913 [reportedly](https://www.phoronix.com/news/Systemd-Nspawn-OCI-Runtime)
914 got the ability to run OCI bundles directly. You might
915 have it installed already, but if not, it’s only about 2 MiB. It’s
916 in the `systemd-container` package as of Ubuntu 22.04 LTS:
917
918 ```
919 $ sudo apt install systemd-container
920 ```
921
922 It’s also in CentOS Stream 9, under the same name.
923
924 You create the bundles the same way as with [the `runc` method
925 above](#runc). The only thing that changes are the top-level management
926 commands:
927
928 ```
929 $ sudo systemd-nspawn \
930 --oci-bundle=/var/lib/machines/fossil \
931 --machine=fossil \
932 --network-veth \
933 --port=127.0.0.1:127.0.0.1:9999:8080
934 $ sudo machinectl list
935 No machines.
936 ```
937
938 This is why I wrote “reportedly” above: I couldn’t get it to work on two different
939 Linux distributions, and I can’t see why. I’m leaving this here to give
940 someone else a leg up, with the hope that they will work out what’s
941 needed to get the container running and registered with `machinectl`.
942
943 As of this writing, the tool expects an OCI container version of
944 “1.0.0”. I had to edit this at the top of my `config.json` file to get
945 the first command to read the bundle. The fact that it errored out when
946 I had “`1.0.2-dev`” in there proves it’s reading the file, but it
947 doesn’t seem able to make sense of what it finds there, and it doesn’t
948 give any diagnostics to say why.
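The version edit itself is a one-liner if you have [jq] on hand. This sketch
uses a stand-in `config.json` rather than a real Docker-exported bundle, which
would be far larger:

```shell
#!/bin/sh -e
cd "$(mktemp -d)"
# Stand-in for a Docker-exported bundle config. (Illustrative only.)
echo '{"ociVersion": "1.0.2-dev", "process": {}}' > config.json

# Pin the version systemd-nspawn expects, leaving everything else alone.
jq '.ociVersion = "1.0.0"' config.json > config.json.new &&
    mv config.json.new config.json

jq -r .ociVersion config.json   # → 1.0.0
```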
949
950
951 <div style="height:50em" id="this-space-intentionally-left-blank"></div>
952
--- www/containers.md
+++ www/containers.md
@@ -484,11 +484,11 @@
484 that’s still a big chunk of your storage budget. It takes 100:1 overhead
485 just to run a 4 MiB Fossil server container? Once again, I wouldn’t
486 blame you if you noped right on out of here, but if you will be patient,
487 you will find that there are ways to run Fossil inside a container even
488 on entry-level cloud VPSes. These are well-suited to running Fossil; you
489 don’t have to resort to [raw Fossil service][srv] to succeed,
490 leaving the benefits of containerization to those with bigger budgets.
491
492 For the sake of simple examples in this section, we’ll assume you’re
493 integrating Fossil into a larger web site, such as with our [Debian +
494 nginx + TLS][DNT] plan. This is why all of the examples below create
@@ -521,10 +521,11 @@
521 this idea to the rest of your site.)
522
523 [DD]: https://www.docker.com/products/docker-desktop/
524 [DE]: https://docs.docker.com/engine/
525 [DNT]: ./server/debian/nginx.md
526 [srv]: ./server/
527
528
529 ### 6.1 <a id="nerdctl" name="containerd"></a>Stripping Docker Engine Down
530
531 The core of Docker Engine is its [`containerd`][ctrd] daemon and the
@@ -556,12 +557,12 @@
557 give up the image builder is [Podman]. Initially created by
558 Red Hat and thus popular on that family of OSes, it will run on
559 any flavor of Linux. It can even be made to run [on macOS via Homebrew][pmmac]
560 or [on Windows via WSL2][pmwin].
561
562 On Ubuntu 22.04, the installation size is about 38&nbsp;MiB, roughly a
563 tenth the size of Docker Engine.
564
565 Although Podman [bills itself][whatis] as a drop-in replacement for the
566 `docker` command and everything that sits behind it, some of the tool’s
567 design decisions affect how our Fossil containers run, as compared to
568 using Docker. The most important of these is that, by default, Podman
@@ -703,251 +704,322 @@
704 container images across the Internet, it can be a net win in terms of
705 build time.
706
707
708
709 ### 6.3 <a id="nspawn"></a>`systemd-container`
710
711 If even the Podman stack is too big for you, the next-best option I’m
712 aware of is the `systemd-container` infrastructure on modern Linuxes,
713 available since version 239 or so. Its runtime tooling requires only
714 about 1.4 MiB of disk space:
715
716 ```
717 $ sudo apt install systemd-container btrfs-progs
718 ```
719
720 That command assumes the primary test environment for
721 this guide, Ubuntu 22.04 LTS with `systemd` 249. For best
722 results, `/var/lib/machines` should be a btrfs volume, because
723 [`$REASONS`][mcfad]. (For CentOS Stream 9 and other Red Hattish
724 systems, you will have to make several adjustments, which we’ve
725 collected [below](#nspawn-rhel) to keep these examples clear.)
726
727 The first configuration step is to convert the Docker container into
728 a “machine”, as systemd calls it. The easiest method is:
729
730 ```
731 $ make container-run
732 $ docker container export fossil-e119d5983620 |
733 machinectl import-tar - myproject
734 ```
735
736 Copy the container name from the first step to the second. Yours will
737 almost certainly be named after a different Fossil commit ID.
738
739 It’s important that the name of the machine you create &mdash;
740 “`myproject`” in this example &mdash; matches the base name
741 of the nspawn configuration file you create as the next step.
742 Therefore, to extend the example, the following file needs to be
743 called `/etc/systemd/nspawn/myproject.nspawn`, and it will contain
744 something like:
745
746 ----
747
748 ```
749 [Exec]
750 WorkingDirectory=/jail
751 Parameters=bin/fossil server \
752 --baseurl https://example.com/myproject \
753 --chroot /jail \
754 --create \
755 --jsmode bundled \
756 --localhost \
757 --port 9000 \
758 --scgi \
759 --user admin \
760 museum/repo.fossil
761 DropCapability= \
762 CAP_AUDIT_WRITE \
763 CAP_CHOWN \
764 CAP_FSETID \
765 CAP_KILL \
766 CAP_MKNOD \
767 CAP_NET_BIND_SERVICE \
768 CAP_NET_RAW \
769 CAP_SETFCAP \
770 CAP_SETPCAP
771 ProcessTwo=yes
772 LinkJournal=no
773 Timezone=no
774
775 [Files]
776 Bind=/home/fossil/museum/myproject:/jail/museum
777
778 [Network]
779 VirtualEthernet=no
780 ```
781
782 ----
783
784 If you recognize most of that from the `Dockerfile` discussion above,
785 congratulations, you’ve been paying attention. The rest should also
786 be clear from context.
787
788 Some of this is expected to vary. For one, the command given in the
789 `Parameters` directive assumes [SCGI proxying via nginx][DNT]. For
790 other use cases, see our collection of [Fossil server configuration
791 guides][srv], then adjust the command to your local needs.
792 For another, you will likely have to adjust the `Bind` value to
793 point at the directory containing the `repo.fossil` file referenced
794 in the command.
795
796 We also need a generic systemd unit file called
797 `/etc/systemd/system/[email protected]`, containing:
798
799 ----
800
801 ```
802 [Unit]
803 Description=Fossil %i Repo Service
804 Wants=modprobe@tun.service modprobe@loop.service
805 After=network.target systemd-resolved.service modprobe@tun.service modprobe@loop.service
806
807 [Service]
808 ExecStart=systemd-nspawn --settings=override --read-only --machine=%i bin/fossil
809
810 [Install]
811 WantedBy=multi-user.target
812 ```
813
814 ----
815
816 You shouldn’t have to change any of this because we’ve given the
817 `--settings=override` flag, meaning any setting in the nspawn file
818 overrides the setting passed to `systemd-nspawn`. This arrangement
819 not only keeps the unit file simple, it allows multiple services to
820 share the base configuration, varying on a per-repo level.
821
822 Start the service in the normal way:
823
824 ```
825 $ sudo systemctl enable fossil@myproject
826 $ sudo systemctl start fossil@myproject
827 ```
828
829 You should find it running on localhost port 9000 per the nspawn
830 configuration file above, suitable for proxying Fossil out to the
831 public using nginx, via SCGI. If you aren’t using a front-end proxy
832 and want Fossil exposed to the world, you might say this instead in
833 the `nspawn` file:
834
835 ```
836 Parameters=bin/fossil server \
837 --cert /path/to/my/fullchain.pem \
838 --chroot /jail \
839 --create \
840 --jsmode bundled \
841 --port 443 \
842 --user admin \
843 museum/repo.fossil
844 ```
845
846 You would also need to un-drop the `CAP_NET_BIND_SERVICE` capability
847 to allow Fossil to bind to this low-numbered port.
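Concretely, that means the `DropCapability` list in the nspawn file loses one
line:

```
DropCapability= \
    CAP_AUDIT_WRITE \
    CAP_CHOWN \
    CAP_FSETID \
    CAP_KILL \
    CAP_MKNOD \
    CAP_NET_RAW \
    CAP_SETFCAP \
    CAP_SETPCAP
```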
848
849 We use systemd’s template file feature to allow multiple Fossil
850 servers running on a single machine, each on a different TCP port,
851 as when proxying them out as subdirectories of a larger site.
852 To add another project, you must first clone the base “machine” layer:
853
854 ```
855 $ sudo machinectl clone myproject otherthing
856 ```
857
858 That will not only create a clone of `/var/lib/machines/myproject`
859 as `../otherthing`, it will create a matching `nspawn` file for you
860 as a copy of the first one. Adjust its contents to suit, then enable
861 and start it as above.
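That is, after editing `/etc/systemd/nspawn/otherthing.nspawn` to give the new
repo its own port and `Bind` path, the enable-and-start step is a repeat of the
earlier commands under the new name:

```
$ sudo systemctl enable fossil@otherthing
$ sudo systemctl start fossil@otherthing
```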
862
863 [mcfad]: https://www.freedesktop.org/software/systemd/man/machinectl.html#Files%20and%20Directories
864
865
866 #### 6.3.1 <a id="nspawn-rhel"></a>Getting It Working on a RHEL Clone
867
868 The biggest difference between doing this on OSes like CentOS versus
869 Ubuntu is that RHEL (thus also its clones) doesn’t ship btrfs in
870 its kernel, thus has no option for installing `mkfs.btrfs`, which
871 [`machinectl`][mctl] needs for various purposes.
872
873 Fortunately, there are workarounds.
874
875 First, the `apt install` command above becomes:
876
877 ```
878 $ sudo dnf install systemd-container
879 ```
880
881 Second, you have to hack around the lack of `machinectl import-tar` so:
882
883 ```
884 $ rootfs=/var/lib/machines/fossil
885 $ sudo mkdir -p $rootfs
886 $ docker container export fossil | sudo tar -C $rootfs -xf -
887 ```
888
889 The parent directory path in the `rootfs` variable is important,
890 because although we aren’t using `machinectl`, the `systemd-nspawn`
891 developers assume you’re using them together. Thus, when you give
892 `--machine`, it assumes the `machinectl` directory scheme. You could
893 instead use `--directory`, allowing you to store the rootfs wherever
894 you like, but why make things difficult? It’s a perfectly sensible
895 default, consistent with the [LHS] rules.
896
897 The final element &mdash; the machine name &mdash; can be anything
898 you like so long as it matches the nspawn file’s base name.
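For reference, a hand-run `--directory` invocation would look something like
this sketch, with a hypothetical rootfs path standing in for wherever you chose
to unpack the container:

```
$ sudo systemd-nspawn --read-only \
      --directory=/srv/fossil/rootfs \
      bin/fossil
```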
899
900 Finally, since you can’t use `machinectl clone`, you have to make
901 a wasteful copy of `/var/lib/machines/myproject` when standing up
902 multiple Fossil repo services on a single machine. (This is one
903 of the reasons `machinectl` depends on `btrfs`: cheap copy-on-write
904 subvolumes.) Because we give the `--read-only` flag, you can simply
905 `cp -r` one machine to a new name rather than go through the
906 export-and-import dance you used to create the first one.
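In other words, standing up a second service on a RHEL clone reduces to plain
copies, using the names from the examples above:

```
$ sudo cp -r /var/lib/machines/myproject /var/lib/machines/otherthing
$ sudo cp /etc/systemd/nspawn/myproject.nspawn \
          /etc/systemd/nspawn/otherthing.nspawn
```

Edit the new nspawn file to taste, then enable and start the service as before.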
907
908 [LHS]: https://refspecs.linuxfoundation.org/FHS_3.0/fhs/index.html
909 [mctl]: https://www.freedesktop.org/software/systemd/man/machinectl.html
910
911
912 #### 6.3.2 <a id="nspawn-weaknesses"></a>What Am I Missing Out On?
913
914 For all the runtime size savings in this method, you may be wondering
915 what you’re missing out on relative to Podman, which takes up
916 roughly 27× more disk space. Short answer: lots. Long answer:
917
918 1. **Build system.** You’ll have to build and test your containers
919 some other way. This method is only suitable for running them
920 once they’re built.
921
922 2. **Orchestration.** All of the higher-level things like
923 “compose” files, Docker Swarm mode, and Kubernetes are
924 unavailable to you at this level. You can run multiple
925 instances of Fossil, but on a single machine only and with a
926 static configuration.
927
928 3. **Image layer sharing.** When you update an image using one of the
929 above methods, Docker and Podman are smart enough to copy only
930 changed layers. Furthermore, when you base multiple containers
931 on a single image, they don’t make copies of the base layers;
932 they can share them, because base layers are immutable, thus
933 cannot cross-contaminate.
934
935 Because we use `systemd-nspawn --read-only`, we get *some*
936 of this benefit, particularly when using `machinectl` with
937 `/var/lib/machines` as a btrfs volume. Even so, the disk space
938 and network I/O optimizations go deeper in the Docker and Podman
939 worlds.
940
941 4. **Tooling.** Hand-creating and modifying those systemd
942 files sucks compared to “`podman container create ...`” This
943 is but one of many affordances you will find in the runtimes
944 aimed at daily-use devops warriors.
945
946 5. **Network virtualization.** In the scheme above, we turn off the
947 `systemd` virtual networking support because in its default mode,
948 it wants to hide the service entirely.
949
950 Another way to put this is that `systemd-nspawn --port` does
951 approximately *nothing* of what `docker create --publish` does
952 despite their superficial similarities.
953
954 For this container, it doesn’t much matter, since it exposes
955 only a single port, and we do want that one port exposed, one way
956 or another. Beyond that, we get all the control we need using
957 Fossil options like `--localhost`. I point this out because in
958 more complex situations, the automatic network setup features of
959 the more featureful runtimes can save a lot of time and hassle.
960 They aren’t doing anything you couldn’t do by hand, but why
961 would you want to, given the choice?
962
963 I expect there’s a lot more I neglected to think of when creating
964 this list, but I think it suffices to make my case as it is. If you
965 can afford the space of Podman or Docker, I strongly recommend using
966 either of them over the much lower-level `systemd-container`
967 infrastructure.
968
969 (Incidentally, these are essentially the same reasons why we no longer
970 talk about the `crun` tool underpinning Podman in this document. It’s
971 even more limited, making it even more difficult to administer while
972 providing no runtime size advantage. The `runc` tool underpinning
973 Docker is even worse on this score, being scarcely easier to use than
974 `crun` while having a much larger footprint.)
975
976
977 #### 6.3.3 <a id="nspawn-assumptions"></a>Violated Assumptions
978
979 The `systemd-container` infrastructure has a bunch of hard-coded
980 assumptions baked into it. We papered over these problems above,
981 but if you’re using these tools for other purposes on the machine
982 you’re serving Fossil from, you may need to know which assumptions
983 our container violates and the resulting consequences:
984
985 1. `systemd-nspawn` works best with `machinectl`, but if you haven’t
986 got `btrfs` available, you run into [trouble](#nspawn-rhel).
987
988 2. Our stock container starts a single static executable inside
989 a stripped-to-the-bones container rather than “boot” an OS
990 image, causing a bunch of commands to fail:
991
992 * **`machinectl poweroff`** will fail because the container
993 isn’t running dbus.
994 * **`machinectl start`** will try to find an `/sbin/init`
995 program in the rootfs, which we haven’t got. We could
996 rename `/jail/bin/fossil` to `/sbin/init` and then hack
997 the chroot scheme to match, but ick. (This, incidentally,
998 is why we set `ProcessTwo=yes` above even though Fossil is
999 perfectly capable of running as PID 1, a fact we depend on
1000 in the other methods above.)
1001 * **`machinectl shell`** will fail because there is no login
1002 daemon running, which we purposefully avoided adding by
1003 creating a “`FROM scratch`” container. (If you need a
1004 shell, say: `sudo systemd-nspawn --machine=myproject /bin/sh`)
1005 * **`machinectl status`** won’t give you the container logs
1006 because we disabled the shared journal, which was in turn
1007 necessary because we don’t run `systemd` *inside* the
1008 container, just outside.
1009
1010 If these are problems for you, you may wish to build a
1011 fatter container using `debootstrap` or similar. ([External
1012 tutorial][medtut].)
1013
1014 3. We disable the “private networking” feature since the whole
1015 point of this container is to expose a network service to the
1016 public, one way or another. If you do things the way the defaults
1017 (and thus the official docs) expect, you must push through
1018 [a whole lot of complexity][ndcmp] to re-expose this single
1019 network port. That complexity is justified only if your service
1020 is itself complex, having both private and public service ports.
1021
1022 [medtut]: https://medium.com/@huljar/setting-up-containers-with-systemd-nspawn-b719cff0fb8d
1023 [ndcmp]: https://wiki.archlinux.org/title/systemd-networkd#Usage_with_containers
1024
1025 <div style="height:50em" id="this-space-intentionally-left-blank"></div>
1026