Fossil SCM
Worked out how to get systemd-container (a.k.a. nspawn + machinectl) working with the stock Fossil container. Following the above commits, it's pure documentation. Removed the runc and crun docs at the same time, since this method is as small as crun while being more functional; there's zero reason to push through all the additional complexity of those even lower-level tools now that it's debugged and documented.
Commit
930a655a14e9b040fcb4156bdda544ea9f6b684e2756ed15eb2ffb2bd4f6306a
Parent
0733be502bdab6a…
1 file changed
+319
-245
| --- www/containers.md | ||
| +++ www/containers.md | ||
| @@ -484,11 +484,11 @@ | ||
| 484 | 484 | that’s still a big chunk of your storage budget. It takes 100:1 overhead |
| 485 | 485 | just to run a 4 MiB Fossil server container? Once again, I wouldn’t |
| 486 | 486 | blame you if you noped right on out of here, but if you will be patient, |
| 487 | 487 | you will find that there are ways to run Fossil inside a container even |
| 488 | 488 | on entry-level cloud VPSes. These are well-suited to running Fossil; you |
| 489 | -don’t have to resort to [raw Fossil service](./server/) to succeed, | |
| 489 | +don’t have to resort to [raw Fossil service][srv] to succeed, | |
| 490 | 490 | leaving the benefits of containerization to those with bigger budgets. |
| 491 | 491 | |
| 492 | 492 | For the sake of simple examples in this section, we’ll assume you’re |
| 493 | 493 | integrating Fossil into a larger web site, such as with our [Debian + |
| 494 | 494 | nginx + TLS][DNT] plan. This is why all of the examples below create |
| @@ -521,10 +521,11 @@ | ||
| 521 | 521 | this idea to the rest of your site.) |
| 522 | 522 | |
| 523 | 523 | [DD]: https://www.docker.com/products/docker-desktop/ |
| 524 | 524 | [DE]: https://docs.docker.com/engine/ |
| 525 | 525 | [DNT]: ./server/debian/nginx.md |
| 526 | +[srv]: ./server/ | |
| 526 | 527 | |
| 527 | 528 | |
| 528 | 529 | ### 6.1 <a id="nerdctl" name="containerd"></a>Stripping Docker Engine Down |
| 529 | 530 | |
| 530 | 531 | The core of Docker Engine is its [`containerd`][ctrd] daemon and the |
| @@ -556,12 +557,12 @@ | ||
| 556 | 557 | give up the image builder is [Podman]. Initially created by |
| 557 | 558 | Red Hat and thus popular on that family of OSes, it will run on |
| 558 | 559 | any flavor of Linux. It can even be made to run [on macOS via Homebrew][pmmac] |
| 559 | 560 | or [on Windows via WSL2][pmwin]. |
| 560 | 561 | |
| 561 | -On Ubuntu 22.04, it’s about a quarter the size of Docker Engine, or half | |
| 562 | -that of the “full” distribution of `nerdctl` and all its dependencies. | |
| 562 | +On Ubuntu 22.04, the installation size is about 38 MiB, roughly a | |
| 563 | +tenth the size of Docker Engine. | |
| 563 | 564 | |
| 564 | 565 | Although Podman [bills itself][whatis] as a drop-in replacement for the |
| 565 | 566 | `docker` command and everything that sits behind it, some of the tool’s |
| 566 | 567 | design decisions affect how our Fossil containers run, as compared to |
| 567 | 568 | using Docker. The most important of these is that, by default, Podman |
| @@ -703,251 +704,322 @@ | ||
| 703 | 704 | container images across the Internet, it can be a net win in terms of |
| 704 | 705 | build time. |
| 705 | 706 | |
| 706 | 707 | |
| 707 | 708 | |
| 708 | -### 6.3 <a id="barebones"></a>Bare-Bones OCI Bundle Runners | |
| 709 | - | |
| 710 | -If even the Podman stack is too big for you, you still have options for | |
| 711 | -running containers that are considerably slimmer, at a high cost to | |
| 712 | -administration complexity and loss of features. | |
| 713 | - | |
| 714 | -Part of the OCI standard is the notion of a “bundle,” being a consistent | |
| 715 | -way to present a pre-built and configured container to the runtime. | |
| 716 | -Essentially, it consists of a directory containing a `config.json` file | |
| 717 | -and a `rootfs/` subdirectory containing the root filesystem image. Many | |
| 718 | -tools can produce these for you. We’ll show only one method in the first | |
| 719 | -section below, then reuse that in the following sections. | |
| 720 | - | |
| 721 | - | |
| 722 | -#### 6.3.1 <a id="runc"></a>`runc` | |
| 723 | - | |
| 724 | -We mentioned `runc` [above](#nerdctl), but it’s possible to use it | |
| 725 | -standalone, without `containerd` or its CLI frontend `nerdctl`. You also | |
| 726 | -lose the build engine, intelligent image layer sharing, image registry | |
| 727 | -connections, and much more. The plus side is that `runc` alone is | |
| 728 | -18 MiB. | |
| 729 | - | |
| 730 | -Using it without all the support tooling isn’t complicated, but it *is* | |
| 731 | -cryptic enough to want a shell script. Let’s say we want to build on our | |
| 732 | -big desktop machine but ship the resulting container to a small remote | |
| 733 | -host. This should serve: | |
| 734 | - | |
| ----- | ||
| 735 | - | |
| 736 | -```shell | |
| 737 | -#!/bin/bash -ex | |
| 738 | -c=fossil | |
| 739 | -b=/var/lib/machines/$c | |
| 740 | -h=my-host.example.com | |
| 741 | -m=/run/containerd/io.containerd.runtime.v2.task/moby | |
| 742 | -t=$(mktemp -d /tmp/$c-bundle.XXXXXX) | |
| 743 | - | |
| 744 | -if [ -d "$t" ] | |
| 745 | -then | |
| 746 | - docker container start $c | |
| 747 | - docker container export $c > $t/rootfs.tar | |
| 748 | - id=$(docker inspect --format="{{.Id}}" $c) | |
| 749 | - sudo cat $m/$id/config.json \ | |
| 750 | - | jq '.root.path = "'$b/rootfs'"' | |
| 751 | - | jq '.linux.cgroupsPath = ""' | |
| 752 | - | jq 'del(.linux.sysctl)' | |
| 753 | - | jq 'del(.linux.namespaces[] | select(.type == "network"))' | |
| 754 | - | jq 'del(.mounts[] | select(.destination == "/etc/hostname"))' | |
| 755 | - | jq 'del(.mounts[] | select(.destination == "/etc/resolv.conf"))' | |
| 756 | - | jq 'del(.mounts[] | select(.destination == "/etc/hosts"))' | |
| 757 | - | jq 'del(.hooks)' > $t/config.json | |
| 758 | - scp -r $t $h:tmp | |
| 759 | - ssh -t $h "{ | |
| 760 | - mv ./$t/config.json $b && | |
| 761 | - sudo tar -C $b/rootfs -xf ./$t/rootfs.tar && | |
| 762 | - rm -r ./$t | |
| 763 | - }" | |
| 764 | - rm -r $t | |
| 765 | -fi | |
| 766 | -``` | |
| 767 | - | |
| ----- | ||
| 768 | - | |
| 769 | -The first several lines list configurables: | |
| 770 | - | |
| 771 | -* **`c`**: the name of the Docker container you’re bundling up for use | |
| 772 | - with `runc` | |
| 773 | -* **`b`**: the path of the exported container, called the “bundle” in | |
| 774 | - OCI jargon; we’re using the [`nspawn`](#nspawn) convention, a | |
| 775 | - reasonable choice under the [Linux FHS rules][LFHS] | |
| 776 | -* **`h`**: the remote host name | |
| 777 | -* **`m`**: the local directory holding the running machines, configurable | |
| 778 | - because: | |
| 779 | - * the path name is longer than we want to use inline | |
| 780 | - * it’s been known to change from one version of Docker to the next | |
| 781 | - * you might be building and testing with [Podman](#podman), so it | |
| 782 | - has to be “`/run/user/$UID/crun`” instead | |
| 783 | -* **`t`**: the temporary bundle directory we populate locally, then | |
| 784 | - `scp` to the remote machine, where it’s unpacked | |
| 785 | - | |
| 786 | -[LFHS]: https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard | |
| 787 | - | |
| 788 | - | |
| 789 | -##### Why All That `sudo` Stuff? | |
| 790 | - | |
| 791 | -This script uses `sudo` for two different purposes: | |
| 792 | - | |
| 793 | -1. To read the local `config.json` file out of the `containerd` managed | |
| 794 | - directory, which is owned by `root` on Docker systems. Additionally, | |
| 795 | - that input file is only available while the container is started, so | |
| 796 | - we must ensure that before extracting it. | |
| 797 | - | |
| 798 | -2. To unpack the bundle onto the remote machine. If you try to get | |
| 799 | - clever and unpack it locally, then `rsync` it to the remote host to | |
| 800 | - avoid re-copying files that haven’t changed since the last update, | |
| 801 | - you’ll find that it fails when it tries to copy device nodes, to | |
| 802 | - create files owned only by the remote root user, and so forth. If the | |
| 803 | - container bundle is small, it’s simpler to re-copy and unpack it | |
| 804 | - fresh each time. | |
| 805 | - | |
| 806 | -I point all this out because it might ask for your password twice: once for | |
| 807 | -the local sudo command, and once for the remote. | |
| 808 | - | |
| 809 | - | |
| 810 | - | |
| 811 | -##### Why All That `jq` Stuff? | |
| 812 | - | |
| 813 | -We’re using [jq] for two separate purposes: | |
| 814 | - | |
| 815 | -1. To automatically transmogrify Docker’s container configuration so it | |
| 816 | - will work with `runc`: | |
| 817 | - | |
| 818 | - * point it where we unpacked the container’s exported rootfs | |
| 819 | - * accede to its wish to [manage cgroups by itself][ecg] | |
| 820 | - * remove the `sysctl` calls that will break after… | |
| 821 | - * …we remove the network namespace to allow Fossil’s TCP listening | |
| 822 | - port to be available on the host; `runc` doesn’t offer the | |
| 823 | - equivalent of `docker create --publish`, and we can’t be | |
| 824 | - bothered to set up a manual mapping from the host port into the | |
| 825 | - container | |
| 826 | - * remove file bindings that point into the local runtime managed | |
| 827 | - directories; one of the things we give up by using a bare | |
| 828 | - container runner is automatic management of these files | |
| 829 | - * remove the hooks for essentially the same reason | |
| 830 | - | |
| 831 | -2. To make the Docker-managed machine-readable `config.json` more | |
| 832 | - human-readable, in case there are other things you want changed in | |
| 833 | - this version of the container. Exposing the `config.json` file like | |
| 834 | - this means you don’t have to rebuild the container merely to change | |
| 835 | - a value like a mount point, the kernel capability set, and so forth. | |
| 836 | - | |
| 837 | - | |
| 838 | -##### Running the Bundle | |
| 839 | - | |
| 840 | -With the container exported to a bundle like this, you can start it as: | |
| 841 | - | |
| 842 | -``` | |
| 843 | - $ cd /path/to/bundle | |
| 844 | - $ c=fossil-runc ← …or anything else you prefer | |
| 845 | - $ sudo runc create $c | |
| 846 | - $ sudo runc start $c | |
| 847 | - $ sudo runc exec $c -t sh -l | |
| 848 | - ~ $ ls museum | |
| 849 | - repo.fossil | |
| 850 | - ~ $ ps -eaf | |
| 851 | - PID USER TIME COMMAND | |
| 852 | - 1 fossil 0:00 bin/fossil server --create … | |
| 853 | - ~ $ exit | |
| 854 | - $ sudo runc kill $c | |
| 855 | - $ sudo runc delete $c | |
| 856 | -``` | |
| 857 | - | |
| 858 | -If you’re doing this on the export host, the first command is “`cd $b`” | |
| 859 | -if we’re using the variables from the shell script above. Alternately, | |
| 860 | -the `runc` subcommands that need to read the bundle files take a | |
| 861 | -`--bundle/-b` flag to let you avoid switching directories. | |
| 862 | - | |
| 863 | -The rest should be straightforward: create and start the container as | |
| 864 | -root so the `chroot(2)` call inside the container will succeed, then get | |
| 865 | -into it with a login shell and poke around to prove to ourselves that | |
| 866 | -everything is working properly. It is. Yay! | |
| 867 | - | |
| 868 | -The remaining commands show shutting the container down and destroying | |
| 869 | -it, simply to show how these commands change relative to using the | |
| 870 | -Docker Engine commands. It’s “kill,” not “stop,” and it’s “delete,” not | |
| 871 | -“rm.” | |
| 872 | - | |
| 873 | -[ecg]: https://github.com/opencontainers/runc/pull/3131 | |
| 874 | -[jq]: https://stedolan.github.io/jq/ | |
| 875 | - | |
| 876 | - | |
| 877 | -##### Lack of Layer Sharing | |
| 878 | - | |
| 879 | -The bundle export process collapses Docker’s union filesystem down to a | |
| 880 | -single layer. Atop that, it makes all files mutable. | |
| 881 | - | |
| 882 | -All of this is fine for tiny remote hosts with a single container, or at | |
| 883 | -least one where none of the containers share base layers. Where it | |
| 884 | -becomes a problem is when you have multiple Fossil containers on a | |
| 885 | -single host, since they all derive from the same base image. | |
| 886 | - | |
| 887 | -The full-featured container runtimes above will intelligently share | |
| 888 | -these immutable base layers among the containers, storing only the | |
| 889 | -differences in each individual container. More, when pulling images from | |
| 890 | -a registry host, they’ll transfer only the layers you don’t have copies | |
| 891 | -of locally, so you don’t have to burn bandwidth sending copies of Alpine | |
| 892 | -and BusyBox each time, even though they’re unlikely to change from one | |
| 893 | -build to the next. | |
| 894 | - | |
| 895 | - | |
| 896 | -#### 6.3.2 <a id="crun"></a>`crun` | |
| 897 | - | |
| 898 | -In the same way that [Docker Engine is based on `runc`](#runc), Podman’s | |
| 899 | -engine is based on [`crun`][crun], a lighter-weight alternative to | |
| 900 | -`runc`. It’s only 1.4 MiB on the system I tested it on, yet it will run | |
| 901 | -the same container bundles as in my `runc` examples above. We saved | |
| 902 | -more than that by compressing the container’s Fossil executable with | |
| 903 | -UPX, making the runtime virtually free in this case. The only question | |
| 904 | -is whether you can put up with its limitations, which are the same as | |
| 905 | -for `runc`. | |
| 906 | - | |
| 907 | -[crun]: https://github.com/containers/crun | |
| 908 | - | |
| 909 | - | |
| 910 | -#### 6.3.3 <a id="nspawn"></a>`systemd-nspawn` | |
| 911 | - | |
| 912 | -As of `systemd` version 242, its optional `nspawn` piece | |
| 913 | -[reportedly](https://www.phoronix.com/news/Systemd-Nspawn-OCI-Runtime) | |
| 914 | -got the ability to run OCI bundles directly. You might | |
| 915 | -have it installed already, but if not, it’s only about 2 MiB. It’s | |
| 916 | -in the `systemd-containers` package as of Ubuntu 22.04 LTS: | |
| 917 | - | |
| 918 | -``` | |
| 919 | - $ sudo apt install systemd-containers | |
| 920 | -``` | |
| 921 | - | |
| 922 | -It’s also in CentOS Stream 9, under the same name. | |
| 923 | - | |
| 924 | -You create the bundles the same way as with [the `runc` method | |
| 925 | -above](#runc). The only thing that changes are the top-level management | |
| 926 | -commands: | |
| 927 | - | |
| 928 | -``` | |
| 929 | - $ sudo systemd-nspawn \ | |
| 930 | - --oci-bundle=/var/lib/machines/fossil \ | |
| 931 | - --machine=fossil \ | |
| 932 | - --network-veth \ | |
| 933 | - --port=127.0.0.1:127.0.0.1:9999:8080 | |
| 934 | - $ sudo machinectl list | |
| 935 | - No machines. | |
| 936 | -``` | |
| 937 | - | |
| 938 | -This is why I wrote “reportedly” above: I couldn’t get it to work on two different | |
| 939 | -Linux distributions, and I can’t see why. I’m leaving this here to give | |
| 940 | -someone else a leg up, with the hope that they will work out what’s | |
| 941 | -needed to get the container running and registered with `machinectl`. | |
| 942 | - | |
| 943 | -As of this writing, the tool expects an OCI container version of | |
| 944 | -“1.0.0”. I had to edit this at the top of my `config.json` file to get | |
| 945 | -the first command to read the bundle. The fact that it errored out when | |
| 946 | -I had “`1.0.2-dev`” in there proves it’s reading the file, but it | |
| 947 | -doesn’t seem able to make sense of what it finds there, and it doesn’t | |
| 948 | -give any diagnostics to say why. | |
| 949 | - | |
| 709 | +### 6.3 <a id="nspawn"></a>`systemd-container` | |
| 710 | + | |
| 711 | +If even the Podman stack is too big for you, the next-best option I’m | |
| 712 | +aware of is the `systemd-container` infrastructure on modern Linuxes, | |
| 713 | +available since version 239 or so. Its runtime tooling requires only | |
| 714 | +about 1.4 MiB of disk space: | |
| 715 | + | |
| 716 | +``` | |
| 717 | + $ sudo apt install systemd-container btrfs-progs |
| 718 | +``` | |
| 719 | + | |
| 720 | +That command assumes the primary test environment for | |
| 721 | +this guide, Ubuntu 22.04 LTS with `systemd` 249. For best | |
| 722 | +results, `/var/lib/machines` should be a btrfs volume, because | |
| 723 | +[`$REASONS`][mcfad]. (For CentOS Stream 9 and other Red Hattish | |
| 724 | +systems, you will have to make several adjustments, which we’ve |
| 725 | +collected [below](#nspawn-rhel) to keep these examples clear.) |
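
To check whether that volume is already btrfs, `stat -f` prints the filesystem type; anything other than `btrfs` (e.g. `ext2/ext3`, which is how it reports ext4) means you won’t get the cheap copy-on-write behavior `machinectl` wants:

```
 $ stat -f -c %T /var/lib/machines
 btrfs
```
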
| 726 | + | |
| 727 | +The first configuration step is to convert the Docker container into | |
| 728 | +a “machine”, as systemd calls it. The easiest method is: | |
| 729 | + | |
| 730 | +``` | |
| 731 | + $ make container-run | |
| 732 | + $ docker container export fossil-e119d5983620 | | |
| 733 | + machinectl import-tar - myproject | |
| 734 | +``` | |
| 735 | + | |
| 736 | +Copy the container name from the first step to the second. Yours will | |
| 737 | +almost certainly be named after a different Fossil commit ID. | |
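
If retyping the generated name annoys you, `docker container ls --latest` can capture it for you. This sketch assumes the Fossil container is the one you created most recently and that it is still running:

```
 $ c=$(docker container ls --latest --format '{{.Names}}')
 $ docker container export $c | machinectl import-tar - myproject
```
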
| 738 | + | |
| 739 | +It’s important that the name of the machine you create — | |
| 740 | +“`myproject`” in this example — matches the base name | |
| 741 | +of the nspawn configuration file you create as the next step. | |
| 742 | +Therefore, to extend the example, the following file needs to be | |
| 743 | +called `/etc/systemd/nspawn/myproject.nspawn`, and it will contain | |
| 744 | +something like: | |
| 745 | + | |
| 746 | +---- | |
| 747 | + | |
| 748 | +``` | |
| 749 | +[Exec] | |
| 750 | +WorkingDirectory=/jail | |
| 751 | +Parameters=bin/fossil server \ | |
| 752 | + --baseurl https://example.com/myproject \ | |
| 753 | + --chroot /jail \ | |
| 754 | + --create \ | |
| 755 | + --jsmode bundled \ | |
| 756 | + --localhost \ | |
| 757 | + --port 9000 \ | |
| 758 | + --scgi \ | |
| 759 | + --user admin \ | |
| 760 | + museum/repo.fossil | |
| 761 | +DropCapability= \ | |
| 762 | + CAP_AUDIT_WRITE \ | |
| 763 | + CAP_CHOWN \ | |
| 764 | + CAP_FSETID \ | |
| 765 | + CAP_KILL \ | |
| 766 | + CAP_MKNOD \ | |
| 767 | + CAP_NET_BIND_SERVICE \ | |
| 768 | + CAP_NET_RAW \ | |
| 769 | + CAP_SETFCAP \ | |
| 770 | + CAP_SETPCAP | |
| 771 | +ProcessTwo=yes | |
| 772 | +LinkJournal=no | |
| 773 | +Timezone=no | |
| 774 | + | |
| 775 | +[Files] | |
| 776 | +Bind=/home/fossil/museum/myproject:/jail/museum | |
| 777 | + | |
| 778 | +[Network] | |
| 779 | +VirtualEthernet=no | |
| 780 | +``` | |
| 781 | + | |
| 782 | +---- | |
| 783 | + | |
| 784 | +If you recognize most of that from the `Dockerfile` discussion above, | |
| 785 | +congratulations, you’ve been paying attention. The rest should also | |
| 786 | +be clear from context. | |
| 787 | + | |
| 788 | +Some of this is expected to vary. For one, the command given in the | |
| 789 | +`Parameters` directive assumes [SCGI proxying via nginx][DNT]. For | |
| 790 | +other use cases, see our collection of [Fossil server configuration | |
| 791 | +guides][srv], then adjust the command to your local needs. | |
| 792 | +For another, you will likely have to adjust the `Bind` value to | |
| 793 | +point at the directory containing the `repo.fossil` file referenced | |
| 794 | +in the command. | |
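
The host side of that `Bind=` mapping must exist before the first start. A minimal sketch, assuming you’re seeding the service with a repository copied from elsewhere; the source path is a placeholder:

```
 $ sudo mkdir -p /home/fossil/museum/myproject
 $ sudo cp /path/to/repo.fossil /home/fossil/museum/myproject/
```
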
| 795 | + | |
| 796 | +We also need a generic systemd unit file called | |
| 797 | +`/etc/systemd/system/fossil@.service`, containing: |
| 798 | + | |
| 799 | +---- | |
| 800 | + | |
| 801 | +``` | |
| 802 | +[Unit] | |
| 803 | +Description=Fossil %i Repo Service | |
| 804 | +Requires=modprobe@tun.service modprobe@loop.service |
| 805 | +After=network.target systemd-resolved.service modprobe@tun.service modprobe@loop.service |
| 806 | + | |
| 807 | +[Service] | |
| 808 | +ExecStart=systemd-nspawn --settings=override --read-only --machine=%i bin/fossil | |
| 809 | + | |
| 810 | +[Install] | |
| 811 | +WantedBy=multi-user.target | |
| 812 | +``` | |
| 813 | + | |
| 814 | +---- | |
| 815 | + | |
| 816 | +You shouldn’t have to change any of this because we’ve given the | |
| 817 | +`--settings=override` flag, meaning any setting in the nspawn file |
| 818 | +overrides the setting passed to `systemd-nspawn`. This arrangement | |
| 819 | +not only keeps the unit file simple, it allows multiple services to | |
| 820 | +share the base configuration, varying on a per-repo level. | |
| 821 | + | |
| 822 | +Start the service in the normal way: | |
| 823 | + | |
| 824 | +``` | |
| 825 | + $ sudo systemctl enable fossil@myproject | |
| 826 | + $ sudo systemctl start fossil@myproject | |
| 827 | +``` | |
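
To check that it came up under the SCGI configuration above, something like this should suffice; the last command should show a listener on localhost port 9000 (output elided):

```
 $ sudo systemctl status fossil@myproject
 $ sudo ss -tln | grep 9000
```
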
| 828 | + | |
| 829 | +You should find it running on localhost port 9000 per the nspawn | |
| 830 | +configuration file above, suitable for proxying Fossil out to the | |
| 831 | +public using nginx, via SCGI. If you aren’t using a front-end proxy | |
| 832 | +and want Fossil exposed to the world, you might say this instead in | |
| 833 | +the `nspawn` file: | |
| 834 | + | |
| 835 | +``` | |
| 836 | +Parameters=bin/fossil server \ | |
| 837 | + --cert /path/to/my/fullchain.pem \ | |
| 838 | + --chroot /jail \ | |
| 839 | + --create \ | |
| 840 | + --jsmode bundled \ | |
| 841 | + --port 443 \ | |
| 842 | + --user admin \ | |
| 843 | + museum/repo.fossil | |
| 844 | +``` | |
| 845 | + | |
| 846 | +You would also need to un-drop the `CAP_NET_BIND_SERVICE` capability | |
| 847 | +to allow Fossil to bind to this low-numbered port. | |
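
For completeness, the nginx side of the SCGI arrangement might look like the following, borrowing the shape of our [Debian + nginx + TLS][DNT] guide; the location path and port are the examples used on this page, so adjust both to your site:

```
location /myproject {
    include scgi_params;
    scgi_pass 127.0.0.1:9000;
    scgi_param SCRIPT_NAME "/myproject";
}
```
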
| 848 | + | |
| 849 | +We use systemd’s template file feature to allow multiple Fossil | |
| 850 | +servers to run on a single machine, each on a different TCP port, |
| 851 | +as when proxying them out as subdirectories of a larger site. | |
| 852 | +To add another project, you must first clone the base “machine” layer: | |
| 853 | + | |
| 854 | +``` | |
| 855 | + $ sudo machinectl clone myproject otherthing | |
| 856 | +``` | |
| 857 | + | |
| 858 | +That will not only create a clone of `/var/lib/machines/myproject` | |
| 859 | +as `../otherthing`, it will also create a matching `nspawn` file for you |
| 860 | +as a copy of the first one. Adjust its contents to suit, then enable | |
| 861 | +and start it as above. | |
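
After the clone, the copied nspawn file still names the first project. A hedged sketch of the retargeting, run here against a scratch copy so it’s safe to experiment with; in real use the file is `/etc/systemd/nspawn/otherthing.nspawn` and needs `sudo`, and the `sed` patterns assume the example file above, so review the result rather than trusting them blindly:

```shell
# Work on a scratch copy; the real file lives in /etc/systemd/nspawn/.
demo=$(mktemp -d)
cat > "$demo/otherthing.nspawn" <<'EOF'
Parameters=bin/fossil server --port 9000 museum/repo.fossil
Bind=/home/fossil/museum/myproject:/jail/museum
EOF

# Retarget the clone: new project name in the Bind path, fresh TCP port.
sed -i \
    -e 's/myproject/otherthing/g' \
    -e 's/--port 9000/--port 9001/' \
    "$demo/otherthing.nspawn"

cat "$demo/otherthing.nspawn"
```

In real use you would follow this with `sudo systemctl enable --now fossil@otherthing`.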
| 862 | + | |
| 863 | +[mcfad]: https://www.freedesktop.org/software/systemd/man/machinectl.html#Files%20and%20Directories | |
| 864 | + | |
| 865 | + | |
| 866 | +#### 6.3.1 <a id="nspawn-rhel"></a>Getting It Working on a RHEL Clone |
| 867 | + | |
| 868 | +The biggest difference between doing this on OSes like CentOS versus | |
| 869 | +Ubuntu is that RHEL (thus also its clones) doesn’t ship btrfs in | |
| 870 | +its kernel, and so has no option for installing `mkfs.btrfs`, which |
| 871 | +[`machinectl`][mctl] needs for various purposes. | |
| 872 | + | |
| 873 | +Fortunately, there are workarounds. | |
| 874 | + | |
| 875 | +First, the `apt install` command above becomes: | |
| 876 | + | |
| 877 | +``` | |
| 878 | + $ sudo dnf install systemd-container | |
| 879 | +``` | |
| 880 | + | |
| 881 | +Second, you have to hack around the lack of `machinectl import-tar` like so: |
| 882 | + | |
| 883 | +``` | |
| 884 | + $ rootfs=/var/lib/machines/fossil | |
| 885 | + $ sudo mkdir -p $rootfs | |
| 886 | + $ docker container export fossil | sudo tar -C $rootfs -xf - |
| 887 | +``` | |
| 888 | + | |
| 889 | +The parent directory path in the `rootfs` variable is important, | |
| 890 | +because although we aren’t using `machinectl`, the `systemd-nspawn` | |
| 891 | +developers assume you’re using them together. Thus, when you give | |
| 892 | +`--machine`, it assumes the `machinectl` directory scheme. You could | |
| 893 | +instead use `--directory`, allowing you to store the rootfs wherever |
| 894 | +you like, but why make things difficult? It’s a perfectly sensible | |
| 895 | +default, consistent with the [LHS] rules. | |
| 896 | + | |
| 897 | +The final element — the machine name — can be anything | |
| 898 | +you like so long as it matches the nspawn file’s base name. | |
| 899 | + | |
| 900 | +Finally, since you can’t use `machinectl clone`, you have to make | |
| 901 | +a wasteful copy of `/var/lib/machines/myproject` when standing up | |
| 902 | +multiple Fossil repo services on a single machine. (This is one | |
| 903 | +of the reasons `machinectl` depends on `btrfs`: cheap copy-on-write | |
| 904 | +subvolumes.) Because we give the `--read-only` flag, you can simply | |
| 905 | +`cp -r` one machine to a new name rather than go through the | |
| 906 | +export-and-import dance you used to create the first one. | |
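
Concretely, the manual equivalent of `machinectl clone` on these systems might be, with paths per the examples above:

```
 $ sudo cp -r /var/lib/machines/myproject /var/lib/machines/otherthing
 $ sudo cp /etc/systemd/nspawn/myproject.nspawn /etc/systemd/nspawn/otherthing.nspawn
```

…followed by editing the new nspawn file to suit, as before.
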
| 907 | + | |
| 908 | +[LHS]: https://refspecs.linuxfoundation.org/FHS_3.0/fhs/index.html | |
| 909 | +[mctl]: https://www.freedesktop.org/software/systemd/man/machinectl.html | |
| 910 | + | |
| 911 | + | |
| 912 | +#### 6.3.2 <a id="nspawn-weaknesses"></a>What Am I Missing Out On? |
| 913 | + | |
| 914 | +For all the runtime size savings in this method, you may be wondering | |
| 915 | +what you’re missing out on relative to Podman, which takes up | |
| 916 | +roughly 27× more disk space. Short answer: lots. Long answer: | |
| 917 | + | |
| 918 | +1. **Build system.** You’ll have to build and test your containers | |
| 919 | + some other way. This method is only suitable for running them | |
| 920 | + once they’re built. | |
| 921 | + | |
| 922 | +2. **Orchestration.** All of the higher-level things like | |
| 923 | + “compose” files, Docker Swarm mode, and Kubernetes are | |
| 924 | + unavailable to you at this level. You can run multiple | |
| 925 | + instances of Fossil, but on a single machine only and with a | |
| 926 | + static configuration. | |
| 927 | + | |
| 928 | +3. **Image layer sharing.** When you update an image using one of the | |
| 929 | + above methods, Docker and Podman are smart enough to copy only | |
| 930 | + changed layers. Furthermore, when you base multiple containers | |
| 931 | + on a single image, they don’t make copies of the base layers; | |
| 932 | + they can share them, because base layers are immutable, thus | |
| 933 | + cannot cross-contaminate. | |
| 934 | + | |
| 935 | +   Because we use `systemd-nspawn --read-only`, we get *some* |
| 936 | + of this benefit, particularly when using `machinectl` with | |
| 937 | + `/var/lib/machines` as a btrfs volume. Even so, the disk space | |
| 938 | + and network I/O optimizations go deeper in the Docker and Podman | |
| 939 | + worlds. | |
| 940 | + | |
| 941 | +4. **Tooling.** Hand-creating and modifying those systemd | |
| 942 | +   files sucks compared to “`podman container create ...`”. This |
| 943 | + is but one of many affordances you will find in the runtimes | |
| 944 | + aimed at daily-use devops warriors. | |
| 945 | + | |
| 946 | +5. **Network virtualization.** In the scheme above, we turn off the | |
| 947 | +   `systemd` virtual networking support because in its default mode, |
| 948 | + it wants to hide the service entirely. | |
| 949 | + | |
| 950 | + Another way to put this is that `systemd-nspawn --port` does | |
| 951 | + approximately *nothing* of what `docker create --publish` does | |
| 952 | + despite their superficial similarities. | |
| 953 | + | |
| 954 | + For this container, it doesn’t much matter, since it exposes | |
| 955 | + only a single port, and we do want that one port exposed, one way | |
| 956 | + or another. Beyond that, we get all the control we need using | |
| 957 | + Fossil options like `--localhost`. I point this out because in | |
| 958 | + more complex situations, the automatic network setup features of | |
| 959 | + the more featureful runtimes can save a lot of time and hassle. | |
| 960 | + They aren’t doing anything you couldn’t do by hand, but why | |
| 961 | + would you want to, given the choice? | |
| 962 | + | |
| 963 | +I expect there’s a lot more I neglected to think of when creating | |
| 964 | +this list, but I think it suffices to make my case as it is. If you | |
| 965 | +can afford the space of Podman or Docker, I strongly recommend using | |
| 966 | +either of them over the much lower-level `systemd-container` | |
| 967 | +infrastructure. | |
| 968 | + | |
| 969 | +(Incidentally, these are essentially the same reasons why we no longer | |
| 970 | +talk about the `crun` tool underpinning Podman in this document. It’s | |
| 971 | +even more limited, making it even more difficult to administer while | |
| 972 | +providing no runtime size advantage. The `runc` tool underpinning | |
| 973 | +Docker is even worse on this score, being scarcely easier to use than | |
| 974 | +`crun` while having a much larger footprint.) | |
| 975 | + | |
| 976 | + | |
| 977 | +#### 6.3.3 <a id="nspawn-assumptions"></a>Violated Assumptions |
| 978 | + | |
| 979 | +The `systemd-container` infrastructure has a bunch of hard-coded | |
| 980 | +assumptions baked into it. We papered over these problems above, | |
| 981 | +but if you’re using these tools for other purposes on the machine | |
| 982 | +you’re serving Fossil from, you may need to know which assumptions | |
| 983 | +our container violates and the resulting consequences: | |
| 984 | + | |
| 985 | +1. `systemd-nspawn` works best with `machinectl`, but if you haven’t | |
| 986 | + got `btrfs` available, you run into [trouble](#nspawn-rhel). | |
| 987 | + | |
| 988 | +2. Our stock container starts a single static executable inside | |
| 989 | + a stripped-to-the-bones container rather than “boot” an OS | |
| 990 | + image, causing a bunch of commands to fail: | |
| 991 | + | |
| 992 | + * **`machinectl poweroff`** will fail because the container | |
| 993 | + isn’t running dbus. | |
| 994 | + * **`machinectl start`** will try to find an `/sbin/init` | |
| 995 | + program in the rootfs, which we haven’t got. We could | |
| 996 | + rename `/jail/bin/fossil` to `/sbin/init` and then hack | |
| 997 | + the chroot scheme to match, but ick. (This, incidentally, | |
| 998 | + is why we set `ProcessTwo=yes` above even though Fossil is | |
| 999 | + perfectly capable of running as PID 1, a fact we depend on | |
| 1000 | + in the other methods above.) | |
| 1001 | + * **`machinectl shell`** will fail because there is no login | |
| 1002 | + daemon running, which we purposefully avoided adding by | |
| 1003 | + creating a “`FROM scratch`” container. (If you need a | |
| 1004 | + shell, say: `sudo systemd-nspawn --machine=myproject /bin/sh`) | |
| 1005 | + * **`machinectl status`** won’t give you the container logs | |
| 1006 | + because we disabled the shared journal, which was in turn | |
| 1007 | + necessary because we don’t run `systemd` *inside* the | |
| 1008 | + container, just outside. | |
| 1009 | + | |
| 1010 | + If these are problems for you, you may wish to build a | |
| 1011 | + fatter container using `debootstrap` or similar. ([External | |
| 1012 | + tutorial][medtut].) | |
| 1013 | + | |
| 1014 | +3. We disable the “private networking” feature since the whole | |
| 1015 | + point of this container is to expose a network service to the | |
| 1016 | + public, one way or another. If you do things the way the defaults | |
| 1017 | + (and thus the official docs) expect, you must push through | |
| 1018 | + [a whole lot of complexity][ndcmp] to re-expose this single | |
| 1019 | + network port. That complexity is justified only if your service | |
| 1020 | + is itself complex, having both private and public service ports. | |
| 1021 | + | |
| 1022 | +[medtut]: https://medium.com/@huljar/setting-up-containers-with-systemd-nspawn-b719cff0fb8d | |
| 1023 | +[ndcmp]: https://wiki.archlinux.org/title/systemd-networkd#Usage_with_containers | |
| 950 | 1024 | |
| 951 | 1025 | <div style="height:50em" id="this-space-intentionally-left-blank"></div> |
| 952 | 1026 |
| 709 | |
| 710 | If even the Podman stack is too big for you, you still have options for |
| 711 | running containers that are considerably slimmer, at a high cost to |
| 712 | administration complexity and loss of features. |
| 713 | |
| 714 | Part of the OCI standard is the notion of a “bundle,” being a consistent |
| 715 | way to present a pre-built and configured container to the runtime. |
| 716 | Essentially, it consists of a directory containing a `config.json` file |
| 717 | and a `rootfs/` subdirectory containing the root filesystem image. Many |
| 718 | tools can produce these for you. We’ll show only one method in the first |
| 719 | section below, then reuse that in the following sections. |
| 720 | |
| 721 | |
| 722 | #### 6.3.1 <a id="runc"></a>`runc` |
| 723 | |
| 724 | We mentioned `runc` [above](#nerdctl), but it’s possible to use it |
| 725 | standalone, without `containerd` or its CLI frontend `nerdctl`. You also |
| 726 | lose the build engine, intelligent image layer sharing, image registry |
| 727 | connections, and much more. The plus side is that `runc` alone is |
| 728 | 18 MiB. |
| 729 | |
| 730 | Using it without all the support tooling isn’t complicated, but it *is* |
| 731 | cryptic enough to want a shell script. Let’s say we want to build on our |
| 732 | big desktop machine but ship the resulting container to a small remote |
| 733 | host. This should serve: |
| 734 | |
| 735 | |
| 736 | ```shell |
| 737 | #!/bin/bash -ex |
| 738 | c=fossil |
| 739 | b=/var/lib/machines/$c |
| 740 | h=my-host.example.com |
| 741 | m=/run/containerd/io.containerd.runtime.v2.task/moby |
| 742 | t=$(mktemp -d /tmp/$c-bundle.XXXXXX) |
| 743 | |
| 744 | if [ -d "$t" ] |
| 745 | then |
| 746 | docker container start $c |
| 747 | docker container export $c > $t/rootfs.tar |
| 748 | id=$(docker inspect --format="{{.Id}}" $c) |
| 749 | sudo cat $m/$id/config.json \ |
| 750 | | jq '.root.path = "'$b/rootfs'"' \ |
| 751 | | jq '.linux.cgroupsPath = ""' \ |
| 752 | | jq 'del(.linux.sysctl)' \ |
| 753 | | jq 'del(.linux.namespaces[] | select(.type == "network"))' \ |
| 754 | | jq 'del(.mounts[] | select(.destination == "/etc/hostname"))' \ |
| 755 | | jq 'del(.mounts[] | select(.destination == "/etc/resolv.conf"))' \ |
| 756 | | jq 'del(.mounts[] | select(.destination == "/etc/hosts"))' \ |
| 757 | | jq 'del(.hooks)' > $t/config.json |
| 758 | scp -r $t $h:tmp |
| 759 | ssh -t $h "{ |
| 760 | mv ./$t/config.json $b && |
| 761 | sudo tar -C $b/rootfs -xf ./$t/rootfs.tar && |
| 762 | rm -r ./$t |
| 763 | }" |
| 764 | rm -r $t |
| 765 | fi |
| 766 | ``` |
| 767 | |
| 768 | |
| 769 | The first several lines list configurables: |
| 770 | |
| 771 | * **`c`**: the name of the Docker container you’re bundling up for use |
| 772 | with `runc` |
| 773 | * **`b`**: the path of the exported container, called the “bundle” in |
| 774 | OCI jargon; we’re using the [`nspawn`](#nspawn) convention, a |
| 775 | reasonable choice under the [Linux FHS rules][LFHS] |
| 776 | * **`h`**: the remote host name |
| 777 | * **`m`**: the local directory holding the running machines, configurable |
| 778 | because: |
| 779 | * the path name is longer than we want to use inline |
| 780 | * it’s been known to change from one version of Docker to the next |
| 781 | * you might be building and testing with [Podman](#podman), so it |
| 782 | has to be “`/run/user/$UID/crun`” instead |
| 783 | * **`t`**: the temporary bundle directory we populate locally, then |
| 784 | `scp` to the remote machine, where it’s unpacked |
| 785 | |
| 786 | [LFHS]: https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard |
| 787 | |
| 788 | |
| 789 | ##### Why All That `sudo` Stuff? |
| 790 | |
| 791 | This script uses `sudo` for two different purposes: |
| 792 | |
| 793 | 1. To read the local `config.json` file out of the `containerd` managed |
| 794 | directory, which is owned by `root` on Docker systems. Additionally, |
| 795 | that input file is only available while the container is started, so |
| 796 | we must ensure that before extracting it. |
| 797 | |
| 798 | 2. To unpack the bundle onto the remote machine. If you try to get |
| 799 | clever and unpack it locally, then `rsync` it to the remote host to |
| 800 | avoid re-copying files that haven’t changed since the last update, |
| 801 | you’ll find that it fails when it tries to copy device nodes, to |
| 802 | create files owned only by the remote root user, and so forth. If the |
| 803 | container bundle is small, it’s simpler to re-copy and unpack it |
| 804 | fresh each time. |
| 805 | |
| 806 | I point all this out because it might ask for your password twice: once for |
| 807 | the local sudo command, and once for the remote. |
| 808 | |
| 809 | |
| 810 | |
| 811 | ##### Why All That `jq` Stuff? |
| 812 | |
| 813 | We’re using [jq] for two separate purposes: |
| 814 | |
| 815 | 1. To automatically transmogrify Docker’s container configuration so it |
| 816 | will work with `runc`: |
| 817 | |
| 818 | * point it where we unpacked the container’s exported rootfs |
| 819 | * accede to its wish to [manage cgroups by itself][ecg] |
| 820 | * remove the `sysctl` calls that will break after… |
| 821 | * …we remove the network namespace to allow Fossil’s TCP listening |
| 822 | port to be available on the host; `runc` doesn’t offer the |
| 823 | equivalent of `docker create --publish`, and we can’t be |
| 824 | bothered to set up a manual mapping from the host port into the |
| 825 | container |
| 826 | * remove file bindings that point into the local runtime managed |
| 827 | directories; one of the things we give up by using a bare |
| 828 | container runner is automatic management of these files |
| 829 | * remove the hooks for essentially the same reason |
| 830 | |
| 831 | 2. To make the Docker-managed machine-readable `config.json` more |
| 832 | human-readable, in case there are other things you want changed in |
| 833 | this version of the container. Exposing the `config.json` file like |
| 834 | this means you don’t have to rebuild the container merely to change |
| 835 | a value like a mount point, the kernel capability set, and so forth. |
| 836 | |
| 837 | |
| 838 | ##### Running the Bundle |
| 839 | |
| 840 | With the container exported to a bundle like this, you can start it as: |
| 841 | |
| 842 | ``` |
| 843 | $ cd /path/to/bundle |
| 844 | $ c=fossil-runc ← …or anything else you prefer |
| 845 | $ sudo runc create $c |
| 846 | $ sudo runc start $c |
| 847 | $ sudo runc exec $c -t sh -l |
| 848 | ~ $ ls museum |
| 849 | repo.fossil |
| 850 | ~ $ ps -eaf |
| 851 | PID USER TIME COMMAND |
| 852 | 1 fossil 0:00 bin/fossil server --create … |
| 853 | ~ $ exit |
| 854 | $ sudo runc kill $c |
| 855 | $ sudo runc delete $c |
| 856 | ``` |
| 857 | |
| 858 | If you’re doing this on the export host, the first command is “`cd $b`” |
| 859 | if we’re using the variables from the shell script above. Alternately, |
| 860 | the `runc` subcommands that need to read the bundle files take a |
| 861 | `--bundle/-b` flag to let you avoid switching directories. |
| 862 | |
| 863 | The rest should be straightforward: create and start the container as |
| 864 | root so the `chroot(2)` call inside the container will succeed, then get |
| 865 | into it with a login shell and poke around to prove to ourselves that |
| 866 | everything is working properly. It is. Yay! |
| 867 | |
| 868 | The remaining commands show shutting the container down and destroying |
| 869 | it, simply to show how these commands change relative to using the |
| 870 | Docker Engine commands. It’s “kill,” not “stop,” and it’s “delete,” not |
| 871 | “rm.” |
| 872 | |
| 873 | [ecg]: https://github.com/opencontainers/runc/pull/3131 |
| 874 | [jq]: https://stedolan.github.io/jq/ |
| 875 | |
| 876 | |
| 877 | ##### Lack of Layer Sharing |
| 878 | |
| 879 | The bundle export process collapses Docker’s union filesystem down to a |
| 880 | single layer. Atop that, it makes all files mutable. |
| 881 | |
| 882 | All of this is fine for tiny remote hosts with a single container, or at |
| 883 | least one where none of the containers share base layers. Where it |
| 884 | becomes a problem is when you have multiple Fossil containers on a |
| 885 | single host, since they all derive from the same base image. |
| 886 | |
| 887 | The full-featured container runtimes above will intelligently share |
| 888 | these immutable base layers among the containers, storing only the |
| 889 | differences in each individual container. More, when pulling images from |
| 890 | a registry host, they’ll transfer only the layers you don’t have copies |
| 891 | of locally, so you don’t have to burn bandwidth sending copies of Alpine |
| 892 | and BusyBox each time, even though they’re unlikely to change from one |
| 893 | build to the next. |
| 894 | |
| 895 | |
| 896 | #### 6.3.2 <a id="crun"></a>`crun` |
| 897 | |
| 898 | In the same way that [Docker Engine is based on `runc`](#runc), Podman’s |
| 899 | engine is based on [`crun`][crun], a lighter-weight alternative to |
| 900 | `runc`. It’s only 1.4 MiB on the system I tested it on, yet it will run |
| 901 | the same container bundles as in my `runc` examples above. We saved |
| 902 | more than that by compressing the container’s Fossil executable with |
| 903 | UPX, making the runtime virtually free in this case. The only question |
| 904 | is whether you can put up with its limitations, which are the same as |
| 905 | for `runc`. |
| 906 | |
| 907 | [crun]: https://github.com/containers/crun |
| 908 | |
| 909 | |
| 910 | #### 6.3.3 <a id="nspawn"></a>`systemd-nspawn` |
| 911 | |
| 912 | As of `systemd` version 242, its optional `nspawn` piece |
| 913 | [reportedly](https://www.phoronix.com/news/Systemd-Nspawn-OCI-Runtime) |
| 914 | got the ability to run OCI bundles directly. You might |
| 915 | have it installed already, but if not, it’s only about 2 MiB. It’s |
| 916 | in the `systemd-containers` package as of Ubuntu 22.04 LTS: |
| 917 | |
| 918 | ``` |
| 919 | $ sudo apt install systemd-containers |
| 920 | ``` |
| 921 | |
| 922 | It’s also in CentOS Stream 9, under the same name. |
| 923 | |
| 924 | You create the bundles the same way as with [the `runc` method |
| 925 | above](#runc). The only thing that changes are the top-level management |
| 926 | commands: |
| 927 | |
| 928 | ``` |
| 929 | $ sudo systemd-nspawn \ |
| 930 | --oci-bundle=/var/lib/machines/fossil \ |
| 931 | --machine=fossil \ |
| 932 | --network-veth \ |
| 933 | --port=127.0.0.1:127.0.0.1:9999:8080 |
| 934 | $ sudo machinectl list |
| 935 | No machines. |
| 936 | ``` |
| 937 | |
| 938 | This is why I wrote “reportedly” above: I couldn’t get it to work on two different |
| 939 | Linux distributions, and I can’t see why. I’m leaving this here to give |
| 940 | someone else a leg up, with the hope that they will work out what’s |
| 941 | needed to get the container running and registered with `machinectl`. |
| 942 | |
| 943 | As of this writing, the tool expects an OCI container version of |
| 944 | “1.0.0”. I had to edit this at the top of my `config.json` file to get |
| 945 | the first command to read the bundle. The fact that it errored out when |
| 946 | I had “`1.0.2-dev`” in there proves it’s reading the file, but it |
| 947 | doesn’t seem able to make sense of what it finds there, and it doesn’t |
| 948 | give any diagnostics to say why. |
| 949 | |
| 950 | |
| 951 | <div style="height:50em" id="this-space-intentionally-left-blank"></div> |
| 952 |
| --- www/containers.md | |
| +++ www/containers.md | |
| @@ -484,11 +484,11 @@ | |
| 484 | that’s still a big chunk of your storage budget. It takes 100:1 overhead |
| 485 | just to run a 4 MiB Fossil server container? Once again, I wouldn’t |
| 486 | blame you if you noped right on out of here, but if you will be patient, |
| 487 | you will find that there are ways to run Fossil inside a container even |
| 488 | on entry-level cloud VPSes. These are well-suited to running Fossil; you |
| 489 | don’t have to resort to [raw Fossil service][srv] to succeed, |
| 490 | leaving the benefits of containerization to those with bigger budgets. |
| 491 | |
| 492 | For the sake of simple examples in this section, we’ll assume you’re |
| 493 | integrating Fossil into a larger web site, such as with our [Debian + |
| 494 | nginx + TLS][DNT] plan. This is why all of the examples below create |
| @@ -521,10 +521,11 @@ | |
| 521 | this idea to the rest of your site.) |
| 522 | |
| 523 | [DD]: https://www.docker.com/products/docker-desktop/ |
| 524 | [DE]: https://docs.docker.com/engine/ |
| 525 | [DNT]: ./server/debian/nginx.md |
| 526 | [srv]: ./server/ |
| 527 | |
| 528 | |
| 529 | ### 6.1 <a id="nerdctl" name="containerd"></a>Stripping Docker Engine Down |
| 530 | |
| 531 | The core of Docker Engine is its [`containerd`][ctrd] daemon and the |
| @@ -556,12 +557,12 @@ | |
| 557 | give up the image builder is [Podman]. Initially created by |
| 558 | Red Hat and thus popular on that family of OSes, it will run on |
| 559 | any flavor of Linux. It can even be made to run [on macOS via Homebrew][pmmac] |
| 560 | or [on Windows via WSL2][pmwin]. |
| 561 | |
| 562 | On Ubuntu 22.04, the installation size is about 38 MiB, roughly a |
| 563 | tenth the size of Docker Engine. |
| 564 | |
| 565 | Although Podman [bills itself][whatis] as a drop-in replacement for the |
| 566 | `docker` command and everything that sits behind it, some of the tool’s |
| 567 | design decisions affect how our Fossil containers run, as compared to |
| 568 | using Docker. The most important of these is that, by default, Podman |
| @@ -703,251 +704,322 @@ | |
| 704 | container images across the Internet, it can be a net win in terms of |
| 705 | build time. |
| 706 | |
| 707 | |
| 708 | |
| 709 | ### 6.3 <a id="nspawn"></a>`systemd-container` |
| 710 | |
| 711 | If even the Podman stack is too big for you, the next-best option I’m |
| 712 | aware of is the `systemd-container` infrastructure on modern Linuxes, |
| 713 | available since version 239 or so. Its runtime tooling requires only |
| 714 | about 1.4 MiB of disk space: |
| 715 | |
| 716 | ``` |
| 717 | $ sudo apt install systemd-container btrfs-progs |
| 718 | ``` |
| 719 | |
| 720 | That command assumes the primary test environment for |
| 721 | this guide, Ubuntu 22.04 LTS with `systemd` 249. For best |
| 722 | results, `/var/lib/machines` should be a btrfs volume, because |
| 723 | [`$REASONS`][mcfad]. (For CentOS Stream 9 and other Red Hattish |
| 724 | systems, you will have to make several adjustments, which we’ve |
| 725 | collected [below](#nspawn-rhel) to keep these examples clear.) |
| 726 | |
| 727 | The first configuration step is to convert the Docker container into |
| 728 | a “machine”, as systemd calls it. The easiest method is: |
| 729 | |
| 730 | ``` |
| 731 | $ make container-run |
| 732 | $ docker container export fossil-e119d5983620 | |
| 733 | machinectl import-tar - myproject |
| 734 | ``` |
| 735 | |
| 736 | Copy the container name from the first step to the second. Yours will |
| 737 | almost certainly be named after a different Fossil commit ID. |
| 738 | |
| 739 | It’s important that the name of the machine you create — |
| 740 | “`myproject`” in this example — matches the base name |
| 741 | of the nspawn configuration file you create as the next step. |
| 742 | Therefore, to extend the example, the following file needs to be |
| 743 | called `/etc/systemd/nspawn/myproject.nspawn`, and it will contain |
| 744 | something like: |
| 745 | |
| 746 | ---- |
| 747 | |
| 748 | ``` |
| 749 | [Exec] |
| 750 | WorkingDirectory=/jail |
| 751 | Parameters=bin/fossil server \ |
| 752 | --baseurl https://example.com/myproject \ |
| 753 | --chroot /jail \ |
| 754 | --create \ |
| 755 | --jsmode bundled \ |
| 756 | --localhost \ |
| 757 | --port 9000 \ |
| 758 | --scgi \ |
| 759 | --user admin \ |
| 760 | museum/repo.fossil |
| 761 | DropCapability= \ |
| 762 | CAP_AUDIT_WRITE \ |
| 763 | CAP_CHOWN \ |
| 764 | CAP_FSETID \ |
| 765 | CAP_KILL \ |
| 766 | CAP_MKNOD \ |
| 767 | CAP_NET_BIND_SERVICE \ |
| 768 | CAP_NET_RAW \ |
| 769 | CAP_SETFCAP \ |
| 770 | CAP_SETPCAP |
| 771 | ProcessTwo=yes |
| 772 | LinkJournal=no |
| 773 | Timezone=no |
| 774 | |
| 775 | [Files] |
| 776 | Bind=/home/fossil/museum/myproject:/jail/museum |
| 777 | |
| 778 | [Network] |
| 779 | VirtualEthernet=no |
| 780 | ``` |
| 781 | |
| 782 | ---- |
| 783 | |
| 784 | If you recognize most of that from the `Dockerfile` discussion above, |
| 785 | congratulations, you’ve been paying attention. The rest should also |
| 786 | be clear from context. |
| 787 | |
| 788 | Some of this is expected to vary. For one, the command given in the |
| 789 | `Parameters` directive assumes [SCGI proxying via nginx][DNT]. For |
| 790 | other use cases, see our collection of [Fossil server configuration |
| 791 | guides][srv], then adjust the command to your local needs. |
| 792 | For another, you will likely have to adjust the `Bind` value to |
| 793 | point at the directory containing the `repo.fossil` file referenced |
| 794 | in the command. |
| 795 | |
| 796 | We also need a generic systemd unit file called |
| 797 | `/etc/systemd/system/fossil@.service`, containing: |
| 798 | |
| 799 | ---- |
| 800 | |
| 801 | ``` |
| 802 | [Unit] |
| 803 | Description=Fossil %i Repo Service |
| 804 | Requires=modprobe@tun.service modprobe@loop.service |
| 805 | After=network.target systemd-resolved.service modprobe@tun.service modprobe@loop.service |
| 806 | |
| 807 | [Service] |
| 808 | ExecStart=systemd-nspawn --settings=override --read-only --machine=%i bin/fossil |
| 809 | |
| 810 | [Install] |
| 811 | WantedBy=multi-user.target |
| 812 | ``` |
| 813 | |
| 814 | ---- |
| 815 | |
| 816 | You shouldn’t have to change any of this because we’ve given the |
| 817 | `--settings=override` flag, meaning any setting in the nspawn file |
| 818 | overrides the setting passed to `systemd-nspawn`. This arrangement |
| 819 | not only keeps the unit file simple, it allows multiple services to |
| 820 | share the base configuration, varying on a per-repo level. |
| 821 | |
| 822 | Start the service in the normal way: |
| 823 | |
| 824 | ``` |
| 825 | $ sudo systemctl enable fossil@myproject |
| 826 | $ sudo systemctl start fossil@myproject |
| 827 | ``` |
| 828 | |
| 829 | You should find it running on localhost port 9000 per the nspawn |
| 830 | configuration file above, suitable for proxying Fossil out to the |
| 831 | public using nginx, via SCGI. If you aren’t using a front-end proxy |
| 832 | and want Fossil exposed to the world, you might say this instead in |
| 833 | the `nspawn` file: |
| 834 | |
| 835 | ``` |
| 836 | Parameters=bin/fossil server \ |
| 837 | --cert /path/to/my/fullchain.pem \ |
| 838 | --chroot /jail \ |
| 839 | --create \ |
| 840 | --jsmode bundled \ |
| 841 | --port 443 \ |
| 842 | --user admin \ |
| 843 | museum/repo.fossil |
| 844 | ``` |
| 845 | |
| 846 | You would also need to un-drop the `CAP_NET_BIND_SERVICE` capability |
| 847 | to allow Fossil to bind to this low-numbered port. |
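In nspawn-file terms, that means deleting one line from the `DropCapability=` list shown earlier, leaving something like:

```
DropCapability= \
    CAP_AUDIT_WRITE \
    CAP_CHOWN \
    CAP_FSETID \
    CAP_KILL \
    CAP_MKNOD \
    CAP_NET_RAW \
    CAP_SETFCAP \
    CAP_SETPCAP
```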
| 848 | |
| 849 | We use systemd’s template file feature to allow running multiple |
| 850 | Fossil servers on a single machine, each on a different TCP port, |
| 851 | as when proxying them out as subdirectories of a larger site. |
| 852 | To add another project, you must first clone the base “machine” layer: |
| 853 | |
| 854 | ``` |
| 855 | $ sudo machinectl clone myproject otherthing |
| 856 | ``` |
| 857 | |
| 858 | That will not only create a clone of `/var/lib/machines/myproject` |
| 859 | as `../otherthing`, it will create a matching `nspawn` file for you |
| 860 | as a copy of the first one. Adjust its contents to suit, then enable |
| 861 | and start it as above. |
| 862 | |
| 863 | [mcfad]: https://www.freedesktop.org/software/systemd/man/machinectl.html#Files%20and%20Directories |
| 864 | |
| 865 | |
| 866 | #### 6.3.1 <a id="nspawn-rhel"></a>Getting It Working on a RHEL Clone |
| 867 | |
| 868 | The biggest difference between doing this on OSes like CentOS versus |
| 869 | Ubuntu is that RHEL (thus also its clones) doesn’t ship btrfs in |
| 870 | its kernel, thus has no option for installing `mkfs.btrfs`, which |
| 871 | [`machinectl`][mctl] needs for various purposes. |
| 872 | |
| 873 | Fortunately, there are workarounds. |
| 874 | |
| 875 | First, the `apt install` command above becomes: |
| 876 | |
| 877 | ``` |
| 878 | $ sudo dnf install systemd-container |
| 879 | ``` |
| 880 | |
| 881 | Second, you have to hack around the lack of `machinectl import-tar`, like so: |
| 882 | |
| 883 | ``` |
| 884 | $ rootfs=/var/lib/machines/fossil |
| 885 | $ sudo mkdir -p $rootfs |
| 886 | $ docker container export fossil | sudo tar -C $rootfs -xf - |
| 887 | ``` |
| 888 | |
| 889 | The parent directory path in the `rootfs` variable is important, |
| 890 | because although we aren’t using `machinectl`, the `systemd-nspawn` |
| 891 | developers assume you’re using them together. Thus, when you give |
| 892 | `--machine`, it assumes the `machinectl` directory scheme. You could |
| 893 | instead use `--directory`, allowing you to store the rootfs wherever |
| 894 | you like, but why make things difficult? It’s a perfectly sensible |
| 895 | default, consistent with the [LHS] rules. |
| 896 | |
| 897 | The final element — the machine name — can be anything |
| 898 | you like so long as it matches the nspawn file’s base name. |
| 899 | |
| 900 | Finally, since you can’t use `machinectl clone`, you have to make |
| 901 | a wasteful copy of `/var/lib/machines/myproject` when standing up |
| 902 | multiple Fossil repo services on a single machine. (This is one |
| 903 | of the reasons `machinectl` depends on `btrfs`: cheap copy-on-write |
| 904 | subvolumes.) Because we give the `--read-only` flag, you can simply |
| 905 | `cp -r` one machine to a new name rather than go through the |
| 906 | export-and-import dance you used to create the first one. |
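A sketch of that manual clone, rehearsed under `/tmp/machines-demo` standing in for `/var/lib/machines` so it can be tried without root; substitute the real path (and `sudo`) when doing it for real:

```shell
#!/bin/sh -e
# Stand-in for /var/lib/machines; use the real path plus sudo in production.
machines=/tmp/machines-demo
rm -rf $machines
mkdir -p $machines/myproject/jail/museum   # pretend: the imported rootfs

# The manual equivalent of "machinectl clone myproject otherthing".
cp -r $machines/myproject $machines/otherthing

# Unlike machinectl clone, this does NOT create the matching nspawn file;
# copy and edit /etc/systemd/nspawn/myproject.nspawn yourself.
ls $machines
```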
| 907 | |
| 908 | [LHS]: https://refspecs.linuxfoundation.org/FHS_3.0/fhs/index.html |
| 909 | [mctl]: https://www.freedesktop.org/software/systemd/man/machinectl.html |
| 910 | |
| 911 | |
| 912 | #### 6.3.2 <a id="nspawn-weaknesses"></a>What Am I Missing Out On? |
| 913 | |
| 914 | For all the runtime size savings in this method, you may be wondering |
| 915 | what you’re missing out on relative to Podman, which takes up |
| 916 | roughly 27× more disk space. Short answer: lots. Long answer: |
| 917 | |
| 918 | 1. **Build system.** You’ll have to build and test your containers |
| 919 | some other way. This method is only suitable for running them |
| 920 | once they’re built. |
| 921 | |
| 922 | 2. **Orchestration.** All of the higher-level things like |
| 923 | “compose” files, Docker Swarm mode, and Kubernetes are |
| 924 | unavailable to you at this level. You can run multiple |
| 925 | instances of Fossil, but on a single machine only and with a |
| 926 | static configuration. |
| 927 | |
| 928 | 3. **Image layer sharing.** When you update an image using one of the |
| 929 | above methods, Docker and Podman are smart enough to copy only |
| 930 | changed layers. Furthermore, when you base multiple containers |
| 931 | on a single image, they don’t make copies of the base layers; |
| 932 | they can share them, because base layers are immutable, thus |
| 933 | cannot cross-contaminate. |
| 934 | |
| 935 | Because we use `systemd-nspawn --read-only`, we get *some* |
| 936 | of this benefit, particularly when using `machinectl` with |
| 937 | `/var/lib/machines` as a btrfs volume. Even so, the disk space |
| 938 | and network I/O optimizations go deeper in the Docker and Podman |
| 939 | worlds. |
| 940 | |
| 941 | 4. **Tooling.** Hand-creating and modifying those systemd |
| 942 | files sucks compared to “`podman container create ...`” This |
| 943 | is but one of many affordances you will find in the runtimes |
| 944 | aimed at daily-use devops warriors. |
| 945 | |
| 946 | 5. **Network virtualization.** In the scheme above, we turn off the |
| 947 | `systemd` virtual networking support because in its default mode, |
| 948 | it wants to hide the service entirely. |
| 949 | |
| 950 | Another way to put this is that `systemd-nspawn --port` does |
| 951 | approximately *nothing* of what `docker create --publish` does |
| 952 | despite their superficial similarities. |
| 953 | |
| 954 | For this container, it doesn’t much matter, since it exposes |
| 955 | only a single port, and we do want that one port exposed, one way |
| 956 | or another. Beyond that, we get all the control we need using |
| 957 | Fossil options like `--localhost`. I point this out because in |
| 958 | more complex situations, the automatic network setup features of |
| 959 | the more featureful runtimes can save a lot of time and hassle. |
| 960 | They aren’t doing anything you couldn’t do by hand, but why |
| 961 | would you want to, given the choice? |
| 962 | |
| 963 | I expect there’s a lot more I neglected to think of when creating |
| 964 | this list, but I think it suffices to make my case as it is. If you |
| 965 | can afford the space of Podman or Docker, I strongly recommend using |
| 966 | either of them over the much lower-level `systemd-container` |
| 967 | infrastructure. |
| 968 | |
| 969 | (Incidentally, these are essentially the same reasons why we no longer |
| 970 | talk about the `crun` tool underpinning Podman in this document. It’s |
| 971 | even more limited, making it even more difficult to administer while |
| 972 | providing no runtime size advantage. The `runc` tool underpinning |
| 973 | Docker is even worse on this score, being scarcely easier to use than |
| 974 | `crun` while having a much larger footprint.) |
| 975 | |
| 976 | |
| 977 | #### 6.3.3 <a id="nspawn-assumptions"></a>Violated Assumptions |
| 978 | |
| 979 | The `systemd-container` infrastructure has a bunch of hard-coded |
| 980 | assumptions baked into it. We papered over these problems above, |
| 981 | but if you’re using these tools for other purposes on the machine |
| 982 | you’re serving Fossil from, you may need to know which assumptions |
| 983 | our container violates and the resulting consequences: |
| 984 | |
| 985 | 1. `systemd-nspawn` works best with `machinectl`, but if you haven’t |
| 986 | got `btrfs` available, you run into [trouble](#nspawn-rhel). |
| 987 | |
| 988 | 2. Our stock container starts a single static executable inside |
| 989 | a stripped-to-the-bones container rather than “boot” an OS |
| 990 | image, causing a bunch of commands to fail: |
| 991 | |
| 992 | * **`machinectl poweroff`** will fail because the container |
| 993 | isn’t running dbus. |
| 994 | * **`machinectl start`** will try to find an `/sbin/init` |
| 995 | program in the rootfs, which we haven’t got. We could |
| 996 | rename `/jail/bin/fossil` to `/sbin/init` and then hack |
| 997 | the chroot scheme to match, but ick. (This, incidentally, |
| 998 | is why we set `ProcessTwo=yes` above even though Fossil is |
| 999 | perfectly capable of running as PID 1, a fact we depend on |
| 1000 | in the other methods above.) |
| 1001 | * **`machinectl shell`** will fail because there is no login |
| 1002 | daemon running, which we purposefully avoided adding by |
| 1003 | creating a “`FROM scratch`” container. (If you need a |
| 1004 | shell, say: `sudo systemd-nspawn --machine=myproject /bin/sh`) |
| 1005 | * **`machinectl status`** won’t give you the container logs |
| 1006 | because we disabled the shared journal, which was in turn |
| 1007 | necessary because we don’t run `systemd` *inside* the |
| 1008 | container, just outside. |
| 1009 | |
| 1010 | If these are problems for you, you may wish to build a |
| 1011 | fatter container using `debootstrap` or similar. ([External |
| 1012 | tutorial][medtut].) |
| 1013 | |
| 1014 | 3. We disable the “private networking” feature since the whole |
| 1015 | point of this container is to expose a network service to the |
| 1016 | public, one way or another. If you do things the way the defaults |
| 1017 | (and thus the official docs) expect, you must push through |
| 1018 | [a whole lot of complexity][ndcmp] to re-expose this single |
| 1019 | network port. That complexity is justified only if your service |
| 1020 | is itself complex, having both private and public service ports. |
| 1021 | |
| 1022 | [medtut]: https://medium.com/@huljar/setting-up-containers-with-systemd-nspawn-b719cff0fb8d |
| 1023 | [ndcmp]: https://wiki.archlinux.org/title/systemd-networkd#Usage_with_containers |
| 1024 | |
| 1025 | <div style="height:50em" id="this-space-intentionally-left-blank"></div> |
| 1026 |