Fossil SCM
Mentioned containerd+nerdctl in place of runc in the containers doc. Tightened-up versions of the prior runc and crun sections are now collected below the Podman section. This gives a better flow: each successive option is smaller than the last, excepting only nspawn, which is a bit bigger than crun. (We leave nspawn last because we can't get it to work!)
Commit: 457c14a490c76ef168505d9e17456fe470ff6c47c3068038e25595905ef5a5ad
Parent: 19abf0ac13b6a28…
1 file changed, +237 -254
| --- www/containers.md | ||
| +++ www/containers.md | ||
| @@ -530,245 +530,45 @@ | ||
| 530 | 530 | [DD]: https://www.docker.com/products/docker-desktop/ |
| 531 | 531 | [DE]: https://docs.docker.com/engine/ |
| 532 | 532 | [DNT]: ./server/debian/nginx.md |
| 533 | 533 | |
| 534 | 534 | |
| 535 | -### 6.1 <a id="runc" name="containerd"></a>Stripping Docker Engine Down | |
| 535 | +### 6.1 <a id="nerdctl" name="containerd"></a>Stripping Docker Engine Down | |
| 536 | 536 | |
| 537 | 537 | The core of Docker Engine is its [`containerd`][ctrd] daemon and the |
| 538 | -[`runc`][runc] container runner. It’s possible to dig into the subtree | |
| 539 | -managed by `containerd` on the build host and extract what we need to | |
| 540 | -run our Fossil container elsewhere with `runc`, leaving out all the | |
| 541 | -rest. `runc` alone is about 18 MiB, and you can do without `containerd` | |
| 542 | -entirely, if you want. | |
| 543 | - | |
| 544 | -The method isn’t complicated, but it *is* cryptic enough to want a shell | |
| 545 | -script: | |
| 546 | - | |
| ----- | ||
| 547 | - | |
| 548 | -```shell | |
| 549 | -#!/bin/sh | |
| 550 | -c=fossil | |
| 551 | -b=$HOME/containers/$c | |
| 552 | -r=$b/rootfs | |
| 553 | -m=/run/containerd/io.containerd.runtime.v2.task/moby | |
| 554 | - | |
| 555 | -if [ -d "$t" ] && mkdir -p $r | |
| 556 | -then | |
| 557 | - docker container start $c | |
| 558 | - docker container export $c | sudo tar -C $r -xf - | |
| 559 | - id=$(docker inspect --format="{{.Id}}" $c) | |
| 560 | - sudo cat $m/$id/config.json | | |
| 561 | - jq '.root.path = "'$r'"' | | |
| 562 | - jq '.linux.cgroupsPath = ""' | | |
| 563 | - jq 'del(.linux.sysctl)' | | |
| 564 | - jq 'del(.linux.namespaces[] | select(.type == "network"))' | | |
| 565 | - jq 'del(.mounts[] | select(.destination == "/etc/hostname"))' | | |
| 566 | - jq 'del(.mounts[] | select(.destination == "/etc/resolv.conf"))' | | |
| 567 | - jq 'del(.mounts[] | select(.destination == "/etc/hosts"))' | | |
| 568 | - jq 'del(.hooks)' > $b/config.json | |
| 569 | -fi | |
| 570 | -``` | |
| 571 | - | |
| ----- | ||
| 572 | - | |
| 573 | -The first several lines list configurables: | |
| 574 | - | |
| 575 | -* **`b`**: the path of the exported container, called the “bundle” in OCI | |
| 576 | - jargon | |
| 577 | -* **`c`**: the name of the Docker container you’re bundling up for use | |
| 578 | - with `runc` | |
| 579 | -* **`m`**: the directory holding the running machines, configurable | |
| 580 | - because: | |
| 581 | - * it’s long | |
| 582 | - * it’s been known to change from one version of Docker to the next | |
| 583 | - * you might be using [Podman](#podman)/[`crun`](#crun), so it has | |
| 584 | - to be “`/run/user/$UID/crun`” instead | |
| 585 | -* **`r`**: the path of the directory containing the bundle’s root file | |
| 586 | - system. | |
| 587 | - | |
| 588 | -That last doesn’t have to be called `rootfs/`, and it doesn’t have to | |
| 589 | -live in the same directory as `config.json`, but it is conventional. | |
| 590 | -Because some OCI tools use those names as defaults, it’s best to follow | |
| 591 | -suit. | |
| 592 | - | |
| 593 | -The rest is generic, but you’re welcome to freestyle here. We’ll show an | |
| 594 | -example of this below. | |
| 595 | - | |
| 596 | -We’re using [jq] for two separate purposes: | |
| 597 | - | |
| 598 | -1. To automatically transmogrify Docker’s container configuration so it | |
| 599 | - will work with `runc`: | |
| 600 | - | |
| 601 | - * point it where we unpacked the container’s exported rootfs | |
| 602 | - * accede to its wish to [manage cgroups by itself][ecg] | |
| 603 | - * remove the `sysctl` calls that will break after… | |
| 604 | - * …we remove the network namespace to allow Fossil’s TCP listening | |
| 605 | - port to be available on the host; `runc` doesn’t offer the | |
| 606 | - equivalent of `docker create --publish`, and we can’t be | |
| 607 | - bothered to set up a manual mapping from the host port into the | |
| 608 | - container | |
| 609 | - * remove file bindings that point into the local runtime managed | |
| 610 | - directories; one of the things we give up by using a bare | |
| 611 | - container runner is automatic management of these files | |
| 612 | - * remove the hooks for essentially the same reason | |
| 613 | - | |
| 614 | -2. To make the Docker-managed machine-readable `config.json` more | |
| 615 | - human-readable, in case there are other things you want changed in | |
| 616 | - this version of the container. Exposing the `config.json` file like | |
| 617 | - this means you don’t have to rebuild the container merely to change | |
| 618 | - a value like a mount point, the kernel capability set, and so forth. | |
| 619 | - | |
| 620 | -<a id="why-sudo"></a> | |
| 621 | -We have to do this transformation of `config.json` as the local root | |
| 622 | -user because it isn’t readable by your normal user. Additionally, that | |
| 623 | -input file is only available while the container is started, which is | |
| 624 | -why we ensure that before exporting the container’s rootfs. | |
| 625 | - | |
| 626 | -With the container exported like this, you can start it as: | |
| 627 | - | |
| 628 | -``` | |
| 629 | - $ cd /path/to/bundle | |
| 630 | - $ c=any-name-you-like | |
| 631 | - $ sudo runc create $c | |
| 632 | - $ sudo runc start $c | |
| 633 | - $ sudo runc exec $c -t sh -l | |
| 634 | - ~ $ ls museum | |
| 635 | - repo.fossil | |
| 636 | - ~ $ ps -eaf | |
| 637 | - PID USER TIME COMMAND | |
| 638 | - 1 fossil 0:00 bin/fossil server --create … | |
| 639 | - ~ $ exit | |
| 640 | - $ sudo runc kill fossil-runc | |
| 641 | - $ sudo runc delete fossil-runc | |
| 642 | -``` | |
| 643 | - | |
| 644 | -If you’re doing this on the export host, the first command is “`cd $b`” | |
| 645 | -if we’re using the variables from the shell script above. We do this | |
| 646 | -because `runc` assumes you’re running it from the bundle directory. If | |
| 647 | -you prefer, the `runc` commands that care about this take a | |
| 648 | -`--bundle/-b` flag to let you avoid switching directories. | |
| 649 | - | |
| 650 | -The rest should be straightforward: create and start the container as | |
| 651 | -root so the `chroot(2)` call inside the container will succeed, then get | |
| 652 | -into it with a login shell and poke around to prove to ourselves that | |
| 653 | -everything is working properly. It is. Yay! | |
| 654 | - | |
| 655 | -The remaining commands show shutting the container down and destroying | |
| 656 | -it, simply to show how these commands change relative to using the | |
| 657 | -Docker Engine commands. It’s “kill,” not “stop,” and it’s “delete,” not | |
| 658 | -“rm.” | |
| 659 | - | |
| 660 | -If you want the bundle to run on a remote host, the local and remote | |
| 661 | -bundle directories likely will not match, as the shell script above | |
| 662 | -assumes. This is a more realistic shell script for that case: | |
| 663 | - | |
| ----- | ||
| 664 | - | |
| 665 | -```shell | |
| 666 | -#!/bin/bash -ex | |
| 667 | -c=fossil | |
| 668 | -b=/var/lib/machines/$c | |
| 669 | -h=my-host.example.com | |
| 670 | -m=/run/containerd/io.containerd.runtime.v2.task/moby | |
| 671 | -t=$(mktemp -d /tmp/$c-bundle.XXXXXX) | |
| 672 | - | |
| 673 | -if [ -d "$t" ] | |
| 674 | -then | |
| 675 | - docker container start $c | |
| 676 | - docker container export $c > $t/rootfs.tar | |
| 677 | - id=$(docker inspect --format="{{.Id}}" $c) | |
| 678 | - sudo cat $m/$id/config.json | | |
| 679 | - jq '.root.path = "'$b/rootfs'"' | | |
| 680 | - jq '.linux.cgroupsPath = ""' | | |
| 681 | - jq 'del(.linux.sysctl)' | | |
| 682 | - jq 'del(.linux.namespaces[] | select(.type == "network"))' | | |
| 683 | - jq 'del(.mounts[] | select(.destination == "/etc/hostname"))' | | |
| 684 | - jq 'del(.mounts[] | select(.destination == "/etc/resolv.conf"))' | | |
| 685 | - jq 'del(.mounts[] | select(.destination == "/etc/hosts"))' | | |
| 686 | - jq 'del(.hooks)' > $t/config.json | |
| 687 | - scp -r $t $h:tmp | |
| 688 | - ssh -t $h "{ | |
| 689 | - mv ./$t/config.json $b && | |
| 690 | - sudo tar -C $b/rootfs -xf ./$t/rootfs.tar && | |
| 691 | - rm -r ./$t | |
| 692 | - }" | |
| 693 | - rm -r $t | |
| 694 | -fi | |
| 695 | -``` | |
| 696 | - | |
| ----- | ||
| 697 | - | |
| 698 | -We’ve introduced two new variables: | |
| 699 | - | |
| 700 | -* **`h`**: the remote host name | |
| 701 | -* **`t`**: a temporary bundle directory we populate locally, then | |
| 702 | - `scp` to the remote machine, where it’s unpacked | |
| 703 | - | |
| 704 | -We dropped the **`r`** variable because now we have two different | |
| 705 | -“rootfs” types: the tarball and the unpacked version of that tarball. | |
| 706 | -To avoid confusing ourselves between these cases, we’ve replaced uses of | |
| 707 | -`$r` with explicit paths. | |
| 708 | - | |
| 709 | -You need to be aware that this script uses `sudo` for two different purposes: | |
| 710 | - | |
| 711 | -1. To read the local `config.json` file out of the `containerd` managed | |
| 712 | - directory. ([Details above](#why-sudo).) | |
| 713 | - | |
| 714 | -2. To unpack the bundle onto the remote machine. If you try to get | |
| 715 | - clever and unpack it locally, then `rsync` it to the remote host to | |
| 716 | - avoid re-copying files that haven’t changed since the last update, | |
| 717 | - you’ll find that it fails when it tries to copy device nodes, to | |
| 718 | - create files owned only by the remote root user, and so forth. If the | |
| 719 | - container bundle is small, it’s simpler to re-copy and unpack it | |
| 720 | - fresh each time. | |
| 721 | - | |
| 722 | -I point that out because it might ask for your password twice: once for | |
| 723 | -the local sudo command, and once for the remote. | |
| 724 | - | |
| 725 | -The default for the **`b`** variable is the convention for systemd based | |
| 726 | -machines, which will play into the [`nspawn` alternative below][sdnsp]. | |
| 727 | -Even if you aren’t using `nspawn`, it’s a reasonable place to put | |
| 728 | -containers under the [Linux FHS rules][LFHS]. | |
| 729 | - | |
| 730 | -[ctrd]: https://containerd.io/ | |
| 731 | -[ecg]: https://github.com/opencontainers/runc/pull/3131 | |
| 732 | -[LFHS]: https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard | |
| 733 | -[jq]: https://stedolan.github.io/jq/ | |
| 734 | -[sdnsp]: #nspawn | |
| 735 | -[runc]: https://github.com/opencontainers/runc | |
| 538 | +[`runc`][runc] container runner. Add to this the out-of-core CLI program | |
| 539 | +[`nerdctl`][nerdctl] and you have enough of the engine to run Fossil | |
| 540 | +containers. The big things you’re missing are: | |
| 541 | + | |
| 542 | +* **BuildKit**: The container build engine, which doesn’t matter if | |
| 543 | + you’re building elsewhere and using a container registry as an | |
| 544 | + intermediary between that build host and the deployment host. | |
| 545 | + | |
| 546 | +* **SwarmKit**: A powerful yet simple orchestrator for Docker that you | |
| 547 | + probably aren’t using with Fossil anyway. | |
| 548 | + | |
| 549 | +In exchange, you get a runtime that’s about half the size of Docker | |
| 550 | +Engine. The commands are essentially the same as above, but you say | |
| 551 | +“`nerdctl`” instead of “`docker`”. You might alias one to the other, | |
| 552 | +because you’re still going to be using Docker to build and ship your | |
| 553 | +container images. | |
| 554 | + | |
| 555 | +[ctrd]: https://containerd.io/ | |
| 556 | +[nerdctl]: https://github.com/containerd/nerdctl | |
| 557 | +[runc]: https://github.com/opencontainers/runc | |
| 736 | 558 | |
| 737 | 559 | |
| 738 | 560 | ### 6.2 <a id="podman"></a>Podman |
| 739 | 561 | |
| 740 | -Although your humble author claims the `runc` methods above are not | |
| 741 | -complicated, merely cryptic, you might be fondly recollecting the | |
| 742 | -carefree commands at the top of this document, pondering whether you can | |
| 743 | -live without the abstractions a proper container runtime system | |
| 744 | -provides. | |
| 745 | - | |
| 746 | -More than that, there’s a hidden cost to the `runc` method: there is no | |
| 747 | -layer sharing among containers. If you have multiple Fossil containers | |
| 748 | -on a single host — perhaps because each serves an independent section of | |
| 749 | -the overall web site — and you export them to a remote host using the | |
| 750 | -shell script above, you’ll end up with redundant copies of the `rootfs` | |
| 751 | -in each. A proper OCI container runtime knows they’re all derived from | |
| 752 | -the same base image, differing only in minor configuration details, | |
| 753 | -giving us one of the major advantages of containerization: if none of | |
| 754 | -the running containers can change these immutable base layers, it | |
| 755 | -doesn’t have to copy them. | |
| 756 | - | |
| 757 | -A lighter-weight alternative to Docker Engine that doesn’t give up so | |
| 758 | -many of its administrator affordances is [Podman], initially created by | |
| 759 | -Red Hat and thus popular on that family of OSes, although it will run on | |
| 562 | +A lighter-weight alternative to either of the prior options that doesn’t | |
| 563 | +give up the image builder is [Podman]. Initially created by | |
| 564 | +Red Hat and thus popular on that family of OSes, it will run on | |
| 760 | 565 | any flavor of Linux. It can even be made to run [on macOS via Homebrew][pmmac] |
| 761 | 566 | or [on Windows via WSL2][pmwin]. |
| 762 | 567 | |
| 763 | -On Ubuntu 22.04, it’s about a quarter the size of Docker Engine. That | |
| 764 | -isn’t nearly so slim as `runc`, but we may be willing to pay this | |
| 765 | -overhead to get shorter and fewer commands. | |
| 568 | +On Ubuntu 22.04, it’s about a quarter the size of Docker Engine, or half | |
| 569 | +that of the “full” distribution of `nerdctl` and all its dependencies. | |
| 766 | 570 | |
| 767 | 571 | Although Podman [bills itself][whatis] as a drop-in replacement for the |
| 768 | 572 | `docker` command and everything that sits behind it, some of the tool’s |
| 769 | 573 | design decisions affect how our Fossil containers run, as compared to |
| 770 | 574 | using Docker. The most important of these is that, by default, Podman |
| @@ -817,39 +617,14 @@ | ||
| 817 | 617 | they’ll be connected to the network the container runs on. Once the bad |
| 818 | 618 | guy is inside the house, he doesn’t necessarily have to go after the |
| 819 | 619 | residents directly to cause problems for them. |
| 820 | 620 | |
| 821 | 621 | |
| 822 | -#### 6.2.2 <a id="crun"></a>`crun` | |
| 823 | - | |
| 824 | -In the same way that [Docker Engine is based on `runc`](#runc), Podman’s | |
| 825 | -engine is based on [`crun`][crun], a lighter-weight alternative to | |
| 826 | -`runc`. It’s only 1.4 MiB on the system I tested it on, yet it will run | |
| 827 | -the same container bundles as in my `runc` examples above. | |
| 828 | -Above, we saved more than that by compressing the container’s Fossil | |
| 829 | -executable with UPX! | |
| 830 | - | |
| 831 | -This makes `crun` a great option for tiny remote hosts with a single | |
| 832 | -container, or at least where none of the containers share base layers, | |
| 833 | -so that there is no effective cost to duplicating the immutable base | |
| 834 | -layers of the containers’ source images. | |
| 835 | - | |
| 836 | -This suggests one method around the problem of rootless Podman containers: | |
| 837 | -`sudo crun`, following the examples above. | |
| 838 | - | |
| 839 | -[crun]: https://github.com/containers/crun | |
| 840 | - | |
| 841 | - | |
| 842 | -#### 6.2.3 <a id="podman-rootful"></a>Fossil in a Rootful Podman Container | |
| 622 | +#### 6.2.2 <a id="podman-rootful"></a>Fossil in a Rootful Podman Container | |
| 843 | 623 | |
| 844 | 624 | ##### Simple Method |
| 845 | 625 | |
| 846 | -As we saw above with `runc`, switching to `crun` just to get your | |
| 847 | -containers to run as root loses a lot of functionality and requires a | |
| 848 | -bunch of cryptic commands to get the same effect as a single command | |
| 849 | -under Podman. | |
| 850 | - | |
| 851 | 626 | Fortunately, it’s easy enough to have it both ways. Simply run your |
| 852 | 627 | `podman` commands as root: |
| 853 | 628 | |
| 854 | 629 | ``` |
| 855 | 630 | $ sudo podman build -t fossil --cap-add MKNOD . |
| @@ -875,11 +650,11 @@ | ||
| 875 | 650 | it’s done inside a container runtime’s build environment doesn’t mean we |
| 876 | 651 | can get away without root privileges to do things like create the |
| 877 | 652 | `/jail/dev/null` node. |
| 878 | 653 | |
| 879 | 654 | The other reason we need “`sudo podman build`” is because it puts the result |
| 880 | -into root’s Podman image repository, where the next steps look for it. | |
| 655 | +into root’s Podman image registry, where the next steps look for it. | |
| 881 | 656 | |
| 882 | 657 | That in turn explains why we need “`sudo podman create`:” because it’s |
| 883 | 658 | creating a container based on an image that was created by root. If you |
| 884 | 659 | ran that step without `sudo`, it wouldn’t be able to find the image. |
| 885 | 660 | |
| @@ -927,23 +702,227 @@ | ||
| 927 | 702 | $ sudo podman create \ |
| 928 | 703 | --any-options-you-like \ |
| 929 | 704 | docker.io/mydockername/fossil |
| 930 | 705 | ``` |
| 931 | 706 | |
| 932 | -This round-trip through the public image repository has another side | |
| 707 | +This round-trip through the public image registry has another side | |
| 933 | 708 | benefit: your local system might be a lot faster than your remote one, |
| 934 | 709 | as when the remote is a small VPS. Even with the overhead of schlepping |
| 935 | 710 | container images across the Internet, it can be a net win in terms of |
| 936 | 711 | build time. |
| 937 | 712 | |
| 938 | 713 | |
| 939 | 714 | |
| 940 | -### 6.3 <a id="nspawn"></a>`systemd-nspawn` | |
| 715 | +### 6.3 <a id="barebones"></a>Bare-Bones OCI Bundle Runners | |
| 716 | + | |
| 717 | +If even the Podman stack is too big for you, you still have options for | |
| 718 | +running containers that are considerably slimmer, at a high cost to | |
| 719 | +administration complexity and loss of features. | |
| 720 | + | |
| 721 | +Part of the OCI standard is the notion of a “bundle,” being a consistent | |
| 722 | +way to present a pre-built and configured container to the runtime. | |
| 723 | +Essentially, it consists of a directory containing a `config.json` file | |
| 724 | +and a `rootfs/` subdirectory containing the root filesystem image. Many | |
| 725 | +tools can produce these for you. We’ll show only one method in the first | |
| 726 | +section below, then reuse that in the following sections. | |
| 727 | + | |
| 728 | + | |
| 729 | +#### 6.3.1 <a id="runc"></a>`runc` | |
| 730 | + | |
| 731 | +We mentioned `runc` [above](#nerdctl), but it’s possible to use it | |
| 732 | +standalone, without `containerd` or its CLI frontend `nerdctl`. You also | |
| 733 | +lose the build engine, intelligent image layer sharing, image registry | |
| 734 | +connections, and much more. The plus side is that `runc` alone is | |
| 735 | +18 MiB. | |
| 736 | + | |
| 737 | +Using it without all the support tooling isn’t complicated, but it *is* | |
| 738 | +cryptic enough to want a shell script. Let’s say we want to build on our | |
| 739 | +big desktop machine but ship the resulting container to a small remote | |
| 740 | +host. This should serve: | |
| 741 | + | |
| 742 | +---- | |
| 743 | + | |
| 744 | +```shell | |
| 745 | +#!/bin/bash -ex | |
| 746 | +c=fossil | |
| 747 | +b=/var/lib/machines/$c | |
| 748 | +h=my-host.example.com | |
| 749 | +m=/run/containerd/io.containerd.runtime.v2.task/moby | |
| 750 | +t=$(mktemp -d /tmp/$c-bundle.XXXXXX) | |
| 751 | + | |
| 752 | +if [ -d "$t" ] | |
| 753 | +then | |
| 754 | + docker container start $c | |
| 755 | + docker container export $c > $t/rootfs.tar | |
| 756 | + id=$(docker inspect --format="{{.Id}}" $c) | |
| 757 | + sudo cat $m/$id/config.json \ | |
| 758 | + | jq '.root.path = "'$b/rootfs'"' | |
| 759 | + | jq '.linux.cgroupsPath = ""' | |
| 760 | + | jq 'del(.linux.sysctl)' | |
| 761 | + | jq 'del(.linux.namespaces[] | select(.type == "network"))' | |
| 762 | + | jq 'del(.mounts[] | select(.destination == "/etc/hostname"))' | |
| 763 | + | jq 'del(.mounts[] | select(.destination == "/etc/resolv.conf"))' | |
| 764 | + | jq 'del(.mounts[] | select(.destination == "/etc/hosts"))' | |
| 765 | + | jq 'del(.hooks)' > $t/config.json | |
| 766 | + scp -r $t $h:tmp | |
| 767 | + ssh -t $h "{ | |
| 768 | + mv ./$t/config.json $b && | |
| 769 | + sudo tar -C $b/rootfs -xf ./$t/rootfs.tar && | |
| 770 | + rm -r ./$t | |
| 771 | + }" | |
| 772 | + rm -r $t | |
| 773 | +fi | |
| 774 | +``` | |
| 775 | + | |
| 776 | +---- | |
| 777 | + | |
| 778 | +The first several lines list configurables: | |
| 779 | + | |
| 780 | +* **`c`**: the name of the Docker container you’re bundling up for use | |
| 781 | + with `runc` | |
| 782 | +* **`b`**: the path of the exported container, called the “bundle” in | |
| 783 | + OCI jargon; we’re using the [`nspawn`](#nspawn) convention, a | |
| 784 | + reasonable choice under the [Linux FHS rules][LFHS] | |
| 785 | +* **`h`**: the remote host name | |
| 786 | +* **`m`**: the local directory holding the running machines, configurable | |
| 787 | + because: | |
| 788 | + * the path name is longer than we want to use inline | |
| 789 | + * it’s been known to change from one version of Docker to the next | |
| 790 | + * you might be building and testing with [Podman](#podman), so it | |
| 791 | + has to be “`/run/user/$UID/crun`” instead | |
| 792 | +* **`t`**: the temporary bundle directory we populate locally, then | |
| 793 | + `scp` to the remote machine, where it’s unpacked | |
| 794 | + | |
| 795 | +[LFHS]: https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard | |
| 796 | + | |
| 797 | + | |
| 798 | +##### Why All That `sudo` Stuff? | |
| 799 | + | |
| 800 | +This script uses `sudo` for two different purposes: | |
| 801 | + | |
| 802 | +1. To read the local `config.json` file out of the `containerd` managed | |
| 803 | + directory, which is owned by `root` on Docker systems. Additionally, | |
| 804 | + that input file is only available while the container is started, so | |
| 805 | + we must ensure that before extracting it. | |
| 806 | + | |
| 807 | +2. To unpack the bundle onto the remote machine. If you try to get | |
| 808 | + clever and unpack it locally, then `rsync` it to the remote host to | |
| 809 | + avoid re-copying files that haven’t changed since the last update, | |
| 810 | + you’ll find that it fails when it tries to copy device nodes, to | |
| 811 | + create files owned only by the remote root user, and so forth. If the | |
| 812 | + container bundle is small, it’s simpler to re-copy and unpack it | |
| 813 | + fresh each time. | |
| 814 | + | |
| 815 | +I point all this out because it might ask for your password twice: once for | |
| 816 | +the local sudo command, and once for the remote. | |
| 817 | + | |
| 818 | + | |
| 819 | + | |
| 820 | +##### Why All That `jq` Stuff? | |
| 821 | + | |
| 822 | +We’re using [jq] for two separate purposes: | |
| 823 | + | |
| 824 | +1. To automatically transmogrify Docker’s container configuration so it | |
| 825 | + will work with `runc`: | |
| 826 | + | |
| 827 | + * point it where we unpacked the container’s exported rootfs | |
| 828 | + * accede to its wish to [manage cgroups by itself][ecg] | |
| 829 | + * remove the `sysctl` calls that will break after… | |
| 830 | + * …we remove the network namespace to allow Fossil’s TCP listening | |
| 831 | + port to be available on the host; `runc` doesn’t offer the | |
| 832 | + equivalent of `docker create --publish`, and we can’t be | |
| 833 | + bothered to set up a manual mapping from the host port into the | |
| 834 | + container | |
| 835 | + * remove file bindings that point into the local runtime managed | |
| 836 | + directories; one of the things we give up by using a bare | |
| 837 | + container runner is automatic management of these files | |
| 838 | + * remove the hooks for essentially the same reason | |
| 839 | + | |
| 840 | +2. To make the Docker-managed machine-readable `config.json` more | |
| 841 | + human-readable, in case there are other things you want changed in | |
| 842 | + this version of the container. Exposing the `config.json` file like | |
| 843 | + this means you don’t have to rebuild the container merely to change | |
| 844 | + a value like a mount point, the kernel capability set, and so forth. | |
| 845 | + | |
| 846 | + | |
| 847 | +##### Running the Bundle | |
| 848 | + | |
| 849 | +With the container exported to a bundle like this, you can start it as: | |
| 850 | + | |
| 851 | +``` | |
| 852 | + $ cd /path/to/bundle | |
| 853 | + $ c=fossil-runc ← …or anything else you prefer | |
| 854 | + $ sudo runc create $c | |
| 855 | + $ sudo runc start $c | |
| 856 | + $ sudo runc exec $c -t sh -l | |
| 857 | + ~ $ ls museum | |
| 858 | + repo.fossil | |
| 859 | + ~ $ ps -eaf | |
| 860 | + PID USER TIME COMMAND | |
| 861 | + 1 fossil 0:00 bin/fossil server --create … | |
| 862 | + ~ $ exit | |
| 863 | + $ sudo runc kill $c | |
| 864 | + $ sudo runc delete $c | |
| 865 | +``` | |
| 866 | + | |
| 867 | +If you’re doing this on the export host, the first command is “`cd $b`” | |
| 868 | +if we’re using the variables from the shell script above. Alternately, | |
| 869 | +the `runc` subcommands that need to read the bundle files take a | |
| 870 | +`--bundle/-b` flag to let you avoid switching directories. | |
| 871 | + | |
| 872 | +The rest should be straightforward: create and start the container as | |
| 873 | +root so the `chroot(2)` call inside the container will succeed, then get | |
| 874 | +into it with a login shell and poke around to prove to ourselves that | |
| 875 | +everything is working properly. It is. Yay! | |
| 876 | + | |
| 877 | +The remaining commands show shutting the container down and destroying | |
| 878 | +it, simply to show how these commands change relative to using the | |
| 879 | +Docker Engine commands. It’s “kill,” not “stop,” and it’s “delete,” not | |
| 880 | +“rm.” | |
| 881 | + | |
| 882 | +[ecg]: https://github.com/opencontainers/runc/pull/3131 | |
| 883 | +[jq]: https://stedolan.github.io/jq/ | |
| 884 | + | |
| 885 | + | |
| 886 | +##### Lack of Layer Sharing | |
| 887 | + | |
| 888 | +The bundle export process collapses Docker’s union filesystem down to a | |
| 889 | +single layer. Atop that, it makes all files mutable. | |
| 890 | + | |
| 891 | +All of this is fine for tiny remote hosts with a single container, or at | |
| 892 | +least one where none of the containers share base layers. Where it | |
| 893 | +becomes a problem is when you have multiple Fossil containers on a | |
| 894 | +single host, since they all derive from the same base image. | |
| 895 | + | |
| 896 | +The full-featured container runtimes above will intelligently share | |
| 897 | +these immutable base layers among the containers, storing only the | |
| 898 | +differences in each individual container. More, when pulling images from | |
| 899 | +a registry host, they’ll transfer only the layers you don’t have copies | |
| 900 | +of locally, so you don’t have to burn bandwidth sending copies of Alpine | |
| 901 | +and BusyBox each time, even though they’re unlikely to change from one | |
| 902 | +build to the next. | |
| 903 | + | |
| 904 | + | |
| 905 | +#### 6.3.2 <a id="crun"></a>`crun` | |
| 906 | + | |
| 907 | +In the same way that [Docker Engine is based on `runc`](#runc), Podman’s | |
| 908 | +engine is based on [`crun`][crun], a lighter-weight alternative to | |
| 909 | +`runc`. It’s only 1.4 MiB on the system I tested it on, yet it will run | |
| 910 | +the same container bundles as in my `runc` examples above. We saved | |
| 911 | +more than that by compressing the container’s Fossil executable with | |
| 912 | +UPX, making the runtime virtually free in this case. The only question | |
| 913 | +is whether you can put up with its limitations, which are the same as | |
| 914 | +for `runc`. | |
| 915 | + | |
| 916 | +[crun]: https://github.com/containers/crun | |
| 917 | + | |
| 918 | + | |
| 919 | +#### 6.3.3 <a id="nspawn"></a>`systemd-nspawn` | |
| 941 | 920 | |
| 942 | 921 | As of `systemd` version 242, its optional `nspawn` piece |
| 943 | 922 | [reportedly](https://www.phoronix.com/news/Systemd-Nspawn-OCI-Runtime) |
| 944 | -now has the ability to run OCI container bundles directly. You might | |
| 923 | +got the ability to run OCI bundles directly. You might | |
| 945 | 924 | have it installed already, but if not, it’s only about 2 MiB. It’s |
| 946 | 925 | in the `systemd-containers` package as of Ubuntu 22.04 LTS: |
| 947 | 926 | |
| 948 | 927 | ``` |
| 949 | 928 | $ sudo apt install systemd-containers |
| @@ -963,12 +942,12 @@ | ||
| 963 | 942 | --port=127.0.0.1:127.0.0.1:9999:8080 |
| 964 | 943 | $ sudo machinectl list |
| 965 | 944 | No machines. |
| 966 | 945 | ``` |
| 967 | 946 | |
| 968 | -This is why I wrote “reportedly” above: it doesn’t work on two different | |
| 969 | -Linux distributions, and I can’t see why. I’m putting this here to give | |
| 947 | +This is why I wrote “reportedly” above: I couldn’t get it to work on two different | |
| 948 | +Linux distributions, and I can’t see why. I’m leaving this here to give | |
| 970 | 949 | someone else a leg up, with the hope that they will work out what’s |
| 971 | 950 | needed to get the container running and registered with `machinectl`. |
| 972 | 951 | |
| 973 | 952 | As of this writing, the tool expects an OCI container version of |
| 974 | 953 | “1.0.0”. I had to edit this at the top of my `config.json` file to get |
| 975 | 954 |
| 607 | bothered to set up a manual mapping from the host port into the |
| 608 | container |
| 609 | * remove file bindings that point into the local runtime managed |
| 610 | directories; one of the things we give up by using a bare |
| 611 | container runner is automatic management of these files |
| 612 | * remove the hooks for essentially the same reason |
| 613 | |
| 614 | 2. To make the Docker-managed machine-readable `config.json` more |
| 615 | human-readable, in case there are other things you want changed in |
| 616 | this version of the container. Exposing the `config.json` file like |
| 617 | this means you don’t have to rebuild the container merely to change |
| 618 | a value like a mount point, the kernel capability set, and so forth. |
| 619 | |
| 620 | <a id="why-sudo"></a> |
| 621 | We have to do this transformation of `config.json` as the local root |
| 622 | user because it isn’t readable by your normal user. Additionally, that |
| 623 | input file is only available while the container is started, which is |
| 624 | why we ensure that before exporting the container’s rootfs. |
| 625 | |
| 626 | With the container exported like this, you can start it as: |
| 627 | |
| 628 | ``` |
| 629 | $ cd /path/to/bundle |
| 630 | $ c=any-name-you-like |
| 631 | $ sudo runc create $c |
| 632 | $ sudo runc start $c |
| 633 | $ sudo runc exec -t $c sh -l |
| 634 | ~ $ ls museum |
| 635 | repo.fossil |
| 636 | ~ $ ps -eaf |
| 637 | PID USER TIME COMMAND |
| 638 | 1 fossil 0:00 bin/fossil server --create … |
| 639 | ~ $ exit |
| 640 | $ sudo runc kill $c |
| 641 | $ sudo runc delete $c |
| 642 | ``` |
| 643 | |
| 644 | If you’re doing this on the export host, the first command is “`cd $b`” |
| 645 | if we’re using the variables from the shell script above. We do this |
| 646 | because `runc` assumes you’re running it from the bundle directory. If |
| 647 | you prefer, the `runc` commands that care about this take a |
| 648 | `--bundle/-b` flag to let you avoid switching directories. |
| 649 | |
| 650 | The rest should be straightforward: create and start the container as |
| 651 | root so the `chroot(2)` call inside the container will succeed, then get |
| 652 | into it with a login shell and poke around to prove to ourselves that |
| 653 | everything is working properly. It is. Yay! |
| 654 | |
| 655 | The remaining commands show shutting the container down and destroying |
| 656 | it, simply to show how these commands change relative to using the |
| 657 | Docker Engine commands. It’s “kill,” not “stop,” and it’s “delete,” not |
| 658 | “rm.” |
| 659 | |
| 660 | If you want the bundle to run on a remote host, the local and remote |
| 661 | bundle directories likely will not match, as the shell script above |
| 662 | assumes. This is a more realistic shell script for that case: |
| 663 | |
| ----- | |
| 664 | |
| 665 | ```shell |
| 666 | #!/bin/bash -ex |
| 667 | c=fossil |
| 668 | b=/var/lib/machines/$c |
| 669 | h=my-host.example.com |
| 670 | m=/run/containerd/io.containerd.runtime.v2.task/moby |
| 671 | t=$(mktemp -d /tmp/$c-bundle.XXXXXX) |
| 672 | |
| 673 | if [ -d "$t" ] |
| 674 | then |
| 675 | docker container start $c |
| 676 | docker container export $c > $t/rootfs.tar |
| 677 | id=$(docker inspect --format="{{.Id}}" $c) |
| 678 | sudo cat $m/$id/config.json | |
| 679 | jq '.root.path = "'$b/rootfs'"' | |
| 680 | jq '.linux.cgroupsPath = ""' | |
| 681 | jq 'del(.linux.sysctl)' | |
| 682 | jq 'del(.linux.namespaces[] | select(.type == "network"))' | |
| 683 | jq 'del(.mounts[] | select(.destination == "/etc/hostname"))' | |
| 684 | jq 'del(.mounts[] | select(.destination == "/etc/resolv.conf"))' | |
| 685 | jq 'del(.mounts[] | select(.destination == "/etc/hosts"))' | |
| 686 | jq 'del(.hooks)' > $t/config.json |
| 687 | scp -r $t $h:tmp |
| 688 | ssh -t $h "{ |
| 689 | mv ./$t/config.json $b && |
| 690 | sudo tar -C $b/rootfs -xf ./$t/rootfs.tar && |
| 691 | rm -r ./$t |
| 692 | }" |
| 693 | rm -r $t |
| 694 | fi |
| 695 | ``` |
| 696 | |
| ----- | |
| 697 | |
| 698 | We’ve introduced two new variables: |
| 699 | |
| 700 | * **`h`**: the remote host name |
| 701 | * **`t`**: a temporary bundle directory we populate locally, then |
| 702 | `scp` to the remote machine, where it’s unpacked |
| 703 | |
| 704 | We dropped the **`r`** variable because now we have two different |
| 705 | “rootfs” types: the tarball and the unpacked version of that tarball. |
| 706 | To avoid confusing ourselves between these cases, we’ve replaced uses of |
| 707 | `$r` with explicit paths. |
| 708 | |
| 709 | You need to be aware that this script uses `sudo` for two different purposes: |
| 710 | |
| 711 | 1. To read the local `config.json` file out of the `containerd` managed |
| 712 | directory. ([Details above](#why-sudo).) |
| 713 | |
| 714 | 2. To unpack the bundle onto the remote machine. If you try to get |
| 715 | clever and unpack it locally, then `rsync` it to the remote host to |
| 716 | avoid re-copying files that haven’t changed since the last update, |
| 717 | you’ll find that it fails when it tries to copy device nodes, to |
| 718 | create files owned only by the remote root user, and so forth. If the |
| 719 | container bundle is small, it’s simpler to re-copy and unpack it |
| 720 | fresh each time. |
| 721 | |
| 722 | I point that out because it might ask for your password twice: once for |
| 723 | the local sudo command, and once for the remote. |
| 724 | |
| 725 | The default for the **`b`** variable is the convention for systemd based |
| 726 | machines, which will play into the [`nspawn` alternative below][sdnsp]. |
| 727 | Even if you aren’t using `nspawn`, it’s a reasonable place to put |
| 728 | containers under the [Linux FHS rules][LFHS]. |
| 729 | |
| 730 | [ctrd]: https://containerd.io/ |
| 731 | [ecg]: https://github.com/opencontainers/runc/pull/3131 |
| 732 | [LFHS]: https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard |
| 733 | [jq]: https://stedolan.github.io/jq/ |
| 734 | [sdnsp]: #nspawn |
| 735 | [runc]: https://github.com/opencontainers/runc |
| 736 | |
| 737 | |
| 738 | ### 6.2 <a id="podman"></a>Podman |
| 739 | |
| 740 | Although your humble author claims the `runc` methods above are not |
| 741 | complicated, merely cryptic, you might be fondly recollecting the |
| 742 | carefree commands at the top of this document, pondering whether you can |
| 743 | live without the abstractions a proper container runtime system |
| 744 | provides. |
| 745 | |
| 746 | More than that, there’s a hidden cost to the `runc` method: there is no |
| 747 | layer sharing among containers. If you have multiple Fossil containers |
| 748 | on a single host — perhaps because each serves an independent section of |
| 749 | the overall web site — and you export them to a remote host using the |
| 750 | shell script above, you’ll end up with redundant copies of the `rootfs` |
| 751 | in each. A proper OCI container runtime knows they’re all derived from |
| 752 | the same base image, differing only in minor configuration details, |
| 753 | giving us one of the major advantages of containerization: if none of |
| 754 | the running containers can change these immutable base layers, it |
| 755 | doesn’t have to copy them. |
| 756 | |
| 757 | A lighter-weight alternative to Docker Engine that doesn’t give up so |
| 758 | many of its administrator affordances is [Podman], initially created by |
| 759 | Red Hat and thus popular on that family of OSes, although it will run on |
| 760 | any flavor of Linux. It can even be made to run [on macOS via Homebrew][pmmac] |
| 761 | or [on Windows via WSL2][pmwin]. |
| 762 | |
| 763 | On Ubuntu 22.04, it’s about a quarter the size of Docker Engine. That |
| 764 | isn’t nearly so slim as `runc`, but we may be willing to pay this |
| 765 | overhead to get shorter and fewer commands. |
| 766 | |
| 767 | Although Podman [bills itself][whatis] as a drop-in replacement for the |
| 768 | `docker` command and everything that sits behind it, some of the tool’s |
| 769 | design decisions affect how our Fossil containers run, as compared to |
| 770 | using Docker. The most important of these is that, by default, Podman |
| @@ -817,39 +617,14 @@ | |
| 817 | they’ll be connected to the network the container runs on. Once the bad |
| 818 | guy is inside the house, he doesn’t necessarily have to go after the |
| 819 | residents directly to cause problems for them. |
| 820 | |
| 821 | |
| 822 | #### 6.2.2 <a id="crun"></a>`crun` |
| 823 | |
| 824 | In the same way that [Docker Engine is based on `runc`](#runc), Podman’s |
| 825 | engine is based on [`crun`][crun], a lighter-weight alternative to |
| 826 | `runc`. It’s only 1.4 MiB on the system I tested it on, yet it will run |
| 827 | the same container bundles as in my `runc` examples above. |
| 828 | Above, we saved more than that by compressing the container’s Fossil |
| 829 | executable with UPX! |
| 830 | |
| 831 | This makes `crun` a great option for tiny remote hosts with a single |
| 832 | container, or at least where none of the containers share base layers, |
| 833 | so that there is no effective cost to duplicating the immutable base |
| 834 | layers of the containers’ source images. |
| 835 | |
| 836 | This suggests one method around the problem of rootless Podman containers: |
| 837 | `sudo crun`, following the examples above. |
| 838 | |
| 839 | [crun]: https://github.com/containers/crun |
| 840 | |
| 841 | |
| 842 | #### 6.2.3 <a id="podman-rootful"></a>Fossil in a Rootful Podman Container |
| 843 | |
| 844 | ##### Simple Method |
| 845 | |
| 846 | As we saw above with `runc`, switching to `crun` just to get your |
| 847 | containers to run as root loses a lot of functionality and requires a |
| 848 | bunch of cryptic commands to get the same effect as a single command |
| 849 | under Podman. |
| 850 | |
| 851 | Fortunately, it’s easy enough to have it both ways. Simply run your |
| 852 | `podman` commands as root: |
| 853 | |
| 854 | ``` |
| 855 | $ sudo podman build -t fossil --cap-add MKNOD . |
| @@ -875,11 +650,11 @@ | |
| 875 | it’s done inside a container runtime’s build environment doesn’t mean we |
| 876 | can get away without root privileges to do things like create the |
| 877 | `/jail/dev/null` node. |
| 878 | |
| 879 | The other reason we need “`sudo podman build`” is because it puts the result |
| 880 | into root’s Podman image repository, where the next steps look for it. |
| 881 | |
| 882 | That in turn explains why we need “`sudo podman create`:” because it’s |
| 883 | creating a container based on an image that was created by root. If you |
| 884 | ran that step without `sudo`, it wouldn’t be able to find the image. |
| 885 | |
| @@ -927,23 +702,227 @@ | |
| 927 | $ sudo podman create \ |
| 928 | --any-options-you-like \ |
| 929 | docker.io/mydockername/fossil |
| 930 | ``` |
| 931 | |
| 932 | This round-trip through the public image repository has another side |
| 933 | benefit: your local system might be a lot faster than your remote one, |
| 934 | as when the remote is a small VPS. Even with the overhead of schlepping |
| 935 | container images across the Internet, it can be a net win in terms of |
| 936 | build time. |
| 937 | |
| 938 | |
| 939 | |
| 940 | ### 6.3 <a id="nspawn"></a>`systemd-nspawn` |
| 941 | |
| 942 | As of `systemd` version 242, its optional `nspawn` piece |
| 943 | [reportedly](https://www.phoronix.com/news/Systemd-Nspawn-OCI-Runtime) |
| 944 | now has the ability to run OCI container bundles directly. You might |
| 945 | have it installed already, but if not, it’s only about 2 MiB. It’s |
| 946 | in the `systemd-container` package as of Ubuntu 22.04 LTS: |
| 947 | |
| 948 | ``` |
| 949 | $ sudo apt install systemd-container |
| @@ -963,12 +942,12 @@ | |
| 963 | --port=127.0.0.1:127.0.0.1:9999:8080 |
| 964 | $ sudo machinectl list |
| 965 | No machines. |
| 966 | ``` |
| 967 | |
| 968 | This is why I wrote “reportedly” above: it doesn’t work on two different |
| 969 | Linux distributions, and I can’t see why. I’m putting this here to give |
| 970 | someone else a leg up, with the hope that they will work out what’s |
| 971 | needed to get the container running and registered with `machinectl`. |
| 972 | |
| 973 | As of this writing, the tool expects an OCI container version of |
| 974 | “1.0.0”. I had to edit this at the top of my `config.json` file to get |
| 975 |
| --- www/containers.md | |
| +++ www/containers.md | |
| @@ -530,245 +530,45 @@ | |
| 530 | [DD]: https://www.docker.com/products/docker-desktop/ |
| 531 | [DE]: https://docs.docker.com/engine/ |
| 532 | [DNT]: ./server/debian/nginx.md |
| 533 | |
| 534 | |
| 535 | ### 6.1 <a id="nerdctl" name="containerd"></a>Stripping Docker Engine Down |
| 536 | |
| 537 | The core of Docker Engine is its [`containerd`][ctrd] daemon and the |
| 538 | [`runc`][runc] container runner. Add to this the separately distributed CLI program |
| 539 | [`nerdctl`][nerdctl] and you have enough of the engine to run Fossil |
| 540 | containers. The big things you’re missing are: |
| 541 | |
| 542 | * **BuildKit**: The container build engine, which doesn’t matter if |
| 543 | you’re building elsewhere and using a container registry as an |
| 544 | intermediary between that build host and the deployment host. |
| 545 | |
| 546 | * **SwarmKit**: A powerful yet simple orchestrator for Docker that you |
| 547 | probably aren’t using with Fossil anyway. |
| 548 | |
| 549 | In exchange, you get a runtime that’s about half the size of Docker |
| 550 | Engine. The commands are essentially the same as above, but you say |
| 551 | “`nerdctl`” instead of “`docker`”. You might alias one to the other, |
| 552 | because you’re still going to be using Docker to build and ship your |
| 553 | container images. |
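A plain `alias docker=nerdctl` works at an interactive prompt. It won't help inside scripts, though, because bash doesn't expand aliases in non-interactive shells unless `expand_aliases` is set. Here is a minimal sketch of a shim function for the deployment host instead. It assumes `nerdctl` is on the `PATH` there, and the function is a hypothetical convenience, not something `nerdctl` ships:

```shell
#!/bin/sh
# Hypothetical shim for a containerd + nerdctl host with no Docker Engine:
# forward every "docker" invocation, arguments intact, to nerdctl.
docker() {
    nerdctl "$@"
}
```

Because it's a function rather than an alias, a script can source it and keep saying `docker` throughout.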
| 554 | |
| 555 | [ctrd]: https://containerd.io/ |
| 556 | [nerdctl]: https://github.com/containerd/nerdctl |
| 557 | [runc]: https://github.com/opencontainers/runc |
| 558 | |
| 559 | |
| 560 | ### 6.2 <a id="podman"></a>Podman |
| 561 | |
| 562 | A lighter-weight alternative to either of the prior options that doesn’t |
| 563 | give up the image builder is [Podman]. Initially created by |
| 564 | Red Hat and thus popular on that family of OSes, it will run on |
| 565 | any flavor of Linux. It can even be made to run [on macOS via Homebrew][pmmac] |
| 566 | or [on Windows via WSL2][pmwin]. |
| 567 | |
| 568 | On Ubuntu 22.04, it’s about a quarter the size of Docker Engine, or half |
| 569 | that of the “full” distribution of `nerdctl` and all its dependencies. |
| 570 | |
| 571 | Although Podman [bills itself][whatis] as a drop-in replacement for the |
| 572 | `docker` command and everything that sits behind it, some of the tool’s |
| 573 | design decisions affect how our Fossil containers run, as compared to |
| 574 | using Docker. The most important of these is that, by default, Podman |
| @@ -817,39 +617,14 @@ | |
| 617 | they’ll be connected to the network the container runs on. Once the bad |
| 618 | guy is inside the house, he doesn’t necessarily have to go after the |
| 619 | residents directly to cause problems for them. |
| 620 | |
| 621 | |
| 622 | #### 6.2.2 <a id="podman-rootful"></a>Fossil in a Rootful Podman Container |
| 623 | |
| 624 | ##### Simple Method |
| 625 | |
| 626 | To get rootful containers without giving up Podman’s conveniences, simply run your |
| 627 | `podman` commands as root: |
| 628 | |
| 629 | ``` |
| 630 | $ sudo podman build -t fossil --cap-add MKNOD . |
| @@ -875,11 +650,11 @@ | |
| 650 | it’s done inside a container runtime’s build environment doesn’t mean we |
| 651 | can get away without root privileges to do things like create the |
| 652 | `/jail/dev/null` node. |
| 653 | |
| 654 | The other reason we need “`sudo podman build`” is because it puts the result |
| 655 | into root’s Podman image registry, where the next steps look for it. |
| 656 | |
| 657 | That in turn explains why we need “`sudo podman create`:” because it’s |
| 658 | creating a container based on an image that was created by root. If you |
| 659 | ran that step without `sudo`, it wouldn’t be able to find the image. |
| 660 | |
| @@ -927,23 +702,227 @@ | |
| 702 | $ sudo podman create \ |
| 703 | --any-options-you-like \ |
| 704 | docker.io/mydockername/fossil |
| 705 | ``` |
| 706 | |
| 707 | This round-trip through the public image registry has another side |
| 708 | benefit: your local system might be a lot faster than your remote one, |
| 709 | as when the remote is a small VPS. Even with the overhead of schlepping |
| 710 | container images across the Internet, it can be a net win in terms of |
| 711 | build time. |
| 712 | |
| 713 | |
| 714 | |
| 715 | ### 6.3 <a id="barebones"></a>Bare-Bones OCI Bundle Runners |
| 716 | |
| 717 | If even the Podman stack is too big for you, you still have options for |
| 718 | running containers that are considerably slimmer, at a high cost to |
| 719 | administration complexity and loss of features. |
| 720 | |
| 721 | Part of the OCI standard is the notion of a “bundle,” being a consistent |
| 722 | way to present a pre-built and configured container to the runtime. |
| 723 | Essentially, it consists of a directory containing a `config.json` file |
| 724 | and a `rootfs/` subdirectory containing the root filesystem image. Many |
| 725 | tools can produce these for you. We’ll show only one method in the first |
| 726 | section below, then reuse that in the following sections. |
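A quick check for that conventional layout can head off a confusing error from the runtime later. Here is a minimal sketch. The function name `is_bundle` is hypothetical, and it checks only the conventional names, not wherever `root.path` inside `config.json` actually points:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if $1 looks like a conventional OCI
# bundle, i.e. a directory holding config.json and a rootfs/ subdirectory.
is_bundle() {
    [ -d "$1" ] && [ -f "$1/config.json" ] && [ -d "$1/rootfs" ]
}
```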
| 727 | |
| 728 | |
| 729 | #### 6.3.1 <a id="runc"></a>`runc` |
| 730 | |
| 731 | We mentioned `runc` [above](#nerdctl), but it’s possible to use it |
| 732 | standalone, without `containerd` or its CLI frontend `nerdctl`. Doing so |
| 733 | also gives up the build engine, intelligent image layer sharing, image registry |
| 734 | connections, and much more. The plus side is that `runc` alone is |
| 735 | about 18 MiB. |
| 736 | |
| 737 | Using it without all the support tooling isn’t complicated, but it *is* |
| 738 | cryptic enough to want a shell script. Let’s say we want to build on our |
| 739 | big desktop machine but ship the resulting container to a small remote |
| 740 | host. This should serve: |
| 741 | |
| 742 | ---- |
| 743 | |
| 744 | ```shell |
| 745 | #!/bin/bash -ex |
| 746 | c=fossil |
| 747 | b=/var/lib/machines/$c |
| 748 | h=my-host.example.com |
| 749 | m=/run/containerd/io.containerd.runtime.v2.task/moby |
| 750 | t=$(mktemp -d /tmp/$c-bundle.XXXXXX) |
| 751 | |
| 752 | if [ -d "$t" ] |
| 753 | then |
| 754 | docker container start $c |
| 755 | docker container export $c > $t/rootfs.tar |
| 756 | id=$(docker inspect --format="{{.Id}}" $c) |
| 757 | sudo cat $m/$id/config.json \ |
| 758 | | jq '.root.path = "'$b/rootfs'"' \ |
| 759 | | jq '.linux.cgroupsPath = ""' \ |
| 760 | | jq 'del(.linux.sysctl)' \ |
| 761 | | jq 'del(.linux.namespaces[] | select(.type == "network"))' \ |
| 762 | | jq 'del(.mounts[] | select(.destination == "/etc/hostname"))' \ |
| 763 | | jq 'del(.mounts[] | select(.destination == "/etc/resolv.conf"))' \ |
| 764 | | jq 'del(.mounts[] | select(.destination == "/etc/hosts"))' \ |
| 765 | | jq 'del(.hooks)' > $t/config.json |
| 766 | scp -r $t $h:tmp |
| 767 | ssh -t $h "{ |
| 768 | mv ./$t/config.json $b && |
| 769 | sudo tar -C $b/rootfs -xf ./$t/rootfs.tar && |
| 770 | rm -r ./$t |
| 771 | }" |
| 772 | rm -r $t |
| 773 | fi |
| 774 | ``` |
| 775 | |
| 776 | ---- |
| 777 | |
| 778 | The first several lines list configurables: |
| 779 | |
| 780 | * **`c`**: the name of the Docker container you’re bundling up for use |
| 781 | with `runc` |
| 782 | * **`b`**: the path of the exported container, called the “bundle” in |
| 783 | OCI jargon; we’re using the [`nspawn`](#nspawn) convention, a |
| 784 | reasonable choice under the [Linux FHS rules][LFHS] |
| 785 | * **`h`**: the remote host name |
| 786 | * **`m`**: the local directory holding the running machines, configurable |
| 787 | because: |
| 788 | * the path name is longer than we want to use inline |
| 789 | * it’s been known to change from one version of Docker to the next |
| 790 | * you might be building and testing with [Podman](#podman), so it |
| 791 | has to be “`/run/user/$UID/crun`” instead |
| 792 | * **`t`**: the temporary bundle directory we populate locally, then |
| 793 | `scp` to the remote machine, where it’s unpacked |
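If you switch between Docker and Podman on the build host, you might pick `m` automatically rather than editing the script. This sketch knows only the two paths listed above, and those paths may change with future runtime versions. The function name `state_dir` is hypothetical, and `id -u` stands in for `$UID`, which POSIX `sh` doesn't set:

```shell
#!/bin/sh
# Hypothetical helper: print the state directory under which the named
# runtime keeps each running container's config.json.
state_dir() {
    case "$1" in
        docker) echo /run/containerd/io.containerd.runtime.v2.task/moby ;;
        podman) echo "/run/user/$(id -u)/crun" ;;
        *)      echo "unknown runtime: $1" >&2; return 1 ;;
    esac
}
```

You'd then say `m=$(state_dir docker)` near the top of the export script.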
| 794 | |
| 795 | [LFHS]: https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard |
| 796 | |
| 797 | |
| 798 | ##### Why All That `sudo` Stuff? |
| 799 | |
| 800 | This script uses `sudo` for two different purposes: |
| 801 | |
| 802 | 1. To read the local `config.json` file out of the `containerd` managed |
| 803 | directory, which is owned by `root` on Docker systems. Additionally, |
| 804 | that input file exists only while the container is running, so |
| 805 | the script starts the container before extracting it. |
| 806 | |
| 807 | 2. To unpack the bundle onto the remote machine. If you try to get |
| 808 | clever and unpack it locally, then `rsync` it to the remote host to |
| 809 | avoid re-copying files that haven’t changed since the last update, |
| 810 | you’ll find that it fails when it tries to copy device nodes, to |
| 811 | create files owned only by the remote root user, and so forth. If the |
| 812 | container bundle is small, it’s simpler to re-copy and unpack it |
| 813 | fresh each time. |
| 814 | |
| 815 | I point all this out because the script might ask for your password twice: once for |
| 816 | the local `sudo` command, and once for the remote one. |
| 817 | |
| 818 | |
| 819 | |
| 820 | ##### Why All That `jq` Stuff? |
| 821 | |
| 822 | We’re using [jq] for two separate purposes: |
| 823 | |
| 824 | 1. To automatically transmogrify Docker’s container configuration so it |
| 825 | will work with `runc`: |
| 826 | |
| 827 | * point it where we unpacked the container’s exported rootfs |
| 828 | * accede to its wish to [manage cgroups by itself][ecg] |
| 829 | * remove the `sysctl` calls that will break after… |
| 830 | * …we remove the network namespace to allow Fossil’s TCP listening |
| 831 | port to be available on the host; `runc` doesn’t offer the |
| 832 | equivalent of `docker create --publish`, and we can’t be |
| 833 | bothered to set up a manual mapping from the host port into the |
| 834 | container |
| 835 | * remove file bindings that point into the local runtime managed |
| 836 | directories; one of the things we give up by using a bare |
| 837 | container runner is automatic management of these files |
| 838 | * remove the hooks for essentially the same reason |
| 839 | |
| 840 | 2. To make the Docker-managed machine-readable `config.json` more |
| 841 | human-readable, in case there are other things you want changed in |
| 842 | this version of the container. Exposing the `config.json` file like |
| 843 | this means you don’t have to rebuild the container merely to change |
| 844 | a value like a mount point, the kernel capability set, and so forth. |
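The eight `jq` calls in the script can also be folded into a single `jq` program. That saves seven processes and passes the bundle path with `--arg` instead of splicing it into the quoted filter. The function name in this sketch is hypothetical. The `del(…[] | select(…))` steps assume `config.json` has `linux.namespaces` and `mounts` arrays, as Docker's does:

```shell
#!/bin/sh
# Hypothetical helper: read a Docker-generated OCI config.json on stdin and
# write a version adjusted for standalone runc, given the bundle dir as $1.
# Same edits as the jq pipeline in the script above, in one jq process.
runc_config() {
    jq --arg root "$1/rootfs" '
        .root.path = $root
        | .linux.cgroupsPath = ""
        | del(.linux.sysctl)
        | del(.linux.namespaces[] | select(.type == "network"))
        | del(.mounts[] | select(.destination == "/etc/hostname"))
        | del(.mounts[] | select(.destination == "/etc/resolv.conf"))
        | del(.mounts[] | select(.destination == "/etc/hosts"))
        | del(.hooks)'
}
```

In the script, this would replace the pipeline with `sudo cat $m/$id/config.json | runc_config $b > $t/config.json`.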
| 845 | |
| 846 | |
| 847 | ##### Running the Bundle |
| 848 | |
| 849 | With the container exported to a bundle like this, you can start it as: |
| 850 | |
| 851 | ``` |
| 852 | $ cd /path/to/bundle |
| 853 | $ c=fossil-runc ← …or anything else you prefer |
| 854 | $ sudo runc create $c |
| 855 | $ sudo runc start $c |
| 856 | $ sudo runc exec -t $c sh -l |
| 857 | ~ $ ls museum |
| 858 | repo.fossil |
| 859 | ~ $ ps -eaf |
| 860 | PID USER TIME COMMAND |
| 861 | 1 fossil 0:00 bin/fossil server --create … |
| 862 | ~ $ exit |
| 863 | $ sudo runc kill $c |
| 864 | $ sudo runc delete $c |
| 865 | ``` |
| 866 | |
| 867 | On the deployment host, the first command is “`cd $b`” |
| 868 | if we’re using the variables from the shell script above. Alternately, |
| 869 | the `runc` subcommands that need to read the bundle files take a |
| 870 | `--bundle/-b` flag to let you avoid switching directories. |
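Wrapped up as shell functions, the lifecycle above might look like the sketch below. The function names and the `RUNC` override are hypothetical conveniences, not part of `runc`. `--bundle` goes to `runc create`, the subcommand that reads the bundle, so nothing needs to `cd`. The teardown follows the session above: “kill,” then “delete.” Here `delete --force` covers the case where the container hasn't finished exiting yet:

```shell
#!/bin/sh
# Hypothetical wrappers around the runc commands shown above.
# Set RUNC=echo for a dry run that only prints what would be executed.
: "${RUNC:=sudo runc}"

# $1 = bundle directory, $2 = container name
bundle_up() {
    $RUNC create --bundle "$1" "$2" && $RUNC start "$2"
}

# $1 = container name
bundle_down() {
    $RUNC kill "$1"
    # --force makes runc SIGKILL the container first if it's still running
    $RUNC delete --force "$1"
}
```

With the script's defaults, you'd run `bundle_up /var/lib/machines/fossil fossil-runc` on the deployment host.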
| 871 | |
| 872 | The rest should be straightforward: create and start the container as |
| 873 | root so the `chroot(2)` call inside the container will succeed, then get |
| 874 | into it with a login shell and poke around to prove to ourselves that |
| 875 | everything is working properly. It is. Yay! |
| 876 | |
| 877 | The remaining commands show shutting the container down and destroying |
| 878 | it, simply to show how these commands change relative to using the |
| 879 | Docker Engine commands. It’s “kill,” not “stop,” and it’s “delete,” not |
| 880 | “rm.” |
| 881 | |
| 882 | [ecg]: https://github.com/opencontainers/runc/pull/3131 |
| 883 | [jq]: https://stedolan.github.io/jq/ |
| 884 | |
| 885 | |
| 886 | ##### Lack of Layer Sharing |
| 887 | |
| 888 | The bundle export process collapses Docker’s union filesystem down to a |
| 889 | single layer. Atop that, it makes all files mutable. |
| 890 | |
| 891 | All of this is fine for tiny remote hosts with a single container, or at |
| 892 | least one where none of the containers share base layers. Where it |
| 893 | becomes a problem is when you have multiple Fossil containers on a |
| 894 | single host, since they all derive from the same base image. |
| 895 | |
| 896 | The full-featured container runtimes above will intelligently share |
| 897 | these immutable base layers among the containers, storing only the |
| 898 | differences in each individual container. More, when pulling images from |
| 899 | a registry host, they’ll transfer only the layers you don’t have copies |
| 900 | of locally, so you don’t have to burn bandwidth sending copies of Alpine |
| 901 | and BusyBox each time, even though they’re unlikely to change from one |
| 902 | build to the next. |
| 903 | |
| 904 | |
| 905 | #### 6.3.2 <a id="crun"></a>`crun` |
| 906 | |
| 907 | In the same way that [Docker Engine is based on `runc`](#nerdctl), Podman’s |
| 908 | engine is based on [`crun`][crun], a lighter-weight alternative to |
| 909 | `runc`. It’s only 1.4 MiB on the system I tested it on, yet it will run |
| 910 | the same container bundles as in my `runc` examples above. We saved |
| 911 | more than that by compressing the container’s Fossil executable with |
| 912 | UPX, making the runtime virtually free in this case. The only question |
| 913 | is whether you can put up with its limitations, which are the same as |
| 914 | for `runc`. |
| 915 | |
| 916 | [crun]: https://github.com/containers/crun |
| 917 | |
| 918 | |
| 919 | #### 6.3.3 <a id="nspawn"></a>`systemd-nspawn` |
| 920 | |
| 921 | As of `systemd` version 242, its optional `nspawn` piece |
| 922 | [reportedly](https://www.phoronix.com/news/Systemd-Nspawn-OCI-Runtime) |
| 923 | got the ability to run OCI bundles directly. You might |
| 924 | have it installed already, but if not, it’s only about 2 MiB. It’s |
| 925 | in the `systemd-container` package as of Ubuntu 22.04 LTS: |
| 926 | |
| 927 | ``` |
| 928 | $ sudo apt install systemd-container |
| @@ -963,12 +942,12 @@ | |
| 942 | --port=127.0.0.1:127.0.0.1:9999:8080 |
| 943 | $ sudo machinectl list |
| 944 | No machines. |
| 945 | ``` |
| 946 | |
| 947 | This is why I wrote “reportedly” above: I couldn’t get it to work on two different |
| 948 | Linux distributions, and I can’t see why. I’m leaving this here to give |
| 949 | someone else a leg up, with the hope that they will work out what’s |
| 950 | needed to get the container running and registered with `machinectl`. |
| 951 | |
| 952 | As of this writing, the tool expects an OCI container version of |
| 953 | “1.0.0”. I had to edit this at the top of my `config.json` file to get |
| 954 |