Fossil SCM

Researched, tested, and documented the set of "docker create --cap-drop" options we can add to strip away unnecessary root privileges inside the container without harming normal operation. This is a belt-and-suspenders measure: if a bad actor ever got into the container with root privileges, it would help prevent them from affecting anything outside the container. Added that set of options to the "make container-run" target so it is applied by default in the easy case.

wyoung 2022-08-29 17:54 trunk

Commit f715add9381024ab6a733db88a1f6a8915775444464ad4337dcae7ea8c3421a0

Parent f00a88f89632259…

2 files changed: Makefile.in (+12 -1), www/build.wiki (+104 -2)

M Makefile.in

+12 -1

		--- Makefile.in
		+++ Makefile.in
		@@ -122,10 +122,21 @@
122	122	# Container stuff
123	123	container-image: @srcdir@/Dockerfile
124	124	docker build -t fossil:@FOSSIL_CI_PFX@ $(DBFLAGS) @srcdir@
125	125
126	126	container-run: container-image
127		- docker run --name fossil-@FOSSIL_CI_PFX@ $(DRFLAGS) fossil:@FOSSIL_CI_PFX@
	127	+ docker run \
	128	+ --name fossil-@FOSSIL_CI_PFX@ \
	129	+ --cap-drop AUDIT_WRITE \
	130	+ --cap-drop CHOWN \
	131	+ --cap-drop FSETID \
	132	+ --cap-drop KILL \
	133	+ --cap-drop MKNOD \
	134	+ --cap-drop NET_BIND_SERVICE \
	135	+ --cap-drop NET_RAW \
	136	+ --cap-drop SETFCAP \
	137	+ --cap-drop SETPCAP \
	138	+ $(DRFLAGS) fossil:@FOSSIL_CI_PFX@
128	139
129	140	@srcdir@/Dockerfile: @srcdir@/Dockerfile.in @srcdir@/manifest.uuid
130	141	@AUTOREMAKE@
131	142
132	143
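
As a rough illustration of what the new container-run recipe expands to, the sketch below builds the same nine "--cap-drop" options from a list and prints the resulting command instead of running it. The image and container names are hypothetical stand-ins for the @FOSSIL_CI_PFX@ substitutions, and the helper function cap_drop_flags is not part of the Fossil build system.

```shell
#!/bin/sh
# Capabilities dropped by the container-run target in this commit.
DROP_CAPS="AUDIT_WRITE CHOWN FSETID KILL MKNOD NET_BIND_SERVICE NET_RAW SETFCAP SETPCAP"

# Print one "--cap-drop NAME" pair per capability on a single line.
# (Hypothetical helper for illustration; not part of Fossil's Makefile.)
cap_drop_flags() {
    flags=""
    for cap in $DROP_CAPS; do
        flags="$flags --cap-drop $cap"
    done
    # Strip the leading space left by the loop.
    printf '%s\n' "${flags# }"
}

# Hypothetical names; the real target derives them from @FOSSIL_CI_PFX@.
IMAGE="fossil:example"
NAME="fossil-example"

# Print the command rather than executing it, so this is safe to try anywhere.
echo "docker run --name $NAME $(cap_drop_flags) $IMAGE"
```

Keeping the list in one variable makes it easy to compare against the recipe above when a capability is added or removed.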


M www/build.wiki

+104 -2

		--- www/build.wiki
		+++ www/build.wiki
		@@ -344,11 +344,11 @@
344	344	(See [#docker-args \| below] for how to change this default.)
345	345	You don't have to restart the server after fixing this with
346	346	<tt>chmod</tt>: simply reload the browser, and Fossil will try again.
347	347
348	348
349		-<h4>5.1.2 Storing the Repo Outside the Container</h4>
	349	+<h4 id="docker-bind-mount">5.1.2 Storing the Repo Outside the Container</h4>
350	350
351	351	The simple storage method above has a problem: Docker containers are designed to be
352	352	killed off at the slightest cause, rebuilt, and redeployed. If you do
353	353	that with the repo inside the container, it gets destroyed, too. The
354	354	solution is to replace the "run" command above with the following:
		@@ -377,11 +377,13 @@
377	377	rebuild the container or its underlying image — such as to upgrade to a newer version of Fossil
378	378	— the external directory remains behind and gets remapped into the new container
379	379	when you recreate it with <tt>-v</tt>.
380	380
381	381
382		-<h3 id="docker-chroot">5.2 Why Chroot?</h3>
	382	+<h3 id="docker-security">5.2 Security</h3>
	383	+
	384	+<h4 id="docker-chroot">5.2.1 Why Chroot?</h4>
383	385
384	386	A potentially surprising feature of this container is that it runs
385	387	Fossil as root. Since that causes [./chroot.md \| Fossil's chroot jail
386	388	feature] to kick in, and a Docker container is a type of über-jail
387	389	already, you may be wondering why we bother. Instead, why not either:
		@@ -422,10 +424,110 @@
422	424	the ones forked off to handle each HTTP/CGI hit. This is why you can fix broken
423	425	permissions with <tt>chown</tt> after the container is already running,
424	426	without restarting it: each hit reevaluates the repository file
425	427	permissions when deciding what user to become when dropping root
426	428	privileges.
	429	+
	430	+
	431	+<h4 id="docker-caps">5.2.2 Dropping Unnecessary Capabilities</h4>
	432	+
	433	+The example commands given in this section create the container with
	434	+[https://docs.docker.com/engine/security/#linux-kernel-capabilities \| a
	435	+default set of Linux kernel capabilities]. Although Docker strips almost
	436	+all of the traditional root capabilities away by default, and Fossil
	437	+doesn't need any of those it does take away, Docker does leave some
	438	+enabled that Fossil doesn't actually need. You can tighten the scope of
	439	+capabilities by adding a "<tt>--cap-drop LIST</tt>" option to your
	440	+container creation commands. Specifically:
	441	+
	442	+ * <b><tt>AUDIT_WRITE</tt></b>: Fossil doesn't write to the kernel's
	443	+ auditing log, and we can't see any reason you'd want to be able to
	444	+ do that as an administrator shelled into the container, either.
	445	+ Auditing is something done on the host, not from inside each
	446	+ individual container.<p>
	447	+ * <b><tt>CHOWN</tt></b>: The Fossil server never even calls
	448	+ <tt>chown(2)</tt>, and our image build process sets up all file
	449	+ ownership properly, to the extent that this is possible under the
	450	+ limitations of our automation.<p>
	451	+ Curiously, stripping this capability doesn't affect your ability to
	452	+ run commands like "<tt>chown -R fossil:fossil /jail/museum</tt>"
	453	+ when you're using bind mounts or external volumes — as we recommend
	454	+ [#docker-bind-mount \| above] — because it's the host OS's kernel
	455	+ capabilities that affect the underlying <tt>chown(2)</tt> call in
	456	+ that case, not those of the container.<p>
	457	+ If for some reason you did have to change file ownership of
	458	+ in-container files, it's best to do that by changing the
	459	+ <tt>Dockerfile</tt> to suit, then rebuilding the container, since
	460	+ that bakes the need for the change into your reproducible build
	461	+ process. If you had to do it without rebuilding the container,
	462	+ [https://stackoverflow.com/a/45752205/142454 \| there's a
	463	+ workaround] for the fact that capabilities are a create-time
	464	+ change, baked semi-indelibly into the container configuration.<p>
	465	+ * <b><tt>FSETID</tt></b>: Fossil doesn't use the SUID and SGID bits
	466	+ itself, and our build process doesn't set those flags on any of the
	467	+ files. Although the second fact means we can't see any harm from
	468	+ leaving this enabled, we also can't see any good reason to allow
	469	+ it, so we strip it.<p>
	470	+ * <b><tt>KILL</tt></b>: The only place Fossil calls <tt>kill(2)</tt>
	471	+ is in the [./backoffice.md \| backoffice], and then only for
	472	+ processes it created on earlier runs; it doesn't need the ability
	473	+ to kill processes created by other users. You might wish for this
	474	+ ability as an administrator shelled into the container, but you can
	475	+ pass the "<tt>docker exec --user</tt>" option to run commands
	476	+ within your container as the legitimate owner of the process,
	477	+ removing the need for this capability.<p>
	478	+ * <b><tt>MKNOD</tt></b>: All device nodes are created at build time
	479	+ and are never changed at run time. Realize that the virtualized
	480	+ device nodes inside the container get mapped onto real devices on
	481	+ the host, so if an attacker ever got a root shell on the container,
	482	+ they might be able to do actual damage to the host if we didn't
	483	+ preemptively strip this capability away.<p>
	484	+ * <b><tt>NET_BIND_SERVICE</tt></b>: With containerized deployment,
	485	+ Fossil never needs the ability to bind the server to low-numbered
	486	+ TCP ports, not even if you're running the server in production with
	487	+ TLS enabled and want the service bound to port 443. It's perfectly
	488	+ fine to let the Fossil instance inside the container bind to its
	489	+ default port (8080) because you can rebind it on the host with the
	490	+ "<tt>docker create --publish 443:8080</tt>" option. It's the
	491	+ container's <i>host</i> that needs this ability, not the container
	492	+ itself.<p> (Even the container runtime might not need that
	493	+ capability if you're [./ssl.wiki#server \| terminating TLS with a
	494	+ front-end proxy]. You're more likely to say something like "<tt>-p
	495	+ localhost:12345:8080</tt>", then configure the reverse proxy to
	496	+ translate external HTTPS calls into HTTP directed at this internal
	497	+ port 12345.)<p>
	498	+ * <b><tt>NET_RAW</tt></b>: Fossil itself doesn't use raw sockets, and
	499	+ our build process leaves out <tt>ping</tt> and <tt>traceroute</tt>,
	500	+ the only Busybox utilities that require that ability. If you need
	501	+ to ping something, you can almost certainly do it just as well out
	502	+ on the host; we foresee no compelling reason to use ping or
	503	+ traceroute from inside the container.<p> If we did not take this
	504	+ hard-line stance, an attacker that broke into the container and
	505	+ gained root privileges could use raw sockets to do a wide array of
	506	+ bad things to any network the container is bound to.<p>
	507	+ * <b><tt>SETFCAP, SETPCAP</tt></b>: There isn't much call for file
	508	+ permission granularity beyond the classic Unix ones inside the
	509	+ container, so we drop root's ability to change them.
	510	+
	511	+All together, we recommend adding the following options to your
	512	+"<tt>docker run</tt>" commands, as well as to any "<tt>docker
	513	+create</tt>" command that will be followed by "<tt>docker start</tt>":
	514	+
	515	+<pre><code> --cap-drop AUDIT_WRITE \
	516	+ --cap-drop CHOWN \
	517	+ --cap-drop FSETID \
	518	+ --cap-drop KILL \
	519	+ --cap-drop MKNOD \
	520	+ --cap-drop NET_BIND_SERVICE \
	521	+ --cap-drop NET_RAW \
	522	+ --cap-drop SETFCAP \
	523	+ --cap-drop SETPCAP
	524	+</code></pre>
	525	+
	526	+In the next section, we'll show a case where you create a container
	527	+without ever running it, making these options pointless.
	528	+
427	529
428	530
429	531	<h3 id="docker-static">5.3 Extracting a Static Binary</h3>
430	532
431	533	Our 2-stage build process uses Alpine Linux only as a build host. Once
432	534
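
One way to check that the options documented above took effect is to read the effective capability mask of a process in the container, for example the CapEff line of /proc/1/status, and test the relevant bits. The sketch below does only the bit test. The bit positions come from <linux/capability.h>. The two sample masks are illustrative values computed from Docker's documented default capability list, not output captured from a real Fossil container, and the function names are hypothetical.

```shell
#!/bin/sh
# Bit positions from <linux/capability.h> for the capabilities this commit drops.
cap_bit() {
    case "$1" in
        CHOWN) echo 0 ;;
        FSETID) echo 4 ;;
        KILL) echo 5 ;;
        SETPCAP) echo 8 ;;
        NET_BIND_SERVICE) echo 10 ;;
        NET_RAW) echo 13 ;;
        MKNOD) echo 27 ;;
        AUDIT_WRITE) echo 29 ;;
        SETFCAP) echo 31 ;;
        *) return 1 ;;
    esac
}

# Succeed if capability $2 is set in the hexadecimal mask $1 (no "0x" prefix).
has_cap() {
    bit=$(cap_bit "$2") || return 2
    [ $(( (0x$1 >> bit) & 1 )) -eq 1 ]
}

# Print each capability from the drop set that is still present in mask $1.
still_present() {
    for cap in AUDIT_WRITE CHOWN FSETID KILL MKNOD NET_BIND_SERVICE NET_RAW SETFCAP SETPCAP; do
        if has_cap "$1" "$cap"; then echo "$cap"; fi
    done
}

# Assumed sample masks: Docker's default set, then the same set minus the nine drops.
still_present 00000000a80425fb
still_present 00000000000400ca
```

With the first mask the function lists all nine names. With the second it prints nothing, which is the result you would hope to see for a container created with the full set of "--cap-drop" options.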

