Fossil SCM
Researched, tested, and documented the set of "docker create --cap-drop" options we can add to strip away unnecessary root privileges inside the container without harming normal operation. Belt-and-suspenders: if any bad actor ever got into the container with root privileges, this would help prevent them from affecting anything outside the container. Added that set to the "make container-run" target so they get applied by default in the easy case.
Commit: f715add9381024ab6a733db88a1f6a8915775444464ad4337dcae7ea8c3421a0
Parent: f00a88f89632259…
2 files changed: Makefile.in (+12, -1); www/build.wiki (+104, -2)
| --- Makefile.in | ||
| +++ Makefile.in | ||
| @@ -122,10 +122,21 @@ | ||
| 122 | 122 | # Container stuff |
| 123 | 123 | container-image: @srcdir@/Dockerfile |
| 124 | 124 | docker build -t fossil:@FOSSIL_CI_PFX@ $(DBFLAGS) @srcdir@ |
| 125 | 125 | |
| 126 | 126 | container-run: container-image |
| 127 | - docker run --name fossil-@FOSSIL_CI_PFX@ $(DRFLAGS) fossil:@FOSSIL_CI_PFX@ | |
| 127 | + docker run \ | |
| 128 | + --name fossil-@FOSSIL_CI_PFX@ \ | |
| 129 | + --cap-drop AUDIT_WRITE \ | |
| 130 | + --cap-drop CHOWN \ | |
| 131 | + --cap-drop FSETID \ | |
| 132 | + --cap-drop KILL \ | |
| 133 | + --cap-drop MKNOD \ | |
| 134 | + --cap-drop NET_BIND_SERVICE \ | |
| 135 | + --cap-drop NET_RAW \ | |
| 136 | + --cap-drop SETFCAP \ | |
| 137 | + --cap-drop SETPCAP \ | |
| 138 | + $(DRFLAGS) fossil:@FOSSIL_CI_PFX@ | |
| 128 | 139 | |
| 129 | 140 | @srcdir@/Dockerfile: @srcdir@/Dockerfile.in @srcdir@/manifest.uuid |
| 130 | 141 | @AUTOREMAKE@ |
| 131 | 142 | |
| 132 | 143 |
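For anyone running Docker by hand instead of through "make container-run", the same option list can be generated rather than retyped. This is a minimal sketch only: the <tt>cap_drop_flags</tt> helper, the <tt>FOSSIL_DROP_CAPS</tt> variable, the container name "fossil", and the image tag "fossil:latest" are all hypothetical stand-ins, not anything defined by Makefile.in.

```shell
#!/bin/sh
# Hypothetical helper, not part of Fossil's build: print one
# "--cap-drop NAME" pair for each capability name passed in.
cap_drop_flags() {
    for cap in "$@"; do
        printf '%s %s ' --cap-drop "$cap"
    done
}

# The nine capabilities this commit drops in "make container-run".
FOSSIL_DROP_CAPS="AUDIT_WRITE CHOWN FSETID KILL MKNOD NET_BIND_SERVICE NET_RAW SETFCAP SETPCAP"

# Echo the command instead of running it so it can be reviewed first.
# The container name and image tag below are placeholders.
echo docker run --name fossil $(cap_drop_flags $FOSSIL_DROP_CAPS) fossil:latest
```

Dropping the leading <tt>echo</tt> would run the command for real. The word splitting on <tt>$FOSSIL_DROP_CAPS</tt> is deliberate here, since capability names never contain spaces.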
+104
-2
| --- www/build.wiki | ||
| +++ www/build.wiki | ||
| @@ -344,11 +344,11 @@ | ||
| 344 | 344 | (See [#docker-args | below] for how to change this default.) |
| 345 | 345 | You don't have to restart the server after fixing this with |
| 346 | 346 | <tt>chmod</tt>: simply reload the browser, and Fossil will try again. |
| 347 | 347 | |
| 348 | 348 | |
| 349 | -<h4>5.1.2 Storing the Repo Outside the Container</h4> | |
| 349 | +<h4 id="docker-bind-mount">5.1.2 Storing the Repo Outside the Container</h4> | |
| 350 | 350 | |
| 351 | 351 | The simple storage method above has a problem: Docker containers are designed to be |
| 352 | 352 | killed off at the slightest cause, rebuilt, and redeployed. If you do |
| 353 | 353 | that with the repo inside the container, it gets destroyed, too. The |
| 354 | 354 | solution is to replace the "run" command above with the following: |
| @@ -377,11 +377,13 @@ | ||
| 377 | 377 | rebuild the container or its underlying image — such as to upgrade to a newer version of Fossil |
| 378 | 378 | — the external directory remains behind and gets remapped into the new container |
| 379 | 379 | when you recreate it with <tt>-v</tt>. |
| 380 | 380 | |
| 381 | 381 | |
| 382 | -<h3 id="docker-chroot">5.2 Why Chroot?</h3> | |
| 382 | +<h3 id="docker-security">5.2 Security</h3> | |
| 383 | + | |
| 384 | +<h4 id="docker-chroot">5.2.1 Why Chroot?</h4> | |
| 383 | 385 | |
| 384 | 386 | A potentially surprising feature of this container is that it runs |
| 385 | 387 | Fossil as root. Since that causes [./chroot.md | Fossil's chroot jail |
| 386 | 388 | feature] to kick in, and a Docker container is a type of über-jail |
| 387 | 389 | already, you may be wondering why we bother. Instead, why not either: |
| @@ -422,10 +424,110 @@ | ||
| 422 | 424 | the ones forked off to handle each HTTP/CGI hit. This is why you can fix broken |
| 423 | 425 | permissions with <tt>chown</tt> after the container is already running, |
| 424 | 426 | without restarting it: each hit reevaluates the repository file |
| 425 | 427 | permissions when deciding what user to become when dropping root |
| 426 | 428 | privileges. |
| 429 | + | |
| 430 | + | |
| 431 | +<h4 id="docker-caps">5.2.2 Dropping Unnecessary Capabilities</h4> | |
| 432 | + | |
| 433 | +The example commands given in this section create the container with | |
| 434 | +[https://docs.docker.com/engine/security/#linux-kernel-capabilities | a | |
| 435 | +default set of Linux kernel capabilities]. Although Docker strips almost | |
| 436 | +all of the traditional root capabilities away by default, and Fossil | |
| 437 | +doesn't need any of those it does take away, Docker does leave some | |
| 438 | +enabled that Fossil doesn't actually need. You can tighten the scope of | |
| 439 | +capabilities by adding a "<tt>--cap-drop LIST</tt>" option to your | |
| 440 | +container creation commands. Specifically: | |
| 441 | + | |
| 442 | + * <b><tt>AUDIT_WRITE</tt></b>: Fossil doesn't write to the kernel's | |
| 443 | + auditing log, and we can't see any reason you'd want to be able to | |
| 444 | + do that as an administrator shelled into the container, either. | |
| 445 | + Auditing is something done on the host, not from inside each | |
| 446 | + individual container.<p> | |
| 447 | + * <b><tt>CHOWN</tt></b>: The Fossil server never even calls | |
| 448 | + <tt>chown(2)</tt>, and our image build process sets up all file | |
| 449 | + ownership properly, to the extent that this is possible under the | |
| 450 | + limitations of our automation.<p> | |
| 451 | + Curiously, stripping this capability doesn't affect your ability to | |
| 452 | + run commands like "<tt>chown -R fossil:fossil /jail/museum</tt>" | |
| 453 | + when you're using bind mounts or external volumes — as we recommend | |
| 454 | + [#docker-bind-mount | above] — because it's the host OS's kernel | |
| 455 | + capabilities that affect the underlying <tt>chown(2)</tt> call in | |
| 456 | + that case, not those of the container.<p> | |
| 457 | + If for some reason you did have to change file ownership of | |
| 458 | + in-container files, it's best to do that by changing the | |
| 459 | + <tt>Dockerfile</tt> to suit, then rebuilding the container, since | |
| 460 | + that bakes the need for the change into your reproducible build | |
| 461 | + process. If you had to do it without rebuilding the container, | |
| 462 | + [https://stackoverflow.com/a/45752205/142454 | there's a | |
| 463 | + workaround] for the fact that capabilities are a create-time | |
| 464 | + change, baked semi-indelibly into the container configuration.<p> | |
| 465 | + * <b><tt>FSETID</tt></b>: Fossil doesn't use the SUID and SGID bits | |
| 466 | + itself, and our build process doesn't set those flags on any of the | |
| 467 | + files. Although the second fact means we can't see any harm from | |
| 468 | + leaving this enabled, we also can't see any good reason to allow | |
| 469 | + it, so we strip it.<p> | |
| 470 | + * <b><tt>KILL</tt></b>: The only place Fossil calls <tt>kill(2)</tt> | |
| 471 | + is in the [./backoffice.md | backoffice], and then only for | |
| 472 | + processes it created on earlier runs; it doesn't need the ability | |
| 473 | + to kill processes created by other users. You might wish for this | |
| 474 | + ability as an administrator shelled into the container, but you can | |
| 475 | + pass the "<tt>docker exec --user</tt>" option to run commands | |
| 476 | + within your container as the legitimate owner of the process, | |
| 477 | + removing the need for this capability.<p> | |
| 478 | + * <b><tt>MKNOD</tt></b>: All device nodes are created at build time | |
| 479 | + and are never changed at run time. Realize that the virtualized | |
| 480 | + device nodes inside the container get mapped onto real devices on | |
| 481 | + the host, so if an attacker ever got a root shell on the container, | |
| 482 | + they might be able to do actual damage to the host if we didn't | |
| 483 | + preemptively strip this capability away.<p> | |
| 484 | + * <b><tt>NET_BIND_SERVICE</tt></b>: With containerized deployment, | |
| 485 | + Fossil never needs the ability to bind the server to low-numbered | |
| 486 | + TCP ports, not even if you're running the server in production with | |
| 487 | + TLS enabled and want the service bound to port 443. It's perfectly | |
| 488 | + fine to let the Fossil instance inside the container bind to its | |
| 489 | + default port (8080) because you can rebind it on the host with the | |
| 490 | + "<tt>docker create --publish 443:8080</tt>" option. It's the | |
| 491 | + container's <i>host</i> that needs this ability, not the container | |
| 492 | + itself.<p> (Even the container runtime might not need that | |
| 493 | + capability if you're [./ssl.wiki#server | terminating TLS with a | |
| 494 | + front-end proxy]. You're more likely to say something like "<tt>-p | |
| 495 | + localhost:12345:8080</tt>", then configure the reverse proxy to | |
| 496 | + translate external HTTPS calls into HTTP directed at this internal | |
| 497 | + port 12345.)<p> | |
| 498 | + * <b><tt>NET_RAW</tt></b>: Fossil itself doesn't use raw sockets, and | |
| 499 | + our build process leaves out <tt>ping</tt> and <tt>traceroute</tt>, | |
| 500 | + the only Busybox utilities that require that ability. If you need | |
| 501 | + to ping something, you can almost certainly do it just as well out | |
| 502 | + on the host; we foresee no compelling reason to use ping or | |
| 503 | + traceroute from inside the container.<p> If we did not take this | |
| 504 | + hard-line stance, an attacker that broke into the container and | |
| 505 | + gained root privileges could use raw sockets to do a wide array of | |
| 506 | + bad things to any network the container is bound to.<p> | |
| 507 | + * <b><tt>SETFCAP, SETPCAP</tt></b>: There isn't much call for file | |
| 508 | + permission granularity beyond the classic Unix ones inside the | |
| 509 | + container, so we drop root's ability to change them. | |
| 510 | + | |
| 511 | +All together, we recommend adding the following options to your | |
| 512 | +"<tt>docker run</tt>" commands, as well as to any "<tt>docker | |
| 513 | +create</tt>" command that will be followed by "<tt>docker start</tt>": | |
| 514 | + | |
| 515 | +<pre><code> --cap-drop AUDIT_WRITE \ | |
| 516 | + --cap-drop CHOWN \ | |
| 517 | + --cap-drop FSETID \ | |
| 518 | + --cap-drop KILL \ | |
| 519 | + --cap-drop MKNOD \ | |
| 520 | + --cap-drop NET_BIND_SERVICE \ | |
| 521 | + --cap-drop NET_RAW \ | |
| 522 | + --cap-drop SETFCAP \ | |
| 523 | + --cap-drop SETPCAP | |
| 524 | +</code></pre> | |
| 525 | + | |
| 526 | +In the next section, we'll show a case where you create a container | |
| 527 | +without ever running it, making these options pointless. | |
| 528 | + | |
| 427 | 529 | |
| 428 | 530 | |
| 429 | 531 | <h3 id="docker-static">5.3 Extracting a Static Binary</h3> |
| 430 | 532 | |
| 431 | 533 | Our 2-stage build process uses Alpine Linux only as a build host. Once |
| 432 | 534 |
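One way to check that the options documented above took effect is to inspect the effective capability mask of a process inside the container. The sketch below decodes such a mask and lists any of the nine capabilities that are still set. The <tt>caps_still_set</tt> function is hypothetical, written for illustration; the bit numbers are those assigned in the kernel's <tt>linux/capability.h</tt>. The first sample mask is the value commonly reported for Docker's default set, which is an assumption about your Docker version rather than something this commit states.

```shell
#!/bin/sh
# Hypothetical checker: given a hexadecimal CapEff mask (the format used
# in /proc/PID/status), print the name of each capability from the
# dropped set whose bit is still set. Prints nothing if all are clear.
caps_still_set() {
    mask=$1
    for entry in CHOWN:0 FSETID:4 KILL:5 SETPCAP:8 NET_BIND_SERVICE:10 \
                 NET_RAW:13 MKNOD:27 AUDIT_WRITE:29 SETFCAP:31; do
        name=${entry%%:*}   # text before the colon
        bit=${entry##*:}    # bit number after the colon
        if [ $(( (0x$mask >> bit) & 1 )) -eq 1 ]; then
            printf '%s\n' "$name"
        fi
    done
}

# Assumed Docker default mask: all nine names are printed.
caps_still_set 00000000a80425fb

# Same mask with the nine bits cleared: nothing is printed.
caps_still_set 00000000000400ca
```

On a live container, the mask could be read with something like "<tt>docker exec fossil grep CapEff /proc/1/status</tt>", assuming a container named "fossil" whose image still includes a <tt>grep</tt> applet.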