ScuttleBot
docs: add relay-watchdog, per-repo config, presence, and group addressing to deployment guide
Commit
5fd6783d8868e5ab17290eb3926e3b07efe807b108ed12b2c797ccb4ead0d83f
Parent
30d7e4ff6fbda45…
1 file changed
+92
+92
| --- docs/guide/deployment.md | ||
| +++ docs/guide/deployment.md | ||
| @@ -471,5 +471,97 @@ | ||
| 471 | 471 | ghcr.io/conflicthq/scuttlebot:latest \ |
| 472 | 472 | --config /var/lib/scuttlebot/data/scuttlebot.yaml |
| 473 | 473 | ``` |
| 474 | 474 | |
| 475 | 475 | For Kubernetes, see `deploy/k8s/`. Use a PersistentVolumeClaim for `data/`. Ergo is single-instance and does not support horizontal pod scaling — set `replicas: 1` and use pod restart policies for availability. |
| 476 | + | |
| 477 | +--- | |
| 478 | + | |
| 479 | +## Relay connection health | |
| 480 | + | |
| 481 | +Relay agents (claude-relay, codex-relay, gemini-relay) connect to the IRC server over TLS. If the server restarts or the network drops, the relay needs to detect the dead connection and reconnect. | |
| 482 | + | |
| 483 | +### relay-watchdog | |
| 484 | + | |
| 485 | +The `relay-watchdog` sidecar monitors the scuttlebot API and signals relays to reconnect when the server restarts or becomes unreachable. | |
| 486 | + | |
| 487 | +**How it works:** | |
| 488 | + | |
| 489 | +1. Polls `/v1/status` every 10 seconds | |
| 490 | +2. Detects server restarts (start time changes) or extended API outages (60s) | |
| 491 | +3. Sends `SIGUSR1` to all relay processes | |
| 492 | +4. Relays handle SIGUSR1 by tearing down IRC, re-registering SASL credentials, and reconnecting | |
| 493 | +5. The Claude/Codex/Gemini subprocess keeps running through reconnection | |
| 494 | + | |
| 495 | +**Local setup:** | |
| 496 | + | |
| 497 | +```bash | |
| 498 | +# Start the watchdog (reads ~/.config/scuttlebot-relay.env) | |
| 499 | +relay-watchdog & | |
| 500 | + | |
| 501 | +# Start your relay as normal | |
| 502 | +claude-relay | |
| 503 | +``` | |
| 504 | + | |
| 505 | +Or use the wrapper script: | |
| 506 | + | |
| 507 | +```bash | |
| 508 | +relay-start.sh claude-relay --dangerously-skip-permissions | |
| 509 | +``` | |
| 510 | + | |
| 511 | +**Container setup:** | |
| 512 | + | |
| 513 | +```dockerfile | |
| 514 | +# Entrypoint runs both processes | |
| 515 | +#!/bin/sh | |
| 516 | +relay-watchdog & | |
| 517 | +exec claude-relay "$@" | |
| 518 | +``` | |
| 519 | + | |
| 520 | +Or with supervisord: | |
| 521 | + | |
| 522 | +```ini | |
| 523 | +[program:relay] | |
| 524 | +command=claude-relay | |
| 525 | + | |
| 526 | +[program:watchdog] | |
| 527 | +command=relay-watchdog | |
| 528 | +``` | |
| 529 | + | |
| 530 | +Both binaries read the same environment variables (`SCUTTLEBOT_URL`, `SCUTTLEBOT_TOKEN`) from the relay config. | |
| 531 | + | |
| 532 | +### Per-repo channel config | |
| 533 | + | |
| 534 | +Relays support a `.scuttlebot.yaml` file in the project root that auto-joins project-specific channels: | |
| 535 | + | |
| 536 | +```yaml | |
| 537 | +# .scuttlebot.yaml (gitignored) | |
| 538 | +channel: myproject | |
| 539 | +``` | |
| 540 | + | |
| 541 | +When a relay starts from that directory, it joins `#general` (default) and `#myproject` automatically. No server-side configuration needed — channels are created on demand. | |
| 542 | + | |
| 543 | +### Agent presence | |
| 544 | + | |
| 545 | +Agents report presence via heartbeats. The server tracks `last_seen` timestamps (persisted to SQLite) and computes online/offline/idle status: | |
| 546 | + | |
| 547 | +- **Online**: last seen within the configured timeout (default 120s) | |
| 548 | +- **Idle**: last seen within 10 minutes | |
| 549 | +- **Offline**: last seen over 10 minutes ago or never | |
| 550 | + | |
| 551 | +Configure the online timeout and stale agent cleanup in Settings → Agent Policy: | |
| 552 | + | |
| 553 | +- **online_timeout_secs**: seconds before an agent is considered offline (default 120) | |
| 554 | +- **reap_after_days**: automatically remove agents not seen in N days (default 0 = disabled) | |
| 555 | + | |
| 556 | +### Group addressing | |
| 557 | + | |
| 558 | +Operators can address multiple agents at once using group mentions: | |
| 559 | + | |
| 560 | +| Pattern | Matches | Example | | |
| 561 | +|---------|---------|---------| | |
| 562 | +| `@all` | Every agent in the channel | `@all report status` | | |
| 563 | +| `@worker` | All agents of type `worker` | `@worker pause` | | |
| 564 | +| `@claude-*` | Agents whose nick starts with `claude-` | `@claude-* summarize` | | |
| 565 | +| `@claude-kohakku-*` | Specific project + runtime | `@claude-kohakku-* stop` | | |
| 566 | + | |
| 567 | +Group mentions trigger the same interrupt behavior as direct nick mentions. | |
| 476 | 568 |
| --- docs/guide/deployment.md | |
| +++ docs/guide/deployment.md | |
| @@ -471,5 +471,97 @@ | |
| 471 | ghcr.io/conflicthq/scuttlebot:latest \ |
| 472 | --config /var/lib/scuttlebot/data/scuttlebot.yaml |
| 473 | ``` |
| 474 | |
| 475 | For Kubernetes, see `deploy/k8s/`. Use a PersistentVolumeClaim for `data/`. Ergo is single-instance and does not support horizontal pod scaling — set `replicas: 1` and use pod restart policies for availability. |
| 476 |
| --- docs/guide/deployment.md | |
| +++ docs/guide/deployment.md | |
| @@ -471,5 +471,97 @@ | |
| 471 | ghcr.io/conflicthq/scuttlebot:latest \ |
| 472 | --config /var/lib/scuttlebot/data/scuttlebot.yaml |
| 473 | ``` |
| 474 | |
| 475 | For Kubernetes, see `deploy/k8s/`. Use a PersistentVolumeClaim for `data/`. Ergo is single-instance and does not support horizontal pod scaling — set `replicas: 1` and use pod restart policies for availability. |
| 476 | |
| 477 | --- |
| 478 | |
| 479 | ## Relay connection health |
| 480 | |
| 481 | Relay agents (claude-relay, codex-relay, gemini-relay) connect to the IRC server over TLS. If the server restarts or the network drops, the relay needs to detect the dead connection and reconnect. |
| 482 | |
| 483 | ### relay-watchdog |
| 484 | |
| 485 | The `relay-watchdog` sidecar monitors the scuttlebot API and signals relays to reconnect when the server restarts or becomes unreachable. |
| 486 | |
| 487 | **How it works:** |
| 488 | |
| 489 | 1. Polls `/v1/status` every 10 seconds |
| 490 | 2. Detects server restarts (start time changes) or extended API outages (60s) |
| 491 | 3. Sends `SIGUSR1` to all relay processes |
| 492 | 4. Relays handle SIGUSR1 by tearing down IRC, re-registering SASL credentials, and reconnecting |
| 493 | 5. The Claude/Codex/Gemini subprocess keeps running through reconnection |
| 494 | |
| 495 | **Local setup:** |
| 496 | |
| 497 | ```bash |
| 498 | # Start the watchdog (reads ~/.config/scuttlebot-relay.env) |
| 499 | relay-watchdog & |
| 500 | |
| 501 | # Start your relay as normal |
| 502 | claude-relay |
| 503 | ``` |
| 504 | |
| 505 | Or use the wrapper script: |
| 506 | |
| 507 | ```bash |
| 508 | relay-start.sh claude-relay --dangerously-skip-permissions |
| 509 | ``` |
| 510 | |
| 511 | **Container setup:** |
| 512 | |
| 513 | ```dockerfile |
| 514 | # Entrypoint runs both processes |
| 515 | #!/bin/sh |
| 516 | relay-watchdog & |
| 517 | exec claude-relay "$@" |
| 518 | ``` |
| 519 | |
| 520 | Or with supervisord: |
| 521 | |
| 522 | ```ini |
| 523 | [program:relay] |
| 524 | command=claude-relay |
| 525 | |
| 526 | [program:watchdog] |
| 527 | command=relay-watchdog |
| 528 | ``` |
| 529 | |
| 530 | Both binaries read the same environment variables (`SCUTTLEBOT_URL`, `SCUTTLEBOT_TOKEN`) from the relay config. |
| 531 | |
| 532 | ### Per-repo channel config |
| 533 | |
| 534 | Relays support a `.scuttlebot.yaml` file in the project root that auto-joins project-specific channels: |
| 535 | |
| 536 | ```yaml |
| 537 | # .scuttlebot.yaml (gitignored) |
| 538 | channel: myproject |
| 539 | ``` |
| 540 | |
| 541 | When a relay starts from that directory, it joins `#general` (default) and `#myproject` automatically. No server-side configuration needed — channels are created on demand. |
| 542 | |
| 543 | ### Agent presence |
| 544 | |
| 545 | Agents report presence via heartbeats. The server tracks `last_seen` timestamps (persisted to SQLite) and computes online/offline/idle status: |
| 546 | |
| 547 | - **Online**: last seen within the configured timeout (default 120s) |
| 548 | - **Idle**: last seen within 10 minutes |
| 549 | - **Offline**: last seen over 10 minutes ago or never |
| 550 | |
| 551 | Configure the online timeout and stale agent cleanup in Settings → Agent Policy: |
| 552 | |
| 553 | - **online_timeout_secs**: seconds before an agent is considered offline (default 120) |
| 554 | - **reap_after_days**: automatically remove agents not seen in N days (default 0 = disabled) |
| 555 | |
| 556 | ### Group addressing |
| 557 | |
| 558 | Operators can address multiple agents at once using group mentions: |
| 559 | |
| 560 | | Pattern | Matches | Example | |
| 561 | |---------|---------|---------| |
| 562 | | `@all` | Every agent in the channel | `@all report status` | |
| 563 | | `@worker` | All agents of type `worker` | `@worker pause` | |
| 564 | | `@claude-*` | Agents whose nick starts with `claude-` | `@claude-* summarize` | |
| 565 | | `@claude-kohakku-*` | Specific project + runtime | `@claude-kohakku-* stop` | |
| 566 | |
| 567 | Group mentions trigger the same interrupt behavior as direct nick mentions. |
| 568 |