feat(harness): builder, lifecycle, stdout/stderr capture (#2) #20

Merged
charles merged 1 commit from feat/2-harness-lifecycle into main 2026-04-11 18:42:15 +00:00
Owner

Closes #2.

Stacked on #19 (feat/1-bootstrap-crate-skeleton).

Summary

Implements the process harness builder, spawn/stop/restart lifecycle, env forwarding, and stdout/stderr ring-buffer capture. Health polling is not in this PR — it lives in #3, which extends start() with the HTTP poll loop.

Builder

  • HarnessBuilder with command, arg, args, env, working_dir, health_url, ws_url, startup_timeout, shutdown_timeout, health_poll_interval, buffer_bytes, build().
  • ProcessHarness::builder() shortcut.
  • Defaults: 15 s startup, 10 s shutdown, 100 ms health poll, 1 MiB per output buffer.

Spawn

  • start() uses tokio::process::Command, pipes stdout/stderr, sets kill_on_drop.
  • Auto-forwards DISPLAY, WAYLAND_DISPLAY, XDG_RUNTIME_DIR, XDG_SESSION_TYPE, XDG_CURRENT_DESKTOP, DBUS_SESSION_BUS_ADDRESS from the parent. User-set env wins.
  • Two background tokio tasks drain stdout/stderr line-by-line into bounded OutputBuffers — ring over bytes, oldest dropped on overflow.
  • pid(), stdout(), stderr() accessors.

Stop & restart

  • stop() sends SIGTERM via libc::kill, waits up to shutdown_timeout, escalates to start_kill() (SIGKILL) on timeout.
  • restart() = stop() + start().
  • Drop sends synchronous SIGTERM + SIGKILL so a panicking test never leaks the child process.

Dependency added

  • libc = "0.2" gated [target.'cfg(unix)'.dependencies]. Already a transitive dep of tokio; declaring it explicitly is the cheapest way to call kill(2).

Checklist (from issue #2)

  • Builder with all listed fields
  • ProcessHarness::builder()
  • start() spawns child
  • DISPLAY / XDG_RUNTIME_DIR auto-forwarded
  • User env wins over auto-forward
  • working_dir set if provided
  • stop() SIGTERM → wait → SIGKILL
  • Returns final exit status
  • Drop triggers synchronous best-effort SIGTERM + SIGKILL
  • restart() = stop + start (health re-wait deferred to #3)
  • stdout/stderr captured to bounded ring buffers (1 MiB default)
  • stdout() / stderr() accessors
  • stderr_tail() available for #3 to use in error messages (allow(dead_code) until then)
  • Linux primary; uses cfg(unix) for the kill path

Test plan

  • just qa green locally — 16/16 unit tests pass (10 new for the harness)
  • Tests cover: stdout capture, stderr capture, env forwarding, graceful stop, Drop kill, restart, ring-buffer overflow drops oldest, builder defaults
  • CI green on Forgejo (waiting on #19 to land or for the runner to build the stacked diff)

Notes for the reviewer

  • stderr_tail() and the STDERR_TAIL_LINES constant carry an #[allow(dead_code)] because they're not yet called by anything in this PR — they exist as the API surface that #3's health-polling error path needs. Removing the allow is part of #3.
  • I deliberately did not add a wait_for_health() stub. #3 will inline the polling at the bottom of start(). Avoiding a placeholder method keeps the diff in #3 clean.
  • All process tests use /bin/sh -c so they don't depend on a project-specific binary; portable across any Linux runner.
  • The Drop SIGTERM + SIGKILL combo is technically back-to-back (no sleep, since Drop is sync) — the issue explicitly asks for both. The graceful path is via explicit stop().await.
Closes #2. **Stacked on #19** (feat/1-bootstrap-crate-skeleton). ## Summary Implements the process harness builder, spawn/stop/restart lifecycle, env forwarding, and stdout/stderr ring-buffer capture. Health polling is **not** in this PR — it lives in #3, which extends `start()` with the HTTP poll loop. ### Builder - `HarnessBuilder` with `command`, `arg`, `args`, `env`, `working_dir`, `health_url`, `ws_url`, `startup_timeout`, `shutdown_timeout`, `health_poll_interval`, `buffer_bytes`, `build()`. - `ProcessHarness::builder()` shortcut. - Defaults: 15 s startup, 10 s shutdown, 100 ms health poll, 1 MiB per output buffer. ### Spawn - `start()` uses `tokio::process::Command`, pipes stdout/stderr, sets `kill_on_drop`. - Auto-forwards `DISPLAY`, `WAYLAND_DISPLAY`, `XDG_RUNTIME_DIR`, `XDG_SESSION_TYPE`, `XDG_CURRENT_DESKTOP`, `DBUS_SESSION_BUS_ADDRESS` from the parent. User-set env wins. - Two background tokio tasks drain stdout/stderr line-by-line into bounded `OutputBuffer`s — ring over bytes, oldest dropped on overflow. - `pid()`, `stdout()`, `stderr()` accessors. ### Stop & restart - `stop()` sends `SIGTERM` via `libc::kill`, waits up to `shutdown_timeout`, escalates to `start_kill()` (SIGKILL) on timeout. - `restart()` = `stop() + start()`. - `Drop` sends synchronous `SIGTERM` + SIGKILL so a panicking test never leaks the child process. ### Dependency added - `libc = "0.2"` gated `[target.'cfg(unix)'.dependencies]`. Already a transitive dep of tokio; declaring it explicitly is the cheapest way to call `kill(2)`. ## Checklist (from issue #2) - [x] Builder with all listed fields - [x] `ProcessHarness::builder()` - [x] `start()` spawns child - [x] `DISPLAY` / `XDG_RUNTIME_DIR` auto-forwarded - [x] User env wins over auto-forward - [x] `working_dir` set if provided - [x] `stop()` SIGTERM → wait → SIGKILL - [x] Returns final exit status - [x] `Drop` triggers synchronous best-effort SIGTERM + SIGKILL - [x] `restart()` = stop + start (health re-wait deferred to #3) - [x] stdout/stderr captured to bounded ring buffers (1 MiB default) - [x] `stdout()` / `stderr()` accessors - [x] `stderr_tail()` available for #3 to use in error messages (allow(dead_code) until then) - [x] Linux primary; uses `cfg(unix)` for the kill path ## Test plan - [x] `just qa` green locally — 16/16 unit tests pass (10 new for the harness) - [x] Tests cover: stdout capture, stderr capture, env forwarding, graceful stop, Drop kill, restart, ring-buffer overflow drops oldest, builder defaults - [ ] CI green on Forgejo (waiting on #19 to land or for the runner to build the stacked diff) ## Notes for the reviewer - `stderr_tail()` and the `STDERR_TAIL_LINES` constant carry an `#[allow(dead_code)]` because they're not yet called by anything in this PR — they exist as the API surface that #3's health-polling error path needs. Removing the allow is part of #3. - I deliberately did **not** add a `wait_for_health()` stub. #3 will inline the polling at the bottom of `start()`. Avoiding a placeholder method keeps the diff in #3 clean. - All process tests use `/bin/sh -c` so they don't depend on a project-specific binary; portable across any Linux runner. - The Drop SIGTERM + SIGKILL combo is technically back-to-back (no sleep, since Drop is sync) — the issue explicitly asks for both. The graceful path is via explicit `stop().await`.
Implements ticket #2:

Builder
- HarnessBuilder with command/arg/args/env/working_dir/health_url/
  ws_url/startup_timeout/shutdown_timeout/health_poll_interval/
  buffer_bytes/build(), and ProcessHarness::builder() shortcut.
- Defaults match the spec: 15 s startup, 10 s shutdown, 100 ms poll,
  1 MiB stdout/stderr buffers each.

Spawn
- start() spawns the child via tokio::process::Command, sets stdin to
  /dev/null, pipes stdout+stderr, and sets kill_on_drop.
- Auto-forwards DISPLAY, WAYLAND_DISPLAY, XDG_RUNTIME_DIR,
  XDG_SESSION_TYPE, XDG_CURRENT_DESKTOP, DBUS_SESSION_BUS_ADDRESS from
  the parent if set. User-set env wins.
- Two background tasks drain stdout/stderr line-by-line into a bounded
  OutputBuffer (ring over bytes; oldest dropped on overflow).
- pid() / stdout() / stderr() accessors.

Stop & restart
- stop() sends SIGTERM via libc::kill, waits up to shutdown_timeout for
  graceful exit, then escalates to start_kill() (SIGKILL) and reaps.
- restart() = stop + start.
- Drop sends SIGTERM + SIGKILL synchronously so a panicking test never
  leaks the child.

Health polling is intentionally NOT implemented here — it lives in
ticket #3, which extends start() with the HTTP poll loop.

Dependencies
- Add `libc = "0.2"` under [target.'cfg(unix)'.dependencies] for
  SIGTERM. tokio already pulls libc transitively; declaring it
  explicitly is the cheapest way to call kill(2) directly.

Tests (10 in src/harness.rs):
- output_buffer_drops_oldest_when_full
- builder_defaults / builder_chains / build_returns_harness_with_no_child
- start_captures_stdout_from_echo
- start_captures_stderr_from_shell
- env_var_is_passed_to_child
- stop_terminates_long_running_child (verifies SIGTERM under 3 s)
- drop_kills_running_child (verifies Drop releases the child)
- restart_works
- stderr_tail_returns_last_lines

just qa green: 16/16 unit tests pass, fmt and clippy -D warnings clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
charles left a comment

Review — harness lifecycle + stdout/stderr capture (#2)

OutputBuffer

  • Backed par Mutex<Vec<u8>>. L'éviction utilise buf.drain(..drop_n) qui est O(n) (décalage de tous les éléments restants). Une VecDeque avec pop_front() serait plus adaptée sémantiquement et plus efficace. À l'échelle d'une sortie de test (quelques Ko à Mo), ça n'a pas d'impact pratique, mais c'est un écart de design visible.
  • String::from_utf8_lossy pour la troncature mid-byte — bon choix défensif.

Lifecycle SIGTERM/SIGKILL

  • stop() : SIGTERM → attente shutdown_timeoutstart_kill() (SIGKILL) — correct.
  • Drop : SIGTERM puis SIGKILL immédiat sans wait(). Le process devient techniquement un zombie jusqu'à la fin du parent. Acceptable pour des binaires de test courte durée, surtout avec kill_on_drop(true) sur la commande tokio.

Tests

  • drop_kills_running_child : Drop étant synchrone (pas d'await), le test passe toujours en < 3s — c'est un vrai check mais l'assertion temporelle est un peu trompeuse. Un commentaire explicitant l'intention serait bienvenu.
  • Bonne couverture : capture stdout/stderr, restart, timeout, env var forwarding.

Aucun bloquant.

## Review — harness lifecycle + stdout/stderr capture (#2) ### `OutputBuffer` - Backed par `Mutex<Vec<u8>>`. L'éviction utilise `buf.drain(..drop_n)` qui est O(n) (décalage de tous les éléments restants). Une `VecDeque` avec `pop_front()` serait plus adaptée sémantiquement et plus efficace. À l'échelle d'une sortie de test (quelques Ko à Mo), ça n'a pas d'impact pratique, mais c'est un écart de design visible. - `String::from_utf8_lossy` pour la troncature mid-byte — bon choix défensif. ### Lifecycle SIGTERM/SIGKILL - `stop()` : SIGTERM → attente `shutdown_timeout` → `start_kill()` (SIGKILL) — correct. - `Drop` : SIGTERM puis SIGKILL immédiat sans `wait()`. Le process devient techniquement un zombie jusqu'à la fin du parent. Acceptable pour des binaires de test courte durée, surtout avec `kill_on_drop(true)` sur la commande tokio. ### Tests - `drop_kills_running_child` : `Drop` étant synchrone (pas d'`await`), le test passe toujours en < 3s — c'est un vrai check mais l'assertion temporelle est un peu trompeuse. Un commentaire explicitant l'intention serait bienvenu. - Bonne couverture : capture stdout/stderr, restart, timeout, env var forwarding. Aucun bloquant.
charles left a comment

Pas de bloquant — implémentation solide, bonne couverture. Notes cosmétiques sur Vec vs VecDeque pour OutputBuffer et l'intention du test drop_kills_running_child.

✅ Pas de bloquant — implémentation solide, bonne couverture. Notes cosmétiques sur `Vec` vs `VecDeque` pour `OutputBuffer` et l'intention du test `drop_kills_running_child`.
charles changed target branch from feat/1-bootstrap-crate-skeleton to main 2026-04-11 18:42:00 +00:00
charles deleted branch feat/2-harness-lifecycle 2026-04-11 18:42:15 +00:00
Sign in to join this conversation.
No description provided.