charles/claude-hooks

feat(voice): /architect/transcribe SSE proxy + speaches integration (VOICE-1) #778

Merged

code-lead merged 3 commits from dev/773 into main

2026-05-02 22:29:03 +00:00

dev commented

2026-05-02 21:54:29 +00:00

Collaborator

Adds a server-side proxy so the browser never talks directly to speaches.

- Migration 008 adds `speech_json` column to `service_config`
- `config/service.json` seeded with `speech.*` factory defaults
- `getSpeechConfig()` accessor reads exclusively via `getServiceConfig()`
- `POST /architect/transcribe`: multipart audio → speaches → SSE `partial`/`final`/`error` frames
- 503 when disabled, 413 on oversize, `language=auto` omits field upstream, `AbortController` keyed to browser disconnect
- 12 unit/integration tests: guard rails, language mapping, SSE fan-out, dynamic enable toggle

Closes #773

## Test plan

- [ ] `just qa` green (3134 tests, 0 fail)
- [ ] `POST /architect/transcribe` with speech disabled → 503
- [ ] Enable speech in DB → next request proxies to speaches, streams SSE frames
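
The guard rails listed above can be sketched as two pure functions. This is an illustration only, not the code in this PR: `SpeechConfig`, `checkGuards`, and `upstreamFields` are hypothetical names, and only the `enabled`, `max_audio_bytes`, and `model` keys mirror settings mentioned in this thread.

```typescript
// Hypothetical config shape; only these three keys are taken from the PR thread.
interface SpeechConfig {
  enabled: boolean;
  max_audio_bytes: number;
  model: string;
}

type GuardResult = { ok: true } | { ok: false; status: 503 | 413 };

// Decide whether a request may be proxied at all.
function checkGuards(cfg: SpeechConfig, audioBytes: number): GuardResult {
  if (!cfg.enabled) return { ok: false, status: 503 }; // speech disabled
  if (audioBytes > cfg.max_audio_bytes) return { ok: false, status: 413 }; // oversize upload
  return { ok: true };
}

// Build the form fields forwarded upstream. "auto" means the language
// field is left out entirely so the upstream service can detect it.
function upstreamFields(cfg: SpeechConfig, language: string): Record<string, string> {
  const fields: Record<string, string> = { model: cfg.model };
  if (language !== "auto") fields.language = language;
  return fields;
}
```

Keeping the checks free of I/O makes the 503/413 cases and the language mapping easy to cover in unit tests without a running speaches instance.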

dev self-assigned this

2026-05-02 21:54:29 +00:00

dev added 1 commit

2026-05-02 21:54:29 +00:00

feat(voice): add /architect/transcribe SSE proxy + speaches integration (VOICE-1, #773 )

qa / dockerfile (pull_request) Successful in 6s

qa / qa (pull_request) Successful in 2m55s

9ad6143f2f

- Migration 008 adds speech_json column to service_config
- config/service.json seeded with speech.{enabled,transcribe_url,model,...} defaults
- getSpeechConfig() accessor in webhook-config.ts reads via getServiceConfig()
- POST /architect/transcribe: multipart audio → speaches → SSE partial/final/error events
- 503 when disabled, 413 on oversize, auto-language omits field upstream, abort on disconnect
- 12 unit/integration tests covering guard rails, language mapping, SSE fan-out, dynamic toggle

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

dev requested review from reviewer

2026-05-02 21:57:26 +00:00

reviewer requested changes

2026-05-02 22:03:38 +00:00

Dismissed

reviewer left a comment

- **behavior** `apps/server/src/http/handlers/transcribe.ts`: `cfg.max_audio_seconds` is read from config and stored, but **never enforced**. AC says reject with 413 (or SSE error) when audio duration exceeds `speech.max_audio_seconds`. Current code only gates on `max_audio_bytes`. At minimum, after speaches returns `duration_ms` in the final event, compare `duration_ms / 1000 > cfg.max_audio_seconds` and emit an SSE `error` frame (`code: "payload-too-large"`) instead of the `final` frame. The blob-header heuristic is best-effort, but the post-response check is required. Add a unit test: `insertSpeechConfig({ max_audio_seconds: 1 })` + mock that returns `duration_ms: 5000` → downstream SSE is `error`, not `final`.

dev added 1 commit

2026-05-02 22:10:28 +00:00

fix(voice): enforce max_audio_seconds duration guard post-transcription

qa / dockerfile (pull_request) Successful in 7s

qa / qa (pull_request) Successful in 3m26s

c4f7bf3e98

Extract duration_ms from speaches final event (SSE and plain-JSON paths)
and emit SSE error(payload-too-large) when it exceeds cfg.max_audio_seconds;
add matching unit test.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

dev commented

2026-05-02 22:11:02 +00:00

Author

Collaborator

Fixed in c4f7bf3.

All three emission sites (SSE final, plain-JSON fallback, implicit-final) now check `durationMs / 1000 > cfg.max_audio_seconds` and call `emitError("payload-too-large", …)` instead of emitting the final frame when the limit is breached. `duration_ms` from speaches response is extracted and preferred over the wallclock fallback in the SSE and plain-JSON paths.

Added two tests in a new `transcribe — duration guard` block:

- `max_audio_seconds: 1` + mock returning `duration_ms: 5000` → single SSE `error` event with `code: "payload-too-large"`
- `max_audio_seconds: 10` + same mock → normal `final` event (guard does not fire)
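
For readers following the thread, the guard described above amounts to roughly the following. This is a minimal sketch under assumptions: `SseFrame`, `encodeFrame`, and `finalOrError` are hypothetical names, and the payload fields of the `final` frame are guesses, since the real `emitError` signature and frame shape in `transcribe.ts` are not shown here. The only fixed points are the comparison `durationMs / 1000 > max_audio_seconds`, the error code `payload-too-large`, and the standard `event:`/`data:` SSE wire format.

```typescript
// Hypothetical frame type; event names follow the partial/final/error frames in this PR.
interface SseFrame {
  event: "partial" | "final" | "error";
  data: Record<string, unknown>;
}

// Serialize one frame in the text/event-stream wire format.
function encodeFrame(frame: SseFrame): string {
  return `event: ${frame.event}\ndata: ${JSON.stringify(frame.data)}\n\n`;
}

// Pick the frame to emit once the transcript is complete: an error frame when
// the reported duration exceeds the limit, otherwise the final frame.
function finalOrError(text: string, durationMs: number, maxAudioSeconds: number): SseFrame {
  if (durationMs / 1000 > maxAudioSeconds) {
    return { event: "error", data: { code: "payload-too-large" } };
  }
  // Assumed payload shape for illustration only.
  return { event: "final", data: { text, duration_ms: durationMs } };
}
```

With `durationMs = 5000`, a limit of 1 second yields the `error` frame and a limit of 10 seconds yields the `final` frame, which matches the two test cases listed above.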

dev requested review from reviewer

2026-05-02 22:13:55 +00:00

reviewer approved these changes

2026-05-02 22:16:14 +00:00

reviewer left a comment

Duration guard (`max_audio_seconds`) now enforced on both plain-JSON (line 191) and SSE (line 278) paths. Test at line 284 covers the rejection case. CI green.

code-lead added 1 commit

2026-05-02 22:18:16 +00:00

feat(voice): GET /architect/transcribe/health probe (VOICE-2, #774 )

qa / dockerfile (pull_request) Successful in 6s

qa / qa (pull_request) Successful in 5m57s

ea47da1ef0

Adds `GET /architect/transcribe/health` so the composer and dashboard can check whether speaches is enabled and reachable before rendering the mic button.

- Returns `{ enabled, reachable, model, default_language, allowed_languages, last_error? }` — always 200
- Probes `${transcribe_url}/v1/models` (HEAD); result cached 30 s; `?refresh=1` busts it
- `enabled=false` returns early — no upstream call
- Timeout and non-2xx → `reachable: false` + `last_error`
- Auth: `guardMutating`, same as the rest of `/architect/*`
- Stacked on VOICE-1 (`dev/773`); base branch is `dev/773`

Closes #774
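
The caching rules in the commit message above (30 s TTL, `?refresh=1` bypass, no upstream call when disabled) might look like the sketch below. `HealthCache`, `ProbeResult`, `checkHealth`, and the injected `probe` and `now` parameters are hypothetical, and returning `reachable: false` when disabled is an assumption; the actual HEAD request to `${transcribe_url}/v1/models` is left to the caller-supplied `probe`.

```typescript
// Hypothetical probe outcome; field names follow the response keys listed in the commit.
interface ProbeResult {
  reachable: boolean;
  last_error?: string;
}

const TTL_MS = 30000; // probe results are cached for 30 s

class HealthCache {
  private entry: { result: ProbeResult; at: number } | undefined;

  // Return the cached result if it is still fresh; `refresh` forces a miss.
  get(now: number, refresh = false): ProbeResult | undefined {
    if (refresh || !this.entry) return undefined;
    return now - this.entry.at < TTL_MS ? this.entry.result : undefined;
  }

  set(result: ProbeResult, now: number): void {
    this.entry = { result, at: now };
  }
}

// Resolve health without touching upstream when disabled or when the cache is fresh.
async function checkHealth(
  enabled: boolean,
  cache: HealthCache,
  probe: () => Promise<ProbeResult>,
  now: number,
  refresh = false,
): Promise<ProbeResult> {
  if (!enabled) return { reachable: false }; // assumed value; early return, no upstream call
  const hit = cache.get(now, refresh);
  if (hit) return hit;
  let result: ProbeResult;
  try {
    result = await probe();
  } catch (err) {
    // Timeouts and network failures are reported, never thrown to the caller.
    result = { reachable: false, last_error: String(err) };
  }
  cache.set(result, now);
  return result;
}
```

Passing `now` and `probe` in as parameters keeps the TTL and refresh behaviour testable without real timers or network access.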