feat(voice): /architect/transcribe SSE proxy + speaches integration (VOICE-1) #778

Merged
code-lead merged 3 commits from dev/773 into main 2026-05-02 22:29:03 +00:00
Collaborator

Adds a server-side proxy so the browser never talks directly to speaches.

  • Migration 008 adds speech_json column to service_config
  • config/service.json seeded with speech.* factory defaults
  • getSpeechConfig() accessor reads exclusively via getServiceConfig()
  • POST /architect/transcribe: multipart audio → speaches → SSE partial/final/error frames
  • Returns 503 when speech is disabled and 413 when the upload is oversize; language=auto omits the language field upstream; an AbortController cancels the upstream request when the browser disconnects
  • 12 unit/integration tests: guard rails, language mapping, SSE fan-out, dynamic enable toggle
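
The guard rails, language mapping, and SSE framing listed above can be sketched roughly as follows. This is a minimal illustration, not the code in `transcribe.ts`: the names `checkGuards`, `upstreamFields`, and `sseFrame`, the `SpeechConfig` shape, and the upstream field names `model` and `language` are all assumptions made for the example.

```typescript
// Hypothetical config shape; the real speech config has more fields.
interface SpeechConfig {
  enabled: boolean;
  max_audio_bytes: number;
}

type Guard = { ok: true } | { ok: false; status: 503 | 413 };

// Reject before contacting speaches: 503 when disabled, 413 when oversize.
function checkGuards(cfg: SpeechConfig, audioBytes: number): Guard {
  if (!cfg.enabled) return { ok: false, status: 503 };
  if (audioBytes > cfg.max_audio_bytes) return { ok: false, status: 413 };
  return { ok: true };
}

// language=auto means "let the upstream detect it", so the field is omitted.
// The field names here are assumed, not taken from the PR.
function upstreamFields(model: string, language: string): Record<string, string> {
  const fields: Record<string, string> = { model };
  if (language !== "auto") fields.language = language;
  return fields;
}

// Serialize one SSE frame: event name, one JSON data line, blank-line terminator.
function sseFrame(event: "partial" | "final" | "error", data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}
```

Keeping the guards as a pure function makes the dynamic enable toggle easy to test: the handler can re-read the config on every request and pass it in.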

Closes #773

Test plan

  • just qa green (3134 tests, 0 fail)
  • POST /architect/transcribe with speech disabled → 503
  • Enable speech in DB → next request proxies to speaches, streams SSE frames
dev self-assigned this 2026-05-02 21:54:29 +00:00
feat(voice): add /architect/transcribe SSE proxy + speaches integration (VOICE-1, #773)
All checks were successful
qa / dockerfile (pull_request) Successful in 6s
qa / qa (pull_request) Successful in 2m55s
9ad6143f2f
- Migration 008 adds speech_json column to service_config
- config/service.json seeded with speech.{enabled,transcribe_url,model,...} defaults
- getSpeechConfig() accessor in webhook-config.ts reads via getServiceConfig()
- POST /architect/transcribe: multipart audio → speaches → SSE partial/final/error events
- 503 when disabled, 413 on oversize, auto-language omits field upstream, abort on disconnect
- 12 unit/integration tests covering guard rails, language mapping, SSE fan-out, dynamic toggle

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
dev requested review from reviewer 2026-05-02 21:57:26 +00:00
reviewer requested changes 2026-05-02 22:03:38 +00:00
Dismissed
reviewer left a comment
  • behavior: apps/server/src/http/handlers/transcribe.ts reads cfg.max_audio_seconds from config and stores it, but never enforces it. The AC says to reject with 413 (or an SSE error) when audio duration exceeds speech.max_audio_seconds; the current code only gates on max_audio_bytes. At minimum, once speaches returns duration_ms in the final event, check duration_ms / 1000 > cfg.max_audio_seconds and, if true, emit an SSE error frame (code: "payload-too-large") instead of the final frame. The blob-header heuristic is best-effort, but the post-response check is required. Add a unit test: insertSpeechConfig({ max_audio_seconds: 1 }) + a mock that returns duration_ms: 5000 → downstream SSE is error, not final.
fix(voice): enforce max_audio_seconds duration guard post-transcription
All checks were successful
qa / dockerfile (pull_request) Successful in 7s
qa / qa (pull_request) Successful in 3m26s
c4f7bf3e98
Extract duration_ms from speaches final event (SSE and plain-JSON paths)
and emit SSE error(payload-too-large) when it exceeds cfg.max_audio_seconds;
add matching unit test.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Author
Collaborator

Fixed in c4f7bf3.

All three emission sites (SSE final, plain-JSON fallback, implicit-final) now check durationMs / 1000 > cfg.max_audio_seconds and, when the limit is breached, call emitError("payload-too-large", …) instead of emitting the final frame. In the SSE and plain-JSON paths, duration_ms is extracted from the speaches response and preferred over the wallclock fallback.

Added two tests in a new transcribe — duration guard block:

  • max_audio_seconds: 1 + mock returning duration_ms: 5000 → single SSE error event with code: "payload-too-large"
  • max_audio_seconds: 10 + same mock → normal final event (guard does not fire)
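
The check described above amounts to choosing between a `final` and an `error` frame once the duration is known. A minimal sketch, assuming a hypothetical helper `finalOrError` and a simplified frame shape (the actual handler calls `emitError` at each emission site, and its payloads may differ):

```typescript
// Simplified frame shapes for illustration only.
type Frame =
  | { event: "final"; data: { text: string; duration_ms: number } }
  | { event: "error"; data: { code: string; message: string } };

// Hypothetical helper: decide which frame to send once duration is known.
// The comparison is strict, so audio exactly at the limit still passes.
function finalOrError(text: string, durationMs: number, maxAudioSeconds: number): Frame {
  if (durationMs / 1000 > maxAudioSeconds) {
    return {
      event: "error",
      data: {
        code: "payload-too-large",
        message: `audio duration ${durationMs / 1000}s exceeds limit of ${maxAudioSeconds}s`,
      },
    };
  }
  return { event: "final", data: { text, duration_ms: durationMs } };
}
```

With `max_audio_seconds: 1` and `duration_ms: 5000` this yields an `error` frame; with `max_audio_seconds: 10` the same input yields `final`, matching the two test cases listed above.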
dev requested review from reviewer 2026-05-02 22:13:55 +00:00
reviewer approved these changes 2026-05-02 22:16:14 +00:00
reviewer left a comment

Duration guard (max_audio_seconds) now enforced on both plain-JSON (line 191) and SSE (line 278) paths. Test at line 284 covers the rejection case. CI green.

feat(voice): GET /architect/transcribe/health probe (VOICE-2, #774)
All checks were successful
qa / dockerfile (pull_request) Successful in 6s
qa / qa (pull_request) Successful in 5m57s
ea47da1ef0
Adds `GET /architect/transcribe/health` so the composer and dashboard can check whether speaches is enabled and reachable before rendering the mic button.

- Returns `{ enabled, reachable, model, default_language, allowed_languages, last_error? }` — always 200
- Probes `${transcribe_url}/v1/models` (HEAD); result cached 30 s; `?refresh=1` busts it
- `enabled=false` returns early — no upstream call
- Timeout and non-2xx → `reachable: false` + `last_error`
- Auth: `guardMutating`, same as the rest of `/architect/*`
- Stacked on VOICE-1; base branch is `dev/773`
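
The caching behavior in the list above could look roughly like the sketch below. It is not the handler from this commit: `makeHealthCheck`, the injected `probe` and `now` functions, and the reduced `Health` shape (without `model`, `default_language`, or `allowed_languages`) are assumptions made for illustration. Reporting `reachable: false` when disabled is also an assumption; the commit message only says that no upstream call is made.

```typescript
// Reduced response shape; the real endpoint returns more fields.
interface Health {
  enabled: boolean;
  reachable: boolean;
  last_error?: string;
}

// Resolves on a 2xx response; rejects on timeout or non-2xx.
type Probe = () => Promise<void>;

function makeHealthCheck(
  probe: Probe,
  isEnabled: () => boolean,
  now: () => number = Date.now,
  ttlMs = 30000,
) {
  let cached: { at: number; value: Health } | undefined;

  // refresh=true corresponds to ?refresh=1 and bypasses the cache.
  return async function health(refresh = false): Promise<Health> {
    // Disabled: return early without touching the upstream.
    if (!isEnabled()) return { enabled: false, reachable: false };
    if (!refresh && cached && now() - cached.at < ttlMs) return cached.value;

    let value: Health;
    try {
      await probe();
      value = { enabled: true, reachable: true };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      value = { enabled: true, reachable: false, last_error: message };
    }
    cached = { at: now(), value };
    return value;
  };
}
```

Failed probes are cached as well in this sketch, so a dead upstream is not hit on every page load; whether the real handler caches failures is not stated in the commit message.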

Closes #774
code-lead deleted branch dev/773 2026-05-02 22:29:05 +00:00
Reference
charles/claude-hooks!778