feat(voice): /architect/transcribe SSE proxy + speaches integration (VOICE-1) #778
charles/claude-hooks!778
Adds a server-side proxy so the browser never talks directly to speaches.
- `speech_json` column to `service_config`
- `config/service.json` seeded with `speech.*` factory defaults
- `getSpeechConfig()` accessor reads exclusively via `getServiceConfig()`
- `POST /architect/transcribe`: multipart audio → speaches → SSE `partial`/`final`/`error` frames
- `language=auto` omits field upstream, `AbortController` keyed to browser disconnect

Closes #773
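As a rough illustration of two of the behaviours listed above (SSE frame emission and the `language=auto` mapping), here is a minimal sketch. The helper names `formatSseFrame` and `upstreamFields` are hypothetical and are not taken from the PR; the actual handler in `transcribe.ts` may be structured quite differently.

```typescript
// Hypothetical helpers, for illustration only; not the PR's actual code.

// The three SSE event names the description mentions.
type FrameEvent = "partial" | "final" | "error";

// Serialize one Server-Sent Events frame: an `event:` line, a `data:` line,
// and a blank line that terminates the frame. JSON.stringify without an
// indent argument never emits raw newlines, so the payload stays on one line.
function formatSseFrame(event: FrameEvent, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Build the text fields forwarded upstream. When the browser asks for
// `language=auto`, the field is left out entirely so that the upstream
// service can detect the language itself.
function upstreamFields(model: string, language: string): Record<string, string> {
  const fields: Record<string, string> = { model };
  if (language !== "auto") {
    fields["language"] = language;
  }
  return fields;
}
```

For example, `formatSseFrame("partial", { text: "hel" })` yields a frame that a browser `EventSource` or a fetch-based SSE reader would dispatch as a `partial` event.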
Test plan
- `just qa` green (3134 tests, 0 fail)
- `POST /architect/transcribe` with speech disabled → 503

- Migration 008 adds speech_json column to service_config
- config/service.json seeded with speech.{enabled,transcribe_url,model,...} defaults
- getSpeechConfig() accessor in webhook-config.ts reads via getServiceConfig()
- POST /architect/transcribe: multipart audio → speaches → SSE partial/final/error events
- 503 when disabled, 413 on oversize, auto-language omits field upstream, abort on disconnect
- 12 unit/integration tests covering guard rails, language mapping, SSE fan-out, dynamic toggle

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

`apps/server/src/http/handlers/transcribe.ts`: `cfg.max_audio_seconds` is read from config and stored, but never enforced. AC says reject with 413 (or SSE error) when audio duration exceeds `speech.max_audio_seconds`. Current code only gates on `max_audio_bytes`. At minimum, after speaches returns `duration_ms` in the final event, compare `duration_ms / 1000 > cfg.max_audio_seconds` and emit an SSE `error` frame (`code: "payload-too-large"`) instead of the `final` frame. The blob-header heuristic is best-effort, but the post-response check is required. Add a unit test: `insertSpeechConfig({ max_audio_seconds: 1 })` + mock that returns `duration_ms: 5000` → downstream SSE is `error`, not `final`.

Fixed in `c4f7bf3`.

All three emission sites (SSE final, plain-JSON fallback, implicit-final) now check `durationMs / 1000 > cfg.max_audio_seconds` and call `emitError("payload-too-large", …)` instead of emitting the final frame when the limit is breached. `duration_ms` from speaches response is extracted and preferred over the wallclock fallback in the SSE and plain-JSON paths.

Added two tests in a new `transcribe — duration guard` block:

- `max_audio_seconds: 1` + mock returning `duration_ms: 5000` → single SSE `error` event with `code: "payload-too-large"`
- `max_audio_seconds: 10` + same mock → normal `final` event (guard does not fire)

Duration guard (`max_audio_seconds`) now enforced on both plain-JSON (line 191) and SSE (line 278) paths. Test at line 284 covers the rejection case. CI green.

Adds `GET /architect/transcribe/health` so the composer and dashboard can check whether speaches is enabled and reachable before rendering the mic button.

- Returns `{ enabled, reachable, model, default_language, allowed_languages, last_error? }`; always 200
- Probes `${transcribe_url}/v1/models` (HEAD); result cached 30 s; `?refresh=1` busts it
- `enabled=false` returns early, with no upstream call
- Timeout and non-2xx → `reachable: false` + `last_error`
- Auth: `guardMutating`, same as the rest of `/architect/*`
- Stacked on VOICE-1 (`dev/773`); base branch is `dev/773`

Closes #774
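The 30 s cache with a `?refresh=1` bypass described for the health endpoint could look roughly like the sketch below. Everything here is an assumption made for illustration: `makeCachedProbe`, the `Reachability` shape, and the injected `probe` and `now` parameters are hypothetical names, and the real handler may cache differently.

```typescript
// Hypothetical sketch of a TTL-cached reachability probe; not the PR's code.

interface Reachability {
  reachable: boolean;
  last_error?: string;
}

// The upstream check is injected, so the caching logic stays independent of
// how the probe is performed (e.g. a HEAD request with a timeout).
type Probe = () => Promise<Reachability>;

function makeCachedProbe(
  probe: Probe,
  ttlMs: number = 30_000,
  now: () => number = Date.now,
): (refresh?: boolean) => Promise<Reachability> {
  let cached: { at: number; value: Reachability } | undefined;

  return async (refresh = false) => {
    // Serve the cached result while it is fresh, unless a refresh is forced.
    if (!refresh && cached !== undefined && now() - cached.at < ttlMs) {
      return cached.value;
    }
    let value: Reachability;
    try {
      value = await probe();
    } catch (err) {
      // A thrown probe (timeout, network failure) is reported, not rethrown,
      // so the caller can still answer with a normal response.
      value = {
        reachable: false,
        last_error: err instanceof Error ? err.message : String(err),
      };
    }
    cached = { at: now(), value };
    return value;
  };
}
```

A handler built on this would pass `refresh = true` when the request carries `?refresh=1`, and would skip the probe entirely when the feature is disabled. Note that this sketch also caches failures for the full TTL; whether the PR does the same is not stated.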