VOICE-1: /architect/transcribe server proxy + speaches integration #773
Blocks
#774 VOICE-2: /architect/transcribe/health probe endpoint
charles/claude-hooks
#775 VOICE-3: Composer mic toggle + live partials
charles/claude-hooks
#776 VOICE-4: Settings → Service "Voice input" group
charles/claude-hooks
Reference
charles/claude-hooks#773
As an operator, I want the claude-hooks server to broker every transcription request so my speaches instance stays bound to localhost and the browser only ever talks to the claude-hooks origin it already has a session cookie for.
Acceptance criteria
Config plumbing
- […] `config/service.json` under a new `"speech": { … }` block matching the spec's storage table (`speech.enabled`, `speech.transcribe_url`, `speech.model`, `speech.default_language`, `speech.max_audio_seconds`, `speech.max_audio_bytes`, `speech.allowed_languages_json`). Not read at runtime; only by SVC-1's `syncServiceConfigBuiltin`.
- `getSpeechConfig()` accessor lands next to the other typed sub-accessors that SVC-2 introduces. Reads exclusively via `getServiceConfig()`. No `readFileSync('service.json')` anywhere.
- If `speech.enabled === false` or `speech.transcribe_url` is empty, `POST /architect/transcribe` returns `503 service-disabled` with a JSON body explaining how to flip the flag from the dashboard. Don't 404: the UI needs the distinction so it can disable the mic button gracefully.

Endpoint shape
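The `503 service-disabled` gate from the config plumbing criteria could be sketched as a pure function that the endpoint calls first. This is a minimal sketch under assumptions: the `SpeechConfig` interface, the `gateTranscribe` name, and the message wording are hypothetical; only the two config keys, the status code, and the `service-disabled` code come from the story.

```typescript
// Hypothetical typed view of two speech.* keys. The field names follow the
// storage table in the story; the interface itself is an assumption.
interface SpeechConfig {
  enabled: boolean;
  transcribe_url: string;
}

type GateResult =
  | { ok: true }
  | { ok: false; status: 503; body: { code: string; message: string } };

// Decide whether POST /architect/transcribe may proceed (hypothetical helper).
function gateTranscribe(cfg: SpeechConfig): GateResult {
  const disabled = cfg.enabled === false || cfg.transcribe_url.trim() === "";
  if (!disabled) {
    return { ok: true };
  }
  return {
    ok: false,
    status: 503,
    body: {
      code: "service-disabled",
      // Illustrative wording only: the story just requires that the body
      // explain how to flip the flag from the dashboard.
      message:
        "Voice input is disabled. Enable it in the dashboard under Settings → Service → Voice input.",
    },
  };
}
```

Returning a tagged result rather than writing to the response keeps the check testable without an HTTP server; the route handler would translate `ok: false` into the actual `503` reply.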
- `POST /architect/transcribe` accepts `multipart/form-data` with:
  - `audio`: the recorded blob (typically `audio/webm;codecs=opus`).
  - `language` (optional): ISO 639-1 code or `auto`. Falls back to `speech.default_language`.
  - `prompt` (optional): forwarded to speaches as the Whisper `prompt` field for context-priming.
- […] `/architect/*` (reuse what `architect.ts` already wires).
- Reject requests exceeding `speech.max_audio_bytes` with `413`. Reject requests where the audio duration (deduced from the blob header where possible, otherwise enforced after speaches replies) exceeds `speech.max_audio_seconds` with `413`.
- […] (`text/event-stream`) with envelope events:
  - `event: partial`, `data: { "text": "…", "is_final": false }`: emitted as speaches yields incremental hypotheses.
  - `event: final`, `data: { "text": "…", "language": "fr", "duration_ms": 4321 }`: emitted once speaches finishes.
  - `event: error`, `data: { "code": "...", "message": "..." }`: emitted on any upstream failure; the stream then closes.
- […] `/events` (see `apps/server/src/main.ts::SSE_HEARTBEAT_MS`).

Speaches integration
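The three envelope events listed under Endpoint shape map directly onto SSE wire frames. The sketch below shows one way to serialize them, assuming a hypothetical `TranscribeEvent` union and `formatSseFrame` helper; the event names and payload fields are taken from the story, everything else is illustrative.

```typescript
// Hypothetical union of the three envelope events named in the story.
type TranscribeEvent =
  | { event: "partial"; data: { text: string; is_final: false } }
  | { event: "final"; data: { text: string; language: string; duration_ms: number } }
  | { event: "error"; data: { code: string; message: string } };

// Serialize one event as an SSE frame: an "event:" line, a "data:" line,
// and a blank line that terminates the frame.
function formatSseFrame(ev: TranscribeEvent): string {
  return `event: ${ev.event}\ndata: ${JSON.stringify(ev.data)}\n\n`;
}

// Example: two partials followed by a final, in arrival order.
const frames: string[] = [
  formatSseFrame({ event: "partial", data: { text: "bon", is_final: false } }),
  formatSseFrame({ event: "partial", data: { text: "bonjour", is_final: false } }),
  formatSseFrame({
    event: "final",
    data: { text: "bonjour", language: "fr", duration_ms: 4321 },
  }),
];
```

Because `JSON.stringify` never emits raw newlines, each payload fits on a single `data:` line, so no multi-line `data:` handling is needed for these events.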
- […] `${transcribe_url}/v1/audio/transcriptions` with `model` and `language` (`null` when the operator chose `auto`, since speaches/Whisper auto-detects when omitted).
- […] `stream=true` and parses speaches' SSE response, fanning events through to the browser. If the upstream returns plain JSON instead of SSE (older speaches builds), fall back to a single `final` event.
- […] (`AbortController` keyed to the response's close signal).

Tests
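Two of the Speaches integration rules above are pure decisions and can be sketched without any network code: which form fields go upstream, and whether the upstream reply should be parsed as SSE or treated as the plain-JSON fallback. Both helper names (`upstreamFields`, `chooseUpstreamMode`) are hypothetical, and detecting the fallback via the `Content-Type` header is an assumption; the story only says that older speaches builds may return plain JSON.

```typescript
// Build the text fields sent to ${transcribe_url}/v1/audio/transcriptions.
// `requested` is the optional `language` form field from the browser;
// `defaultLanguage` stands in for speech.default_language.
function upstreamFields(
  model: string,
  requested: string | undefined,
  defaultLanguage: string,
): Record<string, string> {
  const language = requested ?? defaultLanguage;
  const fields: Record<string, string> = { model, stream: "true" };
  // "auto" means: omit the field so speaches/Whisper auto-detects.
  if (language !== "auto") {
    fields.language = language;
  }
  return fields;
}

type UpstreamMode = "sse" | "json-fallback";

// Assumption: the upstream Content-Type distinguishes a streamed reply from
// the plain-JSON reply of older builds.
function chooseUpstreamMode(contentType: string | null): UpstreamMode {
  const normalized = (contentType ?? "").trim().toLowerCase();
  return normalized.startsWith("text/event-stream") ? "sse" : "json-fallback";
}
```

In `json-fallback` mode the proxy would emit exactly one `final` event built from the JSON body, as the criteria require.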
- `language: "auto"` → omits the field upstream; `language: "fr"` → forwards verbatim.
- […] `413`; `speech.enabled=false` → `503`; missing operator session → `401`.
- […] `partial` SSE frames and a `final` produces three SSE frames downstream in the same order.
- […] `speech.enabled` from `false` to `true` at `scope='global'` (via direct `service_config` insert in the test) makes the next request succeed without restart.

Out of scope
- […] `secret` table: defer until we actually need a hosted provider. speaches needs no auth.

References
- `specs/workspace-chat-voice-input.md`: full spec (P1 section).
- `specs/config-to-db.md`: SVC-1/SVC-2/SVC-3 contract this story lands on from day one.
- `~/.config/systemd/user/speaches.service`: existing speaches unit on the desktop, port 8078, model `deepdml/faster-whisper-large-v3-turbo-ct2`, `STT_MODEL_TTL=-1` (warm).
- […] `POST /v1/audio/transcriptions` with `stream=true` for SSE partials.

Dependencies
- […] `/issues/ready`; the architect (or operator) assigns it manually to kick off dispatch.