VOICE-3: Composer mic toggle + live partials #775
Depends on
#773 VOICE-1: /architect/transcribe server proxy + speaches integration
charles/claude-hooks
#774 VOICE-2: /architect/transcribe/health probe endpoint
charles/claude-hooks
As an operator typing into the workspace chat, I want to click a mic icon to start dictating, see the words appear live, and click again to stop and have the final transcript inserted at my caret — without losing whatever I had already typed.
Acceptance criteria
Mic button
- `apps/web/src/components/planner/composer.tsx` gains a `<Button>` with the `lucide-react` `Mic` icon, sitting between the attachments strip and Send/Queue/Stop.
- `aria-label="Start dictation"` → `"Stop dictation"` swap on toggle. `aria-pressed` mirrors the recording state.
- `/architect/transcribe/health` probe (see VOICE-2 (#774)) reports the feature disabled or unreachable. It renders disabled (with a tooltip explaining why) when the browser lacks `navigator.mediaDevices.getUserMedia`.
Recording state machine
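A minimal sketch of how the state machine in this section could be modelled as a pure reducer, which keeps the transitions easy to assert in unit tests. The names `RecState`, `RecEvent`, `transition`, and `formatElapsed` are hypothetical and do not come from `composer.tsx`; toasts and other side effects are assumed to live in the component that calls the reducer.

```typescript
// Hypothetical names throughout; only the four state names come from the issue.
type RecState = "idle" | "requesting-permission" | "recording" | "uploading";

type RecEvent =
  | { type: "toggle" }             // mic button clicked
  | { type: "permission-granted" } // getUserMedia resolved
  | { type: "limit-reached" }      // speech.max_audio_seconds elapsed
  | { type: "escape" }             // Esc pressed
  | { type: "final" }              // final transcript received
  | { type: "error" };             // failure at any step

function transition(state: RecState, event: RecEvent): RecState {
  // Errors at any step return to idle (the caller shows the toast).
  if (event.type === "error") return "idle";
  switch (state) {
    case "idle":
      return event.type === "toggle" ? "requesting-permission" : state;
    case "requesting-permission":
      return event.type === "permission-granted" ? "recording" : state;
    case "recording":
      // A second click or the hard cap both stop and proceed to upload.
      if (event.type === "toggle" || event.type === "limit-reached") return "uploading";
      // Esc cancels: no upload, no insertion.
      if (event.type === "escape") return "idle";
      return state;
    case "uploading":
      return event.type === "final" ? "idle" : state;
  }
}

// Formats whole elapsed seconds as mm:ss for the live timer.
function formatElapsed(totalSeconds: number): string {
  const s = Math.max(0, Math.floor(totalSeconds));
  const pad = (n: number): string => (n < 10 ? "0" : "") + String(n);
  return pad(Math.floor(s / 60)) + ":" + pad(s % 60);
}
```

Because `escape` only has an effect in `recording`, the existing `onAbort` handler can stay in place and run whenever the reducer leaves the state unchanged.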
- `idle → requesting-permission → recording → uploading → idle`. Errors at any step return to `idle` with a toast.
- `recording`: a small pulsing dot appears next to the button (CSS animation, gated by `@media (prefers-reduced-motion: reduce)`), and a live elapsed timer (`mm:ss`) renders to the right.
- Hard cap at `speech.max_audio_seconds`: recording auto-stops at the limit and proceeds to `uploading`.
- `Esc` while recording cancels (no upload, no insertion). `Esc` already aborts a streaming architect turn (`onAbort` in the Composer); keep the existing handler and add a higher-priority cancel when recording is active.
- `MediaRecorder` configured for `audio/webm;codecs=opus` when supported, falling back to the browser default MIME type. Chunk timeslice is 250 ms so there is something to upload promptly.
Streaming partials & final insert
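As a rough illustration of the SSE consumption this section calls for, the sketch below splits buffered response text into complete SSE frames and keeps any trailing incomplete frame for the next chunk. `SseEvent` and `drainSseBuffer` are hypothetical names, not the helper in `useArchitectStream`, and the shape of each event's `data` payload is not specified in this issue, so it is passed through as a raw string.

```typescript
// Hypothetical helper; the real code is expected to mirror useArchitectStream.
interface SseEvent {
  event: string; // e.g. "partial", "final", "error"
  data: string;  // raw payload; its format is an assumption left to the caller
}

// Splits accumulated text into complete SSE frames (terminated by a blank
// line) and returns the unfinished tail so it can be prepended to the next chunk.
function drainSseBuffer(buffer: string): { events: SseEvent[]; rest: string } {
  const events: SseEvent[] = [];
  const frames = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = frames.pop() ?? ""; // last piece is incomplete (or empty)
  for (const frame of frames) {
    let event = "message"; // SSE default event name
    const data: string[] = [];
    for (const line of frame.split("\n")) {
      if (line.startsWith("event:")) {
        event = line.slice(6).trim();
      } else if (line.startsWith("data:")) {
        // Per the SSE format, one leading space after the colon is dropped.
        data.push(line.slice(5).replace(/^ /, ""));
      }
    }
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return { events, rest };
}
```

In the browser, the caller would read chunks from the `fetch` response body, decode them to text, append them to `rest`, and dispatch each returned event by its `event` name.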
- `/architect/transcribe` with the operator's resolved default language (no per-browser preference; it comes from `speech.default_language` via the health probe payload). The response is consumed via `EventSource`-style SSE: use `fetch` + a `ReadableStream` reader, since `EventSource` can't POST (match the helper used in `useArchitectStream`).
- `partial` events render under the textarea in a `role="status" aria-live="polite"` band styled with `text-fg-muted`, debounced ~1 s so screen-reader announcements don't flood.
- On `final`, the final text is inserted at the current caret position of the textarea via a controlled-input update, preserving any text the operator typed before or after the recording started. If the textarea has lost focus, append at the end, with a leading space when the existing text is non-empty.
- On `error`, surface a toast (`tone="error"`) with the upstream message, drop any partial preview, and leave the textarea unchanged.
Tests
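One way to make the caret assertion straightforward to test is to keep the insertion rule in a pure function that the controlled textarea calls on `final`. The sketch below assumes hypothetical names (`insertTranscript`, `InsertResult`) and models "textarea has lost focus" as a `null` caret; none of this is taken from the existing composer code.

```typescript
// Hypothetical helper; names and the null-caret convention are assumptions.
interface InsertResult {
  value: string; // new textarea value
  caret: number; // where the caret should land after the update
}

function insertTranscript(
  current: string,
  transcript: string,
  caret: number | null, // null models a textarea that has lost focus
): InsertResult {
  if (caret === null) {
    // Append at the end, with a leading space only when text already exists.
    const separator = current.length > 0 ? " " : "";
    const value = current + separator + transcript;
    return { value, caret: value.length };
  }
  // Clamp so a stale caret index can never split outside the string.
  const at = Math.min(Math.max(caret, 0), current.length);
  const value = current.slice(0, at) + transcript + current.slice(at);
  return { value, caret: at + transcript.length };
}
```

A test can then call `insertTranscript` directly with text on both sides of the caret and check that both halves survive, without mounting the component.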
- `MediaRecorder` + a stub SSE reader; assert the state-machine transitions on each event, that `Esc` cancels cleanly, and that the final insert respects caret position.
- `enabled: false`.
Out of scope
References
- `specs/workspace-chat-voice-input.md`: full spec (P2 section).
- `apps/web/src/components/planner/composer.tsx`: the shared composer used by both the workspace and planner chat surfaces.
- `apps/web/CLAUDE.md`: primitives, a11y baseline, radius/shadow conventions.
- `useArchitectStream`: existing helper that does `fetch` + `ReadableStream` SSE consumption (pattern to mirror for the transcribe POST).
Dependencies
- `/architect/transcribe` server proxy must exist before the composer can call it. Native dep edge written against VOICE-1 (#773)'s issue number.