VOICE-3: Composer mic toggle + live partials #775

Closed
opened 2026-05-02 21:41:53 +00:00 by code-lead · 0 comments
Collaborator

As an operator typing into the workspace chat, I want to click a mic icon to start dictating, see the words appear live, and click again to stop and have the final transcript inserted at my caret — without losing whatever I had already typed.

Acceptance criteria

Mic button

  • apps/web/src/components/planner/composer.tsx gains a <Button> with a lucide-react Mic icon, sitting between the attachments strip and Send/Queue/Stop. aria-label swaps "Start dictation" → "Stop dictation" on toggle. aria-pressed mirrors the recording state.
  • The button is hidden when the /architect/transcribe/health probe (see VOICE-2 (#774)) reports the feature disabled or unreachable. It renders disabled (with a tooltip explaining why) when the browser lacks navigator.mediaDevices.getUserMedia.
  • First click prompts mic permission. Permission denial surfaces a one-shot toast ("Microphone access denied — enable in browser settings") and reverts the button to idle.
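The render gating above can be sketched as a pure helper. The probe payload shape (reachable/enabled flags plus default_language) is an assumption pending the VOICE-2 (#774) health endpoint; the tooltip copy is illustrative.

```typescript
// Sketch of the mic-button render gating; payload shape is assumed.
type TranscribeHealth =
  | { reachable: true; enabled: boolean; default_language: string }
  | { reachable: false };

type MicButtonState =
  | { render: false }
  | { render: true; disabled: boolean; tooltip?: string };

function micButtonState(
  health: TranscribeHealth,
  hasGetUserMedia: boolean,
): MicButtonState {
  // Hidden entirely when the feature is disabled or the probe is unreachable.
  if (!health.reachable || !health.enabled) return { render: false };
  // Rendered but disabled (with an explanatory tooltip) without getUserMedia.
  if (!hasGetUserMedia) {
    return {
      render: true,
      disabled: true,
      tooltip: "Your browser does not support microphone capture",
    };
  }
  return { render: true, disabled: false };
}
```

Keeping this decision out of the component makes the "hidden vs. disabled" distinction trivially unit-testable.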

Recording state machine

  • States: idle → requesting-permission → recording → uploading → idle. Errors at any step return to idle with a toast.
  • While recording: a small pulsing dot appears next to the button (CSS animation, gated by @media (prefers-reduced-motion: reduce)), and a live elapsed timer (mm:ss) renders to the right. Hard cap at speech.max_audio_seconds — auto-stops at the limit and proceeds to uploading.
  • Esc while recording cancels (no upload, no insertion). Esc already aborts a streaming architect turn (onAbort in the Composer) — keep the existing handler, just add a higher-priority cancel when recording is active.
  • MediaRecorder is configured for audio/webm;codecs=opus when supported, falling back to the browser's default MIME type. Chunk timeslice of 250 ms so we have something to upload promptly.
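The state machine above can be sketched as a pure reducer; event names here are illustrative, not final (Esc maps to CANCEL, the speech.max_audio_seconds cap to TIME_LIMIT).

```typescript
// Minimal sketch of the recording state machine; event names are assumptions.
type RecState =
  | { kind: "idle" }
  | { kind: "requesting-permission" }
  | { kind: "recording"; startedAt: number }
  | { kind: "uploading" };

type RecEvent =
  | { type: "CLICK_MIC" }
  | { type: "PERMISSION_GRANTED"; now: number }
  | { type: "PERMISSION_DENIED" }
  | { type: "STOP" }        // second click
  | { type: "TIME_LIMIT" }  // speech.max_audio_seconds reached: auto-stop
  | { type: "CANCEL" }      // Esc: no upload, no insertion
  | { type: "UPLOAD_DONE" }
  | { type: "ERROR" };      // any failure returns to idle (toast raised elsewhere)

function transition(state: RecState, ev: RecEvent): RecState {
  switch (state.kind) {
    case "idle":
      return ev.type === "CLICK_MIC" ? { kind: "requesting-permission" } : state;
    case "requesting-permission":
      if (ev.type === "PERMISSION_GRANTED")
        return { kind: "recording", startedAt: ev.now };
      if (ev.type === "PERMISSION_DENIED" || ev.type === "ERROR")
        return { kind: "idle" };
      return state;
    case "recording":
      if (ev.type === "STOP" || ev.type === "TIME_LIMIT")
        return { kind: "uploading" };
      if (ev.type === "CANCEL" || ev.type === "ERROR") return { kind: "idle" };
      return state;
    case "uploading":
      if (ev.type === "UPLOAD_DONE" || ev.type === "ERROR")
        return { kind: "idle" };
      return state;
  }
}

// mm:ss elapsed timer rendered next to the pulsing dot.
function formatElapsed(ms: number): string {
  const s = Math.floor(ms / 1000);
  const mm = String(Math.floor(s / 60)).padStart(2, "0");
  const ss = String(s % 60).padStart(2, "0");
  return `${mm}:${ss}`;
}
```

A discriminated-union reducer like this keeps illegal transitions (e.g. STOP while idle) unrepresentable side effects, and is exactly what the Vitest state-machine assertions below would drive.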

Streaming partials & final insert

  • On stop, the assembled blob POSTs to /architect/transcribe with the operator's resolved default language (no per-browser pref — comes from speech.default_language via the health probe payload). The response is consumed via EventSource-style SSE — use fetch + a ReadableStream reader since EventSource can't POST (match the helper used in useArchitectStream).
  • partial events render under the textarea in a role="status" aria-live="polite" band styled with text-fg-muted, debounced ~1 s so screen-reader announcements don't flood.
  • On final, the final text is inserted at the current caret position of the textarea via a controlled-input update — preserving any text the operator typed before/after the recording started. If the textarea has lost focus, append at the end with a leading space when the existing text is non-empty.
  • On error, surface a toast (tone="error") with the upstream message, drop any partial preview, leave the textarea unchanged.
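The caret-insert rule above can be captured in a small pure function; a plain interface stands in for the real textarea so the logic is testable outside the DOM, and the wiring into the controlled input is left to the component.

```typescript
// Sketch of the final-transcript insertion rule; the interface is a
// stand-in for the real <textarea> element.
interface CaretTarget {
  value: string;
  selectionStart: number; // caret position
  focused: boolean;
}

function insertFinalTranscript(
  t: CaretTarget,
  transcript: string,
): { value: string; caret: number } {
  if (!t.focused) {
    // Focus lost: append at the end, with a leading space only when the
    // existing text is non-empty.
    const value = t.value.length > 0 ? `${t.value} ${transcript}` : transcript;
    return { value, caret: value.length };
  }
  // Focused: splice at the caret, preserving text on both sides.
  const before = t.value.slice(0, t.selectionStart);
  const after = t.value.slice(t.selectionStart);
  return {
    value: before + transcript + after,
    caret: before.length + transcript.length,
  };
}
```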

Tests

  • Vitest: mock MediaRecorder + a stub SSE reader; assert the state-machine transitions on each event, that Esc cancels cleanly, and that the final insert respects caret position.
  • Vitest: assert the mic button is hidden when the health probe returns enabled: false.
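The stub SSE reader in those tests could feed a line-oriented parser like the sketch below. The partial/final/error event names come from the criteria above, but the exact wire framing (event:/data: pairs separated by a blank line, per standard SSE) is an assumption pending the VOICE-1 (#773) proxy.

```typescript
// Sketch of an SSE block parser for the stub reader; wire format assumed.
interface SseEvent {
  event: string; // "partial" | "final" | "error"
  data: string;
}

function parseSse(buffer: string): SseEvent[] {
  const events: SseEvent[] = [];
  // Events are separated by a blank line per standard SSE framing.
  for (const block of buffer.split("\n\n")) {
    let event = "message";
    const data: string[] = [];
    for (const line of block.split("\n")) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) data.push(line.slice(5).trim());
    }
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return events;
}
```

In the real hook the same parsing would run incrementally over chunks from the fetch ReadableStream (mirroring useArchitectStream); a whole-buffer parse keeps the test stub simple.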

Out of scope

  • Mobile-specific recording UX (push-to-talk gesture, background recording). Desktop browsers only this iteration.
  • Replacing the existing keyboard composer flow — voice is purely additive; ⌘/ctrl+Enter still sends.
  • Capturing audio outside the workspace / planner composer (no global hotkey, no recording from other routes).
  • TTS read-back of architect responses.
  • Settings-group UI — that lands in VOICE-4 (settings group, posted next).

References

  • specs/workspace-chat-voice-input.md — full spec (P2 section).
  • apps/web/src/components/planner/composer.tsx — the shared composer used by both the workspace and planner chat surfaces.
  • apps/web/CLAUDE.md — primitives, a11y baseline, radius/shadow conventions.
  • useArchitectStream — existing helper that does fetch + ReadableStream SSE consumption (pattern to mirror for the transcribe POST).

Dependencies

  • Blocked by VOICE-1 (#773) — the /architect/transcribe server proxy must exist before the composer can call it. Native dependency edge recorded against #773.
  • Blocked by VOICE-2 (#774) — the composer reads the health probe to decide whether to render the mic and which default language to forward. Native dependency edge recorded against #774.
Reference
charles/claude-hooks#775