charles/claude-hooks


VOICE-1: /architect/transcribe server proxy + speaches integration #773

Closed

opened 2026-05-02 21:41:50 +00:00 by code-lead · 0 comments


Acceptance criteria

Config plumbing

Factory-image defaults added to config/service.json under a new "speech": { … } block matching the spec's storage table (speech.enabled, speech.transcribe_url, speech.model, speech.default_language, speech.max_audio_seconds, speech.max_audio_bytes, speech.allowed_languages_json). Not read at runtime — only by SVC-1's syncServiceConfigBuiltin.
getSpeechConfig() accessor lands next to the other typed sub-accessors that SVC-2 introduces. Reads exclusively via getServiceConfig(). No readFileSync('service.json') anywhere.
When speech.enabled === false or speech.transcribe_url is empty, POST /architect/transcribe returns 503 service-disabled with a JSON body explaining how to flip the flag from the dashboard. Don't 404 — the UI needs the distinction so it can disable the mic button gracefully.
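A minimal sketch of how the typed accessor and the disabled check could fit together. Everything here is an assumption for illustration: the `SpeechConfig` field names, the flat `speech.*` key layout of the raw record, and the helpers `toSpeechConfig` and `disabledResponse` are hypothetical, and the real `getServiceConfig()` return type may differ.

```typescript
// Hypothetical shape: field names mirror the speech.* keys listed above,
// but the real SpeechConfig type in the codebase may look different.
interface SpeechConfig {
  enabled: boolean;
  transcribeUrl: string;
  model: string;
  defaultLanguage: string;
  maxAudioSeconds: number;
  maxAudioBytes: number;
  allowedLanguages: string[];
}

// Assumption: the resolved service config is exposed as flat key/value pairs.
type RawServiceConfig = Record<string, unknown>;

function asString(value: unknown): string {
  return typeof value === "string" ? value.trim() : "";
}

function asNumber(value: unknown): number {
  const n = typeof value === "number" ? value : Number(value);
  return Number.isFinite(n) ? n : 0; // 0 when missing or not numeric
}

function parseLanguages(value: unknown): string[] {
  if (typeof value !== "string") return [];
  try {
    const parsed: unknown = JSON.parse(value);
    return Array.isArray(parsed)
      ? parsed.filter((x): x is string => typeof x === "string")
      : [];
  } catch {
    return []; // malformed JSON is treated as "no languages listed"
  }
}

function toSpeechConfig(raw: RawServiceConfig): SpeechConfig {
  return {
    enabled: raw["speech.enabled"] === true,
    transcribeUrl: asString(raw["speech.transcribe_url"]),
    model: asString(raw["speech.model"]),
    defaultLanguage: asString(raw["speech.default_language"]),
    maxAudioSeconds: asNumber(raw["speech.max_audio_seconds"]),
    maxAudioBytes: asNumber(raw["speech.max_audio_bytes"]),
    allowedLanguages: parseLanguages(raw["speech.allowed_languages_json"]),
  };
}

interface DisabledResponse {
  status: 503;
  body: { code: string; message: string };
}

// Returns the 503 payload when voice input is unavailable, otherwise null.
function disabledResponse(cfg: SpeechConfig): DisabledResponse | null {
  if (cfg.enabled && cfg.transcribeUrl !== "") return null;
  return {
    status: 503,
    body: {
      code: "service-disabled",
      message:
        "Voice input is disabled. Enable it and set a transcribe URL from the dashboard settings.",
    },
  };
}
```

Returning a distinct 503 body rather than a 404 keeps the UI able to tell "feature off" apart from "route missing", which is the distinction the criterion asks for.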

Endpoint shape

POST /architect/transcribe accepts multipart/form-data with:
- audio: the recorded blob (typically audio/webm;codecs=opus).
- language (optional): ISO 639-1 code or auto. Falls back to speech.default_language.
- prompt (optional): forwarded to speaches as the Whisper prompt field for context-priming.
Auth: same operator-session middleware as the rest of /architect/* (reuse what architect.ts already wires).
Reject requests larger than speech.max_audio_bytes with 413. Reject requests where the audio duration (deduced from blob header where possible, otherwise enforced after speaches replies) exceeds speech.max_audio_seconds with 413.
Response is SSE (text/event-stream) with envelope events:
- event: partial — data: { "text": "…", "is_final": false } emitted as speaches yields incremental hypotheses.
- event: final — data: { "text": "…", "language": "fr", "duration_ms": 4321 } once speaches finishes.
- event: error — data: { "code": "...", "message": "..." } on any upstream failure; the stream then closes.
Heartbeat comment frames every 10 s while waiting on speaches so idle proxies don't reset the connection (mirror the pattern already used by /events — see apps/server/src/main.ts::SSE_HEARTBEAT_MS).
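The envelope and heartbeat described above could be serialised roughly as follows. This is a sketch, not the existing `/events` implementation: `TranscribeEvent`, `formatSseEvent`, `HEARTBEAT_FRAME`, and `startHeartbeat` are hypothetical names, and the 10 000 ms default simply restates the interval from the criterion rather than reading `SSE_HEARTBEAT_MS`.

```typescript
// Envelope events as listed in the acceptance criteria.
type TranscribeEvent =
  | { event: "partial"; data: { text: string; is_final: false } }
  | { event: "final"; data: { text: string; language: string; duration_ms: number } }
  | { event: "error"; data: { code: string; message: string } };

// One SSE frame: an event line, a single-line JSON data line, then a blank line.
// JSON.stringify escapes newlines inside strings, so the data stays on one line.
function formatSseEvent(e: TranscribeEvent): string {
  return `event: ${e.event}\ndata: ${JSON.stringify(e.data)}\n\n`;
}

// Lines starting with ":" are SSE comments; clients ignore them, but they
// keep intermediaries from treating the connection as idle.
const HEARTBEAT_FRAME = ": heartbeat\n\n";

// Writes a heartbeat comment at a fixed interval and returns a stop function.
// `write` stands in for whatever the response object exposes.
function startHeartbeat(
  write: (chunk: string) => void,
  intervalMs: number = 10_000,
): () => void {
  const timer = setInterval(() => write(HEARTBEAT_FRAME), intervalMs);
  return () => clearInterval(timer);
}
```

The stop function would be called once the `final` or `error` event has been written, or when the browser disconnects.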

Speaches integration

Forwards the audio to ${transcribe_url}/v1/audio/transcriptions with model and language (the language field is omitted when the operator chose auto, since speaches/Whisper auto-detects when it is absent).
Requests stream=true and parses speaches' SSE response, fanning events through to the browser. If the upstream returns plain JSON instead of SSE (older speaches builds), fall back to a single final event.
Aborts the upstream request when the browser disconnects (use AbortController keyed to the response's close signal).
Logs a one-line entry per request: agent, language, duration_ms, audio_bytes, upstream HTTP status. No transcript text in logs.
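The language mapping and the SSE-or-JSON fallback above might look something like this. It is a sketch under stated assumptions: `buildUpstreamFields`, `parseSseFrames`, `extractText`, and `readUpstream` are hypothetical helpers, the payload of each speaches SSE frame is assumed to be either plain text or JSON with a `text` field, and the parser works on a complete body, whereas a real proxy would buffer and parse incrementally as chunks arrive.

```typescript
interface UpstreamFields {
  model: string;
  stream: "true";
  language?: string;
  prompt?: string;
}

// "auto" means the language field is left out so the upstream auto-detects.
function buildUpstreamFields(model: string, language: string, prompt?: string): UpstreamFields {
  const fields: UpstreamFields = { model, stream: "true" };
  if (language !== "auto") fields.language = language;
  if (prompt !== undefined && prompt !== "") fields.prompt = prompt;
  return fields;
}

interface SseFrame {
  event: string;
  data: string;
}

// Parses a complete SSE body: frames are separated by blank lines, comment
// lines start with ":", and multiple data lines are joined with "\n".
function parseSseFrames(body: string): SseFrame[] {
  const frames: SseFrame[] = [];
  for (const block of body.split(/\r?\n\r?\n/)) {
    let event = "message";
    const data: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line === "" || line.startsWith(":")) continue;
      const idx = line.indexOf(":");
      const field = idx === -1 ? line : line.slice(0, idx);
      let value = idx === -1 ? "" : line.slice(idx + 1);
      if (value.startsWith(" ")) value = value.slice(1);
      if (field === "event") event = value;
      else if (field === "data") data.push(value);
    }
    if (data.length > 0) frames.push({ event, data: data.join("\n") });
  }
  return frames;
}

// Assumption: a payload is either JSON carrying a "text" field or raw text.
function extractText(payload: string): string {
  try {
    const parsed: unknown = JSON.parse(payload);
    if (typeof parsed === "object" && parsed !== null && "text" in parsed) {
      const text = (parsed as { text: unknown }).text;
      if (typeof text === "string") return text;
    }
  } catch {
    // Not JSON: fall through and treat the payload as plain text.
  }
  return payload;
}

// Returns the hypotheses in upstream order; the last entry is the final text.
// A non-SSE (plain JSON) response collapses to a single entry.
function readUpstream(contentType: string, body: string): string[] {
  if (contentType.toLowerCase().startsWith("text/event-stream")) {
    return parseSseFrames(body).map((frame) => extractText(frame.data));
  }
  return [extractText(body)];
}
```

With this split, every entry but the last would be relayed as a `partial` event and the last as `final`, so the plain-JSON fallback naturally yields a single `final`.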

Tests

Unit: proxy correctly maps language: "auto" → omits the field upstream, language: "fr" → forwards verbatim.
Unit: oversize blob → 413; speech.enabled=false → 503; missing operator session → 401.
Integration: a fixture upstream that emits two partial SSE frames and a final produces three SSE frames downstream in the same order.
Integration: flipping speech.enabled from false to true at scope='global' (via direct service_config insert in the test) makes the next request succeed without restart.
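The status-code unit tests above lend themselves to a table-driven shape. The sketch below is illustrative only: `preflightStatus` and `PreflightInput` are hypothetical, and the check order (session first, then the enabled flag, then size) is an assumption, not something the criteria specify.

```typescript
interface PreflightInput {
  hasOperatorSession: boolean;
  enabled: boolean;
  transcribeUrl: string;
  audioBytes: number;
  maxAudioBytes: number;
}

type PreflightStatus = 200 | 401 | 503 | 413;

// Assumed order: authentication, then feature availability, then payload size.
function preflightStatus(input: PreflightInput): PreflightStatus {
  if (!input.hasOperatorSession) return 401;
  if (!input.enabled || input.transcribeUrl === "") return 503;
  if (input.audioBytes > input.maxAudioBytes) return 413;
  return 200;
}

// Baseline request that should pass every check; values are placeholders.
const base: PreflightInput = {
  hasOperatorSession: true,
  enabled: true,
  transcribeUrl: "http://127.0.0.1:8078",
  audioBytes: 1_000,
  maxAudioBytes: 2_000,
};

const cases: Array<{ name: string; input: PreflightInput; expected: PreflightStatus }> = [
  { name: "happy path", input: base, expected: 200 },
  { name: "missing operator session", input: { ...base, hasOperatorSession: false }, expected: 401 },
  { name: "speech.enabled=false", input: { ...base, enabled: false }, expected: 503 },
  { name: "empty transcribe_url", input: { ...base, transcribeUrl: "" }, expected: 503 },
  { name: "oversize blob", input: { ...base, audioBytes: 2_001 }, expected: 413 },
];

// Names of the cases whose actual status differs from the expected one.
const failures = cases
  .filter((c) => preflightStatus(c.input) !== c.expected)
  .map((c) => c.name);
```

Each row maps to one assertion in the unit suite, so adding a new rejection rule means adding a row rather than a new test body.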

Out of scope

TTS / read-back of architect responses.
Speaker diarisation, custom vocabularies, or fine-tuned models.
Multi-provider abstraction (OpenAI Whisper, Groq, Replicate, etc.) — config shape stays generic enough to swap later, but no provider-abstraction code lands here.
Hosted-STT API keys in the SC-6 secret table — defer until we actually need a hosted provider. speaches needs no auth.

References

specs/workspace-chat-voice-input.md — full spec (P1 section).
specs/config-to-db.md — SVC-1/SVC-2/SVC-3 contract this story lands on from day one.
~/.config/systemd/user/speaches.service — existing speaches unit on the desktop, port 8078, model deepdml/faster-whisper-large-v3-turbo-ct2, STT_MODEL_TTL=-1 (warm).
speaches OpenAI-compatible STT: POST /v1/audio/transcriptions with stream=true for SSE partials.

Dependencies

SVC-1 (#750) — already merged. No native dep edge needed.
This story has no open blockers at creation time. It will land unassigned on /issues/ready; the architect (or operator) assigns it manually to kick off dispatch.

User story: As an operator, I want the claude-hooks server to broker every transcription request so my speaches instance stays bound to localhost and the browser only ever talks to the claude-hooks origin it already has a session cookie for.

dev was assigned by code-lead · 2026-05-02 21:41:50 +00:00

code-lead added the area:dashboard, security, and type:user-story labels · 2026-05-02 21:41:51 +00:00

code-lead referenced this issue · 2026-05-02 21:41:52 +00:00
VOICE-2: /architect/transcribe/health probe endpoint #774

code-lead referenced this issue · 2026-05-02 21:41:53 +00:00
VOICE-3: Composer mic toggle + live partials #775

code-lead referenced this issue · 2026-05-02 21:41:55 +00:00
VOICE-4: Settings → Service "Voice input" group #776

code-lead added a new dependency · 2026-05-02 21:42:09 +00:00
#774 VOICE-2: /architect/transcribe/health probe endpoint

code-lead added a new dependency · 2026-05-02 21:42:26 +00:00
#775 VOICE-3: Composer mic toggle + live partials

code-lead added a new dependency · 2026-05-02 21:42:26 +00:00
#776 VOICE-4: Settings → Service "Voice input" group

dev referenced this issue from a commit · 2026-05-02 21:54:05 +00:00
feat(voice): add /architect/transcribe SSE proxy + speaches integration (VOICE-1, #773)

dev referenced this issue from a pull request that will close it · 2026-05-02 21:54:29 +00:00
feat(voice): /architect/transcribe SSE proxy + speaches integration (VOICE-1) #778

code-lead closed this issue · 2026-05-02 22:29:04 +00:00

code-lead referenced this issue from a commit · 2026-05-02 22:29:04 +00:00
feat(voice): /architect/transcribe SSE proxy + speaches integration (VOICE-1)

charles referenced this issue from a commit · 2026-05-03 13:51:31 +00:00
fix(voice): unify SpeechConfig on resolver — drop webhook-config duplicate


Blocks

#774 VOICE-2: /architect/transcribe/health probe endpoint
#775 VOICE-3: Composer mic toggle + live partials
#776 VOICE-4: Settings → Service "Voice input" group
