M26-3: tier badge + failover event log + provider history sparkline #550
charles/claude-hooks#550
As an operator, I want the agents page to show which tier each instance is currently on, when it last failed over, and what's happening across the fleet, so that I can spot a degraded primary provider at a glance and trace why a specific agent jumped to its fallback.
Background
M26-1 lands the data (per-instance current_tier, last_failover_at, last_failure_kind, paused). M26-2 lands the wizard. This story makes the runtime state visible.

Acceptance criteria
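Every criterion in this section renders the per-instance state that M26-1 lands. The following is a minimal sketch of that state as the UI might see it, plus the badge label and the failure detail string. It assumes the field types shown, a per-instance list of tier models passed in by the caller, and hypothetical helper names (tierBadgeLabel, failureDetail). It is not the shipped component.

```typescript
// Hypothetical UI-side shape of one agent_provider_state row.
// Field names come from M26-1; the types are assumptions.
interface ProviderState {
  current_tier: 1 | 2 | 3;
  last_failover_at: number | null; // epoch ms (assumed)
  last_failure_kind: string | null;
  paused: boolean;
}

const TIER_GLYPHS = ["①", "②", "③"] as const;

// Badge label: tier glyph + model for the active tier, or the exhausted marker when paused.
function tierBadgeLabel(state: ProviderState, models: readonly string[]): string {
  if (state.paused) return "✕ all tiers exhausted";
  const idx = state.current_tier - 1;
  return `${TIER_GLYPHS[idx]} ${models[idx] ?? "unknown model"}`;
}

// Failure detail with relative time, e.g. "rate_limit · 47m ago"; null when nothing failed yet.
function failureDetail(state: ProviderState, now: number): string | null {
  if (state.last_failure_kind === null || state.last_failover_at === null) return null;
  const minutes = Math.max(0, Math.floor((now - state.last_failover_at) / 60000));
  return `${state.last_failure_kind} · ${minutes}m ago`;
}
```

A caller would pass the instance's configured tier models, e.g. ["claude-opus-4-7", "deepseek-v4-pro", "qwen3-coder"], so a tier-2 instance renders "② deepseek-v4-pro".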
Agents list — tier badge column
- Column on the /agents page list rows.
- ① claude-opus-4-7 for tier 1, ② deepseek-v4-pro for tier 2, ③ qwen3-coder for tier 3, ✕ all tiers exhausted when paused.
- last_failure_kind + relative time (e.g. "rate_limit · 47m ago").

Cooldown countdown
- When current_tier > 1, the badge shows a (retry in Nm) countdown until last_failover_at + cooldown_min. Auto-decrements every 30s. Hidden when expired.

Per-instance drawer — failover event log
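Each entry in this log pairs a timestamp with the tier transition (as glyphs) and the failure reason. The following is a minimal sketch of that row formatter. Only the table name agent_provider_events comes from this issue. The row fields (at, from_tier, to_tier, reason) and the function name formatEventRow are hypothetical. The sketch formats in UTC so that it is deterministic, whereas the real drawer may use local time.

```typescript
// Hypothetical shape of one agent_provider_events ledger row (field names assumed).
interface ProviderEvent {
  at: number;        // epoch ms
  from_tier: number;
  to_tier: number;
  reason: string;    // e.g. "401 auth_error"
}

const GLYPHS: Record<number, string> = { 1: "①", 2: "②", 3: "③" };

// Zero-pads a date component to two digits.
const pad = (n: number): string => (n < 10 ? "0" : "") + n;

// Renders e.g. `2026-04-29 09:14 ① → ② "401 auth_error"` (UTC, see note above).
function formatEventRow(e: ProviderEvent): string {
  const d = new Date(e.at);
  const stamp =
    `${d.getUTCFullYear()}-${pad(d.getUTCMonth() + 1)}-${pad(d.getUTCDate())} ` +
    `${pad(d.getUTCHours())}:${pad(d.getUTCMinutes())}`;
  const from = GLYPHS[e.from_tier] ?? String(e.from_tier);
  const to = GLYPHS[e.to_tier] ?? String(e.to_tier);
  return `${stamp} ${from} → ${to} "${e.reason}"`;
}
```

The [view raw] link would be rendered next to this string by the drawer component and is outside the sketch.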
- agent_provider_events ledger table, populated by M26-1's classifier (one row per tier flip and one per cooldown-elapsed reset).
- Example row: 2026-04-29 09:14 ① → ② "401 auth_error" [view raw]. The [view raw] link opens the linked task transcript drawer.

Manual controls
- [ ↺ Reset to tier 1 ] button on paused or tier-degraded instances. POSTs /agents/<name>/reset-tier. The server clears the state row and schedules an immediate reconcile (env-file rewrite).
- [ ⏸ Pause ] / [ ▶ Resume ] toggle, independent of tier. POSTs /agents/<name>/pause and /agents/<name>/unpause.

Server endpoints (new)
- GET /agents/<name>/provider-events?since=<ms>&limit=N → ledger rows.
- POST /agents/<name>/reset-tier → sets current_tier=1, paused=0; returns the updated state.
- POST /agents/<name>/pause → sets paused=1.
- POST /agents/<name>/unpause → sets paused=0.

Tests
Out of scope
- /agents only.
- /usage).

References
- agent_provider_state + agent_provider_events rows this consumes.
- apps/web/src/features/agents/InstanceDrawer.tsx (or current location post-config-split).
- apps/web/src/components/watchdog-panel.tsx (similar pattern to mirror).

Shipped via PR #552 + #554 (post-redesign):
Tier badge:
- Tier column on the InstancesTable. Renders ① / ② / ③ glyph + active model id, ✕ when paused.
- last_failure_kind · Nm until retry.
- last_failure_kind === "token_budget".

Cooldown countdown:
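Per the acceptance criteria, the retry countdown targets last_failover_at + cooldown_min and is hidden on tier 1 or once expired. The following is a minimal sketch of that computation and of a 30s ticker that recomputes from the clock on every tick. The function names, the epoch-ms timestamp, and the assumption that cooldown_min is supplied by the caller are all hypothetical.

```typescript
// Whole minutes left until last_failover_at + cooldown_min, rounded up,
// or null when the countdown should be hidden (tier 1, no failover, or expired).
function retryMinutesLeft(
  currentTier: number,
  lastFailoverAt: number | null, // epoch ms (assumed)
  cooldownMin: number,
  now: number,
): number | null {
  if (currentTier <= 1 || lastFailoverAt === null) return null;
  const remainingMs = lastFailoverAt + cooldownMin * 60000 - now;
  if (remainingMs <= 0) return null;
  return Math.ceil(remainingMs / 60000);
}

// Ticker: re-reads the value on each tick instead of decrementing a counter,
// so a throttled background tab cannot drift. Returns a function that stops it.
function startCountdown(
  read: () => number | null,
  onTick: (minutes: number | null) => void,
  intervalMs = 30000,
): () => void {
  onTick(read()); // render immediately, then every intervalMs
  const id = setInterval(() => onTick(read()), intervalMs);
  return () => clearInterval(id);
}
```

In a React component, startCountdown would typically be called from an effect whose cleanup is the returned stop function.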
- A setInterval ticker recomputes the remaining minutes with a fresh Math.ceil on each tick. Hidden when expired or tier == 1.

Manual controls:
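The three control endpoints from the acceptance criteria are plain POSTs to /agents/<name>/<action>. The following is a minimal client sketch. The paths come from this issue. The FetchLike type, the helper names, the JSON response, and the absence of any extra auth or CSRF header (which guardMutating may require) are assumptions.

```typescript
type ControlAction = "reset-tier" | "pause" | "unpause";

// Minimal structural type so the sketch does not depend on a particular fetch typing.
type FetchLike = (
  url: string,
  init: { method: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Builds the endpoint path; the agent name is URL-encoded defensively.
function controlUrl(agentName: string, action: ControlAction): string {
  return `/agents/${encodeURIComponent(agentName)}/${action}`;
}

// The pause/resume button maps the current paused flag to the opposite action.
function toggleAction(paused: boolean): ControlAction {
  return paused ? "unpause" : "pause";
}

// POSTs the action and returns the parsed body (assumed to be the updated state).
async function postControl(agentName: string, action: ControlAction, fetchImpl: FetchLike): Promise<unknown> {
  const res = await fetchImpl(controlUrl(agentName, action), { method: "POST" });
  if (!res.ok) throw new Error(`${action} failed for ${agentName}: HTTP ${res.status}`);
  return res.json();
}
```

In the browser, the global fetch can be passed as fetchImpl. Tests can pass a stub instead.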
- ↺ Reset-to-tier-1 (visible only when degraded or paused) → POST /agents/<name>/reset-tier.
- ⏸ / ▶ toggle → POST /agents/<name>/{pause,unpause}.

Server endpoints:
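The ledger endpoint takes optional since and limit query parameters, and the shipped notes show both may arrive empty (?since=&limit=). The following is a minimal server-side parsing sketch. The default of 50, the cap of 500, the fallback for invalid input, and the name parseEventsQuery are hypothetical choices, not values from this issue.

```typescript
// Hypothetical defaults; the issue does not specify them.
const DEFAULT_LIMIT = 50;
const MAX_LIMIT = 500;

interface EventsQuery {
  since: number; // epoch ms
  limit: number;
}

// Parses ?since=<ms>&limit=N, treating empty, missing, or invalid values as defaults.
function parseEventsQuery(params: URLSearchParams): EventsQuery {
  const since = Number(params.get("since") || 0);
  const limit = Number(params.get("limit") || DEFAULT_LIMIT);
  return {
    since: isFinite(since) && since >= 0 ? Math.floor(since) : 0,
    limit:
      isFinite(limit) && limit > 0 && Math.floor(limit) === limit
        ? Math.min(limit, MAX_LIMIT)
        : DEFAULT_LIMIT,
  };
}
```

Falling back to defaults rather than returning 400 keeps the drawer working when it sends empty parameters. A stricter endpoint could reject bad input instead.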
- GET /agents/<name>/provider-events?since=&limit= → ledger rows.
- POST /agents/<name>/{reset-tier,pause,unpause}, all three behind guardMutating.
- paused is coerced to boolean uniformly across the list and control endpoints (serialiseProviderState).

Deviation:
Tests: 6 endpoint cases (list shape, ledger empty, 404 unknown agent, pause/unpause toggle, reset-tier, auth rejection).
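The paused coercion and the state changes behind the three control endpoints can be sketched as pure functions over the stored row. This is only modeled on serialiseProviderState, whose real signature is not given here. The row type, the SQLite-style 0/1 paused column, and the names serialiseState, resetTier and setPaused are assumptions. Reset only touches current_tier and paused, as the endpoint list above states.

```typescript
// Hypothetical raw agent_provider_state row; paused may come back as 0/1 from the DB.
interface ProviderStateRow {
  current_tier: number;
  last_failover_at: number | null;
  last_failure_kind: string | null;
  paused: number | boolean;
}

interface ProviderStateDto {
  current_tier: number;
  last_failover_at: number | null;
  last_failure_kind: string | null;
  paused: boolean;
}

// Single serialiser shared by list and control endpoints so paused is always a boolean.
function serialiseState(row: ProviderStateRow): ProviderStateDto {
  return {
    current_tier: row.current_tier,
    last_failover_at: row.last_failover_at,
    last_failure_kind: row.last_failure_kind,
    paused: Boolean(row.paused),
  };
}

// POST /agents/<name>/reset-tier: current_tier=1, paused=0; other fields untouched.
function resetTier(row: ProviderStateRow): ProviderStateRow {
  return { ...row, current_tier: 1, paused: 0 };
}

// POST /agents/<name>/pause and /unpause.
function setPaused(row: ProviderStateRow, paused: boolean): ProviderStateRow {
  return { ...row, paused: paused ? 1 : 0 };
}
```

Keeping the transitions pure makes the endpoint tests listed above straightforward: each handler becomes load row, apply function, persist, return serialiseState(row).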