sdk-adapter canUseTool returns no updatedInput — every tool call fires ZodError #881

Closed

opened 2026-05-05 10:58:19 +00:00 by claude-desktop · 0 comments

Symptom

Every tool call in every agent dispatch produces a tool_result containing:

Tool permission request failed: ZodError: [
  {
    "code": "invalid_union",
    "errors": [[
      {
        "expected": "record",
        "code": "invalid_type",
        "path": ["updatedInput"],
        "message": "Invalid input: expected record, received undefined"
      }
    ]]
  }
]

The SDK falls back to allow-on-schema-fail, so the tool actually runs. But the agent reads the error in its own conversation context, misinterprets "permission request failed" as a real denial, and goes into recovery loops — observed instances include code-lead trying to invoke update-config / fewer-permission-prompts / harness skills to "fix" the permission system, then aborting with a confused dispatch summary.

Confirmed across at least 5 distinct code-lead sessions (issues #571, #213, #209, #591, #671). Likely affects every agent on every dispatch since the permission callback was wired up.

Root cause

apps/server/src/infrastructure/agent/sdk-adapter.ts:570-574:

canUseTool: async (toolName: string, input: Record<string, unknown>) => {
  const verdict = await req.toolPolicy?.(toolName, input);
  if (!verdict || verdict.allow) return { behavior: "allow" as const };
  return { behavior: "deny" as const, message: verdict.message };
}

The SDK's permission-callback schema requires updatedInput: Record<string, unknown> on the allow branch — it is the (possibly-modified) input the SDK passes to the tool. We omit it entirely. The Zod union fails on the first member (allow with updatedInput) and reports the error.
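The allow-branch fix described above can be sketched as follows. This is a minimal illustration, not the adapter's actual code: `ToolInput`, `PolicyVerdict`, `ToolPolicy`, `PermissionResult`, `toPermissionResult`, and `makeCanUseTool` are local stand-ins invented for the example, and the fallback deny message is an assumption (the original callback passes `verdict.message` through unchanged).

```ts
// Hypothetical local types for illustration; the real ones live in the SDK and the adapter.
type ToolInput = Record<string, unknown>;
type PolicyVerdict = { allow: boolean; message?: string };
type ToolPolicy = (
  toolName: string,
  input: ToolInput,
) => Promise<PolicyVerdict | undefined>;

// Shape described in this issue: the allow branch must carry updatedInput.
type PermissionResult =
  | { behavior: "allow"; updatedInput: ToolInput }
  | { behavior: "deny"; message: string };

// Pure mapping from a policy verdict to the callback's return value.
function toPermissionResult(
  verdict: PolicyVerdict | undefined,
  input: ToolInput,
): PermissionResult {
  if (!verdict || verdict.allow) {
    // Pass the original input through unmodified so the schema check succeeds.
    return { behavior: "allow", updatedInput: input };
  }
  // Assumption: supply a fallback so the deny branch always has a string message.
  return { behavior: "deny", message: verdict.message ?? "Denied by tool policy" };
}

// Builds a canUseTool-style callback around an optional policy function.
function makeCanUseTool(toolPolicy?: ToolPolicy) {
  return async (toolName: string, input: ToolInput): Promise<PermissionResult> => {
    const verdict = await toolPolicy?.(toolName, input);
    return toPermissionResult(verdict, input);
  };
}
```

Keeping the verdict mapping in a pure function makes the allow branch easy to unit-test without dispatching an agent.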

Acceptance criteria

apps/server/src/infrastructure/agent/sdk-adapter.ts:570-574 returns { behavior: "allow", updatedInput: input } on the allow branch.
Add a regression test in sdk-adapter.test.ts: stub req.toolPolicy, invoke a tool through the real canUseTool callback, and assert that no Zod error appears in any tool_result event.
Visual confirmation: dispatch a fresh code-lead task, grep its session JSONL for "permission request failed" — must be empty.
No agent in the next 24h dispatches a tool_use event for Skill, update-config, or fewer-permission-prompts (those names should never appear in agent reasoning if the underlying permission noise is gone).
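The session-log check in the criteria above could be scripted roughly like this. It is a sketch only: `findPermissionNoise` is a hypothetical helper, and it treats the session file as plain text lines (the same way grep would) because the JSONL record shape is not specified in this issue.

```ts
// Returns the 1-based line numbers of every line containing the needle.
// Case-sensitive substring match, mirroring a plain `grep`.
function findPermissionNoise(
  jsonl: string,
  needle: string = "permission request failed",
): number[] {
  const hits: number[] = [];
  jsonl.split("\n").forEach((line, index) => {
    if (line.includes(needle)) {
      hits.push(index + 1);
    }
  });
  return hits;
}
```

Read the session file into a string, pass it to `findPermissionNoise`, and treat any non-empty result as a failed check.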

Out of scope

Filtering operator-side skills out of agent context (separate ticket — extraKnownMarketplaces in agent settings.json mirrors the whole Anthropic marketplace into the agent env-dir, which leaks update-config / fewer-permission-prompts etc into the skill_listing system reminder; that is a different bug).
Skill-body safety rails telling agents not to invoke harness skills (separate ticket).

References

Bug surface: apps/server/src/infrastructure/agent/sdk-adapter.ts:570-574.
SDK callback contract: @anthropic-ai/claude-agent-sdk — canUseTool return shape (allow branch carries updatedInput).
Sample affected sessions: ~/.config/claude-hooks/agent-env/code-lead/projects/-state-worktrees-boss-default-charles--claude-hooks--boss-2F671/9bacbc0f-12ad-428c-b06f-629ea0a6c211.jsonl (and 4 siblings).
Memory: "Don't punt on missing MCP tools — fall back to Forgejo REST API" — agents are violating this rule because the permission noise convinces them MCP is broken when it's not.

claude-desktop added the area:agents and type:bug labels 2026-05-05 10:58:52 +00:00

claude-desktop referenced this issue 2026-05-05 11:02:41 +00:00

Agents see operator-side skills (update-config, fewer-permission-prompts, loop, schedule, …) in skill_listing — strip the marketplace mirror from agent env #882