docs(roadmap): add subagent lane event conformance gap

docs(roadmap): add worker replay receipt integrity gap
docs(roadmap): add mcp tool bridge registry drift gap
2026-06-13 19:44:47 -04:00 · 2026-05-22 21:31:15 +00:00 · 2026-05-22 21:01:16 +00:00 · 2026-05-22 20:31:24 +00:00 · 2026-05-22 20:01:13 +00:00 · 2026-05-22 19:31:27 +00:00
1 changed file with 30 additions and 0 deletions
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -6721,3 +6721,33 @@ Original filing (2026-04-18): the session emitted `SessionStart hook (completed)
 583. **`PermissionEnforcer::check_file_write` uses string-prefix workspace checks, so relative traversal like `../../outside` is allowed as if it were inside the workspace** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 14:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@aa9efe5`. Active tmux session at probe time: `omx-issue-2468-autoresearch-docs-css`; no active claw-code implementation session. Channel context included Jobdori's #591 `Prompt`-mode bypass in `permission_enforcer.rs`, so this probe inspected the adjacent file-write boundary helper. Code inspection: `rust/crates/runtime/src/permission_enforcer.rs::is_within_workspace` treats relative paths by string-concatenating `format!("{workspace_root}/{path}")`, then checks `normalized.starts_with(root) || normalized == workspace_root.trim_end_matches('/')`. It never normalizes `.`/`..`, canonicalizes symlinks, or resolves the candidate path. Therefore `check_file_write("../../outside/secret.txt", "/workspace")` builds `/workspace/../../outside/secret.txt`, which still starts with `/workspace/`, and returns `Allowed` in `WorkspaceWrite` mode even though the resolved target is outside the workspace. Existing tests cover absolute outside paths and simple relative `src/main.rs`, but not relative traversal escapes. 
**Required fix shape:** (a) replace string-prefix checks with path normalization/canonicalization against a workspace root, resolving `.`/`..` before comparison; (b) reject escaping relative paths even if the file does not exist yet, using lexical normalization plus parent canonicalization where needed; (c) add regressions for `../outside.txt`, `../../outside/secret.txt`, `src/../safe.txt`, and symlink escape cases; (d) share the same boundary primitive with runtime file ops and bash path-scope checks so permission enforcement cannot drift; (e) include resolved/candidate path evidence in denial messages without leaking sensitive contents. **Why this matters:** workspace-write mode's core promise is “project files only.” A string-prefix check lets a caller smuggle outside writes through relative traversal while the permission enforcer reports success, defeating the boundary before lower-level file tools can help. Source: gaebal-gajae dogfood response to Clawhip message `1507382558340681779` on 2026-05-22.
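A minimal sketch of fix shapes (a) and (b) for item 583. It assumes a free-standing `is_within_workspace(root, candidate)`, which is not the real method signature in `permission_enforcer.rs`. The sketch folds `.`/`..` lexically with `std::path::Component` and compares whole path components, so it works for files that do not exist yet. It does not resolve symlinks, so the symlink-escape case in (c) would still need the deepest existing ancestor canonicalized.

```rust
use std::path::{Component, Path, PathBuf};

/// Fold `.` and `..` lexically, without touching the filesystem.
/// Returns `None` when `..` would climb above the root.
fn normalize(path: &Path) -> Option<PathBuf> {
    let mut out = PathBuf::new();
    for component in path.components() {
        match component {
            Component::Prefix(_) | Component::RootDir => out.push(component.as_os_str()),
            Component::CurDir => {}
            Component::ParentDir => {
                // `pop` returns false once there is nothing left to remove.
                if !out.pop() {
                    return None;
                }
            }
            Component::Normal(part) => out.push(part),
        }
    }
    Some(out)
}

/// Hypothetical replacement for the string-prefix check: resolve the
/// candidate against the root, then compare whole path components.
fn is_within_workspace(workspace_root: &Path, candidate: &Path) -> bool {
    // `join` keeps absolute candidates as-is and appends relative ones.
    match (normalize(workspace_root), normalize(&workspace_root.join(candidate))) {
        // `Path::starts_with` is component-wise, so `/workspace-evil`
        // is not treated as inside `/workspace`.
        (Some(root), Some(resolved)) => resolved.starts_with(&root),
        _ => false,
    }
}
```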

 584. **`WorkerRegistry::restart` clears prompt/trust state but leaves old events and the original `created_at`, so post-restart timeout evidence can mix prior-attempt blockers with the new boot attempt** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 14:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@ac09033`. Active tmux sessions at probe time: none. Channel context included Jobdori's worker-boot #592 finding, so this probe stayed in `worker_boot.rs` but checked restart lifecycle continuity. Code inspection: `rust/crates/runtime/src/worker_boot.rs::restart` resets `status`, trust flags, prompt fields, `last_error`, attempts, and `prompt_in_flight`, then appends `WorkerEventKind::Restarted`. It does not reset `worker.created_at`, does not create a per-attempt `boot_started_at`, does not increment a restart/attempt counter, and does not clear or partition previous events. Later `observe_startup_timeout` computes `elapsed = now.saturating_sub(worker.created_at)` and derives `trust_prompt_detected`, `tool_permission_detected`, and `ready_for_prompt_detected` by scanning the entire `worker.events` history. A worker that previously hit trust/tool/ready evidence, then restarts cleanly, can have the new timeout classified with old pre-restart evidence and an elapsed time anchored to the original creation. Existing `restart_and_terminate_reset_or_finish_worker` only asserts prompt fields/attempts reset; it does not assert event scoping or elapsed-time reset. 
**Required fix shape:** (a) add `current_attempt_started_at`/`boot_started_at` and `attempt_index` fields; (b) set them on create and restart and use them for startup timeout elapsed/command_started_at; (c) scope timeout evidence scans to events since the current attempt or store per-attempt cached evidence; (d) add tests where trust/tool evidence exists before restart but not after, proving the post-restart timeout does not inherit stale blockers; (e) include attempt index in worker events so dashboards can separate old and new boot attempts. **Why this matters:** restart is supposed to produce a fresh boot attempt. If old events and timestamps remain authoritative, operators see misleading “stalled for hours / trust required” evidence for a brand-new restart and recovery automation can choose the wrong next action. Source: gaebal-gajae dogfood response to Clawhip message `1507390103956226228` on 2026-05-22.
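One way to express fix shapes (a) through (c) for item 584. The sketch uses hypothetical `Worker`/`WorkerEvent` types and integer-second timestamps that mirror the `now.saturating_sub(...)` arithmetic quoted above. The real `worker_boot.rs` structures carry more fields. Each event records the attempt that produced it, and both timeout evidence and elapsed time are scoped to the current attempt.

```rust
/// Hypothetical event kinds standing in for `WorkerEventKind`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum EventKind {
    TrustPrompt,
    ToolPermission,
    Restarted,
}

#[derive(Debug)]
struct WorkerEvent {
    attempt_index: u32,
    kind: EventKind,
}

/// Simplified, hypothetical worker: only the fields relevant to attempt scoping.
#[derive(Debug)]
struct Worker {
    attempt_index: u32,
    boot_started_at: u64, // seconds; reset on every restart
    events: Vec<WorkerEvent>,
}

impl Worker {
    fn new(now: u64) -> Self {
        Self { attempt_index: 0, boot_started_at: now, events: Vec::new() }
    }

    /// Every event is stamped with the attempt that produced it.
    fn record(&mut self, kind: EventKind) {
        self.events.push(WorkerEvent { attempt_index: self.attempt_index, kind });
    }

    fn restart(&mut self, now: u64) {
        self.attempt_index += 1;
        self.boot_started_at = now;
        self.record(EventKind::Restarted);
    }

    /// Timeout evidence only counts events from the current attempt.
    fn current_attempt_saw(&self, kind: EventKind) -> bool {
        self.events
            .iter()
            .any(|event| event.attempt_index == self.attempt_index && event.kind == kind)
    }

    /// Elapsed time is anchored to this attempt, not to worker creation.
    fn startup_elapsed(&self, now: u64) -> u64 {
        now.saturating_sub(self.boot_started_at)
    }
}
```

Old events stay in `events` for audit history. They simply stop counting as evidence once `attempt_index` moves on.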
+
+585. **Prompt-misdelivery auto-recovery arms replay without clearing `last_error`, so a worker can be `ReadyForPrompt` while still carrying a stale prompt-delivery failure** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 15:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@b0bca2e`. Active tmux session at probe time: `gajae-pr-346-session-gateway-continuity-digest-review`; no active claw-code implementation session. Code inspection: in `rust/crates/runtime/src/worker_boot.rs::observe`, prompt misdelivery sets `worker.last_error = Some(WorkerFailureKind::PromptDelivery)` and `prompt_in_flight = false`, then pushes a `PromptMisdelivery` event. If `worker.auto_recover_prompt_misdelivery` is true, it sets `worker.replay_prompt = worker.last_prompt.clone()` and `worker.status = WorkerStatus::ReadyForPrompt`, then pushes `PromptReplayArmed`; however it never clears or demotes `last_error`. `await_ready` then returns `ready: true` and `last_error: Some(PromptDelivery)`, so callers/dashboards can see a ready replay state and a failure state simultaneously. Later `send_prompt` clears `last_error`, but until replay is actually sent the state snapshot is contradictory. Existing replay tests assert `status == ReadyForPrompt` and `replay_prompt` contents, but do not assert `last_error` semantics while replay is armed. 
**Required fix shape:** (a) when auto-recovery arms replay, either clear `last_error` or replace it with a non-fatal/degraded `replay_armed` status separate from failure; (b) include `recovery_armed:true` in the worker snapshot or ready result so callers can distinguish a recoverable ready state from a failed state; (c) add tests asserting `await_ready` after auto-recovery does not report contradictory ready+fatal error; (d) preserve the original misdelivery event for audit history while keeping current worker state coherent; (e) ensure manual/non-auto recovery still reports failure until explicitly resolved. **Why this matters:** recovery state is an operator contract. A worker that is ready to replay should not also advertise a current fatal prompt-delivery error, or automation may both retry and escalate the same incident. Source: gaebal-gajae dogfood response to Clawhip message `1507397657763778562` on 2026-05-22.
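A sketch of fix shapes (a), (b), (d), and (e) for item 585. It takes the "clear `last_error`" option. `Worker`, `WorkerStatus::Failed`, `recovery_armed`, and the string audit events are simplified stand-ins, not the crate's actual types.

```rust
// Hypothetical, simplified worker state for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum WorkerStatus {
    Running,
    ReadyForPrompt,
    Failed,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum WorkerFailureKind {
    PromptDelivery,
}

#[derive(Debug)]
struct Worker {
    status: WorkerStatus,
    last_error: Option<WorkerFailureKind>,
    recovery_armed: bool,
    last_prompt: Option<String>,
    replay_prompt: Option<String>,
    audit_events: Vec<&'static str>,
}

impl Worker {
    fn new(prompt: &str) -> Self {
        Self {
            status: WorkerStatus::Running,
            last_error: None,
            recovery_armed: false,
            last_prompt: Some(prompt.to_string()),
            replay_prompt: None,
            audit_events: Vec::new(),
        }
    }

    fn on_prompt_misdelivery(&mut self, auto_recover: bool) {
        // History keeps the original failure either way.
        self.audit_events.push("PromptMisdelivery");
        if auto_recover {
            self.replay_prompt = self.last_prompt.clone();
            self.status = WorkerStatus::ReadyForPrompt;
            self.last_error = None; // current state is recoverable, not failed
            self.recovery_armed = true;
            self.audit_events.push("PromptReplayArmed");
        } else {
            // Manual recovery keeps reporting the failure until resolved.
            self.status = WorkerStatus::Failed;
            self.last_error = Some(WorkerFailureKind::PromptDelivery);
        }
    }

    /// Snapshot invariant: never ready while carrying a current fatal error.
    fn is_coherent(&self) -> bool {
        !(self.status == WorkerStatus::ReadyForPrompt && self.last_error.is_some())
    }
}
```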
+
+586. **Plugin registry reads synchronously mutate bundled plugin installs without a lock/atomic swap, so startup/listing can race or fail while merely trying to aggregate hooks/tools** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 15:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@8043090`. Active tmux session at probe time: `gajae-pr-348-package-release-drift-review`; no active claw-code implementation session. Code inspection: every `PluginManager::plugin_registry_report` call begins with `self.sync_bundled_plugins()?`, and read-style paths (`plugin_registry`, `list_plugins`, `discover_plugins`, `aggregated_hooks`, `aggregated_tools`, startup `build_runtime_plugin_state_with_loader`) all flow through it. `sync_bundled_plugins` loads bundled manifests, then for each stale/outdated bundled plugin does `fs::remove_dir_all(&install_path)?; copy_dir_all(&source_root, &install_path)?;`, removes stale bundled IDs/directories, and finally writes `plugins/registry.json`. There is no process-wide file lock, no temp-dir + atomic rename, and no read-only/degraded mode. Two concurrent CLI startups or a startup plus `claw plugins list` can both decide a sync is needed; one can remove an install dir while the other is loading/copying it, yielding transient missing/partial plugin directories or a registry write race. Even a purely diagnostic/list/aggregate command can fail because bundled-plugin self-sync mutates disk before returning registry data. Existing tests cover sync happy paths and load-failure reporting, but not concurrent registry readers or a simulated remove/copy interruption. 
**Required fix shape:** (a) separate read-only registry discovery from bundled-plugin reconciliation, or gate reconciliation behind an explicit locked startup/update phase; (b) protect bundled install sync and registry writes with an interprocess lock; (c) copy bundled plugins into a temp dir and atomically rename/swap, never exposing partial installs; (d) if sync fails during a read-style command, return a degraded registry report with load failures instead of aborting all plugin aggregation where safe; (e) add concurrency/interruption tests with two managers racing `plugin_registry_report` and with `copy_dir_all` failure after removal, proving readers see either old or new complete plugin installs. **Why this matters:** plugin/MCP startup already has lifecycle friction. Registry reads should be safe and mostly observational; making them perform unlocked destructive replacement means diagnostics and startup can create the very plugin-load failures they are trying to observe. Source: gaebal-gajae dogfood response to Clawhip message `1507405207602987138` on 2026-05-22.
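A minimal sketch of fix shape (c) for item 586. It assumes the install path and its staging/backup siblings share one filesystem, so `fs::rename` is a cheap move. `swap_in_bundled_plugin` and the `.name.staging`/`.name.old` naming are hypothetical. The two renames still leave a short window where the install directory is absent, so the sketch only guarantees that readers never see a half-copied tree. A true exchange (for example Linux `renameat2` with `RENAME_EXCHANGE`) and the interprocess lock from (b) are outside this sketch.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Recursive copy (a stand-in for the crate's own `copy_dir_all`).
fn copy_dir_all(src: &Path, dst: &Path) -> io::Result<()> {
    fs::create_dir_all(dst)?;
    for entry in fs::read_dir(src)? {
        let entry = entry?;
        let target = dst.join(entry.file_name());
        if entry.file_type()?.is_dir() {
            copy_dir_all(&entry.path(), &target)?;
        } else {
            fs::copy(entry.path(), &target)?;
        }
    }
    Ok(())
}

/// Hypothetical sync step: build the new tree beside the install, then
/// swap it in with renames so `install` never holds a half-copied tree.
fn swap_in_bundled_plugin(source: &Path, install: &Path) -> io::Result<()> {
    let invalid = |msg: &str| io::Error::new(io::ErrorKind::InvalidInput, msg.to_string());
    let parent = install.parent().ok_or_else(|| invalid("install path has no parent"))?;
    let name = install
        .file_name()
        .ok_or_else(|| invalid("install path has no file name"))?
        .to_string_lossy();
    let staging = parent.join(format!(".{name}.staging"));
    let backup = parent.join(format!(".{name}.old"));

    // Discard leftovers from an interrupted earlier run.
    let _ = fs::remove_dir_all(&staging);
    let _ = fs::remove_dir_all(&backup);

    // A copy failure here leaves the current install untouched.
    copy_dir_all(source, &staging)?;

    // Two renames on one filesystem: `install` is briefly absent but never partial.
    if install.exists() {
        fs::rename(install, &backup)?;
    }
    if let Err(err) = fs::rename(&staging, install) {
        if backup.exists() {
            let _ = fs::rename(&backup, install); // roll back to the old tree
        }
        return Err(err);
    }
    let _ = fs::remove_dir_all(&backup);
    Ok(())
}
```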
+
+587. **`REPL` has an optional timeout with no default, so model-supplied code can block the tool dispatch thread indefinitely when `timeout_ms` is omitted** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 16:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@9843ccf`. Active tmux sessions at probe time: none. Channel context included Jobdori's adjacent #595 Sleep finding, so this probe inspected the other model-facing blocking execution surfaces instead of duplicating Sleep. Code inspection: `rust/crates/tools/src/lib.rs::execute_repl` validates code/language, spawns `python -c` / `node -e`, and only enters the polling timeout loop when `input.timeout_ms` is `Some`. If `timeout_ms` is omitted, it directly calls `process.spawn()?.wait_with_output()?`, with no default deadline, no backgrounding, and no abort-signal checks. The tool schema makes `timeout_ms` optional (`"timeout_ms": { "type": "integer", "minimum": 1 }`), and existing REPL success coverage passes `timeout_ms:500`; the timeout regression passes `timeout_ms:10`; there is no test for omitted-timeout long-running code. Therefore `REPL({language:"python", code:"import time; time.sleep(999999)"})` can freeze the same tool dispatch path indefinitely, which is worse than Sleep's 5-minute cap and easy for a model to trigger by forgetting the optional field. **Required fix shape:** (a) make `timeout_ms` default to a conservative deadline (for example 30s) instead of unbounded wait; (b) enforce an upper cap and return a structured timeout error matching bash/PowerShell timeout surfaces; (c) poll in small intervals that can observe a shared abort signal if/when the tool registry gains one; (d) add regressions for omitted timeout, explicit timeout, excessive timeout, and successful short code; (e) include elapsed/timeout metadata in the JSON error so claws can distinguish user-code hang from interpreter startup failure. 
**Why this matters:** REPL is a model-facing code execution tool. Optional timeout means the safest path depends on the model remembering to provide a guard every time; one missing field can wedge unattended claw runs forever with no heartbeat or typed recovery event. Source: gaebal-gajae dogfood response to Clawhip message `1507412753374249032` on 2026-05-22.
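A sketch of fix shapes (a) and (b) for item 587. The 30s default follows the example in the text. The 300s cap and the names `resolve_timeout`, `wait_with_deadline`, `DEFAULT_REPL_TIMEOUT_MS`, and `MAX_REPL_TIMEOUT_MS` are assumptions. The polling loop uses `Child::try_wait`, which also makes it the natural place to check a shared abort signal later, per (c).

```rust
use std::process::{Child, Output};
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical policy values; the text only suggests "for example 30s".
const DEFAULT_REPL_TIMEOUT_MS: u64 = 30_000;
const MAX_REPL_TIMEOUT_MS: u64 = 300_000;

/// Default when omitted; reject zero and over-cap values with a typed message.
fn resolve_timeout(timeout_ms: Option<u64>) -> Result<Duration, String> {
    match timeout_ms {
        None => Ok(Duration::from_millis(DEFAULT_REPL_TIMEOUT_MS)),
        Some(0) => Err("timeout_ms must be at least 1".to_string()),
        Some(ms) if ms > MAX_REPL_TIMEOUT_MS => {
            Err(format!("timeout_ms {ms} exceeds cap {MAX_REPL_TIMEOUT_MS}"))
        }
        Some(ms) => Ok(Duration::from_millis(ms)),
    }
}

/// Poll the child in short intervals and kill it once the deadline passes.
/// Assumes the child's output fits in the pipe buffer; a production version
/// would drain stdout/stderr on helper threads so a chatty script cannot
/// block before the deadline.
fn wait_with_deadline(mut child: Child, timeout: Duration) -> Result<Output, String> {
    let started = Instant::now();
    loop {
        match child.try_wait() {
            Ok(Some(_)) => return child.wait_with_output().map_err(|e| e.to_string()),
            Ok(None) if started.elapsed() >= timeout => {
                let _ = child.kill();
                let _ = child.wait();
                return Err(format!("REPL timed out after {} ms", timeout.as_millis()));
            }
            Ok(None) => thread::sleep(Duration::from_millis(20)),
            Err(err) => return Err(err.to_string()),
        }
    }
}
```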
+
+588. **Prompt-mode `read_piped_stdin()` still reads the entire pipe with no cap before merging it into the prompt, so the one-shot prompt path can OOM and API/session history can diverge from the persisted truncated JSONL** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 16:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@a1728b2`. Active tmux session at probe time: `omx-fooks-issue-1046-clean-epic-only-spawn-child`; no active claw-code implementation session. Channel context included Jobdori's #596 `LineEditor::read_line_fallback` unbounded line-read finding, so this probe checked the other piped-stdin entrypoint. Code inspection: `rust/crates/rusty-claude-cli/src/main.rs::read_piped_stdin` returns `None` for TTY stdin, then does `let mut buffer = String::new(); io::stdin().read_to_string(&mut buffer)` with no byte/char cap. In `CliAction::Prompt`, when `permission_mode == DangerFullAccess`, this entire buffer is appended by `merge_prompt_with_stdin` to the user prompt and sent to `LiveCli::run_turn_with_output`. The session layer later truncates JSONL fields to `MAX_JSONL_FIELD_CHARS = 16 * 1024`, so a huge piped context can be fully allocated and sent to the provider while the persisted session records only a truncated copy. This is distinct from #596: `read_line_fallback` covers non-TTY REPL line input; `read_piped_stdin` covers explicit one-shot prompt + piped context. Existing tests exercise merge formatting but not large stdin caps or persistence parity. 
**Required fix shape:** (a) introduce one shared `MAX_STDIN_PROMPT_BYTES`/`MAX_STDIN_PROMPT_CHARS` boundary for both `read_line_fallback` and `read_piped_stdin`; (b) read through `take(limit + 1)` or chunked bounded reads so oversize input is detected before unbounded allocation; (c) fail with a typed “stdin prompt too large” error or summarize/truncate before both API send and session persistence using the same content; (d) add tests for empty stdin, normal small piped context, limit-boundary input, and oversize input; (e) include the stdin byte/char count and cap in JSON/text diagnostics without echoing the large payload. **Why this matters:** piped stdin is a primary automation path (`cat file | claw prompt ...`). If it reads unbounded context and then persists a different truncated transcript, claws cannot replay, audit, or recover the same conversation the provider actually saw. Source: gaebal-gajae dogfood response to Clawhip message `1507420302924185720` on 2026-05-22.
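A minimal sketch of fix shapes (b) and (c) for item 588. It assumes a 1 MiB cap, since the text names `MAX_STDIN_PROMPT_BYTES` but gives no value, and a hypothetical `read_bounded_prompt` helper. `Read::take(limit + 1)` stops the read one byte past the cap, so oversize input is detected without buffering the whole pipe. The returned string is the single value that both the API send and session persistence should use.

```rust
use std::io::Read;

// Hypothetical cap value; only the constant's name comes from the text.
const MAX_STDIN_PROMPT_BYTES: u64 = 1024 * 1024;

#[derive(Debug, PartialEq)]
enum StdinPromptError {
    TooLarge { limit: u64 }, // never echoes the payload
    NotUtf8,
    Io(String),
}

/// Read at most `limit + 1` bytes; one byte past the cap proves oversize.
fn read_bounded_prompt<R: Read>(reader: R, limit: u64) -> Result<String, StdinPromptError> {
    let mut bytes = Vec::new();
    reader
        .take(limit + 1)
        .read_to_end(&mut bytes)
        .map_err(|e| StdinPromptError::Io(e.to_string()))?;
    if bytes.len() as u64 > limit {
        return Err(StdinPromptError::TooLarge { limit });
    }
    String::from_utf8(bytes).map_err(|_| StdinPromptError::NotUtf8)
}
```

In the CLI this would be called as `read_bounded_prompt(std::io::stdin().lock(), MAX_STDIN_PROMPT_BYTES)`, and `read_line_fallback` would share the same cap constant per (a).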
+
+589. **Managed-proxy MCP transport has no auth field or `requires_user_auth` path, so protected proxy endpoints cannot use the same OAuth/auth lifecycle as HTTP/SSE** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 17:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@a93f36d`. Active tmux sessions at probe time: `gajae-issue-351-ops-surface-inventory-snapshot`, `omx-fooks-issue-1046-clean-epic-only-spawn-child`; no active claw-code implementation session. Channel context included Jobdori's #597 WebSocket OAuth gap, so this probe checked the remaining remote/proxy MCP transport shapes. Code inspection: `runtime/src/config.rs::McpManagedProxyServerConfig` has only `{ url, id }`, `parse_mcp_server_config` for `"claudeai-proxy"` parses only those two fields, `runtime/src/mcp_client.rs::McpClientTransport::ManagedProxy` stores `McpManagedProxyTransport { url, id }` with no `auth`, and `commands/src/lib.rs` reports only URL/proxy id in text/JSON. Unlike SSE/HTTP `McpRemoteTransport`, the managed-proxy path cannot carry `McpOAuthConfig`, cannot report `requires_user_auth()`, and cannot participate in the same user-auth/preflight lifecycle. A protected managed proxy must either be unauthenticated, rely on credentials encoded in URL/query (already a reporting leak class), or use an out-of-band mechanism invisible to `mcp list/show/doctor`. 
**Required fix shape:** (a) add `oauth: Option<McpOAuthConfig>` or an explicit managed-proxy auth config to `McpManagedProxyServerConfig`; (b) include `auth: McpClientAuth` in `McpManagedProxyTransport` or otherwise expose a shared `requires_user_auth` trait across all remote transports; (c) parse/report the auth requirement in `mcp show/list` without leaking tokens; (d) add tests for `claudeai-proxy` with OAuth proving bootstrap requires user auth and JSON/text surfaces expose non-secret auth metadata; (e) align with the WebSocket fix so every network MCP transport (SSE/HTTP/WS/managed-proxy) has symmetric auth configuration or an explicit documented reason why not. **Why this matters:** managed proxy is a remote MCP lifecycle path. If it cannot express auth, operators lose preflight visibility and are pushed toward URL/header secret hacks that diagnostics either leak (#90) or cannot validate. Source: gaebal-gajae dogfood response to Clawhip message `1507427853031968799` on 2026-05-22.
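A sketch of fix shapes (a) and (b) for item 589. It uses a hypothetical `RemoteTransport` enum in which every network variant, including the managed proxy, carries the same optional OAuth config. `OAuthConfig` is a placeholder for `McpOAuthConfig`. `auth_summary` shows the kind of non-secret metadata that (c) asks `mcp list/show` to report.

```rust
/// Placeholder for `McpOAuthConfig`; real fields are not modeled here.
#[derive(Debug, Clone, PartialEq)]
struct OAuthConfig {
    client_id: String,
}

/// Hypothetical transport enum: auth is symmetric across remote variants.
#[derive(Debug, Clone, PartialEq)]
enum RemoteTransport {
    Sse { url: String, oauth: Option<OAuthConfig> },
    Http { url: String, oauth: Option<OAuthConfig> },
    ManagedProxy { url: String, id: String, oauth: Option<OAuthConfig> },
}

impl RemoteTransport {
    fn oauth(&self) -> Option<&OAuthConfig> {
        match self {
            RemoteTransport::Sse { oauth, .. }
            | RemoteTransport::Http { oauth, .. }
            | RemoteTransport::ManagedProxy { oauth, .. } => oauth.as_ref(),
        }
    }

    /// One shared answer for every remote transport, proxy included.
    fn requires_user_auth(&self) -> bool {
        self.oauth().is_some()
    }

    /// Non-secret auth metadata suitable for text/JSON reports.
    fn auth_summary(&self) -> &'static str {
        if self.requires_user_auth() { "oauth" } else { "none" }
    }
}
```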
+
+590. **Configured remote MCP transports are parsed and shown as first-class, but runtime `McpServerManager` only starts stdio servers and silently degrades every HTTP/SSE/WS/SDK/managed-proxy server into an unsupported registration failure** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 17:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@106c243`. Active tmux sessions at probe time: `gajae-issue-353-review-ci-verdict-transition-receipt`, `omx-fooks-issue-1046-clean-epic-only-spawn-child`; no active claw-code implementation session. Code inspection: `runtime/src/config.rs` parses transport variants `stdio`, `sse`, `http`, `ws`, `sdk`, and `claudeai-proxy`; `commands/src/lib.rs` renders their URL/header/OAuth/proxy details in `mcp list/show`. But `runtime/src/mcp_stdio.rs::McpServerManager::from_servers` inserts only `McpTransport::Stdio` into `managed_servers`; every other transport is pushed into `unsupported_servers` with reason `transport X is not supported by McpServerManager`. `discover_tools_best_effort` later turns those into `McpLifecyclePhase::ServerRegistration` failures/degraded startup, while `discover_tools` over `server_names()` ignores them entirely because `server_names()` only returns managed stdio server keys. Existing test `manager_records_unsupported_non_stdio_servers_without_panicking` locks the current behavior for http/sdk/ws as expected. 
**Required fix shape:** (a) decide whether remote transports are supported in this build; if not, make config parsing/show/list label them as `configured_but_runtime_unsupported` rather than implying they are usable; (b) if they are intended to work, implement separate managers/clients for HTTP/SSE/WS/managed-proxy and route discovery/tool calls by transport; (c) surface unsupported required remote servers as hard startup blockers in prompt/runtime preflight, and optional ones as typed degraded state, not just best-effort discovery noise; (d) add JSON fields in `mcp list/show/doctor` such as `runtime_supported:false`, `unsupported_reason`, and `required`; (e) add regressions proving `discover_tools` and prompt startup cannot silently ignore required non-stdio servers. **Why this matters:** users can configure and inspect remote MCP servers today, including auth fields for HTTP/SSE, but the actual runtime path does not run them. That split creates event/log opacity: `mcp show` looks configured while prompt-time tool discovery either degrades or omits the server, so operators cannot tell from control-plane surfaces whether remote MCP tools are actually reachable. Source: gaebal-gajae dogfood response to Clawhip message `1507435406277476393` on 2026-05-22.
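A sketch of fix shapes (a), (c), and (d) for item 590. It shows a hypothetical preflight that labels non-stdio transports as `configured_but_runtime_unsupported` and turns required ones into startup blockers instead of best-effort discovery noise. `Transport`, `ServerConfig`, `required`, and `preflight` are illustrative names. The only runtime fact carried over from the text is that `McpServerManager` currently manages stdio alone.

```rust
/// Hypothetical mirror of the configured transport variants.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Transport {
    Stdio,
    Sse,
    Http,
    Ws,
    Sdk,
    ManagedProxy,
}

struct ServerConfig {
    name: &'static str,
    transport: Transport,
    required: bool,
}

#[derive(Debug, PartialEq)]
enum StartupVerdict {
    Ready,
    Degraded(Vec<String>),
    Blocked(Vec<String>),
}

/// Matches the current runtime: only stdio servers are managed.
fn runtime_supported(transport: Transport) -> bool {
    matches!(transport, Transport::Stdio)
}

fn preflight(servers: &[ServerConfig]) -> StartupVerdict {
    let mut blockers = Vec::new();
    let mut degraded = Vec::new();
    for s in servers.iter().filter(|s| !runtime_supported(s.transport)) {
        let reason = format!("{}: transport {:?} is configured_but_runtime_unsupported", s.name, s.transport);
        if s.required { blockers.push(reason) } else { degraded.push(reason) }
    }
    if !blockers.is_empty() {
        StartupVerdict::Blocked(blockers)
    } else if !degraded.is_empty() {
        StartupVerdict::Degraded(degraded)
    } else {
        StartupVerdict::Ready
    }
}
```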
+
+591. **MCP degraded reports always compute `missing_tools` as empty because degraded construction passes `available_tools` (or `Vec::new()`) as the expected-tool set, so failed/unsupported servers never surface which tools disappeared** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 18:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@f50625a`. Active tmux sessions at probe time: `gajae-issue-355-backlog-zero-next-artifact-queue`, `omx-fooks-issue-1046-clean-epic-only-spawn-child`; no active claw-code implementation session. Code inspection: `runtime/src/mcp_lifecycle_hardened.rs::McpDegradedReport::new` can compute `missing_tools` by subtracting `available_tools` from `expected_tools`. But both producers discard the useful expected set: `runtime/src/mcp_stdio.rs::discover_tools_best_effort` passes `Vec::new()` for `expected_tools`, and `rusty-claude-cli/src/main.rs::RuntimeMcpState::new` passes `available_tools.clone()` as both `available_tools` and `expected_tools`. Therefore `missing_tools` is always empty, even when required servers fail discovery or unsupported remote transports are configured. Existing tests assert `degraded.missing_tools.is_empty()`, locking the broken contract. The only visible degraded data is failed server names/phases; operators cannot tell which qualified tool names vanished from the tool surface. 
**Required fix shape:** (a) retain each server's advertised/expected tool list from prior successful discovery, static config, manifest metadata, or at least the expected server/tool namespace when known; (b) pass that set into `McpDegradedReport::new` instead of `available_tools`/empty; (c) if exact tools are unknown, expose `missing_tool_names_unknown_for_servers:[...]` so the report is honest rather than falsely empty; (d) update tests to assert non-empty `missing_tools` or explicit unknown-missing metadata when a server with known tools fails; (e) thread the same degraded metadata into `ToolSearch` so models see which tools are unavailable, not just that a server failed. **Why this matters:** degraded startup is supposed to make partial success first-class. An always-empty `missing_tools` field falsely suggests no capability loss, hiding the actual impact of MCP failures and unsupported remote transports from both humans and automation. Source: gaebal-gajae dogfood response to Clawhip message `1507442954308948040` on 2026-05-22.
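A sketch of fix shapes (a) through (c) for item 591. It assumes each server's previously advertised tool list is retained, with `None` when the server was never discovered. `degraded_impact` is a hypothetical helper. It reports concrete `missing_tools` where the expected set is known and names the servers whose losses are unknown, rather than returning a falsely empty list.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, PartialEq)]
struct DegradedImpact {
    missing_tools: Vec<String>,
    missing_tool_names_unknown_for_servers: Vec<String>,
}

/// `expected` maps server name -> tools it advertised last time (None = never seen).
fn degraded_impact(
    expected: &BTreeMap<String, Option<Vec<String>>>,
    available: &BTreeSet<String>,
    failed_servers: &[&str],
) -> DegradedImpact {
    let mut missing = BTreeSet::new();
    let mut unknown = Vec::new();
    for server in failed_servers {
        match expected.get(*server) {
            // Known expectations: report exactly which tools vanished.
            Some(Some(tools)) => {
                missing.extend(tools.iter().filter(|t| !available.contains(*t)).cloned())
            }
            // Unknown expectations: say so instead of claiming nothing is missing.
            _ => unknown.push(server.to_string()),
        }
    }
    DegradedImpact {
        missing_tools: missing.into_iter().collect(),
        missing_tool_names_unknown_for_servers: unknown,
    }
}
```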
+
+592. **Plugin degraded-mode reporting drops degraded-server errors because `PluginState::Degraded` only preserves failed server details, so a server marked degraded can appear fully healthy/available in the operator contract** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 18:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@1df4114`. Active tmux session at probe time: `gajae-issue-357-review-session-vanish-replacement`; no active claw-code implementation session. Focused validation: `cd rust && cargo test -p runtime plugin_lifecycle -- --nocapture` passed 6/6, confirming the current locked contract. Code inspection: `runtime/src/plugin_lifecycle.rs::PluginState::from_servers` treats `ServerStatus::Degraded` as usable by placing it in `healthy_servers`, but the `PluginState::Degraded` variant stores only `healthy_servers: Vec<String>` and `failed_servers: Vec<ServerHealth>`. `PluginHealthcheck::degraded_mode` then builds `unavailable_tools` only from `failed_servers` and reports a reason of `"N servers healthy, M servers failed"`. Existing test `degraded_server_status_keeps_server_usable` asserts a degraded `beta` server is included in `healthy_servers` and `failed_servers` is empty, but it does not assert that `beta`'s `last_error` (`high latency`) or degraded status is visible in `degraded_mode`. Result: a plugin with one healthy server and one degraded-but-usable server can emit `startup_degraded` while its degraded-mode payload says `2 servers healthy, 0 servers failed`, has no degraded server list, and can mark the degraded server's tools as simply available if discovery returns them. Operators and automation see the terminal state but lose the actual degraded reason/capability risk. 
**Required fix shape:** (a) split `PluginState::Degraded` into `healthy_servers`, `degraded_servers: Vec<ServerHealth>`, and `failed_servers`, or preserve all non-healthy server health records; (b) make `degraded_mode` include `degraded_tools`/`degraded_servers` with `last_error` separately from unavailable failed tools; (c) update the reason string/counts to distinguish healthy, degraded, and failed rather than folding degraded into healthy; (d) add tests proving a `ServerStatus::Degraded` server's status/error/capabilities appear in the degraded-mode JSON while remaining callable when appropriate; (e) align this with MCP degraded reports so partial plugin startup carries both availability and quality-of-service impact. **Why this matters:** partial startup needs impact opacity removed. A degraded server is not failed, but it also is not fully healthy; folding it into the healthy list makes `startup_degraded` hard to diagnose and can trick recovery logic into thinking no server has actionable degradation details. Source: gaebal-gajae dogfood response to Clawhip message `1507450501925441616` on 2026-05-22.
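A sketch of fix shapes (a) and (c) for item 592. It uses simplified stand-ins for `ServerStatus` and `ServerHealth`. Degraded servers get their own list, keeping `last_error`, and the reason string counts healthy, degraded, and failed servers separately. `PluginHealthSplit` and `split_health` are hypothetical names.

```rust
// Simplified stand-ins for the runtime's health types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ServerStatus {
    Healthy,
    Degraded,
    Failed,
}

#[derive(Debug, Clone)]
struct ServerHealth {
    name: String,
    status: ServerStatus,
    last_error: Option<String>,
}

/// Hypothetical three-way split instead of folding degraded into healthy.
#[derive(Debug, Default)]
struct PluginHealthSplit {
    healthy: Vec<String>,
    degraded: Vec<ServerHealth>, // still callable, but keeps its error
    failed: Vec<ServerHealth>,
}

fn split_health(servers: &[ServerHealth]) -> PluginHealthSplit {
    let mut split = PluginHealthSplit::default();
    for s in servers {
        match s.status {
            ServerStatus::Healthy => split.healthy.push(s.name.clone()),
            ServerStatus::Degraded => split.degraded.push(s.clone()),
            ServerStatus::Failed => split.failed.push(s.clone()),
        }
    }
    split
}

impl PluginHealthSplit {
    fn reason(&self) -> String {
        format!(
            "{} servers healthy, {} servers degraded, {} servers failed",
            self.healthy.len(),
            self.degraded.len(),
            self.failed.len()
        )
    }
}
```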
+
+593. **Plugin uninstall is a three-store non-atomic transaction: it deletes the plugin directory before persisting registry/settings cleanup, so an IO failure can leave registry and `enabledPlugins` pointing at a removed plugin** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 19:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@244bdb7`. Active tmux session at probe time: `gajae-pr-358-review-session-vanish-replacement-review`; no active claw-code implementation session. Focused validation: `cd rust && cargo test -p plugins installs_enables_updates_and_uninstalls_external_plugins -- --nocapture` passed 1/1, confirming the happy-path lifecycle test does not cover interrupted persistence. Code inspection: `plugins/src/lib.rs::PluginManager::uninstall` removes the plugin record from the in-memory registry, then `remove_dir_all(record.install_path)`, then `store_registry(&registry)`, then `write_enabled_state(plugin_id, None)`. If directory removal succeeds but `store_registry` fails, `installed.json` still records the old plugin while the on-disk install directory is gone. If `store_registry` succeeds but `write_enabled_state` fails, the registry and disk are removed but `.claw/settings.json` can still contain `enabledPlugins[plugin_id]=true`. The next `plugin_registry_report` may silently prune the stale registry entry, but the enabled state can remain as orphaned configuration noise and the original uninstall command has already destroyed the install before it can report a clean transactional result. Existing `installs_enables_updates_and_uninstalls_external_plugins` asserts only the success path; it does not simulate registry write failure, settings write failure, or crash after `remove_dir_all`. 
**Required fix shape:** (a) make plugin uninstall a transaction with a tombstone/backup phase: first persist intended disabled/removed state or acquire a lock, then move the install directory to a same-filesystem trash/backup path, then update registry/settings, then remove backup after all metadata commits succeed; (b) if metadata cleanup fails, restore or report a recoverable tombstoned state with `rollback_available:true`; (c) make stale enabled settings for missing plugins produce a structured warning and auto-clean only through a safe metadata path; (d) add tests injecting failures after directory move/removal, after `store_registry`, and after `write_enabled_state`; (e) reuse the atomic JSON/write-lock helper from #508 so registry/settings writes cannot be partially written. **Why this matters:** uninstall is the destructive plugin lifecycle verb. Deleting files before committing metadata means a transient settings/registry IO failure turns a local cleanup into stale-branch-style plugin confusion: inventory can say installed/enabled while the executable hook/tool files are already gone. Source: gaebal-gajae dogfood response to Clawhip message `1507458055812546630` on 2026-05-22.
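A minimal sketch of fix shapes (a) and (b) for item 593. It assumes the trash directory lives on the same filesystem as the install, so `fs::rename` is a cheap move. `uninstall_with_tombstone` is hypothetical, and the `commit_metadata` closure stands in for `store_registry` followed by `write_enabled_state`. The directory is only deleted after the metadata commit succeeds, and a metadata failure moves it back.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical tombstone-style uninstall: move aside, commit metadata, then delete.
fn uninstall_with_tombstone<F>(install: &Path, trash_dir: &Path, commit_metadata: F) -> io::Result<()>
where
    F: FnOnce() -> io::Result<()>,
{
    fs::create_dir_all(trash_dir)?;
    let name = install
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "install path has no file name"))?;
    let tombstone: PathBuf = trash_dir.join(name);

    // Same-filesystem rename: cheap and reversible, unlike remove_dir_all.
    fs::rename(install, &tombstone)?;

    if let Err(err) = commit_metadata() {
        // Roll back so registry/settings and disk still agree.
        fs::rename(&tombstone, install)?;
        return Err(err);
    }
    // Only now is the destructive step safe.
    fs::remove_dir_all(&tombstone)
}
```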
+
+594. **MCP stdio frame reader trusts arbitrary `Content-Length` and allocates that many bytes before reading, so a buggy or hostile MCP server can OOM the runtime with one header** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 19:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@d996b65`. Active tmux session at probe time: `gajae-pr-358-review-session-vanish-replacement-review`; no active claw-code implementation session. Focused validation: `cd rust && cargo test -p runtime mcp_stdio -- --nocapture` passed 22/22, confirming current MCP stdio tests cover normal/mismatched/lowercase frames but not oversized frames. Code inspection: `runtime/src/mcp_stdio.rs::McpStdioProcess::read_frame` parses `Content-Length` into `usize`, then immediately does `let mut payload = vec![0_u8; content_length]; self.stdout.read_exact(&mut payload).await?;`. There is no maximum frame size, no per-server byte budget, no early rejection on huge lengths, and no streaming cap. A server can send `Content-Length: 10000000000\r\n\r\n` (or any value near available memory) and force allocation before any payload bytes arrive; the surrounding `run_process_request` timeout does not protect the allocation itself. This is distinct from HTTP body caps (#503) and SSE parser buffering (#506): it is the MCP JSON-RPC stdio framing layer. Existing tests assert lowercase `Content-Length`, missing/mismatched IDs, timeout, and retry/reset behavior, but none assert a maximum accepted frame length. **Required fix shape:** (a) add a conservative `MAX_MCP_STDIO_FRAME_BYTES` default and optional per-server override; (b) after parsing `Content-Length`, reject values above the cap with `io::ErrorKind::InvalidData` carrying `content_length` and `max_frame_bytes`; (c) read the body through a bounded buffer/helper so allocation is capped and timeout/error surfaces stay typed as MCP invalid response; (d) add regression scripts that emit huge `Content-Length` with no body and oversized body, proving no large allocation and a structured invalid-response error; (e) include frame-size metadata in MCP degraded/error reports so operators can distinguish protocol abuse from transport EOF. **Why this matters:** MCP servers are extension processes. The client must treat their stdio as untrusted protocol input; one oversized length header should not be able to OOM a prompt startup, tool discovery, or resource read before degraded-mode reporting can fire. Source: gaebal-gajae dogfood response to Clawhip message `1507465601499660349` on 2026-05-22.
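A sketch of fix shapes (a) and (b) for item 594. The cap check runs between parsing `Content-Length` and allocating `vec![0_u8; content_length]`. The 16 MiB default and the synchronous `checked_content_length` helper are assumptions. The real `read_frame` is async and would call something like this before `read_exact`.

```rust
use std::io;

// Hypothetical default; only the constant's name comes from the text.
const MAX_MCP_STDIO_FRAME_BYTES: usize = 16 * 1024 * 1024;

/// Parse one `Content-Length` header line and enforce the frame cap
/// before any payload buffer is allocated.
fn checked_content_length(header_line: &str, max_frame_bytes: usize) -> io::Result<usize> {
    let (name, value) = header_line
        .split_once(':')
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "malformed MCP header"))?;
    if !name.trim().eq_ignore_ascii_case("content-length") {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "expected Content-Length header"));
    }
    let content_length: usize = value
        .trim()
        .parse()
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidData, "invalid Content-Length"))?;
    if content_length > max_frame_bytes {
        // Typed rejection carrying both numbers, per fix shape (b).
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("content_length={content_length} exceeds max_frame_bytes={max_frame_bytes}"),
        ));
    }
    Ok(content_length)
}
```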
+
+595. **OAuth authorize URL builder allows `extra_params` to override core PKCE/OAuth parameters after they were already set, so plugin/config extras can replace `state`, `code_challenge`, `redirect_uri`, or `response_type`** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 20:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@46f3bff`. Active tmux sessions at probe time: none; no active claw-code implementation session. Focused validation: `cd rust && cargo test -p runtime oauth -- --nocapture` passed 9/9, confirming current OAuth tests cover happy-path URL/form/callback parsing but not reserved extra-param collisions. Code inspection: `runtime/src/oauth.rs::OAuthAuthorizationRequest::build_url` creates a `params` vector containing core parameters (`response_type=code`, `client_id`, `redirect_uri`, `scope`, `state`, `code_challenge`, `code_challenge_method`), then blindly `extend`s `self.extra_params` into the same query. `with_extra_param` accepts any key and stores it in a `BTreeMap`, with no reserved-name validation. A caller that sets `with_extra_param("state", "attacker")`, `code_challenge`, `redirect_uri`, `response_type`, `client_id`, or `scope` produces a URL with duplicate query parameters where the extra value appears after the core value. Because many OAuth parsers use last-value-wins semantics, this can desynchronize the locally expected state/PKCE verifier from the authorization-server-visible values, or change redirect/scope semantics. Jobdori separately filed duplicate callback parameters (#603); this is the outbound sibling: duplicates are generated by the client itself before the browser redirect, not just accepted on callback. 
**Required fix shape:** (a) define a reserved parameter set for OAuth authorization requests (`response_type`, `client_id`, `redirect_uri`, `scope`, `state`, `code_challenge`, `code_challenge_method`) and reject attempts to add them via `with_extra_param`; (b) make `with_extra_param` return `Result<Self, OAuthError>` or validate in `build_url` with a typed error rather than silently emitting duplicates; (c) add tests for reserved collisions (`state`, `code_challenge`, `redirect_uri`) and a safe extension like `login_hint`; (d) if an override is intentionally supported, make it explicit and update the stored expected state/verifier/redirect to match so callback/token exchange cannot drift; (e) document provider-specific extra params as additive-only. **Why this matters:** `state` and PKCE are the OAuth anti-CSRF/proof-of-possession controls. Letting arbitrary extras duplicate or override them in the authorization URL creates prompt/auth lifecycle ambiguity and can turn a provider-specific hint hook into a security-sensitive parameter injection footgun. Source: gaebal-gajae dogfood response to Clawhip message `1507473155273265172` on 2026-05-22.
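Below is a sketch of steps (a) and (b). It uses a simplified, hypothetical `AuthorizeRequest` in place of the real `OAuthAuthorizationRequest`. The `ExtraParamError` type, the field set, and the hard-coded `S256` challenge method are all assumptions, and percent-encoding is left out. Because `with_extra_param` returns a `Result` and refuses reserved names, `query_pairs` can append extras after the core parameters without ever emitting a duplicate key.

```rust
use std::collections::BTreeMap;

/// Core parameters the builder always sets itself (names from the roadmap entry).
const RESERVED_AUTHORIZE_PARAMS: &[&str] = &[
    "response_type",
    "client_id",
    "redirect_uri",
    "scope",
    "state",
    "code_challenge",
    "code_challenge_method",
];

#[derive(Debug, PartialEq)]
enum ExtraParamError {
    Reserved(String),
}

#[derive(Debug)]
struct AuthorizeRequest {
    state: String,
    code_challenge: String,
    extra_params: BTreeMap<String, String>,
}

impl AuthorizeRequest {
    fn new(state: &str, code_challenge: &str) -> Self {
        Self {
            state: state.to_string(),
            code_challenge: code_challenge.to_string(),
            extra_params: BTreeMap::new(),
        }
    }

    /// Additive-only extension point: reserved names are refused, never overridden.
    fn with_extra_param(mut self, key: &str, value: &str) -> Result<Self, ExtraParamError> {
        if RESERVED_AUTHORIZE_PARAMS.contains(&key) {
            return Err(ExtraParamError::Reserved(key.to_string()));
        }
        self.extra_params.insert(key.to_string(), value.to_string());
        Ok(self)
    }

    /// Core parameters first, then extras (percent-encoding omitted in this sketch).
    fn query_pairs(&self) -> Vec<(String, String)> {
        let mut pairs = vec![
            ("response_type".to_string(), "code".to_string()),
            ("state".to_string(), self.state.clone()),
            ("code_challenge".to_string(), self.code_challenge.clone()),
            ("code_challenge_method".to_string(), "S256".to_string()),
        ];
        pairs.extend(self.extra_params.iter().map(|(k, v)| (k.clone(), v.clone())));
        pairs
    }
}

fn main() {
    let rejected = AuthorizeRequest::new("s1", "c1").with_extra_param("state", "attacker");
    assert_eq!(rejected.unwrap_err(), ExtraParamError::Reserved("state".to_string()));

    let req = AuthorizeRequest::new("s1", "c1")
        .with_extra_param("login_hint", "user@example.com")
        .unwrap();
    let state_count = req.query_pairs().iter().filter(|(k, _)| k == "state").count();
    assert_eq!(state_count, 1);
}
```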
+
+596. **MCP tool bridge gates `call_tool` on a stale in-memory registry snapshot before doing live discovery, so newly discovered tools can be rejected and removed tools can be offered until runtime failure** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 20:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@20c9d9d`. Active tmux sessions at probe time: none; no active claw-code implementation session. Focused validation: `cd rust && cargo test -p runtime mcp_tool_bridge -- --nocapture` passed 19/19, confirming current bridge tests cover pre-registered happy-path tools but not registry/manager drift. Code inspection: `runtime/src/mcp_tool_bridge.rs::McpToolRegistry::call_tool` first locks `self.inner`, requires `state.status == Connected`, and checks `state.tools.iter().any(|t| t.name == tool_name)`. Only after that snapshot gate does it drop the registry lock and call `spawn_tool_call`, which creates a runtime and runs `manager.discover_tools().await` followed by `manager.call_tool(...)`. The live discovery result updates the manager/tool index, but the registry snapshot used for admission is never refreshed from that discovery. Therefore a tool that becomes available after startup/discovery refresh is still rejected as `tool not found` if it is absent from the stale registry, while a tool that disappeared can remain listed/accepted by the registry and then fail later as an `UnknownTool`/runtime error from the manager. Existing tests explicitly register `echo` in the registry before calling the live manager, so they lock in the assumption that the registry already matches runtime discovery. 
**Required fix shape:** (a) make `call_tool` reconcile registry state from `manager.discover_tools()` before the tool-existence gate, or delegate existence checks entirely to the authoritative manager and then update the registry snapshot; (b) add a registry `last_discovered_at`/generation or per-server tool-source metadata so `list_tools` can report stale/degraded state; (c) add regressions where the registry starts empty but the manager discovers `echo`, and where the registry advertises a stale tool that the manager no longer exposes; (d) ensure `list_tools`, `ToolSearch`, and `call_tool` agree on the same generation/tool surface; (e) surface drift as a typed MCP lifecycle/degraded event rather than a generic `tool not found`. **Why this matters:** the bridge is a control-plane contract between model-visible MCP tools and the live server manager. If admission checks use an old snapshot while execution uses fresh discovery, claws get stale tool availability evidence and confusing failures exactly where autonomous recovery needs a single source of truth. Source: gaebal-gajae dogfood response to Clawhip message `1507480700868362384` on 2026-05-22.
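Below is one way to express step (a). It is sketched synchronously, and every name in it is hypothetical (`ToolSource`, `Registry`, `admit_call`, `CallError`). The real bridge is async and reaches the live manager through `spawn_tool_call`. In this sketch, admission first reconciles the registry from the authoritative discovery result. It bumps a generation counter whenever the tool surface changes. That counter also gives steps (b) and (e) a generation number to report alongside a not-found error.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for the live manager's discovery call (synchronous here).
trait ToolSource {
    fn discover_tools(&mut self) -> Vec<String>;
}

#[derive(Debug, PartialEq)]
enum CallError {
    ToolNotFound { tool: String, generation: u64 },
}

#[derive(Default)]
struct Registry {
    tools: BTreeSet<String>,
    generation: u64,
}

impl Registry {
    /// Replace the snapshot with live discovery; bump the generation on any change.
    fn reconcile(&mut self, source: &mut impl ToolSource) -> u64 {
        let live: BTreeSet<String> = source.discover_tools().into_iter().collect();
        if live != self.tools {
            self.tools = live;
            self.generation += 1;
        }
        self.generation
    }

    /// Admission runs against the freshly reconciled snapshot, never a stale one.
    fn admit_call(&mut self, source: &mut impl ToolSource, tool: &str) -> Result<u64, CallError> {
        let generation = self.reconcile(source);
        if self.tools.contains(tool) {
            Ok(generation)
        } else {
            Err(CallError::ToolNotFound { tool: tool.to_string(), generation })
        }
    }
}

/// Test double standing in for the live MCP manager.
struct FakeManager {
    tools: Vec<String>,
}

impl ToolSource for FakeManager {
    fn discover_tools(&mut self) -> Vec<String> {
        self.tools.clone()
    }
}

fn main() {
    let mut registry = Registry::default();
    let mut manager = FakeManager { tools: vec!["echo".to_string()] };
    // The registry starts empty, but live discovery knows `echo`.
    assert_eq!(registry.admit_call(&mut manager, "echo"), Ok(1));
    // The manager drops `echo`; the registry must stop admitting it.
    manager.tools.clear();
    assert_eq!(
        registry.admit_call(&mut manager, "echo"),
        Err(CallError::ToolNotFound { tool: "echo".to_string(), generation: 2 })
    );
}
```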
+
+597. **Worker prompt replay accepts a new `task_receipt` even when replaying a recovered prompt, so auto-recovery can pair the old prompt text with a new/different execution receipt** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 21:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@71f8554`. Active tmux sessions at probe time: none; no active claw-code implementation session. Focused validation: `cd rust && cargo test -p runtime worker_boot -- --nocapture` passed 22/22, confirming that current worker-boot tests cover the replay happy path and receipt mismatch detection after observation, but not replay receipt integrity. Code inspection: `runtime/src/worker_boot.rs::WorkerRegistry::send_prompt` computes `next_prompt` from the explicit `prompt` argument or falls back to `worker.replay_prompt.clone()`, then unconditionally sets `worker.expected_receipt = task_receipt`. The tool input (`tools/src/lib.rs::WorkerSendPromptInput`) allows `prompt: Option<String>` and `task_receipt: Option<WorkerTaskReceipt>` independently. After prompt misdelivery, `observe` sets `worker.replay_prompt = worker.last_prompt.clone()` and payloads include the original `worker.expected_receipt`. But a subsequent `send_prompt(worker_id, None, Some(new_receipt))` replays the old recovered prompt while replacing the expected receipt with a new one. Conversely, `send_prompt(worker_id, None, None)` replays the prompt but drops the expected receipt entirely. The later wrong-task-receipt detector compares observed receipts against this newly supplied or dropped value, not against the receipt that belonged to the misdelivered prompt. Existing tests `prompt_misdelivery_is_detected_and_replay_can_be_rearmed` and `wrong_task_receipt_mismatch_is_detected_before_execution_continues` do not assert that replay preserves the original receipt or rejects receipt changes.
**Required fix shape:** (a) when `prompt` is omitted and `worker.replay_prompt` is used, preserve the original `worker.expected_receipt` and reject any conflicting `task_receipt`; (b) store replay as a structured `{prompt, task_receipt}` bundle instead of only `String`; (c) if callers intentionally want a new prompt/receipt, require an explicit non-empty `prompt` and clear the replay bundle with an event; (d) add regressions for replay-with-new-receipt rejection, replay-with-no-receipt preserving the original receipt, and explicit new prompt replacing both prompt and receipt; (e) include receipt id/hash in `PromptReplayArmed`/`Running` event payloads without leaking sensitive prompt text. **Why this matters:** prompt recovery is supposed to resend the same task that was misdelivered. Letting the control plane silently combine old prompt text with a new or missing receipt breaks the provenance chain, causing stale/incorrect task execution evidence and making misdelivery recovery untrustworthy. Source: gaebal-gajae dogfood response to Clawhip message `1507488254734106685` on 2026-05-22.
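Below is a sketch of steps (a) to (c) that stores replay as a `{prompt, task_receipt}` bundle. `Worker`, `ReplayBundle`, `TaskReceipt`, `arm_replay`, and `SendPromptError` are hypothetical stand-ins for the `worker_boot.rs` types. Rejecting an empty explicit prompt is one assumed way to enforce the "non-empty `prompt`" rule in (c). Under this shape, a replay may repeat the bundled receipt, but it can never substitute or drop it.

```rust
#[derive(Clone, Debug, PartialEq)]
struct TaskReceipt {
    id: String,
}

/// Replay is stored as one bundle so prompt text and receipt cannot drift apart.
#[derive(Clone, Debug, PartialEq)]
struct ReplayBundle {
    prompt: String,
    task_receipt: Option<TaskReceipt>,
}

#[derive(Debug, PartialEq)]
enum SendPromptError {
    EmptyPrompt,
    NothingToReplay,
    ConflictingReplayReceipt { expected: Option<TaskReceipt>, supplied: TaskReceipt },
}

#[derive(Default)]
struct Worker {
    last_prompt: Option<String>,
    expected_receipt: Option<TaskReceipt>,
    replay: Option<ReplayBundle>,
}

impl Worker {
    /// Called when misdelivery is observed: arm replay with the original receipt.
    fn arm_replay(&mut self) {
        if let Some(prompt) = self.last_prompt.clone() {
            self.replay = Some(ReplayBundle { prompt, task_receipt: self.expected_receipt.clone() });
        }
    }

    fn send_prompt(
        &mut self,
        prompt: Option<String>,
        task_receipt: Option<TaskReceipt>,
    ) -> Result<String, SendPromptError> {
        match prompt {
            // Explicit new prompt: replaces both prompt and receipt and clears replay.
            Some(p) => {
                if p.trim().is_empty() {
                    return Err(SendPromptError::EmptyPrompt);
                }
                self.replay = None;
                self.last_prompt = Some(p.clone());
                self.expected_receipt = task_receipt;
                Ok(p)
            }
            // Replay: the bundled receipt wins; a different supplied receipt is rejected.
            None => {
                let bundle = self.replay.take().ok_or(SendPromptError::NothingToReplay)?;
                if let Some(supplied) = task_receipt {
                    if bundle.task_receipt.as_ref() != Some(&supplied) {
                        let expected = bundle.task_receipt.clone();
                        self.replay = Some(bundle); // stay armed for a correct retry
                        return Err(SendPromptError::ConflictingReplayReceipt { expected, supplied });
                    }
                }
                self.last_prompt = Some(bundle.prompt.clone());
                self.expected_receipt = bundle.task_receipt;
                Ok(bundle.prompt)
            }
        }
    }
}

fn main() {
    let r1 = TaskReceipt { id: "task-1".to_string() };
    let r2 = TaskReceipt { id: "task-2".to_string() };
    let mut worker = Worker::default();
    worker.send_prompt(Some("fix the parser".to_string()), Some(r1.clone())).unwrap();
    worker.arm_replay(); // the prompt was misdelivered

    // Replaying with a different receipt is refused, and replay stays armed.
    assert!(matches!(
        worker.send_prompt(None, Some(r2)),
        Err(SendPromptError::ConflictingReplayReceipt { .. })
    ));
    // Replaying without a receipt keeps the original one.
    assert_eq!(worker.send_prompt(None, None).unwrap(), "fix the parser");
    assert_eq!(worker.expected_receipt, Some(r1));
}
```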
+
+598. **Sub-agent lane manifests emit `LaneEvent::started/blocked/failed/finished/commit_created` with minimal optional metadata, so exported lane events fail the stricter G004 `emitterIdentity`/`environmentLabel` contract and have duplicate `seq=0` ordering** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 21:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@d2018d7`. Active tmux sessions at probe time: none; no active claw-code implementation session. Channel context included Jobdori filing #606 for the general G004 validator/model mismatch; this pinpoint is the concrete producer-side lane-manifest impact. Focused validation: `cd rust && cargo test -p tools lane -- --nocapture` passed 8/8, confirming current tools lane tests cover canonical event names/failure taxonomy but not G004 conformance of generated sub-agent manifests. Code inspection: `tools/src/lib.rs::create_agent_with_spawn` initializes `AgentOutput.lane_events` with `LaneEvent::started(iso8601_now())`; `persist_agent_terminal_state` appends `LaneEvent::blocked`, `LaneEvent::failed`, `LaneEvent::finished(...).with_data(...)`, and `LaneEvent::commit_created(...)`. All of these constructors call `LaneEvent::new`, which uses `LaneEventMetadata::new(0, EventProvenance::LiveLane)` and leaves `environment_label`/`emitter_identity` as `None`. Because metadata fields are skipped when `None`, serialized manifests omit the exact fields that `g004_conformance.rs::validate_lane_events` requires, and every appended event also carries `metadata.seq = 0`, violating the validator's strictly-increasing sequence rule as soon as a manifest has multiple events. This means even if the data model is fixed broadly, current sub-agent manifest production still needs a generation path that fills emitter/environment and monotonic seq. 
**Required fix shape:** (a) route all sub-agent lane event creation through `LaneEventBuilder` or a manifest-local helper that assigns monotonic sequence numbers; (b) set non-secret defaults such as `emitterIdentity:"tools.subagent"` and `environmentLabel` from cwd/session/channel/config, never leave them absent for exported manifests; (c) add a test that creates and completes/fails a sub-agent manifest then validates its `lane_events` with `validate_g004_contract_bundle` or a lane-event-specific conformance helper; (d) update helper constructors or document them as internal/non-G004 only; (e) align with #606 so constructor defaults and validator requirements agree. **Why this matters:** lane manifests are the event-log surface for autonomous sub-agents. If their generated events cannot pass the repo's own conformance validator and all share seq 0, downstream dashboards/reconcilers cannot rely on ordering, ownership, or environment provenance. Source: gaebal-gajae dogfood response to Clawhip message `1507495807052550174` on 2026-05-22.
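Below is a sketch of steps (a) and (b). It uses a manifest-local helper rather than `LaneEventBuilder`, because this entry does not describe that builder's API. Several pieces are hypothetical: `ManifestEventLog`, the trimmed-down `LaneEvent`, the event names, the `session:demo` label, and `check_lane_events`. The checker covers only the two G004 rules this entry names: non-empty emitter/environment fields and strictly increasing `seq`. It does not cover the full `validate_lane_events` contract.

```rust
#[derive(Clone, Debug, PartialEq)]
struct LaneEventMetadata {
    seq: u64,
    emitter_identity: String,
    environment_label: String,
}

#[derive(Clone, Debug)]
struct LaneEvent {
    name: &'static str,
    metadata: LaneEventMetadata,
}

/// Manifest-local helper: every event gets the next seq and non-empty provenance.
struct ManifestEventLog {
    next_seq: u64,
    emitter_identity: String,
    environment_label: String,
    events: Vec<LaneEvent>,
}

impl ManifestEventLog {
    fn new(environment_label: &str) -> Self {
        Self {
            next_seq: 0,
            emitter_identity: "tools.subagent".to_string(),
            environment_label: environment_label.to_string(),
            events: Vec::new(),
        }
    }

    /// Appends an event and returns the sequence number it was assigned.
    fn push(&mut self, name: &'static str) -> u64 {
        let seq = self.next_seq;
        self.next_seq += 1;
        self.events.push(LaneEvent {
            name,
            metadata: LaneEventMetadata {
                seq,
                emitter_identity: self.emitter_identity.clone(),
                environment_label: self.environment_label.clone(),
            },
        });
        seq
    }
}

/// Checks only the two G004 rules named in the roadmap entry.
fn check_lane_events(events: &[LaneEvent]) -> Result<(), String> {
    let mut previous: Option<u64> = None;
    for event in events {
        let meta = &event.metadata;
        if meta.emitter_identity.is_empty() || meta.environment_label.is_empty() {
            return Err(format!("{}: missing emitterIdentity/environmentLabel", event.name));
        }
        if let Some(prev) = previous {
            if meta.seq <= prev {
                return Err(format!("{}: seq {} is not greater than {}", event.name, meta.seq, prev));
            }
        }
        previous = Some(meta.seq);
    }
    Ok(())
}

fn main() {
    let mut log = ManifestEventLog::new("session:demo");
    log.push("started");
    log.push("finished");
    assert_eq!(check_lane_events(&log.events), Ok(()));

    // The shape described above: every event carries seq 0, so the second one fails.
    let mut stale = log.events.clone();
    for event in &mut stale {
        event.metadata.seq = 0;
    }
    println!("{:?}", check_lane_events(&stale));
}
```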
Author	SHA1	Message	Date
Yeachan-Heo	e5d2eb1423	docs(roadmap): add subagent lane event conformance gap	2026-05-22 21:31:15 +00:00
Yeachan-Heo	d2018d7aee	docs(roadmap): add worker replay receipt integrity gap	2026-05-22 21:01:16 +00:00
Yeachan-Heo	71f85541bd	docs(roadmap): add mcp tool bridge registry drift gap	2026-05-22 20:31:24 +00:00
Yeachan-Heo	20c9d9d6c3	docs(roadmap): add oauth authorize extra param collision gap	2026-05-22 20:01:13 +00:00
Yeachan-Heo	46f3bff7ef	docs(roadmap): add mcp stdio frame size cap gap	2026-05-22 19:31:27 +00:00
Yeachan-Heo	d996b65d64	docs(roadmap): add plugin uninstall transaction gap	2026-05-22 19:01:43 +00:00
Yeachan-Heo	244bdb78fd	docs(roadmap): add plugin degraded server reporting gap	2026-05-22 18:31:51 +00:00
Yeachan-Heo	1df41147e3	docs(roadmap): add mcp degraded missing tools gap	2026-05-22 18:01:37 +00:00
Yeachan-Heo	f50625a04b	docs(roadmap): add remote mcp unsupported runtime gap	2026-05-22 17:31:15 +00:00
Yeachan-Heo	106c243bd3	docs(roadmap): add managed proxy mcp auth gap	2026-05-22 17:01:25 +00:00
Yeachan-Heo	a93f36d0b9	docs(roadmap): add prompt piped stdin cap gap	2026-05-22 16:30:55 +00:00
Yeachan-Heo	a1728b2be8	docs(roadmap): add repl missing default timeout gap	2026-05-22 16:01:10 +00:00
Yeachan-Heo	9843ccfa28	docs(roadmap): add plugin registry sync race gap	2026-05-22 15:31:00 +00:00
Yeachan-Heo	8043090953	docs(roadmap): add replay armed stale error gap	2026-05-22 15:01:23 +00:00