docs(roadmap): add worker restart evidence scope gap

docs(roadmap): add permission file traversal gap
docs(roadmap): add table alignment render gap
2026-06-13 11:40:47 -04:00 · 2026-05-22 14:30:51 +00:00 · 2026-05-22 14:01:09 +00:00 · 2026-05-22 13:30:55 +00:00 · 2026-05-22 13:00:52 +00:00 · 2026-05-22 12:30:49 +00:00
1 changed files with 26 additions and 0 deletions
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -6695,3 +6695,29 @@ Original filing (2026-04-18): the session emitted `SessionStart hook (completed)
 570. **`read_file` binary detection scans only the first 8192 bytes for NUL, so text-prefixed binaries can pass the binary gate and fail later as generic UTF-8 read errors** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 05:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@a49d8e8`. Active tmux session at probe time: `gajae-issue-324-review-gate-degradation`; no active claw-code implementation session. Channel context from Jobdori pointed at `is_binary_file` line 30. Code inspection confirmed `rust/crates/runtime/src/file_ops.rs::is_binary_file` opens the path, reads exactly one 8192-byte chunk, and returns true only if that first chunk contains NUL. `read_file` then calls `fs::read_to_string` over the whole file. A file with an 8KB text header followed by NUL/non-UTF8 binary payload is therefore not rejected by the explicit binary gate; it reaches `read_to_string` and surfaces as a generic invalid UTF-8/io error instead of the stable `InvalidData: file appears to be binary` contract. Existing coverage writes NUL at byte 0 (`rejects_binary_files`) and does not cover delayed-NUL or delayed-non-UTF8 payloads. **Required fix shape:** (a) make binary/text validation cover the entire allowed read range (bounded by `MAX_READ_SIZE`) or stream decode as UTF-8 while detecting NUL/non-text bytes; (b) map any delayed NUL/non-UTF8 failure to the same stable binary-file error kind/message used by the early gate; (c) add regressions with NUL at byte 8192+, non-UTF8 after an ASCII prefix, and valid large UTF-8 text controls; (d) avoid loading oversized files by preserving the metadata size gate before full scan/decode; (e) include byte offset/evidence in diagnostics only if it does not leak file contents. **Why this matters:** `read_file` should fail predictably on binary artifacts. 
A small text header is common in mixed/generated files, and letting those bypass the binary gate creates inconsistent errors that look like brittle UTF-8/tool failures rather than an intentional binary-read refusal. Source: gaebal-gajae dogfood response to Clawhip message `1507254212403138672` on 2026-05-22.

 571. **`grep_search` silently drops unreadable or non-UTF8 files, so search results can look complete while binary/permission/encoding failures are hidden from the output contract** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 06:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@8632f90`. Active tmux session at probe time: `gajae-issue-326-backlog-zero-continuation`; no active claw-code implementation session. Channel context included Jobdori's glob-sort #576 finding, so this probe stayed in the adjacent file-ops search surface but chose a different contract gap. Code inspection: `rust/crates/runtime/src/file_ops.rs::grep_search_impl` iterates `collect_search_files`, applies workspace/filter checks, then does `let Ok(file_contents) = fs::read_to_string(&file_path) else { continue; };`. Any file that exists in the search set but cannot be decoded as UTF-8, is transiently unreadable, or hits another read error is silently skipped. The returned `GrepSearchOutput` has `num_files`, `filenames`, optional `content`, and `num_matches`, but no `skipped_files`, `read_errors`, `binary_files`, or `truncated_due_to_errors` field. This differs from `read_file`, which intentionally rejects binary files with a stable error, and it makes grep output appear authoritative even when a subset of candidate files was ignored. Existing grep tests cover happy-path content matches but not a directory containing one matching text file plus one unreadable/binary/non-UTF8 file. 
**Required fix shape:** (a) track skipped candidate files by reason (`binary_or_non_utf8`, `permission_denied`, `read_error`) without leaking file contents; (b) expose counts and optionally bounded path samples in `GrepSearchOutput`; (c) preserve best-effort search behavior if desired, but mark the result as partial/degraded when any candidate is skipped; (d) add regressions with delayed-non-UTF8/binary and permission-denied files proving the output reports skipped counts; (e) align binary/non-UTF8 classification with the `read_file` fix required by #570 so file tools share one text/binary contract. **Why this matters:** grep is an observability surface. If it silently ignores files, agents can conclude “no matches” or report an incomplete match set without any evidence that encoding or permission failures narrowed the search. Source: gaebal-gajae dogfood response to Clawhip message `1507269308277850223` on 2026-05-22.
+
+572. **`grep_search` accepts unknown `output_mode` strings as filename-mode success, so typos like `contents` or `json` silently change the result contract instead of failing fast** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 07:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@ef67aa9`. Active tmux session at probe time: `gajae-pr-327-backlog-zero-review`; no active claw-code implementation session. Channel context included Jobdori's adjacent grep duplicate-context #577 finding; this probe stayed in `grep_search_impl` but checked argument-contract handling. Code inspection: `rust/crates/runtime/src/file_ops.rs::grep_search_impl` sets `output_mode = input.output_mode.clone().unwrap_or_else(|| "files_with_matches")`, then special-cases only `"count"` and `"content"`. Any other string falls through to the default filename-list path and is echoed back as `mode: Some(output_mode.clone())`, with `content: None` and `num_matches: None`. That means misspellings such as `output_mode:"contents"`, `"files_with_match"`, or unsupported values like `"json"` return a successful-looking response whose `mode` claims the unsupported value while the payload shape is actually files-with-matches. There is no enum validation, no `InvalidInput`, and no test for unsupported output modes. **Required fix shape:** (a) validate `output_mode` against an explicit enum (`files_with_matches`, `content`, `count`) before reading files; (b) return a typed `InvalidInput`/machine-readable error listing supported values for unknown modes; (c) make the returned `mode` always match the actual payload semantics; (d) add regressions for `contents`, `files_with_match`, and valid `content`/`count`/default behavior; (e) keep this aligned with CLI `--output-format` enum validation gaps so search/tool contracts fail fast on typos. **Why this matters:** grep output shape drives model/tool parsing. 
A typo should not silently downgrade from content/count mode into filename mode while preserving the bogus mode label; that creates false negative evidence and parser confusion with no visible error. Source: gaebal-gajae dogfood response to Clawhip message `1507276861917368421` on 2026-05-22.
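
One way to read fix shape (a) through (c) is a closed enum parsed before any file is touched. The sketch below is illustrative only: `OutputMode`, `InvalidOutputMode`, and `OutputMode::parse` are hypothetical names, not identifiers from `file_ops.rs`, and the error type stands in for whatever `InvalidInput` representation the runtime uses.

```rust
use std::fmt;

/// Hypothetical closed set of grep output modes (values taken from the entry above).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OutputMode {
    FilesWithMatches,
    Content,
    Count,
}

/// Hypothetical typed error that lists the supported values.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidOutputMode {
    pub given: String,
    pub supported: &'static [&'static str],
}

const SUPPORTED: &[&str] = &["files_with_matches", "content", "count"];

impl fmt::Display for InvalidOutputMode {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "unsupported output_mode {:?}; supported values: {}",
            self.given,
            self.supported.join(", ")
        )
    }
}

impl OutputMode {
    /// A missing mode falls back to the default; unknown strings fail fast.
    pub fn parse(raw: Option<&str>) -> Result<Self, InvalidOutputMode> {
        match raw {
            None | Some("files_with_matches") => Ok(Self::FilesWithMatches),
            Some("content") => Ok(Self::Content),
            Some("count") => Ok(Self::Count),
            Some(other) => Err(InvalidOutputMode {
                given: other.to_string(),
                supported: SUPPORTED,
            }),
        }
    }

    /// The label echoed back in the response always matches the parsed variant.
    pub fn as_str(self) -> &'static str {
        match self {
            Self::FilesWithMatches => "files_with_matches",
            Self::Content => "content",
            Self::Count => "count",
        }
    }
}
```

Because the response label is derived from the parsed variant rather than from the caller's string, the returned `mode` cannot claim a value the payload does not implement.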
+
+573. **`grep_search` treats `head_limit:0` as unlimited, so callers cannot request an empty/page-metadata probe and may accidentally dump the full match set** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 07:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@eb12c3d`. Active tmux session at probe time: `gajae-pr-330-review-v2`; no active claw-code implementation session. Code inspection: `rust/crates/runtime/src/file_ops.rs::apply_limit` is used for grep filenames and content lines. It computes `explicit_limit = limit.unwrap_or(250)`, but if `explicit_limit == 0` it returns the full post-offset item vector with `appliedLimit: None` instead of truncating to zero or rejecting the value. Therefore a caller using `head_limit:0` as a common “no rows, just metadata/count/page preflight” convention gets every matching filename/content line after the offset, bypassing the default 250 cap and potentially injecting a large result into the model context. Existing grep tests pass `head_limit: Some(10)` and do not cover zero-limit semantics. This also makes `appliedLimit` misleading: an explicit limit was supplied, but the output reports no applied limit. **Required fix shape:** (a) define `head_limit` as a positive integer and reject zero with `InvalidInput`, or make zero return an empty result with `appliedLimit:0` consistently; (b) never let zero disable truncation unless a separately named `unlimited:true` escape hatch exists; (c) add regressions for `head_limit:0` in filename and content modes, positive limits, default 250 truncation, and offset-only behavior; (d) ensure `appliedLimit` reflects the caller-supplied limit when accepted; (e) document pagination semantics so wrappers do not accidentally turn metadata probes into full dumps. **Why this matters:** search pagination is a context-window safety control. 
A zero limit should be safe or invalid, not the one value that disables the cap and can flood the assistant with every match. Source: gaebal-gajae dogfood response to Clawhip message `1507284411995918427` on 2026-05-22.
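
A minimal sketch of the "reject zero" branch of fix shape (a), assuming a simplified signature. The real `apply_limit` in `file_ops.rs` may differ; `LimitError` and `DEFAULT_HEAD_LIMIT` are hypothetical names, and returning the applied limit on every call is a choice made for this sketch.

```rust
/// Hypothetical default cap, mirroring the 250 described above.
const DEFAULT_HEAD_LIMIT: usize = 250;

/// Hypothetical typed error for an unusable limit.
#[derive(Debug, PartialEq, Eq)]
pub enum LimitError {
    ZeroLimit,
}

/// Returns the requested page plus the limit that was actually applied.
/// Zero is rejected instead of being treated as "unlimited".
pub fn apply_limit<T>(
    items: Vec<T>,
    limit: Option<usize>,
    offset: Option<usize>,
) -> Result<(Vec<T>, usize), LimitError> {
    let applied = match limit {
        Some(0) => return Err(LimitError::ZeroLimit),
        Some(n) => n,
        None => DEFAULT_HEAD_LIMIT,
    };
    let page: Vec<T> = items
        .into_iter()
        .skip(offset.unwrap_or(0))
        .take(applied)
        .collect();
    Ok((page, applied))
}
```

The alternative in (a), returning an empty page with `appliedLimit:0`, would replace the `Some(0)` arm with an early `Ok((Vec::new(), 0))`. Either way no input value disables the cap.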
+
+574. **Kimi compatibility helpers only strip the `kimi/` routing prefix, so documented `dashscope/kimi-*` and `moonshot/kimi-*` slugs can leak provider prefixes onto the wire** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 09:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@c1bb355`. Active tmux session at probe time: `omx-issue-2462-madmax-lock-diagnostic`; no active claw-code implementation session. Channel context included Jobdori's reasoning-history #581 finding in `openai_compat.rs`; this probe inspected the same provider file for another Kimi/OpenAI-compatible model-routing contract gap. Code inspection: `model_rejects_is_error_field` intentionally recognizes `dashscope/kimi-k2.5` and `moonshot/kimi-k2.5` by stripping any prefix with `rsplit('/')`, and tests assert those slugs reject `is_error`. But `wire_model_for_base_url` / `strip_routing_prefix` only strip prefixes matching `openai|xai|grok|qwen|kimi`; `dashscope` and `moonshot` are not included. Therefore a request model like `dashscope/kimi-k2.5` can be treated as Kimi for tool-result compatibility while the serialized `model` sent to a DashScope/Moonshot-compatible endpoint remains `dashscope/kimi-k2.5` instead of the expected `kimi-k2.5`. Existing tests cover `strip_routing_prefix("kimi/kimi-k2.5")` but not `dashscope/kimi-k2.5` or `moonshot/kimi-k2.5`, despite those exact prefixes being listed in Kimi compatibility tests. 
**Required fix shape:** (a) decide the supported routing prefixes for Kimi/DashScope/Moonshot and use one shared prefix-strip helper for compatibility checks and wire model serialization; (b) add `dashscope` and `moonshot` where appropriate, or reject those prefixed slugs early with a typed configuration error; (c) add tests proving `wire_model_for_base_url` and `strip_routing_prefix` produce `kimi-k2.5` for every documented Kimi prefix; (d) keep OpenRouter/non-default OpenAI base-url slash-slug preservation semantics intact; (e) update docs/config examples so model slugs and provider routing prefixes are unambiguous. **Why this matters:** provider compatibility logic and wire serialization must agree on the model identity. If one path says “this is Kimi” while another sends a prefixed slug the backend may not understand, users get avoidable 400/model-not-found errors that look like provider instability. Source: gaebal-gajae dogfood response to Clawhip message `1507307060998443059` on 2026-05-22.
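
Fix shape (a) and (b) could look roughly like the following, where one helper feeds both the compatibility check and the wire model. This is a sketch under assumptions: the prefix list with `dashscope` and `moonshot` added is the proposal, not the current code, and `is_kimi_model` is a hypothetical stand-in for the Kimi detection inside `model_rejects_is_error_field`.

```rust
/// Hypothetical shared prefix list; `dashscope` and `moonshot` are the proposed additions.
const ROUTING_PREFIXES: &[&str] = &[
    "openai", "xai", "grok", "qwen", "kimi", "dashscope", "moonshot",
];

/// Strips a leading `prefix/` only when the prefix is a known routing prefix.
/// Unknown slash-slugs (for example OpenRouter-style `org/model`) pass through unchanged.
pub fn strip_routing_prefix(model: &str) -> &str {
    match model.split_once('/') {
        Some((prefix, rest)) if ROUTING_PREFIXES.contains(&prefix) => rest,
        _ => model,
    }
}

/// Hypothetical compatibility check built on the same helper, so the
/// "is this Kimi?" decision and the serialized model cannot disagree.
pub fn is_kimi_model(model: &str) -> bool {
    strip_routing_prefix(model)
        .to_ascii_lowercase()
        .starts_with("kimi")
}
```

With a single helper, any slug that is treated as Kimi for tool-result compatibility is also the slug whose prefix is removed before serialization.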
+
+575. **`grep_search` walks `.git`, `node_modules`, `target`, and other heavy directories even though `glob_search` skips them, so grep can waste startup/context time scanning generated/vendor files by default** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 10:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@819f67b`. Active tmux sessions at probe time: none. Code inspection: `rust/crates/runtime/src/file_ops.rs` defines `GLOB_SEARCH_IGNORED_DIRS` (`.git`, `node_modules`, `.build`, `target`, `dist`, `coverage`) and `glob_search_impl` applies it via `WalkDir::filter_entry(|entry| !should_skip_glob_dir(entry))`. But `grep_search_impl` calls `collect_search_files(&base_path)`, and `collect_search_files` uses raw `WalkDir::new(base_path)` with no `filter_entry` or ignore list. As a result, a default grep over a repo can descend through `.git/objects`, Rust `target/`, vendored `node_modules`, coverage, and dist outputs, then silently skip unreadable/non-UTF8 files (#571) or spend time decoding generated blobs before any model-visible answer. Existing tests prove `glob_search_skips_common_heavy_directories`, but no equivalent grep test exists. **Required fix shape:** (a) make `collect_search_files` share the same ignored-directory policy as `glob_search`, or define an explicit grep ignore policy with opt-in overrides; (b) add skipped/ignored directory counts to grep output so operators know scope was pruned; (c) add regressions where `.git`/`target` contain matching files but default grep ignores them, while direct path-to-file behavior remains explicit; (d) allow explicit include/override if users really want generated/vendor search; (e) align docs/tool schema so glob and grep default search scope semantics match. **Why this matters:** grep is a high-frequency dogfood/search tool. 
Scanning generated output and VCS internals by default creates startup friction, noisy false matches, and token waste, especially in Rust/JS workspaces with huge `target` or `node_modules` trees. Source: gaebal-gajae dogfood response to Clawhip message `1507322157561151671` on 2026-05-22.
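
The shared ignore policy from fix shape (a) and the pruned-scope count from (b) can be illustrated with a pure path predicate. The directory list is copied from the entry above; `is_under_ignored_dir` and `prune_candidates` are hypothetical helpers, and this sketch filters an already collected list rather than pruning during the walk.

```rust
use std::path::{Component, Path, PathBuf};

/// Directory names taken from the `GLOB_SEARCH_IGNORED_DIRS` description above.
const IGNORED_DIRS: &[&str] = &[".git", "node_modules", ".build", "target", "dist", "coverage"];

/// True when any parent directory of `relative` is on the ignore list.
/// Only parent components are checked, so a plain file named `target` is kept.
pub fn is_under_ignored_dir(relative: &Path) -> bool {
    relative.parent().map_or(false, |parent| {
        parent.components().any(|component| match component {
            Component::Normal(name) => name
                .to_str()
                .map_or(false, |n| IGNORED_DIRS.contains(&n)),
            _ => false,
        })
    })
}

/// Splits candidates into kept paths and a count of ignored ones,
/// so the output can report that scope was pruned.
pub fn prune_candidates(paths: Vec<PathBuf>) -> (Vec<PathBuf>, usize) {
    let total = paths.len();
    let kept: Vec<PathBuf> = paths
        .into_iter()
        .filter(|p| !is_under_ignored_dir(p))
        .collect();
    let ignored = total - kept.len();
    (kept, ignored)
}
```

In a real walker the same predicate would be applied while descending, as `glob_search_impl` does with `filter_entry`, so ignored trees are never read at all.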
+
+576. **`glob_search` brace expansion has no fan-out cap, so a single pattern with large brace groups can explode into thousands of `WalkDir` traversals before any timeout/result limit applies** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 10:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@ab5754a`. Active tmux sessions at probe time: none. Code inspection: `rust/crates/runtime/src/file_ops.rs::expand_braces` recursively expands the first `{...}` group by splitting on every comma and calling itself on each alternative. `glob_search_impl` then iterates every expanded pattern and creates a separate `WalkDir` traversal for each one before applying the hard `take(100)` result cap. There is no maximum number of alternatives, no recursion-depth/expanded-pattern cap, and no early deduplication by shared walk root. A user/model-supplied pattern like `**/*.{a,b,c,...hundreds}` or multiple brace groups can produce hundreds/thousands of full repo walks even though only 100 filenames can be returned. Existing tests cover a tiny `*.{rs,toml}` happy path and unmatched braces, but not expansion fan-out or timeout behavior. **Required fix shape:** (a) impose a small maximum expanded-pattern count and return `InvalidInput`/typed error when exceeded; (b) deduplicate shared walk roots or compile multiple patterns per root so alternatives do not trigger repeated full-tree walks; (c) include `expanded_pattern_count` and truncation/fanout metadata in diagnostics; (d) add regressions for large single brace groups, multiple nested brace groups, and normal small brace patterns; (e) keep the final 100-result cap but do not rely on it as protection against pre-result traversal explosions. **Why this matters:** glob is a model-facing discovery tool. 
Unbounded brace fan-out turns a compact pattern into many expensive filesystem walks, creating startup friction and an easy self-inflicted DoS before the existing result limit can help. Source: gaebal-gajae dogfood response to Clawhip message `1507329710257078353` on 2026-05-22.
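
A capped expansion in the spirit of fix shape (a) might look like the sketch below. It is not the existing `expand_braces`: the function name, the error type, and the iterative worklist are assumptions, and like the behavior described above it only handles flat (non-nested) groups.

```rust
/// Hypothetical typed error returned when expansion exceeds the cap.
#[derive(Debug, PartialEq, Eq)]
pub enum ExpandError {
    TooManyPatterns { limit: usize },
}

/// Expands flat `{a,b,c}` groups left to right, failing as soon as the number of
/// finished plus pending patterns exceeds `limit`. Unmatched braces are left as-is.
pub fn expand_braces_capped(pattern: &str, limit: usize) -> Result<Vec<String>, ExpandError> {
    let mut done: Vec<String> = Vec::new();
    let mut pending: Vec<String> = vec![pattern.to_string()];

    while let Some(current) = pending.pop() {
        // Locate the first `{` and the first `}` after it, as byte offsets.
        let group = current
            .find('{')
            .and_then(|open| current[open..].find('}').map(|rel| (open, open + rel)));

        match group {
            Some((open, close)) => {
                let prefix = &current[..open];
                let suffix = &current[close + 1..];
                // Push in reverse so alternatives are popped in source order.
                for alternative in current[open + 1..close].split(',').rev() {
                    pending.push(format!("{prefix}{alternative}{suffix}"));
                }
            }
            None => done.push(current),
        }

        if done.len() + pending.len() > limit {
            return Err(ExpandError::TooManyPatterns { limit });
        }
    }
    Ok(done)
}
```

The cap is checked after every step, so a pattern with several groups is rejected long before all combinations are materialized, and before any directory walk starts.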
+
+577. **`WebFetch` upgrades only the initial HTTP URL, but follows redirects without revalidating the final target, so HTTPS pages can redirect the tool into localhost/private HTTP endpoints** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 11:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@571b4c2`. Active tmux sessions at probe time: none. Code inspection: `rust/crates/tools/src/lib.rs::normalize_fetch_url` upgrades an initial `http://` URL to HTTPS unless the host is exactly `localhost`, `127.0.0.1`, or `::1`. But `build_http_client` enables `redirect(reqwest::redirect::Policy::limited(10))`, and `execute_web_fetch` sends the request once, then trusts `response.url()` as the final URL. There is no redirect policy that re-runs scheme/host/IP checks for each hop, no block for redirects from public HTTPS to `http://localhost`, `http://127.0.0.1`, RFC1918/link-local/metadata IPs, or non-HTTPS final URLs. As a result, a model-requested public URL can legally redirect the tool into local/private network resources even though direct non-local HTTP is upgraded and no SSRF boundary is documented. **Required fix shape:** (a) implement a custom redirect policy that validates every `Location` target before following it; (b) reject redirects to localhost, loopback, link-local, RFC1918/private ranges, cloud metadata hosts/IPs, and/or downgrade-to-HTTP unless explicitly allowed; (c) resolve hostnames before fetch or after redirect to prevent DNS-based private IP hops; (d) add tests with public HTTPS -> localhost/private/metadata redirects and safe HTTPS->HTTPS redirects; (e) expose final URL plus redirect-block reason in the `WebFetch` output/error without leaking response bodies. **Why this matters:** WebFetch is an external content tool. 
Redirects are part of the fetch boundary, not a postscript; without per-hop validation, a harmless-looking public URL can become an internal network probe or local service read. Source: gaebal-gajae dogfood response to Clawhip message `1507337256107642961` on 2026-05-22.
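
The address checks in fix shape (b) can be sketched with the standard library alone. This covers only the IP classification step; `BlockReason` and `classify_blocked_ip` are hypothetical names, and wiring the check into a redirect policy and into DNS resolution is not shown.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

/// Hypothetical reason codes for refusing a fetch or redirect target.
#[derive(Debug, PartialEq, Eq)]
pub enum BlockReason {
    Loopback,
    Private,
    LinkLocal,
    Unspecified,
}

fn classify_v4(ip: Ipv4Addr) -> Option<BlockReason> {
    if ip.is_loopback() {
        Some(BlockReason::Loopback)
    } else if ip.is_private() {
        Some(BlockReason::Private) // 10/8, 172.16/12, 192.168/16
    } else if ip.is_link_local() {
        Some(BlockReason::LinkLocal) // 169.254/16, includes 169.254.169.254
    } else if ip.is_unspecified() {
        Some(BlockReason::Unspecified)
    } else {
        None
    }
}

fn classify_v6(ip: Ipv6Addr) -> Option<BlockReason> {
    let first = ip.segments()[0];
    if ip.is_loopback() {
        Some(BlockReason::Loopback)
    } else if ip.is_unspecified() {
        Some(BlockReason::Unspecified)
    } else if (first & 0xfe00) == 0xfc00 {
        Some(BlockReason::Private) // fc00::/7 unique local
    } else if (first & 0xffc0) == 0xfe80 {
        Some(BlockReason::LinkLocal) // fe80::/10
    } else {
        None
    }
}

/// Returns a reason when the address must not be fetched, `None` when it looks public.
/// IPv4-mapped IPv6 addresses are unwrapped so `::ffff:10.0.0.1` cannot bypass the v4 rules.
pub fn classify_blocked_ip(ip: IpAddr) -> Option<BlockReason> {
    match ip {
        IpAddr::V4(v4) => classify_v4(v4),
        IpAddr::V6(v6) => match v6.to_ipv4_mapped() {
            Some(v4) => classify_v4(v4),
            None => classify_v6(v6),
        },
    }
}
```

A check like this would need to run on every hop and on the resolved addresses of each hostname, not only on literal IPs in the URL, to address (a) and (c).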
+
+578. **`WebSearch` accepts `CLAWD_WEB_SEARCH_BASE_URL` with any scheme/host and follows the shared redirect policy, so a local environment variable can turn search into a private-network fetch without guardrails** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 11:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@bff67c2`. Active tmux session at probe time: `gajae-issue-339-backlog-zero-candidate-selection`; no active claw-code implementation session. Code inspection: `rust/crates/tools/src/lib.rs::build_search_url` checks `CLAWD_WEB_SEARCH_BASE_URL`, parses it with `reqwest::Url::parse`, appends `q`, and returns it unchanged. Unlike `WebFetch`, there is no `normalize_fetch_url`-style scheme upgrade or host validation at all. `execute_web_search` then uses the same `build_http_client` as WebFetch, which follows up to 10 redirects. A poisoned or stale local env var can point search at `http://localhost`, RFC1918 services, metadata endpoints, or a redirector to those targets; the tool will fetch and parse generic links before domain allow/block filters are applied to extracted result URLs, not to the search endpoint itself. Existing tests/logic focus on hit filtering, not search-provider endpoint validation. **Required fix shape:** (a) validate `CLAWD_WEB_SEARCH_BASE_URL` at parse time against allowed schemes/hosts or require an explicit unsafe/local-test opt-in; (b) apply the same per-redirect target validation required by #577 to WebSearch; (c) distinguish configured test search providers from production search with a typed config/source field in output; (d) add regressions for env base URLs pointing to localhost/private/metadata and safe HTTPS test endpoints; (e) document the env var as test-only or constrain it to trusted public search domains. **Why this matters:** search is often treated as a low-risk public-web tool. 
An unvalidated base URL makes it an environment-controlled internal fetch surface, and result-domain filters do not protect the initial request or redirects. Source: gaebal-gajae dogfood response to Clawhip message `1507344805167108257` on 2026-05-22.
+
+579. **Streaming flushes pending markdown at `ContentBlockStop` before pending tool calls are rendered, so text immediately followed by a tool call can be displayed before the tool-use event that actually interrupted/ended the content block** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 12:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@d8864ff`. Active tmux sessions at probe time: none. Channel context included Jobdori's #587 finding that `MarkdownStreamState::push` can hold prose until flush; this probe inspected the caller ordering. Code inspection: in `rust/crates/rusty-claude-cli/src/main.rs` stream handling, `ContentBlockDelta::TextDelta` buffers text through `markdown_stream.push`. On `ApiStreamEvent::ContentBlockStop`, the code first calls `markdown_stream.flush(&renderer)` and writes any pending prose, then only afterward checks `pending_tool.take()` and renders `format_tool_call_start`. If a provider emits a text block that has no safe boundary (single paragraph or unclosed markdown) followed by a tool-use block, the held text is flushed at the stop event just before the pending tool call display. Operators watching the terminal can see the assistant's prose burst immediately before the tool start marker, even though the next actionable event is the tool call; any timing/ordering cue that the model paused to call a tool is blurred by the delayed text flush. There is no test asserting streamed text/tool display ordering when markdown buffering holds content until block stop. 
**Required fix shape:** (a) make stream rendering preserve block/event order explicitly, perhaps flushing text at the exact text block stop and rendering tool-use starts at their own block starts/stops with clear separators; (b) when a text block is followed by a tool block, include a newline/phase boundary so delayed text does not visually merge into the tool call; (c) add streaming tests with single-paragraph text immediately followed by a tool call and with safe-boundary text, asserting terminal output order and separators; (d) consider the #587 word-boundary fallback so prose is not entirely held until the tool boundary; (e) keep persisted `AssistantEvent` ordering aligned with displayed output. **Why this matters:** streaming UI is also an event log. If buffered prose appears only when the next block stops and visually collides with a tool-use marker, users cannot tell whether the model is still speaking, has switched to tool execution, or the stream stalled. Source: gaebal-gajae dogfood response to Clawhip message `1507352360379482193` on 2026-05-22.
+
+580. **Streaming safe-boundary detection accepts a triple-backtick line indented up to three spaces as a fence opener but will not close the fence if the closing marker is indented more than three spaces, so quoted/list code blocks can freeze streaming until final flush** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 12:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@9f762b2`. Active tmux sessions at probe time: `gajae-pr-340-backlog-zero-candidate-selection-final-review`, `gajae-pr-340-backlog-zero-rereview2`; no active claw-code implementation session. Code inspection: `rust/crates/rusty-claude-cli/src/render.rs::parse_fence_opener` counts only literal spaces and accepts fence openers with indent <=3. Once inside a fence, `line_closes_fence` also requires the closing marker indent <=3. That matches CommonMark top-level fenced blocks, but streaming markdown from models often nests code fences under bullets, block quotes, or copied indentation where both opener and closer are indented four or more spaces. When both are indented that far the opener is simply ignored, which is harmless. The failure case is mixed/normalized output that opens at <=3 spaces and then renders the closer with extra list/quote indentation: the fence never closes, `open_fence` remains set, `last_boundary` stops updating, and `MarkdownStreamState::push` returns `None` until `flush()`. Existing tests cover nested fence marker lengths and tilde/backtick distinction, but not list/blockquote/indented fence streaming behavior or a mismatched indentation close. 
**Required fix shape:** (a) decide whether the stream boundary detector should follow strict CommonMark or be tolerant for model-generated/list-nested fences; (b) if tolerant, close fences when the same marker appears after quote/list prefixes or consistent extra indentation, while still avoiding false closes inside literal code; (c) add tests for bullet-nested fences, blockquote fences, and an opener at indent <=3 with a closer at indent >3; (d) include a max-buffer/word-boundary fallback from #587 so a malformed fence cannot suppress all output indefinitely; (e) keep rendering tests aligned with boundary tests so visual output and streaming segmentation share one markdown dialect. **Why this matters:** model answers frequently include code inside bullets or quoted explanations. If the stream boundary tracker misses the closing fence, the terminal looks stalled even though deltas are arriving, turning a formatting edge case into startup/streaming opacity. Source: gaebal-gajae dogfood response to Clawhip message `1507359904959172758` on 2026-05-22.
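
If the tolerant option in (b) is chosen, the closing check could be sketched as follows. This is one possible reading, not the existing `line_closes_fence`: the function name and the rule "skip spaces, tabs, and `>` before the marker" are assumptions, and the sketch accepts the trade-off that an indented marker line inside literal code would also close the fence.

```rust
/// Drops leading spaces, tabs, and block-quote markers (`>`).
/// Hypothetical tolerant prefix rule; list bullets are assumed to appear only on the opener line.
fn strip_container_prefix(line: &str) -> &str {
    line.trim_start_matches(|c: char| c == ' ' || c == '\t' || c == '>')
}

/// True when `line` closes a fence opened with `opener_len` copies of `marker`,
/// regardless of how deeply the closing line is indented or quoted.
pub fn line_closes_fence_tolerant(line: &str, marker: char, opener_len: usize) -> bool {
    let rest = strip_container_prefix(line);
    // Count the run of marker characters at the start of the remaining text.
    let run = rest.chars().take_while(|&c| c == marker).count();
    // A closer needs at least as many markers as the opener and nothing but whitespace after.
    run >= opener_len && rest.chars().skip(run).all(char::is_whitespace)
}
```

Pairing this with the max-buffer fallback in (d) would bound the damage in either direction: a missed close cannot stall output indefinitely, and a false close only affects where the stream is segmented.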
+
+581. **Terminal markdown renderer drops styling inside link labels because link text is accumulated as raw text while emphasis/inline-code events inside links are flattened before final rendering** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 13:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@8c4b33d`. Active tmux sessions at probe time: `gajae-issue-341-post-merge-cleanup-retention`, `omc-issue-3087-windows-hud-execfile`; no active claw-code implementation session. Code inspection: `rust/crates/rusty-claude-cli/src/render.rs::RenderState::append_raw` diverts all text into `link_stack.last_mut().text` whenever a link is open. `Event::Start(Tag::Emphasis)`, `Strong`, and inline `Code` still update global style/rendering, but when the current context is a link, the rendered text is appended to `LinkState.text` as plain accumulated label text. At `Event::End(TagEnd::Link)`, the renderer emits one uniformly underlined blue `[label](destination)` string. This means markdown like `[**important** docs](https://...)`, `[*emphasis* link](...)`, or ``[`code` API](...)`` loses bold/italic/inline-code styling inside the label, and ANSI escape sequences from nested inline code can be embedded into the label string before the final link styling in ways not covered by tests. Existing link coverage only asserts a simple `[Claw](url)` label. **Required fix shape:** (a) decide whether link labels should preserve nested inline styles or intentionally flatten them; (b) if preserving, store rendered segments or nested style spans in `LinkState` instead of one raw label string; (c) if flattening, strip ANSI/control styling from accumulated labels before final link render and document the limitation; (d) add tests for bold/emphasis/code inside links and nested links/images where the parser permits them; (e) ensure streaming chunk boundaries do not split ANSI state inside a link label. 
**Why this matters:** terminal rendering is the user-facing event log. Links often carry emphasized API names or inline-code identifiers; flattening or double-styling labels makes important documentation output less readable and can leak nested ANSI into one final styled link span. Source: gaebal-gajae dogfood response to Clawhip message `1507367459689201715` on 2026-05-22.
+
+582. **Terminal table renderer ignores markdown alignment markers, so right/center-aligned numeric columns are rendered left-aligned with no test coverage** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 13:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@ccb319a`. Active tmux session at probe time: `omx-issue-2466-ultragoal-get-goal-recovery-resume`; no active claw-code implementation session. Code inspection: `rust/crates/rusty-claude-cli/src/render.rs` starts tables with `Event::Start(Tag::Table(..))` but discards the alignment vector carried by pulldown-cmark. `TableState` stores headers/rows/current cells only, and `render_table_row` always writes the cell text followed by padding spaces (`left` alignment) regardless of whether the markdown separator uses `:---`, `---:`, or `:---:`. Existing test `renders_tables_with_alignment` checks column width alignment only, using `| ---- | ----- |`, and does not cover right/center alignment semantics. **Required fix shape:** (a) store pulldown-cmark table alignment metadata in `TableState`; (b) apply left/center/right padding in `render_table_row` for headers and body cells; (c) add tests for `| left | center | right |` with `| :--- | :---: | ---: |`, including numeric columns; (d) ensure `visible_width` still handles ANSI-styled header cells correctly when padding is split before/after centered text; (e) document whether terminal rendering intentionally supports GitHub-style table alignment. **Why this matters:** tables are common in status reports, benchmarks, and cost/test summaries. Dropping right alignment makes numeric columns harder to scan and means the terminal renderer does not faithfully reflect markdown that users expect from GitHub/Discord-style output. Source: gaebal-gajae dogfood response to Clawhip message `1507375004671672391` on 2026-05-22.
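
The padding logic behind fix shape (b) and (d) is small enough to sketch directly. `Align` and `pad_cell` are hypothetical names; the visible width is passed in separately on the assumption that the caller already computes it with something like `visible_width`, so ANSI escape bytes in `text` do not skew the padding.

```rust
/// Hypothetical column alignment; a separator without colons would map to `Left`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Align {
    Left,
    Center,
    Right,
}

/// Pads `text` to `column_width` terminal cells according to `align`.
/// `visible_width` is the on-screen width of `text`, excluding any ANSI escapes.
pub fn pad_cell(text: &str, visible_width: usize, column_width: usize, align: Align) -> String {
    let gap = column_width.saturating_sub(visible_width);
    let (before, after) = match align {
        Align::Left => (0, gap),
        Align::Right => (gap, 0),
        // Odd gaps put the extra space on the right.
        Align::Center => (gap / 2, gap - gap / 2),
    };
    format!("{}{}{}", " ".repeat(before), text, " ".repeat(after))
}
```

For example, `pad_cell("42", 2, 6, Align::Right)` yields `"    42"`, which is the behavior a `---:` column would need.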
+
+583. **`PermissionEnforcer::check_file_write` uses string-prefix workspace checks, so relative traversal like `../../outside` is allowed as if it were inside the workspace** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 14:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@aa9efe5`. Active tmux session at probe time: `omx-issue-2468-autoresearch-docs-css`; no active claw-code implementation session. Channel context included Jobdori's #591 `Prompt`-mode bypass in `permission_enforcer.rs`, so this probe inspected the adjacent file-write boundary helper. Code inspection: `rust/crates/runtime/src/permission_enforcer.rs::is_within_workspace` treats relative paths by string-concatenating `format!("{workspace_root}/{path}")`, then checks `normalized.starts_with(root) || normalized == workspace_root.trim_end_matches('/')`. It never normalizes `.`/`..`, canonicalizes symlinks, or resolves the candidate path. Therefore `check_file_write("../../outside/secret.txt", "/workspace")` builds `/workspace/../../outside/secret.txt`, which still starts with `/workspace/`, and returns `Allowed` in `WorkspaceWrite` mode even though the resolved target is outside the workspace. Existing tests cover absolute outside paths and simple relative `src/main.rs`, but not relative traversal escapes. 
**Required fix shape:** (a) replace string-prefix checks with path normalization/canonicalization against a workspace root, resolving `.`/`..` before comparison; (b) reject escaping relative paths even if the file does not exist yet, using lexical normalization plus parent canonicalization where needed; (c) add regressions for `../outside.txt`, `../../outside/secret.txt`, `src/../safe.txt`, and symlink escape cases; (d) share the same boundary primitive with runtime file ops and bash path-scope checks so permission enforcement cannot drift; (e) include resolved/candidate path evidence in denial messages without leaking sensitive contents. **Why this matters:** workspace-write mode's core promise is “project files only.” A string-prefix check lets a caller smuggle outside writes through relative traversal while the permission enforcer reports success, defeating the boundary before lower-level file tools can help. Source: gaebal-gajae dogfood response to Clawhip message `1507382558340681779` on 2026-05-22.
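
The lexical half of fix shape (a) and (b) can be sketched with `std::path` alone. This is an illustration under assumptions, not the existing helper: `normalize_lexically` is a hypothetical name, the signature of `is_within_workspace` is simplified, and symlink escapes still require canonicalizing the existing parent directory, which is not shown.

```rust
use std::path::{Component, Path, PathBuf};

/// Resolves `.` and `..` without touching the filesystem.
/// Returns `None` when `..` would climb above the start of the path.
fn normalize_lexically(path: &Path) -> Option<PathBuf> {
    let mut out = PathBuf::new();
    for component in path.components() {
        match component {
            Component::CurDir => {}
            Component::ParentDir => {
                // `pop` returns false when there is no parent left to remove.
                if !out.pop() {
                    return None;
                }
            }
            other => out.push(other.as_os_str()),
        }
    }
    Some(out)
}

/// True only when the candidate, resolved against the workspace root,
/// stays inside the root. Works for paths that do not exist yet.
pub fn is_within_workspace(candidate: &str, workspace_root: &str) -> bool {
    let root = match normalize_lexically(Path::new(workspace_root)) {
        Some(root) => root,
        None => return false,
    };
    // `join` replaces the base when `candidate` is absolute, which is the desired behavior.
    match normalize_lexically(&root.join(candidate)) {
        // `Path::starts_with` compares whole components, so `/workspace-evil`
        // does not count as being inside `/workspace`.
        Some(resolved) => resolved.starts_with(&root),
        None => false,
    }
}
```

Comparing components rather than strings also removes the sibling-directory false positive that a raw string prefix check would allow.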
+
+584. **`WorkerRegistry::restart` clears prompt/trust state but leaves old events and the original `created_at`, so post-restart timeout evidence can mix prior-attempt blockers with the new boot attempt** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 14:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@ac09033`. Active tmux sessions at probe time: none. Channel context included Jobdori's worker-boot #592 finding, so this probe stayed in `worker_boot.rs` but checked restart lifecycle continuity. Code inspection: `rust/crates/runtime/src/worker_boot.rs::restart` resets `status`, trust flags, prompt fields, `last_error`, attempts, and `prompt_in_flight`, then appends `WorkerEventKind::Restarted`. It does not reset `worker.created_at`, does not create a per-attempt `boot_started_at`, does not increment a restart/attempt counter, and does not clear or partition previous events. Later `observe_startup_timeout` computes `elapsed = now.saturating_sub(worker.created_at)` and derives `trust_prompt_detected`, `tool_permission_detected`, and `ready_for_prompt_detected` by scanning the entire `worker.events` history. A worker that previously hit trust/tool/ready evidence, then restarts cleanly, can have the new timeout classified with old pre-restart evidence and an elapsed time anchored to the original creation. Existing `restart_and_terminate_reset_or_finish_worker` only asserts prompt fields/attempts reset; it does not assert event scoping or elapsed-time reset. 
**Required fix shape:** (a) add `current_attempt_started_at`/`boot_started_at` and `attempt_index` fields; (b) set them on create and on restart, and use them as the anchor for the startup-timeout `elapsed` and `command_started_at` values; (c) scope timeout evidence scans to events recorded since the current attempt began, or store per-attempt cached evidence; (d) add tests where trust/tool evidence exists before restart but not after, proving the post-restart timeout does not inherit stale blockers; (e) include the attempt index in worker events so dashboards can separate old and new boot attempts. **Why this matters:** restart is supposed to produce a fresh boot attempt. If old events and timestamps remain authoritative, operators see misleading “stalled for hours / trust required” evidence for a brand-new restart, and recovery automation can choose the wrong next action. Source: gaebal-gajae dogfood response to Clawhip message `1507390103956226228` on 2026-05-22.
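
The per-attempt scoping in (a) through (e) could look roughly like the sketch below. The `Worker`, `Event`, `EventKind`, and `TimeoutEvidence` types are simplified stand-ins invented for illustration, not the real definitions in `worker_boot.rs`. Only the field and method names quoted in the entry above (`created_at`, `boot_started_at`, `attempt_index`, `observe_startup_timeout`, the three `*_detected` flags) come from the source.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EventKind {
    TrustRequired,
    ToolPermissionRequired,
    ReadyForPrompt,
    Restarted,
}

#[allow(dead_code)]
#[derive(Debug, Clone)]
struct Event {
    kind: EventKind,
    timestamp: u64,
    // Tagging each event with its attempt lets scans and dashboards
    // separate old boot attempts from the current one.
    attempt_index: u32,
}

#[allow(dead_code)]
#[derive(Debug)]
struct Worker {
    created_at: u64,      // never reset; identifies the worker's lifetime
    boot_started_at: u64, // reset on every restart
    attempt_index: u32,
    events: Vec<Event>,
}

#[derive(Debug, PartialEq, Eq)]
struct TimeoutEvidence {
    elapsed: u64,
    trust_prompt_detected: bool,
    tool_permission_detected: bool,
    ready_for_prompt_detected: bool,
}

impl Worker {
    fn new(now: u64) -> Self {
        Worker { created_at: now, boot_started_at: now, attempt_index: 0, events: Vec::new() }
    }

    fn record(&mut self, kind: EventKind, now: u64) {
        let attempt_index = self.attempt_index;
        self.events.push(Event { kind, timestamp: now, attempt_index });
    }

    fn restart(&mut self, now: u64) {
        // History is kept, but a new attempt begins with its own clock.
        self.attempt_index += 1;
        self.boot_started_at = now;
        self.record(EventKind::Restarted, now);
    }

    fn observe_startup_timeout(&self, now: u64) -> TimeoutEvidence {
        // Only events from the current attempt count as evidence.
        let seen = |kind: EventKind| {
            self.events
                .iter()
                .any(|e| e.attempt_index == self.attempt_index && e.kind == kind)
        };
        TimeoutEvidence {
            elapsed: now.saturating_sub(self.boot_started_at),
            trust_prompt_detected: seen(EventKind::TrustRequired),
            tool_permission_detected: seen(EventKind::ToolPermissionRequired),
            ready_for_prompt_detected: seen(EventKind::ReadyForPrompt),
        }
    }
}
```

Keeping the full event history and filtering by `attempt_index` is one option. Clearing or archiving events on restart would also satisfy (c), at the cost of losing cross-attempt history in the live worker record.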
Author	SHA1	Message	Date
Yeachan-Heo	b0bca2ea4f	docs(roadmap): add worker restart evidence scope gap	2026-05-22 14:30:51 +00:00
Yeachan-Heo	ac0903362c	docs(roadmap): add permission file traversal gap	2026-05-22 14:01:09 +00:00
Yeachan-Heo	aa9efe5474	docs(roadmap): add table alignment render gap	2026-05-22 13:30:55 +00:00
Yeachan-Heo	ccb319a3dc	docs(roadmap): add link label styling gap	2026-05-22 13:00:52 +00:00
Yeachan-Heo	8c4b33d9b1	docs(roadmap): add indented fence streaming gap	2026-05-22 12:30:49 +00:00
Yeachan-Heo	9f762b26fa	docs(roadmap): add stream text-tool ordering gap	2026-05-22 12:00:54 +00:00
Yeachan-Heo	d8864ff151	docs(roadmap): add websearch base url boundary gap	2026-05-22 11:30:40 +00:00
Yeachan-Heo	bff67c2b24	docs(roadmap): add webfetch redirect boundary gap	2026-05-22 11:00:48 +00:00
Yeachan-Heo	571b4c2cd1	docs(roadmap): add glob brace fanout gap	2026-05-22 10:30:50 +00:00
Yeachan-Heo	ab5754a524	docs(roadmap): add grep heavy-dir traversal gap	2026-05-22 10:00:54 +00:00
Yeachan-Heo	819f67b01b	docs(roadmap): add kimi prefix wire-model gap	2026-05-22 09:01:22 +00:00
Yeachan-Heo	c1bb355691	docs(roadmap): add grep zero limit gap	2026-05-22 07:30:46 +00:00
Yeachan-Heo	eb12c3d1ef	docs(roadmap): add grep output mode validation gap	2026-05-22 07:00:48 +00:00