docs(roadmap): add #470 — reasoning-effort accepted by diagnostics but invisible; invalid parser kind unknown

2026-06-28 13:28:39 -04:00 · 2026-05-24 19:32:06 +00:00
1 changed files with 23 additions and 34 deletions
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -6429,49 +6429,38 @@ Original filing (2026-04-18): the session emitted `SessionStart hook (completed)

 450. **`prompt` emits `kind:"missing_credentials"` JSON on STDERR (not stdout), leaving stdout at 0 bytes — automation pattern `output=$(claw prompt hello --output-format json)` captures nothing on auth-absent failure; `doctor` correctly surfaces `auth.status:"warn"` with `api_key_present:false` but exposes no `prompt_ready:false` field that automation can check before invoking `prompt`** — dogfooded 2026-05-16 by Jobdori on `a35ee9a0` in response to Clawhip pinpoint nudge at `1505208225321062521`. Exact reproduction (isolated env, no creds, fresh git repo, HEAD `a35ee9a0`): `timeout 5 env -i HOME=$ISOLATED_HOME PATH=$PATH CLAW_CONFIG_HOME=$PROBE/.claw-cfg claw prompt hello --output-format json > stdout.txt 2> stderr.txt` → stdout = **0 bytes**, stderr = 195 bytes containing `{"error":"missing Anthropic credentials…","exit_code":1,"hint":null,"kind":"missing_credentials","type":"error"}`, exit code 1. Confirms Gaebal's `1505208553793781792` pinpoint that `prompt` timeout + zero bytes was the prior state — HEAD `a35ee9a0` now correctly exits 1 with `kind:"missing_credentials"` **but the envelope is still routed to stderr** (issue #447 class, same class as prior entries #422, #435). **Contrast with `doctor`:** `claw doctor --output-format json 2>/dev/null` succeeds to stdout with `checks[auth].status:"warn"`, `api_key_present:false`, `auth_token_present:false` — but the auth check has no `prompt_ready:false` field. Automation that gates on `doctor` before invoking `prompt` must re-derive readiness from `api_key_present && auth_token_present` — there is no single canonical boolean. **Three compound problems:** (a) **stdout-empty on `--output-format json` failure**: same class as #447; `prompt`'s error envelope goes to stderr, not stdout. The canonical automation idiom `if ! result=$(claw prompt "q" --output-format json); then echo "$result" | jq .kind; fi` sees `$result=""` on failure — the jq call gets nothing. All `--output-format json` error paths must route JSON to stdout per #447 contract; (b) **`doctor` missing `prompt_ready` field**: `doctor --output-format json` already knows auth is absent (`api_key_present:false`) but surfaces no derived `prompt_ready:bool` or `prompt_blocked_reason:string` field. Automation must infer readiness from `api_key_present || auth_token_present || legacy_*_present` — a 5-field OR across legacy fields that is fragile as auth mechanisms evolve. A single `prompt_ready:false` (with `prompt_blocked_reason:"auth_missing"`) inside the `auth` check would give downstream a stable contract; (c) **`claw prompt` with no auth does no preflight and fires straight at the API**: the preflight check that `doctor` runs (auth discovery) is not reused by `prompt` to emit a fast typed error before attempting the network call. Both Gaebal's pinpoint (prompt hanging silently on older HEAD) and the current behavior (prompt hitting auth gate after a brief API attempt) stem from the same root: prompt does not short-circuit at the point where `doctor` already knows auth is absent. If `doctor` can emit `kind:"doctor"` with `auth.status:"warn"` in ~20ms without a network call, `prompt` should emit `kind:"missing_credentials"` in the same window and output it to stdout. **Required fix shape:** (a) `prompt --output-format json` must write the `kind:"missing_credentials"` JSON envelope to **stdout**, not stderr — same fix as #447 for all error envelopes; (b) add `prompt_ready:bool` and `prompt_blocked_reason:string|null` to the `auth` check in `doctor --output-format json`; derive it as `api_key_present || auth_token_present || legacy_saved_oauth_present`; (c) `prompt` must run the credential preflight check (same codepath as doctor's auth check) before attempting any API call and emit `{"kind":"missing_credentials","prompt_blocked_reason":"auth_missing"}` on **stdout** with exit 1 if the check fails; (d) `--output-format json` stdout routing fix must cover: `prompt`, `session list` (cross-ref #449), `skills uninstall` (cross-ref #431), `resume` (cross-ref #435), `acp serve` (cross-ref #443) — the full `kind:"missing_credentials"` class; (e) regression test: `claw prompt hello --output-format json` with no creds writes JSON to stdout (0 bytes stderr), exits 1, `kind:"missing_credentials"`, in under 200ms (no network attempt). **Why this matters:** `prompt` is the primary consumer entry point. Auth-absent failure routing to stderr breaks every automation wrapper that captures `$(claw prompt ... --output-format json)`. The `doctor` preflight metadata gap means auth-readiness checks require parsing 5 legacy fields instead of reading one boolean. Cross-references #447 (all JSON error envelopes on stderr), #449 (session list hits auth gate), #431 (skills uninstall hits auth gate), #357 (auth gate on local ops cluster), #422 (exit-code parity). Source: Jobdori live dogfood, `a35ee9a0`, 2026-05-16.

-451. **Dogfood automation can silently run probes in the wrong repository/worktree when adjacent checkouts share similarly named binaries and stale build artifacts, so reports may mix evidence from Clawdbot/OMX with claw-code** — dogfooded 2026-05-19 from the `#clawcode-building-in-public` 12:00/12:30 UTC nudge cycle. While following a tmux-hook JSON lifecycle probe, the shell reported `/home/bellman/clawd` as the top-level worktree and executed `node dist/cli/omx.js ...` from Clawdbot/OMX artifacts instead of a claw-code checkout; a later correction found the actual claw-code repos under `/home/bellman/Workspace/claw-code-*`, including one `main` checkout hundreds of commits behind `origin/main`. The transcript therefore briefly contained plausible-looking CLI evidence from the wrong product tree before git provenance checks caught it. **Required fix shape:** (a) before dogfood probes, emit a mandatory machine-readable provenance preflight with repo root, remote URL, branch, HEAD, upstream HEAD, ahead/behind counts, binary path, and embedded build SHA when available; (b) make report templates include this provenance block before any command evidence; (c) warn or block when the requested product name does not match the remote/package/binary identity, or when the checkout is behind the target upstream by a configured threshold; (d) add regression coverage around multi-worktree/multi-product environments proving dogfood harnesses cannot silently attribute evidence from a neighboring repo or stale artifact. **Why this matters:** stale-branch confusion is not just a git annoyance; it corrupts the evidence chain. Claws can land or report fixes against the wrong codebase if the harness does not prove repo and binary identity before probing. Source: gaebal-gajae dogfood response to Clawhip messages `1506265193863446711` and `1506272743895728249` on 2026-05-19.
+470. **`--reasoning-effort` is accepted by local diagnostic subcommands (`status`, `doctor`, `version`, etc.) even though it is a prompt/API-request knob, then disappears from every diagnostic JSON surface; valid values (`low|medium|high`) parse and exit 0 on `status` / `doctor` with no `reasoning_effort` field, no `ignored_flags` warning, and no indication that the flag will only affect future prompt turns. Invalid values have their own strict parser (`LOW`, `Low`, `low `, ` low`, empty, `extreme`, `1`) but that error sentinel is another `classify_error_kind` orphan (`kind:"unknown"`, `hint:null`), repeating the #463/#464 pattern on a third flag-value parser** — dogfooded 2026-05-24 for the 19:30 Clawhip nudge at message `1508190377130197222`, reproduced on local `./rust/target/debug/claw` `git_sha 003b739d` (origin/main `f8e1bb72`) in a clean isolated env.

+    Reproduction:

-452. **Validated claw-code checkouts can have no runnable local debug binary, and the failure is a raw shell `No such file or directory` instead of a typed build/provenance preflight result** — dogfooded 2026-05-19 from the `#clawcode-building-in-public` 13:00 UTC nudge after applying the provenance guard from #451. The corrected claw-code checkout was `/home/bellman/Workspace/claw-code-pr2967` with remote `https://github.com/ultraworkers/claw-code.git`, branch `docs/roadmap-workdir-provenance`, HEAD `6183d95`, upstream `origin/main` at `f8e1bb7`, and ahead/behind `1/0`. The first real probe then failed before reaching claw-code logic: `timeout --kill-after=1s 8s ./rust/target/debug/claw plugins list --output-format json` exited `127` with stderr `timeout: failed to run command './rust/target/debug/claw': No such file or directory` and empty stdout. This is distinct from stale-binary mismatch: here the selected checkout is identifiable, but there is no built binary and no canonical instruction/result telling automation whether to build, locate an installed `claw`, or stop. **Required fix shape:** (a) provide a canonical `claw dogfood preflight --output-format json` or equivalent script that checks expected binary paths, installed binary fallback, embedded build SHA, workspace HEAD, and build freshness before any product probe; (b) when the expected local binary is absent, return a typed result such as `kind:"dogfood_preflight"`, `binary_status:"missing"`, `expected_path`, `recommended_build_command`, and `can_use_installed_binary:false|true`; (c) integrate the preflight into dogfood report templates so a missing build artifact is reported as startup friction, not a raw shell 127; (d) add regression/fixture coverage for missing binary, stale binary, matching debug binary, and installed-binary fallback cases. **Why this matters:** after #451 proves the repo is right, claws still need to prove the executable exists and corresponds to that repo. A raw shell missing-file error wastes a nudge cycle and tempts operators to run whatever stale binary happens to be nearby. Source: gaebal-gajae dogfood response to Clawhip message `1506280293831544997` on 2026-05-19.
+    ```bash
+    $ claw --output-format json --reasoning-effort high status | jq 'keys'
+    ["allowed_tools","config_load_error","kind","model","model_raw","model_source","permission_mode","sandbox","status","usage","workspace"]
+    # no reasoning_effort, no ignored_flags, no active_request_options

+    $ claw --output-format json --reasoning-effort high doctor | jq '.checks[].name'
+    "auth" "config" "install_source" "workspace" "sandbox" "system"
+    # no reasoning/model/request-options check carrying reasoning_effort

-453. **Plugin list JSON can report bundled plugin `source` paths from a stale user registry in a different checkout, with no stale-source warning or current-bundled-root distinction** — dogfooded 2026-05-19 from the `#clawcode-building-in-public` 13:30 UTC nudge after a successful local build in `/home/bellman/Workspace/claw-code-pr2967` (branch `docs/roadmap-workdir-provenance`, HEAD `25d663d`, binary `./rust/target/debug/claw` reporting `git_sha:"25d663d"`). Running `./rust/target/debug/claw plugins list --output-format json` returned structured `plugins[]`, but both bundled plugin entries reported `source` under `/home/bellman/Workspace/claw-code-parity-worktrees/clawcode-ux-enhance/...` instead of the current checkout. Cleaning and rebuilding the `plugins` crate did not change the output; the stale paths came from `~/.claw/plugins/installed.json`, where bundled plugin records persisted old `source.path` values. The JSON payload gave no `source_stale`, `source_exists`, `current_bundled_root`, `registry_path`, or `source_origin:"registry"` cue, so automation would treat another worktree's bundled plugin path as current truth. **Required fix shape:** (a) for bundled plugins, derive/display source from the current binary/workspace bundled root rather than a persistent user registry path when possible; (b) if registry source is retained, expose `registry_path`, `source_origin`, `source_exists`, `source_matches_current_bundle_root`, and `current_bundled_root` fields; (c) warn in text mode and JSON diagnostics when bundled plugin registry records point outside the current binary/workspace provenance; (d) add regression coverage where `installed.json` contains stale bundled paths from another checkout and `plugins list --output-format json` either self-heals or marks the source stale. **Why this matters:** plugin lifecycle actions rely on source provenance. If a fresh build from checkout A reports bundled plugin sources from checkout B, claws can inspect, enable, update, or debug the wrong plugin tree and misattribute lifecycle failures to current code. Source: gaebal-gajae dogfood response to Clawhip message `1506287843021160500` on 2026-05-19.
+    $ claw --output-format json --reasoning-effort LOW status
+    {"error":"invalid value for --reasoning-effort: 'LOW'; must be low, medium, or high","hint":null,"kind":"unknown","type":"error"}

+    $ claw --output-format json --reasoning-effort 'low ' status
+    {"error":"invalid value for --reasoning-effort: 'low '; must be low, medium, or high","hint":null,"kind":"unknown","type":"error"}
+    ```

-454. **`plugins help --output-format json` returns a success-shaped plugin inventory plus `Unknown /plugins action 'help'` prose instead of structured plugin command help or supported lifecycle actions** — dogfooded 2026-05-19 from the `#clawcode-building-in-public` 14:00 UTC nudge on a freshly built binary from `/home/bellman/Workspace/claw-code-pr2967` (`./rust/target/debug/claw version --output-format json` reported `git_sha:"25d663d"`; the worktree had roadmap-only commits ahead of that source build). Running `./rust/target/debug/claw plugins help --output-format json` exited `0` and returned JSON with `kind:"plugin"`, `action:"help"`, `status:"ok"`, a full `plugins[]` inventory, and message `Unknown /plugins action 'help'. Use list, install, enable, disable, uninstall, or update.` It did not return `supported_actions[]`, usage, action metadata, destructive-action markers, target requirements, or a typed `unsupported_action` / `help_unavailable` status. The same probe also confirmed `plugins show example-bundled --output-format json` still returns `status:"ok"` plus an unknown-action message and no selected `plugin` object, matching the existing unsupported-show class, but the new pinpoint is the absence of a plugin lifecycle discovery/help contract. **Required fix shape:** (a) implement `plugins help --output-format json` as a real help/discovery payload with `supported_actions[]`, per-action `requires_target`, `destructive`, `resume_safe`/automation notes, usage, and examples; (b) if `help` is intentionally unsupported, return a non-ok typed JSON envelope with `code:"unsupported_plugin_action"` and structured `supported_actions[]`, not `status:"ok"`; (c) avoid attaching full plugin inventory to unsupported/help responses unless requested, or mark it as incidental; (d) add regression coverage proving plugin lifecycle help is machine-readable and does not require scraping `message` for available actions. **Why this matters:** plugin lifecycle commands include install/enable/disable/update/uninstall; claws need a safe discovery surface before attempting mutations. A success-shaped unknown-help response with only prose action names keeps lifecycle automation brittle and encourages trial-and-error against mutating commands. Source: gaebal-gajae dogfood response to Clawhip message `1506295397285494905` on 2026-05-19.
+    Valid matrix: `low`, `medium`, `high` all exit 0 on `status` and `doctor` but are invisible. Invalid matrix: `LOW`, `Low`, `low `, ` low`, empty string, `extreme`, `1` all exit 1 with `kind:"unknown"` and `hint:null`.

+    **Root cause traced:** `rust/crates/rusty-claude-cli/src/main.rs:712-731` validates `--reasoning-effort` as a global flag and stores `reasoning_effort = Some(value)`, but the local diagnostic renderers never receive or serialize it. `status_json_value()` around `main.rs:5688-5714` exposes model, permission mode, allowed tools, workspace, sandbox, config load error, and usage — not request options. `doctor` builds fixed checks (`auth`, `config`, `install_source`, `workspace`, `sandbox`, `system`) and has no request-options check. The invalid-value error string is:

-455. **Plugin help entrypoints hang before producing any help bytes, and with normal user config they emit only a deprecation warning before timeout** — dogfooded 2026-05-19 from the `#clawcode-building-in-public` 14:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with `./rust/target/debug/claw version --output-format json` reporting `git_sha:"25d663d"`. Bounded probes of `plugins --help --output-format json`, `plugins help --output-format json`, and `plugins list --help --output-format json` each timed out after 8s with `stdout=0`; under the normal user config each had `stderr=121` containing only the repeated config deprecation warning, and under an isolated clean `HOME`/`CLAW_CONFIG_HOME` even `plugins --help` and JSON help forms timed out with both stdout and stderr empty. This is distinct from #454's success-shaped unknown-help JSON observed on the action-dispatch path: the flag-style/local help path can hang before returning any help or typed JSON at all. **Required fix shape:** (a) route `plugins --help`, `plugins help`, and subcommand help forms through static help rendering before config/plugin registry loading; (b) make JSON help return bounded stdout JSON with `kind:"plugin"`, `action:"help"`, `supported_actions[]`, and usage metadata; (c) if dynamic plugin state is intentionally consulted, enforce a short internal timeout and return a typed `plugin_help_unavailable` JSON error instead of zero-byte hangs; (d) add regression coverage with clean home and deprecated-config home proving plugin help emits bytes promptly and does not initialize slow lifecycle/registry paths. **Why this matters:** help must be the safe escape hatch when plugin lifecycle is broken. If every plugin help spelling can hang before bytes, claws cannot discover valid plugin actions, recover from registry issues, or explain how to fix lifecycle state without external docs. Source: gaebal-gajae dogfood response to Clawhip message `1506302942754639892` on 2026-05-19.
+    ```rust
+    "invalid value for --reasoning-effort: '{value}'; must be low, medium, or high"
+    ```

+    but `classify_error_kind()` has no branch for it, so every invalid reasoning-effort parse error becomes `kind:"unknown"`.

-456. **Static `--help --output-format json` hangs across multiple local lifecycle namespaces under a clean home, so help discovery is not a reliable no-side-effect escape hatch** — dogfooded 2026-05-19 from the `#clawcode-building-in-public` 15:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with `./rust/target/debug/claw version --output-format json` reporting `git_sha:"25d663d"`. After pushing the accumulated roadmap branch, clean-environment probes using isolated `HOME` and `CLAW_CONFIG_HOME` showed that `mcp --help --output-format json`, `agents --help --output-format json`, `skills --help --output-format json`, `memory --help --output-format json`, and `session --help --output-format json` each timed out after 8s with `stdout=0` and `stderr=0`. This extends the plugin-specific #455 into a shared parser/help-layer gap: even command namespaces whose help should be static and local can enter a zero-byte hang before any JSON or text help is emitted. **Required fix shape:** (a) centralize `--help`/`help` handling before config, auth, registry, session, MCP, memory, or plugin initialization for all local lifecycle namespaces; (b) make `--output-format json` help return a bounded stdout payload with `kind:"help"`, namespace, usage, supported actions/sections, output formats, and side-effect/auth requirements; (c) add a global deterministic help timeout guard that returns typed JSON such as `kind:"help_unavailable"` instead of allowing zero-byte hangs; (d) add clean-home regression coverage for `mcp`, `agents`, `skills`, `memory`, `session`, and `plugins` help forms proving they emit bytes promptly and do not touch slow lifecycle providers. **Why this matters:** help is the only safe discovery path when lifecycle state is broken. If help itself can hang with no bytes, claws cannot learn how to inspect or recover MCP, agents, skills, memory, sessions, or plugins without external docs or guesswork. Source: gaebal-gajae dogfood response to Clawhip message `1506310493080518767` on 2026-05-19.
+    **Why distinct from existing items:** ROADMAP #98 covers `--compact` silently ignored outside prompt/text output. This entry covers `--reasoning-effort`, a different prompt/API knob with different parser and provider semantics. ROADMAP #464 covers invalid `--output-format` enum parsing; this entry covers `--reasoning-effort` enum parsing plus valid-value invisibility. ROADMAP #468 covers duplicate global flags; this entry covers single valid flag accepted by diagnostic subcommands with no visible effect. ROADMAP #34/#35 cover OpenAI-compat reasoning-effort transport support; this entry is about CLI diagnostic surfaces accepting the knob without exposing whether it is active. No existing entry documents `--reasoning-effort high status` being a successful no-op/invisible state.

+    **Why this matters:** (1) **Prompt/API knobs on diagnostic commands are misleading.** A launcher may run `claw --reasoning-effort high status` to verify a lane is configured for high reasoning; status says ok but never confirms the knob. (2) **Automation cannot audit request options.** `status` is the lightweight preflight surface; if reasoning effort affects cost/latency/model behavior, claws need to know its active value before prompt execution. (3) **Same enum-parser quality gap repeats.** `LOW` and trailing whitespace are common wrapper/env-file outputs; the parser rejects them with `kind:"unknown"` instead of `kind:"invalid_reasoning_effort"`, no suggestion, no trim/case handling. (4) **Silent successful no-op and loud unknown invalid are both bad.** Valid values disappear; invalid values are misclassified. Together they make the knob hard to reason about programmatically. (5) **Regression tests are too shallow.** Existing tests around `rejects_invalid_reasoning_effort_value` assert the substring exists, and `accepts_valid_reasoning_effort_values` asserts parse state only. They do not assert JSON envelope kind/hint, diagnostic-surface visibility, or prompt-vs-diagnostic applicability.

-457. **Root help is bounded in text mode, but `help --output-format json` and command `--help --output-format json` convert help into a zero-byte hang** — dogfooded 2026-05-19 from the `#clawcode-building-in-public` 15:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with `./rust/target/debug/claw version --output-format json` reporting `git_sha:"25d663d"`. In an isolated clean `HOME`/`CLAW_CONFIG_HOME`, `./rust/target/debug/claw --help` and `./rust/target/debug/claw help` both exited 0 and printed 7403 bytes of root text help. But `help --output-format json`, `version --help --output-format json`, `doctor --help --output-format json`, and `status --help --output-format json` each timed out after 8s with `stdout=0` and `stderr=0`. This narrows #456: the help renderer itself is capable of returning promptly, but the parser/order path that combines help with JSON output appears to route into a different slow/non-returning path. **Required fix shape:** (a) parse `--help`/`help` before selecting command execution paths, auth, provider, session, or lifecycle initialization, while still preserving requested output format; (b) make root and command help JSON static/bounded with `kind:"help"`, `scope`, `command`, `usage`, `options`, `examples`, and `supported_output_formats`; (c) add regression coverage proving `claw --help`, `claw help`, `claw help --output-format json`, and representative command help forms all return within a small deterministic budget under clean home; (d) ensure JSON help does not silently fall back to text or zero-byte timeout. **Why this matters:** this is a parser-order failure in the safest command surface. Operators can get text help, but the moment automation asks for JSON discovery, help becomes a hang, forcing claws back to prose scraping or external docs. Source: gaebal-gajae dogfood response to Clawhip message `1506318038167847045` on 2026-05-19.
-
-
-458. **Global output-format flag ordering is parser-hostile, and even `version --output-format json` can hang under clean env despite normal-env success** — dogfooded 2026-05-19 from the `#clawcode-building-in-public` 16:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with normal-env `./rust/target/debug/claw version --output-format json` reporting `git_sha:"25d663d"`. Clean-environment flag-order probes showed `--output-format json --help`, `--output-format json help`, `--output-format json version`, `--version --output-format json`, and `--output-format json --version` each failed immediately with text-mode stderr only, e.g. `[error-kind: cli_parse] error: unknown option: --output-format json --help`, and `stdout=0`. The parser appears to treat the whole trailing string as one unknown option rather than recognizing a global output-format flag before the command. Separately, in the same clean `HOME`/`CLAW_CONFIG_HOME`, canonical `version --output-format json` timed out after 8s with `stdout=0`/`stderr=0`, even though the same command succeeds in the normal environment. **Required fix shape:** (a) make global flags such as `--output-format json` accepted before or after the subcommand, or return structured JSON `cli_parse` errors on stdout when JSON format is requested anywhere in argv; (b) parse flag values as separate tokens in error reporting instead of echoing combined strings like `--output-format json --help` as one option; (c) ensure `version` is fully local/static and cannot hang under clean env or missing config/auth; (d) add clean-env regression coverage for `version --output-format json`, `--output-format json version`, `--version --output-format json`, and JSON parse errors with stdout envelopes. **Why this matters:** claws often put global flags first for CLI uniformity and run in sanitized envs. If global JSON selection is order-sensitive and local version can hang only under clean env, startup probes become unreliable exactly in CI/sandbox contexts. Source: gaebal-gajae dogfood response to Clawhip message `1506325598245748828` on 2026-05-19.
-
-
-459. **Dogfood timeout claims lack a required retry/evidence contract, so transient hangs can be recorded as durable product gaps without immediate reproducibility metadata** — dogfooded 2026-05-19 from the `#clawcode-building-in-public` 16:30 UTC nudge while narrowing #458 on `/home/bellman/Workspace/claw-code-pr2967` with `./rust/target/debug/claw version --output-format json` reporting `git_sha:"25d663d"`. The previous 16:00 pass saw clean-env `version --output-format json` time out with zero bytes, but a focused 16:30 retry matrix using isolated `HOME`/`CLAW_CONFIG_HOME` plus minimal env variants (`TERM`, `USER`/`LOGNAME`, `SHELL`, `LANG`/`LC_ALL`, and all combined) returned valid version JSON every time. That means the earlier timeout may have been transient harness load, process scheduling, or invocation interference, while the report format had no mandatory retry count, timing, command log artifact, or “reproduced N/M” field to distinguish flaky evidence from stable behavior. **Required fix shape:** (a) dogfood timeout reports must include retry count, per-attempt exit code/stdout/stderr byte counts, elapsed duration, env summary, binary provenance, and whether the failure reproduced after process isolation; (b) add a standard `timeout_evidence` block to report templates and ROADMAP entries before filing zero-byte hang claims; (c) classify un-reproduced hangs as `flaky_unconfirmed` with follow-up probes instead of stable product bugs; (d) provide a small harness command that runs bounded retries and emits machine-readable evidence JSON. **Why this matters:** zero-byte timeouts are high-severity but easy to misattribute. Without a retry/evidence contract, claws can pollute the backlog with transient scheduler artifacts or miss real nondeterministic hangs because the evidence shape is too thin. Source: gaebal-gajae dogfood response to Clawhip message `1506333141571211314` on 2026-05-19.
-
-
-460. **Root `help --output-format json` is reproducibly bounded but only wraps 7.4KB of prose in `{kind,message}`, with no structured command or slash-command schema** — dogfooded 2026-05-19 from the `#clawcode-building-in-public` 17:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with `./rust/target/debug/claw version --output-format json` reporting `git_sha:"25d663d"`. Applying the retry/evidence discipline from #459, three clean-home attempts of `./rust/target/debug/claw help --output-format json` all exited 0 with `stdout=7563`, `stderr=0`, valid JSON keys exactly `kind,message`, `kind:"help"`, and a `message` string of length 7401. There were no `commands[]`, `options[]`, `slash_commands[]`, resume-safety flags, output-format support metadata, side-effect/auth requirements, or examples as structured fields. This supersedes the flaky zero-byte hang framing from #457 for root help: the stable reproducible gap is schema opacity, not timeout. **Required fix shape:** (a) keep `message` for human rendering but add a versioned structured help schema with `schema_version`, `commands[]`, `global_options[]`, `slash_commands[]`, `examples[]`, and `related_docs[]`; (b) include per-command fields such as `name`, `aliases`, `usage`, `description`, `supports_json`, `requires_auth`, `side_effects`, and `resume_safe` where applicable; (c) expose slash-command metadata without requiring prose scraping; (d) add regression coverage proving root help JSON has stable structured fields and that old `message` remains optional/backward-compatible. **Why this matters:** help JSON is the bootstrap discovery surface for claws. Valid JSON that contains only prose still forces automation to scrape text before choosing safe commands or resume paths. Source: gaebal-gajae dogfood response to Clawhip message `1506340691431657472` on 2026-05-19.
-
-
-461. **Command-specific `--help --output-format json` reproducibly zero-byte hangs even though root JSON help returns bounded prose JSON** — dogfooded 2026-05-19 from the `#clawcode-building-in-public` 17:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with `./rust/target/debug/claw version --output-format json` reporting `git_sha:"25d663d"`. After #460 confirmed root `help --output-format json` is bounded but schema-opaque, two clean-home attempts each for `status --help --output-format json`, `doctor --help --output-format json`, `version --help --output-format json`, and `sandbox --help --output-format json` all timed out after 8s with `stdout=0` and `stderr=0`. This establishes a split contract: root JSON help reaches the help serializer, while command-specific JSON help falls into a non-returning command execution/parser path. **Required fix shape:** (a) add a command-help dispatch layer that catches `<command> --help` before entering the command's runtime handler; (b) share the same bounded JSON help schema from #460 for command-specific help, with fields `kind:"help"`, `command`, `usage`, `options`, `examples`, `supports_json`, `requires_auth`, and `side_effects`; (c) ensure local/static commands like `version`, `status`, `doctor`, and `sandbox` never initialize slow providers just to render help; (d) add clean-home regression coverage proving command-specific JSON help emits bytes promptly for representative static and lifecycle commands. **Why this matters:** claws often discover command contracts one command at a time. If root help is available but every command's JSON help hangs, automation still cannot inspect option-level semantics safely and must scrape root prose or guess. Source: gaebal-gajae dogfood response to Clawhip message `1506348241128788111` on 2026-05-19.
-
-
-462. **Text-mode `<command> --help` also zero-byte hangs, proving the bug is command-help dispatch rather than JSON serialization** — dogfooded 2026-05-19 from the `#clawcode-building-in-public` 18:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with `./rust/target/debug/claw version --output-format json` reporting `git_sha:"25d663d"`. After #461 showed command-specific JSON help hangs, a text-mode retry matrix in isolated clean `HOME`/`CLAW_CONFIG_HOME` showed two attempts each for `status --help`, `doctor --help`, `version --help`, `sandbox --help`, `mcp --help`, `agents --help`, and `skills --help` all timed out after 6s with `stdout=0` and `stderr=0`. Root `--help` and `help` remain bounded, so the split is not help text rendering generally and not JSON formatting specifically; it is the command-specific help dispatch path failing to intercept `<command> --help` before some non-returning runtime path. **Required fix shape:** (a) implement a first-stage argv parser that recognizes `<command> --help` and `<command> help` for every registered command before command runtime initialization; (b) render static text help in text mode and structured JSON help in JSON mode from the same command metadata registry; (c) add regression coverage for both text and JSON help for representative static commands (`version`, `status`, `doctor`, `sandbox`) and lifecycle commands (`mcp`, `agents`, `skills`, `plugins`); (d) ensure every command-help path has a bounded no-provider/no-auth/no-config execution budget. **Why this matters:** users do not only run root `--help`; they naturally ask `claw status --help` or `claw mcp --help`. If that path hangs silently, the product loses the most basic local recovery surface before any real action starts. Source: gaebal-gajae dogfood response to Clawhip message `1506355792582938665` on 2026-05-19.
-
-
-463. **Root help advertises direct slash examples like `claw /skills`, but direct slash behavior is inconsistent: `/skills` runs a huge local report, `/help` aliases root help, while `/status` is rejected as interactive-only despite being marked resume-safe** — dogfooded 2026-05-19 from the `#clawcode-building-in-public` 18:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with `./rust/target/debug/claw version --output-format json` reporting `git_sha:"25d663d"`. Root help examples include `claw /skills`, and clean-home probes showed direct `claw /skills` exits 0 and prints a 27KB skill inventory, while direct `claw /help` exits 0 and prints root help. But direct `claw /status` exits 1 with `slash command /status is interactive-only... use claw --resume SESSION.jsonl /status ... when the command is marked [resume] in /help`, even though `/status` is explicitly marked `[resume]` in root help and has a top-level sibling `claw status`. The same surface therefore mixes three semantics for direct slash invocation: accepted alias, accepted local slash report, and rejected interactive-only/resume-only command. **Required fix shape:** (a) define a single direct-slash CLI contract: either reject all slash commands outside REPL/resume with structured guidance, or allow resume-safe/local slash commands consistently; (b) if allowing direct slash commands, route `/status` to the same local/resume-safe status serializer as `claw status` or require `--resume` with a typed `resume_required` code; (c) make help examples distinguish top-level commands from slash commands and avoid advertising `claw /skills` unless direct slash invocation is intentional for the whole supported set; (d) add regression coverage for `/help`, `/skills`, `/status`, `/mcp`, `/agents`, and their top-level equivalents proving consistent direct/resume/text/json behavior. **Why this matters:** claws copy help examples literally. If `claw /skills` works but `claw /status` says interactive-only despite `[resume]`, automation cannot infer which slash commands are safe outside the REPL and will oscillate between direct, top-level, and resume forms. Source: gaebal-gajae dogfood response to Clawhip message `1506363340048564425` on 2026-05-19.
-
-
-464. **Global `--output-format json` placement is broken for top-level subcommands: post-subcommand placement silently hangs, while pre-subcommand placement is rejected as `cli_parse` despite help documenting `--output-format` as a global flag** — dogfooded 2026-05-19 from the `#clawcode-building-in-public` 19:00/19:30 UTC nudges on `/home/bellman/Workspace/claw-code-pr2967` with `./rust/target/debug/claw version --output-format json` reporting `git_sha:"25d663d"`. Root help lists `--output-format FORMAT` in the global Flags section and examples like `claw [--model MODEL] [--output-format text|json] prompt TEXT`, while top-level commands (`claw status`, `claw doctor`, `claw mcp`, `claw skills`, `claw version`) advertise local diagnostic/report surfaces. Clean-home probes showed two distinct bad paths: `claw version --output-format json`, `claw status --output-format json`, and `claw skills --output-format json` timed out after 5s with `stdout=0`/`stderr=0`; the more canonical GNU-style global placement `claw --output-format json version`, `claw --output-format json status`, and `claw --output-format json skills` exited 1 with `[error-kind: cli_parse] error: unknown option: --output-format json <command>`. The only working form observed for `version` is the special local parser path used by `claw version --output-format json` in a non-clean environment/provenance preflight, which contradicts the clean-home bounded test and suggests the parser/runtime path is environment-sensitive as well as placement-sensitive. **Required fix shape:** (a) make `--output-format` a true global flag accepted before any subcommand and before slash commands; (b) make top-level local commands also accept trailing `--output-format` if the project wants common CLI ergonomics; (c) normalize both placements into one parsed `OutputFormat` before command dispatch; (d) ensure parse errors in JSON-requested mode emit structured JSON rather than prose-on-stderr; (e) add clean-home regression coverage for `claw --output-format json version|status|doctor|mcp|skills` and `claw version|status|doctor|mcp|skills --output-format json`, with bounded no-provider execution. **Why this matters:** claws and shell users routinely place global flags either before or after subcommands. A machine-readable output flag that sometimes hangs and sometimes parse-errors means automation cannot reliably request JSON for the exact local diagnostics it needs during recovery. Source: gaebal-gajae dogfood response to Clawhip messages `1506370889653027012` and `1506378443812765756` on 2026-05-19.
-
-
-465. **`claw skills` is technically bounded but operationally noisy: the default text output dumps full descriptions for every discovered skill (~27KB / 65 skills in clean-home dogfood), unlike `status`, `doctor`, `mcp`, `sandbox`, and `agents` which stay compact enough for recovery logs** — dogfooded 2026-05-19 from the `#clawcode-building-in-public` 20:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with `./rust/target/debug/claw version --output-format json` reporting `git_sha:"25d663d"`. Clean-home text probes showed `version`, `status`, `doctor`, `mcp`, `sandbox`, and `agents` all exit promptly with compact diagnostic reports (`version` 136 bytes, `status` 1274 bytes, `doctor` 2519 bytes, `mcp` 117 bytes, `sandbox` 352 bytes, `agents` 17 bytes). `skills` also exits 0, but emits 27,573 bytes by default because it prints every skill name plus full description. This makes `claw skills` a poor first-response diagnostic in clawhip logs, tmux tails, CI failure artifacts, and copied support snippets: the useful inventory count and roots are buried under pages of prose. This is distinct from discovery-scope/security issues (#85/#95): even when the skill set is legitimate, the default report shape is too verbose for a recovery command. **Required fix shape:** (a) make default text `claw skills` compact: counts by source/root plus names only, truncated with an explicit `--verbose` hint; (b) add `claw skills --verbose` or `claw skills list --verbose` for full descriptions; (c) add `--format compact|verbose` or reuse `--compact` if the CLI standardizes it; (d) keep JSON mode complete but add `summary` and `entries[].description` fields so consumers can choose; (e) add regression coverage enforcing that default text output for 50+ skills stays under a small byte/line budget while verbose preserves current detail. **Why this matters:** local diagnostics should be safe to paste and scan during failures. A 27KB default skill dump hides the actual signal and makes every dogfood/support loop noisier than necessary. Source: gaebal-gajae dogfood response to Clawhip message `1506385992368652489` on 2026-05-19.
-
-
-466. **Syntactically valid MCP config with a nonexistent command is reported as healthy/configured by `doctor`, `mcp`, and `status` instead of surfacing executable reachability as degraded** — dogfooded 2026-05-19 from the `#clawcode-building-in-public` 20:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with `./rust/target/debug/claw version --output-format json` reporting `git_sha:"25d663d"`. In a temp workspace containing only `.claw.json` with `mcpServers.broken.command:"/definitely/not/a/real/mcp-server"`, clean-home text probes showed `doctor` exits 0 with `Failures 0`, `Config Status ok`, and `Boot preflight ... mcp=true · servers 1`; `mcp` exits 0 and lists `broken stdio project /definitely/not/a/real/mcp-server --serve`; `status` exits 0 and reports `Boot preflight ... mcp=true plugins=true last_failed=none`. None of the three surfaces attempts even a cheap executable existence/PATH reachability check, so a server that cannot possibly launch is indistinguishable from a configured usable MCP server until the first runtime tool call fails. This is distinct from malformed-config degraded-mode items (#143/#144/#440): here the JSON shape is valid and the parser succeeds, but lifecycle readiness is false. **Required fix shape:** (a) add an MCP executable preflight pass that classifies each stdio server as `reachable:true|false|unknown` using absolute-path existence/executable-bit checks and PATH lookup for bare commands, without launching untrusted code; (b) expose per-server fields in `claw mcp` / JSON (`launch_status`, `command_exists`, `command_executable`, `path_resolution`, `error_kind:"mcp_command_not_found"`); (c) make `doctor` add an `mcp_reachability` check with `status:"warn"` or `"fail"` when any configured server command is missing; (d) make `status` distinguish `mcp_configured:true` from `mcp_reachable:false` instead of the current single `mcp=true`; (e) regression test with one valid `/bin/echo` server and one nonexistent absolute path proving partial status is reported. **Fresh mixed-PATH proof (21:00 UTC):** a temp config containing one PATH-resolvable server (`pathEcho.command:"echo"`) and one missing bare command (`missingBare.command:"definitely-not-a-real-mcp-bare-command-xyz"`) still made `mcp` list both as configured, `doctor` report `Failures 0` and `mcp=true · servers 2`, and `status` report `mcp=true` with no degraded marker. So the fix must handle both absolute paths and PATH lookup for bare commands, and it must preserve partial success (`echo` reachable, missingBare not found) instead of collapsing both into `configured`. **Why this matters:** MCP failures are often setup/path mistakes. If doctor says failures 0 and mcp=true while the command path does not exist, claws and users waste turns discovering the break only after a blocked tool call. Source: gaebal-gajae dogfood response to Clawhip messages `1506393539385491598` and `1506401088801083495` on 2026-05-19.
+    **Required fix shape:** (a) Decide contract: either reject prompt/API-only knobs (`--reasoning-effort`, maybe future `--max-output-tokens`) on local diagnostic subcommands with a structured `unsupported_flag_for_subcommand` error, or expose them in `status`/`doctor` as `request_options.reasoning_effort` / `active_request_options`. (b) If accepted, add `reasoning_effort` to `status --output-format json` and a `request_options` doctor check so preflight can verify it. (c) Register `invalid_reasoning_effort` in `classify_error_kind` and split a real `hint` field (valid values: `low`, `medium`, `high`). (d) Normalize or suggest common variants: trim whitespace, lower-case `LOW`/`Low`, or produce `Did you mean: low?`. (e) Add regression coverage for valid visibility on `status`/`doctor`, invalid-value JSON kind/hint, and prompt-path preservation. **Acceptance check:** `claw --output-format json --reasoning-effort high status | jq -e '.request_options.reasoning_effort == "high" or .ignored_flags[]?.flag == "--reasoning-effort"'` should pass; current output has neither. `claw --output-format json --reasoning-effort LOW status 2>&1 | jq -e '.kind == "invalid_reasoning_effort"'` should pass; current kind is `unknown`. Source: gaebal-gajae dogfood for the 2026-05-24 19:30 Clawhip nudge. Number intentionally skips #469 because Jobdori publicly reported a local #469 (`/compact` slash divergence) not yet pushed; this avoids collision.