Compare commits

..

3 Commits

Author SHA1 Message Date
Yeachan-Heo
53bd4f0666 Keep roadmap diff check clean
Remove only the trailing EOF blank line so the PR satisfies whitespace validation without touching test-only Rust coverage.

Constraint: PR #2837 review requested preserving the Rust test module.

Rejected: remove Rust test module | OMX review determined it is test-only and acceptable.

Confidence: high

Scope-risk: narrow

Directive: Do not expand this fix beyond ROADMAP.md EOF whitespace.

Tested: git diff --check origin/main...HEAD; scripts/fmt.sh --check; git diff --name-status origin/main...HEAD
2026-04-29 10:38:10 +00:00
YeonGyu-Kim
1f90198893 fix: add session_lifecycle field to test StatusContext initializer (ci) 2026-04-29 10:34:22 +00:00
YeonGyu-Kim
d5e1e0b8f5 docs(roadmap): add #322 — deprecation warnings corrupt json output stream 2026-04-29 10:32:52 +00:00
2 changed files with 70 additions and 8 deletions

View File

@@ -6258,12 +6258,5 @@ Original filing (2026-04-18): the session emitted `SessionStart hook (completed)
248. **Non-interactive prompt mode can exceed caller timeouts with no in-band startup/API phase event or partial status artifact** — dogfooded 2026-04-29 from live tmux session `claw-code-issue-247-human-fresh-run` after the owner explicitly asked gaebal-gajae to make a fresh session and use `claw-code` directly. The actual `./rust/target/debug/claw` binary was launched via `clawhip tmux new` on current main. `claw doctor --output-format json` and `claw status --output-format json` both succeeded and reported auth/config/workspace ok, but minimal non-interactive prompt calls (`timeout 120 ./rust/target/debug/claw --output-format json --dangerously-skip-permissions "echo hello"` and `timeout 120 ./rust/target/debug/claw --output-format json prompt "Reply with just the word hello"`) both timed out from the outer harness after roughly 150s with only `Command exceeded timeout` visible. There was no machine-readable `api_request_started`, `waiting_for_first_token`, provider/model/base-url identity, retry count, or partial status file/event that would let clawhip distinguish slow provider, network stall, auth/OAuth drift, stream parser hang, or prompt-mode bug. **Required fix shape:** (a) emit structured non-interactive lifecycle events for `startup_ok`, `api_request_started`, `first_byte/first_token`, retry/backoff, and terminal `timeout_or_stall` states; (b) include provider/model/base URL source and auth source category without leaking secrets; (c) support a CLI/request timeout flag or env override that returns a typed JSON error before the outer orchestrator kills the process; (d) write/emit a final partial status artifact on timeout so lane monitors do not have to infer state from a dead process. **Why this matters:** non-interactive prompt mode is the automation path; if it can hang past the caller's timeout while doctor/status are green, claws lose the ability to tell whether startup, auth, transport, provider latency, or stream consumption failed. Source: live session `claw-code-issue-247-human-fresh-run` on 2026-04-29.
249. **`/issue` advertises GitHub issue creation but never reaches a GitHub/OAuth/auth preflight or creation path, and the non-interactive error suggests unusable resume forms** — dogfooded 2026-04-29 on current main `8e22f757` while chasing the remaining Phase-0 GitHub OAuth blocker. The visible help advertises `/issue [context]` as “Draft or create a GitHub issue from the conversation,” but the actual implementation path only renders a local `Issue` report (`format_issue_report`) and does not invoke `gh`, GitHub API, OAuth, token discovery, browser auth, or even a dry-run/auth-preflight surface. Direct non-interactive use (`./rust/target/debug/claw '/issue dogfood test'`) returns `slash command /issue dogfood test is interactive-only` and suggests `claw --resume SESSION.jsonl /issue ...` / `claw --resume latest /issue ...` “when the command is marked [resume]”, while `/help` does not mark `/issue` as resume-safe and resume dispatch rejects interactive-only commands. That leaves operators with a GitHub-labeled command whose real behavior is neither issue creation nor a clear GitHub OAuth blocker. **Required fix shape:** (a) split the contract explicitly: either rename/copy to “draft issue text” or implement a real `create` path with GitHub auth preflight; (b) surface a machine-readable GitHub auth state (`gh_cli_authenticated`, `github_token_present`, `oauth_required`, `creation_unavailable`) before any issue-create attempt; (c) make the direct-mode error avoid suggesting resume forms for commands not marked resume-safe; (d) add regression coverage proving `/issue` help, direct-mode rejection, resume support flags, and creation/draft behavior agree. **Why this matters:** Phase-0 GitHub OAuth verification cannot complete if the only GitHub issue surface stops at local prose while still advertising creation. Claws need to know whether they are missing GitHub auth, using a draft-only helper, or hitting an unimplemented creation path. Source: gaebal-gajae dogfood cycle in `#clawcode-building-in-public` on 2026-04-29.
322. **Config deprecation warnings are emitted to stderr even under `--output-format json`, making JSON output unparseable from combined stdout+stderr capture** — dogfooded 2026-04-29 by Jobdori on current main (`8e22f75`). Running `cargo run --bin claw -- doctor --output-format json 2>&1 | python3 -c "import sys,json; json.loads(sys.stdin.read())"` fails with `Expecting value: line 1 column 1 (char 0)` because a `warning: /path/settings.json: field "enabledPlugins" is deprecated. Use "plugins.enabled" instead` line is emitted to stderr before the JSON body begins. When a caller captures combined output (the common automation pattern: `2>&1`, subprocess `STDOUT | STDERR`, PTY capture, or tmux pane scrape) the warning prefix breaks JSON parse for every downstream consumer. Root cause: `rust/crates/runtime/src/config.rs` line ~300 calls `eprintln!("warning: {warning}")` unconditionally during `ClawSettings::load_merged()` regardless of active output format. **Required fix shape:** (a) thread the active `CliOutputFormat` through the config loading path and suppress or defer human-readable warning strings when `json` mode is active; (b) instead, collect deprecation diagnostics and inject them into the JSON output as a top-level `"warnings": [...]` array (same field already used by `doctor`); (c) ensure the JSON body is always the first bytes on stdout and all prose warnings stay on stderr or are suppressed in json mode; (d) add regression coverage proving `claw <any-cmd> --output-format json` stdout is valid JSON regardless of config deprecation state. **Why this matters:** `--output-format json` is the automation/claw contract; if config warnings can silently corrupt the JSON stream, every orchestration layer that captures combined output gets broken parse-on-warning with no stable fallback. Source: Jobdori live dogfood on mengmotaHost, claw-code main `8e22f75`, 2026-04-29.
323. **`status --output-format json` reports `session.session = "live-repl"` while simultaneously reporting `session_lifecycle.kind = "saved_only"` — contradictory session identity in a single status snapshot** — dogfooded 2026-04-29 by Jobdori on current main (`804d96b`). Running `claw status --output-format json` from an active REPL-style invocation produced `"session": "live-repl"` in the `workspace` block and `"session_lifecycle": {"kind": "saved_only", "pane_id": null, ...}` in the same object. Those two fields carry contradictory claims: `"live-repl"` asserts there is an active interactive session, while `"saved_only"` asserts there is no live tmux pane hosting the session — the session exists only as a saved artifact. A downstream claw reading this snapshot cannot tell which claim to trust: is this a running session whose pane is undetectable, or a saved-only session that the `session` field is misclassifying? Root cause: `"live-repl"` is a fallback sentinel emitted by `main.rs:6070` when `context.session_path` is `None`, while `session_lifecycle` is computed independently by `classify_session_lifecycle_for()` from tmux pane discovery; the two fields share no common source and can diverge. **Required fix shape:** (a) derive both `session.session` and `session_lifecycle.kind` from the same lifecycle classification result so they cannot diverge; (b) replace the `"live-repl"` free-form sentinel with a structured `session_kind` field (`live_repl`, `saved`, `resume`, etc.) that carries the same type vocabulary as `session_lifecycle.kind`; (c) when `session_lifecycle.kind = "saved_only"`, never emit `"session": "live-repl"` (or vice versa); (d) add a regression test proving `status --output-format json` never emits `session.kind = "live_repl"` and `session_lifecycle.kind = "saved_only"` simultaneously. **Why this matters:** `status --output-format json` is the machine-readable truth surface for session state; if two fields in the same snapshot contradict each other, every lane, monitor, and orchestrator has to pick a winner instead of reading a coherent state. Source: Jobdori live dogfood on mengmotaHost, claw-code `804d96b`, 2026-04-29.
324. **Stale local debug binaries can impersonate the current workspace because version/status/doctor do not compare embedded build provenance to repo HEAD** — dogfooded 2026-04-29 on current `origin/main` / workspace HEAD `e7074f47` after PR #2838. The working tree was at `e7074f47`, but running `./rust/target/debug/claw version --output-format json` reported embedded `git_sha` `1f901988`. `status` and `doctor` remained green and exposed no warning that the executable under test was stale relative to the workspace HEAD, nor any structured build-provenance freshness signal that downstream claws could use to decide whether the observed behavior came from the checked-out code or an older debug artifact. This is a repo-identity opacity gap: the JSON truth surfaces can look authoritative while actually describing a different binary lineage than the source tree being dogfooded. **Required fix shape:** (a) compare the embedded build `git_sha` / build date with the current workspace git HEAD and dirty state when the binary can discover a containing worktree; (b) expose redaction-safe structured fields in `version --output-format json`, `status --output-format json`, and `doctor --output-format json`, including `binary_provenance`, `workspace_head`, and `stale_binary` (with enough reason/detail to distinguish clean match, dirty workspace, unknown workspace, and definite stale SHA mismatch); (c) warn in human/text mode when executing a stale local debug binary such as `./rust/target/debug/claw` so dogfooders do not trust old behavior as current-main evidence; (d) avoid leaking secrets or absolute sensitive paths beyond the existing workspace-identification policy; (e) add regression/fixture coverage for matching HEAD, dirty workspace, no-worktree/unknown provenance, and stale embedded SHA cases. **Why this matters:** status/doctor/version are supposed to be the machine-readable basis for dogfood truth. If a stale binary can report a different `git_sha` than the checked-out repo without any freshness warning, claws can file or verify bugs against the wrong code and waste cycles chasing already-fixed or not-yet-built behavior. Source: gaebal-gajae dogfood follow-up from current main `e7074f47` after PR #2838; observed `./rust/target/debug/claw version --output-format json` reporting `git_sha` `1f901988` with no stale-binary-vs-workspace-HEAD warning.
325. **`help --output-format json` returns valid JSON but hides the actual help schema inside one prose `message` string** — dogfooded 2026-04-29 on current `origin/main` / workspace HEAD `d607ff36`. Running `./rust/target/debug/claw help --output-format json` produces parseable JSON, but the object only exposes top-level keys like `kind` and `message`; all command names, global flags, slash-command metadata, aliases, resume-safety, output-format support, auth/preflight notes, and descriptions are flattened into one human-oriented prose blob. That technically satisfies “valid JSON” while still forcing automation to scrape the same help text humans read, making `/issue`, `/help`, and resume-safety contracts opaque to claws. **Required fix shape:** (a) keep `message` as the compact human-rendered help summary, but add a documented structured schema with `schema` / `schema_version` fields; (b) expose first-class arrays/objects such as `commands[]`, `options[]`, and `slash_commands[]` with stable fields including `name`, `aliases`, `description`, `args`, `output_formats_supported`, `resume_safe`, `interactive_only`, and `creates_external_side_effects`; (c) include auth and creation preflight metadata where relevant, especially for GitHub/issue flows (`auth_preflight`, `creation_unavailable`, `gh_cli_authenticated`, `github_token_present`, or equivalent non-secret state); (d) make `/issue`, `/help`, aliases, and resume-dispatch safety machine-readable from the JSON payload instead of recoverable only by parsing prose markers; (e) add regression coverage proving `help --output-format json` is valid JSON and that `/issue`, `/help`, resume-safe vs interactive-only slash commands, aliases, descriptions, supported output formats, and side-effect/auth-preflight fields are present and internally consistent. **Why this matters:** help JSON is the discoverability surface automation uses before invoking commands. If it is just prose wrapped in JSON, claws cannot safely decide whether a command can run non-interactively, resume from a saved session, create external GitHub side effects, or requires auth/preflight without brittle text scraping. Source: gaebal-gajae dogfood follow-up from current main `d607ff36`; observed `./rust/target/debug/claw help --output-format json` returning valid JSON with only `{kind,message}` at the top level while the actionable command schema remained buried in `message`.
336. **`/session` and `/session list` have inconsistent resume-mode support: bare `/session` returns `"unsupported resumed slash command"` while `/session list` works in `--resume` — yet `/help` JSON exposes no per-subcommand resume flag** — dogfooded 2026-04-29 by Jobdori on current main (`b90875f`). Running `claw --output-format json --resume latest /session` returns `{"command":"/session","error":"unsupported resumed slash command","type":"error"}`. Running `claw --output-format json --resume latest /session list` returns a valid session list JSON. The two forms share the same slash command name but have divergent resume-mode contracts with no documentation path to learn which subcommands are resume-safe. Additionally, the error JSON uses `"type"` instead of `"kind"` — violating the vocabulary used by all other JSON outputs (`kind=status`, `kind=version`, `kind=doctor`, etc.), making error responses structurally inconsistent with success responses. **Required fix shape:** (a) document per-subcommand resume support in `/help` JSON output — each command entry should carry `"resume_subcommands": ["list"]` or similar so callers know which subcommands work in `--resume` mode; (b) align the error JSON vocabulary: replace `"type"` with `"kind"` and use `kind=error` consistently; (c) either mark `/session` bare as `[resume]`-safe (showing the list by default) or emit a typed error that names the resume-safe subcommands; (d) add regression coverage proving that all error JSON responses use `"kind"` not `"type"`, and that `/session list` resume-mode support is stable. **Why this matters:** callers using `/session` in automation cannot discover from `/help` or error text which subcommands work in resume-mode; the `type` vs `kind` vocabulary mismatch in errors also breaks generic JSON-kind dispatchers that expect a stable `kind` field. Source: Jobdori live dogfood on mengmotaHost, claw-code `b90875f`, 2026-04-29.

View File

@@ -13619,3 +13619,72 @@ mod dump_manifests_tests {
let _ = fs::remove_dir_all(&root);
}
}
#[cfg(test)]
mod doctor_broad_cwd_tests {
//! #122b regression tests: doctor's workspace check must surface broad-cwd
//! as a warning, matching runtime (Prompt/REPL) refuse-to-run behavior.
//! Without these, `claw doctor` in ~/ or / reports "ok" while `claw prompt`
//! in the same dir errors out — diagnostic deception.
use super::{check_workspace_health, render_diagnostic_check, StatusContext};
use std::path::PathBuf;
fn make_ctx(cwd: PathBuf, project_root: Option<PathBuf>) -> StatusContext {
use super::{SessionLifecycleKind, SessionLifecycleSummary};
use runtime::SandboxStatus;
StatusContext {
cwd,
session_path: None,
loaded_config_files: 0,
discovered_config_files: 0,
memory_file_count: 0,
project_root,
git_branch: None,
git_summary: super::parse_git_workspace_summary(None),
sandbox_status: SandboxStatus::default(),
config_load_error: None,
session_lifecycle: SessionLifecycleSummary {
kind: SessionLifecycleKind::SavedOnly,
pane_id: None,
pane_command: None,
pane_path: None,
workspace_dirty: false,
abandoned: false,
},
}
}
#[test]
fn workspace_check_in_project_dir_reports_ok() {
// #122b non-regression: non-broad project dir should stay OK.
let ctx = make_ctx(
PathBuf::from("/tmp/my-project"),
Some(PathBuf::from("/tmp/my-project")),
);
let check = check_workspace_health(&ctx);
// Use rendered output as the contract surface.
let rendered = render_diagnostic_check(&check);
assert!(
rendered.contains("Status ok"),
"project dir should be OK; got:\n{rendered}"
);
}
#[test]
fn workspace_check_outside_project_reports_warn() {
// #122b non-regression: non-broad, non-git dir stays as Warn with the
// "not inside a git project" summary.
let ctx = make_ctx(PathBuf::from("/tmp/random-dir-not-project"), None);
let check = check_workspace_health(&ctx);
let rendered = render_diagnostic_check(&check);
assert!(
rendered.contains("Status warn"),
"non-git dir should warn; got:\n{rendered}"
);
assert!(
rendered.contains("not inside a git project"),
"should report not-in-project; got:\n{rendered}"
);
}
}