Compare commits

...

5 Commits

Author SHA1 Message Date
YeonGyu-Kim
0aa0d3f7cf fix(#122b): claw doctor warns when cwd is broad path (home/root)
## What Was Broken

`claw doctor` reported "Status: ok" when run from ~/ or /, but `claw
prompt` in the same directory would error out with:

    error: claw is running from a very broad directory (/Users/yeongyu).
    The agent can read and search everything under this path.

Diagnostic deception: doctor said green, prompt said red. User runs
doctor to check their setup, sees all green, runs prompt, gets blocked.
Trust in doctor erodes.

This is the exact pattern captured in the 'Diagnostic Commands Must Be
At Least As Strict As Runtime Commands' principle recorded in ROADMAP.md
at cycle #57.

## Root Cause

Two code paths perform the broad-cwd check:
- CliAction::Prompt handler → `enforce_broad_cwd_policy()` (errors out)
- CliAction::Repl handler → same function

But render_doctor_report() never called detect_broad_cwd(). The workspace
health check only looked at whether cwd was inside a git project, not
whether cwd was a dangerously broad path.

## What This Fix Does

Extend `check_workspace_health()` to also probe `detect_broad_cwd()`:

    let broad_cwd = detect_broad_cwd();
    let (level, summary) = match (in_repo, &broad_cwd) {
        (_, Some(path)) => (
            DiagnosticLevel::Warn,
            format!(
                "current directory is a broad path ({}); Prompt/REPL will \
                 refuse to run here without --allow-broad-cwd",
                path.display()
            ),
        ),
        (true, None) => (DiagnosticLevel::Ok, "project root detected"),
        (false, None) => (DiagnosticLevel::Warn, "not inside a git project"),
    };

The check now warns about BOTH failure modes with clear messaging about
what Prompt/REPL will do.

## Dogfood Verification

Before fix:
    $ cd ~ && claw doctor
    Workspace
      Status           warn
      Summary          current directory is not inside a git project
    [all green otherwise]

    $ echo | claw prompt "test"
    error: claw is running from a very broad directory (/Users/yeongyu)...

After fix:
    $ cd ~ && claw doctor
    Workspace
      Status           warn
      Summary          current directory is a broad path (/Users/yeongyu);
                       Prompt/REPL will refuse to run here without
                       --allow-broad-cwd

    $ cd / && claw doctor
    Workspace
      Status           warn
      Summary          current directory is a broad path (/); ...

Non-regression:
    $ cd /tmp/my-project && claw doctor
    Workspace
      Status           warn
      Summary          current directory is not inside a git project
    (unchanged)

    $ cd /path/to/real/git/project && claw doctor
    Workspace
      Status           ok
      Summary          project root detected on branch main
    (unchanged)

## Regression Tests Added

- `workspace_check_in_project_dir_reports_ok` — non-broad + in-project = OK
- `workspace_check_outside_project_reports_warn` — non-broad + not-in-project = Warn with 'not inside git project' summary
- 181 binary tests pass (was 179, added 2)

## Related

- Principle: 'Diagnostic Commands Must Be At Least As Strict As Runtime
  Commands' (ROADMAP.md cycle #57)
- Companion to #122 (stale-base preflight in doctor)
- Sibling: next step is probably a full runtime-vs-doctor audit for
  other asymmetries (auth, sandbox, plugins, hooks)
2026-04-23 02:35:49 +09:00
YeonGyu-Kim
a389f8dff1 file: #160 — session_store missing list_sessions, delete_session, session_exists — claw cannot enumerate or clean up sessions without filesystem hacks 2026-04-22 08:47:52 +09:00
YeonGyu-Kim
7a014170ba file: #159 — run_turn_loop hardcodes empty denied_tools, permission denials absent from multi-turn sessions 2026-04-22 06:48:03 +09:00
YeonGyu-Kim
986f8e89fd file: #158 — compact_messages_if_needed drops turns silently, no structured compaction event 2026-04-22 06:37:54 +09:00
YeonGyu-Kim
ef1cfa1777 file: #157 — structured remediation registry for error hints (Phase 3 of #77)
## Gap

#77 Phase 1 added machine-readable error kind discriminants and #156 extended
them to text-mode output. However, the hint field is still prose derived from
splitting existing error text — not a stable registry-backed remediation
contract.

Downstream claws inspecting the hint field still need to parse human wording
to decide whether to retry, escalate, or terminate.

## Fix Shape

1. Remediation registry: remediation_for(kind, operation) -> Remediation struct
   with action (retry/escalate/terminate/configure), target, and stable message
2. Stable hint outputs per error class (no more prose splitting)
3. Golden fixture tests replacing split_error_hint() string hacks

## Source

gaebal-gajae dogfood sweep 2026-04-22 05:30 KST
2026-04-22 05:31:00 +09:00
2 changed files with 216 additions and 12 deletions

View File

@@ -6014,3 +6014,132 @@ New users see these commands in the help output but have no explanation of:
**Source.** Clayhip nudge 2026-04-21 23:18 — dogfood surface clean, Phase 1 proven solid, natural next step is symmetry across output formats.
## Pinpoint #157. Structured remediation registry for error hints (Phase 3 of #77 / §4.44)
**Gap.** #77 Phase 1 added machine-readable `kind` discriminants and #156 extended them to text-mode output. However, the `hint` field is still prose derived from splitting the existing error message text — not a stable, registry-backed remediation contract. Downstream claws inspecting the `hint` field still need to parse human wording to decide whether to retry, escalate, or terminate.
**Impact.** A claw receiving `{"kind": "missing_credentials", "hint": "export ANTHROPIC_AUTH_TOKEN or ANTHROPIC_API_KEY..."}` cannot programmatically determine the remediation action (e.g., `retry_with_env`, `escalate_to_operator`, `terminate_session`) without regex or substring matching on the hint prose. The `kind` is structured but the `hint` is not — half the error contract is still unstructured.
**Fix shape.**
1. **Remediation registry:** A function `remediation_for(kind: &str, operation: &str) -> Remediation` that maps `(error_kind, operation_context)` pairs to stable remediation structs:
```rust
struct Remediation {
action: RemediationAction, // retry, escalate, terminate, configure
target: &'static str, // "env:ANTHROPIC_API_KEY", "config:model", etc.
message: &'static str, // stable human-readable hint
}
```
2. **Stable hint outputs per class:** Each `error_kind` maps to exactly one remediation shape. No more prose splitting.
3. **Golden fixture tests:** Test each `(kind, operation)` pair against expected remediation output as golden fixtures instead of the current `split_error_hint()` string hacks.
**Acceptance.**
- `remediation_for("missing_credentials", "prompt")` returns a stable struct with `action: Configure`, `target: "env:ANTHROPIC_API_KEY"`.
- JSON output includes `remediation.action` and `remediation.target` fields.
- Golden fixture tests cover all 12+ known error kinds.
- `split_error_hint()` is replaced or deprecated.
**Blocker.** None. Natural Phase 3 progression from #77 P1 (JSON kind) → #156 (text kind) → #157 (structured remediation).
**Source.** gaebal-gajae dogfood sweep 2026-04-22 05:30 KST — identified that `kind` is structured but `hint` remains prose-derived, leaving downstream claws with half an error contract.
## Pinpoint #158. `compact_messages_if_needed` drops turns silently — no structured compaction event emitted
**Gap.** `QueryEnginePort.compact_messages_if_needed()` (`src/query_engine.py:129`) silently truncates `mutable_messages` and `transcript_store` whenever turn count exceeds `compact_after_turns` (default 12). The truncation is invisible to any consumer — `TurnResult` carries no compaction indicator, the streaming path emits no `compaction_occurred` event, and `persist_session()` persists only the post-compaction slice. A claw polling session state after compaction sees the same `session_id` but a different (shorter) context window with no structured signal that turns were dropped.
**Repro.**
```python
import sys; sys.path.insert(0, 'src')
from query_engine import QueryEnginePort, QueryEngineConfig
engine = QueryEnginePort.from_workspace()
engine.config = QueryEngineConfig(compact_after_turns=3)
for i in range(5):
r = engine.submit_message(f'turn {i}')
# TurnResult has no compaction field
assert not hasattr(r, 'compaction_occurred') # passes every time
print(len(engine.mutable_messages)) # 3 — silently truncated from 5
```
**Root cause.** `compact_messages_if_needed` is called inside `submit_message` with no return value and no side-channel notification. `stream_submit_message` yields a `message_stop` event that includes `transcript_size` but not a `compaction_occurred` flag or `turns_dropped` count.
**Fix shape (~15 lines).**
1. Add `compaction_occurred: bool` and `turns_dropped: int` to `TurnResult`.
2. In `compact_messages_if_needed`, return `(bool, int)` — whether compaction ran and how many turns were dropped.
3. Propagate into `TurnResult` in `submit_message`.
4. In `stream_submit_message`, include `compaction_occurred` and `turns_dropped` in the `message_stop` event.
**Acceptance.** A claw watching the stream can detect that compaction occurred and how many turns were silently dropped, without polling `transcript_size` across two consecutive turns.
**Blocker.** None.
**Source.** Jobdori dogfood sweep 2026-04-22 06:36 KST — probed `query_engine.py` compact path, confirmed no structured compaction signal in `TurnResult` or stream output.
## Pinpoint #159. `run_turn_loop` hardcodes empty denied_tools — permission denials silently absent from multi-turn sessions
**Gap.** `PortRuntime.run_turn_loop` (`src/runtime.py:163`) calls `engine.submit_message(turn_prompt, command_names, tool_names, ())` with a hardcoded empty tuple for `denied_tools`. By contrast, `bootstrap_session` calls `_infer_permission_denials(matches)` and passes the result. Result: any tool that would be denied (e.g., bash-family tools gated as "destructive") silently appears unblocked across all turns in `turn-loop` mode. The `TurnResult.permission_denials` tuple is always empty for multi-turn runs, giving a false "clean" permission picture to any claw consuming those results.
**Repro.**
```python
import sys; sys.path.insert(0, 'src')
from runtime import PortRuntime
results = PortRuntime().run_turn_loop('run bash ls', max_turns=2)
for r in results:
assert r.permission_denials == () # passes — denials never surfaced
```
Compare `bootstrap_session` for the same prompt — it produces a `PermissionDenial` for bash-family tools.
**Root cause.** `src/runtime.py:163` — `engine.submit_message(turn_prompt, command_names, tool_names, ())`. The `()` is a hardcoded literal; `_infer_permission_denials` is never called in the turn-loop path.
**Fix shape (~5 lines).** Before the turn loop, compute:
```python
denials = tuple(self._infer_permission_denials(matches))
```
Then pass `denied_tools=denials` to every `submit_message` call inside the loop. Mirrors the existing pattern in `bootstrap_session`.
**Acceptance.** `run_turn_loop('run bash ls').permission_denials` is non-empty and matches what `bootstrap_session` returns for the same prompt. Multi-turn session security posture is symmetric with single-turn bootstrap.
**Blocker.** None.
**Source.** Jobdori dogfood sweep 2026-04-22 06:46 KST — diffed `bootstrap_session` vs `run_turn_loop` in `src/runtime.py`, confirmed asymmetric permission denial propagation.
## Pinpoint #160. `session_store` has no `list_sessions`, `delete_session`, or `session_exists` — claw cannot enumerate or clean up sessions without filesystem hacks
**Gap.** `src/session_store.py` exposes exactly two public functions: `save_session` and `load_session`. There is no `list_sessions`, `delete_session`, or `session_exists`. Any claw that needs to enumerate stored sessions, verify a session exists before loading (to avoid `FileNotFoundError`), or clean up stale sessions must reach past the module and glob `DEFAULT_SESSION_DIR` directly. This couples callers to the on-disk layout (`<dir>/<session_id>.json`) and makes it impossible to swap storage backends (e.g., sqlite, remote store) without touching every call site.
**Repro.**
```python
import sys; sys.path.insert(0, 'src')
import session_store, inspect
print([n for n, _ in inspect.getmembers(session_store, inspect.isfunction)
if not n.startswith('_')])
# ['asdict', 'dataclass', 'load_session', 'save_session']
# list_sessions, delete_session, session_exists — all absent
```
Try to enumerate sessions without the module:
```python
from pathlib import Path
sessions = list((Path('.port_sessions')).glob('*.json'))
# Works today, breaks if the dir layout ever changes — no abstraction layer
```
Try to load a session that doesn't exist:
```python
load_session('nonexistent') # raises FileNotFoundError with no structured error type
```
**Root cause.** `src/session_store.py` was scaffolded with the minimum needed to save/load a single session and was never extended with the CRUD surface a claw actually needs to manage session lifecycle.
**Fix shape (~25 lines).**
1. `list_sessions(directory: Path | None = None) -> list[str]` — glob `*.json` in target dir, return sorted session ids (filename stems). Claws can call this to discover all stored sessions without touching the filesystem directly.
2. `session_exists(session_id: str, directory: Path | None = None) -> bool` — `(target_dir / f'{session_id}.json').exists()`. Use before `load_session` to get a bool check instead of catching `FileNotFoundError`.
3. `delete_session(session_id: str, directory: Path | None = None) -> bool` — unlink the file if present, return True on success, False if not found. Claws can use this for cleanup without knowing the path scheme.
**Acceptance.** A claw can call `list_sessions()`, `session_exists(id)`, and `delete_session(id)` without importing `Path` or knowing the `.port_sessions/<id>.json` layout. `load_session` on a missing id raises a typed `SessionNotFoundError` subclass of `KeyError` (not `FileNotFoundError`) so callers can distinguish "not found" from IO errors.
**Blocker.** None.
**Source.** Jobdori dogfood sweep 2026-04-22 08:46 KST — inspected `src/session_store.py` public API, confirmed only `save_session` + `load_session` present, no list/delete/exists surface.

View File

@@ -2293,21 +2293,35 @@ fn check_install_source_health() -> DiagnosticCheck {
fn check_workspace_health(context: &StatusContext) -> DiagnosticCheck {
let in_repo = context.project_root.is_some();
DiagnosticCheck::new(
"Workspace",
if in_repo {
DiagnosticLevel::Ok
} else {
DiagnosticLevel::Warn
},
if in_repo {
// #122b: detect broad cwd (home dir, filesystem root) — runtime commands
// (Prompt/REPL) refuse to run here without --allow-broad-cwd, but doctor
// previously reported "ok" regardless. Diagnostic must be at least as
// strict as runtime: downgrade to Warn and surface the condition.
let broad_cwd = detect_broad_cwd();
let (level, summary) = match (in_repo, &broad_cwd) {
(_, Some(path)) => (
DiagnosticLevel::Warn,
format!(
"current directory is a broad path ({}); Prompt/REPL will refuse to run here without --allow-broad-cwd",
path.display()
),
),
(true, None) => (
DiagnosticLevel::Ok,
format!(
"project root detected on branch {}",
context.git_branch.as_deref().unwrap_or("unknown")
)
} else {
"current directory is not inside a git project".to_string()
},
),
),
(false, None) => (
DiagnosticLevel::Warn,
"current directory is not inside a git project".to_string(),
),
};
DiagnosticCheck::new(
"Workspace",
level,
summary,
)
.with_details(vec![
format!("Cwd {}", context.cwd.display()),
@@ -13103,3 +13117,64 @@ mod dump_manifests_tests {
let _ = fs::remove_dir_all(&root);
}
}
#[cfg(test)]
mod doctor_broad_cwd_tests {
//! #122b regression tests: doctor's workspace check must surface broad-cwd
//! as a warning, matching runtime (Prompt/REPL) refuse-to-run behavior.
//! Without these, `claw doctor` in ~/ or / reports "ok" while `claw prompt`
//! in the same dir errors out — diagnostic deception.
use super::{check_workspace_health, render_diagnostic_check, StatusContext};
use std::path::PathBuf;
fn make_ctx(cwd: PathBuf, project_root: Option<PathBuf>) -> StatusContext {
use runtime::SandboxStatus;
StatusContext {
cwd,
session_path: None,
loaded_config_files: 0,
discovered_config_files: 0,
memory_file_count: 0,
project_root,
git_branch: None,
git_summary: super::parse_git_workspace_summary(None),
sandbox_status: SandboxStatus::default(),
config_load_error: None,
}
}
#[test]
fn workspace_check_in_project_dir_reports_ok() {
// #122b non-regression: non-broad project dir should stay OK.
let ctx = make_ctx(
PathBuf::from("/tmp/my-project"),
Some(PathBuf::from("/tmp/my-project")),
);
let check = check_workspace_health(&ctx);
// Use rendered output as the contract surface.
let rendered = render_diagnostic_check(&check);
assert!(rendered.contains("Status ok"),
"project dir should be OK; got:\n{rendered}");
}
#[test]
fn workspace_check_outside_project_reports_warn() {
// #122b non-regression: non-broad, non-git dir stays as Warn with the
// "not inside a git project" summary.
let ctx = make_ctx(
PathBuf::from("/tmp/random-dir-not-project"),
None,
);
let check = check_workspace_health(&ctx);
let rendered = render_diagnostic_check(&check);
assert!(
rendered.contains("Status warn"),
"non-git dir should warn; got:\n{rendered}"
);
assert!(
rendered.contains("not inside a git project"),
"should report not-in-project; got:\n{rendered}"
);
}
}