Compare commits

...

4 Commits

View File

@@ -12275,3 +12275,147 @@ claw status --session clawcode-human # → no approval-blocked indicator
**Status:** Open. No code changed. Merge-wait mode. Filed from DOGFOOD_FINDINGS.md evidence (gaebal-gajae).
🪨
---
## Pinpoint #200 — SCHEMAS.md / classifier comments self-documenting drift: declarative claims diverge from implementation with no derive-from-source enforcement (Q *YeonGyu Kim, cycle #304)
**Observed:** SCHEMAS.md action-field counts and classifier comments claim coverage over implementation-enumerable facts (e.g. event kinds, action verbs, field lists). Over time these declarative claims silently diverge from actual code — pattern already observed at #170 (classifier 4-verb sweep) and #172 (action-field count drift). No test enforces that SCHEMAS.md reflects current implementation; document can go stale without any CI signal.
**Gap:**
- SCHEMAS.md field/kind enumerations are hand-maintained with no automated sync
- Classifier comments referencing "N action verbs" or "M event types" have no derive-from-source test
- Divergence is invisible until a human audits both doc and code manually
- Pattern recurs: #170 found 4-verb classifier claim vs actual 6-verb set; #172 found action-field count mismatch
**Repro:**
```
# Check SCHEMAS.md event kind list vs actual runtime event kinds
grep -E 'kind:' SCHEMAS.md | wc -l
grep -rE 'kind:.*=' src/ | grep -v test | wc -l
# Counts diverge with no CI gate
```
**Expected:** A derive-from-source test (e.g. `test_schemas_md_event_kinds_complete`) that parses SCHEMAS.md claimed enumerations and asserts they match implementation-enumerable facts. Fails loudly on drift.
**Fix sketch:**
1. Add `tests/test_schemas_doc_parity.py` — parse SCHEMAS.md event-kind list, compare to runtime-emitted kinds
2. Add similar check for classifier action-verb claims vs actual classifier verb set
3. Gate on CI so SCHEMAS.md updates are forced when implementation changes
**Status:** Open. No code changed. Merge-wait mode. Filed from Q *YeonGyu Kim cycle #304 observation.
🪨
---
## Pinpoint #201 — `parse_tool_arguments` silent fallback: malformed JSON tool args wrapped as `{"raw": ...}`, no structured error event emitted (Jobdori, cycle #134)
**Observed:** In `rust/crates/api/src/providers/openai_compat.rs`, `parse_tool_arguments()` (line ~1223) silently converts malformed JSON tool arguments to `json!({ "raw": arguments })` with no error event, no log entry, and no structured signal. Downstream consumers receive what appears to be a valid parsed object with a `raw` field — the parse failure is completely invisible.
**Gap:**
- A tool call with malformed arguments (e.g. `arguments: "not json"`) is silently normalized to `{"raw": "not json"}`
- No `tool_parse_error` event or `parse_error` field is emitted
- Classifier and orchestrator see an apparently-valid tool args object — they cannot distinguish "arguments parsed cleanly" from "arguments were garbage and got wrapped"
- Error surfaces only when downstream logic tries to access expected keys and gets `None` — attribution is lost by then
**Repro:**
```
# Simulate a provider returning malformed tool call arguments
# claw receives chunk with: tool_call.function.arguments = "not valid json {"
# parse_tool_arguments("not valid json {") → Ok({"raw": "not valid json {"})
# No error logged, no event emitted, session continues as if parse succeeded
```
**Expected:**
- `parse_tool_arguments` returns a typed result: either parsed object or a `ToolParseError { raw: String, error: String }`
- On parse failure, emit structured event: `{ "kind": "tool_arg_parse_error", "tool_index": N, "raw": "...", "parse_error": "..." }`
- Session status reflects parse failure; `claw doctor` can surface it
- Downstream code can distinguish clean parse from fallback wrap
**Fix sketch:**
1. Change return type of `parse_tool_arguments` to `Result<Value, ToolArgParseError>`
2. On error path, emit `tool_arg_parse_error` event before returning fallback
3. Include `parse_error` field alongside `raw` in fallback value so downstream can detect it: `json!({ "raw": arguments, "__parse_error": err.to_string() })`
4. Add to classifier's recognized error taxonomy
**Status:** Open. No code changed. Filed 2026-04-25 05:02 KST. Branch: feat/jobdori-168c-emission-routing. HEAD: b780c80.
🪨
---
## Pinpoint #202 — `sanitize_tool_message_pairing` silent drop: orphaned tool messages removed with no event, no log, no diagnostic visibility (Jobdori, cycle #135)
**Observed:** In `rust/crates/api/src/providers/openai_compat.rs`, `sanitize_tool_message_pairing()` (called at line ~868) silently drops any `role:"tool"` message whose `tool_call_id` has no matching preceding `assistant` turn with a `tool_calls[].id`. The drop is intentional (prevents 400s from OpenAI-compat backends) but produces zero structured signal: no event, no log entry, no field in the request envelope indicating N messages were removed.
**Gap:**
- A session with compaction, editing, or resume can arrive at the request boundary with orphaned tool messages
- These are quietly dropped; the provider receives a request with fewer messages than the session history claims
- No `tool_message_dropped` event or `history_sanitized` field is emitted
- `claw doctor`, the event log, and downstream observers cannot distinguish "all tool messages sent" from "N tool messages silently omitted"
- Debugging mismatch between session history and what the provider actually received requires source-level tracing
**Repro:**
```
# Craft a session where a tool result has no matching assistant tool_calls entry
# (e.g. resume after compaction that dropped the assistant turn but kept the result)
# sanitize_tool_message_pairing() drops the orphan silently
# Event log shows no drop event
# Provider receives history minus the orphaned message; caller sees no indication
```
**Expected:**
- On any drop, emit structured event: `{ "kind": "tool_message_dropped", "tool_call_id": "...", "count": N, "reason": "no_paired_assistant_turn" }`
- Optionally: include `{ "history_sanitized": { "dropped_tool_messages": N } }` in request metadata
- `claw doctor` can surface sessions where tool message sanitization occurred
- Clawability: agents replaying or resuming sessions can detect the gap and re-issue the tool call or warn
**Fix sketch:**
1. Change `sanitize_tool_message_pairing` to return `(Vec<Value>, Vec<DroppedToolMessage>)` or emit events via a callback/channel
2. At call site (line ~868), if any drops occurred, emit `tool_message_dropped` event(s) before sending request
3. Add `dropped_tool_messages` count to request diagnostic envelope if non-zero
4. Add to classifier's recognized event taxonomy
**Status:** Open. No code changed. Filed 2026-04-25 06:09 KST. Branch: feat/jobdori-168c-emission-routing. HEAD: 5e0228d.
🪨
---
## Pinpoint #203 — `AutoCompactionEvent` is summary-only: no streaming SSE event emitted when auto-compaction fires mid-turn (Jobdori, cycle #136)
**Observed:** In `rust/crates/runtime/src/conversation.rs`, `maybe_auto_compact()` (line ~555) fires compaction between turns and returns an `AutoCompactionEvent { removed_message_count }`. This event is attached to `TurnSummary.auto_compaction` and only surfaces in the post-turn struct returned by `run_turn()`. It is not emitted as a streaming SSE event at the moment compaction occurs.
**Gap:**
- A claw monitoring the SSE stream during a long multi-turn session cannot detect that compaction fired until the final `TurnSummary` JSON arrives (or, in JSON output mode, until the CLI prints the final response envelope)
- Between compaction firing and the final summary, the session history has already been truncated — any mid-turn state the claw was tracking against the old history is now stale
- No `session_compacted` or `auto_compaction` SSE event exists; the classifier's event taxonomy has no entry for it
- `claw doctor` cannot surface "this session has been auto-compacted N times" or "compaction removed M messages in the last turn"
- Claws relying on replay-by-session-history for context reconstruction silently receive a shorter history with no notification
**Repro:**
```
# Run a session long enough to trigger auto-compaction
# Monitor the SSE stream during the turn
# Observe: no event with kind=session_compacted or similar appears in the stream
# The only signal is the post-turn auto_compaction field in the JSON summary
# If running in interactive TUI mode, the only signal is the printed compaction notice
```
**Expected:**
- When `maybe_auto_compact()` removes messages, emit a streaming SSE event immediately: `{ "kind": "session_compacted", "removed_message_count": N, "retained_message_count": M, "trigger": "auto" }`
- `claw doctor` surfaces sessions with auto-compaction history and message counts
- Classifier recognizes `session_compacted` as a first-class event kind
- `/compact` manual command similarly emits this event (currently only prints a user-facing string)
**Fix sketch:**
1. Add `session_compacted` to the `StreamEvent` enum (or as a diagnostic event channel alongside it)
2. In `maybe_auto_compact()`, after compaction, push `session_compacted` event through the event channel before continuing the turn
3. Expose count in the event: `{ kind: "session_compacted", removed_message_count: N }`
4. Wire manual `/compact` command to emit the same event
5. Add to classifier event taxonomy and `claw doctor` output
**Status:** Open. No code changed. Filed 2026-04-25 07:47 KST. Branch: feat/jobdori-168c-emission-routing. HEAD: 0730183.
🪨