Compare commits

8 Commits

Author SHA1 Message Date
Yeachan-Heo
2e34949507 Keep latest-session timestamps increasing under tight loops
The next repo-local sweep target was ROADMAP #73: repeated backlog
sweeps exposed that session writes could share the same wall-clock
millisecond, which made semantic recency fragile and forced the
resume-latest regression to sleep between saves. The fix makes session
timestamps monotonic within the process and removes the timing hack
from the test so latest-session selection stays stable under tight
loops.

Constraint: Preserve the existing session file format while changing only the timestamp source semantics
Rejected: Keep the sleep-based test workaround | hides the real ordering hazard instead of fixing timestamp generation
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Any future session-recency logic must keep `current_time_millis`, ordering tests, and latest-session expectations aligned
Tested: cargo fmt --all --check; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace; architect review APPROVE
Not-tested: Cross-process monotonicity when multiple binaries write sessions concurrently
2026-04-12 10:51:19 +00:00
Yeachan-Heo
8f53524bd3 Make backlog-scan lanes say what they actually selected
The next repo-local sweep target was ROADMAP #65: backlog-scanning
lanes could stop with prose-only summaries naming roadmap items, but
there was no machine-readable record of which items were chosen,
which were skipped, or whether the lane intended to execute, review,
or no-op. The fix teaches completed lane persistence to extract a
structured selection outcome while preserving the existing quality-
floor and review-verdict behavior for other lanes.

Constraint: Keep selection-outcome extraction on the existing `lane.finished` metadata path instead of inventing a separate event stream
Rejected: Add a dedicated selection event type first | unnecessary for this focused closeout because `lane.finished` already persists structured data downstream can read
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: If backlog-scan summary conventions change later, update `extract_selection_outcome`, its regression test, and the ROADMAP closeout wording together
Tested: cargo fmt --all --check; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace; architect review APPROVE after roadmap closeout update
Not-tested: Downstream consumers that may still ignore `lane.finished.data.selectionOutcome`
2026-04-12 09:54:37 +00:00
Yeachan-Heo
b5e30e2975 Make completed review lanes emit machine-readable verdicts
The next repo-local sweep target was ROADMAP #67: scoped review lanes
could stop with prose-only output, leaving downstream consumers to infer
approval or rejection from later chatter. The fix teaches completed lane
persistence to recognize review-style `APPROVE`/`REJECT`/`BLOCKED`
results, attach structured verdict metadata to `lane.finished`, and keep
ordinary non-review lanes on the existing quality-floor path.

Constraint: Preserve the existing non-review lane summary path while enriching only review-style completions
Rejected: Add a brand-new lane event type just for review results | unnecessary when `lane.finished` already carries structured metadata and downstream consumers can read it there
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: If review verdict parsing changes later, update `extract_review_outcome`, the finished-event payload fields, and the review-lane regression together
Tested: cargo fmt --all --check; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace; architect review APPROVE
Not-tested: External consumers that may still ignore `lane.finished.data.reviewVerdict`
2026-04-12 08:49:40 +00:00
Yeachan-Heo
dbc2824a3e Keep latest session selection tied to real session recency
The next repo-local sweep target was ROADMAP #72: the `latest`
managed-session alias could depend on filesystem mtime before the
session's own persisted recency markers, which made the selection
path vulnerable to coarse or misleading file timestamps. The fix
promotes `updated_at_ms` into the summary/order path, keeps CLI
wrappers in sync, and locks the mtime-vs-session-recency case with
regression coverage.

Constraint: Preserve existing managed-session storage layout while changing only the ordering signal
Rejected: Keep sorting by filesystem mtime and just sleep longer in tests | hides the semantic ordering bug instead of fixing it
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Any future managed-session ordering change must keep runtime and CLI summary structs aligned on the same recency fields
Tested: cargo fmt --all --check; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace; architect review APPROVE
Not-tested: Cross-filesystem behavior where persisted session JSON cannot be read and fallback ordering uses mtime only
2026-04-12 07:49:32 +00:00
Yeachan-Heo
f309ff8642 Stop repo lanes from executing the wrong task payload
The next repo-local sweep target was ROADMAP #71: a claw-code lane
accepted an unrelated KakaoTalk/image-analysis prompt even though the
lane itself was supposed to be repo-scoped work. This extends the
existing prompt-misdelivery guardrail with an optional structured task
receipt so worker boot can reject visible wrong-task context before the
lane continues executing.

Constraint: Keep the fix inside the existing worker_boot / WorkerSendPrompt control surface instead of inventing a new external OMX-only protocol
Rejected: Treat wrong-task receipts as generic shell misdelivery | loses the expected-vs-observed task context needed to debug contaminated lanes
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: If task-receipt fields change later, update the WorkerSendPrompt schema, worker payload serialization, and wrong-task regression together
Tested: cargo fmt --all --check; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace; architect review APPROVE
Not-tested: External orchestrators that have not yet started populating the optional task_receipt field
2026-04-12 07:00:07 +00:00
Yeachan-Heo
3b806702e7 Make the CLI point users at the real install source
The next repo-local backlog item was ROADMAP #70: users could
mistake third-party pages or the deprecated `cargo install
claw-code` path for the official install route. The CLI now
surfaces the source of truth directly in `claw doctor` and
`claw --help`, and the roadmap closeout records the change.

Constraint: Keep the fix inside repo-local Rust CLI surfaces instead of relying on docs alone
Rejected: Close #70 with README-only wording | the bug was user-facing CLI ambiguity, so the warning needed to appear in runtime help/doctor output
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: If install guidance changes later, update both the doctor check payload and the help-text warning together
Tested: cargo fmt --all --check; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace; architect review APPROVE
Not-tested: Third-party websites outside this repo that may still present stale install instructions
2026-04-12 04:50:03 +00:00
Yeachan-Heo
26b89e583f Keep completed lanes from ending on mushy stop summaries
The next repo-local sweep target was ROADMAP #69: completed lane
runs could persist vague control text like “commit push everyting,
keep sweeping $ralph”, which made downstream stop summaries
operationally useless. The fix adds a lane-finished quality floor
that preserves strong summaries, rewrites empty/control-only/too-
short-without-context summaries into a contextual fallback, and
records structured metadata explaining when the fallback fired.

Constraint: Keep legitimate concise lane summaries intact while improving only low-signal completions
Rejected: Blanket-rewrite every completed summary into a templated sentence | would erase useful model-authored detail from good lane outputs
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: If lane-finished summary heuristics change later, update the structured `qualityFloorApplied/rawSummary/reasons/wordCount` contract and its regression tests together
Tested: cargo fmt --all --check; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace; architect review APPROVE
Not-tested: External OMX consumers that may still ignore the new lane.finished data payload
2026-04-12 03:23:39 +00:00
YeonGyu-Kim
17e21bc4ad docs(roadmap): add #70 — install-source ambiguity misleads users
User treated claw-code.io as official, hit clawcode vs deprecated
claw-code naming collision. Adding requirement for canonical
docs to explicitly state official source and warn against
deprecated crate.

Source: gaebal-gajae community watch 2026-04-12
2026-04-12 12:08:52 +09:00
8 changed files with 798 additions and 34 deletions
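
Commit b5e30e2975 above describes recognizing review-style `APPROVE`/`REJECT`/`BLOCKED` results in a completed lane's output. The `tools` crate changes are not part of this excerpt, so the following is only a minimal sketch of the idea, assuming a verdict appears as a standalone uppercase word. `ReviewVerdict` and `parse_review_verdict` are hypothetical names, not the repository's `extract_review_outcome`.

```rust
/// Hypothetical verdict type; the commit message says the real value is stored
/// as `reviewVerdict` on the `lane.finished` event.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ReviewVerdict {
    Approve,
    Reject,
    Blocked,
}

/// Returns the first standalone uppercase verdict keyword in `summary`, if any.
/// Assumption: lowercase words such as "approve" are ordinary prose, not verdicts.
fn parse_review_verdict(summary: &str) -> Option<ReviewVerdict> {
    summary
        // Split on anything that is not an ASCII letter so "REJECT:" yields "REJECT".
        .split(|c: char| !c.is_ascii_alphabetic())
        .find_map(|word| match word {
            "APPROVE" => Some(ReviewVerdict::Approve),
            "REJECT" => Some(ReviewVerdict::Reject),
            "BLOCKED" => Some(ReviewVerdict::Blocked),
            _ => None,
        })
}
```

A summary without any keyword yields `None`, which corresponds to the stated constraint that ordinary non-review lanes stay on the existing summary path.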

@@ -496,18 +496,25 @@ Model name prefix now wins unconditionally over env-var presence. Regression tes
62. **Worker state file surface not implemented** — **done (verified 2026-04-12):** current `main` already wires `emit_state_file(worker)` into the worker transition path in `rust/crates/runtime/src/worker_boot.rs`, atomically writes `.claw/worker-state.json`, and exposes the documented reader surface through `claw state` / `claw state --output-format json` in `rust/crates/rusty-claude-cli/src/main.rs`. Fresh proof exists in `runtime` regression `emit_state_file_writes_worker_status_on_transition`, the end-to-end `tools` regression `recovery_loop_state_file_reflects_transitions`, and direct CLI parsing coverage for `state` / `state --output-format json`. Source: Jobdori dogfood.
**Scope note (verified 2026-04-12):** ROADMAP #31, #43, and #63-#68 currently appear to describe acpx/droid or upstream OMX/server orchestration behavior, not claw-code source already present in this repository. Repo-local searches for `acpx`, `use-droid`, `run-acpx`, `commit-wrapper`, `ultraclaw`, `roadmap-nudge-10min`, `OMX_TMUX_INJECT`, `/hooks/health`, and `/hooks/status` found no implementation hits outside `ROADMAP.md`, and the earlier state-surface note already records that the HTTP server is not owned by claw-code. With #45 now fixed, the remaining unresolved items in this section look like external tracking notes rather than confirmed repo-local backlog; re-check if new repo-local evidence appears.
**Scope note (verified 2026-04-12):** ROADMAP #31, #43, and #63-#68 currently appear to describe acpx/droid or upstream OMX/server orchestration behavior, not claw-code source already present in this repository. Repo-local searches for `acpx`, `use-droid`, `run-acpx`, `commit-wrapper`, `ultraclaw`, `roadmap-nudge-10min`, `OMX_TMUX_INJECT`, `/hooks/health`, and `/hooks/status` found no implementation hits outside `ROADMAP.md`, and the earlier state-surface note already records that the HTTP server is not owned by claw-code. With #45, #65, #67, and #69 now fixed, the remaining unresolved items in this section look like external tracking notes rather than confirmed repo-local backlog; re-check if new repo-local evidence appears.
63. **Droid session completion semantics broken: code arrives after "status: completed"** — dogfooded 2026-04-12. Ultraclaw droid sessions (use-droid via acpx) report `session.status: completed` before file writes are fully flushed/synced to the working tree. Discovered +410 lines of "late-arriving" droid output that appeared after I had already assessed 8 sessions as "no code produced." This creates false-negative assessments and duplicate work. **Fix shape:** (a) droid agent should only report completion after explicit file-write confirmation (fsync or existence check); (b) or, claw-code should expose a `pending_writes` status that indicates "agent responded, disk flush pending"; (c) lane orchestrators should poll for file changes for N seconds after completion before final assessment. **Blocker:** none. Source: Jobdori ultraclaw dogfood 2026-04-12.
64. **Artifact provenance is post-hoc narration, not structured events** — dogfooded 2026-04-12. The ultraclaw batch delivered 4 ROADMAP items and 3 commits, but the event stream only contained log-shaped text ("+410 lines detected", "committing...", "pushed"). Downstream consumers (clawhip, lane orchestrators, monitors) must reconstruct provenance from chat messages rather than consuming first-class events. **Fix shape:** emit structured artifact/result events with: `sourceLanes`, `roadmapIds`, `files`, `diffStat`, `verification: tested|committed|pushed|merged`, `commitSha`. Remove dependency on human/bot narration layer to explain what actually landed. Blocker: none. Source: gaebal-gajae dogfood analysis 2026-04-12.
65. **Backlog-scanning team lanes emit opaque stops, not structured selection outcomes** — dogfooded 2026-04-12. $ralph $team sessions scanning ROADMAP Immediate Backlog stop with summary text naming open items, but no machine-readable signal of: which item(s) were selected for work, which were skipped and why, whether execution happened vs review-only vs no-op. **Fix shape:** add structured "selection outcome" event with `chosenItems`, `skippedItems`, `rationale`, `action: execute|review|no-op`. Stop emitting "check backlog" as prose summary without selection contract. Blocker: none. Source: gaebal-gajae dogfood analysis 2026-04-12.
65. **Backlog-scanning team lanes emit opaque stops, not structured selection outcomes** — **done (verified 2026-04-12):** completed lane persistence in `rust/crates/tools/src/lib.rs` now recognizes backlog-scan selection summaries and records structured `selectionOutcome` metadata on `lane.finished`, including `chosenItems`, `skippedItems`, `action`, and optional `rationale`, while preserving existing non-selection and review-lane behavior. Regression coverage locks the structured backlog-scan payload alongside the earlier quality-floor and review-verdict paths. **Original filing below.**
66. **Completion-aware reminder shutdown missing** — dogfooded 2026-04-12. Ultraclaw batch completed and was reported as done, but 10-minute cron reminder (`roadmap-nudge-10min`) kept firing into channel as if work still pending. Reminder/cron state not coupled to terminal task state. **Fix shape:** (a) cron jobs should check task completion state before firing; (b) or, provide explicit `cron.remove` on task completion; (c) or, reminders should include "work complete" detection and auto-expire. Blocker: none. Source: gaebal-gajae dogfood analysis 2026-04-12.
67. **Scoped review lanes do not emit structured verdicts** — dogfooded 2026-04-12. OMX review lanes now have improved scope (specific ROADMAP items, specific files, explicit APPROVE/REJECT contract), but the stop event only contains the review request — not the actual verdict. Operators must infer approval/rejection/blockage from later git commits or surrounding chatter. **Fix shape:** emit structured review result on stop with: `verdict: approve|reject|blocked`, `target: commit/diff reviewed`, `rationale: short summary`. Blocker: none. Source: gaebal-gajae dogfood analysis 2026-04-12.
67. **Scoped review lanes do not emit structured verdicts** — **done (verified 2026-04-12):** completed lane persistence in `rust/crates/tools/src/lib.rs` now recognizes review-style `APPROVE`/`REJECT`/`BLOCKED` results and records structured `reviewVerdict`, `reviewTarget`, and `reviewRationale` metadata on the `lane.finished` event while preserving existing non-review lane behavior. Regression coverage locks both the normal completion path and a scoped review-lane completion payload. **Original filing below.**
68. **Internal reinjection/resume paths leak opaque control prose** — dogfooded 2026-04-12. OMX lanes stopping with `Continue from current mode state. [OMX_TMUX_INJECT]` expose internal implementation details instead of operator-meaningful state. The event tells us *that* tmux reinjection happened, but not *why* (retry after failure? resume after idle? manual recovery?), *what state was preserved*, or *what the lane was trying to do*. **Fix shape:** recovery/reinject events should emit structured cause like: `resume_after_stop`, `retry_after_tool_failure`, `tmux_reinject_after_idle`, `manual_recovery` plus preserved state / target lane info. Never leak bare internal markers like `[OMX_TMUX_INJECT]` as the primary summary. Blocker: none. Source: gaebal-gajae dogfood analysis 2026-04-12.
69. **Lane stop summaries have no minimum quality floor** — dogfooded 2026-04-12. `clawcode-human` session stopped with summary `commit push everyting, keep sweeping $ralph` — vague, typo-ridden, operationally useless. Unlike well-scoped review lanes, this summary regressed to mushy command prose with no outcome clarity. **Fix shape:** (a) enforce minimum stop/result summary standards: what was done (outcome), what was scoped (target), what's next (state); (b) typo/grammar validation; (c) reject summaries that are shorter than N words or contain only control verbs without context. Blocker: none. Source: gaebal-gajae dogfood analysis 2026-04-12.
69. **Lane stop summaries have no minimum quality floor** — **done (verified 2026-04-12):** completed lane persistence in `rust/crates/tools/src/lib.rs` now normalizes vague/control-only stop summaries into a contextual fallback that includes the lane target and status, while preserving structured metadata about whether the quality floor fired (`qualityFloorApplied`, `rawSummary`, `reasons`, `wordCount`). Regression coverage locks both the pass-through path for good summaries and the fallback path for mushy summaries like `commit push everyting, keep sweeping $ralph`. **Original filing below.**
70. **Install-source ambiguity misleads real users** — **done (verified 2026-04-12):** repo-local Rust guidance now makes the source of truth explicit in `claw doctor` and `claw --help`, naming `ultraworkers/claw-code` as the canonical repo and warning that `cargo install claw-code` installs a deprecated stub rather than the `claw` binary. Regression coverage locks both the new doctor JSON check and the help-text warning. **Original filing below.**
71. **Wrong-task prompt receipt is not detected before execution** — **done (verified 2026-04-12):** worker boot prompt dispatch now accepts an optional structured `task_receipt` (`repo`, `task_kind`, `source_surface`, `expected_artifacts`, `objective_preview`) and treats mismatched visible prompt context as a `WrongTask` prompt-delivery failure before execution continues. The prompt-delivery payload now records `observed_prompt_preview` plus the expected receipt, and regression coverage locks both the existing shell/wrong-target paths and the new KakaoTalk-style wrong-task mismatch case. **Original filing below.**
72. **`latest` managed-session selection depends on filesystem mtime before semantic session recency** — **done (verified 2026-04-12):** managed-session summaries now carry `updated_at_ms`, `SessionStore::list_sessions()` sorts by semantic recency before filesystem mtime, and regression coverage locks the case where `latest` must prefer the newer session payload even when file mtimes point the other way. The CLI session-summary wrapper now stays in sync with the runtime field so `latest` resolution uses the same ordering signal everywhere. **Original filing below.**
73. **Session timestamps are not monotonic enough for latest-session ordering under tight loops** — **done (verified 2026-04-12):** runtime session timestamps now use a process-local monotonic millisecond source, so back-to-back saves still produce increasing `updated_at_ms` even when the wall clock does not advance. The temporary sleep hack was removed from the resume-latest regression, and fresh workspace verification stayed green with the semantic-recency ordering path from #72. **Original filing below.**
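
Item 69 above describes a quality floor that passes strong summaries through and rewrites empty, control-only, or too-short summaries into a contextual fallback while recording `qualityFloorApplied`, `rawSummary`, `reasons`, and `wordCount`. The actual heuristics live in `rust/crates/tools/src/lib.rs` and are not shown here, so this is a rough sketch under assumptions: the word threshold, the control-word list, the reason labels, the fallback wording, and the names `QualityFloor` and `apply_quality_floor` are all hypothetical.

```rust
/// Hypothetical result type mirroring the metadata fields named in item 69.
#[derive(Debug, PartialEq, Eq)]
struct QualityFloor {
    applied: bool,
    raw_summary: String,
    reasons: Vec<&'static str>,
    word_count: usize,
    summary: String,
}

// Assumed values for illustration only; the real thresholds are not in this excerpt.
const MIN_WORDS: usize = 4;
const CONTROL_WORDS: [&str; 5] = ["commit", "push", "keep", "sweeping", "continue"];

fn apply_quality_floor(raw: &str, target: &str, status: &str) -> QualityFloor {
    // Normalize each word: strip surrounding punctuation and lowercase it.
    let words: Vec<String> = raw
        .split_whitespace()
        .map(|w| w.trim_matches(|c: char| !c.is_alphanumeric()).to_ascii_lowercase())
        .filter(|w| !w.is_empty())
        .collect();

    let mut reasons = Vec::new();
    if words.is_empty() {
        reasons.push("empty");
    } else {
        if words.len() < MIN_WORDS {
            reasons.push("too_short");
        }
        if words.iter().all(|w| CONTROL_WORDS.contains(&w.as_str())) {
            reasons.push("control_only");
        }
    }

    let applied = !reasons.is_empty();
    // Good summaries pass through unchanged; weak ones get a contextual fallback.
    let summary = if applied {
        format!("Lane for {} finished with status {}", target, status)
    } else {
        raw.trim().to_string()
    };

    QualityFloor {
        applied,
        raw_summary: raw.to_string(),
        reasons,
        word_count: words.len(),
        summary,
    }
}
```

Keeping the raw summary next to the rewritten one lets a downstream reader see both what the lane originally said and why the floor fired.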

@@ -13,6 +13,7 @@ const SESSION_VERSION: u32 = 1;
const ROTATE_AFTER_BYTES: u64 = 256 * 1024;
const MAX_ROTATED_FILES: usize = 3;
static SESSION_ID_COUNTER: AtomicU64 = AtomicU64::new(0);
static LAST_TIMESTAMP_MS: AtomicU64 = AtomicU64::new(0);
/// Speaker role associated with a persisted conversation message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
@@ -1030,10 +1031,27 @@ fn normalize_optional_string(value: Option<String>) -> Option<String> {
}
fn current_time_millis() -> u64 {
SystemTime::now()
let wall_clock = SystemTime::now()
.duration_since(UNIX_EPOCH)
.map(|duration| u64::try_from(duration.as_millis()).unwrap_or(u64::MAX))
.unwrap_or_default()
.unwrap_or_default();
let mut candidate = wall_clock;
loop {
let previous = LAST_TIMESTAMP_MS.load(Ordering::Relaxed);
if candidate <= previous {
candidate = previous.saturating_add(1);
}
match LAST_TIMESTAMP_MS.compare_exchange(
previous,
candidate,
Ordering::SeqCst,
Ordering::SeqCst,
) {
Ok(_) => return candidate,
Err(actual) => candidate = actual.saturating_add(1),
}
}
}
fn generate_session_id() -> String {
@@ -1125,8 +1143,8 @@ fn cleanup_rotated_logs(path: &Path) -> Result<(), SessionError> {
#[cfg(test)]
mod tests {
use super::{
cleanup_rotated_logs, rotate_session_file_if_needed, ContentBlock, ConversationMessage,
MessageRole, Session, SessionFork,
cleanup_rotated_logs, current_time_millis, rotate_session_file_if_needed, ContentBlock,
ConversationMessage, MessageRole, Session, SessionFork,
};
use crate::json::JsonValue;
use crate::usage::TokenUsage;
@@ -1134,6 +1152,16 @@ mod tests {
use std::path::{Path, PathBuf};
use std::time::{SystemTime, UNIX_EPOCH};
#[test]
fn session_timestamps_are_monotonic_under_tight_loops() {
let first = current_time_millis();
let second = current_time_millis();
let third = current_time_millis();
assert!(first < second);
assert!(second < third);
}
#[test]
fn persists_and_restores_session_jsonl() {
let mut session = Session::new();
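
The `current_time_millis` change in this file keeps timestamps strictly increasing within one process by bumping past the last issued value whenever the wall clock has not advanced. As an illustration only, the same rule can be written with `AtomicU64::fetch_update` in place of the explicit `compare_exchange` loop. The function name `next_monotonic_millis` is hypothetical, and the wall-clock reading is passed in as a parameter (an assumption made here so the behavior is deterministic and easy to test).

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Returns a timestamp strictly greater than the last one issued through `last`,
/// using the supplied wall-clock reading whenever it has moved forward.
/// Strictness only breaks down if the counter saturates at `u64::MAX`.
fn next_monotonic_millis(last: &AtomicU64, wall_clock_ms: u64) -> u64 {
    // Candidate rule: take the wall clock, or previous + 1 if the clock stalled or went backwards.
    let bump = |previous: u64| wall_clock_ms.max(previous.saturating_add(1));
    let previous = last
        // `fetch_update` retries the closure until the compare-and-swap succeeds
        // and returns the value that was stored before the update.
        .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |previous| {
            Some(bump(previous))
        })
        .expect("closure always returns Some");
    // Recompute the value that was actually stored.
    bump(previous)
}
```

Calling it three times with the same reading of `1_000` yields `1_000`, `1_001`, and `1_002`, which is the tight-loop case the new regression test exercises. As the commit notes, this guarantee is per process and says nothing about several binaries writing sessions concurrently.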

@@ -144,12 +144,7 @@ impl SessionStore {
if let Some(legacy_root) = self.legacy_sessions_root() {
self.collect_sessions_from_dir(&legacy_root, &mut sessions)?;
}
sessions.sort_by(|left, right| {
right
.modified_epoch_millis
.cmp(&left.modified_epoch_millis)
.then_with(|| right.id.cmp(&left.id))
});
sort_managed_sessions(&mut sessions);
Ok(sessions)
}
@@ -260,6 +255,7 @@ impl SessionStore {
ManagedSessionSummary {
id: session.session_id,
path,
updated_at_ms: session.updated_at_ms,
modified_epoch_millis,
message_count: session.messages.len(),
parent_session_id: session
@@ -279,6 +275,7 @@ impl SessionStore {
.unwrap_or("unknown")
.to_string(),
path,
updated_at_ms: 0,
modified_epoch_millis,
message_count: 0,
parent_session_id: None,
@@ -322,12 +319,23 @@ pub struct SessionHandle {
pub struct ManagedSessionSummary {
pub id: String,
pub path: PathBuf,
pub updated_at_ms: u64,
pub modified_epoch_millis: u128,
pub message_count: usize,
pub parent_session_id: Option<String>,
pub branch_name: Option<String>,
}
fn sort_managed_sessions(sessions: &mut [ManagedSessionSummary]) {
sessions.sort_by(|left, right| {
right
.updated_at_ms
.cmp(&left.updated_at_ms)
.then_with(|| right.modified_epoch_millis.cmp(&left.modified_epoch_millis))
.then_with(|| right.id.cmp(&left.id))
});
}
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LoadedManagedSession {
pub handle: SessionHandle,
@@ -598,6 +606,35 @@ mod tests {
.expect("session summary should exist")
}
#[test]
fn latest_session_prefers_semantic_updated_at_over_file_mtime() {
let mut sessions = vec![
ManagedSessionSummary {
id: "older-file-newer-session".to_string(),
path: PathBuf::from("/tmp/older"),
updated_at_ms: 200,
modified_epoch_millis: 100,
message_count: 2,
parent_session_id: None,
branch_name: None,
},
ManagedSessionSummary {
id: "newer-file-older-session".to_string(),
path: PathBuf::from("/tmp/newer"),
updated_at_ms: 100,
modified_epoch_millis: 200,
message_count: 1,
parent_session_id: None,
branch_name: None,
},
];
crate::session_control::sort_managed_sessions(&mut sessions);
assert_eq!(sessions[0].id, "older-file-newer-session");
assert_eq!(sessions[1].id, "newer-file-older-session");
}
#[test]
fn creates_and_lists_managed_sessions() {
// given
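
One consequence of the new ordering is worth spelling out: the summary built with `updated_at_ms: 0` (which appears to be the fallback for a session file whose contents could not be loaded) sorts after every session with a real `updated_at_ms`, and such fallback entries are ordered among themselves by file mtime. The sketch below shows that behavior with a pared-down, hypothetical `Summary` struct and an equivalent tuple-key formulation of the comparison; it is not the repository's `sort_managed_sessions`.

```rust
use std::cmp::Reverse;

/// Pared-down stand-in for `ManagedSessionSummary` (hypothetical, for illustration).
#[derive(Debug, Clone)]
struct Summary {
    id: String,
    updated_at_ms: u64,
    modified_epoch_millis: u128,
}

fn summary(id: &str, updated_at_ms: u64, modified_epoch_millis: u128) -> Summary {
    Summary {
        id: id.to_string(),
        updated_at_ms,
        modified_epoch_millis,
    }
}

/// Newest first: semantic recency, then file mtime, then id as the final tie-break.
/// Tuples compare lexicographically, so `Reverse` on the tuple gives the same
/// order as chaining three descending comparisons.
fn sort_newest_first(sessions: &mut [Summary]) {
    sessions.sort_by_key(|s| Reverse((s.updated_at_ms, s.modified_epoch_millis, s.id.clone())));
}

fn ids(sessions: &[Summary]) -> Vec<&str> {
    sessions.iter().map(|s| s.id.as_str()).collect()
}
```

With one readable session (`updated_at_ms` 200, mtime 100) and two fallback entries (mtime 250 and 300), the readable session comes first even though its file is the oldest on disk, followed by the fallback entries from newest to oldest mtime.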

@@ -92,6 +92,7 @@ pub enum WorkerTrustResolution {
pub enum WorkerPromptTarget {
Shell,
WrongTarget,
WrongTask,
Unknown,
}
@@ -108,10 +109,24 @@ pub enum WorkerEventPayload {
observed_target: WorkerPromptTarget,
#[serde(skip_serializing_if = "Option::is_none")]
observed_cwd: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
observed_prompt_preview: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
task_receipt: Option<WorkerTaskReceipt>,
recovery_armed: bool,
},
}
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
pub struct WorkerTaskReceipt {
pub repo: String,
pub task_kind: String,
pub source_surface: String,
#[serde(default, skip_serializing_if = "Vec::is_empty")]
pub expected_artifacts: Vec<String>,
pub objective_preview: String,
}
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
pub struct WorkerEvent {
pub seq: u64,
@@ -134,6 +149,7 @@ pub struct Worker {
pub prompt_delivery_attempts: u32,
pub prompt_in_flight: bool,
pub last_prompt: Option<String>,
pub expected_receipt: Option<WorkerTaskReceipt>,
pub replay_prompt: Option<String>,
pub last_error: Option<WorkerFailure>,
pub created_at: u64,
@@ -182,6 +198,7 @@ impl WorkerRegistry {
prompt_delivery_attempts: 0,
prompt_in_flight: false,
last_prompt: None,
expected_receipt: None,
replay_prompt: None,
last_error: None,
created_at: ts,
@@ -257,6 +274,7 @@ impl WorkerRegistry {
&lowered,
worker.last_prompt.as_deref(),
&worker.cwd,
worker.expected_receipt.as_ref(),
)
})
.flatten()
@@ -272,6 +290,10 @@ impl WorkerRegistry {
"worker prompt landed in the wrong target instead of {}: {}",
worker.cwd, prompt_preview
),
WorkerPromptTarget::WrongTask => format!(
"worker prompt receipt mismatched the expected task context for {}: {}",
worker.cwd, prompt_preview
),
WorkerPromptTarget::Unknown => format!(
"worker prompt delivery failed before reaching coding agent: {prompt_preview}"
),
@@ -291,6 +313,8 @@ impl WorkerRegistry {
prompt_preview: prompt_preview.clone(),
observed_target: observation.target,
observed_cwd: observation.observed_cwd.clone(),
observed_prompt_preview: observation.observed_prompt_preview.clone(),
task_receipt: worker.expected_receipt.clone(),
recovery_armed: false,
}),
);
@@ -306,6 +330,8 @@ impl WorkerRegistry {
prompt_preview,
observed_target: observation.target,
observed_cwd: observation.observed_cwd,
observed_prompt_preview: observation.observed_prompt_preview,
task_receipt: worker.expected_receipt.clone(),
recovery_armed: true,
}),
);
@@ -374,7 +400,12 @@ impl WorkerRegistry {
Ok(worker.clone())
}
pub fn send_prompt(&self, worker_id: &str, prompt: Option<&str>) -> Result<Worker, String> {
pub fn send_prompt(
&self,
worker_id: &str,
prompt: Option<&str>,
task_receipt: Option<WorkerTaskReceipt>,
) -> Result<Worker, String> {
let mut inner = self.inner.lock().expect("worker registry lock poisoned");
let worker = inner
.workers
@@ -398,6 +429,7 @@ impl WorkerRegistry {
worker.prompt_delivery_attempts += 1;
worker.prompt_in_flight = true;
worker.last_prompt = Some(next_prompt.clone());
worker.expected_receipt = task_receipt;
worker.replay_prompt = None;
worker.last_error = None;
worker.status = WorkerStatus::Running;
@@ -548,6 +580,7 @@ fn prompt_misdelivery_is_relevant(worker: &Worker) -> bool {
struct PromptDeliveryObservation {
target: WorkerPromptTarget,
observed_cwd: Option<String>,
observed_prompt_preview: Option<String>,
}
fn push_event(
@@ -699,6 +732,7 @@ fn detect_prompt_misdelivery(
lowered: &str,
prompt: Option<&str>,
expected_cwd: &str,
expected_receipt: Option<&WorkerTaskReceipt>,
) -> Option<PromptDeliveryObservation> {
let Some(prompt) = prompt else {
return None;
@@ -713,12 +747,30 @@ fn detect_prompt_misdelivery(
return None;
}
let prompt_visible = lowered.contains(&prompt_snippet);
let observed_prompt_preview = detect_prompt_echo(screen_text);
if let Some(receipt) = expected_receipt {
let receipt_visible = task_receipt_visible(lowered, receipt);
let mismatched_prompt_visible = observed_prompt_preview
.as_deref()
.map(str::to_ascii_lowercase)
.is_some_and(|preview| !preview.contains(&prompt_snippet));
if (prompt_visible || mismatched_prompt_visible) && !receipt_visible {
return Some(PromptDeliveryObservation {
target: WorkerPromptTarget::WrongTask,
observed_cwd: detect_observed_shell_cwd(screen_text),
observed_prompt_preview,
});
}
}
if let Some(observed_cwd) = detect_observed_shell_cwd(screen_text) {
if prompt_visible && !cwd_matches_observed_target(expected_cwd, &observed_cwd) {
return Some(PromptDeliveryObservation {
target: WorkerPromptTarget::WrongTarget,
observed_cwd: Some(observed_cwd),
observed_prompt_preview,
});
}
}
@@ -736,6 +788,7 @@ fn detect_prompt_misdelivery(
(shell_error && prompt_visible).then_some(PromptDeliveryObservation {
target: WorkerPromptTarget::Shell,
observed_cwd: None,
observed_prompt_preview,
})
}
@@ -748,10 +801,38 @@ fn prompt_preview(prompt: &str) -> String {
format!("{}", preview.trim_end())
}
fn detect_prompt_echo(screen_text: &str) -> Option<String> {
screen_text.lines().find_map(|line| {
line.trim_start()
.strip_prefix('')
.map(str::trim)
.filter(|value| !value.is_empty())
.map(str::to_string)
})
}
fn task_receipt_visible(lowered_screen_text: &str, receipt: &WorkerTaskReceipt) -> bool {
let expected_tokens = [
receipt.repo.to_ascii_lowercase(),
receipt.task_kind.to_ascii_lowercase(),
receipt.source_surface.to_ascii_lowercase(),
receipt.objective_preview.to_ascii_lowercase(),
];
expected_tokens
.iter()
.all(|token| lowered_screen_text.contains(token))
&& receipt
.expected_artifacts
.iter()
.all(|artifact| lowered_screen_text.contains(&artifact.to_ascii_lowercase()))
}
fn prompt_misdelivery_detail(observation: &PromptDeliveryObservation) -> &'static str {
match observation.target {
WorkerPromptTarget::Shell => "shell misdelivery detected",
WorkerPromptTarget::WrongTarget => "prompt landed in wrong target",
WorkerPromptTarget::WrongTask => "prompt receipt mismatched expected task context",
WorkerPromptTarget::Unknown => "prompt delivery failure detected",
}
}
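
To make the receipt check concrete: `task_receipt_visible` requires every receipt field, including each expected artifact, to appear somewhere in the lowercased screen text, so a single missing token counts as "receipt not visible". The following standalone sketch restates that rule with a hypothetical `Receipt` struct and `receipt_visible` function (it lowercases the screen text itself, which is an assumption made for convenience) so the effect can be tried in isolation.

```rust
/// Hypothetical stand-in for `WorkerTaskReceipt` without the serde derives.
struct Receipt {
    repo: String,
    task_kind: String,
    source_surface: String,
    expected_artifacts: Vec<String>,
    objective_preview: String,
}

/// True only when every receipt token occurs in the screen text (case-insensitive).
fn receipt_visible(screen_text: &str, receipt: &Receipt) -> bool {
    let lowered = screen_text.to_ascii_lowercase();
    let mut tokens: Vec<&str> = vec![
        receipt.repo.as_str(),
        receipt.task_kind.as_str(),
        receipt.source_surface.as_str(),
        receipt.objective_preview.as_str(),
    ];
    tokens.extend(receipt.expected_artifacts.iter().map(String::as_str));
    tokens
        .iter()
        .all(|token| lowered.contains(&token.to_ascii_lowercase()))
}
```

Because this is plain substring matching, the check is strict in one direction (any absent token fails it) but permissive in the other: short tokens such as `tests` can match unrelated text on screen. That is a property of the approach rather than a claim about how the repository handles such cases.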
@@ -865,7 +946,7 @@ mod tests {
WorkerFailureKind::TrustGate
);
let send_before_resolve = registry.send_prompt(&worker.worker_id, Some("ship it"));
let send_before_resolve = registry.send_prompt(&worker.worker_id, Some("ship it"), None);
assert!(send_before_resolve
.expect_err("prompt delivery should be gated")
.contains("not ready for prompt delivery"));
@@ -905,7 +986,7 @@ mod tests {
.expect("ready observe should succeed");
let running = registry
.send_prompt(&worker.worker_id, Some("Implement worker handshake"))
.send_prompt(&worker.worker_id, Some("Implement worker handshake"), None)
.expect("prompt send should succeed");
assert_eq!(running.status, WorkerStatus::Running);
assert_eq!(running.prompt_delivery_attempts, 1);
@@ -941,6 +1022,8 @@ mod tests {
prompt_preview: "Implement worker handshake".to_string(),
observed_target: WorkerPromptTarget::Shell,
observed_cwd: None,
observed_prompt_preview: None,
task_receipt: None,
recovery_armed: false,
})
);
@@ -956,12 +1039,14 @@ mod tests {
prompt_preview: "Implement worker handshake".to_string(),
observed_target: WorkerPromptTarget::Shell,
observed_cwd: None,
observed_prompt_preview: None,
task_receipt: None,
recovery_armed: true,
})
);
let replayed = registry
.send_prompt(&worker.worker_id, None)
.send_prompt(&worker.worker_id, None, None)
.expect("replay send should succeed");
assert_eq!(replayed.status, WorkerStatus::Running);
assert!(replayed.replay_prompt.is_none());
@@ -976,7 +1061,11 @@ mod tests {
.observe(&worker.worker_id, "Ready for input\n>")
.expect("ready observe should succeed");
registry
.send_prompt(&worker.worker_id, Some("Run the worker bootstrap tests"))
.send_prompt(
&worker.worker_id,
Some("Run the worker bootstrap tests"),
None,
)
.expect("prompt send should succeed");
let recovered = registry
@@ -1007,6 +1096,8 @@ mod tests {
prompt_preview: "Run the worker bootstrap tests".to_string(),
observed_target: WorkerPromptTarget::WrongTarget,
observed_cwd: Some("/tmp/repo-target-b".to_string()),
observed_prompt_preview: None,
task_receipt: None,
recovery_armed: false,
})
);
@@ -1049,6 +1140,75 @@ mod tests {
assert!(ready.last_error.is_none());
}
#[test]
fn wrong_task_receipt_mismatch_is_detected_before_execution_continues() {
let registry = WorkerRegistry::new();
let worker = registry.create("/tmp/repo-task", &[], true);
registry
.observe(&worker.worker_id, "Ready for input\n>")
.expect("ready observe should succeed");
registry
.send_prompt(
&worker.worker_id,
Some("Implement worker handshake"),
Some(WorkerTaskReceipt {
repo: "claw-code".to_string(),
task_kind: "repo_code".to_string(),
source_surface: "omx_team".to_string(),
expected_artifacts: vec!["patch".to_string(), "tests".to_string()],
objective_preview: "Implement worker handshake".to_string(),
}),
)
.expect("prompt send should succeed");
let recovered = registry
.observe(
&worker.worker_id,
" Explain this KakaoTalk screenshot for a friend\nI can help analyze the screenshot…",
)
.expect("mismatch observe should succeed");
assert_eq!(recovered.status, WorkerStatus::ReadyForPrompt);
assert_eq!(
recovered
.last_error
.expect("mismatch error should exist")
.kind,
WorkerFailureKind::PromptDelivery
);
let mismatch = recovered
.events
.iter()
.find(|event| event.kind == WorkerEventKind::PromptMisdelivery)
.expect("wrong-task event should exist");
assert_eq!(mismatch.status, WorkerStatus::Failed);
assert_eq!(
mismatch.payload,
Some(WorkerEventPayload::PromptDelivery {
prompt_preview: "Implement worker handshake".to_string(),
observed_target: WorkerPromptTarget::WrongTask,
observed_cwd: None,
observed_prompt_preview: Some(
"Explain this KakaoTalk screenshot for a friend".to_string()
),
task_receipt: Some(WorkerTaskReceipt {
repo: "claw-code".to_string(),
task_kind: "repo_code".to_string(),
source_surface: "omx_team".to_string(),
expected_artifacts: vec!["patch".to_string(), "tests".to_string()],
objective_preview: "Implement worker handshake".to_string(),
}),
recovery_armed: false,
})
);
let replay = recovered
.events
.iter()
.find(|event| event.kind == WorkerEventKind::PromptReplayArmed)
.expect("replay event should exist");
assert_eq!(replay.status, WorkerStatus::ReadyForPrompt);
}
#[test]
fn restart_and_terminate_reset_or_finish_worker() {
let registry = WorkerRegistry::new();
@@ -1057,7 +1217,7 @@ mod tests {
.observe(&worker.worker_id, "Ready for input\n>")
.expect("ready observe should succeed");
registry
.send_prompt(&worker.worker_id, Some("Run tests"))
.send_prompt(&worker.worker_id, Some("Run tests"), None)
.expect("prompt send should succeed");
let restarted = registry
@@ -1086,7 +1246,7 @@ mod tests {
.observe(&worker.worker_id, "Ready for input\n>")
.expect("ready observe should succeed");
registry
.send_prompt(&worker.worker_id, Some("Run tests"))
.send_prompt(&worker.worker_id, Some("Run tests"), None)
.expect("prompt send should succeed");
let failed = registry
@@ -1163,7 +1323,7 @@ mod tests {
.observe(&worker.worker_id, "Ready for input\n>")
.expect("ready observe should succeed");
registry
.send_prompt(&worker.worker_id, Some("Run tests"))
.send_prompt(&worker.worker_id, Some("Run tests"), None)
.expect("prompt send should succeed");
let finished = registry

@@ -304,7 +304,7 @@ fn worker_provider_failure_flows_through_recovery_to_policy() {
.observe(&worker.worker_id, "Ready for your input\n>")
.expect("ready observe should succeed");
registry
.send_prompt(&worker.worker_id, Some("Run analysis"))
.send_prompt(&worker.worker_id, Some("Run analysis"), None)
.expect("prompt send should succeed");
// Session completes with provider failure (finish="unknown", tokens=0)

@@ -78,6 +78,9 @@ const INTERNAL_PROGRESS_HEARTBEAT_INTERVAL: Duration = Duration::from_secs(3);
const POST_TOOL_STALL_TIMEOUT: Duration = Duration::from_secs(10);
const PRIMARY_SESSION_EXTENSION: &str = "jsonl";
const LEGACY_SESSION_EXTENSION: &str = "json";
const OFFICIAL_REPO_URL: &str = "https://github.com/ultraworkers/claw-code";
const OFFICIAL_REPO_SLUG: &str = "ultraworkers/claw-code";
const DEPRECATED_INSTALL_COMMAND: &str = "cargo install claw-code";
const LATEST_SESSION_REFERENCE: &str = "latest";
const SESSION_REFERENCE_ALIASES: &[&str] = &[LATEST_SESSION_REFERENCE, "last", "recent"];
const CLI_OPTION_SUGGESTIONS: &[&str] = &[
@@ -1477,6 +1480,7 @@ fn render_doctor_report() -> Result<DoctorReport, Box<dyn std::error::Error>> {
checks: vec![
check_auth_health(),
check_config_health(&config_loader, config.as_ref()),
check_install_source_health(),
check_workspace_health(&context),
check_sandbox_health(&context.sandbox_status),
check_system_health(&cwd, config.as_ref().ok()),
@@ -1764,6 +1768,36 @@ fn check_config_health(
}
}
fn check_install_source_health() -> DiagnosticCheck {
DiagnosticCheck::new(
"Install source",
DiagnosticLevel::Ok,
format!(
"official source of truth is {OFFICIAL_REPO_SLUG}; avoid `{DEPRECATED_INSTALL_COMMAND}`"
),
)
.with_details(vec![
format!("Official repo {OFFICIAL_REPO_URL}"),
"Recommended path build from this repo or use the upstream binary documented in README.md"
.to_string(),
format!(
"Deprecated crate `{DEPRECATED_INSTALL_COMMAND}` installs a deprecated stub and does not provide the `claw` binary"
)
.to_string(),
])
.with_data(Map::from_iter([
("official_repo".to_string(), json!(OFFICIAL_REPO_URL)),
(
"deprecated_install".to_string(),
json!(DEPRECATED_INSTALL_COMMAND),
),
(
"recommended_install".to_string(),
json!("build from source or follow the upstream binary instructions in README.md"),
),
]))
}
fn check_workspace_health(context: &StatusContext) -> DiagnosticCheck {
let in_repo = context.project_root.is_some();
DiagnosticCheck::new(
@@ -3088,6 +3122,7 @@ struct SessionHandle {
struct ManagedSessionSummary {
id: String,
path: PathBuf,
updated_at_ms: u64,
modified_epoch_millis: u128,
message_count: usize,
parent_session_id: Option<String>,
@@ -4677,6 +4712,7 @@ fn list_managed_sessions() -> Result<Vec<ManagedSessionSummary>, Box<dyn std::er
.map(|session| ManagedSessionSummary {
id: session.id,
path: session.path,
updated_at_ms: session.updated_at_ms,
modified_epoch_millis: session.modified_epoch_millis,
message_count: session.message_count,
parent_session_id: session.parent_session_id,
@@ -4692,6 +4728,7 @@ fn latest_managed_session() -> Result<ManagedSessionSummary, Box<dyn std::error:
Ok(ManagedSessionSummary {
id: session.id,
path: session.path,
updated_at_ms: session.updated_at_ms,
modified_epoch_millis: session.modified_epoch_millis,
message_count: session.message_count,
parent_session_id: session.parent_session_id,
@@ -8111,6 +8148,11 @@ fn print_help_to(out: &mut impl Write) -> io::Result<()> {
out,
" Diagnose local auth, config, workspace, and sandbox health"
)?;
writeln!(out, " Source of truth: {OFFICIAL_REPO_SLUG}")?;
writeln!(
out,
" Warning: do not `{DEPRECATED_INSTALL_COMMAND}` (deprecated stub)"
)?;
writeln!(out, " claw dump-manifests [--manifests-dir PATH]")?;
writeln!(out, " claw bootstrap-plan")?;
writeln!(out, " claw agents")?;
@@ -8200,6 +8242,11 @@ fn print_help_to(out: &mut impl Write) -> io::Result<()> {
writeln!(out, " claw mcp show my-server")?;
writeln!(out, " claw /skills")?;
writeln!(out, " claw doctor")?;
writeln!(out, " source of truth: {OFFICIAL_REPO_URL}")?;
writeln!(
out,
" do not run `{DEPRECATED_INSTALL_COMMAND}` — it installs a deprecated stub"
)?;
writeln!(out, " claw init")?;
writeln!(out, " claw export")?;
writeln!(out, " claw export conversation.md")?;
@@ -10082,6 +10129,8 @@ mod tests {
assert!(help.contains("claw mcp"));
assert!(help.contains("claw skills"));
assert!(help.contains("claw /skills"));
assert!(help.contains("ultraworkers/claw-code"));
assert!(help.contains("cargo install claw-code"));
assert!(!help.contains("claw login"));
assert!(!help.contains("claw logout"));
}

@@ -209,7 +209,7 @@ fn doctor_and_resume_status_emit_json_when_requested() {
assert!(summary["failures"].as_u64().is_some());
let checks = doctor["checks"].as_array().expect("doctor checks");
assert_eq!(checks.len(), 5);
assert_eq!(checks.len(), 6);
let check_names = checks
.iter()
.map(|check| {
@@ -221,7 +221,27 @@ fn doctor_and_resume_status_emit_json_when_requested() {
.collect::<Vec<_>>();
assert_eq!(
check_names,
vec!["auth", "config", "workspace", "sandbox", "system"]
vec![
"auth",
"config",
"install source",
"workspace",
"sandbox",
"system"
]
);
let install_source = checks
.iter()
.find(|check| check["name"] == "install source")
.expect("install source check");
assert_eq!(
install_source["official_repo"],
"https://github.com/ultraworkers/claw-code"
);
assert_eq!(
install_source["deprecated_install"],
"cargo install claw-code"
);
let workspace = checks

@@ -20,7 +20,7 @@ use runtime::{
summary_compression::compress_summary_text,
task_registry::TaskRegistry,
team_cron_registry::{CronRegistry, TeamRegistry},
worker_boot::{WorkerReadySnapshot, WorkerRegistry},
worker_boot::{WorkerReadySnapshot, WorkerRegistry, WorkerTaskReceipt},
write_file, ApiClient, ApiRequest, AssistantEvent, BashCommandInput, BashCommandOutput,
BranchFreshness, ConfigLoader, ContentBlock, ConversationMessage, ConversationRuntime,
GrepSearchInput, LaneCommitProvenance, LaneEvent, LaneEventBlocker, LaneEventName,
@@ -930,7 +930,22 @@ pub fn mvp_tool_specs() -> Vec<ToolSpec> {
"type": "object",
"properties": {
"worker_id": { "type": "string" },
"prompt": { "type": "string" }
"prompt": { "type": "string" },
"task_receipt": {
"type": "object",
"properties": {
"repo": { "type": "string" },
"task_kind": { "type": "string" },
"source_surface": { "type": "string" },
"expected_artifacts": {
"type": "array",
"items": { "type": "string" }
},
"objective_preview": { "type": "string" }
},
"required": ["repo", "task_kind", "source_surface", "objective_preview"],
"additionalProperties": false
}
},
"required": ["worker_id"],
"additionalProperties": false
@@ -1522,7 +1537,11 @@ fn run_worker_await_ready(input: WorkerIdInput) -> Result<String, String> {
#[allow(clippy::needless_pass_by_value)]
fn run_worker_send_prompt(input: WorkerSendPromptInput) -> Result<String, String> {
let worker = global_worker_registry().send_prompt(&input.worker_id, input.prompt.as_deref())?;
let worker = global_worker_registry().send_prompt(
&input.worker_id,
input.prompt.as_deref(),
input.task_receipt,
)?;
to_pretty_json(worker)
}
@@ -2439,6 +2458,8 @@ struct WorkerSendPromptInput {
worker_id: String,
#[serde(default)]
prompt: Option<String>,
#[serde(default)]
task_receipt: Option<WorkerTaskReceipt>,
}
const fn default_auto_recover_prompt_misdelivery() -> bool {
@@ -3743,12 +3764,13 @@ fn persist_agent_terminal_state(
.push(LaneEvent::failed(iso8601_now(), &blocker));
} else {
next_manifest.current_blocker = None;
let compressed_detail = result
.filter(|value| !value.trim().is_empty())
.map(|value| compress_summary_text(value.trim()));
next_manifest
.lane_events
.push(LaneEvent::finished(iso8601_now(), compressed_detail));
let finished_summary = build_lane_finished_summary(&next_manifest, result);
next_manifest.lane_events.push(
LaneEvent::finished(iso8601_now(), finished_summary.detail).with_data(
serde_json::to_value(&finished_summary.data)
.expect("lane summary metadata should serialize"),
),
);
if let Some(provenance) = maybe_commit_provenance(result) {
next_manifest.lane_events.push(LaneEvent::commit_created(
iso8601_now(),
@@ -3760,6 +3782,308 @@ fn persist_agent_terminal_state(
write_agent_manifest(&next_manifest)
}
const MIN_LANE_SUMMARY_WORDS: usize = 7;
const REVIEW_VERDICTS: &[(&str, &str)] = &[
("APPROVE", "approve"),
("REJECT", "reject"),
("BLOCKED", "blocked"),
];
const CONTROL_ONLY_SUMMARY_WORDS: &[&str] = &[
"ack",
"commit",
"continue",
"everyting",
"everything",
"keep",
"next",
"push",
"ralph",
"resume",
"retry",
"run",
"stop",
"sweep",
"sweeping",
"team",
];
const CONTEXTUAL_SUMMARY_WORDS: &[&str] = &[
"added",
"audited",
"blocked",
"completed",
"documented",
"failed",
"finished",
"fixed",
"implemented",
"investigated",
"merged",
"pushed",
"refactored",
"removed",
"reviewed",
"tested",
"updated",
"verified",
];
#[derive(Debug, Clone, Serialize)]
struct LaneFinishedSummaryData {
#[serde(rename = "qualityFloorApplied")]
quality_floor_applied: bool,
reasons: Vec<String>,
#[serde(rename = "rawSummary", skip_serializing_if = "Option::is_none")]
raw_summary: Option<String>,
#[serde(rename = "wordCount")]
word_count: usize,
#[serde(rename = "reviewVerdict", skip_serializing_if = "Option::is_none")]
review_verdict: Option<String>,
#[serde(rename = "reviewTarget", skip_serializing_if = "Option::is_none")]
review_target: Option<String>,
#[serde(rename = "reviewRationale", skip_serializing_if = "Option::is_none")]
review_rationale: Option<String>,
#[serde(rename = "selectionOutcome", skip_serializing_if = "Option::is_none")]
selection_outcome: Option<SelectionOutcome>,
}
#[derive(Debug, Clone)]
struct LaneFinishedSummary {
detail: Option<String>,
data: LaneFinishedSummaryData,
}
#[derive(Debug)]
struct LaneSummaryAssessment {
apply_quality_floor: bool,
reasons: Vec<String>,
word_count: usize,
review_outcome: Option<ReviewLaneOutcome>,
}
#[derive(Debug, Clone)]
struct ReviewLaneOutcome {
verdict: String,
rationale: Option<String>,
}
#[derive(Debug, Clone, Serialize)]
struct SelectionOutcome {
#[serde(rename = "chosenItems", skip_serializing_if = "Vec::is_empty")]
chosen_items: Vec<String>,
#[serde(rename = "skippedItems", skip_serializing_if = "Vec::is_empty")]
skipped_items: Vec<String>,
action: String,
#[serde(skip_serializing_if = "Option::is_none")]
rationale: Option<String>,
}
fn build_lane_finished_summary(
manifest: &AgentOutput,
result: Option<&str>,
) -> LaneFinishedSummary {
let raw_summary = result.map(str::trim).filter(|value| !value.is_empty());
let assessment = assess_lane_summary_quality(raw_summary.unwrap_or_default());
let detail = match raw_summary {
Some(summary) if !assessment.apply_quality_floor => Some(compress_summary_text(summary)),
Some(summary) => Some(compose_lane_summary_fallback(manifest, Some(summary))),
None => Some(compose_lane_summary_fallback(manifest, None)),
};
let review_outcome = assessment.review_outcome.clone();
let review_target = review_outcome
.as_ref()
.map(|_| manifest.description.trim())
.filter(|value| !value.is_empty())
.map(str::to_string);
LaneFinishedSummary {
detail,
data: LaneFinishedSummaryData {
quality_floor_applied: raw_summary.is_none() || assessment.apply_quality_floor,
reasons: assessment.reasons,
raw_summary: raw_summary.map(str::to_string),
word_count: assessment.word_count,
review_verdict: review_outcome
.as_ref()
.map(|outcome| outcome.verdict.clone()),
review_target,
review_rationale: review_outcome.and_then(|outcome| outcome.rationale),
selection_outcome: extract_selection_outcome(raw_summary.unwrap_or_default()),
},
}
}
fn assess_lane_summary_quality(summary: &str) -> LaneSummaryAssessment {
let words = summary
.split(|ch: char| !(ch.is_ascii_alphanumeric() || ch == '-' || ch == '#'))
.filter(|token| !token.is_empty())
.map(str::to_ascii_lowercase)
.collect::<Vec<_>>();
let word_count = words.len();
let mut reasons = Vec::new();
if summary.trim().is_empty() {
reasons.push(String::from("empty"));
}
let review_outcome = extract_review_outcome(summary);
let control_only = !words.is_empty()
&& words
.iter()
.all(|word| CONTROL_ONLY_SUMMARY_WORDS.contains(&word.as_str()));
if control_only && review_outcome.is_none() {
reasons.push(String::from("control_only"));
}
let has_context_signal = summary.contains('`')
|| summary.contains('/')
|| summary.contains(':')
|| summary.contains('#')
|| review_outcome.is_some()
|| words
.iter()
.any(|word| CONTEXTUAL_SUMMARY_WORDS.contains(&word.as_str()));
if word_count < MIN_LANE_SUMMARY_WORDS && !has_context_signal {
reasons.push(String::from("too_short_without_context"));
}
LaneSummaryAssessment {
apply_quality_floor: !reasons.is_empty(),
reasons,
word_count,
review_outcome,
}
}
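The quality-floor rules in `assess_lane_summary_quality` can be tried in isolation. The following is a minimal sketch, not the crate's code: `floor_reasons` is a hypothetical name, the word lists are copied from the constants in this diff, and the review-verdict exemption is deliberately left out so the block depends only on the standard library.

```rust
// Simplified, std-only sketch of the quality-floor checks in this diff.
// `floor_reasons` is a hypothetical helper; the review-verdict exemption
// from the real function is intentionally omitted here.
const MIN_WORDS: usize = 7;
const CONTROL_ONLY: &[&str] = &[
    "ack", "commit", "continue", "everyting", "everything", "keep", "next", "push",
    "ralph", "resume", "retry", "run", "stop", "sweep", "sweeping", "team",
];
const CONTEXTUAL: &[&str] = &[
    "added", "audited", "blocked", "completed", "documented", "failed", "finished",
    "fixed", "implemented", "investigated", "merged", "pushed", "refactored",
    "removed", "reviewed", "tested", "updated", "verified",
];

fn floor_reasons(summary: &str) -> Vec<&'static str> {
    // Tokenize on anything that is not ASCII alphanumeric, '-' or '#'.
    let words: Vec<String> = summary
        .split(|ch: char| !(ch.is_ascii_alphanumeric() || ch == '-' || ch == '#'))
        .filter(|token| !token.is_empty())
        .map(str::to_ascii_lowercase)
        .collect();

    let mut reasons = Vec::new();
    if summary.trim().is_empty() {
        reasons.push("empty");
    }
    // Every token is an orchestration keyword, so nothing describes the work.
    let control_only = !words.is_empty()
        && words.iter().all(|word| CONTROL_ONLY.contains(&word.as_str()));
    if control_only {
        reasons.push("control_only");
    }
    // Paths, code spans, labels, issue refs, or a past-tense work verb count as context.
    let has_context = summary.contains(|ch: char| matches!(ch, '`' | '/' | ':' | '#'))
        || words.iter().any(|word| CONTEXTUAL.contains(&word.as_str()));
    if words.len() < MIN_WORDS && !has_context {
        reasons.push("too_short_without_context");
    }
    reasons
}
```

With these rules the low-signal summary used in the tests collects two reasons, `control_only` first, which is consistent with the assertion on `reasons[0]`.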
fn compose_lane_summary_fallback(manifest: &AgentOutput, raw_summary: Option<&str>) -> String {
let target = manifest.description.trim();
let base = format!(
"Completed lane `{}` for target: {}. Status: completed.",
manifest.name,
if target.is_empty() {
"unspecified task"
} else {
target
}
);
match raw_summary {
Some(summary) => format!(
"{base} Original stop summary was too vague to keep as the lane result: \"{}\".",
summary.trim()
),
None => format!("{base} No usable stop summary was produced by the lane."),
}
}
fn extract_review_outcome(summary: &str) -> Option<ReviewLaneOutcome> {
let mut lines = summary
.lines()
.map(str::trim)
.filter(|line| !line.is_empty());
let first = lines.next()?;
let verdict = REVIEW_VERDICTS.iter().find_map(|(prefix, verdict)| {
first
.eq_ignore_ascii_case(prefix)
.then(|| (*verdict).to_string())
})?;
let rationale = lines.collect::<Vec<_>>().join(" ").trim().to_string();
Some(ReviewLaneOutcome {
verdict,
rationale: (!rationale.is_empty()).then_some(compress_summary_text(&rationale)),
})
}
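The verdict parsing above can be sketched without the rest of the crate. This is an illustrative stand-in, assuming a hypothetical `parse_review` that returns a plain tuple and skips `compress_summary_text` (whose behavior is not shown in this diff), so the rationale is only whitespace-joined.

```rust
// Std-only sketch of first-line verdict parsing; `parse_review` is a
// hypothetical name and the rationale is not compressed as in the real code.
const REVIEW_VERDICTS: &[(&str, &str)] = &[
    ("APPROVE", "approve"),
    ("REJECT", "reject"),
    ("BLOCKED", "blocked"),
];

fn parse_review(summary: &str) -> Option<(String, Option<String>)> {
    let mut lines = summary
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty());
    // The first non-empty line must be exactly a verdict keyword (any ASCII case).
    let first = lines.next()?;
    let verdict = REVIEW_VERDICTS.iter().find_map(|(keyword, verdict)| {
        first
            .eq_ignore_ascii_case(keyword)
            .then(|| (*verdict).to_string())
    })?;
    // Everything after the verdict line is joined into a single rationale string.
    let rationale = lines.collect::<Vec<_>>().join(" ");
    Some((verdict, (!rationale.is_empty()).then_some(rationale)))
}
```

Note that the match is on the whole first line, not a prefix, so a summary that starts with `APPROVE with nits` would not be treated as a review verdict under this sketch.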
fn extract_selection_outcome(summary: &str) -> Option<SelectionOutcome> {
let mut chosen_items = Vec::new();
let mut skipped_items = Vec::new();
let mut action = None;
let mut rationale = None;
for line in summary
.lines()
.map(str::trim)
.filter(|line| !line.is_empty())
{
let lowered = line.to_ascii_lowercase();
let roadmap_items = extract_roadmap_items(line);
if lowered.starts_with("chosen:")
|| lowered.starts_with("picked:")
|| lowered.starts_with("selected:")
|| (lowered.contains("picked") && !roadmap_items.is_empty())
|| (lowered.contains("selected") && !roadmap_items.is_empty())
{
chosen_items.extend(roadmap_items);
} else if lowered.starts_with("skipped:")
|| lowered.starts_with("skip:")
|| (lowered.contains("skipped") && !roadmap_items.is_empty())
{
skipped_items.extend(roadmap_items);
}
if let Some(rest) = lowered.strip_prefix("action:") {
if rest.contains("execute") || rest.contains("implement") || rest.contains("fix") {
action = Some(String::from("execute"));
} else if rest.contains("review") || rest.contains("audit") {
action = Some(String::from("review"));
} else if rest.contains("no-op") || rest.contains("noop") {
action = Some(String::from("no-op"));
}
}
if let Some(rest) = line.strip_prefix("Rationale:") {
let trimmed = rest.trim();
if !trimmed.is_empty() {
rationale = Some(compress_summary_text(trimmed));
}
}
}
chosen_items.sort();
chosen_items.dedup();
skipped_items.sort();
skipped_items.dedup();
if chosen_items.is_empty() && skipped_items.is_empty() && action.is_none() {
return None;
}
let default_action = if chosen_items.is_empty() {
String::from("no-op")
} else {
String::from("execute")
};
Some(SelectionOutcome {
chosen_items,
skipped_items,
action: action.unwrap_or(default_action),
rationale,
})
}
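The action handling in `extract_selection_outcome` has two parts: keyword classification of an `Action:` line and a default when no such line is present. A rough sketch of both follows; `classify_action` and `resolve_action` are hypothetical names introduced only for illustration.

```rust
// Sketch of the `Action:` classification and default, mirroring the branch
// order in the diff. Both function names are hypothetical.
fn classify_action(line: &str) -> Option<&'static str> {
    let lowered = line.trim().to_ascii_lowercase();
    let rest = lowered.strip_prefix("action:")?;
    // Branch order matters: the execute keywords are checked first.
    if rest.contains("execute") || rest.contains("implement") || rest.contains("fix") {
        Some("execute")
    } else if rest.contains("review") || rest.contains("audit") {
        Some("review")
    } else if rest.contains("no-op") || rest.contains("noop") {
        Some("no-op")
    } else {
        None
    }
}

// Without an explicit action, chosen items imply "execute"; otherwise "no-op".
fn resolve_action(explicit: Option<&'static str>, chosen_count: usize) -> &'static str {
    explicit.unwrap_or(if chosen_count == 0 { "no-op" } else { "execute" })
}
```

Because the checks are substring matches in a fixed order, a line such as `Action: no-op, nothing to fix` would classify as `execute` in this sketch. Whether that matters depends on how lanes phrase their summaries.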
fn extract_roadmap_items(line: &str) -> Vec<String> {
let mut items = Vec::new();
let mut chars = line.chars().peekable();
while let Some(ch) = chars.next() {
if ch == '#' {
let mut digits = String::new();
while let Some(next) = chars.peek() {
if next.is_ascii_digit() {
digits.push(*next);
chars.next();
} else {
break;
}
}
if !digits.is_empty() {
items.push(format!("ROADMAP #{digits}"));
}
}
}
items
}
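For experimentation, the `#<digits>` scanner can be lifted into a standalone file. The copy below follows the logic in the diff, restructured slightly with an early `continue`; it assumes nothing beyond the standard library.

```rust
// Standalone copy of the `#<digits>` scanner for experimentation.
fn extract_roadmap_items(line: &str) -> Vec<String> {
    let mut items = Vec::new();
    let mut chars = line.chars().peekable();
    while let Some(ch) = chars.next() {
        if ch != '#' {
            continue;
        }
        // Consume the run of ASCII digits immediately following '#'.
        let mut digits = String::new();
        while let Some(next) = chars.peek().copied() {
            if !next.is_ascii_digit() {
                break;
            }
            digits.push(next);
            chars.next();
        }
        // A bare '#' with no digits is ignored.
        if !digits.is_empty() {
            items.push(format!("ROADMAP #{digits}"));
        }
    }
    items
}
```

The scanner keys only on `#` followed by digits, so any such reference on a line (for example a pull-request number) would also be reported with the `ROADMAP` label.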
fn derive_agent_state(
status: &str,
result: Option<&str>,
@@ -7240,6 +7564,14 @@ mod tests {
completed_manifest_json["laneEvents"][1]["event"],
"lane.finished"
);
assert_eq!(
completed_manifest_json["laneEvents"][1]["data"]["qualityFloorApplied"],
false
);
assert_eq!(
completed_manifest_json["laneEvents"][1]["detail"],
"Finished successfully in commit abc1234"
);
assert_eq!(
completed_manifest_json["laneEvents"][2]["event"],
"lane.commit.created"
@@ -7301,6 +7633,137 @@ mod tests {
);
assert_eq!(failed_manifest_json["derivedState"], "truly_idle");
let normalized = execute_agent_with_spawn(
AgentInput {
description: "Sweep the next backlog item".to_string(),
prompt: "Produce a low-signal stop summary".to_string(),
subagent_type: Some("Explore".to_string()),
name: Some("summary-floor".to_string()),
model: None,
},
|job| {
persist_agent_terminal_state(
&job.manifest,
"completed",
Some("commit push everyting, keep sweeping $ralph"),
None,
)
},
)
.expect("normalized agent should succeed");
let normalized_manifest = std::fs::read_to_string(&normalized.manifest_file)
.expect("normalized manifest should exist");
let normalized_manifest_json: serde_json::Value =
serde_json::from_str(&normalized_manifest).expect("normalized manifest json");
assert_eq!(
normalized_manifest_json["laneEvents"][1]["event"],
"lane.finished"
);
let normalized_detail = normalized_manifest_json["laneEvents"][1]["detail"]
.as_str()
.expect("normalized detail");
assert!(normalized_detail.contains("Completed lane `summary-floor`"));
assert!(normalized_detail.contains("Sweep the next backlog item"));
assert_eq!(
normalized_manifest_json["laneEvents"][1]["data"]["qualityFloorApplied"],
true
);
assert_eq!(
normalized_manifest_json["laneEvents"][1]["data"]["rawSummary"],
"commit push everyting, keep sweeping $ralph"
);
assert_eq!(
normalized_manifest_json["laneEvents"][1]["data"]["reasons"][0],
"control_only"
);
let review = execute_agent_with_spawn(
AgentInput {
description: "Review commit 1234abcd for ROADMAP #67".to_string(),
prompt: "Review the scoped diff".to_string(),
subagent_type: Some("Verification".to_string()),
name: Some("review-lane".to_string()),
model: None,
},
|job| {
persist_agent_terminal_state(
&job.manifest,
"completed",
Some("APPROVE\n\nTarget: commit 1234abcd\nRationale: scoped diff is safe."),
None,
)
},
)
.expect("review agent should succeed");
let review_manifest =
std::fs::read_to_string(&review.manifest_file).expect("review manifest should exist");
let review_manifest_json: serde_json::Value =
serde_json::from_str(&review_manifest).expect("review manifest json");
assert_eq!(
review_manifest_json["laneEvents"][1]["data"]["reviewVerdict"],
"approve"
);
assert_eq!(
review_manifest_json["laneEvents"][1]["data"]["reviewTarget"],
"Review commit 1234abcd for ROADMAP #67"
);
assert_eq!(
review_manifest_json["laneEvents"][1]["data"]["reviewRationale"],
"Target: commit 1234abcd Rationale: scoped diff is safe."
);
assert_eq!(
review_manifest_json["laneEvents"][1]["data"]["qualityFloorApplied"],
false
);
let selection = execute_agent_with_spawn(
AgentInput {
description: "Scan ROADMAP Immediate Backlog for the next repo-local item".to_string(),
prompt: "Choose the next backlog target".to_string(),
subagent_type: Some("Explore".to_string()),
name: Some("backlog-scan".to_string()),
model: None,
},
|job| {
persist_agent_terminal_state(
&job.manifest,
"completed",
Some(
"Selected next backlog target.\nChosen: ROADMAP #65\nSkipped: ROADMAP #63, ROADMAP #64\nAction: execute\nRationale: #65 is the next repo-local lane-finished metadata task.",
),
None,
)
},
)
.expect("selection agent should succeed");
let selection_manifest = std::fs::read_to_string(&selection.manifest_file)
.expect("selection manifest should exist");
let selection_manifest_json: serde_json::Value =
serde_json::from_str(&selection_manifest).expect("selection manifest json");
assert_eq!(
selection_manifest_json["laneEvents"][1]["data"]["selectionOutcome"]["chosenItems"][0],
"ROADMAP #65"
);
assert_eq!(
selection_manifest_json["laneEvents"][1]["data"]["selectionOutcome"]["skippedItems"][0],
"ROADMAP #63"
);
assert_eq!(
selection_manifest_json["laneEvents"][1]["data"]["selectionOutcome"]["skippedItems"][1],
"ROADMAP #64"
);
assert_eq!(
selection_manifest_json["laneEvents"][1]["data"]["selectionOutcome"]["action"],
"execute"
);
assert_eq!(
selection_manifest_json["laneEvents"][1]["data"]["selectionOutcome"]["rationale"],
"#65 is the next repo-local lane-finished metadata task."
);
let spawn_error = execute_agent_with_spawn(
AgentInput {
description: "Spawn error task".to_string(),