Mirror of https://github.com/instructkr/claw-code.git, synced 2026-06-28 05:26:24 -04:00
Compare commits: 4f83a81cf6...2e34949507 (8 commits)
| Author | SHA1 | Date | |
|---|---|---|---|
| | 2e34949507 | | |
| | 8f53524bd3 | | |
| | b5e30e2975 | | |
| | dbc2824a3e | | |
| | f309ff8642 | | |
| | 3b806702e7 | | |
| | 26b89e583f | | |
| | 17e21bc4ad | | |
15
ROADMAP.md
@@ -496,18 +496,25 @@ Model name prefix now wins unconditionally over env-var presence. Regression tes
|
||||
|
||||
62. **Worker state file surface not implemented** — **done (verified 2026-04-12):** current `main` already wires `emit_state_file(worker)` into the worker transition path in `rust/crates/runtime/src/worker_boot.rs`, atomically writes `.claw/worker-state.json`, and exposes the documented reader surface through `claw state` / `claw state --output-format json` in `rust/crates/rusty-claude-cli/src/main.rs`. Fresh proof exists in `runtime` regression `emit_state_file_writes_worker_status_on_transition`, the end-to-end `tools` regression `recovery_loop_state_file_reflects_transitions`, and direct CLI parsing coverage for `state` / `state --output-format json`. Source: Jobdori dogfood.
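The atomic write mentioned in item 62 is commonly done by writing a sibling temp file and renaming it over the target. The following is a minimal sketch of that pattern using only the standard library. The helper name `write_state_file_atomically` and the `.json.tmp` suffix are assumptions for illustration, not the code in `worker_boot.rs`.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper (not taken from claw-code): write `contents` to
/// `target` by first writing a sibling temp file, then renaming it into place.
/// Readers therefore see either the old file or the new one, never a partial write.
fn write_state_file_atomically(target: &Path, contents: &str) -> io::Result<()> {
    // Create the parent directory (for example `.claw/`) when one is given.
    if let Some(parent) = target.parent().filter(|p| !p.as_os_str().is_empty()) {
        fs::create_dir_all(parent)?;
    }

    // Assumed temp-file naming: `worker-state.json` -> `worker-state.json.tmp`.
    let tmp = target.with_extension("json.tmp");
    {
        let mut file = fs::File::create(&tmp)?;
        file.write_all(contents.as_bytes())?;
        // Flush file contents to disk before the rename publishes them.
        file.sync_all()?;
    }

    // The rename replaces any existing file at `target`.
    fs::rename(&tmp, target)
}
```

A caller would invoke this on every worker status transition, so a concurrent `claw state` reader never observes a half-written JSON document.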
|
||||
|
||||
**Scope note (verified 2026-04-12):** ROADMAP #31, #43, and #63-#68 currently appear to describe acpx/droid or upstream OMX/server orchestration behavior, not claw-code source already present in this repository. Repo-local searches for `acpx`, `use-droid`, `run-acpx`, `commit-wrapper`, `ultraclaw`, `roadmap-nudge-10min`, `OMX_TMUX_INJECT`, `/hooks/health`, and `/hooks/status` found no implementation hits outside `ROADMAP.md`, and the earlier state-surface note already records that the HTTP server is not owned by claw-code. With #45 now fixed, the remaining unresolved items in this section look like external tracking notes rather than confirmed repo-local backlog; re-check if new repo-local evidence appears.
|
||||
**Scope note (verified 2026-04-12):** ROADMAP #31, #43, and #63-#68 currently appear to describe acpx/droid or upstream OMX/server orchestration behavior, not claw-code source already present in this repository. Repo-local searches for `acpx`, `use-droid`, `run-acpx`, `commit-wrapper`, `ultraclaw`, `roadmap-nudge-10min`, `OMX_TMUX_INJECT`, `/hooks/health`, and `/hooks/status` found no implementation hits outside `ROADMAP.md`, and the earlier state-surface note already records that the HTTP server is not owned by claw-code. With #45, #65, #67, and #69 now fixed, the remaining unresolved items in this section look like external tracking notes rather than confirmed repo-local backlog; re-check if new repo-local evidence appears.
|
||||
|
||||
63. **Droid session completion semantics broken: code arrives after "status: completed"** — dogfooded 2026-04-12. Ultraclaw droid sessions (use-droid via acpx) report `session.status: completed` before file writes are fully flushed/synced to the working tree. Discovered +410 lines of "late-arriving" droid output that appeared after I had already assessed 8 sessions as "no code produced." This creates false-negative assessments and duplicate work. **Fix shape:** (a) droid agent should only report completion after explicit file-write confirmation (fsync or existence check); (b) or, claw-code should expose a `pending_writes` status that indicates "agent responded, disk flush pending"; (c) lane orchestrators should poll for file changes for N seconds after completion before final assessment. **Blocker:** none. Source: Jobdori ultraclaw dogfood 2026-04-12.
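Fix shapes (b) and (c) in item 63 can be combined into one small decision function: treat a session as settled only after both the completion report and the last observed file write are older than a quiet window. This is a sketch under stated assumptions. `CompletionState`, `classify_completion`, and the millisecond parameters are hypothetical names, not an existing claw-code or acpx API.

```rust
/// Hypothetical lane-side view of a droid session (names are illustrative).
#[derive(Debug, PartialEq, Eq)]
enum CompletionState {
    /// The agent has not reported completion yet.
    Running,
    /// The agent reported completion, but writes may still be arriving.
    PendingWrites,
    /// Completion was reported and the working tree has been quiet long enough.
    Settled,
}

/// Classify a session from what the orchestrator has observed so far.
/// `ms_since_last_write` is `None` when no file write has been seen at all.
fn classify_completion(
    agent_reported_done: bool,
    ms_since_completion: u64,
    ms_since_last_write: Option<u64>,
    quiet_window_ms: u64,
) -> CompletionState {
    if !agent_reported_done {
        return CompletionState::Running;
    }
    // Wait out the quiet window after the completion report itself...
    let completion_is_old = ms_since_completion >= quiet_window_ms;
    // ...and after the most recent write, if any write was observed.
    let tree_is_quiet = ms_since_last_write.map_or(true, |elapsed| elapsed >= quiet_window_ms);
    if completion_is_old && tree_is_quiet {
        CompletionState::Settled
    } else {
        CompletionState::PendingWrites
    }
}
```

With this shape, a "no code produced" assessment would only be made in the `Settled` state, which avoids judging a session while late output is still landing.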
|
||||
|
||||
64. **Artifact provenance is post-hoc narration, not structured events** — dogfooded 2026-04-12. The ultraclaw batch delivered 4 ROADMAP items and 3 commits, but the event stream only contained log-shaped text ("+410 lines detected", "committing...", "pushed"). Downstream consumers (clawhip, lane orchestrators, monitors) must reconstruct provenance from chat messages rather than consuming first-class events. **Fix shape:** emit structured artifact/result events with: `sourceLanes`, `roadmapIds`, `files`, `diffStat`, `verification: tested|committed|pushed|merged`, `commitSha`. Remove dependency on human/bot narration layer to explain what actually landed. Blocker: none. Source: gaebal-gajae dogfood analysis 2026-04-12.
|
||||
|
||||
65. **Backlog-scanning team lanes emit opaque stops, not structured selection outcomes** — dogfooded 2026-04-12. $ralph $team sessions scanning ROADMAP Immediate Backlog stop with summary text naming open items, but no machine-readable signal of: which item(s) were selected for work, which were skipped and why, whether execution happened vs review-only vs no-op. **Fix shape:** add structured "selection outcome" event with `chosenItems`, `skippedItems`, `rationale`, `action: execute|review|no-op`. Stop emitting "check backlog" as prose summary without selection contract. Blocker: none. Source: gaebal-gajae dogfood analysis 2026-04-12.
|
||||
65. **Backlog-scanning team lanes emit opaque stops, not structured selection outcomes** — **done (verified 2026-04-12):** completed lane persistence in `rust/crates/tools/src/lib.rs` now recognizes backlog-scan selection summaries and records structured `selectionOutcome` metadata on `lane.finished`, including `chosenItems`, `skippedItems`, `action`, and optional `rationale`, while preserving existing non-selection and review-lane behavior. Regression coverage locks the structured backlog-scan payload alongside the earlier quality-floor and review-verdict paths. **Original filing below.**
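To make the `selectionOutcome` shape in item 65 concrete, here is a minimal sketch of turning a summary line into structured fields. The input format (`chosen: ...; skipped: ...; action: ...`) and every name in the block are assumptions for illustration. The recognition logic actually used in `rust/crates/tools/src/lib.rs` is not shown in this diff.

```rust
/// Hypothetical mirror of the `action: execute|review|no-op` field.
#[derive(Debug, PartialEq, Eq)]
enum SelectionAction {
    Execute,
    Review,
    NoOp,
}

/// Hypothetical structured form of a backlog-scan stop summary.
#[derive(Debug, PartialEq, Eq)]
struct SelectionOutcome {
    chosen_items: Vec<String>,
    skipped_items: Vec<String>,
    action: SelectionAction,
    rationale: Option<String>,
}

/// Split a comma-separated item list such as "#65, #67".
fn split_items(value: &str) -> Vec<String> {
    value
        .split(',')
        .map(str::trim)
        .filter(|item| !item.is_empty())
        .map(str::to_string)
        .collect()
}

/// Parse an assumed `key: value; key: value` summary line.
/// Returns `None` when no recognizable `action` is present, so free-form
/// prose such as "check backlog" is not mistaken for a selection outcome.
fn parse_selection_summary(summary: &str) -> Option<SelectionOutcome> {
    let mut chosen_items = Vec::new();
    let mut skipped_items = Vec::new();
    let mut action = None;
    let mut rationale = None;

    for part in summary.split(';') {
        let Some((key, value)) = part.split_once(':') else {
            continue;
        };
        let value = value.trim();
        match key.trim().to_ascii_lowercase().as_str() {
            "chosen" => chosen_items = split_items(value),
            "skipped" => skipped_items = split_items(value),
            "action" => {
                action = match value.to_ascii_lowercase().as_str() {
                    "execute" => Some(SelectionAction::Execute),
                    "review" => Some(SelectionAction::Review),
                    "no-op" => Some(SelectionAction::NoOp),
                    _ => None,
                }
            }
            "rationale" if !value.is_empty() => rationale = Some(value.to_string()),
            _ => {}
        }
    }

    Some(SelectionOutcome {
        chosen_items,
        skipped_items,
        action: action?,
        rationale,
    })
}
```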
|
||||
|
||||
66. **Completion-aware reminder shutdown missing** — dogfooded 2026-04-12. Ultraclaw batch completed and was reported as done, but 10-minute cron reminder (`roadmap-nudge-10min`) kept firing into channel as if work still pending. Reminder/cron state not coupled to terminal task state. **Fix shape:** (a) cron jobs should check task completion state before firing; (b) or, provide explicit `cron.remove` on task completion; (c) or, reminders should include "work complete" detection and auto-expire. Blocker: none. Source: gaebal-gajae dogfood analysis 2026-04-12.
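Fix shape (a) in item 66 amounts to consulting task state before each reminder fires. A minimal sketch follows, assuming a hypothetical `TaskState` enum and `decide_reminder` function. Neither is an existing cron or claw-code interface.

```rust
/// Hypothetical task lifecycle states tracked alongside a reminder.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TaskState {
    Pending,
    Running,
    Completed,
    Failed,
}

/// What the scheduler should do with a reminder on its next tick.
#[derive(Debug, PartialEq, Eq)]
enum ReminderDecision {
    /// Work is still outstanding, so the nudge is meaningful.
    Fire,
    /// The task reached a terminal state, so the reminder should be removed.
    Remove,
}

/// Couple reminder firing to terminal task state.
fn decide_reminder(state: TaskState) -> ReminderDecision {
    match state {
        TaskState::Pending | TaskState::Running => ReminderDecision::Fire,
        TaskState::Completed | TaskState::Failed => ReminderDecision::Remove,
    }
}
```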
|
||||
|
||||
67. **Scoped review lanes do not emit structured verdicts** — dogfooded 2026-04-12. OMX review lanes now have improved scope (specific ROADMAP items, specific files, explicit APPROVE/REJECT contract), but the stop event only contains the review request — not the actual verdict. Operators must infer approval/rejection/blockage from later git commits or surrounding chatter. **Fix shape:** emit structured review result on stop with: `verdict: approve|reject|blocked`, `target: commit/diff reviewed`, `rationale: short summary`. Blocker: none. Source: gaebal-gajae dogfood analysis 2026-04-12.
|
||||
67. **Scoped review lanes do not emit structured verdicts** — **done (verified 2026-04-12):** completed lane persistence in `rust/crates/tools/src/lib.rs` now recognizes review-style `APPROVE`/`REJECT`/`BLOCKED` results and records structured `reviewVerdict`, `reviewTarget`, and `reviewRationale` metadata on the `lane.finished` event while preserving existing non-review lane behavior. Regression coverage locks both the normal completion path and a scoped review-lane completion payload. **Original filing below.**
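One way to recognize review-style results, as described in item 67, is to look for an upper-case verdict token in the lane summary. The sketch below assumes verdicts appear as standalone upper-case words. `ReviewVerdict` and `detect_review_verdict` are illustrative names and may not match what `rust/crates/tools/src/lib.rs` does.

```rust
/// Hypothetical structured verdict for a scoped review lane.
#[derive(Debug, PartialEq, Eq)]
enum ReviewVerdict {
    Approve,
    Reject,
    Blocked,
}

/// Return the first standalone upper-case verdict token in `summary`.
/// Lower-case prose ("please approve or reject") is deliberately ignored so a
/// review request is not mistaken for a review result.
fn detect_review_verdict(summary: &str) -> Option<ReviewVerdict> {
    summary
        .split(|c: char| !c.is_ascii_alphabetic())
        .find_map(|word| match word {
            "APPROVE" => Some(ReviewVerdict::Approve),
            "REJECT" => Some(ReviewVerdict::Reject),
            "BLOCKED" => Some(ReviewVerdict::Blocked),
            _ => None,
        })
}
```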
|
||||
|
||||
68. **Internal reinjection/resume paths leak opaque control prose** — dogfooded 2026-04-12. OMX lanes stopping with `Continue from current mode state. [OMX_TMUX_INJECT]` expose internal implementation details instead of operator-meaningful state. The event tells us *that* tmux reinjection happened, but not *why* (retry after failure? resume after idle? manual recovery?), *what state was preserved*, or *what the lane was trying to do*. **Fix shape:** recovery/reinject events should emit structured cause like: `resume_after_stop`, `retry_after_tool_failure`, `tmux_reinject_after_idle`, `manual_recovery` plus preserved state / target lane info. Never leak bare internal markers like `[OMX_TMUX_INJECT]` as the primary summary. Blocker: none. Source: gaebal-gajae dogfood analysis 2026-04-12.
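The structured causes listed in item 68 map naturally onto an enum. The sketch below shows that mapping plus a guard that replaces a bare `[OMX_TMUX_INJECT]` summary with the cause and target lane. The type and function names are assumptions, since OMX is upstream of this repository.

```rust
/// Hypothetical structured cause for a recovery or reinject event.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ResumeCause {
    ResumeAfterStop,
    RetryAfterToolFailure,
    TmuxReinjectAfterIdle,
    ManualRecovery,
}

impl ResumeCause {
    /// Wire name matching the identifiers proposed in the roadmap item.
    fn as_str(self) -> &'static str {
        match self {
            ResumeCause::ResumeAfterStop => "resume_after_stop",
            ResumeCause::RetryAfterToolFailure => "retry_after_tool_failure",
            ResumeCause::TmuxReinjectAfterIdle => "tmux_reinject_after_idle",
            ResumeCause::ManualRecovery => "manual_recovery",
        }
    }
}

/// Internal marker that should never be the primary operator-facing summary.
const INTERNAL_MARKER: &str = "[OMX_TMUX_INJECT]";

/// Replace summaries that leak the internal marker with the cause and lane.
fn operator_summary(raw: &str, cause: ResumeCause, target_lane: &str) -> String {
    if raw.contains(INTERNAL_MARKER) {
        format!("{} (lane: {target_lane})", cause.as_str())
    } else {
        raw.to_string()
    }
}
```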
|
||||
|
||||
69. **Lane stop summaries have no minimum quality floor** — dogfooded 2026-04-12. `clawcode-human` session stopped with summary `commit push everyting, keep sweeping $ralph` — vague, typo-ridden, operationally useless. Unlike well-scoped review lanes, this summary regressed to mushy command prose with no outcome clarity. **Fix shape:** (a) enforce minimum stop/result summary standards: what was done (outcome), what was scoped (target), what's next (state); (b) typo/grammar validation; (c) reject summaries that are shorter than N words or contain only control verbs without context. Blocker: none. Source: gaebal-gajae dogfood analysis 2026-04-12.
|
||||
69. **Lane stop summaries have no minimum quality floor** — **done (verified 2026-04-12):** completed lane persistence in `rust/crates/tools/src/lib.rs` now normalizes vague/control-only stop summaries into a contextual fallback that includes the lane target and status, while preserving structured metadata about whether the quality floor fired (`qualityFloorApplied`, `rawSummary`, `reasons`, `wordCount`). Regression coverage locks both the pass-through path for good summaries and the fallback path for mushy summaries like `commit push everyting, keep sweeping $ralph`. **Original filing below.**
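A quality floor like the one described in item 69 can be sketched as a word-count check plus a control-verb check, with a contextual fallback when either fires. The threshold (`MIN_WORDS = 8`), the control-word list, the reason strings, and the fallback wording below are all assumptions. Only the metadata field names (`qualityFloorApplied`, `rawSummary`, `reasons`, `wordCount`) come from the roadmap text.

```rust
/// Assumed threshold: summaries with fewer words are treated as too vague.
const MIN_WORDS: usize = 8;
/// Assumed list of control verbs that carry no outcome information alone.
const CONTROL_WORDS: &[&str] = &["commit", "push", "keep", "sweeping", "continue", "check"];

/// Hypothetical result mirroring the metadata fields named in the roadmap.
#[derive(Debug)]
struct NormalizedSummary {
    summary: String,
    quality_floor_applied: bool,
    raw_summary: String,
    reasons: Vec<&'static str>,
    word_count: usize,
}

fn apply_quality_floor(raw: &str, lane_target: &str, status: &str) -> NormalizedSummary {
    // Normalize words: strip surrounding punctuation and lower-case them.
    let words: Vec<String> = raw
        .split_whitespace()
        .map(|word| {
            word.trim_matches(|c: char| !c.is_ascii_alphanumeric())
                .to_ascii_lowercase()
        })
        .filter(|word| !word.is_empty())
        .collect();

    let mut reasons = Vec::new();
    if words.len() < MIN_WORDS {
        reasons.push("too_short");
    }
    if !words.is_empty() && words.iter().all(|word| CONTROL_WORDS.contains(&word.as_str())) {
        reasons.push("control_only");
    }

    let quality_floor_applied = !reasons.is_empty();
    let summary = if quality_floor_applied {
        // Contextual fallback naming the lane target and status.
        format!("lane {lane_target} stopped with status {status}; original summary was too vague to report")
    } else {
        raw.trim().to_string()
    };

    NormalizedSummary {
        summary,
        quality_floor_applied,
        raw_summary: raw.to_string(),
        reasons,
        word_count: words.len(),
    }
}
```

Keeping `raw_summary` alongside the fallback means downstream consumers can still audit what the lane originally said.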
|
||||
|
||||
70. **Install-source ambiguity misleads real users** — **done (verified 2026-04-12):** repo-local Rust guidance now makes the source of truth explicit in `claw doctor` and `claw --help`, naming `ultraworkers/claw-code` as the canonical repo and warning that `cargo install claw-code` installs a deprecated stub rather than the `claw` binary. Regression coverage locks both the new doctor JSON check and the help-text warning. **Original filing below.**
|
||||
|
||||
71. **Wrong-task prompt receipt is not detected before execution** — **done (verified 2026-04-12):** worker boot prompt dispatch now accepts an optional structured `task_receipt` (`repo`, `task_kind`, `source_surface`, `expected_artifacts`, `objective_preview`) and treats mismatched visible prompt context as a `WrongTask` prompt-delivery failure before execution continues. The prompt-delivery payload now records `observed_prompt_preview` plus the expected receipt, and regression coverage locks both the existing shell/wrong-target paths and the new KakaoTalk-style wrong-task mismatch case. **Original filing below.**
|
||||
|
||||
72. **`latest` managed-session selection depends on filesystem mtime before semantic session recency** — **done (verified 2026-04-12):** managed-session summaries now carry `updated_at_ms`, `SessionStore::list_sessions()` sorts by semantic recency before filesystem mtime, and regression coverage locks the case where `latest` must prefer the newer session payload even when file mtimes point the other way. The CLI session-summary wrapper now stays in sync with the runtime field so `latest` resolution uses the same ordering signal everywhere. **Original filing below.**
|
||||
73. **Session timestamps are not monotonic enough for latest-session ordering under tight loops** — **done (verified 2026-04-12):** runtime session timestamps now use a process-local monotonic millisecond source, so back-to-back saves still produce increasing `updated_at_ms` even when the wall clock does not advance. The temporary sleep hack was removed from the resume-latest regression, and fresh workspace verification stayed green with the semantic-recency ordering path from #72. **Original filing below.**
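The process-local monotonic source in item 73 can also be expressed with `AtomicU64::fetch_update`, which wraps the compare-and-swap retry loop. This is an alternative sketch, not the code in this change. `monotonic_millis_from` and `monotonic_now_millis` are hypothetical names, and the clock reading is passed in so the behaviour can be checked deterministically.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

/// Process-wide high-water mark (illustrative; the diff uses its own static).
static LAST_MS: AtomicU64 = AtomicU64::new(0);

/// Return a timestamp that is strictly greater than any value previously
/// returned through `last`, preferring the wall clock when it has advanced.
fn monotonic_millis_from(wall_clock_ms: u64, last: &AtomicU64) -> u64 {
    let previous = last
        .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |prev| {
            // Store whichever is larger: the wall clock or previous + 1.
            Some(wall_clock_ms.max(prev.saturating_add(1)))
        })
        .expect("closure always returns Some");
    // Recompute the value that was stored for the observed `previous`.
    wall_clock_ms.max(previous.saturating_add(1))
}

/// Convenience wrapper that reads the system clock.
fn monotonic_now_millis() -> u64 {
    let wall_clock_ms = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|duration| u64::try_from(duration.as_millis()).unwrap_or(u64::MAX))
        .unwrap_or_default();
    monotonic_millis_from(wall_clock_ms, &LAST_MS)
}
```

Because the closure re-evaluates `max(wall_clock, prev + 1)` on every retry, a contended call still returns the wall-clock value whenever the wall clock is ahead of the stored high-water mark.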
|
||||
|
||||
@@ -13,6 +13,7 @@ const SESSION_VERSION: u32 = 1;
 const ROTATE_AFTER_BYTES: u64 = 256 * 1024;
 const MAX_ROTATED_FILES: usize = 3;
 static SESSION_ID_COUNTER: AtomicU64 = AtomicU64::new(0);
+static LAST_TIMESTAMP_MS: AtomicU64 = AtomicU64::new(0);

 /// Speaker role associated with a persisted conversation message.
 #[derive(Debug, Clone, Copy, PartialEq, Eq)]
@@ -1030,10 +1031,27 @@ fn normalize_optional_string(value: Option<String>) -> Option<String> {
 }

 fn current_time_millis() -> u64 {
-    SystemTime::now()
+    let wall_clock = SystemTime::now()
         .duration_since(UNIX_EPOCH)
         .map(|duration| u64::try_from(duration.as_millis()).unwrap_or(u64::MAX))
-        .unwrap_or_default()
+        .unwrap_or_default();
+
+    let mut candidate = wall_clock;
+    loop {
+        let previous = LAST_TIMESTAMP_MS.load(Ordering::Relaxed);
+        if candidate <= previous {
+            candidate = previous.saturating_add(1);
+        }
+        match LAST_TIMESTAMP_MS.compare_exchange(
+            previous,
+            candidate,
+            Ordering::SeqCst,
+            Ordering::SeqCst,
+        ) {
+            Ok(_) => return candidate,
+            Err(actual) => candidate = actual.saturating_add(1),
+        }
+    }
 }

 fn generate_session_id() -> String {
@@ -1125,8 +1143,8 @@ fn cleanup_rotated_logs(path: &Path) -> Result<(), SessionError> {
 #[cfg(test)]
 mod tests {
     use super::{
-        cleanup_rotated_logs, rotate_session_file_if_needed, ContentBlock, ConversationMessage,
-        MessageRole, Session, SessionFork,
+        cleanup_rotated_logs, current_time_millis, rotate_session_file_if_needed, ContentBlock,
+        ConversationMessage, MessageRole, Session, SessionFork,
     };
     use crate::json::JsonValue;
     use crate::usage::TokenUsage;
@@ -1134,6 +1152,16 @@ mod tests {
     use std::path::{Path, PathBuf};
     use std::time::{SystemTime, UNIX_EPOCH};

+    #[test]
+    fn session_timestamps_are_monotonic_under_tight_loops() {
+        let first = current_time_millis();
+        let second = current_time_millis();
+        let third = current_time_millis();
+
+        assert!(first < second);
+        assert!(second < third);
+    }
+
     #[test]
     fn persists_and_restores_session_jsonl() {
         let mut session = Session::new();
@@ -144,12 +144,7 @@ impl SessionStore {
         if let Some(legacy_root) = self.legacy_sessions_root() {
             self.collect_sessions_from_dir(&legacy_root, &mut sessions)?;
         }
-        sessions.sort_by(|left, right| {
-            right
-                .modified_epoch_millis
-                .cmp(&left.modified_epoch_millis)
-                .then_with(|| right.id.cmp(&left.id))
-        });
+        sort_managed_sessions(&mut sessions);
         Ok(sessions)
     }

@@ -260,6 +255,7 @@ impl SessionStore {
         ManagedSessionSummary {
             id: session.session_id,
             path,
+            updated_at_ms: session.updated_at_ms,
             modified_epoch_millis,
             message_count: session.messages.len(),
             parent_session_id: session
@@ -279,6 +275,7 @@ impl SessionStore {
                 .unwrap_or("unknown")
                 .to_string(),
             path,
+            updated_at_ms: 0,
             modified_epoch_millis,
             message_count: 0,
             parent_session_id: None,
@@ -322,12 +319,23 @@ pub struct SessionHandle {
 pub struct ManagedSessionSummary {
     pub id: String,
     pub path: PathBuf,
+    pub updated_at_ms: u64,
     pub modified_epoch_millis: u128,
     pub message_count: usize,
     pub parent_session_id: Option<String>,
     pub branch_name: Option<String>,
 }

+fn sort_managed_sessions(sessions: &mut [ManagedSessionSummary]) {
+    sessions.sort_by(|left, right| {
+        right
+            .updated_at_ms
+            .cmp(&left.updated_at_ms)
+            .then_with(|| right.modified_epoch_millis.cmp(&left.modified_epoch_millis))
+            .then_with(|| right.id.cmp(&left.id))
+    });
+}
+
 #[derive(Debug, Clone, PartialEq, Eq)]
 pub struct LoadedManagedSession {
     pub handle: SessionHandle,
@@ -598,6 +606,35 @@ mod tests {
             .expect("session summary should exist")
     }

+    #[test]
+    fn latest_session_prefers_semantic_updated_at_over_file_mtime() {
+        let mut sessions = vec![
+            ManagedSessionSummary {
+                id: "older-file-newer-session".to_string(),
+                path: PathBuf::from("/tmp/older"),
+                updated_at_ms: 200,
+                modified_epoch_millis: 100,
+                message_count: 2,
+                parent_session_id: None,
+                branch_name: None,
+            },
+            ManagedSessionSummary {
+                id: "newer-file-older-session".to_string(),
+                path: PathBuf::from("/tmp/newer"),
+                updated_at_ms: 100,
+                modified_epoch_millis: 200,
+                message_count: 1,
+                parent_session_id: None,
+                branch_name: None,
+            },
+        ];
+
+        crate::session_control::sort_managed_sessions(&mut sessions);
+
+        assert_eq!(sessions[0].id, "older-file-newer-session");
+        assert_eq!(sessions[1].id, "newer-file-older-session");
+    }
+
     #[test]
     fn creates_and_lists_managed_sessions() {
         // given
@@ -92,6 +92,7 @@ pub enum WorkerTrustResolution {
|
||||
pub enum WorkerPromptTarget {
|
||||
Shell,
|
||||
WrongTarget,
|
||||
WrongTask,
|
||||
Unknown,
|
||||
}
|
||||
|
||||
@@ -108,10 +109,24 @@ pub enum WorkerEventPayload {
|
||||
observed_target: WorkerPromptTarget,
|
||||
#[serde(skip_serializing_if = "Option::is_none")]
|
||||
observed_cwd: Option<String>,
|
||||
#[serde(skip_serializing_if = "Option::is_none")]
|
||||
observed_prompt_preview: Option<String>,
|
||||
#[serde(skip_serializing_if = "Option::is_none")]
|
||||
task_receipt: Option<WorkerTaskReceipt>,
|
||||
recovery_armed: bool,
|
||||
},
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
|
||||
pub struct WorkerTaskReceipt {
|
||||
pub repo: String,
|
||||
pub task_kind: String,
|
||||
pub source_surface: String,
|
||||
#[serde(default, skip_serializing_if = "Vec::is_empty")]
|
||||
pub expected_artifacts: Vec<String>,
|
||||
pub objective_preview: String,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
|
||||
pub struct WorkerEvent {
|
||||
pub seq: u64,
|
||||
@@ -134,6 +149,7 @@ pub struct Worker {
|
||||
pub prompt_delivery_attempts: u32,
|
||||
pub prompt_in_flight: bool,
|
||||
pub last_prompt: Option<String>,
|
||||
pub expected_receipt: Option<WorkerTaskReceipt>,
|
||||
pub replay_prompt: Option<String>,
|
||||
pub last_error: Option<WorkerFailure>,
|
||||
pub created_at: u64,
|
||||
@@ -182,6 +198,7 @@ impl WorkerRegistry {
|
||||
prompt_delivery_attempts: 0,
|
||||
prompt_in_flight: false,
|
||||
last_prompt: None,
|
||||
expected_receipt: None,
|
||||
replay_prompt: None,
|
||||
last_error: None,
|
||||
created_at: ts,
|
||||
@@ -257,6 +274,7 @@ impl WorkerRegistry {
|
||||
&lowered,
|
||||
worker.last_prompt.as_deref(),
|
||||
&worker.cwd,
|
||||
worker.expected_receipt.as_ref(),
|
||||
)
|
||||
})
|
||||
.flatten()
|
||||
@@ -272,6 +290,10 @@ impl WorkerRegistry {
|
||||
"worker prompt landed in the wrong target instead of {}: {}",
|
||||
worker.cwd, prompt_preview
|
||||
),
|
||||
WorkerPromptTarget::WrongTask => format!(
|
||||
"worker prompt receipt mismatched the expected task context for {}: {}",
|
||||
worker.cwd, prompt_preview
|
||||
),
|
||||
WorkerPromptTarget::Unknown => format!(
|
||||
"worker prompt delivery failed before reaching coding agent: {prompt_preview}"
|
||||
),
|
||||
@@ -291,6 +313,8 @@ impl WorkerRegistry {
|
||||
prompt_preview: prompt_preview.clone(),
|
||||
observed_target: observation.target,
|
||||
observed_cwd: observation.observed_cwd.clone(),
|
||||
observed_prompt_preview: observation.observed_prompt_preview.clone(),
|
||||
task_receipt: worker.expected_receipt.clone(),
|
||||
recovery_armed: false,
|
||||
}),
|
||||
);
|
||||
@@ -306,6 +330,8 @@ impl WorkerRegistry {
|
||||
prompt_preview,
|
||||
observed_target: observation.target,
|
||||
observed_cwd: observation.observed_cwd,
|
||||
observed_prompt_preview: observation.observed_prompt_preview,
|
||||
task_receipt: worker.expected_receipt.clone(),
|
||||
recovery_armed: true,
|
||||
}),
|
||||
);
|
||||
@@ -374,7 +400,12 @@ impl WorkerRegistry {
         Ok(worker.clone())
     }

-    pub fn send_prompt(&self, worker_id: &str, prompt: Option<&str>) -> Result<Worker, String> {
+    pub fn send_prompt(
+        &self,
+        worker_id: &str,
+        prompt: Option<&str>,
+        task_receipt: Option<WorkerTaskReceipt>,
+    ) -> Result<Worker, String> {
         let mut inner = self.inner.lock().expect("worker registry lock poisoned");
         let worker = inner
             .workers
@@ -398,6 +429,7 @@ impl WorkerRegistry {
|
||||
worker.prompt_delivery_attempts += 1;
|
||||
worker.prompt_in_flight = true;
|
||||
worker.last_prompt = Some(next_prompt.clone());
|
||||
worker.expected_receipt = task_receipt;
|
||||
worker.replay_prompt = None;
|
||||
worker.last_error = None;
|
||||
worker.status = WorkerStatus::Running;
|
||||
@@ -548,6 +580,7 @@ fn prompt_misdelivery_is_relevant(worker: &Worker) -> bool {
|
||||
struct PromptDeliveryObservation {
|
||||
target: WorkerPromptTarget,
|
||||
observed_cwd: Option<String>,
|
||||
observed_prompt_preview: Option<String>,
|
||||
}
|
||||
|
||||
fn push_event(
|
||||
@@ -699,6 +732,7 @@ fn detect_prompt_misdelivery(
|
||||
lowered: &str,
|
||||
prompt: Option<&str>,
|
||||
expected_cwd: &str,
|
||||
expected_receipt: Option<&WorkerTaskReceipt>,
|
||||
) -> Option<PromptDeliveryObservation> {
|
||||
let Some(prompt) = prompt else {
|
||||
return None;
|
||||
@@ -713,12 +747,30 @@ fn detect_prompt_misdelivery(
         return None;
     }
     let prompt_visible = lowered.contains(&prompt_snippet);
+    let observed_prompt_preview = detect_prompt_echo(screen_text);
+
+    if let Some(receipt) = expected_receipt {
+        let receipt_visible = task_receipt_visible(lowered, receipt);
+        let mismatched_prompt_visible = observed_prompt_preview
+            .as_deref()
+            .map(str::to_ascii_lowercase)
+            .is_some_and(|preview| !preview.contains(&prompt_snippet));
+
+        if (prompt_visible || mismatched_prompt_visible) && !receipt_visible {
+            return Some(PromptDeliveryObservation {
+                target: WorkerPromptTarget::WrongTask,
+                observed_cwd: detect_observed_shell_cwd(screen_text),
+                observed_prompt_preview,
+            });
+        }
+    }

     if let Some(observed_cwd) = detect_observed_shell_cwd(screen_text) {
         if prompt_visible && !cwd_matches_observed_target(expected_cwd, &observed_cwd) {
             return Some(PromptDeliveryObservation {
                 target: WorkerPromptTarget::WrongTarget,
                 observed_cwd: Some(observed_cwd),
+                observed_prompt_preview,
             });
         }
     }
@@ -736,6 +788,7 @@ fn detect_prompt_misdelivery(
|
||||
(shell_error && prompt_visible).then_some(PromptDeliveryObservation {
|
||||
target: WorkerPromptTarget::Shell,
|
||||
observed_cwd: None,
|
||||
observed_prompt_preview,
|
||||
})
|
||||
}
|
||||
|
||||
@@ -748,10 +801,38 @@ fn prompt_preview(prompt: &str) -> String {
     format!("{}…", preview.trim_end())
 }

+fn detect_prompt_echo(screen_text: &str) -> Option<String> {
+    screen_text.lines().find_map(|line| {
+        line.trim_start()
+            .strip_prefix('›')
+            .map(str::trim)
+            .filter(|value| !value.is_empty())
+            .map(str::to_string)
+    })
+}
+
+fn task_receipt_visible(lowered_screen_text: &str, receipt: &WorkerTaskReceipt) -> bool {
+    let expected_tokens = [
+        receipt.repo.to_ascii_lowercase(),
+        receipt.task_kind.to_ascii_lowercase(),
+        receipt.source_surface.to_ascii_lowercase(),
+        receipt.objective_preview.to_ascii_lowercase(),
+    ];
+
+    expected_tokens
+        .iter()
+        .all(|token| lowered_screen_text.contains(token))
+        && receipt
+            .expected_artifacts
+            .iter()
+            .all(|artifact| lowered_screen_text.contains(&artifact.to_ascii_lowercase()))
+}
+
 fn prompt_misdelivery_detail(observation: &PromptDeliveryObservation) -> &'static str {
     match observation.target {
         WorkerPromptTarget::Shell => "shell misdelivery detected",
         WorkerPromptTarget::WrongTarget => "prompt landed in wrong target",
+        WorkerPromptTarget::WrongTask => "prompt receipt mismatched expected task context",
         WorkerPromptTarget::Unknown => "prompt delivery failure detected",
     }
 }
@@ -865,7 +946,7 @@ mod tests {
|
||||
WorkerFailureKind::TrustGate
|
||||
);
|
||||
|
||||
let send_before_resolve = registry.send_prompt(&worker.worker_id, Some("ship it"));
|
||||
let send_before_resolve = registry.send_prompt(&worker.worker_id, Some("ship it"), None);
|
||||
assert!(send_before_resolve
|
||||
.expect_err("prompt delivery should be gated")
|
||||
.contains("not ready for prompt delivery"));
|
||||
@@ -905,7 +986,7 @@ mod tests {
|
||||
.expect("ready observe should succeed");
|
||||
|
||||
let running = registry
|
||||
.send_prompt(&worker.worker_id, Some("Implement worker handshake"))
|
||||
.send_prompt(&worker.worker_id, Some("Implement worker handshake"), None)
|
||||
.expect("prompt send should succeed");
|
||||
assert_eq!(running.status, WorkerStatus::Running);
|
||||
assert_eq!(running.prompt_delivery_attempts, 1);
|
||||
@@ -941,6 +1022,8 @@ mod tests {
|
||||
prompt_preview: "Implement worker handshake".to_string(),
|
||||
observed_target: WorkerPromptTarget::Shell,
|
||||
observed_cwd: None,
|
||||
observed_prompt_preview: None,
|
||||
task_receipt: None,
|
||||
recovery_armed: false,
|
||||
})
|
||||
);
|
||||
@@ -956,12 +1039,14 @@ mod tests {
|
||||
prompt_preview: "Implement worker handshake".to_string(),
|
||||
observed_target: WorkerPromptTarget::Shell,
|
||||
observed_cwd: None,
|
||||
observed_prompt_preview: None,
|
||||
task_receipt: None,
|
||||
recovery_armed: true,
|
||||
})
|
||||
);
|
||||
|
||||
let replayed = registry
|
||||
.send_prompt(&worker.worker_id, None)
|
||||
.send_prompt(&worker.worker_id, None, None)
|
||||
.expect("replay send should succeed");
|
||||
assert_eq!(replayed.status, WorkerStatus::Running);
|
||||
assert!(replayed.replay_prompt.is_none());
|
||||
@@ -976,7 +1061,11 @@ mod tests {
|
||||
.observe(&worker.worker_id, "Ready for input\n>")
|
||||
.expect("ready observe should succeed");
|
||||
registry
|
||||
.send_prompt(&worker.worker_id, Some("Run the worker bootstrap tests"))
|
||||
.send_prompt(
|
||||
&worker.worker_id,
|
||||
Some("Run the worker bootstrap tests"),
|
||||
None,
|
||||
)
|
||||
.expect("prompt send should succeed");
|
||||
|
||||
let recovered = registry
|
||||
@@ -1007,6 +1096,8 @@ mod tests {
|
||||
prompt_preview: "Run the worker bootstrap tests".to_string(),
|
||||
observed_target: WorkerPromptTarget::WrongTarget,
|
||||
observed_cwd: Some("/tmp/repo-target-b".to_string()),
|
||||
observed_prompt_preview: None,
|
||||
task_receipt: None,
|
||||
recovery_armed: false,
|
||||
})
|
||||
);
|
||||
@@ -1049,6 +1140,75 @@ mod tests {
|
||||
assert!(ready.last_error.is_none());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn wrong_task_receipt_mismatch_is_detected_before_execution_continues() {
|
||||
let registry = WorkerRegistry::new();
|
||||
let worker = registry.create("/tmp/repo-task", &[], true);
|
||||
registry
|
||||
.observe(&worker.worker_id, "Ready for input\n>")
|
||||
.expect("ready observe should succeed");
|
||||
registry
|
||||
.send_prompt(
|
||||
&worker.worker_id,
|
||||
Some("Implement worker handshake"),
|
||||
Some(WorkerTaskReceipt {
|
||||
repo: "claw-code".to_string(),
|
||||
task_kind: "repo_code".to_string(),
|
||||
source_surface: "omx_team".to_string(),
|
||||
expected_artifacts: vec!["patch".to_string(), "tests".to_string()],
|
||||
objective_preview: "Implement worker handshake".to_string(),
|
||||
}),
|
||||
)
|
||||
.expect("prompt send should succeed");
|
||||
|
||||
let recovered = registry
|
||||
.observe(
|
||||
&worker.worker_id,
|
||||
"› Explain this KakaoTalk screenshot for a friend\nI can help analyze the screenshot…",
|
||||
)
|
||||
.expect("mismatch observe should succeed");
|
||||
|
||||
assert_eq!(recovered.status, WorkerStatus::ReadyForPrompt);
|
||||
assert_eq!(
|
||||
recovered
|
||||
.last_error
|
||||
.expect("mismatch error should exist")
|
||||
.kind,
|
||||
WorkerFailureKind::PromptDelivery
|
||||
);
|
||||
let mismatch = recovered
|
||||
.events
|
||||
.iter()
|
||||
.find(|event| event.kind == WorkerEventKind::PromptMisdelivery)
|
||||
.expect("wrong-task event should exist");
|
||||
assert_eq!(mismatch.status, WorkerStatus::Failed);
|
||||
assert_eq!(
|
||||
mismatch.payload,
|
||||
Some(WorkerEventPayload::PromptDelivery {
|
||||
prompt_preview: "Implement worker handshake".to_string(),
|
||||
observed_target: WorkerPromptTarget::WrongTask,
|
||||
observed_cwd: None,
|
||||
observed_prompt_preview: Some(
|
||||
"Explain this KakaoTalk screenshot for a friend".to_string()
|
||||
),
|
||||
task_receipt: Some(WorkerTaskReceipt {
|
||||
repo: "claw-code".to_string(),
|
||||
task_kind: "repo_code".to_string(),
|
||||
source_surface: "omx_team".to_string(),
|
||||
expected_artifacts: vec!["patch".to_string(), "tests".to_string()],
|
||||
objective_preview: "Implement worker handshake".to_string(),
|
||||
}),
|
||||
recovery_armed: false,
|
||||
})
|
||||
);
|
||||
let replay = recovered
|
||||
.events
|
||||
.iter()
|
||||
.find(|event| event.kind == WorkerEventKind::PromptReplayArmed)
|
||||
.expect("replay event should exist");
|
||||
assert_eq!(replay.status, WorkerStatus::ReadyForPrompt);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn restart_and_terminate_reset_or_finish_worker() {
|
||||
let registry = WorkerRegistry::new();
|
||||
@@ -1057,7 +1217,7 @@ mod tests {
|
||||
.observe(&worker.worker_id, "Ready for input\n>")
|
||||
.expect("ready observe should succeed");
|
||||
registry
|
||||
.send_prompt(&worker.worker_id, Some("Run tests"))
|
||||
.send_prompt(&worker.worker_id, Some("Run tests"), None)
|
||||
.expect("prompt send should succeed");
|
||||
|
||||
let restarted = registry
|
||||
@@ -1086,7 +1246,7 @@ mod tests {
|
||||
.observe(&worker.worker_id, "Ready for input\n>")
|
||||
.expect("ready observe should succeed");
|
||||
registry
|
||||
.send_prompt(&worker.worker_id, Some("Run tests"))
|
||||
.send_prompt(&worker.worker_id, Some("Run tests"), None)
|
||||
.expect("prompt send should succeed");
|
||||
|
||||
let failed = registry
|
||||
@@ -1163,7 +1323,7 @@ mod tests {
|
||||
.observe(&worker.worker_id, "Ready for input\n>")
|
||||
.expect("ready observe should succeed");
|
||||
registry
|
||||
.send_prompt(&worker.worker_id, Some("Run tests"))
|
||||
.send_prompt(&worker.worker_id, Some("Run tests"), None)
|
||||
.expect("prompt send should succeed");
|
||||
|
||||
let finished = registry
|
||||
|
||||
@@ -304,7 +304,7 @@ fn worker_provider_failure_flows_through_recovery_to_policy() {
|
||||
.observe(&worker.worker_id, "Ready for your input\n>")
|
||||
.expect("ready observe should succeed");
|
||||
registry
|
||||
.send_prompt(&worker.worker_id, Some("Run analysis"))
|
||||
.send_prompt(&worker.worker_id, Some("Run analysis"), None)
|
||||
.expect("prompt send should succeed");
|
||||
|
||||
// Session completes with provider failure (finish="unknown", tokens=0)
|
||||
|
||||
@@ -78,6 +78,9 @@ const INTERNAL_PROGRESS_HEARTBEAT_INTERVAL: Duration = Duration::from_secs(3);
|
||||
const POST_TOOL_STALL_TIMEOUT: Duration = Duration::from_secs(10);
|
||||
const PRIMARY_SESSION_EXTENSION: &str = "jsonl";
|
||||
const LEGACY_SESSION_EXTENSION: &str = "json";
|
||||
const OFFICIAL_REPO_URL: &str = "https://github.com/ultraworkers/claw-code";
|
||||
const OFFICIAL_REPO_SLUG: &str = "ultraworkers/claw-code";
|
||||
const DEPRECATED_INSTALL_COMMAND: &str = "cargo install claw-code";
|
||||
const LATEST_SESSION_REFERENCE: &str = "latest";
|
||||
const SESSION_REFERENCE_ALIASES: &[&str] = &[LATEST_SESSION_REFERENCE, "last", "recent"];
|
||||
const CLI_OPTION_SUGGESTIONS: &[&str] = &[
|
||||
@@ -1477,6 +1480,7 @@ fn render_doctor_report() -> Result<DoctorReport, Box<dyn std::error::Error>> {
|
||||
checks: vec![
|
||||
check_auth_health(),
|
||||
check_config_health(&config_loader, config.as_ref()),
|
||||
+ check_install_source_health(),
|
||||
check_workspace_health(&context),
|
||||
check_sandbox_health(&context.sandbox_status),
|
||||
check_system_health(&cwd, config.as_ref().ok()),
|
||||
@@ -1764,6 +1768,36 @@ fn check_config_health(
|
||||
}
|
||||
}
|
||||
|
||||
+ fn check_install_source_health() -> DiagnosticCheck {
+     DiagnosticCheck::new(
+         "Install source",
+         DiagnosticLevel::Ok,
+         format!(
+             "official source of truth is {OFFICIAL_REPO_SLUG}; avoid `{DEPRECATED_INSTALL_COMMAND}`"
+         ),
+     )
+     .with_details(vec![
+         format!("Official repo {OFFICIAL_REPO_URL}"),
+         "Recommended path build from this repo or use the upstream binary documented in README.md"
+             .to_string(),
+         format!(
+             "Deprecated crate `{DEPRECATED_INSTALL_COMMAND}` installs a deprecated stub and does not provide the `claw` binary"
+         )
+         .to_string(),
+     ])
+     .with_data(Map::from_iter([
+         ("official_repo".to_string(), json!(OFFICIAL_REPO_URL)),
+         (
+             "deprecated_install".to_string(),
+             json!(DEPRECATED_INSTALL_COMMAND),
+         ),
+         (
+             "recommended_install".to_string(),
+             json!("build from source or follow the upstream binary instructions in README.md"),
+         ),
+     ]))
+ }
|
||||
|
||||
fn check_workspace_health(context: &StatusContext) -> DiagnosticCheck {
|
||||
let in_repo = context.project_root.is_some();
|
||||
DiagnosticCheck::new(
|
||||
@@ -3088,6 +3122,7 @@ struct SessionHandle {
|
||||
struct ManagedSessionSummary {
|
||||
id: String,
|
||||
path: PathBuf,
|
||||
+ updated_at_ms: u64,
|
||||
modified_epoch_millis: u128,
|
||||
message_count: usize,
|
||||
parent_session_id: Option<String>,
|
||||
@@ -4677,6 +4712,7 @@ fn list_managed_sessions() -> Result<Vec<ManagedSessionSummary>, Box<dyn std::er
|
||||
.map(|session| ManagedSessionSummary {
|
||||
id: session.id,
|
||||
path: session.path,
|
||||
+ updated_at_ms: session.updated_at_ms,
|
||||
modified_epoch_millis: session.modified_epoch_millis,
|
||||
message_count: session.message_count,
|
||||
parent_session_id: session.parent_session_id,
|
||||
@@ -4692,6 +4728,7 @@ fn latest_managed_session() -> Result<ManagedSessionSummary, Box<dyn std::error:
|
||||
Ok(ManagedSessionSummary {
|
||||
id: session.id,
|
||||
path: session.path,
|
||||
+ updated_at_ms: session.updated_at_ms,
|
||||
modified_epoch_millis: session.modified_epoch_millis,
|
||||
message_count: session.message_count,
|
||||
parent_session_id: session.parent_session_id,
|
||||
@@ -8111,6 +8148,11 @@ fn print_help_to(out: &mut impl Write) -> io::Result<()> {
|
||||
out,
|
||||
" Diagnose local auth, config, workspace, and sandbox health"
|
||||
)?;
|
||||
+ writeln!(out, " Source of truth: {OFFICIAL_REPO_SLUG}")?;
+ writeln!(
+     out,
+     " Warning: do not `{DEPRECATED_INSTALL_COMMAND}` (deprecated stub)"
+ )?;
|
||||
writeln!(out, " claw dump-manifests [--manifests-dir PATH]")?;
|
||||
writeln!(out, " claw bootstrap-plan")?;
|
||||
writeln!(out, " claw agents")?;
|
||||
@@ -8200,6 +8242,11 @@ fn print_help_to(out: &mut impl Write) -> io::Result<()> {
|
||||
writeln!(out, " claw mcp show my-server")?;
|
||||
writeln!(out, " claw /skills")?;
|
||||
writeln!(out, " claw doctor")?;
|
||||
+ writeln!(out, " source of truth: {OFFICIAL_REPO_URL}")?;
+ writeln!(
+     out,
+     " do not run `{DEPRECATED_INSTALL_COMMAND}` — it installs a deprecated stub"
+ )?;
|
||||
writeln!(out, " claw init")?;
|
||||
writeln!(out, " claw export")?;
|
||||
writeln!(out, " claw export conversation.md")?;
|
||||
@@ -10082,6 +10129,8 @@ mod tests {
|
||||
assert!(help.contains("claw mcp"));
|
||||
assert!(help.contains("claw skills"));
|
||||
assert!(help.contains("claw /skills"));
|
||||
+ assert!(help.contains("ultraworkers/claw-code"));
+ assert!(help.contains("cargo install claw-code"));
|
||||
assert!(!help.contains("claw login"));
|
||||
assert!(!help.contains("claw logout"));
|
||||
}
|
||||
|
||||
@@ -209,7 +209,7 @@ fn doctor_and_resume_status_emit_json_when_requested() {
|
||||
assert!(summary["failures"].as_u64().is_some());
|
||||
|
||||
let checks = doctor["checks"].as_array().expect("doctor checks");
|
||||
- assert_eq!(checks.len(), 5);
+ assert_eq!(checks.len(), 6);
|
||||
let check_names = checks
|
||||
.iter()
|
||||
.map(|check| {
|
||||
@@ -221,7 +221,27 @@ fn doctor_and_resume_status_emit_json_when_requested() {
|
||||
.collect::<Vec<_>>();
|
||||
assert_eq!(
|
||||
check_names,
|
||||
- vec!["auth", "config", "workspace", "sandbox", "system"]
+ vec![
+     "auth",
+     "config",
+     "install source",
+     "workspace",
+     "sandbox",
+     "system"
+ ]
);
+ let install_source = checks
+     .iter()
+     .find(|check| check["name"] == "install source")
+     .expect("install source check");
+ assert_eq!(
+     install_source["official_repo"],
+     "https://github.com/ultraworkers/claw-code"
+ );
+ assert_eq!(
+     install_source["deprecated_install"],
+     "cargo install claw-code"
+ );
|
||||
|
||||
let workspace = checks
|
||||
|
||||
@@ -20,7 +20,7 @@ use runtime::{
|
||||
summary_compression::compress_summary_text,
|
||||
task_registry::TaskRegistry,
|
||||
team_cron_registry::{CronRegistry, TeamRegistry},
|
||||
- worker_boot::{WorkerReadySnapshot, WorkerRegistry},
+ worker_boot::{WorkerReadySnapshot, WorkerRegistry, WorkerTaskReceipt},
|
||||
write_file, ApiClient, ApiRequest, AssistantEvent, BashCommandInput, BashCommandOutput,
|
||||
BranchFreshness, ConfigLoader, ContentBlock, ConversationMessage, ConversationRuntime,
|
||||
GrepSearchInput, LaneCommitProvenance, LaneEvent, LaneEventBlocker, LaneEventName,
|
||||
@@ -930,7 +930,22 @@ pub fn mvp_tool_specs() -> Vec<ToolSpec> {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"worker_id": { "type": "string" },
|
- "prompt": { "type": "string" }
+ "prompt": { "type": "string" },
+ "task_receipt": {
+     "type": "object",
+     "properties": {
+         "repo": { "type": "string" },
+         "task_kind": { "type": "string" },
+         "source_surface": { "type": "string" },
+         "expected_artifacts": {
+             "type": "array",
+             "items": { "type": "string" }
+         },
+         "objective_preview": { "type": "string" }
+     },
+     "required": ["repo", "task_kind", "source_surface", "objective_preview"],
+     "additionalProperties": false
+ }
|
||||
},
|
||||
"required": ["worker_id"],
|
||||
"additionalProperties": false
|
||||
@@ -1522,7 +1537,11 @@ fn run_worker_await_ready(input: WorkerIdInput) -> Result<String, String> {
|
||||
|
||||
#[allow(clippy::needless_pass_by_value)]
|
||||
fn run_worker_send_prompt(input: WorkerSendPromptInput) -> Result<String, String> {
|
||||
- let worker = global_worker_registry().send_prompt(&input.worker_id, input.prompt.as_deref())?;
+ let worker = global_worker_registry().send_prompt(
+     &input.worker_id,
+     input.prompt.as_deref(),
+     input.task_receipt,
+ )?;
|
||||
to_pretty_json(worker)
|
||||
}
|
||||
|
||||
@@ -2439,6 +2458,8 @@ struct WorkerSendPromptInput {
|
||||
worker_id: String,
|
||||
#[serde(default)]
|
||||
prompt: Option<String>,
|
||||
+ #[serde(default)]
+ task_receipt: Option<WorkerTaskReceipt>,
|
||||
}
|
||||
|
||||
const fn default_auto_recover_prompt_misdelivery() -> bool {
|
||||
@@ -3743,12 +3764,13 @@ fn persist_agent_terminal_state(
|
||||
.push(LaneEvent::failed(iso8601_now(), &blocker));
|
||||
} else {
|
||||
next_manifest.current_blocker = None;
|
||||
- let compressed_detail = result
-     .filter(|value| !value.trim().is_empty())
-     .map(|value| compress_summary_text(value.trim()));
- next_manifest
-     .lane_events
-     .push(LaneEvent::finished(iso8601_now(), compressed_detail));
+ let finished_summary = build_lane_finished_summary(&next_manifest, result);
+ next_manifest.lane_events.push(
+     LaneEvent::finished(iso8601_now(), finished_summary.detail).with_data(
+         serde_json::to_value(&finished_summary.data)
+             .expect("lane summary metadata should serialize"),
+     ),
+ );
|
||||
if let Some(provenance) = maybe_commit_provenance(result) {
|
||||
next_manifest.lane_events.push(LaneEvent::commit_created(
|
||||
iso8601_now(),
|
||||
@@ -3760,6 +3782,308 @@ fn persist_agent_terminal_state(
|
||||
write_agent_manifest(&next_manifest)
|
||||
}
|
||||
|
||||
const MIN_LANE_SUMMARY_WORDS: usize = 7;
|
||||
const REVIEW_VERDICTS: &[(&str, &str)] = &[
|
||||
("APPROVE", "approve"),
|
||||
("REJECT", "reject"),
|
||||
("BLOCKED", "blocked"),
|
||||
];
|
||||
const CONTROL_ONLY_SUMMARY_WORDS: &[&str] = &[
|
||||
"ack",
|
||||
"commit",
|
||||
"continue",
|
||||
"everyting",
|
||||
"everything",
|
||||
"keep",
|
||||
"next",
|
||||
"push",
|
||||
"ralph",
|
||||
"resume",
|
||||
"retry",
|
||||
"run",
|
||||
"stop",
|
||||
"sweep",
|
||||
"sweeping",
|
||||
"team",
|
||||
];
|
||||
const CONTEXTUAL_SUMMARY_WORDS: &[&str] = &[
|
||||
"added",
|
||||
"audited",
|
||||
"blocked",
|
||||
"completed",
|
||||
"documented",
|
||||
"failed",
|
||||
"finished",
|
||||
"fixed",
|
||||
"implemented",
|
||||
"investigated",
|
||||
"merged",
|
||||
"pushed",
|
||||
"refactored",
|
||||
"removed",
|
||||
"reviewed",
|
||||
"tested",
|
||||
"updated",
|
||||
"verified",
|
||||
];
|
||||
|
||||
#[derive(Debug, Clone, Serialize)]
|
||||
struct LaneFinishedSummaryData {
|
||||
#[serde(rename = "qualityFloorApplied")]
|
||||
quality_floor_applied: bool,
|
||||
reasons: Vec<String>,
|
||||
#[serde(rename = "rawSummary", skip_serializing_if = "Option::is_none")]
|
||||
raw_summary: Option<String>,
|
||||
#[serde(rename = "wordCount")]
|
||||
word_count: usize,
|
||||
#[serde(rename = "reviewVerdict", skip_serializing_if = "Option::is_none")]
|
||||
review_verdict: Option<String>,
|
||||
#[serde(rename = "reviewTarget", skip_serializing_if = "Option::is_none")]
|
||||
review_target: Option<String>,
|
||||
#[serde(rename = "reviewRationale", skip_serializing_if = "Option::is_none")]
|
||||
review_rationale: Option<String>,
|
||||
#[serde(rename = "selectionOutcome", skip_serializing_if = "Option::is_none")]
|
||||
selection_outcome: Option<SelectionOutcome>,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone)]
|
||||
struct LaneFinishedSummary {
|
||||
detail: Option<String>,
|
||||
data: LaneFinishedSummaryData,
|
||||
}
|
||||
|
||||
#[derive(Debug)]
|
||||
struct LaneSummaryAssessment {
|
||||
apply_quality_floor: bool,
|
||||
reasons: Vec<String>,
|
||||
word_count: usize,
|
||||
review_outcome: Option<ReviewLaneOutcome>,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone)]
|
||||
struct ReviewLaneOutcome {
|
||||
verdict: String,
|
||||
rationale: Option<String>,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Serialize)]
|
||||
struct SelectionOutcome {
|
||||
#[serde(rename = "chosenItems", skip_serializing_if = "Vec::is_empty")]
|
||||
chosen_items: Vec<String>,
|
||||
#[serde(rename = "skippedItems", skip_serializing_if = "Vec::is_empty")]
|
||||
skipped_items: Vec<String>,
|
||||
action: String,
|
||||
#[serde(skip_serializing_if = "Option::is_none")]
|
||||
rationale: Option<String>,
|
||||
}
|
||||
|
||||
fn build_lane_finished_summary(
|
||||
manifest: &AgentOutput,
|
||||
result: Option<&str>,
|
||||
) -> LaneFinishedSummary {
|
||||
let raw_summary = result.map(str::trim).filter(|value| !value.is_empty());
|
||||
let assessment = assess_lane_summary_quality(raw_summary.unwrap_or_default());
|
||||
let detail = match raw_summary {
|
||||
Some(summary) if !assessment.apply_quality_floor => Some(compress_summary_text(summary)),
|
||||
Some(summary) => Some(compose_lane_summary_fallback(manifest, Some(summary))),
|
||||
None => Some(compose_lane_summary_fallback(manifest, None)),
|
||||
};
|
||||
let review_outcome = assessment.review_outcome.clone();
|
||||
let review_target = review_outcome
|
||||
.as_ref()
|
||||
.map(|_| manifest.description.trim())
|
||||
.filter(|value| !value.is_empty())
|
||||
.map(str::to_string);
|
||||
|
||||
LaneFinishedSummary {
|
||||
detail,
|
||||
data: LaneFinishedSummaryData {
|
||||
quality_floor_applied: raw_summary.is_none() || assessment.apply_quality_floor,
|
||||
reasons: assessment.reasons,
|
||||
raw_summary: raw_summary.map(str::to_string),
|
||||
word_count: assessment.word_count,
|
||||
review_verdict: review_outcome
|
||||
.as_ref()
|
||||
.map(|outcome| outcome.verdict.clone()),
|
||||
review_target,
|
||||
review_rationale: review_outcome.and_then(|outcome| outcome.rationale),
|
||||
selection_outcome: extract_selection_outcome(raw_summary.unwrap_or_default()),
|
||||
},
|
||||
}
|
||||
}
|
||||
|
||||
fn assess_lane_summary_quality(summary: &str) -> LaneSummaryAssessment {
|
||||
let words = summary
|
||||
.split(|ch: char| !(ch.is_ascii_alphanumeric() || ch == '-' || ch == '#'))
|
||||
.filter(|token| !token.is_empty())
|
||||
.map(str::to_ascii_lowercase)
|
||||
.collect::<Vec<_>>();
|
||||
|
||||
let word_count = words.len();
|
||||
let mut reasons = Vec::new();
|
||||
if summary.trim().is_empty() {
|
||||
reasons.push(String::from("empty"));
|
||||
}
|
||||
|
||||
let review_outcome = extract_review_outcome(summary);
|
||||
|
||||
let control_only = !words.is_empty()
|
||||
&& words
|
||||
.iter()
|
||||
.all(|word| CONTROL_ONLY_SUMMARY_WORDS.contains(&word.as_str()));
|
||||
if control_only && review_outcome.is_none() {
|
||||
reasons.push(String::from("control_only"));
|
||||
}
|
||||
|
||||
let has_context_signal = summary.contains('`')
|
||||
|| summary.contains('/')
|
||||
|| summary.contains(':')
|
||||
|| summary.contains('#')
|
||||
|| review_outcome.is_some()
|
||||
|| words
|
||||
.iter()
|
||||
.any(|word| CONTEXTUAL_SUMMARY_WORDS.contains(&word.as_str()));
|
||||
if word_count < MIN_LANE_SUMMARY_WORDS && !has_context_signal {
|
||||
reasons.push(String::from("too_short_without_context"));
|
||||
}
|
||||
|
||||
LaneSummaryAssessment {
|
||||
apply_quality_floor: !reasons.is_empty(),
|
||||
reasons,
|
||||
word_count,
|
||||
review_outcome,
|
||||
}
|
||||
}
|
||||
|
||||
fn compose_lane_summary_fallback(manifest: &AgentOutput, raw_summary: Option<&str>) -> String {
|
||||
let target = manifest.description.trim();
|
||||
let base = format!(
|
||||
"Completed lane `{}` for target: {}. Status: completed.",
|
||||
manifest.name,
|
||||
if target.is_empty() {
|
||||
"unspecified task"
|
||||
} else {
|
||||
target
|
||||
}
|
||||
);
|
||||
match raw_summary {
|
||||
Some(summary) => format!(
|
||||
"{base} Original stop summary was too vague to keep as the lane result: \"{}\".",
|
||||
summary.trim()
|
||||
),
|
||||
None => format!("{base} No usable stop summary was produced by the lane."),
|
||||
}
|
||||
}
|
||||
|
||||
fn extract_review_outcome(summary: &str) -> Option<ReviewLaneOutcome> {
|
||||
let mut lines = summary
|
||||
.lines()
|
||||
.map(str::trim)
|
||||
.filter(|line| !line.is_empty());
|
||||
let first = lines.next()?;
|
||||
let verdict = REVIEW_VERDICTS.iter().find_map(|(prefix, verdict)| {
|
||||
first
|
||||
.eq_ignore_ascii_case(prefix)
|
||||
.then(|| (*verdict).to_string())
|
||||
})?;
|
||||
let rationale = lines.collect::<Vec<_>>().join(" ").trim().to_string();
|
||||
Some(ReviewLaneOutcome {
|
||||
verdict,
|
||||
rationale: (!rationale.is_empty()).then_some(compress_summary_text(&rationale)),
|
||||
})
|
||||
}
|
||||
|
||||
fn extract_selection_outcome(summary: &str) -> Option<SelectionOutcome> {
|
||||
let mut chosen_items = Vec::new();
|
||||
let mut skipped_items = Vec::new();
|
||||
let mut action = None;
|
||||
let mut rationale = None;
|
||||
|
||||
for line in summary
|
||||
.lines()
|
||||
.map(str::trim)
|
||||
.filter(|line| !line.is_empty())
|
||||
{
|
||||
let lowered = line.to_ascii_lowercase();
|
||||
let roadmap_items = extract_roadmap_items(line);
|
||||
|
||||
if lowered.starts_with("chosen:")
|
||||
|| lowered.starts_with("picked:")
|
||||
|| lowered.starts_with("selected:")
|
||||
|| (lowered.contains("picked") && !roadmap_items.is_empty())
|
||||
|| (lowered.contains("selected") && !roadmap_items.is_empty())
|
||||
{
|
||||
chosen_items.extend(roadmap_items);
|
||||
} else if lowered.starts_with("skipped:")
|
||||
|| lowered.starts_with("skip:")
|
||||
|| (lowered.contains("skipped") && !roadmap_items.is_empty())
|
||||
{
|
||||
skipped_items.extend(roadmap_items);
|
||||
}
|
||||
|
||||
if let Some(rest) = lowered.strip_prefix("action:") {
|
||||
if rest.contains("execute") || rest.contains("implement") || rest.contains("fix") {
|
||||
action = Some(String::from("execute"));
|
||||
} else if rest.contains("review") || rest.contains("audit") {
|
||||
action = Some(String::from("review"));
|
||||
} else if rest.contains("no-op") || rest.contains("noop") {
|
||||
action = Some(String::from("no-op"));
|
||||
}
|
||||
}
|
||||
|
||||
if let Some(rest) = line.strip_prefix("Rationale:") {
|
||||
let trimmed = rest.trim();
|
||||
if !trimmed.is_empty() {
|
||||
rationale = Some(compress_summary_text(trimmed));
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
chosen_items.sort();
|
||||
chosen_items.dedup();
|
||||
skipped_items.sort();
|
||||
skipped_items.dedup();
|
||||
|
||||
if chosen_items.is_empty() && skipped_items.is_empty() && action.is_none() {
|
||||
return None;
|
||||
}
|
||||
|
||||
let default_action = if chosen_items.is_empty() {
|
||||
String::from("no-op")
|
||||
} else {
|
||||
String::from("execute")
|
||||
};
|
||||
|
||||
Some(SelectionOutcome {
|
||||
chosen_items,
|
||||
skipped_items,
|
||||
action: action.unwrap_or(default_action),
|
||||
rationale,
|
||||
})
|
||||
}
|
||||
|
||||
fn extract_roadmap_items(line: &str) -> Vec<String> {
|
||||
let mut items = Vec::new();
|
||||
let mut chars = line.chars().peekable();
|
||||
while let Some(ch) = chars.next() {
|
||||
if ch == '#' {
|
||||
let mut digits = String::new();
|
||||
while let Some(next) = chars.peek() {
|
||||
if next.is_ascii_digit() {
|
||||
digits.push(*next);
|
||||
chars.next();
|
||||
} else {
|
||||
break;
|
||||
}
|
||||
}
|
||||
if !digits.is_empty() {
|
||||
items.push(format!("ROADMAP #{digits}"));
|
||||
}
|
||||
}
|
||||
}
|
||||
items
|
||||
}
|
||||
|
||||
fn derive_agent_state(
|
||||
status: &str,
|
||||
result: Option<&str>,
|
||||
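The summary quality floor added in the hunk above can be illustrated with a reduced, std-only sketch. The names `tokenize` and `low_signal_reasons`, and the shortened word lists, are assumptions made for illustration only; the diff's own `assess_lane_summary_quality` uses longer lists and also exempts summaries that carry a review verdict, which this sketch leaves out.

```rust
// Simplified sketch of the lane-summary quality floor.
// All names and the trimmed word lists here are hypothetical.

const MIN_WORDS: usize = 7;
const CONTROL_WORDS: &[&str] = &["ack", "commit", "continue", "keep", "next", "push", "sweeping"];
const CONTEXT_WORDS: &[&str] = &["fixed", "implemented", "tested", "verified"];

/// Split on anything that is not ASCII alphanumeric, '-' or '#', then lowercase.
fn tokenize(summary: &str) -> Vec<String> {
    summary
        .split(|ch: char| !(ch.is_ascii_alphanumeric() || ch == '-' || ch == '#'))
        .filter(|token| !token.is_empty())
        .map(str::to_ascii_lowercase)
        .collect()
}

/// Return the reasons a summary would be replaced by a composed fallback.
/// An empty result means the summary is kept as written.
fn low_signal_reasons(summary: &str) -> Vec<&'static str> {
    let words = tokenize(summary);
    let mut reasons = Vec::new();

    if summary.trim().is_empty() {
        reasons.push("empty");
    }

    // Every token is a bare control word such as "commit" or "push".
    let control_only = !words.is_empty()
        && words.iter().all(|word| CONTROL_WORDS.contains(&word.as_str()));
    if control_only {
        reasons.push("control_only");
    }

    // Paths, code spans, labels, issue numbers, or outcome verbs count as context.
    let has_context = summary.contains('`')
        || summary.contains('/')
        || summary.contains(':')
        || summary.contains('#')
        || words.iter().any(|word| CONTEXT_WORDS.contains(&word.as_str()));
    if words.len() < MIN_WORDS && !has_context {
        reasons.push("too_short_without_context");
    }

    reasons
}
```

Under these assumptions, a short summary such as "Fixed the parser" is kept because it contains an outcome verb, while "commit push, keep sweeping" is flagged as `control_only`.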
@@ -7240,6 +7564,14 @@ mod tests {
|
||||
completed_manifest_json["laneEvents"][1]["event"],
|
||||
"lane.finished"
|
||||
);
|
||||
+ assert_eq!(
+     completed_manifest_json["laneEvents"][1]["data"]["qualityFloorApplied"],
+     false
+ );
+ assert_eq!(
+     completed_manifest_json["laneEvents"][1]["detail"],
+     "Finished successfully in commit abc1234"
+ );
|
||||
assert_eq!(
|
||||
completed_manifest_json["laneEvents"][2]["event"],
|
||||
"lane.commit.created"
|
||||
@@ -7301,6 +7633,137 @@ mod tests {
|
||||
);
|
||||
assert_eq!(failed_manifest_json["derivedState"], "truly_idle");
|
||||
|
||||
let normalized = execute_agent_with_spawn(
|
||||
AgentInput {
|
||||
description: "Sweep the next backlog item".to_string(),
|
||||
prompt: "Produce a low-signal stop summary".to_string(),
|
||||
subagent_type: Some("Explore".to_string()),
|
||||
name: Some("summary-floor".to_string()),
|
||||
model: None,
|
||||
},
|
||||
|job| {
|
||||
persist_agent_terminal_state(
|
||||
&job.manifest,
|
||||
"completed",
|
||||
Some("commit push everyting, keep sweeping $ralph"),
|
||||
None,
|
||||
)
|
||||
},
|
||||
)
|
||||
.expect("normalized agent should succeed");
|
||||
|
||||
let normalized_manifest = std::fs::read_to_string(&normalized.manifest_file)
|
||||
.expect("normalized manifest should exist");
|
||||
let normalized_manifest_json: serde_json::Value =
|
||||
serde_json::from_str(&normalized_manifest).expect("normalized manifest json");
|
||||
assert_eq!(
|
||||
normalized_manifest_json["laneEvents"][1]["event"],
|
||||
"lane.finished"
|
||||
);
|
||||
let normalized_detail = normalized_manifest_json["laneEvents"][1]["detail"]
|
||||
.as_str()
|
||||
.expect("normalized detail");
|
||||
assert!(normalized_detail.contains("Completed lane `summary-floor`"));
|
||||
assert!(normalized_detail.contains("Sweep the next backlog item"));
|
||||
assert_eq!(
|
||||
normalized_manifest_json["laneEvents"][1]["data"]["qualityFloorApplied"],
|
||||
true
|
||||
);
|
||||
assert_eq!(
|
||||
normalized_manifest_json["laneEvents"][1]["data"]["rawSummary"],
|
||||
"commit push everyting, keep sweeping $ralph"
|
||||
);
|
||||
assert_eq!(
|
||||
normalized_manifest_json["laneEvents"][1]["data"]["reasons"][0],
|
||||
"control_only"
|
||||
);
|
||||
|
||||
let review = execute_agent_with_spawn(
|
||||
AgentInput {
|
||||
description: "Review commit 1234abcd for ROADMAP #67".to_string(),
|
||||
prompt: "Review the scoped diff".to_string(),
|
||||
subagent_type: Some("Verification".to_string()),
|
||||
name: Some("review-lane".to_string()),
|
||||
model: None,
|
||||
},
|
||||
|job| {
|
||||
persist_agent_terminal_state(
|
||||
&job.manifest,
|
||||
"completed",
|
||||
Some("APPROVE\n\nTarget: commit 1234abcd\nRationale: scoped diff is safe."),
|
||||
None,
|
||||
)
|
||||
},
|
||||
)
|
||||
.expect("review agent should succeed");
|
||||
|
||||
let review_manifest =
|
||||
std::fs::read_to_string(&review.manifest_file).expect("review manifest should exist");
|
||||
let review_manifest_json: serde_json::Value =
|
||||
serde_json::from_str(&review_manifest).expect("review manifest json");
|
||||
assert_eq!(
|
||||
review_manifest_json["laneEvents"][1]["data"]["reviewVerdict"],
|
||||
"approve"
|
||||
);
|
||||
assert_eq!(
|
||||
review_manifest_json["laneEvents"][1]["data"]["reviewTarget"],
|
||||
"Review commit 1234abcd for ROADMAP #67"
|
||||
);
|
||||
assert_eq!(
|
||||
review_manifest_json["laneEvents"][1]["data"]["reviewRationale"],
|
||||
"Target: commit 1234abcd Rationale: scoped diff is safe."
|
||||
);
|
||||
assert_eq!(
|
||||
review_manifest_json["laneEvents"][1]["data"]["qualityFloorApplied"],
|
||||
false
|
||||
);
|
||||
|
||||
let selection = execute_agent_with_spawn(
|
||||
AgentInput {
|
||||
description: "Scan ROADMAP Immediate Backlog for the next repo-local item".to_string(),
|
||||
prompt: "Choose the next backlog target".to_string(),
|
||||
subagent_type: Some("Explore".to_string()),
|
||||
name: Some("backlog-scan".to_string()),
|
||||
model: None,
|
||||
},
|
||||
|job| {
|
||||
persist_agent_terminal_state(
|
||||
&job.manifest,
|
||||
"completed",
|
||||
Some(
|
||||
"Selected next backlog target.\nChosen: ROADMAP #65\nSkipped: ROADMAP #63, ROADMAP #64\nAction: execute\nRationale: #65 is the next repo-local lane-finished metadata task.",
|
||||
),
|
||||
None,
|
||||
)
|
||||
},
|
||||
)
|
||||
.expect("selection agent should succeed");
|
||||
|
||||
let selection_manifest = std::fs::read_to_string(&selection.manifest_file)
|
||||
.expect("selection manifest should exist");
|
||||
let selection_manifest_json: serde_json::Value =
|
||||
serde_json::from_str(&selection_manifest).expect("selection manifest json");
|
||||
assert_eq!(
|
||||
selection_manifest_json["laneEvents"][1]["data"]["selectionOutcome"]["chosenItems"][0],
|
||||
"ROADMAP #65"
|
||||
);
|
||||
assert_eq!(
|
||||
selection_manifest_json["laneEvents"][1]["data"]["selectionOutcome"]["skippedItems"][0],
|
||||
"ROADMAP #63"
|
||||
);
|
||||
assert_eq!(
|
||||
selection_manifest_json["laneEvents"][1]["data"]["selectionOutcome"]["skippedItems"][1],
|
||||
"ROADMAP #64"
|
||||
);
|
||||
assert_eq!(
|
||||
selection_manifest_json["laneEvents"][1]["data"]["selectionOutcome"]["action"],
|
||||
"execute"
|
||||
);
|
||||
assert_eq!(
|
||||
selection_manifest_json["laneEvents"][1]["data"]["selectionOutcome"]["rationale"],
|
||||
"#65 is the next repo-local lane-finished metadata task."
|
||||
);
|
||||
|
||||
let spawn_error = execute_agent_with_spawn(
|
||||
AgentInput {
|
||||
description: "Spawn error task".to_string(),
|
||||
|
||||
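The review and selection assertions above depend on two small parsers: one reads a verdict from the first non-empty line of a summary, the other collects `#NN` references as roadmap items. A minimal std-only sketch of both follows. The names `parse_review` and `roadmap_items` are hypothetical, and unlike the diff's `extract_review_outcome` this version does not pass the rationale through `compress_summary_text`.

```rust
// Illustrative sketch only; function names are assumptions, not the repo's API.

const VERDICTS: &[&str] = &["approve", "reject", "blocked"];

/// Treat the first non-empty line as the verdict (case-insensitive) and
/// join the remaining non-empty lines into an optional rationale.
fn parse_review(summary: &str) -> Option<(String, Option<String>)> {
    let mut lines = summary
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty());
    let first = lines.next()?;
    let verdict = VERDICTS
        .iter()
        .copied()
        .find(|candidate| first.eq_ignore_ascii_case(candidate))?;
    let rationale = lines.collect::<Vec<_>>().join(" ");
    Some((
        verdict.to_string(),
        (!rationale.is_empty()).then_some(rationale),
    ))
}

/// Collect every `#<digits>` occurrence in a line as "ROADMAP #<digits>".
fn roadmap_items(line: &str) -> Vec<String> {
    let mut items = Vec::new();
    let mut chars = line.chars().peekable();
    while let Some(ch) = chars.next() {
        if ch != '#' {
            continue;
        }
        let mut digits = String::new();
        while let Some(&next) = chars.peek() {
            if next.is_ascii_digit() {
                digits.push(next);
                chars.next();
            } else {
                break;
            }
        }
        // A bare '#' with no digits after it is ignored.
        if !digits.is_empty() {
            items.push(format!("ROADMAP #{digits}"));
        }
    }
    items
}
```

Requiring the verdict to stand alone on the first line keeps ordinary prose that merely mentions "approve" from being read as a review result.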