ROADMAP #132 : --output-format json global error renderer flattens every typed error variant into {type:"error",error:<prose>}, erasing §4.44 envelope structure at the final serialization boundary — runtime has 5 typed error enums (SessionError/ConfigError/McpServerManagerError/PromptBuildError/SessionControlError) but ~11 CLI emission sites call error.to_string() and wrap in flat {type,error} shape; kind/operation/target/errno/hint/retryable all discarded. #130 fix lands typed envelope in text mode but JSON mode still emits flat string. Commit d305178 body self-declares this debt. Closes the renderer-side half of §4.44. Joins JSON-envelope asymmetry family #90/#91/#92/#110/#115/#116 as 7th; joins silent-state inventory #102/#127/#129/#130/#131 as 6th. Bundle: #130+#132 export-surface typed errors text+json parity.

ROADMAP #131 : claw export positional arg silently treated as output PATH not session reference — operator types 'claw export <session-id> --output /tmp/x.md' expecting to export <session-id>, but parser puts <session-id> in output_path slot and session_reference defaults to LATEST. Result: wrong session exported with no warning. Discovered while auditing #130 export error path during scope-truth pass on 2026-04-20 dogfood cycle. Joins silent-state inventory family (#102 + #127 + #129 + #130 ) and parser-level trust gap quintet (#108 + #117 + #119 + #122 + #127 ).
fix(cli): #130 export error envelope — wrap fs::write() in run_export() with structured ExportError per Phase 2 §4.44 typed-error envelope contract; eliminates zero-context errno output. ExportError struct includes kind/operation/target/errno/hint/retryable fields with serde::Serialize and Display impls. wrap_export_io_error() classifies io::ErrorKind into filesystem/permission/invalid_path categories and synthesizes actionable hints (e.g. 'intermediate directory does not exist; try mkdir -p X first'). Verified end-to-end: ENOENT, EPERM, IsADirectory, empty path, trailing slash all emit structured envelope; success case unchanged (backward-compat anchor preserved). JSON mode still uses string error rendering — separate concern requiring global error renderer refactor (tracked for follow-up cycle).
2026-06-14 15:26:05 -04:00 · 2026-04-20 14:14:41 +09:00 · 2026-04-20 14:07:27 +09:00 · 2026-04-20 14:02:15 +09:00 · 2026-04-20 13:42:57 +09:00 · 2026-04-20 13:11:12 +09:00
13 changed files with 5717 additions and 31 deletions
--- a/ROADMAP.md
+++ b/ROADMAP.md
--- a/docs/MODEL_COMPATIBILITY.md
+++ b/docs/MODEL_COMPATIBILITY.md
@@ -0,0 +1,236 @@
 # Model Compatibility Guide
 This document describes model-specific handling in the OpenAI-compatible provider. When adding new models or providers, review this guide to ensure proper compatibility.
 ## Table of Contents
 - [Overview](#overview)
 - [Model-Specific Handling](#model-specific-handling)
  - [Kimi Models (is_error Exclusion)](#kimi-models-is_error-exclusion)
  - [Reasoning Models (Tuning Parameter Stripping)](#reasoning-models-tuning-parameter-stripping)
  - [GPT-5 (max_completion_tokens)](#gpt-5-max_completion_tokens)
  - [Qwen Models (DashScope Routing)](#qwen-models-dashscope-routing)
 - [Implementation Details](#implementation-details)
 - [Adding New Models](#adding-new-models)
 - [Testing](#testing)
 ## Overview
 The `openai_compat.rs` provider translates Claude Code's internal message format to OpenAI-compatible chat completion requests. Different models have varying requirements for:
 - Tool result message fields (`is_error`)
 - Sampling parameters (temperature, top_p, etc.)
 - Token limit fields (`max_tokens` vs `max_completion_tokens`)
 - Base URL routing
 ## Model-Specific Handling
 ### Kimi Models (is_error Exclusion)
 **Affected models:** `kimi-k2.5`, `kimi-k1.5`, `kimi-moonshot`, and any other model whose name starts with `kimi-` once a provider prefix is stripped (case-insensitive)
 **Behavior:** The `is_error` field is **excluded** from tool result messages.
 **Rationale:** Kimi models (via Moonshot AI and DashScope) reject the `is_error` field with a 400 Bad Request error:
 ```json
 {
  "error": {
    "type": "invalid_request_error",
    "message": "Unknown field: is_error"
  }
 }
 ```
 **Detection:**
 ```rust
 fn model_rejects_is_error_field(model: &str) -> bool {
    let lowered = model.to_ascii_lowercase();
    let canonical = lowered.rsplit('/').next().unwrap_or(lowered.as_str());
    canonical.starts_with("kimi-")
 }
 ```
 **Testing:** See `model_rejects_is_error_field_detects_kimi_models` and related tests in `openai_compat.rs`.
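The exclusion can be illustrated with a minimal sketch. `tool_result_fields` is a hypothetical helper written for this guide, not the real `translate_message()`, and the `role`/`tool_call_id`/`content` layout is an assumption about the tool message shape. It only shows where the detection result gates the `is_error` field.

```rust
/// Same detection as above: strip any provider prefix, then match `kimi-`.
fn model_rejects_is_error_field(model: &str) -> bool {
    let lowered = model.to_ascii_lowercase();
    let canonical = lowered.rsplit('/').next().unwrap_or(lowered.as_str());
    canonical.starts_with("kimi-")
}

/// Hypothetical helper (assumption, not the provider's API): returns the
/// key/value pairs of a tool result message for the given model.
fn tool_result_fields(
    model: &str,
    tool_call_id: &str,
    content: &str,
    is_error: bool,
) -> Vec<(&'static str, String)> {
    let mut fields = vec![
        ("role", "tool".to_string()),
        ("tool_call_id", tool_call_id.to_string()),
        ("content", content.to_string()),
    ];
    // Attach `is_error` only for models that accept the field.
    if !model_rejects_is_error_field(model) {
        fields.push(("is_error", is_error.to_string()));
    }
    fields
}
```

Calling `tool_result_fields("dashscope/kimi-k2.5", "call_1", "boom", true)` yields three fields with no `is_error` entry, while the same call with `gpt-4o` yields four.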
 ---
 ### Reasoning Models (Tuning Parameter Stripping)
 **Affected models:**
 - OpenAI: `o1`, `o1-*`, `o3`, `o3-*`, `o4`, `o4-*`
 - xAI: `grok-3-mini`
 - Alibaba DashScope: `qwen-qwq-*`, `qwq-*`, `qwen3-*-thinking`
 **Behavior:** The following tuning parameters are **stripped** from requests:
 - `temperature`
 - `top_p`
 - `frequency_penalty`
 - `presence_penalty`
 **Rationale:** Reasoning/chain-of-thought models use fixed sampling strategies and reject these parameters with 400 errors.
 **Exception:** `reasoning_effort` is included for compatible models when explicitly set.
 **Detection:**
 ```rust
 fn is_reasoning_model(model: &str) -> bool {
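    // Bind the lowercased String first; borrowing from a temporary does not compile.
    let lowered = model.to_ascii_lowercase();
    let canonical = lowered.rsplit('/').next().unwrap_or(lowered.as_str());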
    canonical.starts_with("o1")
        || canonical.starts_with("o3")
        || canonical.starts_with("o4")
        || canonical == "grok-3-mini"
        || canonical.starts_with("qwen-qwq")
        || canonical.starts_with("qwq")
        || (canonical.starts_with("qwen3") && canonical.contains("-thinking"))
 }
 ```
 **Testing:** See `reasoning_model_strips_tuning_params`, `grok_3_mini_is_reasoning_model`, and `qwen_reasoning_variants_are_detected` tests.
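As a rough illustration of the stripping step, the sketch below removes the four tuning parameters from a parameter map when the model is detected as a reasoning model. `strip_tuning_params` and the `BTreeMap<String, f64>` representation are assumptions made for this example; the real provider builds a JSON payload in `build_chat_completion_request()`.

```rust
use std::collections::BTreeMap;

/// Detection as documented above, with the lowercased name bound to a local.
fn is_reasoning_model(model: &str) -> bool {
    let lowered = model.to_ascii_lowercase();
    let canonical = lowered.rsplit('/').next().unwrap_or(lowered.as_str());
    canonical.starts_with("o1")
        || canonical.starts_with("o3")
        || canonical.starts_with("o4")
        || canonical == "grok-3-mini"
        || canonical.starts_with("qwen-qwq")
        || canonical.starts_with("qwq")
        || (canonical.starts_with("qwen3") && canonical.contains("-thinking"))
}

/// The parameters listed under "Behavior" above.
const TUNING_PARAMS: [&str; 4] = [
    "temperature",
    "top_p",
    "frequency_penalty",
    "presence_penalty",
];

/// Hypothetical helper (assumption): drops tuning parameters for reasoning
/// models and leaves every other model's parameters untouched.
fn strip_tuning_params(model: &str, params: &mut BTreeMap<String, f64>) {
    if is_reasoning_model(model) {
        params.retain(|key, _| !TUNING_PARAMS.contains(&key.as_str()));
    }
}
```

With a map holding `temperature` and `max_tokens`, calling `strip_tuning_params("openai/o3-mini", &mut params)` leaves only `max_tokens`; the same call with `gpt-4o` changes nothing.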
 ---
 ### GPT-5 (max_completion_tokens)
 **Affected models:** All models starting with `gpt-5`
 **Behavior:** Uses `max_completion_tokens` instead of `max_tokens` in the request payload.
 **Rationale:** GPT-5 models require the `max_completion_tokens` field. Legacy `max_tokens` causes request validation failures:
 ```json
 {
  "error": {
    "message": "Unknown field: max_tokens"
  }
 }
 ```
 **Implementation:**
 ```rust
 let max_tokens_key = if wire_model.starts_with("gpt-5") {
    "max_completion_tokens"
 } else {
    "max_tokens"
 };
 ```
 **Testing:** See `gpt5_uses_max_completion_tokens_not_max_tokens` and `non_gpt5_uses_max_tokens` tests.
 ---
 ### Qwen Models (DashScope Routing)
 **Affected models:** All models with `qwen` prefix
 **Behavior:** Routed to DashScope (`https://dashscope.aliyuncs.com/compatible-mode/v1`) rather than default providers.
 **Rationale:** Qwen models are hosted by Alibaba Cloud's DashScope service, not OpenAI or Anthropic.
 **Configuration:**
 ```rust
 pub const DEFAULT_DASHSCOPE_BASE_URL: &str = "https://dashscope.aliyuncs.com/compatible-mode/v1";
 ```
 **Authentication:** Uses `DASHSCOPE_API_KEY` environment variable.
 **Note:** Some Qwen models are also reasoning models (see [Reasoning Models](#reasoning-models-tuning-parameter-stripping) above) and receive both treatments.
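The routing rule can be sketched as a prefix check that selects the DashScope base URL. `routes_to_dashscope` and `base_url_override` are hypothetical names used only for illustration, and the case-insensitive match is an assumption; the actual routing logic in the codebase may differ.

```rust
const DEFAULT_DASHSCOPE_BASE_URL: &str = "https://dashscope.aliyuncs.com/compatible-mode/v1";

/// Hypothetical helper (assumption): true for models with a `qwen` prefix.
fn routes_to_dashscope(model: &str) -> bool {
    model.to_ascii_lowercase().starts_with("qwen")
}

/// Hypothetical helper (assumption): returns the DashScope base URL for Qwen
/// models and `None` when the default provider routing should apply.
fn base_url_override(model: &str) -> Option<&'static str> {
    routes_to_dashscope(model).then_some(DEFAULT_DASHSCOPE_BASE_URL)
}
```

Under these assumptions, `base_url_override("qwen-max")` returns the DashScope URL and `base_url_override("gpt-4o")` returns `None`.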
 ## Implementation Details
 ### File Location
 All model-specific logic is in:
 ```
 rust/crates/api/src/providers/openai_compat.rs
 ```
 ### Key Functions
 | Function | Purpose |
 |----------|---------|
 | `model_rejects_is_error_field()` | Detects models that don't support `is_error` in tool results |
 | `is_reasoning_model()` | Detects reasoning models that need tuning param stripping |
 | `translate_message()` | Converts internal messages to OpenAI format (applies `is_error` logic) |
 | `build_chat_completion_request()` | Constructs full request payload (applies all model-specific logic) |
 ### Provider Prefix Handling
 All model detection functions strip provider prefixes (e.g., `dashscope/kimi-k2.5` → `kimi-k2.5`) before matching:
 ```rust
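    // Keep the lowercased String alive in a local so `canonical` can borrow from it.
    let lowered = model.to_ascii_lowercase();
    let canonical = lowered.rsplit('/').next().unwrap_or(lowered.as_str());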
 ```
 This ensures consistent detection regardless of whether models are referenced with or without provider prefixes.
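For experimentation outside the provider, the prefix stripping can be wrapped in a small standalone function. `canonical_model_name` is a hypothetical helper that does not exist in `openai_compat.rs`; it returns an owned `String` so callers do not have to manage the lifetime of the lowercased buffer.

```rust
/// Hypothetical helper (assumption): lowercases the model name and keeps only
/// the segment after the last `/`, mirroring the snippet above.
fn canonical_model_name(model: &str) -> String {
    let lowered = model.to_ascii_lowercase();
    // `rsplit` yields segments from the right, so the first item is the last one.
    let canonical = lowered.rsplit('/').next().unwrap_or(lowered.as_str());
    canonical.to_string()
}
```

For example, `canonical_model_name("dashscope/kimi-k2.5")` and `canonical_model_name("Kimi-K2.5")` both return `"kimi-k2.5"`.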
 ## Adding New Models
 When adding support for new models:
 1. **Check if the model is a reasoning model**
   - Does it reject temperature/top_p parameters?
   - Add to `is_reasoning_model()` detection
 2. **Check tool result compatibility**
   - Does it reject the `is_error` field?
   - Add to `model_rejects_is_error_field()` detection
 3. **Check token limit field**
   - Does it require `max_completion_tokens` instead of `max_tokens`?
   - Update the `max_tokens_key` logic
 4. **Add tests**
   - Unit test for detection function
   - Integration test in `build_chat_completion_request`
 5. **Update this documentation**
   - Add the model to the affected lists
   - Document any special behavior
 ## Testing
 ### Running Model-Specific Tests
 ```bash
 # All OpenAI compatibility tests
 cargo test --package api providers::openai_compat
 # Specific test categories
 cargo test --package api model_rejects_is_error_field
 cargo test --package api reasoning_model
 cargo test --package api gpt5
 cargo test --package api qwen
 ```
 ### Test Files
 - Unit tests: `rust/crates/api/src/providers/openai_compat.rs` (in `mod tests`)
 - Integration tests: `rust/crates/api/tests/openai_compat_integration.rs`
 ### Verifying Model Detection
 To verify a model is detected correctly without making API calls:
 ```rust
 #[test]
 fn my_new_model_is_detected() {
    // is_error handling
    assert!(model_rejects_is_error_field("my-model"));
    // Reasoning model detection
    assert!(is_reasoning_model("my-model"));
    // Provider prefix handling
    assert!(model_rejects_is_error_field("provider/my-model"));
 }
 ```
 ---
 *Last updated: 2026-04-16*
 For questions or updates, see the implementation in `rust/crates/api/src/providers/openai_compat.rs`.
--- a/prd.json
+++ b/prd.json
@@ -116,6 +116,241 @@
      ],
      "passes": true,
      "priority": "P0"
    },
    {
      "id": "US-009",
      "title": "Add unit tests for kimi model compatibility fix",
      "description": "During dogfooding we discovered the existing test coverage for model-specific is_error handling is insufficient. Need to add dedicated tests for model_rejects_is_error_field function and translate_message behavior with different models.",
      "acceptanceCriteria": [
        "Test model_rejects_is_error_field identifies kimi-k2.5, kimi-k1.5, dashscope/kimi-k2.5",
        "Test translate_message includes is_error for gpt-4, grok-3, claude models",
        "Test translate_message excludes is_error for kimi models",
        "Test build_chat_completion_request produces correct payload for kimi vs non-kimi",
        "All new tests pass",
        "cargo test --package api passes"
      ],
      "passes": true,
      "priority": "P1"
    },
    {
      "id": "US-010",
      "title": "Add model compatibility documentation",
      "description": "Document which models require special handling (is_error exclusion, reasoning model tuning param stripping, etc.) in a MODEL_COMPATIBILITY.md file for operators and contributors.",
      "acceptanceCriteria": [
        "MODEL_COMPATIBILITY.md created in docs/ or repo root",
        "Document kimi models is_error exclusion",
        "Document reasoning models (o1, o3, grok-3-mini) tuning param stripping",
        "Document gpt-5 max_completion_tokens requirement",
        "Document qwen model routing through dashscope",
        "Cross-reference with existing code comments"
      ],
      "passes": true,
      "priority": "P2"
    },
    {
      "id": "US-011",
      "title": "Performance optimization: reduce API request serialization overhead",
      "description": "The translate_message function creates intermediate JSON Value objects that could be optimized. Profile and optimize the hot path for API request building, especially for conversations with many tool results.",
      "acceptanceCriteria": [
        "Profile current request building with criterion or similar",
        "Identify bottlenecks in translate_message and build_chat_completion_request",
        "Implement optimizations (Vec pre-allocation, reduced cloning, etc.)",
        "Benchmark before/after showing improvement",
        "No functional changes or API breakage"
      ],
      "passes": true,
      "priority": "P2"
    },
    {
      "id": "US-012",
      "title": "Trust prompt resolver with allowlist auto-trust",
      "description": "Add allowlisted auto-trust behavior for known repos/worktrees. Trust prompts currently block TUI startup and require manual intervention. Implement automatic trust resolution for pre-approved repositories.",
      "acceptanceCriteria": [
        "TrustAllowlist config structure with repo patterns",
        "Auto-trust behavior for allowlisted repos/worktrees",
        "trust_required event emitted when trust prompt detected",
        "trust_resolved event emitted when trust is granted",
        "Non-allowlisted repos remain gated (manual trust required)",
        "Integration with worker boot lifecycle",
        "Tests for allowlist matching and event emission"
      ],
      "passes": true,
      "priority": "P1"
    },
    {
      "id": "US-013",
      "title": "Phase 2 - Session event ordering + terminal-state reconciliation",
      "description": "When the same session emits contradictory lifecycle events (idle, error, completed, transport/server-down) in close succession, expose deterministic final truth. Attach monotonic sequence/causal ordering metadata, classify terminal vs advisory events, reconcile duplicate/out-of-order terminal events into one canonical lane outcome.",
      "acceptanceCriteria": [
        "Monotonic sequence / causal ordering metadata attached to session lifecycle events",
        "Terminal vs advisory event classification implemented",
        "Reconcile duplicate or out-of-order terminal events into one canonical outcome",
        "Distinguish 'session terminal state unknown because transport died' from real 'completed'",
        "Tests verify reconciliation behavior with out-of-order event bursts"
      ],
      "passes": true,
      "priority": "P1"
    },
    {
      "id": "US-014",
      "title": "Phase 2 - Event provenance / environment labeling",
      "description": "Every emitted event should declare its source (live_lane, test, healthcheck, replay, transport) so claws do not mistake test noise for production truth. Include environment/channel label, emitter identity, and confidence/trust level.",
      "acceptanceCriteria": [
        "EventProvenance enum with live_lane, test, healthcheck, replay, transport variants",
        "Environment/channel label attached to all events",
        "Emitter identity field on events",
        "Confidence/trust level field for downstream automation",
        "Tests verify provenance labeling and filtering"
      ],
      "passes": true,
      "priority": "P1"
    },
    {
      "id": "US-015",
      "title": "Phase 2 - Session identity completeness at creation time",
      "description": "A newly created session should emit stable title, workspace/worktree path, and lane/session purpose at creation time. If any field is not yet known, emit explicit typed placeholder reason rather than bare unknown string.",
      "acceptanceCriteria": [
        "Session creation emits stable title, workspace/worktree path, purpose immediately",
        "Explicit typed placeholder when fields unknown (not bare 'unknown' strings)",
        "Later-enriched metadata reconciles onto same session identity without ambiguity",
        "Tests verify session identity completeness and placeholder handling"
      ],
      "passes": true,
      "priority": "P1"
    },
    {
      "id": "US-016",
      "title": "Phase 2 - Duplicate terminal-event suppression",
      "description": "When the same session emits repeated completed/failed/terminal notifications, collapse duplicates before they trigger repeated downstream reactions. Attach canonical terminal-event fingerprint per lane/session outcome.",
      "acceptanceCriteria": [
        "Canonical terminal-event fingerprint attached per lane/session outcome",
        "Suppress/coalesce repeated terminal notifications within reconciliation window",
        "Preserve raw event history for audit while exposing one actionable outcome downstream",
        "Surface when later duplicate materially differs from original terminal payload",
        "Tests verify deduplication and material difference detection"
      ],
      "passes": true,
      "priority": "P2"
    },
    {
      "id": "US-017",
      "title": "Phase 2 - Lane ownership / scope binding",
      "description": "Each session and lane event should declare who owns it and what workflow scope it belongs to. Attach owner/assignee identity, workflow scope (claw-code-dogfood, external-git-maintenance, infra-health, manual-operator), and mark whether watcher is expected to act, observe only, or ignore.",
      "acceptanceCriteria": [
        "Owner/assignee identity attached to sessions and lane events",
        "Workflow scope field (claw-code-dogfood, external-git-maintenance, etc.)",
        "Watcher action expectation field (act, observe-only, ignore)",
        "Preserve scope through session restarts, resumes, and late terminal events",
        "Tests verify ownership and scope binding"
      ],
      "passes": true,
      "priority": "P2"
    },
    {
      "id": "US-018",
      "title": "Phase 2 - Nudge acknowledgment / dedupe contract",
      "description": "Periodic clawhip nudges should carry nudge id/cycle id and delivery timestamp. Expose whether claw has already acknowledged or responded for that cycle. Distinguish new nudge, retry nudge, and stale duplicate.",
      "acceptanceCriteria": [
        "Nudge id / cycle id and delivery timestamp attached",
        "Acknowledgment state exposed (already acknowledged or not)",
        "Distinguish new nudge vs retry nudge vs stale duplicate",
        "Allow downstream summaries to bind reported pinpoint back to triggering nudge id",
        "Tests verify nudge deduplication and acknowledgment tracking"
      ],
      "passes": true,
      "priority": "P2"
    },
    {
      "id": "US-019",
      "title": "Phase 2 - Stable roadmap-id assignment for newly filed pinpoints",
      "description": "When a claw records a new pinpoint/follow-up, assign or expose a stable tracking id immediately. Expose that id in structured event/report payload and preserve across edits, reorderings, and summary compression.",
      "acceptanceCriteria": [
        "Canonical roadmap id assigned at filing time",
        "Roadmap id exposed in structured event/report payload",
        "Same id preserved across edits, reorderings, summary compression",
        "Distinguish 'new roadmap filing' from 'update to existing roadmap item'",
        "Tests verify stable id assignment and update detection"
      ],
      "passes": true,
      "priority": "P2"
    },
    {
      "id": "US-020",
      "title": "Phase 2 - Roadmap item lifecycle state contract",
      "description": "Each roadmap pinpoint should carry machine-readable lifecycle state (filed, acknowledged, in_progress, blocked, done, superseded). Attach last state-change timestamp and preserve lineage when one pinpoint supersedes or merges into another.",
      "acceptanceCriteria": [
        "Lifecycle state enum with filed, acknowledged, in_progress, blocked, done, superseded",
        "Last state-change timestamp attached",
        "New report can declare first filing, status update, or closure",
        "Preserve lineage when one pinpoint supersedes or merges into another",
        "Tests verify lifecycle state transitions"
      ],
      "passes": true,
      "priority": "P2"
    },
    {
      "id": "US-021",
      "title": "Request body size pre-flight check for OpenAI-compatible provider",
      "description": "Implement pre-flight request body size estimation to prevent 400 Bad Request errors from API gateways with size limits. Based on dogfood findings with kimi-k2.5 testing, DashScope API has a 6MB request body limit that was exceeded by large system prompts.",
      "acceptanceCriteria": [
        "Pre-flight size estimation before sending requests to OpenAI-compatible providers",
        "Clear error message when request exceeds provider-specific size limit",
        "Configuration for different provider limits (6MB DashScope, 100MB OpenAI, etc.)",
        "Unit tests for size estimation and limit checking",
        "Integration with existing error handling for actionable user messages"
      ],
      "passes": true,
      "priority": "P1"
    },
    {
      "id": "US-022",
      "title": "Enhanced error context for API failures",
      "description": "Add structured error context to API failures including request ID tracking across retries, provider-specific error code mapping, and suggested user actions based on error type (e.g., 'Reduce prompt size' for 413, 'Check API key' for 401).",
      "acceptanceCriteria": [
        "Request ID tracking across retries with full context in error messages",
        "Provider-specific error code mapping with actionable suggestions",
        "Suggested user actions for common error types (401, 403, 413, 429, 500, 502-504)",
        "Unit tests for error context extraction",
        "All existing tests pass and clippy is clean"
      ],
      "passes": true,
      "priority": "P1"
    },
    {
      "id": "US-023",
      "title": "Add automatic routing for kimi models to DashScope",
      "description": "Based on dogfood findings with kimi-k2.5 testing, users must manually prefix with dashscope/kimi-k2.5 instead of just using kimi-k2.5. Add automatic routing for kimi/ and kimi- prefixed models to DashScope (similar to qwen models), and add a 'kimi' alias to the model registry.",
      "acceptanceCriteria": [
        "kimi/ and kimi- prefix routing to DashScope in metadata_for_model()",
        "'kimi' alias in MODEL_REGISTRY that resolves to 'kimi-k2.5'",
        "resolve_model_alias() handles the kimi alias correctly",
        "Unit tests for kimi routing (similar to qwen routing tests)",
        "All tests pass and clippy is clean"
      ],
      "passes": true,
      "priority": "P1"
    },
    {
      "id": "US-024",
      "title": "Add token limit metadata for kimi models",
      "description": "The model_token_limit() function has no entries for kimi-k2.5 or kimi-k1.5, causing preflight context window validation to skip these models. Add token limit metadata to enable preflight checks and accurate max token defaults. Per Moonshot AI documentation, kimi-k2.5 supports 256K context window and 16K max output tokens.",
      "acceptanceCriteria": [
        "model_token_limit('kimi-k2.5') returns Some(ModelTokenLimit { max_output_tokens: 16384, context_window_tokens: 256000 })",
        "model_token_limit('kimi-k1.5') returns appropriate limits",
        "model_token_limit('kimi') follows alias chain (kimi → kimi-k2.5) and returns k2.5 limits",
        "preflight_message_request() validates context window for kimi models (via generic preflight, no provider-specific code needed)",
        "Unit tests verify limits and preflight behavior for kimi models",
        "All tests pass and clippy is clean"
      ],
      "passes": true,
      "priority": "P1"
    }
-  ]
+  ],
  "metadata": {
    "lastUpdated": "2026-04-17",
    "completedStories": ["US-001", "US-002", "US-003", "US-004", "US-005", "US-006", "US-007", "US-008", "US-009", "US-010", "US-011", "US-012", "US-013", "US-014", "US-015", "US-016", "US-017", "US-018", "US-019", "US-020", "US-021", "US-022", "US-023", "US-024"],
    "inProgressStories": [],
    "totalStories": 24,
    "status": "completed"
  }
 }
--- a/progress.txt
+++ b/progress.txt
@@ -81,3 +81,53 @@ VERIFICATION STATUS:
 - cargo clippy --workspace: PASSED
 All 7 stories from prd.json now have passes: true
 Iteration 2: 2026-04-16
 ------------------------
 US-009 COMPLETED (Add unit tests for kimi model compatibility fix)
 - Files: rust/crates/api/src/providers/openai_compat.rs
 - Added 4 comprehensive unit tests:
  1. model_rejects_is_error_field_detects_kimi_models - verifies detection of kimi-k2.5, kimi-k1.5, dashscope/kimi-k2.5, case insensitivity
  2. translate_message_includes_is_error_for_non_kimi_models - verifies gpt-4o, grok-3, claude include is_error
  3. translate_message_excludes_is_error_for_kimi_models - verifies kimi models exclude is_error (prevents 400 Bad Request)
  4. build_chat_completion_request_kimi_vs_non_kimi_tool_results - full integration test for request building
 - Tests: 4 new tests, 119 unit tests total in api crate (+4), all passing
 - Integration tests: 29 passing (no regressions)
 US-010 COMPLETED (Add model compatibility documentation)
 - Files: docs/MODEL_COMPATIBILITY.md
 - Created comprehensive documentation covering:
  1. Kimi Models (is_error Exclusion) - documents the 400 Bad Request issue and solution
  2. Reasoning Models (Tuning Parameter Stripping) - covers o1, o3, o4, grok-3-mini, qwen-qwq, qwen3-thinking
  3. GPT-5 (max_completion_tokens) - documents max_tokens vs max_completion_tokens requirement
  4. Qwen Models (DashScope Routing) - explains routing and authentication
 - Added implementation details section with key functions
 - Added "Adding New Models" guide for future contributors
 - Added testing section with example commands
 - Cross-referenced with existing code comments in openai_compat.rs
 - cargo clippy passes
 US-011 COMPLETED (Performance optimization: reduce API request serialization overhead)
 - Files:
  - rust/crates/api/Cargo.toml (added criterion dev-dependency and bench config)
  - rust/crates/api/benches/request_building.rs (new benchmark suite)
  - rust/crates/api/src/providers/openai_compat.rs (optimizations)
  - rust/crates/api/src/lib.rs (public exports for benchmarks)
 - Optimizations implemented:
  1. flatten_tool_result_content: Pre-allocate String capacity and avoid intermediate Vec
     - Before: collected to Vec<String> then joined
     - After: single String with pre-calculated capacity, push directly
  2. Made key functions public for benchmarking: translate_message, build_chat_completion_request,
     flatten_tool_result_content, is_reasoning_model, model_rejects_is_error_field
 - Benchmark results:
  - flatten_tool_result_content/single_text: ~17ns
  - flatten_tool_result_content/multi_text (10 blocks): ~46ns
  - flatten_tool_result_content/large_content (50 blocks): ~11.7µs
  - translate_message/text_only: ~200ns
  - translate_message/tool_result: ~348ns
  - build_chat_completion_request/10 messages: ~16.4µs
  - build_chat_completion_request/100 messages: ~209µs
  - is_reasoning_model detection: ~26-42ns depending on model
 - All tests pass (119 unit tests + 29 integration tests)
 - cargo clippy passes
--- a/rust/Cargo.lock
+++ b/rust/Cargo.lock
@@ -17,10 +17,23 @@ dependencies = [
 "memchr",
 ]
 [[package]]
 name = "anes"
 version = "0.1.6"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "4b46cbb362ab8752921c97e041f5e366ee6297bd428a31275b9fcf1e380f7299"
 [[package]]
 name = "anstyle"
 version = "1.0.14"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "940b3a0ca603d1eade50a4846a2afffd5ef57a9feac2c0e2ec2e14f9ead76000"
 [[package]]
 name = "api"
 version = "0.1.0"
 dependencies = [
 "criterion",
 "reqwest",
 "runtime",
 "serde",
@@ -35,6 +48,12 @@ version = "1.1.2"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "1505bd5d3d116872e7271a6d4e16d81d0c8570876c8de68093a09ac269d8aac0"
 [[package]]
 name = "autocfg"
 version = "1.5.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "c08606f8c3cbf4ce6ec8e28fb0014a2c086708fe954eaa885384a6165172e7e8"
 [[package]]
 name = "base64"
 version = "0.22.1"
@@ -77,6 +96,12 @@ version = "1.11.1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "1e748733b7cbc798e1434b6ac524f0c1ff2ab456fe201501e6497c8417a4fc33"
 [[package]]
 name = "cast"
 version = "0.3.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "37b2a672a2cb129a2e41c10b1224bb368f9f37a2b16b612598138befd7b37eb5"
 [[package]]
 name = "cc"
 version = "1.2.58"
@@ -99,6 +124,58 @@ version = "0.2.1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "613afe47fcd5fac7ccf1db93babcb082c5994d996f20b8b159f2ad1658eb5724"
 [[package]]
 name = "ciborium"
 version = "0.2.2"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "42e69ffd6f0917f5c029256a24d0161db17cea3997d185db0d35926308770f0e"
 dependencies = [
 "ciborium-io",
 "ciborium-ll",
 "serde",
 ]
 [[package]]
 name = "ciborium-io"
 version = "0.2.2"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "05afea1e0a06c9be33d539b876f1ce3692f4afea2cb41f740e7743225ed1c757"
 [[package]]
 name = "ciborium-ll"
 version = "0.2.2"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "57663b653d948a338bfb3eeba9bb2fd5fcfaecb9e199e87e1eda4d9e8b240fd9"
 dependencies = [
 "ciborium-io",
 "half",
 ]
 [[package]]
 name = "clap"
 version = "4.6.1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "1ddb117e43bbf7dacf0a4190fef4d345b9bad68dfc649cb349e7d17d28428e51"
 dependencies = [
 "clap_builder",
 ]
 [[package]]
 name = "clap_builder"
 version = "4.6.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "714a53001bf66416adb0e2ef5ac857140e7dc3a0c48fb28b2f10762fc4b5069f"
 dependencies = [
 "anstyle",
 "clap_lex",
 ]
 [[package]]
 name = "clap_lex"
 version = "1.1.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "c8d4a3bb8b1e0c1050499d1815f5ab16d04f0959b233085fb31653fbfc9d98f9"
 [[package]]
 name = "clipboard-win"
 version = "5.4.1"
@@ -144,6 +221,67 @@ dependencies = [
 "cfg-if",
 ]
 [[package]]
 name = "criterion"
 version = "0.5.1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "f2b12d017a929603d80db1831cd3a24082f8137ce19c69e6447f54f5fc8d692f"
 dependencies = [
 "anes",
 "cast",
 "ciborium",
 "clap",
 "criterion-plot",
 "is-terminal",
 "itertools",
 "num-traits",
 "once_cell",
 "oorandom",
 "plotters",
 "rayon",
 "regex",
 "serde",
 "serde_derive",
 "serde_json",
 "tinytemplate",
 "walkdir",
 ]
 [[package]]
 name = "criterion-plot"
 version = "0.5.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "6b50826342786a51a89e2da3a28f1c32b06e387201bc2d19791f622c673706b1"
 dependencies = [
 "cast",
 "itertools",
 ]
 [[package]]
 name = "crossbeam-deque"
 version = "0.8.6"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "9dd111b7b7f7d55b72c0a6ae361660ee5853c9af73f70c3c2ef6858b950e2e51"
 dependencies = [
 "crossbeam-epoch",
 "crossbeam-utils",
 ]
 [[package]]
 name = "crossbeam-epoch"
 version = "0.9.18"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "5b82ac4a3c2ca9c3460964f020e1402edd5753411d7737aa39c3714ad1b5420e"
 dependencies = [
 "crossbeam-utils",
 ]
 [[package]]
 name = "crossbeam-utils"
 version = "0.8.21"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "d0a5c400df2834b80a4c3327b3aad3a4c4cd4de0629063962b03235697506a28"
 [[package]]
 name = "crossterm"
 version = "0.28.1"
@@ -169,6 +307,12 @@ dependencies = [
 "winapi",
 ]
 [[package]]
 name = "crunchy"
 version = "0.2.4"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "460fbee9c2c2f33933d720630a6a0bac33ba7053db5344fac858d4b8952d77d5"
 [[package]]
 name = "crypto-common"
 version = "0.1.7"
@@ -209,6 +353,12 @@ dependencies = [
 "syn",
 ]
 [[package]]
 name = "either"
 version = "1.15.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "48c757948c5ede0e46177b7add2e67155f70e33c07fea8284df6576da70b3719"
 [[package]]
 name = "endian-type"
 version = "0.1.2"
@@ -245,7 +395,7 @@ checksum = "0ce92ff622d6dadf7349484f42c93271a0d49b7cc4d466a936405bacbe10aa78"
 dependencies = [
 "cfg-if",
 "rustix 1.1.4",
- "windows-sys 0.52.0",
+ "windows-sys 0.59.0",
 ]
 [[package]]
@@ -380,12 +530,29 @@ version = "0.3.3"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "0cc23270f6e1808e30a928bdc84dea0b9b4136a8bc82338574f23baf47bbd280"
 [[package]]
 name = "half"
 version = "2.7.1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "6ea2d84b969582b4b1864a92dc5d27cd2b77b622a8d79306834f1be5ba20d84b"
 dependencies = [
 "cfg-if",
 "crunchy",
 "zerocopy",
 ]
 [[package]]
 name = "hashbrown"
 version = "0.16.1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "841d1cc9bed7f9236f321df977030373f4a4163ae1a7dbfe1a51a2c1a51d9100"
 [[package]]
 name = "hermit-abi"
 version = "0.5.2"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "fc0fef456e4baa96da950455cd02c081ca953b141298e41db3fc7e36b1da849c"
 [[package]]
 name = "home"
 version = "0.5.12"
@@ -622,6 +789,26 @@ dependencies = [
 "serde",
 ]
 [[package]]
 name = "is-terminal"
 version = "0.4.17"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "3640c1c38b8e4e43584d8df18be5fc6b0aa314ce6ebf51b53313d4306cca8e46"
 dependencies = [
 "hermit-abi",
 "libc",
 "windows-sys 0.61.2",
 ]
 [[package]]
 name = "itertools"
 version = "0.10.5"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "b0fd2260e829bddf4cb6ea802289de2f86d6a7a690192fbe91b3f46e0f2c8473"
 dependencies = [
 "either",
 ]
 [[package]]
 name = "itoa"
 version = "1.0.18"
@@ -755,6 +942,15 @@ version = "0.2.1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "c6673768db2d862beb9b39a78fdcb1a69439615d5794a1be50caa9bc92c81967"
 [[package]]
 name = "num-traits"
 version = "0.2.19"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "071dfc062690e90b734c0b2273ce72ad0ffa95f0c74596bc250dcfd960262841"
 dependencies = [
 "autocfg",
 ]
 [[package]]
 name = "once_cell"
 version = "1.21.4"
@@ -783,6 +979,12 @@ dependencies = [
 "pkg-config",
 ]
 [[package]]
 name = "oorandom"
 version = "11.1.5"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "d6790f58c7ff633d8771f42965289203411a5e5c68388703c06e14f24770b41e"
 [[package]]
 name = "parking_lot"
 version = "0.12.5"
@@ -837,6 +1039,34 @@ dependencies = [
 "time",
 ]
 [[package]]
 name = "plotters"
 version = "0.3.7"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "5aeb6f403d7a4911efb1e33402027fc44f29b5bf6def3effcc22d7bb75f2b747"
 dependencies = [
 "num-traits",
 "plotters-backend",
 "plotters-svg",
 "wasm-bindgen",
 "web-sys",
 ]
 [[package]]
 name = "plotters-backend"
 version = "0.3.7"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "df42e13c12958a16b3f7f4386b9ab1f3e7933914ecea48da7139435263a4172a"
 [[package]]
 name = "plotters-svg"
 version = "0.3.7"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "51bae2ac328883f7acdfea3d66a7c35751187f870bc81f94563733a154d7a670"
 dependencies = [
 "plotters-backend",
 ]
 [[package]]
 name = "plugins"
 version = "0.1.0"
@@ -1015,6 +1245,26 @@ dependencies = [
 "getrandom 0.3.4",
 ]
 [[package]]
 name = "rayon"
 version = "1.12.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "fb39b166781f92d482534ef4b4b1b2568f42613b53e5b6c160e24cfbfa30926d"
 dependencies = [
 "either",
 "rayon-core",
 ]
 [[package]]
 name = "rayon-core"
 version = "1.13.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "22e18b0f0062d30d4230b2e85ff77fdfe4326feb054b9783a3460d8435c8ab91"
 dependencies = [
 "crossbeam-deque",
 "crossbeam-utils",
 ]
 [[package]]
 name = "redox_syscall"
 version = "0.5.18"
@@ -1138,7 +1388,7 @@ dependencies = [
 "errno",
 "libc",
 "linux-raw-sys 0.4.15",
- "windows-sys 0.52.0",
+ "windows-sys 0.59.0",
 ]
 [[package]]
@@ -1522,6 +1772,16 @@ dependencies = [
 "zerovec",
 ]
 [[package]]
 name = "tinytemplate"
 version = "1.2.1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "be4d6b5f19ff7664e8c98d03e2139cb510db9b0a60b55f8e8709b689d939b6bc"
 dependencies = [
 "serde",
 "serde_json",
 ]
 [[package]]
 name = "tinyvec"
 version = "1.11.0"
--- a/rust/crates/api/Cargo.toml
+++ b/rust/crates/api/Cargo.toml
@@ -13,5 +13,12 @@ serde_json.workspace = true
 telemetry = { path = "../telemetry" }
 tokio = { version = "1", features = ["io-util", "macros", "net", "rt-multi-thread", "time"] }
 [dev-dependencies]
 criterion = { version = "0.5", features = ["html_reports"] }
 [lints]
 workspace = true
 [[bench]]
 name = "request_building"
 harness = false
--- a/rust/crates/api/benches/request_building.rs
+++ b/rust/crates/api/benches/request_building.rs
@@ -0,0 +1,329 @@
 // Benchmarks for API request building performance
 // Benchmarks are exempt from strict linting as they are test/performance code
 #![allow(
    clippy::cognitive_complexity,
    clippy::doc_markdown,
    clippy::explicit_iter_loop,
    clippy::format_in_format_args,
    clippy::missing_docs_in_private_items,
    clippy::must_use_candidate,
    clippy::needless_pass_by_value,
    clippy::clone_on_copy,
    clippy::too_many_lines,
    clippy::uninlined_format_args
 )]
 use api::{
    build_chat_completion_request, flatten_tool_result_content, is_reasoning_model,
    translate_message, InputContentBlock, InputMessage, MessageRequest, OpenAiCompatConfig,
    ToolResultContentBlock,
 };
 use criterion::{black_box, criterion_group, criterion_main, BenchmarkId, Criterion};
 use serde_json::json;
 /// Create a sample message request with various content types
 fn create_sample_request(message_count: usize) -> MessageRequest {
    let mut messages = Vec::with_capacity(message_count);
    for i in 0..message_count {
        match i % 4 {
            0 => messages.push(InputMessage::user_text(format!("Message {}", i))),
            1 => messages.push(InputMessage {
                role: "assistant".to_string(),
                content: vec![
                    InputContentBlock::Text {
                        text: format!("Assistant response {}", i),
                    },
                    InputContentBlock::ToolUse {
                        id: format!("call_{}", i),
                        name: "read_file".to_string(),
                        input: json!({"path": format!("/tmp/file{}", i)}),
                    },
                ],
            }),
            2 => messages.push(InputMessage {
                role: "user".to_string(),
                content: vec![InputContentBlock::ToolResult {
                    tool_use_id: format!("call_{}", i - 1),
                    content: vec![ToolResultContentBlock::Text {
                        text: format!("Tool result content {}", i),
                    }],
                    is_error: false,
                }],
            }),
            _ => messages.push(InputMessage {
                role: "assistant".to_string(),
                content: vec![InputContentBlock::ToolUse {
                    id: format!("call_{}", i),
                    name: "write_file".to_string(),
                    input: json!({"path": format!("/tmp/out{}", i), "content": "data"}),
                }],
            }),
        }
    }
    MessageRequest {
        model: "gpt-4o".to_string(),
        max_tokens: 1024,
        messages,
        stream: false,
        system: Some("You are a helpful assistant.".to_string()),
        temperature: Some(0.7),
        top_p: None,
        tools: None,
        tool_choice: None,
        frequency_penalty: None,
        presence_penalty: None,
        stop: None,
        reasoning_effort: None,
    }
 }
 /// Benchmark translate_message with various message types
 fn bench_translate_message(c: &mut Criterion) {
    let mut group = c.benchmark_group("translate_message");
    // Text-only message
    let text_message = InputMessage::user_text("Simple text message".to_string());
    group.bench_with_input(
        BenchmarkId::new("text_only", "single"),
        &text_message,
        |b, msg| {
            b.iter(|| translate_message(black_box(msg), black_box("gpt-4o")));
        },
    );
    // Assistant message with tool calls
    let assistant_message = InputMessage {
        role: "assistant".to_string(),
        content: vec![
            InputContentBlock::Text {
                text: "I'll help you with that.".to_string(),
            },
            InputContentBlock::ToolUse {
                id: "call_1".to_string(),
                name: "read_file".to_string(),
                input: json!({"path": "/tmp/test"}),
            },
            InputContentBlock::ToolUse {
                id: "call_2".to_string(),
                name: "write_file".to_string(),
                input: json!({"path": "/tmp/out", "content": "data"}),
            },
        ],
    };
    group.bench_with_input(
        BenchmarkId::new("assistant_with_tools", "2_tools"),
        &assistant_message,
        |b, msg| {
            b.iter(|| translate_message(black_box(msg), black_box("gpt-4o")));
        },
    );
    // Tool result message
    let tool_result_message = InputMessage {
        role: "user".to_string(),
        content: vec![InputContentBlock::ToolResult {
            tool_use_id: "call_1".to_string(),
            content: vec![ToolResultContentBlock::Text {
                text: "File contents here".to_string(),
            }],
            is_error: false,
        }],
    };
    group.bench_with_input(
        BenchmarkId::new("tool_result", "single"),
        &tool_result_message,
        |b, msg| {
            b.iter(|| translate_message(black_box(msg), black_box("gpt-4o")));
        },
    );
    // Tool result for kimi model (is_error excluded)
    group.bench_with_input(
        BenchmarkId::new("tool_result_kimi", "kimi-k2.5"),
        &tool_result_message,
        |b, msg| {
            b.iter(|| translate_message(black_box(msg), black_box("kimi-k2.5")));
        },
    );
    // Large content message
    let large_content = "x".repeat(10000);
    let large_message = InputMessage::user_text(large_content);
    group.bench_with_input(
        BenchmarkId::new("large_text", "10kb"),
        &large_message,
        |b, msg| {
            b.iter(|| translate_message(black_box(msg), black_box("gpt-4o")));
        },
    );
    group.finish();
 }
 /// Benchmark build_chat_completion_request with various message counts
 fn bench_build_request(c: &mut Criterion) {
    let mut group = c.benchmark_group("build_chat_completion_request");
    let config = OpenAiCompatConfig::openai();
    for message_count in [10, 50, 100].iter() {
        let request = create_sample_request(*message_count);
        group.bench_with_input(
            BenchmarkId::new("message_count", message_count),
            &request,
            |b, req| {
                b.iter(|| build_chat_completion_request(black_box(req), config.clone()));
            },
        );
    }
    // Benchmark with reasoning model (tuning params stripped)
    let mut reasoning_request = create_sample_request(50);
    reasoning_request.model = "o1-mini".to_string();
    group.bench_with_input(
        BenchmarkId::new("reasoning_model", "o1-mini"),
        &reasoning_request,
        |b, req| {
            b.iter(|| build_chat_completion_request(black_box(req), config.clone()));
        },
    );
    // Benchmark with gpt-5 (max_completion_tokens)
    let mut gpt5_request = create_sample_request(50);
    gpt5_request.model = "gpt-5".to_string();
    group.bench_with_input(
        BenchmarkId::new("gpt5", "gpt-5"),
        &gpt5_request,
        |b, req| {
            b.iter(|| build_chat_completion_request(black_box(req), config.clone()));
        },
    );
    group.finish();
 }
 /// Benchmark flatten_tool_result_content
 fn bench_flatten_tool_result(c: &mut Criterion) {
    let mut group = c.benchmark_group("flatten_tool_result_content");
    // Single text block
    let single_text = vec![ToolResultContentBlock::Text {
        text: "Simple result".to_string(),
    }];
    group.bench_with_input(
        BenchmarkId::new("single_text", "1_block"),
        &single_text,
        |b, content| {
            b.iter(|| flatten_tool_result_content(black_box(content)));
        },
    );
    // Multiple text blocks
    let multi_text: Vec<ToolResultContentBlock> = (0..10)
        .map(|i| ToolResultContentBlock::Text {
            text: format!("Line {}: some content here\n", i),
        })
        .collect();
    group.bench_with_input(
        BenchmarkId::new("multi_text", "10_blocks"),
        &multi_text,
        |b, content| {
            b.iter(|| flatten_tool_result_content(black_box(content)));
        },
    );
    // JSON content blocks
    let json_content: Vec<ToolResultContentBlock> = (0..5)
        .map(|i| ToolResultContentBlock::Json {
            value: json!({"index": i, "data": "test content", "nested": {"key": "value"}}),
        })
        .collect();
    group.bench_with_input(
        BenchmarkId::new("json_content", "5_blocks"),
        &json_content,
        |b, content| {
            b.iter(|| flatten_tool_result_content(black_box(content)));
        },
    );
    // Mixed content
    let mixed_content = vec![
        ToolResultContentBlock::Text {
            text: "Here's the result:".to_string(),
        },
        ToolResultContentBlock::Json {
            value: json!({"status": "success", "count": 42}),
        },
        ToolResultContentBlock::Text {
            text: "Processing complete.".to_string(),
        },
    ];
    group.bench_with_input(
        BenchmarkId::new("mixed_content", "text+json"),
        &mixed_content,
        |b, content| {
            b.iter(|| flatten_tool_result_content(black_box(content)));
        },
    );
    // Large content - simulating typical tool output
    let large_content: Vec<ToolResultContentBlock> = (0..50)
        .map(|i| {
            if i % 3 == 0 {
                ToolResultContentBlock::Json {
                    value: json!({"line": i, "content": "x".repeat(100)}),
                }
            } else {
                ToolResultContentBlock::Text {
                    text: format!("Line {}: {}", i, "some output content here"),
                }
            }
        })
        .collect();
    group.bench_with_input(
        BenchmarkId::new("large_content", "50_blocks"),
        &large_content,
        |b, content| {
            b.iter(|| flatten_tool_result_content(black_box(content)));
        },
    );
    group.finish();
 }
 /// Benchmark is_reasoning_model detection
 fn bench_is_reasoning_model(c: &mut Criterion) {
    let mut group = c.benchmark_group("is_reasoning_model");
    let models = vec![
        ("gpt-4o", false),
        ("o1-mini", true),
        ("o3", true),
        ("grok-3", false),
        ("grok-3-mini", true),
        ("qwen/qwen-qwq-32b", true),
        ("qwen/qwen-plus", false),
    ];
    for (model, expected) in models {
        group.bench_with_input(
            BenchmarkId::new(model, if expected { "reasoning" } else { "normal" }),
            model,
            |b, m| {
                b.iter(|| is_reasoning_model(black_box(m)));
            },
        );
    }
    group.finish();
 }
 criterion_group!(
    benches,
    bench_translate_message,
    bench_build_request,
    bench_flatten_tool_result,
    bench_is_reasoning_model
 );
 criterion_main!(benches);
--- a/rust/crates/api/src/error.rs
+++ b/rust/crates/api/src/error.rs
@@ -53,6 +53,8 @@ pub enum ApiError {
        request_id: Option<String>,
        body: String,
        retryable: bool,
        /// Suggested user action based on error type (e.g., "Reduce prompt size" for 413)
        suggested_action: Option<String>,
    },
    RetriesExhausted {
        attempts: u32,
@@ -63,6 +65,11 @@ pub enum ApiError {
        attempt: u32,
        base_delay: Duration,
    },
    RequestBodySizeExceeded {
        estimated_bytes: usize,
        max_bytes: usize,
        provider: &'static str,
    },
 }
 impl ApiError {
@@ -129,7 +136,8 @@ impl ApiError {
            | Self::Io(_)
            | Self::Json { .. }
            | Self::InvalidSseFrame(_)
-            | Self::BackoffOverflow { .. } => false,
+            | Self::BackoffOverflow { .. }
            | Self::RequestBodySizeExceeded { .. } => false,
        }
    }
@@ -147,7 +155,8 @@ impl ApiError {
            | Self::Io(_)
            | Self::Json { .. }
            | Self::InvalidSseFrame(_)
-            | Self::BackoffOverflow { .. } => None,
+            | Self::BackoffOverflow { .. }
            | Self::RequestBodySizeExceeded { .. } => None,
        }
    }
@@ -172,6 +181,7 @@ impl ApiError {
                "provider_transport"
            }
            Self::InvalidApiKeyEnv(_) | Self::Io(_) | Self::Json { .. } => "runtime_io",
            Self::RequestBodySizeExceeded { .. } => "request_size",
        }
    }
@@ -194,7 +204,8 @@ impl ApiError {
            | Self::Io(_)
            | Self::Json { .. }
            | Self::InvalidSseFrame(_)
-            | Self::BackoffOverflow { .. } => false,
+            | Self::BackoffOverflow { .. }
            | Self::RequestBodySizeExceeded { .. } => false,
        }
    }
@@ -223,12 +234,14 @@ impl ApiError {
            | Self::Io(_)
            | Self::Json { .. }
            | Self::InvalidSseFrame(_)
-            | Self::BackoffOverflow { .. } => false,
+            | Self::BackoffOverflow { .. }
            | Self::RequestBodySizeExceeded { .. } => false,
        }
    }
 }
 impl Display for ApiError {
    #[allow(clippy::too_many_lines)]
    fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
        match self {
            Self::MissingCredentials {
@@ -324,6 +337,14 @@ impl Display for ApiError {
                f,
                "retry backoff overflowed on attempt {attempt} with base delay {base_delay:?}"
            ),
            Self::RequestBodySizeExceeded {
                estimated_bytes,
                max_bytes,
                provider,
            } => write!(
                f,
                "request body size ({estimated_bytes} bytes) exceeds {provider} limit ({max_bytes} bytes); reduce prompt length or context before retrying"
            ),
        }
    }
 }
@@ -469,6 +490,7 @@ mod tests {
            request_id: Some("req_jobdori_123".to_string()),
            body: String::new(),
            retryable: true,
            suggested_action: None,
        };
        assert!(error.is_generic_fatal_wrapper());
@@ -491,6 +513,7 @@ mod tests {
                request_id: Some("req_nested_456".to_string()),
                body: String::new(),
                retryable: true,
                suggested_action: None,
            }),
        };
@@ -511,6 +534,7 @@ mod tests {
            request_id: Some("req_ctx_123".to_string()),
            body: String::new(),
            retryable: false,
            suggested_action: None,
        };
        assert!(error.is_context_window_failure());
--- a/rust/crates/api/src/lib.rs
+++ b/rust/crates/api/src/lib.rs
@@ -19,7 +19,10 @@ pub use prompt_cache::{
    PromptCacheStats,
 };
 pub use providers::anthropic::{AnthropicClient, AnthropicClient as ApiClient, AuthSource};
-pub use providers::openai_compat::{OpenAiCompatClient, OpenAiCompatConfig};
+pub use providers::openai_compat::{
    build_chat_completion_request, flatten_tool_result_content, is_reasoning_model,
    model_rejects_is_error_field, translate_message, OpenAiCompatClient, OpenAiCompatConfig,
 };
 pub use providers::{
    detect_provider_kind, max_tokens_for_model, max_tokens_for_model_with_override,
    resolve_model_alias, ProviderKind,
--- a/rust/crates/api/src/providers/anthropic.rs
+++ b/rust/crates/api/src/providers/anthropic.rs
@@ -885,6 +885,7 @@ async fn expect_success(response: reqwest::Response) -> Result<reqwest::Response
        request_id,
        body,
        retryable,
        suggested_action: None,
    })
 }
@@ -909,6 +910,7 @@ fn enrich_bearer_auth_error(error: ApiError, auth: &AuthSource) -> ApiError {
        request_id,
        body,
        retryable,
        suggested_action,
    } = error
    else {
        return error;
@@ -921,6 +923,7 @@ fn enrich_bearer_auth_error(error: ApiError, auth: &AuthSource) -> ApiError {
            request_id,
            body,
            retryable,
            suggested_action,
        };
    }
    let Some(bearer_token) = auth.bearer_token() else {
@@ -931,6 +934,7 @@ fn enrich_bearer_auth_error(error: ApiError, auth: &AuthSource) -> ApiError {
            request_id,
            body,
            retryable,
            suggested_action,
        };
    };
    if !bearer_token.starts_with("sk-ant-") {
@@ -941,6 +945,7 @@ fn enrich_bearer_auth_error(error: ApiError, auth: &AuthSource) -> ApiError {
            request_id,
            body,
            retryable,
            suggested_action,
        };
    }
    // Only append the hint when the AuthSource is pure BearerToken. If both
@@ -955,6 +960,7 @@ fn enrich_bearer_auth_error(error: ApiError, auth: &AuthSource) -> ApiError {
            request_id,
            body,
            retryable,
            suggested_action,
        };
    }
    let enriched_message = match message {
@@ -968,6 +974,7 @@ fn enrich_bearer_auth_error(error: ApiError, auth: &AuthSource) -> ApiError {
        request_id,
        body,
        retryable,
        suggested_action,
    }
 }
@@ -1555,6 +1562,7 @@ mod tests {
            request_id: Some("req_varleg_001".to_string()),
            body: String::new(),
            retryable: false,
            suggested_action: None,
        };
        // when
@@ -1595,6 +1603,7 @@ mod tests {
            request_id: None,
            body: String::new(),
            retryable: true,
            suggested_action: None,
        };
        // when
@@ -1623,6 +1632,7 @@ mod tests {
            request_id: None,
            body: String::new(),
            retryable: false,
            suggested_action: None,
        };
        // when
@@ -1650,6 +1660,7 @@ mod tests {
            request_id: None,
            body: String::new(),
            retryable: false,
            suggested_action: None,
        };
        // when
@@ -1674,6 +1685,7 @@ mod tests {
            request_id: None,
            body: String::new(),
            retryable: false,
            suggested_action: None,
        };
        // when
--- a/rust/crates/api/src/providers/mod.rs
+++ b/rust/crates/api/src/providers/mod.rs
@@ -122,6 +122,15 @@ const MODEL_REGISTRY: &[(&str, ProviderMetadata)] = &[
            default_base_url: openai_compat::DEFAULT_XAI_BASE_URL,
        },
    ),
    (
        "kimi",
        ProviderMetadata {
            provider: ProviderKind::OpenAi,
            auth_env: "DASHSCOPE_API_KEY",
            base_url_env: "DASHSCOPE_BASE_URL",
            default_base_url: openai_compat::DEFAULT_DASHSCOPE_BASE_URL,
        },
    ),
 ];
 #[must_use]
@@ -144,7 +153,10 @@ pub fn resolve_model_alias(model: &str) -> String {
                    "grok-2" => "grok-2",
                    _ => trimmed,
                },
-                ProviderKind::OpenAi => trimmed,
+                ProviderKind::OpenAi => match *alias {
                    "kimi" => "kimi-k2.5",
                    _ => trimmed,
                },
            })
        })
        .map_or_else(|| trimmed.to_string(), ToOwned::to_owned)
@@ -194,6 +206,16 @@ pub fn metadata_for_model(model: &str) -> Option<ProviderMetadata> {
            default_base_url: openai_compat::DEFAULT_DASHSCOPE_BASE_URL,
        });
    }
    // Kimi models (kimi-k2.5, kimi-k1.5, etc.) via DashScope compatible-mode.
    // Routes kimi/* and kimi-* model names to DashScope endpoint.
    if canonical.starts_with("kimi/") || canonical.starts_with("kimi-") {
        return Some(ProviderMetadata {
            provider: ProviderKind::OpenAi,
            auth_env: "DASHSCOPE_API_KEY",
            base_url_env: "DASHSCOPE_BASE_URL",
            default_base_url: openai_compat::DEFAULT_DASHSCOPE_BASE_URL,
        });
    }
    None
 }
@@ -267,6 +289,12 @@ pub fn model_token_limit(model: &str) -> Option<ModelTokenLimit> {
            max_output_tokens: 64_000,
            context_window_tokens: 131_072,
        }),
        // Kimi models via DashScope (Moonshot AI)
        // Source: https://platform.moonshot.cn/docs/intro
        "kimi-k2.5" | "kimi-k1.5" => Some(ModelTokenLimit {
            max_output_tokens: 16_384,
            context_window_tokens: 256_000,
        }),
        _ => None,
    }
 }
@@ -554,6 +582,34 @@ mod tests {
        );
    }
    #[test]
    fn kimi_prefix_routes_to_dashscope() {
        // Kimi models via DashScope (kimi-k2.5, kimi-k1.5, etc.)
        let meta = super::metadata_for_model("kimi-k2.5")
            .expect("kimi-k2.5 must resolve to DashScope metadata");
        assert_eq!(meta.auth_env, "DASHSCOPE_API_KEY");
        assert_eq!(meta.base_url_env, "DASHSCOPE_BASE_URL");
        assert!(meta.default_base_url.contains("dashscope.aliyuncs.com"));
        assert_eq!(meta.provider, ProviderKind::OpenAi);
        // With provider prefix
        let meta2 = super::metadata_for_model("kimi/kimi-k2.5")
            .expect("kimi/kimi-k2.5 must resolve to DashScope metadata");
        assert_eq!(meta2.auth_env, "DASHSCOPE_API_KEY");
        assert_eq!(meta2.provider, ProviderKind::OpenAi);
        // Different kimi variants
        let meta3 = super::metadata_for_model("kimi-k1.5")
            .expect("kimi-k1.5 must resolve to DashScope metadata");
        assert_eq!(meta3.auth_env, "DASHSCOPE_API_KEY");
    }
    #[test]
    fn kimi_alias_resolves_to_kimi_k2_5() {
        assert_eq!(super::resolve_model_alias("kimi"), "kimi-k2.5");
        assert_eq!(super::resolve_model_alias("KIMI"), "kimi-k2.5"); // case-insensitive
    }
    #[test]
    fn keeps_existing_max_token_heuristic() {
        assert_eq!(max_tokens_for_model("opus"), 32_000);
@@ -694,6 +750,69 @@ mod tests {
            .expect("models without context metadata should skip the guarded preflight");
    }
    #[test]
    fn returns_context_window_metadata_for_kimi_models() {
        // kimi-k2.5
        let k25_limit = model_token_limit("kimi-k2.5")
            .expect("kimi-k2.5 should have token limit metadata");
        assert_eq!(k25_limit.max_output_tokens, 16_384);
        assert_eq!(k25_limit.context_window_tokens, 256_000);
        // kimi-k1.5
        let k15_limit = model_token_limit("kimi-k1.5")
            .expect("kimi-k1.5 should have token limit metadata");
        assert_eq!(k15_limit.max_output_tokens, 16_384);
        assert_eq!(k15_limit.context_window_tokens, 256_000);
    }
    #[test]
    fn kimi_alias_resolves_to_kimi_k25_token_limits() {
        // The "kimi" alias resolves to "kimi-k2.5" via resolve_model_alias()
        let alias_limit = model_token_limit("kimi")
            .expect("kimi alias should resolve to kimi-k2.5 limits");
        let direct_limit = model_token_limit("kimi-k2.5")
            .expect("kimi-k2.5 should have limits");
        assert_eq!(alias_limit.max_output_tokens, direct_limit.max_output_tokens);
        assert_eq!(
            alias_limit.context_window_tokens,
            direct_limit.context_window_tokens
        );
    }
    #[test]
    fn preflight_blocks_oversized_requests_for_kimi_models() {
        let request = MessageRequest {
            model: "kimi-k2.5".to_string(),
            max_tokens: 16_384,
            messages: vec![InputMessage {
                role: "user".to_string(),
                content: vec![InputContentBlock::Text {
                    text: "x".repeat(1_000_000), // Large input to exceed context window
                }],
            }],
            system: Some("Keep the answer short.".to_string()),
            tools: None,
            tool_choice: None,
            stream: true,
            ..Default::default()
        };
        let error = preflight_message_request(&request)
            .expect_err("oversized request should be rejected for kimi models");
        match error {
            ApiError::ContextWindowExceeded {
                model,
                context_window_tokens,
                ..
            } => {
                assert_eq!(model, "kimi-k2.5");
                assert_eq!(context_window_tokens, 256_000);
            }
            other => panic!("expected context-window preflight failure, got {other:?}"),
        }
    }
    #[test]
    fn parse_dotenv_extracts_keys_handles_comments_quotes_and_export_prefix() {
        // given
--- a/rust/crates/api/src/providers/openai_compat.rs
+++ b/rust/crates/api/src/providers/openai_compat.rs
@@ -31,12 +31,22 @@ pub struct OpenAiCompatConfig {
    pub api_key_env: &'static str,
    pub base_url_env: &'static str,
    pub default_base_url: &'static str,
    /// Maximum request body size in bytes. Provider-specific limits:
    /// - `DashScope`: 6 MiB (`6_291_456` bytes) - observed in dogfood testing
    /// - `OpenAI`: 100 MiB (`104_857_600` bytes)
    /// - `xAI`: 50 MiB (`52_428_800` bytes)
    pub max_request_body_bytes: usize,
 }
 const XAI_ENV_VARS: &[&str] = &["XAI_API_KEY"];
 const OPENAI_ENV_VARS: &[&str] = &["OPENAI_API_KEY"];
 const DASHSCOPE_ENV_VARS: &[&str] = &["DASHSCOPE_API_KEY"];
 // Provider-specific request body size limits in bytes
 const XAI_MAX_REQUEST_BODY_BYTES: usize = 52_428_800; // 50 MiB
 const OPENAI_MAX_REQUEST_BODY_BYTES: usize = 104_857_600; // 100 MiB
 const DASHSCOPE_MAX_REQUEST_BODY_BYTES: usize = 6_291_456; // 6 MiB (observed limit in dogfood)
 impl OpenAiCompatConfig {
    #[must_use]
    pub const fn xai() -> Self {
@@ -45,6 +55,7 @@ impl OpenAiCompatConfig {
            api_key_env: "XAI_API_KEY",
            base_url_env: "XAI_BASE_URL",
            default_base_url: DEFAULT_XAI_BASE_URL,
            max_request_body_bytes: XAI_MAX_REQUEST_BODY_BYTES,
        }
    }
@@ -55,6 +66,7 @@ impl OpenAiCompatConfig {
            api_key_env: "OPENAI_API_KEY",
            base_url_env: "OPENAI_BASE_URL",
            default_base_url: DEFAULT_OPENAI_BASE_URL,
            max_request_body_bytes: OPENAI_MAX_REQUEST_BODY_BYTES,
        }
    }
@@ -69,6 +81,7 @@ impl OpenAiCompatConfig {
            api_key_env: "DASHSCOPE_API_KEY",
            base_url_env: "DASHSCOPE_BASE_URL",
            default_base_url: DEFAULT_DASHSCOPE_BASE_URL,
            max_request_body_bytes: DASHSCOPE_MAX_REQUEST_BODY_BYTES,
        }
    }
@@ -183,6 +196,10 @@ impl OpenAiCompatClient {
                    request_id,
                    body,
                    retryable: false,
                    suggested_action: suggested_action_for_status(
                        reqwest::StatusCode::from_u16(code.unwrap_or(400))
                            .unwrap_or(reqwest::StatusCode::BAD_REQUEST),
                    ),
                });
            }
        }
@@ -249,6 +266,9 @@ impl OpenAiCompatClient {
        &self,
        request: &MessageRequest,
    ) -> Result<reqwest::Response, ApiError> {
        // Pre-flight check: verify request body size against provider limits
        check_request_body_size(request, self.config())?;
        let request_url = chat_completions_endpoint(&self.base_url);
        self.http
            .post(&request_url)
@@ -752,7 +772,12 @@ struct ErrorBody {
 /// Returns true for models known to reject tuning parameters like temperature,
 /// `top_p`, `frequency_penalty`, and `presence_penalty`. These are typically
 /// reasoning/chain-of-thought models with fixed sampling.
-fn is_reasoning_model(model: &str) -> bool {
+/// Public for benchmarking and testing purposes.
 #[must_use]
 pub fn is_reasoning_model(model: &str) -> bool {
    let lowered = model.to_ascii_lowercase();
    // Strip any provider/ prefix for the check (e.g. qwen/qwen-qwq -> qwen-qwq)
    let canonical = lowered.rsplit('/').next().unwrap_or(lowered.as_str());
@@ -776,7 +801,7 @@ fn strip_routing_prefix(model: &str) -> &str {
        let prefix = &model[..pos];
        // Only strip if the prefix before "/" is a known routing prefix,
        // not if "/" appears in the middle of the model name for other reasons.
-        if matches!(prefix, "openai" | "xai" | "grok" | "qwen") {
+        if matches!(prefix, "openai" | "xai" | "grok" | "qwen" | "kimi") {
            &model[pos + 1..]
        } else {
            model
@@ -786,7 +811,41 @@ fn strip_routing_prefix(model: &str) -> &str {
    }
 }
-fn build_chat_completion_request(request: &MessageRequest, config: OpenAiCompatConfig) -> Value {
+/// Estimate the serialized JSON size of a request payload in bytes.
 /// This is a pre-flight check to avoid hitting provider-specific size limits.
 pub fn estimate_request_body_size(request: &MessageRequest, config: OpenAiCompatConfig) -> usize {
    let payload = build_chat_completion_request(request, config);
     // serde_json::to_vec yields the exact serialized byte count; a serialization
     // failure maps to 0, so the size check passes in that case.
    serde_json::to_vec(&payload).map_or(0, |v| v.len())
 }
 /// Pre-flight check for request body size against provider limits.
 /// Returns Ok(()) if the request is within limits, or an error with
 /// a clear message about the size limit being exceeded.
 pub fn check_request_body_size(
    request: &MessageRequest,
    config: OpenAiCompatConfig,
 ) -> Result<(), ApiError> {
    let estimated_bytes = estimate_request_body_size(request, config);
    let max_bytes = config.max_request_body_bytes;
    if estimated_bytes > max_bytes {
        Err(ApiError::RequestBodySizeExceeded {
            estimated_bytes,
            max_bytes,
            provider: config.provider_name,
        })
    } else {
        Ok(())
    }
 }
 /// Builds a chat completion request payload from a `MessageRequest`.
 /// Public for benchmarking purposes.
 pub fn build_chat_completion_request(
    request: &MessageRequest,
    config: OpenAiCompatConfig,
 ) -> Value {
    let mut messages = Vec::new();
    if let Some(system) = request.system.as_ref().filter(|value| !value.is_empty()) {
        messages.push(json!({
@@ -794,8 +853,10 @@ fn build_chat_completion_request(request: &MessageRequest, config: OpenAiCompatC
            "content": system,
        }));
    }
+    // Strip routing prefix (e.g., "openai/gpt-4" → "gpt-4") for the wire.
+    let wire_model = strip_routing_prefix(&request.model);
    for message in &request.messages {
-        messages.extend(translate_message(message));
+        messages.extend(translate_message(message, wire_model));
    }
    // Sanitize: drop any `role:"tool"` message that does not have a valid
    // paired `role:"assistant"` with a `tool_calls` entry carrying the same
@@ -806,9 +867,6 @@ fn build_chat_completion_request(request: &MessageRequest, config: OpenAiCompatC
    // still proceed with the remaining history intact.
    messages = sanitize_tool_message_pairing(messages);
-    // Strip routing prefix (e.g., "openai/gpt-4" → "gpt-4") for the wire.
-    let wire_model = strip_routing_prefix(&request.model);
    // gpt-5* requires `max_completion_tokens`; older OpenAI models accept both.
    // We send the correct field based on the wire model name so gpt-5.x requests
    // don't fail with "unknown field max_tokens".
@@ -868,7 +926,25 @@ fn build_chat_completion_request(request: &MessageRequest, config: OpenAiCompatC
    payload
 }
-fn translate_message(message: &InputMessage) -> Vec<Value> {
+/// Returns true for models that do NOT support the `is_error` field in tool results.
 /// kimi models (via Moonshot AI/DashScope) reject this field with 400 Bad Request.
 /// Public for benchmarking and testing purposes.
 #[must_use]
 pub fn model_rejects_is_error_field(model: &str) -> bool {
    let lowered = model.to_ascii_lowercase();
    // Strip any provider/ prefix for the check
    let canonical = lowered.rsplit('/').next().unwrap_or(lowered.as_str());
    // kimi models (kimi-k2.5, kimi-k1.5, kimi-moonshot, etc.)
    canonical.starts_with("kimi")
 }
 /// Translates an `InputMessage` into OpenAI-compatible message format.
 /// Public for benchmarking purposes.
 #[must_use]
 pub fn translate_message(message: &InputMessage, model: &str) -> Vec<Value> {
    let supports_is_error = !model_rejects_is_error_field(model);
    match message.role.as_str() {
        "assistant" => {
            let mut text = String::new();
@@ -914,12 +990,19 @@ fn translate_message(message: &InputMessage) -> Vec<Value> {
                    tool_use_id,
                    content,
                    is_error,
-                } => Some(json!({
-                    "role": "tool",
-                    "tool_call_id": tool_use_id,
-                    "content": flatten_tool_result_content(content),
-                    "is_error": is_error,
-                })),
+                } => {
+                    let mut msg = json!({
+                        "role": "tool",
+                        "tool_call_id": tool_use_id,
+                        "content": flatten_tool_result_content(content),
+                    });
+                    // Only include is_error for models that support it.
+                    // kimi models reject this field with 400 Bad Request.
+                    if supports_is_error {
+                        msg["is_error"] = json!(is_error);
+                    }
+                    Some(msg)
+                }
                InputContentBlock::ToolUse { .. } => None,
            })
            .collect(),
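A minimal std-only sketch of the gating this hunk introduces, assuming an ordered key/value list in place of the `serde_json` object. `tool_message_fields` is a hypothetical name used only for illustration; `model_rejects_is_error_field` repeats the prefix check from the diff.

```rust
/// Same check as in the diff: lowercase the model name, drop any
/// `provider/` prefix, and test for a leading "kimi".
fn model_rejects_is_error_field(model: &str) -> bool {
    let lowered = model.to_ascii_lowercase();
    let canonical = lowered.rsplit('/').next().unwrap_or(lowered.as_str());
    canonical.starts_with("kimi")
}

/// Hypothetical stand-in for the JSON tool message: builds the three
/// mandatory fields, then appends `is_error` only when the model accepts it.
fn tool_message_fields(
    model: &str,
    tool_call_id: &str,
    content: &str,
    is_error: bool,
) -> Vec<(&'static str, String)> {
    let mut fields = vec![
        ("role", "tool".to_string()),
        ("tool_call_id", tool_call_id.to_string()),
        ("content", content.to_string()),
    ];
    if !model_rejects_is_error_field(model) {
        fields.push(("is_error", is_error.to_string()));
    }
    fields
}
```

Building the base object first and adding the optional key afterwards keeps one construction path for all models, which is the same shape the hunk uses with `msg["is_error"] = json!(is_error)`.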
@@ -938,7 +1021,10 @@ fn translate_message(message: &InputMessage) -> Vec<Value> {
 /// `tool_calls` array containing an entry whose `id` matches the tool
 /// message's `tool_call_id`, the pair is valid and both are kept. Otherwise
 /// the tool message is dropped.
-fn sanitize_tool_message_pairing(messages: Vec<Value>) -> Vec<Value> {
+/// Remove `role:"tool"` messages from `messages` that have no valid paired
 /// `role:"assistant"` message with a matching `tool_calls[].id` immediately
 /// preceding them. Public for benchmarking purposes.
 pub fn sanitize_tool_message_pairing(messages: Vec<Value>) -> Vec<Value> {
    // Collect indices of tool messages that are orphaned.
    let mut drop_indices = std::collections::HashSet::new();
    for (i, msg) in messages.iter().enumerate() {
@@ -994,15 +1080,36 @@ fn sanitize_tool_message_pairing(messages: Vec<Value>) -> Vec<Value> {
        .collect()
 }
-fn flatten_tool_result_content(content: &[ToolResultContentBlock]) -> String {
-    content
+/// Flattens tool result content blocks into a single string.
+/// Optimized to pre-allocate capacity and avoid intermediate `Vec` construction.
+#[must_use]
+pub fn flatten_tool_result_content(content: &[ToolResultContentBlock]) -> String {
+    // Pre-calculate total capacity needed to avoid reallocations
+    let total_len: usize = content
         .iter()
         .map(|block| match block {
-            ToolResultContentBlock::Text { text } => text.clone(),
-            ToolResultContentBlock::Json { value } => value.to_string(),
+            ToolResultContentBlock::Text { text } => text.len(),
+            ToolResultContentBlock::Json { value } => value.to_string().len(),
         })
-        .collect::<Vec<_>>()
-        .join("\n")
+        .sum();
+
    // Add capacity for newlines between blocks
    let capacity = total_len + content.len().saturating_sub(1);
    let mut result = String::with_capacity(capacity);
    for (i, block) in content.iter().enumerate() {
        if i > 0 {
            result.push('\n');
        }
        match block {
            ToolResultContentBlock::Text { text } => result.push_str(text),
            ToolResultContentBlock::Json { value } => {
                 // `value.to_string()` allocates a temporary String per JSON block;
                 // `write!(result, "{value}")` (with `std::fmt::Write` in scope) would avoid it.
                result.push_str(&value.to_string());
            }
        }
    }
    result
 }
 /// Recursively ensure every object-type node in a JSON Schema has
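The loop in `flatten_tool_result_content` still formats JSON blocks through `value.to_string()`. Below is a sketch of what a `write!`-based variant might look like, assuming the JSON value type implements `Display` (as `serde_json::Value` does). `Block` and `flatten_blocks` are hypothetical stand-ins for `ToolResultContentBlock` and the real function, kept generic so the example needs only the standard library.

```rust
use std::fmt::{Display, Write};

/// Hypothetical stand-in for `ToolResultContentBlock`; `T` plays the
/// role of the JSON value and only needs to implement `Display`.
enum Block<T> {
    Text(String),
    Json(T),
}

/// Joins blocks with '\n', formatting JSON-like values directly into
/// the output buffer instead of through a temporary `String`.
fn flatten_blocks<T: Display>(content: &[Block<T>]) -> String {
    let mut result = String::new();
    for (i, block) in content.iter().enumerate() {
        if i > 0 {
            result.push('\n');
        }
        match block {
            Block::Text(text) => result.push_str(text),
            Block::Json(value) => {
                // `fmt::Write` for `String` never returns an error itself,
                // so the result is deliberately ignored.
                let _ = write!(result, "{value}");
            }
        }
    }
    result
}
```

This variant drops the capacity pre-calculation, since measuring a JSON block's length would require formatting it once more; whether that trade is worthwhile is something a benchmark would have to decide.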
@@ -1186,6 +1293,7 @@ fn parse_sse_frame(
                request_id: None,
                body: payload.clone(),
                retryable: false,
                suggested_action: suggested_action_for_status(status),
            });
        }
    }
@@ -1243,6 +1351,8 @@ async fn expect_success(response: reqwest::Response) -> Result<reqwest::Response
    let parsed_error = serde_json::from_str::<ErrorEnvelope>(&body).ok();
    let retryable = is_retryable_status(status);
    let suggested_action = suggested_action_for_status(status);
    Err(ApiError::Api {
        status,
        error_type: parsed_error
@@ -1254,6 +1364,7 @@ async fn expect_success(response: reqwest::Response) -> Result<reqwest::Response
        request_id,
        body,
        retryable,
        suggested_action,
    })
 }
@@ -1261,6 +1372,20 @@ const fn is_retryable_status(status: reqwest::StatusCode) -> bool {
    matches!(status.as_u16(), 408 | 409 | 429 | 500 | 502 | 503 | 504)
 }
 /// Generate a suggested user action based on the HTTP status code and error context.
 /// This provides actionable guidance when API requests fail.
 fn suggested_action_for_status(status: reqwest::StatusCode) -> Option<String> {
    match status.as_u16() {
        401 => Some("Check API key is set correctly and has not expired".to_string()),
        403 => Some("Verify API key has required permissions for this operation".to_string()),
        413 => Some("Reduce prompt size or context window before retrying".to_string()),
        429 => Some("Wait a moment before retrying; consider reducing request rate".to_string()),
        500 => Some("Provider server error - retry after a brief wait".to_string()),
        502..=504 => Some("Provider gateway error - retry after a brief wait".to_string()),
        _ => None,
    }
 }
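A small sketch of how the two status helpers could be combined by a caller, using plain `u16` codes instead of `reqwest::StatusCode` so it stays std-only. The retryable set and hint strings are copied from the diff; `describe_failure` is a hypothetical helper, not part of the change.

```rust
/// Retryable set as listed in the diff.
const fn is_retryable_status(status: u16) -> bool {
    matches!(status, 408 | 409 | 429 | 500 | 502 | 503 | 504)
}

/// Hint strings as listed in the diff, returned as static slices here.
fn suggested_action_for_status(status: u16) -> Option<&'static str> {
    match status {
        401 => Some("Check API key is set correctly and has not expired"),
        403 => Some("Verify API key has required permissions for this operation"),
        413 => Some("Reduce prompt size or context window before retrying"),
        429 => Some("Wait a moment before retrying; consider reducing request rate"),
        500 => Some("Provider server error - retry after a brief wait"),
        502..=504 => Some("Provider gateway error - retry after a brief wait"),
        _ => None,
    }
}

/// Hypothetical helper: one line combining retryability and the hint.
fn describe_failure(status: u16) -> String {
    let retry = if is_retryable_status(status) {
        "retryable"
    } else {
        "not retryable"
    };
    match suggested_action_for_status(status) {
        Some(action) => format!("HTTP {status} ({retry}): {action}"),
        None => format!("HTTP {status} ({retry})"),
    }
}
```

The two tables are independent: 408 and 409 are retryable but carry no hint, while 401, 403, and 413 carry a hint but are not retryable.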
 fn normalize_finish_reason(value: &str) -> String {
    match value {
        "stop" => "end_turn",
@@ -1794,4 +1919,292 @@ mod tests {
            "gpt-4o must not emit max_completion_tokens"
        );
    }
    // ============================================================================
    // US-009: kimi model compatibility tests
    // ============================================================================
    #[test]
    fn model_rejects_is_error_field_detects_kimi_models() {
        // kimi models (various formats) should be detected
        assert!(super::model_rejects_is_error_field("kimi-k2.5"));
        assert!(super::model_rejects_is_error_field("kimi-k1.5"));
        assert!(super::model_rejects_is_error_field("kimi-moonshot"));
        assert!(super::model_rejects_is_error_field("KIMI-K2.5")); // case insensitive
        assert!(super::model_rejects_is_error_field("dashscope/kimi-k2.5")); // with prefix
        assert!(super::model_rejects_is_error_field("moonshot/kimi-k2.5")); // different prefix
        // Non-kimi models should NOT be detected
        assert!(!super::model_rejects_is_error_field("gpt-4o"));
        assert!(!super::model_rejects_is_error_field("gpt-4"));
        assert!(!super::model_rejects_is_error_field("claude-sonnet-4-6"));
        assert!(!super::model_rejects_is_error_field("grok-3"));
        assert!(!super::model_rejects_is_error_field("grok-3-mini"));
        assert!(!super::model_rejects_is_error_field("xai/grok-3"));
        assert!(!super::model_rejects_is_error_field("qwen/qwen-plus"));
        assert!(!super::model_rejects_is_error_field("o1-mini"));
    }
    #[test]
    fn translate_message_includes_is_error_for_non_kimi_models() {
        use crate::types::{InputContentBlock, InputMessage, ToolResultContentBlock};
        // Test with gpt-4o (should include is_error)
        let message = InputMessage {
            role: "user".to_string(),
            content: vec![InputContentBlock::ToolResult {
                tool_use_id: "call_1".to_string(),
                content: vec![ToolResultContentBlock::Text {
                    text: "Error occurred".to_string(),
                }],
                is_error: true,
            }],
        };
        let translated = super::translate_message(&message, "gpt-4o");
        assert_eq!(translated.len(), 1);
        let tool_msg = &translated[0];
        assert_eq!(tool_msg["role"], json!("tool"));
        assert_eq!(tool_msg["tool_call_id"], json!("call_1"));
        assert_eq!(tool_msg["content"], json!("Error occurred"));
        assert!(
            tool_msg.get("is_error").is_some(),
            "gpt-4o should include is_error field"
        );
        assert_eq!(tool_msg["is_error"], json!(true));
        // Test with grok-3 (should include is_error)
        let message2 = InputMessage {
            role: "user".to_string(),
            content: vec![InputContentBlock::ToolResult {
                tool_use_id: "call_2".to_string(),
                content: vec![ToolResultContentBlock::Text {
                    text: "Success".to_string(),
                }],
                is_error: false,
            }],
        };
        let translated2 = super::translate_message(&message2, "grok-3");
        assert!(
            translated2[0].get("is_error").is_some(),
            "grok-3 should include is_error field"
        );
        assert_eq!(translated2[0]["is_error"], json!(false));
        // Test with claude model (should include is_error)
        let translated3 = super::translate_message(&message, "claude-sonnet-4-6");
        assert!(
            translated3[0].get("is_error").is_some(),
            "claude should include is_error field"
        );
    }
    #[test]
    fn translate_message_excludes_is_error_for_kimi_models() {
        use crate::types::{InputContentBlock, InputMessage, ToolResultContentBlock};
        // Test with kimi-k2.5 (should EXCLUDE is_error)
        let message = InputMessage {
            role: "user".to_string(),
            content: vec![InputContentBlock::ToolResult {
                tool_use_id: "call_1".to_string(),
                content: vec![ToolResultContentBlock::Text {
                    text: "Error occurred".to_string(),
                }],
                is_error: true,
            }],
        };
        let translated = super::translate_message(&message, "kimi-k2.5");
        assert_eq!(translated.len(), 1);
        let tool_msg = &translated[0];
        assert_eq!(tool_msg["role"], json!("tool"));
        assert_eq!(tool_msg["tool_call_id"], json!("call_1"));
        assert_eq!(tool_msg["content"], json!("Error occurred"));
        assert!(
            tool_msg.get("is_error").is_none(),
            "kimi-k2.5 must NOT include is_error field (would cause 400 Bad Request)"
        );
        // Test with kimi-k1.5
        let translated2 = super::translate_message(&message, "kimi-k1.5");
        assert!(
            translated2[0].get("is_error").is_none(),
            "kimi-k1.5 must NOT include is_error field"
        );
        // Test with dashscope/kimi-k2.5 (with provider prefix)
        let translated3 = super::translate_message(&message, "dashscope/kimi-k2.5");
        assert!(
            translated3[0].get("is_error").is_none(),
            "dashscope/kimi-k2.5 must NOT include is_error field"
        );
    }
    #[test]
    fn build_chat_completion_request_kimi_vs_non_kimi_tool_results() {
        use crate::types::{InputContentBlock, InputMessage, ToolResultContentBlock};
        // Helper to create a request with a tool result
        let make_request = |model: &str| MessageRequest {
            model: model.to_string(),
            max_tokens: 100,
            messages: vec![
                InputMessage {
                    role: "assistant".to_string(),
                    content: vec![InputContentBlock::ToolUse {
                        id: "call_1".to_string(),
                        name: "read_file".to_string(),
                        input: serde_json::json!({"path": "/tmp/test"}),
                    }],
                },
                InputMessage {
                    role: "user".to_string(),
                    content: vec![InputContentBlock::ToolResult {
                        tool_use_id: "call_1".to_string(),
                        content: vec![ToolResultContentBlock::Text {
                            text: "file contents".to_string(),
                        }],
                        is_error: false,
                    }],
                },
            ],
            stream: false,
            ..Default::default()
        };
        // Non-kimi model: should have is_error field
        let request_gpt = make_request("gpt-4o");
        let payload_gpt = build_chat_completion_request(&request_gpt, OpenAiCompatConfig::openai());
        let messages_gpt = payload_gpt["messages"].as_array().unwrap();
        let tool_msg_gpt = messages_gpt.iter().find(|m| m["role"] == "tool").unwrap();
        assert!(
            tool_msg_gpt.get("is_error").is_some(),
            "gpt-4o request should include is_error in tool result"
        );
        // kimi model: should NOT have is_error field
        let request_kimi = make_request("kimi-k2.5");
        let payload_kimi =
            build_chat_completion_request(&request_kimi, OpenAiCompatConfig::dashscope());
        let messages_kimi = payload_kimi["messages"].as_array().unwrap();
        let tool_msg_kimi = messages_kimi.iter().find(|m| m["role"] == "tool").unwrap();
        assert!(
            tool_msg_kimi.get("is_error").is_none(),
            "kimi-k2.5 request must NOT include is_error in tool result (would cause 400)"
        );
        // Verify both have the essential fields
        assert_eq!(tool_msg_gpt["tool_call_id"], json!("call_1"));
        assert_eq!(tool_msg_kimi["tool_call_id"], json!("call_1"));
        assert_eq!(tool_msg_gpt["content"], json!("file contents"));
        assert_eq!(tool_msg_kimi["content"], json!("file contents"));
    }
    // ============================================================================
    // US-021: Request body size pre-flight check tests
    // ============================================================================
    #[test]
    fn estimate_request_body_size_returns_reasonable_estimate() {
        let request = MessageRequest {
            model: "gpt-4o".to_string(),
            max_tokens: 100,
            messages: vec![InputMessage::user_text("Hello world".to_string())],
            stream: false,
            ..Default::default()
        };
        let size = super::estimate_request_body_size(&request, OpenAiCompatConfig::openai());
        // Should be non-zero and reasonable for a small request
        assert!(size > 0, "estimated size should be positive");
        assert!(size < 10_000, "small request should be under 10KB");
    }
    #[test]
    fn check_request_body_size_passes_for_small_requests() {
        let request = MessageRequest {
            model: "gpt-4o".to_string(),
            max_tokens: 100,
            messages: vec![InputMessage::user_text("Hello".to_string())],
            stream: false,
            ..Default::default()
        };
        // Should pass for all providers with a small request
        assert!(super::check_request_body_size(&request, OpenAiCompatConfig::openai()).is_ok());
        assert!(super::check_request_body_size(&request, OpenAiCompatConfig::xai()).is_ok());
        assert!(super::check_request_body_size(&request, OpenAiCompatConfig::dashscope()).is_ok());
    }
    #[test]
    fn check_request_body_size_fails_for_dashscope_when_exceeds_6mb() {
        // Create a request that exceeds DashScope's 6MB limit
        let large_content = "x".repeat(7_000_000); // 7MB of content
        let request = MessageRequest {
            model: "qwen-plus".to_string(),
            max_tokens: 100,
            messages: vec![InputMessage::user_text(large_content)],
            stream: false,
            ..Default::default()
        };
        let result = super::check_request_body_size(&request, OpenAiCompatConfig::dashscope());
        assert!(result.is_err(), "should fail for 7MB request to DashScope");
        let err = result.unwrap_err();
        match err {
            crate::error::ApiError::RequestBodySizeExceeded {
                estimated_bytes,
                max_bytes,
                provider,
            } => {
                assert_eq!(provider, "DashScope");
                assert_eq!(max_bytes, 6_291_456); // 6MB limit
                assert!(estimated_bytes > max_bytes);
            }
            _ => panic!("expected RequestBodySizeExceeded error, got {err:?}"),
        }
    }
    #[test]
    fn check_request_body_size_allows_large_requests_for_openai() {
        // Create a request that exceeds DashScope's limit but is under OpenAI's 100MB limit
        let large_content = "x".repeat(10_000_000); // 10MB of content
        let request = MessageRequest {
            model: "gpt-4o".to_string(),
            max_tokens: 100,
            messages: vec![InputMessage::user_text(large_content)],
            stream: false,
            ..Default::default()
        };
        // Should pass for OpenAI (100MB limit)
        assert!(
            super::check_request_body_size(&request, OpenAiCompatConfig::openai()).is_ok(),
            "10MB request should pass for OpenAI's 100MB limit"
        );
        // Should fail for DashScope (6MB limit)
        assert!(
            super::check_request_body_size(&request, OpenAiCompatConfig::dashscope()).is_err(),
            "10MB request should fail for DashScope's 6MB limit"
        );
    }
    #[test]
    fn provider_specific_size_limits_are_correct() {
        assert_eq!(OpenAiCompatConfig::dashscope().max_request_body_bytes, 6_291_456); // 6MB
        assert_eq!(OpenAiCompatConfig::openai().max_request_body_bytes, 104_857_600); // 100MB
        assert_eq!(OpenAiCompatConfig::xai().max_request_body_bytes, 52_428_800); // 50MB
    }
    #[test]
    fn strip_routing_prefix_strips_kimi_provider_prefix() {
        // US-023: kimi prefix should be stripped for wire format
        assert_eq!(super::strip_routing_prefix("kimi/kimi-k2.5"), "kimi-k2.5");
        assert_eq!(super::strip_routing_prefix("kimi-k2.5"), "kimi-k2.5"); // no prefix, unchanged
        assert_eq!(super::strip_routing_prefix("kimi/kimi-k1.5"), "kimi-k1.5");
    }
 }
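The pre-flight check exercised by the US-021 tests reduces to comparing a serialized length against a per-provider ceiling. A std-only sketch under that reading follows; it takes already-serialized bytes rather than a `MessageRequest`, and `RequestBodySizeExceeded` and `check_body_size` are hypothetical stand-ins for `ApiError::RequestBodySizeExceeded` and `check_request_body_size`. The limits are the values asserted in `provider_specific_size_limits_are_correct`.

```rust
/// Hypothetical stand-in for `ApiError::RequestBodySizeExceeded`.
#[derive(Debug, PartialEq, Eq)]
struct RequestBodySizeExceeded {
    estimated_bytes: usize,
    max_bytes: usize,
    provider: &'static str,
}

/// Limits as asserted in the tests above.
const DASHSCOPE_MAX_BYTES: usize = 6_291_456; // 6 MiB
const OPENAI_MAX_BYTES: usize = 104_857_600; // 100 MiB

/// Rejects a body only when it is strictly larger than the limit,
/// matching the `estimated_bytes > max_bytes` comparison in the diff.
fn check_body_size(
    body: &[u8],
    provider: &'static str,
    max_bytes: usize,
) -> Result<(), RequestBodySizeExceeded> {
    let estimated_bytes = body.len();
    if estimated_bytes > max_bytes {
        Err(RequestBodySizeExceeded {
            estimated_bytes,
            max_bytes,
            provider,
        })
    } else {
        Ok(())
    }
}
```

A body of exactly `max_bytes` passes, so the limit is inclusive. Carrying both sizes and the provider name in the error lets the caller report how far over the limit the request was without re-serializing it.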
--- a/rust/crates/rusty-claude-cli/src/main.rs
+++ b/rust/crates/rusty-claude-cli/src/main.rs
@@ -6018,6 +6018,93 @@ fn summarize_tool_payload_for_markdown(payload: &str) -> String {
    truncate_for_summary(&compact, SESSION_MARKDOWN_TOOL_SUMMARY_LIMIT)
 }
 /// Structured export error envelope (#130).
 /// Conforms to Phase 2 §4.44 typed-error envelope contract.
 /// Includes kind/operation/target/errno/hint/retryable for actionable diagnostics.
 #[derive(Debug, serde::Serialize)]
 struct ExportError {
    kind: String,
    operation: String,
    target: String,
    #[serde(skip_serializing_if = "Option::is_none")]
    errno: Option<String>,
    hint: String,
    retryable: bool,
 }
 impl std::fmt::Display for ExportError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(
            f,
            "export failed: {} ({})\n  target: {}\n  errno: {}\n  hint: {}",
            self.kind,
            self.operation,
            self.target,
            self.errno.as_deref().unwrap_or("unknown"),
            self.hint
        )
    }
 }
 impl std::error::Error for ExportError {}
 /// Wrap std::io::Error into a structured ExportError per §4.44.
 fn wrap_export_io_error(path: &Path, op: &str, e: std::io::Error) -> ExportError {
    use std::io::ErrorKind;
    let target_display = path.display().to_string();
    let parent = path
        .parent()
        .filter(|p| !p.as_os_str().is_empty())
        .map(|p| p.display().to_string());
    let (kind, hint) = match e.kind() {
        ErrorKind::NotFound => (
            "filesystem",
            parent
                .as_ref()
                .map(|p| format!("intermediate directory does not exist; try `mkdir -p {p}` first"))
                .unwrap_or_else(|| {
                    "path is empty or invalid; provide a non-empty file path".to_string()
                }),
        ),
        ErrorKind::PermissionDenied => (
            "permission",
            format!(
                "permission denied; check file permissions with `ls -la {}`",
                parent.as_deref().unwrap_or(".")
            ),
        ),
        ErrorKind::IsADirectory => (
            "filesystem",
            format!(
                "path `{}` is a directory, not a file; use a file path like `{}/session.md`",
                target_display, target_display
            ),
        ),
        ErrorKind::AlreadyExists => (
            "filesystem",
            format!("path `{target_display}` already exists; remove it or pick a different name"),
        ),
        ErrorKind::InvalidInput | ErrorKind::InvalidData => (
            "invalid_path",
            format!("path `{target_display}` is invalid; check for empty or malformed input"),
        ),
        _ => (
            "filesystem",
            format!(
                "unexpected error writing to `{target_display}`; check disk space and path validity"
            ),
        ),
    };
    ExportError {
        kind: kind.to_string(),
        operation: op.to_string(),
        target: target_display,
        errno: Some(format!("{:?}", e.kind())),
        hint,
        retryable: matches!(e.kind(), ErrorKind::TimedOut | ErrorKind::Interrupted),
    }
 }
 fn run_export(
    session_reference: &str,
    output_path: Option<&Path>,
@@ -6027,7 +6114,9 @@ fn run_export(
    let markdown = render_session_markdown(&session, &handle.id, &handle.path);
    if let Some(path) = output_path {
-        fs::write(path, &markdown)?;
+        fs::write(path, &markdown).map_err(|e| {
            Box::new(wrap_export_io_error(path, "write", e)) as Box<dyn std::error::Error>
        })?;
        let report = format!(
            "Export\n  Result           wrote markdown transcript\n  File             {}\n  Session          {}\n  Messages         {}",
            path.display(),