Compare commits

..

17 Commits

Author SHA1 Message Date
YeonGyu-Kim
0121f20a09 roadmap: #222 filed — Models list endpoint typed taxonomy is structurally absent: zero GET /v1/models and zero GET /v1/models/{id} surface across rust/crates/api/src/providers/anthropic.rs and rust/crates/api/src/providers/openai_compat.rs (rg returns zero hits for /v1/models, list_models, fetch_models, get_models, available_models, model_catalog, ModelInfo, ModelList, ListModelsResponse, OwnedBy, ModelObject, ModelCatalog across rust/), zero Model / ModelInfo / ModelList / ListModelsResponse typed taxonomy in rust/crates/api/src/types.rs, zero list_models<'a>(&'a self) -> ProviderFuture<'a, ModelList> and zero retrieve_model<'a>(&'a self, model_id: &'a str) -> ProviderFuture<'a, ModelInfo> methods on the Provider trait at rust/crates/api/src/providers/mod.rs:17-30 (only send_message and stream_message exist, both per-request), zero list_models dispatch on the ProviderClient enum at rust/crates/api/src/client.rs:8-14 (three variants Anthropic/Xai/OpenAi, all closed under per-request synchronous dispatch), zero claw models / claw model list / claw list-models CLI subcommand surface at rust/crates/rusty-claude-cli/src/main.rs, zero /models slash command in the SlashCommandSpec table at rust/crates/commands/src/lib.rs, zero validation against an authoritative source on set_model at rust/crates/rusty-claude-cli/src/main.rs:4989-5037 (user can type /model claude-banana-9000 and the runtime accepts it, swaps the active model to that string, and only fails at request time when the upstream provider returns 404 / invalid_model_error), and the existing /providers slash command at rust/crates/commands/src/lib.rs:716-720 is just a literal alias for /doctor at rust/crates/commands/src/lib.rs:1386-1389 despite advertising summary: "List available model providers" (advertised-but-rerouted shape — actively misleading at the UX layer, distinct from #220's advertised-but-unbuilt shape because the parse arm dispatches to a *different* command entirely instead of returning a clear unsupported error) — the canonical model-discovery affordance is invisible across every CLI / REPL / slash-command / Provider-trait / ProviderClient-enum / data-model surface, leaving claw-code's local hardcoded 13-entry MODEL_REGISTRY (3 anthropic + 5 grok + 1 kimi + 4 prefix routes for openai/gpt/qwen/kimi at rust/crates/api/src/providers/mod.rs:52-134 and 166-225) and its 6-entry model_token_limit match arm (rust/crates/api/src/providers/mod.rs:277-301 covering claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5-20251213, grok-3, grok-3-mini, kimi-k2.5, kimi-k1.5 — returns None for current production IDs claude-opus-4-7, claude-haiku-4-6, gpt-5.2, o3, o4-mini, kimi-k3, qwen3-max, grok-4, deepseek-reasoner) as the only model-name knowledge the runtime has access to, with no way to refresh it, no way to discover new model IDs that providers publish, no way to validate user-supplied model strings, no way to cross-link to the pricing_for_model cost estimator (#209 substring-matching gap), no way to cross-link to the model_token_limit preflight check (#210 max_tokens shadow-fork gap silently no-ops on unknown models), no way to cross-link to the future is_batch_request flag (#221 batch-dispatch gap requires knowing which models support batch), and USAGE.md:426-440 documents only six model rows out of nine MODEL_REGISTRY entries (kimi alias missing from the documented table, four prefix routes mentioned only in passing prose, zero documentation of /v1/models endpoint usage / zero documentation of model-catalog discovery / zero documentation of "what to do when your provider ships a new model that isn't in claw-code's hardcoded registry") — the canonical model-discovery affordance is **the most universally-available endpoint in the LLM API ecosystem** (older than /v1/chat/completions itself, older than /v1/embeddings, older than /v1/messages, the literal first endpoint after auth on every OpenAI-compat provider since 2020 and on Anthropic since 2024-12-04, GA-shipped first-class typed surfaces in every Python/TypeScript SDK in the ecosystem) and claw-code is the **sole client/agent/CLI in the surveyed coding-agent ecosystem with zero /v1/models integration AND a misleading /providers slash command that aliases to /doctor** — both gaps are unique to claw-code in the surveyed ecosystem (Jobdori cycle #374 / extends #168c emission-routing audit / explicit follow-on candidate from #221's seven-layer-endpoint-family-absence shape — the third of three named candidates: Files API typed taxonomy / Embeddings API typed taxonomy / Models list endpoint typed taxonomy, and the most clawability-impacting because it's the upstream root cause of three downstream gaps already catalogued in this audit / sibling-shape cluster grows to twenty-one: #201/#202/#203/#206/#207/#208/#209/#210/#211/#212/#213/#214/#215/#216/#217/#218/#219/#220/#221/#222 / wire-format-parity cluster grows to twelve: #211+#212+#213+#214+#215+#216+#217+#218+#219+#220+#221+#222 / capability-parity cluster grows to four: #218+#220+#221+#222 / discovery-and-validation cluster: #222 alone but it's the upstream root cause of #209's pricing-fallback gap, #210's max_tokens shadow-fork gap, and #221's batch-dispatch gap / eight-layer-endpoint-family-absence-with-misleading-alias shape (endpoint-URL + data-model-taxonomy + Provider-trait-method + ProviderClient-enum-dispatch + CLI-subcommand-surface + slash-command-surface-with-misleading-alias + set_model-validation + downstream-consumers-with-stale-data) is the largest single advertised-vs-actual gap catalogued, distinct from prior single-field (#211/#212/#214) / response-only (#213/#207) / header-only (#215) / three-dimensional (#216) / classifier-leakage (#217) / four-layer (#218) / false-positive-opt-in (#219) / five-layer-feature-absence (#220) / seven-layer-endpoint-family-absence (#221) members; the advertised-but-rerouted shape is novel — strict-superset of #220's advertised-but-unbuilt because the parse arm dispatches to a *different* command instead of returning a clear unsupported error, applies to any future SlashCommandSpec entry where the summary field describes a feature different from what the parse arm dispatches to / external validation: Anthropic Models API reference at https://docs.anthropic.com/en/api/models-list documenting GET /v1/models GA 2024-12-04 with paginated before_id / after_id / limit and ModelInfo { id, type: "model", display_name, created_at } shape, Anthropic retrieve reference at https://docs.anthropic.com/en/api/models documenting GET /v1/models/{model_id} for single-model lookup, OpenAI Models API at https://platform.openai.com/docs/api-reference/models documenting the literal first endpoint after auth with Model { id, object: "model", created, owned_by } and ModelList { object: "list", data: Vec<Model> }, OpenAI Python SDK client.models.list() and client.models.retrieve(model_id) first-class typed surface, Anthropic Python SDK client.models.list() parallel surface GA-shipped 2024-12-04 alongside the API endpoint, Anthropic TypeScript SDK client.models.list(), AWS Bedrock ListFoundationModels API documenting Bedrock-anthropic-relay equivalent with FoundationModelSummary provider+model+modalities+active flag, Azure OpenAI Models reference with deployment-aware catalog, Vertex AI projects.locations.models.list for Vertex-published Anthropic/Gemini/3rd-party models, DeepSeek/Moonshot/Alibaba-DashScope/xAI parallel /v1/models OpenAI-compat shape, OpenRouter Models API at https://openrouter.ai/api/v1/models — the canonical "live model catalog with pricing" reference and the model that anomalyco/opencode-via-models.dev uses for pricing-data freshness, simonw/llm llm models and llm models default <model> first-class CLI subcommand backed by per-plugin model registration with models.dev-equivalent freshness, simonw/llm plugin-registration architecture for ad-hoc model addition, Vercel AI SDK 6 provider.languageModels() and provider.embeddingModels() first-class typed catalog APIs, LangChain init_chat_model(model_provider, model_name) reflective discovery via provider-defined catalogs and BaseChatModel.aget_models async catalog query, models.dev (https://models.dev) — community-maintained authoritative model catalog with pricing + capability flags + provider routing, used by anomalyco/opencode for pricing-data freshness with explicit fallback metadata when a model id isn't in the catalog (the canonical "external authoritative source for model metadata" reference), anomalyco/opencode models.dev integration with periodic refresh and explicit { provider: unknown, reason: not_in_pricing_table } fallback metadata, charmbracelet/crush typed catalog with provider+model+input/output-pricing, continue.dev config-file-driven catalog with auto-refresh from provider endpoints, zed-industries/zed bundled JSON catalog with periodic upstream refresh, TabbyML/tabby model catalog via plugin registration, llama.cpp server /v1/models local-model catalog via OpenAI-compat shape, LM Studio /v1/models local-model catalog, Ollama /api/tags and /v1/models local-model catalog with both Ollama-native and OpenAI-compat shapes, llamafile bundled-model catalog, LiteLLM models reference covering 100+ models at proxy level, portkey.ai gateway-level catalog, helicone.ai observability-platform model catalog with per-model usage stats, prompthub.us model-catalog-as-service, OpenTelemetry GenAI semconv gen_ai.request.model and gen_ai.response.model documented as required attributes for spans (every observability backend treats model as a first-class structured signal requiring authoritative-source validation), OpenAPI 3.1 spec for /v1/models at https://github.com/openai/openai-openapi as canonical machine-readable schema, Anthropic API stability versioning at https://docs.anthropic.com/en/api/versioning with anthropic-version header semver-stable since 2023-06-01 and models endpoint stable since 2024-12-04. Thirty-two ecosystem references, three first-class models-endpoint specs (Anthropic, OpenAI, OpenRouter), GA timeline of 16 months on Anthropic's side and 6+ years on OpenAI's side, eight first-class CLI/SDK implementations (Anthropic Python+TypeScript, OpenAI Python, simonw/llm, Vercel AI SDK, LangChain, Zed, charmbracelet/crush), seven first-class local-model catalogs (Ollama, LM Studio, llama.cpp server, llamafile, Tabby, Continue.dev, LiteLLM proxy), one community-maintained authoritative pricing source (models.dev) used by the closest peer coding agent. claw-code is the **sole client/agent/CLI in the surveyed coding-agent ecosystem with zero /v1/models integration AND a misleading /providers slash command that aliases to /doctor** — both gaps are unique to claw-code in the surveyed ecosystem, the model-discovery gap is the **upstream root cause** of three downstream cost-and-correctness gaps already catalogued in this audit (#209 / #210 / #221), and the misleading-alias-shape is novel within the cluster — #222 closes the upstream root cause of three downstream gaps and unblocks live-catalog-driven cost-estimation, max-tokens-validation, batch-capability-detection, and CLI-vs-slash-command-symmetry that the runtime's clawability doctrine treats as canonical baseline expectations. 2026-04-26 02:15:43 +09:00
YeonGyu-Kim
9acd4f14da roadmap: #221 filed — Message Batches API is structurally absent: zero /v1/messages/batches endpoint, zero /v1/batches endpoint, zero MessageBatch / BatchedRequest / BatchedResult / BatchProcessingStatus / BatchRequestCounts typed taxonomy across rust/crates/api/src/types.rs (zero hits for batches, MessageBatch, BatchedRequest, custom_id, processing_status), zero submit_batch / retrieve_batch / retrieve_batch_results / cancel_batch / list_batches methods on the Provider trait at rust/crates/api/src/providers/mod.rs:17-30 (only send_message and stream_message exist, both per-request synchronous), zero batch dispatch on ProviderClient enum at rust/crates/api/src/client.rs:8-14 (three variants Anthropic/Xai/OpenAi all closed under sync send_message + stream_message), zero BatchSubmittedEvent / BatchInProgressEvent / BatchEndedEvent typed events on the runtime telemetry sink, zero claw batch / claw batches CLI subcommand surface at rust/crates/rusty-claude-cli/src/main.rs, zero /batch slash command in SlashCommandSpec table at rust/crates/commands/src/lib.rs, zero pending_batches field in claw status --json output, zero is_batch_request flag on pricing_for_model cost estimator (so even if Batch API were wired, cost would over-charge by 2x), zero batch_input_tokens_per_million_usd / batch_output_tokens_per_million_usd fields in the Pricing struct — the API has been GA on Anthropic since 2024-10-08 (18 months ago at filing time, with explicit 'anthropic-beta: message-batches-2024-09-24' opt-in header documented) and on OpenAI since 2024-04-15 (24 months ago at filing time), uniformly offers 50% input-and-output token discount, accepts up to 100,000 requests per batch with 256MB total payload (Anthropic) or unlimited via Files API (OpenAI), 24-hour completion SLO; combining with #219's also-missing prompt-caching opt-in (90% input savings) gives a compounded ~95% input-cost asymmetry on bulk ingest scenarios — the single largest cost-reduction lever in the entire API parity audit, missing at the endpoint-family level rather than the per-field level (Jobdori cycle #373 / extends #168c emission-routing audit / sibling-shape cluster grows to twenty: #201/#202/#203/#206/#207/#208/#209/#210/#211/#212/#213/#214/#215/#216/#217/#218/#219/#220/#221 / wire-format-parity cluster grows to eleven: #211+#212+#213+#214+#215+#216+#217+#218+#219+#220+#221 / capability-parity cluster grows to three: #218+#220+#221 / cost-parity cluster grows to eight: #204+#207+#209+#210+#213+#216+#219+#221 — #221 compounds with #219 to ~95% bulk-ingest cost asymmetry, the largest cost gap in the cluster / seven-layer-endpoint-family-absence shape (endpoint-URL + data-model-taxonomy + Provider-trait-method + ProviderClient-enum-dispatch + Worker-registry-status-enum + CLI-subcommand-surface + pricing-tier-flag) is the largest single capability absence catalogued, exceeding #220's five-layer-feature-absence / endpoint-family-level absence shape is novel — applies to follow-on candidates 'Files API typed taxonomy is absent' (the OpenAI batch path's prerequisite endpoint, also absent), 'Embeddings API typed taxonomy is absent' (/v1/embeddings cross-cutting), 'Models list endpoint typed taxonomy is absent' (/v1/models / Anthropic Models API) / external validation: Anthropic Message Batches API reference at https://docs.anthropic.com/en/api/messages-batches documenting five operations on /v1/messages/batches + GA 2024-10-08 + 50% discount + 100k-requests-per-batch + 256MB-total-payload + 24-hour-SLO + custom_id correlation field, Anthropic launch announcement at anthropic.com/news/message-batches-api documenting '50% off both input and output tokens' positioning, Anthropic Pricing page documenting Batch API column with 50% across Sonnet 3.5/4/4.5/4.6 + Opus 3/4/4.6 + Haiku 3.5, Anthropic Python SDK client.messages.batches.create(requests=[...]) first-class typed surface, Anthropic TypeScript SDK parallel surface, AWS Bedrock InvokeModelBatch / batch-inference docs (Bedrock-anthropic-relay path), OpenAI Batch API reference at platform.openai.com/docs/api-reference/batch documenting GA 2024-04-15 + 50% discount + JSONL-via-Files-API + completion_window:'24h', OpenAI launch announcement at openai.com/index/openai-introduces-batch-api documenting 'process batches asynchronously and receive results within 24 hours at a 50% discount', DeepSeek/Moonshot/Alibaba-DashScope/xAI batch-inference parallel surfaces, OpenRouter batch passthrough, simonw/llm --batch flag, Vercel AI SDK generateBatch + provider-specific batch passthrough, LangChain Runnable.batch() + Runnable.abatch() first-class Python+TypeScript parity, LangSmith batch-aware tracing, llmindset.co.uk independent cost-calculus validation, Medium 'process 10,000 queries without breaking the bank' tutorial, Steve Kinney's Anthropic-Batch-with-Temporal workflow-orchestration article, ai.moda Anthropic-Batch+Caching 95%-compounded-savings analysis (proves #219+#221 together close the largest cost gap), VentureBeat industry-press coverage, Reddit r/ClaudeAI launch thread, zed-industries/zed#19945 (peer ecosystem with same gap), RooCodeInc/Roo-Code#8667 (peer ecosystem with same gap), n8n Anthropic-batch-processing workflow, startground.com batch-deals tracker, silicondata.com 2026-pricing per-model batch breakdown, Hacker News batch-mechanics discussions, OpenTelemetry GenAI semconv gen_ai.request.batch_id + gen_ai.batch.processing_status + gen_ai.batch.request_counts documented attributes, IANA application/x-ndjson + application/jsonl MIME-type registrations / claw-code is the sole client/agent/CLI in the surveyed coding-agent ecosystem with zero batch-dispatch capability despite the API being GA on both major providers for 18+ months — parity floor against every other CLI/SDK/coding-agent in 2024-2025, the largest single cost-reduction lever in the entire emission-routing audit, and the largest endpoint-family-level capability gap catalogued so far) 2026-04-26 01:45:20 +09:00
YeonGyu-Kim
d46c423c1d roadmap: #220 filed — Image/vision input is structurally impossible across the entire data model: zero image content-block taxonomy variant on InputContentBlock (types.rs:80-94 has only Text/ToolUse/ToolResult — three of three exhaustive variants, zero Image, zero Document, zero MediaType, zero ImageSource, zero base64/file_id slot, zero media_type field anywhere in rust/crates/api/src/), zero parse arm for /image <path> and /screenshot slash commands despite their advertised summaries ("Add an image file to the conversation" at commands/lib.rs:585, "Take a screenshot and add to conversation" at commands/lib.rs:578) being in the canonical SlashCommandSpec table since project inception, both gated under STUB_COMMANDS at main.rs:8381-8382 (UX patch over missing-feature, not missing-feature fix), ResolvedAttachment at tools/lib.rs:2660-2666 carries path/size/is_image triple but no bytes / no base64 / no media_type / no upload affordance / no transport-ready payload despite is_image_path at line 5276 correctly classifying png/jpg/jpeg/gif/webp/bmp/svg extensions and the SendUserMessage/Brief tool surfacing isImage: true in JSON envelope (asserted at line 8969); build_chat_completion_request (openai_compat.rs:845) and translate_message (openai_compat.rs:946) have three-arm exhaustive matches over Text/ToolUse/ToolResult with no Image arm and no {type: "image", source: {type: "base64", media_type, data}} Anthropic-canonical wire shape and no {type: "image_url", image_url: {url: "data:image/...;base64,..."}} OpenAI-compat wire shape; the markdown renderer at render.rs:379-426 handles Tag::Image and TagEnd::Image for *output* rendering (asymmetric capability — model emits image markdown → rendered as colored [image:url] link, user attaches image → silent black hole at API boundary); the runtime's own worker_boot test fixture at worker_boot.rs:1324+:1349 literally hard-codes "Explain this KakaoTalk screenshot for a friend" as the canonical task-classification example for worker prompt-mismatch recovery — claw-code uses screenshot analysis as a runtime-classifier signal while having zero capability to actually send a screenshot to the model; TUI-ENHANCEMENT-PLAN.md:57 backlogs the gap as "No image/attachment preview" but the gap is far worse than no preview — there is no transport, no codec, no envelope, no anything from the byte stream to the wire (Jobdori cycle #372 / extends #168c emission-routing audit / sibling-shape cluster grows to nineteen: #201/#202/#203/#206/#207/#208/#209/#210/#211/#212/#213/#214/#215/#216/#217/#218/#219/#220 / wire-format-parity cluster grows to ten: #211+#212+#213+#214+#215+#216+#217+#218+#219+#220 / capability-parity cluster (strict-superset including user-facing surfacing): #218+#220 / five-layer-structural-absence shape (data-model-variant + slash-command-parse-arm + attachment-metadata-threading + request-builder-translation + OS-integration-helper) is the largest single feature absence yet catalogued, exceeding #218's four-layer; advertised-but-unbuilt shape is novel — UX-layer cousin of #219's false-positive-opt-in shape — applicable to other STUB_COMMAND entries with capability-claim summaries / claw-code is the sole client/agent/CLI in the surveyed coding-agent ecosystem with zero image-input capability despite Anthropic Vision GA on 2024-03-04 (25 months ago at filing time, default-on for all Claude 3.5+ models with 5MB-per-image / 32MB-per-request / 100-images-per-request limits) and OpenAI Vision GA on 2024-05-13 (23 months ago) and Google Gemini multimodal GA on 2024-02-15 (26 months ago), making this a regression against the upstream claude-code CLI claw-code is porting from / external validation: Anthropic Vision API reference at platform.claude.com/docs/en/build-with-claude/vision documenting the canonical {type, source: {type, media_type, data}} content block, Anthropic Messages API reference, Anthropic Files API beta with file_id reference for repeated-image-use efficiency, AWS Bedrock prompt-caching docs with image-block coverage and 20-images-per-request stricter limit and same cachePoint:{} pattern from #219, OpenAI Vision API reference documenting the {type:image_url, image_url:{url}} data-URL shape used by GPT-4o/4o-mini/5-vision/o1-vision/o3-vision/DeepSeek-VL2/Qwen-VL/QwQ-VL/MiniMax-VL/Moonshot kimi-VL, Google Gemini multimodal API documenting {inline_data:{mime_type, data}} shape, anomalyco/opencode#16184 (look_at tool image-file-from-disk handling bug), anomalyco/opencode#15728 (Read tool image-handling bug), anomalyco/opencode#8875 (custom-provider attachment-allowlist gap), anomalyco/opencode#17205 (text-only-model token-burn on image attachment) — all four are integration-quality gaps in opencode while claw-code is missing the capability entirely (~85% vs 0% parity asymmetry, the largest in the cluster), charmbracelet/crush vision-input via terminal paste, simonw/llm --attachment flag, Vercel AI SDK experimental_attachments + image content blocks, LangChain HumanMessage content blocks, LangGraph image-message routing, OpenAI Python and Anthropic Python SDK first-class image-typed messages, anthropic-quickstarts vision examples, claude-code official CLI paste-image and screenshot shortcuts (the upstream this is a regression against), OpenTelemetry GenAI semconv gen_ai.input.attachments and gen_ai.input.images.count multimodal observability attributes, IANA MIME-type registry RFC 4288/4289) 2026-04-26 01:18:43 +09:00
YeonGyu-Kim
2858aeccff roadmap: #219 filed — Anthropic prompt-caching opt-in is structurally impossible: cache_control marker has zero codebase footprint (rg returns 0 hits across rust/ src/ docs/ tests/) despite the wire-side beta header 'prompt-caching-scope-2026-01-05' being unconditionally enabled at every Anthropic request (telemetry/lib.rs:16,452,469 + anthropic.rs:1443); five cacheable surfaces are uniformly locked: pub system: Option<String> at types.rs:11 is a flat string with no array form so no system-block cache_control slot exists; InputContentBlock variants Text/ToolUse/ToolResult at types.rs:80-99 have no cache_control field; ToolResultContentBlock variants Text/Json at types.rs:100-103 have no cache_control field; ToolDefinition at types.rs:105-110 has no cache_control field; openai_compat path translate_message at openai_compat.rs:946 and build_chat_completion_request at openai_compat.rs:850 emit flat-string system+content with no cache_control or Bedrock cachePoint translation; ~600 LOC of response-side cache stats infrastructure (prompt_cache.rs PromptCacheStats / PromptCacheRecord / PromptCache trait) accumulates a zero stream because no payload was opted in, and four hardcoded zero-coercion sites (openai_compat.rs:477-478, 489-490, 597-598, 1211-1212) discard upstream cache stats from Bedrock/Vertex/kimi-anthropic-compat/MiniMax-relay even when emitted; integration test at client_integration.rs:88-89 asserts the beta header is sent but no companion test asserts payload contains a cache_control marker because the data structures cannot produce one — a uniquely paradoxical false-positive opt-in shape: wire signal advertises caching intent and data-model structurally precludes it (Jobdori cycle #371 / extends #168c emission-routing audit / sibling-shape cluster grows to eighteen: #201/#202/#203/#206/#207/#208/#209/#210/#211/#212/#213/#214/#215/#216/#217/#218/#219 / wire-format-parity cluster grows to nine: #211+#212+#213+#214+#215+#216+#217+#218+#219 / cost-parity cluster grows to seven: #204+#207+#209+#210+#213+#216+#219 — #219 is the dominant cost-parity miss, ~90% input-token-cost reduction unattainable / cache-parity request/response symmetry pair: #219 (request-side opt-in absent) + #213 (response-side stats absent on openai-compat lane) / five-surface uniform-structural-absence shape: system+tools+tool_choice+messages+tool_result_content all locked, with no extra_body escape hatch since cache_control is a per-block annotation not a top-level field / false-positive-opt-in shape: novel cluster member where wire signal says yes and structure says no / external validation: Anthropic prompt-caching reference at platform.claude.com/docs/en/build-with-claude/prompt-caching documenting cache_control: {type: ephemeral} on system/tools/messages/content blocks with 5-min default TTL and 1-hour optional TTL and 90% cost reduction on cache-read tokens, Anthropic Messages API reference documenting system: Vec<SystemBlock> array form as the cacheable shape, Bedrock prompt-caching docs documenting cachePoint: {} block form for Bedrock-anthropic relay, claudecodecamp.com analysis of how prompt caching actually works in Claude Code, xda-developers article documenting claude-code's cache-token-budget knob proving caching is actively engaged, anomalyco/opencode#5416 #14203 #16848 #17910 #20110 #20265 (cache-related issues and PR for system-prompt-split-for-cache-hit-rate optimization), opencode-anthropic-cache npm package as third-party plugin proving the ecosystem expectation, LangChain anthropicPromptCachingMiddleware as first-class JS wrapper, LiteLLM prompt-caching docs with single-line cache_control pass-through for Anthropic+Bedrock, Vercel AI SDK Anthropic provider providerOptions.anthropic.cacheControl, prompthub.us multi-provider comparison treating opt-in as documented baseline, portkey.ai gateway-level pass-through, mindstudio.ai cost-impact analysis, OpenTelemetry GenAI semconv gen_ai.usage.input_tokens.cached as documented attribute — claw is the sole client/agent/CLI in the surveyed coding-agent ecosystem with zero cache_control request-side opt-in capability despite shipping the eligibility beta header on every Anthropic request) 2026-04-26 00:40:20 +09:00
YeonGyu-Kim
116a95a253 roadmap: #218 filed — MessageRequest has no response_format / output_config / seed / logprobs / top_logprobs / logit_bias / n / metadata fields (types.rs:6-36, thirteen fields, zero hits across rust/ for any of these); build_chat_completion_request (openai_compat.rs:845) writes thirteen optional fields and emits none of these on the wire; AnthropicClient::send_raw_request (anthropic.rs:466) renders same MessageRequest via render_json_body (telemetry/lib.rs:107) with same gaps; ChatMessage (openai_compat.rs:688) has three fields (role, content, tool_calls) and no refusal field despite the streaming-aggregator test at line 1781 explicitly including "refusal": null in test data — silent serde drop; ChunkDelta (openai_compat.rs:735) has same gap; OutputContentBlock (types.rs:147) has four variants (Text, ToolUse, Thinking, RedactedThinking) and no Refusal variant; MessageResponse.stop_reason (types.rs:127) has no slot for Anthropic's 2025-11+ stop_reason='refusal' value; net effect: claw cannot opt into OpenAI strict-schema constrained decoding (response_format json_schema, GA 2024-08), cannot opt into Anthropic GA structured outputs (output_config.format, GA 2025-11-13), cannot opt into legacy JSON mode (response_format json_object), cannot supply seed for reproducible sampling, cannot request logprobs/top_logprobs, cannot bias tokens via logit_bias, cannot request multiple completions via n, and silently discards every refusal string OpenAI emits when constrained decoding rejects a generation — refusals classified as Finished/success with empty content via #217 normalize_finish_reason mapping (Jobdori cycle #370 / extends #168c emission-routing audit / sibling-shape cluster grows to seventeen: #201/#202/#203/#206/#207/#208/#209/#210/#211/#212/#213/#214/#215/#216/#217/#218 / wire-format-parity cluster grows to eight: #211+#212+#213+#214+#215+#216+#217+#218 / four-layer-structural-absence shape: request-struct-field + request-builder-write + response-struct-field + content-block-taxonomy-variant, largest single-feature absence catalogued / external validation: OpenAI Structured Outputs guide, OpenAI Chat Completions API reference, Anthropic structured-outputs reference (GA 2025-11-13), Anthropic Messages API reference (stop_reason='refusal'), Vercel AI Gateway Anthropic structured outputs, Vercel AI SDK 6 generateObject + Zod, LangChain with_structured_output, simonw/llm --schema flag, charmbracelet/crush, anomalyco/opencode#10456 open feature request citing OpenAI Codex as reference, anomalyco/opencode#5639/#11357/#13618, OpenAI Codex CI/code-review cookbook, OpenRouter structured-outputs docs, OpenAI Python SDK client.beta.chat.completions.parse, OpenTelemetry GenAI semconv gen_ai.request.response_format + gen_ai.response.refusal) 2026-04-26 00:13:01 +09:00
YeonGyu-Kim
91e290526a roadmap: #217 filed — normalize_finish_reason (openai_compat.rs:1389) is a two-arm match (stop→end_turn, tool_calls→tool_use) with a string-passthrough fallthrough that drops three of five OpenAI-spec finish reasons (length, content_filter, function_call); MessageResponse.stop_reason is Option<String> with no enum constraint; WorkerRegistry::observe_completion (worker_boot.rs:558) classifies failure on finish=='unknown'||finish=='error' only, so OpenAI/DeepSeek/Moonshot truncation (length) and content-policy refusal (content_filter) become WorkerStatus::Finished with success events; the streaming aggregator's tool-call-block-close branch at openai_compat.rs:537 keys on 'tool_calls' literal and never fires for legacy 'function_call' shape (Azure pre-2024-02-15 / DeepSeek pre-2025-08 / SiliconFlow / OpenRouter relays); Anthropic native path produces the canonical taxonomy correctly (Jobdori cycle #369 / extends #168c emission-routing audit / sibling-shape cluster grows to sixteen: #201/#202/#203/#206/#207/#208/#209/#210/#211/#212/#213/#214/#215/#216/#217 / wire-format-parity cluster grows to seven: #211+#212+#213+#214+#215+#216+#217 / classifier-leakage shape: response-side string mistranslation flows three layers deep into runtime classifier with two-literal-compare coverage / external validation: OpenAI Chat Completions API reference, Anthropic Messages API reference, OpenAI function_call deprecation notice, Azure OpenAI reference, DeepSeek/Moonshot/DashScope refs, anomalyco/opencode#19842, charmbracelet/crush typed enum, simonw/llm Reason enum, Vercel AI SDK FinishReason union, LangChain LengthFinishReasonError/ContentFilterFinishReasonError, semantic-kernel FinishReason enum, openai-python Literal type, OpenTelemetry GenAI gen_ai.response.finish_reasons spec) 2026-04-25 23:39:13 +09:00
YeonGyu-Kim
ceb092abd7 roadmap: #216 filed — neither MessageRequest nor MessageResponse has any service_tier field; build_chat_completion_request (openai_compat.rs:845) writes thirteen optional fields (model, max_tokens/max_completion_tokens, messages, stream, stream_options, tools, tool_choice, temperature, top_p, frequency_penalty, presence_penalty, stop, reasoning_effort) and does not write service_tier; AnthropicClient::send_raw_request (anthropic.rs:466) renders the same MessageRequest struct via AnthropicRequestProfile::render_json_body (telemetry/lib.rs:107) which has no field for it either, only a per-client extra_body escape hatch (asymmetric — openai_compat path has zero hits for extra_body); ChatCompletionResponse / ChatCompletionChunk / OpenAiUsage all deserialize four fields each, dropping the upstream-echoed service_tier confirmation and the system_fingerprint reproducibility marker that OpenAI documents as the canonical "what backend served you" signal; claw cannot opt into OpenAI flex (~50% cheaper async batch — developers.openai.com/api/docs/guides/flex-processing), cannot opt into OpenAI priority (~1.5-2x premium SLA latency — developers.openai.com/api/docs/guides/priority-processing), cannot opt into Anthropic priority (auto/standard_only — platform.claude.com/docs/en/api/service-tiers), and cannot detect at the response layer whether a request was flex-served or silently upgraded to priority by a project-level default override (Jobdori cycle #368 / extends #168c emission-routing audit / sibling-shape cluster grows to fifteen: #201/#202/#203/#206/#207/#208/#209/#210/#211/#212/#213/#214/#215/#216 / wire-format-parity cluster grows to six: #211+#212+#213+#214+#215+#216 / cost-parity cluster grows to six: #204+#207+#209+#210+#213+#216 / three-dimensional-structural-absence shape: request-side write + response-side read + reproducibility marker, distinct from prior request-only #211#212 / response-only #207#213#214 / header-only #215 members / external validation: OpenAI flex/priority/scale-tier guides, OpenAI advanced-usage system_fingerprint guide, Anthropic service-tiers reference, OpenTelemetry GenAI semconv gen_ai.openai.request.service_tier + gen_ai.openai.response.service_tier + gen_ai.openai.response.system_fingerprint, anomalyco/opencode#12297, Vercel AI SDK serviceTier provider option, LangChain ChatOpenAI service_tier ctor param, LiteLLM service_tier pass-through, semantic-kernel OpenAIPromptExecutionSettings.ServiceTier, openai-python SDK client.chat.completions.create(service_tier=...) first-class kwarg, MiniMax/DeepSeek Anthropic-compat layer notes, badlogic/pi-mono#1381) 2026-04-25 23:12:25 +09:00
YeonGyu-Kim
2da12117eb roadmap: #215 filed — expect_success reads only request-id/x-request-id headers and discards the rest; both OpenAiCompatClient::send_with_retry and AnthropicClient::send_with_retry sleep on pure exponential backoff (2^(n-1) * initial + jitter) that ignores upstream Retry-After (RFC 7231 §7.1.3, mandated by Anthropic on 429, emitted by OpenAI/DeepSeek/Moonshot/DashScope on 429/503/529); ApiError::Api has no retry_after field, scheduler has no input port for it; on a 60s server-specified cooldown, claw burns 3 retries in <8s against a closed gate then surfaces RetriesExhausted (Jobdori cycle #367 / extends #168c emission-routing audit / sibling-shape cluster grows to fourteen: #201/#202/#203/#206/#207/#208/#209/#210/#211/#212/#213/#214/#215 / upstream-contract-honoring trio: #211+#213+#215 / wire-format-parity cluster: #211+#212+#213+#214+#215 / external validation: Anthropic rate-limits docs, OpenAI cookbook, DeepSeek rate-limit docs, RFC 7231 §7.1.3, openai-python#957, Vercel AI SDK LanguageModelV1RateLimit.retryAfter, LangChain BaseChatOpenAI, anomalyco/opencode#16993/#16994/#9091/#17583/#11705, charmbracelet/crush, LiteLLM Router.retry_after_strategy) 2026-04-25 22:41:49 +09:00
YeonGyu-Kim
959bdf8491 roadmap: #214 filed — ChunkDelta and ChatMessage in openai_compat.rs deserialize only content/tool_calls; delta.reasoning_content (sibling to delta.content, the canonical wire field for DeepSeek deepseek-reasoner / Alibaba Qwen3-Thinking / QwQ / vLLM reasoning-parser backends) is silently discarded at serde-deserialize time before any handler sees it; non-streaming ChatMessage has the same gap; is_reasoning_model classifier already returns true for o1/o3/o4/grok-3-mini/qwen-qwq/qwq/*thinking* and is consulted at line 901 to strip request-side tuning params but never on the response side to opt into reasoning_content extraction; local taxonomy already declares OutputContentBlock::Thinking and ContentBlockDelta::ThinkingDelta and the Anthropic native path correctly emits both with full test coverage at sse.rs:260,288 — the openai-compat translator has the destination types one import away and never bridges to them (Jobdori cycle #366 / extends #168c emission-routing audit / sibling-shape cluster grows to thirteen: #201/#202/#203/#206/#207/#208/#209/#210/#211/#212/#213/#214 / reasoning-fidelity trio: #207+#211+#214 / wire-format-parity cluster: #211+#212+#213+#214 / external validation: DeepSeek API docs, vLLM reasoning-outputs, anomalyco/opencode#24124, charmbracelet/crush, simonw/llm, Vercel AI SDK, LangChain BaseChatOpenAI, LiteLLM, continue.dev#9245) 2026-04-25 22:16:02 +09:00
YeonGyu-Kim
347102d83b roadmap: #213 filed — OpenAiUsage struct does not deserialize prompt_tokens_details.cached_tokens (OpenAI 2024-10) or prompt_cache_hit_tokens (DeepSeek); openai_compat path hardcodes cache_creation_input_tokens: 0 and cache_read_input_tokens: 0 at four sites; cost estimator computes $0 cache savings for every OpenAI/DeepSeek/Moonshot kimi request even when upstream prompt cache is hitting; Anthropic native path correctly populates same Usage fields from native wire format (Jobdori cycle #365 / extends #168c emission-routing audit / sibling-shape cluster grows to twelve: #201/#202/#203/#206/#207/#208/#209/#210/#211/#212/#213 / cost-parity cluster: #204+#207+#209+#210+#213 / wire-format-parity cluster: #211+#212+#213 / external validation: OpenAI prompt caching docs, DeepSeek pricing docs, anomalyco/opencode#17223/#17121/#17056/#11995, Vercel AI SDK cachedInputTokens, charmbracelet/crush, simonw/llm) 2026-04-25 21:42:54 +09:00
Jobdori
c00981896f roadmap: #212 filed — MessageRequest+ToolChoice cannot express parallel_tool_calls (OpenAI top-level) or disable_parallel_tool_use (Anthropic tool_choice modifier); zero hits across rust/ src/ tests/ docs/; ToolChoice is 3-variant enum with no modifier slot; openai_tool_choice mapper has 3-arm match no parallel path; provider default is parallel-on, claw cannot opt out (Jobdori cycle #364 / extends #168c emission-routing audit / sibling-shape cluster grows to eleven: #201/#202/#203/#206/#207/#208/#209/#210/#211/#212 / wire-format-parity cluster: #211+#212 / external validation: Anthropic docs, OpenAI API reference, LangChain BaseChatOpenAI, anomalyco/opencode, charmbracelet/crush#1061) 2026-04-25 21:10:50 +09:00
YeonGyu Kim
f004f74ffa roadmap: #211 filed — build_chat_completion_request selects max_tokens_key only on wire_model.starts_with("gpt-5"), sending legacy max_tokens to OpenAI o1/o3/o4-mini reasoning models which reject it with unsupported_parameter; is_reasoning_model classifier 90 lines above already knows o-series is reasoning, taxonomy half-applied within 30-line span; no test for any o-series model (Jobdori cycle #363 / extends #168c emission-routing audit / sibling-shape cluster grows to ten: #201/#202/#203/#206/#207/#208/#209/#210/#211 / external validation: charmbracelet/crush#1061, simonw/llm#724, HKUDS/DeepTutor#54) 2026-04-25 20:38:43 +09:00
YeonGyu-Kim
02252a8585 roadmap: #210 filed — rusty-claude-cli shadows api::max_tokens_for_model with stripped 2-branch fork (opus=32k, else=64k); ignores model_token_limit registry, bypasses plugin maxOutputTokens override, silently sends 64_000 for kimi-k2.5 whose registry cap is 16_384 (4x over) (Jobdori cycle #362 / extends #168c emission-routing audit / sibling-shape cluster grows to nine: #201/#202/#203/#206/#207/#208/#209/#210) 2026-04-25 20:06:43 +09:00
YeonGyu-Kim
134e945a01 roadmap: #209 filed — pricing_for_model substring-matches haiku/opus/sonnet only; default_sonnet_tier function name carries Opus pricing constants (15.0/75.0 vs real Sonnet 3.0/15.0); every non-Anthropic model silently falls back producing 5-100x wrong cost estimates with no event signal, only a magic-string suffix on one summary line; rusty-claude-cli session JSON and anthropic.rs telemetry emit cost without pricing_source field (Jobdori cycle #361 / cost-parity cluster closer to #204+#207 / models.dev parity gap vs anomalyco/opencode) 2026-04-25 19:42:37 +09:00
Jobdori
c20d0330c1 roadmap: #208 filed — silent param/field strip on outbound serialization (4 tuning params for reasoning models, is_error for kimi), self-documenting 'silently strip' comments, no event emission, tests assert removal but not visibility (Jobdori cycle #359 / sibling-chain closer to #207 inbound-drop / completes OpenAI-compat boundary audit) 2026-04-25 19:06:56 +09:00
YeonGyu-Kim
ba3a34d6fe roadmap: #207 filed — OpenAiUsage discards prompt_tokens_details.cached_tokens and completion_tokens_details.reasoning_tokens, cache_read_input_tokens hardcoded 0 in 4 sites breaking cost parity with Anthropic path (Jobdori cycle #358 / fix-pair with #204 / anomalyco/opencode #24233 sibling) 2026-04-25 18:34:44 +09:00
YeonGyu-Kim
0e9cff588d roadmap: #206 filed — normalize_finish_reason covers 2/5 OpenAI finish reasons, length/content_filter/function_call unmapped (Jobdori cycle #357)
Pinpoint #206: normalize_finish_reason() in openai_compat.rs only maps
stop→end_turn and tool_calls→tool_use. The 'other => other' pass-through
arm silently leaks length, content_filter, function_call to downstream
consumers expecting Anthropic vocabulary (max_tokens, refusal, tool_use).

Sibling of #201/#202/#203/#204 (silent fallbacks at provider boundary).
No structured event for unmapped values; test coverage locks only the
two-case happy path.

Branch: feat/jobdori-168c-emission-routing
HEAD: dba4f28
2026-04-25 18:20:04 +09:00

3266
ROADMAP.md

File diff suppressed because one or more lines are too long