Add OpenClaw and Cloud API Support #34

Open
SecretSettler wants to merge 79 commits into main from cloud-cache-proxy

Conversation

@SecretSettler
Member

Closes #8

SecretSettler and others added 30 commits March 1, 2026 17:54
Intercepts /v1/chat/completions and /v1/messages, extracts documents
from system prompts (XML tags, numbered, separator formats), reorders
them via ContextPilot for optimal prefix sharing, and forwards to the
backend. Users just change their API endpoint URL — no code changes needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…e, and deployment files

- Extract and reorder documents from tool_result messages (OpenAI role="tool"
  and Anthropic type="tool_result"), with camelCase compat for OpenClaw internal format
- Add markdown_header extraction mode (split on # / ## headers)
- Extend XML tag recognition with <files>/<file>
- Add X-ContextPilot-Scope header (system / tool_results / all)
- Refactor _intercept_and_forward to use MultiExtractionResult for multi-source reordering
- Expand _contextpilot response metadata with total_documents and sources breakdown
- Add OpenClaw examples: setup.sh, Docker Compose, provider config template
- Add integration guide at docs/guides/openclaw.md
- 95 tests passing (34 new)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…den forwarding

- Move _contextpilot response metadata from JSON body to X-ContextPilot-Result
  header so strict API parsers (OpenClaw SDK) receive unmodified responses
- Broaden request header forwarding from 4-header whitelist to blacklist
  (only strip x-contextpilot-* and hop-by-hop), fixing dropped anthropic-beta etc.
- Forward backend response headers and status code in streaming mode
- Replace noisy print() in schedule_only() with logger.debug()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… paths

- Move proxy_completions metadata from response body to X-ContextPilot-Result header
- Inject rid in intercept path when running in stateful mode (index active)
- Add tests for header metadata, rid injection, and stateless bypass

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ential, not parallel)

Add MUST come before Reorder: it expands the block set so Reorder
has more permutation options. Solves cases where no permutation of
existing blocks hits a prefix, but adding a block makes it possible.

Updated: both design docs, SVG diagram, Notion callout + CRUD table.
…icable

Dedup skips on Turn 1 or no duplicates; Repartition skips when all
blocks have dependencies; Add skips when no prefix-hit candidate
exists; Reorder skips when order is already optimal.

Updated EXPLAIN example to show a SKIPPED primitive.
Updated Notion primitives table with skip conditions.
Default 13, configurable via --chunk-modulus. Passed through to all dedup calls.
Added tuning guide in how_it_works.md with M value recommendations.
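The modulus controls where chunk boundaries fall in content-defined chunking: a line becomes a cut point when its hash is congruent to 0 mod M, so the expected chunk length is roughly M lines. A minimal sketch of this idea, assuming the hypothetical name `chunk_boundaries` and using a deterministic hash (the actual splitting logic in block_dedup.py may differ):

```python
import hashlib

CHUNK_MODULUS = 13  # hypothetical default, matching --chunk-modulus


def chunk_boundaries(lines, modulus=CHUNK_MODULUS):
    """Yield runs of lines, cutting after any line whose deterministic
    32-bit hash is congruent to 0 mod `modulus` (content-defined chunking:
    boundaries depend only on content, so edits stay local to one chunk)."""
    chunk = []
    for line in lines:
        chunk.append(line)
        digest = hashlib.md5(line.strip().encode("utf-8")).digest()
        if int.from_bytes(digest[:4], "big") % modulus == 0:
            yield chunk
            chunk = []
    if chunk:
        yield chunk
```

With M = 13, about 1 in 13 lines ends a chunk, so larger M means longer chunks and coarser dedup granularity.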
@SecretSettler SecretSettler requested a review from dalongbao March 24, 2026 00:46
@dalongbao
Collaborator

the three functions in block_dedup.py can be made into one for simplicity


@dalongbao dalongbao left a comment


Review Summary

Solid PR overall. The adapter pattern, intercept parser, and TTL eviction module are well-structured. Docs are accurate, tests have decent coverage, and the OpenClaw example works. Below are the findings grouped by priority.


Bugs (should fix before merge)

  1. _chunk_modulus not in global declaration (http_server.py ~L2148) — main() declares global _max_tokens, _infer_api_url, ... but omits _chunk_modulus. The --chunk-modulus CLI flag is silently ignored; value always stays at default 13.

  2. proxy_engine hardcodes temperature=0 (http_server.py ~L1933) — The generic /v1/{path:path} catch-all unconditionally sets body["temperature"] = 0, overwriting the user's value on every proxied request.

  3. "\n\n".join corrupts content (block_dedup.py L149/241/323, conversation_tracker.py L307) — Content is split on "\n" but reassembled with "\n\n", inserting phantom blank lines at every chunk boundary even for non-deduped blocks. Breaks structured content (JSON, YAML, code).

  4. hash() is non-deterministic across processes (block_dedup.py L64) — Python randomizes hash() per process via PYTHONHASHSEED. Chunk boundaries differ on every restart and across workers. Should use a deterministic hash.

  5. default_ttl_seconds=0 silently becomes 300 (ttl_eviction.py L115) — Uses or instead of is not None. 0 is falsy so it falls through to default_ttl.seconds.

  6. default_ttl setter is a no-op (ttl_eviction.py L129-132) — Updates the enum _default_ttl but not _default_ttl_seconds, which is what add_entry actually reads.

  7. Reconstruction uses default config (intercept_parser.py ~L572/621/962/1002) — reconstruct_* re-runs extraction with InterceptConfig() defaults instead of the original config. Silently fails for non-auto modes like mode=separator.

  8. _apply_block_dedup mutates caller's dict (conversation_tracker.py L306-307) — Hidden side effect: modifies doc_contents in-place with no indication in the signature or return value.
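Bug 5 is the classic falsy-zero trap. A minimal sketch of the failure and the fix, using hypothetical helper names (the real code reads `_default_ttl.seconds` in ttl_eviction.py):

```python
DEFAULT_TTL_SECONDS = 300


def resolve_ttl_buggy(default_ttl_seconds=None):
    # Bug: `or` treats 0 as "unset", so an explicit TTL of 0 becomes 300.
    return default_ttl_seconds or DEFAULT_TTL_SECONDS


def resolve_ttl_fixed(default_ttl_seconds=None):
    # Fix: fall back only when the caller passed nothing at all.
    if default_ttl_seconds is not None:
        return default_ttl_seconds
    return DEFAULT_TTL_SECONDS
```

The same `is not None` pattern applies anywhere 0, empty string, or empty list is a legitimate caller-supplied value.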


Resource / safety issues

  1. Streaming connection leak (http_server.py ~L1770-1806) — _stream_with_headers has no finally cleanup. Client disconnect mid-stream can leak aiohttp connections. Compare with proxy_engine which has finally: response.close().

  2. Double deep-copy (http_server.py ~L1363+1486) — _strip_external_content_ids recursively copies the body, then copy.deepcopy(body) copies it again. 2x memory pressure on every request.

  3. API key could leak in errors (http_server.py ~L1851) — aiohttp.ClientError can include URLs/headers in str(e), which is returned verbatim in the 502 detail.

  4. Non-JSON upstream error crashes (http_server.py ~L1815) — resp.json() on a plain-text 502 from a load balancer raises JSONDecodeError instead of a clean error.

  5. get_conversation_chain has no cycle detection (conversation_tracker.py L137-146) — Infinite loop if parent chain has a cycle.

  6. _requests dict grows unbounded (conversation_tracker.py L77) — No TTL, no max size, no automatic cleanup. timestamp field exists but is never read.
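For item 5, a visited-set is enough to make the parent walk terminate. A sketch under assumed data shapes (a `parents` mapping of request id to parent id; the real `get_conversation_chain` presumably reads the `_requests` dict instead):

```python
def get_conversation_chain(request_id, parents):
    """Walk the parent chain from request_id to the root, stopping
    early if a cycle is detected instead of looping forever."""
    chain = []
    seen = set()
    current = request_id
    while current is not None:
        if current in seen:
            break  # cycle detected: stop the walk
        seen.add(current)
        chain.append(current)
        current = parents.get(current)
    return chain
```

This costs O(n) extra memory for the visited set, which is negligible next to the unbounded `_requests` dict flagged in item 6.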


Cloud adapters

  1. Cache breakpoint limit (anthropic_adapter.py L105-117) — Injects cache_control on every qualifying tool result. Anthropic limits to 4 breakpoints per request. Will 400 on real agentic conversations with many tool results.

  2. No streaming cache metrics (http_server.py ~L1768-1806) — parse_cache_metrics only runs in the non-streaming path. TTL policy never updates for streaming requests.

  3. TTL label mismatch (confirmed via manual testing) — --extended-cache with OpenAI shows "default_ttl": "5m" but "default_ttl_seconds": 86400. The enum and seconds are set independently.

  4. update_from_response double-counts (ttl_eviction.py L247-268) — On partial cache hits (both read and creation tokens), calls touch_entry then add_entry non-atomically. Hit counter is inflated.
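For item 1, the injection needs a cap. A sketch of one possible policy, marking only the most recent qualifying tool results so the stable prefix stays cached (the selection heuristic is an assumption, not what anthropic_adapter.py currently does):

```python
MAX_CACHE_BREAKPOINTS = 4  # Anthropic's per-request cache_control limit


def inject_cache_control(blocks, max_breakpoints=MAX_CACHE_BREAKPOINTS):
    """Attach cache_control to at most `max_breakpoints` tool_result
    blocks, preferring the most recent ones; requests with more
    qualifying blocks would otherwise be rejected with a 400."""
    candidates = [b for b in blocks if b.get("type") == "tool_result"]
    allowed = {id(b) for b in candidates[-max_breakpoints:]}
    for block in blocks:
        if id(block) in allowed:
            block["cache_control"] = {"type": "ephemeral"}
    return blocks
```

Any capping policy works against the 400; choosing the latest breakpoints keeps the cacheable prefix as long as possible on the next turn.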


Docs

  1. Auto-detection priority is wrong (docs/guides/openclaw.md L218) — Says "XML > Numbered > Separator > Markdown headers" but code does xml_tag > numbered > json_results. Separator and markdown_header are not auto-detected.

  2. json_results missing from format table (docs/guides/openclaw.md L210-216) — The document extraction table omits json_results, which is an auto-detected format.


Dedup module

  1. Core dedup loop duplicated 3x (block_dedup.py) — dedup_chat_completions, _dedup_assistant_code_blocks, and dedup_responses_api share near-identical logic. Should extract into a shared helper.

  2. blocks_total undercounts (block_dedup.py L129/222/303) — Single-block messages are registered in seen_blocks but never counted in blocks_total.

  3. No unit tests for _content_defined_chunking, _hash_block, or the dedup functions directly.


Minor / nits (non-blocking)

  • FrozenSet imported but unused in all cloud adapter files
  • tool_results_skipped initialized but never incremented (dead code, http_server.py L1371)
  • Debug SHA-256 hashing runs unconditionally, not gated on log level (http_server.py L1380)
  • _intercept_index not reset when conversation changes (http_server.py L1189)
  • alpha header not validated — non-numeric value crashes (intercept_parser.py L135)
  • clear_conversation only walks ancestors, leaks child requests (conversation_tracker.py L357)
  • live_index.py schedule_only converted to logger.debug() but other methods still use print()
  • Test test_single_separator_returns_none asserts result is not None — name contradicts assertion
  • MiniMax listed in News section of README but omitted from Drop-in solutions line


for line in lines:
    current.append(line)
    line_hash = hash(line.strip()) & 0xFFFFFFFF

use hashlib.md5 for determinism
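A sketch of what that substitution could look like: the same 32-bit truncation, but derived from a digest that is stable across processes and restarts (helper name is illustrative):

```python
import hashlib


def line_hash(line):
    """Deterministic 32-bit line hash. Built-in hash() is randomized
    per process via PYTHONHASHSEED, so chunk boundaries would differ
    across workers and restarts; md5 is stable everywhere."""
    digest = hashlib.md5(line.strip().encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")
```

md5 is fine here since the hash only picks chunk boundaries; nothing security-sensitive depends on it.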


if current:
    if blocks and len(current) < CHUNK_MIN_LINES:
        blocks[-1] += "\n" + "\n".join(current)

why add two \n?
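This is the same root cause as bug 3 in the review summary: content split on a single "\n" must be rejoined with a single "\n", or every chunk boundary gains a phantom blank line. A minimal round-trip sketch (function name is illustrative):

```python
def reassemble(chunks):
    """Rejoin chunks of lines exactly as they were split: one "\n"
    between lines and one "\n" between chunks. Using "\n\n" between
    chunks would insert a blank line at every chunk boundary, which
    corrupts structured content such as JSON, YAML, or code."""
    return "\n".join("\n".join(chunk) for chunk in chunks)
```

A quick invariant worth adding to the test suite: `reassemble(split(text)) == text` for any input.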

@dalongbao
Collaborator

Tested the cloud adapter; no issues.


Development

Successfully merging this pull request may close these issues.

Support OpenClaw

2 participants