2 changes: 2 additions & 0 deletions .codex/skills/.gitignore
@@ -0,0 +1,2 @@
# Managed by Tessl
tessl:*
2 changes: 2 additions & 0 deletions .gemini/skills/.gitignore
@@ -0,0 +1,2 @@
# Managed by Tessl
tessl:*
2 changes: 2 additions & 0 deletions CLAUDE.md
@@ -33,3 +33,5 @@ Discovery → Ranking → Context Extraction → Test Gen + Optimization → Bas
# Agent Rules <!-- tessl-managed -->

@.tessl/RULES.md follow the [instructions](.tessl/RULES.md)

@AGENTS.md
12 changes: 12 additions & 0 deletions tessl.json
@@ -63,6 +63,18 @@
},
"tessl/pypi-filelock": {
"version": "3.19.0"
},
"codeflash/codeflash-rules": {
"version": "0.1.0"
},
"codeflash/codeflash-docs": {
"version": "0.1.0"
},
"codeflash/codeflash-skills": {
"version": "0.2.0"
},
"tessl-labs/tessl-skill-eval-scenarios": {
"version": "0.0.5"
}
}
}
108 changes: 108 additions & 0 deletions tiles/codeflash-docs/docs/ai-service.md
@@ -0,0 +1,108 @@
# AI Service

How codeflash communicates with the AI optimization backend.

## `AiServiceClient` (`api/aiservice.py`)

The client connects to the AI service at `https://app.codeflash.ai` (or `http://localhost:8000` when `CODEFLASH_AIS_SERVER=local`).

Authentication uses a Bearer token from `get_codeflash_api_key()`. All requests go through `make_ai_service_request()`, which handles JSON serialization via a Pydantic encoder.

Timeouts: 90 s for production, 300 s for local.
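The endpoint selection, auth header, and timeout rules above can be sketched as follows. This is illustrative only: `build_request()` and its return shape are hypothetical, not the actual `make_ai_service_request()` signature.

```python
# Sketch of request assembly, following the behavior described above.
# build_request() is a hypothetical helper, not the real codeflash API.
AIS_SERVER_URL = "https://app.codeflash.ai"
LOCAL_SERVER_URL = "http://localhost:8000"

def build_request(payload: dict, api_key: str, local: bool = False) -> dict:
    base = LOCAL_SERVER_URL if local else AIS_SERVER_URL
    return {
        "url": f"{base}/ai/optimize",
        "headers": {"Authorization": f"Bearer {api_key}"},
        # 90 s in production, 300 s when targeting a local server
        "timeout": 300 if local else 90,
        "json": payload,
    }
```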

## Endpoints

### `/ai/optimize` — Generate Candidates

Method: `optimize_code()`

Sends source code + dependency context to generate optimization candidates.

Payload:
- `source_code` — The read-writable code (markdown format)
- `dependency_code` — Read-only context code
- `trace_id` — Unique trace ID for the optimization run
- `language` — `"python"`, `"javascript"`, or `"typescript"`
- `n_candidates` — Number of candidates to generate (controlled by effort level)
- `is_async` — Whether the function is async
- `is_numerical_code` — Whether the code is numerical (affects optimization strategy)

Returns: `list[OptimizedCandidate]` with `source=OptimizedCandidateSource.OPTIMIZE`
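Assembled from the fields listed above, a payload might look like this. Values are placeholders; the exact wire format may differ.

```python
# Hypothetical /ai/optimize payload; field names follow the list above,
# values are illustrative.
payload = {
    "source_code": "```python\ndef f(x):\n    return x * 2\n```",  # markdown-wrapped code
    "dependency_code": "",          # read-only context
    "trace_id": "abc123",
    "language": "python",
    "n_candidates": 5,              # driven by effort level
    "is_async": False,
    "is_numerical_code": False,
}
```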

### `/ai/optimize_line_profiler` — Line-Profiler-Guided Candidates

Method: `optimize_python_code_line_profiler()`

Like `/optimize` but includes `line_profiler_results` to guide the LLM toward hot lines.

Returns: candidates with `source=OptimizedCandidateSource.OPTIMIZE_LP`

### `/ai/refine` — Refine Existing Candidate

Method: `refine_code()`

Request type: `AIServiceRefinerRequest`

Sends an existing candidate with runtime data and line profiler results to generate an improved version.

Key fields:
- `original_source_code` / `optimized_source_code` — Before and after
- `original_code_runtime` / `optimized_code_runtime` — Timing data
- `speedup` — Current speedup ratio
- `original_line_profiler_results` / `optimized_line_profiler_results`

Returns: candidates with `source=OptimizedCandidateSource.REFINE` and `parent_id` set to the ID of the candidate being refined

### `/ai/repair` — Fix Failed Candidate

Method: `repair_code()`

Request type: `AIServiceCodeRepairRequest`

Sends a failed candidate with test diffs showing what went wrong.

Key fields:
- `original_source_code` / `modified_source_code`
- `test_diffs: list[TestDiff]` — Each with `scope` (return_value/stdout/did_pass), original vs candidate values, and test source code

Returns: candidates with `source=OptimizedCandidateSource.REPAIR` and `parent_id` set
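The shape of a test diff can be sketched as below. Field names follow the description above; the actual `TestDiff` model may name or structure them differently.

```python
from dataclasses import dataclass

# Illustrative shape of a test diff sent to /ai/repair; an assumption
# based on the field descriptions above, not the real model definition.
@dataclass
class TestDiff:
    scope: str             # "return_value", "stdout", or "did_pass"
    original_value: str    # value produced by the original code
    candidate_value: str   # value produced by the failed candidate
    test_source: str       # source of the failing test

diff = TestDiff(
    scope="return_value",
    original_value="[1, 2, 3]",
    candidate_value="[3, 2, 1]",
    test_source="def test_sort(): ...",
)
```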

### `/ai/adaptive_optimize` — Multi-Candidate Adaptive

Method: `adaptive_optimize()`

Request type: `AIServiceAdaptiveOptimizeRequest`

Sends multiple previous candidates with their speedups for the LLM to learn from and generate better candidates.

Key fields:
- `candidates: list[AdaptiveOptimizedCandidate]` — Previous candidates with source code, explanation, source type, and speedup

Returns: candidates with `source=OptimizedCandidateSource.ADAPTIVE`

### `/ai/rewrite_jit` — JIT Rewrite

Method: `get_jit_rewritten_code()`

Rewrites code to use JIT compilation (e.g., Numba).

Returns: candidates with `source=OptimizedCandidateSource.JIT_REWRITE`

## Candidate Parsing

All endpoints return JSON with an `optimizations` array. Each entry has:
- `source_code` — Markdown-formatted code blocks
- `explanation` — LLM explanation
- `optimization_id` — Unique ID
- `parent_id` — Optional parent reference
- `model` — Which LLM model was used

`_get_valid_candidates()` parses the markdown code via `CodeStringsMarkdown.parse_markdown_code()` and filters out entries with empty code blocks.
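The filtering step can be sketched as follows. `extract_code_blocks()` is a crude regex stand-in for `CodeStringsMarkdown.parse_markdown_code()`, used only to show the "drop entries with empty code blocks" logic.

```python
import re

# Stand-in for CodeStringsMarkdown.parse_markdown_code(): pull fenced
# code blocks out of markdown and drop empty ones.
def extract_code_blocks(markdown: str) -> list[str]:
    blocks = re.findall(r"```[\w+-]*\n(.*?)```", markdown, re.DOTALL)
    return [b.strip() for b in blocks if b.strip()]

# Keep only optimization entries whose markdown contains real code,
# mirroring the _get_valid_candidates() behavior described above.
def valid_candidates(optimizations: list[dict]) -> list[dict]:
    return [o for o in optimizations if extract_code_blocks(o["source_code"])]
```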

## `LocalAiServiceClient`

Used when `CODEFLASH_EXPERIMENT_ID` is set. Mirrors `AiServiceClient` but sends to a separate experimental endpoint for A/B testing optimization strategies.

## LLM Call Sequencing

`AiServiceClient` tracks call sequence via `llm_call_counter` (itertools.count). Each request includes a `call_sequence` number, used by the backend to maintain conversation context across multiple calls for the same function.
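The sequencing pattern is simple enough to sketch directly. `tag_request()` is a hypothetical helper; the real client attaches the counter inside its request method.

```python
import itertools

# Monotonic call counter, as described above: each request for a trace
# carries the next call_sequence number.
llm_call_counter = itertools.count(1)

def tag_request(payload: dict) -> dict:
    return {**payload, "call_sequence": next(llm_call_counter)}
```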
79 changes: 79 additions & 0 deletions tiles/codeflash-docs/docs/configuration.md
@@ -0,0 +1,79 @@
# Configuration

Key configuration constants, effort levels, and thresholds.

## Constants (`code_utils/config_consts.py`)

### Test Execution

| Constant | Value | Description |
|----------|-------|-------------|
| `MAX_TEST_RUN_ITERATIONS` | 5 | Maximum test loop iterations |
| `INDIVIDUAL_TESTCASE_TIMEOUT` | 15s | Timeout per individual test case |
| `MAX_FUNCTION_TEST_SECONDS` | 60s | Max total time for function testing |
| `MAX_TEST_FUNCTION_RUNS` | 50 | Max test function executions |
| `MAX_CUMULATIVE_TEST_RUNTIME_NANOSECONDS` | 100 ms (stored in ns) | Max cumulative test runtime |
| `TOTAL_LOOPING_TIME` | 10s | Candidate benchmarking budget |
| `MIN_TESTCASE_PASSED_THRESHOLD` | 6 | Minimum test cases that must pass |

### Performance Thresholds

| Constant | Value | Description |
|----------|-------|-------------|
| `MIN_IMPROVEMENT_THRESHOLD` | 0.05 (5%) | Minimum speedup to accept a candidate |
| `MIN_THROUGHPUT_IMPROVEMENT_THRESHOLD` | 0.10 (10%) | Minimum async throughput improvement |
| `MIN_CONCURRENCY_IMPROVEMENT_THRESHOLD` | 0.20 (20%) | Minimum concurrency ratio improvement |
| `COVERAGE_THRESHOLD` | 60.0% | Minimum test coverage |

### Stability Thresholds

| Constant | Value | Description |
|----------|-------|-------------|
| `STABILITY_WINDOW_SIZE` | 0.35 | 35% of total iteration window |
| `STABILITY_CENTER_TOLERANCE` | 0.0025 | ±0.25% around median |
| `STABILITY_SPREAD_TOLERANCE` | 0.0025 | 0.25% window spread |

### Context Limits

| Constant | Value | Description |
|----------|-------|-------------|
| `OPTIMIZATION_CONTEXT_TOKEN_LIMIT` | 16000 | Max tokens for optimization context |
| `TESTGEN_CONTEXT_TOKEN_LIMIT` | 16000 | Max tokens for test generation context |
| `MAX_CONTEXT_LEN_REVIEW` | 1000 | Max context length for optimization review |

### Other

| Constant | Value | Description |
|----------|-------|-------------|
| `MIN_CORRECT_CANDIDATES` | 2 | Min correct candidates before skipping repair |
| `REPEAT_OPTIMIZATION_PROBABILITY` | 0.1 | Probability of re-optimizing a function |
| `DEFAULT_IMPORTANCE_THRESHOLD` | 0.001 | Minimum addressable time to consider a function |
| `CONCURRENCY_FACTOR` | 10 | Number of concurrent executions for concurrency benchmark |
| `REFINED_CANDIDATE_RANKING_WEIGHTS` | (2, 1) | (runtime, diff) weights — runtime 2x more important |

## Effort Levels

`EffortLevel` enum: `LOW`, `MEDIUM`, `HIGH`

Effort controls the number of candidates, repairs, and refinements:

| Key | LOW | MEDIUM | HIGH |
|-----|-----|--------|------|
| `N_OPTIMIZER_CANDIDATES` | 3 | 5 | 6 |
| `N_OPTIMIZER_LP_CANDIDATES` | 4 | 6 | 7 |
| `N_GENERATED_TESTS` | 2 | 2 | 2 |
| `MAX_CODE_REPAIRS_PER_TRACE` | 2 | 3 | 5 |
| `REPAIR_UNMATCHED_PERCENTAGE_LIMIT` | 0.2 | 0.3 | 0.4 |
| `TOP_VALID_CANDIDATES_FOR_REFINEMENT` | 2 | 3 | 4 |
| `ADAPTIVE_OPTIMIZATION_THRESHOLD` | 0 | 0 | 2 |
| `MAX_ADAPTIVE_OPTIMIZATIONS_PER_TRACE` | 0 | 0 | 4 |

Use `get_effort_value(EffortKeys.KEY, effort_level)` to retrieve values.
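A minimal sketch of that lookup, consistent with the table above. The real `EffortLevel`/`EffortKeys` live in codeflash's config module and may be structured differently; only two keys are shown.

```python
from enum import Enum

# Assumed enum ordering LOW < MEDIUM < HIGH, matching the table columns.
class EffortLevel(Enum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2

# Subset of the effort table above: (LOW, MEDIUM, HIGH) tuples per key.
_EFFORT_TABLE = {
    "N_OPTIMIZER_CANDIDATES": (3, 5, 6),
    "MAX_CODE_REPAIRS_PER_TRACE": (2, 3, 5),
}

def get_effort_value(key: str, level: EffortLevel) -> int:
    return _EFFORT_TABLE[key][level.value]
```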

## Project Configuration

Configuration is read from `pyproject.toml` under `[tool.codeflash]`. Key settings are auto-detected by `setup/detector.py`:
- `module-root` — Root of the module to optimize
- `tests-root` — Root of test files
- `test-framework` — pytest, unittest, jest, etc.
- `formatter-cmds` — Code formatting commands
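An illustrative `[tool.codeflash]` section covering the settings above; paths and the formatter command are placeholders for a typical project layout, not required values.

```toml
[tool.codeflash]
module-root = "src/mypackage"
tests-root = "tests"
test-framework = "pytest"
formatter-cmds = ["black $file"]
```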
60 changes: 60 additions & 0 deletions tiles/codeflash-docs/docs/context-extraction.md
@@ -0,0 +1,60 @@
# Context Extraction

How codeflash extracts and limits code context for optimization and test generation.

## Overview

Context extraction (`context/code_context_extractor.py`) builds a `CodeOptimizationContext` containing all code needed for the LLM to understand and optimize a function, split into:

- **Read-writable code** (`CodeContextType.READ_WRITABLE`): The function being optimized plus its helper functions — code the LLM is allowed to modify
- **Read-only context** (`CodeContextType.READ_ONLY`): Dependency code for reference — imports, type definitions, base classes
- **Testgen context** (`CodeContextType.TESTGEN`): Context for test generation; may include imported class definitions and external base-class `__init__` methods
- **Hashing context** (`CodeContextType.HASHING`): Used for deduplication of optimization runs

## Token Limits

Both optimization and test generation contexts are token-limited:
- `OPTIMIZATION_CONTEXT_TOKEN_LIMIT = 16000` tokens
- `TESTGEN_CONTEXT_TOKEN_LIMIT = 16000` tokens

Token counting uses `encoded_tokens_len()` from `code_utils/code_utils.py`. Functions whose context exceeds these limits are skipped.
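The gate can be sketched as below. `count_tokens()` is a crude whitespace-based stand-in for `encoded_tokens_len()`, which uses a real tokenizer; only the skip-if-over-limit logic is the point here.

```python
OPTIMIZATION_CONTEXT_TOKEN_LIMIT = 16000

# Placeholder counter: the real encoded_tokens_len() is tokenizer-based.
def count_tokens(code: str) -> int:
    return len(code.split())

# Functions whose assembled context exceeds the limit are skipped.
def within_limit(context_code: str, limit: int = OPTIMIZATION_CONTEXT_TOKEN_LIMIT) -> bool:
    return count_tokens(context_code) <= limit
```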

## Context Building Process

### 1. Helper Discovery

For the target function (`FunctionToOptimize`), the extractor finds:
- **Helpers of the function**: Functions/classes in the same file that the target function calls
- **Helpers of helpers**: Transitive dependencies of the helper functions

These are organized as `dict[Path, set[FunctionSource]]` — mapping file paths to the set of helper functions found in each file.

### 2. Code Extraction

`extract_code_markdown_context_from_files()` builds `CodeStringsMarkdown` from the helper dictionaries. Each file's relevant code is extracted as a `CodeString` with its file path.

### 3. Testgen Context Enrichment

`build_testgen_context()` extends the basic context with:
- Imported class definitions (resolved from imports)
- External base class `__init__` methods
- External class `__init__` methods referenced in the context

### 4. Unused Definition Removal

`detect_unused_helper_functions()` and `remove_unused_definitions_by_function_names()` from `context/unused_definition_remover.py` prune definitions that are not transitively reachable from the target function, reducing token usage.

### 5. Deduplication

The hashing context (`hashing_code_context`) generates a hash (`hashing_code_context_hash`) used to detect when the same function context has already been optimized in a previous run, avoiding redundant work.
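The dedup check can be sketched as follows, under the assumption that the hash is a digest of the hashing context string; the real scheme may normalize the code differently before hashing.

```python
import hashlib

# Assumed: hash is a digest of the hashing context string.
def hashing_context_hash(hashing_code_context: str) -> str:
    return hashlib.sha256(hashing_code_context.encode("utf-8")).hexdigest()

seen_hashes: set[str] = set()

# Returns True when this exact context was already optimized in a
# previous run, so the work can be skipped.
def already_optimized(context: str) -> bool:
    h = hashing_context_hash(context)
    if h in seen_hashes:
        return True
    seen_hashes.add(h)
    return False
```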

## Key Functions

| Function | Location | Purpose |
|----------|----------|---------|
| `build_testgen_context()` | `context/code_context_extractor.py` | Build enriched testgen context |
| `extract_code_markdown_context_from_files()` | `context/code_context_extractor.py` | Convert helper dicts to `CodeStringsMarkdown` |
| `detect_unused_helper_functions()` | `context/unused_definition_remover.py` | Find unused definitions |
| `remove_unused_definitions_by_function_names()` | `context/unused_definition_remover.py` | Remove unused definitions |
| `collect_top_level_defs_with_usages()` | `context/unused_definition_remover.py` | Analyze definition usage |
| `encoded_tokens_len()` | `code_utils/code_utils.py` | Count tokens in code |