Introduce OM1 skills #2516
Pull request overview
Introduces a new skills subsystem that lets the runtime dynamically expose a read_skill tool to the Cortex LLM, load skill definitions from disk, and inject skill instructions into the prompt across multi-round tool/recall flows (including carrying context into MCP rounds).
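For context, a `read_skill` tool exposed to a function-calling LLM would plausibly use a schema along these lines (a sketch: the tool name and the `skill_name` argument are confirmed by this PR's tests; every other field here is an assumption about the real schema):

```python
READ_SKILL_TOOL = "read_skill"

# Hypothetical function-call schema for the read_skill tool; only the tool
# name and the "skill_name" parameter are confirmed by the PR's tests.
read_skill_schema = {
    "name": READ_SKILL_TOOL,
    "description": "Read a skill's instructions and inject them into the prompt.",
    "parameters": {
        "type": "object",
        "properties": {
            "skill_name": {
                "type": "string",
                "description": "Name of the skill to read.",
            }
        },
        "required": ["skill_name"],
    },
}
```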
Changes:
- Added a `skills` package with a `SkillLoader` (SKILL.md frontmatter parsing) and a `SkillOrchestrator` (tool schema + skill instruction injection helpers).
- Extended runtime configuration/conversion to accept per-mode `skills` lists and wire them into `RuntimeConfig`.
- Integrated skill recall into `ModeCortexRuntime` initialization, mode transitions, and the main cortex `_tick` loop; added initial unit tests for the loader/orchestrator and updated existing runtime tests to include `skills` in mocks.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 6 comments.
Summary per file:
| File | Description |
|---|---|
| src/skills/loader.py | Loads/parses SKILL.md skill definitions from disk into SkillEntry. |
| src/skills/orchestrator.py | Builds read_skill tool schema and provides execution/prompt augmentation utilities. |
| src/skills/__init__.py | Exposes the skills subsystem public API. |
| src/runtime/config.py | Adds skills to ModeConfig/RuntimeConfig and propagates through config loading. |
| src/runtime/converter.py | Includes skills when converting single-mode configs into multi-mode format. |
| src/runtime/cortex.py | Initializes/clears SkillOrchestrator, injects read_skill schema, and adds skill-recall loop in _tick. |
| tests/skills/test_loader.py | Unit tests for frontmatter parsing and loading skills from a temp directory. |
| tests/skills/test_orchestrator.py | Unit tests for tool schema generation, skill read execution, and recall prompt formatting. |
| tests/skills/__init__.py | Declares the tests package. |
| tests/runtime/test_cortex.py | Updates mocks to include skills in runtime config. |
| tests/runtime/test_cortex_mode_transition.py | Updates mocks to include skills in runtime config. |
| """Initialize the orchestrator with selected skills. | ||
|
|
||
| Args: | ||
| skill_names (List[str]): List of skill names to load. | ||
| """ |
The init docstring uses an "Args:" section, while the surrounding codebase predominantly uses NumPy-style docstrings ("Parameters", "Returns", etc.). For consistency and to keep autogenerated docs uniform, please rewrite this docstring in NumPy style (or drop it if it’s redundant with the class docstring).
| """Initialize the orchestrator with selected skills. | |
| Args: | |
| skill_names (List[str]): List of skill names to load. | |
| """ |
```python
async def process(
    self,
    output: Any,
    prompt: str,
    llm: Any,
    dispatch_om1=None,
    max_rounds: int = 3,
) -> SkillProcessResult:
```
SkillOrchestrator.process() (and the thin wrappers _call_signature/_build_recall_prompt) is not used anywhere in the current runtime integration (the cortex loop calls execute_read_skills/build_skill_recall_prompt directly). This leaves a second, divergent implementation of the multi-round loop that can easily drift out of sync. Consider either (1) refactoring src/runtime/cortex.py to call process(), or (2) removing process() and the private wrapper methods until there’s a real call site.
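If option (1) is taken, the skill-recall branch in `_tick` could collapse to something like the following (a sketch; the fields on `SkillProcessResult` are assumptions based only on the signature above):

```python
# Hypothetical call site for option (1); assumes SkillProcessResult
# exposes the final output and the augmented prompt (field names guessed).
result = await self.skill_orchestrator.process(
    output,
    prompt,
    self.current_config.cortex_llm,
    dispatch_om1=self.action_orchestrator.promise,
    max_rounds=3,
)
output, prompt = result.output, result.prompt
```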
```python
except Exception as exc:
    logging.error(f"Error in read_skill: {exc}")
    return {
        "tool_key": READ_SKILL_TOOL,
        "success": False,
        "content": f"Error reading skill: {exc}",
    }
```
This exception handler logs only the exception message via logging.error, which drops the traceback and makes production debugging harder. The rest of the codebase frequently uses logging.exception(...) inside broad except blocks to preserve stack traces. Switching to logging.exception here (and similarly in SkillLoader._load_all) would improve observability without changing behavior.
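For example, the handler above could become:

```python
except Exception as exc:
    # logging.exception records the full traceback at ERROR level
    logging.exception("Error in read_skill")
    return {
        "tool_key": READ_SKILL_TOOL,
        "success": False,
        "content": f"Error reading skill: {exc}",
    }
```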
```python
except Exception as exc:
    logging.error(f"Failed to load skill from {skill_file}: {exc}")
```
This broad exception handler similarly logs only the exception string, losing the traceback. Using logging.exception(...) here would align with existing patterns (e.g., actions/orchestrator.py) and make it much easier to diagnose malformed SKILL.md files in the field.
```diff
-except Exception as exc:
-    logging.error(f"Failed to load skill from {skill_file}: {exc}")
+except Exception:
+    logging.exception(f"Failed to load skill from {skill_file}")
```
```python
# Skill round: read_skill tool calls inject instructions before MCP
if self.skill_orchestrator and output is not None:
    skill_actions = [a for a in output.actions if a.type == "read_skill"]
    if skill_actions:
        succeeded_skill_calls = set()
        skill_prompt = prompt

        for skill_round_idx in range(3):
            current_skill_actions = [a for a in output.actions if a.type == "read_skill"]
            if not current_skill_actions:
                break

            # Dispatch OM1 actions immediately before skill recall
            om1_pre_skill = [
                a for a in output.actions if a.type != "read_skill" and not a.type.startswith("mcp_")
            ]
            if om1_pre_skill:
                await self.action_orchestrator.promise(om1_pre_skill)

            skill_results = self.skill_orchestrator.execute_read_skills(
                current_skill_actions, succeeded_skill_calls
            )
            if not skill_results:
                break

            skill_recall_prompt = self.skill_orchestrator.build_skill_recall_prompt(
                skill_prompt, skill_results
            )
            skill_prompt = skill_recall_prompt

            # Update the original prompt so MCP rounds inherit skill context
            prompt = skill_prompt

            if not self._is_generation_valid(cortex_generation, "skill recall prompt"):
                return

            try:
                skill_streamed_output = None
                async for skill_stream_output in self.current_config.cortex_llm.ask_stream(
                    skill_recall_prompt
                ):
                    if not self._is_generation_valid(cortex_generation, "skill streaming"):
                        return
                    if skill_stream_output is None:
                        continue
                    if skill_streamed_output is None:
                        skill_streamed_output = skill_stream_output
                    else:
                        skill_streamed_output.actions.extend(skill_stream_output.actions)
                output = skill_streamed_output
            except asyncio.CancelledError:
                logging.info("Skill recall LLM call cancelled")
                raise

            if output is None:
                break

if self.mcp_orchestrator and output is not None:
```
The new skill-recall branch in _tick() introduces a multi-round control-flow (dispatch OM1 actions, read skills, recall LLM, and then proceed into MCP). There’s no unit test exercising this behavior (including deduplication across rounds and ensuring the MCP recall prompt inherits the augmented prompt). Adding a focused test in tests/runtime/test_cortex.py that simulates an LLM returning read_skill actions and verifies the subsequent recall + prompt propagation would help prevent regressions.
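As a starting point, the two properties called out here (deduplication across rounds, and the recall prompt inheriting skill context) can be checked at the orchestrator level with a sketch like this one, reusing the helpers exercised in the tests below; a full runtime-level test would additionally need to drive `_tick` with a mocked LLM:

```python
def test_skill_recall_prompt_propagation():
    mock_skill = MagicMock()
    mock_skill.instructions = "Do X"

    orchestrator = SkillOrchestrator([])
    orchestrator._skills = {"test_skill": mock_skill}

    action = Action(type="read_skill", value='{"skill_name": "test_skill"}')
    succeeded = set()

    # Round 1: skill content should flow into the recall prompt.
    results = orchestrator.execute_read_skills([action], succeeded)
    recall_prompt = orchestrator.build_skill_recall_prompt("base prompt", results)
    assert "Do X" in recall_prompt  # assumption: instructions are embedded verbatim

    # Round 2: a repeated call is deduplicated, ending the loop.
    assert len(orchestrator.execute_read_skills([action], succeeded)) == 0
```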
```python
@pytest.mark.asyncio
async def test_execute_read_skills(mock_llm):
    # Setup mock skill
    mock_skill = MagicMock()
    mock_skill.instructions = "Do X"

    orchestrator = SkillOrchestrator([])
    orchestrator._skills = {"test_skill": mock_skill}

    action = Action(type="read_skill", value='{"skill_name": "test_skill"}')
    succeeded_calls = set()

    results = orchestrator.execute_read_skills([action], succeeded_calls)
    assert len(results) == 1
    assert results[0]["success"] is True
    assert results[0]["content"] == "Do X"
    assert orchestrator.call_signature(action) in succeeded_calls

    # Deduplication test
    results2 = orchestrator.execute_read_skills([action], succeeded_calls)
    assert len(results2) == 0


@pytest.mark.asyncio
async def test_build_skill_recall_prompt(mock_llm):
    orchestrator = SkillOrchestrator([])
```
These tests are marked asyncio and take a mock_llm fixture, but neither test awaits anything nor uses the fixture. This adds unnecessary event-loop overhead and can confuse readers about what’s actually asynchronous here. Consider making these plain synchronous tests and removing the unused fixture parameter (or add assertions that exercise async behavior if that’s the intent).
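Concretely, the first test could become a plain synchronous test, dropping the marker and the unused fixture:

```python
def test_execute_read_skills():
    mock_skill = MagicMock()
    mock_skill.instructions = "Do X"

    orchestrator = SkillOrchestrator([])
    orchestrator._skills = {"test_skill": mock_skill}

    action = Action(type="read_skill", value='{"skill_name": "test_skill"}')
    succeeded_calls = set()

    results = orchestrator.execute_read_skills([action], succeeded_calls)
    assert len(results) == 1
    assert results[0]["success"] is True

    # A repeated call with the same signature is deduplicated
    assert len(orchestrator.execute_read_skills([action], succeeded_calls)) == 0
```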
Codecov Report: ❌ Patch coverage is …
This pull request introduces a new "skills" system to the runtime, enabling the dynamic loading, management, and execution of user-defined skills via a new `read_skill` tool. The changes include new modules for loading and orchestrating skills, integration with the runtime configuration and cortex loop, and support for multi-round skill recall in the agent workflow.

Skills System Implementation:
- Added a `skills` subsystem with `SkillLoader`, `SkillEntry`, and `SkillOrchestrator` classes for loading skills from disk, parsing their metadata, and orchestrating multi-round skill recall using the `read_skill` tool. (src/skills/loader.py, src/skills/orchestrator.py, src/skills/__init__.py) [1] [2] [3]
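For illustration, a skill definition on disk might look like this (a sketch: the PR only states that SKILL.md frontmatter is parsed into `SkillEntry`; the frontmatter keys shown here are assumptions, with the body serving as the skill's instructions):

```markdown
---
name: summarize_logs            # assumed frontmatter key
description: Summarize recent logs for the operator.   # assumed frontmatter key
---

When asked about recent activity, read the latest log entries,
group them by subsystem, and report anomalies first.
```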
Runtime and Configuration Integration:
- Extended `RuntimeConfig` and `ModeConfig` to support a new `skills` field, and updated config loading and conversion logic to propagate skill lists from configuration files into the runtime, as sketched below. (src/runtime/config.py, src/runtime/converter.py) [1] [2] [3] [4] [5]
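A per-mode skills list in a runtime config might then look like this (a hypothetical shape; the surrounding keys and file format are assumptions, only the `skills` field itself is introduced by this PR):

```json
{
  "modes": {
    "default": {
      "skills": ["summarize_logs"]
    }
  }
}
```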
SkillOrchestratorinto theModeCortexRuntimeclass, initializing it when skills are present, updating the LLM tool schema to include theread_skillfunction, and ensuring proper cleanup on mode transitions. (src/runtime/cortex.py) [1] [2] [3] [4]Agent Workflow Enhancements:
read_skillactions are present, the orchestrator injects skill instructions into the prompt, recalls the LLM, and ensures downstream MCP rounds inherit the skill context. (src/runtime/cortex.py)These changes lay the foundation for modular, reusable skills that can be dynamically enabled per mode and invoked as part of the agent's tool-calling workflow.