diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md new file mode 100644 index 00000000..99162419 --- /dev/null +++ b/.claude/CLAUDE.md @@ -0,0 +1,306 @@ +# Claude Code Universal Behavior Guidelines + +## Overview + +This document defines universal behavior guidelines for Claude Code across all commands and workflows. These principles apply regardless of the specific command being executed. + +## Core Principles + +### 1. Complete Documentation +- Document every action you take in the appropriate JSON file +- Track all files created, modified, or deleted +- Capture task progress and status changes +- Include all relevant context, decisions, and assumptions +- Never assume information is obvious - document everything explicitly + +### 2. Consistent Output Format +- Always use the unified JSON schema (see below) +- Include all required fields for the relevant status +- Use optional fields as needed to provide additional context +- Validate JSON structure before completing work +- Ensure JSON is properly formatted and parseable + +### 3. Session Management & Stateful Resumption +- Claude Code provides a session ID that maintains conversation context automatically +- Always include `session_id` in your output to enable seamless continuation +- When resuming work from a previous session, include `parent_session_id` to link sessions +- The session ID allows Claude Code to preserve full conversation history +- If you need user input, the context is preserved via session ID +- Include enough detail in `session_summary` to understand what was accomplished +- Don't make the user repeat information - session maintains context + +### 4. Task Management +- Track all tasks in JSON output files (NOT in separate markdown files) +- Use hierarchical task IDs: "1.0" for parent, "1.1", "1.2" for children +- Track task status: pending, in_progress, completed, skipped, blocked +- Include task descriptions and any relevant notes +- Update task status as you work +- Document which tasks were completed in each session +- Note any tasks that were skipped and explain why +- When blocked, document the blocker clearly + +### 5. Query Management +- Save all queries to users in the session JSON file +- When querying users, include: + - Clear, specific questions + - Query type (text, multiple_choice, boolean) + - Any relevant context needed to answer + - Query number for reference +- Save user responses in the same JSON file +- Link queries and responses with query numbers + +## File Organization Structure + +All agent-related documents and files must be organized under the `agent-io` directory: + +``` +agent-io/ +├── prds/ +│ └── <prd-name>/ +│ ├── humanprompt.md # Original user description of PRD +│ ├── fullprompt.md # Fully fleshed PRD after completion +│ └── data.json # JSON file documenting queries, responses, tasks, etc. +└── docs/ + └── <doc-name>.md # Architecture docs, usage docs, etc.
+``` + +### File Organization Guidelines: +- **PRD Files**: Save to `agent-io/prds/<prd-name>/` directory + - Each PRD gets its own directory named after the PRD + - Use kebab-case for PRD names (e.g., "user-profile-editing", "payment-integration") + - Directory contains: humanprompt.md, fullprompt.md, and data.json + - The data.json file tracks all queries, responses, tasks, errors, and progress + +- **PRD Storage and Reference**: + - **When user provides a prompt without a PRD name**: + - Analyze the prompt to create a descriptive PRD name (use kebab-case) + - Create directory: `agent-io/prds/<prd-name>/` + - Save the original user prompt to `agent-io/prds/<prd-name>/humanprompt.md` + - Document the PRD name in your output for future reference + - This allows users to reference this PRD by name in future sessions + + - **When user references an existing PRD by name**: + - Look for the PRD directory: `agent-io/prds/<prd-name>/` + - Read available PRD files in order of preference: + 1. `fullprompt.md` - the complete, finalized PRD (if available) + 2. `humanprompt.md` - the original user description + - Use these files as context for the requested work + - Update or create additional files as needed + + - **PRD Naming Best Practices**: + - Use descriptive, feature-focused names + - Keep names concise (2-4 words typically) + - Use kebab-case consistently + - Examples: "user-authentication", "payment-processing", "real-time-notifications" + +- **Documentation Files**: Save to `agent-io/docs/` + - Architecture documentation: `agent-io/docs/<name>-architecture.md` + - Usage documentation: `agent-io/docs/<name>-usage.md` + - Other documentation as appropriate + +- **Code Files**: Save to appropriate project locations + - Follow existing project structure + - Document each file in the JSON tracking file + - Include purpose and type for each file + +### JSON Documentation Files: +- Every PRD must have an associated `data.json` file in its directory +- The data.json file documents: + - Tasks and their status + - Queries to users and their responses + - Errors and problems encountered + - Files created, modified, deleted + - Session information and summaries + - Comments and context + +## Unified JSON Output Schema + +Use this schema for all JSON output files: + +```json +{ + "command_type": "string (create-prd | doc-code-for-dev | doc-code-usage | free-agent | generate-tasks)", + "status": "string (complete | incomplete | user_query | error)", + "session_id": "string - Claude Code session ID for this execution", + "parent_session_id": "string | null - Session ID of previous session when resuming work", + "session_summary": "string - Brief summary of what was accomplished", + + "tasks": [ + { + "task_id": "string (e.g., '1.0', '1.1', '2.0')", + "description": "string", + "status": "string (pending | in_progress | completed | skipped | blocked)", + "parent_task_id": "string | null", + "notes": "string (optional details about completion/issues)" + } + ], + + "files": { + "created": [ + { + "path": "string (relative to working directory)", + "purpose": "string (why this file was created)", + "type": "string (markdown | code | config | documentation)" + } + ], + "modified": [ + { + "path": "string", + "changes": "string (description of modifications)" + } + ], + "deleted": [ + { + "path": "string", + "reason": "string" + } + ] + }, + + "artifacts": { + "prd_filename": "string (for create-prd command)", + "documentation_filename": "string (for doc-code commands)" + }, + + "queries_for_user": [ + { + "query_number": "integer", + "query": "string", +
"type": "string (text | multiple_choice | boolean)", + "choices": [ + { + "id": "string", + "value": "string" + } + ], + "response": "string | null - User's response (populated after query is answered)" + } + ], + + "comments": [ + "string - important notes, warnings, observations" + ], + + "context": "string - optional supplementary state details. Session ID preserves full context automatically, so this field is only needed for additional implementation-specific state not captured in the conversation.", + + "metrics": { + "duration_seconds": "number (optional)", + "files_analyzed": "number (optional)", + "lines_of_code": "number (optional)" + }, + + "errors": [ + { + "message": "string", + "type": "string", + "fatal": "boolean" + } + ] +} +``` + +## Required Fields by Status + +### Status: "complete" +- `command_type`, `status`, `session_id`, `session_summary`, `files`, `comments` +- `parent_session_id` (if this session continues work from a previous session) +- Plus any command-specific artifacts (prd_filename, documentation_filename, etc.) +- `tasks` array if the command involves tasks + +### Status: "user_query" +- `command_type`, `status`, `session_id`, `session_summary`, `queries_for_user` +- `files` (for work done so far) +- `comments` (explaining why input is needed) +- `context` (optional - session_id maintains context automatically) +- Note: When user provides answers, they'll create a new session with `parent_session_id` linking back to this one + +### Status: "incomplete" +- `command_type`, `status`, `session_id`, `session_summary`, `files`, `comments` +- Explanation in `comments` of what's incomplete and why +- `errors` array if errors caused incompleteness +- `context` (optional - session_id maintains context automatically) + +### Status: "error" +- `command_type`, `status`, `session_id`, `session_summary`, `errors`, `comments` +- `files` (if any work was done before error) +- `context` (optional - for additional recovery details beyond what session maintains) + +## Error Handling + +When errors occur: +1. Set status to "error" (or "incomplete" if partial work succeeded) +2. Document the error in the `errors` array +3. Include what failed, why it failed, and potential fixes +4. Document any work that was completed before the error +5. Provide context for potential recovery +6. 
Save error details to the JSON file + +## Code Development Guidelines + +### Keep Code Simple +- Prefer simple, straightforward implementations over clever or complex solutions +- Write code that is easy to read and understand +- Avoid unnecessary abstractions or over-engineering +- Use clear, descriptive variable and function names +- Comment complex logic, but prefer self-documenting code + +### Limit Complexity +- Minimize the number of classes and Python files +- Consolidate related functionality into fewer, well-organized modules +- Only create new files when there's a clear separation of concerns +- Avoid deep inheritance hierarchies +- Prefer composition over inheritance when appropriate + +### Use JSON Schema Validation +- All JSON files must have corresponding JSON schemas +- Validate JSON files against their schemas +- Document the schema in comments or separate schema files +- Use schema validation to catch errors early +- Keep schemas simple and focused + +### Keep Code Management Simple +- Don't use excessive linting rules +- Avoid complex documentation frameworks (like Sphinx) unless truly needed +- Use simple, standard tools (pytest for testing, basic linting) +- Focus on clear code over extensive tooling +- Documentation should be clear markdown files, not generated sites + +## Best Practices + +- **Be Specific**: Include file paths, line numbers, function names +- **Be Complete**: Don't leave out details assuming the user knows them +- **Be Clear**: Write for someone who wasn't watching you work +- **Be Actionable**: Comments should help the user understand next steps +- **Be Honest**: If something is incomplete or uncertain, say so +- **Be Consistent**: Follow the same patterns and conventions throughout +- **Be Thorough**: Test your work and verify it functions correctly +- **Be Organized**: Maintain clean directory structure and file organization + +## Workflow Principles + +### PRD Workflow +1. User provides initial feature description → saved as `humanprompt.md` +2. Complete PRD after workflow → saved as `fullprompt.md` +3. All progress tracked in the PRD's `data.json` + +### Task Workflow +1. Break work into clear, manageable tasks +2. Use hierarchical task IDs (1.0, 1.1, 1.2, 2.0, etc.) +3. Update task status as work progresses +4. Document completed work and any blockers +5. Track everything in JSON file + +### Documentation Workflow +1. Understand the codebase or feature thoroughly +2. Create clear, well-organized documentation +3. Save to appropriate location in `agent-io/docs/` +4. Track file creation and content in JSON output +5. Include examples and practical guidance + +### Query Workflow +1. Only query when genuinely needed +2. Ask clear, specific questions +3. Save query to JSON file with query_number +4. Wait for user response +5. Save response to same JSON file +6. Continue work with provided information diff --git a/.claude/commands/analyze-email.md b/.claude/commands/analyze-email.md new file mode 100644 index 00000000..967e101a --- /dev/null +++ b/.claude/commands/analyze-email.md @@ -0,0 +1,282 @@ +# Command: analyze-email + +## Purpose + +Analyze an email document to extract key information, classify its importance, assign it to relevant projects, identify action items, and prepare a draft response. All analysis results are saved to a structured JSON file for downstream processing.
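+ +As a rough sketch of the downstream side, a consumer script (hypothetical; the directory layout and field names follow the analysis schema defined later in this command) might load the newest analysis file and surface urgent tasks: + +```python +import json +from pathlib import Path + +# Hypothetical consumer: pick the newest analysis file by sequence number. +analysis_dir = Path("orchestrator/email-analysis") +latest = sorted(analysis_dir.glob("*.json"))[-1] +analysis = json.loads(latest.read_text()) + +# Surface the high-priority action items extracted from the email. +for task in analysis.get("tasks", []): +    if task["urgency"] in ("critical", "high"): +        print(task["task_id"], task["description"], task["deadline"]["date"]) +```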
+ +## Command Type + +`analyze-email` + +## Input + +You will receive a request file containing: +- Email content (body, subject, sender, recipients) +- Email metadata (date, time, headers) +- User preferences (optional) + +## Process + +### Phase 1: Email Content Analysis + +1. **Read Email Document** + - Parse email subject, body, sender, recipients + - Extract metadata (date, time, CC, BCC if available) + - Identify attachments mentioned or referenced + - Note email thread context if provided + +2. **Extract Key Information** + - Identify main topics and themes + - Extract specific requests or questions + - Note mentioned dates, deadlines, or time-sensitive information + - Identify key stakeholders mentioned + - Extract any reference numbers, project codes, or identifiers + +### Phase 2: Classification + +3. **Classify Email Importance** + - Analyze content and metadata to classify as one of: + - **unimportant**: Mass emails, newsletters, low-priority updates, spam-like content + - **personal**: Personal correspondence, non-work related, social invitations + - **professional**: Work-related, business correspondence, project updates, actionable items + + - Consider these factors: + - Sender relationship (colleague, client, vendor, unknown) + - Subject urgency indicators (urgent, ASAP, deadline, etc.) + - Content type (FYI, action required, question, update) + - Presence of deadlines or action items + - Email thread importance + + - Provide classification confidence score (0.0-1.0) + - Document classification reasoning in comments + - Emails classified as unimportant should not proceed to further processing + +### Phase 3: Task Extraction + +4. **Identify Action Items** + - Scan email for explicit tasks: + - Action verbs (review, approve, send, create, update, etc.) + - Questions requiring responses + - Requests for information or deliverables + - Meeting requests or scheduling needs + + - For each identified task: + - Extract task description + - Determine task type (respond, review, create, schedule, research, etc.) + - Identify task owner (you, sender, other party) + - Extract related context and requirements + +5. **Determine Urgency and Deadlines** + - Analyze for urgency indicators: + - **Critical**: Explicit urgent markers, imminent deadlines (<24 hours), blocking issues + - **High**: Near-term deadlines (1-3 days), important stakeholders, escalations + - **Medium**: Standard deadlines (4-7 days), routine requests, normal priority + - **Low**: Long-term deadlines (>7 days), FYI items, optional tasks + + - Extract deadlines: + - Explicit dates ("by Friday", "before March 15") + - Implicit timeframes ("ASAP", "end of week", "Q1") + - Recurring deadlines ("weekly report", "monthly update") + + - Convert to standardized format (ISO 8601) + - If no deadline specified, suggest reasonable deadline based on urgency + +### Phase 4: Draft Response + +6. **Analyze Response Requirements** + - Determine if response is needed + - Identify key points to address + - Note any questions to answer + - Consider required tone (formal, casual, apologetic, etc.) + - Identify if response requires attachments or follow-up actions + +7. 
**Generate Draft Response** + - Create draft email response including: + - Appropriate greeting based on sender relationship + - Address all questions and requests + - Confirm understanding of tasks and deadlines + - Propose next steps if applicable + - Professional closing + + - Match tone to original email and relationship + - Keep response concise and actionable + - Include placeholders for information you don't have ([YOUR_INPUT_NEEDED]) + - Add suggested subject line (Re: or continuation) + + - If no response needed, set draft_response to null and explain why + +### Phase 5: Save Structured Output + +8. **Prepare JSON Output File** + - Determine sequence number for email analysis + - Check `orchestrator/email-analysis/` directory for existing analyses + - Use next sequential number (0001, 0002, 0003, etc.) + - If directory doesn't exist, create it and start at 0001 + +9. **Save Analysis File** + - Filename format: `orchestrator/email-analysis/[NNNN]-[YYYY-MM-DD]-[sender-name].json` + - Example: `orchestrator/email-analysis/0042-2025-11-09-john-smith.json` + - Use kebab-case for sender name + - Document the filename in JSON output's `artifacts.analysis_filename` + +## JSON Output Schema + +The analysis JSON file must follow this structure: + +```json +{ + "email_metadata": { + "subject": "string", + "sender": { + "name": "string", + "email": "string" + }, + "recipients": { + "to": ["email1@example.com", "email2@example.com"], + "cc": ["email3@example.com"], + "bcc": [] + }, + "date_received": "ISO 8601 datetime", + "thread_id": "string or null", + "message_id": "string or null", + "attachments": ["filename1.pdf", "filename2.xlsx"] + }, + + "classification": { + "category": "unimportant | personal | professional", + "confidence": 0.95, + "reasoning": "Detailed explanation of classification decision", + "urgency_level": "critical | high | medium | low", + "is_actionable": true, + "sentiment": "positive | neutral | negative | mixed" + }, + + "tasks": [ + { + "task_id": "T001", + "description": "Review and approve the Q4 budget proposal", + "task_type": "review | respond | create | schedule | research | approve | other", + "owner": "self | sender | other", + "urgency": "critical | high | medium | low", + "deadline": { + "date": "ISO 8601 datetime or null", + "is_explicit": true, + "original_text": "by end of week", + "suggested_deadline": "ISO 8601 datetime - if no explicit deadline" + }, + "status": "pending", + "context": "Additional context from email about this task", + "dependencies": ["T002"], + "estimated_effort": "15 minutes | 1 hour | 2 hours | 1 day | 1 week" + } + ], + + "draft_response": { + "should_respond": true, + "response_urgency": "immediate | today | this_week | no_rush", + "suggested_subject": "Re: Q4 Budget Review Request", + "draft_body": "Full draft email body with appropriate greeting, content, and closing", + "tone": "formal | professional | casual | friendly | apologetic", + "requires_attachments": false, + "placeholders": [ + { + "placeholder": "[YOUR_INPUT_NEEDED]", + "description": "Insert your availability for the meeting", + "location": "paragraph 2" + } + ], + "key_points_to_address": [ + "Confirm receipt of budget proposal", + "Provide timeline for review", + "Ask clarifying questions about line items" + ] + }, + + "summary": { + "one_line": "Budget approval request from Finance requiring review by Friday", + "detailed": "Longer summary (2-3 sentences) of email content and required actions", + "key_entities": [ + {"type": "person", "value": "Jane Doe"}, + 
{"type": "project", "value": "Q4 Budget Planning"}, + {"type": "document", "value": "Budget_Proposal_Q4.xlsx"}, + {"type": "date", "value": "2025-11-15"} + ] + }, + + "analysis_metadata": { + "analyzed_at": "ISO 8601 datetime", + "analysis_version": "1.0", + "model_used": "string", + "processing_time_seconds": 3.45, + "confidence_overall": 0.89, + "requires_human_review": false, + "review_reason": "string or null - why human review is needed" + } +} +``` + +## Command JSON Output Requirements + +Your command execution JSON output must include: + +**Required Fields:** +- `command_type`: "analyze-email" +- `status`: "complete", "user_query", or "error" +- `session_summary`: Brief summary of email analysis +- `files.created`: Array with the analysis JSON file entry +- `artifacts.analysis_filename`: Path to the analysis JSON file +- `artifacts.email_data`: Copy of the email_metadata for quick reference +- `comments`: Array of notes about the analysis process + +**For user_query status:** +- `queries_for_user`: Questions needing clarification +- `context`: Save partial analysis and email content + +**Example Comments:** +- "Email classified as professional with high confidence (0.95)" +- "Identified 3 action items with deadlines ranging from 2-5 days" +- "Draft response prepared; requires user input for meeting availability" +- "No explicit deadlines found; suggested deadlines based on urgency level" + +## Tasks to Track + +Create tasks in the internal todo list: + +``` +1.0 Parse and extract email content +2.0 Classify email importance and urgency +3.0 Extract tasks and deadlines +4.0 Generate draft response +5.0 Save structured JSON file +``` + +Mark tasks as completed as you progress. + +## Quality Checklist + +Before marking complete, verify: +- ✅ Email metadata completely extracted and validated +- ✅ Classification includes confidence score and reasoning +- ✅ All action items extracted with urgency and deadlines +- ✅ Deadlines converted to ISO 8601 format +- ✅ Draft response addresses all key points (if response needed) +- ✅ JSON file saved with correct naming and structure +- ✅ All required JSON schema fields populated +- ✅ Comments include insights about classification and task extraction +- ✅ Edge cases handled (no deadline, no clear tasks, etc.) + +## Error Handling + +Handle these scenarios gracefully: + +1. **Malformed Email**: Return error status with details +2. **No Clear Tasks**: Set tasks array to empty, note in comments +3. **Ambiguous Classification**: Use most likely category, lower confidence score +4. **No Response Needed**: Set draft_response.should_respond to false with explanation + +## Privacy and Security Considerations + +- Ensure sensitive information (passwords, SSNs, credentials) is not logged in comments +- Redact sensitive data in analysis file if present in email +- Document any sensitive content detected in analysis_metadata.requires_human_review +- Do not include full email body in command output JSON, only in analysis file diff --git a/.claude/commands/claude-commands-expert.md b/.claude/commands/claude-commands-expert.md new file mode 100644 index 00000000..55db696a --- /dev/null +++ b/.claude/commands/claude-commands-expert.md @@ -0,0 +1,204 @@ +# ClaudeCommands Expert + +You are an expert on the ClaudeCommands repository - a system for managing Claude Code commands and skills across multiple projects. You have deep knowledge of: + +1. **The CLI Tool** - `claude-commands` for installing, updating, and managing commands +2. 
**Command/Skill Development** - How to create new commands and expert skills +3. **System Architecture** - The two-file input pattern, unified JSON output, session management +4. **Deployment Workflow** - How commands are deployed to projects and ~/.claude + +## Knowledge Loading + +Before answering, read the relevant documentation from this repository: + +**Core Documentation:** +- `/Users/chenry/Dropbox/Projects/ClaudeCommands/README.md` - Overview and quick start +- `/Users/chenry/Dropbox/Projects/ClaudeCommands/docs/CLI.md` - CLI tool documentation +- `/Users/chenry/Dropbox/Projects/ClaudeCommands/docs/ARCHITECTURE.md` - System design + +**When needed:** +- `/Users/chenry/Dropbox/Projects/ClaudeCommands/SYSTEM-PROMPT.md` - Universal system instructions +- `/Users/chenry/Dropbox/Projects/ClaudeCommands/claude_commands.py` - CLI implementation + +## Quick Reference + +### Repository Structure +``` +ClaudeCommands/ +├── SYSTEM-PROMPT.md # Universal instructions for all commands +├── claude_commands.py # CLI tool implementation +├── setup.py # pip install configuration +├── commands/ # SOURCE command definitions +│ ├── create-prd.md +│ ├── free-agent.md +│ ├── msmodelutl-expert.md # Expert skill example +│ └── msmodelutl-expert/ # Context subdirectory +│ └── context/ +│ ├── api-summary.md +│ ├── patterns.md +│ └── integration.md +├── data/ # CLI runtime data +│ └── projects.json # Tracked projects +├── docs/ # Documentation +└── .claude/ # Local installation (for testing) + ├── CLAUDE.md + └── commands/ +``` + +### CLI Commands + +| Command | Purpose | +|---------|---------| +| `claude-commands install` | Install to ~/.claude (global) | +| `claude-commands addproject <dir>` | Add project and install commands | +| `claude-commands update` | Update all tracked projects | +| `claude-commands list` | List all tracked projects | +| `claude-commands removeproject <name>` | Stop tracking a project | + +### Creating a New Command + +1. Create `commands/<command-name>.md`: +```markdown +# Command: <command-name> + +## Purpose +What this command does + +## Command Type +`<command-name>` + +## Core Directive +What Claude should do + +## Input +What the request file should contain + +## Process +Step-by-step execution + +## Output Requirements +What goes in the JSON output + +## Quality Checklist +Verification steps +``` + +### Creating an Expert Skill + +Expert skills provide domain-specific knowledge. Structure: + +``` +commands/ +├── <skill-name>.md # Main skill definition +└── <skill-name>/ # Optional context directory + └── context/ + ├── api-summary.md # Quick API reference + ├── patterns.md # Common usage patterns + └── integration.md # Integration with other modules +``` + +Main skill file template: +```markdown +# <Module> Expert + +You are an expert on <module>. You have deep knowledge of: +1. **Topic 1** - Description +2.
**Topic 2** - Description + +## Knowledge Loading +Before answering, read: +- `/path/to/documentation.md` +- `/path/to/source/code.py` (when needed) + +## Quick Reference +[Embedded patterns and common info] + +## Guidelines +How to respond to questions + +## User Request +$ARGUMENTS +``` + +### Deployment Flow + +``` +commands/ (source) + │ + ├── install ──────────────► ~/.claude/commands/ + │ + └── addproject ───────────► project/.claude/commands/ + │ + └── update ───────► All tracked projects +``` + +### Unified JSON Output Schema + +All commands produce output following this schema: +```json +{ + "command_type": "string", + "status": "complete|incomplete|user_query|error", + "session_id": "string", + "session_summary": "string", + "tasks": [...], + "files": { "created": [], "modified": [], "deleted": [] }, + "artifacts": {...}, + "queries_for_user": [...], + "comments": [...], + "errors": [...] +} +``` + +## Guidelines for Responding + +When helping users: + +1. **Be practical** - Provide working examples and commands +2. **Reference files** - Point to specific files in the repository +3. **Explain the flow** - Show how components connect +4. **Warn about pitfalls** - Mention common mistakes + +## Response Formats + +### For "how do I" questions: +``` +### Approach + +Brief explanation + +**Step 1:** Description +```bash +command or code +``` + +**Step 2:** Description +... + +**Files involved:** List of relevant files +``` + +### For architecture questions: +``` +### Overview + +Brief explanation of the component/concept + +### How It Works + +1. First... +2. Then... + +### Key Files + +- `path/to/file.md` - Purpose +- `path/to/code.py` - Purpose + +### Example + +Working example +``` + +## User Request + +$ARGUMENTS diff --git a/.claude/commands/claude-commands-expert/context/architecture.md b/.claude/commands/claude-commands-expert/context/architecture.md new file mode 100644 index 00000000..b04e140c --- /dev/null +++ b/.claude/commands/claude-commands-expert/context/architecture.md @@ -0,0 +1,282 @@ +# ClaudeCommands Architecture + +## System Overview + +ClaudeCommands is a framework for running Claude Code in headless mode with structured input/output and comprehensive documentation. + +``` +┌─────────────────────────────────────────────────────────────┐ +│ ClaudeCommands Repository │ +│ │ +│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ +│ │ SYSTEM- │ │ commands/ │ │ data/ │ │ +│ │ PROMPT.md │ │ *.md │ │ projects.json│ │ +│ │ │ │ */context/ │ │ │ │ +│ │ (Universal │ │ (Command & │ │ (Tracked │ │ +│ │ instructions)│ │ Skill defs) │ │ projects) │ │ +│ └──────────────┘ └──────────────┘ └──────────────┘ │ +│ │ │ │ +│ └────────┬────────┘ │ +│ │ │ +│ ▼ │ +│ ┌──────────────┐ │ +│ │claude_ │ │ +│ │commands.py │ ◄───── CLI Tool │ +│ │ │ │ +│ └──────────────┘ │ +│ │ │ +│ ┌───────┴───────┐ │ +│ │ │ │ +│ ▼ ▼ │ +│ ┌────────────┐ ┌────────────────┐ │ +│ │ ~/.claude/ │ │ project/.claude│ │ +│ │ (Global) │ │ (Per-project) │ │ +│ └────────────┘ └────────────────┘ │ +└─────────────────────────────────────────────────────────────┘ +``` + +## Core Components + +### 1. SYSTEM-PROMPT.md + +Universal instructions that apply to ALL command executions. + +**Purpose:** Define the "rules of the game" - output format, documentation requirements, session management. 
+ +**Key sections:** +- Core Principles (documentation, output format, session management) +- Unified JSON Output Schema +- File Organization Structure +- Error Handling +- Best Practices + +**Deployed as:** `CLAUDE.md` in target directories + +### 2. Command Files (commands/*.md) + +Define WHAT to do for specific command types. + +**Categories:** +- **Task Commands:** create-prd, generate-tasks, free-agent +- **Documentation Commands:** doc-code-for-dev, doc-code-usage +- **Expert Skills:** msmodelutl-expert, claude-commands-expert + +**Structure:** +```markdown +# Command/Skill Name +## Purpose +## Command Type (for commands) or Knowledge Loading (for skills) +## Process/Guidelines +## Output Requirements/Response Format +## Quality Checklist +``` + +### 3. CLI Tool (claude_commands.py) + +Manages deployment of commands to projects. + +**Key methods:** +```python +class ClaudeCommandsCLI: + def install(self) # → ~/.claude/ + def addproject(self, dir) # → project/.claude/ + def update(self) # → All tracked projects + def list(self) # → Show tracked projects + def removeproject(self, name) +``` + +**Deployment logic:** +```python +def _copy_files_to_project(self, project_path): + # 1. Copy SYSTEM-PROMPT.md → .claude/CLAUDE.md + shutil.copy2(self.system_prompt, target_prompt) + + # 2. Copy entire commands/ → .claude/commands/ + shutil.copytree(self.commands_dir, target_commands) + # (Preserves subdirectories for skills with context) +``` + +### 4. Project Tracking (data/projects.json) + +Tracks which projects have commands installed. + +```json +{ + "project-name": "/absolute/path/to/project", + "another-project": "/path/to/another" +} +``` + +## Information Flow + +### Headless Execution + +``` +┌─────────────────────────────────────────────────────────────┐ +│ Claude Code CLI │ +│ │ +│ Inputs: │ +│ ├─ --system-prompt .claude/CLAUDE.md │ +│ ├─ --command .claude/commands/.md │ +│ └─ --request request.json │ +│ │ +│ Execution: │ +│ ├─ Reads and follows system prompt │ +│ ├─ Follows command-specific instructions │ +│ ├─ Creates artifacts (PRDs, docs, code) │ +│ └─ Documents everything │ +│ │ +│ Outputs: │ +│ ├─ claude-output.json (complete execution record) │ +│ └─ [artifacts] (files created by command) │ +└─────────────────────────────────────────────────────────────┘ +``` + +### Skill Invocation + +``` +┌─────────────────────────────────────────────────────────────┐ +│ Claude Code IDE/CLI │ +│ │ +│ User types: /skill-name How do I do X? │ +│ │ +│ Claude: │ +│ ├─ Loads .claude/commands/skill-name.md │ +│ ├─ Follows Knowledge Loading instructions │ +│ ├─ Reads referenced documentation (dynamic) │ +│ ├─ Uses Quick Reference (static) │ +│ └─ Responds following Guidelines │ +│ │ +│ Output: Conversational response with examples │ +└─────────────────────────────────────────────────────────────┘ +``` + +## Deployment Architecture + +### Global Installation (~/.claude/) + +Available in all projects (fallback when project-level not present). + +``` +~/.claude/ +├── CLAUDE.md # System prompt +└── commands/ + ├── command1.md + ├── skill1.md + └── skill1/ + └── context/ + └── *.md +``` + +### Project Installation (project/.claude/) + +Project-specific, takes precedence over global. + +``` +my-project/ +├── .claude/ +│ ├── CLAUDE.md # System prompt +│ └── commands/ +│ └── ... +└── (project files) +``` + +### Precedence + +1. Project-level (.claude/) - checked first +2. 
User-level (~/.claude/) - fallback + +## Unified JSON Output Schema + +All commands produce structured output: + +```json +{ + "command_type": "string", + "status": "complete|incomplete|user_query|error", + "session_id": "string", + "parent_session_id": "string|null", + "session_summary": "string", + + "tasks": [ + { + "task_id": "1.0", + "description": "string", + "status": "pending|in_progress|completed|skipped|blocked", + "parent_task_id": "string|null", + "notes": "string" + } + ], + + "files": { + "created": [{"path": "", "purpose": "", "type": ""}], + "modified": [{"path": "", "changes": ""}], + "deleted": [{"path": "", "reason": ""}] + }, + + "artifacts": { + "prd_filename": "string", + "documentation_filename": "string" + }, + + "queries_for_user": [ + { + "query_number": 1, + "query": "string", + "type": "text|multiple_choice|boolean", + "choices": [{"id": "", "value": ""}], + "response": "string|null" + } + ], + + "comments": ["string"], + "context": "string", + + "errors": [ + { + "message": "string", + "type": "string", + "fatal": true + } + ] +} +``` + +## Extension Points + +### Adding New Commands + +1. Create `commands/<command-name>.md` following template +2. Deploy with `claude-commands update` + +### Adding Expert Skills + +1. Create `commands/<skill-name>.md` with Knowledge Loading +2. Optionally add `commands/<skill-name>/context/` for reference docs +3. Deploy with `claude-commands update` + +### Modifying System Behavior + +1. Edit `SYSTEM-PROMPT.md` +2. Deploy with `claude-commands update` + +## Design Principles + +1. **Separation of Concerns** + - Universal rules → SYSTEM-PROMPT.md + - Command logic → commands/*.md + - User requests → request.json + +2. **Single Source of Truth** + - One system prompt for all commands + - One output schema for all outputs + +3. **Complete Documentation** + - Everything in JSON (user can't see terminal) + - All file operations tracked + - Session management for resumption + +4. **Centralized Management** + - Commands developed in one repo + - Deployed to many projects + - Single update pushes everywhere diff --git a/.claude/commands/claude-commands-expert/context/cli-reference.md b/.claude/commands/claude-commands-expert/context/cli-reference.md new file mode 100644 index 00000000..98c9fb30 --- /dev/null +++ b/.claude/commands/claude-commands-expert/context/cli-reference.md @@ -0,0 +1,187 @@ +# ClaudeCommands CLI Reference + +## Installation + +```bash +cd /path/to/ClaudeCommands +pip install -e . +``` + +This installs the `claude-commands` command globally. + +## Commands + +### install + +Install commands to user's home directory (~/.claude). + +```bash +claude-commands install +``` + +**What it does:** +1. Creates `~/.claude/` directory if missing +2. Copies `SYSTEM-PROMPT.md` to `~/.claude/CLAUDE.md` +3. Copies all commands (including subdirectories) to `~/.claude/commands/` + +**Prompts:** +- Asks before overwriting existing files + +### addproject + +Add a project to tracking and install commands. + +```bash +claude-commands addproject ~/my-project +``` + +**What it does:** +1. Validates the directory exists +2. Adds to tracking list (`data/projects.json`) +3. Creates `.claude/` directory in project +4. Copies `SYSTEM-PROMPT.md` to `.claude/CLAUDE.md` +5. Copies all commands to `.claude/commands/` + +**Name collision:** +- Projects are tracked by directory name +- Two projects with same directory name → error + +### update + +Update all tracked projects with latest commands. + +```bash +claude-commands update +``` + +**What it does:** +1.
Reads `data/projects.json` +2. For each project: + - Verifies directory exists + - Re-copies SYSTEM-PROMPT.md and commands +3. Reports success/warnings + +### list + +List all tracked projects. + +```bash +claude-commands list +``` + +**Output:** +``` +Tracked projects (3): + + ✓ my-project + /Users/me/code/my-project + + ✓ another-app + /Users/me/work/another-app + + ✗ deleted-project + /Users/me/old/deleted-project +``` + +- ✓ = directory exists +- ✗ = directory missing + +### removeproject + +Remove a project from tracking. + +```bash +claude-commands removeproject my-project +``` + +**What it does:** +1. Removes from `data/projects.json` +2. Does NOT delete `.claude/` directory + +## Project Tracking + +Projects tracked in `data/projects.json`: + +```json +{ + "my-project": "/Users/me/code/my-project", + "another-app": "/Users/me/work/another-app" +} +``` + +**Key points:** +- Project name = directory name (not path) +- Paths are absolute +- File is gitignored (local to machine) +- Don't edit manually - use CLI + +## File Structure After Installation + +### User-level (~/.claude/) +``` +~/.claude/ +├── CLAUDE.md # Universal system prompt +└── commands/ + ├── create-prd.md + ├── free-agent.md + ├── msmodelutl-expert.md + └── msmodelutl-expert/ + └── context/ + ├── api-summary.md + ├── patterns.md + └── integration.md +``` + +### Project-level (project/.claude/) +``` +my-project/ +├── .claude/ +│ ├── CLAUDE.md # Universal system prompt +│ └── commands/ +│ ├── create-prd.md +│ └── ... +└── (project files) +``` + +## Workflow Examples + +### Initial Setup +```bash +# Clone repo +git clone ClaudeCommands +cd ClaudeCommands + +# Install CLI +pip install -e . + +# Install to home directory +claude-commands install + +# Add your projects +claude-commands addproject ~/project1 +claude-commands addproject ~/project2 +``` + +### After Modifying Commands +```bash +# Edit a command +vim commands/my-command.md + +# Push to all projects +claude-commands update +``` + +### Adding New Project +```bash +claude-commands addproject ~/new-project +# Commands automatically installed +``` + +### Cleaning Up +```bash +# Remove project from tracking +claude-commands removeproject old-project + +# Manually delete .claude if desired +rm -rf ~/old-project/.claude +``` diff --git a/.claude/commands/claude-commands-expert/context/skill-development.md b/.claude/commands/claude-commands-expert/context/skill-development.md new file mode 100644 index 00000000..463023f9 --- /dev/null +++ b/.claude/commands/claude-commands-expert/context/skill-development.md @@ -0,0 +1,273 @@ +# Skill/Command Development Guide + +## Overview + +This repository manages two types of Claude Code extensions: + +1. **Commands** - Task-oriented instructions (create-prd, free-agent, etc.) +2. **Expert Skills** - Domain-specific knowledge assistants (msmodelutl-expert, etc.) + +## Command vs. Skill + +| Aspect | Command | Expert Skill | +|--------|---------|--------------| +| Purpose | Execute a specific task | Answer questions, provide guidance | +| Input | Request JSON file | User question (natural language) | +| Output | JSON + artifacts | Conversational response | +| Invocation | `claude code headless --command` | `/skill-name <question>` | +| Examples | create-prd, generate-tasks | msmodelutl-expert | + +## Creating a New Command + +### Step 1: Create Command File + +Create `commands/<command-name>.md`: + +```markdown +# Command: <command-name> + +## Purpose +Brief description of what this command does. + +## Command Type +`<command-name>` + +## Core Directive +You are a [role].
Your job is to [primary responsibility]. + +**YOUR JOB:** +- ✅ Task 1 +- ✅ Task 2 + +**DO NOT:** +- ❌ Anti-pattern 1 +- ❌ Anti-pattern 2 + +## Input +You will receive a request file containing: +- `field1`: Description +- `field2`: Description + +## Process + +### 1. First Step +Description of what to do. + +### 2. Second Step +Description of what to do. + +## Output Requirements +Describe what goes in the JSON output: +- Required fields +- Artifacts to create +- Files to document + +## Quality Checklist +- ✅ Verification 1 +- ✅ Verification 2 +``` + +### Step 2: Add to Schema + +Update `unified-output-schema.json`: + +```json +"command_type": { + "enum": [..., "your-new-command"] +} +``` + +### Step 3: Create Example + +Add `examples/<command-name>-example.json`: + +```json +{ + "request_type": "<command-name>", + "description": "Example request", + "context": { + "relevant_field": "value" + } +} +``` + +### Step 4: Deploy + +```bash +claude-commands update +``` + +## Creating an Expert Skill + +Expert skills provide domain expertise for answering questions. + +### Step 1: Create Main Skill File + +Create `commands/<skill-name>.md`: + +```markdown +# <Module> Expert + +You are an expert on <module>. You have deep knowledge of: + +1. **Area 1** - Description +2. **Area 2** - Description +3. **Area 3** - Description + +## Knowledge Loading + +Before answering, read the relevant documentation: + +**Always read:** +- `/path/to/main/documentation.md` + +**When needed:** +- `/path/to/source/code.py` +- `/path/to/additional/docs.md` + +## Quick Reference + +### Key Concept 1 +```python +# Code example +example_code() +``` + +### Key Concept 2 +Brief explanation with example. + +### Common Mistakes +1. **Mistake 1**: How to avoid it +2. **Mistake 2**: How to avoid it + +## Guidelines for Responding + +When helping users: + +1. **Be specific** - Reference exact functions, parameters +2. **Show examples** - Provide working code +3. **Explain why** - Not just what, but why +4.
**Warn about pitfalls** - Common mistakes + +## Response Formats + +### For API questions: +\``` +### Method: `method_name(params)` + +**Purpose:** Description + +**Parameters:** +- `param1` (type): Description + +**Returns:** Description + +**Example:** +```python +code +``` +\``` + +### For "how do I" questions: +\``` +### Approach + +Explanation + +**Step 1:** Description +```python +code +``` +\``` + +## User Request + +$ARGUMENTS +``` + +### Step 2: Create Context Directory (Optional) + +For skills with extensive reference material: + +``` +commands/ +├── <skill-name>.md +└── <skill-name>/ + └── context/ + ├── api-summary.md # Quick API reference + ├── patterns.md # Common usage patterns + └── integration.md # Integration with other systems +``` + +### Step 3: Choose Documentation Strategy + +**Static Context (embedded in skill):** +- Faster response time +- Requires manual updates when source changes +- Best for: stable APIs, patterns that rarely change + +**Dynamic Loading (read files on invocation):** +- Always current with source +- Slightly slower +- Best for: actively developed code + +**Hybrid Approach (recommended):** +- Static: patterns, common mistakes, integration info +- Dynamic: full API reference, source code + +Example hybrid: +```markdown +## Knowledge Loading + +Before answering, read the current documentation: +- `/path/to/developer-guide.md` # Dynamic - always current + +## Quick Reference +[Embedded patterns and common info] # Static - fast access +``` + +### Step 4: Deploy + +```bash +claude-commands update +``` + +## Skill Invocation + +After deployment, invoke with: + +``` +/skill-name How do I do X? +/skill-name What's the difference between A and B? +/skill-name Debug this code that's failing +``` + +## Best Practices + +### For Commands +1. Follow the standard template structure +2. Reference SYSTEM-PROMPT.md for output format (don't duplicate) +3. Include clear quality checklist +4. Document expected request format +5. Provide examples + +### For Expert Skills +1. Use dynamic loading for frequently-updated documentation +2. Embed common patterns for fast access +3. Include response format templates +4. Warn about common mistakes +5. Reference exact file paths + +### For Both +1. Keep files in `commands/` directory (source) +2. Use `claude-commands update` to deploy +3. Test locally before pushing to all projects +4. Document in comments what each file does + +## File Naming Conventions + +| Type | Pattern | Example | +|------|---------|---------| +| Command | `<verb>-<noun>.md` | `create-prd.md`, `doc-code-usage.md` | +| Expert Skill | `<module>-expert.md` | `msmodelutl-expert.md` | +| Context Dir | `<skill-name>/context/` | `msmodelutl-expert/context/` | diff --git a/.claude/commands/create-new-project.md b/.claude/commands/create-new-project.md new file mode 100644 index 00000000..b637c50c --- /dev/null +++ b/.claude/commands/create-new-project.md @@ -0,0 +1,637 @@ +# Command: create-new-project + +## Purpose + +Create a new project with complete setup including Cursor workspace configuration, virtual environment management, Claude commands installation, and optional git repository initialization. This command orchestrates multiple setup steps to create a fully configured development environment.
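+ +For orientation, the orchestrated setup amounts to roughly this manual sequence (paths are illustrative; the exact behavior of each phase is specified below): + +```bash +# Illustrative manual equivalent of create-new-project +mkdir -p ~/Projects/my-project && cd ~/Projects/my-project +git init +venvman add my-project ~/Projects/my-project +claude-commands addproject ~/Projects/my-project +```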
+ +## Command Type + +`create-new-project` + +## Input + +You will receive a request file containing: +- Project name (required) +- Project directory path (required - can be relative or absolute) +- Project type (optional: python, javascript, typescript, jupyter, multi-language) +- Initialize git repository (optional: boolean, default true) +- Python version for venv (optional: e.g., "3.11", default to system python3) +- Additional workspace folders (optional) +- Workspace settings preferences (optional) + +## Process + +### Phase 1: Create Project Directory + +1. **Setup Project Directory** + - Create project directory at specified path if it doesn't exist + - Convert to absolute path for consistency + - Verify write permissions + - Document the project path + +2. **Validate Project Name** + - Use provided project name or derive from directory name + - Sanitize for use in filenames (remove special characters) + - Check for conflicts with existing projects + - Document the final project name + +### Phase 2: Initialize Git Repository (Optional) + +3. **Git Initialization** + - If git initialization requested (default: true): + - Run: git init + - Create .gitignore file with common patterns + - Create initial commit with project structure + - Document git initialization status + - If git initialization skipped: + - Note in comments why it was skipped + - Continue with setup + +4. **Create .gitignore** + - Add common patterns based on project type: + - Python: venv/, __pycache__/, *.pyc, .pytest_cache/, *.egg-info/ + - JavaScript/Node: node_modules/, dist/, .cache/ + - Jupyter: .ipynb_checkpoints/, notebooks/datacache/ + - General: .DS_Store, .vscode/, *.swp, *.swo + - Claude: .claude/commands/ (managed by claude-commands) + - Keep: .claude/CLAUDE.md, .claude/settings.local.json + - Document .gitignore creation + +### Phase 3: Setup Virtual Environment with venvman + +5. **Register with venvman** + - Run: venvman add PROJECT_NAME PROJECT_PATH + - If Python version specified, create venv with that version + - If not specified, use system default python3 + - Document venvman registration + - Note the virtual environment path + +6. **Activate and Setup Python Environment** + - If project type is Python or Jupyter: + - Install basic dependencies (pip, setuptools, wheel) + - Create requirements.txt if it doesn't exist + - Document Python setup + - If not Python project: + - Note that venv was created but is optional + +### Phase 4: Install Claude Commands + +7. **Register with claude-commands** + - Run: claude-commands addproject PROJECT_PATH + - This installs SYSTEM-PROMPT.md to .claude/CLAUDE.md + - Installs all command files to .claude/commands/ + - Document claude-commands registration + - Count and list installed commands + +### Phase 5: Create Cursor Workspace + +8. **Generate Workspace File** + - Create workspace file: `!<project-name>.code-workspace` (the literal `!` prefix sorts the file to the top of the directory) + - Include current directory as primary folder + - Configure settings based on project type + - Add file exclusions (venv/, node_modules/, __pycache__, etc.) + - Add search exclusions for performance + - Document workspace creation + +9. **Configure Project-Specific Settings** + - Python projects: Black formatter, pytest, type checking + - JavaScript/TypeScript: Prettier, ESLint + - Jupyter: Notebook settings, output limits + - Add extension recommendations + - Document all settings configured + +### Phase 6: Create Project Structure + +10.
**Create Standard Directories** + - ALWAYS create: agent-io/ directory for Claude command tracking files + - Based on project type, create: + - Python: src/, tests/, docs/ + - Jupyter: notebooks/, notebooks/data/, notebooks/datacache/, notebooks/genomes/, notebooks/models/, notebooks/nboutput/, notebooks/util.py + - JavaScript: src/, tests/, dist/ + - General: docs/, README.md + - Document directory structure created + +11. **Create Initial Files** + - README.md with project name and description + - requirements.txt (for Python projects) + - package.json (for JavaScript projects) + - For Jupyter: notebooks/util.py with NotebookUtil template + - Document files created + +### Phase 7: Finalize Setup + +12. **Create Initial Git Commit (if git enabled)** + - Stage all created files + - Create commit: "Initial project setup: PROJECT_NAME" + - Include setup details in commit message + - Document commit creation + +13. **Generate Setup Summary** + - List all tools registered (venvman, claude-commands) + - List all files and directories created + - Provide next steps for user + - Document complete setup status + +### Phase 8: Save Structured Output + +14. **Save JSON Tracking File** + - IMPORTANT: Save all agent-io output to the NEW project directory, NOT the current working directory + - Create agent-io/ directory in the new project if it doesn't exist + - Save tracking JSON to: NEW_PROJECT_PATH/agent-io/create-new-project-session-SESSIONID.json + - Document all setup steps completed + - List all files and directories created + - Record all command executions + - Note any errors or warnings + - Include completion status + +## JSON Output Schema + +```json +{ + "command_type": "create-new-project", + "status": "complete | incomplete | user_query | error", + "session_id": "string", + "parent_session_id": "string | null", + "session_summary": "Brief summary of project creation", + + "project": { + "name": "string - project name", + "path": "string - absolute path to project", + "type": "python | javascript | typescript | jupyter | multi-language | other" + }, + + "git": { + "initialized": true, + "initial_commit": true, + "commit_hash": "string - git commit hash", + "gitignore_created": true + }, + + "venvman": { + "registered": true, + "command_run": "venvman add PROJECT_NAME PROJECT_PATH", + "venv_path": "string - path to virtual environment", + "python_version": "3.11" + }, + + "claude_commands": { + "registered": true, + "command_run": "claude-commands addproject .", + "commands_installed": 5, + "system_prompt_installed": true, + "commands_list": ["create-prd", "doc-code-for-dev", "doc-code-usage", "jupyter-dev", "cursor-setup"] + }, + + "workspace": { + "filename": "string - workspace file with ! 
prefix", + "path": "string - absolute path to workspace file", + "folders_count": 1, + "settings_configured": true, + "extensions_recommended": ["ms-python.python", "ms-toolsai.jupyter"] + }, + + "directories_created": [ + "agent-io/", + "src/", + "tests/", + "docs/", + "notebooks/", + "notebooks/data/", + "notebooks/datacache/", + "notebooks/genomes/", + "notebooks/models/", + "notebooks/nboutput/" + ], + + "files": { + "created": [ + { + "path": "!ProjectName.code-workspace", + "purpose": "Cursor workspace configuration", + "type": "config" + }, + { + "path": ".gitignore", + "purpose": "Git ignore patterns", + "type": "config" + }, + { + "path": "README.md", + "purpose": "Project documentation", + "type": "documentation" + }, + { + "path": "requirements.txt", + "purpose": "Python dependencies", + "type": "config" + }, + { + "path": ".claude/CLAUDE.md", + "purpose": "Claude system prompt", + "type": "documentation" + }, + { + "path": "agent-io/create-new-project-session-SESSIONID.json", + "purpose": "Claude command execution tracking for this session", + "type": "tracking" + } + ], + "modified": [] + }, + + "artifacts": { + "project_path": "absolute path to project", + "workspace_file": "path to workspace file", + "readme_file": "path to README.md", + "tracking_file": "agent-io/create-new-project-session-SESSIONID.json" + }, + + "next_steps": [ + "Open workspace: code !ProjectName.code-workspace", + "Activate venv: venvman activate ProjectName", + "Install dependencies: pip install -r requirements.txt", + "Start developing!" + ], + + "comments": [ + "Created project directory at /path/to/project", + "Created agent-io/ directory for Claude command tracking", + "Initialized git repository with initial commit", + "Registered with venvman using Python 3.11", + "Installed 5 Claude commands to .claude/commands/", + "Created Cursor workspace with Python settings", + "Created standard Python project structure (src/, tests/, docs/)", + "Generated README.md and requirements.txt", + "Saved tracking JSON to NEW_PROJECT_PATH/agent-io/" + ], + + "queries_for_user": [], + + "errors": [] +} +``` + +## Command JSON Output Requirements + +**Required Fields:** +- `command_type`: "create-new-project" +- `status`: "complete", "user_query", or "error" +- `session_id`: Session ID for this execution +- `session_summary`: Brief summary of project creation +- `project`: Project details (name, path, type) +- `git`: Git initialization status +- `venvman`: Virtual environment registration +- `claude_commands`: Claude commands registration +- `workspace`: Cursor workspace details +- `directories_created`: List of directories created +- `files`: All files created +- `artifacts`: Key file paths +- `next_steps`: User guidance for next actions +- `comments`: Detailed notes about setup process + +**For user_query status:** +- `queries_for_user`: Questions needing clarification +- `context`: Save partial setup state + +**Example Comments:** +- "Created new project 'MetabolicModeling' at ~/Projects/MetabolicModeling" +- "Initialized git repository with initial commit (abc123f)" +- "Registered with venvman using Python 3.11 at ~/Projects/MetabolicModeling/venv" +- "Installed 5 Claude commands to .claude/commands/" +- "Created Cursor workspace: !MetabolicModeling.code-workspace" +- "Created Jupyter notebook structure with util.py template" +- "Generated .gitignore with Python and Jupyter patterns" + +## .gitignore Template + +### Python Projects +``` +# Python +__pycache__/ +*.py[cod] +*$py.class +*.so +.Python +venv/ +env/ 
+ENV/ +.venv +pip-log.txt +pip-delete-this-directory.txt +.pytest_cache/ +.coverage +htmlcov/ +*.egg-info/ +dist/ +build/ + +# Jupyter +.ipynb_checkpoints/ +notebooks/datacache/ + +# IDE +.vscode/ +.idea/ +*.swp +*.swo + +# OS +.DS_Store +Thumbs.db + +# Claude (commands are managed by claude-commands) +.claude/commands/ + +# Agent-IO (Claude command tracking - keep in git for project history) +# agent-io/ is intentionally tracked + +# Keep these +!.claude/CLAUDE.md +!.claude/settings.local.json +``` + +### JavaScript/Node Projects +``` +# Node +node_modules/ +npm-debug.log* +yarn-debug.log* +yarn-error.log* +.pnpm-debug.log* +dist/ +build/ +.cache/ + +# IDE +.vscode/ +.idea/ +*.swp +*.swo + +# OS +.DS_Store +Thumbs.db + +# Claude +.claude/commands/ + +# Agent-IO (keep in git) +# agent-io/ is intentionally tracked + +# Keep these +!.claude/CLAUDE.md +!.claude/settings.local.json +``` + +### Jupyter Projects +``` +# Jupyter +.ipynb_checkpoints/ +notebooks/datacache/ + +# Python +__pycache__/ +*.py[cod] +venv/ +*.egg-info/ + +# Data (keep structure, ignore large files) +notebooks/data/*.csv +notebooks/data/*.tsv +notebooks/data/*.xlsx +notebooks/genomes/*.fasta +notebooks/genomes/*.gbk +notebooks/models/*.xml +notebooks/models/*.json +notebooks/nboutput/* + +# Keep these data directory files +!notebooks/data/.gitkeep +!notebooks/genomes/.gitkeep +!notebooks/models/.gitkeep + +# IDE +.vscode/ +.DS_Store + +# Claude +.claude/commands/ + +# Agent-IO (keep in git) +# agent-io/ is intentionally tracked + +# Keep these +!.claude/CLAUDE.md +``` + +## README.md Template + +```markdown +# PROJECT_NAME + +[Brief project description] + +## Setup + +This project was created with the `create-new-project` Claude command. + +### Prerequisites + +- Python 3.11+ (or appropriate version) +- venvman for virtual environment management +- claude-commands for Claude Code integration + +### Installation + +1. Activate the virtual environment: + ```bash + venvman activate PROJECT_NAME + ``` + +2. 
Install dependencies: + ```bash + pip install -r requirements.txt + ``` + +### Development + +Open the Cursor workspace: +```bash +code !PROJECT_NAME.code-workspace +``` + +### Project Structure + +- `agent-io/` - Claude command execution tracking and session history +- `src/` - Source code +- `tests/` - Test files +- `docs/` - Documentation +- `notebooks/` - Jupyter notebooks (if applicable) +- `.claude/` - Claude Code configuration (commands managed by claude-commands) + +### Claude Code Integration + +This project includes Claude Code integration: +- Command tracking stored in `agent-io/` for project history +- Commands automatically installed to `.claude/commands/` (managed by claude-commands) +- Update commands: `claude-commands update` + +## License + +[Add license information] +``` + +## Jupyter util.py Template + +For Jupyter projects, create notebooks/util.py: + +```python +import sys +import os +import json +from os import path + +# Add the parent directory to the sys.path +sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) +script_path = os.path.abspath(__file__) +script_dir = os.path.dirname(script_path) +base_dir = os.path.dirname(os.path.dirname(script_dir)) +folder_name = os.path.basename(script_dir) + +print(base_dir+"/KBUtilLib/src") +sys.path = [base_dir+"/KBUtilLib/src",base_dir+"/cobrakbase",base_dir+"/ModelSEEDpy/"] + sys.path + +# Import utilities with error handling +from kbutillib import NotebookUtils + +import hashlib +import pandas as pd +from modelseedpy import AnnotationOntology, MSPackageManager, MSMedia, MSModelUtil, MSBuilder, MSATPCorrection, MSGapfill, MSGrowthPhenotype, MSGrowthPhenotypes, ModelSEEDBiochem, MSExpression + +class NotebookUtil(NotebookUtils): + def __init__(self,**kwargs): + super().__init__( + notebook_folder=script_dir, + name="PROJECT_NAME", + user="chenry", + retries=5, + proxy_port=None, + **kwargs + ) + + # PLACE ALL UTILITY FUNCTIONS NEEDED FOR NOTEBOOKS HERE + +# Initialize the NotebookUtil instance +util = NotebookUtil() +``` + +## Quality Checklist + +Before marking complete, verify: +- ✅ Project directory created at specified path +- ✅ agent-io/ directory created in NEW project directory +- ✅ Git repository initialized (if requested) +- ✅ .gitignore created with appropriate patterns (agent-io/ kept in git) +- ✅ Initial git commit created (if git enabled) +- ✅ Registered with venvman successfully +- ✅ Virtual environment created with correct Python version +- ✅ Registered with claude-commands successfully +- ✅ Claude commands and SYSTEM-PROMPT installed to .claude/ +- ✅ Cursor workspace file created with exclamation prefix +- ✅ Workspace settings configured for project type +- ✅ Standard directory structure created +- ✅ README.md generated with project info +- ✅ requirements.txt or package.json created (if applicable) +- ✅ For Jupyter: notebooks/util.py created with project name +- ✅ All setup steps documented in comments +- ✅ Tracking JSON saved to NEW_PROJECT_PATH/agent-io/ directory +- ✅ Next steps provided for user + +## Error Handling + +Handle these scenarios gracefully: + +1. **Directory Already Exists**: Ask user whether to use existing or create new name +2. **Git Not Installed**: Skip git initialization, note in comments +3. **venvman Not Found**: Note error, continue with other setup steps +4. **claude-commands Not Found**: Note error, continue with other setup steps +5. **Permission Issues**: Document error and suggest manual fix +6. 
**Invalid Project Name**: Sanitize name and notify user of changes
+7. **Python Version Not Available**: Fall back to system default, note in comments
+
+## Command Execution Order
+
+Critical: Execute commands in this exact order to avoid conflicts:
+
+1. Create project directory
+2. Change to project directory
+3. Create agent-io/ directory
+4. Initialize git (optional)
+5. Create .gitignore
+6. Register with venvman
+7. Register with claude-commands
+8. Create workspace file
+9. Create directory structure (including agent-io/)
+10. Create initial files
+11. Create initial git commit (if enabled)
+12. Save tracking file to NEW_PROJECT_PATH/agent-io/
+
+## Integration Notes
+
+### venvman Integration
+- venvman stores virtual environments centrally
+- Command: `venvman add PROJECT_NAME PROJECT_PATH`
+- Activate with: `venvman activate PROJECT_NAME`
+- List all: `venvman list`
+
+### claude-commands Integration
+- Installs commands to .claude/commands/
+- Updates can be pulled with: `claude-commands update`
+- List tracked projects: `claude-commands list`
+
+### Cursor Workspace
+- Workspace file appears at top of directory (! prefix)
+- Open with: `code '!PROJECT_NAME.code-workspace'` (quoted to prevent shell history expansion)
+- Settings are project-specific and version-controlled
+
+## Privacy and Security Considerations
+
+- Don't include API keys or credentials in generated files
+- .gitignore should exclude sensitive data directories
+- README template should not expose internal paths
+- Virtual environment paths are local, not in git
+- .claude/commands/ excluded from git (managed by claude-commands)
+- Keep .claude/CLAUDE.md in git for project-specific settings
+
+## Next Steps After Project Creation
+
+Provide users with clear next steps:
+
+1. **Open Workspace**
+   ```bash
+   code '!PROJECT_NAME.code-workspace'
+   ```
+
+2. **Activate Virtual Environment**
+   ```bash
+   venvman activate PROJECT_NAME
+   ```
+
+3. **Install Dependencies**
+   ```bash
+   pip install -r requirements.txt
+   # or
+   npm install
+   ```
+
+4. **Start Development**
+   - Begin coding in src/
+   - Write tests in tests/
+   - Document in docs/
+   - For Jupyter: Create notebooks in notebooks/
+
+5. **Commit Changes**
+   ```bash
+   git add .
+   git commit -m "Add initial implementation"
+   ```
diff --git a/.claude/commands/create-prd.md b/.claude/commands/create-prd.md
new file mode 100644
index 00000000..e6794631
--- /dev/null
+++ b/.claude/commands/create-prd.md
@@ -0,0 +1,174 @@
+# Command: create-prd
+
+## Purpose
+
+Generate a comprehensive Product Requirements Document (PRD) from a user's feature request. The PRD should be clear, actionable, and suitable for a junior developer to understand and implement.
+
+## Command Type
+
+`create-prd`
+
+## Input
+
+You will receive a request file containing:
+- Initial feature description or request
+- Any existing context about the product/system
+- Target users or stakeholders
+
+## Process
+
+### Phase 1: Clarification
+
+1. **Analyze the Request**
+   - Read the feature request carefully
+   - Identify what information is provided
+   - Identify what critical information is missing
+
+2. **Ask Clarifying Questions** (if needed)
+   - Ask about problem/goal: "What problem does this feature solve?"
+   - Ask about target users: "Who is the primary user?"
+   - Ask about core functionality: "What are the key actions users should perform?"
+   - Ask for user stories: "As a [user], I want to [action] so that [benefit]"
+   - Ask about acceptance criteria: "How will we know this is successfully implemented?"
+   - Ask about scope: "What should this feature NOT do?"
+   - Ask about data requirements: "What data needs to be displayed or manipulated?"
+   - Ask about design/UI: "Are there mockups or UI guidelines?"
+   - Ask about edge cases: "What potential error conditions should we consider?"
+
+   **Important**: Only ask questions where the answer is not already clear from the request. Make reasonable assumptions and document them in comments.
+
+### Phase 2: PRD Generation
+
+3. **Generate PRD Markdown**
+   - Create a comprehensive PRD following the structure below
+   - Write for a junior developer audience
+   - Be explicit and unambiguous
+   - Avoid jargon where possible
+
+4. **Determine PRD Directory Name**
+   - Convert feature name to kebab-case
+   - Example: "User Profile Editing" → "user-profile-editing"
+
+5. **Save PRD Files**
+   - Create directory: `agent-io/prds/<prd-name>/`
+   - Save user's original request to: `agent-io/prds/<prd-name>/humanprompt.md`
+   - Save complete PRD to: `agent-io/prds/<prd-name>/fullprompt.md`
+   - Create JSON tracking file: `agent-io/prds/<prd-name>/data.json`
+   - Document the filename in JSON output's `artifacts.prd_filename`
+
+## PRD Structure
+
+Your PRD markdown file must include these sections:
+
+```markdown
+# PRD: [Feature Name]
+
+## Introduction/Overview
+Brief description of the feature and the problem it solves. State the primary goal.
+
+## Goals
+List specific, measurable objectives for this feature:
+1. [Goal 1]
+2. [Goal 2]
+3. [Goal 3]
+
+## User Stories
+Detail user narratives describing feature usage and benefits:
+
+**As a** [type of user]
+**I want to** [perform some action]
+**So that** [I can achieve some benefit]
+
+(Include 3-5 user stories)
+
+## Functional Requirements
+
+List specific functionalities the feature must have. Use clear, concise language. Number each requirement.
+
+1. The system must [specific requirement]
+2. The system must [specific requirement]
+3. Users must be able to [specific action]
+4. The feature must [specific behavior]
+
+## Non-Goals (Out of Scope)
+
+Clearly state what this feature will NOT include:
+- [Non-goal 1]
+- [Non-goal 2]
+- [Non-goal 3]
+
+## Design Considerations
+
+(Optional - include if relevant)
+- Link to mockups or design files
+- Describe UI/UX requirements
+- Mention relevant components or design system elements
+- Note accessibility requirements
+
+## Technical Considerations
+
+(Optional - include if relevant)
+- Known technical constraints
+- Dependencies on other systems or modules
+- Performance requirements
+- Security considerations
+- Scalability concerns
+
+## Success Metrics
+
+How will the success of this feature be measured?
+- [Metric 1: e.g., "Increase user engagement by 10%"]
+- [Metric 2: e.g., "Reduce support tickets related to X by 25%"]
+- [Metric 3: e.g., "90% of users complete the flow without errors"]
+
+## Open Questions
+
+List any remaining questions or areas needing further clarification:
+1. [Question 1]
+2. [Question 2]
+```
+
+## Tasks to Track
+
+Create tasks in the JSON output:
+
+```
+1.0 Clarify requirements (if questions needed)
+2.0 Generate PRD content
+3.0 Save PRD file
+```
+
+Mark tasks as completed as you progress.
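+
+For reference, here is a minimal sketch of how those tasks might appear in the PRD's `data.json` once work is underway, using the task fields from the unified schema (the status values and notes text are illustrative):
+
+```json
+{
+  "tasks": [
+    {
+      "task_id": "1.0",
+      "description": "Clarify requirements (if questions needed)",
+      "status": "completed",
+      "parent_task_id": null,
+      "notes": "No clarification needed; assumptions documented in comments"
+    },
+    {
+      "task_id": "2.0",
+      "description": "Generate PRD content",
+      "status": "completed",
+      "parent_task_id": null,
+      "notes": ""
+    },
+    {
+      "task_id": "3.0",
+      "description": "Save PRD file",
+      "status": "in_progress",
+      "parent_task_id": null,
+      "notes": ""
+    }
+  ]
+}
```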
+ +## JSON Output Requirements + +Your JSON output must include: + +**Required Fields:** +- `command_type`: "create-prd" +- `status`: "complete", "user_query", or "error" +- `session_summary`: Brief summary of PRD creation +- `files.created`: Array with the PRD file entry +- `artifacts.prd_filename`: Path to the PRD file +- `comments`: Array of notes (e.g., assumptions made, important decisions) + +**For user_query status:** +- `queries_for_user`: Your clarifying questions +- `context`: Save the initial request and any partial work + +**Example Comments:** +- "Assumed feature is for logged-in users only" +- "PRD written for web interface; mobile considerations noted as future enhancement" +- "No existing user authentication system mentioned; included as technical dependency" + +## Quality Checklist + +Before marking complete, verify: +- ✅ PRD includes all required sections +- ✅ Requirements are specific and measurable +- ✅ User stories follow the standard format +- ✅ Non-goals are clearly stated +- ✅ PRD is understandable by a junior developer +- ✅ File saved to correct location with correct naming +- ✅ JSON output includes all required fields +- ✅ All assumptions documented in comments diff --git a/.claude/commands/cursor-setup.md b/.claude/commands/cursor-setup.md new file mode 100644 index 00000000..526159f5 --- /dev/null +++ b/.claude/commands/cursor-setup.md @@ -0,0 +1,379 @@ +# Command: cursor-setup + +## Purpose + +Create a Cursor workspace file for the current project directory, enabling multi-root workspace features, custom settings, and organized project management in Cursor IDE. + +## Command Type + +`cursor-setup` + +## Input + +You will receive a request file containing: +- Project name (required) +- Additional workspace folders to include (optional) +- Workspace-specific settings (optional) +- Extensions to recommend (optional) + +## Process + +### Phase 1: Gather Project Information + +1. **Determine Project Name** + - Use project name from input request + - If not provided, derive from current directory name + - Sanitize name for filename use (remove special characters) + - Document the project name + +2. **Identify Project Structure** + - Examine current directory structure + - Identify key folders (src, tests, docs, etc.) + - Note any existing configuration files (.vscode, .cursor, etc.) + - Document project type (Python, Node.js, multi-language, etc.) + +### Phase 2: Create Workspace File + +3. **Generate Workspace Configuration** + - Create workspace file with naming pattern: EXCLAMATION-project-name.code-workspace + - The exclamation mark prefix ensures the file appears at top of directory listings + - Include current directory as primary folder + - Add any additional folders specified in request + - Configure workspace settings appropriate for project type + +4. **Configure Workspace Settings** + - Add workspace-level settings for: + - File associations + - Editor preferences + - Language-specific settings + - Search exclusions + - Extension recommendations + - Preserve any existing settings from .vscode/settings.json + - Document all settings added + +### Phase 3: Register with ClaudeCommands + +5. **Add Project to ClaudeCommands Database** + - Run the command: claude-commands addproject . 
+ - This registers the project directory in the ClaudeCommands tracking system + - Installs the latest Claude commands and SYSTEM-PROMPT.md to the project + - Document the registration in comments + - If the command fails, note the error but continue with workspace setup + +### Phase 4: Validate and Document + +6. **Validate Workspace File** + - Verify JSON structure is valid + - Ensure all paths are relative to workspace file location + - Check that workspace file can be opened in Cursor + - Document workspace structure + +7. **Create Documentation** + - Document workspace file location + - Explain workspace structure + - List any workspace-specific settings + - Provide usage instructions + +### Phase 5: Save Structured Output + +8. **Save JSON Tracking File** + - Document workspace file creation + - List all settings configured + - Note any issues or recommendations + - Include completion status + +## Workspace File Template + +The workspace file should follow this structure: + +```json +{ + "folders": [ + { + "path": ".", + "name": "" + } + ], + "settings": { + "files.exclude": { + "**/__pycache__": true, + "**/*.pyc": true, + "**/.pytest_cache": true, + "**/.DS_Store": true, + "**/node_modules": true, + "**/.git": false + }, + "search.exclude": { + "**/__pycache__": true, + "**/*.pyc": true, + "**/node_modules": true, + "**/.git": true + }, + "files.watcherExclude": { + "**/__pycache__/**": true, + "**/node_modules/**": true + } + }, + "extensions": { + "recommendations": [] + } +} +``` + +### Workspace Settings by Project Type + +**Python Projects:** +```json +{ + "python.analysis.typeCheckingMode": "basic", + "python.analysis.autoImportCompletions": true, + "[python]": { + "editor.defaultFormatter": "ms-python.black-formatter", + "editor.formatOnSave": true, + "editor.codeActionsOnSave": { + "source.organizeImports": true + } + }, + "files.exclude": { + "**/__pycache__": true, + "**/*.pyc": true, + "**/.pytest_cache": true + } +} +``` + +**Node.js/JavaScript Projects:** +```json +{ + "[javascript]": { + "editor.defaultFormatter": "esbenp.prettier-vscode", + "editor.formatOnSave": true + }, + "[typescript]": { + "editor.defaultFormatter": "esbenp.prettier-vscode", + "editor.formatOnSave": true + }, + "files.exclude": { + "**/node_modules": true, + "**/dist": true, + "**/.cache": true + } +} +``` + +**Jupyter Notebook Projects:** +```json +{ + "jupyter.notebookFileRoot": "${workspaceFolder}/notebooks", + "notebook.output.textLineLimit": 500, + "[python]": { + "editor.defaultFormatter": "ms-python.black-formatter" + }, + "files.exclude": { + "**/.ipynb_checkpoints": true, + "**/__pycache__": true + } +} +``` + +## JSON Output Schema + +```json +{ + "command_type": "cursor-setup", + "status": "complete | incomplete | user_query | error", + "session_id": "string", + "parent_session_id": "string | null", + "session_summary": "Brief summary of workspace setup", + + "project": { + "name": "string - project name", + "type": "python | javascript | typescript | jupyter | multi-language | other", + "workspace_filename": "string - filename with ! prefix" + }, + + "workspace": { + "folders": [ + { + "path": "string - relative path", + "name": "string - folder display name" + } + ], + "settings_count": 10, + "extensions_recommended": 3 + }, + + "claude_commands": { + "registered": true, + "command_run": "claude-commands addproject .", + "commands_installed": 5, + "system_prompt_installed": true + }, + + "files": { + "created": [ + { + "path": "string - workspace file with ! 
prefix", + "purpose": "Cursor workspace configuration", + "type": "config" + } + ], + "modified": [] + }, + + "artifacts": { + "workspace_filename": "string - workspace file with ! prefix", + "workspace_path": "absolute path to workspace file" + }, + + "comments": [ + "Created workspace file with name prefix '!' for top sorting", + "Configured Python-specific settings for project", + "Added file exclusions for __pycache__ and .pyc files", + "Workspace can be opened in Cursor via File > Open Workspace", + "Registered project with ClaudeCommands database", + "Installed 5 Claude commands to .claude/commands/" + ], + + "queries_for_user": [], + + "errors": [] +} +``` + +## Command JSON Output Requirements + +**Required Fields:** +- `command_type`: "cursor-setup" +- `status`: "complete", "user_query", or "error" +- `session_id`: Session ID for this execution +- `session_summary`: Brief summary of workspace creation +- `project`: Project name and workspace details +- `workspace`: Configuration details +- `claude_commands`: Registration status with ClaudeCommands database +- `files`: Workspace file created +- `artifacts`: Path to workspace file +- `comments`: Notes about workspace configuration + +**For user_query status:** +- `queries_for_user`: Questions about project structure or preferences +- `context`: Save partial workspace configuration + +**Example Comments:** +- "Created workspace file with exclamation prefix for top sorting" +- "Configured Python development settings with Black formatter" +- "Added exclusions for common Python cache directories" +- "Included notebooks/ folder as additional workspace folder" +- "Recommended extensions: Python, Jupyter, Black Formatter" + +## Workspace File Naming Convention + +The workspace file must be named with an exclamation mark prefix followed by the project name and .code-workspace extension. + +Format: EXCLAMATION + project-name + .code-workspace + +**Why the exclamation mark prefix?** +- Ensures workspace file appears at top of alphabetical directory listings +- Makes workspace file easy to find and identify +- Common convention for important configuration files +- Visual indicator of workspace root file + +**Examples:** +- Exclamation mark + MetabolicModeling.code-workspace +- Exclamation mark + ClaudeCommands.code-workspace +- Exclamation mark + WebsiteRedesign.code-workspace + +## Quality Checklist + +Before marking complete, verify: +- ✅ Workspace file created with exclamation mark prefix in filename +- ✅ JSON structure is valid and properly formatted +- ✅ Current directory included as primary folder +- ✅ Workspace settings appropriate for project type +- ✅ File exclusions configured to hide build artifacts +- ✅ Search exclusions configured for better performance +- ✅ Extension recommendations included (if applicable) +- ✅ All paths are relative to workspace file location +- ✅ Workspace file can be opened in Cursor +- ✅ Project registered with ClaudeCommands (claude-commands addproject .) +- ✅ Claude commands and SYSTEM-PROMPT installed to .claude/ directory +- ✅ Documentation includes usage instructions + +## Error Handling + +Handle these scenarios gracefully: + +1. **No Project Name**: Use current directory name as fallback +2. **Existing Workspace File**: Ask user whether to overwrite or merge +3. **Invalid Characters in Name**: Sanitize project name for filename +4. **Unknown Project Type**: Use generic workspace template +5. **Permission Issues**: Document if unable to write file +6. 
**ClaudeCommands Not Found**: Note error in comments, continue with workspace setup + +## Usage Instructions + +After creating workspace file, users can: + +1. **Open Workspace in Cursor** + - File > Open Workspace from File + - Select the workspace file (begins with exclamation mark) + - Or double-click the workspace file + +2. **Benefits of Workspace** + - Consistent settings across team members + - Multi-root folder support + - Workspace-specific extensions + - Organized project structure + - Easy project switching + +3. **Customization** + - Edit workspace file to add more folders + - Add custom tasks and launch configurations + - Configure language-specific settings + - Add extension recommendations + +## Advanced Workspace Features + +Optionally include these advanced features: + +**Tasks Configuration:** +```json +{ + "tasks": { + "version": "2.0.0", + "tasks": [ + { + "label": "Run Tests", + "type": "shell", + "command": "pytest", + "group": "test" + } + ] + } +} +``` + +**Launch Configurations:** +```json +{ + "launch": { + "version": "0.2.0", + "configurations": [ + { + "name": "Python: Current File", + "type": "python", + "request": "launch", + "program": "${file}" + } + ] + } +} +``` + +## Privacy and Security Considerations + +- Don't include absolute paths that expose user directory structure +- Use relative paths for all folder references +- Don't include API keys or credentials in workspace settings +- Don't commit sensitive workspace settings to version control +- Use workspace file for team-shared settings only diff --git a/.claude/commands/doc-code-for-dev.md b/.claude/commands/doc-code-for-dev.md new file mode 100644 index 00000000..3b883034 --- /dev/null +++ b/.claude/commands/doc-code-for-dev.md @@ -0,0 +1,312 @@ +# Command: doc-code-for-dev + +## Purpose + +Create comprehensive architecture documentation that enables developers (and AI agents) to understand, modify, and extend a codebase. This is internal documentation about HOW the code works, not how to USE it. + +## Command Type + +`doc-code-for-dev` + +## Core Directive + +**YOUR ONLY JOB**: Document and explain the codebase as it exists today. + +**DO NOT:** +- Suggest improvements or changes +- Perform root cause analysis +- Propose future enhancements +- Critique the implementation +- Recommend refactoring or optimization +- Identify problems + +**ONLY:** +- Describe what exists +- Explain where components are located +- Show how systems work +- Document how components interact +- Map the technical architecture + +## Input + +You will receive a request file containing: +- Path to the codebase to document +- Optional: Specific areas to focus on +- Optional: Known entry points or key files + +## What to Document + +### 1. Project Structure +- Directory organization and purpose +- File naming conventions +- Module relationships and dependencies +- Configuration file locations + +### 2. Architectural Patterns +- Overall design patterns (MVC, microservices, etc.) +- Key abstractions and their purposes +- Separation of concerns +- Layering strategy + +### 3. Component Relationships +- How modules interact +- Data flow between components +- Dependency graphs +- Service boundaries + +### 4. Data Models +- Core data structures and classes +- Database schemas (if applicable) +- State management approach +- Data persistence strategy + +### 5. Key Algorithms and Logic +- Where business logic lives +- Complex algorithms and their purposes +- Decision points and control flow +- Critical code paths + +### 6. 
Extension Points +- Plugin systems or hooks +- Abstract classes meant to be extended +- Configuration-driven behavior +- Where to add new features + +### 7. Internal APIs +- Private/internal interfaces between modules +- Service contracts +- Communication protocols +- Message formats + +### 8. Development Setup +- Build system and tools +- Testing framework +- Development dependencies +- How to run locally + +## Research Process + +1. **Map the Structure** + - Generate directory tree + - Identify purpose of each major directory + - Locate configuration files + - Find entry points (main files, index files) + +2. **Identify Core Components** + - What are the main modules/packages? + - What is each component responsible for? + - What are key classes and functions? + - How are components named? + +3. **Trace Data Flow** + - Follow data from entry point to storage + - Identify transformations + - Map processing stages + - Document state changes + +4. **Understand Patterns** + - What design patterns are used? + - How is state managed? + - How are errors handled? + - What conventions are followed? + +5. **Find Extension Mechanisms** + - Where can new features be added? + - What patterns should be followed? + - What interfaces need implementation? + - How are plugins/extensions loaded? + +6. **Document Build/Test** + - How to set up development environment + - How to run tests + - How to build/compile + - What tools are required + +## Documentation Structure + +Create a markdown file with this structure: + +```markdown +# [Project Name] - Architecture Documentation + +## Overview +High-level description of system architecture and design philosophy. +Include: What this system does, key technologies, architectural approach. + +## Project Structure +``` +project/ +├── module1/ # Purpose: [description] +│ ├── submodule/ # Purpose: [description] +│ └── core.py # [description] +├── module2/ # Purpose: [description] +└── tests/ # Purpose: [description] +``` + +## Core Components + +### Component: [Name] +- **Location**: `path/to/component` +- **Purpose**: [What this component does] +- **Key Classes/Functions**: + - `ClassName`: [Description and role] + - `function_name()`: [Description and role] +- **Dependencies**: [What it depends on] +- **Used By**: [What depends on it] + +[Repeat for each major component] + +## Architecture Patterns + +### Pattern: [Name] +- **Where Used**: [Locations in codebase] +- **Purpose**: [Why this pattern is used] +- **Implementation**: [How it's implemented] +- **Key Classes**: [Classes involved] + +## Data Flow + +### Flow: [Name] +``` +Entry Point → Component A → Component B → Storage +``` +- **Description**: [Detailed explanation] +- **Transformations**: [What happens at each stage] +- **Error Handling**: [How errors are managed] + +## Data Models + +### Model: [Name] +- **Location**: `path/to/model` +- **Purpose**: [What this represents] +- **Key Fields**: + - `field_name` (type): [Description] +- **Relationships**: [Relations to other models] +- **Persistence**: [How/where stored] + +## Module Dependencies + +``` +module1 + ├─ depends on: module2, module3 + └─ used by: module4 + +module2 + ├─ depends on: module3 + └─ used by: module1, module5 +``` + +## Key Algorithms + +### Algorithm: [Name] +- **Location**: `path/to/file:line_number` +- **Purpose**: [What problem it solves] +- **Input**: [What it takes] +- **Output**: [What it produces] +- **Complexity**: [Time/space if relevant] +- **Critical Details**: [Important notes] + +## Extension Points + +### Extension 
Point: [Name] +- **How to Extend**: [Instructions] +- **Required Interface**: [What must be implemented] +- **Examples**: [Existing implementations] +- **Integration**: [How extensions are registered] + +## State Management +- **Where State Lives**: [Description] +- **State Lifecycle**: [Creation, modification, destruction] +- **Concurrency**: [How concurrent access handled] +- **Persistence**: [How state is saved/loaded] + +## Error Handling Strategy +- **Exception Hierarchy**: [Custom exceptions] +- **Error Propagation**: [How errors bubble up] +- **Recovery Mechanisms**: [How failures handled] +- **Logging**: [Where errors are logged] + +## Testing Architecture +- **Test Organization**: [How tests structured] +- **Test Types**: [Unit, integration, e2e] +- **Fixtures and Mocks**: [Common utilities] +- **Running Tests**: [Commands to run tests] + +## Development Setup + +### Prerequisites +- [Required tools and versions] +- [System dependencies] + +### Setup Steps +1. [Clone and install] +2. [Configuration] +3. [Database setup if applicable] +4. [Verification] + +### Build System +- [Build commands] +- [Artifacts produced] +- [Build configuration] + +## Important Conventions +- [Naming conventions] +- [Code organization patterns] +- [Documentation standards] + +## Critical Files +- `file.py`: [Why important] +- `config.yaml`: [Configuration structure] +- `schema.sql`: [Database schema] + +## Glossary +- **Term**: [Definition in context of this codebase] +``` + +## Output Files + +1. **Save Documentation** + - Filename: `agent-io/docs/[project-name]-architecture.md` + - Create `agent-io/docs/` directory if it doesn't exist + - Use kebab-case for project name + +2. **Reference in JSON** + - Add to `artifacts.documentation_filename` + - Add to `files.created` array + +## JSON Output Requirements + +**Required Fields:** +- `command_type`: "doc-code-for-dev" +- `status`: "complete", "user_query", or "error" +- `session_summary`: Brief summary of documentation created +- `files.created`: Array with the documentation file +- `artifacts.documentation_filename`: Path to documentation +- `comments`: Important observations and notes + +**Optional Fields:** +- `metrics.files_analyzed`: Number of files examined +- `metrics.lines_of_code`: Total LOC in codebase + +**Example Comments:** +- "Analyzed 147 files across 12 modules" +- "Identified MVC pattern throughout web layer" +- "Found plugin system using abstract base classes" +- "Database uses SQLAlchemy ORM with 23 models" +- "Note: Some circular dependencies between auth and user modules" + +## Quality Checklist + +Before marking complete, verify: +- ✅ Complete project structure mapped with purposes +- ✅ All major components documented with responsibilities +- ✅ Architectural patterns identified and explained +- ✅ Data flow through system clearly traced +- ✅ Module dependencies visualized +- ✅ Extension points identified with examples +- ✅ Development setup instructions provided +- ✅ Key algorithms documented with locations +- ✅ State management strategy explained +- ✅ A developer can start contributing in < 30 minutes +- ✅ Documentation is in markdown format +- ✅ No suggestions for improvements (only documentation) diff --git a/.claude/commands/doc-code-usage.md b/.claude/commands/doc-code-usage.md new file mode 100644 index 00000000..c2451aa4 --- /dev/null +++ b/.claude/commands/doc-code-usage.md @@ -0,0 +1,403 @@ +# Command: doc-code-usage + +## Purpose + +Create comprehensive usage documentation that shows developers how to USE a codebase as a 
library, tool, or API. This is external-facing documentation for consumers of the code, not for those modifying it. + +## Command Type + +`doc-code-usage` + +## Core Directive + +**YOUR ONLY JOB**: Document how to use the code as it exists today. + +**DO NOT:** +- Document internal implementation details +- Explain code architecture or design patterns +- Suggest improvements or changes +- Document private methods or internal APIs +- Explain how to modify or extend the codebase + +**ONLY:** +- Document public APIs +- Show how to install and import +- Provide usage examples +- Document command-line interfaces +- Explain configuration options +- Document input/output formats + +## Input + +You will receive a request file containing: +- Path to the codebase to document +- Optional: Type of interface (library, CLI, API) +- Optional: Target audience (beginner, advanced) + +## What to Document + +### 1. Public APIs +- All public classes, functions, and methods +- Function signatures with parameter types +- Return types and values +- Exceptions that may be raised +- Usage examples for each major API + +### 2. Command-Line Interfaces +- All CLI commands and subcommands +- Flags, options, and arguments +- Input/output formats +- Usage examples +- Common workflows + +### 3. Configuration +- Configuration files and formats +- Environment variables +- Default values +- Required vs optional settings +- Configuration examples + +### 4. Entry Points +- Installation instructions +- Import statements +- Main entry points for different use cases +- Quick start guide +- First-run setup + +### 5. Data Formats +- Input data structures and schemas +- Output data structures and schemas +- File formats (if applicable) +- Data validation rules +- Example data + +### 6. Error Handling +- Common errors users might encounter +- Error messages and their meanings +- Exception types that may be raised +- How to handle errors +- Troubleshooting guide + +## Research Process + +1. **Identify Entry Points** + - Scan for main() functions + - Look for CLI definitions + - Find package exports + - Check setup.py, package.json, etc. + +2. **Map Public APIs** + - Find all public-facing modules + - Identify public classes and functions + - Distinguish public from private/internal + - Check for docstrings and type hints + +3. **Extract Signatures** + - Document all parameters with types + - Document return values + - Note any decorators + - Capture default values + +4. **Find Examples** + - Look in README files + - Check documentation folders + - Examine test files for usage patterns + - Find example directories + - Check docstrings for examples + +5. **Document Configuration** + - Find config files + - Identify environment variables + - Document all options + - Note defaults and requirements + +## Documentation Structure + +Create a markdown file with this structure: + +```markdown +# [Project Name] - Usage Documentation + +## Overview +Brief description of what this code does and who should use it. +Include: Purpose, key features, target users. 
+
+## Installation
+
+### Requirements
+- [Language/runtime version]
+- [Required dependencies]
+- [System requirements]
+
+### Install via [Package Manager]
+```bash
+[installation command]
+```
+
+### Install from Source
+```bash
+[clone and install commands]
+```
+
+## Quick Start
+
+[Minimal example to get started - 5-10 lines]
+
+```[language]
+# Simple example that demonstrates basic usage
+```
+
+## API Reference
+
+### Module: [module_name]
+
+#### Class: [ClassName]
+
+Brief description of what this class does.
+
+**Constructor**
+```[language]
+ClassName(param1: type, param2: type = default)
+```
+
+**Parameters:**
+- `param1` (type): Description
+- `param2` (type, optional): Description. Defaults to `default`.
+
+**Example:**
+```[language]
+# Example usage
+```
+
+#### Method: [method_name]
+
+Brief description of what this method does.
+
+```[language]
+method_name(param1: type, param2: type) -> return_type
+```
+
+**Parameters:**
+- `param1` (type): Description
+- `param2` (type): Description
+
+**Returns:**
+- `return_type`: Description of return value
+
+**Raises:**
+- `ExceptionType`: When this exception is raised
+
+**Example:**
+```[language]
+# Example usage
+```
+
+### Function: [function_name]
+
+Brief description of what this function does.
+
+```[language]
+function_name(param1: type, param2: type = default) -> return_type
+```
+
+**Parameters:**
+- `param1` (type): Description
+- `param2` (type, optional): Description. Defaults to `default`.
+
+**Returns:**
+- `return_type`: Description
+
+**Example:**
+```[language]
+# Example usage
+```
+
+## Command-Line Interface
+
+(Include this section if the code has a CLI)
+
+### Command: [command_name]
+
+Brief description of what this command does.
+
+**Usage:**
+```bash
+command_name [options] <arg>
+```
+
+**Options:**
+- `-f, --flag`: Description
+- `-o, --option <value>`: Description
+
+**Arguments:**
+- `<arg>`: Description (required)
+- `[arg]`: Description (optional)
+
+**Examples:**
+```bash
+# Example 1: Basic usage
+command_name file.txt
+
+# Example 2: With options
+command_name --flag --option value file.txt
+```
+
+## Configuration
+
+### Configuration File
+
+[Project Name] can be configured using `config.[ext]`:
+
+```[format]
+# Example configuration
+option1: value1
+option2: value2
+```
+
+**Options:**
+- `option1`: Description. Default: `default1`
+- `option2`: Description. Default: `default2`
+
+### Environment Variables
+
+- `ENV_VAR_NAME`: Description. Default: `default`
+- `ANOTHER_VAR`: Description. Required if [condition]
+
+## Data Formats
+
+### Input Format
+
+Description of expected input format.
+
+**Example:**
+```[format]
+{
+  "field1": "value1",
+  "field2": "value2"
+}
+```
+
+### Output Format
+
+Description of output format.
+
+**Example:**
+```[format]
+{
+  "result": "value",
+  "status": "success"
+}
+```
+
+## Error Reference
+
+### Common Errors
+
+**Error: [Error Message]**
+- **Cause**: Why this error occurs
+- **Solution**: How to fix it
+
+**Exception: [ExceptionType]**
+- **When**: When this exception is raised
+- **Handling**: How to catch and handle it
+- **Example**:
+```[language]
+try:
+    # code that might raise exception
+except ExceptionType as e:
+    # handle error
+```
+
+## Examples
+
+### Example 1: [Use Case Name]
+
+Description of this use case.
+
+```[language]
+# Complete working example
+```
+
+### Example 2: [Use Case Name]
+
+Description of this use case.
+ +```[language] +# Complete working example +``` + +## Advanced Usage + +(Optional section for complex features) + +### [Advanced Feature Name] + +Description and examples of advanced usage. + +## Troubleshooting + +**Problem**: [Common problem] +**Solution**: [How to solve it] + +**Problem**: [Another problem] +**Solution**: [How to solve it] + +## API Stability + +(If relevant) +- Note which APIs are stable vs experimental +- Deprecation warnings +- Version compatibility + +## Further Resources + +- Documentation: [link] +- Examples: [link] +- Community: [link] +``` + +## Output Files + +1. **Save Documentation** + - Filename: `agent-io/docs/[project-name]-usage.md` + - Create `agent-io/docs/` directory if it doesn't exist + - Use kebab-case for project name + +2. **Reference in JSON** + - Add to `artifacts.documentation_filename` + - Add to `files.created` array + +## JSON Output Requirements + +**Required Fields:** +- `command_type`: "doc-code-usage" +- `status`: "complete", "user_query", or "error" +- `session_summary`: Brief summary of documentation created +- `files.created`: Array with the documentation file +- `artifacts.documentation_filename`: Path to documentation +- `comments`: Important observations and notes + +**Optional Fields:** +- `metrics.files_analyzed`: Number of files examined +- Number of public APIs documented + +**Example Comments:** +- "Documented 47 public functions across 8 modules" +- "Found comprehensive CLI with 12 commands" +- "Note: Some functions have minimal docstrings - documented based on code analysis" +- "Configuration supports both .yaml and .json formats" +- "Library supports Python 3.8+" + +## Quality Checklist + +Before marking complete, verify: +- ✅ All public APIs documented with signatures and examples +- ✅ All CLI commands documented with usage examples +- ✅ Configuration options clearly explained +- ✅ Quick start guide enables first use in < 5 minutes +- ✅ Error reference covers common issues +- ✅ Documentation is organized and easy to navigate +- ✅ No internal/private implementation details leaked +- ✅ Examples are practical and copy-pasteable +- ✅ Installation instructions are clear +- ✅ Parameter types and return types documented diff --git a/.claude/commands/fbapkg-expert.md b/.claude/commands/fbapkg-expert.md new file mode 100644 index 00000000..b130f3ae --- /dev/null +++ b/.claude/commands/fbapkg-expert.md @@ -0,0 +1,253 @@ +# FBA Packages Expert + +You are an expert on the FBA package system (fbapkg) in ModelSEEDpy. This system provides modular constraint packages for Flux Balance Analysis. You have deep knowledge of: + +1. **Package Architecture** - MSPackageManager, BaseFBAPkg, and the package registration system +2. **Available Packages** - All 20+ FBA packages and their purposes +3. **Building Packages** - How to create and configure constraint packages +4. 
**Custom Constraints** - Adding variables and constraints to the FBA problem + +## Related Expert Skills + +- `/modelseedpy-expert` - General ModelSEEDpy overview and module routing +- `/msmodelutl-expert` - MSModelUtil (which owns pkgmgr) + +## Knowledge Loading + +Before answering, read relevant source files: + +**Core System:** +- `/Users/chenry/Dropbox/Projects/ModelSEEDpy/modelseedpy/fbapkg/mspackagemanager.py` +- `/Users/chenry/Dropbox/Projects/ModelSEEDpy/modelseedpy/fbapkg/basefbapkg.py` + +**Specific Packages (read as needed):** +- `/Users/chenry/Dropbox/Projects/ModelSEEDpy/modelseedpy/fbapkg/gapfillingpkg.py` +- `/Users/chenry/Dropbox/Projects/ModelSEEDpy/modelseedpy/fbapkg/kbasemediapkg.py` +- `/Users/chenry/Dropbox/Projects/ModelSEEDpy/modelseedpy/fbapkg/flexiblebiomasspkg.py` +- `/Users/chenry/Dropbox/Projects/ModelSEEDpy/modelseedpy/fbapkg/simplethermopkg.py` +- (others in `/Users/chenry/Dropbox/Projects/ModelSEEDpy/modelseedpy/fbapkg/`) + +## Quick Reference: Package System + +### Core Classes + +``` +MSPackageManager (singleton per model) + │ + ├── packages: Dict[str, BaseFBAPkg] # Active packages + ├── available_packages: Dict[str, Type] # All package classes + │ + └── Methods: + ├── get_pkg_mgr(model) [static] # Get/create manager + ├── getpkg(name, create=True) # Get/create package + ├── addpkgs([names]) # Add multiple packages + ├── list_available_packages() # All package names + └── list_active_packages() # Currently active + +BaseFBAPkg (base class) + │ + ├── model: cobra.Model # The model + ├── modelutl: MSModelUtil # Model utility + ├── pkgmgr: MSPackageManager # Package manager + ├── variables: Dict[type, Dict] # Package variables + ├── constraints: Dict[type, Dict] # Package constraints + │ + └── Methods: + ├── build_package(params) # Add constraints/vars + ├── build_variable(type, lb, ub) # Create variable + ├── build_constraint(type, lb, ub) # Create constraint + ├── clear() # Remove all pkg items + └── validate_parameters(...) 
# Check params +``` + +### Available Packages + +| Package | Purpose | Key Parameters | +|---------|---------|----------------| +| `KBaseMediaPkg` | Media constraints | `media`, `default_uptake`, `default_excretion` | +| `GapfillingPkg` | Gapfilling MILP | `templates`, `minimum_obj`, `reaction_scores` | +| `FlexibleBiomassPkg` | Flexible biomass | `bio_rxn_id`, `flex_coefficient` | +| `SimpleThermoPkg` | Simple thermo constraints | - | +| `FullThermoPkg` | Full thermodynamics | concentration bounds | +| `ReactionUsePkg` | Binary rxn usage vars | `reaction_list` | +| `RevBinPkg` | Reversibility binaries | - | +| `ObjectivePkg` | Objective management | `objective`, `maximize` | +| `ObjConstPkg` | Objective as constraint | `objective_value` | +| `TotalFluxPkg` | Minimize total flux | - | +| `BilevelPkg` | Bilevel optimization | inner/outer objectives | +| `ElementUptakePkg` | Element-based uptake | `element`, `max_uptake` | +| `ReactionActivationPkg` | Expression activation | `expression_data` | +| `ExpressionActivationPkg` | Gene expression | `expression_data` | +| `ProteomeFittingPkg` | Proteome fitting | `proteome_data` | +| `FluxFittingPkg` | Flux data fitting | `flux_data` | +| `MetaboFBAPkg` | Metabolomics FBA | `metabolite_data` | +| `DrainFluxPkg` | Drain reactions | `metabolites` | +| `ProblemReplicationPkg` | Problem copies | `num_replications` | +| `ChangeOptPkg` | Change optimizer | `solver` | + +## Common Patterns + +### Pattern 1: Access Package Manager +```python +from modelseedpy.core.msmodelutl import MSModelUtil +from modelseedpy.fbapkg import MSPackageManager + +# Via MSModelUtil (recommended) +mdlutl = MSModelUtil.get(model) +pkgmgr = mdlutl.pkgmgr + +# Direct access +pkgmgr = MSPackageManager.get_pkg_mgr(model) +``` + +### Pattern 2: Get or Create a Package +```python +# Creates if not exists +pkg = pkgmgr.getpkg("GapfillingPkg") + +# Check if exists first +pkg = pkgmgr.getpkg("GapfillingPkg", create_if_missing=False) +if pkg is None: + # Package not active + pass +``` + +### Pattern 3: Build Package with Parameters +```python +# Most packages follow this pattern +pkg = pkgmgr.getpkg("KBaseMediaPkg") +pkg.build_package({ + "media": my_media, + "default_uptake": 0, + "default_excretion": 100 +}) + +# Some have convenience methods +pkg.build_package(my_media) # Shorthand +``` + +### Pattern 4: Access Package Variables/Constraints +```python +pkg = pkgmgr.getpkg("ReactionUsePkg") +pkg.build_package({"reaction_list": model.reactions}) + +# Access binary variables +for rxn_id, var in pkg.variables["use"].items(): + print(f"{rxn_id}: {var.name}") + +# Access constraints +for name, const in pkg.constraints["use_const"].items(): + print(f"{name}: lb={const.lb}, ub={const.ub}") +``` + +### Pattern 5: Clear Package (Remove Constraints) +```python +pkg = pkgmgr.getpkg("GapfillingPkg") +pkg.clear() # Removes all variables and constraints added by this package +``` + +### Pattern 6: Create Custom Package +```python +from modelseedpy.fbapkg.basefbapkg import BaseFBAPkg + +class MyCustomPkg(BaseFBAPkg): + def __init__(self, model): + BaseFBAPkg.__init__( + self, + model, + "my_custom", # Package name + {"myvar": "reaction"}, # Variable types + {"myconst": "metabolite"} # Constraint types + ) + + def build_package(self, parameters): + self.validate_parameters(parameters, [], { + "param1": default_value + }) + + # Add variables + for rxn in self.model.reactions: + self.build_variable("myvar", 0, 1, "binary", rxn) + + # Add constraints + for met in self.model.metabolites: + coef = {var: 
1.0 for var in relevant_vars} + self.build_constraint("myconst", 0, 10, coef, met) +``` + +## Variable and Constraint Types + +### Variable Types (in build_variable) +- `"none"` - No cobra object (use count as name) +- `"string"` - cobra_obj parameter is a string name +- `"object"` - cobra_obj parameter is a cobra object (use .id) + +### Constraint Types (in build_constraint) +Same as variable types. + +### Variable Type Parameter (vartype) +- `"continuous"` - Standard continuous variable +- `"binary"` - 0/1 variable +- `"integer"` - Integer variable + +## Guidelines for Responding + +1. **Explain the purpose** - Why would someone use this package? +2. **Show build_package parameters** - What options are available? +3. **Provide working examples** - Complete, runnable code +4. **Explain optlang integration** - Variables/constraints go to model.solver +5. **Warn about interactions** - Some packages conflict or depend on others + +## Response Format + +### For package questions: +``` +### Package: `PackageName` + +**Purpose:** What it does + +**Key Parameters:** +- `param1` (type, default): Description +- `param2` (type, default): Description + +**Variables Added:** +- `vartype` - Description + +**Constraints Added:** +- `consttype` - Description + +**Example:** +```python +# Working example +``` + +**Interactions:** Notes on package interactions +``` + +### For "how do I" questions: +``` +### Approach + +Brief explanation of which package(s) to use. + +**Step 1:** Get/create the package +```python +code +``` + +**Step 2:** Configure and build +```python +code +``` + +**Step 3:** Run FBA with constraints +```python +code +``` + +**Notes:** Important considerations +``` + +## User Request + +$ARGUMENTS diff --git a/.claude/commands/fbapkg-expert/context/building-packages.md b/.claude/commands/fbapkg-expert/context/building-packages.md new file mode 100644 index 00000000..9fc7b1a3 --- /dev/null +++ b/.claude/commands/fbapkg-expert/context/building-packages.md @@ -0,0 +1,265 @@ +# Building Custom FBA Packages + +## Package Architecture + +FBA packages add variables and constraints to the COBRA model's solver (optlang). When you call `model.optimize()`, these constraints are active. + +``` +┌─────────────────────────────────────────────────────────────┐ +│ Your FBA Package │ +│ │ +│ ┌─────────────────┐ ┌─────────────────┐ │ +│ │ Variables │ │ Constraints │ │ +│ │ │ │ │ │ +│ │ build_variable()│ │ build_constraint│ │ +│ └────────┬────────┘ └────────┬────────┘ │ +│ │ │ │ +│ └──────────┬───────────┘ │ +│ │ │ +│ ▼ │ +│ ┌─────────────────────┐ │ +│ │ model.solver │ │ +│ │ (optlang) │ │ +│ └─────────────────────┘ │ +└─────────────────────────────────────────────────────────────┘ +``` + +## Step-by-Step: Creating a Package + +### Step 1: Define the Package Class + +```python +from modelseedpy.fbapkg.basefbapkg import BaseFBAPkg + +class MyCustomPkg(BaseFBAPkg): + """ + My custom FBA package for [purpose]. 
+ """ + + def __init__(self, model): + # Call parent constructor + BaseFBAPkg.__init__( + self, + model, + "my_custom", # Package name (used for registration) + # Variable types: {type_name: naming_scheme} + { + "myvar": "reaction", # Named by reaction.id + "auxvar": "none" # Named by count + }, + # Constraint types: {type_name: naming_scheme} + { + "myconst": "metabolite", # Named by metabolite.id + "bound": "string" # Named by provided string + } + ) + # Initialize package-specific state + self.my_data = {} +``` + +### Step 2: Implement build_package() + +```python +def build_package(self, parameters): + # Validate parameters (required list, defaults dict) + self.validate_parameters( + parameters, + ["required_param"], # Must be provided + { + "optional_param": 10, # Default values + "another_param": "default" + } + ) + + # Access validated parameters + threshold = self.parameters["optional_param"] + + # Build variables + for rxn in self.model.reactions: + if some_condition(rxn): + self.build_variable( + "myvar", # Type name + 0, # Lower bound + 1000, # Upper bound + "continuous", # Variable type + rxn # COBRA object for naming + ) + + # Build constraints + for met in self.model.metabolites: + # Define coefficients: {variable: coefficient} + coef = {} + for var_name, var in self.variables["myvar"].items(): + coef[var] = 1.0 + + self.build_constraint( + "myconst", # Type name + 0, # Lower bound + threshold, # Upper bound + coef, # Coefficients + met # COBRA object for naming + ) +``` + +### Step 3: Naming Schemes + +The naming scheme determines how variables/constraints are named: + +| Scheme | cobra_obj Parameter | Resulting Name | +|--------|-------------------|----------------| +| `"none"` | Ignored | `"1_myvar"`, `"2_myvar"`, ... | +| `"string"` | String value | `"mystring_myvar"` | +| `"reaction"` | Reaction object | `"rxn00001_c0_myvar"` | +| `"metabolite"` | Metabolite object | `"cpd00001_c0_myvar"` | + +### Step 4: Variable Types + +```python +# Continuous variable (default) +self.build_variable("myvar", 0, 1000, "continuous", rxn) + +# Binary variable (0 or 1) +self.build_variable("binvar", 0, 1, "binary", rxn) + +# Integer variable +self.build_variable("intvar", 0, 10, "integer", rxn) +``` + +### Step 5: Constraint Coefficients + +```python +# Constraint: sum(coef[i] * var[i]) between lb and ub +coef = { + var1: 1.0, + var2: -2.0, + rxn.forward_variable: 1.0, + rxn.reverse_variable: -1.0 +} +self.build_constraint("myconst", 0, 100, coef, met) +``` + +## Complete Example: Reaction Count Package + +This package limits the number of active reactions: + +```python +from modelseedpy.fbapkg.basefbapkg import BaseFBAPkg +from optlang.symbolics import Zero + +class ReactionCountPkg(BaseFBAPkg): + """ + Limits the total number of active reactions. 
+ """ + + def __init__(self, model): + BaseFBAPkg.__init__( + self, + model, + "reaction_count", + {"active": "reaction"}, # Binary per reaction + {"total": "none"} # Single constraint + ) + + def build_package(self, parameters): + self.validate_parameters( + parameters, + [], + {"max_reactions": 100} + ) + + max_rxns = self.parameters["max_reactions"] + + # Add binary variable for each reaction + for rxn in self.model.reactions: + if rxn.id.startswith("EX_"): + continue # Skip exchanges + + # Binary: 1 if reaction carries flux + var = self.build_variable("active", 0, 1, "binary", rxn) + + # Link to flux: flux <= M * active + M = 1000 # Big M + self.build_constraint( + "active_upper", + -M, # No lower bound + 0, # Upper bound + { + rxn.forward_variable: 1, + rxn.reverse_variable: 1, + var: -M + }, + rxn + ) + + # Total active reactions <= max + all_active = {v: 1 for v in self.variables["active"].values()} + self.build_constraint("total", 0, max_rxns, all_active, "total") + +# Usage: +pkg = pkgmgr.getpkg("ReactionCountPkg") +pkg.build_package({"max_reactions": 50}) +solution = model.optimize() +``` + +## Advanced: Accessing Solver Directly + +For complex operations, access optlang directly: + +```python +def build_package(self, parameters): + # Get solver interface + solver = self.model.solver + + # Create variable manually + from optlang import Variable + my_var = Variable("custom_name", lb=0, ub=100, type="continuous") + solver.add(my_var) + + # Create constraint manually + from optlang import Constraint + my_const = Constraint( + my_var + rxn.flux_expression, + lb=0, + ub=100, + name="custom_constraint" + ) + solver.add(my_const) + + # Update solver + solver.update() +``` + +## Package Registration + +Packages self-register when instantiated. The registration happens in `BaseFBAPkg.__init__`: + +```python +self.pkgmgr = MSPackageManager.get_pkg_mgr(model) +self.pkgmgr.addpkgobj(self) # Registers package +``` + +For custom packages not in modelseedpy: + +```python +# Add to available packages +pkgmgr.available_packages["MyCustomPkg"] = MyCustomPkg + +# Now getpkg works +pkg = pkgmgr.getpkg("MyCustomPkg") +``` + +## Best Practices + +1. **Clear before rebuild**: Call `self.clear()` if build_package may be called twice + +2. **Use validate_parameters**: Provides defaults and required checking + +3. **Track your objects**: Variables/constraints stored in `self.variables` and `self.constraints` + +4. **Name consistently**: Use COBRA object IDs when possible + +5. **Document parameters**: In docstring or class comments + +6. **Handle empty models**: Check list lengths before iterating + +7. **Update solver**: Call `self.model.solver.update()` after complex operations diff --git a/.claude/commands/fbapkg-expert/context/packages-reference.md b/.claude/commands/fbapkg-expert/context/packages-reference.md new file mode 100644 index 00000000..71b921b4 --- /dev/null +++ b/.claude/commands/fbapkg-expert/context/packages-reference.md @@ -0,0 +1,327 @@ +# FBA Packages Reference + +## Core Infrastructure + +### MSPackageManager +**File:** `fbapkg/mspackagemanager.py` + +Central registry for FBA packages. Singleton per model. 
+ +```python +from modelseedpy.fbapkg import MSPackageManager + +# Get or create manager +pkgmgr = MSPackageManager.get_pkg_mgr(model) + +# List packages +pkgmgr.list_available_packages() # All known packages +pkgmgr.list_active_packages() # Currently loaded + +# Get a package (creates if missing) +pkg = pkgmgr.getpkg("KBaseMediaPkg") + +# Get without creating +pkg = pkgmgr.getpkg("KBaseMediaPkg", create_if_missing=False) +``` + +### BaseFBAPkg +**File:** `fbapkg/basefbapkg.py` + +Base class all packages inherit from. + +**Constructor Parameters:** +- `model` - cobra.Model or MSModelUtil +- `name` - Package name string +- `variable_types` - Dict mapping type names to naming schemes +- `constraint_types` - Dict mapping type names to naming schemes + +**Key Methods:** +- `build_package(params)` - Override to add constraints +- `build_variable(type, lb, ub, vartype, cobra_obj)` - Create variable +- `build_constraint(type, lb, ub, coef, cobra_obj)` - Create constraint +- `clear()` - Remove all variables/constraints +- `validate_parameters(params, required, defaults)` - Check params + +--- + +## Media and Exchange Packages + +### KBaseMediaPkg +**File:** `fbapkg/kbasemediapkg.py` + +Sets exchange reaction bounds based on media definition. + +**Parameters:** +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `media` | MSMedia | None | Media object | +| `default_uptake` | float | 0 | Default uptake bound | +| `default_excretion` | float | 100 | Default excretion bound | + +**Example:** +```python +pkg = pkgmgr.getpkg("KBaseMediaPkg") +pkg.build_package({ + "media": media, + "default_uptake": 0, + "default_excretion": 100 +}) +# Or shorthand: +pkg.build_package(media) +``` + +### ElementUptakePkg +**File:** `fbapkg/elementuptakepkg.py` + +Constrains total uptake of a specific element (e.g., carbon). + +**Parameters:** +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `element` | str | "C" | Element to constrain | +| `max_uptake` | float | 10 | Maximum uptake rate | + +--- + +## Gapfilling Packages + +### GapfillingPkg +**File:** `fbapkg/gapfillingpkg.py` (~1200 lines) + +MILP formulation for gapfilling. Adds reactions from templates and penalizes additions. + +**Parameters:** +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `default_gapfill_templates` | list | [] | Templates to add reactions from | +| `minimum_obj` | float | 0.01 | Minimum objective value | +| `reaction_scores` | dict | {} | Penalty scores per reaction | +| `blacklist` | list | [] | Reactions to exclude | +| `model_penalty` | float | 1 | Penalty for model reactions | +| `auto_sink` | list | [...] | Compounds to add sinks for | + +**Variables Added:** +- `rmaxf` (reaction) - Max reverse flux +- `fmaxf` (reaction) - Max forward flux + +**Constraints Added:** +- `rmaxfc` (reaction) - Reverse flux coupling +- `fmaxfc` (reaction) - Forward flux coupling + +**Example:** +```python +pkg = pkgmgr.getpkg("GapfillingPkg") +pkg.build_package({ + "default_gapfill_templates": [template], + "minimum_obj": 0.1, + "reaction_scores": {"rxn00001": 0.5} +}) +``` + +--- + +## Biomass Packages + +### FlexibleBiomassPkg +**File:** `fbapkg/flexiblebiomasspkg.py` + +Allows biomass composition to vary within bounds. 
+ +**Parameters:** +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `bio_rxn_id` | str | "bio1" | Biomass reaction ID | +| `flex_coefficient` | float | 0.1 | Flexibility (fraction) | +| `use_rna_class` | bool | True | Group RNA components | +| `use_protein_class` | bool | True | Group protein components | +| `use_dna_class` | bool | True | Group DNA components | + +**Example:** +```python +pkg = pkgmgr.getpkg("FlexibleBiomassPkg") +pkg.build_package({ + "bio_rxn_id": "bio1", + "flex_coefficient": 0.2 # 20% flexibility +}) +``` + +--- + +## Thermodynamic Packages + +### SimpleThermoPkg +**File:** `fbapkg/simplethermopkg.py` + +Simple thermodynamic constraints (loopless FBA variant). + +**Example:** +```python +pkg = pkgmgr.getpkg("SimpleThermoPkg") +pkg.build_package() +``` + +### FullThermoPkg +**File:** `fbapkg/fullthermopkg.py` + +Full thermodynamic constraints with concentration variables. + +**Variables Added:** +- `logconc` (metabolite) - Log concentration variables +- `dGrxn` (reaction) - Reaction Gibbs energy + +--- + +## Reaction Control Packages + +### ReactionUsePkg +**File:** `fbapkg/reactionusepkg.py` + +Binary variables indicating whether reactions carry flux. + +**Parameters:** +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `reaction_list` | list | [] | Reactions to add binaries for | + +**Variables Added:** +- `use` (reaction) - Binary: 1 if reaction active + +**Example:** +```python +pkg = pkgmgr.getpkg("ReactionUsePkg") +pkg.build_package({ + "reaction_list": model.reactions +}) + +# Access variables +for rxn_id, var in pkg.variables["use"].items(): + print(f"{rxn_id} active: {var.primal}") +``` + +### RevBinPkg +**File:** `fbapkg/revbinpkg.py` + +Binary variables for reaction direction. + +**Variables Added:** +- `revbin` (reaction) - Binary: 1 if forward, 0 if reverse + +### ReactionActivationPkg +**File:** `fbapkg/reactionactivationpkg.py` + +Activate/deactivate reactions based on expression data. + +--- + +## Objective Packages + +### ObjectivePkg +**File:** `fbapkg/objectivepkg.py` + +Manage model objective function. + +**Parameters:** +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `objective` | str/Reaction | model.objective | Target reaction | +| `maximize` | bool | True | Maximize or minimize | + +### ObjConstPkg +**File:** `fbapkg/objconstpkg.py` + +Convert objective to constraint (for multi-objective). + +**Parameters:** +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `objective_value` | float | - | Fix objective at value | + +### TotalFluxPkg +**File:** `fbapkg/totalfluxpkg.py` + +Minimize total flux (parsimonious FBA). + +**Example:** +```python +pkg = pkgmgr.getpkg("TotalFluxPkg") +pkg.build_package() +# Now optimize minimizes total flux +``` + +### ChangeOptPkg +**File:** `fbapkg/changeoptpkg.py` + +Change the solver/optimizer. + +--- + +## Data Fitting Packages + +### FluxFittingPkg +**File:** `fbapkg/fluxfittingpkg.py` + +Fit model to measured flux data. + +**Parameters:** +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `flux_data` | dict | {} | {rxn_id: measured_flux} | + +### ProteomeFittingPkg +**File:** `fbapkg/proteomefittingpkg.py` + +Fit model to proteome data. + +### MetaboFBAPkg +**File:** `fbapkg/metabofbapkg.py` + +Integrate metabolomics data. 
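+
+The fitting packages all follow the same getpkg/build_package pattern shown earlier. A minimal sketch for `FluxFittingPkg`, assuming `model` and `media` already exist (the reaction IDs and measured flux values below are purely illustrative):
+
+```python
+from modelseedpy.core.msmodelutl import MSModelUtil
+
+mdlutl = MSModelUtil.get(model)
+pkgmgr = mdlutl.pkgmgr
+
+# Media constraints first, per the recommended build order below
+pkgmgr.getpkg("KBaseMediaPkg").build_package(media)
+
+# Penalize deviation from measured fluxes
+fit_pkg = pkgmgr.getpkg("FluxFittingPkg")
+fit_pkg.build_package({
+    "flux_data": {
+        "rxn00001_c0": 5.2,   # measured flux (illustrative)
+        "rxn00002_c0": -1.3
+    }
+})
+
+solution = model.optimize()
+```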
+ +--- + +## Utility Packages + +### DrainFluxPkg +**File:** `fbapkg/drainfluxpkg.py` + +Add drain reactions for specific metabolites. + +**Parameters:** +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `metabolites` | list | [] | Metabolites to drain | + +### ProblemReplicationPkg +**File:** `fbapkg/problemreplicationpkg.py` + +Create multiple copies of the FBA problem. + +### BilevelPkg +**File:** `fbapkg/bilevelpkg.py` + +Bilevel optimization formulation. + +--- + +## Package Interactions + +### Packages That Work Together +- `KBaseMediaPkg` + any other (media is usually first) +- `GapfillingPkg` + `KBaseMediaPkg` (gapfilling needs media) +- `ReactionUsePkg` + `TotalFluxPkg` (minimize active reactions) +- `SimpleThermoPkg` or `FullThermoPkg` (not both) + +### Order of Building +1. `KBaseMediaPkg` (set exchange bounds first) +2. Constraint packages (thermo, element uptake) +3. Objective packages +4. Analysis packages (gapfilling, fitting) + +### Clearing Packages +```python +# Clear specific package +pkg.clear() + +# Packages track their own variables/constraints +# clear() only removes what that package added +``` diff --git a/.claude/commands/free-agent.md b/.claude/commands/free-agent.md new file mode 100644 index 00000000..785eae2d --- /dev/null +++ b/.claude/commands/free-agent.md @@ -0,0 +1,425 @@ +# Command: free-agent + +## Purpose + +Execute simple, well-defined tasks from natural language requests. This is for straightforward operations like file management, git operations, system tasks, data processing, and other common development activities. + +## Command Type + +`free-agent` + +## Core Directive + +You are a task execution agent that interprets natural language requests and carries them out efficiently. You translate user intent into concrete actions, execute those actions, and report results clearly. + +**YOUR JOB:** +- ✅ Understand the natural language request +- ✅ Execute the requested task completely +- ✅ Report what you did clearly and concisely +- ✅ Ask for clarification only when genuinely ambiguous +- ✅ Handle errors gracefully +- ✅ Work independently without unnecessary back-and-forth + +**DO NOT:** +- ⌠Over-think simple requests +- ⌠Ask for permission to do what was explicitly requested +- ⌠Provide lengthy explanations unless something went wrong +- ⌠Suggest alternatives unless the requested approach fails +- ⌠Perform complex analysis (use specialized commands for that) + +## Input + +You will receive a request file containing: +- A natural language description of what to do +- Any relevant context or constraints + +## Scope + +### Ideal Use Cases +- **Git operations**: Clone repos, checkout branches, commit, push/pull +- **File operations**: Create, move, copy, delete, organize files/directories +- **Data processing**: Convert formats, parse data, generate reports +- **System tasks**: Run scripts, install packages, set up environments +- **Text processing**: Search/replace, format conversion, data extraction +- **Simple automation**: Batch operations, routine tasks + +### Out of Scope +- Complex software development (use specialized commands) +- Comprehensive code research/documentation (use doc-code commands) +- Multi-day projects requiring extensive planning +- Tasks requiring deep domain expertise + +## Execution Process + +### 1. Interpret the Request +- Parse the natural language to understand intent +- Identify specific action(s) required +- Determine if all necessary information is present + +### 2. 
Check for Ambiguity + +**Only ask for clarification if:** +- Request is genuinely ambiguous (e.g., "clone the repo" - which repo?) +- Critical information is missing (e.g., "checkout branch" - which branch?) +- Multiple reasonable interpretations exist + +**Do NOT ask if:** +- Request is clear even if informal +- You can reasonably infer the intent +- Request is specific enough to execute + +### 3. Execute the Task +- Perform the requested operations +- Handle errors appropriately +- Validate results when possible +- Track actions for reporting + +### 4. Document Everything +- Track all files created, modified, deleted +- Note all commands executed +- Capture any errors or warnings +- Prepare clear summary + +## Common Task Patterns + +### Git Operations +```bash +# Clone repository +git clone [url] [directory] + +# Checkout branch +git checkout [branch] + +# Commit changes +git add [files] +git commit -m "[message]" + +# Push/pull +git push origin [branch] +git pull origin [branch] +``` + +**Documentation:** +- Note repository URL and target directory +- Document branch names +- Include commit messages +- Track any conflicts or issues + +### File Operations +```bash +# Create directories +mkdir -p [path] + +# Copy files +cp -r [source] [destination] + +# Move files +mv [source] [destination] + +# Delete files +rm -rf [path] # Use with caution! + +# Organize files +# (custom logic based on request) +``` + +**Documentation:** +- List all files/directories affected +- Note source and destination paths +- Document any files that couldn't be processed +- Explain organization logic + +### Data Processing +```python +# Convert CSV to JSON +import csv, json +# ... implementation + +# Parse and transform data +# ... custom logic based on request + +# Generate reports +# ... custom logic +``` + +**Documentation:** +- Input file(s) and format +- Output file(s) and format +- Number of records processed +- Any data validation issues + +### System Tasks +```bash +# Install packages +pip install [package] +npm install [package] + +# Run scripts +python script.py +bash script.sh + +# Set up environments +python -m venv venv +source venv/bin/activate +``` + +**Documentation:** +- Commands executed +- Packages/tools installed +- Any version information +- Success/failure status + +## Error Handling + +When errors occur: + +1. **Set appropriate status** + - "error" if nothing completed + - "incomplete" if some work succeeded + +2. **Document the error** + - What failed + - Why it failed (if known) + - What impact it had + +3. 
**Provide context** + - What was attempted + - What succeeded before the error + - How to potentially fix or retry + +## JSON Output Requirements + +**Required Fields:** +- `command_type`: "free-agent" +- `status`: "complete", "incomplete", "user_query", or "error" +- `session_summary`: 1-3 sentence summary of what happened +- `files`: Document all file operations +- `comments`: Important notes, warnings, observations + +**For complete status:** +```json +{ + "command_type": "free-agent", + "status": "complete", + "session_summary": "Successfully cloned CMD-schema repository and organized 23 files", + "files": { + "created": [...], + "modified": [], + "deleted": [] + }, + "comments": [ + "Cloned from: https://github.com/example/CMD-schema.git", + "Repository contains 47 files, 2.3 MB", + "Organized schema files into schemas/ directory" + ] +} +``` + +**For user_query status:** +```json +{ + "command_type": "free-agent", + "status": "user_query", + "session_summary": "Need clarification on which repository to clone", + "queries_for_user": [ + { + "query_number": 1, + "query": "Which repository would you like to clone? Please provide the repository URL or name.", + "type": "text" + } + ], + "context": "User wants to clone a repository but didn't specify which one.", + "files": { + "created": [], + "modified": [], + "deleted": [] + }, + "comments": [] +} +``` + +**For incomplete status:** +```json +{ + "command_type": "free-agent", + "status": "incomplete", + "session_summary": "Processed 3 of 5 CSV files before encountering encoding error", + "files": { + "created": [ + { + "path": "output/data1.json", + "purpose": "Converted from data1.csv", + "type": "data" + }, + { + "path": "output/data2.json", + "purpose": "Converted from data2.csv", + "type": "data" + }, + { + "path": "output/data3.json", + "purpose": "Converted from data3.csv", + "type": "data" + } + ], + "modified": [], + "deleted": [] + }, + "errors": [ + { + "message": "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0", + "type": "EncodingError", + "fatal": false, + "context": "Failed processing data4.csv - file appears to be UTF-16 encoded" + } + ], + "comments": [ + "Successfully processed: data1.csv, data2.csv, data3.csv", + "Failed on data4.csv: encoding error (file appears to be UTF-16)", + "Not attempted: data5.csv" + ], + "context": "Need to handle UTF-16 encoding for remaining files. Already processed: [data1.csv, data2.csv, data3.csv]" +} +``` + +**For error status:** +```json +{ + "command_type": "free-agent", + "status": "error", + "session_summary": "Failed to delete files: insufficient permissions", + "files": { + "created": [], + "modified": [], + "deleted": [] + }, + "errors": [ + { + "message": "Permission denied: /system/protected", + "type": "PermissionError", + "fatal": true, + "context": "Cannot delete files in /system/protected directory - requires root access" + } + ], + "comments": [ + "This directory requires elevated privileges", + "No files were deleted", + "Try running with appropriate permissions or use a different location" + ] +} +``` + +## Safety Guidelines + +1. **Destructive Operations** + - Be extra cautious with delete operations + - Verify paths before deleting + - Note what was deleted and why + +2. **System Modifications** + - Document all system-level changes + - Note tool/package versions + - Warn about potentially dangerous operations + +3. 
**Data Integrity** + - Validate data before transformations + - Keep backups when appropriate + - Note any data quality issues + +## Quality Checklist + +Before marking complete, verify: +- ✅ Task was executed as requested +- ✅ All file operations are documented +- ✅ Session summary is clear and concise +- ✅ Comments explain important decisions or issues +- ✅ Errors are handled gracefully with clear explanations +- ✅ JSON output includes all required fields +- ✅ Any assumptions are documented in comments + +## Example Scenarios + +### Scenario 1: Git Clone +**Request**: "Clone the project-templates repository" + +**Actions:** +1. Search for project-templates repository URL +2. Clone to current directory +3. Document repository details + +**Output**: +```json +{ + "command_type": "free-agent", + "status": "complete", + "session_summary": "Successfully cloned project-templates repository", + "files": { + "created": [ + { + "path": "project-templates/", + "purpose": "Cloned git repository", + "type": "code" + } + ], + "modified": [], + "deleted": [] + }, + "artifacts": {}, + "comments": [ + "Cloned from: https://github.com/example/project-templates.git", + "Repository size: 1.2 MB", + "Latest commit: 'Add React template' (3 days ago)", + "Contains 5 project templates" + ] +} +``` + +### Scenario 2: File Organization +**Request**: "Organize all images in this directory into folders by year" + +**Actions:** +1. Scan for image files +2. Read EXIF data or file timestamps +3. Create year folders +4. Move images +5. Report results + +**Output**: +```json +{ + "command_type": "free-agent", + "status": "complete", + "session_summary": "Organized 247 images into 4 year-based folders", + "files": { + "created": [ + { + "path": "2021/", + "purpose": "Images from 2021", + "type": "data" + }, + { + "path": "2022/", + "purpose": "Images from 2022", + "type": "data" + }, + { + "path": "2023/", + "purpose": "Images from 2023", + "type": "data" + }, + { + "path": "2024/", + "purpose": "Images from 2024", + "type": "data" + } + ], + "modified": [], + "deleted": [] + }, + "comments": [ + "Organized by year: 2021 (43 images), 2022 (89 images), 2023 (67 images), 2024 (48 images)", + "Used EXIF data where available, file modification time as fallback", + "3 files skipped: no valid date information (corrupted.jpg, temp.png, test.gif)" + ] +} +``` diff --git a/.claude/commands/generate-tasks.md b/.claude/commands/generate-tasks.md new file mode 100644 index 00000000..9d00bfd7 --- /dev/null +++ b/.claude/commands/generate-tasks.md @@ -0,0 +1,191 @@ +# Command: generate-tasks + +## Purpose + +Generate a detailed, hierarchical task list from an existing PRD. Tasks should guide a developer through implementation with clear, actionable steps. + +## Command Type + +`generate-tasks` + +## Input + +You will receive a request file containing: +- Reference to a specific PRD file (path or ID) +- Any additional context or constraints + +## Process + +### Phase 1: Analysis + +1. **Read the PRD** + - Locate and read the specified PRD file + - Understand functional requirements + - Identify user stories and acceptance criteria + - Note technical considerations + +2. **Assess Current Codebase** + - Review existing code structure + - Identify relevant existing components + - Understand architectural patterns + - Note relevant files that may need modification + - Identify utilities and libraries already in use + +3. 
**Identify Relevant Files** + - List files that will need to be created + - List files that will need to be modified + - Include corresponding test files + - Note the purpose of each file + +### Phase 2: Generate Parent Tasks + +4. **Create High-Level Tasks** + - Break the PRD into 4-7 major work streams + - Each parent task should be a significant milestone + - Examples: + - "Set up data models and database schema" + - "Implement backend API endpoints" + - "Create frontend components" + - "Add form validation and error handling" + - "Implement tests" + - "Add documentation" + +5. **Present to User** + - Generate the high-level tasks in the JSON output + - Set status to "user_query" + - Ask: "I have generated the high-level tasks. Ready to generate sub-tasks? Respond with 'Go' to proceed." + - Save context with the parent tasks + +### Phase 3: Generate Sub-Tasks + +6. **Wait for User Confirmation** + - Only proceed after user responds with "Go" or equivalent + +7. **Break Down Each Parent Task** + - Create 2-8 sub-tasks for each parent task + - Sub-tasks should be: + - Specific and actionable + - Able to be completed in 15-60 minutes + - Ordered logically (dependencies first) + - Clear enough for a junior developer + + **Sub-task Quality Guidelines:** + - Start with action verbs: "Create", "Implement", "Add", "Update", "Test" + - Include what and where: "Create UserProfile component in components/profile/" + - Reference existing patterns: "Following the pattern used in AuthForm component" + - Note dependencies: "After completing 1.2, update..." + +8. **Update Task List** + - Add all sub-tasks to the JSON output + - Link sub-tasks to parent tasks using parent_task_id + - All tasks should have status "pending" + +## Task ID Format + +- **Parent tasks**: X.0 (1.0, 2.0, 3.0, etc.) +- **Sub-tasks**: X.Y (1.1, 1.2, 1.3, etc.) 
+- Maximum depth: 2 levels (no sub-sub-tasks) + +## Task Structure in JSON + +```json +{ + "task_id": "1.0", + "description": "Set up data models and database schema", + "status": "pending", + "parent_task_id": null, + "notes": "" +}, +{ + "task_id": "1.1", + "description": "Create User model with fields: name, email, avatar_url, bio", + "status": "pending", + "parent_task_id": "1.0", + "notes": "Reference existing models in models/ directory" +} +``` + +## Relevant Files Documentation + +In your `comments` array, include a section listing relevant files: + +``` +"RELEVANT FILES:", +"- src/models/User.ts - Create new User model", +"- src/models/User.test.ts - Unit tests for User model", +"- src/api/users.ts - API endpoints for user operations", +"- src/api/users.test.ts - API endpoint tests", +"- src/components/UserProfile.tsx - New profile display component", +"- src/components/UserProfile.test.tsx - Component tests" +``` + +## JSON Output Requirements + +**Required Fields:** +- `command_type`: "generate-tasks" +- `status`: "complete" (after sub-tasks) or "user_query" (after parent tasks) +- `session_summary`: Brief summary of task generation +- `tasks`: Array of all tasks (parent and sub-tasks after completion) +- `comments`: Include relevant files list and important notes + +**For user_query status (after Phase 2):** +- `tasks`: Array with only parent tasks +- `queries_for_user`: Ask user to confirm before generating sub-tasks +- `context`: Save PRD analysis and parent tasks + +**Example Comments:** +- "Generated 5 parent tasks and 27 sub-tasks total" +- "Identified 12 files that need creation or modification" +- "Tasks assume use of existing authentication middleware" +- "Test tasks follow Jest/React Testing Library patterns used in codebase" + +## Quality Checklist + +Before marking complete, verify: +- ✅ All functional requirements from PRD are covered by tasks +- ✅ Tasks are ordered logically with dependencies first +- ✅ Each task is specific and actionable +- ✅ Parent tasks represent major milestones +- ✅ Sub-tasks can each be completed in reasonable time +- ✅ Testing tasks are included +- ✅ Task descriptions reference existing patterns where relevant +- ✅ All tasks use proper ID format +- ✅ Relevant files are identified with purposes +- ✅ JSON output includes all required fields + +## Example Task Breakdown + +**Parent Task:** +```json +{ + "task_id": "2.0", + "description": "Implement backend API endpoints", + "status": "pending", + "parent_task_id": null +} +``` + +**Sub-tasks:** +```json +{ + "task_id": "2.1", + "description": "Create GET /api/users/:id endpoint to retrieve user profile", + "status": "pending", + "parent_task_id": "2.0", + "notes": "Return user object with all fields from User model" +}, +{ + "task_id": "2.2", + "description": "Create PUT /api/users/:id endpoint to update user profile", + "status": "pending", + "parent_task_id": "2.0", + "notes": "Validate input, check authorization, update only allowed fields" +}, +{ + "task_id": "2.3", + "description": "Add authentication middleware to protect user endpoints", + "status": "pending", + "parent_task_id": "2.0", + "notes": "Use existing auth middleware pattern from api/auth.ts" +} +``` diff --git a/.claude/commands/jupyter-dev.md b/.claude/commands/jupyter-dev.md new file mode 100644 index 00000000..61eeb30b --- /dev/null +++ b/.claude/commands/jupyter-dev.md @@ -0,0 +1,480 @@ +# Command: jupyter-dev + +## Purpose + +Develop Jupyter notebooks following a standardized workflow that emphasizes: +- Organized 
directory structure with data, models, and output segregation +- Independent, self-contained cells that can run in any order +- Centralized utilities and imports via util.py +- Intermediate data caching for debugging and efficiency +- Clear markdown documentation preceding each code cell + +## Command Type + +`jupyter-dev` + +## Input + +You will receive a request file containing: +- Notebook development task description +- Project name (for util.py configuration) +- Specific analysis or computation requirements +- Input data files (optional) +- User preferences (optional) + +## Project Structure + +All notebooks must follow this directory structure: + +``` +notebooks/ +├── util.py # Centralized utilities and imports +├── .ipynb # Notebook files +├── data/ # Input data (experimental, omics, expression data) +├── datacache/ # JSON output from util.save() function +├── genomes/ # Genome files +├── models/ # COBRA/COBRApy models +└── nboutput/ # Non-JSON output (TSV, Excel, tables, etc.) +``` + +### Directory Purposes + +- **notebooks/**: Root directory containing all notebooks and util.py +- **data/**: All input data files (experimental data, omics data, expression data) +- **datacache/**: Intermediate JSON data saved via util.save() for cell independence +- **genomes/**: Genome files only +- **models/**: COBRA/COBRApy model files only +- **nboutput/**: Non-JSON output files (TSV, Excel, tables, plots, etc.) + +## util.py Structure + +The util.py file must follow this template: + +```python +import sys +import os +import json +from os import path + +# Add the parent directory to the sys.path +sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) +script_path = os.path.abspath(__file__) +script_dir = os.path.dirname(script_path) +base_dir = os.path.dirname(os.path.dirname(script_dir)) +folder_name = os.path.basename(script_dir) + +print(base_dir+"/KBUtilLib/src") +sys.path = [base_dir+"/KBUtilLib/src",base_dir+"/cobrakbase",base_dir+"/ModelSEEDpy/"] + sys.path + +# Import utilities with error handling +from kbutillib import NotebookUtils + +import hashlib +import pandas as pd +from modelseedpy import AnnotationOntology, MSPackageManager, MSMedia, MSModelUtil, MSBuilder, MSATPCorrection, MSGapfill, MSGrowthPhenotype, MSGrowthPhenotypes, ModelSEEDBiochem, MSExpression + +class NotebookUtil(NotebookUtils): + def __init__(self,**kwargs): + super().__init__( + notebook_folder=script_dir, + name="", + user="chenry", + retries=5, + proxy_port=None, + **kwargs + ) + + # PLACE ALL UTILITY FUNCTIONS NEEDED FOR NOTEBOOKS HERE + +# Initialize the NotebookUtil instance +util = NotebookUtil() +``` + +### Key Points for util.py + +1. **Replace ``** with the actual project name from user input +2. **Add all imports** needed by notebooks to this file +3. **Add all utility functions** as methods of the NotebookUtil class +4. **Keep it centralized**: All shared code goes here, not in notebook cells + +## Notebook Cell Design Pattern + +Every notebook must follow this strict cell pattern: + +### 1. Markdown Cell (Always First) +```markdown +## [Step Name/Purpose] + +[Explanation of what this code cell does and why] +- Key objective +- Input data used +- Output data produced +- Any important notes +``` + +### 2. 
Code Cell (Always Second) +```python +%run util.py + +# Load required data from previous steps +data1 = util.load('data1_name') +data2 = util.load('data2_name') + +# Perform analysis/computation +result = some_analysis(data1, data2) + +# Save intermediate results for cell independence +util.save('result_name', result) +``` + +### Critical Cell Design Rules + +1. **Every code cell starts with**: `%run util.py` + - This instantiates the util class + - This loads all imports + - This ensures cell independence + +2. **Load data at cell start**: Use `util.load('data_name')` for any data from previous cells + - Only load what this cell needs + - Data comes from datacache/ directory + +3. **Save data at cell end**: Use `util.save('data_name', data)` for outputs + - Save all intermediate results that other cells might need + - Only JSON-serializable data structures + - Saved to datacache/ directory + +4. **Cell independence**: Each cell should run independently + - Don't rely on variables from previous cells without loading them + - Don't assume cells run in order + - Enable debugging by re-running individual cells + +5. **Markdown precedes code**: Every code cell has a markdown cell explaining it + - What the cell does + - Why it's needed + - What data it uses and produces + +## Process + +### Phase 1: Setup Project Structure + +1. **Check for notebooks/ Directory** + - If `notebooks/` doesn't exist, create it + - If it exists, verify subdirectories + +2. **Create Required Subdirectories** + - Create `notebooks/data/` if missing + - Create `notebooks/datacache/` if missing + - Create `notebooks/genomes/` if missing + - Create `notebooks/models/` if missing + - Create `notebooks/nboutput/` if missing + +3. **Create or Validate util.py** + - If `notebooks/util.py` doesn't exist, create it from template + - Replace `` with actual project name + - If util.py exists, verify it has the NotebookUtil class + - Document whether created or validated + +### Phase 2: Understand Requirements + +4. **Analyze Task Description** + - Identify the scientific/analytical goal + - Determine required input data + - Identify computation steps needed + - Plan logical cell breakdown + - Determine what utility functions might be needed + +5. **Plan Notebook Structure** + - Break task into logical steps (cells) + - Identify data flow between cells + - Determine what gets saved/loaded at each step + - Plan utility functions for util.py + - Document the planned structure + +### Phase 3: Develop Utility Functions + +6. **Add Utility Functions to util.py** + - Add any custom functions needed by notebooks + - Add imports required for these functions + - Add functions as methods to NotebookUtil class + - Document each function with docstrings + - Keep functions general and reusable + +### Phase 4: Create/Modify Notebook + +7. **Create Notebook Cells** + - For each logical step: + - Create markdown cell explaining the step + - Create code cell with proper pattern: + - Start with `%run util.py` + - Load required data with util.load() + - Perform computation + - Save results with util.save() + - Follow cell independence principles + - Add clear variable names and comments + +8. **Organize Data Files** + - Move/reference input data to `notebooks/data/` + - Reference genome files from `notebooks/genomes/` + - Reference model files from `notebooks/models/` + - Save non-JSON output to `notebooks/nboutput/` + - Let util.save() handle datacache/ automatically + +### Phase 5: Validate and Document + +9. 
**Verify Notebook Standards** + - Every code cell starts with `%run util.py` + - Every code cell has preceding markdown explanation + - Data dependencies use util.load() + - Results saved with util.save() + - Cells can run independently + - All files in correct directories + +10. **Create Summary Documentation** + - Document notebook purpose and workflow + - List required input data and locations + - Describe each major step + - Note any manual setup required + - Include example usage + +### Phase 6: Save Structured Output + +11. **Save JSON Tracking File** + - Document all files created/modified + - List all utility functions added + - Describe notebook cell structure + - Note any issues or edge cases + - Include completion status + +## JSON Output Schema + +The command execution tracking file must follow this structure: + +```json +{ + "command_type": "jupyter-dev", + "status": "complete | incomplete | user_query | error", + "session_id": "string", + "parent_session_id": "string | null", + "session_summary": "Brief summary of notebook development work", + + "project": { + "name": "string - project name used in util.py", + "notebook_name": "string - name of notebook file", + "purpose": "string - what this notebook does" + }, + + "structure": { + "directories_created": ["data", "datacache", "genomes", "models", "nboutput"], + "util_py_status": "created | existed | modified", + "notebook_path": "notebooks/.ipynb" + }, + + "notebook_cells": [ + { + "cell_number": 1, + "type": "markdown | code", + "purpose": "Description of what this cell does", + "data_loaded": ["data1", "data2"], + "data_saved": ["result1"] + } + ], + + "utility_functions": [ + { + "name": "function_name", + "purpose": "What this utility function does", + "added_to_util_py": true + } + ], + + "files": { + "created": [ + { + "path": "notebooks/util.py", + "purpose": "Centralized utilities and imports", + "type": "code" + } + ], + "modified": [ + { + "path": "notebooks/analysis.ipynb", + "changes": "Added 5 cells for data loading and analysis" + } + ], + "data_files": [ + { + "path": "notebooks/data/experimental_data.csv", + "purpose": "Input experimental data", + "type": "input" + } + ] + }, + + "artifacts": { + "notebook_filename": "notebooks/.ipynb", + "util_py_path": "notebooks/util.py", + "cell_count": 10, + "utility_function_count": 3 + }, + + "validation": { + "all_cells_have_markdown": true, + "all_cells_start_with_run_util": true, + "data_loading_uses_util_load": true, + "data_saving_uses_util_save": true, + "cells_independent": true, + "files_in_correct_directories": true + }, + + "comments": [ + "Created notebook structure with 5 analysis steps", + "Added 3 utility functions for data processing", + "All cells follow independence pattern with util.load/save", + "Input data placed in notebooks/data/", + "Output tables saved to notebooks/nboutput/" + ], + + "queries_for_user": [], + + "errors": [] +} +``` + +## Command JSON Output Requirements + +Your command execution JSON output must include: + +**Required Fields:** +- `command_type`: "jupyter-dev" +- `status`: "complete", "user_query", or "error" +- `session_id`: Session ID for this execution +- `session_summary`: Brief summary of notebook development +- `project`: Project name and notebook details +- `structure`: Directory and util.py status +- `files`: All files created, modified, or referenced +- `artifacts`: Paths to notebook and util.py +- `validation`: Checklist confirming standards followed +- `comments`: Notes about development process + +**For user_query 
status:** +- `queries_for_user`: Questions needing clarification +- `context`: Save partial work and notebook state + +**Example Comments:** +- "Created notebooks directory structure with all required subdirectories" +- "Generated util.py with project name 'MetabolicAnalysis'" +- "Created notebook with 8 cells following independence pattern" +- "Added 4 utility functions for COBRA model manipulation" +- "All intermediate results saved to datacache/ for cell independence" +- "Placed genome files in genomes/, model files in models/" + +## Design Principles + +### Cell Independence Philosophy + +The notebook design prioritizes **cell independence** for several critical reasons: + +1. **Debugging Efficiency**: Re-run individual cells without executing entire notebook +2. **Time Savings**: Skip expensive computations by loading cached results +3. **Error Recovery**: Recover from failures without losing all progress +4. **Experimentation**: Test variations by modifying single cells +5. **Collaboration**: Others can understand and modify individual steps + +### Implementation Strategy + +- **util.load()** and **util.save()** create checkpoints +- **datacache/** stores intermediate results as JSON +- **%run util.py** ensures consistent environment +- **Markdown cells** provide context for each step + +### When to Save Data + +Save data when: +- Results took significant time to compute +- Data will be used by multiple subsequent cells +- Intermediate results are worth preserving +- Enabling cell re-runs would save time + +Don't save data when: +- Quick computations (< 1 second) +- Data only used in next cell +- Data is not JSON-serializable (save to nboutput/ instead) + +## Utility Function Guidelines + +Add functions to util.py when: +- Code is used by multiple cells +- Complex operations that need documentation +- Interactions with external systems (APIs, databases) +- Data transformations used repeatedly +- Model-specific operations + +Keep in notebooks when: +- Code is cell-specific analysis +- One-time exploratory code +- Visualization/plotting specific to that cell +- Simple operations that don't need abstraction + +## Quality Checklist + +Before marking complete, verify: +- ✅ notebooks/ directory exists with all 5 subdirectories +- ✅ util.py exists and has correct project name +- ✅ util.py contains NotebookUtil class with needed functions +- ✅ Every code cell starts with `%run util.py` +- ✅ Every code cell has preceding markdown explanation +- ✅ Data dependencies use util.load() +- ✅ Results saved with util.save() where appropriate +- ✅ Cells can run independently (tested) +- ✅ Input data in data/ directory +- ✅ Models in models/ directory +- ✅ Genomes in genomes/ directory +- ✅ Non-JSON output in nboutput/ directory +- ✅ JSON output handled by util.save() to datacache/ +- ✅ Markdown cells explain reasoning and purpose +- ✅ All imports in util.py, not scattered in cells +- ✅ Utility functions documented with docstrings + +## Error Handling + +Handle these scenarios gracefully: + +1. **Missing Dependencies**: If KBUtilLib or ModelSEEDpy not available, note in errors +2. **Existing Files**: Don't overwrite util.py if it already exists; validate instead +3. **Non-JSON Data**: Guide user to save to nboutput/ and load manually +4. **Complex Analysis**: Break into multiple cells for independence +5. 
**Long-Running Cells**: Emphasize saving intermediate results + +## Privacy and Security Considerations + +- Don't include API keys or credentials in util.py or notebooks +- Use environment variables or config files for sensitive data +- Document if manual credential setup is needed +- Don't log sensitive data in datacache/ files +- Note if data files contain sensitive information + +## Example Workflow + +For a typical metabolic modeling notebook: + +1. **Cell 1**: Load genome data from genomes/ + - Markdown: Explain which genome and why + - Code: Load, parse, save processed genome data + +2. **Cell 2**: Load COBRA model from models/ + - Markdown: Explain model selection and purpose + - Code: Load model, save to datacache + +3. **Cell 3**: Load experimental data from data/ + - Markdown: Describe experimental conditions + - Code: Load CSV, process, save data structure + +4. **Cell 4**: Run flux balance analysis + - Markdown: Explain FBA parameters and objectives + - Code: Load model, run FBA, save results + +5. **Cell 5**: Generate result tables + - Markdown: Describe what tables show + - Code: Load FBA results, create tables, save to nboutput/ + +Each cell independent, each with clear purpose, each properly cached. diff --git a/.claude/commands/kb-sdk-dev.md b/.claude/commands/kb-sdk-dev.md new file mode 100644 index 00000000..aa6f0566 --- /dev/null +++ b/.claude/commands/kb-sdk-dev.md @@ -0,0 +1,388 @@ +# KBase SDK Development Expert + +You are an expert on KBase SDK development. You have deep knowledge of: + +1. **KIDL Specification** - Writing and compiling KBase Interface Description Language spec files +2. **Module Structure** - Dockerfile, kbase.yml, spec.json, display.yaml, impl files +3. **Workspace Data Types** - All 223 KBase data types across 45 modules +4. **Narrative UI Integration** - Creating app interfaces with proper input/output widgets +5. **KBUtilLib Integration** - Using the shared utility library to avoid redundant code +6. **Best Practices** - Code organization, error handling, reporting, Docker optimization + +## Critical: KBUtilLib Usage + +**ALWAYS use KBUtilLib for common functionality.** The library at `/Users/chenry/Dropbox/Projects/KBUtilLib` provides: + +- `KBWSUtils` - Workspace operations (get/save objects) +- `KBGenomeUtils` - Genome parsing, feature extraction +- `KBModelUtils` - Metabolic model utilities +- `KBCallbackUtils` - Callback server handling +- `KBAnnotationUtils` - Annotation workflows +- `SharedEnvUtils` - Configuration and token management +- `MSBiochemUtils` - ModelSEED biochemistry access +- And many more utilities + +**In your Dockerfile, ALWAYS include:** +```dockerfile +# Checkout KBUtilLib for shared utilities +RUN cd /kb/module && \ + git clone https://github.com/cshenry/KBUtilLib.git && \ + cd KBUtilLib && \ + pip install -e . +``` + +**When writing new utility code:** If a function has general utility beyond this specific app, consider adding it to KBUtilLib instead. 
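For example, a general-purpose helper like the one below belongs on a KBUtilLib class rather than in a single app's Impl file. This is a hypothetical sketch: `gc_content` is not an existing KBUtilLib method, and placing it on `KBGenomeUtils` is an assumption.

```python
# Hypothetical method to add to kbutillib's KBGenomeUtils (illustrative only).
def gc_content(self, sequence: str) -> float:
    """Return the G+C fraction of a DNA sequence (0.0 for empty input)."""
    if not sequence:
        return 0.0
    seq = sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)
```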
+ +## Knowledge Loading + +**KBUtilLib Reference (read for available utilities):** +- `/Users/chenry/Dropbox/Projects/KBUtilLib/README.md` +- `/Users/chenry/Dropbox/Projects/KBUtilLib/src/kbutillib/` (module source) +- `/Users/chenry/Dropbox/Projects/KBUtilLib/docs/` (module documentation) + +**Workspace Data Types (read for type specifications):** +- `/Users/chenry/Dropbox/Projects/workspace_deluxe/agent-io/docs/WorkspaceDataTypes/all_types_list.json` +- `/Users/chenry/Dropbox/Projects/workspace_deluxe/agent-io/docs/WorkspaceDataTypes/individual_specs/` (individual type specs) +- `/Users/chenry/Dropbox/Projects/workspace_deluxe/agent-io/docs/WorkspaceDataTypes/all_type_specs.json` (full specs) + +**Online Documentation:** +- https://kbase.github.io/kb_sdk_docs/ (SDK documentation) +- https://kbase.github.io/kb_sdk_docs/references/KIDL_spec.html (KIDL reference) +- https://kbase.github.io/kb_sdk_docs/references/module_anatomy.html (module structure) + +## Quick Reference: Module Structure + +``` +MyModule/ +├── kbase.yml # Module metadata +├── Makefile # Build commands +├── Dockerfile # Container definition +├── MyModule.spec # KIDL specification +├── lib/ +│ └── MyModule/ +│ └── MyModuleImpl.py # Implementation code +├── ui/ +│ └── narrative/ +│ └── methods/ +│ └── run_my_app/ +│ ├── spec.json # Parameter mapping +│ └── display.yaml # UI labels/docs +├── test/ +│ └── MyModule_server_test.py # Unit tests +├── scripts/ +│ └── entrypoint.sh # Docker entrypoint +└── data/ # Reference data (<100MB) +``` + +## KIDL Spec File Format + +``` +/* +A KBase module: MyModule +Module description here. +*/ +module MyModule { + + /* Documentation for this type */ + typedef structure { + string workspace_name; + string genome_ref; + int min_length; + } RunAppParams; + + typedef structure { + string report_name; + string report_ref; + } RunAppResults; + + /* + Run the main application. + + This function does X, Y, Z. + */ + funcdef run_app(RunAppParams params) + returns (RunAppResults output) + authentication required; +}; +``` + +## Implementation File Pattern + +```python +#BEGIN_HEADER +import os +import json +from kbutillib import KBWSUtils, KBCallbackUtils, SharedEnvUtils + +class MyAppUtils(KBWSUtils, KBCallbackUtils, SharedEnvUtils): + """Custom utility class combining KBUtilLib modules.""" + pass +#END_HEADER + +class MyModule: + #BEGIN_CLASS_HEADER + #END_CLASS_HEADER + + def __init__(self, config): + #BEGIN_CONSTRUCTOR + self.callback_url = os.environ['SDK_CALLBACK_URL'] + self.scratch = config['scratch'] + self.utils = MyAppUtils(callback_url=self.callback_url) + #END_CONSTRUCTOR + pass + + def run_app(self, ctx, params): + #BEGIN run_app + # Validate inputs + workspace_name = params['workspace_name'] + genome_ref = params['genome_ref'] + + # Get data using KBUtilLib + genome_data = self.utils.get_object(workspace_name, genome_ref) + + # Do processing... 
+ results = self.process_genome(genome_data) + + # Create report + report_info = self.utils.create_extended_report({ + 'message': 'Analysis complete', + 'workspace_name': workspace_name + }) + + return { + 'report_name': report_info['name'], + 'report_ref': report_info['ref'] + } + #END run_app +``` + +## spec.json Structure + +```json +{ + "ver": "1.0.0", + "authors": ["username"], + "contact": "email@example.com", + "categories": ["active"], + "widgets": { + "input": null, + "output": "no-display" + }, + "parameters": [ + { + "id": "genome_ref", + "optional": false, + "advanced": false, + "allow_multiple": false, + "default_values": [""], + "field_type": "text", + "text_options": { + "valid_ws_types": ["KBaseGenomes.Genome"] + } + }, + { + "id": "min_length", + "optional": true, + "advanced": true, + "allow_multiple": false, + "default_values": ["100"], + "field_type": "text", + "text_options": { + "validate_as": "int", + "min_int": 1 + } + } + ], + "behavior": { + "service-mapping": { + "url": "", + "name": "MyModule", + "method": "run_app", + "input_mapping": [ + { + "narrative_system_variable": "workspace", + "target_property": "workspace_name" + }, + { + "input_parameter": "genome_ref", + "target_property": "genome_ref", + "target_type_transform": "resolved-ref" + }, + { + "input_parameter": "min_length", + "target_property": "min_length", + "target_type_transform": "int" + } + ], + "output_mapping": [ + { + "service_method_output_path": [0, "report_name"], + "target_property": "report_name" + }, + { + "service_method_output_path": [0, "report_ref"], + "target_property": "report_ref" + } + ] + } + }, + "job_id_output_field": "docker" +} +``` + +## display.yaml Structure + +```yaml +name: Run My App +tooltip: | + Analyze genome data with custom parameters +screenshots: [] + +icon: icon.png + +suggestions: + apps: + related: [] + next: [] + methods: + related: [] + next: [] + +parameters: + genome_ref: + ui-name: | + Genome + short-hint: | + Select a genome to analyze + long-hint: | + Select a genome object from your workspace for analysis. + min_length: + ui-name: | + Minimum Length + short-hint: | + Minimum sequence length to consider + long-hint: | + Sequences shorter than this value will be filtered out. + +description: | +
  <p>Detailed description of what this app does.</p>

  <p>Include information about inputs, outputs, and methodology.</p>
+ +publications: + - pmid: 12345678 + display-text: | + Author et al. (2024) Paper title. Journal Name. + link: https://doi.org/xxx +``` + +## Dockerfile Pattern + +```dockerfile +FROM kbase/sdkbase2:python +MAINTAINER Your Name + +# Install system dependencies +RUN apt-get update && apt-get install -y \ + build-essential \ + && rm -rf /var/lib/apt/lists/* + +# Install Python dependencies +COPY requirements.txt /kb/module/requirements.txt +RUN pip install -r /kb/module/requirements.txt + +# CRITICAL: Install KBUtilLib for shared utilities +RUN cd /kb/module && \ + git clone https://github.com/cshenry/KBUtilLib.git && \ + cd KBUtilLib && \ + pip install -e . + +# Copy module files +COPY . /kb/module +WORKDIR /kb/module + +# Compile the module +RUN make all + +ENTRYPOINT ["./scripts/entrypoint.sh"] +CMD [] +``` + +## Common Data Types + +| Module | Type | Description | +|--------|------|-------------| +| KBaseGenomes | Genome | Annotated genome | +| KBaseGenomes | ContigSet | Set of contigs | +| KBaseFBA | FBAModel | Metabolic model | +| KBaseFBA | FBA | FBA solution | +| KBaseFBA | Media | Growth media | +| KBaseBiochem | Biochemistry | Compound/reaction DB | +| KBaseAssembly | Assembly | Genome assembly | +| KBaseRNASeq | RNASeqAlignment | RNA-seq alignment | +| KBaseSets | GenomeSet | Set of genomes | +| KBaseReport | Report | App output report | + +## Guidelines for Responding + +1. **Always recommend KBUtilLib** - Check if functionality exists there first +2. **Show complete examples** - KIDL specs, impl code, UI files together +3. **Explain compilation** - Remind about `make` after spec changes +4. **Include Dockerfile** - Show how to install dependencies +5. **Reference data types** - Point to specific workspace types when relevant + +## Response Format + +### For "how do I create" questions: +``` +### Overview +What we're building and why. + +### KIDL Spec +```kidl +// Complete spec file +``` + +### Implementation +```python +# Complete impl code +``` + +### UI Files +spec.json and display.yaml content + +### Dockerfile Updates +Any required additions + +### Build & Test +```bash +make +kb-sdk test +``` +``` + +### For data type questions: +``` +### Type: `ModuleName.TypeName` + +**Structure:** +``` +typedef structure { + field definitions... +} TypeName; +``` + +**Common Fields:** +- `field1` - Description +- `field2` - Description + +**Usage Example:** +```python +# How to work with this type +``` + +**Related Types:** List of related types +``` + +## User Request + +$ARGUMENTS diff --git a/.claude/commands/kb-sdk-dev/context/kbutillib-integration.md b/.claude/commands/kb-sdk-dev/context/kbutillib-integration.md new file mode 100644 index 00000000..051a9d9e --- /dev/null +++ b/.claude/commands/kb-sdk-dev/context/kbutillib-integration.md @@ -0,0 +1,287 @@ +# KBUtilLib Integration Guide + +## Overview + +KBUtilLib is a modular utility framework that should be used in ALL KBase SDK applications to avoid code duplication. The library provides composable utility classes that can be combined via multiple inheritance. + +**Repository:** `/Users/chenry/Dropbox/Projects/KBUtilLib` +**GitHub:** https://github.com/cshenry/KBUtilLib + +## Installation in Dockerfile + +**ALWAYS include KBUtilLib in your Dockerfile:** + +```dockerfile +# Install KBUtilLib for shared utilities +RUN cd /kb/module && \ + git clone https://github.com/cshenry/KBUtilLib.git && \ + cd KBUtilLib && \ + pip install -e . 
+``` + +## Available Modules + +### Core Foundation + +| Module | Purpose | +|--------|---------| +| `BaseUtils` | Logging, error handling, dependency management | +| `SharedEnvUtils` | Configuration files, authentication tokens | +| `NotebookUtils` | Jupyter integration, enhanced displays | + +### KBase Data Access + +| Module | Purpose | +|--------|---------| +| `KBWSUtils` | Workspace operations: get/save objects | +| `KBCallbackUtils` | Callback server handling for SDK apps | +| `KBSDKUtils` | SDK development utilities | + +### Analysis Utilities + +| Module | Purpose | +|--------|---------| +| `KBGenomeUtils` | Genome parsing, feature extraction, translation | +| `KBAnnotationUtils` | Gene/protein annotation workflows | +| `KBModelUtils` | Metabolic model analysis, FBA utilities | +| `MSBiochemUtils` | ModelSEED biochemistry database access | +| `KBReadsUtils` | Reads processing and QC | + +### External Integrations + +| Module | Purpose | +|--------|---------| +| `ArgoUtils` | Language model integration | +| `BVBRCUtils` | BV-BRC database access | +| `PatricWSUtils` | PATRIC workspace utilities | + +## Usage Patterns + +### Pattern 1: Single Module +```python +from kbutillib import KBWSUtils + +class MyApp: + def __init__(self, callback_url): + self.ws_utils = KBWSUtils(callback_url=callback_url) + + def run(self, params): + obj = self.ws_utils.get_object(params['workspace'], params['ref']) +``` + +### Pattern 2: Multiple Inheritance (Recommended) +```python +from kbutillib import KBWSUtils, KBGenomeUtils, KBCallbackUtils, SharedEnvUtils + +class MyAppUtils(KBWSUtils, KBGenomeUtils, KBCallbackUtils, SharedEnvUtils): + """Custom utility class combining needed modules.""" + pass + +class MyApp: + def __init__(self, callback_url): + self.utils = MyAppUtils(callback_url=callback_url) + + def run(self, params): + # Access all methods from combined classes + genome = self.utils.get_object(params['workspace'], params['ref']) + features = self.utils.extract_features_by_type(genome, 'CDS') + report = self.utils.create_extended_report({...}) +``` + +### Pattern 3: In Implementation File +```python +#BEGIN_HEADER +import os +from kbutillib import KBWSUtils, KBCallbackUtils, KBGenomeUtils + +class AppUtils(KBWSUtils, KBCallbackUtils, KBGenomeUtils): + """Combined utilities for this app.""" + pass +#END_HEADER + +class MyModule: + def __init__(self, config): + #BEGIN_CONSTRUCTOR + self.callback_url = os.environ['SDK_CALLBACK_URL'] + self.scratch = config['scratch'] + self.utils = AppUtils( + callback_url=self.callback_url, + scratch=self.scratch + ) + #END_CONSTRUCTOR + + def my_method(self, ctx, params): + #BEGIN my_method + workspace = params['workspace_name'] + + # Get genome using KBWSUtils + genome = self.utils.get_object(workspace, params['genome_ref']) + + # Parse genome using KBGenomeUtils + features = self.utils.extract_features_by_type(genome, 'CDS') + + # Create report using KBCallbackUtils + report_info = self.utils.create_extended_report({ + 'message': f'Found {len(features)} CDS features', + 'workspace_name': workspace + }) + + return [{ + 'report_name': report_info['name'], + 'report_ref': report_info['ref'] + }] + #END my_method +``` + +## Key Methods Reference + +### KBWSUtils + +```python +# Get a single object +obj_data = utils.get_object(workspace, object_ref) + +# Get object with metadata +obj, info = utils.get_object_with_info(workspace, object_ref) + +# Save an object +info = utils.save_object(workspace, obj_type, obj_name, obj_data) + +# List objects in workspace +objects 
= utils.list_objects(workspace, type_filter='KBaseGenomes.Genome') +``` + +### KBCallbackUtils + +```python +# Create a report +report_info = utils.create_extended_report({ + 'message': 'Analysis complete', + 'workspace_name': workspace, + 'objects_created': [{'ref': new_ref, 'description': 'My output'}], + 'file_links': [{'path': '/path/to/file.txt', 'name': 'results.txt'}], + 'html_links': [{'path': '/path/to/report.html', 'name': 'report'}] +}) + +# Download staging file +local_path = utils.download_staging_file(staging_file_path) + +# Upload file to shock +shock_id = utils.upload_to_shock(file_path) +``` + +### KBGenomeUtils + +```python +# Extract all features of a type +cds_features = utils.extract_features_by_type(genome_data, 'CDS') + +# Translate DNA sequence +protein = utils.translate_sequence(dna_seq) + +# Find ORFs in sequence +orfs = utils.find_orfs(sequence, min_length=100) + +# Parse genome object +genome_info = utils.parse_genome_object(genome_data) +``` + +### KBModelUtils + +```python +# Load model data +model_data = utils.get_model(workspace, model_ref) + +# Get reactions/metabolites +reactions = utils.get_model_reactions(model_data) +metabolites = utils.get_model_metabolites(model_data) + +# Check model consistency +issues = utils.validate_model(model_data) +``` + +### MSBiochemUtils + +```python +# Search compounds +compounds = utils.search_compounds("glucose") + +# Get reaction info +reaction = utils.get_reaction("rxn00001") + +# Search reactions by compound +reactions = utils.find_reactions_with_compound("cpd00001") +``` + +## When to Add Code to KBUtilLib + +If you're writing a function that: +1. Could be used in multiple KBase apps +2. Performs a common operation (parsing, converting, validating) +3. Wraps a KBase service in a cleaner way +4. Provides utility for a common data type + +**Consider adding it to KBUtilLib instead of your app.** + +### How to Add + +1. Identify which module it belongs in (or create new one) +2. Add the method to the appropriate class +3. Add tests in `tests/` +4. Update documentation +5. Push to GitHub +6. 
Update your app's Dockerfile to get latest + +## Configuration + +KBUtilLib can be configured via `config.yaml`: + +```yaml +kbase: + endpoint: https://kbase.us/services + token_env: KB_AUTH_TOKEN + +scratch: /kb/module/work/tmp + +logging: + level: INFO + format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s" +``` + +Load configuration: +```python +utils = MyAppUtils(config_file='config.yaml') +# Or +utils = MyAppUtils(callback_url=url, scratch=scratch_dir) +``` + +## Error Handling + +KBUtilLib provides consistent error handling: + +```python +from kbutillib.base_utils import KBUtilLibError + +try: + result = utils.get_object(workspace, ref) +except KBUtilLibError as e: + # Handle KBUtilLib-specific errors + logger.error(f"KBUtilLib error: {e}") +except Exception as e: + # Handle other errors + logger.error(f"Unexpected error: {e}") +``` + +## Testing + +Test your integration: + +```python +import pytest +from kbutillib import KBWSUtils + +def test_workspace_access(): + utils = KBWSUtils(callback_url=test_callback_url) + obj = utils.get_object('test_workspace', 'test_object') + assert obj is not None +``` diff --git a/.claude/commands/kb-sdk-dev/context/kidl-reference.md b/.claude/commands/kb-sdk-dev/context/kidl-reference.md new file mode 100644 index 00000000..eb0c9771 --- /dev/null +++ b/.claude/commands/kb-sdk-dev/context/kidl-reference.md @@ -0,0 +1,250 @@ +# KIDL Specification Reference + +## Overview + +KIDL (KBase Interface Description Language) defines the interface for KBase modules. It specifies: +- Data types (typedefs) +- Function signatures +- Authentication requirements +- Documentation + +## Basic Types + +| Type | Description | Example | +|------|-------------|---------| +| `string` | Text value | `"hello"` | +| `int` | Integer | `42` | +| `float` | Floating point | `3.14` | +| `bool` | Boolean (0 or 1) | `1` | +| `UnspecifiedObject` | Any JSON object | `{}` | +| `list` | List of type T | `["a", "b"]` | +| `mapping` | Key-value pairs | `{"key": "value"}` | +| `tuple` | Fixed-length tuple | `["a", 1]` | + +## Type Definitions + +### Simple Typedef +```kidl +typedef string genome_ref; +typedef int workspace_id; +``` + +### Structure Typedef +```kidl +typedef structure { + string workspace_name; + string object_name; + string object_ref; +} ObjectInfo; +``` + +### Nested Structures +```kidl +typedef structure { + string id; + string name; + list aliases; +} Feature; + +typedef structure { + string id; + list features; +} Genome; +``` + +### Optional Fields +```kidl +typedef structure { + string required_field; + string optional_field; /* marked optional in spec.json */ +} MyParams; +``` + +### Mappings and Lists +```kidl +typedef mapping StringToIntMap; +typedef list StringList; +typedef mapping> StringToListMap; +``` + +### Tuple Types +```kidl +typedef tuple ObjectVersion; +``` + +## Function Definitions + +### Basic Function +```kidl +funcdef my_function(MyParams params) + returns (MyResults results) + authentication required; +``` + +### Function with Multiple Returns +```kidl +funcdef get_info(string ref) + returns (string name, int size, string type) + authentication required; +``` + +### Function with No Return +```kidl +funcdef log_event(string message) + returns () + authentication required; +``` + +### Function Documentation +```kidl +/* + * Short description of function. + * + * Longer description with details about what the function does, + * what parameters it expects, and what it returns. 
+ * + * @param params The input parameters + * @return results The output results + */ +funcdef documented_function(Params params) + returns (Results results) + authentication required; +``` + +## Authentication Options + +```kidl +/* Requires valid KBase token */ +funcdef secure_func(Params p) returns (Results r) authentication required; + +/* No authentication needed */ +funcdef public_func(Params p) returns (Results r) authentication none; + +/* Optional authentication */ +funcdef optional_auth_func(Params p) returns (Results r) authentication optional; +``` + +## Complete Module Example + +```kidl +/* + * A KBase module: GenomeAnalyzer + * + * This module provides tools for analyzing genome data, + * including feature extraction and sequence analysis. + */ +module GenomeAnalyzer { + + /* Reference to a genome object */ + typedef string genome_ref; + + /* Reference to a workspace object */ + typedef string ws_ref; + + /* Feature information extracted from genome */ + typedef structure { + string feature_id; + string feature_type; + int start; + int end; + string strand; + string sequence; + } FeatureInfo; + + /* Input parameters for analyze_genome */ + typedef structure { + string workspace_name; + genome_ref genome_ref; + int min_feature_length; + list feature_types; + } AnalyzeGenomeParams; + + /* Results from analyze_genome */ + typedef structure { + string report_name; + ws_ref report_ref; + int features_analyzed; + list feature_summary; + } AnalyzeGenomeResults; + + /* Input for batch analysis */ + typedef structure { + string workspace_name; + list genome_refs; + } BatchAnalyzeParams; + + /* Results from batch analysis */ + typedef structure { + string report_name; + ws_ref report_ref; + mapping genome_feature_counts; + } BatchAnalyzeResults; + + /* + * Analyze a single genome for features. + * + * This function extracts and analyzes features from the specified + * genome, filtering by minimum length and feature type. + * + * @param params Analysis parameters including genome reference + * @return results Analysis results with report reference + */ + funcdef analyze_genome(AnalyzeGenomeParams params) + returns (AnalyzeGenomeResults results) + authentication required; + + /* + * Analyze multiple genomes in batch. + * + * @param params Batch parameters with list of genome references + * @return results Batch results with per-genome counts + */ + funcdef batch_analyze(BatchAnalyzeParams params) + returns (BatchAnalyzeResults results) + authentication required; +}; +``` + +## Compilation + +After modifying the spec file, always run: +```bash +make +``` + +This regenerates: +- `lib/MyModule/MyModuleImpl.py` - Implementation stubs +- `lib/MyModule/MyModuleServer.py` - Server code +- `lib/MyModule/MyModuleClient.py` - Client code + +## Common Patterns + +### Workspace References +```kidl +typedef string ws_ref; /* Format: "workspace/object" or "workspace/object/version" */ +``` + +### Report Output +```kidl +typedef structure { + string report_name; + string report_ref; +} ReportOutput; +``` + +### Standard Input Pattern +```kidl +typedef structure { + string workspace_name; + string workspace_id; /* Alternative to name */ + /* ... other params */ +} StandardParams; +``` + +## Tips + +1. **Keep types simple** - Complex nested structures are hard to maintain +2. **Use meaningful names** - `genome_ref` not `gr` or `ref1` +3. **Document everything** - Comments become API documentation +4. **Use lists for collections** - `list` not repeated fields +5. 
**Use mappings for lookups** - `mapping` for ID-based access diff --git a/.claude/commands/kb-sdk-dev/context/ui-spec-reference.md b/.claude/commands/kb-sdk-dev/context/ui-spec-reference.md new file mode 100644 index 00000000..5b80f60e --- /dev/null +++ b/.claude/commands/kb-sdk-dev/context/ui-spec-reference.md @@ -0,0 +1,421 @@ +# KBase Narrative UI Specification Reference + +## File Structure + +Each app method requires two files in `ui/narrative/methods//`: +- `spec.json` - Parameter mapping and validation +- `display.yaml` - UI labels, hints, and documentation + +## spec.json Reference + +### Complete Structure + +```json +{ + "ver": "1.0.0", + "authors": ["username"], + "contact": "email@example.com", + "categories": ["active"], + "widgets": { + "input": null, + "output": "no-display" + }, + "parameters": [...], + "behavior": { + "service-mapping": {...} + }, + "job_id_output_field": "docker" +} +``` + +### Parameter Types + +#### Text Input +```json +{ + "id": "my_string", + "optional": false, + "advanced": false, + "allow_multiple": false, + "default_values": ["default_value"], + "field_type": "text", + "text_options": { + "valid_ws_types": [] + } +} +``` + +#### Integer Input +```json +{ + "id": "my_int", + "optional": true, + "advanced": false, + "allow_multiple": false, + "default_values": ["10"], + "field_type": "text", + "text_options": { + "validate_as": "int", + "min_int": 1, + "max_int": 100 + } +} +``` + +#### Float Input +```json +{ + "id": "my_float", + "optional": true, + "advanced": true, + "allow_multiple": false, + "default_values": ["0.5"], + "field_type": "text", + "text_options": { + "validate_as": "float", + "min_float": 0.0, + "max_float": 1.0 + } +} +``` + +#### Workspace Object Selector +```json +{ + "id": "genome_ref", + "optional": false, + "advanced": false, + "allow_multiple": false, + "default_values": [""], + "field_type": "text", + "text_options": { + "valid_ws_types": ["KBaseGenomes.Genome"] + } +} +``` + +#### Multiple Object Types +```json +{ + "id": "input_ref", + "optional": false, + "advanced": false, + "allow_multiple": false, + "default_values": [""], + "field_type": "text", + "text_options": { + "valid_ws_types": [ + "KBaseGenomes.Genome", + "KBaseGenomeAnnotations.Assembly" + ] + } +} +``` + +#### Dropdown/Select +```json +{ + "id": "algorithm", + "optional": false, + "advanced": false, + "allow_multiple": false, + "default_values": ["default"], + "field_type": "dropdown", + "dropdown_options": { + "options": [ + {"value": "fast", "display": "Fast (less accurate)"}, + {"value": "default", "display": "Default"}, + {"value": "accurate", "display": "Accurate (slower)"} + ] + } +} +``` + +#### Checkbox (Boolean) +```json +{ + "id": "include_empty", + "optional": true, + "advanced": true, + "allow_multiple": false, + "default_values": ["0"], + "field_type": "checkbox", + "checkbox_options": { + "checked_value": 1, + "unchecked_value": 0 + } +} +``` + +#### Multiple Selection +```json +{ + "id": "genomes", + "optional": false, + "advanced": false, + "allow_multiple": true, + "default_values": [""], + "field_type": "text", + "text_options": { + "valid_ws_types": ["KBaseGenomes.Genome"] + } +} +``` + +#### Textarea (Multi-line) +```json +{ + "id": "description", + "optional": true, + "advanced": false, + "allow_multiple": false, + "default_values": [""], + "field_type": "textarea", + "textarea_options": { + "n_rows": 5 + } +} +``` + +#### Output Object Name +```json +{ + "id": "output_name", + "optional": false, + "advanced": false, + "allow_multiple": 
false,
+    "default_values": [""],
+    "field_type": "text",
+    "text_options": {
+      "valid_ws_types": [],
+      "is_output_name": true
+    }
+}
+```
+
+### Behavior Section
+
+#### Input Mapping
+
+```json
+"input_mapping": [
+  {
+    "narrative_system_variable": "workspace",
+    "target_property": "workspace_name"
+  },
+  {
+    "narrative_system_variable": "workspace_id",
+    "target_property": "workspace_id"
+  },
+  {
+    "input_parameter": "genome_ref",
+    "target_property": "genome_ref",
+    "target_type_transform": "resolved-ref"
+  },
+  {
+    "input_parameter": "min_length",
+    "target_property": "min_length",
+    "target_type_transform": "int"
+  },
+  {
+    "input_parameter": "threshold",
+    "target_property": "threshold",
+    "target_type_transform": "float"
+  },
+  {
+    "input_parameter": "genomes",
+    "target_property": "genome_refs",
+    "target_type_transform": "list<ref>"
+  }
+]
+```
+
+#### Type Transforms
+
+| Transform | Description |
+|-----------|-------------|
+| `resolved-ref` | Converts object name to full reference |
+| `ref` | Keep as reference string |
+| `int` | Parse as integer |
+| `float` | Parse as float |
+| `string` | Keep as string (default) |
+| `list<ref>` | List of resolved references |
+| `list<int>` | List of integers |
+
+#### Output Mapping
+
+```json
+"output_mapping": [
+  {
+    "service_method_output_path": [0, "report_name"],
+    "target_property": "report_name"
+  },
+  {
+    "service_method_output_path": [0, "report_ref"],
+    "target_property": "report_ref"
+  },
+  {
+    "narrative_system_variable": "workspace",
+    "target_property": "workspace_name"
+  }
+]
+```
+
+### Widget Options
+
+```json
+"widgets": {
+  "input": null,
+  "output": "no-display"
+}
+```
+
+Common output widgets:
+- `"no-display"` - No output display (use for report-based apps)
+- `"kbaseReportView"` - Display KBase report
+
+## display.yaml Reference
+
+### Complete Structure
+
+```yaml
+name: My App Name
+
+tooltip: |
+    Brief one-line description of the app
+
+screenshots:
+    - my_screenshot.png
+
+icon: icon.png
+
+suggestions:
+    apps:
+        related:
+            - related_app_1
+            - related_app_2
+        next:
+            - follow_up_app
+    methods:
+        related: []
+        next: []
+
+parameters:
+    genome_ref:
+        ui-name: |
+            Genome
+        short-hint: |
+            Select a genome object
+        long-hint: |
+            Select a genome object from your Narrative data panel.
+            The genome should have annotated features.
+
+    min_length:
+        ui-name: |
+            Minimum Length
+        short-hint: |
+            Minimum feature length
+        long-hint: |
+            Features shorter than this value will be excluded
+            from the analysis. Default is 100 bp.
+
+    output_name:
+        ui-name: |
+            Output Name
+        short-hint: |
+            Name for the output object
+        long-hint: |
+            Provide a name for the output object that will be
+            saved to your Narrative.
+
+description: |

+    <p>Full description of the app in HTML format.</p>
+
+    <h3>Overview</h3>
+    <p>What this app does and why you would use it.</p>
+
+    <h3>Inputs</h3>
+    <ul>
+        <li>Genome - A KBase genome object</li>
+        <li>Minimum Length - Filter threshold</li>
+    </ul>
+
+    <h3>Outputs</h3>
+    <p>This app produces:</p>
+    <ul>
+        <li>A summary report</li>
+        <li>Downloadable data files</li>
+    </ul>
+
+    <h3>Algorithm</h3>
+    <p>Description of the methodology used.</p>
+ +publications: + - pmid: 12345678 + display-text: | + Author A, Author B (2024) Title of paper. Journal Name 10:123-456 + link: https://doi.org/10.xxxx/xxxxx + + - display-text: | + Software documentation at https://example.com + link: https://example.com +``` + +### Parameter Groups + +For complex apps, group related parameters: + +```yaml +parameter-groups: + basic_options: + ui-name: Basic Options + short-hint: Core parameters for the analysis + parameters: + - genome_ref + - output_name + + advanced_options: + ui-name: Advanced Options + short-hint: Fine-tune the analysis + parameters: + - min_length + - threshold + - algorithm +``` + +### Fixed Parameters + +Parameters not shown in UI but passed to service: + +```json +"fixed_parameters": [ + { + "target_property": "version", + "target_value": "1.0" + } +] +``` + +## Common Workspace Types for valid_ws_types + +| Type | Description | +|------|-------------| +| `KBaseGenomes.Genome` | Annotated genome | +| `KBaseGenomeAnnotations.Assembly` | Genome assembly | +| `KBaseSets.GenomeSet` | Set of genomes | +| `KBaseFBA.FBAModel` | Metabolic model | +| `KBaseFBA.FBA` | FBA solution | +| `KBaseFBA.Media` | Growth media | +| `KBaseRNASeq.RNASeqAlignment` | RNA-seq alignment | +| `KBaseMatrices.ExpressionMatrix` | Expression data | +| `KBaseFile.AssemblyFile` | Assembly file | +| `KBaseSets.ReadsSet` | Set of reads | + +## Tips + +1. **Use advanced: true** for optional parameters to reduce UI clutter +2. **Provide good defaults** - Apps should work with minimal configuration +3. **Write clear hints** - Users rely on short-hint for quick understanding +4. **Use dropdown for constrained choices** - Better than free text for enumerated options +5. **Group related parameters** - Improves usability for complex apps +6. **Include publications** - Helps users cite your work properly diff --git a/.claude/commands/kb-sdk-dev/context/workspace-datatypes.md b/.claude/commands/kb-sdk-dev/context/workspace-datatypes.md new file mode 100644 index 00000000..98b44dcf --- /dev/null +++ b/.claude/commands/kb-sdk-dev/context/workspace-datatypes.md @@ -0,0 +1,436 @@ +# KBase Workspace Data Types Reference + +## Overview + +KBase has **223 data types** across **45 modules**. This reference provides a quick lookup for the most commonly used types. + +**Full Specifications:** `/Users/chenry/Dropbox/Projects/workspace_deluxe/agent-io/docs/WorkspaceDataTypes/` +- `all_types_list.json` - Complete list of all types +- `all_type_specs.json` - Full specifications +- `individual_specs/` - Individual type specification files + +## Types by Module + +### Most Used Modules + +| Module | Type Count | Description | +|--------|-----------|-------------| +| KBaseFBA | 21 | Flux Balance Analysis, models | +| KBaseGenomes | 8 | Genomes, contigs, features | +| KBaseSets | 8 | Set collections | +| KBaseRNASeq | 13 | RNA sequencing | +| KBaseBiochem | 6 | Biochemistry, media | +| Communities | 31 | Metagenomics | + +--- + +## Core Genome Types (KBaseGenomes) + +### Genome +**Type:** `KBaseGenomes.Genome` + +The primary genome object containing annotations. + +**Key Fields:** +- `id` - Genome identifier +- `scientific_name` - Organism name +- `domain` - Bacteria, Archaea, Eukaryota +- `features` - List of genomic features +- `contigs` - Contig sequences (or reference) +- `source` - Data source (RefSeq, etc.) 
+ +**Usage:** +```python +genome = utils.get_object(workspace, genome_ref) +features = genome.get('features', []) +``` + +### ContigSet +**Type:** `KBaseGenomes.ContigSet` + +Set of DNA contigs/sequences. + +**Key Fields:** +- `id` - ContigSet identifier +- `contigs` - List of contig objects +- `source` - Data source + +### Feature +**Type:** `KBaseGenomes.Feature` + +Individual genomic feature (gene, CDS, etc.). + +**Key Fields:** +- `id` - Feature identifier +- `type` - Feature type (CDS, gene, rRNA, etc.) +- `location` - Genomic coordinates +- `function` - Functional annotation +- `protein_translation` - Amino acid sequence + +### Pangenome +**Type:** `KBaseGenomes.Pangenome` + +Comparison of multiple genomes. + +--- + +## FBA and Modeling Types (KBaseFBA) + +### FBAModel +**Type:** `KBaseFBA.FBAModel` + +Metabolic model for flux balance analysis. + +**Key Fields:** +- `id` - Model identifier +- `name` - Model name +- `modelreactions` - List of reactions +- `modelcompounds` - List of metabolites +- `modelcompartments` - Compartments +- `biomasses` - Biomass reactions +- `genome_ref` - Reference to source genome + +**Usage:** +```python +model = utils.get_object(workspace, model_ref) +reactions = model.get('modelreactions', []) +``` + +### FBA +**Type:** `KBaseFBA.FBA` + +FBA simulation result. + +**Key Fields:** +- `id` - FBA identifier +- `fbamodel_ref` - Reference to model +- `media_ref` - Media used +- `objectiveValue` - Objective function value +- `FBAReactionVariables` - Reaction flux values +- `FBAMetaboliteVariables` - Metabolite values + +### Gapfilling +**Type:** `KBaseFBA.Gapfilling` + +Gapfilling solution. + +### ModelTemplate +**Type:** `KBaseFBA.ModelTemplate` + +Template for building models. + +### ModelComparison +**Type:** `KBaseFBA.ModelComparison` + +Comparison of multiple models. + +--- + +## Biochemistry Types (KBaseBiochem) + +### Media +**Type:** `KBaseBiochem.Media` + +Growth media definition. + +**Key Fields:** +- `id` - Media identifier +- `name` - Media name +- `mediacompounds` - List of compounds and concentrations +- `type` - Media type + +**Usage:** +```python +media = utils.get_object(workspace, media_ref) +compounds = media.get('mediacompounds', []) +``` + +### Biochemistry +**Type:** `KBaseBiochem.Biochemistry` + +Biochemistry database (compounds, reactions). + +### CompoundSet +**Type:** `KBaseBiochem.CompoundSet` + +Collection of compounds. + +--- + +## Set Types (KBaseSets) + +### GenomeSet +**Type:** `KBaseSets.GenomeSet` + +Set of genome references. + +**Key Fields:** +- `description` - Set description +- `items` - List of genome references with labels + +### AssemblySet +**Type:** `KBaseSets.AssemblySet` + +Set of assembly references. + +### ReadsSet +**Type:** `KBaseSets.ReadsSet` + +Set of reads library references. + +### ExpressionSet +**Type:** `KBaseSets.ExpressionSet` + +Set of expression data references. + +### SampleSet +**Type:** `KBaseSets.SampleSet` + +Set of sample references. + +--- + +## Assembly Types (KBaseAssembly) + +### PairedEndLibrary +**Type:** `KBaseAssembly.PairedEndLibrary` + +Paired-end reads library. + +### SingleEndLibrary +**Type:** `KBaseAssembly.SingleEndLibrary` + +Single-end reads library. + +### AssemblyReport +**Type:** `KBaseAssembly.AssemblyReport` + +Assembly quality report. + +--- + +## RNA-Seq Types (KBaseRNASeq) + +### RNASeqAlignment +**Type:** `KBaseRNASeq.RNASeqAlignment` + +Read alignment result. + +### RNASeqExpression +**Type:** `KBaseRNASeq.RNASeqExpression` + +Expression values from RNA-Seq. 
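+
+**Usage (sketch):**
+
+This reference does not list the field layout for this type, so the snippet
+below, which assumes the same `utils.get_object` helper used in the Usage
+examples above, simply fetches the object and inspects its keys first:
+
+```python
+# Hypothetical sketch: utils.get_object is the assumed helper used elsewhere
+# in this file; inspect the returned dict before relying on specific fields.
+expression = utils.get_object(workspace, expression_ref)
+print(sorted(expression.keys()))
+```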
+ +### RNASeqDifferentialExpression +**Type:** `KBaseRNASeq.RNASeqDifferentialExpression` + +Differential expression analysis. + +### RNASeqSampleSet +**Type:** `KBaseRNASeq.RNASeqSampleSet` + +Set of RNA-Seq samples. + +--- + +## Expression Types (KBaseFeatureValues) + +### ExpressionMatrix +**Type:** `KBaseFeatureValues.ExpressionMatrix` + +Gene expression matrix. + +**Key Fields:** +- `genome_ref` - Reference genome +- `data` - Expression values matrix +- `feature_ids` - Row identifiers (genes) +- `condition_ids` - Column identifiers (conditions) + +### FeatureClusters +**Type:** `KBaseFeatureValues.FeatureClusters` + +Clustered features from expression data. + +--- + +## Annotation Types (KBaseGenomeAnnotations) + +### Assembly +**Type:** `KBaseGenomeAnnotations.Assembly` + +Genome assembly (newer format). + +### GenomeAnnotation +**Type:** `KBaseGenomeAnnotations.GenomeAnnotation` + +Genome with annotations (newer format). + +### Taxon +**Type:** `KBaseGenomeAnnotations.Taxon` + +Taxonomic information. + +--- + +## Report Type (KBaseReport) + +### Report +**Type:** `KBaseReport.Report` + +Standard app output report. + +**Key Fields:** +- `text_message` - Report text +- `objects_created` - List of created objects +- `file_links` - Links to downloadable files +- `html_links` - Links to HTML reports +- `warnings` - Warning messages + +**Usage:** +```python +report_info = utils.create_extended_report({ + 'message': 'Analysis complete', + 'workspace_name': workspace, + 'objects_created': [{'ref': obj_ref, 'description': 'My output'}], + 'file_links': [{'path': '/path/to/file.txt', 'name': 'results.txt'}] +}) +``` + +--- + +## File Types (KBaseFile) + +### FileRef +**Type:** `KBaseFile.FileRef` + +Reference to a file in Shock/Blobstore. + +### PairedEndLibrary +**Type:** `KBaseFile.PairedEndLibrary` + +Paired-end library (file-based). + +### SingleEndLibrary +**Type:** `KBaseFile.SingleEndLibrary` + +Single-end library (file-based). + +--- + +## Matrix Types (KBaseMatrices) + +### ExpressionMatrix +**Type:** `KBaseMatrices.ExpressionMatrix` + +Expression data matrix (newer format). + +### AmpliconMatrix +**Type:** `KBaseMatrices.AmpliconMatrix` + +Amplicon abundance matrix. + +### MetaboliteMatrix +**Type:** `KBaseMatrices.MetaboliteMatrix` + +Metabolite abundance matrix. + +### FitnessMatrix +**Type:** `KBaseMatrices.FitnessMatrix` + +Gene fitness data. + +--- + +## Phenotype Types (KBasePhenotypes) + +### PhenotypeSet +**Type:** `KBasePhenotypes.PhenotypeSet` + +Set of phenotype measurements. + +**Key Fields:** +- `genome_ref` - Associated genome +- `phenotypes` - List of phenotypes with media/gene knockouts + +### PhenotypeSimulationSet +**Type:** `KBasePhenotypes.PhenotypeSimulationSet` + +Predicted phenotypes from FBA. + +--- + +## Tree Types (KBaseTrees) + +### Tree +**Type:** `KBaseTrees.Tree` + +Phylogenetic tree. + +### MSA +**Type:** `KBaseTrees.MSA` + +Multiple sequence alignment. 
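+
+Whatever the module, workspace objects report a versioned type string (for
+example `KBaseGenomes.Genome-8.2`), and that string is the natural key for
+routing an object to type-specific handling. The sketch below is hypothetical:
+the handler fields come from this reference, but adapt how you obtain
+`obj_type` and `data` to your own client:
+
+```python
+def base_type(obj_type: str) -> str:
+    # Strip a trailing "-<major>.<minor>" version suffix if present
+    return obj_type.split("-")[0]
+
+# Handler functions keyed by base type; each uses a field documented above
+HANDLERS = {
+    "KBaseGenomes.Genome": lambda data: data.get("features", []),
+    "KBaseFBA.FBAModel": lambda data: data.get("modelreactions", []),
+    "KBaseBiochem.Media": lambda data: data.get("mediacompounds", []),
+}
+
+def extract_items(obj_type: str, data: dict):
+    handler = HANDLERS.get(base_type(obj_type))
+    if handler is None:
+        raise ValueError(f"No handler registered for {obj_type}")
+    return handler(data)
+```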
+ +--- + +## Complete Type List by Module + +### KBaseFBA (21 types) +- FBAModel, FBA, Gapfilling, Gapgeneration +- ModelTemplate, NewModelTemplate, ModelComparison +- FBAComparison, FBAModelSet +- FBAPathwayAnalysis, FBAPathwayAnalysisMultiple +- BooleanGeneExpressionData, BooleanGeneExpressionDataCollection +- Classifier, ClassifierResult, ClassifierTrainingSet +- ETC, EscherConfiguration, EscherMap +- PromConstraint, ReactionProbabilities +- ReactionSensitivityAnalysis, SubsystemAnnotation +- MissingRoleData, regulatory_network + +### KBaseGenomes (8 types) +- Genome, ContigSet, Feature +- GenomeComparison, GenomeDomainData +- MetagenomeAnnotation, Pangenome +- ProbabilisticAnnotation + +### KBaseSets (8 types) +- AssemblySet, DifferentialExpressionMatrixSet +- ExpressionSet, FeatureSetSet +- GenomeSet, ReadsAlignmentSet +- ReadsSet, SampleSet + +### KBaseBiochem (6 types) +- Biochemistry, BiochemistryStructures +- CompoundSet, Media, MediaSet +- MetabolicMap + +### KBaseRNASeq (13 types) +- RNASeqAlignment, RNASeqAlignmentSet +- RNASeqAnalysis, RNASeqExpression +- RNASeqExpressionSet, RNASeqSample +- RNASeqSampleAlignment, RNASeqSampleSet +- RNASeqDifferentialExpression +- RNASeqCuffdiffdifferentialExpression +- RNASeqCuffmergetranscriptome +- Bowtie2IndexV2, Bowtie2Indexes +- GFFAnnotation, ReferenceAnnotation +- AlignmentStatsResults, DifferentialExpressionStat +- cummerbund_output, cummerbundplot + +### KBaseCollections (6 types) +- FBAModelList, FBAModelSet +- FeatureList, FeatureSet +- GenomeList, GenomeSet + +--- + +## Type Reference Usage + +When you need detailed information about a specific type: + +```python +# Read the individual spec file +spec_path = f"/Users/chenry/Dropbox/Projects/workspace_deluxe/agent-io/docs/WorkspaceDataTypes/individual_specs/{module}_{type}.json" +``` + +Example spec file name: `KBaseGenomes_Genome.json` diff --git a/.claude/commands/modelseedpy-expert.md b/.claude/commands/modelseedpy-expert.md new file mode 100644 index 00000000..4bc7c6ce --- /dev/null +++ b/.claude/commands/modelseedpy-expert.md @@ -0,0 +1,221 @@ +# ModelSEEDpy Expert + +You are an expert on ModelSEEDpy - a Python package for metabolic model reconstruction, analysis, and gapfilling. You have comprehensive knowledge of: + +1. **Overall Architecture** - How the modules connect and interact +2. **Core Workflows** - Model building, gapfilling, FBA, community modeling +3. **Module Selection** - Which classes/functions to use for specific tasks +4. 
**Integration Patterns** - How ModelSEEDpy integrates with COBRApy and KBase + +## Related Expert Skills + +For deep dives into specific areas, use these specialized skills: +- `/msmodelutl-expert` - Deep expertise on MSModelUtil (central model wrapper) +- `/fbapkg-expert` - Deep expertise on FBA packages and constraint systems + +## Knowledge Loading + +Before answering, read relevant documentation based on the question: + +**Architecture Overview:** +- `/Users/chenry/Dropbox/Projects/ModelSEEDpy/modelseedpy/__init__.py` + +**For specific modules, read the source:** +- Core: `/Users/chenry/Dropbox/Projects/ModelSEEDpy/modelseedpy/core/` +- FBA Packages: `/Users/chenry/Dropbox/Projects/ModelSEEDpy/modelseedpy/fbapkg/` +- Community: `/Users/chenry/Dropbox/Projects/ModelSEEDpy/modelseedpy/community/` +- Biochemistry: `/Users/chenry/Dropbox/Projects/ModelSEEDpy/modelseedpy/biochem/` + +## Quick Reference: Module Map + +``` +ModelSEEDpy +│ +├── core/ # Core model utilities +│ ├── msmodelutl.py # MSModelUtil - Central model wrapper ⭐ +│ ├── msgapfill.py # MSGapfill - Gapfilling algorithms +│ ├── msfba.py # MSFBA - FBA execution +│ ├── msatpcorrection.py # MSATPCorrection - ATP analysis +│ ├── msmedia.py # MSMedia - Growth media definitions +│ ├── mstemplate.py # MSTemplate - Model templates +│ ├── msbuilder.py # MSBuilder - Model construction +│ ├── msgrowthphenotypes.py # Growth phenotype testing +│ ├── msminimalmedia.py # Minimal media computation +│ ├── fbahelper.py # FBAHelper - Low-level FBA utilities +│ └── msgenome.py # MSGenome - Genome handling +│ +├── fbapkg/ # FBA constraint packages +│ ├── mspackagemanager.py # MSPackageManager - Package registry ⭐ +│ ├── basefbapkg.py # BaseFBAPkg - Base class for packages +│ ├── gapfillingpkg.py # GapfillingPkg - Gapfilling constraints +│ ├── kbasemediapkg.py # KBaseMediaPkg - Media constraints +│ ├── flexiblebiomasspkg.py # FlexibleBiomassPkg - Biomass flexibility +│ ├── simplethermopkg.py # SimpleThermoPkg - Thermodynamic constraints +│ └── [15+ more packages] +│ +├── community/ # Community/multi-species modeling +│ ├── mscommunity.py # MSCommunity - Community models +│ ├── mssteadycom.py # MSSteadyCom - SteadyCom algorithm +│ └── mscommfitting.py # Community fitting +│ +├── biochem/ # ModelSEED biochemistry database +│ ├── modelseed_biochem.py # ModelSEEDBiochem - Reaction/compound DB +│ └── modelseed_reaction.py # Reaction utilities +│ +└── multiomics/ # Multi-omics integration + └── [omics integration tools] +``` + +## Common Workflows + +### Workflow 1: Load and Analyze a Model +```python +from modelseedpy.core.msmodelutl import MSModelUtil +from modelseedpy.core.msmedia import MSMedia + +# Load model +mdlutl = MSModelUtil.from_cobrapy("model.json") + +# Set media and run FBA +media = MSMedia.from_dict({"EX_cpd00027_e0": 10}) # Glucose +mdlutl.add_missing_exchanges(media) +mdlutl.set_media(media) +solution = mdlutl.model.optimize() +``` + +### Workflow 2: Gapfill a Model +```python +from modelseedpy.core.msgapfill import MSGapfill + +# Create gapfiller +gapfill = MSGapfill(mdlutl, default_target="bio1") + +# Run gapfilling +solution = gapfill.run_gapfilling(media, target="bio1") + +# Integrate solution +mdlutl.add_gapfilling(solution) +``` + +### Workflow 3: Build Model from Genome +```python +from modelseedpy.core.msbuilder import MSBuilder + +# Build draft model from genome +builder = MSBuilder(genome, template) +model = builder.build() +``` + +### Workflow 4: Community Modeling +```python +from modelseedpy.community.mscommunity import 
MSCommunity + +# Create community from member models +community = MSCommunity(member_models=[model1, model2]) +community.run_fba() +``` + +## Task → Module Routing + +| Task | Primary Module | Secondary | +|------|---------------|-----------| +| Load/wrap a model | `MSModelUtil` | - | +| Find metabolites/reactions | `MSModelUtil` | - | +| Set growth media | `MSModelUtil` + `KBaseMediaPkg` | `MSMedia` | +| Run FBA | `mdlutl.model.optimize()` | `MSFBA` | +| Gapfill a model | `MSGapfill` | `GapfillingPkg` | +| Test growth conditions | `MSModelUtil` | - | +| ATP correction | `MSATPCorrection` | - | +| Add custom constraints | `fbapkg` classes | `BaseFBAPkg` | +| Community modeling | `MSCommunity` | `MSSteadyCom` | +| Build model from genome | `MSBuilder` | `MSTemplate` | +| Access biochemistry DB | `ModelSEEDBiochem` | - | + +## Key Design Patterns + +### Singleton Caching +Both `MSModelUtil` and `MSPackageManager` use singleton patterns: +```python +# These return the same instance +mdlutl1 = MSModelUtil.get(model) +mdlutl2 = MSModelUtil.get(model) + +pkgmgr1 = MSPackageManager.get_pkg_mgr(model) +pkgmgr2 = MSPackageManager.get_pkg_mgr(model) +``` + +### Model Wrapping +All high-level classes accept either `model` or `MSModelUtil`: +```python +# Both work: +gapfill = MSGapfill(model) +gapfill = MSGapfill(mdlutl) +``` + +### Package System +FBA constraints are modular through packages: +```python +# Get or create a package +pkg = mdlutl.pkgmgr.getpkg("GapfillingPkg") + +# Packages add variables/constraints to the model +pkg.build_package(parameters) +``` + +## Guidelines for Responding + +1. **Route to specialized skills** when questions go deep: + - MSModelUtil details → suggest `/msmodelutl-expert` + - FBA package details → suggest `/fbapkg-expert` + +2. **Start with the right module** - Help users find where to begin + +3. **Show integration** - How modules work together + +4. **Provide working examples** - Complete, runnable code + +5. **Explain COBRApy relationship** - ModelSEEDpy wraps and extends COBRApy + +## Response Format + +### For "how do I" questions: +``` +### Approach + +Brief explanation of which modules to use and why. + +**Modules involved:** +- `Module1` - Purpose +- `Module2` - Purpose + +**Example:** +```python +# Complete working code +``` + +**For deeper information:** Use `/specialized-skill` +``` + +### For architecture questions: +``` +### Overview + +Explanation of the component/concept. + +### Key Classes + +- `ClassName` (module) - Purpose +- `ClassName` (module) - Purpose + +### How They Connect + +Explanation of relationships. + +### Example + +Working example showing integration. +``` + +## User Request + +$ARGUMENTS diff --git a/.claude/commands/modelseedpy-expert/context/architecture.md b/.claude/commands/modelseedpy-expert/context/architecture.md new file mode 100644 index 00000000..bf105a86 --- /dev/null +++ b/.claude/commands/modelseedpy-expert/context/architecture.md @@ -0,0 +1,285 @@ +# ModelSEEDpy Architecture + +## Overview + +ModelSEEDpy is a Python package for metabolic model reconstruction, gapfilling, and analysis. It builds on COBRApy and integrates with the ModelSEED biochemistry database and KBase platform. 
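+
+A minimal sketch of that layering, assuming a model saved in COBRApy's JSON
+format (every ModelSEEDpy call shown is documented later in this guide):
+
+```python
+import cobra
+from modelseedpy.core.msmodelutl import MSModelUtil
+
+model = cobra.io.load_json_model("model.json")  # plain COBRApy model
+mdlutl = MSModelUtil.get(model)                 # ModelSEEDpy wrapper around it
+
+# Both layers remain usable side by side:
+solution = mdlutl.model.optimize()              # COBRApy FBA
+hits = mdlutl.find_met("glucose", "c0")         # ModelSEEDpy search helper
+```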
+ +## Module Hierarchy + +``` +┌─────────────────────────────────────────────────────────────────┐ +│ User Code │ +└─────────────────────────────────────────────────────────────────┘ + │ + ┌─────────────────────┼─────────────────────┐ + │ │ │ + ▼ ▼ ▼ +┌───────────────┐ ┌───────────────┐ ┌───────────────┐ +│ MSGapfill │ │ MSCommunity │ │ MSBuilder │ +│ (Gapfilling) │ │ (Community) │ │ (Model build)│ +└───────────────┘ └───────────────┘ └───────────────┘ + │ │ │ + └─────────────────────┼─────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────┐ +│ MSModelUtil │ +│ (Central Model Wrapper - core/msmodelutl.py) │ +│ │ +│ • Wraps cobra.Model │ +│ • Provides metabolite/reaction search │ +│ • Manages media, exchanges, tests │ +│ • Coordinates with other components │ +└─────────────────────────────────────────────────────────────────┘ + │ + ┌─────────────────────┼─────────────────────┐ + │ │ │ + ▼ ▼ ▼ +┌───────────────┐ ┌───────────────┐ ┌───────────────┐ +│MSPackageManager│ │ MSATPCorrection│ │ModelSEEDBiochem│ +│ (FBA Packages) │ │ (ATP Analysis) │ │ (Biochem DB) │ +└───────────────┘ └───────────────┘ └───────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────┐ +│ FBA Packages (fbapkg/) │ +│ │ +│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ +│ │GapfillingPkg │ │KBaseMediaPkg │ │FlexBiomassPkg│ ... │ +│ └──────────────┘ └──────────────┘ └──────────────┘ │ +│ │ +│ All inherit from BaseFBAPkg │ +│ Add variables/constraints to model.solver │ +└─────────────────────────────────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────┐ +│ COBRApy Model │ +│ (cobra.Model object) │ +│ │ +│ • Reactions, Metabolites, Genes │ +│ • model.solver (optlang) │ +│ • model.optimize() │ +└─────────────────────────────────────────────────────────────────┘ +``` + +## Core Modules (modelseedpy/core/) + +### MSModelUtil (msmodelutl.py) ~2000 lines +**The central hub for model operations.** + +Key responsibilities: +- Wrap and extend cobra.Model +- Metabolite/reaction search and lookup +- Exchange and transport management +- Media configuration +- FBA testing and condition management +- Gapfilling support methods +- Integration with all other components + +```python +from modelseedpy.core.msmodelutl import MSModelUtil +mdlutl = MSModelUtil.get(model) # Singleton access +``` + +### MSGapfill (msgapfill.py) ~1200 lines +**Automated model gapfilling.** + +Key features: +- Multi-media gapfilling +- ATP-aware gapfilling +- Binary/linear reaction filtering +- Solution testing and validation + +```python +from modelseedpy.core.msgapfill import MSGapfill +gapfill = MSGapfill(mdlutl, default_target="bio1") +solution = gapfill.run_gapfilling(media, target="bio1") +``` + +### MSATPCorrection (msatpcorrection.py) +**ATP production analysis and correction.** + +Prevents models from producing ATP without valid biochemistry. 
+ +```python +atputl = mdlutl.get_atputl(core_template=template) +atp_tests = mdlutl.get_atp_tests() +``` + +### MSFBA (msfba.py) +**Higher-level FBA execution with reporting.** + +```python +from modelseedpy.core.msfba import MSFBA +fba = MSFBA(mdlutl) +result = fba.run_fba() +``` + +### MSMedia (msmedia.py) +**Growth media definitions.** + +```python +from modelseedpy.core.msmedia import MSMedia +media = MSMedia.from_dict({"EX_cpd00027_e0": 10}) +media = MSMedia.from_file("media.tsv") +``` + +### MSBuilder (msbuilder.py) +**Model construction from genome annotations.** + +```python +from modelseedpy.core.msbuilder import MSBuilder +builder = MSBuilder(genome, template) +model = builder.build() +``` + +### MSTemplate (mstemplate.py) +**Model templates for reconstruction.** + +Templates define which reactions can be added during reconstruction and their properties. + +### MSGrowthPhenotypes (msgrowthphenotypes.py) +**Phenotype testing and comparison.** + +Test model predictions against experimental growth data. + +## FBA Packages (modelseedpy/fbapkg/) + +### MSPackageManager +**Central registry for FBA packages.** + +```python +from modelseedpy.fbapkg import MSPackageManager +pkgmgr = MSPackageManager.get_pkg_mgr(model) # Singleton + +# List available packages +pkgmgr.list_available_packages() + +# Get or create a package +pkg = pkgmgr.getpkg("GapfillingPkg") +``` + +### BaseFBAPkg +**Base class for all FBA packages.** + +All packages inherit from this and implement: +- `build_package(params)` - Add constraints/variables +- `clear()` - Remove constraints/variables + +### Key Packages + +| Package | Purpose | +|---------|---------| +| `GapfillingPkg` | Gapfilling MILP formulation | +| `KBaseMediaPkg` | Media exchange constraints | +| `FlexibleBiomassPkg` | Flexible biomass composition | +| `SimpleThermoPkg` | Simple thermodynamic constraints | +| `FullThermoPkg` | Full thermodynamic constraints | +| `ReactionUsePkg` | Binary reaction usage variables | +| `RevBinPkg` | Reversibility binary variables | +| `ObjectivePkg` | Objective function management | +| `TotalFluxPkg` | Total flux minimization | +| `BilevelPkg` | Bilevel optimization | + +## Community Module (modelseedpy/community/) + +### MSCommunity (mscommunity.py) +**Multi-species community modeling.** + +```python +from modelseedpy.community.mscommunity import MSCommunity +community = MSCommunity(member_models=[m1, m2, m3]) +``` + +### MSSteadyCom (mssteadycom.py) +**SteadyCom algorithm for community FBA.** + +Computes steady-state community compositions. + +## Biochemistry Module (modelseedpy/biochem/) + +### ModelSEEDBiochem (modelseed_biochem.py) +**Access to ModelSEED reaction/compound database.** + +```python +from modelseedpy.biochem import ModelSEEDBiochem +biochem = ModelSEEDBiochem.get() +reaction = biochem.get_reaction("rxn00001") +compound = biochem.get_compound("cpd00001") +``` + +## Key Design Patterns + +### 1. Singleton/Cache Pattern +Used by MSModelUtil, MSPackageManager, ModelSEEDBiochem: + +```python +# Same instance returned for same model +mdlutl1 = MSModelUtil.get(model) +mdlutl2 = MSModelUtil.get(model) +assert mdlutl1 is mdlutl2 +``` + +### 2. Model/Utility Acceptance +All high-level classes accept either raw model or utility: + +```python +def __init__(self, model_or_mdlutl): + self.mdlutl = MSModelUtil.get(model_or_mdlutl) + self.model = self.mdlutl.model +``` + +### 3. 
Package Registration +FBA packages self-register with MSPackageManager: + +```python +class MyPkg(BaseFBAPkg): + def __init__(self, model): + super().__init__(model, "MyPkg", ...) + # BaseFBAPkg.__init__ calls pkgmgr.addpkgobj(self) +``` + +### 4. Lazy Loading +Heavy components loaded on demand: + +```python +# MSATPCorrection created only when needed +atputl = mdlutl.get_atputl() # Creates if missing +``` + +## Data Flow Example: Gapfilling + +``` +User Request: "Gapfill model on glucose media" + │ + ▼ + ┌───────────────┐ + │ MSGapfill │ + │ │ + │ 1. Get media │ + │ 2. Setup FBA │ + │ 3. Run MILP │ + │ 4. Filter │ + └───────────────┘ + │ + ┌───────────┼───────────┐ + │ │ │ + ▼ ▼ ▼ +┌─────────────┐ ┌─────────┐ ┌─────────────┐ +│MSModelUtil │ │GapfillPkg│ │KBaseMediaPkg│ +│ │ │ │ │ │ +│set_media() │ │build_pkg │ │build_pkg │ +│test_soln() │ │(MILP) │ │(bounds) │ +└─────────────┘ └─────────┘ └─────────────┘ + │ │ │ + └───────────┼───────────┘ + │ + ▼ + ┌───────────────┐ + │ cobra.Model │ + │ │ + │ .solver │ + │ .optimize() │ + └───────────────┘ +``` diff --git a/.claude/commands/modelseedpy-expert/context/workflows.md b/.claude/commands/modelseedpy-expert/context/workflows.md new file mode 100644 index 00000000..dd9b4fef --- /dev/null +++ b/.claude/commands/modelseedpy-expert/context/workflows.md @@ -0,0 +1,316 @@ +# ModelSEEDpy Common Workflows + +## Workflow 1: Load and Analyze an Existing Model + +```python +from modelseedpy.core.msmodelutl import MSModelUtil +from modelseedpy.core.msmedia import MSMedia + +# Load model from file +mdlutl = MSModelUtil.from_cobrapy("model.json") +# Or wrap an existing COBRApy model +# mdlutl = MSModelUtil.get(cobra_model) + +# Inspect model +print(f"Reactions: {len(mdlutl.model.reactions)}") +print(f"Metabolites: {len(mdlutl.model.metabolites)}") + +# Find specific metabolites +glucose_list = mdlutl.find_met("glucose", "c0") +if glucose_list: + glucose = glucose_list[0] + print(f"Found glucose: {glucose.id}") + +# Set up media +media = MSMedia.from_dict({ + "EX_cpd00027_e0": 10, # Glucose + "EX_cpd00001_e0": 1000, # Water + "EX_cpd00009_e0": 1000, # Phosphate + # ... 
other nutrients +}) + +# Ensure exchanges exist +mdlutl.add_missing_exchanges(media) +mdlutl.set_media(media) + +# Run FBA +solution = mdlutl.model.optimize() +print(f"Growth rate: {solution.objective_value}") +``` + +## Workflow 2: Gapfill a Non-Growing Model + +```python +from modelseedpy.core.msmodelutl import MSModelUtil +from modelseedpy.core.msgapfill import MSGapfill +from modelseedpy.core.msmedia import MSMedia + +# Load model +mdlutl = MSModelUtil.from_cobrapy("draft_model.json") + +# Define media +media = MSMedia.from_dict({"EX_cpd00027_e0": 10}) +mdlutl.add_missing_exchanges(media) + +# Check if model grows (probably not if draft) +mdlutl.set_media(media) +sol = mdlutl.model.optimize() +print(f"Pre-gapfill growth: {sol.objective_value}") + +# Create gapfiller +gapfill = MSGapfill( + mdlutl, + default_target="bio1", + minimum_obj=0.1 # Minimum required growth +) + +# Run gapfilling +solution = gapfill.run_gapfilling( + media=media, + target="bio1" +) + +print(f"Gapfilling solution: {solution}") + +# Test which reactions are truly needed +unneeded = mdlutl.test_solution( + solution, + targets=["bio1"], + medias=[media], + thresholds=[0.1], + remove_unneeded_reactions=True +) + +# Record the gapfilling +mdlutl.add_gapfilling(solution) + +# Verify growth +sol = mdlutl.model.optimize() +print(f"Post-gapfill growth: {sol.objective_value}") + +# Save model +mdlutl.save_model("gapfilled_model.json") +``` + +## Workflow 3: ATP-Aware Gapfilling + +```python +from modelseedpy.core.msmodelutl import MSModelUtil +from modelseedpy.core.msgapfill import MSGapfill +from modelseedpy.core.mstemplate import MSTemplateBuilder + +# Load model +mdlutl = MSModelUtil.from_cobrapy("model.json") + +# Get core template for ATP tests +template = MSTemplateBuilder.build_core_template() + +# Get ATP test conditions (prevents ATP loops) +atp_tests = mdlutl.get_atp_tests(core_template=template) + +# Create gapfiller with ATP constraints +gapfill = MSGapfill(mdlutl, default_target="bio1") + +# Run ATP-constrained gapfilling +solution = gapfill.run_gapfilling( + media=media, + target="bio1", + atp_tests=atp_tests # Prevents solutions that produce free ATP +) +``` + +## Workflow 4: Test Growth on Multiple Media + +```python +from modelseedpy.core.msmodelutl import MSModelUtil +from modelseedpy.core.msmedia import MSMedia + +mdlutl = MSModelUtil.from_cobrapy("model.json") + +# Define test conditions +conditions = [ + { + "media": MSMedia.from_dict({"EX_cpd00027_e0": 10}), # Glucose + "objective": "bio1", + "is_max_threshold": False, # Must grow ABOVE threshold + "threshold": 0.1 + }, + { + "media": MSMedia.from_dict({"EX_cpd00029_e0": 10}), # Acetate + "objective": "bio1", + "is_max_threshold": False, + "threshold": 0.05 + }, + { + "media": MSMedia.from_dict({"EX_cpd00036_e0": 10}), # Succinate + "objective": "bio1", + "is_max_threshold": False, + "threshold": 0.05 + } +] + +# Add missing exchanges for all media +for cond in conditions: + mdlutl.add_missing_exchanges(cond["media"]) + +# Test all conditions +results = {} +for i, cond in enumerate(conditions): + passed = mdlutl.test_single_condition(cond) + media_name = f"condition_{i}" + results[media_name] = passed + print(f"{media_name}: {'PASS' if passed else 'FAIL'}") + +# Or use batch testing +all_passed = mdlutl.test_condition_list(conditions) +print(f"All conditions passed: {all_passed}") +``` + +## Workflow 5: Build Model from Genome + +```python +from modelseedpy.core.msbuilder import MSBuilder +from modelseedpy.core.msgenome import MSGenome +from 
modelseedpy.core.mstemplate import MSTemplateBuilder + +# Load genome +genome = MSGenome.from_fasta("genome.fasta") +# Or from annotation +# genome = MSGenome.from_rast(annotation_data) + +# Get template +template = MSTemplateBuilder.build_template("GramNegative") + +# Build model +builder = MSBuilder(genome, template) +model = builder.build() + +# Wrap in MSModelUtil for further operations +mdlutl = MSModelUtil.get(model) +print(f"Built model with {len(model.reactions)} reactions") +``` + +## Workflow 6: Community Modeling + +```python +from modelseedpy.core.msmodelutl import MSModelUtil +from modelseedpy.community.mscommunity import MSCommunity + +# Load individual models +model1 = MSModelUtil.from_cobrapy("species1.json").model +model2 = MSModelUtil.from_cobrapy("species2.json").model +model3 = MSModelUtil.from_cobrapy("species3.json").model + +# Create community +community = MSCommunity( + member_models=[model1, model2, model3], + ids=["sp1", "sp2", "sp3"] +) + +# Run community FBA +result = community.run_fba() + +# Get individual contributions +for member in community.members: + print(f"{member.id}: {member.growth_rate}") +``` + +## Workflow 7: Add Custom FBA Constraints + +```python +from modelseedpy.core.msmodelutl import MSModelUtil +from modelseedpy.fbapkg import MSPackageManager + +mdlutl = MSModelUtil.from_cobrapy("model.json") +pkgmgr = mdlutl.pkgmgr + +# Get reaction use package (binary variables for reaction on/off) +rxn_use_pkg = pkgmgr.getpkg("ReactionUsePkg") +rxn_use_pkg.build_package({ + "reaction_list": mdlutl.model.reactions +}) + +# Get total flux package (minimize total flux) +total_flux_pkg = pkgmgr.getpkg("TotalFluxPkg") +total_flux_pkg.build_package() + +# Get thermodynamic package +thermo_pkg = pkgmgr.getpkg("SimpleThermoPkg") +thermo_pkg.build_package() + +# Run FBA with all constraints active +solution = mdlutl.model.optimize() +``` + +## Workflow 8: Flexible Biomass Analysis + +```python +from modelseedpy.core.msmodelutl import MSModelUtil +from modelseedpy.fbapkg import MSPackageManager + +mdlutl = MSModelUtil.from_cobrapy("model.json") + +# Get flexible biomass package +flex_bio_pkg = mdlutl.pkgmgr.getpkg("FlexibleBiomassPkg") + +# Build with flexibility parameters +flex_bio_pkg.build_package({ + "bio_rxn_id": "bio1", + "flex_coefficient": 0.1, # Allow 10% flexibility + "use_rna_class": True, + "use_protein_class": True +}) + +# Now biomass composition can vary within bounds +solution = mdlutl.model.optimize() +``` + +## Workflow 9: Compare Multiple Solutions + +```python +from modelseedpy.core.msmodelutl import MSModelUtil +from modelseedpy.core.msmedia import MSMedia + +mdlutl = MSModelUtil.from_cobrapy("model.json") + +# Run FBA on different media and collect solutions +solutions = {} + +media_glucose = MSMedia.from_dict({"EX_cpd00027_e0": 10}) +mdlutl.add_missing_exchanges(media_glucose) +mdlutl.set_media(media_glucose) +solutions["glucose"] = mdlutl.model.optimize() + +media_acetate = MSMedia.from_dict({"EX_cpd00029_e0": 10}) +mdlutl.add_missing_exchanges(media_acetate) +mdlutl.set_media(media_acetate) +solutions["acetate"] = mdlutl.model.optimize() + +# Export comparison to CSV +mdlutl.print_solutions(solutions, "flux_comparison.csv") +``` + +## Workflow 10: Debugging - Find Unproducible Biomass Components + +```python +from modelseedpy.core.msmodelutl import MSModelUtil + +mdlutl = MSModelUtil.from_cobrapy("model.json") + +# Set up media +mdlutl.set_media(media) + +# Find biomass components that can't be produced +unproducible = 
mdlutl.find_unproducible_biomass_compounds( + target_rxn="bio1" +) + +for met in unproducible: + print(f"Cannot produce: {met.id} - {met.name}") + +# Check sensitivity to specific reaction knockouts +ko_results = mdlutl.find_unproducible_biomass_compounds( + target_rxn="bio1", + ko_list=[["rxn00001_c0", ">"], ["rxn00002_c0", "<"]] +) +``` diff --git a/.claude/commands/msmodelutl-expert.md b/.claude/commands/msmodelutl-expert.md new file mode 100644 index 00000000..6059df81 --- /dev/null +++ b/.claude/commands/msmodelutl-expert.md @@ -0,0 +1,175 @@ +# MSModelUtil Expert + +You are an expert on the MSModelUtil class from ModelSEEDpy. You have deep knowledge of: + +1. **The MSModelUtil API** - All 55+ methods, their parameters, return values, and usage +2. **Integration patterns** - How MSModelUtil connects with MSGapfill, MSFBA, MSPackageManager, etc. +3. **Best practices** - Efficient ways to use the API, common pitfalls to avoid +4. **Debugging** - How to diagnose issues in code using MSModelUtil + +## Related Expert Skills + +For questions outside MSModelUtil's scope, suggest these specialized skills: +- `/modelseedpy-expert` - General ModelSEEDpy overview, module routing, workflows +- `/fbapkg-expert` - Deep dive on FBA packages (GapfillingPkg, KBaseMediaPkg, etc.) + +## Knowledge Loading + +Before answering, read the current MSModelUtil documentation: + +**Primary Reference (always read):** +- `/Users/chenry/Dropbox/Projects/ModelSEEDpy/agent-io/docs/msmodelutl-developer-guide.md` + +**Source Code (read when needed for implementation details):** +- `/Users/chenry/Dropbox/Projects/ModelSEEDpy/modelseedpy/core/msmodelutl.py` + +## Quick Reference: Essential Patterns + +### Pattern 1: Safe Instance Access +```python +# Always use get() for consistent instance access +mdlutl = MSModelUtil.get(model) # Works with model or mdlutl + +# Functions should accept either +def my_function(model_or_mdlutl): + mdlutl = MSModelUtil.get(model_or_mdlutl) + model = mdlutl.model +``` + +### Pattern 2: Find and Operate on Metabolites +```python +# Always handle empty results +mets = mdlutl.find_met("glucose", "c0") +if mets: + glucose = mets[0] + # Do something with glucose +else: + # Handle not found +``` + +### Pattern 3: Add Exchanges for Media +```python +# Before setting media, ensure exchanges exist +missing = mdlutl.add_missing_exchanges(media) +if missing: + print(f"Added exchanges for: {missing}") +mdlutl.set_media(media) +``` + +### Pattern 4: Test Growth Conditions +```python +condition = { + "media": media, + "objective": "bio1", + "is_max_threshold": True, # True = must be BELOW threshold + "threshold": 0.1 +} +mdlutl.apply_test_condition(condition) +passed = mdlutl.test_single_condition(condition, apply_condition=False) +``` + +### Pattern 5: Gapfill and Validate +```python +# After gapfilling +solution = gapfiller.run_gapfilling(media, target="bio1") + +# Test which reactions are actually needed +unneeded = mdlutl.test_solution( + solution, + targets=["bio1"], + medias=[media], + thresholds=[0.1], + remove_unneeded_reactions=True +) +``` + +## Common Mistakes to Avoid + +1. **Not using get()**: Creating multiple MSModelUtil instances for same model +2. **Ignoring empty find_met results**: Always check if list is empty +3. **Forgetting build_metabolite_hash()**: Called automatically by find_met, but cached +4. **Wrong threshold interpretation**: is_max_threshold=True means FAIL if >= threshold +5. 
**Not adding exchanges before setting media**: Use add_missing_exchanges() first + +## Integration Map + +``` +MSModelUtil ↔ MSGapfill +- MSGapfill takes MSModelUtil in constructor +- Sets mdlutl.gfutl = self for bidirectional access +- Uses mdlutl.test_solution() for solution validation +- Uses mdlutl.reaction_expansion_test() for minimal solutions + +MSModelUtil ↔ MSPackageManager +- Created automatically: self.pkgmgr = MSPackageManager.get_pkg_mgr(model) +- Used for media: self.pkgmgr.getpkg("KBaseMediaPkg").build_package(media) +- All FBA packages access model through MSPackageManager + +MSModelUtil ↔ MSATPCorrection +- Lazy-loaded via get_atputl() +- Sets self.atputl for caching +- Uses ATP tests for gapfilling constraints + +MSModelUtil ↔ ModelSEEDBiochem +- Used in add_ms_reaction() for reaction data +- Used in assign_reliability_scores_to_reactions() for scoring + +MSModelUtil ↔ MSFBA +- MSFBA wraps model_or_mdlutl input +- Uses MSModelUtil for consistent access +``` + +## Guidelines for Responding + +When helping users: + +1. **Be specific** - Reference exact method names, parameters, and return types +2. **Show examples** - Provide working code snippets +3. **Explain integration** - Show how methods connect to other ModelSEEDpy components +4. **Warn about pitfalls** - Mention common mistakes and how to avoid them +5. **Read the docs first** - Always consult the developer guide for accurate information + +## Response Format + +### For API questions: +``` +### Method: `method_name(params)` + +**Purpose:** Brief description + +**Parameters:** +- `param1` (type): Description +- `param2` (type, optional): Description + +**Returns:** Description of return value + +**Example:** +```python +# Working example +``` + +**Related methods:** List of related methods +``` + +### For "how do I" questions: +``` +### Approach + +Brief explanation of the approach. 
+ +**Step 1:** Description +```python +code +``` + +**Step 2:** Description +```python +code +``` + +**Notes:** Any important considerations +``` + +## User Request + +$ARGUMENTS diff --git a/.claude/commands/msmodelutl-expert/context/api-summary.md b/.claude/commands/msmodelutl-expert/context/api-summary.md new file mode 100644 index 00000000..86669d97 --- /dev/null +++ b/.claude/commands/msmodelutl-expert/context/api-summary.md @@ -0,0 +1,121 @@ +# MSModelUtil API Quick Reference + +## Core Concepts + +- **Singleton pattern**: Use `MSModelUtil.get(model)` to get/create instances +- **Wraps cobra.Model**: Access via `mdlutl.model` +- **Integrates with MSPackageManager**: Access via `mdlutl.pkgmgr` +- **Location**: `modelseedpy/core/msmodelutl.py` (~2,000 lines) + +## Essential Methods + +### Factory/Initialization +| Method | Description | +|--------|-------------| +| `MSModelUtil.get(model)` | Get or create instance (PREFERRED) | +| `MSModelUtil.from_cobrapy(filename)` | Load from file | +| `MSModelUtil(model)` | Direct construction | + +### Metabolite Search +| Method | Description | +|--------|-------------| +| `find_met(name, compartment=None)` | Find metabolites by name/ID | +| `msid_hash()` | Get ModelSEED ID to metabolite mapping | +| `metabolite_msid(met)` [static] | Extract ModelSEED ID from metabolite | +| `build_metabolite_hash()` | Build internal lookup caches | + +### Reaction Operations +| Method | Description | +|--------|-------------| +| `rxn_hash()` | Get stoichiometry to reaction mapping | +| `find_reaction(stoichiometry)` | Find reaction by stoichiometry | +| `exchange_list()` | Get exchange reactions | +| `exchange_hash()` | Metabolite to exchange mapping | +| `is_core(rxn)` | Check if reaction is core metabolism | + +### Exchange/Transport +| Method | Description | +|--------|-------------| +| `add_exchanges_for_metabolites(cpds, uptake, excretion)` | Add exchanges | +| `add_transport_and_exchange_for_metabolite(met, direction)` | Add transport | +| `add_missing_exchanges(media)` | Fill media gaps | + +### Media/FBA +| Method | Description | +|--------|-------------| +| `set_media(media)` | Configure growth media | +| `apply_test_condition(condition)` | Apply test constraints | +| `test_single_condition(condition)` | Run single test | +| `test_condition_list(conditions)` | Run multiple tests | + +### Gapfilling Support +| Method | Description | +|--------|-------------| +| `test_solution(solution, targets, medias, thresholds)` | Validate solutions | +| `add_gapfilling(solution)` | Record integrated gapfilling | +| `reaction_expansion_test(rxn_list, conditions)` | Find minimal sets | + +### ATP Correction +| Method | Description | +|--------|-------------| +| `get_atputl()` | Get ATP correction utility | +| `get_atp_tests()` | Get ATP test conditions | + +### Model Editing +| Method | Description | +|--------|-------------| +| `add_ms_reaction(rxn_dict)` | Add ModelSEED reactions | +| `add_atp_hydrolysis(compartment)` | Add ATP hydrolysis | +| `get_attributes()` / `save_attributes()` | Model metadata | + +### Analysis +| Method | Description | +|--------|-------------| +| `assign_reliability_scores_to_reactions()` | Score reactions | +| `find_unproducible_biomass_compounds()` | Biomass sensitivity | +| `analyze_minimal_reaction_set(solution, label)` | Alternative analysis | + +### I/O +| Method | Description | +|--------|-------------| +| `save_model(filename, format)` | Save model to file | +| `printlp(filename)` | Write LP for debugging | +| 
`print_solutions(solution_hash, filename)` | Export solutions to CSV | + +## Key Instance Attributes + +```python +self.model # The wrapped cobra.Model +self.pkgmgr # MSPackageManager for this model +self.atputl # MSATPCorrection instance (lazy-loaded) +self.gfutl # MSGapfill reference (set by gapfiller) +self.metabolite_hash # Metabolite lookup cache +self.test_objective # Current test objective value +self.reaction_scores # Gapfilling reaction scores +self.integrated_gapfillings # List of integrated solutions +self.attributes # Model metadata dictionary +``` + +## Condition Dictionary Format + +```python +condition = { + "media": MSMedia, # Media object + "objective": "bio1", # Objective reaction ID + "is_max_threshold": True, # True = FAIL if value >= threshold + "threshold": 0.1 # Threshold value +} +``` + +## Solution Dictionary Format + +```python +solution = { + "new": {"rxn00001_c0": ">"}, # Newly added reactions + "reversed": {"rxn00002_c0": "<"}, # Direction-reversed reactions + "media": media, # Media used + "target": "bio1", # Target reaction + "minobjective": 0.1, # Minimum objective + "binary_check": True # Binary filtering done +} +``` diff --git a/.claude/commands/msmodelutl-expert/context/integration.md b/.claude/commands/msmodelutl-expert/context/integration.md new file mode 100644 index 00000000..6f99490c --- /dev/null +++ b/.claude/commands/msmodelutl-expert/context/integration.md @@ -0,0 +1,239 @@ +# MSModelUtil Integration Map + +## Module Architecture + +``` +┌─────────────────────────────────────────────────────────────────┐ +│ MSModelUtil │ +│ (Central Model Wrapper) │ +└───────────────────────────┬─────────────────────────────────────┘ + │ + ┌───────────────────┼───────────────────┐ + │ │ │ + ▼ ▼ ▼ +┌───────────────┐ ┌───────────────┐ ┌───────────────────────┐ +│ MSFBA │ │ MSGapfill │ │ MSPackageManager │ +│ (FBA runner) │ │ (Gapfilling) │ │ (Constraint pkgs) │ +└───────────────┘ └───────────────┘ └───────────────────────┘ + │ │ │ + └───────────────────┼───────────────────┘ + │ + ┌───────────────────┼───────────────────┐ + │ │ │ + ▼ ▼ ▼ +┌───────────────┐ ┌───────────────┐ ┌───────────────────────┐ +│ MSMedia │ │MSATPCorrection│ │ ModelSEEDBiochem │ +│ (Media def) │ │ (ATP tests) │ │ (Reaction database) │ +└───────────────┘ └───────────────┘ └───────────────────────┘ +``` + +## Key Relationships + +### MSModelUtil ↔ MSGapfill + +**Connection:** +- MSGapfill takes MSModelUtil in constructor +- Sets `mdlutl.gfutl = self` for bidirectional access + +**Methods Used:** +- `mdlutl.test_solution()` - Validates gapfilling solutions +- `mdlutl.reaction_expansion_test()` - Finds minimal reaction sets +- `mdlutl.add_gapfilling()` - Records integrated solutions +- `mdlutl.assign_reliability_scores_to_reactions()` - Scores reactions for gapfilling + +**Example:** +```python +from modelseedpy.core.msgapfill import MSGapfill + +# MSGapfill stores reference to mdlutl +gapfill = MSGapfill(mdlutl, default_target="bio1") +# Now: mdlutl.gfutl == gapfill + +# Use mdlutl methods for solution validation +solution = gapfill.run_gapfilling(media, target="bio1") +unneeded = mdlutl.test_solution(solution, ["bio1"], [media], [0.1]) +``` + +### MSModelUtil ↔ MSPackageManager + +**Connection:** +- Created automatically in `__init__`: `self.pkgmgr = MSPackageManager.get_pkg_mgr(model)` +- Provides FBA constraint packages + +**Methods Used:** +- `mdlutl.pkgmgr.getpkg("KBaseMediaPkg").build_package(media)` - Apply media constraints +- `mdlutl.pkgmgr.getpkg("ObjectivePkg")` - Set objectives +- All 
FBA packages access model through MSPackageManager + +**Example:** +```python +# MSModelUtil uses pkgmgr internally for set_media() +mdlutl.set_media(media) +# Equivalent to: +# mdlutl.pkgmgr.getpkg("KBaseMediaPkg").build_package(media) +``` + +### MSModelUtil ↔ MSATPCorrection + +**Connection:** +- Lazy-loaded via `get_atputl()` +- Sets `self.atputl` for caching +- Used for ATP production tests during gapfilling + +**Methods Used:** +- `mdlutl.get_atputl()` - Get or create MSATPCorrection +- `mdlutl.get_atp_tests()` - Get ATP test conditions +- ATP tests are used as constraints during gapfilling + +**Example:** +```python +from modelseedpy.core.mstemplate import MSTemplateBuilder + +template = MSTemplateBuilder.build_core_template() +atputl = mdlutl.get_atputl(core_template=template) +tests = mdlutl.get_atp_tests(core_template=template) + +# Tests are condition dicts that can be used with test_single_condition +for test in tests: + passed = mdlutl.test_single_condition(test) +``` + +### MSModelUtil ↔ ModelSEEDBiochem + +**Connection:** +- Used for reaction/compound database lookups +- Not stored as instance attribute (imported when needed) + +**Methods Used:** +- `mdlutl.add_ms_reaction()` - Adds reactions from ModelSEED database +- `mdlutl.assign_reliability_scores_to_reactions()` - Uses biochemistry data for scoring + +**Example:** +```python +# Add ModelSEED reactions by ID +reactions = mdlutl.add_ms_reaction({ + "rxn00001": "c0", # Reaction ID -> compartment + "rxn00002": "c0" +}) +``` + +### MSModelUtil ↔ MSFBA + +**Connection:** +- MSFBA wraps `model_or_mdlutl` input +- Uses `MSModelUtil.get()` for consistent access + +**Example:** +```python +from modelseedpy.core.msfba import MSFBA + +# MSFBA internally calls MSModelUtil.get() +fba = MSFBA(mdlutl) +# or +fba = MSFBA(model) # Will create/get MSModelUtil +``` + +### MSModelUtil ↔ MSMedia + +**Connection:** +- MSMedia objects are passed to `set_media()` +- Used in test conditions + +**Example:** +```python +from modelseedpy.core.msmedia import MSMedia + +# Create media +media = MSMedia.from_dict({"EX_cpd00027_e0": 10}) + +# Apply to model +mdlutl.add_missing_exchanges(media) +mdlutl.set_media(media) +``` + +## Dependency Chain + +``` +User Code + │ + ▼ +MSGapfill / MSFBA / MSCommunity + │ + ▼ +MSModelUtil ◄──────────────────┐ + │ │ + ├── MSPackageManager ───────┤ + │ │ │ + │ ▼ │ + │ FBA Packages │ + │ │ + ├── MSATPCorrection ────────┤ + │ │ + └── ModelSEEDBiochem │ + │ │ + └───────────────────┘ +``` + +## Instance Attributes Set by Other Modules + +| Attribute | Set By | Purpose | +|-----------|--------|---------| +| `mdlutl.gfutl` | MSGapfill | Reference to gapfiller | +| `mdlutl.atputl` | get_atputl() | Cached ATP correction utility | +| `mdlutl.pkgmgr` | __init__ | Package manager for constraints | +| `mdlutl.reaction_scores` | MSGapfill | Gapfilling reaction scores | + +## Cross-Module Workflows + +### Gapfilling Workflow + +```python +# 1. Create MSModelUtil +mdlutl = MSModelUtil.get(model) + +# 2. Create MSGapfill (sets mdlutl.gfutl) +gapfill = MSGapfill(mdlutl) + +# 3. Get ATP tests (creates mdlutl.atputl) +atp_tests = mdlutl.get_atp_tests(core_template=template) + +# 4. Run gapfilling (uses pkgmgr internally) +solution = gapfill.run_gapfilling(media, target="bio1") + +# 5. Validate solution (uses test_solution) +unneeded = mdlutl.test_solution(solution, ["bio1"], [media], [0.1]) + +# 6. Record gapfilling +mdlutl.add_gapfilling(solution) +``` + +### FBA Workflow + +```python +# 1. 
Create MSModelUtil
+mdlutl = MSModelUtil.get(model)
+
+# 2. Set media (uses pkgmgr)
+mdlutl.add_missing_exchanges(media)
+mdlutl.set_media(media)
+
+# 3. Run FBA (through cobra.Model)
+solution = mdlutl.model.optimize()
+
+# 4. Analyze results
+print(f"Growth: {solution.objective_value}")
+```
+
+### Community Modeling Workflow
+
+```python
+from modelseedpy.community.mscommunity import MSCommunity
+
+# MSCommunity creates MSModelUtil for each member
+community = MSCommunity(model=community_model, member_models=[m1, m2])
+
+# Each member has its own MSModelUtil
+for member in community.members:
+    mdlutl = member.model_util
+    # Work with individual member
+```
diff --git a/.claude/commands/msmodelutl-expert/context/patterns.md b/.claude/commands/msmodelutl-expert/context/patterns.md
new file mode 100644
index 00000000..34baf3ce
--- /dev/null
+++ b/.claude/commands/msmodelutl-expert/context/patterns.md
@@ -0,0 +1,257 @@
+# Common MSModelUtil Patterns
+
+## Pattern 1: Safe Instance Access
+
+```python
+from modelseedpy.core.msmodelutl import MSModelUtil
+
+# Always use get() for consistent instance access
+mdlutl = MSModelUtil.get(model)  # Works with model or mdlutl
+
+# Multiple calls return same instance
+mdlutl1 = MSModelUtil.get(model)
+mdlutl2 = MSModelUtil.get(model)
+assert mdlutl1 is mdlutl2  # True
+
+# Functions should accept either model or mdlutl
+def my_function(model_or_mdlutl):
+    mdlutl = MSModelUtil.get(model_or_mdlutl)
+    model = mdlutl.model
+    # ... rest of function
+```
+
+## Pattern 2: Load and Analyze a Model
+
+```python
+from modelseedpy.core.msmodelutl import MSModelUtil
+from modelseedpy.core.msmedia import MSMedia
+
+# Load model
+mdlutl = MSModelUtil.from_cobrapy("my_model.json")
+
+# Set media
+media = MSMedia.from_dict({"EX_cpd00027_e0": 10})  # Glucose
+mdlutl.set_media(media)
+
+# Run FBA
+solution = mdlutl.model.optimize()
+print(f"Growth: {solution.objective_value}")
+```
+
+## Pattern 3: Find and Operate on Metabolites
+
+```python
+# Always handle empty results
+mets = mdlutl.find_met("glucose", "c0")
+if mets:
+    glucose = mets[0]
+    # Do something with glucose
+else:
+    print("Glucose not found in model")
+
+# Find by ModelSEED ID
+mets = mdlutl.find_met("cpd00001")        # Water
+mets = mdlutl.find_met("cpd00001", "c0")  # Cytosolic water
+
+# Get all ModelSEED IDs
+id_hash = mdlutl.msid_hash()
+# id_hash["cpd00001"] = [<Metabolite cpd00001_c0>, <Metabolite cpd00001_e0>]
+```
+
+## Pattern 4: Add Exchanges for Media
+
+```python
+from modelseedpy.core.msmedia import MSMedia
+
+# Before setting media, ensure exchanges exist
+missing = mdlutl.add_missing_exchanges(media)
+if missing:
+    print(f"Added exchanges for: {missing}")
+
+# Now set the media
+mdlutl.set_media(media)
+
+# Alternative: add specific exchanges
+mets_e0 = mdlutl.find_met("glucose", "e0")
+if mets_e0:
+    mdlutl.add_exchanges_for_metabolites(mets_e0, uptake=10, excretion=0)
+```
+
+## Pattern 5: Test Growth Conditions
+
+```python
+# Define condition
+condition = {
+    "media": media,
+    "objective": "bio1",
+    "is_max_threshold": True,  # True = FAIL if value >= threshold
+    "threshold": 0.1
+}
+
+# Apply and test (two-step)
+mdlutl.apply_test_condition(condition)
+passed = mdlutl.test_single_condition(condition, apply_condition=False)
+
+# Or test directly (one-step)
+passed = mdlutl.test_single_condition(condition, apply_condition=True)
+
+# Test multiple conditions
+all_passed = mdlutl.test_condition_list([cond1, cond2, cond3])
+```
+
+## Pattern 6: Gapfill and Validate
+
+```python
+from modelseedpy.core.msgapfill import MSGapfill
+
+# Create gapfiller
+gapfill = MSGapfill(mdlutl, default_target="bio1")
+
+# Run gapfilling
+solution = gapfill.run_gapfilling(media, target="bio1")
+
+# Test which reactions are actually needed
+unneeded = mdlutl.test_solution(
+    solution,
+    targets=["bio1"],
+    medias=[media],
+    thresholds=[0.1],
+    remove_unneeded_reactions=True  # Actually remove them
+)
+
+# Record the gapfilling
+mdlutl.add_gapfilling(solution)
+```
+
+## Pattern 7: ATP Correction
+
+```python
+from modelseedpy.core.mstemplate import MSTemplateBuilder
+
+# Get core template
+template = MSTemplateBuilder.build_core_template()
+
+# Get ATP tests
+tests = mdlutl.get_atp_tests(core_template=template)
+
+# Run tests
+for test in tests:
+    passed = mdlutl.test_single_condition(test)
+    print(f"{test['media'].id}: {'PASS' if passed else 'FAIL'}")
+```
+
+## Pattern 8: Find and Add Reactions
+
+```python
+# Find a metabolite
+glucose_list = mdlutl.find_met("glucose", "c0")
+if glucose_list:
+    glucose = glucose_list[0]
+
+    # Add exchange if missing
+    if glucose not in mdlutl.exchange_hash():
+        mdlutl.add_exchanges_for_metabolites([glucose], uptake=10, excretion=0)
+
+# Add a transport reaction
+mdlutl.add_transport_and_exchange_for_metabolite(glucose, direction=">")
+
+# Add ModelSEED reactions
+reactions = mdlutl.add_ms_reaction({
+    "rxn00001": "c0",
+    "rxn00002": "c0"
+})
+```
+
+## Pattern 9: Debug FBA Issues
+
+```python
+import logging
+
+# Enable debug logging
+logging.getLogger("modelseedpy.core.msmodelutl").setLevel(logging.DEBUG)
+
+# Print LP file for solver issues
+mdlutl.printlp(print=True, filename="debug_problem")
+# Creates debug_problem.lp in current directory
+
+# Check metabolite hash is built
+if mdlutl.metabolite_hash is None:
+    mdlutl.build_metabolite_hash()
+
+# Verify MSModelUtil caching
+print(f"Cached models: {len(MSModelUtil.mdlutls)}")
+
+# Find unproducible biomass compounds
+unproducible = mdlutl.find_unproducible_biomass_compounds("bio1")
+for met in unproducible:
+    print(f"Cannot produce: {met.id}")
+```
+
+## Pattern 10: Compare Multiple Solutions
+
+```python
+# Run FBA under different conditions
+solutions = {}
+
+# Glucose media
+mdlutl.set_media(glucose_media)
+solutions["glucose"] = mdlutl.model.optimize()
+
+# Acetate media
+mdlutl.set_media(acetate_media)
+solutions["acetate"] = mdlutl.model.optimize()
+
+# Export comparison
+mdlutl.print_solutions(solutions, "flux_comparison.csv")
+```
+
+## Common Mistakes
+
+1. **Not using get()**: Creating multiple MSModelUtil instances for the same model
+   ```python
+   # WRONG
+   mdlutl1 = MSModelUtil(model)
+   mdlutl2 = MSModelUtil(model)  # Different instances!
+
+   # RIGHT
+   mdlutl1 = MSModelUtil.get(model)
+   mdlutl2 = MSModelUtil.get(model)  # Same instance
+   ```
+
+2. **Ignoring empty find_met results**: Always check if the list is empty
+   ```python
+   # WRONG
+   glucose = mdlutl.find_met("glucose")[0]  # IndexError if not found!
+
+   # RIGHT
+   mets = mdlutl.find_met("glucose")
+   if mets:
+       glucose = mets[0]
+   ```
+
+3. **Wrong threshold interpretation**: is_max_threshold=True means FAIL if value >= threshold
+   ```python
+   # is_max_threshold=True means:
+   # - Test PASSES if objective < threshold
+   # - Test FAILS if objective >= threshold
+   # This is for testing "no ATP production" conditions
+   ```
+
+4. **Not adding exchanges before setting media**:
+   ```python
+   # WRONG
+   mdlutl.set_media(media)  # May fail if exchanges missing
+
+   # RIGHT
+   mdlutl.add_missing_exchanges(media)
+   mdlutl.set_media(media)
+   ```
+
+5. **Modifying model instead of mdlutl.model**:
+   ```python
+   # WRONG (if model was reassigned)
+   model.reactions.get_by_id("bio1").bounds = (0, 1000)
+
+   # RIGHT (always use mdlutl.model)
+   mdlutl.model.reactions.get_by_id("bio1").bounds = (0, 1000)
+   ```
diff --git a/.claude/commands/run_headless.md b/.claude/commands/run_headless.md
new file mode 100644
index 00000000..b3a272fd
--- /dev/null
+++ b/.claude/commands/run_headless.md
@@ -0,0 +1,158 @@
+# Command: run_headless
+
+## Purpose
+
+Execute Claude Code commands in autonomous headless mode with comprehensive JSON output. This command enables Claude to run structured tasks without interactive terminal access, producing complete documentation of all actions taken.
+
+## Overview
+
+You are running in headless mode to execute structured commands. You will receive input that may include:
+1. **Claude Commands**: One or more commands to be executed (e.g., create-prd, generate-tasks, doc-code-for-dev)
+2. **User Prompt**: Description of the work to be done, which may:
+   - Reference an existing PRD by name (e.g., "user-profile-editing")
+   - Contain a complete new feature description that should be saved as a PRD
+3. **PRD Reference Handling**: When a PRD name is referenced:
+   - Look for `agent-io/prds/<prd-name>/humanprompt.md`
+   - Look for `agent-io/prds/<prd-name>/fullprompt.md` if present
+   - These files provide the detailed context for the work
+4. **PRD Storage**: When a user prompt is provided without a PRD name:
+   - Analyze the prompt to create a descriptive PRD name (use kebab-case)
+   - Save the user prompt to `agent-io/prds/<prd-name>/humanprompt.md`
+   - Document the PRD name in your output for future reference
+
+Your job is to execute the command according to the instructions and produce a comprehensive JSON output file.
+
+## Critical Principles for Headless Operation
+
+### User Cannot See Terminal
+- The user has NO access to your terminal output
+- ALL relevant information MUST go in the JSON output file
+- Do not assume the user saw anything you did
+- Every action, decision, and result must be documented in `claude-output.json`
+
+### Autonomous Execution
+- Execute tasks independently without asking for permission
+- Only ask questions when genuinely ambiguous or missing critical information
+- Make reasonable assumptions and document them in comments
+- Complete as much work as possible before requesting user input
+- Work proactively to accomplish the full scope of the command
+
+## Command Execution Flow
+
+Follow this process for all headless executions:
+
+### 1. Parse Input and Handle PRDs
+- Parse the input to identify:
+  - Which Claude commands to execute
+  - The user prompt describing the work
+  - Whether a PRD name is referenced
+- **If a PRD name is referenced**:
+  - Read the PRD files from `agent-io/prds/<prd-name>/`
+  - Use humanprompt.md and fullprompt.md (if available) as context
+- **If user prompt provided without PRD name**:
+  - Create a descriptive PRD name based on the prompt content (use kebab-case)
+  - Create directory `agent-io/prds/<prd-name>/`
+  - Save the user prompt to `agent-io/prds/<prd-name>/humanprompt.md`
+  - Document the PRD name in your output
+- If resuming from a previous session, review the parent session context
+
+### 2. Execute Command
+- Follow the instructions in the command file
+- Apply the principles from the system prompt
+- Work autonomously as much as possible
+- Track all actions as you work
+
+### 3. Track Everything
+- Track all actions in memory as you work
+- Build up the JSON output structure continuously
+- Document files created, modified, or deleted
+- Record task progress and status changes
+- Capture all decisions and assumptions
+
+### 4. Handle User Queries (if needed)
+- If you need user input, prepare clear questions
+- Format questions according to the JSON schema
+- Save complete context for resumption
+- Set status to "user_query"
+- Ensure session_id is included for continuity
+
+### 5. Write JSON Output
+- Write the complete JSON to `claude-output.json`
+- Ensure all required fields are present
+- Validate JSON structure before writing
+- Include comprehensive session_summary
+
+## Example Headless Session
+
+### Example 1: New PRD Creation
+
+**Input:**
+- Commands: `["create-prd"]`
+- User prompt: "Add user profile editing feature with avatar upload and bio section"
+- PRD name: Not provided
+
+**Execution Process:**
+1. Parse input - no PRD name provided, so create one
+2. Generate PRD name: "user-profile-editing"
+3. Create directory: `agent-io/prds/user-profile-editing/`
+4. Save user prompt to `agent-io/prds/user-profile-editing/humanprompt.md`
+5. Ask clarifying questions (if needed) by setting status to "user_query"
+6. Generate enhanced PRD content
+7. Save to `agent-io/prds/user-profile-editing/fullprompt.md`
+8. Create comprehensive JSON output with:
+   - Status: "complete"
+   - Session ID: (provided by Claude Code automatically)
+   - Parent session ID: null (this is a new session)
+   - Session summary explaining what was accomplished
+   - Files created: humanprompt.md, fullprompt.md, data.json
+   - PRD name documented in artifacts
+   - Any relevant comments, assumptions, or observations
+
+### Example 2: Using Existing PRD
+
+**Input:**
+- Commands: `["generate-tasks"]`
+- User prompt: "Generate implementation tasks for user-profile-editing"
+- PRD name: "user-profile-editing" (referenced in prompt)
+
+**Execution Process:**
+1. Parse input - PRD name "user-profile-editing" identified
+2. Read `agent-io/prds/user-profile-editing/humanprompt.md`
+3. Read `agent-io/prds/user-profile-editing/fullprompt.md` (if exists)
+4. Use PRD context to generate detailed task list
+5. Save tasks to `agent-io/prds/user-profile-editing/data.json`
+6. Create comprehensive JSON output with task list and references
+
+### The user workflow:
+- User reads `claude-output.json` to understand everything you did
+- User can review created files based on paths in JSON
+- User can resume work by creating a new session with parent_session_id
+
+### If clarification is needed:
+- Set status to "user_query"
+- Include session_id in output
+- Add queries_for_user array with clear, specific questions
+- When user provides answers in a new session, that session will have parent_session_id pointing to this session
+- Claude Code uses the session chain to maintain full context
+
+## Output Requirements
+
+Always output to: `claude-output.json` in the working directory
+
+The JSON must include:
+- All required fields for the command type and status
+- Complete file tracking (created, modified, deleted)
+- Task progress if applicable
+- Session information for continuity
+- Comments explaining decisions and assumptions
+- Any errors or warnings encountered
+
+## Best Practices for Headless Execution
+
+- **Be Specific**: Include file paths, line numbers, function names
+- **Be Complete**: Don't leave out details assuming the user knows them
+- **Be Clear**: Write for someone who wasn't watching you work
+- **Be Actionable**: Comments should help the user understand next steps
+- **Be Honest**: If something is incomplete or uncertain, say so
+- **Be Thorough**: Document every action taken, no matter how small
+- **Be Proactive**: Complete as much work as possible before asking questions
diff --git a/.claude/settings.local.json b/.claude/settings.local.json
new file mode 100644
index 00000000..b3cf9c7c
--- /dev/null
+++ b/.claude/settings.local.json
@@ -0,0 +1,10 @@
+{
+  "permissions": {
+    "allow": [
+      "Bash(grep:*)",
+      "Bash(find:*)",
+      "Bash(tree:*)",
+      "Bash(mkdir:*)"
+    ]
+  }
+}
diff --git a/.github/workflows/pre-commit.yml b/.github/workflows/pre-commit.yml
index 87de0099..ffde9f98 100644
--- a/.github/workflows/pre-commit.yml
+++ b/.github/workflows/pre-commit.yml
@@ -3,6 +3,8 @@ name: Run Pre-Commit
 on:
   pull_request: {}
   push:
+    paths-ignore:
+      - 'examples/**'
    branches:
      - dev
      - main
@@ -13,7 +15,7 @@ jobs:
    strategy:
      matrix:
        os: [ubuntu-latest]
-        python-version: ['3.8', '3.9', '3.10']
+        python-version: ['3.9', '3.10', '3.11']
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v3
diff --git a/.github/workflows/tox.yml b/.github/workflows/tox.yml
new file mode 100644
index 00000000..c3d816d0
--- /dev/null
+++ b/.github/workflows/tox.yml
@@ -0,0 +1,28 @@
+name: Run Tox
+
+on:
+  pull_request: {}
+  push:
+    branches: [main]
+
+jobs:
+  build:
+    runs-on: ${{ matrix.os }}
+    strategy:
+      matrix:
+        os: [ubuntu-latest, macos-latest, windows-latest]
+        python-version: ['3.9', '3.10', '3.11']
+    steps:
+      - uses: actions/checkout@v3
+      - name: Set up Python
+        uses: actions/setup-python@v3
+        with:
+          python-version: ${{ matrix.python-version }}
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip setuptools wheel build
+          python -m pip install tox tox-gh-actions
+      - name: Test with tox
+        run: |
+          tox
+          python -m build .
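Note: the workflow above delegates Python-version selection to `tox-gh-actions`, which only works if the repository's `tox.ini` carries a `[gh-actions]` mapping. A minimal sketch of a compatible `tox.ini`, assuming environment names `py39`-`py311` and a pytest-based test suite (the project's actual file is not part of this diff):

```ini
[tox]
envlist = py39, py310, py311

[gh-actions]
; Maps the matrix python-version onto the tox environment to run
python =
    3.9: py39
    3.10: py310
    3.11: py311

[testenv]
deps = pytest
commands = pytest
```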
diff --git a/.gitignore b/.gitignore
index 6390162b..591c53c2 100644
--- a/.gitignore
+++ b/.gitignore
@@ -5,10 +5,8 @@
 __pycache__/
 *.py[cod]
 *$py.class
-
 # C extensions
 *.so
-
 # Distribution / packaging
 .Python
 build/
@@ -29,17 +27,14 @@ share/python-wheels/
 .installed.cfg
 *.egg
 MANIFEST
-
 # PyInstaller
 #  Usually these files are written by a python script from a template
 #  before PyInstaller builds the exe, so as to inject date/other infos into it.
 *.manifest
 *.spec
-
 # Installer logs
 pip-log.txt
 pip-delete-this-directory.txt
-
 # Unit test / coverage reports
 htmlcov/
 .tox/
@@ -53,81 +48,70 @@ coverage.xml
 *.py,cover
 .hypothesis/
 .pytest_cache/
-
 # Translations
 *.mo
 *.pot
-
 # Django stuff:
 *.log
 local_settings.py
 db.sqlite3
 db.sqlite3-journal
-
 # Flask stuff:
 instance/
 .webassets-cache
-
 # Scrapy stuff:
 .scrapy
-
 # Sphinx documentation
 docs/_build/
-
 # PyBuilder
 target/
-
 # Jupyter Notebook
 .ipynb_checkpoints
 .idea
-
 # IPython
 profile_default/
 ipython_config.py
-
 # pyenv
 .python-version
-
 # pipenv
 #   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
 #   However, in case of collaboration, if having platform-specific dependencies or dependencies
 #   having no cross-platform support, pipenv may install dependencies that don't work, or not
 #   install all needed dependencies.
 #Pipfile.lock
-
 # PEP 582; used by e.g. github.com/David-OConnor/pyflow
 __pypackages__/
-
 # Celery stuff
 celerybeat-schedule
 celerybeat.pid
-
 # SageMath parsed files
 *.sage.py
-
 # Environments
 .env
 .venv
+activate.sh
 env/
 venv/
 ENV/
 env.bak/
 venv.bak/
-
 # Spyder project settings
 .spyderproject
 .spyproject
-
 # Rope project settings
 .ropeproject
-
 # mkdocs documentation
 /site
-
 # mypy
 .mypy_cache/
 .dmypy.json
 dmypy.json
-
 # Pyre type checker
 .pyre/
+.pydevproject
+.settings/*
+*data/*
+*.lp
+
+# Cursor workspace files
+*.code-workspace
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index 04cde634..325706ab 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -21,7 +21,9 @@ repos:
       args:
         - --pytest-test-first
     - id: check-json
+      exclude: examples/
     - id: pretty-format-json
+      exclude: examples/
      args:
        - --autofix
        - --top-keys=_id
diff --git a/.travis.yml b/.travis.yml
index e72cfaff..75b2eb81 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -1,8 +1,8 @@
 language: python
 python:
-  - 3.6
-  - 3.7
-  - 3.8
+  - 3.9
+  - 3.10
+  - 3.11
 before_install:
   - python --version
   - pip install -U pip
diff --git a/README.rst b/README.rst
index 6f380d9a..3d491ec3 100644
--- a/README.rst
+++ b/README.rst
@@ -25,6 +25,10 @@
    :target: https://pepy.tech/project/modelseedpy
    :alt: Downloads

+.. image:: https://img.shields.io/badge/code%20style-black-000000.svg
+   :target: https://github.com/ambv/black
+   :alt: Black
+
 Metabolic modeling is a pivotal method for computational research in synthetic biology and precision medicine. Metabolic models, such as those simulated with the constraint-based flux balance analysis (FBA) algorithm, are improved with comprehensive datasets that capture more metabolic chemistry in the model and improve the accuracy of simulation predictions. We therefore developed ModelSEEDpy as a comprehensive suite of packages that bootstrap metabolic modeling with the ModelSEED Database (`Seaver et al., 2021 `_ ). These packages parse and manipulate (e.g. gapfill missing reactions or calculate chemical properties of metabolites), constrain (with kinetics, thermodynamics, and nutrient uptake), and simulate cobrakbase models (both individual models and communities). This is achieved by standardizing COBRA models through the ``cobrakbase`` module into a form that is amenable to the KBase/ModelSEED ecosystem. These functionalities are exemplified in `Python Notebooks `_ . Please submit errors, inquiries, or suggestions as `GitHub issues `_ where they can be addressed by our developers.
@@ -33,11 +37,11 @@ Metabolic modeling is a pivotal method for computational research in synthetic

 Installation
 ----------------------

-ModelSEEDpy will soon be installable via the ``PyPI`` channel::
+PIP (latest stable version 0.4.0)::

    pip install modelseedpy

-but, until then, the repository must cloned::
+GitHub dev build (latest working version)::

    git clone https://github.com/ModelSEED/ModelSEEDpy.git
@@ -51,8 +55,3 @@ The associated ModelSEED Database, which is required for a few packages, is simp

    git clone https://github.com/ModelSEED/ModelSEEDDatabase.git

 and the path to this repository is passed as an argument to the corresponding packages.
-
-**Windows users** must separately install the ``pyeda`` module: 1) download the appropriate wheel for your Python version from `this website `_ ; and 2) install the wheel through the following commands in a command prompt/powershell console::
-
-    cd path/to/pyeda/wheel
-    pip install pyeda_wheel_name.whl
diff --git a/agent-io/docs/msmodelutl-developer-guide.md b/agent-io/docs/msmodelutl-developer-guide.md
new file mode 100644
index 00000000..af82ff21
--- /dev/null
+++ b/agent-io/docs/msmodelutl-developer-guide.md
@@ -0,0 +1,712 @@
+# MSModelUtil Developer Guide
+
+## Overview
+
+`MSModelUtil` is the central utility wrapper class in ModelSEEDpy that encapsulates a COBRApy `Model` object and provides extensive model manipulation, analysis, and FBA-related functionality. It serves as the primary bridge between COBRApy models and ModelSEED-specific tooling.
+
+**Location:** `modelseedpy/core/msmodelutl.py` (~2,000 lines)
+
+## Architecture
+
+### Design Pattern
+
+MSModelUtil uses a **singleton-like caching pattern** where instances are cached by model object:
+
+```python
+class MSModelUtil:
+    mdlutls = {}  # Static cache of MSModelUtil instances
+
+    @staticmethod
+    def get(model, create_if_missing=True):
+        """Get or create MSModelUtil for a model"""
+        if isinstance(model, MSModelUtil):
+            return model
+        if model in MSModelUtil.mdlutls:
+            return MSModelUtil.mdlutls[model]
+        elif create_if_missing:
+            MSModelUtil.mdlutls[model] = MSModelUtil(model)
+            return MSModelUtil.mdlutls[model]
+        return None
+```
+
+This means you can safely call `MSModelUtil.get(model)` multiple times and always get the same instance.
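+
+One consequence worth noting: with `create_if_missing=False`, `get()` doubles as a pure cache lookup. A minimal sketch, following directly from the source above:
+
+```python
+from modelseedpy.core.msmodelutl import MSModelUtil
+
+# Pure lookup: returns None instead of creating and caching a new wrapper
+existing = MSModelUtil.get(model, create_if_missing=False)
+if existing is None:
+    # No utility attached to this model yet; create and cache one
+    existing = MSModelUtil.get(model)
+```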
+
+### Core Dependencies
+
+```
+MSModelUtil
+  ├── cobra.Model (wrapped object)
+  ├── MSPackageManager (FBA constraint packages)
+  ├── ModelSEEDBiochem (reaction/compound database)
+  ├── FBAHelper (FBA utility functions)
+  └── MSATPCorrection (lazy-loaded for ATP analysis)
+```
+
+### Key Instance Attributes
+
+```python
+self.model                   # The wrapped cobra.Model
+self.pkgmgr                  # MSPackageManager for this model
+self.wsid                    # KBase workspace ID (if applicable)
+self.atputl                  # MSATPCorrection instance (lazy-loaded)
+self.gfutl                   # MSGapfill reference (set by gapfiller)
+self.metabolite_hash         # Metabolite lookup cache
+self.search_metabolite_hash  # Fuzzy search cache
+self.test_objective          # Current test objective value
+self.reaction_scores         # Gapfilling reaction scores
+self.integrated_gapfillings  # List of integrated gapfilling solutions
+self.attributes              # Model metadata dictionary
+self.atp_tests               # Cached ATP test conditions
+self.reliability_scores      # Reaction reliability scores
+```
+
+---
+
+## API Reference
+
+### Initialization & Factory Methods
+
+#### `MSModelUtil(model)`
+Create a new MSModelUtil wrapping a cobra.Model.
+
+```python
+from modelseedpy.core.msmodelutl import MSModelUtil
+import cobra
+
+model = cobra.io.load_json_model("my_model.json")
+mdlutl = MSModelUtil(model)
+```
+
+#### `MSModelUtil.get(model, create_if_missing=True)` [static]
+Get or create an MSModelUtil instance. Preferred method for obtaining instances.
+
+```python
+# These are equivalent and return the same instance:
+mdlutl1 = MSModelUtil.get(model)
+mdlutl2 = MSModelUtil.get(model)
+assert mdlutl1 is mdlutl2  # True
+
+# Also accepts MSModelUtil directly (returns it unchanged)
+mdlutl3 = MSModelUtil.get(mdlutl1)
+assert mdlutl3 is mdlutl1  # True
+```
+
+#### `MSModelUtil.from_cobrapy(filename)` [static]
+Load a model from a file and wrap it in MSModelUtil.
+
+```python
+# Supports .json and .xml/.sbml files
+mdlutl = MSModelUtil.from_cobrapy("model.json")
+mdlutl = MSModelUtil.from_cobrapy("model.xml")
+```
+
+#### `MSModelUtil.build_from_kbase_json_file(filename, kbaseapi)` [static]
+Load a model from KBase JSON format.
+
+```python
+from modelseedpy.core.kbaseapi import KBaseAPI
+kbaseapi = KBaseAPI()
+mdlutl = MSModelUtil.build_from_kbase_json_file("kbase_model.json", kbaseapi)
+```
+
+---
+
+### I/O Methods
+
+#### `save_model(filename, format="json")`
+Save the model to a file.
+
+```python
+mdlutl.save_model("output.json", format="json")
+mdlutl.save_model("output.xml", format="xml")
+```
+
+#### `printlp(model=None, path="", filename="debug", print=False)`
+Write the LP formulation to a file for debugging.
+
+```python
+mdlutl.printlp(print=True)  # Writes debug.lp
+mdlutl.printlp(path="/tmp", filename="mymodel", print=True)
+```
+
+#### `print_solutions(solution_hash, filename="reaction_solutions.csv")`
+Export multiple FBA solutions to CSV.
+
+```python
+solutions = {
+    "glucose": model.optimize(),
+    "acetate": model.optimize()  # after changing media
+}
+mdlutl.print_solutions(solutions, "flux_comparison.csv")
+```
+
+---
+
+### Metabolite Search & Lookup
+
+#### `find_met(name, compartment=None)`
+Find metabolites by name, ID, or annotation. Returns a list of matching metabolites.
+
+```python
+# Find by ModelSEED ID
+mets = mdlutl.find_met("cpd00001")  # Water
+
+# Find by name
+mets = mdlutl.find_met("glucose")
+
+# Find in specific compartment
+mets = mdlutl.find_met("cpd00001", "c0")  # Cytosolic water
+mets = mdlutl.find_met("cpd00001", "e0")  # Extracellular water
+```
+
+#### `metabolite_msid(metabolite)` [static]
+Extract the ModelSEED compound ID from a metabolite.
+
+```python
+msid = MSModelUtil.metabolite_msid(met)  # Returns "cpd00001" or None
+```
+
+#### `reaction_msid(reaction)` [static]
+Extract the ModelSEED reaction ID from a reaction.
+
+```python
+msid = MSModelUtil.reaction_msid(rxn)  # Returns "rxn00001" or None
+```
+
+#### `msid_hash()`
+Create a dictionary mapping ModelSEED IDs to metabolite lists.
+
+```python
+id_hash = mdlutl.msid_hash()
+# id_hash["cpd00001"] = [<Metabolite cpd00001_c0>, <Metabolite cpd00001_e0>]
+```
+
+#### `build_metabolite_hash()`
+Build internal metabolite lookup caches. Called automatically by `find_met()`.
+
+```python
+mdlutl.build_metabolite_hash()
+# Now mdlutl.metabolite_hash and mdlutl.search_metabolite_hash are populated
+```
+
+#### `search_name(name)` [static]
+Normalize a name for fuzzy searching (lowercase, strip compartment suffix, remove non-alphanumeric characters).
+
+```python
+MSModelUtil.search_name("D-Glucose_c0")  # Returns "dglucose"
+```
+
+---
+
+### Reaction Search & Analysis
+
+#### `rxn_hash()`
+Create a dictionary mapping reaction stoichiometry strings to reactions.
+
+```python
+rxn_hash = mdlutl.rxn_hash()
+# rxn_hash["cpd00001_c0+cpd00002_c0=cpd00003_c0"] = [<Reaction>, 1]
+```
+
+#### `find_reaction(stoichiometry)`
+Find a reaction by its stoichiometry.
+
+```python
+stoich = {met1: -1, met2: -1, met3: 1}
+result = mdlutl.find_reaction(stoich)
+# Returns [reaction, direction] or None
+```
+
+#### `stoichiometry_to_string(stoichiometry)` [static]
+Convert stoichiometry dict to canonical string representation.
+
+```python
+strings = MSModelUtil.stoichiometry_to_string(rxn.metabolites)
+# Returns ["reactants=products", "products=reactants"]
+```
+
+#### `exchange_list()`
+Get all exchange reactions (EX_ or EXF prefixed).
+
+```python
+exchanges = mdlutl.exchange_list()
+```
+
+#### `exchange_hash()`
+Create a dictionary mapping metabolites to their exchange reactions.
+
+```python
+ex_hash = mdlutl.exchange_hash()
+# ex_hash[<Metabolite cpd00027_e0>] = <Reaction EX_cpd00027_e0>
+```
+
+#### `nonexchange_reaction_count()`
+Count non-exchange reactions that have non-zero bounds.
+
+```python
+count = mdlutl.nonexchange_reaction_count()
+```
+
+#### `is_core(rxn)`
+Check if a reaction is a core metabolic reaction.
+
+```python
+if mdlutl.is_core("rxn00001_c0"):
+    print("This is a core reaction")
+```
+
+---
+
+### Exchange & Transport Management
+
+#### `add_exchanges_for_metabolites(cpds, uptake=0, excretion=0, prefix="EX_", prefix_name="Exchange for ")`
+Add exchange reactions for metabolites.
+
+```python
+# Add uptake-only exchanges
+mdlutl.add_exchanges_for_metabolites(mets, uptake=1000, excretion=0)
+
+# Add bidirectional exchanges
+mdlutl.add_exchanges_for_metabolites(mets, uptake=1000, excretion=1000)
+```
+
+#### `add_transport_and_exchange_for_metabolite(met, direction="=", prefix="trans", override=False)`
+Add a charge-balanced transport reaction and corresponding exchange.
+
+```python
+# Add bidirectional transport for a cytosolic metabolite
+transport = mdlutl.add_transport_and_exchange_for_metabolite(
+    met_c0, direction="=", prefix="trans"
+)
+```
+
+#### `add_missing_exchanges(media)`
+Add exchange reactions for media compounds that don't have them.
+
+```python
+missing = mdlutl.add_missing_exchanges(my_media)
+# Returns list of compound IDs that needed exchanges added
+```
+
+---
+
+### Media & FBA Configuration
+
+#### `set_media(media)`
+Set the model's growth media.
+
+```python
+from modelseedpy.core.msmedia import MSMedia
+
+# From MSMedia object
+mdlutl.set_media(my_media)
+
+# From dictionary
+mdlutl.set_media({"cpd00001": 1000, "cpd00007": 1000})
+```
+
+#### `set_objective_from_phenotype(phenotype, missing_transporters=[], create_missing_compounds=False)`
+Configure the model objective based on a phenotype type.
+
+```python
+# For growth phenotypes, sets biomass objective
+# For uptake/excretion phenotypes, sets appropriate exchange objectives
+obj_str = mdlutl.set_objective_from_phenotype(phenotype)
+```
+
+---
+
+### FBA Testing & Condition Management
+
+#### `apply_test_condition(condition, model=None)`
+Apply a test condition (media, objective, direction) to the model.
+
+```python
+condition = {
+    "media": my_media,
+    "objective": "bio1",
+    "is_max_threshold": True,
+    "threshold": 0.1
+}
+mdlutl.apply_test_condition(condition)
+```
+
+#### `test_single_condition(condition, apply_condition=True, model=None, report_atp_loop_reactions=False, analyze_failures=False, rxn_list=[])`
+Test if a model meets a condition's threshold.
+
+```python
+passed = mdlutl.test_single_condition(condition)
+# Returns True if threshold is NOT exceeded (for is_max_threshold=True)
+```
+
+#### `test_condition_list(condition_list, model=None, positive_growth=[], rxn_list=[])`
+Test multiple conditions. Returns True only if ALL pass.
+
+```python
+all_passed = mdlutl.test_condition_list(conditions)
+```
+
+---
+
+### Gapfilling Support
+
+#### `test_solution(solution, targets, medias, thresholds=[0.1], remove_unneeded_reactions=False, do_not_remove_list=[])`
+Test if gapfilling solution reactions are needed.
+
+```python
+# Solution format: {"new": {rxn_id: direction}, "reversed": {rxn_id: direction}}
+# Or: list of [rxn_id, direction, label]
+unneeded = mdlutl.test_solution(
+    solution,
+    targets=["bio1"],
+    medias=[glucose_media],
+    thresholds=[0.1]
+)
+```
+
+#### `add_gapfilling(solution)`
+Record an integrated gapfilling solution.
+
+```python
+mdlutl.add_gapfilling({
+    "new": {"rxn00001_c0": ">"},
+    "reversed": {"rxn00002_c0": "<"},
+    "media": media,
+    "target": "bio1",
+    "minobjective": 0.1,
+    "binary_check": True
+})
+```
+
+#### `convert_solution_to_list(solution)`
+Convert dictionary solution format to list format.
+
+```python
+solution_list = mdlutl.convert_solution_to_list(solution)
+# Returns [[rxn_id, direction, "new"|"reversed"], ...]
+```
+
+---
+
+### Reaction Expansion Testing
+
+These methods are used for binary/linear search to find minimal reaction sets.
+
+#### `reaction_expansion_test(reaction_list, condition_list, binary_search=True, attribute_label="gf_filter", positive_growth=[], resort_by_score=True, active_reaction_sets=[])`
+Test which reactions in a list can be removed while still meeting conditions.
+
+```python
+filtered = mdlutl.reaction_expansion_test(
+    reaction_list=[[rxn, ">"], [rxn2, "<"], ...],
+    condition_list=conditions,
+    binary_search=True
+)
+# Returns reactions that were filtered out
+```
+
+#### `binary_expansion_test(reaction_list, condition, currmodel, depth=0, positive_growth=[])`
+Binary search variant of expansion testing.
+
+#### `linear_expansion_test(reaction_list, condition, currmodel, positive_growth=[])`
+Linear (one-by-one) variant of expansion testing.
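+
+Choosing between the two variants: the binary form bisects the candidate list, so the number of FBA solves grows roughly with the number of reactions that must be kept rather than with the list length, while the linear form solves once per reaction but gives per-reaction diagnostics. A usage sketch, where `conditions` is a list of condition dictionaries as above and `candidates` is a list of `[reaction, direction]` pairs (both names are hypothetical):
+
+```python
+# Long candidate lists: bisection keeps the number of FBA solves low
+filtered = mdlutl.reaction_expansion_test(
+    reaction_list=candidates,
+    condition_list=conditions,
+    binary_search=True,
+)
+
+# Short lists, or when each reaction should be probed individually
+filtered = mdlutl.reaction_expansion_test(
+    reaction_list=candidates,
+    condition_list=conditions,
+    binary_search=False,
+)
+```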
+
+---
+
+### ATP Correction
+
+#### `get_atputl(atp_media_filename=None, core_template=None, gapfilling_delta=0, max_gapfilling=0, forced_media=[], remake_atputil=False)`
+Get or create the MSATPCorrection utility.
+
+```python
+atputl = mdlutl.get_atputl(core_template=template)
+```
+
+#### `get_atp_tests(core_template=None, atp_media_filename=None, recompute=False, remake_atputil=False)`
+Get ATP test conditions.
+
+```python
+tests = mdlutl.get_atp_tests(core_template=template)
+# Returns list of condition dicts with media, threshold, objective
+```
+
+---
+
+### Reliability Scoring
+
+#### `assign_reliability_scores_to_reactions(active_reaction_sets=[])`
+Calculate reliability scores for all reactions based on biochemistry data.
+
+```python
+scores = mdlutl.assign_reliability_scores_to_reactions()
+# scores[rxn_id][">"] = forward score
+# scores[rxn_id]["<"] = reverse score
+```
+
+Scoring considers:
+- Mass/charge balance status
+- Delta G values
+- Compound completeness
+- ATP production direction
+- Transported charge
+
+---
+
+### Biomass Analysis
+
+#### `evaluate_biomass_reaction_mass(biomass_rxn_id, normalize=False)`
+Calculate the mass balance of a biomass reaction.
+
+```python
+result = mdlutl.evaluate_biomass_reaction_mass("bio1")
+# Returns {"ATP": atp_coefficient, "Total": total_mass}
+```
+
+#### `find_unproducible_biomass_compounds(target_rxn="bio1", ko_list=None)`
+Find biomass compounds that cannot be produced.
+
+```python
+# Without knockouts
+unproducible = mdlutl.find_unproducible_biomass_compounds()
+
+# With knockouts to test sensitivity
+ko_results = mdlutl.find_unproducible_biomass_compounds(
+    ko_list=[["rxn00001_c0", ">"], ["rxn00002_c0", "<"]]
+)
+```
+
+---
+
+### Minimal Reaction Set Analysis
+
+#### `analyze_minimal_reaction_set(solution, label, print_output=True)`
+Analyze a minimal reaction set for alternatives and coupled reactions.
+
+```python
+output = mdlutl.analyze_minimal_reaction_set(fba_solution, "my_analysis")
+# Writes CSV to nboutput/rxn_analysis/