diff --git a/aws-transform-agent-toolkit/POWER.md b/aws-transform-agent-toolkit/POWER.md new file mode 100644 index 0000000..de7e14d --- /dev/null +++ b/aws-transform-agent-toolkit/POWER.md @@ -0,0 +1,324 @@ +--- +name: "aws-transform-agent-toolkit" +displayName: "AWS Transform Agent Toolkit" +description: "Build agents to run in AWS Transform. This power provides a self-service agent lifecycle from inception to development to production. Build modernization and migration agents with citation-backed AWS Transform documentation search, package agents as containers, deploy to Bedrock AgentCore across platforms (Windows/macOS/Linux), and register with AWS Transform." +author: "AWS" +keywords: ["aws transform", "agent development", "composability", "modernization", "migration"] +--- + +## Onboarding + +### Step 1: Validate tools and access + +Before using this power, ensure the following are installed and configured: + +- **Python 3.11+**: Required for AWS Transform Agent SDK + - Verify with: `python3 --version` + - **CRITICAL**: Python 3.11 or higher is required. The SDK will not work with earlier versions. + +- **AWS CLI**: Required for deploying to Bedrock AgentCore and accessing AWS Transform registry + - Verify with: `aws --version` + - **CRITICAL**: Must be configured with credentials that have access to your AWS account. + - Test access: `aws sts get-caller-identity` + +- **Finch or Docker**: Required for building ARM64 container images + - Verify Finch: `finch --version` + - Or Docker: `docker --version` + - **CRITICAL**: Container runtime must be running before building images. + +- **AWS Transform Account Allowlisting**: Your AWS account must be allowlisted by AWS Transform team if you want to publish to registry. + - **CRITICAL**: Contact your Solutions Architect to request allowlisting before proceeding with agent registration. + - Without allowlisting, agent registration will fail. 
+ +### Step 2: Install AWS Transform Agent SDK + +Install the SDK from PyPI into a virtual environment: + +```bash +cd +python3 -m venv .venv && source .venv/bin/activate +pip install agent-builder-sdk-aws-transform \ + agent-builder-agentic-mcp-aws-transform \ + agent-builder-types-aws-transform \ + agent-builder-mcp-client-aws-transform +``` + +Windows PowerShell: +```powershell +cd +py -3 -m venv .venv; .venv\Scripts\Activate.ps1 +pip install agent-builder-sdk-aws-transform ` + agent-builder-agentic-mcp-aws-transform ` + agent-builder-types-aws-transform ` + agent-builder-mcp-client-aws-transform +``` + +**Verify installation:** + +```bash +python3 -c "import agent_builder_sdk; print('SDK OK')" +``` + +**Register botocore service models:** + +The SDK ships with custom botocore service models that must be registered before use: + +macOS/Linux: +```bash +SDK_MODELS=$(python3 -c "from importlib.resources import files; print(files('agent_builder_sdk').joinpath('botocore_models'))") +aws configure add-model --service-name atxagentregistryexternal \ + --service-model "file://${SDK_MODELS}/atxagentregistryexternal/2022-07-26/service-2.json" +aws configure add-model --service-name transformagenticservice \ + --service-model "file://${SDK_MODELS}/transformagenticservice/2018-05-10/service-2.json" +``` + +Windows PowerShell: +```powershell +$SDK_MODELS = python3 -c "from importlib.resources import files; print(files('agent_builder_sdk').joinpath('botocore_models'))" +aws configure add-model --service-name atxagentregistryexternal --service-model "file://$SDK_MODELS\atxagentregistryexternal\2022-07-26\service-2.json" +aws configure add-model --service-name transformagenticservice --service-model "file://$SDK_MODELS\transformagenticservice\2018-05-10\service-2.json" +``` + +**CRITICAL**: Without these models, the SDK will fail at runtime with `Unknown service: 'transformagenticservice'`. 
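One way to confirm that both models were registered is to look for them on disk. The sketch below assumes `aws configure add-model` stores each model as `~/.aws/models/<service-name>/<api-version>/service-2.json`, which is the usual AWS CLI location. The helper name `missing_service_models` is hypothetical and is not part of the SDK.

```python
from pathlib import Path

# Service names taken from the add-model commands in this step.
REQUIRED_SERVICES = ("atxagentregistryexternal", "transformagenticservice")


def missing_service_models(models_root: Path) -> list[str]:
    """Return the required services that have no service-2.json under models_root."""
    missing = []
    for service in REQUIRED_SERVICES:
        service_dir = models_root / service
        # Assumed layout: <models_root>/<service>/<api-version>/service-2.json
        found = service_dir.is_dir() and any(service_dir.glob("*/service-2.json"))
        if not found:
            missing.append(service)
    return missing


if __name__ == "__main__":
    root = Path.home() / ".aws" / "models"  # assumed default location
    missing = missing_service_models(root)
    print("All models registered" if not missing else f"Missing models: {missing}")
```

If the check reports a missing service, re-run the matching `aws configure add-model` command before importing the SDK.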
+ +### Step 3: Set up IAM roles + +AWS Transform agent deployment requires two IAM roles in your AWS account: + +- **`AgentCoreExecutionRole`** — used by Bedrock AgentCore to run your agent container. Needs Bedrock model access, `transform-agents:*`, ECR pull, CloudWatch Logs, and X-Ray permissions. +- **`AWSTransformAgentInvokeRole`** — assumed by the AWS Transform compute service to invoke your Bedrock AgentCore runtime. Needs `bedrock-agentcore:InvokeAgentRuntime`, `GetAgentRuntime`, and `GetAgentRuntimeEndpoint`. + +**Missing or incorrectly configured roles are the single most common cause of silent deployment failures** — the runtime reaches READY but jobs fail ~8 minutes after creation with "Failed to start the job" in the AWS Transform webapp. + +**Check if the roles already exist:** + +```bash +aws iam get-role --role-name AgentCoreExecutionRole --query 'Role.Arn' --output text +aws iam get-role --role-name AWSTransformAgentInvokeRole --query 'Role.Arn' --output text +``` + +If either command returns `NoSuchEntity`, you need to create the roles. + +**Create both roles using the provided CloudFormation template:** + +A complete, correct CloudFormation template with both roles, all required permissions, and the right trust policies is provided in [steering/deployment-pipeline-guide.md](./steering/deployment-pipeline-guide.md#section-2-complete-cloudformation-template). + +Save the template as `iam-roles.yaml` and deploy it: + +```bash +aws cloudformation deploy \ + --template-file iam-roles.yaml \ + --stack-name aws-transform-agent-iam-roles \ + --capabilities CAPABILITY_NAMED_IAM \ + --region us-east-1 +``` + +**CRITICAL — Trust policy:** `AWSTransformAgentInvokeRole` MUST trust `prod.us-east-1.compute.elastic-gumby.aws.internal`. The template handles this correctly. + +**Regional scope:** The AWS Transform Compute principal format is `prod.{region}.compute.elastic-gumby.aws.internal`, and AWS Transform is available in several regions. 
This power, its CloudFormation template, and the deployment tooling assume us-east-1 only. Using a non us-east-1 AWS Transform region requires swapping the region segment in both principals, pointing the registry endpoint at the matching region, and passing `region` explicitly to `deploy_agent_full_pipeline`. + +**If your roles have non-default names** (e.g., set up via the AWS console or Bedrock AgentCore SDK which creates roles like `AmazonBedrockAgentCoreSDKRuntime-...`): + +- `AgentCoreExecutionRole`: the MCP deployment tool first tries the default name, then falls back to scanning trust policies for a role trusting `bedrock-agentcore.amazonaws.com`. If exactly one match is found it's used automatically; if zero or multiple, you'll get an error asking you to pass `execution_role_arn` explicitly. +- `AWSTransformAgentInvokeRole`: the MCP deployment tool only looks up the exact default name. If your invoke role has a different name, you must pass `access_role_arn` explicitly — otherwise registry registration is skipped with a warning. + +For the complete permissions reference for both roles, see [steering/deployment-pipeline-guide.md](./steering/deployment-pipeline-guide.md#section-1-iam-roles-overview). 
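To make the regional principal format concrete, here is a minimal sketch that builds the compute principal and a matching trust policy for `AWSTransformAgentInvokeRole`. The function names are hypothetical, and the sketch covers only the compute principal. The CloudFormation template in the deployment pipeline guide remains the reference for the full role definitions.

```python
import json


def transform_compute_principal(region: str = "us-east-1") -> str:
    """Build the AWS Transform compute service principal for a region."""
    return f"prod.{region}.compute.elastic-gumby.aws.internal"


def invoke_role_trust_policy(region: str = "us-east-1") -> dict:
    """Trust policy that lets the AWS Transform compute service assume the invoke role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": transform_compute_principal(region)},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Print the default (us-east-1) policy so it can be compared with the deployed role.
print(json.dumps(invoke_role_trust_policy(), indent=2))
```

Comparing this output with `aws iam get-role --role-name AWSTransformAgentInvokeRole` is a quick way to spot a region mismatch in the trust policy.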
+ +### Step 4: Add workspace hooks + +Add a hook to validate deployment prerequisites before deploying agents: + +`.kiro/hooks/validate-deployment.kiro.hook` +```json +{ + "enabled": true, + "name": "Validate AWS Transform Deployment Prerequisites", + "description": "Check IAM roles, container runtime, and AWS access before deployment", + "version": "1", + "when": { + "type": "userTriggered" + }, + "then": { + "type": "askAgent", + "prompt": "Before deploying AWS Transform agents, verify: 1) AWS credentials are valid (aws sts get-caller-identity), 2) finch or docker is running, 3) IAM roles exist and have correct permissions (AgentCoreExecutionRole with bedrock:InvokeModel and transform-agents:*; AWSTransformAgentInvokeRole with bedrock-agentcore:InvokeAgentRuntime). Report any missing prerequisites." + } +} +``` + +This hook helps catch common deployment issues before they cause failures in the pipeline. + +### Step 5: Configure MCP Server Environment (Optional) + +The MCP server is launched via `uvx` (see `mcp.json`). To pass environment variables, add an `env` block to the `mcp.json` in your power directory: + +```json +{ + "mcpServers": { + "aws-transform-agent-toolkit": { + "command": "uvx", + "args": ["--from", "agent-builder-mcp-aws-transform", "agent-builder-mcp"], + "env": { + "AWS_PROFILE": "my-profile", + "AWS_REGION": "us-east-1" + } + } + } +} +``` + +**Environment variables:** +- `AWS_PROFILE`: AWS CLI profile to use for credentials (set via `aws configure` or `aws sso login`) +- `AWS_REGION`: AWS Region used for AWS Transform (defaults to us-east-1) + +Restart Kiro after making changes. 
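If you prefer to script the `mcp.json` change instead of editing it by hand, the following sketch merges environment variables into one server entry without touching the other keys. The helper name `set_mcp_env` is hypothetical, and the sample configuration is only an illustration of the file's shape.

```python
import json


def set_mcp_env(config: dict, server: str, env: dict[str, str]) -> dict:
    """Return a copy of an mcp.json config with env vars merged into one server entry."""
    updated = json.loads(json.dumps(config))  # deep copy via a JSON round-trip
    entry = updated["mcpServers"][server]
    # Existing variables are kept; keys in `env` override them.
    entry["env"] = {**entry.get("env", {}), **env}
    return updated


# Illustrative config with the same shape as the power's mcp.json.
sample = {
    "mcpServers": {
        "aws-transform-agent-toolkit": {
            "command": "uvx",
            "args": ["--from", "agent-builder-mcp-aws-transform", "agent-builder-mcp"],
        }
    }
}
result = set_mcp_env(
    sample,
    "aws-transform-agent-toolkit",
    {"AWS_PROFILE": "my-profile", "AWS_REGION": "us-east-1"},
)
print(json.dumps(result, indent=2))
```

Working on a copy keeps the original configuration intact, so the result can be reviewed before it is written back to disk.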
+ +## Deployment Automation + +Kiro can generate complete deployment pipelines covering: +- Docker image building (with SDK and MCP runtime) +- ECR repository setup and image push +- Bedrock AgentCore runtime creation with `bedrock-agentcore-control` +- AWS Transform agent registration with correct API parameters +- IAM role CloudFormation templates + +See [steering/deployment-pipeline-guide.md](./steering/deployment-pipeline-guide.md) for detailed patterns and best practices. + +**Example reference implementation:** See the `pipeline/` directory in the AWS Transform demo project for a working deployment pipeline. + +**Platform Compatibility:** +- **Windows**: Uses AWS CodeBuild for container builds (finch not available) +- **macOS/Linux**: Uses finch or docker for local builds +- **All platforms**: MCP deployment tools automatically detect best approach + +For conversational deployment workflow, see [steering/deploy-agent-workflow.md](./steering/deploy-agent-workflow.md). + +## MCP Tools Available + +This power includes an MCP server with search and registration tools: + +### Search Tools +- **keyword_search(query, top_k)** - Search AWS Transform documentation using keyword matching (recommended) +- **search_by_source(query, source, top_k)** - Search filtered by source (dev-guide, iam-roles, sdk, api) + +### Agent Deployment Tools +- **build_agent_image** - Build AWS Transform agent Docker image for ARM64 platform + - Supports three build methods: local finch, local Docker, or AWS CodeBuild (required for Windows) + - Automatically detects best runtime for current platform + - Pushes image to ECR (creates repository if needed) +- **deploy_agent_to_agentcore** - Deploy agent image to Bedrock AgentCore + - Creates Bedrock AgentCore runtime and polls until READY + - Generates unique runtime names with timestamp to avoid conflicts +- **deploy_agent_full_pipeline** - Complete deployment pipeline: build → push → deploy → register + - Orchestrates all phases for full 
agent deployment to AWS Transform + - Auto-detects IAM roles (AgentCoreExecutionRole, AWSTransformAgentInvokeRole) + - Platform-aware: Windows users automatically use CodeBuild, macOS/Linux prefer local finch/Docker + +### Agent Registry Tools +- **register_agent** - Register and publish a new agent with AWS Transform Agent Registry + - Performs all three registration steps: RegisterAgent → PublishAgentVersion → UpdatePublisherAccessControl + - Use after deploying your agent to Bedrock AgentCore to register it with AWS Transform +- **get_agent** - Get details of a registered agent +- **get_agent_version** - Get a specific version of a registered agent +- **update_agent** - Update an existing agent's metadata +- **list_agents_by_publisher** - List all agents published by the current account +- **publish_agent_version** - Publish a new version of an existing agent + - Copies config from the current (or specified) version, applies optional overrides (runtimeArn, atxAccessRoleArn), and publishes the new version + - Use after initial registration to iterate on agent versions +- **list_agent_access_control** - List access control settings for an agent +- **update_publisher_access_control** - Grant or revoke account access to an agent + +### Debugging Tools +- **fetch_logs** - Fetch CloudWatch logs for an agent runtime +- **list_log_streams** - List available log streams for an agent runtime +- **validate_agent_setup** - Validate agent deployment prerequisites (IAM roles, ECR, etc.) + +### HITL Tools +- **get_hitl_generation_prompt** - Get the full HITL UI generation rules and component schema + +The MCP server provides access to: +- AWS Transform Developer Guide (architecture, workflows, testing) +- BaseAgent SDK documentation (AsyncBaseOrchestrator, AsyncBaseSubagent) +- Agentic API and Agent Registry API specifications + +## Verification Guidelines + +When answering AWS Transform questions: +1. **Use MCP tools first** - Search indexed docs before answering +2. 
**Verify CLI commands** - Use `keyword_search("aws cli")` to confirm correct service names +3. **Verify API names** - Use `search_by_source(query, "api")` to confirm operation names +4. **Cross-reference** - Check steering files for patterns after MCP search + +Never guess: CLI commands, API operation names, service endpoints, or registration steps. + +### Search → Read → Generate (IMPORTANT) + +Search results are **truncated previews** (500 chars). Before generating code from a search result, read the full source to get complete signatures, parameters, and implementation details. + +When a search result includes a `file` field (e.g., `"file": "agent_builder_sdk/orchestrator.py"`): + +1. **Find** the installed package location: + ```bash + python3 -c "import agent_builder_sdk; print(agent_builder_sdk.__file__)" + ``` +2. **Grep** for the class or function in that location: + ```bash + grep -r "class BaseOrchestrator" $(python3 -c "import agent_builder_sdk, os; print(os.path.dirname(agent_builder_sdk.__file__))") + ``` +3. **Read** the matched file for full signatures and docstrings +4. **Generate** code using the complete source — not the truncated preview + +## Grounding Rules (CRITICAL) + +**ALWAYS cite your sources.** Every answer must include citation tags from search results. + +1. **NEVER answer from memory** - Always search first using MCP tools +2. **If not found, say so** - Respond with "I don't have information about X in the AWS Transform documentation" +3. **Cite sources in EVERY response** - Include the citation tag from search results: + - Format: `[source:name]` e.g., `[sdk:AsyncBaseOrchestrator]`, `[api:RegisterAgent]`, `[dev-guide:doc]` + - Place citations inline or at the end of relevant statements + - If multiple sources, cite all of them +4. **Low confidence = search again** - Try different queries before guessing +5. 
**Consolidate code snippets** - When search returns code examples: + - Verify API operations: `search_by_source("OperationName", "api")` + - Verify SDK classes: `search_by_source("ClassName", "sdk")` +6. **Iterate if needed** - If first search results are insufficient: + - Start broad: `keyword_search("orchestrator")` + - Then narrow: `keyword_search("orchestrator invoke subagent")` + - Evaluate results before answering - if low relevance, search again +7. **Low-score results** — If top result BM25 score is very low or results seem + irrelevant, try: (a) different terminology, (b) `search_by_source` to narrow scope, + (c) read the SDK source directly via filesystem. Do NOT generate code from + low-confidence search results. + +Example workflow: +1. User asks about agent registration +2. Call `keyword_search("agent registration")` or `search_by_source("RegisterAgent", "api")` +3. If results found → Answer using ONLY retrieved content + include citation tag from results +4. If not found → Say "I don't have this in the indexed docs" and suggest checking the Developer Guide or contacting your SA + +# When to Load Steering Files + +- Getting started with AWS Transform or building your first agent → `steering/getting-started.md` +- Building a new agent from scratch (orchestrator or subagent) → `steering/orchestrator-patterns.md` or `steering/subagent-patterns.md` +- Creating or modifying orchestrator agents → `steering/orchestrator-patterns.md` +- Building or updating subagents → `steering/subagent-patterns.md` +- Working with AWS Transform APIs (Agentic API, Registry API) → `steering/api-reference.md` +- Registering agents, publishing versions, understanding agentCard schema for composability → `steering/agent-registration.md` +- Deploying agents (Docker, ECR, Bedrock AgentCore, pipeline automation) → `steering/deployment-pipeline-guide.md` +- Working with the skill registry (upload, download, share, manage agent skills) → `steering/skill-operations.md` + - Skills are 
reusable capabilities that expand what an agent can do. The skill registry is a central repository where developers can choose from or contribute to a bank of skills to use with their agents. +- Troubleshooting agent deployment or runtime issues → `steering/troubleshooting.md` +- Adding an agent to an existing workflow → `steering/workflow-integration.md` +- **Deploying agents through Kiro (recommended workflow)** → `steering/deploy-agent-workflow.md` + +## Next Steps + +1. Read the architecture overview in `getting-started.md` +2. Follow the patterns in `orchestrator-patterns.md` or `subagent-patterns.md` +3. Use the inline code examples to scaffold your agent, then customize +4. Test locally before deploying to Bedrock AgentCore + +## License +AWS Service Terms. This power is provided by AWS and is subject to the AWS Customer Agreement and applicable AWS service terms. + +This power integrates with [agent-builder-mcp-aws-transform](https://github.com/awslabs/agent-builder-toolkit-aws-transform) (Apache-2.0 license). 
diff --git a/aws-transform-agent-toolkit/mcp.json b/aws-transform-agent-toolkit/mcp.json new file mode 100644 index 0000000..cd95d7d --- /dev/null +++ b/aws-transform-agent-toolkit/mcp.json @@ -0,0 +1,8 @@ +{ + "mcpServers": { + "aws-transform-agent-toolkit": { + "command": "uvx", + "args": ["--from", "agent-builder-mcp-aws-transform", "agent-builder-mcp"] + } + } +} diff --git a/aws-transform-agent-toolkit/steering/agent-registration.md b/aws-transform-agent-toolkit/steering/agent-registration.md new file mode 100644 index 0000000..104a6d3 --- /dev/null +++ b/aws-transform-agent-toolkit/steering/agent-registration.md @@ -0,0 +1,1049 @@ +--- +inclusion: auto +name: agent-registration +description: "Guidance for registering agents, publishing versions, and understanding agentCard schema for composability" +--- + +# Agent Registration Guide + +## Related Guides + +- For full deployment automation (build → ECR → Bedrock AgentCore → registry), see [deployment-pipeline-guide.md](./deployment-pipeline-guide.md) +- For IAM role setup and permissions, see [IAM Roles Overview](./deployment-pipeline-guide.md#section-1-iam-roles-overview) + +## Important Distinctions + +- **Bedrock AgentCore CLI**: `aws bedrock-agentcore-control` (NOT `aws bedrock-agent`) +- **Agentic API** (`transformagenticservice`): Runtime operations (InvokeAgent, GetJob) - for running agents +- **Agent Registry API** (`atxagentregistryexternal`): Registration operations (RegisterAgent, PublishAgentVersion) - for registering agents + +To explore available operations: +```bash +aws atxagentregistryexternal help +aws transformagenticservice help +aws bedrock-agentcore-control help +``` + +## Endpoint Configuration (CRITICAL) + +The Agent Registry API requires an explicit endpoint URL and region: + +| Service | Endpoint URL | Region | +|---------|-------------|--------| +| Agent Registry API (`atxagentregistryexternal`) | `https://iad.prod.agent-registry-external.elastic-gumby.ai.aws.dev` | `us-east-1` | + 
+**Every AWS CLI call to the Agent Registry MUST include `--endpoint-url` and `--region`:** + +```bash +aws atxagentregistryexternal list-agents-by-publisher \ + --endpoint-url https://iad.prod.agent-registry-external.elastic-gumby.ai.aws.dev \ + --region us-east-1 +``` + +**Every boto3 call MUST include `endpoint_url` and `region_name`:** + +```python +import boto3 +client = boto3.client( + 'atxagentregistryexternal', + region_name='us-east-1', + endpoint_url='https://iad.prod.agent-registry-external.elastic-gumby.ai.aws.dev' +) +``` + +**NEVER omit the endpoint URL**: without it, the CLI/SDK will attempt to resolve a non-existent endpoint and fail. + +## Registration Flow + +1. **Build & containerize** your agent +2. **Deploy to Bedrock AgentCore** via `aws bedrock-agentcore-control create-agent-runtime` +3. **Register, publish, and enable access**: use the `register_agent` MCP tool to perform all three registry operations in a single call (RegisterAgent → PublishAgentVersion → UpdatePublisherAccessControl) + +> **Tip:** The `register_agent` tool automates the RegisterAgent, PublishAgentVersion, and UpdatePublisherAccessControl API calls. For manual CLI registration, see the detailed API sections below. + +## Bedrock AgentCore Deployment Requirements + +### Runtime Naming Constraints + +Bedrock AgentCore runtime names MUST match the pattern: `[a-zA-Z][a-zA-Z0-9_]{0,47}` + +**CRITICAL RULES:** +- First character MUST be a letter (a-z, A-Z) +- Remaining characters can be letters, digits, or underscores +- **NO HYPHENS ALLOWED** - `my-agent` is INVALID, use `my_agent` +- Maximum 48 characters total +- Deleted runtime names have a cooldown period and cannot be reused immediately + +**Good names:** `code_analysis_agent`, `myAgent_v1`, `atx_ws_agent_20240224` +**Bad names:** `code-analysis-agent` (hyphens), `123agent` (starts with digit), `my.agent` (dots) + +### Runtime Status + +After creating a Bedrock AgentCore runtime, poll for status. 
The success state is **`READY`** (not `ACTIVE`). + +**Status values:** +- `CREATING` - Runtime is being provisioned +- `READY` - **Runtime is ready to serve requests (SUCCESS STATE)** +- `ACTIVE` - Legacy alias for READY (some API versions return this instead of READY) +- `FAILED` - Creation failed +- `STOPPED` - Runtime stopped +- `DELETE_FAILED` - Deletion failed + +**Polling example:** +```bash +aws bedrock-agentcore-control get-agent-runtime \ + --agent-runtime-id \ + --region us-east-1 \ + --query 'status' +``` + +## IAM Role Trust Policy Requirements + +The AWS Transform Agent Invoke Role (role assumed by AWS Transform to invoke your agent) MUST trust the correct service principal. + +```json +{ + "Version": "2012-10-17", + "Statement": [ + { + "Effect": "Allow", + "Principal": { + "Service": "prod.us-east-1.compute.elastic-gumby.aws.internal" + }, + "Action": "sts:AssumeRole" + } + ] +} +``` + +**Common Error:** If you get `AccessDeniedException: AWS Transform is unable to assume the Access Role` during `publish-agent-version`, the trust policy principal doesn't match your registry endpoint. + +## IAM Role Permissions (Beyond Trust Policy) + +The `AgentCoreExecutionRole` needs these key permissions: +- **Bedrock**: `bedrock:InvokeModel`, `bedrock:InvokeModelWithResponseStream` on `arn:aws:bedrock:*::foundation-model/*` +- **Bedrock AgentCore**: `bedrock-agentcore:GetWorkloadAccessToken*` +- **AWS Transform Agentic API**: `transform-agents:*` — required for the agent to call GetAgentInstance, UpdateJobStatus, SendMessage, etc. +- **ECR**: Image pull permissions (ecr:GetAuthorizationToken, ecr:BatchCheckLayerAvailability, ecr:GetDownloadUrlForLayer, ecr:BatchGetImage) +- **CloudWatch Logs**: CreateLogGroup, CreateLogStream, PutLogEvents +- **X-Ray**: PutTraceSegments, PutTelemetryRecords + +For complete CloudFormation template, see [deployment-pipeline-guide.md](./deployment-pipeline-guide.md). 
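The runtime naming pattern and status values described earlier in this guide can be checked in code before and after creating a runtime. This is a minimal sketch: the function names, the timeout, and the polling interval are assumptions, and `get_status` stands in for whatever call you use to read the runtime status (for example, a wrapper around `get-agent-runtime`). Treating `STOPPED` as a terminal failure while waiting is also an assumption.

```python
import re
import time
from typing import Callable

# Pattern copied from the Runtime Naming Constraints section.
RUNTIME_NAME_PATTERN = re.compile(r"[a-zA-Z][a-zA-Z0-9_]{0,47}")
SUCCESS_STATES = {"READY", "ACTIVE"}  # ACTIVE is listed as a legacy alias for READY
TERMINAL_FAILURES = {"FAILED", "STOPPED", "DELETE_FAILED"}  # assumed terminal while waiting


def is_valid_runtime_name(name: str) -> bool:
    """Check a candidate runtime name against the documented pattern."""
    return RUNTIME_NAME_PATTERN.fullmatch(name) is not None


def wait_until_ready(
    get_status: Callable[[], str],
    timeout_s: float = 600.0,
    interval_s: float = 10.0,
) -> str:
    """Poll get_status() until a success state, a terminal failure, or the timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in SUCCESS_STATES:
            return status
        if status in TERMINAL_FAILURES:
            raise RuntimeError(f"Runtime entered terminal state {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Runtime still {status} after {timeout_s} seconds")
        time.sleep(interval_s)
```

Validating the name locally avoids a failed `create-agent-runtime` call, and passing the status reader in as a callable keeps the loop easy to test without AWS access.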
+ +## RegisterAgent API + +Registers a new agent with the AWS Transform registry. Use this once per agent (not per version). + +### API Signature + +```bash +aws atxagentregistryexternal register-agent \ + --name \ + --metadata \ + --endpoint-url https://iad.prod.agent-registry-external.elastic-gumby.ai.aws.dev \ + --region us-east-1 +``` + +### Metadata Structure + +The `--metadata` parameter is a JSON object with the following fields: + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `type` | string | Yes | Agent type: `SUB_AGENT` or `ORCHESTRATOR_AGENT` | +| `description` | string | Yes | Human-readable description of the agent | +| `ownerName` | string | Yes | Agent owner/publisher name | +| `ownerContactInfo` | string | Yes | Contact email or information | +| `ownerType` | string | Yes | One of: `DIRECT_AGENT`, `MARKETPLACE_AGENT` | +| `customerConfigurationRequired` | boolean | Yes | `true` if customer must configure; `false` for same-account deployment | +| `jobOrchestrator` | boolean | **Orchestrators only** | MUST be `true` for orchestrator agents | +| `jobOrchestratorMetadata` | object | **Orchestrators only** | Chat UI configuration (see below) | +| `customerConfiguredAgentDependencies` | array | Optional | List of subagent names (only for customer-account deployments) | + +### ownerType Values + +- **`DIRECT_AGENT`**: Use this for your own agents deployed in your AWS account +- **`MARKETPLACE_AGENT`**: Agents published to AWS Marketplace (future) + +**Default:** Use `DIRECT_AGENT`. 
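A small client-side pre-check can catch missing or misspelled metadata fields before calling `RegisterAgent`. The sketch below encodes only the rules in the table above. The function name is hypothetical, and the service may enforce additional validation that is not captured here.

```python
REQUIRED_FIELDS = (
    "type",
    "description",
    "ownerName",
    "ownerContactInfo",
    "ownerType",
    "customerConfigurationRequired",
)
AGENT_TYPES = {"SUB_AGENT", "ORCHESTRATOR_AGENT"}
OWNER_TYPES = {"DIRECT_AGENT", "MARKETPLACE_AGENT"}


def validate_register_metadata(metadata: dict) -> list[str]:
    """Return a list of problems found in RegisterAgent metadata (empty if none)."""
    problems = [f"missing required field: {name}" for name in REQUIRED_FIELDS if name not in metadata]
    if "type" in metadata and metadata["type"] not in AGENT_TYPES:
        problems.append(f"invalid type: {metadata['type']}")
    if "ownerType" in metadata and metadata["ownerType"] not in OWNER_TYPES:
        problems.append(f"invalid ownerType: {metadata['ownerType']}")
    # Orchestrator-only fields from the table above.
    if metadata.get("type") == "ORCHESTRATOR_AGENT":
        if metadata.get("jobOrchestrator") is not True:
            problems.append("jobOrchestrator must be true for orchestrators")
        if "jobOrchestratorMetadata" not in metadata:
            problems.append("jobOrchestratorMetadata is required for orchestrators")
    return problems


subagent = {
    "type": "SUB_AGENT",
    "description": "Analyzes code structure and dependencies",
    "ownerName": "my-team",
    "ownerContactInfo": "team@example.com",
    "ownerType": "DIRECT_AGENT",
    "customerConfigurationRequired": False,
}
print(validate_register_metadata(subagent))
```

Returning a list of problems rather than raising on the first one lets you report every issue in a single pass.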
+ +### customerConfigurationRequired + +- **`false`**: Agent and dependencies are deployed in the same AWS account as the registry (typical for demos/development) +- **`true`**: Agent will be deployed in customer's AWS account and requires customer configuration + +**Trade-off matrix:** + +| `customerConfigurationRequired` | `computeConfiguration` in publish | `customerConfiguredAgentDependencies` via update | +|---|---|---| +| `true` | Blocked — customer provides compute at deployment time | Allowed — declare subagent dependencies | +| `false` | Allowed — embed runtime ARN in published version | Blocked — no dependency declaration | + +An orchestrator needing webapp visibility + compute config + subagent dependencies cannot satisfy all three. Choose based on priority: +- Register with `false` — webapp + compute config, but no declared dependencies (orchestrator still invokes subagents at runtime) +- Register with `true` — dependencies declared, but compute config provided by customer at deployment time + +### customerConfiguredAgentDependencies + +When `customerConfigurationRequired: true`, declare the orchestrator's subagent dependencies so AWS Transform knows which agents must be available: + +```python +metadata = { + "type": "ORCHESTRATOR_AGENT", + "customerConfigurationRequired": True, + "customerConfiguredAgentDependencies": [ + "my-analysis-agent", + "my-transformation-agent" + ] +} +``` + +These MUST be kept in sync with the `discover_subagents()` tool in `tools/orchestrator_tools.py`. + +### Orchestrator-Specific Fields + +**CRITICAL:** If `type` is `ORCHESTRATOR_AGENT`, you MUST set `jobOrchestrator: true` and provide `jobOrchestratorMetadata`. 
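For an orchestrator, the `customerConfigurationRequired` trade-off matrix above decides which of the two optional capabilities is available. The sketch below restates that matrix as code so a registration plan can be checked up front. The type and function names are hypothetical.

```python
from typing import NamedTuple


class RegistrationOptions(NamedTuple):
    """What a given customerConfigurationRequired value allows, per the matrix above."""

    compute_configuration_in_publish: bool
    declare_agent_dependencies: bool


def options_for(customer_configuration_required: bool) -> RegistrationOptions:
    """Map customerConfigurationRequired to the allowed capabilities."""
    return RegistrationOptions(
        compute_configuration_in_publish=not customer_configuration_required,
        declare_agent_dependencies=customer_configuration_required,
    )


def check_plan(
    customer_configuration_required: bool,
    wants_compute_config: bool,
    wants_dependencies: bool,
) -> list[str]:
    """Return the conflicts between a registration plan and the matrix."""
    allowed = options_for(customer_configuration_required)
    conflicts = []
    if wants_compute_config and not allowed.compute_configuration_in_publish:
        conflicts.append("computeConfiguration in publish is blocked when customerConfigurationRequired is true")
    if wants_dependencies and not allowed.declare_agent_dependencies:
        conflicts.append("customerConfiguredAgentDependencies is blocked when customerConfigurationRequired is false")
    return conflicts


# A plan that wants both capabilities conflicts with either setting.
print(check_plan(False, wants_compute_config=True, wants_dependencies=True))
```

Because the two capabilities are mutually exclusive, any plan that asks for both returns exactly one conflict regardless of the flag value.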
+ +#### jobOrchestratorMetadata Structure + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `chatUILabel` | string | Yes | Display name shown in AWS Transform chat UI | +| `chatAgentIdentifier` | string | Yes | Unique identifier for chat routing (use agent name) | +| `a2aSupported` | boolean | Yes | Whether agent supports Agent-to-Agent protocol | + +> **Note:** If the agent is already registered (`ConflictException`), use `UpdateAgent` to modify `jobOrchestratorMetadata` (including `chatUILabel`) — it is fully updatable post-registration. + +### Complete Examples + +#### Registering a Subagent + +```bash +aws atxagentregistryexternal register-agent \ + --name code-analysis-agent \ + --metadata '{ + "type": "SUB_AGENT", + "description": "Analyzes code structure and dependencies", + "ownerName": "my-team", + "ownerContactInfo": "team@example.com", + "ownerType": "DIRECT_AGENT", + "customerConfigurationRequired": false + }' \ + --endpoint-url https://iad.prod.agent-registry-external.elastic-gumby.ai.aws.dev \ + --region us-east-1 +``` + +#### Registering an Orchestrator + +```bash +aws atxagentregistryexternal register-agent \ + --name modernization-orchestrator \ + --metadata '{ + "type": "ORCHESTRATOR_AGENT", + "description": "Coordinates code modernization workflow", + "ownerName": "my-team", + "ownerContactInfo": "team@example.com", + "ownerType": "DIRECT_AGENT", + "customerConfigurationRequired": false, + "jobOrchestrator": true, + "jobOrchestratorMetadata": { + "chatUILabel": "Code Modernization Orchestrator", + "chatAgentIdentifier": "modernization-orchestrator", + "a2aSupported": true + } + }' \ + --endpoint-url https://iad.prod.agent-registry-external.elastic-gumby.ai.aws.dev \ + --region us-east-1 +``` + +### Python Example + +```python +import boto3 +import json + +client = boto3.client( + 'atxagentregistryexternal', + region_name='us-east-1', + 
endpoint_url='https://iad.prod.agent-registry-external.elastic-gumby.ai.aws.dev' +) + +# Subagent +client.register_agent( + name='code-analysis-agent', + metadata={ + 'type': 'SUB_AGENT', + 'description': 'Analyzes code structure and dependencies', + 'ownerName': 'my-team', + 'ownerContactInfo': 'team@example.com', + 'ownerType': 'DIRECT_AGENT', + 'customerConfigurationRequired': False + } +) + +# Orchestrator +client.register_agent( + name='modernization-orchestrator', + metadata={ + 'type': 'ORCHESTRATOR_AGENT', + 'description': 'Coordinates code modernization workflow', + 'ownerName': 'my-team', + 'ownerContactInfo': 'team@example.com', + 'ownerType': 'DIRECT_AGENT', + 'customerConfigurationRequired': False, + 'jobOrchestrator': True, + 'jobOrchestratorMetadata': { + 'chatUILabel': 'Code Modernization Orchestrator', + 'chatAgentIdentifier': 'modernization-orchestrator', + 'a2aSupported': True + } + } +) +``` + +## PublishAgentVersion API + +Publishes a new version of an agent, linking it to a Bedrock AgentCore runtime. + +> **Tip:** To publish a new version without manual CLI commands, use the `publish_agent_version` MCP tool. It fetches the existing version's config, applies optional overrides (runtimeArn, atxAccessRoleArn), and publishes in a single call. Requires only the agent name and new version number. + +### API Signature + +```bash +aws atxagentregistryexternal publish-agent-version \ + --name \ + --version \ + --configuration \ + --endpoint-url https://iad.prod.agent-registry-external.elastic-gumby.ai.aws.dev \ + --region us-east-1 +``` + +### Configuration Structure + +The `--configuration` parameter is a JSON object with nested structures. 
**All fields below are required unless marked optional.** + +#### Top-Level Fields + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `shortDescription` | string | Yes | Brief description (can be same as registration description) | +| `computeConfiguration` | object | Yes | Bedrock AgentCore runtime configuration (see below) | +| `agentCard` | object | Yes | Agent metadata card (see [agentCard Structure](#agentcard-structure) below) | +| `inputPayloadSchema` | object | Yes | JSON schema for input (use `{"type": "object"}` minimal — empty `{}` is rejected) | +| `outputPayloadSchema` | object | Yes | JSON schema for output (use `{"type": "object"}` minimal — empty `{}` is rejected) | +| `monitoringType` | string | Yes | Must be `HEALTHCHECK` or `HEARTBEAT` | +| `notificationsEnabled` | string | Yes | `ENABLED` or `DISABLED` | +| `objectiveNegotiationPrompt` | string | Yes | Prompt for objective validation (can be empty string) | +| `agentResiliencyConfiguration` | object | Optional | Retry and recovery settings (see below) | + +**CRITICAL:** Use `"monitoringType": "HEALTHCHECK"` (not `"DEFAULT"`) + +#### computeConfiguration Structure + +```json +{ + "computeConfiguration": { + "provisionedComputeConfiguration": { + "agentCoreConfiguration": { + "atxAccessRoleArn": "arn:aws:iam:::role/AWSTransformAgentInvokeRole", + "runtimeArn": "arn:aws:bedrock-agentcore:us-east-1::runtime/", + "qualifier": "DEFAULT" + } + } + } +} +``` + +**IMPORTANT:** +- Use `runtimeArn` (NOT `agentCoreRuntimeArn`) +- The `qualifier` field is optional but recommended (use `"DEFAULT"`) +- Get the runtime ARN from `aws bedrock-agentcore-control get-agent-runtime` + +#### agentResiliencyConfiguration Structure (Optional but Recommended) + +```json +{ + "agentResiliencyConfiguration": { + "partnerControllerRetryWindowMinutes": 6, + "agentRecoveryConfiguration": { + "recoveryWaitTimeSeconds": 60 + } + } +} +``` + +#### agentCard Structure + +The `agentCard` 
field describes the agent's identity, capabilities, skills, and provider metadata. It is required by `PublishAgentVersion` (not by `RegisterAgent`). An empty `{}` is rejected by boto3 client-side validation — you must provide at least the required fields. The same validation applies to both orchestrator and subagent cards. + +##### Top-Level agentCard Fields + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `id` | string | Yes | Agent identifier (pattern: `^[a-zA-Z0-9_-]+$`) | +| `name` | string | Yes | Human-readable agent name | +| `description` | string | Yes | Agent description | +| `version` | string | Yes | Semantic version: `major.minor.patch` (e.g., `1.0.0`) | +| `url` | string | No | Agent URL | +| `defaultInputModes` | string[] | No | Input modes (e.g., `["text"]`) | +| `defaultOutputModes` | string[] | No | Output modes (e.g., `["text"]`) | +| `capabilities` | object | Yes | Capability flags and extensions (see below) | +| `tags` | string[] | No | Freeform tags | + +##### capabilities Structure + +The `capabilities` object contains boolean flags and a required `extensions` array: + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `restartable` | boolean | Yes | Whether the agent can be restarted | +| `a2aSupported` | boolean | Yes | Whether agent supports Agent-to-Agent protocol | +| `legacyDashboard` | boolean | Yes | Legacy dashboard support flag | +| `legacyTaskLink` | boolean | Yes | Legacy task link support flag | +| `webAppV2` | boolean | Yes | Web app v2 support flag | +| `legacyRestartable` | boolean | Yes | Legacy restartable support flag | +| `extensions` | array | Yes | Required extensions (see below) | + +##### Required Extensions + +Three extensions are **required** in `capabilities.extensions`: + +**1. 
Agent Provider** — Publisher/owner details + +```json +{ + "name": "Agent Provider", + "description": "Agent publisher details", + "params": { + "name": "My Team Name", + "accountId": "123456789012", + "ownerType": "DIRECT_AGENT", + "organization": "AWS", + "contactInfo": [ + { "type": "email", "value": "team@example.com" } + ] + } +} +``` + +Provider params: + +| Param | Type | Required | Description | +|-------|------|----------|-------------| +| `name` | string | Yes | Publisher display name | +| `accountId` | string | Yes | 12-digit AWS account ID (pattern: `^[0-9]{12}$`) | +| `ownerType` | string | Yes | `DIRECT_AGENT` or `MARKETPLACE_AGENT` | +| `contactInfo` | array | Yes | List of contact entries (at least one required) | + +Contact entry types: `email`, `phone`, `slack`, `other` + +**Contact validation rules:** +- `type` — Required. Must be non-null, non-blank, and one of the valid types above. Throws `ValidationException` if missing or invalid. + +**2. Agent Dependencies** — Runtime dependencies + +```json +{ + "name": "Agent Dependencies", + "description": "Runtime dependencies", + "params": { + "agentDependencies": [], + "requiredConnectorTypes": [] + } +} +``` + +**3. 
Agent Connectors** — Connector types used by the agent + +```json +{ + "name": "Agent Connectors", + "description": "Connector types used by this agent", + "params": { + "connectors": [ + { + "connectorTypeId": "platform|s3|1", + "displayName": "Platform S3 Managed Policy Connector", + "required": true, + "description": "S3 bucket for storing transformation artifacts" + } + ] + } +} +``` + +Connector entry fields: + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `connectorTypeId` | string | Yes | Format: `{owner}\|{shortName}\|{version}` | +| `displayName` | string | Yes | Human-readable connector name | +| `required` | boolean | Yes | Whether the connector is required | +| `description` | string | Yes | Connector description | + + +##### Minimal agentCard Example + +The minimum structure that passes boto3 and server-side validation (works for both orchestrators and subagents): + +```json +{ + "agentCard": { + "id": "my-orchestrator-agent", + "name": "My Orchestrator Agent", + "description": "Orchestrates sub-agents to complete tasks", + "version": "1.0.0", + "capabilities": { + "restartable": true, + "a2aSupported": true, + "legacyDashboard": false, + "legacyTaskLink": false, + "webAppV2": true, + "legacyRestartable": false, + "extensions": [ + { + "name": "Agent Provider", + "description": "Agent publisher details", + "params": { + "name": "MyTeam", + "accountId": "123456789012", + "ownerType": "DIRECT_AGENT", + "contactInfo": [ + { + "type": "email", + "value": "team@example.com" + } + ] + } + }, + { + "name": "Agent Dependencies", + "description": "Agent runtime dependencies", + "params": { + "agentDependencies": [], + "requiredConnectorTypes": [] + } + }, + { + "name": "Agent Connectors", + "description": "Connector types used by this agent", + "params": { + "connectors": [] + } + } + ] + } + } +} +``` + +##### Complete agentCard Example + +```json +{ + "agentCard": { + "id": "code-analysis-agent", + "name": "Code 
Analysis Agent", + "description": "Analyzes code structure, dependencies, and quality", + "version": "1.0.0", + "url": "https://example.com/agent", + "defaultInputModes": ["text"], + "defaultOutputModes": ["text"], + "tags": ["analysis", "assessment"], + "capabilities": { + "restartable": true, + "a2aSupported": false, + "legacyDashboard": false, + "legacyTaskLink": false, + "webAppV2": true, + "legacyRestartable": false, + "extensions": [ + { + "name": "Agent Provider", + "description": "Agent publisher details", + "params": { + "name": "My Team", + "accountId": "111122223333", + "ownerType": "DIRECT_AGENT", + "organization": "AWS", + "contactInfo": [ + { "type": "email", "value": "team@example.com" } + ] + } + }, + { + "name": "Agent Dependencies", + "description": "Runtime dependencies", + "params": { + "agentDependencies": [], + "requiredConnectorTypes": [] + } + }, + { + "name": "Agent Connectors", + "description": "Connector types used by this agent", + "params": { + "connectors": [] + } + } + ] + } + } +} +``` + +##### Orchestrator agentCard Example + +An orchestrator card showing subagent dependencies, connectors, and owner metadata: + +```json +{ + "agentCard": { + "id": "my-orchestrator-agent", + "name": "My Orchestrator Agent", + "description": "Orchestrator agent that coordinates analysis and transformation tasks", + "version": "1.0.0", + "capabilities": { + "restartable": true, + "a2aSupported": true, + "legacyDashboard": true, + "legacyTaskLink": false, + "webAppV2": true, + "legacyRestartable": false, + "extensions": [ + { + "name": "Agent Provider", + "description": "Agent publisher details", + "params": { + "name": "My Team", + "accountId": "123456789012", + "ownerType": "DIRECT_AGENT", + "contactInfo": [ + { + "type": "email", + "value": "team@example.com" + } + ] + } + }, + { + "name": "Agent Dependencies", + "description": "Runtime dependencies", + "params": { + "agentDependencies": [ + { + "agentName": "my-analysis-agent", + "role": "Analyzes 
source artifacts", + "required": false + }, + { + "agentName": "my-assessment-agent", + "role": "Runs assessment checks", + "required": false + }, + { + "agentName": "my-transformation-agent", + "role": "Performs transformation tasks", + "required": false + } + ] + } + }, + { + "name": "Agent Connectors", + "description": "Agent connector configurations", + "params": { + "connectors": [ + { + "connectorTypeId": "my_service|s3|1", + "displayName": "Amazon S3", + "required": false, + "description": "S3 bucket for storing artifacts" + } + ] + } + } + ] + }, + "skills": [] + } +} +``` + +Key differences from the minimal subagent card: +- **Agent Dependencies** lists all subagents the orchestrator invokes at runtime +- **Agent Connectors** declares connector types customers need to configure +- **`legacyDashboard: true`** enables the legacy dashboard view for this orchestrator + +#### JSON Schema Examples + +For minimal setup, use a basic type declaration (empty `{}` is rejected by boto3): +```json +{ + "inputPayloadSchema": {"type": "object"}, + "outputPayloadSchema": {"type": "object"} +} +``` + +For proper schemas (recommended for production): +```json +{ + "inputPayloadSchema": { + "$schema": "http://json-schema.org/draft-07/schema#", + "type": "object", + "properties": { + "sourceCode": {"type": "string"}, + "language": {"type": "string"} + }, + "required": ["sourceCode"] + }, + "outputPayloadSchema": { + "$schema": "http://json-schema.org/draft-07/schema#", + "type": "object", + "properties": { + "analysis": {"type": "object"}, + "recommendations": {"type": "array"} + } + } +} +``` + +### Complete Example + +```bash +aws atxagentregistryexternal publish-agent-version \ + --name code-analysis-agent \ + --version 1.0.0 \ + --configuration '{ + "shortDescription": "Analyzes code structure and dependencies", + "computeConfiguration": { + "provisionedComputeConfiguration": { + "agentCoreConfiguration": { + "atxAccessRoleArn": 
"arn:aws:iam::111122223333:role/AWSTransformAgentInvokeRole", + "runtimeArn": "arn:aws:bedrock-agentcore:us-east-1:111122223333:runtime/code_analysis_agent-ABC123", + "qualifier": "DEFAULT" + } + } + }, + "agentCard": { "...see Minimal agentCard Example above..." }, + "inputPayloadSchema": {"type": "object"}, + "outputPayloadSchema": {"type": "object"}, + "monitoringType": "HEALTHCHECK", + "notificationsEnabled": "ENABLED", + "objectiveNegotiationPrompt": "", + "agentResiliencyConfiguration": { + "partnerControllerRetryWindowMinutes": 6, + "agentRecoveryConfiguration": { + "recoveryWaitTimeSeconds": 60 + } + } + }' \ + --endpoint-url https://iad.prod.agent-registry-external.elastic-gumby.ai.aws.dev \ + --region us-east-1 +``` + +### Python Example + +```python +configuration = { + "shortDescription": "Analyzes code structure and dependencies", + "computeConfiguration": { + "provisionedComputeConfiguration": { + "agentCoreConfiguration": { + "atxAccessRoleArn": f"arn:aws:iam::{account_id}:role/AWSTransformAgentInvokeRole", + "runtimeArn": runtime_arn, + "qualifier": "DEFAULT" + } + } + }, + "agentCard": { "...see Minimal agentCard Example above..." }, + "inputPayloadSchema": {"type": "object"}, + "outputPayloadSchema": {"type": "object"}, + "monitoringType": "HEALTHCHECK", + "notificationsEnabled": "ENABLED", + "objectiveNegotiationPrompt": "", + "agentResiliencyConfiguration": { + "partnerControllerRetryWindowMinutes": 6, + "agentRecoveryConfiguration": { + "recoveryWaitTimeSeconds": 60 + } + } +} + +client.publish_agent_version( + name='code-analysis-agent', + version='1.0.0', + configuration=configuration +) +``` + +## UpdatePublisherAccessControl API + +Controls which AWS accounts can access your agent. **REQUIRED** to make agents visible, even in same-account scenarios. 
+ +### API Signature + +```bash +aws atxagentregistryexternal update-publisher-access-control \ + --agent-name \ + --customer-account-id <12-digit-account-id> \ + --access-control \ + --endpoint-url https://iad.prod.agent-registry-external.elastic-gumby.ai.aws.dev \ + --region us-east-1 +``` + +### Parameters + +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `--agent-name` | string | Yes | Name of the agent | +| `--customer-account-id` | string | Yes | 12-digit AWS account ID to grant/revoke access | +| `--access-control` | enum | Yes | `ENABLED` or `DISABLED` | + +### Example + +```bash +# Grant access to the same account (same account as publisher) +aws atxagentregistryexternal update-publisher-access-control \ + --agent-name code-analysis-agent \ + --customer-account-id 111122223333 \ + --access-control ENABLED \ + --endpoint-url https://iad.prod.agent-registry-external.elastic-gumby.ai.aws.dev \ + --region us-east-1 + +# Grant access to a different customer account +aws atxagentregistryexternal update-publisher-access-control \ + --agent-name code-analysis-agent \ + --customer-account-id 999888777666 \ + --access-control ENABLED \ + --endpoint-url https://iad.prod.agent-registry-external.elastic-gumby.ai.aws.dev \ + --region us-east-1 +``` + +**CRITICAL:** Even if you're using the same AWS account for both publishing and consuming agents, you MUST run this command to make agents visible in the AWS Transform console. + +## Complete Registration Workflow + +### For Subagents + +```bash +# 1. Create Bedrock AgentCore runtime +RUNTIME_ARN=$(aws bedrock-agentcore-control create-agent-runtime \ + --agent-runtime-name code_analysis_agent_v1 \ + --agent-runtime-artifact '{"containerConfiguration":{"containerUri":""}}' \ + --role-arn arn:aws:iam:::role/AgentCoreExecutionRole \ + --network-configuration '{"networkMode":"PUBLIC"}' \ + --region us-east-1 \ + --query 'agentRuntimeArn' \ + --output text) + +# 2. 
Poll until READY +aws bedrock-agentcore-control get-agent-runtime \ + --agent-runtime-id \ + --region us-east-1 \ + --query 'status' + +# 3. Register agent +aws atxagentregistryexternal register-agent \ + --name code-analysis-agent \ + --metadata '{ + "type": "SUB_AGENT", + "description": "Analyzes code structure", + "ownerName": "my-team", + "ownerContactInfo": "team@example.com", + "ownerType": "DIRECT_AGENT", + "customerConfigurationRequired": false + }' \ + --endpoint-url https://iad.prod.agent-registry-external.elastic-gumby.ai.aws.dev \ + --region us-east-1 + +# 4. Publish version +aws atxagentregistryexternal publish-agent-version \ + --name code-analysis-agent \ + --version 1.0.0 \ + --configuration '{ + "shortDescription": "Analyzes code structure", + "computeConfiguration": { + "provisionedComputeConfiguration": { + "agentCoreConfiguration": { + "atxAccessRoleArn": "arn:aws:iam:::role/AWSTransformAgentInvokeRole", + "runtimeArn": "'$RUNTIME_ARN'", + "qualifier": "DEFAULT" + } + } + }, + "monitoringType": "HEALTHCHECK", + "notificationsEnabled": "ENABLED", + "objectiveNegotiationPrompt": "", + "agentCard": { "...see Minimal agentCard Example above..." }, + "inputPayloadSchema": {"type": "object"}, + "outputPayloadSchema": {"type": "object"} + }' \ + --endpoint-url https://iad.prod.agent-registry-external.elastic-gumby.ai.aws.dev \ + --region us-east-1 + +# 5. Enable access +aws atxagentregistryexternal update-publisher-access-control \ + --agent-name code-analysis-agent \ + --customer-account-id \ + --access-control ENABLED \ + --endpoint-url https://iad.prod.agent-registry-external.elastic-gumby.ai.aws.dev \ + --region us-east-1 +``` + +### For Orchestrators + +Same workflow as subagents, but with different metadata in step 3: + +```bash +# 3. 
Register orchestrator (note the additional fields) +aws atxagentregistryexternal register-agent \ + --name modernization-orchestrator \ + --metadata '{ + "type": "ORCHESTRATOR_AGENT", + "description": "Coordinates code modernization", + "ownerName": "my-team", + "ownerContactInfo": "team@example.com", + "ownerType": "DIRECT_AGENT", + "customerConfigurationRequired": false, + "jobOrchestrator": true, + "jobOrchestratorMetadata": { + "chatUILabel": "Code Modernization Orchestrator", + "chatAgentIdentifier": "modernization-orchestrator", + "a2aSupported": true + } + }' \ + --endpoint-url https://iad.prod.agent-registry-external.elastic-gumby.ai.aws.dev \ + --region us-east-1 + +# ... continue with publish-agent-version and update-publisher-access-control +``` + +## DeregisterAgent API + +Deregisters an agent from the AWS Transform registry. Synchronous when no active instances exist; asynchronous (via `force=true`) when there are. + +> **Use the `deregister_agent` MCP tool.** Do NOT auto-approve it — the two-click Run flow below is the safety acknowledgment. + +### Two-step safety flow (REQUIRED) + +The registry enforces a two-step confirmation when the agent has active instances in running jobs: + +1. **First call — always without force.** Call `deregister_agent(name=)`. Do NOT set `force=True` on the first call, ever. +2. **If no active instances:** returns `{"deregistrationStatus": "DEREGISTERED"}`. Done. +3. **If active instances exist:** the service returns a `ValidationException` with this exact message: + + > Cannot deregister agent '' because it has active instances in running jobs. Use force=true to proceed with async deregistration, which will stop all running instances and fail associated jobs. + + The MCP tool returns this message verbatim in its `error` field. **Surface it to the user verbatim** and ask for explicit confirmation before retrying. 
+ + **Clarification on "fail associated jobs":** jobs are failed only when the agent being deregistered is the **orchestrator** of those jobs. If the agent is a **subagent**, its running instances are stopped but the parent job continues. + +4. **Second call: with force, after user confirmation.** Once the user acknowledges, call `deregister_agent(name=<agent-name>, force=True)`. Returns `{"deregistrationStatus": "DEREGISTRATION_IN_PROGRESS"}`. Async teardown is queued. +5. **Retry safety:** if deregistration is already in progress, subsequent calls are idempotent and return `{"deregistrationStatus": "DEREGISTRATION_IN_PROGRESS"}`. + +### Why two Run clicks matter + +Each MCP tool invocation shown to the user in Kiro requires a "Run" / "Accept command" click. Calling `deregister_agent` twice (first without force, then with force after the user reads the ValidationException message) means the user clicks Run twice, with the service's own guidance text in between as the acknowledgment. Never collapse this into a single `force=True` call. + +### API Signature (CLI, for reference) + +```bash +aws atxagentregistryexternal deregister-agent \ + --name <agent-name> \ + [--force] \ + --endpoint-url https://iad.prod.agent-registry-external.elastic-gumby.ai.aws.dev \ + --region us-east-1 +``` + +### Parameters + +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `name` | string | Yes | Agent name to deregister. | +| `force` | boolean | No (default `false`) | Acknowledge that active instances may exist and proceed with async deregistration. Only set `true` after explicit user confirmation. | + +### DeregistrationStatus values + +| Value | Meaning | +|-------|---------| +| `DEREGISTERED` | Synchronous deregistration completed. No active instances existed. | +| `DEREGISTRATION_IN_PROGRESS` | Async teardown queued. Running instances are being stopped; orchestrator jobs (if any) are being failed. 
| + +### Python example + +```python +# Step 1 — always without force. +result = deregister_agent(name="code-analysis-agent") +# result is JSON. Inspect for 'error' vs 'deregistrationStatus'. + +# Step 2 — only after the user confirms the ValidationException message. +result = deregister_agent(name="code-analysis-agent", force=True) +``` + +## Common Errors and Solutions + +| Error | Cause | Solution | +|-------|-------|----------| +| `ValidationException: agentRuntimeName must match [a-zA-Z][a-zA-Z0-9_]{0,47}` | Runtime name contains hyphens or invalid characters | Use underscores instead of hyphens: `my_agent` not `my-agent` | +| `AccessDeniedException: unable to assume the Access Role` | IAM role trust policy doesn't match registry endpoint | Update trust policy to include correct service principal (`prod.us-east-1.compute.elastic-gumby.aws.internal` for prod) | +| `ConflictException: Agent already exists` | Agent name already registered | Use `publish-agent-version` to publish a new version instead of `register-agent` | +| `Cannot start job without orchestrator agent` | Orchestrator not registered or access not enabled | Re-register with `jobOrchestrator: true` and run `update-publisher-access-control` | +| Runtime status stuck in `CREATING` | Container image issues or role permissions | Check CloudWatch logs for the runtime, verify execution role permissions | +| `Invalid parameter: monitoringType DEFAULT` | Wrong monitoringType value | Use `HEALTHCHECK` or `HEARTBEAT`, not `DEFAULT` | +| `Agent marked as customer configurable, compute configuration cannot be provided` | `customerConfigurationRequired: true` with `computeConfiguration` in publish | Omit `computeConfiguration`; use `publish-agent-version` from `agent-builder-mcp-aws-transform` | +| Chat input never enables / zero invocations | Wrong role used in registration | Use `AWSTransformAgentInvokeRole`, not `AgentCoreExecutionRole` | + +## Verification Commands + +```bash +# List your registered agents 
+aws atxagentregistryexternal list-agents-by-publisher \ + --endpoint-url https://iad.prod.agent-registry-external.elastic-gumby.ai.aws.dev \ + --region us-east-1 + +# Get agent details +aws atxagentregistryexternal get-agent \ + --name <agent-name> \ + --endpoint-url https://iad.prod.agent-registry-external.elastic-gumby.ai.aws.dev \ + --region us-east-1 + +# Get specific agent version +aws atxagentregistryexternal get-agent-version \ + --name <agent-name> \ + --version <version> \ + --endpoint-url https://iad.prod.agent-registry-external.elastic-gumby.ai.aws.dev \ + --region us-east-1 + +# Check Bedrock AgentCore runtime status +aws bedrock-agentcore-control get-agent-runtime \ + --agent-runtime-id <runtime-id> \ + --region us-east-1 + +# List all Bedrock AgentCore runtimes +aws bedrock-agentcore-control list-agent-runtimes \ + --region us-east-1 + +# Deregister an agent (synchronous, no active instances) +aws atxagentregistryexternal deregister-agent \ + --name <agent-name> \ + --endpoint-url https://iad.prod.agent-registry-external.elastic-gumby.ai.aws.dev \ + --region us-east-1 + +# Deregister with force (async, only after ValidationException confirmation) +aws atxagentregistryexternal deregister-agent \ + --name <agent-name> \ + --force \ + --endpoint-url https://iad.prod.agent-registry-external.elastic-gumby.ai.aws.dev \ + --region us-east-1 +``` + +## Additional Resources + +For API-specific details, always use the MCP search tools: +- `keyword_search("register agent")` - General registration guidance +- `search_by_source("RegisterAgent", "api")` - RegisterAgent API reference +- `search_by_source("PublishAgentVersion", "api")` - PublishAgentVersion API reference +- `search_by_source("UpdatePublisherAccessControl", "api")` - Access control API reference + +**Grounding requirement:** Only answer based on search results. If specific information isn't found, refer users to the AWS Transform Developer Guide or their Solutions Architect. 
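The two-step deregistration flow from the DeregisterAgent section can be sketched as a small wrapper. This is an illustration only: `deregister_agent` stands in for however the MCP tool is invoked, `confirm` is a hypothetical callback that shows the message to the user and returns their answer, and the check for `force=true` in the error text is an assumption based on the ValidationException wording quoted in that section.

```python
from typing import Any, Callable, Dict


def deregister_with_confirmation(
    deregister_agent: Callable[..., Dict[str, Any]],  # stand-in for the MCP tool call (assumption)
    confirm: Callable[[str], bool],                   # hypothetical: shows the message, returns the user's answer
    name: str,
) -> Dict[str, Any]:
    """Two-step flow: never send force=True before the user has confirmed."""
    # Step 1: always without force.
    result = deregister_agent(name=name)
    error = result.get("error")
    if not error:
        return result  # e.g. {"deregistrationStatus": "DEREGISTERED"}
    # Only the active-instances error offers the force path; other errors are returned unchanged.
    if "force=true" not in error:
        return result
    # Step 2: surface the service message verbatim and retry only on explicit approval.
    if not confirm(error):
        return result
    return deregister_agent(name=name, force=True)
```

Keeping the confirmation as an injected callback means the wrapper cannot reach the `force=True` call without passing through the user-facing prompt.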
diff --git a/aws-transform-agent-toolkit/steering/api-reference.md b/aws-transform-agent-toolkit/steering/api-reference.md new file mode 100644 index 0000000..4ccff6e --- /dev/null +++ b/aws-transform-agent-toolkit/steering/api-reference.md @@ -0,0 +1,471 @@ +--- +inclusion: auto +name: api-reference +description: "Guidance for working with AWS Transform APIs (Agentic API, Registry API)" +--- + +# AWS Transform Agentic API Reference + +## Overview + +The AWS Transform Agentic API provides operations for agents to interact with AWS Transform. All operations use JSON-RPC protocol over HTTPS with AWS SigV4 authentication. + +**Base Endpoint**: Configured via `QT_AGENTIC_API_ENDPOINT` environment variable + +**Authentication**: AWS SigV4 signing with `transform-agents` signing name + +**Protocol**: JSON 1.0 + +## Common Request Structure + +All requests include a `requestContext` with job metadata and authorization: + +```json +{ + "requestContext": { + "jobMetadata": { + "workspaceId": "uuid", + "jobId": "uuid" + }, + "agentInstanceId": "uuid", + "authorizationToken": "token" + } +} +``` + +## Key Operations + +### InvokeAgent + +Invoke another agent (orchestrator or subagent) to perform a task. 
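As a rough sketch, the InvokeAgent parameters can be validated client-side before the call is made. The function name `build_invoke_agent_params` is hypothetical, the two patterns are the ones listed in the input table that follows, and `requestContext` is deliberately left out on the assumption that an `AgenticApiHelper` subclass injects it.

```python
import re
import uuid
from typing import Any, Dict, Optional

# Patterns taken from the InvokeAgentRequest field table; fullmatch() replaces the ^...$ anchors.
AGENT_ID_RE = re.compile(r"[a-z0-9-]+")
AGENT_VERSION_RE = re.compile(r"\d+\.\d+\.\d+(?:-dev-[a-zA-Z0-9]+)?", re.ASCII)


def build_invoke_agent_params(
    agent_id: str,
    input_payload: Optional[Dict[str, Any]] = None,
    agent_version: Optional[str] = None,
) -> Dict[str, Any]:
    """Build InvokeAgent fields (hypothetical helper; requestContext is added elsewhere)."""
    if not AGENT_ID_RE.fullmatch(agent_id):
        raise ValueError(f"invalid agentId: {agent_id!r}")
    # A fresh idempotencyToken per logical invocation makes retries of the same call safe.
    params: Dict[str, Any] = {"agentId": agent_id, "idempotencyToken": str(uuid.uuid4())}
    if input_payload is not None:
        params["inputPayload"] = input_payload
    if agent_version is not None:
        if not AGENT_VERSION_RE.fullmatch(agent_version):
            raise ValueError(f"invalid agentVersion: {agent_version!r}")
        params["agentVersion"] = agent_version
    return params
```

When retrying a failed call, reuse the same returned dict rather than rebuilding it, so the token stays constant across attempts.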
+ +| Property | Value | +|----------|-------| +| **HTTP Method** | POST | +| **Path** | / | +| **Idempotent** | Yes (with idempotencyToken) | + +**Input Shape** (`InvokeAgentRequest`): + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `requestContext` | RequestContext | Yes | Job and authorization context | +| `agentId` | String | Yes | Agent identifier (pattern: `^[a-z0-9-]+$`) | +| `inputPayload` | AgentInputPayload | No | Input data for the agent | +| `idempotencyToken` | UUID | No | Token for idempotent retries | +| `agentVersion` | String | No | Specific agent version (pattern: `^\d+\.\d+\.\d+(?:-dev-[a-zA-Z0-9]+)?$`) | +| `agentInstanceId` | UUID | No | Existing agent instance to resume | + +**Output Shape** (`InvokeAgentResponse`): + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `agentInstanceId` | UUID | Yes | Unique identifier for the agent invocation | + +**Example Request**: + +```json +{ + "requestContext": { + "jobMetadata": { + "workspaceId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890", + "jobId": "12345678-1234-1234-1234-123456789012" + }, + "agentInstanceId": "87654321-4321-4321-4321-210987654321", + "authorizationToken": "eyJhbGc..." + }, + "agentId": "infrastructure-analyzer", + "agentVersion": "1.0.0", + "inputPayload": { + "task": "analyze", + "config": "{...}" + } +} +``` + +**Example Response**: + +```json +{ + "agentInstanceId": "98765432-8765-8765-8765-987654321098" +} +``` + +--- + +### GetJob + +Retrieve details about the current transformation job. 
+ +| Property | Value | +|----------|-------| +| **HTTP Method** | POST | +| **Path** | / | +| **Read-only** | Yes | + +**Input Shape** (`GetJobRequest`): + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `requestContext` | RequestContext | Yes | Job and authorization context | +| `includeObjective` | Boolean | No | Include job objective in response | + +**Output Shape** (`GetJobResponse`): + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `job` | JobInfo | No | Job details including ID, workspace, status | + +**Example Request**: + +```json +{ + "requestContext": { + "jobMetadata": { + "workspaceId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890", + "jobId": "12345678-1234-1234-1234-123456789012" + }, + "agentInstanceId": "87654321-4321-4321-4321-210987654321", + "authorizationToken": "eyJhbGc..." + }, + "includeObjective": true +} +``` + +**Example Response**: + +```json +{ + "job": { + "jobId": "12345678-1234-1234-1234-123456789012", + "workspaceId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890", + "status": "IN_PROGRESS", + "objective": "Migrate infrastructure to AWS", + "createdAt": "2024-01-15T10:30:00Z" + } +} +``` + +--- + +### ListAgents + +List available agents that can be invoked. 
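Because ListAgents is paginated, callers usually loop on `nextToken` until it is absent. The sketch below assumes a hypothetical `list_agents_page` callable that wraps the real API call (for example through an `AgenticApiHelper` subclass) and returns a dict with the `items` and `nextToken` fields; it is not the SDK's own pagination helper.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_agents(
    list_agents_page: Callable[..., Dict[str, Any]],  # hypothetical wrapper around ListAgents
    max_results: int = 10,
) -> Iterator[Dict[str, Any]]:
    """Yield every agent summary, following nextToken until the last page."""
    # ListAgents accepts maxResults between 1 and 10, so clamp the page size.
    page_size = max(1, min(max_results, 10))
    next_token: Optional[str] = None
    while True:
        kwargs: Dict[str, Any] = {"maxResults": page_size}
        if next_token:
            kwargs["nextToken"] = next_token
        page = list_agents_page(**kwargs)
        # `items` is optional in the response, so default to an empty list.
        yield from page.get("items", [])
        next_token = page.get("nextToken")
        if not next_token:
            break
```

The same loop shape should apply to ListAgentInstances, with the clamp raised to that operation's 1-100 range and `agentInstances` read instead of `items`.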
+ +| Property | Value | +|----------|-------| +| **HTTP Method** | POST | +| **Path** | / | +| **Paginated** | Yes | + +**Input Shape** (`ListAgentsRequest`): + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `requestContext` | RequestContext | Yes | Job and authorization context | +| `agentFilter` | ListAgentsFilter | No | Filter criteria for agents | +| `nextToken` | String | No | Pagination token | +| `maxResults` | Integer | No | Max results per page (1-10) | + +**Output Shape** (`ListAgentsResponse`): + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `items` | Array | No | List of agent metadata summaries | +| `nextToken` | String | No | Token for next page | + +**Example Request**: + +```json +{ + "requestContext": { + "jobMetadata": { + "workspaceId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890", + "jobId": "12345678-1234-1234-1234-123456789012" + }, + "agentInstanceId": "87654321-4321-4321-4321-210987654321", + "authorizationToken": "eyJhbGc..." + }, + "maxResults": 10 +} +``` + +**Example Response**: + +```json +{ + "items": [ + { + "agentId": "infrastructure-analyzer", + "agentName": "Infrastructure Analyzer", + "agentType": "SUB_AGENT", + "version": "1.0.0" + }, + { + "agentId": "cost-calculator", + "agentName": "Cost Calculator", + "agentType": "SUB_AGENT", + "version": "2.1.0" + } + ], + "nextToken": "eyJuZXh0VG9rZW4..." +} +``` + +--- + +### ListAgentInstances + +List all agent invocations for the current job. 
+ +| Property | Value | +|----------|-------| +| **HTTP Method** | POST | +| **Path** | / | +| **Read-only** | Yes | +| **Paginated** | Yes | + +**Input Shape** (`ListAgentInstancesRequest`): + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `requestContext` | RequestContext | Yes | Job and authorization context | +| `nextToken` | String | No | Pagination token | +| `maxResults` | Integer | No | Max results per page (1-100) | + +**Output Shape** (`ListAgentInstancesResponse`): + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `agentInstances` | Array | Yes | List of agent instances | +| `nextToken` | String | No | Token for next page | + +**Example Request**: + +```json +{ + "requestContext": { + "jobMetadata": { + "workspaceId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890", + "jobId": "12345678-1234-1234-1234-123456789012" + }, + "agentInstanceId": "87654321-4321-4321-4321-210987654321", + "authorizationToken": "eyJhbGc..." + }, + "maxResults": 50 +} +``` + +**Example Response**: + +```json +{ + "agentInstances": [ + { + "agentInstanceId": "98765432-8765-8765-8765-987654321098", + "agentId": "infrastructure-analyzer", + "status": "COMPLETED", + "startedAt": "2024-01-15T10:35:00Z", + "completedAt": "2024-01-15T10:40:00Z" + } + ] +} +``` + +--- + +### GetAgentInstance + +Get details about a specific agent invocation. 
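A typical use of GetAgentInstance is to poll until the instance reaches a terminal status and then decode `agentOutput.serializedPayload`. The following is a minimal sketch: `get_agent_instance` is a hypothetical callable returning the response fields described in the tables that follow, and the interval and timeout defaults are illustrative rather than documented values.

```python
import json
import time
from typing import Any, Callable, Dict


def wait_for_result(
    get_agent_instance: Callable[[str], Dict[str, Any]],  # hypothetical wrapper around GetAgentInstance
    agent_instance_id: str,
    poll_seconds: float = 5.0,       # illustrative default
    timeout_seconds: float = 600.0,  # illustrative default
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until COMPLETED or FAILED, then return the decoded payload."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        instance = get_agent_instance(agent_instance_id)
        status = instance.get("agentInstanceStatus")
        if status == "COMPLETED":
            output = instance.get("agentOutput", {})
            # serializedPayload is a JSON string, so decode it before use.
            return json.loads(output.get("serializedPayload", "{}"))
        if status == "FAILED":
            raise RuntimeError(instance.get("statusReason", "agent instance failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{agent_instance_id} still {status} after {timeout_seconds}s")
        sleep(poll_seconds)
```

The `sleep` parameter exists only so the loop can be exercised in tests without real delays.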
+ +| Property | Value | +|----------|-------| +| **HTTP Method** | POST | +| **Path** | / | + +**Input Shape** (`GetAgentInstanceRequest`): + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `requestContext` | RequestContext | Yes | Job and authorization context | +| `agentInstanceId` | UUID | Yes | Agent instance identifier | + +**Output Shape** (`GetAgentInstanceResponse`): + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `agentInstanceId` | UUID | Yes | Instance identifier | +| `agentInstanceStatus` | String | Yes | INVOKING, INVOKED, RUNNING, COMPLETED, FAILED | +| `agentOutput` | Object | No | Present when COMPLETED | +| `agentOutput.serializedPayload` | String | No | JSON string with response data | +| `statusReason` | String | No | Failure reason (when FAILED) | + +**Instance lifecycle:** INVOKING → INVOKED → RUNNING → COMPLETED or FAILED. Poll for RUNNING before sending messages; poll for COMPLETED to extract results. + +**Extracting response:** +```python +import json + +# `instance` is the GetAgentInstance response dict +output = instance.get("agentOutput", {}) +payload = json.loads(output.get("serializedPayload", "{}")) +response_text = payload.get("response", "") +``` + +**Example Request**: + +```json +{ + "requestContext": { + "jobMetadata": { + "workspaceId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890", + "jobId": "12345678-1234-1234-1234-123456789012" + }, + "agentInstanceId": "87654321-4321-4321-4321-210987654321", + "authorizationToken": "eyJhbGc..." + }, + "agentInstanceId": "98765432-8765-8765-8765-987654321098" +} +``` + +**Example Response**: + +```json +{ + "agentInstance": { + "agentInstanceId": "98765432-8765-8765-8765-987654321098", + "agentId": "infrastructure-analyzer", + "status": "COMPLETED", + "startedAt": "2024-01-15T10:35:00Z", + "completedAt": "2024-01-15T10:40:00Z", + "outputPayload": { + "result": "analysis_complete", + "findings": [...] 
+ } + } +} +``` + +--- + +### SendMessage + +Send an A2A message to a running agent instance. + +| Property | Value | +|----------|-------| +| **HTTP Method** | POST | +| **Path** | / | + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `requestContext` | RequestContext | Yes | Job and authorization context | +| `agentInstanceId` | UUID | Yes | Target agent instance (or `"ATX_CHAT"` for webapp chat) | +| `params` | Object | Yes | Message payload with A2A `message` object | + +**CRITICAL:** SendMessage has a ~25s internal timeout. If the subagent takes longer, returns error code `-32603` with HTTP 200. The subagent is still processing — use the fire-and-forget + polling pattern: send the message, then poll `GetAgentInstance` until COMPLETED. + +**ATX_CHAT:** Use `agentInstanceId="ATX_CHAT"` to send messages to the webapp chat. The required A2A format uses `extensions` with `userSelection: "jobCreator"` metadata. See `orchestrator-patterns.md` for the working code pattern. + +--- + +## Job Plan APIs + +### PutJobPlan + +Create or replace the job plan with steps visible in the AWS Transform webapp. + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `plan.nodes` | Array | Yes | List of step objects with `stepLabel`, `stepName`, `description` | +| `mode` | Object | Yes | Use `{"override": {}}` to replace existing plan | +| `idempotencyToken` | UUID | Yes | Required for all mutating calls | + +**CRITICAL:** `PutJobPlan` assigns its own `stepId` values. The response does NOT include them. Call `ListJobPlanSteps` immediately after to get the `stepLabel → stepId` mapping. + +### ListJobPlanSteps + +List all steps in the current job plan with their API-assigned stepIds and statuses. + +### UpdateJobPlanStep + +Update a plan step's status. 
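Since `UpdateJobPlanStep` needs the API-assigned `stepId` rather than the `stepLabel`, a small lookup built from the `ListJobPlanSteps` result is handy. This sketch assumes each listed step is a dict carrying `stepLabel` and `stepId` keys (the exact response shape is not shown in this reference), and both helper names are hypothetical. It builds only the request fields listed in the table that follows.

```python
import uuid
from typing import Any, Dict, Iterable, Optional

VALID_STATUSES = {"NOT_STARTED", "IN_PROGRESS", "SUCCEEDED", "FAILED", "PENDING_HUMAN_INPUT"}


def step_ids_by_label(steps: Iterable[Dict[str, Any]]) -> Dict[str, str]:
    """Map stepLabel -> stepId. Assumes each step dict has both keys."""
    return {step["stepLabel"]: step["stepId"] for step in steps}


def build_step_update(step_id: str, status: str, message: Optional[str] = None) -> Dict[str, Any]:
    """Build UpdateJobPlanStep fields; failure text goes in `description`, not `errorMessage`."""
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown step status: {status}")
    plan_step: Dict[str, Any] = {"stepId": step_id, "status": status}
    if message is not None:
        plan_step["description"] = message
    # Mutating calls require an idempotency token.
    return {"planStep": plan_step, "idempotencyToken": str(uuid.uuid4())}
```

A caller would build the lookup once, right after `PutJobPlan`, and reuse it for every later status change.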
+ +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `planStep.stepId` | String | Yes | API-assigned stepId (NOT stepLabel) | +| `planStep.status` | String | Yes | NOT_STARTED, IN_PROGRESS, SUCCEEDED, FAILED, PENDING_HUMAN_INPUT | +| `planStep.description` | String | No | Error message for FAILED status | +| `idempotencyToken` | UUID | Yes | Required | + +**CRITICAL:** The error message field is `description`, NOT `errorMessage`. Using `errorMessage` silently drops the error text. + +### UpdateJobStatus + +Update overall job status: PLANNING, PLANNED, EXECUTING, COMPLETED, FAILED. + +--- + +## HITL APIs + +HITL (Human-in-the-Loop) enables subagents to collect user input via AutoForms in the AWS Transform webapp. See `subagent-patterns.md` for the HITL AutoForm pattern. + +### CreateHitlTask + +Create a HITL task attached to a job plan step. + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `uxComponentId` | String | Yes | Use `"AutoForm"` | +| `title` | String | Yes | Form title | +| `description` | String | Yes | Form description (max 1024 chars) | +| `stepId` | String | No | Plan step to attach to | +| `blockingType` | String | Yes | `"BLOCKING"` (wait for input) or `"NON_BLOCKING"` (display only) | +| `hitlRequestArtifact.artifactId` | String | No | Uploaded form schema artifact | +| `idempotencyToken` | UUID | Yes | Required | + +### StartHitlTask / GetHitlTask / CloseHitlTask + +- **StartHitlTask**: Activates the form in the webapp. Status: CREATED → AWAITING. +- **GetHitlTask**: Poll for status. When SUBMITTED, download the response artifact. +- **CloseHitlTask**: Close with `closureType: "CLOSED"`. Always close after processing. + +--- + +## Artifact APIs + +### UploadArtifact (CreateArtifactUploadUrl + PUT + CompleteArtifactUpload) + +Three-step process: get presigned URL, PUT content, mark complete. Used for HITL form schemas and reports. 
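A minimal sketch of the three-step upload, assuming a helper object that exposes the `create_artifact_upload_url()` and `complete_artifact_upload()` methods named in this reference. The argument lists, the `uploadUrl` and `artifactId` response keys, and the `http_put` function are assumptions made for illustration, not confirmed SDK signatures; check the SDK documentation before relying on them.

```python
import urllib.request
from typing import Any, Callable


def http_put(url: str, body: bytes) -> int:
    """PUT raw bytes to a presigned URL and return the HTTP status code."""
    request = urllib.request.Request(url, data=body, method="PUT")
    with urllib.request.urlopen(request) as response:
        return response.status


def upload_artifact(
    helper: Any,
    content: bytes,
    put: Callable[[str, bytes], int] = http_put,
) -> str:
    """Run the three-step upload and return the artifact ID."""
    # Step 1: ask the service for a presigned upload URL.
    # ASSUMPTION: the call takes no arguments and the response carries
    # "uploadUrl" and "artifactId" keys.
    created = helper.create_artifact_upload_url()
    url, artifact_id = created["uploadUrl"], created["artifactId"]

    # Step 2: PUT the content to the presigned URL.
    status = put(url, content)
    if status not in (200, 201, 204):
        raise RuntimeError(f"Artifact PUT failed with HTTP {status}")

    # Step 3: mark the upload complete.
    # ASSUMPTION: complete_artifact_upload accepts an artifact_id keyword.
    helper.complete_artifact_upload(artifact_id=artifact_id)
    return artifact_id
```

Passing the PUT function as a parameter keeps the sequence testable without network access; in an agent, `helper` would be your `AgenticApiHelper` subclass.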
+ +### DownloadArtifact (CreateArtifactDownloadUrl + GET) + +Get presigned URL, then GET the content. Used to retrieve HITL user responses. + +Use `AgenticApiHelper.create_artifact_upload_url()`, PUT content, then `complete_artifact_upload()`. For download: `create_artifact_download_url()` then GET. + +--- + +## AgenticApiHelper Pattern + +The recommended pattern for calling the Agentic API is to extend `AgenticApiHelper` from the SDK. It provides `_inject_request_context()` which automatically adds `workspaceId`, `jobId`, `agentInstanceId`, and `authorizationToken` to every API call. + +Create an `AgenticApiHelper` subclass that calls `_inject_request_context()` on every API request. Do NOT use raw `boto3` clients directly — you'll get `Missing required parameter: requestContext` errors. Search `keyword_search("AgenticApiHelper")` for the SDK class documentation. + +--- + +## Common Error Responses + +All operations may return these standard errors: + +| Error | HTTP Status | Description | +|-------|-------------|-------------| +| `AccessDeniedException` | 403 | Access denied to the requested resource | +| `InternalServerException` | 500 | Internal server error occurred | +| `ResourceNotFoundException` | 404 | Requested resource not found | +| `ThrottlingException` | 429 | Request rate limit exceeded | +| `ValidationException` | 400 | Request validation failed | +| `ConflictException` | 409 | Request conflicts with current resource state | + +## Next Steps + +- Review orchestrator patterns: `orchestrator-patterns.md` +- Review subagent patterns: `subagent-patterns.md` +- Troubleshoot: `troubleshooting.md` diff --git a/aws-transform-agent-toolkit/steering/deploy-agent-workflow.md b/aws-transform-agent-toolkit/steering/deploy-agent-workflow.md new file mode 100644 index 0000000..feca703 --- /dev/null +++ b/aws-transform-agent-toolkit/steering/deploy-agent-workflow.md @@ -0,0 +1,287 @@ +--- +inclusion: auto +name: deploy-agent +description: Deploy AWS Transform agent 
using Docker/finch/CodeBuild pipeline (build, push, deploy, register) +--- + +# AWS Transform Agent Deployment Workflow + +This workflow deploys an AWS Transform agent through the complete pipeline: build → push → deploy → register. + +## When to Use + +Invoke this workflow when the user wants to: +- Deploy an AWS Transform agent to production +- Build and register a new agent +- Update an existing agent with a new version +- Push agent changes to Bedrock AgentCore + +## First-time setup: IAM roles + +**If this is the first time you're deploying an AWS Transform agent in this AWS account, you likely need to create IAM roles first.** Skipping this step is the single most common cause of silent deployment failures: the runtime reaches READY but jobs fail ~8 minutes after creation with "Failed to start the job" in the AWS Transform webapp. + +AWS Transform agent deployment requires two IAM roles: + +- **`AgentCoreExecutionRole`** — runs your agent container. Needs `bedrock:InvokeModel`, `bedrock:InvokeModelWithResponseStream`, `transform-agents:*`, ECR pull, CloudWatch Logs, and X-Ray permissions. Trust principal: `bedrock-agentcore.amazonaws.com`. +- **`AWSTransformAgentInvokeRole`** — assumed by AWS Transform to invoke your runtime. Needs `bedrock-agentcore:InvokeAgentRuntime`, `bedrock-agentcore:GetAgentRuntime`, and `bedrock-agentcore:GetAgentRuntimeEndpoint`. Trust principal: `prod.us-east-1.compute.elastic-gumby.aws.internal`. + +> **Regional scope:** The AWS Transform Compute principal format is `{stage}.{region}.compute.elastic-gumby.aws.internal`, and AWS Transform runs in several prod regions. This workflow, the CloudFormation template, and `deploy_agent_full_pipeline` assume us-east-1 only. For a non-us-east-1 AWS Transform region, swap the region segment in both principals, point the registry endpoint at the matching airport code, and pass `region` explicitly to the pipeline tool. 
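As an illustration of the principal format described above, the following sketch builds the service principal and the matching trust policy for a given region. The function names are hypothetical; the policy shape mirrors the `AWSTransformAgentInvokeRole` trust policy in the CloudFormation template, and the non-default region is used only as an example value.

```python
import json


def transform_compute_principal(region: str = "us-east-1", stage: str = "prod") -> str:
    """Build the AWS Transform compute service principal for a region."""
    return f"{stage}.{region}.compute.elastic-gumby.aws.internal"


def invoke_role_trust_policy(region: str = "us-east-1", stage: str = "prod") -> dict:
    """Trust policy that lets AWS Transform in `region` assume the invoke role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": transform_compute_principal(region, stage)},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    # Example only: print the trust policy for a non-default region.
    print(json.dumps(invoke_role_trust_policy("eu-central-1"), indent=2))
```

The registry endpoint and the `region` argument to the pipeline tool still have to be changed separately; this sketch covers only the IAM principal.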
+ +### Check if roles exist + +```bash +aws iam get-role --role-name AgentCoreExecutionRole --query 'Role.Arn' --output text +aws iam get-role --role-name AWSTransformAgentInvokeRole --query 'Role.Arn' --output text +``` + +If either returns `NoSuchEntity`, create them using the CloudFormation template below. + +### Create roles with CloudFormation + +A complete, battle-tested CloudFormation template for both roles is documented in [deployment-pipeline-guide.md Section 2](./deployment-pipeline-guide.md#section-2-complete-cloudformation-template). Save it as `iam-roles.yaml` and deploy: + +```bash +aws cloudformation deploy \ + --template-file iam-roles.yaml \ + --stack-name aws-transform-agent-iam-roles \ + --capabilities CAPABILITY_NAMED_IAM \ + --region us-east-1 +``` + +### Using non-default role names + +If you already have Bedrock AgentCore roles with different names (common when Bedrock AgentCore was set up via the AWS console or SDK, which creates roles like `AmazonBedrockAgentCoreSDKRuntime-us-east-1-d4f0bc5a29`): + +- **`AgentCoreExecutionRole`** — `deploy_agent_full_pipeline` first tries the default name, then falls back to scanning trust policies for a role trusting `bedrock-agentcore.amazonaws.com`. If exactly one match is found it's used automatically; if zero or multiple candidates are found, the tool errors out and asks you to pass `execution_role_arn` explicitly. +- **`AWSTransformAgentInvokeRole`** — only the exact default name is auto-detected. There is no trust-policy fallback. If your invoke role has a non-default name, pass `access_role_arn` explicitly; otherwise registry registration is skipped with a warning and the pipeline continues without registering. + +## Prerequisites Check + +Before deploying, verify: +1. Agent directory contains Dockerfile +2. AWS credentials are configured (`aws sts get-caller-identity`) +3. IAM roles exist with correct permissions (see "First-time setup" above) +4. 
AWS account is allowlisted by AWS Transform team + +## Deployment Steps + +### Step 1: Gather Information + +Ask the user these questions in order: + +1. **Agent directory path**: "Where is the agent code?" (e.g., `eswar-test` or `./agents/modernization`) + +2. **Agent name**: "What should this agent be called?" (e.g., `eswar-test` or `modernization-orchestrator`) + - **CRITICAL:** `agent_name` is used as an ECR repository name. Must be lowercase and match `[a-z0-9]+((\.|_|__|-+)[a-z0-9]+)*`. No uppercase letters — `myOrchestrator` will fail with `InvalidParameterException` on `DescribeRepositories`. + +3. **Agent version**: "What version?" (default: `1.0.0`) + +4. **IMPORTANT - Agent type**: "Will this agent be the main orchestrator that users interact with in the AWS Transform console?" + - If user says **YES** → This is a job orchestrator (set `job_orchestrator=True`) + - If user says **NO** → This is a subagent (set `job_orchestrator=False`) + +5. **If YES to question 4**, also ask: "What should the display name be in the chat UI?" (e.g., "Eswar Test Orchestrator") + +6. **Build method**: "Use CodeBuild?" 
(recommend yes for Windows, auto-detect otherwise) + +### Step 2: Use deploy_agent_full_pipeline Tool + +Call `deploy_agent_full_pipeline` with the gathered information: + +```python +deploy_agent_full_pipeline( + agent_path="", + agent_name="", + agent_version="", + job_orchestrator=, + chat_ui_label="", # Optional + use_codebuild= +) +``` + +**Key parameters:** +- `job_orchestrator=True` → Agent can be bound to workspaces (for orchestrators) +- `job_orchestrator=False` → Agent called by other agents only (for subagents) +- `chat_ui_label` → Only needed if job_orchestrator=True + +The tool will automatically: +- Detect platform (Windows → CodeBuild, macOS → finch/docker) +- Build ARM64 Docker image +- Push to ECR +- Deploy to Bedrock AgentCore +- Register with AWS Transform registry + +### Step 3: Report Results + +After deployment completes, report to the user: + +``` +✓ Agent deployed successfully! + +Build Phase: + - Method: {build_method} + - Image URI: {image_uri} + +Deploy Phase: + - Runtime ARN: {runtime_arn} + - Status: READY + +Register Phase: + - Agent: {agent_name} + - Version: {agent_version} + - Registry: {registry_endpoint} + +Your agent is ready to use! +``` + +## Platform-Specific Guidance + +**Windows Users**: +- Tool automatically uses CodeBuild (finch not available on Windows) +- Requires AWS credentials with CodeBuild permissions +- Build takes 2-3 minutes (CodeBuild startup overhead) + +**macOS/Linux Users**: +- Tool uses local finch (fastest) +- Falls back to docker if finch not installed +- Can force CodeBuild with `use_codebuild=True` + +## Error Handling + +Common errors and solutions: + +1. **"Dockerfile not found"** + - Verify agent_path is correct + - Check that Dockerfile exists in the directory + +2. **"No container runtime available"** + - Windows: Tool should auto-use CodeBuild + - macOS/Linux: Install finch or docker, or use `use_codebuild=True` + +3. 
**"Bedrock AgentCore runtime stuck in CREATING"** + - Check CloudWatch logs: `/aws/bedrock-agentcore/agent-runtime/{runtime-id}` + - Common causes: ECR permissions, health check failure, container crash + +4. **"Agent registration failed"** + - Verify AWS account is allowlisted by AWS Transform team + - Check IAM role permissions (AWSTransformAgentInvokeRole) + +## Advanced Usage + +### Deploy Without Registry Registration + +If you want to deploy to Bedrock AgentCore but skip registry registration: + +```python +deploy_agent_full_pipeline( + agent_path="./agents/modernization", + agent_name="modernization-orchestrator", + skip_registry=True +) +``` + +### Force CodeBuild (Windows or CI/CD) + +For Windows users or CI/CD pipelines without local Docker: + +```python +deploy_agent_full_pipeline( + agent_path="./agents/modernization", + agent_name="modernization-orchestrator", + use_codebuild=True +) +``` + +### Custom IAM Roles + +If your IAM roles have different names: + +```python +deploy_agent_full_pipeline( + agent_path="./agents/modernization", + agent_name="modernization-orchestrator", + execution_role_arn="arn:aws:iam::123456:role/CustomExecutionRole", + access_role_arn="arn:aws:iam::123456:role/CustomAccessRole" +) +``` + +### Individual Phase Tools + +For more control, use individual tools: + +**Build only:** +```python +build_agent_image( + agent_path="./agents/modernization", + agent_name="modernization-orchestrator", + use_codebuild=False +) +``` + +**Deploy only (after building manually):** +```python +deploy_agent_to_agentcore( + image_uri="123456.dkr.ecr.us-east-1.amazonaws.com/aws-transform-agents/agent:latest", + agent_name="modernization-orchestrator", + execution_role_arn="arn:aws:iam::123456:role/AgentCoreExecutionRole" +) +``` + +## Related Documentation + +- [Deployment Pipeline Guide](deployment-pipeline-guide.md) - Detailed pipeline patterns and IAM setup +- [Agent Registration](agent-registration.md) - Registry API details and manual 
registration +- [Orchestrator Patterns](orchestrator-patterns.md) - Agent architecture patterns + +## Troubleshooting + +### Build Issues + +**Image build fails with "permission denied":** +- Check that Dockerfile exists and is readable +- Verify finch/docker is running +- Try `finch system prune` to clean up space + +**ECR push fails:** +- Verify AWS credentials: `aws sts get-caller-identity` +- Check ECR permissions in your IAM policy +- Ensure ECR repository exists or can be created + +### Deployment Issues + +**Bedrock AgentCore runtime stuck in CREATING:** +- Wait up to 2 minutes for provisioning +- Check CloudWatch logs: `/aws/bedrock-agentcore/agent-runtime/{runtime-id}` +- Common causes: + - Container health check failing (must expose port 8080 with /health endpoint) + - Container crashes on startup (check application logs) + - ECR image not accessible (verify execution role permissions) + +**Bedrock AgentCore runtime fails with "FAILED" status:** +- Check CloudWatch logs for detailed error messages +- Verify container image is ARM64 architecture +- Ensure health check endpoint returns 200 OK + +### Registry Issues + +**"Account not allowlisted":** +- Contact AWS Transform team to allowlist your AWS account ID +- Provide account ID from `aws sts get-caller-identity` + +**"Agent already registered":** +- Agent names must be unique across all publishers +- Try a different agent name +- Or update existing agent with `publish_agent_version` tool + +**Registration succeeds but agent not visible:** +- Check agent visibility setting (PRIVATE vs PUBLIC) +- Verify your registry endpoint points to prod (`iad.prod.agent-registry-external.elastic-gumby.ai.aws.dev`) +- Allow a few minutes for registry propagation + +--- + +**Note**: This workflow requires: +- AWS Transform Agent Toolkit MCP server with deployment tools +- AWS credentials configured +- IAM roles: AgentCoreExecutionRole, AWSTransformAgentInvokeRole +- AWS account allowlisted by AWS Transform team diff 
--git a/aws-transform-agent-toolkit/steering/deployment-pipeline-guide.md b/aws-transform-agent-toolkit/steering/deployment-pipeline-guide.md new file mode 100644 index 0000000..7b8c2bb --- /dev/null +++ b/aws-transform-agent-toolkit/steering/deployment-pipeline-guide.md @@ -0,0 +1,1013 @@ +--- +inclusion: auto +name: deployment-pipeline-guide +description: "Guidance for deploying AWS Transform agents (Docker, ECR, Bedrock AgentCore, pipeline automation)" +--- + +# AWS Transform Agent Deployment Pipeline Guide + +This guide covers the complete deployment pipeline for AWS Transform agents, including IAM role setup, container builds, Bedrock AgentCore runtime deployment, and registry integration. This is based on the battle-tested patterns from the AWS Transform modernization workshop demo project. + +> **💡 Recommended Approach**: For the easiest deployment experience, use the MCP deployment tools instead of manual scripts. See [Deploy Agent Workflow Guide](deploy-agent-workflow.md) for the recommended workflow that works cross-platform (Windows/macOS/Linux) and handles all phases automatically. +> +> This guide documents the manual shell script approach for reference and advanced customization. + +--- + +## Section 1: IAM Roles Overview + +The AWS Transform agent deployment model requires two IAM roles, each with specific trust relationships and permissions: + +### 1. AgentCoreExecutionRole +- **Used by**: Bedrock AgentCore service (runtime execution environment) +- **Purpose**: Executes agent containers, accesses ECR images, writes logs, traces +- **Trust Policy**: Bedrock's `bedrock-agentcore` service principal +- **When it's used**: During agent runtime execution when Bedrock AgentCore pulls images and runs containers + +### 2. 
AWSTransformAgentInvokeRole +- **Used by**: AWS Transform compute service +- **Purpose**: Invokes Bedrock AgentCore runtimes on behalf of AWS Transform +- **Trust Policy**: AWS Transform compute service principal (`prod.us-east-1.compute.elastic-gumby.aws.internal`) +- **When it's used**: When AWS Transform routes requests to your agent + +--- + +## Section 2: Complete CloudFormation Template + +**IMPORTANT PREREQUISITES:** +1. **AWS Account Allowlisting**: Your AWS account ID must be allowlisted by the AWS Transform team before you can register agents with the AWS Transform registry. Contact your Solutions Architect or AWS Transform team to request allowlisting for your account. + +The following CloudFormation template defines all required IAM roles. + +```yaml +AWSTemplateFormatVersion: "2010-09-09" +Description: > + IAM roles required for AWS Transform modernization agent deployment. + - AgentCoreExecutionRole: Used by Bedrock AgentCore to run agent containers + - AWSTransformAgentInvokeRole: Used by AWS Transform to invoke agents + +Resources: + + # ----------------------------------------------------------------------- + # AgentCoreExecutionRole + # Used by Bedrock AgentCore to run your agent container + # ----------------------------------------------------------------------- + AgentCoreExecutionRole: + Type: AWS::IAM::Role + Properties: + RoleName: AgentCoreExecutionRole + AssumeRolePolicyDocument: + Version: "2012-10-17" + Statement: + - Effect: Allow + Principal: + Service: bedrock-agentcore.amazonaws.com + Action: sts:AssumeRole + Policies: + - PolicyName: AgentCoreExecutionPolicy + PolicyDocument: + Version: "2012-10-17" + Statement: + - Sid: BedrockInvoke + Effect: Allow + Action: + - bedrock:InvokeModel + - bedrock:InvokeModelWithResponseStream + - bedrock-runtime:Converse + - bedrock-runtime:InvokeModel + Resource: "*" + - Sid: TransformAgentsApiPolicy + Effect: Allow + Action: + - transform-agents:* + Resource: "*" + - Sid: ECRImageAccess + Effect: 
Allow + Action: + - ecr:GetAuthorizationToken + - ecr:BatchCheckLayerAvailability + - ecr:GetDownloadUrlForLayer + - ecr:BatchGetImage + Resource: "*" + - Sid: CloudWatchLogs + Effect: Allow + Action: + - logs:CreateLogGroup + - logs:CreateLogStream + - logs:PutLogEvents + Resource: "*" + - Sid: XRayTracing + Effect: Allow + Action: + - xray:PutTraceSegments + - xray:PutTelemetryRecords + Resource: "*" + + # ----------------------------------------------------------------------- + # AWSTransformAgentInvokeRole + # Trust: AWS Transform compute service + # Permissions: Invoke Bedrock AgentCore runtimes + # ----------------------------------------------------------------------- + AWSTransformAgentInvokeRole: + Type: AWS::IAM::Role + Properties: + RoleName: AWSTransformAgentInvokeRole + AssumeRolePolicyDocument: + Version: "2012-10-17" + Statement: + - Effect: Allow + Principal: + Service: + - prod.us-east-1.compute.elastic-gumby.aws.internal + Action: sts:AssumeRole + Policies: + - PolicyName: ATXAgentInvokePolicy + PolicyDocument: + Version: "2012-10-17" + Statement: + - Sid: ATXAgentCoreRuntimePermissions + Effect: Allow + Action: + - bedrock-agentcore:GetAgentRuntime + - bedrock-agentcore:GetAgentRuntimeEndpoint + - bedrock-agentcore:InvokeAgentRuntime + - bedrock-agentcore:ListAgentRuntimeEndpoints + - bedrock-agentcore:ListAgentRuntimeVersions + - bedrock-agentcore:ListAgentRuntimes + - bedrock-agentcore:StopRuntimeSession + Resource: "*" + - Sid: TransformAgentsAPI + Effect: Allow + Action: + - "transform-agents:*" + Resource: "*" + + +Outputs: + AgentCoreExecutionRoleArn: + Description: ARN of the Bedrock AgentCore execution role + Value: !GetAtt AgentCoreExecutionRole.Arn + Export: + Name: AWSTransform-AgentCoreExecutionRoleArn + + AWSTransformAgentInvokeRoleArn: + Description: ARN of the AWS Transform agent invoke role + Value: !GetAtt AWSTransformAgentInvokeRole.Arn + Export: + Name: AWSTransform-AWSTransformAgentInvokeRoleArn +``` + +### Deploying the 
CloudFormation Stack + +**Before deploying**, ensure your AWS account has been allowlisted by the AWS Transform team. + +```bash +aws cloudformation deploy \ + --template-file pipeline/iam-roles.yaml \ + --stack-name aws-transform-agent-iam-roles \ + --capabilities CAPABILITY_NAMED_IAM \ + --region us-east-1 +``` + +--- + +## Section 3: Deployment Pipeline Pattern + +The deployment pipeline follows a **four-phase automation pattern**: Build → Push → Deploy → Register. This pattern is battle-tested and handles common failure modes gracefully. + +### Four-Phase Pipeline Architecture + +``` +Phase 1: BUILD + ├─ Docker build with platform=linux/arm64 + ├─ Install SDK wheels from local files + ├─ Create MCP wrapper scripts + └─ Save images as .tar for verification + +Phase 2: PUSH + ├─ ECR login using finch (not docker) + ├─ Create ECR repos if needed + ├─ Tag and push images + └─ Verify images exist in ECR + +Phase 3: DEPLOY + ├─ Create Bedrock AgentCore runtimes with unique names (timestamp with seconds) + ├─ Poll status until READY (or ACTIVE in older API versions) + ├─ Detect terminal failure states (FAILED, STOPPED, DELETE_FAILED) + └─ Capture runtime ARNs for registration + +Phase 4: REGISTER + ├─ Register agent with AWS Transform registry + ├─ Publish version with compute configuration + ├─ Grant access to account + └─ Verify registration +``` + +### Key Implementation Details + +#### 1. Container Runtime Selection (Platform-Specific) + +**macOS/Linux**: +```python +# Recommended: Use finch to avoid Docker Desktop org auth issues +CONTAINER_CMD = "finch" # or "docker" if finch not available +``` + +**Windows**: +```python +# Use Docker Desktop (finch is not available on Windows) +CONTAINER_CMD = "docker" +``` + +**Why finch on macOS/Linux**: Docker Desktop requires organization authentication which can cause silent failures in corporate environments. Finch is Amazon's open-source container runtime that works without licensing issues. 
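The platform rules in this subsection could also be expressed as a small selection function instead of a hard-coded constant. This is a sketch, not the pipeline's actual code: `select_container_cmd` and its parameters are hypothetical, and it only checks that an executable is on `PATH`, not that the runtime is actually running.

```python
import platform
import shutil
from typing import Callable, Optional


def select_container_cmd(
    system: Optional[str] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Pick the container CLI following the guide's platform rules."""
    system = system or platform.system()
    # Finch is not available on Windows, so only docker is considered there.
    # On macOS/Linux, prefer finch and fall back to docker.
    candidates = ["docker"] if system == "Windows" else ["finch", "docker"]
    for cmd in candidates:
        if which(cmd):
            return cmd
    raise RuntimeError(
        f"No container runtime found on PATH (tried: {', '.join(candidates)})"
    )
```

A pipeline script could then set `CONTAINER_CMD = select_container_cmd()` once at startup and fail early with a readable message when no runtime is installed.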
+ +**Why docker on Windows**: Finch does not support Windows. Windows users must use Docker Desktop with proper authentication configured. + +**Alternative: AWS CodeBuild** (Recommended for cross-platform teams) +- Offload container builds to AWS CodeBuild +- Handles ARM64 builds natively (using Amazon Linux 2 ARM64 instances) +- No local container runtime needed +- Works consistently across all developer platforms +- See Section 8 for CodeBuild setup + +#### 2. Runtime Names with Seconds Precision + +```python +# Lines 282-283 from deploy_agents.py +runtime_name = ( + f"atx_ws_{name.replace('-', '_')}_{datetime.now().strftime('%m%d%H%M%S')}" +) +``` + +**Why**: Bedrock AgentCore has a runtime name cooldown period. Using timestamp with seconds precision (not just minutes) prevents "runtime name already exists" errors during rapid redeployment cycles. + +#### 3. Status Polling for READY + +```python +# Lines 378-422 from deploy_agents.py +def _poll_runtime_status(runtime_id: str, region: str, name: str) -> str: + """Poll Bedrock AgentCore runtime status until READY (or ACTIVE) or failure. 
Returns the ARN.""" + log.info( + " Polling runtime status for %s (timeout %ds)...", name, AGENTCORE_POLL_TIMEOUT + ) + start = time.time() + + while True: + elapsed = time.time() - start + if elapsed > AGENTCORE_POLL_TIMEOUT: + log.error( + " ✗ Timeout waiting for %s to become READY (%.0fs)", name, elapsed + ) + sys.exit(1) + + result = run_json( + [ + "aws", + "bedrock-agentcore-control", + "get-agent-runtime", + "--agent-runtime-id", + runtime_id, + "--region", + region, + "--output", + "json", + ] + ) + + status = result.get("status") + arn = result.get("agentRuntimeArn", "") + log.info(" [%3.0fs] %s status: %s", elapsed, name, status) + + if status == "ACTIVE" or status == "READY": + log.info(" ✓ %s is %s (ARN: %s)", name, status, arn) + return arn + + if status in AGENTCORE_TERMINAL_FAILURE_STATES: + failure_reasons = result.get("statusReasons", []) + log.error(" ✗ %s entered terminal state: %s", name, status) + if failure_reasons: + log.error(" Reasons: %s", json.dumps(failure_reasons, indent=2)) + sys.exit(1) + + time.sleep(AGENTCORE_POLL_INTERVAL) +``` + +**Why**: Bedrock AgentCore runtime deployment is asynchronous. The create call returns immediately, but the runtime isn't usable until status reaches READY or ACTIVE. This polling loop with timeout prevents premature registration. + +#### 4. Error Handling with stderr Capture + +```python +# Lines 72-86 from deploy_agents.py +def run_json(cmd: list[str]) -> dict: + """Run a command and parse JSON output.""" + result = run(cmd, capture=True, check=False) + if result.returncode != 0: + log.error( + "Command failed (exit %d): %s", result.returncode, result.stderr.strip() + ) + raise subprocess.CalledProcessError( + result.returncode, cmd, result.stdout, result.stderr + ) + try: + return json.loads(result.stdout) + except json.JSONDecodeError: + log.error("Failed to parse JSON from command output:\n%s", result.stdout) + raise +``` + +**Why**: Many AWS CLI failures produce cryptic exit codes. 
Capturing and logging stderr provides actionable error messages for debugging. + +#### 5. Passing JSON Arguments for Complex Structures + +```python +# Lines 313-331 from deploy_agents.py +result = run_json( + [ + "aws", + "bedrock-agentcore-control", + "create-agent-runtime", + "--agent-runtime-name", + runtime_name, + "--agent-runtime-artifact", + json.dumps({"containerConfiguration": {"containerUri": image_uri}}), + "--role-arn", + execution_role_arn, + "--network-configuration", + json.dumps({"networkMode": "PUBLIC"}), + "--region", + region, + "--output", + "json", + ] +) +``` + +**Why**: Serializing nested structures with `json.dumps` and passing each one as a single argument avoids shell escaping issues and makes complex nested structures explicit. + +#### 6. Phase Skip Flags for Iteration + +```python +# Lines 659-674 from deploy_agents.py +parser.add_argument( + "--skip-build", + action="store_true", + help="Skip Docker build phase (use existing images)", +) +parser.add_argument( + "--skip-push", + action="store_true", + help="Skip ECR push phase (use already-pushed images)", +) +``` + +**Why**: During development, you often need to iterate on deploy/register logic without rebuilding images. Skip flags save 5-10 minutes per iteration. + +### Complete Pipeline Script Pattern + +```python +#!/usr/bin/env python3 +"""Deployment pipeline for AWS Transform modernization agent system. + +Phases (in order): + 1. Build — Docker build each agent image, save as .tar + 2. Push — Create ECR repos if needed, tag and push images + 3. Deploy — Create Bedrock AgentCore runtimes, poll until READY + 4. 
Register — Register agents with AWS Transform registry, publish versions + +Usage: + python pipeline/deploy_agents.py # Full pipeline + python pipeline/deploy_agents.py --skip-build # Skip Docker build + python pipeline/deploy_agents.py --skip-push # Skip ECR push + python pipeline/deploy_agents.py --skip-build --skip-push # Deploy + register only +""" + +import argparse +import json +import logging +import os +import subprocess +import sys +import time +from datetime import datetime + +# Configuration +CONTAINER_CMD = "finch" # Use finch, not docker +AGENTCORE_POLL_INTERVAL = 10 # seconds +AGENTCORE_POLL_TIMEOUT = 120 # seconds +AGENTCORE_TERMINAL_FAILURE_STATES = {"FAILED", "STOPPED", "DELETE_FAILED"} + +def main(): + config = load_config() + + # Phase 1: Build + if not args.skip_build: + phase_build(config) + + # Phase 2: Push + if not args.skip_push: + phase_push(config) + + # Phase 3: Deploy to Bedrock AgentCore + runtime_info = phase_deploy(config) + + # Phase 4: Register with AWS Transform + phase_register(config, runtime_info) +``` + +--- + +## Section 4: Docker Build Context + +### Platform Requirement + +**CRITICAL**: All AWS Transform agents MUST be built for `linux/arm64`: + +```dockerfile +FROM --platform=linux/arm64 public.ecr.aws/docker/library/python:3.11-slim +``` + +**Why**: Bedrock AgentCore runtimes run on AWS Graviton (ARM64) instances. x86_64 images will fail at runtime with "exec format error". + +**Why ECR Public (not Docker Hub)**: `public.ecr.aws/docker/library/python` is the AWS-operated public mirror of Docker Hub's official Python image. Same bits, no AWS account required to pull. 
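One way to catch an architecture mismatch before pushing is to inspect the built image. The sketch below assumes the container CLI supports `image inspect --format '{{.Architecture}}'`; Docker does, and Finch is expected to behave the same through its Docker-compatible output, but verify this on your version. The function names are hypothetical.

```python
import subprocess
from typing import Callable


def image_architecture(
    image: str,
    container_cmd: str = "finch",
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> str:
    """Return the architecture recorded in a local image's metadata."""
    result = run(
        [container_cmd, "image", "inspect", "--format", "{{.Architecture}}", image],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def require_arm64(image: str, **kwargs) -> None:
    """Raise if the image was not built for ARM64."""
    arch = image_architecture(image, **kwargs)
    if arch != "arm64":
        raise RuntimeError(
            f"{image} is built for {arch!r}; Bedrock AgentCore requires linux/arm64"
        )
```

Calling `require_arm64("modernization-orchestrator:latest")` between the build and push phases turns a late "exec format error" at runtime into an early, local failure.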
+ +### SDK Installation + +Install the AWS Transform SDK packages from PyPI: + +```dockerfile +# Install AWS Transform SDK from PyPI +RUN pip install --no-cache-dir \ + agent-builder-sdk-aws-transform \ + agent-builder-agentic-mcp-aws-transform +``` + +### Botocore Service Model Registration ⚠️ CRITICAL + +**REQUIRED**: You MUST register the botocore service models for the Agentic API and Agent Registry API. Without these, boto3 clients will fail with "Unknown service" errors. + +The service model JSON files ship with the `agent-builder-sdk-aws-transform` pip package. After installing the SDK, register them from the installed path: + +```dockerfile +# Register botocore service models from the installed SDK package +RUN pip install --no-cache-dir awscli && \ + SDK_MODELS=$(python -c "from importlib.resources import files; print(files('agent_builder_sdk').joinpath('botocore_models'))") && \ + aws configure add-model --service-name atxagentregistryexternal \ + --service-model "file://${SDK_MODELS}/atxagentregistryexternal/2022-07-26/service-2.json" && \ + aws configure add-model --service-name transformagenticservice \ + --service-model "file://${SDK_MODELS}/transformagenticservice/2018-05-10/service-2.json" +``` + +**Why**: AWS Transform uses custom AWS service APIs not part of the standard boto3 distribution: +- `transformagenticservice` - Used by BaseAgent SDK to invoke other agents (InvokeAgent operation) +- `atxagentregistryexternal` - Used to register and publish agents to AWS Transform registry + +Without registering these service models, your agent will fail at runtime when trying to: +- Invoke subagents from an orchestrator +- Register the agent with AWS Transform registry +- Use any BaseAgent SDK features that call Agentic API + +**Common Error Without Service Models**: +``` +botocore.exceptions.UnknownServiceError: Unknown service: 'transformagenticservice' +``` + +### MCP Runtime Wrapper + +Bedrock AgentCore expects an MCP server binary at a specific path: 
+ +```dockerfile +# Create MCP server wrapper binary +RUN mkdir -p /home/amazon/AgentBuilderAgenticMCP/bin && \ + printf '#!/bin/bash\npython -m agent_builder_agentic_mcp "$@"\n' > /home/amazon/AgentBuilderAgenticMCP/bin/agent-builder-agentic-mcp && \ + chmod +x /home/amazon/AgentBuilderAgenticMCP/bin/agent-builder-agentic-mcp +``` + +**Why**: Bedrock AgentCore runtime looks for `agent-builder-agentic-mcp` binary in this exact path. The wrapper script delegates to the Python module installed from PyPI. + +### Complete Dockerfile Templates + +The canonical Dockerfile templates are maintained as standalone files for easy reference and single-source-of-truth maintenance: + +- **Orchestrator**: [dockerfile-orchestrator.md](./dockerfile-orchestrator.md) +- **Subagent**: [dockerfile-subagent.md](./dockerfile-subagent.md) + +These templates incorporate both the botocore service model registration and MCP wrapper script creation documented above. **Use them verbatim** — do not generate a Dockerfile from scratch. + +### Docker Build Command + +```bash +finch build \ + --platform linux/arm64 \ + -f src/orchestrator/Dockerfile \ + -t modernization-orchestrator:latest \ + . 
+``` + +--- + +## Section 5: Bedrock AgentCore CLI Commands + +### Create Agent Runtime + +```bash +aws bedrock-agentcore-control create-agent-runtime \ + --agent-runtime-name atx_ws_my_agent_02251430 \ + --agent-runtime-artifact '{ + "containerConfiguration": { + "containerUri": "111122223333.dkr.ecr.us-east-1.amazonaws.com/aws-transform-agents/my-agent:latest" + } + }' \ + --role-arn arn:aws:iam::111122223333:role/AgentCoreExecutionRole \ + --network-configuration '{"networkMode": "PUBLIC"}' \ + --region us-east-1 \ + --output json +``` + +**Returns**: +```json +{ + "agentRuntimeId": "abc123def456", + "agentRuntimeName": "atx_ws_my_agent_02251430", + "status": "CREATING" +} +``` + +### Get Agent Runtime Status + +```bash +aws bedrock-agentcore-control get-agent-runtime \ + --agent-runtime-id abc123def456 \ + --region us-east-1 \ + --output json +``` + +**Returns**: +```json +{ + "agentRuntimeId": "abc123def456", + "agentRuntimeName": "atx_ws_my_agent_02251430", + "agentRuntimeArn": "arn:aws:bedrock-agentcore:us-east-1:111122223333:agent-runtime/abc123def456", + "status": "READY", + "containerConfiguration": { + "containerUri": "111122223333.dkr.ecr.us-east-1.amazonaws.com/aws-transform-agents/my-agent:latest" + }, + "roleArn": "arn:aws:iam::111122223333:role/AgentCoreExecutionRole", + "networkConfiguration": { + "networkMode": "PUBLIC" + }, + "createdAt": "2026-02-25T14:30:00Z", + "updatedAt": "2026-02-25T14:32:15Z" +} +``` + +### List Agent Runtimes + +```bash +aws bedrock-agentcore-control list-agent-runtimes \ + --region us-east-1 \ + --output json +``` + +### Update Agent Runtime + +```bash +aws bedrock-agentcore-control update-agent-runtime \ + --agent-runtime-id abc123def456 \ + --agent-runtime-artifact '{ + "containerConfiguration": { + "containerUri": "111122223333.dkr.ecr.us-east-1.amazonaws.com/aws-transform-agents/my-agent:v2" + } + }' \ + --region us-east-1 +``` + +### Delete Agent Runtime + +```bash +aws bedrock-agentcore-control delete-agent-runtime 
\ + --agent-runtime-id abc123def456 \ + --region us-east-1 +``` + +--- + +## Section 6: Common Pipeline Issues + +### Issue 1: Docker Desktop Organization Authentication + +**Symptom**: +``` +Error response from daemon: Get https://111122223333.dkr.ecr.us-east-1.amazonaws.com/v2/: unauthorized +``` + +**Root Cause**: Docker Desktop on macOS requires organization authentication which may not be configured in CI/CD environments. + +**Solution**: Use `finch` instead of `docker`: + +```python +CONTAINER_CMD = "finch" +``` + +**Why It Works**: Finch is Amazon's open-source container runtime built on containerd and doesn't require Docker Desktop licensing or org authentication. + +--- + +### Issue 2: Runtime Name Cooldown Period + +**Symptom**: +``` +ConflictException: An error occurred (ConflictException) when calling the CreateAgentRuntime operation: +Runtime name 'atx_ws_my_agent' already exists or was recently deleted +``` + +**Root Cause**: Bedrock AgentCore has a cooldown period for runtime names. Even after deleting a runtime, the name cannot be immediately reused. + +**Solution**: Append a timestamp with **seconds precision** to runtime names: + +```python +from datetime import datetime + +# `name` is the agent name, e.g. "code-analysis-agent" +runtime_name = ( + f"atx_ws_{name.replace('-', '_')}_{datetime.now().strftime('%m%d%H%M%S')}" +) +``` + +**Example**: `atx_ws_code_analysis_agent_0225143007` (month, day, hour, minute, second) + +**Why Seconds Matter**: Using only `%m%d%H%M` (without seconds) still causes collisions during rapid testing cycles. With seconds included, two names collide only if they are generated within the same second. + +--- + +### Issue 3: Silent Publish Failures with Positional Arguments + +**Symptom**: `publish-agent-version` command returns success but version doesn't appear in registry. + +**Root Cause**: AWS CLI silently ignores malformed JSON when passed as positional arguments instead of named parameters. 
+ +**Bad**: +```bash +aws atxagentregistryexternal publish-agent-version \ + my-agent 1.0.0 '{"computeConfiguration": ...}' +``` + +**Good**: +```bash +aws atxagentregistryexternal publish-agent-version \ + --name my-agent \ + --version 1.0.0 \ + --configuration '{"computeConfiguration": ...}' +``` + +**Solution**: Always use `--cli-input-json` or named parameters: + +```python +run_json([ + "aws", + "atxagentregistryexternal", + "publish-agent-version", + "--name", name, + "--version", version, + "--configuration", json.dumps(configuration), + "--endpoint-url", endpoint, + "--region", region +]) +``` + +--- + +### Issue 4: Trust Policy Mismatch (Prod vs Gamma) + +**Symptom**: +``` +AccessDeniedException: Cross-account pass role is not allowed +``` + +**Root Cause**: AWSTransformAgentInvokeRole trust policy doesn't include the correct AWS Transform compute service principal for your environment. + +**Solution**: Trust policy must include the AWS Transform compute service principal: + +```yaml +AssumeRolePolicyDocument: + Version: "2012-10-17" + Statement: + - Effect: Allow + Principal: + Service: + - prod.us-east-1.compute.elastic-gumby.aws.internal + Action: sts:AssumeRole +``` + +**Verification**: +```bash +aws iam get-role --role-name AWSTransformAgentInvokeRole --query 'Role.AssumeRolePolicyDocument' +``` + +--- + +### Issue 5: Missing Bedrock and Agentic API Permissions in AgentCoreExecutionRole + +**Symptom**: +``` +AccessDeniedException: User: arn:aws:sts::111122223333:assumed-role/AgentCoreExecutionRole/... +is not authorized to perform: bedrock:InvokeModel / transform-agents:GetAgentInstance +``` + +**Root Cause**: AgentCoreExecutionRole missing Bedrock and AWS Transform Agentic API permissions. 
+ +**Solution**: Add required permissions to AgentCoreExecutionRole: + +```yaml +- Sid: BedrockInvoke + Effect: Allow + Action: + - bedrock:InvokeModel + - bedrock:InvokeModelWithResponseStream + # Converse and ConverseStream are authorized by the two actions above; + # `bedrock-runtime` is an endpoint name, not an IAM action prefix. + Resource: "*" +- Sid: TransformAgentsApiPolicy + Effect: Allow + Action: + - transform-agents:* + Resource: "*" +``` + +**Note**: Bedrock AgentCore needs broad access because: +- Agents may invoke different Bedrock models dynamically +- Agents need to call various AWS Transform Agentic API operations (GetAgentInstance, UpdateJobStatus, etc.) + +Using `Resource: "*"` is intentional and recommended. + +--- + +### Issue 6: Wrong Platform Architecture (x86_64 instead of arm64) + +**Symptom**: +``` +Container exited with code 1: exec /usr/local/bin/python: exec format error +``` + +**Root Cause**: Image was built for x86_64 but Bedrock AgentCore runtimes run on ARM64 (Graviton) instances. + +**Solution**: Pin the platform with `--platform=linux/arm64` in the Dockerfile `FROM` directive, and pass `--platform linux/arm64` to the build command: + +```dockerfile +FROM --platform=linux/arm64 public.ecr.aws/docker/library/python:3.11-slim +``` + +**Verification**: +```bash +finch inspect modernization-orchestrator:latest | grep Architecture +# Should output: "Architecture": "arm64" +``` + +**Build Command**: +```bash +finch build --platform linux/arm64 -f Dockerfile -t my-agent:latest . +``` + +--- + +### Issue 7: Bedrock AgentCore Runtime Stuck in CREATING State + +**Symptom**: Runtime status stays "CREATING" for >5 minutes, never reaches READY. + +**Root Cause**: Typically one of: +1. Image pull failure (ECR permissions) +2. Container health check failing +3. 
Container crashes immediately on startup + +**Solution**: Check Bedrock AgentCore runtime failure reasons: + +```bash +aws bedrock-agentcore-control get-agent-runtime \ + --agent-runtime-id abc123def456 \ + --region us-east-1 \ + --query 'statusReasons' +``` + +**Common Failure Reasons**: +- `IMAGE_PULL_FAILED`: AgentCoreExecutionRole lacks ECR permissions +- `CONTAINER_UNHEALTHY`: Health check endpoint not responding +- `CONTAINER_EXITED`: Application crashed (check CloudWatch logs) + +**Debugging**: +1. Verify ECR permissions in AgentCoreExecutionRole +2. Test health check locally: `curl http://localhost:8080/ping` +3. Check CloudWatch log group: `/aws/bedrock-agentcore/agent-runtime/` + +--- + +### Issue 8: Registry Endpoint Mismatch (Prod vs Gamma) + +**Symptom**: Agent registration succeeds but agent doesn't appear in AWS Transform UI. + +**Root Cause**: Registered to wrong registry endpoint. + +**Solution**: Verify registry endpoint is the prod endpoint: + +```json +{ + "atx_registry_endpoint": "https://iad.prod.agent-registry-external.elastic-gumby.ai.aws.dev" +} +``` + +**Verification**: +```bash +aws atxagentregistryexternal get-agent \ + --name my-agent \ + --endpoint-url https://iad.prod.agent-registry-external.elastic-gumby.ai.aws.dev \ + --region us-east-1 +``` + +--- + +## Section 7: End-to-End Example + +### Reference Implementation + +The complete working implementation is available in the AWS Transform modernization workshop demo project: + +**Location**: `/pipeline/` + +**Key Files**: +``` +pipeline/ +├── deploy_agents.py # Four-phase deployment automation +├── iam-roles.yaml # CloudFormation IAM role definitions +├── config.json # Agent configuration and AWS settings +└── README.md # Setup and usage instructions + +src/ +├── orchestrator/ +│ ├── Dockerfile # Orchestrator container definition +│ ├── app.py # Flask app with /invoke endpoint +│ ├── orchestrator.py # Multi-agent orchestration logic +│ └── requirements.txt # Python dependencies +└── 
subagents/ + ├── Dockerfile.analysis # Analysis agent container + ├── Dockerfile.transformation # Transformation agent container + ├── code_analysis_subagent.py + ├── code_transformation_subagent.py + └── requirements.txt + +``` + +### Running the Example + +1. **Clone the demo project**: + ```bash + cd + ``` + +2. **Configure AWS credentials**: + ```bash + export AWS_PROFILE=your-profile + export AWS_REGION=us-east-1 + ``` + +3. **Deploy IAM roles** (one-time setup): + ```bash + aws cloudformation deploy \ + --template-file pipeline/iam-roles.yaml \ + --stack-name aws-transform-agent-iam-roles \ + --capabilities CAPABILITY_NAMED_IAM \ + --region us-east-1 + ``` + +4. **Edit configuration**: + ```bash + # Edit pipeline/config.json with your account ID and registry endpoint + vi pipeline/config.json + ``` + +5. **Run full pipeline**: + ```bash + python pipeline/deploy_agents.py + ``` + + **Expected Output**: + ``` + ============================================================ + PHASE 1: BUILD + ============================================================ + Building code-analysis-agent from src/subagents/Dockerfile.analysis ... + ✓ Image code-analysis-agent:latest built successfully + ✓ Saved docker-images/code-analysis-agent.tar (156.3 MB) + + ============================================================ + PHASE 2: PUSH TO ECR + ============================================================ + Logging in to ECR... + ✓ ECR login successful + Ensuring ECR repo aws-transform-agents/code-analysis-agent exists... + ✓ Created aws-transform-agents/code-analysis-agent + ✓ Image aws-transform-agents/code-analysis-agent:latest verified in ECR + + ============================================================ + PHASE 3: DEPLOY TO AGENTCORE + ============================================================ + Creating Bedrock AgentCore runtime for code-analysis-agent ... + ✓ Created runtime ID: abc123def456 + Polling runtime status for code-analysis-agent (timeout 120s)... 
+ [ 10s] code-analysis-agent status: CREATING + [ 20s] code-analysis-agent status: CREATING + [ 30s] code-analysis-agent status: READY + ✓ code-analysis-agent is READY (ARN: arn:aws:bedrock-agentcore:...) + + ============================================================ + PHASE 4: REGISTER WITH AWS TRANSFORM + ============================================================ + Registering code-analysis-agent with AWS Transform registry... + ✓ Registered code-analysis-agent + ✓ Published version 1.0.0 for code-analysis-agent + ✓ Access granted for code-analysis-agent to account 111122223333 + + ============================================================ + DEPLOYMENT COMPLETE + ============================================================ + code-analysis-agent + Runtime ID: abc123def456 + Runtime ARN: arn:aws:bedrock-agentcore:us-east-1:111122223333:agent-runtime/abc123def456 + + All 3 agents deployed and registered successfully. + ``` + +6. **Iterate on deployment** (skip build/push to save time): + ```bash + python pipeline/deploy_agents.py --skip-build --skip-push + ``` + +### Verification Steps + +1. **Verify Bedrock AgentCore runtimes**: + ```bash + aws bedrock-agentcore-control list-agent-runtimes --region us-east-1 + ``` + +2. **Verify registry entries**: + ```bash + aws atxagentregistryexternal get-agent \ + --name code-analysis-agent \ + --endpoint-url https://iad.prod.agent-registry-external.elastic-gumby.ai.aws.dev \ + --region us-east-1 + ``` + +3. **Test agent invocation** (via AWS Transform): + ```bash + # This requires AWS Transform access + curl -X POST https://your-atx-instance/api/v1/invoke \ + -H "Content-Type: application/json" \ + -d '{ + "agentName": "code-analysis-agent", + "agentVersion": "1.0.0", + "payload": {"request": "analyze this code"} + }' + ``` + +--- + +## Summary + +This guide covers the complete AWS Transform agent deployment lifecycle: + +1. 
**IAM Setup**: Two roles (AgentCoreExecutionRole, AWSTransformAgentInvokeRole) with precise trust policies +2. **Build Pipeline**: Four-phase automation (Build → Push → Deploy → Register) with error handling +3. **Docker Best Practices**: ARM64 platform, SDK installation from PyPI, MCP wrapper creation +4. **Bedrock AgentCore Operations**: Create, poll, verify runtimes with proper status checking +5. **Common Issues**: Solutions for Docker auth, runtime cooldown, trust policies, platform mismatches + +**Key Takeaways**: +- Prefer `finch` over `docker` on macOS/Linux (finch is not available on Windows; use Docker Desktop there) +- Include seconds in runtime name timestamps +- Specify `--platform linux/arm64` for all builds +- Poll Bedrock AgentCore status until READY/ACTIVE before registration +- Use `--cli-input-json` or named parameters to avoid silent failures +- Trust the prod principal in AWSTransformAgentInvokeRole + +**Reference Implementation**: `/pipeline/` + +## Alternative: Using MCP Deployment Tools (Recommended) + +Instead of manual shell scripts, you can deploy agents directly from Kiro using the new MCP deployment tools: + +```python +# Full pipeline (build → push → deploy → register) +deploy_agent_full_pipeline( + agent_path="./agents/modernization", + agent_name="modernization-orchestrator", + agent_version="1.0.0" +) +``` + +**Advantages of MCP Tools:** +- **Cross-platform**: Works on Windows (uses CodeBuild), macOS (uses finch), and Linux (uses docker) +- **Auto-detection**: Automatically detects best container runtime and IAM roles +- **Error handling**: Returns structured errors with helpful hints +- **No manual scripts**: No need to maintain separate shell scripts for each phase +- **Conversational**: Deploy agents through natural conversation with Kiro + +**See**: [Deploy Agent Workflow Guide](deploy-agent-workflow.md) for detailed instructions on using MCP deployment tools. 
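
If you stay with the manual pipeline instead, the "poll until READY/ACTIVE before registration" takeaway can be sketched roughly as below. This is a minimal sketch, not the code in `deploy_agents.py`: `wait_until_ready`, `get_status`, and `READY_STATES` are hypothetical names, the 120-second timeout and 10-second interval simply mirror the example output shown earlier, and treating any status ending in `FAILED` as fatal is an assumption.

```python
import time
from typing import Callable

# READY/ACTIVE come from the takeaways above. Treating any status that ends
# in "FAILED" as fatal is an assumption of this sketch.
READY_STATES = {"READY", "ACTIVE"}


def wait_until_ready(
    get_status: Callable[[], str],
    timeout_s: float = 120.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it reports a ready state or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in READY_STATES:
            return status
        if status.endswith("FAILED"):
            raise RuntimeError(f"runtime entered failure state: {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"runtime still {status} after {timeout_s}s")
        sleep(interval_s)
```

In a real pipeline, `get_status` would presumably wrap the `get-agent-runtime` call from Section 5 and return the `status` field of its JSON output. Injecting it as a callable keeps the loop testable without AWS access.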
+ +For additional patterns, see: +- [Orchestrator Patterns](orchestrator-patterns.md) +- [Subagent Patterns](subagent-patterns.md) +- [Agent Registration](agent-registration.md) +- **[Deploy Agent Workflow](deploy-agent-workflow.md)** - Recommended deployment method using MCP tools diff --git a/aws-transform-agent-toolkit/steering/dockerfile-orchestrator.md b/aws-transform-agent-toolkit/steering/dockerfile-orchestrator.md new file mode 100644 index 0000000..f68b3a3 --- /dev/null +++ b/aws-transform-agent-toolkit/steering/dockerfile-orchestrator.md @@ -0,0 +1,59 @@ +# Orchestrator Dockerfile Template + +**MANDATORY** — use this Dockerfile verbatim when scaffolding an orchestrator agent. Do NOT generate a Dockerfile from scratch. A minimal Dockerfile will fail on first invocation with two separate bugs: + +1. **Missing botocore service models** — agent init fails with `Unknown service: 'transformagenticservice'`. The SDK registers these models on the host during install, but they must also be registered inside the container. + +2. **Missing MCP server shim** — `AgentRuntimeServer` spawns the Agentic MCP server from the hardcoded path `/home/amazon/AgentBuilderAgenticMCP/bin/agent-builder-agentic-mcp`. Without the shim, the runtime fails with `FileNotFoundError` and the agent is stuck in STARTING. + +Adapt the `COPY src/orchestrator/ .` line and `ENTRYPOINT` to match your source layout. Everything else must remain as-is. + +**Note on the base image**: `public.ecr.aws/docker/library/python` is the AWS-operated public mirror of Docker Hub's official Python image — equivalent bits, no AWS account required to pull. 
+ +```dockerfile +FROM --platform=linux/arm64 public.ecr.aws/docker/library/python:3.11-slim + +WORKDIR /app + +# Install system dependencies +RUN apt-get update && \ + apt-get install -y --no-install-recommends curl && \ + rm -rf /var/lib/apt/lists/* + +# Install AWS Transform SDK from PyPI +RUN pip install --no-cache-dir \ + agent-builder-sdk-aws-transform \ + agent-builder-agentic-mcp-aws-transform \ + agent-builder-types-aws-transform \ + agent-builder-mcp-client-aws-transform + +# Register botocore service models (REQUIRED for Agentic API and Agent Registry API) +RUN pip install --no-cache-dir awscli && \ + SDK_MODELS=$(python -c "from importlib.resources import files; print(files('agent_builder_sdk').joinpath('botocore_models'))") && \ + aws configure add-model --service-name atxagentregistryexternal \ + --service-model "file://${SDK_MODELS}/atxagentregistryexternal/2022-07-26/service-2.json" && \ + aws configure add-model --service-name transformagenticservice \ + --service-model "file://${SDK_MODELS}/transformagenticservice/2018-05-10/service-2.json" + +# Install remaining Python dependencies +COPY requirements.txt . +RUN pip install --no-cache-dir -r requirements.txt + +# Create MCP server wrapper binary +RUN mkdir -p /home/amazon/AgentBuilderAgenticMCP/bin && \ + printf '#!/bin/bash\npython -m agent_builder_agentic_mcp "$@"\n' > /home/amazon/AgentBuilderAgenticMCP/bin/agent-builder-agentic-mcp && \ + chmod +x /home/amazon/AgentBuilderAgenticMCP/bin/agent-builder-agentic-mcp + +# Copy orchestrator source +COPY src/orchestrator/ . 
+ +# Create storage directories +RUN mkdir -p /tmp/agent_queue /tmp/orchestrator_agent + +EXPOSE 8080 + +HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \ + CMD curl -f http://localhost:8080/ping || exit 1 + +ENTRYPOINT ["python", "app.py"] +``` diff --git a/aws-transform-agent-toolkit/steering/dockerfile-subagent.md b/aws-transform-agent-toolkit/steering/dockerfile-subagent.md new file mode 100644 index 0000000..8f61afe --- /dev/null +++ b/aws-transform-agent-toolkit/steering/dockerfile-subagent.md @@ -0,0 +1,52 @@ +# Subagent Dockerfile Template + +**MANDATORY** — use this Dockerfile verbatim when scaffolding a subagent. Do NOT generate a Dockerfile from scratch. A minimal Dockerfile will fail on first invocation with two separate bugs: missing botocore service models (agent init fails with `Unknown service: 'transformagenticservice'`) and a missing MCP server shim at `/home/amazon/AgentBuilderAgenticMCP/bin/agent-builder-agentic-mcp`. + +Adapt the `COPY` source lines and the `ENTRYPOINT` to match your source layout. Everything else must remain as-is. + +**Note on the base image**: `public.ecr.aws/docker/library/python` is the AWS-operated public mirror of Docker Hub's official Python image — equivalent bits, no AWS account required to pull. 
+ +```dockerfile +FROM --platform=linux/arm64 public.ecr.aws/docker/library/python:3.11-slim + +WORKDIR /app + +# Install system dependencies +RUN apt-get update && \ + apt-get install -y --no-install-recommends curl && \ + rm -rf /var/lib/apt/lists/* + +# Install AWS Transform SDK from PyPI +RUN pip install --no-cache-dir \ + agent-builder-sdk-aws-transform \ + agent-builder-agentic-mcp-aws-transform \ + agent-builder-types-aws-transform \ + agent-builder-mcp-client-aws-transform + +# Register botocore service models (REQUIRED for Agentic API and Agent Registry API) +RUN pip install --no-cache-dir awscli && \ + SDK_MODELS=$(python -c "from importlib.resources import files; print(files('agent_builder_sdk').joinpath('botocore_models'))") && \ + aws configure add-model --service-name atxagentregistryexternal \ + --service-model "file://${SDK_MODELS}/atxagentregistryexternal/2022-07-26/service-2.json" && \ + aws configure add-model --service-name transformagenticservice \ + --service-model "file://${SDK_MODELS}/transformagenticservice/2018-05-10/service-2.json" + +# Install remaining Python dependencies +COPY requirements.txt . +RUN pip install --no-cache-dir -r requirements.txt + +# Create MCP server wrapper binary +RUN mkdir -p /home/amazon/AgentBuilderAgenticMCP/bin && \ + printf '#!/bin/bash\npython -m agent_builder_agentic_mcp "$@"\n' > /home/amazon/AgentBuilderAgenticMCP/bin/agent-builder-agentic-mcp && \ + chmod +x /home/amazon/AgentBuilderAgenticMCP/bin/agent-builder-agentic-mcp + +# Copy subagent source +COPY src/subagent/ . 
+ +EXPOSE 8080 + +HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \ + CMD curl -f http://localhost:8080/ping || exit 1 + +ENTRYPOINT ["python", "app.py"] +``` diff --git a/aws-transform-agent-toolkit/steering/getting-started.md b/aws-transform-agent-toolkit/steering/getting-started.md new file mode 100644 index 0000000..b33a3bf --- /dev/null +++ b/aws-transform-agent-toolkit/steering/getting-started.md @@ -0,0 +1,157 @@ +--- +inclusion: auto +name: getting-started +description: "Getting started with AWS Transform or building your first agent" +--- + +# Getting Started with AWS Transform Agent Development + +This guide helps you build your first AWS Transform agent. + +## Prerequisites + +Before starting, ensure you've completed the onboarding steps: + +1. **Tools validated**: Python 3.11+, AWS CLI, finch/Docker +2. **SDK installed**: `pip install agent-builder-sdk-aws-transform` +3. **Hooks added**: Workspace validation hooks set up + +**If you haven't done this yet**: The AWS Transform Agent Toolkit onboarding walks you through all installation steps. Kiro will guide you through tool validation, SDK installation, and hook setup when you first activate the power. + +## What You Can Build + +With AWS Transform, you create two types of agents: + +- **Orchestrator Agents**: Coordinate complex workflows and manage multiple subagents +- **Subagents**: Handle specific, focused tasks within a workflow + +## How to Use This Power + +### Search → Read → Generate Workflow + +MCP search tools are a **discovery layer** — they find what exists and where. +For code generation, always follow this workflow: + +1. **Discover**: `keyword_search("BaseOrchestrator")` → find the class, get the `file` field from results +2. **Find** the installed package location: + ```bash + python3 -c "import agent_builder_sdk; print(agent_builder_sdk.__file__)" + ``` +3. 
**Grep** for the class or function: + ```bash + grep -r "class BaseOrchestrator" $(python3 -c "import agent_builder_sdk, os; print(os.path.dirname(agent_builder_sdk.__file__))") + ``` +4. **Read** the matched file for full signatures and docstrings +5. **Generate** code using the complete source — not the truncated preview + +**NEVER generate code from search result snippets alone** — they are truncated previews. +Key classes like AsyncBaseOrchestrator and AsyncBaseSubagent are heavily truncated in +search results. Always read the full source via grep. + +### Search AWS Transform Documentation + +Ask Kiro to search the indexed documentation: +- "How do I create an orchestrator agent?" +- "What's the difference between orchestrator and subagent?" +- "How do I invoke a subagent?" + +### Generate Agent Code + +Ask Kiro to generate code from descriptions: +- "Create an orchestrator agent called CustomerSupportAgent that handles support tickets" +- "Create a subagent that analyzes code quality" + +### Get Guidance on Specific Topics + +Reference other steering files for detailed patterns: +- Creating orchestrators → See orchestrator-patterns.md +- Building subagents → See subagent-patterns.md +- Working with APIs → See api-reference.md +- Deploying agents → See deployment-pipeline-guide.md + +## Key Concepts + +### Agent Types + +| Type | Purpose | Base Class | +|------|---------|------------| +| **Orchestrator** | Coordinate workflows, manage subagents | `AsyncBaseOrchestrator` | +| **Subagent** | Handle specific focused tasks | `AsyncBaseSubagent` | + +### Architecture Overview + +``` +┌─────────────────────────────────────────────────────────┐ +│ AWS Transform │ +└─────────────────────────────────────────────────────────┘ + │ + ▼ + ┌──────────────────────┐ + │ Orchestrator Agent │ + │ (Your Main Logic) │ + └──────────────────────┘ + │ │ + ┌────────┴───────┴────────┐ + ▼ ▼ + ┌──────────────┐ ┌──────────────┐ + │ Subagent A │ │ Subagent B │ + │ (Specialized)│ │ 
(Specialized)│ + └──────────────┘ └──────────────┘ +``` + +- **Orchestrator**: Receives requests from AWS Transform, coordinates workflow +- **Subagents**: Perform specialized tasks (code analysis, data processing, etc.) +- **Communication**: Via Agentic API (InvokeAgent operation) + +## Quick Start Examples + +### Example 1: Simple Orchestrator + +Ask Kiro: +``` +"Create an orchestrator that receives a customer question and returns an answer" +``` + +Kiro will generate: +- Flask app with `/invoke` endpoint +- AsyncBaseOrchestrator subclass +- Request/response handling +- Error handling patterns + +### Example 2: Multi-Agent System + +Ask Kiro: +``` +"Create a code modernization orchestrator with two subagents: +one for code analysis and one for generating recommendations" +``` + +Kiro will generate: +- Orchestrator that coordinates both subagents +- Subagent invocation using Agentic API +- Job status polling +- Result aggregation + +## Next Steps + +1. **Learn patterns**: Review orchestrator-patterns.md or subagent-patterns.md +2. **Generate code**: Ask Kiro to create your first agent +3. **Test locally**: Run Flask app and test `/invoke` endpoint +4. **Deploy**: Follow deployment-pipeline-guide.md to deploy to Bedrock AgentCore + +## Common Questions + +**Q: Where do I start?** +A: Ask Kiro "Create an orchestrator called X that does Y" - it will generate a working scaffold. + +**Q: How do I test my agent locally?** +A: Run the Flask app and POST to `http://localhost:8080/invoke` with test payloads. + +**Q: How do I invoke a subagent from my orchestrator?** +A: Use the Agentic API's InvokeAgent operation. See orchestrator-patterns.md for examples. + +**Q: What APIs are available?** +A: See api-reference.md for complete API documentation. + +**Q: How do I deploy my agent?** +A: See deployment-pipeline-guide.md for Docker, ECR, and Bedrock AgentCore deployment steps. 
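
For the local-testing question above, a minimal sketch of posting a test payload to `/invoke` using only the standard library might look like this. The payload shape is an assumption borrowed from the deployment guide's curl example, not the SDK's actual request schema, and `build_invoke_request` and `send` are hypothetical helper names.

```python
import json
import urllib.request


def build_invoke_request(
    payload: dict, base_url: str = "http://localhost:8080"
) -> urllib.request.Request:
    """Build a JSON POST request for the local /invoke endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/invoke",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout_s: float = 30.0) -> str:
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.read().decode("utf-8")
```

With the Flask app running locally, `send(build_invoke_request({"request": "analyze this code"}))` should return whatever body the agent produces. Adjust the payload to match your agent's request model.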
diff --git a/aws-transform-agent-toolkit/steering/hitl.md b/aws-transform-agent-toolkit/steering/hitl.md new file mode 100644 index 0000000..f609f54 --- /dev/null +++ b/aws-transform-agent-toolkit/steering/hitl.md @@ -0,0 +1,52 @@ +--- +inclusion: auto +name: hitl +description: "HITL human-in-the-loop UI, domTreeJson, DynamicHITLRenderEngine, AutoForm, HitlClient SDK, HITL task lifecycle" +--- + +# HITL (Human-in-the-Loop) + +## Critical Rules + +1. **NEVER generate submit/cancel/save/OK/close buttons** -- AWS Transform handles submission automatically +2. **Only 4 components capture input** -- Input, Textarea, RadioGroup, FileUpload. All require `fieldId`. Select, Multiselect, Checkbox, DatePicker, TimeInput render but silently lose data. Use AutoForm (`uxComponentId: "AutoForm"`) if you need captured select/checkbox fields. +3. **NEVER use Container** -- banned by ESLint. Use SpaceBetween + Header instead. +4. **NEVER put raw text in layout children** -- SpaceBetween, ColumnLayout, Grid, Cards, Tabs, Form only accept component objects; wrap text in Box or TextContent +5. **Table MUST have `variant: "borderless"`** -- default "container" variant is rejected by ESLint +6. **Header variant: h1, h2, h3 only** -- h4-h6 not supported in schema +7. **Wrap artifacts correctly** -- DynamicHITLRenderEngine: `{"properties": {"domTreeJson": {...}}}`. Other components (AutoForm, TextInput, etc.): `{"properties": {...}}` without domTreeJson. Or use `serialize()` from the SDK. +8. **ALWAYS use `"type"` field** in component JSON -- never `"component"` or `"component_type"` + +## JSON Generation Mode + +**Before generating any HITL UI JSON, you MUST complete these steps in order:** + +1. **Check render engine limitations** — call `search_by_source("input capture supported", "hitl-render-limitations")` to understand which components capture input vs silently discard data. +2. **Call `get_hitl_generation_prompt()`** to load the full generation rules and component schema. +3. 
**Only then generate the JSON** — pure JSON only, start with `{`, end with `}`, no markdown wrapping, no explanations before or after the JSON. + +Skipping step 1 risks generating forms with components that render but silently discard user input (e.g., Select, Checkbox, DatePicker). + +## SDK Integration + +To add HITL to an agent, refactor HITL code, or integrate with the task lifecycle, search the KB — do not answer from memory: + +- **Quick start pattern**: `keyword_search("HitlClient upload_artifact create_and_start_task")` +- **Python SDK methods**: `search_by_source("HitlClient", "hitl-sdk-python")` +- **Custom UIs (domTreeJson)**: `keyword_search("DynamicHITLRenderEngine domTreeJson")` + +Always recommend the SDK over raw API calls. + +## Deeper Topics (search the KB) + +| Question | Search query | +| -------------------------- | -------------------------------------------------------------------------------- | +| Refresh loops | `keyword_search("execute_with_refresh refresh loop")` | +| Custom UI components | `search_by_source("custom UI component wrap", "hitl-custom-components")` | +| Ready-to-use templates | `search_by_source("pattern template", "hitl-common-patterns")` | +| Validation rules | `search_by_source("validation common errors", "hitl-validation")` | +| System architecture | `search_by_source("three participants lifecycle", "hitl-architecture")` | +| Java SDK | `search_by_source("HitlClient Java", "hitl-sdk-java")` | +| Dashboard (read-only) | `keyword_search("dashboard HitlTaskType DASHBOARD read-only")` | +| Blocking vs non-blocking | `keyword_search("blocking non-blocking HITL task")` | +| CRITICAL severity approval | `keyword_search("CRITICAL severity approval workflow")` | diff --git a/aws-transform-agent-toolkit/steering/orchestrator-patterns.md b/aws-transform-agent-toolkit/steering/orchestrator-patterns.md new file mode 100644 index 0000000..a357931 --- /dev/null +++ b/aws-transform-agent-toolkit/steering/orchestrator-patterns.md @@ -0,0 
+1,372 @@ +--- +inclusion: auto +name: orchestrator-patterns +description: "Guidance for creating or modifying orchestrator agents" +--- + +# Building Orchestrator Agents + +## What is an Orchestrator Agent? + +An orchestrator coordinates multiple subagents to accomplish complex workflows. It creates job plans visible in the AWS Transform webapp, executes steps in background threads (no timeout ceiling), sends progress messages via ATX_CHAT, and handles ad-hoc queries during execution. Orchestrators follow a 3-phase workflow: Negotiate, Confirm, Execute. + +## How to Build an Orchestrator + +### Step 1: Create Your Orchestrator Class + +```python +# In your agent package (e.g., MyCustomAgent) +from agent_builder_sdk.orchestrator_strands.base_orchestrator import AsyncBaseOrchestrator + + +class MyCustomOrchestrator(AsyncBaseOrchestrator): + """Your custom orchestrator implementation.""" + + def __init__(self, **kwargs): + super().__init__( + system_prompt="You are a specialized orchestrator for...", + **kwargs + ) + # Add your custom tools, hooks, conversation implementation +``` + +**Key Points**: +- Extend `AsyncBaseOrchestrator` (not the synchronous version) +- Provide a clear `system_prompt` that defines the agent's role +- Use `**kwargs` to pass through configuration options + +### Step 2: Configure System Prompt + +The system prompt defines your agent's behavior and capabilities: + +```python +system_prompt = """You are a specialized orchestrator for AWS infrastructure transformation. 
+ +Your responsibilities: +- Analyze customer infrastructure requirements +- Create detailed transformation plans +- Coordinate with subagents for specific tasks +- Provide progress updates to customers + +Available capabilities: +- Access to AWS Transform APIs via MCP +- Memory management for conversation context +- Job plan creation and management +""" +``` + +**Best Practices**: +- Be specific about the agent's domain and responsibilities +- List available capabilities and tools +- Define expected behavior and constraints + +### Step 3: Create Custom Tools (Optional) + +Define domain-specific tools using Strands decorators: + +```python +# custom_tools.py +from strands.tools import tool + + +@tool +def analyze_infrastructure(config: str) -> dict: + """Analyze infrastructure configuration and return recommendations. + + Args: + config: Infrastructure configuration in JSON or YAML format + + Returns: + Dictionary with analysis results and recommendations + """ + # Your analysis logic here + return { + "status": "analyzed", + "recommendations": ["Use t3.medium instances", "Enable auto-scaling"] + } + + +@tool +def create_migration_plan(source: str, target: str) -> str: + """Create a migration plan from source to target infrastructure. + + Args: + source: Source infrastructure identifier + target: Target infrastructure identifier + + Returns: + Migration plan as formatted string + """ + return f"Migration plan from {source} to {target}:\n1. Assess current state\n2. Plan migration\n3. 
Execute migration" +``` + +**Tool Guidelines**: +- Use clear, descriptive function names +- Include comprehensive docstrings (LLM reads these) +- Specify parameter types and return types +- Keep tools focused on single responsibilities + +### Step 4: Create Your Entry Point + +#### Option A: Using AgentRuntimeServer (Recommended) + +Use the simplified `AgentRuntimeServer` with a custom agent factory: + +```python +# my_agent_cli.py +import argparse +import logging +from agent_builder_sdk.server.agent_runtime_server import AgentRuntimeServer +from agent_builder_sdk.agent_factory import create_default_orchestrator +from agent_builder_sdk.utils import get_prompt_with_name + +# Configure logging +logging.basicConfig( + level=logging.INFO, + format="%(asctime)s - %(levelname)s - %(message)s", +) +logger = logging.getLogger(__name__) + + +def create_parser() -> argparse.ArgumentParser: + """Create command line argument parser.""" + parser = argparse.ArgumentParser(description="Run Agent Runtime Server") + parser.add_argument("--host", default="0.0.0.0", help="Host to bind server to") + parser.add_argument("--port", type=int, default=8080, help="Port to bind server to") + parser.add_argument( + "--storage-dir", + default="/tmp/orchestrator_agent", + help="Storage directory for agent data (queue, responses, checkpoints)" + ) + parser.add_argument( + "--binary-location", + default="/home/amazon/AgentBuilderAgenticMCP/bin/agent-builder-agentic-mcp", + help="Path to the agentic MCP server binary", + ) + return parser + + +def main(): + """Main entry point.""" + parser = create_parser() + args = parser.parse_args() + + # Create agent factory with default configuration + def agent_factory(mcp_client, storage_dir=None): + return create_default_orchestrator( + mcp_client=mcp_client, + storage_dir=storage_dir, + system_prompt=get_prompt_with_name("test_orchestrator_prompt"), + model_id="us.anthropic.claude-sonnet-4-5-20250929-v1:0", + ) + + logger.info("Starting Agent Runtime 
Server...") + server = AgentRuntimeServer( + agent_factory=agent_factory, + host=args.host, + port=args.port, + binary_location=args.binary_location, + storage_dir=args.storage_dir, + delayed_timeout=3600, + ) + + # This will set up everything and run the server + server.start() + + +if __name__ == "__main__": + main() +``` + +**Key Features**: +- **Simplified interface**: Single unified server +- **Agent factory pattern**: Pluggable agent creation +- **Compatible protocols**: Supports both Bedrock AgentCore and AWS Transform compute service endpoints +- **Automatic handling**: JSON-RPC 2.0 protocol, context initialization, session management + +#### Option B: Custom Agent Factory + +For more control, create a custom agent factory: + +```python +from my_custom_agent.custom_tools import analyze_infrastructure, create_migration_plan + + +def agent_factory(mcp_client, storage_dir=None): + """Create custom orchestrator with specific tools and configuration.""" + from agent_builder_sdk.orchestrator_strands.base_orchestrator import AsyncBaseOrchestrator + + # Create orchestrator with custom tools + orchestrator = AsyncBaseOrchestrator( + system_prompt="You are a specialized AWS infrastructure transformation orchestrator...", + mcp_clients=[mcp_client] if mcp_client is not None else None, + region_name="us-east-1", + model_id="us.anthropic.claude-sonnet-4-5-20250929-v1:0", + custom_tools=[analyze_infrastructure, create_migration_plan], + ) + + return orchestrator +``` + +### Step 5: Local Testing + +Set up environment variables and test your agent: + +```bash +# Required environment variables +export WORKSPACE_ID=your-workspace-id +export JOB_ID=your-job-id +export AGENT_INSTANCE_ID=your-agent-instance-id +export AUTHORIZATION_TOKEN=your-token +export QT_AGENTIC_API_ENDPOINT=https://iad.prod.agenticapi.elastic-gumby.ai.aws.dev +export AWS_REGION=us-east-1 + +# Console mode: Talk directly to the agent +python src/my_custom_agent/my_agent_cli.py \ + --storage-dir . 
\ + --binary-location agent-builder-agentic-mcp + +# Server mode: Start local server for API testing +python src/my_custom_agent/my_agent_cli.py \ + --storage-dir . \ + --binary-location agent-builder-agentic-mcp \ + --queue-storage-path . +``` + +## Project Structure + +``` +my-orchestrator/ +├── __init__.py +├── orchestrator_cli.py # Entry point (argparse + AgentRuntimeServer) +├── orchestrator.py # Main class (all tools defined here) +├── agent_client.py # API client (AgenticApiHelper subclass) +├── tools/ +│ └── orchestrator_tools.py # CUSTOMIZE: discover_subagents tool +├── prompts/ +│ └── orchestrator_prompt.md # CUSTOMIZE: 3-phase workflow prompt +├── requirements.txt +├── Dockerfile +└── .bedrock_agentcore.yaml +``` + +## Dockerfile (REQUIRED — do NOT scaffold from scratch) + +**You MUST use the canonical Dockerfile template from [dockerfile-orchestrator.md](./dockerfile-orchestrator.md).** Copy its contents verbatim as your `Dockerfile`. Adapt the `COPY src/orchestrator/ .` line and `ENTRYPOINT` to match your source layout. + +A minimal Dockerfile will fail on first invocation with two separate bugs that are extremely hard to debug: + +1. **Missing botocore service models** → `Unknown service: 'transformagenticservice'` at agent init. Job stuck in STARTING. +2. **Missing MCP server shim** → `FileNotFoundError: '/home/amazon/AgentBuilderAgenticMCP/bin/agent-builder-agentic-mcp'`. Job stuck in STARTING. + +Both require a full rebuild → new runtime → new version → re-register cycle to fix. The template already handles both correctly. + +## Architecture Decisions (must-know) + +### AgentRuntimeServer with delayed_timeout=3600 + +Orchestrators MUST use `AgentRuntimeServer` (NOT `StatelessAgentRuntimeServer`) with `delayed_timeout=3600`. The stateless server has a hard 28s asyncio timeout that kills long-running subagent coordination. 
`AgentRuntimeServer` has queue support built-in — it acks after 28s and continues processing in the background for up to `delayed_timeout` seconds. + +```python +from agent_builder_sdk import AgentRuntimeServer + +server = AgentRuntimeServer(agent_factory=agent_factory, delayed_timeout=3600) +server.start() +``` + +> **WARNING:** Do NOT pass `queue=True` — that parameter does not exist and will crash with `TypeError: __init__() got an unexpected keyword argument 'queue'`. Queue behavior is always enabled internally. + +### mcp_clients Must Be Plural (List) + +```python +# WRONG — will crash +MyOrchestrator(mcp_client=mcp_client) +# CORRECT — wrap in list, or None if A2A-only +MyOrchestrator(mcp_clients=[mcp_client] if mcp_client is not None else None) +``` + +### Background Execution via Daemon Thread + +Running the full workflow synchronously can exceed `delayed_timeout`. The solution: `execute_plan()` spawns a `threading.Thread(daemon=True)` and returns immediately. The thread executes each step, polls with no timeout ceiling (exponential backoff), updates step statuses, and sends progress via ATX_CHAT. Under CPython's GIL, a single dict read or write is atomic, so per-key updates to the shared `_execution_state` need no lock; compound read-modify-write sequences still would. + +``` +User Chat --> LLM (negotiate plan) --> create_job_plan --> user confirms + | + execute_plan() + | + returns immediately: "Plan started" + | + +-----------v-----------+ + | Background Thread | + | for step in plan: | + | invoke subagent | + | send message | + | poll (no timeout) | + | update step status | + | send ATX_CHAT msg | + +-----------------------+ +``` + +### execution_groups: Parallel + Sequential + +Use `execution_groups` (list of dicts) for grouped parallel/sequential execution. Each dict in the list is a group — steps within the same dict run in parallel, groups run sequentially. 
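The grouping semantics described above can be sketched in plain Python. This is only an illustration of "groups run sequentially, steps inside a group run in parallel"; `run_execution_groups` and `invoke_step` are hypothetical names introduced here, not SDK functions, and the SDK's actual scheduler may differ.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_execution_groups(
    execution_groups: list[dict[str, str]],
    invoke_step: Callable[[str, str], str],
) -> list[dict[str, str]]:
    """Run groups one after another; run the steps of each group concurrently.

    `invoke_step(step_label, agent_name)` is a hypothetical callable standing in
    for whatever actually invokes a subagent.
    """
    results: list[dict[str, str]] = []
    for group in execution_groups:
        if not group:
            # ThreadPoolExecutor rejects max_workers=0, so skip empty groups.
            results.append({})
            continue
        with ThreadPoolExecutor(max_workers=len(group)) as pool:
            futures = {
                step: pool.submit(invoke_step, step, agent)
                for step, agent in group.items()
            }
            # Collecting every result here blocks until the whole group is done,
            # which is what makes the next group start strictly afterwards.
            results.append({step: fut.result() for step, fut in futures.items()})
    return results
```

Collapsing the list into one flat dict would remove the group boundaries this loop relies on, so every step would either run in one parallel batch or need a separate ordering rule.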
+ +- Sequential: `[{"step-a": "agent-a"}, {"step-b": "agent-b"}]` +- Parallel: `[{"step-a": "agent-a", "step-b": "agent-b"}]` +- Mixed: `[{"step-a": "agent-a"}, {"step-b": "agent-b", "step-c": "agent-c"}]` + +**NOTE:** Use `execution_groups` (list of dicts), NOT `step_agent_mapping` (flat dict). Do NOT simplify to a flat dict — that loses the parallel execution capability. + +### Fire-and-Forget + Polling (A2A Communication) + +The A2A `SendMessage` API has a ~25s internal timeout. If the subagent takes longer, the API returns error code `-32603` with HTTP 200. This is expected — the subagent is still processing. The pattern: send the message (fire-and-forget), immediately poll `get_agent_instance` until COMPLETED, then extract the response from `agentOutput.serializedPayload`. + +### stepId vs stepLabel + +`PutJobPlan` assigns its own step IDs. The `stepLabel` you send (e.g., `"analysis"`) is NOT the `stepId` the API uses. After calling `put_job_plan`, immediately call `list_job_plan_steps()` to get the real `stepId` values. Build a `stepLabel → stepId` mapping dict and use it for all subsequent `UpdateJobPlanStep` calls. + +### Subagents Are Single-Use + +Each subagent instance processes exactly one message then sets COMPLETED. Track completed instances via a `_completed_instances` set and block re-sends with a clear error. + +### ATX_CHAT Messaging + +`send_chat_message` sends progress messages to the webapp chat using `agent_instance_id="ATX_CHAT"` (special target, not a real instance). The required A2A format with `extensions` and `metadata` for `userSelection: "jobCreator"` is undocumented. 
The working format: + +```python +message = { + "role": "agent", + "parts": [{"type": "TextPart", "text": progress_text}], + "extensions": [json.dumps({"userSelection": "jobCreator"})], + "metadata": {} +} +self.send_message(agent_instance_id="ATX_CHAT", params={"message": message}) +``` + +### Cross-Region Inference Profile Required + +Use `model_id="us.anthropic.claude-sonnet-4-5-20250929-v1:0"` — bare model IDs fail with `ValidationException`. + +## Common Errors Reference + +| Error | Cause | Fix | +|-------|-------|-----| +| `Package requires Python >=3.11` | Wrong Python in Dockerfile | Use `python:3.11-slim` base image | +| `ModuleNotFoundError: mypy_boto3_transformagenticservice` | Missing type stubs | `pip install agent-builder-types-aws-transform` | +| `Agent.__init__() got unexpected keyword argument 'mcp_client'` | Singular `mcp_client` | Use `mcp_clients=[client]` (plural, list) | +| `Missing required parameter: requestContext` | Using raw boto3 | Extend `AgenticApiHelper` | +| `SendMessage returned error code=-32603` | Normal — subagent took >25s | Expected; poll `get_agent_instance` | +| `ResourceNotFoundException` on `update_job_plan_step` | Using `stepLabel` not `stepId` | Use `stepId` from `list_job_plan_steps()` | +| `I was not able to generate a response on time` | Missing `delayed_timeout` | Set `delayed_timeout=3600` | +| `Distribution not found at: file:///app/...` | Deps not copied before pip | Fix Dockerfile copy order | +| `Unknown service: 'transformagenticservice'` | Missing botocore models in container | Use canonical Dockerfile from [dockerfile-orchestrator.md](./dockerfile-orchestrator.md) | +| `FileNotFoundError: '..agent-builder-agentic-mcp'` | Missing MCP shim in container | Use canonical Dockerfile from [dockerfile-orchestrator.md](./dockerfile-orchestrator.md) | +| `ValidationException: Invocation of model ID ...` | Bare model ID | Use `us.` prefix (cross-region profile) | +| `AttributeError: 'ClientSession' object has no 
attribute 'get_server_capabilities'` | MCP client issue | Use `mcp_clients=None` if A2A-only | + +## Next Steps + +- Build subagents: see `subagent-patterns.md` +- Deploy: see `deploy-agent-workflow.md` +- Register: see `agent-registration.md` +- Troubleshoot: see `troubleshooting.md` diff --git a/aws-transform-agent-toolkit/steering/skill-operations.md b/aws-transform-agent-toolkit/steering/skill-operations.md new file mode 100644 index 0000000..b964abf --- /dev/null +++ b/aws-transform-agent-toolkit/steering/skill-operations.md @@ -0,0 +1,166 @@ +--- +inclusion: auto +name: skill-operations +description: "Guidance for managing AWS Transform skills — upload, download, access control, metadata" +--- + +# AWS Transform Skill Operations Guide + +## Related Guides + +- For agent registration and publishing, see [agent-registration.md](./agent-registration.md) +- For full deployment automation, see [deployment-pipeline-guide.md](./deployment-pipeline-guide.md) + +## Overview + +Skills expand an agent's capabilities on AWS Transform. The **skill registry** is a central repository where developers can browse existing skills to plug into their agents or contribute their own. + +The tools in this guide interact with the skill registry and come from the AWS Transform Agent Toolkit MCP server (`agent-builder-mcp-aws-transform`). 
Use them when a developer needs to: +- **Upload** a skill they've authored to the registry so other agents can use it +- **Download** an existing skill from the registry to inspect, iterate on, or plug into their agent +- **Share** skills across AWS accounts so other developers' agents can use them +- **Manage** skill metadata, visibility, and lifecycle (activate, deprecate) + +## Skill Lifecycle + +``` +upload-skill → (auto-zip, auto-activate, auto-grant-access) + → get-skill-metadata → update-skill-metadata + → update-skill-access-control → list-skills + → download-skill +``` + +## Tool Reference + +### list-skills + +List all skills visible to your account. Auto-paginates internally — returns all results in one call. + +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `filter` | object | No | Filter criteria (see below) | +| `filter.accessFilter` | enum | No | `ACCESSIBLE_ONLY` or `ALL_SKILLS` | +| `filter.statusFilter` | enum | No | `ACTIVE`, `DELETED`, `DEPRECATED`, `PENDING_UPLOAD`, or `UPDATE_IN_PROGRESS` | + +**Returns:** `{ skills: [SkillSummary[]] }` + +### get-skill-metadata + +Get metadata for a specific skill. + +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `skillName` | string | Yes | Name of the skill | + +**Returns:** Full skill metadata object (name, description, status, visibility, timestamps, etc.) + +### update-skill-metadata + +Update a skill's metadata. Uses idempotency tokens for safe retries. 
+ +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `skillName` | string | Yes | Name of the skill to update | +| `description` | string | No | Updated description | +| `status` | enum | No | `ACTIVE` only (use `deprecate` flag to deprecate) | +| `visibility` | enum | No | `PRIVATE` or `PUBLIC` | +| `deprecate` | boolean | No | Set to `true` to deprecate the skill | + +### upload-skill + +Upload a skill artifact. Handles zipping, activation, and access control automatically. + +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `skillName` | string | Yes | Name for the skill | +| `description` | string | Yes | Description of the skill artifact | +| `filePath` | string | Yes | Local path to the skill directory or file | +| `visibility` | enum | No | `PRIVATE` (default) or `PUBLIC` | + +**Returns:** `{ skillName, status: "ACTIVE", accessGrantedTo: , zipped: boolean, uploadedSize: }` + +**Key behaviors:** +- **Auto-zip:** Directories and non-zip files are automatically zipped before upload +- **Auto-activate:** Skill status is set to `ACTIVE` immediately after upload +- **Auto-grant access:** Caller's AWS account (via STS GetCallerIdentity) is automatically granted access + +### download-skill + +Download a skill artifact. Optionally extract the zip contents. + +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `skillName` | string | Yes | Name of the skill to download | +| `filePath` | string | Yes | Local path to save the artifact | +| `unzip` | boolean | No | If `true`, extract to `//` | + +**Returns:** +- If `unzip=true`: `{ message: "Successfully extracted skill to...", files: [] }` +- If `unzip=false`: `{ message: "Successfully downloaded skill to..." }` + +### list-skill-access-control + +List AWS accounts that have access to a skill. 
+ +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `skillName` | string | Yes | Name of the skill | + +**Returns:** List of account IDs and their access status. + +### update-skill-access-control + +Grant or revoke access to a skill for a specific AWS account. Uses idempotency tokens for safe retries. + +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `skillName` | string | Yes | Name of the skill | +| `skillUserAccountId` | string | Yes | 12-digit AWS account ID | +| `accessStatus` | enum | Yes | `ENABLED` to grant, `DISABLED` to revoke | + +**Note:** Account ID must be exactly 12 digits. + +## Common Workflows + +### Publishing a New Skill + +``` +1. Prepare skill directory (SKILL.md + templates/ + examples/) +2. upload-skill(skillName, description, filePath) + → auto-zips, auto-activates, auto-grants your account access +3. get-skill-metadata(skillName) to verify +4. update-skill-access-control(skillName, targetAccountId, "ENABLED") to share +``` + +### Downloading and Inspecting a Skill + +``` +1. list-skills() to find available skills +2. download-skill(skillName, filePath, unzip=true) + → extracts to // +3. Review SKILL.md, templates/, examples/ +``` + +### Sharing a Skill With Another Account + +``` +1. upload-skill(skillName, description, filePath) — your account gets access automatically +2. update-skill-access-control(skillName, "123456789012", "ENABLED") +3. list-skill-access-control(skillName) to verify +``` + +### Deprecating a Skill + +``` +1. update-skill-metadata(skillName, deprecate=true) +2. 
get-skill-metadata(skillName) to verify status is DEPRECATED +``` + +## Common Errors + +| Error | Cause | Solution | +|-------|-------|----------| +| Upload fails with S3 error | File path doesn't exist or permissions issue | Verify `filePath` exists and is readable | +| Skill not visible after upload | Access not granted to consuming account | Use `update-skill-access-control` to grant access (caller account is auto-granted) | +| Account ID validation error | `skillUserAccountId` is not 12 digits | Provide exactly 12 numeric digits | +| Status update rejected | Tried to set status to something other than `ACTIVE` | Use `deprecate=true` to deprecate; only `ACTIVE` is valid for `status` field | diff --git a/aws-transform-agent-toolkit/steering/subagent-patterns.md b/aws-transform-agent-toolkit/steering/subagent-patterns.md new file mode 100644 index 0000000..0c584c0 --- /dev/null +++ b/aws-transform-agent-toolkit/steering/subagent-patterns.md @@ -0,0 +1,351 @@ +--- +inclusion: auto +name: subagent-patterns +description: "Guidance for building or updating subagents" +--- + +# Building Subagents + +## What is a Subagent? + +A subagent performs specialized, focused tasks within an orchestrator's workflow. It responds to A2A messages from orchestrators, collects user input via HITL AutoForms, processes data with domain-specific tools, and reports results back. Each subagent instance processes exactly one message. + +## How to Build a Subagent + +### Step 1: Understand the Subagent Base Class + +Your subagent will extend `AsyncBaseSubagent`. Key points: +- Use `AsyncBaseSubagent` (not the synchronous version) +- Provide a focused `system_prompt` for the specific task +- Keep implementation simple and stateless +- **CRITICAL**: The subagent class MUST be defined inside `agent_factory()` — module-level subclasses hang in production containers. See Step 4 for the full pattern. 
+ +### Step 2: Configure System Prompt + +The system prompt defines your subagent's specific task: + +```python +system_prompt = """You are a specialized subagent for analyzing AWS infrastructure configurations. + +Your specific task: +- Parse infrastructure configuration files (JSON, YAML, Terraform) +- Identify security vulnerabilities and misconfigurations +- Return structured analysis results + +Constraints: +- Process one configuration at a time +- Return results in JSON format +- Do not maintain conversation history +""" +``` + +**Best Practices**: +- Be very specific about the single task +- Define input/output formats clearly +- Emphasize stateless operation +- Keep scope narrow and focused + +### Step 3: Create Custom Tools (Optional) + +Define specialized tools for your subagent's task: + +```python +# custom_subagent_tools.py +from strands.tools import tool + + +@tool +def parse_terraform_config(config: str) -> dict: + """Parse Terraform configuration and extract resource definitions. + + Args: + config: Terraform configuration as string + + Returns: + Dictionary with parsed resources and their properties + """ + # Your parsing logic here + return { + "resources": ["aws_instance", "aws_s3_bucket"], + "count": 2 + } + + +@tool +def validate_security_rules(rules: list) -> dict: + """Validate security group rules against best practices. 
+ + Args: + rules: List of security group rules + + Returns: + Validation results with issues found + """ + return { + "valid": True, + "issues": [], + "recommendations": ["Restrict SSH to specific IPs"] + } +``` + +**Tool Guidelines**: +- Keep tools focused on the subagent's specific domain +- Ensure tools are stateless (no side effects) +- Return structured data for easy processing +- Include comprehensive docstrings + +### Step 4: Create Your Entry Point + +Use `AgentRuntimeServer` with `delayed_timeout=3600` (NOT `StatelessAgentRuntimeServer` — it has a hard 28s timeout that kills HITL polling and long-running tools): + +```python +# my_subagent_cli.py +import argparse +import logging +from agent_builder_sdk.server.agent_runtime_server import AgentRuntimeServer + +# Configure logging +logging.basicConfig( + level=logging.INFO, + format="%(asctime)s - %(levelname)s - %(message)s", +) +logger = logging.getLogger(__name__) + + +def create_parser() -> argparse.ArgumentParser: + """Create command line argument parser.""" + parser = argparse.ArgumentParser(description="Run Agent Runtime Server") + parser.add_argument("--host", default="0.0.0.0", help="Host to bind server to") + parser.add_argument("--port", type=int, default=8080, help="Port to bind server to") + parser.add_argument( + "--binary-location", + default="/home/amazon/AgentBuilderAgenticMCP/bin/agent-builder-agentic-mcp", + help="Path to the agentic MCP server binary", + ) + return parser + + +def main(): + """Main entry point.""" + parser = create_parser() + args = parser.parse_args() + + # Create agent factory with default configuration + def agent_factory(mcp_client, storage_dir=None): + from agent_builder_sdk.base_subagent.base_subagent import AsyncBaseSubagent + + # Define subagent class INSIDE agent_factory (module-level hangs in containers) + class MySubagent(AsyncBaseSubagent): + pass + + return MySubagent( + system_prompt="You are a specialized subagent for infrastructure analysis...", + 
mcp_clients=[mcp_client] if mcp_client is not None else None, + region_name="us-east-1", + model_id="us.anthropic.claude-sonnet-4-5-20250929-v1:0", + ) + + logger.info("Starting Agent Runtime Server...") + server = AgentRuntimeServer( + agent_factory=agent_factory, + host=args.host, + port=args.port, + binary_location=args.binary_location, + delayed_timeout=3600, + ) + + # This will set up everything and run the server + server.start() + + +if __name__ == "__main__": + main() +``` + +**Key Features**: +- **Queue-based server**: Acks after 28s, continues processing in background for up to `delayed_timeout` +- **Class inside factory**: Avoids module-level subclass hang in production containers +- **Agent factory pattern**: Pluggable agent creation +- **Compatible protocols**: Supports both Bedrock AgentCore and AWS Transform compute service endpoints + +#### Custom Agent Factory + +For more control, create a custom agent factory with custom tools: + +```python +from my_custom_subagent.custom_tools import parse_terraform_config, validate_security_rules + + +def agent_factory(mcp_client, storage_dir=None): + """Create custom subagent with specific tools and configuration.""" + from agent_builder_sdk.base_subagent.base_subagent import AsyncBaseSubagent + + class MySubagent(AsyncBaseSubagent): + pass + + return MySubagent( + system_prompt="You are a specialized infrastructure analysis subagent...", + mcp_clients=[mcp_client] if mcp_client is not None else None, + region_name="us-east-1", + custom_tools=[parse_terraform_config, validate_security_rules], + model_id="us.anthropic.claude-sonnet-4-5-20250929-v1:0", + ) +``` + +### Step 5: Local Testing + +Set up environment variables and test your subagent: + +```bash +# Required environment variables +export WORKSPACE_ID=your-workspace-id +export JOB_ID=your-job-id +export AGENT_INSTANCE_ID=your-agent-instance-id +export AUTHORIZATION_TOKEN=your-token +export 
QT_AGENTIC_API_ENDPOINT=https://iad.prod.agenticapi.elastic-gumby.ai.aws.dev +export AWS_REGION=us-east-1 + +# Start local server for API testing +python src/my_custom_subagent/my_subagent_cli.py \ + --binary-location agent-builder-agentic-mcp +``` + +## Orchestrator vs Subagent Decision Guide + +| Feature | Orchestrator | Subagent | +|---------|-------------|----------| +| **Base Class** | `AsyncBaseOrchestrator` | `AsyncBaseSubagent` | +| **Server** | AgentRuntimeServer + delayed_timeout | AgentRuntimeServer + delayed_timeout | +| **Visibility** | PUBLIC (in webapp chat) | RESTRICTED (orchestrator-invoked) | +| **Status Management** | Automatic (queue manages) | Manual (must set COMPLETED/FAILED) | +| **State** | Stateful (episodic memory) | Stateless (no memory between requests) | +| **MCP Client** | Optional (disable if A2A-only) | Optional (disable if using @tool functions) | + +**Use Orchestrator when:** coordinating multiple agents, maintaining conversation context, building user-facing agents. +**Use Subagent when:** performing focused tasks, responding only to orchestrator requests, no conversation history needed. + +## Project Structure + +``` +my-subagent/ +├── __init__.py +├── subagent_cli.py # Entry point + subagent class (INSIDE agent_factory) +├── agent_client.py # HITL + artifact client (AgenticApiHelper subclass) +├── tools/ +│ ├── __init__.py +│ ├── custom_tools.py # CUSTOMIZE: domain-specific tools +│ ├── hitl_tools.py # HITL AutoForm tools +│ ├── s3_tools.py # Direct S3 upload/download +│ └── connector_s3_tools.py # Connector-aware S3 tools (optional) +├── prompts/ +│ └── subagent_prompt.md # CUSTOMIZE: domain-specific system prompt +├── requirements.txt +├── Dockerfile +└── .bedrock_agentcore.yaml +``` + +## Dockerfile (REQUIRED — do NOT scaffold from scratch) + +**You MUST use the canonical Dockerfile template from [dockerfile-subagent.md](./dockerfile-subagent.md).** Copy its contents verbatim as your `Dockerfile`. 
Adapt the `COPY` source lines and `ENTRYPOINT` to match your source layout. + +A minimal Dockerfile will fail on first invocation — see [orchestrator-patterns.md](./orchestrator-patterns.md#dockerfile-required--do-not-scaffold-from-scratch) for the full explanation of the two bugs this prevents. + +## Architecture Decisions (must-know) + +### CRITICAL: AgentRuntimeServer, NOT StatelessAgentRuntimeServer + +`StatelessAgentRuntimeServer` has a hard 28s `asyncio.timeout` around the entire `process_message` call. Any tool blocking longer (like `poll_hitl_response` waiting for user input) gets killed silently. Use `AgentRuntimeServer` with `delayed_timeout=3600` — it acks after 28s and continues processing in the background. + +### Subagent Class Defined INSIDE agent_factory + +Module-level subclasses of `AsyncBaseSubagent` hang in production containers. Always define the subagent class inside the `agent_factory()` function: + +```python +def agent_factory(mcp_client, storage_dir=None): + class MySubagent(AsyncBaseSubagent): + async def process_message_async(self, request): + # ... handle message + pass + return MySubagent(mcp_clients=[mcp_client] if mcp_client else None) +``` + +### process_message_async Override + +The SDK queue handler passes `ProcessMessageRequest` objects, but the base class expects `str`. The override must: +1. Handle `ProcessMessageRequest` type (extract `.message`) +2. Extract text from A2A dict format (`parts[0].text`) +3. Call `self._process_message(message)` with the extracted string +4. 
Manually set COMPLETED or FAILED via `get_agent_instance_manager().update_status()` with `agentOutput={"serializedPayload": json.dumps(...)}` + +An override covering all four requirements: + +```python +# Requires `import json` at module level. +async def process_message_async(self, request): + if hasattr(request, 'message'): + msg = request.message + else: + msg = request + if isinstance(msg, dict): + # Fall back to a placeholder part so an empty or missing `parts` list cannot raise IndexError + text = (msg.get("parts") or [{}])[0].get("text", str(msg)) + else: + text = str(msg) + try: + result = await self._process_message(text) + self.get_agent_instance_manager().update_status( + "COMPLETED", agentOutput={"serializedPayload": json.dumps({"response": result})} + ) + except Exception as e: + self.get_agent_instance_manager().update_status("FAILED", statusReason=str(e)) +``` + +### HITL AutoForm Pattern + +The orchestrator passes `step_id` in the A2A message. The subagent: +1. Extracts `step_id` from the message text +2. Calls `create_hitl_autoform(step_id, title, description, fields)` — uploads form schema, creates task, starts it, sets step to `PENDING_HUMAN_INPUT` +3. Calls `poll_hitl_response(hitl_task_id)` — polls until SUBMITTED, downloads response artifact, closes task +4. Processes the user's input with domain-specific tools + +The `description` field in HITL tasks has a max length of 1024 characters — truncate before passing. + +### S3 Access: Connector First, Direct Fallback + +Always register both `connector_s3_tools` AND `s3_tools`. The system prompt must instruct the LLM to: +1. Call `list_s3_connectors()` first +2. Use `download_from_connector`/`upload_to_connector` if connectors are available +3. Fall back to `download_s3_file`/`upload_s3_file` if the connector call fails (auth error) or no connectors are available + +See troubleshooting #29 for known connector auth issues. 
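The connector-first decision order can be sketched as ordinary Python, which may help when wording the system prompt or testing the fallback outside the LLM loop. Everything here is an assumption made for illustration: the three callables stand in for `list_s3_connectors`, `download_from_connector`, and `download_s3_file`, their real signatures are not shown in this guide, and `PermissionError` is only a stand-in for whatever exception a connector auth failure actually raises.

```python
from typing import Callable, Sequence


def download_with_fallback(
    key: str,
    list_connectors: Callable[[], Sequence[str]],
    download_via_connector: Callable[[str, str], bytes],
    download_direct: Callable[[str], bytes],
) -> tuple[bytes, str]:
    """Try each available connector first, then fall back to direct S3 access.

    Returns the payload plus a label saying which path produced it.
    All callables are hypothetical stand-ins for the real tools.
    """
    for connector_id in list_connectors():
        try:
            return download_via_connector(connector_id, key), "connector"
        except PermissionError:
            # Assumed shape of an auth failure; move on to the next connector.
            continue
    # No connectors were listed, or every connector attempt was denied.
    return download_direct(key), "direct"
```

Returning the path label alongside the data makes it easy to log how often the fallback is taken.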
+ +### mcp_clients Must Be Plural (List) or None + +```python +# WRONG +MySubagent(mcp_client=mcp_client) +# CORRECT +MySubagent(mcp_clients=[mcp_client] if mcp_client else None) +``` + +### Cross-Region Inference Profile Required + +Use `model_id="us.anthropic.claude-sonnet-4-5-20250929-v1:0"` — bare model IDs fail with `ValidationException`. + +## Common Errors Reference + +| Error | Cause | Fix | +|-------|-------|-----| +| `Agent processing timed out after 28 seconds` | Using `StatelessAgentRuntimeServer` | Switch to `AgentRuntimeServer` with `delayed_timeout=3600` | +| Never reaches COMPLETED | Missing `update_status()` | Set COMPLETED with `agentOutput` in `process_message_async` | +| Orchestrator can't get response | Bad `agentOutput` format | Include `serializedPayload` as JSON string | +| Message processing fails | A2A format not extracted | Handle `ProcessMessageRequest` + dict in override | +| Constructor hangs in container | Module-level subclass | Define class inside `agent_factory()` | +| `unexpected keyword argument 'mcp_client'` | Singular not plural | Use `mcp_clients=[client]` or `None` | +| HITL display_report fails | Description exceeds 1024 chars | Truncate to < 1024 chars | +| `TerminalResourceException: status is not valid: COMPLETED` | Container reuse (AWS Transform bug) | Re-run job for fresh container | + +## Next Steps + +- Build orchestrator: see `orchestrator-patterns.md` +- Deploy: see `deploy-agent-workflow.md` +- Register: see `agent-registration.md` +- Troubleshoot: see `troubleshooting.md` diff --git a/aws-transform-agent-toolkit/steering/troubleshooting.md b/aws-transform-agent-toolkit/steering/troubleshooting.md new file mode 100644 index 0000000..ed9ec3a --- /dev/null +++ b/aws-transform-agent-toolkit/steering/troubleshooting.md @@ -0,0 +1,217 @@ +--- +inclusion: auto +name: troubleshooting +description: "Field-tested troubleshooting guide for common AWS Transform agent issues" +--- + +# AWS Transform Agent Troubleshooting Guide 
+ +## Quick Reference + +| # | Issue | Symptom | Fix | +|---|-------|---------|-----| +| 1 | Wrong agent type registered | Shows as ORCHESTRATOR instead of SUB_AGENT | Manual 3-step registration via MCP | +| 2 | MCP credentials not inherited | Auth errors in MCP tools | Add creds to `~/.kiro/settings/mcp.json` env block | +| 3 | MCP server slow startup | `MCP error -32001: Request timed out` | Retry 2-3 times after config changes | +| 4 | Relative agent_path | `FileNotFoundError` in `build_agent_image` | Use absolute path | +| 5 | Workspace MCP overrides power | Power servers disabled | Only use `~/.kiro/settings/mcp.json` for power servers | +| 6 | Python version mismatch | SDK install fails | Use `/opt/homebrew/bin/python3.11 -m venv .venv` | +| 7 | Publish config gaps | Agent missing resiliency/schemas | Include all required fields (see agent-registration.md) | +| 8 | agentCard field | A2A Agent Card — required by PublishAgentVersion | Empty `{}` rejected by boto3 — see Minimal agentCard Example in agent-registration.md | +| 9 | Large publish payload | "tool does not exist" error | Publish minimal config first, add fields in later versions | +| 10 | Orchestrator not in webapp | Agent registered but not visible | Set `jobOrchestrator: true` + `jobOrchestratorMetadata` at registration | +| 11 | customerConfigurationRequired trade-off | Can't have compute config + dependencies | Choose based on priority (see details) | +| 12 | Bare model ID fails | `ValidationException: Invocation of model ID...` | Use `us.` prefix (cross-region inference profile) | +| 13 | agent_factory signature | `takes 1 positional argument but 2 were given` | Add `storage_dir=None` param | +| 14 | publish_agent_version includes compute | Rejected for `customerConfigurationRequired: true` | Use `publish-agent-version` from `agent-builder-mcp-aws-transform` | +| 15 | Stale registration after redeploy | New runtime gets zero invocations | Publish new version or re-register with 
`customerConfigurationRequired: false` | +| 16 | computeConfiguration schema change | Flat `agentRuntimeArn` rejected | Use nested `provisionedComputeConfiguration.agentCoreConfiguration.runtimeArn` | +| 17 | Expired STS tokens | `get_caller_identity()` fails | Use `TARGET_ACCOUNT_ID` from `.env` | +| 18 | Symlinked SDK dir | Build fails or SDK missing | `cp -r` not `ln -s` | +| 19 | Wrong role in agent registration | Chat never enables, zero invocations | Use `AWSTransformAgentInvokeRole` not `AgentCoreExecutionRole` | +| 20 | StatelessAgentRuntimeServer timeout | HITL polling killed after 28s | Use `AgentRuntimeServer` with `delayed_timeout=3600` | +| 21 | Container reuse stale instance | Subagent COMPLETED without doing work | Re-run job (AWS Transform bug) | +| 22 | HITL description too long | display_report fails | Truncate to < 1024 chars | +| 23 | Post-COMPLETED operations fail | `TerminalResourceException` after second message | Design one message per subagent instance | +| 24 | SendMessage -32603 timeout | Orchestrator thinks subagent failed | Poll `get_agent_instance` until COMPLETED | +| 25 | PutJobPlan stepId mismatch | 404 on UpdateJobPlanStep | Call `list_job_plan_steps()` to get real stepIds | +| 26 | JobManager auto-transitions status | Job goes EXECUTING→PLANNING→PLANNED→EXECUTING | Harmless; SDK auto-transitions during init | +| 27 | ATX_CHAT messaging undocumented | Chat messages don't appear | Use A2A format with `extensions` + `userSelection: "jobCreator"` (see orchestrator-patterns.md) | +| 28 | Background execution undocumented | Orchestrator exceeds delayed_timeout | Spawn daemon thread from LLM tool | +| 29 | S3 connector auth denied | `Partner not authorized to access this type of connector` | Fall back to direct S3 tools; request connector access from AWS Transform team | + +## Detailed Issues + +### 1. 
Agent Registered as Wrong Type + +**Symptom:** `deploy_agent_full_pipeline` registers as ORCHESTRATOR_AGENT instead of SUB_AGENT, sets `customerConfigurationRequired: false`. +**Fix:** Split into 3 steps: (1) `build_agent_image` + `deploy_agent_to_agentcore`, (2) `register_agent` MCP tool with correct metadata, (3) `publish_agent_version` MCP tool. +**Gotcha:** `agentCard`, `inputPayloadSchema`, `outputPayloadSchema` cannot be empty `{}` — must contain at least one property. + +### 2. MCP Server Credentials Not Inherited + +**Symptom:** Auth errors from `agent-builder-mcp-aws-transform` MCP server. +**Fix:** Add `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN` to the `env` block in `~/.kiro/settings/mcp.json`. Tokens expire — update when credentials refresh. + +### 3. MCP Server Slow Startup + +**Symptom:** `MCP error -32001: Request timed out` on connect. +**Fix:** Retry 2-3 times. The MCP server may be slower on first connect. + +### 4. Relative Path in deploy_agent_full_pipeline + +**Symptom:** `FileNotFoundError` when using relative `agent_path`. +**Fix:** Use absolute path: `/Users/username/projects/my-agent/`. + +### 5. Workspace MCP Overrides Power Config + +**Symptom:** Power servers disabled after creating `.kiro/settings/mcp.json` in workspace. +**Fix:** Don't duplicate power server entries in workspace-level MCP config. Use only `~/.kiro/settings/mcp.json`. + +### 6. SDK Requires Python 3.11+ + +**Symptom:** `agent-builder-sdk-aws-transform` fails to install. +**Fix:** Create venv with correct Python: `/opt/homebrew/bin/python3.11 -m venv .venv`. + +### 7. Publish Configuration Gaps + +**Symptom:** Agent missing `objectiveNegotiationPrompt`, `agentResiliencyConfiguration`, proper schemas. +**Fix:** Include all required publish configuration fields. Add `partnerControllerRetryWindowMinutes: 6`, `recoveryWaitTimeSeconds: 60`. See `agent-registration.md` for the complete configuration structure. + +### 8. 
agentCard Field + +Forward-looking field for A2A agent discovery. Required by `PublishAgentVersion` — boto3 client-side validation rejects empty `{}`. Must contain at least the required fields (id, name, description, version, capabilities with extensions). See the Minimal agentCard Example in `agent-registration.md`. + +### 9. Large Publish Payload Fails + +**Symptom:** `publish_agent_version` via MCP returns "tool does not exist". +**Fix:** Publish v1.0.0 with minimal config, add fields in v1.0.1, v1.0.2, etc. + +### 10. Orchestrator Not Visible in Webapp + +**Symptom:** Registered orchestrator doesn't appear in workspace agent list. +**Fix:** Must set `jobOrchestrator: true` AND `jobOrchestratorMetadata` (chatUILabel, chatAgentIdentifier, a2aSupported) at registration time. These fields cannot be updated after registration. + +### 11. customerConfigurationRequired Trade-Off + +`true` blocks `computeConfiguration` in publish but allows `customerConfiguredAgentDependencies`. `false` allows compute config but blocks dependencies. See `agent-registration.md` for the full trade-off matrix. + +### 12. Cross-Region Inference Profile Required + +**Symptom:** `ValidationException: Invocation of model ID ... with on-demand throughput isn't supported`. +**Fix:** Use `us.anthropic.claude-3-7-sonnet-20250219-v1:0` (cross-region inference profile). + +### 13. agent_factory Signature Mismatch + +**Symptom:** `agent_factory() takes 1 positional argument but 2 were given`. +**Fix:** `def agent_factory(mcp_client, storage_dir=None):` — server now passes 2 args. + +### 14. publish_agent_version Includes Compute for customerConfigurationRequired + +**Symptom:** `publish_agent_version` always includes `computeConfiguration`. +**Fix:** Use `publish_agent_version` MCP tool with appropriate overrides, manually omitting `computeConfiguration`. + +### 15. Stale Registration After Redeploy + +**Symptom:** New runtime is READY but gets zero invocations; old runtime still receives traffic. 
+**Fix:** Publish a new agent version pointing to the new runtime ARN, or re-register under a new name with `customerConfigurationRequired: false` to embed runtime ARN directly. + +### 16. computeConfiguration Schema Change + +**Symptom:** Flat `agentRuntimeArn` rejected. +**Fix:** Use nested structure: `computeConfiguration.provisionedComputeConfiguration.agentCoreConfiguration.runtimeArn`. + +### 17. Expired STS Tokens in Scripts + +**Symptom:** `boto3.client("sts").get_caller_identity()` fails. +**Fix:** Read account ID from environment: `os.environ.get("TARGET_ACCOUNT_ID")`. Set in `.env` file. + +### 18. Symlinked SDK Directory + +**Symptom:** Docker/finch build fails — build context doesn't follow symlinks. +**Fix:** Copy the SDK into the project directory (don't symlink): `pip install agent-builder-sdk-aws-transform --target /sdk/`. + +### 19. Wrong Role in Agent Registration + +**Symptom:** Chat input never enables, zero invocations, no error. +**Cause:** Used `AgentCoreExecutionRole` instead of `AWSTransformAgentInvokeRole` in the `atxAccessRoleArn` field during registration. +**Fix:** Use `AWSTransformAgentInvokeRole` (trusted by `prod.us-east-1.compute.elastic-gumby.aws.internal`). + +### 20. StatelessAgentRuntimeServer 28s Timeout + +**Symptom:** Subagent processing killed after 28 seconds; HITL polling never completes. +**Cause:** `StatelessAgentRuntimeServer` uses `asyncio.timeout(28)` on entire `process_message`. +**Fix:** Use `AgentRuntimeServer` with `delayed_timeout=3600`. Must also override `process_message_async` to set COMPLETED/FAILED. See `subagent-patterns.md` for the override pattern. + +### 21. Container Reuse Stale Instance (AWS Transform Bug) + +**Symptom:** `TerminalResourceException: Agent instance status is not valid: COMPLETED` on first invocation. +**Cause:** AWS Transform reuses container from previous job without resetting instance state. +**Fix:** Re-run the job. Service-level issue — no code workaround. + +### 22. 
HITL Description Exceeds 1024 Characters + +**Symptom:** `display_report` HITL task creation fails. +**Fix:** Truncate: `description = description[:1000] + "\n\n[Truncated]"` before passing. + +### 23. Post-COMPLETED Operations Fail + +**Symptom:** After COMPLETED, all API calls on that instance return `TerminalResourceException`. +**Fix:** Subagents are single-use. Design orchestrators to send all work in a single message per invocation. + +### 24. SendMessage -32603 Internal Timeout + +**Symptom:** SendMessage returns `error.code: "-32603"` with HTTP 200 after ~25s. +**Cause:** Internal API timeout. Subagent is still processing. +**Fix:** Fire-and-forget pattern — send message, immediately poll `get_agent_instance` until COMPLETED. See `orchestrator-patterns.md` for the polling pattern. + +### 25. PutJobPlan Assigns Its Own Step IDs + +**Symptom:** `ResourceNotFoundException` when calling `UpdateJobPlanStep` with `stepLabel`. +**Fix:** After `put_job_plan`, call `list_job_plan_steps()` and build `stepLabel → stepId` mapping. + +### 26. JobManager Auto-Transitions Status + +**Symptom:** Job status goes EXECUTING→PLANNING→PLANNED→EXECUTING during startup. +**Cause:** SDK's `JobManager` auto-transitions to EXECUTING during init. Your tool then sets PLANNING. +**Impact:** Harmless. Status settles to correct value. + +### 27. ATX_CHAT Messaging Format Undocumented + +**Symptom:** Chat messages don't appear in webapp. +**Fix:** Use the exact A2A format with `extensions` containing `{"userSelection": "jobCreator"}` metadata. See `orchestrator-patterns.md` for the working code pattern. + +### 28. Background Thread Execution Pattern + +**Symptom:** Orchestrator exceeds `delayed_timeout` during multi-step execution. +**Fix:** Spawn a `threading.Thread(daemon=True)` from the LLM tool, return immediately. The thread handles step execution, polling, status updates, and ATX_CHAT progress. See `orchestrator-patterns.md` for the background execution architecture. + +### 29. 
S3 Connector Authorization Denied + +**Symptom:** `ValidationException: Partner 'X' is not authorized to access this type of connector`. +**Cause:** Publisher not authorized for S3 connector type at AWS Transform level. `list_s3_connectors()` succeeds but data plane calls fail. +**Fix:** Fall back to direct S3 tools (`download_s3_file`/`upload_s3_file`). Contact AWS Transform team to request S3 connector authorization. Always register both connector and direct S3 tools. + +## Debugging Techniques + +### CloudWatch Logs + +```bash +# Tail logs in real-time +aws logs tail /aws/bedrock-agentcore/runtimes/-DEFAULT --follow --region us-east-1 + +# Filter for errors +aws logs tail /aws/bedrock-agentcore/runtimes/-DEFAULT --filter-pattern "ERROR" +``` + +Also use `fetch_logs` and `list_log_streams` MCP tools from `agent-builder-mcp-aws-transform`. + +### Local Docker Testing + +```bash +docker build -t my-agent . +docker run -p 8080:8080 \ + -e WORKSPACE_ID=test -e JOB_ID=test -e AGENT_INSTANCE_ID=test \ + -e LOG_LEVEL=DEBUG my-agent +curl http://localhost:8080/ping +``` diff --git a/aws-transform-agent-toolkit/steering/workflow-integration.md b/aws-transform-agent-toolkit/steering/workflow-integration.md new file mode 100644 index 0000000..aca574e --- /dev/null +++ b/aws-transform-agent-toolkit/steering/workflow-integration.md @@ -0,0 +1,57 @@ +--- +inclusion: auto +name: workflow-integration +description: "Guidelines for adding agents to an existing workflow from the AWS Transform console" +--- + +# Adding an Agent to an Existing Workflow + +**IMPORTANT**: Do NOT look up the requested agent first. Understand the current workflow architecture before anything else. + +## Step 1: Analyze the Current Codebase + +### Check for architecture documents first + +Search for `**/architecture*.md`, `**/design*.md`, `**/workflow*.md`, `**/ARCHITECTURE*`, `**/README.md`, `**/.kiro/specs/*`, `**/.kiro/steering/*`, `**/docs/agents*`. 
Look for: which agents exist, their roles, invocation flow, decision logic, and message formats. + +### If no architecture document exists + +Analyze the codebase: + +1. **Find agents**: Search for classes extending `AsyncBaseOrchestrator`, `BaseOrchestrator`, `AsyncBaseSubagent`, or `BaseSubagent`. Also check for `AgentRuntimeServer` / `StatelessAgentRuntimeServer` setup. +2. **Read system prompts**: Look for `system_prompt=` in constructors, `get_prompt_with_name()` references, and prompt strings in config files. +3. **Find invocation patterns**: `InvokeAgent` calls, `agentId` references, A2A messages, tool definitions that invoke agents. +4. **Map the workflow**: Which agents exist, how they connect, invocation conditions, data passed between them. Don't assume a single-orchestrator-with-subagents pattern — there may be multiple orchestrators, chained agents, peer agents, or other topologies. + +## Step 2: Confirm and Discuss with the User + +1. **Confirm your understanding**: Share your mental model of the current workflow. Ask the user to confirm or correct it before proceeding. +2. **Suggest integration points**: The new agent could be any type. Consider: invoked by an existing agent, replacing an existing agent, an additional workflow step, a new orchestrator coordinating existing agents, or a peer agent. +3. **Ask the user to confirm**: Where it goes, when it's invoked, what data it needs. + +## Step 3: Integrate the Agent + +### Update system prompts + +Add the new agent to the relevant agent's system prompt: its name/ID, when to invoke it, and what data to send. If it fits into a sequence with existing agents, update the workflow steps. Match the existing prompt style. + +### Add invocation code (if applicable) + +If agents use explicit code to invoke other agents, add the new invocation following the existing pattern in the codebase. 
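
The system-prompt update described above can be sketched as a small helper that refuses incomplete entries and avoids adding the same agent twice. This is only an illustration: `AgentPromptEntry`, `add_agent_to_prompt`, and the bullet layout are hypothetical names and formatting, not part of the AWS Transform Agent SDK, and the rendered text should be adapted to match the style of the prompt being edited.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentPromptEntry:
    """Hypothetical record of the three facts every prompt entry should state."""

    agent_id: str      # name/ID of the new agent
    invoke_when: str   # condition under which it is invoked
    payload: str       # data the caller sends to it

    def __post_init__(self) -> None:
        # Reject entries that leave out any of the three required facts.
        for field_name in ("agent_id", "invoke_when", "payload"):
            if not getattr(self, field_name).strip():
                raise ValueError(f"{field_name} must not be empty")

    def render(self) -> str:
        # Assumed bullet layout; change it to match the existing prompt style.
        return (
            f"- Agent `{self.agent_id}`\n"
            f"  - Invoke when: {self.invoke_when}\n"
            f"  - Send: {self.payload}"
        )


def add_agent_to_prompt(system_prompt: str, entry: AgentPromptEntry) -> str:
    """Append the entry unless the agent ID is already mentioned in the prompt."""
    if f"`{entry.agent_id}`" in system_prompt:
        return system_prompt
    return system_prompt.rstrip() + "\n" + entry.render() + "\n"
```

Keeping the entry as structured data makes it easy to reuse the same three facts when updating the architecture document.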
+ +### Update architecture document (if it exists) + +Add the agent to the agent list, update the workflow diagram/flow, and document invocation conditions and message format. + +## Step 4: Verify + +1. **Prompt coherence**: No contradictions or ambiguous invocation conditions. +2. **Code consistency**: New invocation code follows existing patterns. +3. **Document accuracy**: Architecture doc matches actual code changes. + +## Guidelines + +- Only modify what's necessary. +- Match existing style in prompts and code. +- Always specify: agent name/ID, when to invoke, what to send. +- Ask the user rather than guessing when unclear.