From afc4dfedb8fc3006d9510016fb2bdadcf040c8e7 Mon Sep 17 00:00:00 2001 From: Sahil Bhimjiani Date: Mon, 20 Apr 2026 16:16:42 -0500 Subject: [PATCH 1/8] feat: add AWS Lambda Managed Instances (LMI) skill to aws-serverless plugin Add a new skill for evaluating, configuring, and migrating workloads to AWS Lambda Managed Instances. Includes workload fitness assessment, 4-column cost comparison (Lambda OD/SP vs LMI OD/SP), configuration recommendations, thread-safety review, and end-to-end migration framework. Reference files cover cost analysis, configuration tuning, thread safety, runtime-specific migration patterns, infrastructure setup (CLI/SAM/CDK), and troubleshooting. --- .../aws-lambda-managed-instances/SKILL.md | 205 ++++++++++++++++++ .../references/configuration-guide.md | 59 +++++ .../references/cost-comparison.md | 72 ++++++ .../references/infrastructure-setup.md | 96 ++++++++ .../references/migration-patterns.md | 128 +++++++++++ .../references/thread-safety.md | 53 +++++ .../references/troubleshooting.md | 42 ++++ 7 files changed, 655 insertions(+) create mode 100644 plugins/aws-serverless/skills/aws-lambda-managed-instances/SKILL.md create mode 100644 plugins/aws-serverless/skills/aws-lambda-managed-instances/references/configuration-guide.md create mode 100644 plugins/aws-serverless/skills/aws-lambda-managed-instances/references/cost-comparison.md create mode 100644 plugins/aws-serverless/skills/aws-lambda-managed-instances/references/infrastructure-setup.md create mode 100644 plugins/aws-serverless/skills/aws-lambda-managed-instances/references/migration-patterns.md create mode 100644 plugins/aws-serverless/skills/aws-lambda-managed-instances/references/thread-safety.md create mode 100644 plugins/aws-serverless/skills/aws-lambda-managed-instances/references/troubleshooting.md diff --git a/plugins/aws-serverless/skills/aws-lambda-managed-instances/SKILL.md b/plugins/aws-serverless/skills/aws-lambda-managed-instances/SKILL.md new file mode 100644 index 
00000000..10cfc77a --- /dev/null +++ b/plugins/aws-serverless/skills/aws-lambda-managed-instances/SKILL.md @@ -0,0 +1,205 @@ +--- +name: aws-lambda-managed-instances +description: > + Evaluate, configure, and migrate workloads to AWS Lambda Managed Instances (LMI). + Triggers on: Lambda Managed Instances, LMI, capacity provider, multi-concurrency Lambda, + dedicated instance Lambda, EC2-backed Lambda, cold start elimination, Graviton Lambda, + instance type for Lambda, Lambda cost optimization with Reserved Instances or Savings Plans. + Also trigger when users describe high-volume predictable workloads seeking cost savings, + or compare Lambda vs EC2 for steady-state traffic. For standard Lambda without LMI, + use the aws-lambda skill instead. +argument-hint: "[describe your workload or what you need help with]" +metadata: + tags: lambda, lmi, managed-instances, ec2, capacity-provider, multi-concurrency, cost-optimization +--- + +# AWS Lambda Managed Instances (LMI) + +Run Lambda functions on current-generation EC2 instances in your account while AWS manages provisioning, patching, scaling, routing, and load balancing. Combines Lambda's developer experience with EC2's pricing and hardware options. + +For standard Lambda development, see [aws-lambda skill](../aws-lambda/). For SAM/CDK deployment, see [aws-serverless-deployment skill](../aws-serverless-deployment/). 
+ +## When to Load Reference Files + +- **Cost comparison**, **pricing analysis**, **Lambda vs LMI cost**, **Savings Plans**, or **Reserved Instances** -> see [references/cost-comparison.md](references/cost-comparison.md) +- **Instance types**, **memory sizing**, **vCPU ratios**, **scaling tuning**, or **capacity provider config** -> see [references/configuration-guide.md](references/configuration-guide.md) +- **Thread safety**, **code review checklist**, or **multi-concurrency readiness** -> see [references/thread-safety.md](references/thread-safety.md) +- **Before/after code examples**, **runtime-specific migration** (Node.js, Python, Java, .NET), or **connection pooling** -> see [references/migration-patterns.md](references/migration-patterns.md) +- **IAM roles**, **VPC setup**, **CLI commands**, **SAM template**, or **CDK example** -> see [references/infrastructure-setup.md](references/infrastructure-setup.md) +- **Errors**, **throttling**, **debugging**, or **stuck deployments** -> see [references/troubleshooting.md](references/troubleshooting.md) + +## Quick Decision: Is LMI Right for This Workload? + +| Signal | LMI is a strong fit | Standard Lambda is better | +|--------|---------------------|---------------------------| +| Traffic | Steady, predictable, 50M+ req/mo | Bursty, unpredictable, long idle | +| Cost | Duration-heavy spend at scale | Low or sporadic invocations | +| Cold starts | Unacceptable (LMI has zero) | Tolerable or mitigated by SnapStart | +| Compute | Latest CPUs, specific families, high network BW | Standard Lambda memory/CPU sufficient | +| Compliance | Single-tenant required, VPC control | Multi-tenant Firecracker acceptable | +| Scale-to-zero | Not needed (min 3 instances always run) | Required (pay nothing when idle) | +| Code readiness | Thread-safe or feasible to refactor | Non-thread-safe, expensive to change | + +## Instructions + +### Step 1: Assess the Workload + +Gather these signals before recommending: + +1. 
**Traffic pattern**: Steady vs bursty? Requests per second? +2. **Current costs**: Monthly Lambda spend? Existing Savings Plans? +3. **Runtime**: Node.js, Java, .NET, or Python? +4. **Memory/CPU**: How much memory? CPU-bound or I/O-bound? +5. **Execution duration**: Average and P99? +6. **Thread safety**: Mutable globals, shared `/tmp` paths, non-thread-safe libs? +7. **VPC**: Already in a VPC? Private resource access needed? + +### Step 2: Build the Cost Comparison + +REQUIRED: Present a 4-column comparison before recommending LMI. + +| Scenario | When it wins | +|----------|-------------| +| Lambda on-demand | Low volume, bursty traffic | +| Lambda + Savings Plan | Moderate steady volume (~17% duration discount) | +| LMI on-demand | High volume, steady traffic | +| LMI + 3yr Savings Plan | High volume + commitment (up to 72% EC2 discount) | + +Rule of thumb: LMI becomes cost-competitive at 50-100M+ req/month with steady traffic. + +See [references/cost-comparison.md](references/cost-comparison.md) for formulas, worked example, and comparison table template. + +### Step 3: Configure the Deployment + +**Instance families** (400+ types, .large and up): C-series (compute), M-series (general), R-series (memory). ARM (Graviton) for best price-performance. + +**Memory-to-vCPU ratios**: 2:1 (compute), 4:1 (general, default), 8:1 (memory). Min 2 GB, max 32 GB. + +**Multi-concurrency defaults/vCPU**: Node.js 64, Java 32, .NET 32, Python 16. + +**Scaling**: MinExecutionEnvironments (default 3), MaxVCpuCount (required), TargetResourceUtilization. + +See [references/configuration-guide.md](references/configuration-guide.md) for decision trees and detailed tuning. + +### Step 4: Migrate the Code + +Review code for thread safety. LMI runs multiple invocations concurrently per execution environment. + +**Common issues**: mutable globals, shared `/tmp` paths, non-thread-safe libs, per-invocation DB connections. 
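The first of those issues, a mutable module-level variable, can be demonstrated with a minimal sketch (plain threads stand in for concurrent LMI invocations; the handler shape and event are illustrative, not from any AWS API):

```python
import threading
import time

counter = 0  # module-level mutable state, shared by all concurrent invocations

def handler(event, context=None):
    """Illustrative handler: an unsynchronized read-modify-write on shared state."""
    global counter
    current = counter        # read
    time.sleep(0.05)         # other concurrent invocations interleave here
    counter = current + 1    # write back a stale value

threads = [threading.Thread(target=handler, args=({},)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # far fewer than 20: concurrent updates were lost
```

Under single-concurrency Lambda this code is safe because only one invocation runs per environment; under LMI the interleaving above happens in production.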
+ +See [references/thread-safety.md](references/thread-safety.md) for the review checklist and [references/migration-patterns.md](references/migration-patterns.md) for runtime-specific before/after code. + +### Step 5: Set Up Infrastructure + +Two IAM roles required (execution + operator). VPC with 3+ AZ subnets. Create capacity provider, attach function, publish version. + +See [references/infrastructure-setup.md](references/infrastructure-setup.md) for CLI commands, SAM, and CDK templates. + +### Step 6: Validate and Cut Over + +1. Test locally with LocalStack (supports LMI emulation) +2. Monitor CloudWatch: CPU utilization, memory, concurrency, throttle rate +3. Gradual traffic shift with weighted aliases (10% → 50% → 100%) +4. Compare costs after 1-2 weeks of production data +5. Decommission standard Lambda once stable + +## Best Practices + +### Configuration + +- Do: Start with 4:1 ratio and runtime default concurrency +- Do: Use ARM (Graviton) unless x86 dependencies exist +- Do: Let Lambda choose instance types unless specific hardware needed +- Do: Set MaxVCpuCount to control cost ceiling +- Don't: Set MinExecutionEnvironments below 3 (breaks AZ resiliency) +- Don't: Over-restrict instance types (lowers availability) + +### Migration + +- Do: Review all code for thread safety before attaching to capacity provider +- Do: Use weighted aliases for gradual traffic shift +- Do: Include request IDs in all log statements +- Do: Initialize DB pools and SDK clients outside the handler +- Don't: Write to hardcoded `/tmp` paths without request-unique naming +- Don't: Skip cost comparison — LMI is not always cheaper + +### Operations + +- Do: Set CloudWatch alarms on throttle rate > 1% and CPU > 80% +- Do: Plan for 14-day instance rotation (automatic) +- Don't: Manually terminate LMI EC2 instances (delete the capacity provider instead) +- Don't: Forget to publish a version — unpublished functions cannot run on LMI + +## Limits Quick Reference + +| Resource | Limit | 
+|----------|-------| +| Memory | 2 GB min, 32 GB max | +| Instances | 3 minimum (AZ resiliency) | +| Instance lifespan | 14 days (auto-replaced) | +| Concurrency/vCPU | 64 (Node.js), 32 (Java/.NET), 16 (Python) | +| Runtimes | Node.js, Java, .NET, Python | +| Instance families | C, M, R (.large and up) | +| Scaling | Absorbs 50% spike; doubles within 5 min | + +## Troubleshooting Quick Reference + +| Issue | Cause | Fix | +|-------|-------|-----| +| 429 throttles | Traffic exceeds scaling speed | Increase MinExecutionEnvironments or lower TargetResourceUtilization | +| Function stuck PENDING | Provisioning instances | Wait; check VPC/IAM config | +| Architecture mismatch | Function ≠ capacity provider arch | Align both to same architecture | +| Cannot terminate instances | Managed by capacity provider | Delete capacity provider instead | +| Race conditions | Code not thread-safe | See [references/thread-safety.md](references/thread-safety.md) | + +See [references/troubleshooting.md](references/troubleshooting.md) for detailed resolution steps. + +## Configuration + +### AWS CLI Setup + +REQUIRED: AWS credentials configured on the host machine. + +**Verify access**: Run `aws sts get-caller-identity` + +### Regional Availability + +us-east-1, us-east-2, us-west-2, ap-northeast-1, eu-west-1 + +## Language Selection + +Default: TypeScript + +Override: "use Python" → Python, "use JavaScript" → JavaScript. When not specified, ALWAYS use TypeScript. + +## IaC Framework Selection + +Default: CDK + +Override: "use SAM" → SAM YAML, "use CloudFormation" → CloudFormation YAML. When not specified, ALWAYS use CDK. + +## Error Scenarios + +### Serverless MCP Server Unavailable + +- Inform user: "AWS Serverless MCP not responding" +- Ask: "Proceed without MCP support?" 
+- DO NOT continue without user confirmation + +### Unsupported Runtime + +- State: "Lambda Managed Instances does not yet support [runtime]" +- List supported runtimes +- Suggest standard Lambda as alternative + +### Unsupported Region + +- State: "Lambda Managed Instances is not yet available in [region]" +- List available regions + +## Resources + +- [Lambda Managed Instances Docs](https://docs.aws.amazon.com/lambda/latest/dg/lambda-managed-instances.html) +- [Introducing LMI (AWS Blog)](https://aws.amazon.com/blogs/aws/introducing-aws-lambda-managed-instances-serverless-simplicity-with-ec2-flexibility/) +- [Build High-Performance Apps with LMI](https://aws.amazon.com/blogs/compute/build-high-performance-apps-with-aws-lambda-managed-instances/) +- [AWS Lambda Pricing](https://aws.amazon.com/lambda/pricing/) diff --git a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/configuration-guide.md b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/configuration-guide.md new file mode 100644 index 00000000..5fa7e49f --- /dev/null +++ b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/configuration-guide.md @@ -0,0 +1,59 @@ +# LMI Configuration Guide + +## Instance Type Decision Tree + +- **CPU-intensive** (encoding, ML, compression) → C-series, 2:1 ratio, concurrency=1/vCPU +- **Memory-intensive** (caching, large datasets) → R-series, 8:1 ratio +- **Network-intensive** (streaming, data transfer) → Use AllowedInstanceTypes for n-suffix types, 4:1 ratio +- **General/balanced** (web APIs, microservices) → M-series, 4:1 ratio, default concurrency + +Architecture: ARM (Graviton, g-suffix) for price-performance. x86 (i=Intel, a=AMD) when dependencies require it. 
+ +## Memory-to-vCPU Ratios + +| Ratio | Profile | When to use | Memory examples | +|-------|---------|-------------|-----------------| +| 2:1 | Compute | CPU-bound work | 2GB/1vCPU, 4GB/2vCPU | +| 4:1 | General | Most workloads (default) | 4GB/1vCPU, 8GB/2vCPU | +| 8:1 | Memory | Caching, data, Python apps | 8GB/1vCPU, 16GB/2vCPU | + +Min: 2 GB / 1 vCPU. Max: 32 GB. Memory must align with ratio multiples. + +## Memory Sizing from Existing Lambda + +| Current Lambda | LMI memory | Ratio | Rationale | +|---------------|------------|-------|-----------| +| 128-512 MB | 2048 MB | 4:1 | LMI minimum; multi-concurrency shares memory | +| 512 MB-1 GB | 2048 MB | 4:1 | Room for concurrent requests | +| 1-2 GB | 4096 MB | 4:1 | Standard upgrade path | +| 2-4 GB | 4096-8192 MB | 4:1 or 8:1 | Depends on memory vs CPU bottleneck | +| 4-10 GB | 8192-16384 MB | 8:1 | Likely memory-heavy workload | + +## Concurrency Tuning + +| Runtime | Default/vCPU | I/O-bound | CPU-bound | +|---------|-------------|-----------|-----------| +| Node.js | 64 | Keep or increase | 1 per vCPU | +| Java | 32 | Keep | 1 per vCPU | +| .NET | 32 | Keep | 1 per vCPU | +| Python | 16 | Keep | 1 per vCPU | + +Total capacity = MinExecutionEnvironments × PerExecutionEnvironmentMaxConcurrency + +## Capacity Provider Scaling Controls + +| Control | Default | Guidance | +|---------|---------|----------| +| MinExecutionEnvironments | 3 | Increase for baseline capacity; never below 3 | +| MaxExecutionEnvironments | — | Set based on cost budget | +| MaxVCpuCount | Required | Start at 30, adjust by load | +| TargetResourceUtilization | ~50% headroom | Raise for cost savings (less burst tolerance) | +| AllowedInstanceTypes | All | Restrict only for specific hardware needs | +| ExcludedInstanceTypes | None | Exclude expensive types in dev/test | + +## Monitoring Thresholds + +- **CPU > 80%**: reduce concurrency or add vCPUs +- **CPU < 20%**: increase concurrency for better utilization +- **Throttle rate (429s) > 
1%**: increase MinExecutionEnvironments or reduce utilization target +- **Memory > 90%**: increase memory or reduce concurrency diff --git a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/cost-comparison.md b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/cost-comparison.md new file mode 100644 index 00000000..11112244 --- /dev/null +++ b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/cost-comparison.md @@ -0,0 +1,72 @@ +# Lambda vs LMI Cost Comparison + +## Pricing Components + +**Standard Lambda:** $0.20/M requests + $0.0000166667/GB-sec (x86) or $0.0000133334 (ARM). Compute Savings Plans give ~17% discount on duration. + +**LMI:** $0.20/M requests + EC2 instance cost (24/7) + 15% management fee on EC2 on-demand price. No per-request duration charge. Savings Plans/RIs discount up to 72% on EC2 compute. 15% fee always on on-demand price. + +## Discount Comparison + +| Option | Lambda discount | LMI discount | +|--------|----------------|--------------| +| On-demand | 0% | 0% | +| Compute Savings Plan (1yr) | ~17% on duration | ~40-50% on EC2 | +| Compute Savings Plan (3yr) | ~17% on duration | ~60-72% on EC2 | +| Reserved Instances (1yr) | N/A | ~40% on EC2 | +| Reserved Instances (3yr) | N/A | ~60-65% on EC2 | + +Compute Savings Plans apply to both Lambda duration AND EC2 instances. One commitment can cover both. 
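The pricing components above can be turned into a quick monthly cost model. A hedged sketch: the request and x86 GB-second rates are the figures quoted above, the EC2 hourly price is whatever you pass in, and the function names are illustrative:

```python
REQ_PRICE = 0.20 / 1_000_000       # $ per request
GB_SECOND_X86 = 0.0000166667       # $ per GB-second, x86
MGMT_FEE_RATE = 0.15               # 15% of the EC2 on-demand price
HOURS_PER_MONTH = 730

def lambda_monthly(requests, avg_duration_sec, memory_gb, duration_discount=0.0):
    """Standard Lambda: request charge plus (optionally discounted) duration charge."""
    duration = requests * avg_duration_sec * memory_gb * GB_SECOND_X86
    return duration * (1 - duration_discount) + requests * REQ_PRICE

def lmi_monthly(requests, instances, ec2_hourly, ec2_discount=0.0):
    """LMI: request charge plus 24/7 EC2 cost; the 15% fee stays on the on-demand price."""
    ec2_od = instances * ec2_hourly * HOURS_PER_MONTH
    return ec2_od * (1 - ec2_discount) + ec2_od * MGMT_FEE_RATE + requests * REQ_PRICE

# 259M req/mo at 200 ms average and 512 MB (similar inputs to the worked example)
print(round(lambda_monthly(259_000_000, 0.2, 0.5)))          # 483
print(round(lambda_monthly(259_000_000, 0.2, 0.5, 0.17)))    # 410
```

Run all four scenarios with the workload's real numbers before recommending either direction.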
+ +## Calculation Formulas + +``` +# Lambda on-demand +duration_cost = requests × avg_duration_sec × memory_GB × $0.0000166667 +request_cost = requests × $0.20 / 1,000,000 +total = duration_cost + request_cost + +# Lambda + Savings Plan (17% on duration) +total = (duration_cost × 0.83) + request_cost + +# LMI on-demand +ec2_cost = num_instances × hourly_price × 730 +mgmt_fee = ec2_cost × 0.15 +total = ec2_cost + mgmt_fee + request_cost + +# LMI + 3yr Savings Plan (65% discount on EC2) +total = (ec2_cost × 0.35) + mgmt_fee + request_cost +``` + +## Comparison Table Template + +Present this for every assessment: + +``` +| Component | Lambda OD | Lambda+SP | LMI OD | LMI+3yr SP | +|--------------------|-----------|-----------|--------|------------| +| Requests | $X | $X | $X | $X | +| Duration/compute | $X | $X | $X | $X | +| Management fee | — | — | $X | $X | +| Monthly total | $X | $X | $X | $X | +| Annual total | $X | $X | $X | $X | +| Savings vs Lambda | baseline | X% | X% | X% | +``` + +## Worked Example + +Node.js API, 100 req/s steady (259M req/mo), 200ms avg, 512 MB, x86: + +| Scenario | Monthly | Annual | Savings | +|----------|---------|--------|---------| +| Lambda on-demand | $484 | $5,808 | baseline | +| Lambda + 3yr SP | $411 | $4,932 | 15% | +| LMI on-demand (3× m7i.large) | $288 | $3,456 | 40% | +| LMI + 3yr SP | $155 | $1,860 | 68% | + +## When LMI is NOT Cheaper + +- < 50M req/month (fixed 3-instance cost exceeds Lambda) +- Very short functions (< 100ms duration) +- Highly bursty, unpredictable traffic +- Workloads needing scale-to-zero diff --git a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/infrastructure-setup.md b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/infrastructure-setup.md new file mode 100644 index 00000000..7ccb188a --- /dev/null +++ b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/infrastructure-setup.md @@ -0,0 +1,96 @@ +# LMI Infrastructure Setup + +## IAM 
Roles (Two Required) + +### 1. Execution Role (for the function) +```bash +aws iam create-role --role-name LMIExecutionRole \ + --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"lambda.amazonaws.com"},"Action":"sts:AssumeRole"}]}' +aws iam attach-role-policy --role-name LMIExecutionRole \ + --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole +``` + +### 2. Operator Role (for capacity provider EC2 management) +```bash +aws iam create-role --role-name LMIOperatorRole \ + --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"lambda.amazonaws.com"},"Action":"sts:AssumeRole"}]}' +aws iam attach-role-policy --role-name LMIOperatorRole \ + --policy-arn arn:aws:iam::aws:policy/AWSLambdaManagedEC2ResourceOperator +``` + +First-time capacity provider creation also requires `iam:CreateServiceLinkedRole`. + +## VPC Requirements + +- 3+ subnets across different AZs (for default 3-instance fleet) +- Security groups restricting to necessary traffic only +- NAT Gateway or VPC endpoints for egress (CloudWatch Logs, X-Ray) +- Function invocations bypass VPC (routed through Lambda service) +- Recommended VPC endpoints: CloudWatch Logs, X-Ray, S3, DynamoDB, SQS + +## CLI Workflow + +```bash +# 1. Create capacity provider +aws lambda create-capacity-provider \ + --capacity-provider-name my-cp \ + --vpc-config SubnetIds=[$SUBNET1,$SUBNET2,$SUBNET3],SecurityGroupIds=[$SG_ID] \ + --permissions-config CapacityProviderOperatorRoleArn=arn:aws:iam::$ACCT:role/LMIOperatorRole \ + --instance-requirements Architectures=[arm64] \ + --capacity-provider-scaling-config MaxVCpuCount=30 + +# 2. 
Create function
aws lambda create-function --function-name my-fn --runtime python3.13 \ + --handler app.handler --zip-file fileb://function.zip \ + --role arn:aws:iam::$ACCT:role/LMIExecutionRole --architectures arm64 \ + --memory-size 4096 \ + --capacity-provider-config \ + "LambdaManagedInstancesCapacityProviderConfig={CapacityProviderArn=arn:aws:lambda:$REGION:$ACCT:capacity-provider:my-cp}" + +# 3. Publish version (triggers provisioning — takes several minutes) +aws lambda publish-version --function-name my-fn + +# 4. Invoke (must use versioned ARN) +aws lambda invoke --function-name my-fn:1 --cli-binary-format raw-in-base64-out --payload '{}' response.json +``` + +Architecture must match between function and capacity provider. + +## SAM Template + +```yaml +Resources: + MyCP: + Type: AWS::Lambda::CapacityProvider + Properties: + CapacityProviderName: my-cp + VpcConfig: + SubnetIds: [!Ref Sub1, !Ref Sub2, !Ref Sub3] + SecurityGroupIds: [!Ref SG] + PermissionsConfig: + CapacityProviderOperatorRoleArn: !GetAtt OpRole.Arn + InstanceRequirements: + Architectures: [arm64] + CapacityProviderScalingConfig: + MaxVCpuCount: 30 + + MyFn: + Type: AWS::Serverless::Function + Properties: + Runtime: python3.13 + Handler: app.handler + MemorySize: 4096 + Architectures: [arm64] + CapacityProviderConfig: + LambdaManagedInstancesCapacityProviderConfig: + CapacityProviderArn: !GetAtt MyCP.Arn +``` + +## Cleanup + +```bash +aws lambda delete-function --function-name my-fn +aws lambda delete-capacity-provider --capacity-provider-name my-cp +``` + +Deleting the capacity provider destroys all associated EC2 instances.
diff --git a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/migration-patterns.md b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/migration-patterns.md new file mode 100644 index 00000000..c0e302a0 --- /dev/null +++ b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/migration-patterns.md @@ -0,0 +1,128 @@ +# LMI Migration Patterns + +Before/after code examples for migrating to multi-concurrency. + +## Node.js + +### Global State +```javascript +// BEFORE (race condition) +let requestCount = 0; +exports.handler = async (event) => { + requestCount++; + return { count: requestCount }; +}; + +// AFTER (request-isolated) +const { AsyncLocalStorage } = require('node:async_hooks'); +const als = new AsyncLocalStorage(); +exports.handler = async (event) => { + return als.run({ id: event.requestContext?.requestId }, async () => { + return await processEvent(event); + }); +}; +``` + +### File I/O +```javascript +// BEFORE (shared path) +fs.writeFileSync('/tmp/output.json', JSON.stringify(data)); + +// AFTER (request-unique path) +const path = `/tmp/output-${event.requestContext?.requestId}.json`; +try { fs.writeFileSync(path, JSON.stringify(data)); } +finally { fs.unlinkSync(path); } +``` + +### Database +```javascript +// BEFORE (per-invocation connection) +exports.handler = async (event) => { + const conn = await mysql.createConnection({/*...*/}); + const [rows] = await conn.execute('SELECT ...'); + await conn.end(); +}; + +// AFTER (shared pool) +const pool = mysql.createPool({ connectionLimit: 10, /*...*/ }); +exports.handler = async (event) => { + const [rows] = await pool.execute('SELECT ...'); + return rows; +}; +``` + +## Python + +### Global State +```python +# BEFORE (race condition) +cache = {} +def handler(event, context): + cache[event['key']] = compute(event) + +# AFTER (thread-safe) +import threading +_lock = threading.Lock() +_cache = {} +def handler(event, context): + with _lock: + if 
event['key'] not in _cache: + _cache[event['key']] = compute(event) + return _cache[event['key']] +``` + +### File I/O +```python +# BEFORE +with open('/tmp/data.json', 'w') as f: json.dump(event, f) + +# AFTER +path = f'/tmp/data-{context.aws_request_id}.json' +try: + with open(path, 'w') as f: json.dump(event, f) +finally: + os.unlink(path) +``` + +### Database +```python +# BEFORE (per-invocation) +def handler(event, context): + conn = psycopg2.connect(host='...') + +# AFTER (pool) +from psycopg2 import pool +db_pool = pool.ThreadedConnectionPool(2, 10, host=os.environ['DB_HOST']) +def handler(event, context): + conn = db_pool.getconn() + try: return query(conn, event) + finally: db_pool.putconn(conn) +``` + +## Java + +### Global State +```java +// BEFORE (race condition) +private static Map cache = new HashMap<>(); + +// AFTER (thread-safe) +private static final ConcurrentHashMap cache = new ConcurrentHashMap<>(); +// Use cache.computeIfAbsent(key, k -> compute(k)); +``` + +### Database +```java +// BEFORE (per-invocation) +Connection conn = DriverManager.getConnection("jdbc:..."); + +// AFTER (HikariCP pool, static init) +private static final HikariDataSource ds; +static { + HikariConfig c = new HikariConfig(); + c.setJdbcUrl(System.getenv("DB_URL")); + c.setMaximumPoolSize(10); + ds = new HikariDataSource(c); +} +// Use: try (Connection conn = ds.getConnection()) { ... } +``` diff --git a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/thread-safety.md b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/thread-safety.md new file mode 100644 index 00000000..338e0477 --- /dev/null +++ b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/thread-safety.md @@ -0,0 +1,53 @@ +# Thread Safety for LMI + +LMI runs multiple invocations concurrently in the same execution environment. Code must be thread-safe. 
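For Python, request-scoped state can be isolated with `contextvars` instead of module globals. A sketch, using plain threads to stand in for concurrent invocations (the `invoke` wrapper and event shape are illustrative):

```python
import contextvars
import threading

# a ContextVar replaces a module-level global for per-request data
request_id = contextvars.ContextVar("request_id")

def handler(event):
    request_id.set(event["id"])   # visible only to this invocation's context
    return request_id.get()       # downstream code reads it without cross-talk

results = {}
def invoke(event):
    # each simulated invocation runs inside its own copied context
    results[event["id"]] = contextvars.copy_context().run(handler, event)

threads = [threading.Thread(target=invoke, args=({"id": i},)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results == {i: i for i in range(8)})  # True: no cross-request leakage
```

The same idea appears as `AsyncLocalStorage` in Node.js, `ThreadLocal` in Java, and `AsyncLocal<T>` in .NET, as covered in the runtime guidance below.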
+ +## Code Review Checklist + +When reviewing a function for LMI readiness, check each item: + +- [ ] No global/static mutable variables (use immutable or request-local state) +- [ ] No shared `/tmp` paths (use request ID in filenames, clean up after) +- [ ] Thread-safe libraries only (check DB drivers, HTTP clients, caching libs) +- [ ] Database connections use pools (initialized outside handler, not per-invocation) +- [ ] SDK clients outside handler (module-level singletons are fine — they are thread-safe) +- [ ] No request state in global scope (use AsyncLocalStorage, contextvars, ThreadLocal) +- [ ] Logging includes request ID (for tracing concurrent requests) +- [ ] No environment variable mutation during requests (os.environ is shared) + +## Runtime-Specific Guidance + +### Node.js +- Async/await model naturally suits multi-concurrency +- Use `AsyncLocalStorage` from `node:async_hooks` for request context +- Initialize SDK clients and DB pools at module level +- Avoid mutable module-level state (mutating a module-level `let count = 0` across concurrent invocations is a race condition) + +### Python +- Each execution environment is its own process; concurrent requests within an environment share that process, and the GIL limits true CPU parallelism +- Use `contextvars` for request-specific data +- Use `threading.Lock` for shared mutable state +- Prefer 4:1 or 8:1 memory ratio (GIL limits CPU utilization) +- Use `ThreadedConnectionPool` for database connections + +### Java +- Use immutable objects and thread-safe collections (`ConcurrentHashMap`, `Collections.unmodifiableList`) +- Initialize SDK clients and connection pools in constructor or static block +- Avoid mutable `static` fields +- Use `ThreadLocal` for request-specific state +- Use HikariCP or similar for connection pooling + +### .NET +- Use `AsyncLocal` for request-scoped data +- Inject scoped services via DI container +- Initialize `HttpClient` and SDK clients as singletons +- Use `ConcurrentDictionary` instead of `Dictionary` for shared state + +## Common Anti-Patterns + +| Anti-pattern |
Risk | Fix | +|-------------|------|-----| +| Creating a new HTTP client per invocation | Wasted connections | Module-level initialization | +| Setting env vars during request | Race condition | Pass state via parameters | +| Logging without request ID | Unreadable interleaved logs | Include aws_request_id | +| Assuming sequential execution | State corruption | Each invocation must be self-contained | diff --git a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/troubleshooting.md b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/troubleshooting.md new file mode 100644 index 00000000..16dd6d2b --- /dev/null +++ b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/troubleshooting.md @@ -0,0 +1,42 @@ +# LMI Troubleshooting + +## Common Issues + +| Issue | Cause | Resolution | +|-------|-------|------------| +| 429 throttles during scale-up | Traffic doubled faster than 5-min scaling window | Increase MinExecutionEnvironments or lower TargetResourceUtilization | +| Function stuck in PENDING | Capacity provider provisioning instances | Wait several minutes; verify VPC subnets have IP capacity and IAM roles are correct | +| Architecture mismatch error | Function architecture ≠ capacity provider | Align both to arm64 or x86_64 | +| Cannot terminate EC2 instances | LMI instances managed by capacity provider | Delete capacity provider to destroy instances; cannot use EC2 console | +| High CPU, low throughput | Concurrency too high for CPU-bound work | Reduce PerExecutionEnvironmentMaxConcurrency to 1/vCPU | +| Race conditions in production | Code not thread-safe for multi-concurrency | Review with checklist in thread-safety.md | +| Function version not ACTIVE | Fewer than 3 execution environments ready | Wait for provisioning; check capacity provider status | +| Unexpected 500 errors | Unhandled concurrent access to shared state | Add thread-safe patterns from migration-patterns.md | +| CloudWatch logs missing | VPC
egress not configured | Add NAT Gateway or CloudWatch Logs VPC endpoint | +| High costs despite low traffic | Minimum 3 instances always running | Evaluate if standard Lambda is more cost-effective | + +## Debugging Steps + +### Function Not Starting + +1. Check capacity provider status: `aws lambda get-capacity-provider --capacity-provider-name <name>` +2. Verify subnets span 3+ AZs with available IPs +3. Confirm security group allows necessary egress +4. Check operator role has `AWSLambdaManagedEC2ResourceOperator` policy +5. Look for `Operator` field in EC2 DescribeInstances or `aws:lambda:capacity-provider` tag + +### Performance Issues + +1. Check CloudWatch metrics (5-min intervals): CPU utilization, memory, concurrency/env +2. If CPU > 80%: reduce concurrency or add vCPUs (increase memory with appropriate ratio) +3. If throttles > 1%: increase MinExecutionEnvironments +4. If CPU < 20%: increase concurrency — resources are underutilized +5. For Python: verify 4:1 or 8:1 ratio (GIL limits CPU parallelism) + +### Cost Issues + +1. Verify instance count matches actual need (not over-provisioned) +2. Check if Savings Plans or RIs are applied to these instances +3. Compare actual costs against the 4-column estimate from cost-comparison.md +4. If traffic is lower than expected, consider reducing MaxVCpuCount +5.
For dev/test: use ExcludedInstanceTypes to avoid expensive instance families From 2ca247bf46c1d9e5438f898bc513a7c3d3327b68 Mon Sep 17 00:00:00 2001 From: Sahil Bhimjiani Date: Mon, 20 Apr 2026 16:21:06 -0500 Subject: [PATCH 2/8] feat: register LMI skill in serverless plugin and update cross-references - Add managed-instances and lmi keywords to plugin.json - Add LMI skill triggers to README aws-serverless section - Add cross-reference to LMI skill in aws-lambda SKILL.md (key capabilities and "When to Load Reference Files" sections) - Update plugin description to mention Lambda Managed Instances --- README.md | 5 +++-- plugins/aws-serverless/.claude-plugin/plugin.json | 2 ++ plugins/aws-serverless/skills/aws-lambda/SKILL.md | 2 ++ 3 files changed, 7 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 1e51475e..2d75aa86 100644 --- a/README.md +++ b/README.md @@ -32,7 +32,7 @@ To maximize the benefits of plugin-assisted development while maintaining securi | --------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------- | | **amazon-location-service** | Add maps, geocoding, routing, places search, and geospatial features to applications with Amazon Location Service | Available | | **aws-amplify** | Build full-stack apps with AWS Amplify Gen 2 using guided workflows for auth, data, storage, and functions | Available | -| **aws-serverless** | Build serverless applications with Lambda, API Gateway, EventBridge, Step Functions, and durable functions | Available | +| **aws-serverless** | Build serverless applications with Lambda, API Gateway, EventBridge, Step Functions, durable functions, and Lambda Managed Instances | Available | | **databases-on-aws** | Database guidance for the AWS database portfolio — schema design, queries, migrations, and 
multi-tenant patterns | Some Services Available (Aurora DSQL) | | **deploy-on-aws** | Deploy applications to AWS with architecture recommendations, cost estimates, and IaC deployment | Available | | **migration-to-aws** | Migrate GCP infrastructure to AWS with resource discovery, architecture mapping, cost analysis, and execution planning | Available | @@ -205,7 +205,7 @@ Build full-stack apps with AWS Amplify Gen 2 using TypeScript code-first develop ## aws-serverless -Design, build, deploy, test, and debug serverless applications with AWS Lambda, API Gateway, EventBridge, Step Functions, and durable functions. Includes SAM and CDK deployment workflows, a SAM template validation hook, and the AWS Lambda durable functions skill for building resilient, long-running, multi-step applications. +Design, build, deploy, test, and debug serverless applications with AWS Lambda, API Gateway, EventBridge, Step Functions, and durable functions. Includes SAM and CDK deployment workflows, a SAM template validation hook, the AWS Lambda durable functions skill for building resilient, long-running, multi-step applications, and the Lambda Managed Instances skill for evaluating, configuring, and migrating workloads to EC2-backed Lambda. 
### Agent Skill Triggers @@ -214,6 +214,7 @@ Design, build, deploy, test, and debug serverless applications with AWS Lambda, | **aws-lambda** | "Lambda function", "event source", "serverless application", "API Gateway", "EventBridge", "Step Functions", "serverless API", "event-driven architecture", "Lambda trigger" | | **aws-serverless-deployment** | "use SAM", "SAM template", "SAM init", "SAM deploy", "CDK serverless", "CDK Lambda construct", "NodejsFunction", "PythonFunction", "serverless CI/CD pipeline" | | **aws-lambda-durable-functions** | "lambda durable functions", "workflow orchestration", "state machines", "retry/checkpoint patterns", "long-running stateful Lambda", "saga pattern", "human-in-the-loop" | +| **aws-lambda-managed-instances** | "Lambda Managed Instances", "LMI", "capacity provider", "multi-concurrency Lambda", "EC2-backed Lambda", "cold start elimination", "Graviton Lambda", "Lambda cost optimization with Reserved Instances" | ### MCP Servers diff --git a/plugins/aws-serverless/.claude-plugin/plugin.json b/plugins/aws-serverless/.claude-plugin/plugin.json index 2b0d7ddf..46190f3b 100644 --- a/plugins/aws-serverless/.claude-plugin/plugin.json +++ b/plugins/aws-serverless/.claude-plugin/plugin.json @@ -8,6 +8,8 @@ "aws", "lambda", "durable functions", + "managed-instances", + "lmi", "serverless", "development", "sam", diff --git a/plugins/aws-serverless/skills/aws-lambda/SKILL.md b/plugins/aws-serverless/skills/aws-lambda/SKILL.md index 9e074af2..4dd14e2f 100644 --- a/plugins/aws-serverless/skills/aws-lambda/SKILL.md +++ b/plugins/aws-serverless/skills/aws-lambda/SKILL.md @@ -16,6 +16,7 @@ Use SAM CLI for project initialization and deployment, Lambda Web Adapter for we - **Web Application Deployment**: Deploy full-stack applications with Lambda Web Adapter - **Event Source Mappings**: Configure Lambda triggers for DynamoDB, Kinesis, SQS, Kafka - **Lambda durable functions**: Resilient multi-step applications with checkpointing — see the 
[durable-functions skill](../aws-lambda-durable-functions/) for guidance +- **Lambda Managed Instances**: Run Lambda on dedicated EC2 instances with managed lifecycle — see the [managed-instances skill](../aws-lambda-managed-instances/) for evaluation, configuration, and migration guidance - **Schema Management**: Type-safe EventBridge integration with schema registry - **Observability**: CloudWatch logs, metrics, and X-Ray tracing - **Performance Optimization**: Right-sizing, cost optimization, and troubleshooting @@ -30,6 +31,7 @@ Load the appropriate reference file based on what the user is working on: - **Event sources**, **DynamoDB Streams**, **Kinesis**, **SQS**, **Kafka**, **S3 notifications**, or **SNS** -> see [references/event-sources.md](references/event-sources.md) - **EventBridge**, **event bus**, **event patterns**, **event design**, **Pipes**, or **schema registry** -> see [references/event-driven-architecture.md](references/event-driven-architecture.md) - **Durable functions**, **checkpointing**, **replay model**, **saga pattern**, or **long-running Lambda workflows** -> see the [durable-functions skill](../aws-lambda-durable-functions/) (separate skill in this plugin with full SDK reference, testing, and deployment guides) +- **Lambda Managed Instances**, **LMI**, **capacity providers**, **multi-concurrency**, **EC2-backed Lambda**, **cold start elimination**, or **Lambda cost optimization with Reserved Instances** -> see the [managed-instances skill](../aws-lambda-managed-instances/) (separate skill in this plugin for evaluation, configuration, and migration) - **Orchestration**, **workflows**, or **Durable Functions vs Step Functions** -> see [references/orchestration-and-workflows.md](references/orchestration-and-workflows.md) - **Step Functions**, **ASL**, **state machines**, **JSONata**, **Distributed Map**, or **SDK integrations** -> see [references/step-functions.md](references/step-functions.md) - **Step Functions testing**, **TestState 
API**, **mocking service integrations**, or **state machine unit tests** -> see [references/step-functions-testing.md](references/step-functions-testing.md) From 6645a697ee20600d67a7513dd78490a1f33a52ce Mon Sep 17 00:00:00 2001 From: Sahil Bhimjiani Date: Fri, 8 May 2026 16:55:23 -0500 Subject: [PATCH 3/8] fix(aws-serverless): address PR feedback and correct LMI concurrency model - Fix Python concurrency model: process-based isolation, no thread safety needed - Clarify cold starts claim (provisioned capacity only) - Replace hardcoded regional availability with docs link - Replace LocalStack reference with generic non-prod testing - Convert Step 5 to numbered procedure - Simplify cost comparison (defer SP/RI to pricing calculator) - Use least-privilege IAM with ec2:ManagedResourceOperator condition - Add setup-lmi.sh script for automated provisioning - Add VPC endpoint requirements table with costs - Add Powertools compatibility and SDK minimum versions - Add CloudWatch metrics dimensions guidance - Add pricing calculator, samples repo, and migration blog to resources --- .../aws-lambda-managed-instances/SKILL.md | 47 +++-- .../references/configuration-guide.md | 10 + .../references/cost-comparison.md | 5 + .../references/infrastructure-setup.md | 172 +++++++++++++++--- .../references/migration-patterns.md | 42 +++-- .../references/thread-safety.md | 103 ++++++++--- .../scripts/setup-lmi.sh | 69 +++++++ 7 files changed, 364 insertions(+), 84 deletions(-) create mode 100755 plugins/aws-serverless/skills/aws-lambda-managed-instances/scripts/setup-lmi.sh diff --git a/plugins/aws-serverless/skills/aws-lambda-managed-instances/SKILL.md b/plugins/aws-serverless/skills/aws-lambda-managed-instances/SKILL.md index 10cfc77a..fdeb8ab7 100644 --- a/plugins/aws-serverless/skills/aws-lambda-managed-instances/SKILL.md +++ b/plugins/aws-serverless/skills/aws-lambda-managed-instances/SKILL.md @@ -23,9 +23,9 @@ For standard Lambda development, see [aws-lambda skill](../aws-lambda/). 
For SAM - **Cost comparison**, **pricing analysis**, **Lambda vs LMI cost**, **Savings Plans**, or **Reserved Instances** -> see [references/cost-comparison.md](references/cost-comparison.md) - **Instance types**, **memory sizing**, **vCPU ratios**, **scaling tuning**, or **capacity provider config** -> see [references/configuration-guide.md](references/configuration-guide.md) -- **Thread safety**, **code review checklist**, or **multi-concurrency readiness** -> see [references/thread-safety.md](references/thread-safety.md) +- **Thread safety**, **concurrency model**, **code review checklist**, **Powertools compatibility**, or **multi-concurrency readiness** -> see [references/thread-safety.md](references/thread-safety.md) - **Before/after code examples**, **runtime-specific migration** (Node.js, Python, Java, .NET), or **connection pooling** -> see [references/migration-patterns.md](references/migration-patterns.md) -- **IAM roles**, **VPC setup**, **CLI commands**, **SAM template**, or **CDK example** -> see [references/infrastructure-setup.md](references/infrastructure-setup.md) +- **IAM roles**, **VPC setup**, **CLI commands**, **SAM template**, or **CDK example** -> see [references/infrastructure-setup.md](references/infrastructure-setup.md) and [scripts/setup-lmi.sh](scripts/setup-lmi.sh) - **Errors**, **throttling**, **debugging**, or **stuck deployments** -> see [references/troubleshooting.md](references/troubleshooting.md) ## Quick Decision: Is LMI Right for This Workload? @@ -34,11 +34,11 @@ For standard Lambda development, see [aws-lambda skill](../aws-lambda/). 
For SAM |--------|---------------------|---------------------------| | Traffic | Steady, predictable, 50M+ req/mo | Bursty, unpredictable, long idle | | Cost | Duration-heavy spend at scale | Low or sporadic invocations | -| Cold starts | Unacceptable (LMI has zero) | Tolerable or mitigated by SnapStart | -| Compute | Latest CPUs, specific families, high network BW | Standard Lambda memory/CPU sufficient | -| Compliance | Single-tenant required, VPC control | Multi-tenant Firecracker acceptable | +| Cold starts | Unacceptable (LMI eliminates for provisioned capacity; scale-out may have brief delays) | Tolerable or mitigated by SnapStart | +| Compute | Latest CPUs, specific families, high network bandwidth | Standard Lambda memory/CPU sufficient | +| Isolation | Dedicated EC2 instances in your account, full VPC control | Shared Firecracker micro-VMs acceptable | | Scale-to-zero | Not needed (min 3 instances always run) | Required (pay nothing when idle) | -| Code readiness | Thread-safe or feasible to refactor | Non-thread-safe, expensive to change | +| Code readiness | Thread-safe (Node.js/Java/.NET) or any Python code | Non-thread-safe Node.js/Java/.NET, expensive to change | ## Instructions @@ -51,23 +51,21 @@ Gather these signals before recommending: 3. **Runtime**: Node.js, Java, .NET, or Python? 4. **Memory/CPU**: How much memory? CPU-bound or I/O-bound? 5. **Execution duration**: Average and P99? -6. **Thread safety**: Mutable globals, shared `/tmp` paths, non-thread-safe libs? +6. **Concurrency readiness**: Thread safety (Node.js/Java/.NET)? Shared `/tmp` paths? Per-invocation DB connections? 7. **VPC**: Already in a VPC? Private resource access needed? ### Step 2: Build the Cost Comparison -REQUIRED: Present a 4-column comparison before recommending LMI. +REQUIRED: Present a cost comparison before recommending LMI. 
Compare at minimum: | Scenario | When it wins | |----------|-------------| | Lambda on-demand | Low volume, bursty traffic | -| Lambda + Savings Plan | Moderate steady volume (~17% duration discount) | | LMI on-demand | High volume, steady traffic | -| LMI + 3yr Savings Plan | High volume + commitment (up to 72% EC2 discount) | Rule of thumb: LMI becomes cost-competitive at 50-100M+ req/month with steady traffic. -See [references/cost-comparison.md](references/cost-comparison.md) for formulas, worked example, and comparison table template. +For discount analysis (Savings Plans, Reserved Instances), refer users to the [AWS Pricing Calculator](https://calculator.aws/) and [references/cost-comparison.md](references/cost-comparison.md) for formulas and worked examples. Discount recommendations require workload-specific forecasting beyond this skill's scope. ### Step 3: Configure the Deployment @@ -83,21 +81,30 @@ See [references/configuration-guide.md](references/configuration-guide.md) for d ### Step 4: Migrate the Code -Review code for thread safety. LMI runs multiple invocations concurrently per execution environment. +Review code for concurrency safety. LMI runs multiple invocations concurrently per execution environment, but the model differs by runtime: -**Common issues**: mutable globals, shared `/tmp` paths, non-thread-safe libs, per-invocation DB connections. +- **Python**: Process-based isolation — globals are NOT shared. No thread-safety changes needed. Focus on `/tmp` conflicts and memory sizing (per-process × concurrency). +- **Node.js**: Worker threads — globals shared within a worker. Requires async safety. Callback handlers not supported on Node.js 22. +- **Java/.NET**: OS threads/Tasks — handler shared across threads. Requires full thread safety. + +**Common issues (all runtimes)**: shared `/tmp` paths, per-invocation DB connections. +**Thread-safety issues (Node.js/Java/.NET only)**: mutable globals, non-thread-safe libs. 
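The shared-`/tmp` issue named above applies to every runtime, since all concurrent invocations on an instance see the same filesystem. A minimal Python sketch of the request-unique-path fix; the `request_id` parameter stands in for `context.aws_request_id` on Lambda:

```python
import json
import os


def write_scratch_file(request_id, payload):
    """Write per-request scratch data under /tmp without colliding
    with concurrent invocations that share the same filesystem."""
    # Request-unique path: on Lambda, pass context.aws_request_id here.
    path = f"/tmp/data-{request_id}.json"
    with open(path, "w") as f:
        json.dump(payload, f)
    return path


def cleanup_scratch_file(path):
    """Remove the scratch file so long-lived environments don't fill /tmp."""
    if os.path.exists(path):
        os.unlink(path)
```

Wrapping the handler body in `try`/`finally` with `cleanup_scratch_file` ensures cleanup even when the invocation raises.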
See [references/thread-safety.md](references/thread-safety.md) for the review checklist and [references/migration-patterns.md](references/migration-patterns.md) for runtime-specific before/after code. ### Step 5: Set Up Infrastructure -Two IAM roles required (execution + operator). VPC with 3+ AZ subnets. Create capacity provider, attach function, publish version. +1. Create two IAM roles: execution role (for the function) and operator role (for capacity provider EC2 management) +2. Configure VPC with subnets across 3+ AZs +3. Create capacity provider with VPC config and scaling limits +4. Create or update function with capacity provider attachment +5. Publish a version (triggers instance provisioning) -See [references/infrastructure-setup.md](references/infrastructure-setup.md) for CLI commands, SAM, and CDK templates. +See [references/infrastructure-setup.md](references/infrastructure-setup.md) for CLI commands and SAM templates. ### Step 6: Validate and Cut Over -1. Test locally with LocalStack (supports LMI emulation) +1. Deploy to a non-production environment first 2. Monitor CloudWatch: CPU utilization, memory, concurrency, throttle rate 3. Gradual traffic shift with weighted aliases (10% → 50% → 100%) 4. Compare costs after 1-2 weeks of production data @@ -116,7 +123,8 @@ See [references/infrastructure-setup.md](references/infrastructure-setup.md) for ### Migration -- Do: Review all code for thread safety before attaching to capacity provider +- Do: Start with I/O-heavy functions (benefit most from multi-concurrency; CPU-bound functions compete for same CPU) +- Do: Review code for concurrency safety before attaching to capacity provider (thread safety for Node.js/Java/.NET; `/tmp` and memory for Python) - Do: Use weighted aliases for gradual traffic shift - Do: Include request IDs in all log statements - Do: Initialize DB pools and SDK clients outside the handler @@ -164,7 +172,7 @@ REQUIRED: AWS credentials configured on the host machine. 
### Regional Availability -us-east-1, us-east-2, us-west-2, ap-northeast-1, eu-west-1 +Check the [Lambda Managed Instances documentation](https://docs.aws.amazon.com/lambda/latest/dg/lambda-managed-instances.html) for current regional availability. ## Language Selection @@ -202,4 +210,7 @@ Override: "use SAM" → SAM YAML, "use CloudFormation" → CloudFormation YAML. - [Lambda Managed Instances Docs](https://docs.aws.amazon.com/lambda/latest/dg/lambda-managed-instances.html) - [Introducing LMI (AWS Blog)](https://aws.amazon.com/blogs/aws/introducing-aws-lambda-managed-instances-serverless-simplicity-with-ec2-flexibility/) - [Build High-Performance Apps with LMI](https://aws.amazon.com/blogs/compute/build-high-performance-apps-with-aws-lambda-managed-instances/) +- [Migrating Functions to LMI (AWS Blog)](https://aws.amazon.com/blogs/compute/migrating-your-functions-to-aws-lambda-managed-instances/) +- [LMI Pricing Calculator](https://aws-samples.github.io/sample-aws-lambda-managed-instances/) +- [LMI Samples Repository](https://github.com/aws-samples/sample-aws-lambda-managed-instances) - [AWS Lambda Pricing](https://aws.amazon.com/lambda/pricing/) diff --git a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/configuration-guide.md b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/configuration-guide.md index 5fa7e49f..ff0162d6 100644 --- a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/configuration-guide.md +++ b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/configuration-guide.md @@ -57,3 +57,13 @@ Total capacity = MinExecutionEnvironments × PerExecutionEnvironmentMaxConcurren - **CPU < 20%**: increase concurrency for better utilization - **Throttle rate (429s) > 1%**: increase MinExecutionEnvironments or reduce utilization target - **Memory > 90%**: increase memory or reduce concurrency +- **ExecutionEnvironmentConcurrency near ExecutionEnvironmentConcurrencyLimit**: 
saturation — reduce concurrency or scale out + +## CloudWatch Metrics Dimensions + +LMI metrics are split across two CloudWatch dimensions: + +- **Alias (live)**: Invocations, Errors, Throttles, Duration +- **Version ($LATEST or numbered)**: CPUUtilization, MemoryUtilization, ExecutionEnvironmentConcurrency, ExecutionEnvironmentCount + +Create a unified dashboard combining both views to monitor LMI performance effectively. diff --git a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/cost-comparison.md b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/cost-comparison.md index 11112244..1bc16a76 100644 --- a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/cost-comparison.md +++ b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/cost-comparison.md @@ -70,3 +70,8 @@ Node.js API, 100 req/s steady (259M req/mo), 200ms avg, 512 MB, x86: - Very short functions (< 100ms duration) - Highly bursty, unpredictable traffic - Workloads needing scale-to-zero + +## Tools + +- [LMI Pricing Calculator](https://aws-samples.github.io/sample-aws-lambda-managed-instances/) — interactive comparison tool +- [AWS Pricing Calculator](https://calculator.aws/) — general AWS cost estimation diff --git a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/infrastructure-setup.md b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/infrastructure-setup.md index 7ccb188a..50731535 100644 --- a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/infrastructure-setup.md +++ b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/infrastructure-setup.md @@ -3,49 +3,179 @@ ## IAM Roles (Two Required) ### 1. 
Execution Role (for the function) -```bash -aws iam create-role --role-name LMIExecutionRole \ - --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"lambda.amazonaws.com"},"Action":"sts:AssumeRole"}]}' -aws iam attach-role-policy --role-name LMIExecutionRole \ - --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole + +Trust policy: +```json +{ + "Version": "2012-10-17", + "Statement": [{ + "Effect": "Allow", + "Principal": {"Service": "lambda.amazonaws.com"}, + "Action": "sts:AssumeRole" + }] +} +``` + +Minimum permissions: +```json +{ + "Version": "2012-10-17", + "Statement": [ + { + "Effect": "Allow", + "Action": [ + "logs:CreateLogGroup", + "logs:CreateLogStream", + "logs:PutLogEvents" + ], + "Resource": "arn:aws:logs:*:*:log-group:/aws/lambda/*" + } + ] +} +``` + +Add VPC permissions only if the function accesses VPC resources: +```json +{ + "Effect": "Allow", + "Action": [ + "ec2:CreateNetworkInterface", + "ec2:DescribeNetworkInterfaces", + "ec2:DeleteNetworkInterface" + ], + "Resource": "*" +} ``` ### 2. 
Operator Role (for capacity provider EC2 management) -```bash -aws iam create-role --role-name LMIOperatorRole \ - --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"lambda.amazonaws.com"},"Action":"sts:AssumeRole"}]}' -aws iam attach-role-policy --role-name LMIOperatorRole \ - --policy-arn arn:aws:iam::aws:policy/AWSLambdaManagedEC2ResourceOperator + +Trust policy: +```json +{ + "Version": "2012-10-17", + "Statement": [{ + "Effect": "Allow", + "Principal": {"Service": "lambda.amazonaws.com"}, + "Action": "sts:AssumeRole" + }] +} +``` + +Minimum permissions (scoped with conditions): +```json +{ + "Version": "2012-10-17", + "Statement": [ + { + "Effect": "Allow", + "Action": ["ec2:RunInstances", "ec2:CreateTags", "ec2:AttachNetworkInterface"], + "Resource": [ + "arn:aws:ec2:*:*:instance/*", + "arn:aws:ec2:*:*:network-interface/*", + "arn:aws:ec2:*:*:volume/*" + ], + "Condition": { + "StringEquals": { + "ec2:ManagedResourceOperator": "scaler.lambda.amazonaws.com" + } + } + }, + { + "Effect": "Allow", + "Action": [ + "ec2:DescribeAvailabilityZones", + "ec2:DescribeCapacityReservations", + "ec2:DescribeInstances", + "ec2:DescribeInstanceStatus", + "ec2:DescribeInstanceTypeOfferings", + "ec2:DescribeInstanceTypes", + "ec2:DescribeSecurityGroups", + "ec2:DescribeSubnets" + ], + "Resource": "*" + }, + { + "Effect": "Allow", + "Action": ["ec2:RunInstances", "ec2:CreateNetworkInterface"], + "Resource": [ + "arn:aws:ec2:*:*:subnet/*", + "arn:aws:ec2:*:*:security-group/*" + ] + }, + { + "Effect": "Allow", + "Action": "ec2:RunInstances", + "Resource": "arn:aws:ec2:*:*:image/*", + "Condition": { + "StringEquals": { "ec2:Owner": "amazon" } + } + }, + { + "Effect": "Allow", + "Action": "iam:PassRole", + "Resource": "" + } + ] +} ``` -First-time capacity provider creation also requires `iam:CreateServiceLinkedRole`. 
+The `ec2:ManagedResourceOperator` condition ensures RunInstances/CreateTags only apply to Lambda-managed instances. First-time capacity provider creation also requires `iam:CreateServiceLinkedRole`. ## VPC Requirements +LMI runs functions on EC2 instances inside the VPC. These instances need VPC endpoints or NAT to reach AWS services. + - 3+ subnets across different AZs (for default 3-instance fleet) -- Security groups restricting to necessary traffic only -- NAT Gateway or VPC endpoints for egress (CloudWatch Logs, X-Ray) -- Function invocations bypass VPC (routed through Lambda service) -- Recommended VPC endpoints: CloudWatch Logs, X-Ray, S3, DynamoDB, SQS +- Security groups: HTTPS egress (port 443) for AWS API calls; no ingress needed +- Required VPC endpoints: + +| Endpoint | Type | Purpose | Cost | +|----------|------|---------|------| +| S3 | Gateway | Object storage access | Free | +| DynamoDB | Gateway | Table access | Free | +| SQS | Interface | Queue operations | $0.01/hr per AZ | +| CloudWatch Logs | Interface | Log delivery | $0.01/hr per AZ | +| CloudWatch Monitoring | Interface | Metrics/EMF | $0.01/hr per AZ | +| X-Ray | Interface | Distributed tracing | $0.01/hr per AZ | + +Gateway endpoints are free; interface endpoints incur hourly charges per AZ. ## CLI Workflow +Use the setup script for automated provisioning: + +```bash +# Set required environment variables +export SUBNET_IDS="subnet-abc,subnet-def,subnet-ghi" +export SECURITY_GROUP_ID="sg-123456" +export ACCOUNT_ID="123456789012" +export OPERATOR_ROLE_ARN="arn:aws:iam::123456789012:role/LMIOperatorRole" +export EXECUTION_ROLE_ARN="arn:aws:iam::123456789012:role/LMIExecutionRole" + +# Run setup +./scripts/setup-lmi.sh my-function my-capacity-provider arm64 +``` + +See [`scripts/setup-lmi.sh`](../scripts/setup-lmi.sh) for the full script with configurable options. + +### Manual Steps (if not using the script) + ```bash # 1. 
Create capacity provider aws lambda create-capacity-provider \ --capacity-provider-name my-cp \ - --vpc-config SubnetIds=[$SUBNET1,$SUBNET2,$SUBNET3],SecurityGroupIds=[$SG_ID] \ - --permissions-config CapacityProviderOperatorRoleArn=arn:aws:iam::$ACCT:role/LMIOperatorRole \ - --instance-requirements Architectures=[arm64] \ - --capacity-provider-scaling-config MaxVCpuCount=30 + --vpc-config "SubnetIds=[subnet-abc,subnet-def,subnet-ghi],SecurityGroupIds=[sg-123456]" \ + --permissions-config "CapacityProviderOperatorRoleArn=arn:aws:iam::$ACCT:role/LMIOperatorRole" \ + --instance-requirements "Architectures=[arm64]" \ + --capacity-provider-scaling-config "MaxVCpuCount=30" # 2. Create function aws lambda create-function --function-name my-fn --runtime python3.13 \ --handler app.handler --zip-file fileb://function.zip \ - --role arn:aws:iam::$ACCT:role/LMIExecutionRole --architectures arm64 \ + --role "arn:aws:iam::$ACCT:role/LMIExecutionRole" --architectures arm64 \ --memory-size 4096 \ --capacity-provider-config \ - LambdaManagedInstancesCapacityProviderConfig='{CapacityProviderArn=arn:aws:lambda:$REGION:$ACCT:capacity-provider:my-cp}' + "LambdaManagedInstancesCapacityProviderConfig={CapacityProviderArn=arn:aws:lambda:$REGION:$ACCT:capacity-provider:my-cp}" # 3. Publish version (triggers provisioning — takes several minutes) aws lambda publish-version --function-name my-fn diff --git a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/migration-patterns.md b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/migration-patterns.md index c0e302a0..c26d9f0d 100644 --- a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/migration-patterns.md +++ b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/migration-patterns.md @@ -53,30 +53,27 @@ exports.handler = async (event) => { ## Python -### Global State +Python on LMI uses **process-based isolation**. 
Each concurrent invocation runs in its own process with independent memory. Global state is NOT shared, so no locking is needed. The main migration concerns are `/tmp` conflicts, memory sizing, and connection pooling. + +### Global State (No Changes Needed) ```python -# BEFORE (race condition) +# This is SAFE on LMI — each process has its own copy of cache cache = {} def handler(event, context): cache[event['key']] = compute(event) + return cache[event['key']] -# AFTER (thread-safe) -import threading -_lock = threading.Lock() -_cache = {} -def handler(event, context): - with _lock: - if event['key'] not in _cache: - _cache[event['key']] = compute(event) - return _cache[event['key']] +# Module-level clients are also safe (isolated per process) +s3_client = boto3.client('s3') +dynamodb = boto3.resource('dynamodb') ``` -### File I/O +### File I/O (Change Required — `/tmp` is shared across processes) ```python -# BEFORE +# BEFORE (conflict — all processes share /tmp) with open('/tmp/data.json', 'w') as f: json.dump(event, f) -# AFTER +# AFTER (request-unique path) path = f'/tmp/data-{context.aws_request_id}.json' try: with open(path, 'w') as f: json.dump(event, f) @@ -84,19 +81,28 @@ finally: os.unlink(path) ``` -### Database +### Database (Change Required — each process needs pooled connections) ```python -# BEFORE (per-invocation) +# BEFORE (per-invocation connection — exhausts limits at concurrency) def handler(event, context): conn = psycopg2.connect(host='...') -# AFTER (pool) +# AFTER (pool per process — initialized at module level) from psycopg2 import pool -db_pool = pool.ThreadedConnectionPool(2, 10, host=os.environ['DB_HOST']) +db_pool = pool.SimpleConnectionPool(1, 3, host=os.environ['DB_HOST']) def handler(event, context): conn = db_pool.getconn() try: return query(conn, event) finally: db_pool.putconn(conn) +# Note: total connections = pool_size × concurrency (e.g., 3 × 16 = 48) +``` + +### Memory Sizing +```python +# A function using 200 MB per process 
with default concurrency of 16: +# Total memory ≈ 200 MB × 16 = 3.2 GB +# Use 4:1 or 8:1 memory-to-vCPU ratio to accommodate +# Monitor MemoryUtilization metric and adjust as needed ``` ## Java diff --git a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/thread-safety.md b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/thread-safety.md index 338e0477..5e411f46 100644 --- a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/thread-safety.md +++ b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/thread-safety.md @@ -1,53 +1,102 @@ -# Thread Safety for LMI +# Concurrency Safety for LMI -LMI runs multiple invocations concurrently in the same execution environment. Code must be thread-safe. +LMI runs multiple invocations concurrently in the same execution environment. The concurrency model differs by runtime — some require thread safety, others provide process isolation. ## Code Review Checklist When reviewing a function for LMI readiness, check each item: -- [ ] No global/static mutable variables (use immutable or request-local state) -- [ ] No shared `/tmp` paths (use request ID in filenames, clean up after) -- [ ] Thread-safe libraries only (check DB drivers, HTTP clients, caching libs) +- [ ] No shared `/tmp` paths (use request ID in filenames, clean up after — shared across ALL runtimes) - [ ] Database connections use pools (initialized outside handler, not per-invocation) - [ ] SDK clients outside handler (module-level singletons are fine — they are thread-safe) -- [ ] No request state in global scope (use AsyncLocalStorage, contextvars, ThreadLocal) - [ ] Logging includes request ID (for tracing concurrent requests) -- [ ] No environment variable mutation during requests (os.environ is shared) +- [ ] **Node.js/Java/.NET only:** No global/static mutable variables (use immutable or request-local state) +- [ ] **Node.js/Java/.NET only:** Thread-safe libraries only (check DB drivers, 
HTTP clients, caching libs) +- [ ] **Node.js/Java/.NET only:** No request state in global scope (use AsyncLocalStorage, AsyncLocal, or ThreadLocal) +- [ ] **Node.js/Java/.NET only:** No environment variable mutation during requests +- [ ] **Python only:** Memory budget accounts for per-process multiplication (memory × concurrency) ## Runtime-Specific Guidance -### Node.js -- Async/await model naturally suits multi-concurrency +### Python (Process-Based Isolation) + +Python uses **multiple independent processes**, each with its own interpreter and memory space. Global variables, module-level caches, and singleton objects are duplicated per process, not shared. If a function works on standard Lambda today, it works on LMI without code changes related to shared state. + +**Key concerns:** +- Memory consumption: total footprint ≈ per-process memory × concurrency. A 200 MB function with 16 concurrent processes can consume 3+ GB. +- `/tmp` filesystem is shared across all processes — use `context.aws_request_id` in filenames +- Each process needs its own connection pool — size pools per-process, not globally +- Prefer 4:1 or 8:1 memory-to-vCPU ratio to accommodate memory multiplication +- Monitor `MemoryUtilization` metric and adjust ratio if needed + +**Safe patterns (no locking needed):** +- Module-level mutable globals (isolated per process) +- Module-level SDK clients and caches +- `os.environ` reads + +### Node.js (Worker Threads + Async/Await) + +Uses worker threads (configurable via `AWS_LAMBDA_NODEJS_WORKER_COUNT`) combined with async/await event loops. The handler and global state are **shared across concurrent invocations within a worker thread**. + +The `await` keyword yields control to the event loop, which may execute another invocation that overwrites shared state before the first resumes. 
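The overwrite-across-`await` hazard can be demonstrated with a runnable Python asyncio analogue (Python itself uses process isolation on LMI, so this is purely illustrative of the Node.js event-loop model; `contextvars` plays the role `AsyncLocalStorage` plays in Node.js):

```python
import asyncio
import contextvars

current_request = None  # shared mutable global: the hazard
request_ctx = contextvars.ContextVar("request_id")  # request-local: the fix


async def unsafe_handler(request_id):
    global current_request
    current_request = request_id
    await asyncio.sleep(0)  # yields; another invocation may run and overwrite
    return current_request  # may now belong to a different request


async def safe_handler(request_id):
    request_ctx.set(request_id)  # each asyncio task gets its own context copy
    await asyncio.sleep(0)
    return request_ctx.get()  # always this request's own id


async def main():
    corrupted = await asyncio.gather(unsafe_handler("a"), unsafe_handler("b"))
    isolated = await asyncio.gather(safe_handler("a"), safe_handler("b"))
    return corrupted, isolated
```

Running `main()` shows the unsafe variant returning the wrong request id for at least one invocation, while the context-variable variant stays correct.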
+ +**Key concerns:** - Use `AsyncLocalStorage` from `node:async_hooks` for request context -- Initialize SDK clients and DB pools at module level +- Keep mutable state within handler local scope +- Initialize SDK clients and DB pools at module level (they are thread-safe) - Avoid module-level mutable state (`let count = 0` is a race condition) +- Callback-based handlers are NOT supported on Node.js 22 — use async handlers + +### Java (OS Threads) -### Python -- Uses separate processes per env (GIL limits true threading), but concurrent requests still share the process -- Use `contextvars` for request-specific data -- Use `threading.Lock` for shared mutable state -- Prefer 4:1 or 8:1 memory ratio (GIL limits CPU utilization) -- Use `ThreadedConnectionPool` for database connections +Uses OS-level threads. Lambda loads the handler class once and invokes `handleRequest` from multiple threads simultaneously (identical to a Java app server). -### Java -- Use immutable objects and thread-safe collections (`ConcurrentHashMap`, `Collections.unmodifiableList`) +**Key concerns:** +- Use immutable objects and thread-safe collections (`ConcurrentHashMap`, `Collections.synchronizedList`) - Initialize SDK clients and connection pools in constructor or static block - Avoid mutable `static` fields - Use `ThreadLocal` for request-specific state -- Use HikariCP or similar for connection pooling +- Use HikariCP or similar for connection pooling (AWS SDK for Java 2.x clients are thread-safe) -### .NET +### .NET (Task-Based Concurrency) + +Uses a single process with .NET Tasks (same model as ASP.NET Core). The handler object is shared across all Tasks. 
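Both Java's `ThreadLocal` and .NET's `AsyncLocal` keep request state off the shared handler object. A runnable sketch of the same pattern using Python's `threading.local` for illustration (Python on LMI does not need this itself, since each process is isolated):

```python
import threading

request_state = threading.local()  # each OS thread sees its own attributes
results = {}


def handler(request_id):
    # Store request-scoped data thread-locally rather than in a plain global,
    # so concurrent threads running the same handler cannot clobber it.
    request_state.request_id = request_id
    results[request_id] = request_state.request_id  # reads this thread's copy


threads = [threading.Thread(target=handler, args=(rid,)) for rid in ("a", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each thread reads back its own value regardless of how the scheduler interleaves them, which is exactly the guarantee `ThreadLocal`/`AsyncLocal` provide for the shared handler.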
+ +**Key concerns:** - Use `AsyncLocal` for request-scoped data - Inject scoped services via DI container - Initialize `HttpClient` and SDK clients as singletons -- Use `ConcurrentDictionary` instead of `Dictionary` for shared state +- Use `ConcurrentDictionary` and `SemaphoreSlim` for thread-safe access +- Invocation timeouts are NOT enforced by the runtime — use `ILambdaContext.RemainingTime` to detect approaching timeouts +- Powertools for AWS Lambda (.NET) does not yet support Lambda Managed Instances ## Common Anti-Patterns -| Anti-pattern | Risk | Fix | -|-------------|------|-----| -| Singleton HTTP clients per invocation | Wasted connections | Module-level initialization | -| Setting env vars during request | Race condition | Pass state via parameters | -| Logging without request ID | Unreadable interleaved logs | Include aws_request_id | -| Assuming sequential execution | State corruption | Each invocation must be self-contained | +| Anti-pattern | Affected Runtimes | Risk | Fix | +|-------------|-------------------|------|-----| +| New DB connection per invocation | All | Exhausts connection limits | Module-level connection pool | +| Hardcoded `/tmp` paths | All | File conflicts across processes | Use `aws_request_id` in path | +| Logging without request ID | All | Unreadable interleaved logs | Include `aws_request_id` | +| Mutable module-level state | Node.js, Java, .NET | Race condition / state corruption | Request-local scope or concurrent collections | +| Setting env vars during request | Node.js, Java, .NET | Race condition | Pass state via parameters | +| Assuming sequential execution | Node.js, Java, .NET | State corruption | Each invocation must be self-contained | +| Ignoring memory multiplication | Python | OOM at high concurrency | Account for per-process × concurrency | + +## Powertools for AWS Lambda Compatibility + +Powertools handles multi-concurrency transparently (structured logging, tracing, metrics). No code changes needed. 
+ +| Runtime | Package | Minimum Version | +|---------|---------|-----------------| +| Python | Powertools for AWS Lambda (Python) | 3.23.0 | +| TypeScript | Powertools for AWS Lambda (TypeScript) | 2.29.0 | +| Java | Powertools for AWS Lambda (Java) | 2.8.0 | +| .NET | Powertools for AWS Lambda (.NET) | Not yet supported | + +AWS SDK and X-Ray minimum versions: + +| Runtime | AWS SDK minimum | X-Ray SDK minimum | +|---------|----------------|-------------------| +| Node.js | AWS SDK for JavaScript v3 (3.933.0) | 3.12.0 | +| Java | AWS SDK for Java 2.0 (2.34.0) | 2.20.0 | +| .NET | AWSSDK.Core (4.0.0.32) | AWSXRayRecorder.Core (2.16.0) | diff --git a/plugins/aws-serverless/skills/aws-lambda-managed-instances/scripts/setup-lmi.sh b/plugins/aws-serverless/skills/aws-lambda-managed-instances/scripts/setup-lmi.sh new file mode 100755 index 00000000..8485d6f9 --- /dev/null +++ b/plugins/aws-serverless/skills/aws-lambda-managed-instances/scripts/setup-lmi.sh @@ -0,0 +1,69 @@ +#!/usr/bin/env bash +set -euo pipefail + +# Setup script for AWS Lambda Managed Instances (LMI) +# Usage: ./setup-lmi.sh <function-name> <capacity-provider-name> [architecture] +# +# Prerequisites: +# - AWS CLI configured with appropriate credentials +# - VPC subnets and security group created +# - IAM roles created (see references/infrastructure-setup.md) +# +# Environment variables (required): +# SUBNET_IDS - Comma-separated subnet IDs (3+ AZs) +# SECURITY_GROUP_ID - Security group ID +# ACCOUNT_ID - AWS account ID +# OPERATOR_ROLE_ARN - ARN of the LMI operator role +# EXECUTION_ROLE_ARN - ARN of the Lambda execution role +# +# Environment variables (optional): +# AWS_REGION - AWS region (default: from AWS CLI config) +# MAX_VCPU_COUNT - Max vCPU limit (default: 30) +# MEMORY_SIZE - Function memory in MB (default: 4096) +# RUNTIME - Lambda runtime (default: python3.13) +# HANDLER - Function handler (default: app.handler) + +FUNCTION_NAME="${1:?Usage: $0 <function-name> <capacity-provider-name> [architecture]}" +CP_NAME="${2:?Usage: $0 <function-name> <capacity-provider-name> [architecture]}" +ARCHITECTURE="${3:-arm64}" + +: "${SUBNET_IDS:?Set SUBNET_IDS
(comma-separated, 3+ AZs)}" +: "${SECURITY_GROUP_ID:?Set SECURITY_GROUP_ID}" +: "${ACCOUNT_ID:?Set ACCOUNT_ID}" +: "${OPERATOR_ROLE_ARN:?Set OPERATOR_ROLE_ARN}" +: "${EXECUTION_ROLE_ARN:?Set EXECUTION_ROLE_ARN}" + +MAX_VCPU_COUNT="${MAX_VCPU_COUNT:-30}" +MEMORY_SIZE="${MEMORY_SIZE:-4096}" +RUNTIME="${RUNTIME:-python3.13}" +HANDLER="${HANDLER:-app.handler}" +REGION="${AWS_REGION:-$(aws configure get region)}" + +echo "==> Creating capacity provider: ${CP_NAME}" +aws lambda create-capacity-provider \ + --capacity-provider-name "${CP_NAME}" \ + --vpc-config "SubnetIds=[${SUBNET_IDS}],SecurityGroupIds=[${SECURITY_GROUP_ID}]" \ + --permissions-config "CapacityProviderOperatorRoleArn=${OPERATOR_ROLE_ARN}" \ + --instance-requirements "Architectures=[${ARCHITECTURE}]" \ + --capacity-provider-scaling-config "MaxVCpuCount=${MAX_VCPU_COUNT}" + +CP_ARN="arn:aws:lambda:${REGION}:${ACCOUNT_ID}:capacity-provider:${CP_NAME}" + +echo "==> Creating function: ${FUNCTION_NAME}" +aws lambda create-function \ + --function-name "${FUNCTION_NAME}" \ + --runtime "${RUNTIME}" \ + --handler "${HANDLER}" \ + --zip-file fileb://function.zip \ + --role "${EXECUTION_ROLE_ARN}" \ + --architectures "${ARCHITECTURE}" \ + --memory-size "${MEMORY_SIZE}" \ + --capacity-provider-config \ + "LambdaManagedInstancesCapacityProviderConfig={CapacityProviderArn=${CP_ARN}}" + +echo "==> Publishing version (triggers instance provisioning — may take several minutes)" +VERSION=$(aws lambda publish-version --function-name "${FUNCTION_NAME}" --query 'Version' --output text) + +echo "==> Done. 
Function version: ${VERSION}" +echo " Invoke with: aws lambda invoke --function-name ${FUNCTION_NAME}:${VERSION} --payload '{}' response.json" +echo " Monitor provisioning: aws lambda get-capacity-provider --capacity-provider-name ${CP_NAME}" From e465e5bf577267bdb0d9b00c3d4f8f31ec0f5489 Mon Sep 17 00:00:00 2001 From: Sahil Bhimjiani Date: Sat, 9 May 2026 13:46:21 -0500 Subject: [PATCH 4/8] fix(aws-serverless): address Leandro's feedback on Powertools .NET and cost example - Update Powertools .NET to supported (logging, tracing, idempotency, batch, parameters) - Remove hardcoded cost example, defer to LMI Pricing Calculator --- .../references/cost-comparison.md | 11 +++-------- .../references/thread-safety.md | 3 +-- 2 files changed, 4 insertions(+), 10 deletions(-) diff --git a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/cost-comparison.md b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/cost-comparison.md index 1bc16a76..fd86f65a 100644 --- a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/cost-comparison.md +++ b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/cost-comparison.md @@ -55,14 +55,9 @@ Present this for every assessment: ## Worked Example -Node.js API, 100 req/s steady (259M req/mo), 200ms avg, 512 MB, x86: - -| Scenario | Monthly | Annual | Savings | -|----------|---------|--------|---------| -| Lambda on-demand | $484 | $5,808 | baseline | -| Lambda + 3yr SP | $411 | $4,932 | 15% | -| LMI on-demand (3× m7i.large) | $288 | $3,456 | 40% | -| LMI + 3yr SP | $155 | $1,860 | 68% | +Use the [LMI Pricing Calculator](https://aws-samples.github.io/sample-aws-lambda-managed-instances/) for accurate, up-to-date cost comparisons based on your specific workload parameters (region, instance type, request volume, duration). + +When building a cost comparison for a user, gather: region, runtime, requests/month, average duration, memory, and architecture (x86 vs ARM). 
Plug these into the calculator rather than relying on hardcoded estimates. ## When LMI is NOT Cheaper diff --git a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/thread-safety.md b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/thread-safety.md index 5e411f46..5942009e 100644 --- a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/thread-safety.md +++ b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/thread-safety.md @@ -68,7 +68,6 @@ Uses a single process with .NET Tasks (same model as ASP.NET Core). The handler - Initialize `HttpClient` and SDK clients as singletons - Use `ConcurrentDictionary` and `SemaphoreSlim` for thread-safe access - Invocation timeouts are NOT enforced by the runtime — use `ILambdaContext.RemainingTime` to detect approaching timeouts -- Powertools for AWS Lambda (.NET) does not yet support Lambda Managed Instances ## Common Anti-Patterns @@ -91,7 +90,7 @@ Powertools handles multi-concurrency transparently (structured logging, tracing, | Python | Powertools for AWS Lambda (Python) | 3.23.0 | | TypeScript | Powertools for AWS Lambda (TypeScript) | 2.29.0 | | Java | Powertools for AWS Lambda (Java) | 2.8.0 | -| .NET | Powertools for AWS Lambda (.NET) | Not yet supported | +| .NET | Powertools for AWS Lambda (.NET) | Supported (logging, tracing, idempotency, batch, parameters) | AWS SDK and X-Ray minimum versions: From 12a006bc6874aa69c561496cb63d985dd23413c5 Mon Sep 17 00:00:00 2001 From: Sahil Bhimjiani Date: Mon, 11 May 2026 16:50:45 -0500 Subject: [PATCH 5/8] fix(aws-serverless): simplify cost reference and remove VPC endpoint pricing - Strip cost-comparison.md to just pricing calculator links - Remove cost column from VPC endpoints table in infrastructure-setup.md --- .../references/cost-comparison.md | 55 --------------- .../references/infrastructure-setup.md | 67 ++++++++++--------- .../references/thread-safety.md | 2 +- 3 files changed, 36 
insertions(+), 88 deletions(-) diff --git a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/cost-comparison.md b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/cost-comparison.md index fd86f65a..d57c9031 100644 --- a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/cost-comparison.md +++ b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/cost-comparison.md @@ -1,60 +1,5 @@ # Lambda vs LMI Cost Comparison -## Pricing Components - -**Standard Lambda:** $0.20/M requests + $0.0000166667/GB-sec (x86) or $0.0000133334 (ARM). Compute Savings Plans give ~17% discount on duration. - -**LMI:** $0.20/M requests + EC2 instance cost (24/7) + 15% management fee on EC2 on-demand price. No per-request duration charge. Savings Plans/RIs discount up to 72% on EC2 compute. 15% fee always on on-demand price. - -## Discount Comparison - -| Option | Lambda discount | LMI discount | -|--------|----------------|--------------| -| On-demand | 0% | 0% | -| Compute Savings Plan (1yr) | ~17% on duration | ~40-50% on EC2 | -| Compute Savings Plan (3yr) | ~17% on duration | ~60-72% on EC2 | -| Reserved Instances (1yr) | N/A | ~40% on EC2 | -| Reserved Instances (3yr) | N/A | ~60-65% on EC2 | - -Compute Savings Plans apply to both Lambda duration AND EC2 instances. One commitment can cover both. 
- -## Calculation Formulas - -``` -# Lambda on-demand -duration_cost = requests × avg_duration_sec × memory_GB × $0.0000166667 -request_cost = requests × $0.20 / 1,000,000 -total = duration_cost + request_cost - -# Lambda + Savings Plan (17% on duration) -total = (duration_cost × 0.83) + request_cost - -# LMI on-demand -ec2_cost = num_instances × hourly_price × 730 -mgmt_fee = ec2_cost × 0.15 -total = ec2_cost + mgmt_fee + request_cost - -# LMI + 3yr Savings Plan (65% discount on EC2) -total = (ec2_cost × 0.35) + mgmt_fee + request_cost -``` - -## Comparison Table Template - -Present this for every assessment: - -``` -| Component | Lambda OD | Lambda+SP | LMI OD | LMI+3yr SP | -|--------------------|-----------|-----------|--------|------------| -| Requests | $X | $X | $X | $X | -| Duration/compute | $X | $X | $X | $X | -| Management fee | — | — | $X | $X | -| Monthly total | $X | $X | $X | $X | -| Annual total | $X | $X | $X | $X | -| Savings vs Lambda | baseline | X% | X% | X% | -``` - -## Worked Example - Use the [LMI Pricing Calculator](https://aws-samples.github.io/sample-aws-lambda-managed-instances/) for accurate, up-to-date cost comparisons based on your specific workload parameters (region, instance type, request volume, duration). When building a cost comparison for a user, gather: region, runtime, requests/month, average duration, memory, and architecture (x86 vs ARM). Plug these into the calculator rather than relying on hardcoded estimates. diff --git a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/infrastructure-setup.md b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/infrastructure-setup.md index 50731535..10eb2980 100644 --- a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/infrastructure-setup.md +++ b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/infrastructure-setup.md @@ -129,59 +129,62 @@ LMI runs functions on EC2 instances inside the VPC. 
These instances need VPC endpoints: - Security groups: HTTPS egress (port 443) for AWS API calls; no ingress needed - Required VPC endpoints: -| Endpoint | Type | Purpose | Cost | -|----------|------|---------|------| -| S3 | Gateway | Object storage access | Free | -| DynamoDB | Gateway | Table access | Free | -| SQS | Interface | Queue operations | $0.01/hr per AZ | -| CloudWatch Logs | Interface | Log delivery | $0.01/hr per AZ | -| CloudWatch Monitoring | Interface | Metrics/EMF | $0.01/hr per AZ | -| X-Ray | Interface | Distributed tracing | $0.01/hr per AZ | - -Gateway endpoints are free; interface endpoints incur hourly charges per AZ. +| Endpoint | Type | Purpose | +|----------|------|---------| +| S3 | Gateway | Object storage access | +| DynamoDB | Gateway | Table access | +| SQS | Interface | Queue operations | +| CloudWatch Logs | Interface | Log delivery | +| CloudWatch Monitoring | Interface | Metrics/EMF | +| X-Ray | Interface | Distributed tracing | ## CLI Workflow -Use the setup script for automated provisioning: +### Required Parameters + +| Parameter | Description | +|-----------|-------------| +| `SUBNET_IDS` | Comma-separated subnet IDs across 3+ AZs | +| `SECURITY_GROUP_ID` | Security group ID for the capacity provider | +| `ACCOUNT_ID` | AWS account ID | +| `OPERATOR_ROLE_ARN` | ARN of the operator role (see above) | +| `EXECUTION_ROLE_ARN` | ARN of the execution role (see above) | +| `FUNCTION_NAME` | Name for the Lambda function | +| `CP_NAME` | Name for the capacity provider | +| `ARCHITECTURE` | `arm64` (Graviton) or `x86_64` | + +### Automated Setup + +See [`scripts/setup-lmi.sh`](../scripts/setup-lmi.sh) — set the environment variables above and run: ```bash -# Set required environment variables -export SUBNET_IDS="subnet-abc,subnet-def,subnet-ghi" -export SECURITY_GROUP_ID="sg-123456" -export ACCOUNT_ID="123456789012" -export OPERATOR_ROLE_ARN="arn:aws:iam::123456789012:role/LMIOperatorRole" -export EXECUTION_ROLE_ARN="arn:aws:iam::123456789012:role/LMIExecutionRole" - -# Run setup -./setup-lmi.sh my-function my-capacity-provider arm64 +./setup-lmi.sh "$FUNCTION_NAME" "$CP_NAME" "$ARCHITECTURE" ``` -See [`scripts/setup-lmi.sh`](../scripts/setup-lmi.sh) for the full script with configurable options. -### Manual Steps (if not using the script) +### Manual Steps ```bash # 1. Create capacity provider aws lambda create-capacity-provider \ - --capacity-provider-name my-cp \ - --vpc-config "SubnetIds=[subnet-abc,subnet-def,subnet-ghi],SecurityGroupIds=[sg-123456]" \ - --permissions-config "CapacityProviderOperatorRoleArn=arn:aws:iam::$ACCT:role/LMIOperatorRole" \ - --instance-requirements "Architectures=[arm64]" \ + --capacity-provider-name $CP_NAME \ + --vpc-config "SubnetIds=[$SUBNET_IDS],SecurityGroupIds=[$SECURITY_GROUP_ID]" \ + --permissions-config "CapacityProviderOperatorRoleArn=$OPERATOR_ROLE_ARN" \ + --instance-requirements "Architectures=[$ARCHITECTURE]" \ --capacity-provider-scaling-config "MaxVCpuCount=30" # 2. Create function -aws lambda create-function --function-name my-fn --runtime python3.13 \ +aws lambda create-function --function-name $FUNCTION_NAME --runtime python3.13 \ --handler app.handler --zip-file fileb://function.zip \ - --role "arn:aws:iam::$ACCT:role/LMIExecutionRole" --architectures arm64 \ + --role $EXECUTION_ROLE_ARN --architectures $ARCHITECTURE \ --memory-size 4096 \ --capacity-provider-config \ - "LambdaManagedInstancesCapacityProviderConfig={CapacityProviderArn=arn:aws:lambda:$REGION:$ACCT:capacity-provider:my-cp}" + "LambdaManagedInstancesCapacityProviderConfig={CapacityProviderArn=arn:aws:lambda:$AWS_REGION:$ACCOUNT_ID:capacity-provider:$CP_NAME}" # 3. Publish version (triggers provisioning — takes several minutes) -aws lambda publish-version --function-name my-fn +aws lambda publish-version --function-name $FUNCTION_NAME # 4.
Invoke (must use versioned ARN) -aws lambda invoke --function-name my-fn:1 --payload '{}' response.json +aws lambda invoke --function-name $FUNCTION_NAME:1 --payload '{}' response.json ``` Architecture must match between function and capacity provider. diff --git a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/thread-safety.md b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/thread-safety.md index 5942009e..5591befe 100644 --- a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/thread-safety.md +++ b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/thread-safety.md @@ -90,7 +90,7 @@ Powertools handles multi-concurrency transparently (structured logging, tracing, | Python | Powertools for AWS Lambda (Python) | 3.23.0 | | TypeScript | Powertools for AWS Lambda (TypeScript) | 2.29.0 | | Java | Powertools for AWS Lambda (Java) | 2.8.0 | -| .NET | Powertools for AWS Lambda (.NET) | Supported (logging, tracing, idempotency, batch, parameters) | +| .NET | Powertools for AWS Lambda (.NET) | 3.1.0 | AWS SDK and X-Ray minimum versions: From 733dec967be93c3453962749e684d3c167c23704 Mon Sep 17 00:00:00 2001 From: Sahil Bhimjiani Date: Mon, 11 May 2026 17:15:49 -0500 Subject: [PATCH 6/8] fix(aws-serverless): fix markdown lint errors (blank lines around fences/headings/lists) --- .../references/infrastructure-setup.md | 5 +++++ .../references/migration-patterns.md | 9 +++++++++ .../references/thread-safety.md | 5 +++++ 3 files changed, 19 insertions(+) diff --git a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/infrastructure-setup.md b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/infrastructure-setup.md index 10eb2980..2a6a1680 100644 --- a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/infrastructure-setup.md +++ b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/infrastructure-setup.md @@ -5,6 +5,7 
@@ ### 1. Execution Role (for the function) Trust policy: + ```json { "Version": "2012-10-17", @@ -17,6 +18,7 @@ Trust policy: ``` Minimum permissions: + ```json { "Version": "2012-10-17", @@ -35,6 +37,7 @@ Minimum permissions: ``` Add VPC permissions only if the function accesses VPC resources: + ```json { "Effect": "Allow", @@ -50,6 +53,7 @@ Add VPC permissions only if the function accesses VPC resources: ### 2. Operator Role (for capacity provider EC2 management) Trust policy: + ```json { "Version": "2012-10-17", @@ -62,6 +66,7 @@ Trust policy: ``` Minimum permissions (scoped with conditions): + ```json { "Version": "2012-10-17", diff --git a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/migration-patterns.md b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/migration-patterns.md index c26d9f0d..7898f03f 100644 --- a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/migration-patterns.md +++ b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/migration-patterns.md @@ -5,6 +5,7 @@ Before/after code examples for migrating to multi-concurrency. ## Node.js ### Global State + ```javascript // BEFORE (race condition) let requestCount = 0; @@ -24,6 +25,7 @@ exports.handler = async (event) => { ``` ### File I/O + ```javascript // BEFORE (shared path) fs.writeFileSync('/tmp/output.json', JSON.stringify(data)); @@ -35,6 +37,7 @@ finally { fs.unlinkSync(path); } ``` ### Database + ```javascript // BEFORE (per-invocation connection) exports.handler = async (event) => { @@ -56,6 +59,7 @@ exports.handler = async (event) => { Python on LMI uses **process-based isolation**. Each concurrent invocation runs in its own process with independent memory. Global state is NOT shared, so no locking is needed. The main migration concerns are `/tmp` conflicts, memory sizing, and connection pooling. 
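The Python migration concerns described above (request-scoped `/tmp` paths, per-process clients) can be consolidated into one runnable sketch; `FakeDynamoClient` and `FakeContext` are stand-ins so it needs no AWS credentials, and in real code the client would be a module-level boto3 resource:

```python
# Consolidated runnable sketch of the per-process Python patterns described
# above: a module-level client reused across invocations within one process,
# and request-scoped paths in the shared /tmp filesystem.
import json
import os
import tempfile

class FakeDynamoClient:
    def put_item(self, item):
        return {"stored": item}

# Created once per process, reused by every invocation that process
# handles (safe: Python processes on LMI do not share memory).
client = FakeDynamoClient()

def handler(event, context):
    # aws_request_id keeps concurrent processes from clobbering each
    # other's files in the shared /tmp.
    path = os.path.join(tempfile.gettempdir(), f"data-{context.aws_request_id}.json")
    try:
        with open(path, "w") as f:
            json.dump(event, f)
        return client.put_item(event)
    finally:
        os.unlink(path)  # always clean up; /tmp space is shared too

class FakeContext:
    aws_request_id = "req-abc"

result = handler({"order_id": 42}, FakeContext())
print(result)  # {'stored': {'order_id': 42}}
```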
### Global State (No Changes Needed) + ```python # This is SAFE on LMI — each process has its own copy of cache cache = {} @@ -69,6 +73,7 @@ dynamodb = boto3.resource('dynamodb') ``` ### File I/O (Change Required — `/tmp` is shared across processes) + ```python # BEFORE (conflict — all processes share /tmp) with open('/tmp/data.json', 'w') as f: json.dump(event, f) @@ -82,6 +87,7 @@ finally: ``` ### Database (Change Required — each process needs pooled connections) + ```python # BEFORE (per-invocation connection — exhausts limits at concurrency) def handler(event, context): @@ -98,6 +104,7 @@ def handler(event, context): ``` ### Memory Sizing + ```python # A function using 200 MB per process with default concurrency of 16: # Total memory ≈ 200 MB × 16 = 3.2 GB @@ -108,6 +115,7 @@ def handler(event, context): ``` ## Java ### Global State + ```java // BEFORE (race condition) private static Map<String, Object> cache = new HashMap<>(); @@ -118,6 +126,7 @@ private static final ConcurrentHashMap<String, Object> cache = new ConcurrentHashMap<>(); ``` ### Database + ```java // BEFORE (per-invocation) Connection conn = DriverManager.getConnection("jdbc:..."); diff --git a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/thread-safety.md b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/thread-safety.md index 5591befe..2d926cce 100644 --- a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/thread-safety.md +++ b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/thread-safety.md @@ -23,6 +23,7 @@ When reviewing a function for LMI readiness, check each item: Python uses **multiple independent processes**, each with its own interpreter and memory space. Global variables, module-level caches, and singleton objects are duplicated per process, not shared. If a function works on standard Lambda today, it works on LMI without code changes related to shared state.
**Key concerns:** + - Memory consumption: total footprint ≈ per-process memory × concurrency. A 200 MB function with 16 concurrent processes can consume 3+ GB. - `/tmp` filesystem is shared across all processes — use `context.aws_request_id` in filenames - Each process needs its own connection pool — size pools per-process, not globally @@ -30,6 +31,7 @@ Python uses **multiple independent processes**, each with its own interpreter an - Monitor `MemoryUtilization` metric and adjust ratio if needed **Safe patterns (no locking needed):** + - Module-level mutable globals (isolated per process) - Module-level SDK clients and caches - `os.environ` reads @@ -41,6 +43,7 @@ Uses worker threads (configurable via `AWS_LAMBDA_NODEJS_WORKER_COUNT`) combined The `await` keyword yields control to the event loop, which may execute another invocation that overwrites shared state before the first resumes. **Key concerns:** + - Use `AsyncLocalStorage` from `node:async_hooks` for request context - Keep mutable state within handler local scope - Initialize SDK clients and DB pools at module level (they are thread-safe) @@ -52,6 +55,7 @@ The `await` keyword yields control to the event loop, which may execute another Uses OS-level threads. Lambda loads the handler class once and invokes `handleRequest` from multiple threads simultaneously (identical to a Java app server). **Key concerns:** + - Use immutable objects and thread-safe collections (`ConcurrentHashMap`, `Collections.synchronizedList`) - Initialize SDK clients and connection pools in constructor or static block - Avoid mutable `static` fields @@ -63,6 +67,7 @@ Uses OS-level threads. Lambda loads the handler class once and invokes `handleRe Uses a single process with .NET Tasks (same model as ASP.NET Core). The handler object is shared across all Tasks. 
**Key concerns:** + - Use `AsyncLocal` for request-scoped data - Inject scoped services via DI container - Initialize `HttpClient` and SDK clients as singletons From 17fe66b41b8cea474b18a7ca0c9302a4be413d7f Mon Sep 17 00:00:00 2001 From: Sahil Bhimjiani Date: Mon, 11 May 2026 17:36:06 -0500 Subject: [PATCH 7/8] fix(aws-serverless): run dprint fmt to fix table formatting --- .../aws-lambda-managed-instances/SKILL.md | 58 +++++++++---------- .../references/configuration-guide.md | 50 ++++++++-------- .../references/infrastructure-setup.md | 40 ++++++------- .../references/thread-safety.md | 40 ++++++------- .../references/troubleshooting.md | 24 ++++---- 5 files changed, 106 insertions(+), 106 deletions(-) diff --git a/plugins/aws-serverless/skills/aws-lambda-managed-instances/SKILL.md b/plugins/aws-serverless/skills/aws-lambda-managed-instances/SKILL.md index fdeb8ab7..ef15303c 100644 --- a/plugins/aws-serverless/skills/aws-lambda-managed-instances/SKILL.md +++ b/plugins/aws-serverless/skills/aws-lambda-managed-instances/SKILL.md @@ -30,15 +30,15 @@ For standard Lambda development, see [aws-lambda skill](../aws-lambda/). For SAM ## Quick Decision: Is LMI Right for This Workload? 
-| Signal | LMI is a strong fit | Standard Lambda is better | -|--------|---------------------|---------------------------| -| Traffic | Steady, predictable, 50M+ req/mo | Bursty, unpredictable, long idle | -| Cost | Duration-heavy spend at scale | Low or sporadic invocations | -| Cold starts | Unacceptable (LMI eliminates for provisioned capacity; scale-out may have brief delays) | Tolerable or mitigated by SnapStart | -| Compute | Latest CPUs, specific families, high network bandwidth | Standard Lambda memory/CPU sufficient | -| Isolation | Dedicated EC2 instances in your account, full VPC control | Shared Firecracker micro-VMs acceptable | -| Scale-to-zero | Not needed (min 3 instances always run) | Required (pay nothing when idle) | -| Code readiness | Thread-safe (Node.js/Java/.NET) or any Python code | Non-thread-safe Node.js/Java/.NET, expensive to change | +| Signal | LMI is a strong fit | Standard Lambda is better | +| -------------- | --------------------------------------------------------------------------------------- | ------------------------------------------------------ | +| Traffic | Steady, predictable, 50M+ req/mo | Bursty, unpredictable, long idle | +| Cost | Duration-heavy spend at scale | Low or sporadic invocations | +| Cold starts | Unacceptable (LMI eliminates for provisioned capacity; scale-out may have brief delays) | Tolerable or mitigated by SnapStart | +| Compute | Latest CPUs, specific families, high network bandwidth | Standard Lambda memory/CPU sufficient | +| Isolation | Dedicated EC2 instances in your account, full VPC control | Shared Firecracker micro-VMs acceptable | +| Scale-to-zero | Not needed (min 3 instances always run) | Required (pay nothing when idle) | +| Code readiness | Thread-safe (Node.js/Java/.NET) or any Python code | Non-thread-safe Node.js/Java/.NET, expensive to change | ## Instructions @@ -58,10 +58,10 @@ Gather these signals before recommending: REQUIRED: Present a cost comparison before recommending LMI. 
Compare at minimum: -| Scenario | When it wins | -|----------|-------------| -| Lambda on-demand | Low volume, bursty traffic | -| LMI on-demand | High volume, steady traffic | +| Scenario | When it wins | +| ---------------- | --------------------------- | +| Lambda on-demand | Low volume, bursty traffic | +| LMI on-demand | High volume, steady traffic | Rule of thumb: LMI becomes cost-competitive at 50-100M+ req/month with steady traffic. @@ -140,25 +140,25 @@ See [references/infrastructure-setup.md](references/infrastructure-setup.md) for ## Limits Quick Reference -| Resource | Limit | -|----------|-------| -| Memory | 2 GB min, 32 GB max | -| Instances | 3 minimum (AZ resiliency) | -| Instance lifespan | 14 days (auto-replaced) | -| Concurrency/vCPU | 64 (Node.js), 32 (Java/.NET), 16 (Python) | -| Runtimes | Node.js, Java, .NET, Python | -| Instance families | C, M, R (.large and up) | -| Scaling | Absorbs 50% spike; doubles within 5 min | +| Resource | Limit | +| ----------------- | ----------------------------------------- | +| Memory | 2 GB min, 32 GB max | +| Instances | 3 minimum (AZ resiliency) | +| Instance lifespan | 14 days (auto-replaced) | +| Concurrency/vCPU | 64 (Node.js), 32 (Java/.NET), 16 (Python) | +| Runtimes | Node.js, Java, .NET, Python | +| Instance families | C, M, R (.large and up) | +| Scaling | Absorbs 50% spike; doubles within 5 min | ## Troubleshooting Quick Reference -| Issue | Cause | Fix | -|-------|-------|-----| -| 429 throttles | Traffic exceeds scaling speed | Increase MinExecutionEnvironments or lower TargetResourceUtilization | -| Function stuck PENDING | Provisioning instances | Wait; check VPC/IAM config | -| Architecture mismatch | Function ≠ capacity provider arch | Align both to same architecture | -| Cannot terminate instances | Managed by capacity provider | Delete capacity provider instead | -| Race conditions | Code not thread-safe | See [references/thread-safety.md](references/thread-safety.md) | +| Issue | Cause | 
Fix | +| -------------------------- | --------------------------------- | -------------------------------------------------------------------- | +| 429 throttles | Traffic exceeds scaling speed | Increase MinExecutionEnvironments or lower TargetResourceUtilization | +| Function stuck PENDING | Provisioning instances | Wait; check VPC/IAM config | +| Architecture mismatch | Function ≠ capacity provider arch | Align both to same architecture | +| Cannot terminate instances | Managed by capacity provider | Delete capacity provider instead | +| Race conditions | Code not thread-safe | See [references/thread-safety.md](references/thread-safety.md) | See [references/troubleshooting.md](references/troubleshooting.md) for detailed resolution steps. diff --git a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/configuration-guide.md b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/configuration-guide.md index ff0162d6..9b2bc458 100644 --- a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/configuration-guide.md +++ b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/configuration-guide.md @@ -11,45 +11,45 @@ Architecture: ARM (Graviton, g-suffix) for price-performance. x86 (i=Intel, a=AM ## Memory-to-vCPU Ratios -| Ratio | Profile | When to use | Memory examples | -|-------|---------|-------------|-----------------| -| 2:1 | Compute | CPU-bound work | 2GB/1vCPU, 4GB/2vCPU | -| 4:1 | General | Most workloads (default) | 4GB/1vCPU, 8GB/2vCPU | -| 8:1 | Memory | Caching, data, Python apps | 8GB/1vCPU, 16GB/2vCPU | +| Ratio | Profile | When to use | Memory examples | +| ----- | ------- | -------------------------- | --------------------- | +| 2:1 | Compute | CPU-bound work | 2GB/1vCPU, 4GB/2vCPU | +| 4:1 | General | Most workloads (default) | 4GB/1vCPU, 8GB/2vCPU | +| 8:1 | Memory | Caching, data, Python apps | 8GB/1vCPU, 16GB/2vCPU | Min: 2 GB / 1 vCPU. Max: 32 GB. 
Memory must align with ratio multiples. ## Memory Sizing from Existing Lambda -| Current Lambda | LMI memory | Ratio | Rationale | -|---------------|------------|-------|-----------| -| 128-512 MB | 2048 MB | 4:1 | LMI minimum; multi-concurrency shares memory | -| 512 MB-1 GB | 2048 MB | 4:1 | Room for concurrent requests | -| 1-2 GB | 4096 MB | 4:1 | Standard upgrade path | -| 2-4 GB | 4096-8192 MB | 4:1 or 8:1 | Depends on memory vs CPU bottleneck | -| 4-10 GB | 8192-16384 MB | 8:1 | Likely memory-heavy workload | +| Current Lambda | LMI memory | Ratio | Rationale | +| -------------- | ------------- | ---------- | -------------------------------------------- | +| 128-512 MB | 2048 MB | 4:1 | LMI minimum; multi-concurrency shares memory | +| 512 MB-1 GB | 2048 MB | 4:1 | Room for concurrent requests | +| 1-2 GB | 4096 MB | 4:1 | Standard upgrade path | +| 2-4 GB | 4096-8192 MB | 4:1 or 8:1 | Depends on memory vs CPU bottleneck | +| 4-10 GB | 8192-16384 MB | 8:1 | Likely memory-heavy workload | ## Concurrency Tuning -| Runtime | Default/vCPU | I/O-bound | CPU-bound | -|---------|-------------|-----------|-----------| -| Node.js | 64 | Keep or increase | 1 per vCPU | -| Java | 32 | Keep | 1 per vCPU | -| .NET | 32 | Keep | 1 per vCPU | -| Python | 16 | Keep | 1 per vCPU | +| Runtime | Default/vCPU | I/O-bound | CPU-bound | +| ------- | ------------ | ---------------- | ---------- | +| Node.js | 64 | Keep or increase | 1 per vCPU | +| Java | 32 | Keep | 1 per vCPU | +| .NET | 32 | Keep | 1 per vCPU | +| Python | 16 | Keep | 1 per vCPU | Total capacity = MinExecutionEnvironments × PerExecutionEnvironmentMaxConcurrency ## Capacity Provider Scaling Controls -| Control | Default | Guidance | -|---------|---------|----------| -| MinExecutionEnvironments | 3 | Increase for baseline capacity; never below 3 | -| MaxExecutionEnvironments | — | Set based on cost budget | -| MaxVCpuCount | Required | Start at 30, adjust by load | +| Control | Default | Guidance | +| 
------------------------- | ------------- | --------------------------------------------- | +| MinExecutionEnvironments | 3 | Increase for baseline capacity; never below 3 | +| MaxExecutionEnvironments | — | Set based on cost budget | +| MaxVCpuCount | Required | Start at 30, adjust by load | | TargetResourceUtilization | ~50% headroom | Raise for cost savings (less burst tolerance) | -| AllowedInstanceTypes | All | Restrict only for specific hardware needs | -| ExcludedInstanceTypes | None | Exclude expensive types in dev/test | +| AllowedInstanceTypes | All | Restrict only for specific hardware needs | +| ExcludedInstanceTypes | None | Exclude expensive types in dev/test | ## Monitoring Thresholds diff --git a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/infrastructure-setup.md b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/infrastructure-setup.md index 2a6a1680..81c234dd 100644 --- a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/infrastructure-setup.md +++ b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/infrastructure-setup.md @@ -11,7 +11,7 @@ Trust policy: "Version": "2012-10-17", "Statement": [{ "Effect": "Allow", - "Principal": {"Service": "lambda.amazonaws.com"}, + "Principal": { "Service": "lambda.amazonaws.com" }, "Action": "sts:AssumeRole" }] } @@ -59,7 +59,7 @@ Trust policy: "Version": "2012-10-17", "Statement": [{ "Effect": "Allow", - "Principal": {"Service": "lambda.amazonaws.com"}, + "Principal": { "Service": "lambda.amazonaws.com" }, "Action": "sts:AssumeRole" }] } @@ -134,29 +134,29 @@ LMI runs functions on EC2 instances inside the VPC. 
These instances need VPC end - Security groups: HTTPS egress (port 443) for AWS API calls; no ingress needed - Required VPC endpoints: -| Endpoint | Type | Purpose | -|----------|------|---------| -| S3 | Gateway | Object storage access | -| DynamoDB | Gateway | Table access | -| SQS | Interface | Queue operations | -| CloudWatch Logs | Interface | Log delivery | -| CloudWatch Monitoring | Interface | Metrics/EMF | -| X-Ray | Interface | Distributed tracing | +| Endpoint | Type | Purpose | +| --------------------- | --------- | --------------------- | +| S3 | Gateway | Object storage access | +| DynamoDB | Gateway | Table access | +| SQS | Interface | Queue operations | +| CloudWatch Logs | Interface | Log delivery | +| CloudWatch Monitoring | Interface | Metrics/EMF | +| X-Ray | Interface | Distributed tracing | ## CLI Workflow ### Required Parameters -| Parameter | Description | -|-----------|-------------| -| `SUBNET_IDS` | Comma-separated subnet IDs across 3+ AZs | -| `SECURITY_GROUP_ID` | Security group ID for the capacity provider | -| `ACCOUNT_ID` | AWS account ID | -| `OPERATOR_ROLE_ARN` | ARN of the operator role (see above) | -| `EXECUTION_ROLE_ARN` | ARN of the execution role (see above) | -| `FUNCTION_NAME` | Name for the Lambda function | -| `CP_NAME` | Name for the capacity provider | -| `ARCHITECTURE` | `arm64` (Graviton) or `x86_64` | +| Parameter | Description | +| -------------------- | ------------------------------------------- | +| `SUBNET_IDS` | Comma-separated subnet IDs across 3+ AZs | +| `SECURITY_GROUP_ID` | Security group ID for the capacity provider | +| `ACCOUNT_ID` | AWS account ID | +| `OPERATOR_ROLE_ARN` | ARN of the operator role (see above) | +| `EXECUTION_ROLE_ARN` | ARN of the execution role (see above) | +| `FUNCTION_NAME` | Name for the Lambda function | +| `CP_NAME` | Name for the capacity provider | +| `ARCHITECTURE` | `arm64` (Graviton) or `x86_64` | ### Automated Setup diff --git 
a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/thread-safety.md b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/thread-safety.md index 2d926cce..d6d677f2 100644 --- a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/thread-safety.md +++ b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/thread-safety.md @@ -76,31 +76,31 @@ Uses a single process with .NET Tasks (same model as ASP.NET Core). The handler ## Common Anti-Patterns -| Anti-pattern | Affected Runtimes | Risk | Fix | -|-------------|-------------------|------|-----| -| New DB connection per invocation | All | Exhausts connection limits | Module-level connection pool | -| Hardcoded `/tmp` paths | All | File conflicts across processes | Use `aws_request_id` in path | -| Logging without request ID | All | Unreadable interleaved logs | Include `aws_request_id` | -| Mutable module-level state | Node.js, Java, .NET | Race condition / state corruption | Request-local scope or concurrent collections | -| Setting env vars during request | Node.js, Java, .NET | Race condition | Pass state via parameters | -| Assuming sequential execution | Node.js, Java, .NET | State corruption | Each invocation must be self-contained | -| Ignoring memory multiplication | Python | OOM at high concurrency | Account for per-process × concurrency | +| Anti-pattern | Affected Runtimes | Risk | Fix | +| -------------------------------- | ------------------- | --------------------------------- | --------------------------------------------- | +| New DB connection per invocation | All | Exhausts connection limits | Module-level connection pool | +| Hardcoded `/tmp` paths | All | File conflicts across processes | Use `aws_request_id` in path | +| Logging without request ID | All | Unreadable interleaved logs | Include `aws_request_id` | +| Mutable module-level state | Node.js, Java, .NET | Race condition / state corruption | Request-local scope or 
concurrent collections | +| Setting env vars during request | Node.js, Java, .NET | Race condition | Pass state via parameters | +| Assuming sequential execution | Node.js, Java, .NET | State corruption | Each invocation must be self-contained | +| Ignoring memory multiplication | Python | OOM at high concurrency | Account for per-process × concurrency | ## Powertools for AWS Lambda Compatibility Powertools handles multi-concurrency transparently (structured logging, tracing, metrics). No code changes needed. -| Runtime | Package | Minimum Version | -|---------|---------|-----------------| -| Python | Powertools for AWS Lambda (Python) | 3.23.0 | -| TypeScript | Powertools for AWS Lambda (TypeScript) | 2.29.0 | -| Java | Powertools for AWS Lambda (Java) | 2.8.0 | -| .NET | Powertools for AWS Lambda (.NET) | 3.1.0 | +| Runtime | Package | Minimum Version | +| ---------- | -------------------------------------- | --------------- | +| Python | Powertools for AWS Lambda (Python) | 3.23.0 | +| TypeScript | Powertools for AWS Lambda (TypeScript) | 2.29.0 | +| Java | Powertools for AWS Lambda (Java) | 2.8.0 | +| .NET | Powertools for AWS Lambda (.NET) | 3.1.0 | AWS SDK and X-Ray minimum versions: -| Runtime | AWS SDK minimum | X-Ray SDK minimum | -|---------|----------------|-------------------| -| Node.js | AWS SDK for JavaScript v3 (3.933.0) | 3.12.0 | -| Java | AWS SDK for Java 2.0 (2.34.0) | 2.20.0 | -| .NET | AWSSDK.Core (4.0.0.32) | AWSXRayRecorder.Core (2.16.0) | +| Runtime | AWS SDK minimum | X-Ray SDK minimum | +| ------- | ----------------------------------- | ----------------------------- | +| Node.js | AWS SDK for JavaScript v3 (3.933.0) | 3.12.0 | +| Java | AWS SDK for Java 2.0 (2.34.0) | 2.20.0 | +| .NET | AWSSDK.Core (4.0.0.32) | AWSXRayRecorder.Core (2.16.0) | diff --git a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/troubleshooting.md b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/troubleshooting.md index 
16dd6d2b..4b17579e 100644 --- a/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/troubleshooting.md +++ b/plugins/aws-serverless/skills/aws-lambda-managed-instances/references/troubleshooting.md @@ -2,18 +2,18 @@ ## Common Issues -| Issue | Cause | Resolution | -|-------|-------|------------| -| 429 throttles during scale-up | Traffic doubled faster than 5-min scaling window | Increase MinExecutionEnvironments or lower TargetResourceUtilization | -| Function stuck in PENDING | Capacity provider provisioning instances | Wait several minutes; verify VPC subnets have IP capacity and IAM roles are correct | -| Architecture mismatch error | Function architecture ≠ capacity provider | Align both to arm64 or x86_64 | -| Cannot terminate EC2 instances | LMI instances managed by capacity provider | Delete capacity provider to destroy instances; cannot use EC2 console | -| High CPU, low throughput | Concurrency too high for CPU-bound work | Reduce PerExecutionEnvironmentMaxConcurrency to 1/vCPU | -| Race conditions in production | Code not thread-safe for multi-concurrency | Review with checklist in thread-safety.md | -| Function version not ACTIVE | Fewer than 3 execution environments ready | Wait for provisioning; check capacity provider status | -| Unexpected 500 errors | Unhandled concurrent access to shared state | Add thread-safe patterns from migration-patterns.md | -| CloudWatch logs missing | VPC egress not configured | Add NAT Gateway or CloudWatch Logs VPC endpoint | -| High costs despite low traffic | Minimum 3 instances always running | Evaluate if standard Lambda is more cost-effective | +| Issue | Cause | Resolution | +| ------------------------------ | ------------------------------------------------ | ----------------------------------------------------------------------------------- | +| 429 throttles during scale-up | Traffic doubled faster than 5-min scaling window | Increase MinExecutionEnvironments or lower TargetResourceUtilization 
| +| Function stuck in PENDING | Capacity provider provisioning instances | Wait several minutes; verify VPC subnets have IP capacity and IAM roles are correct | +| Architecture mismatch error | Function architecture ≠ capacity provider | Align both to arm64 or x86_64 | +| Cannot terminate EC2 instances | LMI instances managed by capacity provider | Delete capacity provider to destroy instances; cannot use EC2 console | +| High CPU, low throughput | Concurrency too high for CPU-bound work | Reduce PerExecutionEnvironmentMaxConcurrency to 1/vCPU | +| Race conditions in production | Code not thread-safe for multi-concurrency | Review with checklist in thread-safety.md | +| Function version not ACTIVE | Fewer than 3 execution environments ready | Wait for provisioning; check capacity provider status | +| Unexpected 500 errors | Unhandled concurrent access to shared state | Add thread-safe patterns from migration-patterns.md | +| CloudWatch logs missing | VPC egress not configured | Add NAT Gateway or CloudWatch Logs VPC endpoint | +| High costs despite low traffic | Minimum 3 instances always running | Evaluate if standard Lambda is more cost-effective | ## Debugging Steps From 54e5d53abdc647c9389d38943a92dd7901e6d138 Mon Sep 17 00:00:00 2001 From: Sahil Bhimjiani Date: Mon, 11 May 2026 17:49:37 -0500 Subject: [PATCH 8/8] fix: run dprint fmt on README.md table alignment --- README.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index 68f82a44..9368d443 100644 --- a/README.md +++ b/README.md @@ -219,11 +219,11 @@ Design, build, deploy, test, and debug serverless applications with AWS Lambda, ### Agent Skill Triggers -| Agent Skill | Triggers | -| -------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| **aws-lambda** | "Lambda function", "event source", 
"serverless application", "API Gateway", "EventBridge", "Step Functions", "serverless API", "event-driven architecture", "Lambda trigger" | -| **aws-serverless-deployment** | "use SAM", "SAM template", "SAM init", "SAM deploy", "CDK serverless", "CDK Lambda construct", "NodejsFunction", "PythonFunction", "serverless CI/CD pipeline" | -| **aws-lambda-durable-functions** | "lambda durable functions", "workflow orchestration", "state machines", "retry/checkpoint patterns", "long-running stateful Lambda", "saga pattern", "human-in-the-loop" | +| Agent Skill | Triggers | +| -------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| **aws-lambda** | "Lambda function", "event source", "serverless application", "API Gateway", "EventBridge", "Step Functions", "serverless API", "event-driven architecture", "Lambda trigger" | +| **aws-serverless-deployment** | "use SAM", "SAM template", "SAM init", "SAM deploy", "CDK serverless", "CDK Lambda construct", "NodejsFunction", "PythonFunction", "serverless CI/CD pipeline" | +| **aws-lambda-durable-functions** | "lambda durable functions", "workflow orchestration", "state machines", "retry/checkpoint patterns", "long-running stateful Lambda", "saga pattern", "human-in-the-loop" | | **aws-lambda-managed-instances** | "Lambda Managed Instances", "LMI", "capacity provider", "multi-concurrency Lambda", "EC2-backed Lambda", "cold start elimination", "Graviton Lambda", "Lambda cost optimization with Reserved Instances" | ### MCP Servers