Skip to content

adding PMM to loadtest environment#45567

Open
ksykulev wants to merge 2 commits into
mainfrom
pmm-in-loadtest
Open

adding PMM to loadtest environment#45567
ksykulev wants to merge 2 commits into
mainfrom
pmm-in-loadtest

Conversation

@ksykulev
Copy link
Copy Markdown
Contributor

@ksykulev ksykulev commented May 15, 2026

Taking local PMM setup from https://github.com/fleetdm/fleet/tree/main/tools/percona/pmm and adapting it to be an optional component to spin up for load tests.

Testing

  • QA'd all new/changed functionality manually

Summary by CodeRabbit

  • Chores
    • Migrated load testing infrastructure from EC2-based PMM to ECS Fargate deployment behind internal load balancer
    • Exposed RDS cluster endpoints (writer/reader), ALB listener configuration, and security group identifiers for external reference
    • Added automated PMM password management through secrets storage and configured MySQL monitoring with automated credential injection

Review Change Stack

Copilot AI review requested due to automatic review settings May 15, 2026 04:14
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new Terraform module that provisions a Percona Monitoring and Management (PMM) server in the loadtesting environment, exposes the loadtest RDS connection details so the PMM module can consume them, and bootstraps the PMM container via EC2 user-data to register the Aurora cluster for monitoring.

Changes:

  • New pmm/ Terraform module: EC2 instance (Amazon Linux 2023) running PMM 2 in Docker, security group restricted to VPC + VPN on 443, IAM role with SSM, Route53 vanity record.
  • infra/outputs.tf exports RDS endpoint, username, database name, and SG id (consumed via remote state by the new module).
  • user-data.sh installs Docker, runs PMM, and registers the RDS MySQL/Aurora instance with pmm-admin.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
infrastructure/loadtesting/terraform/pmm/main.tf New PMM EC2 module: provider/backend, remote-state lookups, IAM, SG, instance, DNS.
infrastructure/loadtesting/terraform/pmm/user-data.sh Bootstrap script: installs Docker, runs PMM container, registers RDS for monitoring.
infrastructure/loadtesting/terraform/pmm/outputs.tf Exposes PMM private IP, instance ID, and vanity URL.
infrastructure/loadtesting/terraform/pmm/variables.tf Defines aws_region variable (default us-east-2).
infrastructure/loadtesting/terraform/infra/outputs.tf Adds RDS endpoint/username/db/SG outputs consumed by the PMM module.
Comments suppressed due to low confidence (1)

infrastructure/loadtesting/terraform/pmm/user-data.sh:25

  • The script runs set -euo pipefail, so if PMM never becomes ready within 60 attempts (5 min), the loop exits without error but the subsequent pmm-admin add mysql will likely fail against an unready server, and there is no explicit failure if the readiness check never succeeded. Consider failing fast (e.g., exit 1) when the readiness loop exhausts all retries instead of silently continuing.
# Wait for PMM server to be ready
echo "Waiting for PMM server to become ready..."
for i in $(seq 1 60); do
  if docker exec pmm-server curl -sSf -k https://localhost:443/v1/readyz > /dev/null 2>&1; then
    echo "PMM server is ready"
    break
  fi
  echo "Attempt $i/60 - PMM server not ready yet..."
  sleep 5
done

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread infrastructure/loadtesting/terraform/pmm/user-data.sh Outdated
Comment thread infrastructure/loadtesting/terraform/pmm/main.tf Outdated
Comment thread infrastructure/loadtesting/terraform/pmm/main.tf
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 15, 2026

Walkthrough

This PR refactors PMM load-testing infrastructure from EC2-based to ECS Fargate-based deployment. It exports ALB listener ARN, hosted zone ID, and RDS connection details (endpoints, master username, database name, security group) from the shared infra module. The PMM module then provisions a containerized Fargate service with workspace-scoped DNS, secrets management via Secrets Manager, IAM task execution permissions, CloudWatch logging, and automatic RDS monitoring registration via pmm-admin CLI during container startup. The service attaches to an internal ALB with workspace-scoped host-header routing and health checks.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Description check ⚠️ Warning The description is incomplete relative to the template. While it explains the purpose and provides a testing checklist, it lacks required sections like 'Related issue', proper checklist items, and security/database validation details. Add 'Related issue' section, complete the testing checklist with specific details about what was tested, and address relevant security and infrastructure validation items from the template.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'adding PMM to loadtest environment' clearly and directly describes the main change: introducing PMM (Percona Monitoring and Management) as a new component to the load testing infrastructure.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch pmm-in-loadtest

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (3)
infrastructure/loadtesting/terraform/pmm/user-data.sh (2)

16-25: ⚡ Quick win

Add error handling for PMM readiness timeout.

If the PMM server fails to become ready after 60 attempts (~5 minutes), the script silently continues to the pmm-admin add command, which will fail. This can make debugging difficult.

⚠️ Proposed fix to add timeout error handling
 # Wait for PMM server to be ready
 echo "Waiting for PMM server to become ready..."
+ready=false
 for i in $(seq 1 60); do
   if docker exec pmm-server curl -sSf -k https://localhost:443/v1/readyz > /dev/null 2>&1; then
     echo "PMM server is ready"
+    ready=true
     break
   fi
   echo "Attempt $i/60 - PMM server not ready yet..."
   sleep 5
 done
+
+if [ "$ready" = false ]; then
+  echo "ERROR: PMM server failed to become ready after 5 minutes"
+  docker logs pmm-server
+  exit 1
+fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@infrastructure/loadtesting/terraform/pmm/user-data.sh` around lines 16 - 25,
The readiness loop that calls "docker exec pmm-server curl -sSf -k
https://localhost:443/v1/readyz" may time out silently after 60 attempts; update
the script to detect when the loop finishes without a successful readiness check
and fail fast: after the for/seq loop (or by checking a flag like a broken-out
variable), log an explicit error message including attempt count and optionally
the last curl output, and exit with a non-zero status so subsequent commands
like "pmm-admin add" do not run on a non-ready PMM server; reference the
existing readiness check and the downstream "pmm-admin add" invocation when
implementing this change.

27-36: ⚡ Quick win

Verify RDS registration success.

The pmm-admin add mysql command can fail for various reasons (network connectivity, authentication, permissions), but there's no verification of success. Failed registration means no metrics will be collected, defeating the purpose of the PMM deployment.

✅ Proposed fix to verify registration
 # Add the RDS MySQL instance to PMM monitoring
-docker exec pmm-server pmm-admin add mysql \
+if docker exec pmm-server pmm-admin add mysql \
   --server-url=https://admin:admin@localhost:443 \
   --server-insecure-tls \
   --username='${rds_username}' \
   --password='${rds_password}' \
   --host='${rds_endpoint}' \
   --port=3306 \
   --query-source=perfschema \
-  "fleet-mysql"
+  "fleet-mysql"; then
+  echo "Successfully registered RDS instance with PMM"
+else
+  echo "ERROR: Failed to register RDS instance with PMM"
+  docker exec pmm-server pmm-admin list
+  exit 1
+fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@infrastructure/loadtesting/terraform/pmm/user-data.sh` around lines 27 - 36,
The pmm-admin add mysql invocation (run via "docker exec pmm-server pmm-admin
add mysql" for target "fleet-mysql") is not checked for success; update the
user-data script to verify the command succeeded by inspecting its exit status
and/or parsing pmm-admin output and then confirming registration with a
follow-up "docker exec pmm-server pmm-admin list" (or equivalent) to ensure
"fleet-mysql" appears; if the add fails, log the error output, retry a few times
with backoff or exit non‑zero so Terraform/user-data can surface the failure
instead of silently continuing.
infrastructure/loadtesting/terraform/pmm/main.tf (1)

176-198: ⚡ Quick win

Enable detailed monitoring for production observability.

Detailed monitoring provides 1-minute CloudWatch metrics instead of 5-minute intervals, enabling faster detection of performance issues and anomalies in the PMM monitoring infrastructure.

📊 Proposed fix to enable detailed monitoring
 resource "aws_instance" "pmm" {
   ami                    = data.aws_ami.al2023.id
   instance_type          = "t3.large"
   subnet_id              = data.terraform_remote_state.shared.outputs.vpc.private_subnets[0]
   vpc_security_group_ids = [aws_security_group.pmm.id]
   iam_instance_profile   = aws_iam_instance_profile.pmm.name
+  monitoring             = true

   root_block_device {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@infrastructure/loadtesting/terraform/pmm/main.tf` around lines 176 - 198, The
aws_instance "pmm" resource lacks detailed (1-minute) CloudWatch monitoring; add
the EC2 detailed monitoring flag by setting monitoring = true in the resource
"aws_instance" "pmm" block (and ensure any IAM role/policy used by
aws_iam_instance_profile.pmm allows CloudWatch metrics if not already
configured) so the instance emits 1-minute metrics for production observability.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@infrastructure/loadtesting/terraform/pmm/main.tf`:
- Around line 176-198: The aws_instance "pmm" resource does not enforce IMDSv2
(leaving IMDSv1 allowed); update the aws_instance resource named pmm to set
metadata_options with http_tokens = "required" (and optionally
http_put_response_hop_limit = 1) so the instance requires IMDSv2; locate the
aws_instance "pmm" block and add the metadata_options stanza to enforce IMDSv2.
- Around line 189-193: The user-data template is currently being rendered with
secrets (rds_username, rds_password) which exposes credentials via EC2 metadata;
remove rds_username and rds_password from the templatefile invocation so only
non-sensitive values (e.g., rds_endpoint) are passed, and update the referenced
script user-data.sh to retrieve RDS credentials at runtime from AWS Secrets
Manager using the AWS CLI/SDK (e.g., aws secretsmanager get-secret-value)
instead of template interpolation; finally, ensure the EC2 instance profile/IAM
role attached to the instance has least-privilege SecretsManager:GetSecretValue
permission for the secret used.

In `@infrastructure/loadtesting/terraform/pmm/user-data.sh`:
- Around line 27-36: The PMM server is being registered with hardcoded
credentials "admin:admin" in user-data.sh (the docker exec pmm-server pmm-admin
add mysql --server-url=https://admin:admin@localhost:443 line); replace this by
generating a secure random password (use a random_password resource), store it
in AWS Secrets Manager (aws_secretsmanager_secret +
aws_secretsmanager_secret_version), pass the secret value into the instance
user-data as a template variable (e.g., pmm_admin_password), and update
user-data.sh to inject that password into the PMM container initialization and
into the pmm-admin --server-url (use
https://admin:${pmm_admin_password}@localhost:443) without logging the secret.
Ensure the new Terraform resources (random_password and secrets) are created and
the user-data template variable pmm_admin_password is supplied when rendering
user-data so the docker/container init uses the secure password.

---

Nitpick comments:
In `@infrastructure/loadtesting/terraform/pmm/main.tf`:
- Around line 176-198: The aws_instance "pmm" resource lacks detailed (1-minute)
CloudWatch monitoring; add the EC2 detailed monitoring flag by setting
monitoring = true in the resource "aws_instance" "pmm" block (and ensure any IAM
role/policy used by aws_iam_instance_profile.pmm allows CloudWatch metrics if
not already configured) so the instance emits 1-minute metrics for production
observability.

In `@infrastructure/loadtesting/terraform/pmm/user-data.sh`:
- Around line 16-25: The readiness loop that calls "docker exec pmm-server curl
-sSf -k https://localhost:443/v1/readyz" may time out silently after 60
attempts; update the script to detect when the loop finishes without a
successful readiness check and fail fast: after the for/seq loop (or by checking
a flag like a broken-out variable), log an explicit error message including
attempt count and optionally the last curl output, and exit with a non-zero
status so subsequent commands like "pmm-admin add" do not run on a non-ready PMM
server; reference the existing readiness check and the downstream "pmm-admin
add" invocation when implementing this change.
- Around line 27-36: The pmm-admin add mysql invocation (run via "docker exec
pmm-server pmm-admin add mysql" for target "fleet-mysql") is not checked for
success; update the user-data script to verify the command succeeded by
inspecting its exit status and/or parsing pmm-admin output and then confirming
registration with a follow-up "docker exec pmm-server pmm-admin list" (or
equivalent) to ensure "fleet-mysql" appears; if the add fails, log the error
output, retry a few times with backoff or exit non‑zero so Terraform/user-data
can surface the failure instead of silently continuing.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5072db73-4c8e-4cdf-88c9-1bd0e701f040

📥 Commits

Reviewing files that changed from the base of the PR and between b5e52ef and 15bbd0c.

📒 Files selected for processing (5)
  • infrastructure/loadtesting/terraform/infra/outputs.tf
  • infrastructure/loadtesting/terraform/pmm/main.tf
  • infrastructure/loadtesting/terraform/pmm/outputs.tf
  • infrastructure/loadtesting/terraform/pmm/user-data.sh
  • infrastructure/loadtesting/terraform/pmm/variables.tf

Comment thread infrastructure/loadtesting/terraform/pmm/main.tf Outdated
Comment thread infrastructure/loadtesting/terraform/pmm/main.tf Outdated
Comment thread infrastructure/loadtesting/terraform/pmm/user-data.sh Outdated
Comment thread infrastructure/loadtesting/terraform/pmm/user-data.sh Outdated
Comment thread infrastructure/loadtesting/terraform/pmm/user-data.sh Outdated
Comment thread infrastructure/loadtesting/terraform/pmm/user-data.sh Outdated
@ksykulev ksykulev requested a review from rfairburn May 15, 2026 15:13
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@infrastructure/loadtesting/terraform/pmm/main.tf`:
- Around line 88-91: The ECS task/service can start before the secret value
exists because task definitions only reference aws_secretsmanager_secret
metadata; update the ECS resource(s) that create tasks (e.g.,
aws_ecs_task_definition and/or aws_ecs_service) to add an explicit depends_on
referencing aws_secretsmanager_secret_version.pmm_admin_password (and the other
secret_version resources noted at lines 155-162 and 226-229) so Terraform will
wait for the first secret version to be created before registering or starting
tasks.
- Around line 181-213: The background subshell that performs the PATCH to
/v1/users and the pmm-admin add mysql (spawned in the container
entryPoint/command) can fail silently because it is backgrounded with "&" while
/opt/entrypoint.sh is exec'd; change the startup so the bootstrap steps run
synchronously before exec and propagate errors: run the readiness loop, run the
curl PATCH (change admin password) and the pmm-admin add mysql sequentially
without backgrounding, set the script to fail fast on any command error (e.g.,
enable errexit/pipefail behavior), log failures from the curl and pmm-admin
calls, and exit non-zero if either command fails so /opt/entrypoint.sh is not
started when bootstrap fails.
- Around line 220-233: The PMM task is using the RDS cluster master credentials
(PMM_MYSQL_USERNAME valueFrom
data.terraform_remote_state.infra.outputs.rds_cluster_master_username and
PMM_MYSQL_PASSWORD from data.aws_secretsmanager_secret.rds_password), which
gives full admin access; create a dedicated monitoring DB user with minimal
privileges required for PMM scraping, store its username/password in Secrets
Manager, and update the task env vars to reference the new secret and username
(replace PMM_MYSQL_USERNAME and PMM_MYSQL_PASSWORD references to use the new
secret ARN and username secret). Ensure you provision the DB user (via
Terraform-provisioned SQL, aws_db_instance tools, or a one-time migration) and
grant only the least privileges needed for monitoring.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2d91130a-f5de-4a01-9b74-5db442c3e1b3

📥 Commits

Reviewing files that changed from the base of the PR and between 15bbd0c and 07f9ce2.

📒 Files selected for processing (3)
  • infrastructure/loadtesting/terraform/infra/outputs.tf
  • infrastructure/loadtesting/terraform/pmm/main.tf
  • infrastructure/loadtesting/terraform/pmm/outputs.tf
🚧 Files skipped from review as they are similar to previous changes (1)
  • infrastructure/loadtesting/terraform/infra/outputs.tf

Comment on lines +88 to +91
resource "aws_secretsmanager_secret_version" "pmm_admin_password" {
secret_id = aws_secretsmanager_secret.pmm_admin_password.id
secret_string = random_password.pmm_admin.result
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Wait for the first admin-secret version before creating tasks.

The task definition only depends on the secret metadata, not on aws_secretsmanager_secret_version.pmm_admin_password. On a brand-new workspace ECS can start before the secret has a current value and fail to resolve PMM_ADMIN_PASSWORD.

Proposed fix
 resource "aws_ecs_task_definition" "pmm" {
   family                   = "${local.customer}-pmm"
+  depends_on               = [aws_secretsmanager_secret_version.pmm_admin_password]
   requires_compatibilities = ["FARGATE"]
   network_mode             = "awsvpc"
   cpu                      = 2048

Also applies to: 155-162, 226-229

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@infrastructure/loadtesting/terraform/pmm/main.tf` around lines 88 - 91, The
ECS task/service can start before the secret value exists because task
definitions only reference aws_secretsmanager_secret metadata; update the ECS
resource(s) that create tasks (e.g., aws_ecs_task_definition and/or
aws_ecs_service) to add an explicit depends_on referencing
aws_secretsmanager_secret_version.pmm_admin_password (and the other
secret_version resources noted at lines 155-162 and 226-229) so Terraform will
wait for the first secret version to be created before registering or starting
tasks.

Comment on lines +181 to +213
entryPoint = ["/bin/bash", "-c"]
command = [join("\n", [
"# Background: wait for PMM server, then configure it",
"(",
" echo 'Waiting for PMM server to become ready...'",
" until curl -sSf -k https://localhost:443/v1/readyz > /dev/null 2>&1; do",
" sleep 5",
" done",
" echo 'PMM server is ready'",
"",
" # Change default admin password",
" curl -sSf -k -X PATCH https://localhost:443/v1/users \\",
" -H 'Content-Type: application/json' \\",
" -u admin:admin \\",
" -d \"{\\\"new_password\\\": \\\"$PMM_ADMIN_PASSWORD\\\"}\"",
" echo 'Admin password changed'",
"",
" # Add RDS MySQL monitoring",
" pmm-admin add mysql \\",
" --server-url=\"https://admin:$PMM_ADMIN_PASSWORD@localhost:443\" \\",
" --server-insecure-tls \\",
" --username=\"$PMM_MYSQL_USERNAME\" \\",
" --password=\"$PMM_MYSQL_PASSWORD\" \\",
" --host=\"$PMM_MYSQL_HOST\" \\",
" --port=3306 \\",
" --query-source=perfschema \\",
" fleet-mysql",
" echo 'MySQL monitoring configured'",
") &",
"",
"# Run PMM server as PID 1 (foreground)",
"exec /opt/entrypoint.sh",
])]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Don't let PMM bootstrap fail silently in the background.

The password change and pmm-admin add mysql both run in a background subshell. If either command fails, /opt/entrypoint.sh still stays healthy and /v1/readyz still passes, so the service looks healthy while database monitoring was never configured.

🧰 Tools
🪛 Betterleaks (1.2.0)

[high] 192-194: Discovered a potential basic authorization token provided in a curl command, which could compromise the curl accessed resource.

(curl-auth-user)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@infrastructure/loadtesting/terraform/pmm/main.tf` around lines 181 - 213, The
background subshell that performs the PATCH to /v1/users and the pmm-admin add
mysql (spawned in the container entryPoint/command) can fail silently because it
is backgrounded with "&" while /opt/entrypoint.sh is exec'd; change the startup
so the bootstrap steps run synchronously before exec and propagate errors: run
the readiness loop, run the curl PATCH (change admin password) and the pmm-admin
add mysql sequentially without backgrounding, set the script to fail fast on any
command error (e.g., enable errexit/pipefail behavior), log failures from the
curl and pmm-admin calls, and exit non-zero if either command fails so
/opt/entrypoint.sh is not started when bootstrap fails.

Comment on lines +220 to +233
{
name = "PMM_MYSQL_USERNAME"
value = data.terraform_remote_state.infra.outputs.rds_cluster_master_username
},
]

secrets = [
{
name = "PMM_ADMIN_PASSWORD"
valueFrom = aws_secretsmanager_secret.pmm_admin_password.arn
},
{
name = "PMM_MYSQL_PASSWORD"
valueFrom = "${data.aws_secretsmanager_secret.rds_password.arn}:password::"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Use a dedicated monitoring user instead of the cluster master account.

This task passes the cluster master username and password into PMM. That gives the monitoring container full administrative DB access when it only needs a narrower privilege set for scraping, which materially widens the blast radius if PMM is compromised.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@infrastructure/loadtesting/terraform/pmm/main.tf` around lines 220 - 233, The
PMM task is using the RDS cluster master credentials (PMM_MYSQL_USERNAME
valueFrom data.terraform_remote_state.infra.outputs.rds_cluster_master_username
and PMM_MYSQL_PASSWORD from data.aws_secretsmanager_secret.rds_password), which
gives full admin access; create a dedicated monitoring DB user with minimal
privileges required for PMM scraping, store its username/password in Secrets
Manager, and update the task env vars to reference the new secret and username
(replace PMM_MYSQL_USERNAME and PMM_MYSQL_PASSWORD references to use the new
secret ARN and username secret). Ensure you provision the DB user (via
Terraform-provisioned SQL, aws_db_instance tools, or a one-time migration) and
grant only the least privileges needed for monitoring.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't care for load testing really.

Copy link
Copy Markdown
Contributor

@rfairburn rfairburn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the ECS version is working for you end-to-end, this LGTM. I don't see a problem with the listener rule or anything else at the moment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants