Add Gatling load test suite for overpass.deflock.org #147

Draft
dougborg wants to merge 1 commit into FoggedLens:main from dougborg:feat/load-tests

dougborg (Collaborator) commented Mar 9, 2026

Summary

  • Adds a Gatling (Scala) load test suite in load-tests/ to validate overpass.deflock.org performance before switching app users to it
  • Four simulation scenarios: baseline (deterministic single-user), concurrent (ramp to 50), stress (spike to 500), and burst (realistic app session waves)
  • Queries use the exact same per-profile tag filters as the app's NodeProfile.getDefaults() (all 11 built-in profiles)
  • Manual-trigger GitHub Actions workflow with scenario picker dropdown, parallel matrix runs, and downloadable HTML report artifacts
  • Includes shared source set for testable pure logic, ScalaTest unit tests (22 tests), dev container, and documentation
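
As a rough illustration of the "pure logic" the shared source set is meant to hold, the sketch below assembles an Overpass QL query from a list of per-profile tag filters and a bounding box. It is not the PR's `OverpassQuery` implementation: the object name, the `build` signature, the 25-second timeout, and the example tag pair are all assumptions, and the real filters come from the app's `NodeProfile.getDefaults()`.

```scala
import java.util.Locale

// Hypothetical sketch; names and defaults are illustrative, not the PR's code.
object OverpassQuerySketch {
  // Bounding box in the order Overpass QL expects: south, west, north, east.
  final case class BBox(south: Double, west: Double, north: Double, east: Double)

  // Each profile is a list of (key, value) tag pairs that must all match a node.
  def build(profiles: Seq[Seq[(String, String)]], bbox: BBox, timeoutSec: Int = 25): String = {
    // Locale.ROOT keeps "." as the decimal separator regardless of the JVM locale.
    val box = "(%.5f,%.5f,%.5f,%.5f)".formatLocal(
      Locale.ROOT, bbox.south, bbox.west, bbox.north, bbox.east)
    // One `node[...][...](bbox);` statement per profile, joined into a union.
    val statements = profiles.map { tags =>
      val selectors = tags.map { case (k, v) => s"""["$k"="$v"]""" }.mkString
      s"node$selectors$box;"
    }
    s"[out:json][timeout:$timeoutSec];(${statements.mkString});out body;"
  }
}
```

Because the function has no Gatling dependency, it can be exercised from plain unit tests, for example by asserting that one statement is emitted per profile.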

Scenarios

| Scenario | Users | Duration | Purpose |
|---|---|---|---|
| Baseline | 1 | ~2 min | Deterministic zoom progression (6 zooms × 6 cities = 36 requests) |
| Concurrent | 1→50 | 4 min | Find the degradation inflection point |
| Stress | 500 | 5 min | Exceed ~512 Overpass compute slots |
| Burst | Waves of 20→50→100→80 | ~3 min | Realistic app sessions (10-20 requests each) |
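
The baseline's 36 requests are the cross product of six zoom levels and six cities. A minimal Scala sketch of such a deterministic plan follows, assuming zooms 10 through 15; the city list, the coordinates, and the viewport formula are placeholders and are not taken from the suite's `TestData`.

```scala
// Hypothetical sketch of a deterministic baseline plan; all values are placeholders.
object BaselinePlanSketch {
  final case class City(name: String, lat: Double, lon: Double)
  final case class BBox(south: Double, west: Double, north: Double, east: Double)

  // Assumed zoom range: six levels, z10 through z15.
  val zooms: Seq[Int] = 10 to 15

  // Placeholder cities with approximate coordinates, not the suite's actual list.
  val cities: Seq[City] = Seq(
    City("Seattle", 47.61, -122.33),
    City("San Francisco", 37.77, -122.42),
    City("Denver", 39.74, -104.99),
    City("Chicago", 41.88, -87.63),
    City("Atlanta", 33.75, -84.39),
    City("New York", 40.71, -74.01)
  )

  // Simplifying assumption: the viewport extends one tile width (360 / 2^zoom
  // degrees) from the center in every direction, ignoring Mercator distortion.
  def viewport(city: City, zoom: Int): BBox = {
    val half = 360.0 / math.pow(2, zoom)
    BBox(city.lat - half, city.lon - half, city.lat + half, city.lon + half)
  }

  // Fixed iteration order makes every run issue the same requests in the same order.
  val plan: Seq[(City, Int, BBox)] =
    for { city <- cities; zoom <- zooms } yield (city, zoom, viewport(city, zoom))
}
```

With no randomness in the plan, two baseline runs can be compared request by request.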

Baseline results (with full 11-profile tag filters)

36 requests, 0 failures:

| Metric | Value |
|---|---|
| p50 | 1,655 ms |
| p75 | 3,311 ms |
| p95 | 6,309 ms |
| p99 | 9,891 ms |
| Error rate | 0% |

Project structure

load-tests/src/
├── shared/   — Pure logic (OverpassQuery, TestData) — no Gatling dependency
├── gatling/  — Simulations + HTTP request def — depends on shared
└── test/     — ScalaTest unit tests — depends on shared

The shared source set avoids a circular Gradle dependency between gatling and test.

Test plan

  • ./gradlew compileGatlingScala — all 4 simulations compile
  • ./gradlew test — 22 unit tests pass
  • ./gradlew gatlingRun — baseline passes (36/36, 0 errors)
  • ./gradlew gatlingRun --simulation deflock.ConcurrentSimulation — runs, 434 requests at 50 users, 0 errors
  • GitHub Actions workflow triggers and uploads report artifact
  • Dockerfile pins Coursier v2.1.24 with SHA256 checksum verification

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings March 9, 2026 05:59
Copilot AI (Contributor) left a comment

Pull request overview

Adds a standalone Gatling/Scala load-testing project under load-tests/ plus a manual GitHub Actions workflow to generate and upload HTML performance reports for the overpass.deflock.org Overpass instance.

Changes:

  • Introduces Gatling simulations/request builders and test data for a single-user zoom progression scenario.
  • Adds Gradle wrapper/build config, Gatling/logback config, and contributor docs (README + dev container).
  • Adds a manually triggered GitHub Actions workflow to run the load test and upload reports.

Reviewed changes

Copilot reviewed 15 out of 17 changed files in this pull request and generated 6 comments.

| File | Description |
|---|---|
| load-tests/src/gatling/scala/deflock/TestData.scala | Defines city centers, zoom viewport sizes, and feeders for request parameterization. |
| load-tests/src/gatling/scala/deflock/OverpassSimulation.scala | Implements the baseline Gatling scenario and global assertions. |
| load-tests/src/gatling/scala/deflock/OverpassRequests.scala | Defines Overpass query construction, timeouts, and HTTP request checks. |
| load-tests/src/gatling/resources/logback-test.xml | Sets default logging level for simulations. |
| load-tests/src/gatling/resources/gatling.conf | Configures report charting thresholds. |
| load-tests/settings.gradle.kts | Sets the Gradle root project name. |
| load-tests/build.gradle.kts | Adds Scala + Gatling Gradle plugin and Maven Central repository. |
| load-tests/README.md | Documents purpose, running locally/CI, and interpreting reports. |
| load-tests/.gitignore | Ignores Gradle outputs and caches within load-tests/. |
| load-tests/.devcontainer/devcontainer.json | Defines a VS Code dev container for running the load tests. |
| load-tests/.devcontainer/Dockerfile | Builds the dev container image (JDK 21 + Coursier). |
| load-tests/gradlew | Adds Gradle wrapper script (POSIX). |
| load-tests/gradlew.bat | Adds Gradle wrapper script (Windows). |
| load-tests/gradle/wrapper/gradle-wrapper.properties | Configures Gradle wrapper distribution and settings. |
| load-tests/gradle/wrapper/gradle-wrapper.jar | Adds the Gradle wrapper JAR. |
| .github/workflows/load-test.yml | Adds a manual workflow to run Gatling and upload the HTML report artifact. |
| .gitignore | Fixes formatting/indentation for the windows/ ignore entry. |

dougborg force-pushed the feat/load-tests branch 7 times, most recently from 83b9e5a to ef8233d, on March 9, 2026 07:28
dougborg changed the title from "Add Gatling load tests for overpass.deflock.org" to "Add Gatling load test suite for overpass.deflock.org" on Mar 9, 2026

Four simulation scenarios to validate server performance (~512 Overpass
compute slots) before switching app users to the self-hosted instance:

- Baseline: deterministic single-user zoom progression (36 requests)
- Concurrent: ramp 1→50 users to find degradation inflection point
- Stress: spike to 500 users to exceed server capacity
- Burst: realistic app sessions (10-20 requests each) in waves

Queries use the exact same per-profile tag filters as the app's
NodeProfile.getDefaults() (all 11 built-in profiles). Pure query logic
lives in a shared source set with 22 ScalaTest unit tests.

Includes GitHub Actions workflow with scenario picker dropdown, matrix
strategy for parallel runs, and PR comment with report download links.
Dev container with pinned Coursier (SHA256 verified) for IDE support.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dougborg (Collaborator, Author) commented Mar 9, 2026

I worked on this with some excellent input from T on the server. I think this will really help us tune things and make sure we can support the load, etc.


Open for discussion

This PR establishes the test harness and four initial scenarios, but the tuning numbers are starting points based on assumptions — not measurements. Happy to adjust any of these based on what Thomas knows about the server and what we learn from runs.

Tuning knobs

| Parameter | Current value | Rationale | Question |
|---|---|---|---|
| Concurrent user cap | 50 | Conservative; stays below the assumed 512 slots | Is 50 too timid? Should we push to 100-200? |
| Stress spike | 500 users total | Designed to exceed 512 slots | Is 512 actually the slot count? Should we go higher? |
| Burst wave sizes | 20→50→100→80 | Gut feel for "bursty app traffic" | Do we have real traffic data to calibrate against? |
| Burst session length | 10-20 requests | Rough estimate of a map browsing session | What does actual app telemetry say? |
| Pause between requests | 200-800 ms (burst), 500 ms (concurrent), 100 ms (stress) | Simulates pan/zoom interaction speed | Too fast? Too slow? |
| Weighted zoom distribution | 80% z13-z15, 20% z10-z12 | Assumption that most users are zoomed in | Do we have zoom level analytics? |
| Assertion thresholds | p99 < 30 s baseline, p95 < 45 s concurrent, p95 < 30 s burst | Generous; exploring, not enforcing an SLA | What response times are actually acceptable for the app UX? |
| City set | 6 US cities | High surveillance camera density | Should we add international cities? Low-density areas? |
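
To make two of the knobs above concrete, here is a small Scala sketch of how the 80/20 zoom weighting and the 200-800 ms burst pause could be sampled. The object and method names are hypothetical, and the uniform choice within each zoom band and within the pause range is an assumption; the suite's own feeders may do this differently.

```scala
import scala.util.Random

// Hypothetical sketch of the weighted zoom pick and burst pause; not the PR's code.
object ZoomMixSketch {
  val closeZooms: Vector[Int] = Vector(13, 14, 15) // intended to receive 80% of picks
  val wideZooms: Vector[Int] = Vector(10, 11, 12)  // intended to receive 20% of picks

  // Picks a band with 80/20 probability, then a zoom uniformly within that band.
  def nextZoom(rng: Random): Int = {
    val band = if (rng.nextDouble() < 0.8) closeZooms else wideZooms
    band(rng.nextInt(band.size))
  }

  // Uniform pause between 200 and 800 ms inclusive (burst scenario values).
  def nextPauseMs(rng: Random): Int = 200 + rng.nextInt(601)
}
```

Passing the `Random` in explicitly lets a test fix the seed and check the observed zoom mix, and it keeps the 0.8 weight in one place if zoom analytics later suggest a different split.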

What we can easily add

Gatling is flexible — new scenarios are ~30 lines of Scala. Some ideas:

  • Soak test — moderate load (20-30 users) sustained for 30-60 minutes to catch memory leaks or slow degradation
  • Geographic sweep — random lat/lng worldwide instead of fixed cities, to stress different parts of the Overpass dataset
  • Cache effectiveness — repeat the same queries to measure how much the server's internal caching helps
  • Ramp-to-failure — start at 10 users, add 10 every minute, keep going until error rate exceeds 50%. Automatically finds the ceiling.
  • Single-profile queries — test individual profiles in isolation (just ALPR, just gunshot detectors) to see if query complexity affects latency differently
  • Realistic daily curve — model a full 24-hour traffic pattern compressed into 15 minutes
  • Traffic replay — in the future, we could record anonymized query patterns from the app (zoom levels, bbox sizes, request timing) and replay them through Gatling to model scenarios based on real user behavior rather than assumptions
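
The ramp-to-failure idea reduces to a simple stopping rule, sketched below in plain Scala without any Gatling wiring. The `errorRateAt` function stands in for "run one minute at this user count and measure the error rate"; it, the `maxUsers` safety cap, and the object name are assumptions for illustration only.

```scala
// Hypothetical sketch of the ramp-to-failure stopping rule; no Gatling involved.
object RampToFailureSketch {
  // Returns the user levels that would be run, ending with the first level
  // whose error rate exceeds `limit`, or with the last level at or below
  // `maxUsers` if the limit is never exceeded.
  def ramp(
      errorRateAt: Int => Double, // stand-in for a measured one-minute step
      start: Int = 10,
      step: Int = 10,
      limit: Double = 0.5,
      maxUsers: Int = 2000
  ): Seq[Int] = {
    val levels = Iterator.iterate(start)(_ + step).takeWhile(_ <= maxUsers).toVector
    val firstFailing = levels.indexWhere(users => errorRateAt(users) > limit)
    if (firstFailing < 0) levels else levels.take(firstFailing + 1)
  }
}
```

For example, with a stand-in server that starts failing above 120 users, `RampToFailureSketch.ramp(u => if (u <= 120) 0.0 else 0.6)` ends its ramp at 130 users, the first level past the ceiling.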

Open questions

  1. What's the actual server capacity? The 512 compute slot number came from early discussion — is that still accurate? Does nginx have its own connection limits in front of Overpass?
  2. What does "good" look like? The baseline shows p50 ≈ 1.7s and p99 ≈ 10s with the full 11-profile query. Is that acceptable for the app UX, or do we need it faster?
  3. Should the load test workflow run on a schedule? We could add a nightly or weekly cron trigger for the baseline to catch regressions after server updates.
  4. Multiple geographic regions? Running from GitHub Actions (US-East) means we're measuring US latency. Worth adding a runner in EU to test from there too?

@dougborg dougborg marked this pull request as draft March 12, 2026 18:49
dougborg (Collaborator, Author) commented

This will not be integrated here; we will be migrating it to the infra repo on GitLab.
