Problem
When a worker node runs more than one test (e.g. idle → new POST /config → running again),
all runs produce metrics with identical label sets (region, tenant). There is no way
to distinguish test 1 from test 2 in Prometheus/GMP queries or dashboards.
Additionally, node_id (the hostname of the worker) is only present on the
rust_loadtest_node_info gauge, not on the per-request counters and histograms,
so per-node breakdowns require joining an info metric.
Proposed changes
1. Add run_id to YamlMetadata (src/yaml_config.rs)
metadata:
  name: "Acme checkout load test"
  tenant: "acme"
  run_id: "run-20260321-001234" # generated by control plane (webload-gui#83)
If run_id is absent from the YAML (e.g. standalone env-var boot), auto-generate
a value at POST /config time:
let run_id = yaml_cfg.metadata.run_id
    .unwrap_or_else(|| format!("auto-{}", chrono::Utc::now().timestamp()));
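The fallback above can be isolated into a small helper so it is testable without a real clock. This is a minimal std-only sketch: `resolve_run_id` and `unix_now_secs` are hypothetical names, `std::time` stands in for the `chrono` call shown above, and treating a blank `run_id` as missing is an assumption rather than something the proposal specifies.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Returns the run_id from the YAML metadata, or an `auto-<unix seconds>`
/// fallback when the field is missing.
/// Assumption: a blank value is treated like a missing one.
fn resolve_run_id(configured: Option<String>, now_unix_secs: u64) -> String {
    configured
        .filter(|id| !id.trim().is_empty())
        .unwrap_or_else(|| format!("auto-{}", now_unix_secs))
}

/// Current Unix time in seconds; falls back to 0 if the system clock
/// reports a time before the epoch.
fn unix_now_secs() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0)
}
```

A POST /config handler would then call something like `resolve_run_id(yaml_cfg.metadata.run_id, unix_now_secs())`. Passing the timestamp in as a parameter keeps the helper deterministic in tests.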
2. Add node_id and run_id label dimensions to all request metrics (src/metrics.rs)
Affected metrics:
| Metric | Current labels | New labels |
|---|---|---|
| rust_loadtest_requests_total | region, tenant | region, tenant, node_id, run_id |
| rust_loadtest_request_errors_by_category | category, region, tenant | + node_id, run_id |
| rust_loadtest_request_duration_seconds (histogram) | region, tenant | + node_id, run_id |
| Per-step histograms (step, scenario) | step, scenario, region, tenant | + node_id, run_id |
node_id is read from the CLUSTER_NODE_ID env var (already available at startup).
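To illustrate why the extra labels separate the runs, here is a minimal sketch that models a labelled counter as a map from label set to running total. `RequestLabels`, `RequestCounter`, and `node_id_from_env` are hypothetical names, the map is a stand-in for the real metrics library, and the "unknown" fallback for an unset CLUSTER_NODE_ID is an assumption.

```rust
use std::collections::HashMap;

/// Label values attached to every request metric after this change.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct RequestLabels {
    region: String,
    tenant: String,
    node_id: String,
    run_id: String,
}

/// Hypothetical in-memory stand-in for a labelled counter such as
/// rust_loadtest_requests_total: one running total per distinct label set.
#[derive(Default)]
struct RequestCounter {
    series: HashMap<RequestLabels, u64>,
}

impl RequestCounter {
    /// Increments the series identified by `labels`, creating it if needed.
    fn inc(&mut self, labels: &RequestLabels) {
        *self.series.entry(labels.clone()).or_insert(0) += 1;
    }

    /// Returns the current total for `labels`, or 0 if the series does not exist.
    fn get(&self, labels: &RequestLabels) -> u64 {
        self.series.get(labels).copied().unwrap_or(0)
    }
}

/// Reads the node identifier from CLUSTER_NODE_ID.
/// Assumption: an "unknown" placeholder is used when the variable is unset.
fn node_id_from_env() -> String {
    std::env::var("CLUSTER_NODE_ID").unwrap_or_else(|_| "unknown".to_string())
}
```

With only region and tenant as labels, two runs on the same node would increment the same series. With run_id in the label set they land in separate series, so the number of series grows with the number of runs a node serves.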
3. Thread node_id and run_id through WorkerConfig
pub struct WorkerConfig {
    // existing fields ...
    pub node_id: String,
    pub run_id: String,
}
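One way to keep label ordering consistent across call sites is to let WorkerConfig produce the label values itself. The sketch below assumes that region and tenant already live on WorkerConfig and that the metrics layer accepts label values as a list of string slices. The method names `request_label_values` and `step_label_values` are hypothetical.

```rust
/// Subset of WorkerConfig relevant to metric labels.
/// Assumption: region and tenant are among the existing fields.
#[derive(Clone, Debug)]
pub struct WorkerConfig {
    pub region: String,
    pub tenant: String,
    pub node_id: String,
    pub run_id: String,
}

impl WorkerConfig {
    /// Label values for request-level metrics, in the order
    /// region, tenant, node_id, run_id.
    pub fn request_label_values(&self) -> [&str; 4] {
        [
            self.region.as_str(),
            self.tenant.as_str(),
            self.node_id.as_str(),
            self.run_id.as_str(),
        ]
    }

    /// Label values for per-step histograms: step and scenario first,
    /// followed by the shared request labels.
    pub fn step_label_values<'a>(&'a self, step: &'a str, scenario: &'a str) -> [&'a str; 6] {
        [
            step,
            scenario,
            self.region.as_str(),
            self.tenant.as_str(),
            self.node_id.as_str(),
            self.run_id.as_str(),
        ]
    }
}
```

Centralizing the order in one place means a call site cannot accidentally swap node_id and run_id when recording a metric.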
4. Expose active run_id in GET /health
{
  "node_id": "worker-run-20260321-001",
  "run_id": "run-20260321-001234",
  ...
}
Allows the control plane to confirm which run is active and correlate GMP metrics.
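As a rough illustration of the response shape, the sketch below builds only the node_id and run_id part of the health body using the standard library. `health_body` and `json_escape` are hypothetical helpers, and the other health fields are left out. The real handler would presumably use whatever JSON serialization the project already has.

```rust
/// Escapes the characters that must not appear raw inside a JSON string.
fn json_escape(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for ch in value.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds the node_id/run_id part of the health body.
/// Other health fields are omitted in this sketch.
fn health_body(node_id: &str, run_id: &str) -> String {
    format!(
        "{{\"node_id\":\"{}\",\"run_id\":\"{}\"}}",
        json_escape(node_id),
        json_escape(run_id)
    )
}
```

The control plane can compare the returned run_id with the one it dispatched before it starts querying metrics for that run.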
5. run_id in TestState
struct TestState {
    // existing fields ...
    run_id: String,
}
Reset on each POST /config, same as generation: the new value is the run_id from the submitted YAML, or an auto-generated one if the YAML has none.
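The reset behaviour can be sketched as a single state transition. This assumes generation is a counter bumped on every accepted config, which the proposal implies but does not spell out. `start_run` is a hypothetical method name, and the timestamp is passed in so the example stays deterministic.

```rust
/// Subset of TestState relevant to run tracking.
/// Assumption: generation is a counter bumped on every accepted config.
#[derive(Debug, Default)]
struct TestState {
    generation: u64,
    run_id: String,
}

impl TestState {
    /// Called when a POST /config is accepted: bumps generation and replaces
    /// run_id with the YAML value, or an auto-generated one when the YAML has none.
    fn start_run(&mut self, yaml_run_id: Option<String>, now_unix_secs: u64) {
        self.generation += 1;
        self.run_id = yaml_run_id.unwrap_or_else(|| format!("auto-{}", now_unix_secs));
    }
}
```

With a second-resolution timestamp, two auto-generated IDs created within the same second would be identical, so this may be worth keeping in mind when checking that separate POST /config calls yield different run_id values.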
Example GMP queries after change
# Total RPS for a specific run
sum(rate(rust_loadtest_requests_total{run_id="run-20260321-001234"}[1m]))
# Compare two runs on the same node
sum(rate(rust_loadtest_requests_total{node_id="worker-1"}[1m])) by (run_id)
# Errors per second for a tenant, broken down by run and node
sum(rate(rust_loadtest_request_errors_by_category{tenant="acme"}[5m])) by (run_id, node_id)
Acceptance criteria
- run_id field added to YamlMetadata; auto-generated (timestamp-based) if absent
- node_id and run_id present on requests_total, errors_by_category, duration histograms, and per-step histograms
- WorkerConfig carries node_id and run_id; passed through to all metric label calls
- GET /health response includes active run_id
- TestState stores run_id; reset on each new POST /config
- POST /config calls produce metrics with different run_id values
Related
- webload-gui#83: control plane generates and includes run_id in dispatched YAML
- Depends on no other issues