From 0203914484d6a463dd0151594cf61bff16140f86 Mon Sep 17 00:00:00 2001
From: Kristin Martin
Date: Thu, 7 May 2026 18:37:28 +0000
Subject: [PATCH 1/4] Document one-app-per-user scale limit for autostop

Add a footgun to the per-user dev environments blueprint explaining that the Fly Proxy autostop loop can't keep idle Machines stopped at thousands-of-Machines scale, with pointers in the autostop config and proxy reference docs back to the dynamic-routing pattern.
---
 blueprints/per-user-dev-environments.html.md        | 1 +
 launch/autostop-autostart.html.markerb              | 2 ++
 reference/fly-proxy-autostop-autostart.html.markerb | 4 ++++
 3 files changed, 7 insertions(+)

diff --git a/blueprints/per-user-dev-environments.html.md b/blueprints/per-user-dev-environments.html.md
index 0c0df40484..153281d592 100644
--- a/blueprints/per-user-dev-environments.html.md
+++ b/blueprints/per-user-dev-environments.html.md
@@ -61,6 +61,7 @@ You'll want to spin up at least one Machine per user app (but apps can have as m
 - **Machines & volumes are tied to physical hardware:** hardware failures can destroy machines and attached volumes. **Always persist important user data** (code, config, outputs) to external storage (like [Tigris Data](/docs/tigris/#main-content-start) or AWS S3).
 - **Your users will break their environments:** pre-create standby machines to handle hardware & runtime failures, or the inevitable user or robot poisoned environment. Pre-create standby machines that you can quickly activate in these scenarios.
 - **Machine restarts reset ephemeral filesystem:** the temporary Fly Machine filesystem state resets on Machine restarts, ensuring clean environments. However, volume data remains persistent, making it useful for retaining user progress or state.
+- **One app per user:** Putting all user Machines into a single app and routing to them with dynamic routing works, but `auto_stop_machines` won't behave the way you expect once you scale. 
The Fly Proxy's stop loop is rate-limited: it [stops or suspends at most one Machine per region each pass, and runs every few minutes](/docs/reference/fly-proxy-autostop-autostart/#fly-proxy-process-to-stop-or-suspend-machines). That's fine for a normal app; with thousands of Machines the loop can't keep up, and most of your idle Machines stay running. Machines in a single app also share app-level secrets and a flat [private network](/docs/networking/private-networking/). A compromised user environment can reach every other Machine in the app. If you do keep all user Machines in one app, implement stop-when-idle behavior in your app or orchestrator (see [apps that shut down when idle](/docs/launch/autostop-autostart/#apps-that-shut-down-when-idle)). Don't rely on the Fly Proxy to keep most Machines stopped.
 ## Related reading
diff --git a/launch/autostop-autostart.html.markerb b/launch/autostop-autostart.html.markerb
index a6d75eb0de..ef0edfbbfa 100644
--- a/launch/autostop-autostart.html.markerb
+++ b/launch/autostop-autostart.html.markerb
@@ -14,6 +14,8 @@ Get all the details of [how Fly Proxy autostop/autostart works](/docs/reference/
 Autostop/autostart works well for apps with highly variable workloads, for smaller apps with low or sporadic traffic, and for most apps that aren't receiving requests continuously. You can reduce resource usage and costs by using autostop/autostart to manage your Fly Machines as demand decreases and increases. You'll never have to run excess Machines to handle peak load; you'll only run, and get charged for, the number of Machines that you need. You can choose to keep one or more Machines running in your primary region.
+Autostop/autostart isn't a fit for every workload. The Fly Proxy's stop loop runs every few minutes and stops at most one Machine per region per pass. 
That's fine for normal apps, but if you're running thousands of Machines in a single app (for example, [per-user dev environments](/docs/blueprints/per-user-dev-environments/)), the loop can't keep up. In that case, use [one app per user](/docs/machines/guides-examples/one-app-per-user-why/) with [dynamic routing](/docs/networking/dynamic-request-routing/), or have your app shut itself down when idle.
+
 ## Configure autostop/autostart
 The autostop/autostart settings are part of each service in an app's `fly.toml` file. See the [[[services]]](/docs/reference/configuration/#the-services-sections) or [[http_service]](/docs/reference/configuration/#the-http_service-section) docs for details about service configuration. You can also add services to [private apps](#private-apps).
diff --git a/reference/fly-proxy-autostop-autostart.html.markerb b/reference/fly-proxy-autostop-autostart.html.markerb
index 915ee43890..be0fc38c76 100644
--- a/reference/fly-proxy-autostop-autostart.html.markerb
+++ b/reference/fly-proxy-autostop-autostart.html.markerb
@@ -39,6 +39,10 @@ Fly Proxy determines excess capacity per region as follows:
 * the proxy checks if the Machine has any traffic
 * if the Machine has no traffic (a load of 0), then the proxy stops or suspends the Machine
+
+**At scale**, the rate-limited loop becomes the constraint: with thousands of Machines in a single app, the proxy can't stop them fast enough to keep most idle Machines stopped. If that's your shape (for example, [per-user dev environments](/docs/blueprints/per-user-dev-environments/)), use [one app per user](/docs/machines/guides-examples/one-app-per-user-why/) with [dynamic routing](/docs/networking/dynamic-request-routing/), or have your app shut down when idle. +
+
 ### Fly Proxy process to start Machines
 When `auto_start_machines = true` in your `fly.toml`, the Fly Proxy restarts a Machine in the nearest region when required.

From fe0bd0e9c6888c143403204b7275f51c641f5513 Mon Sep 17 00:00:00 2001
From: Kristin Martin
Date: Thu, 7 May 2026 11:41:19 -0700
Subject: [PATCH 2/4] update date

---
 blueprints/per-user-dev-environments.html.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/blueprints/per-user-dev-environments.html.md b/blueprints/per-user-dev-environments.html.md
index 153281d592..64871c54d8 100644
--- a/blueprints/per-user-dev-environments.html.md
+++ b/blueprints/per-user-dev-environments.html.md
@@ -2,7 +2,7 @@
 title: Per-User Dev Environments with Fly Machines
 layout: docs
 nav: guides
-date: 2025-04-02
+date: 2025-05-07
 ---
From ae2317e1d33c092f66f42e438072b90144d1ea99 Mon Sep 17 00:00:00 2001
From: Kristin Martin
Date: Thu, 7 May 2026 11:42:11 -0700
Subject: [PATCH 3/4] correct date

---
 blueprints/per-user-dev-environments.html.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/blueprints/per-user-dev-environments.html.md b/blueprints/per-user-dev-environments.html.md
index 64871c54d8..0d5a612a41 100644
--- a/blueprints/per-user-dev-environments.html.md
+++ b/blueprints/per-user-dev-environments.html.md
@@ -2,7 +2,7 @@
 title: Per-User Dev Environments with Fly Machines
 layout: docs
 nav: guides
-date: 2025-05-07
+date: 2026-05-07
 ---
From b53806280a1b32647fc21b00e831c7157e7d2a0e Mon Sep 17 00:00:00 2001
From: Kristin Martin
Date: Thu, 7 May 2026 11:47:21 -0700
Subject: [PATCH 4/4] copy edit

---
 reference/fly-proxy-autostop-autostart.html.markerb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/reference/fly-proxy-autostop-autostart.html.markerb b/reference/fly-proxy-autostop-autostart.html.markerb
index be0fc38c76..ff3e989563 100644
--- a/reference/fly-proxy-autostop-autostart.html.markerb
+++ b/reference/fly-proxy-autostop-autostart.html.markerb
@@ -40,7 +40,7 @@ Fly Proxy determines excess capacity per region as follows:
 * if the Machine has no traffic (a load of 0), then the proxy stops or suspends the Machine
-**At scale**, the rate-limited loop becomes the constraint: with thousands of Machines in a single app, the proxy can't stop them fast enough to keep most idle Machines stopped. If that's your shape (for example, [per-user dev environments](/docs/blueprints/per-user-dev-environments/)), use [one app per user](/docs/machines/guides-examples/one-app-per-user-why/) with [dynamic routing](/docs/networking/dynamic-request-routing/), or have your app shut down when idle.
+**At scale**, the rate-limited loop becomes the constraint: with thousands of Machines in a single app, the proxy can't stop them fast enough to keep most idle Machines stopped. If that's your use case (for example, [per-user dev environments](/docs/blueprints/per-user-dev-environments/)), use [one app per user](/docs/machines/guides-examples/one-app-per-user-why/) with [dynamic routing](/docs/networking/dynamic-request-routing/), or have your app shut down when idle.
### Fly Proxy process to start Machines
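
A back-of-envelope check of the "can't keep up" claim in the paragraphs this series adds. It models only the limit the docs state: at most one Machine stopped per region on each pass of the stop loop. The function name, the 5-minute pass interval, and the Machine and region counts are illustrative assumptions; the docs only say the loop runs "every few minutes".

```python
import math


def hours_to_stop_all(idle_machines: int, regions: int, pass_interval_min: float) -> float:
    """Best-case time, in hours, for the stop loop to stop every idle Machine.

    Assumes the loop stops exactly one Machine in every region on every
    pass and that no Machine is restarted in the meantime.
    """
    stops_per_pass = regions  # one Machine per region per pass
    passes = math.ceil(idle_machines / stops_per_pass)
    return passes * pass_interval_min / 60


# Hypothetical inputs: 5,000 idle Machines spread evenly over 5 regions,
# with an assumed 5-minute pass interval (not a documented value).
print(hours_to_stop_all(idle_machines=5000, regions=5, pass_interval_min=5.0))
```

With these assumed numbers the best case is several days of continuous passes, which is why the patch points readers at one app per user or app-level idle shutdown instead.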