
Commit dde4a07

Docs minor improvements (#3501)
* [Docs] Improved documentation structure (WIP)
  - [x] Introduced `More` under `Concepts`
  - [x] Moved `Metrics` to `Concepts`
  - [x] Improved `Installation` (removed SSH fleets - only keep it in `Backends`; moved `Configure` after `Set up the server`)
  - [x] Mention `server restart is required after updating server/config.yml` in `Backends`
  - [x] Improved `Distributed tasks` (structure; links to `Fleets` and `Examples`)
* [Docs] Documentation improvements
  - [x] Improved `Fleets` documentation
  - [x] Minor improvements of the `Tasks` page under `Concepts`
  - [x] Minor improvements on the home page
* [Docs] Minor updates to `README.md`, `Overview`, `Fleets`, `Quickstart`, and examples
1 parent b4c6f17 commit dde4a07

19 files changed

Lines changed: 325 additions & 382 deletions


README.md

Lines changed: 6 additions & 7 deletions
```diff
@@ -18,7 +18,7 @@
 
 It streamlines development, training, and inference, and is compatible with any hardware, open-source tools, and frameworks.
 
-#### Hardware
+#### Accelerators
 
 `dstack` supports `NVIDIA`, `AMD`, `Google TPU`, `Intel Gaudi`, and `Tenstorrent` accelerators out of the box.
 
```

```diff
@@ -46,7 +46,7 @@ It streamlines development, training, and inference, and is compatible with any
 
 ##### Configure backends
 
-To orchestrate compute across cloud providers or existing Kubernetes clusters, you need to configure backends.
+To orchestrate compute across GPU clouds or Kubernetes clusters, you need to configure backends.
 
 Backends can be set up in `~/.dstack/server/config.yml` or through the [project settings page](https://dstack.ai/docs/concepts/projects#backends) in the UI.
 
```
```diff
@@ -123,12 +123,11 @@ Configuration is updated at ~/.dstack/config.yml
 
 `dstack` supports the following configurations:
 
-* [Dev environments](https://dstack.ai/docs/dev-environments) — for interactive development using a desktop IDE
-* [Tasks](https://dstack.ai/docs/tasks) — for scheduling jobs (incl. distributed jobs) or running web apps
-* [Services](https://dstack.ai/docs/services) — for deployment of models and web apps (with auto-scaling and authorization)
-* [Fleets](https://dstack.ai/docs/fleets) — for managing cloud and on-prem clusters
+* [Fleets](https://dstack.ai/docs/concepts/fleets) — for managing cloud and on-prem clusters
+* [Dev environments](https://dstack.ai/docs/concepts/dev-environments) — for interactive development using a desktop IDE
+* [Tasks](https://dstack.ai/docs/concepts/tasks) — for scheduling jobs (incl. distributed jobs) or running web apps
+* [Services](https://dstack.ai/docs/concepts/services) — for deployment of models and web apps (with auto-scaling and authorization)
 * [Volumes](https://dstack.ai/docs/concepts/volumes) — for managing persisted volumes
-* [Gateways](https://dstack.ai/docs/concepts/gateways) — for configuring the ingress traffic and public endpoints
 
 Configuration can be defined as YAML files within your repo.
 
```
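The configurations listed in the hunk above are defined as YAML files in your repo. For illustration, a minimal task configuration looks roughly like the sketch below (the name, script, and GPU size are hypothetical placeholders):

```yaml
# .dstack.yml — hypothetical minimal task configuration
type: task
name: train          # placeholder run name
python: "3.12"
commands:
  - python train.py  # placeholder training script
resources:
  gpu: 24GB          # request any GPU with at least 24GB of memory
```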

docs/blog/posts/gpu-health-checks.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -12,7 +12,7 @@ categories:
 
 In large-scale training, a single bad GPU can derail progress. Sometimes the failure is obvious — jobs crash outright. Other times it’s subtle: correctable memory errors, intermittent instability, or thermal throttling that quietly drags down throughput. In big experiments, these issues can go unnoticed for hours or days, wasting compute and delaying results.
 
-`dstack` already supports GPU telemetry monitoring through NVIDIA DCGM [metrics](../../docs/guides/metrics.md), covering utilization, memory, and temperature. This release extends that capability with passive hardware health checks powered by DCGM [background health checks](https://docs.nvidia.com/datacenter/dcgm/latest/user-guide/feature-overview.html#background-health-checks). With these, `dstack` continuously evaluates fleet GPUs for hardware reliability and displays their status before scheduling workloads.
+`dstack` already supports GPU telemetry monitoring through NVIDIA DCGM [metrics](../../docs/concepts/metrics.md), covering utilization, memory, and temperature. This release extends that capability with passive hardware health checks powered by DCGM [background health checks](https://docs.nvidia.com/datacenter/dcgm/latest/user-guide/feature-overview.html#background-health-checks). With these, `dstack` continuously evaluates fleet GPUs for hardware reliability and displays their status before scheduling workloads.
 
 <img src="https://dstack.ai/static-assets/static-assets/images/gpu-health-checks.png" width="630"/>
 
```

```diff
@@ -69,5 +69,5 @@ If you have experience with GPU reliability or ideas for automated recovery, joi
 !!! info "What's next?"
     1. Check [Quickstart](../../docs/quickstart.md)
     2. Explore the [clusters](../../docs/guides/clusters.md) guide
-    3. Learn more about [metrics](../../docs/guides/metrics.md)
+    3. Learn more about [metrics](../../docs/concepts/metrics.md)
     4. Join [Discord](https://discord.gg/u8SmfwPpMd)
```

docs/blog/posts/metrics-ui.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -53,6 +53,6 @@ For persistent storage and long-term access to metrics, we still recommend setti
 metrics from `dstack`.
 
 !!! info "What's next?"
-    1. See [Metrics](../../docs/guides/metrics.md)
+    1. See [Metrics](../../docs/concepts/metrics.md)
     2. Check [dev environments](../../docs/concepts/dev-environments.md), [tasks](../../docs/concepts/tasks.md), [services](../../docs/concepts/services.md), and [fleets](../../docs/concepts/fleets.md)
     3. Join [Discord](https://discord.gg/u8SmfwPpMd)
```

docs/blog/posts/prometheus.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -45,7 +45,7 @@ Overall, `dstack` collects three groups of metrics:
 | **Runs** | Run metrics include run counters for each user in each project. |
 | **Jobs** | A run consists of one or more jobs, each mapped to a container. Job metrics offer insights into execution time, cost, GPU model, NVIDIA DCGM telemetry, and more. |
 
-For a full list of available metrics and labels, check out [Metrics](../../docs/guides/metrics.md).
+For a full list of available metrics and labels, check out [Metrics](../../docs/concepts/metrics.md).
 
 ??? info "NVIDIA"
     NVIDIA DCGM metrics are automatically collected for `aws`, `azure`, `gcp`, and `oci` backends,
```
5151
NVIDIA DCGM metrics are automatically collected for `aws`, `azure`, `gcp`, and `oci` backends,
```diff
@@ -59,7 +59,7 @@ For a full list of available metrics and labels, check out [Metrics](../../docs/
     only accessible through the UI and the [`dstack metrics`](dstack-metrics.md) CLI.
 
 !!! info "What's next?"
-    1. See [Metrics](../../docs/guides/metrics.md)
+    1. See [Metrics](../../docs/concepts/metrics.md)
     1. Check [dev environments](../../docs/concepts/dev-environments.md),
        [tasks](../../docs/concepts/tasks.md), [services](../../docs/concepts/services.md),
        and [fleets](../../docs/concepts/fleets.md)
```
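To sketch how the metrics discussed above might be pulled into Prometheus, a minimal scrape configuration could look as follows. The target address and port are assumptions — the actual endpoint depends on where and how your `dstack` server is deployed:

```yaml
# prometheus.yml — illustrative scrape config; target address is an assumption
scrape_configs:
  - job_name: dstack
    metrics_path: /metrics
    static_configs:
      - targets: ["localhost:3000"]  # assumed dstack server address
```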

docs/docs/concepts/backends.md

Lines changed: 13 additions & 8 deletions
```diff
@@ -1,21 +1,22 @@
 # Backends
 
-Backends allow `dstack` to provision fleets across cloud providers or Kubernetes clusters.
+Backends allow `dstack` to provision fleets across GPU clouds or Kubernetes clusters.
 
 `dstack` supports two types of backends:
 
 * [VM-based](#vm-based) – use `dstack`'s native integration with cloud providers to provision VMs, manage clusters, and orchestrate container-based runs.
 * [Container-based](#container-based) – use either `dstack`'s native integration with cloud providers or Kubernetes to orchestrate container-based runs; provisioning in this case is delegated to the cloud provider or Kubernetes.
 
-??? info "SSH fleets"
+!!! info "SSH fleets"
     When using `dstack` with on-prem servers, backend configuration isn’t required. Simply create [SSH fleets](../concepts/fleets.md#ssh-fleets) once the server is up.
 
 Backends can be configured via `~/.dstack/server/config.yml` or through the [project settings page](../concepts/projects.md#backends) in the UI. See the examples of backend configuration below.
 
+> If you update `~/.dstack/server/config.yml`, you have to restart the server.
+
 ## VM-based
 
-VM-based backends allow `dstack` users to manage clusters and orchestrate container-based runs across a wide range of cloud providers.
-Under the hood, `dstack` uses native integrations with these providers to provision clusters on demand.
+VM-based backends allow `dstack` users to manage clusters and orchestrate container-based runs across a wide range of cloud providers. Under the hood, `dstack` uses native integrations with these providers to provision clusters on demand.
 
 Compared to [container-based](#container-based) backends, this approach offers finer-grained, simpler control over cluster provisioning and eliminates the dependency on a Kubernetes layer.
 
```

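To illustrate the SSH fleets note in the hunk above: an on-prem fleet is described by a plain YAML configuration roughly along these lines (the fleet name, user, hosts, and identity file below are placeholders):

```yaml
# fleet.dstack.yml — illustrative SSH fleet sketch; hosts and key are placeholders
type: fleet
name: on-prem-fleet
ssh_config:
  user: ubuntu                   # SSH user with access to each host
  identity_file: ~/.ssh/id_rsa   # private key used for all hosts
  hosts:
    - 192.168.0.10
    - 192.168.0.11
```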
```diff
@@ -1036,9 +1037,13 @@ projects:
 
 No additional setup is required — `dstack` configures and manages the proxy automatically.
 
-??? info "NVIDIA GPU Operator"
-    For `dstack` to correctly detect GPUs in your Kubernetes cluster, the cluster must have the
-    [NVIDIA GPU Operator](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/index.html) pre-installed.
+??? info "Required operators"
+    === "NVIDIA"
+        For `dstack` to correctly detect GPUs in your Kubernetes cluster, the cluster must have the
+        [NVIDIA GPU Operator](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/index.html) pre-installed.
+    === "AMD"
+        For `dstack` to correctly detect GPUs in your Kubernetes cluster, the cluster must have the
+        [AMD GPU Operator](https://github.com/ROCm/gpu-operator) pre-installed.
 
 <!-- ??? info "Managed Kubernetes"
     While `dstack` supports both managed and on-prem Kubernetes clusters, it can only run on pre-provisioned nodes.
```
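For reference alongside the Kubernetes hunk above, a Kubernetes backend entry in `~/.dstack/server/config.yml` can be sketched as below. This is an assumption-laden illustration — the kubeconfig path is a placeholder, and proxy or networking options are omitted:

```yaml
# ~/.dstack/server/config.yml — illustrative Kubernetes backend sketch
projects:
- name: main
  backends:
  - type: kubernetes
    kubeconfig:
      filename: ~/.kube/config  # placeholder path to your kubeconfig
```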
```diff
@@ -1071,7 +1076,7 @@ projects:
 
 Ensure you've created a ClusterRoleBinding to grant the role to the user or the service account you're using.
 
-> To learn more, see the [Kubernetes](../guides/kubernetes.md) guide.
+> To learn more, see the [Lambda](../../examples/clusters/lambda/#kubernetes) and [Crusoe](../../examples/clusters/crusoe/#kubernetes) examples.
 
 ### RunPod
 
```
