From 25d9d4db8cba59aaf4f4ba6bbf80e64edf424a5b Mon Sep 17 00:00:00 2001
From: Rich Loveland
Date: Mon, 18 May 2026 13:53:49 -0400
Subject: [PATCH 1/3] Update 'health?ready=1' docs

Fixes DOC-17125
---
 src/current/v26.2/monitoring-and-alerting.md | 2 ++
 src/current/v26.2/node-shutdown.md | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/current/v26.2/monitoring-and-alerting.md b/src/current/v26.2/monitoring-and-alerting.md
index d6bd689cb1a..fcb8741f563 100644
--- a/src/current/v26.2/monitoring-and-alerting.md
+++ b/src/current/v26.2/monitoring-and-alerting.md
@@ -120,6 +120,8 @@ The `http://:/health?ready=1` endpoint returns an HTTP `50
 If you find that your load balancer's health check is not always recognizing a node as unready before the node shuts down, you can increase the `server.shutdown.initial_wait` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) (previously named `server.shutdown.drain_wait`) to cause a node to return `503 Service Unavailable` even before it has started shutting down.
 {{site.data.alerts.end}}
 
+- The node is [decommissioning or decommissioned]({% link {{ page.version.version }}/node-shutdown.md %}?filters=decommission#decommissioning). This causes load balancers and connection managers to reroute traffic to other nodes while replicas are rebalanced away from the node.
+
 - The node is unable to communicate with a majority of the other nodes in the cluster, likely because the cluster is unavailable due to too many nodes being down.
 
 {% include_cached copy-clipboard.html %}
diff --git a/src/current/v26.2/node-shutdown.md b/src/current/v26.2/node-shutdown.md
index 7f18836c2a9..61bec1b2cd3 100644
--- a/src/current/v26.2/node-shutdown.md
+++ b/src/current/v26.2/node-shutdown.md
@@ -51,7 +51,7 @@ An operator [initiates the decommissioning process](#decommission-the-node) on t
 
 The node's [`is_decommissioning`]({% link {{ page.version.version }}/cockroach-node.md %}#node-status) field is set to `true` and its `membership` status is set to `decommissioning`, which causes its replicas to be rebalanced to other nodes. If the rebalancing stalls during decommissioning, replicas that have yet to move are printed to the [SQL shell]({% link {{ page.version.version }}/cockroach-sql.md %}) and written to the [`OPS` logging channel]({% link {{ page.version.version }}/logging-overview.md %}#logging-channels). [By default]({% link {{ page.version.version }}/configure-logs.md %}#default-logging-configuration), the `OPS` channel logs output to a `cockroach.log` file.
 
-The node's [`/health?ready=1` endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#health-ready-1) continues to consider the node "ready" so that the node can function as a gateway to route SQL client connections to relevant data.
+The node's [`/health?ready=1` endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#health-ready-1) returns an HTTP `503 Service Unavailable` response code so that load balancers and connection managers stop directing new SQL client connections to the node while replicas are rebalanced.
 
 {{site.data.alerts.callout_info}}
 After this stage, the node is automatically drained. However, to avoid possible disruptions in query performance, you can manually drain the node before decommissioning. For more information, see [Perform node shutdown](#perform-node-shutdown).
From 21a8b4b3cb0105c9c141e677109390d298e37674 Mon Sep 17 00:00:00 2001
From: Rich Loveland
Date: Mon, 18 May 2026 13:57:04 -0400
Subject: [PATCH 2/3] Clarify health readiness load balancing wording

---
 src/current/v26.2/node-shutdown.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/current/v26.2/node-shutdown.md b/src/current/v26.2/node-shutdown.md
index 61bec1b2cd3..a24e1629928 100644
--- a/src/current/v26.2/node-shutdown.md
+++ b/src/current/v26.2/node-shutdown.md
@@ -160,7 +160,7 @@ Before you [perform node shutdown](#perform-node-shutdown), review the following
 
 ### Load balancing
 
-Your [load balancer]({% link {{ page.version.version }}/recommended-production-settings.md %}#load-balancing) should use the [`/health?ready=1` endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#health-ready-1) to actively monitor node health and direct SQL client connections away from draining nodes.
+Your [load balancer]({% link {{ page.version.version }}/recommended-production-settings.md %}#load-balancing) should use the [`/health?ready=1` endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#health-ready-1) to actively monitor node health and direct SQL client connections away from nodes that are not ready to receive requests.
 
 To handle node shutdown effectively, the load balancer must be given enough time by the [`server.shutdown.initial_wait` duration](#server-shutdown-initial_wait).
 
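Reviewer note on the load-balancing wording in this patch: the routing decision a health checker derives from `/health?ready=1` can be sketched in shell. This is a minimal sketch under stated assumptions, not how any particular load balancer implements its checks. The function names `classify_ready_status` and `probe_ready` are hypothetical, and the sketch assumes that only an HTTP `200` means "ready", so that `503` and connection failures both count as "unready".

```shell
#!/bin/sh
# Hypothetical helper: map an HTTP status code to a routing decision.
# Assumption: only 200 means "ready"; anything else (including 503 and
# the 000 that curl reports when no response was received) is "unready".
classify_ready_status() {
  case "$1" in
    200) echo "ready" ;;
    *)   echo "unready" ;;
  esac
}

# Hypothetical helper: probe one node's readiness endpoint.
# $1 is host:port, for example localhost:8080 as in the docs example.
probe_ready() {
  # -s silences progress output, -o /dev/null discards the body,
  # -w '%{http_code}' prints only the numeric status code.
  code=$(curl -s -o /dev/null -w '%{http_code}' "http://$1/health?ready=1")
  classify_ready_status "$code"
}
```

For example, `probe_ready localhost:8080` would print `unready` for a node that is decommissioning, given the behavior this patch series documents.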
From 66f5250bff71c8353343083a22dcd543bda4ff7d Mon Sep 17 00:00:00 2001
From: Rich Loveland
Date: Tue, 19 May 2026 11:00:15 -0400
Subject: [PATCH 3/3] Show health ready JSON error response

Fixes DOC-17125
---
 src/current/v26.2/monitoring-and-alerting.md | 6 +++++-
 src/current/v26.2/node-shutdown.md | 2 +-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/current/v26.2/monitoring-and-alerting.md b/src/current/v26.2/monitoring-and-alerting.md
index fcb8741f563..22b28fe56fa 100644
--- a/src/current/v26.2/monitoring-and-alerting.md
+++ b/src/current/v26.2/monitoring-and-alerting.md
@@ -126,10 +126,14 @@ The `http://:/health?ready=1` endpoint returns an HTTP `50
 
 {% include_cached copy-clipboard.html %}
 ~~~ shell
-$ curl http://localhost:8080/health?ready=1
+$ curl -i http://localhost:8080/health?ready=1
 ~~~
 
+The `-i` flag includes the HTTP response status in the `curl` output. Without `-i`, `curl` prints only the response body by default.
+
 ~~~
+HTTP/1.1 503 Service Unavailable
+
 {
   "error": "node is not healthy",
   "code": 14,
diff --git a/src/current/v26.2/node-shutdown.md b/src/current/v26.2/node-shutdown.md
index a24e1629928..e14626f7553 100644
--- a/src/current/v26.2/node-shutdown.md
+++ b/src/current/v26.2/node-shutdown.md
@@ -51,7 +51,7 @@ An operator [initiates the decommissioning process](#decommission-the-node) on t
 
 The node's [`is_decommissioning`]({% link {{ page.version.version }}/cockroach-node.md %}#node-status) field is set to `true` and its `membership` status is set to `decommissioning`, which causes its replicas to be rebalanced to other nodes. If the rebalancing stalls during decommissioning, replicas that have yet to move are printed to the [SQL shell]({% link {{ page.version.version }}/cockroach-sql.md %}) and written to the [`OPS` logging channel]({% link {{ page.version.version }}/logging-overview.md %}#logging-channels). [By default]({% link {{ page.version.version }}/configure-logs.md %}#default-logging-configuration), the `OPS` channel logs output to a `cockroach.log` file.
 
-The node's [`/health?ready=1` endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#health-ready-1) returns an HTTP `503 Service Unavailable` response code so that load balancers and connection managers stop directing new SQL client connections to the node while replicas are rebalanced.
+The node's [`/health?ready=1` endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#health-ready-1) returns an HTTP `503 Service Unavailable` response code with a JSON error response so that load balancers and connection managers stop directing new SQL client connections to the node while replicas are rebalanced.
 
 {{site.data.alerts.callout_info}}
 After this stage, the node is automatically drained. However, to avoid possible disruptions in query performance, you can manually drain the node before decommissioning. For more information, see [Perform node shutdown](#perform-node-shutdown).
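Reviewer note on the `curl -i` example in this patch: because `-i` puts the status line ahead of the body, a script can pull the status code out of the same output the docs show. The following is a minimal sketch, assuming the first line of the response has the form `HTTP/1.1 503 Service Unavailable`. The function name `status_from_response` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `curl -i` output on stdin and print the status
# code, which is the second whitespace-separated field of the first line
# (for example "HTTP/1.1 503 Service Unavailable" yields 503).
status_from_response() {
  awk 'NR == 1 { print $2; exit }'
}
```

A possible invocation is `curl -s -i "http://localhost:8080/health?ready=1" | status_from_response`. Quoting the URL keeps shells that treat `?` as a glob character from rewriting it.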