Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions src/current/v26.2/monitoring-and-alerting.md
Original file line number Diff line number Diff line change
Expand Up @@ -120,6 +120,8 @@ The `http://<node-host>:<http-port>/health?ready=1` endpoint returns an HTTP `50
If you find that your load balancer's health check is not always recognizing a node as unready before the node shuts down, you can increase the `server.shutdown.initial_wait` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) (previously named `server.shutdown.drain_wait`) to cause a node to return `503 Service Unavailable` even before it has started shutting down.
{{site.data.alerts.end}}

- The node is [decommissioning or decommissioned]({% link {{ page.version.version }}/node-shutdown.md %}?filters=decommission#decommissioning). This causes load balancers and connection managers to reroute traffic to other nodes while replicas are rebalanced away from the node.

- The node is unable to communicate with a majority of the other nodes in the cluster, likely because the cluster is unavailable due to too many nodes being down.

{% include_cached copy-clipboard.html %}
Expand Down
4 changes: 2 additions & 2 deletions src/current/v26.2/node-shutdown.md
Original file line number Diff line number Diff line change
Expand Up @@ -51,7 +51,7 @@ An operator [initiates the decommissioning process](#decommission-the-node) on t

The node's [`is_decommissioning`]({% link {{ page.version.version }}/cockroach-node.md %}#node-status) field is set to `true` and its `membership` status is set to `decommissioning`, which causes its replicas to be rebalanced to other nodes. If the rebalancing stalls during decommissioning, replicas that have yet to move are printed to the [SQL shell]({% link {{ page.version.version }}/cockroach-sql.md %}) and written to the [`OPS` logging channel]({% link {{ page.version.version }}/logging-overview.md %}#logging-channels). [By default]({% link {{ page.version.version }}/configure-logs.md %}#default-logging-configuration), the `OPS` channel logs output to a `cockroach.log` file.

The node's [`/health?ready=1` endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#health-ready-1) continues to consider the node "ready" so that the node can function as a gateway to route SQL client connections to relevant data.
The node's [`/health?ready=1` endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#health-ready-1) returns an HTTP `503 Service Unavailable` response code so that load balancers and connection managers stop directing new SQL client connections to the node while replicas are rebalanced.

{{site.data.alerts.callout_info}}
After this stage, the node is automatically drained. However, to avoid possible disruptions in query performance, you can manually drain the node before decommissioning. For more information, see [Perform node shutdown](#perform-node-shutdown).
Expand Down Expand Up @@ -160,7 +160,7 @@ Before you [perform node shutdown](#perform-node-shutdown), review the following

### Load balancing

Your [load balancer]({% link {{ page.version.version }}/recommended-production-settings.md %}#load-balancing) should use the [`/health?ready=1` endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#health-ready-1) to actively monitor node health and direct SQL client connections away from draining nodes.
Your [load balancer]({% link {{ page.version.version }}/recommended-production-settings.md %}#load-balancing) should use the [`/health?ready=1` endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#health-ready-1) to actively monitor node health and direct SQL client connections away from nodes that are not ready to receive requests.

To handle node shutdown effectively, the load balancer must be given enough time by the [`server.shutdown.initial_wait` duration](#server-shutdown-initial_wait).

Expand Down
Loading