21 changes: 3 additions & 18 deletions alert-rules.md
@@ -244,7 +244,7 @@ This section gives the alert rules for the PD component.
* If you confirm that the TiKV/TiFlash instance cannot be recovered, you can make it offline.
* If you confirm that the TiKV/TiFlash instance can be recovered, but not in the short term, you can consider increasing the value of `max-down-time`. It will prevent the TiKV/TiFlash instance from being considered as irrecoverable and the data from being removed from the TiKV/TiFlash.

-#### `PD_cluster_unhealthy_tikv_nums`
+#### `PD_cluster_unhealthy_store_nums`

* Alert rule:

@@ -256,7 +256,7 @@ This section gives the alert rules for the PD component.

* Solution:

-Check the state of the TiKV stores.
+Check the status of TiKV/TiFlash.

#### `PD_cluster_low_space`

@@ -355,7 +355,7 @@ This section gives the alert rules for the PD component.
* Check the network and system load status.
* If the problematic PD instance cannot be recovered due to environmental factors, make it offline and replace it.

-#### `TiKV_space_used_more_than_80%`
+#### `PD_cluster_store_space_used_more_than_80%`

* Alert rule:

@@ -384,21 +384,6 @@ This section gives the alert rules for the PD component.

Check whether the system time is configured correctly.

-#### `PD_no_store_for_making_replica`
-
-* Alert rule:
-
-`increase(pd_checker_event_count{type="replica_checker", name="no_target_store"}[1m]) > 0`
-
-* Description:
-
-There is no appropriate store for additional replicas.
-
-* Solution:
-
-* Check whether there is enough space in the store.
-* Check whether there is any store for additional replicas according to the label configuration if it is configured.
-
#### `PD_cluster_slow_tikv_nums`

* Alert rule: