nethsm: document cluster failure recovery #578

Open
Firobe wants to merge 2 commits into Nitrokey:main from Firobe:nethsm/recovery

Firobe commented May 13, 2026

Document the recovery procedure and the Failed state for the next release.

Firobe force-pushed the nethsm/recovery branch from da76971 to e837d1a on May 13, 2026 17:07

To help you understand which case your node is in, the ``GET /health/diagnose``
endpoint remains available and returns information about the current status of
``etcd`` and its database, including logs (refer to the API documentation).
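
For readers following the thread, a minimal sketch of how a client might address this endpoint. Only the path ``/health/diagnose`` and the ``GET`` method come from the snippet above. The host name, the ``/api/v1`` prefix, and the helper name `build_diagnose_request` are assumptions for illustration. Whether the endpoint requires authentication, and what the response looks like, is not stated here, so the sketch only builds the request and leaves sending and parsing to the caller.

```python
import urllib.request

# Assumption: placeholder host and API prefix; adjust both to your deployment.
BASE_URL = "https://nethsm.example.com/api/v1"


def build_diagnose_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a GET request for the diagnose endpoint.

    The response schema is not described in this thread; refer to the
    API documentation before parsing the body.
    """
    # Normalise a trailing slash so the path is not doubled.
    url = base_url.rstrip("/") + "/health/diagnose"
    return urllib.request.Request(url, method="GET")
```

The returned object can be passed to `urllib.request.urlopen` together with whatever TLS context and credentials the deployment needs.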

Should we clarify that the transition into the Failed state might take a while to complete, e.g. due to internal retries and timeouts, and that this delay is necessary to recover from failures and to avoid entering the Failed state too often?

Comment thread source/components/nethsm/clustering.rst Outdated
- *isolate* the node with the ``POST /cluster/force-new`` endpoint, which will
irreversibly forget all other cluster members and restart ``etcd`` with the
data present on disk. If the underlying failure was cluster-related, the node
will transition out of the _Failed_ state.
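
Because the snippet above describes this operation as irreversible, a client wrapper may want an explicit confirmation step before issuing the call. The following is a minimal sketch under stated assumptions: only the path ``/cluster/force-new`` and the ``POST`` method come from the snippet. The ``/api/v1`` prefix, the use of HTTP Basic authentication, the empty request body, and the helper name `build_force_new_request` are hypothetical and should be checked against the API documentation.

```python
import base64
import urllib.request


def build_force_new_request(base_url: str, user: str, password: str,
                            confirm: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST request that isolates the node.

    Assumptions: HTTP Basic authentication and an empty request body.
    """
    # Guard against accidental use: the operation forgets all other
    # cluster members and cannot be undone.
    if not confirm:
        raise ValueError("force-new is irreversible; pass confirm=True to proceed")
    url = base_url.rstrip("/") + "/cluster/force-new"
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    return urllib.request.Request(
        url,
        data=b"",  # assumed: no payload is required
        method="POST",
        headers={"Authorization": f"Basic {token}"},
    )
```

Keeping request construction separate from sending makes it possible to log or review the exact call before it reaches the node.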

Should we mention what state it ends up in? The answer might depend on whether you use unattended boot.

Firobe commented May 13, 2026

Thanks! Both of your comments should now be addressed.
