@YiwenZhang12 YiwenZhang12 commented Dec 5, 2025

  • Handshake monitoring: Every TLS client connection has its certificate inspected immediately after the handshake. We track the smallest time until expiry seen since startup/reset and surface it, in seconds, via INFO clients as client_cert_min_seconds_until_expiry (initially -1). Example INFO output:
    # Clients
    ...
    client_cert_min_seconds_until_expiry:345600
  • Configurable warnings: Introduce tls-client-cert-expiry-warn-threshold so operators can enable proactive alerts. Example configuration:
    tls-client-cert-expiry-warn-threshold 10
    Example warning:
    TLS client certificate for id=147 addr=10.1.2.3:54128 fd=15 name=*redacted* expires in 4 days (threshold 10 days).

  • 24‑hour deduplication: To avoid flooding logs, each certificate is fingerprinted (SHA‑256) and stored in client_cert_expiry_warned with a 24‑hour suppression window. The same certificate will trigger at most one warning per day.
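
The three behaviors above can be sketched together in a few lines. This is a minimal illustration in Python, not the code in this PR: the class name `CertExpiryMonitor`, the method names, and the plain dict standing in for client_cert_expiry_warned are all assumptions, and the caller is assumed to supply the certificate's DER bytes and its expiry time as a Unix timestamp.

```python
import hashlib

SECONDS_PER_DAY = 86400
SUPPRESS_SECONDS = 24 * 60 * 60  # 24-hour suppression window


class CertExpiryMonitor:
    """Hypothetical model of the handshake check described above."""

    def __init__(self, warn_threshold_days=0):
        self.warn_threshold_days = warn_threshold_days  # 0 disables warnings
        self._min_seconds = None  # no certificate seen yet
        self._warned = {}  # SHA-256 fingerprint -> time of last warning

    def info_value(self):
        # Value reported as client_cert_min_seconds_until_expiry (-1 if unset).
        return -1 if self._min_seconds is None else self._min_seconds

    def on_handshake(self, cert_der, not_after, now):
        """Record the certificate; return a warning string or None."""
        remaining = int(not_after - now)
        if self._min_seconds is None or remaining < self._min_seconds:
            self._min_seconds = remaining

        if self.warn_threshold_days <= 0:
            return None  # warnings disabled, metric still tracked
        if remaining > self.warn_threshold_days * SECONDS_PER_DAY:
            return None  # not yet within the warning threshold

        fingerprint = hashlib.sha256(cert_der).hexdigest()
        last = self._warned.get(fingerprint)
        if last is not None and now - last < SUPPRESS_SECONDS:
            return None  # same certificate already warned within 24 hours
        self._warned[fingerprint] = now

        days = remaining // SECONDS_PER_DAY
        return (f"TLS client certificate expires in {days} days "
                f"(threshold {self.warn_threshold_days} days).")
```

With a threshold of 10 days, a certificate that expires 345600 seconds from now produces one warning, and a second handshake with the same certificate an hour later is suppressed while the minimum is still updated.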

Yiwen Zhang and others added 3 commits December 4, 2025 22:52
Signed-off-by: Yiwen Zhang <zhangyiwen1221@gmail.com>
Signed-off-by: Yiwen Zhang <zhangyiwen1221@gmail.com>
Signed-off-by: Yiwen Zhang <zhangyiwen1221@gmail.com>
Comment on lines +327 to +332
# Emit a warning log when a client-presented TLS certificate gets close to its
# expiration date. The value is expressed in days. Set to 0 to disable the
# warning; no connections are rejected when the threshold is crossed.
#
# tls-client-cert-expiry-warn-threshold 10

Member

I'm not very clear why this should be the responsibility of the server to emit these types of events. Can clients not also track this information?

Today, LL_WARNING is reserved for events that need immediate server operator intervention. In this case, it's not the server that has an issue but the end client.

Author

Hi @madolson, thanks for the feedback! I understand the concern. Ideally clients could track this themselves, but many client libraries or environments don’t expose the right hooks. A lightweight server-side signal helps operators be proactive, especially when client visibility is limited.

It’s also not uncommon for database cores to surface these kinds of client-side behaviors. For example, Cassandra emits similar warnings to help operators understand issues that can directly impact customer traffic, even if the server itself is healthy.

I’m open to adjusting the severity if LL_WARNING feels too strong; the main goal is simply to provide centralized visibility.

Author

Hi @madolson, thanks again for the feedback. I’ve lowered the log level to LL_NOTICE so it no longer reads like a server error. The KPI in INFO stays the same to drive proactive alerting.
This signal helps the server as well as clients: if a shared client cert quietly expires, thousands of clients may start reconnecting and re-handshaking at once, creating a connection storm. With the INFO metric + NOTICE log in place, our SRE partners can wire up alerts and rotate the cert ahead of time, keeping both the clients and the server stable.
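
As a rough sketch of how an alert could be wired to the INFO metric, the snippet below parses INFO text and compares the value against an alerting window. The function names and the `alert_days` parameter are hypothetical, and fetching the INFO text from the server is left to whatever client the operator already uses.

```python
SECONDS_PER_DAY = 86400
METRIC = "client_cert_min_seconds_until_expiry"


def parse_min_seconds_until_expiry(info_text):
    """Return the metric as an int, or None if the field is absent."""
    for line in info_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key == METRIC:
            return int(value)
    return None


def should_alert(info_text, alert_days):
    """True when the closest client certificate expiry is within alert_days."""
    seconds = parse_min_seconds_until_expiry(info_text)
    if seconds is None or seconds == -1:
        return False  # field missing, or no client certificate seen yet
    return seconds <= alert_days * SECONDS_PER_DAY
```

For the sample value 345600 (4 days), `should_alert(info_text, 7)` is true and `should_alert(info_text, 3)` is false.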

Signed-off-by: Yiwen Zhang <zhangyiwen1221@gmail.com>