Summary
matchPatterns() sets a node condition to ConditionTrue when an error-level log pattern fires, but never sets ConditionFalse when the pattern is no longer present. After a node reboot clears the kernel ring buffer (kmsg), the condition stays True indefinitely.
Observed Behaviour
NodeDoctorVmxnet3TxHang: True on nodes a1ubr720p04 and a1ubopsp06 with lastTransitionTime: 2025-11-30 — both nodes have since been rebooted onto a fixed kernel (6.17) but the condition remains stuck True 5+ months later.
NodeDoctorConntrackTableFull: True on a1ubr720p03 — conntrack is at 0.7% utilisation currently; condition is stale from a Cilium incident.
$ kubectl get node a1ubr720p04 -o jsonpath='{.status.conditions[?(@.type=="NodeDoctorVmxnet3TxHang")]}'
{"lastTransitionTime":"2025-11-30T05:35:19Z","message":"Pattern 'vmxnet3-tx-hang' matched in kmsg logs","reason":"LogPatternMatched","status":"True"}
Root Cause
In pkg/monitors/custom/logpattern.go, matchPatterns() (line 1076) only ever emits ConditionTrue:
// Add condition for error-level patterns
if severity == types.EventError {
	condition := types.NewCondition(
		pattern.Name,
		types.ConditionTrue, // ← only True is ever emitted
		"LogPatternMatched",
		fmt.Sprintf("Pattern '%s' matched in %s logs", pattern.Name, source),
	)
	status.AddCondition(condition)
}
checkLogPatterns() scans all sources but emits conditions only on match. When a pattern does not match (e.g. because kmsg was cleared by reboot), nothing is emitted — the Kubernetes condition object persists unchanged.
Kubernetes node conditions are persistent state: absence of an update ≠ setting False.
Proposed Fix
After each full scan cycle, emit ConditionFalse for any error-level pattern that was not matched:
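// matchedPatterns is assumed to be a set (e.g. map[string]struct{}) of the
// pattern names that matched in any source during this scan cycle.
for _, pattern := range m.config.Patterns {
	if m.parseSeverity(pattern.Severity) == types.EventError {
		if _, matched := matchedPatterns[pattern.Name]; !matched {
			// Explicitly clear the condition so a stale True does not persist.
			status.AddCondition(types.NewCondition(
				pattern.Name,
				types.ConditionFalse,
				"LogPatternNotFound",
				fmt.Sprintf("Pattern '%s' not found in current scan", pattern.Name),
			))
		}
	}
}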
Impact
Every error-level log-pattern condition that has ever fired stays True permanently, surviving node reboots and the fixes that resolved the underlying problem, so false positives accumulate over time. Operators cannot trust True conditions without cross-checking lastTransitionTime.