<fix>[ha]: setup self-fencer earlier during host reconnect#4003
<fix>[ha]: setup self-fencer earlier during host reconnect#4003MatheMatrix wants to merge 2 commits into
Conversation
This change is necessary because HA self-fencer setup needs to run soon after the hypervisor connect hook completes during host reconnect. This commit adds a host-level after-connect-hook extension point and marks reconnect-generated connect messages so HA can distinguish real host reconnects without changing existing connect single-flight behavior. It also propagates the one-time status-check bypass to KVM self-fencer setup paths and fixes host capacity check timeout setup. DBImpact Resolves: ZSTAC-69133 Change-Id: I7a73746163693639313333636f6d70757465 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml) Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
Walkthrough此 PR 为主机连接流程添加 after-connect hook 扩展点并在连接 Flow 中收集/执行/回滚这些 hook;在主机容量检查消息上显式设置 30 分钟超时;并为 KVM 自定义围栏参数添加 noStatusCheck 以控制状态检查行为。 Changes主机连接流程与扩展
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
compute/src/main/java/org/zstack/compute/host/HostAfterConnectHookExtensionPoint.java (1)
8-8: ⚡ Quick win请为扩展点方法补充 Javadoc 契约说明。
这个接口方法会被插件实现并跨模块调用,建议补齐参数语义(尤其是
reconnect)和返回Flow的约束,避免实现方理解偏差。♻️ 建议修改
public interface HostAfterConnectHookExtensionPoint { + /** + * Create a hook flow that will be executed right after host connect. + * + * `@param` host connected host inventory snapshot + * `@param` info connect context + * `@param` reconnect true when this connect is triggered by reconnect flow + * `@return` hook flow, or null if no hook is needed + */ Flow createAfterConnectHookFlow(HostInventory host, ConnectHostInfo info, boolean reconnect); }As per coding guidelines “接口方法不应有多余的修饰符(例如
public),且必须配有有效的 Javadoc 注释。”🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@compute/src/main/java/org/zstack/compute/host/HostAfterConnectHookExtensionPoint.java` at line 8, 为接口方法 createAfterConnectHookFlow(HostInventory host, ConnectHostInfo info, boolean reconnect) 补充规范的 Javadoc,删除任何多余的修饰符并在注释中明确说明:参数 host 和 info 的语义,reconnect 表示这是一个重连(true)还是首次连接(false)的场景;对返回值 Flow 的契约进行说明(例如执行时序、异常处理/回滚约定、是否可为 null 或必须返回非空 Flow、调用者如何使用返回的 Flow),以及线程/异步执行注意事项,并在注释中举例或提示实现者应遵循的行为准则,以避免跨模块实现歧义;保留方法签名不变,仅补充/替换为完整 Javadoc 注释。
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In
`@compute/src/main/java/org/zstack/compute/host/HostAfterConnectHookExtensionPoint.java`:
- Line 8: 为接口方法 createAfterConnectHookFlow(HostInventory host, ConnectHostInfo
info, boolean reconnect) 补充规范的 Javadoc,删除任何多余的修饰符并在注释中明确说明:参数 host 和 info
的语义,reconnect 表示这是一个重连(true)还是首次连接(false)的场景;对返回值 Flow
的契约进行说明(例如执行时序、异常处理/回滚约定、是否可为 null 或必须返回非空 Flow、调用者如何使用返回的
Flow),以及线程/异步执行注意事项,并在注释中举例或提示实现者应遵循的行为准则,以避免跨模块实现歧义;保留方法签名不变,仅补充/替换为完整 Javadoc
注释。
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)
Review profile: CHILL
Plan: Pro
Run ID: ea14a67b-1783-4743-b226-4489ac47d821
📒 Files selected for processing (6)
compute/src/main/java/org/zstack/compute/host/HostAfterConnectHookExtensionPoint.javacompute/src/main/java/org/zstack/compute/host/HostBase.javacompute/src/main/java/org/zstack/compute/host/HostManagerImpl.javaplugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.javaplugin/kvm/src/main/java/org/zstack/kvm/KVMHostCapacityExtension.javaplugin/kvm/src/main/java/org/zstack/kvm/KvmSetupSelfFencerExtensionPoint.java
There was a problem hiding this comment.
🧹 Nitpick comments (1)
compute/src/main/java/org/zstack/compute/host/HostManagerImpl.java (1)
821-821: ⚡ Quick win建议把容量检查超时提取为统一常量,避免跨模块硬编码漂移。
Line 821 与 KVM 扩展里都写死了
30分钟,后续改超时时容易出现不一致。建议提取为共享常量(或统一走已有全局配置)后复用。可参考的最小改动
public class HostManagerImpl extends AbstractService implements HostManager, ManagementNodeChangeListener, ManagementNodeReadyExtensionPoint, FindSameNodeExtensionPoint { + private static final long CHECK_HOST_CAPACITY_TIMEOUT_MILLIS = TimeUnit.MINUTES.toMillis(30); @@ - msg.setTimeout(TimeUnit.MINUTES.toMillis(30)); + msg.setTimeout(CHECK_HOST_CAPACITY_TIMEOUT_MILLIS);As per coding guidelines "避免使用魔法值(Magic Value)".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@compute/src/main/java/org/zstack/compute/host/HostManagerImpl.java` at line 821, Hardcoded 30-minute timeout used via msg.setTimeout(TimeUnit.MINUTES.toMillis(30)) in HostManagerImpl (and the KVM extension) should be replaced with a shared constant: introduce a single constant (e.g., CAPACITY_CHECK_TIMEOUT_MS or CAPACITY_CHECK_TIMEOUT_MINUTES) in a common config/const class (or reuse an existing global timeout config) and reference that constant from HostManagerImpl.msg.setTimeout(...) and the KVM extension so both modules use the same value and avoid magic numbers.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@compute/src/main/java/org/zstack/compute/host/HostManagerImpl.java`:
- Line 821: Hardcoded 30-minute timeout used via
msg.setTimeout(TimeUnit.MINUTES.toMillis(30)) in HostManagerImpl (and the KVM
extension) should be replaced with a shared constant: introduce a single
constant (e.g., CAPACITY_CHECK_TIMEOUT_MS or CAPACITY_CHECK_TIMEOUT_MINUTES) in
a common config/const class (or reuse an existing global timeout config) and
reference that constant from HostManagerImpl.msg.setTimeout(...) and the KVM
extension so both modules use the same value and avoid magic numbers.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)
Review profile: CHILL
Plan: Pro
Run ID: 570509f2-8e45-441f-a95c-35663c0dac3c
📒 Files selected for processing (2)
compute/src/main/java/org/zstack/compute/host/HostManagerImpl.javaplugin/kvm/src/main/java/org/zstack/kvm/KVMHostCapacityExtension.java
|
Comment from yaohua.wu: Review: MR !9896 — ZSTAC-69133背景ZSTAC-69133(P0,5.5.22 必须解):HA self-fencer 在 host 重连后被触发得太晚(要等整条 reconnect FlowChain 跑完、host 切到 Connected 才在 P0 — Critical无。 🟡 WarningW1.
🟢 SuggestionS1. 30 分钟超时为重复魔法值 — S2. 新增 SPI 缺少 Javadoc — 🟢 advisory目标分支 关联 MR本次为跨仓联合修复(源分支
Verdict: REVISION_REQUIRED无 Critical。W1 需作者确认 single-flight signature 是否有意省略 —— 这关系到 P0 修复在并发场景下是否真正生效,作为 P0 bugfix 建议在合并前澄清。S1/S2/advisory 不阻塞。 🤖 Robot Reviewer |
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
70a7aa4 to
9d2004f
Compare
Summary: add HostBase after-connect hook, propagate one-time noStatusCheck, and fix CheckHostCapacityMsg timeout setup. Validation: root install, premium package, premium test-compile passed.
sync from gitlab !9896