<fix>[ha]: stabilize suspect host pre-fence flow#3992

Open
MatheMatrix wants to merge 1 commit into 5.5.22 from sync/yingzhe.hu/fix/ZSTAC-83890-5.5.22@@3

Conversation

@MatheMatrix
Owner

Pre-fence leftover QEMU processes through a reachable sibling host
and pass that sibling through HA VM start. Use the agent success
flag as the pre-fence verdict and drop redundant response fields.

Test: mvn -pl plugin/kvm,simulator/simulatorImpl,testlib -am -DskipTests compile

Resolves: ZSTAC-83890

Change-Id: I168adf82338f9df9e76287619b7f76a8e5be695f
(cherry picked from commit 847cc3b)

sync from gitlab !9885

@coderabbitai

coderabbitai Bot commented May 15, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

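This PR implements a pre-fence protection flow that runs before an HA VM is started. It makes extension point execution asynchronous, adds a pre-fence system tag and message protocol, and implements remote fencing between KVM hosts, which completes the protection chain ahead of VM start. The core flow is: mark the VM to be protected, ask a peer host to fence the VM on the suspect host, then decide whether to allow the start based on the fencing result.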

Changes

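HA pre-fence flow

| Layer | File(s) | Summary |
|---|---|---|
| Message and system tag data contracts | header/src/main/java/org/zstack/header/vm/HaStartVmInstanceMsg.java, header/src/main/java/org/zstack/header/vm/FenceVmFromPeerHostMsg.java, header/src/main/java/org/zstack/header/vm/FenceVmFromPeerHostReply.java, compute/src/main/java/org/zstack/compute/vm/VmSystemTags.java | HaStartVmInstanceMsg adds an accessiblePeerHostUuid field that carries the UUID of a reachable peer host; adds the FenceVmFromPeerHostMsg and FenceVmFromPeerHostReply message classes; defines the HA_PRE_FENCE_PENDING system tag and its related token constants. |
| Asynchronous extension point interface | header/src/main/java/org/zstack/header/vm/VmBeforeStartOnHypervisorExtensionPoint.java | VmBeforeStartOnHypervisorExtensionPoint gains a default overload that takes a Completion parameter; the original abstract method (without Completion) becomes a default no-op, which keeps existing implementations compatible. |
| Asynchronous start flow refactoring | compute/src/main/java/org/zstack/compute/vm/VmStartOnHypervisorFlow.java | VmStartOnHypervisorFlow.run() is refactored from a synchronous loop over extension points into an asynchronous FlowChain: it uses While to run the extension points serially, each reporting its result through a Completion, then sends the start message, and finally advances the outer chain. |
| VM start and pre-fence tag integration | compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java | The HA start preparation phase creates the HA_PRE_FENCE_PENDING system tag and writes the accessible peer host UUID token; the tag is deleted after a successful start; in startVm, the HaStartVmInstanceMsg branch extracts softAvoidHostUuids and sets them on the start spec. |
| KVM pre-fence extension | plugin/kvm/src/main/java/org/zstack/kvm/KvmHaPreFenceVmExtension.java | Adds KvmHaPreFenceVmExtension, which implements the extension point: it checks the HA_PRE_FENCE_PENDING tag and the KVM hypervisor type; determines the suspect host from softAvoidHostUuids and lastHostUuid; reads the sibling peer host from the tag token (falling back to the destination host when absent); builds a FenceVmFromPeerHostMsg and routes it over CloudBus to the peer host. |
| KVM fence agent command protocol | plugin/kvm/src/main/java/org/zstack/kvm/KVMAgentCommands.java, plugin/kvm/src/main/java/org/zstack/kvm/KVMConstant.java | Defines the FenceVmOnSuspectHostCmd agent command (VM UUID, target host management IP/username/private key, SSH port, and timeout) and an empty FenceVmOnSuspectHostRsp response; adds an HTTP path constant for the asynchronous HTTP call. |
| KVM host remote fence handling | plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java | KVMHost's local message dispatch gains a FenceVmFromPeerHostMsg branch; handle() validates the suspect host, queries the database to confirm it exists, builds a FenceVmOnSuspectHostCmd, issues an asynchronous HTTP call to the peer host via KVMHostAsyncHttpCallMsg, and either sets a start error or replies normally depending on the response. |
| Simulator and test support | simulator/simulatorImpl/src/main/java/org/zstack/simulator/kvm/KVMSimulatorController.java, testlib/src/main/java/org/zstack/testlib/KVMSimulator.groovy | The simulator controller adds a fenceVmOnSuspectHost REST endpoint; the test library registers a Groovy simulator for the fence path; both return a success response to support integration tests. |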
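
The backward-compatible change to the extension point interface summarized above can be illustrated with Java default methods. This is a minimal sketch, not the PR's code: `Callback`, `BeforeStartOnHypervisor`, and `LegacyExtension` are hypothetical stand-ins, and having the asynchronous default delegate to the legacy hook is an assumption about how compatibility might be kept.

```java
// Simplified stand-in for a completion callback; not the actual ZStack Completion class.
interface Callback {
    void success();
    void fail(String error);
}

// Hypothetical extension point with both the legacy and the asynchronous hook.
interface BeforeStartOnHypervisor {
    // Legacy synchronous hook; the empty default body makes it optional.
    default void beforeStart(String vmUuid) {
    }

    // Asynchronous overload; by default (an assumption) it runs the legacy hook
    // and then reports success.
    default void beforeStart(String vmUuid, Callback callback) {
        beforeStart(vmUuid);
        callback.success();
    }
}

// An old-style implementation that only overrides the legacy hook still works.
final class LegacyExtension implements BeforeStartOnHypervisor {
    boolean called = false;

    @Override
    public void beforeStart(String vmUuid) {
        called = true;
    }
}
```

With this shape, a caller can always invoke the two-argument overload, and implementations written against the old single-argument method keep compiling unchanged.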

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

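The rabbit arrives with the art of fencing,
an async flow chain guards the VM,
one word from the peer: "isolate it",
and the split-brain dream fades away on its own.
🐰✨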

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The PR title follows the required [scope]: format (<fix>[ha]: stabilize suspect host pre-fence flow) and is 48 characters, well under the 72-character limit. |
| Description check | ✅ Passed | The PR description is directly related to the changeset, describing the pre-fence stabilization changes for HA VM start flow and summarizing the key modifications. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

@MatheMatrix MatheMatrix force-pushed the sync/yingzhe.hu/fix/ZSTAC-83890-5.5.22@@3 branch 2 times, most recently from 1478dff to ce42b4f Compare May 17, 2026 16:09
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (3)
plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java (1)

1038-1049: ⚡ Quick win

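Preserve the underlying error details so that diagnostic information is not lost when HA refuses to start.

Both failure branches currently return only a generic message, and r.getError() / rsp.getError() are swallowed. The upper layer ends up seeing only "refuse to start" and cannot tell whether the sibling was unreachable, SSH failed, or the agent failed, which makes on-site HA troubleshooting difficult. Consider attaching the original error as the cause or appending it to the details.

♻️ Suggested change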
                 if (!r.isSuccess()) {
-                    reply.setError(operr("HA-start vm[%s]: transport error asking sibling[%s] to kill vm on suspect host[%s]. " +
-                                    "Refuse to start to prevent split-brain.",
-                            msg.getVmUuid(), peerHostUuid, suspectHostUuid));
+                    reply.setError(operr("HA-start vm[%s]: transport error asking sibling[%s] to kill vm on suspect host[%s]. " +
+                                    "Refuse to start to prevent split-brain. cause: %s",
+                            msg.getVmUuid(), peerHostUuid, suspectHostUuid, r.getError()));
                     bus.reply(msg, reply);
                     return;
                 }
                 FenceVmOnSuspectHostRsp rsp = ((KVMHostAsyncHttpCallReply) r).toResponse(FenceVmOnSuspectHostRsp.class);
                 if (!rsp.isSuccess()) {
-                    reply.setError(operr("HA-start vm[%s]: sibling[%s] failed to pre-fence suspect host[%s]. " +
-                                    "Refuse to start to prevent split-brain.",
-                            msg.getVmUuid(), peerHostUuid, suspectHostUuid));
+                    reply.setError(operr("HA-start vm[%s]: sibling[%s] failed to pre-fence suspect host[%s]. " +
+                                    "Refuse to start to prevent split-brain. cause: %s",
+                            msg.getVmUuid(), peerHostUuid, suspectHostUuid, rsp.getError()));
                     bus.reply(msg, reply);
                     return;
                 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java` around lines 1038 -
1049, The failure branches in KVMHost handling of the peer call drop the
underlying error details (r.getError() and rsp.getError()), so update the two
branches that call reply.setError(...) (the block checking if (!r.isSuccess())
and the block checking if (!rsp.isSuccess()) where rsp is a
FenceVmOnSuspectHostRsp) to include the original error info in the created
ErrorCode (attach r.getError() / rsp.getError() as the cause or append it to the
details) instead of only the generic operr message, so callers can see whether
the transport, SSH/agent, or peer-side failure occurred; keep the existing operr
text but combine it with the underlying error payload when calling
reply.setError.
simulator/simulatorImpl/src/main/java/org/zstack/simulator/kvm/KVMSimulatorController.java (1)

789-796: ⚡ Quick win

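Consider deserializing the request body in the simulator endpoint first to validate the protocol contract.

The endpoint currently returns success without parsing the request body, so tests are unlikely to expose drift in the command fields. Consider at least deserializing the body into FenceVmOnSuspectHostCmd.

♻️ Suggested change
 @RequestMapping(value = KVMConstant.KVM_HA_FENCE_VM_ON_SUSPECT_HOST_PATH, method = RequestMethod.POST)
 public @ResponseBody String fenceVmOnSuspectHost(HttpServletRequest req) {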
     HttpEntity<String> entity = restf.httpServletRequestToHttpEntity(req);
+    JSONObjectUtil.toObject(entity.getBody(), FenceVmOnSuspectHostCmd.class);
     FenceVmOnSuspectHostRsp rsp = new FenceVmOnSuspectHostRsp();
     rsp.setSuccess(true);
     replyer.reply(entity, rsp);
     return null;
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@simulator/simulatorImpl/src/main/java/org/zstack/simulator/kvm/KVMSimulatorController.java`
around lines 789 - 796, In KVMSimulatorController.fenceVmOnSuspectHost, the
endpoint currently ignores the request body; change it to deserialize the
HttpEntity body into a FenceVmOnSuspectHostCmd (use the same JSON binding/util
used elsewhere), validate required fields (e.g. non-null command and expected
fields) and handle/report parsing errors before constructing
FenceVmOnSuspectHostRsp and calling replyer.reply; keep the existing successful
response path for valid commands so tests can detect protocol/field drift.
testlib/src/main/java/org/zstack/testlib/KVMSimulator.groovy (1)

301-305: ⚡ Quick win

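Consider parsing the FenceVmOnSuspectHostCmd request body in the test simulator.

This handler also returns success directly. Consider deserializing the body first so that changes to the command structure are not silently ignored.

♻️ Suggested change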
 spec.simulator(KVMConstant.KVM_HA_FENCE_VM_ON_SUSPECT_HOST_PATH) { HttpEntity<String> e ->
+    JSONObjectUtil.toObject(e.body, KVMAgentCommands.FenceVmOnSuspectHostCmd.class)
     def rsp = new KVMAgentCommands.FenceVmOnSuspectHostRsp()
     rsp.success = true
     return rsp
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@testlib/src/main/java/org/zstack/testlib/KVMSimulator.groovy` around lines
301 - 305, The simulator handler for KVM_HA_FENCE_VM_ON_SUSPECT_HOST_PATH
returns a success response without validating the incoming request;
parse/deserialise the HttpEntity<String> e into a FenceVmOnSuspectHostCmd (use
the same JSON marshaller used elsewhere in KVMSimulator), optionally assert
required fields (e.g., vmUuid/hostUuid) match expectations, and only then
construct and return a KVMAgentCommands.FenceVmOnSuspectHostRsp with
success=true; update the handler inside KVMSimulator.groovy (the spec.simulator
block handling KVMConstant.KVM_HA_FENCE_VM_ON_SUSPECT_HOST_PATH) to perform the
deserialization and simple validation before returning the response.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@header/src/main/java/org/zstack/header/vm/VmBeforeStartOnHypervisorExtensionPoint.java`:
- Line 8: Add a Javadoc to the VmBeforeStartOnHypervisorExtensionPoint interface
and its method beforeStartVmOnHypervisor(VmInstanceSpec spec, Completion
completion) that clearly documents this extension point is asynchronous and that
the provided Completion must be invoked exactly once (either success or failure)
to avoid hanging or duplicate progression; ensure the Javadoc follows project
style and do not add extraneous modifiers to the interface method (leave it
without an explicit modifier; interface methods are implicitly public).

In `@plugin/kvm/src/main/java/org/zstack/kvm/KvmHaPreFenceVmExtension.java`:
- Around line 49-54: In KvmHaPreFenceVmExtension, the current null-check for
spec.getPreFenceSiblingHostUuid() allows the preFence peer to be the same as
suspectHostUuid; update the logic that sets peerHostUuid so that if
spec.getPreFenceSiblingHostUuid() is null OR equals suspectHostUuid it is
treated as invalid and you fall back to destHostUuid (or explicitly fail as
appropriate); adjust the code around peerHostUuid, vmUuid, suspectHostUuid and
destHostUuid to perform the equality check and use the fallback value instead of
sending the pre-fence to the suspect host.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)

Review profile: CHILL

Plan: Pro

Run ID: 4739738d-8e37-4ebf-a2af-378c476a7b2f

📥 Commits

Reviewing files that changed from the base of the PR and between d75e924 and 1478dff.

⛔ Files ignored due to path filters (2)
  • conf/springConfigXml/Kvm.xml is excluded by !**/*.xml
  • test/src/test/resources/springConfigXml/Kvm.xml is excluded by !**/*.xml
📒 Files selected for processing (16)
  • compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java
  • compute/src/main/java/org/zstack/compute/vm/VmStartOnHypervisorFlow.java
  • compute/src/main/java/org/zstack/compute/vm/VmSystemTags.java
  • header/src/main/java/org/zstack/header/vm/FenceVmOnHostMsg.java
  • header/src/main/java/org/zstack/header/vm/FenceVmOnHostReply.java
  • header/src/main/java/org/zstack/header/vm/HaStartVmInstanceMsg.java
  • header/src/main/java/org/zstack/header/vm/VmBeforeStartOnHypervisorExtensionPoint.java
  • header/src/main/java/org/zstack/header/vm/VmInstanceSpec.java
  • plugin/applianceVm/src/main/java/org/zstack/appliancevm/ApplianceVmManagementIpChecker.java
  • plugin/kvm/src/main/java/org/zstack/kvm/KVMAgentCommands.java
  • plugin/kvm/src/main/java/org/zstack/kvm/KVMConstant.java
  • plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java
  • plugin/kvm/src/main/java/org/zstack/kvm/KvmHaPreFenceVmExtension.java
  • plugin/kvm/src/main/java/org/zstack/kvm/KvmVmHardwareVerifyExtensionPoint.java
  • simulator/simulatorImpl/src/main/java/org/zstack/simulator/kvm/KVMSimulatorController.java
  • testlib/src/main/java/org/zstack/testlib/KVMSimulator.groovy

*/
public interface VmBeforeStartOnHypervisorExtensionPoint {
void beforeStartVmOnHypervisor(VmInstanceSpec spec);
void beforeStartVmOnHypervisor(VmInstanceSpec spec, Completion completion);

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

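Add Javadoc to the asynchronous extension point and state the callback constraint.

Now that line [8] makes the extension point asynchronous, the interface method should state explicitly that Completion must be called exactly once (either success or failure); otherwise the call chain may hang or advance twice.

As per coding guidelines: "Interface methods should not carry redundant modifiers (such as public) and must have a valid Javadoc comment."

✍️ Suggested patch
 public interface VmBeforeStartOnHypervisorExtensionPoint {
+    /**
+     * Called before starting VM on hypervisor.
+     * Implementations must invoke {@code completion} exactly once:
+     * call {@code success()} on pass, or {@code fail(...)} on rejection/error.
+     */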
     void beforeStartVmOnHypervisor(VmInstanceSpec spec, Completion completion);
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@header/src/main/java/org/zstack/header/vm/VmBeforeStartOnHypervisorExtensionPoint.java`
at line 8, Add a Javadoc to the VmBeforeStartOnHypervisorExtensionPoint
interface and its method beforeStartVmOnHypervisor(VmInstanceSpec spec,
Completion completion) that clearly documents this extension point is
asynchronous and that the provided Completion must be invoked exactly once
(either success or failure) to avoid hanging or duplicate progression; ensure
the Javadoc follows project style and do not add extraneous modifiers to the
interface method (leave it without an explicit modifier; interface methods are implicitly public).
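
The exactly-once contract requested in this comment can also be enforced defensively on the caller side. A minimal sketch, assuming a simplified `SimpleCompletion` interface (the real ZStack `Completion` class differs) and a hypothetical `OnceCompletion` wrapper that forwards only the first callback:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified stand-in for ZStack's Completion; an assumption for illustration only.
interface SimpleCompletion {
    void success();
    void fail(String error);
}

// Hypothetical wrapper: forwards the first callback and silently drops later ones.
final class OnceCompletion implements SimpleCompletion {
    private final SimpleCompletion delegate;
    private final AtomicBoolean called = new AtomicBoolean(false);

    OnceCompletion(SimpleCompletion delegate) {
        this.delegate = delegate;
    }

    @Override
    public void success() {
        // compareAndSet lets exactly one caller through, even across threads.
        if (called.compareAndSet(false, true)) {
            delegate.success();
        }
    }

    @Override
    public void fail(String error) {
        if (called.compareAndSet(false, true)) {
            delegate.fail(error);
        }
    }
}

// Counts callbacks so the behaviour can be observed.
final class RecordingCompletion implements SimpleCompletion {
    int successCount = 0;
    int failCount = 0;

    @Override
    public void success() {
        successCount++;
    }

    @Override
    public void fail(String error) {
        failCount++;
    }
}
```

A guard like this protects against duplicate progression but not against a callback that is never invoked; that case still needs the documented contract or a timeout.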

Comment on lines +49 to +54
String peerHostUuid = spec.getPreFenceSiblingHostUuid();
if (peerHostUuid == null) {
peerHostUuid = destHostUuid;
logger.debug(String.format("HA-start vm[%s]: pre-fence sibling host is absent, fallback to dest host[%s]",
vmUuid, destHostUuid));
}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

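Avoid using the suspect host as the pre-fence peer.

Line 49 only guards against null. If the upstream caller passes suspectHostUuid as preFenceSiblingHostUuid, the fence request is sent back to the suspect host itself and the pre-fence is defeated. Consider treating peerHostUuid.equals(suspectHostUuid) as an invalid value as well, and either fall back to destHostUuid or fail outright.

💡 Suggested change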
         String peerHostUuid = spec.getPreFenceSiblingHostUuid();
-        if (peerHostUuid == null) {
+        if (peerHostUuid == null || peerHostUuid.equals(suspectHostUuid)) {
             peerHostUuid = destHostUuid;
-            logger.debug(String.format("HA-start vm[%s]: pre-fence sibling host is absent, fallback to dest host[%s]",
-                    vmUuid, destHostUuid));
+            logger.debug(String.format("HA-start vm[%s]: pre-fence sibling host is invalid[%s], fallback to dest host[%s]",
+                    vmUuid, spec.getPreFenceSiblingHostUuid(), destHostUuid));
         }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugin/kvm/src/main/java/org/zstack/kvm/KvmHaPreFenceVmExtension.java` around
lines 49 - 54, In KvmHaPreFenceVmExtension, the current null-check for
spec.getPreFenceSiblingHostUuid() allows the preFence peer to be the same as
suspectHostUuid; update the logic that sets peerHostUuid so that if
spec.getPreFenceSiblingHostUuid() is null OR equals suspectHostUuid it is
treated as invalid and you fall back to destHostUuid (or explicitly fail as
appropriate); adjust the code around peerHostUuid, vmUuid, suspectHostUuid and
destHostUuid to perform the equality check and use the fallback value instead of
sending the pre-fence to the suspect host.
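
The selection rule discussed in this comment can be isolated into a small pure function, which makes it easy to unit test. This is a sketch under assumptions: `PreFencePeerSelector` and `choosePeer` are hypothetical names, and the fallback-to-destination behaviour follows the suggestion above rather than the merged code.

```java
// Hypothetical helper; not part of the PR.
final class PreFencePeerSelector {
    private PreFencePeerSelector() {
    }

    // Returns the host that should be asked to fence the suspect host.
    // Falls back to destHostUuid when the sibling is missing or is the suspect itself.
    static String choosePeer(String siblingHostUuid, String suspectHostUuid, String destHostUuid) {
        if (siblingHostUuid == null || siblingHostUuid.equals(suspectHostUuid)) {
            return destHostUuid;
        }
        return siblingHostUuid;
    }
}
```

For example, `choosePeer("host-b", "host-a", "host-c")` keeps the sibling `host-b`, while `choosePeer("host-a", "host-a", "host-c")` rejects the suspect host and falls back to `host-c`.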

}

fireExtensions(spec);
private void runBeforeStartExtensions(List<VmBeforeStartOnHypervisorExtensionPoint> exts,
Collaborator

Comment from jin.ma:

Why isn't this recursive? It could be done in two lines, yet it is written as this big block.
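
For context, a recursive version of serial asynchronous execution, as this comment proposes, could look like the following. It is a minimal sketch with hypothetical stand-in types (`Done`, `BeforeStartExtension`, `ExtensionRunner`), not the ZStack API or the PR's implementation.

```java
import java.util.List;

// Simplified completion callback; a stand-in, not ZStack's Completion.
interface Done {
    void success();
    void fail(String error);
}

// Hypothetical asynchronous extension point with a single hook.
interface BeforeStartExtension {
    void beforeStart(String vmUuid, Done done);
}

final class ExtensionRunner {
    private ExtensionRunner() {
    }

    // Runs the extensions from position index onward, one after another,
    // and stops at the first failure.
    static void runFrom(List<BeforeStartExtension> exts, int index, String vmUuid, Done done) {
        if (index >= exts.size()) {
            done.success();
            return;
        }
        exts.get(index).beforeStart(vmUuid, new Done() {
            @Override
            public void success() {
                // Only advance to the next extension after this one succeeded.
                runFrom(exts, index + 1, vmUuid, done);
            }

            @Override
            public void fail(String error) {
                done.fail(error);
            }
        });
    }
}
```

When extensions complete synchronously, the call depth grows with the number of extensions, which is usually acceptable for a short extension list.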

import org.zstack.header.host.HostMessage;
import org.zstack.header.message.NeedReplyMessage;

public class FenceVmOnHostMsg extends NeedReplyMessage implements HostMessage {
Collaborator

Comment from jin.ma:

FenceVmFromPeerHost

private String vmInstanceUuid;
private String judgerClassName;
private List<String> softAvoidHostUuids;
private String preFenceSiblingHostUuid;
Owner Author

Comment from jin.ma:

Just name it literally: accessiblePeerHostUuid

cmd.targetHostUuid = suspectHostUuid;
cmd.targetHostIp = suspectVO.getManagementIp();
cmd.targetHostUsername = suspectVO.getUsername();
cmd.targetHostPassword = suspectVO.getPassword();
Collaborator

Comment from jin.ma:

Passing the password directly is not really appropriate; pass the public key instead.

Comment thread conf/springConfigXml/Kvm.xml Outdated

<bean id="KvmHaPreFenceVmExtension" class="org.zstack.kvm.KvmHaPreFenceVmExtension">
<zstack:plugin>
<zstack:extension interface="org.zstack.header.vm.VmBeforeStartOnHypervisorExtensionPoint" />
Collaborator

Comment from jin.ma:

As discussed, the priority needs to be constrained; set the priority so that this extension runs last.

@MatheMatrix MatheMatrix force-pushed the sync/yingzhe.hu/fix/ZSTAC-83890-5.5.22@@3 branch 2 times, most recently from 1f8f82b to e72c610 Compare May 18, 2026 05:59
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMAgentCommands.java`:
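- Around line 5274-5296: Change all public fields of the FenceVmOnSuspectHostCmd
class (vmUuid, targetHostUuid, targetHostIp, targetHostUsername,
targetHostPrivateKey, targetHostSshPort, sshTimeoutSec) to private and add
standard getter and setter methods for each field; keep the existing annotations
(such as `@GrayVersion` and `@NoLogging`) on the fields or the corresponding
getters to preserve behavior; make sure the serialization/deserialization path
can still access these fields (for example, if the framework relies on
getters/setters), and leave room in the setter of the sensitive field
(targetHostPrivateKey) for later validation or sanitization logic.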

In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java`:
- Around line 1038-1049: The reply currently discards the underlying errors from
the async call and response (r.getError() and rsp.getError()) in KVMHost when
handling the fence result; modify the failure branches that call
reply.setError(operr(...)) so they include the original error details (either
append r.getError()/rsp.getError() text into the op error message or attach it
as the cause) so operators can see whether the failure was transport/SSH/agent;
update the blocks that handle the KVMHostAsyncHttpCallReply (variable r) and the
FenceVmOnSuspectHostRsp (variable rsp) to propagate their error strings into the
reply.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)

Review profile: CHILL

Plan: Pro

Run ID: 076324a2-87dd-4064-95bb-caaad572a504

📥 Commits

Reviewing files that changed from the base of the PR and between ce42b4f and 1f8f82b.

⛔ Files ignored due to path filters (2)
  • conf/springConfigXml/Kvm.xml is excluded by !**/*.xml
  • test/src/test/resources/springConfigXml/Kvm.xml is excluded by !**/*.xml
📒 Files selected for processing (14)
  • compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java
  • compute/src/main/java/org/zstack/compute/vm/VmStartOnHypervisorFlow.java
  • compute/src/main/java/org/zstack/compute/vm/VmSystemTags.java
  • header/src/main/java/org/zstack/header/vm/FenceVmFromPeerHostMsg.java
  • header/src/main/java/org/zstack/header/vm/FenceVmFromPeerHostReply.java
  • header/src/main/java/org/zstack/header/vm/HaStartVmInstanceMsg.java
  • header/src/main/java/org/zstack/header/vm/VmBeforeStartOnHypervisorExtensionPoint.java
  • header/src/main/java/org/zstack/header/vm/VmInstanceSpec.java
  • plugin/kvm/src/main/java/org/zstack/kvm/KVMAgentCommands.java
  • plugin/kvm/src/main/java/org/zstack/kvm/KVMConstant.java
  • plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java
  • plugin/kvm/src/main/java/org/zstack/kvm/KvmHaPreFenceVmExtension.java
  • simulator/simulatorImpl/src/main/java/org/zstack/simulator/kvm/KVMSimulatorController.java
  • testlib/src/main/java/org/zstack/testlib/KVMSimulator.groovy
✅ Files skipped from review due to trivial changes (1)
  • header/src/main/java/org/zstack/header/vm/FenceVmFromPeerHostReply.java
🚧 Files skipped from review as they are similar to previous changes (6)
  • testlib/src/main/java/org/zstack/testlib/KVMSimulator.groovy
  • simulator/simulatorImpl/src/main/java/org/zstack/simulator/kvm/KVMSimulatorController.java
  • compute/src/main/java/org/zstack/compute/vm/VmSystemTags.java
  • plugin/kvm/src/main/java/org/zstack/kvm/KvmHaPreFenceVmExtension.java
  • plugin/kvm/src/main/java/org/zstack/kvm/KVMConstant.java
  • compute/src/main/java/org/zstack/compute/vm/VmStartOnHypervisorFlow.java

Comment on lines +5274 to +5296
public static class FenceVmOnSuspectHostCmd extends AgentCommand implements java.io.Serializable {
@GrayVersion(value = "5.5.22")
public String vmUuid;

@GrayVersion(value = "5.5.22")
public String targetHostUuid;

@GrayVersion(value = "5.5.22")
public String targetHostIp;

@GrayVersion(value = "5.5.22")
public String targetHostUsername;

@GrayVersion(value = "5.5.22")
@NoLogging
public String targetHostPrivateKey;

@GrayVersion(value = "5.5.22")
public Integer targetHostSshPort;

@GrayVersion(value = "5.5.22")
public Integer sshTimeoutSec;
}

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

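Consider making the public fields private and providing getter/setter methods.

All fields in this class are declared public, which is inconsistent with the pattern used by the vast majority of command classes in the codebase. See SetVmConsolePasswordLiveCmd, LoginIscsiTargetCmd, StartVmCmd and similar classes in the same file, which use private fields with getters/setters.

Problems with public fields:

  • They break encapsulation: external code can modify the fields directly, with no opportunity for validation
  • Sensitive fields such as targetHostPrivateKey in particular should get an extra layer of protection through private access control
  • Adding validation logic later would be a breaking change
♻️ Suggested refactoring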
 public static class FenceVmOnSuspectHostCmd extends AgentCommand implements java.io.Serializable {
     @GrayVersion(value = "5.5.22")
-    public String vmUuid;
+    private String vmUuid;

     @GrayVersion(value = "5.5.22")
-    public String targetHostUuid;
+    private String targetHostUuid;

     @GrayVersion(value = "5.5.22")
-    public String targetHostIp;
+    private String targetHostIp;

     @GrayVersion(value = "5.5.22")
-    public String targetHostUsername;
+    private String targetHostUsername;

     @GrayVersion(value = "5.5.22")
     @NoLogging
-    public String targetHostPrivateKey;
+    private String targetHostPrivateKey;

     @GrayVersion(value = "5.5.22")
-    public Integer targetHostSshPort;
+    private Integer targetHostSshPort;

     @GrayVersion(value = "5.5.22")
-    public Integer sshTimeoutSec;
+    private Integer sshTimeoutSec;
+
+    public String getVmUuid() {
+        return vmUuid;
+    }
+
+    public void setVmUuid(String vmUuid) {
+        this.vmUuid = vmUuid;
+    }
+
+    public String getTargetHostUuid() {
+        return targetHostUuid;
+    }
+
+    public void setTargetHostUuid(String targetHostUuid) {
+        this.targetHostUuid = targetHostUuid;
+    }
+
+    public String getTargetHostIp() {
+        return targetHostIp;
+    }
+
+    public void setTargetHostIp(String targetHostIp) {
+        this.targetHostIp = targetHostIp;
+    }
+
+    public String getTargetHostUsername() {
+        return targetHostUsername;
+    }
+
+    public void setTargetHostUsername(String targetHostUsername) {
+        this.targetHostUsername = targetHostUsername;
+    }
+
+    public String getTargetHostPrivateKey() {
+        return targetHostPrivateKey;
+    }
+
+    public void setTargetHostPrivateKey(String targetHostPrivateKey) {
+        this.targetHostPrivateKey = targetHostPrivateKey;
+    }
+
+    public Integer getTargetHostSshPort() {
+        return targetHostSshPort;
+    }
+
+    public void setTargetHostSshPort(Integer targetHostSshPort) {
+        this.targetHostSshPort = targetHostSshPort;
+    }
+
+    public Integer getSshTimeoutSec() {
+        return sshTimeoutSec;
+    }
+
+    public void setSshTimeoutSec(Integer sshTimeoutSec) {
+        this.sshTimeoutSec = sshTimeoutSec;
+    }
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

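In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMAgentCommands.java` around lines
5274 - 5296, change all public fields of the FenceVmOnSuspectHostCmd class
(vmUuid, targetHostUuid, targetHostIp, targetHostUsername, targetHostPrivateKey,
targetHostSshPort, sshTimeoutSec) to private and add standard getter and setter
methods for each field; keep the existing annotations (such as `@GrayVersion` and
`@NoLogging`) on the fields or the corresponding getters to preserve behavior;
make sure the serialization/deserialization path can still access these fields
(for example, if the framework relies on getters/setters), and leave room in the
setter of the sensitive field (targetHostPrivateKey) for later validation or
sanitization logic.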

Comment thread plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java Outdated
@MatheMatrix MatheMatrix force-pushed the sync/yingzhe.hu/fix/ZSTAC-83890-5.5.22@@3 branch 2 times, most recently from 3555064 to cf18979 Compare May 18, 2026 06:25
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java`:
- Around line 1029-1032: The current call to creator.setTagByTokens writes
msg.getAccessiblePeerHostUuid() directly and only handles null, so empty or
whitespace-only strings become treated as valid peer UUIDs; update the value
passed for VmSystemTags.HA_PRE_FENCE_ACCESSIBLE_PEER_HOST_UUID_TOKEN by trimming
the input (e.g., trim to null) and falling back to
VmSystemTags.HA_PRE_FENCE_NO_ACCESSIBLE_PEER_HOST when the trimmed result is
empty, ensuring creator.setTagByTokens receives either a real UUID or the
NO_ACCESSIBLE_PEER_HOST sentinel.
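
The trim-then-fallback rule described in this comment can be sketched as a small helper. `PeerHostTagValue`, `normalize`, and the sentinel value `"none"` are hypothetical; the actual value of `VmSystemTags.HA_PRE_FENCE_NO_ACCESSIBLE_PEER_HOST` is not shown on this page.

```java
// Hypothetical helper; not part of the PR.
final class PeerHostTagValue {
    // Assumed sentinel for illustration; the real constant's value is unknown here.
    static final String NO_ACCESSIBLE_PEER_HOST = "none";

    private PeerHostTagValue() {
    }

    // Returns a trimmed UUID, or the sentinel when the input is null, empty, or whitespace.
    static String normalize(String accessiblePeerHostUuid) {
        if (accessiblePeerHostUuid == null) {
            return NO_ACCESSIBLE_PEER_HOST;
        }
        String trimmed = accessiblePeerHostUuid.trim();
        return trimmed.isEmpty() ? NO_ACCESSIBLE_PEER_HOST : trimmed;
    }
}
```

The result of `normalize(msg.getAccessiblePeerHostUuid())` would then be the value written for the token, so the tag never carries a blank string that later code could mistake for a UUID.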

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)

Review profile: CHILL

Plan: Pro

Run ID: 51f1472f-ffe5-4ff0-ad84-ed248c558cee

📥 Commits

Reviewing files that changed from the base of the PR and between cf18979 and ba51656.

📒 Files selected for processing (3)
  • compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java
  • compute/src/main/java/org/zstack/compute/vm/VmSystemTags.java
  • plugin/kvm/src/main/java/org/zstack/kvm/KvmHaPreFenceVmExtension.java

Comment thread compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java Outdated
@MatheMatrix MatheMatrix force-pushed the sync/yingzhe.hu/fix/ZSTAC-83890-5.5.22@@3 branch 6 times, most recently from 2639e39 to 2264699 Compare May 18, 2026 15:06
Move premium-only HA pre-fence details out of community while keeping the generic async before-start extension point support.

Jira: ZSTAC-83890

Test: mvn -pl header,compute,plugin/kvm,simulator/simulatorImpl,testlib -DskipTests compile

Change-Id: I168adf82338f9df9e76287619b7f76a8e5be695f
@MatheMatrix MatheMatrix force-pushed the sync/yingzhe.hu/fix/ZSTAC-83890-5.5.22@@3 branch from c32f827 to 3ca638f Compare May 18, 2026 16:53