
<fix>[compute]: fix VM clone quota check fail#3980

Open
zstack-robot-1 wants to merge 1450 commits into master from sync/hongbo.liu/5.5.22

Conversation

@zstack-robot-1
Collaborator

1. When cloning a tenant VM with an admin account, the quota check was not based on the current role.
2. The quota error was overwritten during exception handling.
http://jira.zstack.io/browse/ZSTAC-83646

Resolves: ZSTAC-83646

Change-Id: I776c6b7a786d79746f616b716f736f646e686868

sync from gitlab !9873

gitlab and others added 30 commits February 16, 2026 15:43
<fix>[loadBalancer]: block SLB deletion during grayscale upgrade

See merge request zstackio/zstack!9187
<fix>[volumebackup]: add backup cancel timeout error code

See merge request zstackio/zstack!9200
<fix>[core]: add @nologging to sensitive config fields

See merge request zstackio/zstack!9171
…plit-brain

When a management node departed, its VM skip-trace entries were
removed immediately. If VMs were still being started by kvmagent,
the next VM sync would falsely detect them as Stopped and trigger
HA, causing split-brain.

Fix: transfer departed MN skip-trace entries to an orphaned set with
10-minute TTL instead of immediate deletion. VMs in the orphaned set
remain skip-traced until the TTL expires or they are explicitly
continued, preventing false HA triggers during MN restart scenarios.

Resolves: ZSTAC-80821

Change-Id: I3222e260b2d7b33dc43aba0431ce59a788566b34
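The orphaned-set idea above can be sketched roughly as follows. This is a hedged illustration only: the class and method names are hypothetical, not the actual ZStack implementation.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of "transfer departed MN skip-trace entries to an orphaned set with TTL".
// Class/method names are illustrative, not the real ZStack code.
class OrphanSkipTraceRegistry {
    // vmUuid -> expiry timestamp in milliseconds
    private final Map<String, Long> orphaned = new ConcurrentHashMap<>();
    private final long ttlMillis;

    OrphanSkipTraceRegistry(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Called when a management node departs: keep its entries instead of deleting them.
    void adoptFromDepartedNode(Set<String> vmUuids, long now) {
        for (String uuid : vmUuids) {
            orphaned.put(uuid, now + ttlMillis);
        }
    }

    // VM sync consults this before treating a missing VM as Stopped.
    boolean isSkipTraced(String vmUuid, long now) {
        Long expiry = orphaned.get(vmUuid);
        if (expiry == null) {
            return false;
        }
        if (now > expiry) {          // TTL expired: resume normal tracing
            orphaned.remove(vmUuid);
            return false;
        }
        return true;
    }

    // An explicit "continue" removes the entry immediately.
    void continueTracing(String vmUuid) {
        orphaned.remove(vmUuid);
    }
}
```

With a 10-minute TTL, VMs adopted from a departed node stay skip-traced long enough for kvmagent to finish starting them, so the sync path never mistakes them for Stopped.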
…anup

Resolves: ZSTAC-80821

Change-Id: I59284c4e69f5d2ee357b1836b7c243200e30949a
Resolves: ZSTAC-77544

Change-Id: I1f711bff9c1e87a8cbf6a2eb310ca6086f0f99ba
Resolves: ZSTAC-80821

Change-Id: Ia9a9597feceb96b3e6e22259e2d0be7bde8ae499
…V bond check

Add a new error code constant for SR-IOV bond slave NIC count validation.

Resolves: ZSTAC-81163

Change-Id: Ie2a74411129a98c3c03a4a085e94a3bd45922da5
…tion

Resolves: ZSTAC-80991

Change-Id: I7677ddc25c8859e35e8ba80fd3105406bc761a76
Resolves: ZSTAC-74908

Change-Id: I48054139babb1e8092ab81e4367743ae3fd8aefb
Resolves: ZSTAC-81354

Change-Id: Iff2131b3a878444fa27641f24dd727fe4fa176fb
- UpdateQueryImpl: guard val.getClass() NPE
- LogSafeGson: return JsonNull when input is null
- HostAllocatorChain: null check completion
- VmInstanceVO: use Objects.equals to avoid NPE
- SessionManagerImpl: guard null session
- VmCapabilitiesJudger: guard null PS type result
- CephPSMonBase: guard null self when evicted
- CephPSBase: guard null mon.getSelf()
- HostBase: guard null self when HostVO deleted
- ExternalPSFactory: guard null URI protocol
- LocalStorageBase: guard null errorCode.getCause()

Resolves: ZSTAC-69300, ZSTAC-69957, ZSTAC-71973,
 ZSTAC-81294, ZSTAC-70180, ZSTAC-70181,
 ZSTAC-78309, ZSTAC-78310, ZSTAC-70668,
 ZSTAC-71909, ZSTAC-80555, ZSTAC-81270,
 ZSTAC-70101, ZSTAC-72034, ZSTAC-73197,
 ZSTAC-79921, ZSTAC-81160, ZSTAC-81224,
 ZSTAC-81805, ZSTAC-72304, ZSTAC-81804,
 ZSTAC-74898, ZSTAC-69215, ZSTAC-70151,
 ZSTAC-68933

Change-Id: I910e9b542ecd254fdf7e956f943316988a56a1f9
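As an illustration of the VmInstanceVO item above: `a.equals(b)` throws an NPE when `a` is null, while `Objects.equals(a, b)` does not. A minimal sketch, where `sameHost` and the field names are hypothetical:

```java
import java.util.Objects;

// Null-safe comparison pattern referenced in the VmInstanceVO fix above.
class NullSafeCompare {
    static boolean sameHost(String lastHostUuid, String hostUuid) {
        // lastHostUuid may legitimately be null for a never-started VM;
        // Objects.equals handles null on either side without throwing.
        return Objects.equals(lastHostUuid, hostUuid);
    }
}
```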
- LdapUtil: CRE to OperationFailureException
- QueryFacadeImpl: CRE to OperationFailureException
- HostAllocatorManagerImpl: CRE to warn + clamp
- CloudOperationsErrorCode: add LDAP/PROMETHEUS codes

Resolves: ZSTAC-81334

Change-Id: Iab947b0476e9174d5a61baa095847b521b1f59fa
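The "CRE to warn + clamp" conversion listed for HostAllocatorManagerImpl can be sketched as below. The method name and bounds are hypothetical; only the pattern is real: log a warning and clamp the value into a valid range instead of throwing a CloudRuntimeException.

```java
// Illustrative "warn + clamp" pattern: out-of-range input degrades gracefully
// rather than aborting the whole operation with a runtime exception.
class ClampExample {
    static int clampPageSize(int requested, int max) {
        if (requested > max) {
            // Previously this condition would throw; now it warns and clamps.
            System.err.println("requested page size " + requested
                    + " exceeds max " + max + ", clamping");
            return max;
        }
        // Guard the lower bound as well; a page size below 1 is meaningless.
        return Math.max(requested, 1);
    }
}
```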
<fix>[pciDevice]: add error code ORG_ZSTACK_PCIDEVICE_10077

See merge request zstackio/zstack!9205
Add ORG_ZSTACK_AI_10134 for ModelCenter disconnected
check and ORG_ZSTACK_PCIDEVICE_10077 for SR-IOV bond
validation.

Resolves: ZSTAC-72783

Change-Id: I504a415a6e822513df955be600188ae88e2e1058
fix(ai): add error codes for AI and PCI modules [ZSTAC-72783]

See merge request zstackio/zstack!9215
<fix>[multi]: batch fix CRE quality issues

See merge request zstackio/zstack!9214
<fix>[volumebackup]: add backup cancel timeout error code

See merge request zstackio/zstack!9204
Companion DB migration for premium MR !13025. Adds normalizedModelName column and index to GpuDeviceSpecVO for GPU spec dedup by normalized model name.

Resolves: ZSTAC-75319

Change-Id: If15e615bcbda955cc1d6c58527bae27d4af4b497
<fix>[compute]: respect vm.migrationQuantity during host maintenance

See merge request zstackio/zstack!9209
…ttachedPSMountPath NPE

Resolves: ZSTAC-69300

Change-Id: I1b39a9e7b76751e8a4ef4cc53e9ac2028386e334
Resolves: ZSTAC-61988

Change-Id: Id3d5a48801fda21e2a51d96949c743bac254b2e6
<fix>[kvm]: configurable orphan skip timeout

See merge request zstackio/zstack!9203
Resolves: ZSTAC-61988

Change-Id: I0908fce97128904f9954198645290f4e5709252e
- Add Javadoc: NULL resourceType = universal (backward compatible)
- Add resourceType to TagPatternVO_ metamodel and TagPatternInventory
- Add groovy integration test (3 scenarios: universal/scoped/combined filter)

Resolves: ZSTAC-74908

Change-Id: I6fc05535ae688e50290759f1e129501f0240696c
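The "NULL resourceType = universal" semantics documented above can be sketched as follows. TagPattern here is a simplified stand-in for TagPatternVO, not the real class.

```java
import java.util.List;
import java.util.stream.Collectors;

// Simplified stand-in for TagPatternVO: a null resourceType means the
// pattern applies universally (backward compatible with old rows).
class TagPattern {
    final String name;
    final String resourceType; // null = applies to every resource type

    TagPattern(String name, String resourceType) {
        this.name = name;
        this.resourceType = resourceType;
    }
}

class TagPatternFilter {
    // Returns the patterns applicable to the given resource type:
    // type-scoped matches plus all universal (null-typed) patterns.
    static List<TagPattern> applicable(List<TagPattern> all, String resourceType) {
        return all.stream()
                .filter(p -> p.resourceType == null || p.resourceType.equals(resourceType))
                .collect(Collectors.toList());
    }
}
```

This mirrors the three test scenarios mentioned above: universal-only, scoped-only, and combined filtering.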
<fix>[multi]: batch guard NPE quality issues

See merge request zstackio/zstack!9213
<fix>[telemetry]: fix Sentry transaction loss and add debug logging

See merge request zstackio/zstack!9220
<fix>[gpu]: add normalizedModelName migration SQL

See merge request zstackio/zstack!9218
<fix>[zbs]: sync MDS node statuses to DB when reconnect fails

See merge request zstackio/zstack!9161
Add missing error codes ORG_ZSTACK_STORAGE_PRIMARY_10048
and ORG_ZSTACK_STORAGE_PRIMARY_10051 to all 10 language
files. Replace zh_CN mistranslations with the correct
term and fix garbled characters in zh_TW error messages.

Resolves: ZSTAC-72656

Change-Id: I5f08109d1c415b751ec130285b9d92522f1e0a34
huhu0316 and others added 28 commits April 29, 2026 17:03
br_conn_all_ns (169.254.64.1, mevoco.py CONNECT_ALL_NETNS_BR_OUTER_IP) is
created on the host only when the first flat L3 network lands, AFTER the
add-host ansible playbook has generated the TLS cert. The cert SAN
therefore lacks this IP, and on every subsequent reconnect the
check-tls-certs flow detects the gap and triggers an ansible redeploy
plus libvirtd/kvmagent restart, breaking PID stability.

This IP is host-internal (used only for VM <-> root-netns userdata /
pushgateway / kvmagent HTTP) and has no business in a libvirtd TLS cert.
Add EXCLUDED_INTERNAL_IPS as a single source of truth used by both the
check flow (buildIpList) and the deploy flow (unionTlsCertIps) so the two
flows can never disagree, regardless of when the bridge appears.
host_plugin.fact() on the agent side is unchanged.

Resolves: ZSTAC-84446

Change-Id: I10821d8c68190f2cbc8a0679d19ed053916d9184
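The single-source-of-truth idea above can be sketched as follows. This is a hedged illustration, not the actual ZStack code: EXCLUDED_INTERNAL_IPS and unionTlsCertIps are named in the commit message, but this class and its signatures are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// One shared exclusion set consumed by both the check flow and the deploy
// flow, so the two can never disagree about host-internal IPs.
class TlsCertIpPolicy {
    // Host-internal IPs (e.g. the br_conn_all_ns bridge IP) that must never
    // appear in, or be demanded of, the libvirtd TLS cert SAN.
    static final Set<String> EXCLUDED_INTERNAL_IPS =
            new HashSet<>(Arrays.asList("169.254.64.1"));

    // Deploy flow: build the SAN IP list without internal IPs.
    static List<String> unionTlsCertIps(List<String> hostIps) {
        return hostIps.stream()
                .filter(ip -> !EXCLUDED_INTERNAL_IPS.contains(ip))
                .distinct()
                .collect(Collectors.toList());
    }

    // Check flow: an internal IP missing from the SAN is not a gap,
    // so no redeploy or libvirtd/kvmagent restart is triggered.
    static boolean certCoversHost(Set<String> certSanIps, List<String> hostIps) {
        return hostIps.stream()
                .filter(ip -> !EXCLUDED_INTERNAL_IPS.contains(ip))
                .allMatch(certSanIps::contains);
    }
}
```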
<fix>[kvm]: exclude br_conn_all_ns IP from TLS cert SAN

See merge request zstackio/zstack!9758
DBImpact

Change-Id: I6d757a76676d6763697161776a61796175737775
…erviceRefVO.isDefault/createDate/lastOpDate

Resolves: ZSTAC-84025

Change-Id: Ifbb68c9d475aa19762bb053e6545381a305dfe5b
…l.defaultModelServiceUuid

Resolves: ZSTAC-84025

Change-Id: I4046976b0c4437ba4c14069946b8075393f152ed
…Model DSL

Resolves: ZSTAC-84025

Change-Id: I0df26f7367a28da9c6c825dcddcb829db133374e
Resolves: ZSTAC-84025

Change-Id: I7a6c657369736978676164676b616b626d707974
Review found the new ModelServiceRefVO createDate column used a zero-date default, which is unsafe under strict SQL mode. The upgrade now adds the column as nullable, backfills existing rows, then tightens it to NOT NULL with CURRENT_TIMESTAMP. This also adds the shared AI error code consumed by the premium review fix.

Constraint: Premium branch references CloudOperationsErrorCode from zstack utils

Rejected: Keep zero-date default | fails SQL coding rules and strict-mode upgrades

Confidence: high

Scope-risk: narrow

Tested: JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 mvn -pl utils -DskipTests install

Tested: JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 mvn -pl sdk -DskipTests install

Not-tested: Full database migration against a live strict-mode MySQL instance

Resolves: ZSTAC-84025

Change-Id: Ic0a63eec1e3db23e4bb8843efd8e2aee143dce21
CI hit MySQL error 1293 because ModelServiceRefVO.lastOpDate already uses TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, and older MySQL/MariaDB versions allow only one automatic TIMESTAMP column per table. Keep createDate non-zero with a fixed default while the migration backfills existing rows with CURRENT_TIMESTAMP and premium ModelServiceRefVO prePersist sets real creation time for new JPA inserts.

Constraint: Older MySQL/MariaDB permits only one automatic TIMESTAMP column per table

Rejected: DEFAULT CURRENT_TIMESTAMP on createDate | conflicts with lastOpDate automatic timestamp

Confidence: high

Scope-risk: narrow

Tested: git diff --check

Tested: rg createDate/current timestamp/default checks in V5.5.22__schema.sql

Tested: JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 mvn -pl utils -DskipTests install

Not-tested: Full Flyway migration against the CI MySQL image

Resolves: ZSTAC-84025

Change-Id: I095a6818443441409c84988dafbb6c16ef5d8e54
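The single-automatic-TIMESTAMP constraint described above can be sketched as a migration fragment. This is an illustrative sketch only, not the shipped V5.5.22__schema.sql: the table and column names follow the commit message, but the actual statements may differ.

```sql
-- Illustrative sketch, not the shipped migration. lastOpDate already owns the
-- table's single automatic TIMESTAMP slot on older MySQL/MariaDB
-- (DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP), so createDate must
-- take a fixed, non-zero default instead of CURRENT_TIMESTAMP.
ALTER TABLE `ModelServiceRefVO`
    ADD COLUMN `createDate` TIMESTAMP NOT NULL DEFAULT '2000-01-01 00:00:00';

-- Backfill existing rows with a real timestamp; new JPA inserts get their
-- creation time from the premium ModelServiceRefVO prePersist hook.
UPDATE `ModelServiceRefVO`
    SET `createDate` = CURRENT_TIMESTAMP
    WHERE `createDate` = '2000-01-01 00:00:00';
```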
Code review noted that adding ModelServiceRefVO.isDefault as NOT NULL DEFAULT 0 already initializes existing rows, so the immediate UPDATE to the same value only adds unnecessary migration work. The schema keeps the defaulted column and removes the redundant write.

Constraint: MySQL initializes existing rows when adding a NOT NULL column with a DEFAULT

Rejected: Keep explicit UPDATE | unnecessary table write during upgrade

Confidence: high

Scope-risk: narrow

Tested: git diff --check

Tested: rg check for redundant isDefault UPDATE and zero-date defaults in V5.5.22__schema.sql

Tested: JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 mvn -pl utils -DskipTests install

Not-tested: Full Flyway migration against CI database image

Resolves: ZSTAC-84025

Change-Id: I062d884ed6232fa603a6ecfd68c2124db85c5ed6
<feature>[ai] ZSTAC-84025: auto-match schema + SDK

See merge request zstackio/zstack!9772
DBImpact

Resolves: ZSTAC-81413

Change-Id: I6a7574656467636e686377756f716d67707a7063
XInfiniStorageController.blacklist was a no-op (// todo,
comp.success()). When ExternalPrimaryStorageKvmFactory's
beforeStartVmOnKvm fell back to blacklist after a deactivate
failure, the no-op silently allowed the VM to start while a
stale client on another host still held write access, risking
a split-brain on the volume.

1. Why?
xinfini does not implement volume path isolation yet. The
blacklist no-op masked the missing capability and let the
caller proceed as if the stale client had been fenced.

2. How?
Throw OperationFailureException from blacklist with a clear
message. Because beforeStartVmOnKvm invokes blacklist with
NopeCompletion (which ignores fail()), only an exception can
abort the VM start. This makes the unsupported case explicit
and conservative until xinfini supports path isolation.

3. Side effects?
VM start is aborted when an old active client cannot be
deactivated on xinfini. This is the safe behavior; previously
such VMs would start with split-brain risk.

# Summary of changes (by module):
- plugin/xinfini: throw OperationFailureException in blacklist
  instead of silently returning success.

Related: ZSTAC-84963
Change-Id: Ib0092f39a7b23cf7b06442ac9616cb2ce39240b7


(cherry picked from commit 024c445)
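A minimal sketch of the change described above, assuming a local stand-in for OperationFailureException; the real XInfiniStorageController method carries far more context.

```java
// Local stand-in for ZStack's OperationFailureException, for illustration only.
class OperationFailureException extends RuntimeException {
    OperationFailureException(String msg) {
        super(msg);
    }
}

class XInfiniBlacklistSketch {
    // Before the fix: "// todo" + comp.success() silently pretended fencing worked.
    // After: throw, because the caller passes a NopeCompletion that ignores fail(),
    // so only an exception can abort the VM start.
    void blacklist(String volumePath, String clientIp) {
        throw new OperationFailureException(String.format(
                "xinfini does not support volume path isolation yet; cannot blacklist "
                        + "client[%s] for volume[%s], aborting VM start to avoid split-brain",
                clientIp, volumePath));
    }
}
```

Making the unsupported case throw keeps the behavior conservative: a VM start fails loudly instead of proceeding with a possibly un-fenced stale client.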
Regenerated by ./runMavenProfile sdk from updated
@APIParam.validValues on APIAddLabelToAlarmMsg and
APIUpdateAlarmLabelMsg in premium repo (same Jira). Lets SDK
clients pass RegexAgainst and NotEqual as the operator field
for alarm labels, matching the server-side API interceptor.

APIImpact: AddLabelToAlarm/UpdateAlarmLabel SDK Action now accepts
RegexAgainst and NotEqual operator values.
Resolves: ZSTAC-84454

Change-Id: I71636c616567697566786f667974766a6c757073
<fix>[xinfini]: fail blacklist to prevent split-brain on VM start

See merge request zstackio/zstack!9797
<feature>[sdk]: regenerate SDK for RegexAgainst/NotEqual operators

See merge request zstackio/zstack!9798
Existing host reconnect can fail before kvmagent echo.

Ansible masks libvirt sockets with systemctl during deploy.

If host systemd D-Bus is stuck, that optional step times out.

Continue only for this known mask timeout on reconnect.

New host deploy and other ansible failures still fail.

Resolves: ZSTAC-77120

Change-Id: I0ef4a535065ff797c9e4cfae5b39c2daa321a4cc
fix(ZSTAC-77120): continue reconnect on systemd timeout

See merge request zstackio/zstack!9810
Remove an unused MariaDB socket before zstack-server start.
Restart MariaDB so MN can recover after power loss.

Resolves: ZSTAC-83507

Change-Id: Ifb8255f7c9021ef12353f058a0230fe66989b590
…to avoid stale config cache

Resolves: ZSTAC-79075

Change-Id: I6868776c0ea5ef4905ec409181fbf2a431f3034a
<feature>[utils]: Network group high availability strategy

See merge request zstackio/zstack!9779
<feature>[conf]: recover stale mariadb socket

See merge request zstackio/zstack!9809
<fix>[virtualRouter]: skip grayscale upgrade check on auto reconnect to avoid stale config cache

See merge request zstackio/zstack!9814
Store VmModelMountVO lastAttachedEpoch in database so restore cleanup can distinguish successful asynchronous attach from stale failure callbacks.

Tested: docker verify-case runMavenProfile premium

Tested: docker verify-case VmModelMountCase

Resolves: ZSTAC-84246

Change-Id: Ib426797059be9401c3e2556ecc31c1879dd04049
Resolves: ZSTAC-84919

Change-Id: Iff44de2bbb1dad50e75678180cb89d3430c562b1
<fix>[ai]: persist mount restore epoch schema

See merge request zstackio/zstack!9833
<fix>[compute]: ZSTAC-84919 avoid stale iso detach NPE

See merge request zstackio/zstack!9834
@coderabbitai

coderabbitai Bot commented May 15, 2026

Walkthrough

This change adds plugin and external-service framework interfaces; extends core error codes, localization, CloudBus/REST/Telemetry, and database query capabilities; refactors compute, VM allocation, and network logic; and updates database migrations, install/deploy scripts, documentation, and version information accordingly.

Changes

Platform, compute, and release updates

Layer / File(s) Summary
Plugin and external-service framework
abstraction/src/main/java/org/zstack/abstraction/..., core/src/main/java/org/zstack/core/plugin/..., core/src/main/java/org/zstack/core/externalservice/...
Add plugin driver, validator, and OAuth2/encryption client contracts, plus plugin management, deletion validation, and external-service configuration push logic.
Core runtime and infrastructure
core/src/main/java/org/zstack/core/..., core/src/main/java/org/zstack/core/rest/..., core/src/main/java/org/zstack/core/cloudbus/..., core/src/main/java/org/zstack/core/db/...
Extend global error codes, localization, CloudBus send return values, REST/Telemetry tracing, database query building, lifecycle callbacks, configuration, and task handling.
Compute allocation, cluster, and host paths
compute/src/main/java/org/zstack/compute/allocator/..., compute/src/main/java/org/zstack/compute/cluster/..., compute/src/main/java/org/zstack/compute/host/...
Update host-allocation failure codes, resource-binding strategy, capacity calculation, host network tag management, host tracking and IPMI handling, plus cluster validation and error wrapping.
VM, NIC, and static-address flows
compute/src/main/java/org/zstack/compute/vm/..., compute/src/main/java/org/zstack/compute/VmNicUtils.java
Refactor VM static IP/DNS, MAC generation and conflict detection, console password update, device address sync, quota validation, and several VM/NIC allocation flows.
Database migration and schema updates
conf/db/..., conf/db/upgrade/...
Add and adjust table structures, views, foreign keys, stored procedures, and data backfill scripts for plugins, SSO, model services, network, NFV, and observability.
Release scripts and documentation resources
.gitconfig/hooks/*, .gitignore, VERSION, README.zh-CN.md, CONTRIBUTORS, build/*, conf/install/*, conf/deploy*.sh, conf/tools/install.sh, conf/i18n/*
Update commit-hook interpreters, ignore rules, version number, Chinese README, and contributor list, plus database deployment, installation, certificate generation, and prompt text resources.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~150 minutes

Poem

A little rabbit hops with patches in tow,
the plugin cloud is packed and ready to go.
New sprouts in the database quietly grow,
hosts and VMs line up in a row.
The version takes a light little leap,
and code hums bright in the moonlight's keep.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (8)
conf/db/upgrade/V5.1.8.1__schema.sql (1)

1-233: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Multiple zero-date defaults in this file need a consistent fix

This script contains many occurrences of DEFAULT '0000-00-00 00:00:00' (e.g. Line 15, Line 33, Line 58, Line 94, Line 197, Line 218, Line 230). This is risky during upgrades; replace them uniformly with the repo-compatible non-zero sentinel timestamp in one whole-file pass, so later branches do not hit the same pitfall.

Example fix (apply the same pattern across the file)
-    `createDate` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
+    `createDate` timestamp NOT NULL DEFAULT '2000-01-01 00:00:00',

As per coding guidelines "Do not use DEFAULT 0000-00-00 00:00:00, use DEFAULT CURRENT_TIMESTAMP instead".
Based on learnings "in this repo's MySQL 5.7 setup, when another TIMESTAMP already has ON UPDATE CURRENT_TIMESTAMP, use a non-CURRENT_TIMESTAMP sentinel such as '2000-01-01 00:00:00'."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@conf/db/upgrade/V5.1.8.1__schema.sql` around lines 1 - 233, This migration
uses many DEFAULT '0000-00-00 00:00:00' values (e.g., ModelCenterVO.createDate,
ModelVO.createDate, ModelServiceVO.createDate,
ModelServiceInstanceVO.createDate, SSOServerTokenVO.createDate/lastOpDate,
ContainerManagementVmVO.createDate/lastOpDate, DatasetVO.createDate/lastOpDate,
UserProxyConfigVO.createDate/lastOpDate,
UserProxyConfigResourceRefVO.createDate/lastOpDate,
ModelEvaluationTaskVO.createDate/lastOpDate,
TrainedModelRecordVO.createDate/lastOpDate) which must be replaced: for each
TIMESTAMP column that already uses ON UPDATE CURRENT_TIMESTAMP keep the ON
UPDATE but set a non-zero sentinel like '2000-01-01 00:00:00' as the default;
for TIMESTAMP columns that do not rely on another ON UPDATE CURRENT_TIMESTAMP
use DEFAULT CURRENT_TIMESTAMP (or DEFAULT '2000-01-01 00:00:00' if
CURRENT_TIMESTAMP would conflict with repo MySQL compatibility rules); apply
this change consistently across all occurrences in the file, ensuring
constraints and ON UPDATE clauses (e.g., columns with "ON UPDATE
CURRENT_TIMESTAMP") are preserved.
conf/deployuidb.sh (1)

1-21: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Shell compatibility issue: the shebang does not match the syntax used

The script declares #!/bin/sh (POSIX sh) but uses bash-specific syntax:

  • [[ ]] double-bracket tests (lines 12, 18)
  • &> redirection (lines 16, 28, 40, 78)

On systems where /bin/sh is not bash (e.g. Debian/Ubuntu use dash), the script will fail.

🔧 Suggested fix
-#!/bin/sh
+#!/bin/bash
 set -e

Or keep it POSIX-compliant:

-if [[ `id -u` -ne 0 ]] && [[ x"$user" = x"root" ]]; then
+if [ "$(id -u)" -ne 0 ] && [ "$user" = "root" ]; then

-if command -v greatdb &> /dev/null; then
+if command -v greatdb > /dev/null 2>&1; then
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@conf/deployuidb.sh` around lines 1 - 21, The script uses bash-only constructs
while claiming a POSIX shebang; either update the shebang to #!/bin/bash or make
the script POSIX-compliant: replace all [[ ... ]] tests (e.g., the checks around
`id -u` and `x"$user" = x"root"`) with POSIX [ ... ] or test, and replace all
`&>` redirections with POSIX equivalents (e.g., redirect stdout and stderr using
>file 2>&1 or >/dev/null 2>&1); update usages around the `MYSQL` variable
assignment, the `command -v greatdb` block and any `&>` occurrences to follow
the chosen approach so the script runs on shells like dash.
core/src/main/java/org/zstack/core/job/JobQueueFacadeImpl2.java (1)

286-299: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle the null-job branch after a deserialization failure to avoid a second NPE.

After the re-query on Line 298, jobe may be null; the next loop iteration would then throw again at jobe.getContext() or jobe.getId(), breaking this queue's processing chain.

💡 Suggested fix
                     while (true) {
                         try {
                             JobContextObject ctx = SerializableHelper.readObject(jobe.getContext());
                             Job theJob = ctx.load();
                             jobe.setState(JobState.Processing);
                             jobe = dbf.updateAndRefresh(jobe);
                             return Bucket.newBucket(jobe, theJob);
                         } catch (Exception e1) {
                             ErrorCode ierr = inerr(ORG_ZSTACK_CORE_JOB_10001, "[Job de-serialize failed, the job will be marked as Error] queue name: %s, job id: %s, %s", qvo.getName(),
                                     jobe.getId(), e1.getMessage());
                             jobFail(jobe, ierr);
                             logger.warn(ierr.getDetails(), e1);
                             jobe = findJob(qvo);
+                            if (jobe == null) {
+                                dbf.remove(qvo);
+                                logger.debug(String.format("[JobQueue released, no pending task, delete the queue] last owner: %s, queue name: %s, queue id: %s",
+                                        qvo.getOwner(), qvo.getName(), qvo.getId()));
+                                return null;
+                            }
                         }
                     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/src/main/java/org/zstack/core/job/JobQueueFacadeImpl2.java` around lines
286 - 299, In JobQueueFacadeImpl2 inside the loop that deserializes job context,
handle the case where findJob(qvo) returns null after a deserialization failure:
after the catch where you call jobFail(jobe, ierr) and set jobe = findJob(qvo),
check if jobe is null and if so skip processing that iteration (e.g., continue
the while loop or return an empty/none Bucket as appropriate) to avoid a second
NPE on jobe.getContext() or jobe.getId(); ensure you reference the jobe local,
SerializableHelper.readObject/JobContextObject load, jobFail(...) and
Bucket.newBucket(...) paths when applying the fix.
console/src/main/java/org/zstack/console/ManagementServerConsoleProxyBackend.java (1)

565-572: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

A consoleProxyPort configuration failure is now treated as success.

setConsoleProxyPort(newProxyPort) is executed first, but the branch only checks rst == 0, so the flow still proceeds whenever the IP configuration succeeds and the port configuration fails. The failure message also contains two %d placeholders with no arguments passed, so the error path itself loses diagnostic information.

Suggested fix for both issues
-                        if (rst == 0) {
+                        if (rst == 0 && portRst == 0) {
                             trigger.next();
                         } else {
-                            trigger.fail(operr(ORG_ZSTACK_CONSOLE_10003, "failed to configure consoleProxyOverriddenIp[code:%d] or consoleProxyPort[code:%d]"));
+                            trigger.fail(operr(
+                                    ORG_ZSTACK_CONSOLE_10003,
+                                    "failed to configure consoleProxyOverriddenIp[code:%d] or consoleProxyPort[code:%d]",
+                                    rst, portRst));
                         }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@console/src/main/java/org/zstack/console/ManagementServerConsoleProxyBackend.java`
around lines 565 - 572, in run(FlowTrigger trigger, Map data) only rst (from
setConsoleProxyOverridenIp) is currently checked, so a
setConsoleProxyPort(newProxyPort) failure is also treated as success; change
the logic to check both rst and portRst and call trigger.next() only when both
are 0, otherwise call trigger.fail(...) and include both return codes and the
underlying error in the operr format arguments (using the return codes and/or
exception info from setConsoleProxyOverridenIp and setConsoleProxyPort), so
the %d placeholders have matching arguments and the error carries diagnosable
context.
conf/tools/install.sh (1)

107-121: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

The wrapper script must forward arguments verbatim with "$@".

The current $@ re-splits words, so inventory paths containing spaces, -e argument values, and other complex arguments get broken apart, and the actual behavior diverges from calling ansible directly.

🛠️ Suggested fix
-\${VIRTUAL_ENV}/bin/ansible \$@
+\${VIRTUAL_ENV}/bin/ansible "$@"
-\${VIRTUAL_ENV}/bin/ansible-playbook \$@
+\${VIRTUAL_ENV}/bin/ansible-playbook "$@"

Also applies to: 124-138

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@conf/tools/install.sh` around lines 107 - 121, The wrapper script that writes
/usr/bin/ansible uses unquoted $@ which re-splits arguments and breaks
spaces/complex args; update the generated script to forward parameters exactly
using "$@" (and prefer exec "${VIRTUAL_ENV}/bin/ansible" "$@" so the wrapper
doesn't add an extra process). Apply the same fix to the second wrapper block
referenced (lines 124-138) so both instances use "$@" and exec with
${VIRTUAL_ENV}/bin/ansible to preserve argument quoting and behavior.
core/src/main/java/org/zstack/core/cloudbus/ResourceDestinationMakerImpl.java (1)

80-92: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

A cache miss here returns null

Map.put() returns the previous value, not the newly inserted object. So even after the database lookup finds vo, the line info = nodes.put(...) still sets info to null, and the caller receives null.

🛠️ Suggested fix
         if (info == null) {
             ManagementNodeVO vo = dbf.findByUuid(nodeUuid, ManagementNodeVO.class);
             if (vo == null) {
                 throw new ManagementNodeNotFoundException(nodeUuid);
             }

             nodeHash.add(nodeUuid);
-            info = nodes.put(nodeUuid, new NodeInfo(vo));
+            info = new NodeInfo(vo);
+            nodes.put(nodeUuid, info);
         }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@core/src/main/java/org/zstack/core/cloudbus/ResourceDestinationMakerImpl.java`
around lines 80 - 92, In getNodeInfo the code sets info = nodes.put(nodeUuid,
new NodeInfo(vo)), but Map.put returns the previous value so info becomes null;
instead construct the NodeInfo (new NodeInfo(vo)), assign it to a local
variable, put it into nodes (nodes.put(nodeUuid, info) or nodes.putIfAbsent and
then retrieve), then add nodeUuid to nodeHash and return that local NodeInfo;
refer to getNodeInfo, NodeInfo, nodes.put/nodes.putIfAbsent, nodeHash.add,
dbf.findByUuid, and ManagementNodeNotFoundException when making the change.
compute/src/main/java/org/zstack/compute/vm/VmQuotaOperator.java (1)

44-73: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

The admin path still bypasses the newly added runtime quota checks.

The two new branches for APIChangeInstanceOfferingMsg and APIUpdateVmInstanceMsg are only reachable under type != AccountType.SystemAdmin. When an admin performs these operations on behalf of a tenant, the new check(...) calls below are never invoked, so CPU/memory scale-up of a running VM is still not limited by the target tenant's quota.

Since currentAccountUuid and resourceTargetOwnerAccountUuid are already separated here, these two message types should be wired into the admin branch just like APIChangeResourceOwnerMsg.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@compute/src/main/java/org/zstack/compute/vm/VmQuotaOperator.java` around
lines 44 - 73, In VmQuotaOperator.checkQuota, admin users currently bypass the
new runtime quota checks for APIChangeInstanceOfferingMsg and
APIUpdateVmInstanceMsg because those branches are only under the non-admin path;
update the admin branch to also invoke check(...) for
APIChangeInstanceOfferingMsg and APIUpdateVmInstanceMsg (just like
APIChangeResourceOwnerMsg) so admin-acting-on-behalf operations validate quotas
against resourceTargetOwnerAccountUuid/currentAccountUuid separation and enforce
target tenant runtime CPU/memory limits. Ensure you call the same
check((APIChangeInstanceOfferingMsg) msg, pairs) and
check((APIUpdateVmInstanceMsg) msg, pairs) in the else (AccountType.SystemAdmin)
branch.
compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java (1)

1112-1114: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not short-circuit when only the IPs match.

Only the literal IP values are compared here. This patch adds gateway/netmask/IPv6-prefix overrides via overrideInfo; if a request keeps the IP unchanged and only updates those fields, this branch calls success() immediately, the reallocation below never runs, and UsedIpVO keeps the stale network parameters.

🛠️ One workable fix
-        if (targetNic.getUsedIps().stream().map(UsedIpVO::getIp).collect(Collectors.toList()).containsAll(staticIpMap.values())) {
+        boolean sameIp = targetNic.getUsedIps().stream()
+                .map(UsedIpVO::getIp)
+                .collect(Collectors.toList())
+                .containsAll(staticIpMap.values());
+        boolean hasOverride = overrideInfo != null && (
+                StringUtils.isNotBlank(overrideInfo.getNetmask()) ||
+                StringUtils.isNotBlank(overrideInfo.getGateway()) ||
+                StringUtils.isNotBlank(overrideInfo.getIpv6Gateway()) ||
+                StringUtils.isNotBlank(overrideInfo.getIpv6Prefix())
+        );
+        if (sameIp && !hasOverride) {
             completion.success();
             return;
         }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java` around lines
1112 - 1114, The current early-return in VmInstanceBase (the if that checks
targetNic.getUsedIps().stream().map(UsedIpVO::getIp).collect(Collectors.toList()).containsAll(staticIpMap.values()))
only compares IP string values and therefore incorrectly short-circuits when
overrideInfo (gateway/netmask/ipv6Prefix) needs to be applied; change the check
to verify for each desired IP in staticIpMap there exists a matching UsedIpVO
whose ip AND network parameters (gateway, netmask, and IPv6 prefix where
applicable) equal the requested values from overrideInfo (or null/empty
equivalently), and only call completion.success() if all fields match; otherwise
let the existing reallocation/update logic run so UsedIpVO entries are updated.
Ensure you reference targetNic, getUsedIps, UsedIpVO and staticIpMap when
locating and updating the logic.
🟡 Minor comments (27)
conf/db/upgrade/V5.1.19__schema.sql-7-8 (1)

7-8: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

The timestamp column defaults violate the coding guidelines.

Per the coding guidelines:

  1. DEFAULT '0000-00-00 00:00:00' must not be used
  2. lastOpDate should use DEFAULT CURRENT_TIMESTAMP
  3. Because another TIMESTAMP column already uses ON UPDATE CURRENT_TIMESTAMP, createDate should use the sentinel '2000-01-01 00:00:00' (rather than '0000-00-00 00:00:00') for MySQL 5.7 compatibility

As per coding guidelines, SQL scripts should not use DEFAULT '0000-00-00 00:00:00'. Based on learnings, use '2000-01-01 00:00:00' as sentinel default for the second TIMESTAMP column to avoid MySQL 5.7 Error 1293.

🔧 Suggested fix
-    `lastOpDate` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00' ON UPDATE CURRENT_TIMESTAMP,
-    `createDate` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
+    `lastOpDate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
+    `createDate` timestamp NOT NULL DEFAULT '2000-01-01 00:00:00',
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@conf/db/upgrade/V5.1.19__schema.sql` around lines 7 - 8, Replace the illegal
'0000-00-00 00:00:00' defaults: set the lastOpDate TIMESTAMP column to use
DEFAULT CURRENT_TIMESTAMP (and keep ON UPDATE CURRENT_TIMESTAMP) and set the
createDate TIMESTAMP column to use the sentinel DEFAULT '2000-01-01 00:00:00' to
comply with MySQL 5.7 compatibility and the coding guidelines.
abstraction/src/main/java/org/zstack/abstraction/PluginEndpointValidator.java-22-35 (1)

22-35: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

The exception lacks an error message and the logic can be simplified.

  1. The exception is thrown without a message, so there is no way to tell which plugin types are duplicated.
  2. Throwing only when the plugin count differs from the deduplicated count is indirect; state "duplicate plugin types exist" directly.
🔧 Suggested fix
     public void validateAllPlugins(List<PluginDriver> pluginDriverList) {
         int pluginNumber = pluginDriverList.size();
 
         int distinctPluginByEndpointTypeNumber = pluginDriverList
                 .stream()
                 .map(PluginDriver::type)
                 .collect(Collectors.toSet()).size();
 
         if (pluginNumber == distinctPluginByEndpointTypeNumber) {
             return;
         }
 
-        throw new InvalidPluginDefinitionException();
+        throw new InvalidPluginDefinitionException(
+            String.format("Duplicate plugin types detected. Total plugins: %d, Unique types: %d", 
+                pluginNumber, distinctPluginByEndpointTypeNumber));
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@abstraction/src/main/java/org/zstack/abstraction/PluginEndpointValidator.java`
around lines 22 - 35, validateAllPlugins currently throws
InvalidPluginDefinitionException without details and uses an indirect size
comparison; update PluginEndpointValidator.validateAllPlugins to detect
duplicate plugin types by comparing pluginDriverList.size() with the distinct
set size and, when they differ, compute the duplicate types via
pluginDriverList.stream().map(PluginDriver::type) and grouping/counting to
produce a list or set of duplicated type names, then throw new
InvalidPluginDefinitionException with a clear message like "duplicate plugin
types found: [...]" that includes the duplicate type names so callers can see
which PluginDriver.type values conflict.
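A minimal, self-contained sketch of the grouping/counting approach the prompt describes. The class and method names here are hypothetical, and plain `String` stands in for `PluginDriver.type()`:

```java
import java.util.*;
import java.util.stream.*;

public class DuplicateTypeCheck {
    // Returns the set of type names that appear more than once.
    static Set<String> findDuplicates(List<String> types) {
        return types.stream()
                .collect(Collectors.groupingBy(t -> t, Collectors.counting()))
                .entrySet().stream()
                .filter(e -> e.getValue() > 1)
                .map(Map.Entry::getKey)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Set<String> dups = findDuplicates(Arrays.asList("sns", "sso", "sns"));
        if (!dups.isEmpty()) {
            // A real validator would throw here, e.g.:
            // throw new InvalidPluginDefinitionException("duplicate plugin types found: " + dups);
            System.out.println("duplicate plugin types found: " + dups);
        }
    }
}
```

This keeps the existing size comparison semantics but makes the duplicates themselves available for the exception message.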
abstraction/src/main/java/org/zstack/abstraction/PluginEndpointValidator.java-16-18 (1)

16-18: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

The exception lacks an error message.

InvalidPluginDefinitionException is thrown without any error message, which makes debugging difficult. Add a descriptive message stating which plugin has a null type.

🔧 Suggested fix
     public void validate(PluginDriver register) {
         if (register.type() == null) {
-            throw new InvalidPluginDefinitionException();
+            throw new InvalidPluginDefinitionException(
+                String.format("Plugin type cannot be null for plugin class: %s", 
+                    register.getClass().getName()));
         }
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@abstraction/src/main/java/org/zstack/abstraction/PluginEndpointValidator.java`
around lines 16 - 18, when register.type() is null in PluginEndpointValidator, the
InvalidPluginDefinitionException thrown lacks descriptive information; change that
branch to include context when throwing (e.g., the register's identifier or its
toString() output) so the exception states which plugin definition has a null type
(reference symbols: PluginEndpointValidator, register.type(),
InvalidPluginDefinitionException, register). Ensure the exception constructor
receives a string containing the plugin identifier, e.g. "Plugin registration has
null type: " + register, so the problem can be located later.
README.zh-CN.md-98-119 (1)

98-119: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Heading levels skip a level; adjust them to be consecutive.

Fourth-level headings (####) are used directly under ## (e.g., Line 100, Line 125), which triggers MD001 and affects document structure parsing. These subsections under second-level headings should uniformly become ###.

📝 Suggested change
-#### 伸缩性
+### 伸缩性

-#### 插件架构
+### 插件架构

-#### 资源操作框架
+### 资源操作框架

-#### 整体技术架构概述
+### 整体技术架构概述

-#### 问题反馈
+### 问题反馈

-#### 功能需求
+### 功能需求

-#### 代码贡献
+### 代码贡献

Also applies to: 124-137

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@README.zh-CN.md` around lines 98 - 119, The section under the "## ZStack架构设计"
heading uses fourth-level headings (e.g., "#### 伸缩性", "#### 插件架构", "####
资源操作框架", "#### 整体技术架构概述") which causes a jump from H2 to H4 and triggers MD001;
change those "####" headings to third-level "###" so the hierarchy is continuous
(e.g., update "#### 伸缩性" → "### 伸缩性", "#### 插件架构" → "### 插件架构", etc.), and
ensure any nested subpoints below those remain at lower levels if intended.
conf/db/upgrade/V1.3__schema.sql-73-73 (1)

73-73: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

This view statement still contains unquoted identifiers; quote them all at once.

Only `system` was fixed this time; the other column names on the same line and the columns in the WHERE clause are still unquoted, which does not comply with the current SQL conventions.

🧩 Suggested change
-CREATE VIEW `zstack`.`ImageVO` AS SELECT uuid, name, description, status, state, size, actualSize, md5Sum, platform, type, format, url, `system`, mediaType, createDate, lastOpDate, guestOsType FROM `zstack`.`ImageEO` WHERE deleted IS NULL;
+CREATE VIEW `zstack`.`ImageVO` AS SELECT `uuid`, `name`, `description`, `status`, `state`, `size`, `actualSize`, `md5Sum`, `platform`, `type`, `format`, `url`, `system`, `mediaType`, `createDate`, `lastOpDate`, `guestOsType` FROM `zstack`.`ImageEO` WHERE `deleted` IS NULL;

As per coding guidelines: "All table and column names must be wrapped in backticks (e.g., WHERE `system` = 1) to avoid syntax errors caused by MySQL 8.0 / GreatSQL reserved-keyword conflicts."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@conf/db/upgrade/V1.3__schema.sql` at line 73, The CREATE VIEW definition for
ImageVO still has unquoted identifiers; update the CREATE VIEW
`zstack`.`ImageVO` ... SELECT ... FROM `zstack`.`ImageEO` statement so that
every table and column identifier is wrapped in backticks (e.g., `uuid`, `name`,
`description`, `status`, `state`, `size`, `actualSize`, `md5Sum`, `platform`,
`type`, `format`, `url`, `system`, `mediaType`, `createDate`, `lastOpDate`,
`guestOsType`) and also quote the WHERE column (`deleted`) to comply with the
guideline that all table/column names must use backticks.
abstraction/src/main/java/org/zstack/abstraction/sns/EndpointDriver.java-5-8 (1)

5-8: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

The Javadoc does not match the actual interface; fix it right away.

The comment on lines 6-7 mentions PluginEndpointSender / PluginRegister, which is inconsistent with the current EndpointDriver extends PluginDriver and will easily mislead future implementers.

Suggested fix
 /**
- * PluginEndpointSender extends PluginRegister and contains sender
- * of sns endpoint.
+ * EndpointDriver extends PluginDriver and defines the sender
+ * behavior for SNS endpoints.
  */

As per coding guidelines: longer comments must be carefully proofread and kept up to date with the code to ensure they are correct.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@abstraction/src/main/java/org/zstack/abstraction/sns/EndpointDriver.java`
around lines 5 - 8, The class Javadoc is out of sync: it refers to
PluginEndpointSender and PluginRegister but the class is declared as
EndpointDriver extends PluginDriver; update the class-level comment to correctly
describe EndpointDriver and its relationship to PluginDriver and its
responsibility (sending/handling SNS endpoints) so readers aren't misled by
stale names like PluginEndpointSender/PluginRegister—locate the Javadoc above
the EndpointDriver class and replace the incorrect references with accurate
wording mentioning EndpointDriver and PluginDriver and a brief description of
SNS endpoint behavior.
abstraction/src/main/java/org/zstack/abstraction/sso/OAuth2PluginConstants.java-20-20 (1)

20-20: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the typo.

GRANT_TYPE_OCDE_VALUE is misspelled; it should be GRANT_TYPE_CODE_VALUE.

-public static final String GRANT_TYPE_OCDE_VALUE = "authorization_code";
+public static final String GRANT_TYPE_CODE_VALUE = "authorization_code";
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@abstraction/src/main/java/org/zstack/abstraction/sso/OAuth2PluginConstants.java`
at line 20, Fix the typo in OAuth2PluginConstants by renaming the constant
GRANT_TYPE_OCDE_VALUE to GRANT_TYPE_CODE_VALUE and updating every
usage/reference to the new name (e.g., imports/usages across the codebase) to
avoid compilation errors; modify the declaration in class OAuth2PluginConstants
and run a project-wide search/replace to update references (ensure no duplicate
constants remain and adjust any tests or docs that referenced the old symbol).
core/src/main/java/org/zstack/core/CoreManagerImpl.java-141-143 (1)

141-143: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Improve the efficiency and design clarity of the task-filtering logic.

When isOnlyRunningTask() is true, the code builds a full ChainInfo object (containing both running and pending tasks) and then clears the pendingTask list. This approach has two problems:

  1. Efficiency: tasks in pendingQueue are iterated and added to ChainInfo, then immediately cleared, causing unnecessary computation and memory churn.

  2. Design: performing a side effect (clearing a list) on the returned object violates functional design principles and obscures the code's intent.

Suggested refactor: add a parameter to ChainTaskQueueWrapper.getChainInfo() controlling whether pending tasks are included, or add a dedicated method in getChainTaskInfo() (e.g., getRunningTasksOnly()) to avoid the wasted build-and-clear.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/src/main/java/org/zstack/core/CoreManagerImpl.java` around lines 141 -
143, When msg.isOnlyRunningTask() is true, avoid building full ChainInfo and
then mutating its pendingTask lists; instead update CoreManagerImpl to call a
new or overloaded method that returns only running tasks (e.g., add
getRunningTasksOnly() or add a boolean includePending parameter to
ChainTaskQueueWrapper.getChainInfo) and use that from getChainTaskInfo() so
pendingQueue entries are not added then cleared and no side-effect (clear()) is
performed on the returned ChainInfo objects.
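A hypothetical sketch of the suggested `includePending` overload; the real ChainTaskQueueWrapper and ChainInfo in ZStack have different shapes, and plain strings stand in for tasks here:

```java
import java.util.*;

public class ChainInfoSketch {
    static class ChainInfo {
        final List<String> runningTask = new ArrayList<>();
        final List<String> pendingTask = new ArrayList<>();
    }

    final Deque<String> runningQueue = new ArrayDeque<>();
    final Deque<String> pendingQueue = new ArrayDeque<>();

    // includePending == false skips the pending queue entirely instead of
    // building the full pending list and clearing it afterwards.
    ChainInfo getChainInfo(boolean includePending) {
        ChainInfo info = new ChainInfo();
        info.runningTask.addAll(runningQueue);
        if (includePending) {
            info.pendingTask.addAll(pendingQueue);
        }
        return info;
    }

    public static void main(String[] args) {
        ChainInfoSketch w = new ChainInfoSketch();
        w.runningQueue.add("task-1");
        w.pendingQueue.add("task-2");
        System.out.println(w.getChainInfo(false).pendingTask.isEmpty()); // true
    }
}
```

The caller decides up front whether pending tasks are wanted, so no side effect on the returned object is needed.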
core/src/main/java/org/zstack/core/With.java-21-23 (1)

21-23: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the malformed comment.

The /** comment marker at the end of line 21 is misplaced; move it to its own line or delete it. The current format may cause documentation tools to fail parsing.

🔧 Suggested fix
 import static org.zstack.core.Platform.inerr;
-import static org.zstack.utils.clouderrorcode.CloudOperationsErrorCode.*;/**
- */
-
+import static org.zstack.utils.clouderrorcode.CloudOperationsErrorCode.*;
+
+/**
+ */
 `@Configurable`(preConstruction = true, autowire = Autowire.BY_TYPE)
 public class With {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/src/main/java/org/zstack/core/With.java` around lines 21 - 23, The stray
Javadoc start token `/**` appended to the import statement should be removed or
moved to its own proper comment line to avoid parsing errors; edit the import
line that currently reads `import static
org.zstack.utils.clouderrorcode.CloudOperationsErrorCode.*;/**` in the With
class and either delete the trailing `/**` or place it on a separate line as a
valid comment block opener so the import statement is clean.
conf/db/upgrade/V5.1.20__schema.sql-10-10 (1)

10-10: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove Chinese from the SQL comments

The SQL comments on lines 10 and 17 contain Chinese text, violating the coding guidelines. Change them to English comments:

  • Line 10: -- 开始事务 → -- Start transaction
  • Line 17: -- 提交事务 → -- Commit transaction

Per the coding guidelines, code must not contain Chinese, including comments and error messages; use correct, typo-free English throughout.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@conf/db/upgrade/V5.1.20__schema.sql` at line 10, Replace the Chinese SQL
comments in V5.1.20__schema.sql with English equivalents: change the comment "--
开始事务" to "-- Start transaction" and change "-- 提交事务" to "-- Commit transaction";
ensure no other Chinese text remains in comments or messages in this file and
keep spelling correct.
core/src/main/java/org/zstack/core/plugin/PluginApiInterceptor.java-25-27 (1)

25-27: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Trim the UUID before handing it to extension-point validation

msg.getUuid() is used directly here; if the request parameter carries leading/trailing whitespace, the extension point may receive a wrong value, causing false results or lookup failures.

Suggested change
 private void validate(APIDeletePluginDriversMsg msg) {
+    String pluginUuid = msg.getUuid() == null ? null : msg.getUuid().trim();
     for (PluginDriversExtensionPoint ext : pluginRgty.getExtensionList(PluginDriversExtensionPoint.class)) {
-        ErrorCode result = ext.validateDeletePluginDrivers(msg.getUuid());
+        ErrorCode result = ext.validateDeletePluginDrivers(pluginUuid);
         if (result != null) {
             throw new ApiMessageInterceptionException(result);
         }
     }
 }

As per coding guidelines: **/*Interceptor.java: "Check whether parameters from Messages have been trimmed; data users copy-paste from a browser may carry spaces, newlines, etc."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/src/main/java/org/zstack/core/plugin/PluginApiInterceptor.java` around
lines 25 - 27, In PluginApiInterceptor's validate(APIDeletePluginDriversMsg msg)
ensure the UUID is trimmed before passing to extension points: retrieve
msg.getUuid(), call trim() (and handle null safely), then pass the trimmed value
to ext.validateDeletePluginDrivers(...) (or set the trimmed value back on msg if
other code expects msg.getUuid()); update references in the validate method so
pluginRgty.getExtensionList(...)/ext.validateDeletePluginDrivers use the trimmed
UUID to avoid failures when the incoming message contains leading/trailing
whitespace.
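A small, hedged sketch of the null-safe trim step the prompt describes; the helper name is hypothetical, not an existing ZStack utility:

```java
public class TrimParam {
    // Null-safe trim for parameters copied from API messages; browsers often
    // append spaces or newlines when users paste values.
    static String trimOrNull(String s) {
        return s == null ? null : s.trim();
    }

    public static void main(String[] args) {
        System.out.println("[" + trimOrNull(" 3b9a\n") + "]"); // prints [3b9a]
    }
}
```

The trimmed value can then be passed to `ext.validateDeletePluginDrivers(...)`, or set back on the message if later code reads `msg.getUuid()` again.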
build/bump_version.py-27-27 (1)

27-27: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Comments contain full-width punctuation.

Per the coding guidelines, code (including comments) must not contain Chinese or Chinese punctuation. The comments on lines 27, 49, 60, and 83 use full-width commas; replace them with half-width commas.

🔧 Suggested fix
-        # 使用 'rb' 模式读取,保留原始字节
+        # 使用 'rb' 模式读取, 保留原始字节

Apply the same change to the other three occurrences.

Also applies to: 49-49, 60-60, 83-83

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@build/bump_version.py` at line 27, Replace full-width Chinese punctuation in
comments by using ASCII punctuation: locate the comments that contain the text
"使用 'rb' 模式读取,保留原始字节" and the other comment lines noted (the occurrences that
use the full-width comma ',') and change ',' to a standard half-width comma ','
so all comments use ASCII punctuation; apply the same replacement to the other
two comment instances referenced (the ones at the other similar comment lines).
conf/install/zstack-server-105-120 (1)

105-120: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

The return-value handling of recover_mariadb_stale_socket needs to be more explicit.

In the current implementation, the function returns 0 (success) when ZSTACK_RECOVER_MARIADB_SOCKET is set to false or the MariaDB service does not exist. But at lines 112-113, when socket_is_used returns 2 (meaning the socket state cannot be probed), the function also returns 0. This can mask underlying environment problems (e.g., both lsof and fuser unavailable).

Suggestions:

  1. When socket_status is 2, log a warning instead of silently returning success
  2. Failures of start_mariadb_service should get more explicit error handling and logging
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@conf/install/zstack-server` around lines 105 - 120, In
recover_mariadb_stale_socket, don't silently treat socket_status==2 as success:
call socket_is_used and if socket_status equals 2, emit a warning via
log_zstack_server that socket detection tools are unavailable (include
$MARIADB_SOCKET and suggest lsof/fuser missing) and return a non‑zero status;
also after calling start_mariadb_service check its exit code and if it fails,
log an error with details via log_zstack_server and return a non‑zero status so
callers see the failure (keep existing early returns for
ZSTACK_RECOVER_MARIADB_SOCKET, mariadb_service_exists and mariadb_is_active).
conf/db/upgrade/V5.3.6__schema.sql-15-41 (1)

15-41: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

SQL comments are written in Chinese

Per the coding guidelines, code comments should be in English. The current SQL statements contain several Chinese comments (e.g., "生成不含连字符的uuid", "使用DatasetVO的uuid作为resourceUuid").

🔧 Suggested fix
 INSERT INTO SystemTagVO
 (`uuid`, `resourceUuid`, `resourceType`, `inherent`, `type`, `tag`, `createDate`, `lastOpDate`)
 SELECT
-    REPLACE(UUID(), '-', ''),                  -- 生成不含连字符的uuid
-    uuid,                                       -- 使用DatasetVO的uuid作为resourceUuid
+    REPLACE(UUID(), '-', ''),                  -- generate uuid without hyphens
+    uuid,                                       -- use DatasetVO uuid as resourceUuid
     'DatasetVO',                               -- resourceType
     1,                                         -- inherent
     'System',                                  -- type
     'dataset::usage::scenarios::ModelEval',    -- tag
-    CURRENT_TIMESTAMP(),                       -- createDate
-    CURRENT_TIMESTAMP()                        -- lastOpDate
+    CURRENT_TIMESTAMP(),
+    CURRENT_TIMESTAMP()
 FROM DatasetVO
 WHERE `system` = true;

As per coding guidelines: "Code must not contain Chinese; error messages, comments, etc. must be written in correct, typo-free English."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@conf/db/upgrade/V5.3.6__schema.sql` around lines 15 - 41, The SQL migration
contains Chinese comments in the INSERT...SELECT statements that populate
SystemTagVO from DatasetVO (comments like "生成不含连字符的uuid" and
"使用DatasetVO的uuid作为resourceUuid"); update all inline comments to English while
preserving meaning (e.g. explain REPLACE(UUID(), '-', '') generates a UUID
without hyphens and uuid is used as resourceUuid), ensure comments referencing
SystemTagVO, DatasetVO, tag values ('dataset::usage::scenarios::ModelEval',
'dataset::datatype::Text'), and column names (`uuid`, `resourceUuid`,
`resourceType`, `inherent`, `type`, `tag`, `createDate`, `lastOpDate`) remain
accurate and concise in English.
core/src/main/java/org/zstack/core/jsonlabel/JsonLabelInventoryDoc_zh_cn.groovy-7-7 (1)

7-7: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Doc strings contain Chinese, violating the repository text conventions

Lines [7], [11], [17], [23], [29], [35], and [41] use Chinese descriptions. Change them to English to avoid consistency and automated-check problems later.
As per coding guidelines: code must not contain Chinese; error messages, comments, etc. must be written in correct, typo-free English.

Also applies to: 11-11, 17-17, 23-23, 29-29, 35-35, 41-41

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@core/src/main/java/org/zstack/core/jsonlabel/JsonLabelInventoryDoc_zh_cn.groovy`
at line 7, The file JsonLabelInventoryDoc_zh_cn.groovy contains Chinese text in
its documentation strings (e.g., the title "JsonLabelInventory的数据结构" and the
other doc lines at the same file: lines referenced at 7, 11, 17, 23, 29, 35,
41); replace each Chinese string with clear, grammatically correct English
equivalents (for example change the title to "JsonLabelInventory data structure"
and translate the other Chinese descriptions to English), ensuring you only
modify the human-readable doc strings in JsonLabelInventoryDoc_zh_cn.groovy and
preserve identifiers and code structure.
conf/db/upgrade/beforeMigrate.sql-251-251 (1)

251-251: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

The RENAME_TABLE dynamic SQL needs the schema qualifier and backticks

At line [251], the target table name is not explicitly qualified with zstack, and the table names are not backquoted. Unify to the schema-qualified, backquoted form to avoid upgrade-script failures from keyword or environment differences.

Suggested fix
-        SET `@alter_sql` = CONCAT('RENAME TABLE zstack.', old_name, ' TO ', new_name);
+        SET `@alter_sql` = CONCAT('RENAME TABLE `zstack`.`', old_name, '` TO `zstack`.`', new_name, '`');

Based on learnings: In ZStack upgrade SQL, schema name is fixed as zstack and should be explicitly qualified.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@conf/db/upgrade/beforeMigrate.sql` at line 251, The dynamic RENAME TABLE SQL
built into `@alter_sql` uses unqualified and unquoted identifiers; update the
CONCAT in the RENAME TABLE construction so it explicitly qualifies the schema
and wraps identifiers in backticks (i.e. produce `RENAME TABLE
`zstack`.`old_name` TO `zstack`.`new_name``) by referencing the variables used
(old_name, new_name) and the `@alter_sql/CONCAT` expression that builds the
statement; ensure both source and target table names are schema-qualified with
backticks to avoid issues with keywords or environment differences.
core/src/main/java/org/zstack/core/errorcode/LocaleUtils.java-40-43 (1)

40-43: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a null guard for availableLocales to avoid an NPE.

Only acceptLanguage is currently validated, not availableLocales. When it is null, line 61 throws an NPE directly.

💡 Suggested fix
     public static String resolveLocale(String acceptLanguage, Set<String> availableLocales) {
+        if (availableLocales == null || availableLocales.isEmpty()) {
+            return DEFAULT_LOCALE;
+        }
+
         if (acceptLanguage == null || acceptLanguage.trim().isEmpty()) {
             return DEFAULT_LOCALE;
         }

Also applies to: 61-68

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/src/main/java/org/zstack/core/errorcode/LocaleUtils.java` around lines
40 - 43, The resolveLocale method fails to null-check the availableLocales
parameter causing a possible NPE; update resolveLocale to guard against
availableLocales being null or empty (e.g., treat null as Collections.emptySet()
or return DEFAULT_LOCALE immediately) and adjust the subsequent logic that
iterates or queries availableLocales (the code around the resolveLocale method
and the block referenced at lines 61-68) to safely handle an empty set when
matching acceptLanguage; ensure you reference the resolveLocale method and
DEFAULT_LOCALE constant while making the change.
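A hedged sketch of the guarded method; DEFAULT_LOCALE and the matching logic below are simplified stand-ins for the real LocaleUtils implementation:

```java
import java.util.*;

public class LocaleResolveSketch {
    static final String DEFAULT_LOCALE = "en_US";

    static String resolveLocale(String acceptLanguage, Set<String> availableLocales) {
        // Guard added per the review finding: treat null/empty as "no choices".
        if (availableLocales == null || availableLocales.isEmpty()) {
            return DEFAULT_LOCALE;
        }
        if (acceptLanguage == null || acceptLanguage.trim().isEmpty()) {
            return DEFAULT_LOCALE;
        }
        String wanted = acceptLanguage.trim();
        return availableLocales.contains(wanted) ? wanted : DEFAULT_LOCALE;
    }

    public static void main(String[] args) {
        System.out.println(resolveLocale("zh_CN", null)); // falls back, no NPE
    }
}
```

With the guard first, the later lookup against availableLocales can never dereference null.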
conf/db/upgrade/V4.5.1.2__schema.sql-37-37 (1)

37-37: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove the Chinese constant text from the SQL.

Line 37 writes a Chinese string literal, which is inconsistent with the repository conventions; change it to English text or a localizable key.

As per coding guidelines: code must not contain Chinese; error messages, comments, etc. must be written in correct, typo-free English.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@conf/db/upgrade/V4.5.1.2__schema.sql` at line 37, The SQL update uses a
Chinese literal in SystemTagVO.tag for
resourceUuid='eecc7b576c05391bb01fd956964d3ba4'; replace the Chinese constant
"name::cn::密码资源状态异常" with an English string or a localization key (e.g.
"name::en::password_resource_status_error" or
"name::key::PASSWORD_RESOURCE_STATUS_ERROR") so the update statement on
SystemTagVO sets a non-Chinese value; keep the same UPDATE statement but change
only the tag value.
compute/src/main/java/org/zstack/compute/host/HostApiInterceptor.java-84-87 (1)

84-87: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

trim() serviceType before creation

The uniqueness check uses msg.getServiceType() directly, so "foo" and "foo " are treated as two different values; the duplicate check can be bypassed, and a whitespace-padded value can easily end up persisted. As per coding guidelines: "Check whether parameters from Messages have been trimmed; data users copy-paste from a browser may carry spaces, newlines, etc."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@compute/src/main/java/org/zstack/compute/host/HostApiInterceptor.java` around
lines 84 - 87, The uniqueness check in
HostApiInterceptor.validate(APICreateHostNetworkServiceTypeMsg msg) uses
msg.getServiceType() raw, so values like "foo" and "foo " bypass validation;
trim the serviceType before using it for the DB existence check and normalize
the message payload (i.e., obtain String svc = msg.getServiceType(); if non-null
call svc = svc.trim(); use svc in the
Q.New(...).eq(HostNetworkLabelVO_.serviceType, svc).isExists() check and, if the
message has a setter, set the trimmed value back on msg so later persistence
also uses the trimmed value).
compute/src/main/java/org/zstack/compute/host/HostIpmiPowerExecutor.java-223-225 (1)

223-225: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the English phrasing in this error message.

"is not reachable.because" is missing a space, and the text surfaces directly in operator-facing error output. Changing it to "is not reachable: %s" or "is not reachable, because %s" reads cleaner.

As per coding guidelines, "code must not contain Chinese; error messages, comments, etc. must be written in correct, typo-free English".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@compute/src/main/java/org/zstack/compute/host/HostIpmiPowerExecutor.java`
around lines 223 - 225, The error message in HostIpmiPowerExecutor's operr call
uses a malformed sentence fragment ("is not reachable.because %s"); update the
operr invocation that formats ipmi.getIpmiAddress() and rst.getStderr() so the
message reads cleanly (e.g. "host ipmi[%s] is not reachable: %s" or "host
ipmi[%s] is not reachable, because %s") and ensure there is a space/colon before
the stderr substitution to avoid concatenated text.
conf/db/upgrade/V5.3.20__schema.sql-100-103 (1)

100-103: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Table and column names are not wrapped in backticks

Per the coding guidelines, all table and column names must be wrapped in backticks to avoid syntax errors caused by MySQL 8.0 / GreatSQL reserved-keyword conflicts.

🔧 Suggested fix
-Update ModelServiceVO set framework = 'Other' where framework not in
+UPDATE `zstack`.`ModelServiceVO` SET `framework` = 'Other' WHERE `framework` NOT IN
     ('vLLM', 'Diffusers', 'Transformers', 'sentence_transformers', 'llama.cpp', 'BentoML', 'Other', 'Ollama')
-    AND source != 'Bentoml';
-Update ModelServiceVO set framework = 'Other' where framework is NULL;
+    AND `source` != 'Bentoml';
+UPDATE `zstack`.`ModelServiceVO` SET `framework` = 'Other' WHERE `framework` IS NULL;

-UPDATE `zstack`.`L2NetworkVO` set vSwitchType='TfL2Network' where type='TfL2Network';
+UPDATE `zstack`.`L2NetworkVO` SET `vSwitchType`='TfL2Network' WHERE `type`='TfL2Network';

Also applies to: 186-186

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@conf/db/upgrade/V5.3.20__schema.sql` around lines 100 - 103, The SQL updates
referencing ModelServiceVO should wrap identifiers with backticks to avoid
reserved-word conflicts: update the two statements that set framework to 'Other'
to use `ModelServiceVO`, and enclose column names `framework` and `source` in
backticks (e.g., UPDATE `ModelServiceVO` SET `framework` = 'Other' WHERE
`framework` NOT IN (...) AND `source` != 'Bentoml'; and the NULL-check UPDATE
`ModelServiceVO` SET `framework` = 'Other' WHERE `framework` IS NULL;), leaving
string literals unchanged.
core/src/main/java/org/zstack/core/externalservice/ExternalServiceManagerImpl.java-263-263 (1)

263-263: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Inconsistent error codes

Some operr() calls use ORG_ZSTACK_CORE_EXTERNALSERVICE_* error codes (Lines 54, 129, 135, 141), but Lines 263, 429, and 434 do not use error codes. Keep this consistent.

🔧 Suggested fix
-            completion.fail(operr("unable to find external service configuration with uuid [%s]", msg.getUuid()));
+            completion.fail(operr(ORG_ZSTACK_CORE_EXTERNALSERVICE_10004, "unable to find external service configuration with uuid [%s]", msg.getUuid()));

-                    completion.fail(operr("failed to apply external service configuration for type [%s]: %s", serviceType, e.getMessage()));
+                    completion.fail(operr(ORG_ZSTACK_CORE_EXTERNALSERVICE_10005, "failed to apply external service configuration for type [%s]: %s", serviceType, e.getMessage()));

-        completion.fail(operr("unable to find external service type [%s]", serviceType));
+        completion.fail(operr(ORG_ZSTACK_CORE_EXTERNALSERVICE_10006, "unable to find external service type [%s]", serviceType));

Also applies to: 429-429, 434-434

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@core/src/main/java/org/zstack/core/externalservice/ExternalServiceManagerImpl.java`
at line 263, the completion.fail(operr("unable to find external service
configuration with uuid [%s]", msg.getUuid())) call in ExternalServiceManagerImpl
lacks the error code used by the other operr(...) calls in the class; change this
call (and the similar operr calls at lines 429 and 434 in the file) to use the
error-code constants already used elsewhere in the class (the
ORG_ZSTACK_CORE_EXTERNALSERVICE_* constants), i.e.
operr(ORG_ZSTACK_CORE_EXTERNALSERVICE_<SAME_CODE>, "unable to find external
service configuration with uuid [%s]", msg.getUuid()), and keep the same
error-code symbol style for unified log/error-code management.
core/src/main/java/org/zstack/core/cloudbus/CloudBusImpl3.java-143-144 (1)

143-144: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

A static field is mutated in an instance initializer block

SEND_CONFIRMED is a static final field, but SEND_CONFIRMED.success() is called in an instance initializer block. Even though FutureCompletion.success() may be idempotent, the call executes multiple times if multiple CloudBusImpl3 instances are created.

Move the success() call into a static initializer block:

🔧 Suggested fix
     public static final String HTTP_BASE_URL = "/cloudbus";
-    public static final FutureCompletion SEND_CONFIRMED = new FutureCompletion(null);
+    public static final FutureCompletion SEND_CONFIRMED;
+
+    static {
+        SEND_CONFIRMED = new FutureCompletion(null);
+        SEND_CONFIRMED.success();
+    }

     // ... instance initializer block ...
     {
         // ...
-        SEND_CONFIRMED.success();
     }

Also applies to: 178-179

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/src/main/java/org/zstack/core/cloudbus/CloudBusImpl3.java` around lines
143 - 144, The static final SEND_CONFIRMED is being mutated by calling
SEND_CONFIRMED.success() inside an instance initializer; move that invocation
into a static initialization block so the success() call runs once per class
load (not per instance). Specifically, remove the SEND_CONFIRMED.success() call
from the instance initialization area in CloudBusImpl3 and add a static {
SEND_CONFIRMED.success(); } block in the class; do the same for any other
occurrences where a static final future is completed in an instance initializer
(the similar calls around the other instance-init region mentioned) so all
static futures are completed in a static init block.
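A sketch of the recommended pattern: complete the static final future in a static initializer so it runs once per class load, not once per instance. `CompletableFuture` stands in for ZStack's `FutureCompletion` here:

```java
import java.util.concurrent.CompletableFuture;

public class SendConfirmedSketch {
    public static final CompletableFuture<Void> SEND_CONFIRMED;

    static {
        SEND_CONFIRMED = new CompletableFuture<>();
        SEND_CONFIRMED.complete(null); // runs exactly once, at class load
    }

    // Instances no longer touch the static future.
    public SendConfirmedSketch() { }

    public static void main(String[] args) {
        new SendConfirmedSketch();
        new SendConfirmedSketch();
        System.out.println(SEND_CONFIRMED.isDone()); // true regardless of instance count
    }
}
```

This keeps the field effectively immutable after class initialization, matching the intent of `static final`.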
core/src/main/java/org/zstack/core/cloudbus/CloudBusImpl3.java-940-940 (1)

940-940: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

A code comment contains Chinese

Per the coding guidelines, code (including comments) must not contain Chinese. Change this comment to English.

🔧 Suggested fix
-                                    // Sentry 仅由 TelemetryFacade 在注册全局 OpenTelemetry 后 init;此处只判断是否已启用
+                                    // Sentry is only initialized by TelemetryFacade after registering global OpenTelemetry; here we only check if it's enabled
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/src/main/java/org/zstack/core/cloudbus/CloudBusImpl3.java` at line 940,
The inline comment in CloudBusImpl3 (around the Telemetry/Sentry init check) is
written in Chinese; replace it with an English comment that conveys the same
meaning (e.g., "Sentry is only initialized by TelemetryFacade after global
OpenTelemetry registration; here we only check whether it is enabled"). Update
the comment near the Telemetry/Sentry check in the CloudBusImpl3 class
accordingly, preserving intent and clarity.
compute/src/main/java/org/zstack/compute/vm/VmInstanceManagerImpl.java-1147-1150 (1)

1147-1150: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Do not append error codes to an immutable empty list.

Calling result.add(...) on a result initialized to Collections.emptyList() throws UnsupportedOperationException outright, overwriting the "msg is null" error that was meant to be returned. Use new ArrayList<>() instead, or more safely return a single-element list directly.

Also applies to: 1160-1163

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceManagerImpl.java`
around lines 1147 - 1150, In VmInstanceManagerImpl (the method containing the
shown snippet) avoid mutating Collections.emptyList(); replace result
initialization with a mutable list (e.g., new ArrayList<>()) or simply return a
single-element list containing operr(ORG_ZSTACK_COMPUTE_VM_10238,
"CreateVmInstanceMsg cannot be null"); do the same fix for the other occurrence
around lines 1160-1163 so you don't trigger UnsupportedOperationException when
adding the error code.
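A small demonstration of why appending to `Collections.emptyList()` fails, alongside the mutable and single-element alternatives the finding recommends (the "error-code" strings are placeholders):

```java
import java.util.*;

public class EmptyListPitfall {
    public static void main(String[] args) {
        List<String> bad = Collections.emptyList();
        try {
            bad.add("error-code"); // emptyList() is immutable
        } catch (UnsupportedOperationException e) {
            System.out.println("emptyList() is immutable");
        }

        List<String> ok = new ArrayList<>();
        ok.add("error-code"); // mutable, safe to append

        // Or, when exactly one element is returned, skip mutation entirely:
        List<String> single = Collections.singletonList("error-code");
        System.out.println(ok.size() + " " + single.size());
    }
}
```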
compute/src/main/java/org/zstack/compute/vm/VmInstanceApiInterceptor.java-409-425 (1)

409-425: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

trim() the DNS addresses before validating.

The raw dnsAddresses are validated directly, so leading/trailing whitespace from browser copy-paste gets treated as an invalid address. The interceptor should trim() first and write the cleaned values back to the message. As per coding guidelines: "Check whether parameters from Messages have been trimmed; data users copy-paste from a browser may carry spaces, newlines, etc."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceApiInterceptor.java`
around lines 409 - 425, In validateDnsAddresses (VmInstanceApiInterceptor) trim
each DNS string (use String.trim()), drop empty results, and replace the
original dnsAddresses list with the cleaned values before performing size and
format checks; then enforce the MAXIMUM_NIC_DNS_NUMBER limit and run the
existing NetworkUtils.isIpv4Address / IPv6NetworkUtils.isIpv6Address validations
against the trimmed addresses so browser-copied entries with surrounding
whitespace are handled correctly.
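A hedged sketch of the cleanup step described above: trim each entry, drop blanks, and validate the cleaned list afterwards. The class and method names are hypothetical, and the IPv4/IPv6 format checks are elided:

```java
import java.util.*;
import java.util.stream.*;

public class DnsTrimSketch {
    // Trim each DNS entry and drop null/blank results before validation.
    static List<String> cleanDnsAddresses(List<String> raw) {
        return raw.stream()
                .filter(Objects::nonNull)
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> cleaned = cleanDnsAddresses(Arrays.asList(" 8.8.8.8\n", "", "1.1.1.1 "));
        System.out.println(cleaned); // [8.8.8.8, 1.1.1.1]
    }
}
```

The size limit (MAXIMUM_NIC_DNS_NUMBER) and the NetworkUtils/IPv6NetworkUtils format checks would then run against the cleaned list.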
compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java-3537-3537 (1)

3537-3537: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

The write and delete paths for static DNS are asymmetric.

DNS starts being persisted here as companion state of the static IP, but handle(APIDeleteVmStaticIpMsg) in the same file still only deletes the static IP tag and does not clear the static DNS when the last static IP on that L3 is removed. After a user deletes the static IP, the stale DNS lingers into subsequent start/network-change flows.

Also applies to: 3719-3722, 3768-3771

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java` at line
3537, The code writes DNS as companion state of static IPs via
cmsg.setDnsAddresses(...) but the delete path in handle(APIDeleteVmStaticIpMsg)
only removes the static IP tag and does not clear/update the static DNS tag,
leaving stale DNS entries; update the delete handlers (the
APIDeleteVmStaticIpMsg handler and similar delete flows referenced around the
other spots) to check whether the removed static IP was the last one on that L3
and, if so, clear the stored static DNS (or remove the static DNS tag) for that
L3—mirror the logic used when creating/setting DNS (cmsg.setDnsAddresses) so DNS
persistence and deletion are symmetric. Ensure you reference and modify the same
methods/classes handling static IP deletion and tag updates so the DNS state is
cleaned when the last static IP for an L3 is removed.

Comment thread build/deploydb.sh
Comment on lines +30 to +62
# Detect MySQL version
# Extract major version number from various MySQL/MariaDB/GreatDB output formats
db_version=$(${MYSQL} --version 2>/dev/null | grep -oP '\d+\.\d+\.\d+' | head -1 | cut -d'.' -f1 || echo "5")

# GreatDB and MySQL 8.0+ require CREATE USER before GRANT
if command -v greatdb &> /dev/null || [ "$db_version" -ge 8 ] 2>/dev/null; then
    ${MYSQL} ${loginCmd} << EOF
set global log_bin_trust_function_creators=1;
DROP DATABASE IF EXISTS zstack;
CREATE DATABASE zstack;
DROP DATABASE IF EXISTS zstack_rest;
CREATE DATABASE zstack_rest;
CREATE USER IF NOT EXISTS 'root'@'%' IDENTIFIED BY "${password}";
CREATE USER IF NOT EXISTS 'root'@'127.0.0.1' IDENTIFIED BY "${password}";
grant all privileges on zstack.* to root@'%';
grant all privileges on zstack_rest.* to root@'%';
grant all privileges on zstack.* to root@'127.0.0.1';
grant all privileges on zstack_rest.* to root@'127.0.0.1';
EOF
else
    # MySQL 5.x: GRANT with IDENTIFIED BY auto-creates users
    ${MYSQL} ${loginCmd} << EOF
set global log_bin_trust_function_creators=1;
DROP DATABASE IF EXISTS zstack;
CREATE DATABASE zstack;
DROP DATABASE IF EXISTS zstack_rest;
CREATE DATABASE zstack_rest;
grant all privileges on zstack.* to root@'%' identified by "${password}";
grant all privileges on zstack_rest.* to root@'%' identified by "${password}";
grant all privileges on zstack.* to root@'127.0.0.1' identified by "${password}";
grant all privileges on zstack_rest.* to root@'127.0.0.1' identified by "${password}";
EOF
fi

⚠️ Potential issue | 🔴 Critical | ⚖️ Poor tradeoff

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Test how the database version detection logic behaves in different environments

# Check the MySQL/GreatDB version output format in the current environment
if command -v mysql &> /dev/null; then
  echo "=== MySQL version output ==="
  mysql --version
fi

if command -v greatdb &> /dev/null; then
  echo "=== GreatDB version output ==="
  greatdb --version
fi

# Check the version extraction logic
test_version_extraction() {
  local version_output="$1"
  local extracted=$(echo "$version_output" | grep -oP '\d+\.\d+\.\d+' | head -1 | cut -d'.' -f1 || echo "5")
  echo "Input: $version_output"
  echo "Extracted major version: $extracted"
  echo "---"
}

# Test various possible output formats
test_version_extraction "mysql  Ver 5.7.38 for Linux"
test_version_extraction "mysql  Ver 8.0.32 for Linux"
test_version_extraction "mysql  Ver 15.1 Distrib 10.6.12-MariaDB"
test_version_extraction "greatdb  Ver 8.0.25 for Linux"

Repository: MatheMatrix/zstack

Length of output: 341


🏁 Script executed:

# Locate the actual deploydb.sh file
find . -name "deploydb.sh" -type f | head -5

Repository: MatheMatrix/zstack

Length of output: 101


🏁 Script executed:

# Test edge cases of MariaDB version extraction
#!/bin/bash

test_version_extraction() {
  local version_output="$1"
  local extracted=$(echo "$version_output" | grep -oP '\d+\.\d+\.\d+' | head -1 | cut -d'.' -f1 || echo "5")
  echo "Input: $version_output"
  echo "Extracted major version: $extracted"
  echo "---"
}

# Test the MariaDB format that may cause problems
# MariaDB 10.6.12-MariaDB may be reported as "Ver 15.1 Distrib 10.6.12-MariaDB"
test_version_extraction "mysql  Ver 15.1 Distrib 10.6.12-MariaDB"

# Test a version string in X.Y format only
test_version_extraction "mysql  Ver 5.5 for Linux"

# Test output that contains no version number
test_version_extraction "some random output"

# Test empty output
test_version_extraction ""

Repository: MatheMatrix/zstack

Length of output: 297


🏁 Script executed:

# Read the relevant section of build/deploydb.sh
head -80 ./build/deploydb.sh | tail -60

Repository: MatheMatrix/zstack

Length of output: 2361


🏁 Script executed:

# Test the full version detection logic (including the || fallback)
#!/bin/bash

# Test the fallback mechanism in version extraction
test_with_fallback() {
  local version_output="$1"
# Simulate the actual script logic
  local result=$(echo "$version_output" | grep -oP '\d+\.\d+\.\d+' | head -1 | cut -d'.' -f1 || echo "5")
  echo "Input: '$version_output'"
  echo "Result: '$result'"
  echo "---"
}

# Test various formats
test_with_fallback "mysql  Ver 5.7.38 for Linux"
test_with_fallback "mysql  Ver 8.0.32 for Linux"
test_with_fallback "mysql  Ver 5.5 for Linux"
test_with_fallback "mysql  Ver 15.1 Distrib 10.6.12-MariaDB"
test_with_fallback "some random output"
test_with_fallback ""

# 测试 grep -oP 的返回码
echo "=== Testing grep return codes ==="
echo "mysql  Ver 5.5 for Linux" | grep -oP '\d+\.\d+\.\d+' 
echo "Return code: $?"

echo "some random output" | grep -oP '\d+\.\d+\.\d+'
echo "Return code: $?"

Repository: MatheMatrix/zstack

Length of output: 415


Fix critical flaws in the version detection logic.

The version extraction has several key problems:

  1. Extraction failure: MySQL 5.5 (an X.Y version string) never matches the regex \d+\.\d+\.\d+, so db_version ends up empty instead of falling back to "5". The root cause is that cut exits 0 on empty input, so || echo "5" never fires.
  2. Wrong version: for MariaDB output "Ver 15.1 Distrib 10.6.12-MariaDB", the regex extracts "10" (the distribution version) instead of "15" (the reported client version), so the version comparison goes wrong.
  3. Empty-value risk: when db_version is empty, the [ "$db_version" -ge 8 ] test on line 36 fails to evaluate and the branch is not taken as intended.

Recommend rewriting the detection with a more tolerant pattern (e.g. extract the first \d+ token) or a different way of parsing the version command's output, and double-quoting ${MYSQL} and ${loginCmd} on lines 34 and 49 to prevent word splitting.

🧰 Tools
🪛 Shellcheck (0.11.0)

[info] 36-36: Double quote to prevent globbing and word splitting.

(SC2086)


[info] 51-51: Double quote to prevent globbing and word splitting.

(SC2086)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@build/deploydb.sh` around lines 30 - 62, The db version detection and MySQL
invocation are brittle: update the extraction logic for db_version to reliably
get the major version (e.g., use a regex that grabs the first numeric token like
'\d+' or parse the "Distrib" token so MariaDB returns 10 not the client report
number), ensure the fallback to "5" actually triggers if extraction yields empty
(assign db_version="$(... || echo 5)" or set db_version="${db_version:-5}"), and
guard the numeric compare by treating db_version as an integer (e.g., defaulting
empty to 5 before [ "$db_version" -ge 8 ]). Also quote the invocations of MYSQL
and loginCmd (use "${MYSQL}" "${loginCmd}") where used in the here-doc blocks so
word-splitting is prevented; locate and change uses of db_version, MYSQL, and
loginCmd in the script (variables: db_version, MYSQL, loginCmd) accordingly.

Comment on lines 95 to +96
} else if (isNoHypervisor(spec.getHypervisorType())) {
error = operr("no Enabled hosts found in the [%s] candidate hosts having the hypervisor type [%s]", candidates.size(), spec.getHypervisorType());
error = operr(ORG_ZSTACK_COMPUTE_ALLOCATOR_10031, "no Enabled hosts found in the [%s] candidate hosts having the hypervisor type [%s]", candidates.size(), spec.getHypervisorType());

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

NPE risk when spec.getHypervisorType() is null

Line 95 calls isNoHypervisor(...) without a null check; when the hypervisor type is null, hyType.equals(...) throws an NPE and the allocation flow aborts abnormally.

Suggested fix
-            } else if (isNoHypervisor(spec.getHypervisorType())) {
+            } else if (spec.getHypervisorType() != null && isNoHypervisor(spec.getHypervisorType())) {
                 error = operr(ORG_ZSTACK_COMPUTE_ALLOCATOR_10031, "no Enabled hosts found in the [%s] candidate hosts having the hypervisor type [%s]", candidates.size(), spec.getHypervisorType());
             } else {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@compute/src/main/java/org/zstack/compute/allocator/HostStateAndHypervisorAllocatorFlow.java`
around lines 95 - 96, The call to isNoHypervisor(spec.getHypervisorType()) can
throw an NPE when spec.getHypervisorType() is null; update the
HostStateAndHypervisorAllocatorFlow code to null-safe this check by first
verifying spec.getHypervisorType() is not null (e.g., if
(spec.getHypervisorType() != null && isNoHypervisor(spec.getHypervisorType())) )
or alternatively make isNoHypervisor(...) defensively handle null inputs, and
keep the existing operr(...) usage unchanged so the error path still executes
when no hypervisor type is present.

Comment on lines +222 to +231
private void deleteHostNetworkLabel(APIDeleteHostNetworkServiceTypeMsg msg, Completion completion) {
HostNetworkLabelVO vo = dbf.findByUuid(msg.getUuid(), HostNetworkLabelVO.class);
CollectionUtils.safeForEach(pluginRgty.getExtensionList(HostNetworkLabelExtensionPoint.class),
arg -> arg.deleteHostNetworkLabel(vo.toInventory(), completion));
}

private void updateHostNetworkLabel(APIUpdateHostNetworkServiceTypeMsg msg, Completion completion) {
HostNetworkLabelVO vo = dbf.findByUuid(msg.getUuid(), HostNetworkLabelVO.class);
CollectionUtils.safeForEach(pluginRgty.getExtensionList(HostNetworkLabelExtensionPoint.class),
arg -> arg.updateHostNetworkLabel(vo.toInventory(), msg.getServiceType(), completion));

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

The extension-point callback pattern can hang or re-enter the update/delete APIs.

The same Completion is handed directly to every HostNetworkLabelExtensionPoint. If no extension is registered, the completion is never invoked and the API hangs forever; with multiple implementations, any extension may trigger the same completion more than once, re-entering chain.next()/event publication. The extensions need to be aggregated serially, like beforeAddHost does, and success() should be called directly when the extension list is empty.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@compute/src/main/java/org/zstack/compute/host/HostManagerImpl.java` around
lines 222 - 231, The current deleteHostNetworkLabel and updateHostNetworkLabel
pass the same Completion to every HostNetworkLabelExtensionPoint via
CollectionUtils.safeForEach, which can leave the API hanging if the extension
list is empty or cause duplicate/ reentrant completion calls when multiple
extensions invoke the same Completion; fix by retrieving
pluginRgty.getExtensionList(HostNetworkLabelExtensionPoint.class) into a list,
if the list is empty call completion.success() immediately, otherwise invoke the
extensions serially (like beforeAddHost) by chaining calls so each extension
receives a per-extension callback that advances to the next extension and only
the final step calls the original completion.success()/fail(), ensuring
HostNetworkLabelExtensionPoint.deleteHostNetworkLabel/updateHostNetworkLabel are
invoked in order and completion is invoked exactly once.
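The serial-invocation pattern the prompt above calls for can be sketched in isolation as below. `Completion`, `LabelExtension`, and `runSerially` are simplified stand-ins for the ZStack types, not the actual API:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class SerialExtensionSketch {
    interface Completion { void success(); void fail(String err); }
    interface LabelExtension { void apply(Completion per); }

    /** Invoke extensions one by one; the outer completion fires exactly once. */
    static void runSerially(List<LabelExtension> exts, Completion outer) {
        if (exts.isEmpty()) { outer.success(); return; }   // empty list: finish immediately
        Iterator<LabelExtension> it = exts.iterator();
        Completion step = new Completion() {
            public void success() {
                if (it.hasNext()) it.next().apply(this);   // advance to the next extension
                else outer.success();                      // last extension done
            }
            public void fail(String err) { outer.fail(err); }
        };
        it.next().apply(step);
    }

    public static void main(String[] args) {
        List<String> order = new ArrayList<>();
        List<LabelExtension> exts = new ArrayList<>();
        exts.add(c -> { order.add("ext1"); c.success(); });
        exts.add(c -> { order.add("ext2"); c.success(); });
        runSerially(exts, new Completion() {
            public void success() { order.add("done"); }
            public void fail(String e) { order.add("fail:" + e); }
        });
        System.out.println(order);   // extensions run in order, then the outer completion
    }
}
```

A failure in any extension short-circuits to the outer fail(), which mirrors how a chained flow would stop on the first error.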

Comment on lines 2977 to 2980
AttachNicToVmReply r = new AttachNicToVmReply();
if (!reply.isSuccess()) {
    r.setError(err(VmErrors.ATTACH_NETWORK_ERROR, r.getError(), r.getError().getDetails()));
    r.setError(err(ORG_ZSTACK_COMPUTE_VM_10274, VmErrors.ATTACH_NETWORK_ERROR, r.getError(), r.getError().getDetails()));
}

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

This overwrites the real error with a null pointer.

Line 2979 reads from the freshly constructed AttachNicToVmReply r, whose error is still null. Once the host side reports failure, r.getError().getDetails() throws an NPE here, and the original attach failure reason is lost along with it.

🛠️ Suggested fix
                 AttachNicToVmReply r = new AttachNicToVmReply();
                 if (!reply.isSuccess()) {
-                    r.setError(err(ORG_ZSTACK_COMPUTE_VM_10274, VmErrors.ATTACH_NETWORK_ERROR, r.getError(), r.getError().getDetails()));
+                    r.setError(err(ORG_ZSTACK_COMPUTE_VM_10274, VmErrors.ATTACH_NETWORK_ERROR,
+                            reply.getError(), reply.getError().getDetails()));
                 }
                 bus.reply(msg, r);

Comment on lines +431 to +434
if (!_conditions.isEmpty()) {
    for (Condition condition : _conditions) {
        q.condition(condition._attr, condition._op, condition._val[0]);
    }

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Condition parameter conversion risks out-of-bounds access and loses semantics

Line [433] passes only condition._val[0]:

  • for Op.NULL/Op.NOT_NULL the array may be empty, triggering an out-of-bounds exception;
  • for Op.IN/Op.NOT_IN the remaining values are dropped, producing a wrong query condition.
💡 Suggested fix
         if (!_conditions.isEmpty()) {
             for (Condition condition : _conditions) {
-                q.condition(condition._attr, condition._op, condition._val[0]);
+                if (condition._val == null || condition._val.length == 0) {
+                    q.condition(condition._attr, condition._op);
+                } else {
+                    q.condition(condition._attr, condition._op, condition._val);
+                }
             }
         }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/src/main/java/org/zstack/core/db/SimpleQueryImpl.java` around lines 431
- 434, The loop in SimpleQueryImpl that calls q.condition(condition._attr,
condition._op, condition._val[0]) is unsafe and lossy: it will IndexOutOfBounds
for Op.NULL/Op.NOT_NULL when _val is empty and will drop values for
Op.IN/Op.NOT_IN. Fix by branching on condition._op inside the _conditions
iteration in SimpleQueryImpl: for Op.NULL/Op.NOT_NULL call q.condition with no
value (or the API equivalent that accepts only attr+op); for Op.IN/Op.NOT_IN
pass the whole condition._val array or converted Collection to q.condition; for
single-value ops verify condition._val.length>0 and pass condition._val[0] (or
handle/throw when missing). Update the call sites that use q.condition
accordingly and reference the Condition class fields _attr, _op, _val to
implement these branches.
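The per-operator branching the prompt above describes can be illustrated with a toy renderer. `ConditionSketch`, its `Op` enum, and `render` are hypothetical stand-ins for the SimpleQuery types, not the actual API:

```java
import java.util.Arrays;

public class ConditionSketch {
    enum Op { EQ, IN, NULL }

    /** Branch on the operator instead of always reading vals[0]. */
    static String render(String attr, Op op, Object... vals) {
        switch (op) {
            case NULL: return attr + " IS NULL";                      // no value expected
            case IN:   return attr + " IN " + Arrays.toString(vals);  // keep the whole array
            default:
                if (vals.length == 0) {
                    throw new IllegalArgumentException("missing value for " + op);
                }
                return attr + " = " + vals[0];                        // single-value ops
        }
    }

    public static void main(String[] args) {
        System.out.println(render("state", Op.NULL));        // state IS NULL
        System.out.println(render("uuid", Op.IN, "a", "b")); // uuid IN [a, b]
        System.out.println(render("name", Op.EQ, "vm-1"));   // name = vm-1
    }
}
```

The real fix would branch the same way inside the _conditions loop, calling the attr+op overload of q.condition for NULL/NOT_NULL and passing the full value array for IN/NOT_IN.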
