[5.5.22][core] Bound JDBC waits on VIP failover#3993
Conversation
Add default MySQL JDBC connect/read timeouts and TCP keepalive to management-node datasource URLs so stale VIP connections fail and recover instead of waiting indefinitely in Connector/J socket reads. Constraint: 5.5.22 deployments may configure datasource URLs explicitly through zstack.properties, so explicit DbFacadeDataSource and RESTApiDataSource URLs must also receive defaults when they do not already set them. Rejected: Only changing XML fallback URLs | zstack.properties datasource URLs bypass the XML fallback in deployed environments. Confidence: high Scope-risk: moderate Directive: Do not remove the JDBC timeout defaults without re-testing keepalived VIP movement against a running MN. Tested: JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 mvn -pl core -DskipTests install Tested: JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 mvn -pl plugin/kvm -DskipTests install Tested: JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 mvn -Dtest=org.zstack.test.core.TestDefaultDbJdbcUrl -Dsurefire.useFile=false test Tested: Manual 172.20.1.164/172.20.1.165 keepalived VIP failover, 40/40 zstack-cli queries succeeded with bounded post-failover delay Not-tested: Full CI suite Resolves: ZSTAC-0 Change-Id: Ie551c05a6b646c192f01e47fdda902b388cc2b04
概览新增数据库连接的全局参数配置(connectTimeout/socketTimeout/tcpKeepAlive),在Platform中实现JDBC URL的统一构建逻辑,自动追加默认连接参数并避免重复,通过完整测试验证多种URL场景下的正确处理。 变更数据库JDBC URL默认参数自动化
评审工作量估算🎯 3 (中等) | ⏱️ ~20分钟 庆祝诗
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@core/src/main/java/org/zstack/core/db/DatabaseGlobalProperty.java`:
- Around line 25-30: 在 DatabaseGlobalProperty 类中为新加的全局属性
DbConnectTimeout、DbSocketTimeout、DbTcpKeepAlive
增加启动时格式校验:把超时属性(DbConnectTimeout、DbSocketTimeout)从无约束的 String 改为数值类型(int/long 或
Integer/Long)或在类的静态初始化/属性加载路径中立即对其进行 Integer.parseInt/Long.parseLong
校验并在解析失败时抛出明确异常;对 DbTcpKeepAlive 进行布尔解析(Boolean.parseBoolean 或更严格的接受
"true"/"false" 校验)并在格式不符合时抛出错误。确保改动出现在 DatabaseGlobalProperty 的定义/初始化逻辑中(涉及字段
DbConnectTimeout、DbSocketTimeout、DbTcpKeepAlive),以便在应用启动阶段就能发现并报出配置错误。
In `@core/src/main/java/org/zstack/core/Platform.java`:
- Line 390: The debug log currently prints the full JDBC URL (defaultedJdbcUrl)
which may contain sensitive info; update the logging in class Platform where
propertyName and defaultedJdbcUrl are used so it does not emit the raw
URL—either log only the propertyName or sanitize/mask credentials and query
parameters from defaultedJdbcUrl before including it in the logger.debug call
(use a helper to strip user:pass@ and sensitive query keys), ensuring the log
message retains context but never contains unmasked connection secrets.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)
Review profile: CHILL
Plan: Pro
Run ID: 80456f43-32d3-4ba7-bd03-7ba560c91ae3
⛔ Files ignored due to path filters (2)
conf/springConfigXml/DatabaseFacade.xmlis excluded by!**/*.xmlconf/springConfigXml/RESTFacade.xmlis excluded by!**/*.xml
📒 Files selected for processing (3)
core/src/main/java/org/zstack/core/Platform.javacore/src/main/java/org/zstack/core/db/DatabaseGlobalProperty.javatest/src/test/java/org/zstack/test/core/TestDefaultDbJdbcUrl.java
| @GlobalProperty(name="DB.connectTimeout", defaultValue = "5000") | ||
| public static String DbConnectTimeout; | ||
| @GlobalProperty(name="DB.socketTimeout", defaultValue = "60000") | ||
| public static String DbSocketTimeout; | ||
| @GlobalProperty(name="DB.tcpKeepAlive", defaultValue = "true") | ||
| public static String DbTcpKeepAlive; |
There was a problem hiding this comment.
为新增 JDBC 参数增加格式校验,避免错误配置延迟到运行期暴露。
Line 25-30 的新属性当前无格式约束。建议在属性层直接限制:超时必须为数字,tcpKeepAlive 必须为布尔值,这样可在启动阶段尽早失败并给出明确错误。
建议修改
- `@GlobalProperty`(name="DB.connectTimeout", defaultValue = "5000")
+ `@GlobalProperty`(name="DB.connectTimeout", defaultValue = "5000")
+ `@RegexValues`(value = "^[0-9]+$")
public static String DbConnectTimeout;
- `@GlobalProperty`(name="DB.socketTimeout", defaultValue = "60000")
+ `@GlobalProperty`(name="DB.socketTimeout", defaultValue = "60000")
+ `@RegexValues`(value = "^[0-9]+$")
public static String DbSocketTimeout;
- `@GlobalProperty`(name="DB.tcpKeepAlive", defaultValue = "true")
+ `@GlobalProperty`(name="DB.tcpKeepAlive", defaultValue = "true")
+ `@RegexValues`(value = "^(?i:true|false)$")
public static String DbTcpKeepAlive;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@core/src/main/java/org/zstack/core/db/DatabaseGlobalProperty.java` around
lines 25 - 30, 在 DatabaseGlobalProperty 类中为新加的全局属性
DbConnectTimeout、DbSocketTimeout、DbTcpKeepAlive
增加启动时格式校验:把超时属性(DbConnectTimeout、DbSocketTimeout)从无约束的 String 改为数值类型(int/long 或
Integer/Long)或在类的静态初始化/属性加载路径中立即对其进行 Integer.parseInt/Long.parseLong
校验并在解析失败时抛出明确异常;对 DbTcpKeepAlive 进行布尔解析(Boolean.parseBoolean 或更严格的接受
"true"/"false" 校验)并在格式不符合时抛出错误。确保改动出现在 DatabaseGlobalProperty 的定义/初始化逻辑中(涉及字段
DbConnectTimeout、DbSocketTimeout、DbTcpKeepAlive),以便在应用启动阶段就能发现并报出配置错误。
| String defaultedJdbcUrl = appendDefaultDbJdbcParameters(jdbcUrl); | ||
| if (!jdbcUrl.equals(defaultedJdbcUrl)) { | ||
| System.setProperty(propertyName, defaultedJdbcUrl); | ||
| logger.debug(String.format("append default JDBC parameters to %s [%s]", propertyName, defaultedJdbcUrl)); |
There was a problem hiding this comment.
避免在日志中输出完整 JDBC URL(可能包含敏感参数)。
Line 390 直接打印 defaultedJdbcUrl,若 URL 携带凭据或其他敏感 query,会写入日志。建议只记录属性名,或先脱敏再打印。
建议修改
- logger.debug(String.format("append default JDBC parameters to %s [%s]", propertyName, defaultedJdbcUrl));
+ logger.debug(String.format("append default JDBC parameters to %s", propertyName));🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@core/src/main/java/org/zstack/core/Platform.java` at line 390, The debug log
currently prints the full JDBC URL (defaultedJdbcUrl) which may contain
sensitive info; update the logging in class Platform where propertyName and
defaultedJdbcUrl are used so it does not emit the raw URL—either log only the
propertyName or sanitize/mask credentials and query parameters from
defaultedJdbcUrl before including it in the logger.debug call (use a helper to
strip user:pass@ and sensitive query keys), ensuring the log message retains
context but never contains unmasked connection secrets.
Summary
connectTimeout=5000,socketTimeout=60000,tcpKeepAlive=true.DB.urland whenDbFacadeDataSource.jdbcUrl/RESTApiDataSource.jdbcUrlare explicitly configured.connectTimeout,socketTimeout, ortcpKeepAlivevalues are not overwritten.{database}templates, explicit datasource URLs, and explicit timeout preservation.Background
In a two-MN deployment where the control plane reaches MySQL through a keepalived VIP, old JDBC connections can survive a VIP move and block in Connector/J socket reads. A captured MN thread dump showed a ZStack API worker blocked in
SocketInputStream.socketRead0throughConnectionImpl.commit, so the fix bounds read/connect waits and allows recovery instead of indefinite blocking.Verification
JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 mvn -pl core -DskipTests installJAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 mvn -pl plugin/kvm -DskipTests installJAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 mvn -Dtest=org.zstack.test.core.TestDefaultDbJdbcUrl -Dsurefire.useFile=false test172.20.1.164/172.20.1.165: restart keepalived to move VIP, 40/40zstack-cliquery iterations succeeded with only a bounded post-failover delay.Notes
Local Maven initially failed because existing
target/output was owned by root and the default shell used Java 21; fixed by chowning generated targets and running with JDK 8.sync from gitlab !9886