[IOTDB-17179] CLI: support automatic reconnection when connection is lost #17181

Open
miantalha45 wants to merge 2 commits into apache:master from miantalha45:cli-auto-reconnect

miantalha45 commented Feb 7, 2026

Description

Add automatic reconnection to the IoTDB CLI when the connection to the server is lost during an interactive session (e.g. server restart, network blip, or idle timeout). The CLI no longer exits immediately on connection-related errors; it attempts to reconnect with the same parameters and retries the failed command, aligning behavior with the Session API, JDBC, and C++/Python clients.

Content1 — Detection and reconnection flow

  • Detection: Connection loss is detected when a command fails with a connection-related SQLException. We treat an exception as connection-related if its message (or cause message, lowercased) contains any of: connection, refused, timeout, closed, reset, network, broken pipe. This logic lives in AbstractCli.isConnectionRelated(SQLException) and matchesConnectionFailure(String) so it can be shared and reused.
  • Reconnection: On such a failure, the CLI closes the current connection and opens a new one using the same parameters (host, port, user, password, and options) via DriverManager.getConnection and the existing info properties. Helper methods openConnection(), setupConnection(), and closeConnectionQuietly() in Cli encapsulate open/setup/close so the main loop stays clear.
  • Retry: After a successful reconnection, the same user command (the current line that failed) is retried with the new connection. We retry reconnection up to 3 times with a 1 s delay between attempts (no delay before the first attempt). Constants RECONNECT_RETRY_NUM and RECONNECT_RETRY_INTERVAL_MS in Cli control this; they are not yet user-configurable.
  • Feedback: On successful reconnection we print "Connection lost. Reconnected. Retrying command." If all reconnection attempts fail, we print "IoTDB: Could not reconnect after 3 attempts. Please check that the server is running and try again." and exit with an error code.
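
The detection rule described in this list can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class name `ConnectionFailureSketch` is hypothetical, the keyword list is copied from the description, and inspecting only the direct cause is an assumption.

```java
import java.sql.SQLException;
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-in for the detection helpers; not the code from this PR. */
final class ConnectionFailureSketch {

  // Substrings taken from the PR description.
  private static final List<String> KEYWORDS =
      List.of("connection", "refused", "timeout", "closed", "reset", "network", "broken pipe");

  private ConnectionFailureSketch() {}

  /** True if the exception's message, or its direct cause's message, matches a keyword. */
  static boolean isConnectionRelated(SQLException e) {
    if (e == null) {
      return false;
    }
    if (matchesConnectionFailure(e.getMessage())) {
      return true;
    }
    Throwable cause = e.getCause(); // assumption: only the direct cause is checked
    return cause != null && matchesConnectionFailure(cause.getMessage());
  }

  /** Case-insensitive substring match against the keyword list. */
  static boolean matchesConnectionFailure(String message) {
    if (message == null) {
      return false;
    }
    String lower = message.toLowerCase(Locale.ROOT);
    for (String keyword : KEYWORDS) {
      if (lower.contains(keyword)) {
        return true;
      }
    }
    return false;
  }
}
```

Because this sketch uses a plain substring match, any error message that happens to contain one of these words (for example, a query-level error mentioning "timeout") is also classified as connection-related.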

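The bounded retry described above (up to 3 attempts, 1 s between attempts, no delay before the first) might look like the sketch below. The constant names and values come from the description; `ReconnectSketch`, the `ConnectionOpener` interface, and the generic type parameter are hypothetical and exist only so the loop can be exercised without a running server. In the real CLI the opener role would be played by `openConnection()` and `setupConnection()`.

```java
import java.sql.SQLException;

/** Hypothetical sketch of the bounded reconnect loop; not the code from this PR. */
final class ReconnectSketch {

  // Values as stated in the PR description.
  static final int RECONNECT_RETRY_NUM = 3;
  static final long RECONNECT_RETRY_INTERVAL_MS = 1000L;

  /** Hypothetical hook that opens and sets up a new connection. */
  interface ConnectionOpener<T> {
    T open() throws SQLException;
  }

  private ReconnectSketch() {}

  /**
   * Tries to open a connection up to {@code attempts} times, sleeping
   * {@code intervalMs} before every attempt except the first.
   * Returns the opened connection, or null if every attempt failed
   * or the thread was interrupted while waiting.
   */
  static <T> T reconnect(ConnectionOpener<T> opener, int attempts, long intervalMs) {
    for (int attempt = 1; attempt <= attempts; attempt++) {
      if (attempt > 1) {
        try {
          Thread.sleep(intervalMs);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt(); // preserve the interrupt for the caller
          return null;
        }
      }
      try {
        return opener.open();
      } catch (SQLException e) {
        // This attempt failed; fall through to the next one.
      }
    }
    return null; // the caller prints the "Could not reconnect" message and exits
  }
}
```

A caller would invoke `ReconnectSketch.reconnect(opener, RECONNECT_RETRY_NUM, RECONNECT_RETRY_INTERVAL_MS)` and treat a `null` result as the "all attempts failed" case.
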
Content2 — Class and method organization

  • AbstractCli: Added isConnectionRelated(SQLException) (package-private static) and matchesConnectionFailure(String) (private static) for shared detection. In executeQuery, setTimeZone, and showTimeZone, we catch SQLException (or Exception where the API does not throw SQLException) and rethrow when isConnectionRelated(e); otherwise we keep the existing "print error and return error code" behavior. handleInputCmd and processCommand now declare throws SQLException so connection failures propagate to the CLI loop instead of being swallowed.
  • Cli: Introduced ReadLineResult (inner class with stop, failedCommand) and factory methods continueLoop(), stopLoop(), reconnectAndRetry(String) so the read-eval loop can signal "continue", "exit", or "reconnect and retry this command". receiveCommands() no longer uses try-with-resources for the connection; it holds the connection in a variable, and when readerReadLine() returns a result with failedCommand != null, it runs the reconnect loop (close → retry open/setup → print message → retry command). readerReadLine() wraps processCommand() in a try-catch; on connection-related SQLException it returns reconnectAndRetry(s) with the current line; on other SQLException it prints and returns stopLoop().
  • AbstractCliTest: testHandleInputInputCmd() now declares throws SQLException and imports java.sql.SQLException so it compiles with the updated handleInputCmd signature.
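
The three-way signal carried by `ReadLineResult` can be illustrated with a small value class. This is a sketch under assumptions: the field names `stop` and `failedCommand` and the factory method names come from the description, while the class name `ReadLineResultSketch`, the field types, and the top-level (rather than inner) placement are guesses.

```java
/** Hypothetical sketch of the loop signal described above; not the code from this PR. */
final class ReadLineResultSketch {

  final boolean stop;          // true: leave the read-eval loop
  final String failedCommand;  // non-null: reconnect, then retry this command

  private ReadLineResultSketch(boolean stop, String failedCommand) {
    this.stop = stop;
    this.failedCommand = failedCommand;
  }

  /** Keep reading commands. */
  static ReadLineResultSketch continueLoop() {
    return new ReadLineResultSketch(false, null);
  }

  /** Exit the loop. */
  static ReadLineResultSketch stopLoop() {
    return new ReadLineResultSketch(true, null);
  }

  /** Ask the caller to reconnect and retry the given command. */
  static ReadLineResultSketch reconnectAndRetry(String command) {
    return new ReadLineResultSketch(false, command);
  }
}
```

With this shape, the loop in `receiveCommands()` would check `failedCommand != null` first, run the reconnect flow if so, and otherwise decide whether to continue based on `stop`.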

Content3 — Corner cases and alternatives

  • Corner cases: If reconnection succeeds but the retried command fails again with a connection-related error, the outer loop will see another reconnectAndRetry and run the same reconnect/retry flow again (each time with up to 3 reconnect attempts). Non-connection SQLExceptions still print the error and stop the loop (exit) as before. Interrupt and EOF handling in readerReadLine() are unchanged.
  • Session/statement errors after reconnect: If the server returns an error such as "StatementId doesn't exist in this session" (e.g. after a server stop/start), the retried command or the next user command can fail with that instead of a connection error. We treat such exceptions as session/statement state errors via isSessionOrStatementError() in AbstractCli and show: "Reconnected, but the previous command could not be completed. Please run your command again." so the user is not shown the raw exception. This handling is applied both in the reconnect-retry path in Cli and in AbstractCli.executeQuery for the normal command path.
  • Alternatives considered: (1) Reconnect without retrying the failed command—simpler but worse UX. (2) Prompt "Reconnect? (y/n)"—gives control but adds friction and is less script-friendly. (3) Leave current behavior—rejected to align CLI with other clients and improve long-lived session UX.
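
The session/statement handling above can be sketched as a second classifier that maps a stale-session error to the friendly message. The description names only one example message ("StatementId doesn't exist in this session"), so the single pattern used here is an assumption, as are the class name `SessionErrorSketch` and the helper `userMessage`; the actual match list in `matchesSessionOrStatementFailure` may differ.

```java
import java.sql.SQLException;
import java.util.Locale;

/** Hypothetical sketch of session/statement error handling; not the code from this PR. */
final class SessionErrorSketch {

  // Friendly text quoted from the PR description.
  static final String FRIENDLY_MESSAGE =
      "Reconnected, but the previous command could not be completed. "
          + "Please run your command again.";

  // Assumption: the only pattern known from the description, lowercased.
  private static final String STALE_STATEMENT = "statementid doesn't exist in this session";

  private SessionErrorSketch() {}

  /** True if the message looks like a stale session or statement error. */
  static boolean isSessionOrStatementError(SQLException e) {
    if (e == null || e.getMessage() == null) {
      return false;
    }
    return e.getMessage().toLowerCase(Locale.ROOT).contains(STALE_STATEMENT);
  }

  /** Text to show the user for a non-null exception: friendly message or the raw message. */
  static String userMessage(SQLException e) {
    return isSessionOrStatementError(e) ? FRIENDLY_MESSAGE : String.valueOf(e.getMessage());
  }
}
```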

This PR has:

  • been self-reviewed.
    • concurrent read
    • concurrent write
    • concurrent read and write
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods.
  • added or updated version, license, or notice information
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage.
  • added integration tests.
  • been tested in a test IoTDB cluster.

Key changed/added classes (or packages if there are too many classes) in this PR
  • org.apache.iotdb.cli.AbstractCli: isConnectionRelated, matchesConnectionFailure, isSessionOrStatementError, matchesSessionOrStatementFailure; rethrow connection-related SQLException in executeQuery, setTimeZone, showTimeZone; in executeQuery show friendly message for session/statement errors; handleInputCmd, processCommand now throws SQLException
  • org.apache.iotdb.cli.Cli: ReadLineResult, openConnection(), setupConnection(), closeConnectionQuietly(); refactored receiveCommands() and readerReadLine() for reconnect-and-retry flow; handle session/statement error in reconnect-retry path
  • org.apache.iotdb.cli.AbstractCliTest: testHandleInputInputCmd() updated for throws SQLException

Closes #17179

- Detect connection-related SQLExceptions (refused, timeout, closed, reset, etc.)
- In AbstractCli: rethrow connection-related SQLException from executeQuery,
  setTimeZone, showTimeZone so CLI can handle them
- In Cli: on connection loss, close current connection, retry reconnect up to 3
  times with 1s interval, then retry the failed command; print 'Connection lost.
  Reconnected. Retrying command.' on success; exit with clear message after
  all retries fail
- Add isConnectionRelated() and matchesConnectionFailure() in AbstractCli for
  shared detection; openConnection(), setupConnection(), closeConnectionQuietly()
  and ReadLineResult in Cli for reconnect flow
- Update AbstractCliTest to declare throws SQLException for handleInputCmd calls

Co-authored-by: Cursor <cursoragent@cursor.com>

github-actions bot left a comment

Hi, this is your first pull request in IoTDB project. Thanks for your contribution! IoTDB will be better because of you.

When reconnect succeeds but the retried command fails with a session/statement
error (e.g. StatementId doesn't exist in this session), show a friendly message
instead of the raw exception. Apply the same handling in AbstractCli.executeQuery
so the message is shown both during reconnect-retry and when the user runs the
next command. Add isSessionOrStatement

Development

Successfully merging this pull request may close these issues.

[Feature request] CLI should support automatic reconnection when connection is lost during interactive session