-
Notifications
You must be signed in to change notification settings - Fork 324
Fix database context loss on transparent reconnection #4130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
paulmedynski
wants to merge
4
commits into
main
Choose a base branch
from
dev/paul/issue-4108
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Changes from all commits
Commits
Show all changes
4 commits
Select commit
Hold shift + click to select a range
73f2480
Add tests reproducing database context loss on reconnection (#4108)
paulmedynski 2599986
Added a database comparison at the end of session recovery that issue…
paulmedynski 8aae7e4
Further analysis and testing, trying to find a scenario that reliably…
paulmedynski 9263cda
Restricted the new tests to SQL Server only (no Azure, Synapse, Fabri…
paulmedynski File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,18 @@ | ||
| // Licensed to the .NET Foundation under one or more agreements. | ||
| // The .NET Foundation licenses this file to you under the MIT license. | ||
| // See the LICENSE file in the project root for more information. | ||
|
|
||
| // Configuration for the markdownlint VS Code extension. | ||
| // See https://github.com/DavidAnson/markdownlint for rule documentation. | ||
| { | ||
| // MD013 - Line length: enforce a maximum line length of 100 characters | ||
| "MD013": { | ||
| "line_length": 100, | ||
| "tables": false | ||
| }, | ||
| // MD024 - No duplicate headings: only flag duplicates among sibling headings | ||
| // (allows the same heading text under different parents) | ||
| "MD024": { | ||
| "siblings_only": true | ||
| } | ||
| } |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,68 @@ | ||||||||||
| # Database Context Preservation Across Internal Reconnections | ||||||||||
|
|
||||||||||
| ## Issue | ||||||||||
|
|
||||||||||
| [dotnet/SqlClient#4108](https://github.com/dotnet/SqlClient/issues/4108) — `SqlConnection` doesn't | ||||||||||
| restore database in the new session if connection is lost. | ||||||||||
|
|
||||||||||
| When a user changes the active database via `USE [db]` through `SqlCommand.ExecuteNonQuery()`, and | ||||||||||
| the physical connection subsequently breaks and is transparently reconnected, the recovered session | ||||||||||
| may land on the **initial catalog** from the connection string instead of the database the user | ||||||||||
| switched to. | ||||||||||
|
|
||||||||||
| ## Status | ||||||||||
|
|
||||||||||
| **Fix implemented and validated.** The root cause (Issue G) has been identified and fixed. See | ||||||||||
| [03-issues.md](03-issues.md) for the full issue list and [04-recommendations.md](04-recommendations.md) | ||||||||||
| for the fix details. Server-side analysis of the SQL Server engine's session recovery code confirms | ||||||||||
| the fix is correct — see [06-server-side-analysis.md](06-server-side-analysis.md). | ||||||||||
|
|
||||||||||
| ### Root Cause | ||||||||||
|
|
||||||||||
| `CompleteLogin()` in `SqlConnectionInternal.cs` trusted the server's `ENV_CHANGE` response | ||||||||||
| unconditionally after session recovery. If the server did not properly restore the database context, | ||||||||||
| the client silently ended up on the wrong database. | ||||||||||
|
|
||||||||||
| ### Server-Side Confirmation | ||||||||||
|
|
||||||||||
| Analysis of the SQL Server source code (`featureext.cpp`, `login.cpp`, `session.cpp`) confirms that | ||||||||||
| the server correctly implements session recovery for database context — the recovery database from | ||||||||||
| the ToBe chunk is treated as mandatory (`Source #0`) with no silent fallback. The root cause is | ||||||||||
| entirely **client-side**: `CompleteLogin()` did not verify the server's response matched the recovery | ||||||||||
| target. | ||||||||||
|
|
||||||||||
| ### Fix | ||||||||||
|
|
||||||||||
| After session recovery completes in `CompleteLogin()`, the fix compares `CurrentDatabase` (set by | ||||||||||
| the server's `ENV_CHANGE`) against the database from `_recoverySessionData`. If they differ, a | ||||||||||
| `USE [database]` command is sent to the server to force alignment. This guarantees both client and | ||||||||||
| server are on the correct database after recovery, regardless of server behavior. | ||||||||||
|
|
||||||||||
| The fix is gated behind the `Switch.Microsoft.Data.SqlClient.VerifyRecoveredDatabaseContext` | ||||||||||
| AppContext switch (default: `true`). Manual tests set it to `false` to confirm the server-only path | ||||||||||
| fails without the fix. | ||||||||||
|
Comment on lines
+42
to
+43
|
||||||||||
| AppContext switch (default: `true`). Manual tests set it to `false` to confirm the server-only path | |
| fails without the fix. | |
| AppContext switch (default: `false`). Manual tests use `false` to confirm the server-only path | |
| fails without the client-side fix. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,140 @@ | ||
| # Architecture: Database Context Tracking and Session Recovery | ||
|
|
||
| ## Key Data Structures | ||
|
|
||
| ### `SqlConnectionInternal` fields | ||
|
|
||
| | Field | Type | Set during | Purpose | | ||
| | ----- | ---- | ---------- | ------- | | ||
| | `CurrentDatabase` | `string` | Login, ENV_CHANGE | The database the server considers active right now | | ||
| | `_originalDatabase` | `string` | Login, constructor | Reset target for pool recycling (`ResetConnection()` restores to this value) | | ||
| | `_currentSessionData` | `SessionData` | Constructor | Live session state, snapshotted before reconnection | | ||
| | `_recoverySessionData` | `SessionData` | Constructor (param) | Saved session from the broken connection, used to build the recovery login packet | | ||
| | `_fConnectionOpen` | `bool` | `CompleteLogin()` | Guards whether ENV_CHANGE updates `_originalDatabase` | | ||
| | `_sessionRecoveryAcknowledged` | `bool` | `OnFeatureExtAck()` | Whether the server supports session recovery | | ||
|
|
||
| ### `SessionData` fields | ||
|
|
||
| | Field | Mutated by | Cleared by `Reset()` | Purpose | | ||
| | ----- | ---------- | -------------------- | ------- | | ||
| | `_initialDatabase` | `CompleteLogin()` (first login only) | No | Immutable baseline from the login the server confirmed | | ||
| | `_database` | `CurrentSessionData` getter | Yes (set to `null`) | Current database, written just-in-time before snapshot | | ||
| | `_initialLanguage` | `CompleteLogin()` | No | Immutable baseline language | | ||
| | `_language` | `CurrentSessionData` getter | Yes | Current language | | ||
| | `_initialCollation` | `CompleteLogin()` | No | Immutable baseline collation | | ||
| | `_collation` | `OnEnvChange()` | Yes | Current collation | | ||
| | `_delta[]` | `OnFeatureExtAck()`, `SQLSESSIONSTATE` token handler | Yes | Per-stateID session variable changes | | ||
| | `_initialState[]` | `OnFeatureExtAck()` (first login only) | No | Per-stateID session variable baselines | | ||
| | `_unrecoverableStatesCount` | `SQLSESSIONSTATE` token handler | Yes | Count of non-recoverable session states | | ||
|
|
||
| `Reset()` is called when `ENV_SPRESETCONNECTIONACK` arrives (server acknowledged | ||
| `sp_reset_connection`). It clears delta/current state but preserves the immutable baselines. | ||
|
|
||
| ## How `CurrentDatabase` is set | ||
|
|
||
| ### During login | ||
|
|
||
| ```text | ||
| Login() → CurrentDatabase = server.ResolvedDatabaseName | ||
| (= ConnectionOptions.InitialCatalog) | ||
| Server login response → ENV_CHANGE(ENV_DATABASE) → CurrentDatabase = newValue | ||
| CompleteLogin() → _currentSessionData._initialDatabase = CurrentDatabase | ||
| (only when _recoverySessionData == null, i.e. first login) | ||
| ``` | ||
|
|
||
| `SqlConnectionInternal.cs` line 2976—sets `CurrentDatabase` to `InitialCatalog` immediately. The | ||
| server then confirms (or overrides) via ENV_CHANGE before `CompleteLogin()` captures it. | ||
|
|
||
| ### During normal operation | ||
|
|
||
| ```text | ||
| USE [MyDb] via SqlCommand → server response → ENV_CHANGE(ENV_DATABASE) | ||
| OnEnvChange() → CurrentDatabase = "MyDb" | ||
| _originalDatabase NOT updated (guarded by _fConnectionOpen) | ||
| ``` | ||
|
|
||
| `SqlConnectionInternal.cs` lines 1155–1164. After the connection is open, `_originalDatabase` is frozen. | ||
|
|
||
| ### During pool reset | ||
|
|
||
| ```text | ||
| Deactivate() → ResetConnection() | ||
| → _parser.PrepareResetConnection() (sets TDS header flag for sp_reset_connection) | ||
| → CurrentDatabase = _originalDatabase (resets to initial catalog immediately) | ||
| ``` | ||
|
|
||
| `SqlConnectionInternal.cs` lines 3895–3907. | ||
|
|
||
| ### `CurrentSessionData` getter (just-in-time snapshot) | ||
|
|
||
| ```csharp | ||
| internal SessionData CurrentSessionData | ||
| { | ||
| get | ||
| { | ||
| if (_currentSessionData != null) | ||
| { | ||
| _currentSessionData._database = CurrentDatabase; | ||
| _currentSessionData._language = _currentLanguage; | ||
| } | ||
| return _currentSessionData; | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| `SqlConnectionInternal.cs` lines 530–537. This is called by `ValidateAndReconnect()` right before | ||
| saving recovery data for reconnection. | ||
|
|
||
| ## Session Recovery Protocol | ||
|
|
||
| When `ConnectRetryCount > 0` (default: **1**), the driver negotiates `FEATUREEXT_SRECOVERY` with the | ||
| server during login. On reconnection, `WriteSessionRecoveryFeatureRequest()` encodes: | ||
|
|
||
| 1. **Initial state**: `_initialDatabase`, `_initialCollation`, `_initialLanguage`, `_initialState[]` | ||
| 2. **Current deltas**: `_database` (if different from `_initialDatabase`), `_language`, | ||
| `_collation`, `_delta[]` | ||
|
|
||
| The server uses the initial state + deltas to rebuild the session. If `_database != | ||
| _initialDatabase`, the server switches to `_database` after login. | ||
|
|
||
| ### `WriteSessionRecoveryFeatureRequest` — relevant excerpt | ||
|
|
||
| ```text | ||
| TdsParser.cs line 8963: | ||
| initialLength += ... _initialDatabase ... | ||
| TdsParser.cs line 8966: | ||
| currentLength += ... (_initialDatabase == _database ? 0 : _database) ... | ||
| TdsParser.cs line 9017: | ||
| WriteIdentifier(_database != _initialDatabase ? _database : null, ...) | ||
| ``` | ||
|
|
||
| When `_database` equals `_initialDatabase`, a zero-length identifier is written (meaning "no | ||
| change"). When they differ, the current database name is written and the server applies it. | ||
|
|
||
| ## Flow: How `ValidateAndReconnect` triggers recovery | ||
|
|
||
| ```text | ||
| SqlCommand.RunExecuteNonQueryTds() | ||
| → SqlConnection.ValidateAndReconnect() | ||
| check _connectRetryCount > 0 | ||
| check _sessionRecoveryAcknowledged | ||
| check !stateObj.ValidateSNIConnection() ← physical connection broken? | ||
| SessionData cData = tdsConn.CurrentSessionData ← snapshot (writes _database = CurrentDatabase) | ||
| _recoverySessionData = cData ← save for new connection | ||
| tdsConn.DoomThisConnection() | ||
| Task.Run → ReconnectAsync() | ||
| → ForceNewConnection = true | ||
| → OpenAsync() | ||
| → TryReplaceConnection() | ||
| → SqlConnectionFactory.CreateConnection() | ||
| → new SqlConnectionInternal(..., recoverySessionData) | ||
| constructor: _originalDatabase = recoverySessionData._initialDatabase | ||
| Login() | ||
| → CurrentDatabase = InitialCatalog | ||
| → login.database = CurrentDatabase | ||
| → TdsLogin(..., _recoverySessionData, ...) | ||
| → WriteSessionRecoveryFeatureRequest(recoverySessionData, ...) | ||
| Server processes login + recovery → ENV_CHANGE(ENV_DATABASE) → CurrentDatabase updated | ||
| CompleteLogin() | ||
| → _recoverySessionData = null | ||
| ``` |
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doc says
VerifyRecoveredDatabaseContextdefaults totrue, butLocalAppContextSwitches.VerifyRecoveredDatabaseContextcurrently defaults tofalse. Please update this document (or the switch default) so the documented behavior matches the implementation.