Fix _Once.ensure() to propagate handshake failure to concurrent waiters #3397
Open
bysiber wants to merge 1 commit into python-trio:main from
When two tasks use an SSLStream concurrently (one sending, one receiving), both call _Once.ensure() to trigger the lazy handshake. If the first task starts the handshake and it fails (e.g. certificate error, connection reset), the exception propagates to that task but _done is never set. The second task, already waiting on _done.wait(), blocks indefinitely — the Event is never signalled and started is permanently True, so there is no recovery path. Store the failure exception and set _done even on error, so that all waiters wake up and receive a BrokenResourceError chained from the original handshake exception.
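To make the failure mode concrete, here is a minimal sketch of the hang described above. It is not trio's code: `Once` is a simplified stand-in for `_Once`, `asyncio.Event` stands in for `trio.Event` so the example runs on the standard library alone, and `failing_handshake` is a hypothetical handshake that always fails.

```python
import asyncio


class Once:
    """Simplified stand-in for the pre-fix _Once (an assumption, not trio's source)."""

    def __init__(self, afn, *args):
        self._afn = afn
        self._args = args
        self.started = False
        self._done = asyncio.Event()

    async def ensure(self):
        if not self.started:
            self.started = True
            # If this raises, the next line is skipped and _done is never set.
            await self._afn(*self._args)
            self._done.set()
        else:
            await self._done.wait()


async def failing_handshake():
    # Yield once so the second task gets a chance to start waiting.
    await asyncio.sleep(0)
    raise ConnectionResetError("handshake failed")


async def demo():
    once = Once(failing_handshake)

    async def first():
        try:
            await once.ensure()
        except ConnectionResetError:
            return "raised"
        return "returned"

    async def second():
        try:
            # The timeout exists only so the demo terminates; without it
            # this task would wait forever.
            await asyncio.wait_for(once.ensure(), timeout=0.1)
        except asyncio.TimeoutError:
            return "hung"
        return "returned"

    return await asyncio.gather(first(), second())


results = asyncio.run(demo())
print(results)
```

The first task sees the handshake error directly, while the second only gets out of `ensure()` because the demo wraps it in a timeout.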
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@               Coverage Diff                @@
##                 main       #3397     +/-   ##
================================================
- Coverage   100.00000%   99.97942%   -0.02058%
================================================
  Files             128         128
  Lines           19424       19434   +10
  Branches         1318        1320    +2
================================================
+ Hits            19424       19430    +6
- Misses              0           2    +2
- Partials            0           2    +2
A5rocks requested changes on Feb 20, 2026
A5rocks (Contributor) left a comment:
Could you create an issue before making a PR fixing a bug like this? I'd like to confirm that this is a real problem first!
    try:
        await self._afn(*self._args)
    except BaseException as exc:
        self._failure = exc
Contributor
This is a bad idea, because a) the stack frames for exc will be mutated, so you will have weird stack traces for raise ... from self._failure and b) this will lead to a refcycle I believe.
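Point a) can be illustrated with a small standard-library sketch. It is not code from the PR: `state["failure"]` plays the role of `self._failure`, and `handshake`, `ensure`, and `first_task` are hypothetical plain functions. The stored exception is the same object that keeps propagating, so its traceback keeps growing after it has been saved.

```python
import traceback

state = {}  # plays the role of self._failure on the _Once object


def handshake():
    raise ConnectionResetError("handshake failed")


def ensure():
    try:
        handshake()
    except BaseException as exc:
        state["failure"] = exc
        # Number of traceback entries at the moment the exception is stored.
        state["depth_when_stored"] = len(traceback.extract_tb(exc.__traceback__))
        raise


def first_task():
    ensure()


try:
    first_task()
except ConnectionResetError:
    pass

depth_when_stored = state["depth_when_stored"]
# The same exception object has since unwound through first_task() and the
# module level, so its traceback now has additional entries.
depth_after_unwind = len(traceback.extract_tb(state["failure"].__traceback__))
print(depth_when_stored, depth_after_unwind)
```

A waiter that later does `raise ... from` the stored exception would therefore display frames from the first task's unwinding. Point b) plausibly follows from the same mechanism: the traceback keeps the `ensure()` frame alive, that frame references `self`, and `self._failure` references the exception.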
Problem
_Once.ensure() in SSLStream can leave concurrent waiters hanging forever when the handshake fails.

When two tasks share an SSLStream (one sending, one receiving), both call ensure() to lazily perform the TLS handshake. The first task sets started = True and begins the handshake. The second task sees started=True, finds _done not yet set, and enters _done.wait().

If the handshake fails (certificate error, connection reset, etc.), the exception propagates to the first task, but _done.set() is never called. The second task is stuck forever in _done.wait(): the Event will never be signalled, and started is permanently True, so re-entry won't help either.

Reproduction scenario
- send_all() → enters ensure(), starts handshake
- receive_some() → enters ensure(), waits on _done
- BrokenResourceError: correct

Fix
Store the exception on failure and still signal _done, so that concurrent waiters (and any future callers) wake up and receive a BrokenResourceError chained from the original handshake exception.
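A rough sketch of the approach this description outlines is shown below. It is an illustration under assumptions, not the PR's actual diff: `Once` is a simplified stand-in for `_Once`, `asyncio.Event` replaces `trio.Event`, `BrokenResourceError` is a local stand-in for `trio.BrokenResourceError`, and `failing_handshake` is hypothetical.

```python
import asyncio


class BrokenResourceError(Exception):
    """Local stand-in for trio.BrokenResourceError (assumption for this sketch)."""


class Once:
    """Simplified stand-in for _Once with the failure-propagation idea applied."""

    def __init__(self, afn, *args):
        self._afn = afn
        self._args = args
        self.started = False
        self._done = asyncio.Event()
        self._failure = None

    async def ensure(self):
        if not self.started:
            self.started = True
            try:
                await self._afn(*self._args)
            except BaseException as exc:
                # Remember the failure for waiters, then let it propagate.
                self._failure = exc
                raise
            finally:
                # Wake waiters on success and on failure alike.
                self._done.set()
        else:
            await self._done.wait()
            if self._failure is not None:
                raise BrokenResourceError("handshake failed") from self._failure


async def failing_handshake():
    await asyncio.sleep(0)  # let the second task start waiting first
    raise ConnectionResetError("handshake failed")


async def demo():
    once = Once(failing_handshake)
    outcomes = {}

    async def first():
        try:
            await once.ensure()
        except ConnectionResetError as exc:
            outcomes["first"] = type(exc).__name__

    async def second():
        try:
            await asyncio.wait_for(once.ensure(), timeout=1)
        except BrokenResourceError as exc:
            outcomes["second"] = type(exc).__name__
            outcomes["cause"] = type(exc.__cause__).__name__

    await asyncio.gather(first(), second())
    return outcomes


outcomes = asyncio.run(demo())
print(outcomes)
```

Holding on to the exception object is exactly what the review comment objects to, so this sketch only mirrors the idea as described here; it does not settle whether storing the exception is the right design.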