-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Fix command queue corruption on encoding failures #3443
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
oopsie |
db93265 to
2100f8f
Compare
Summary: Add encoding error tracking to prevent command queue corruption - Add markEncodingError() and hasEncodingError() methods to RedisCommand interface - Implement encoding error flag in Command class with volatile boolean - Mark commands with encoding errors in CommandEncoder on encode failures - Add lazy cleanup of encoding failures in CommandHandler response processing - Update all RedisCommand implementations to support encoding error tracking - Add comprehensive unit tests and integration tests for encoding error handling Fixes issue where encoding failures could corrupt the outstanding command queue by leaving failed commands in the stack without proper cleanup, causing responses to be matched to wrong commands. Test Plan: UTs, Integration testing Reviewers: yayang, ureview Reviewed By: yayang Tags: #has_java JIRA Issues: REDIS-14050 Differential Revision: https://code.uberinternal.com/D19068147
…coding failure Summary: Fix error command handling code logic and add integration test for encoding failure Test Plan: unittest, integration test Reviewers: #ldap_storage_sre_cache, ureview, jingzhao Reviewed By: #ldap_storage_sre_cache, jingzhao Tags: #has_java JIRA Issues: REDIS-14192 Differential Revision: https://code.uberinternal.com/D19271701
Addressing some general cases
427ef97 to
e875bac
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ The following Jit checks failed to run:
- secret-detection-trufflehog
- static-code-analysis-semgrep-pro
#jit_bypass_commit in this PR to bypass, Jit Admin privileges required.
More info in the Jit platform.
|
Addressing the most visible places where out-of-order processing might be caused due to invalid state of the driver or the JVM, most notably : Encoding exceptions in the RedisCodec implementationsBased on the original PR Observation:The Test case:
Solution:Users are expected to close the connection, however nobody reads exception messages in their error handlers. As such a more radical approach should be taken by directly closing the connection. Alternatives:The original suggestion by @yangy0000 (as well as my first attempt) were aiming to recover from this error state. However the driver is very complex and there are a myriad of invariants involving time-outs and batching that might be hard to handle. To avoid reaching and inconsistent state we assume that such errors are not recoverable. Throwable errors in the Reactive chainObservation:The Lettuce driver - by default - uses the IO thread pull (unless configured otherwise) to process the reactive streams. Test case:
Solution:Catch Throwable instead of Exception and process the failure. |
|
Suggest crossporting this to:
|
Summary
This PR fixes an issue where encoding failures could corrupt the outstanding command queue, causing subsequent responses to be matched to wrong commands and leading to out-of-sync command/response handling.
Closes #2012
Root Cause: When command encoding fails, the failed command remains in the outstanding command stack without proper cleanup, corrupting the command-response matching mechanism.
Solution:
markEncodingError()andhasEncodingError()methodsChanges Made
EncodingErrorHandlingTests) and integration tests (EncodingErrorIntegrationTests,CommandEncodingErrorIntegrationTests)Test Plan
Fixes
This resolves command queue corruption issues that could occur when Redis commands fail during encoding, ensuring proper command-response synchronization is maintained.