Fix command queue corruption on encoding failures #3443

yangy0000 · 2025-09-13T05:27:52Z

Summary

This PR fixes an issue where encoding failures could corrupt the outstanding command queue, causing subsequent responses to be matched to wrong commands and leading to out-of-sync command/response handling.

Closes #2012

Root Cause: When command encoding fails, the failed command remains in the outstanding command stack without proper cleanup, corrupting the command-response matching mechanism.
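A minimal sketch of how the mismatch described above can arise. It uses a simplified, hypothetical model of a pipelined client (`QueueDesyncDemo` and its methods are illustrative names, not Lettuce classes), assuming the command is queued before it is encoded:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical, simplified model of a pipelined client; not Lettuce's real classes.
class QueueDesyncDemo {

    // Commands awaiting a response, in the order they were issued.
    private final Deque<String> outstanding = new ArrayDeque<>();

    // Commands that actually reached the server.
    private final List<String> wire = new ArrayList<>();

    // Records which reply each command was completed with.
    final List<String> completions = new ArrayList<>();

    // The command is queued first and encoded afterwards.
    void send(String command, boolean encodingFails) {
        outstanding.add(command);
        if (encodingFails) {
            return; // nothing is written, but the queue still holds the command
        }
        wire.add(command);
    }

    // The server answers only what it received; each reply completes the queue head.
    void drainResponses() {
        for (String sent : wire) {
            String head = outstanding.poll();
            completions.add(head + " <- reply to " + sent);
        }
        wire.clear();
    }

    static List<String> run() {
        QueueDesyncDemo client = new QueueDesyncDemo();
        client.send("SET a", true); // encoding fails, never written
        client.send("GET b", false);
        client.send("GET c", false);
        client.drainResponses();
        return client.completions;
    }
}
```

In this model `run()` pairs "SET a" with the reply to "GET b" and "GET b" with the reply to "GET c", while "GET c" never completes. Every later command is shifted by one in the same way.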

Solution:

Add encoding error tracking to RedisCommand interface with markEncodingError() and hasEncodingError() methods
Mark commands with encoding errors in CommandEncoder when encode failures occur
Implement lazy cleanup of encoding-failed commands in CommandHandler response processing
Add comprehensive unit and integration tests for encoding error scenarios
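The marking and lazy cleanup steps listed above could look roughly like the sketch below. Only the method names `markEncodingError()` and `hasEncodingError()` come from this description; `SketchCommand`, `LazyCleanupSketch` and everything else are hypothetical stand-ins, not the actual changes in this PR:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a command; only the two marker method names come from the PR text.
class SketchCommand {
    final String name;
    String result;

    // volatile: written on the encoding path, read on the response path.
    private volatile boolean encodingError;

    SketchCommand(String name) {
        this.name = name;
    }

    void markEncodingError() {
        encodingError = true;
    }

    boolean hasEncodingError() {
        return encodingError;
    }
}

// Hypothetical response handler that cleans up lazily, when the next response arrives.
class LazyCleanupSketch {
    private final Deque<SketchCommand> outstanding = new ArrayDeque<>();

    void enqueue(SketchCommand command) {
        outstanding.add(command);
    }

    // Discard head entries that were never written, then complete the next command.
    SketchCommand onResponse(String response) {
        while (!outstanding.isEmpty() && outstanding.peek().hasEncodingError()) {
            outstanding.poll();
        }
        SketchCommand head = outstanding.poll();
        if (head != null) {
            head.result = response;
        }
        return head;
    }

    int pending() {
        return outstanding.size();
    }
}
```

One consequence of cleaning up lazily in this sketch is that a failed command stays in the queue until some later response arrives, so it still needs to be completed exceptionally elsewhere.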

Changes Made

RedisCommand Interface: Added encoding error tracking methods
Command Class: Implemented encoding error flag with volatile boolean for thread safety
CommandEncoder: Mark commands on encoding failures
CommandHandler: Enhanced response processing to handle encoding-failed commands
All RedisCommand Implementations: Updated to support encoding error tracking
Tests: Added comprehensive unit tests (EncodingErrorHandlingTests) and integration tests (EncodingErrorIntegrationTests, CommandEncodingErrorIntegrationTests)

Test Plan

Unit tests covering encoding error tracking and cleanup logic
Integration tests simulating real encoding failure scenarios
Existing test suite passes
Manual testing with deliberate encoding failures

Fixes

This resolves command queue corruption issues that could occur when Redis commands fail during encoding, ensuring proper command-response synchronization is maintained.

tishun · 2025-12-05T15:47:12Z

oopsie

Summary: Add encoding error tracking to prevent command queue corruption

- Add markEncodingError() and hasEncodingError() methods to RedisCommand interface
- Implement encoding error flag in Command class with volatile boolean
- Mark commands with encoding errors in CommandEncoder on encode failures
- Add lazy cleanup of encoding failures in CommandHandler response processing
- Update all RedisCommand implementations to support encoding error tracking
- Add comprehensive unit tests and integration tests for encoding error handling

Fixes issue where encoding failures could corrupt the outstanding command queue by leaving failed commands in the stack without proper cleanup, causing responses to be matched to wrong commands.

Test Plan: UTs, Integration testing
Reviewers: yayang, ureview
Reviewed By: yayang
Tags: #has_java
JIRA Issues: REDIS-14050
Differential Revision: https://code.uberinternal.com/D19068147

…coding failure

Summary: Fix error command handling code logic and add integration test for encoding failure

Test Plan: unittest, integration test
Reviewers: #ldap_storage_sre_cache, ureview, jingzhao
Reviewed By: #ldap_storage_sre_cache, jingzhao
Tags: #has_java
JIRA Issues: REDIS-14192
Differential Revision: https://code.uberinternal.com/D19271701

Addressing some general cases

jit-ci

❌ The following Jit checks failed to run:

secret-detection-trufflehog
static-code-analysis-semgrep-pro

#jit_bypass_commit in this PR to bypass, Jit Admin privileges required.

More info in the Jit platform.

tishun · 2025-12-05T23:25:44Z

Addressing the most visible places where out-of-order processing might be caused by an invalid state of the driver or the JVM, most notably:

Encoding exceptions in the RedisCodec implementations

Based on the original PR

Observation:

The CommandEncoder is called after the CommandHandler has assumed the commands are written on the wire and has put them in the stack of issued commands. However, the CommandEncoder uses different encoders, which could be custom user-provided encoders, and they might throw an Exception. As explained by @mp911de in #2012 (comment), the driver then enters an invalid state and a disconnect is advised.

Test case:

CodecFailureIntegrationTests

Solution:

Users are expected to close the connection; however, nobody reads exception messages in their error handlers.
Furthermore, some errors, such as (but not limited to) OutOfMemoryError, cannot be avoided in the code.

As such, a more radical approach should be taken: directly closing the connection.
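A minimal sketch of the "close instead of recover" idea, assuming a simplified connection that queues a future per command before encoding. `FailFastConnectionSketch` and its methods are hypothetical and do not reflect the actual CommandHandler or CommandEncoder code:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical model of a connection; not Lettuce's CommandHandler or CommandEncoder.
class FailFastConnectionSketch {
    private final Deque<CompletableFuture<String>> outstanding = new ArrayDeque<>();
    private boolean open = true;

    boolean isOpen() {
        return open;
    }

    // Queue the command, then encode the key with a (possibly user-provided) codec.
    CompletableFuture<String> write(String key, Function<String, byte[]> codec) {
        CompletableFuture<String> future = new CompletableFuture<>();
        if (!open) {
            future.completeExceptionally(new IllegalStateException("connection closed"));
            return future;
        }
        outstanding.add(future);
        try {
            codec.apply(key);
        } catch (Throwable t) {
            // No attempt to repair the queue: fail everything pending and close.
            close(t);
        }
        return future;
    }

    private void close(Throwable cause) {
        open = false;
        CompletableFuture<String> pending;
        while ((pending = outstanding.poll()) != null) {
            pending.completeExceptionally(cause);
        }
    }
}
```

In this sketch a single codec failure fails every pending command, including ones that were encoded correctly. That is the price of never having to reason about a partially consistent queue.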

Alternatives:

The original suggestion by @yangy0000 (as well as my first attempt) aimed to recover from this error state. However, the driver is very complex and there are a myriad of invariants involving time-outs and batching that might be hard to handle.

To avoid reaching an inconsistent state we assume that such errors are not recoverable.

Throwable errors in the Reactive chain

Observation:

The Lettuce driver, by default, uses the IO thread pool (unless configured otherwise) to process the reactive streams.
As a result, an OutOfMemoryError would essentially break the IO thread loop's processing for both the stream and the whole command.

Test case:

ReactiveStreamErrorIntegrationTests

Solution:

Catch Throwable instead of Exception and process the failure.
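The difference between the two catch clauses can be illustrated with a small sketch. All names here are hypothetical, and `SimulatedFatalError` merely stands in for something like an OutOfMemoryError raised inside a subscriber callback:

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch; not the actual reactive code path in the driver.
class ThrowableHandlingSketch {

    // Stand-in for a JVM Error such as OutOfMemoryError.
    static class SimulatedFatalError extends Error {
        SimulatedFatalError(String message) {
            super(message);
        }
    }

    // Catches only Exception: an Error escapes and unwinds into the caller (the IO loop).
    static boolean deliverCatchingException(Consumer<String> subscriber, String item, List<Throwable> failures) {
        try {
            subscriber.accept(item);
            return true;
        } catch (Exception e) {
            failures.add(e);
            return false;
        }
    }

    // Catches Throwable: the Error is recorded so the command can be failed explicitly.
    static boolean deliverCatchingThrowable(Consumer<String> subscriber, String item, List<Throwable> failures) {
        try {
            subscriber.accept(item);
            return true;
        } catch (Throwable t) {
            failures.add(t);
            return false;
        }
    }
}
```

With a subscriber that throws `SimulatedFatalError`, the first method lets the error propagate to its caller, while the second returns `false` and records the failure. Whether the JVM is still healthy after a real OutOfMemoryError is a separate question that this sketch does not address.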

tishun · 2025-12-05T23:27:35Z

Suggest crossporting this to:

7.2.x
7.1.x
7.0.x
6.8.x

yangy0000 changed the title ~~Fix command queue corruption on encoding failuresFix encoding error command sync~~ Fix command queue corruption on encoding failures Sep 13, 2025

yangy0000 mentioned this pull request Sep 13, 2025: Responses getting out of sync with requests after failed command encoding #2012 (Open)

tishun force-pushed the fix-encoding-error-command-sync branch from db93265 to 2100f8f Compare December 5, 2025 15:52

Jing9 and others added 7 commits December 6, 2025 00:59

latest changes (5296624)
Addressing the reactive streams issue (a2ee98d)
Addressing the encoding issues (4a1331a)
Addressing some general cases
Formatting issues (5721e86)
Test failures addressed (e875bac)

tishun force-pushed the fix-encoding-error-command-sync branch from 427ef97 to e875bac Compare December 5, 2025 22:59

jit-ci bot reviewed Dec 5, 2025


Polishing (7dd91b0)

tishun requested review from a-TODO-rov and ggivo December 5, 2025 23:26

tishun added the "type: bug" (A general bug) label Dec 5, 2025


yangy0000 commented Sep 13, 2025 • edited by tishun


3 participants
