@yangy0000 yangy0000 commented Sep 13, 2025

Summary

This PR fixes an issue where encoding failures could corrupt the outstanding command queue, causing subsequent responses to be matched to wrong commands and leading to out-of-sync command/response handling.

Closes #2012

Root Cause: When command encoding fails, the failed command remains in the outstanding command stack without proper cleanup, corrupting the command-response matching mechanism.

Solution:

  • Add encoding error tracking to RedisCommand interface with markEncodingError() and hasEncodingError() methods
  • Mark commands with encoding errors in CommandEncoder when encode failures occur
  • Implement lazy cleanup of encoding-failed commands in CommandHandler response processing
  • Add comprehensive unit and integration tests for encoding error scenarios
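
The flag-and-lazy-cleanup idea from the list above can be sketched roughly as follows. Only the method names `markEncodingError()` and `hasEncodingError()` come from this description; `TrackedCommand`, `SimpleCommand`, and `LazyCleanupQueue` are hypothetical, simplified stand-ins and not the actual Lettuce classes.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified stand-in for the RedisCommand interface.
interface TrackedCommand {
    void markEncodingError();
    boolean hasEncodingError();
    void complete(String response);
}

final class SimpleCommand implements TrackedCommand {
    final String name;
    // volatile: the flag is set on the encoding path and read on the response path
    private volatile boolean encodingError;
    String response;

    SimpleCommand(String name) { this.name = name; }

    @Override public void markEncodingError() { encodingError = true; }
    @Override public boolean hasEncodingError() { return encodingError; }
    @Override public void complete(String response) { this.response = response; }
}

// Hypothetical stand-in for the outstanding-command bookkeeping.
final class LazyCleanupQueue {
    private final Deque<TrackedCommand> outstanding = new ArrayDeque<>();

    void add(TrackedCommand command) { outstanding.addLast(command); }

    // Lazily drop commands that never reached the wire, then hand the
    // response to the first command that was actually sent.
    void onResponse(String response) {
        while (!outstanding.isEmpty() && outstanding.peekFirst().hasEncodingError()) {
            outstanding.pollFirst();
        }
        TrackedCommand head = outstanding.pollFirst();
        if (head != null) {
            head.complete(response);
        }
    }

    int size() { return outstanding.size(); }
}
```

In this sketch a command marked with `markEncodingError()` is skipped the next time a response arrives, so the response reaches the command that was actually written.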

Changes Made

  • RedisCommand Interface: Added encoding error tracking methods
  • Command Class: Implemented encoding error flag with volatile boolean for thread safety
  • CommandEncoder: Mark commands on encoding failures
  • CommandHandler: Enhanced response processing to handle encoding-failed commands
  • All RedisCommand Implementations: Updated to support encoding error tracking
  • Tests: Added comprehensive unit tests (EncodingErrorHandlingTests) and integration tests (EncodingErrorIntegrationTests, CommandEncodingErrorIntegrationTests)

Test Plan

  • Unit tests covering encoding error tracking and cleanup logic
  • Integration tests simulating real encoding failure scenarios
  • Existing test suite passes
  • Manual testing with deliberate encoding failures

Fixes

This resolves command queue corruption issues that could occur when Redis commands fail during encoding, ensuring proper command-response synchronization is maintained.

@yangy0000 yangy0000 changed the title from "Fix command queue corruption on encoding failuresFix encoding error command sync" to "Fix command queue corruption on encoding failures" Sep 13, 2025
tishun (Collaborator) commented Dec 5, 2025

oopsie

@tishun tishun force-pushed the fix-encoding-error-command-sync branch from db93265 to 2100f8f Compare December 5, 2025 15:52
Jing9 and others added 7 commits December 6, 2025 00:59
Summary:
Add encoding error tracking to prevent command queue corruption

  - Add markEncodingError() and hasEncodingError() methods to RedisCommand interface
  - Implement encoding error flag in Command class with volatile boolean
  - Mark commands with encoding errors in CommandEncoder on encode failures
  - Add lazy cleanup of encoding failures in CommandHandler response processing
  - Update all RedisCommand implementations to support encoding error tracking
  - Add comprehensive unit tests and integration tests for encoding error handling

Fixes issue where encoding failures could corrupt the outstanding command queue by leaving failed commands in the stack without proper cleanup, causing responses to be matched to wrong commands.

Test Plan: UTs, Integration testing

Reviewers: yayang, ureview

Reviewed By: yayang

Tags: #has_java

JIRA Issues: REDIS-14050

Differential Revision: https://code.uberinternal.com/D19068147
…coding failure

Summary: Fix error command handling code logic and add integration test for encoding failure

Test Plan: unittest, integration test

Reviewers: #ldap_storage_sre_cache, ureview, jingzhao

Reviewed By: #ldap_storage_sre_cache, jingzhao

Tags: #has_java

JIRA Issues: REDIS-14192

Differential Revision: https://code.uberinternal.com/D19271701
Addressing some general cases
@tishun tishun force-pushed the fix-encoding-error-command-sync branch from 427ef97 to e875bac Compare December 5, 2025 22:59
@jit-ci jit-ci bot left a comment

❌ The following Jit checks failed to run:

  • secret-detection-trufflehog
  • static-code-analysis-semgrep-pro

#jit_bypass_commit in this PR to bypass, Jit Admin privileges required.

More info in the Jit platform.

tishun (Collaborator) commented Dec 5, 2025

Addressing the most visible places where out-of-order processing can be caused by an invalid state of the driver or the JVM, most notably:

Encoding exceptions in the RedisCodec implementations

Based on the original PR

Observation:

The CommandEncoder is called after the CommandHandler has already assumed the commands to be written on the wire and has put them in the stack of issued commands. However, the CommandEncoder delegates to codecs, which could be custom user-provided implementations, and these might throw an exception. As explained by @mp911de in #2012 (comment), the driver then enters an invalid state and a disconnect is advised.
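
A minimal, self-contained simulation of that ordering problem, assuming a naive FIFO matcher; `DesyncDemo` and its helpers are hypothetical and do not use any Lettuce or Netty types:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class DesyncDemo {

    // Returns "command<-response" pairings as a naive FIFO matcher would produce them.
    static List<String> run() {
        Deque<String> outstanding = new ArrayDeque<>(); // commands assumed to be on the wire
        List<String> wire = new ArrayList<>();          // commands actually encoded and sent

        for (String command : new String[] {"GET a", "SET bad", "GET b"}) {
            outstanding.addLast(command);               // registered before encoding happens
            try {
                encode(command, wire);
            } catch (IllegalStateException e) {
                // The failure is reported, but the command stays in 'outstanding'.
            }
        }

        List<String> pairings = new ArrayList<>();
        for (String sent : wire) {
            String response = "reply(" + sent + ")";    // the server only answers what it received
            pairings.add(outstanding.pollFirst() + "<-" + response);
        }
        return pairings;
    }

    // Stand-in for a user-provided codec that rejects one value.
    static void encode(String command, List<String> wire) {
        if (command.contains("bad")) {
            throw new IllegalStateException("codec failure");
        }
        wire.add(command);
    }
}
```

Here the second pairing is `SET bad<-reply(GET b)`: the command that never left the client consumes the reply meant for `GET b`, and `GET b` itself is left waiting.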

Test case:

CodecFailureIntegrationTests

Solution:

Users are expected to close the connection; however, nobody reads exception messages in their error handlers.
Furthermore, some failures, such as (but not limited to) OutOfMemoryError, cannot be prevented in the code.

As such, a more radical approach should be taken: closing the connection directly.
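
As a rough illustration of the fail-fast approach (not the code in this PR), the sketch below closes a toy connection and fails every outstanding command as soon as encoding throws. `FailFastConnection` and its nested `Encoder` interface are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.CompletableFuture;

final class FailFastConnection {

    // Hypothetical stand-in for a codec that may throw.
    interface Encoder {
        byte[] encode(String command);
    }

    private final Deque<CompletableFuture<String>> outstanding = new ArrayDeque<>();
    private boolean open = true;

    CompletableFuture<String> write(String command, Encoder encoder) {
        CompletableFuture<String> result = new CompletableFuture<>();
        if (!open) {
            result.completeExceptionally(new IllegalStateException("connection closed"));
            return result;
        }
        outstanding.addLast(result);
        try {
            encoder.encode(command);
        } catch (Throwable t) {
            // Do not try to repair the queue: treat the failure as unrecoverable.
            close(t);
        }
        return result;
    }

    // Mark the connection closed and fail everything that is still pending.
    private void close(Throwable cause) {
        open = false;
        CompletableFuture<String> pending;
        while ((pending = outstanding.pollFirst()) != null) {
            pending.completeExceptionally(cause);
        }
    }

    boolean isOpen() {
        return open;
    }
}
```

The design choice shown is that no attempt is made to realign the queue; callers see failed commands and a closed connection and can reconnect from a clean state.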

Alternatives:

The original suggestion by @yangy0000 (as well as my first attempt) aimed to recover from this error state. However, the driver is very complex, and there are a myriad of invariants involving time-outs and batching that might be hard to handle.

To avoid reaching an inconsistent state, we assume that such errors are not recoverable.

Throwable errors in the Reactive chain

Observation:

The Lettuce driver, by default, uses the I/O thread pool (unless configured otherwise) to process the reactive streams.
As a result, an OutOfMemoryError would essentially break the I/O thread loop's processing for both the stream and the whole command.

Test case:

ReactiveStreamErrorIntegrationTests

Solution:

Catch Throwable instead of Exception and process the failure.
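
The difference matters because `OutOfMemoryError` extends `Error`, not `Exception`. A small sketch with hypothetical names (`ThrowableHandling`, a simulated error rather than a real memory exhaustion):

```java
final class ThrowableHandling {

    // catch (Exception) does not intercept Error subclasses; they propagate to the caller.
    static String withCatchException(Runnable step) {
        try {
            step.run();
            return "completed";
        } catch (Exception e) {
            return "handled " + e.getClass().getSimpleName();
        }
    }

    // catch (Throwable) also sees Error subclasses, so the failure can be processed.
    static String withCatchThrowable(Runnable step) {
        try {
            step.run();
            return "completed";
        } catch (Throwable t) {
            return "handled " + t.getClass().getSimpleName();
        }
    }
}
```

Passing a step that throws `new OutOfMemoryError("simulated")` to `withCatchThrowable` returns a handled result, while the same step passed to `withCatchException` lets the error escape to the calling thread.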

@tishun tishun requested review from a-TODO-rov and ggivo December 5, 2025 23:26
@tishun tishun added the type: bug A general bug label Dec 5, 2025
tishun (Collaborator) commented Dec 5, 2025

Suggest crossporting this to:

  • 7.2.x
  • 7.1.x
  • 7.0.x
  • 6.8.x

Development

Successfully merging this pull request may close these issues.

Responses getting out of sync with requests after failed command encoding

3 participants