cachedb_redis: add ASK redirect handling for cluster resharding by dondetir · Pull Request #3852 · OpenSIPS/opensips

dondetir · 2026-03-29T16:29:56Z

Summary

Add support for Redis ASK redirects during cluster resharding, completing the cluster redirect handling started in PR #3639.

Background

When a Redis cluster is resharding (migrating slots between nodes), the source node returns an ASK response instead of MOVED. The key difference:

MOVED = permanent redirect (slot ownership changed) — already handled by PR Add Redis MOVED Redirection support #3639
ASK = temporary redirect (slot is mid-migration) — requires sending ASKING to the target node before retrying

Without ASK handling, queries during resharding operations fail with unhandled error responses.

Implementation

Detects ASK responses alongside existing MOVED handling in the query retry loop
Sends ASKING command to the target node before retrying the original query
Does NOT update the slot table (correct per Redis spec — ASK is one-time, unlike MOVED)
Refactors parse_moved_reply() into parse_redirect_reply() that accepts the prefix as a parameter, with inline wrappers parse_moved_reply() and parse_ask_reply() for backward compatibility

Note on hash slot fix

The original version of this PR also included a hash slot calculation fix. That has been removed per @NormB's feedback, as his PR #3815 addresses this more comprehensively (including hash tag support). This PR now contains only the ASK redirect handling.

Testing

Builds clean, zero warnings
All 2545 unit tests pass
@NormB verified the parse_redirect_reply() refactor passes integration tests with live slot migrations across a 3-node cluster

Partially addresses #2811

NormB · 2026-03-29T20:46:24Z

This appears to be a partial duplicate of #3815

dondetir · 2026-03-30T03:28:13Z

Thanks for pointing that out @NormB. You're right — the hash slot fix (& con->slots_assigned → % REDIS_CLUSTER_SLOTS) overlaps with your PR #3815, which also adds hash tag support that mine doesn't cover.

The unique part of this PR is the ASK redirect handling (detection + ASKING command before retry), which completes the cluster redirect support alongside the MOVED handling you added in #3639.

Happy to rebase and drop the hash slot change once #3815 merges, keeping only the ASK redirect logic. Or if you'd prefer, I can split this into a separate ASK-only PR right now. Let me know what works best.

Add support for Redis ASK redirects during cluster resharding. When a slot is being migrated between nodes, Redis returns an ASK response instead of MOVED. Unlike MOVED (permanent redirect), ASK is a one-time redirect that requires sending the ASKING command to the target node before retrying the original query. The implementation: - Detects ASK responses alongside existing MOVED handling - Sends ASKING command to the target node before retrying - Reuses the MOVED redirect infrastructure (endpoint lookup, reconnection, retry logic) Also refactor parse_moved_reply() into parse_redirect_reply() that accepts the prefix as a parameter, with inline wrappers parse_moved_reply() and parse_ask_reply() for backward compatibility. Partially addresses OpenSIPS#2811

dondetir · 2026-03-30T13:18:31Z

Rebased to remove the hash slot fix per @NormB's feedback — this PR now contains only the ASK redirect handling + parse_redirect_reply() refactor.

The hash slot calculation change is properly covered by #3815 (which also adds hash tag support).

Force-pushed with the updated commit.

With use_tls=1 on cachedb_redis, every SIP worker that touches a Redis/Valkey connection SIGSEGVs on the first SSL_new. Two independent problems stacked on the same line: 1. The old code indexed d->ctx as a per-process array: SSL_new(((void**)d->ctx)[process_no]) This predates the tls_mgm refactor (upstream 11aa15d, 'refactor TCP/TLS code to have everything in a single process'), after which ctx became a single SSL_CTX *. Every worker with process_no > 0 read garbage past the SSL_CTX struct. 2. Even after fixing the dereference, d->ctx is not safely usable from sibling workers. tls_mgm builds its SSL_CTX in child_init(PROC_TCP_MAIN), which runs *after* SIP workers and the MI process have already forked. So those workers see d->ctx == NULL when their cachedb_redis:child_init -> cachedb_do_init -> redis_init_ssl path fires. Even waiting for TCP_MAIN doesn't help: the SSL_CTX was built post-fork in TCP_MAIN's address space, and OpenSSL embeds pointers into per-process storage inside the CTX (method tables, providers, OSSL_LIB_CTX). Fix: each worker builds its own SSL_CTX from the static configuration on the tls_domain (cert, pkey, ca, verify_cert). CTXs are cached per-worker, keyed by tls_domain pointer, so multiple connections sharing a domain share one CTX inside that worker. Floors min proto at TLS 1.2 to match Valkey/Redis 7+ defaults and the tls_method=TLSv1_2 that most tls_mgm configs already set. Also tightens the adjacent error paths: SSL_free the SSL on redisInitiateSSL failure, and promote the printf in that path to LM_ERR for consistency. Note: a companion parser fix for 'tls-port' in parse_cluster_shards is still needed for TLS cluster topology updates to use the right port on MOVED/ASK redirects, but depends on PR OpenSIPS#3852 (parse_cluster_shards introduction); it can land as a small follow-up once that PR is merged.

liviuchircu

Great work here, @dondetir -- LGTM. As a plus, the patch is backwards-compatible too, as it purely adds support for "ASK" redirects, complementing the existing MOVED redirection support added in #3639 last year. Merging.

Replace the static cluster topology (built once at startup, never refreshed) with runtime discovery and automatic refresh: Topology discovery and refresh: - Probe CLUSTER SHARDS (Redis 7+) with fallback to CLUSTER SLOTS (Redis 3+) for backward compatibility - O(1) slot_table[16384] lookup replaces per-query linked-list scan - Automatic topology refresh on MOVED redirect, connection failure, or query targeting an unmapped slot (rate-limited to 1/sec) - Dynamic node creation when MOVED points to an unknown endpoint - Stale node pruning during refresh with safe connection cleanup - Cap redirect loop at 5 max redirects to prevent worker hang on pathological cluster state Cluster observability via MI commands: - redis_cluster_info: full topology dump including per-node connection status, slot assignments, query/error/moved/ask counters, and last activity timestamp - redis_cluster_refresh: trigger manual topology refresh (bypasses rate limit) - redis_ping_nodes: per-node PING with microsecond latency reporting - All MI commands support optional group filter parameter Statistics: - redis_queries, redis_queries_failed, redis_moved, redis_ask, redis_topology_refreshes (module-level stat counters) - Per-node query, error, moved, ask counters in redis_cluster_info Hash slot correctness: - Hash tag {…} extraction per Redis Cluster specification - CRC16 modulo 16384 replaces bitwise AND with slots_assigned ASK redirect handling: - Detect ASK responses alongside existing MOVED handling - Send ASKING command to target node before retrying original query - Do not update slot map (ASK is a temporary mid-migration redirect) - Refactor parse_moved_reply into parse_redirect_reply with prefix parameter; inline wrappers for backward compatibility Connection reliability: - TCP keepalive via redis_keepalive parameter (default 10s) - Stack allocation for redis_moved structs (eliminates OOM paths) - NULL guards on malformed CLUSTER SHARDS/SLOTS reply elements - Integer overflow protection in slot and port parsing - NULL guards in MI command handlers for group_name/initial_url Documentation: - New section: Redis Cluster Support (topology discovery, automatic refresh, MOVED/ASK handling, hash tags) - MI command reference: redis_cluster_info, redis_cluster_refresh, redis_ping_nodes - Authentication URL format documentation (classic, ACL, no-auth) - New parameter: redis_keepalive Test suite (186 tests): - C unit tests: hash slot calculation (37), MI counter helpers (41) - Integration: topology startup (12), ASK redirect (16), topology refresh (13), MI commands (50), edge cases (16) - Trap EXIT handlers for safe cluster state restoration - python3 preflight checks for JSON-dependent tests Depends on: #3815 (hash tag + modulo fix), #3852 (ASK redirect)

This was referenced Mar 30, 2026

cachedb_redis: add dynamic cluster topology management and observability #3855

Merged

cachedb_redis: add Redis cluster hash tag support to redisHash #3815

Merged

dondetir force-pushed the feature/redis-cluster-ask-topology branch from f3ee6c8 to fc8a736 Compare March 30, 2026 13:18

dondetir changed the title ~~cachedb_redis: add ASK redirect handling and fix hash slot calculation~~ cachedb_redis: add ASK redirect handling for cluster resharding Mar 30, 2026

liviuchircu self-assigned this Apr 22, 2026

liviuchircu added improvement new feature labels Apr 22, 2026

liviuchircu added this to the 4.0.0 milestone Apr 22, 2026

liviuchircu approved these changes Apr 22, 2026

View reviewed changes

liviuchircu merged commit 1444c40 into OpenSIPS:master Apr 22, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cachedb_redis: add ASK redirect handling for cluster resharding#3852

cachedb_redis: add ASK redirect handling for cluster resharding#3852
liviuchircu merged 1 commit intoOpenSIPS:masterfrom
dondetir:feature/redis-cluster-ask-topology

dondetir commented Mar 29, 2026 •

edited

Loading

Uh oh!

NormB commented Mar 29, 2026

Uh oh!

dondetir commented Mar 30, 2026

Uh oh!

dondetir commented Mar 30, 2026

Uh oh!

liviuchircu left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

dondetir commented Mar 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Background

Implementation

Note on hash slot fix

Testing

Uh oh!

NormB commented Mar 29, 2026

Uh oh!

dondetir commented Mar 30, 2026

Uh oh!

dondetir commented Mar 30, 2026

Uh oh!

liviuchircu left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

dondetir commented Mar 29, 2026 •

edited

Loading