Description
Not sure if an issue is the right place to discuss this — happy to move the conversation elsewhere if needed.
Context
At Whatnot we use Phoenix.PubSub (backed by Registry) for livestreams. In #14654 we added {:duplicate, :key} partitioning which helped when we had many small livestreams, but it doesn't help with large ones (as discussed in the PR) — all subscribers for a key still end up in one duplicate_bag partition.
We hit a concrete problem: at a certain leave rate from a large livestream, the PID partition GenServer's message queue runs away because each :ets.match_delete/2 on the duplicate_bag has to scan all entries for that key to find the ones belonging to the leaving process. The partition can't keep up.
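To illustrate the scan cost (this is a standalone sketch, not Registry's internal code; the table name, key, and entry shapes are made up for the demo): in a duplicate_bag, every entry for a hot key lives under the same key, so deleting one pid's entries has to walk all objects stored under that key.

```elixir
# Hypothetical illustration of the duplicate_bag cleanup cost.
table = :ets.new(:demo_bag, [:duplicate_bag, :public])

key = "livestream:123"
pids = for _ <- 1..1_000, do: spawn(fn -> Process.sleep(:infinity) end)
for pid <- pids, do: :ets.insert(table, {key, {pid, :meta}})

leaving = hd(pids)
# match_delete has to match against all 1_000 objects stored under `key`
# just to find and remove this one process's row
:ets.match_delete(table, {key, {leaving, :_}})
:ets.info(table, :size)
```

At a high leave rate, each departing process pays this full scan inside the partition GenServer, which is where the queue backlog builds up.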
Idea
Use an ordered_set ETS table with composite keys {key, pid, counter} instead of a duplicate_bag with {key, {pid, value}} entries. Since ordered_set keeps entries sorted, all entries sharing a {key, pid} prefix are adjacent in the tree, so unregistering touches only that process's entries instead of scanning everything stored under the key.
This is combined with key-based partitioning, so a lookup hits one partition instead of all of them.
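A minimal sketch of the proposed layout (assumed shape for illustration, not necessarily the PoC's exact schema): with the composite key, cleanup can target just the {key, pid, _} range.

```elixir
# ordered_set keyed by {key, pid, counter}: one process's entries are adjacent.
table = :ets.new(:demo_ordered, [:ordered_set, :public])

key = "livestream:123"
me = self()
other = spawn(fn -> Process.sleep(:infinity) end)

:ets.insert(table, {{key, me, 1}, :meta_a})
:ets.insert(table, {{key, me, 2}, :meta_b})
:ets.insert(table, {{key, other, 1}, :meta_c})

# Unregistering `me` deletes only its own {key, me, _} range; the other
# subscribers under the same key are never scanned.
2 = :ets.select_delete(table, [{{{key, me, :_}, :_}, [], [true]}])
:ets.info(table, :size)
```

Because the match spec's key pattern has a bound prefix, ETS can bound the traversal in the ordered_set rather than scanning the whole table.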
A PoC implementation and benchmarks are available at: https://github.com/studzien/elixir/tree/registry/duplicate-ordered-set
In our simulated load tests the lookup-time increase was not noticeable, and we no longer saw the queue-runaway issue.
Benchmark results
Hot key workload — all N entries under a single key, 16 partitions, duplicate = pid-partitioned duplicate_bag, ordered = key-partitioned ordered_set:
Lookup
| N | duplicate | ordered | notes |
|---|---|---|---|
| 100 | 4.0 μs | 6.4 μs | ordered 1.6x slower |
| 1,000 | 29.5 μs | 67.3 μs | ordered 2.3x slower |
| 10,000 | 315 μs | 404 μs | ordered 1.3x slower |
| 1,000,000 | 149 ms | 94.9 ms | ordered 1.6x faster |
Ordered is slower at small N because it uses :ets.select/2 instead of :ets.lookup_element/3. The gap closes as N grows, and at 1M entries ordered wins because the lookup hits one partition instead of sixteen.
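For concreteness, the two lookup shapes being compared look roughly like this (an illustrative single-entry sketch, not the benchmark code):

```elixir
# duplicate_bag vs ordered_set lookup, side by side.
bag = :ets.new(:demo_bag2, [:duplicate_bag, :public])
ord = :ets.new(:demo_ord2, [:ordered_set, :public])

key = "livestream:123"
:ets.insert(bag, {key, {self(), :meta}})
:ets.insert(ord, {{key, self(), 1}, :meta})

# duplicate_bag: one hash lookup returns every value stored under `key`
[{_pid, :meta}] = :ets.lookup_element(bag, key, 2)

# ordered_set: a select over the {key, _, _} key-prefix range
[:meta] = :ets.select(ord, [{{{key, :_, :_}, :"$1"}, [], [:"$1"]}])
```

The select carries match-spec overhead per call, which is why ordered loses at small N before the single-partition advantage takes over.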
Unregister
| N | duplicate | ordered | notes |
|---|---|---|---|
| 100 | 2.9 μs | 3.3 μs | ~tied |
| 1,000 | 4.1 μs | 3.6 μs | ordered 1.1x faster |
| 10,000 | 14.7 μs | 4.0 μs | ordered 3.7x faster |
| 1,000,000 | 5.1 ms | 5.4 μs | ordered 940x faster |
This is the main win: at 1M entries, unregister drops from ~5 ms to ~5 μs. That per-unregister cost is what drives the queue runaway in the partition GenServer with duplicate_bag.
Tradeoffs
Lookup is ~1.5-2x slower at small to medium scale. Unregistration and crash cleanup are dramatically faster at scale. For our use case (large channels, high join/leave rate) the tradeoff is clearly worth it.
Next steps
We're happy to work on this and prepare a production-ready version of the code, but would appreciate some guidance on structuring the solution. The current Registry implementation already has a fair amount of branching for unique, duplicate partitioned by pid, and duplicate partitioned by key; adding another backend on top of that makes the code harder to follow. We'd welcome thoughts on whether the current approach (more case branches) is acceptable, or if there's a preferred way to organize this (e.g., separate modules per backend, a behaviour, etc.).
Related: #14654