[bug] WebSocket session takeover via client-controlled session_id #30

@d180

What went wrong

The SDK WebSocket control plane trusts a client-supplied session_id and uses
it as the sole routing key for server-pushed verdicts and HITL responses.

If a second authenticated SDK client connects with a different valid API key
but the same session_id, it replaces the original subscriber for that
session. Subsequent verdicts for that session are delivered to the second
client, while the original client stops receiving them.

This can disrupt or weaken enforcement in the main SDK product path:

  • in HITL mode, the original SDK can stop receiving review decisions
  • in BLOCK mode, the original SDK can time out waiting for verdicts and fail
    open

Relevant code paths:

  • sdk/adrian/ws.py: SDK sends login.session_id
  • backend/internal/ws/handler.go: backend accepts login.SessionId directly
  • backend/internal/ws/hub.go: single subscriber per session_id; re-registering
    closes the prior channel
  • backend/internal/api/handlers_reviews.go: HITL responses are published by
    session_id

Reproduction steps

  1. Start the backend and create two valid SDK API keys in the dashboard.
  2. Connect SDK client A to /ws with API key A and log in using
    session_id = "sess-shared-demo".
  3. Send an event from client A and confirm client A receives the verdict.
  4. Connect SDK client B to /ws with API key B and log in using the same
    session_id = "sess-shared-demo".
  5. Send another event from client A.
  6. Observe that client B receives the verdict for the second event, while
    client A times out waiting for it.

Standalone proof used locally:

before second client: client A got verdict for evt-before
after second client login: client B got verdict for evt-after
client A timed out waiting for the verdict after takeover

Expected behaviour

A second authenticated SDK client should not be able to take over another
session’s verdict/HITL channel merely by reusing its session_id.

The backend should bind verdict routing to a server-owned or strongly
authenticated identity, or reject conflicting reuse of the same session_id
across different clients / API keys.


Actual behaviour

The backend accepts the client-provided session_id, registers the WebSocket
subscriber by that value alone, and replaces the prior subscriber if another
connection registers the same session_id.

As a result, the most recent client claiming that session_id becomes the
active recipient for verdicts and HITL responses for that session.


Likely fix

Do not use raw client-supplied session_id as the sole trusted routing
identity.

Possible approaches:

  • Bind the active WebSocket session to the authenticated API key as well as
    the session_id, and reject or ignore reuse of the same session_id from a
    different key.
  • Alternatively, issue a server-owned connection/session identifier at login
    and route verdicts/HITL responses by that server-owned identifier rather
    than by the client-provided session_id.
  • If reconnect continuity is required, only allow takeover when the reconnect
    is authenticated as the same logical client, rather than any valid SDK key.
  • Add a regression test covering two authenticated clients with different keys
    claiming the same session_id, asserting that the second client does not
    steal the first client’s verdict stream.
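The server-owned identifier option could look roughly like the sketch below. It is an in-memory model under stated assumptions, not the backend's design: ConnectionRouter, login, record_event, and publish_verdict are hypothetical names, and it assumes the server can remember which connection submitted each event. HITL responses that are currently published by session_id would need a similar server-side mapping, which is not shown here.

```python
import secrets


class ConnectionRouter:
    """Hypothetical router keyed by a server-generated connection ID."""

    def __init__(self):
        self._inbox = {}         # conn_id -> list of delivered verdicts
        self._event_origin = {}  # event_id -> conn_id that submitted it

    def login(self, client_session_id):
        # The client-supplied session_id is treated as a label only;
        # the routing key is generated by the server.
        conn_id = secrets.token_urlsafe(16)
        self._inbox[conn_id] = []
        return conn_id

    def record_event(self, conn_id, event_id):
        # Remember which connection the event arrived on.
        self._event_origin[event_id] = conn_id

    def publish_verdict(self, event_id, verdict):
        # Route back to the originating connection, never by session_id.
        conn_id = self._event_origin.get(event_id)
        if conn_id is None:
            return False
        self._inbox[conn_id].append(verdict)
        return True

    def inbox(self, conn_id):
        return list(self._inbox[conn_id])


router = ConnectionRouter()
conn_a = router.login("sess-shared-demo")  # client A
conn_b = router.login("sess-shared-demo")  # client B reuses the session_id

router.record_event(conn_a, "evt-after")
router.publish_verdict("evt-after", "verdict:evt-after")
```

In this model, client B reusing the session_id has no effect on where client A's verdicts go, because the session_id never participates in routing.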

At minimum, conflicting reuse of the same session_id across different
authenticated SDK clients should fail closed rather than silently replacing
the existing subscriber.
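A fail-closed registration check, together with the regression scenario from the list above, might be sketched as follows. This is an illustrative in-memory model rather than a patch for hub.go: KeyBoundHub, SessionConflict, and the api_key_id parameter are hypothetical, and cleanup on disconnect is omitted.

```python
class SessionConflict(Exception):
    """Raised when a different API key claims an active session_id."""


class KeyBoundHub:
    """Hypothetical hub that binds each session_id to the registering key."""

    def __init__(self):
        self._subs = {}  # session_id -> (api_key_id, inbox list)

    def register(self, api_key_id, session_id):
        current = self._subs.get(session_id)
        if current is not None and current[0] != api_key_id:
            # Fail closed: keep the existing subscriber, reject the newcomer.
            raise SessionConflict(session_id)
        # Same key (or first registration): allow, e.g. for reconnects.
        inbox = []
        self._subs[session_id] = (api_key_id, inbox)
        return inbox

    def publish(self, session_id, verdict):
        current = self._subs.get(session_id)
        if current is None:
            return False
        current[1].append(verdict)
        return True


# Regression scenario: two keys, one session_id.
hub = KeyBoundHub()
inbox_a = hub.register("key-A", "sess-shared-demo")

try:
    hub.register("key-B", "sess-shared-demo")
    takeover_rejected = False
except SessionConflict:
    takeover_rejected = True

hub.publish("sess-shared-demo", "verdict:evt-after")
```

Here the second login is rejected and client A keeps receiving verdicts. A real implementation would also need to release the binding when the owning connection closes, otherwise a stale entry could block legitimate reuse of the session_id.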


Environment

  • Adrian version / commit:
  • OS: macOS arm64
  • Docker version: 29.4.3
  • GPU model (if relevant): not relevant

Logs

before second client: client A got verdict for evt-before
after second client login: client B got verdict for evt-after
client A timed out waiting for the verdict after takeover

Offer to contribute

I’d be happy to work on a fix for this issue.
