Dev container hangs on /chat requests after extended uptime #186

@neuromechanist

Description

The dev backend container (osa-dev) became unresponsive to POST /hed/chat requests while continuing to serve GET /health checks normally. All valid chat requests returned 0 bytes and timed out at every level (backend port, Apache proxy, Cloudflare Worker).

Symptoms

  • POST /hed/chat with valid auth: 0 bytes received, connection timeout
  • POST /hed/chat without auth: returns 401 immediately (proving uvicorn IS accepting connections)
  • GET /health: works normally (200 OK)
  • Docker health check: passes (runs inside container)
  • Direct curl from inside container (docker exec): works fine
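
The symptom pattern above (health check green, chat path dead) can be checked with a probe that exercises the POST path under a hard timeout. The following is a minimal sketch using only the standard library; `chat_probe`, the payload shape, and any auth header are hypothetical and would need to match the real API.

```python
import http.client
import json
import urllib.request


def chat_probe(url: str, payload: dict, headers: dict, timeout_s: float = 10.0) -> bool:
    """Return True only if POST `url` answers with HTTP 200 within `timeout_s`."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", **headers},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            # A hung backend sends 0 bytes, so read the body instead of
            # trusting the accepted connection.
            response.read()
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # OSError covers HTTP error statuses (401, 5xx), refused
        # connections, and socket timeouts.
        return False
```

Because the in-container checks kept passing during the incident, a probe like this is probably most useful when run from the host against the mapped port, for example `chat_probe("http://localhost:38529/hed/chat", {"message": "ping"}, {"Authorization": "..."})`, where the payload and header are placeholders.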

Root Cause (Preliminary)

The uvicorn event loop appears to deadlock or stall on LLM/OpenRouter requests when accessed through the Docker port mapping (host:38529 -> container:38528). Requests that trigger the async LLM call path hang indefinitely, while requests that return before reaching the LLM (auth failures, health checks) work fine.
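
One mechanism that would produce this pattern is an await on an upstream reply that never arrives and has no timeout. The sketch below is a stand-alone asyncio illustration, not the project's code: `health`, `chat`, and the never-resolved future are hypothetical stand-ins for the two request paths and a stalled OpenRouter call.

```python
import asyncio


async def health() -> str:
    # Returns without touching the upstream, like GET /health.
    return "200 OK"


async def chat(upstream_reply: asyncio.Future) -> str:
    # Awaits the upstream reply with no timeout, like the LLM call path.
    return await upstream_reply


async def main() -> dict:
    loop = asyncio.get_running_loop()
    never_answers = loop.create_future()  # stands in for a stalled upstream
    chat_task = asyncio.create_task(chat(never_answers))

    await asyncio.sleep(0.1)        # give the chat handler time to start
    health_status = await health()  # still served while chat is pending
    chat_still_pending = not chat_task.done()

    chat_task.cancel()              # analogous to the client giving up
    await asyncio.gather(chat_task, return_exceptions=True)
    return {"health": health_status, "chat_pending": chat_still_pending}


result = asyncio.run(main())
print(result)  # {'health': '200 OK', 'chat_pending': True}
```

The loop keeps serving cheap handlers while the chat coroutine waits forever, which matches the symptoms but does not confirm the cause. It also does not explain why the same request works from inside the container.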

Resolution

Restarting the container (docker restart osa-dev) immediately resolved the issue.

Investigation Needed

  • Check uvicorn worker count (a single worker means a single event loop, so one blocking call can stall every request it is serving)
  • Add request timeout for LLM calls to prevent indefinite hangs
  • Add liveness probe that tests POST /chat endpoint, not just GET /health
  • Consider adding async timeout wrapper around OpenRouter API calls
  • Check if httpx/aiohttp connection pool exhaustion could cause this
  • Review if any synchronous operations block the async event loop
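
The timeout items above could be sketched with `asyncio.wait_for`, which cancels the wrapped awaitable once the budget is spent. `call_with_timeout`, `LLMTimeoutError`, and the 30-second default are assumptions for illustration; the real OpenRouter call would take the place of `make_call`.

```python
import asyncio
from typing import Awaitable, Callable

LLM_TIMEOUT_S = 30.0  # assumed budget; tune to the slowest acceptable reply


class LLMTimeoutError(RuntimeError):
    """Raised when the upstream LLM call exceeds its time budget."""


async def call_with_timeout(
    make_call: Callable[[], Awaitable[str]],
    timeout_s: float = LLM_TIMEOUT_S,
) -> str:
    # wait_for cancels the inner call on timeout, so the request handler
    # gets an error it can report instead of hanging indefinitely.
    try:
        return await asyncio.wait_for(make_call(), timeout=timeout_s)
    except asyncio.TimeoutError as exc:
        raise LLMTimeoutError(f"LLM call exceeded {timeout_s:.1f}s") from exc


async def _demo() -> list:
    async def fast_reply() -> str:
        await asyncio.sleep(0.01)
        return "ok"

    async def stalled_reply() -> str:
        await asyncio.sleep(3600)  # simulates an upstream that never answers
        return "never"

    outcomes = [await call_with_timeout(fast_reply, timeout_s=1.0)]
    try:
        await call_with_timeout(stalled_reply, timeout_s=0.05)
    except LLMTimeoutError:
        outcomes.append("timed out")
    return outcomes


outcomes = asyncio.run(_demo())
print(outcomes)  # ['ok', 'timed out']
```

This only bounds awaits that yield to the event loop. A synchronous call that blocks the loop would not be interrupted by `wait_for`, so the last item in the list still needs its own review. If the HTTP client turns out to be httpx, its timeout configuration also includes a `pool` setting that limits how long a request waits for a free connection.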

Impact

  • Dev environment only (production unaffected)
  • Widget appeared non-functional to users
  • Required manual container restart to recover

Metadata

Labels

  • P1 (Priority 1: Critical, fix as soon as possible)
  • bug (Something isn't working)
  • operations (Operations, monitoring, and observability)
