
Handle DCL vendor info timeouts during startup#1375

Open
ivaschru wants to merge 1 commit into matter-js:main from ivaschru:fix-vendor-info-timeout


ivaschru commented May 5, 2026

Summary

  • Add an explicit 30 second timeout to the CSA DCL vendor-info fetch.
  • Treat TimeoutError the same way as ClientError so vendor catalog refresh remains best-effort.
  • Preserve startup by saving the built-in/stored vendor info instead of letting a stalled DCL response prevent the WebSocket server from binding.

Why

In MatterServer.start(), vendor_info.start() runs before mount_websocket() and MultiHostTCPSite.start(). If the DCL vendor endpoint stalls until aiohttp raises TimeoutError, that exception is currently not handled and the Matter Server never reaches the WebSocket bind step. Supervisor can still report the add-on service as started, but core-matter-server:5580/ws keeps returning connection refused.
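The ordering problem can be reproduced in isolation. The coroutines below (`start_server`, `vendor_info_start`, `bind_websocket`, `demo`) are hypothetical and only mirror the sequence described above, with vendor info first and the WebSocket bind second. They are not the real `MatterServer.start()` code. A bare `TimeoutError` stands in for the one aiohttp raises when the DCL endpoint stalls.

```python
import asyncio
from typing import Awaitable, Callable, List


async def start_server(
    vendor_info_start: Callable[[], Awaitable[None]],
    bind_websocket: Callable[[], Awaitable[None]],
) -> None:
    """Hypothetical startup sequence: vendor info must finish before the bind."""
    await vendor_info_start()  # an unhandled exception here aborts startup...
    await bind_websocket()     # ...so this line is never reached


async def demo(vendor_info_raises: bool) -> List[str]:
    events: List[str] = []

    async def vendor_info_start() -> None:
        events.append("vendor_info")
        if vendor_info_raises:
            raise TimeoutError  # stands in for the stalled DCL request

    async def bind_websocket() -> None:
        events.append("bind")

    try:
        await start_server(vendor_info_start, bind_websocket)
    except TimeoutError:
        events.append("startup aborted")
    return events


if __name__ == "__main__":
    print(asyncio.run(demo(vendor_info_raises=True)))   # no "bind" event
    print(asyncio.run(demo(vendor_info_raises=False)))  # "bind" follows vendor info
```

In the failing case the process can stay alive while nothing ever listens on port 5580, which matches the "service started, but connection refused" symptom.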

This matches the startup failure being tracked in home-assistant/addons#4560.

Tests

PYTHONPATH=. /tmp/ha_matter_debug/venv/bin/python -m pytest -o addopts='' tests/server/test_vendor_info.py tests/server/test_server.py -q
2 passed in 5.39s

/tmp/ha_matter_debug/venv/bin/python -m ruff check matter_server/server/vendor_info.py tests/server/test_vendor_info.py
All checks passed!

ivaschru (author) commented May 5, 2026

I validated this patch on the affected HAOS 17.2 / amd64 installation by building a temporary local add-on from the official homeassistant/amd64-addon-matter-server:8.4.0 image, with only matter_server/server/vendor_info.py replaced by the version from this PR.

Result from the live add-on startup:

2026-05-05 19:50:27.898 INFO [matter_server.server.vendor_info] Fetching the latest vendor info from DCL.
2026-05-05 19:50:58.790 WARNING [matter_server.server.vendor_info] Unable to fetch vendor info from DCL:
TimeoutError
2026-05-05 19:50:58.793 INFO [matter_server.server.vendor_info] Saving vendor info to storage.
2026-05-05 19:50:58.793 INFO [matter_server.server.device_controller] Loaded 0 nodes from stored configuration
2026-05-05 19:50:58.824 INFO [matter_server.server.server] Matter Server successfully initialized.
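The timestamps in this log are consistent with the new 30 second limit. The snippet below uses only the standard library and the two timestamps copied from the log above to measure the gap between the start of the DCL fetch and the timeout warning:

```python
from datetime import datetime

LOG_FORMAT = "%Y-%m-%d %H:%M:%S.%f"  # %f accepts the 3-digit millisecond field

fetch_started = datetime.strptime("2026-05-05 19:50:27.898", LOG_FORMAT)
timeout_logged = datetime.strptime("2026-05-05 19:50:58.790", LOG_FORMAT)

elapsed = (timeout_logged - fetch_started).total_seconds()
print(f"{elapsed:.3f} s")  # 30.892 s
```

The warning arrives just under 31 seconds after the fetch starts, rather than after aiohttp's default of about 5 minutes.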

And the listener is open afterwards:

$ curl -sS -i --max-time 5 http://local-codex-matter-patch:5580/ws
HTTP/1.1 400 Bad Request
No WebSocket UPGRADE hdr: None
Can "Upgrade" only to "WebSocket".

So the patch fixes the startup blocker at the vendor-info DCL step: the server now reaches the WebSocket bind even when the DCL vendor endpoint times out.
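The `curl` check separates two cases. "Connection refused" means nothing is bound on 5580. Any HTTP response, even the `400 Bad Request` above, means the listener is up. Below is a rough standard-library equivalent. `is_listening` is a hypothetical helper. It only checks that a TCP connection is accepted, not that the WebSocket endpoint works, and it treats every `OSError` (refused, timed out, unresolvable host) as "not listening".

```python
import socket


def is_listening(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port is accepted within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Includes ConnectionRefusedError, the symptom reported before the fix.
        return False
```

For example, `is_listening("core-matter-server", 5580)` would be expected to return `False` on an installation hit by the original bug and `True` once the server reaches the bind step.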

One adjacent observation: on a fresh storage path, the earlier PAA DCL fetch can also hang until aiohttp's default ~5 minute timeout before falling back to Git. That is separate from this PR, but likely worth a follow-up timeout improvement for startup latency.
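One possible shape for that follow-up, sketched with `asyncio` only, is to bound the DCL attempt with its own timeout rather than relying on aiohttp's default, then fall back to the Git source. `fetch_paa_certs`, `from_dcl` and `from_git` are hypothetical names. The 30 second default is an assumption borrowed from this PR, not an agreed value.

```python
import asyncio
from typing import Awaitable, Callable, List

CertFetcher = Callable[[], Awaitable[List[str]]]


async def fetch_paa_certs(
    from_dcl: CertFetcher,
    from_git: CertFetcher,
    dcl_timeout: float = 30.0,  # assumption: same limit as the vendor-info fetch
) -> List[str]:
    """Try DCL first with a bounded wait; on timeout, use the Git source instead."""
    try:
        return await asyncio.wait_for(from_dcl(), dcl_timeout)
    except (asyncio.TimeoutError, TimeoutError):
        # A stalled DCL now costs at most dcl_timeout seconds before the fallback.
        return await from_git()
```

The design choice is to fail over on timeout only. Other error handling stays wherever the existing fallback logic already lives.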
