If I understand correctly:
open is fully synchronous because it's fully based on DuckDB, which is only synchronous. Ideally this part could also be async, because loading the Stac-GeoParquet file is network I/O, but this matters less because open is relatively fast.
Chunk reads currently expose only a synchronous API because only the synchronous Xarray API is implemented (see Implement Xarray async integration #29). But under the hood a large part of this process is already async via async-geotiff.
My opinion is that as a general principle, it's good to compose async and sync functionality as separate API calls so that the downstream user can decide when and how to manage concurrency between IO-bound calls and CPU-bound calls. This is why async-tiff has a separation between the tile fetch (async network bound) and the decompression (CPU bound).
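As a minimal illustration of that principle (all names here are hypothetical stand-ins, not async-tiff's actual API), the downstream caller decides how to overlap the network-bound and CPU-bound halves:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

async def fetch_tile(url: str) -> bytes:
    """Stand-in for the network-bound tile fetch."""
    await asyncio.sleep(0.01)  # pretend this is an HTTP range request
    return b"compressed-bytes"

def decode_tile(raw: bytes) -> bytes:
    """Stand-in for the CPU-bound decompression step."""
    return raw.upper()

async def read_tiles(urls: list[str]) -> list[bytes]:
    loop = asyncio.get_running_loop()
    # Fetches overlap on the event loop; decodes go to a thread pool
    # so they don't block fetches still in flight.
    raws = await asyncio.gather(*(fetch_tile(u) for u in urls))
    with ThreadPoolExecutor() as pool:
        return await asyncio.gather(
            *(loop.run_in_executor(pool, decode_tile, r) for r in raws)
        )

# asyncio.run(read_tiles(["tile-a", "tile-b", "tile-c"]))
```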
Claude suggests the call path crosses the sync/async boundary in several places:
```
MultiBandStacBackendArray.__getitem__   SYNC  ← xarray BackendArray protocol
└── _raw_getitem                        SYNC
    ├── (resolve indexers, build _ChunkReadPlan)    sync, pure CPU
    └── _run_coroutine(_run_mosaic_all_dates(...))  sync→async bridge
        └── _run_mosaic_all_dates       ASYNC
            └── asyncio.gather(_run_one_date per time step)
                └── _run_one_date       ASYNC
                    ├── duckdb_client.search   SYNC, blocking — guarded by threading.Lock
                    └── async_mosaic_chunk     ASYNC
                        ├── per-item: async-geotiff open   ASYNC (network I/O)
                        ├── per-item: async-geotiff read   ASYNC (network I/O)
                        ├── per-item: reproject+mosaic     SYNC, dispatched via run_in_executor
                        └── pixel-selection mosaic logic   SYNC, in-line
```
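I haven't dug into what _run_coroutine actually does; a common shape for that sync→async bridge is a dedicated event loop on a background thread that the sync __getitem__ blocks on. A sketch only, assuming that pattern:

```python
import asyncio
import threading

class LoopRunner:
    """Run coroutines on a background event loop from synchronous code."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        thread.start()

    def run(self, coro):
        # Blocks the calling (sync) thread until the coroutine completes,
        # without requiring the caller to already be inside an event loop.
        return asyncio.run_coroutine_threadsafe(coro, self._loop).result()

# runner = LoopRunner()
# chunk = runner.run(run_mosaic_all_dates(plan))  # hypothetical coroutine
```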
Claude also suggests:
Infra — DuckDB-inside-async: even if you exposed async def read_chunk(...), it would still block the event loop during DuckDB calls. A truly async API should at minimum await loop.run_in_executor(None, duckdb_client.search, ...) so the loop yields. (Trivial change, big honesty improvement.)
Is there prior work on DuckDB threading? Can we put each DuckDB chunk query into a separate thread? Claude says:
DuckDB threading
DuckDB's official thread-safety model:
Connections are thread-safe with internal serialization — operations on one connection block each other (single mutex), but the connection won't crash under concurrent access. DuckDB has documented this since 0.5.x.
Multiple connections to the same DB file run in parallel — that's the official pattern for parallel Python workloads. (duckdb.connect(...) once per thread, or connection.cursor() for cheap parallel transaction contexts on the same DB.)
A single query uses multiple threads internally via DuckDB's parallel executor (SET threads = N).
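To make the multiple-connections point concrete, here is a minimal sketch of the documented per-thread-connection pattern against plain duckdb (not rustac's internals; the parquet path and query are placeholders):

```python
import threading
import duckdb

_local = threading.local()

def _conn() -> duckdb.DuckDBPyConnection:
    # One in-memory connection per worker thread; since the data lives in
    # the parquet file, each thread queries it independently and in parallel.
    if not hasattr(_local, "conn"):
        _local.conn = duckdb.connect()          # one engine per thread
        _local.conn.execute("SET threads = 2")  # intra-query parallelism
    return _local.conn

def query_chunk(parquet_path: str, where_clause: str):
    sql = f"SELECT * FROM read_parquet('{parquet_path}') WHERE {where_clause}"
    return _conn().execute(sql).fetchall()
```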
So the answer to "can we put each duckdb chunk query into a separate thread" is yes, and there are two flavors:
Cheap fix: wrap the existing single client's .search() call in await loop.run_in_executor(None, lambda: client.search(...)). This doesn't get parallelism — the existing threading.Lock (or DuckDB's internal mutex) still serializes the queries — but it does release the event loop so other coroutines (COG opens, reads, reprojections for already-resolved time steps) can progress. Honesty improvement, no parallelism gain.
Real parallelism: a thread-local or per-time-step DuckdbClient pool. Each thread calls DuckdbClient() once and reuses it; queries from different worker threads run truly in parallel (DuckDB's process-level engine handles concurrent connections fine). This requires checking what rustac.DuckdbClient actually does internally — dir() shows only sync methods (search, search_to_arrow, execute, query_to_table, get_collections), no cursor() or per-connection clone. Worth a quick question to the rustac maintainers: does each DuckdbClient instance hold its own connection, and is it safe to construct N of them for the same parquet file? If yes, the path is clear.
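If the rustac answer is yes, the pool could look roughly like this (the zero-arg DuckdbClient() constructor and the search arguments are assumptions carried over from above, not verified API):

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor

import rustac  # assumption: safe to build one DuckdbClient per thread

_local = threading.local()
_pool = ThreadPoolExecutor(max_workers=4)

def _client():
    if not hasattr(_local, "client"):
        _local.client = rustac.DuckdbClient()
    return _local.client

async def search_async(*args, **kwargs):
    loop = asyncio.get_running_loop()
    # The blocking search runs off the event loop; with one client per
    # executor thread, several searches can be in flight at once.
    return await loop.run_in_executor(_pool, lambda: _client().search(*args, **kwargs))
```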
A useful data point for the issue: it's worth measuring how much wall-time the per-chunk DuckDB queries actually account for. There's already a logger.debug trace for this in _backend.py:319-325. If DuckDB is <5% of chunk time on typical workloads, fix #1 alone is the right call (cheap, honest, no infra). If it's 20%+, fix #2 starts paying for itself.
One thing to verify before posting: _search_items calls client.search(...) directly inside the coroutine (_backend.py:357-368), under a threading.Lock. The lock is held across the sync call, so even loop.run_in_executor wouldn't gain parallelism without removing it. That doesn't change the conclusion (loop is blocked → fix #1 is right) but it makes the issue tighter to cite the line numbers.
Ideally I feel like there should be an API split between the sync parts and the async parts, but if we shoehorn this into xarray's API, that split isn't possible: the xarray API is either sync or async.
Perhaps this suggests that lazycogs should keep a high-level xarray API but also offer a lower-level API with more control? I'm not sure; a rough sketch of what that split might look like is below.
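For illustration only, every name here is hypothetical and not existing lazycogs API:

```python
import numpy as np

class ChunkReader:
    """Hypothetical low-level API: the caller owns the concurrency."""

    async def search(self, bbox, datetime_range) -> list[dict]:
        """Resolve STAC items for a chunk (DuckDB / network bound)."""
        ...

    async def fetch(self, item: dict, window) -> bytes:
        """Fetch raw COG bytes for a window (network bound)."""
        ...

    def decode_and_mosaic(self, raw_parts: list[bytes]) -> np.ndarray:
        """CPU-bound step the caller may dispatch to a thread pool."""
        ...

# The xarray backend would then be a thin sync wrapper bridging into this,
# while power users drive asyncio.gather / run_in_executor themselves.
```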
Related to #26, #29, #30