Problem
Running the full dataset-knowledge-graph pipeline reveals gaps in error handling that cause unnecessary failures. Updated with findings from the 2026-03-21 DKG run.
1. QLever query timeouts (HTTP 429): increase `--default-query-timeout`
QLever's default query timeout is 30s. On the deployment, heavy VoID aggregation queries on large datasets (78M+ triples) exceed this. Locally the same queries complete in ~25s, so a higher timeout is sufficient.
The `@lde/sparql-qlever` `Server` class now accepts a `queryTimeout` option (passed as `--default-query-timeout` to `qlever-server`). DKG should set this to a generous value (e.g. `'120s'`).
Retrying 429s would not help here: the same query would just time out again.
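A minimal sketch of how such an option might map onto the server command line. The `ServerOptions` shape and the `serverArgs` helper are hypothetical and are not the actual `@lde/sparql-qlever` implementation; only the `queryTimeout` option name and the `--default-query-timeout` flag come from this issue.

```typescript
// Hypothetical option bag; only `queryTimeout` is named in this issue.
interface ServerOptions {
  queryTimeout?: string; // e.g. '120s'
}

// Hypothetical helper: translate options into qlever-server arguments.
// The flag is only emitted when a timeout was explicitly configured,
// so QLever's own default still applies otherwise.
function serverArgs(options: ServerOptions): string[] {
  const args: string[] = [];
  if (options.queryTimeout !== undefined) {
    args.push('--default-query-timeout', options.queryTimeout);
  }
  return args;
}

// What DKG might pass for the heavy VoID aggregation queries.
const dkgArgs = serverArgs({ queryTimeout: '120s' });
```

Keeping the timeout as a single server-level setting matches the reasoning above: the queries succeed when given more time, so no per-request retry logic is needed.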
119 failed SPARQL stages in the 2026-03-21 run hit HTTP 429, containing:
113 "Operation timed out" exceptions
52 "Tried to allocate X, but only Y available" (QLever memory limit, `--memory-max-size 6G`)
6 "Sort operation was canceled" (time estimate exceeded remaining time)
Affected datasets (all openarchieven.nl, large datasets):
dataset_aal, dataset_ade, dataset_arg, dataset_bhi, dataset_bor, dataset_brd, dataset_cod, dataset_dar, dataset_den, dataset_dev, dataset_eal, dataset_elo, dataset_eem, dataset_ell, dataset_ens, dataset_frl, dataset_gld, dataset_gmb, dataset_gra, dataset_hco, dataset_hga, dataset_hua, dataset_nha, dataset_raa, dataset_rad, dataset_rar, dataset_rat, dataset_rhe, dataset_rhl, dataset_saa, dataset_sha, dataset_smh, dataset_srt, dataset_wba, dataset_wfa, dataset_zar
2. QLever `--parse-parallel false` retry too narrow
The retry in `packages/sparql-qlever/src/importer.ts` only catches the `multiline string literal` error, and only for `format === 'ttl'`. Other QLever parallel parsing errors also need `--parse-parallel false` but aren't caught.
The `format === 'ttl'` restriction should be removed: while the multiline retry was originally motivated by TTL files, the same QLever parallel parsing errors can occur with any format.
Note: the two datasets originally listed here (Krantenpaginas from deventit.coda-apeldoorn.nl, LOD Beelddocumenten from studiezaal.nijmegen.nl) are not fixable by retry logic. Both servers serve RDF/XML regardless of the requested format (`.ttl`/`.nt`/`.nq` URL extensions are ignored; `content-disposition` confirms `filename=*.rdf(.gz)`). This is a data quality issue at the source, not a QLever parsing problem.
No known datasets currently need this broader retry, but it's still a correctness improvement.
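One way the broader check could look, as a sketch only. The `needsSequentialParse` helper and the pattern list are hypothetical and do not reproduce the code in `importer.ts`. The `multiline string literal` fragment is the one this issue says is already caught; `marks the end of a statement` is taken from the proposed fixes and is assumed here to be a substring of QLever's error output.

```typescript
// Error fragments that suggest retrying with --parse-parallel false.
// Both entries are assumed to appear verbatim in QLever's error output.
const SEQUENTIAL_PARSE_PATTERNS: readonly string[] = [
  'multiline string literal',
  'marks the end of a statement',
];

// Hypothetical predicate. It deliberately takes no `format` argument,
// so the retry applies to every input format rather than only 'ttl'.
function needsSequentialParse(errorOutput: string): boolean {
  return SEQUENTIAL_PARSE_PATTERNS.some((pattern) =>
    errorOutput.includes(pattern),
  );
}
```

Unrelated failures, such as the query timeouts from section 1, do not match any pattern and would still surface as errors instead of triggering a pointless re-import.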
3. `--parse-parallel false` retry passes flag incorrectly (FIXED in #287)
Fixed by replacing the hardcoded `-p true` + conditional `--parse-parallel false` with a single `--parse-parallel ${parallel}`.
2026-03-21 DKG run error summary (baseline)
328 datasets processed, 266 completed successfully, 62 skipped.
`--parse-parallel` flag error
Proposed fixes
1. Add `status === 429` to `isTransientError()`: not useful, the same query would time out again. Instead, expose `queryTimeout` on `Server` so DKG can increase the timeout (done in #287, "fix(sparql-qlever): fix conflicting --parse-parallel flags on import retry").
2. Extend the retry to errors containing `marks the end of a statement`, and remove the `format === 'ttl'` restriction.
3. Fix the `--parse-parallel false` flag passing: done in #287.
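The flag fix from section 3 can be illustrated with a small sketch. The `parseParallelArgs` and `indexWithFallback` helpers are hypothetical, and `runIndex` is an injected stand-in for whatever actually invokes the QLever indexer; none of this is the code merged in #287. The point is only that each attempt passes exactly one `--parse-parallel` value, with the boolean assumed to be rendered as `true` or `false`.

```typescript
// Build the flag exactly once per attempt, instead of combining a
// hardcoded `-p true` with an extra `--parse-parallel false` on retry.
function parseParallelArgs(parallel: boolean): string[] {
  return ['--parse-parallel', String(parallel)];
}

// Hypothetical retry flow: try parallel parsing first, then fall back
// to sequential parsing only when the caller says the error warrants it.
function indexWithFallback(
  runIndex: (args: string[]) => void,
  shouldRetrySequentially: (error: unknown) => boolean,
): void {
  try {
    runIndex(parseParallelArgs(true));
  } catch (error) {
    if (!shouldRetrySequentially(error)) {
      throw error; // not a parallel parsing problem, so do not mask it
    }
    runIndex(parseParallelArgs(false));
  }
}
```

Because the arguments are rebuilt from a single boolean, the retry cannot end up with two conflicting values for the same option.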