The `ImportResolver` probes data dump distributions via HEAD requests and stores the HTTP `Last-Modified` header on the probe result (`ProbeResult.lastModified`). However, this value is never propagated to `Distribution.lastModified` before the distributions are passed to the importer.
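For context, the probing step can be pictured roughly as follows. This is a minimal sketch, not the actual `ImportResolver` code: `HeaderLookup` and `probeResultFromHeaders` are hypothetical names, and `ProbeResult` is reduced to the two fields this issue mentions. It keeps `Last-Modified` only when the header is present and parses to a valid date.

```ts
// Hypothetical minimal interface: anything with a `get` method, such as
// the Fetch API's `Headers` object (whose lookup is case-insensitive).
interface HeaderLookup {
  get(name: string): string | null;
}

// Hypothetical shape, modelled on the ProbeResult fields named in this issue.
interface ProbeResult {
  url: string;
  lastModified?: Date;
}

// Build a probe result from a HEAD response's headers, keeping
// Last-Modified only when it is present and parses to a valid date.
function probeResultFromHeaders(url: string, headers: HeaderLookup): ProbeResult {
  const raw = headers.get('last-modified');
  if (raw === null) {
    return { url };
  }
  const parsed = new Date(raw);
  return Number.isNaN(parsed.getTime()) ? { url } : { url, lastModified: parsed };
}
```

In practice the headers would come from something like `(await fetch(url, { method: 'HEAD' })).headers`.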
As a result, `LastModifiedDownloader.localFileIsUpToDate()` always returns `false` when `distribution.lastModified` is `undefined`, so the file is re-downloaded on every run. Because the re-download updates the file's mtime, this also invalidates the QLever index cache and forces a full re-index on every pipeline run.
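The failure mode can be illustrated with a simplified stand-in for the up-to-date check. This is a hypothetical sketch, not the real `LastModifiedDownloader` implementation: it assumes the check compares the local file's mtime with `distribution.lastModified`, and it takes both as plain `Date` values.

```ts
// Hypothetical stand-in for the check described above: without a remote
// Last-Modified date there is nothing to compare against, so the local
// copy is treated as stale and must be downloaded again.
function localFileIsUpToDate(
  localMtime: Date | undefined,
  remoteLastModified: Date | undefined,
): boolean {
  if (remoteLastModified === undefined || localMtime === undefined) {
    return false;
  }
  // The local copy is current if it was written at or after the remote modification time.
  return localMtime.getTime() >= remoteLastModified.getTime();
}
```

With `remoteLastModified` always `undefined`, the first branch is taken on every run, regardless of how fresh the local file is.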
This affects all distributions whose `lastModified` is not set by the dataset source (e.g. manual `dataset.ttl` selections). Registry-backed datasets are unaffected because `@lde/dataset-registry-client` sets `lastModified` from the registry's `modified` field.
Suggested fix
In `ImportResolver.importDataset()`, copy `probeResult.lastModified` onto the candidate distribution before passing it to the importer:
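```ts
for (const candidate of candidates) {
  // Match the candidate with its HEAD probe result by URL.
  const probeResult = probeResults.find(r => r.url === candidate.accessUrl.toString());
  // Fill in lastModified only if the dataset source didn't already set it.
  if (probeResult?.lastModified && !candidate.lastModified) {
    candidate.lastModified = probeResult.lastModified;
  }
}
```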
This would let the downloader use the HTTP `Last-Modified` header to skip redundant downloads, which in turn preserves the QLever index cache.
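The same fix can also be exercised in isolation. The sketch below assumes hypothetical minimal shapes for `Distribution` and `ProbeResult` (only the fields mentioned in this issue), and `propagateLastModified` is a hypothetical helper name, not an existing function.

```ts
// Hypothetical minimal shapes; the real types likely carry more fields.
interface Distribution {
  accessUrl: URL;
  lastModified?: Date;
}

interface ProbeResult {
  url: string;
  lastModified?: Date;
}

// Copy the HEAD probe's Last-Modified onto each candidate that lacks one,
// leaving values already set by the dataset source (e.g. the registry) untouched.
function propagateLastModified(candidates: Distribution[], probeResults: ProbeResult[]): void {
  // Index probe results by URL, keeping the first match per URL like `find` does.
  const byUrl = new Map<string, ProbeResult>();
  for (const result of probeResults) {
    if (!byUrl.has(result.url)) {
      byUrl.set(result.url, result);
    }
  }
  for (const candidate of candidates) {
    const probeResult = byUrl.get(candidate.accessUrl.toString());
    if (probeResult?.lastModified && !candidate.lastModified) {
      candidate.lastModified = probeResult.lastModified;
    }
  }
}
```

The `Map` replaces the per-candidate linear `find`. With only a handful of distributions this makes little practical difference, so the inline loop is an equally reasonable choice.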