Pushed a commit with an updated approach that adapts to the changes made to main since this PR was made.
Problem
When importing a document with many authors, getAttributions in workers/tasks/import/metadata.ts fires N individual getSearchUsers database queries (one per author name) via Promise.all. Each query runs an ILIKE '%name%' scan on the User table. With many authors this causes DB connection contention and can exceed the 2-minute worker timeout.
Original PR Approach (July 2024)
The original branch (gs/batch-author-import) had two commits:
Throttled queries with asyncMap — Replaced Promise.all + .map(async ...) with asyncMap at concurrency: 2, so only 2 author lookups run at a time instead of all at once.
Bumped the global worker timeout — Changed maxWorkerTimeSeconds from 120 → 300, affecting all worker tasks (export, import, archive).
This reduced DB pressure but still issued N individual queries, only more slowly. The timeout change was also a blunt instrument, since it applied to every worker task rather than just imports.
Updated Approach (April 2026)
Three targeted changes that batch at the SQL level rather than throttle at the application level:
1. New batchSearchUsers function — server/search/queries.ts
Added a new export that takes an array of author names and issues a single User.findAll query with all names combined into one OR clause. Results are mapped back per-name in JS using the same matching logic as the original per-name query (fullName contains, slug contains, email exact match). The existing getSearchUsers is left untouched for use by the search API.
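The per-name mapping step could look roughly like the sketch below. It covers only the JS side (the single OR query is omitted), and `UserRow` and `groupUsersByName` are hypothetical names chosen for illustration, not the identifiers used in the PR. It also assumes case-insensitive comparison to approximate ILIKE, which may differ from the real matching rules.

```typescript
// Hypothetical minimal row shape; the real User model has more fields.
type UserRow = { id: string; fullName: string; slug: string; email: string };

// Given the rows returned by one batched query, rebuild a per-name result list.
// Matching follows the rules described above: fullName contains the name,
// slug contains the name, or email equals the name.
// Assumption: comparisons are case-insensitive, approximating ILIKE.
function groupUsersByName(
  names: string[],
  rows: UserRow[],
): Map<string, UserRow[]> {
  const byName = new Map<string, UserRow[]>();
  for (const name of names) {
    const needle = name.trim().toLowerCase();
    // An empty name would match every row, so treat it as "no matches".
    const matches =
      needle === ''
        ? []
        : rows.filter(
            (user) =>
              user.fullName.toLowerCase().includes(needle) ||
              user.slug.toLowerCase().includes(needle) ||
              user.email.toLowerCase() === needle,
          );
    byName.set(name, matches);
  }
  return byName;
}
```

Keying the map by the original name string keeps the caller's lookup trivial, at the cost of scanning the (small) result set once per name.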
2. Rewritten getAttributions in workers/tasks/import/metadata.ts
Instead of Promise.all over N async lookups, the function now:
Filters authorEntries to collect all string names upfront
Calls batchSearchUsers once with the full list
Synchronously maps results back into the attribution objects
No concurrency management needed since it's a single query.
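As a rough illustration of the three steps above, here is a minimal sketch. The `AuthorEntry` and `Attribution` shapes, the helper names, and the injected `batchSearch` parameter are all assumptions made for this example; the real getAttributions works with richer objects.

```typescript
// Hypothetical shapes for illustration only.
type User = { id: string; fullName: string };
type AuthorEntry = string | { name: string };
type Attribution = { name: string; users: User[] };

// Step 1: collect every plain-string author name upfront.
function collectNames(authorEntries: AuthorEntry[]): string[] {
  return authorEntries.filter(
    (entry): entry is string => typeof entry === 'string',
  );
}

// Step 3: synchronously map the batched results back onto each entry.
function applyResults(
  authorEntries: AuthorEntry[],
  usersByName: Map<string, User[]>,
): Attribution[] {
  return authorEntries.map((entry) =>
    typeof entry === 'string'
      ? { name: entry, users: usersByName.get(entry) ?? [] }
      : { name: entry.name, users: [] },
  );
}

// Step 2 sits in the middle: exactly one awaited call, however many names.
// `batchSearch` stands in for batchSearchUsers and is injected so the
// sketch stays self-contained.
async function getAttributionsSketch(
  authorEntries: AuthorEntry[],
  batchSearch: (names: string[]) => Promise<Map<string, User[]>>,
): Promise<Attribution[]> {
  const names = collectNames(authorEntries);
  const usersByName =
    names.length > 0 ? await batchSearch(names) : new Map<string, User[]>();
  return applyResults(authorEntries, usersByName);
}
```

Because the only asynchronous step is a single awaited call, there is nothing left to throttle.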
3. Per-task timeout — workers/queue.ts
Added import: 300 (5 minutes) to the existing customTimeouts map. This gives imports more headroom for large documents without affecting other task types. The codebase already had this mechanism in place for the archive task.
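A per-task timeout lookup of this kind might be sketched as follows. Only `customTimeouts`, `maxWorkerTimeSeconds`, the 120-second default, and the `import: 300` entry come from the description above; the `timeoutSecondsFor` helper and the use of a `Map` are assumptions for illustration, and the existing archive entry is left out because its value is not stated here.

```typescript
// Global default for worker tasks, in seconds (120 per the description above).
const maxWorkerTimeSeconds = 120;

// Per-task overrides. The real map also has an entry for the archive task;
// it is omitted here because its value is not given in this description.
const customTimeouts = new Map<string, number>([
  ['import', 300], // 5 minutes of headroom for large documents
]);

// Hypothetical helper: use the override when one exists, else the default.
function timeoutSecondsFor(taskType: string): number {
  return customTimeouts.get(taskType) ?? maxWorkerTimeSeconds;
}
```

With this shape, raising the limit for one task type is a one-line change that leaves every other task on the default.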
Key Insight
One SQL query with 50 names in its WHERE clause is far cheaper than 50 individual queries run 2 at a time. Batching at the database level eliminates the need for application-level concurrency control entirely.
Issue(s) Resolved
Resolves #2795