Fix: always paginate GET /users; /search 500 on empty acronym filter #216
Merged
Drops the `if page?` guard so `/users` unconditionally returns a Page object. Adds a deterministic default sort (`username asc`) so paged responses are stable for clients walking `nextPage` links. Addresses production timeouts on `/users?include=all`. Pairs with ncbo/ontologies_linked_data#286 (idempotent `User#admin?`, drop inverse attrs from auth load) and the corresponding ontologies_api_ruby_client auto-paginate change. Updates tests for the paged response shape, covering `?pagesize` and `?include=all` combinations.
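A minimal plain-Ruby sketch of the always-paged shape with a deterministic default sort described above. `StubUser` and `paginate_users` are hypothetical stand-ins, not the real `LinkedData::Models::User` or Goo paging API, and the `links` field is omitted.

```ruby
# Hypothetical stand-in for the real user model; only `username` matters here.
StubUser = Struct.new(:username)

# Sketch of an always-paged listing with a deterministic default sort.
# Field names mirror the paged JSON shape (links omitted); this is not the
# real Goo/LinkedData paging implementation.
def paginate_users(users, page: 1, pagesize: 50)
  # Sorting by username ascending gives every request the same ordering, so a
  # client following nextPage neither skips nor repeats users at page edges.
  sorted = users.sort_by(&:username)
  total = sorted.size
  page_count = [(total.to_f / pagesize).ceil, 1].max
  offset = (page - 1) * pagesize
  {
    page: page,
    pageCount: page_count,
    totalCount: total,
    prevPage: page > 1 ? page - 1 : nil,
    nextPage: page < page_count ? page + 1 : nil,
    collection: sorted[offset, pagesize] || []
  }
end
```

For example, `paginate_users(%w[carol alice bob].map { |u| StubUser.new(u) }, pagesize: 2)` returns page 1 of 2 with `nextPage: 2` and `alice`, `bob` in the collection. Without an explicit sort, a SPARQL-backed store gives no ordering guarantee, so page boundaries could shift between requests.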
Commit 388d1bd ("converted acronyms filter from Boolean to Term syntax") replaced get_quoted_field_query_param at the start of filter_query construction in BOTH search_helper.rb (line 160) and properties_search_helper.rb (line 48). The new term-syntax function correctly early-returns "" for an empty acronyms list (an empty terms-query would itself be malformed), but the rest of filter_query construction in both helpers implicitly relied on the prior function's vacuous `submissionAcronym:""` placeholder always producing a non-empty starting clause. When acronyms filtered to empty (e.g. ontology_types restriction matches nothing, or the requesting user has no access to any of the requested ontologies), filter_query was "" and the subsequent `<< " AND <clause>"` appends produced a stray-AND fq that Solr rejected with 400 (surfaced as 500 by the API), breaking both /search and /property_search for any call whose acronyms resolved empty. Fix: when filter_query is empty after the acronyms step, use Solr's match-all literal `*:*`. AND'ing further clauses onto `*:*` narrows correctly, producing well-formed queries semantically equivalent to the constraints that follow. Also more correct than the pre-388d1bd behavior, which always silently added `submissionAcronym:""` (matching zero docs) when acronyms was empty — search-without-acronyms now actually returns matches. The third in-tree call site (search_helper.rb:181 — valueset_root_ids) is unaffected: it's already guarded by `unless valueset_root_ids.empty?` on the line above, so the function input is guaranteed non-empty. Adds regression tests for both endpoints exercising the empty-acronyms path via ontology_types=NONEXISTENT.
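The failure mode and the fix can be sketched in a few lines of Ruby. `terms_clause` and `build_filter_query` are hypothetical simplifications of `get_terms_field_query_param` and the helpers' `filter_query` construction; the nested `_query_` terms format and the example clauses are assumptions for illustration.

```ruby
# Hypothetical simplification of get_terms_field_query_param: like the real
# function (per this PR), it returns "" for an empty list because an empty
# terms query would be malformed. The exact output format is an assumption.
def terms_clause(field, values)
  return "" if values.empty?
  "_query_:\"{!terms f=#{field}}#{values.join(',')}\""
end

# Sketch of filter_query construction. With match_all_fallback: false it
# reproduces the bug; with true it applies the `*:*` fix.
def build_filter_query(acronyms, extra_clauses, match_all_fallback: true)
  filter_query = terms_clause("submissionAcronym", acronyms)
  # Solr's match-all query gives the AND chain a valid left-hand operand.
  filter_query = "*:*" if match_all_fallback && filter_query.empty?
  extra_clauses.each do |clause|
    # The real helpers append in place with `<<`; `+=` keeps this sketch
    # free of string-mutation concerns while producing the same text.
    filter_query += " AND #{clause}"
  end
  filter_query
end
```

With a hypothetical clause, `build_filter_query([], ["ontologyType:ONTOLOGY"], match_all_fallback: false)` yields `" AND ontologyType:ONTOLOGY"`, the stray-AND `fq` Solr rejects; with the default fallback it yields `"*:* AND ontologyType:ONTOLOGY"`.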
Summary
- Always paginates `GET /users` by removing the `if page?` guard in `helpers/users_helper.rb`. The endpoint now always returns a paged response (`{page, pageCount, totalCount, prevPage, nextPage, links, collection}`), never a flat array.
- Adds a deterministic default sort (`username asc`) so clients walking `nextPage` links get stable page boundaries.
- Updates `ontologies_linked_data` to develop tip (`cdbf8f6`), which includes ncbo/ontologies_linked_data#286 (Fix /users perf with security: idempotent admin?, drop inverse attrs from auth load): an idempotent `User#admin?` and inverse-attribute exclusion from the auth-middleware load (the dominant per-user cost in the timeout).
- Fixes `/search` and `/property_search` returning 500 when the acronym filter resolves to empty (e.g. the `ontology_types` filter rejects all candidates, or the requesting user has no access to any of the requested ontologies). This is a pre-existing regression introduced by commit `388d1bd0` ("converted acronyms filter from Boolean to Term syntax"). It is bundled here because both fixes are production-critical and ship in the same release window.

Why /users is slow
Production users were unable to create ontologies: `bioportal_web_ui` controllers that called `LinkedData::Client::Models::User.all` were getting `Faraday::TimeoutError`. Response time of `/users?include=all` grew linearly with user count: every serialized user re-invoked `Thread.current[:remote_user]&.admin?`, which re-loaded `:role` via Goo on each call (an N+1 pattern repeated against the same requesting user), compounded by the absence of pagination. With ~5,300 users in production, this exceeded Faraday's 60s read timeout on every authenticated request to that endpoint.

Why /search and /property_search are broken
Commit `388d1bd0` replaced `get_quoted_field_query_param` with `get_terms_field_query_param` at the start of `filter_query` construction. The new term-syntax function correctly early-returns `""` for an empty acronyms list (an empty `_query_:"{!terms f=…}"` would itself be malformed), but the rest of `filter_query` construction implicitly relied on the prior function's vacuous `submissionAcronym:""` placeholder always producing a non-empty starting clause.

When acronyms filtered to empty, `filter_query` was `""` and the subsequent `<< " AND <clause>"` appends produced a stray-AND `fq` that Solr rejected with 400 (surfaced as 500 by the API). The same bug exists at two call sites: `helpers/search_helper.rb:160` and `helpers/properties_search_helper.rb:48`. A third call site (`search_helper.rb:181`, `valueset_root_ids`) is unaffected because it is already guarded by `unless valueset_root_ids.empty?`.

Fix: substitute Solr's match-all literal `*:*` when `filter_query` is empty after the acronyms step. AND'ing further clauses onto `*:*` narrows correctly. Side benefit: this is also more correct than the pre-`388d1bd0` behavior, which silently returned 0 hits when acronyms was empty due to the vacuously restrictive `submissionAcronym:""` placeholder.

Cross-repo rollout
The full /users fix spans four repos and must roll out in a specific order:
Breaking change
`GET /users` now always returns a paged Hash, never a flat array. Direct HTTP consumers (anything bypassing the `ontologies_api_ruby_client` gem) need to read `response['collection']` instead of treating the response as an array. The forthcoming client release transparently walks pages and returns a flat array to existing in-process callers, so consumers using the gem are unaffected once they upgrade to that release.

Local benchmarks (21,339 users on dev triplestore, after #286 + this PR)
- `/users?include=all` (no params, paginates to 50)
- `pagesize=5000`, slim include
- `pagesize=5000&include=all` (6.7 MB)
- `Models::User.all` walk via client (5 pages × 5000)

Per-page cost dropped from ~30ms × N (linear in total user count) to roughly flat at ~2s for the slim include.
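For direct HTTP consumers affected by the breaking change, flattening the paged response amounts to following `nextPage` until it is null. A minimal sketch, assuming `fetch_page` is a caller-supplied callable (e.g. wrapping `Net::HTTP` and `JSON.parse`) that returns the parsed body for a page number; neither it nor `fetch_all_users` is part of `ontologies_api_ruby_client`.

```ruby
# Sketch: collect every user from the always-paged GET /users response.
# `fetch_page` is a hypothetical callable: page number -> parsed JSON Hash
# with string keys, as JSON.parse would return.
def fetch_all_users(fetch_page)
  users = []
  page = 1
  # Follow nextPage until the server reports no further page (null/nil).
  while page
    body = fetch_page.call(page)
    users.concat(body.fetch("collection"))
    page = body["nextPage"]
  end
  users
end
```

With `pagesize=5000` and 21,339 users, this loop issues ceil(21,339 / 5,000) = 5 requests, matching the 5-page client walk listed above. The walk only stays consistent because of the deterministic `username asc` sort, and assumes no users are created or deleted mid-walk.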
Test plan
- `bundle exec rake test` passes locally
- `test_all_users` asserts the paged response shape (`page`, `totalCount`, `collection`)
- `test_all_users_pagination` asserts `?pagesize=2` returns page 1 of 2 with `nextPage=2`
- `test_all_users_include_all_is_paged` asserts `?include=all&pagesize=2` paginates and each item carries the requested attributes (e.g. `created`)
- `test_search_with_empty_acronym_filter_returns_ok` asserts `/search?q=anything&ontology_types=NONEXISTENT_TYPE` returns 200 with an empty collection (regression test for the `*:*` fix)
- `test_property_search_with_empty_acronym_filter_returns_ok` asserts the equivalent for `/property_search`
- `curl '/users?include=all'` returns paged JSON; `curl '/users?pagesize=50&include=all'` returns the first page in a few seconds; `curl '/search?q=Conceptual%20Entity&ontologies=STY&require_exact_match=true'` returns 200 (was 500)
- `bioportal_web_ui` (with the matching client gem) renders `/admin/users`, the New Ontology form's user dropdown, and the `/users` admin page correctly after deploy