Skip to content

Comments

Simplify query for runs to export for samples and data during folder export#7448

Open
XingY wants to merge 1 commit intodevelopfrom
fb_folderExportPerf
Open

Simplify query for runs to export for samples and data during folder export#7448
XingY wants to merge 1 commit intodevelopfrom
fb_folderExportPerf

Conversation

@XingY
Copy link
Contributor

@XingY XingY commented Feb 23, 2026

Rationale

DataClassFolderWriter.write() and SampleTypeFolderWriter.write() follow the problematic pattern that has the potential of large memory usage.

  • Pull all ExpData or ExpMaterial into memory
  • Pull all ExpRun that reference the data or materials into memory
  • Filter them based run protocol and other criteria in Java
  • Add the RowIds for the filtered runs to the export

This PR simplifies the process by getting rid of the intermediate steps to query for all materials and runs, and instead query for run.rowId directly based on specified criteria.

Related Pull Requests

Changes

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request refactors the folder export logic for sample types and data classes to reduce memory usage by replacing in-memory filtering with direct SQL queries. Instead of loading all ExpData, ExpMaterial, and ExpRun objects into memory and filtering them in Java, the new implementation queries for run IDs directly from the database with appropriate filters.

Changes:

  • Replaced memory-intensive object loading and Java-based filtering with SQL-based queries for determining which runs to export
  • Added two new methods to ExperimentService API: getDerivationRunIdsForDataClassExport and getDerivationRunIdsForSampleTypesExport
  • Removed intermediate data structures and helper methods (isValidRunType) that were used for in-memory filtering

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description
experiment/src/org/labkey/experiment/samples/SampleTypeFolderWriter.java Simplified run selection by replacing material loading and filtering with direct SQL query; removed isValidRunType method
experiment/src/org/labkey/experiment/samples/DataClassFolderWriter.java Simplified run selection by replacing data loading and filtering with direct SQL query; removed isValidRunType method
experiment/src/org/labkey/experiment/api/ExperimentServiceImpl.java Implemented two new SQL-based query methods that return filtered run IDs for folder export
api/src/org/labkey/api/exp/api/ExperimentService.java Added interface declarations for the two new query methods
Comments suppressed due to low confidence (3)

experiment/src/org/labkey/experiment/samples/SampleTypeFolderWriter.java:106

  • The comment states "only want the sample derivation runs" but the implementation also includes aliquot runs (SAMPLE_ALIQUOT_PROTOCOL_LSID). Consider updating the comment to clarify that both derivation and aliquot protocol runs are included.
        // only want the sample derivation runs; other runs will get included in the experiment xar.

api/src/org/labkey/api/exp/api/ExperimentService.java:811

  • The JavaDoc comment could be more detailed. Consider expanding it to explain: (1) what "derivation run IDs" means in this context, (2) that it includes runs where materials from the specified sample types are used as either inputs or outputs, (3) the behavior of the includeRunsWithDataIO parameter more explicitly (when true, includes all derivation and aliquot runs; when false, includes aliquot runs and only derivation runs without data inputs/outputs).
    /** Get derivation/aliquot run IDs for sample types — filtered by protocol and optionally excluding runs with data inputs/outputs */
    List<Long> getDerivationRunIdsForSampleTypesExport(Collection<String> sampleTypeLsids, Container c, boolean includeRunsWithDataIO);

api/src/org/labkey/api/exp/api/ExperimentService.java:808

  • The JavaDoc comment could be more detailed. Consider expanding it to explain: (1) that it returns run IDs where the specified data class is used as either input or output, (2) that it only includes runs with the SAMPLE_DERIVATION_PROTOCOL that have no material inputs or outputs (since those are handled by the sample type writer).
    /** Get derivation run IDs for a data class — runs with SAMPLE_DERIVATION_PROTOCOL that have no material inputs/outputs */
    List<Long> getDerivationRunIdsForDataClassExport(long dataClassRowId);

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant