Description
TL;DR:
We may be able to derive additional filters to remove samples that aren't relevant to the classifications, but I'm not sure what they should be.
Details:
I recently did a 500-sample test run with the cells and multichannel filters on, which reduced it to ~200 samples. Looking through the results, I think we still have some samples that don't fit any category.
I put all the samples into a Google Sheet here and added a manual category column, where I started noting which category each sample should go in. Rows marked flagged were picked up by the rough regex flags I already put together. I also added data from a 2k-sample run (filtered to ~800 samples) but haven't added any manual category labels there yet.
I haven't gone through the whole thing yet, but I found a couple of rows that mention something like gene expression in a tissue without mentioning mice or a treatment given to a mouse. The examples I've found so far are on rows 3 and 5 of the 500-sample sheet.
- Do you think we should use a filter to capture these samples and remove them?
- If so, what's the higher-level group these fall under?
- If not, do they fit into a category that I'm missing, or is there another reason we want to keep them?
- Also, feel free to add ideas about any other filters below.
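
To make the first question concrete, here is a rough sketch of what such a filter could look like. It is only a starting point for discussion: the patterns and the function name `is_expression_without_mouse` are hypothetical, not the regex flags already used in the pipeline, and it assumes each sample is available as a free-text description.

```python
import re

# Hypothetical patterns for illustration; the project's real regex flags may differ.
EXPRESSION_RE = re.compile(r"\bgene expression\b", re.IGNORECASE)
MOUSE_RE = re.compile(r"\b(?:mouse|mice|murine|mus musculus)\b", re.IGNORECASE)


def is_expression_without_mouse(description: str) -> bool:
    """Return True when the text mentions gene expression but never mentions mice."""
    mentions_expression = EXPRESSION_RE.search(description) is not None
    mentions_mouse = MOUSE_RE.search(description) is not None
    return mentions_expression and not mentions_mouse


def split_samples(descriptions: list[str]) -> tuple[list[str], list[str]]:
    """Split descriptions into (kept, flagged) so flagged rows can be reviewed by hand."""
    kept, flagged = [], []
    for text in descriptions:
        (flagged if is_expression_without_mouse(text) else kept).append(text)
    return kept, flagged
```

Returning the flagged rows instead of dropping them would let us check them against the manual category column before deciding whether to remove them.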