Speed up validation for datasets with same-size images #1373
jveitchmichaelis wants to merge 1 commit into main
Conversation
Codecov Report ❌ Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1373      +/-   ##
==========================================
- Coverage   86.87%   86.48%    -0.40%
==========================================
  Files          24       24
  Lines        3064     3205      +141
==========================================
+ Hits         2662     2772      +110
- Misses        402      433       +31
I wonder if, instead of this, we should just make validate_coordinates() something a user can call from DeepForest.main. It seems really heavy that on every run we open every image. I think the underlying idea here is wrong, and while this PR works around it, we should probably allow models to fail and then document validate_coordinates() so users can track down that error themselves.
Or have a CLI script which runs the sanity checks? But yes, exactly: the current behaviour is to open every image on every run, which adds minutes to large training runs.
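
A minimal sketch of what such a standalone check might look like. The import path and the `validate_coordinates(annotations, root_dir)` signature are assumptions for illustration, not the actual DeepForest API:

```python
# Hypothetical CLI wrapper around the existing coordinate check; the
# location and signature of validate_coordinates() are assumed here.
import argparse

import pandas as pd

from deepforest.utilities import validate_coordinates  # assumed location


def main() -> None:
    parser = argparse.ArgumentParser(
        description="Sanity-check that annotations fall inside their images."
    )
    parser.add_argument("csv_file", help="annotations CSV")
    parser.add_argument("root_dir", help="directory containing the images")
    args = parser.parse_args()

    annotations = pd.read_csv(args.csv_file)
    validate_coordinates(annotations, args.root_dir)
    print("All annotations are in bounds.")


if __name__ == "__main__":
    main()
```

Run once before kicking off training, this keeps the expensive per-image check out of the training loop itself.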
Closed in favor of a solution to issue #1374.
Description
Most of the time we train on datasets where all images are the same size. During validation we currently spend a lot of time opening every image to check its dimensions so we can confirm that annotations are in bounds. If we know up front that the dataset doesn't contain varying image sizes, a simple optimization is to take the size of the first image and assume it holds for the rest of the dataset.
The new option defaults to False, which keeps the brute-force approach, but in cases where we know the dataset is good, or the images are all the same size, enabling it can save quite a bit of time and disk thrashing (see the sketch below). I also cleaned up the keypoint validation checker to follow a similar structure.
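
A minimal sketch of the optimization, assuming a pandas annotations frame with image_path/xmin/ymin/xmax/ymax columns; the function and the `assume_uniform_size` flag name are illustrative, not the actual patch:

```python
# Illustrative sketch only; names here are hypothetical, not DeepForest's API.
import os

import pandas as pd
from PIL import Image


def validate_boxes(annotations: pd.DataFrame, root_dir: str,
                   assume_uniform_size: bool = False) -> None:
    """Check that every box in `annotations` lies within its image.

    With assume_uniform_size=True, only the first image is opened and its
    dimensions are assumed to hold for the whole dataset.
    """
    shared_size = None
    for image_path, group in annotations.groupby("image_path"):
        if shared_size is None or not assume_uniform_size:
            # Brute-force path: open each image to read its true size.
            with Image.open(os.path.join(root_dir, image_path)) as img:
                shared_size = img.size  # (width, height)
        width, height = shared_size
        out_of_bounds = (
            (group["xmin"] < 0) | (group["ymin"] < 0)
            | (group["xmax"] > width) | (group["ymax"] > height)
        )
        if out_of_bounds.any():
            raise ValueError(
                f"{out_of_bounds.sum()} annotations fall outside "
                f"{image_path} ({width}x{height})"
            )
```

With `assume_uniform_size=True`, only a single `Image.open` call happens for the entire dataset, which is where the time and disk-thrashing savings come from.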
AI-Assisted Development
AI tools used (if applicable):
Claude