added Double Counting algorithm for review #1098

Open

Bhavya1604 wants to merge 3 commits into weecology:main from Bhavya1604:doublecounting_integration

Conversation

@Bhavya1604
Contributor

Hi @henrykironde, @bw4sz, and @jveitchmichaelis,
This draft PR contains a working prototype script that implements the "predict and delete" strategy to handle double-counting of objects in overlapping images. I've adapted the core logic from the DoubleCounting repo and integrated it here.

I tested the workflow on a dataset with 70-80% overlap using the "left-hand" strategy for clear visualization. The blue boxes are all initial predictions, while the pink boxes are the final, unique predictions for that image.

Output:
[image]

We can observe that the top predictions are in pink, i.e. unique (new for that image), which indicates that the code identifies the overlap correctly. There were a total of 401 predictions, of which 194 were detected as unique.

This PR contains the standalone DoubleCounting.py script for review. Before I start integrating it into the main DeepForest library, I would greatly appreciate your feedback.
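For readers unfamiliar with the strategy semantics, here is a minimal sketch of how the "left-hand" strategy mentioned above (and a "highest-score" alternative) could resolve matched predictions between two overlapping images. The function name, the dict shape, and the strategy labels are illustrative assumptions; how matches are found in the first place (IoU, feature matching) is left abstract:

```python
def resolve_matches(matches, strategy="left-hand"):
    """Given matched prediction pairs (left, right) from two overlapping
    images, decide which copy to keep and which to delete.

    'left-hand' keeps the earlier (left) image's box; 'highest-score'
    keeps whichever box scored higher. Each prediction is assumed to be
    a dict with at least a 'score' key.
    """
    kept, deleted = [], []
    for left, right in matches:
        if strategy == "left-hand":
            winner, loser = left, right
        elif strategy == "highest-score":
            winner, loser = (left, right) if left["score"] >= right["score"] else (right, left)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        kept.append(winner)
        deleted.append(loser)
    return kept, deleted
```

Under either strategy, the "deleted" half of each pair is what gets removed from the second image's predictions, which is why 194 of the 401 boxes survive in the example above.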

@bw4sz
Collaborator

bw4sz commented Aug 7, 2025

Thanks for your thoughts here. Your example is somewhat difficult to follow because there are a lot of boxes; can you use the deepforest bird model and the data at https://github.com/weecology/DoubleCounting/tree/main/tests/data/birds?

I think that if we merge this we would need a separate pip install, since the dependencies are heavy compared to the rest of the repo. So I imagine something like

pip install deepforest[double_counting], defined as an extra in the pyproject.toml:

    # pyproject.toml
    [project]
    name = "my_package"
    version = "0.1.0"

    [project.optional-dependencies]
    subpackage_extra = [
        "dependency_for_subpackage_extra_1",
        "dependency_for_subpackage_extra_2",
    ]

Then we would need to collect several other datasets to try to get a handle on how well it generalizes and what parameters are sensitive. These parameters would need to go in the hydra config. The general workflow would be something like

  1. Make a function that takes in a list of images.
  2. Use predict_tile on each image.
  3. Run double counting.
  4. Produce visualizations.
  5. Return a results object of unique data.
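The five steps above could be skeletoned roughly as follows. The predict and deduplicate callables are stand-ins for the real pieces (e.g. model.predict_tile and the double-counting routine), so every name and signature here is illustrative only:

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def unique_predictions(
    images: List[str],
    predict: Callable[[str], List[Box]],
    deduplicate: Callable[[List[Box], List[Box]], List[Box]],
) -> Dict[str, List[Box]]:
    """Steps 1-5 as a skeleton: take a list of images, predict on each,
    remove boxes already seen in earlier overlapping images, and return
    a results object mapping image path to its unique boxes."""
    seen: List[Box] = []
    results: Dict[str, List[Box]] = {}
    for path in images:
        boxes = predict(path)              # step 2: per-image inference
        unique = deduplicate(seen, boxes)  # step 3: double-counting removal
        results[path] = unique             # step 5: unique data per image
        seen.extend(unique)                # remember what is already counted
    return results
```

Visualization (step 4) would hang off the per-image results; keeping it out of the core loop makes the function easy to test.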

All of this is in the module you attached, but it would need integration into deepforest.main(), plus a documentation page with examples.

Roadmap

  • example with existing test data
  • gather new examples to assess parameter sensitivity
  • package extra install for unique dependencies
  • documentation page
  • integrate a function with deepforest main (unique_predictions_images in your module; maybe call it deepforest.main.predict_unique)

@bw4sz added the Feature Request, dependencies, Ideas for Machine Learning!, and To be documented labels on Aug 7, 2025
@Bhavya1604
Contributor Author

@bw4sz Thank you again for the detailed feedback and the clear roadmap. The plan for optional dependencies and integrating the feature as predict_unique makes perfect sense.

As you suggested, I've rerun the workflow on the bird dataset to provide a clearer example of the result.
[image]

I'll start testing it with different data and let you know the sensitive parameters.

@bw4sz
Collaborator

bw4sz commented Aug 18, 2025

great, let me know if you need help.

@Bhavya1604
Contributor Author

@bw4sz I could only find a few overlapping datasets on Kaggle and GitHub. The rest of the data I found was either not in order, or consisted of aerial images of different places without overlaps. Where can I find more datasets to test this? Are there any keywords that would help me, or any place I could search for these types of datasets?

@bw4sz
Collaborator

bw4sz commented Oct 8, 2025

We have a number of datasets, let me look into this today. Can you look at that roadmap above and summarize which pieces are completed, which you plan to do, and which I can help you with. This is great stuff and I have some time this week to assist in review and get it over the finish line. Thanks for the contribution!

@Bhavya1604
Contributor Author

Sorry, for the past two months I haven't been able to focus on this as I was caught up with a lot of things.

Now I have more time, so I will start with the documentation and the extra package install today, and once I have the data I will test and observe the sensitive parameters.

I'll give regular updates on what's completed and where I'm stuck.

@bw4sz
Collaborator

bw4sz commented Oct 10, 2025 via email

@codecov

codecov Bot commented Oct 12, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.61%. Comparing base (e20f94f) to head (830c135).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1098      +/-   ##
==========================================
+ Coverage   87.43%   87.61%   +0.18%     
==========================================
  Files          20       20              
  Lines        2538     2544       +6     
==========================================
+ Hits         2219     2229      +10     
+ Misses        319      315       -4     
Flag Coverage Δ
unittests 87.61% <ø> (+0.18%) ⬆️


@Bhavya1604
Contributor Author

Bhavya1604 commented Oct 16, 2025

@bw4sz I’ve moved the code to evaluate.py and main.py, and added documentation along with separate dependencies.
What’s left is adding pytest coverage (which I’ll do once this structure is confirmed) and testing for sensitive parameters.
It might not be exactly as you intended, so I’ll make adjustments as needed.
I’m also a bit unsure about how I added the separate dependencies so please let me know if any changes are needed there.

@Bhavya1604
Contributor Author

Bhavya1604 commented Oct 28, 2025

@bw4sz How will you share the testing data with me?

@Bhavya1604
Contributor Author

@bw4sz I hope you are doing well.
I wanted to know if you have reviewed the latest commit.

```python
# Train model on non-overlapping data
model = main.deepforest()
model.train(
```

Collaborator

this isn't quite right; just have a look at the docs, we use main.trainer.fit as the primary pathway.


```python
# Evaluate unique predictions
evaluation_results = model.evaluate(
```

Collaborator

same as above, main.trainer.validate


For additional support or questions about the double-counting removal functionality, please refer to the DeepForest documentation or community forums.

## Using only the double-counting tools (standalone)
Collaborator

Let's remove this; since it is integrated into the pipeline, we shouldn't advertise running it standalone.

Collaborator

remove everything below.


return {"results": results, "box_recall": box_recall, "class_recall": class_recall}

def get_matching_points(h5_file, image1_name, image2_name, min_score=None):
Collaborator

all double-counting internal code should go in its own module; that makes it easier to rebase and isolate.

Comment thread src/deepforest/main.py
print(f"Using device: {device}")

model = deepforest()
model.use_release()
Collaborator

outdated, we use load_model

Comment thread src/deepforest/main.py
except MisconfigurationException:
pass

def predict_unique(image_dir, save_dir, strategy='highest-score', visualization=True):
Collaborator

I think we would want to change the name here, since it's almost too different from predict_image or predict_tile: it takes another set of folders. I'm not sure what to call it, but "double count" should be in the name.

Collaborator

@bw4sz bw4sz left a comment

I'm wondering about this PR. It could use a bit of cleanup, and we've let it sit. @Bhavya1604, are you interested in continuing to build it here? Just let us know. If not, I think we might discuss how best to combine it, since it's a heavy subpackage and we don't have a second test set right now. The best case to me is to isolate it into its own module; that way we can rebase and have it ready to merge when it's a bit farther along. Otherwise it will be a mess to track changes in these main modules.

  1. Create a double_counting.py module and move all logic in there.
  2. Just have a wrapper in main.py that is remove_double_counts that takes in a set of predictions, so we don't duplicate the inference steps.
  3. An example script showing how to make predictions and express the results
  4. Update and condense the docs.
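Point 2 could look something like the sketch below: a thin wrapper that takes predictions that already exist (here as a list of dicts with image_path and box keys; the real version would probably accept the DataFrame returned by predict_tile) so inference is never duplicated. Everything here, including the IoU matching and the 0.4 threshold, is an assumption for illustration; in the proposed layout the geometry would live in double_counting.py:

```python
def _iou(a, b):
    # a, b: (xmin, ymin, xmax, ymax); assumes overlapping images were
    # already aligned into a shared coordinate frame before comparison
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def remove_double_counts(predictions, iou_threshold=0.4):
    """Drop predictions that overlap an already-kept prediction from a
    different image. Inference happens upstream; this wrapper only
    filters, which is why it never touches the model."""
    kept = []
    for pred in predictions:
        duplicate = any(
            other["image_path"] != pred["image_path"]
            and _iou(other["box"], pred["box"]) >= iou_threshold
            for other in kept
        )
        if not duplicate:
            kept.append(pred)
    return kept
```

Keeping the signature prediction-in, prediction-out also makes the example script (point 3) trivial: predict with the usual API, then pass the combined results through the wrapper.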

@bw4sz added the Awaiting author contribution label and removed the To be documented label on Mar 12, 2026
@Bhavya1604
Copy link
Copy Markdown
Contributor Author

@bw4sz I agree, it would be better to create a new module for this.

@github-actions Bot removed the Awaiting author contribution label on Mar 31, 2026