optimize IoU matching: sparse strtree output + 1:1 pre-filtering #1361

Open
musaqlain wants to merge 1 commit into weecology:main from musaqlain:IoU_performance

@musaqlain
Contributor

@musaqlain musaqlain commented Mar 25, 2026

Resolves #1345

_overlap_all() used the STRtree to find overlapping pairs but then discarded that sparse result by filling dense (n_truth × n_pred) matrices. These were passed directly to linear_sum_assignment(), which runs in O(n²m).

Following @jveitchmichaelis's suggestion, this PR:

  1. _overlap_all() returns sparse parallel arrays directly from the STRtree.

  2. match_polygons() first identifies unambiguous 1:1 matches (using np.bincount on the STRtree indices) and resolves them immediately. Only the remaining ambiguous pairs go to linear_sum_assignment() via a reduced sub-matrix.

  3. One additional improvement: union areas are computed arithmetically (area(A) + area(B) - area(intersection)) instead of calling shapely.union(). In my tests this was more efficient.
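
The three steps above can be sketched roughly as follows. This is a minimal illustration on hand-written arrays, not the PR's actual code: `one_to_one_mask` and `sparse_iou` are hypothetical helper names, and the inputs stand in for the sparse parallel arrays that `_overlap_all()` is described as returning.

```python
import numpy as np


def one_to_one_mask(t_idx, p_idx, inter_areas, n_truth, n_pred):
    """Flag pairs whose truth and prediction each appear in exactly one pair."""
    # Count how many candidate pairs every truth / prediction takes part in.
    truth_counts = np.bincount(t_idx, minlength=n_truth)
    pred_counts = np.bincount(p_idx, minlength=n_pred)
    # Unambiguous: both sides occur once and the overlap has positive area.
    return (truth_counts[t_idx] == 1) & (pred_counts[p_idx] == 1) & (inter_areas > 0)


def sparse_iou(t_idx, p_idx, inter_areas, truth_areas, pred_areas):
    """IoU per pair, with union = area(A) + area(B) - area(intersection)."""
    union = truth_areas[t_idx] + pred_areas[p_idx] - inter_areas
    return inter_areas / union


# Hypothetical candidate pairs: truth 0 <-> pred 0 is unambiguous,
# truth 1 overlaps preds 1 and 2, truth 2 only touches pred 3 (zero area).
t_idx = np.array([0, 1, 1, 2])
p_idx = np.array([0, 1, 2, 3])
inter_areas = np.array([50.0, 30.0, 20.0, 0.0])
truth_areas = np.full(3, 100.0)
pred_areas = np.full(4, 100.0)

resolved = one_to_one_mask(t_idx, p_idx, inter_areas, n_truth=3, n_pred=4)
iou = sparse_iou(t_idx, p_idx, inter_areas, truth_areas, pred_areas)
print(resolved)        # pairs that can be matched immediately
print(iou[~resolved])  # pairs left for linear_sum_assignment()
```

Only the pairs where `resolved` is False would need to be packed into the reduced sub-matrix for the assignment solver.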

Existing tests pass as-is.

AI disclosure

  • AI was used for final improvements; snippets were generated by Copilot alongside my own coding.
  • AI was used to gather background knowledge.

Copilot AI review requested due to automatic review settings March 25, 2026 19:10

Copilot AI left a comment

Pull request overview

This PR addresses the performance/memory bottleneck in polygon IoU matching by keeping STRtree overlap results sparse and reducing the size of the assignment problem before running Hungarian matching.

Changes:

  • Change _overlap_all() to return sparse parallel arrays (overlap indices + intersection/union areas) instead of dense (n_truth × n_pred) matrices.
  • Update match_polygons() to pre-resolve unambiguous 1:1 overlaps and run linear_sum_assignment() only on the remaining ambiguous subset.
  • Compute union areas via area(A) + area(B) - area(intersection) instead of shapely.union().

@codecov

codecov Bot commented Mar 25, 2026

Codecov Report

❌ Patch coverage is 84.78261% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.75%. Comparing base (408e150) to head (9183c2d).
⚠️ Report is 3 commits behind head on main.

Files with missing lines | Patch % | Lines
src/deepforest/IoU.py | 84.78% | 7 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1361      +/-   ##
==========================================
- Coverage   86.87%   86.75%   -0.13%     
==========================================
  Files          24       24              
  Lines        3064     3202     +138     
==========================================
+ Hits         2662     2778     +116     
- Misses        402      424      +22     
Flag | Coverage Δ
unittests | 86.75% <84.78%> (-0.13%) ⬇️

@vickysharma-prog
Contributor

Traced through the diff; the two-stage matching logic checks out. If a truth has exactly one overlapping pred and that pred has exactly one overlapping truth (bincount==1 on both sides) with positive intersection, pulling them out can't affect the optimal assignment for what remains. The inter_areas > 0 guard matters too: STRtree "intersects" includes boundary-touching pairs with zero-area overlap that you'd never want to lock in as a match.
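
The zero-area case can be reproduced with two boxes that share only an edge. A small sketch, assuming Shapely 2.x (where `STRtree.query` accepts an array of geometries and a `predicate`); the geometries are made up for illustration:

```python
import shapely
from shapely import STRtree
from shapely.geometry import box

truths = [box(0, 0, 1, 1)]
preds = [
    box(1, 0, 2, 1),          # shares only the edge x == 1 with the truth
    box(0.5, 0.5, 1.5, 1.5),  # genuinely overlaps the truth
]

tree = STRtree(preds)
# Row 0 indexes the query geometries (truths), row 1 the tree geometries (preds).
t_idx, p_idx = tree.query(truths, predicate="intersects")

# Intersection area for each candidate pair reported by the tree.
inter_areas = shapely.area(
    shapely.intersection([truths[i] for i in t_idx], [preds[j] for j in p_idx])
)
area_by_pred = {int(j): float(a) for j, a in zip(p_idx, inter_areas)}
positive = {j for j, a in area_by_pred.items() if a > 0}
print(area_by_pred, positive)
```

Both predictions come back as candidates, but only the second survives a positive-area filter.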

Couple of things I noticed:

The early return adds a new path that didn't exist before. When truths and preds both exist but nothing overlaps, the original ran linear_sum_assignment on an all-zeros matrix and assigned some truths to predictions with IoU=0. The new code short-circuits all truths to unmatched (prediction_id=None, IoU=0). More sensible, but it is a behavioral change; a test for the truths-present, zero-overlaps case would pin down the contract.
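
To make the behavioral change concrete, here is a small sketch of the old path, assuming the dense IoU matrix was passed to `linear_sum_assignment()` with `maximize=True` (an assumption; the original call is not shown in this thread). The `short_circuit` dictionary is a hypothetical stand-in for the new early return:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

n_truth, n_pred = 2, 3
iou = np.zeros((n_truth, n_pred))  # truths and preds exist, nothing overlaps

# Old path: every truth still gets paired with some prediction, at IoU 0.
rows, cols = linear_sum_assignment(iou, maximize=True)
old_matches = {int(r): (int(c), float(iou[r, c])) for r, c in zip(rows, cols)}

# New path (sketch): skip the solver and leave every truth unmatched.
short_circuit = {t: (None, 0.0) for t in range(n_truth)}

print(old_matches)
print(short_circuit)
```

Either way, an IoU of 0 cannot pass a check of the form `IoU > iou_threshold` with a non-negative threshold, so only the reported prediction id differs between the two paths.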

setdefault in Stage 2 is technically safe since Stage 1 and Stage 2 touch disjoint truth indices by construction, but a direct assignment would make that invariant visible instead of silently swallowing a violation if it ever breaks.
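
The difference is easy to see on a plain dictionary. In this sketch, `matches` and `assign_once` are hypothetical names, and the explicit check is one possible way to surface a violation, not something the PR is known to do:

```python
matches = {0: (10, 0.8)}  # Stage 1: truth 0 -> prediction 10, IoU 0.8

# setdefault keeps the existing entry and hides that truth 0 was seen twice.
matches.setdefault(0, (11, 0.5))


def assign_once(matches, truth_id, value):
    """Assign directly, but fail loudly if the truth was already matched."""
    if truth_id in matches:
        raise ValueError(f"truth {truth_id} matched twice")
    matches[truth_id] = value


assign_once(matches, 1, (12, 0.6))  # disjoint index: fine
try:
    assign_once(matches, 0, (11, 0.5))  # repeated index: surfaces the bug
    violation_detected = False
except ValueError:
    violation_detected = True
print(matches, violation_detected)
```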

Since #1345 had the tracemalloc script, running it on this branch would give concrete before/after numbers.

@jveitchmichaelis
Collaborator

jveitchmichaelis commented Mar 30, 2026

Thanks for the progress on this @musaqlain, it would be interesting to see the same memory test you shared in the issue thread. I can also have a look on a larger dataset to see what sort of improvement we're likely to see in practice. I don't think we have any numbers for what sort of sparsity we'd actually see (like how many ambiguous matches do we end up with for a typical eval).

In the case where there is no overlap, this shouldn't affect final scoring within the eval path because we threshold on IoU (result["match"] = result.IoU > iou_threshold). A completely non-overlapping assignment would get dropped here?

Collaborator

@jveitchmichaelis jveitchmichaelis left a comment

Some comments, I think some complexity can be reduced.

@musaqlain
Contributor Author

thanks @jveitchmichaelis, here are the tracemalloc numbers (n_truth=5000, n_pred=10000):

before (dense matrices):

Shape: (5000, 10000)
Sparsity: 0.0001
Peak memory: 763.9 MB
Time: 0.97s

after (sparse arrays):

STRtree pairs (M): 5070 of 50,000,000 possible (0.01% non-zero)
Peak memory: 0.6 MB  |  Time: 0.17s

memory drops ~1,200x. typical sparsity looks to be well under 0.1% so the pre-filter should help in practice.

On the zero-overlap case: agree, IoU > iou_threshold would drop those matches downstream anyway so final scoring is unaffected.

@musaqlain
Contributor Author

since the number of function parameters of _overlap_all() changed, here is the updated memory test script. I ran it against the updated code changes and the results are satisfactory:


import time
import tracemalloc
import geopandas as gpd
import numpy as np
from shapely.geometry import box

from deepforest.IoU import _overlap_all


def make_dummy_data(n, img_size=10000, box_size=50):
    xs = np.random.uniform(0, img_size - box_size, n)
    ys = np.random.uniform(0, img_size - box_size, n)
    geoms = [box(x, y, x + box_size, y + box_size) for x, y in zip(xs, ys)]
    return gpd.GeoDataFrame({"score": np.random.uniform(0.5, 1.0, n)}, geometry=geoms)


np.random.seed(42)
N_TRUTH = 5000
N_PRED = 10000
truth = make_dummy_data(N_TRUTH)
preds = make_dummy_data(N_PRED)

tracemalloc.start()
t0 = time.time()

t_idx, p_idx, inter_areas, union_areas, truth_ids, pred_ids = _overlap_all(preds, truth)

t1 = time.time()
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

M = t_idx.size
dense_size_mb = (N_TRUTH * N_PRED * 8 * 2) / 1024 / 1024  # two float64 matrices
sparse_size_mb = (M * 8 * 4) / 1024 / 1024  # four float64/intp arrays of length M
sparsity = M / (N_TRUTH * N_PRED)

print(f"STRtree pairs (M):  {M} of {N_TRUTH * N_PRED} possible ({sparsity:.4%} non-zero)")
print(f"Dense matrix would: {dense_size_mb:.1f} MB")
print(f"Sparse arrays use:  {sparse_size_mb:.1f} MB")
print(f"Peak memory (measured): {peak / 1024 / 1024:.1f} MB")
print(f"Time: {t1 - t0:.2f}s")

@jveitchmichaelis
Collaborator

Thanks for this, I'm going to run some tests on a big dataset on our cluster (since that's probably worst-case).

I'll also run an eval check against our benchmark dataset to check for any regression there, but in theory this approach should be a pure speedup.

Development

Successfully merging this pull request may close these issues.

Performance: Dense O(n×m) IoU matrices cause excessive memory usage for large-scale evaluations

4 participants