DINO typically resizes its local patches after cropping them. The resize is generally expensive, and it also seems to hurt metrics.
The point of this task is to replace crop + resize with a plain crop, by starting from a larger canvas.
Basically, instead of grabbing an LxL patch, cropping part of it, and resizing that part, we start from a larger NxN section and cut an LxL section directly out of it.
So we need to find where the resize happens in DINO and check the patch size it works with. We should instead feed much larger patches to begin with and skip the resize.
https://arxiv.org/pdf/2408.00738
Check Figure 2b, and the paragraph below it.
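
The crop-only idea above can be sketched roughly as follows. This is a minimal illustration, not the DINO code: the function names `random_crop_no_resize` and `local_crops`, the crop size L = 96, the canvas size N = 512, and the count of 8 crops are all assumptions chosen for the example, and the real values should be read from the DINO config once the resize call is located.

```python
import numpy as np


def random_crop_no_resize(canvas: np.ndarray, crop_size: int,
                          rng: np.random.Generator) -> np.ndarray:
    """Cut a crop_size x crop_size window out of an (N, N, C) canvas.

    Hypothetical helper: pixels are copied as-is, no interpolation happens.
    """
    height, width = canvas.shape[:2]
    if crop_size > height or crop_size > width:
        raise ValueError("canvas must be at least as large as the crop")
    # Upper bound is exclusive, so +1 lets the window touch the far edge.
    top = int(rng.integers(0, height - crop_size + 1))
    left = int(rng.integers(0, width - crop_size + 1))
    return canvas[top:top + crop_size, left:left + crop_size]


def local_crops(canvas: np.ndarray, crop_size: int = 96,
                num_crops: int = 8, seed: int = 0) -> np.ndarray:
    """Sample several local crops from one large canvas (assumed defaults)."""
    rng = np.random.default_rng(seed)
    crops = [random_crop_no_resize(canvas, crop_size, rng)
             for _ in range(num_crops)]
    return np.stack(crops)


# Example: a 512x512 canvas (N = 512) yields 96x96 crops (L = 96) directly.
canvas = np.zeros((512, 512, 3), dtype=np.uint8)
crops = local_crops(canvas)
```

Because the window is taken at native resolution, the cost per crop is a slice and a copy rather than an interpolation, which is the saving the note is after. The trade-off is that the loader has to supply the larger NxN canvas up front.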