[Question] Missing prepare_ovdg_dataset.py and low performance on novel classes (DIOR) #17


Prerequisite

Task

I'm using the official example scripts/configs for the officially supported tasks/models/datasets.

Branch

master branch https://github.com/open-mmlab/mmrotate

Environment

sys.platform: linux
Python: 3.8.20 | packaged by conda-forge | (default, Sep 30 2024, 17:52:49) [GCC 13.3.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0,1,2,3,4,5: NVIDIA A800 80GB PCIe
CUDA_HOME: /usr/local/cuda-11.7/bin/nvcc
GCC: gcc (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0
PyTorch: 2.1.0+cu118
PyTorch compiling details: PyTorch built with:

  • GCC 9.3
  • C++ Version: 201703
  • Intel(R) oneAPI Math Kernel Library Version 2022.1-Product Build 20220311 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v3.1.1 (Git Hash 64f6bcbcbab628e96f33a62c3e975f8535a7bde4)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • LAPACK is enabled (usually provided by MKL)
  • NNPACK is enabled
  • CPU capability usage: AVX512
  • CUDA Runtime 11.8
  • NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_90,code=sm_90
  • CuDNN 8.9.5
    • Built with CuDNN 8.7
  • Magma 2.6.1
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=8.7.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.1.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

TorchVision: 0.16.0+cu118
OpenCV: 4.13.0
MMEngine: 0.10.7
MMRotate: 1.0.0rc1+

Reproduces the problem - code sample

import os
import json
import glob
import random
from tqdm import tqdm

def generate_nwpu_unlabeled_json_final():
    # ================= Configuration Area (Updated) =================
    
    # 1. Dataset root directory (the directory containing category folders like airplane, airport, etc.)
    data_root = "NWPU-RESISC45"
    
    # 2. Output filename
    output_json = "annotations/nwpu45_unlabeled_2.json"
    
    # 3. Number of images to sample per class
    shots_per_class = 50 
    
    # 4. Random seed
    seed = 42
    
    # ================================================================

    # 45 class names
    classes = sorted([
        'airplane', 'airport', 'baseball_diamond', 'basketball_court', 'beach', 
        'bridge', 'chaparral', 'church', 'circular_farmland', 'cloud', 
        'commercial_area', 'dense_residential', 'desert', 'forest', 'freeway', 
        'golf_course', 'ground_track_field', 'harbor', 'industrial_area', 
        'intersection', 'island', 'lake', 'meadow', 'medium_residential', 
        'mobile_home_park', 'mountain', 'overpass', 'palace', 'parking_lot', 
        'railway', 'railway_station', 'rectangular_farmland', 'river', 
        'roundabout', 'runway', 'sea_ice', 'ship', 'snowberg', 
        'sparse_residential', 'stadium', 'storage_tank', 'tennis_court', 
        'terrace', 'thermal_power_plant', 'wetland'
    ])

    # Build Class ID mapping
    cat2id = {name: i + 1 for i, name in enumerate(classes)}

    random.seed(seed)
    
    coco_output = {
        "images": [],
        "annotations": [], # Keep empty
        "categories": []
    }

    # Populate Categories
    for name, cid in cat2id.items():
        coco_output["categories"].append({"id": cid, "name": name})

    print(f"Data Source: {data_root}")
    print(f"Output to: {output_json}")
    print(f"Processing: {shots_per_class} images per class...")
    
    img_id_counter = 1
    
    # Check if root directory exists
    if not os.path.exists(data_root):
        raise FileNotFoundError(f"Data directory not found: {data_root}")

    for cat_name in tqdm(classes, desc="Scanning classes"):
        cat_dir = os.path.join(data_root, cat_name)
        
        if not os.path.exists(cat_dir):
            print(f"Warning: Category directory {cat_dir} not found, skipping.")
            continue

        img_files = glob.glob(os.path.join(cat_dir, "*.jpg"))
        random.shuffle(img_files)
        
        selected_imgs = img_files[:shots_per_class]
        current_cat_id = cat2id[cat_name]

        for img_path in selected_imgs:
            # Calculate relative path, generated file_name looks like "airplane/airplane_001.jpg"
            rel_path = os.path.relpath(img_path, data_root)
            
            image_info = {
                "id": img_id_counter,
                "file_name": rel_path,
                "width": 256,
                "height": 256,
                # Key step: Must include category_id here to work with merge_ovdg_preds.py
                "category_id": current_cat_id 
            }
            coco_output["images"].append(image_info)
            img_id_counter += 1

    # Create output directory
    os.makedirs(os.path.dirname(output_json), exist_ok=True)

    # Save
    with open(output_json, 'w') as f:
        json.dump(coco_output, f)

    print(f"\nSuccessfully generated: {output_json}")
    print(f"Total images included: {len(coco_output['images'])}")

if __name__ == "__main__":
    generate_nwpu_unlabeled_json_final()
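
For reference, the file this script produces looks like the following (one image entry shown, categories truncated to the first two; the whole layout is my own guess at what prepare_ovdg_dataset.py would emit, since the file is missing):

{
  "images": [
    {
      "id": 1,
      "file_name": "airplane/airplane_001.jpg",
      "width": 256,
      "height": 256,
      "category_id": 1
    }
  ],
  "annotations": [],
  "categories": [
    {"id": 1, "name": "airplane"},
    {"id": 2, "name": "airport"}
  ]
}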

Reproduces the problem - command or script

DEVICES_ID=3

exp1="glip_atss_r50_a_fpn_dyhead_visdronezsd_base"

# Step1: train base-detector
CUDA_VISIBLE_DEVICES=$DEVICES_ID python tools/train.py \
    projects/GLIP/configs/$exp1.py

# Step2.1: pseudo-labeling
exp2="glip_atss_r50_a_fpn_dyhead_visdronezsd_base_nwpu45_pseudo_labeling"
CUDA_VISIBLE_DEVICES=$DEVICES_ID python tools/test.py \
    projects/GLIP/configs/$exp2.py \
    work_dirs/$exp1/iter_20000.pth

# Step2.2: merge predictions
python projects/GroundingDINO/tools/merge_ovdg_preds.py \
    --ann_path data/NWPU-RESISC45/annotations/nwpu45_unlabeled_2.json \
    --pred_path work_dirs/$exp2/nwpu45_pseudo_labeling_2.bbox.json \
    --save_path work_dirs/$exp2/nwpu45_unlabeled_with_glip_pseudos_2.json

cp work_dirs/$exp2/nwpu45_unlabeled_with_glip_pseudos_2.json data/NWPU-RESISC45/annotations/nwpu45_unlabeled_with_glip_pseudos_2.json

# Step3: self-training
exp3="glip_atss_r50_a_fpn_dyhead_visdronezsd_base_nwpu"
CUDA_VISIBLE_DEVICES=$DEVICES_ID python tools/train.py \
    projects/GLIP/configs/$exp3.py

# Step4: test
CUDA_VISIBLE_DEVICES=$DEVICES_ID python tools/test.py \
    projects/GLIP/configs/$exp3.py \
    work_dirs/$exp3/iter_10000.pth \
    --work-dir work_dirs/$exp3/dior_test
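
A quick way to see whether Step 2.2 produced anything usable is to count pseudo boxes per category in the merged file before starting self-training, since a class with no pseudo labels cannot be learned in Step 3. A minimal sketch, assuming the merged file keeps the COCO-style "annotations"/"categories" layout:

import json
from collections import Counter

# Merged annotation file from Step 2.2 (path as copied above).
merged = "data/NWPU-RESISC45/annotations/nwpu45_unlabeled_with_glip_pseudos_2.json"

with open(merged) as f:
    data = json.load(f)

id2name = {c["id"]: c["name"] for c in data["categories"]}
counts = Counter(a["category_id"] for a in data.get("annotations", []))

# Categories that print 0 here never contribute to self-training,
# which would explain 0.000 AP on those classes after Step 3.
for cid in sorted(id2name):
    print(f"{id2name[cid]}: {counts.get(cid, 0)} pseudo boxes")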

Reproduces the problem - error message

+-------------------------+-----+-------+--------+-------+
| class                   | gts | dets  | recall | ap    |
+-------------------------+-----+-------+--------+-------+
| airplane                | 292 | 4115  | 0.928  | 0.814 |
| baseballfield           | 373 | 4478  | 0.922  | 0.846 |
| bridge                  | 381 | 13307 | 0.593  | 0.252 |
| chimney                 | 243 | 2412  | 0.831  | 0.791 |
| dam                     | 155 | 12101 | 0.761  | 0.480 |
| expressway-service-area | 274 | 1000  | 0.785  | 0.686 |
| expressway-toll-station | 188 | 710   | 0.691  | 0.611 |
| golffield               | 176 | 3174  | 0.847  | 0.768 |
| harbor                  | 252 | 6457  | 0.611  | 0.312 |
| overpass                | 307 | 3793  | 0.612  | 0.472 |
| ship                    | 346 | 6537  | 0.766  | 0.538 |
| stadium                 | 184 | 4078  | 0.946  | 0.856 |
| storagetank             | 237 | 1966  | 0.890  | 0.792 |
| tenniscourt             | 293 | 5826  | 0.867  | 0.606 |
| trainstation            | 143 | 2932  | 0.552  | 0.398 |
| vehicle                 | 910 | 12082 | 0.731  | 0.437 |
| airport                 | 160 | 0     | 0.000  | 0.000 |
| basketballcourt         | 232 | 0     | 0.000  | 0.000 |
| groundtrackfield        | 359 | 0     | 0.000  | 0.000 |
| windmill                | 293 | 0     | 0.000  | 0.000 |
+-------------------------+-----+-------+--------+-------+
| mAP                     |     |       |        | 0.483 |
+-------------------------+-----+-------+--------+-------+

Additional information

First of all, thank you for your impressive work on this project.

I am trying to reproduce the OVD results on the DIOR dataset, but I noticed that the file projects/GroundingDINO/tools/prepare_ovdg_dataset.py is missing from the repository.

My Attempt: To proceed, I implemented a custom script (generate_nwpu_unlabeled_json_final, included in the code sample above) to generate the unlabeled JSON file based on my understanding, and then followed the rest of the pipeline as described.

The Issue: While detection on the base classes works as expected, the model completely fails on the four novel classes (airport, basketballcourt, groundtrackfield, windmill): as the table above shows, they receive zero detections and 0.000 AP.

My Questions:

Could you please share the original prepare_ovdg_dataset.py file, or describe the exact data format it requires? I suspect my custom JSON generation may be causing a misalignment between the visual and text features (a quick id-consistency check I ran is included after these questions).

Is the NWPU dataset strictly necessary for the model to detect novel classes on DIOR, or is it used primarily for data augmentation?
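
For context on the first question, here is the id-consistency check I mentioned. It compares the category ids in my unlabeled JSON (1-based, from the script above) with the ids referenced by the GLIP prediction file from Step 2.1; that the prediction file is a COCO-style list of detection dicts is an assumption on my part:

import json

ann_path = "data/NWPU-RESISC45/annotations/nwpu45_unlabeled_2.json"
pred_path = ("work_dirs/glip_atss_r50_a_fpn_dyhead_visdronezsd_base"
             "_nwpu45_pseudo_labeling/nwpu45_pseudo_labeling_2.bbox.json")

with open(ann_path) as f:
    ann = json.load(f)
with open(pred_path) as f:
    preds = json.load(f)  # assumed: list of {"image_id", "category_id", "bbox", "score"}

ann_ids = {c["id"] for c in ann["categories"]}
pred_ids = {p["category_id"] for p in preds}

# Any mismatch here would shift every pseudo label onto the wrong class name.
print("ids only in annotations:", sorted(ann_ids - pred_ids))
print("ids only in predictions:", sorted(pred_ids - ann_ids))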

Any guidance would be greatly appreciated. Thanks!
