
EGSS implementation in subsequent query rounds #4

@blobquiet

Hi @lanfz2000 and authors,

Thank you for sharing the code for OpenPath! I've been reading the paper and find the approach very interesting. I am currently going through the training scripts to better understand how the active learning loop is implemented.

I had a quick question regarding the selection strategy for the subsequent query rounds (Round 2 onwards).

In Section 2.3 of the paper, the method describes Entropy-Guided Stochastic Sampling (EGSS), where candidates are split into random batches and the most uncertain samples (highest entropy) are selected from each batch.
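To make sure I am reading Section 2.3 correctly, here is a minimal sketch of how I understand EGSS. This is only my interpretation, not code from the repository: the function name `egss_select`, its parameters, and the choice of exactly one sample per batch are my own assumptions.

```python
import numpy as np


def egss_select(entropy, query_num, rng=None):
    """Hypothetical EGSS sketch (my reading of the paper, not the authors' code).

    Splits the candidates into `query_num` random batches and returns the index
    of the highest-entropy candidate in each batch.
    """
    rng = np.random.default_rng(rng)
    entropy = np.asarray(entropy, dtype=float)
    if not 0 < query_num <= entropy.size:
        raise ValueError("query_num must be between 1 and the number of candidates")

    # Shuffle the candidate indices, then cut them into nearly equal random batches.
    shuffled = rng.permutation(entropy.size)
    batches = np.array_split(shuffled, query_num)

    # Assumption: one pick per batch, namely the most uncertain candidate in it.
    return np.array([batch[np.argmax(entropy[batch])] for batch in batches])
```

If `candidates_distance_entropy` holds one entropy value per candidate (which I am assuming, not asserting), I would have expected the final step to look roughly like `np.array(candidates_names)[egss_select(candidates_distance_entropy, query_num)]`.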

While looking at `train_sup_crc100k.py`, I noticed that the code calculates `distance_entropy` around lines 204–214. However, the final selection step at line 232 appears to use `kmean_cluster` rather than the entropy values:

    ## train_sup_crc100k.py

    # ... (Entropy calculation happens above) ...

    ## Kmeans selection
    cluster_idx = kmean_cluster(embeds=candidates_features, n=query_num)
    selected_names = np.array(candidates_names)[cluster_idx]

It seems that `candidates_distance_entropy` is defined but not used in this final selection block, and the code defaults to clustering (similar to the strategy used in the first round).

Is this the intended behavior for the active learning loop, or is it possible that an older version of the script (or a baseline version) was uploaded by mistake? I want to make sure I am benchmarking against the exact EGSS logic described in the paper.

Thanks for your help!
