Hi @lanfz2000 and authors,
Thank you for sharing the code for OpenPath! I've been reading the paper and find the approach very interesting. I am currently going through the training scripts to better understand the active learning loop implementation.
I had a quick question regarding the selection strategy for the subsequent query rounds (Round 2 onwards).
In Section 2.3 of the paper, the method describes Entropy-Guided Stochastic Sampling (EGSS), where candidates are split into random batches and the most uncertain samples (highest entropy) are selected from each batch.
While looking at train_sup_crc100k.py, I noticed that the code calculates distance_entropy around lines 204–214. However, in the final selection step at line 232, it appears to use kmean_cluster rather than the entropy values:
```python
# train_sup_crc100k.py
# ... (entropy calculation happens above) ...

# K-means selection
cluster_idx = kmean_cluster(embeds=candidates_features, n=query_num)
selected_names = np.array(candidates_names)[cluster_idx]
```
It seems that candidates_distance_entropy is defined but not used in this final selection block, and the code defaults to clustering (similar to the strategy used in the first round).
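For concreteness, here is what I would have expected the Round 2+ selection step to look like based on my reading of Section 2.3. This is only a sketch, not your implementation; the function name `egss_select` and its arguments are mine, and I'm assuming the entropy array aligns index-wise with `candidates_names`:

```python
import numpy as np

def egss_select(entropies, query_num, rng=None):
    """Sketch of Entropy-Guided Stochastic Sampling as I read Sec. 2.3:
    shuffle the candidate pool, split it into query_num random batches,
    and pick the highest-entropy sample from each batch."""
    rng = np.random.default_rng(rng)
    entropies = np.asarray(entropies)
    perm = rng.permutation(len(entropies))        # random partition of candidates
    batches = np.array_split(perm, query_num)     # query_num roughly equal batches
    # from each batch, keep the index whose entropy is maximal
    return np.array([b[np.argmax(entropies[b])] for b in batches])

# hypothetical usage in place of the kmean_cluster call:
# selected_idx = egss_select(candidates_distance_entropy, query_num)
# selected_names = np.array(candidates_names)[selected_idx]
```

If that roughly matches the intended logic, then the uploaded script seems to be running a different strategy in the later rounds.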
Is this the intended behavior for the active learning loop, or is it possible that an older version of the script (or a baseline variant) was uploaded by mistake? I want to make sure I am benchmarking against the exact EGSS logic described in the paper.
Thanks for your help!