EGSS implementation in subsequent query rounds

Hi @lanfz2000 and authors,

Thank you for sharing the code for OpenPath! I've been reading the paper and find the approach very interesting. I am currently going through the training scripts to better understand how the active learning loop is implemented.

I had a quick question regarding the selection strategy for the subsequent query rounds (Round 2 onwards).

In **Section 2.3** of the paper, the method describes **Entropy-Guided Stochastic Sampling (EGSS)**, where candidates are split into random batches and the most uncertain samples (highest entropy) are selected from each batch.
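For reference, this is my reading of the EGSS step as a minimal NumPy sketch. It is not the authors' code. `egss_select` and its parameters are hypothetical names, and I'm assuming the number of random batches equals `query_num`, so exactly one sample is taken per batch.

```python
import numpy as np


def egss_select(entropy, query_num, seed=0):
    """Hypothetical sketch of Entropy-Guided Stochastic Sampling (EGSS).

    Assumption: candidates are shuffled, split into `query_num` roughly equal
    random batches, and the highest-entropy candidate of each batch is kept.
    Returns the selected candidate indices.
    """
    entropy = np.asarray(entropy, dtype=float)
    if not 0 < query_num <= len(entropy):
        raise ValueError("query_num must be between 1 and the number of candidates")
    rng = np.random.default_rng(seed)
    # Random partition of candidate indices into query_num non-empty batches.
    batches = np.array_split(rng.permutation(len(entropy)), query_num)
    # Within each batch, keep the most uncertain (highest-entropy) candidate.
    return np.array([batch[np.argmax(entropy[batch])] for batch in batches])
```

In the training loop, I would expect this to receive the per-candidate entropy scores and `query_num`, with the returned indices then used to pick names from `candidates_names`. The random batching keeps the selection from collapsing onto one dense high-entropy region.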

While looking at `train_sup_crc100k.py`, I noticed that the code calculates `distance_entropy` around lines 204–214. However, the final selection step at line 232 appears to use `kmean_cluster` rather than the entropy values:

```python
## train_sup_crc100k.py

# ... (Entropy calculation happens above) ...

## Kmeans selection
cluster_idx = kmean_cluster(embeds=candidates_features, n=query_num)
selected_names = np.array(candidates_names)[cluster_idx]
```

It seems that `candidates_distance_entropy` is defined but not used in this final selection block, and the code defaults to clustering (similar to the strategy used in the first round).
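To make the contrast concrete, here is a rough NumPy-only sketch of what a k-means selection step typically does. I haven't checked the repo's actual `kmean_cluster` implementation. `kmeans_select` and its parameters are hypothetical. The point is that its inputs are only the embeddings and `n`, so no uncertainty score can influence which samples are chosen.

```python
import numpy as np


def kmeans_select(embeds, n, iters=20, seed=0):
    """Hypothetical stand-in for a k-means selection helper.

    Runs Lloyd's k-means on `embeds` and returns, for each centroid, the index
    of the nearest sample. No entropy or uncertainty is used anywhere.
    """
    x = np.asarray(embeds, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centroids from n distinct samples (fancy indexing copies).
    centroids = x[rng.choice(len(x), size=n, replace=False)]
    for _ in range(iters):
        # Assign each sample to its closest centroid (squared Euclidean).
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for k in range(n):
            members = x[labels == k]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[k] = members.mean(axis=0)
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    # Nearest sample to each centroid (duplicates possible in degenerate cases).
    return dists.argmin(axis=0)
```

Swapping this for an entropy-driven selector like the EGSS sketch is the change I expected between the first round and the subsequent ones.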

Is this the intended behavior for the active learning loop, or is it possible that an older version of the script (or a baseline version) was uploaded by mistake? I want to make sure I am benchmarking against the exact EGSS logic described in the paper.

Thanks for your help!

EGSS implementation in subsequent query rounds #4

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

EGSS implementation in subsequent query rounds #4

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions