Description
Name-pair matching can be provided by two types of supervised models: with and without rank features.
- Without rank features: the output is a regular string-matching score.
- With rank features: the output is a calibrated probability score that takes into account the number of good ground-truth candidates.
Overall, the model with rank features has the better performance.
For example, when a company name has two good ground-truth matches, the two scores without rank features can both be close to 1, while the two scores with rank features will both be roughly 0.5. When using the calibrated scores (with rank features), you will then probably miss these name pairs, because they do not pass the discrimination threshold.
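A toy sketch of the example above. It assumes, purely for illustration, that the with-rank-features model shares its probability mass equally among the "good" candidates. `Candidate`, `toy_rank_calibration`, the 0.9 good-candidate cutoff and the 0.7 discrimination threshold are all hypothetical and are not the project's actual model or API.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    raw_score: float  # score from the model WITHOUT rank features


def toy_rank_calibration(candidates, good_cutoff=0.9):
    """Hypothetical stand-in for a with-rank-features model: the probability
    mass is split equally among candidates whose raw score is 'good'."""
    good = {c.name for c in candidates if c.raw_score >= good_cutoff}
    return {c.name: (1.0 / len(good) if c.name in good else 0.0) for c in candidates}


candidates = [
    Candidate("Acme Corp", 0.98),         # good match, raw score close to 1
    Candidate("Acme Corporation", 0.97),  # second good match, also close to 1
    Candidate("Apex Ltd", 0.30),          # poor match
]
threshold = 0.7  # hypothetical discrimination threshold

calibrated = toy_rank_calibration(candidates)
# calibrated == {"Acme Corp": 0.5, "Acme Corporation": 0.5, "Apex Ltd": 0.0}
passed = [name for name, p in calibrated.items() if p >= threshold]
# passed == []  -> both good matches are missed
```

Both raw scores clear the threshold easily, but once the probability is shared between the two good candidates, neither calibrated score does.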
In practice, it would be useful to have a combined supervised model that takes the best with-rank-features prediction from a set of name pairs that already have a high probability under the model without rank features.
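One possible reading of the proposed combination, sketched under stated assumptions: first shortlist the name pairs whose score without rank features is high, then return the shortlisted pair with the best with-rank-features score, even if that score alone would fall below the threshold. The function `combined_best_match` and the 0.9 shortlist cutoff are hypothetical; this is not an existing part of the project.

```python
def combined_best_match(raw_scores, rank_scores, high_prob=0.9):
    """Hypothetical combined model.

    raw_scores:  dict name -> score from the model WITHOUT rank features
    rank_scores: dict name -> score from the model WITH rank features
    Returns (name, rank_score) of the best shortlisted pair, or None.
    """
    # Step 1: keep only pairs that look like strong matches without rank features.
    shortlist = [name for name, s in raw_scores.items() if s >= high_prob]
    if not shortlist:
        return None
    # Step 2: among those, pick the best with-rank-features prediction
    # (on ties, max() keeps the first shortlisted pair).
    best = max(shortlist, key=lambda name: rank_scores.get(name, 0.0))
    return best, rank_scores.get(best, 0.0)


raw = {"Acme Corp": 0.98, "Acme Corporation": 0.97, "Apex Ltd": 0.30}
rank = {"Acme Corp": 0.5, "Acme Corporation": 0.5, "Apex Ltd": 0.0}
result = combined_best_match(raw, rank)
# result == ("Acme Corp", 0.5): a match is still returned although 0.5
# would not pass a typical discrimination threshold on its own.
```

The design choice here is to let the raw score decide *whether* there is a good match and the calibrated score decide *which* candidate to pick; a real combined model could instead learn this decision from both scores.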