Construct combined supervised model #40

@mbaak

Description

Name-pair matching can be provided by two types of supervised models: with and without rank features.

  • without rank features: the output is a regular string-matching score.
  • with rank features: the output is a calibrated probability score that takes into account the number of good ground-truth candidates.
    The model with rank features has the best performance.

For example, when a company name has two good ground-truth matches, the two scores without rank features can both be close to 1, while the two scores with rank features will both be 0.5. When using the calibrated scores (with rank features), you then miss that name-pair because it probably does not pass the discrimination threshold.

In practice, it would be interesting to have a combined supervised model that first selects the set of name-pairs with a high score without rank features, and then takes the pair with the best with-rank-features prediction from that set.
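
A minimal sketch of how such a combination could work, assuming both scores are already available per name-pair. Everything here is hypothetical and for illustration only: the `NamePair` class, the `select_combined` function, the field names, the thresholds, and the example scores (0.52 and 0.48 stand in for "both about 0.5" so that the best pair is unambiguous). It does not use the library's actual API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NamePair:
    """One candidate pair: a name and a ground-truth name, with both scores (hypothetical)."""
    name: str
    gt_name: str
    score_no_rank: float    # string-matching score from the model without rank features
    score_with_rank: float  # calibrated probability from the model with rank features


def select_combined(pairs: List[NamePair],
                    no_rank_threshold: float = 0.9) -> Optional[NamePair]:
    """Shortlist pairs on the no-rank score, then pick the best with-rank score.

    Returns None when no pair passes the no-rank threshold.
    """
    shortlist = [p for p in pairs if p.score_no_rank >= no_rank_threshold]
    if not shortlist:
        return None
    return max(shortlist, key=lambda p: p.score_with_rank)


# Made-up candidates for one company name with two good ground-truth matches.
pairs = [
    NamePair("Acme Holding", "Acme Holding BV", 0.97, 0.52),
    NamePair("Acme Holding", "Acme Holdings Ltd", 0.95, 0.48),
    NamePair("Acme Holding", "Apex Housing", 0.30, 0.01),
]

# Thresholding the calibrated score alone drops both good candidates.
with_rank_threshold = 0.6  # assumed discrimination threshold
passed_calibrated = [p for p in pairs if p.score_with_rank >= with_rank_threshold]

# The combined selection still returns one of the good candidates.
best = select_combined(pairs)
print(len(passed_calibrated), best.gt_name if best else None)
```

The design choice in this sketch is to use the no-rank score only as a gate and the with-rank score only for ordering, so the calibrated probability is never compared against a threshold that multiple good candidates would fail together.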
