Extend preprint-publication dedup from FLoRA to FReD #106

@LukasWallrich

Context

FLoRA now has preprint-publication deduplication (added for #105) that:

  • Detects when the same paper appears as both preprint and published version (different DOIs)
  • Resolves confirmed duplicates, keeping the published version and storing the other version's DOI in the doi_o_alt / doi_r_alt columns
  • Handles both replication-side (same doi_o, different doi_r) and original-side (different doi_o for the same paper) duplicates
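The resolution step described above can be sketched as follows. This is a minimal illustration in Python, not the project's implementation (that lives in R, in R/preprint_dedup.R). It assumes the confirmed duplicates are available as a hypothetical `preprint_to_published` dict mapping each preprint DOI to its published DOI; the `Record` type and `resolve_duplicates` function are also hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical paper-level row: original and replication DOIs plus alternatives."""
    doi_o: str
    doi_r: str
    doi_o_alt: Optional[str] = None
    doi_r_alt: Optional[str] = None


def resolve_duplicates(records: List[Record], preprint_to_published: Dict[str, str]) -> List[Record]:
    """Collapse confirmed preprint/published pairs, keeping the published DOIs."""

    def canon(doi: str) -> str:
        # Replace a preprint DOI with its confirmed published counterpart, if any.
        return preprint_to_published.get(doi, doi)

    # Rows that map to the same (original, replication) pair once preprints are
    # replaced by published DOIs are duplicates. Grouping on both columns covers
    # replication-side (same doi_o, different doi_r) and original-side cases.
    groups: Dict[tuple, List[Record]] = {}
    for rec in records:
        groups.setdefault((canon(rec.doi_o), canon(rec.doi_r)), []).append(rec)

    resolved = []
    for (pub_o, pub_r), group in groups.items():
        if len(group) == 1:
            resolved.append(group[0])  # no duplicate: leave the row unchanged
            continue
        # Keep the published DOIs; store the other version's DOI as the alternative.
        alt_o = next((r.doi_o for r in group if r.doi_o != pub_o), None)
        alt_r = next((r.doi_r for r in group if r.doi_r != pub_r), None)
        resolved.append(Record(pub_o, pub_r, alt_o, alt_r))
    return resolved
```

For example, with the mapping `{"10.31234/pre": "10.5678/pub"}` (placeholder DOIs), the rows `("10.1234/orig", "10.31234/pre")` and `("10.1234/orig", "10.5678/pub")` collapse into one row with `doi_r = "10.5678/pub"` and `doi_r_alt = "10.31234/pre"`. The sketch compares DOIs as exact strings; since DOI names are case-insensitive, real code would normalize case first.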

What needs to happen

  1. Extend dedup to FReD: The same preprint-publication detection logic (R/preprint_dedup.R) should be applied to the FReD effect-level dataset, not just the paper-level FLoRA dataset.

  2. Add alternative DOI columns even without duplicates: Where FReD references a DOI that has a known preprint/published counterpart (from the FLoRA confirmed duplicates or CrossRef metadata), doi_o_alt and doi_r_alt should be populated even if FReD contains only one version. This ensures users can look up papers by either DOI.
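For the second item, one way to populate the alternative-DOI columns is a symmetric lookup built from the known pairs and applied row by row, so every effect from the same paper receives the same alternative. The Python sketch below is illustrative only. It assumes the confirmed pairs (e.g. from cache/confirmed_preprint_duplicates.csv, whose schema is not specified here) can be read as hypothetical `(preprint DOI, published DOI)` tuples, and that FReD rows behave like dicts with `doi_o` / `doi_r` keys. The function names are hypothetical. Pairs derived from CrossRef metadata could feed the same lookup.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Row = Dict[str, Optional[str]]


def build_counterpart_lookup(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each DOI of a (preprint, published) pair to the other version."""
    lookup: Dict[str, str] = {}
    for preprint, published in pairs:
        # DOIs are case-insensitive, so key the lookup on the lower-cased form.
        lookup[preprint.lower()] = published
        lookup[published.lower()] = preprint
    return lookup


def add_alt_dois(rows: List[Row], lookup: Dict[str, str]) -> List[Row]:
    """Return copies of effect-level rows with doi_o_alt / doi_r_alt populated."""
    out = []
    for row in rows:
        new = dict(row)
        for col in ("doi_o", "doi_r"):
            alt_col = col + "_alt"
            doi = row.get(col)
            # Keep an alternative that is already set (e.g. by deduplication);
            # otherwise look up the counterpart, which may be absent from FReD.
            if not new.get(alt_col):
                new[alt_col] = lookup.get(doi.lower()) if doi else None
        out.append(new)
    return out
```

Because the lookup is symmetric, a row citing either the preprint or the published DOI gets the other one as its alternative, whether or not FReD contains both versions.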

References

  • Preprint dedup logic: R/preprint_dedup.R
  • FLoRA pipeline integration: Step 7c in pipelines/flora/prepare_flora.qmd
  • Confirmed duplicates: cache/confirmed_preprint_duplicates.csv
  • Original issue: Deduplicate preprints and publications #105
