Print warning in summary when comparing data frames with no rows #18

@MoritzPotthoffQC

If you compare data frames that both have no rows, the summary correctly shows a perfect match.

In exploratory work, I sometimes get a perfect match that is actually caused only by both input data frames being empty (e.g., because of a faulty previous join). I think it would be nice to add a warning to the summary when both data frames have no rows.

Example:

import polars as pl
from diffly import compare_frames

# Two empty data frames with identical schemas.
schema = {"id": pl.Int64, "value": pl.Float64, "name": pl.Utf8}
left = pl.DataFrame(schema=schema)
right = pl.DataFrame(schema=schema)

comparison = compare_frames(left, right, primary_key="id")
print(comparison.summary())

prints

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                                     Diffly Summary                                     ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
                            --- Data frames match exactly! ---
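
Until the summary itself carries such a warning, a user-side guard along these lines might serve as a workaround. This is only a sketch: `warn_if_both_empty` and `HasHeight` are hypothetical names, not part of diffly, and the check relies only on the `height` attribute (the row count) that polars data frames expose.

```python
import warnings
from typing import Protocol


class HasHeight(Protocol):
    """Anything exposing its row count as `height`, e.g. a polars DataFrame."""

    @property
    def height(self) -> int: ...


def warn_if_both_empty(left: HasHeight, right: HasHeight) -> bool:
    """Hypothetical helper (not part of diffly).

    Emits a UserWarning and returns True when both inputs have no rows.
    """
    both_empty = left.height == 0 and right.height == 0
    if both_empty:
        warnings.warn(
            "Both data frames have no rows; a perfect match may be misleading.",
            UserWarning,
            stacklevel=2,
        )
    return both_empty
```

Calling `warn_if_both_empty(left, right)` right before `compare_frames(left, right, primary_key="id")` would flag the example above, while leaving comparisons with at least one non-empty frame untouched.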
