Open
Description
If you compare data frames that both have no rows, the summary correctly shows a perfect match.
In exploratory work, I sometimes run into cases where I get a perfect match but it is actually just caused by both input data frames being empty (e.g., because of a faulty previous join). I think it would be nice to add a warning to the summary in case both data frames have no rows.
Example:
import polars as pl
from diffly import compare_frames
left = pl.DataFrame({"id": [], "value": [], "name": []}).cast(
{"id": pl.Int64, "value": pl.Float64, "name": pl.Utf8}
)
right = pl.DataFrame({"id": [], "value": [], "name": []}).cast(
{"id": pl.Int64, "value": pl.Float64, "name": pl.Utf8}
)
comparison = compare_frames(left, right, primary_key="id")
print(comparison.summary())
prints
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Diffly Summary ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
--- Data frames match exactly! ---
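Until something like this exists in the summary itself, a rough user-side workaround could look like the sketch below. The helper name `warn_if_both_empty` is hypothetical and not part of diffly; it only assumes that both inputs expose a row count as `.height`, which polars data frames do.

```python
import warnings


def warn_if_both_empty(left, right) -> bool:
    """Warn if both frames have zero rows; return True in that case.

    Hypothetical helper, not part of diffly. It relies only on the
    `.height` attribute (row count) that polars DataFrames provide.
    """
    both_empty = left.height == 0 and right.height == 0
    if both_empty:
        warnings.warn(
            "Both data frames are empty; a perfect match is trivially true.",
            UserWarning,
            stacklevel=2,  # point the warning at the caller, not this helper
        )
    return both_empty
```

Calling `warn_if_both_empty(left, right)` right before `compare_frames(left, right, primary_key="id")` would flag the empty-input case from the example above, while leaving the comparison result itself unchanged.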