Closed
Description
I have these lines of code, which produce different results with and without disk.frame.
a.df is the disk.frame, with 2735110 rows.
The group_by code:
result <- a.df %>%
  group_by(col1, col2, col3, col4) %>%
  summarize(tot4 = sum(col4), tot5 = sum(col5)) %>%
  chunk_ungroup()
After execution, result has 2735110 rows.
But the same code on a data frame (or at least on collect(a.df)) returns a different number of rows: 273511.
result <- collect(a.df) %>%
  group_by(col1, col2, col3, col4) %>%
  summarize(tot4 = sum(col4), tot5 = sum(col5)) %>%
  ungroup()
I cannot, and should not, collect a.df here, because it will become very large in the future.
Any suggestions or advice on this?
Thanks in advance.
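One possible explanation, offered as an assumption rather than a confirmed diagnosis: if the grouping is applied separately inside each chunk, a group whose rows are spread over several chunks yields one output row per chunk instead of one row overall. The sketch below reproduces that effect with plain dplyr only. The toy data, the two-way split, and the names chunks, per_chunk, combined and reference are all hypothetical and are not part of disk.frame.

```r
library(dplyr)

# Toy data standing in for a.df; the values are made up for illustration.
df <- data.frame(
  col1 = c("a", "a", "b", "b", "a", "b"),
  col5 = c(1, 2, 3, 4, 5, 6)
)

# Hypothetical chunking: rows 1-3 form one chunk, rows 4-6 another.
chunks <- split(df, rep(1:2, each = 3))

# Stage 1: aggregate inside each chunk independently.
per_chunk <- bind_rows(lapply(chunks, function(chunk) {
  chunk %>%
    group_by(col1) %>%
    summarize(tot5 = sum(col5)) %>%
    ungroup()
}))

# Stage 2: combine the partial sums across chunks.
combined <- per_chunk %>%
  group_by(col1) %>%
  summarize(tot5 = sum(tot5)) %>%
  ungroup()

# Reference: the same aggregation on the full in-memory data.
reference <- df %>%
  group_by(col1) %>%
  summarize(tot5 = sum(col5)) %>%
  ungroup()
```

In this sketch per_chunk has more rows than reference because both groups occur in both chunks, while combined matches reference. Re-aggregating partial results this way is valid for sums; a mean would need per-chunk sums and counts carried through to the second stage.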