Different group_by result due to doing joins with `by_chunk_id` #356

I have these lines of code, which produce different results with and without disk.frame.

`a.df` is a disk.frame with 2735110 rows.

The `group_by` code:
```
result <- a.df %>%
    group_by(col1,col2,col3,col4) %>%
    summarize(tot4 = sum(col4), tot5 = sum(col5)) %>% 
    chunk_ungroup()
```

After execution, `result` has **2735110** rows (the same as the input).

But the same code on a regular data frame (i.e. after `collect(a.df)`) returns a different number of rows: **273511**.

```
result <- collect(a.df) %>%
    group_by(col1,col2,col3,col4) %>%
    summarize(tot4 = sum(col4), tot5 = sum(col5)) %>% 
    ungroup()
```
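
A likely explanation, assuming the disk.frame `group_by` here was applied separately within each chunk: a group whose rows are spread across several chunks yields one output row per chunk instead of one row overall. The sketch below simulates this with plain dplyr on a hypothetical toy data frame (`df`, `key`, `val`) split into three hypothetical chunks, so no disk.frame is needed. Because sums are decomposable, re-summing the per-chunk totals in a second stage recovers the in-memory result.

```r
library(dplyr)

# Hypothetical toy data: 4 keys, each key appearing once in every chunk
df <- data.frame(
  key = rep(c("a", "b", "c", "d"), times = 3),
  val = 1:12
)
# Hypothetical chunking: rows 1-4 -> chunk 1, rows 5-8 -> chunk 2, rows 9-12 -> chunk 3
chunks <- split(df, rep(1:3, each = 4))

# Stage 1: group and summarize inside each chunk, then stack the partial results
per_chunk <- bind_rows(lapply(chunks, function(ch) {
  ch %>% group_by(key) %>% summarize(tot = sum(val)) %>% ungroup()
}))
nrow(per_chunk)  # 12: one row per (key, chunk), not one per key

# Reference: group_by on the full in-memory data gives one row per key
full <- df %>% group_by(key) %>% summarize(tot = sum(val)) %>% ungroup()
nrow(full)  # 4

# Stage 2: re-aggregate the partial sums across chunks to get the correct totals
fixed <- per_chunk %>% group_by(key) %>% summarize(tot = sum(tot)) %>% ungroup()
```

Only the per-chunk summaries have to be brought into memory for the second stage, and their size is bounded by the number of groups times the number of chunks rather than by the raw row count.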

I cannot (and should not) collect `a.df` here, because it will become very large in the future.
Any suggestions or advice on this?

Thanks in advance