Here is the experiment.
Given the dataframe and the functions f0 and f1 below:
using DataFrames, DataFramesMeta, StatsBase

df = DataFrame(a=1:10_000) # I know the df is small but big enough to show the issue

f0(df::DataFrame) = begin
    @chain df begin
        @rtransform(:b = :a * 10)
        @rtransform(:c = mean(:b))
        @rtransform(:d = :b - :c)
        @select(:a, :d)
    end
end

f1(df::DataFrame) = begin
    @chain df begin
        @rtransform @astable begin
            b = :a * 10
            c = mean(b)
            :d = b - c
        end
    end
end
f1 is faster than f0, which is what one would expect given that it does not need to create the intermediate columns b and c:
@time f0(df)
0.001146 seconds (728 allocations: 898.516 KiB)
@time f1(df)
0.000503 seconds (161 allocations: 243.609 KiB)
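A single `@time` call can be noisy, so the in-function numbers above could be cross-checked with repeated measurements. The sketch below assumes the BenchmarkTools package is installed (it is not used in the original post) and repeats the definitions of `df`, `f0` and `f1` only so that it runs on its own:

```julia
using DataFrames, DataFramesMeta, StatsBase
using BenchmarkTools  # assumption: extra dependency, not part of the original post

df = DataFrame(a = 1:10_000)

# Multi-step version: materializes :b and :c, then drops them again
f0(df::DataFrame) = @chain df begin
    @rtransform(:b = :a * 10)
    @rtransform(:c = mean(:b))
    @rtransform(:d = :b - :c)
    @select(:a, :d)
end

# @astable version: b and c stay local variables, only :d becomes a column
f1(df::DataFrame) = @chain df begin
    @rtransform @astable begin
        b = :a * 10
        c = mean(b)
        :d = b - c
    end
end

# `$df` interpolates the global, so looking it up is not part of the measurement
@btime f0($df);
@btime f1($df);
```

`@btime` runs each call many times and reports the minimum, so compilation from the first call does not show up in the result.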
However, if one uses this code outside a function (see below), the @astable version becomes roughly 40 times slower than the multi-step version (2.33 s vs. 0.057 s), which makes it unusable for larger datasets.
@time @chain df begin
    @rtransform @astable begin
        b = :a * 10
        c = mean(b)
        :d = b - c
    end
end
-> 2.331518 seconds (335.93 k allocations: 13.028 MiB, 4.69% compilation time)

@time @chain df begin
    @rtransform(:b = :a * 10)
    @rtransform(:c = mean(:b))
    @rtransform(:d = :b - :c)
    @select(:a, :d)
end
-> 0.056910 seconds (34.81 k allocations: 3.137 MiB, 95.06% compilation time)
Thanks for the great work :)