Open
83 commits
ebd152d
modular optimization paths - init
ben-schwen Oct 28, 2025
71b21ab
make linter happy
ben-schwen Oct 29, 2025
8a9e727
move tests
ben-schwen Oct 30, 2025
04e5782
add lapply(list(col1, col2, ...), fun) pattern
ben-schwen Oct 30, 2025
a8dde19
turn on optimization
ben-schwen Oct 31, 2025
67f2874
add type conversion support to GForce
ben-schwen Nov 1, 2025
2876ebe
remove stale branch
ben-schwen Nov 1, 2025
c445c38
add tests
ben-schwen Nov 2, 2025
5410e31
update man
ben-schwen Nov 2, 2025
dece1c6
merge tests
ben-schwen Nov 2, 2025
5e1789d
polish test fun
ben-schwen Nov 2, 2025
62f1c48
add arithmetic
ben-schwen Nov 2, 2025
c47ec27
add AST walker and update tests
ben-schwen Nov 2, 2025
1d324d6
add tests
ben-schwen Nov 2, 2025
6b54c1e
Merge branch 'master' into modular_gforce
ben-schwen Nov 2, 2025
22cf35e
add NEWS
ben-schwen Nov 2, 2025
25a7e2e
make function name in massageSD more expressive
ben-schwen Nov 3, 2025
eb8056c
rename levels argument to optimization
ben-schwen Nov 3, 2025
4544398
update docs
ben-schwen Nov 3, 2025
d40edb8
restore test nums
ben-schwen Nov 3, 2025
5e7efb7
remove double tests
ben-schwen Nov 3, 2025
3826927
simplify tests
ben-schwen Nov 3, 2025
982343f
phrasing
ben-schwen Nov 4, 2025
996b28c
Merge remote-tracking branch 'refs/remotes/origin/modular_gforce' int…
ben-schwen Nov 4, 2025
1e6ad03
use mget for all vector params
ben-schwen Nov 4, 2025
9e1297e
rename optimization parameter
ben-schwen Nov 4, 2025
f6981d6
rename optimization parameter also in test
ben-schwen Nov 4, 2025
9fc4734
add optimize param checks
ben-schwen Nov 4, 2025
6aaea51
Merge branch 'master' into modular_gforce
ben-schwen Nov 4, 2025
c07999a
remove trailing ws
ben-schwen Nov 4, 2025
6914818
Merge branch 'master' into modular_gforce
ben-schwen Dec 15, 2025
6c7e368
Update man/test.Rd
ben-schwen Dec 15, 2025
08c9524
Merge branch 'master' into modular_gforce
ben-schwen Jan 5, 2026
047f6be
readd context
ben-schwen Jan 5, 2026
5a7a9a3
Update NEWS.md
MichaelChirico Jan 7, 2026
b495503
revert spurious diff
MichaelChirico Jan 7, 2026
03bcdd8
?
MichaelChirico Jan 7, 2026
fe525bf
add space
ben-schwen Jan 7, 2026
71b9838
reference deletion of tests
ben-schwen Jan 7, 2026
494cfe2
reference deletion of tests2
ben-schwen Jan 7, 2026
6f42ff5
add comment about removed tests
ben-schwen Jan 7, 2026
ac306eb
add comment about optimization level comparison
ben-schwen Jan 7, 2026
431dfc2
add comment about removed test
ben-schwen Jan 7, 2026
158136b
fix typo
ben-schwen Jan 7, 2026
0c2f61f
remove doubled test
ben-schwen Jan 7, 2026
2c7ebaf
add comment
ben-schwen Jan 7, 2026
371e246
update subsuming comments
ben-schwen Jan 7, 2026
e2694e1
add subsuming comments
ben-schwen Jan 7, 2026
da771d4
finish double checking of moving tests
ben-schwen Jan 7, 2026
af15282
make optimize more robust
ben-schwen Jan 7, 2026
b61f280
add comment about removing tests in benchmark.Rraw
ben-schwen Jan 7, 2026
d8e34d3
be clearer in NEWS
ben-schwen Jan 7, 2026
c5fb65a
add nocovs for errors
ben-schwen Jan 7, 2026
9f0e5cf
add unwrapper for conversions
ben-schwen Jan 7, 2026
8129198
add more tests
ben-schwen Jan 7, 2026
5c5d88b
improve comment
ben-schwen Jan 9, 2026
100aad5
unnest one layer
ben-schwen Jan 9, 2026
9167edb
unnest
ben-schwen Jan 9, 2026
282a091
move into helper
ben-schwen Jan 9, 2026
6197a34
Merge branch 'master' into modular_gforce
ben-schwen Jan 9, 2026
cf6def5
fix NEWS numbering
ben-schwen Jan 9, 2026
0480ee5
remove trailing ws
ben-schwen Jan 9, 2026
4be8a24
clean up merge
ben-schwen Jan 9, 2026
0a07dba
update errors
ben-schwen Jan 9, 2026
283ba85
Merge branch 'master' into modular_gforce
ben-schwen Jan 9, 2026
416bff0
try another round of unnesting
MichaelChirico Jan 11, 2026
446b448
Merge branch 'master' into modular_gforce
MichaelChirico Jan 13, 2026
2986436
cleaning up my mess (typos, thinkos)
MichaelChirico Jan 13, 2026
45a4575
thinko
MichaelChirico Jan 13, 2026
7a1b021
restore fallthrough case
MichaelChirico Jan 13, 2026
383b60a
delint
MichaelChirico Jan 13, 2026
62997d3
move the optimization comment into 'documentation' of .massageSD
MichaelChirico Jan 13, 2026
d3adad1
use more typical list style for clarity
MichaelChirico Jan 13, 2026
a50503f
typo it's -> its
MichaelChirico Jan 13, 2026
35bd7d9
unnest for unusual list() case
MichaelChirico Jan 13, 2026
6243cf3
try a better name
MichaelChirico Jan 13, 2026
c5e95d2
alignment for readability
MichaelChirico Jan 13, 2026
69c3a7c
add comment about \(x)
ben-schwen Jan 13, 2026
e7e6444
remove old comment
ben-schwen Jan 13, 2026
f32875b
use nzchar(jl__names)
ben-schwen Jan 13, 2026
864383f
use identical and quote instead of direct comparison
ben-schwen Jan 13, 2026
b57c2d5
avoid lambda fun
ben-schwen Jan 13, 2026
bf6c61f
use early exits instead of is_valid
ben-schwen Jan 13, 2026
8 changes: 8 additions & 0 deletions NEWS.md
@@ -34,6 +34,14 @@

 7. Fixed compilation failure like "error: unknown type name 'siginfo_t'" in v1.18.0 in some strict environments, e.g., FreeBSD, where the header file declaring the POSIX function `waitid` does not transitively include the header file defining the `siginfo_t` type, [#7516](https://github.com/rdatatable/data.table/issues/7516). Thanks to @jszhao for the report and @aitap for the fix.

+8. GForce and lapply optimization detection has been refactored to use modular optimization paths and an AST (Abstract Syntax Tree) walker for improved maintainability and extensibility. The new architecture separates optimization detection into distinct, composable phases. This makes future optimization enhancements a lot easier. Thanks to @grantmcdermott, @jangorecki, @MichaelChirico, and @HughParsonage for the suggestions and @ben-schwen for the implementation.
+
+This rewrite also introduces several new optimizations:
+- Enables Map in addition to lapply optimizations (e.g., `Map(fun, .SD)` -> `list(fun(col1), fun(col2), ...)`) [#5336](https://github.com/Rdatatable/data.table/issues/5336)
+- lapply optimization works without .SD (e.g., `lapply(list(col1, col2), fun)` -> `list(fun(col1), fun(col2))`) [#5032](https://github.com/Rdatatable/data.table/issues/5032)
+- Type conversion support in GForce expressions (e.g., `sum(as.numeric(x))` will use GForce, saving the need to coerce `x` in a setup step) [#2934](https://github.com/Rdatatable/data.table/issues/2934)
+- Arithmetic operation support in GForce (e.g., `max(x) - min(x)` will use GForce on both `max(x)` and `min(x)`, saving the need to do the subtraction in a follow-up step) [#3815](https://github.com/Rdatatable/data.table/issues/3815)
+
 ### Notes

 1. {data.table} now depends on R 3.5.0 (2018).
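
The four query shapes named in NEWS item 8 can be tried with a small grouped table. This is a minimal sketch, not taken from the PR: the table `DT` and its columns `g`, `x`, `y` are made up for illustration. Whether GForce actually engages depends on running a build that includes this change; the results themselves should not depend on the optimization level.

```r
library(data.table)

# Illustrative data only (names are assumptions, not from the PR)
DT = data.table(
  g = c("a", "a", "b", "b", "b"),
  x = c(1L, 4L, 2L, 7L, 3L),
  y = c(10, 20, 30, 40, 50)
)

# Arithmetic on two grouped aggregates (#3815): per-group range of x
r1 = DT[, .(rng = max(x) - min(x)), by = g]

# Type conversion inside an aggregate (#2934): no separate coercion step
r2 = DT[, .(s = sum(as.numeric(x))), by = g]

# lapply over an explicit list of columns, without .SD (#5032)
r3 = DT[, lapply(list(x, y), sum), by = g]

# Map over .SD (#5336)
r4 = DT[, Map(sum, .SD), by = g, .SDcols = c("x", "y")]
```

Passing `verbose = TRUE` to `[` is one way to check which optimization path a given query took on a particular build.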
720 changes: 451 additions & 269 deletions R/data.table.R

Large diffs are not rendered by default.

34 changes: 33 additions & 1 deletion R/test.data.table.R
@@ -380,7 +380,39 @@ utf8_check = function(test_str) identical(test_str, enc2native(test_str))
 test = function(num, x, y=TRUE,
                 error=NULL, warning=NULL, message=NULL, output=NULL, notOutput=NULL, ignore.warning=NULL,
                 options=NULL, env=NULL,
-                context=NULL, requires_utf8=FALSE) {
+                context=NULL, requires_utf8=FALSE, optimize=NULL) {
+  # if optimization is provided, test across multiple optimization levels
+  if (!is.null(optimize)) {
+    if (!is.numeric(optimize) || length(optimize) < 1L || anyNA(optimize) || any(optimize < 0L))
+      stopf("optimize must be numeric, length >= 1, non-NA, and >= 0; got: %s", optimize) # nocov
+    cl = match.call()
+    if ("datatable.optimize" %in% names(cl$options))
+      stopf("Trying to set optimization level through both options= and optimize=") # nocov
+    cl$optimize = NULL # Remove optimization levels from the recursive call
+
+    # Check if y was explicitly provided (not just the default)
+    y_provided = !missing(y)
+    vector_params = mget(c("error", "warning", "message", "output", "notOutput", "ignore.warning"), environment())
+    vector_params = vector_params[lengths(vector_params) > 0L]
+    compare = !y_provided && length(optimize)>1L && !length(vector_params)
+    # When optimize has multiple levels, vector params are recycled across levels.
+    if (length(optimize) > 1L && "warning" %in% names(vector_params) && length(vector_params$warning) > 1L)
+      warningf("warning= with multiple values is recycled across optimize levels, not treated as multiple warnings in one run")
+
+    for (i in seq_along(optimize)) {
+      cl$num = num + (i - 1L) * 1e-6
+      opt_level = list(datatable.optimize = optimize[i])
+      cl$options = if (!is.null(options)) c(as.list(options), opt_level) else opt_level
+      for (param in names(vector_params)) {
+        val = vector_params[[param]]
+        cl[[param]] = val[((i - 1L) %% length(val)) + 1L] # cycle through values if fewer than optimization levels
+      }
+
+      if (compare && i == 1L) cl$y = eval(cl$x, parent.frame())
+      eval(cl, parent.frame()) # actual test call
+    }
+    return(invisible())
+  }
   if (!is.null(env)) {
     old = Sys.getenv(names(env), names=TRUE, unset=NA)
     to_unset = !lengths(env)
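
The per-level bookkeeping in the loop above (offsetting the test number by `1e-6` per level and cycling vector arguments such as `warning=` when there are fewer values than levels) can be sketched in isolation. `expand_levels` is a hypothetical helper written for illustration, not a function in data.table; it only reproduces the index arithmetic and does not run any tests.

```r
# Hypothetical helper (not part of data.table): list what each recursive
# test() call would receive for a given `optimize` vector.
expand_levels = function(num, optimize, warning = NULL) {
  lapply(seq_along(optimize), function(i) {
    list(
      # sub-number each level so failures can be told apart
      num = num + (i - 1L) * 1e-6,
      # one optimization level per call
      options = list(datatable.optimize = optimize[i]),
      # cycle through `warning` if it is shorter than `optimize`
      warning = if (length(warning)) warning[((i - 1L) %% length(warning)) + 1L]
    )
  })
}

calls = expand_levels(637.1, optimize = c(0L, 1L, 2L), warning = c("w0", "w1"))

# Which warning pattern each level is checked against
vapply(calls, function(cl) cl$warning, character(1L))
```

With three levels and two warning values, the third level wraps around to the first value, which is the recycling behaviour the `warningf()` call in the diff draws attention to.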
26 changes: 8 additions & 18 deletions inst/tests/benchmark.Rraw
@@ -190,24 +190,14 @@ DT = data.table(A=1:10,B=rnorm(10),C=paste("a",1:100010,sep=""))
 test(301.1, nrow(DT[,sum(B),by=C])==100010)

 # Test := by key, and that := to the key by key unsets the key. Make it non-trivial in size too.
-local({
-  old = options(datatable.optimize=0L); on.exit(options(old))
-  set.seed(1)
-  DT = data.table(a=sample(1:100, 1e6, replace=TRUE), b=sample(1:1000, 1e6, replace=TRUE), key="a")
-  test(637.1, DT[, m:=sum(b), by=a][1:3], data.table(a=1L, b=c(156L, 808L, 848L), m=DT[J(1), sum(b)], key="a"))
-  test(637.2, key(DT[J(43L), a:=99L]), NULL)
-  setkey(DT, a)
-  test(637.3, key(DT[, a:=99L, by=a]), NULL)
-})
-local({
-  options(datatable.optimize=2L); on.exit(options(old))
-  set.seed(1)
-  DT = data.table(a=sample(1:100, 1e6, replace=TRUE), b=sample(1:1000, 1e6, replace=TRUE), key="a")
-  test(638.1, DT[, m:=sum(b), by=a][1:3], data.table(a=1L, b=c(156L, 808L, 848L), m=DT[J(1), sum(b)], key="a"))
-  test(638.2, key(DT[J(43L), a:=99L]), NULL)
-  setkey(DT,a)
-  test(638.3, key(DT[, a:=99L, by=a]), NULL)
-})
+set.seed(1)
+DT = data.table(a=sample(1:100, 1e6, replace=TRUE), b=sample(1:1000, 1e6, replace=TRUE), key="a")
+opt = c(0L,2L)
+test(637.1, optimize=opt, copy(DT)[, m:=sum(b), by=a][1:3], data.table(a=1L, b=c(156L, 808L, 848L), m=DT[J(1), sum(b)], key="a"))
+test(637.2, optimize=opt, key(copy(DT)[J(43L), a:=99L]), NULL)
+setkey(DT, a)
+test(637.3, optimize=opt, key(copy(DT)[, a:=99L, by=a]), NULL)
+# test 637 subsumes 637 and 638 for different optimization levels

 # Test X[Y] slowdown, #2216
 # Many minutes in 1.8.2! Now well under 1s, but 10s for very wide tolerance for CRAN. We'd like CRAN to tell us if any changes
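
The rewritten tests wrap `DT` in `copy()`, presumably because `:=` modifies its input by reference and each expression is now evaluated once per optimization level against the same `DT`. The following is a rough sketch of that pattern outside the test harness; `run_at` is a hypothetical helper introduced here for illustration, and the table is smaller than the one in the test file.

```r
library(data.table)

# Hypothetical helper (not part of data.table): evaluate `expr` with
# datatable.optimize temporarily set to `level`, then restore the old value.
run_at = function(level, expr) {
  old = options(datatable.optimize = level)
  on.exit(options(old))
  expr  # lazy evaluation: the promise is forced here, after the option is set
}

set.seed(1)
DT = data.table(a = sample(1:100, 1e4, replace = TRUE),
                b = sample(1:1000, 1e4, replace = TRUE), key = "a")

# Same grouped := at two optimization levels, each on its own copy
res0 = run_at(0L, copy(DT)[, m := sum(b), by = a][1:3])
res2 = run_at(2L, copy(DT)[, m := sum(b), by = a][1:3])

# Both levels should agree, and DT itself should be left untouched
isTRUE(all.equal(res0, res2))
names(DT)
```

Without `copy()`, the first evaluation would add column `m` to `DT` (or drop its key in the `a:=99L` cases), so the second level would start from a different table.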