Allow join yielding by work and time simultaneously #22610
Merged
teskje merged 1 commit into MaterializeInc:main on Oct 25, 2023
antiguru
approved these changes
Oct 24, 2023
Member
antiguru
left a comment
Looks good, left some comments inline.
Comment on lines +103 to +113

    (Materialize, Some(work_limit), Some(time_limit)) => {
        let yield_fn =
            move |start: Instant, work| work >= work_limit || start.elapsed() >= time_limit;
        mz_join_core(arranged1, arranged2, shutdown_token, result, yield_fn)
    }
    (Materialize, Some(work_limit), None) => {
        let yield_fn = move |_start, work| work >= work_limit;
        mz_join_core(arranged1, arranged2, shutdown_token, result, yield_fn)
    }
    (Materialize, None, Some(time_limit)) => {
        let yield_fn = move |start: Instant, _work| start.elapsed() >= time_limit;
Member
I wonder what the overhead of checking the options inside the closure is. Calling `mz_join_core` several times might be bad for compile times, but checking the limits in the closure might consume more CPU? Unclear!
Contributor
Author
So there are several things we could do:
1. Have a single `yield_fn` that always checks both work and time and handles the `Option`s. Good for readability, good for compile times, bad for runtime performance.
2. Have multiple `yield_fn`s, one for each `Some`/`None` combination, and instantiate a different `mz_join_core` variant for each of them. Bad for readability, bad for compile times, good for runtime performance.
3. Have multiple `yield_fn`s, one for each `Some`/`None` combination, and instantiate a single `mz_join_core` that takes a `Box<dyn YFn>`. Okay for readability, good for compile times, okay for runtime performance.
(The "X for runtime performance" are just guesses, I don't really know either.)
I went with (2) here because I really want to avoid regressing join performance, but I'm not too happy about readability. I'm not too worried about compile times, though maybe I should be :)
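For comparison, the boxed-closure alternative (the third option above) could look roughly like the sketch below. This is only an illustration, not code from this PR: `BoxedYieldFn` and `boxed_yield_fn` are hypothetical names, `usize` as the work type is an assumption, and the `(None, None)` arm that never yields is an assumption too.

```rust
use std::time::{Duration, Instant};

/// Hypothetical alias for a type-erased yield predicate: given the instant at
/// which the current activation started and the work done so far, decide
/// whether the operator should yield.
type BoxedYieldFn = Box<dyn Fn(Instant, usize) -> bool>;

/// Builds one closure per `Some`/`None` combination and boxes it, so that a
/// single, non-generic join implementation could accept any of them.
fn boxed_yield_fn(work_limit: Option<usize>, time_limit: Option<Duration>) -> BoxedYieldFn {
    match (work_limit, time_limit) {
        (Some(w), Some(t)) => {
            Box::new(move |start: Instant, work: usize| work >= w || start.elapsed() >= t)
        }
        (Some(w), None) => Box::new(move |_start: Instant, work: usize| work >= w),
        (None, Some(t)) => Box::new(move |start: Instant, _work: usize| start.elapsed() >= t),
        // Assumption: with no limits configured, this predicate never asks to yield.
        (None, None) => Box::new(|_start: Instant, _work: usize| false),
    }
}
```

Each call through the box is a dynamic dispatch, which is presumably where the "okay for runtime performance" guess comes from; in exchange, the join body would be compiled only once.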
antiguru
reviewed
Oct 24, 2023
This commit extends the `YieldSpec` type and the syntax allowed for the `linear_join_yielding` system var to enable specifying yield strategies that consider both the performed work and the elapsed time. This will allow us to make sure that (a) join operators don't keep around huge amounts of output records and (b) join operators don't regress interactivity.
0174bf3 to
67c9dcb
Compare
Contributor
Author
TFTR!
This PR extends the `YieldSpec` type and the syntax allowed for the `linear_join_yielding` system var to enable specifying yield strategies that consider both the performed work and the elapsed time.

This will allow us to make sure that (a) join operators don't keep around huge amounts of output records and (b) join operators don't regress interactivity.
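As a rough illustration of what "both the performed work and the elapsed time" means, a yield specification with two optional limits might be modeled as below. This is a hedged sketch, not the PR's actual type: `YieldSpecSketch`, its field names, and the `usize` work counter are all assumptions, and the syntax of the system var is not shown because it is not given here.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a yield specification; the real `YieldSpec`
/// may be shaped differently.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
struct YieldSpecSketch {
    /// Yield once at least this much work has been performed, if set.
    after_work: Option<usize>,
    /// Yield once at least this much time has elapsed, if set.
    after_time: Option<Duration>,
}

impl YieldSpecSketch {
    /// Returns true as soon as either configured limit has been reached.
    /// An unset limit never triggers a yield.
    fn should_yield(&self, start: Instant, work: usize) -> bool {
        let work_exceeded = self.after_work.map_or(false, |limit| work >= limit);
        let time_exceeded = self.after_time.map_or(false, |limit| start.elapsed() >= limit);
        work_exceeded || time_exceeded
    }
}
```

In this sketch the work limit bounds how much output can accumulate before yielding, which addresses (a), and the time limit bounds how long the operator runs without yielding, which addresses (b).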
Motivation
Part of MaterializeInc/database-issues#6761
Follow-up to #22391, in which we discussed that we still want a way to ensure join outputs won't produce an OOM when time-based yielding is used.
Checklist
`$T ⇔ Proto$T` mapping (possibly in a backwards-incompatible way), then it is tagged with a `T-proto` label.