Skip to content

Run plot comms on the R thread#1100

Open
lionel- wants to merge 15 commits intomainfrom
task/sync-plot-comm
Open

Run plot comms on the R thread#1100
lionel- wants to merge 15 commits intomainfrom
task/sync-plot-comm

Conversation

@lionel-
Copy link
Copy Markdown
Contributor

@lionel- lionel- commented Mar 10, 2026

Progress towards #689
Branched from #1099
Addresses posit-dev/positron#12825 (planning to close)

I did not foresee all the simplifications falling out of this one!

  • Plots no longer live in their own threads, they live on the R thread. That frees up memory since each background thread takes about 2mb of stack space (depending on platform), so 20 open plots would take 40mb of memory.

  • DeviceContext moves from a standalone thread_local! into a regular field of Console. All access goes through Console::get().device_context(). The CommHandlerContext provides access to Console via ctx.console(), so plot handlers reach the device context through a controlled context chain.

  • The old idle-time polling loop (process_rpc_requests with crossbeam::Select) is removed. Plot RPCs are now dispatched synchronously through the shell handler on the R thread. This improves plot latency when pre-renderings require adjustments because we no longer timeout-poll for plot updates (related to Performance for plotting positron#5184).

  • The GraphicsDeviceNotification async channel is removed. Since the UI comm now runs on the R thread (from Run UI comm on the R thread #1099), prerender settings are updated synchronously via Console::get().device_context().set_prerender_settings(). This removes quite a bit of plumbing.

  • Fixes a dev.hold() regression I introduced with pre-renderings (Send pre-renderings of new plots to the frontend #775). We were previously sending pre-renderings unconditionally, now they are held until dev.flush(). This is tested by new integration tests.

@lionel- lionel- force-pushed the task/sync-plot-comm branch 2 times, most recently from 34baa70 to 845b4a7 Compare March 11, 2026 07:08
@lionel- lionel- requested a review from DavisVaughan March 11, 2026 13:47
Copy link
Copy Markdown
Contributor

@DavisVaughan DavisVaughan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have paused on review until we can get these regressions resolved

dap: Arc<Mutex<Dap>>,
session_mode: SessionMode,
) -> Self {
let device_context = DeviceContext::new(iopub_tx.clone());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense to me to remove iopub_tx from DeviceContext now and only pull iopub_tx directly from Console

Over in graphics_device.rs, you can just Console::get().iopub_tx() anytime you need it now!

It feels like this better scopes responsibilities to me

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel the opposite. It hides that the device context directly communicates with the frontend via IOPub.

Also ideally we should avoid Console::get() calls as these bypass Rust's ownership model. But then the issue is that the device belongs to console and not the other way around. Cloning the IOPub channel is the right ownership structure from that point of view.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It hides that the device context directly communicates with the frontend via IOPub

Hmm I see what you mean

Like the DeviceContext struct should have an "include what you use" kind of policy in the struct fields, and by not having iopub_tx in there, we mask what it uses

Comment on lines +128 to +130
Console::get()
.device_context()
.set_prerender_settings(params.settings);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that we could now make DeviceContext have non-Cell internals?

I am disturbed by being able to call get() to then perform a mutable update, rather than being forced to go through get_mut()

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Cell is actually more reassuring, especially regarding the possibility of reentrancy (flushing calls process_changes which can recurse into graphics device callbacks).

In general get() is not a signal that the state is not mutable, it's only a signal that the state is shared, and can still be mutable e.g. through Cells or other mechanisms.

It's all a bit hand wavy considering that our callbacks might produce multiple active get_mut() at the same time, but adding to the unsoundness would be... unsound ;)

Comment on lines -503 to -513
// Don't try to render a plot if we're currently drawing.
if self.is_drawing.get() {
log::trace!("Refusing to render due to `is_drawing`");
return;
}

// Don't try to render a plot if someone is asking us not to, i.e. `dev.hold()`
if !self.should_render.get() {
log::trace!("Refusing to render due to `should_render`");
return;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am pretty sure these are important

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we now process the requests when R is idle, is_drawing is structurally false, we're never mid-draw.

Regarding should_render (matching dev.hold()), it's true that holds can span across prompts, I can see that happening when developing a plot function interactively. This would require:

  • Creating a plot that Positron knows about (opened comm)
  • Calling dev.hold() in the console
  • Resizing the plot in the frontend UI

So that's a very narrow case. We show the uncommitted visual changes in that case, which seems like a reasonable compromise. The alternative would be to refuse to render, producing a distorted plot in the frontend after resize.

@lionel- lionel- force-pushed the task/sync-ui-comm branch 3 times, most recently from bd54059 to be02717 Compare March 20, 2026 06:19
Base automatically changed from task/sync-ui-comm to main March 20, 2026 06:50
@lionel- lionel- force-pushed the task/sync-plot-comm branch 2 times, most recently from 7386726 to be3d85c Compare March 31, 2026 13:24
@lionel- lionel- force-pushed the task/sync-plot-comm branch from be3d85c to 74abf66 Compare April 3, 2026 09:00
@lionel- lionel- requested a review from DavisVaughan April 3, 2026 12:13
@lionel-
Copy link
Copy Markdown
Contributor Author

lionel- commented Apr 3, 2026

I also fixed a race between comm_open and subsequent comm_msg, the tests revealed it was possible to get the latter before the former. That's because the Shell thread is in charge of sending comm_open on IOPub, so we had a triangular configuration where the R thread and the Shell thread were racing to send their messages on IOPub.

To fix this, I made comm opening blocking. There is a new CommEvent::Barrier variant to support this. On the Shell side, the shell thread now uses a select loop that drains comm events while a request is in flight.

A nice consequence of this is that the frontend now receives new plots as soon as they are created, rather than at the end of a loop:

for (i in 1:3) {
  plot(i)
  Sys.sleep(1)
}

I also noticed that posit-dev/positron#6736 is broken again.
I've opened posit-dev/positron#12825 for this.

And of course this is all now under test coverage.

Copy link
Copy Markdown
Contributor

@DavisVaughan DavisVaughan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two main comments to discuss, other things are minor:

  • Not sure about passing console through handle_msg and friends
  • Not sure about CommEvent::Barrier vs passing done through CommEvent::Opened

See below for more on both

My test file seems to be working fine now

Comment on lines +400 to +408
let ready = sel.ready();

while let Ok(event) = self.comm_event_rx.try_recv() {
self.process_comm_event(event);
}

if ready == resp_idx {
break response_rx.recv().unwrap();
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So even if sel.ready() returns the execute response as the first "ready" thing, you want to go off and process the comm_events first?

I think I was expecting to see

if ready == resp_idx {
    break response_rx.recv().unwrap();
}

right after let ready =

Comment on lines +74 to +75
// Block until Shell has processed the Opened event, ensuring the
// `comm_open` message is on IOPub before we return. Any updates
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a gut check, but is this really what it ensures? In CommEvent::Opened we do an iopub_tx.send() call, but that doesn't necessarily mean that IOPub has received it by the end of CommEvent::Opened and the following CommEvent::Barrier event is processed.

I'm not sure if that matters.

dap: Arc<Mutex<Dap>>,
session_mode: SessionMode,
) -> Self {
let device_context = DeviceContext::new(iopub_tx.clone());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It hides that the device context directly communicates with the frontend via IOPub

Hmm I see what you mean

Like the DeviceContext struct should have an "include what you use" kind of policy in the struct fields, and by not having iopub_tx in there, we mask what it uses

Comment on lines +686 to +687
#[cfg(feature = "testing")]
pub fn test_init() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put this up with start() and call it test_start()?

}

fn handle_msg(&mut self, msg: CommMsg, ctx: &CommHandlerContext) {
fn handle_msg(&mut self, msg: CommMsg, ctx: &CommHandlerContext, _console: &Console) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I agree with passing console through all these handlers

It seems like the only current use that I can see is in impl CommHandler for PlotComm for handle_msg() and handle_close().

But those could just do Console::get().

It doesn't seem like it helps clarify ownership very much, because so many other things in graphics_device.rs also just do Console::get() already. And it seems likely that in handle_rpc() (called by handle_msg()) we could very easily just do another Console::get() rather than passing through the &Console we got from handle_msg(). It seems complex to me to know when to use which pattern.

It also seems like this _console: &Console argument is the reason for all of the new test_init() infra, and overall this just seems like it adds quite a bit of complexity for (to me) not a lot of obvious benefit 😢

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an example, should on_did_execute_request() gain a console: &Console argument and thread it through process_changes -> process_new_plot -> should_use_dynamic_plots to the Console::get call there?

I don't think so, I feel like it makes sense that we just invoke Console::get() from there as required like we do everywhere else.

But now introducing this new pattern of handle_msg() and friends having a console passed through confuses me and makes me question how we should do things in all the other cases, and I'm not sure that's worth it

@@ -1271,11 +1137,10 @@ pub(crate) fn on_execute_request(
#[tracing::instrument(level = "trace", skip_all)]
pub(crate) fn on_did_execute_request() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think on_did_execute_request() should perhaps be a &self method

It is only called in handle_active_request(), but we have Console there as self, hence we also have DeviceContext, so we can just call self.device_context().on_did_execute_request() from there?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I am also a bit weirded out by the ownership implications of all this (both the current state of things and my proposal)

self.device_context().on_did_execute_request() would call DeviceContext::process_changes(), but that goes through process_new_plot() and should_use_dynamic_plots() and you end up accessing the "parent" owner of DeviceContext from there via Console::get(), which feels very mind boggling!

// Save our new socket.
// Refcell Safety: Short borrows in the file.
self.sockets.borrow_mut().insert(id.clone(), socket);
match Console::get_mut().comm_open_backend(PLOT_COMM_NAME, Box::new(plot_comm)) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is another place where it feeeels weird that DeviceContext is a field of Console, yet refers to Console as well

}

pub(crate) struct Console {
pub struct Console {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

@@ -699,3 +699,481 @@ fn test_plot_default_size_without_metadata() {
frontend.recv_iopub_idle();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Random note that a cargo build --release is throwing this for me right now

warning: unused import: `libr::SEXP`
  --> crates/ark/src/r_task.rs:19:5
   |
19 | use libr::SEXP;
   |     ^^^^^^^^^^

I think this type is only used in debug builds?

frontend.recv_shell_execute_reply();
}

/// Test that `dev.hold()` suppresses intermediate plot output.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like plot(lm(disp ~ drat, data = mtcars)) is working again, but I dont see a backend test for it?

@lionel-
Copy link
Copy Markdown
Contributor Author

lionel- commented Apr 3, 2026

Ok let's discuss Console access by comms in our next sync

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants