feat: bidirectional links, trap_exit, and start_linked for actors #166

Open
ElFantasma wants to merge 2 commits into main from feat/links-trap-exit
@ElFantasma
Collaborator

@ElFantasma ElFantasma commented May 14, 2026

Closes #131 (the monitor half landed in #165; this PR completes the issue).

Summary

Phase 3d of the supervision-trees roadmap: Erlang-style bidirectional links between actors with trap_exit semantics.

What's new

  • Exit { from: ActorId, reason: ExitReason } — system message delivered to linked actors that trap exits
  • Actor::exit_received — new lifecycle callback (default no-op) that receives Exit notifications. Mirrors started/stopped — no Handler<Exit> boilerplate needed
  • ctx.link(&handle) / ctx.unlink(&handle) — bidirectional link management
  • ctx.trap_exit(true) — convert exit signals into Exit messages instead of dying
  • start_linked(parent_ctx) — atomic spawn + link via the ActorStart trait
  • ExitReason::Kill is untrappable — bypasses trap_exit and cancels the actor
  • Transitive propagation — when a non-trapping actor is cancelled by a link signal, it exits with the linked reason (not Normal), so chains A → B → C propagate correctly
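
The rules in the list above can be condensed into one decision function. This is a minimal sketch rather than the crate's code: `Outcome` and `on_link_signal` are hypothetical names, the `Panic` variant is an assumption, and delivering `Normal` to a trapping actor is assumed from Erlang's behaviour rather than stated in this PR.

```rust
/// Simplified stand-in for the crate's `ExitReason`. Only `Normal` and `Kill`
/// are named in this PR; `Panic` is an assumption for illustration.
#[derive(Debug, Clone, PartialEq)]
enum ExitReason {
    Normal,
    Kill,
    Panic(String),
}

/// What a linked actor does when a peer's exit signal arrives (hypothetical type).
#[derive(Debug, PartialEq)]
enum Outcome {
    /// The signal is dropped and the actor keeps running.
    Ignore,
    /// The signal is converted into an `Exit` message for `exit_received`.
    DeliverExit(ExitReason),
    /// The actor is cancelled and exits with this reason, so the chain continues.
    Die(ExitReason),
}

fn on_link_signal(trap_exit: bool, reason: ExitReason) -> Outcome {
    match (trap_exit, reason) {
        // Kill is untrappable: it bypasses the trap_exit flag.
        (_, ExitReason::Kill) => Outcome::Die(ExitReason::Kill),
        // A trapping actor receives a message instead of dying.
        // (Assumption: this also applies to Normal, as in Erlang.)
        (true, reason) => Outcome::DeliverExit(reason),
        // Normal exits are silent for non-trapping peers.
        (false, ExitReason::Normal) => Outcome::Ignore,
        // A non-trapping actor dies with the linked reason, not Normal.
        (false, reason) => Outcome::Die(reason),
    }
}
```

In a chain A → B → C where B does not trap, B's outcome is `Die(reason)`, and that same reason is what C then sees.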

Design notes

  • Each actor owns its own link table (Arc<Mutex<Vec<LinkEntry>>>). No global state — peers register in each other's tables via closures captured at link time.
  • ExitEnvelope is a system envelope dispatched via actor.exit_received() — no Handler<Exit> bound on linked actors.
  • A LinkedExitReason slot on each actor carries the propagated reason so transitive chain propagation works correctly through non-trapping middle actors.
  • Links are idempotent (duplicate link() is a no-op).
  • Links work in both tasks and threads modes.
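
A rough sketch of the per-actor table described above, assuming `ActorId` is a plain integer and that each entry stores a signal closure captured at link time. The type names follow the PR, but their definitions here are guesses, and `add_link` is a hypothetical helper.

```rust
use std::sync::{Arc, Mutex};

// Assumption: the real ActorId is some opaque identifier; u64 stands in for it.
type ActorId = u64;

// Assumption: a type-erased callback that delivers "actor `from` died" to a peer.
type ExitSignalFn = Arc<dyn Fn(ActorId) + Send + Sync>;

struct LinkEntry {
    peer_id: ActorId,
    signal: ExitSignalFn,
}

// Each actor owns one of these; there is no global registry.
type LinkTable = Arc<Mutex<Vec<LinkEntry>>>;

/// Insert an entry unless `peer_id` is already linked.
/// Returns true when a new entry was added, so duplicate links are no-ops.
fn add_link(table: &LinkTable, peer_id: ActorId, signal: ExitSignalFn) -> bool {
    let mut entries = table.lock().unwrap_or_else(|p| p.into_inner());
    if entries.iter().any(|e| e.peer_id == peer_id) {
        return false;
    }
    entries.push(LinkEntry { peer_id, signal });
    true
}

/// On death, drain the table and signal each linked peer exactly once.
/// Returns the number of peers signaled.
fn propagate_exit(own_id: ActorId, table: &LinkTable) -> usize {
    // Take the entries out first so no lock is held while signaling.
    let entries = std::mem::take(&mut *table.lock().unwrap_or_else(|p| p.into_inner()));
    let signaled = entries.len();
    for entry in entries {
        (entry.signal)(own_id);
    }
    signaled
}
```

Draining with `std::mem::take` leaves an empty table behind, so a second `propagate_exit` on the same actor signals nobody.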

Files

  • New: concurrency/src/link.rs - Exit, LinkEntry, LinkTable, make_signal, register_link, propagate_exit
  • Modified: concurrency/src/child_handle.rs - ChildHandle now carries trap_exit, links, linked_reason, send_exit for type-erased signal delivery
  • Modified: concurrency/src/tasks/actor.rs - Context/ActorRef gain link state; Actor::exit_received callback; link/unlink/trap_exit methods; start_linked on ActorStart; run_actor propagates exit signals on actor death
  • Modified: concurrency/src/threads/actor.rs - same for threads mode
  • Example: examples/exit_reason adds Scenario 9 demonstrating supervisor-style trap_exit

Test plan

  • 13 new tests covering: panic propagation to non-trapping peer, Exit delivery to trapping peer, normal exits dropped, link to already-dead actor, unlink, idempotent duplicate links, chain propagation through non-trapping middle, start_linked
  • Tests in both tasks and threads modes
  • All 112 tests pass, clippy clean
  • Runnable example: cargo run -p exit_reason (Scenario 9 demonstrates the supervisor pattern)

@github-actions

🤖 Kimi Code Review

Review Summary

The PR implements Erlang-style bidirectional linking with trap_exit semantics. Overall, the implementation is correct and well-tested, but there are critical concurrency issues that must be addressed before merging.

Critical Issues

1. Deadlock in Link Registration (link.rs:165-196)

Problem: register_link and unregister_link acquire locks in inconsistent order (own table first, then peer table). When Actor A links to B while Actor B simultaneously links to A from different threads, this causes a classic deadlock.

Fix: Establish consistent lock ordering based on ActorId:

// In register_link and unregister_link:
if own_id < peer_id {
    let mut own = own_links.lock().unwrap();
    let mut peer = peer_links.lock().unwrap();
    // ... update both tables while both guards are held ...
} else {
    let mut peer = peer_links.lock().unwrap();
    let mut own = own_links.lock().unwrap();
    // ... update both tables while both guards are held ...
}

2. Reference Cycles in LinkEntry (link.rs:105-106)

Problem: LinkEntry holds peer_links: LinkTable (an Arc). When two actors link to each other, this creates an Arc reference cycle that prevents memory reclamation until actors die and propagate_exit drains the tables.

Recommendation: Consider using Weak<Mutex<...>> for peer_links to break the cycle, or document that links should be explicitly unlinked for long-lived actors to prevent memory pressure.

Medium Issues

3. Non-Atomic start_linked (tasks/actor.rs:1793-1798, threads/actor.rs:780-785)

Problem: The comment claims "Atomically start the actor and link it," but the implementation has a race window between start() and link(). While link handles already-dead actors gracefully, this isn't true atomicity.

Fix: Clarify the documentation to state that the link is established before the actor processes messages (which is the actual guarantee provided), not that the operation is atomic.

Minor Issues

4. Duplicate Exit Signals (tasks/actor.rs:388-398, threads/actor.rs:364-374)

Problem: In Context::link, if the target dies between register_link and the exit_reason() check, the exit signal may be delivered twice (once by the immediate check, once by the dying actor's `


Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

@github-actions

🤖 Claude Code Review

Now I have everything I need. Let me write up the review.


PR #166 Review: Bidirectional Links, trap_exit, and start_linked

Overview

Well-structured implementation of Erlang-style actor links. The design mirrors the Erlang semantics faithfully: bidirectional propagation, Normal exits are silent for non-trapping peers, Kill is untrappable, and trap_exit converts signals into deliverable Exit messages. Good test coverage across both tasks and threads backends.


Issues

[BUG] Double-signal race in ctx.link() — trapping actors only

tasks/actor.rs ~L565-583, mirrored in threads/actor.rs

link::register_link(...);        // (1) self added to target's table

// If the target is already dead, deliver the signal immediately to us.
if let Some(reason) = target.exit_reason() {  // (2) checked after registration
    ...signal(target.id(), reason);
}

There is a window between (1) and (2) where the target can die:

  • Before (1): propagate_exit drains an empty table → self is never in target's table → check at (2) fires normally. ✓
  • After (1) but before (2): propagate_exit drains the table (self is in it → self gets signaled). Then exit_reason() returns Some → self gets signaled a second time.

For non-trapping actors the second cancel is idempotent, so no visible effect. For trapping actors this delivers two Exit messages for the same death. The test link_to_already_dead_actor_delivers_signal only exercises the definitely-dead case and doesn't catch this race.

Fix: Before delivering the immediate signal, remove self from the (now-dead) target's link table (or verify self is still in the target's table before signaling, treating drain-and-check as the canonical "was I signaled?").


[MINOR] Misleading atomicity claim in start_linked

tasks/actor.rs ~L720-729, threads/actor.rs ~L1331-1335

/// Atomically start the actor and link it to the caller's context.
fn start_linked<P: Actor>(self, parent_ctx: &Context<P>) -> ActorRef<Self> {
    let actor_ref = self.start();
    parent_ctx.link(&actor_ref.child_handle()); // not atomic with start()
    actor_ref
}

This is not atomic — there is a window between start() and link() where the child can die without the parent being notified. The dead-actor fallback in ctx.link() closes most of this window, but calling it "atomic" sets incorrect expectations for callers relying on that guarantee under adversarial timing. Change the doc to say "start and immediately link" or "link before the child processes its first message."


[MINOR] Inconsistent poisoned-mutex handling in propagate_exit

link.rs L403

if let Ok(mut peer_table) = entry.peer_links.lock() {
    peer_table.retain(|e| e.peer_id != own_id);
}
// ↑ silently drops the poisoned case

Every other lock acquisition in this codebase uses unwrap_or_else(|p| p.into_inner()). If the peer mutex is poisoned, self remains in the peer's link table permanently — the peer will try to signal a dead actor. Use the project-standard pattern here.
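
For reference, a small standalone sketch of that recovery pattern using only `std`. The table is reduced to a `Vec<u64>` of peer ids, and both function names are hypothetical.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Remove `dead_id` from a peer table even if another thread panicked
/// while holding the lock. `into_inner` hands back the guard that the
/// `PoisonError` wraps, so the cleanup still runs.
fn remove_peer_even_if_poisoned(table: &Mutex<Vec<u64>>, dead_id: u64) {
    let mut entries = table.lock().unwrap_or_else(|p| p.into_inner());
    entries.retain(|id| *id != dead_id);
}

/// Build a table whose mutex is poisoned, to exercise the path above.
fn poisoned_table() -> Arc<Mutex<Vec<u64>>> {
    let table = Arc::new(Mutex::new(vec![1, 2, 3]));
    let t = Arc::clone(&table);
    // The spawned thread panics while holding the guard, which poisons the mutex.
    let _ = thread::spawn(move || {
        let _guard = t.lock().unwrap();
        panic!("poison the lock");
    })
    .join();
    table
}
```

With `if let Ok(..)` the `retain` call would be skipped for a poisoned table. With `unwrap_or_else` it runs regardless.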


[MINOR] TrapExitFlag and LinkedExitReason are unintentionally pub

link.rs L257, L263

pub type TrapExitFlag = Arc<AtomicBool>;
pub type LinkedExitReason = Arc<Mutex<Option<ExitReason>>>;

All other internal types (LinkTable, ExitSignalFn, LinkEntry, SendExitFn) are correctly pub(crate). These two leak implementation details. The only accessors (trap_exit_flag(), linked_reason()) are already pub(crate), so these type aliases should be too.


[NIT] #[allow(clippy::too_many_arguments)] signals a refactor opportunity

child_handle.rs L46, L68

The four link-related arguments added to from_tasks/from_threads (trap_exit, links, linked_reason, send_exit) are always passed together and semantically belong together. Grouping them into a LinkState struct would eliminate the suppression and make call sites readable.


[NIT] Duplicated SendExitFn construction

tasks/actor.rs ~L610-617 and ~L648-652; same in threads/actor.rs

own_send_exit_fn() on Context<A> and the closure inside From<ActorRef<A>> for ChildHandle are identical. The From impl could call a shared make_send_exit_fn(sender) helper on ActorRef, which Context then also uses.


[NITS] Test reliability

Several tests (link_delivers_exit_to_trapping_peer, chain_propagation_through_non_trapping_middle, etc.) use fixed sleeps (50ms, 100ms, 150ms) to wait for an Exit message to reach a trapping actor. These will flake under load on slow CI. A lightweight poll loop — retry request(GetExits) until exits.len() > 0 with a timeout — would make them deterministic.
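
One possible shape for such a helper, sketched synchronously for the threads backend; a tasks-mode version would need an async sleep instead. `wait_until` is a hypothetical name, not something in the test suite.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Poll `cond` every `interval` until it returns true or `timeout` elapses.
/// Returns whether the condition was met before the deadline.
fn wait_until(timeout: Duration, interval: Duration, mut cond: impl FnMut() -> bool) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if cond() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        thread::sleep(interval);
    }
}
```

A test would then assert on something like `wait_until(Duration::from_secs(2), Duration::from_millis(5), || !exits().is_empty())`, where `exits()` stands in for whatever query the test already uses. The test finishes as soon as the message arrives and only fails after the full timeout.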


[COSMETIC] Removed useful comment in threads CompletionGuard

threads/actor.rs ~L667 (surrounding the CompletionGuard init)

-reason: None, // defaults to Kill if run_actor panics unexpectedly
+reason: None,

This comment explained the CompletionGuard's drop behavior (reason=None → Kill propagates). It was removed without replacement and it genuinely helps readers understand a non-obvious invariant. Worth restoring or replacing.


What's Good

  • Architecture: Decoupled link.rs module, no global state, per-actor tables, type-erased signal functions — all clean.
  • ExitReason::Kill untrappable: Correctly bypasses trap_exit flag, consistent with Erlang semantics.
  • Idempotent links: register_link's dedup by peer_id is correct and well-tested.
  • Transitive propagation: The LinkedExitReason slot approach for forwarding the original reason through non-trapping middle actors is elegant.
  • Test parity: Full coverage in both tasks and threads backends.

Automated review by Claude (Anthropic) · sonnet · custom prompt

@github-actions

🤖 Codex Code Review

Findings

  1. High: Context::link() can deliver the same exit twice to trapping actors. It registers the link first and then manually signals again if target.exit_reason() is already set, but both runtimes publish completion only after propagate_exit() has already notified linked peers. If the target dies in that window, the peer gets one signal from normal teardown and a second from the fallback path. That is observable as duplicate Exit messages in trap_exit(true) supervisors. Refs: tasks link, tasks completion ordering, threads link, threads completion ordering.

  2. High: start_linked() is not atomic despite the contract and comments. Both implementations are just start() followed by link(), so a stale or concurrently-dead parent_ctx can still spawn a live child with no real parent-side link, and the child can run started() before the link exists. This needs a spawn path that installs the initial link before the actor becomes runnable. Refs: tasks start_linked, tasks link fallback only checks target, threads start_linked, threads link fallback only checks target.

  3. Medium: Linking to an already-dead actor leaks stale entries into links. register_link() always inserts both sides, but only propagate_exit() drains them; if the target was already dead, that drain has already happened, so the new entries persist until manual unlink() or the caller exits. Long-lived trapping supervisors will accumulate dead children and pay extra work on shutdown. Refs: link register, link propagate, tasks immediate-dead path, threads immediate-dead path.

Tests were not run here: cargo test could not resolve index.crates.io in this sandbox.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@greptile-apps
Contributor

greptile-apps Bot commented May 14, 2026

Greptile Summary

This PR implements Erlang-style bidirectional actor links with trap_exit semantics, including Exit message delivery, ctx.link()/ctx.unlink(), start_linked, and transitive exit propagation — all without global state.

  • concurrency/src/link.rs introduces LinkTable, make_signal, register_link, and propagate_exit as the shared primitives; both tasks and threads backends wire these into Context, ActorRef, and ChildHandle.
  • ctx.link() registers a bidirectional link and, if the target is already dead, fires the exit signal immediately; a post-registration dead-actor check creates a duplicate-delivery window when the peer dies concurrently with link setup.
  • 13 new tests cover propagation, trapping, normal-exit filtering, already-dead targets, idempotency, and chain propagation across non-trapping middle actors.

Confidence Score: 3/5

The core link primitives are well-structured and the test suite is thorough, but the post-registration dead-actor check in ctx.link() introduces a real concurrency hazard that can cause trapping actors to receive duplicate Exit notifications for the same peer death.

The duplicate-signal window exists in both backends: register_link adds the linker to the target's table, propagate_exit fires and delivers the first signal, then exit_reason() detects the dead target and fires a second signal. For trapping actors this makes exit_received fire twice for one peer death, breaking the supervisor pattern this feature is designed to enable.

Both concurrency/src/tasks/actor.rs and concurrency/src/threads/actor.rs — specifically the ctx.link() post-registration dead-actor check — warrant the closest review before merge.

Important Files Changed

| Filename | Overview |
| --- | --- |
| concurrency/src/link.rs | New module implementing bidirectional link primitives; TrapExitFlag/LinkedExitReason are accidentally pub, and propagate_exit inconsistently handles poisoned peer locks with a silent if let Ok drop. |
| concurrency/src/tasks/actor.rs | Adds link/unlink/trap_exit to Context and start_linked to ActorStart; the post-registration dead-actor check in ctx.link() can deliver a duplicate Exit to trapping actors when the peer dies concurrently with link setup. |
| concurrency/src/threads/actor.rs | Mirrors tasks-mode changes for threads; carries the same duplicate-Exit race in ctx.link() and additionally, OS preemption between the two register_link critical sections can produce a missed signal when the peer dies between step A and step B while completion is not yet published. |
| concurrency/src/child_handle.rs | Extends ChildHandle with trap_exit, links, linked_reason, and send_exit fields for type-erased signal delivery; changes are mechanical and consistent. |
| concurrency/src/lib.rs | Adds pub mod link and re-exports Exit; straightforward. |
| examples/exit_reason/src/main.rs | Adds Scenario 9 demonstrating supervisor-style trap_exit; example code is clear and correct. |

Sequence Diagram

sequenceDiagram
    participant A as Actor A (linker)
    participant RT as register_link
    participant B as Actor B (target)
    participant PX as propagate_exit

    A->>RT: lock own_links, add B entry
    A->>RT: lock peer_links, add A entry
    Note over RT: Two separate critical sections

    B->>PX: actor dies, drain own_links
    PX->>A: signal(A) first delivery
    PX->>B: completion_tx.send(reason)

    A->>B: exit_reason() returns Some(reason)
    A->>A: signal(A) duplicate delivery

    Note over A: Trapping actor has 2 Exit envelopes
    Note over A: for the same peer death

Comments Outside Diff (1)

  1. concurrency/src/tasks/actor.rs, line 574-583 (link)

    P1 Duplicate Exit delivery race for trapping actors

    After register_link succeeds, if the peer dies and its propagate_exit runs before this exit_reason() check, the linker receives the signal twice: once from propagate_exit (which found own_id in peer_links) and once from this post-registration check (which then sees exit_reason() is Some). For a non-trapping actor the second cancel() call is a no-op, but for a trapping actor it enqueues a second ExitEnvelope, so exit_received fires with two identical Exit notifications from the same peer death.

    The same race exists in threads/actor.rs at the corresponding link() method.


Reviews (1): Last reviewed commit: "feat: add bidirectional links, trap_exit..." | Re-trigger Greptile

Comment thread concurrency/src/link.rs Outdated
Comment on lines +182 to +184
if let Ok(mut peer_table) = entry.peer_links.lock() {
    peer_table.retain(|e| e.peer_id != own_id);
}

P2 propagate_exit uses unwrap_or_else (recover from a poisoned lock) for own_links but silently discards poisoned-lock errors for entry.peer_links via if let Ok. If a peer thread panics while holding its link-table lock, the cleanup step is silently skipped, leaving a stale back-reference to the dying actor in the peer's table. Consistent error handling with unwrap_or_else is safer.

Suggested change
-if let Ok(mut peer_table) = entry.peer_links.lock() {
-    peer_table.retain(|e| e.peer_id != own_id);
-}
+let mut peer_table = entry.peer_links.lock().unwrap_or_else(|p| p.into_inner());
+peer_table.retain(|e| e.peer_id != own_id);

Comment thread concurrency/src/link.rs Outdated
Comment on lines +36 to +42
pub type TrapExitFlag = Arc<AtomicBool>;

/// Per-actor slot holding the exit reason of a linked actor whose death
/// triggered cancellation. When a non-trapping actor is cancelled by a link
/// signal, this slot is set so the actor's own exit reason propagates
/// transitively through further links.
pub type LinkedExitReason = Arc<Mutex<Option<ExitReason>>>;

P2 TrapExitFlag and LinkedExitReason are declared pub in a publicly re-exported module (pub mod link), so they form part of the library's stable surface. Both are pure internal implementation details — users only ever interact with ctx.trap_exit(bool) and Exit. Restricting to pub(crate) prevents downstream code from accidentally depending on the concrete Arc<AtomicBool> / Arc<Mutex<…>> representation.

Suggested change
-pub type TrapExitFlag = Arc<AtomicBool>;
-/// Per-actor slot holding the exit reason of a linked actor whose death
-/// triggered cancellation. When a non-trapping actor is cancelled by a link
-/// signal, this slot is set so the actor's own exit reason propagates
-/// transitively through further links.
-pub type LinkedExitReason = Arc<Mutex<Option<ExitReason>>>;
+pub(crate) type TrapExitFlag = Arc<AtomicBool>;
+/// Per-actor slot holding the exit reason of a linked actor whose death
+/// triggered cancellation. When a non-trapping actor is cancelled by a link
+/// signal, this slot is set so the actor's own exit reason propagates
+/// transitively through further links.
+pub(crate) type LinkedExitReason = Arc<Mutex<Option<ExitReason>>>;

Comment thread concurrency/src/tasks/actor.rs Outdated
Comment on lines +825 to +830
/// Atomically start the actor and link it to the caller's context.
///
/// The link is established before the new actor processes any messages,
/// closing the race window where a child could die before the parent's
/// `link()` call completes.
fn start_linked<P: Actor>(self, parent_ctx: &Context<P>) -> ActorRef<Self> {

P2 The docstring says the link is established "atomically" and that the race window is "closed", but start() and link() are two separate operations: the child spawns and can process messages (or even panic) before parent_ctx.link(...) runs. The link() call does handle a dead target via the post-registration exit_reason() check, but the guarantee is not atomic — it's a best-effort check subject to the duplicate-delivery race noted above. The comment should be softened to avoid misleading users about the synchronisation semantics.

Suggested change
-/// Atomically start the actor and link it to the caller's context.
-///
-/// The link is established before the new actor processes any messages,
-/// closing the race window where a child could die before the parent's
-/// `link()` call completes.
-fn start_linked<P: Actor>(self, parent_ctx: &Context<P>) -> ActorRef<Self> {
+/// Start the actor and link it to the caller's context.
+///
+/// The link is registered as soon as the actor is spawned. If the child
+/// dies before the link is established, `link()` detects this and delivers
+/// the exit signal immediately. This does not provide strict atomicity —
+/// the child may process a message between spawn and the link registration.
+fn start_linked<P: Actor>(self, parent_ctx: &Context<P>) -> ActorRef<Self> {

@ElFantasma
Collaborator Author

Bot Review Responses

All actionable findings addressed in 0f48d9d.


Fixed

[Bug] Duplicate Exit delivery race in ctx.link() — flagged by Kimi #4, Claude (bug section), Codex #1, Greptile P1

Fixed by adding link::take_self_from_peer_table() — atomically removes self from the peer's link table and returns whether an entry was found. If it was, peer's propagate_exit hasn't fired (or hadn't reached us yet) so we deliver the signal. If not, peer already signaled us — skip to avoid duplicate. Added unit tests take_self_from_peer_table_returns_true_when_present / _returns_false_when_absent and regression test link_to_already_dead_delivers_exactly_once_to_trapping_peer.
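
A minimal sketch of the remove-and-report idea, assuming the peer's table can be reduced to a list of ids. The real function works on `LinkEntry` values and its exact signature is not shown in this thread.

```rust
use std::sync::Mutex;

/// Remove `own_id` from the peer's table under a single lock acquisition
/// and report whether an entry was actually there.
///
/// true  => the peer has not drained us yet, so the caller delivers the signal.
/// false => the peer's propagate_exit already took the entry and signaled us.
fn take_self_from_peer_table(peer_table: &Mutex<Vec<u64>>, own_id: u64) -> bool {
    let mut entries = peer_table.lock().unwrap_or_else(|p| p.into_inner());
    let before = entries.len();
    entries.retain(|id| *id != own_id);
    entries.len() != before
}
```

Because the check and the removal happen under the same lock, exactly one of the two paths (the dying peer's drain or the linker's fallback) can claim the entry.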

[Doc] start_linked "atomic" claim misleading — Kimi #3, Claude minor, Codex #2

Doc softened in both tasks and threads modes: "not strictly atomic — the child may begin executing started() and process messages before the link is established. However, if the child dies in that window, Context::link detects the dead target and delivers the exit signal as a fallback, so no signal is lost."

[Bug] Stale entries when linking to dead actor — Codex #3

Same fix as #1: when linking to an already-dead target, take_self_from_peer_table now removes the stale entry we just inserted.

[Minor] TrapExitFlag/LinkedExitReason should be pub(crate) — Claude minor, Greptile

Fixed. They were unintentionally pub. Now consistent with LinkTable, ExitSignalFn, LinkEntry, SendExitFn.

[Minor] Inconsistent poisoned-mutex handling in propagate_exit — Claude minor, Greptile

Fixed. Now uses unwrap_or_else(|p| p.into_inner()) consistent with the rest of the codebase, instead of the silent if let Ok drop.

[Cosmetic] Lost CompletionGuard comment in threads mode — Claude

Restored with an expanded explanation of the panic-fallback semantics.


Flagged as wrong / not actionable

[Kimi #1] Deadlock in link registration — incorrect. register_link and unregister_link acquire locks sequentially in separate {} scopes — each is released before the next is taken. No two locks are held simultaneously, so no deadlock possibility regardless of ordering.
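
To illustrate the scoping argument, here is a simplified model in which two threads link the same pair of tables in opposite order. The tables hold bare ids, and `link_concurrently` is a hypothetical harness rather than code from this PR.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

type Table = Arc<Mutex<Vec<u64>>>;

/// Register a link on both sides, holding at most one lock at a time.
fn register_link(own_id: u64, own: &Table, peer_id: u64, peer: &Table) {
    {
        let mut entries = own.lock().unwrap();
        if !entries.contains(&peer_id) {
            entries.push(peer_id);
        }
    } // own's guard is dropped here, before the peer's lock is taken
    {
        let mut entries = peer.lock().unwrap();
        if !entries.contains(&own_id) {
            entries.push(own_id);
        }
    }
}

/// Run A->B and B->A registration concurrently and return both table sizes.
fn link_concurrently(rounds: usize) -> (usize, usize) {
    let a: Table = Arc::new(Mutex::new(Vec::new()));
    let b: Table = Arc::new(Mutex::new(Vec::new()));
    let (a1, b1) = (Arc::clone(&a), Arc::clone(&b));
    let (a2, b2) = (Arc::clone(&a), Arc::clone(&b));
    let t1 = thread::spawn(move || {
        for _ in 0..rounds {
            register_link(1, &a1, 2, &b1);
        }
    });
    let t2 = thread::spawn(move || {
        for _ in 0..rounds {
            register_link(2, &b2, 1, &a2);
        }
    });
    t1.join().unwrap();
    t2.join().unwrap();
    let len_a = a.lock().unwrap().len();
    let len_b = b.lock().unwrap().len();
    (len_a, len_b)
}
```

Since no thread ever waits for a second lock while holding a first, the circular-wait condition for deadlock cannot arise. The cost is that the two insertions are not one atomic step.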

[Kimi #2] Arc reference cycle in LinkEntry — technically true (each entry holds peer's LinkTable Arc, and peer's entry holds ours), but mitigated by propagate_exit draining tables on death. Not a real leak in practice. Documenting explicit unlink() for unusual long-lived link patterns is reasonable future work.
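
The cycle and its mitigation can be observed with `Arc::strong_count`. This is a reduced model, assuming each entry stores only the peer's table; `link` and `drain` are hypothetical stand-ins for `register_link` and `propagate_exit`.

```rust
use std::sync::{Arc, Mutex};

struct Entry {
    // Strong reference to the peer's table; two linked actors form a cycle.
    peer_links: Table,
}

type Table = Arc<Mutex<Vec<Entry>>>;

fn link(a: &Table, b: &Table) {
    a.lock().unwrap().push(Entry { peer_links: Arc::clone(b) });
    b.lock().unwrap().push(Entry { peer_links: Arc::clone(a) });
}

/// Model of what happens on death: empty our table and remove our
/// back-reference from every peer, which breaks the cycle.
fn drain(own: &Table) {
    let entries: Vec<Entry> = std::mem::take(&mut *own.lock().unwrap());
    for entry in entries {
        entry
            .peer_links
            .lock()
            .unwrap()
            .retain(|e| !Arc::ptr_eq(&e.peer_links, own));
    }
}
```

After `link(&a, &b)` each table has a strong count of 2. After `drain(&a)` both counts are back to 1, so the memory is reclaimed once the actors themselves are dropped. The cycle only persists while both linked actors are alive.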


Nits left as-is (not blocking)

  • #[allow(clippy::too_many_arguments)] → LinkState struct (Claude nit): would be a nice refactor, but the call sites are limited (2 constructors); deferred.
  • Duplicated SendExitFn construction (Claude nit): between Context::own_send_exit_fn and From<ActorRef<A>> for ChildHandle. Could share a helper; deferred.
  • Sleep-based test reliability (Claude nit): same pattern as the rest of the test suite. Polling-based helpers would be an across-the-board improvement, not specific to this PR.

Test status

  • 115 tests passing (was 112; +3: one regression test plus two unit tests for take_self_from_peer_table)
  • Clippy clean
  • CI green

Development

Successfully merging this pull request may close these issues.

Add link and monitor operations for actors

1 participant