Skip to content

[cuebot] Fix channel closing issue#2274

Open
DiegoTavares wants to merge 2 commits into
AcademySoftwareFoundation:masterfrom
DiegoTavares:drain-cache
Open

[cuebot] Fix channel closing issue#2274
DiegoTavares wants to merge 2 commits into
AcademySoftwareFoundation:masterfrom
DiegoTavares:drain-cache

Conversation

@DiegoTavares
Copy link
Copy Markdown
Collaborator

@DiegoTavares DiegoTavares commented May 8, 2026

The application os not properly closing the grpc channel at shutdown, leaving this SEVERE warning:

Channel ManagedChannelImpl{logId=2787, target=elk0815:8444} was not shutdown properly

Explanation:

RqdClientGrpc (cuebot/src/main/java/com/imageworks/spcue/rqd/RqdClientGrpc.java:71-89) holds a Guava LoadingCache<String, ManagedChannel>. The removalListener calls conn.shutdown() on cache eviction — but never on application shutdown. The bean has no destroy-method (applicationContext-service.xml:39, no destroy hook), and RqdClientGrpc has no shutdown() method at all. So when cuebot stops:

  • The cache is GC'd
  • Each ManagedChannel is finalized without shutdown() having been called
  • gRPC logs SEVERE: Channel ... was not shutdown properly!!! per leaked channel, with the stack capturing where the channel was first allocated (the RuntimeException: ManagedChannel allocation site — that's a diagnostic stack, not a real exception)

LLM usage disclosure

Claude Opus was used for investigating the origin of the log and proposing a fix

Summary by CodeRabbit

  • Bug Fixes
    • Improved application shutdown procedure to properly clean up network resources and prevent potential resource leaks during shutdown.
    • Enhanced shutdown sequencing to ensure proper initialization and cleanup order of internal services.

Review Change Stack

The application os not properly closing the grpc channel at shutdown, leaving this SEVERE warning:

```
Channel ManagedChannelImpl{logId=2787, target=elk0815:8444} was not shutdown properly
```

Explanation:

`RqdClientGrpc` (cuebot/src/main/java/com/imageworks/spcue/rqd/RqdClientGrpc.java:71-89) holds a
Guava `LoadingCache<String, ManagedChannel>`. The `removalListener` calls `conn.shutdown()` **on
cache eviction** — but never on application shutdown. The bean has no `destroy-method`
(applicationContext-service.xml:39, no destroy hook), and `RqdClientGrpc` has no `shutdown()` method
at all. So when cuebot stops:

- The cache is GC'd
- Each `ManagedChannel` is finalized **without** `shutdown()` having been called
- gRPC logs `SEVERE: Channel ... was not shutdown properly!!!` per leaked channel, with the stack
  capturing where the channel was first allocated (the `RuntimeException: ManagedChannel allocation
  site` — that's a diagnostic stack, not a real exception)

The first truncated stack in your paste is another such allocation-site diagnostic. The actual
runtime call (`launchFrame`) **succeeded** — the frame was dispatched. This is purely a
resource-cleanup hygiene warning, not a functional failure.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 8, 2026

📝 Walkthrough

Walkthrough

This PR adds graceful shutdown of gRPC channels in OpenCue's RqdClientGrpc. A new shutdown() method invalidates all cached channels, and Spring configuration ensures this method is invoked during application shutdown with proper bean dependency ordering.

Changes

gRPC Graceful Shutdown

Layer / File(s) Summary
Shutdown method implementation
cuebot/src/main/java/com/imageworks/spcue/rqd/RqdClientGrpc.java
shutdown() method logs invocation and invalidates all cached channel entries to trigger removal via the existing cache removal listener.
Spring lifecycle configuration
cuebot/src/main/resources/conf/spring/applicationContext-service.xml
rqdClient bean declares destroy-method="shutdown" for explicit shutdown during Spring context destruction. Queue beans (bookingQueue, dispatchQueue, manageQueue, reportQueue, killQueue) add rqdClient to their depends-on attribute to establish proper shutdown ordering.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • lithorus
  • ramonfigueiredo

Poem

A client thread with channels cached,
Now gracefully will shutdown fast,
Spring's destroy hook calls shutdown's call,
Queue beans wait—no rush at all,
Clean closures, ordered, one by one. 🐰

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Fix channel closing issue' directly addresses the main change: implementing proper gRPC ManagedChannel shutdown through a new shutdown() method and Spring bean destroy configuration.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@DiegoTavares DiegoTavares marked this pull request as ready for review May 12, 2026 02:31
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cuebot/src/main/java/com/imageworks/spcue/rqd/RqdClientGrpc.java`:
- Around line 220-224: The shutdown() method currently only invalidates
channelCache but does not prevent new channels from being created via
channelCache.get(host); add a shutdown guard (e.g., an atomic boolean like
isShutdown) that shutdown() sets before invalidating/invalidationComplete and
update any code that calls channelCache.get(host) to check this guard and refuse
or short-circuit creation if shutdown has begun; ensure shutdown() also prevents
further puts/gets by either making channelCache null after setting the flag or
having creation paths throw/return early when isShutdown is true so no new
channels are recreated during teardown.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 510608ae-c252-4a7a-87d1-47bb051bd437

📥 Commits

Reviewing files that changed from the base of the PR and between 795cdae and 8380645.

📒 Files selected for processing (2)
  • cuebot/src/main/java/com/imageworks/spcue/rqd/RqdClientGrpc.java
  • cuebot/src/main/resources/conf/spring/applicationContext-service.xml

Comment on lines +220 to +224
public void shutdown() {
if (channelCache != null) {
logger.info("Shutting down RqdClientGrpc channel cache");
channelCache.invalidateAll();
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Prevent channel recreation after shutdown starts.

shutdown() currently closes existing cached channels, but new channels can still be created afterward via channelCache.get(host) if any teardown-time call races in. That can reintroduce the same leak warning during shutdown.

Suggested fix
+import java.util.concurrent.atomic.AtomicBoolean;
...
+    private final AtomicBoolean shuttingDown = new AtomicBoolean(false);
...
     private RqdInterfaceGrpc.RqdInterfaceBlockingStub getStub(String host)
             throws ExecutionException {
+        if (shuttingDown.get()) {
+            throw new IllegalStateException("RqdClientGrpc is shutting down");
+        }
         if (channelCache == null) {
             buildChannelCache();
         }
         ManagedChannel channel = channelCache.get(host);
         return RqdInterfaceGrpc.newBlockingStub(channel).withDeadlineAfter(rqdTaskDeadlineSeconds,
                 TimeUnit.SECONDS);
     }

     private RunningFrameGrpc.RunningFrameBlockingStub getRunningFrameStub(String host)
             throws ExecutionException {
+        if (shuttingDown.get()) {
+            throw new IllegalStateException("RqdClientGrpc is shutting down");
+        }
         if (channelCache == null) {
             buildChannelCache();
         }
         ManagedChannel channel = channelCache.get(host);
         return RunningFrameGrpc.newBlockingStub(channel).withDeadlineAfter(rqdTaskDeadlineSeconds,
                 TimeUnit.SECONDS);
     }
...
     public void shutdown() {
-        if (channelCache != null) {
+        if (!shuttingDown.compareAndSet(false, true)) {
+            return;
+        }
+        if (channelCache != null) {
             logger.info("Shutting down RqdClientGrpc channel cache");
             channelCache.invalidateAll();
+            channelCache.cleanUp();
         }
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cuebot/src/main/java/com/imageworks/spcue/rqd/RqdClientGrpc.java` around
lines 220 - 224, The shutdown() method currently only invalidates channelCache
but does not prevent new channels from being created via channelCache.get(host);
add a shutdown guard (e.g., an atomic boolean like isShutdown) that shutdown()
sets before invalidating/invalidationComplete and update any code that calls
channelCache.get(host) to check this guard and refuse or short-circuit creation
if shutdown has begun; ensure shutdown() also prevents further puts/gets by
either making channelCache null after setting the flag or having creation paths
throw/return early when isShutdown is true so no new channels are recreated
during teardown.

Copy link
Copy Markdown
Collaborator

@ramonfigueiredo ramonfigueiredo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @DiegoTavares !

Approved with minor change suggested by CodeRabbitAI

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants