
fix(BA-4392): Share single Docker client across container stats collection #9469

Merged
HyeockJinKim merged 6 commits into main from fix/BA-4392 on Mar 4, 2026

Conversation

@seedspirit
Contributor

resolves #8809 (BA-4392)

Checklist: (if applicable)

  • Milestone metadata specifying the target backport version
  • Mention to the original issue
  • Installer updates including:
    • Fixtures for db schema changes
    • New mandatory config options
  • Update of end-to-end CLI integration tests in ai.backend.test
  • API server-client counterparts (e.g., manager API -> client SDK)
  • Test case(s) to:
    • Demonstrate the difference of before/after
    • Demonstrate the flow of abstract/conceptual models with a concrete implementation
  • Documentation
    • Contents in the docs directory
    • docstrings in public interfaces and type annotations

@seedspirit seedspirit self-assigned this Feb 27, 2026
@github-actions github-actions Bot added the size:L (100~500 LoC) and comp:agent (Related to Agent component) labels Feb 27, 2026
@seedspirit seedspirit marked this pull request as ready for review February 27, 2026 06:44
@seedspirit seedspirit requested review from a team and Copilot February 27, 2026 06:44
Contributor

Copilot AI left a comment


Pull request overview

Refactors Docker-based container stats collection to reuse a single aiodocker.Docker() client per collection pass, reducing file descriptor pressure in large deployments.

Changes:

  • Update CPUPlugin.gather_container_measures() to reuse one Docker client for all per-container API stats calls.
  • Update MemoryPlugin.gather_container_measures() to reuse one Docker client for both API and sysfs-derived stats paths.
  • Add unit tests asserting only a single Docker client is created per collection pass, plus a changelog entry.
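The change summarized above can be sketched roughly as follows. This is an illustration only, not the code in intrinsic.py: `StubDockerClient`, `gather_per_container_client`, `gather_shared_client`, and `count_clients` are hypothetical names, and the stub stands in for `aiodocker.Docker()` so the example runs without a Docker daemon. The "before" variant is an assumption about the pattern the PR replaced.

```python
import asyncio


class StubDockerClient:
    """Hypothetical stand-in for a Docker API client; counts how many are opened."""

    opened = 0

    def __init__(self) -> None:
        StubDockerClient.opened += 1

    async def container_stats(self, container_id: str) -> dict:
        # A real client would issue a request to the Docker daemon here.
        return {"id": container_id, "cpu_used": 0}

    async def close(self) -> None:
        pass

    async def __aenter__(self) -> "StubDockerClient":
        return self

    async def __aexit__(self, *exc_info) -> None:
        await self.close()


async def gather_per_container_client(container_ids: list[str]) -> list[dict]:
    """Assumed 'before' shape: every container opens its own client."""

    async def fetch(cid: str) -> dict:
        async with StubDockerClient() as docker:
            return await docker.container_stats(cid)

    return await asyncio.gather(*(fetch(cid) for cid in container_ids))


async def gather_shared_client(container_ids: list[str]) -> list[dict]:
    """'After' shape: one client per collection pass, shared by all requests."""
    async with StubDockerClient() as docker:
        return await asyncio.gather(
            *(docker.container_stats(cid) for cid in container_ids)
        )


def count_clients(gather_fn, container_ids: list[str]) -> int:
    """Run one collection pass and report how many clients it opened."""
    StubDockerClient.opened = 0
    asyncio.run(gather_fn(container_ids))
    return StubDockerClient.opened
```

With a real client, each instance would typically hold its own connections, so the number of clients opened per pass is a reasonable proxy for the file descriptor pressure the PR targets. In this sketch the per-container variant opens one client per container, while the shared variant opens exactly one regardless of container count.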

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/ai/backend/agent/docker/intrinsic.py | Refactors CPU/memory container stats to share a single Docker client instance during per-pass collection. |
| tests/unit/agent/test_docker_intrinsic.py | Adds tests verifying Docker client instantiation count in CPU/Memory plugins across stat modes. |
| changes/9469.enhance.md | Documents the FD-exhaustion mitigation by sharing a Docker client during stats collection. |
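One way to check the instantiation count without a running Docker daemon is to hand the collection code a mocked client factory and count its calls. The sketch below is an assumption about how such a test could look, not the contents of test_docker_intrinsic.py: `gather_container_measures` here is a simplified hypothetical function that takes an injected `client_factory`, and `run_pass_with_fake_client` is a hypothetical helper.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


async def gather_container_measures(container_ids, client_factory):
    """Hypothetical collection pass: open one client, reuse it, always close it."""
    docker = client_factory()
    try:
        return [await docker.stats(cid) for cid in container_ids]
    finally:
        await docker.close()


def run_pass_with_fake_client(container_ids):
    """Run one pass against a mocked client and report what was observed."""
    fake_client = MagicMock()
    # Each awaited stats() call returns a small dict keyed by container id.
    fake_client.stats = AsyncMock(side_effect=lambda cid: {"id": cid})
    fake_client.close = AsyncMock()
    factory = MagicMock(return_value=fake_client)

    results = asyncio.run(gather_container_measures(container_ids, factory))
    return results, factory.call_count, fake_client.close.await_count
```

A test built on this helper would assert that the factory was called once and the client closed once, however many containers were passed in. Injecting the factory is a design choice made for this sketch; patching the client constructor where the plugin imports it would serve the same purpose.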


Comment thread src/ai/backend/agent/docker/intrinsic.py Outdated
Comment thread src/ai/backend/agent/docker/intrinsic.py
Comment thread changes/9469.fix.md
Member

@fregataa fregataa left a comment


It would be nice to write a summary description.
Do we have Docker daemon infrastructure for unit tests? @HyeockJinKim

Comment thread src/ai/backend/agent/docker/intrinsic.py
Comment thread src/ai/backend/agent/docker/intrinsic.py Outdated
Comment thread src/ai/backend/agent/docker/intrinsic.py Outdated
@seedspirit seedspirit changed the title from "refactor(BA-4392): Share single Docker client across container stats collection" to "fix(BA-4392): Share single Docker client across container stats collection" Mar 4, 2026
The PR addresses a file descriptor exhaustion bug, not an enhancement.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@seedspirit seedspirit added this to the 25.15 milestone Mar 4, 2026
@HyeockJinKim HyeockJinKim merged commit 189b040 into main Mar 4, 2026
21 checks passed
@HyeockJinKim HyeockJinKim deleted the fix/BA-4392 branch March 4, 2026 08:19
lablup-octodog pushed a commit that referenced this pull request Mar 4, 2026
…ction (#9469)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Backported-from: main (26.2)
Backported-to: 26.2
Backport-of: 9469
@github-actions

github-actions Bot commented Mar 4, 2026

Backport to 25.15 failed. Please backport manually.

HyeockJinKim pushed a commit that referenced this pull request Mar 5, 2026
…ction (#9469) (#9621)

Co-authored-by: Bokyum Kim | 김보겸 <bkkim@lablup.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

Labels

comp:agent (Related to Agent component), size:L (100~500 LoC)


Development

Successfully merging this pull request may close these issues.

Add connection pooling for Docker stats collection

4 participants