[Bug] IOBuf TLS block pool: double-return of a Block creates a self-loop in portal_next linked list, causing thread hang #3243

@walterzhaoJR


Describe the bug
release_tls_block() and release_tls_block_chain() in the IOBuf TLS block caching layer do not guard against a block being returned to TLS when it is already the TLS list head. This can create a self-referencing cycle (b->portal_next == b), causing any subsequent traversal of the TLS chain — such as remove_tls_block_chain() (registered via thread_atexit) or share_tls_block() — to loop infinitely, hanging the thread permanently.

In src/butil/iobuf_inl.h, release_tls_block():

[Image: screenshot of the release_tls_block() source]

When b is already tls_data->block_head, the assignment b->u.portal_next = tls_data->block_head becomes b->u.portal_next = b, forming a single-node cycle.
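The self-loop can be reproduced in a reduced model. The sketch below is not brpc code: `Block`, `TlsData`, `release_unguarded`, and `chain_terminates` are hypothetical stand-ins that keep only the two assignments described above (the `u.` union member and reference counting are omitted), so it illustrates the mechanism rather than the library's real behavior.

```cpp
#include <cassert>
#include <cstddef>

namespace self_loop_demo {

// Hypothetical stand-in for IOBuf::Block: only the intrusive link is modeled.
struct Block {
    Block* portal_next = nullptr;
};

// Hypothetical stand-in for the per-thread cache.
struct TlsData {
    Block* block_head = nullptr;
    std::size_t num_blocks = 0;
};

// Unguarded push onto the TLS list, mirroring the assignment discussed above.
inline void release_unguarded(TlsData& tls, Block* b) {
    assert(b != nullptr);
    b->portal_next = tls.block_head;  // if b is already the head: b->portal_next = b
    tls.block_head = b;
    ++tls.num_blocks;
}

// Follows at most `limit` links from the head.
// Returns true if the walk reached the nullptr terminator within that budget.
inline bool chain_terminates(const TlsData& tls, std::size_t limit) {
    const Block* p = tls.block_head;
    for (std::size_t i = 0; i < limit && p != nullptr; ++i) {
        p = p->portal_next;
    }
    return p == nullptr;
}

}  // namespace self_loop_demo
```

Releasing the same block twice into one `TlsData` leaves `b.portal_next == &b`, and `chain_terminates` then returns false for any budget. An unbounded walk over the same list would never return, which is the hang described in this report. Note that `num_blocks` is also incremented twice for a single block.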

Similarly, in src/butil/iobuf.cpp, release_tls_block_chain():

[Image: screenshot of the release_tls_block_chain() source]

If the chain being returned contains blocks that overlap with the existing TLS head, last_b->portal_next can point back to first_b (which may be last_b itself), again forming an infinite cycle.
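The chain case can be modeled the same way. This is a simplified sketch under stated assumptions, not the code from iobuf.cpp: `release_chain_unguarded` and `has_cycle` are hypothetical helpers, the input chain is assumed to be nullptr-terminated when it is passed in, and the per-thread block limit and reference counting are left out.

```cpp
#include <cassert>
#include <cstddef>

namespace chain_demo {

struct Block {              // hypothetical stand-in for IOBuf::Block
    Block* portal_next = nullptr;
};

struct TlsData {            // hypothetical stand-in for the per-thread cache
    Block* block_head = nullptr;
    std::size_t num_blocks = 0;
};

// Unguarded splice: find the tail of the returned chain, then link it in
// front of the current TLS list.
inline void release_chain_unguarded(TlsData& tls, Block* first_b) {
    assert(first_b != nullptr);
    Block* last_b = first_b;
    std::size_t n = 1;
    while (last_b->portal_next != nullptr) {
        last_b = last_b->portal_next;
        ++n;
    }
    last_b->portal_next = tls.block_head;  // cycle if block_head is inside the chain
    tls.block_head = first_b;
    tls.num_blocks += n;
}

// Floyd's tortoise-and-hare check, used here only to observe the damage.
inline bool has_cycle(const Block* head) {
    const Block* slow = head;
    const Block* fast = head;
    while (fast != nullptr && fast->portal_next != nullptr) {
        slow = slow->portal_next;
        fast = fast->portal_next->portal_next;
        if (slow == fast) return true;
    }
    return false;
}

}  // namespace chain_demo
```

With a TLS list consisting of a single block `a`, returning the chain `x -> a` makes `last_b` equal to `a` and links `a` to itself: the list becomes `x -> a -> a -> ...`, and `has_cycle` reports true.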

How the Double-Return Happens
IOBufAsZeroCopyOutputStream::BackUp() calls iobuf::release_tls_block(_cur_block) to eagerly return the block to TLS so other code can reuse it:

[Image: screenshot of the IOBufAsZeroCopyOutputStream::BackUp() source]

After BackUp(), the block is now tls_data.block_head. If a subsequent operation (e.g., _release_block() during destruction of IOBufAsZeroCopyOutputStream, or a BackUp in IOBufAsSnappySink) calls release_tls_block() again with the same block pointer (obtained from a still-live BlockRef), the block is returned a second time — triggering the self-loop.

Impact

  • Thread hangs permanently in remove_tls_block_chain() (called at thread exit via thread_atexit), or in share_tls_block() / release_tls_block_chain() during normal I/O.
  • The hang is silent — no crash, no log, no error — making it extremely difficult to diagnose in production.
  • Any brpc application using protobuf serialization over IOBuf (which internally uses IOBufAsZeroCopyOutputStream) is potentially affected.

To Reproduce

Expected behavior

Versions
OS:
Compiler:
brpc:
protobuf:

Additional context/screenshots

Suggested Fix

  1. Guard release_tls_block() against double-return
     [Image: screenshot of the proposed change]
  2. Guard release_tls_block_chain() against self-loop after linking
     [Image: screenshot of the proposed change]
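
Since the screenshots of the proposed change did not survive, here is one possible shape for the two guards, written against a reduced model rather than brpc itself. All names (`Block`, `TlsData`, `release_guarded`, `release_chain_guarded`) are hypothetical. The sketch also makes two assumptions that may differ from the proposed patch: it checks before linking rather than after, and it only compares against the current list head, so a block sitting deeper in the TLS list would not be caught.

```cpp
#include <cassert>
#include <cstddef>

namespace guard_demo {

struct Block {              // hypothetical stand-in for IOBuf::Block
    Block* portal_next = nullptr;
};

struct TlsData {            // hypothetical stand-in for the per-thread cache
    Block* block_head = nullptr;
    std::size_t num_blocks = 0;
};

// Guard 1: refuse to push a block that is already the list head.
// Returns false and leaves the list untouched in that case.
inline bool release_guarded(TlsData& tls, Block* b) {
    if (b == nullptr || b == tls.block_head) {
        return false;
    }
    b->portal_next = tls.block_head;
    tls.block_head = b;
    ++tls.num_blocks;
    return true;
}

// Guard 2: refuse to splice a chain that contains the current list head.
// The chain is assumed to be nullptr-terminated on entry.
inline bool release_chain_guarded(TlsData& tls, Block* first_b) {
    if (first_b == nullptr || first_b == tls.block_head) {
        return false;
    }
    Block* last_b = first_b;
    std::size_t n = 1;
    while (last_b->portal_next != nullptr) {
        last_b = last_b->portal_next;
        if (last_b == tls.block_head) {
            return false;  // linking would make the head reachable from itself
        }
        ++n;
    }
    last_b->portal_next = tls.block_head;
    tls.block_head = first_b;
    tls.num_blocks += n;
    return true;
}

}  // namespace guard_demo
```

In this model a rejected release is simply a no-op. What the real functions should do with a rejected block (log, drop a reference, or nothing) depends on brpc's reference-counting rules and is deliberately left open here.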
