Skip to content

Commit d7221ff

Browse files
docs: make the single-owner contract explicit (issue 050)
Issues 021 + 026 left the impression that the poison mechanism would catch concurrent misuse. Fuzz testing (issue 050) reliably refutes this: threaded readers on a shared ``ReadBuffer`` produce duplicate messages and corrupt bytes in every trial, with ``is_poisoned`` remaining False, because torn reads frequently yield valid-looking slices that decode cleanly and never trip the poison gate. The right remediation is documentation, not code. Adding a lock would serialize the hottest method in the package for a caller contract violation; adding an owner-thread check would break legitimate hand-off patterns. Instead: spell out on every class docstring and in the README that (a) the types are not thread-safe, (b) concurrent misuse produces silent corruption rather than exceptions, and (c) ``is_poisoned`` does not and cannot catch lost-update races. Add a regression test asserting the docstrings carry the warning so future reformats cannot silently drop it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent c67b745 commit d7221ff

File tree

4 files changed

+170
-3
lines changed

4 files changed

+170
-3
lines changed

README.md

Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -24,6 +24,25 @@ decoder.feed(data)
2424
message = decoder.decode()
2525
```
2626

27+
## Thread-safety
28+
29+
`ReadBuffer`, `WriteBuffer`, `MessageEncoder`, and `MessageDecoder`
30+
are **not thread-safe**. Each instance must be owned by a single
31+
thread or a single asyncio coroutine at a time. This matches Go's
32+
`driver.Conn` contract from go-dqlite.
33+
34+
Concurrent use of a single instance from multiple threads produces
35+
**silent data corruption** — not exceptions. The `is_poisoned`
36+
mechanism catches torn state from signal delivery during
37+
single-owner execution, but it **cannot** detect lost-update races
38+
between concurrent threads. Fuzz testing reliably reproduces both
39+
duplicate message delivery and corrupt (garbage) message bytes with
40+
no exception surfacing.
41+
42+
If you need concurrent access, wrap every call site in an
43+
`asyncio.Lock` or `threading.Lock` at the layer that owns the
44+
socket and decoder.
45+
2746
## Protocol Reference
2847

2948
Based on the [dqlite wire protocol specification](https://canonical.com/dqlite/docs/reference/wire-protocol).

src/dqlitewire/buffer.py

Lines changed: 37 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,16 @@
55

66

77
class WriteBuffer:
8-
"""Buffer for building wire protocol messages."""
8+
"""Buffer for building wire protocol messages.
9+
10+
Thread-safety: NOT thread-safe. A single ``WriteBuffer``
11+
instance must be owned by one thread (or one asyncio
12+
coroutine) at a time. The single-owner contract matches Go's
13+
``driver.Conn`` layer in go-dqlite; see issue 021 for the full
14+
analysis. ``write_padded`` guards against the specific
15+
torn-payload/pad interleave described in issue 034, but that is
16+
a narrow defense, not a general thread-safety guarantee.
17+
"""
918

1019
def __init__(self) -> None:
1120
self._data = bytearray()
@@ -46,6 +55,33 @@ class ReadBuffer:
4655
"""Buffer for reading wire protocol messages from a stream.
4756
4857
Handles partial reads and message framing.
58+
59+
Thread-safety: NOT thread-safe. A single ``ReadBuffer`` instance
60+
must be owned by one thread (or one asyncio coroutine) at a
61+
time. The single-owner contract matches Go's ``driver.Conn``
62+
layer in go-dqlite; see issue 021 for the full analysis.
63+
64+
Concurrent misuse from multiple threads produces **silent data
65+
corruption**, not exceptions. Specifically:
66+
67+
- Two readers can each observe the same ``_pos`` and slice the
68+
same message bytes, returning the same message twice while
69+
silently skipping the next one (lost update on ``_pos +=``).
70+
- Readers can observe torn ``_pos``/``_data`` snapshots across
71+
a concurrent ``_maybe_compact()`` call, producing garbage
72+
bytes that decode to plausible-looking (but wrong) messages.
73+
74+
The ``poison()`` mechanism does NOT and CANNOT detect this
75+
class of failure. Poison is designed to catch single-owner
76+
torn state from interrupted signal delivery (see issues 037,
77+
041, 045). It cannot observe lost-update races or torn reads
78+
that produce valid-looking output. See issue 050 for
79+
reproduction details.
80+
81+
If you need concurrent access to a single wire stream, wrap
82+
every call site in an ``asyncio.Lock`` (for coroutines) or
83+
``threading.Lock`` (for threads) at the layer that owns the
84+
socket and decoder.
4985
"""
5086

5187
DEFAULT_MAX_MESSAGE_SIZE = 64 * 1024 * 1024 # 64 MiB

src/dqlitewire/codec.py

Lines changed: 34 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -100,7 +100,16 @@
100100

101101

102102
class MessageEncoder:
103-
"""Encodes messages to wire protocol format."""
103+
"""Encodes messages to wire protocol format.
104+
105+
Thread-safety: NOT thread-safe. A single ``MessageEncoder``
106+
instance must be owned by one thread (or one asyncio coroutine)
107+
at a time. The encoder is effectively stateless after
108+
construction (it only caches a protocol ``_version``), but the
109+
single-owner contract matches the rest of the package — see
110+
issue 021 and the class docstring on ``MessageDecoder`` /
111+
``ReadBuffer``.
112+
"""
104113

105114
def __init__(self, version: int = PROTOCOL_VERSION) -> None:
106115
"""Initialize encoder.
@@ -125,7 +134,30 @@ def encode_handshake(self) -> bytes:
125134

126135

127136
class MessageDecoder:
128-
"""Decodes messages from wire protocol format."""
137+
"""Decodes messages from wire protocol format.
138+
139+
Thread-safety: NOT thread-safe. A single ``MessageDecoder``
140+
instance must be owned by one thread (or one asyncio coroutine)
141+
at a time. The single-owner contract matches Go's
142+
``driver.Conn`` layer in go-dqlite; see issue 021 for the full
143+
analysis.
144+
145+
Concurrent misuse from multiple threads produces **silent data
146+
corruption**, not exceptions. The underlying ``ReadBuffer``
147+
suffers from lost-update races on ``_pos`` and torn
148+
``_data``/``_pos`` snapshots across ``_maybe_compact()``
149+
calls; these produce valid-looking byte slices that decode
150+
cleanly to wrong (or duplicated) messages. Fuzz testing
151+
(issue 050) confirms this reliably on every trial.
152+
153+
The ``is_poisoned`` flag does NOT detect concurrent misuse.
154+
Poison is designed to catch single-owner torn state from
155+
interrupted signal delivery (see issues 037, 041, 045). It
156+
cannot observe lost-update races or torn reads that produce
157+
valid-looking output. If you need concurrent access, wrap every
158+
call site in an ``asyncio.Lock`` or ``threading.Lock`` at the
159+
layer that owns the socket.
160+
"""
129161

130162
def __init__(self, is_request: bool = False, version: int = PROTOCOL_VERSION) -> None:
131163
"""Initialize decoder.
Lines changed: 80 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,80 @@
1+
"""Regression tests for issue 050: single-owner contract documentation.
2+
3+
Issue 050 documented that the prior thread-safety analysis (issue 021)
4+
+ poison mechanism (issue 026) created a misleading impression that
5+
concurrent misuse would always surface as ``ProtocolError``. Empirical
6+
fuzz testing confirmed it does not — under threaded contention,
7+
``ReadBuffer`` produces duplicate messages and corrupt (garbage) bytes
8+
with ``is_poisoned`` remaining ``False``, because torn reads often
9+
yield valid-looking slices that decode cleanly.
10+
11+
The remediation is documentation, not code: the docstrings on
12+
``ReadBuffer``, ``WriteBuffer``, ``MessageDecoder``, and
13+
``MessageEncoder`` must make the single-owner contract explicit AND
14+
call out that the poison mechanism cannot and does not detect
15+
concurrent misuse. These tests assert the docstrings carry that
16+
warning so future refactors that reformat docstrings don't silently
17+
drop it.
18+
"""
19+
20+
from __future__ import annotations
21+
22+
from dqlitewire.buffer import ReadBuffer, WriteBuffer
23+
from dqlitewire.codec import MessageDecoder, MessageEncoder
24+
25+
26+
def _docstring(obj: object) -> str:
27+
return (obj.__doc__ or "").lower()
28+
29+
30+
class TestThreadSafetyContractDocumented:
31+
"""Assert the thread-safety contract is explicit in class docstrings.
32+
33+
The contract has three parts, and all three must be mentioned:
34+
35+
1. The class is NOT thread-safe / requires a single owner.
36+
2. Concurrent misuse produces SILENT data corruption, not
37+
exceptions.
38+
3. The poison mechanism does NOT detect concurrent misuse.
39+
40+
Any docstring-reformat that drops one of these silently
41+
undoes issue 050's documentation fix.
42+
"""
43+
44+
def test_readbuffer_docstring_warns_about_thread_safety(self) -> None:
45+
doc = _docstring(ReadBuffer)
46+
assert "not thread-safe" in doc or "not thread safe" in doc, (
47+
"ReadBuffer docstring must explicitly say it is not thread-safe"
48+
)
49+
assert "silent" in doc and ("corruption" in doc or "corrupt" in doc), (
50+
"ReadBuffer docstring must warn that concurrent misuse produces silent corruption"
51+
)
52+
assert "poison" in doc, (
53+
"ReadBuffer docstring must explain the poison mechanism's "
54+
"relationship to concurrent misuse"
55+
)
56+
57+
def test_writebuffer_docstring_warns_about_thread_safety(self) -> None:
58+
doc = _docstring(WriteBuffer)
59+
assert "not thread-safe" in doc or "not thread safe" in doc, (
60+
"WriteBuffer docstring must explicitly say it is not thread-safe"
61+
)
62+
63+
def test_message_decoder_docstring_warns_about_thread_safety(self) -> None:
64+
doc = _docstring(MessageDecoder)
65+
assert "not thread-safe" in doc or "not thread safe" in doc, (
66+
"MessageDecoder docstring must explicitly say it is not thread-safe"
67+
)
68+
assert "silent" in doc and ("corruption" in doc or "corrupt" in doc), (
69+
"MessageDecoder docstring must warn that concurrent misuse produces silent corruption"
70+
)
71+
assert "poison" in doc, (
72+
"MessageDecoder docstring must explain the poison mechanism's "
73+
"relationship to concurrent misuse"
74+
)
75+
76+
def test_message_encoder_docstring_warns_about_thread_safety(self) -> None:
77+
doc = _docstring(MessageEncoder)
78+
assert "not thread-safe" in doc or "not thread safe" in doc, (
79+
"MessageEncoder docstring must explicitly say it is not thread-safe"
80+
)

0 commit comments

Comments
 (0)