fix(miner): persist intent before broadcast to prevent dest double-send (#296) #299

Open
RUNECTZ33 wants to merge 1 commit into entrius:test from RUNECTZ33:fix/296-prevent-double-broadcast

@RUNECTZ33

Summary

Closes #296. S0 race — preventable on-chain double-spend.

process_swap broadcasts the destination tx via send_dest_funds(), and only afterwards writes the SentSwap record to the in-memory dict and calls save_sent_cache():

if sent is None:
    send_result = self.send_dest_funds(swap, user_receives_amount)   # ← on-chain side effect
    if not send_result:
        return False
    to_tx_hash, to_tx_block = send_result
    sent = SentSwap(to_tx_hash=to_tx_hash, ...)
    self.sent[swap.id] = sent
    self.save_sent_cache()                                            # ← persist

If the miner is killed between the broadcast returning and the persist completing (SIGKILL/OOM/hardware fault/container shutdown), the destination tx is on-chain but the cache has no record. On restart, sent is None, so process_swap broadcasts a second destination tx for the same swap. The BTC provider's in-process broadcasted_txids set is also empty after restart, so its dedup is gone.
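The race described above can be reproduced in a few lines. This is a toy model, not the project's code: `ToyMiner`, the `disk` dict (standing in for the on-disk sent cache), the `chain` list (standing in for the destination chain), and the `crash` flag are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class SentSwap:
    to_tx_hash: str
    to_tx_block: int
    marked_fulfilled: bool = False


class CrashAfterBroadcast(Exception):
    """Stands in for SIGKILL/OOM landing right after the broadcast returns."""


class ToyMiner:
    def __init__(self, disk, chain):
        self.disk = disk        # hypothetical stand-in for the persisted cache
        self.chain = chain      # hypothetical stand-in for the destination chain
        self.sent = dict(disk)  # in-memory dict is rebuilt from disk on each start

    def send_dest_funds(self, swap_id):
        self.chain.append(swap_id)  # irreversible side effect
        return f"tx-{len(self.chain)}", 100

    def process_swap(self, swap_id, crash=False):
        if self.sent.get(swap_id) is None:
            tx_hash, tx_block = self.send_dest_funds(swap_id)
            if crash:
                raise CrashAfterBroadcast()  # killed before the persist below
            self.sent[swap_id] = SentSwap(tx_hash, tx_block)
            self.disk[swap_id] = self.sent[swap_id]  # plays the role of save_sent_cache()
        return True


disk, chain = {}, []
try:
    ToyMiner(disk, chain).process_swap("swap-1", crash=True)
except CrashAfterBroadcast:
    pass
ToyMiner(disk, chain).process_swap("swap-1")  # restart: cache is empty, so it sends again
print(chain)
```

Under these assumptions the second `ToyMiner` sees no cache entry and appends a second send for the same swap to `chain`.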

Fix: persist intent before side effect

Write a pending sentinel (SentSwap with empty to_tx_hash) to the cache before calling send_dest_funds. Three outcomes are now safe:

| Crash window | Behavior |
|---|---|
| Between sentinel-persist and broadcast | Sentinel exists, no tx broadcast → restart sees pending, refuses to process, logs critical. No double-broadcast. |
| After broadcast but before post-broadcast cache update | Sentinel exists, real tx is on-chain → restart sees pending, refuses to process, logs critical. No double-broadcast. |
| Broadcast fails cleanly (send_dest_funds returns falsy) | Sentinel is dropped before returning → next pass can retry. No stuck pending entry. |

Trade-off acknowledged: if the crash window is hit on a real swap, that single swap's fulfillment stalls until the operator reconciles it (look up the tx by scanning the dest chain for miner-address sends matching the swap, then update or delete the cache entry).

That trade-off is the right one: a stuck swap can be recovered; a double-spend cannot.
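The three outcomes can be checked against a small simulation of the persist-before-broadcast ordering. This is a sketch under stated assumptions, not the code in allways/miner/fulfillment.py: `ToyMiner`, `SimulatedCrash`, `run`, the `crash_at` parameter, and the dict-backed `disk` and list-backed `chain` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SentSwap:
    to_tx_hash: str
    to_tx_block: int
    marked_fulfilled: bool = False


class SimulatedCrash(Exception):
    """Stands in for the process being killed at a chosen point."""


class ToyMiner:
    def __init__(self, disk, chain, broadcast_ok=True):
        self.disk, self.chain, self.broadcast_ok = disk, chain, broadcast_ok
        self.sent = dict(disk)  # reloaded from the persisted cache on each start

    def save_sent_cache(self):
        self.disk.clear()
        self.disk.update(self.sent)

    def send_dest_funds(self, swap_id):
        if not self.broadcast_ok:
            return None  # clean failure, nothing reached the chain
        self.chain.append(swap_id)
        return f"tx-{len(self.chain)}", 100

    def process_swap(self, swap_id, crash_at=None):
        sent = self.sent.get(swap_id)
        if sent is not None and sent.to_tx_hash == "":
            return False  # pending sentinel: refuse until reconciled
        if sent is None:
            self.sent[swap_id] = SentSwap("", 0)  # persist intent first
            self.save_sent_cache()
            if crash_at == "before_broadcast":
                raise SimulatedCrash()
            result = self.send_dest_funds(swap_id)
            if crash_at == "after_broadcast":
                raise SimulatedCrash()
            if not result:
                self.sent.pop(swap_id, None)  # clean failure: drop the sentinel
                self.save_sent_cache()
                return False
            self.sent[swap_id] = SentSwap(*result)
            self.save_sent_cache()
        return True


def run(crash_at=None, broadcast_ok=True):
    """First pass with the given fault, then a restart with a healthy broadcast."""
    disk, chain = {}, []
    try:
        ToyMiner(disk, chain, broadcast_ok).process_swap("swap-1", crash_at)
    except SimulatedCrash:
        pass
    ToyMiner(disk, chain).process_swap("swap-1")
    return len(chain), disk.get("swap-1")
```

In this model, `run("before_broadcast")` and `run("after_broadcast")` both leave the sentinel in place and never produce a second send, while `run(broadcast_ok=False)` clears the sentinel so the restart pass sends exactly once.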

Change

allways/miner/fulfillment.py: +43 / -1. No SentSwap dataclass change (empty to_tx_hash is the sentinel — backward compatible with the existing 3-field cache schema). No new imports.

+ # v2 (#296): persist a pending sentinel BEFORE broadcasting
+ pending = SentSwap(to_tx_hash='', to_tx_block=0, marked_fulfilled=False)
+ self.sent[swap.id] = pending
+ self.save_sent_cache()
+
  send_result = self.send_dest_funds(swap, user_receives_amount)
  if not send_result:
+     # broadcast failed cleanly — drop the sentinel so a retry can re-attempt
+     self.sent.pop(swap.id, None)
+     self.save_sent_cache()
      ...

Plus load_sent_cache scans for empty-to_tx_hash entries on startup and logs bt.logging.critical(...) with the swap IDs + reconciliation instructions.
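A minimal sketch of what such a startup scan could look like. It assumes the cache is a JSON object keyed by swap ID with the 3-field array as each value (the keying is an assumption), uses the standard `logging` module as a stand-in for `bt.logging`, and `find_pending_sentinels` is a hypothetical helper name rather than a function in the PR.

```python
import json
import logging

logger = logging.getLogger("miner.fulfillment")  # stand-in for bt.logging


def find_pending_sentinels(cache_json: str) -> list[str]:
    """Return swap IDs whose cache entry has an empty to_tx_hash."""
    # Assumed shape: {swap_id: [to_tx_hash, to_tx_block, marked_fulfilled]}
    entries = json.loads(cache_json)
    pending = [
        swap_id
        for swap_id, (to_tx_hash, _block, _fulfilled) in entries.items()
        if to_tx_hash == ""
    ]
    if pending:
        logger.critical(
            "Pending sends found for swaps %s: scan the dest chain for txs from "
            "the miner address matching each swap, then update or delete the entry",
            pending,
        )
    return pending


raw = '{"7": ["", 0, false], "8": ["abc123", 812345, true]}'
print(find_pending_sentinels(raw))
```

Only the entry with the empty hash is reported; completed entries pass through untouched.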

Plus process_swap checks for the sentinel up-front and refuses to operate, surfacing a bt.logging.warning so the operator sees the same recovery hint each retry pass.
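The up-front check amounts to distinguishing three cache states for a swap. The sketch below is illustrative only: `next_action` and its return labels are hypothetical, the entry is modeled as a plain tuple, and the standard `logging` module stands in for `bt.logging`.

```python
import logging
from typing import Optional, Tuple

logger = logging.getLogger("miner.fulfillment")  # stand-in for bt.logging

CacheEntry = Tuple[str, int, bool]  # (to_tx_hash, to_tx_block, marked_fulfilled)


def next_action(swap_id: str, entry: Optional[CacheEntry]) -> str:
    """Decide what a processing pass may do with a swap, given its cache entry."""
    if entry is None:
        return "broadcast"  # no record at all: safe to persist a sentinel and send
    to_tx_hash, _block, _fulfilled = entry
    if to_tx_hash == "":
        # Sentinel left by an interrupted send: never re-broadcast automatically.
        logger.warning("Swap %s has a pending send; reconcile before retrying", swap_id)
        return "blocked"
    return "already_sent"


print(next_action("7", None), next_action("7", ("", 0, False)), next_action("7", ("abc123", 812345, False)))
```

Logging on every pass, rather than only at startup, keeps the recovery hint visible for as long as the sentinel remains in the cache.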

Test plan

  • AST parses cleanly: python3 -c "import ast; ast.parse(open('allways/miner/fulfillment.py').read())"
  • Verified the JSON cache schema is unchanged — the existing 3-field [to_tx_hash, to_tx_block, marked_fulfilled] array works for sentinels (empty string to_tx_hash, zero block) and real entries identically.
  • Verified cleanup_stale_sends still drops sentinels alongside completed entries when the swap leaves the active set.
  • Manual trace through the three crash windows above against current process_swap + load_sent_cache flow.
  • Recommended follow-up: a regression test mirroring the issue's repro snippet (test_crash_between_send_and_cache_causes_double_broadcast) — happy to add if reviewer prefers.
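The schema claim in the test plan can be illustrated with a round trip through JSON. This is a sketch that assumes the cache is serialized as an object mapping swap ID to the 3-field array; `dump_cache` and `load_cache` are hypothetical helpers, not the PR's save_sent_cache / load_sent_cache.

```python
import json
from dataclasses import astuple, dataclass


@dataclass
class SentSwap:
    to_tx_hash: str
    to_tx_block: int
    marked_fulfilled: bool


def dump_cache(sent: dict) -> str:
    # Each entry becomes [to_tx_hash, to_tx_block, marked_fulfilled]
    return json.dumps({swap_id: list(astuple(s)) for swap_id, s in sent.items()})


def load_cache(raw: str) -> dict:
    return {swap_id: SentSwap(*fields) for swap_id, fields in json.loads(raw).items()}


cache = {
    "7": SentSwap("", 0, False),             # pending sentinel
    "8": SentSwap("abc123", 812345, True),   # completed send
}
restored = load_cache(dump_cache(cache))
print(restored == cache)
```

Because the sentinel is just an ordinary entry with an empty string and a zero block, no schema version or extra field is needed to tell the two kinds of entry apart.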

…nd (entrius#296)

S0 race: process_swap broadcasts the destination tx via send_dest_funds, then
writes the SentSwap record to the in-memory dict + save_sent_cache(). If the
miner is killed (SIGKILL/OOM/hardware fault/container shutdown) between the
broadcast returning and the persist completing, the destination tx is on-chain
but the cache has no record. On restart, sent is None again, and process_swap
broadcasts a SECOND destination tx for the same swap.

The BTC provider's in-process broadcasted_txids set is also empty after
restart, so its dedup is gone.

Fix: persist a pending sentinel (SentSwap with empty to_tx_hash) BEFORE
calling send_dest_funds. Three outcomes:

  1. Crash between sentinel-persist and broadcast — restart sees the pending
     sentinel, refuses to process this swap, surfaces critical log.
  2. Crash after broadcast but before post-broadcast cache update — same
     outcome: pending sentinel triggers restart-side refusal + critical log,
     no double-broadcast.
  3. Broadcast fails cleanly (send_dest_funds returns falsy) — we drop the
     sentinel before returning, so the next pass can retry without a stuck
     pending entry.

On restart, load_sent_cache scans for entries with empty to_tx_hash. If any
exist, log critical with the swap IDs and tell the operator to scan the dest
chain for tx FROM the miner address matching each swap, then update or
delete the cache entry. process_swap refuses to operate on pending sentinels
to ensure the operator sees the warning before any further send.

Trade-off: if the crash window is hit on a real swap, fulfillment for that
swap stalls until manual reconciliation. That's strictly better than
double-broadcasting funds: a stuck swap can be recovered (look up tx, update
cache); a double-spend cannot.

Scope: allways/miner/fulfillment.py, +43/-1. No SentSwap dataclass change
(empty string in to_tx_hash is the sentinel — backward compatible with the
existing 3-field cache schema). No new imports.

Closes entrius#296
xiao-xiao-mao (bot) added the bug label on May 9, 2026

Development

Successfully merging this pull request may close these issues.

[Bug] miner: double-broadcast of dest funds on crash between broadcast and cache write
