Summary
When a connection executes a DDL statement (CREATE, DROP, ALTER) or triggers a ROLLBACK, the prepared-statement LRU cache enters a split-brain state. Once the cache subsequently fills back up to max_prepared_statements (default: 1000), the next query crashes with a KeyError inside the eviction logic. The crash also corrupts the connection so that the following rollback() call fails too, leaving the connection permanently broken.
This surfaces in some dbt projects because dbt runs all of a session's queries on the same connection: once at least one DDL statement has executed, an arbitrary number of subsequent queries accumulate on that poisoned connection.
Environment
- redshift-connector version: 2.1.12
- Python: 3.x
- Cluster: Redshift (provisioned)
Steps to reproduce
Minimal inline repro:
```python
import redshift_connector

conn = redshift_connector.connect(host=..., database=..., user=..., password=...)
conn.autocommit = True
cur = conn.cursor()

# Step 1: DDL poisons the LRU tracker
cur.execute("CREATE TABLE IF NOT EXISTS repro_test (id INT)")

# Step 2: fill the prepared-statement cache to the limit (default 1000)
for i in range(1000):
    cur.execute(f"SELECT {i} AS n")
    cur.fetchall()

# Step 3: one more new query triggers eviction → KeyError
cur.execute("SELECT 'boom' AS x")  # ← crashes here
```
Expected: query succeeds.
Actual:
```
File "redshift_connector/core.py", line 1849, in execute
    statements_to_close.append(cache["ps"][oldest_key]["statement_name_bin"])
KeyError: ('CREATE TABLE IF NOT EXISTS repro_test (id INT)', ())
```
Root cause
There are two parallel cache structures kept in sync for LRU eviction:
```
cache["ps"] – dict : key → prepared-statement metadata
cache["statement_dict"] – OrderedDict : key → None (LRU order)
```
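The split-brain state can be demonstrated in isolation with a toy model of these two structures (the names mirror the connector's cache keys, but this is an illustrative sketch, not the connector's actual code):

```python
from collections import OrderedDict

# Toy model of the connector's two parallel cache structures
cache = {
    "ps": {},                        # key → prepared-statement metadata
    "statement_dict": OrderedDict(), # key → None, insertion order = LRU order
}

def remember(key):
    # What a normal execute() does: record the statement in both structures
    cache["ps"][key] = {"statement_name_bin": b"stmt"}
    cache["statement_dict"][key] = None

remember(("SELECT 1", ()))
remember(("SELECT 2", ()))
assert len(cache["ps"]) == len(cache["statement_dict"]) == 2  # in sync

# What handle_COMMAND_COMPLETE does on DDL/ROLLBACK: clears only "ps"
cache["ps"].clear()
print(len(cache["ps"]), len(cache["statement_dict"]))  # 0 2 — split brain
```

Any invariant that assumes the two structures hold the same keys is now broken.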
Bug 1 — handle_COMMAND_COMPLETE (core.py:2029–2034) clears ps on DDL/ROLLBACK but leaves statement_dict intact:
```python
if command in (b"ALTER", b"CREATE", b"DROP", b"ROLLBACK"):
for ps in pcache["ps"].values():
self.close_prepared_statement(ps["statement_name_bin"])
pcache["ps"].clear() # ← ps emptied
# statement_dict is NOT cleared ← stale keys remain
```
Bug 2 — The consistency-repair check (core.py:1842–1844) only adds entries to statement_dict; it never removes stale ones:
```python
if len(cache["ps"]) != len(cache["statement_dict"]):
for existing_key in cache["ps"]:
cache["statement_dict"][existing_key] = None # adds, never removes
```
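A quick check (again with a toy dict/OrderedDict standing in for the real cache) shows why this repair loop can never restore parity: the stale key survives, and it remains the oldest entry, so it is exactly what the next eviction will pop:

```python
from collections import OrderedDict

cache = {
    "ps": {"a": {}, "b": {}},                   # 2 live statements
    "statement_dict": OrderedDict({"x": None}), # 1 stale pre-DDL key
}

# The repair check from core.py:1842-1844
if len(cache["ps"]) != len(cache["statement_dict"]):
    for existing_key in cache["ps"]:
        cache["statement_dict"][existing_key] = None  # adds, never removes

# The stale key "x" is still present — and still oldest in LRU order
print(list(cache["statement_dict"]))  # ['x', 'a', 'b']
```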
Bug 3 (crash site) — Eviction blindly assumes the oldest statement_dict key exists in ps (core.py:1847–1850):
```python
if len(cache["ps"]) >= self.max_prepared_statements:
oldest_key, _ = cache["statement_dict"].popitem(last=False)
statements_to_close.append(cache["ps"][oldest_key]["statement_name_bin"])
# ↑ KeyError if oldest_key is stale
```
Failure sequence
- Connection runs N unique queries → `ps` and `statement_dict` each have N entries
- A DDL or ROLLBACK executes → `ps` is cleared to 0; `statement_dict` still has N stale keys
- N more unique queries run → `ps` refills to N; `statement_dict` now has 2N entries
- One more new query: eviction fires (`len(ps) >= 1000`), pops the oldest `statement_dict` entry (a stale pre-DDL key), and crashes with `KeyError`
- The crashed query left a `PARSE` message partially sent; the subsequent `rollback()` fails with `ProgrammingError: prepared statement "..." already exists`, making the connection permanently unusable
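The sequence above can be replayed offline with a small cache limit. This is a simulation of the cache bookkeeping only (the eviction and clear steps are modeled after the cited core.py logic; the small MAX_PS and key names are illustrative):

```python
from collections import OrderedDict

MAX_PS = 3  # stand-in for max_prepared_statements (default 1000)
cache = {"ps": {}, "statement_dict": OrderedDict()}

def run(key):
    # Eviction logic modeled on core.py:1847-1850
    if len(cache["ps"]) >= MAX_PS:
        oldest_key, _ = cache["statement_dict"].popitem(last=False)
        del cache["ps"][oldest_key]  # ← KeyError when oldest_key is stale
    cache["ps"][key] = {}
    cache["statement_dict"][key] = None

run(("CREATE TABLE repro (id INT)", ()))  # DDL is cached like any statement...
cache["ps"].clear()                       # ...then COMMAND_COMPLETE wipes only ps

for i in range(MAX_PS):                   # refill ps to the limit
    run((f"SELECT {i}", ()))

err = None
try:
    run(("SELECT 'boom'", ()))            # eviction pops the stale DDL key
except KeyError as e:
    err = e.args[0]
print("KeyError:", err)  # KeyError: ('CREATE TABLE repro (id INT)', ())
```

The simulated crash reproduces the exact shape of the reported traceback: the KeyError carries the key of the pre-DDL statement.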
Observed diagnostic output
```
ps size after DDL: 0 statement_dict size: 502 ← 502 stale entries remain
query 1000: ps=1000, sd=1502 ← divergence grows with each query
Step 4: KeyError: ('SELECT 1 AS query_num', ()) ← crash at eviction
```
Note: in a realistic dbt workflow the failure occurs at query ~999 rather than 1001, because DDL setup queries (DROP TABLE, CREATE TABLE) run before the main loop and pre-poison statement_dict with stale keys, reducing the effective headroom.
Proposed fix
The minimal fix is to also clear statement_dict wherever ps is cleared:
```python
# core.py, ~line 2034
pcache["ps"].clear()
pcache["statement_dict"].clear()  # ← add this line
```
A more defensive fix also guards the eviction against stale keys:
```python
# core.py, ~line 1847
if len(cache["ps"]) >= self.max_prepared_statements:
    while cache["statement_dict"]:
        oldest_key, _ = cache["statement_dict"].popitem(last=False)
        if oldest_key in cache["ps"]:
            statements_to_close.append(cache["ps"][oldest_key]["statement_name_bin"])
            del cache["ps"][oldest_key]
            break
        # else: stale key, discard and keep looking
```
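Replaying a poisoned cache against this guarded eviction shows the stale key being discarded instead of crashing (a standalone sketch of the guard, with illustrative key names):

```python
from collections import OrderedDict

# A poisoned cache: "stale" is the oldest LRU key but has no ps entry
cache = {
    "ps": {"live": {"statement_name_bin": b"stmt_live"}},
    "statement_dict": OrderedDict([("stale", None), ("live", None)]),
}
statements_to_close = []

# Guarded eviction: skip statement_dict keys with no matching ps entry
while cache["statement_dict"]:
    oldest_key, _ = cache["statement_dict"].popitem(last=False)
    if oldest_key in cache["ps"]:
        statements_to_close.append(cache["ps"][oldest_key]["statement_name_bin"])
        del cache["ps"][oldest_key]
        break
    # else: stale key, discard and keep looking

print(statements_to_close)  # [b'stmt_live'] — the stale key was silently dropped
```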
Both fixes together fully resolve the issue.