Commit 07b03aa

Add a bunch of settings to general, document connection recovery (#51)
1 parent ae2a621 commit 07b03aa

3 files changed

Lines changed: 164 additions & 21 deletions

docs/configuration/pgdog.toml/general.md

Lines changed: 55 additions & 18 deletions
Original file line number | Diff line number | Diff line change
@@ -141,6 +141,28 @@ Delay running idle healthchecks at PgDog startup to give databases (and pools) t
141141

142142
Default: **`5_000`** (5s)
143143

144+
### `connection_recovery`
145+
146+
Controls whether server connections are recovered or dropped when a client disconnects abruptly.
147+
148+
Available options:
149+
150+
- `recover` (default)
151+
- `rollback_only`
152+
- `drop`
153+
154+
`rollback_only` issues a `ROLLBACK` for any unfinished transaction but doesn't attempt to resynchronize the connection. `drop` closes the connection without attempting recovery.
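
As a minimal sketch of a non-default value, the fragment below keeps the automatic rollback of abandoned transactions but skips resynchronization. It uses only the option names listed above:

```toml
[general]
# Roll back abandoned transactions, but don't try to
# resynchronize connections.
connection_recovery = "rollback_only"
```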
155+
156+
### `client_connection_recovery`
157+
158+
Controls whether to disconnect clients upon encountering connection pool errors (e.g., checkout timeout). Set this to `drop` if your clients are async / use pipelining mode.
159+
160+
Available options:
161+
162+
- `recover` (default)
163+
- `drop`
164+
165+
144166
## Timeouts
145167

146168
These settings control how long PgDog waits for maintenance tasks to complete. These timeouts make sure PgDog can recover
@@ -261,21 +283,6 @@ Enable load balancer [HTTP health checks](../../features/load-balancer/healthche
261283

262284
Default: **none** (disabled)
263285

264-
## Service discovery
265-
266-
### `broadcast_address`
267-
268-
Send multicast packets to this address on the local network. Configuring this setting enables
269-
mutual service discovery. Instances of PgDog running on the same network will be able to see
270-
each other.
271-
272-
Default: **none** (disabled)
273-
274-
### `broadcast_port`
275-
276-
The port used for sending and receiving broadcast messages.
277-
278-
Default: **`6433`**
279286

280287
## Monitoring
281288

@@ -410,11 +417,41 @@ Available options:
410417

411418
Default: **`auto`**
412419

413-
### `system_catalogs_omnisharded`
420+
### `system_catalogs`
414421

415-
Enables sticky routing for system catalog tables and treats them as [omnisharded](../../features/sharding/omnishards.md) tables. This makes tools like `psql` work out of the box.
422+
Changes how system catalog tables (like `pg_database`, `pg_class`, etc.) are treated by the query router. Default behavior is to assume they are the same on all shards and send queries referencing them to a random shard. This makes tools like `psql` work out of the box.
416423

417-
Default: **`true`** (enabled)
424+
Available options:
425+
426+
- `omnisharded`
427+
- `omnisharded_sticky` (default)
428+
- `sharded`
429+
430+
Default: **`omnisharded_sticky`**
431+
432+
### `omnisharded_sticky`
433+
434+
If turned on, queries touching [omnisharded](../../features/sharding/omnishards.md) tables are always sent to the same shard for any given client connection. The shard is determined at random on connection creation.
435+
436+
Default: **`false`**
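
For example, to pin each client connection to a single shard for queries touching omnisharded tables, the setting could be turned on as shown below. This is a sketch that assumes the value is a boolean, as the `false` default implies:

```toml
[general]
# Assumed boolean; the documented default is false.
omnisharded_sticky = true
```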
437+
438+
### `resharding_copy_format`
439+
440+
Which format to use for `COPY` statements during [resharding](../../features/sharding/resharding/index.md).
441+
442+
Available options:
443+
444+
- `binary` (default)
445+
- `text`
446+
447+
`text` format is required when migrating from `INTEGER` to `BIGINT` primary keys during resharding.
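
For instance, a resharding run that also migrates primary keys from `INTEGER` to `BIGINT` would switch the format to `text`, as in this minimal fragment:

```toml
[general]
# Use text COPY instead of the default binary format.
resharding_copy_format = "text"
```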
448+
449+
### `reload_schema_on_ddl`
450+
451+
!!! warning
452+
This setting is intended for local development / CI / single node PgDog deployments.
453+
454+
Automatically reload the schema cache used by PgDog to route queries upon detecting DDL statements (e.g., `CREATE TABLE`, `ALTER TABLE`, etc.).
418455

419456
## Logging
420457

Lines changed: 106 additions & 0 deletions
@@ -0,0 +1,106 @@
1+
---
2+
icon: material/connection
3+
---
4+
5+
# Connection recovery
6+
7+
PostgreSQL database connections are expensive to create, so PgDog does its best not to close them unless absolutely necessary. If a client disconnects before fully processing a query response, PgDog will attempt to preserve the connection using several recovery steps.
8+
9+
## Abandoned transactions
10+
11+
If a client disconnects abruptly while inside a transaction, the transaction is considered abandoned and PgDog will automatically execute a `ROLLBACK`, making sure none of its changes are persisted in the database.
12+
13+
This is a common occurrence if there is a bug that causes the application to crash while executing multiple statements inside a manually started transaction, for example:
14+
15+
=== "Rails"
16+
```ruby
17+
ActiveRecord::Base.transaction do
18+
user = User.find(5)
19+
# crash happens here.
20+
end
21+
```
22+
=== "SQLAlchemy"
23+
```python
24+
with session.begin():
25+
user = session.get(User, 5)
26+
# crash happens here.
27+
```
28+
=== "Go"
29+
```go
30+
tx, _ := db.Begin()
31+
row := tx.QueryRow("SELECT * FROM users WHERE id = $1", 5)
32+
// crash happens here.
33+
```
34+
35+
### Connection storms
36+
37+
By preserving connections, PgDog protects the database against connection storms. Other connection poolers like PgBouncer close server connections without attempting any recovery.
38+
39+
When the application restarts, the pooler must recreate all of these connections at once, causing thousands of server connections to be opened and closed in rapid succession. This leads to unnecessary contention on database resources and can cause 100% CPU spikes on the database.
40+
41+
## Abandoned queries
42+
43+
A client can abruptly disconnect while receiving query response data from the server. This can happen due to out-of-memory errors or hardware failure, for example:
44+
45+
=== "Rails"
46+
```ruby
47+
orders = Order.where(user_id: 5)
48+
# ^ crash happens inside `pg`,
49+
# while receiving multiple rows
50+
```
51+
=== "SQLAlchemy"
52+
```python
53+
orders = session.execute(
54+
select(Order).where(Order.user_id == 5)
55+
).all()
56+
# ^ crash happens while receiving multiple rows
57+
```
58+
=== "Go"
59+
```go
60+
rows, _ := db.Query("SELECT * FROM orders WHERE user_id = $1", 5)
61+
for rows.Next() {
62+
// crash happens here while iterating over rows
63+
}
64+
```
65+
66+
PgDog will detect this and drain the affected server connections, restoring them to their normal state before returning them to the connection pool. The drain mechanism works by receiving and discarding `DataRow` messages and sending [`Sync`](https://www.postgresql.org/docs/current/protocol-message-formats.html#PROTOCOL-MESSAGE-FORMATS-SYNC) to the server to resynchronize the extended protocol state.
67+
68+
Just like [abandoned transactions](#abandoned-transactions), this protects PostgreSQL databases from connection storms caused by unreliable clients. If the client was executing a transaction, it will be rolled back as well.
69+
70+
### Configuration
71+
72+
Connection recovery is an optional feature, enabled by default. You can change how it behaves through configuration:
73+
74+
```toml
75+
[general]
76+
connection_recovery = "recover"
77+
```
78+
79+
| Configuration value | Description |
80+
|-|-|
81+
| `recover` | Attempt full connection recovery, including rollback and resynchronization. This is the default. |
82+
| `rollback_only` | Roll back abandoned transactions but drop the connection if a query was abandoned mid-response. |
83+
| `drop` | Disable connection recovery and close the server connection (identical to PgBouncer). |
84+
85+
To make sure abandoned server connections don't block normal operations, PgDog supports a configurable timeout on the recovery operation. If connection recovery doesn't complete in time, the connection will be closed:
86+
87+
```toml
88+
[general]
89+
rollback_timeout = 5_000
90+
```
91+
92+
## Client connections
93+
94+
Just like server connections, PgDog can maintain client connections (application --> PgDog) during incidents. This helps preserve application-side connection pools and avoids re-creating thousands of connections unnecessarily.
95+
96+
This feature is enabled by default, but some applications don't behave well when their queries return errors instead of results. For that reason, it is configurable and can be disabled:
97+
98+
```toml
99+
[general]
100+
client_connection_recovery = "drop"
101+
```
102+
103+
| Configuration value | Description |
104+
|-|-|
105+
| `recover` | Attempt to keep client connections open after database-related errors, like `checkout timeout`. |
106+
| `drop` | Disable connection recovery and close the client connection (identical to PgBouncer). |

docs/features/sharding/omnishards.md

Lines changed: 3 additions & 3 deletions
@@ -106,11 +106,11 @@ tables = [
106106
]
107107
```
108108

109-
This is configurable with the `system_catalogs_omnisharded` setting in [`pgdog.toml`](../../configuration/pgdog.toml/general.md#system_catalogs_omnisharded):
109+
This is configurable with the `system_catalogs` setting in [`pgdog.toml`](../../configuration/pgdog.toml/general.md#system_catalogs):
110110

111111
```toml
112112
[general]
113-
system_catalogs_omnisharded = true
113+
system_catalogs = "omnisharded_sticky"
114114
```
115115

116-
If enabled (it is by default), commands like `\d`, `\d+` and others sent from `psql` will start to return correct results.
116+
With the default `omnisharded_sticky` value, commands like `\d`, `\d+` and others sent from `psql` will return correct results.
