
feat: add timeout for http and db requests#15

Merged

tac0turtle merged 3 commits into main from pierrick/fix-last-page-query on Feb 25, 2026

feat: add timeout for http and db requests#15
tac0turtle merged 3 commits intomainfrom
pierrick/fix-last-page-query

Conversation


@pthmas pthmas commented Feb 24, 2026

Overview

  • Replace expensive OFFSET scans on the blocks table with a keyset cursor (WHERE number <= cursor) derived from the block number, cutting last-page query time from 30s+ to <15ms.
  • Add a 10s Postgres statement_timeout on every pool connection and an HTTP TimeoutLayer to prevent any slow query from exhausting the connection pool and crashing the API.
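The cursor arithmetic behind the keyset approach can be sketched in plain Rust (a minimal illustration; `keyset_cursor` and its inputs are hypothetical names, and the real handler binds the values through sqlx rather than doing anything with raw SQL strings):

```rust
/// Compute the keyset cursor for a descending block listing.
/// Page 1 starts at the highest block number (total_count - 1);
/// each subsequent page moves the cursor down by `limit`.
fn keyset_cursor(page: u64, limit: u64, total_count: u64) -> i64 {
    let offset = page.saturating_sub(1) * limit;
    total_count as i64 - 1 - offset as i64
}

fn main() {
    let limit: u64 = 100;
    let total: u64 = 1_000_000;

    // First page: cursor is the newest block number.
    assert_eq!(keyset_cursor(1, limit, total), 999_999);

    // Last page: the query `WHERE number <= $2 ORDER BY number DESC LIMIT $1`
    // seeks directly to the cursor via the primary-key index, returning at
    // most `limit` rows without scanning and discarding an OFFSET's worth.
    assert_eq!(keyset_cursor(10_000, limit, total), 99);
    println!("cursor math ok");
}
```

Because the cursor is an indexed seek rather than a scan, the cost of fetching the last page is the same as the first, which is what collapses the 30s+ query to milliseconds.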

Summary by CodeRabbit

  • New Features
    • Requests exceeding 10 seconds now return HTTP 408 (request timeout).
    • Block listing uses keyset-style pagination for faster, more consistent listing on large datasets.
    • Database queries now enforce a 10-second timeout on connections to prevent long-running queries.


coderabbitai bot commented Feb 24, 2026

📝 Walkthrough


Adds request-level and DB-level timeouts and switches block listing to keyset pagination: tower-http gains the timeout feature and a 10s TimeoutLayer is applied; PgPool sets statement_timeout on connect; list_blocks uses a cursor-based WHERE ... ORDER BY ... LIMIT query.

Changes

  • Router & deps (backend/Cargo.toml, backend/crates/atlas-api/src/main.rs): Added the timeout feature to tower-http; wired a 10s TimeoutLayer into the Axum router, returning HTTP 408 on timeout.
  • Database pool & migrations (backend/crates/atlas-common/src/db.rs, backend/crates/atlas-indexer/src/main.rs): Configured after_connect to run SET statement_timeout = '10s'; changed run_migrations to accept a database URL and create a dedicated pool for migrations; updated the indexer to call run_migrations(&config.database_url).
  • Query pagination (backend/crates/atlas-api/src/handlers/blocks.rs): Replaced OFFSET pagination with keyset pagination, using a calculated cursor and WHERE number <= $2 ORDER BY number DESC LIMIT $1 to bind (limit, cursor) instead of (limit, offset).

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant Router as Axum Router (TimeoutLayer)
  participant Handler as Block Handler
  participant DB as Postgres Pool

  Client->>Router: HTTP request
  Router->>Handler: Forward (enforces 10s timeout)
  Handler->>DB: SELECT ... WHERE number <= $cursor LIMIT $limit
  DB-->>Handler: Rows
  Handler-->>Client: HTTP response
sequenceDiagram
  participant Service as App/Indexer
  participant Migrate as run_migrations (creates migration pool)
  participant DB as Postgres

  Service->>Migrate: run_migrations(database_url)
  Migrate->>DB: connect (pool with max_connections=1)
  Migrate->>DB: run migrations
  DB-->>Migrate: migration result
  Migrate-->>Service: done

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰
Time ticks ten, then hops away—
Cursors bound to show the way.
Pools set fast, migrations run,
Requests return before the sun.
Small hops, big speed—code done. ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check — ✅ Passed. Check skipped: CodeRabbit’s high-level summary is enabled.
  • Title Check — ✅ Passed. The title accurately reflects the two main changes: adding an HTTP timeout via TimeoutLayer and a database timeout via statement_timeout. It captures the core objective of preventing slow requests from exhausting resources.
  • Docstring Coverage — ✅ Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
backend/crates/atlas-api/src/main.rs (1)

8-11: Align timeout responses with the standard API error envelope.

All handlers return errors through ApiResult<Json<T>>, which serializes via ApiError::into_response() to produce a consistent JSON envelope: {"error": "message"}. However, TimeoutLayer at lines 214-217 returns a bare 408 status code without a response body, creating inconsistency for clients expecting the standard error format.

Consider wrapping with a custom error handler layer to serialize timeouts into the same envelope as other API errors.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/crates/atlas-api/src/main.rs` around lines 8 - 11, TimeoutLayer
currently returns a bare 408 response that bypasses our ApiResult/ApiError JSON
envelope; modify the timeout handling so timeouts are converted into our
standard ApiError and serialized via ApiError::into_response (or wrap
TimeoutLayer with a custom layer/handler) so controllers still return
ApiResult<Json<T>> style errors. Locate where TimeoutLayer is added (the tower
layer setup using TimeoutLayer) and replace or wrap it with a layer that
intercepts timeout errors, maps them to an ApiError variant (e.g.,
ApiError::timeout or ApiError::new with message "request timed out"), and
produces the same JSON envelope by calling ApiError::into_response before
returning the response. Ensure the handler uses the existing ApiError type and
serialization path so clients always receive {"error": "..."} for timeouts.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/crates/atlas-api/src/handlers/blocks.rs`:
- Around line 22-36: The cursor calculation for keyset pagination uses
Pagination::offset() which multiplies by the raw, unclamped limit causing
skipped/duplicated pages; fix by using the clamped limit when computing the
cursor (e.g. replace cursor = (total_count - 1) - pagination.offset() with a
calculation that uses pagination.limit(), or change Pagination::offset() to
compute (page.saturating_sub(1)) * self.limit.min(100)); update the cursor
assignment in the handler that computes `cursor` (and/or the Pagination::offset
method) so both use the same clamped limit to keep pages consistent with the SQL
LIMIT binding.
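The cursor-limit mismatch the comment describes can be made concrete with a small sketch (the struct and clamp value of 100 mirror the review's description of the handler's Pagination type; the names are illustrative, not the project's actual code):

```rust
/// Hypothetical pagination params mirroring the handler's Pagination struct.
struct Pagination {
    page: u64,
    limit: u64,
}

impl Pagination {
    /// Effective per-page size: the raw client-supplied limit clamped to 100.
    fn limit(&self) -> u64 {
        self.limit.min(100)
    }

    /// The offset (and hence the keyset cursor) must be derived from the
    /// *clamped* limit; deriving it from the raw limit makes the cursor
    /// disagree with the SQL LIMIT binding, so pages skip or overlap.
    fn offset(&self) -> u64 {
        self.page.saturating_sub(1) * self.limit()
    }
}

fn main() {
    // A client asks for limit=200; the server serves 100 rows per page.
    let p = Pagination { page: 3, limit: 200 };
    assert_eq!(p.limit(), 100);
    // Page 3 correctly starts after 2 full pages of 100 rows, not 2 * 200.
    assert_eq!(p.offset(), 200);
    println!("clamped offset ok");
}
```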

In `@backend/crates/atlas-common/src/db.rs`:
- Around line 4-15: The pool created by create_pool uses after_connect to set a
10s statement_timeout which can cause run_migrations to fail on long
index/insert operations; change the implementation so migrations run with a
pool/connection that does not set the 10s timeout or make the timeout
configurable: add an optional parameter (e.g., statement_timeout:
Option<Duration> or a bool like set_statement_timeout) to create_pool and only
install the after_connect timeout when provided, or add a new helper
create_migration_pool that omits the after_connect handler, and update
run_migrations to use the migration-specific pool/connection instead of the
timeout-bearing pool.


ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between af3daf4 and 4fb1182.

📒 Files selected for processing (4)
  • backend/Cargo.toml
  • backend/crates/atlas-api/src/handlers/blocks.rs
  • backend/crates/atlas-api/src/main.rs
  • backend/crates/atlas-common/src/db.rs


@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
backend/crates/atlas-api/src/handlers/blocks.rs (1)

22-26: ⚠️ Potential issue | 🟡 Minor

The past critical cursor-limit mismatch is resolved; residual metadata inconsistency at line 43 remains.

The cursor now correctly derives from pagination.limit() (clamped to 100), which matches the SQL LIMIT $1 binding and closes the previous critical bug. However, PaginatedResponse::new at line 43 still receives pagination.limit (raw, unclamped). If a client supplies limit=200, the response advertises total_pages = ceil(total_count / 200), but each page is actually served with 100 rows — causing the client to believe only half the pages exist.

🛠️ Proposed fix

Pass the effective (clamped) limit to the response constructor so page-count metadata matches query behaviour:

     Ok(Json(PaginatedResponse::new(
         blocks,
         pagination.page,
-        pagination.limit,
+        limit as u64,   // use the same clamped limit that drove the query
         total_count,
     )))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/crates/atlas-api/src/handlers/blocks.rs` around lines 22 - 26,
PaginatedResponse::new is being given the raw, unclamped pagination.limit while
the query uses the clamped value returned by pagination.limit(); update the call
to PaginatedResponse::new to pass the effective clamped limit variable (limit)
instead of the raw pagination.limit so that total_pages and other metadata
reflect the actual number of rows returned; locate where PaginatedResponse::new
is invoked in this handler and replace the pagination.limit argument with the
local limit variable (and ensure the cursor calculation using limit and cursor
remains unchanged).
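The metadata skew is easy to see in isolation (a minimal sketch; `total_pages` and the numbers are illustrative, not the actual PaginatedResponse implementation):

```rust
/// Pages advertised to the client for a given total row count and
/// per-page size, rounded up.
fn total_pages(total_count: u64, per_page: u64) -> u64 {
    total_count.div_ceil(per_page)
}

fn main() {
    // 1,000 rows; the client requested limit=200 but the server clamps to 100.
    let total = 1_000u64;

    // Metadata computed from the clamped limit matches what is served.
    assert_eq!(total_pages(total, 100), 10);

    // Metadata computed from the raw limit advertises only half the pages,
    // so a client paging to "the end" silently misses rows.
    assert_eq!(total_pages(total, 200), 5);
    println!("metadata comparison ok");
}
```

This is why the fix threads the same clamped `limit` into both the SQL binding and the response constructor: one source of truth for page size.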
🧹 Nitpick comments (2)
backend/crates/atlas-common/src/db.rs (2)

22-31: The past timeout-blocking-migrations issue is now properly addressed.

The dedicated no-timeout pool with max_connections = 1 correctly isolates migration execution from the statement_timeout applied to the main pool. The design is sound.

One note: the migration pool is dropped implicitly at the end of the function. In sqlx, dropping a PgPool without calling pool.close().await means in-flight connections are abandoned rather than gracefully closed. For a short-lived single-connection migration pool this is low-risk, but an explicit close is cleaner.

♻️ Suggested explicit pool close
 pub async fn run_migrations(database_url: &str) -> Result<(), sqlx::Error> {
     let pool = PgPoolOptions::new()
         .max_connections(1)
         .connect(database_url)
         .await?;
     sqlx::migrate!("../../migrations")
         .run(&pool)
         .await
         .map_err(|e| sqlx::Error::Migrate(Box::new(e)))?;
+    pool.close().await;
     Ok(())
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/crates/atlas-common/src/db.rs` around lines 22 - 31, The
run_migrations function currently relies on implicit drop of the migration
PgPool (variable pool) which can abandon in-flight connections; after
sqlx::migrate!(...).run(&pool).await completes, explicitly call
pool.close().await to gracefully close the pool before returning from
run_migrations (keep the existing error mapping and Result signature and leave
sqlx::migrate! usage unchanged).

12-12: Make statement_timeout configurable for different environments.

The hardcoded '10s' timeout is appropriate for most cases, but environments with slower storage or heavier concurrent load may need to tune this. Consider adding a statement_timeout parameter (defaulting to 10s for backward compatibility), which aligns with the codebase's existing pattern of environment-based configuration for db parameters.

♻️ Suggested configurable timeout
-pub async fn create_pool(database_url: &str, max_connections: u32) -> Result<PgPool, sqlx::Error> {
+pub async fn create_pool(
+    database_url: &str,
+    max_connections: u32,
+    statement_timeout: Option<std::time::Duration>,
+) -> Result<PgPool, sqlx::Error> {
     PgPoolOptions::new()
         .max_connections(max_connections)
         .after_connect(|conn, _meta| {
             Box::pin(async move {
-                conn.execute("SET statement_timeout = '10s'").await?;
+                if let Some(timeout) = statement_timeout {
+                    let ms = timeout.as_millis();
+                    conn.execute(
+                        format!("SET statement_timeout = '{ms}ms'").as_str()
+                    ).await?;
+                }
                 Ok(())
             })
         })
         .connect(database_url)
         .await
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/crates/atlas-common/src/db.rs` at line 12, Replace the hardcoded
conn.execute("SET statement_timeout = '10s'").await? with a configurable value:
read a new statement_timeout setting (default "10s") from the existing DB/config
env pattern and pass that value into the SET statement using the same
conn.execute call (e.g., format!("SET statement_timeout = '{}'",
statement_timeout)). Add the new setting to the DB config struct or function
signature used when creating the connection (keeping the default for backward
compatibility) and ensure parsing/validation of the value follows the project's
config conventions.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4fb1182 and e4b37f4.

📒 Files selected for processing (4)
  • backend/crates/atlas-api/src/handlers/blocks.rs
  • backend/crates/atlas-api/src/main.rs
  • backend/crates/atlas-common/src/db.rs
  • backend/crates/atlas-indexer/src/main.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/crates/atlas-api/src/main.rs

@tac0turtle tac0turtle merged commit 91b4de8 into main Feb 25, 2026
3 checks passed
@tac0turtle tac0turtle deleted the pierrick/fix-last-page-query branch February 25, 2026 10:08