Skip to content

perf: optimize timestamp truncation for sub-day formats #3477

@andygrove

Description

@andygrove

Description

Timestamp truncation (date_trunc / TruncTimestamp) for sub-day formats (HOUR, MINUTE, SECOND, MILLISECOND, MICROSECOND) can be optimized using simple modular arithmetic instead of the current approach that converts to/from DateTime<Tz> via chrono.

Since timestamps are stored internally as microseconds since Unix epoch (UTC), truncation to e.g. SECOND can be computed as:

micros - (micros % MICROS_PER_SECOND)

This avoids the overhead of timezone-aware DateTime construction and is significantly faster.

Important: timezone offset considerations

As noted by @parthchandra in #2996 (comment), the arithmetic fast path is not timezone-independent for HOUR (and theoretically MINUTE).

The raw arithmetic truncates the UTC hour, not the local hour. For timezones with non-whole-hour offsets, this produces incorrect results. For example:

  • Asia/Kathmandu (UTC+5:45): truncating 14:00 UTC to HOUR gives 14:00 UTC (= 19:45 local), but the correct local-hour truncation would be 19:00 local = 13:15 UTC
  • Asia/Kolkata (UTC+5:30), Australia/Eucla (UTC+8:45), and others with :30/:45 offsets have the same issue

Safe fast-path formats

  • MICROSECOND — always safe (no-op for microsecond timestamps)
  • MILLISECOND — always safe (sub-second, no timezone has sub-second offset)
  • SECOND — always safe (no timezone has sub-second offset)
  • MINUTE — safe in practice (no modern timezone has sub-minute offset, though some historical ones did)
  • HOURNOT safe for non-whole-hour timezone offsets

Suggested approach

One of:

  1. Only apply arithmetic fast path for SECOND, MILLISECOND, MICROSECOND — safest, still a meaningful optimization
  2. Apply fast path for HOUR/MINUTE only when the timezone offset is a whole-hour/whole-minute multiple — check the offset at runtime and fall back to the slow path otherwise
  3. Adjust arithmetic to account for timezone offset — add offset before truncation, subtract after. This requires resolving the timezone offset (which may vary due to DST), so it may negate the performance benefit for HOUR

The prior PR (#2996) also included optimizations for date_trunc (Date32 truncation) that work directly with days-since-epoch instead of converting through NaiveDateTime. Those optimizations are not affected by this timezone issue and could be submitted separately.

Benchmark results (from prior PR #2996)

The arithmetic approach showed significant speedup for sub-day formats:

Timestamp Truncate - HOUR:        ~2-3x faster
Timestamp Truncate - MINUTE:      ~2-3x faster  
Timestamp Truncate - SECOND:      ~2-3x faster
Timestamp Truncate - MILLISECOND: ~2-3x faster
Timestamp Truncate - MICROSECOND: ~10x faster (no-op)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions