fix: add timezone and special formats support for cast string to timestamp #3730

Open
parthchandra wants to merge 4 commits into apache:main from parthchandra:cast-timestamp-tz

@parthchandra
Contributor

No description provided.

@parthchandra parthchandra marked this pull request as draft March 18, 2026 22:03
@parthchandra parthchandra requested a review from andygrove March 19, 2026 17:07
@parthchandra parthchandra marked this pull request as ready for review March 19, 2026 17:07
@parthchandra
Contributor Author

This builds on top of #3656. I will rebase this once #3656 is merged.

@parthchandra
Contributor Author

There is one more follow-up PR after this one, to enable this for all cases and mark this cast as compatible.

@parthchandra parthchandra changed the title from "fix: [WIP] add timezone and special formats support for cast string to timestamp" to "fix: add timezone and special formats support for cast string to timestamp" Mar 19, 2026
// Year only: 4-7 digits, optionally negative
(
Regex::new(r"^\d{4,5}$").unwrap(),
Regex::new(r"^-?\d{4,7}$").unwrap(),
Member

Do these regex expressions get compiled for every invocation of timestamp_parser_with_tz?
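
For context on the question above, here is a minimal sketch of the compile-once pattern, using only the standard library. `YearPattern`, `year_pattern`, `is_year_only`, and `pattern_build_count` are hypothetical names, and `YearPattern` is an ASCII-only stand-in for a compiled `^-?\d{4,7}$` regex. In real code the cached value would presumably be a `regex::Regex`; this is not the PR's implementation.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how many times the stand-in pattern has been built.
static BUILD_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a compiled `^-?\d{4,7}$` pattern (ASCII digits only).
struct YearPattern {
    min_digits: usize,
    max_digits: usize,
}

impl YearPattern {
    /// Represents the expensive step (regex compilation in real code).
    fn compile() -> Self {
        BUILD_COUNT.fetch_add(1, Ordering::SeqCst);
        YearPattern { min_digits: 4, max_digits: 7 }
    }

    fn is_match(&self, value: &str) -> bool {
        // Optional leading minus, then 4 to 7 ASCII digits and nothing else.
        let digits = value.strip_prefix('-').unwrap_or(value);
        (self.min_digits..=self.max_digits).contains(&digits.len())
            && digits.bytes().all(|b| b.is_ascii_digit())
    }
}

/// Builds the pattern on first use and returns the cached value afterwards.
fn year_pattern() -> &'static YearPattern {
    static PATTERN: OnceLock<YearPattern> = OnceLock::new();
    PATTERN.get_or_init(YearPattern::compile)
}

/// Hypothetical per-row entry point: never rebuilds the pattern.
fn is_year_only(value: &str) -> bool {
    year_pattern().is_match(value)
}

/// Exposes the build counter so callers can check the pattern was built once.
fn pattern_build_count() -> usize {
    BUILD_COUNT.load(Ordering::SeqCst)
}
```

However many times `is_year_only` is called, `YearPattern::compile` runs once, because `OnceLock::get_or_init` runs the initializer only on first access.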

} else {
(1i32, value)
};
let values: Vec<_> = date_part.split(['T', ' ', '-', ':', '.']).collect();
Member

This is pre-existing code, but worth noting that this is allocating multiple strings and a vec for each date. This seems expensive.
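
One possible way to avoid the per-row `Vec`, sketched under assumptions: `parse_fields` is a hypothetical helper, not code from this PR, and it assumes at most seven numeric fields (year through fractional seconds). `str::split` already yields borrowed `&str` slices, so the `Vec` from `collect()` is the heap allocation this sketch removes.

```rust
/// Hypothetical helper: parses up to seven numeric fields from `date_part`
/// into a fixed-size array, consuming the split iterator lazily.
/// Returns the fields and how many were present, or `None` if any piece is
/// not an unsigned integer or if there are more than seven pieces.
fn parse_fields(date_part: &str) -> Option<([u32; 7], usize)> {
    let mut fields = [0u32; 7];
    let mut count = 0;
    // Same separator set as the quoted line; no intermediate Vec is built.
    for piece in date_part.split(['T', ' ', '-', ':', '.']) {
        if count == fields.len() {
            return None; // more fields than a timestamp can carry
        }
        fields[count] = piece.parse::<u32>().ok()?;
        count += 1;
    }
    Some((fields, count))
}
```

The fields are raw integers. Scaling the fractional-second field and range-checking each component are left out of this sketch.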

Comment on lines +1123 to +1124
/// If `value` ends with a UTC offset suffix (`Z`, `+HH:MM`, or `-HH:MM`), returns the
/// stripped string and the offset in seconds. Returns `None` if no offset suffix is present.
Member

What is the behavior for offsets other than the ones that are supported so far?
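
To make the question concrete, here is a minimal sketch of the contract described in the doc comment, assuming only the three suffix forms it lists. `split_utc_offset` is a hypothetical name and the range checks are assumptions, so the PR's actual function may behave differently. In this sketch, any other offset form (for example `+0800` or `+08`) is reported as "no offset" by returning `None`.

```rust
/// Hypothetical helper mirroring the doc comment: if `value` ends with `Z`,
/// `+HH:MM`, or `-HH:MM`, returns the stripped string and the offset in seconds.
fn split_utc_offset(value: &str) -> Option<(&str, i32)> {
    if let Some(rest) = value.strip_suffix('Z') {
        return Some((rest, 0));
    }
    // A `+HH:MM` / `-HH:MM` suffix is exactly six ASCII bytes.
    let bytes = value.as_bytes();
    if bytes.len() < 6 {
        return None;
    }
    let split_at = bytes.len() - 6;
    let suffix = &bytes[split_at..];
    let sign: i32 = match suffix[0] {
        b'+' => 1,
        b'-' => -1,
        _ => return None,
    };
    if suffix[3] != b':' {
        return None;
    }
    let digit = |b: u8| -> Option<i32> {
        if b.is_ascii_digit() { Some((b - b'0') as i32) } else { None }
    };
    let hours = digit(suffix[1])? * 10 + digit(suffix[2])?;
    let minutes = digit(suffix[4])? * 10 + digit(suffix[5])?;
    // Assumed limits for this sketch: at most 18 hours and 59 minutes.
    if hours > 18 || minutes > 59 {
        return None;
    }
    // `split_at` is a char boundary because the byte there is ASCII `+` or `-`.
    Some((&value[..split_at], sign * (hours * 3600 + minutes * 60)))
}
```

A plain date such as `2020-01-01` is not mistaken for an offset here, because its last six bytes (`-01-01`) have no `:` in the fourth position.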

// Check if datetime is not None
let tz_datetime = match datetime.single() {
// Spark uses the offset before daylight savings change so we need to use earliest()
// Return None for LocalResult::None which is the invalid time in a DST spring forward gap).
Member

Do existing tests cover any DST spring forward gaps?
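
As an illustration of what such a test would exercise, the sketch below models local-time resolution around a single offset change using only the standard library. `Resolved` and `resolve_local` are hypothetical and only mirror the three outcomes of chrono's `LocalResult` (`None`, `Single`, `Ambiguous`). A real test would presumably go through the PR's parser with a named time zone.

```rust
/// Hypothetical model of the three outcomes of mapping a local time to UTC.
#[derive(Debug, PartialEq)]
enum Resolved {
    /// The local time falls in a spring-forward gap and does not exist.
    Gap,
    /// Exactly one UTC instant matches.
    Single(i64),
    /// Two UTC instants match (fall-back overlap): (earlier, later).
    Ambiguous(i64, i64),
}

impl Resolved {
    /// Earliest matching instant, if any (the choice the quoted comment describes).
    fn earliest(&self) -> Option<i64> {
        match *self {
            Resolved::Gap => None,
            Resolved::Single(t) | Resolved::Ambiguous(t, _) => Some(t),
        }
    }
}

/// Resolves `local_secs` (seconds of wall-clock time) for a zone whose offset
/// changes from `before` to `after` (seconds east of UTC) at `transition_utc`.
fn resolve_local(local_secs: i64, transition_utc: i64, before: i64, after: i64) -> Resolved {
    // Interpret the wall-clock time under each offset in turn.
    let utc_before = local_secs - before;
    let utc_after = local_secs - after;
    // Each interpretation is valid only on its own side of the transition.
    let ok_before = utc_before < transition_utc;
    let ok_after = utc_after >= transition_utc;
    match (ok_before, ok_after) {
        (true, true) => Resolved::Ambiguous(utc_before, utc_after),
        (true, false) => Resolved::Single(utc_before),
        (false, true) => Resolved::Single(utc_after),
        (false, false) => Resolved::Gap,
    }
}
```

With `before = -8h` and `after = -7h`, a wall-clock time 30 minutes past the transition resolves to `Gap`, which is the case the reviewer asks about. With the offsets swapped, a time in the repeated hour resolves to `Ambiguous`, and `earliest()` picks the instant that uses the pre-change offset.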

@andygrove
Member

Thanks @parthchandra. Could you run CometCastStringToTemporalBenchmark before and after these changes so we can see the performance impact?
