What is the problem the feature request solves?
Note: This issue was generated with AI assistance. The specification details have been extracted from Spark documentation and may need verification.
Comet does not currently support the Spark url_decode function, causing queries using this function to fall back to Spark's JVM execution instead of running natively on DataFusion.
The UrlDecode expression decodes strings in 'application/x-www-form-urlencoded' format: percent-encoded sequences (%XX) are converted back to the bytes they represent, and '+' is decoded as a space. It is implemented as a runtime replaceable expression that delegates to the UrlCodec.decode method, with a failOnError flag that controls how malformed input is handled.
Supporting this expression would allow more Spark workloads to benefit from Comet's native acceleration.
Describe the potential solution
Spark Specification
Syntax:
url_decode(url_string)
url_decode(url_string, fail_on_error)
// DataFrame API
df.select(expr("url_decode(url_column)"))
Arguments:
| Argument | Type | Description |
|---|---|---|
| child | StringType | The URL-encoded string to decode |
| failOnError | Boolean | Whether to fail on malformed input (default: true) |
Return Type: Returns StringType - the decoded URL string.
Supported Data Types:
- StringType with collation support (supports trim collation)
- Input must be a valid string expression
Edge Cases:
- Null input returns null output (standard null propagation)
- Empty string input returns empty string
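- Malformed percent-encoding (for example '%zz' or a trailing '%') is handled according to the failOnError flag:
  - When failOnError is true, invalid encoding throws an exception
  - When failOnError is false, invalid encoding does not throw; whether the result is null or the unchanged input should be confirmed against UrlCodec.decode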
- Supports trim collation for string comparison operations
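To make the decoding rules concrete for the native side, here is a minimal std-only Rust sketch. It assumes UrlCodec.decode behaves like java.net.URLDecoder with UTF-8: '+' becomes a space, each %XX becomes one byte, a '%' not followed by two hex digits is an error, and the decoded bytes are read as UTF-8 with invalid sequences replaced by U+FFFD. The names url_decode_form and UrlDecodeError are hypothetical, the sketch does not reproduce every quirk of the JDK decoder, and null handling and the failOnError policy are left to the caller.

```rust
use std::fmt;

/// Hypothetical error type for malformed percent-encoding.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum UrlDecodeError {
    /// A '%' with fewer than two characters after it.
    IncompleteEscape { position: usize },
    /// A '%' followed by two characters that are not both hex digits.
    InvalidHex { position: usize },
}

impl fmt::Display for UrlDecodeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            UrlDecodeError::IncompleteEscape { position } => {
                write!(f, "incomplete escape (%) pattern at byte {}", position)
            }
            UrlDecodeError::InvalidHex { position } => {
                write!(f, "illegal hex characters in escape (%) pattern at byte {}", position)
            }
        }
    }
}

impl std::error::Error for UrlDecodeError {}

/// Value of one ASCII hex digit, or None if `b` is not a hex digit.
fn hex_value(b: u8) -> Option<u8> {
    match b {
        b'0'..=b'9' => Some(b - b'0'),
        b'a'..=b'f' => Some(b - b'a' + 10),
        b'A'..=b'F' => Some(b - b'A' + 10),
        _ => None,
    }
}

/// Hypothetical scalar decoder using application/x-www-form-urlencoded rules
/// (assumed to mirror java.net.URLDecoder with UTF-8).
pub fn url_decode_form(input: &str) -> Result<String, UrlDecodeError> {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            // '+' is the form encoding of a space.
            b'+' => {
                out.push(b' ');
                i += 1;
            }
            b'%' => {
                // An escape needs two more bytes for the hex pair.
                if i + 2 >= bytes.len() {
                    return Err(UrlDecodeError::IncompleteEscape { position: i });
                }
                match (hex_value(bytes[i + 1]), hex_value(bytes[i + 2])) {
                    (Some(hi), Some(lo)) => out.push((hi << 4) | lo),
                    _ => return Err(UrlDecodeError::InvalidHex { position: i }),
                }
                i += 3;
            }
            // Any other byte (including UTF-8 bytes of non-ASCII chars) is copied as-is.
            other => {
                out.push(other);
                i += 1;
            }
        }
    }
    // Invalid UTF-8 produced by escapes is replaced with U+FFFD rather than rejected.
    Ok(String::from_utf8_lossy(&out).into_owned())
}
```

For instance, url_decode_form("a+b%20c") yields Ok("a b c"), while url_decode_form("bad%zz") yields an InvalidHex error at byte 3.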
Examples:
-- Basic URL decoding
SELECT url_decode('Hello%20World') AS decoded;
-- Result: "Hello World"
-- Decode with error handling
SELECT url_decode('user%40domain.com', true) AS email;
-- Result: "user@domain.com"
-- Decode complex URL parameters
SELECT url_decode('param%3Dvalue%26other%3D123') AS params;
-- Result: "param=value&other=123"
// DataFrame API usage
import org.apache.spark.sql.functions._
df.select(expr("url_decode(encoded_url)").as("decoded"))
// With explicit error handling
df.select(expr("url_decode(encoded_url, false)").as("decoded"))
Implementation Approach
See the Comet guide on adding new expressions for detailed instructions.
- Scala Serde: Add an expression handler in spark/src/main/scala/org/apache/comet/serde/
- Register: Add it to the appropriate map in QueryPlanSerde.scala
- Protobuf: Add a message type in native/proto/src/proto/expr.proto if needed
- Rust: Implement in native/spark-expr/src/ (check whether DataFusion has built-in support first)
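For the Rust step, the per-row policy (null propagation plus the failOnError switch) can be kept separate from the scalar decoding. This is a sketch under stated assumptions, not Comet's API: url_decode_column is a hypothetical name, it uses plain Option<&str> slices instead of Arrow arrays to stay dependency-free, and it assumes that with failOnError = false a malformed row becomes null, which should be confirmed against UrlCodec.decode. A real kernel would presumably operate on Arrow string arrays and return a DataFusion error.

```rust
/// Hypothetical batch kernel over a nullable string column.
/// - A null input row stays null.
/// - If `decode` fails and `fail_on_error` is true, the whole batch fails.
/// - If `decode` fails and `fail_on_error` is false, that row becomes null
///   (assumed behavior; confirm against Spark's UrlCodec.decode).
pub fn url_decode_column<F, E>(
    input: &[Option<&str>],
    fail_on_error: bool,
    decode: F,
) -> Result<Vec<Option<String>>, E>
where
    F: Fn(&str) -> Result<String, E>,
{
    let mut out = Vec::with_capacity(input.len());
    for value in input.iter().copied() {
        let decoded = match value {
            // Standard null propagation: null in, null out.
            None => None,
            Some(s) => match decode(s) {
                Ok(v) => Some(v),
                // Strict mode: surface the first decode error for the batch.
                Err(e) if fail_on_error => return Err(e),
                // Lenient mode (assumed): malformed row becomes null.
                Err(_) => None,
            },
        };
        out.push(decoded);
    }
    Ok(out)
}
```

Keeping the decoder as a parameter lets the null and error handling be tested independently of the percent-decoding itself.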
Additional context
Difficulty: Large
Spark Expression Class: org.apache.spark.sql.catalyst.expressions.UrlDecode
Related:
- UrlEncode - Companion expression for URL encoding
- String manipulation functions in the url_funcs group
- StaticInvoke expression for method delegation
- Collation-aware string expressions
This issue was auto-generated from Spark reference documentation.