[Feature] Support Spark expression: url_decode #3186

@andygrove

What is the problem the feature request solves?

Note: This issue was generated with AI assistance. The specification details have been extracted from Spark documentation and may need verification.

Comet does not currently support the Spark url_decode function, causing queries using this function to fall back to Spark's JVM execution instead of running natively on DataFusion.

The UrlDecode expression decodes URL-encoded strings by converting percent-encoded characters back to their original form. This expression is implemented as a runtime replaceable expression that delegates to the UrlCodec.decode method with configurable error handling behavior.

Supporting this expression would allow more Spark workloads to benefit from Comet's native acceleration.

Describe the potential solution

Spark Specification

Syntax:

url_decode(url_string)
-- failOnError is not exposed as a SQL argument; try_url_decode(url_string)
-- (Spark 4.0+) is the variant that returns NULL on malformed input

// DataFrame API
df.select(expr("url_decode(url_column)"))

Arguments:

Argument      Type         Description
child         StringType   The URL-encoded string to decode
failOnError   Boolean      Whether to fail on malformed input (default: true). Constructor parameter of the expression, not a SQL argument; try_url_decode sets it to false

Return Type: Returns StringType - the decoded URL string.

Supported Data Types:

  • StringType with collation support (supports trim collation)
  • Input must be a valid string expression

Edge Cases:

  • Null input returns null output (standard null propagation)
  • Empty string input returns empty string
  • Malformed percent-encoding behavior depends on failOnError flag
  • When failOnError is true, invalid encoding throws exception
  • When failOnError is false (the try_url_decode path), malformed input returns null instead of throwing
  • Supports trim collation for string comparison operations
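
To make the edge cases above concrete, here is a minimal Rust sketch of the per-value semantics. It is not Comet or DataFusion code: `UrlDecodeError`, `url_decode_strict`, and `url_decode` are hypothetical names for illustration. It assumes the behaviour of `java.net.URLDecoder` that Spark delegates to, including `+` decoding to a space, and it assumes that percent-escapes producing invalid UTF-8 are replaced with U+FFFD. Both points should be verified against Spark before being relied on.

```rust
/// Error for malformed percent-encoding (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub struct UrlDecodeError {
    /// Byte offset of the '%' that starts the malformed escape.
    pub position: usize,
}

/// Value of one ASCII hex digit, or None if `b` is not a hex digit.
fn hex_val(b: u8) -> Option<u8> {
    match b {
        b'0'..=b'9' => Some(b - b'0'),
        b'a'..=b'f' => Some(b - b'a' + 10),
        b'A'..=b'F' => Some(b - b'A' + 10),
        _ => None,
    }
}

/// Strict decode: any truncated or non-hex escape is an error.
pub fn url_decode_strict(input: &str) -> Result<String, UrlDecodeError> {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            // Form-encoding rule: '+' stands for a space.
            b'+' => {
                out.push(b' ');
                i += 1;
            }
            b'%' => {
                let hi = bytes.get(i + 1).copied().and_then(hex_val);
                let lo = bytes.get(i + 2).copied().and_then(hex_val);
                match (hi, lo) {
                    (Some(h), Some(l)) => {
                        out.push((h << 4) | l);
                        i += 3;
                    }
                    _ => return Err(UrlDecodeError { position: i }),
                }
            }
            b => {
                out.push(b);
                i += 1;
            }
        }
    }
    // Assumption: invalid UTF-8 becomes U+FFFD rather than an error.
    Ok(String::from_utf8_lossy(&out).into_owned())
}

/// Null-propagating wrapper: `fail_on_error = false` maps errors to None (NULL).
pub fn url_decode(
    input: Option<&str>,
    fail_on_error: bool,
) -> Result<Option<String>, UrlDecodeError> {
    match input {
        None => Ok(None),
        Some(s) => match url_decode_strict(s) {
            Ok(decoded) => Ok(Some(decoded)),
            Err(e) if fail_on_error => Err(e),
            Err(_) => Ok(None),
        },
    }
}
```

With this sketch, `url_decode(Some("Hello%20World"), true)` yields `Ok(Some("Hello World"))`, a truncated escape such as `"bad%2"` is an error in strict mode, and the same input yields `Ok(None)` when `fail_on_error` is false.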

Examples:

-- Basic URL decoding
SELECT url_decode('Hello%20World') AS decoded;
-- Result: "Hello World"

-- Decode with NULL-on-error handling (try_url_decode, Spark 4.0+)
SELECT try_url_decode('user%40domain.com') AS email;
-- Result: "user@domain.com"

-- Decode complex URL parameters
SELECT url_decode('param%3Dvalue%26other%3D123') AS params;
-- Result: "param=value&other=123"

// DataFrame API usage
import org.apache.spark.sql.functions._

df.select(expr("url_decode(encoded_url)").as("decoded"))

// With NULL-on-error handling (try_url_decode, Spark 4.0+)
df.select(expr("try_url_decode(encoded_url)").as("decoded"))

Implementation Approach

See the Comet guide on adding new expressions for detailed instructions.

  1. Scala Serde: Add expression handler in spark/src/main/scala/org/apache/comet/serde/
  2. Register: Add to appropriate map in QueryPlanSerde.scala
  3. Protobuf: Add message type in native/proto/src/proto/expr.proto if needed
  4. Rust: Implement in native/spark-expr/src/ (check if DataFusion has built-in support first)
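
For step 4, the column-level behaviour might look roughly like the sketch below. It uses a plain slice of `Option<&str>` as a stand-in for an Arrow string array and takes the per-value decoder as a parameter, so nothing here is an actual Comet, DataFusion, or Arrow API. `decode_column` is a hypothetical name. The point it illustrates is that in strict mode one malformed row fails the whole batch, while in non-strict mode only that row becomes null.

```rust
/// Applies a fallible per-row decoder to a nullable string column.
/// The slice of `Option<&str>` is a stand-in for an Arrow string array.
pub fn decode_column<F, E>(
    rows: &[Option<&str>],
    fail_on_error: bool,
    decode: F,
) -> Result<Vec<Option<String>>, E>
where
    F: Fn(&str) -> Result<String, E>,
{
    let mut out = Vec::with_capacity(rows.len());
    for row in rows.iter().copied() {
        let value = match row {
            // Null in, null out.
            None => None,
            Some(s) => match decode(s) {
                Ok(decoded) => Some(decoded),
                // Strict mode: the first malformed row aborts the batch.
                Err(e) if fail_on_error => return Err(e),
                // Non-strict mode: only the malformed row becomes null.
                Err(_) => None,
            },
        };
        out.push(value);
    }
    Ok(out)
}
```

A real implementation would read from and write to Arrow buffers directly to avoid per-row allocations; the loop above is only meant to pin down the expected results for tests.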

Additional context

Difficulty: Large
Spark Expression Class: org.apache.spark.sql.catalyst.expressions.UrlDecode

Related:

  • UrlEncode - Companion expression for URL encoding
  • String manipulation functions in url_funcs group
  • StaticInvoke expression for method delegation
  • Collation-aware string expressions

This issue was auto-generated from Spark reference documentation.
