Null handling (coalesce/nanvl/nullif/nvl), null-safe equality (<=>), empty DataFrames, type coercion, special char column names, 20+ chained transformations, large CASE/IN
E20
E20_MultiJoinAnalytics
Self-joins, inequality joins with NOT EXISTS, peer cohort comparison, 6-table join with composite scoring via multiple PERCENT_RANK windows
Original examples (com.poc.sail)
| Class | What it tests |
| --- | --- |
| BasicConnection | Connection test, range, SQL, groupBy, filter |
| SqlExample | Temp views, GROUP BY, window rank, CTE |
| CsvExample | CSV read, transform, Parquet write |
What to expect
Some examples will likely fail on Sail, and that's the point. Track which ones succeed and which don't to map Sail's current compatibility surface. Known areas that may have gaps:
Recursive CTEs (E13)
Scala UDFs over Spark Connect (E17)
Some higher-order array functions (E10)
Complex type operations like map_from_entries (E15)
window() time-bucketing function (E12)
queryExecution introspection (E19)
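One way to track which examples succeed is a small runner that executes each one and records the outcome instead of stopping at the first failure. The sketch below is only an illustration: `CompatReport`, `Outcome`, `runAll`, and `render` are hypothetical names that are not part of this repository, and the two entries in `main` are stand-ins for real example entry points.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical helper, not part of the repository: runs each example
// and records whether it completed or threw.
object CompatReport {
  // error is None when the example completed, Some(description) when it threw.
  final case class Outcome(name: String, error: Option[String])

  // Runs every example in order; a non-fatal exception in one example
  // is captured and does not prevent the remaining ones from running.
  def runAll(examples: Seq[(String, () => Unit)]): Seq[Outcome] =
    examples.map { case (name, body) =>
      Try(body()) match {
        case Success(_) => Outcome(name, None)
        case Failure(e) =>
          Outcome(name, Some(s"${e.getClass.getSimpleName}: ${e.getMessage}"))
      }
    }

  // One line per example, prefixed with PASS or FAIL.
  def render(outcomes: Seq[Outcome]): String =
    outcomes.map { o =>
      o.error match {
        case None      => s"PASS ${o.name}"
        case Some(msg) => s"FAIL ${o.name} ($msg)"
      }
    }.mkString("\n")

  def main(args: Array[String]): Unit = {
    // Stand-in bodies; replace them with calls into the real examples.
    val examples: Seq[(String, () => Unit)] = Seq(
      "passing-stand-in" -> (() => ()),
      "failing-stand-in" -> (() => throw new UnsupportedOperationException("stand-in failure"))
    )
    println(render(runAll(examples)))
  }
}
```

Because `Try` only captures non-fatal exceptions, a fatal error such as an `OutOfMemoryError` would still abort the run.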
Notes
All examples use SailSession(), which reads the SAIL_HOST and SAIL_PORT environment variables (defaults: localhost and 50051)
The code uses the standard Spark API, so there is no vendor lock-in
The only difference from a local Spark application is that the session is built with .remote("sc://host:port") instead of .master("local[*]")
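The notes above can be illustrated with a minimal sketch of what such a session helper might look like. This is not the repository's actual `SailSession` implementation: `SailSessionSketch` and `connectionString` are hypothetical names, and the code assumes the Spark Connect Scala client (Spark 3.5 or later) is on the classpath.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch, not the repository's SailSession.
object SailSessionSketch {
  // Builds the Spark Connect URL from an environment map, falling back
  // to the defaults stated in the notes (localhost and 50051).
  def connectionString(env: Map[String, String]): String = {
    val host = env.getOrElse("SAIL_HOST", "localhost")
    val port = env.getOrElse("SAIL_PORT", "50051")
    s"sc://$host:$port"
  }

  // Connects to a remote Spark Connect endpoint instead of starting
  // a local master with .master("local[*]").
  def apply(): SparkSession =
    SparkSession.builder()
      .remote(connectionString(sys.env))
      .getOrCreate()
}
```

Keeping the URL construction in a pure function makes the defaults easy to check without a running server, for example `SailSessionSketch.connectionString(Map.empty)`.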
About
Proof of concept: Scala Spark Connect client talking to Sail (DataFusion backend) over gRPC.