Currently, engines can be registered with freeform names and versions, but DJ has no way to know what functions are available on each engine. This leads to potential issues:
- Decomposition may generate unsupported SQL - e.g., using hll_sketch_agg on Spark 3.3, which doesn't have it
- No graceful degradation - DJ can't fall back to simpler approaches when advanced features aren't available
- Function translation is fragile
Examples
| Function | Spark 3.3 | Spark 4.0+ | Druid |
| --- | --- | --- | --- |
| hll_sketch_agg | ✅ | ✅ | ✅ (DS_HLL) |
| theta_sketch_agg | ❌ | ✅ | ✅ (THETA_SKETCH) |
Proposed Solution
Replace freeform engine registration with a curated list of supported engine/dialect combinations:
SUPPORTED_ENGINES = {
    "spark:3.5": SparkDialect35(),
    "spark:4.0": SparkDialect40(),
    "trino:4xx": TrinoDialect(),
    "druid:31": DruidDialect(),
}
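As a rough sketch of how registration could be validated against such a curated list, the lookup below rejects any engine/version pair that is not a known key. The `Dialect` placeholder, `UnsupportedEngineError`, and `resolve_dialect` are hypothetical names used only for illustration; they are not part of DJ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dialect:
    """Hypothetical stand-in for the dialect classes in the registry."""
    name: str


# Curated registry keyed by "engine:version"; entries are placeholders.
SUPPORTED_ENGINES = {
    "spark:3.5": Dialect("spark-3.5"),
    "spark:4.0": Dialect("spark-4.0"),
    "druid:31": Dialect("druid-31"),
}


class UnsupportedEngineError(ValueError):
    """Raised when an engine/version pair is not in the curated list."""


def resolve_dialect(engine: str, version: str) -> Dialect:
    """Return the dialect for a curated engine/version pair, or fail loudly."""
    key = f"{engine.lower()}:{version}"
    try:
        return SUPPORTED_ENGINES[key]
    except KeyError:
        supported = ", ".join(sorted(SUPPORTED_ENGINES))
        raise UnsupportedEngineError(
            f"Unsupported engine {key!r}; supported: {supported}"
        ) from None
```

Failing at registration time, rather than when SQL is generated, would surface an unsupported engine before any query is built against it.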
Each dialect would declare:
- Available functions
- Function name mappings for translation
- Valid decomposition strategies
- Type coercions
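
One possible shape for such a declaration is sketched below, together with a fallback that picks the most capable decomposition strategy a dialect supports. All class, field, and strategy names here are assumptions made for illustration; the Druid function names are taken from the examples table above, and the `BASIC` dialect is a made-up engine without sketch functions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dialect:
    """Hypothetical capability declaration for one engine/version."""
    name: str
    functions: frozenset[str]                                      # canonical functions available
    function_map: dict[str, str] = field(default_factory=dict)     # canonical -> engine-specific name
    strategies: tuple[str, ...] = ()                               # valid decomposition strategies
    type_coercions: dict[str, str] = field(default_factory=dict)   # source type -> engine type

    def translate(self, function: str) -> str:
        """Map a canonical function name to the engine's name for it."""
        if function not in self.functions:
            raise ValueError(f"{function} is not available on {self.name}")
        return self.function_map.get(function, function)


# Assumed strategies, most preferred first, with the functions each one needs.
STRATEGY_REQUIREMENTS = {
    "theta_sketch": {"theta_sketch_agg"},
    "hll_sketch": {"hll_sketch_agg"},
    "exact_count_distinct": {"count"},
}


def pick_strategy(dialect: Dialect) -> str:
    """Return the first preferred strategy the dialect can actually run."""
    for strategy, required in STRATEGY_REQUIREMENTS.items():
        if strategy in dialect.strategies and required <= dialect.functions:
            return strategy
    raise ValueError(f"No usable decomposition strategy for {dialect.name}")


SPARK_40 = Dialect(
    name="spark:4.0",
    functions=frozenset({"count", "hll_sketch_agg", "theta_sketch_agg"}),
    strategies=("theta_sketch", "hll_sketch", "exact_count_distinct"),
)
DRUID = Dialect(
    name="druid:31",
    functions=frozenset({"count", "hll_sketch_agg", "theta_sketch_agg"}),
    function_map={"hll_sketch_agg": "DS_HLL", "theta_sketch_agg": "THETA_SKETCH"},
    strategies=("theta_sketch", "hll_sketch", "exact_count_distinct"),
)
# Made-up engine with no sketch support, to show the fallback path.
BASIC = Dialect(
    name="basic:1",
    functions=frozenset({"count"}),
    strategies=("exact_count_distinct",),
)
```

With declarations like these, `DRUID.translate("hll_sketch_agg")` resolves to the Druid-specific name, while `pick_strategy(BASIC)` degrades to an exact count instead of emitting SQL the engine cannot run.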