Develop the DuckDB SQL to BigQuery Standard SQL transpilation prototype for the Research Tier. This pipeline acts as a convenience layer, allowing standard local quantitative queries on the MarketTick schema to be seamlessly routed to the BigQuery external tables established in Tract 1.
**Important:** Awaiting Red Team sign-off on this revised Tract 2 Implementation Plan before beginning execution. All six constraints from the Control Specification review have been functionally mapped to the proposed Python architecture below.
This module provides the deterministic bridge from local research DuckDB syntax to remote BigQuery execution.
- `TranspilationError(Exception)`: A custom exception class serving as the Fail-Closed mechanism.
  - Fallback Behavior: When raised, the error message will deterministically output:
    - The specific unsupported AST node/token (e.g., `Unsupported construct: WindowFunction`).
    - The rejection reason (`Window functions are explicitly banned under the Tract 2 Control Spec`).
    - The required operator fallback path (`Fallback required: Please execute complex aggregations natively via the BigQuery client`).
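A minimal sketch of this fail-closed exception, assuming hypothetical field names (`construct`, `reason`, `fallback`); the three-part message demanded by the Control Spec is assembled deterministically in `__str__`:

```python
class TranspilationError(Exception):
    """Fail-closed rejection carrying the three-part fallback message.

    Field names (construct, reason, fallback) are illustrative, not mandated
    by the Control Spec; only the three-part message shape is.
    """

    def __init__(self, construct: str, reason: str, fallback: str):
        self.construct = construct
        self.reason = reason
        self.fallback = fallback
        super().__init__(str(self))

    def __str__(self) -> str:
        # Deterministic: identical inputs always yield identical message text,
        # which is what the Fallback Message Determinism test asserts on.
        return (
            f"Unsupported construct: {self.construct}. "
            f"{self.reason}. "
            f"Fallback required: {self.fallback}"
        )
```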
- Translation Boundary (DuckDB Substrait):
  - The transpiler will consume raw DuckDB SQL strings.
  - Instead of writing custom regex or a fragile string parser, it will lean on DuckDB's native parser, executing `conn.get_substrait(query)` to extract the canonical Intermediate Representation (IR).
  - Note on Overclaiming: Validation of the Substrait IR against BigQuery Standard SQL is strictly experimental for this prototype. Promotion to Tract 1 depends entirely on passing the semantic parity test sweeps.
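The extraction step can be sketched as a thin helper; this assumes a DuckDB connection on which the `substrait` extension has already been installed and loaded, and that `get_substrait` returns a one-row relation whose first column holds the serialized plan blob (the helper name is ours):

```python
def extract_substrait_ir(conn, sql: str) -> bytes:
    """Return the serialized Substrait plan DuckDB produces for `sql`.

    `conn` must be a duckdb connection with the substrait extension
    already loaded (INSTALL substrait; LOAD substrait;). get_substrait
    parses, binds, and plans the query with DuckDB's own frontend, so
    malformed SQL is rejected here rather than by a homegrown parser.
    """
    return conn.get_substrait(sql).fetchone()[0]
```

Passing the connection in (rather than opening one here) keeps the boundary testable with a stub connection before any real DuckDB instance exists.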
- `QuanuXDuckToBQTranspiler` class:
  - Read-Only Matrix Enforcement: Before any AST/Substrait parsing begins, the input string is strictly scanned to ban non-`SELECT` operations (`DROP`, `ALTER`, `UPDATE`, `INSERT`).
  - Whitelist Enforcement: The class traverses the Substrait relational algebra nodes against a strict allow-list (`SELECT`, `ProjectRel`, `AggregateRel`, `FilterRel`). If an unrecognized relational node, mathematical operation, or unapproved function (such as windowing or recursive CTE mapping) is detected, it immediately raises `TranspilationError`.
  - Result Set Bounding: The class outputs not just the SQL string but a controlled BigQuery execution block that uses `query_job.result().to_arrow_iterable()` to guarantee chunked, memory-safe retrieval of data back to the Python tier.
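The pre-parse read-only scan can be sketched with a stdlib regex; the function name is ours, and a real implementation would run this before handing the string to `get_substrait`:

```python
import re

# Statement keywords banned by the Tract 2 read-only matrix.
_MUTATING = re.compile(r"\b(DROP|ALTER|UPDATE|INSERT)\b", re.IGNORECASE)

class TranspilationError(Exception):
    pass

def enforce_read_only(sql: str) -> str:
    """Reject any statement that is not a plain SELECT, before any parsing.

    This is deliberately coarse and fail-closed: a banned keyword inside
    a string literal also trips it. The authoritative structural check
    remains the Substrait allow-list traversal that follows.
    """
    stripped = sql.lstrip()
    if not stripped.upper().startswith("SELECT"):
        raise TranspilationError("Unsupported construct: non-SELECT statement")
    hit = _MUTATING.search(sql)
    if hit:
        raise TranspilationError(f"Unsupported construct: {hit.group(0).upper()}")
    return sql
```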
The testing methodology abandons the Tract 1 ingestion shape in favor of strict parser and semantic parity assertions.
- Whitelist Acceptance Tests: Asserts that `SELECT`, `FROM`, `WHERE`, `GROUP BY`, and standard aggregations (`SUM`, `AVG`, `MIN`, `MAX`, `COUNT`) map cleanly to BigQuery strings without raising exceptions.
- Unsupported Construct Rejection Tests: Explicitly injects window functions, dialect-specific macros, and CTEs to verify that `TranspilationError` is raised deterministically.
- Fallback Message Determinism: Asserts that the exception's `__str__` exactly matches the three-part fallback structure demanded by the Control Spec.
- Semantic Parity Fixtures (Core Graduation Requirement): Executes the transpiled approved queries against a mocked/simulated layout and asserts exact row counts, grouping cardinality, explicit null handling, and numeric precision against local DuckDB results.
- State-Mutation Bans: Asserts that sending `UPDATE` or `DROP TABLE` text to the transpiler triggers an immediate, unrecoverable exception before any parsing attempt.
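The comparison at the heart of the parity fixtures can be sketched stdlib-only; the helper name and tolerance are illustrative, and a real fixture would feed it ordered rows from local DuckDB and the mocked BigQuery side:

```python
import math

def assert_semantic_parity(local_rows, remote_rows, rel_tol=1e-9):
    """Assert row count, null handling, and numeric agreement between backends.

    Both inputs are sequences of tuples produced under a deterministic
    ORDER BY, so positional comparison is meaningful. Floats compare with
    a relative tolerance; everything else (including NULL -> None) must
    match exactly.
    """
    assert len(local_rows) == len(remote_rows), "row-count mismatch"
    for i, (lrow, rrow) in enumerate(zip(local_rows, remote_rows)):
        assert len(lrow) == len(rrow), f"column-count mismatch at row {i}"
        for lval, rval in zip(lrow, rrow):
            if lval is None or rval is None:
                # Explicit null-handling check: None must pair with None.
                assert lval is None and rval is None, f"null mismatch at row {i}"
            elif isinstance(lval, float) or isinstance(rval, float):
                assert math.isclose(lval, rval, rel_tol=rel_tol), f"numeric drift at row {i}"
            else:
                assert lval == rval, f"value mismatch at row {i}"
```

Grouping cardinality falls out of the row-count assertion when the compared queries are `GROUP BY` results.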
- Red Team Review: Awaiting code-level approval on the transpiler architecture and test shape outlined above.
- Implementation Execution: Code `gcp_transpiler.py` and `test_gcp_transpiler.py` strictly against this class structure.
- Audit PyTest Runner: Output testing evidence to `tract2_test_run.log` and push for final promotion evaluation.