Update explain to Include Singular, CROSS and User Collections#501
Update explain to Include Singular, CROSS and User Collections#501
explain to Include Singular, CROSS and User Collections#501Conversation
explain to Include Singular/CROSS and User Collections
explain to Include Singular/CROSS and User Collectionsexplain to Include Singular, CROSS and User Collections
knassre-bodo
left a comment
There was a problem hiding this comment.
A few things should be added in my opinion, but so far looks great @hadia206 :)
| if root is not None: | ||
| qualified_node = qualify_node(node, session) | ||
| else: | ||
| # If the root is None, it means that the node was an expression | ||
| # without information about its context. | ||
| lines.append( | ||
| f"Cannot call pydough.explain on {display_raw(node)}.\n" | ||
| "Did you mean to use pydough.explain_term?" | ||
| ) | ||
| # No root in the tree (e.g. UnqualifiedGeneratedCollection, or a | ||
| # bare expression like LOWER(first_name + last_name)). Try to | ||
| # qualify anyway for generated collections. If it still fails, | ||
| # raise an exception. | ||
| try: | ||
| qualified_node = qualify_node(node, session) | ||
| except Exception: | ||
| lines.append( | ||
| f"Cannot call pydough.explain on {display_raw(node)}.\n" | ||
| "Did you mean to use pydough.explain_term?" | ||
| ) | ||
| return "\n".join(lines) |
There was a problem hiding this comment.
What if we jus got rid of if root is not None and always did the try-except?
| if isinstance(prop, CartesianProductMetadata): | ||
| child_name = prop.child_collection.name | ||
| left_desc = ( | ||
| qualified_node.preceding_context.to_string() | ||
| if qualified_node.preceding_context is not None | ||
| else collection_name | ||
| ) | ||
| lines.append( | ||
| "This node is a CROSS join: every row of the left " | ||
| "collection is paired with every row of the right " | ||
| "collection." | ||
| ) | ||
| lines.append(f"Left (parent): {left_desc}") | ||
| lines.append(f"Right (child): {child_name}") | ||
| lines.append( | ||
| f"Metadata: {collection_name}.{property_name} -> {child_name}. " | ||
| f"Call pydough.explain(graph['{collection_name}']['{property_name}']) " | ||
| "to learn more." | ||
| ) |
There was a problem hiding this comment.
I don't necessarily like this way of handling CartesianProductMetadata; what was wrong with using the same message as other sub-collection types, then relying on pydough.explain on the metadata to explain the distinction between it being a simple join vs general join vs cross join?
There was a problem hiding this comment.
ok. I'll revert that. I started with wrong CROSS test and went rabbit hole on that.
| "Did you mean to use pydough.explain_term?" | ||
| ) | ||
| # No root in the tree (e.g. UnqualifiedGeneratedCollection, or a | ||
| # bare expression like LOWER(first_name + last_name)). Try to |
There was a problem hiding this comment.
Bare expressions shouldn't use explain, they should use explain_term since things like LOWER(first_name) can only be explained within a context.
There was a problem hiding this comment.
I had to do that because I was hitting a problem with if isinstance(qualified_node, PyDoughExpressionQDAG): being an issue with collection.
I updated the code to handle it differently. See if that's okay now
…update_explain
knassre-bodo
left a comment
There was a problem hiding this comment.
A few more things I'd like iterated on, though some of these can potentially be in followups.
| def cross_impl() -> UnqualifiedNode: | ||
| return nations.CROSS(regions) | ||
|
|
||
|
|
||
| def cross_nations_impl() -> tuple[UnqualifiedNode, UnqualifiedNode]: | ||
| return nations.CROSS(regions), nations |
There was a problem hiding this comment.
What about another one like this, where the term is a CROSS:
return customers.WHERE(market_segment == 'BUILDING'), CROSS(nations.WHERE(region.name == 'ASIA'))And one where there is ancestry stuff going on:
return customers.CALCULATE(cust_nationkey).CROSS(nations), cust_nationkey| TPCH.nations.TPCH.regions.CALCULATE(COUNT(nations.comment)) | ||
| TPCH.nations.TPCH.regions.WHERE(HAS(nations)) | ||
| TPCH.nations.TPCH.regions.ORDER_BY(COUNT(nations).DESC()) |
There was a problem hiding this comment.
I don't know if this is displaying correctly... it should be TPCH.nations.CROSS(TPCH.regions).CALCULATE(...)
Not 100% sure how to fix this... that may need to be a followup as part of a larger refactoring
There was a problem hiding this comment.
CROSS collections render as TPCH.nations.TPCH.regions instead of TPCH.nations.CROSS(TPCH.regions)
Added a todo for now as I agree it's a larger refactoring effort.
| ├─── SubCollection[customers] | ||
| └─── Where[RANKING(by=(account_balance.DESC(na_pos='last')), levels=1) == 1] | ||
|
|
||
| This child is plural with regards to the collection, meaning its scalar terms can only be accessed by the collection if they are aggregated. |
There was a problem hiding this comment.
This is wrong... it shouldn't be plural. Some adjustments to the explain code may be required to get this to display the correct logic.
|
|
||
| The term is the following expression: RANKING(by=(name.ASC(na_pos='first'))) | ||
|
|
||
| This expression calls the window function 'RANKING' with the following arguments: |
There was a problem hiding this comment.
We should clean this up so it displasy nicely when there are no regular arguments, but it also explains any by / per arguments, as well as other ancillary arguments (e.g. cumulative, frame, default, n_buckets)
There was a problem hiding this comment.
did some changes, let me know if this is okay now
unified both UDF and non-UDF branches with these improvements:
- No positional args: says "This expression calls the window function 'NAME'." instead of "...with the following arguments:" followed by nothing
- Ordering (by): new section lists each collation_arg when present
- Partition levels (per): shown when levels is not None
- Additional options: new section lists each kwarg (e.g. cumulative: True, allow_ties: False) when present
There was a problem hiding this comment.
I think the behavior of specific known ones (cumulative, frame, default, n_buckets) can be explained in the string.
| This expression calls the user-defined window function 'NVAL' with the following arguments: | ||
| name | ||
| 1 |
There was a problem hiding this comment.
Same comment as earlier about other window function arguments
| key | ||
|
|
||
| Description: Returns true if the argument is greater than zero. | ||
| This function is defined by the SQL macro: '{0} > 0'. |
There was a problem hiding this comment.
Perhaps for macro-like UDFs we can follow this up with an example by:
- Count how many arguments it has (from the arguments to the call)
- Creating that many dummy variables (
?a,?b,?c) - Injecting those into the SQL text via Python's
.formatmethod
So for this text it would be:
Suppose this function were called on arguments that are translated to the following in SQL: '?a'
Then the final SQL text for this function call would be: '?a > 0'
knassre-bodo
left a comment
There was a problem hiding this comment.
Great job @hadia206! Left a few final comments, but feel free to merge after
| return [ | ||
| f"This node accesses user-generated collection {self.name!r}.\n" | ||
| f"Columns: {', '.join(sorted(self.columns))}", | ||
| f"Unique columns: {', '.join(sorted(unique_terms))}", |
There was a problem hiding this comment.
I think this one should be in verbose, but the type of user generated collection should not be (add that in the individual overrides)
| # TODO: when the collection is a CROSS, qualified_node.to_string() | ||
| # renders as e.g. "TPCH.nations.TPCH.regions" instead of the | ||
| # friendlier "TPCH.nations.CROSS(TPCH.regions)". Fixing this | ||
| # properly requires propagating CROSS identity through the QDAG |
There was a problem hiding this comment.
Or finding a way to render the unqualified node that looks nicer, like how QDAG looks
|
|
||
| The term is the following expression: RANKING(by=(name.ASC(na_pos='first'))) | ||
|
|
||
| This expression calls the window function 'RANKING' with the following arguments: |
There was a problem hiding this comment.
I think the behavior of specific known ones (cumulative, frame, default, n_buckets) can be explained in the string.
| TPCH.regions.CALCULATE(nations.customers.WHERE(RANKING(by=(account_balance.DESC(na_pos='last')), levels=1) == 1).SINGULAR.account_balance) | ||
|
|
||
| To learn more about this child, you can try calling pydough.explain on the following: | ||
| TPCH.regions.nations.customers.WHERE(RANKING(by=(account_balance.DESC(na_pos='last')), levels=1) == 1).SINGULAR |
There was a problem hiding this comment.
Oop, we should adjust the repr for SINGULAR in qdag to have the () at the end.
| Suppose this function were called on arguments that are translated to the following in SQL: '?a', '?b', '?c' | ||
| Then the final SQL text for this function call would be: 'ABS(?b - ?a) <= ?c' |
There was a problem hiding this comment.
I think this part might a verbose-only thing
Extends
pydough.explain()so it can explainExtends
pydough.expalin_term()