Implement spath command with field resolution #5028
ykmr1224 merged 15 commits into opensearch-project:main
📝 Walkthrough
Adds a field-resolution framework (context, result model, visitor) used by Calcite planning for SPATH extract-all flows, AST and debug utilities, an internal append UDF refactor/registration, JSON_EXTRACT_ALL input-handling improvements, parser/docs/test updates, and many new or expanded tests.
Sequence Diagram(s)
sequenceDiagram
participant QS as QueryService
participant CP as CalcitePlanContext
participant FRV as FieldResolutionVisitor
participant CRV as CalciteRelNodeVisitor
participant AST as UnresolvedPlan
participant ES as OpenSearch
QS->>CP: setRootNode(plan)
QS->>CRV: analyze(plan)
CRV->>CP: resolveFields(target)
CP->>FRV: analyze(rootNode)
FRV->>AST: traverse -> compute Map<UnresolvedPlan,FieldResolutionResult>
FRV-->>CP: return map
CP-->>CRV: FieldResolutionResult for target
alt spath without path (extract-all)
CRV->>ES: execute JSON_EXTRACT_ALL + ITEM + CAST per resolved field
ES-->>CRV: rows
else spath with path
CRV->>ES: execute path-based extraction
ES-->>CRV: rows
end
CRV-->>QS: generated RelNode / plan
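The flow in the diagram (resolve the fields that downstream commands need, then extract and cast only those) can be sketched roughly as follows. This is a minimal illustration and not the PR's implementation: `FieldResolutionSketch`, `Command`, `resolve`, and `project` are hypothetical names, and the toy pipeline model only knows `fields`-like and `eval`-like commands.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only: every name here is hypothetical and not taken from the PR. */
final class FieldResolutionSketch {

  /** A toy command that follows spath in the pipeline. */
  static final class Command {
    final boolean projection; // true for `fields`-like commands that fix the output field list
    final Set<String> reads; // fields the command consumes
    final Set<String> defines; // fields the command creates (e.g. the target of `eval`)

    private Command(boolean projection, Set<String> reads, Set<String> defines) {
      this.projection = projection;
      this.reads = reads;
      this.defines = defines;
    }

    static Command fields(String... names) {
      return new Command(true, new LinkedHashSet<>(List.of(names)), Set.of());
    }

    static Command eval(String target, String... inputs) {
      return new Command(false, new LinkedHashSet<>(List.of(inputs)), Set.of(target));
    }
  }

  /** Walks the commands after spath from last to first and collects the fields spath must extract. */
  static Set<String> resolve(List<Command> afterSpath) {
    Set<String> required = null; // null means "not constrained yet"
    for (int i = afterSpath.size() - 1; i >= 0; i--) {
      Command c = afterSpath.get(i);
      if (c.projection) {
        if (required == null) {
          required = new LinkedHashSet<>(c.reads);
        }
      } else if (required != null && required.removeAll(c.defines)) {
        // The defined field is needed downstream, so its inputs are needed upstream instead.
        required.addAll(c.reads);
      }
    }
    if (required == null) {
      throw new IllegalStateException("spath needs an explicit field selection downstream");
    }
    for (String f : required) {
      if (f.contains("*")) {
        throw new IllegalArgumentException("wildcard not supported: " + f);
      }
    }
    return required;
  }

  /** Keeps only the resolved fields from an extract-all result and casts each value to a string. */
  static Map<String, String> project(Map<String, Object> extracted, Set<String> fields) {
    Map<String, String> row = new LinkedHashMap<>();
    for (String f : fields) {
      Object v = extracted.get(f);
      row.put(f, v == null ? null : String.valueOf(v));
    }
    return row;
  }

  public static void main(String[] args) {
    // Models: ... | spath input=doc | eval x = a * b | fields x, c
    Set<String> fields = resolve(List.of(Command.eval("x", "a", "b"), Command.fields("x", "c")));
    Map<String, Object> extracted = Map.of("a", 1, "b", 2, "z", true);
    System.out.println(fields + " -> " + project(extracted, fields));
  }
}
```

Walking the pipeline backwards keeps the sketch small: a projection pins down the output fields, and each `eval` swaps a required target for the inputs it is computed from.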
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
15b3138 to a90e097
 * result is used to support `spath` command (Field Resolution-based Extraction) and schema-on-read
 * support in the future.
 */
public class FieldResolutionVisitor extends AbstractNodeVisitor<Node, FieldResolutionContext> {
What if we traverse the RelNode twice instead? At least the RelNode does not change much.
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@docs/user/ppl/cmd/spath.md`:
- Line 245: Update the sentence under the "**Important**" note that currently
reads "It raises error if extracted fields cannot be identified by following
commands" to include the missing article: change it to "It raises an error if
extracted fields cannot be identified by the following commands" (locate the
"**Important**:" paragraph in spath.md).
- Line 166: The sentence describing extraction behavior is awkward; rephrase the
clause referencing field `c` to read more clearly—e.g., state that when an
extracted field already exists (as with `c` in the example), additional
extracted values are appended to form an array; keep the rest of the sentence
noting that only `a`, `b`, and `c` are extracted from the JSON in `doc_multi`
and all extracted fields are returned as STRING type.
- Around line 38-42: Fix grammar and clarity in the "Field Resolution-based
Extraction Notes" block: change "downstream commands requirements" to
"downstream commands' requirements" (or "downstream command's requirements"),
change "raises error" to "raises an error", rephrase "Filter with query (`where
<field> in/exists [...]` ) is not supported after `spath` command" to "Filtering
with queries (e.g., WHERE clauses using IN/EXISTS subqueries) is not supported
after the `spath` command", and ensure parallel phrasing for the other
limitation bullets (e.g., "Cannot use wildcards (`*`) in field selection" and
"All extracted fields are returned as STRING type") so all bullets use
consistent tense and punctuation.
🧹 Nitpick comments (3)
docs/user/ppl/cmd/spath.md (3)
18-18: Clarify what "Experimental" means for users. The label "(Experimental)" is used but doesn't explain the implications for users. Consider adding a brief note about what experimental status means (e.g., "subject to change in future releases" or "may have incomplete functionality").
Example addition after line 18:
### Field Resolution-based Extraction (Experimental)
> **Note**: This feature is experimental and may change in future releases. It provides limited field extraction as an interim solution until full dynamic field extraction is implemented.
248-259: Improve clarity of error messages in Example 9. The error comments could be more specific to help users understand the issues:
- Line 250: "Requires field selection" is vague—consider "Requires explicit field selection via fields or stats command"
- Line 258: "cannot extract arbitrary fields" is misleading since the actual limitation is about wildcards, not arbitrary field names. Consider: "Wildcards are not supported in field resolution mode"
📝 Proposed improvements
```diff
 source=structured
 | spath input=doc_multi
-| eval x = a * b # ERROR: Requires field selection (fields or stats command)
+| eval x = a * b # ERROR: Requires explicit field selection command (fields or stats) after spath
```
Important: Wildcards are not supported in field resolution mode:
```diff
 source=structured
 | spath input=doc_multi
-| fields a, b* # ERROR: Spath command cannot extract arbitrary fields
+| fields a, b* # ERROR: Wildcards not supported with field resolution-based spath
```
261-291: Consider making performance guidance more prominent. Example 10 contains critical performance information that significantly impacts query efficiency and system load. Since this is essential guidance rather than just another example, consider:
1. Moving this to a dedicated "Performance Considerations" section at the top-level (not numbered as an example)
2. Cross-referencing it from the performance warning at line 8
3. Or at minimum, using a more prominent heading like "⚠️ Performance Best Practices"
This would help ensure users don't overlook critical performance implications when using `spath`.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between f3e597aea4538759deb46857caefc4473c5bc3b1 and 97c297a155da2f5ab5e0ad8e3ae84acfe0f8b8f3.
📒 Files selected for processing (1)
* `docs/user/ppl/cmd/spath.md`
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
The backport to …
To backport manually, run these commands in your terminal:
# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.19-dev 2.19-dev
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.19-dev
# Create a new branch
git switch --create backport/backport-5028-to-2.19-dev
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 65baa2a427faf145a44dc22d62b2d179f20fb3d3
# Push it to GitHub
git push --set-upstream origin backport/backport-5028-to-2.19-dev
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.19-dev
Then, create a pull request where the …
- [ ] **Visitor Pattern:**
  - Add `visit*` in `AbstractNodeVisitor`
  - Overriding `visit*` in `Analyzer`, `CalciteRelNodeVisitor` and `PPLQueryDataAnonymizer`
  - Override `visit*` in `FieldResolutionVisitor` for `spath` command support.
@ykmr1224 can you add an IT (or a guide on how to add the necessary IT) to ensure new commands work well with spath?
I'm not sure how a new command impacts spath; or else the developer has to add the following code to FieldResolutionVisitor to pass the UT CI.
@Override
public Node visitTranspose(Transpose node, FieldResolutionContext context) {
  // Pass-through: resolve the children and return the node unchanged.
  visitChildren(node, context);
  return node;
}
`FieldResolutionVisitorCoverageTest` will detect a `visitXXX` method that is implemented in `CalciteRelNodeVisitor` but not in `FieldResolutionVisitor` (it works as a safeguard against forgetting to implement it).
Let me add an implementation guide based on #4984 in a follow-up PR, though it might be better to delay it until schema-on-read is fully implemented.
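A coverage check of the kind described above could be written with reflection along these lines. This is only a sketch under stated assumptions: the actual `FieldResolutionVisitorCoverageTest` is not shown in this conversation, and `VisitorCoverageSketch`, `BaseVisitor`, `PlanningVisitor`, `ResolutionVisitor`, and `missingIn` are hypothetical stand-ins for the real classes.

```java
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative only: stand-in classes, not the visitors from the PR. */
final class VisitorCoverageSketch {

  /** Stand-in for a base visitor with default visit methods. */
  static class BaseVisitor {
    public Object visitProject(Object node) {
      return node;
    }

    public Object visitTranspose(Object node) {
      return node;
    }
  }

  /** Stand-in for the planning visitor: overrides every command. */
  static class PlanningVisitor extends BaseVisitor {
    @Override
    public Object visitProject(Object node) {
      return node;
    }

    @Override
    public Object visitTranspose(Object node) {
      return node;
    }
  }

  /** Stand-in for the resolution visitor: the Transpose override was forgotten. */
  static class ResolutionVisitor extends BaseVisitor {
    @Override
    public Object visitProject(Object node) {
      return node;
    }
  }

  /** Names of the visit* methods declared directly on the given class (inherited ones excluded). */
  static Set<String> declaredVisitMethods(Class<?> type) {
    Set<String> names = new TreeSet<>();
    for (Method m : type.getDeclaredMethods()) {
      if (m.getName().startsWith("visit") && !m.isSynthetic()) {
        names.add(m.getName());
      }
    }
    return names;
  }

  /** visit* methods that the reference class overrides but the candidate class does not. */
  static Set<String> missingIn(Class<?> reference, Class<?> candidate) {
    Set<String> missing = declaredVisitMethods(reference);
    missing.removeAll(declaredVisitMethods(candidate));
    return missing;
  }

  public static void main(String[] args) {
    System.out.println(missingIn(PlanningVisitor.class, ResolutionVisitor.class));
  }
}
```

A unit test built on this idea would assert that the returned set is empty, so a newly added command fails CI until the second visitor handles it.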
* Implement spath command with field resolution
  Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
* Fix for test failure
  Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
* Another test fix
  Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
* Fix test failure
  Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
* Address comments
  Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
* Address comment
  Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
* Address comments
  Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
* Address comment
  Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
* Add test case, etc.
  Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
* Quick fix
  Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
* Fix javadoc and test
  Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
* Fix DebugUtils
  Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
* Minor fix
  Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
* Update doc
  Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
---------
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Signed-off-by: Tomoyuki MORITA <moritato@amazon.com>
(cherry picked from commit 65baa2a)
… (#5066)
* Implement spath command with field resolution (#5028)
  * Implement spath command with field resolution
    Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
  * Fix for test failure
    Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
  * Another test fix
    Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
  * Fix test failure
    Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
  * Address comments
    Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
  * Address comment
    Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
  * Address comments
    Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
  * Address comment
    Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
  * Add test case, etc.
    Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
  * Quick fix
    Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
  * Fix javadoc and test
    Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
  * Fix DebugUtils
    Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
  * Minor fix
    Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
  * Update doc
    Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
  ---------
  Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
  Signed-off-by: Tomoyuki MORITA <moritato@amazon.com>
  (cherry picked from commit 65baa2a)
* Fix for Java 11 syntax
  Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
* Fix for Java 11 syntax
  Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
---------
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Signed-off-by: Tomoyuki MORITA <moritato@amazon.com>
…ject#5028)" This reverts commit 65baa2a. Signed-off-by: Chen Dai <daichen@amazon.com>
* Revert "Adopt appendcol, appendpipe, multisearch to spath (#5075)"
  This reverts commit 7630db8.
  Signed-off-by: Chen Dai <daichen@amazon.com>
* Revert "Support spath with dynamic fields (#5058)"
  This reverts commit 633d760.
  Signed-off-by: Chen Dai <daichen@amazon.com>
* Revert "Implement spath command with field resolution (#5028)"
  This reverts commit 65baa2a.
  Signed-off-by: Chen Dai <daichen@amazon.com>
* Fix failed IT testSpathWithMvCombine
  Signed-off-by: Chen Dai <daichen@amazon.com>
---------
Signed-off-by: Chen Dai <daichen@amazon.com>
* Revert "Adopt appendcol, appendpipe, multisearch to spath (opensearch-project#5075)"
  This reverts commit 7630db8.
  Signed-off-by: Chen Dai <daichen@amazon.com>
* Revert "Support spath with dynamic fields (opensearch-project#5058)"
  This reverts commit 633d760.
  Signed-off-by: Chen Dai <daichen@amazon.com>
* Revert "Implement spath command with field resolution (opensearch-project#5028)"
  This reverts commit 65baa2a.
  Signed-off-by: Chen Dai <daichen@amazon.com>
* Fix failed IT testSpathWithMvCombine
  Signed-off-by: Chen Dai <daichen@amazon.com>
---------
Signed-off-by: Chen Dai <daichen@amazon.com>
Description
- … `fields` command) or wildcard is used.
- … `mvappend` function, but added a new internal `append` function to avoid returning a single-element array. (`mvappend` always produces an array regardless of its size, which is consistent with other array functions in PPL. `append` returns a scalar value when the result has a single element, similar to Splunk's behavior, where a single-element array is automatically converted to a scalar value.)
- … `spath` command becoming an array.
Related Issues
#4984
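The difference between `mvappend` and the internal `append` described in the Description can be illustrated with a small sketch. It is not the PR's UDF code: `AppendSketch` and both method bodies are hypothetical, and skipping `null` inputs and returning `null` for an empty result are assumptions made only to keep the example complete.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: mimics the described semantics, not the actual UDF implementation. */
final class AppendSketch {

  /** mvappend-like: flattens its arguments and always returns a list, even for one element. */
  static List<Object> mvappend(Object... values) {
    List<Object> out = new ArrayList<>();
    for (Object v : values) {
      if (v == null) {
        continue; // assumption: null inputs are skipped
      }
      if (v instanceof List) {
        out.addAll((List<?>) v);
      } else {
        out.add(v);
      }
    }
    return out;
  }

  /** append-like: same merge, but a single-element result collapses to a scalar. */
  static Object append(Object... values) {
    List<Object> all = mvappend(values);
    if (all.isEmpty()) {
      return null; // assumption: nothing to append yields null
    }
    return all.size() == 1 ? all.get(0) : all;
  }

  public static void main(String[] args) {
    System.out.println(mvappend("a"));
    System.out.println(append("a"));
    System.out.println(append("a", List.of("b", "c")));
  }
}
```

With this shape, a field extracted once stays a scalar, and it only becomes an array when a second value for the same field name is appended.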
Check List
… `--signoff` or `-s`.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.