Implement spath command with field resolution#5028

Merged
ykmr1224 merged 15 commits into opensearch-project:main from ykmr1224:spath/field-resolution
Jan 15, 2026

@ykmr1224
Collaborator

@ykmr1224 ykmr1224 commented Jan 7, 2026

Description

  • Implement the spath command with field resolution.
  • This change is step 1 (the limited version of spath) in [RFC] Support Schema-on-Read in PPL #4984.
  • Identify the fields referred to in the query and extract only those fields from JSON.
    • Raise an error when the referred fields cannot be identified (for example, when there is no fields command) or when a wildcard is used.
  • All fields are extracted as STRING regardless of the actual value type in JSON.
    • Later this would be extended to ANY, but that requires functions to handle ANY well.
  • Originally planned to use the mvappend function, but added a new internal append function to avoid returning a single-element array. mvappend always produces an array regardless of array size, which is consistent with other array functions in PPL. append returns a scalar value when the result is a single element, similar to Splunk's behavior, where a single-element array is automatically converted to a scalar value.
    • This prevents fields that are not merged by the spath command from becoming arrays.
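
The difference between the two append flavors described above can be illustrated with a small sketch. This is a toy model, not the code in this PR: the class and method names (`AppendSketch`, `mvappendLike`, `appendLike`) are hypothetical, and skipping null arguments is an assumption made for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of the two append flavors; hypothetical names, not the PR's classes. */
final class AppendSketch {

  /** Flattens the arguments: lists contribute their elements; nulls are skipped (assumption). */
  static List<Object> collect(Object... args) {
    List<Object> out = new ArrayList<>();
    for (Object arg : args) {
      if (arg == null) {
        continue;
      }
      if (arg instanceof List<?>) {
        for (Object element : (List<?>) arg) {
          if (element != null) {
            out.add(element);
          }
        }
      } else {
        out.add(arg);
      }
    }
    return out;
  }

  /** mvappend-like behavior: the result is always a list, whatever its size. */
  static List<Object> mvappendLike(Object... args) {
    return collect(args);
  }

  /** append-like behavior: null for no elements, a scalar for one, a list otherwise. */
  static Object appendLike(Object... args) {
    List<Object> all = collect(args);
    if (all.isEmpty()) {
      return null;
    }
    return all.size() == 1 ? all.get(0) : all;
  }

  public static void main(String[] args) {
    System.out.println(mvappendLike("a"));                          // [a]
    System.out.println(appendLike("a"));                            // a
    System.out.println(appendLike("a", Arrays.asList("b", "c")));   // [a, b, c]
    System.out.println(appendLike());                               // null
  }
}
```

With the append-like variant, a field that receives only one value stays a scalar, which is the property the description relies on to keep unmerged fields from turning into arrays.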

Related Issues

#4984

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@ykmr1224 ykmr1224 added PPL Piped processing language feature calcite calcite migration releated labels Jan 7, 2026
@coderabbitai
Contributor

coderabbitai Bot commented Jan 7, 2026

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Experimental spath mode: automatic field-resolution when no explicit path is provided; improved field-resolution across queries.
    • New append behavior and internal append operator: returns null, scalar, or list based on inputs; MV append semantics preserved.
  • Bug Fixes

    • JSON_EXTRACT_ALL: accepts string or array inputs and treats empty objects as null.
  • Documentation

    • Expanded spath docs with dual modes, constraints, and examples.
  • Tests

    • Extensive new and updated unit/integration tests for spath, append, field resolution, JSON extraction, and utilities.

Walkthrough

Adds a field-resolution framework (context, result model, visitor) used by Calcite planning for SPATH extract-all flows, AST and debug utilities, internal append UDF refactor/registration, JSON_EXTRACT_ALL input handling improvements, parser/docs/test updates, and many new/expanded tests.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Field resolution core: core/src/main/java/org/opensearch/sql/ast/analysis/FieldResolutionContext.java, core/src/main/java/org/opensearch/sql/ast/analysis/FieldResolutionResult.java, core/src/main/java/org/opensearch/sql/ast/analysis/FieldResolutionVisitor.java | New stack-based context, FieldResolutionResult (regular fields + wildcard combinators), and visitor computing required input fields per UnresolvedPlan. |
| Calcite integration & SPATH execution: core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java, core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java, core/src/main/java/org/opensearch/sql/executor/QueryService.java | CalcitePlanContext gains rootNode and resolveFields; QueryService sets rootNode; CalciteRelNodeVisitor uses AstNodeUtils and FieldResolutionResult and implements extract-all SPATH flow. |
| AST & debug utilities: core/src/main/java/org/opensearch/sql/ast/AstNodeUtils.java, common/src/main/java/org/opensearch/sql/common/utils/DebugUtils.java | Added AstNodeUtils.containsSubqueryExpression and DebugUtils.debug (guarded by IS_DEBUG). |
| Append UDFs & registration: core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/AppendCore.java, .../AppendFunctionImpl.java, .../MVAppendFunctionImpl.java, .../MapAppendFunctionImpl.java, core/src/main/java/org/opensearch/sql/expression/function/BuiltinFunctionName.java, core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java, core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java | Renamed MVAppendCore → AppendCore; collectElements can now return null/single/list; added internal AppendFunctionImpl and INTERNAL_APPEND registration; callers updated. |
| JSON_EXTRACT_ALL input handling: core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtractAllFunctionImpl.java | Accepts STRING or ARRAY operands, concatenates arrays to strings, normalizes empty maps to null, and updates operand checker. |
| PPL parser & SPATH validation: ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java | SPATH validation now requires path only when out (output) is specified; clearer error when out provided without path. |
| Tests (new & expanded): multiple files under core/src/test, ppl/src/test, integ-test/src/test (e.g., FieldResolutionResultTest.java, AppendFunctionImplTest.java, FieldResolutionVisitorTest.java, CalcitePPLSpathCommandIT.java) | Many new/expanded unit and integration tests covering field resolution, append behavior, JSON extraction, SPATH functionality, explain plans, and reflective coverage. |
| Test assertion style & utilities: core/src/test/.../MapAppendFunctionImplTest.java, integ-test/src/test/java/org/opensearch/sql/calcite/standalone/MapAppendFunctionIT.java, integ-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java | Switched some assertions to Hamcrest; added MatcherUtils.array(Object...) helper. |
| Docs, test data & expected outputs: docs/user/ppl/cmd/spath.md, docs/dev/ppl-commands.md, doctest/test_data/structured.json, integ-test/src/test/resources/expectedOutput/.../explain_spath_without_path.yaml | SPATH docs updated for dual extraction modes and limitations; test data expanded; explain-plan expected outputs added. |
| Coverage validation: ppl/src/test/java/org/opensearch/sql/ppl/parser/FieldResolutionVisitorCoverageTest.java | Reflection test asserting FieldResolutionVisitor overrides CalciteRelNodeVisitor visit methods (excluding known unsupported commands). |
| Misc small edits: core/src/main/java/org/opensearch/sql/expression/function/FunctionResolver.java, opensearch/src/main/java/org/opensearch/sql/opensearch/request/OpenSearchRequest.java | Javadoc tweaks and minor import/format changes. |
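
The SPATH validation rule summarized above (path is required only when an output name is given) can be sketched roughly as follows. This is an illustrative assumption, not the AstBuilder code: the class name, the `Mode` enum, the method signature, and the error text are all hypothetical.

```java
/** Hypothetical sketch of the spath argument rule; not the actual AstBuilder logic. */
final class SpathArgsSketch {

  /** The two extraction modes the spath docs describe. */
  enum Mode { PATH_BASED, FIELD_RESOLUTION }

  /**
   * Picks the extraction mode from the optional arguments.
   * A null argument means "not specified" in this sketch.
   */
  static Mode resolveMode(String output, String path) {
    if (path != null) {
      return Mode.PATH_BASED;
    }
    if (output != null) {
      // An output name only makes sense for a single extracted path.
      throw new IllegalArgumentException("'output' requires 'path' to be specified");
    }
    return Mode.FIELD_RESOLUTION;
  }

  public static void main(String[] args) {
    System.out.println(resolveMode(null, "a.b"));   // PATH_BASED
    System.out.println(resolveMode(null, null));    // FIELD_RESOLUTION
  }
}
```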

Sequence Diagram(s)

sequenceDiagram
    participant QS as QueryService
    participant CP as CalcitePlanContext
    participant FRV as FieldResolutionVisitor
    participant CRV as CalciteRelNodeVisitor
    participant AST as UnresolvedPlan
    participant ES as OpenSearch

    QS->>CP: setRootNode(plan)
    QS->>CRV: analyze(plan)
    CRV->>CP: resolveFields(target)
    CP->>FRV: analyze(rootNode)
    FRV->>AST: traverse -> compute Map<UnresolvedPlan,FieldResolutionResult>
    FRV-->>CP: return map
    CP-->>CRV: FieldResolutionResult for target
    alt spath without path (extract-all)
        CRV->>ES: execute JSON_EXTRACT_ALL + ITEM + CAST per resolved field
        ES-->>CRV: rows
    else spath with path
        CRV->>ES: execute path-based extraction
        ES-->>CRV: rows
    end
    CRV-->>QS: generated RelNode / plan
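
The field-resolution step in the diagram above can be pictured with a deliberately simplified model. The sketch below walks a linear pipeline backwards and works out which fields `spath` would need to extract. It is only a toy under stated assumptions: the real visitor works on the AST with a stack-based context, while the `Cmd` class, the three supported command kinds, and the error messages here are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy backward pass over a linear pipeline; hypothetical, not the PR's visitor. */
final class FieldResolutionSketch {

  /** A command with string arguments. For "eval": args[0] is the target, the rest are inputs. */
  static final class Cmd {
    final String kind;
    final List<String> args;

    Cmd(String kind, String... args) {
      this.kind = kind;
      this.args = Arrays.asList(args);
    }
  }

  /** Returns the fields spath must extract, or throws when they cannot be identified. */
  static Set<String> fieldsForSpath(List<Cmd> pipeline) {
    Set<String> required = null; // null means "no downstream command has narrowed the fields"
    for (int i = pipeline.size() - 1; i >= 0; i--) {
      Cmd cmd = pipeline.get(i);
      switch (cmd.kind) {
        case "fields": {
          if (required == null) {
            required = new LinkedHashSet<>(cmd.args);
          }
          break;
        }
        case "eval": {
          // If the eval target is needed downstream, its inputs are needed upstream instead.
          if (required != null && required.remove(cmd.args.get(0))) {
            required.addAll(cmd.args.subList(1, cmd.args.size()));
          }
          break;
        }
        case "spath": {
          if (required == null) {
            throw new IllegalStateException("cannot identify the fields to extract");
          }
          for (String field : required) {
            if (field.contains("*")) {
              throw new IllegalStateException("wildcards are not supported: " + field);
            }
          }
          return required;
        }
        default:
          throw new IllegalArgumentException("unsupported in this sketch: " + cmd.kind);
      }
    }
    return new LinkedHashSet<>(); // no spath in the pipeline
  }

  public static void main(String[] args) {
    // spath | eval x = a * b | fields x, c
    List<Cmd> pipeline = Arrays.asList(
        new Cmd("spath"), new Cmd("eval", "x", "a", "b"), new Cmd("fields", "x", "c"));
    System.out.println(fieldsForSpath(pipeline)); // [c, a, b]
  }
}
```

A pipeline that ends in `eval` without any `fields` command, or one that selects `b*`, makes this sketch throw, mirroring the two error cases listed in the PR description.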

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

enhancement, testing

Suggested reviewers

  • penghuo
  • ps48
  • kavithacm
  • derek-ho
  • joshuali925
  • GumpacG
  • Swiddis
  • mengweieric
  • anirudha
  • qianheng-aws
  • Yury-Fridlyand
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 4.30% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'Implement spath command with field resolution' clearly and concisely summarizes the main objective of the pull request. |
| Description check | ✅ Passed | The description is directly related to the changeset, explaining the purpose (field resolution for spath command), rationale, and implementation details. |

* result is used to support `spath` command (Field Resolution-based Extraction) and schema-on-read
* support in the future.
*/
public class FieldResolutionVisitor extends AbstractNodeVisitor<Node, FieldResolutionContext> {
Collaborator

What if we traverse the RelNode twice instead? At least RelNode does not change very much.

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@docs/user/ppl/cmd/spath.md`:
- Line 245: Update the sentence under the "**Important**" note that currently
reads "It raises error if extracted fields cannot be identified by following
commands" to include the missing article: change it to "It raises an error if
extracted fields cannot be identified by the following commands" (locate the
"**Important**:" paragraph in spath.md).
- Line 166: The sentence describing extraction behavior is awkward; rephrase the
clause referencing field `c` to read more clearly—e.g., state that when an
extracted field already exists (as with `c` in the example), additional
extracted values are appended to form an array; keep the rest of the sentence
noting that only `a`, `b`, and `c` are extracted from the JSON in `doc_multi`
and all extracted fields are returned as STRING type.
- Around line 38-42: Fix grammar and clarity in the "Field Resolution-based
Extraction Notes" block: change "downstream commands requirements" to
"downstream commands' requirements" (or "downstream command's requirements"),
change "raises error" to "raises an error", rephrase "Filter with query (`where
<field> in/exists [...]` ) is not supported after `spath` command" to "Filtering
with queries (e.g., WHERE clauses using IN/EXISTS subqueries) is not supported
after the `spath` command", and ensure parallel phrasing for the other
limitation bullets (e.g., "Cannot use wildcards (`*`) in field selection" and
"All extracted fields are returned as STRING type") so all bullets use
consistent tense and punctuation.
🧹 Nitpick comments (3)
docs/user/ppl/cmd/spath.md (3)

18-18: Clarify what "Experimental" means for users.

The label "(Experimental)" is used but doesn't explain the implications for users. Consider adding a brief note about what experimental status means (e.g., "subject to change in future releases" or "may have incomplete functionality").

Example addition after line 18:

### Field Resolution-based Extraction (Experimental)

> **Note**: This feature is experimental and may change in future releases. It provides limited field extraction as an interim solution until full dynamic field extraction is implemented.

248-259: Improve clarity of error messages in Example 9.

The error comments could be more specific to help users understand the issues:

  • Line 250: "Requires field selection" is vague—consider "Requires explicit field selection via fields or stats command"
  • Line 258: "cannot extract arbitrary fields" is misleading since the actual limitation is about wildcards, not arbitrary field names. Consider: "Wildcards are not supported in field resolution mode"
📝 Proposed improvements

```ppl
 source=structured
 | spath input=doc_multi
-| eval x = a * b  # ERROR: Requires field selection (fields or stats command)
+| eval x = a * b  # ERROR: Requires explicit field selection command (fields or stats) after spath
```

Important: Wildcards are not supported in field resolution mode:

```ppl
 source=structured
 | spath input=doc_multi
-| fields a, b*  # ERROR: Spath command cannot extract arbitrary fields
+| fields a, b*  # ERROR: Wildcards not supported with field resolution-based spath
```

---

`261-291`: **Consider making performance guidance more prominent.**

Example 10 contains critical performance information that significantly impacts query efficiency and system load. Since this is essential guidance rather than just another example, consider:

1. Moving this to a dedicated "Performance Considerations" section at the top-level (not numbered as an example)
2. Cross-referencing it from the performance warning at line 8
3. Or at minimum, using a more prominent heading like "⚠️ Performance Best Practices"

This would help ensure users don't overlook critical performance implications when using `spath`.

<details>
<summary>📜 Review details</summary>

**Configuration used**: Path: .coderabbit.yaml

**Review profile**: CHILL

**Plan**: Pro

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between f3e597aea4538759deb46857caefc4473c5bc3b1 and 97c297a155da2f5ab5e0ad8e3ae84acfe0f8b8f3.

</details>

<details>
<summary>📒 Files selected for processing (1)</summary>

* `docs/user/ppl/cmd/spath.md`

</details>

<details>
<summary>🧰 Additional context used</summary>

<details>
<summary>🧠 Learnings (1)</summary>

<details>
<summary>📓 Common learnings</summary>

Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes


Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor for Calcite integration


</details>

</details>

</details>

</details>

@ykmr1224 ykmr1224 merged commit 65baa2a into opensearch-project:main Jan 15, 2026
38 of 41 checks passed
@opensearch-trigger-bot
Contributor

The backport to 2.19-dev failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.19-dev 2.19-dev
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.19-dev
# Create a new branch
git switch --create backport/backport-5028-to-2.19-dev
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 65baa2a427faf145a44dc22d62b2d179f20fb3d3
# Push it to GitHub
git push --set-upstream origin backport/backport-5028-to-2.19-dev
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.19-dev

Then, create a pull request where the base branch is 2.19-dev and the compare/head branch is backport/backport-5028-to-2.19-dev.

Comment thread docs/dev/ppl-commands.md
- [ ] **Visitor Pattern:**
- Add `visit*` in `AbstractNodeVisitor`
- Overriding `visit*` in `Analyzer`, `CalciteRelNodeVisitor` and `PPLQueryDataAnonymizer`
- Override `visit*` in `FieldResolutionVisitor` for `spath` command support.
Member

@ykmr1224 can you add an IT (or a guide on how to add the necessary IT) to ensure new commands work well with spath?
It is not clear how a new command impacts spath; otherwise a developer just has to add the following code in FieldResolutionVisitor to pass the UT CI.

  @Override
  public Node visit*(Transpose node, FieldResolutionContext context) {
    visitChildren(node, context);
    return node;
  }

Collaborator Author

FieldResolutionVisitorCoverageTest will detect if visitXXX is implemented in CalciteRelNodeVisitor but not in FieldResolutionVisitor (this works as a safeguard against forgetting to implement it).
Let me add an implementation guide doc based on #4984 in a follow-up PR, but it might be better to delay that until schema-on-read is fully implemented.
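
The kind of reflection check described here can be sketched as below. This is a minimal illustration under assumptions, not the contents of FieldResolutionVisitorCoverageTest: `PlanVisitor` and `ResolutionVisitor` are hypothetical stand-ins for CalciteRelNodeVisitor and FieldResolutionVisitor, and the comparison is by method name only.

```java
import java.lang.reflect.Method;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of a visit-method coverage check using reflection. */
final class VisitorCoverageSketch {

  /** Stand-in for the visitor that defines which commands are supported. */
  static class PlanVisitor {
    public Object visitEval(Object node) { return node; }
    public Object visitFields(Object node) { return node; }
    public Object visitTranspose(Object node) { return node; }
  }

  /** Stand-in for the visitor that must keep up with PlanVisitor. */
  static class ResolutionVisitor {
    public Object visitEval(Object node) { return node; }
    public Object visitFields(Object node) { return node; }
  }

  /** Names of the visit* methods declared directly on the given class. */
  static Set<String> visitMethods(Class<?> type) {
    Set<String> names = new TreeSet<>();
    for (Method method : type.getDeclaredMethods()) {
      if (method.getName().startsWith("visit") && !method.isSynthetic()) {
        names.add(method.getName());
      }
    }
    return names;
  }

  /** visit* methods present on the reference visitor but absent from the candidate. */
  static Set<String> missing(Class<?> reference, Class<?> candidate, Set<String> excluded) {
    Set<String> result = visitMethods(reference);
    result.removeAll(visitMethods(candidate));
    result.removeAll(excluded);
    return result;
  }

  public static void main(String[] args) {
    Set<String> none = Collections.emptySet();
    System.out.println(missing(PlanVisitor.class, ResolutionVisitor.class, none)); // [visitTranspose]
  }
}
```

A test built on this idea would fail whenever the returned set is non-empty, with an exclusion list for commands that are known to be unsupported.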

ykmr1224 added a commit to ykmr1224/sql that referenced this pull request Jan 22, 2026
* Implement spath command with field resolution

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Fix for test failure

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Another test fix

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Fix test failure

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Address comments

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Address comment

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Address comments

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Address comment

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Add test case, etc.

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Quick fix

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Fix javadoc and test

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Fix DebugUtils

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Minor fix

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Update doc

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

---------

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Signed-off-by: Tomoyuki MORITA <moritato@amazon.com>
(cherry picked from commit 65baa2a)
ykmr1224 added a commit that referenced this pull request Jan 23, 2026
… (#5066)

* Implement spath command with field resolution (#5028)

* Implement spath command with field resolution

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Fix for test failure

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Another test fix

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Fix test failure

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Address comments

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Address comment

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Address comments

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Address comment

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Add test case, etc.

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Quick fix

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Fix javadoc and test

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Fix DebugUtils

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Minor fix

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Update doc

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

---------

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Signed-off-by: Tomoyuki MORITA <moritato@amazon.com>
(cherry picked from commit 65baa2a)

* Fix for Java 11 syntax

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Fix for Java 11 syntax

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

---------

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Signed-off-by: Tomoyuki MORITA <moritato@amazon.com>
@LantaoJin LantaoJin added the backport-manually Filed a PR to backport manually. label Jan 27, 2026
@github-project-automation github-project-automation Bot moved this from Todo to Done in PPL 2026 Roadmap Jan 27, 2026
dai-chen added a commit to dai-chen/sql-1 that referenced this pull request Feb 11, 2026
…ject#5028)"

This reverts commit 65baa2a.

Signed-off-by: Chen Dai <daichen@amazon.com>
@dai-chen dai-chen mentioned this pull request Feb 11, 2026
8 tasks
dai-chen added a commit that referenced this pull request Feb 11, 2026
* Revert "Adopt appendcol, appendpipe, multisearch to spath (#5075)"

This reverts commit 7630db8.

Signed-off-by: Chen Dai <daichen@amazon.com>

* Revert "Support spath with dynamic fields (#5058)"

This reverts commit 633d760.

Signed-off-by: Chen Dai <daichen@amazon.com>

* Revert "Implement spath command with field resolution (#5028)"

This reverts commit 65baa2a.

Signed-off-by: Chen Dai <daichen@amazon.com>

* Fix failed IT testSpathWithMvCombine

Signed-off-by: Chen Dai <daichen@amazon.com>

---------

Signed-off-by: Chen Dai <daichen@amazon.com>
LantaoJin pushed a commit to LantaoJin/search-plugins-sql that referenced this pull request Feb 12, 2026
* Revert "Adopt appendcol, appendpipe, multisearch to spath (opensearch-project#5075)"

This reverts commit 7630db8.

Signed-off-by: Chen Dai <daichen@amazon.com>

* Revert "Support spath with dynamic fields (opensearch-project#5058)"

This reverts commit 633d760.

Signed-off-by: Chen Dai <daichen@amazon.com>

* Revert "Implement spath command with field resolution (opensearch-project#5028)"

This reverts commit 65baa2a.

Signed-off-by: Chen Dai <daichen@amazon.com>

* Fix failed IT testSpathWithMvCombine

Signed-off-by: Chen Dai <daichen@amazon.com>

---------

Signed-off-by: Chen Dai <daichen@amazon.com>
Labels

backport 2.19-dev backport-failed backport-manually Filed a PR to backport manually. calcite calcite migration releated feature PPL Piped processing language
