[CALCITE-7428] Support regexp function change regexp operator for Hive library #4818

Open: cjj2010 wants to merge 5 commits into apache:main from cjj2010:CALCITE-7428


@cjj2010 commented Mar 4, 2026

@mihaibudiu (Contributor):

There is a question in JIRA: the documentation you link to shows REGEXP as an infix operator, but you are implementing support for a function.

```java
case TRIM:
  RelToSqlConverterUtil.unparseHiveTrim(writer, call, leftPrec, rightPrec);
  break;
case REGEXP:
```
Member:

Judging from your Jira information, you want to add a new function, right? If you add this function, the dialect won't need to be modified.

Author:

Yes, REGEXP is an infix operator in Hive, but Calcite already has a REGEXP function. If another REGEXP operator were added, parsing existing SQL would fail with the error "Incorrect syntax near the keyword 'REGEXP'". So my idea is to unparse the REGEXP function as the infix REGEXP operator in the Hive dialect. I'm not sure whether this is correct.

Member:

Based on your Jira description, is it true that `select brand_name from product where REGEXP(brand_name,'[a-zA-Z]')` does not work in Hive, and must be converted to `select brand_name from product where brand_name REGEXP '[a-zA-Z]'`?
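
A minimal sketch of the conversion being asked about, using a toy expression model instead of Calcite's SqlNode/SqlWriter classes (Expr, Identifier, Literal, Call and unparse are hypothetical names): the same REGEXP call prints in function form by default and as an infix REGEXP when targeting Hive.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Toy model of unparsing REGEXP(value, pattern); all names are hypothetical, not Calcite APIs. */
final class RegexpUnparseSketch {
  interface Expr {}
  record Identifier(String name) implements Expr {}
  record Literal(String value) implements Expr {}
  record Call(String operator, List<Expr> operands) implements Expr {}

  /** Prints an expression; when hive is true, a two-operand REGEXP call is printed infix. */
  static String unparse(Expr e, boolean hive) {
    if (e instanceof Identifier id) {
      return id.name();
    }
    if (e instanceof Literal lit) {
      // Quote string literals, doubling any embedded single quotes.
      return "'" + lit.value().replace("'", "''") + "'";
    }
    Call call = (Call) e;
    if (hive && call.operator().equals("REGEXP") && call.operands().size() == 2) {
      // Hive form: value REGEXP pattern
      return unparse(call.operands().get(0), true) + " REGEXP "
          + unparse(call.operands().get(1), true);
    }
    // Default function form: NAME(arg1, arg2, ...)
    return call.operator() + "(" + call.operands().stream()
        .map(arg -> unparse(arg, hive))
        .collect(Collectors.joining(", ")) + ")";
  }

  public static void main(String[] args) {
    Expr where = new Call("REGEXP",
        List.of(new Identifier("brand_name"), new Literal("[a-zA-Z]")));
    System.out.println("select brand_name from product where " + unparse(where, false));
    System.out.println("select brand_name from product where " + unparse(where, true));
  }
}
```

Calcite's real unparser would also apply the dialect's identifier quoting, which this sketch ignores.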

Author:

Yes

Member:

Okay, please describe it in detail in Jira; the current description is not very clear.

Author:

Thank you for the suggestion. I have revised the description to make it more precise.

Member:

Another point: as I understand the Jira discussion, there is no need to introduce a new SqlKind.

```java
OTHER_DDL,

/** The {@code REGEXP} function. */
REGEXP;
```
Member:

Why make this change? If this is only a dialect conversion, the dialect can identify the call by its SqlOperator.
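
To illustrate the reviewer's point, here is a toy sketch (Kind, Operator, Call and unparseForHive are hypothetical; Calcite's real types are SqlKind, SqlOperator and SqlCall): even when two operators share one kind, the dialect can recognize REGEXP by comparing the operator instance, so no new kind is required.

```java
import java.util.List;

/** Hypothetical model: dispatch on operator identity instead of adding a new kind. */
final class OperatorDispatchSketch {
  enum Kind { RLIKE, OTHER_FUNCTION }

  /** Stands in for an operator singleton; compared by reference, like a static operator field. */
  static final class Operator {
    final String name;
    final Kind kind;

    Operator(String name, Kind kind) {
      this.name = name;
      this.kind = kind;
    }
  }

  // RLIKE and REGEXP share Kind.RLIKE, so the kind alone cannot tell them apart.
  static final Operator RLIKE = new Operator("RLIKE", Kind.RLIKE);
  static final Operator REGEXP = new Operator("REGEXP", Kind.RLIKE);
  static final Operator UPPER = new Operator("UPPER", Kind.OTHER_FUNCTION);

  record Call(Operator op, List<String> operands) {}

  static String unparseForHive(Call call) {
    List<String> args = call.operands();
    if (call.op() == REGEXP) {
      // Identity check runs first: only the REGEXP operator gets the infix REGEXP keyword.
      return args.get(0) + " REGEXP " + args.get(1);
    }
    if (call.op().kind == Kind.RLIKE) {
      // Kind-based dispatch alone would have printed REGEXP calls as RLIKE too.
      return args.get(0) + " RLIKE " + args.get(1);
    }
    return call.op().name + "(" + String.join(", ", args) + ")";
  }

  public static void main(String[] args) {
    List<String> operands = List.of("brand_name", "'[a-zA-Z]'");
    System.out.println(unparseForHive(new Call(REGEXP, operands)));
    System.out.println(unparseForHive(new Call(RLIKE, operands)));
    System.out.println(unparseForHive(new Call(UPPER, List.of("brand_name"))));
  }
}
```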

Author:

Could we change the function definition to the following?

```java
SqlBasicFunction.create(SqlKind.RLIKE, ReturnTypes.BOOLEAN_NULLABLE,
    OperandTypes.STRING_STRING);
```

With SqlKind.RLIKE, it seems HiveSqlDialect would still need some kind to decide when to apply the conversion. I am not sure I understand this correctly.
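
For reference, the proposed signature takes two string operands and returns a BOOLEAN that is nullable when either operand is nullable. A sketch of the matching runtime behavior, assuming Hive's documented RLIKE/REGEXP semantics (NULL if either argument is NULL, otherwise TRUE if any substring of the value matches the Java regular expression); the class and method names are hypothetical, not Calcite code.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of REGEXP/RLIKE evaluation semantics. */
final class RegexpSemanticsSketch {
  /** Returns null if either argument is null, else whether any substring of value matches pattern. */
  static Boolean regexp(String value, String pattern) {
    if (value == null || pattern == null) {
      return null; // null in, null out, consistent with a nullable BOOLEAN result type
    }
    // find() succeeds on a partial match; matches() would require the whole string to match.
    return Pattern.compile(pattern).matcher(value).find();
  }

  public static void main(String[] args) {
    System.out.println(regexp("Acme", "[a-zA-Z]"));
    System.out.println(regexp("1234", "[a-zA-Z]"));
    System.out.println(regexp(null, "[a-zA-Z]"));
  }
}
```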

Member:

I think you can refer to the suggestions in Jira.

Author:

I have revised and pushed the code. Could you please review it again? Thank you.


@xuzifu666 (Member) left a comment:

Left some comments.

```java
public static void unparseRegexp(SqlWriter writer, SqlCall call, int leftPrec, int rightPrec) {
  if (call.operandCount() != 2) {
    throw new IllegalArgumentException("REGEXP operator requires exactly 2 operands");
```
Member:

This line of code was not covered by tests.

Author:

I tried executing REGEXP("brandname"): the error is raised during SQL validation, so the call never reaches the dialect's unparse step. I think I should remove this check from the code.
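
A toy sketch of why the check is unreachable, assuming a hypothetical two-phase pipeline in place of Calcite's validator and dialect (validateRegexp, unparseHiveRegexp and process are made-up names): an operand-type check in the spirit of OperandTypes.STRING_STRING rejects a one-argument REGEXP first, so the unparse helper that runs afterwards can read its two operands directly.

```java
import java.util.List;

/** Hypothetical two-phase pipeline: validation first, then dialect-specific unparse. */
final class ValidateThenUnparseSketch {
  /** Phase 1 (stands in for validation): require exactly two string operands. */
  static void validateRegexp(List<?> operands) {
    if (operands.size() != 2) {
      throw new IllegalArgumentException("REGEXP expects 2 operands, got " + operands.size());
    }
    for (Object operand : operands) {
      if (!(operand instanceof String)) {
        throw new IllegalArgumentException("REGEXP expects string operands");
      }
    }
  }

  /** Phase 2 (stands in for dialect unparse): no arity check, validation already ran. */
  static String unparseHiveRegexp(List<?> operands) {
    return operands.get(0) + " REGEXP " + operands.get(1);
  }

  /** Runs both phases in order; invalid input never reaches the unparse step. */
  static String process(List<?> operands) {
    validateRegexp(operands);
    return unparseHiveRegexp(operands);
  }

  public static void main(String[] args) {
    System.out.println(process(List.of("brand_name", "'[a-zA-Z]'")));
    try {
      process(List.of("brandname"));
    } catch (IllegalArgumentException e) {
      System.out.println("rejected before unparse: " + e.getMessage());
    }
  }
}
```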


```diff
 /** The "REGEXP(value, regexp)" function, equivalent to {@link #RLIKE}. */
-@LibraryOperator(libraries = {SPARK})
+@LibraryOperator(libraries = {SPARK, HIVE})
```
Member:

Since it's been added to libraries in this PR, Hive should also be added to SqlOperatorTest.

Author:

done
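
Why Hive needs its own test coverage can be sketched with a hypothetical annotation that mirrors the idea behind @LibraryOperator (Library, Libraries and isAvailable are made-up names, not Calcite's SqlLibrary or its test fixtures): an operator is visible only in the libraries it is tagged with, so each tagged library deserves a check.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Arrays;

/** Hypothetical mirror of library-gated operators; not Calcite's LibraryOperator. */
final class LibraryGatingSketch {
  enum Library { STANDARD, SPARK, HIVE, MYSQL }

  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.FIELD)
  @interface Libraries {
    Library[] value();
  }

  /** Like the change in this PR: the operator is tagged with both SPARK and HIVE. */
  @Libraries({Library.SPARK, Library.HIVE})
  static final String REGEXP = "REGEXP";

  /** Returns true if the named static field is tagged with the given library. */
  static boolean isAvailable(String fieldName, Library library) {
    try {
      Field field = LibraryGatingSketch.class.getDeclaredField(fieldName);
      Libraries tag = field.getAnnotation(Libraries.class);
      return tag != null && Arrays.asList(tag.value()).contains(library);
    } catch (NoSuchFieldException e) {
      throw new IllegalArgumentException("No operator field named " + fieldName, e);
    }
  }

  public static void main(String[] args) {
    // In the spirit of the review comment: check every library the operator may belong to.
    for (Library lib : Library.values()) {
      System.out.println(lib + ": " + isAvailable("REGEXP", lib));
    }
  }
}
```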

@cjj2010 changed the title from "[CALCITE-7428] Add regexp function (enabled in Hive library)" to "[CALCITE-7428] Support regexp function change regexp operator for Hive library" on Mar 13, 2026


3 participants