Skip to content

Cleanup regex_expressions.rs to remove _regexp_match function #9107

Merged
alamb merged 5 commits intoapache:mainfrom
Omega359:feature/regexp_cleanup
Feb 3, 2024
Merged

Cleanup regex_expressions.rs to remove _regexp_match function #9107
alamb merged 5 commits intoapache:mainfrom
Omega359:feature/regexp_cleanup

Conversation

@Omega359
Copy link
Copy Markdown
Contributor

@Omega359 Omega359 commented Feb 2, 2024

Which issue does this PR close?

Closes #9106

Rationale for this change

Code cleanup.

What changes are included in this PR?

Code removal, minor code cleanup, toml file updates.

Are these changes tested?

Yes

Are there any user-facing changes?

No.

@github-actions github-actions Bot added the physical-expr Changes to the physical-expr crates label Feb 2, 2024
Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Omega359 -- the code looks great. I am running some benchmarks just to be sure

Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viirya fixed the problem in #8631

I ran some of the queries from that PR:

SELECT COUNT(*) FROM 'index' WHERE
    ARRAY_LENGTH(
      REGEXP_MATCH(path, '\\.(asm|c|cc|cpp|cxx|h|hpp|rs|[Ff][0-9]{0,2}(?:or)?|go)$')
    ) > 0;

on 7.2 GB of parquet data

$ du -s -h index
7.2G	index

Main (without this PR):

1 row in set. Query took 4.820 seconds.

This PR:

1 row in set. Query took 4.760 seconds.

So basically the same.

Thank you @Omega359 -- very nice cleanup 🧹

Comment thread datafusion/physical-expr/src/regex_expressions.rs
Comment thread datafusion/physical-expr/src/regex_expressions.rs Outdated
Comment thread datafusion/physical-expr/src/regex_expressions.rs Outdated
Omega359 and others added 3 commits February 2, 2024 19:09
Removed extraneous line as per code review suggestion

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Removed extraneous lines as per code review

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
@alamb
Copy link
Copy Markdown
Contributor

alamb commented Feb 3, 2024

Thank you @Omega359 and @viirya 🚀

@alamb alamb merged commit 9d1502b into apache:main Feb 3, 2024
@Omega359 Omega359 deleted the feature/regexp_cleanup branch February 13, 2024 15:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-expr Changes to the physical-expr crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cleanup regex_expressions.rs to remove _regexp_match function

3 participants