Skip to content

ARROW-12948: [C++][Python] Add slice_replace kernel#10494

Closed
lidavidm wants to merge 3 commits intoapache:masterfrom
lidavidm:arrow-12948
Closed

ARROW-12948: [C++][Python] Add slice_replace kernel#10494
lidavidm wants to merge 3 commits intoapache:masterfrom
lidavidm:arrow-12948

Conversation

@lidavidm
Copy link
Copy Markdown
Member

@lidavidm lidavidm commented Jun 9, 2021

This adds a slice_replace kernel mimicking Pandas's str.slice_replace. There are both ascii and UTF8 variants, indexing respectively with bytes and codepoints. The ascii variant also works on binary arrays.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 9, 2021

@lidavidm
Copy link
Copy Markdown
Member Author

lidavidm commented Jun 9, 2021

Needs rebasing onto #10496.

@lidavidm lidavidm marked this pull request as ready for review June 9, 2021 20:03
Copy link
Copy Markdown
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, this looks good on the principle.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably be called binary_replace_slice since it works on non-Ascii input as well (it just slices in byte units, not codeunits).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"binary string"?

Comment thread cpp/src/arrow/compute/api_scalar.h Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... I'm not sure the default values will be picked up. Is it just for documentation?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opts.start can be negative, so you should perhaps use before_slice instead.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I added a test case for this too.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a bit pedantic. Just output - output_start?

Comment thread python/pyarrow/tests/test_compute.py Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not really useful to re-write the same tests in Python. What you could do is generate slices as in test_slice_compatibility to check Pandas compatibility.

Comment thread docs/source/cpp/compute.rst Outdated
+--------------------------+------------+-------------------------+------------------------+---------+---------------------------------------+
| utf8_lower | Unary | String-like | String-like | \(8) | |
+--------------------------+------------+-------------------------+------------------------+---------+---------------------------------------+
| utf8_replace_slice | Unary | String-like | String-like | \(2) | :struct:`ReplaceSliceOptions` |
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the note number should be (4)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants