Skip to content

Conversation

@rmnskb
Copy link
Contributor

@rmnskb rmnskb commented Nov 8, 2025

Rationale for this change

Please see #32007, currently, neither arrays nor scalars support Python-native arithmetic operations, such as array + array, it has to be done via pyarrow.compute API. This PR strives to fix this with custom dunder methods.

What changes are included in this PR?

Implemented dunder methods

Are these changes tested?

Yes

Are there any user-facing changes?

Possibility to use Python operators directly instead of calling the pyarrow.compute API.

@rmnskb
Copy link
Contributor Author

rmnskb commented Nov 8, 2025

@pitrou @AlenkaF @rok
Sorry for pinging like this, I wanted to understand the scope correctly. So far, I went with the most straightforward implementation and added dunders for all operations that were listed in the compute docs and had respective dunders available. Do you think this implementation is enough or should I go for other dunders as well, e.g bitwise operations? After reviewing the original discussion, I've seen @jorisvandenbossche suggesting checked versions of the functions, I think I will go with them as well.
Regarding the documentation:

  1. Should I add docstrings to dunder methods? Given that user won't see them in their IDE, for example.
  2. Should the documentation for arrays/scalars be updated to reflect that the native operators are now supported?
    Thanks for your input!

@rok
Copy link
Member

rok commented Nov 11, 2025

Sorry for late reply @rmnskb. I think all dunders that semantically map to our existing kernels are fair game! I'm not sure what these would be, but the set you have now looks decently sized.

Should I add docstrings to dunder methods? Given that user won't see them in their IDE, for example.

We might be able to add docstrings via type annotations. Or perhaps it's possible to copy them at runtime from the respective kernels? (unsure that is possible). In any case we probably don't want to duplicate docstrings in general.

Should the documentation for arrays/scalars be updated to reflect that the native operators are now supported?

Yes, that would be good to have!

@rmnskb
Copy link
Contributor Author

rmnskb commented Nov 11, 2025

Sorry for late reply @rmnskb. I think all dunders that semantically map to our existing kernels are fair game! I'm not sure what these would be, but the set you have now looks decently sized.

Ok, I was also thinking about covering as many dunder methods as possible, but did not want to go out of scope for this issue.

We might be able to add docstrings via type annotations. Or perhaps it's possible to copy them at runtime from the respective kernels? (unsure that is possible). In any case we probably don't want to duplicate docstrings in general.

Copying the docstrings sounds like a good idea, I will look further into that.

@AlenkaF
Copy link
Member

AlenkaF commented Nov 12, 2025

Thank you for working on this @rmnskb and sorry for replying late!
Pinging us when you have a question is totally fine and in fact necessary ;)

I think the current scope is solid. I am thinking of basic comparisons and the discussion in the issue around __eq__. That would also be good to tackle but maybe as a separate PR?

Yes, it would be important to update the documentation (User Guide which is the part that is not API reference related).

One comment after looking at the code would be to add tests that cover various type combinations (array, scalar, Python types, unsupported types ...)

@rok
Copy link
Member

rok commented Nov 12, 2025

I think the current scope is solid. I am thinking of basic comparisons and the discussion in the issue around __eq__. That would also be good to tackle but maybe as a separate PR?

I think we already have __eq__: https://github.com/apache/arrow/pull/7737/files.

One comment after looking at the code would be to add tests that cover various type combinations (array, scalar, Python types, unsupported types ...)

That's a good point, __eq__ currently raises if the types of compared don't match. Good test coverage would also help find edge cases.

    def __eq__(self, other):
        try:
            return self.equals(other)
        except TypeError:
            # This also handles comparing with None
            # as Array.equals(None) raises a TypeError.
            return NotImplemented

@AlenkaF
Copy link
Member

AlenkaF commented Nov 12, 2025

I think we already have __eq__: https://github.com/apache/arrow/pull/7737/files.

Yes, true. I was under the impression that we might want to change the use of equals() method with the equal compute function? From the issue:

I think if we add those, we should also change the behaviour of __eq__, although that is something that will require a long deprecation cycle.

@rok
Copy link
Member

rok commented Nov 12, 2025

Oh, I managed to forget this and thought array.equals dispatches to compute.equals already 🤦.

It seems like a good opportunity to change this! :)

@rmnskb
Copy link
Contributor Author

rmnskb commented Nov 13, 2025

We might be able to add docstrings via type annotations. Or perhaps it's possible to copy them at runtime from the respective kernels? (unsure that is possible). In any case we probably don't want to duplicate docstrings in general.

I've looked more into this, and based on these two discussions, we'd have to copy them directly from the underlying functions at runtime via importing the pyarrow.compute module, which AFAIK we don't have access to, since we call them with _pc().call_function().
I've added the docstrings to respective classes which notes that the API now support calling the Python-native operators directly, as well as provides some examples of that. Please let me know if that's sufficient.

@rmnskb
Copy link
Contributor Author

rmnskb commented Nov 13, 2025

I think the current scope is solid. I am thinking of basic comparisons and the discussion in the issue around __eq__. That would also be good to tackle but maybe as a separate PR?

I'll be happy to work on it, once I'm done with this implementation :)

Yes, it would be important to update the documentation (User Guide which is the part that is not API reference related).

Are you talking about this file? Or is there somewhere else, where I can put this information?

One comment after looking at the code would be to add tests that cover various type combinations (array, scalar, Python types, unsupported types ...)

Yes, I agree. Are there any particular unsupported types I should include that you have in mind?

@AlenkaF
Copy link
Member

AlenkaF commented Nov 14, 2025

I'll be happy to work on it, once I'm done with this implementation :)

Thank you!

Are you talking about this file? Or is there somewhere else, where I can put this information?

I was thinking more about the compute page in our User Guide: https://github.com/apache/arrow/blob/main/docs/source/python/compute.rst

Are there any particular unsupported types I should include that you have in mind?

No, not really. What comes to mind are strings and/or nested for some arithmetic functions.

@rmnskb
Copy link
Contributor Author

rmnskb commented Nov 14, 2025

Are you talking about this file? Or is there somewhere else, where I can put this information?

I was thinking more about the compute page in our User Guide: https://github.com/apache/arrow/blob/main/docs/source/python/compute.rst

Operators Documentation

I mentioned the newly implemented operators in the documentation. Please let me know if it makes sense. I'm not sure whether we should list all the implemented methods, on the other hand, I don't want to leave users guessing what exactly can they use.

@rok
Copy link
Member

rok commented Nov 17, 2025

One comment after looking at the code would be to add tests that cover various type combinations (array, scalar, Python types, unsupported types ...)

Yes, I agree. Are there any particular unsupported types I should include that you have in mind?

An integer based extension type maybe.

@rmnskb rmnskb marked this pull request as ready for review November 30, 2025 21:25
@rmnskb rmnskb requested a review from rok as a code owner November 30, 2025 21:25
@rmnskb
Copy link
Contributor Author

rmnskb commented Nov 30, 2025

@AlenkaF @rok thanks for your input on this PR, I've marked it as ready, so I think it's up for a review, please let me know if there is anything else I should include, thanks! :)

Copy link
Member

@AlenkaF AlenkaF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, thank you @rmnskb!
I have one comment regarding the documentation and one minor nit. Also, could we add one more test that includes mixed type arithmetics (like in docstring examples: array + scalar, etc.)?

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Dec 3, 2025
@rmnskb
Copy link
Contributor Author

rmnskb commented Dec 8, 2025

Hey, @AlenkaF, good points! Thanks for the review. I've added another test and updated the docs as per your suggestion.

Copy link
Member

@AlenkaF AlenkaF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a great improvement, thank you for working on it @rmnskb !
Will wait for another review before merging. @rok, would you have time to have a look?

@AlenkaF
Copy link
Member

AlenkaF commented Dec 15, 2025

@pitrou @raulcd looking for another review before merging, if anybody has time.

@jorisvandenbossche
Copy link
Member

Two general quick comments:

  • The & / | operators are currently mapped to the bitwise kernels, but in array-land, I think the most common use case for those is actually for logical (boolean) operations. So as a user, I would expect those to also work on bool type.
  • Do we want to add the round/floor/ceil/trunc rounding dunders? Numpy does not support those, so here I wouldn't have expected them. But maybe it can also be useful (I think it find it a bit strange looking that a stdlib builtin function returns a pyarrow array, but that is probably a question of getting used to it). One difference is that the builtin is documented to return an int, and our kernels don't do that (float stays float)

@rmnskb
Copy link
Contributor Author

rmnskb commented Jan 13, 2026

The & / | operators are currently mapped to the bitwise kernels, but in array-land, I think the most common use case for those is actually for logical (boolean) operations. So as a user, I would expect those to also work on bool type.

That is true, I also though about it, but it seems to me that being consistent with other dunders also makes sense. Should we probably document this? So the users know what to expect.

Do we want to add the round/floor/ceil/trunc rounding dunders? Numpy does not support those, so here I wouldn't have expected them. But maybe it can also be useful (I think it find it a bit strange looking that a stdlib builtin function returns a pyarrow array, but that is probably a question of getting used to it). One difference is that the builtin is documented to return an int, and our kernels don't do that (float stays float)

Fair point. I think documenting the expected behavior might be a good idea given that Python is quite forgiving when it comes to data types. What do you think? The other option is to explicitly convert the output of this dunders to integers to stay consistent with the builtin docs.

@pitrou
Copy link
Member

pitrou commented Jan 14, 2026

Do we want to gold-plate this? We can keep the bitwise operators, remove the round/floor/etc. dunders, and perhaps improve things later if people request it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants