gh-118218: Reuse return tuple in itertools.pairwise#118219
gh-118218: Reuse return tuple in itertools.pairwise#118219serhiy-storchaka merged 5 commits intopython:mainfrom
Conversation
With this change: ``` b1(1) min=152.8us mean=155.5us ± 3.8us (25 repeats, 1000 loops) b2(1) min=149.0us mean=159.1us ± 8.8us (25 repeats, 1000 loops) b3(1) min=232.6us mean=242.5us ± 11.7us (25 repeats, 1000 loops) b1(10) min=279.2us mean=296.9us ± 16.6us (25 repeats, 1000 loops) b2(10) min=249.5us mean=259.2us ± 12.2us (25 repeats, 1000 loops) b3(10) min=386.6us mean=398.8us ± 10.1us (25 repeats, 1000 loops) b1(1000) min=20.3ms mean=20.7ms ± 0.5ms (25 repeats, 1000 loops) b2(1000) min=16.7ms mean=17.1ms ± 0.2ms (25 repeats, 1000 loops) b3(1000) min=26.0ms mean=26.2ms ± 0.3ms (25 repeats, 1000 loops) ``` Without this change: ``` b1(1) min=142.2us mean=143.0us ± 0.9us (25 repeats, 1000 loops) b2(1) min=142.7us mean=143.3us ± 1.0us (25 repeats, 1000 loops) b3(1) min=219.8us mean=227.2us ± 4.4us (25 repeats, 1000 loops) b1(10) min=314.2us mean=323.8us ± 4.1us (25 repeats, 1000 loops) b2(10) min=335.4us mean=341.8us ± 5.1us (25 repeats, 1000 loops) b3(10) min=362.0us mean=386.2us ± 14.9us (25 repeats, 1000 loops) b1(1000) min=26.5ms mean=27.3ms ± 0.3ms (25 repeats, 1000 loops) b2(1000) min=29.8ms mean=30.2ms ± 0.2ms (25 repeats, 1000 loops) b3(1000) min=26.0ms mean=26.5ms ± 0.4ms (25 repeats, 1000 loops) ```
| } | ||
| po->it = it; | ||
| po->old = NULL; | ||
| po->result = PyTuple_Pack(2, Py_None, Py_None); |
There was a problem hiding this comment.
PyTuple_Pack is slow. I created an issue #118222 to discuss this.
An alternative that might be faster:
| po->result = PyTuple_Pack(2, Py_None, Py_None); | |
| PyObject tmp[2] = {Py_None, Py_None}; | |
| po->result = _PyTuple_FromArraySteal(tmp, 2),; |
(note: untested!)
There was a problem hiding this comment.
I tested this locally. Since this line is once called once for each call to pairwise the performance difference is insignificant. It does matter for the other call to PyTuple_Pack (the one in pairwise_next), but that is best handled in a different PR.
| PyTuple_SET_ITEM(result, 0, Py_NewRef(old)); | ||
| PyTuple_SET_ITEM(result, 1, Py_NewRef(new)); | ||
| Py_DECREF(last_old); | ||
| Py_DECREF(last_new); |
There was a problem hiding this comment.
Since last_new is equal to old we could do:
| PyTuple_SET_ITEM(result, 0, Py_NewRef(old)); | |
| PyTuple_SET_ITEM(result, 1, Py_NewRef(new)); | |
| Py_DECREF(last_old); | |
| Py_DECREF(last_new); | |
| PyTuple_SET_ITEM(result, 0, old); | |
| PyTuple_SET_ITEM(result, 1, Py_NewRef(new)); | |
| Py_DECREF(last_old); |
There was a problem hiding this comment.
It's not necessarily true that last_new is old. Really up to the whims of when the user lets go of the tuple
|
Conceptually this seems reasonable to me. Please do have someone else give it a thorough review (Serhiy would be a reasonable choice). |
serhiy-storchaka
left a comment
There was a problem hiding this comment.
Please compare it also with the simple code that uses PyTuple_New + PyTuple_SET_ITEM instead of PyTuple_Pack if the latter is so slow.
11f2e8e to
68b5d27
Compare
|
Using this diff: Detailsdiff --git a/Modules/itertoolsmodule.c b/Modules/itertoolsmodule.c
index 6ee447ef6a..668c5a4f15 100644
--- a/Modules/itertoolsmodule.c
+++ b/Modules/itertoolsmodule.c
@@ -356,7 +356,11 @@ pairwise_next(pairwiseobject *po)
return NULL;
}
/* Future optimization: Reuse the result tuple as we do in enumerate() */
- result = PyTuple_Pack(2, old, new);
+ result = PyTuple_New(2);
+ if (result != NULL) {
+ PyTuple_SET_ITEM(result, 0, Py_NewRef(old));
+ PyTuple_SET_ITEM(result, 1, Py_NewRef(new));
+ }
Py_XSETREF(po->old, new);
Py_DECREF(old);
return result;Baseline: This PR: |
| it = pairwise([None, None]) | ||
| gc.collect() |
There was a problem hiding this comment.
In all tests above except for zip_longest there is a next(it) before gc.collect(). All these tests were added in 226a012 (bpo-42536).
@brandtbucher, why is such difference? Is there a bug in test_zip_longest_result_gc? Do we need next(it) there and here?
Misc/NEWS.d/next/Library/2024-04-24-07-45-08.gh-issue-118218.m1OHbN.rst
Outdated
Show resolved
Hide resolved
Misc/NEWS.d/next/Library/2024-04-24-07-45-08.gh-issue-118218.m1OHbN.rst
Outdated
Show resolved
Hide resolved
…1OHbN.rst Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
|
Thank you for your contribution @hauntsaninja. |
As per the comment, this is shamelessly based off of enumobject.c
With this change:
Without this change:
Benchmarking script:
Details