gh-119396: Optimize PyUnicode_FromFormat() UTF-8 decoder #119398
Merged
vstinner merged 3 commits into python:main on May 22, 2024
Benchmark:

diff --git a/Modules/_testcapimodule.c b/Modules/_testcapimodule.c
index f99ebf0dde..0752b2b1d2 100644
--- a/Modules/_testcapimodule.c
+++ b/Modules/_testcapimodule.c
@@ -3312,6 +3312,14 @@ function_set_warning(PyObject *Py_UNUSED(module), PyObject *Py_UNUSED(args))
     Py_RETURN_NONE;
 }
 
+static PyObject *
+bench(PyObject *Py_UNUSED(module), PyObject *Py_UNUSED(args))
+{
+    return PyUnicode_FromFormat(
+        "%s %s %s %s %s.",
+        "format", "multiple", "utf8", "short", "strings");
+}
+
 static PyMethodDef TestMethods[] = {
     {"set_errno", set_errno, METH_VARARGS},
     {"test_config", test_config, METH_NOARGS},
@@ -3454,6 +3462,7 @@ static PyMethodDef TestMethods[] = {
     {"check_pyimport_addmodule", check_pyimport_addmodule, METH_VARARGS},
     {"test_weakref_capi", test_weakref_capi, METH_NOARGS},
     {"function_set_warning", function_set_warning, METH_NOARGS},
+    {"bench", bench, METH_NOARGS},
     {NULL, NULL} /* sentinel */
 };
Command:

./python -m venv env
env/bin/python -m pip install pyperf
env/bin/python -m pyperf timeit -s 'import _testcapi; func=_testcapi.bench' 'func()' -v -o ref.json

Result, Python built with
Oh, there was a performance regression on

Benchmark:

import pyperf
import _testcapi

runner = pyperf.Runner()

utf8 = b'abc'
runner.bench_func('abc', utf8.decode)

utf8 = 'abcé'.encode()
runner.bench_func('abc + UTF-8', utf8.decode)

utf8 = 'éabc'.encode()
runner.bench_func('UTF-8 + abc', utf8.decode)

utf8 = b'x' * (1024 * 1024)
runner.bench_func('ASCII 1 MiB', utf8.decode)

utf8 = ('x' * (1024 * 1024) + 'é').encode()
runner.bench_func('ASCII 1 MiB + UTF-8', utf8.decode)

utf8 = ('é' + 'x' * (1024 * 1024)).encode()
runner.bench_func('UTF-8 + ASCII 1 MiB', utf8.decode)

utf8 = ('€' + 'x' * (1024 * 1024)).encode()
runner.bench_func('UTF-8 euro + ASCII 1 MiB', utf8.decode)

Results, Python built with

=> There is no significant impact on
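The benchmark inputs above differ mainly in where the first non-ASCII byte sits in the encoded data. The following is a small sketch that makes this explicit; the helper `first_non_ascii()` is hypothetical (it is not part of CPython or pyperf), and the reading that these cases were chosen to exercise the decoder's ASCII handling versus its multi-byte handling is an assumption, not something stated in the PR.

```python
def first_non_ascii(data: bytes) -> int:
    """Return the index of the first byte >= 0x80, or -1 if data is pure ASCII.

    Hypothetical helper, for illustration only.
    """
    for index, byte in enumerate(data):
        if byte >= 0x80:
            return index
    return -1


MIB = 1024 * 1024

# Same inputs as the pyperf script in the PR comment.
cases = {
    'abc': b'abc',
    'abc + UTF-8': 'abcé'.encode(),
    'UTF-8 + abc': 'éabc'.encode(),
    'ASCII 1 MiB': b'x' * MIB,
    'ASCII 1 MiB + UTF-8': ('x' * MIB + 'é').encode(),
    'UTF-8 + ASCII 1 MiB': ('é' + 'x' * MIB).encode(),
    'UTF-8 euro + ASCII 1 MiB': ('€' + 'x' * MIB).encode(),
}

offsets = {name: first_non_ascii(data) for name, data in cases.items()}

for name, data in cases.items():
    print(f"{name}: {len(data)} bytes, first non-ASCII byte at {offsets[name]}")
```

An offset of -1 means the input is pure ASCII, 0 means the decoder meets a multi-byte sequence immediately, and a large offset means a long ASCII run precedes it. 'é' encodes to 2 bytes and '€' to 3 bytes in UTF-8, which is why the last two cases differ in length by one byte.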
Add unicode_decode_utf8_writer() to write characters directly into a
_PyUnicodeWriter writer, avoiding the creation of a temporary string.

Optimize PyUnicode_FromFormat() by using the new
unicode_decode_utf8_writer().

Rename unicode_fromformat_write_cstr() to
unicode_fromformat_write_utf8().

Microbenchmark on the code:

    return PyUnicode_FromFormat(
        "%s %s %s %s %s.",
        "format", "multiple", "utf8", "short", "strings");

Result: 620 ns +- 8 ns -> 382 ns +- 2 ns: 1.62x faster.
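As a quick check of the reported figure, the speedup can be recomputed from the two mean timings. This is only a sketch of the arithmetic: it uses the means quoted above and ignores the +- uncertainties, and the variable names are illustrative.

```python
# Mean timings quoted in the PR description, in nanoseconds.
before_ns = 620.0
after_ns = 382.0

speedup = before_ns / after_ns        # ratio of old time to new time
saved_ns = before_ns - after_ns       # absolute time saved per call
reduction = saved_ns / before_ns      # fraction of the original time saved

print(f"speedup: {speedup:.2f}x faster")
print(f"time saved per call: {saved_ns:.0f} ns ({reduction:.1%} less)")
```

The ratio 620 / 382 rounds to 1.62, matching the "1.62x faster" in the result line, and corresponds to 238 ns saved per call.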
I enabled automerge. Thanks for the review @serhiy-storchaka.
estyxx pushed a commit to estyxx/cpython that referenced this pull request on Jul 17, 2024