gh-137855: `email.quoprimime` removing `re` import by Marius-Juston · Pull Request #132046 · python/cpython

Marius-Juston · 2025-04-03T10:22:34Z

This pull request removes the re module from email.quoprimime, reducing the cumulative import time from 5676 us to 3669 us (about 35% less time, i.e. a roughly 1.55x faster import):

From

marius@DESKTOP-IOUM5DH:~/cpython$ ./python -X importtime -c "import email.quoprimime"
import time: self [us] | cumulative | imported package
import time:        88 |         88 |   _io
import time:        19 |         19 |   marshal
import time:       143 |        143 |   posix
import time:       332 |        580 | _frozen_importlib_external
import time:        42 |         42 |   time
import time:       125 |        166 | zipimport
import time:        25 |         25 |     _codecs
import time:       290 |        315 |   codecs
import time:       190 |        190 |   encodings.aliases
import time:       417 |        921 | encodings
import time:        90 |         90 | encodings.utf_8
import time:        44 |         44 | _signal
import time:        22 |         22 |     _abc
import time:        93 |        114 |   abc
import time:       484 |        484 |   _collections_abc
import time:       136 |        733 | io
import time:        22 |         22 |       _stat
import time:        59 |         80 |     stat
import time:        31 |         31 |       errno
import time:        43 |         43 |       genericpath
import time:        87 |        160 |     posixpath
import time:       283 |        523 |   os
import time:        50 |         50 |   _sitebuiltins
import time:        86 |         86 |   sitecustomize
import time:        30 |         30 |   usercustomize
import time:       216 |        902 | site
import time:       114 |        114 | linecache
import time:       203 |        203 |   email
import time:        22 |         22 |     _string
import time:       140 |        140 |         types
import time:       795 |        935 |       enum
import time:        34 |         34 |         _sre
import time:       130 |        130 |           re._constants
import time:       181 |        311 |         re._parser
import time:        49 |         49 |         re._casefix
import time:       198 |        591 |       re._compiler
import time:        57 |         57 |           itertools
import time:        75 |         75 |           keyword
import time:        41 |         41 |             _operator
import time:       153 |        194 |           operator
import time:        98 |         98 |           reprlib
import time:        32 |         32 |           _collections
import time:       553 |       1006 |         collections
import time:        30 |         30 |         _functools
import time:       346 |       1381 |       functools
import time:        95 |         95 |       copyreg
import time:       311 |       3311 |     re
import time:       365 |       3697 |   string
import time:      1777 |       5676 | email.quoprimime

To

marius@DESKTOP-IOUM5DH:~/cpython$ ./python -X importtime -c "import email.quoprimime"
import time: self [us] | cumulative | imported package
import time:        89 |         89 |   _io
import time:        18 |         18 |   marshal
import time:       130 |        130 |   posix
import time:       305 |        541 | _frozen_importlib_external
import time:        37 |         37 |   time
import time:       115 |        152 | zipimport
import time:        24 |         24 |     _codecs
import time:       273 |        296 |   codecs
import time:       175 |        175 |   encodings.aliases
import time:       387 |        857 | encodings
import time:        83 |         83 | encodings.utf_8
import time:        40 |         40 | _signal
import time:        16 |         16 |     _abc
import time:        88 |        103 |   abc
import time:       422 |        422 |   _collections_abc
import time:       125 |        649 | io
import time:        20 |         20 |       _stat
import time:        54 |         73 |     stat
import time:        29 |         29 |       errno
import time:        39 |         39 |       genericpath
import time:        81 |        148 |     posixpath
import time:       269 |        490 |   os
import time:        48 |         48 |   _sitebuiltins
import time:        80 |         80 |   sitecustomize
import time:        28 |         28 |   usercustomize
import time:       199 |        842 | site
import time:       106 |        106 | linecache
import time:       189 |        189 |   email
import time:        18 |         18 |     _string
import time:       124 |        124 |         types
import time:       667 |        791 |       enum
import time:        33 |         33 |         _sre
import time:       130 |        130 |           re._constants
import time:       177 |        307 |         re._parser
import time:        49 |         49 |         re._casefix
import time:       179 |        566 |       re._compiler
import time:        58 |         58 |           itertools
import time:        74 |         74 |           keyword
import time:        36 |         36 |             _operator
import time:       148 |        183 |           operator
import time:        96 |         96 |           reprlib
import time:        31 |         31 |           _collections
import time:       494 |        934 |         collections
import time:        27 |         27 |         _functools
import time:       315 |       1274 |       functools
import time:        87 |         87 |       copyreg
import time:       284 |       3000 |     re
import time:       297 |       3314 |   string
import time:       168 |       3669 | email.quoprimime

However, the new implementation does increase the compute time:

TEST_CASES = {
    "empty": "Dracula",
    "empty_medium": "Dracula"* 10,
    "empty_long": "Dracula"* 100,
    "short": "Hello=20World=21",
    "medium": "This_is_a_test=3F=3D=2E" * 10,
    "long": "Some_long_text_with_encoding=20" * 100,
    "mixed": "A=2Equick=20brown=5Ffox=21=3F" * 50,
    "edge_case_short": "=20=21=3F=2E=5F",
    "edge_case_long": "=20=21=3F=2E=5F" * 200
}

Benchmark	regex	non_regex
empty	284 ns	382 ns: 1.34x slower
empty_medium	302 ns	2.99 us: 9.91x slower
empty_long	371 ns	28.6 us: 77.20x slower
short	731 ns	902 ns: 1.23x slower
medium	6.24 us	11.8 us: 1.89x slower
long	25.5 us	137 us: 5.37x slower
mixed	57.0 us	71.5 us: 1.25x slower
edge_case_short	1.36 us	916 ns: 1.48x faster
edge_case_long	178 us	160 us: 1.11x faster
Geometric mean	(ref)	2.78x slower

So it is very possible that this is not worth it.
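The thread does not show the benchmark harness, so the following is only a minimal sketch of how one of these cases could be timed with the standard-library timeit module. It times the current regex-based email.quoprimime.header_decode; the helper name time_case and the repeat counts are assumptions for illustration, and absolute numbers will differ from the table above.

```python
import timeit
from email.quoprimime import header_decode

# A subset of the TEST_CASES from the PR description.
CASES = {
    "empty": "Dracula",
    "short": "Hello=20World=21",
    "edge_case_short": "=20=21=3F=2E=5F",
}

def time_case(func, text, number=10_000, repeat=5):
    """Return the best per-call time in nanoseconds (hypothetical helper)."""
    timer = timeit.Timer(lambda: func(text))
    # Take the minimum over several repeats to reduce scheduling noise.
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e9

if __name__ == "__main__":
    for name, text in CASES.items():
        print(f"{name}: {time_case(header_decode, text):.0f} ns")
```

Passing a different callable as func would time an alternative implementation against the same inputs.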

Issues:

Issue: Improve import time of various stdlib modules #137855

Marius-Juston · 2025-04-03T10:27:38Z

The PR:

gh-118761: Optimise import time for string #132037

will probably improve the import time drastically as well: once string imports re lazily, importing string becomes much faster, and this module only uses string to import constants (from string import ascii_letters, digits, hexdigits).
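The general pattern behind a lazy import can be sketched as follows. This is not the actual change in #132037, only an illustration: the import moves from module level into the function that needs it, so importing the enclosing module no longer pays for re. The function name count_hex_escapes is hypothetical.

```python
def count_hex_escapes(s):
    """Count '=XX' escapes; re is imported on the first call, not at module import."""
    import re  # deferred import: later calls find the module cached in sys.modules
    return len(re.findall(r"=[a-fA-F0-9]{2}", s, flags=re.ASCII))
```

The trade-off is that the first call absorbs the import cost instead of program startup.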

Marius-Juston · 2025-04-03T10:41:06Z

I did not notice that ./python -X importtime -c 'import email.quoprimime' needs a warmup, so the more accurate timings are actually:

regex: 153.9974 ± 35.97 (103 to 1778; n=10000)
non_regex: 148.4565 ± 25.48 (125 to 991; n=10000)

Marius-Juston · 2025-04-03T11:05:11Z

(The new _HEX_TO_CHAR cache could also be used in the decode function afterwards, since it checks for more or less the same thing.)

# Decode if in form =AB
elif i+2 < n and line[i+1] in hexdigits and line[i+2] in hexdigits:
    decoded += unquote(line[i:i+3])

Marius-Juston · 2025-04-03T17:26:46Z

Benchmark	regex	non_regex_2
empty	288 ns	259 ns: 1.11x faster
empty_medium	299 ns	1.74 us: 5.81x slower
empty_long	375 ns	16.3 us: 43.61x slower
short	725 ns	714 ns: 1.01x faster
medium	6.22 us	7.97 us: 1.28x slower
long	22.0 us	85.9 us: 3.91x slower
mixed	49.5 us	56.3 us: 1.14x slower
edge_case_short	1.26 us	744 ns: 1.69x faster
edge_case_long	177 us	125 us: 1.41x faster
Geometric mean	(ref)	2.01x slower

Slightly faster

Marius-Juston · 2025-04-03T17:43:14Z

Adding the '=' check now speeds things up:

Benchmark	regex	non_regex
empty	288 ns	53.6 ns: 5.37x faster
empty_medium	299 ns	54.1 ns: 5.53x faster
empty_long	375 ns	62.5 ns: 6.00x faster
short	725 ns	722 ns: 1.00x faster
medium	6.22 us	8.09 us: 1.30x slower
long	22.0 us	86.7 us: 3.94x slower
mixed	49.5 us	58.6 us: 1.18x slower
edge_case_short	1.26 us	767 ns: 1.64x faster
edge_case_long	177 us	127 us: 1.39x faster
Geometric mean	(ref)	1.60x faster

Marius-Juston · 2025-04-03T17:47:21Z

For comparison, here is the regex version with the pattern compiled once outside the function and an early exit added:

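import re
from email.quoprimime import _unquote_match  # existing private helper in quoprimime

# Compile once at module level instead of on every call.
c = re.compile(r"=[a-fA-F0-9]{2}", flags=re.ASCII)

def header_decode_re(s):
    """Decode a string using regex."""
    s = s.replace('_', ' ')  # Replace underscores with spaces
    if '=' in s:  # Only run the regex when there is something to decode
        return c.sub(_unquote_match, s)
    return s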

Benchmark	regex	regex2	non_regex
empty	288 ns	51.4 ns: 5.60x faster	53.6 ns: 5.37x faster
empty_medium	299 ns	52.0 ns: 5.76x faster	54.1 ns: 5.53x faster
empty_long	375 ns	61.0 ns: 6.15x faster	62.5 ns: 6.00x faster
short	725 ns	560 ns: 1.29x faster	722 ns: 1.00x faster
medium	6.22 us	6.44 us: 1.04x slower	8.09 us: 1.30x slower
long	22.0 us	22.9 us: 1.04x slower	86.7 us: 3.94x slower
mixed	49.5 us	52.6 us: 1.06x slower	58.6 us: 1.18x slower
edge_case_short	1.26 us	1.12 us: 1.13x faster	767 ns: 1.64x faster
edge_case_long	177 us	189 us: 1.07x slower	127 us: 1.39x faster
Geometric mean	(ref)	1.83x faster	1.60x faster

Lib/email/quoprimime.py

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>

Marius-Juston · 2025-04-04T03:24:52Z

@AA-Turner, what's your opinion on replacing this regular expression (even though it sometimes makes the algorithm slower)?

Marius-Juston · 2025-04-04T04:20:27Z

Very slight improvement (mainly on edge_case_short and short, where string concatenation is faster than using "".join()):

Benchmark	regex	regex2	non_regex	non_regex_add
empty	288 ns	51.4 ns: 5.60x faster	53.6 ns: 5.37x faster	51.6 ns: 5.58x faster
empty_medium	299 ns	52.0 ns: 5.76x faster	54.1 ns: 5.53x faster	51.6 ns: 5.80x faster
empty_long	375 ns	61.0 ns: 6.15x faster	62.5 ns: 6.00x faster	59.9 ns: 6.26x faster
short	725 ns	560 ns: 1.29x faster	722 ns: 1.00x faster	674 ns: 1.08x faster
medium	6.22 us	6.44 us: 1.04x slower	8.09 us: 1.30x slower	7.82 us: 1.26x slower
long	22.0 us	22.9 us: 1.04x slower	86.7 us: 3.94x slower	83.5 us: 3.79x slower
mixed	49.5 us	52.6 us: 1.06x slower	58.6 us: 1.18x slower	60.6 us: 1.22x slower
edge_case_short	1.26 us	1.12 us: 1.13x faster	767 ns: 1.64x faster	699 ns: 1.80x faster
edge_case_long	177 us	189 us: 1.07x slower	127 us: 1.39x faster	127 us: 1.39x faster
Geometric mean	(ref)	1.83x faster	1.60x faster	1.66x faster

hauntsaninja

This is slower and harder to maintain, so I'm -1 on this PR

bedevere-app · 2025-04-06T05:33:51Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

hauntsaninja · 2025-04-06T05:42:54Z

Actually I don't understand this PR.

It looks like we still import re transitively? Also I don't understand why the "self" time reported by -X importtime in your PR body goes from 1777 to 168, if anything looks like quoprimime.py does more work at import time now

AA-Turner · 2025-04-06T05:42:57Z

I'm a bit lost on the current benchmarks, but the most recent comment (with non_regex_add) appears to indicate this is slightly faster. That said, I agree with @hauntsaninja that the algorithm in the PR is too complicated and will be difficult to maintain, in contrast to the one-liner regular expression.

> It looks like we still import re transitively?

Through string, see #132037 to help there.

A
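Whether re is still pulled in transitively can be checked directly rather than read off the -X importtime listing. The sketch below starts a fresh interpreter, imports the target module, and reports whether another module ended up in sys.modules. The helper name imports_module is hypothetical, and the result depends on the Python build being tested.

```python
import subprocess
import sys

def imports_module(target, suspect):
    """Return True if importing target in a fresh interpreter also loads suspect."""
    code = f"import sys, {target}; print({suspect!r} in sys.modules)"
    result = subprocess.run(
        # -S skips site, so only the imports caused by target are observed.
        [sys.executable, "-S", "-c", code],
        check=True, capture_output=True, encoding="utf-8",
    )
    return result.stdout.strip() == "True"

if __name__ == "__main__":
    print(imports_module("email.quoprimime", "re"))
```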

AA-Turner · 2025-04-06T05:51:08Z

> Also I don't understand why the "self" time reported by -X importtime in your PR body goes from 1777 to 168, if anything looks like quoprimime.py does more work at import time now

I agree this is odd. I've been using the below (rough) script to benchmark import times, for more data points than just a single run.

bench.py

import subprocess, sys
import statistics

BASE_CMD = (sys.executable, '-Ximporttime', '-S', '-c',)

def run_importtime(mod: str) -> str:
    return subprocess.run(BASE_CMD + (f'import {mod}',), check=True, capture_output=True, encoding='utf-8').stderr

for mod in sys.argv[1:]:
    for _ in range(5):  # warmup
        lines = run_importtime(mod)
    print(lines.partition('\n')[0])
    own_times = []
    cum_times = []
    for _ in range(50):
        lines = run_importtime(mod)
        final_line = lines.rstrip().rpartition('\n')[-1]
        # print(final_line)
        # import time:       {own} |       {cum} | {mod}
        own, cum = map(int, final_line.split()[2:5:2])
        own_times.append(own)
        cum_times.append(cum)
    own_times.sort()
    cum_times.sort()
    own_times[:] = own_times[10:-10]
    cum_times[:] = cum_times[10:-10]
    for label, times in [('own', own_times), ('cumulative', cum_times)]:
        print()
        print(f'import {mod}: {label} time')
        print(f'mean: {statistics.mean(times):.3f} µs')
        print(f'median: {statistics.median(times):.3f} µs')
        print(f'stdev: {statistics.stdev(times):.3f}')
        print('min:', min(times))
        print('max:', max(times))

python-cla-bot · 2025-04-06T13:55:49Z

All commit authors signed the Contributor License Agreement.

terryjreedy · 2025-08-16T14:56:01Z

Lib/email/quoprimime.py

+    if '=' not in s:
+        return s
+
+    result = ''


Repeatedly appending to a string in a loop is O(n**2). The standard idiom is to make a list of pieces (result=[]) and join after the loop. I suspect that re.sub does the C equivalent.

In any case, I agree that replacing an re call with this much code seems dubious (a bad tradeoff), so closing this might be best.
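The list-and-join idiom mentioned above can be sketched for this decoder as follows. This is not the code from the PR, only an illustration under the assumption that the goal is to match the current regex-based header_decode; the name header_decode_join is hypothetical.

```python
from string import hexdigits

def header_decode_join(s):
    """Regex-free header decode that collects pieces in a list and joins once."""
    s = s.replace('_', ' ')
    if '=' not in s:
        return s
    head, *rest = s.split('=')
    pieces = [head]
    for part in rest:
        # Each part is the text that followed one '=' in the input.
        if len(part) >= 2 and part[0] in hexdigits and part[1] in hexdigits:
            pieces.append(chr(int(part[:2], 16)))
            pieces.append(part[2:])
        else:
            pieces.append('=' + part)  # not a valid escape: keep it verbatim
    return ''.join(pieces)
```

Splitting on '=' keeps the per-character work inside str.split, and the single join at the end avoids building a new string on every iteration.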

Marius-Juston added 5 commits April 3, 2025 04:15

Removed re hex

c670b11

implace replace, removed valid_hex parameter

08cdc0b

joined to big if statement

a3ef550

added news

22e6d9e

inline character assigment

232bb55

Marius-Juston requested a review from a team as a code owner April 3, 2025 10:22

bedevere-app bot added the awaiting review label Apr 3, 2025

bedevere-app bot mentioned this pull request Apr 3, 2025

Improve import time of various stdlib modules #118761

Closed

Marius-Juston changed the title ~~gh-118761: Quoprimime removing re import~~ gh-118761: email.quoprimime removing re import Apr 3, 2025

use cache for hex to char + instead of single character append use sl…

3ada67a

…ices

inplace assignment with walrus

9e3cc1f

ZeroIntensity added the topic-email label Apr 3, 2025

Marius-Juston added 2 commits April 3, 2025 12:27

removed news since should probably be "skip news" tagged

3287564

fast pass for no '='

8362a2e

AA-Turner reviewed Apr 4, 2025

Lib/email/quoprimime.py (outdated)

Update Lib/email/quoprimime.py

81ae23a

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>

faster string concatenation

8659486

hauntsaninja requested changes Apr 6, 2025

bedevere-app bot added awaiting changes and removed awaiting review labels Apr 6, 2025

ofek mentioned this pull request Aug 16, 2025

Improve import time of various stdlib modules #137855

Open

hugovk changed the title ~~gh-118761: email.quoprimime removing re import~~ gh-137855: email.quoprimime removing re import Aug 16, 2025

terryjreedy reviewed Aug 16, 2025

hauntsaninja closed this Aug 16, 2025
