gh-137855: email.quoprimime removing re import#132046
gh-137855: email.quoprimime removing re import#132046Marius-Juston wants to merge 11 commits intopython:mainfrom
email.quoprimime removing re import#132046Conversation
re importemail.quoprimime removing re import
|
The PR: will probably drastically improve the speed as well as once |
|
I did not notice that the warmup needed for regex: 153.9974 ± 35.97 (103 to 1778; n=10000)
non_regex: 148.4565 ± 25.48 (125 to 991; n=10000) |
|
( the new _HEX_TO_CHAR cache could also be used for the # Decode if in form =AB
elif i+2 < n and line[i+1] in hexdigits and line[i+2] in hexdigits:
decoded += unquote(line[i:i+3]) |
Slightly faster |
|
Adding the '=' check now speeds things up:
|
|
As a comparison (if you compile the regex for the function + add early exit) c = re.compile("=[a-fA-F0-9]{2}", flags=re.ASCII)
def header_decode_re(s):
"""Decode a string using regex."""
s = s.replace('_', ' ') # Replace underscores with spaces
if '=' in s:
return c.sub(_unquote_match, s)
return s
|
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
|
@AA-Turner, what's your opinion on replacing this regex expression (even though it sometimes makes the algorithm slower)? |
|
Very slight improvement (mainly on the
|
|
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
|
Actually I don't understand this PR. It looks like we still import |
|
I'm a bit lost on the current benchmarks, but the most recent comment (with
Through A |
I agree this is odd. I've been using the below (rough) script to benchmark import times, for more data points than just a single run.
|
email.quoprimime removing re importemail.quoprimime removing re import
| if '=' not in s: | ||
| return s | ||
|
|
||
| result = '' |
There was a problem hiding this comment.
Repeatedly appending to a string in a loop is O(n**2). The standard idiom is to make a list of pieces (result=[]) and join after the loop. I suspect that re.sub does the C equivalent.
In any case, I agree that replacing an re call with this much code seems dubious (a bad tradeoff), so closing this might be best.
This pull request removes the
remodule from theemail.quoprimime, thus increasing the import speed from5676us to3669us (a 60% import speed increase );From
To
however, the new implementation does increase the compute time
So it is very possible that this is not worth it.
Issues:
reuses #130167