Conversation
Fix the performance regression at ruby#172 for valid emails. ``` yml prelude: | require 'uri/mailto' n = 1000 re = URI::MailTo::EMAIL_REGEXP benchmark: n.t..t.@docomo.ne.jp: re.match?("n.t..t.@docomo.ne.jp") example@example.info: re.match?("example@example.info") ``` | |released| 788274b| c5974f0| this| |:---------------------|-------:|-------:|-------:|-------:| |n.t..t.@docomo.ne.jp | 3.538M| 4.509M| 4.597M| 8.089M| | | -| 1.27x| 1.30x| 2.29x| |example@example.info | 3.627M| 3.461M| 2.622M| 3.610M| | | 1.38x| 1.32x| -| 1.38x|
Fix the performance regression at ruby#172 for valid emails. ``` yml prelude: | require 'uri/mailto' n = 1000 re = URI::MailTo::EMAIL_REGEXP benchmark: n.t..t.: re.match?("n.t..t.@docomo.ne.jp") example: re.match?("example@example.info") ``` | |released| 788274b| c5974f0| this| |:--------|-------:|-------:|-------:|-------:| |n.t..t. | 3.795M| 4.864M| 4.993M| 8.739M| | | -| 1.28x| 1.32x| 2.30x| |example | 3.911M| 3.740M| 2.838M| 3.880M| | | 1.38x| 1.32x| -| 1.37x|
Fix the performance regression at ruby#172 for valid emails. ``` yml prelude: | require 'uri/mailto' n = 1000 re = URI::MailTo::EMAIL_REGEXP benchmark: n.t..t.: re.match?("n.t..t.@docomo.ne.jp") example: re.match?("example@example.info") ``` | |released| 788274b| c5974f0| this| |:--------|-------:|-------:|-------:|-------:| |n.t..t. | 3.795M| 4.864M| 4.993M| 8.739M| | | -| 1.28x| 1.32x| 2.30x| |example | 3.911M| 3.740M| 2.838M| 3.880M| | | 1.38x| 1.32x| -| 1.37x|
Fix the performance regression at ruby#172 for valid emails. ``` yml prelude: | require 'uri/mailto' n = 1000 re = URI::MailTo::EMAIL_REGEXP benchmark: n.t..t.: re.match?("n.t..t.@docomo.ne.jp") example: re.match?("example@example.info") ``` | |released| 788274b| c5974f0| this| |:--------|-------:|-------:|-------:|-------:| |n.t..t. | 3.795M| 4.864M| 4.993M| 8.739M| | | -| 1.28x| 1.32x| 2.30x| |example | 3.911M| 3.740M| 2.838M| 3.880M| | | 1.38x| 1.32x| -| 1.37x|
|
@nobu @hsbt While this change is semantically correct, its impact is rather broad and may cause unintentional breakages. It is standard practice to use Instead of modifying the original Note: Email addresses like |
| # https://html.spec.whatwg.org/multipage/input.html#valid-e-mail-address | ||
| EMAIL_REGEXP = /\A(?!\.)[a-zA-Z0-9.!\#$%&'*+\/=?^_`{|}~-]+(?<!\.)@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*\z/ | ||
| EMAIL_REGEXP = /\A(?!\.)(?!.*\.{2})[a-zA-Z0-9.!\#$%&'*+\/=?^_`{|}~-]+(?<!\.)@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*\z/ |
There was a problem hiding this comment.
As the comment above states, the original regex is mostly drawn from WHATWG HTML LS. This spec states that it intentionally violates RFC 5322 to provide a practical regex for validation.
This requirement is a willful violation of RFC 5322, which defines a syntax for email addresses that is simultaneously too strict (before the "@" character), too vague (after the "@" character), and too lax (allowing comments, whitespace characters, and quoted strings in manners unfamiliar to most users) to be of practical use here.
The allowing of .. is not the only deviation from RFC 5322. If a truly RFC 5322-compliant regexp is needed, I believe it should be organized under a different name, since too much departure from the original EMAIL_REGEXP must be introduced.
|
I have opened #189. Please use if needed. |
|
Reverted at #189 |
Successive dots are prohibited in RFC5322.