More checks in `EMAIL_REGEXP` by nobu · Pull Request #172 · ruby/uri

nobu · 2025-07-12T07:05:58Z

Successive dots are prohibited by RFC 5322.
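In RFC 5322 (section 3.2.3), an unquoted local part is a `dot-atom`, whose text is `1*atext *("." 1*atext)`. Every dot must sit between two non-empty runs of `atext`, so leading, trailing, and successive dots are all excluded by the grammar itself. Below is a minimal Ruby sketch of that rule applied to the local part only. `DOT_ATOM_LOCAL` and `dot_atom_local_part?` are hypothetical names, and the character class is assumed to be the one `EMAIL_REGEXP` already uses, minus the dot.

```ruby
# Hypothetical helper: RFC 5322 dot-atom-text for a local part,
#   dot-atom-text = 1*atext *("." 1*atext)
# A dot is only ever consumed together with at least one following atext
# character, so ".x", "x." and "x..y" cannot match.
DOT_ATOM_LOCAL = /\A[a-zA-Z0-9!\#$%&'*+\/=?^_`{|}~-]+(?:\.[a-zA-Z0-9!\#$%&'*+\/=?^_`{|}~-]+)*\z/

def dot_atom_local_part?(local)
  DOT_ATOM_LOCAL.match?(local)
end

%w[john.doe john..doe .john john.].each do |local|
  puts "#{local}: #{dot_atom_local_part?(local)}"
end
# john.doe: true
# john..doe: false
# .john: false
# john.: false
```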

Fix the performance regression at ruby#172 for valid emails.

```yml
prelude: |
  require 'uri/mailto'
  n = 1000
  re = URI::MailTo::EMAIL_REGEXP
benchmark:
  n.t..t.@docomo.ne.jp: re.match?("n.t..t.@docomo.ne.jp")
  example@example.info: re.match?("example@example.info")
```

| | released | 788274b | c5974f0 | this |
|:---------------------|-------:|-------:|-------:|-------:|
| n.t..t.@docomo.ne.jp | 3.538M | 4.509M | 4.597M | 8.089M |
| | - | 1.27x | 1.30x | 2.29x |
| example@example.info | 3.627M | 3.461M | 2.622M | 3.610M |
| | 1.38x | 1.32x | - | 1.38x |

Fix the performance regression at ruby#172 for valid emails.

```yml
prelude: |
  require 'uri/mailto'
  n = 1000
  re = URI::MailTo::EMAIL_REGEXP
benchmark:
  n.t..t.: re.match?("n.t..t.@docomo.ne.jp")
  example: re.match?("example@example.info")
```

| | released | 788274b | c5974f0 | this |
|:--------|-------:|-------:|-------:|-------:|
| n.t..t. | 3.795M | 4.864M | 4.993M | 8.739M |
| | - | 1.28x | 1.32x | 2.30x |
| example | 3.911M | 3.740M | 2.838M | 3.880M |
| | 1.38x | 1.32x | - | 1.37x |
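The YAML above is a `benchmark-driver` definition. For a quick local check without that gem, a rough equivalent can be sketched with Ruby's bundled `benchmark` library. The iteration count and output format are arbitrary choices for this sketch, and the resulting rates depend on whichever `uri` version is installed, so they are not directly comparable to the table.

```ruby
require 'benchmark'
require 'uri' # loads URI::MailTo as well

re = URI::MailTo::EMAIL_REGEXP
n = 100_000 # arbitrary iteration count for this sketch

inputs = {
  'n.t..t.' => 'n.t..t.@docomo.ne.jp', # successive and trailing dots in the local part
  'example' => 'example@example.info'  # ordinary valid address
}

inputs.each do |label, address|
  # Benchmark.realtime returns elapsed wall-clock seconds as a Float.
  seconds = Benchmark.realtime { n.times { re.match?(address) } }
  puts format('%-8s %12.0f i/s', label, n / seconds)
end
```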

osyoyu · 2025-11-04T04:32:17Z

@nobu @hsbt While this change is semantically correct, its impact is rather broad and may cause unintended breakage. It is standard practice to use `EMAIL_REGEXP` to check validity on login screens. A user whose email address contains `..` will suddenly experience login problems once the service provider updates `uri` to 1.1.0.

Instead of modifying the original `EMAIL_REGEXP` constant, could the new regex live under a separate name such as `RFC5322_EMAIL_REGEXP` (just like `RFC3986_PARSER`), or could a larger announcement be made?

Note: Email addresses like `a..a@example.com` and `a.@example.com` have been allowed by a major email provider in the past and still exist in the wild. To the best of my knowledge, web services do rely on `EMAIL_REGEXP` allowing these addresses.
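One way a service could insulate its login form from a change like this is to freeze its own copy of the pattern instead of referencing `URI::MailTo::EMAIL_REGEXP` directly. This is a sketch under that assumption only: `LoginValidation` and `LEGACY_EMAIL_REGEXP` are hypothetical names, and the pattern is the pre-change (`-`) line from the diff in this PR. Note that this pre-change pattern already rejects a trailing dot before `@`.

```ruby
# Hypothetical application-side module: pins the pattern uri shipped before
# this PR, so upgrading the gem cannot change which login addresses pass.
module LoginValidation
  LEGACY_EMAIL_REGEXP = /\A(?!\.)[a-zA-Z0-9.!\#$%&'*+\/=?^_`{|}~-]+(?<!\.)@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*\z/

  def self.valid_login_email?(email)
    LEGACY_EMAIL_REGEXP.match?(email)
  end
end

LoginValidation.valid_login_email?('a..a@example.com') # => true  (successive dots allowed)
LoginValidation.valid_login_email?('a.@example.com')   # => false (trailing dot rejected by (?<!\.))
```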

osyoyu · 2025-11-04T05:19:54Z

lib/uri/mailto.rb

```diff
     # https://html.spec.whatwg.org/multipage/input.html#valid-e-mail-address
-    EMAIL_REGEXP = /\A(?!\.)[a-zA-Z0-9.!\#$%&'*+\/=?^_`{|}~-]+(?<!\.)@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*\z/
+    EMAIL_REGEXP = /\A(?!\.)(?!.*\.{2})[a-zA-Z0-9.!\#$%&'*+\/=?^_`{|}~-]+(?<!\.)@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*\z/
```

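The only difference between the two lines is the added `(?!.*\.{2})` lookahead, which rejects the input if two consecutive dots appear anywhere after the start. A small sketch comparing the two patterns copied from the diff; `OLD` and `NEW` are just local labels for this example.

```ruby
# Patterns copied verbatim from the diff above.
OLD = /\A(?!\.)[a-zA-Z0-9.!\#$%&'*+\/=?^_`{|}~-]+(?<!\.)@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*\z/
NEW = /\A(?!\.)(?!.*\.{2})[a-zA-Z0-9.!\#$%&'*+\/=?^_`{|}~-]+(?<!\.)@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*\z/

%w[
  example@example.info
  a..a@example.com
  .a@example.com
  a.@example.com
].each do |addr|
  puts format('%-22s old=%-5s new=%s', addr, OLD.match?(addr), NEW.match?(addr))
end
# Only a..a@example.com changes: old=true, new=false.
# Leading and trailing dots were already rejected by both patterns.
```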

As the comment above states, the original regex is mostly drawn from the WHATWG HTML Living Standard, which states that it intentionally violates RFC 5322 in order to provide a practical regex for validation:

> This requirement is a willful violation of RFC 5322, which defines a syntax for email addresses that is simultaneously too strict (before the "@" character), too vague (after the "@" character), and too lax (allowing comments, whitespace characters, and quoted strings in manners unfamiliar to most users) to be of practical use here.

Allowing `..` is not the only deviation from RFC 5322. If a truly RFC 5322-compliant regexp is needed, I believe it should live under a different name, since it would have to depart too far from the original `EMAIL_REGEXP`.
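As a concrete illustration (the sample addresses are made up for this sketch), two forms that RFC 5322 permits, a quoted-string local part and a domain literal, are rejected by `URI::MailTo::EMAIL_REGEXP` regardless of the successive-dot lookahead, because `"`, spaces, and `[`/`]` fall outside the pattern's character classes.

```ruby
require 'uri' # loads URI::MailTo as well

re = URI::MailTo::EMAIL_REGEXP

# Both are valid under RFC 5322 (quoted-string local part, domain-literal
# domain) but use characters the WHATWG-style pattern does not allow.
rfc5322_only = ['"john doe"@example.com', 'user@[192.0.2.1]']

rfc5322_only.each do |addr|
  puts "#{addr}: #{re.match?(addr)}" # false for both
end
```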

osyoyu · 2025-11-04T05:38:00Z

I have opened #189. Please use it if needed.

sorah · 2025-11-04T07:06:17Z

Reverted in #189.

nobu added 2 commits July 12, 2025 15:53

More tests for check_to

b1b5f9a

Prohibit successive dots in email

3233592

nobu changed the title ~~More checks in email regexp~~ More checks in EMAIL_REGEXP Jul 12, 2025

nobu merged commit 0abac72 into ruby:master Jul 12, 2025
26 checks passed

nobu deleted the more-checks-in-email_regexp branch July 12, 2025 07:07

nobu mentioned this pull request Jul 12, 2025

Improve performance of URI::MailTo::EMAIL_REGEXP #173

Merged

y-yagi mentioned this pull request Oct 6, 2025

Possible bug in URI::MailTo::EMAIL_REGEXP when parsing emails addresses containing two consecutive dots #177

Closed

osyoyu reviewed Nov 4, 2025


sorah mentioned this pull request Nov 4, 2025

The local part should not contain leading or trailing dots in the EMAIL_REGEXP #124

Merged

nobu merged 2 commits into ruby:master from nobu:more-checks-in-email_regexp
3 participants