Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make email regex reasonable #2157

Merged
merged 6 commits into from May 21, 2023
Merged

Make email regex reasonable #2157

merged 6 commits into from May 21, 2023

Conversation

colinhacks
Copy link
Owner

@colinhacks colinhacks commented Mar 7, 2023

A lot of people think every technically valid email address must parse validation by z.string().email(). I used to think this too. Over the years I've merged several PRs to make the email regex more "technically correct".

Most recently, I merged a PR that adds support for IPv6 addresses as the domain part of an email. They look like this:

jonny@[ipv6:7e95:0559:10f2:21e9:9dab:7309:c116:ca3b]

Turns out that PR also broke plain old subdomains like user@sub.domain.com. That means my mom's current email address would fail to parse, but jonny up there can get away with parsing his freakish IPv6 email.

This made me re-evaluate my whole stance on what z.string().email() should do. Zod's users are mostly engineers who are building apps. When you're building an app, you want to make sure your users provide normal-ass email addresses. So that's what z.string().email() should do, imo.

So I rewrote the regex from scratch to be simple and reasonable.

/^([A-Z0-9_+-]+\.?)*[A-Z0-9_+-]@([A-Z0-9][A-Z0-9-]*\.)+[A-Z]{2,}$/i;
  • The username ("local part")
    • can use these characters: [a-zA-Z0-9-+._]
    • can't have two dots in a row
    • can't end on a dot
  • The domain
    • can use these characters: [a-zA-Z0-9-]
    • must have one or more dots somewhere in the middle
    • can't have two dots in a row
    • can't start with a hyphen
    • must end with a TLD that's 2+ letters, only [a-zA-Z] allowed

It is not trying to be RFC 5322 compliant. It's not going to check if the TLD is real. And it's not going to implement any of this craziness:

  • "Printable" characters: wtf-!#$%&'*/=?^_{|}~@mail.com
  • Quoted first parts: "zod is cool"@mail.com
  • Comments: regexiscool(kinda)@mail.com
  • IPv4: billie@[1.2.3.4]
  • IPv6: jonny@[ipv6:7e95:0559:10f2:21e9:9dab:7309:c116:ca3b]
  • Emoji: 👎@mail.com

For anything else, you can always use .regex() or .superRefine().

@netlify
Copy link

netlify bot commented Mar 7, 2023

Deploy Preview for guileless-rolypoly-866f8a ready!

Name Link
🔨 Latest commit 0ada2bc
🔍 Latest deploy log https://app.netlify.com/sites/guileless-rolypoly-866f8a/deploys/646aaba10beb3f000876629f
😎 Deploy Preview https://deploy-preview-2157--guileless-rolypoly-866f8a.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

@igalklebanov
Copy link
Contributor

Agree!

Maybe we should:

  • export ipv4/ipv6 regexes.
  • extract and export each part of the "normy" email regex.

This'll help people migrate to ZodString.regex, combine parts to build their own email regex.

@joseph-lozano
Copy link
Contributor

joseph-lozano commented Mar 10, 2023

Not that you are asking for suggestions but what about:

  1. Leave email() as it is, but mark it deprecated.
  2. Call this new check simpleEmail() (or similar) to highlight the fact that it is not intended to be a complete email validator.

@joseph-lozano
Copy link
Contributor

joseph-lozano commented Mar 10, 2023

Also, what do you think about using the Regex that HTML input type=email elements use? It's both simple and will align with front-end validations.

https://www.w3.org/TR/2012/WD-html-markup-20121025/input.email.html#form.data.emailaddress_xref2

/^[a-zA-Z0-9.!#$%&’*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$/

@colinclerk
Copy link

colinclerk commented Mar 10, 2023

Maybe add + to allowed chars?

@colinhacks
Copy link
Owner Author

colinhacks commented Mar 10, 2023

Maybe add + to allowed chars?

Oops, that was already supported by the regex but missed it in my breakdown. Good catch.

export ipv4/ipv6 regexes.

Definitely. I think we should consider exporting a bunch of regexes from zod/regex.

The problem is that each additional "special" validator like .email() involves adding a new method & handling logic inside ZodError, which adds up over time. Exporting from zod/regex would be treeshakable so we could throw whatever we wanted in there.

It's also possible we should let people specify their own email regex which would let people customize the behavior and still get an "Invalid email" message in their ZodIssue without additional configuration.

z.string().email({ pattern: /asdf/ })

Regex that HTML input type=email

Nice! I'd missed this. I think it makes sense and seems to be functionally equivalent to mine but with support for "printable characters" in the local part. I'm not convinced emails with those characters should be considered valid, but I like the parity with browsers.

Leave email() as it is, but mark it deprecated...simpleEmail...

I don't like camel-cased method names, and I don't want to burn .email() forever because of this. I'm not planning to do a rename or major version bump for this. It's an immediate usability win for the 98% of users who don't realize that .email() currently lets through a bunch of garbage.

@capaj
Copy link

capaj commented Mar 10, 2023

I would love to be able to allow only emails with a top level domain in all my apps that I am writing.
Like sure, if you're writing a corporate app running inside a corporate network na email like john@localserver might make sense, but I would wager 99 percent of people using zod are actually running their valiadtions in the open internet, where these email adresses are useless.

If I was making zod api, I would even make it forbid these adresses without a TLD by default.
so z.string().email() would not parse john@localserver.

If you would want it to parse it, you would have to use z.string().email({requireTld: false}).
Obviously this would be a breaking change so it would have to go into a new major.

@nderscore
Copy link

nderscore commented Mar 10, 2023

I think this regular expression is still a little too permissive when it comes to + characters.

An email where the username is all + characters is still valid with the new regex:

++++++@foo.bar

When used for mail aliasing, the + is usually part of a suffix added to an existing username.

And usually there is only a single plus character (or rarely they're sometimes used as separators username+foo+bar@xxx.xxx)

Since ++++@foo.bar is a pretty unexpected email address for most normal use cases, it might make sense to add these additional constraints to the regexp?

  • Require the username starts with some non-+ character
  • Prevent repeating + characters

@gudleik

This comment was marked as outdated.

@TomasHubelbauer
Copy link

Hi! I just opened #2224 to enable tom@test.te-st.com to pass. Please take domains and subdomains with dashes in them in consideration while designing a new regex! Thank you

@osdiab
Copy link

osdiab commented Apr 5, 2023

i also ran into this with mail-tester.com emails, would be great if this or the other one get merged in!

@ewen-lbh
Copy link

ewen-lbh commented Apr 24, 2023

something@subdomain.domain-with-hyphens.tld emails include pretty much any French student (or even professor) email address, as most email domains are composed of the etu. subdomain (short for étudiant, which is French for "student"), followed by the school/university's domain (as an example, @etu.inp-n7.fr).

You should allow hyphens anywhere in the domain part, excluding all students of a particular country is not a great idea (I suspect that other countries also use those kinds of email addresses).

No offense of course, but as this is (at least indirectly) a matter of inclusivity, I figured that raising the issue before this gets merged is important, and it illustrates @TomasHubelbauer's comment.

ewen-lbh added a commit to inp-net/loca7 that referenced this pull request Apr 24, 2023
@reminjp reminjp mentioned this pull request May 9, 2023
@colinhacks colinhacks merged commit 36fef58 into master May 21, 2023
13 checks passed
@colinhacks colinhacks deleted the email-regex branch May 21, 2023 23:40
@nicolassanmar
Copy link

Is this regex safe? When I try to use it on my code eslint complains that it is not (I'm aware that the one before was worse though):
image

Should we care too much about this? I'm not sure how easy is to input a string to generate a regex DoS attack with this.
Does anyone have thoughts on this?

@DonKoko
Copy link

DonKoko commented Aug 1, 2023

I am still having the issue that .email() doesn't allow dashes in the domain when there is a subdomain. I am using the latest version of zod.

In the end I had to use .refine() with checking the regex myself, but I am wondering if this is expected to work now and if I am missing something?

@rap2hpoutre
Copy link

Thank you! Is there a planned release date for this fix?

@liamrathke
Copy link

liamrathke commented Aug 11, 2023

I ran into this error as well, and we didn't want to use an unstable (canary) Zod build, so I ended up just wrapping Zod and overriding ZodString.email() in a new zodWrapper.ts file.

import { z, type ZodString } from "zod";

// Replace this with whatever regex you need
const emailRegex =
  /^([A-Z0-9_+-]+\.?)*[A-Z0-9_+-]@([A-Z0-9][A-Z0-9\-]*\.)+[A-Z]{2,}$/i;

z.ZodString.prototype.email = function (
  message?:
    | string
    | {
        message?: string | undefined;
      },
): ZodString {
  return this._addCheck({
    kind: "regex",
    regex: emailRegex,
    message: typeof message === "string" ? message : message?.message,
  });
};

export { z };

Then, instead of import { z } from "zod", we just import { z } from "zodWrapper".

Not the cleanest solution, but it saved me from having to change every z.string().email() to a regex.

@micmcgrorty
Copy link

Are you aware that this change excludes anyone with an apostrophe in their name? eg. O'reilly?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet