Hey, email validation is hard. Do you know how to validate an email address for sure? Neither do I, but here is a report on what I found while working on postmoogle.
So, what’s your wild guess? Regular expressions? Well:
- only pretty simple cases - yes.
- RFC-compliant - no (RFC822 is only the “base” one, there are other RFCs that extend it).
- Being sure that the email address is actually valid - also no.
So, how can you really do that?
First of all - don’t use regexp. You either have to be a god or… Nah, that’s the only option to create a regular expression that will match all possible variations of a valid email address. So - use a library that does all those validations (e.g. in Go there is mail.ParseAddress in the stdlib’s net/mail package that makes your life simpler).
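Here is a minimal sketch of that format check in Go. The helper name validFormat is mine, not something taken from postmoogle; the only real API used is mail.ParseAddress from net/mail.

```go
package main

import "net/mail"

// validFormat reports whether s parses as a single address.
// It checks the syntax only and says nothing about whether
// the mailbox actually exists.
func validFormat(s string) bool {
	_, err := mail.ParseAddress(s)
	return err == nil
}
```

With this, validFormat("firstname.lastname@example.org") is true and validFormat("not an email") is false. Keep in mind that mail.ParseAddress also accepts the "Name <address>" form, so if you need the bare address, use the Address field of the value it returns.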
But that will validate the format only, and it won’t check if the address itself is valid. For example, here is an email address with a 100% valid format, but it’s still invalid: email@example.com
Why is it invalid? Well, because there is no such address. It’s forged. I sense a “(x) doubt” feeling from you! You may ask “Okay, but why would I worry about that? The email address’s format is valid? Valid. I don’t care.”
You have to.
Email is the most spammed thing nowadays, so all email providers have some kind of spam protection, both for incoming and outgoing mail. If you don’t care, your email provider will ban/suspend your account for a high bounce rate.
That’s a serious problem. For example, from my experience, AWS SES will just freeze your email operations completely, until you contact support and prove that you fixed the cause of the high bounce rate on your side.
So, the question is:
How to be sure that an email address is actually valid?
So… You want to hear a story, eh?
One about treasure hunters? Haha, have I got a story for you! Pandora… accidental borderlands
A story about rocket science, again. What did you expect, 3 lines of code to validate an email? Hahaha, you’re a funny one.
Jokes aside, let’s try to do that with a firstname.lastname@example.org email (yes, a simple email address, not even a not-simple-but-still-valid undisclosed-recipients:; email address, just for the sake of simplicity):
- Try to get the server address from the email address; usually, it’s the domain name after the @ symbol. In our case, that is example.org.
- Try to get the MX records of that server address (e.g. by using tools like dig or your programming language’s MX lookup). You’re lucky! There are some MX records.
- if you are not lucky and there are no MX records - just use the server address itself (actually, according to RFC5321, you have to go through multiple record types in a specific order, but your programming language’s stdlib is usually smart enough to do that automatically).
- Now, try to connect to the discovered servers, one per step, in a for loop. If you successfully connect to <serveraddress>:25 - you’re lucky again! But if not - try the same address with port 587 (yes, it’s not RFC-compatible, but I saw several mail servers configured to accept submissions from other MTAs on that port. Don’t ask me why - this post is a kind of study report, not a ready-to-use instruction with all edge cases covered). If it doesn’t work - move on to the next iteration.
- Finally, you found a working SMTP server that supposedly can give you a clue about the validity of the specific email address.
- After you found a working SMTP server and connected to it, you have to start an SMTP session (
The Final Part is the answer of the SMTP server to the RCPT TO command:
- if there is no error - the email address is valid, congratulations!
- if there is an error - you have to dig deeper:
- 45x error usually means that you have been greylisted and you have to repeat the check after some time (usually minutes or hours). It doesn’t say if the email is valid or not - it just preemptively rate limits you.
- any other error usually means that the email address is invalid or there are some problems with the email server, either way - you can’t send email to that address now and it doesn’t matter that it will become valid again in N days.
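To make the last steps a bit more concrete, here is a rough Go sketch of the SMTP conversation and of reading the RCPT TO answer. It is not postmoogle’s actual code: Verdict, classify and probe are names I made up for illustration, helo and from are values you have to supply yourself, and the timeouts are arbitrary. The calls into net, net/smtp and net/textproto are the real stdlib ones.

```go
package main

import (
	"errors"
	"net"
	"net/smtp"
	"net/textproto"
	"time"
)

// Verdict is a hypothetical three-way result of the check.
type Verdict int

const (
	Valid      Verdict = iota // RCPT TO was accepted (25x)
	Greylisted                // 45x: nothing is known yet, retry later
	Invalid                   // any other reply
)

// classify maps the reply code of RCPT TO to a Verdict.
func classify(code int) Verdict {
	switch code / 10 {
	case 25:
		return Valid
	case 45:
		return Greylisted
	}
	return Invalid
}

// probe talks to one host:port and returns the reply code of RCPT TO.
// A non-nil error means the conversation never got as far as a
// RCPT TO reply, so the caller should try the next server or port.
func probe(hostport, helo, from, to string) (int, error) {
	conn, err := net.DialTimeout("tcp", hostport, 10*time.Second)
	if err != nil {
		return 0, err
	}
	defer conn.Close()
	_ = conn.SetDeadline(time.Now().Add(30 * time.Second))

	host, _, _ := net.SplitHostPort(hostport)
	c, err := smtp.NewClient(conn, host) // reads the server greeting
	if err != nil {
		return 0, err
	}
	if err := c.Hello(helo); err != nil {
		return 0, err
	}
	if err := c.Mail(from); err != nil {
		return 0, err
	}
	err = c.Rcpt(to)
	_ = c.Quit() // be polite; no DATA is ever sent

	var reply *textproto.Error
	if errors.As(err, &reply) {
		return reply.Code, nil // the server answered with an error code
	}
	if err != nil {
		return 0, err // network or protocol failure, not a verdict
	}
	return 250, nil // accepted; the exact 25x code is not exposed
}
```

The caller would loop over the discovered servers and over ports 25 and 587, call probe for each, and pass the first code it gets to classify. What to do with Greylisted is a policy decision that this sketch deliberately leaves open.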
Why did I use “server address” instead of “domain name”? Because email@example.com is a valid email, too. Now, live with it.
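The lookup steps from the list above could look roughly like this in Go, treating whatever follows the @ as an opaque server address. The helper names serverOf and candidates are hypothetical; net.LookupMX is the real stdlib call. The fallback to the server address itself is done by hand here, which is my assumption about how to wire it up and not a claim about what any particular library does for you.

```go
package main

import (
	"net"
	"strings"
)

// serverOf returns whatever follows the last "@" in addr.
// The second result is false when there is no usable "@".
func serverOf(addr string) (string, bool) {
	i := strings.LastIndex(addr, "@")
	if i < 1 || i == len(addr)-1 {
		return "", false
	}
	return addr[i+1:], true
}

// candidates lists the MX hosts of server in preference order and
// falls back to the server address itself when the lookup yields
// nothing usable.
func candidates(server string) []string {
	mxs, _ := net.LookupMX(server) // results are sorted by preference
	var hosts []string
	for _, mx := range mxs {
		// MX hosts come back with a trailing dot; skip empty names.
		if h := strings.TrimSuffix(mx.Host, "."); h != "" {
			hosts = append(hosts, h)
		}
	}
	if len(hosts) == 0 {
		hosts = []string{server}
	}
	return hosts
}
```

Each entry returned by candidates can then be combined with a port via net.JoinHostPort and tried in turn.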
You may ask “How should I decide if the email address is valid or not if I was greylisted?” I have no idea.
I just mark such emails as invalid, because even if that email exists and you actually can send email to it,
it’s unclear whether it will greylist you again when you try to send an email, and (more important, IMO)
how will your email service treat greylisting?
considers such cases as bounce, so why should I care?
If the user’s email provider preemptively greylists even simple validity checks - that’s the user’s problem, not mine.
What did you expect? A solution? Oh, my sweet summer child, there is none.
Hope that helps :D
PS: How does that relate to postmoogle? It uses the same steps described above to validate incoming (and send outgoing) emails.