Linux中国
史上最复杂的验证邮件地址的正则表达式
邮件地址的规范来自于 RFC 5322 。有一个网站 emailregex.com 专门列出各种编程语言下的验证邮件地址的正则表达式,其中很多正则表达式都是我听说过而从未见过的复杂——我想说,做这个网站的程序员是被邮件验证这件事伤害了多深啊!
其实,在产品环境中,一般来说并不需要这么复杂的正则表达式来做到99.99%正确。一般来说,从执行效率和测试覆盖率来说,只需要一个简单的版本即可:
/^[A-Z0-9._%+-]+@[A-Z0-9.-]+.[A-Z]{2,4}$/i
那么下面我们来看看这些更严谨、更复杂的正则表达式吧:
验证邮件地址的通用正则表达式(符合 RFC 5322 标准)
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[x01-x08x0bx0cx0e-x1fx21x23-x5bx5d-x7f]|\[x01-x09x0bx0cx0e-x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[x01-x08x0bx0cx0e-x1fx21-x5ax53-x7f]|\[x01-x09x0bx0cx0e-x7f])+)])
由于各种语言对正则表达式的支持不同、语法差异和覆盖率不同,所以,不同语言里面的正则表达式也不同:
Python
这个是个简单的版本:
r"(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+.[a-zA-Z0-9-.]+$)"
Javascript
这个有点复杂了:
/^[-a-z0-9~!$%^&*_=+}{'?]+(.[-a-z0-9~!$%^&*_=+}{'?]+)*@([a-z0-9_][-a-z0-9_]*(.[-a-z0-9_]+)*.(aero|arpa|biz|com|coop|edu|gov|info|int|mil|museum|name|net|org|pro|travel|mobi|[a-z][a-z])|([0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}))(:[0-9]{1,5})?$/i
Swift
[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}
PHP
PHP 的这个版本就更复杂了,覆盖率就更大一些:
/^(?!(?:(?:x22?x5C[x00-x7E]x22?)|(?:x22?[^x5Cx22]x22?)){255,})(?!(?:(?:x22?x5C[x00-x7E]x22?)|(?:x22?[^x5Cx22]x22?)){65,}@)(?:(?:[x21x23-x27x2Ax2Bx2Dx2F-x39x3Dx3Fx5E-x7E]+)|(?:x22(?:[x01-x08x0Bx0Cx0E-x1Fx21x23-x5Bx5D-x7F]|(?:x5C[x00-x7F]))*x22))(?:.(?:(?:[x21x23-x27x2Ax2Bx2Dx2F-x39x3Dx3Fx5E-x7E]+)|(?:x22(?:[x01-x08x0Bx0Cx0E-x1Fx21x23-x5Bx5D-x7F]|(?:x5C[x00-x7F]))*x22)))*@(?:(?:(?!.*[^.]{64,})(?:(?:(?:xn--)?[a-z0-9]+(?:-[a-z0-9]+)*.){1,126}){1,}(?:(?:[a-z][a-z0-9]*)|(?:(?:xn--)[a-z0-9]+))(?:-[a-z0-9]+)*)|(?:[(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){7})|(?:(?!(?:.*[a-f0-9][:]]){7,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?)))|(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){5}:)|(?:(?!(?:.*[a-f0-9]:){5,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3}:)?)))?(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))(?:.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))){3}))]))$/iD
Perl / Ruby
对与 PHP 的版本,Perl 和 Ruby 表示不服,可以更严谨:
(?:(?:rn)?[ t])*(?:(?:(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^"r\]|\.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^"r\]|\.|(?:(?:rn)?[t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*))*|(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^"r\]|\.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)*<(?:(?:rn)?[ t])*(?:@(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*))*(?:,@(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*))*)*:(?:(?:rn)?[ t])*)?(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^"r\]|\.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^"r\]|\.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*))*>(?:(?:rn)?[ t])*)|(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^"r\]|\.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)*:(?:(?:rn)?[ t])*(?:(?:(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^"r\]|\.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^"r\]|\.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*))*|(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^"r\]|\.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)*<(?:(?:rn)?[ t])*(?:@(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*))*(?:,@(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*))*)*:(?:(?:rn)?[ t])*)?(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^"r\]|\.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^"r\]|\.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*))*>(?:(?:rn)?[ t])*)(?:,s*(?:(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^"r\]|\.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^"r\]|\.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*))*|(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^"r\]|\.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)*<(?:(?:rn)?[ t])*(?:@(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*))*(?:,@(?:(?:rn)?[t])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*))*)*:(?:(?:rn)?[ t])*)?(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^"r\]|\.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^"r\]|\.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00- 31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*))*>(?:(?:rn)?[ t])*))*)?;s
Perl 5.10 及以后版本
上面的版本,嗯,我可以说是天书吗?反正我是没有解读的想法了。当然,新版本的 Perl 语言还有一个更易读的版本(你是说真的么?)
/(?(DEFINE)
(?<address> (?&mailbox) | (?&group))
(?<mailbox> (?&name_addr) | (?&addr_spec))
(?<name_addr> (?&display_name)? (?&angle_addr))
(?<angle_addr> (?&CFWS)? < (?&addr_spec) > (?&CFWS)?)
(?<group> (?&display_name) : (?:(?&mailbox_list) | (?&CFWS))? ;
(?&CFWS)?)
(?<display_name> (?&phrase))
(?<mailbox_list> (?&mailbox) (?: , (?&mailbox))*)
(?<addr_spec> (?&local_part) @ (?&domain))
(?<local_part> (?&dot_atom) | (?"ed_string))
(?<domain> (?&dot_atom) | (?&domain_literal))
(?<domain_literal> (?&CFWS)? [ (?: (?&FWS)? (?&dcontent))* (?&FWS)?
] (?&CFWS)?)
(?<dcontent> (?&dtext) | (?"ed_pair))
(?<dtext> (?&NO_WS_CTL) | [x21-x5ax5e-x7e])
(?<atext> (?&ALPHA) | (?&DIGIT) | [!#$%&'*+-/=?^_`{|}~])
(?<atom> (?&CFWS)? (?&atext)+ (?&CFWS)?)
(?<dot_atom> (?&CFWS)? (?&dot_atom_text) (?&CFWS)?)
(?<dot_atom_text> (?&atext)+ (?: . (?&atext)+)*)
(?<text> [x01-x09x0bx0cx0e-x7f])
(?<quoted_pair> \ (?&text))
(?<qtext> (?&NO_WS_CTL) | [x21x23-x5bx5d-x7e])
(?<qcontent> (?&qtext) | (?"ed_pair))
(?<quoted_string> (?&CFWS)? (?&DQUOTE) (?:(?&FWS)? (?&qcontent))*
(?&FWS)? (?&DQUOTE) (?&CFWS)?)
(?<word> (?&atom) | (?"ed_string))
(?<phrase> (?&word)+)
# Folding white space
(?<FWS> (?: (?&WSP)* (?&CRLF))? (?&WSP)+)
(?<ctext> (?&NO_WS_CTL) | [x21-x27x2a-x5bx5d-x7e])
(?<ccontent> (?&ctext) | (?"ed_pair) | (?&comment))
(?<comment> ( (?: (?&FWS)? (?&ccontent))* (?&FWS)? ) )
(?<CFWS> (?: (?&FWS)? (?&comment))*
(?: (?:(?&FWS)? (?&comment)) | (?&FWS)))
# No whitespace control
(?<NO_WS_CTL> [x01-x08x0bx0cx0e-x1fx7f])
(?<ALPHA> [A-Za-z])
(?<DIGIT> [0-9])
(?<CRLF> x0d x0a)
(?<DQUOTE> ")
(?<WSP> [x20x09])
)
(?&address)/x
Ruby (简单版)
Ruby 表示,其实人家还有个简单版本:
/A([w+-].?)+@[a-zd-]+(.[a-z]+)*.[a-z]+z/i
.NET
这样的版本谁没有啊——.NET 说:
^w+([-+.']w+)*@w+([-.]w+)*.w+([-.]w+)*$
grep 命令
用 grep 命令在文件中查找邮件地址,我想你不会写个若干行的正则表达式吧,意思一下就行了:
$ grep -E -o "b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+.[A-Za-z]{2,6}b" filename.txt
SQL Server
在 SQL Server 中也是可以用正则表达式的,不过这个代码片段应该是来自某个产品环境中的,所以,还体贴的照顾了那些把邮件地址写错的人:
select email
from table_name where
patindex ('%[ &'',":;!+=/()<>]%', email) > 0 -- Invalid characters
or patindex ('[@.-_]%', email) > 0 -- Valid but cannot be starting character
or patindex ('%[@.-_]', email) > 0 -- Valid but cannot be ending character
or email not like '%@%.%' -- Must contain at least one @ and one .
or email like '%..%' -- Cannot have two periods in a row
or email like '%@%@%' -- Cannot have two @ anywhere
or email like '%.@%' or email like '%@.%' -- Cannot have @ and . next to each other
or email like '%.cm' or email like '%.co' -- Camaroon or Colombia? Typos.
or email like '%.or' or email like '%.ne' -- Missing last letter
Oracle PL/SQL
这个是不是有点偷懒?尤其是在那些“复杂”的正则表达式之后:
SELECT email
FROM table_name
WHERE REGEXP_LIKE (email, '[A-Z0-9._%-]+@[A-Z0-9._%-]+.[A-Z]{2,4}');
MySQL
好吧,看来最后也一样懒:
SELECT * FROM `users` WHERE `email` NOT REGEXP '^[A-Z0-9._%-]+@[A-Z0-9.-]+.[A-Z]{2,4}$';
那么,你有没有关于验证邮件地址的正则表达式分享给大家?
本文转载来自 Linux 中国: https://github.com/Linux-CN/archive
对这篇文章感觉如何?
太棒了
0
不错
0
爱死了
0
不太好
0
感觉很糟
0
More in:Linux中国
捐赠 Let's Encrypt,共建安全的互联网
随着 Mozilla、苹果和谷歌对沃通和 StartCom 这两家 CA 公司处罚落定,很多使用这两家 CA 所签发证书的网站纷纷寻求新的证书签发商。有一个非盈利组织可以为大家提供了免费、可靠和安全的 SSL 证书服务,这就是 Let's Encrypt 项目。现在,它需要您的帮助
Let's Encrypt 正式发布,已经保护 380 万个域名
由于 Let's Encrypt 让安装 X.509 TLS 证书变得非常简单,所以这个数量增长迅猛。
关于Linux防火墙iptables的面试问答
Nishita Agarwal是Tecmint的用户,她将分享关于她刚刚经历的一家公司(印度的一家私人公司Pune)的面试经验。在面试中她被问及许多不同的问题,但她是iptables方面的专家,因此她想分享这些关于iptables的问题和相应的答案给那些以后可能会进行相关面试的人。 所有的问题和相应的答案都基于Nishita Agarwal的记忆并经过了重写。 嗨,朋友!我叫Nishita Agarwal。我已经取得了理学学士学位,我的专业集中在UNIX和它的变种(BSD,Linux)。它们一直深深的吸引着我。我在存储方面有1年多的经验。我正在寻求职业上的变化,并将供职于印度的P
Lets Encrypt 已被所有主流浏览器所信任
旨在让每个网站都能使用 HTTPS 加密的非赢利组织 Lets Encrypt 已经得了 IdenTrust的交叉签名,这意味着其证书现在已经可以被所有主流的浏览器所信任。从这个里程碑事件开始,访问者访问使用了Lets Encrypt 证书的网站不再需要特别配置就可以得到 HTTPS 安全保护了。 Lets Encrypt 的两个中级证书 ...

















