Linux中國

史上最複雜的驗證郵件地址的正則表達式

郵件地址的規範來自於 RFC 5322 。有一個網站 emailregex.com 專門列出各種編程語言下的驗證郵件地址正則表達式,其中很多正則表達式都是我聽說過而從未見過的複雜——我想說,做這個網站的程序員是被郵件驗證這件事傷害了多深啊!

其實,在產品環境中,一般來說並不需要這麼複雜的正則表達式來做到99.99%正確。一般來說,從執行效率和測試覆蓋率來說,只需要一個簡單的版本即可:

/^[A-Z0-9._%+-]+@[A-Z0-9.-]+.[A-Z]{2,4}$/i

那麼下面我們來看看這些更嚴謹、更複雜的正則表達式吧:

驗證郵件地址的通用正則表達式(符合 RFC 5322 標準)

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[x01-x08x0bx0cx0e-x1fx21x23-x5bx5d-x7f]|\[x01-x09x0bx0cx0e-x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[x01-x08x0bx0cx0e-x1fx21-x5ax53-x7f]|\[x01-x09x0bx0cx0e-x7f])+)])

由於各種語言對正則表達式的支持不同、語法差異和覆蓋率不同,所以,不同語言裡面的正則表達式也不同:

Python

這個是個簡單的版本:

r"(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+.[a-zA-Z0-9-.]+$)"

Javascript

這個有點複雜了:

/^[-a-z0-9~!$%^&*_=+}{'?]+(.[-a-z0-9~!$%^&*_=+}{'?]+)*@([a-z0-9_][-a-z0-9_]*(.[-a-z0-9_]+)*.(aero|arpa|biz|com|coop|edu|gov|info|int|mil|museum|name|net|org|pro|travel|mobi|[a-z][a-z])|([0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}))(:[0-9]{1,5})?$/i

Swift

[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}

PHP

PHP 的這個版本就更複雜了,覆蓋率就更大一些:

/^(?!(?:(?:x22?x5C[x00-x7E]x22?)|(?:x22?[^x5Cx22]x22?)){255,})(?!(?:(?:x22?x5C[x00-x7E]x22?)|(?:x22?[^x5Cx22]x22?)){65,}@)(?:(?:[x21x23-x27x2Ax2Bx2Dx2F-x39x3Dx3Fx5E-x7E]+)|(?:x22(?:[x01-x08x0Bx0Cx0E-x1Fx21x23-x5Bx5D-x7F]|(?:x5C[x00-x7F]))*x22))(?:.(?:(?:[x21x23-x27x2Ax2Bx2Dx2F-x39x3Dx3Fx5E-x7E]+)|(?:x22(?:[x01-x08x0Bx0Cx0E-x1Fx21x23-x5Bx5D-x7F]|(?:x5C[x00-x7F]))*x22)))*@(?:(?:(?!.*[^.]{64,})(?:(?:(?:xn--)?[a-z0-9]+(?:-[a-z0-9]+)*.){1,126}){1,}(?:(?:[a-z][a-z0-9]*)|(?:(?:xn--)[a-z0-9]+))(?:-[a-z0-9]+)*)|(?:[(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){7})|(?:(?!(?:.*[a-f0-9][:]]){7,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?)))|(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){5}:)|(?:(?!(?:.*[a-f0-9]:){5,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3}:)?)))?(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))(?:.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))){3}))]))$/iD

Perl / Ruby

對與 PHP 的版本,Perl 和 Ruby 表示不服,可以更嚴謹:

(?:(?:rn)?[ t])*(?:(?:(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^"r\]|\.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^"r\]|\.|(?:(?:rn)?[t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*))*|(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^"r\]|\.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)*<(?:(?:rn)?[ t])*(?:@(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*))*(?:,@(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*))*)*:(?:(?:rn)?[ t])*)?(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^"r\]|\.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^"r\]|\.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*))*>(?:(?:rn)?[ t])*)|(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^"r\]|\.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)*:(?:(?:rn)?[ t])*(?:(?:(?:[^()<>@,;:\".[]00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^"r\]|\.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^"r\]|\.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*))*|(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^"r\]|\.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)*<(?:(?:rn)?[ t])*(?:@(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*))*(?:,@(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[]00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*))*)*:(?:(?:rn)?[ t])*)?(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^"r\]|\.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^"r\]|\.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*))*>(?:(?:rn)?[ t])*)(?:,s*(?:(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^"r\]|\.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^"r\]|\.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*))*|(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^"r\]|\.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)*<(?:(?:rn)?[ t])*(?:@(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*))*(?:,@(?:(?:rn)?[t])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*))*)*:(?:(?:rn)?[ t])*)?(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^"r\]|\.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|"(?:[^"r\]|\.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:\".[] 00-31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:\".[]]))|[([^[]r\]|\.)*](?:(?:rn)?[ t])*))*>(?:(?:rn)?[ t])*))*)?;s

Perl 5.10 及以後版本

上面的版本,嗯,我可以說是天書嗎?反正我是沒有解讀的想法了。當然,新版本的 Perl 語言還有一個更易讀的版本(你是說真的么?)

/(?(DEFINE)
(?<address> (?&mailbox) | (?&group))
(?<mailbox> (?&name_addr) | (?&addr_spec))
(?<name_addr> (?&display_name)? (?&angle_addr))
(?<angle_addr> (?&CFWS)? < (?&addr_spec) > (?&CFWS)?)
(?<group> (?&display_name) : (?:(?&mailbox_list) | (?&CFWS))? ;
(?&CFWS)?)
(?<display_name> (?&phrase))
(?<mailbox_list> (?&mailbox) (?: , (?&mailbox))*)

(?<addr_spec> (?&local_part) @ (?&domain))
(?<local_part> (?&dot_atom) | (?&quoted_string))
(?<domain> (?&dot_atom) | (?&domain_literal))
(?<domain_literal> (?&CFWS)? [ (?: (?&FWS)? (?&dcontent))* (?&FWS)?
] (?&CFWS)?)
(?<dcontent> (?&dtext) | (?&quoted_pair))
(?<dtext> (?&NO_WS_CTL) | [x21-x5ax5e-x7e])

(?<atext> (?&ALPHA) | (?&DIGIT) | [!#$%&&apos;*+-/=?^_`{|}~])
(?<atom> (?&CFWS)? (?&atext)+ (?&CFWS)?)
(?<dot_atom> (?&CFWS)? (?&dot_atom_text) (?&CFWS)?)
(?<dot_atom_text> (?&atext)+ (?: . (?&atext)+)*)

(?<text> [x01-x09x0bx0cx0e-x7f])
(?<quoted_pair> \ (?&text))

(?<qtext> (?&NO_WS_CTL) | [x21x23-x5bx5d-x7e])
(?<qcontent> (?&qtext) | (?&quoted_pair))
(?<quoted_string> (?&CFWS)? (?&DQUOTE) (?:(?&FWS)? (?&qcontent))*
(?&FWS)? (?&DQUOTE) (?&CFWS)?)

(?<word> (?&atom) | (?&quoted_string))
(?<phrase> (?&word)+)

# Folding white space
(?<FWS> (?: (?&WSP)* (?&CRLF))? (?&WSP)+)
(?<ctext> (?&NO_WS_CTL) | [x21-x27x2a-x5bx5d-x7e])
(?<ccontent> (?&ctext) | (?&quoted_pair) | (?&comment))
(?<comment> ( (?: (?&FWS)? (?&ccontent))* (?&FWS)? ) )
(?<CFWS> (?: (?&FWS)? (?&comment))*
(?: (?:(?&FWS)? (?&comment)) | (?&FWS)))

# No whitespace control
(?<NO_WS_CTL> [x01-x08x0bx0cx0e-x1fx7f])

(?<ALPHA> [A-Za-z])
(?<DIGIT> [0-9])
(?<CRLF> x0d x0a)
(?<DQUOTE> ")
(?<WSP> [x20x09])
)

(?&address)/x

Ruby (簡單版)

Ruby 表示,其實人家還有個簡單版本:

/A([w+-].?)+@[a-zd-]+(.[a-z]+)*.[a-z]+z/i

.NET

這樣的版本誰沒有啊——.NET 說:

^w+([-+.&apos;]w+)*@w+([-.]w+)*.w+([-.]w+)*$

grep 命令

用 grep 命令在文件中查找郵件地址,我想你不會寫個若干行的正則表達式吧,意思一下就行了:

$ grep -E -o "b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+.[A-Za-z]{2,6}b" filename.txt

SQL Server

在 SQL Server 中也是可以用正則表達式的,不過這個代碼片段應該是來自某個產品環境中的,所以,還體貼的照顧了那些把郵件地址寫錯的人:

 select email 
 from table_name where 
 patindex (&apos;%[ &&apos;&apos;,":;!+=/()<>]%&apos;, email) > 0 -- Invalid characters
 or patindex (&apos;[@.-_]%&apos;, email) > 0 -- Valid but cannot be starting character
 or patindex (&apos;%[@.-_]&apos;, email) > 0 -- Valid but cannot be ending character
 or email not like &apos;%@%.%&apos; -- Must contain at least one @ and one .
 or email like &apos;%..%&apos; -- Cannot have two periods in a row
 or email like &apos;%@%@%&apos; -- Cannot have two @ anywhere
 or email like &apos;%.@%&apos; or email like &apos;%@.%&apos; -- Cannot have @ and . next to each other
 or email like &apos;%.cm&apos; or email like &apos;%.co&apos; -- Camaroon or Colombia? Typos. 
 or email like &apos;%.or&apos; or email like &apos;%.ne&apos; -- Missing last letter

Oracle PL/SQL

這個是不是有點偷懶?尤其是在那些「複雜」的正則表達式之後:

SELECT email 
FROM table_name
WHERE REGEXP_LIKE (email, &apos;[A-Z0-9._%-]+@[A-Z0-9._%-]+.[A-Z]{2,4}&apos;);

MySQL

好吧,看來最後也一樣懶:

SELECT * FROM `users` WHERE `email` NOT REGEXP &apos;^[A-Z0-9._%-]+@[A-Z0-9.-]+.[A-Z]{2,4}$&apos;;

那麼,你有沒有關於驗證郵件地址的正則表達式分享給大家?


本文轉載來自 Linux 中國: https://github.com/Linux-CN/archive

對這篇文章感覺如何?

太棒了
0
不錯
0
愛死了
0
不太好
0
感覺很糟
0
雨落清風。心向陽

    You may also like

    Leave a reply

    您的郵箱地址不會被公開。 必填項已用 * 標註

    此站點使用Akismet來減少垃圾評論。了解我們如何處理您的評論數據

    More in:Linux中國