Thu, 02 Nov 2006
Regexp of the day
Regular expressions are widely used in text manipulation, such as parsing e-mail addresses. I tried to use the Email::Address module for this (it allows an address to be modified, unlike a similar module, Mail::Address). My task was to qualify unqualified addresses, i.e. to change the address
AYANAMI Rei <rei>
to something like
AYANAMI Rei <rei@nerv.gov.jp>
However, Email::Address cannot parse unqualified addresses (unlike Mail::Address, as I discovered later :-). Fortunately, Email::Address provides the regular expression for matching the address, and it can be modified. So the solution would be easy, wouldn't it? Here we go - the regexp in question is the following (line-wrapped for convenience):
$ perl -MEmail::Address -e 'print $Email::Address::angle_addr'
(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|
(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xis
m:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s
*\)\s*)|\s+)*<(?-xism:(?-xism:(?-xism:(?-xism:(?-xism:\s*\((?:\s*
(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-
xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[
^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*(?-xism:[^\x00-\x1F\x7
F()<>\[\]:;@\,."\s]+(?:\.[^\x00-\x1F\x7F()<>\[\]:;@\,."\s]+)*)(?-
xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(
?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\
]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+
)*)|(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]
+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?
-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*))
)*\s*\)\s*)|\s+)*"(?-xism:(?-xism:[^\\"])|(?-xism:\\(?-xism:[^\x0
A\x0D])))+"(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+
))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-
xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))
*\s*\)\s*)|\s+)*))\@(?-xism:(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?
-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xi
sm:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\
x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*(?-xism:[^\x00-\x1F\x7F(
)<>\[\]:;@\,."\s]+(?:\.[^\x00-\x1F\x7F()<>\[\]:;@\,."\s]+)*)(?-xi
sm:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-
xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+
))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*
)|(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+)
)|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-x
ism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|))*\s*\)\s*)))*
\s*\)\s*)|\s+)*\[(?:\s*(?-xism:(?-xism:[^\[\]\\])|(?-xism:\\(?-xi
sm:[^\x0A\x0D]))))*\s*\](?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xis
m:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\
s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)
)*\s*\)\s*)))*\s*\)\s*)|\s+)*)))>(?-xism:(?-xism:\s*\((?:\s*(?-xi
sm:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:
\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A
\x0D]))|))*\s*\)\s*)))*\s*\)\s*)|\s+)*)
My dear lazyweb, who will be the first to find out how to edit this regexp so that it parses unqualified addresses as well?
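For comparison, here is one possible way around the problem that does not touch the big regexp at all: qualify the bare local part with a small substitution before the string reaches Email::Address. This is only a minimal sketch in core Perl. The helper name qualify_angle_addr is made up for illustration, and it assumes a simple local part inside angle brackets (no quoted strings, no comments), so it covers far less than the regexp above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Email::Address): append a default
# domain to every bare local part found inside angle brackets.
sub qualify_angle_addr {
    my ($header, $domain) = @_;
    # Match <local> where local has no '@', whitespace, quotes or brackets,
    # so already qualified addresses are left untouched.
    $header =~ s/<([^\s<>@"]+)>/<$1\@$domain>/g;
    return $header;
}

print qualify_angle_addr('AYANAMI Rei <rei>', 'nerv.gov.jp'), "\n";
```

An address that already contains an @ sign does not match the pattern, so running the helper twice on the same string should be harmless.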
Update - Fri, 03 Nov 2006: Another solution
I ended up using Mail::Address: whenever I have to replace an unqualified address with a qualified one, I create a new object as a replacement.
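A rough sketch of that approach might look like the following. It is not the exact code I run; the helper name qualify_addresses and the rule "no @ sign means unqualified" are assumptions made for the example. It uses only the documented Mail::Address methods parse, new, phrase, address, comment and format.

```perl
use strict;
use warnings;
use Mail::Address;

# Hypothetical helper: return the header value with every unqualified
# address replaced by a newly built, qualified Mail::Address object.
sub qualify_addresses {
    my ($line, $domain) = @_;
    my @out;
    for my $addr (Mail::Address->parse($line)) {
        my $result = $addr;
        if ($addr->address !~ /\@/) {
            # Build a replacement object instead of editing the old one.
            $result = Mail::Address->new(
                scalar $addr->phrase,
                $addr->address . '@' . $domain,
                scalar $addr->comment,
            );
        }
        push @out, $result->format;
    }
    return join ', ', @out;
}

print qualify_addresses('AYANAMI Rei <rei>', 'nerv.gov.jp'), "\n";
```

Addresses that already carry a domain pass through unchanged, apart from whatever normalization format applies.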