Yenya's World

Mon, 26 Feb 2007

Superlinear Algorithms

A while ago I wrote about the Email::Address Perl module for parsing addresses in the e-mail headers (such as To:, Cc:, etc.). We use it in production system, but on Thursday we have got a problem, which I narrowed to an inoptimality in this module. The mail delivery daemon (which - amongst other things - parses the mail message) seemed to be stuck in an infinite loop.

Later I found out that the message in question contained an overly long To: header - it had more than 300 mail addresses there. I have investigated this further, and it seems that the Email::Address module has complexity far above O(n):

# of addresses	parsing time
60	0.13 s
65	0.13 s
70	0.29 s
75	0.45 s
80	0.79 s
85	0.58 s
90	0.86 s
95	1.63 s
100	2.38 s
105	3.17 s
110	4.02 s
115	5.97 s
120	11.37 s
125	230.89 s
130	1045.07 s

So it was definitely impossible for it to parse a header with >300 addresses in a reasonable amount of time. The alternative module, Mail::Address, is unusable on this particular message as well - it crashes perl with SIGSEGV :-(.

As a workaround, I wanted to truncate the header somewhere, but in order to truncate the header exactly between e-mail addresses, I would have to parse the header. Which is what I has been doing, and it took too much time. A nice Catch 22, isn't it? So I have invented a heuristics where I truncate the header after a comma followed by a newline. When this is not enough, I use just a sole comma as a separator, and when even this is not enough, I truncate the header to a fixed number of characters. So far it seems to be sufficient.

Section: /computers (RSS feed) | Permanent link | 3 writebacks

Yenya's World

Mon, 26 Feb 2007

Superlinear Algorithms

About:

Links:

Categories:

Archive:

Blog roll: