Tue, 30 May 2006
MIME::Words and UTF-8
We use the MIME::Words package from CPAN to handle encoding and decoding of RFC 1522-style e-mail headers (those =?UTF-8?Q?something=20something?= -like texts).
A long time ago I found that this package had a bug: when encoding two adjacent words, the whitespace between them has to be included in the first or the second encoded word, because whitespace between two adjacent encoded words is discarded during decoding. When moving our system to UTF-8, I decided to install a new MIME::Words module, and I wondered whether this bug had been fixed.
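To illustrate the whitespace problem, here is a minimal sketch in Python rather than Perl. It does not use MIME::Words at all; the helper names (q_encode, decode_header_value, encode_words) are made up for this example, and it handles only the Q encoding, so treat it as an illustration under those assumptions and not as the module's actual code.

```python
import re

# Matches one Q-encoded word: =?charset?Q?payload?=
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?[Qq]\?([^?\s]*)\?=")


def q_encode(text, charset="UTF-8"):
    """Encode text as a single Q-encoded word (hypothetical helper)."""
    out = []
    for byte in text.encode(charset):
        if byte < 128 and chr(byte).isalnum():
            out.append(chr(byte))          # plain ASCII letter or digit
        else:
            out.append("=%02X" % byte)     # everything else, including space
    return "=?%s?Q?%s?=" % (charset, "".join(out))


def _decode_word(match):
    charset, payload = match.group(1), match.group(2)
    payload = payload.replace("_", " ")    # "_" stands for a space in Q encoding
    raw = re.sub(r"=([0-9A-Fa-f]{2})",
                 lambda m: chr(int(m.group(1), 16)), payload)
    # raw holds one character per byte; turn it back into bytes, then decode
    return raw.encode("latin-1").decode(charset)


def decode_header_value(value):
    """Decode encoded words, dropping whitespace between adjacent ones."""
    value = re.sub(r"(\?=)\s+(?==\?)", r"\1", value)
    return ENCODED_WORD.sub(_decode_word, value)


def encode_words(words, charset="UTF-8"):
    """Encode each word separately, keeping the separating space inside
    the preceding encoded word so that it survives decoding."""
    parts = [q_encode(w + " ", charset) for w in words[:-1]]
    parts.append(q_encode(words[-1], charset))
    return " ".join(parts)


words = ["žluťoučký", "kůň"]

# Naive approach: the space lives only between the two encoded words.
naive = " ".join(q_encode(w) for w in words)
print(decode_header_value(naive))   # the two words run together

# Space carried inside the first encoded word.
fixed = encode_words(words)
print(decode_header_value(fixed))   # the space is preserved
```

The design choice here is to attach the space to the preceding word; attaching it to the following word would work equally well, as long as it ends up inside one of the encoded words.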
In the manpage, they wrote:
It does not comply with the RFC-1522 rules regarding the use of encoded words in message headers. You may want to roll your own variant, using encoded_mimeword(), for your application. Thanks to Jan Kasprzak for reminding me about this problem.
So they did not fix the problem I reported 3-5 years ago; they just acknowledged its existence (even with my name :-). The module also does not handle multi-byte characters (in UTF-8 strings) correctly, and defaults to the ISO-8859-1 encoding instead.
I have decided to fix this module, solving both the problem of two adjacent encoded words and the problems with encoding and decoding multi-byte strings. Here is the patch for MIME::Words and UTF-8. Hopefully they will apply it soon.