Tue, 18 Jul 2006
Comma-Separated Values?
While migrating IS MU to UTF-8, I rewrote the code for exporting tabular
data to CSV file for MS Excel, factoring it out to a separate module.
When I was at it, I have also added the Content-Disposition
header, so that the exported file is saved under a sane filename,
instead of the default of some_application.pl
. So now
the Excel exports are saved as files ending with the .csv
suffix.
Which is, interestingly enough, the source of problems and incompatibilities
with MS Excel.
As I have verified, when I save the CSV file as file.pl
,
excel reads it correctly - it asks whether the TAB
character
is the field separator (indeed it is), whether Windows-1250 is the
file encoding (it is), and happily imports the file. When the same file
is named file.csv
, Excel opens it without any question,
but somehow does not recognize the TAB
character as the
field separator. So all the fields are merged to the first column,
and the TAB
characters are displayed as those ugly rectangles.
When I try to separate the fileds with semicolons, Excel happily opens the
file (when named as *.csv
), but with another file name, it is
necessary to explicitly choose the semicolon as a separator. Just another
example of MS stupidity - why the separator cannot be the same regardless
of the file name? And by the way, what
does CSV stand for? Comma-separated values? Colon-separated values? It does
not work for commas nor colons. Just semicolons are detected correctly.
Maybe it is some kind of newspeak invented by Microsoft.
I guess I keep the exports TAB-delimited, and just change the file name
in the Content-Disposition
header to use the .txt
extension instead (altough something like
.its_csv_you_stupid_excel
would probably be more
appropriate).