Wed, 31 Jan 2007
Mugshot
A few days ago I visited the Mugshot pages again. To my surprise I was able to create an account (during my last visit Mugshot was invite-only). Even Matthew Szulik mentions Mugshot in today's interview at AbcLinuxu (in Czech, but there is a link to the original English Ogg file near the bottom of the page). I occasionally see Mugshot mentioned in various blog posts, mostly by Red Hat people.
But what is the purpose of Mugshot? Some Mugshot users talk about it as if it were the Next Big Thing(tm), the best thing since sliced bread. I have registered, but I am not sure why people use it or how they use it (I am talking about the WWW interface; I have not tried the standalone client yet).
For example: I (will :-) have an RSS feed of my bookmarks, and I think some of my friends would be interested in looking at these links. Why should they get them through Mugshot instead of subscribing to the RSS feed in their RSS readers? Moreover, I think publishing my bookmarks cannot be done from my account; I have to create a private group and subscribe it to the RSS feed instead. On a different topic: Mugshot has some kind of IM connection, but it is AIM only (no Jabber from an open source company like Red Hat, WTF?).
So, are there any Mugshot users in my Lazyweb?
1 reply for this story:
Vasek Stodulka wrote:
I don't get it. I do not understand this whole thing. Does it do something that can't be done with RSS? You can build something like an RSS feed from other feeds, but this is also possible in (for example) Google Reader. Maybe it is simply as useless as it looks. :-)
Tue, 30 Jan 2007
Social bookmarking
I was never much interested in this Web 2.0 thing and community sites. However, a bookmarking system similar to del.icio.us, including tagging and other features, has recently been implemented in IS MU. I found it easy to use for keeping track of articles I want to read some time in the future (bookmark, tag, and forget). It is definitely better than my previous approach of keeping such "to read" articles as open tabs in Galeon.
There is also the added bonus of browsing other users' public bookmarks to find something interesting. Hopefully there will be an RSS interface too. For now, you can see my bookmarks (loosely) related to Linux at is.muni.cz/ln/kas/linux, or all my bookmarks at is.muni.cz/ln/kas.
I think it differs from del.icio.us in its user base - bookmarks can be added by authenticated users only, and those users are related to Masaryk University. Also, the users of a (public) bookmark are publicly visible, which should probably reduce spam.
Mon, 29 Jan 2007
PNG Transparency (continued)
Well, the problem of PNG transparency was more complicated than I thought.
Firstly, how does it work: the original approach tries to look up all the <img> tags in the document and replace each of them with the following text:
<span ... style="width:XXXpx;height:YYYpx;filter:progid:DXImageTransform.Microsoft.AlphaImageLoader(src='URL_TO_THE_IMAGE',sizingMethod='scale');"> </span>
There are problems with this approach, however:
- Firstly, the JavaScript code does not quote the image URL correctly, so if the URL included a quote character, the transformation would be incorrect.
- The code searches for PNG images by looking at the src attribute of the <img> tag and testing whether it ends with a ".PNG" string. This does not work for our images, because they are generated, and their URLs end with the TeX code in the query string instead of the PNG file extension. Does anybody know how to search for images based on their MIME type instead of the URL?
- Another problem was in MSIE parsing itself: when the image URL contained a closing parenthesis (strings like f(x) are common in math :-), MSIE took the parenthesis as the end of the whole AlphaImageLoader() call, even though the closing quote of the URL had not been seen yet :-( I had to add URL-quoting of parentheses to the JavaScript code.
- And the worst one: when the URL contained backslashes (be they quoted as %5C or not), the actual HTTP request made by the AlphaImageLoader had each backslash replaced by two backslashes! I don't know how to solve this cleanly. As you can guess, backslashes are also pretty common in the TeX syntax :-). What I did was to add ";msiehack=1" at the end of the AlphaImageLoader URL, and on the server side substitute all sequences of two backslashes back to a single backslash if the "msiehack=1" argument is seen.
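The last two workarounds can be sketched roughly as follows. This is only an illustration in Python, not the code we actually run: the function names are made up, the client-side part really lives in JavaScript, and the sketch assumes the server sees the query string already decoded, with ";msiehack=1" as its very last part.

```python
def quote_for_alpha_image_loader(url: str) -> str:
    """Percent-encode the characters that confuse AlphaImageLoader(src='...')."""
    # Hypothetical helper: only the quote and the parentheses are handled here.
    replacements = {"'": "%27", "(": "%28", ")": "%29"}
    return "".join(replacements.get(ch, ch) for ch in url)


def add_msie_marker(url: str) -> str:
    """Mark the URL so the server knows the request went through the MSIE hack."""
    return url + ";msiehack=1"


def undo_msie_backslashes(query: str) -> str:
    """Collapse doubled backslashes, but only when the marker is present."""
    marker = ";msiehack=1"
    if not query.endswith(marker):
        return query  # not an MSIE request, leave the TeX code alone
    tex = query[: -len(marker)]
    # Each pair of backslashes becomes one, so a genuine TeX "\\"
    # (doubled to four by MSIE) still comes out as two.
    return tex.replace("\\\\", "\\")
```

Without the marker the server must not touch the backslashes at all, because a double backslash is valid TeX on its own.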
In the end I have the PNG transparency hack working, but the amount of dirty hacking needed to get there is simply stunning.
Mon, 15 Jan 2007
PNG Transparency
In electronic tests in IS MU we use images for displaying mathematics (as MathML is still not widely supported by browsers). We have recently moved this system from mimeTeX to a native TeX-based system (with dvipng as a back-end), which provides the TeX syntax, including possible additional macros such as AMSLaTeX. However, the PNG files generated by dvipng have problems with transparency in MS Internet Explorer.
(the above picture will not be easily visible in MSIE 5.5 and 6). The older (and not-so-old) versions of MSIE display only 100% opaque pixels in the PNG images, so thin (and thus partly transparent) lines in mathematics are often not displayed at all - see the following MSIE 6 screenshot:
One of the solutions is to change the full alpha transparency to binary (fully transparent/fully opaque) transparency, but this looks bad when the image background is something other than the default.
There are various hacks to enable transparency in MSIE 5.5 and 6, but none of them worked for me. After spending more time than expected on experiments, I found that dvipng generates 4-bit PNG images with a palette, while MSIE can handle (even with this hack) only static color images.
Another strange problem solved - use the "--truecolor" switch to dvipng.
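To check which kind of PNG a given file is, it is enough to look at its IHDR chunk: the bit depth and the color type sit right after the width and height. Here is a minimal Python sketch of such a check (the function name is made up, and it reads only the header, not the image data):

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Color type values as defined by the PNG specification.
COLOR_TYPES = {
    0: "grayscale",
    2: "truecolor",
    3: "palette",
    4: "grayscale+alpha",
    6: "truecolor+alpha",
}


def png_color_info(data: bytes):
    """Return (bit_depth, color_type_name) read from the IHDR chunk."""
    if len(data) < 26 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file starting with an IHDR chunk")
    # IHDR data: width (4 bytes), height (4 bytes), bit depth (1), color type (1), ...
    _width, _height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return bit_depth, COLOR_TYPES.get(color_type, "unknown")
```

Feeding it the first 26 bytes of a file, e.g. open(path, "rb").read(26), shows whether the file is a palette image or a true color one.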
3 replies for this story:
Vasek Stodulka wrote:
If everything is grayscale, you should convert it to GIFs. I think GIF supports only binary alpha, but it is supported in all versions of MSIE quite well.
Yenya wrote: binary alpha
PNG (and dvipng) supports binary alpha as well, the problem is that full alpha looks much better.
Stepan wrote: use IE7 :(
We had the same problem on avivaz.cz. Unfortunately, we did not find a solution...
Fri, 12 Jan 2007
Graceful Reload
Yesterday we tried to solve a problem we had probably had for a month or so: we had observed very high load spikes on our application cluster servers. There were usually only a few such spikes a day, and they usually did not occur on all servers simultaneously. I think the problem had been there ever since we moved to the new system (Apache 2.2 based, native x86_64). Here is a load graph (the problem was solved around 5:30pm):
Mirek found that during such a load peak there was an extraordinary number of Apache processes serving our title page (which is quite computationally intensive, but rarely requested on such a massive scale). So we thought somebody was DDoSing us. But according to the Apache status page the clients came from the 127.0.0.1 address.
I don't know of any case where our application would want to access our title page over HTTP (we do some self-referencing requests for, for example, WAP access, but none for the title page). After increasing the server log level we found that these requests had a strange User-Agent value: "internal dummy connection". A quick search for this string gave us the answer:
During the "graceful reload", the main Apache process apparently contacts its children not by sending them the SIGUSR1 signal, as in previous Apache releases, but instead sends them a dummy "GET /" request, so that after the request they can check (and find out) that the configuration has been changed, and terminate themselves.
So every time we changed something in our applications (which is several times a day), there were many Apache processes trying to serve our dynamic title page to Apache itself. Because there are some other (service-only) Apache processes, the load spike was sometimes way bigger than what an ordinary remote DDoS attack can cause.
A mod_rewrite hack in the server configuration has solved the problem - we redirect such dummy requests to /robots.txt instead of the dynamic title page:
<Directory /documentroot>
    RewriteEngine on
    RewriteCond %{HTTP_USER_AGENT} internal\ dummy\ connection
    RewriteRule ^$ /robots.txt [L]
    ...
</Directory>
If you ask me, I think it is a pretty lame way for Apache to restart itself. The URL in the internal request is not even configurable (what would Apache do when not configured to listen on 127.0.0.1 at all?), and from my searches it looks like we are not the first to run into this problem.
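To see how many of these dummy requests hit a server and when, one can count them per minute in the access log. Below is a rough Python sketch. It assumes the log is in the usual "combined" format, where the User-Agent is the last quoted field on the line; the function name is made up for illustration.

```python
import re
from collections import Counter

# In the combined log format the User-Agent is the last quoted field.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')
# Timestamp such as [12/Jan/2007:17:20:01 +0100], cut down to the minute.
MINUTE_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2})")


def dummy_requests_per_minute(lines):
    """Count requests with the internal dummy User-Agent, grouped by minute."""
    per_minute = Counter()
    for line in lines:
        agent = USER_AGENT_RE.search(line)
        if not agent or "internal dummy connection" not in agent.group(1):
            continue
        minute = MINUTE_RE.search(line)
        per_minute[minute.group(1) if minute else "unknown"] += 1
    return per_minute
```

Calling dummy_requests_per_minute(open("access_log")) and printing most_common(10) on the result should show whether the bursts line up with the graceful reloads.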
4 replies for this story:
Peter Kruty wrote: What's wrong with SIGUSR1?
Sounds really stupid. I wonder what's wrong with SIGUSR1.
Spes wrote: Re: What's wrong with SIGUSR1?
Maybe to have the same code for all systems, because not all support signals?
mutante wrote: Apache Wiki Page on Internal Dummy Connection
John Gillespie wrote:
Thanks for the info, I've been wondering what all those lines in my logs were about...
Wed, 10 Jan 2007
HP Procurve Upgrade
As for the packet loss problem I wrote about yesterday: I have searched a bit, and found this page, which recommends using the qos-passthrough-mode for switches with variable link speeds. Unfortunately this requires a firmware upgrade, because the switch in question has too old a firmware. Well, let's have a look at what the latest-greatest firmware offers:
IMPORTANT
Starting with software version I.08.74, FEC trunks (Cisco Systems’ Fast EtherChannel for aggregated links) are no longer supported, and generation of CDP (Cisco Discovery Protocol) packets are no longer supported. In their place are IEEE standards-based LACP aggregated links (as well as statically configured trunks) and generation of LLDP packets for device discovery. [...]
IMPORTANT
Software version I.08.71 detects and disables non-genuine ProCurve transceivers and mini-GBICs discovered in Series 2800 Switch ports. When a non-genuine device is discovered, the switch disables the port and generates an error message in the Event Log.
So they intentionally removed support for CDP and EtherChannel to promote the open-standard protocols (which would have been nice if the old implementation had been kept in place, possibly disabled by default), and they intentionally refuse to work with non-HP GBICs, even though these are physically perfectly OK (which is plain evil of them).
So it seems HP has joined A-T and Cisco in the list of evil ethernet switch vendors. Fortunately there is a firmware version which already supports the qos-passthrough-mode, and still does not have the above two, ahem, improvements.
Tue, 09 Jan 2007
Packet Loss
During the last few days we have experienced spikes of unusually high packet loss on one of our networks. This finally made me install SmokePing, a network latency and packet loss measurement tool by Tobi Oetiker (author of MRTG and rrdtool, two other excellent open source network monitoring tools).
I can recommend SmokePing - it is easy to configure and does what I want.
On the packet loss front: I still do not know the exact cause. One of the problems was that we were under a huge network scan a few days ago, so maybe the IP blacklist got too big. The other problem definitely is that this network is connected to the router by a 100baseTX interface only, while the switch of that network as well as the server NICs have gigabit speed. But I thought all buffers along the path should be big enough for TCP to adapt to the available bandwidth. Linux has a queue of 1000 packets for a 100Mbit interface, and 5000 packets for a gigabit one. The switch (HP 2824) says the following about the memory:
 Packet   - Total  : 1998
 Buffers    Free   : 1607
            Lowest : 1590
            Missed : 0
which I interpret as "no packet lost because of the memory shortage". However, the uplink interface definitely shows something strange:
 Status and Counters - Port Counters for port 23

  Name  :
  Link Status     : Up

  Bytes Rx        : 4,173,035,216        Bytes Tx        : 290,829,123
  Unicast Rx      : 1,193,775,726        Unicast Tx      : 1,030,811,282
  Bcast/Mcast Rx  : 421,349              Bcast/Mcast Tx  : 12,537,341

  FCS Rx          : 0                    Drops Rx        : 4,386,042
  Alignment Rx    : 0                    Collisions Tx   : 0
  Runts Rx        : 0                    Late Colln Tx   : 0
  Giants Rx       : 0                    Excessive Colln : 0
  Total Rx Errors : 0                    Deferred Tx     : 0
The interesting part is the Drops Rx value. The value there is too big (and by far the biggest of all ports), but why is it not included in Total Rx Errors? The manual apparently does not say anything about the exact meaning of these counters. Is my lazyweb better informed?
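To put the Drops Rx number into perspective, it can be compared with the number of frames received on the same port. A small Python sketch of that arithmetic follows. It assumes the dropped frames are counted among the received ones and that none of the counters has wrapped around since the last reset; I know neither for sure, and the helper names are made up.

```python
def parse_counter(text: str) -> int:
    """Turn a counter as printed by the switch, e.g. '4,386,042', into an int."""
    return int(text.replace(",", ""))


def drop_ratio(drops_rx: str, unicast_rx: str, bcast_mcast_rx: str) -> float:
    """Fraction of received frames that were dropped (under the assumptions above)."""
    received = parse_counter(unicast_rx) + parse_counter(bcast_mcast_rx)
    return parse_counter(drops_rx) / received


# Values taken from the port 23 counters shown above.
ratio = drop_ratio("4,386,042", "1,193,775,726", "421,349")
print(f"Drops Rx / frames Rx = {ratio:.4%}")
```

This gives roughly 0.37 % averaged over the whole counting period. The loss comes in spikes, so the rate during a spike is presumably much higher than this average.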
2 replies for this story:
Vasek Stodulka wrote:
Have you tried to switch it off and on again? :-) (This is a quote from "The IT Crowd" series - just for the case you have not seen it yet.)
Yenya wrote: power cycle
I have not tried to power cycle it yet, but I have already rebooted it.
Mon, 08 Jan 2007
Spam in 2007
I've come across an interesting case of mail misclassified by our DSpam filter. Some of the reasons given by DSpam were the following:
Date*2007+11, 0.99000, Received*Jan+2007, 0.99000, Date*2007, 0.99000, Received*2007+11, 0.99000, Received*2007, 0.99000,
It seems that when DSpam was initially trained, all mail which contained the string "2007" in the Date: or Received: headers was spam (obviously - only spam or severely misconfigured mail servers had the system date that far in the future).
The question is what the correct solution to this problem is: should the four-digit number in those two headers be a hard-coded exception? Should DSpam use higher-level information (like SpamAssassin does), such as "Date: is more than 36 hours in the future"? Or maybe users should send a few messages to the DSpam training address every year on January 1st?
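The higher-level rule could look roughly like this. It is only a Python sketch of the "Date: is more than 36 hours in the future" idea, not how SpamAssassin or DSpam actually implement anything; the function name is made up, and the current time is passed in explicitly (as a timezone-aware datetime) so that the check is easy to test.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

MAX_FUTURE = timedelta(hours=36)


def date_is_too_far_in_future(date_header: str, now: datetime) -> bool:
    """True if the Date: header lies more than 36 hours after `now`."""
    try:
        sent = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        return False  # unparsable dates are a different rule's business
    if sent.tzinfo is None:
        # A "-0000" zone parses as a naive datetime; treat it as UTC.
        sent = sent.replace(tzinfo=timezone.utc)
    return sent - now > MAX_FUTURE
```

Unlike the "2007" token, such a rule does not go stale on January 1st, because it compares the header with the time of delivery rather than with whatever dates were seen during training.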
1 reply for this story:
Milan Zamazal wrote:
I think this is much about learning strategy. First, it seems your spam database is overtrained, it's unlikely many spam messages that required training were future-date. Changes in learning strategy may also prevent such problems, how about automated (re)learning of a "ham message of the day" every day, i.e. a ham message most different of other messages received that day? I wouldn't like the other proposed solutions (hardcoded exceptions and higher level information), they are complicated, human assisted and out of the scope of the classifier.
Thu, 04 Jan 2007
Happy New Year
I wish everyone a happy new year 2007.