Yenya's World

Wed, 31 Jan 2007

Mugshot

A few days ago I (again) visited the Mugshot pages. To my surprise, I was able to create an account (during my last visit Mugshot was invite-only). Even Matthew Szulik mentions Mugshot in today's interview at AbcLinuxu (in Czech, but there is a link to the original English Ogg file near the bottom of the page). I occasionally see Mugshot mentioned in various blog posts, mostly by Red Hat people.

But what is the purpose of Mugshot? Some Mugshot users talk about it as if it were the Next Big Thing(tm) since sliced bread. I have registered, but I am not sure why people use it, or how they use it (I am talking about the WWW interface; I have not tried the standalone client yet).

For example: I (will :-) have an RSS feed of my bookmarks, and I think some of my friends would be interested in looking at these links. Why should they get them through Mugshot instead of subscribing to the RSS feed in their RSS readers? And still, I think publishing my bookmarks cannot be done from my account; I have to create a private group and subscribe it to the RSS feed instead. On a different topic: Mugshot has some kind of IM connection, but it is AIM only (no Jabber from an open source company like Red Hat, WTF?).

So, are there any Mugshot users in my Lazyweb?

Section: /computers (RSS feed) | Permanent link | 1 writebacks

1 replies for this story:

Vasek Stodulka wrote:

I don't get it. I do not understand this whole thing. Does it do something that can't be done with RSS? You can create something like an RSS feed from other feeds, but this is also possible in (for example) Google Reader. Maybe it is simply as useless as it looks. :-)


Tue, 30 Jan 2007

Social bookmarking

I was never much interested in this Web 2.0 thing and community sites. However, a bookmarking system similar to del.icio.us has recently been implemented in IS MU, including tagging and other features. I find it easy to use for keeping track of articles I want to read some time in the future (bookmark, tag, and forget). It is definitely better than my previous approach of keeping such "to read" articles as open tabs in Galeon.

There is also the added bonus of seeing other users' public bookmarks to find something interesting. Hopefully there will be an RSS interface too. For now, you can see my bookmarks (loosely) related to Linux at is.muni.cz/ln/kas/linux, or all my bookmarks at is.muni.cz/ln/kas.

I think it differs from del.icio.us in its user base: bookmarks can be added only by authenticated users who are related to Masaryk University. Also, the users of a (public) bookmark are publicly visible, which should probably reduce spam.

Section: /computers (RSS feed) | Permanent link | 0 writebacks

0 replies for this story:


Mon, 29 Jan 2007

PNG Transparency (continued)

Well, the problem of PNG transparency was more complicated than I thought. First, how it works: the original approach looks up all the <img> tags in the document and replaces them with the following text:

<span ... style="width:XXXpx;height:YYYpx;filter:progid:
      DXImageTransform.Microsoft.AlphaImageLoader(src=
      'URL_TO_THE_IMAGE',sizingMethod='scale');">
</span>

There are problems with this approach, however:

The JavaScript code does not quote the image URL correctly, so if the URL includes a quote character, the transformation is incorrect.
The code searches for PNG images by looking at the src attribute of the <img> tag and testing whether it ends with the ".PNG" string. This does not work for our images, because they are generated and their URLs end with the TeX code in the query string instead of the PNG file extension. Does anybody know how to find images based on their MIME type instead of the URL?
Another problem was in MSIE parsing itself: when the image URL contained a closing parenthesis (strings like f(x) are common in math :-), MSIE took the parenthesis as the end of the whole AlphaImageLoader() function, even though the closing quote of the URL had not been seen yet :-( I had to add URL-quoting of parentheses to the JavaScript code.
And the worst one: when the URL contained backslashes (be they quoted as %5C or not), the actual HTTP request made by the AlphaImageLoader had each backslash replaced by two backslashes! I don't know how to solve this cleanly. As you can guess, backslashes are also pretty common in the TeX syntax :-). What I did was to add ";msiehack=1" at the end of the AlphaImageLoader URL, and on the server side replace every sequence of two backslashes with a single backslash if the "msiehack=1" argument is seen.
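
The two workarounds above (quoting the troublesome characters on the client side, and collapsing the doubled backslashes on the server side) can be sketched roughly as follows. This is only an illustration in Python, not the code we actually run: the function names are made up, and I am assuming the ";msiehack=1" marker is always the very last thing in the query string.

```python
def escape_for_alpha_loader(url: str) -> str:
    """Percent-encode characters that confuse the AlphaImageLoader filter string.

    The quote would end the src='...' value early, and MSIE treats ")" as the
    end of the whole AlphaImageLoader(...) call.
    """
    replacements = {"'": "%27", "(": "%28", ")": "%29"}
    return "".join(replacements.get(ch, ch) for ch in url)


def add_msie_marker(url: str) -> str:
    """Append the marker that tells the server to undo the backslash doubling."""
    return url + ";msiehack=1"


def undo_msie_backslashes(query: str) -> str:
    """Server side: collapse doubled backslashes, but only for marked requests."""
    marker = ";msiehack=1"
    if not query.endswith(marker):
        return query  # ordinary request, leave it alone
    query = query[: -len(marker)]
    # Every backslash arrived doubled, so each pair maps back to one backslash.
    return query.replace("\\\\", "\\")
```

Requests without the marker pass through untouched, so browsers which do not need the hack are not affected.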

In the end, I have the PNG transparency hack working, but the amount of dirty hacking needed to get there is simply stunning.

Section: /computers (RSS feed) | Permanent link | 0 writebacks

0 replies for this story:


Mon, 15 Jan 2007

PNG Transparency

In electronic tests in IS MU we use images for displaying mathematics (as MathML is still not widely supported by browsers). We have recently moved this system from mimeTeX to a native TeX-based system (with dvipng as a back-end), which provides the full TeX syntax, including possible additional macros such as AMS-LaTeX. However, the PNG files generated by dvipng have problems with transparency in MS Internet Explorer.

(The above picture will not be easily visible in MSIE 5.5 and 6.) The older (and not-so-old) versions of MSIE display only 100% opaque pixels in PNG images, so thin (and thus partly transparent) lines in mathematics are often not displayed at all - see the following MSIE 6 screenshot:

One of the solutions is to change the full alpha transparency to binary (fully transparent/fully opaque) transparency, but this looks bad when the image background is something other than the default. There are various hacks to enable transparency in MSIE 5.5 and 6, but none of them worked for me. After a more-than-expected amount of time spent on experiments, I found that dvipng generates 4-bit PNG images with a palette, while MSIE can handle (even with this hack) only true-color (non-palette) images. Another strange problem solved: use the "--truecolor" switch to dvipng.
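
For anybody hunting the same problem: whether a PNG file is palette-based can be read directly from its IHDR chunk, which by the PNG specification follows the 8-byte file signature. Here is a minimal sketch in Python (the function name is mine, and it does no validation beyond the signature and the chunk type):

```python
import struct
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Color type codes as defined by the PNG specification.
COLOR_TYPES = {
    0: "grayscale",
    2: "truecolor",
    3: "palette",
    4: "grayscale+alpha",
    6: "truecolor+alpha",
}


def png_color_info(data: bytes) -> Tuple[int, str]:
    """Return (bit depth, color type name) read from the IHDR chunk."""
    if len(data) < 26 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # IHDR data starts at offset 16: width, height, bit depth, color type.
    _width, _height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return bit_depth, COLOR_TYPES.get(color_type, "unknown")
```

Feeding it the first 26 bytes of a generated image is enough to tell a 4-bit palette image from a true-color one.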

Section: /computers (RSS feed) | Permanent link | 3 writebacks

3 replies for this story:

Vasek Stodulka wrote:

If everything is grayscale, you should convert it to GIFs. I think GIF supports only binary alpha, but it is supported in all versions of MSIE quite well.

Yenya wrote: binary alpha

PNG (and dvipng) supports binary alpha as well, the problem is that full alpha looks much better.

Stepan wrote: use IE7 :(

We had the same problem on avivaz.cz. Unfortunately, we did not find a solution...


Fri, 12 Jan 2007

Graceful Reload

Yesterday we tried to solve a problem we had probably had for a month or so: we had observed very high load spikes on our application cluster servers. There were usually only a few such spikes a day, and the spikes usually did not occur on all servers simultaneously. I think the problem had been there since we moved to the new system (Apache 2.2 based, native x86_64). Here is a load graph (the problem was solved around 5:30pm):

Mirek found that during this load peak there was an extraordinary number of Apache processes serving our title page (which is quite computationally intensive, but rarely used on such a massive scale). So we thought somebody was DDoSing us. But according to the Apache status page, the clients came from the 127.0.0.1 address.

I don't know of any case where our application would want to access our title page over HTTP (we do some self-referencing requests for, for example, WAP access, but none for the title page). After increasing the server log level we found that these requests had a strange User-Agent value, "internal dummy connection". A quick search for this string gave us the answer:

During the "graceful reload", the main Apache process apparently contacts its children not by sending them the SIGUSR1 signal, as in previous Apache releases, but by sending them a dummy "GET /" request, so that after the request they can check (and find out) that the configuration has changed, and terminate themselves.

So every time we changed something in our applications (which is several times a day), there were many Apache processes trying to serve our dynamic title page to Apache itself. Because there are some other (service-only) Apache processes, the load spike was sometimes way bigger than an ordinary remote DDoS attack can cause. A mod_rewrite hack in the server configuration solved the problem - we redirect such dummy requests to /robots.txt instead of the dynamic title page:

<Directory /documentroot>
        RewriteEngine on
        RewriteCond %{HTTP_USER_AGENT} internal\ dummy\ connection
        RewriteRule ^$ /robots.txt [L]
	...
</Directory>

If you ask me, I think it is a pretty lame way for a server to restart itself. The URL in the internal request is not even configurable (what would Apache do when not configured to listen on 127.0.0.1 at all?), and from my searches it looks like we are not the first to run into this problem.
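
If you want to check whether your own servers suffer from the same thing, counting the dummy requests per minute in the access log and comparing the result with the load graph is a good start. A rough sketch in Python follows. It assumes the log uses the "combined" format (timestamp in square brackets, User-Agent logged), and the sample lines are invented for illustration, not copied from our logs:

```python
from collections import Counter

# Hypothetical access log lines in the "combined" format.
SAMPLE_LOG = [
    '127.0.0.1 - - [12/Jan/2007:17:05:31 +0100] "GET / HTTP/1.0" 200 5120 "-" "Apache (internal dummy connection)"',
    '127.0.0.1 - - [12/Jan/2007:17:05:32 +0100] "GET / HTTP/1.0" 200 5120 "-" "Apache (internal dummy connection)"',
    '192.0.2.7 - - [12/Jan/2007:17:05:40 +0100] "GET /robots.txt HTTP/1.1" 200 26 "-" "SomeBrowser/1.0"',
    '127.0.0.1 - - [12/Jan/2007:17:06:02 +0100] "GET / HTTP/1.0" 200 5120 "-" "Apache (internal dummy connection)"',
]


def count_dummy_requests(lines):
    """Count "internal dummy connection" requests per minute."""
    counts = Counter()
    for line in lines:
        if "internal dummy connection" not in line:
            continue
        start = line.find("[")
        end = line.find("]", start)
        if start == -1 or end == -1:
            continue  # no timestamp found, skip the line
        timestamp = line[start + 1:end]  # e.g. 12/Jan/2007:17:05:31 +0100
        counts[timestamp[:17]] += 1      # keep only day/month/year:hour:minute
    return counts
```

A minute with dozens of such requests right after a reload is a strong hint that the title page is being rendered for nobody.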

Section: /computers (RSS feed) | Permanent link | 4 writebacks

4 replies for this story:

Peter Kruty wrote: What's wrong with SIGUSR1?

Sounds really stupid. I wonder what's wrong with SIGUSR1.

Spes wrote: Re: What's wrong with SIGUSR1?

Maybe to have the same code for all systems, because not all support signals?

mutante wrote: Apache Wiki Page on Internal Dummy Connection

John Gillespie wrote:

Thanks for the info, I've been wondering what all those lines in my logs were about...


Wed, 10 Jan 2007

HP Procurve Upgrade

As for the packet loss problem I wrote about yesterday: I have searched a bit, and found this page, which recommends using the qos-passthrough-mode for switches with variable link speeds. Unfortunately this requires a firmware upgrade, because the firmware of the switch in question is too old. Well, let's have a look at what the latest-greatest firmware offers:

IMPORTANT
Starting with software version I.08.74, FEC trunks (Cisco Systems’ Fast EtherChannel for aggregated links) are no longer supported, and generation of CDP (Cisco Discovery Protocol) packets are no longer supported. In their place are IEEE standards-based LACP aggregated links (as well as statically configured trunks) and generation of LLDP packets for device discovery. [...]
IMPORTANT
Software version I.08.71 detects and disables non-genuine ProCurve transceivers and mini-GBICs discovered in Series 2800 Switch ports. When a non-genuine device is discovered, the switch disables the port and generates an error message in the Event Log.

So they intentionally removed support for CDP and EtherChannel to promote the open-standard protocols (which would have been nice if the old implementation had been kept in place, possibly disabled by default), and they intentionally refuse to work with non-HP GBICs, even though physically these are perfectly OK (which is plain evil of them).

So it seems HP has joined A-T and Cisco in the list of ethernet switch vendors which are evil. Fortunately there is a firmware version which already supports the qos-passthrough-mode, and still does not have the above two, ahem, improvements.

Section: /computers (RSS feed) | Permanent link | 0 writebacks

0 replies for this story:


Tue, 09 Jan 2007

Packet Loss

During the last few days we have experienced spikes of unusually high packet loss on one of our networks. This finally made me install SmokePing, a network latency and packet loss measurement tool by Tobi Oetiker (author of MRTG and rrdtool, two other excellent open source network monitoring tools).

I can recommend SmokePing - it is easy to configure and does what I want.

On the packet loss front: I still do not know the exact cause. One of the problems was that we were under a huge network scan a few days ago, so maybe the IP blacklist got too big. The other problem definitely is that this network is connected to the router by a 100baseTX interface only, while the switch of that network as well as the server NICs have gigabit speed. But I thought all buffers along the path should be big enough for TCP to adapt to the available bandwidth. Linux has a 1000-packet queue for a 100Mbit interface, and 5000 packets for a gigabit one. The switch (HP 2824) says the following about the memory:

Packet   - Total   : 1998        
Buffers    Free    : 1607        
           Lowest  : 1590        
           Missed  : 0

which I interpret as "no packets lost because of memory shortage". However, the uplink interface definitely shows something strange:

Status and Counters - Port Counters for port 23

Name  :                                                                 

Link Status     : Up  

Bytes Rx        : 4,173,035,216       Bytes Tx        : 290,829,123       
Unicast Rx      : 1,193,775,726       Unicast Tx      : 1,030,811,282     
Bcast/Mcast Rx  : 421,349             Bcast/Mcast Tx  : 12,537,341        

FCS Rx          : 0                   Drops Rx        : 4,386,042         
Alignment Rx    : 0                   Collisions Tx   : 0                 
Runts Rx        : 0                   Late Colln Tx   : 0                 
Giants Rx       : 0                   Excessive Colln : 0                 
Total Rx Errors : 0                   Deferred Tx     : 0

The interesting part is the Drops Rx value. The value there is too big (and by far the biggest of all ports), but why is it not included in Total Rx Errors? The manual apparently does not say anything about the exact meaning of these counters. Is my lazyweb more informed?
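
To put the Drops Rx number into perspective, it can be compared with the number of frames received on that port. The sketch below (Python) parses the relevant counter lines from the output above and computes the ratio. Note the assumption: I do not know whether the switch counts dropped frames in Unicast Rx and Bcast/Mcast Rx or not, so the result is only a rough estimate.

```python
import re

# Counter lines copied from the port 23 output above.
COUNTERS = """
Unicast Rx      : 1,193,775,726       Unicast Tx      : 1,030,811,282
Bcast/Mcast Rx  : 421,349             Bcast/Mcast Tx  : 12,537,341
FCS Rx          : 0                   Drops Rx        : 4,386,042
"""


def parse_counters(text):
    """Turn "Name : 1,234" pairs (two per line) into a dict of integers."""
    pattern = re.compile(r"([A-Za-z/ ]+?)\s*:\s*([\d,]+)")
    return {
        name.strip(): int(value.replace(",", ""))
        for name, value in pattern.findall(text)
    }


def rx_drop_ratio(counters):
    """Dropped frames relative to all received frames (unicast + bcast/mcast)."""
    received = counters["Unicast Rx"] + counters["Bcast/Mcast Rx"]
    return counters["Drops Rx"] / received


if __name__ == "__main__":
    print("Rx drop ratio: {:.3%}".format(rx_drop_ratio(parse_counters(COUNTERS))))
```

Averaged over the lifetime of the counters this comes to a fraction of a percent, which says nothing about how bad the spikes themselves are; for that the counters would have to be sampled periodically.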

Section: /computers (RSS feed) | Permanent link | 2 writebacks

2 replies for this story:

Vasek Stodulka wrote:

Have you tried to switch it off and on again? :-) (This is a quote from "The IT Crowd" series - just in case you have not seen it yet.)

Yenya wrote: power cycle

I have not tried to power cycle it yet, but I have already rebooted it.


Mon, 08 Jan 2007

Spam in 2007

I've come across an interesting case of mail misclassified by our DSpam filter. Some of the reasons given by DSpam were the following:

Date*2007+11, 0.99000,
Received*Jan+2007, 0.99000,
Date*2007, 0.99000,
Received*2007+11, 0.99000,
Received*2007, 0.99000,

It seems that when DSpam was initially trained, all mail which contained the string "2007" in the Date: or Received: headers was spam (obviously - only spam or severely misconfigured mail servers had a system date that far in the future).

The question is, what is the correct solution to this problem: should the four-digit number in those two headers be a hard-coded exception? Should DSpam use higher-level information (like SpamAssassin does), such as "Date: is more than 36 hours in the future"? Or maybe users should send a few messages to the DSpam training address every year on January 1st?
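
Just to illustrate what such a higher-level test could look like, here is a minimal sketch in Python of the "Date: is more than 36 hours in the future" check. It is neither how DSpam works nor SpamAssassin's actual code; the function name and the way the current time is passed in are my own choices, and headers which cannot be parsed are not handled at all.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def date_too_far_in_future(date_header, now, limit=timedelta(hours=36)):
    """Return True if the Date: header is more than `limit` ahead of `now`.

    `now` must be a timezone-aware datetime.
    """
    sent = parsedate_to_datetime(date_header)
    if sent.tzinfo is None:
        # A header with the "-0000" zone parses as a naive datetime; treat it as UTC.
        sent = sent.replace(tzinfo=timezone.utc)
    return sent - now > limit
```

With a feature like this the classifier would learn "date in the future" instead of the literal token "2007", so the rule would not turn against legitimate mail when the calendar catches up.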

Section: /computers (RSS feed) | Permanent link | 1 writebacks

1 replies for this story:

Milan Zamazal wrote:

I think this is mostly about learning strategy. First, it seems your spam database is overtrained; it's unlikely that many spam messages that required training were future-dated. Changes in learning strategy may also prevent such problems: how about automated (re)learning of a "ham message of the day" every day, i.e. the ham message most different from the other messages received that day? I wouldn't like the other proposed solutions (hardcoded exceptions and higher-level information); they are complicated, human-assisted and out of the scope of the classifier.


Thu, 04 Jan 2007

Happy New Year

I wish everyone a happy new year 2007.

Section: /personal (RSS feed) | Permanent link | 0 writebacks

