Thu, 25 May 2006
Kernel upgrades, crashes
Yesterday I upgraded the kernel on our main server to the newest stable release, 2.6.16.18. The server had been running a pretty historical kernel, 2.6.11.10, because newer kernels had some problems with XFS on this setup. But even the older kernel crashed from time to time. The 2.6.16.18 kernel booted fine, I did a few additional tweaks, and put the server back into production use. Today, however, some problems appeared:
We found that NFS clients using volumes from this server cannot lock
files using fcntl(). According to tcpdump, the server just did not
respond to the RPC locking requests. However, the same kernel version on
a different server, with the same line in /etc/exports, worked
correctly, even with file locking.
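A quick way to check from an NFS client whether fcntl() locking works at all is a small probe like the following. This is only a sketch, not the test that was actually run here: the function name try_lock is made up for illustration, and the path to test is whatever file on the NFS mount you choose. Python's fcntl.lockf() is a wrapper around the fcntl() locking calls, so it exercises the same client code path.

```python
import fcntl
import os


def try_lock(path):
    """Try to take and release an exclusive fcntl() lock on path.

    Returns True if both calls succeed, False if either raises OSError
    (for example ENOLCK, or EAGAIN when somebody else holds the lock).
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # LOCK_NB only makes a *conflicting* lock fail immediately.
        # It does not add a timeout for a lock server that never answers,
        # so on a broken NFS setup this call may still hang.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    except OSError:
        return False
    finally:
        os.close(fd)
```

Calling try_lock() on a file inside the NFS-mounted volume should return True when locking works. If the server does not answer the locking requests, the call will most likely block or fail rather than return True, which is the symptom described above.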
I think the difference was in the NFS server utils, which are older
on the production system (RHEL3) than on the other server (FC3). However,
I couldn't recompile newer NFS utils, because they depend on a newer
version of the Kerberos libraries. I was about to reboot the server back to
the older kernel when it crashed on me. I am back on 2.6.11.10 and I will
probably wait till we upgrade this box to RHEL4. If the problem is still
there then, I will probably make it a Red Hat problem.
When testing NFS with the newest kernel on that other box, I found
that the 3ware driver does not work with the iommu=off boot option.
I wonder if akpm is right about the kernel getting buggier.
Moreover, it seems that the Tyan S2882 board does not boot correctly when
the power-on memory test is interrupted by pressing the Escape key (an MCE
is reported, or the server just silently reboots when loading the
kernel). At first I thought the server was dead, when even the original
kernel refused to boot.