kernel error

Posted by ttgt, 09-15-2011, 10:01 PM
Hi, my CentOS/cPanel server has been randomly unresponsive these days. I checked /var/log/messages and found the error log below. According to the cPanel forum, cPanel recently started supporting CentOS 5.7, and WHM also shows my CentOS as 5.7. The Pingdom downtime log suggests the outages started on the same day WHM announced CentOS 5.7 support, so I wonder whether the update caused the problem. I ran yum update kernel and rebooted, but similar errors still appear in /var/log/messages. Could it be a hard drive or motherboard problem? I replaced the hard drive and the board about 100 days ago. Or could there be some other cause? Thanks.

Sep 16 06:48:02 quick088 kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Sep 15 06:48:02 server kernel: ata1.00: cmd 25/00:08:da:05:50/00:00:17:00:00/e0 tag 0 dma 4096 in
Sep 15 06:48:02 server kernel: res 40/00:00:00:00:00/00:00:00:00:00/10 Emask 0x4 (timeout)
Sep 15 06:48:02 server kernel: ata1.00: status: { DRDY }
Sep 15 06:48:02 server kernel: ata1: soft resetting link
Sep 15 06:48:02 server kernel: ata1.00: configured for UDMA/133
Sep 15 06:48:02 server kernel: ata1.01: configured for UDMA/133
Sep 15 06:48:02 server kernel: ata1: EH complete
Sep 15 06:48:02 server kernel: SCSI device sda: 976771055 512-byte hdwr sectors (500107 MB)
Sep 15 06:48:02 server kernel: sda: Write Protect is off
Sep 15 06:48:02 server kernel: SCSI device sda: drive cache: write back
Sep 15 06:48:02 server kernel: SCSI device sdb: 976771055 512-byte hdwr sectors (500107 MB)
Sep 15 06:48:02 server kernel: sdb: Write Protect is off
Sep 15 06:48:02 server kernel: SCSI device sdb: drive cache: write back
Sep 15 06:48:02 server kernel: SCSI device sda: 976771055 512-byte hdwr sectors (500107 MB)
Sep 15 06:48:02 server kernel: sda: Write Protect is off
Sep 15 06:48:02 server kernel: SCSI device sda: drive cache: write back
Sep 15 06:48:02 server kernel: SCSI device sdb: 976771055 512-byte hdwr sectors (500107 MB)
Sep 15 06:48:02 server kernel: sdb: Write Protect is off
Sep 15 06:48:02 server kernel: SCSI device sdb: drive cache: write back
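To judge how often these timeouts occur, it may help to count the "exception Emask" header lines per ATA device and per hour. The following is a minimal sketch, assuming the syslog line layout shown in the post; the function name count_ata_exceptions and the sample list are hypothetical and not part of any tool mentioned in the thread.

```python
import re
from collections import Counter

# Matches the libata exception header as it appears in the post, e.g.
# "Sep 15 06:48:02 server kernel: ata1.00: exception Emask 0x0 ... frozen"
EXCEPTION_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s+\S+\s+kernel:\s+"
    r"(?P<dev>ata\d+\.\d+): exception Emask"
)


def count_ata_exceptions(lines):
    """Return a Counter keyed by (ata device, 'Mon DD HH') with exception counts."""
    counts = Counter()
    for line in lines:
        m = EXCEPTION_RE.match(line)
        if m:
            hour = m.group("stamp")[:-6]  # drop the trailing ":MM:SS"
            counts[(m.group("dev"), hour)] += 1
    return counts


# Sample lines copied from the thread (hypothetical variable name).
sample = [
    "Sep 16 06:48:02 quick088 kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen",
    "Sep 15 06:48:02 server kernel: ata1.00: status: { DRDY }",
    "Sep 17 05:26:30 server kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen",
]

for (dev, hour), n in sorted(count_ata_exceptions(sample).items()):
    print(dev, hour, n)

# Against the real log (needs read permission on the file):
#   with open("/var/log/messages") as f:
#       print(count_ata_exceptions(f))
```

Only the header line of each exception is counted, so the follow-up lines of the same event (cmd, res, status) are not double counted.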
Posted by luki, 09-15-2011, 10:26 PM
It looks like a communication issue between the motherboard and the drive. That could be the cable, a bad board or drive, or a driver bug. I'd try replacing the cable first, since it's quick and cheap to try.
Posted by ttgt, 09-15-2011, 10:28 PM
Hi, but I cannot understand why there are always log entries for both sda and sdb. Thanks.
Posted by ttgt, 09-15-2011, 10:44 PM
Hi, I got another error:
smartd[4311]: Device: /dev/sda, 12 Currently unreadable (pending) sectors
Thanks.
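A pending-sector count that stays above zero or keeps growing usually points at the drive itself rather than the cable, so it can be worth tracking the number over time. Below is a minimal sketch that pulls the device and count out of a smartd message in the format quoted above; the function name pending_sectors is hypothetical.

```python
import re

# Matches the smartd message quoted in the post, e.g.
# "smartd[4311]: Device: /dev/sda, 12 Currently unreadable (pending) sectors"
SMARTD_RE = re.compile(
    r"smartd\[\d+\]: Device: (?P<dev>[^,\s]+), "
    r"(?P<count>\d+) Currently unreadable \(pending\) sectors"
)


def pending_sectors(line):
    """Return (device, count) for a smartd pending-sector line, else None."""
    m = SMARTD_RE.search(line)
    if m is None:
        return None
    return m.group("dev"), int(m.group("count"))


line = "smartd[4311]: Device: /dev/sda, 12 Currently unreadable (pending) sectors"
print(pending_sectors(line))
```

Running this over successive days of /var/log/messages and comparing the counts would show whether the number is stable or rising.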
Posted by ttgt, 09-16-2011, 01:03 PM
Hi, I still get the error, mainly:
smartd[4311]: Device: /dev/sda, 12 Currently unreadable (pending) sectors
Should I run fsck -F to fix it? Thanks.
Posted by rustelekom, 09-16-2011, 01:22 PM
Hi, fsck does not help with bad blocks (sectors). You need to check the cable, and if it is okay, just replace the HDD again, as it is in trouble.
Posted by ttgt, 09-18-2011, 01:06 AM
Hi, I still get the error continuously:

Sep 17 05:26:30 server kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Sep 17 05:26:30 server kernel: ata1.00: cmd ca/00:18:da:d2:29/00:00:00:00:00/ee tag 0 dma 12288 out
Sep 17 05:26:30 server kernel: res 40/00:00:00:00:00/00:00:00:00:00/10 Emask 0x4 (timeout)
Sep 17 05:26:30 server kernel: ata1.00: status: { DRDY }
Sep 17 05:26:30 server kernel: ata1: soft resetting link
Sep 17 05:26:30 server kernel: ata1.00: configured for UDMA/100
Sep 17 05:26:30 server kernel: ata1.01: configured for UDMA/133
Sep 17 05:26:30 server kernel: ata1: EH complete
Sep 17 05:26:30 server kernel: SCSI device sda: 976771055 512-byte hdwr sectors (500107 MB)
Sep 17 05:26:30 server kernel: sda: Write Protect is off
Sep 17 05:26:30 server kernel: SCSI device sda: drive cache: write back
Sep 17 05:26:30 server kernel: SCSI device sdb: 976771055 512-byte hdwr sectors (500107 MB)
Sep 17 05:26:30 server kernel: sdb: Write Protect is off
Sep 17 05:26:30 server kernel: SCSI device sdb: drive cache: write back
Sep 17 05:26:30 server kernel: SCSI device sda: 976771055 512-byte hdwr sectors (500107 MB)
Sep 17 05:26:30 server kernel: sda: Write Protect is off
Sep 17 05:26:30 server kernel: SCSI device sda: drive cache: write back
Sep 17 05:26:30 server kernel: SCSI device sdb: 976771055 512-byte hdwr sectors (500107 MB)
Sep 17 05:26:30 server kernel: sdb: Write Protect is off
Sep 17 05:26:30 server kernel: SCSI device sdb: drive cache: write back

I will try replacing the SATA cable in a few hours, but I still cannot understand: is it possible for both cables to fail at the same time? And if the hard drives are the problem, would both drives fail at the same time? The error happens about twice per hour, or once every two hours, and when it does the server gets high load and responds slowly. I wonder how to fix it. Thanks.
Posted by ttgt, 09-18-2011, 01:08 AM
Hi, the board is a GIGABYTE GA-G41M-ES2L, and the two hard drives are connected directly to the board without any controller card. Thanks.
Posted by ttgt, 09-18-2011, 01:58 AM
Hi, referring to https://bugzilla.redhat.com/show_bug.cgi?id=462425#c80, I need to modify grub.conf to add 'acpi=off noapic'. Do I just add it to the last line and save, or do I need to run another command to make it take effect? Thanks.
Posted by gate2vn, 09-18-2011, 03:56 AM
Add that code to the kernel line of the kernel version you are using. Make sure you copy the old settings to new lines first, then set the new settings to be used one time only in the grub config. That way, if something is wrong with the new settings, you just restart the server and it will take the old ones.
Posted by ttgt, 09-18-2011, 04:50 AM
Hi, can you please tell me in more detail where and how to add it? Thanks. This is my grub.conf:

# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE: You have a /boot partition. This means that
# all kernel and initrd paths are relative to /boot/, eg.
# root (hd0,0)
# kernel /vmlinuz-version ro root=/dev/sda6
# initrd /initrd-version.img
#boot=/dev/sda
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title CentOS (2.6.18-274.3.1.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-274.3.1.el5 ro root=LABEL=/1
        initrd /initrd-2.6.18-274.3.1.el5.img
title CentOS (2.6.18-238.9.1.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-238.9.1.el5 ro root=LABEL=/1
        initrd /initrd-2.6.18-238.9.1.el5.img
title CentOS (2.6.18-128.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-128.el5 ro root=LABEL=/1
        initrd /initrd-2.6.18-128.el5.img

[~]# uname -r
2.6.18-274.3.1.el5

Last edited by ttgt; 09-18-2011 at 05:02 AM.
Posted by gate2vn, 09-18-2011, 05:59 AM
update those lines to and to then run grub command. Then enter this:
savedefault --default=0 --once
quit
Now try to reboot your server. If it works, your server will boot normally. If it does not come back online, possibly with a kernel error message, reboot again and it will take the old config.
Posted by ttgt, 09-18-2011, 06:47 AM
Hi, thanks for your help, and sorry for all the questions. I use vi to edit /etc/grub.conf directly, correct?
1. What does changing the default from 0 to 1 mean?
2. You made a copy, and in the top one I add the text I want to add, correct?
3. Can you explain the following in more detail? "then run grub command. Then enter this: savedefault --default=0 --once quit" What does the grub command mean?
Thanks.
Last edited by ttgt; 09-18-2011 at 06:50 AM.
Posted by gate2vn, 09-18-2011, 08:16 AM
1. The copied lines are your current kernel setting. Move them to position "1". "default" tells grub which kernel entry to use.
2. I have already added "acpi=off noapic" to the "0" kernel, as you can see above.
3. After editing the grub.conf file, type grub at the command line, then paste those commands. It will use the kernel setting in position "0" for one boot only.
Posted by ttgt, 09-18-2011, 08:22 AM
Hi, so:
step 1. vi /etc/grub.conf, make the changes, and save.
step 2. type "grub", then type "savedefault --default=0 --once" and "quit".
step 3. reboot.
Correct? Thanks.
Posted by gate2vn, 09-18-2011, 08:28 AM
If more details "savedefault --default=0 --once" then "quit" then
Posted by ttgt, 09-18-2011, 08:40 AM
Hi, how can I make sure the setting has taken effect? Thanks.
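One way to confirm which parameters the running kernel was booted with is to read /proc/cmdline, which holds the kernel command line of the current boot. The sketch below checks that string for the two options discussed in this thread; the function name missing_boot_params is hypothetical.

```python
def missing_boot_params(cmdline, wanted=("acpi=off", "noapic")):
    """Return the wanted parameters that are absent from a kernel command line."""
    present = set(cmdline.split())
    return [p for p in wanted if p not in present]


# Example string built from the kernel line in this thread.
example = "ro root=LABEL=/1 acpi=off noapic"
print(missing_boot_params(example))  # prints []

# On the server itself:
#   with open("/proc/cmdline") as f:
#       print(missing_boot_params(f.read()))
```

An empty list means both options are active for the current boot; any names printed were not passed to the kernel.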
Posted by ttgt, 09-18-2011, 08:50 AM
Hi, I'm not sure whether I need to leave a blank row between the entries. My grub.conf is now:

#boot=/dev/sda
default=1
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title CentOS (2.6.18-274.3.1.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-274.3.1.el5 ro root=LABEL=/1 acpi=off noapic
        initrd /initrd-2.6.18-274.3.1.el5.img
title CentOS (2.6.18-274.3.1.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-274.3.1.el5 ro root=LABEL=/1
        initrd /initrd-2.6.18-274.3.1.el5.img
title CentOS (2.6.18-238.9.1.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-238.9.1.el5 ro root=LABEL=/1
        initrd /initrd-2.6.18-238.9.1.el5.img
title CentOS (2.6.18-128.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-128.el5 ro root=LABEL=/1
        initrd /initrd-2.6.18-128.el5.img

I rebooted and the server works, but I'm not sure whether the change has taken effect. Thanks.
Posted by gate2vn, 09-18-2011, 08:53 AM
If it works fine, you can change the default value to 0. Otherwise, on the next reboot it will take the kernel setting in position "1". It doesn't matter whether there is a blank row there or not. Whether that setting helps in your situation, I cannot say exactly; you have to check your log.
Posted by ttgt, 09-18-2011, 09:03 AM
Hi,
1. Is there any way to check whether it is running with "0" or "1" now?
2. So, is the following correct?
title CentOS (2.6.18-274.3.1.el5) with acpi=off noapic >>> 0
title CentOS (2.6.18-274.3.1.el5) without acpi=off noapic >>> 1
kernel /vmlinuz-2.6.18-238.9.1.el5 ro root=LABEL=/1 >>> 2
kernel /vmlinuz-2.6.18-128.el5 ro root=LABEL=/1 >>> 3
3. About the step where I reboot again if the server does not come back and it takes the old config: if I do not change the default from "1" back to "0", will the next reboot use "1" or "0"? I'm not sure what "savedefault --default=0 --once" means.
4. In the future, if I use yum update kernel to install a newer kernel, will it appear before this entry:
title CentOS (2.6.18-274.3.1.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-274.3.1.el5 ro root=LABEL=/1 acpi=off noapic
        initrd /initrd-2.6.18-274.3.1.el5.img
and will the system treat the new version as "0"?
Thanks a lot for your help.
Last edited by ttgt; 09-18-2011 at 09:06 AM.
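GRUB legacy numbers the title entries in grub.conf from 0, top to bottom, and default= refers to that index. The sketch below lists the entries with their indices so the numbering in question 2 can be checked against the file; the function name list_grub_entries is hypothetical, and the sample text is the grub.conf posted earlier in this thread. It only handles a numeric default= value, which is an assumption based on that file.

```python
def list_grub_entries(text):
    """Return (default_index, [(index, title, kernel_args), ...]) for grub.conf text."""
    default = None
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("default="):
            value = line.split("=", 1)[1]
            if value.isdigit():
                default = int(value)
        elif line.startswith("title "):
            entries.append([line[len("title "):], ""])
        elif line.startswith("kernel ") and entries:
            entries[-1][1] = line[len("kernel "):]
    return default, [(i, t, k) for i, (t, k) in enumerate(entries)]


# grub.conf as posted in this thread.
sample_conf = """\
#boot=/dev/sda
default=1
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title CentOS (2.6.18-274.3.1.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-274.3.1.el5 ro root=LABEL=/1 acpi=off noapic
        initrd /initrd-2.6.18-274.3.1.el5.img
title CentOS (2.6.18-274.3.1.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-274.3.1.el5 ro root=LABEL=/1
        initrd /initrd-2.6.18-274.3.1.el5.img
title CentOS (2.6.18-238.9.1.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-238.9.1.el5 ro root=LABEL=/1
        initrd /initrd-2.6.18-238.9.1.el5.img
title CentOS (2.6.18-128.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-128.el5 ro root=LABEL=/1
        initrd /initrd-2.6.18-128.el5.img
"""

default, entries = list_grub_entries(sample_conf)
for index, title, kernel in entries:
    marker = "*" if index == default else " "
    print(marker, index, title, "|", kernel)
```

The entry marked with an asterisk is the one default= points to. Because the first two entries share the same title, the kernel arguments are printed as well to tell them apart.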
Posted by ttgt, 09-18-2011, 12:59 PM
Hi, after modifying /etc/grub.conf the kernel error is still there. Whenever the error appears in /var/log/messages, the server is slow to respond. Does anyone have experience with this that they can share? Thanks.
Posted by Steven, 09-18-2011, 07:09 PM
Based on your other thread with the pending sectors, your hard drive is failing and needs to be replaced.
