* aic7xxx errors
From: Joseph Mathewson @ 2001-09-05 6:21 UTC
To: linux-kernel
(28+ messages in thread)

I've just woken up this morning to find my internet gateway machine only responding to pings, and on giving it a keyboard & monitor, a load of

scsi0:0:1:0: Attempting to queue an ABORT message
scsi0:0:1:0: Cmd aborted from QINFIFO
aic7xxx_abort returns 8194

errors.

Is this a problem with the hard drive on ID 1 or a driver issue? It's now working fine after a restart (eventually it seems to have given up on ID 1 completely and it restarted cleanly [it boots off ID 0]).

I'm using kernel 2.4.7, the card is an Adaptec 2940UW (aic7xxx), the drive on ID 1 a Seagate Barracuda 18LP.

Joe.

+-------------------------------------------------+
| Joseph Mathewson <joe@mathewson.co.uk>          |
+-------------------------------------------------+
* Re: aic7xxx errors
From: Olaf Zaplinski @ 2001-09-05 7:58 UTC
To: joe.mathewson
Cc: linux-kernel

Joseph Mathewson wrote:
>
> I've just woken up this morning to find my internet gateway machine only
> responding to pings, and on giving it a keyboard & monitor, a load of
>
> scsi0:0:1:0: Attempting to queue an ABORT message
> scsi0:0:1:0: Cmd aborted from QINFIFO
> aic7xxx_abort returns 8194
>
> errors.
[...]

/me too. I had this while booting 2.4.9 with a freshly installed SCSI card (AHA2940) + harddisk. What worked for me was to compile the kernel with the old Adaptec driver, so it's a driver issue.

Olaf
* Re: aic7xxx errors
From: Frank Schneider @ 2001-09-05 9:04 UTC
To: Olaf Zaplinski
Cc: joe.mathewson, linux-kernel

Olaf Zaplinski wrote:
>
> Joseph Mathewson wrote:
> >
> > I've just woken up this morning to find my internet gateway machine only
> > responding to pings, and on giving it a keyboard & monitor, a load of
> >
> > scsi0:0:1:0: Attempting to queue an ABORT message
> > scsi0:0:1:0: Cmd aborted from QINFIFO
> > aic7xxx_abort returns 8194
> >
> > errors.
> [...]
>
> /me too. I had this while booting 2.4.9 with a fresh installed SCSI card
> (AHA2940) + harddisk. What worked for me was to compile the kernel with the
> old Adaptec driver, so it's a driver issue.
>
> Olaf

Hello...

I had this effect here too (RH7.1, kernel 2.4.3), but I put it down to wrong termination of the LVD bus... be careful if you have LVD drives with a "Termination" jumper (e.g. IBM DGHS18V)... this termination is only usable if you use the drive as single-ended SCSI-UW, *not* if you use the drive in a true LVD environment!

I learnt this the hard way, because I used this "Termination" jumper and the system booted without problems and ran for about 2 weeks... then the above errors occurred, followed by system crashes... after reading the original IBM docs, and not the OEM-reseller crap, the reason was clear.

The second thing I noticed was that the value for "Maximum Number of TCQ Commands per Device" is 255 by default, but on my system the driver always complained that it could only use 64 ("locked on 64")... so I decided to switch to 32 and not let it auto-detect the max. value... since then I have had no problems at all...

So long..
Frank.

--
Frank Schneider, <SPATZ1@T-ONLINE.DE>.
Microsoft isn't the answer.
Microsoft is the question, and the answer is NO.
... -.-
* Re: aic7xxx errors
From: Antonio Miguel Trindade @ 2001-09-05 10:27 UTC
To: linux-kernel

On Wednesday 05 September 2001 10:04, Frank Schneider wrote:
> Olaf Zaplinski wrote:
>
> I had this effect here too (RH7.1, kernel 2.4.3), but I put it down to
> wrong termination of the LVD bus... be careful if you have LVD drives
> with a "Termination" jumper (e.g. IBM DGHS18V)... this termination is
> only usable if you use the drive as single-ended SCSI-UW, *not* if you
> use the drive in a true LVD environment!
>
> I learnt this the hard way, because I used this "Termination" jumper and
> the system booted without problems and ran for about 2 weeks... then the
> above errors occurred, followed by system crashes... after reading the
> original IBM docs, and not the OEM-reseller crap, the reason was clear.
>

According to IBM specs, _no LVD drive has terminators built-in_... I have several servers with LVD drives (all IBM) and none of them has terminators, even in SE mode. You always have to use an external terminator...

>
> So long..
> Frank.

--
A year spent in artificial intelligence is enough to make one believe in God.
-------------------------------
António Miguel F. M. Trindade
System's Administrator
D.E.I. F.C.T.U.C.
-------------------------------
* Re: aic7xxx errors
From: Frank Schneider @ 2001-09-05 10:44 UTC
To: Antonio Miguel Trindade
Cc: linux-kernel

Antonio Miguel Trindade wrote:
>
> On Wednesday 05 September 2001 10:04, Frank Schneider wrote:
> > Olaf Zaplinski wrote:
> >
> > I had this effect here too (RH7.1, kernel 2.4.3), but I put it down to
> > wrong termination of the LVD bus... be careful if you have LVD drives
> > with a "Termination" jumper (e.g. IBM DGHS18V)... this termination is
> > only usable if you use the drive as single-ended SCSI-UW, *not* if you
> > use the drive in a true LVD environment!
> >
> > I learnt this the hard way, because I used this "Termination" jumper and
> > the system booted without problems and ran for about 2 weeks... then the
> > above errors occurred, followed by system crashes... after reading the
> > original IBM docs, and not the OEM-reseller crap, the reason was clear.
> >
>
> According to IBM specs, _no LVD drive has terminators built-in_... I have
> several servers with LVD drives (all IBM) and none of them has terminators,
> even in SE mode. You always have to use an external terminator...

That is what I thought too... but if you get a copied sheet from your vendor, and on it a jumper is named "Termination on" and the sheet also says you can use it, then you will probably think the disk has an LVD terminator built in... although such a terminator is quite simple (some resistors, perhaps a small chip, nothing more)... it would be possible to integrate it into the drive logic...

But as I said, my DGHS disk has a built-in terminator for use with UW buses... the bad thing is that if you "terminate" the LVD bus with it, it seems to work... for some time... I had "/" on it and a part of my /home RAID5, and it ran for 2 weeks....

So long..
Frank.

--
Frank Schneider, <SPATZ1@T-ONLINE.DE>.
Microsoft isn't the answer.
Microsoft is the question, and the answer is NO.
... -.-
* Re: aic7xxx errors
From: Thorsten Kranzkowski @ 2001-09-05 11:21 UTC
To: Frank Schneider
Cc: Antonio Miguel Trindade, linux-kernel

On Wed, Sep 05, 2001 at 12:44:24PM +0200, Frank Schneider wrote:
> Antonio Miguel Trindade wrote:
> > On Wednesday 05 September 2001 10:04, Frank Schneider wrote:
> > > Olaf Zaplinski wrote:
> > >
> > > I had this effect here too (RH7.1, kernel 2.4.3), but I put it down to
> > > wrong termination of the LVD bus... be careful if you have LVD drives
> > > with a "Termination" jumper (e.g. IBM DGHS18V)... this termination is
> > > only usable if you use the drive as single-ended SCSI-UW, *not* if you
> > > use the drive in a true LVD environment!
> > >
> >
> > According to IBM specs, _no LVD drive has terminators built-in_... I have

There are definitely some that have this SE-Termination jumper.

>
> But as I said, my DGHS disk has a built-in terminator for use with
> UW buses... the bad thing is that if you "terminate" the LVD bus with
> it, it seems to work... for some time... I had "/" on it and a part of
> my /home RAID5, and it ran for 2 weeks....

Usually when a single device in an LVD chain is operated in SE mode, all LVD devices also switch to SE mode automatically. The use of an SE terminator such as the one on your harddisk qualifies as SE operation.

But in SE mode you are tied to much stricter specifications (cable length etc.) than in LVD mode.

So maybe you just exceeded the specifications by too much.

Bye,
Thorsten

--
| Thorsten Kranzkowski Internet: dl8bcu@dl8bcu.de |
| Mobile: ++49 170 1876134 Snail: Niemannsweg 30, 49201 Dissen, Germany |
| Ampr: dl8bcu@db0lj.#rpl.deu.eu, dl8bcu@marvin.dl8bcu.ampr.org [44.130.8.19] |
* Re: aic7xxx errors
From: Frank Schneider @ 2001-09-05 13:05 UTC
To: dl8bcu
Cc: Antonio Miguel Trindade, linux-kernel

Thorsten Kranzkowski wrote:
>
> On Wed, Sep 05, 2001 at 12:44:24PM +0200, Frank Schneider wrote:
> > Antonio Miguel Trindade wrote:
> > > On Wednesday 05 September 2001 10:04, Frank Schneider wrote:
> > > > Olaf Zaplinski wrote:
> > > >
> > > > I had this effect here too (RH7.1, kernel 2.4.3), but I put it down to
> > > > wrong termination of the LVD bus... be careful if you have LVD drives
> > > > with a "Termination" jumper (e.g. IBM DGHS18V)... this termination is
> > > > only usable if you use the drive as single-ended SCSI-UW, *not* if you
> > > > use the drive in a true LVD environment!
> > > >
> > >
> > > According to IBM specs, _no LVD drive has terminators built-in_... I have
>
> There are definitely some that have this SE-Termination jumper.

Yes... I can send you one if you send me a spare drive instead... :-))

> >
> > But as I said, my DGHS disk has a built-in terminator for use with
> > UW buses... the bad thing is that if you "terminate" the LVD bus with
> > it, it seems to work... for some time... I had "/" on it and a part of
> > my /home RAID5, and it ran for 2 weeks....
>
> Usually when a single device in a LVD chain is operated in SE mode all LVD
> devices also switch to SE mode automatically. The use of a SE terminator
> such as the one on your harddisk qualifies for SE operation.

That's exactly what I expected, but it did not happen... I tried this once by setting the "SE" jumper on *all* devices *and* connecting them to the UW cable (I use an Asus P2B-DS mobo with 3 connectors: Fast SCSI, UW SCSI, LVD SCSI)... there it worked in the described way, but on the LVD cable not even the SCSI BIOS at bootup mentioned the problem... all devices were rated "LVD-SCSI", and not "SE/FastSCSI", at bootup... and /proc/scsi/aic7xxx/0 also said something about "80MByte/sec synchronous speed..."

It seems that in this particular case you don't get any hint where the problem lies... neither from the BIOS nor from the driver... I noticed it when I changed the LVD cable and took a closer look at the disks... and then at the specs on www.storage.ibm.com....

> But in SE mode you are tied to the much stricter specifications like length
> of cable etc. compared to LVD mode.

That's clear... max. cable length is 1.50 m (if more than 4 devices are connected), all together, incl. Fast SCSI cable or external cables, if used...

> So maybe you just exceeded specifications too much.

I did this once too (6 devices, 2 m cable length) and it did indeed show the same problems... randomly appearing crashes on the SCSI bus, sometimes recoverable, sometimes not, sometimes under heavy disk load, sometimes without...

So long..
Frank.

--
Frank Schneider, <SPATZ1@T-ONLINE.DE>.
Microsoft isn't the answer.
Microsoft is the question, and the answer is NO.
... -.-
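The single-ended length limit discussed above can be turned into a quick sanity check. This is only a sketch in Python: the 1.5 m figure for more than 4 devices comes from Frank's message, while the 3 m allowance for 4 or fewer devices and the 12 m LVD limit are assumptions taken from commonly cited SCSI cabling rules, not from this thread. The function name `bus_within_spec` is made up for illustration.

```python
# Rough sanity check for total SCSI cable length (illustrative only).

SE_MAX_M_MANY_DEVICES = 1.5  # from the thread: single-ended, more than 4 devices
SE_MAX_M_FEW_DEVICES = 3.0   # assumption: commonly cited SE limit for up to 4 devices
LVD_MAX_M = 12.0             # assumption: commonly cited LVD bus limit


def bus_within_spec(mode: str, devices: int, total_cable_m: float) -> bool:
    """Return True if the total cable length fits the assumed limit.

    mode is "SE" or "LVD". total_cable_m is the length of all cable
    segments added together (internal plus external), as in the thread.
    """
    if mode == "LVD":
        limit = LVD_MAX_M
    elif mode == "SE":
        limit = SE_MAX_M_MANY_DEVICES if devices > 4 else SE_MAX_M_FEW_DEVICES
    else:
        raise ValueError(f"unknown bus mode: {mode!r}")
    return total_cable_m <= limit


if __name__ == "__main__":
    # Frank's failing setup: 6 devices on 2 m of cable, effectively single-ended.
    print(bus_within_spec("SE", 6, 2.0))
    # The same cabling would be fine if the bus really ran in LVD mode.
    print(bus_within_spec("LVD", 6, 2.0))
```

The point of the model is only that a bus which silently drops to SE mode inherits the shorter limit, which matches the random bus crashes described for the 6-device, 2 m case.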
* AIC + RAID1 error? (was: Re: aic7xxx errors)
From: Olaf Zaplinski @ 2001-09-07 20:32 UTC
To: linux-kernel

Olaf Zaplinski wrote:
>
> Joseph Mathewson wrote:
> >
> > I've just woken up this morning to find my internet gateway machine only
> > responding to pings, and on giving it a keyboard & monitor, a load of
> >
> > scsi0:0:1:0: Attempting to queue an ABORT message
> > scsi0:0:1:0: Cmd aborted from QINFIFO
> > aic7xxx_abort returns 8194
> >
> > errors.
> [...]
>
> /me too. I had this while booting 2.4.9 with a fresh installed SCSI card
> (AHA2940) + harddisk. What worked for me was to compile the kernel with the
> old Adaptec driver, so it's a driver issue.

Okay, I had it again today:

Sep 7 19:15:19 binky kernel: scsi0:0:0:0: Attempting to queue an ABORT message
Sep 7 19:15:19 binky kernel: scsi0:0:0:0: Cmd aborted from QINFIFO
Sep 7 19:15:19 binky kernel: scsi0:0:0:0: Attempting to queue an ABORT message
Sep 7 19:15:19 binky kernel: scsi0:0:0:0: Command not found
Sep 7 19:15:19 binky kernel: scsi0:0:0:0: Attempting to queue an ABORT message
Sep 7 19:15:19 binky kernel: scsi0:0:0:0: Cmd aborted from QINFIFO
Sep 7 19:15:19 binky kernel: scsi0:0:0:0: Attempting to queue an ABORT message
Sep 7 19:15:19 binky kernel: scsi0:0:0:0: Command not found
Sep 7 19:15:19 binky kernel: scsi0:0:0:0: Attempting to queue an ABORT message
Sep 7 19:15:19 binky kernel: scsi0:0:0:0: Cmd aborted from QINFIFO
Sep 7 19:15:19 binky kernel: scsi0:0:0:0: Attempting to queue an ABORT message
Sep 7 19:15:19 binky kernel: scsi0:0:0:0: Command not found

Kernel was 2.4.9ac9 with (new) AIC driver 6.2.1, compiled with "Maximum Number of TCQ Commands per Device" set to 64. I was lucky since it's a RAID1 system (mirror disk is hda). Distro is SuSE 7.2 Professional, machine K6-2/300 with 128 MB EDO RAM, FS is reiser 3.6.25. Average load is low, it's a small smtp/imap/www system.

So I compiled the same kernel with the old AIC driver, and it works fine.

I should mention that it is a rather old PCI AHA-2940 Fast SCSI card with an equally old harddisk, an IBM 0662S12 (that's the whole SCSI chain). My other machine (AIC-something U2W with Tandberg SLR (U2W) and SCSI CDR (SE) attached, no HDDs) works fine with the new driver. I'm just guessing when I say that it seems to me that the driver developers were focused on up-to-date cards but not the older ones.

Olaf
* Re: AIC + RAID1 error? (was: Re: aic7xxx errors)
From: Justin T. Gibbs @ 2001-09-07 22:32 UTC
To: Olaf Zaplinski
Cc: linux-kernel

>Okay, I had it again today:

You need to be running with aic7xxx=verbose for these messages to be useful. In the 6.2.2 driver release I've turned these messages on by default.

>Kernel was 2.4.9ac9 with (new) AIC driver 6.2.1, compiled with "Maximum
>Number of TCQ Commands per Device" set to 64.

This is 8 times the tag load the old driver defaults to.

>So I compiled the same kernel with the old AIC driver, and it works fine.

Which may be due to a lighter load on the drive. It's hard to say without the verbose messages and the full dmesg for the machine. Your IBM drive may be running the "if I miss a seek, I fall off the bus" firmware where the bug is only triggered under high load. Send the dmesg output and we'll see.

>I'm just guessing when
>I say that it seems to me that the driver developers were focused on
>up-to-date cards but not the older ones.

This isn't true.

--
Justin
* Re: AIC + RAID1 error? (was: Re: aic7xxx errors)
From: Frank Schneider @ 2001-09-07 22:51 UTC
To: Justin T. Gibbs
Cc: linux-kernel

"Justin T. Gibbs" wrote:
>
> >Okay, I had it again today:
>
> You need to be running with aic7xxx=verbose for these messages to be
> useful. In the 6.2.2 driver release I've turned these messages on
> by default.

Could you please briefly explain what this option does... (before it fills my logfiles with notes like "successfully wrote 1 byte to disk abc".. :-)

I recently also had some problems with aic7xxx, but they were due to a misconfigured SCSI bus and perhaps a bad drive (still under test), so I enabled SCSI error logging in the kernel (2.4.3, RH7.1) and sent the following strings to /proc/scsi/scsi:

/bin/echo "scsi log error 5" > /proc/scsi/scsi
/bin/echo "scsi log mlqueue 3" > /proc/scsi/scsi
/bin/echo "scsi log hlcomplete 1" > /proc/scsi/scsi
/bin/echo "scsi log scan 5" > /proc/scsi/scsi

But it did not give me the kind of info I wanted to see... does "aic7xxx=verbose" do something similar or something completely different?

> >Kernel was 2.4.9ac9 with (new) AIC driver 6.2.1, compiled with "Maximum
> >Number of TCQ Commands per Device" set to 64.
>
> This is 8 times the tag load the old driver defaults to.

That's true, and e.g. my relatively new IBM drives (DGHS18V, 2x DNES-309170W, DDRS-39130W, all server disks according to IBM) can only do 64... and the kernel complains if I compile it with 255, and locks to 64... when I played with this feature a while ago, I did not notice a big performance gain from 8 to 64, so I switched to 32... and I would go down to <8 if I were in doubt....

> >So I compiled the same kernel with the old AIC driver and it works fine.

Test it for longer and under load... I also "cured" a bad SCSI bus by switching drivers once... sometimes it really seems to work... for some days... :-)

> Which may be due to a lighter load on the drive. It's hard to say without
> the verbose messages and the full dmesg for the machine. Your IBM drive
> may be running the "if I miss a seek, I fall off the bus" firmware where
> the bug is only triggered under high load. Send the dmesg output and we'll
> see.

So long...
Frank.

--
Frank Schneider, <SPATZ1@T-ONLINE.DE>.
Microsoft isn't the answer.
Microsoft is the question, and the answer is NO.
... -.-
* Re: AIC + RAID1 error? (was: Re: aic7xxx errors)
From: Justin T. Gibbs @ 2001-09-07 23:37 UTC
To: Frank Schneider
Cc: linux-kernel

>> You need to be running with aic7xxx=verbose for these messages to be
>> useful. In the 6.2.2 driver release I've turned these messages on
>> by default.
>
>Could you please briefly explain what this option does... (before it
>fills my logfiles with notes like "successfully wrote 1 byte to disk abc".. :-)

It turns on some diagnostics regarding:

1) Card initialization
2) Transfer negotiation (occurs with every check condition that occurs prior to sending data, so while not rare, is not a common occurrence).
3) Abort/timeout processing

It should not fill your log file unless you have a timeout. This is exactly the time you want it to fill your logs, so I can help diagnose and fix your problem.

>> This is 8 times the tag load the old driver defaults to.
>
>That's true, and e.g. my relatively new IBM drives (DGHS18V, 2x
>DNES-309170W, DDRS-39130W, all server disks according to IBM) can only
>do 64... and the kernel complains if I compile it with 255, and locks to
>64...

It's not really "complaining", it's just telling you that it has determined the proper setting for this device. There is an advantage to setting your tag depth to the locked value (the SCSI layer cannot be told dynamically to lower the tag depth, so there may be extra transactions sitting in the driver queue for no real purpose), but it's not that big of a deal.

>when I played with this feature a while ago, I did not notice a
>big performance gain from 8 to 64, so I switched to 32... and I would go
>down to <8 if I were in doubt....

It all depends on your workload. If you run a news server or have lots of concurrent active users on the machine, you are more likely to see a difference.

--
Justin
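The remark about extra transactions sitting in the driver queue can be illustrated with a little arithmetic. The sketch below is a simplified Python model, not driver code: it assumes the SCSI layer hands the driver up to the compiled-in tag depth and that the driver passes at most the locked depth on to the drive. The function name `split_outstanding` and the two-way split are illustrative assumptions.

```python
# Simplified model of where outstanding commands end up when the
# compiled-in tag depth is larger than the depth the driver locked to.
from typing import Tuple


def split_outstanding(configured_depth: int,
                      locked_depth: int,
                      requested: int) -> Tuple[int, int]:
    """Return (on_device, held_in_driver) for `requested` pending commands.

    configured_depth: compile-time "Maximum Number of TCQ Commands per Device"
    locked_depth:     depth the driver determined the device can handle
    requested:        commands the upper layers would like to issue
    """
    # The SCSI layer only knows the configured depth, so it may hand the
    # driver that many commands at once.
    accepted = min(requested, configured_depth)
    # The driver forwards no more than the locked depth to the drive.
    on_device = min(accepted, locked_depth)
    return on_device, accepted - on_device


if __name__ == "__main__":
    # Compiled with 255, drive locked at 64: the surplus waits in the driver.
    print(split_outstanding(255, 64, 255))
    # Compiled with 32: nothing is parked in the driver queue.
    print(split_outstanding(32, 64, 255))
```

Under these assumptions, compiling with a depth at or below the locked value simply keeps the waiting commands in the SCSI layer instead of in the driver, which is the minor advantage described above.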
* Re: AIC + RAID1 error? (was: Re: aic7xxx errors)
From: Olaf Zaplinski @ 2001-09-10 13:50 UTC
To: linux-kernel

Okay, I tested it today, compiled 2.4.9ac10 with the new driver and TCQ set to 32. I built the driver as a module to make sure that the machine at least boots into runlevel 3 (I have no console access, only access to the reset switch).

I rebooted and inserted the driver with 'modprobe aic7xxx', remembered that I forgot the verbose flag, removed the driver with 'modprobe -r' and re-inserted it with 'modprobe aic7xxx aic7xxx=verbose'. The machine was still alive then. But right after entering 'raidhotadd /dev/md1 /dev/sda1' the machine hung. reiserfs erased the last lines of /var/log/messages, but AFAIK the verbose driver output showed no errors.

But how can I help to reproduce the error? Of course I could break the mirror, compile the driver into the kernel (non-module) and do some stress test on the SCSI drive. But it's not so good when I drive this machine into a hang too often.

I compiled the old driver now, also with TCQ set to 32, and the machine seems to work fine.

Olaf
* Re: AIC + RAID1 error? (was: Re: aic7xxx errors)
From: Frank Schneider @ 2001-09-10 19:11 UTC
To: linux-kernel

Olaf Zaplinski wrote:
>
> Okay, I tested it today, compiled 2.4.9ac10 with the new driver and TCQ set
> to 32. I built the driver as a module to make sure that the machine at least
> boots into runlevel 3 (I have no console access, only access to the reset
> switch).
>
> I rebooted and inserted the driver with 'modprobe aic7xxx', remembered that
> I forgot the verbose flag, removed the driver with 'modprobe -r' and
> re-inserted it with 'modprobe aic7xxx aic7xxx=verbose'. The machine was
> still alive then. But right after entering 'raidhotadd /dev/md1 /dev/sda1'
> the machine hung. reiserfs erased the last lines of /var/log/messages, but
> AFAIK the verbose driver output showed no errors.
>
> But how can I help to reproduce the error? Of course I could break the
> mirror, compile the driver into the kernel (non-module) and do some stress
> test on the SCSI drive. But it's not so good when I drive this machine into
> a hang too often.
>
> I compiled the old driver now, also with TCQ set to 32, and the machine
> seems to work fine.
>

Hello...

I'm also currently testing my RAID problem, where one drive falls out of the RAID... so far it has not happened with the old driver, but that means nothing as it only happened once a week or so.

Something else made me wonder:
I ran the machine several times with the *new* aic7xxx driver (TCQ=32) and the "aic7xxx=verbose" command line, and I noticed the following:
At every reboot (made by "reboot", RH7.1), the machine was not able to stop the raid5 correctly... it unmounted the mountpoint (/home) and then it normally wants to stop the raid... (you see the messages "mdrecoveryd got waken up...") but that did not work and after some time (30 sec) the kernel oopsed. This was reproducible and only occurred if booted with the "aic7xxx=verbose" kernel parameter.
The effect after reboot was that the raid had to be resynced because one partition (the one which always falls out) was damaged or at least seemed to be.
(The filesystem was clean; it was already unmounted when the oops occurred.)

Perhaps someone can test if this is reproducible on their machine too... I use kernel 2.4.3, raid is built in, also the aic7xxx, there are three raid disks (LVD, aic7xxx controller on the mobo) in a raid5 mounted as /home.

So long...
Frank.

--
Frank Schneider, <SPATZ1@T-ONLINE.DE>.
Microsoft isn't the answer.
Microsoft is the question, and the answer is NO.
... -.-
* Re: AIC + RAID1 error? (was: Re: aic7xxx errors)
From: Andreas Steinmetz @ 2001-09-10 22:29 UTC
To: Frank Schneider
Cc: linux-kernel

> Something else made me wonder:
> I ran the machine several times with the *new* aic7xxx driver (TCQ=32)
> and the "aic7xxx=verbose" command line, and I noticed the following:
> At every reboot (made by "reboot", RH7.1), the machine was not able to
> stop the raid5 correctly... it unmounted the mountpoint (/home) and then
> it normally wants to stop the raid... (you see the messages "mdrecoveryd
> got waken up...") but that did not work and after some time (30 sec) the
> kernel oopsed. This was reproducible and only occurred if booted with
> the "aic7xxx=verbose" kernel parameter.
> The effect after reboot was that the raid had to be resynced because
> one partition (the one which always falls out) was damaged or at least
> seemed to be.
> (The filesystem was clean; it was already unmounted when the oops
> occurred.)
>
> Perhaps someone can test if this is reproducible on their machine
> too... I use kernel 2.4.3, raid is built in, also the aic7xxx, there are
> three raid disks (LVD, aic7xxx controller on the mobo) in a raid5 mounted as
> /home.
>

Same behaviour for RAID1 and the new aic7xxx driver for me at nearly every reboot. The old driver works just fine (2.4.9).

Andreas Steinmetz
D.O.M. Datenverarbeitung GmbH
* Re: AIC + RAID1 error? (was: Re: aic7xxx errors)
From: Justin T. Gibbs @ 2001-09-10 22:42 UTC
To: Andreas Steinmetz
Cc: Frank Schneider, linux-kernel

>> Something else made me wonder:
>> I ran the machine several times with the *new* aic7xxx driver (TCQ=32)
>> and the "aic7xxx=verbose" command line, and I noticed the following:
>> At every reboot (made by "reboot", RH7.1), the machine was not able to
>> stop the raid5 correctly... it unmounted the mountpoint (/home) and then
>> it normally wants to stop the raid... (you see the messages "mdrecoveryd
>> got waken up...") but that did not work and after some time (30 sec) the
>> kernel oopsed.

...

>Same behaviour for RAID1 and the new aic7xxx driver for me at nearly every
>reboot. The old driver works just fine (2.4.9).

The new driver registers a "reboot notifier" with the system. If MD continues to perform I/O after the aic7xxx driver's notification routine is called, the result is undefined. The aic7xxx driver has already shut down the hardware. Perhaps I should use a different event to indicate it is safe for me to clean up the hardware?

--
Justin
* Re: AIC + RAID1 error? (was: Re: aic7xxx errors)
From: Frank Schneider @ 2001-09-10 22:55 UTC
To: Justin T. Gibbs
Cc: linux-kernel

"Justin T. Gibbs" wrote:
>
> >> Something else made me wonder:
> >> I ran the machine several times with the *new* aic7xxx driver (TCQ=32)
> >> and the "aic7xxx=verbose" command line, and I noticed the following:
> >> At every reboot (made by "reboot", RH7.1), the machine was not able to
> >> stop the raid5 correctly... it unmounted the mountpoint (/home) and then
> >> it normally wants to stop the raid... (you see the messages "mdrecoveryd
> >> got waken up...") but that did not work and after some time (30 sec) the
> >> kernel oopsed.
>
> ...
>
> >Same behaviour for RAID1 and the new aic7xxx driver for me at nearly every
> >reboot. The old driver works just fine (2.4.9).
>
> The new driver registers a "reboot notifier" with the system. If MD
> continues to perform I/O after the aic7xxx driver's notification routine
> is called, the result is undefined. The aic7xxx driver has already
> shutdown the hardware. Perhaps I should use a different event to indicate
> it is safe for me to clean up the hardware?

What about a kind of timer? If the driver gets the "reboot" notification, it could watch for activity and shut down the hardware 5 or 10 secs after the last activity.

Shutting down the user processes is done in a similar way... "send TERM"... sleep 5... "send KILL"... and when this happens, all unmounts and kills should already have occurred, so it can only be a question of <5 secs until the last (raid) process has exited.

The only other possibility would be to let the kernel send this message just before it reboots the machine via a BIOS call... but even then you would have to wait a little until the hardware reacts... difficult problem...

So long...
Frank

--
Frank Schneider, <SPATZ1@T-ONLINE.DE>.
Microsoft isn't the answer.
Microsoft is the question, and the answer is NO.
... -.-
* Re: AIC + RAID1 error? (was: Re: aic7xxx errors)
From: Justin T. Gibbs @ 2001-09-10 23:06 UTC
To: Frank Schneider
Cc: linux-kernel

>What about a kind of timer ?

The functions are run serially. If I'm to wait, I must block or risk having the machine powered off prior to completing my shutdown.

A coworker of mine playing with the MD code reminded me that he had to change the priority of the MD notifier to make it work. I believe that this is the correct fix as there are other SCSI drivers that have shutdown hooks. All HBA drivers currently use 0 (or the lowest) as their priority. MD (line 3475 of drivers/md/md.c) uses 0 too. Change it to INT_MAX and MD will always get shut down prior to any child devices it might use.

--
Justin
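The ordering problem behind this fix can be shown with a small userspace model. It is a sketch in Python, not kernel code: it assumes, as in the kernel's notifier chains, that callbacks with a higher priority value run first and that equal priorities run in registration order. The names `NotifierChain`, `md_stop` and `aic7xxx_shutdown`, and the assumption that the HBA driver registers before MD, are made up for illustration.

```python
# Toy model of a priority-ordered reboot notifier chain (not the kernel API).
from typing import Callable, List, Tuple

INT_MAX = 2**31 - 1  # stand-in for C's INT_MAX with a 32-bit int


class NotifierChain:
    def __init__(self) -> None:
        self._entries: List[Tuple[int, Callable[[], None]]] = []

    def register(self, callback: Callable[[], None], priority: int = 0) -> None:
        # Insert before the first entry with a strictly lower priority, so
        # higher priorities run first and ties keep registration order.
        index = len(self._entries)
        for i, (prio, _) in enumerate(self._entries):
            if priority > prio:
                index = i
                break
        self._entries.insert(index, (priority, callback))

    def call_chain(self) -> None:
        # Callbacks run serially, one after another.
        for _, callback in self._entries:
            callback()


def simulate_reboot(md_priority: int) -> List[str]:
    """Run the chain once and return what each hook observed."""
    log: List[str] = []
    hba_alive = True

    def aic7xxx_shutdown() -> None:
        nonlocal hba_alive
        hba_alive = False
        log.append("aic7xxx: hardware shut down")

    def md_stop() -> None:
        if hba_alive:
            log.append("md: arrays stopped cleanly")
        else:
            log.append("md: I/O after HBA shutdown (undefined)")

    chain = NotifierChain()
    chain.register(aic7xxx_shutdown, priority=0)  # assumed to register first
    chain.register(md_stop, priority=md_priority)
    chain.call_chain()
    return log


if __name__ == "__main__":
    print(simulate_reboot(0))        # both at 0: MD runs after the HBA is gone
    print(simulate_reboot(INT_MAX))  # MD raised to INT_MAX: MD stops first
```

With both hooks at priority 0 the outcome depends only on registration order, which is why raising MD's priority makes the shutdown order deterministic in this model.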
* Re: AIC + RAID1 error? (was: Re: aic7xxx errors)
From: Andreas Steinmetz @ 2001-09-10 23:37 UTC
To: Justin T. Gibbs
Cc: linux-kernel, Frank Schneider

> MD (line 3475 of drivers/md/md.c) uses 0 too. Change it to INT_MAX
> and MD will always get shutdown prior to any child devices it might

I don't believe INT_MAX to be a good idea. What happens if anything else needs to shut down prior to md (think of tux, knfsd)? As a suggestion, it would be a good idea if someone with a broader overview would define some reboot priorities in include/linux/notifier.h.

Andreas Steinmetz
D.O.M. Datenverarbeitung GmbH
* Re: AIC + RAID1 error? (was: Re: aic7xxx errors) 2001-09-10 23:37 ` Andreas Steinmetz @ 2001-09-10 23:46 ` Justin T. Gibbs 2001-09-11 0:00 ` Andreas Steinmetz 0 siblings, 1 reply; 28+ messages in thread From: Justin T. Gibbs @ 2001-09-10 23:46 UTC (permalink / raw) To: Andreas Steinmetz; +Cc: linux-kernel, Frank Schneider >> MD (line 3475 of drivers/md/md.c) uses 0 too. Change it to INT_MAX >> and MD will always get shutdown prior to any child devices it might > >I don't believe INT_MAX to be a good idea. What happens if anything else needs >to shutdown prior to md (think of tux, knfsd)? Your examples are processes (albeit in the kernel) which should have received a signal long before the notifier chain is called. >As a suggestion it would be a >good idea if someone with a broader overview would define some reboot >priorities in include/linux/notifier.h. And expand the codes that are used for the notifier. The current set of codes are not well defined and most drivers treat all of them the same. -- Justin ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: AIC + RAID1 error? (was: Re: aic7xxx errors) 2001-09-10 23:46 ` Justin T. Gibbs @ 2001-09-11 0:00 ` Andreas Steinmetz 0 siblings, 0 replies; 28+ messages in thread From: Andreas Steinmetz @ 2001-09-11 0:00 UTC (permalink / raw) To: Justin T. Gibbs; +Cc: SPATZ1, Frank Schneider, linux-kernel On 10-Sep-2001 Justin T. Gibbs wrote: >>> MD (line 3475 of drivers/md/md.c) uses 0 too. Change it to INT_MAX >>> and MD will always get shutdown prior to any child devices it might >> >>I don't believe INT_MAX to be a good idea. What happens if anything else >>needs >>to shutdown prior to md (think of tux, knfsd)? > > Your examples are processes (albeit in the kernel) which should have > received a signal long before the notifier chain is called. > Granted. I could, however, imagine a fs to require a reboot notifier and that would need definitely be processed before md. >>As a suggestion it would be a >>good idea if someone with a broader overview would define some reboot >>priorities in include/linux/notifier.h. > > And expand the codes that are used for the notifier. The current set > of codes are not well defined and most drivers treat all of them the > same. > Just posted sort of this request to the list. > -- > Justin > Andreas Steinmetz D.O.M. Datenverarbeitung GmbH ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: AIC + RAID1 error? (was: Re: aic7xxx errors) 2001-09-10 23:06 ` Justin T. Gibbs 2001-09-10 23:37 ` Andreas Steinmetz @ 2001-09-11 12:10 ` Frank Schneider 2001-09-11 16:51 ` Justin T. Gibbs 1 sibling, 1 reply; 28+ messages in thread From: Frank Schneider @ 2001-09-11 12:10 UTC (permalink / raw) To: Justin T. Gibbs; +Cc: linux-kernel "Justin T. Gibbs" wrote: > > >What about a kind of timer ? > > The functions are run serially. If I'm to wait, I must block > or risk having the machine powered off prior to completing my shutdown. > > A coworker of mine playing with the MD code reminded me that > he had to change the priority of the MD notifier to make it work. > I believe that this is the correct fix as there are other SCSI > drivers that have shutdown hooks. > > All HBA drivers currently use 0 (or the lowest) as their priority. > MD (line 3475 of drivers/md/md.c) uses 0 too. Change it to INT_MAX > and MD will always get shutdown prior to any child devices it might > use. One question is still open in this case: Why does the Oops only occur if "aic7xxx=verbose" is set ? The above explanation is correct (AFAIK), but the kernel-oops should then happen on *every* reboot, not only if this verbose-parameter is set...or does the driver try to shut down the drives and then write to the log "AIC7xxx shutdown successfull"...?...:-)) Solong... Frank. -- Frank Schneider, <SPATZ1@T-ONLINE.DE>. Microsoft isn't the answer. Microsoft is the question, and the answer is NO. ... -.- ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: AIC + RAID1 error? (was: Re: aic7xxx errors) 2001-09-11 12:10 ` Frank Schneider @ 2001-09-11 16:51 ` Justin T. Gibbs 0 siblings, 0 replies; 28+ messages in thread From: Justin T. Gibbs @ 2001-09-11 16:51 UTC (permalink / raw) To: Frank Schneider; +Cc: linux-kernel >One question is still open on this case: >Why does the Oops only occur if the "aic7xxx=verbose" is set ? I haven't looked to determine why this is so. -- Justin ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: AIC + RAID1 error? (was: Re: aic7xxx errors) 2001-09-10 22:42 ` Justin T. Gibbs 2001-09-10 22:55 ` Frank Schneider @ 2001-09-10 23:05 ` Andreas Steinmetz 1 sibling, 0 replies; 28+ messages in thread From: Andreas Steinmetz @ 2001-09-10 23:05 UTC (permalink / raw) To: Justin T. Gibbs; +Cc: linux-kernel, linux-kernel, Frank Schneider

>
> The new driver registers a "reboot notifier" with the system. If MD
> continues to perform I/O after the aic7xxx driver's notification routine
> is called, the result is undefined. The aic7xxx driver has already
> shutdown the hardware. Perhaps I should use a different event to indicate
> it is safe for me to clean up the hardware?
>

Gotcha! Actually the problem seems to be that the raid code and the scsi code both register reboot notifiers with the same priority (0, see below).

include/linux/notifier.h:

    struct notifier_block
    {
            int (*notifier_call)(struct notifier_block *self, unsigned long, void *);
            struct notifier_block *next;
            int priority;
    };

drivers/md/md.c:

    struct notifier_block md_notifier = {
            md_notify_reboot,
            NULL,
            0
    };

drivers/scsi/aic7xxx/aic7xxx_linux.c:

    static struct notifier_block ahc_linux_notifier = {
            ahc_linux_halt,
            NULL,
            0
    };

When registering the notifiers, the order at the same priority level depends on who registers first.

kernel/sys.c:

    int notifier_chain_register(struct notifier_block **list, struct notifier_block *n)
    {
            write_lock(&notifier_lock);
            while(*list)
            {
                    if(n->priority > (*list)->priority)
                            break;
                    list= &((*list)->next);
            }
            n->next = *list;
            *list=n;
            write_unlock(&notifier_lock);
            return 0;
    }

The notifier chain is then processed sequentially.

kernel/sys.c:

    int notifier_call_chain(struct notifier_block **n, unsigned long val, void *v)
    {
            int ret=NOTIFY_DONE;
            struct notifier_block *nb = *n;

            while(nb)
            {
                    ret=nb->notifier_call(nb,val,v);
                    if(ret&NOTIFY_STOP_MASK)
                    {
                            return ret;
                    }
                    nb=nb->next;
            }
            return ret;
    }

So what's actually required is to set the raid notifier to a higher priority than the scsi notifier to ensure that raid is stopped before scsi. Unfortunately I can't test this right now as I'm doing work@home and I do need physical access to the systems (reset button) if it doesn't work out.

Could you please straighten the priority issue out with the raid maintainer?

Andreas Steinmetz D.O.M. Datenverarbeitung GmbH ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: AIC + RAID1 error? (was: Re: aic7xxx errors) 2001-09-10 22:29 ` Andreas Steinmetz 2001-09-10 22:42 ` Justin T. Gibbs @ 2001-09-10 22:46 ` Frank Schneider 1 sibling, 0 replies; 28+ messages in thread From: Frank Schneider @ 2001-09-10 22:46 UTC (permalink / raw) To: linux-kernel; +Cc: Andreas Steinmetz Andreas Steinmetz wrote: > > > Something other made me wonder: > > I ran the machine several times with the *new* aic7xxx-driver (TCQ=32) > > and the "aic7xxx=verbose" commandline, and i noticed the following: > > At every reboot (made by "reboot", RH7.1), the machine was not able to > > stop the raid5 correctly...it un-mounted the mountpoint (/home) and then > > it normaly wants to stop the raid...(you see the messages "mdrecoveryd > > got waken up...") but that did not work and after some time (30sec) the > > kernel Ooopsed. This was reproducable and only occured if booted with > > the "aic7xxx=verbose" kernel-parameter. > > The effect after reboot was, that the raid had to be resynced because > > one partition (that which always falls out) was damaged or at least > > seemed to. > > (The filesystem was clean, that was already unmounted as the oops > > occured.) > > > > Perhaps someone can test if this is reproducable with his machine > > too...i use kernel 2.4.3, raid is built-in, also the aic7xxx, there are > > three raid-disks (LVD, aic7xxx-controller on Mobo) in a raid5 mounted as > > /home. > > > Same behaviour for RAID1 and the new aic7xxx driver for me at nearly every > reboot. The old driver works just fine (2.4.9). Ok, as i am using Kernel 2.4.3, it seems that the problem exists from 2.4.3 to 2.4.9...could you easily post the kernel-oops ? I can and will, but i am still testing the old driver with my disk-falls-out-of-raid problem, so i cannot reboot for the next week or so as this problem only occurs randomly about once per week...:-(...and i want to "circle in" on this problem to be sure that it is not something else... 
One thing i realize at the moment: The old driver uses a default TCQ of 8, now my /proc/scsi/aic7xxx/0 says that the actual queue depth per device is 1,1,1,1,1.....the TCQ is 8. We should test if the problem with the new driver goes away if we set a TCQ of 1...or has someone done this already ? This problem leads IMHO to the theory that the raid-code and the (new) aic7xxx-code interfere in some way...(race condition?)...perhaps this also causes my disk to fall out of the raid... Solong... Frank. -- Frank Schneider, <SPATZ1@T-ONLINE.DE>. Microsoft isn't the answer. Microsoft is the question, and the answer is NO. ... -.- ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: AIC + RAID1 error? (was: Re: aic7xxx errors) 2001-09-10 13:50 ` Olaf Zaplinski 2001-09-10 19:11 ` Frank Schneider @ 2001-09-11 15:00 ` Olaf Zaplinski 1 sibling, 0 replies; 28+ messages in thread From: Olaf Zaplinski @ 2001-09-11 15:00 UTC (permalink / raw) To: linux-kernel Olaf Zaplinski wrote: [...] > But how can I help to reproduce the error? Of course I could break the > mirror, compile the driver into the kernel (non-module) and do some stress > test on the SCSI drive. But it's not so good when I drive this machine into > a hang too often. Well, I tried that actually: - insmod'ed the new driver ('verbose', 'tcq=32') - broke mirror - mke2fs /dev/sda1 - tar'ed / to /mnt (which was the mounted sda1) => no errors So it has to do with the RAID code, I think. Olaf ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: AIC + RAID1 error? (was: Re: aic7xxx errors) 2001-09-07 20:32 ` AIC + RAID1 error? (was: Re: aic7xxx errors) Olaf Zaplinski 2001-09-07 22:32 ` Justin T. Gibbs @ 2001-09-08 20:25 ` Frank Schneider 2001-09-08 22:07 ` Justin T. Gibbs 1 sibling, 1 reply; 28+ messages in thread From: Frank Schneider @ 2001-09-08 20:25 UTC (permalink / raw) To: linux-kernel

Olaf Zaplinski wrote:
>
> Olaf Zaplinski wrote:
> >
> > Joseph Mathewson wrote:
> > >
> > > I've just woken up this morning to find my internet gateway machine only
> > > responding to pings, and on giving it a keyboard & monitor, a load of
> > >
> > > scsi0:0:1:0: Attempting to queue an ABORT message
> > > scsi0:0:1:0: Cmd aborted from QINFIFO
> > > aic7xxx_abort returns 8194
> > >
> > > errors.
> > [...]
> >
> > /me too. I had this while booting 2.4.9 with a fresh installed SCSI card
> > (AHA2940) + harddisk. What worked for me was to compile the kernel with the
> > old Adaptec driver, so it's a driver issue.
>

Hello...

I am seeing a possibly similar problem at the moment with aic7xxx and RAID5: I run a RAID5-Array on three SCSI-Disks, all IBM, all LVD on the AIC7xxx-Controller on the Mobo (ASUS-P2B-DS)...and from time to time (usually about once per week) always the same partition of the RAID5 gets a read error and falls out of the array:

-------------------------
Sep 8 20:49:31 falcon kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 8000002
Sep 8 20:49:31 falcon kernel: [valid=0] Info fld=0x0, Current sd08:04: sense key Hardware Error
Sep 8 20:49:31 falcon kernel: Additional sense indicates Internal target failure
Sep 8 20:49:31 falcon kernel: I/O error: dev 08:04, sector 8545688
Sep 8 20:49:31 falcon kernel: raid5: Disk failure on sda4, disabling device. Operation continuing on 2 devices
Sep 8 20:49:31 falcon kernel: md: recovery thread got woken up ...
Sep 8 20:49:31 falcon kernel: md0: no spare disk to reconstruct array! -- continuing in degraded mode
Sep 8 20:49:31 falcon kernel: md: recovery thread finished ...
Sep 8 20:49:31 falcon kernel: md: updating md0 RAID superblock on device
Sep 8 20:49:31 falcon kernel: sdc1 [events: 000000be](write) sdc1's sb offset: 8707072
Sep 8 20:49:32 falcon kernel: sdb1 [events: 000000be](write) sdb1's sb offset: 8707072
Sep 8 20:49:32 falcon kernel: (skipping faulty sda4 )
Sep 8 20:49:32 falcon kernel: .
----------------------------

Ok, i also thought: "Bad disk" and to verify this (the drive is still under warranty) i formatted it, let the AIC-BIOS do a "remap of bad blocks" and ran "badblocks" about 5 times on it with the "-w"-option...last but not least i copied over 160GB from and to the drive over two days...nothing, not a single failure of the drive...today i re-integrated the disk in my array, and already got the first fall-off.

I now switched also to the old aic7xxx driver, only to get an idea where to seek the problem...in the raid-code, in the driver or somewhere else...

Solong.. Frank.

-- Frank Schneider, <SPATZ1@T-ONLINE.DE>. Microsoft isn't the answer. Microsoft is the question, and the answer is NO. ... -.- ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: AIC + RAID1 error? (was: Re: aic7xxx errors) 2001-09-08 20:25 ` Frank Schneider @ 2001-09-08 22:07 ` Justin T. Gibbs 0 siblings, 0 replies; 28+ messages in thread From: Justin T. Gibbs @ 2001-09-08 22:07 UTC (permalink / raw) To: Frank Schneider; +Cc: linux-kernel >I run a RAID5-Array on three SCSI-Disks, all IBM, all LVD on the >AIC7xxx-Controller on the Mobo (ASUS-P2B-DS)...and from time to time >(usually about once per week) always the same partition of the RAID5 >gets a readerror and falls out of the array: This is a very different issue. The drive has even told you what is wrong. >------------------------- >Sep 8 20:49:31 falcon kernel: SCSI disk error : host 0 channel 0 id 0 >lun 0 return code = 8000002 >Sep 8 20:49:31 falcon kernel: [valid=0] Info fld=0x0, Current sd08:04: >sense key Hardware Error >Sep 8 20:49:31 falcon kernel: Additional sense indicates Internal ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ >target failure ^^^^^^^^^^^^^^ Something bad happened inside the disk. Perhaps IBM can tell you what, but it is not the aic7xxx driver, SCSI layer, or md's fault for this disk going offline. >Ok, i also thought: "Bad disk" and to verify this (i have still >guarantee on the drive) i formated it, let the AIC-BIOS do a "remap of >bad blocks" and ran "badblocks" about 5 times on it with the Target failures are not "media errors". If the drive was experiencing a media problem, it would have said so. -- Justin ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: aic7xxx errors 2001-09-05 6:21 aic7xxx errors Joseph Mathewson 2001-09-05 7:58 ` Olaf Zaplinski @ 2001-09-05 20:23 ` Justin T. Gibbs 1 sibling, 0 replies; 28+ messages in thread From: Justin T. Gibbs @ 2001-09-05 20:23 UTC (permalink / raw) To: joe.mathewson; +Cc: linux-kernel >I've just woken up this morning to find my internet gateway machine only >responding to pings, and on giving it a keyboard & monitor, a load of > >scsi0:0:1:0: Attempting to queue an ABORT message >scsi0:0:1:0: Cmd aborted from QINFIFO >aic7xxx_abort returns 8194 > >errors. I would have to see the messages with "aic7xxx=verbose" in order to better diagnose the problem. A full dmesg that includes driver initialization and SCSI device detection would be useful too. You might also want to upgrade your driver to something newer: http://people.FreeBSD.org/~gibbs/linux/ -- Justin ^ permalink raw reply [flat|nested] 28+ messages in thread