linux-kernel.vger.kernel.org archive mirror
* AIC7xxx errors in 2.2.19 but not in 2.2.18
@ 2001-09-14  9:34 Holger Kiehl
  2001-09-14 13:11 ` Frank Schneider
  0 siblings, 1 reply; 7+ messages in thread
From: Holger Kiehl @ 2001-09-14  9:34 UTC
  To: linux-kernel

Hello

I am getting SCSI errors with an onboard Adaptec AIC-7890/1 Ultra2, but
only under very heavy disk load and only under kernel 2.2.19. These errors
do not appear under 2.2.18.

The system I have is a dual PIII-450 with 6 disks attached to the controller.
All disks are combined into a software RAID5 array, with one configured
as a hot spare.

The errors under 2.2.19 look as follows:

 scsi : aborting command due to timeout : pid 52414, scsi0, channel 0, id 0, lun 0 Read (10) 00 00 ba f3 76 00 00 18 00
 scsi : aborting command due to timeout : pid 52416, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 ba f4 0e 00 00 80 00
 (scsi0:0:1:0) SCSISIGI 0x4, SEQADDR 0x77, SSTAT0 0x0, SSTAT1 0x2
 (scsi0:0:1:0) SG_CACHEPTR 0x8, SSTAT2 0x40, STCNT 0x5fc
 scsi : aborting command due to timeout : pid 52417, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 ba f4 8e 00 00 80 00
 scsi : aborting command due to timeout : pid 52419, scsi0, channel 0, id 2, lun 0 Write (10) 00 00 ba f4 0e 00 00 80 00
 scsi : aborting command due to timeout : pid 52420, scsi0, channel 0, id 2, lun 0 Write (10) 00 00 ba f4 8e 00 00 80 00
 scsi : aborting command due to timeout : pid 52422, scsi0, channel 0, id 3, lun 0 Write (10) 00 00 ba f4 0e 00 00 80 00
 scsi : aborting command due to timeout : pid 52423, scsi0, channel 0, id 3, lun 0 Write (10) 00 00 ba f4 8e 00 00 80 00
 scsi : aborting command due to timeout : pid 52425, scsi0, channel 0, id 4, lun 0 Write (10) 00 00 1c ed 06 00 00 08 00
 scsi : aborting command due to timeout : pid 52426, scsi0, channel 0, id 4, lun 0 Write (10) 00 00 ba f3 8e 00 00 80 00
 scsi : aborting command due to timeout : pid 52427, scsi0, channel 0, id 0, lun 0 Write (10) 00 00 ba f3 8e 00 00 80 00
 scsi : aborting command due to timeout : pid 52428, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 ba f5 0e 00 00 80 00
 scsi : aborting command due to timeout : pid 52429, scsi0, channel 0, id 3, lun 0 Write (10) 00 00 ba f5 0e 00 00 80 00
 scsi : aborting command due to timeout : pid 52430, scsi0, channel 0, id 2, lun 0 Write (10) 00 00 ba f5 0e 00 00 80 00
 scsi : aborting command due to timeout : pid 52431, scsi0, channel 0, id 4, lun 0 Write (10) 00 00 ba f4 0e 00 00 80 00
 scsi : aborting command due to timeout : pid 52432, scsi0, channel 0, id 0, lun 0 Write (10) 00 00 ba f4 0e 00 00 80 00
 SCSI host 0 abort (pid 52416) timed out - resetting
 SCSI bus is being reset for host 0 channel 0.

 wait_on_bh, CPU 1:
 irq:  0 [0 0]
 bh:   1 [1 0]
 <[c010aead]> <[c0199ffc]> <[c019ab9b]> <[c01a3860]>
 <6>(scsi0:0:4:0) Synchronous at 80.0 Mbyte/sec, offset 31.
 (scsi0:0:2:0) Synchronous at 80.0 Mbyte/sec, offset 63.
 (scsi0:0:3:0) Synchronous at 80.0 Mbyte/sec, offset 31.
 (scsi0:0:1:0) Synchronous at 80.0 Mbyte/sec, offset 31.
 (scsi0:0:0:0) Synchronous at 80.0 Mbyte/sec, offset 31.
 scsi : aborting command due to timeout : pid 53513, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 bb 4c f6 00 00 80 00
 (scsi0:0:1:0) SCSISIGI 0x4, SEQADDR 0x62, SSTAT0 0x0, SSTAT1 0x2
 (scsi0:0:1:0) SG_CACHEPTR 0x3c, SSTAT2 0x40, STCNT 0x3fc
 scsi : aborting command due to timeout : pid 53514, scsi0, channel 0, id 2, lun 0 Write (10) 00 00 bb 4a f6 00 00 80 00
 scsi : aborting command due to timeout : pid 53515, scsi0, channel 0, id 3, lun 0 Write (10) 00 00 bb 4c f6 00 00 80 00
 scsi : aborting command due to timeout : pid 53516, scsi0, channel 0, id 4, lun 0 Write (10) 00 00 bb 4a f6 00 00 80 00
 scsi : aborting command due to timeout : pid 53517, scsi0, channel 0, id 0, lun 0 Write (10) 00 00 bb 4d 76 00 00 40 00
 scsi : aborting command due to timeout : pid 53518, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 bb 4d 76 00 00 40 00
 scsi : aborting command due to timeout : pid 53519, scsi0, channel 0, id 3, lun 0 Write (10) 00 00 bb 4d 76 00 00 40 00
 scsi : aborting command due to timeout : pid 53520, scsi0, channel 0, id 2, lun 0 Write (10) 00 00 ba fb 86 00 00 08 00
 scsi : aborting command due to timeout : pid 53521, scsi0, channel 0, id 4, lun 0 Write (10) 00 00 ba fb 86 00 00 08 00
 scsi : aborting command due to timeout : pid 53522, scsi0, channel 0, id 0, lun 0 Write (10) 00 00 bb 4d be 00 00 30 00
 scsi : aborting command due to timeout : pid 53523, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 bb 4d be 00 00 30 00
 scsi : aborting command due to timeout : pid 53524, scsi0, channel 0, id 3, lun 0 Write (10) 00 00 bb 4d be 00 00 30 00
 scsi : aborting command due to timeout : pid 53525, scsi0, channel 0, id 4, lun 0 Write (10) 00 00 bb 4b 76 00 00 80 00
 scsi : aborting command due to timeout : pid 53526, scsi0, channel 0, id 2, lun 0 Write (10) 00 00 bb 4b 76 00 00 80 00
 scsi : aborting command due to timeout : pid 53527, scsi0, channel 0, id 0, lun 0 Write (10) 00 00 bb 4b f6 00 00 80 00
 SCSI host 0 abort (pid 53513) timed out - resetting
 SCSI bus is being reset for host 0 channel 0.
 (scsi0:0:4:0) Synchronous at 80.0 Mbyte/sec, offset 31.
 (scsi0:0:3:0) Synchronous at 80.0 Mbyte/sec, offset 31.
 (scsi0:0:2:0) Synchronous at 80.0 Mbyte/sec, offset 63.
 (scsi0:0:1:0) Synchronous at 80.0 Mbyte/sec, offset 31.
 (scsi0:0:0:0) Synchronous at 80.0 Mbyte/sec, offset 31.
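
A log like the one above can be summarized per target to see which device's command ended up triggering each bus reset. The following is a minimal sketch, assuming each kernel message sits on a single line in exactly the wording shown above; the function name `summarize` and the sample text are illustrative only.

```python
import re
from collections import Counter

# Patterns for the two message shapes that appear in the log.
ABORT_RE = re.compile(
    r"aborting command due to timeout : pid\s+(\d+), scsi(\d+), "
    r"channel (\d+), id (\d+), lun (\d+)"
)
RESET_RE = re.compile(r"SCSI host (\d+) abort \(pid (\d+)\) timed out - resetting")


def summarize(log_text):
    """Return (timeouts per target, targets whose command led to a reset)."""
    pid_to_target = {}
    timeouts = Counter()
    reset_targets = []
    for line in log_text.splitlines():
        m = ABORT_RE.search(line)
        if m:
            pid, host, channel, target, lun = (int(g) for g in m.groups())
            key = (host, channel, target, lun)
            pid_to_target[pid] = key
            timeouts[key] += 1
            continue
        m = RESET_RE.search(line)
        if m:
            # Look up which target the pid named in the reset belonged to.
            reset_targets.append(pid_to_target.get(int(m.group(2))))
    return timeouts, reset_targets


# Three abort lines and one reset line copied from the log above.
SAMPLE = """\
scsi : aborting command due to timeout : pid 52414, scsi0, channel 0, id 0, lun 0 Read (10) 00 00 ba f3 76 00 00 18 00
scsi : aborting command due to timeout : pid 52416, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 ba f4 0e 00 00 80 00
scsi : aborting command due to timeout : pid 52417, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 ba f4 8e 00 00 80 00
SCSI host 0 abort (pid 52416) timed out - resetting
"""

timeouts, reset_targets = summarize(SAMPLE)
for target, count in sorted(timeouts.items()):
    print("scsi%d:%d:%d:%d timeouts: %d" % (target + (count,)))
print("reset triggered by:", reset_targets)
```

In the full log, both resets name pids (52416 and 53513) that belong to commands sent to id 1, which is also the target shown in the register dumps.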

From Alan's changelog I see that there were changes in the AIC7xxx code.
Any idea what is wrong here?

Thanks,
Holger



* Re: AIC7xxx errors in 2.2.19 but not in 2.2.18
  2001-09-14  9:34 AIC7xxx errors in 2.2.19 but not in 2.2.18 Holger Kiehl
@ 2001-09-14 13:11 ` Frank Schneider
  2001-09-14 13:37   ` Andreas Steinmetz
  2001-09-14 13:46   ` Holger Kiehl
  0 siblings, 2 replies; 7+ messages in thread
From: Frank Schneider @ 2001-09-14 13:11 UTC
  To: Holger Kiehl; +Cc: linux-kernel

Holger Kiehl wrote:
> 
> Hello
> 
> I am getting SCSI errors with an onboard Adaptec AIC-7890/1 Ultra2, but
> only under very heavy disk load and only under kernel 2.2.19. These errors
> do not appear under 2.2.18.
> 
> The system I have is a dual PIII-450 with 6 disks attached to the controller.
> All disks are put together in SW-Raid5 array with one configured as hot
> spare.
> 

(..log snipped..)
 
> > From Alan's changelog I see that there were changes in the AIC7xxx code.
> Any idea what is wrong here?

Hello...

Someone else and I have also had mysterious problems with AIC7xxx and
RAID1/5, but we use kernel 2.4.x.

In kernel 2.4.x you can choose between two versions of the
aic7xxx driver: an "old" one (version 5.2.x) and a "new" one (version
6.x.x). Run "cat /proc/scsi/aic7xxx/0" to find your version.
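
The version check can also be scripted. This is only a sketch: it assumes the proc file contains the word "version" followed on the same line by a dotted version number, which may not hold for every driver release, and it maps major version 5 to "old" and 6 to "new" as described above. The helper names are made up for illustration.

```python
import re

# Assumption: "version" is followed on the same line by a dotted number.
VERSION_RE = re.compile(r"version[^\d\n]*(\d+)\.(\d+)", re.IGNORECASE)


def driver_generation(proc_text):
    """Classify proc output as 'old' (5.x), 'new' (6.x) or 'unknown'."""
    m = VERSION_RE.search(proc_text)
    if not m:
        return "unknown"
    major = int(m.group(1))
    if major == 5:
        return "old"
    if major == 6:
        return "new"
    return "unknown"


def read_proc(path="/proc/scsi/aic7xxx/0"):
    """Return the file contents, or None if the entry does not exist."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return None


text = read_proc()
if text is None:
    print("no aic7xxx proc entry found")
else:
    print("driver generation:", driver_generation(text))
```

On a machine with a second controller the entry would be numbered differently, so the path argument may need adjusting.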

We both found that our problems disappear when we use the "old"
driver. My tests are still in progress, though: my error (always the
same SCSI disk falling out of a RAID5 array with an "internal error",
although the disk seems to be good) only appeared randomly about once a
week, so I still have to wait to see whether it is really gone.

So perhaps you can try the older driver, or at least determine the
version of your aic7xxx driver. Perhaps you can use the aic7xxx driver
from kernel 2.2.18 in kernel 2.2.19?

You should also boot your system with the parameter "aic7xxx=verbose",
which will provide more information in the syslog.
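
After rebooting, one way to confirm that the parameter actually reached the kernel is to look at the boot command line. The sketch below assumes /proc/cmdline is available and that several driver options, if present, are given as a comma-separated list after "aic7xxx="; both points are assumptions, and the function names are hypothetical.

```python
def aic7xxx_options(cmdline):
    """Return the options passed as aic7xxx=opt1,opt2,... (possibly empty)."""
    options = []
    for word in cmdline.split():
        if word.startswith("aic7xxx="):
            value = word[len("aic7xxx="):]
            options.extend(opt for opt in value.split(",") if opt)
    return options


def read_cmdline(path="/proc/cmdline"):
    """Return the kernel command line, or an empty string if unreadable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""


opts = aic7xxx_options(read_cmdline())
if "verbose" in opts:
    print("aic7xxx=verbose is active")
else:
    print("aic7xxx=verbose not found on the kernel command line")
```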

So long,
Frank.

--
Frank Schneider, <SPATZ1@T-ONLINE.DE>.                           
Microsoft isn't the answer.
Microsoft is the question, and the answer is NO.
... -.-


* Re: AIC7xxx errors in 2.2.19 but not in 2.2.18
  2001-09-14 13:11 ` Frank Schneider
@ 2001-09-14 13:37   ` Andreas Steinmetz
  2001-09-15 15:37     ` Doug Ledford
  2001-09-14 13:46   ` Holger Kiehl
  1 sibling, 1 reply; 7+ messages in thread
From: Andreas Steinmetz @ 2001-09-14 13:37 UTC
  To: Frank Schneider; +Cc: linux-kernel, Holger Kiehl

Hi,
2.2.19 only has the 'old' driver. The 'raid/scsi new' problem is a notifier
chain sequence problem that seems to have been taken care of now.
What I see here may be a coincidence of a kernel upgrade and a faulty drive.
Below are some snippets of 2.2.19 log messages from a faulty drive.

May  2 03:33:07 pollux kernel: (scsi1:0:1:0) Parity error during Data-In phase.
May  2 03:33:37 pollux kernel: scsi : aborting command due to timeout : pid 1188263, scsi1, channel 0, id 1, lun 0 Read (10) 00 01 04 cd 97 00 00 80 00
May  2 03:33:38 pollux kernel: scsi : aborting command due to timeout : pid 1188268, scsi1, channel 0, id 0, lun 0 Read (10) 00 01 04 ce 2f 00 00 80 00
May  2 03:33:38 pollux kernel: scsi : aborting command due to timeout : pid 1188269, scsi1, channel 0, id 0, lun 0 Read (10) 00 01 04 ce af 00 00 28 00
May  2 03:33:38 pollux kernel: scsi : aborting command due to timeout : pid 1188270, scsi1, channel 0, id 1, lun 0 Read (10) 00 01 04 ce 17 00 00 80 00
May  2 03:33:38 pollux kernel: scsi : aborting command due to timeout : pid 1188271, scsi1, channel 0, id 1, lun 0 Read (10) 00 01 04 ce 97 00 00 40 00
May  2 03:33:38 pollux kernel: scsi : aborting command due to timeout : pid 1188272, scsi1, channel 0, id 2, lun 0 Read (10) 00 01 04 ce 17 00 00 80 00
May  2 03:33:38 pollux kernel: scsi : aborting command due to timeout : pid 1188273, scsi1, channel 0, id 2, lun 0 Read (10) 00 01 04 ce 97 00 00 40 00
May  2 03:33:38 pollux kernel: scsi : aborting command due to timeout : pid 1188274, scsi1, channel 0, id 3, lun 0 Read (10) 00 01 04 ce 17 00 00 80 00
May  2 03:33:38 pollux kernel: scsi : aborting command due to timeout : pid 1188275, scsi1, channel 0, id 3, lun 0 Read (10) 00 01 04 ce 97 00 00 08 00
May  2 03:33:39 pollux kernel: SCSI host 1 abort (pid 1188263) timed out - resetting
May  2 03:33:39 pollux kernel: SCSI bus is being reset for host 1 channel 0
May  2 03:33:41 pollux kernel: SCSI host 1 reset (pid 1188263) timed out again -
May  2 03:33:41 pollux kernel: probably an unrecoverable SCSI bus or device hang.
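
The hex bytes printed after "Read (10)" or "Write (10)" can be decoded to see which blocks each timed-out command addressed. This is a sketch under two assumptions: that the nine bytes shown are CDB bytes 1 to 9 (the opcode being printed as the command name), and that the disks use 512-byte blocks. The function name `decode_rw10` is illustrative.

```python
def decode_rw10(byte_text, block_size=512):
    """Decode the nine bytes logged after 'Read (10)' or 'Write (10)'.

    Returns (logical block address, block count, transfer size in bytes).
    Assumes the opcode is not among the listed bytes and that block_size
    matches the disk (512 is a guess, not taken from the log).
    """
    b = [int(tok, 16) for tok in byte_text.split()]
    if len(b) != 9:
        raise ValueError("expected 9 bytes, got %d" % len(b))
    # CDB bytes 2-5: big-endian logical block address.
    lba = (b[1] << 24) | (b[2] << 16) | (b[3] << 8) | b[4]
    # CDB bytes 7-8: big-endian transfer length in blocks.
    blocks = (b[6] << 8) | b[7]
    return lba, blocks, blocks * block_size


# Bytes from the first aborted read in the log above (pid 1188263).
lba, blocks, nbytes = decode_rw10("00 01 04 cd 97 00 00 80 00")
print("LBA %d, %d blocks, %d bytes" % (lba, blocks, nbytes))
```

Under those assumptions the first aborted read covers 128 blocks starting at block 17091991.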

On 14-Sep-2001 Frank Schneider wrote:
> Holger Kiehl wrote:
>> 
>> Hello
>> 
>> I am getting SCSI errors with an onboard Adaptec AIC-7890/1 Ultra2, but
>> only under very heavy disk load and only under kernel 2.2.19. These errors
>> do not appear under 2.2.18.
>> 
>> The system I have is a dual PIII-450 with 6 disks attached to the
>> controller.
>> All disks are put together in SW-Raid5 array with one configured as hot
>> spare.
>> 
> 
> (..log snipped..)
>  
>> > From Alan's changelog I see that there were changes in the AIC7xxx code.
>> Any idea what is wrong here?
> 
> Hello...
> 
> Someone else and I have also had mysterious problems with AIC7xxx and
> RAID1/5, but we use kernel 2.4.x.
> 
> In kernel 2.4.x you can choose between two versions of the
> aic7xxx driver: an "old" one (version 5.2.x) and a "new" one (version
> 6.x.x). Run "cat /proc/scsi/aic7xxx/0" to find your version.
> 
> We both found that our problems disappear when we use the "old"
> driver. My tests are still in progress, though: my error (always the
> same SCSI disk falling out of a RAID5 array with an "internal error",
> although the disk seems to be good) only appeared randomly about once a
> week, so I still have to wait to see whether it is really gone.
> 
> So perhaps you can try the older driver, or at least determine the
> version of your aic7xxx driver. Perhaps you can use the aic7xxx driver
> from kernel 2.2.18 in kernel 2.2.19?
> 
> You should also boot your system with the parameter "aic7xxx=verbose",
> which will provide more information in the syslog.
> 
> So long,
> Frank.
> 
> --
> Frank Schneider, <SPATZ1@T-ONLINE.DE>.                           
> Microsoft isn't the answer.
> Microsoft is the question, and the answer is NO.
> ... -.-

Andreas Steinmetz
D.O.M. Datenverarbeitung GmbH


* Re: AIC7xxx errors in 2.2.19 but not in 2.2.18
  2001-09-14 13:11 ` Frank Schneider
  2001-09-14 13:37   ` Andreas Steinmetz
@ 2001-09-14 13:46   ` Holger Kiehl
  2001-09-14 17:12     ` Mike Fedyk
  1 sibling, 1 reply; 7+ messages in thread
From: Holger Kiehl @ 2001-09-14 13:46 UTC
  To: Frank Schneider; +Cc: linux-kernel




On Fri, 14 Sep 2001, Frank Schneider wrote:

> Holger Kiehl wrote:
> >
> > Hello
> >
> > I am getting SCSI errors with an onboard Adaptec AIC-7890/1 Ultra2, but
> > only under very heavy disk load and only under kernel 2.2.19. These errors
> > do not appear under 2.2.18.
> >
> > The system I have is a dual PIII-450 with 6 disks attached to the controller.
> > All disks are put together in SW-Raid5 array with one configured as hot
> > spare.
> >
>
> (..log snipped..)
>
> > > From Alan's changelog I see that there were changes in the AIC7xxx code.
> > Any idea what is wrong here?
>
> Hello...
>
> Someone else and I have also had mysterious problems with AIC7xxx and
> RAID1/5, but we use kernel 2.4.x.
>
Just today Neil Brown sent a patch in which he mentioned something
about the AIC7xxx driver, but I don't know whether it has anything
to do with this problem.

> In kernel 2.4.x you can choose between two versions of the
> aic7xxx driver: an "old" one (version 5.2.x) and a "new" one (version
> 6.x.x). Run "cat /proc/scsi/aic7xxx/0" to find your version.
>
I have played with 2.4.5 and the new aic7xxx driver and did not see
the problems there. I have not tried the old driver under 2.4.5.
Unfortunately I cannot use 2.4.x because of its larger swap demand.

> We both found that our problems disappear when we use the "old"
> driver. My tests are still in progress, though: my error (always the
> same SCSI disk falling out of a RAID5 array with an "internal error",
> although the disk seems to be good) only appeared randomly about once a
> week, so I still have to wait to see whether it is really gone.
>
> So perhaps you can try the older driver, or at least determine the
> version of your aic7xxx driver. Perhaps you can use the aic7xxx driver
> from kernel 2.2.18 in kernel 2.2.19?
>
> You should also boot your system with the parameter "aic7xxx=verbose",
> which will provide more information in the syslog.
>
The next time I boot, I will add this option.

Thanks,
Holger



* Re: AIC7xxx errors in 2.2.19 but not in 2.2.18
  2001-09-14 13:46   ` Holger Kiehl
@ 2001-09-14 17:12     ` Mike Fedyk
  0 siblings, 0 replies; 7+ messages in thread
From: Mike Fedyk @ 2001-09-14 17:12 UTC
  To: linux-kernel

On Fri, Sep 14, 2001 at 03:46:10PM +0200, Holger Kiehl wrote:
> I have played with 2.4.5 and the new aic7xxx driver and did not see
> the problems here. Have not tried the old one under 2.4.5. Unfortunately
> I cannot take 2.4.x because of the bigger swap demand.
> 
2.4.x-ac doesn't have the high swap requirement.  Swap demands look similar
to 2.2.xx kernels with 2.4.{8,9}-ac.


* Re: AIC7xxx errors in 2.2.19 but not in 2.2.18
  2001-09-14 13:37   ` Andreas Steinmetz
@ 2001-09-15 15:37     ` Doug Ledford
  2001-09-15 15:42       ` Andreas Steinmetz
  0 siblings, 1 reply; 7+ messages in thread
From: Doug Ledford @ 2001-09-15 15:37 UTC
  To: Andreas Steinmetz; +Cc: Frank Schneider

Andreas Steinmetz wrote:

> Hi,
> 2.2.19 only has the 'old' driver. The 'raid/scsi new' problem is a notifier
> chain sequence problem that seems to have been taken care of now.
> What I do see here may be a coincidence of kernel upgrade and a faulty drive.
> Some snippets of 2.2.19 log messages of a faulty drive below.
> 
> May  2 03:33:07 pollux kernel: (scsi1:0:1:0) Parity error during Data-In phase.
> May  2 03:33:37 pollux kernel: scsi : aborting command due to timeout : pid
> 1188263, scsi1, channel 0, id 1, lun 0 Read (10) 00 01 04 cd 97 00 00 80 00 



I've seen that error a few times now with the new code in 2.2.19.  I 
don't have a fix for it at this time (and I probably won't since 
development on that driver isn't a 'regular' thing at this point).  If 
the old driver in 2.2.18 worked for you, then I would copy the aic7xxx* 
files from 2.2.18 into 2.2.19 and rebuild your kernel.
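
The copy step could be scripted roughly as follows. This is a sketch, not a tested procedure: it assumes both source trees are unpacked side by side and that the driver files live under drivers/scsi/ in each tree; the tree locations in the example call are hypothetical. The kernel still has to be reconfigured and rebuilt afterwards.

```python
import glob
import os
import shutil


def copy_aic7xxx(old_tree, new_tree, subdir=os.path.join("drivers", "scsi")):
    """Copy every aic7xxx* entry from old_tree/subdir into new_tree/subdir.

    Returns the sorted list of entry names that were copied.
    """
    src_dir = os.path.join(old_tree, subdir)
    dst_dir = os.path.join(new_tree, subdir)
    copied = []
    for src in sorted(glob.glob(os.path.join(src_dir, "aic7xxx*"))):
        dst = os.path.join(dst_dir, os.path.basename(src))
        if os.path.isdir(src):
            # Replace the whole directory so no newer file is left behind.
            if os.path.isdir(dst):
                shutil.rmtree(dst)
            shutil.copytree(src, dst)
        else:
            shutil.copy2(src, dst)
        copied.append(os.path.basename(src))
    return copied


if __name__ == "__main__":
    # Hypothetical locations of the two unpacked kernel trees.
    print(copy_aic7xxx("/usr/src/linux-2.2.18", "/usr/src/linux-2.2.19"))
```

Keeping a backup of the 2.2.19 files before overwriting them makes it easy to switch back.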




-- 

  Doug Ledford <dledford@redhat.com>  http://people.redhat.com/dledford
       Please check my web site for aic7xxx updates/answers before
                       e-mailing me about problems



* Re: AIC7xxx errors in 2.2.19 but not in 2.2.18
  2001-09-15 15:37     ` Doug Ledford
@ 2001-09-15 15:42       ` Andreas Steinmetz
  0 siblings, 0 replies; 7+ messages in thread
From: Andreas Steinmetz @ 2001-09-15 15:42 UTC
  To: Doug Ledford; +Cc: Holger Kiehl, linux-kernel, Frank Schneider


On 15-Sep-2001 Doug Ledford wrote:
> Andreas Steinmetz wrote:
> 
>> Hi,
>> 2.2.19 only has the 'old' driver. The 'raid/scsi new' problem is a notifier
>> chain sequence problem that seems to have been taken care of now.
>> What I do see here may be a coincidence of kernel upgrade and a faulty
>> drive.
>> Some snippets of 2.2.19 log messages of a faulty drive below.
>> 
>> May  2 03:33:07 pollux kernel: (scsi1:0:1:0) Parity error during Data-In
>> phase.
>> May  2 03:33:37 pollux kernel: scsi : aborting command due to timeout : pid
>> 1188263, scsi1, channel 0, id 1, lun 0 Read (10) 00 01 04 cd 97 00 00 80 00 
> 
> 
> 
> I've seen that error a few times now with the new code in 2.2.19.  I 
> don't have a fix for it at this time (and I probably won't since 
> development on that driver isn't a 'regular' thing at this point).  If 
> the old driver in 2.2.18 worked for you, then I would copy the aic7xxx* 
> files from 2.2.18 into 2.2.19 and rebuild your kernel.
> 
> 
Please note that the disk was proven faulty: it kept failing under another
kernel and another OS on other hardware, and since it was replaced there
have been no more problems.


Andreas Steinmetz
D.O.M. Datenverarbeitung GmbH


Thread overview: 7+ messages
2001-09-14  9:34 AIC7xxx errors in 2.2.19 but not in 2.2.18 Holger Kiehl
2001-09-14 13:11 ` Frank Schneider
2001-09-14 13:37   ` Andreas Steinmetz
2001-09-15 15:37     ` Doug Ledford
2001-09-15 15:42       ` Andreas Steinmetz
2001-09-14 13:46   ` Holger Kiehl
2001-09-14 17:12     ` Mike Fedyk
