linux-kernel.vger.kernel.org archive mirror
* Re: 3ware Escalade problems
       [not found] <53B208BD9A7FD311881A009027B6BBFB9EADCC@siamese>
@ 2001-08-02  1:38 ` Jeff V. Merkey
  0 siblings, 0 replies; 11+ messages in thread
From: Jeff V. Merkey @ 2001-08-02  1:38 UTC (permalink / raw)
  To: Adam Radford; +Cc: Scott Ransom, linux-kernel, 'eric@andante.org'

On Wed, Aug 01, 2001 at 02:13:02PM -0700, Adam Radford wrote:

Adam,

Nice to meet you.  The gendisk head is improperly reporting devices
attached to the 3ware controller.  What I am seeing is drives being 
reported with zero lengths in the block size fields, etc.  This causes
kernel level software to crash and some user space utilities to malfunction.

Please provide your telephone number so I can call you and go over 
these problems.  The driver is also **VERY** unstable on some systems,
and gets all sorts of lost interrupt problems and noisy messages on 
the Serverworks HE chipsets we are using with SCI.

Jeff 


> Jeff,
> 
> The problems this user is seeing with drive timeouts and ECC errors have 
> absolutely nothing to do with 'gendisk support'.
> 
> BTW... would you mind explaining to me what I need to do to have 'gendisk
> support' in the 3ware driver?  greping for 'struct gendisk' in the drivers/
> scsi directory on 2.4.7 shows only sd.c instantiating a struct gendisk.  
> 
> What do the low level drivers need to do?  If there's some missing gendisk
> calls in the 3ware driver, then why don't any other scsi drivers have any 
> instances of struct gendisk ?
> 
> Maybe Eric Youngdale can clarify what you are talking about ?
> 
> BTW, I am the author and maintainer of the 3ware driver.
> 
> --
> Adam Radford
> Software Engineer
> 3ware, Inc.
> 
> -----Original Message-----
> From: Jeff V. Merkey [mailto:jmerkey@vger.timpanogas.org]
> Sent: Wednesday, August 01, 2001 2:40 PM
> To: Scott Ransom
> Cc: linux-kernel@vger.kernel.org
> Subject: Re: 3ware Escalade problems
> 
> 
> On Wed, Aug 01, 2001 at 02:14:44PM -0400, Scott Ransom wrote:
> 
> 
> I am also using 8 way escalade adapters, and am seeing a host of problems.
> The first and foremost is that the gendisk head in 2.4.X is not being 
> initialized properly in the driver.  I have reported these problems to
> 3-ware, and they are attempting to get the engineer who owns the drivers
> on the line with us.  The problems you are seeing are probably related 
> to the same bugs.  This driver requires some rework to get it compliant
> with 2.4.X.  At present, several programs fail with it since it is not 
> setting up the gendisk head properly.  I do not know if your
> problem is related, but this one will get added to the list when I speak 
> with this person.
> 
> Jeff
> 
> 
> > Hello,
> > 
> > After months of running a fileserver with an 8 port 3ware escalade card
> > (kernels 2.4.[3457] using reiserfs and software RAID5) I started getting
> > problems this weekend.
> > 
> > Over the last three days, when I try to access the drives, after a
> > couple minutes I get a drive failure (I even heard a "yelp" from the
> > drive during one of them...).  But the "failure" has happened to 3 of
> > the 8 drives over 3 days -- so unless there is a hardware problem that
> > is killing my drives I find it hard to believe that 3 drives really and
> > truly failed....
> > 
> > Here is a sample from my syslog of a failure:
> > 
> > 3w-xxxx: tw_interrupt(): Bad response, status = 0xc7, flags = 0x51, unit
> > = 0x1.
> > 3w-xxxx: tw_scsi_eh_reset(): Reset succeeded for card 1.
> > 3w-xxxx: tw_interrupt(): Bad response, status = 0xc7, flags = 0x51, unit
> > = 0x1.
> > scsi: device set offline - not ready or command retry failed after host
> > reset: host 1 channel 0 id 1 lun 0
> > SCSI disk error : host 1 channel 0 id 1 lun 0 return code = 80000
> > I/O error: dev 08:11, sector 158441712
> > 
> > I've noticed several "issues" with the 3ware cards in the archives.  Has
> > anyone seen something like this?
> > 
> > Scott
> > 
> > PS:  I'm currently running 2.4.7 with the lm-sensors/i2c patches.
> > 
> > -- 
> > Scott M. Ransom                   Address:  Harvard-Smithsonian CfA
> > Phone:  (617) 496-7908                      60 Garden St.  MS 10 
> > email:  ransom@cfa.harvard.edu              Cambridge, MA  02138
> > GPG Fingerprint: 06A9 9553 78BE 16DB 407B  FFCA 9BFA B6FF FFD3 2989
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/


* Re: 3ware Escalade problems
  2001-08-02 21:26             ` Jeff V. Merkey
@ 2001-08-02 22:02               ` Jeff V. Merkey
  0 siblings, 0 replies; 11+ messages in thread
From: Jeff V. Merkey @ 2001-08-02 22:02 UTC (permalink / raw)
  To: Alan Cox, langus; +Cc: linux-kernel, jmerkey



Alan,

On a related problem, we are seeing packet checksum corruption with both
the 3Com and Intel Gigabit ethernet adapters when ping -f -s > 1472 packet
floods are initiated.  The sniffers report packets with corrupted checksum
information.  I am copying Larry Angus on this email since he has all 
the sniffer traces.  These problems show up on 2.2 and 2.4 kernels.
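
The 1472-byte threshold is where an ICMP echo stops fitting in a single
Ethernet frame: with a 1500-byte MTU, a 20-byte IPv4 header (no options)
and an 8-byte ICMP header, 1472 bytes of payload fill the frame exactly,
and anything larger has to be fragmented.  A small sketch of that
arithmetic in C, assuming those header sizes; the function name is
illustrative only and is not taken from any driver or from ping itself.

```c
#include <assert.h>

/*
 * Number of IPv4 fragments needed to carry an ICMP echo with `payload`
 * bytes of data over a link with the given MTU.
 * Assumptions: 20-byte IPv4 header without options, 8-byte ICMP header.
 */
static unsigned icmp_fragments(unsigned payload, unsigned mtu)
{
    const unsigned ip_hdr = 20, icmp_hdr = 8;
    unsigned datagram, per_frag;

    assert(mtu >= ip_hdr + 8);           /* room for at least 8 data bytes */

    datagram = icmp_hdr + payload;       /* bytes carried inside IP */
    if (datagram <= mtu - ip_hdr)
        return 1;                        /* fits without fragmentation */

    /* Every fragment except the last carries a multiple of 8 bytes. */
    per_frag = (mtu - ip_hdr) & ~7u;
    return (datagram + per_frag - 1) / per_frag;
}
```

With a 1500-byte MTU, icmp_fragments(1472, 1500) gives one packet and
icmp_fragments(1473, 1500) gives two, so any flood with -s above 1472
exercises the fragmentation and reassembly path as well as the adapters.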

We noticed several emails in the archive regarding this problem, but 
no fixes reported.  What's disturbing regarding this issue is that 
hardware checksum errors are being generated which results in the 
packets being lost.

Larry - please provide the traces and documentation you have discovered
to Alan regarding this problem.

Jeff



* Re: 3ware Escalade problems
  2001-08-02 12:22           ` Alan Cox
@ 2001-08-02 21:26             ` Jeff V. Merkey
  2001-08-02 22:02               ` Jeff V. Merkey
  0 siblings, 1 reply; 11+ messages in thread
From: Jeff V. Merkey @ 2001-08-02 21:26 UTC (permalink / raw)
  To: Alan Cox; +Cc: Scott Ransom, linux-kernel

On Thu, Aug 02, 2001 at 01:22:15PM +0100, Alan Cox wrote:
> > If you have an adapter, install it, dump the gendisk head, and take a 
> > look at what's happening.  I am seeing drives being reported with 0 
> > block lengths and other weirdness.    
> 
> Looks fine to me. However if you are seeing 0 block length drives reported
> that's tw_scsiop_read_capacity_complete() causing the sd.c code to do
> something daft.
> 
> Alan

Alan,

Thanks.  I will look at this.  I am running into this problem on 2.2.X 
kernels more so than 2.4.X, but occasionally see it on 2.4.X.  It seems
related to RAID recovery during testing.  We noticed it when we were 
triggering RAID failover for testing.

Jeff



* Re: 3ware Escalade problems
  2001-08-02  1:58         ` Jeff V. Merkey
@ 2001-08-02 12:22           ` Alan Cox
  2001-08-02 21:26             ` Jeff V. Merkey
  0 siblings, 1 reply; 11+ messages in thread
From: Alan Cox @ 2001-08-02 12:22 UTC (permalink / raw)
  To: Jeff V. Merkey; +Cc: Alan Cox, Scott Ransom, linux-kernel

> If you have an adapter, install it, dump the gendisk head, and take a 
> look at what's happening.  I am seeing drives being reported with 0 
> block lengths and other weirdness.    

Looks fine to me. However if you are seeing 0 block length drives reported
that's tw_scsiop_read_capacity_complete() causing the sd.c code to do
something daft.
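
For reference, the data returned for a SCSI READ CAPACITY(10) command is
eight bytes: a big-endian last logical block address followed by a
big-endian block length in bytes.  A rough sketch in C of parsing that
buffer and rejecting a zero block length; the struct and function names
here are hypothetical, and this is neither the sd.c nor the 3w-xxxx code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two READ CAPACITY(10) fields. */
struct capacity {
    uint32_t last_lba;   /* address of the last logical block */
    uint32_t block_len;  /* block length in bytes */
};

/* Read a 32-bit big-endian value from a byte buffer. */
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Parse the 8-byte READ CAPACITY(10) parameter data.
 * Returns 0 on success, -1 if the device reported a zero block length.
 */
static int parse_read_capacity10(const uint8_t buf[8], struct capacity *out)
{
    assert(buf != 0 && out != 0);

    out->last_lba  = be32(buf);
    out->block_len = be32(buf + 4);
    return out->block_len == 0 ? -1 : 0;
}
```

A buffer that is all zeroes, for example one that the low-level driver
never filled in, would come back as -1 here instead of being passed on
as a drive with a zero block length.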

Alan


* Re: 3ware Escalade problems
  2001-08-02  0:40       ` Alan Cox
@ 2001-08-02  1:58         ` Jeff V. Merkey
  2001-08-02 12:22           ` Alan Cox
  0 siblings, 1 reply; 11+ messages in thread
From: Jeff V. Merkey @ 2001-08-02  1:58 UTC (permalink / raw)
  To: Alan Cox; +Cc: Scott Ransom, linux-kernel

On Thu, Aug 02, 2001 at 01:40:55AM +0100, Alan Cox wrote:
> > Try putting four adapters into a system all at once with 32 drives, and 
> > you will see all sorts of bugs.  I do not see problems with a single board,
> > other than gendisk reporting junk.  If it's the scsi layer, then the driver 
> > must not be calling the sd driver.  I will attempt to get on the phone with
> > Adam, and get these issues resolved.
> 
> The scsi disk layer does the entire gendisk stuff itself. It's actually
> very hard for a scsi driver to screw that up - the scsi driver has no
> real concept of a 'disk'. sd talks to a disk and the scsi driver just gets
> messages posted around.
> 
> Alan

Alan,

If you have an adapter, install it, dump the gendisk head, and take a 
look at what's happening.  I am seeing drives being reported with 0 
block lengths and other weirdness.    

Jeff




* Re: 3ware Escalade problems
  2001-08-02  0:26   ` Alan Cox
@ 2001-08-02  1:40     ` Jeff V. Merkey
  2001-08-02  0:40       ` Alan Cox
  0 siblings, 1 reply; 11+ messages in thread
From: Jeff V. Merkey @ 2001-08-02  1:40 UTC (permalink / raw)
  To: Alan Cox; +Cc: Scott Ransom, linux-kernel

On Thu, Aug 02, 2001 at 01:26:22AM +0100, Alan Cox wrote:
> > I am also using 8 way escalade adapters, and am seeing a host of problems.
> 
> I've seen no problems since the 1.02.00.005 driver. 
> 
> > The first and foremost is that the gendisk head in 2.4.X is not being 
> > initialized properly in the driver.  I have reported these problems to
> 
> The gendisk comes from the scsi midlayer so you want linux-scsi for that

Alan,

Try putting four adapters into a system all at once with 32 drives, and 
you will see all sorts of bugs.  I do not see problems with a single board,
other than gendisk reporting junk.  If it's the scsi layer, then the driver 
must not be calling the sd driver.  I will attempt to get on the phone with
Adam, and get these issues resolved.

Jeff




* Re: 3ware Escalade problems
  2001-08-02  1:40     ` Jeff V. Merkey
@ 2001-08-02  0:40       ` Alan Cox
  2001-08-02  1:58         ` Jeff V. Merkey
  0 siblings, 1 reply; 11+ messages in thread
From: Alan Cox @ 2001-08-02  0:40 UTC (permalink / raw)
  To: Jeff V. Merkey; +Cc: Alan Cox, Scott Ransom, linux-kernel

> Try putting four adapters into a system all at once with 32 drives, and 
> you will see all sorts of bugs.  I do not see problems with a single board,
> other than gendisk reporting junk.  If it's the scsi layer, then the driver 
> must not be calling the sd driver.  I will attempt to get on the phone with
> Adam, and get these issues resolved.

The scsi disk layer does the entire gendisk stuff itself. It's actually
very hard for a scsi driver to screw that up - the scsi driver has no
real concept of a 'disk'. sd talks to a disk and the scsi driver just gets
messages posted around.

Alan



* Re: 3ware Escalade problems
  2001-08-01 21:39 ` Jeff V. Merkey
@ 2001-08-02  0:26   ` Alan Cox
  2001-08-02  1:40     ` Jeff V. Merkey
  0 siblings, 1 reply; 11+ messages in thread
From: Alan Cox @ 2001-08-02  0:26 UTC (permalink / raw)
  To: Jeff V. Merkey; +Cc: Scott Ransom, linux-kernel

> I am also using 8 way escalade adapters, and am seeing a host of problems.

I've seen no problems since the 1.02.00.005 driver. 

> The first and foremost is that the gendisk head in 2.4.X is not being 
> initialized properly in the driver.  I have reported these problems to

The gendisk comes from the scsi midlayer so you want linux-scsi for that



* Re: 3ware Escalade problems
  2001-08-01 18:14 Scott Ransom
@ 2001-08-01 21:39 ` Jeff V. Merkey
  2001-08-02  0:26   ` Alan Cox
  0 siblings, 1 reply; 11+ messages in thread
From: Jeff V. Merkey @ 2001-08-01 21:39 UTC (permalink / raw)
  To: Scott Ransom; +Cc: linux-kernel

On Wed, Aug 01, 2001 at 02:14:44PM -0400, Scott Ransom wrote:


I am also using 8 way escalade adapters, and am seeing a host of problems.
The first and foremost is that the gendisk head in 2.4.X is not being 
initialized properly in the driver.  I have reported these problems to
3-ware, and they are attempting to get the engineer who owns the drivers
on the line with us.  The problems you are seeing are probably related 
to the same bugs.  This driver requires some rework to get it compliant
with 2.4.X.  At present, several programs fail with it since it is not 
setting up the gendisk head properly.  I do not know if your
problem is related, but this one will get added to the list when I speak 
with this person.

Jeff


> Hello,
> 
> After months of running a fileserver with an 8 port 3ware escalade card
> (kernels 2.4.[3457] using reiserfs and software RAID5) I started getting
> problems this weekend.
> 
> Over the last three days, when I try to access the drives, after a
> couple minutes I get a drive failure (I even heard a "yelp" from the
> drive during one of them...).  But the "failure" has happened to 3 of
> the 8 drives over 3 days -- so unless there is a hardware problem that
> is killing my drives I find it hard to believe that 3 drives really and
> truly failed....
> 
> Here is a sample from my syslog of a failure:
> 
> 3w-xxxx: tw_interrupt(): Bad response, status = 0xc7, flags = 0x51, unit
> = 0x1.
> 3w-xxxx: tw_scsi_eh_reset(): Reset succeeded for card 1.
> 3w-xxxx: tw_interrupt(): Bad response, status = 0xc7, flags = 0x51, unit
> = 0x1.
> scsi: device set offline - not ready or command retry failed after host
> reset: host 1 channel 0 id 1 lun 0
> SCSI disk error : host 1 channel 0 id 1 lun 0 return code = 80000
> I/O error: dev 08:11, sector 158441712
> 
> I've noticed several "issues" with the 3ware cards in the archives.  Has
> anyone seen something like this?
> 
> Scott
> 
> PS:  I'm currently running 2.4.7 with the lm-sensors/i2c patches.
> 
> -- 
> Scott M. Ransom                   Address:  Harvard-Smithsonian CfA
> Phone:  (617) 496-7908                      60 Garden St.  MS 10 
> email:  ransom@cfa.harvard.edu              Cambridge, MA  02138
> GPG Fingerprint: 06A9 9553 78BE 16DB 407B  FFCA 9BFA B6FF FFD3 2989


* Re: 3ware Escalade problems
       [not found] <53B208BD9A7FD311881A009027B6BBFB9EADC7@siamese>
@ 2001-08-01 18:51 ` Scott Ransom
  0 siblings, 0 replies; 11+ messages in thread
From: Scott Ransom @ 2001-08-01 18:51 UTC (permalink / raw)
  To: Adam Radford; +Cc: linux-kernel, Scott Ransom

Hi Adam,

The drives I am using are Maxtor 81.9G drives (model 98196H8).

I refuse to believe that 3 different disks could fail during the span of
3 days without _something_ causing it -- especially since things have
been working great since February or so.  And if I hadn't heard at least
one of the drives scream in agony, I wouldn't have believed that any of
them were really failing...  Is it possible that a bad drive could
affect other drives in some way?

Here is the first failure:

Jul 27 23:24:53 munin kernel: 3w-xxxx: tw_interrupt(): Bad response,
status = 0xc7, flags = 0x1b, unit = 0x3.
Jul 27 23:24:53 munin last message repeated 6 times
Jul 27 23:25:08 munin kernel: 3w-xxxx: tw_poll_status(): Flag 0x40000
not found.
Jul 27 23:25:08 munin kernel: 3w-xxxx: tw_aen_drain_queue(): No
attention interrupt for card 1
Jul 27 23:25:08 munin kernel: 3w-xxxx: tw_reset_sequence(): No attention
interrupt for card 1.
Jul 27 23:25:24 munin kernel: 3w-xxxx: tw_poll_status(): Flag 0x40000
not found.
Jul 27 23:25:24 munin kernel: 3w-xxxx: tw_aen_drain_queue(): No
attention interrupt for card 1
Jul 27 23:25:24 munin kernel: 3w-xxxx: tw_reset_sequence(): No attention
interrupt for card 1.
Jul 27 23:25:37 munin kernel: 3w-xxxx: tw_scsi_eh_reset(): Reset
succeeded for card 1.
Jul 27 23:25:47 munin kernel: 3w-xxxx: tw_interrupt(): Bad response,
status = 0xcf, flags = 0x0, unit = 0x3.
Jul 27 23:25:47 munin kernel: scsi: device set offline - not ready or
command retry failed after host reset: host 1 channel 0 id 3 lun 0
Jul 27 23:25:47 munin kernel: 3w-xxxx: tw_interrupt(): Bad response,
status = 0xcf, flags = 0x0, unit = 0x3.
Jul 27 23:25:47 munin kernel: scsi: device set offline - not ready or
command retry failed after host reset: host 1 channel 0 id 3 lun 0
Jul 27 23:25:47 munin kernel: 3w-xxxx: tw_interrupt(): Bad response,
status = 0xcf, flags = 0x0, unit = 0x3.
Jul 27 23:25:47 munin kernel: scsi: device set offline - not ready or
command retry failed after host reset: host 1 channel 0 id 3 lun 0
Jul 27 23:25:47 munin kernel: 3w-xxxx: tw_interrupt(): Bad response,
status = 0xcf, flags = 0x0, unit = 0x3.
Jul 27 23:25:47 munin kernel: scsi: device set offline - not ready or
command retry failed after host reset: host 1 channel 0 id 3 lun 0
Jul 27 23:25:47 munin kernel: 3w-xxxx: tw_interrupt(): Bad response,
status = 0xcf, flags = 0x0, unit = 0x3.
Jul 27 23:25:47 munin kernel: scsi: device set offline - not ready or
command retry failed after host reset: host 1 channel 0 id 3 lun 0
Jul 27 23:25:47 munin kernel: 3w-xxxx: tw_interrupt(): Bad response,
status = 0xcf, flags = 0x0, unit = 0x3.
Jul 27 23:25:47 munin kernel: scsi: device set offline - not ready or
command retry failed after host reset: host 1 channel 0 id 3 lun 0
Jul 27 23:25:47 munin kernel: 3w-xxxx: tw_interrupt(): Bad response,
status = 0xcf, flags = 0x0, unit = 0x3.
Jul 27 23:25:47 munin kernel: scsi: device set offline - not ready or
command retry failed after host reset: host 1 channel 0 id 3 lun 0

followed by a bunch of garbage looking like the following (don't know if
this came from the RAID code or the 3ware code or something else):

Jul 27 23:25:47 munin kernel:  : D:0 D:0 :0 D: D:0 D:0 D:0 D: D:0 D:0
D:0 T:1:00> .01c967 65WD C 0sdIS,3DID5)SK6,  K>: :0  0,<40:,S[d0)K<00
v   N:  N: N: N: N  N  N  N  DN:0  DN:   N: N:****: el<4> drrrc>
Jul 27 23:25:47 munin kernel: **MP****da1>ck0ea
Jul 27 23:25:47 munin kernel:      L5 S853 0: 1:6 2:1 3:  DISK<N:6> :6
:6> 6:  DI: 7:: 8: ::411:  DISK<N:0:412:4>
Jul 27 23:25:47 munin kernel: <13:414:415:4>
Jul 27 23:25:47 munin kernel:      16:417:4>
Jul 27 23:25:47 munin kernel:   1:4>
Jul 27 23:25:47 munin kernel: <20:421:42:423:42:4>25:26:4IS>
Jul 27 23:25:47 munin kernel: 7 :a
Jul 27 23:25:47 munin kernel: 6
Jul 27 23:25:47 munin kernel: <d

Then a different disk "failure" a couple days later...

Jul 31 19:21:16 munin kernel: 3w-xxxx: tw_interrupt(): Bad response,
status = 0xc7, flags = 0x51, unit = 0x4.
Jul 31 19:21:19 munin kernel: 3w-xxxx: tw_scsi_eh_reset(): Reset
succeeded for card 1.
Jul 31 19:21:33 munin kernel: 3w-xxxx: tw_interrupt(): Bad response,
status = 0xc7, flags = 0x51, unit = 0x4.
Jul 31 19:21:33 munin kernel: scsi: device set offline - not ready or
command retry failed after host reset: host 1 channel 0 id 4 lun 0
Jul 31 19:21:33 munin kernel: SCSI disk error : host 1 channel 0 id 4
lun 0 return code = 80000
Jul 31 19:21:33 munin kernel:  I/O error: dev 08:41, sector 2362112

And finally a third "failure" today... 

Aug  1 12:54:29 munin kernel: 3w-xxxx: tw_interrupt(): Bad response,
status = 0xc7, flags = 0x51, unit = 0x1.
Aug  1 12:54:32 munin kernel: 3w-xxxx: tw_scsi_eh_reset(): Reset
succeeded for card 1.
Aug  1 12:54:45 munin kernel: 3w-xxxx: tw_interrupt(): Bad response,
status = 0xc7, flags = 0x51, unit = 0x1.
Aug  1 12:54:45 munin kernel: scsi: device set offline - not ready or
command retry failed after host reset: host 1 channel 0 id 1 lun 0
Aug  1 12:54:45 munin kernel: SCSI disk error : host 1 channel 0 id 1
lun 0 return code = 80000
Aug  1 12:54:45 munin kernel:  I/O error: dev 08:11, sector 158441712
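
The "dev 08:41" and "dev 08:11" fields in the I/O error lines above can
be decoded by hand.  A minimal sketch in C, assuming the 2.4-era
convention that the kernel prints major:minor in hexadecimal and that
the sd driver on major 8 gives each disk 16 minors (minor 0 of each
group is the whole disk, 1 to 15 are partitions).  The struct and
function names are made up for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper type, not a kernel structure. */
struct sd_dev {
    unsigned major;      /* block major number */
    unsigned disk;       /* 0 = sda, 1 = sdb, ... */
    unsigned partition;  /* 0 = whole disk */
};

/*
 * Parse a "MM:mm" hexadecimal pair as printed in "I/O error: dev 08:11".
 * Returns 0 on success, -1 on a malformed string.
 */
static int decode_sd_dev(const char *s, struct sd_dev *out)
{
    unsigned major, minor;

    assert(s != 0 && out != 0);

    if (sscanf(s, "%x:%x", &major, &minor) != 2)
        return -1;

    out->major = major;
    out->disk = minor / 16;        /* assumes 16 minors per sd disk */
    out->partition = minor % 16;
    return 0;
}
```

Under those assumptions 08:11 is partition 1 of the second sd disk
(sdb1) and 08:41 is partition 1 of the fifth (sde1), which would line up
with ids 1 and 4 in the log lines if the card's units were enumerated in
order starting at sda.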


Scott


> Adam Radford wrote:
> 
> Scott,
> 
> Several of the 'problems' users are seeing are due to bad IBM 75 Gig drives
> that had contamination during the manufacturing process.  Lots of them have
> been recalled but some are still in use.  Unfortunately, these drives give
> lots of ECC errors.
> 
> The status=c7, flags=51, unit=0x1  means that the drive on unit 1 (which is
> port 1 since you are using software raid) is showing ECC errors during reads.
> 
> You didn't mention what kind of drives you have, but in either case, you
> need to replace that drive, IBM or not.
> 
> --
> Adam Radford
> Software Engineer
> 3ware, Inc.
> 
> -----Original Message-----
> From: Scott Ransom [mailto:ransom@cfa.harvard.edu]
> Sent: Wednesday, August 01, 2001 11:15 AM
> To: linux-kernel@vger.kernel.org; Scott Ransom
> Subject: 3ware Escalade problems
> 
> Hello,
> 
> After months of running a fileserver with an 8 port 3ware escalade card
> (kernels 2.4.[3457] using reiserfs and software RAID5) I started getting
> problems this weekend.
> 
> Over the last three days, when I try to access the drives, after a
> couple minutes I get a drive failure (I even heard a "yelp" from the
> drive during one of them...).  But the "failure" has happened to 3 of
> the 8 drives over 3 days -- so unless there is a hardware problem that
> is killing my drives I find it hard to believe that 3 drives really and
> truly failed....
> 
> Here is a sample from my syslog of a failure:
> 
> 3w-xxxx: tw_interrupt(): Bad response, status = 0xc7, flags = 0x51, unit
> = 0x1.
> 3w-xxxx: tw_scsi_eh_reset(): Reset succeeded for card 1.
> 3w-xxxx: tw_interrupt(): Bad response, status = 0xc7, flags = 0x51, unit
> = 0x1.
> scsi: device set offline - not ready or command retry failed after host
> reset: host 1 channel 0 id 1 lun 0
> SCSI disk error : host 1 channel 0 id 1 lun 0 return code = 80000
> I/O error: dev 08:11, sector 158441712
> 
> I've noticed several "issues" with the 3ware cards in the archives.  Has
> anyone seen something like this?
> 
> Scott
> 
> PS:  I'm currently running 2.4.7 with the lm-sensors/i2c patches.
> 
> --
> Scott M. Ransom                   Address:  Harvard-Smithsonian CfA
> Phone:  (617) 496-7908                      60 Garden St.  MS 10
> email:  ransom@cfa.harvard.edu              Cambridge, MA  02138
> GPG Fingerprint: 06A9 9553 78BE 16DB 407B  FFCA 9BFA B6FF FFD3 2989

-- 
Scott M. Ransom                   Address:  Harvard-Smithsonian CfA
Phone:  (617) 496-7908                      60 Garden St.  MS 10 
email:  ransom@cfa.harvard.edu              Cambridge, MA  02138
GPG Fingerprint: 06A9 9553 78BE 16DB 407B  FFCA 9BFA B6FF FFD3 2989


* 3ware Escalade problems
@ 2001-08-01 18:14 Scott Ransom
  2001-08-01 21:39 ` Jeff V. Merkey
  0 siblings, 1 reply; 11+ messages in thread
From: Scott Ransom @ 2001-08-01 18:14 UTC (permalink / raw)
  To: linux-kernel, Scott Ransom

Hello,

After months of running a fileserver with an 8 port 3ware escalade card
(kernels 2.4.[3457] using reiserfs and software RAID5) I started getting
problems this weekend.

Over the last three days, when I try to access the drives, after a
couple minutes I get a drive failure (I even heard a "yelp" from the
drive during one of them...).  But the "failure" has happened to 3 of
the 8 drives over 3 days -- so unless there is a hardware problem that
is killing my drives I find it hard to believe that 3 drives really and
truly failed....

Here is a sample from my syslog of a failure:

3w-xxxx: tw_interrupt(): Bad response, status = 0xc7, flags = 0x51, unit
= 0x1.
3w-xxxx: tw_scsi_eh_reset(): Reset succeeded for card 1.
3w-xxxx: tw_interrupt(): Bad response, status = 0xc7, flags = 0x51, unit
= 0x1.
scsi: device set offline - not ready or command retry failed after host
reset: host 1 channel 0 id 1 lun 0
SCSI disk error : host 1 channel 0 id 1 lun 0 return code = 80000
I/O error: dev 08:11, sector 158441712

I've noticed several "issues" with the 3ware cards in the archives.  Has
anyone seen something like this?

Scott

PS:  I'm currently running 2.4.7 with the lm-sensors/i2c patches.

-- 
Scott M. Ransom                   Address:  Harvard-Smithsonian CfA
Phone:  (617) 496-7908                      60 Garden St.  MS 10 
email:  ransom@cfa.harvard.edu              Cambridge, MA  02138
GPG Fingerprint: 06A9 9553 78BE 16DB 407B  FFCA 9BFA B6FF FFD3 2989


end of thread, other threads:[~2001-08-02 20:59 UTC | newest]

Thread overview: 11+ messages
     [not found] <53B208BD9A7FD311881A009027B6BBFB9EADCC@siamese>
2001-08-02  1:38 ` 3ware Escalade problems Jeff V. Merkey
     [not found] <53B208BD9A7FD311881A009027B6BBFB9EADC7@siamese>
2001-08-01 18:51 ` Scott Ransom
2001-08-01 18:14 Scott Ransom
2001-08-01 21:39 ` Jeff V. Merkey
2001-08-02  0:26   ` Alan Cox
2001-08-02  1:40     ` Jeff V. Merkey
2001-08-02  0:40       ` Alan Cox
2001-08-02  1:58         ` Jeff V. Merkey
2001-08-02 12:22           ` Alan Cox
2001-08-02 21:26             ` Jeff V. Merkey
2001-08-02 22:02               ` Jeff V. Merkey
