linux-kernel.vger.kernel.org archive mirror
* IDE wierdness
@ 2003-08-20 15:09 Larry McVoy
  2003-08-20 15:40 ` Jeff Garzik
  2003-08-20 15:40 ` Alan Cox
  0 siblings, 2 replies; 5+ messages in thread
From: Larry McVoy @ 2003-08-20 15:09 UTC (permalink / raw)
  To: Linux Kernel Mailing List

The primary drive in our file server started to flake out on us.  It was
caught by the integrity checker we use as part of our backups: files that
hadn't been modified in a couple of years started having different CRCs.
I pulled the data off and stuck in a new drive.
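
The thread does not say which integrity checker was used, so what follows
is only a minimal sketch of the general idea, not the poster's actual tool.
It records a CRC-32 per file and flags files whose size and modification
time are unchanged but whose checksum differs between two runs. The names
crc32_of_file, snapshot and silent_changes are hypothetical, chosen for
illustration.

```python
import os
import zlib


def crc32_of_file(path, chunk_size=1 << 20):
    """Return the CRC-32 of a file's contents, reading it in chunks."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def snapshot(root):
    """Map every regular file under root to (mtime_ns, size, crc32)."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            result[path] = (st.st_mtime_ns, st.st_size, crc32_of_file(path))
    return result


def silent_changes(old, new):
    """Return paths whose mtime and size match but whose CRC-32 differs.

    A file that nobody rewrote should keep its checksum, so a mismatch
    here points at the storage path (drive, cable, controller, RAM)
    rather than at a legitimate edit.
    """
    suspects = []
    for path, (mtime, size, crc) in old.items():
        if path in new:
            new_mtime, new_size, new_crc = new[path]
            if (new_mtime, new_size) == (mtime, size) and new_crc != crc:
                suspects.append(path)
    return sorted(suspects)
```

In use, one would store the result of snapshot() at backup time and compare
it with a fresh snapshot() on the next run. Note that a re-read served from
the page cache will not exercise the disk, so such a check is only
meaningful across reboots or on cold data.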

I wanted to see if the old drive could be salvaged and used as a test box
drive.  The drive seems to be degenerating fast.  When I put that drive
in a 3ware card the 3ware card only sees 1/3 of the drives.  Strange.
When I put all 3 drives in a promise card, it sees them, but if I try to
copy data from the bad drive to any other drive the system locks up hard,
no console, no pings, no response to the reset switch, it takes a power
cycle to get things back.

I verified that behaviour on two different systems so it isn't the box.
I also cycled through 3 different 3ware cards to make sure that wasn't
the problem (isn't sys admin fun?).

It's clear to me that I don't want to use this drive but I'm wondering if
there is any interest in debugging the lock up.  I've only done it on
2.4.18 as shipped by redhat but I could try 2.6 or whatever you like.

If the consensus is that it is OK that bad hardware locks you up then I'll
toss the drive and move on.
-- 
---
Larry McVoy              lm at bitmover.com          http://www.bitmover.com/lm


* Re: IDE wierdness
  2003-08-20 15:09 IDE wierdness Larry McVoy
@ 2003-08-20 15:40 ` Jeff Garzik
  2003-08-20 15:40 ` Alan Cox
  1 sibling, 0 replies; 5+ messages in thread
From: Jeff Garzik @ 2003-08-20 15:40 UTC (permalink / raw)
  To: Larry McVoy, Linux Kernel Mailing List

On Wed, Aug 20, 2003 at 08:09:03AM -0700, Larry McVoy wrote:
> If the consensus is that it is OK that bad hardware locks you up then I'll
> toss the drive and move on.

Don't toss bad drives, send them to weirdos like me:  I can use them
for testing and debugging error handling paths...

	Jeff





* Re: IDE wierdness
  2003-08-20 15:09 IDE wierdness Larry McVoy
  2003-08-20 15:40 ` Jeff Garzik
@ 2003-08-20 15:40 ` Alan Cox
  2003-08-29 16:15   ` Geert Uytterhoeven
  1 sibling, 1 reply; 5+ messages in thread
From: Alan Cox @ 2003-08-20 15:40 UTC (permalink / raw)
  To: Larry McVoy; +Cc: Linux Kernel Mailing List

On Wed, 2003-08-20 at 16:09, Larry McVoy wrote:
> 
> It's clear to me that I don't want to use this drive but I'm wondering if
> there is any interest in debugging the lock up.  I've only done it on
> 2.4.18 as shipped by redhat but I could try 2.6 or whatever you like.
> 
> If the consensus is that it is OK that bad hardware locks you up then I'll
> toss the drive and move on.

Some PIO transfers are regulated by the drive and the drive can lock the
bus forever. Newer chipsets like the SI680/3112 support watchdog
deadlock breakers for this but we don't really support them right now.

Getting different data off a failing drive is unusual because the blocks
are ECC'd extensively (well more than ECC'd) and have checks, could be
the RAM/CPU going I guess.




* Re: IDE wierdness
  2003-08-20 15:40 ` Alan Cox
@ 2003-08-29 16:15   ` Geert Uytterhoeven
  0 siblings, 0 replies; 5+ messages in thread
From: Geert Uytterhoeven @ 2003-08-29 16:15 UTC (permalink / raw)
  To: Alan Cox; +Cc: Larry McVoy, Linux Kernel Mailing List

On 20 Aug 2003, Alan Cox wrote:
> On Wed, 2003-08-20 at 16:09, Larry McVoy wrote:
> > It's clear to me that I don't want to use this drive but I'm wondering if
> > there is any interest in debugging the lock up.  I've only done it on
> > 2.4.18 as shipped by redhat but I could try 2.6 or whatever you like.
> > 
> > If the consensus is that it is OK that bad hardware locks you up then I'll
> > toss the drive and move on.
> 
> Some PIO transfers are regulated by the drive and the drive can lock the
> bus forever. Newer chipsets like the SI680/3112 support watchdog
> deadlock breakers for this but we don't really support them right now.
> 
> Getting different data off a failing drive is unusual because the blocks
> are ECC'd extensively (well more than ECC'd) and have checks, could be
> the RAM/CPU going I guess.

Although it can happen. I used to see corrupted data in /etc/motd (which is
rewritten on each boot up) and random SEGVs on an embedded box. A few weeks
later the drive started to report real errors. After mapping out the bad blocks
using e2fsck -c, and replacing the files that were affected, the problem
disappeared.

Looks like ECC is not always ECC...

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds



* Re: IDE wierdness
@ 2003-08-20 17:03 John Bradford
  0 siblings, 0 replies; 5+ messages in thread
From: John Bradford @ 2003-08-20 17:03 UTC (permalink / raw)
  To: linux-kernel, lm

> It's clear to me that I don't want to use this drive but I'm wondering if
> there is any interest in debugging the lock up.  I've only done it on
> 2.4.18 as shipped by redhat but I could try 2.6 or whatever you like.

Out of interest, what does the S.M.A.R.T. data from the drive look like?

John.

