linux-kernel.vger.kernel.org archive mirror
* RE:Re: ide drive dying?
@ 2002-09-06 20:40 Hell.Surfers
  2002-09-06 21:01 ` Alan Cox
  2002-09-06 21:07 ` DevilKin
  0 siblings, 2 replies; 68+ messages in thread
From: Hell.Surfers @ 2002-09-06 20:40 UTC (permalink / raw)
  To: alan, degger, linux-kernel


Is a drive you can't rely on worth having?



On 	06 Sep 2002 21:31:25 +0100 	Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:

[-- Attachment #2: Type: message/rfc822, Size: 2672 bytes --]

From: Alan Cox <alan@lxorguk.ukuu.org.uk>
To: Daniel Egger <degger@fhm.edu>
Cc: linux-kernel@vger.kernel.org
Subject: Re: ide drive dying?
Date: 06 Sep 2002 21:31:25 +0100
Message-ID: <1031344285.9861.81.camel@irongate.swansea.linux.org.uk>

On Fri, 2002-09-06 at 18:33, Daniel Egger wrote:
> On Fri, 2002-09-06 at 17:38, Alan Cox wrote:
> 
> > Get the IBM disk tools, upgrade the firmware and see what the ibm tools
> > have to say. IBM drives have had some problems with spontaneous bad
> > blocks appearing that go away with new firmware and a run of the disk
> > tools.
> 
> The "run of the disk tools" that does away with the bad blocks is a
> low-level format; a tedious way to spend one's time on a hard drive
> that will die soon anyway.

For the IBMs it depends what the problem is. Spontaneous bad blocks
appearing during power off appear to be fixed by the firmware update.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: RE:Re: ide drive dying?
  2002-09-06 20:40 RE:Re: ide drive dying? Hell.Surfers
@ 2002-09-06 21:01 ` Alan Cox
  2002-09-06 21:45   ` jbradford
                     ` (2 more replies)
  2002-09-06 21:07 ` DevilKin
  1 sibling, 3 replies; 68+ messages in thread
From: Alan Cox @ 2002-09-06 21:01 UTC (permalink / raw)
  To: Hell.Surfers; +Cc: degger, linux-kernel

On Fri, 2002-09-06 at 21:40, Hell.Surfers@cwctv.net wrote:
> Is a drive you can't rely on worth having?

That's up to the owner. There are lots of uses for such drives - /tmp,
swap, in a raid array, etc

Mind you I collect drives that have nice properties like "hangs the
entire scsi bus when inserted into an SCA connector" for testing with



* Re: Re: ide drive dying?
  2002-09-06 20:40 RE:Re: ide drive dying? Hell.Surfers
  2002-09-06 21:01 ` Alan Cox
@ 2002-09-06 21:07 ` DevilKin
  1 sibling, 0 replies; 68+ messages in thread
From: DevilKin @ 2002-09-06 21:07 UTC (permalink / raw)
  To: Hell.Surfers, linux-kernel

On Friday 06 September 2002 22:40, Hell.Surfers@cwctv.net wrote:
> Is a drive you can't rely on worth having?

Very good question... 

The DFT has finished its work, and tells me no more bad sectors are
present... for how long?

To the swap gurus: what does Linux do if it attempts to write to swap, and
gets an error code returned from the IDE layer?

DK
-- 
The streets are safe in Philadelphia, it's only the people who make
them unsafe.
		-- Mayor Frank Rizzo



* Re: RE:Re: ide drive dying?
  2002-09-06 21:01 ` Alan Cox
@ 2002-09-06 21:45   ` jbradford
  2002-09-07  1:02   ` Jason L Tibbitts III
  2002-09-07 12:03   ` Daniel Egger
  2 siblings, 0 replies; 68+ messages in thread
From: jbradford @ 2002-09-06 21:45 UTC (permalink / raw)
  To: linux-kernel

> > Is a drive you can't rely on worth having?
> 
> That's up to the owner. There are lots of uses for such drives - /tmp,
> swap, in a raid array, etc

..primary Windows partition :-)


* Re: ide drive dying?
  2002-09-06 21:01 ` Alan Cox
  2002-09-06 21:45   ` jbradford
@ 2002-09-07  1:02   ` Jason L Tibbitts III
  2002-09-07 12:03   ` Daniel Egger
  2 siblings, 0 replies; 68+ messages in thread
From: Jason L Tibbitts III @ 2002-09-07  1:02 UTC (permalink / raw)
  To: linux-kernel

>>>>> "AC" == Alan Cox <alan@lxorguk.ukuu.org.uk> writes:

AC> That's up to the owner. There are lots of uses for such drives -
AC> /tmp, swap, in a raid array, etc

Be careful of these even in a RAID array; they will go bad silently.
I had one array (software RAID5, 8 75GXP drives on a 3w6800 in JBOD
mode, one hot spare) that was going fine until one drive died hard,
wouldn't spin up, etc.  I replaced it, but during the RAID resync
three other drives were found to have errors.  The array was trash,
but luckily all drives were dead just at the tail end, so I could copy
the data out during the RAID resync.  Some of the failed drives had
the updated firmware.

3ware has background integrity scans now; I don't know if software
RAID has any equivalent besides an occasional 'dd', but even that's a
good idea.
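
A periodic read pass of the kind suggested above can be sketched in a few lines of Python. This is only an illustration, not what 3ware or the software RAID driver does: the name scan_device and the 1 MiB chunk size are made up for the example. It reads a device (or any file) from start to end and records the offsets at which a read fails, instead of stopping at the first error.

```python
import os


def scan_device(path, chunk_size=1024 * 1024):
    """Read `path` from start to end.

    Returns (bytes_scanned, bad_offsets). `bad_offsets` lists the starting
    offset of every chunk whose read raised an I/O error.
    Hypothetical helper for illustration only.
    """
    bad_offsets = []
    offset = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size of a regular file or, on Linux,
        # of a block device.
        size = os.lseek(fd, 0, os.SEEK_END)
        while offset < size:
            want = min(chunk_size, size - offset)
            try:
                data = os.pread(fd, want, offset)
            except OSError:
                # Unreadable chunk: note it and carry on with the next one.
                bad_offsets.append(offset)
                offset += want
                continue
            if not data:
                break  # unexpected end of data
            offset += len(data)
    finally:
        os.close(fd)
    return offset, bad_offsets
```

Run from cron against each member disk (for example a hypothetical /dev/hda, as root), a non-empty bad_offsets list would be the cue to swap the drive before a resync finds the problem for you.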

 - J<


* Re: RE:Re: ide drive dying?
  2002-09-06 21:01 ` Alan Cox
  2002-09-06 21:45   ` jbradford
  2002-09-07  1:02   ` Jason L Tibbitts III
@ 2002-09-07 12:03   ` Daniel Egger
  2 siblings, 0 replies; 68+ messages in thread
From: Daniel Egger @ 2002-09-07 12:03 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel


On Fri, 2002-09-06 at 23:01, Alan Cox wrote:

> That's up to the owner. There are lots of uses for such drives - /tmp,
> swap, in a raid array, etc

Having two such notoriously broken drives in a RAID array is also
not an option in many cases. Mirroring is meant to increase data
security in case a drive fails spontaneously; using particularly bad
drives for that purpose defeats the point.

> Mind you I collect drives that have nice properties like "hangs the
> entire scsi bus when inserted into an SCA connector" for testing with

You probably should keep a DeathStar as the worst drive ever made.
Heck, if my latest replacement drive from IBM ("serviceable used part")
starts failing again I might as well ship it to you instead of IBM.
 
-- 
Servus,
       Daniel



* Re: ide drive dying?
  2002-09-10 12:48   ` Ookhoi
  2002-09-10 13:59     ` Mike Dresser
@ 2002-09-10 20:21     ` Andre Hedrick
  1 sibling, 0 replies; 68+ messages in thread
From: Andre Hedrick @ 2002-09-10 20:21 UTC (permalink / raw)
  To: Ookhoi; +Cc: jbradford, Dieter Nützel, linux-kernel


MaxLine II is Serial ATA coming in at 250GB per disk.


On Tue, 10 Sep 2002, Ookhoi wrote:

> jbradford@dial.pipex.com wrote (ao):
> > I have *never* lost data to a Maxtor disk.  I have had IBM, Fujitsu,
> > Western Digital, and DEC drives all fail on me before.
> > 
> > It's disappointing that Maxtor are reducing their warranty from 3
> > years to 1 year, but on the other hand, I've never needed it at all.
> 
> FWIW:
> On http://www.maxtor.com/products/enterprise_apps/default.htm
> they say 3 years limited warranty. 

Andre Hedrick
LAD Storage Consulting Group



* Re: ide drive dying?
  2002-09-10 15:21         ` Mike Dresser
@ 2002-09-10 15:29           ` Larry McVoy
  0 siblings, 0 replies; 68+ messages in thread
From: Larry McVoy @ 2002-09-10 15:29 UTC (permalink / raw)
  To: Mike Dresser
  Cc: jbradford, ookhoi, Dieter.Nuetzel, linux-kernel, erin_hartin, sales-mkt

On Tue, Sep 10, 2002 at 11:21:24AM -0400, Mike Dresser wrote:
> On Tue, 10 Sep 2002 jbradford@dial.pipex.com wrote:
> 
> > According to this announcement:
> >
> > http://www.shareholder.com/maxtor/news/20020909-89588.cfm
> >
> > some of their new ATA drives will carry a three-year warranty.
> >
> > John.
> 
> Right.  Only the MaxLine II.  The rest, not including SCSI, are 1 year.
> 
> It looks to me that Maxtor is exiting the consumer market.  They sell
> their crippled DiamondMax 9/16's, and the existing product lines, to the
> OEM's who don't care about warranty as much.

Well can you blame them?  Drive prices are coming down faster than processor
prices and it costs a lot more to produce a drive than a processor (production
costs, not development costs).  Drives have parts.  The head assembly isn't
free.  It's unbelievable that we can get drives for $1/GB, at least it is
to me.  And if any of us think we're getting reliable drives at this price,
a visit from the tooth fairy can't be far behind.

What we do here is mark the date we put a drive into production on the drive
then cycle the drive out of production use in 24 months.  We have lots of
build machines so the "old" drives go into those.  We also put in 4 drives
for any data we care about (on a 3ware escalade in JBOD mode) and then
mirror the data nightly to /nightly, /weekly, or /monthly.  If I'm really
being paranoid, I mix manufacturers and release dates in the set of 4 drives
so I drop the likelihood of them all failing at once.
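
The 24-month rotation rule is easy to mechanize if the in-service date written on each drive is also kept in a list. The following is a minimal sketch under that assumption; the function names retirement_date and due_for_rotation are invented for the example and are not part of any tool mentioned in this thread.

```python
import calendar
from datetime import date


def retirement_date(in_service, months=24):
    """Date on which a drive put into production on `in_service` should be
    cycled out, `months` calendar months later (24 per the policy above)."""
    total = in_service.year * 12 + (in_service.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    # Clamp the day so that e.g. Feb 29 maps onto Feb 28 in a non-leap year.
    day = min(in_service.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def due_for_rotation(in_service, today, months=24):
    """True once the drive has reached its retirement date."""
    return today >= retirement_date(in_service, months)


if __name__ == "__main__":
    # A drive put into production on the day of this message.
    print(retirement_date(date(2002, 9, 10)))
```

Drives flagged by due_for_rotation would then move to the build machines rather than be thrown away.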

Don't get me wrong, there is no love lost between BitMover and Maxtor, 
they aren't a customer and we've had our own problems dealing with them
in the past.  However, it seems unfair to get too unhappy with a product
that works as well as it does for the price that you pay.  I'd hate to
be in the drive business, it looks like a losing proposition to me.
-- 
---
Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 


* Re: ide drive dying?
  2002-09-10 14:56       ` jbradford
@ 2002-09-10 15:21         ` Mike Dresser
  2002-09-10 15:29           ` Larry McVoy
  0 siblings, 1 reply; 68+ messages in thread
From: Mike Dresser @ 2002-09-10 15:21 UTC (permalink / raw)
  To: jbradford; +Cc: ookhoi, Dieter.Nuetzel, linux-kernel, erin_hartin, sales-mkt

On Tue, 10 Sep 2002 jbradford@dial.pipex.com wrote:

> According to this announcement:
>
> http://www.shareholder.com/maxtor/news/20020909-89588.cfm
>
> some of their new ATA drives will carry a three-year warranty.
>
> John.

Right.  Only the MaxLine II.  The rest, not including SCSI, are 1 year.

It looks to me that Maxtor is exiting the consumer market.  They sell
their crippled DiamondMax 9/16's, and the existing product lines, to the
OEM's who don't care about warranty as much.

They continue to sell to NAS/SAN manufacturers (even though they got out of
the business themselves), and to people building large servers, with their
MaxLine II.

Mike



* Re: ide drive dying?
  2002-09-10 13:59     ` Mike Dresser
@ 2002-09-10 14:56       ` jbradford
  2002-09-10 15:21         ` Mike Dresser
  0 siblings, 1 reply; 68+ messages in thread
From: jbradford @ 2002-09-10 14:56 UTC (permalink / raw)
  To: Mike Dresser; +Cc: ookhoi, Dieter.Nuetzel, linux-kernel, erin_hartin, sales-mkt

> 
> On Tue, 10 Sep 2002, Ookhoi wrote:
> 
> > jbradford@dial.pipex.com wrote (ao):
> > > I have *never* lost data to a Maxtor disk.  I have had IBM, Fujitsu,
> > > Western Digital, and DEC drives all fail on me before.
> > >
> > > It's disappointing that Maxtor are reducing their warranty from 3
> > > years to 1 year, but on the other hand, I've never needed it at all.
> >
> > FWIW:
> > On http://www.maxtor.com/products/enterprise_apps/default.htm
> > they say 3 years limited warranty.
> 
> That's only their MaxLine II drives.  Their regular DiamondMax and all
> that, are still one year starting in October.  At least their SCSI drives
> haven't been killed off yet.
> 
> OffTopic:
> 
> I wonder just who in upper Management at Maxtor decided to help the
> company commit suicide.
> 
> I've already ordered a few Seagate drives to test out here at our
> offices, to replace my previous choice of Maxtor D740X's.  I'll still be
> looking at the MaxLine II's for backup servers because of the 3 year
> warranty, but for desktops, I can't risk our data to drives that even the
> manufacturer doesn't trust.  The performance drop becomes secondary at
> that point.
> 
> I know of a few local shops that will no longer carry Maxtor drives
> because the warranty costs would kill their profit margin.  They cannot
> offer a 3 year warranty on the computer when the drive is only covered
> for a year.
> 
> Mike Dresser,
> 
> Systems Administrator
> Windsor Machine & Stamping
> 

According to this announcement:

http://www.shareholder.com/maxtor/news/20020909-89588.cfm

some of their new ATA drives will carry a three-year warranty.

John.


* Re: ide drive dying?
  2002-09-10 12:48   ` Ookhoi
@ 2002-09-10 13:59     ` Mike Dresser
  2002-09-10 14:56       ` jbradford
  2002-09-10 20:21     ` Andre Hedrick
  1 sibling, 1 reply; 68+ messages in thread
From: Mike Dresser @ 2002-09-10 13:59 UTC (permalink / raw)
  To: Ookhoi; +Cc: jbradford, Dieter Nützel, linux-kernel, erin_hartin, sales-mkt

On Tue, 10 Sep 2002, Ookhoi wrote:

> jbradford@dial.pipex.com wrote (ao):
> > I have *never* lost data to a Maxtor disk.  I have had IBM, Fujitsu,
> > Western Digital, and DEC drives all fail on me before.
> >
> > It's disappointing that Maxtor are reducing their warranty from 3
> > years to 1 year, but on the other hand, I've never needed it at all.
>
> FWIW:
> On http://www.maxtor.com/products/enterprise_apps/default.htm
> they say 3 years limited warranty.

That's only their MaxLine II drives.  Their regular DiamondMax and all
that, are still one year starting in October.  At least their SCSI drives
haven't been killed off yet.

OffTopic:

I wonder just who in upper Management at Maxtor decided to help the
company commit suicide.

I've already ordered a few Seagate drives to test out here at our
offices, to replace my previous choice of Maxtor D740X's.  I'll still be
looking at the MaxLine II's for backup servers because of the 3 year
warranty, but for desktops, I can't risk our data to drives that even the
manufacturer doesn't trust.  The performance drop becomes secondary at
that point.

I know of a few local shops that will no longer carry Maxtor drives
because the warranty costs would kill their profit margin.  They cannot
offer a 3 year warranty on the computer when the drive is only covered
for a year.

Mike Dresser,

Systems Administrator
Windsor Machine & Stamping



* Re: ide drive dying?
  2002-09-08 17:42 ` jbradford
                     ` (2 preceding siblings ...)
  2002-09-09 18:17   ` Thunder from the hill
@ 2002-09-10 12:48   ` Ookhoi
  2002-09-10 13:59     ` Mike Dresser
  2002-09-10 20:21     ` Andre Hedrick
  3 siblings, 2 replies; 68+ messages in thread
From: Ookhoi @ 2002-09-10 12:48 UTC (permalink / raw)
  To: jbradford; +Cc: Dieter Nützel, linux-kernel

jbradford@dial.pipex.com wrote (ao):
> I have *never* lost data to a Maxtor disk.  I have had IBM, Fujitsu,
> Western Digital, and DEC drives all fail on me before.
> 
> It's disappointing that Maxtor are reducing their warranty from 3
> years to 1 year, but on the other hand, I've never needed it at all.

FWIW:
On http://www.maxtor.com/products/enterprise_apps/default.htm
they say 3 years limited warranty. 


* Re: ide drive dying?
  2002-09-08 14:14                           ` jbradford
@ 2002-09-09 21:59                             ` Alan Cox
  0 siblings, 0 replies; 68+ messages in thread
From: Alan Cox @ 2002-09-09 21:59 UTC (permalink / raw)
  To: jbradford; +Cc: linux-kernel

If I knew how to check that I would. We wanted to do the same for the
ancient WD but there was no way to tell



* Re: ide drive dying?
  2002-09-09 18:17   ` Thunder from the hill
@ 2002-09-09 18:55     ` Mike Dresser
  0 siblings, 0 replies; 68+ messages in thread
From: Mike Dresser @ 2002-09-09 18:55 UTC (permalink / raw)
  To: Thunder from the hill; +Cc: jbradford, Dieter Nützel, linux-kernel

On Mon, 9 Sep 2002, Thunder from the hill wrote:

> Hi,
>
> On Sun, 8 Sep 2002 jbradford@dial.pipex.com wrote:
> > I have *never* lost data to a Maxtor disk.  I have had IBM, Fujitsu,
> > Western Digital, and DEC drives all fail on me before.
>
> I can't confirm that. Yes, IBM failed, Fujitsu is often IBM, DEC isn't any
> better either. But Western... I'm still having some quite old Western
> drives, aged several years, a lot more than they guaranteed. They still

WDC AC21600H.

Best damn drive ever made by any company.

I've got maybe 40 of these left in the systems here.  They're coming up on
7-8 years old.

Sure, they're dog slow.  Sure, they're pretty small(1.6 gig)

But they're rock stable and solid.  I use them for boot drives for old
servers, and for the old Windows PC's

Mike



* Re: ide drive dying?
  2002-09-08 17:42 ` jbradford
  2002-09-09  1:31   ` Nuitari
  2002-09-09 12:26   ` Joachim Breuer
@ 2002-09-09 18:17   ` Thunder from the hill
  2002-09-09 18:55     ` Mike Dresser
  2002-09-10 12:48   ` Ookhoi
  3 siblings, 1 reply; 68+ messages in thread
From: Thunder from the hill @ 2002-09-09 18:17 UTC (permalink / raw)
  To: jbradford; +Cc: Dieter Nützel, linux-kernel

Hi,

On Sun, 8 Sep 2002 jbradford@dial.pipex.com wrote:
> I have *never* lost data to a Maxtor disk.  I have had IBM, Fujitsu,
> Western Digital, and DEC drives all fail on me before.

I can't confirm that. Yes, IBM failed, Fujitsu is often IBM, DEC isn't any
better either. But Western... I still have some quite old Western
drives, several years old, far older than they were guaranteed for. They still
run our old database, and are used in some workstations. Some of them
have hit the floor more than once, and are still running like mad. No
end in sight. Then there are these ST-157As, stable as rocks, still
running here. You can even crash them at up to 60 g, if not more!!! That
is, they can stand falling down very well.

Meanwhile, there were two or three broken Maxtor disks. Their spindles
broke after two years, so the data was physically moved upwards. We
returned them and got replacement disks, no problem.

			Thunder
-- 
--./../...-/. -.--/---/..-/.-./..././.-../..-. .---/..-/.../- .-
--/../-./..-/-/./--..-- ../.----./.-../.-.. --./../...-/. -.--/---/..-
.- -/---/--/---/.-./.-./---/.--/.-.-.-
--./.-/-.../.-./.././.-../.-.-.-



* Re: ide drive dying?
  2002-09-08 17:42 ` jbradford
  2002-09-09  1:31   ` Nuitari
@ 2002-09-09 12:26   ` Joachim Breuer
  2002-09-09 18:17   ` Thunder from the hill
  2002-09-10 12:48   ` Ookhoi
  3 siblings, 0 replies; 68+ messages in thread
From: Joachim Breuer @ 2002-09-09 12:26 UTC (permalink / raw)
  To: Dieter Nützel; +Cc: linux-kernel

Just for the sake of the argument...

jbradford@dial.pipex.com writes:

>> BTW 
>> I had a double disk crash (same symptoms as in this thread) in a
>> school's RAID5 with four Fujitsu MPG3204AT-EF (the ones with
>> gel-lager, silent and reliable we hoped) last week...  The shop for
>> which I work from time to time got 71 disks of this type back (sold
>> over the last 1.5 years). We switched to them after the "IBM"
>> disaster. Maybe a "misdecision" ;-) What shall we sell safely,
>> now...?  MAXTOR?
>
> I have *never* lost data to a Maxtor disk.  I have had IBM, Fujitsu,
> Western Digital, and DEC drives all fail on me before.
>
> It's disappointing that Maxtor are reducing their warranty from 3
> years to 1 year, but on the other hand, I've never needed it at all.

And with good reason, it seems (the warranty reduction) - my Maxtor
6L060J3 (or whatever, the 7200rpm 60G ATA-100) died after approx. 8
weeks (bad sectors); warranty replacement; replacement dies after
approx. 16 weeks (bad sectors); I'm now on the 2nd replacement. Oh
joy.

I have to say that I have a few more (~ 5) Maxtor drives running which
didn't cause any trouble... so far.

Yes, I did switch to Maxtor because of excessive outages of IBM drives
(DeathStar), why do you ask?


So long,
   Joe

-- 
"I use emacs, which might be thought of as a thermonuclear
 word processor."
-- Neal Stephenson, "In the beginning... was the command line"


* Re: ide drive dying?
  2002-09-08 20:11       ` jbradford
@ 2002-09-09  2:37         ` Nuitari
  0 siblings, 0 replies; 68+ messages in thread
From: Nuitari @ 2002-09-09  2:37 UTC (permalink / raw)
  To: linux-kernel

On Sun, 8 Sep 2002 jbradford@dial.pipex.com wrote:

> > Keep your drive cool and you can expect to keep it around for a very 
> > long time.
> 
> In a large tower case, it is worth while to leave a drive bay free between disks, instead of using them sequentially, like this:
> 
>     ------             ------
>     |****|             |****|
>     ------             ------
>     |    |             |****|
>     ------             ------
>     |****|             |****|
>     ------  instead of ------
>     |    |             |    |
>     ------             ------
>     |****|             |    |
>     ------             ------
>     |    |             |    |
>     ------             ------
> 
> Also, if you get a disk that suddenly doesn't spin up, don't assume that
> the motor has died - you can sometimes bring them back to life by
> connecting power to them, and giving them a very sharp angular jolt in
> the plane of the platters - the effect is called static friction,
> (A.K.A. stiction)


Depends on the ventilation you can provide.
In this computer I have all bays (5 1/4 and 3 1/2) full and the
temperature doesn't get over 30C.
There is a total of 4 HDs and 3 CD drives.

I think the spacing depends on the heat output of the drives.
I put my 2 Samsungs (they are always cold) between the Maxtor (makes a
lot of heat) and the Fujitsu.


Of course the best is to strap 6" fans in front of them (which I did to
cool a stack of 6 full-height disks that I put in an old full tower
case).



* Re: ide drive dying?
  2002-09-08 17:42 ` jbradford
@ 2002-09-09  1:31   ` Nuitari
  2002-09-08 19:27     ` Ed Sweetman
  2002-09-09 12:26   ` Joachim Breuer
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 68+ messages in thread
From: Nuitari @ 2002-09-09  1:31 UTC (permalink / raw)
  To: linux-kernel

On Sun, 8 Sep 2002 jbradford@dial.pipex.com wrote:
> I have *never* lost data to a Maxtor disk.  I have had IBM, Fujitsu, Western Digital, and DEC drives all fail on me before.
> 
> It's disappointing that Maxtor are reducing their warranty from 3 years to 1 year, but on the other hand, I've never needed it at all.

The problem is that you will eventually lose data. No matter what the 
brand is. Some disks tend to work better for longer time. Sometimes you 
are just out of luck. With some brands, luck seems to be running out 
(Quantum). Other brands may work better, but they will eventually fail.

I've had failed Seagate, Maxtor, IBM, Fujitsu, Western Digital, Quantum, 
Conner.

The only brand which never failed that I use is Samsung (probably due to 
the fact that I only have 2 of them compared to the other brands).

I do expect them to fail and I have backups of the most important stuff I 
need.

The best I found for reliability (except for backups) is having a
software raid 5 on many disks of about same capacity (but different 
brand/model).



* Re: ide drive dying?
  2002-09-08 19:27     ` Ed Sweetman
@ 2002-09-08 20:11       ` jbradford
  2002-09-09  2:37         ` Nuitari
  0 siblings, 1 reply; 68+ messages in thread
From: jbradford @ 2002-09-08 20:11 UTC (permalink / raw)
  To: Ed Sweetman; +Cc: linux-kernel

> Keep your drive cool and you can expect to keep it around for a very 
> long time.

In a large tower case, it is worthwhile to leave a drive bay free between disks, instead of using them sequentially, like this:

    ------             ------
    |****|             |****|
    ------             ------
    |    |             |****|
    ------             ------
    |****|             |****|
    ------  instead of ------
    |    |             |    |
    ------             ------
    |****|             |    |
    ------             ------
    |    |             |    |
    ------             ------

Also, if you get a disk that suddenly doesn't spin up, don't assume that the motor has died - you can sometimes bring it back to life by connecting power to it and giving it a very sharp angular jolt in the plane of the platters. The cause is static friction (a.k.a. stiction).

John.


* Re: ide drive dying?
  2002-09-09  1:31   ` Nuitari
@ 2002-09-08 19:27     ` Ed Sweetman
  2002-09-08 20:11       ` jbradford
  0 siblings, 1 reply; 68+ messages in thread
From: Ed Sweetman @ 2002-09-08 19:27 UTC (permalink / raw)
  To: Nuitari; +Cc: linux-kernel

Nuitari wrote:
> On Sun, 8 Sep 2002 jbradford@dial.pipex.com wrote:
> 
>>I have *never* lost data to a Maxtor disk.  I have had IBM, Fujitsu, Western Digital, and DEC drives all fail on me before.
>>
>>It's disappointing that Maxtor are reducing their warranty from 3 years to 1 year, but on the other hand, I've never needed it at all.
> 
> 
> The problem is that you will eventually lose data. No matter what the 
> brand is. Some disks tend to work better for longer time. Sometimes you 
> are just out of luck. With some brands, luck seems to be running out 
> (Quantum). Other brands may work better, but they will eventually fail.
> 
> I've had failed Seagate, Maxtor, IBM, Fujitsu, Western Digital, Quantum, 
> Conner.
> 
> The only brand which never failed that I use is Samsung (probably due to 
> the fact that I only have 2 of them compared to the other brands).
> 
> I do expect them to fail and I have backups of the most important stuff I 
> need.
> 
> The best I found for reliability (except for backups) is having a
> software raid 5 on many disks of about same capacity (but different 
> brand/model).
> 

Well, since most HDDs have a 5-10 year lifespan when operating below 30C,
you can see why many people's HDDs are dying or having errors much
sooner.  People are operating their HDDs in the mid 30s or higher due
to low air circulation or just generally high ambient temperatures in
the case.  This cuts the run-to-error time significantly.  There was a
chart on some HDD website that showed how long you can expect to have
your data safe for different ranges of temperatures, and it's something
like 30-33C = 3 years, ~35C is getting close to 1 year, and it just goes
on like that.  So yeah, it's not surprising at all that reputations that
used to hold don't anymore, as heat affects all drives and heat has become
the dominating problem in today's HDDs rather than simple quality of parts.

Keep your drive cool and you can expect to keep it around for a very 
long time.



* Re: ide drive dying?
  2002-09-08  0:02 Dieter Nützel
@ 2002-09-08 17:42 ` jbradford
  2002-09-09  1:31   ` Nuitari
                     ` (3 more replies)
  0 siblings, 4 replies; 68+ messages in thread
From: jbradford @ 2002-09-08 17:42 UTC (permalink / raw)
  To: Dieter Nützel; +Cc: linux-kernel

> Andre, can you fix start/stop counts, please?
> 
> SunWave1 /home/nuetzel# /usr/local/sbin/smartctl -a /dev/sda
> Device: IBM      DDYS-T18350N     Version: S96H
> Device supports S.M.A.R.T. and is Enabled
> Temperature Warning Disabled or Not Supported
> S.M.A.R.T. Sense: Okay!
> Current Drive Temperature:     31 C
> Drive Trip Temperature:        85 C
> Current start stop count:      131072 times
> Recommended start stop count:  2555920 times
> 
> SunWave1 /home/nuetzel# /usr/local/sbin/smartctl -a /dev/sdb
> Device: IBM      DDRS-34560D      Version: DC1B
> Device supports S.M.A.R.T. and is Enabled
> Temperature Warning Disabled or Not Supported
> S.M.A.R.T. Sense: Okay!
> 
> SunWave1 /home/nuetzel# /usr/local/sbin/smartctl -a /dev/sdc
> Device: IBM      DDRS-34560W      Version: S71D
> Device supports S.M.A.R.T. and is Enabled
> Temperature Warning Disabled or Not Supported
> S.M.A.R.T. Sense: Okay!
> 
> Smartsuite-2.1 (at least) is missing some features for SCSI.

Are you sure that it is not just the drive misreporting the start/stop counts?  S.M.A.R.T. implementations are often flaky.
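
One quick way to look at counts like these is to pull them out of the smartctl text and print them in hexadecimal. The sketch below does only that; the regular expression and the function name start_stop_counts are assumptions made for this example, written against the two lines quoted above and not against any documented smartctl output format.

```python
import re

# Matches lines such as "Current start stop count:      131072 times".
COUNT_LINE = re.compile(
    r"^(?P<label>.*start stop count):\s+(?P<value>\d+) times$"
)


def start_stop_counts(smartctl_output):
    """Return {label: count} for every 'start stop count' line found."""
    counts = {}
    for line in smartctl_output.splitlines():
        match = COUNT_LINE.match(line.strip())
        if match:
            counts[match.group("label")] = int(match.group("value"))
    return counts


# The two values reported for the DDYS-T18350N in the quoted output.
sample = """\
Current start stop count:      131072 times
Recommended start stop count:  2555920 times
"""

for label, value in start_stop_counts(sample).items():
    # Zero-padded 32-bit hex makes byte-level patterns easier to spot.
    print(f"{label}: {value} = {value:#010x}")
```

131072 is exactly 0x20000, a suspiciously round number for a count of real spin-ups. That may point at how the field is being decoded rather than at the drive itself, but it is only a guess from two numbers.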

> BTW
> I had a double disk crash (same symptoms as in this thread) in a school's 
> RAID5 with four Fujitsu MPG3204AT-EF (the ones with gel-lager, silent and 
> reliable we hoped) last week...
> The shop for which I work from time to time got 71 disks of this type back 
> (sold over the last 1.5 years). We switched to them after the "IBM" disaster. 
> Maybe a "misdecision" ;-)
> What shall we sell safely, now...?
> MAXTOR?

I have *never* lost data to a Maxtor disk.  I have had IBM, Fujitsu, Western Digital, and DEC drives all fail on me before.

It's disappointing that Maxtor are reducing their warranty from 3 years to 1 year, but on the other hand, I've never needed it at all.

John.


* Re: ide drive dying?
  2002-09-08 10:56                         ` Henning P. Schmiedehausen
@ 2002-09-08 14:14                           ` jbradford
  2002-09-09 21:59                             ` Alan Cox
  0 siblings, 1 reply; 68+ messages in thread
From: jbradford @ 2002-09-08 14:14 UTC (permalink / raw)
  To: linux-kernel

> >The firmware update is for many more drives than that, My own
> 
> >     Model=IBM-DTLA-305040, FwRev=TW4OA60A
> 
> >is also recommended, as well as many with a FwRev=xxxOyzzz with zzz<66A.
> >Now i have to find a windows machine to try it out on...
> 
> You don't need to. All you need is someone run this tool and send you
> the image it creates. I put mine as boot.img on a CD so I can upgrade
> all the disks I have in boxes without floppy disk drives. It's a self
> booting DOS disk.

As the old firmware is known to be buggy, and those bugs are relevant when using Linux, and updated firmware is available, is it worth checking for the known buggy firmware version in the ide driver?

I realise that we cannot check every drive in the world for compatibility, but if this is a known issue...
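
To make the idea concrete, here is a user-space sketch (Python, not kernel code) of the kind of test such a check would perform, based only on the rule quoted above: a FwRev of the form xxxOyzzz with zzz below 66A is a candidate for the update. The function name, the regular expression, and the plain string comparison of the last three characters are all assumptions for illustration; IBM's actual revision scheme may order differently.

```python
import re

# Layout taken from the quoted rule "FwRev=xxxOyzzz": three characters,
# the letter O, one character, then a three-character revision.
FWREV = re.compile(r"^(?P<prefix>.{3})O(?P<variant>.)(?P<rev>.{3})$")


def firmware_looks_outdated(fwrev, first_good="66A"):
    """True if `fwrev` fits the xxxOyzzz layout and zzz sorts below
    `first_good`. Strings that do not fit the layout return False,
    meaning "unknown", not "safe"."""
    match = FWREV.match(fwrev.strip())
    if match is None:
        return False
    # Assumption: revisions compare correctly as plain strings.
    return match.group("rev") < first_good


if __name__ == "__main__":
    # The DTLA-305040 revision quoted earlier in the thread.
    print(firmware_looks_outdated("TW4OA60A"))
```

A real driver check would also have to match the model string, since the rule is only known to apply to the affected IBM drives.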

John.


* Re: ide drive dying?
  2002-09-07 23:19                       ` David Forrest
@ 2002-09-08 10:56                         ` Henning P. Schmiedehausen
  2002-09-08 14:14                           ` jbradford
  0 siblings, 1 reply; 68+ messages in thread
From: Henning P. Schmiedehausen @ 2002-09-08 10:56 UTC (permalink / raw)
  To: linux-kernel

David Forrest <drf5n@mug.sys.virginia.edu> writes:

>On Sat, 7 Sep 2002 jbradford@dial.pipex.com wrote:

>...
>> Here is the URL:
>>
>> http://www-1.ibm.com/support/docview.wss?uid=psg1MIGR-39082
>>
>> it expressly states that the firmware is intended for the DTLA-307060.

>The firmware update is for many more drives than that, My own

>     Model=IBM-DTLA-305040, FwRev=TW4OA60A

>is also recommended, as well as many with a FwRev=xxxOyzzz with zzz<66A.
>Now I have to find a Windows machine to try it out on...

You don't need to. All you need is someone to run this tool and send you
the image it creates. I put mine as boot.img on a CD so I can upgrade
all the disks I have in boxes without floppy disk drives. It's a self
booting DOS disk.

	Regards
		Henning

-- 
Dipl.-Inf. (Univ.) Henning P. Schmiedehausen       -- Geschaeftsfuehrer
INTERMETA - Gesellschaft fuer Mehrwertdienste mbH     hps@intermeta.de

Am Schwabachgrund 22  Fon.: 09131 / 50654-0   info@intermeta.de
D-91054 Buckenhof     Fax.: 09131 / 50654-20   

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
@ 2002-09-08  0:02 Dieter Nützel
  2002-09-08 17:42 ` jbradford
  0 siblings, 1 reply; 68+ messages in thread
From: Dieter Nützel @ 2002-09-08  0:02 UTC (permalink / raw)
  To: Andre Hedrick; +Cc: Linux Kernel List

On 7 Sep 2002, Andre Hedrick wrote:
> On 7 Sep 2002, Daniel Egger wrote:
>
> > On Sat, 2002-09-07 at 17:02, jbradford@dial.pipex.com wrote:
> > 
> > > No, but you've upgraded the firmware, right?
> > 
> > Not exactly. According to IBM technical support there is no such thing
> > as a new firmware. The drives are alright, the OS is broken.
>
> They are full of CRAP!
>
> IBM ran TASKFILE IO through their bus analyzers and it came up clean.
> IBM also introduced FLAGGED versions of the diagnostic TASKFILE transport
> for eventual use of their DFT (Drive Fitness Test).
>
> You tell the service tech he is smoking crack.
> The kernel passed with flying colors in their disk labs. If you read
> in ide-taskfile.c version 0.33 and above, you will see they did some work
> on the driver and verified issues.

Sorry to step in, but you said that you are working on smartsuite (2.1+) 
again?

Andre, can you fix start/stop counts, please?

SunWave1 /home/nuetzel# /usr/local/sbin/smartctl -a /dev/sda
Device: IBM      DDYS-T18350N     Version: S96H
Device supports S.M.A.R.T. and is Enabled
Temperature Warning Disabled or Not Supported
S.M.A.R.T. Sense: Okay!
Current Drive Temperature:     31 C
Drive Trip Temperature:        85 C
Current start stop count:      131072 times
Recommended start stop count:  2555920 times

SunWave1 /home/nuetzel# /usr/local/sbin/smartctl -a /dev/sdb
Device: IBM      DDRS-34560D      Version: DC1B
Device supports S.M.A.R.T. and is Enabled
Temperature Warning Disabled or Not Supported
S.M.A.R.T. Sense: Okay!

SunWave1 /home/nuetzel# /usr/local/sbin/smartctl -a /dev/sdc
Device: IBM      DDRS-34560W      Version: S71D
Device supports S.M.A.R.T. and is Enabled
Temperature Warning Disabled or Not Supported
S.M.A.R.T. Sense: Okay!

Smartsuite-2.1 (at least) is missing some features for SCSI.

Regards,
	Dieter

BTW
I had a double disk crash (same symptoms as in this thread) in a school's 
RAID5 with four Fujitsu MPG3204AT-EF (the ones with fluid bearings, silent and 
reliable we hoped) last week...
The shop for which I work from time to time got 71 disks of this type back 
(sold over the last 1.5 years). We switched to them after the "IBM" disaster. 
Maybe a "misdecision" ;-)
What shall we sell safely, now...?
MAXTOR?

-- 
Dieter Nützel
Graduate Student, Computer Science

University of Hamburg
Department of Computer Science
@home: Dieter.Nuetzel at hamburg.de (replace at with @)


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-07 22:00                     ` jbradford
@ 2002-09-07 23:19                       ` David Forrest
  2002-09-08 10:56                         ` Henning P. Schmiedehausen
  0 siblings, 1 reply; 68+ messages in thread
From: David Forrest @ 2002-09-07 23:19 UTC (permalink / raw)
  To: jbradford; +Cc: Alan Cox, degger, linux-kernel

On Sat, 7 Sep 2002 jbradford@dial.pipex.com wrote:

...
> Here is the URL:
>
> http://www-1.ibm.com/support/docview.wss?uid=psg1MIGR-39082
>
> it expressly states that the firmware is intended for the DTLA-307060.

The firmware update is for many more drives than that. My own

     Model=IBM-DTLA-305040, FwRev=TW4OA60A

is also recommended, as well as many with a FwRev=xxxOyzzz with zzz<66A.
Now I have to find a Windows machine to try it out on...

Dave,

-- 
 Dave Forrest                                   drf5n@virginia.edu
 (804)642-0662h (434)924-3954w  http://mug.sys.virginia.edu/~drf5n/


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-07 20:19                 ` Daniel Egger
  2002-09-07 20:41                   ` jbradford
  2002-09-07 21:41                   ` Alan Cox
@ 2002-09-07 22:05                   ` Andre Hedrick
  2 siblings, 0 replies; 68+ messages in thread
From: Andre Hedrick @ 2002-09-07 22:05 UTC (permalink / raw)
  To: Daniel Egger; +Cc: jbradford, linux-kernel

On 7 Sep 2002, Daniel Egger wrote:

> On Sat, 2002-09-07 at 17:02, jbradford@dial.pipex.com wrote:
> 
> > No, but you've upgraded the firmware, right?
> 
> Not exactly. According to IBM technical support there is no such thing
> as a new firmware. The drives are alright, the OS is broken.

They are full of CRAP!

IBM ran TASKFILE IO through their bus analyzers and it came up clean.
IBM also introduced FLAGGED versions of the diagnostic TASKFILE transport
for eventual use of their DFT (Drive Fitness Test).

You tell the service tech he is smoking crack.
The kernel passed with flying colors in their disk labs. If you read
in ide-taskfile.c version 0.33 and above, you will see they did some work
on the driver and verified issues.

Now earlier I published a method of how to stabilize the drive once you
back up all the data you can off of it.  Since I do not yet have a source
version of DFT-Linux, or a binary, I cannot offer much more natively.

Cheers,

Andre Hedrick
LAD Storage Consulting Group


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-07 21:41                   ` Alan Cox
@ 2002-09-07 22:00                     ` jbradford
  2002-09-07 23:19                       ` David Forrest
  0 siblings, 1 reply; 68+ messages in thread
From: jbradford @ 2002-09-07 22:00 UTC (permalink / raw)
  To: Alan Cox; +Cc: degger, linux-kernel

> On Sat, 2002-09-07 at 21:19, Daniel Egger wrote:
> > On Sat, 2002-09-07 at 17:02, jbradford@dial.pipex.com wrote:
> > 
> > > No, but you've upgraded the firmware, right?
> > 
> > Not exactly. According to IBM technical support there is no such thing
> > as a new firmware. The drives are alright, the OS is broken.
> 
> The IBM technical support I dealt with not only confirmed there was new
> firmware, the tools updated it and said they had 8)

Here is the URL:

http://www-1.ibm.com/support/docview.wss?uid=psg1MIGR-39082

it expressly states that the firmware is intended for the DTLA-307060.

The page mentions that it enhances stability and SMART data collection.

John.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-07 20:41                   ` jbradford
@ 2002-09-07 21:41                     ` Alan Cox
  0 siblings, 0 replies; 68+ messages in thread
From: Alan Cox @ 2002-09-07 21:41 UTC (permalink / raw)
  To: jbradford; +Cc: Daniel Egger, linux-kernel

On Sat, 2002-09-07 at 21:41, jbradford@dial.pipex.com wrote:
> > According to IBM technical support there is no such thing
> > as a new firmware. The drives are alright, the OS is broken.
> 
> Right, so you're calling Alan Cox a liar, then?  I know who I believe.

Hardly. He said IBM tech support told him one thing, and they told me
another. Give it a rest


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-07 20:19                 ` Daniel Egger
  2002-09-07 20:41                   ` jbradford
@ 2002-09-07 21:41                   ` Alan Cox
  2002-09-07 22:00                     ` jbradford
  2002-09-07 22:05                   ` Andre Hedrick
  2 siblings, 1 reply; 68+ messages in thread
From: Alan Cox @ 2002-09-07 21:41 UTC (permalink / raw)
  To: Daniel Egger; +Cc: jbradford, linux-kernel

On Sat, 2002-09-07 at 21:19, Daniel Egger wrote:
> On Sat, 2002-09-07 at 17:02, jbradford@dial.pipex.com wrote:
> 
> > No, but you've upgraded the firmware, right?
> 
> Not exactly. According to IBM technical support there is no such thing
> as a new firmware. The drives are alright, the OS is broken.

The IBM technical support I dealt with not only confirmed there was new
firmware, the tools updated it and said they had 8)



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-07 20:19                 ` Daniel Egger
@ 2002-09-07 20:41                   ` jbradford
  2002-09-07 21:41                     ` Alan Cox
  2002-09-07 21:41                   ` Alan Cox
  2002-09-07 22:05                   ` Andre Hedrick
  2 siblings, 1 reply; 68+ messages in thread
From: jbradford @ 2002-09-07 20:41 UTC (permalink / raw)
  To: Daniel Egger; +Cc: linux-kernel

This discussion is becoming stupid, but here we go:

> > No, but you've upgraded the firmware, right?
> 
> Not exactly.

???  Either you did or didn't.

> According to IBM technical support there is no such thing
> as a new firmware. The drives are alright, the OS is broken.

Right, so you're calling Alan Cox a liar, then?  I know who I believe.

> > If that has fixed the problem, then it is not a faulty drive.
> Right, and how would you notice without sacrificing more data?

smartctl -X /dev/hda?

'Execute Extended Self Test' might be a good start

or you could just copy data to/from it, generally hammer it and spin it up, down, and sideways, generally try to make it go wrong, and if your data is intact, then I would trust it more than a disk that arrived in a jiffy bag, with an assurance that 'this one works'.
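The "copy data to/from it and see whether it is intact" idea can be sketched as a write, read-back and checksum comparison. This is only an illustration: `write_and_verify` and `sha256_of` are names made up here, the directory is assumed to live on the disk under test, and a read-back shortly after the write may be served from the page cache rather than the platters, so a real stress test would use far more data than fits in RAM.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Checksum a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_and_verify(directory, size_bytes=4 << 20):
    """Write random data into `directory`, read it back, compare checksums."""
    payload = os.urandom(size_bytes)
    expected = hashlib.sha256(payload).hexdigest()
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())  # ask the OS to push the data to the device
        return sha256_of(path) == expected
    finally:
        os.remove(path)


if __name__ == "__main__":
    # tempfile.gettempdir() is only a stand-in; point this at the disk under test.
    print("data intact:", write_and_verify(tempfile.gettempdir()))
```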

> > So, you'll just plug in your 'new' disk, and in a few months,
> > bad sectors will start appearing.
> 
> Not if you sold it at Ebay,

The bad sectors are just as likely to appear, but somebody else's data will be lost.  Very nice gesture, not to mention that you probably violate the Ebay T&C by selling a product that you suspect is faulty.

> which is what I did with all *new* drives I received from IBM.

Well, I won't buy a second hand drive from you then :-).

> I just kept the "serviceable used part" one in case I need to install
> Windows to upgrade the firmware of some drive or anything else in range.

Fine, if that's what floats your boat.

In fact, I was completely wrong, OK?  You were right all along, so there is no need to continue this pointless thread.

John.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-07 15:02               ` jbradford
@ 2002-09-07 20:19                 ` Daniel Egger
  2002-09-07 20:41                   ` jbradford
                                     ` (2 more replies)
  0 siblings, 3 replies; 68+ messages in thread
From: Daniel Egger @ 2002-09-07 20:19 UTC (permalink / raw)
  To: jbradford; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 757 bytes --]

On Sat, 2002-09-07 at 17:02, jbradford@dial.pipex.com wrote:

> No, but you've upgraded the firmware, right?

Not exactly. According to IBM technical support there is no such thing
as a new firmware. The drives are alright, the OS is broken.

> If that has fixed the problem, then it is not a faulty drive.

Right, and how would you notice without sacrificing more data?

> So, you'll just plug in your 'new' disk, and in a few months,
> bad sectors will start appearing.

Not if you sold it at Ebay, which is what I did with all *new*
drives I received from IBM. I just kept the "serviceable used part"
one in case I need to install Windows to upgrade the firmware of
some drive or anything else in range.
 
-- 
Servus,
       Daniel

[-- Attachment #2: Dies ist ein digital signierter Nachrichtenteil --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-07 15:54           ` Holger Lubitz
@ 2002-09-07 16:31             ` jbradford
  0 siblings, 0 replies; 68+ messages in thread
From: jbradford @ 2002-09-07 16:31 UTC (permalink / raw)
  To: Holger Lubitz; +Cc: linux-kernel

> I wonder if it would be possible for the driver to monitor SMART and
> lighten the load on the drive when things don't seem normal.

I think it would be fun to have SMART monitoring in the driver, but I'm not sure it's worth the bloat.  It *can* be done in userspace, after all.

> What is normal, anyway?

Not sure what 'normal' is, but the manufacturer defines thresholds, which are to be interpreted as 'drive is failing' if they are exceeded.
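As a small illustration of the threshold idea: in the usual SMART convention each attribute has a normalized value that the drive lowers as things degrade, and an attribute counts as failed once that value is at or below the manufacturer's threshold. The attribute names below are taken from this thread, but the numbers are invented for the example, and `Attribute` and `failing` are names made up here.

```python
from typing import NamedTuple


class Attribute(NamedTuple):
    name: str
    value: int      # normalized value reported by the drive
    threshold: int  # failure threshold set by the manufacturer


def failing(attributes):
    """Return the names of attributes at or below their threshold."""
    return [a.name for a in attributes if a.value <= a.threshold]


if __name__ == "__main__":
    # Invented numbers, for illustration only.
    sample = [
        Attribute("Raw Read Error Rate", 100, 60),
        Attribute("Seek Error Rate", 25, 30),
    ]
    print(failing(sample))
```

A userspace monitor would only need to poll these pairs periodically, which is part of why doing it in the driver is hard to justify.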

> I don't really believe the 310617 power on hours my Maxtor (the old 60
> gig with 4 platters) claims, either.

That's because it's reporting power on time in minutes :-)
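A quick check of the arithmetic, using only the figure quoted above: read as hours, 310617 comes to about 35 years, which no 60 gig drive can have been running for; read as minutes, it is roughly 5,177 hours, or about 216 days of power-on time.

```python
REPORTED = 310617  # raw "power on" figure quoted above

HOURS_PER_YEAR = 24 * 365.25

# Interpretation 1: the figure really is hours.
as_hours_years = REPORTED / HOURS_PER_YEAR

# Interpretation 2: the figure is minutes.
as_minutes_hours = REPORTED / 60
as_minutes_days = as_minutes_hours / 24

print(f"read as hours:   {as_hours_years:.1f} years")
print(f"read as minutes: {as_minutes_hours:.0f} hours ({as_minutes_days:.0f} days)")
```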

John.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-07  9:30         ` Anders Fugmann
  2002-09-07  9:37           ` Udo A. Steinberg
@ 2002-09-07 15:54           ` Holger Lubitz
  2002-09-07 16:31             ` jbradford
  1 sibling, 1 reply; 68+ messages in thread
From: Holger Lubitz @ 2002-09-07 15:54 UTC (permalink / raw)
  To: linux-kernel

Anders Fugmann wrote:
> I have had success in firmware-upgrading these drives, after which all
> problems were gone forever.

Which firmware version do your drives show? I ran the firmware upgrade
on my two DTLA half a year ago, and ended up with this:

Model=IBM-DTLA-307045, FwRev=TX6OA59A
Model=IBM-DTLA-305040, FwRev=TW4OA69A

(from hdparm -i output - the former 0A changed to 9A after the upgrade,
rest stayed the same)

Both work fine (they never failed me before the upgrade either).
However, at least the second drive still clicks often enough for me to
notice. I am still worried, though smartsuite says I'm fine - if I read
the output correctly.

It seems to click only when doing lots of write requests for extended
periods of time (like unbatching and spooling several megabytes of news
- one or two usually don't trigger it, larger batches do).

I wonder if it would be possible for the driver to monitor SMART and
lighten the load on the drive when things don't seem normal.

What is normal, anyway? For example, my Seagate Barracuda IV shows
continually increasing raw values for "Raw Read Error Rate", "Seek Error
Rate" and "Hardware ECC Recovered". It works fine, though. The older U5
I still have running has a high but pretty constant raw value for the
first, a slower rate of increase for the second and doesn't show the
third.

I don't really believe the 310617 power on hours my Maxtor (the old 60
gig with 4 platters) claims, either.

Holger

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-07 13:50             ` Daniel Egger
@ 2002-09-07 15:02               ` jbradford
  2002-09-07 20:19                 ` Daniel Egger
  0 siblings, 1 reply; 68+ messages in thread
From: jbradford @ 2002-09-07 15:02 UTC (permalink / raw)
  To: Daniel Egger; +Cc: linux-kernel

> > Besides, you *do* backup, don't you?
> 
> I do but besides that there is still data loss involved and my time is
> expensive and limited, so I'd rather go for a hasslefree solution than
> to poke around in mud with a stick in the hope it might clear up.

Fair enough, if you don't have the time to devote to it, it's best to replace the drive.

I assumed from the size of this thread, which has nothing to do with the kernel anymore, that we were trying to find out what was to blame.

If this is going to become a flamewar, please remove the cc: to the kernel list, as I doubt that it interests them.

> > (Or do what Linus suggested a while ago, and upload your stuff to an
> > ftp site that is mirrored worldwide.)
> 
> Very practical advice.

Whatever - it was a joke.

The reason I brought up backups, was because even if you have a RAID array, of high quality drives, with non-sequential serial numbers, on hot-pluggable interfaces, with known good firmware, you can still get silent data corruption.

Fact - *NO* SLED, or RAID array, can ever be guaranteed never to silently flip a bit.

> > I don't see the point of returning a disk that turns out not to be
> > faulty after the firmware upgrade,
> 
> The point is that until you know whether it really was the firmware,
> you've spent so much time that it is much easier to return the drive.

And the chances are you will get another drive of the same model, back from IBM.  How does that help?

I already pointed out that there are two known issues here with these drives - firmware bugs, and media defects.

So far, all we can say is that the firmware problem is now fixed.  On a replacement drive, you can't even say that.

The 'media errors' could have been caused entirely by the buggy firmware.

> > even if it qualifies for a warranty replacement, (which it shouldn't do)
> 
> A faulty drive is a faulty drive and thus qualifies for a
> free replacement (at least in Germany). Nobody here can force
> you to try several costly things which might solve the problem;
> it is rather the manufacturer's duty to fix it at their cost.

No, but you've upgraded the firmware, right?  If that has fixed the problem, then it is not a faulty drive.  If it is not a faulty drive, then what is the point in sending it back?  If it is not a faulty drive, IBM would be justified in sending it right back to you at your expense.  Oh, and it might get damaged in transit.

> > because you might be exchanging a good disk for a bad disk.
> 
> Very doubtful considering past experience. Also it's not very
> probable (though it has happened) to receive a disk which is
> more broken than broken.

No, I would say it is very possible that you could receive a disk with the old firmware on it.  So, you'll just plug in your 'new' disk, and in a few months, bad sectors will start appearing.

John.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-07 13:08           ` jbradford
@ 2002-09-07 13:50             ` Daniel Egger
  2002-09-07 15:02               ` jbradford
  0 siblings, 1 reply; 68+ messages in thread
From: Daniel Egger @ 2002-09-07 13:50 UTC (permalink / raw)
  To: jbradford; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1293 bytes --]

On Sat, 2002-09-07 at 15:08, jbradford@dial.pipex.com wrote:

> Besides, you *do* backup, don't you?

I do but besides that there is still data loss involved and my time is
expensive and limited, so I'd rather go for a hasslefree solution than
to poke around in mud with a stick in the hope it might clear up.

> (Or do what Linus suggested a while ago, and upload your stuff to an
> ftp site that is mirrored worldwide.)

Very practical advice.

> I don't see the point of returning a disk that turns out not to be
> faulty after the firmware upgrade,

The point is that until you know whether it really was the firmware,
you've spent so much time that it is much easier to return the drive.

> even if it qualifies for a warranty replacement, (which it shouldn't do)

A faulty drive is a faulty drive and thus qualifies for a
free replacement (at least in Germany). Nobody here can force
you to try several costly things which might solve the problem;
it is rather the manufacturer's duty to fix it at their cost.

> because you might be exchanging a good disk for a bad disk.

Very doubtful considering past experience. Also it's not very
probable (though it has happened) to receive a disk which is
more broken than broken. 

-- 
Servus,
       Daniel

[-- Attachment #2: Dies ist ein digital signierter Nachrichtenteil --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-07 12:31         ` Daniel Egger
@ 2002-09-07 13:08           ` jbradford
  2002-09-07 13:50             ` Daniel Egger
  0 siblings, 1 reply; 68+ messages in thread
From: jbradford @ 2002-09-07 13:08 UTC (permalink / raw)
  To: Daniel Egger; +Cc: linux-kernel

> > The tests showed bad sectors, I'm currently running a disk erase.
> 
> This is exactly the mistake I've been meaning to warn you of.
> The disk will corrupt sooner or later again and you'll have to go
> through all the torture (possible backup/restore, missing data) again
> and if you're unlucky (which is quite possible with your frequency of
> use) the warranty is void until the problems appear the next time.

There are two separate issues here, though:

* Buggy firmware
* Unreliable media

We have confirmed, (I believe), that the drive did have the buggy firmware.  We do not know yet whether the media is defective or not, but we do know that the drives are not the best in the world.

Alan also confirmed that the errors were direct from the device, and so it is not a kernel bug.

However, I raise the question of whether the new kernel version caused different access patterns to the device, and showed up the firmware bug that was there all the time.  Or maybe the compilation of the new kernel thrashed the disk and showed up the firmware bug.  If the machine has been on for some time, (months), doing not very much, maybe a lot of disk data was cached in RAM, and the kernel compile caused it to be re-read from disk, showing up media defects.

I was hoping that he would actually post the output of:

smartctl -a /dev/hda?

because that tells you all sorts of things, like, for example, reallocated sector count, and calibration retry count.

Obviously, it is not a good idea to use the drive for anything important until it has been tested in a non-critical application first.

Besides, you *do* backup, don't you?  (Or do what Linus suggested a while ago, and upload your stuff to an ftp site that is mirrored worldwide.)

I don't see the point of returning a disk that turns out not to be faulty after the firmware upgrade, for replacement under the warranty, even if it qualifies for a warranty replacement, (which it shouldn't do), because you might be exchanging a good disk for a bad disk.

John.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-06 20:32   ` Alan Cox
@ 2002-09-07 12:34     ` Daniel Egger
  0 siblings, 0 replies; 68+ messages in thread
From: Daniel Egger @ 2002-09-07 12:34 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 619 bytes --]

On Fri, 2002-09-06 at 22:32, Alan Cox wrote:

> It's a status entry direct from the drive. The drive says "uncorrectable
> error" which means there is a media problem. It's nothing to do with
> Linux

According to IBM tech staff it is an OS problem because the data
transfer to the drive got corrupted somehow and thus the drive forgot
about the sectors.

I was just laughing my ass off when I heard this, especially after
the 4th drive failing within a short period of time with the same
guy calling me on my cell phone and telling me the same shite over
and over again...
 
-- 
Servus,
       Daniel

[-- Attachment #2: Dies ist ein digital signierter Nachrichtenteil --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-06 19:22       ` DevilKin
  2002-09-07  9:30         ` Anders Fugmann
@ 2002-09-07 12:31         ` Daniel Egger
  2002-09-07 13:08           ` jbradford
  1 sibling, 1 reply; 68+ messages in thread
From: Daniel Egger @ 2002-09-07 12:31 UTC (permalink / raw)
  To: DevilKin; +Cc: jbradford, Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 683 bytes --]

On Fri, 2002-09-06 at 21:22, DevilKin wrote:

> Well, there were 21 ATA errors, and it showed 5 error blocks, with disk 'live' 
> times of 629 hours.

No wonder it ran for 2 years. Are you using this machine frequently at
all? :)
 
> The tests showed bad sectors, I'm currently running a disk erase.

This is exactly the mistake I've been meaning to warn you of.
The disk will corrupt sooner or later again and you'll have to go
through all the torture (possible backup/restore, missing data) again
and if you're unlucky (which is quite possible with your frequency of
use) the warranty is void until the problems appear the next time.

-- 
Servus,
       Daniel

[-- Attachment #2: Dies ist ein digital signierter Nachrichtenteil --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-07  9:30         ` Anders Fugmann
@ 2002-09-07  9:37           ` Udo A. Steinberg
  2002-09-07 15:54           ` Holger Lubitz
  1 sibling, 0 replies; 68+ messages in thread
From: Udo A. Steinberg @ 2002-09-07  9:37 UTC (permalink / raw)
  To: Linux-Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 498 bytes --]

On Sat, 07 Sep 2002 11:30:01 +0200 Anders Fugmann (AF) wrote:

AF> You can download the firmware programs from 
AF> http://anders.fugmann.dhs.org/ibm. There are both upgrade for 75GXP and 
AF> 60GXP, or you could contact IBM for the firmware upgrade - They are not 
AF> available on the ibm site. The programs are Windows thingies, which 
AF> creates a floppy to be booted.

They are on the IBM site, but a bit hard to find:

http://www-1.ibm.com/support/docview.wss?rs=0&uid=psg1MIGR-39082

-Udo.

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-06 19:22       ` DevilKin
@ 2002-09-07  9:30         ` Anders Fugmann
  2002-09-07  9:37           ` Udo A. Steinberg
  2002-09-07 15:54           ` Holger Lubitz
  2002-09-07 12:31         ` Daniel Egger
  1 sibling, 2 replies; 68+ messages in thread
From: Anders Fugmann @ 2002-09-07  9:30 UTC (permalink / raw)
  To: DevilKin; +Cc: Linux Kernel Mailing List

DevilKin wrote:
> Luckily I've been able to back up everything from the disk, and I'm running the 
> DFT now. The tests showed bad sectors, I'm currently running a disk erase.
I have had success in firmware-upgrading these drives, after which all 
problems were gone forever.

You can download the firmware programs from 
http://anders.fugmann.dhs.org/ibm. There are upgrades for both the 75GXP and 
60GXP, or you could contact IBM for the firmware upgrade - they are not 
available on the IBM site. The programs are Windows thingies, which 
create a floppy to be booted.

Regards
Anders Fugmann






^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-07  7:42   ` jbradford
@ 2002-09-07  7:50     ` Andre Hedrick
  0 siblings, 0 replies; 68+ messages in thread
From: Andre Hedrick @ 2002-09-07  7:50 UTC (permalink / raw)
  To: jbradford; +Cc: hahn, linux-kernel


Technically it is, I am working to transfer the copyright/license to LAD.
Then I can update it and transform it to the preferred kernel API that is
not enabled by default.  I expect it will require a subset of the
taskfile_ioctl calls to restrict various IO calls.

Cheers,

On Sat, 7 Sep 2002 jbradford@dial.pipex.com wrote:

> > Next dig out smartsuite from http://www.linux-ide.org/smart.html
> 
> I thought that smartsuite was now unmaintained, and posted a comment to that effect earlier in this thread - sorry for the mis-information.
> 
> John.
> 

Andre Hedrick
LAD Storage Consulting Group


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-07  7:02 ` Andre Hedrick
@ 2002-09-07  7:42   ` jbradford
  2002-09-07  7:50     ` Andre Hedrick
  0 siblings, 1 reply; 68+ messages in thread
From: jbradford @ 2002-09-07  7:42 UTC (permalink / raw)
  To: Andre Hedrick; +Cc: hahn, linux-kernel

> Next dig out smartsuite from http://www.linux-ide.org/smart.html

I thought that smartsuite was now unmaintained, and posted a comment to that effect earlier in this thread - sorry for the mis-information.

John.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-06 15:55   ` DevilKin
  2002-09-06 17:22     ` jbradford
@ 2002-09-07  7:08     ` Andre Hedrick
  1 sibling, 0 replies; 68+ messages in thread
From: Andre Hedrick @ 2002-09-07  7:08 UTC (permalink / raw)
  To: DevilKin; +Cc: jbradford, Linux Kernel Mailing List


Send me the results offline

On Fri, 6 Sep 2002, DevilKin wrote:

> On Friday 06 September 2002 17:36, jbradford@dial.pipex.com wrote:
> > > I've looked up these errors on the net, and as far as I can tell it means
> > > that the drive has some bad sectors at the given addresses and that it
> > > will probably die on me sooner or later.
> > >
> > > Can someone either confirm this to me or tell me what to do to fix it?
> > >
> > > The drive involved is an IBM-DTLA-307060, which has served me without
> > > problems now for about 2 years.
> >
> > Have a look at:
> >
> > http://csl.cse.ucsc.edu/smart.shtml
> >
> > there you will find software for interrogating and monitoring the
> > S.M.A.R.T. data available from your drive.  It's a little late to start
> > monitoring it, if the drive is already dying, but if, for example, it shows
> > a lot of re-allocated sectors, or spin retries, you'll know something is
> > wrong.
> >
> 
> OK, I downloaded that and installed it, but well, frankly, it shows me very 
> little useful stuff.
> 
> Or I'm just not good at interpreting this.
> 
> DK
> 
> -- 
> "I gained nothing at all from Supreme Enlightenment, and for that very
> reason it is called Supreme Enlightenment."
> 		-- Gotama Buddha
> 
> 

Andre Hedrick
LAD Storage Consulting Group


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-06 15:13 DevilKin
                   ` (5 preceding siblings ...)
  2002-09-06 17:46 ` mbs
@ 2002-09-07  7:02 ` Andre Hedrick
  2002-09-07  7:42   ` jbradford
  6 siblings, 1 reply; 68+ messages in thread
From: Andre Hedrick @ 2002-09-07  7:02 UTC (permalink / raw)
  To: DevilKin; +Cc: linux-kernel


First BACK up what is left.

Next dig out smartsuite from http://www.linux-ide.org/smart.html

Run it in full capture mode, please use another disk to run root, or the
system will tank.

Read and save smart logs.

cat /dev/zero > /dev/hd{IBM-DTLA-307060}x

Rerun Smart in full capture mode.

Reread smart logs and compare.

cat /dev/urandom > /dev/hd{IBM-DTLA-307060}x

If you get no errors you can reuse the drive, for how long? Maybe 6 months
to a year.
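The "reread smart logs and compare" step above can be done by eye, but a small diff helps when the captures are long. The sketch below assumes the two captures have simply been saved as text; it does not rely on any particular smartsuite output format, and `changed_lines` and the sample lines in it are invented for illustration.

```python
import difflib


def changed_lines(before_text, after_text):
    """Return only the lines that differ between two saved captures."""
    diff = difflib.ndiff(before_text.splitlines(), after_text.splitlines())
    # ndiff prefixes removed lines with "- " and added lines with "+ ";
    # unchanged lines ("  ") and hint lines ("? ") are dropped.
    return [line for line in diff if line.startswith(("- ", "+ "))]


if __name__ == "__main__":
    # Invented capture contents, for illustration only.
    before = "attribute A: 5\nattribute B: 0\n"
    after = "attribute A: 9\nattribute B: 0\n"
    for line in changed_lines(before, after):
        print(line)
```

Anything that moves between the capture taken before the zero fill and the one taken after it is what to look at first.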

Now, I can not tell you what, why, how things are going on.
Sheesh, I expect to be in a deep six for this series of events already.

Sorry, I can not say anymore.

If you do not like the above, you need to run out and buy another drive
fast.

Cheers,

On Fri, 6 Sep 2002, DevilKin wrote:

> Hello kernel people,
> 
> Kernel running: 2.4.20-pre1ac3 or -pre5ac2 (same under both)
> 
> Today I discovered a stale copy of qt-3.0.3 lying about on my disk. When I 
> tried to delete it, this started showing up in my log files:
> 
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=7072862, 
> sector=1803472
> end_request: I/O error, dev 03:06 (hda), sector 1803472
> vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data 
> of [612671 612672 0x0 SD]
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=7072862, 
> sector=1803472
> end_request: I/O error, dev 03:06 (hda), sector 1803472
> vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data 
> of [612671 612677 0x0 SD]
> 
> and rm just reported me 'Permission denied'.
> 
> I've looked up these errors on the net, and as far as I can tell it means that 
> the drive has some bad sectors at the given addresses and that it will 
> probably die on me sooner or later.
> 
> Can someone either confirm this to me or tell me what to do to fix it?
> 
> The drive involved is an IBM-DTLA-307060, which has served me without problems 
> now for about 2 years.
> 
> Thanks!
> 
> DK
> -- 
> If all the Chinese simultaneously jumped into the Pacific off a 10 foot
> platform erected 10 feet off their coast, it would cause a tidal wave
> that would destroy everything in this country west of Nebraska.
> 

Andre Hedrick
LAD Storage Consulting Group


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
       [not found] <Pine.LNX.4.33.0209062017230.14523-100000@coffee.psychology.mcmaster.ca>
@ 2002-09-07  6:09 ` jbradford
  0 siblings, 0 replies; 68+ messages in thread
From: jbradford @ 2002-09-07  6:09 UTC (permalink / raw)
  To: Mark Hahn; +Cc: linux-kernel

> > Now that the Smart Suite S.M.A.R.T. applications are unmaintained, would
> 
> what happened?

I'm not sure, but the last update to the S.M.A.R.T. Suite website, on 3 July this year, says that the page and the applications are no longer maintained.

Seems the Beta of version 2.0 never got finished either :-(.

> > there be any chance of implementing S.M.A.R.T. in to the kernel IDE code?
> 
> what would be the benefit?  as I understand it, smart is really
> a means of reporting long-term disk status, which is optimally done
> by user-space.  even something exotic like failing over to a spare disk
> would clearly be best done in user-space.

You are right, the idea is to monitor the SMART info, ideally from when the drive is new, but at least over a period of time, so that a change in its behavior shows up.

> > I know the IDE code is already a nightmare, but it would be a nice feature.
> 
> what did you have in mind?

Well, nothing very exotic, just some sanity checks on the SMART data when the IDE and SCSI interfaces are probed for devices.  Something like:

* Device supports/does not support following SMART features:
  * General attributes
  * Vendor attributes
  * Error log
  * Selftest log
  * Drive info

* SMART is currently enabled/disabled

* Total power-on time is currently foo hours

* Warning if any of the following is excessive:

  * Last spin up time
  * Calibration retry count
  * UDMA CRC Error count
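
As a rough userspace sketch of the report outlined above (not kernel code), something like the following would do. It assumes the drive's SMART values have already been read into a plain mapping; the attribute keys and the warning limits are hypothetical and would need to be chosen per drive model.

```python
# Hypothetical warning limits; real values would depend on the drive model.
LIMITS = {
    "spin_up_time_ms": 10000,
    "calibration_retry_count": 0,
    "udma_crc_error_count": 0,
}


def smart_boot_report(enabled, power_on_hours, attrs, limits=LIMITS):
    """Build the list of report lines for one drive."""
    lines = [
        f"SMART is currently {'enabled' if enabled else 'disabled'}",
        f"Total power-on time is currently {power_on_hours} hours",
    ]
    for name, limit in limits.items():
        value = attrs.get(name)
        # Only warn about attributes the drive actually reported.
        if value is not None and value > limit:
            lines.append(f"Warning: {name} is {value} (limit {limit})")
    return lines


if __name__ == "__main__":
    for line in smart_boot_report(True, 629, {"udma_crc_error_count": 3}):
        print(line)
```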

> > S.M.A.R.T. is terribly under used at the moment - most people don't even
> > know what it is.  Infact, I could be wrong, but isn't a subset of
> > S.M.A.R.T. implemented on modern SCSI disks, too? 
> 
> I know that most people don't run it, but other than that, how is it 
> underused?

Well, I can't see any reason for *not* using it where available - who wouldn't appreciate a warning on boot up, 'oh, by the way, /dev/hda is about to die in a couple of days :-)'

> > Monitoring of any kind is always a nice feature to have...
> 
> certainly, though that doesn't mean it should move from userspace to
> kernel...

Agreed, there isn't any point in doing monitoring in kernelspace, but capabilities reporting, and sanity checks on boot might be useful.

John.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-06 17:46 ` mbs
@ 2002-09-06 20:32   ` Alan Cox
  2002-09-07 12:34     ` Daniel Egger
  0 siblings, 1 reply; 68+ messages in thread
From: Alan Cox @ 2002-09-06 20:32 UTC (permalink / raw)
  To: mbs; +Cc: DevilKin, linux-kernel

On Fri, 2002-09-06 at 18:46, mbs wrote:
> fdisk/format and reinstall but stick with a 2.4.19 or 2.4.19-ac kernel.
> 
> I would bet money that the problem is purely a .20-preX-acX thing.

It's a status entry direct from the drive. The drive says "uncorrectable
error", which means there is a media problem. It has nothing to do with
Linux.
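
For anyone wanting to read those register values themselves, here is a small sketch that decodes the two bytes from the log into the names shown in braces. The bit tables are written from memory of what the 2.4 IDE driver prints and should be treated as an assumption; the driver also distinguishes BadCRC from BadSector on bit 0x80, which this sketch does not attempt.

```python
# Status register bits, highest bit first (names as they appear in the log).
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

# Error register bits; 0x80 is shown as BadSector only (simplification).
ERROR_BITS = [
    (0x80, "BadSector"),
    (0x40, "UncorrectableError"),
    (0x10, "SectorIdNotFound"),
    (0x04, "DriveStatusError"),
    (0x02, "TrackZeroNotFound"),
    (0x01, "AddrMarkNotFound"),
]


def decode(value, table):
    """Return the names of all bits from table that are set in value."""
    return [name for mask, name in table if value & mask]


if __name__ == "__main__":
    print("status=0x51:", " ".join(decode(0x51, STATUS_BITS)))
    print("error=0x40:", " ".join(decode(0x40, ERROR_BITS)))
```

The values are reported by the drive itself, which is why the kernel version does not come into it.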


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-06 17:33   ` Daniel Egger
@ 2002-09-06 20:31     ` Alan Cox
  0 siblings, 0 replies; 68+ messages in thread
From: Alan Cox @ 2002-09-06 20:31 UTC (permalink / raw)
  To: Daniel Egger; +Cc: linux-kernel

On Fri, 2002-09-06 at 18:33, Daniel Egger wrote:
> On Fri, 2002-09-06 at 17:38, Alan Cox wrote:
> 
> > Get the IBM disk tools, upgrade the firmware and see what the ibm tools
> > have to say. IBM drives have had some problems with spontaneous bad
> > blocks appearing that go away with new firmware and a run of the disk
> > tools.
> 
> The "run of the disk tools" that does away with the badblocks is a
> lowlevel format; a tedious way to spent ones' time on a harddrive
> that will die anyway soon.

For the IBMs it depends what the problem is. Spontaneous bad blocks
appearing during power off appear to be fixed by the firmware update.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-06 17:22     ` jbradford
@ 2002-09-06 19:22       ` DevilKin
  2002-09-07  9:30         ` Anders Fugmann
  2002-09-07 12:31         ` Daniel Egger
  0 siblings, 2 replies; 68+ messages in thread
From: DevilKin @ 2002-09-06 19:22 UTC (permalink / raw)
  To: jbradford; +Cc: Linux Kernel Mailing List

On Friday 06 September 2002 19:22, jbradford@dial.pipex.com wrote:
> > OK, I downloaded that and installed it, but well, frankly, it shows me
> > very little useful stuff.
> >
> > Or i'm just not good at interpreting this.
>
> Post the output of smartctl -a /dev/hda? to me, and I'll tell you what I
> can, but it's best to monitor the stats from when the drive is new, (I.E.
> every drive you buy from now on :-) ).
>

Well, there were 21 ATA errors, and it showed 5 error blocks, with disk 'live' 
times of 629 hours.

Luckily I've been able to back up everything from the disk, and I'm running the 
DFT now. The tests showed bad sectors; I'm currently running a disk erase.

DK
-- 
	"What's that thing?"
	"Well, it's a highly technical, sensitive instrument we use in
computer repair.  Being a layman, you probably can't grasp exactly what
it does.  We call it a two-by-four."
		-- Jeff MacNelley, "Shoe"


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-06 16:14       ` Billy Harvey
  2002-09-06 16:41         ` Mike Dresser
@ 2002-09-06 18:00         ` jbradford
  2002-09-06 17:58           ` Mike Dresser
  1 sibling, 1 reply; 68+ messages in thread
From: jbradford @ 2002-09-06 18:00 UTC (permalink / raw)
  To: Billy Harvey; +Cc: linux-kernel, alan

> On Fri, 2002-09-06 at 11:42, Mike Dresser wrote:
> > On 6 Sep 2002, Alan Cox wrote:
> > 
> > > On Fri, 2002-09-06 at 16:26, Mike Dresser wrote:
> > > > eBAY, and buy yourself a new drive.  You can pickup 80 gig drives for
> > > > around 80 bucks nowadays.  I used to recommend Maxtors, until they said
> > > > they're cutting their warranty to one year from three.  I don't know what
> > > > to use anymore.
> > >
> > > At current drive density and reliabilities - raid. Software raid setups
> > > are so cheap there is little point not running RAID on IDE nowdays
> > >
> > Well, I was looking more on the side of the Windows PC's here at the
> > office, it's a bit expensive to start running raid on those.
> > 
> > Mike
> 
> Well, I haven't examined this empirically, but as the quantity of disk
> drives in an organization continues increasing, so does the probability
> of disk failure, any one of which can mean lost time/money, etc.  Drive
> reliability is likely not increasing at the same rate that density is,
> so the likelihood of lost data is probably increasing.  Since LAN speeds
> continue to increase, it might start making sense now in clusters of
> more than a few machines to make each machine less reliant on its own
> disk storage (to the point of not at all other than big swap space) and
> use the LAN more.  On the LAN put the money into a quality shared
> resource - a heavy duty UPS'd, etc. RAID system.  Especially if a RAID
> system is as easy to build/maintain/use as Alan alludes to (don't know -
> never built one).

A RAID array isn't a universal solution to all disk related problems, though, is it?  I mean, we were talking about buggy firmware earlier on in this thread - if a drive which is part of an array returns corrupted data, without acknowledging it, then you'll read corrupted data from the RAID array.  Also, an array of unreliable drives doesn't make a reliable array.

Now that the Smart Suite S.M.A.R.T. applications are unmaintained, would there be any chance of implementing S.M.A.R.T. in the kernel IDE code?  I know the IDE code is already a nightmare, but it would be a nice feature.  S.M.A.R.T. is terribly underused at the moment - most people don't even know what it is.  In fact, I could be wrong, but isn't a subset of S.M.A.R.T. implemented on modern SCSI disks, too?

Monitoring of any kind is always a nice feature to have...

John.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-06 18:00         ` jbradford
@ 2002-09-06 17:58           ` Mike Dresser
  0 siblings, 0 replies; 68+ messages in thread
From: Mike Dresser @ 2002-09-06 17:58 UTC (permalink / raw)
  To: jbradford; +Cc: Billy Harvey, linux-kernel, alan

On Fri, 6 Sep 2002 jbradford@dial.pipex.com wrote:
> Infact, I could be wrong, but isn't a subset of S.M.A.R.T. implemented
on modern SCSI disks, too?

Yes.

Mike


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-06 15:13 DevilKin
                   ` (4 preceding siblings ...)
  2002-09-06 15:44 ` mbs
@ 2002-09-06 17:46 ` mbs
  2002-09-06 20:32   ` Alan Cox
  2002-09-07  7:02 ` Andre Hedrick
  6 siblings, 1 reply; 68+ messages in thread
From: mbs @ 2002-09-06 17:46 UTC (permalink / raw)
  To: DevilKin, linux-kernel

fdisk/format and reinstall but stick with a 2.4.19 or 2.4.19-ac kernel.

I would bet money that the problem is purely a .20-preX-acX thing.

run it a while on 2.4.19 to verify that life is good.  then build a new 
2.4.20-pre1-ac3 and boot it. I bet that within minutes of normal use, you 
will have a problem.

(I have done this loop 3 times.)

On Friday 06 September 2002 11:13, DevilKin wrote:
> Hello kernel people,
>
> Kernel running: 2.4.20-pre1ac3 or -pre5ac2 (same under both)
>
> Today I discovered a stale copy of qt-3.0.3 lying about on my disk. When I
> tried to delete it, this started showing up in my log files:
>
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=7072862,
> sector=1803472
> end_request: I/O error, dev 03:06 (hda), sector 1803472
> vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat
> data of [612671 612672 0x0 SD]
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=7072862,
> sector=1803472
> end_request: I/O error, dev 03:06 (hda), sector 1803472
> vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat
> data of [612671 612677 0x0 SD]
>
> and rm just reported me 'Permission denied'.
>
> I've looked up these errors on the net, and as far as i can tell it means
> that the drive has some bad sectors at the given addresses and that it will
> probably die on me sooner or later.
>
> Can someone either confirm this to me or tell me what to do to fix it?
>
> The drive involved is an IBM-DTLA-307060, which has served me without
> problems now for about 2 years.
>
> Thanks!
>
> DK

-- 
/**************************************************
**   Mark Salisbury       ||      mbs@mc.com     **
** If you would like to sponsor me for the       **
** Mass Getaway, a 150 mile bicycle ride to for  **
** MS, contact me to donate by cash or check or  **
** click the link below to donate by credit card **
**************************************************/
https://www.nationalmssociety.org/pledge/pledge.asp?participantid=86736

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-06 15:38 ` Alan Cox
@ 2002-09-06 17:33   ` Daniel Egger
  2002-09-06 20:31     ` Alan Cox
  0 siblings, 1 reply; 68+ messages in thread
From: Daniel Egger @ 2002-09-06 17:33 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 992 bytes --]

On Fri, 2002-09-06 at 17:38, Alan Cox wrote:

> Get the IBM disk tools, upgrade the firmware and see what the ibm tools
> have to say. IBM drives have had some problems with spontaneous bad
> blocks appearing that go away with new firmware and a run of the disk
> tools.

The "run of the disk tools" that does away with the bad blocks is a
low-level format; a tedious way to spend one's time on a hard drive
that will soon die anyway.

> More importantly if thats the problem with the firmware update
> they dont come back until the drive really dies.

Right, which is probably shortly after. Especially with a two-year-old
drive, I wouldn't go to all the trouble of backing up 60GB of data,
low-level formatting the drive, restoring the data and hoping the
problems are gone; instead I'd rather get a new drive within the
warranty and cross my fingers.

BTW: I went the backup route exactly once, and the drive came back at me
with new errors two weeks later.

-- 
Servus,
       Daniel

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-06 15:26 ` Mike Dresser
  2002-09-06 15:39   ` Alan Cox
  2002-09-06 15:44   ` Richard B. Johnson
@ 2002-09-06 17:28   ` Daniel Egger
  2 siblings, 0 replies; 68+ messages in thread
From: Daniel Egger @ 2002-09-06 17:28 UTC (permalink / raw)
  To: Mike Dresser; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 785 bytes --]

On Fri, 2002-09-06 at 17:26, Mike Dresser wrote:

> Make backups immediately.  Run ibm's DFT tool, get the code to RMA this
> thing back to IBM.  Sell the replacement they send you to a sucker on
> eBAY, and buy yourself a new drive.  You can pickup 80 gig drives for
> around 80 bucks nowadays.  I used to recommend Maxtors, until they said
> they're cutting their warranty to one year from three.  I don't know what
> to use anymore.

I did exactly this and bought an 80gig Maxtor for EUR 100 (don't know why
it would be so much cheaper at your place, but anyway). Unfortunately
the drive was broken right away; let's see how long the replacement
drive keeps running...

Seems like every major brand is just producing crap nowadays....
 
-- 
Servus,
       Daniel

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-06 15:55   ` DevilKin
@ 2002-09-06 17:22     ` jbradford
  2002-09-06 19:22       ` DevilKin
  2002-09-07  7:08     ` Andre Hedrick
  1 sibling, 1 reply; 68+ messages in thread
From: jbradford @ 2002-09-06 17:22 UTC (permalink / raw)
  To: DevilKin; +Cc: linux-kernel

> OK, I downloaded that and installed it, but well, frankly, it shows me very 
> little useful stuff.
> 
> Or i'm just not good at interpreting this.

Post the output of smartctl -a /dev/hda? to me, and I'll tell you what I can, but it's best to monitor the stats from when the drive is new, (I.E. every drive you buy from now on :-) ).

John.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-06 16:14       ` Billy Harvey
@ 2002-09-06 16:41         ` Mike Dresser
  2002-09-06 18:00         ` jbradford
  1 sibling, 0 replies; 68+ messages in thread
From: Mike Dresser @ 2002-09-06 16:41 UTC (permalink / raw)
  To: Billy Harvey; +Cc: Linux Kernel

On 6 Sep 2002, Billy Harvey wrote:

> use the LAN more.  On the LAN put the money into a quality shared
> resource - a heavy duty UPS'd, etc. RAID system.  Especially if a RAID
> system is as easy to build/maintain/use as Alan alludes to (don't know -
> never built one).
>
> Billy

And don't forget the cost of cluebats to beat the users over the head
with.  I've been trying for 3 years to get people to save their documents
to the H: drive.  Still find stuff stored wherever they feel like storing
it.

So each facility has a backup server that nightly grabs their entire
drive, gzips it, and then dumps it to a DDS-4 tape.  It also keeps X days
of daily full backups, and X weeks as well.

Aside from Windows filesharing being so slow (1500kps via smbtar is average
here), it works quite nicely.  Even with a P4/2.53, I still can't get
more than the 1500kps that a p133 is capable of.  All the P4 gives me is
the ability to gzip -9 or even bzip2 the files, instead of the gzip -1
that the p133 is capable of in real time.

Mike


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-06 15:44   ` Richard B. Johnson
@ 2002-09-06 16:19     ` Craig Ruff
  0 siblings, 0 replies; 68+ messages in thread
From: Craig Ruff @ 2002-09-06 16:19 UTC (permalink / raw)
  To: linux-kernel

On Fri, Sep 06, 2002 at 11:44:52AM -0400, Richard B. Johnson wrote:
>              IBM DeathStar 75gxp.
> 
> Well put. Also, don't turn off this drive --ever. If possible, back-up
> to something on a network, not to anything on the IDE bus.

I had one of these drives fail recently with the dreaded "clicking of death"
sounds (while it was retrying reads).  What I discovered, while backing up
the disk, is that continuing sequential reads past the bad sectors without
an intervening operation would eventually cause the drive to get into a
messed up state where it erroneously reported the following good sectors
as bad.

My strategy to recover the good data was to read sequentially until I 
got an error, then explicitly seek to the next good sector and continue
from there.  This enabled me to copy the good data.
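
A minimal sketch of that read-until-error, then-seek strategy is below. It is not the tool I used, just an illustration: the function name, the zero-fill placeholder for unreadable sectors and the 512-byte sector size are all assumptions, and the source should be opened unbuffered (for example open(path, "rb", buffering=0)) so that read-ahead does not blur which sector actually failed.

```python
def rescue_copy(src, dst, total_sectors, sector_size=512):
    """Copy src to dst one sector at a time; return the list of bad sector numbers."""
    bad = []
    sector = 0
    src.seek(0)
    while sector < total_sectors:
        try:
            data = src.read(sector_size)  # plain sequential read
        except OSError:
            bad.append(sector)
            dst.write(b"\0" * sector_size)  # keep offsets aligned in the copy
            sector += 1
            src.seek(sector * sector_size)  # explicit seek past the bad sector
            continue
        if not data:
            break  # source ended earlier than expected
        dst.write(data)
        sector += 1
    return bad
```

Reading a sector at a time is slow, but it keeps the position of each failure unambiguous; larger reads with a per-sector fallback on error would be the obvious refinement.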


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-06 15:42     ` Mike Dresser
@ 2002-09-06 16:14       ` Billy Harvey
  2002-09-06 16:41         ` Mike Dresser
  2002-09-06 18:00         ` jbradford
  0 siblings, 2 replies; 68+ messages in thread
From: Billy Harvey @ 2002-09-06 16:14 UTC (permalink / raw)
  To: Linux Kernel

On Fri, 2002-09-06 at 11:42, Mike Dresser wrote:
> On 6 Sep 2002, Alan Cox wrote:
> 
> > On Fri, 2002-09-06 at 16:26, Mike Dresser wrote:
> > > eBAY, and buy yourself a new drive.  You can pickup 80 gig drives for
> > > around 80 bucks nowadays.  I used to recommend Maxtors, until they said
> > > they're cutting their warranty to one year from three.  I don't know what
> > > to use anymore.
> >
> > At current drive density and reliabilities - raid. Software raid setups
> > are so cheap there is little point not running RAID on IDE nowdays
> >
> Well, I was looking more on the side of the Windows PC's here at the
> office, it's a bit expensive to start running raid on those.
> 
> Mike

Well, I haven't examined this empirically, but as the quantity of disk
drives in an organization continues increasing, so does the probability
of disk failure, any one of which can mean lost time/money, etc.  Drive
reliability is likely not increasing at the same rate that density is,
so the likelihood of lost data is probably increasing.  Since LAN speeds
continue to increase, it might start making sense now in clusters of
more than a few machines to make each machine less reliant on its own
disk storage (to the point of not at all other than big swap space) and
use the LAN more.  On the LAN put the money into a quality shared
resource - a heavy duty UPS'd, etc. RAID system.  Especially if a RAID
system is as easy to build/maintain/use as Alan alludes to (don't know -
never built one).
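
To put a rough number on the first point: if each drive independently has probability p of failing in some period, the chance that at least one of n drives fails is 1 - (1 - p)^n. The sketch below assumes independence and uses a made-up 5% per-drive figure purely for illustration.

```python
def p_any_failure(p_single, n_drives):
    """Probability that at least one of n independent drives fails."""
    return 1.0 - (1.0 - p_single) ** n_drives


if __name__ == "__main__":
    # Hypothetical per-drive failure probability of 5% over the period.
    for n in (1, 10, 100):
        print(f"{n:3d} drives: {p_any_failure(0.05, n):.3f}")
```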

Billy


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-06 15:36 ` jbradford
@ 2002-09-06 15:55   ` DevilKin
  2002-09-06 17:22     ` jbradford
  2002-09-07  7:08     ` Andre Hedrick
  0 siblings, 2 replies; 68+ messages in thread
From: DevilKin @ 2002-09-06 15:55 UTC (permalink / raw)
  To: jbradford; +Cc: Linux Kernel Mailing List

On Friday 06 September 2002 17:36, jbradford@dial.pipex.com wrote:
> > I've looked up these errors on the net, and as far as i can tell it means
> > that the drive has some bad sectors at the given addresses and that it
> > will probably die on me sooner or later.
> >
> > Can someone either confirm this to me or tell me what to do to fix it?
> >
> > The drive involved is an IBM-DTLA-307060, which has served me without
> > problems now for about 2 years.
>
> Have a look at:
>
> http://csl.cse.ucsc.edu/smart.shtml
>
> there you will find software for interrogating and monitoring the
> S.M.A.R.T. data available from your drive.  It's a little late to start
> monitoring it, if the drive is already dying, but if, for example, it shows
> a lot of re-allocated sectors, or spin retries, you'll know something is
> wrong.
>

OK, I downloaded that and installed it, but well, frankly, it shows me very 
little useful stuff.

Or I'm just not good at interpreting this.

DK

-- 
"I gained nothing at all from Supreme Enlightenment, and for that very
reason it is called Supreme Enlightenment."
		-- Gotama Buddha


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-06 15:37 ` mbs
@ 2002-09-06 15:54   ` Alan Cox
  0 siblings, 0 replies; 68+ messages in thread
From: Alan Cox @ 2002-09-06 15:54 UTC (permalink / raw)
  To: mbs; +Cc: DevilKin, linux-kernel

On Fri, 2002-09-06 at 16:37, mbs wrote:
> same problem I was having with 2.4.20-pre4-ac2-preempt.

I beg to differ. He has a dying disk; you have some weird CRC and other
goings on.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-06 15:26 ` Mike Dresser
  2002-09-06 15:39   ` Alan Cox
@ 2002-09-06 15:44   ` Richard B. Johnson
  2002-09-06 16:19     ` Craig Ruff
  2002-09-06 17:28   ` Daniel Egger
  2 siblings, 1 reply; 68+ messages in thread
From: Richard B. Johnson @ 2002-09-06 15:44 UTC (permalink / raw)
  To: Mike Dresser; +Cc: DevilKin, linux-kernel

On Fri, 6 Sep 2002, Mike Dresser wrote:

> > The drive involved is an IBM-DTLA-307060, which has served me without problems
> > now for about 2 years.
> 
> IBM DeathStar 75gxp.
> 
> One of the worst hard drives ever made.  It's quite likely it's failed,
> and in fact, two years is pretty impressive out of one of these.
> 
> Make backups immediately.  Run ibm's DFT tool, get the code to RMA this
> thing back to IBM.  Sell the replacement they send you to a sucker on
> eBAY, and buy yourself a new drive.  You can pickup 80 gig drives for
> around 80 bucks nowadays.  I used to recommend Maxtors, until they said
> they're cutting their warranty to one year from three.  I don't know what
> to use anymore.
> 
> Mike
> 

             IBM DeathStar 75gxp.

Well put. Also, don't turn off this drive --ever. If possible, back-up
to something on a network, not to anything on the IDE bus. If you don't
have anything available, borrow something from work and make a temporary
LAN. With bad sectors and a relocation list already full, this drive
will seize the IDE bus and never let go once you trip it into failure.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
The US military has given us many words, FUBAR, SNAFU, now ENRON.
Yes, top management were graduates of West Point and Annapolis.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-06 15:13 DevilKin
                   ` (3 preceding siblings ...)
  2002-09-06 15:38 ` Alan Cox
@ 2002-09-06 15:44 ` mbs
  2002-09-06 17:46 ` mbs
  2002-09-07  7:02 ` Andre Hedrick
  6 siblings, 0 replies; 68+ messages in thread
From: mbs @ 2002-09-06 15:44 UTC (permalink / raw)
  To: DevilKin, linux-kernel

forgot to say: my drive worked fine with 2.4.19-pre3-ac5-preempt before the 
move to the -20 kernel.

also worked fine after a fdisk/reinstall and continued to work fine till the 
first time I booted on a (freshly built) -20-ac version.

I thought it was the drive so I replaced it with a brand new drive, and had 
_EXACTLY_ the same failure pattern.

------

same problem I was having with 2.4.20-pre4-ac2-preempt.

alan didn't want to hear it from me due to the -preempt

my system was e7500 chipset, dual xeon, WD 40g drive, ext2 or ext3.

from this we can glean: preempt not a factor, HD manufacturer not a factor, 
FS not a factor.  don't know what chipset you are using.

I was also getting BadCRC errors.

On Friday 06 September 2002 11:13, DevilKin wrote:
> Hello kernel people,
>
> Kernel running: 2.4.20-pre1ac3 or -pre5ac2 (same under both)
>
> Today I discovered a stale copy of qt-3.0.3 lying about on my disk. When I
> tried to delete it, this started showing up in my log files:
>
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=7072862,
> sector=1803472
> end_request: I/O error, dev 03:06 (hda), sector 1803472
> vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat
> data of [612671 612672 0x0 SD]
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=7072862,
> sector=1803472
> end_request: I/O error, dev 03:06 (hda), sector 1803472
> vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat
> data of [612671 612677 0x0 SD]
>
> and rm just reported me 'Permission denied'.
>
> I've looked up these errors on the net, and as far as i can tell it means
> that the drive has some bad sectors at the given addresses and that it will
> probably die on me sooner or later.
>
> Can someone either confirm this to me or tell me what to do to fix it?
>
> The drive involved is an IBM-DTLA-307060, which has served me without
> problems now for about 2 years.
>
> Thanks!
>
> DK

-- 
/**************************************************
**   Mark Salisbury       ||      mbs@mc.com     **
** If you would like to sponsor me for the       **
** Mass Getaway, a 150 mile bicycle ride to for  **
** MS, contact me to donate by cash or check or  **
** click the link below to donate by credit card **
**************************************************/
https://www.nationalmssociety.org/pledge/pledge.asp?participantid=86736

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-06 15:39   ` Alan Cox
@ 2002-09-06 15:42     ` Mike Dresser
  2002-09-06 16:14       ` Billy Harvey
  0 siblings, 1 reply; 68+ messages in thread
From: Mike Dresser @ 2002-09-06 15:42 UTC (permalink / raw)
  To: Alan Cox; +Cc: DevilKin, linux-kernel

On 6 Sep 2002, Alan Cox wrote:

> On Fri, 2002-09-06 at 16:26, Mike Dresser wrote:
> > eBAY, and buy yourself a new drive.  You can pickup 80 gig drives for
> > around 80 bucks nowadays.  I used to recommend Maxtors, until they said
> > they're cutting their warranty to one year from three.  I don't know what
> > to use anymore.
>
> At current drive density and reliabilities - raid. Software raid setups
> are so cheap there is little point not running RAID on IDE nowdays
>
Well, I was looking more on the side of the Windows PC's here at the
office, it's a bit expensive to start running raid on those.

Mike


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-06 15:26 ` Mike Dresser
@ 2002-09-06 15:39   ` Alan Cox
  2002-09-06 15:42     ` Mike Dresser
  2002-09-06 15:44   ` Richard B. Johnson
  2002-09-06 17:28   ` Daniel Egger
  2 siblings, 1 reply; 68+ messages in thread
From: Alan Cox @ 2002-09-06 15:39 UTC (permalink / raw)
  To: Mike Dresser; +Cc: DevilKin, linux-kernel

On Fri, 2002-09-06 at 16:26, Mike Dresser wrote:
> eBAY, and buy yourself a new drive.  You can pickup 80 gig drives for
> around 80 bucks nowadays.  I used to recommend Maxtors, until they said
> they're cutting their warranty to one year from three.  I don't know what
> to use anymore.

At current drive density and reliabilities - raid. Software raid setups
are so cheap there is little point not running RAID on IDE nowadays.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-06 15:13 DevilKin
                   ` (2 preceding siblings ...)
  2002-09-06 15:37 ` mbs
@ 2002-09-06 15:38 ` Alan Cox
  2002-09-06 17:33   ` Daniel Egger
  2002-09-06 15:44 ` mbs
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 68+ messages in thread
From: Alan Cox @ 2002-09-06 15:38 UTC (permalink / raw)
  To: DevilKin; +Cc: linux-kernel

On Fri, 2002-09-06 at 16:13, DevilKin wrote:
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=7072862, 
> sector=1803472

That certainly looks like a drive error.

> The drive involved is an IBM-DTLA-307060, which has served me without problems 
> now for about 2 years.

Get the IBM disk tools, upgrade the firmware and see what the IBM tools
have to say. IBM drives have had some problems with spontaneous bad
blocks appearing that go away with new firmware and a run of the disk
tools. More importantly, if that's the problem, then with the firmware
update they don't come back until the drive really dies.



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: ide drive dying?
  2002-09-06 15:13 DevilKin
  2002-09-06 15:26 ` Mike Dresser
  2002-09-06 15:36 ` jbradford
@ 2002-09-06 15:37 ` mbs
  2002-09-06 15:54   ` Alan Cox
  2002-09-06 15:38 ` Alan Cox
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 68+ messages in thread
From: mbs @ 2002-09-06 15:37 UTC (permalink / raw)
  To: DevilKin, linux-kernel

same problem I was having with 2.4.20-pre4-ac2-preempt.

alan didn't want to hear it from me due to the -preempt

my system was e7500 chipset, dual xeon, WD 40g drive, ext2 or ext3.

from this we can glean: preempt not a factor, HD manufacturer not a factor, 
FS not a factor.  don't know what chipset you are using.

I was also getting BadCRC errors.

On Friday 06 September 2002 11:13, DevilKin wrote:
> Hello kernel people,
>
> Kernel running: 2.4.20-pre1ac3 or -pre5ac2 (same under both)
>
> Today I discovered a stale copy of qt-3.0.3 lying about on my disk. When I
> tried to delete it, this started showing up in my log files:
>
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=7072862,
> sector=1803472
> end_request: I/O error, dev 03:06 (hda), sector 1803472
> vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat
> data of [612671 612672 0x0 SD]
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=7072862,
> sector=1803472
> end_request: I/O error, dev 03:06 (hda), sector 1803472
> vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat
> data of [612671 612677 0x0 SD]
>
> and rm just reported me 'Permission denied'.
>
> I've looked up these errors on the net, and as far as i can tell it means
> that the drive has some bad sectors at the given addresses and that it will
> probably die on me sooner or later.
>
> Can someone either confirm this to me or tell me what to do to fix it?
>
> The drive involved is an IBM-DTLA-307060, which has served me without
> problems now for about 2 years.
>
> Thanks!
>
> DK

-- 
/**************************************************
**   Mark Salisbury       ||      mbs@mc.com     **
** If you would like to sponsor me for the       **
** Mass Getaway, a 150 mile bicycle ride to for  **
** MS, contact me to donate by cash or check or  **
** click the link below to donate by credit card **
**************************************************/
https://www.nationalmssociety.org/pledge/pledge.asp?participantid=86736


* Re: ide drive dying?
  2002-09-06 15:13 DevilKin
  2002-09-06 15:26 ` Mike Dresser
@ 2002-09-06 15:36 ` jbradford
  2002-09-06 15:55   ` DevilKin
  2002-09-06 15:37 ` mbs
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 68+ messages in thread
From: jbradford @ 2002-09-06 15:36 UTC (permalink / raw)
  To: DevilKin; +Cc: linux-kernel

> I've looked up these errors on the net, and as far as i can tell it means
> that the drive has some bad sectors at the given addresses and that it will
> probably die on me sooner or later.
> 
> Can someone either confirm this to me or tell me what to do to fix it?
> 
> The drive involved is an IBM-DTLA-307060, which has served me without
> problems now for about 2 years.

Have a look at:

http://csl.cse.ucsc.edu/smart.shtml

There you will find software for interrogating and monitoring the
S.M.A.R.T. data available from your drive.  It's a little late to start
monitoring if the drive is already dying, but if, for example, it shows
a lot of re-allocated sectors or spin retries, you'll know something is
wrong.
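
As a rough illustration of the check described above, the sketch below
flags the two counters mentioned (re-allocated sectors and spin retries).
It assumes the commonly used attribute IDs 5 and 10; the function name,
the dictionary layout and the sample values are hypothetical, and reading
the real values from the drive is left to the tools at the URL above.

```python
# Attribute IDs 5 and 10 are the commonly used ones for these counters;
# the names and the sample data are assumptions made for this sketch.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    10: "Spin_Retry_Count",
}


def worrying_attributes(raw_values, threshold=0):
    """Return the names of watched attributes whose raw count exceeds threshold.

    raw_values maps a S.M.A.R.T. attribute ID to its raw counter value.
    Attributes missing from raw_values are treated as zero.
    """
    return [
        name
        for attr_id, name in sorted(WATCHED.items())
        if raw_values.get(attr_id, 0) > threshold
    ]


if __name__ == "__main__":
    # Invented sample readings, not taken from any real drive.
    sample = {5: 12, 10: 0, 194: 38}
    print(worrying_attributes(sample))
```

A non-zero count is not proof of imminent failure, so the threshold is
left as a parameter rather than fixed.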

John.


* Re: ide drive dying?
  2002-09-06 15:13 DevilKin
@ 2002-09-06 15:26 ` Mike Dresser
  2002-09-06 15:39   ` Alan Cox
                     ` (2 more replies)
  2002-09-06 15:36 ` jbradford
                   ` (5 subsequent siblings)
  6 siblings, 3 replies; 68+ messages in thread
From: Mike Dresser @ 2002-09-06 15:26 UTC (permalink / raw)
  To: DevilKin; +Cc: linux-kernel

> The drive involved is an IBM-DTLA-307060, which has served me without problems
> now for about 2 years.

IBM DeathStar 75gxp.

One of the worst hard drives ever made.  It's quite likely it's failed,
and in fact, two years is pretty impressive out of one of these.

Make backups immediately.  Run ibm's DFT tool, get the code to RMA this
thing back to IBM.  Sell the replacement they send you to a sucker on
eBAY, and buy yourself a new drive.  You can pickup 80 gig drives for
around 80 bucks nowadays.  I used to recommend Maxtors, until they said
they're cutting their warranty to one year from three.  I don't know what
to use anymore.

Mike



* ide drive dying?
@ 2002-09-06 15:13 DevilKin
  2002-09-06 15:26 ` Mike Dresser
                   ` (6 more replies)
  0 siblings, 7 replies; 68+ messages in thread
From: DevilKin @ 2002-09-06 15:13 UTC (permalink / raw)
  To: linux-kernel

Hello kernel people,

Kernel running: 2.4.20-pre1ac3 or -pre5ac2 (same under both)

Today I discovered a stale copy of qt-3.0.3 lying about on my disk. When I 
tried to delete it, this started showing up in my log files:

hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=7072862, 
sector=1803472
end_request: I/O error, dev 03:06 (hda), sector 1803472
vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data 
of [612671 612672 0x0 SD]
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=7072862, 
sector=1803472
end_request: I/O error, dev 03:06 (hda), sector 1803472
vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data 
of [612671 612677 0x0 SD]

and rm just reported me 'Permission denied'.

I've looked up these errors on the net, and as far as i can tell it means that 
the drive has some bad sectors at the given addresses and that it will 
probably die on me sooner or later.
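
For reference, the register values in the log above can be decoded by
hand. The sketch below is only an illustration: the bit labels are the
ones the 2.4 IDE driver prints, recalled from memory rather than copied
from the source, and the helper names (decode_status, decode_error,
ide0_device_name) are made up for this example.

```python
# Bit-to-label tables for the ATA status and error registers. The labels
# mirror what the 2.4 IDE driver prints; treat them as an assumption.
STATUS_BITS = [
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

ERROR_BITS = [
    (0x40, "UncorrectableError"),
    (0x10, "SectorIdNotFound"),
    (0x02, "TrackZeroNotFound"),
    (0x01, "AddrMarkNotFound"),
]


def decode_status(status):
    """Return the labels for the bits set in the status register."""
    if status & 0x80:
        # While Busy is set the other bits are not meaningful.
        return ["Busy"]
    return [label for bit, label in STATUS_BITS if status & bit]


def decode_error(error):
    """Return the labels for the bits set in the error register."""
    labels = []
    aborted = bool(error & 0x04)
    if aborted:
        labels.append("DriveStatusError")
    if error & 0x80:
        # Bit 7 reads as a CRC error when the command was also aborted,
        # and as a bad sector otherwise.
        labels.append("BadCRC" if aborted else "BadSector")
    labels.extend(label for bit, label in ERROR_BITS if error & bit)
    return labels


def ide0_device_name(dev):
    """Map a 'major:minor' pair (hex, as logged) on major 3 to a device name."""
    major, minor = (int(part, 16) for part in dev.split(":"))
    if major != 3:
        raise ValueError("only major 3 (first IDE interface) is handled here")
    disk = "hda" if minor < 64 else "hdb"
    partition = minor & 0x3F
    return disk if partition == 0 else f"{disk}{partition}"


if __name__ == "__main__":
    print(decode_status(0x51))
    print(decode_error(0x40))
    print(ide0_device_name("03:06"))
```

Read this way, status=0x51 is DriveReady, SeekComplete and Error, error=0x40
is an uncorrectable read error, and dev 03:06 is partition 6 on hda.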

Can someone either confirm this to me or tell me what to do to fix it?

The drive involved is an IBM-DTLA-307060, which has served me without problems 
now for about 2 years.

Thanks!

DK
-- 
If all the Chinese simultaneously jumped into the Pacific off a 10 foot
platform erected 10 feet off their coast, it would cause a tidal wave
that would destroy everything in this country west of Nebraska.



end of thread, other threads:[~2002-09-10 20:18 UTC | newest]

Thread overview: 68+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-09-06 20:40 RE:Re: ide drive dying? Hell.Surfers
2002-09-06 21:01 ` Alan Cox
2002-09-06 21:45   ` jbradford
2002-09-07  1:02   ` Jason L Tibbitts III
2002-09-07 12:03   ` Daniel Egger
2002-09-06 21:07 ` DevilKin
  -- strict thread matches above, loose matches on Subject: below --
2002-09-08  0:02 Dieter Nützel
2002-09-08 17:42 ` jbradford
2002-09-09  1:31   ` Nuitari
2002-09-08 19:27     ` Ed Sweetman
2002-09-08 20:11       ` jbradford
2002-09-09  2:37         ` Nuitari
2002-09-09 12:26   ` Joachim Breuer
2002-09-09 18:17   ` Thunder from the hill
2002-09-09 18:55     ` Mike Dresser
2002-09-10 12:48   ` Ookhoi
2002-09-10 13:59     ` Mike Dresser
2002-09-10 14:56       ` jbradford
2002-09-10 15:21         ` Mike Dresser
2002-09-10 15:29           ` Larry McVoy
2002-09-10 20:21     ` Andre Hedrick
     [not found] <Pine.LNX.4.33.0209062017230.14523-100000@coffee.psychology.mcmaster.ca>
2002-09-07  6:09 ` jbradford
2002-09-06 15:13 DevilKin
2002-09-06 15:26 ` Mike Dresser
2002-09-06 15:39   ` Alan Cox
2002-09-06 15:42     ` Mike Dresser
2002-09-06 16:14       ` Billy Harvey
2002-09-06 16:41         ` Mike Dresser
2002-09-06 18:00         ` jbradford
2002-09-06 17:58           ` Mike Dresser
2002-09-06 15:44   ` Richard B. Johnson
2002-09-06 16:19     ` Craig Ruff
2002-09-06 17:28   ` Daniel Egger
2002-09-06 15:36 ` jbradford
2002-09-06 15:55   ` DevilKin
2002-09-06 17:22     ` jbradford
2002-09-06 19:22       ` DevilKin
2002-09-07  9:30         ` Anders Fugmann
2002-09-07  9:37           ` Udo A. Steinberg
2002-09-07 15:54           ` Holger Lubitz
2002-09-07 16:31             ` jbradford
2002-09-07 12:31         ` Daniel Egger
2002-09-07 13:08           ` jbradford
2002-09-07 13:50             ` Daniel Egger
2002-09-07 15:02               ` jbradford
2002-09-07 20:19                 ` Daniel Egger
2002-09-07 20:41                   ` jbradford
2002-09-07 21:41                     ` Alan Cox
2002-09-07 21:41                   ` Alan Cox
2002-09-07 22:00                     ` jbradford
2002-09-07 23:19                       ` David Forrest
2002-09-08 10:56                         ` Henning P. Schmiedehausen
2002-09-08 14:14                           ` jbradford
2002-09-09 21:59                             ` Alan Cox
2002-09-07 22:05                   ` Andre Hedrick
2002-09-07  7:08     ` Andre Hedrick
2002-09-06 15:37 ` mbs
2002-09-06 15:54   ` Alan Cox
2002-09-06 15:38 ` Alan Cox
2002-09-06 17:33   ` Daniel Egger
2002-09-06 20:31     ` Alan Cox
2002-09-06 15:44 ` mbs
2002-09-06 17:46 ` mbs
2002-09-06 20:32   ` Alan Cox
2002-09-07 12:34     ` Daniel Egger
2002-09-07  7:02 ` Andre Hedrick
2002-09-07  7:42   ` jbradford
2002-09-07  7:50     ` Andre Hedrick
