linux-kernel.vger.kernel.org archive mirror
* RE: Blockbusting news, results get worse
@ 2003-10-27 17:43 Mudama, Eric
  2003-10-27 18:48 ` Hans Reiser
  0 siblings, 1 reply; 26+ messages in thread
From: Mudama, Eric @ 2003-10-27 17:43 UTC (permalink / raw)
  To: 'Norman Diamond', 'Hans Reiser ',
	'Wes Janzen ', 'Rogier Wolff ',
	'John Bradford ',
	linux-kernel, nikita, 'Pavel Machek ',
	'Justin Cormack ', 'Vitaly Fertman ',
	'Krzysztof Halasa '



> -----Original Message-----
> Yeah, I need to deliberately damage one block in order to 
> test the firmware, but I don't want to damage multiple
> blocks and use up the reallocation space.  I am a home
> user, even if I also do programming at work, even if I
> also volunteer one day each weekend to test Linux.  How can I 
> arrange to damage one block on a disk?

Um... you can do that by shorting various pins on the PCBA if you have
access to an oscilloscope, or put it under heavy write workload and remove
power.

A modern drive has many thousands of reassign sectors available, so I don't
think either of these events will cause a permanent issue.

I'd also suggest reading older ATA specs, since some vendors still support
older commands that were capable of various weirdness that might be useful.

--eric



* Re: Blockbusting news, results get worse
  2003-10-27 17:43 Blockbusting news, results get worse Mudama, Eric
@ 2003-10-27 18:48 ` Hans Reiser
  2003-10-27 19:47   ` Jeff Garzik
  0 siblings, 1 reply; 26+ messages in thread
From: Hans Reiser @ 2003-10-27 18:48 UTC (permalink / raw)
  To: Mudama, Eric
  Cc: 'Norman Diamond', 'Wes Janzen ',
	'Rogier Wolff ', 'John Bradford ',
	linux-kernel, nikita, 'Pavel Machek ',
	'Justin Cormack ', 'Vitaly Fertman ',
	'Krzysztof Halasa '

Mudama, Eric wrote:

>
> or put it under heavy write workload and remove
>power.
>
Can you tell us more about what really happens to disk drives when the 
power is cut while a block is being written?  We engage in a lot of 
uninformed speculation, and it would be nice if someone who really knows 
told us....

Do drives have enough capacitance under normal conditions to finish 
writing the block?  Does ECC on the drive detect that the block was bad 
and so we don't need to detect it in the FS?

-- 
Hans




* Re: Blockbusting news, results get worse
  2003-10-27 18:48 ` Hans Reiser
@ 2003-10-27 19:47   ` Jeff Garzik
  2003-10-27 20:03     ` John Bradford
  2003-10-28  1:21     ` Pavel Machek
  0 siblings, 2 replies; 26+ messages in thread
From: Jeff Garzik @ 2003-10-27 19:47 UTC (permalink / raw)
  To: Hans Reiser
  Cc: Mudama, Eric, 'Norman Diamond', 'Wes Janzen ',
	'Rogier Wolff ', 'John Bradford ',
	linux-kernel, nikita, 'Pavel Machek ',
	'Justin Cormack ', 'Vitaly Fertman ',
	'Krzysztof Halasa '

Hans Reiser wrote:
> Mudama, Eric wrote:
> 
>>
>> or put it under heavy write workload and remove
>> power.
>>
> Can you tell us more about what really happens to disk drives when the 
> power is cut while a block is being written?  We engage in a lot of 
> uninformed speculation, and it would be nice if someone who really knows 
> told us....
> 
> Do drives have enough capacitance under normal conditions to finish 
> writing the block?  Does ECC on the drive detect that the block was bad 
> and so we don't need to detect it in the FS?


Does it really matter to speculate about this?

If you don't FLUSH CACHE, you have no guarantees your data is on the 
platter.

	Jeff
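
For readers who want to see what "asking for a flush" looks like from user
space, here is a minimal sketch in Python. It is an illustration only, not
something posted in this thread: the helper name write_durably and the file
name are hypothetical. os.fsync() asks the kernel to push the file's data
to the storage device; whether that results in a FLUSH CACHE command
reaching the drive, and whether the drive honours it, depends on the kernel
version, filesystem, mount options and firmware, so treat it as a request
rather than a guarantee.

```python
import os
import tempfile


def write_durably(path: str, payload: bytes) -> int:
    """Write payload to path, then ask the kernel to flush it to the device.

    Hypothetical helper for illustration. Returns the number of bytes written.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        written = 0
        # os.write may write fewer bytes than requested, so loop until done.
        while written < len(payload):
            written += os.write(fd, payload[written:])
        # Request that cached data for this file be pushed to stable storage.
        # Assumption: the stack below (filesystem, driver, drive) cooperates.
        os.fsync(fd)
    finally:
        os.close(fd)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "journal.bin")
        print(write_durably(target, b"commit record\n"))
```

Without the os.fsync() call the data may sit in the page cache or in the
drive's write buffer when power is lost, which is the case Jeff describes.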





* Re: Blockbusting news, results get worse
  2003-10-27 19:47   ` Jeff Garzik
@ 2003-10-27 20:03     ` John Bradford
  2003-10-29 20:01       ` Pavel Machek
  2003-10-28  1:21     ` Pavel Machek
  1 sibling, 1 reply; 26+ messages in thread
From: John Bradford @ 2003-10-27 20:03 UTC (permalink / raw)
  To: Jeff Garzik, Hans Reiser
  Cc: Mudama, Eric, 'Norman Diamond', 'Wes Janzen ',
	'Rogier Wolff ', 'John Bradford ',
	linux-kernel, nikita, 'Pavel Machek ',
	'Justin Cormack ', 'Vitaly Fertman ',
	'Krzysztof Halasa '

Quote from Jeff Garzik <jgarzik@pobox.com>:
> Hans Reiser wrote:
> > Mudama, Eric wrote:
> > 
> >>
> >> or put it under heavy write workload and remove
> >> power.
> >>
> > Can you tell us more about what really happens to disk drives when the 
> > power is cut while a block is being written?  We engage in a lot of 
> > uninformed speculation, and it would be nice if someone who really knows 
> > told us....
> > 
> > Do drives have enough capacitance under normal conditions to finish 
> > writing the block?  Does ECC on the drive detect that the block was bad 
> > and so we don't need to detect it in the FS?
> 
> 
> Does it really matter to speculate about this?
> 
> If you don't FLUSH CACHE, you have no guarantees your data is on the 
> platter.

I think that the idea that is floating around is to deliberately ruin
the formatting on part of the drive in order to simulate a bad block.

Operation of disk drives immediately after a power failure has been
discussed before, by the way:

http://marc.theaimsgroup.com/?l=linux-kernel&m=100665153518652&w=2

John.


* Re: Blockbusting news, results get worse
  2003-10-27 19:47   ` Jeff Garzik
  2003-10-27 20:03     ` John Bradford
@ 2003-10-28  1:21     ` Pavel Machek
  2003-10-28 12:54       ` Krzysztof Halasa
  1 sibling, 1 reply; 26+ messages in thread
From: Pavel Machek @ 2003-10-28  1:21 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Hans Reiser, Mudama, Eric, 'Norman Diamond',
	'Wes Janzen ', 'Rogier Wolff ',
	'John Bradford ',
	linux-kernel, nikita, 'Pavel Machek ',
	'Justin Cormack ', 'Vitaly Fertman ',
	'Krzysztof Halasa '

Hi!

> >>or put it under heavy write workload and remove
> >>power.
> >>
> >Can you tell us more about what really happens to disk drives when the 
> >power is cut while a block is being written?  We engage in a lot of 
> >uninformed speculation, and it would be nice if someone who really knows 
> >told us....
> >
> >Do drives have enough capacitance under normal conditions to finish 
> >writing the block?  Does ECC on the drive detect that the block was bad 
> >and so we don't need to detect it in the FS?
> 
> 
> Does it really matter to speculate about this?
> 
> If you don't FLUSH CACHE, you have no guarantees your data is on the 
> platter.

Well, even without FLUSH CACHE, you can expect that a sector being
written during powerfail contains either old data *or* new data.

If sector can become unreadable after powerfail, I guess journaling
people would like to know, and if powerfail may mean adjacent (or even
unrelated?) sectors to be damaged, everyone needs to know...
								Pavel
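
If the "old data *or* new data" assumption does not hold, one way a
filesystem could notice a half-written sector on its own is a per-block
checksum. The sketch below is a toy illustration in Python, not how any
particular filesystem in this thread works: the 512-byte block size, the
layout (payload followed by a CRC32) and the helper names are all
assumptions, and torn_write() merely simulates a sector that ends up with
the start of the new data and the tail of the old.

```python
import struct
import zlib

BLOCK_SIZE = 512               # assumed sector size
PAYLOAD_SIZE = BLOCK_SIZE - 4  # last 4 bytes hold a CRC32 of the payload


def seal_block(payload: bytes) -> bytes:
    """Pad payload to PAYLOAD_SIZE and append its CRC32 (little-endian)."""
    if len(payload) > PAYLOAD_SIZE:
        raise ValueError("payload does not fit in one block")
    padded = payload.ljust(PAYLOAD_SIZE, b"\0")
    return padded + struct.pack("<I", zlib.crc32(padded))


def block_is_intact(block: bytes) -> bool:
    """True if the stored CRC32 matches the payload actually read back."""
    if len(block) != BLOCK_SIZE:
        return False
    payload = block[:PAYLOAD_SIZE]
    (stored,) = struct.unpack("<I", block[PAYLOAD_SIZE:])
    return zlib.crc32(payload) == stored


def torn_write(old: bytes, new: bytes, cut: int) -> bytes:
    """Simulate power loss after `cut` bytes of `new` replaced `old`."""
    return new[:cut] + old[cut:]


if __name__ == "__main__":
    old = seal_block(b"AAAA old journal entry")
    new = seal_block(b"BBBB new journal entry")
    print(block_is_intact(old), block_is_intact(new))
    print(block_is_intact(torn_write(old, new, 4)))
```

A check like this only detects the damage; it does not recover the old or
the new contents, and it says nothing about adjacent sectors.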
-- 
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]


* Re: Blockbusting news, results get worse
  2003-10-28  1:21     ` Pavel Machek
@ 2003-10-28 12:54       ` Krzysztof Halasa
  0 siblings, 0 replies; 26+ messages in thread
From: Krzysztof Halasa @ 2003-10-28 12:54 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Jeff Garzik, Hans Reiser, Mudama, Eric, 'Norman Diamond',
	'Wes Janzen ', 'Rogier Wolff ',
	'John Bradford ',
	linux-kernel, nikita, 'Justin Cormack ',
	'Vitaly Fertman '

Pavel Machek <pavel@ucw.cz> writes:

> Well, even without FLUSH CACHE, you can expect that sector being
> written during powerfail either contains old data *or* new data.

I think so. It was not always the case with IBM DTLA drives, though.
-- 
Krzysztof Halasa, B*FH


* Re: Blockbusting news, results get worse
  2003-10-27 20:03     ` John Bradford
@ 2003-10-29 20:01       ` Pavel Machek
  2003-10-30  8:30         ` John Bradford
  0 siblings, 1 reply; 26+ messages in thread
From: Pavel Machek @ 2003-10-29 20:01 UTC (permalink / raw)
  To: John Bradford
  Cc: Jeff Garzik, Hans Reiser, Mudama, Eric, 'Norman Diamond',
	'Wes Janzen ', 'Rogier Wolff ',
	linux-kernel, nikita, 'Pavel Machek ',
	'Justin Cormack ', 'Vitaly Fertman ',
	'Krzysztof Halasa '

Hi!

> > >> or put it under heavy write workload and remove
> > >> power.
> > >>
> > > Can you tell us more about what really happens to disk drives when the 
> > > power is cut while a block is being written?  We engage in a lot of 
> > > uninformed speculation, and it would be nice if someone who really knows 
> > > told us....
> > > 
> > > Do drives have enough capacitance under normal conditions to finish 
> > > writing the block?  Does ECC on the drive detect that the block was bad 
> > > and so we don't need to detect it in the FS?
> > 
> > 
> > Does it really matter to speculate about this?
> > 
> > If you don't FLUSH CACHE, you have no guarantees your data is on the 
> > platter.
> 
> I think that the idea that is floating around is to deliberately ruin
> the formatting on part of the drive in order to simulate a bad block.
> 
> Operation of disk drives immediately after a power failure has been
> discussed before, by the way:
> 
> http://marc.theaimsgroup.com/?l=linux-kernel&m=100665153518652&w=2

Well, that looks like pure speculation.

BTW I *do* believe that powerfail can make the sector bad. Imagine you
bump into bad sector during write, and need to reallocate...

								Pavel
-- 
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]


* Re: Blockbusting news, results get worse
  2003-10-29 20:01       ` Pavel Machek
@ 2003-10-30  8:30         ` John Bradford
  0 siblings, 0 replies; 26+ messages in thread
From: John Bradford @ 2003-10-30  8:30 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Jeff Garzik, Hans Reiser, Mudama, Eric, 'Norman Diamond',
	'Wes Janzen ', 'Rogier Wolff ',
	linux-kernel, nikita, 'Pavel Machek ',
	'Justin Cormack ', 'Vitaly Fertman ',
	'Krzysztof Halasa '

Quote from Pavel Machek <pavel@ucw.cz>:
> Hi!
> 
> > > >> or put it under heavy write workload and remove
> > > >> power.
> > > >>
> > > > Can you tell us more about what really happens to disk drives when the 
> > > > power is cut while a block is being written?  We engage in a lot of 
> > > > uninformed speculation, and it would be nice if someone who really knows 
> > > > told us....
> > > > 
> > > > Do drives have enough capacitance under normal conditions to finish 
> > > > writing the block?  Does ECC on the drive detect that the block was bad 
> > > > and so we don't need to detect it in the FS?
> > > 
> > > 
> > > Does it really matter to speculate about this?
> > > 
> > > If you don't FLUSH CACHE, you have no guarantees your data is on the 
> > > platter.
> > 
> > I think that the idea that is floating around is to deliberately ruin
> > the formatting on part of the drive in order to simulate a bad block.
> > 
> > Operation of disk drives immediately after a power failure has been
> > discussed before, by the way:
> > 
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=100665153518652&w=2
> 
> Well, that looks like pure speculation.
> 
> BTW I *do* believe that powerfail can make the sector bad. Imagine you
> bump into bad sector during write, and need to reallocate...

See the rest of the thread.

I think the point is that if that happened, it would be outside the
scope of the solution being suggested, and something more elaborate
such as a battery backed cache is needed if you want to guard against
that situation.

Unfortunately, I think that the re-writing due to a bad sector
requirement is going to occur often enough to make any solution which
doesn't handle it a bit pointless :-(

John.


* RE: Blockbusting news, results get worse
@ 2003-10-29 20:11 Mudama, Eric
  0 siblings, 0 replies; 26+ messages in thread
From: Mudama, Eric @ 2003-10-29 20:11 UTC (permalink / raw)
  To: 'Pavel Machek', John Bradford
  Cc: Jeff Garzik, Hans Reiser, 'Norman Diamond',
	'Wes Janzen ', 'Rogier Wolff ',
	linux-kernel, nikita, 'Justin Cormack ',
	'Vitaly Fertman ', 'Krzysztof Halasa '



> -----Original Message-----
> From: Pavel Machek [mailto:pavel@ucw.cz]
> 
> > > If you don't FLUSH CACHE, you have no guarantees your data is on the
> > > platter.
> > 
> > I think that the idea that is floating around is to deliberately ruin
> > the formatting on part of the drive in order to simulate a bad block.
> > 
> > Operation of disk drives immediately after a power failure has been
> > discussed before, by the way:
> > 
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=100665153518652&w=2
> 
> Well, that looks like pure speculation.
> 
> BTW I *do* believe that powerfail can make the sector bad. Imagine you
> bump into bad sector during write, and need to reallocate...
> 
> 								Pavel

Both the linked post and Pavel's point are correct.

In a modern drive, tolerances are so tight that your drive is constantly
re-writing blocks it knows it didn't write very well.  In a power-fail
event, there's little to no time to reallocate or reattempt a write, and
even less energy available to "fix" things that aren't within specification
anymore (spin speed, etc) ... if we don't get the actuator to the latch,
your drive probably won't spin again and you'll lose *all* your data, so
that is our number 1 concern when the power fails.

"Performance" IDE drives these days ship with 8MB buffers, which compounds
the problem even further if you're trying to get data on the media after
power has been cut.



* Re: Blockbusting news, results get worse
  2003-10-27  9:34 ` Norman Diamond
  2003-10-27 10:23   ` Jan-Benedict Glaw
  2003-10-27 23:31   ` Jason Lunz
@ 2003-10-28 20:56   ` Hans Reiser
  2 siblings, 0 replies; 26+ messages in thread
From: Hans Reiser @ 2003-10-28 20:56 UTC (permalink / raw)
  To: Norman Diamond
  Cc: Mudama, Eric, 'Wes Janzen ', 'Rogier Wolff ',
	'John Bradford ',
	linux-kernel, nikita, 'Pavel Machek ',
	'Justin Cormack ', 'Vitaly Fertman ',
	'Krzysztof Halasa '

So it turns out that reiserfstune can currently mark bad blocks for
unmounted filesystems, and that the patches that were not up to date were
just the patches for doing it on mounted filesystems.  So the problem turns
out to have already been solved, after much miscommunication; sorry about that.

-- 
Hans




* Re: Blockbusting news, results get worse
  2003-10-27  9:34 ` Norman Diamond
  2003-10-27 10:23   ` Jan-Benedict Glaw
@ 2003-10-27 23:31   ` Jason Lunz
  2003-10-28 20:56   ` Hans Reiser
  2 siblings, 0 replies; 26+ messages in thread
From: Jason Lunz @ 2003-10-27 23:31 UTC (permalink / raw)
  To: linux-kernel

ndiamond@wta.att.ne.jp said:
> Yeah, I need to deliberately damage one block in order to test the
> firmware, but I don't want to damage multiple blocks and use up the
> reallocation space.  I am a home user, even if I also do programming
> at work, even if I also volunteer one day each weekend to test Linux.
> How can I arrange to damage one block on a disk?

I have two ata100 drives sitting at home right now that fill dmesg with
lots of UnrecoverableErrors whenever you access certain sectors. I'll
ship them to anyone who will use them to make linux error recovery more
resilient, either at the ide driver level or in the filesystem.

any takers?

Jason



* RE: Blockbusting news, results get worse
  2003-10-27 18:06 Mudama, Eric
@ 2003-10-27 19:18 ` Andre Hedrick
  0 siblings, 0 replies; 26+ messages in thread
From: Andre Hedrick @ 2003-10-27 19:18 UTC (permalink / raw)
  To: Mudama, Eric; +Cc: 'Samium Gromoff', linux-kernel


Eric,

You are being helpful and this is a good thing.
Sure anyone can join T13 for $800/year and up to $10K in travel cost.
Anyone can join T10 for $10K/year and additional $10K in travel cost.

Any other class/level of membership is worthless.  If you can not vote why
bother.

I do not remember meeting you when I was working with VP Skinner in
the Longmont office, the division of firmware for all of Maxtor.  This
could be that you were part of the Quantum merger?

Regardless, I may be hard to understand.  The drive companies are harder
being mimes until recently.

Cheers,

Andre Hedrick
LAD Storage Consulting Group

On Mon, 27 Oct 2003, Mudama, Eric wrote:

> 
> 
> > -----Original Message-----
> > From: Samium Gromoff [mailto:deepfire@ibe.miee.ru]
> > Sent: Monday, October 27, 2003 6:08 AM
> > To: Mudama, Eric
> > Cc: linux-kernel@vger.kernel.org
> > Subject: RE: Blockbusting news, results get worse
> > 
> > 
> > Eric Mudama wrote:
> > > Andre Hedrick wrote:
> > > > Eric,
> > > >
> > > > Item "3" in your list is not practical, because no drive
> > > > maker allows the same drives that large oem's purchase to be placed
> > > > in retail.  There are obvious reasons, but your position stated for
> > > > the average joe consumer is flawed.
> > > 
> > > I don't believe your statement is correct that OEM drives and retail
> > > drives always differ.  They may have slight configuration differences, but
> > > fundamentally I think they're the same drive with identical or
> > > near-identical firmware.
> > 
> > If there is somebody you should believe about such stuff, 
> > that would be Andre.
> > (by the way he was a T13 committee member not so long ago)
> 
> That's nice.  For $800/year, anyone can join who is interested, provided
> they can attend the meetings.  Anyone is free to join and ask questions on
> the T13 mailing list.
> 
> http://www.t13.org
> 
> As to the "facts," I guess I choose to believe myself, since I'm one of the
> guys writing firmware that decides drive behavior in many of these cases
> that people bring up.  Now, I've only been doing this for 3 years, so if
> there was something done greater than 3 years ago, odds are I haven't heard
> of it.  I am only speaking from recent experience.
> 
> As to "believing" Andre, I'm sure he's a nice guy, but he comes off as
> awfully bitter... it's tough to read more than a few sentences of what he
> writes.  He obviously "knows" stuff, but wants to make people jump through
> hoops to learn what he knows.
> 
> > And, hey, i would have been rather surprised if you have 
> > answered otherwise, given your email address...
> 
> Of course, you can dismiss everything I'm saying if you like.  However, I'd
> like to think I've been helpful to someone.  Disk drives don't work quite
> the way some people think, so I like to try to clear up these misconceptions
> thinking it will eventually help produce better linux code that works better
> with the IDE drives I can afford.
> 
> --eric
> 



* RE: Blockbusting news, results get worse
@ 2003-10-27 18:06 Mudama, Eric
  2003-10-27 19:18 ` Andre Hedrick
  0 siblings, 1 reply; 26+ messages in thread
From: Mudama, Eric @ 2003-10-27 18:06 UTC (permalink / raw)
  To: 'Samium Gromoff'; +Cc: linux-kernel



> -----Original Message-----
> From: Samium Gromoff [mailto:deepfire@ibe.miee.ru]
> Sent: Monday, October 27, 2003 6:08 AM
> To: Mudama, Eric
> Cc: linux-kernel@vger.kernel.org
> Subject: RE: Blockbusting news, results get worse
> 
> 
> Eric Mudama wrote:
> > Andre Hedrick wrote:
> > > Eric,
> > >
> > > Item "3" in your list is not practical, because no drive
> > > maker allows the same drives that large oem's purchase to be placed
> > > in retail.  There are obvious reasons, but your position stated for
> > > the average joe consumer is flawed.
> > 
> > I don't believe your statement is correct that OEM drives and retail
> > drives always differ.  They may have slight configuration differences, but
> > fundamentally I think they're the same drive with identical or
> > near-identical firmware.
> 
> If there is somebody you should believe about such stuff, 
> that would be Andre.
> (by the way he was a T13 committee member not so long ago)

That's nice.  For $800/year, anyone can join who is interested, provided
they can attend the meetings.  Anyone is free to join and ask questions on
the T13 mailing list.

http://www.t13.org

As to the "facts," I guess I choose to believe myself, since I'm one of the
guys writing firmware that decides drive behavior in many of these cases
that people bring up.  Now, I've only been doing this for 3 years, so if
there was something done greater than 3 years ago, odds are I haven't heard
of it.  I am only speaking from recent experience.

As to "believing" Andre, I'm sure he's a nice guy, but he comes off as
awfully bitter... it's tough to read more than a few sentences of what he
writes.  He obviously "knows" stuff, but wants to make people jump through
hoops to learn what he knows.

> And, hey, i would have been rather surprised if you have 
> answered otherwise, given your email address...

Of course, you can dismiss everything I'm saying if you like.  However, I'd
like to think I've been helpful to someone.  Disk drives don't work quite
the way some people think, so I like to try to clear up these misconceptions
thinking it will eventually help produce better linux code that works better
with the IDE drives I can afford.

--eric


* RE: Blockbusting news, results get worse
@ 2003-10-27 13:07 Samium Gromoff
  0 siblings, 0 replies; 26+ messages in thread
From: Samium Gromoff @ 2003-10-27 13:07 UTC (permalink / raw)
  To: eric_mudama; +Cc: linux-kernel

Eric Mudama wrote:
> Andre Hedrick wrote:
> > Eric,
> >
> > Item "3" in your list is not practical, because no drive
> > maker allows the same drives that large oem's purchase to be placed in retail.
> > There are obvious reasons, but your position stated for the average joe
> > consumer is flawed.
> 
> I don't believe your statement is correct that OEM drives and retail drives
> always differ.  They may have slight configuration differences, but
> fundamentally I think they're the same drive with identical or
> near-identical firmware.

If there is somebody you should believe about such stuff, that would be Andre.
(by the way he was a T13 committee member not so long ago)

And, hey, i would have been rather surprised if you have answered otherwise,
given your email address...


cheers, Samium Gromoff


* Re: Blockbusting news, results get worse
  2003-10-27  9:34 ` Norman Diamond
@ 2003-10-27 10:23   ` Jan-Benedict Glaw
  2003-10-27 23:31   ` Jason Lunz
  2003-10-28 20:56   ` Hans Reiser
  2 siblings, 0 replies; 26+ messages in thread
From: Jan-Benedict Glaw @ 2003-10-27 10:23 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1399 bytes --]

On Mon, 2003-10-27 18:34:48 +0900, Norman Diamond <ndiamond@wta.att.ne.jp>
wrote in message <3cba01c39c6f$141529a0$24ee4ca5@DIAMONDLX60>:
> Eric Mudama wrote:

> also volunteer one day each weekend to test Linux.  How can I arrange to
> damage one block on a disk?

That's obviously quite easy to do. In German, we call this
"spanabhebende Datenverarbeitung" (roughly, "data processing by machining").
Just open some HDD and make some deep scratches with a screwdriver on one
of its platters.

> I'm not sure how many Dell notebooks you'll have to open to see a Toshiba
> drive, but I'll bet the number is low.  Also do you recognize the name
> Toshiba as a large maker of notebook PCs, and do you have any guesses as to
> how many Toshiba notebooks you'll have to open to see a Toshiba drive?
> Toshiba already reduced their former US 3-year warranties to 1 year and

Personally, I don't like Toshiba. They're such a large company that they
can afford to make their own chips (like PCMCIA bridges etc.). This way,
you get new (proprietary) chips at some point. Been there, now buying
other notebooks...

MfG, JBG

-- 
   Jan-Benedict Glaw       jbglaw@lug-owl.de    . +49-172-7608481
   "Eine Freie Meinung in  einem Freien Kopf    | Gegen Zensur | Gegen Krieg
    fuer einen Freien Staat voll Freier Bürger" | im Internet! |   im Irak!
   ret = do_actions((curr | FREE_SPEECH) & ~(NEW_COPYRIGHT_LAW | DRM | TCPA));



* Re: Blockbusting news, results get worse
  2003-10-26 18:33 Mudama, Eric
  2003-10-26 22:03 ` Andre Hedrick
@ 2003-10-27  9:34 ` Norman Diamond
  2003-10-27 10:23   ` Jan-Benedict Glaw
                     ` (2 more replies)
  1 sibling, 3 replies; 26+ messages in thread
From: Norman Diamond @ 2003-10-27  9:34 UTC (permalink / raw)
  To: Mudama, Eric, 'Hans Reiser ', 'Wes Janzen ',
	'Rogier Wolff ', 'John Bradford ',
	linux-kernel, nikita, 'Pavel Machek ',
	'Justin Cormack ', 'Vitaly Fertman ',
	'Krzysztof Halasa '

Eric Mudama wrote:

> 1. Pay a premium for longer warranty.

I've commented on this already.

> 2. Do qualification tests yourself during the first year of operation.

Yeah, I need to deliberately damage one block in order to test the firmware,
but I don't want to damage multiple blocks and use up the reallocation
space.  I am a home user, even if I also do programming at work, even if I
also volunteer one day each weekend to test Linux.  How can I arrange to
damage one block on a disk?

> 3. Look at what products are being shipped in large volume from OEMs, and
> buy the same product yourself.  Dell or HP or IBM can't afford to ship
> products that don't have the lowest in-the-field failure rates, so buying
> what they buy would make sense since they'll run their own tests like #2.

I'm not sure how many Dell notebooks you'll have to open to see a Toshiba
drive, but I'll bet the number is low.  Also do you recognize the name
Toshiba as a large maker of notebook PCs, and do you have any guesses as to
how many Toshiba notebooks you'll have to open to see a Toshiba drive?
Toshiba already reduced their former US 3-year warranties to 1 year and
provide 0 warranty directly to customers in Japan.  (Maybe they should
follow the ideas of a certain dominant software maker and pretend to have a
90-day warranty but in fact renege every time a failure occurs?  There would
be 0 difference in what needs to be done in software to make up for it.)



* RE: Blockbusting news, results get worse
@ 2003-10-26 22:12 Mudama, Eric
  0 siblings, 0 replies; 26+ messages in thread
From: Mudama, Eric @ 2003-10-26 22:12 UTC (permalink / raw)
  To: 'Andre Hedrick'; +Cc: linux-kernel



Andre Hedrick wrote:
> Eric,
> 
> Item "3" in your list is not practical, because no drive maker allows the
> same drives that large oem's purchase to be placed in retail.  There are
> obvious reasons, but your position stated for the average joe consumer is
> flawed.

I don't believe your statement is correct that OEM drives and retail drives
always differ.  They may have slight configuration differences, but
fundamentally I think they're the same drive with identical or
near-identical firmware.

> Why don't you guys offer extended warranty purchase service contracts?

As an optional feature on any drive?  Not sure, it would be nice.  However,
maintaining it specifically for individual drives in a product line might be
more work than someone high up feels is worth it.  Maybe there's a market
for buying a $30 warranty add-on from Maxtor that buys you an extra year or
whatever, however, I think you can get the same thing from CompUSA or other
companies now if you want it.  For them it is profitable, and I don't think
we'd want to compete with our virtual sales force. (The retail shops)

I do know you get better warranties on the more expensive models though, but
obviously that doesn't help after-the-fact.

--eric


* RE: Blockbusting news, results get worse
  2003-10-26 18:33 Mudama, Eric
@ 2003-10-26 22:03 ` Andre Hedrick
  2003-10-27  9:34 ` Norman Diamond
  1 sibling, 0 replies; 26+ messages in thread
From: Andre Hedrick @ 2003-10-26 22:03 UTC (permalink / raw)
  To: Mudama, Eric; +Cc: linux-kernel


Eric,

Item "3" in your list is not practical, because no drive maker allows the
same drives that large oem's purchase to be placed in retail.  There are
obvious reasons, but your position stated for the average joe consumer is
flawed.

Why don't you guys offer extended warranty purchase service contracts?

DCO the dog out of the replacements and be done.

Cheers,


Andre Hedrick
LAD Storage Consulting Group

On Sun, 26 Oct 2003, Mudama, Eric wrote:

> 
> 
> > -----Original Message-----
> > From: Norman Diamond [mailto:ndiamond@wta.att.ne.jp]
> >
> > 
> > 4.  When writing ZEROES to the bad sector, the drive reports SUCCESS.
> > But it lies.  Subsequent attempts to read still fail.  
> > Subsequent writing of
> > zeroes appears to succeed again.  Subsequent attempts to read 
> > still fail.
> 
> *That* is the fundamental problem with the drive.  If it knows it has had
> trouble with that block in the past, and it gets a new write, it should know
> that is a troublesome area and verify that it was able to put the new block
> in the old location.
> 
> If it can verify that, then there's no need to reallocate it at all, since
> the write most likely cured whatever was wrong.
> 
> If it can't verify it, then it should need to reallocate and verify at the
> new location.
> 
> > They said that they warranty Toshiba disk drives for 1 year.  So
> > if a customer buys a Toshiba disk drive with firmware that 
> > was defective on the day of purchase and defective on the dates
> > of design and manufacture, but if the customer doesn't detect
> > the defective firmware until 366 days later, the customer still
> > gets shafted.
> 
> In theory, I don't see the problem with this.
> 
> It isn't realistic for a vendor to warranty a product forever, and this is
> why OEMs do large qualifications on drives themselves before they purchase a
> single unit, since they know they'll bear the brunt of the support headache
> if the product fails.
> 
> That being said, there are three options:
> 
> 1. Pay a premium for longer warranty.  I know this is available in both IDE
> and SCSI, not sure if it is available in notebook drives.
> 
> 2. Do qualification tests yourself during the first year of operation.
> Hi/low temperature/humidity/air pressure, random command generator, and make
> sure the drive never miscompares or has a hard error it can't "fix".
> (Writing a zero and reading non-zero is a miscompare)
> 
> 3. Look at what products are being shipped in large volume from OEMs, and
> buy the same product yourself.  Dell or HP or IBM can't afford to ship
> products that don't have the lowest in-the-field failure rates, so buying
> what they buy would make sense since they'll run their own tests like #2.
> 
> 
> --eric
> 



* RE: Blockbusting news, results get worse
@ 2003-10-26 18:33 Mudama, Eric
  2003-10-26 22:03 ` Andre Hedrick
  2003-10-27  9:34 ` Norman Diamond
  0 siblings, 2 replies; 26+ messages in thread
From: Mudama, Eric @ 2003-10-26 18:33 UTC (permalink / raw)
  To: 'Norman Diamond', 'Hans Reiser ',
	'Wes Janzen ', 'Rogier Wolff ',
	'John Bradford ',
	linux-kernel, nikita, 'Pavel Machek ',
	'Justin Cormack ', 'Russell King ',
	'Vitaly Fertman ', 'Krzysztof Halasa '



> -----Original Message-----
> From: Norman Diamond [mailto:ndiamond@wta.att.ne.jp]
>
> 
> 4.  When writing ZEROES to the bad sector, the drive reports SUCCESS.
> But it lies.  Subsequent attempts to read still fail.  Subsequent
> writing of zeroes appears to succeed again.  Subsequent attempts to read
> still fail.

*That* is the fundamental problem with the drive.  If it knows it has had
trouble with that block in the past, and it gets a new write, it should know
that is a troublesome area and verify that it was able to put the new block
in the old location.

If it can verify that, then there's no need to reallocate it at all, since
the write most likely cured whatever was wrong.

If it can't verify it, then it should reallocate the block and verify the
write at the new location.
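
The policy described above (verify writes to a block with a history of
trouble, reallocate only if verification fails) can be sketched as a toy
simulation. This is a minimal illustration, not how any real firmware is
implemented: the ToyDrive class, its fields, and the returned status
strings are all hypothetical names made up for this sketch.

```python
class ToyDrive:
    """Toy model of a drive that verifies writes to known-troublesome sectors."""

    def __init__(self, bad_physical, spares):
        self.bad_physical = set(bad_physical)  # physical sectors with bad media
        self.spares = list(spares)             # unused spare physical sectors
        self.remap = {}                        # logical sector -> spare sector
        self.suspect = set()                   # logical sectors that failed before
        self.media = {}                        # physical sector -> stored data

    def _physical(self, lba):
        return self.remap.get(lba, lba)

    def _store(self, phys, data):
        # Bad media silently fails to retain the data.
        if phys not in self.bad_physical:
            self.media[phys] = data

    def _verify(self, phys, data):
        # Read back from the media and compare with what was written.
        return phys not in self.bad_physical and self.media.get(phys) == data

    def read(self, lba):
        phys = self._physical(lba)
        if phys in self.bad_physical:
            self.suspect.add(lba)              # remember the trouble spot
            raise OSError(f"unrecoverable read error at LBA {lba}")
        return self.media.get(phys, b"")

    def write(self, lba, data):
        phys = self._physical(lba)
        self._store(phys, data)
        if lba not in self.suspect:
            return "written"                   # no history of trouble: no verify
        if self._verify(phys, data):
            self.suspect.discard(lba)          # the write cured the problem
            return "verified in place"
        while self.spares:                     # verification failed: reallocate
            spare = self.spares.pop(0)
            self._store(spare, data)
            if self._verify(spare, data):
                self.remap[lba] = spare
                self.suspect.discard(lba)
                return "reallocated"
        raise OSError(f"write to LBA {lba} failed: no usable spare sectors")
```

In this model a failed read marks the sector as suspect, so the next
write to it is verified and, if the media is still bad, moved to a spare.
A write to a bad sector that has never been read would still report
"written", which is the kind of false success reported in item 4.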

> They said that they warranty Toshiba disk drives for 1 year.  So
> if a customer buys a Toshiba disk drive with firmware that 
> was defective on the day of purchase and defective on the dates
> of design and manufacture, but if the customer doesn't detect
> the defective firmware until 366 days later, the customer still
> gets shafted.

In theory, I don't see the problem with this.

It isn't realistic for a vendor to warranty a product forever, and this is
why OEMs do large qualifications on drives themselves before they purchase a
single unit, since they know they'll bear the brunt of the support headache
if the product fails.

That being said, there are three options:

1. Pay a premium for longer warranty.  I know this is available in both IDE
and SCSI, not sure if it is available in notebook drives.

2. Do qualification tests yourself during the first year of operation.
Hi/low temperature/humidity/air pressure, random command generator, and make
sure the drive never miscompares or has a hard error it can't "fix".
(Writing a zero and reading non-zero is a miscompare)

3. Look at what products are being shipped in large volume from OEMs, and
buy the same product yourself.  Dell or HP or IBM can't afford to ship
products that don't have the lowest in-the-field failure rates, so buying
what they buy would make sense since they'll run their own tests like #2.


--eric


* Re: Blockbusting news, results get worse
  2003-10-26 11:38   ` Norman Diamond
  2003-10-26 11:56     ` Pavel Machek
  2003-10-26 12:06     ` Hans Reiser
@ 2003-10-26 13:59     ` Krzysztof Halasa
  2 siblings, 0 replies; 26+ messages in thread
From: Krzysztof Halasa @ 2003-10-26 13:59 UTC (permalink / raw)
  To: Norman Diamond
  Cc: John Bradford, Mudama, Eric, 'Hans Reiser ',
	'Wes Janzen ', 'Rogier Wolff ',
	linux-kernel, nikita, 'Pavel Machek ',
	'Justin Cormack ', 'Vitaly Fertman '

"Norman Diamond" <ndiamond@wta.att.ne.jp> writes:

> By the way some participants in this thread have argued that the block
> should not be replaced by zeroes or random garbage without notice.  I fully
> agree.  The block should be replaced by zeroes or random garbage WITH
> notice.

Right. The correct way of sending such a notice is returning an I/O error
on read. It's standard and applications have supported it for years (of
course we can - and currently do - log the error as well).
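
As a rough illustration of what "applications support it" means in
practice: a failed read surfaces to a program as an error with errno EIO,
which it can catch and report instead of silently using garbage. The
helper name read_record and the read_fn callback are hypothetical,
introduced only for this sketch.

```python
import errno


def read_record(read_fn, offset, length):
    """Return (data, None) on success or (None, message) on an I/O error.

    read_fn is any callable taking (offset, length) and returning bytes,
    for example: lambda off, n: os.pread(fd, n, off)
    """
    try:
        return read_fn(offset, length), None
    except OSError as exc:
        if exc.errno == errno.EIO:
            # The kernel reported a media/device error for this read.
            return None, f"I/O error reading {length} bytes at offset {offset}"
        raise  # other errors are not media errors; let the caller see them
```

The point of the sketch is only that the error reaches the caller on
every failed read, independently of whether a log entry was written.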

>  From the point of view of logging it in the system log, it is
> enough to log it once, it doesn't have to be logged over and over again.

Storing a log entry in system log doesn't tell applications there is
a problem. It's simply unacceptable.
Relocating on write (at filesystem level) - sure, it would be helpful
(possibly as a compile-time option - most IDE drives and things like
RAM disks don't need it).
-- 
Krzysztof Halasa, B*FH


* Re: Blockbusting news, results get worse
  2003-10-26 11:38   ` Norman Diamond
  2003-10-26 11:56     ` Pavel Machek
@ 2003-10-26 12:06     ` Hans Reiser
  2003-10-26 13:59     ` Krzysztof Halasa
  2 siblings, 0 replies; 26+ messages in thread
From: Hans Reiser @ 2003-10-26 12:06 UTC (permalink / raw)
  To: Norman Diamond
  Cc: John Bradford, Mudama, Eric, 'Wes Janzen ',
	'Rogier Wolff ',
	linux-kernel, nikita, 'Pavel Machek ',
	'Justin Cormack ', 'Vitaly Fertman ',
	'Krzysztof Halasa '

Norman Diamond wrote:

>
>.  Please let's not beggar each other.)
>
>  
>
I have been beggared quite without your help.;-)

-- 
Hans




* Re: Blockbusting news, results get worse
  2003-10-26 11:38   ` Norman Diamond
@ 2003-10-26 11:56     ` Pavel Machek
  2003-10-26 12:06     ` Hans Reiser
  2003-10-26 13:59     ` Krzysztof Halasa
  2 siblings, 0 replies; 26+ messages in thread
From: Pavel Machek @ 2003-10-26 11:56 UTC (permalink / raw)
  To: Norman Diamond
  Cc: John Bradford, Mudama, Eric, 'Hans Reiser ',
	'Wes Janzen ', 'Rogier Wolff ',
	linux-kernel, nikita, 'Pavel Machek ',
	'Justin Cormack ', 'Vitaly Fertman ',
	'Krzysztof Halasa '

Hi!

> By the way some participants in this thread have argued that the block
> should not be replaced by zeroes or random garbage without notice.  I fully
> agree.  The block should be replaced by zeroes or random garbage WITH
> notice.  From the point of view of logging it in the system log, it is
> enough to log it once, it doesn't have to be logged over and over
> again.

It *does* have to be logged over and over. How does disk know system
did not crash between it returning an error and syslog message getting
written?

								Pavel
PS: Okay, we should end this thread here.
-- 
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]


* Re: Blockbusting news, results get worse
  2003-10-26 10:39 ` John Bradford
  2003-10-26  9:41   ` Pavel Machek
@ 2003-10-26 11:38   ` Norman Diamond
  2003-10-26 11:56     ` Pavel Machek
                       ` (2 more replies)
  1 sibling, 3 replies; 26+ messages in thread
From: Norman Diamond @ 2003-10-26 11:38 UTC (permalink / raw)
  To: John Bradford, Mudama, Eric, 'Hans Reiser ',
	'Wes Janzen ', 'Rogier Wolff ',
	linux-kernel, nikita, 'Pavel Machek ',
	'Justin Cormack ', 'Vitaly Fertman ',
	'Krzysztof Halasa '

John Bradford pretended to reply to me:

> > 4.  When writing ZEROES to the bad sector, the drive reports SUCCESS.
> > But it lies.  Subsequent attempts to read still fail.  Subsequent
> > writing of zeroes appears to succeed again.  Subsequent attempts to read
> > still fail.
>
> > I still have to say, we can't fix Toshiba, and we can avoid Toshiba, but
> > meanwhile we can fix Linux.
>
> How do you suggest we 'fix' 4, above, other than to flush the cache
> and verify each time a full sector of zeros is written to the disk?

Number 4 cannot be fixed by Linux.  Why do you pervert my writing?

The refusal to remove a known defective block from ordinary use in the file
system can be fixed.  How many times does this need to be said?  Why do you
pretend that this is not what I have been saying in this entire thread?

If I understand Hans Reiser's message correctly, this fix has indeed been
made in ReiserFS version 4.  I thank Mr. Reiser.  (By the way, I volunteer
about one day each weekend for testing, and I am hardly in a position to
contribute funds.  Please let's not beggar each other.)

By the way some participants in this thread have argued that the block
should not be replaced by zeroes or random garbage without notice.  I fully
agree.  The block should be replaced by zeroes or random garbage WITH
notice.  From the point of view of logging it in the system log, it is
enough to log it once, it doesn't have to be logged over and over again.
From the point of view of informing the user whose program is running, the
dd command does an excellent job, but some unknown program was remaining
silent when I/O errors were originally detected and logged.  I still think
it is better to get that block out of the file system so that when that file
is rewritten or when other new files get created or extended then they won't
try to reuse that block.  But I've said this enough too.  I guess it's time
to stop beating this dead horse.  But anyway Mr. Reiser understood, and I am
glad, and I thank him.



* Re: Blockbusting news, results get worse
  2003-10-26  7:37 Norman Diamond
@ 2003-10-26 10:39 ` John Bradford
  2003-10-26  9:41   ` Pavel Machek
  2003-10-26 11:38   ` Norman Diamond
  0 siblings, 2 replies; 26+ messages in thread
From: John Bradford @ 2003-10-26 10:39 UTC (permalink / raw)
  To: Norman Diamond, Mudama, Eric, 'Hans Reiser ',
	'Wes Janzen ', 'Rogier Wolff ',
	linux-kernel, nikita, 'Pavel Machek ',
	'Justin Cormack ', 'Russell King ',
	'Vitaly Fertman ', 'Krzysztof Halasa '

> 4.  When writing ZEROES to the bad sector, the drive reports SUCCESS.
> But it lies.  Subsequent attempts to read still fail.  Subsequent writing of
> zeroes appears to succeed again.  Subsequent attempts to read still fail.

> I still have to say, we can't fix Toshiba, and we can avoid Toshiba, but
> meanwhile we can fix Linux.

How do you suggest we 'fix' 4, above, other than to flush the cache
and verify each time a full sector of zeros is written to the disk?
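
For reference, the "flush and verify" approach mentioned above would look
roughly like this from user space. It is a minimal sketch with an
important caveat stated as an assumption: on a real block device the
read-back may be served from the page cache or the drive's own cache, so
a real test would need something like O_DIRECT and a drive cache flush,
which this sketch does not attempt. The function name write_and_verify
is hypothetical.

```python
import os


def write_and_verify(fd, offset, data):
    """Write data at offset, flush it, read it back, and compare.

    Returns True only if the read-back succeeds and matches what was written.
    """
    written = os.pwrite(fd, data, offset)
    if written != len(data):
        return False                 # short write: treat as failure
    os.fsync(fd)                     # ask the kernel to push data to the device
    try:
        readback = os.pread(fd, len(data), offset)
    except OSError:
        return False                 # e.g. EIO on the read-back
    return readback == data
```

Doing this for every write roughly doubles the I/O for the verified
region, which is presumably why it is raised here as a question rather
than a proposal.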

John.


* Re: Blockbusting news, results get worse
  2003-10-26 10:39 ` John Bradford
@ 2003-10-26  9:41   ` Pavel Machek
  2003-10-26 11:38   ` Norman Diamond
  1 sibling, 0 replies; 26+ messages in thread
From: Pavel Machek @ 2003-10-26  9:41 UTC (permalink / raw)
  To: John Bradford
  Cc: Norman Diamond, Mudama, Eric, 'Hans Reiser ',
	'Wes Janzen ', 'Rogier Wolff ',
	linux-kernel, nikita, 'Pavel Machek ',
	'Justin Cormack ', 'Russell King ',
	'Vitaly Fertman ', 'Krzysztof Halasa '

Hi!

> > 4.  When writing ZEROES to the bad sector, the drive reports SUCCESS.
> > But it lies.  Subsequent attempts to read still fail.  Subsequent writing of
> > zeroes appears to succeed again.  Subsequent attempts to read still fail.
> 
> > I still have to say, we can't fix Toshiba, and we can avoid Toshiba, but
> > meanwhile we can fix Linux.
> 
> How do you suggest we 'fix' 4, above, other than to flush the cache
> and verify each time a full sector of zeros is written to the disk?

Well,

	if (drive_is_toshiba())
		panic("Forward harddrive to nearest trashcan.\n");

during bootup?

Reporting success when it is not, is, umm, bad.
								Pavel

-- 
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]


* Re: Blockbusting news, results get worse
@ 2003-10-26  7:37 Norman Diamond
  2003-10-26 10:39 ` John Bradford
  0 siblings, 1 reply; 26+ messages in thread
From: Norman Diamond @ 2003-10-26  7:37 UTC (permalink / raw)
  To: Mudama, Eric, 'Hans Reiser ', 'Wes Janzen ',
	'Rogier Wolff ', 'John Bradford ',
	linux-kernel, nikita, 'Pavel Machek ',
	'Justin Cormack ', 'Russell King ',
	'Vitaly Fertman ', 'Krzysztof Halasa '

It gets worse.  First, to recap previous results:

1.  The drive reported a permanent error on read, refused to reallocate the
bad sector, and Linux logged the error but refused to remove the block from
the Reiser file system.  (Different people have different opinions about
whether various parts of this behavior are acceptable, but anyway this was
one of the observed results.)

2.  The drive reported a permanent error on write, refused to reallocate the
bad sector, and Linux logged the error but refused to remove the block from
the Reiser file system.  (I'm not sure if different people have different
opinions about whether various parts of this behavior are acceptable.  This
was a write, good data were known at the time, but subsequently good data
would never be retrievable from the file.)

3.  The drive reported a permanent read error during a S.M.A.R.T. long
self-test and refused to reallocate the bad sector.  (I think different
people have different opinions about the acceptability of this too.)

Well, here's news.

4.  When writing ZEROES to the bad sector, the drive reports SUCCESS.
But it lies.  Subsequent attempts to read still fail.  Subsequent writing of
zeroes appears to succeed again.  Subsequent attempts to read still fail.

I swear, I want that block out of the file system.  Even if the writing of
zeroes really succeeded, I would not be satisfied with the continued use of
that block.  I really want the drive to reallocate it, but Toshiba's
firmware is unsafe to drive at any speed.  So I really want the file system
to exclude that block.

Some participants in this discussion have said that ext2fs can exclude bad
blocks in a way that ReiserFS doesn't, though ReiserFS probably will in the
future.  But to the best of my understanding, ext2fs can detect and exclude
bad blocks at the time of formatting and at the time of a destructive
read-write test.  I have not seen news from anyone about whether ext2fs will
remove a detected permanent bad block from an existing mounted filesystem at
the time that the error is detected during normal operations.  It is 99%
necessary to do so (leaving 1% for audio visual applications where it's more
important to play a movie erroneously at proper speed than to attempt
recovery).
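
What is being asked for here, excluding a block from further use at the
moment an error is detected, can be sketched with a toy allocator. This
is not how ext2fs or ReiserFS manage free space; the BlockAllocator
class and its method names are hypothetical and exist only to show the
idea of a runtime bad-block list.

```python
class BlockAllocator:
    """Toy free-space manager with a runtime bad-block list."""

    def __init__(self, total_blocks):
        self.free = set(range(total_blocks))  # blocks available for allocation
        self.bad = set()                      # blocks retired after an error

    def allocate(self):
        if not self.free:
            raise RuntimeError("no free blocks")
        block = min(self.free)                # lowest-numbered free block
        self.free.remove(block)
        return block

    def release(self, block):
        # A retired block never returns to the free pool, even when the
        # file that used it is deleted or rewritten.
        if block not in self.bad:
            self.free.add(block)

    def mark_bad(self, block):
        # Called when an I/O error is detected during normal operation.
        self.bad.add(block)
        self.free.discard(block)
```

With this arrangement, rewriting a file or creating new files cannot
reuse the failed block, which is the behavior requested in this message.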

By the way, one participant in this thread recommended not buying disk
drives from bargain basement outlets.  OK, yesterday I inquired at Bic
Camera, which might be one of the biggest two retailers of computers and
parts nationwide, but might not be because they don't have many stores
outside of the Tokyo area.  At least they're surely one of the two biggest
in Tokyo.  They said that they warranty Toshiba disk drives for 1 year.  So
if a customer buys a Toshiba disk drive with firmware that was defective on
the day of purchase and defective on the dates of design and manufacture,
but if the customer doesn't detect the defective firmware until 366 days
later, the customer still gets shafted.

I still have to say, we can't fix Toshiba, and we can avoid Toshiba, but
meanwhile we can fix Linux.  Among other manufacturers, only Maxtor has said
that their firmware isn't broken in this way, but Maxtor doesn't make drives
for notebooks.  Just how many manufacturers of disk drives are we going to
avoid, or can we hope that Linux will be made to compensate for their
defects?

Well, in a future weekend, I will try to see if ext2fs really takes action
on permanently bad blocks that are detected during normal operations on a
mounted partition.



Thread overview: 26+ messages
2003-10-27 17:43 Blockbusting news, results get worse Mudama, Eric
2003-10-27 18:48 ` Hans Reiser
2003-10-27 19:47   ` Jeff Garzik
2003-10-27 20:03     ` John Bradford
2003-10-29 20:01       ` Pavel Machek
2003-10-30  8:30         ` John Bradford
2003-10-28  1:21     ` Pavel Machek
2003-10-28 12:54       ` Krzysztof Halasa
  -- strict thread matches above, loose matches on Subject: below --
2003-10-29 20:11 Mudama, Eric
2003-10-27 18:06 Mudama, Eric
2003-10-27 19:18 ` Andre Hedrick
2003-10-27 13:07 Samium Gromoff
2003-10-26 22:12 Mudama, Eric
2003-10-26 18:33 Mudama, Eric
2003-10-26 22:03 ` Andre Hedrick
2003-10-27  9:34 ` Norman Diamond
2003-10-27 10:23   ` Jan-Benedict Glaw
2003-10-27 23:31   ` Jason Lunz
2003-10-28 20:56   ` Hans Reiser
2003-10-26  7:37 Norman Diamond
2003-10-26 10:39 ` John Bradford
2003-10-26  9:41   ` Pavel Machek
2003-10-26 11:38   ` Norman Diamond
2003-10-26 11:56     ` Pavel Machek
2003-10-26 12:06     ` Hans Reiser
2003-10-26 13:59     ` Krzysztof Halasa
