Re: Why are bad disk sectors numbered strangely, and what happens to them?

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
@ 2003-10-13  9:31 Norman Diamond
       [not found] ` <200310131014.h9DAEwY3000241@81-2-122-30.bradfords.org.uk>
  0 siblings, 1 reply; 64+ messages in thread
From: Norman Diamond @ 2003-10-13  9:31 UTC
  To: linux-kernel

Thanks to Andreas Jellinghaus's suggestion, I ran smartctl logs and tests.
My Linux questions increase in number, but first here are the results.

Before testing, the log included a count of 92 errors, of which the
latest 5 had details available.  Reallocated_Sector_Ct was 1 and
Reallocated_Event_Count was 1.  The offline test succeeded and changed
nothing.  The long self-test found one read error.  After testing, the log
still included a count of 92 errors, of which the latest 5 had details
available, and they were the same 5, so the firmware didn't update
that log with the error that was detected by its self-test.  However,
Reallocated_Sector_Ct was 2 and Reallocated_Event_Count was 2.

The self-test saved one detail of its read error separately from the main
log.  LBA_of_first_error was 0x0122403a.  In decimal this was a very
familiar-looking 19021882.
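
The conversion is ordinary base-16 arithmetic, which the shell can do
directly.  A minimal sketch; `hex_to_dec` is a hypothetical name, not a
smartctl feature:

```shell
#!/bin/sh
# Hypothetical helper: print a hexadecimal LBA (as shown by smartctl)
# in decimal, using POSIX shell arithmetic expansion.
hex_to_dec() {
    echo $(( $1 ))
}

hex_to_dec 0x0122403a    # prints 19021882
```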

The sector is in a Reiser partition, which might affect some of the
following questions.

So, why do the syslog entries have so many "sector" numbers, which are
mostly different except for some repetitions, and mostly different from
"LBAsect"?  It seems that LBAsect is the correct number of the bad sector.

How can I find out which file contains the bad sector?  I would like to try
to recreate the file from a source of good data.

How can I tell Linux to mark the sector as bad, knowing the LBA sector
number?

Or did the drive's firmware mark the sector as bad during its self-test?  Is
this why the number of reallocations increased from 1 to 2?  But if so, why
didn't this happen when Linux tried to read the sector?

How can I tell Linux to read every sector in the partition?  Oh, I might
know this one,
  dd if=/dev/hda8 of=/dev/null
I want to make sure that the drive is now using a non-defective replacement
sector.

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
       [not found] ` <200310131014.h9DAEwY3000241@81-2-122-30.bradfords.org.uk>
@ 2003-10-13 10:24   ` Norman Diamond
  2003-10-13 10:33     ` John Bradford
                       ` (2 more replies)
  0 siblings, 3 replies; 64+ messages in thread
From: Norman Diamond @ 2003-10-13 10:24 UTC
  To: John Bradford, linux-kernel

John Bradford replied to me:

> > How can I tell Linux to read every sector in the partition?  Oh, I might
> > know this one,
> >   dd if=/dev/hda8 of=/dev/null
> > I want to make sure that the drive is now using a non-defective
> > replacement sector.
>
> A read won't necessarily do that.  You might have to write to a
> defective sector to force re-allocation.

I agree, we are not sure if a read will do that.  That is the reason why two
of my preceding questions were:

   How can I find out which file contains the bad sector?  I would like to
   try to recreate the file from a source of good data.

   How can I tell Linux to mark the sector as bad, knowing the LBA sector
   number?

And that is also the reason why my last question, which Mr. Bradford replied
to, had the stated purpose of making sure that the drive is now using a
non-defective replacement sector after the preceding operations have been
carried out.

Please, the important questions are important.  Doesn't anyone really know
what Linux does with bad blocks, how to find out which file contains them,
how to get Linux to force them to be marked and reallocated?

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-13 10:24   ` Norman Diamond
@ 2003-10-13 10:33     ` John Bradford
  2003-10-13 11:30       ` Norman Diamond
  2003-10-13 14:24     ` Chuck Campbell
  2003-10-14  6:49     ` Rogier Wolff
  2 siblings, 1 reply; 64+ messages in thread
From: John Bradford @ 2003-10-13 10:33 UTC
  To: Norman Diamond, linux-kernel

Quote from "Norman Diamond" <ndiamond@wta.att.ne.jp>:
> John Bradford replied to me:
> 
> > > How can I tell Linux to read every sector in the partition?  Oh, I might
> > > know this one,
> > >   dd if=/dev/hda8 of=/dev/null
> > > I want to make sure that the drive is now using a non-defective
> > > replacement sector.
> >
> > A read won't necessarily do that.  You might have to write to a
> > defective sector to force re-allocation.
> 
> I agree, we are not sure if a read will do that.  That is the reason why two
> of my preceding questions were:
> 
>    How can I find out which file contains the bad sector?  I would like to
>    try to recreate the file from a source of good data.

How are you going to make sure you write it in the same location as it was before?

>    How can I tell Linux to mark the sector as bad, knowing the LBA sector
>    number?

Don't.  If the drive can't fix this problem itself, throw it in the bin.

> And that is also the reason why my last question, which Mr. Bradford replied
> to, had the stated purpose of making sure that the drive is now using a
> non-defective replacement sector after the preceding operations have been
> carried out.

Backup your data.
Run the S.M.A.R.T. tests.
Write over the whole disk with something like dd if=/dev/zero of=/dev/hda.
If you still get errors, replace the disk.

> Please, the important questions are important.  Doesn't anyone really know
> what Linux does with bad blocks, how to find out which file contains them,
> how to get Linux to force them to be marked and reallocated?

John.

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-13 10:33     ` John Bradford
@ 2003-10-13 11:30       ` Norman Diamond
  2003-10-13 11:58         ` Maciej Zenczykowski
                           ` (2 more replies)
  0 siblings, 3 replies; 64+ messages in thread
From: Norman Diamond @ 2003-10-13 11:30 UTC
  To: John Bradford, linux-kernel

John Bradford replied to me:

> > > > How can I tell Linux to read every sector in the partition?  Oh, I
> > > > might know this one,
> > > >   dd if=/dev/hda8 of=/dev/null
> > > > I want to make sure that the drive is now using a non-defective
> > > > replacement sector.
> > >
> > > A read won't necessarily do that.  You might have to write to a
> > > defective sector to force re-allocation.
> >
> > I agree, we are not sure if a read will do that.  That is the reason why
> > two of my preceding questions were:
> >
> >    How can I find out which file contains the bad sector?  I would like
> >    to try to recreate the file from a source of good data.
>
> How are you going to make sure you write it in the same location as it was
> before?

Mostly it doesn't matter.  The primary purpose of this bit of it is to
recreate the file to contain good data, which is why I would try to recreate
it from a source of good data.  The secondary purpose is:
  IF the bad sector doesn't get reused then great, then the next bit of
effort will be to try to get the sector marked as bad, if there is any way
to do that under Linux.  See the next question, which is now being reposted
for at least the fourth time.
  BUT IF the same sector number gets rewritten then hopefully the same
sector number will be associated with a reallocated non-defective sector and
the data will get written properly.

> >    How can I tell Linux to mark the sector as bad, knowing the LBA
> >    sector number?
>
> Don't.  If the drive can't fix this problem itself, throw it in the bin.

THE DRIVE HAS 1, ONE, HITOTSU, UNO, UN, BAD SECTOR.  The drive is capable of
doing reallocations.  What kind of operation can be done that will persuade
the drive to do the reallocation?

> > And that is also the reason why my last question, which Mr. Bradford
> > replied to, had the stated purpose of making sure that the drive is now
> > using a non-defective replacement sector after the preceding operations
> > have been carried out.
>
> Backup your data.

I want to fix the defective file from an existing backup or recomputation.
Aside from that, it is my crash box (as already posted in this thread).  The
questions are still important because sometimes this kind of thing happens
on machines that aren't crash boxes, and it is not customary to dump a drive
when 99.99% of its preparations for error recovery are still intact.

> Run the S.M.A.R.T. tests.

I DID.  YOU REPLIED TO MY POSTING WHERE I REPORTED THEM.

> Write over the whole disk with something like dd if=/dev/zero of=/dev/hda.

Hmm.  That could well be an answer.  I'll think about it.

Actually I should just write over the whole partition for the present time.
When the drive's self-test detected that one bad sector, I could figure out
which partition it was in (though not which file, which is why I asked one
of those questions several times already).  The drive's self-test read the
entire drive and the other partitions had no detectable errors.

> If you still get errors, replace the disk.

If the errors are not correctable and/or numerous (where I do not count
numerous syslog entries of the same defective sector to be numerous errors)
then of course I will do so.  Even though it's my crash box.

...  By the way, consider this:

Windows 98 has a scandisk command which writes a file scandisk.log in which
the user can see which files have been deleted by scandisk or corrupted
either by scandisk or before scandisk.  The user can try to recreate those
files.

Windows 2000 has a chkdsk command which does not write a logfile.

Therefore it is convenient for Windows 2000 users to keep an installation of
Windows 98 installed in order to run Windows 98's scandisk command when
necessary.  (Doesn't work for NTFS partitions, but otherwise convenient.)

If Linux is really supposed to be even less powerful than both of those,
then there's quite a lot of wasted effort under way in this undertaking.

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-13 11:30       ` Norman Diamond
@ 2003-10-13 11:58         ` Maciej Zenczykowski
  2003-10-15 10:22           ` Norman Diamond
  2003-10-13 12:02         ` John Bradford
  2003-10-14  6:54         ` Rogier Wolff
  2 siblings, 1 reply; 64+ messages in thread
From: Maciej Zenczykowski @ 2003-10-13 11:58 UTC
  To: Norman Diamond; +Cc: John Bradford, linux-kernel

> Hmm.  That could well be an answer.  I'll think about it.
> 
> Actually I should just write over the whole partition for the present time.
> When the drive's self-test detected that one bad sector, I could figure out
> which partition it was in (though not which file, which is why I asked one
> of those questions several times already).  The drive's self-test read the
> entire drive and the other partitions had no detectable errors.

Instead of zeroing the entire partition just zero that single sector.
something like:

dd if=/dev/zero of=/dev/hda bs=512 seek=$lbasector conv=notrunc count=1

possibly first check (by reading in the opposite direction) whether this
is indeed the place where you get the read error (in syslog):
dd if=/dev/hda of=/dev/null bs=512 skip=$lbasector count=1
if you can read anything from it then read it to a file and write it back 
from the file...
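
The read-then-write-back procedure can be wrapped in a small function.
This is a sketch only: `rewrite_sector` is a hypothetical name, the device
and sector number are parameters you must supply, it assumes 512-byte
sectors, and an unreadable sector is overwritten with zeros, so whatever
it held is gone.  Whether the write actually makes the drive reallocate
the sector is up to the drive's firmware.

```shell
#!/bin/sh
# Hypothetical helper: try to read one 512-byte sector and write it back
# in place; if the read fails, overwrite the sector with zeros instead.
#   $1  device (or image file), e.g. /dev/hda
#   $2  absolute sector number, e.g. the LBA reported by SMART
rewrite_sector() {
    dev=$1
    lba=$2
    tmp=$(mktemp) || return 1
    # dd exits non-zero on an I/O error; also insist on a full sector
    if dd if="$dev" of="$tmp" bs=512 skip="$lba" count=1 2>/dev/null &&
       [ $(wc -c < "$tmp") -eq 512 ]; then
        # Readable: write the same bytes back to the same sector
        dd if="$tmp" of="$dev" bs=512 seek="$lba" count=1 conv=notrunc 2>/dev/null
    else
        # Unreadable: the old contents are lost, so write zeros
        dd if=/dev/zero of="$dev" bs=512 seek="$lba" count=1 conv=notrunc 2>/dev/null
    fi
    status=$?    # exit status of whichever dd write ran
    rm -f "$tmp"
    return "$status"
}
```

Usage would be along the lines of `rewrite_sector /dev/hda "$lbasector"`.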

as for checking which file contains it... hmm file->sector->lba mapping 
can be performed... I don't know about the other direction.  Worst case 
would require checking the mapping of all files on the partition (and 
assuming it's not in an empty area or non-file system area).

Cheers,
MaZe.

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-13 11:30       ` Norman Diamond
  2003-10-13 11:58         ` Maciej Zenczykowski
@ 2003-10-13 12:02         ` John Bradford
  2003-10-15 10:23           ` Norman Diamond
  2003-10-14  6:54         ` Rogier Wolff
  2 siblings, 1 reply; 64+ messages in thread
From: John Bradford @ 2003-10-13 12:02 UTC
  To: Norman Diamond, linux-kernel

Quote from "Norman Diamond" <ndiamond@wta.att.ne.jp>:
> John Bradford replied to me:
> 
> > > > > How can I tell Linux to read every sector in the partition?  Oh, I
> > > > > might know this one,
> > > > >   dd if=/dev/hda8 of=/dev/null
> > > > > I want to make sure that the drive is now using a non-defective
> > > > > replacement sector.
> > > >
> > > > A read won't necessarily do that.  You might have to write to a
> > > > defective sector to force re-allocation.
> > >
> > > I agree, we are not sure if a read will do that.  That is the reason why
> > > two of my preceding questions were:
> > >
> > >    How can I find out which file contains the bad sector?  I would like
> > >    to try to recreate the file from a source of good data.
> >
> > How are you going to make sure you write it in the same location as it was
> > before?
> 
> Mostly it doesn't matter.  The primary purpose of this bit of it is to
> recreate the file to contain good data, which is why I would try to recreate
> it from a source of good data.

OK.

>  The secondary purpose is:
>   IF the bad sector doesn't get reused then great, then the next bit of
> effort will be to try to get the sector marked as bad, if there is any way
> to do that under Linux.  See the next question, which is now being reposted
> for at least the fourth time.
>   BUT IF the same sector number gets rewritten then hopefully the same
> sector number will be associated with a reallocated non-defective sector and
> the data will get written properly.

Yes, that's what I'd hope, unless the disk ran out of spare space to
allocate.

> > >    How can I tell Linux to mark the sector as bad, knowing the LBA
> > >    sector number?
> >
> > Don't.  If the drive can't fix this problem itself, throw it in the bin.
> 
> THE DRIVE HAS 1, ONE, HITOTSU, UNO, UN, BAD SECTOR.

No, the last SMART test re-allocated one sector.  That sector may have
gone bad in the next few minutes.  Unlikely, but possible.

>  The drive is capable of
> doing reallocations.  What kind of operation can be done that will persuade
> the drive to do the reallocation?

The drive has _done_ a reallocation.  You posted that the reallocated
sector count had gone from 1 to 2.  This is why I said if it can't fix
the problem, bin it.  It doesn't seem to have fixed the problem yet.

> > > And that is also the reason why my last question, which Mr. Bradford
> > > replied to, had the stated purpose of making sure that the drive is now
> > > using a non-defective replacement sector after the preceding operations
> > > have been carried out.
> >
> > Backup your data.
> 
> I want to fix the defective file from an existing backup or recomputation.
> Aside from that, it is my crash box (as already posted in this thread).

Somebody else might read this thread, and want full instructions.  It
might be your crash box, but somebody else might have data they want
to preserve.

>  The
> questions are still important because sometimes this kind of thing happens
> on machines that aren't crash boxes, and it is not customary to dump a drive
> when 99.99% of its preparations for error recovery are still intact.
> 
> > Run the S.M.A.R.T. tests.
> 
> I DID.  YOU REPLIED TO MY POSTING WHERE I REPORTED THEM.

1. I know.  I read your original post.
2. I am providing instructions that other people might follow in the
   future, that is why I am making sure they are complete.
3. Run the tests again.  Your drive fixed one bad sector, let's see if
   it completes the test again without finding more.

John.

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-13 10:24   ` Norman Diamond
  2003-10-13 10:33     ` John Bradford
@ 2003-10-13 14:24     ` Chuck Campbell
  2003-10-13 14:54       ` Maciej Zenczykowski
  2003-10-14  6:49     ` Rogier Wolff
  2 siblings, 1 reply; 64+ messages in thread
From: Chuck Campbell @ 2003-10-13 14:24 UTC
  To: linux-kernel

On Mon, Oct 13, 2003 at 07:24:00PM +0900, Norman Diamond wrote:
> 
> I agree, we are not sure if a read will do that.  That is the reason why two
> of my preceding questions were:
> 
>    How can I find out which file contains the bad sector?  I would like to
>    try to recreate the file from a source of good data.

this was given to me on this list by Al Viro a couple of years back.
Worked fine for me.

find /usr/lib -type f|sed -e 's!.*!cat & >/dev/null || echo &!'|sh

-- 

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-13 14:24     ` Chuck Campbell
@ 2003-10-13 14:54       ` Maciej Zenczykowski
  2003-10-13 16:29         ` Roger Larsson
  0 siblings, 1 reply; 64+ messages in thread
From: Maciej Zenczykowski @ 2003-10-13 14:54 UTC
  To: Chuck Campbell; +Cc: linux-kernel

> find /usr/lib -type f|sed -e 's!.*!cat & >/dev/null || echo &!'|sh
should obviously be:
  find /usr/lib -type f|sed -e 's!.*!cat "&" >/dev/null || echo &!'|sh
in order to accept spaces in file names... (they do happen).

MaZe.


* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-13 14:54       ` Maciej Zenczykowski
@ 2003-10-13 16:29         ` Roger Larsson
  0 siblings, 0 replies; 64+ messages in thread
From: Roger Larsson @ 2003-10-13 16:29 UTC
  To: linux-kernel

On Monday 13 October 2003 16.54, Maciej Zenczykowski wrote:
> > find /usr/lib -type f|sed -e 's!.*!cat & >/dev/null || echo &!'|sh
>
> should obviously be:
>   find /usr/lib -type f|sed -e 's!.*!cat "&" >/dev/null || echo &!'|sh
> in order to accept spaces in file names... (they do happen).

find /usr/lib -type f|sed -e 's!.*!cat "&" >/dev/null || echo "&"!'|sh

To accept even stranger characters... like parentheses '('.
Otherwise I get:

sh: line 10051: syntax error near unexpected token `('
sh: line 10051: `cat "/usr/lib/qt-3.0.5/templates/Dialog_with_Buttons_(Bottom).ui" >/dev/null || echo /usr/lib/qt-3.0.5/templates/Dialog_with_Buttons_(Bottom).ui'

/RogerL

-- 
Roger Larsson
Skellefteå
Sweden
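
Quoting can be avoided altogether by letting find pass the names as
arguments instead of generating shell text with sed.  A sketch along
those lines (`find_unreadable` is a hypothetical name); it relies only on
POSIX `find -exec ... {} +`:

```shell
#!/bin/sh
# Sketch: print every regular file under directory $1 that cannot be read
# to the end. find hands the names to sh as arguments, so spaces, quotes,
# parentheses and even newlines in names are never re-parsed as code.
find_unreadable() {
    find "$1" -type f -exec sh -c '
        for f in "$@"; do
            cat "$f" >/dev/null 2>&1 || printf "%s\n" "$f"
        done
    ' sh {} +
}

# e.g.: find_unreadable /usr/lib
```

It only reads file contents, so a bad sector in free space or in
filesystem metadata will not be reported this way.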

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-13 10:24   ` Norman Diamond
  2003-10-13 10:33     ` John Bradford
  2003-10-13 14:24     ` Chuck Campbell
@ 2003-10-14  6:49     ` Rogier Wolff
  2003-10-14  7:05       ` Wes Janzen
  2 siblings, 1 reply; 64+ messages in thread
From: Rogier Wolff @ 2003-10-14  6:49 UTC
  To: Norman Diamond; +Cc: John Bradford, linux-kernel

On Mon, Oct 13, 2003 at 07:24:00PM +0900, Norman Diamond wrote:
> John Bradford replied to me:
> 
> > > How can I tell Linux to read every sector in the partition?  Oh, I might
> > > know this one,
> > >   dd if=/dev/hda8 of=/dev/null
> > > I want to make sure that the drive is now using a non-defective
> > > replacement sector.
> >
> > A read won't necessarily do that.  You might have to write to a
> > defective sector to force re-allocation.
> 
> I agree, we are not sure if a read will do that.  That is the reason why two
> of my preceding questions were:

I've seen a disk (which now failed and will be replaced 3 hours from now)
remap defective sectors without reporting any errors to the OS. 
The SMART "remapped sector count" just went up, but no errors in the
logs. So apparently, the disk noticed something and remapped the sector
without anybody noticing. 

>    How can I find out which file contains the bad sector?  I would like to
>    try to recreate the file from a source of good data.

Try: 
	tar cf - / | dd of=/dev/null

(note some people will try to abbreviate that to 
	tar cf /dev/null / 
but that won't work: Tar will recognise that it's writing to /dev/null
and skip reading the files! That's a bug in tar in my book. )

>    How can I tell Linux to mark the sector as bad, knowing the LBA sector
>    number?

man tune2fs .

You have to do the math on the LBA sector numbers (subtract the
partition start, divide by two). 
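
The arithmetic can be written out.  A sketch assuming 512-byte sectors;
`lba_to_fsblock` is a hypothetical name, the partition start must be read
from the partition table (for example with `fdisk -lu`), and the value
16000000 below is made up for illustration.  Dividing by two corresponds
to 1024-byte filesystem blocks; 4096-byte blocks need a divisor of eight.

```shell
#!/bin/sh
# Hypothetical helper: map an absolute LBA (as reported by SMART) to a
# block number inside one partition. Assumes 512-byte sectors.
#   $1  absolute sector number on the whole disk
#   $2  first sector of the partition (from the partition table)
#   $3  filesystem block size in bytes, e.g. 1024 or 4096
lba_to_fsblock() {
    lba=$1
    part_start=$2
    block_size=$3
    if [ "$lba" -lt "$part_start" ]; then
        echo "LBA $lba is before the start of the partition" >&2
        return 1
    fi
    # Byte offset into the partition, integer-divided by the block size
    echo $(( (lba - part_start) * 512 / block_size ))
}

# Made-up partition start, for illustration only:
lba_to_fsblock 19021882 16000000 1024    # prints 1510941
lba_to_fsblock 19021882 16000000 4096    # prints 377735
```

On ext2, the resulting block number can be given to debugfs's `icheck`
request to find the inode, and `ncheck` maps that inode to a path name;
these ext2 commands do not apply to ReiserFS.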

Also, you can use the "badblocks" program. 

			Roger. 
-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
**** "Linux is like a wigwam -  no windows, no gates, apache inside!" ****

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-13 11:30       ` Norman Diamond
  2003-10-13 11:58         ` Maciej Zenczykowski
  2003-10-13 12:02         ` John Bradford
@ 2003-10-14  6:54         ` Rogier Wolff
  2 siblings, 0 replies; 64+ messages in thread
From: Rogier Wolff @ 2003-10-14  6:54 UTC
  To: Norman Diamond; +Cc: John Bradford, linux-kernel

On Mon, Oct 13, 2003 at 08:30:19PM +0900, Norman Diamond wrote:
> > How are you going to make sure you write it in the same location as it was
> > before?
> 
> Mostly it doesn't matter.  The primary purpose of this bit of it is to
> recreate the file to contain good data, which is why I would try to recreate
> it from a source of good data.  The secondary purpose is:

Note that I strongly recommend not putting any important data on
a drive that has shown to have defective sectors(*). You never know when
the next sector is going to go. 

We're replacing a drive that has remapped 13 sectors or something like
that, and it's now given us the first IO errors, so it's going towards
the bin. 

		Roger. 

(*) If you're sure that something external which can be prevented in the
future caused the bad sectors, then fine. But if a drive is developing
bad sectors all by itself, the future might bring remapped sectors until
the slack remap space runs out, or one day a sector containing important
data goes bad.... 

-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
**** "Linux is like a wigwam -  no windows, no gates, apache inside!" ****

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-14  6:49     ` Rogier Wolff
@ 2003-10-14  7:05       ` Wes Janzen
  2003-10-14  7:21         ` John Bradford
                           ` (2 more replies)
  0 siblings, 3 replies; 64+ messages in thread
From: Wes Janzen @ 2003-10-14  7:05 UTC
  To: Rogier Wolff; +Cc: Norman Diamond, John Bradford, linux-kernel



Rogier Wolff wrote:

>On Mon, Oct 13, 2003 at 07:24:00PM +0900, Norman Diamond wrote:
>  
>
>>John Bradford replied to me:
>>
>>    
>>
>>>>How can I tell Linux to read every sector in the partition?  Oh, I might
>>>>know this one,
>>>>  dd if=/dev/hda8 of=/dev/null
>>>>I want to make sure that the drive is now using a non-defective
>>>>replacement sector.
>>>>        
>>>>
>>>A read won't necessarily do that.  You might have to write to a
>>>defective sector to force re-allocation.
>>>      
>>>
>>I agree, we are not sure if a read will do that.  That is the reason why two
>>of my preceding questions were:
>>    
>>
>
>I've seen a disk (which now failed and will be replaced 3 hours from now)
>remap defective sectors without reporting any errors to the OS. 
>The SMART "remapped sector count" just went up, but no errors in the
>logs. So apparently, the disk noticed something and remapped the sector
>without anybody noticing. 
>  
>
Can't you pretty much get the drive to check itself using smartctl, such 
as running:
     smartctl -o on -s on -S on /dev/hde &> /dev/null
in an init script?  Also, I think if you just happen to write to a bad 
sector the drive will remap it without a warning (unless it doesn't have 
any remapping sectors left), but if you read from it then to get the 
drive to "notice" it, you have to write back to that sector.  Or run the 
drive test which should find it and correct it.

>  
>
>>   How can I find out which file contains the bad sector?  I would like to
>>   try to recreate the file from a source of good data.
>>    
>>
>
>Try: 
>	tar cf - / | dd of=/dev/null
>
>(note some people will try to abbreviate that to 
>	tar cf /dev/null / 
>but that won't work: Tar will recognise that it's writing to /dev/null
>and skip reading the files! That's a bug in tar in my book. )
>
>  
>
>>   How can I tell Linux to mark the sector as bad, knowing the LBA sector
>>   number?
>>    
>>
>
>man tune2fs .
>
>You have to do the math on the LBA sector numbers (subtract the
>partition start, divide by two). 
>
>Also, you can use the "badblocks" program. 
>  
>
I think he's using reiserfs on the partition, which AFAIK doesn't
support marking bad sectors without some work.  I tend to agree with 
namesys when they suggest just getting a new drive if it has used up all 
of its extra sectors.  In my experience (admittedly limited), any drive 
which runs out of extra sectors starts to go bad in a hurry.

-Wes-

>			Roger. 
>  
>


* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-14  7:05       ` Wes Janzen
@ 2003-10-14  7:21         ` John Bradford
  2003-10-14  7:40           ` Rogier Wolff
  2003-10-14  7:24         ` Rogier Wolff
  2003-10-14  9:04         ` Hans Reiser
  2 siblings, 1 reply; 64+ messages in thread
From: John Bradford @ 2003-10-14  7:21 UTC
  To: Wes Janzen, Rogier Wolff; +Cc: Norman Diamond, John Bradford, linux-kernel

> >>>>I want to make sure that the drive is now using a non-defective
> >>>>replacement sector.
> >>>>        
> >>>>
> >>>A read won't necessarily do that.  You might have to write to a
> >>>defective sector to force re-allocation.
> >>>      
> >>>
> >>I agree, we are not sure if a read will do that.  That is the reason why two
> >>of my preceding questions were:
> >>    
> >>
> >
> >I've seen a disk (which now failed and will be replaced 3 hours from now)
> >remap defective sectors without reporting any errors to the OS. 
> >The SMART "remapped sector count" just went up, but no errors in the
> >logs. So apparently, the disk noticed something and remapped the sector
> >without anybody noticing. 
> >  
> >
> Can't you pretty much get the drive to check itself using smartctl, such 
> as running:
>      smartctl -o on -s on -S on /dev/hde &> /dev/null
> in an init script?  Also, I think if you just happen to write to a bad 
> sector the drive will remap it without a warning (unless it doesn't have 
> any remapping sectors left), but if you read from it then to get the 
> drive to "notice" it, you have to write back to that sector.  Or run the 
> drive test which should find it and correct it.

That's correct for the majority of modern IDE disks.

> >>   How can I tell Linux to mark the sector as bad, knowing the LBA sector
> >>   number?
> >>    
> >>
> >
> >man tune2fs .
> >
> >You have to do the math on the LBA sector numbers (subtract the
> >partition start, divide by two). 
> >
> >Also, you can use the "badblocks" program. 
> >  
> >
> I think he's using reiserfs on the partition, which AFAIK doesn't
> support marking bad sectors without some work.  I tend to agree with 
> namesys when they suggest just getting a new drive if it has used up all 
> of its extra sectors.  In my experience (admittedly limited), any drive 
> which runs out of extra sectors starts to go bad in a hurry.

I fail to see the point of this discussion.  What is the point in
marking sectors bad at the filesystem level, when the drive is
supposed to be doing it at the firmware level?

The drive is probably full of unusable areas, which are correctly
identified and not used by the firmware.  One more is detected, and
the firmware doesn't cope with it.  Suddenly we are getting
suggestions to work around that in the filesystem.

The drive may well have been developing faults regularly through its
entire lifetime, and you haven't noticed.  Now you have noticed and
want to work around the problem, but why wouldn't the drive continue
its 'natural decay', and assuming it does, why would it be able to
re-map future bad blocks, but not this one?

Working around the problem in the filesystem makes no sense at all on
a modern IDE drive.

John.

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-14  7:05       ` Wes Janzen
  2003-10-14  7:21         ` John Bradford
@ 2003-10-14  7:24         ` Rogier Wolff
  2003-10-14  9:04         ` Hans Reiser
  2 siblings, 0 replies; 64+ messages in thread
From: Rogier Wolff @ 2003-10-14  7:24 UTC
  To: Wes Janzen; +Cc: Rogier Wolff, Norman Diamond, John Bradford, linux-kernel

On Tue, Oct 14, 2003 at 02:05:27AM -0500, Wes Janzen wrote:
> >I've seen a disk (which now failed and will be replaced 3 hours from now)
> >remap defective sectors without reporting any errors to the OS. 
> >The SMART "remapped sector count" just went up, but no errors in the
> >logs. So apparently, the disk noticed something and remapped the sector
> >without anybody noticing. 
> > 
> >
> Can't you pretty much get the drive to check itself using smartctl, such 
> as running:
>     smartctl -o on -s on -S on /dev/hde &> /dev/null

I strongly recommend you store the output somewhere. Sending it to
/dev/null means you will end up ignoring, for instance:
	hde: no such device
without being ABLE to notice it. (Since this is an init script, outputting
to stdout is not good. Store it in /var/log somewhere.)
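
One way to keep that output is a small wrapper around the command.  A
sketch: `run_logged` and the log file name are hypothetical; the smartctl
options are the ones quoted above.

```shell
#!/bin/sh
# Sketch: run a command and append a timestamp, its output (stdout and
# stderr) and its exit status to a log file, instead of discarding them.
#   $1     log file
#   $2...  command and its arguments
run_logged() {
    log=$1
    shift
    printf '%s: %s\n' "$(date)" "$*" >>"$log"
    "$@" >>"$log" 2>&1
    status=$?
    printf 'exit status %d\n' "$status" >>"$log"
    return "$status"
}

# In an init script (the log file name is an assumption):
# run_logged /var/log/smartctl-init.log smartctl -o on -s on -S on /dev/hde
```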

> in an init script?  Also, I think if you just happen to write to a bad 
> sector the drive will remap it without a warning (unless it doesn't have 
> any remapping sectors left), but if you read from it then to get the 
> drive to "notice" it, you have to write back to that sector.  Or run the 
> drive test which should find it and correct it.

The drive which I'm replacing has had a total of 22 power cycles.
Something like 15 power cycles seem to have happened during "install"; we
had some hardware problems after that (we replaced the motherboard),
apparently accounting for another 7 power cycles. That's all.

If you manage to get the drive to notice sectors going bad
just before they actually GO bad, then you'll see an exponential
increase in sectors going bad, resulting in the drive quickly 
running out of spare sectors. This defeats the purpose of SMART
in alerting you to a failing drive before it costs you your valuable
data.

If an area of say 2mm x 2mm is going bad, then that's already many
megabytes on a modern drive. The drive is going to decide to remap
sectors there on a case-by-case basis, keeping on storing your valuable
data in sectors which just didn't get noticed. You don't have the
ability to notice the structure in the bad sectors. 

If say a read-amp is slowly going bad, the worst sectors are going 
first, but the whole drive will fail soonish. 

Take it as a warning. Take the drive back on warranty. Point them to the
marketingspeak on the box which says: "defect free interface" or
something like that. You want a drive without bad sectors.

If you can't take it back, move it away to your "long term storage"
disk, where you keep the backup of your CD collection or something like
that. Don't put anything important on it. 

			Roger. 

-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
**** "Linux is like a wigwam -  no windows, no gates, apache inside!" ****

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-14  7:21         ` John Bradford
@ 2003-10-14  7:40           ` Rogier Wolff
  2003-10-14  8:11             ` John Bradford
       [not found]             ` <200310140800.h9E80BT9000815@81-2-122-30.bradfords.org.uk>
  0 siblings, 2 replies; 64+ messages in thread
From: Rogier Wolff @ 2003-10-14  7:40 UTC (permalink / raw)
  To: John Bradford; +Cc: Wes Janzen, Rogier Wolff, Norman Diamond, linux-kernel

On Tue, Oct 14, 2003 at 08:21:48AM +0100, John Bradford wrote:
> > >
> > >Also, you can use the "badblocks" program. 
> > >  
> > >
> > I think he's using reiserfs on the partition, which AFAIK doesn't 
> > support marking bad sectors without some work.  I tend to agree with 
> > namesys when they suggest just getting a new drive if it has used up all 
> > of its extra sectors.  In my experience (admittedly limited), any drive 
> > which runs out of extra sectors starts to go bad in a hurry.
> 
> I fail to see the point of this discussion.  What is the point in
> marking sectors bad at the filesystem level, when the drive is
> supposed to be doing it at the firmware level?

I'm not sure in what cases a drive will remap a sector. Manufacturers
are not publishing this.

So if you get a read-error (showing you that some of your data was just
lost!), you could just rewrite that sector and hope for the drive to
remap it. Well, you just lost some of your data. Maybe it was part of a
file you got from a CD. Fine. Easy to replace. Maybe it was part of your
CD-collection-backup. Fine. Easy to replace. Maybe it was part of your
thesis document. Oops. Difficult to replace.

> The drive is probably full of unusable areas, which are correctly
> identified and not used by the firmware.  One more is detected, and
> the firmware doesn't cope with it.  Suddenly we are getting
> suggestions to work around that in the filesystem.

Right. Support for bad sectors is really easy to build into a
filesystem. If Reiserfs doesn't (yet) support it, another reason not 
to use Reiserfs. 

You create a file called something like ".badblocks" in the root
directory. If as a filesystem you get to know of a bad block, just
allocate it towards that file. Next it pays to make the file invisible
from userspace (otherwise "tar backups" would try to read it!). 

This is usually done by just allocating an inode number for it, and
telling fsck about it, to prevent it being linked into lost+found 
on the first fsck.... 
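The scheme above can be illustrated with a toy block allocator with three parts:
- bad blocks are owned by a reserved inode;
- normal allocation skips them because they are already owned;
- fsck leaves that inode out of its orphan check.

This is a hypothetical Python model, not ReiserFS or ext2 code. The reserved number 1 mirrors ext2, which keeps its bad-block list in inode 1. All class and method names are made up for illustration.

```python
from dataclasses import dataclass, field

BAD_INO = 1  # reserved inode owning all bad blocks (ext2 uses inode 1 the same way)


@dataclass
class ToyFS:
    """Toy block allocator with a hidden bad-block file; not real filesystem code."""
    nblocks: int
    owner: dict = field(default_factory=dict)  # block number -> owning inode

    def mark_bad(self, block):
        """Assign `block` to the bad-block inode.

        Returns the inode that owned it before (its data is damaged), or
        None if the block was free or already marked bad.
        """
        previous = self.owner.get(block)
        self.owner[block] = BAD_INO
        return None if previous == BAD_INO else previous

    def allocate(self, ino, count):
        """Give `count` free blocks to `ino`; owned blocks, bad ones included, are skipped."""
        free = [b for b in range(self.nblocks) if b not in self.owner][:count]
        if len(free) < count:
            raise OSError("no space left on toy device")
        for b in free:
            self.owner[b] = ino
        return free

    def fsck_orphans(self, linked):
        """Inodes that own blocks but are not linked from any directory.

        BAD_INO is never reported, so fsck will not move it to lost+found.
        """
        return {ino for ino in self.owner.values()
                if ino != BAD_INO and ino not in linked}


fs = ToyFS(nblocks=10)
fs.mark_bad(3)
print(fs.allocate(12, 4))  # [0, 1, 2, 4]: bad block 3 is skipped
print(fs.mark_bad(1))      # 12: inode 12 had data in block 1
```

Having `mark_bad` return the previous owner also answers "which file was hit?" at the moment the block is retired.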

> The drive may well have been developing faults regularly through its
> entire lifetime, and you haven't noticed.  Now you have noticed and
> want to work around the problem, but why wouldn't the drive continue
> its 'natural decay', and assuming it does, why would it be able to
> re-map future bad blocks, but not this one?

On the other hand, I once bumped my knee against the bottom of the table
that my computer was on. That was the exact moment that one of my
sectors went bad. So now I know the cause, and want to remap the sector. 
No gradual decay. 

			Roger. 

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-14  7:40           ` Rogier Wolff
@ 2003-10-14  8:11             ` John Bradford
  2003-10-14  8:45               ` Hans Reiser
       [not found]             ` <200310140800.h9E80BT9000815@81-2-122-30.bradfords.org.uk>
  1 sibling, 1 reply; 64+ messages in thread
From: John Bradford @ 2003-10-14  8:11 UTC (permalink / raw)
  To: Rogier Wolff; +Cc: Wes Janzen, linux-kernel

This is my last mail on this subject.

> I'm not sure in what cases a drive will remap a sector. Manufacturers
> are not publishing this.
> 
> So if you get a read-error (showing you that some of your data was just
> lost!), you could just rewrite that sector and hope for the drive to
> remap it. Well, you just lost some of your data. Maybe it was part of a
> file you got from a CD. Fine. Easy to replace. Maybe it was part of your
> CD-collection-backup. Fine. Easy to replace. Maybe it was part of your
> thesis document. Oops. Difficult to replace.

Sector re-mapping is not a replacement for backing up your data.  It
merely adds resilience to the disk.  In fact, it's more or less
impossible to get away from these days - modern IDE disks error
correct all the time.  One area of the disk going bad is not an
unlikely event.

> > The drive is probably full of unusable areas, which are correctly
> > identified and not used by the firmware.  One more is detected, and
> > the firmware doesn't cope with it.  Suddenly we are getting
> > suggestions to work around that in the filesystem.
> 
> Right. Support for bad sectors is really easy to build into a
> filesystem. If Reiserfs doesn't (yet) support it, another reason not 
> to use Reiserfs. 

Not at all.  A bad sector map in the filesystem is a pointless feature
for a filesystem which will only likely be used on fault tolerant
devices.  It serves no purpose.  The 'it does no harm' argument is
just as pointless. 

> You create a file called something like ".badblocks" in the root
> directory. If as a filesystem you get to know of a bad block, just
> allocate it towards that file. Next it pays to make the file invisible
> from userspace. (otherwise "tar backups" would try to read it!). 

Doing that kind of thing was quite useful in the 1980s when floppies
were actually expensive and hard disks usually didn't remap bad
sectors.  Nowadays, it usually gains nothing, and may well hide real
faults that could cause data loss later on.

> This is usually done by just allocating an inode number for it, and
> telling fsck about it, to prevent it being linked into lost+found 
> on the first fsck.... 
> 
> > The drive may well have been developing faults regularly through its
> > entire lifetime, and you haven't noticed.  Now you have noticed and
> > want to work around the problem, but why wouldn't the drive continue
> > its 'natural decay', and assuming it does, why would it be able to
> > re-map future bad blocks, but not this one?
> 
> On the other hand, I once bumped my knee against the bottom of the table
> that my computer was on. That was the exact moment that one of my
> sectors went bad. So now I know the cause, and want to remap the sector. 
> No gradual decay. 

Again, you are talking around the problem - there almost certainly
will be gradual decay with any disk.  You are just not noticing it
because the firmware is handling it.  If you know that there is a bad
sector, and the disk is not re-mapping it, _why_ isn't it remapping
it?

John.

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-14  8:11             ` John Bradford
@ 2003-10-14  8:45               ` Hans Reiser
  2003-10-14  9:46                 ` Rogier Wolff
  2003-10-14 10:19                 ` John Bradford
  0 siblings, 2 replies; 64+ messages in thread
From: Hans Reiser @ 2003-10-14  8:45 UTC (permalink / raw)
  To: John Bradford; +Cc: Rogier Wolff, Wes Janzen, linux-kernel

Perhaps we should tell people to first write to the bad block, and only 
if the block remains bad after triggering the remapping by writing to it 
should you make any effort to get the filesystem to remap it for you.  
What do you think?

Rogier has not indicated that he has tried writing to the bad sector, 
has he?

-- 
Hans

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
       [not found]               ` <20031014081110.GA14418@bitwizard.nl>
@ 2003-10-14  8:55                 ` Wes Janzen
  2003-10-14 10:05                   ` Rogier Wolff
  0 siblings, 1 reply; 64+ messages in thread
From: Wes Janzen @ 2003-10-14  8:55 UTC (permalink / raw)
  To: Rogier Wolff; +Cc: John Bradford, linux-kernel



Rogier Wolff wrote:

>On Tue, Oct 14, 2003 at 09:00:11AM +0100, John Bradford wrote:
>  
>
>>Besides, a read error might not mean the data is lost, maybe the drive
>>marked it bad because the amount of error correction needed to
>>retrieve the data was just 'on the edge' of what was possible.
>>    
>>
>
>No. A read error means the data was lost. 
>
>The drive may reallocate it when it was "on the edge". 
>
>  
>
>>Again, I'm not sure what you are implying.  I don't use ReiserFS
>>personally, but I think it's a _good_ thing if it doesn't implement
>>    
>>
>
>Good Keep it that way. 
>
>  
>
>>bad sector mapping because I don't see any use for it.  If somebody
>>wants to use ReiserFS on an ST-506 disk, the block layer should handle
>>re-allocations, and present an always good block device to the
>>filesystem.
>>    
>>
>
>We don't have that block layer. 
>
>  
>
>>>You create a file called something like ".badblocks" in the root
>>>directory. If as a filesystem you get to know of a bad block, just
>>>allocate it towards that file. Next it pays to make the file invisible
>>>from userspace. (otherwise "tar backups" would try to read it!). 
>>>      
>>>
>>>This is usually done by just allocating an inode number for it, and
>>>telling fsck about it, to prevent it being linked into lost+found 
>>>on the first fsck.... 
>>>
>>>      
>>>
>>>>The drive may well have been developing faults regularly through its
>>>>entire lifetime, and you haven't noticed.  Now you have noticed and
>>>>want to work around the problem, but why wouldn't the drive continue
>>>>its 'natural decay', and assuming it does, why would it be able to
>>>>re-map future bad blocks, but not this one?
>>>>        
>>>>
>>>On the other hand, I once bumped my knee against the bottom of the table
>>>that my computer was on. That was the exact moment that one of my
>>>sectors went bad. So now I know the cause, and want to remap the sector. 
>>>No gradual decay. 
>>>      
>>>
>>Why didn't the drive firmware remap that bad sector then?
>>    
>>
>
>Because it was an MFM drive.
>
>Point is that if you KNOW the cause of the bad block, it might be
>worth the trouble not to use it anymore. 
>
>  
>
>>If it actually refused to, my point stands - bad sectors not getting
>>remapped.  You would be relying on no future sector going bad.  Good
>>luck.
>>    
>>
>
>Even if the remap works, you might have a performance penalty. 
>If you skip the 4k block in the future, your 40MB per second drive
>will be "idle" for 100 microseconds, dropping your performance
>from 40,000,000 bytes to 39,996,000 bytes in that second. But if
>a seek to the remapped sector is involved, you're losing several
>milliseconds of your disk's performance!
>
>And the real-time performance of the drive becomes unreliable. 
>Worst case, in a 1Mbyte block 1 million sectors are remapped,
>requiring a seek of 10ms. While normally reading that block of
>data would consume 1/40th of a second, you are now looking at
>about 3 hours. 
>
Well, aren't we talking about hardware sectors?  The hardware sectors 
are probably at least 1 MB in size to start with.  My old 16GB Maxtor 
that had remapped its way out of sectors only had 16 to remap (the last 
unit I had fail due to this problem).  I doubt the hardware sectors were 
anywhere near 1 byte in size.  The bad sectors also seemed to occur at 
an exponential rate, which is supported by the 5 drives I've seen go bad 
in this manner.  Supposedly that has to do with debris spreading across 
the platter and taking out adjacent sectors.  The one drive I didn't 
send back or replace immediately after the first error (i.e. no more 
sectors can be remapped) had lost nearly 50MB of space to bad sectors in 
a week, and 200MB by the time the replacement arrived 4 days later.  I 
imagine that this only gets worse as more data is packed into a smaller 
space.

>If you are streaming a video off this drive, 
>that doesn't sound like an option. (say requiring only 4MB per
>second of throughput, i.e. having a factor of 10 of performance
>margin!)
>  
>

Is there even a way to disable sector remapping on an ATA drive anyway?  
To avoid these "disadvantages of hardware remapping" you'd need some way 
to ensure that the drive didn't remap any sectors.  As someone noted, 
their drive remapped a sector without anything showing up in the log. 

I start more closely watching any drive that remaps more than half its 
available sectors, if it gets close to the limit I replace it (if it's 
out of warranty, otherwise I help it along with some badblock runs).  
It's just not worth the hassle of losing data.  At least if the drive 
detects the error, chances are it recovers the data and copies it to a 
good sector (at least I've never lost any data from a drive remapping).  
I can't say the same for the filesystem trying to recover the data, 
which usually seems to result in a corrupted file.  IMHO, the data 
integrity of hardware remapping outweighs any performance disadvantage 
as compared to a filesystem-only based solution.

Now if only the drive would catch the problem without requiring a write 
to the offending sector first. ;-)  Maybe that's already fixed on the 
newer drives, none of my newer ones have remapped sectors yet.

-Wes-

>			Roger. 
>
>
>  
>


* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-14  7:05       ` Wes Janzen
  2003-10-14  7:21         ` John Bradford
  2003-10-14  7:24         ` Rogier Wolff
@ 2003-10-14  9:04         ` Hans Reiser
  2003-10-15 10:23           ` Norman Diamond
  2003-10-17  9:40           ` Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?) Norman Diamond
  2 siblings, 2 replies; 64+ messages in thread
From: Hans Reiser @ 2003-10-14  9:04 UTC (permalink / raw)
  To: Wes Janzen
  Cc: Rogier Wolff, Norman Diamond, John Bradford, linux-kernel, nikita

Wes Janzen wrote:

>
>>
>>
>> You have to do the math on the LBA sector numbers (subtract the
>> partition start, divide by two).
>> Also, you can use the "badblocks" program.  
>>
> I think he's using reiserfs on the partition, which AFAIK doesn't 
> support marking bad sectors without some work.  I tend to agree with 
> namesys when they suggest just getting a new drive if it has used up 
> all of its extra sectors.  In my experience (admittedly limited), any 
> drive which runs out of extra sectors starts to go bad in a hurry.
>
> -Wes-
>
>>             Roger.  
>>
>
>
I think the problem is that many users don't know how to trigger the bad 
sector remapping for the case where the drive can still remap, using 
writes to the bad blocks, and probably our faq needs updating.

nikita, can you do that?

-- 
Hans



* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-14  8:45               ` Hans Reiser
@ 2003-10-14  9:46                 ` Rogier Wolff
  2003-10-14  9:57                   ` Hans Reiser
  2003-10-14 10:19                 ` John Bradford
  1 sibling, 1 reply; 64+ messages in thread
From: Rogier Wolff @ 2003-10-14  9:46 UTC (permalink / raw)
  To: Hans Reiser; +Cc: John Bradford, Rogier Wolff, Wes Janzen, linux-kernel

On Tue, Oct 14, 2003 at 12:45:34PM +0400, Hans Reiser wrote:
> Perhaps we should tell people to first write to the bad block, and only 
> if the block remains bad after triggering the remapping by writing to it 
> should you make any effort to get the filesystem to remap it for you.  
> What do you think?
> 
> Rogier has not indicated that he has tried writing to the bad sector, 
> has he?

Hans, 

I simply refuse to try to trigger a remapping by writing to the
sector. A couple of things can happen:

1) The write succeeds on the "bad" spot. The "normal" write doesn't
do a "verify-after-write", so the write might simply be succeeding, 
resulting in an immediate data-loss (which might be masked if I try
to reread the data from userspace because the data is still cached!)

2) the realloc might succeed, hiding the fact that my drive just lost
0.5k bytes of my data. I mean, there was SOME data there. Linux
wouldn't try to be reading it if it had never been written, right?  A
drive that refers my data to /dev/null should be diverted there
itself.
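Point 1 is a real trap: a read issued right after a write may be served from the kernel's page cache. Short of a reboot, one way to take the page cache out of the picture is to read the sector back through O_DIRECT. This is a minimal Python sketch with these assumptions:
- Linux;
- 512-byte logical sectors (drives with larger logical sectors need a larger, aligned buffer);
- root access to the raw device.

`read_sector` is an illustrative name. The sketch does not bypass the drive's own cache, which may still hand back the data it just accepted. Hans's power-off suggestion therefore remains the stricter check.

```python
import mmap
import os

SECTOR = 512  # assumption: 512-byte logical sectors


def read_sector(path, lba, direct=True):
    """Read the sector at absolute `lba` from `path` (e.g. /dev/hda).

    With direct=True the device is opened with O_DIRECT (Linux only), so the
    read bypasses the kernel page cache instead of returning a cached copy.
    """
    flags = os.O_RDONLY
    if direct:
        flags |= os.O_DIRECT
    buf = mmap.mmap(-1, SECTOR)  # anonymous mapping: page-aligned, as O_DIRECT requires
    try:
        fd = os.open(path, flags)
        try:
            os.lseek(fd, lba * SECTOR, os.SEEK_SET)
            n = os.readv(fd, [buf])
            return buf[:n]
        finally:
            os.close(fd)
    finally:
        buf.close()
```

Run as root, comparing `read_sector("/dev/hda", lba)` with the data just written shows what the drive returns, not what the kernel remembers.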

Of course, I left my drive that indicated it had problems (i.e. it
didn't spot the sector going bad before it became unreadable), in the
machine for another two days. It's getting replaced ASAP (i.e. the
next hour or so).

The bad sector developed in a backup of data that is still running
happily on another machine. But I'm not risking a sector getting
assigned some important data going bad next time I notice something.

			Roger. 

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-14  9:46                 ` Rogier Wolff
@ 2003-10-14  9:57                   ` Hans Reiser
  2003-10-14 10:10                     ` Rogier Wolff
  0 siblings, 1 reply; 64+ messages in thread
From: Hans Reiser @ 2003-10-14  9:57 UTC (permalink / raw)
  To: Rogier Wolff; +Cc: John Bradford, Wes Janzen, linux-kernel

Rogier Wolff wrote:

>On Tue, Oct 14, 2003 at 12:45:34PM +0400, Hans Reiser wrote:
>  
>
>>Perhaps we should tell people to first write to the bad block, and only 
>>if the block remains bad after triggering the remapping by writing to it 
>>should you make any effort to get the filesystem to remap it for you.  
>>What do you think?
>>
>>Rogier has not indicated that he has tried writing to the bad sector, 
>>has he?
>>    
>>
>
>Hans, 
>
>I simply refuse to try to trigger a remapping by writing to the
>sector. A couple of things can happen:
>
>1) The write succeeds on the "bad" spot.
>
> The "normal" write doesn't
>do a "verify-after-write", so the write might simply be succeeding, 
>resulting in an immediate data-loss (which might be masked if I try
>to reread the data from userspace because the data is still cached!)
>
Do a hard reboot with > 25 seconds power off.

>
>2) the realloc might succeed, hiding the fact that my drive just lost
>0.5k bytes of my data. I mean, there was SOME data there. Linux
>wouldn't try to be reading it if it had never been written, right?  A
>drive that refers my data to /dev/null should be diverted there
>itself.
>
>Of course, I left my drive that indicated it had problems (i.e. it
>didn't spot the sector going bad before it became unreadable), in the
>machine for another two days. It's getting replaced ASAP (i.e. the
>next hour or so).
>
>The bad sector developed in a backup of data that is still running
>happily on another machine. But I'm not risking a sector getting
>assigned some important data going bad next time I notice something.
>
>			Roger. 
>
>  
>
replacing the drive is reasonable caution.  I think though that the 
other poster is right that IFF you want to remap bad blocks, the drive 
should do it not reiserfs.

-- 
Hans



* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-14  8:55                 ` Wes Janzen
@ 2003-10-14 10:05                   ` Rogier Wolff
  0 siblings, 0 replies; 64+ messages in thread
From: Rogier Wolff @ 2003-10-14 10:05 UTC (permalink / raw)
  To: Wes Janzen; +Cc: Rogier Wolff, John Bradford, linux-kernel

On Tue, Oct 14, 2003 at 03:55:09AM -0500, Wes Janzen wrote:
> >And the real-time performance of the drive becomes unreliable. 
> >Worst case, in a 1Mbyte block 1 million sectors are remapped,
> >requiring a seek of 10ms. While normally reading that block of
> >data would consume 1/40th of a second, you are now looking at
> >about 3 hours. 

> Well, aren't we talking about hardware sectors?  The hardware sectors 
> are probably at least 1 MB in size to start with.  My old 16GB Maxtor 
> that had remapped its way out of sectors only had 16 to remap (the last 
> unit I had fail due to this problem).  I doubt the hardware sectors were 
> anywhere near 1 byte in size.  The bad sectors also seemed to occur at 

Oops. Sorry. Too quick with the numbers. The remapping granularity is
1 sector (0.5 kbytes), and there are about 2000 of those in a megabyte.

So if the odd-numbered ones end up remapped, you have 2000 seeks to
perform to read that 1MB of data. That would come to 2000 * 10ms = 20
seconds. Not quite as bad as several hours, but still.... 
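Roger's corrected figure can be reproduced with a few lines of arithmetic. This is a rough Python model with these assumptions:
- 1 MiB = 2048 sectors of 512 bytes;
- every other sector is remapped;
- each remapped sector costs one 10 ms seek out to the spare area and one seek back;
- the drive is the 40 MB/s one from the earlier example.

Real firmware may group spares and cache aggressively, so treat the result as an order-of-magnitude worst case.

```python
SECTOR = 512         # bytes per sector
SEEK_S = 0.010       # assumed average seek: 10 ms
RATE = 40_000_000    # assumed sustained rate: 40 MB/s


def skip_idle_s(skipped_bytes, rate=RATE):
    """Time the head spends passing over data that is skipped, not read."""
    return skipped_bytes / rate


def remapped_read_s(nbytes, remapped_every=2, seek_s=SEEK_S, rate=RATE):
    """Rough time to read nbytes when every `remapped_every`-th sector is remapped.

    Each remapped sector costs one seek out to the spare area and one seek back.
    """
    sectors = nbytes // SECTOR
    remapped = sectors // remapped_every
    return nbytes / rate + 2 * remapped * seek_s


print(f"skip 4 KB: {skip_idle_s(4096) * 1e6:.1f} us")                           # 102.4 us
print(f"1 MiB clean: {remapped_read_s(1 << 20, seek_s=0):.3f} s")               # 0.026 s
print(f"1 MiB, every other sector remapped: {remapped_read_s(1 << 20):.1f} s")  # 20.5 s
```

Skipping a block costs only the transfer time of the skipped data. A remap costs whole seeks, so the seek term dominates as soon as more than a handful of sectors are involved.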

> an exponential rate, which is supported by the 5 drives I've seen go bad 
> in this manner.  Supposedly that has to do with debris spreading across 
> the platter and taking out adjacent sectors.  The one drive I didn't 
> send back or replace immediately after the first error (i.e. no more 
> sectors can be remapped) had lost nearly 50MB of space to bad sectors in 
> a week, and 200MB by the time the replacement arrived 4 days later.  I 
> imagine that this only gets worse as more data is packed into a smaller 
> space.

This supports my statement that if you notice sectors getting bad,
replace the disk as fast as you can, and hope that the sector
remapping bails you out until you get that chance.


> Is there even a way to disable sector remapping on an ATA drive anyway?  
> To avoid these "disadvantages of hardware remapping" you'd need some way 
> to ensure that the drive didn't remap any sectors.  As someone noted, 
> their drive remapped a sector without anything showing up in the log. 

Some drives claim "AV compatibility" or something like that. I think
that this means that they will have their spare sectors on the same
cylinder. i.e. no seeking. (just on average 8ms delay).
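Roger's 8 ms is about one full revolution of a 7200 rpm drive. A toy model suggests why a same-track spare costs a whole revolution rather than an average half. The head first waits for the spare slot, then waits again for the next sector it skipped past, and the two waits always add up to one turn. This is a Python sketch with these assumptions:
- the spare lies on the same track (a spare on another surface of the same cylinder would add a head switch);
- the track holds an illustrative 63 sectors;
- the rotation speeds are examples, since the thread does not state one.

```python
def revolution_ms(rpm):
    """Duration of one platter revolution in milliseconds."""
    return 60_000 / rpm


def detour_extra_sectors(p, s, n):
    """Extra sector-times when logical sector p, met during a sequential read
    of a track with n sector slots, has been moved to spare slot s on the same
    track. Toy model: head switches and track skew are ignored.
    """
    if s == p:
        return 0                   # not remapped
    wait_to_spare = (s - p) % n    # head sits at slot p; wait for slot s
    wait_back = (p - s) % n        # after reading s the head is at s+1; wait for p+1
    # Reading the spare takes one sector-time, the same as reading p in place,
    # so only the two waits are extra.
    return wait_to_spare + wait_back


print(detour_extra_sectors(10, 40, 63))            # 63: one full revolution
print(f"{revolution_ms(7200):.1f} ms at 7200 rpm")  # 8.3 ms
print(f"{revolution_ms(5400):.1f} ms at 5400 rpm")  # 11.1 ms
```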

> I start more closely watching any drive that remaps more than half its 
> available sectors, if it gets close to the limit I replace it (if it's 
> out of warranty, otherwise I help it along with some badblock runs).  
> It's just not worth the hassle of losing data.  At least if the drive 
> detects the error, chances are it recovers the data and copies it to a 
> good sector (at least I've never lost any data from a drive remapping).  
> I can't say the same for the filesystem trying to recover the data, 
> which usually seems to result in a corrupted file.  IMHO, the data 
> integrity of hardware remapping outweighs any performance disadvantage 
> as compared to a filesystem-only based solution.
> 
> Now if only the drive would catch the problem without requiring a write 
> to the offending sector first. ;-)  Maybe that's already fixed on the 
> newer drives, none of my newer ones have remapped sectors yet.

The problem is that it would be nice if the disk could report: I just
read the data from block XXX for you, but I had a hard time getting it
for you. Recommend reassignment. The OS should then log this, and put
the file that this belongs to elsewhere. This gives the OS the
authority, and the sysop the ability to take appropriate action.

I don't mind a couple of remaps on my mp3 collection. But I rather
hate them on my root drive. 

			Roger. 

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-14  9:57                   ` Hans Reiser
@ 2003-10-14 10:10                     ` Rogier Wolff
  2003-10-14 10:31                       ` Hans Reiser
  0 siblings, 1 reply; 64+ messages in thread
From: Rogier Wolff @ 2003-10-14 10:10 UTC (permalink / raw)
  To: Hans Reiser; +Cc: Rogier Wolff, John Bradford, Wes Janzen, linux-kernel

On Tue, Oct 14, 2003 at 01:57:42PM +0400, Hans Reiser wrote:
> Rogier Wolff wrote:
> >Of course, I left my drive that indicated it had problems (i.e. it
> >didn't spot the sector going bad before it became unreadable), in the
> >machine for another two days. It's getting replaced ASAP (i.e. the
> >next hour or so).

> replacing the drive is reasonable caution.  I think though that the 
> other poster is right that IFF you want to remap bad blocks, the drive 
> should do it not reiserfs.

It is a "pretty much for free" feature. In your in-kernel
implementation you hopefully already have the ability to skip blocks
in use by other files. So allocating it to a special file will take
care of the kernel part. Next you need one line in your fsck to
prevent that "dangling inode" getting linked into lost+found. Then you
do need a utility to actually be able to mark blocks as bad. 

			Roger. 

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-14  8:45               ` Hans Reiser
  2003-10-14  9:46                 ` Rogier Wolff
@ 2003-10-14 10:19                 ` John Bradford
  1 sibling, 0 replies; 64+ messages in thread
From: John Bradford @ 2003-10-14 10:19 UTC (permalink / raw)
  To: Hans Reiser; +Cc: Rogier Wolff, Wes Janzen, linux-kernel

(I know I said my previous post was the last one on this subject, but
we seem to have moved on to a slightly different area).

Quote from Hans Reiser <reiser@namesys.com>:
> Perhaps we should tell people to first write to the bad block, and only 
> if the block remains bad after triggering the remapping by writing to it 
> should you make any effort to get the filesystem to remap it for you.  
> What do you think?

I'm not convinced that this belongs in the filesystem.  I can see how
it makes sense in some ways for magnetic disk devices, but that's not
the filesystem's concern.  How would we know that the write isn't
being cached by hardware further along the line, for example?  What
are the negative effects of repeated writes if the filesystem is on
flash, or a tape?  A damaged tape could be damaged more by winding
back and forth, for example.  (OK, tape is a bad example, but some
future storage technology that we don't know about could have an
analogous problem.)  My point is: just because 99% of installations
will use ReiserFS on a disk device, is it right to put disk-device
specifics in the FS?

Also, one corner case that occurs to me is that the first remapping
worked, and then the newly allocated area went bad in the time before
we verified it.  Then it could look like a persistent fault, when it
is in fact two separate faults.  Realistically, though, I suspect
that is only likely to happen on a rapidly dying disk, in which case
there isn't much we can do anyway.

In general, though, the question is really, should ReiserFS be usable
on a device which doesn't do its own bad block handling?  I suggest
no.

The ultimate point is that only the drive firmware really knows what's
going on, and it can make informed decisions based on things that
nothing external to the drive knows about.  How much error correction
it needed to read a block, the number of errors per physical head, and
per physical cylinder, etc.  The filesystem can only generally make a
decision based on whether there is an error or not.

> Rogier has not indicated that he has tried writing to the bad sector, 
> has he?

I don't think so.

John.

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-14 10:10                     ` Rogier Wolff
@ 2003-10-14 10:31                       ` Hans Reiser
  0 siblings, 0 replies; 64+ messages in thread
From: Hans Reiser @ 2003-10-14 10:31 UTC (permalink / raw)
  To: Rogier Wolff; +Cc: John Bradford, Wes Janzen, linux-kernel

Rogier Wolff wrote:

>On Tue, Oct 14, 2003 at 01:57:42PM +0400, Hans Reiser wrote:
>  
>
>>Rogier Wolff wrote:
>>    
>>
>>>Of course, I left my drive that indicated it had problems (i.e. it
>>>didn't spot the sector going bad before it became unreadable), in the
>>>machine for another two days. It's getting replaced ASAP (i.e. the
>>>next hour or so).
>>>      
>>>
>
>  
>
>>replacing the drive is reasonable caution.  I think though that the 
>>other poster is right that IFF you want to remap bad blocks, the drive 
>>should do it not reiserfs.
>>    
>>
>
>It is a "pretty much for free" feature. In your in-kernel
>implementation you hopefully already have the ability to skip blocks
>in use by other files. So allocating it to a special file will take
>care of the kernel part. Next you need one line in your fsck to
>prevent that "dangling inode" getting linked into lost+found. Then you
>do need a utility to actually be able to mark blocks as bad. 
>
>			Roger. 
>
>  
>
We DO have it.  It is present in Reiser4, and there is a patch around 
somewhere for V3 that I would be happy to have someone merge into the 
latest V3 code and test (we are too focused on shipping V4 to do it 
ourselves right now).

I agree that the FS should be able to do it, but I also think that the 
drive doing it is best.

-- 
Hans



* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-13 11:58         ` Maciej Zenczykowski
@ 2003-10-15 10:22           ` Norman Diamond
  0 siblings, 0 replies; 64+ messages in thread
From: Norman Diamond @ 2003-10-15 10:22 UTC (permalink / raw)
  To: Maciej Zenczykowski; +Cc: John Bradford, linux-kernel

Maciej Zenczykowski replied to me:

> > When the drive's self-test detected that one bad sector, I could figure out
> > which partition it was in (though not which file, which is why I asked one
> > of those questions several times already).  The drive's self-test read the
> > entire drive and the other partitions had no detectable errors.
>
> Instead of zeroing the entire partition just zero that single sector.
> something like:
> dd if=/dev/zero of=/dev/hda bs=512 seek=$lbasector conv=notrunc count=1
>
> possibly first check (by reading in the opposite direction:
> dd if=/dev/hda of=/dev/null bs=512 skip=$lbasector count=1)
> if this is indeed the place where you get the read error (in syslog)...

Thank you.

> if you can read anything from it then read it to a file and write it back
> from the file...

dd if=/dev/hda8 of=/dev/null already quit at the bad sector.  It's really
certain that that one sector is it, and I won't be able to read anything
from it.  The read check should just be a redundant check that the correct
sector is being addressed there, and it is a good idea to do that.

> as for checking which file contains it... hmm file->sector->lba mapping
> can be performed... I don't know about the other direction.  Worst case
> would require checking the mapping of all files on the partition (and
> assuming it's not in an empty area or non-file system area).
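For the mapping Maciej mentions, the first step is plain arithmetic: the drive reports LBAs relative to the whole disk, while a filesystem numbers its blocks from the start of its own partition. A sketch, assuming you have looked up the partition's first sector yourself (for example with `fdisk -lu /dev/hda`) and know the filesystem block size; lba_to_fsblock is a hypothetical name. For ext2/ext3, debugfs's `icheck` and `ncheck` requests can then map that block to an inode and a pathname; this sketch does not cover ReiserFS.

```shell
#!/bin/sh
# Sketch: lba_to_fsblock is a hypothetical name.
# Converts a whole-disk LBA into a block number inside one partition's
# filesystem. PART_START and BLOCK_SIZE must be looked up by hand; the
# sketch does not check the partition's end.

# Usage: lba_to_fsblock LBA PART_START BLOCK_SIZE
lba_to_fsblock() {
    lba=$(printf '%d' "$1")   # accepts decimal or 0x-prefixed hex
    if [ "$lba" -lt "$2" ]; then
        echo "LBA $lba is before the partition start $2" >&2
        return 1
    fi
    # byte offset within the partition, divided by the filesystem block size
    echo $(( (lba - $2) * 512 / $3 ))
}
```

For instance, `lba_to_fsblock 0x0122403a "$PART_START" 4096` would give the block number of the sector from the self-test, once PART_START is filled in.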

I made a shell script with find commands to copy all files that are in that
partition (all pathnames that aren't in other mounted filesystems) to
/dev/null.  When one aborts, I should know the name.  But this is an
incredibly inefficient way to do it.  Intuitively it seems it should be
straightforward to find at least one of the pathnames that the file has.
Practically it seems it shouldn't take 24 hours to copy all files in a 5GB
partition to /dev/null.  But after several hours it only copied about 20% of
the files to /dev/null, and I'll have to continue it this weekend.  Even the
drive's "long" S.M.A.R.T. self-test only took 47 minutes.
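The find-and-copy approach can be sketched as a single find invocation. This assumes POSIX find and sh, that it runs as root (otherwise permission errors are reported as well), and uses a hypothetical function name, find_unreadable. Unlike a script that stops at the first failure, it keeps going and prints every file that could not be read to the end. The -xdev option keeps it from descending into other filesystems mounted below the starting directory. It still reads every byte of every file, so it is no faster than the original approach.

```shell
#!/bin/sh
# Sketch: find_unreadable is a hypothetical name.
# Prints every regular file under DIR that cannot be read to the end.
# -xdev stays on DIR's filesystem, skipping anything mounted below it.

# Usage: find_unreadable DIR
find_unreadable() {
    find "$1" -xdev -type f -exec sh -c '
        for f in "$@"; do
            # cat fails (I/O error) if any sector of the file is unreadable
            cat -- "$f" > /dev/null 2>&1 || printf "%s\n" "$f"
        done
    ' sh {} +
}
```

For example, `find_unreadable /mnt/data > unreadable.txt`, where /mnt/data is a hypothetical mount point for the affected partition.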


* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-13 12:02         ` John Bradford
@ 2003-10-15 10:23           ` Norman Diamond
  2003-10-15 18:56             ` Pavel Machek
  0 siblings, 1 reply; 64+ messages in thread
From: Norman Diamond @ 2003-10-15 10:23 UTC (permalink / raw)
  To: John Bradford, linux-kernel

John Bradford replied to me:

> >   IF the bad sector doesn't get reused then great, then the next bit of
> > effort will be to try to get the sector marked as bad, if there is any way
> > to do that under Linux.  See the next question, which is now being reposted
> > for at least the fourth time.
> >   BUT IF the same sector number gets rewritten then hopefully the same
> > sector number will be associated with a reallocated non-defective sector and
> > the data will get written properly.
>
> Yes, that's what I'd hope, unless the disk ran out of spare space to
> allocate.

Surely two reallocations wouldn't have made it run out of spare space?

Besides, the S.M.A.R.T. log didn't have any statistics anywhere near
failure, and if the drive had run out of spare space then surely one or two
of the statistics should have gone down to zero.
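One way to check how close the drive is to running out of spares is to compare the normalized VALUE and THRESH columns of `smartctl -A`, since smartctl normally reports an attribute as failing when VALUE drops to or below a non-zero THRESH, rather than when a raw count reaches zero. A sketch, assuming the ATA attribute table layout printed by smartmontools (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE); realloc_report is a hypothetical name, and the meaning of RAW_VALUE is vendor-specific.

```shell
#!/bin/sh
# Sketch: realloc_report is a hypothetical name.
# Reads `smartctl -A` output on stdin and prints the reallocation attributes.
# Assumed column layout:
#   ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
# VALUE/THRESH are normalized health figures; RAW_VALUE is the vendor's count.

# Usage: smartctl -A /dev/hda | realloc_report
realloc_report() {
    awk '$2 == "Reallocated_Sector_Ct" || $2 == "Reallocated_Event_Count" {
        printf "%s value=%d thresh=%d raw=%s\n", $2, $4, $6, $10
    }'
}
```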

> > > >    How can I tell Linux to mark the sector as bad, knowing the LBA
> > > >    sector number?
> > >
> > > Don't.  If the drive can't fix this problem itself, throw it in the bin.
> >
> > THE DRIVE HAS 1, ONE, HITOTSU, UNO, UN, BAD SECTOR.
>
> No, the last SMART test re-allocated one sector.

Yeah, but it's not even quite clear if the reallocated sector is the same as
the defective sector.  Something is pretty screwy, and I've asked some
friends at Toshiba to discuss it during their next visit (and they know
they're getting cat food instead of my wife's cooking  _^o^_)  Nonetheless,
it is customary to dump drives when they have increasingly numerous defects,
not when they have one.

> That sector may have gone bad in the next few minutes.  Unlikely, but possible.

I think you mean that the replacement sector might have gone bad in the
minutes after the reallocation.  Unlikely but possible, yes.  I guess I will
probably try to write zeroes to the sector using the suggestion by Maciej
Zenczykowski, but first I'll ask the Toshiba people if they have different
preferences.

> > The drive is capable of
> > doing reallocations.  What kind of operation can be done that will persuade
> > the drive to do the reallocation?
>
> The drive has _done_ a reallocation.  You posted that the reallocated
> sector count had gone from 1 to 2.  This is why I said if it can't fix
> the problem, bin it.  It doesn't seem to have fixed the problem yet.

It's not obvious if the reallocated sector was the same one as the detected
defective sector.  I thought it seemed not to be.  You pointed out that it
is unlikely but possible.

> Somebody else might read this thread, and want full instructions.

OK, sorry, I thought you just hadn't read what you were replying to.

> 3. Run the tests again.  Your drive fixed one bad sector, let's see if
>    it completes the test again without finding more.

Yeah, but I was already upset by finding that the same sector number
remained bad even after it "should" have been the one that was reallocated.


* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-14  9:04         ` Hans Reiser
@ 2003-10-15 10:23           ` Norman Diamond
  2003-10-15 10:39             ` Hans Reiser
  2003-10-17  9:40           ` Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?) Norman Diamond
  1 sibling, 1 reply; 64+ messages in thread
From: Norman Diamond @ 2003-10-15 10:23 UTC (permalink / raw)
  To: Hans Reiser, Wes Janzen; +Cc: Rogier Wolff, John Bradford, linux-kernel, nikita

Hans Reiser wrote:

> I think the problem is that many users don't know how to trigger the bad
> sector remapping for the case where the drive can still remap, using
> writes to the bad blocks, and probably our faq needs updating.

This is indeed one of the problems[*].  The other problem is that it seems
to be absurdly difficult to find which file contains the bad sector.  Even
though a file could have multiple hard links, it would be enough to get one
pathname for the file, in order to know which file needs to be reconstructed
from a source of good data.

[* Of course I also wish that the original failing write had been detected
by the drive, but this failure isn't software's fault.  I hope.]


* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-15 10:23           ` Norman Diamond
@ 2003-10-15 10:39             ` Hans Reiser
  0 siblings, 0 replies; 64+ messages in thread
From: Hans Reiser @ 2003-10-15 10:39 UTC (permalink / raw)
  To: Norman Diamond
  Cc: Wes Janzen, Rogier Wolff, John Bradford, linux-kernel, nikita

Norman Diamond wrote:

>Hans Reiser wrote:
>
>>I think the problem is that many users don't know how to trigger the bad
>>sector remapping for the case where the drive can still remap, using
>>writes to the bad blocks, and probably our faq needs updating.
>
>This is indeed one of the problems[*].  The other problem is that it seems
>to be absurdly difficult to find which file contains the bad sector.  Even
>though a file could have multiple hard links, it would be enough to get one
>pathname for the file, in order to know which file needs to be reconstructed
>from a source of good data.
>
>[* Of course I also wish that the original failing write had been detected
>by the drive, but this failure isn't software's fault.  I hope.]
badblocks program fixes that

-- 
Hans
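badblocks prints block numbers in units of its -b block size, counted from the start of whatever device it was given (here the partition), so they do not match the whole-disk LBA that SMART reports. A sketch of the conversion, assuming the partition's first sector is known (e.g. from `fdisk -lu`) and 512-byte sectors; fsblocks_to_lbas is a hypothetical name. It prints each block number followed by the range of disk LBAs that block covers.

```shell
#!/bin/sh
# Sketch: fsblocks_to_lbas is a hypothetical name.
# Reads badblocks output (one block number per line) on stdin and prints
# each block with the range of whole-disk LBAs it covers, assuming
# 512-byte sectors and a known partition start sector.

# Usage: fsblocks_to_lbas PART_START BLOCK_SIZE < bad.txt
fsblocks_to_lbas() {
    per=$(( $2 / 512 ))              # sectors per filesystem block
    while read -r blk; do
        case $blk in
            ''|*[!0-9]*) continue ;; # skip blank or non-numeric lines
        esac
        first=$(( $1 + blk * per ))
        echo "$blk $first-$(( first + per - 1 ))"
    done
}
```

A possible sequence is `badblocks -b 4096 -o bad.txt /dev/hda8` followed by `fsblocks_to_lbas "$PART_START" 4096 < bad.txt`; badblocks' default mode is a read-only scan.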


* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-15 10:23           ` Norman Diamond
@ 2003-10-15 18:56             ` Pavel Machek
  0 siblings, 0 replies; 64+ messages in thread
From: Pavel Machek @ 2003-10-15 18:56 UTC (permalink / raw)
  To: Norman Diamond; +Cc: John Bradford, linux-kernel

Hi!

> > That sector may have gone bad in the next few minutes.  Unlikely, but possible.
> 
> I think you mean that the replacement sector might have gone bad in the
> minutes after the reallocation.  Unlikely but possible, yes.  I guess I will

Well, if your drive is overheated (for example), it is likely to kill
the spare sector, too. [I've seen something like that here.]

									Pavel
-- 
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]


* Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
  2003-10-14  9:04         ` Hans Reiser
  2003-10-15 10:23           ` Norman Diamond
@ 2003-10-17  9:40           ` Norman Diamond
  2003-10-17  9:48             ` Hans Reiser
                               ` (5 more replies)
  1 sibling, 6 replies; 64+ messages in thread
From: Norman Diamond @ 2003-10-17  9:40 UTC (permalink / raw)
  To: Hans Reiser, Wes Janzen, Rogier Wolff, John Bradford,
	linux-kernel, nikita, Pavel Machek

Friends in the disk drive section at Toshiba said this:

When a drive tries to read a block, if it detects errors, it retries up to
255 times.  If a retry succeeds then the block gets reallocated.  IF 255
RETRIES FAIL THEN THE BLOCK DOES NOT GET REALLOCATED.

This was so unbelievable to me that I had to confirm this with them in
different words.  In case of a temporary error, the drive provides the
recovered data as the result of the read operation and the drive writes the
data to a reallocated sector.  In case of a permanent error, the block is
assumed bad, and of course the data are lost.  Since the data are assumed
lost, the drive keeps the defective LBA sector number associated with the
same defective physical block and it does not reallocate the defective
block.
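The read policy described above can be restated as a small decision function. This is a hypothetical model for illustration, not firmware code; read_outcome and its argument convention (the attempt on which a read first succeeds, or 0 if it never does) are assumptions, and it counts the first attempt plus up to 255 retries.

```shell
#!/bin/sh
# Hypothetical model of the read policy described above; NOT firmware code.
# Argument: the attempt on which a read of the sector first succeeds
# (1 = first try), or 0 if it never succeeds.

MAX_RETRIES=255   # per the description: up to 255 retries after the first try

read_outcome() {
    if [ "$1" -eq 1 ]; then
        echo "clean read"
    elif [ "$1" -ge 2 ] && [ "$1" -le $((MAX_RETRIES + 1)) ]; then
        # a retry succeeded: data returned to the host and moved to a spare
        echo "recovered: data returned, sector reallocated"
    else
        # every retry failed: error returned, LBA stays on the bad block
        echo "permanent error: data lost, sector not reallocated"
    fi
}
```

The asymmetry is visible in the last branch: the case where the data are lost is exactly the case where the LBA keeps pointing at the defective block.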

I explained to them why the LBA sector number should still get reallocated
even though the data are lost.  When the sector isn't reallocated, I could
repartition the drive and reformat the partition and the OS wouldn't know
about the defective block so the OS would try again to use it.  At first
they did not believe I could do this, but I explained to them that I'm still
able to delete partitions and create new partitions etc., and then they
understood.

They also said that a write operation has a chance of getting the bad block
reallocated.  The conditions for reallocation on write are similar but not
identical to the conditions for reallocation on read.  During a write
operation if a sector is determined to be permanently bad (255 failing
retries) then it is likely to be reallocated, unlike a read.  But I'm not
sure if this is guaranteed or not.  We agreed that we should try it on my
bad sector, but if the drive again detects a permanent error then it will
not reallocate the sector.  First I still want to find which file contains
the sector; I haven't had time for this on weekdays.

When I ran the "long" S.M.A.R.T. self-test, the number of reallocated
sectors and number of reallocation events both increased from 1 to 2, but
the known bad sector remained bad.  This is entirely because of the behavior
as designed.  The self-test detected a temporary error in some other
unrelated sector, rescued the data in that unreported sector number, and
reallocated it.  That was only a coincidence.  The known bad sector was
detected yet again as permanently bad and was not reallocated.
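The sector the self-test trips over is recorded in the LBA_of_first_error column of the drive's self-test log, which smartctl has printed in hex (such as 0x0122403a, i.e. 19021882) or in decimal depending on the smartmontools version. A sketch that extracts it from `smartctl -l selftest` output and prints decimal LBAs, assuming each log entry starts with "# N" and the LBA is the last column ("-" when the test found no error); selftest_error_lbas is a hypothetical name.

```shell
#!/bin/sh
# Sketch: selftest_error_lbas is a hypothetical name.
# Reads `smartctl -l selftest` output on stdin and prints, in decimal, the
# LBA_of_first_error of every logged test that found one. Assumes each log
# entry starts with "# N" and the LBA is the last column ("-" if none).

selftest_error_lbas() {
    awk '/^# *[0-9]/ && $NF != "-" { print $NF }' |
    while read -r lba; do
        printf '%d\n' "$lba"   # printf accepts both 19021882 and 0x0122403a
    done
}

# Example: smartctl -l selftest /dev/hda | selftest_error_lbas
```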

In this mailing list there has been some discussion of whether file systems
should keep lists of known bad blocks and hide those bad blocks from
ordinary operations in ordinary usage.  Of course historically this was
always necessary.  As someone else mentioned, and I've done it too, when
formatting a disk drive, you typed in the list of known bad block numbers that
were printed on a piece of paper that came with the drive.

In modern times, some people think that this shouldn't be necessary because
the drive already does its best to reallocate bad blocks.  WRONG.  THE BAD
BLOCK LIST REMAINS AS NECESSARY AS IT ALWAYS WAS.

This design might change in the future, but it might not.  My friends are
afraid that they might lose their jobs if they try to suggest such a change
in the high-level design of disk drive corporate politics.  I only hope this
posting doesn't get them fired.  (This is not a frivolous concern by the
way.  The myth of lifetime employment is a less pervasive myth than it used
to be, and Toshiba is pretty much average in both world and Japanese
standards for corporate politics.)

Regarding finding which file contains the known bad sector, someone in this
mailing list said that the badblocks program could help, but the manual page
for the badblocks program doesn't give any clues as to how it would help.
I'm still doing find of all files in the partition and cp them to /dev/null.

Meanwhile, yes we do need to record those bad block lists and try to never
let them get allocated to user-visible files.


* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
  2003-10-17  9:40           ` Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?) Norman Diamond
@ 2003-10-17  9:48             ` Hans Reiser
  2003-10-17 11:11               ` Norman Diamond
  2003-10-17  9:58             ` Pavel Machek
                               ` (4 subsequent siblings)
  5 siblings, 1 reply; 64+ messages in thread
From: Hans Reiser @ 2003-10-17  9:48 UTC (permalink / raw)
  To: Norman Diamond
  Cc: Wes Janzen, Rogier Wolff, John Bradford, linux-kernel, nikita,
	Pavel Machek

Norman Diamond wrote:

>Meanwhile, yes we do need to record those bad block lists and try to never
>let them get allocated to user-visible files.
Instead of recording the bad blocks, just write to them.

-- 
Hans




* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
  2003-10-17  9:40           ` Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?) Norman Diamond
  2003-10-17  9:48             ` Hans Reiser
@ 2003-10-17  9:58             ` Pavel Machek
  2003-10-17 10:15               ` Hans Reiser
  2003-10-17 10:24             ` Rogier Wolff
                               ` (3 subsequent siblings)
  5 siblings, 1 reply; 64+ messages in thread
From: Pavel Machek @ 2003-10-17  9:58 UTC (permalink / raw)
  To: Norman Diamond
  Cc: Hans Reiser, Wes Janzen, Rogier Wolff, John Bradford,
	linux-kernel, nikita, Pavel Machek

Hi!

> When a drive tries to read a block, if it detects errors, it retries up to
> 255 times.  If a retry succeeds then the block gets reallocated.  IF 255
> RETRIES FAIL THEN THE BLOCK DOES NOT GET REALLOCATED.

...

> They also said that a write operation has a chance of getting the bad block
> reallocated.  The conditions for reallocation on write are similar but not
> identical to the conditions for reallocate on read.  During a write
> operation if a sector is determined to be permanently bad (255 failing
> retries) then it is likely to be reallocated, unlike a read.  But I'm not
> sure if this is guaranteed or not.  We agreed that we should try it on my

Well, this behaviour makes sense.

"If we can't read this, leave it in place, perhaps we can read it in
future (when temperature drops below 80 Celsius or something)". "If we
can't write this, bad, but we can reallocate without losing
anything".

It looks slightly unexpected, but pretty sane to me. Anything else
would kill your data.

[BTW your subject made me delete the mail with "spam", until Hans
replied to it...]
								Pavel
-- 
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]


* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
  2003-10-17  9:58             ` Pavel Machek
@ 2003-10-17 10:15               ` Hans Reiser
  0 siblings, 0 replies; 64+ messages in thread
From: Hans Reiser @ 2003-10-17 10:15 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Norman Diamond, Wes Janzen, Rogier Wolff, John Bradford,
	linux-kernel, nikita, Pavel Machek

Pavel Machek wrote:

>[BTW your subject made me delete the mail with "spam", until Hans
>replied to it...]
>								Pavel
I wonder if spam filters will eventually result in a modest reduction in 
the level of hyperbole in non-spam. ;-)

-- 
Hans




* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
  2003-10-17  9:40           ` Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?) Norman Diamond
  2003-10-17  9:48             ` Hans Reiser
  2003-10-17  9:58             ` Pavel Machek
@ 2003-10-17 10:24             ` Rogier Wolff
  2003-10-17 10:49               ` John Bradford
  2003-10-17 10:37             ` ATA Defect management John Bradford
                               ` (2 subsequent siblings)
  5 siblings, 1 reply; 64+ messages in thread
From: Rogier Wolff @ 2003-10-17 10:24 UTC (permalink / raw)
  To: Norman Diamond
  Cc: Hans Reiser, Wes Janzen, Rogier Wolff, John Bradford,
	linux-kernel, nikita, Pavel Machek

On Fri, Oct 17, 2003 at 06:40:01PM +0900, Norman Diamond wrote:
> I explained to them why the LBA sector number should still get
> reallocated even though the data are lost.

This is unbelievably bad: Sometimes it is worth it, to try and read
the block again and again. We've seen blocks getting read after we've
retried over 1000 times from "userspace". That doesn't include the
retries that the drive did for us "behind the scenes". 
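Userspace retries of this kind can be sketched as a loop around dd. Assumptions: 512-byte sectors, a whole-disk DEV, and a hypothetical function name, retry_read. Linux may answer a repeated read from its buffer cache rather than from the drive, although a failed read is not cached; with GNU dd, adding iflag=direct requests O_DIRECT to bypass the cache where the device supports it. A sector that finally reads may still hold wrong data.

```shell
#!/bin/sh
# Sketch: retry_read is a hypothetical helper name.
# Try up to MAX_TRIES times to read one 512-byte sector into OUTFILE.
# Returns 0 on the first complete read, 1 if every attempt failed.

# Usage: retry_read DEV LBA MAX_TRIES OUTFILE
retry_read() {
    i=1
    while [ "$i" -le "$3" ]; do
        # A read counts only if dd succeeded AND produced a full sector.
        if dd if="$1" of="$4" bs=512 skip="$2" count=1 2>/dev/null &&
           [ "$(wc -c < "$4" | tr -d ' ')" -eq 512 ]; then
            echo "read succeeded on attempt $i" >&2
            return 0
        fi
        i=$((i + 1))
    done
    rm -f "$4"
    return 1
}
```

For example, `retry_read /dev/hda 19021882 1000 sector.bin` would save the sector to a file if any of 1000 attempts succeeds, so it can later be written back.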

If you manage to convince Toshiba to remap the sector on a "bad read",
we'll never ever be able to recover the sector.

We've also been able to provide a different environment (e.g. other
ambient temperature) to a drive so that previously bad sectors could
be read.

No, the only way is to realloc on write. (but it should remember that
the data was bad, and treat the physical area with extra caution. It's
possible that something happened while writing that sector, so that
rewriting it this time will fix the problem for good, but on the other
hand, that area of the drive demonstrated the ability to lose data,
so you shouldn't trust any data to it!)

			Roger. 

-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
**** "Linux is like a wigwam -  no windows, no gates, apache inside!" ****


* ATA Defect management
  2003-10-17  9:40           ` Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?) Norman Diamond
                               ` (2 preceding siblings ...)
  2003-10-17 10:24             ` Rogier Wolff
@ 2003-10-17 10:37             ` John Bradford
  2003-10-21 20:44               ` bill davidsen
  2003-10-17 12:08             ` Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?) Justin Cormack
  2003-10-21 20:12             ` bill davidsen
  5 siblings, 1 reply; 64+ messages in thread
From: John Bradford @ 2003-10-17 10:37 UTC (permalink / raw)
  To: Norman Diamond, Hans Reiser, Wes Janzen, Rogier Wolff
  Cc: eric_mudama, linux-kernel, john

[Note to Eric, who is CC'ed, can you comment on how Maxtor drives
handle these issues?]

Quote from "Norman Diamond" <ndiamond@wta.att.ne.jp>:
> Friends in the disk drive section at Toshiba said this:
> 
> When a drive tries to read a block, if it detects errors, it retries up to
> 255 times.  If a retry succeeds then the block gets reallocated.  IF 255
> RETRIES FAIL THEN THE BLOCK DOES NOT GET REALLOCATED.

OK, this is interesting, at least we have some specific information.

> This was so unbelievable to me that I had to confirm this with them in
> different words.  In case of a temporary error, the drive provides the
> recovered data as the result of the read operation and the drive writes the
> data to a reallocated sector.  In case of a permanent error, the block is
> assumed bad, and of course the data are lost.  Since the data are assumed
> lost, the drive keeps the defective LBA sector number associated with the
> same defective physical block and it does not reallocate the defective
> block.

OK, so for a stupid, backward, legacy OS that takes the 'what is the
point of substituting a spare block if you have nothing to write to
it' viewpoint, maybe that would make sense.  For the rest of us it's
stupid.  Did anybody actually consider that responsible admins make
backups and want to restore them onto the disk without having to
concern themselves with defect management, which, shock, horror, is
supposed to be done by the drive?  Of course, the poor admin who made
a sector-by-sector backup is completely out of luck when he comes to
restore it onto a drive that insists one sector is bad.

> I explained to them why the LBA sector number should still get reallocated
> even though the data are lost.  When the sector isn't reallocated, I could
> repartition the drive and reformat the partition and the OS wouldn't know
> about the defective block so the OS would try again to use it.  At first
> they did not believe I could do this, but I explained to them that I'm still
> able to delete partitions and create new partitions etc., and then they
> understood.
> 
> They also said that a write operation has a chance of getting the bad block
> reallocated.  The conditions for reallocation on write are similar but not
> identical to the conditions for reallocate on read.  During a write
> operation if a sector is determined to be permanently bad (255 failing
> retries) then it is likely to be reallocated, unlike a read.  But I'm not
> sure if this is guaranteed or not.

No, I'm sorry, are we to believe that it might or might not get
re-allocated just by chance?  This is ridiculous.

>  We agreed that we should try it on my
> bad sector, but if the drive again detects a permanent error then it will
> not reallocate the sector.  First I still want to find which file contains
> the sector; I haven't had time for this on weekdays.
> 
> When I ran the "long" S.M.A.R.T. self-test, the number of reallocated
> sectors and number of reallocation events both increased from 1 to 2, but
> the known bad sector remained bad.  This is entirely because of the behavior
> as designed.  The self-test detected a temporary error in some other
> unrelated sector, rescued the data in that unreported sector number, and
> reallocated it.  That was only a coincidence.  The known bad sector was
> detected yet again as permanently bad and was not reallocated.

Even though you are _deliberately_ running a self test to check for
this kind of problem?

> In this mailing list there has been some discussion of whether file systems
> should keep lists of known bad blocks and hide those bad blocks from
> ordinary operations in ordinary usage.  Of course historically this was
> always necessary.  As someone else mentioned, and I've done it too, when
> formatting a disk drive, type in the list of known bad block numbers that
> were printed on a piece of paper that came with the drive.
> 
> In modern times, some people think that this shouldn't be necessary because
> the drive already does its best to reallocate bad blocks.  WRONG.  THE BAD
> BLOCK LIST REMAINS AS NECESSARY AS IT ALWAYS WAS.

I made that claim, and stand by it.

Note one thing:

If you are right, you are basically suggesting that we will have to go
back to writing defective sectors on a sticker on the drive casing.
If you do a:

dd if=/dev/zero of=/dev/hda

you lose that bad block list.  Now, you've got to enter it in again,
or let the OS scan the disk surface and find the bad sectors.  Hello,
this is the third millennium.  This may have been a way of life twenty
years ago, but I hope we have moved on from there.

Oh, and what happens if block zero is defective, eh?  The disk is no
longer usable as a boot disk, because the MBR can't be written to
block zero?

What if I want to use my disk for storing a TAR archive?  Why should
we bloat TAR with bad block support?

> This design might change in the future, but it might not.  My friends are
> afraid that they might lose their jobs if they try to suggest such a change
> in the high-level design of disk drive corporate politics.  I only hope this
> posting doesn't get them fired.  (This is not a frivolous concern by the
> way.  The myth of lifetime employment is a less pervasive myth than it used
> to be, and Toshiba is pretty much average in both world and Japanese
> standards for corporate politics.)

If anything, it should get them a promotion.  If somebody from Toshiba
wants to discuss defect management with me, they are welcome to, I'll
waive my consultancy fees, (at least initially).

> Regarding finding which file contains the known bad sector, someone in this
> mailing list said that the badblocks program could help, but the manual page
> for the badblocks program doesn't give any clues as to how it would help.
> I'm still doing find of all files in the partition and cp them to /dev/null.
> 
> Meanwhile, yes we do need to record those bad block lists and try to never
> let them get allocated to user-visible files.

NO.  Fix the drives.  If nobody is going to do that, I might as well
join the Linux-VAX project and run my business on a cluster of
11/780s.

Let me make this clear - some of us earn a living providing solutions
to clients who pay good money for that consultancy.  If they lose
data, have downtime or have any other problems, my clients will
generally come back to _ME_ for an explanation, and I want something
better than, "Well, that's the way the drives work".

We have identified a problem, now let's fix it.

Defect management needs to be done by the disk firmware, and it needs
to be done properly.

Note - this is not a criticism of Toshiba, nor am I implying that it
is in any way limited to their products.  I am grateful for them
providing information on the subject.  I own two of their laptops
which run Linux perfectly, and I am generally pleased with their
products.

Note also - I realise that the defect management techniques you
describe don't actually seem to allow data to be written to a bad
sector undetected.  A permanently bad sector apparently won't become
'apparently good, but subtly bad', and lose data over time, but that
is not the point.  With write caching in the OS, data could be
allocated to an undetected-at-the-OS-level bad sector, and cause
problems when it is written out.  With the recent laptop mode patch we
are going to see more delayed writes going on.

John.


* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
  2003-10-17 10:24             ` Rogier Wolff
@ 2003-10-17 10:49               ` John Bradford
  2003-10-17 11:09                 ` Rogier Wolff
  2003-10-17 11:24                 ` Krzysztof Halasa
  0 siblings, 2 replies; 64+ messages in thread
From: John Bradford @ 2003-10-17 10:49 UTC (permalink / raw)
  To: Rogier Wolff, Norman Diamond; +Cc: Hans Reiser, Wes Janzen, linux-kernel

Quote from Rogier Wolff <R.E.Wolff@BitWizard.nl>:
> On Fri, Oct 17, 2003 at 06:40:01PM +0900, Norman Diamond wrote:
> > I explained to them why the LBA sector number should still get
> > reallocated even though the data are lost.
> 
> This is unbelievably bad: Sometimes it is worth it, to try and read
> the block again and again. We've seen blocks getting read after we've
> retried over 1000 times from "userspace". That doesn't include the
> retries that the drive did for us "behind the scenes". 

That's moving in to the realms of more advanced data recovery.  You
shouldn't really expect to be able to do those kind of forensics on
intelligent drives using standard filesystem system calls.

Besides, are you positive that you always got the correct data off the
disk?  See the discussions about hashing algorithms - maybe the drive
simply returned data that had an additional bit flipped and wasn't
identified as bad.  If you are having to try over 1000 times from
userspace, the drive is in a bad way.  You shouldn't really make
assumptions that you do usually, (that the error correction is good
enough to ensure bad data isn't returned as good data).  If you are
recovering data from a spreadsheet, for example, the errors could go
unnoticed, but have catastrophic results.
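One cheap way to act on this caution is to read the sector several times and see whether every read returns the same bytes. Agreement does not prove the data are correct, but disagreement shows that at least one read was wrong. A sketch assuming 512-byte sectors and POSIX cksum; consistent_reads and the DD_IFLAG variable are hypothetical. Repeated reads of a block device may be served from the kernel's cache and agree trivially, so with GNU dd you would set DD_IFLAG=direct to request O_DIRECT. A failed read contributes the checksum of zero bytes, so it also counts as a distinct result.

```shell
#!/bin/sh
# Sketch: consistent_reads and DD_IFLAG are hypothetical names.
# Prints how many distinct checksums N reads of one sector produced; 1 means
# all reads agreed. A failed read checksums as empty input and so counts as
# its own distinct result.

# Usage: [DD_IFLAG=direct] consistent_reads DEV LBA N
consistent_reads() {
    i=0
    while [ "$i" -lt "$3" ]; do
        # ${DD_IFLAG:+...} expands to nothing unless DD_IFLAG is set (GNU dd only)
        dd if="$1" bs=512 skip="$2" count=1 ${DD_IFLAG:+iflag="$DD_IFLAG"} 2>/dev/null | cksum
        i=$((i + 1))
    done | sort -u | wc -l | tr -d ' '
}
```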

> If you manage to convince Toshiba to remap the sector on a "bad read",
> we'll never ever be able to recover the sector.

Of course you will - it's remapped, the data isn't overwritten!  You
may need more advanced tools, but you can still seek the heads to that
part of the platter and get data from the head-amp.  Just because you
couldn't use your simple method anymore is no real reason to argue
against fixing the problem.

> We've also been able to provide a different environment (e.g. other
> ambient temperature) to a drive so that previously bad sectors could
> be read.
> 
> No, the only way is to realloc on write.

This may be more sensible, but not for the reasons you are suggesting,
and not in the way that you are suggesting.  I have nothing really
against not re-allocating on read, although ideally it should be an
option.  But marking the sector as "don't touch, don't even re-map in
case we confuse the OS" after a bad read is NOT acceptable in my
opinion.

In any case, a S.M.A.R.T. test should remap all suspect sectors - if
an admin has deliberately run a S.M.A.R.T. test, I think we can assume
they know what they are doing.

> (but it should remember that
> the data was bad, and treat the physical area with extra caution. It's
> possible that something happened while writing that sector, so that
> rewriting it this time will fix the problem for good, but on the other
> hand, that area of the drive demonstrated the ability to lose data,
> so you shouldn't trust any data to it!)

Suspect drive?  Bin it.  Do you really not value your data enough to
do that?

John.


* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
  2003-10-17 10:49               ` John Bradford
@ 2003-10-17 11:09                 ` Rogier Wolff
  2003-10-17 11:24                 ` Krzysztof Halasa
  1 sibling, 0 replies; 64+ messages in thread
From: Rogier Wolff @ 2003-10-17 11:09 UTC (permalink / raw)
  To: John Bradford
  Cc: Rogier Wolff, Norman Diamond, Hans Reiser, Wes Janzen, linux-kernel

On Fri, Oct 17, 2003 at 11:49:11AM +0100, John Bradford wrote:
> Quote from Rogier Wolff <R.E.Wolff@BitWizard.nl>:
> > On Fri, Oct 17, 2003 at 06:40:01PM +0900, Norman Diamond wrote:
> > > I explained to them why the LBA sector number should still get
> > > reallocated even though the data are lost.
> > 
> > This is unbelievably bad: Sometimes it is worth it, to try and read
> > the block again and again. We've seen blocks getting read after we've
> > retried over 1000 times from "userspace". That doesn't include the
> > retries that the drive did for us "behind the scenes". 
> 
> That's moving into the realm of more advanced data recovery.  You
> shouldn't really expect to be able to do that kind of forensics on
> intelligent drives using standard filesystem system calls.

Yep. And several manufacturers have told us that they don't put any
"bypass" commands in their firmware, so as a data-recovery company we
can't bypass the normal stuff. On SCSI drives we get to set the
"number of retries" and things like that. Terribly useful. Not on ATA
drives. 

> Besides, are you positive that you always got the correct data off the
> disk?  See the discussions about hashing algorithms - maybe the drive
> simply returned data that had an additional bit flipped and wasn't
> identified as bad.  If you are having to try over 1000 times from
> userspace, the drive is in a bad way.  You shouldn't really make the
> assumption you usually do (that the error correction is good
> enough to ensure bad data isn't returned as good data).  If you are
> recovering data from a spreadsheet, for example, the errors could go
> unnoticed, but have catastrophic results.

Yes, as an experienced data-recovery-expert I can look at the data,
and say that I believe it. And I know the risks you explain here.

> > If you manage to convince Toshiba to remap the sector on a "bad read",
> > we'll never ever be able to recover the sector.
> 
> Of course you will - it's remapped, the data isn't overwritten!  You
> may need more advanced tools, but you can still seek the heads to that
> part of the platter and get data from the head-amp.  Just because you
> couldn't use your simple method anymore is no real reason to argue
> against fixing the problem.

Nope. Even on SCSI drives there seems to be no way to tell the drive
"please give me the data for raw block XXX". We've pushed the
manufacturers for the ability to do this, but we get nowhere. Feel
free to prove us wrong. Armwaving doesn't work. 

			Roger. 

-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
**** "Linux is like a wigwam -  no windows, no gates, apache inside!" ****


* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
  2003-10-17  9:48             ` Hans Reiser
@ 2003-10-17 11:11               ` Norman Diamond
  2003-10-17 11:45                 ` Hans Reiser
                                   ` (3 more replies)
  0 siblings, 4 replies; 64+ messages in thread
From: Norman Diamond @ 2003-10-17 11:11 UTC (permalink / raw)
  To: Hans Reiser, Wes Janzen, Rogier Wolff, John Bradford,
	linux-kernel, nikita, Pavel Machek

Replying first to Hans Reiser; below to Russell King and Pavel Machek.

> Instead of recording the bad blocks, just write to them.

If writes are guaranteed to force reallocations then this is potentially
part of a solution.

I still remain suspicious because the first failed read was milliseconds or
minutes after the preceding write.  I think the odds are very high that the
sector was already bad at the time of the write but reallocation did not
occur.  It is possible but I think very unlikely that the sector was
reallocated to a different physical sector which went bad milliseconds after
being written after reallocation, and equally unlikely that the sector
wasn't reallocated because it really hadn't been bad but went bad
milliseconds later.  In other words, I think it is overwhelmingly likely
that the write failed but was not detected as such and did not result in
reallocation.

Now, maybe there is a technique to force it anyway.  When a partition is
newly created and is being formatted with the intention of writing data a
few minutes later, do writes that "should" have a better chance of being
detected.  The way to start this is to simply write every block, but this is
obviously insufficient because my block did get written shortly after the
partition was formatted and that write didn't cause the block to be
reallocated.  So in addition to simply writing every block, also read every
block.  For each read that fails, proceed to do another write which "should"
force reallocation.
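A minimal Python sketch of the write-every-block, read-every-block, rewrite-on-failure pass described above. It is destructive (it overwrites the target from offset 0), `write_read_rewrite` is a hypothetical name, and it assumes the device or file is already open read/write as a file descriptor. A read-back right after a write may be served from the page cache instead of the platter, so the sketch asks the kernel to drop cached pages between passes; a real tool would more likely use O_DIRECT.

```python
import os

def write_read_rewrite(fd, n_blocks, block_size=4096, fill=b"\0"):
    """Write, read back, and rewrite-on-failure every block of an open fd.

    DESTRUCTIVE: overwrites n_blocks * block_size bytes starting at offset 0.
    Returns the block numbers whose read-back failed and were rewritten.
    """
    pattern = (fill * block_size)[:block_size]

    # Pass 1: write every block with a known pattern.
    for b in range(n_blocks):
        os.pwrite(fd, pattern, b * block_size)
    os.fsync(fd)

    # Best effort: drop cached pages so pass 2 actually reads the device.
    if hasattr(os, "posix_fadvise"):
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)

    # Pass 2: read every block back; a medium error or a mismatch
    # triggers another write, which "should" make the drive reallocate.
    rewritten = []
    for b in range(n_blocks):
        try:
            ok = os.pread(fd, block_size, b * block_size) == pattern
        except OSError:
            ok = False
        if not ok:
            os.pwrite(fd, pattern, b * block_size)
            rewritten.append(b)
    os.fsync(fd)
    return rewritten
```

On healthy media the returned list is empty; any block numbers it contains are the ones that still need the follow-up read Norman asks for next.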

Mr. Reiser, when I created a partition of your design, that technique was
not offered.  Why?  And will it soon start being offered?

Also, I remain highly suspicious that for each read that fails, when the
formatting program proceeds to do another write which "should" force
reallocation, the drive might not do it.  The formatter will have to proceed
to yet another read.  And if the block is still bad, then figure that the
drive is refusing to reallocate the bad block.  And then yes, the formatter
will still have to make a list of known bad blocks and do something to
prevent ordinary file system operations from ever seeing those blocks.
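The follow-up step can be sketched the same way: after a failed block has been rewritten, read it once more, and whatever is still unreadable becomes a candidate for the formatter's bad-block list. `read_block` and `write_block` are hypothetical callables standing in for real device I/O, so the logic can be exercised against a fake disk.

```python
def check_after_rewrite(read_block, write_block, block, data):
    """Rewrite a block that failed to read, then read it back once more.

    read_block(block) -> bytes, raising OSError on a medium error (hypothetical)
    write_block(block, data) -> None (hypothetical)
    Returns "recovered" if the re-read matches, otherwise "persistent".
    """
    write_block(block, data)
    try:
        if read_block(block) == data:
            return "recovered"
    except OSError:
        pass
    return "persistent"

def build_bad_block_list(read_block, write_block, failed_blocks, data):
    """Blocks still bad after a rewrite: the drive apparently refused to remap them."""
    return [b for b in failed_blocks
            if check_after_rewrite(read_block, write_block, b, data) == "persistent"]
```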

Russell King replied to me:

> > When a drive tries to read a block, if it detects errors, it retries up
> > to 255 times.  If a retry succeeds then the block gets reallocated.  IF
> > 255 RETRIES FAIL THEN THE BLOCK DOES NOT GET REALLOCATED.
>
> This is perfectly reasonable.  If the drive can't recover your old data
> to reallocate it to a new block, then leaving the error present until you
> write new data to that bad block is the correct thing to do.

Only if the subsequent write is guaranteed to result in reallocation.  I
remain suspicious that the drive does not guarantee such.  Suppose the
contents of the next write happen to get stored close enough to correct that
the block doesn't get reallocated and the data survive for another 100
milliseconds before getting corrupt again?

> Think about what would happen if it did get reallocated.  What data would
> the drive return when requested to read the bad block?

Why does it matter?  The drive already reported a read failure.  Maybe Linux
programs aren't all smart enough to inform the user when a read operation
results in an I/O error, but drivers could be smarter.  I think there's
probably a bit of room in an inode to add a flag saying that the file has
been detected to be partially unreadable.  Sorry for the digression.
Anyway, it is 100% true that the data in that block are gone.  The block
should be reallocated and the new physical block can either be zeroed or
randomized or anything, and that's what subsequent reads will get until the
block gets written again.
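To make the two policies concrete: the Toshiba behaviour as relayed in this thread (an unrecoverable sector stays mapped until a write arrives) versus the proposal above (remap on the failed read and return zeros afterwards). This is a toy Python model, not firmware; the class, the flag, and the 512-byte sector size are assumptions, and it also assumes a write to a bad sector always reallocates, which is exactly the point still in dispute.

```python
SECTOR = 512  # assumed sector size in bytes

class ToyDrive:
    """Toy model of defect handling; recovered-after-retry reads are not modelled."""

    def __init__(self, realloc_on_unrecoverable_read=False):
        self.data = {}          # LBA -> bytes currently readable
        self.bad = set()        # LBAs still mapped to an unreadable physical sector
        self.remapped = set()   # LBAs moved to a spare sector
        self.realloc_on_unrecoverable_read = realloc_on_unrecoverable_read

    def read(self, lba):
        if lba in self.bad:
            if self.realloc_on_unrecoverable_read:
                # Proposed policy: the data are lost either way, so remap now;
                # the spare reads back as zeros until the LBA is written again.
                self.bad.discard(lba)
                self.remapped.add(lba)
                self.data[lba] = bytes(SECTOR)
            # Under both policies this read still reports the failure (EIO).
            raise OSError(5, "unrecoverable read error")
        return self.data.get(lba, bytes(SECTOR))

    def write(self, lba, payload):
        if lba in self.bad:
            # The host supplied fresh data, so remapping loses nothing.
            self.bad.discard(lba)
            self.remapped.add(lba)
        self.data[lba] = payload
```

With the default flag the bad LBA keeps failing on every read until it is written; with the flag set, the first failed read is reported and every later read returns zeros from the spare.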

> If the error persists during a write to the bad block, then yes, I'd
> expect it to be reallocated at that point - but only because the drive has
> the correct data for that block available.

We agree in our moral expectations and our technical analysis that correct
data will be available at that time.  But if your word "expect" means you
have confidence that the drive will perform correctly, I do not share your
confidence (I think it is possible but highly unlikely that the drive did
its job correctly during the previous write).

> Your description of the way Toshibas drive works seems perfectly sane.
> In fact, I'd consider a drive to be broken if it behaved in any other way
> - capable of almost silent data loss.

I think it would not be silent.  If the system log had one repetition
instead of fifty repetitions, it would not be silent.  I don't know which
application was silent and am irritated.  (dd wasn't silent when I tried
copying the entire partition to /dev/null).

Pavel Machek wrote:

> Well, this behaviour makes sense.
>
> "If we can't read this, leave it in place, perhaps we can read it in
> future (when temperature drops below 80Celsius or something)". "If we
> can't write this, bad, but we can reallocate without losing
> anything".

Well, consider the two extremes we've seen in this thread now.  Mr. Bradford
felt that the entire drive should be discarded on account of having one bad
block.  Mr. Machek feels that we should preserve the possibility of reusing
the bad block because in the future it might appear not to be bad.  I take
the middle road.  The drive should not be discarded until errors become more
frequent or numerous, but known bad blocks should be acted on so that those
physical blocks should not have a chance of being used again.

Suppose the block became readable when the temperature drops (this one
didn't but I believe some can).  What happens when the block becomes
readable, and then a program writes new data to that block, and the block
temporarily appears good?  At that time it will get written and will not get
reallocated, right?  And a few milliseconds later, what?  I do not want that
block reused.  I want it reallocated.

And when a drive doesn't guarantee reallocation, I want the driver to remove
the sector from the file system.


* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
  2003-10-17 10:49               ` John Bradford
  2003-10-17 11:09                 ` Rogier Wolff
@ 2003-10-17 11:24                 ` Krzysztof Halasa
  2003-10-17 19:35                   ` John Bradford
  1 sibling, 1 reply; 64+ messages in thread
From: Krzysztof Halasa @ 2003-10-17 11:24 UTC (permalink / raw)
  To: John Bradford
  Cc: Rogier Wolff, Norman Diamond, Hans Reiser, Wes Janzen, linux-kernel

John Bradford <john@grabjohn.com> writes:

> Besides, are you positive that you always got the correct data off the
> disk?  See the discussions about hashing algorithms - maybe the drive
> simply returned data that had an additional bit flipped and wasn't
> identified as bad.

One bit? No chance. The same as with ECC RAM - one bit error will always
be detected.
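The one-bit claim can be checked for a simple code. A drive's own ECC is a different and stronger code; CRC-32 from Python's `zlib` is only a stand-in here, but like any CRC whose generator polynomial has more than one term it detects every single-bit error. The risk John raises would come from multi-bit error patterns beyond what the code was designed for, which this sketch does not exercise.

```python
import zlib

def single_bit_flips_detected(block: bytes) -> bool:
    """True if flipping any one bit of `block` changes its CRC-32."""
    good = zlib.crc32(block)
    buf = bytearray(block)
    for i in range(len(buf) * 8):
        buf[i // 8] ^= 1 << (i % 8)      # flip bit i
        if zlib.crc32(buf) == good:
            return False                 # an undetected single-bit error
        buf[i // 8] ^= 1 << (i % 8)      # restore bit i
    return True
```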

>  If you are having to try over 1000 times from
> userspace, the drive is in a bad way.  You shouldn't really make the
> assumption you usually do (that the error correction is good
> enough to ensure bad data isn't returned as good data).  If you are
> recovering data from a spreadsheet, for example, the errors could go
> unnoticed, but have catastrophic results.

Then you have to abandon using any hard drives. Or computers at all.
Well, mirrors (with read-and-compare) are probably good enough for you,
but it has to be done at application level.
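A sketch of application-level read-and-compare across mirror copies, as suggested above. The function and the `MirrorMismatch` exception are hypothetical names, and each path is assumed to live on a different drive.

```python
class MirrorMismatch(Exception):
    """Raised when mirror copies disagree (hypothetical name)."""

def read_mirrored(paths, offset, length):
    """Read the same byte range from every mirror copy and insist they agree.

    An OSError from an unreadable copy propagates to the caller unchanged.
    """
    copies = []
    for path in paths:
        with open(path, "rb") as f:
            f.seek(offset)
            copies.append(f.read(length))
    if any(c != copies[0] for c in copies[1:]):
        raise MirrorMismatch(f"copies differ at offset {offset}")
    return copies[0]
```

With only two copies a mismatch says that something is wrong but not which copy is right; a third copy or per-block checksums would be needed to decide.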

> Of course you will - it's remapped, the data isn't overwritten!  You
> may need more advanced tools,

= in practice, it's lost. Have you seen such tools?

> but you can still seek the heads to that
> part of the platter and get data from the head-amp.  Just because you
> couldn't use your simple method anymore is no real reason to argue
> against fixing the problem.

against _changing_ the problem (it doesn't go away), breaking things
which are now sane.

> This may be more sensible, but not for the reasons you are suggesting,
> and not in the way that you are suggesting.

Then note that a drive can be temporarily unable to read most of the
data - due to, say, incorrect supply voltage or very high level of
electromagnetic interference.

Would you like to trash _all_ your data in such case automatically?

> Suspect drive?  Bin it.  Do you really not value your data enough to
> do that?

Do you really not value your data enough to mark it as inaccessible?

If it comes to non-standard recovery then you should rather go for
backups.
-- 
Krzysztof Halasa, B*FH


* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
  2003-10-17 11:11               ` Norman Diamond
@ 2003-10-17 11:45                 ` Hans Reiser
  2003-10-17 11:51                 ` John Bradford
                                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 64+ messages in thread
From: Hans Reiser @ 2003-10-17 11:45 UTC (permalink / raw)
  To: Norman Diamond
  Cc: Wes Janzen, Rogier Wolff, John Bradford, linux-kernel, nikita,
	Pavel Machek, Vitaly Fertman

Norman Diamond wrote:

>Replying first to Hans Reiser; below to Russell King and Pavel Machek.
>
>  
>
>>Instead of recording the bad blocks, just write to them.
>>    
>>
>
>If writes are guaranteed to force reallocations then this is potentially
>part of a solution.
>
>I still remain suspicious because the first failed read was milliseconds or
>minutes after the preceding write.  I think the odds are very high that the
>sector was already bad at the time of the write but reallocation did not
>occur.  It is possible but I think very unlikely that the sector was
>reallocated to a different physical sector which went bad milliseconds after
>being written after reallocation, and equally unlikely that the sector
>wasn't reallocated because it really hadn't been bad but went bad
>milliseconds later.  In other words, I think it is overwhelmingly likely
>that the write failed but was not detected as such and did not result in
>reallocation.
>  
>
Perform the write after the failed read; that way the drive knows it is
a bad block at the time you write.

>Now, maybe there is a technique to force it anyway.  When a partition is
>newly created and is being formatted with the intention of writing data a
>few minutes later, do writes that "should" have a better chance of being
>detected.  The way to start this is to simply write every block, but this is
>obviously insufficient because my block did get written shortly after the
>partition was formatted and that write didn't cause the block to be
>reallocated.  So in addition to simply writing every block, also read every
>block.  For each read that fails, proceed to do another write which "should"
>force reallocation.
>
>Mr. Reiser, when I created a partition of your design, that technique was
>not offered.  Why?  And will it soon start being offered?
>  
>
I think I discussed with Vitaly offering users the option of writing, 
reading, and then writing again, every block before mkreiserfs.  I 
forget what happened to that idea, Vitaly?

>Also, I remain highly suspicious that for each read that fails, when the
>formatting program proceeds to do another write which "should" force
>reallocation, the drive might not do it.
>
I am not going to worry about such suspicions without evidence or drive 
manufacturer comment, as it has not been our experience so far.

>
>
>Why does it matter?  The drive already reported a read failure.  Maybe Linux
>programs aren't all smart enough to inform the user when a read operation
>results in an I/O error, but drivers could be smarter.
>
There is a general problem with reporting urgent kernel messages to 
users thanks to GUIs covering over the console.



-- 
Hans




* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
  2003-10-17 11:11               ` Norman Diamond
  2003-10-17 11:45                 ` Hans Reiser
@ 2003-10-17 11:51                 ` John Bradford
  2003-10-17 12:53                 ` John Bradford
  2003-10-17 13:04                 ` Russell King
  3 siblings, 0 replies; 64+ messages in thread
From: John Bradford @ 2003-10-17 11:51 UTC (permalink / raw)
  To: Norman Diamond, Hans Reiser, Wes Janzen, Rogier Wolff; +Cc: linux-kernel

> Well, consider the two extremes we've seen in this thread now.  Mr. Bradford
> felt that the entire drive should be discarded on account of having one bad
> block.

Please don't spread blatantly misleading information.

My position on this is that if a drive is _persistently_ unable to
_write_ to any LBA address, it should be binned.  Read errors are a
separate concern.  If they occur, the drive should simply return an
error.  The OS needs to do _NOTHING_.  No special re-writing to force
a re-allocation should be done - we assume the drive is going to do
that, and if it doesn't:

1. DRIVE -> BIN

2. Restore backup.

> Mr. Machek feels that we should preserve the possibility of reusing
> the bad block because in the future it might appear not to be bad.  I take
> the middle road.  The drive should not be discarded until errors become more
> frequent or numerous, but known bad blocks should be acted on so that those
> physical blocks should not have a chance of being used again.

You may consider that a responsible attitude towards people who are
paying for consultancy, and value their data at more than the physical
cost of the disk, but I do not.

> Suppose the block became readable when the temperature drops (this one
> didn't but I believe some can).  What happens when the block becomes
> readable, and then a program writes new data to that block, and the block
> temporarily appears good?  At that time it will get written and will not get
> reallocated, right?  And a few milliseconds later, what?  I do not want that
> block reused.  I want it reallocated.

1. Monitor drive.

2. Out of spec temperature?  If yes, remount R/O and page an operator.

3. Go to 1
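John's three steps written out as a Python loop. `read_temp_c` and `on_out_of_spec` are hypothetical callables: the first might read the drive's SMART temperature, the second might run `mount -o remount,ro` and page someone. `max_polls` exists only so the loop can be exercised without running forever.

```python
import time

def monitor(read_temp_c, max_temp_c, on_out_of_spec, poll_s=60.0, max_polls=None):
    """1. Monitor drive.  2. Out-of-spec temperature? Act.  3. Go to 1.

    read_temp_c() -> float and on_out_of_spec(temp) are hypothetical callables;
    max_polls=None means run forever. Returns the number of polls made.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        temp = read_temp_c()                 # step 1: monitor drive
        if temp > max_temp_c:                # step 2: out of spec?
            on_out_of_spec(temp)             # e.g. remount R/O, page an operator
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(poll_s)               # step 3: go to 1
    return polls
```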

> And when a drive doesn't guarantee reallocation, I want the driver to remove
> the sector from the file system.

Such drives are no better in this regard than ST-506 drives in my
opinion.  I have almost always started discussions with a phrase such
as, "assuming we are talking about modern drives that do their own
defect management".

John.


* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
  2003-10-17  9:40           ` Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?) Norman Diamond
                               ` (3 preceding siblings ...)
  2003-10-17 10:37             ` ATA Defect management John Bradford
@ 2003-10-17 12:08             ` Justin Cormack
  2003-10-21 20:12             ` bill davidsen
  5 siblings, 0 replies; 64+ messages in thread
From: Justin Cormack @ 2003-10-17 12:08 UTC (permalink / raw)
  To: Norman Diamond
  Cc: Hans Reiser, Wes Janzen, Rogier Wolff, John Bradford,
	Kernel mailing list, nikita, Pavel Machek

On Fri, 2003-10-17 at 10:40, Norman Diamond wrote:
> Friends in the disk drive section at Toshiba said this:
> 
> When a drive tries to read a block, if it detects errors, it retries up to
> 255 times.  If a retry succeeds then the block gets reallocated.  IF 255
> RETRIES FAIL THEN THE BLOCK DOES NOT GET REALLOCATED.
> 
> This was so unbelievable to me that I had to confirm this with them in
> different words.  In case of a temporary error, the drive provides the
> recovered data as the result of the read operation and the drive writes the
> data to a reallocated sector.  In case of a permanent error, the block is
> assumed bad, and of course the data are lost.  Since the data are assumed
> lost, the drive keeps the defective LBA sector number associated with the
> same defective physical block and it does not reallocate the defective
> block.
> 
> I explained to them why the LBA sector number should still get reallocated
> even though the data are lost.  When the sector isn't reallocated, I could
> repartition the drive and reformat the partition and the OS wouldn't know
> about the defective block so the OS would try again to use it.  At first
> they did not believe I could do this, but I explained to them that I'm still
> able to delete partitions and create new partitions etc., and then they
> understood.
> 
> They also said that a write operation has a chance of getting the bad block
> reallocated.  The conditions for reallocation on write are similar but not
> identical to the conditions for reallocate on read.  During a write
> operation if a sector is determined to be permanently bad (255 failing
> retries) then it is likely to be reallocated, unlike a read.  But I'm not
> sure if this is guaranteed or not.  We agreed that we should try it on my
> bad sector, but if the drive again detects a permanent error then it will
> not reallocate the sector.  First I still want to find which file contains
> the sector; I haven't had time for this on weekdays.
> 

I have found that in the case of blocks that won't reallocate with reads,
a sufficiently large number of reads and writes will fix them
eventually, by reallocating. The behaviour doesn't seem entirely
predictable (but then failure modes often aren't), but given time it is
possible to do.

> In this mailing list there has been some discussion of whether file systems
> should keep lists of known bad blocks and hide those bad blocks from
> ordinary operations in ordinary usage.  Of course historically this was
> always necessary.  As someone else mentioned, and I've done it too, when
> formatting a disk drive, type in the list of known bad block numbers that
> were printed on a piece of paper that came with the drive.
> 

This really isn't going to work with swap partitions and suchlike. If you
can't get rid of a bad sector with reads and writes in badblocks, SMART
tests and the manufacturer's low-level format, then the drive is defective and
you should discard it or return it under warranty. The bad blocks list
is really not needed. If you really want to do it, use dm to remap the
raw device without bad blocks, then you can still use it on filesystems
without badblocks support (eg swap, raid etc). The device mapping stuff
should have no trouble with this.
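One way to read the dm suggestion: build a device-mapper `linear` table that concatenates the good extents of the raw device and skips the bad sectors. This sketch assumes sector numbers are 512-byte units relative to the device named (`/dev/hda3` is only a hypothetical example); the resulting lines could then be fed to `dmsetup create`. The mapped device is smaller than the original, so the filesystem or swap area must be created on the mapped device, not the raw one.

```python
def dm_linear_table(device, total_sectors, bad_sectors):
    """Build dm 'linear' table lines that map around the given bad sectors.

    Each line: <logical start> <length> linear <device> <physical offset>,
    all in 512-byte sectors.
    """
    lines = []
    logical = 0   # next sector of the mapped (smaller) device
    start = 0     # physical start of the current good extent
    bads = sorted(s for s in set(bad_sectors) if 0 <= s < total_sectors)
    for bad in bads + [total_sectors]:
        if bad > start:                         # non-empty good extent
            length = bad - start
            lines.append(f"{logical} {length} linear {device} {start}")
            logical += length
        start = bad + 1                         # skip the bad sector itself
    return lines
```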

> Regarding finding which file contains the known bad sector, someone in this
> mailing list said that the badblocks program could help, but the manual page
> for the badblocks program doesn't give any clues as to how it would help.
> I'm still doing find of all files in the partition and cp them to /dev/null.

Use the read test in badblocks to find the sector, then use the write
tests to overwrite it until the bad block is fixed, then fsck your
partition. If you get errors then the block was metadata. Otherwise
md5sum your files and check against the backups. That will tell you the
file... For ext2 I believe there are some tools, for other file systems
it might be more difficult.
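The md5sum step of that procedure, sketched in Python: walk the live tree, compare each file's MD5 with the same relative path in a backup tree, and report the files that differ, are missing from the backup, or cannot be read. The function names and directory layout are assumptions; files legitimately changed since the backup will show up too, so the output is a list of suspects, not a verdict.

```python
import hashlib
import os

def md5_of(path, chunk=1 << 20):
    """MD5 hex digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def files_differing_from_backup(live_root, backup_root):
    """Relative paths under live_root that don't match backup_root."""
    differing = []
    for dirpath, _dirs, files in os.walk(live_root):
        for name in files:
            live = os.path.join(dirpath, name)
            rel = os.path.relpath(live, live_root)
            backup = os.path.join(backup_root, rel)
            try:
                same = os.path.isfile(backup) and md5_of(live) == md5_of(backup)
            except OSError:      # an unreadable file is a prime suspect too
                same = False
            if not same:
                differing.append(rel)
    return sorted(differing)
```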

Justin




* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
  2003-10-17 11:11               ` Norman Diamond
  2003-10-17 11:45                 ` Hans Reiser
  2003-10-17 11:51                 ` John Bradford
@ 2003-10-17 12:53                 ` John Bradford
  2003-10-17 13:03                   ` Russell King
  2003-10-19  7:50                   ` Andre Hedrick
  2003-10-17 13:04                 ` Russell King
  3 siblings, 2 replies; 64+ messages in thread
From: John Bradford @ 2003-10-17 12:53 UTC (permalink / raw)
  To: Norman Diamond, Hans Reiser, Wes Janzen, Rogier Wolff; +Cc: linux-kernel

Quote from "Norman Diamond" <ndiamond@wta.att.ne.jp>:
> Now, maybe there is a technique to force it anyway.  When a partition is
> newly created and is being formatted with the intention of writing data a
> few minutes later, do writes that "should" have a better chance of being
> detected.  The way to start this is to simply write every block, but this is
> obviously insufficient because my block did get written shortly after the
> partition was formatted and that write didn't cause the block to be
> reallocated.  So in addition to simply writing every block, also read every
> block.  For each read that fails, proceed to do another write which "should"
> force reallocation.

I am just imagining how many Flash devices will be worn out
unnecessarily by any filesystem utility that does this transparently
to the user :-(.

> Russell King replied to me:
> 
> > > When a drive tries to read a block, if it detects errors, it retries up
> > > to 255 times.  If a retry succeeds then the block gets reallocated.  IF
> > > 255 RETRIES FAIL THEN THE BLOCK DOES NOT GET REALLOCATED.
> >
> > This is perfectly reasonable.  If the drive can't recover your old data
> > to reallocate it to a new block, then leaving the error present until you
> > write new data to that bad block is the correct thing to do.

I 99% agree with that.  The 1% where I don't is that there may be
situations where there is no interest in doing any data recovery from
the drive, (you have backups, it is part of a RAID array, or storing
temporary data that can be re-generated whenever necessary), and also,
any read errors that occur during a S.M.A.R.T. read test should result
in a re-mapping of the block.

> > Think about what would happen if it did get reallocated.  What data would
> > the drive return when requested to read the bad block?
> 
> Why does it matter?  The drive already reported a read failure.  Maybe Linux
> programs aren't all smart enough to inform the user when a read operation
> results in an I/O error, but drivers could be smarter.  I think there's
> probably a bit of room in an inode to add a flag saying that the file has
> been detected to be partially unreadable.  Sorry for the digression.
> Anyway, it is 100% true that the data in that block are gone.  The block
> should be reallocated and the new physical block can either be zeroed or
> randomized or anything, and that's what subsequent reads will get until the
> block gets written again.

100% agreed.

> > If the error persists during a write to the bad block, then yes, I'd
> > expect it to be reallocated at that point - but only because the drive has
> > the correct data for that block available.
> 
> We agree in our moral expectations and our technical analysis that correct
> data will be available at that time.  But if your word "expect" means you
> have confidence that the drive will perform correctly, I do not share your
> confidence (I think it is possible but highly unlikely that the drive did
> its job correctly during the previous write).

If the drive is not doing its job properly DRIVE -> BIN.

> > Your description of the way Toshibas drive works seems perfectly sane.

I disagree - we haven't confirmed what happens in the error-on-write
situation.  If it does indeed always remap the block, then I'd agree
that that aspect was perfectly sane.

John.


* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
  2003-10-17 12:53                 ` John Bradford
@ 2003-10-17 13:03                   ` Russell King
  2003-10-17 13:26                     ` John Bradford
  2003-10-19  7:50                   ` Andre Hedrick
  1 sibling, 1 reply; 64+ messages in thread
From: Russell King @ 2003-10-17 13:03 UTC (permalink / raw)
  To: John Bradford
  Cc: Norman Diamond, Hans Reiser, Wes Janzen, Rogier Wolff, linux-kernel

On Fri, Oct 17, 2003 at 01:53:01PM +0100, John Bradford wrote:
> I disagree - we haven't confirmed what happens in the error-on-write
> situation.  If it does indeed always remap the block, then I'd agree
> that that aspect was perfectly sane.

My comments were based upon the information contained within the mail
which appeared to originate from the manufacturer.

Plus, they were in *PRIVATE*.  Sheesh.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 PCMCIA      - http://pcmcia.arm.linux.org.uk/
                 2.6 Serial core


* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
  2003-10-17 11:11               ` Norman Diamond
                                   ` (2 preceding siblings ...)
  2003-10-17 12:53                 ` John Bradford
@ 2003-10-17 13:04                 ` Russell King
  2003-10-17 14:09                   ` Norman Diamond
  3 siblings, 1 reply; 64+ messages in thread
From: Russell King @ 2003-10-17 13:04 UTC (permalink / raw)
  To: Norman Diamond
  Cc: Hans Reiser, Wes Janzen, Rogier Wolff, John Bradford,
	linux-kernel, nikita, Pavel Machek

On Fri, Oct 17, 2003 at 08:11:42PM +0900, Norman Diamond wrote:
> Russell King replied to me:
> > > When a drive tries to read a block, if it detects errors, it retries up
> > > to 255 times.  If a retry succeeds then the block gets reallocated.  IF
> > > 255 RETRIES FAIL THEN THE BLOCK DOES NOT GET REALLOCATED.
> >
> > This is perfectly reasonable.  If the drive can't recover your old data
> > to reallocate it to a new block, then leaving the error present until you
> > write new data to that bad block is the correct thing to do.

Why the F**K are you replying to me publicly when I sent my reply in
private?

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 PCMCIA      - http://pcmcia.arm.linux.org.uk/
                 2.6 Serial core


* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
  2003-10-17 13:03                   ` Russell King
@ 2003-10-17 13:26                     ` John Bradford
  0 siblings, 0 replies; 64+ messages in thread
From: John Bradford @ 2003-10-17 13:26 UTC (permalink / raw)
  To: Russell King
  Cc: Norman Diamond, Hans Reiser, Wes Janzen, linux-kernel, R.E.Wolff

Quote from Russell King <rmk+lkml@arm.linux.org.uk>:
> On Fri, Oct 17, 2003 at 01:53:01PM +0100, John Bradford wrote:
> > I disagree - we haven't confirmed what happens in the error-on-write
> > situation.  If it does indeed always remap the block, then I'd agree
> > that that aspect was perfectly sane.
> 
> My comments were based upon the information contained within the mail
> which appeared to originate from the manufacturer.
> 
> Plus, they were in *PRIVATE*.  Sheesh.

Please note - _I_ only quoted what was already posted to the list as a
quote.

http://marc.theaimsgroup.com/?l=linux-kernel&m=106638956902403&w=2

John.


* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
  2003-10-17 13:04                 ` Russell King
@ 2003-10-17 14:09                   ` Norman Diamond
  0 siblings, 0 replies; 64+ messages in thread
From: Norman Diamond @ 2003-10-17 14:09 UTC (permalink / raw)
  To: Russell King
  Cc: Hans Reiser, Wes Janzen, Rogier Wolff, John Bradford,
	linux-kernel, nikita, Pavel Machek

This question from Russell King was public...

> On Fri, Oct 17, 2003 at 08:11:42PM +0900, Norman Diamond wrote:
> > Russell King replied to me:
> > > > When a drive tries to read a block, if it detects errors, it retries up
> > > > to 255 times.  If a retry succeeds then the block gets reallocated.  IF
> > > > 255 RETRIES FAIL THEN THE BLOCK DOES NOT GET REALLOCATED.
> > >
> > > This is perfectly reasonable.  If the drive can't recover your old data
> > > to reallocate it to a new block, then leaving the error present until you
> > > write new data to that bad block is the correct thing to do.
>
> Why the F**K are you replying to me publicly when I sent my reply in
> private?

First to answer literally, the reasons are:
(1)  Everything else in this discussion has been public, with additional
copies to individuals participating in the discussion.  (The same has been
true of most messages in other LKML discussions that I've seen.)
(2)  I didn't notice anything in your previous message that looked like it
needed to be kept secret, i.e. deliberately not posted publicly.

Now taking it non-literally, obviously I owe you an apology.  I should not
have quoted any of your words publicly without asking you first.  I am sorry
for quoting you without asking.

Now taking it intellectually, I am genuinely puzzled.  Sorry to repeat, but
I didn't notice anything in your previous message that looked like it needed
to be kept secret, i.e. deliberately not posted publicly.  Why was your
previous message private?

Sincerely,
Norman Diamond


* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
  2003-10-17 11:24                 ` Krzysztof Halasa
@ 2003-10-17 19:35                   ` John Bradford
  2003-10-17 23:28                     ` Krzysztof Halasa
       [not found]                     ` <m37k33igui.fsf@defiant. <m3u166vjn0.fsf@defiant.pm.waw.pl>
  0 siblings, 2 replies; 64+ messages in thread
From: John Bradford @ 2003-10-17 19:35 UTC (permalink / raw)
  To: Krzysztof Halasa
  Cc: Rogier Wolff, Norman Diamond, Hans Reiser, Wes Janzen, linux-kernel

Quote from Krzysztof Halasa <khc@pm.waw.pl>:
> John Bradford <john@grabjohn.com> writes:
> 
> > Besides, are you positive that you always got the correct data off the
> > disk?  See the discussions about hashing algorithms - maybe the drive
> > simply returned data that had an additional bit flipped and wasn't
> > identified as bad.
> 
> One bit? No chance. The same as with ECC RAM - a one-bit error will always
> be detected.

I said an _additional_ bit.  I am assuming that N-1 reads returned the
same, (bad), data, which was identified as bad.  Read N encountered
one too many flipped bits and returned a false positive.  Perfectly
possible, and arguably more likely than all of the existing incorrect
bits flipping back, resulting in the correct data being read back, in
some cases.

> >  If you are having to try over 1000 times from
> > userspace, the drive is in a bad way.  You shouldn't make the
> > assumption that you usually do (that the error correction is good
> > enough to ensure bad data isn't returned as good data).  If you are
> > recovering data from a spreadsheet, for example, the errors could go
> > unnoticed, but have catastrophic results.
> 
> Then you have to abandon using any hard drives. Or computers at all.

Hardly.  The point I was trying to make is that the likelihood of a
critical fault is greater when you are experiencing many non-critical
faults.

> Well, mirrors (with read-and-compare) are probably good enough for you,
> but it has to be done at application level.
> 
> > Of course you will - it's remapped, the data isn't overwritten!  You
> > may need more advanced tools,
> 
> = in practice, it's lost. Have you seen such tools?

Tell this to the drive manufacturers.  They are the ones who can sell
you a specialist firmware if you want to do data recovery, not me.

> > but you can still seek the heads to that
> > part of the platter and get data from the head-amp.  Just because you
> > couldn't use your simple method anymore is no real reason to argue
> > against fixing the problem.
> 
> against _changing_ the problem (it doesn't go away), breaking things
> which are now sane.

Your argument is flawed - how can you claim the current situation is
sane when at least some drive manufacturers don't publish simple facts
such as what happens when defective blocks are encountered on reads
and on writes?

> > This may be more sensible, but not for the reasons you are suggesting,
> > and not in the way that you are suggesting.
> 
> Then note that a drive can be temporarily unable to read most of the
> data - due to, say, incorrect supply voltage or very high level of
> electromagnetic interference.

If a system got into a state as extreme as that, I'd generally take
the whole system down.  Electromagnetic interference that immediately
and noticeably affects one drive may well be affecting other components
in subtle ways - possibly _silent_ data corruption, in other words.

> Would you like to trash _all_ your data in such case automatically?

Yes.  Or more specifically, I wouldn't trust that data without
verifying it.  It's easy to ignore such problems and say that
everything is probably OK, and maybe 99% of the time you would be
right, but so what?  What about that 1%?

> > Suspect drive?  Bin it.  Do you really not value your data enough to
> > do that?
> 
> Do you really not value your data enough to mark it as inaccessible?

Not sure what you mean - in what context?

> If it comes to non-standard recovery then you should rather go for
> backups.

Data recovery is always a last resort.  On the other hand, backing up
data daily can still result in 23 hours of lost data, so I consider
early detection of faulty disks very important.  Mirroring brings its
own problems to consider - more devices to possibly fail, and if they
are connected to the same controller, a serious fault with any one
could usually theoretically destroy all of them.

John.


* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
  2003-10-17 19:35                   ` John Bradford
@ 2003-10-17 23:28                     ` Krzysztof Halasa
  2003-10-18  7:42                       ` Pavel Machek
  2003-10-18  8:27                       ` John Bradford
       [not found]                     ` <m37k33igui.fsf@defiant. <m3u166vjn0.fsf@defiant.pm.waw.pl>
  1 sibling, 2 replies; 64+ messages in thread
From: Krzysztof Halasa @ 2003-10-17 23:28 UTC (permalink / raw)
  To: John Bradford
  Cc: Rogier Wolff, Norman Diamond, Hans Reiser, Wes Janzen, linux-kernel

John Bradford <john@grabjohn.com> writes:

> I said an _additional_ bit.  I am assuming that N-1 reads returned the
> same, (bad), data, which was identified as bad.  Read N encountered
> one too many flipped bits and returned a false positive.  Perfectly
> possible, and arguably more likely than all of the existing incorrect
> bits flipping back, resulting in the correct data being read back, in
> some cases.

In some cases, theoretically, yes. But I've never got anything like that
in practice.

BTW: Hard drives apparently use more sophisticated algorithms,
involving measuring head signal level even when there is no problem
reading the data, and eventually remapping a sector on read before the
information is lost.

> Tell this to the drive manufacturers.  They are the ones who can sell
> you a specialist firmware if you want to do data recovery, not me.

Maybe. But, you know, it's Linux and I don't want to pay for additional
software just to use disks already paid for. Especially when it's all
working fine now.

> Your argument is flawed - how can you claim the current situation is
> sane when at least some drive manufacturers don't publish simple facts
> such as what happens when defective blocks are encountered on reads
> and on writes?

Do you think you can make them publish such things? It would be great.

> If a system got into a state as extreme as that, I'd generally take
> the whole system down.  Electromagnetic interference that immediately
> and noticeably affects one drive may well be affecting other components
> in subtle ways - possibly _silent_ data corruption, in other words.

Possibly. Possibly the machine will immediately freeze. But data on
disk platters will probably be ok, and you'll be able to read it
when the conditions are back in specs.

> Yes.  Or more specifically, I wouldn't trust that data without
> verifying it.  It's easy to ignore such problems and say that
> everything is probably OK, and maybe 99% of the time you would be
> right, but so what?  What about that 1%?

That's not 1% - rather something like 10^-17 or so.
See the specs.
And we have CRCs all over the place - a damaged .gnumeric file will
probably fail at the gunzip stage.
BTW: the probability of silently corrupting, say, (D)RAM contents is
much much higher than that of corrupting HDD data. Even if you use
ECC RAM.
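
A small Python illustration of the CRC point, assuming a gzip-compressed
file such as a .gnumeric workbook: gzip stores a CRC-32 of the
uncompressed data in its trailer, so a single corrupted byte makes
decompression fail loudly instead of returning altered data.
`survives_gunzip` is a hypothetical helper, and `compresslevel=0`
(stored, uncompressed deflate blocks) is used only so the byte to
corrupt is easy to locate.

```python
import gzip
import zlib


def survives_gunzip(blob: bytes) -> bool:
    """Return True if blob decompresses cleanly, False if gzip rejects it."""
    try:
        gzip.decompress(blob)
        return True
    except (OSError, EOFError, zlib.error):
        # gzip.BadGzipFile (an OSError subclass) signals a CRC-32 mismatch;
        # zlib.error / EOFError cover damage to the deflate stream itself.
        return False


# Stand-in for a compressed spreadsheet; level 0 stores the bytes verbatim.
original = b"<cell row='1' col='1'>42.0</cell>\n" * 100
packed = gzip.compress(original, compresslevel=0)

# Simulate one bad byte coming back from the disk.
damaged = bytearray(packed)
pos = damaged.find(b"42.0")
damaged[pos] ^= 0x01  # '4' (0x34) becomes '5' (0x35)

print(survives_gunzip(packed))          # True
print(survives_gunzip(bytes(damaged)))  # False
```

CRC-32 detects every single-bit error.  With normally compressed data
the damage may instead break the deflate stream itself, which the same
`except` clause catches.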

> > Do you really not value your data enough to mark it as inaccessible?
> 
> Not sure what you mean - in what context?

Remapping a sector on read without actually copying the data makes
it inaccessible. Unless you have manufacturer-provided software, of
course, but I haven't seen any.

> Data recovery is always a last resort.  On the other hand, backing up
> data daily can still result in 23 hours of lost data, so I consider
> early detection of faulty disks very important.  Mirroring brings its
> own problems to consider - more devices to possibly fail, and if they
> are connected to the same controller, a serious fault with any one
> could usually theoretically destroy all of them.

It all depends on requirements. If you need 100% uninterrupted service
you can use mirrored servers, possibly installed in different locations.
This will fix potential problems, while remapping on failed read will
not.
-- 
Krzysztof Halasa, B*FH


* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
  2003-10-17 23:28                     ` Krzysztof Halasa
@ 2003-10-18  7:42                       ` Pavel Machek
  2003-10-18  8:30                         ` John Bradford
  2003-10-18  8:27                       ` John Bradford
  1 sibling, 1 reply; 64+ messages in thread
From: Pavel Machek @ 2003-10-18  7:42 UTC (permalink / raw)
  To: Krzysztof Halasa
  Cc: John Bradford, Rogier Wolff, Norman Diamond, Hans Reiser,
	Wes Janzen, linux-kernel

Hi!

> BTW: Hard drives apparently use more sophisticated algorithms,
> involving measuring head signal level even when there is no problem
> reading the data, and eventually remapping a sector on read before the
> information is lost.
> 

Which means cat /dev/hda > /dev/null makes sense in
cron.weekly...
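
A minimal sketch of such a periodic read scan in Python, assuming Linux
and read permission on the device.  The helper `scan_for_read_errors`
is hypothetical: it reads the device (or any file) in fixed-size chunks
with `os.pread`, records the offset of any chunk whose read fails with
EIO, and keeps going where `cat` would stop at the first error.  The
64 KiB chunk size is an arbitrary choice, so one bad sector marks the
whole chunk; like `cat`, it goes through the page cache, so blocks that
are already cached may not be re-read from the platter.

```python
import errno
import os


def scan_for_read_errors(path, chunk_size=64 * 1024):
    """Read path from start to end.

    Returns (bytes_read, bad_offsets), where bad_offsets holds the byte
    offset of every chunk whose read failed with EIO.
    """
    bad_offsets = []
    bytes_read = 0
    offset = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            try:
                data = os.pread(fd, chunk_size, offset)
            except OSError as exc:
                if exc.errno != errno.EIO:
                    raise  # only unreadable-media errors are skipped
                bad_offsets.append(offset)
                offset += chunk_size  # skip the bad chunk and keep going
                continue
            if not data:
                break  # end of device or file
            bytes_read += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return bytes_read, bad_offsets
```

For example, a weekly cron job could call
`scan_for_read_errors("/dev/hda")` and mail any non-empty offset list;
`offset // 512` is the first 512-byte sector of each failing chunk.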
-- 
				Pavel
Written on sharp zaurus, because my Velo1 broke. If you have Velo you don't need...



* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
  2003-10-17 23:28                     ` Krzysztof Halasa
  2003-10-18  7:42                       ` Pavel Machek
@ 2003-10-18  8:27                       ` John Bradford
  2003-10-18 12:02                         ` Krzysztof Halasa
  1 sibling, 1 reply; 64+ messages in thread
From: John Bradford @ 2003-10-18  8:27 UTC (permalink / raw)
  To: Krzysztof Halasa
  Cc: Rogier Wolff, Norman Diamond, Hans Reiser, Wes Janzen, linux-kernel

This is off-topic for linux-kernel.  Please move the discussion
elsewhere if you want to continue it.

> > I said an _additional_ bit.  I am assuming that N-1 reads returned the
> > same, (bad), data, which was identified as bad.  Read N encountered
> > one too many flipped bits and returned a false positive.  Perfectly
> > possible, and arguably more likely than all of the existing incorrect
> > bits flipping back, resulting in the correct data being read back, in
> > some cases.
> 
> In some cases, theoretically, yes. But I've never got anything like that
> in practice.
> 
> BTW: Hard drives apparently use more sophisticated algorithms,
> involving measuring head signal level even when there is no problem
> reading the data, and eventually remapping a sector on read before the
> information is lost.

Yes, but some blocks on drives that are used for archiving data may
not have been read for months or even years.  They may have multiple
errors which have not been detected and remapped.

This is a simplified example, but say that in one particular drive the
error correction can cope with around 25% of the bits in a sector
being incorrect, and still recover the data.  If 50% of the bits are
incorrect, and you read it N-1 times and get an error, but get no
error on try N, what is more likely: that suddenly only 25% of the bits
are incorrect, or that 51% are now wrong and you are getting a false
positive?

Now, I am not saying that the false positive is always more likely - a
change in temperature, or slight head movement so that it is reading
the track off-centre and getting a less corrupted signal as a result,
could make the error rate drop to 25%, but I wouldn't assume that had
happened.
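
The false-positive argument can be illustrated with a deliberately tiny
code.  Real drive ECC is far stronger and its details are not
published, so this is only an analogy: a Hamming(7,4) decoder corrects
any single flipped bit, but when two bits flip it confidently
"corrects" a third bit and hands back the wrong data with no error at
all.  `encode` and `decode` are hypothetical helper names.

```python
# Hamming(7,4): positions 1..7, parity bits at positions 1, 2 and 4.
def encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(c):
    """Return (data bits, corrected?) using syndrome decoding."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # 1-based position of the assumed bad bit
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], bool(syndrome)


data = [1, 0, 1, 1]
sent = encode(data)

one_bad = sent.copy()
one_bad[4] ^= 1                  # one flipped bit
print(decode(one_bad))           # ([1, 0, 1, 1], True): correctly repaired

two_bad = sent.copy()
two_bad[4] ^= 1
two_bad[5] ^= 1                  # one flipped bit too many
print(decode(two_bad))           # ([0, 1, 0, 1], True): wrong data, no error
```

Adding one overall parity bit (SECDED, the scheme commonly used for ECC
RAM) turns the double flip into a detected error, but a burst larger
than a code is designed for can still decode to wrong data that looks
valid.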

> > Tell this to the drive manufacturers.  They are the ones who can sell
> > you a specialist firmware if you want to do data recovery, not me.
> 
> Maybe. But, you know, it's Linux and I don't want to pay for additional
> software just to use disks already paid for. Especially when it's all
> working fine now.

Well, the original firmware wasn't 'free' in any sense of the word, so
I wouldn't expect a more advanced firmware to be 'free' either.

Drive manufacturers could sell advanced firmware to data recovery
companies for a price that would pay for itself after 3-4 data
recovery jobs.  Given that they could then do far more advanced
recovery than people could do themselves, I am surprised this hasn't
happened before.  Of course, free and open firmware would be nice in
general, but that hasn't arrived yet.

Besides, I don't think it is all working fine now.  About your only
method of data recovery is to retry reading a bad block over and over
again, possibly varying things like the temperature of the drive.  You
can't get the raw bits off of the platter, or accurately position the
heads off-centre from the tracks, for example.

> > Your argument is flawed - how can you claim the current situation is
> > sane when at least some drive manufacturers don't publish simple facts
> > such as what happens when defective blocks are encountered on reads
> > and on writes?
> 
> Do you think you can make them publish such things? It would be great.
> 
> > If a system got into a state as extreme as that, I'd generally take
> > the whole system down.  Electromagnetic interference that immediately
> > and noticeably affects one drive may well be affecting other components
> > in subtle ways - possibly _silent_ data corruption, in other words.
> 
> Possibly. Possibly the machine will immediately freeze. But data on
> disk platters will probably be ok, and you'll be able to read it
> when the conditions are back in specs.

Possibly, but look at the wider picture - data in RAM may be badly
corrupted.  If you shut down the machine gracefully, that corrupted
data may get written to disk.  If you force the machine off, that data
is lost.  Either way, I wouldn't just turn it back on and hope for the
best.  OK, if it was my own data, or there was a good reason to (for
example, the client decides that time is more critical than data
integrity), maybe I would, but if somebody is paying for consultancy,
especially if it is at a rate that makes the cost of a hard disk
fairly insignificant, then not at least considering the possibility of
silent data corruption is irresponsible.  Concluding that the risk of
data corruption is so small that it is insignificant may suffice in
some cases, but not necessarily all of them.

> > Yes.  Or more specifically, I wouldn't trust that data without
> > verifying it.  It's easy to ignore such problems and say that
> > everything is probably OK, and maybe 99% of the time you would be
> > right, but so what?  What about that 1%?
> 
> That's not 1% - rather something like 10^-17 or so.
> See the specs.

Hmmm, that sounds like you're talking about the chance of an error in
a single block.

If a machine starts showing sudden, noticeable problems because of
something like a voltage spike, I don't think you can reliably predict
what may have happened to data in RAM, including the cache on the
disk, which will presumably be flushed when you power down, unless
the disk has been put into a very confused state by the voltage
spike, or whatever else has caused the problem.

In fact, if a PSU is failing, how do you know mains voltage won't
suddenly fly through the machine?  Don't claim that has never
happened!

> And we have CRCs all over the place - damaged .gnumeric file will
> probably fail gunzip stage.

Yes, but presumably you want to identify such a corrupted file _now_
instead of in 6 months time.  Verifying CRCs may well be sufficient in
many cases, I am not disputing that.

> BTW: the probability of silently corrupting, say, (D)RAM contents is
> much much higher than that of corrupting HDD data. Even if you use
> ECC RAM.

In a typical machine, usually yes.

> > > Do you really not value your data enough to mark it as inaccessible?
> > 
> > Not sure what you mean - in what context?
> 
> Remapping a sector on read without actually copying the data makes
> it inaccessible. Unless you have manufacturer-provided software, of
> course, but I haven't seen any.

On a very busy proxy or news server, maybe you'd rather remap the
sector, write zeros to the new one, and obtain a new copy of the data
over the network, without the disk spending ages trying to recover the
data.  If the disk is part of an array, and another disk got your data
for you, you might want to remap the sector immediately.

Although, to be honest, except where performance is critical, remap on
read is pointless.  It saves you from having to identify the bad block
again when you write to it.  Generally, guaranteed remap on write is
what I want.  What happens on read is less important if your data
isn't intact.  I can see your point of view for not re-mapping on read
given that advanced firmwares are not available, and the fact that it
allows you to do some form of data recovery.  Overall, though, if it
gets to the point where you have to start doing such data recovery,
downtime is usually significant, and for some applications, having the
data in a week's time may be little more than useless.  Predicting
possible disk failures is a good idea.

> > Data recovery is always a last resort.  On the other hand, backing up
> > data daily can still result in 23 hours of lost data, so I consider
> > early detection of faulty disks very important.  Mirroring brings its
> > own problems to consider - more devices to possibly fail, and if they
> > are connected to the same controller, a serious fault with any one
> > could usually theoretically destroy all of them.
> 
> It all depends on requirements. If you need 100% uninterrupted service
> you can use mirrored servers, possibly installed in different locations.
> This will fix potential problems, while remapping on failed read will
> not.

I never actually suggested that remapping on an unrecovered failed
read would solve any data integrity problems.

I did suggest that data which was recovered automatically by the drive
on a second or subsequent read should result in a remapping of that
block.

My most important point is that writes should never fail on a good
drive.  If they do, I would not use the drive for critical data
anymore.  Presumably typical drive firmware will try several times to
do a write before reporting an error to the user - presumably it would
have to in case one or more replacement blocks are bad too.  Maybe such
failures were temporary, caused by a voltage spike, for example, but
it would still be the case that the drive couldn't get itself back into
a good state and retry the operation, and I would be suspicious of
it being reliable in the long term.

John.


* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
  2003-10-18  7:42                       ` Pavel Machek
@ 2003-10-18  8:30                         ` John Bradford
  2003-10-21 20:26                           ` bill davidsen
  0 siblings, 1 reply; 64+ messages in thread
From: John Bradford @ 2003-10-18  8:30 UTC (permalink / raw)
  To: Pavel Machek, Krzysztof Halasa
  Cc: John Bradford, Rogier Wolff, Norman Diamond, Hans Reiser,
	Wes Janzen, linux-kernel

> > BTW: Hard drives apparently use more sophisticated algorithms,
> > involving measuring head signal level even when there is no problem
> > reading the data, and eventually remapping a sector on read before the
> > information is lost.
> > 
> 
> Which means cat /dev/hda > /dev/null makes sense in
> cron.weekly...

Indeed.  Some drives can also do a timed defect scan using S.M.A.R.T.

John.


* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
  2003-10-18  8:27                       ` John Bradford
@ 2003-10-18 12:02                         ` Krzysztof Halasa
  2003-10-18 16:26                           ` Nuno Silva
  0 siblings, 1 reply; 64+ messages in thread
From: Krzysztof Halasa @ 2003-10-18 12:02 UTC (permalink / raw)
  To: John Bradford
  Cc: Rogier Wolff, Norman Diamond, Hans Reiser, Wes Janzen, linux-kernel

John Bradford <john@grabjohn.com> writes:

> Although, to be honest, except where performance is critical, remap on
> read is pointless.  It saves you from having to identify the bad block
> again when you write to it.  Generally, guaranteed remap on write is
> what I want.

Then I think we have an agreement.

> I did suggest that data which was recovered automatically by the drive
> on a second or subsequent read should result in a remapping of that
> block.

AFAIK this is what the drives do.

> My most important point is that writes should never fail on a good
> drive.

That's certainly what the drives do. Unless they are out of spare
sectors, of course.

Doing cat /dev/zero > /dev/hd* fixes all bad sectors on a modern drive.
-- 
Krzysztof Halasa, B*FH


* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
  2003-10-18 12:02                         ` Krzysztof Halasa
@ 2003-10-18 16:26                           ` Nuno Silva
  2003-10-18 20:16                             ` Krzysztof Halasa
  0 siblings, 1 reply; 64+ messages in thread
From: Nuno Silva @ 2003-10-18 16:26 UTC (permalink / raw)
  To: linux-kernel



Krzysztof Halasa wrote:

[..snip..]

> 
> Doing cat /dev/zero > /dev/hd* fixes all bad sectors on a modern drive.

Yeah! I'm doing this right now because the data in hda is very important
and I haven't done any backups since August!! :-D

Regards,
Nuno Silva




* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
  2003-10-18 16:26                           ` Nuno Silva
@ 2003-10-18 20:16                             ` Krzysztof Halasa
  0 siblings, 0 replies; 64+ messages in thread
From: Krzysztof Halasa @ 2003-10-18 20:16 UTC (permalink / raw)
  To: Nuno Silva; +Cc: linux-kernel

Nuno Silva <nuno.silva@vgertech.com> writes:

> > Doing cat /dev/zero > /dev/hd* fixes all bad sectors on a modern drive.
> 
> Yeah! I'm doing this right now because the data in hda is very
> important and I haven't done any backups since August!! :-D

Aaah right... August - which year exactly? :-)

(Just in case someone wants to try this on a live disk - it erases all data
in the process).
-- 
Krzysztof Halasa, B*FH


* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
  2003-10-17 12:53                 ` John Bradford
  2003-10-17 13:03                   ` Russell King
@ 2003-10-19  7:50                   ` Andre Hedrick
  1 sibling, 0 replies; 64+ messages in thread
From: Andre Hedrick @ 2003-10-19  7:50 UTC (permalink / raw)
  To: John Bradford
  Cc: Norman Diamond, Hans Reiser, Wes Janzen, Rogier Wolff, linux-kernel


Sheesh, glad somebody slapped the obvious idiocy on the thread with
solid state media.  Yeah, there are ways to force this, and the kernel
executes all transactions with auto retries on the opcode.

If the drive returns valid data, regardless of the ECC brute force
required, it will not reallocate, period.

Cheers,


Andre Hedrick
LAD Storage Consulting Group

On Fri, 17 Oct 2003, John Bradford wrote:

> Quote from "Norman Diamond" <ndiamond@wta.att.ne.jp>:
> > Now, maybe there is a technique to force it anyway.  When a partition is
> > newly created and is being formatted with the intention of writing data a
> > few minutes later, do writes that "should" have a better chance of being
> > detected.  The way to start this is to simply write every block, but this is
> > obviously insufficient because my block did get written shortly after the
> > partition was formatted and that write didn't cause the block to be
> > reallocated.  So in addition to simply writing every block, also read every
> > block.  For each read that fails, proceed to do another write which "should"
> > force reallocation.
> 
> I am just imagining how many Flash devices will be worn out
> unnecessarily by any filesystem utility that does this transparently
> to the user :-(.
> 
> > Russell King replied to me:
> > 
> > > > When a drive tries to read a block, if it detects errors, it retries up
> > > > to 255 times.  If a retry succeeds then the block gets reallocated.  IF
> > > > 255 RETRIES FAIL THEN THE BLOCK DOES NOT GET REALLOCATED.
> > >
> > > This is perfectly reasonable.  If the drive can't recover your old data
> > > to reallocate it to a new block, then leaving the error present until you
> > > write new data to that bad block is the correct thing to do.
> 
> I 99% agree with that.  The 1% where I don't is that there may be
> situations where there is no interest in doing any data recovery from
> the drive, (you have backups, it is part of a RAID array, or storing
> temporary data that can be re-generated whenever necessary), and also,
> any read errors that occur during a S.M.A.R.T. read test should result
> in a re-mapping of the block.
> 
> > > Think about what would happen if it did get reallocated.  What data would
> > > the drive return when requested to read the bad block?
> > 
> > Why does it matter?  The drive already reported a read failure.  Maybe Linux
> > programs aren't all smart enough to inform the user when a read operation
> > results in an I/O error, but drivers could be smarter.  I think there's
> > probably a bit of room in an inode to add a flag saying that the file has
> > been detected to be partially unreadable.  Sorry for the digression.
> > Anyway, it is 100% true that the data in that block are gone.  The block
> > should be reallocated and the new physical block can either be zeroed or
> > randomized or anything, and that's what subsequent reads will get until the
> > block gets written again.
> 
> 100% agreed.
> 
> > > If the error persists during a write to the bad block, then yes, I'd
> > > expect it to be reallocated at that point - but only because the drive has
> > > the correct data for that block available.
> > 
> > We agree in our moral expectations and our technical analysis that correct
> > data will be available at that time.  But if your word "expect" means you
> > have confidence that the drive will perform correctly, I do not share your
> > confidence (I think it is possible but highly unlikely that the drive did
> > its job correctly during the previous write).
> 
> If the drive is not doing its job properly DRIVE -> BIN.
> 
> > > Your description of the way Toshiba's drives work seems perfectly sane.
> 
> I disagree - we haven't confirmed what happens in the error-on-write
> situation.  If it does indeed always remap the block, then I'd agree
> that that aspect was perfectly sane.
> 
> John.



* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
  2003-10-17  9:40           ` Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?) Norman Diamond
                               ` (4 preceding siblings ...)
  2003-10-17 12:08             ` Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?) Justin Cormack
@ 2003-10-21 20:12             ` bill davidsen
  5 siblings, 0 replies; 64+ messages in thread
From: bill davidsen @ 2003-10-21 20:12 UTC (permalink / raw)
  To: linux-kernel

In article <11bf01c39492$bc5307c0$3eee4ca5@DIAMONDLX60>,
Norman Diamond <ndiamond@wta.att.ne.jp> wrote:
| Friends in the disk drive section at Toshiba said this:
| 
| When a drive tries to read a block, if it detects errors, it retries up to
| 255 times.  If a retry succeeds then the block gets reallocated.  IF 255
| RETRIES FAIL THEN THE BLOCK DOES NOT GET REALLOCATED.
| 
| This was so unbelievable to me that I had to confirm this with them in
| different words.  In case of a temporary error, the drive provides the
| recovered data as the result of the read operation and the drive writes the
| data to a reallocated sector.  In case of a permanent error, the block is
| assumed bad, and of course the data are lost.  Since the data are assumed
| lost, the drive keeps the defective LBA sector number associated with the
| same defective physical block and it does not reallocate the defective
| block.

Sounds right to me. If you relocate the LBA sector then on retry I will
(a) read {something} without error, and (b) it will NOT be my data, and
(c) I will not get back an error to tell me I am reading crap. In other
words, to do anything else would result in my silently getting back bad
data!

What should be done is to relocate after a successful retry or after
an unsuccessful write, because in both cases the drive has valid data
to relocate.

Blockbusting news: I think they're doing it just right. The object
is not to do a read and get no error; the object is to read and get
correct data, and if that doesn't happen, to let the controller, o/s,
or application know about it and decide what to do then.
-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
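
The policy in Toshiba's description, together with the rule Bill proposes (relocate only when the drive actually holds valid data, i.e. after a successful retry or when a write cannot be stored), can be modeled as a toy. This is a hypothetical sketch: `ToyDrive`, its fields, and the idea of counting failed read attempts per physical block are illustrative assumptions, not how any real firmware is organized.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDrive:
    """Hypothetical model of a drive's read/write reallocation policy.

    fail_attempts maps a physical block to the number of failed reads before
    one succeeds; None means the block never reads back (a permanent error).
    """
    data: dict            # physical block -> stored bytes
    fail_attempts: dict   # physical block -> failed reads before success, or None
    spares: list          # unused spare physical blocks
    max_retries: int = 255
    lba_map: dict = field(default_factory=dict)  # remapped LBA -> spare block

    def _phys(self, lba):
        return self.lba_map.get(lba, lba)

    def _unreadable(self, phys):
        failures = self.fail_attempts.get(phys, 0)
        return failures is None or failures > self.max_retries

    def _reallocate(self, lba, payload):
        spare = self.spares.pop(0)
        self.data[spare] = payload
        self.lba_map[lba] = spare

    def read(self, lba):
        phys = self._phys(lba)
        if self._unreadable(phys):
            # Permanent error: report it and leave the LBA on the bad block,
            # so later reads keep failing instead of returning wrong data.
            raise OSError(f"uncorrectable error at LBA {lba}")
        payload = self.data[phys]
        if self.fail_attempts.get(phys, 0) > 0:
            # Recovered by a retry: the drive now holds valid data, so move it.
            self._reallocate(lba, payload)
        return payload

    def write(self, lba, payload):
        phys = self._phys(lba)
        if self._unreadable(phys):
            # The host just supplied the data, so remapping loses nothing.
            self._reallocate(lba, payload)
        else:
            self.data[phys] = payload
```

In this model an unreadable LBA keeps raising on every read until something writes to it, which is the behavior Bill defends above.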

* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
  2003-10-18  8:30                         ` John Bradford
@ 2003-10-21 20:26                           ` bill davidsen
  0 siblings, 0 replies; 64+ messages in thread
From: bill davidsen @ 2003-10-21 20:26 UTC (permalink / raw)
  To: linux-kernel

In article <200310180830.h9I8ULuc000419@81-2-122-30.bradfords.org.uk>,
John Bradford  <john@grabjohn.com> wrote:
| > > BTW: Hard drives apparently use more sophisticated algorithms,
| > > involving measuring head signal level even when there is no problem
| > > reading the data, and eventually remapping a sector on read before the
| > > information is lost.
| > > 
| > 
| > Which means cat /dev/hda > /dev/null makes sense in
| > cron.weekly...
| 
| Indeed.  Some drives can also do a timed defect scan using S.M.A.R.T.

You raise the point I was going to question: is the cat (dd?) better than
a S.M.A.R.T. scan? I would think that the scan would be more likely to
be doing some special error checking, like turning off one level of ECC
or similar, and might see things a normal read would not. In other
words, it is the difference between no uncorrectable errors and no errors.

I am thinking of something like a C2 scan on a CD, to get error
detection without error correction.
-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
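
The weekly read scan suggested in the quoted text only needs to read every sector and note the ones that fail. A minimal sketch, assuming Linux and Python 3; `scan_for_read_errors` is a hypothetical helper, not an existing tool. It reads through the page cache and sees only what the drive returns after its own ECC and retries, so, as Bill says, it cannot report errors the drive managed to correct.

```python
import os


def scan_for_read_errors(path, chunk_size=64 * 1024, sector_size=512):
    """Read path from start to end, like `cat /dev/hda > /dev/null`,
    and return the numbers of the sectors (counted from the start of
    path) that could not be read."""
    bad = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # also gives the size of a block device
        for offset in range(0, size, chunk_size):
            length = min(chunk_size, size - offset)
            try:
                os.pread(fd, length, offset)
            except OSError:
                # The chunk failed; retry it sector by sector to find the culprits.
                for sec_off in range(offset, offset + length, sector_size):
                    try:
                        os.pread(fd, min(sector_size, offset + length - sec_off), sec_off)
                    except OSError:
                        bad.append(sec_off // sector_size)
    finally:
        os.close(fd)
    return bad
```

Run as root on the whole disk, e.g. `scan_for_read_errors("/dev/hda")`, the returned numbers are absolute LBAs comparable to `LBAsect`; on a partition such as `/dev/hda8` they count from the start of that partition.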

* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
       [not found]                     ` <m37k33igui.fsf@defiant. <m3u166vjn0.fsf@defiant.pm.waw.pl>
@ 2003-10-21 20:39                       ` bill davidsen
  0 siblings, 0 replies; 64+ messages in thread
From: bill davidsen @ 2003-10-21 20:39 UTC (permalink / raw)
  To: linux-kernel

In article <m3u166vjn0.fsf@defiant.pm.waw.pl>,
Krzysztof Halasa  <khc@pm.waw.pl> wrote:
| John Bradford <john@grabjohn.com> writes:
|
| > My most important point is that writes should never fail on a good
| > drive.
| 
| That's certainly what the drives do. Unless they are out of spare
| sectors, of course.
| 
| Doing cat /dev/zero > /dev/hd* fixes all bad sectors on modern drive.

Flash from the past: back in the days of MFM drives and "new" RLL
controllers, we wrote software which regularly read all the data off a
track with appropriate retries, reformatted the track, wrote the data,
and read it back to verify. This was because of "sector walk", which
made the sectors move relative to the IRG. We also wrote our own device
drivers to use large sectors to get more capacity. Those were the days.

However, that's the kind of thing I would hope S.M.A.R.T. could do, with
relocation of course.
-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
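
A targeted version of the overwrite Krzysztof describes, closer in spirit to Bill's old track-refresh software, rewrites just one sector: read it with a few retries, write the same data back, then read it again to verify. If the sector cannot be read at all, the only way to give the drive valid data is to overwrite it (with zeros here), which according to Bill and Krzysztof is when the drive reallocates. A sketch under stated assumptions: `refresh_sector` is a hypothetical helper, 512-byte sectors are assumed, `lba` counts from the start of `path` (so for `/dev/hda` it is the absolute LBA, e.g. `LBAsect`, while for a partition the partition's start sector must be subtracted first), and zeroing a sector destroys whatever file data lived there.

```python
import os

SECTOR = 512  # assumed sector size


def refresh_sector(path, lba, retries=5, zero_if_unreadable=False):
    """Read one sector (with retries), write it back in place, and verify it.

    Returns "rewritten" if the old data was read and written back, "zeroed"
    if it could not be read and was overwritten with zeros (the old contents
    are lost), or "unreadable" if it could not be read and was left alone.
    """
    offset = lba * SECTOR
    fd = os.open(path, os.O_RDWR)
    try:
        data = None
        for _ in range(retries):
            try:
                data = os.pread(fd, SECTOR, offset)
                break
            except OSError:
                pass  # try the read again
        if data is None:
            if not zero_if_unreadable:
                return "unreadable"
            data, status = bytes(SECTOR), "zeroed"
        else:
            status = "rewritten"
        os.pwrite(fd, data, offset)
        os.fsync(fd)
        if hasattr(os, "posix_fadvise"):
            # Ask the kernel to drop its cached copy so the verify read
            # is more likely to come from the disk itself.
            os.posix_fadvise(fd, offset, SECTOR, os.POSIX_FADV_DONTNEED)
        if os.pread(fd, len(data), offset) != data:
            raise OSError(f"verify failed at LBA {lba}")
        return status
    finally:
        os.close(fd)
```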

* Re: ATA Defect management
  2003-10-17 10:37             ` ATA Defect management John Bradford
@ 2003-10-21 20:44               ` bill davidsen
  0 siblings, 0 replies; 64+ messages in thread
From: bill davidsen @ 2003-10-21 20:44 UTC (permalink / raw)
  To: linux-kernel

In article <200310171037.h9HAbOrv000559@81-2-122-30.bradfords.org.uk>,
John Bradford  <john@grabjohn.com> wrote:
| [Note to Eric, who is CC'ed, can you comment on how Maxtor drives
| handle these issues?]
| 
| Quote from "Norman Diamond" <ndiamond@wta.att.ne.jp>:
| > Friends in the disk drive section at Toshiba said this:
| > 
| > When a drive tries to read a block, if it detects errors, it retries up to
| > 255 times.  If a retry succeeds then the block gets reallocated.  IF 255
| > RETRIES FAIL THEN THE BLOCK DOES NOT GET REALLOCATED.
| 
| OK, this is interesting, at least we have some specific information.
| 
| > This was so unbelievable to me that I had to confirm this with them in
| > different words.  In case of a temporary error, the drive provides the
| > recovered data as the result of the read operation and the drive writes the
| > data to a reallocated sector.  In case of a permanent error, the block is
| > assumed bad, and of course the data are lost.  Since the data are assumed
| > lost, the drive keeps the defective LBA sector number associated with the
| > same defective physical block and it does not reallocate the defective
| > block.

Not so. Assuming the admin is restoring to the same bad drive (the
twit!), since the drive does relocate on write, the recovery will
work, the data will be whole, and life will be good.

I'm not sure why one would do a by-sector backup, but I guess for some
filesystems or raw database info it might be useful.
-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
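
For the by-sector backup Bill mentions, the copy has to carry on past unreadable sectors instead of stopping at the first error, roughly as `dd conv=noerror,sync` does. A minimal sketch; `copy_by_sector` is a hypothetical helper, and reading one 512-byte sector per call is slow but confines each read error to a single sector.

```python
import os


def copy_by_sector(src, dst, sector_size=512):
    """Copy src to dst one sector at a time, like `dd conv=noerror,sync`.

    Sectors that cannot be read are written to dst as zeros, and their numbers
    (counted from the start of src) are returned so the damage can be located.
    """
    bad = []
    sfd = os.open(src, os.O_RDONLY)
    try:
        dfd = os.open(dst, os.O_WRONLY | os.O_CREAT, 0o600)
        try:
            size = os.lseek(sfd, 0, os.SEEK_END)
            for offset in range(0, size, sector_size):
                length = min(sector_size, size - offset)
                try:
                    chunk = os.pread(sfd, length, offset)
                except OSError:
                    chunk = bytes(length)  # pad the unreadable sector with zeros
                    bad.append(offset // sector_size)
                os.pwrite(dfd, chunk, offset)
            os.fsync(dfd)
        finally:
            os.close(dfd)
    finally:
        os.close(sfd)
    return bad
```

Writing such an image back onto a drive rewrites every sector, including the ones that failed, which is the restore case Bill describes above.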

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
@ 2003-10-12  8:25 Norman Diamond
  0 siblings, 0 replies; 64+ messages in thread
From: Norman Diamond @ 2003-10-12  8:25 UTC (permalink / raw)
  To: aj, linux-kernel

Andreas Jellinghaus replied to me with useful advice.  But he didn't really
answer my questions.  If anyone knows the answers to my questions,
please kindly say.
(Why the sectors were numbered so strangely,
what Linux does with them after detecting them,
and how to know whether the errors occurred during writes or during reads.)

Anyway,

> try the smartmontools package, it has "smartctl" that will
> show you the discs S.M.A.R.T. details

Good idea, thank you.

> doing a backup couldn't hurt.

It's essentially my crash box at the moment.  But I didn't expect visible
errors on a 2-year-old disk.  (Of course the magnetic layer always has
errors but I didn't expect things to get beyond the firmware's automatic
assignment and writing of replacement sectors.)

And my reason for posting is that the error logs didn't look the way I would
have expected, regarding the sector numbers and the repetitions.

> btw: are you sure cables are ok?

1.  There are none.
2.  If the connector on the motherboard were coming loose from the
motherboard, or if the motherboard had a crack causing intermittent failures
in some of its connections, surely the I/O errors would be far more numerous
and far more random than the strange occurrences I observed.

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-11  9:00 Norman Diamond
@ 2003-10-11  9:39 ` Andreas Jellinghaus
  0 siblings, 0 replies; 64+ messages in thread
From: Andreas Jellinghaus @ 2003-10-11  9:39 UTC (permalink / raw)
  To: linux-kernel

try the smartmontools package, it has "smartctl" that will
show you the disc's S.M.A.R.T. details (i.e. how many bad
blocks the firmware knows, the errors the firmware knows
about, etc.). It can also run a self test etc.

doing a backup couldn't hurt.
btw: are you sure cables are ok?

Good Luck!

Andreas
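
The attributes Andreas mentions are printed by `smartctl -A`; a long self-test is started with `smartctl -t long` and its result shown with `smartctl -l selftest`. A sketch for checking the reallocation counters from a script, assuming the usual attribute table layout of current smartmontools (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE). The helper names are hypothetical, and which attributes appear varies by drive.

```python
import subprocess


def parse_smart_attributes(text):
    """Return {attribute name: raw value} from `smartctl -A` output."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least ten columns;
        # the raw value is everything from the tenth column on.
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[fields[1]] = " ".join(fields[9:])
    return attrs


def read_smart_attributes(device):
    """Run `smartctl -A device` (usually needs root) and parse the table."""
    # smartctl's exit status is a bit mask that can be nonzero even when the
    # table was printed, so it is deliberately not treated as failure here.
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True)
    return parse_smart_attributes(result.stdout)
```

For example, `read_smart_attributes("/dev/hda").get("Reallocated_Sector_Ct")` returns the raw reallocation count as a string, or `None` if the drive does not report that attribute.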

* Why are bad disk sectors numbered strangely, and what happens to them?
@ 2003-10-11  9:00 Norman Diamond
  2003-10-11  9:39 ` Andreas Jellinghaus
  0 siblings, 1 reply; 64+ messages in thread
From: Norman Diamond @ 2003-10-11  9:00 UTC (permalink / raw)
  To: linux-kernel

My first question is why the bad disk sectors are numbered strangely, and
second is what does Linux do with them after detecting them?

I repartitioned and reformatted two Reiser partitions before installing SuSE
8.2 and then compiling kernels 2.6.0-test5, test6, and test7.  My feeling is
that the following errors "should" have been detected during writes, so the
damage "should" not be too bad.  The correct data "should" get written to
replacement sectors.  But my understanding of modern ATA drives is that the
firmware "should" have detected the errors during writes and "should" have
finished the work without the OS knowing about it.

If the following errors occurred during reads then I have some pretty angry
questions about why they didn't get detected during writes, especially when
the writes occured minutes or milliseconds prior to the reads.  (I'll copy
this message to some Toshiba employees.  Maybe the next time they visit,
certain persons should get cat food instead of my wife's cooking  _^o^_
MK4018GAP, about 2 years old.)

Hmm, I guess I also need to ask how to figure out whether these occurred during
writes or reads.

Meanwhile, it seems really strange to see separate numbers for LBAsect and
sector, and to see that the two numbers are sometimes related but sometimes
apparently unrelated, and to see LBAsect remain constant while sector
changes with each error.  What is really going on here?

Also kernel 2.6.0-test7 no longer says whether hda was on 03:08 or 03:00
when the errors were detected.
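
The `status=0x51` and `error=0x40` fields in the log lines below are raw ATA register values, and the names in braces are just their set bits. A small decoder makes that explicit. The bit names are modeled on the Linux IDE driver's messages; only DriveReady, SeekComplete, Error and UncorrectableError are confirmed by this log, and the rest are an assumption. Decoding alone does not say whether the failing command was a read or a write, although, to my understanding, ATA reports the UNC bit (0x40) for read-type commands.

```python
# Bit masks of the ATA status and error registers, with names modeled on
# the Linux IDE driver's messages (only some are confirmed by the log).
STATUS_BITS = [
    (0x80, "Busy"), (0x40, "DriveReady"), (0x20, "DeviceFault"),
    (0x10, "SeekComplete"), (0x08, "DataRequest"), (0x04, "CorrectedError"),
    (0x02, "Index"), (0x01, "Error"),
]
ERROR_BITS = [
    (0x80, "BadSector/BadCRC"), (0x40, "UncorrectableError"),
    (0x20, "MediaChanged"), (0x10, "SectorIdNotFound"),
    (0x08, "MediaChangeRequested"), (0x04, "DriveStatusError"),
    (0x02, "TrackZeroNotFound"), (0x01, "AddrMarkNotFound"),
]


def decode(value, table):
    """Return the names of the bits that are set in a register value."""
    return [name for mask, name in table if value & mask]
```

`decode(0x51, STATUS_BITS)` gives `['DriveReady', 'SeekComplete', 'Error']` and `decode(0x40, ERROR_BITS)` gives `['UncorrectableError']`, matching the braces in the log.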

Sep 27 16:49:41 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Sep 27 16:49:41 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=852296
Sep 27 16:49:41 diamondpana kernel: end_request: I/O error,
   dev 03:08 (hda), sector 852296
Sep 27 16:49:41 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Sep 27 16:49:41 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=852304
Sep 27 16:49:41 diamondpana kernel: end_request: I/O error,
   dev 03:08 (hda), sector 852304
[comment: no more that day]

Sep 28 15:20:20 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Sep 28 15:20:20 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=19021784
Sep 28 15:20:20 diamondpana kernel: end_request: I/O error,
   dev 03:00 (hda), sector 19021784
Sep 28 15:20:20 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Sep 28 15:20:20 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=19021786
Sep 28 15:20:20 diamondpana kernel: end_request: I/O error,
   dev 03:00 (hda), sector 19021786
[... every even-numbered sector in this range ...]
Sep 28 15:20:21 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Sep 28 15:20:21 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=19021880
Sep 28 15:20:21 diamondpana kernel: end_request: I/O error,
   dev 03:00 (hda), sector 19021880
Sep 28 15:20:21 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Sep 28 15:20:21 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=19021882
Sep 28 15:20:21 diamondpana kernel: end_request: I/O error,
   dev 03:00 (hda), sector 19021882
[comment: after hitting equality, it soon repeated from the middle]
Sep 28 15:20:26 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Sep 28 15:20:26 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=19021832
Sep 28 15:20:26 diamondpana kernel: end_request: I/O error,
   dev 03:00 (hda), sector 19021832
Sep 28 15:20:26 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Sep 28 15:20:26 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=19021834
Sep 28 15:20:26 diamondpana kernel: end_request: I/O error,
   dev 03:00 (hda), sector 19021834
[... every even-numbered sector in this range ...]
Sep 28 15:20:26 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Sep 28 15:20:26 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=19021880
Sep 28 15:20:26 diamondpana kernel: end_request: I/O error,
   dev 03:00 (hda), sector 19021880
Sep 28 15:20:26 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Sep 28 15:20:26 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=19021882
Sep 28 15:20:26 diamondpana kernel: end_request: I/O error,
   dev 03:00 (hda), sector 19021882
[comment:  after hitting equality again, no more that day]

Sep 29 01:24:09 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Sep 29 01:24:09 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=852296
Sep 29 01:24:09 diamondpana kernel: end_request: I/O error,
   dev 03:08 (hda), sector 852296
Sep 29 01:24:09 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Sep 29 01:24:09 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=852304
Sep 29 01:24:09 diamondpana kernel: end_request: I/O error,
   dev 03:08 (hda), sector 852304
[comment:  same sectors as on Sep 27]

Oct 10 18:29:29 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Oct 10 18:29:29 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=19021842
Oct 10 18:29:29 diamondpana kernel: end_request: I/O error,
   dev hda, sector 19021842
Oct 10 18:29:29 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Oct 10 18:29:29 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=19021850
Oct 10 18:29:29 diamondpana kernel: end_request: I/O error,
   dev hda, sector 19021850
[... every 8th sector in this range, congruent to 2 modulo 8 ...]
Oct 10 18:29:30 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Oct 10 18:29:30 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=19021874
Oct 10 18:29:30 diamondpana kernel: end_request: I/O error,
   dev hda, sector 19021874
Oct 10 18:29:30 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Oct 10 18:29:30 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=19021882
Oct 10 18:29:30 diamondpana kernel: end_request: I/O error,
   dev hda, sector 19021882
[comment: some of the same sectors as on Sep 28]
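
The pattern in these excerpts can be checked mechanically. The sketch below pulls `LBAsect` and `sector` out of each error line and reports how far each `sector` lies below `LBAsect` and its remainder modulo 8. On Oct 10 the distance shrinks by 8 each time (40, 32, ... 8, 0) and every value is 2 modulo 8, like 19021882 itself; on Sep 28 it shrinks by 2. One reading, which is an assumption and not something the log states, is that `sector` is the first sector of the failed request, that the rest of the request is retried one filesystem block later (2 sectors for a 1 KiB block, 8 for 4 KiB), and that every request still covering LBA 19021882 fails until its start reaches that sector. On Sep 27 and 29 the device is 03:08 rather than 03:00, and 19021882 - 852296 = 18169586 is far too large to be a request length, which is consistent with those `sector` values counting from the start of that partition. `summarize` is a hypothetical helper.

```python
import re

# Picks the two numbers out of lines such as
# "... { UncorrectableError }, LBAsect=19021882, sector=19021842"
ERROR_RE = re.compile(r"LBAsect=(\d+), sector=(\d+)")


def summarize(log_text, block_sectors=8):
    """Return (sector, LBAsect - sector, sector % block_sectors) per error line."""
    rows = []
    for lba, sector in ERROR_RE.findall(log_text):
        lba, sector = int(lba), int(sector)
        rows.append((sector, lba - sector, sector % block_sectors))
    return rows
```

Feeding it the contents of the syslog file (its path varies by distribution) lists the rows in log order, so the restarts seen on Sep 28 show up as jumps back to a larger distance.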


end of thread, other threads:[~2003-10-21 20:54 UTC | newest]

Thread overview: 64+ messages
2003-10-13  9:31 Why are bad disk sectors numbered strangely, and what happens to them? Norman Diamond
     [not found] ` <200310131014.h9DAEwY3000241@81-2-122-30.bradfords.org.uk>
2003-10-13 10:24   ` Norman Diamond
2003-10-13 10:33     ` John Bradford
2003-10-13 11:30       ` Norman Diamond
2003-10-13 11:58         ` Maciej Zenczykowski
2003-10-15 10:22           ` Norman Diamond
2003-10-13 12:02         ` John Bradford
2003-10-15 10:23           ` Norman Diamond
2003-10-15 18:56             ` Pavel Machek
2003-10-14  6:54         ` Rogier Wolff
2003-10-13 14:24     ` Chuck Campbell
2003-10-13 14:54       ` Maciej Zenczykowski
2003-10-13 16:29         ` Roger Larsson
2003-10-14  6:49     ` Rogier Wolff
2003-10-14  7:05       ` Wes Janzen
2003-10-14  7:21         ` John Bradford
2003-10-14  7:40           ` Rogier Wolff
2003-10-14  8:11             ` John Bradford
2003-10-14  8:45               ` Hans Reiser
2003-10-14  9:46                 ` Rogier Wolff
2003-10-14  9:57                   ` Hans Reiser
2003-10-14 10:10                     ` Rogier Wolff
2003-10-14 10:31                       ` Hans Reiser
2003-10-14 10:19                 ` John Bradford
     [not found]             ` <200310140800.h9E80BT9000815@81-2-122-30.bradfords.org.uk>
     [not found]               ` <20031014081110.GA14418@bitwizard.nl>
2003-10-14  8:55                 ` Wes Janzen
2003-10-14 10:05                   ` Rogier Wolff
2003-10-14  7:24         ` Rogier Wolff
2003-10-14  9:04         ` Hans Reiser
2003-10-15 10:23           ` Norman Diamond
2003-10-15 10:39             ` Hans Reiser
2003-10-17  9:40           ` Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?) Norman Diamond
2003-10-17  9:48             ` Hans Reiser
2003-10-17 11:11               ` Norman Diamond
2003-10-17 11:45                 ` Hans Reiser
2003-10-17 11:51                 ` John Bradford
2003-10-17 12:53                 ` John Bradford
2003-10-17 13:03                   ` Russell King
2003-10-17 13:26                     ` John Bradford
2003-10-19  7:50                   ` Andre Hedrick
2003-10-17 13:04                 ` Russell King
2003-10-17 14:09                   ` Norman Diamond
2003-10-17  9:58             ` Pavel Machek
2003-10-17 10:15               ` Hans Reiser
2003-10-17 10:24             ` Rogier Wolff
2003-10-17 10:49               ` John Bradford
2003-10-17 11:09                 ` Rogier Wolff
2003-10-17 11:24                 ` Krzysztof Halasa
2003-10-17 19:35                   ` John Bradford
2003-10-17 23:28                     ` Krzysztof Halasa
2003-10-18  7:42                       ` Pavel Machek
2003-10-18  8:30                         ` John Bradford
2003-10-21 20:26                           ` bill davidsen
2003-10-18  8:27                       ` John Bradford
2003-10-18 12:02                         ` Krzysztof Halasa
2003-10-18 16:26                           ` Nuno Silva
2003-10-18 20:16                             ` Krzysztof Halasa
     [not found]                     ` <m37k33igui.fsf@defiant. <m3u166vjn0.fsf@defiant.pm.waw.pl>
2003-10-21 20:39                       ` bill davidsen
2003-10-17 10:37             ` ATA Defect management John Bradford
2003-10-21 20:44               ` bill davidsen
2003-10-17 12:08             ` Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?) Justin Cormack
2003-10-21 20:12             ` bill davidsen
  -- strict thread matches above, loose matches on Subject: below --
2003-10-12  8:25 Why are bad disk sectors numbered strangely, and what happens to them? Norman Diamond
2003-10-11  9:00 Norman Diamond
2003-10-11  9:39 ` Andreas Jellinghaus
