Re: Why are bad disk sectors numbered strangely, and what happens to them?

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
@ 2003-10-12  8:25 Norman Diamond
  0 siblings, 0 replies; 33+ messages in thread
From: Norman Diamond @ 2003-10-12  8:25 UTC (permalink / raw)
  To: aj, linux-kernel

Andreas Jellinghaus replied to me with useful advice.  But he didn't really
answer my questions.  If anyone knows the answers to my questions,
please kindly say.
(Why the sectors were numbered so strangely,
what does Linux do with them after detecting them,
and how to know if the errors occurred during writes or during reads.)

Anyway,

> try the smartmontools package, it has "smartctl" that will
> show you the discs S.M.A.R.T. details

Good idea, thank you.

> doing a backup couldn't hurt.

It's essentially my crash box at the moment.  But I didn't expect visible
errors on a 2-year-old disk.  (Of course the magnetic layer always has
errors but I didn't expect things to get beyond the firmware's automatic
assignment and writing of replacement sectors.)

And my reason for posting is that the error logs didn't look the way I would
have expected, regarding the sector numbers and the repetitions.

> btw: are you sure cables are ok?

1.  There are none.
2.  If the connector on the motherboard were coming loose from the
motherboard, or if the motherboard had a crack causing intermittent failures
in some of its connections, surely the I/O errors would be far more numerous
and far more random than the strange occurrences I observed.

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-15 10:23           ` Norman Diamond
@ 2003-10-15 18:56             ` Pavel Machek
  0 siblings, 0 replies; 33+ messages in thread
From: Pavel Machek @ 2003-10-15 18:56 UTC (permalink / raw)
  To: Norman Diamond; +Cc: John Bradford, linux-kernel

Hi!

> > That sector may have gone bad in the next few minutes.  Unlikely, but possible.
> 
> I think you mean that the replacement sector might have gone bad in the
> minutes after the reallocation.  Unlikely but possible, yes.  I guess I will

Well, if your drive is overheated (for example), it is likely to kill
the spare sector, too. [I've seen something like that here.]

									Pavel
-- 
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-15 10:23           ` Norman Diamond
@ 2003-10-15 10:39             ` Hans Reiser
  0 siblings, 0 replies; 33+ messages in thread
From: Hans Reiser @ 2003-10-15 10:39 UTC (permalink / raw)
  To: Norman Diamond
  Cc: Wes Janzen, Rogier Wolff, John Bradford, linux-kernel, nikita

Norman Diamond wrote:

>Hans Reiser wrote:
>
>>I think the problem is that many users don't know how to trigger the bad
>>sector remapping for the case where the drive can still remap, using
>>writes to the bad blocks, and probably our faq needs updating.
>
>This is indeed one of the problems[*].  The other problem is that it seems
>to be absurdly difficult to find which file contains the bad sector.  Even
>though a file could have multiple hard links, it would be enough to get one
>pathname for the file, in order to know which file needs to be reconstructed
>from a source of good data.
>
>[* Of course I also wish that the original failing write had been detected
>by the drive, but this failure isn't software's fault.  I hope.]
>
badblocks program fixes that

-- 
Hans

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-14  9:04         ` Hans Reiser
@ 2003-10-15 10:23           ` Norman Diamond
  2003-10-15 10:39             ` Hans Reiser
  0 siblings, 1 reply; 33+ messages in thread
From: Norman Diamond @ 2003-10-15 10:23 UTC (permalink / raw)
  To: Hans Reiser, Wes Janzen; +Cc: Rogier Wolff, John Bradford, linux-kernel, nikita

Hans Reiser wrote:

> I think the problem is that many users don't know how to trigger the bad
> sector remapping for the case where the drive can still remap, using
> writes to the bad blocks, and probably our faq needs updating.

This is indeed one of the problems[*].  The other problem is that it seems
to be absurdly difficult to find which file contains the bad sector.  Even
though a file could have multiple hard links, it would be enough to get one
pathname for the file, in order to know which file needs to be reconstructed
from a source of good data.

[* Of course I also wish that the original failing write had been detected
by the drive, but this failure isn't software's fault.  I hope.]

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-13 12:02         ` John Bradford
@ 2003-10-15 10:23           ` Norman Diamond
  2003-10-15 18:56             ` Pavel Machek
  0 siblings, 1 reply; 33+ messages in thread
From: Norman Diamond @ 2003-10-15 10:23 UTC (permalink / raw)
  To: John Bradford, linux-kernel

John Bradford replied to me:

> >   IF the bad sector doesn't get reused then great, then the next bit of
> > effort will be to try to get the sector marked as bad, if there is any way
> > to do that under Linux.  See the next question, which is now being reposted
> > for at least the fourth time.
> >   BUT IF the same sector number gets rewritten then hopefully the same
> > sector number will be associated with a reallocated non-defective sector and
> > the data will get written properly.
>
> Yes, that's what I'd hope, unless the disk ran out of spare space to
> allocate.

Surely two reallocations wouldn't have made it run out of spare space?

Besides, the S.M.A.R.T. log didn't have any statistics anywhere near
failure, and if the drive had run out of spare space then surely one or two
of the statistics should have gone down to zero.
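
A minimal sketch of how the S.M.A.R.T. attribute table could be checked
mechanically.  It assumes the text comes from `smartctl -A <device>` and that
attribute rows have the usual ten columns (ID#, name, flag, value, worst,
threshold, type, updated, when-failed, raw value).  The function names are
made up for illustration.  An attribute counts as failing when its normalized
value is at or below its threshold.

```python
def parse_smart_attributes(text):
    """Parse the attribute table printed by `smartctl -A` into a dict.

    Returns {name: {"id", "value", "worst", "thresh", "raw"}}.  Lines that
    do not look like attribute rows (headers, blank lines) are ignored.
    """
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        try:
            value, worst, thresh = (int(f) for f in fields[3:6])
        except ValueError:
            continue
        attrs[fields[1]] = {
            "id": int(fields[0]),
            "value": value,
            "worst": worst,
            "thresh": thresh,
            "raw": " ".join(fields[9:]),
        }
    return attrs


def failing_attributes(attrs):
    """Names of attributes whose normalized value is at or below threshold.

    Attributes with a threshold of 0 can never fail and are skipped.
    """
    return sorted(name for name, a in attrs.items()
                  if a["thresh"] > 0 and a["value"] <= a["thresh"])
```

The raw value of Reallocated_Sector_Ct is the count that was reported as
going from 1 to 2; the normalized value is what gets compared against the
threshold.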

> > > >    How can I tell Linux to mark the sector as bad, knowing the LBA
> > > >    sector number?
> > >
> > > Don't.  If the drive can't fix this problem itself, throw it in the bin.
> >
> > THE DRIVE HAS 1, ONE, HITOTSU, UNO, UN, BAD SECTOR.
>
> No, the last SMART test re-allocated one sector.

Yeah, but it's not even quite clear if the reallocated sector is the same as
the defective sector.  Something is pretty screwy, and I've asked some
friends at Toshiba to discuss it during their next visit (and they know
they're getting cat food instead of my wife's cooking  _^o^_)  Nonetheless,
it is customary to dump drives when they have increasingly numerous defects,
not when they have one.

> That sector may have gone bad in the next few minutes.  Unlikely, but possible.

I think you mean that the replacement sector might have gone bad in the
minutes after the reallocation.  Unlikely but possible, yes.  I guess I will
probably try to write zeroes to the sector using the suggestion by Maciej
Zenczykowski, but first I'll ask the Toshiba people if they have different
preferences.

> > The drive is capable of
> > doing reallocations.  What kind of operation can be done that will persuade
> > the drive to do the reallocation?
>
> The drive has _done_ a reallocation.  You posted that the reallocated
> sector count had gone from 1 to 2.  This is why I said if it can't fix
> the problem, bin it.  It doesn't seem to have fixed the problem yet.

It's not obvious if the reallocated sector was the same one as the detected
defective sector.  I thought it seemed not to be.  You pointed out that it
is unlikely but possible.

> Somebody else might read this thread, and want full instructions.

OK, sorry I thought you just hadn't read what you were replying to.

> 3. Run the tests again.  Your drive fixed one bad sector, let's see if
>    it completes the test again without finding more.

Yeah, but I was already upset by finding that the same sector number
remained bad even after it "should" have been the one that was reallocated.

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-13 11:58         ` Maciej Zenczykowski
@ 2003-10-15 10:22           ` Norman Diamond
  0 siblings, 0 replies; 33+ messages in thread
From: Norman Diamond @ 2003-10-15 10:22 UTC (permalink / raw)
  To: Maciej Zenczykowski; +Cc: John Bradford, linux-kernel

Maciej Zenczykowski replied to me:

> > When the drive's self-test detected that one bad sector, I could figure out
> > which partition it was in (though not which file, which is why I asked one
> > of those questions several times already).  The drive's self-test read the
> > entire drive and the other partitions had no detectable errors.
>
> Instead of zeroing the entire partition just zero that single sector.
> something like:
> dd if=/dev/zero of=/dev/hda bs=512 seek=$lbasector conv=notrunc count=1
>
> possibly first check (by reading in the opposite direction:
> dd if=/dev/hda of=/dev/null bs=512 skip=$lbasector count=1)
> if this is indeed the place where you get the read error (in syslog)...

Thank you.

> if you can read anything from it then read it to a file and write it back
> from the file...

dd if=/dev/hda8 of=/dev/null already quit at the bad sector.  It's really
certain that that one sector is it, and I won't be able to read anything
from it.  The read check should just be a redundant check that the correct
sector is being addressed there, and it is a good idea to do that.
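
The read-check-then-zero sequence can be sketched as a small helper that only
builds and prints the two dd command lines, so they can be inspected before
anything is run by hand.  The function names are hypothetical; the dd
arguments are the ones quoted above.  The sector number must be relative to
the device named (a whole-disk LBA for /dev/hda), and the second command
destroys whatever is in that sector.

```python
SECTOR_SIZE = 512  # bytes per sector, as in the dd commands quoted above


def read_check_cmd(device, lba):
    """argv that reads exactly one sector at `lba` and discards it."""
    return ["dd", f"if={device}", "of=/dev/null",
            f"bs={SECTOR_SIZE}", f"skip={lba}", "count=1"]


def zero_sector_cmd(device, lba):
    """argv that overwrites exactly one sector at `lba` with zeroes."""
    return ["dd", "if=/dev/zero", f"of={device}",
            f"bs={SECTOR_SIZE}", f"seek={lba}", "conv=notrunc", "count=1"]


def plan(device, lba):
    """Return both commands in the order they should be tried."""
    if lba < 0:
        raise ValueError("sector number must be non-negative")
    return [read_check_cmd(device, lba), zero_sector_cmd(device, lba)]


if __name__ == "__main__":
    # Placeholder sector number; nothing is executed, only printed.
    for argv in plan("/dev/hda", 123456):
        print(" ".join(argv))
```

Printing rather than executing is deliberate: the read check should fail with
an I/O error in syslog before the write is attempted, and a typo in the
sector number would otherwise zero a good sector.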

> as for checking which file contains it... hmm file->sector->lba mapping
> can be performed... I don't know about the other direction.  Worst case
> would require checking the mapping of all files on the partition (and
> assuming it's not in an empty area or non-file system area).

I made a shell script with find commands to copy all files that are in that
partition (all pathnames that aren't in other mounted filesystems) to
/dev/null.  When one aborts, I should know the name.  But this is an
incredibly inefficient way to do it.  Intuitively it seems it should be
straightforward to find at least one of the pathnames that the file has.
Practically it seems it shouldn't take 24 hours to copy all files in a 5GB
partition to /dev/null.  But after several hours it only copied about 20% of
the files to /dev/null, and I'll have to continue it this weekend.  Even the
drive's "long" S.M.A.R.T. self-test only took 47 minutes.
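
A sketch of the same brute-force search in Python, assuming the goal is only
to get one pathname for each file that fails to read.  It stays on one
filesystem (like `find -xdev`), reads every regular file to the end, and
collects the paths that raise an I/O error.  The `opener` parameter is a
hypothetical hook added so the logic can be tested without a failing disk.
It reads just as much data as the shell script, so it is no faster.

```python
import os


def unreadable_files(root, chunk_size=1 << 20, opener=open):
    """Return paths under `root` (same filesystem only) that fail to read."""
    root_dev = os.lstat(root).st_dev
    bad = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune directories that live on other mounted filesystems.
        dirnames[:] = [d for d in dirnames
                       if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev]
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, device nodes, FIFOs and sockets.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with opener(path, "rb") as f:
                    while f.read(chunk_size):
                        pass
            except OSError:
                bad.append(path)
    return bad


if __name__ == "__main__":
    import sys
    for path in unreadable_files(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(path)
```

Files whose contents are already in the page cache may be read without
touching the disk, so a run right after mounting the partition is more
likely to hit the bad sector.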

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-14 10:10                     ` Rogier Wolff
@ 2003-10-14 10:31                       ` Hans Reiser
  0 siblings, 0 replies; 33+ messages in thread
From: Hans Reiser @ 2003-10-14 10:31 UTC (permalink / raw)
  To: Rogier Wolff; +Cc: John Bradford, Wes Janzen, linux-kernel

Rogier Wolff wrote:

>On Tue, Oct 14, 2003 at 01:57:42PM +0400, Hans Reiser wrote:
>
>>Rogier Wolff wrote:
>>
>>>Of course, I left my drive that indicated it had problems (i.e. it
>>>didn't spot the sector going bad before it became unreadable), in the
>>>machine for another two days. It's getting replaced ASAP (i.e. the
>>>next hour or so).
>>
>>replacing the drive is reasonable caution.  I think though that the 
>>other poster is right that IFF you want to remap bad blocks, the drive 
>>should do it not reiserfs.
>
>It is a "pretty much for free" feature. In your in-kernel
>implementation you hopefully already have the ability to skip blocks
>in use by other files. So allocating it to a special file will take
>care of the kernel part. Next you need one line in your fsck to
>prevent that "dangling inode" getting linked into lost+found. Then you
>do need a utility to actually be able to mark blocks as bad. 
>
>			Roger. 
>
We DO have it.  It is present in Reiser4, and there is a patch around 
somewhere for V3 that I would be happy to have someone merge into the 
latest V3 code and test (we are too focused on shipping V4 to do it 
ourselves right now).

I agree that the FS should be able to do it, but I also think that the 
drive doing it is best.

-- 
Hans



* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-14  8:45               ` Hans Reiser
  2003-10-14  9:46                 ` Rogier Wolff
@ 2003-10-14 10:19                 ` John Bradford
  1 sibling, 0 replies; 33+ messages in thread
From: John Bradford @ 2003-10-14 10:19 UTC (permalink / raw)
  To: Hans Reiser; +Cc: Rogier Wolff, Wes Janzen, linux-kernel

(I know I said my previous post was the last one on this subject, but
we seem to have moved on to a slightly different area).

Quote from Hans Reiser <reiser@namesys.com>:
> Perhaps we should tell people to first write to the bad block, and only 
> if the block remains bad after triggering the remapping by writing to it 
> should you make any effort to get the filesystem to remap it for you.  
> What do you think?

I'm not convinced that this belongs in the filesystem.  I can see how
it makes sense in some ways for magnetic disk devices, but that's not
the filesystem's concern.  How would we know that the write isn't
being cached by hardware further along the line, for example?  What
are the negative effects of repeated writes if the filesystem is on
flash, or a tape?  A damaged tape could be damaged more by winding
back and forth, for example.  (OK, tape is a bad example, but some
future storage technology that we don't know about could have an
analogous problem.  My point is that just because 99% of installations
will use ReiserFS on a disk device, is it right to put disk device
specifics in the FS?)

Also, one corner case that occurs to me is that the first remapping
worked, and then the newly allocated area went bad in the time before
we verified it.  Then it could look like a persistent fault, when in
fact it's two separate faults.  Realistically, though, I suspect
that is only likely to happen on a rapidly dying disk, in which case
there isn't much we can do anyway.

In general, though, the question is really, should ReiserFS be usable
on a device which doesn't do its own bad block handling?  I suggest
no.

The ultimate point is that only the drive firmware really knows what's
going on, and it can make informed decisions based on things that
nothing external to the drive knows about.  How much error correction
it needed to read a block, the number of errors per physical head, and
per physical cylinder, etc.  The filesystem can only generally make a
decision based on whether there is an error or not.

> Rogier has not indicated that he has tried writing to the bad sector, 
> has he?

I don't think so.

John.

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-14  9:57                   ` Hans Reiser
@ 2003-10-14 10:10                     ` Rogier Wolff
  2003-10-14 10:31                       ` Hans Reiser
  0 siblings, 1 reply; 33+ messages in thread
From: Rogier Wolff @ 2003-10-14 10:10 UTC (permalink / raw)
  To: Hans Reiser; +Cc: Rogier Wolff, John Bradford, Wes Janzen, linux-kernel

On Tue, Oct 14, 2003 at 01:57:42PM +0400, Hans Reiser wrote:
> Rogier Wolff wrote:
> >Of course, I left my drive that indicated it had problems (i.e. it
> >didn't spot the sector going bad before it became unreadable), in the
> >machine for another two days. It's getting replaced ASAP (i.e. the
> >next hour or so).

> replacing the drive is reasonable caution.  I think though that the 
> other poster is right that IFF you want to remap bad blocks, the drive 
> should do it not reiserfs.

It is a "pretty much for free" feature. In your in-kernel
implementation you hopefully already have the ability to skip blocks
in use by other files. So allocating it to a special file will take
care of the kernel part. Next you need one line in your fsck to
prevent that "dangling inode" getting linked into lost+found. Then you
do need a utility to actually be able to mark blocks as bad. 
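
A toy model of that idea, as a sketch only: it is not ReiserFS code, and the
class and method names are invented for illustration.  Bad blocks are handed
to a hidden owner, the ordinary allocator then skips them like any other
block in use, and the hidden owner is filtered out of what userspace sees.

```python
BADBLOCKS = ".badblocks"  # hidden owner of every block known to be bad


class ToyAllocator:
    """Toy block allocator illustrating the 'special file' approach."""

    def __init__(self, n_blocks):
        # owner[i] is the name of the file using block i, or None if free.
        self.owner = [None] * n_blocks

    def mark_bad(self, block):
        """Assign `block` to the hidden bad-block file.

        Returns the previous owner (whose data needs restoring from a good
        copy), or None if the block was free or already marked bad.
        """
        previous = self.owner[block]
        self.owner[block] = BADBLOCKS
        return None if previous == BADBLOCKS else previous

    def allocate(self, name):
        """Give the first free block to `name`; bad blocks are never free."""
        for i, who in enumerate(self.owner):
            if who is None:
                self.owner[i] = name
                return i
        raise OSError("no free blocks")

    def visible_files(self):
        """File names as userspace would see them (bad-block file hidden)."""
        return sorted({o for o in self.owner if o not in (None, BADBLOCKS)})
```

The fsck part of the proposal corresponds to treating BADBLOCKS as a known
owner instead of an orphan to be linked into lost+found.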

			Roger. 

-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
**** "Linux is like a wigwam -  no windows, no gates, apache inside!" ****

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-14  8:55                 ` Wes Janzen
@ 2003-10-14 10:05                   ` Rogier Wolff
  0 siblings, 0 replies; 33+ messages in thread
From: Rogier Wolff @ 2003-10-14 10:05 UTC (permalink / raw)
  To: Wes Janzen; +Cc: Rogier Wolff, John Bradford, linux-kernel

On Tue, Oct 14, 2003 at 03:55:09AM -0500, Wes Janzen wrote:
> >And the real-time performance of the drive becomes unreliable. 
> >Worst case, in a 1Mbyte block 1 million sectors are remapped,
> >requiring a seek of 10ms. While normally reading that block of
> >data would consume 1/40th of a second, you are now looking at
> >about 3 hours. 

> Well, aren't we talking about hardware sectors?  The hardware sectors 
> are probably at least 1 MB in size to start with.  My old 16GB Maxtor 
> that had remapped its way out of sectors only had 16 to remap (the last 
> unit I had fail due to this problem).  I doubt the hardware sectors were 
> anywhere near 1 byte in size.  The bad sectors also seemed to occur at 

OOops. Sorry. Too quick with the numbers. The remapping granularity is
1 sector (0.5kbytes), and there are 2000 of those in a megabyte.

So if the odd numbered ones end up remapped, you have 2000 seeks to
perform to read that 1Mb of data. That would come to 2000 * 10ms = 20
seconds. Not quite as bad as several hours, but still.... 
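
The corrected estimate, written out as a sketch.  The 10 ms seek time and
the figure of 2000 seeks are the numbers used above; treating a megabyte as
2**20 bytes is an assumption, which is why the sector count comes out
slightly above 2000.

```python
SECTOR_BYTES = 512   # remapping granularity: one sector
SEEK_S = 0.010       # assumed cost of one seek, 10 ms


def sectors_in(n_bytes, sector_bytes=SECTOR_BYTES):
    """Number of whole sectors in n_bytes."""
    return n_bytes // sector_bytes


def seek_time_s(n_seeks, seek_s=SEEK_S):
    """Total time spent seeking, in seconds."""
    return n_seeks * seek_s


if __name__ == "__main__":
    print("sectors in 1 MiB:", sectors_in(2**20))
    print("2000 seeks, seconds:", seek_time_s(2000))
    # The earlier estimate assumed a million remapped sectors.
    print("1,000,000 seeks, hours:", seek_time_s(1_000_000) / 3600)
```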

> an exponential rate, which is supported by the 5 drives I've seen go bad 
> in this manner.  Supposedly that has to do with debris spreading across 
> the platter and taking out adjacent sectors.  The one drive I didn't 
> send back or replace immediately after the first error (i.e. no more 
> sectors can be remapped) had lost nearly 50MB of space to bad sectors in 
> a week, and 200MB by the time the replacement arrived 4 days later.  I 
> imagine that this only gets worse as more data is packed into a smaller 
> space.

This supports my statement that if you notice sectors getting bad,
replace the disk as fast as you can, and hope that the sector
remapping bails you out until you get that chance.


> Is there even a way to disable sector remapping on an ATA drive anyway?  
> To avoid these "disadvantages of hardware remapping" you'd need some way 
> to ensure that the drive didn't remap any sectors.  As someone noted, 
> their drive remapped a sector without anything showing up in the log. 

Some drives claim "AV compatibility" or something like that. I think
that this means that they will have their spare sectors on the same
cylinder. i.e. no seeking. (just on average 8ms delay).

> I start more closely watching any drive that remaps more than half its 
> available sectors, if it gets close to the limit I replace it (if it's 
> out of warranty, otherwise I help it along with some badblock runs).  
> It's just not worth the hassle of losing data.  At least if the drive 
> detects the error, chances are it recovers the data and copies it to a 
> good sector (at least I've never lost any data from a drive remapping).  
> I can't say the same for the filesystem trying to recover the data, 
> which usually seems to result in a corrupted file.  IMHO, the data 
> integrity of hardware remapping outweighs any performance disadvantage 
> as compared to a filesystem-only based solution.
> 
> Now if only the drive would catch the problem without requiring a write 
> to the offending sector first. ;-)  Maybe that's already fixed on the 
> newer drives, none of my newer ones have remapped sectors yet.

The problem is that it would be nice if the disk could report: I just
read the data from block XXX for you, but I had a hard time getting it
for you. Recommend reassignment. The OS should then log this, and put
the file that this belongs to elsewhere. This gives the OS the
authority, and the sysop the ability to take appropriate action.

I don't mind a couple of remaps on my mp3 collection. But I rather
hate them on my root drive. 

			Roger. 

-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
**** "Linux is like a wigwam -  no windows, no gates, apache inside!" ****

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-14  9:46                 ` Rogier Wolff
@ 2003-10-14  9:57                   ` Hans Reiser
  2003-10-14 10:10                     ` Rogier Wolff
  0 siblings, 1 reply; 33+ messages in thread
From: Hans Reiser @ 2003-10-14  9:57 UTC (permalink / raw)
  To: Rogier Wolff; +Cc: John Bradford, Wes Janzen, linux-kernel

Rogier Wolff wrote:

>On Tue, Oct 14, 2003 at 12:45:34PM +0400, Hans Reiser wrote:
>  
>
>>Perhaps we should tell people to first write to the bad block, and only 
>>if the block remains bad after triggering the remapping by writing to it 
>>should you make any effort to get the filesystem to remap it for you.  
>>What do you think?
>>
>>Rogier has not indicated that he has tried writing to the bad sector, 
>>has he?
>>    
>>
>
>Hans, 
>
>I simply refuse to try to trigger a remapping by writing to the
>sector. A couple of things can happen:
>
>1) The write succeeds on the "bad" spot.
>
> The "normal" write doesn't
>do a "verify-after-write", so the write might simply be succeeding, 
>resulting in an immediate data-loss (which might be masked if I try
>to reread the data from userspace because the data is still cached!)
>
Do a hard reboot with > 25 seconds power off.

>
>2) the realloc might succeed, hiding the fact that my drive just lost
>0.5k bytes of my data. I mean, there was SOME data there. Linux
>wouldn't try to be reading it if it had never been written, right?  A
>drive that refers my data to /dev/null should be diverted there
>itself.
>
>Of course, I left my drive that indicated it had problems (i.e. it
>didn't spot the sector going bad before it became unreadable), in the
>machine for another two days. It's getting replaced ASAP (i.e. the
>next hour or so).
>
>The bad sector developed in a backup of data that is still running
>happily on another machine. But I'm not risking a sector getting
>assigned some important data going bad next time I notice something.
>
>			Roger. 
>
>  
>
replacing the drive is reasonable caution.  I think though that the 
other poster is right that IFF you want to remap bad blocks, the drive 
should do it not reiserfs.

-- 
Hans



* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-14  8:45               ` Hans Reiser
@ 2003-10-14  9:46                 ` Rogier Wolff
  2003-10-14  9:57                   ` Hans Reiser
  2003-10-14 10:19                 ` John Bradford
  1 sibling, 1 reply; 33+ messages in thread
From: Rogier Wolff @ 2003-10-14  9:46 UTC (permalink / raw)
  To: Hans Reiser; +Cc: John Bradford, Rogier Wolff, Wes Janzen, linux-kernel

On Tue, Oct 14, 2003 at 12:45:34PM +0400, Hans Reiser wrote:
> Perhaps we should tell people to first write to the bad block, and only 
> if the block remains bad after triggering the remapping by writing to it 
> should you make any effort to get the filesystem to remap it for you.  
> What do you think?
> 
> Rogier has not indicated that he has tried writing to the bad sector, 
> has he?

Hans, 

I simply refuse to try to trigger a remapping by writing to the
sector. A couple of things can happen:

1) The write succeeds on the "bad" spot. The "normal" write doesn't
do a "verify-after-write", so the write might simply be succeeding, 
resulting in an immediate data-loss (which might be masked if I try
to reread the data from userspace because the data is still cached!)

2) the realloc might succeed, hiding the fact that my drive just lost
0.5k bytes of my data. I mean, there was SOME data there. Linux
wouldn't try to be reading it if it had never been written, right?  A
drive that refers my data to /dev/null should be diverted there
itself.

Of course, I left my drive that indicated it had problems (i.e. it
didn't spot the sector going bad before it became unreadable), in the
machine for another two days. It's getting replaced ASAP (i.e. the
next hour or so).

The bad sector developed in a backup of data that is still running
happily on another machine. But I'm not risking a sector getting
assigned some important data going bad next time I notice something.

			Roger. 

-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
**** "Linux is like a wigwam -  no windows, no gates, apache inside!" ****

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-14  7:05       ` Wes Janzen
  2003-10-14  7:21         ` John Bradford
  2003-10-14  7:24         ` Rogier Wolff
@ 2003-10-14  9:04         ` Hans Reiser
  2003-10-15 10:23           ` Norman Diamond
  2 siblings, 1 reply; 33+ messages in thread
From: Hans Reiser @ 2003-10-14  9:04 UTC (permalink / raw)
  To: Wes Janzen
  Cc: Rogier Wolff, Norman Diamond, John Bradford, linux-kernel, nikita

Wes Janzen wrote:

>
>>
>>
>> You have to do the math on the LBA sector numbers (subtract the
>> partition start, divide by two).
>> Also, you can use the "badblocks" program.  
>>
> I think he's using reiserfs on the partition, which AFAIK doesn't 
> support marking bad sectors without some work.  I tend to agree with 
> namesys when they suggest just getting a new drive if it has used up 
> all of its extra sectors.  In my experience (admittedly limited), any 
> drive which runs out of extra sectors starts to go bad in a hurry.
>
> -Wes-
>
>>             Roger.  
>>
>
> -
> To unsubscribe from this list: send the line "unsubscribe 
> linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>
>
I think the problem is that many users don't know how to trigger the bad 
sector remapping for the case where the drive can still remap, using 
writes to the bad blocks, and probably our faq needs updating.

nikita, can you do that?
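
The "subtract the partition start, divide by two" arithmetic quoted above
can be sketched as follows.  Dividing by two assumes 512-byte sectors and
1024-byte filesystem blocks; the function name and the example numbers are
hypothetical, and other block sizes change the divisor.

```python
def lba_to_fs_block(lba, partition_start_lba,
                    fs_block_bytes=1024, sector_bytes=512):
    """Convert a whole-disk LBA sector number to a block number
    relative to the start of the partition that contains it."""
    if lba < partition_start_lba:
        raise ValueError("sector lies before the start of the partition")
    offset_bytes = (lba - partition_start_lba) * sector_bytes
    return offset_bytes // fs_block_bytes


if __name__ == "__main__":
    # Made-up numbers: partition starts at sector 63, error at sector 1063.
    print(lba_to_fs_block(1063, 63))                       # 1 KiB blocks
    print(lba_to_fs_block(1063, 63, fs_block_bytes=4096))  # 4 KiB blocks
```

The partition start has to be in the same units (512-byte sectors from the
start of the disk) as the LBA reported in the error log.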

-- 
Hans



* Re: Why are bad disk sectors numbered strangely, and what happens to them?
       [not found]               ` <20031014081110.GA14418@bitwizard.nl>
@ 2003-10-14  8:55                 ` Wes Janzen
  2003-10-14 10:05                   ` Rogier Wolff
  0 siblings, 1 reply; 33+ messages in thread
From: Wes Janzen @ 2003-10-14  8:55 UTC (permalink / raw)
  To: Rogier Wolff; +Cc: John Bradford, linux-kernel



Rogier Wolff wrote:

>On Tue, Oct 14, 2003 at 09:00:11AM +0100, John Bradford wrote:
>  
>
>>Besides, a read error might not mean the data is lost, maybe the drive
>>marked it bad because the amount of error correction needed to
>>retrieve the data was just 'on the edge' of what was possible.
>>    
>>
>
>No. A read error means the data was lost. 
>
>The drive may reallocate it when it was "on the edge". 
>
>  
>
>>Again, I'm not sure what you are implying.  I don't use ReiserFS
>>personally, but I think it's a _good_ thing if it doesn't implement
>>    
>>
>
>Good. Keep it that way. 
>
>  
>
>>bad sector mapping because I don't see any use for it.  If somebody
>>wants to use ReiserFS on an ST-506 disk, the block layer should handle
>>re-allocations, and present an always good block device to the
>>filesystem.
>>    
>>
>
>We don't have that block layer. 
>
>  
>
>>>You create a file called something like ".badblocks" in the root
>>>directory. If as a filesystem you get to know of a bad block, just
>>>allocate it towards that file. Next it pays to make the file invisible
>>>from userspace. (otherwise "tar backups" would try to read it!). 
>>>      
>>>
>>>This is usually done by just allocating an inodenumber for it, and
>>>telling  fsck about it, to prevent it being linked into lost+found 
>>>on the first fsck.... 
>>>
>>>      
>>>
>>>>The drive may well have been developing faults regularly through its
>>>>entire lifetime, and you haven't noticed.  Now you have noticed and
>>>>want to work around the problem, but why wouldn't the drive continue
>>>>its 'natural decay', and assuming it does, why would it be able to
>>>>re-map future bad blocks, but not this one?
>>>>        
>>>>
>>>On the other hand, I once bumped my knee against the bottom of the table
>>>that my computer was on. That was the exact moment that one of my
>>>sectors went bad. So now I know the cause, and want to remap the sector. 
>>>No gradual decay. 
>>>      
>>>
>>Why didn't the drive firmware remap that bad sector then?
>>    
>>
>
>Because it was an MFM drive.
>
>Point is that if you KNOW the cause of the bad block, it might be
>worth the trouble not to use it anymore. 
>
>  
>
>>If it actually refused to, my point stands - bad sectors not getting
>>remapped.  You would be relying on no future sector going bad.  Good
>>luck.
>>    
>>
>
>Even if the remap works, you might have a performance penalty. 
>If you skip the 4k block in the future, your 40Mb per second drive
>will be "idle" for 100 microseconds, dropping your performance
>from 40,000,000 bytes to 39,996,000 bytes in that second. But if
>a seek to the remapped sector is involved, you're losing several
>milliseconds of your disk's performance!
>
>And the real-time performance of the drive becomes unreliable. 
>Worst case, in a 1Mbyte block 1 million sectors are remapped,
>requiring a seek of 10ms. While normally reading that block of
>data would consume 1/40th of a second, you are now looking at
>about 3 hours. 
>
Well, aren't we talking about hardware sectors?  The hardware sectors 
are probably at least 1 MB in size to start with.  My old 16GB Maxtor 
that had remapped its way out of sectors only had 16 to remap (the last 
unit I had fail due to this problem).  I doubt the hardware sectors were 
anywhere near 1 byte in size.  The bad sectors also seemed to occur at 
an exponential rate, which is supported by the 5 drives I've seen go bad 
in this manner.  Supposedly that has to do with debris spreading across 
the platter and taking out adjacent sectors.  The one drive I didn't 
send back or replace immediately after the first error (i.e. no more 
sectors can be remapped) had lost nearly 50MB of space to bad sectors in 
a week, and 200MB by the time the replacement arrived 4 days later.  I 
imagine that this only gets worse as more data is packed into a smaller 
space.
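
The skip-versus-seek comparison quoted above can be checked with a few lines.
This is a sketch using the quoted figures (40,000,000 bytes per second, a
skipped block counted as 4000 bytes); the 10 ms seek is the value used for
the worst case, standing in here for "several milliseconds".

```python
RATE = 40_000_000  # sustained transfer rate in bytes per second


def skip_cost_s(skipped_bytes, rate=RATE):
    """Time the head spends passing over data that is not wanted."""
    return skipped_bytes / rate


def bytes_not_transferred(idle_s, rate=RATE):
    """Data that could have been transferred during idle_s seconds."""
    return idle_s * rate


if __name__ == "__main__":
    skip = skip_cost_s(4_000)
    print("skip one block: %.0f microseconds" % (skip * 1e6))
    print("bytes in that second: %.0f" % (RATE - bytes_not_transferred(skip)))
    print("bytes lost to one 10 ms seek: %.0f" % bytes_not_transferred(0.010))
```

On these numbers a single seek to a remapped sector costs about as much
throughput as skipping a hundred blocks in place.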

>If you are streaming a video off this drive, 
>that doesn't sound like an option. (say requiring only 4Mb per
>second of throughput, i.e. having a factor of 10 of performance
>margin!)
>  
>

Is there even a way to disable sector remapping on an ATA drive anyway?  
To avoid these "disadvantages of hardware remapping" you'd need some way 
to ensure that the drive didn't remap any sectors.  As someone noted, 
their drive remapped a sector without anything showing up in the log. 

I start watching more closely any drive that has remapped more than half 
of its available spare sectors; if it gets close to the limit I replace it 
(if it's out of warranty; otherwise I help it along with some badblocks runs).  
It's just not worth the hassle of losing data.  At least if the drive 
detects the error, chances are it recovers the data and copies it to a 
good sector (at least I've never lost any data from a drive remapping).  
I can't say the same for the filesystem trying to recover the data, 
which usually seems to result in a corrupted file.  IMHO, the data 
integrity of hardware remapping outweighs any performance disadvantage 
as compared to a filesystem-only based solution.

Now if only the drive would catch the problem without requiring a write 
to the offending sector first. ;-)  Maybe that's already fixed on the 
newer drives, none of my newer ones have remapped sectors yet.

-Wes-

>			Roger. 
>
>
>  
>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-14  8:11             ` John Bradford
@ 2003-10-14  8:45               ` Hans Reiser
  2003-10-14  9:46                 ` Rogier Wolff
  2003-10-14 10:19                 ` John Bradford
  0 siblings, 2 replies; 33+ messages in thread
From: Hans Reiser @ 2003-10-14  8:45 UTC (permalink / raw)
  To: John Bradford; +Cc: Rogier Wolff, Wes Janzen, linux-kernel

Perhaps we should tell people to first write to the bad block, and only 
if the block remains bad after triggering the remapping by writing to it 
should you make any effort to get the filesystem to remap it for you.  
What do you think?

Rogier has not indicated that he has tried writing to the bad sector, 
has he?

-- 
Hans

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-14  7:40           ` Rogier Wolff
@ 2003-10-14  8:11             ` John Bradford
  2003-10-14  8:45               ` Hans Reiser
       [not found]             ` <200310140800.h9E80BT9000815@81-2-122-30.bradfords.org.uk>
  1 sibling, 1 reply; 33+ messages in thread
From: John Bradford @ 2003-10-14  8:11 UTC (permalink / raw)
  To: Rogier Wolff; +Cc: Wes Janzen, linux-kernel

This is my last mail on this subject.

> I'm not sure in what cases a drive will remap a sector. Manufacturers
> are not publishing this.
> 
> So if you get a read-error (showing you that some of your data was just
> lost!), you could just rewrite that sector and hope for the drive to
> remap it. Well, you just lost some of your data. Maybe it was part of a
> file you got from a CD. Fine. Easy to replace. Maybe it was part of your
> CD-collection-backup. Fine. Easy to replace. Maybe it was part of your
> thesis document. Oops. Difficult to replace.

Sector re-mapping is not a replacement for backing up your data.  It
merely adds resilience to the disk.  In fact, it's more or less
impossible to get away from these days - modern IDE disks error
correct all the time.  One area of the disk going bad is not an
unlikely event.

> > The drive is probably full of unusable areas, which are correctly
> > identified and not used by the firmware.  One more is detected, and
> > the firmware doesn't cope with it.  Suddenly we are getting
> > suggestions to work around that in the filesystem.
> 
> Right. Support for bad sectors is really easy to build into a
> filesystem. If Reiserfs doesn't (yet) support it, another reason not 
> to use Reiserfs. 

Not at all.  A bad sector map in the filesystem is a pointless feature
for a filesystem which will likely only be used on fault-tolerant
devices.  It serves no purpose.  The 'it does no harm' argument is
just as pointless. 

> You create a file called something like ".badblocks" in the root
> directory. If as a filesystem you get to know of a bad block, just
> allocate it towards that file. Next it pays to make the file invisible
> from userspace. (otherwise "tar backups" would try to read it!). 

Doing that kind of thing was quite useful in the 1980s when floppies
were actually expensive and hard disks usually didn't remap bad
sectors.  Nowadays, it usually gains nothing, and may well hide real
faults that could cause data loss later on.

> This is usually done by just allocating an inodenumber for it, and
> telling  fsck about it, to prevent it being linked into lost+found 
> on the first fsck.... 
> 
> > The drive may well have been developing faults regularly through its
> > entire lifetime, and you haven't noticed.  Now you have noticed and
> > want to work around the problem, but why wouldn't the drive continue
> > its 'natural decay', and assuming it does, why would it be able to
> > re-map future bad blocks, but not this one?
> 
> On the other hand, I once bumped my knee against the bottom of the table
> that my computer was on. That was the exact moment that one of my
> sectors went bad. So now I know the cause, and want to remap the sector. 
> No gradual decay. 

Again, you are talking around the problem - there almost certainly
will be gradual decay with any disk.  You are just not noticing it
because the firmware is handling it.  If you know that there is a bad
sector, and the disk is not re-mapping it, _why_ isn't it remapping
it?

John.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-14  7:21         ` John Bradford
@ 2003-10-14  7:40           ` Rogier Wolff
  2003-10-14  8:11             ` John Bradford
       [not found]             ` <200310140800.h9E80BT9000815@81-2-122-30.bradfords.org.uk>
  0 siblings, 2 replies; 33+ messages in thread
From: Rogier Wolff @ 2003-10-14  7:40 UTC (permalink / raw)
  To: John Bradford; +Cc: Wes Janzen, Rogier Wolff, Norman Diamond, linux-kernel

On Tue, Oct 14, 2003 at 08:21:48AM +0100, John Bradford wrote:
> > >
> > >Also, you can use the "badblocks" program. 
> > >  
> > >
> > I think he's using reiserfs on the partition, which AFAIK doesn't 
> > support marking bad sectors without some work.  I tend to agree with 
> > namesys when they suggest just getting a new drive if it has used up all 
> > of its extra sectors.  In my experience (admittedly limited), any drive 
> > which runs out of extra sectors starts to go bad in a hurry.
> 
> I fail to see the point of this discussion.  What is the point in
> marking sectors bad at the filesystem level, when the drive is
> supposed to be doing it at the firmware level?

I'm not sure in what cases a drive will remap a sector. Manufacturers
are not publishing this.

So if you get a read-error (showing you that some of your data was just
lost!), you could just rewrite that sector and hope for the drive to
remap it. Well, you just lost some of your data. Maybe it was part of a
file you got from a CD. Fine. Easy to replace. Maybe it was part of your
CD-collection-backup. Fine. Easy to replace. Maybe it was part of your
thesis document. Oops. Difficult to replace.

> The drive is probably full of unusable areas, which are correctly
> identified and not used by the firmware.  One more is detected, and
> the firmware doesn't cope with it.  Suddenly we are getting
> suggestions to work around that in the filesystem.

Right. Support for bad sectors is really easy to build into a
filesystem. If Reiserfs doesn't (yet) support it, another reason not 
to use Reiserfs. 

You create a file called something like ".badblocks" in the root
directory. If as a filesystem you get to know of a bad block, just
allocate it towards that file. Next it pays to make the file invisible
from userspace. (otherwise "tar backups" would try to read it!). 

This is usually done by just allocating an inodenumber for it, and
telling  fsck about it, to prevent it being linked into lost+found 
on the first fsck.... 

> The drive may well have been developing faults regularly through its
> entire lifetime, and you haven't noticed.  Now you have noticed and
> want to work around the problem, but why wouldn't the drive continue
> its 'natural decay', and assuming it does, why would it be able to
> re-map future bad blocks, but not this one?

On the other hand, I once bumped my knee against the bottom of the table
that my computer was on. That was the exact moment that one of my
sectors went bad. So now I know the cause, and want to remap the sector. 
No gradual decay. 

			Roger. 

-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
**** "Linux is like a wigwam -  no windows, no gates, apache inside!" ****

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-14  7:05       ` Wes Janzen
  2003-10-14  7:21         ` John Bradford
@ 2003-10-14  7:24         ` Rogier Wolff
  2003-10-14  9:04         ` Hans Reiser
  2 siblings, 0 replies; 33+ messages in thread
From: Rogier Wolff @ 2003-10-14  7:24 UTC (permalink / raw)
  To: Wes Janzen; +Cc: Rogier Wolff, Norman Diamond, John Bradford, linux-kernel

On Tue, Oct 14, 2003 at 02:05:27AM -0500, Wes Janzen wrote:
> >I've seen a disk (which now failed and will be replaced 3 hours from now)
> >remap defective sectors without reporting any errors to the OS. 
> >The SMART "remapped sector count" just went up, but no errors in the
> >logs. So apparently, the disk noticed something and remapped the sector
> >without anybody noticing. 
> > 
> >
> Can't you pretty much get the drive to check itself using smartctl, such 
> as running:
>     smartctl -o on -s on -S on /dev/hde &> /dev/null

I strongly recommend you store the output somewhere. With the redirect
to /dev/null you will end up ignoring, for instance:
	hde: no such device
without being ABLE to notice it. (Since this is an initscript, writing to
stdout is not good. Store it in /var/log somewhere.)
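
A minimal sketch of the "store the output somewhere" advice, assuming a
POSIX shell. The helper name run_logged and the log path in the usage
comment are illustrative assumptions, not anything from the thread:

```shell
# run_logged LOGFILE COMMAND [ARGS...]
# Append the command's stdout and stderr to LOGFILE instead of discarding
# them, and hand the command's exit status back to the caller.
run_logged() {
    log=$1
    shift
    printf '%s: running: %s\n' "$(date)" "$*" >>"$log"
    "$@" >>"$log" 2>&1
    status=$?
    printf '%s: exit status %s\n' "$(date)" "$status" >>"$log"
    return "$status"
}

# Possible use in an init script (log path is an assumption):
#   run_logged /var/log/smartctl-init.log smartctl -o on -s on -S on /dev/hde
```

With this, a message such as "hde: no such device" ends up in the log
file together with a non-zero exit status, rather than vanishing.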

> in an init script?  Also, I think if you just happen to write to a bad 
> sector the drive will remap it without a warning (unless it doesn't have 
> any remapping sectors left), but if you read from it then to get the 
> drive to "notice" it, you have to write back to that sector.  Or run the 
> drive test which should find it and correct it.

The drive which I'm replacing has had a total of 22 powercycles.
Something like 15 powercycles seem to happen during "install", we
had some hardware problems after that (replaced the motherboard)
in apparently another 7 power cycles. That's all. 

If you manage to get the drive to notice sectors going bad
just before they actually GO bad, then you'll see an exponential
increase in sectors going bad, resulting in the drive quickly 
running out of spare sectors. This defeats the purpose of SMART
in alerting you to a failing drive before it costs you your valuable
data.

If an area of say 2mm x 2mm is going bad, then that's already many
megabytes on a modern drive. The drive is going to decide to remap
sectors there on a case-by-case basis, keeping on storing your valuable
data in sectors which just didn't get noticed. You don't have the
ability to notice the structure in the bad sectors. 

If say a read-amp is slowly going bad, the worst sectors are going 
first, but the whole drive will fail soonish. 

Take it as a warning. Take the drive back on warranty. Point them to the
marketingspeak on the box which says: "defect free interface" or
something like that. You want a drive without bad sectors. 

If you can't take it back, move it away to your "long term storage"
disk, where you keep the backup of your CD collection or something like
that. Don't put anything important on it. 

			Roger. 

-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
**** "Linux is like a wigwam -  no windows, no gates, apache inside!" ****

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-14  7:05       ` Wes Janzen
@ 2003-10-14  7:21         ` John Bradford
  2003-10-14  7:40           ` Rogier Wolff
  2003-10-14  7:24         ` Rogier Wolff
  2003-10-14  9:04         ` Hans Reiser
  2 siblings, 1 reply; 33+ messages in thread
From: John Bradford @ 2003-10-14  7:21 UTC (permalink / raw)
  To: Wes Janzen, Rogier Wolff; +Cc: Norman Diamond, John Bradford, linux-kernel

> >>>>I want to make sure that the drive is now using a non-defective
> >>>>replacement sector.
> >>>>        
> >>>>
> >>>A read won't necessarily do that.  You might have to write to a
> >>>defective sector to force re-allocation.
> >>>      
> >>>
> >>I agree, we are not sure if a read will do that.  That is the reason why two
> >>of my preceding questions were:
> >>    
> >>
> >
> >I've seen a disk (which now failed and will be replaced 3 hours from now)
> >remap defective sectors without reporting any errors to the OS. 
> >The SMART "remapped sector count" just went up, but no errors in the
> >logs. So apparently, the disk noticed something and remapped the sector
> >without anybody noticing. 
> >  
> >
> Can't you pretty much get the drive to check itself using smartctl, such 
> as running:
>      smartctl -o on -s on -S on /dev/hde &> /dev/null
> in an init script?  Also, I think if you just happen to write to a bad 
> sector the drive will remap it without a warning (unless it doesn't have 
> any remapping sectors left), but if you read from it then to get the 
> drive to "notice" it, you have to write back to that sector.  Or run the 
> drive test which should find it and correct it.

That's correct for the majority of modern IDE disks.

> >>   How can I tell Linux to mark the sector as bad, knowing the LBA sector
> >>   number?
> >>    
> >>
> >
> >man tune2fs .
> >
> >You have to do the math on the LBA sector numbers (subtract the
> >partition start, divide by two). 
> >
> >Also, you can use the "badblocks" program. 
> >  
> >
> I think he's using reiserfs on the partition, which AFAIK doesn't 
> support marking bad sectors without some work.  I tend to agree with 
> namesys when they suggest just getting a new drive if it has used up all 
> of its extra sectors.  In my experience (admittedly limited), any drive 
> which runs out of extra sectors starts to go bad in a hurry.

I fail to see the point of this discussion.  What is the point in
marking sectors bad at the filesystem level, when the drive is
supposed to be doing it at the firmware level?

The drive is probably full of unusable areas, which are correctly
identified and not used by the firmware.  One more is detected, and
the firmware doesn't cope with it.  Suddenly we are getting
suggestions to work around that in the filesystem.

The drive may well have been developing faults regularly through its
entire lifetime, and you haven't noticed.  Now you have noticed and
want to work around the problem, but why wouldn't the drive continue
its 'natural decay', and assuming it does, why would it be able to
re-map future bad blocks, but not this one?

Working around the problem in the filesystem makes no sense at all on
a modern IDE drive.

John.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-14  6:49     ` Rogier Wolff
@ 2003-10-14  7:05       ` Wes Janzen
  2003-10-14  7:21         ` John Bradford
                           ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Wes Janzen @ 2003-10-14  7:05 UTC (permalink / raw)
  To: Rogier Wolff; +Cc: Norman Diamond, John Bradford, linux-kernel



Rogier Wolff wrote:

>On Mon, Oct 13, 2003 at 07:24:00PM +0900, Norman Diamond wrote:
>  
>
>>John Bradford replied to me:
>>
>>    
>>
>>>>How can I tell Linux to read every sector in the partition?  Oh, I might
>>>>know this one,
>>>>  dd if=/dev/hda8 of=/dev/null
>>>>I want to make sure that the drive is now using a non-defective
>>>>replacement sector.
>>>>        
>>>>
>>>A read won't necessarily do that.  You might have to write to a
>>>defective sector to force re-allocation.
>>>      
>>>
>>I agree, we are not sure if a read will do that.  That is the reason why two
>>of my preceding questions were:
>>    
>>
>
>I've seen a disk (which now failed and will be replaced 3 hours from now)
>remap defective sectors without reporting any errors to the OS. 
>The SMART "remapped sector count" just went up, but no errors in the
>logs. So apparently, the disk noticed something and remapped the sector
>without anybody noticing. 
>  
>
Can't you pretty much get the drive to check itself using smartctl, such 
as running:
     smartctl -o on -s on -S on /dev/hde &> /dev/null
in an init script?  Also, I think if you just happen to write to a bad 
sector the drive will remap it without a warning (unless it doesn't have 
any remapping sectors left), but if you read from it then to get the 
drive to "notice" it, you have to write back to that sector.  Or run the 
drive test which should find it and correct it.

>  
>
>>   How can I find out which file contains the bad sector?  I would like to
>>   try to recreate the file from a source of good data.
>>    
>>
>
>Try: 
>	tar cf - / | dd of=/dev/null
>
>(note some people will try to abbreviate that to 
>	tar cf /dev/null / 
>but that won't work: Tar will recognise that it's writing to /dev/null
>and skip reading the files! That's a bug in tar in my book. )
>
>  
>
>>   How can I tell Linux to mark the sector as bad, knowing the LBA sector
>>   number?
>>    
>>
>
>man tune2fs .
>
>You have to do the math on the LBA sector numbers (subtract the
>partition start, divide by two). 
>
>Also, you can use the "badblocks" program. 
>  
>
I think he's using reiserfs on the partition, which AFAIK doesn't 
support marking bad sectors without some work.  I tend to agree with 
namesys when they suggest just getting a new drive if it has used up all 
of its extra sectors.  In my experience (admittedly limited), any drive 
which runs out of extra sectors starts to go bad in a hurry.

-Wes-

>			Roger. 
>  
>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-13 11:30       ` Norman Diamond
  2003-10-13 11:58         ` Maciej Zenczykowski
  2003-10-13 12:02         ` John Bradford
@ 2003-10-14  6:54         ` Rogier Wolff
  2 siblings, 0 replies; 33+ messages in thread
From: Rogier Wolff @ 2003-10-14  6:54 UTC (permalink / raw)
  To: Norman Diamond; +Cc: John Bradford, linux-kernel

On Mon, Oct 13, 2003 at 08:30:19PM +0900, Norman Diamond wrote:
> > How are you going to make sure you write it in the same location as it was
> > before?
> 
> Mostly it doesn't matter.  The primary purpose of this bit of it is to
> recreate the file to contain good data, which is why I would try to recreate
> it from a source of good data.  The secondary purpose is:

Note that I strongly recommend not putting any important data on
a drive that has shown to have defective sectors(*). You never know when
the next sector is going to go. 

We're replacing a drive that has remapped 13 sectors or something like
that, and it's now given us the first IO errors, so it's going towards
the bin. 

		Roger. 

(*) If you're sure that something external which can be prevented in the
future caused the bad sectors, then fine. But if a drive is developing
bad sectors all by itself, the future might bring remapped sectors until
the slack remap space runs out, or one day a sector containing important
data goes bad.... 

-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
**** "Linux is like a wigwam -  no windows, no gates, apache inside!" ****

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-13 10:24   ` Norman Diamond
  2003-10-13 10:33     ` John Bradford
  2003-10-13 14:24     ` Chuck Campbell
@ 2003-10-14  6:49     ` Rogier Wolff
  2003-10-14  7:05       ` Wes Janzen
  2 siblings, 1 reply; 33+ messages in thread
From: Rogier Wolff @ 2003-10-14  6:49 UTC (permalink / raw)
  To: Norman Diamond; +Cc: John Bradford, linux-kernel

On Mon, Oct 13, 2003 at 07:24:00PM +0900, Norman Diamond wrote:
> John Bradford replied to me:
> 
> > > How can I tell Linux to read every sector in the partition?  Oh, I might
> > > know this one,
> > >   dd if=/dev/hda8 of=/dev/null
> > > I want to make sure that the drive is now using a non-defective
> > > replacement sector.
> >
> > A read won't necessarily do that.  You might have to write to a
> > defective sector to force re-allocation.
> 
> I agree, we are not sure if a read will do that.  That is the reason why two
> of my preceding questions were:

I've seen a disk (which now failed and will be replaced 3 hours from now)
remap defective sectors without reporting any errors to the OS. 
The SMART "remapped sector count" just went up, but no errors in the
logs. So apparently, the disk noticed something and remapped the sector
without anybody noticing. 

>    How can I find out which file contains the bad sector?  I would like to
>    try to recreate the file from a source of good data.

Try: 
	tar cf - / | dd of=/dev/null

(note some people will try to abbreviate that to 
	tar cf /dev/null / 
but that won't work: Tar will recognise that it's writing to /dev/null
and skip reading the files! That's a bug in tar in my book. )

>    How can I tell Linux to mark the sector as bad, knowing the LBA sector
>    number?

man tune2fs .

You have to do the math on the LBA sector numbers (subtract the
partition start, divide by two). 

Also, you can use the "badblocks" program. 
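
The arithmetic above (subtract the partition start, divide by two) can be
sketched as a small shell function. It assumes 512-byte sectors and
defaults to a 1 KiB filesystem block, which is what "divide by two"
implies. The name lba_to_fs_block and the optional block-size argument
are illustrative assumptions:

```shell
# lba_to_fs_block LBA PARTITION_START [BLOCK_SIZE_BYTES]
# Convert a whole-disk LBA sector number into a filesystem block number
# relative to the start of the partition. Sectors are assumed to be
# 512 bytes; BLOCK_SIZE_BYTES defaults to 1024 (two sectors per block).
lba_to_fs_block() {
    lba=$1
    part_start=$2
    block_size=${3:-1024}
    sectors_per_block=$((block_size / 512))
    # Integer division: every sector inside a block maps to that block.
    echo $(( (lba - part_start) / sectors_per_block ))
}

# Example with made-up numbers: a partition starting at sector 63 and a
# reported bad LBA of 1000063:
#   lba_to_fs_block 1000063 63         # 1 KiB blocks
#   lba_to_fs_block 1000063 63 4096    # 4 KiB blocks
```

If the filesystem uses a different block size (4 KiB is common), the
divisor changes accordingly, so it is worth checking the actual block
size before feeding the result to any tool.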

			Roger. 
-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
**** "Linux is like a wigwam -  no windows, no gates, apache inside!" ****

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-13 14:54       ` Maciej Zenczykowski
@ 2003-10-13 16:29         ` Roger Larsson
  0 siblings, 0 replies; 33+ messages in thread
From: Roger Larsson @ 2003-10-13 16:29 UTC (permalink / raw)
  To: linux-kernel

On Monday 13 October 2003 16.54, Maciej Zenczykowski wrote:
> > find /usr/lib -type f|sed -e 's!.*!cat & >/dev/null || echo &!'|sh
>
> should obviously be:
>   find /usr/lib -type f|sed -e 's!.*!cat "&" >/dev/null || echo &!'|sh
> in order to accept spaces in file names... (they do happen).

find /usr/lib -type f|sed -e 's!.*!cat "&" >/dev/null || echo "&"!'|sh

To accept even stranger characters... Like parentheses '('
Otherwise I get:

sh: line 10051: syntax error near unexpected token `('
sh: line 10051: `cat "/usr/lib/qt-3.0.5/templates/
Dialog_with_Buttons_(Bottom).ui" >/dev/null || echo /usr/lib/qt-3.0.5/
templates/Dialog_with_Buttons_(Bottom).ui'
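
Each round of quoting fixes above still breaks on some character, because
the file names are pasted into text that a shell then re-parses. One
possible alternative, sketched under the assumption of a POSIX find and
sh, passes each name as an argument instead. The function name
check_readable is an illustrative assumption:

```shell
# check_readable DIR
# Read every regular file under DIR and print the names of those that
# cannot be read in full (for example because of an I/O error).
check_readable() {
    # find hands the paths to sh as separate arguments, so spaces,
    # quotes and parentheses in file names are never re-parsed.
    find "$1" -type f -exec sh -c '
        for f in "$@"; do
            cat "$f" >/dev/null 2>&1 || printf "%s\n" "$f"
        done
    ' sh {} +
}

# Possible use:
#   check_readable /usr/lib
```

This keeps the idea of the original one-liner (cat each file, report the
ones that fail) and only changes how the names reach the shell.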

/RogerL

-- 
Roger Larsson
Skellefteå
Sweden

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-13 14:24     ` Chuck Campbell
@ 2003-10-13 14:54       ` Maciej Zenczykowski
  2003-10-13 16:29         ` Roger Larsson
  0 siblings, 1 reply; 33+ messages in thread
From: Maciej Zenczykowski @ 2003-10-13 14:54 UTC (permalink / raw)
  To: Chuck Campbell; +Cc: linux-kernel

> find /usr/lib -type f|sed -e 's!.*!cat & >/dev/null || echo &!'|sh
should obviously be:
  find /usr/lib -type f|sed -e 's!.*!cat "&" >/dev/null || echo &!'|sh
in order to accept spaces in file names... (they do happen).

MaZe.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-13 10:24   ` Norman Diamond
  2003-10-13 10:33     ` John Bradford
@ 2003-10-13 14:24     ` Chuck Campbell
  2003-10-13 14:54       ` Maciej Zenczykowski
  2003-10-14  6:49     ` Rogier Wolff
  2 siblings, 1 reply; 33+ messages in thread
From: Chuck Campbell @ 2003-10-13 14:24 UTC (permalink / raw)
  To: linux-kernel

On Mon, Oct 13, 2003 at 07:24:00PM +0900, Norman Diamond wrote:
> 
> I agree, we are not sure if a read will do that.  That is the reason why two
> of my preceding questions were:
> 
>    How can I find out which file contains the bad sector?  I would like to
>    try to recreate the file from a source of good data.

This was given to me on this list by Al Viro a couple of years back.  
Worked fine for me.

find /usr/lib -type f|sed -e 's!.*!cat & >/dev/null || echo &!'|sh

-- 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-13 11:30       ` Norman Diamond
  2003-10-13 11:58         ` Maciej Zenczykowski
@ 2003-10-13 12:02         ` John Bradford
  2003-10-15 10:23           ` Norman Diamond
  2003-10-14  6:54         ` Rogier Wolff
  2 siblings, 1 reply; 33+ messages in thread
From: John Bradford @ 2003-10-13 12:02 UTC (permalink / raw)
  To: Norman Diamond, linux-kernel

Quote from "Norman Diamond" <ndiamond@wta.att.ne.jp>:
> John Bradford replied to me:
> 
> > > > > How can I tell Linux to read every sector in the partition?  Oh, I
> > > > > might know this one,
> > > > >   dd if=/dev/hda8 of=/dev/null
> > > > > I want to make sure that the drive is now using a non-defective
> > > > > replacement sector.
> > > >
> > > > A read won't necessarily do that.  You might have to write to a
> > > > defective sector to force re-allocation.
> > >
> > > I agree, we are not sure if a read will do that.  That is the reason why
> > > two of my preceding questions were:
> > >
> > >    How can I find out which file contains the bad sector?  I would like
> > >    to try to recreate the file from a source of good data.
> >
> > How are you going to make sure you write it in the same location as it was
> > before?
> 
> Mostly it doesn't matter.  The primary purpose of this bit of it is to
> recreate the file to contain good data, which is why I would try to recreate
> it from a source of good data.

OK.

>  The secondary purpose is:
>   IF the bad sector doesn't get reused then great, then the next bit of
> effort will be to try to get the sector marked as bad, if there is any way
> to do that under Linux.  See the next question, which is now being reposted
> for at least the fourth time.
>   BUT IF the same sector number gets rewritten then hopefully the same
> sector number will be associated with a reallocated non-defective sector and
> the data will get written properly.

Yes, that's what I'd hope, unless the disk ran out of spare space to
allocate.

> > >    How can I tell Linux to mark the sector as bad, knowing the LBA
> > >    sector number?
> >
> > Don't.  If the drive can't fix this problem itself, throw it in the bin.
> 
> THE DRIVE HAS 1, ONE, HITOTSU, UNO, UN, BAD SECTOR.

No, the last SMART test re-allocated one sector.  That sector may have
gone bad in the next few minutes.  Unlikely, but possible.

>  The drive is capable of
> doing reallocations.  What kind of operation can be done that will persuade
> the drive to do the reallocation?

The drive has _done_ a reallocation.  You posted that the reallocated
sector count had gone from 1 to 2.  This is why I said if it can't fix
the problem, bin it.  It doesn't seem to have fixed the problem yet.

> > > And that is also the reason why my last question, which Mr. Bradford
> > > replied to, had the stated purpose of making sure that the drive is now
> > > using a non-defective replacement sector after the preceding operations
> > > have been carried out.
> >
> > Backup your data.
> 
> I want to fix the defective file from an existing backup or recomputation.
> Aside from that, it is my crash box (as already posted in this thread).

Somebody else might read this thread, and want full instructions.  It
might be your crash box, but somebody else might have data they want
to preserve.

>  The
> questions are still important because sometimes this kind of thing happens
> on machines that aren't crash boxes, and it is not customary to dump a drive
> when 99.99% of its preparations for error recovery are still intact.
> 
> > Run the S.M.A.R.T. tests.
> 
> I DID.  YOU REPLIED TO MY POSTING WHERE I REPORTED THEM.

1. I know.  I read your original post
2. I am providing instructions that other people might follow in the
   future, that is why I am making sure they are complete.
3. Run the tests again.  Your drive fixed one bad sector, let's see if
   it completes the test again without finding more.
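
A rough sketch of how the reallocated sector count could be compared
before and after re-running the test. It assumes the attribute table
printed by "smartctl -A" names the attribute Reallocated_Sector_Ct in the
second column and puts the raw value in the last column; that layout, the
function name reallocated_count and the device name are assumptions:

```shell
# reallocated_count
# Read "smartctl -A" style text on stdin and print the raw value of the
# Reallocated_Sector_Ct attribute (assumed to be the last column).
reallocated_count() {
    awk '$2 == "Reallocated_Sector_Ct" { print $NF }'
}

# Possible use (device name is an assumption):
#   before=$(smartctl -A /dev/hda | reallocated_count)
#   smartctl -t long /dev/hda      # start the extended self-test
#   ... wait for the test to finish, then:
#   after=$(smartctl -A /dev/hda | reallocated_count)
#   [ "$after" -gt "$before" ] && echo "more sectors were reallocated"
```

If the count keeps rising between runs, that supports replacing the
disk; if it stays put and the test completes, the single reallocation
may have been the end of it.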

John.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-13 11:30       ` Norman Diamond
@ 2003-10-13 11:58         ` Maciej Zenczykowski
  2003-10-15 10:22           ` Norman Diamond
  2003-10-13 12:02         ` John Bradford
  2003-10-14  6:54         ` Rogier Wolff
  2 siblings, 1 reply; 33+ messages in thread
From: Maciej Zenczykowski @ 2003-10-13 11:58 UTC (permalink / raw)
  To: Norman Diamond; +Cc: John Bradford, linux-kernel

> Hmm.  That could well be an answer.  I'll think about it.
> 
> Actually I should just write over the whole partition for the present time.
> When the drive's self-test detected that one bad sector, I could figure out
> which partition it was in (though not which file, which is why I asked one
> of those questions several times already).  The drive's self-test read the
> entire drive and the other partitions had no detectable errors.

Instead of zeroing the entire partition just zero that single sector.
something like:

dd if=/dev/zero of=/dev/hda bs=512 seek=$lbasector conv=notrunc count=1

possibly first check (by reading in the opposite direction:
dd if=/dev/hda of=/dev/null bs=512 skip=$lbasector count=1)
if this is indeed the place where you get the read error (in syslog)...
if you can read anything from it then read it to a file and write it back 
from the file...
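
The read-then-write-back idea can be sketched as one function, assuming
512-byte sectors and a dd that supports conv=notrunc. The name
rewrite_sector is an illustrative assumption. The sector number is
relative to whatever device or file is given, so a whole-disk LBA goes
with /dev/hda and not with a partition device:

```shell
# rewrite_sector DEVICE LBA
# Try to read one 512-byte sector. If it is readable, write the same
# data back; otherwise overwrite it with zeros, which discards whatever
# was stored there. Either way the write gives the drive a chance to
# reallocate the sector.
rewrite_sector() {
    dev=$1
    lba=$2
    tmp=$(mktemp) || return 1
    if dd if="$dev" of="$tmp" bs=512 skip="$lba" count=1 2>/dev/null \
        && [ "$(wc -c <"$tmp" | tr -d ' ')" -eq 512 ]; then
        src=$tmp          # sector was readable: preserve its contents
    else
        src=/dev/zero     # unreadable: fall back to zeros
    fi
    dd if="$src" of="$dev" bs=512 seek="$lba" count=1 conv=notrunc 2>/dev/null
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Possible use, with the filesystem unmounted (device is an assumption):
#   rewrite_sector /dev/hda "$lbasector"
```

Writing to the raw device underneath a mounted filesystem is risky, so
this is best tried on a copy or an unmounted disk first.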

as for checking which file contains it... hmm file->sector->lba mapping 
can be performed... I don't know about the other direction.  Worst case 
would require checking the mapping of all files on the partition (and 
assuming it's not in an empty area or non-file system area).

Cheers,
MaZe.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-13 10:33     ` John Bradford
@ 2003-10-13 11:30       ` Norman Diamond
  2003-10-13 11:58         ` Maciej Zenczykowski
                           ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Norman Diamond @ 2003-10-13 11:30 UTC (permalink / raw)
  To: John Bradford, linux-kernel

John Bradford replied to me:

> > > > How can I tell Linux to read every sector in the partition?  Oh, I
> > > > might know this one,
> > > >   dd if=/dev/hda8 of=/dev/null
> > > > I want to make sure that the drive is now using a non-defective
> > > > replacement sector.
> > >
> > > A read won't necessarily do that.  You might have to write to a
> > > defective sector to force re-allocation.
> >
> > I agree, we are not sure if a read will do that.  That is the reason why
> > two of my preceding questions were:
> >
> >    How can I find out which file contains the bad sector?  I would like
> >    to try to recreate the file from a source of good data.
>
> How are you going to make sure you write it in the same location as it was
> before?

Mostly it doesn't matter.  The primary purpose of this bit of it is to
recreate the file to contain good data, which is why I would try to recreate
it from a source of good data.  The secondary purpose is:
  IF the bad sector doesn't get reused then great, then the next bit of
effort will be to try to get the sector marked as bad, if there is any way
to do that under Linux.  See the next question, which is now being reposted
for at least the fourth time.
  BUT IF the same sector number gets rewritten then hopefully the same
sector number will be associated with a reallocated non-defective sector and
the data will get written properly.

> >    How can I tell Linux to mark the sector as bad, knowing the LBA
> >    sector number?
>
> Don't.  If the drive can't fix this problem itself, throw it in the bin.

THE DRIVE HAS 1, ONE, HITOTSU, UNO, UN, BAD SECTOR.  The drive is capable of
doing reallocations.  What kind of operation can be done that will persuade
the drive to do the reallocation?

> > And that is also the reason why my last question, which Mr. Bradford
> > replied to, had the stated purpose of making sure that the drive is now
> > using a non-defective replacement sector after the preceding operations
> > have been carried out.
>
> Backup your data.

I want to fix the defective file from an existing backup or recomputation.
Aside from that, it is my crash box (as already posted in this thread).  The
questions are still important because sometimes this kind of thing happens
on machines that aren't crash boxes, and it is not customary to dump a drive
when 99.99% of its preparations for error recovery are still intact.

> Run the S.M.A.R.T. tests.

I DID.  YOU REPLIED TO MY POSTING WHERE I REPORTED THEM.

> Write over the whole disk with something like dd if=/dev/zero of=/dev/hda.

Hmm.  That could well be an answer.  I'll think about it.

Actually I should just write over the whole partition for the present time.
When the drive's self-test detected that one bad sector, I could figure out
which partition it was in (though not which file, which is why I asked one
of those questions several times already).  The drive's self-test read the
entire drive and the other partitions had no detectable errors.
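
Working out which partition holds a given LBA is a range lookup against
the partition table.  The sketch below shows the arithmetic only: the
partition names, start sectors, and sizes are hypothetical placeholders,
not the layout of the drive discussed here, and real values would have to
be read from the actual partition table.

```python
def find_partition(lba, table):
    """Return (name, partition-relative sector) for the partition holding lba.

    table is a list of (name, start_sector, sector_count) tuples.
    Returns None if the LBA falls outside every listed partition.
    """
    for name, start, count in table:
        if start <= lba < start + count:
            return name, lba - start
    return None

# Hypothetical layout, for illustration only.
table = [
    ("hda5", 63, 8000000),
    ("hda8", 18000000, 4000000),
]

print(find_partition(19021882, table))
```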

> If you still get errors, replace the disk.

If the errors are not correctable and/or numerous (where I do not count
numerous syslog entries of the same defective sector to be numerous errors)
then of course I will do so.  Even though it's my crash box.

...  By the way, consider this:

Windows 98 has a scandisk command which writes a file scandisk.log in which
the user can see which files have been deleted by scandisk or corrupted
either by scandisk or before scandisk.  The user can try to recreate those
files.

Windows 2000 has a chkdsk command which does not write a logfile.

Therefore it is convenient for Windows 2000 users to keep an installation of
Windows 98 installed in order to run Windows 98's scandisk command when
necessary.  (Doesn't work for NTFS partitions, but otherwise convenient.)

If Linux is really supposed to be even less powerful than both of those,
then there's quite a lot of wasted effort under way in this undertaking.


* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-13 10:24   ` Norman Diamond
@ 2003-10-13 10:33     ` John Bradford
  2003-10-13 11:30       ` Norman Diamond
  2003-10-13 14:24     ` Chuck Campbell
  2003-10-14  6:49     ` Rogier Wolff
  2 siblings, 1 reply; 33+ messages in thread
From: John Bradford @ 2003-10-13 10:33 UTC (permalink / raw)
  To: Norman Diamond, linux-kernel

Quote from "Norman Diamond" <ndiamond@wta.att.ne.jp>:
> John Bradford replied to me:
> 
> > > How can I tell Linux to read every sector in the partition?  Oh, I might
> > > know this one,
> > >   dd if=/dev/hda8 of=/dev/null
> > > I want to make sure that the drive is now using a non-defective
> > > replacement sector.
> >
> > A read won't necessarily do that.  You might have to write to a
> > defective sector to force re-allocation.
> 
> I agree, we are not sure if a read will do that.  That is the reason why two
> of my preceding questions were:
> 
>    How can I find out which file contains the bad sector?  I would like to
>    try to recreate the file from a source of good data.

How are you going to make sure you write it in the same location as it was before?

>    How can I tell Linux to mark the sector as bad, knowing the LBA sector
>    number?

Don't.  If the drive can't fix this problem itself, throw it in the bin.

> And that is also the reason why my last question, which Mr. Bradford replied
> to, had the stated purpose of making sure that the drive is now using a
> non-defective replacement sector after the preceding operations have been
> carried out.

Backup your data.
Run the S.M.A.R.T. tests.
Write over the whole disk with something like dd if=/dev/zero of=/dev/hda.
If you still get errors, replace the disk.

> Please, the important questions are important.  Doesn't anyone really know
> what Linux does with bad blocks, how to find out which file contains them,
> how to get Linux to force them to be marked and reallocated?

John.


* Re: Why are bad disk sectors numbered strangely, and what happens to them?
       [not found] ` <200310131014.h9DAEwY3000241@81-2-122-30.bradfords.org.uk>
@ 2003-10-13 10:24   ` Norman Diamond
  2003-10-13 10:33     ` John Bradford
                       ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Norman Diamond @ 2003-10-13 10:24 UTC (permalink / raw)
  To: John Bradford, linux-kernel

John Bradford replied to me:

> > How can I tell Linux to read every sector in the partition?  Oh, I might
> > know this one,
> >   dd if=/dev/hda8 of=/dev/null
> > I want to make sure that the drive is now using a non-defective
> > replacement sector.
>
> A read won't necessarily do that.  You might have to write to a
> defective sector to force re-allocation.

I agree, we are not sure if a read will do that.  That is the reason why two
of my preceding questions were:

   How can I find out which file contains the bad sector?  I would like to
   try to recreate the file from a source of good data.

   How can I tell Linux to mark the sector as bad, knowing the LBA sector
   number?

And that is also the reason why my last question, which Mr. Bradford replied
to, had the stated purpose of making sure that the drive is now using a
non-defective replacement sector after the preceding operations have been
carried out.

Please, the important questions are important.  Doesn't anyone really know
what Linux does with bad blocks, how to find out which file contains them,
how to get Linux to force them to be marked and reallocated?


* Re: Why are bad disk sectors numbered strangely, and what happens to them?
@ 2003-10-13  9:31 Norman Diamond
       [not found] ` <200310131014.h9DAEwY3000241@81-2-122-30.bradfords.org.uk>
  0 siblings, 1 reply; 33+ messages in thread
From: Norman Diamond @ 2003-10-13  9:31 UTC (permalink / raw)
  To: linux-kernel

Thanks to Andreas Jellinghaus's suggestion, I ran smartctl logs and tests.
My Linux questions increase in number, but first here are the results.

Before testing, the log included a count of 92 errors, of which the
latest 5 had details available.  Reallocated_Sector_Ct was 1 and
Reallocated_Event_Count was 1.  The offline test succeeded and changed
nothing.  The long self-test found one read error.  After testing, the log
still included a count of 92 errors, of which the latest 5 had details
available, and they were the same 5, so the firmware didn't update
that log with the error that was detected by its self-test.    However,
Reallocated_Sector_Ct was 2 and Reallocated_Event_Count was 2.

The self-test saved one detail of its read error separately from the main
log.  LBA_of_first_error was 0x0122403a.  In decimal this was a very
familiar-looking 19021882.
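
As a quick check of that conversion, a minimal sketch (Python is used
purely for illustration, and the variable names are not from the drive
log):

```python
# LBA reported by the drive's self-test log, as quoted above.
lba_hex = "0x0122403a"

# int() with base 16 accepts the "0x" prefix.
lba_decimal = int(lba_hex, 16)

print(lba_decimal)  # prints 19021882
```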

The sector is in a Reiser partition, which might affect some of the
following questions.

So, why do the syslog entries have so many "sector" numbers, which are
mostly different except for some repetitions, and mostly different from
"LBAsect"?  It seems that LBAsect is the correct number of the bad sector.

How can I find out which file contains the bad sector?  I would like to try
to recreate the file from a source of good data.

How can I tell Linux to mark the sector as bad, knowing the LBA sector
number?

Or did the drive's firmware mark the sector as bad during its self-test?  Is
this why the number of reallocations increased from 1 to 2?  But if so, why
didn't this happen when Linux tried to read the sector?

How can I tell Linux to read every sector in the partition?  Oh, I might
know this one,
  dd if=/dev/hda8 of=/dev/null
I want to make sure that the drive is now using a non-defective replacement
sector.
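
A rough equivalent of that dd read pass, written so that it records where
reads fail instead of stopping at the first error, might look like the
sketch below.  The function name and the 64 KiB chunk size are
assumptions.  Reads here go through the kernel's normal buffered path, so
a chunk that fails only says the bad sector is somewhere inside it, and
nothing in this code makes the drive reallocate anything.

```python
import os

def scan_device(path, chunk_size=64 * 1024):
    """Read path sequentially; return (bytes_read, offsets_of_failed_chunks)."""
    bad_offsets = []
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end yields the size for files and block devices.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            length = min(chunk_size, size - offset)
            try:
                data = os.pread(fd, length, offset)
            except OSError:
                # Note the failing chunk and skip past it.
                bad_offsets.append(offset)
                offset += length
                continue
            if not data:
                break  # unexpected end of data
            total += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return total, bad_offsets
```

Calling scan_device("/dev/hda8") as root would be the counterpart of the
dd command above; dividing a reported offset by 512 gives the first
partition-relative sector of the failing chunk.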


* Re: Why are bad disk sectors numbered strangely, and what happens to them?
  2003-10-11  9:00 Norman Diamond
@ 2003-10-11  9:39 ` Andreas Jellinghaus
  0 siblings, 0 replies; 33+ messages in thread
From: Andreas Jellinghaus @ 2003-10-11  9:39 UTC (permalink / raw)
  To: linux-kernel

try the smartmontools package, it has "smartctl" that will
show you the discs S.M.A.R.T. details (i.e. how many bad
blocks the firmware knows, the errors the firmware knows
about, etc.). It can also run a self test etc.

doing a backup couldn't hurt.
btw: are you sure cables are ok?

Good Luck!

Andreas


* Why are bad disk sectors numbered strangely, and what happens to them?
@ 2003-10-11  9:00 Norman Diamond
  2003-10-11  9:39 ` Andreas Jellinghaus
  0 siblings, 1 reply; 33+ messages in thread
From: Norman Diamond @ 2003-10-11  9:00 UTC (permalink / raw)
  To: linux-kernel

My first question is why the bad disk sectors are numbered strangely, and
second is what does Linux do with them after detecting them?

I repartitioned and reformatted two Reiser partitions before installing SuSE
8.2 and then compiling kernels 2.6.0-test5, test6, and test7.  My feeling is
that the following errors "should" have been detected during writes, so the
damage "should" not be too bad.  The correct data "should" get written to
replacement sectors.  But my understanding of modern ATA drives is that the
firmware "should" have detected the errors during writes and "should" have
finished the work without the OS knowing about it.

If the following errors occurred during reads then I have some pretty angry
questions about why they didn't get detected during writes, especially when
the writes occurred minutes or milliseconds prior to the reads.  (I'll copy
this message to some Toshiba employees.  Maybe the next time they visit,
certain persons should get cat food instead of my wife's cooking  _^o^_
MK4018GAP, about 2 years old.)

Hmm, I guess I also need to ask how to figure out if these occurred during
writes or reads.

Meanwhile, it seems really strange to see separate numbers for LBAsect and
sector, and to see that the two numbers are sometimes related but sometimes
apparently unrelated, and to see LBAsect remain constant while sector
changes with each error.  What is really going on here?

Also kernel 2.6.0-test7 no longer says whether hda was on 03:08 or 03:00
when the errors were detected.

Sep 27 16:49:41 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Sep 27 16:49:41 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=852296
Sep 27 16:49:41 diamondpana kernel: end_request: I/O error,
   dev 03:08 (hda), sector 852296
Sep 27 16:49:41 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Sep 27 16:49:41 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=852304
Sep 27 16:49:41 diamondpana kernel: end_request: I/O error,
   dev 03:08 (hda), sector 852304
[comment: no more that day]

Sep 28 15:20:20 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Sep 28 15:20:20 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=19021784
Sep 28 15:20:20 diamondpana kernel: end_request: I/O error,
   dev 03:00 (hda), sector 19021784
Sep 28 15:20:20 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Sep 28 15:20:20 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=19021786
Sep 28 15:20:20 diamondpana kernel: end_request: I/O error,
   dev 03:00 (hda), sector 19021786
[... every even-numbered sector in this range ...]
Sep 28 15:20:21 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Sep 28 15:20:21 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=19021880
Sep 28 15:20:21 diamondpana kernel: end_request: I/O error,
   dev 03:00 (hda), sector 19021880
Sep 28 15:20:21 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Sep 28 15:20:21 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=19021882
Sep 28 15:20:21 diamondpana kernel: end_request: I/O error,
   dev 03:00 (hda), sector 19021882
[comment: after hitting equality, it soon repeated from the middle]
Sep 28 15:20:26 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Sep 28 15:20:26 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=19021832
Sep 28 15:20:26 diamondpana kernel: end_request: I/O error,
   dev 03:00 (hda), sector 19021832
Sep 28 15:20:26 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Sep 28 15:20:26 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=19021834
Sep 28 15:20:26 diamondpana kernel: end_request: I/O error,
   dev 03:00 (hda), sector 19021834
[... every even-numbered sector in this range ...]
Sep 28 15:20:26 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Sep 28 15:20:26 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=19021880
Sep 28 15:20:26 diamondpana kernel: end_request: I/O error,
   dev 03:00 (hda), sector 19021880
Sep 28 15:20:26 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Sep 28 15:20:26 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=19021882
Sep 28 15:20:26 diamondpana kernel: end_request: I/O error,
   dev 03:00 (hda), sector 19021882
[comment:  after hitting equality again, no more that day]

Sep 29 01:24:09 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Sep 29 01:24:09 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=852296
Sep 29 01:24:09 diamondpana kernel: end_request: I/O error,
   dev 03:08 (hda), sector 852296
Sep 29 01:24:09 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Sep 29 01:24:09 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=852304
Sep 29 01:24:09 diamondpana kernel: end_request: I/O error,
   dev 03:08 (hda), sector 852304
[comment:  same sectors as on Sep 27]

Oct 10 18:29:29 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Oct 10 18:29:29 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=19021842
Oct 10 18:29:29 diamondpana kernel: end_request: I/O error,
   dev hda, sector 19021842
Oct 10 18:29:29 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Oct 10 18:29:29 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=19021850
Oct 10 18:29:29 diamondpana kernel: end_request: I/O error,
   dev hda, sector 19021850
[... every 8th sector in this range, congruent to 2 modulo 8 ...]
Oct 10 18:29:30 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Oct 10 18:29:30 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=19021874
Oct 10 18:29:30 diamondpana kernel: end_request: I/O error,
   dev hda, sector 19021874
Oct 10 18:29:30 diamondpana kernel: hda: dma_intr: status=0x51
   { DriveReady SeekComplete Error }
Oct 10 18:29:30 diamondpana kernel: hda: dma_intr: error=0x40
   { UncorrectableError }, LBAsect=19021882, sector=19021882
Oct 10 18:29:30 diamondpana kernel: end_request: I/O error,
   dev hda, sector 19021882
[comment: some of the same sectors as on Sep 28]
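
To compare the two numbers across many entries, the LBAsect and sector
fields can be pulled out of the log mechanically.  The sketch below is
only an aid for looking at the pattern: the sample lines are the wrapped
syslog entries above joined back into single lines, and it makes no claim
about what "sector" means (for example, whether it is the start of the
failed request or a partition-relative number).

```python
import re

# Sample lines joined from the wrapped syslog entries above.
SAMPLE = [
    "hda: dma_intr: error=0x40 { UncorrectableError }, "
    "LBAsect=19021882, sector=852296",
    "hda: dma_intr: error=0x40 { UncorrectableError }, "
    "LBAsect=19021882, sector=19021784",
    "hda: dma_intr: error=0x40 { UncorrectableError }, "
    "LBAsect=19021882, sector=19021882",
]

PATTERN = re.compile(r"LBAsect=(\d+), sector=(\d+)")

def parse_errors(lines):
    """Return (lbasect, sector, lbasect - sector) for each matching line."""
    results = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            lbasect, sector = int(match.group(1)), int(match.group(2))
            results.append((lbasect, sector, lbasect - sector))
    return results

for lbasect, sector, gap in parse_errors(SAMPLE):
    print(f"LBAsect={lbasect} sector={sector} difference={gap}")
```

For the three sample lines the differences are 18169586, 98, and 0.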




Thread overview: 33+ messages
2003-10-12  8:25 Why are bad disk sectors numbered strangely, and what happens to them? Norman Diamond
  -- strict thread matches above, loose matches on Subject: below --
2003-10-13  9:31 Norman Diamond
     [not found] ` <200310131014.h9DAEwY3000241@81-2-122-30.bradfords.org.uk>
2003-10-13 10:24   ` Norman Diamond
2003-10-13 10:33     ` John Bradford
2003-10-13 11:30       ` Norman Diamond
2003-10-13 11:58         ` Maciej Zenczykowski
2003-10-15 10:22           ` Norman Diamond
2003-10-13 12:02         ` John Bradford
2003-10-15 10:23           ` Norman Diamond
2003-10-15 18:56             ` Pavel Machek
2003-10-14  6:54         ` Rogier Wolff
2003-10-13 14:24     ` Chuck Campbell
2003-10-13 14:54       ` Maciej Zenczykowski
2003-10-13 16:29         ` Roger Larsson
2003-10-14  6:49     ` Rogier Wolff
2003-10-14  7:05       ` Wes Janzen
2003-10-14  7:21         ` John Bradford
2003-10-14  7:40           ` Rogier Wolff
2003-10-14  8:11             ` John Bradford
2003-10-14  8:45               ` Hans Reiser
2003-10-14  9:46                 ` Rogier Wolff
2003-10-14  9:57                   ` Hans Reiser
2003-10-14 10:10                     ` Rogier Wolff
2003-10-14 10:31                       ` Hans Reiser
2003-10-14 10:19                 ` John Bradford
     [not found]             ` <200310140800.h9E80BT9000815@81-2-122-30.bradfords.org.uk>
     [not found]               ` <20031014081110.GA14418@bitwizard.nl>
2003-10-14  8:55                 ` Wes Janzen
2003-10-14 10:05                   ` Rogier Wolff
2003-10-14  7:24         ` Rogier Wolff
2003-10-14  9:04         ` Hans Reiser
2003-10-15 10:23           ` Norman Diamond
2003-10-15 10:39             ` Hans Reiser
2003-10-11  9:00 Norman Diamond
2003-10-11  9:39 ` Andreas Jellinghaus
