* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
2003-10-17 9:40 ` Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?) Norman Diamond
@ 2003-10-17 9:48 ` Hans Reiser
2003-10-17 11:11 ` Norman Diamond
2003-10-17 9:58 ` Pavel Machek
` (4 subsequent siblings)
5 siblings, 1 reply; 61+ messages in thread
From: Hans Reiser @ 2003-10-17 9:48 UTC (permalink / raw)
To: Norman Diamond
Cc: Wes Janzen, Rogier Wolff, John Bradford, linux-kernel, nikita,
Pavel Machek
Norman Diamond wrote:
>Friends in the disk drive section at Toshiba said this:
>
>When a drive tries to read a block, if it detects errors, it retries up to
>255 times. If a retry succeeds then the block gets reallocated. IF 255
>RETRIES FAIL THEN THE BLOCK DOES NOT GET REALLOCATED.
>
>This was so unbelievable to me that I had to confirm this with them in
>different words. In case of a temporary error, the drive provides the
>recovered data as the result of the read operation and the drive writes the
>data to a reallocated sector. In case of a permanent error, the block is
>assumed bad, and of course the data are lost. Since the data are assumed
>lost, the drive keeps the defective LBA sector number associated with the
>same defective physical block and it does not reallocate the defective
>block.
>
>I explained to them why the LBA sector number should still get reallocated
>even though the data are lost. When the sector isn't reallocated, I could
>repartition the drive and reformat the partition and the OS wouldn't know
>about the defective block so the OS would try again to use it. At first
>they did not believe I could do this, but I explained to them that I'm still
>able to delete partitions and create new partitions etc., and then they
>understood.
>
>They also said that a write operation has a chance of getting the bad block
>reallocated. The conditions for reallocation on write are similar but not
>identical to the conditions for reallocate on read. During a write
>operation if a sector is determined to be permanently bad (255 failing
>retries) then it is likely to be reallocated, unlike a read. But I'm not
>sure if this is guaranteed or not. We agreed that we should try it on my
>bad sector, but if the drive again detects a permanent error then it will
>not reallocate the sector. First I still want to find which file contains
>the sector; I haven't had time for this on weekdays.
>
>When I ran the "long" S.M.A.R.T. self-test, the number of reallocated
>sectors and number of reallocation events both increased from 1 to 2, but
>the known bad sector remained bad. This is entirely because of the behavior
>as designed. The self-test detected a temporary error in some other
>unrelated sector, rescued the data in that unreported sector number, and
>reallocated it. That was only a coincidence. The known bad sector was
>detected yet again as permanently bad and was not reallocated.
>
>In this mailing list there has been some discussion of whether file systems
>should keep lists of known bad blocks and hide those bad blocks from
>ordinary operations in ordinary usage. Of course historically this was
>always necessary. As someone else mentioned, and I've done it too, when
>formatting a disk drive, type in the list of known bad block numbers that
>were printed on a piece of paper that came with the drive.
>
>In modern times, some people think that this shouldn't be necessary because
>the drive already does its best to reallocate bad blocks. WRONG. THE BAD
>BLOCK LIST REMAINS AS NECESSARY AS IT ALWAYS WAS.
>
>This design might change in the future, but it might not. My friends are
>afraid that they might lose their jobs if they try to suggest such a change
>in the high-level design, given disk drive corporate politics. I only hope this
>posting doesn't get them fired. (This is not a frivolous concern by the
>way. The myth of lifetime employment is less pervasive than it used to be,
>and Toshiba is pretty much average, by both world and Japanese standards,
>for corporate politics.)
>
>Regarding finding which file contains the known bad sector, someone in this
>mailing list said that the badblocks program could help, but the manual page
>for the badblocks program doesn't give any clues as to how it would help.
>I'm still running find over every file in the partition and cp'ing each to /dev/null.
>
>Meanwhile, yes we do need to record those bad block lists and try to never
>let them get allocated to user-visible files.
Instead of recording the bad blocks, just write to them.
--
Hans
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
2003-10-17 9:48 ` Hans Reiser
@ 2003-10-17 11:11 ` Norman Diamond
2003-10-17 11:45 ` Hans Reiser
` (3 more replies)
0 siblings, 4 replies; 61+ messages in thread
From: Norman Diamond @ 2003-10-17 11:11 UTC (permalink / raw)
To: Hans Reiser, Wes Janzen, Rogier Wolff, John Bradford,
linux-kernel, nikita, Pavel Machek
Replying first to Hans Reiser; below to Russell King and Pavel Machek.
> Instead of recording the bad blocks, just write to them.
If writes are guaranteed to force reallocations then this is potentially
part of a solution.
I still remain suspicious because the first failed read was milliseconds or
minutes after the preceding write. I think the odds are very high that the
sector was already bad at the time of the write but reallocation did not
occur. It is possible but I think very unlikely that the sector was
reallocated to a different physical sector which went bad milliseconds after
being written after reallocation, and equally unlikely that the sector
wasn't reallocated because it really hadn't been bad but went bad
milliseconds later. In other words, I think it is overwhelmingly likely
that the write failed but was not detected as such and did not result in
reallocation.
Now, maybe there is a technique to force it anyway. When a partition is
newly created and is being formatted with the intention of writing data a
few minutes later, do writes that "should" have a better chance of being
detected. The way to start this is to simply write every block, but this is
obviously insufficient because my block did get written shortly after the
partition was formatted and that write didn't cause the block to be
reallocated. So in addition to simply writing every block, also read every
block. For each read that fails, proceed to do another write which "should"
force reallocation.
Mr. Reiser, when I created a partition of your design, that technique was
not offered. Why? And will it soon start being offered?
Also, I remain highly suspicious that for each read that fails, when the
formatting program proceeds to do another write which "should" force
reallocation, the drive might not do it. The formatter will have to proceed
to yet another read. And if the block is still bad, then figure that the
drive is refusing to reallocate the bad block. And then yes, the formatter
will still have to make a list of known bad blocks and do something to
prevent ordinary file system operations from ever seeing those blocks.
Russell King replied to me:
> > When a drive tries to read a block, if it detects errors, it retries up
> > to 255 times. If a retry succeeds then the block gets reallocated. IF
> > 255 RETRIES FAIL THEN THE BLOCK DOES NOT GET REALLOCATED.
>
> This is perfectly reasonable. If the drive can't recover your old data
> to reallocate it to a new block, then leaving the error present until you
> write new data to that bad block is the correct thing to do.
Only if the subsequent write is guaranteed to result in reallocation. I
remain suspicious that the drive does not guarantee such. Suppose the
contents of the next write happen to get stored close enough to correct that
the block doesn't get reallocated and the data survive for another 100
milliseconds before getting corrupt again?
> Think about what would happen if it did get reallocated. What data would
> the drive return when requested to read the bad block?
Why does it matter? The drive already reported a read failure. Maybe Linux
programs aren't all smart enough to inform the user when a read operation
results in an I/O error, but drivers could be smarter. I think there's
probably a bit of room in an inode to add a flag saying that the file has
been detected to be partially unreadable. Sorry for the digression.
Anyway, it is 100% true that the data in that block are gone. The block
should be reallocated and the new physical block can either be zeroed or
randomized or anything, and that's what subsequent reads will get until the
block gets written again.
> If the error persists during a write to the bad block, then yes, I'd
> expect it to be reallocated at that point - but only because the drive has
> the correct data for that block available.
We agree in our moral expectations and our technical analysis that correct
data will be available at that time. But if your word "expect" means you
have confidence that the drive will perform correctly, I do not share your
confidence (I think it is possible but highly unlikely that the drive did
its job correctly during the previous write).
> Your description of the way Toshiba's drive works seems perfectly sane.
> In fact, I'd consider a drive to be broken if it behaved in any other way
> - capable of almost silent data loss.
I think it would not be silent. If the system log had one repetition
instead of fifty repetitions, it would not be silent. I don't know which
application was silent and am irritated. (dd wasn't silent when I tried
copying the entire partition to /dev/null).
Pavel Machek wrote:
> Well, this behaviour makes sense.
>
> "If we can't read this, leave it in place, perhaps we can read it in
> future (when temperature drops below 80Celsius or something)". "If we
> can't write this, bad, but we can reallocate without losing
> anything".
Well, consider the two extremes we've seen in this thread now. Mr. Bradford
felt that the entire drive should be discarded on account of having one bad
block. Mr. Machek feels that we should preserve the possibility of reusing
the bad block because in the future it might appear not to be bad. I take
the middle road. The drive should not be discarded until errors become more
frequent or numerous, but known bad blocks should be acted on so that those
physical blocks should not have a chance of being used again.
Suppose the block became readable when the temperature drops (this one
didn't but I believe some can). What happens when the block becomes
readable, and then a program writes new data to that block, and the block
temporarily appears good? At that time it will get written and will not get
reallocated, right? And a few milliseconds later, what? I do not want that
block reused. I want it reallocated.
And when a drive doesn't guarantee reallocation, I want the driver to remove
the sector from the file system.
* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
2003-10-17 11:11 ` Norman Diamond
@ 2003-10-17 11:45 ` Hans Reiser
2003-10-17 11:51 ` John Bradford
` (2 subsequent siblings)
3 siblings, 0 replies; 61+ messages in thread
From: Hans Reiser @ 2003-10-17 11:45 UTC (permalink / raw)
To: Norman Diamond
Cc: Wes Janzen, Rogier Wolff, John Bradford, linux-kernel, nikita,
Pavel Machek, Vitaly Fertman
Norman Diamond wrote:
>Replying first to Hans Reiser; below to Russell King and Pavel Machek.
>
>
>
>>Instead of recording the bad blocks, just write to them.
>>
>>
>
>If writes are guaranteed to force reallocations then this is potentially
>part of a solution.
>
>I still remain suspicious because the first failed read was milliseconds or
>minutes after the preceding write. I think the odds are very high that the
>sector was already bad at the time of the write but reallocation did not
>occur. It is possible but I think very unlikely that the sector was
>reallocated to a different physical sector which went bad milliseconds after
>being written after reallocation, and equally unlikely that the sector
>wasn't reallocated because it really hadn't been bad but went bad
>milliseconds later. In other words, I think it is overwhelmingly likely
>that the write failed but was not detected as such and did not result in
>reallocation.
>
>
Perform the write after the failed read; that way the drive knows it is
a bad block at the time you write.
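[Editorial aside: Hans's suggestion as a small routine: rewrite the sector immediately after the failed read, then re-read to see whether the drive actually reallocated. `read`/`write` are hypothetical single-sector callables, and the 512-byte zero fill is a placeholder, since the old data is lost anyway.]

```python
def repair_sector(read, write, sector_size=512):
    """Rewrite a sector right after a failed read, so the drive knows the
    block is bad at write time, then verify by re-reading.
    read() returns True when the sector reads back cleanly."""
    if read():
        return "ok"                 # nothing to repair
    write(b"\0" * sector_size)      # contents are already lost; write zeros
    return "reallocated" if read() else "still-bad"
```

The "still-bad" result is exactly the case Norman is suspicious about: a drive that refuses to reallocate even on write.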
>Now, maybe there is a technique to force it anyway. When a partition is
>newly created and is being formatted with the intention of writing data a
>few minutes later, do writes that "should" have a better chance of being
>detected. The way to start this is to simply write every block, but this is
>obviously insufficient because my block did get written shortly after the
>partition was formatted and that write didn't cause the block to be
>reallocated. So in addition to simply writing every block, also read every
>block. For each read that fails, proceed to do another write which "should"
>force reallocation.
>
>Mr. Reiser, when I created a partition of your design, that technique was
>not offered. Why? And will it soon start being offered?
>
>
I think I discussed with Vitaly offering users the option of writing,
reading, and then writing again, every block before mkreiserfs. I
forget what happened to that idea, Vitaly?
>Also, I remain highly suspicious that for each read that fails, when the
>formatting program proceeds to do another write which "should" force
>reallocation, the drive might not do it.
>
I am not going to worry about such suspicions without evidence or drive
manufacturer comment, as it has not been our experience so far.
>
>
>Why does it matter? The drive already reported a read failure. Maybe Linux
>programs aren't all smart enough to inform the user when a read operation
>results in an I/O error, but drivers could be smarter.
>
There is a general problem with reporting urgent kernel messages to
users thanks to GUIs covering over the console.
--
Hans
* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
2003-10-17 11:11 ` Norman Diamond
2003-10-17 11:45 ` Hans Reiser
@ 2003-10-17 11:51 ` John Bradford
2003-10-17 12:53 ` John Bradford
2003-10-17 13:04 ` Russell King
3 siblings, 0 replies; 61+ messages in thread
From: John Bradford @ 2003-10-17 11:51 UTC (permalink / raw)
To: Norman Diamond, Hans Reiser, Wes Janzen, Rogier Wolff; +Cc: linux-kernel
> Well, consider the two extremes we've seen in this thread now. Mr. Bradford
> felt that the entire drive should be discarded on account of having one bad
> block.
Please don't spread blatantly misleading information.
My position on this is that if a drive is _persistently_ unable to
_write_ to any LBA address, it should be binned. Read errors are a
separate concern. If they occur, the drive should simply return an
error. The OS needs to do _NOTHING_. No special re-writing to force
a re-allocation should be done - we assume the drive is going to do
that, and if it doesn't:
1. DRIVE -> BIN
2. Restore backup.
> Mr. Machek feels that we should preserve the possibility of reusing
> the bad block because in the future it might appear not to be bad. I take
> the middle road. The drive should not be discarded until errors become more
> frequent or numerous, but known bad blocks should be acted on so that those
> physical blocks should not have a chance of being used again.
You may consider that a responsible attitude towards people who are
paying for consultancy, and value their data at more than the physical
cost of the disk, but I do not.
> Suppose the block became readable when the temperature drops (this one
> didn't but I believe some can). What happens when the block becomes
> readable, and then a program writes new data to that block, and the block
> temporarily appears good? At that time it will get written and will not get
> reallocated, right? And a few milliseconds later, what? I do not want that
> block reused. I want it reallocated.
1. Monitor drive.
2. Out of spec temperature? If yes, remount R/O and page an operator.
3. Go to 1
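[Editorial aside: John's three-step loop, made concrete as one testable iteration. The 55 C limit is an assumption, as is the idea that `get_temp` would parse smartctl output and `remount_ro` would shell out to mount; none of that wiring comes from the original post.]

```python
TEMP_LIMIT_C = 55  # assumption: substitute the drive's rated maximum

def check_once(get_temp, remount_ro, page_operator, limit=TEMP_LIMIT_C):
    """One pass of the monitor loop above.  The three callables are
    injected so the policy is testable: in a real daemon get_temp() might
    parse `smartctl -A` output and remount_ro() might run
    `mount -o remount,ro` (hypothetical wirings, not shown here).
    Returns True while the drive is in spec."""
    temp = get_temp()
    if temp <= limit:
        return True
    remount_ro()                                    # step 2: protect data
    page_operator(f"drive at {temp}C, filesystem remounted read-only")
    return False
```

Step 3 ("go to 1") is then just calling check_once on a timer until it returns False.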
> And when a drive doesn't guarantee reallocation, I want the driver to remove
> the sector from the file system.
Such drives are no better in this regard than ST-506 drives in my
opinion. I have almost always started discussions with a phrase such
as, "assuming we are talking about modern drives that do their own
defect management".
John.
* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
2003-10-17 11:11 ` Norman Diamond
2003-10-17 11:45 ` Hans Reiser
2003-10-17 11:51 ` John Bradford
@ 2003-10-17 12:53 ` John Bradford
2003-10-17 13:03 ` Russell King
2003-10-19 7:50 ` Andre Hedrick
2003-10-17 13:04 ` Russell King
3 siblings, 2 replies; 61+ messages in thread
From: John Bradford @ 2003-10-17 12:53 UTC (permalink / raw)
To: Norman Diamond, Hans Reiser, Wes Janzen, Rogier Wolff; +Cc: linux-kernel
Quote from "Norman Diamond" <ndiamond@wta.att.ne.jp>:
> Now, maybe there is a technique to force it anyway. When a partition is
> newly created and is being formatted with the intention of writing data a
> few minutes later, do writes that "should" have a better chance of being
> detected. The way to start this is to simply write every block, but this is
> obviously insufficient because my block did get written shortly after the
> partition was formatted and that write didn't cause the block to be
> reallocated. So in addition to simply writing every block, also read every
> block. For each read that fails, proceed to do another write which "should"
> force reallocation.
I am just imagining how many Flash devices will be worn out
unnecessarily by any filesystem utility that does this transparently
to the user :-(.
> Russell King replied to me:
>
> > > When a drive tries to read a block, if it detects errors, it retries up
> > > to 255 times. If a retry succeeds then the block gets reallocated. IF
> > > 255 RETRIES FAIL THEN THE BLOCK DOES NOT GET REALLOCATED.
> >
> > This is perfectly reasonable. If the drive can't recover your old data
> > to reallocate it to a new block, then leaving the error present until you
> > write new data to that bad block is the correct thing to do.
I 99% agree with that. The 1% where I don't is that there may be
situations where there is no interest in doing any data recovery from
the drive, (you have backups, it is part of a RAID array, or storing
temporary data that can be re-generated whenever necessary), and also,
any read errors that occur during a S.M.A.R.T. read test should result
in a re-mapping of the block.
> > Think about what would happen if it did get reallocated. What data would
> > the drive return when requested to read the bad block?
>
> Why does it matter? The drive already reported a read failure. Maybe Linux
> programs aren't all smart enough to inform the user when a read operation
> results in an I/O error, but drivers could be smarter. I think there's
> probably a bit of room in an inode to add a flag saying that the file has
> been detected to be partially unreadable. Sorry for the digression.
> Anyway, it is 100% true that the data in that block are gone. The block
> should be reallocated and the new physical block can either be zeroed or
> randomized or anything, and that's what subsequent reads will get until the
> block gets written again.
100% agreed.
> > If the error persists during a write to the bad block, then yes, I'd
> > expect it to be reallocated at that point - but only because the drive has
> > the correct data for that block available.
>
> We agree in our moral expectations and our technical analysis that correct
> data will be available at that time. But if your word "expect" means you
> have confidence that the drive will perform correctly, I do not share your
> confidence (I think it is possible but highly unlikely that the drive did
> its job correctly during the previous write).
If the drive is not doing its job properly, DRIVE -> BIN.
> > Your description of the way Toshiba's drive works seems perfectly sane.
I disagree - we haven't confirmed what happens in the error-on-write
situation. If it does indeed always remap the block, then I'd agree
that that aspect was perfectly sane.
John.
* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
2003-10-17 12:53 ` John Bradford
@ 2003-10-17 13:03 ` Russell King
2003-10-17 13:26 ` John Bradford
2003-10-19 7:50 ` Andre Hedrick
1 sibling, 1 reply; 61+ messages in thread
From: Russell King @ 2003-10-17 13:03 UTC (permalink / raw)
To: John Bradford
Cc: Norman Diamond, Hans Reiser, Wes Janzen, Rogier Wolff, linux-kernel
On Fri, Oct 17, 2003 at 01:53:01PM +0100, John Bradford wrote:
> I disagree - we haven't confirmed what happens in the error-on-write
> situation. If it does indeed always remap the block, then I'd agree
> that that aspect was perfectly sane.
My comments were based upon the information contained within the mail
which appeared to originate from the manufacturer.
Plus, they were in *PRIVATE*. Sheesh.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core
* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
2003-10-17 13:03 ` Russell King
@ 2003-10-17 13:26 ` John Bradford
0 siblings, 0 replies; 61+ messages in thread
From: John Bradford @ 2003-10-17 13:26 UTC (permalink / raw)
To: Russell King
Cc: Norman Diamond, Hans Reiser, Wes Janzen, linux-kernel, R.E.Wolff
Quote from Russell King <rmk+lkml@arm.linux.org.uk>:
> On Fri, Oct 17, 2003 at 01:53:01PM +0100, John Bradford wrote:
> > I disagree - we haven't confirmed what happens in the error-on-write
> > situation. If it does indeed always remap the block, then I'd agree
> > that that aspect was perfectly sane.
>
> My comments were based upon the information contained within the mail
> which appeared to originate from the manufacturer.
>
> Plus, they were in *PRIVATE*. Sheesh.
Please note - _I_ only quoted what was already posted to the list as a
quote.
http://marc.theaimsgroup.com/?l=linux-kernel&m=106638956902403&w=2
John.
* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
2003-10-17 12:53 ` John Bradford
2003-10-17 13:03 ` Russell King
@ 2003-10-19 7:50 ` Andre Hedrick
1 sibling, 0 replies; 61+ messages in thread
From: Andre Hedrick @ 2003-10-19 7:50 UTC (permalink / raw)
To: John Bradford
Cc: Norman Diamond, Hans Reiser, Wes Janzen, Rogier Wolff, linux-kernel
Sheesh, glad somebody slapped down the obvious idiocy in the thread about
solid state media. Yeah, there are ways to force this, and the kernel
executes all transactions with auto retries on the opcode.
If the drive returns valid data, regardless of the ECC brute force required,
it will not reallocate, period.
Cheers,
Andre Hedrick
LAD Storage Consulting Group
On Fri, 17 Oct 2003, John Bradford wrote:
> Quote from "Norman Diamond" <ndiamond@wta.att.ne.jp>:
> > Now, maybe there is a technique to force it anyway. When a partition is
> > newly created and is being formatted with the intention of writing data a
> > few minutes later, do writes that "should" have a better chance of being
> > detected. The way to start this is to simply write every block, but this is
> > obviously insufficient because my block did get written shortly after the
> > partition was formatted and that write didn't cause the block to be
> > reallocated. So in addition to simply writing every block, also read every
> > block. For each read that fails, proceed to do another write which "should"
> > force reallocation.
>
> I am just imagining how many Flash devices will be worn out
> unnecessarily by any filesystem utility that does this transparently
> to the user :-(.
>
> > Russell King replied to me:
> >
> > > > When a drive tries to read a block, if it detects errors, it retries up
> > > > to 255 times. If a retry succeeds then the block gets reallocated. IF
> > > > 255 RETRIES FAIL THEN THE BLOCK DOES NOT GET REALLOCATED.
> > >
> > > This is perfectly reasonable. If the drive can't recover your old data
> > > to reallocate it to a new block, then leaving the error present until you
> > > write new data to that bad block is the correct thing to do.
>
> I 99% agree with that. The 1% where I don't is that there may be
> situations where there is no interest in doing any data recovery from
> the drive, (you have backups, it is part of a RAID array, or storing
> temporary data that can be re-generated whenever necessary), and also,
> any read errors that occur during a S.M.A.R.T. read test should result
> in a re-mapping of the block.
>
> > > Think about what would happen if it did get reallocated. What data would
> > > the drive return when requested to read the bad block?
> >
> > Why does it matter? The drive already reported a read failure. Maybe Linux
> > programs aren't all smart enough to inform the user when a read operation
> > results in an I/O error, but drivers could be smarter. I think there's
> > probably a bit of room in an inode to add a flag saying that the file has
> > been detected to be partially unreadable. Sorry for the digression.
> > Anyway, it is 100% true that the data in that block are gone. The block
> > should be reallocated and the new physical block can either be zeroed or
> > randomized or anything, and that's what subsequent reads will get until the
> > block gets written again.
>
> 100% agreed.
>
> > > If the error persists during a write to the bad block, then yes, I'd
> > > expect it to be reallocated at that point - but only because the drive has
> > > the correct data for that block available.
> >
> > We agree in our moral expectations and our technical analysis that correct
> > data will be available at that time. But if your word "expect" means you
> > have confidence that the drive will perform correctly, I do not share your
> > confidence (I think it is possible but highly unlikely that the drive did
> > its job correctly during the previous write).
>
> > If the drive is not doing its job properly, DRIVE -> BIN.
>
> > > Your description of the way Toshiba's drive works seems perfectly sane.
>
> I disagree - we haven't confirmed what happens in the error-on-write
> situation. If it does indeed always remap the block, then I'd agree
> that that aspect was perfectly sane.
>
> John.
* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
2003-10-17 11:11 ` Norman Diamond
` (2 preceding siblings ...)
2003-10-17 12:53 ` John Bradford
@ 2003-10-17 13:04 ` Russell King
2003-10-17 14:09 ` Norman Diamond
3 siblings, 1 reply; 61+ messages in thread
From: Russell King @ 2003-10-17 13:04 UTC (permalink / raw)
To: Norman Diamond
Cc: Hans Reiser, Wes Janzen, Rogier Wolff, John Bradford,
linux-kernel, nikita, Pavel Machek
On Fri, Oct 17, 2003 at 08:11:42PM +0900, Norman Diamond wrote:
> Russell King replied to me:
> > > When a drive tries to read a block, if it detects errors, it retries up
> > > to 255 times. If a retry succeeds then the block gets reallocated. IF
> > > 255 RETRIES FAIL THEN THE BLOCK DOES NOT GET REALLOCATED.
> >
> > This is perfectly reasonable. If the drive can't recover your old data
> > to reallocate it to a new block, then leaving the error present until you
> > write new data to that bad block is the correct thing to do.
Why the F**K are you replying to me publicly when I sent my reply in
private?
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core
* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
2003-10-17 13:04 ` Russell King
@ 2003-10-17 14:09 ` Norman Diamond
0 siblings, 0 replies; 61+ messages in thread
From: Norman Diamond @ 2003-10-17 14:09 UTC (permalink / raw)
To: Russell King
Cc: Hans Reiser, Wes Janzen, Rogier Wolff, John Bradford,
linux-kernel, nikita, Pavel Machek
This question from Russell King was public...
> On Fri, Oct 17, 2003 at 08:11:42PM +0900, Norman Diamond wrote:
> > Russell King replied to me:
> > > > When a drive tries to read a block, if it detects errors, it retries up
> > > > to 255 times. If a retry succeeds then the block gets reallocated. IF
> > > > 255 RETRIES FAIL THEN THE BLOCK DOES NOT GET REALLOCATED.
> > >
> > > This is perfectly reasonable. If the drive can't recover your old data
> > > to reallocate it to a new block, then leaving the error present until you
> > > write new data to that bad block is the correct thing to do.
>
> Why the F**K are you replying to me publicly when I sent my reply in
> private?
First to answer literally, the reasons are:
(1) Everything else in this discussion has been public, with additional
copies to individuals participating in the discussion. (The same has been
true of most messages in other LKML discussions that I've seen.)
(2) I didn't notice anything in your previous message that looked like it
needed to be kept secret, i.e. deliberately not posted publicly.
Now taking it non-literally, obviously I owe you an apology. I should not
have quoted any of your words publicly without asking you first. I am sorry
for quoting you without asking.
Now taking it intellectually, I am genuinely puzzled. Sorry to repeat, but
I didn't notice anything in your previous message that looked like it needed
to be kept secret, i.e. deliberately not posted publicly. Why was your
previous message private?
Sincerely,
Norman Diamond
* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
2003-10-17 9:40 ` Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?) Norman Diamond
2003-10-17 9:48 ` Hans Reiser
@ 2003-10-17 9:58 ` Pavel Machek
2003-10-17 10:15 ` Hans Reiser
2003-10-17 10:24 ` Rogier Wolff
` (3 subsequent siblings)
5 siblings, 1 reply; 61+ messages in thread
From: Pavel Machek @ 2003-10-17 9:58 UTC (permalink / raw)
To: Norman Diamond
Cc: Hans Reiser, Wes Janzen, Rogier Wolff, John Bradford,
linux-kernel, nikita, Pavel Machek
Hi!
> When a drive tries to read a block, if it detects errors, it retries up to
> 255 times. If a retry succeeds then the block gets reallocated. IF 255
> RETRIES FAIL THEN THE BLOCK DOES NOT GET REALLOCATED.
...
> They also said that a write operation has a chance of getting the bad block
> reallocated. The conditions for reallocation on write are similar but not
> identical to the conditions for reallocate on read. During a write
> operation if a sector is determined to be permanently bad (255 failing
> retries) then it is likely to be reallocated, unlike a read. But I'm not
> sure if this is guaranteed or not. We agreed that we should try it
> on my
Well, this behaviour makes sense.
"If we can't read this, leave it in place, perhaps we can read it in
future (when temperature drops below 80 Celsius or something)". "If we
can't write this, bad, but we can reallocate without losing
anything".
It looks slightly unexpected, but pretty sane to me. Anything else
would kill your data.
[BTW your subject made me delete the mail with "spam", until Hans
replied to it...]
Pavel
--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
2003-10-17 9:58 ` Pavel Machek
@ 2003-10-17 10:15 ` Hans Reiser
0 siblings, 0 replies; 61+ messages in thread
From: Hans Reiser @ 2003-10-17 10:15 UTC (permalink / raw)
To: Pavel Machek
Cc: Norman Diamond, Wes Janzen, Rogier Wolff, John Bradford,
linux-kernel, nikita, Pavel Machek
Pavel Machek wrote:
>
>
>[BTW your subject made me delete the mail with "spam", until Hans
>replied to it...]
> Pavel
>
>
I wonder if spam filters will eventually result in a modest reduction in
the level of hyperbole in non-spam. ;-)
--
Hans
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
2003-10-17 9:40 ` Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?) Norman Diamond
2003-10-17 9:48 ` Hans Reiser
2003-10-17 9:58 ` Pavel Machek
@ 2003-10-17 10:24 ` Rogier Wolff
2003-10-17 10:49 ` John Bradford
2003-10-17 10:37 ` ATA Defect management John Bradford
` (2 subsequent siblings)
5 siblings, 1 reply; 61+ messages in thread
From: Rogier Wolff @ 2003-10-17 10:24 UTC (permalink / raw)
To: Norman Diamond
Cc: Hans Reiser, Wes Janzen, Rogier Wolff, John Bradford,
linux-kernel, nikita, Pavel Machek
On Fri, Oct 17, 2003 at 06:40:01PM +0900, Norman Diamond wrote:
> I explained to them why the LBA sector number should still get
> reallocated even though the data are lost.
This is unbelievably bad: Sometimes it is worth it, to try and read
the block again and again. We've seen blocks getting read after we've
retried over 1000 times from "userspace". That doesn't include the
retries that the drive did for us "behind the scenes".
If you manage to convince Toshiba to remap the sector on a "bad read",
we'll never ever be able to recover the sector.
We've also been able to provide a different environment (e.g. other
ambient temperature) to a drive so that previously bad sectors could
be read.
No, the only way is to realloc on write. (but it should remember that
the data was bad, and treat the physical area with extra caution. It's
possible that something happened while writing that sector, so that
rewriting it this time will fix the problem for good, but on the other
hand, that area of the drive demonstrated the ability to lose data,
so you shouldn't trust any data to it!)
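The "retry again and again from userspace" approach can be sketched as a
small loop. This is only an illustration (the function name, device path
and sector number are assumptions, not anything specific to the drives
discussed):

```shell
#!/bin/sh
# Sketch: retry reading a single 512-byte sector from userspace, as
# described above.  On a real drive you would also want O_DIRECT reads
# (e.g. GNU dd's iflag=direct) so each retry hits the platter rather
# than the page cache.
read_sector_with_retries() {
    device="$1"; sector="$2"; max="$3"
    i=0
    while [ "$i" -lt "$max" ]; do
        if dd if="$device" of=/dev/null bs=512 skip="$sector" count=1 \
              2>/dev/null; then
            echo "recovered after $((i + 1)) attempt(s)"
            return 0
        fi
        i=$((i + 1))
    done
    echo "gave up after $max attempts"
    return 1
}

# Example: read_sector_with_retries /dev/hda 123456 1000
```

This sits on top of whatever retries the drive firmware already does per
request, which is why the effective retry count is much higher than the
loop bound.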
Roger.
--
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
**** "Linux is like a wigwam - no windows, no gates, apache inside!" ****
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
2003-10-17 10:24 ` Rogier Wolff
@ 2003-10-17 10:49 ` John Bradford
2003-10-17 11:09 ` Rogier Wolff
2003-10-17 11:24 ` Krzysztof Halasa
0 siblings, 2 replies; 61+ messages in thread
From: John Bradford @ 2003-10-17 10:49 UTC (permalink / raw)
To: Rogier Wolff, Norman Diamond; +Cc: Hans Reiser, Wes Janzen, linux-kernel
Quote from Rogier Wolff <R.E.Wolff@BitWizard.nl>:
> On Fri, Oct 17, 2003 at 06:40:01PM +0900, Norman Diamond wrote:
> > I explained to them why the LBA sector number should still get
> > reallocated even though the data are lost.
>
> This is unbelievably bad: Sometimes it is worth it, to try and read
> the block again and again. We've seen blocks getting read after we've
> retried over 1000 times from "userspace". That doesn't include the
> retries that the drive did for us "behind the scenes".
That's moving into the realm of more advanced data recovery. You
shouldn't really expect to be able to do that kind of forensics on
intelligent drives using standard filesystem system calls.
Besides, are you positive that you always got the correct data off the
disk? See the discussions about hashing algorithms - maybe the drive
simply returned data that had an additional bit flipped and wasn't
identified as bad. If you are having to try over 1000 times from
userspace, the drive is in a bad way. You shouldn't really make
the assumptions you usually do (that the error correction is good
enough to ensure bad data isn't returned as good data). If you are
recovering data from a spreadsheet, for example, the errors could go
unnoticed, but have catastrophic results.
> If you manage to convince Toshiba to remap the sector on a "bad read",
> we'll never ever be able to recover the sector.
Of course you will - it's remapped, the data isn't overwritten! You
may need more advanced tools, but you can still seek the heads to that
part of the platter and get data from the head-amp. Just because you
couldn't use your simple method anymore is no reason to argue
against fixing the problem.
> We've also been able to provide a different environment (e.g. other
> ambient temperature) to a drive so that previously bad sectors could
> be read.
>
> No, the only way is to realloc on write.
This may be more sensible, but not for the reasons you are suggesting,
and not in the way that you are suggesting. I have nothing really
against not re-allocating on read, although ideally, it should be an
option, but marking the sector as "don't touch, don't even re-map in
case we confuse the OS", after a bad read is NOT acceptable in my
opinion.
In any case, a S.M.A.R.T. test should remap all suspect sectors - if
an admin has deliberately run a S.M.A.R.T. test, I think we can assume
they know what they are doing.
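A deliberate S.M.A.R.T. surface scan of the kind mentioned above might,
with smartmontools, look roughly as follows. The helper below only
prints the smartctl invocations (a dry run), since the device path and
the exact commands are assumptions for illustration:

```shell
#!/bin/sh
# Dry-run sketch: print the smartctl commands an admin might use for a
# deliberate S.M.A.R.T. scan (assumes smartmontools; device path is
# illustrative).
smart_scan_cmds() {
    dev="$1"
    echo "smartctl -t long $dev"      # start an extended self-test
    echo "smartctl -l selftest $dev"  # read the self-test log afterwards
    echo "smartctl -A $dev"           # reallocated/pending sector counters
}

smart_scan_cmds /dev/hda
```

Watching the Reallocated_Sector_Ct and Current_Pending_Sector attributes
over time is what gives early warning before remapping space runs out.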
> (but it should remember that
> the data was bad, and treat the physical area with extra caution. It's
> possible that something happened while writing that sector, so that
> rewriting it this time will fix the problem for good, but on the other
> hand, that area of the drive demonstrated the ability to lose data,
> so you shouldn't trust any data to it!)
Suspect drive? Bin it. Do you really not value your data enough to
do that?
John.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
2003-10-17 10:49 ` John Bradford
@ 2003-10-17 11:09 ` Rogier Wolff
2003-10-17 11:24 ` Krzysztof Halasa
1 sibling, 0 replies; 61+ messages in thread
From: Rogier Wolff @ 2003-10-17 11:09 UTC (permalink / raw)
To: John Bradford
Cc: Rogier Wolff, Norman Diamond, Hans Reiser, Wes Janzen, linux-kernel
On Fri, Oct 17, 2003 at 11:49:11AM +0100, John Bradford wrote:
> Quote from Rogier Wolff <R.E.Wolff@BitWizard.nl>:
> > On Fri, Oct 17, 2003 at 06:40:01PM +0900, Norman Diamond wrote:
> > > I explained to them why the LBA sector number should still get
> > > reallocated even though the data are lost.
> >
> > This is unbelievably bad: Sometimes it is worth it, to try and read
> > the block again and again. We've seen blocks getting read after we've
> > retried over 1000 times from "userspace". That doesn't include the
> > retries that the drive did for us "behind the scenes".
>
> That's moving into the realm of more advanced data recovery. You
> shouldn't really expect to be able to do that kind of forensics on
> intelligent drives using standard filesystem system calls.
Yep. And several manufacturers have told us that they don't put any
"bypass" commands in their firmware so as a data-recovery company we
can't bypass the normal stuff. On SCSI drives we get to set the
"number of retiries" and things like that. Terribly useful. Not on ATA
drives.
> Besides, are you positive that you always got the correct data off the
> disk? See the discussions about hashing algorithms - maybe the drive
> simply returned data that had an additional bit flipped and wasn't
> identified as bad. If you are having to try over 1000 times from
> userspace, the drive is in a bad way. You shouldn't really make
> assumptions that you do usually, (that the error correction is good
> enough to ensure bad data isn't returned as good data). If you are
> recovering data from a spreadsheet, for example, the errors could go
> unnoticed, but have catastrophic results.
Yes, as an experienced data-recovery-expert I can look at the data,
and say that I believe it. And I know the risks you explain here.
> > If you manage to convince Toshiba to remap the sector on a "bad read",
> > we'll never ever be able to recover the sector.
>
> Of course you will - it's remapped, the data isn't overwritten! You
> may need more advanced tools, but you can still seek the heads to that
> part of the platter and get data from the head-amp. Just because you
> couldn't use your simple method anymore is no reason to argue
> against fixing the problem.
Nope. Even on SCSI drives there seems to be no way to tell the drive
"please give me the data for raw block XXX". We've pushed the
manufacturers for the ability to do this, but we get nowhere. Feel
free to prove us wrong. Armwaving doesn't work.
Roger.
--
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
**** "Linux is like a wigwam - no windows, no gates, apache inside!" ****
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
2003-10-17 10:49 ` John Bradford
2003-10-17 11:09 ` Rogier Wolff
@ 2003-10-17 11:24 ` Krzysztof Halasa
2003-10-17 19:35 ` John Bradford
1 sibling, 1 reply; 61+ messages in thread
From: Krzysztof Halasa @ 2003-10-17 11:24 UTC (permalink / raw)
To: John Bradford
Cc: Rogier Wolff, Norman Diamond, Hans Reiser, Wes Janzen, linux-kernel
John Bradford <john@grabjohn.com> writes:
> Besides, are you positive that you always got the correct data off the
> disk? See the discussions about hashing algorithms - maybe the drive
> simply returned data that had an additional bit flipped and wasn't
> identified as bad.
One bit? No chance. The same as with ECC RAM - one bit error will always
be detected.
> If you are having to try over 1000 times from
> userspace, the drive is in a bad way. You shouldn't really make
> assumptions that you do usually, (that the error correction is good
> enough to ensure bad data isn't returned as good data). If you are
> recovering data from a spreadsheet, for example, the errors could go
> unnoticed, but have catastrophic results.
Then you have to abandon using any hard drives. Or computers at all.
Well, mirrors (with read-and-compare) are probably good enough for you,
but it has to be done at application level.
> Of course you will - it's remapped, the data isn't overwritten! You
> may need more advanced tools,
= in practice, it's lost. Have you seen such tools?
> but you can still seek the heads to that
> part of the platter and get data from the head-amp. Just because you
> couldn't use your simple method anymore is real reason to argue
> against fixing the problem.
against _changing_ the problem (it doesn't go away), breaking things
which are now sane.
> This may be more sensible, but not for the reasons you are suggesting,
> and not in the way that you are suggesting.
Then note that a drive can be temporarily unable to read most of the
data - due to, say, incorrect supply voltage or very high level of
electromagnetic interferences.
Would you like to trash _all_ your data in such case automatically?
> Suspect drive? Bin it. Do you really not value your data enough to
> do that?
Do you really not value your data enough to mark it as inaccessible?
If it comes to non-standard recovery then you should rather go for
backups.
--
Krzysztof Halasa, B*FH
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
2003-10-17 11:24 ` Krzysztof Halasa
@ 2003-10-17 19:35 ` John Bradford
2003-10-17 23:28 ` Krzysztof Halasa
[not found] ` <m37k33igui.fsf@defiant. <m3u166vjn0.fsf@defiant.pm.waw.pl>
0 siblings, 2 replies; 61+ messages in thread
From: John Bradford @ 2003-10-17 19:35 UTC (permalink / raw)
To: Krzysztof Halasa
Cc: Rogier Wolff, Norman Diamond, Hans Reiser, Wes Janzen, linux-kernel
Quote from Krzysztof Halasa <khc@pm.waw.pl>:
> John Bradford <john@grabjohn.com> writes:
>
> > Besides, are you positive that you always got the correct data off the
> > disk? See the discussions about hashing algorithms - maybe the drive
> > simply returned data that had an additional bit flipped and wasn't
> > identified as bad.
>
> One bit? No chance. The same as with ECC RAM - one bit error will always
> be detected.
I said an _additional_ bit. I am assuming that N-1 reads returned the
same, (bad), data, which was identified as bad. Read N encountered
one too many flipped bits and returned a false positive. Perfectly
possible, and arguably more likely than all of the existing incorrect
bits flipping back, resulting in the correct data being read back, in
some cases.
> > If you are having to try over 1000 times from
> > userspace, the drive is in a bad way. You shouldn't really make
> > assumptions that you do usually, (that the error correction is good
> > enough to ensure bad data isn't returned as good data). If you are
> > recovering data from a spreadsheet, for example, the errors could go
> > unnoticed, but have catastrophic results.
>
> Then you have to abandon using any hard drives. Or computers at all.
Hardly. The point I was trying to make is that the likelihood of a
critical fault is greater when you are experiencing many non-critical
faults.
> Well, mirrors (with read-and-compare) are probably good enough for you,
> but it has to be done at application level.
>
> > Of course you will - it's remapped, the data isn't overwritten! You
> > may need more advanced tools,
>
> = in practice, it's lost. Have you seen such tools?
Tell this to the drive manufacturers. They are the ones who can sell
you a specialist firmware if you want to do data recovery, not me.
> > but you can still seek the heads to that
> > part of the platter and get data from the head-amp. Just because you
> > couldn't use your simple method anymore is real reason to argue
> > against fixing the problem.
>
> against _changing_ the problem (it doesn't go away), breaking things
> which are now sane.
Your argument is flawed - how can you claim the current situation is
sane when at least some drive manufacturers don't publish simple facts
such as what happens when defective blocks are encountered on reads
and on writes?
> > This may be more sensible, but not for the reasons you are suggesting,
> > and not in the way that you are suggesting.
>
> Then note that a drive can be temporarily unable to read most of the
> data - due to, say, incorrect supply voltage or very high level of
> electromagnetic interferences.
If a system got into a state as extreme as that, I'd generally take
the whole system down. Electromagnetic interference that affects one
drive immediately noticeably may well be affecting other components in
subtle ways - possible _silent_ data corruption, in other words.
> Would you like to trash _all_ your data in such case automatically?
Yes. Or more specifically, I wouldn't trust that data without
verifying it. It's easy to ignore such problems and say that
everything is probably OK, and maybe 99% of the time you would be
right, but so what? What about that 1%?
> > Suspect drive? Bin it. Do you really not value your data enough to
> > do that?
>
> Do you really not value your data enough to mark it as inaccessible?
Not sure what you mean - in what context?
> If it comes to non-standard recovery then you should rather go for
> backups.
Data recovery is always a last resort. On the other hand, backing up
data daily can still result in 23 hours of lost data, so I consider
early detection of faulty disks very important. Mirroring brings its
own problems to consider - more devices to possibly fail, and if they
are connected to the same controller, a serious fault with any one
could theoretically destroy all of them.
John.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
2003-10-17 19:35 ` John Bradford
@ 2003-10-17 23:28 ` Krzysztof Halasa
2003-10-18 7:42 ` Pavel Machek
2003-10-18 8:27 ` John Bradford
[not found] ` <m37k33igui.fsf@defiant. <m3u166vjn0.fsf@defiant.pm.waw.pl>
1 sibling, 2 replies; 61+ messages in thread
From: Krzysztof Halasa @ 2003-10-17 23:28 UTC (permalink / raw)
To: John Bradford
Cc: Rogier Wolff, Norman Diamond, Hans Reiser, Wes Janzen, linux-kernel
John Bradford <john@grabjohn.com> writes:
> I said an _additional_ bit. I am assuming that N-1 reads returned the
> same, (bad), data, which was identified as bad. Read N encountered
> one too many flipped bits and returned a false positive. Perfectly
> possible, and arguably more likely than all of the existing incorrect
> bits flipping back, resulting in the correct data being read back, in
> some cases.
In some cases, theoretically, yes. But I've never got anything like that
in practice.
BTW: Hard drives apparently use more sophisticated algorithms,
involving measuring head signal level even when there is no problem
reading the data, and eventually remapping a sector on read before the
information is lost.
> Tell this to the drive manufacturers. They are the ones who can sell
> you a specialist firmware if you want to do data recovery, not me.
Maybe. But, you know, it's Linux and I don't want to pay for additional
software just to use disks already paid for. Especially when it's all
working fine now.
> Your argument is flawed - how can you claim the current situation is
> sane when at least some drive manufacturers don't publish simple facts
> such as what happens when defective blocks are encountered on reads
> and on writes?
Do you think you can make them publish such things? It would be great.
> If a system got into a state as extreme as that, I'd generally take
> the whole system down. Electromagnetic interference that affects one
> drive immediately noticeably may well be affecting other components in
> subtle ways - possible _silent_ data corruption, in other words.
Possibly. Possibly the machine will immediately freeze. But data on
disk platters will probably be ok, and you'll be able to read it
when the conditions are back in specs.
> Yes. Or more specifically, I wouldn't trust that data without
> verifying it. It's easy to ignore such problems and say that
> everything is probably OK, and maybe 99% of the time you would be
> right, but so what? What about that 1%?
That's not 1% - rather something like 10^-17 or so.
See the specs.
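Treating the quoted 10^-17 as a per-bit unrecoverable-error rate (an
assumption; vendors state these figures in different units), a
back-of-envelope calculation shows how small the whole-disk risk is.
The disk size below is illustrative:

```shell
#!/bin/sh
# Back-of-envelope: probability of at least one unrecoverable error
# when reading a whole disk, assuming the quoted 10^-17 is a per-bit
# error rate.
ure_probability() {
    bytes="$1"; per_bit="$2"
    awk -v n="$bytes" -v p="$per_bit" 'BEGIN {
        bits = n * 8
        # P(>=1 error) = 1 - (1 - p)^bits ~= 1 - exp(-p * bits)
        printf "%.6f\n", 1 - exp(-p * bits)
    }'
}

# Reading 100 GB end to end at a 1e-17 per-bit error rate:
ure_probability 100000000000 1e-17   # prints 0.000008, i.e. about 8e-6
```

Nothing like 1%, in other words, though it grows linearly with the
amount of data read.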
And we have CRCs all over the place - damaged .gnumeric file will
probably fail gunzip stage.
BTW: the probability of silently corrupting, say, (D)RAM contents is
much much higher than that of corrupting HDD data. Even if you use
ECC RAM.
> > Do you really not value your data enough to mark it as inaccessible?
>
> Not sure what you mean - in what context?
Remapping a sector on read without actually copying the data makes
it inaccessible. Unless you have manufacturer-provided software, of
course, but I haven't seen any.
> Data recovery is always a last resort. On the other hand, backing up
> data daily can still result in 23 hours of lost data, so I consider
> early detection of faulty disks very important. Mirroring brings its
> own problems to consider - more devices to possibly fail, and if they
> are connected to the same controller, a serious fault with any one
> could theoretically destroy all of them.
It all depends on requirements. If you need 100% uninterrupted service
you can use mirrored servers, possibly installed in different locations.
This will fix potential problems, while remapping on failed read will
not.
--
Krzysztof Halasa, B*FH
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
2003-10-17 23:28 ` Krzysztof Halasa
@ 2003-10-18 7:42 ` Pavel Machek
2003-10-18 8:30 ` John Bradford
2003-10-18 8:27 ` John Bradford
1 sibling, 1 reply; 61+ messages in thread
From: Pavel Machek @ 2003-10-18 7:42 UTC (permalink / raw)
To: Krzysztof Halasa
Cc: John Bradford, Rogier Wolff, Norman Diamond, Hans Reiser,
Wes Janzen, linux-kernel
Hi!
> BTW: Hard drives apparently use more sophisticated algorithms,
> involving measuring head signal level even when there is no problem
> reading the data, and eventually remapping a sector on read before the
> information is lost.
>
Which means cat /dev/hda > /dev/null makes sense in
cron.weekly...
--
Pavel
Written on sharp zaurus, because my Velo1 broke. If you have Velo you don't need...
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
2003-10-18 7:42 ` Pavel Machek
@ 2003-10-18 8:30 ` John Bradford
2003-10-21 20:26 ` bill davidsen
0 siblings, 1 reply; 61+ messages in thread
From: John Bradford @ 2003-10-18 8:30 UTC (permalink / raw)
To: Pavel Machek, Krzysztof Halasa
Cc: John Bradford, Rogier Wolff, Norman Diamond, Hans Reiser,
Wes Janzen, linux-kernel
> > BTW: Hard drives apparently use more sophisticated algorithms,
> > involving measuring head signal level even when there is no problem
> > reading the data, and eventually remapping a sector on read before the
> > information is lost.
> >
>
> Which means cat /dev/hda > /dev/null makes sense in
> cron.weekly...
Indeed. Some drives can also do a timed defect scan using S.M.A.R.T.
John.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
2003-10-18 8:30 ` John Bradford
@ 2003-10-21 20:26 ` bill davidsen
0 siblings, 0 replies; 61+ messages in thread
From: bill davidsen @ 2003-10-21 20:26 UTC (permalink / raw)
To: linux-kernel
In article <200310180830.h9I8ULuc000419@81-2-122-30.bradfords.org.uk>,
John Bradford <john@grabjohn.com> wrote:
| > > BTW: Hard drives apparently use more sophisticated algorithms,
| > > involving measuring head signal level even when there is no problem
| > > reading the data, and eventually remapping a sector on read before the
| > > information is lost.
| > >
| >
| > Which means cat /dev/hda > /dev/null makes sense in
| > cron.weekly...
|
| Indeed. Some drives can also do a timed defect scan using S.M.A.R.T.
You make the point I was going to question, is the cat (dd?) better than
a S.M.A.R.T. scan? I would think that the scan would be more likely to
be doing some special error checking, like turning off one level of ECC
or similar, and might see things a normal read might not. In other
words, the difference between no uncorrectable errors and no errors.
I am thinking of something like a C2 scan on a CD, to get error
detection without error correction.
--
bill davidsen <davidsen@tmr.com>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
2003-10-17 23:28 ` Krzysztof Halasa
2003-10-18 7:42 ` Pavel Machek
@ 2003-10-18 8:27 ` John Bradford
2003-10-18 12:02 ` Krzysztof Halasa
1 sibling, 1 reply; 61+ messages in thread
From: John Bradford @ 2003-10-18 8:27 UTC (permalink / raw)
To: Krzysztof Halasa
Cc: Rogier Wolff, Norman Diamond, Hans Reiser, Wes Janzen, linux-kernel
This is off-topic for linux-kernel. Please move the discussion
elsewhere if you want to continue it.
> > I said an _additional_ bit. I am assuming that N-1 reads returned the
> > same, (bad), data, which was identified as bad. Read N encountered
> > one too many flipped bits and returned a false positive. Perfectly
> > possible, and arguably more likely than all of the existing incorrect
> > bits flipping back, resulting in the correct data being read back, in
> > some cases.
>
> In some cases, theoretically, yes. But I've never got anything like that
> in practice.
>
> BTW: Hard drives apparently use more sophisticated algorithms,
> involving measuring head signal level even when there is no problem
> reading the data, and eventually remapping a sector on read before the
> information is lost.
Yes, but some blocks on drives that are used for archiving data may
not have been read for months or even years. They may have multiple
errors which have not been detected and remapped.
This is a simplified example, but say in one particular drive, the
error correction can cope with around 25% of the bits on the platter
being incorrect, and still recover the data. If 50% of the bits are
incorrect, and you read it N-1 times and get an error, but get no
error on try N, what is more likely, that suddenly there are only 25%
incorrect bits or that 51% are now wrong, and you are getting a false
positive?
Now, I am not saying that the false positive is always more likely - a
change in temperature, or slight head movement so that it is reading
the track off-centre and getting a less corrupted signal as a result,
could make the error rate drop to 25%, but I wouldn't assume that had
happened.
> > Tell this to the drive manufacturers. They are the ones who can sell
> > you a specialist firmware if you want to do data recovery, not me.
>
> Maybe. But, you know, it's Linux and I don't want to pay for additional
> software just to use disks already paid for. Especially when it's all
> working fine now.
Well, the original firmware wasn't 'free' in any sense of the word, so
I wouldn't expect a more advanced firmware to be 'free' either.
Drive manufacturers could sell advanced firmware to data recovery
companies for a price that would pay for itself after 3-4 data
recovery jobs. Given that you could then do far more advanced
recovery than people could themselves, I am surprised this hasn't
happened before. Of course, free and open firmware would be nice in
general, but that hasn't arrived yet.
Besides, I don't think it is all working fine now. About your only
method of data recovery is to retry reading a bad block over and over
again, possibly varying things like the temperature of the drive. You
can't get the raw bits off of the platter, or accurately position the
heads off-centre from the tracks, for example.
> > Your argument is flawed - how can you claim the current situation is
> > sane when at least some drive manufacturers don't publish simple facts
> > such as what happens when defective blocks are encountered on reads
> > and on writes?
>
> Do you think you can make them publish such things? It would be great.
>
> > If a system got into a state as extreme as that, I'd generally take
> > the whole system down. Electromagnetic interference that affects one
> > drive immediately noticeably may well be affecting other components in
> > subtle ways - possible _silent_ data corruption, in other words.
>
> Possibly. Possibly the machine will immediately freeze. But data on
> disk platters will probably be ok, and you'll be able to read it
> when the conditions are back in specs.
Possibly, but look at the wider picture - data in RAM may be badly
corrupted. If you shut down the machine gracefully, that corrupted
data may get written to disk. If you force the machine off, that data
is lost. Either way, I wouldn't just turn it back on and hope for the
best. OK, if it was my own data, or there was a good reason to, (for
example, the client decides that time is more critical than data
integrity), maybe I would, but if somebody is paying for consultancy,
especially if it is at a rate that makes the cost of a hard disk
fairly insignificant, then not at least considering the possibility of
silent data corruption is irresponsible. Concluding that the risk of
data corruption is so small that it is insignificant may suffice in
some cases, but not necessarily all of them.
> > Yes. Or more specifically, I wouldn't trust that data without
> > verifying it. It's easy to ignore such problems and say that
> > everything is probably OK, and maybe 99% of the time you would be
> > right, but so what? What about that 1%?
>
> That's not 1% - rather something like 10^-17 or so.
> See the specs.
Hmmm, that sounds like you're talking about the chance of an error in
a single block.
If a machine starts showing sudden, noticeable problems because of
something like a voltage spike, I don't think you can reliably predict
what may have happened to data in RAM, including the cache on the
disk, which will presumably be flushed when you power down, unless
the disk has been put into a very confused state by the voltage
spike, or whatever else has caused the problem.
In fact, if a PSU is failing, how do you know mains voltage won't
suddenly fly through the machine? Don't claim that has never
happened!
> And we have CRCs all over the place - damaged .gnumeric file will
> probably fail gunzip stage.
Yes, but presumably you want to identify such a corrupted file _now_
instead of in 6 months time. Verifying CRCs may well be sufficient in
many cases, I am not disputing that.
> BTW: the probability of silently corrupting, say, (D)RAM contents is
> much much higher than that of corrupting HDD data. Even if you use
> ECC RAM.
In a typical machine, usually yes.
> > > Do you really not value your data enough to mark it as inaccessible?
> >
> > Not sure what you mean - in what context?
>
> Remapping a sector on read without actually copying the data makes
> it inaccessible. Unless you have manufacturer-provided software, of
> course, but I haven't seen any.
On a very busy proxy or news server, maybe you'd rather remap the
sector, write zeros to the new one, and obtain a new copy of the data
over the network, without the disk spending ages trying to recover the
data. If the disk is part of an array, and another disk got your data
for you, you might want to remap the sector immediately.
Although, to be honest, except where performance is critical, remap on
read is pointless. It saves you from having to identify the bad block
again when you write to it. Generally, guaranteed remap on write is
what I want. What happens on read is less important if your data
isn't intact. I can see your point of view for not re-mapping on read
given that advanced firmwares are not available, and the fact that it
allows you to do some form of data recovery. Overall, though, if it
gets to the point where you have to start doing such data recovery,
downtime is usually significant, and for some applications, having the
data in a week's time may be little more than useless. Predicting
possible disk failures is a good idea.
> > Data recovery is always a last resort. On the other hand, backing up
> > data daily can still result in 23 hours of lost data, so I consider
> > early detection of faulty disks very important. Mirroring brings its
> > own problems to consider - more devices to possibly fail, and if they
> > are connected to the same controller, a serious fault with any one
> > could theoretically destroy all of them.
>
> It all depends on requirements. If you need 100% uninterrupted service
> you can use mirrored servers, possibly installed in different locations.
> This will fix potential problems, while remapping on failed read will
> not.
I never actually suggested that remapping on an unrecovered failed
read would solve any data integrity problems.
I did suggest that data which was recovered automatically by the drive
on a second or subsequent read should result in a remapping of that
block.
My most important point is that writes should never fail on a good
drive. If they do, I would not use the drive for critical data
anymore. Presumably typical drive firmware will try several times to
do a write before reporting an error to the user - presumably it would
have to in case one or more replacement blocks are bad too. Maybe such
failures were temporary, caused by a voltage spike, for example, but
it would still be the case that the drive couldn't get itself back
into a good state and retry the operation, and I would be suspicious
of it being reliable in the long term.
John.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
2003-10-18 8:27 ` John Bradford
@ 2003-10-18 12:02 ` Krzysztof Halasa
2003-10-18 16:26 ` Nuno Silva
0 siblings, 1 reply; 61+ messages in thread
From: Krzysztof Halasa @ 2003-10-18 12:02 UTC (permalink / raw)
To: John Bradford
Cc: Rogier Wolff, Norman Diamond, Hans Reiser, Wes Janzen, linux-kernel
John Bradford <john@grabjohn.com> writes:
> Although, to be honest, except where performance is critical, remap on
> read is pointless. It saves you from having to identify the bad block
> again when you write to it. Generally, guaranteed remap on write is
> what I want.
Then I think we have an agreement.
> I did suggest that data which was recovered automatically by the drive
> on a second or subsequent read should result in a remapping of that
> block.
AFAIK this is what the drives do.
> My most important point is that writes should never fail on a good
> drive.
That's certainly what the drives do. Unless they are out of spare
sectors, of course.
Doing cat /dev/zero > /dev/hd* fixes all bad sectors on a modern drive.
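[Editor's note: zeroing the whole drive is the sledgehammer version; the
same remap-on-write effect can be had by rewriting only the suspect sector
with dd. A sketch, demonstrated on a scratch image file — the image path
and sector number are hypothetical; on real hardware the target would be
the device node and the LBA reported in the kernel log:]

```shell
#!/bin/sh
# Sketch: force a rewrite of one suspect 512-byte sector instead of
# zeroing the whole drive.  Demonstrated on a scratch image file.
IMG=/tmp/scratch.img
BAD=100                 # suspect sector number (illustrative)

dd if=/dev/zero of=$IMG bs=512 count=2048 2>/dev/null   # 1 MB scratch "disk"
printf 'old data' | dd of=$IMG bs=512 seek=$BAD conv=notrunc 2>/dev/null

# Rewrite only sector $BAD; conv=notrunc leaves the rest of the device
# untouched.  On a real drive this write is what gives the firmware its
# chance to reallocate the sector.
dd if=/dev/zero of=$IMG bs=512 seek=$BAD count=1 conv=notrunc 2>/dev/null

# Verify the sector now reads back as all zeros.
left=$(dd if=$IMG bs=512 skip=$BAD count=1 2>/dev/null | tr -d '\0' | wc -c)
echo "non-zero bytes left in sector $BAD: $left"
```

Unlike the whole-drive cat, this preserves every other sector, so it can be
used on a live filesystem once you know which block is bad.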
--
Krzysztof Halasa, B*FH
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
2003-10-18 12:02 ` Krzysztof Halasa
@ 2003-10-18 16:26 ` Nuno Silva
2003-10-18 20:16 ` Krzysztof Halasa
0 siblings, 1 reply; 61+ messages in thread
From: Nuno Silva @ 2003-10-18 16:26 UTC (permalink / raw)
To: linux-kernel
Krzysztof Halasa wrote:
[..snip..]
>
> Doing cat /dev/zero > /dev/hd* fixes all bad sectors on modern drive.
Yeah! I'm doing this right now because the data in hda is very important
and I haven't done backups since August!! :-D
Regards,
Nuno Silva
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
2003-10-18 16:26 ` Nuno Silva
@ 2003-10-18 20:16 ` Krzysztof Halasa
0 siblings, 0 replies; 61+ messages in thread
From: Krzysztof Halasa @ 2003-10-18 20:16 UTC (permalink / raw)
To: Nuno Silva; +Cc: linux-kernel
Nuno Silva <nuno.silva@vgertech.com> writes:
> > Doing cat /dev/zero > /dev/hd* fixes all bad sectors on modern drive.
>
> Yeah! I'm doing this right now because the data in hda is very
> important and I haven't done backups since August!! :-D
Aaah right... August - which year exactly? :-)
(Just in case someone wants to try this on a live disk - it erases all
data in the process).
--
Krzysztof Halasa, B*FH
^ permalink raw reply [flat|nested] 61+ messages in thread
[parent not found: <m37k33igui.fsf@defiant. <m3u166vjn0.fsf@defiant.pm.waw.pl>]
* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
[not found] ` <m37k33igui.fsf@defiant. <m3u166vjn0.fsf@defiant.pm.waw.pl>
@ 2003-10-21 20:39 ` bill davidsen
0 siblings, 0 replies; 61+ messages in thread
From: bill davidsen @ 2003-10-21 20:39 UTC (permalink / raw)
To: linux-kernel
In article <m3u166vjn0.fsf@defiant.pm.waw.pl>,
Krzysztof Halasa <khc@pm.waw.pl> wrote:
| John Bradford <john@grabjohn.com> writes:
|
| > My most important point is that writes should never fail on a good
| > drive.
|
| That's certainly what the drives do. Unless they are out of spare
| sectors, of course.
|
| Doing cat /dev/zero > /dev/hd* fixes all bad sectors on modern drive.
Flash from the past, back in the days of MFM drives, and "new" RLL
controllers, we wrote software which regularly read all the data off a
track with appropriate retries, reformatted the track, wrote the data,
and read it back to verify. This was because of "sector walk", which made
the sectors move relative to the IRG. And we wrote our own device
drivers to use large sectors to get more capacity, those were the days.
However, that's the kind of thing I would hope S.M.A.R.T. could do, with
relocation of course.
--
bill davidsen <davidsen@tmr.com>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
^ permalink raw reply [flat|nested] 61+ messages in thread
* ATA Defect management
2003-10-17 9:40 ` Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?) Norman Diamond
` (2 preceding siblings ...)
2003-10-17 10:24 ` Rogier Wolff
@ 2003-10-17 10:37 ` John Bradford
2003-10-21 20:44 ` bill davidsen
2003-10-17 12:08 ` Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?) Justin Cormack
2003-10-21 20:12 ` bill davidsen
5 siblings, 1 reply; 61+ messages in thread
From: John Bradford @ 2003-10-17 10:37 UTC (permalink / raw)
To: Norman Diamond, Hans Reiser, Wes Janzen, Rogier Wolff
Cc: eric_mudama, linux-kernel, john
[Note to Eric, who is CC'ed, can you comment on how Maxtor drives
handle these issues?]
Quote from "Norman Diamond" <ndiamond@wta.att.ne.jp>:
> Friends in the disk drive section at Toshiba said this:
>
> When a drive tries to read a block, if it detects errors, it retries up to
> 255 times. If a retry succeeds then the block gets reallocated. IF 255
> RETRIES FAIL THEN THE BLOCK DOES NOT GET REALLOCATED.
OK, this is interesting, at least we have some specific information.
> This was so unbelievable to that I had to confirm this with them in
> different words. In case of a temporary error, the drive provides the
> recovered data as the result of the read operation and the drive writes the
> data to a reallocated sector. In case of a permanent error, the block is
> assumed bad, and of course the data are lost. Since the data are assumed
> lost, the drive keeps the defective LBA sector number associated with the
> same defective physical block and it does not reallocate the defective
> block.
OK, so for a stupid, backward, legacy OS that takes the 'what is the
point of substituting a spare block if you have nothing to write to
it' viewpoint, maybe that would make sense - for the rest of us it's
stupid. Did anybody actually consider that responsible admins make
backups, and want to restore them onto the disk without having to
concern themselves with defect management, which, shock, horror, is
supposed to be done by the drive? Of course, the poor admin who made
a sector-by-sector backup is completely out of luck when he comes to
restore it onto a drive that insists one sector is bad.
> I explained to them why the LBA sector number should still get reallocated
> even though the data are lost. When the sector isn't reallocated, I could
> repartition the drive and reformat the partition and the OS wouldn't know
> about the defective block so the OS would try again to use it. At first
> they did not believe I could do this, but I explained to them that I'm still
> able to delete partitions and create new partitions etc., and then they
> understood.
>
> They also said that a write operation has a chance of getting the bad block
> reallocated. The conditions for reallocation on write are similar but not
> identical to the conditions for reallocate on read. During a write
> operation if a sector is determined to be permanently bad (255 failing
> retries) then it is likely to be reallocated, unlike a read. But I'm not
> sure if this is guaranteed or not.
No, I'm sorry, are we to believe that it might or might not get
re-allocated just by chance? This is ridiculous.
> We agreed that we should try it on my
> bad sector, but if the drive again detects a permanent error then it will
> not reallocate the sector. First I still want to find which file contains
> the sector; I haven't had time for this on weekdays.
>
> When I ran the "long" S.M.A.R.T. self-test, the number of reallocated
> sectors and number of reallocation events both increased from 1 to 2, but
> the known bad sector remained bad. This is entirely because of the behavior
> as designed. The self-test detected a temporary error in some other
> unrelated sector, rescued the data in that unreported sector number, and
> reallocated it. That was only a coincidence. The known bad sector was
> detected yet again as permanently bad and was not reallocated.
Even though you are _deliberately_ running a self test to check for
this kind of problem?
> In this mailing list there has been some discussion of whether file systems
> should keep lists of known bad blocks and hide those bad blocks from
> ordinary operations in ordinary usage. Of course historically this was
> always necessary. As someone else mentioned, and I've done it too, when
> formatting a disk drive, type in the list of known bad block numbers that
> were printed on a piece of paper that came with the drive.
>
> In modern times, some people think that this shouldn't be necessary because
> the drive already does its best to reallocate bad blocks. WRONG. THE BAD
> BLOCK LIST REMAINS AS NECESSARY AS IT ALWAYS WAS.
I made that claim, and stand by it.
Note one thing:
If you are right, you are basically suggesting that we will have to go
back to writing defective sectors on a sticker on the drive casing.
If you do a:
dd if=/dev/zero of=/dev/hda
you lose that bad block list. Now, you've got to enter it in again,
or let the OS scan the disk surface and find the bad sectors. Hello,
this is the third millennium. This may have been a way of life twenty
years ago, but I hope we have moved on from there.
Oh, and what happens if block zero is defective, eh? The disk is no
longer usable as a boot disk, because the MBR can't be written to
block zero?
What if I want to use my disk for storing a TAR archive? Why should
we bloat TAR with bad block support?
> This design might change in the future, but it might not. My friends are
> afraid that they might lose their jobs if they try to suggest such a change
> in the high-level design of disk drive corporate politics. I only hope this
> posting doesn't get them fired. (This is not a frivolous concern by the
> way. The myth of lifetime employment is a less pervasive myth than it used
> to be, and Toshiba is pretty much average in both world and Japanese
> standards for corporate politics.)
If anything, it should get them a promotion. If somebody from Toshiba
wants to discuss defect management with me, they are welcome to, I'll
waive my consultancy fees, (at least initially).
> Regarding finding which file contains the known bad sector, someone in this
> mailing list said that the badblocks program could help, but the manual page
> for the badblocks program doesn't give any clues as to how it would help.
> I'm still doing find of all files in the partition and cp them to /dev/null.
>
> Meanwhile, yes we do need to record those bad block lists and try to never
> let them get allocated to user-visible files.
NO. Fix the drives. If nobody is going to do that, I might as well
join the Linux-VAX project and run by business on a cluster of
11/780s.
Let me make this clear - some of us earn a living providing solutions
to clients who pay good money for that consultancy. If they lose
data, have downtime or have any other problems, my clients will
generally come back to _ME_ for an explanation, and I want something
better than, "Well, that's the way the drives work".
We have identified a problem, now let's fix it.
Defect management needs to be done by the disk firmware, and it needs
to be done properly.
Note - this is not a criticism of Toshiba, nor am I implying that it
is in any way limited to their products. I am grateful for them
providing information on the subject. I own two of their laptops
which run Linux perfectly, and I am generally pleased with their
products.
Note also - I realise that the defect management techniques you
describe don't actually seem to allow data to be written to a bad
sector undetected. A permanently bad sector apparently won't become
'apparently good, but subtly bad', and lose data over time, but that
is not the point. With write caching in the OS, data could be
allocated to an undetected-at-the-OS-level bad sector, and cause
problems when it is written out. With the recent laptop mode patch we
are going to see more delayed writes going on.
John.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: ATA Defect management
2003-10-17 10:37 ` ATA Defect management John Bradford
@ 2003-10-21 20:44 ` bill davidsen
0 siblings, 0 replies; 61+ messages in thread
From: bill davidsen @ 2003-10-21 20:44 UTC (permalink / raw)
To: linux-kernel
In article <200310171037.h9HAbOrv000559@81-2-122-30.bradfords.org.uk>,
John Bradford <john@grabjohn.com> wrote:
| [Note to Eric, who is CC'ed, can you comment on how Maxtor drives
| handle these issues?]
|
| Quote from "Norman Diamond" <ndiamond@wta.att.ne.jp>:
| > Friends in the disk drive section at Toshiba said this:
| >
| > When a drive tries to read a block, if it detects errors, it retries up to
| > 255 times. If a retry succeeds then the block gets reallocated. IF 255
| > RETRIES FAIL THEN THE BLOCK DOES NOT GET REALLOCATED.
|
| OK, this is interesting, at least we have some specific information.
|
| > This was so unbelievable to that I had to confirm this with them in
| > different words. In case of a temporary error, the drive provides the
| > recovered data as the result of the read operation and the drive writes the
| > data to a reallocated sector. In case of a permanent error, the block is
| > assumed bad, and of course the data are lost. Since the data are assumed
| > lost, the drive keeps the defective LBA sector number associated with the
| > same defective physical block and it does not reallocate the defective
| > block.
Not so. Assuming the admin is restoring to the same bad drive (the
twit!), since the drive does do relocate on write, the recovery will
work, the data will be whole, and life will be good.
I'm not sure why one would do a by-sector backup, but I guess for some
filesystems or raw database info it might be useful.
--
bill davidsen <davidsen@tmr.com>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
2003-10-17 9:40 ` Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?) Norman Diamond
` (3 preceding siblings ...)
2003-10-17 10:37 ` ATA Defect management John Bradford
@ 2003-10-17 12:08 ` Justin Cormack
2003-10-21 20:12 ` bill davidsen
5 siblings, 0 replies; 61+ messages in thread
From: Justin Cormack @ 2003-10-17 12:08 UTC (permalink / raw)
To: Norman Diamond
Cc: Hans Reiser, Wes Janzen, Rogier Wolff, John Bradford,
Kernel mailing list, nikita, Pavel Machek
On Fri, 2003-10-17 at 10:40, Norman Diamond wrote:
> Friends in the disk drive section at Toshiba said this:
>
> When a drive tries to read a block, if it detects errors, it retries up to
> 255 times. If a retry succeeds then the block gets reallocated. IF 255
> RETRIES FAIL THEN THE BLOCK DOES NOT GET REALLOCATED.
>
> This was so unbelievable to that I had to confirm this with them in
> different words. In case of a temporary error, the drive provides the
> recovered data as the result of the read operation and the drive writes the
> data to a reallocated sector. In case of a permanent error, the block is
> assumed bad, and of course the data are lost. Since the data are assumed
> lost, the drive keeps the defective LBA sector number associated with the
> same defective physical block and it does not reallocate the defective
> block.
>
> I explained to them why the LBA sector number should still get reallocated
> even though the data are lost. When the sector isn't reallocated, I could
> repartition the drive and reformat the partition and the OS wouldn't know
> about the defective block so the OS would try again to use it. At first
> they did not believe I could do this, but I explained to them that I'm still
> able to delete partitions and create new partitions etc., and then they
> understood.
>
> They also said that a write operation has a chance of getting the bad block
> reallocated. The conditions for reallocation on write are similar but not
> identical to the conditions for reallocate on read. During a write
> operation if a sector is determined to be permanently bad (255 failing
> retries) then it is likely to be reallocated, unlike a read. But I'm not
> sure if this is guaranteed or not. We agreed that we should try it on my
> bad sector, but if the drive again detects a permantent error then it will
> not reallocate the sector. First I still want to find which file contains
> the sector; I haven't had time for this on weekdays.
>
I have found that in the case of blocks that won't reallocate with
reads, a sufficiently large number of reads and writes will fix them
eventually, by reallocating. The behaviour doesn't seem entirely
predictable (but then failure modes often aren't), but given time it
is possible to do.
> In this mailing list there has been some discussion of whether file systems
> should keep lists of known bad blocks and hide those bad blocks from
> ordinary operations in ordinary usage. Of course historically this was
> always necessary. As someone else mentioned, and I've done it too, when
> formatting a disk drive, type in the list of known bad block numbers that
> were printed on a piece of paper that came with the drive.
>
This really isn't going to work with swap partitions and suchlike. If
you can't get rid of a bad sector with reads and writes on badblocks,
smart tests and the manufacturer's low level format, then it is
defective and you should discard it or return it under warranty. The
bad blocks list is really not needed. If you really want to do it,
use dm to remap the raw device without the bad blocks, then you can
still use it with filesystems without badblocks support (eg swap,
raid etc). The device mapping stuff should have no trouble with this.
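[Editor's note: the dm suggestion can be sketched as a tiny table
generator for device-mapper's "linear" target. The device name and sector
numbers below are hypothetical; on a real system the output would be fed
to `dmsetup create`, which needs root:]

```shell
#!/bin/sh
# Sketch: build a device-mapper "linear" table that presents a device
# minus one bad sector.  Real use:  emit_dm_table ... | dmsetup create nobad
emit_dm_table() {
    total=$1; bad=$2; dev=$3
    # Logical sectors 0..bad-1 map straight through to the device.
    echo "0 $bad linear $dev 0"
    # The remaining logical sectors skip over the bad physical sector.
    echo "$bad $((total - bad - 1)) linear $dev $((bad + 1))"
}

# Hypothetical 2000-sector device with sector 1000 bad:
table=$(emit_dm_table 2000 1000 /dev/hdb)
echo "$table"
```

The resulting mapped device is one sector smaller than the original and
never touches the bad sector, so swap, raid, or any filesystem can sit on
top of it without badblocks support.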
> Regarding finding which file contains the known bad sector, someone in this
> mailing list said that the badblocks program could help, but the manual page
> for the badblocks program doesn't give any clues as to how it would help.
> I'm still doing find of all files in the partition and cp them to /dev/null.
Use the read test on badblocks to find the sector, then use the write
tests to overwrite it until the bad block is fixed, then fsck your
partition. If you get errors then the block was metadata. Otherwise
md5sum your files and check against the backups. That will tell you
the file... For ext2 I believe there are some tools; for other file
systems it might be more difficult.
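[Editor's note: the md5sum-against-backups step above can be sketched as
follows. The directories here are scratch stand-ins — hypothetical paths,
with one file deliberately "corrupted" to show the diff picking it out:]

```shell
#!/bin/sh
# Sketch: checksum the live tree and the backup tree, then diff the
# lists to find which file covers the damaged sector.
LIVE=/tmp/live; BACKUP=/tmp/backup
rm -rf $LIVE $BACKUP
mkdir -p $LIVE $BACKUP
echo "intact"    > $LIVE/a.txt;  echo "intact"   > $BACKUP/a.txt
echo "corrupted" > $LIVE/b.txt;  echo "pristine" > $BACKUP/b.txt

# Checksum both trees with matching relative names, then diff the lists.
( cd $LIVE   && md5sum a.txt b.txt ) > /tmp/live.md5
( cd $BACKUP && md5sum a.txt b.txt ) > /tmp/backup.md5

# Lines starting with '<' are live-side entries whose hash changed;
# field 3 of an md5sum line is the filename.
damaged=$(diff /tmp/live.md5 /tmp/backup.md5 | awk '/^</ { print $3 }')
echo "damaged file: $damaged"
```

On a real partition you would run this over the whole tree (e.g. with
find piped to md5sum) after the overwrite-and-fsck steps described above.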
Justin
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?)
2003-10-17 9:40 ` Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?) Norman Diamond
` (4 preceding siblings ...)
2003-10-17 12:08 ` Blockbusting news, this is important (Re: Why are bad disk sectors numbered strangely, and what happens to them?) Justin Cormack
@ 2003-10-21 20:12 ` bill davidsen
5 siblings, 0 replies; 61+ messages in thread
From: bill davidsen @ 2003-10-21 20:12 UTC (permalink / raw)
To: linux-kernel
In article <11bf01c39492$bc5307c0$3eee4ca5@DIAMONDLX60>,
Norman Diamond <ndiamond@wta.att.ne.jp> wrote:
| Friends in the disk drive section at Toshiba said this:
|
| When a drive tries to read a block, if it detects errors, it retries up to
| 255 times. If a retry succeeds then the block gets reallocated. IF 255
| RETRIES FAIL THEN THE BLOCK DOES NOT GET REALLOCATED.
|
| This was so unbelievable to that I had to confirm this with them in
| different words. In case of a temporary error, the drive provides the
| recovered data as the result of the read operation and the drive writes the
| data to a reallocated sector. In case of a permanent error, the block is
| assumed bad, and of course the data are lost. Since the data are assumed
| lost, the drive keeps the defective LBA sector number associated with the
| same defective physical block and it does not reallocate the defective
| block.
Sounds right to me. If you relocate the LBA sector then on retry I will
(a) read {something} without error, and (b) it will NOT be my data, and
(c) I will not get back an error to tell me I am reading crap. In other
words, to do anything else would result in my silently getting back bad
data!
What should be done is to relocate after successful retry or after
unsuccessful write, because in both cases the drive has valid data to
relocate.
Blockbusting news, I think they're doing it just right. The object is
not to do a read and get no error, the object is to read and get
correct data, and if that doesn't happen, let the controller, OS, or
application know about it and decide what to do then.
--
bill davidsen <davidsen@tmr.com>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
^ permalink raw reply [flat|nested] 61+ messages in thread