* Why not just return an error?
@ 2016-10-06 23:32 Dark Penguin
  2016-10-07  5:26 ` keld
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Dark Penguin @ 2016-10-06 23:32 UTC (permalink / raw)
  To: linux-raid

Greetings!

The more I read about md-raid, the more I notice its biggest problem: 
if you hit an error on a degraded RAID, it falls apart. Because of 
this, it is possible to lose a huge amount of data due to one tiny 
read error, which makes raid5 in particular a sword of Damocles.

But one question keeps me increasingly frustrated. Yes, during normal 
operation it totally makes sense to kick a faulty device out of an 
array. But if we're running a degraded array, and doing so will 
definitely result in massive data loss, why not just return a read 
error instead? Just add a little check: on error, if degraded -> then 
just return an error. I believe this is the dream of everyone who has 
ever dealt with RAIDs.

With RAID, the first priority is keeping data safe. Yes, it's not an 
alternative to backups and all that, but still - if we hit an error on 
a degraded array, the array should scream and panic and send all kinds 
of warnings, but definitely NOT collapse and warrant a visit to the 
RAID recovery laboratory (or this mailing list). Imagine how much 
headache and lost hair that would save!..

Now, I'm probably not the first one to think of such a bright idea. So 
there must be a very good reason why this is not possible; I don't 
think the problem is just that "the existing behaviour is preferred, 
and anyone who does not agree is an idiot". If not for enterprise use, 
then at least it would be very useful for the "home archive" scenario, 
where "uptime" and "absence of errors" matter much less than "losing 
one file rather than all the data". So, why is this not possible?..


-- 
darkpenguin


* Re: Why not just return an error?
  2016-10-06 23:32 Why not just return an error? Dark Penguin
@ 2016-10-07  5:26 ` keld
  2016-10-07  8:21   ` Rudy Zijlstra
  2016-10-07 11:21 ` Andreas Klauer
  2016-10-07 14:19 ` Phil Turmel
  2 siblings, 1 reply; 21+ messages in thread
From: keld @ 2016-10-07  5:26 UTC (permalink / raw)
  To: Dark Penguin; +Cc: linux-raid

On Fri, Oct 07, 2016 at 02:32:40AM +0300, Dark Penguin wrote:
> Greetings!
> 
> The more I read about md-raid, the more I notice that the biggest 
> problem of it: if you hit an error on a degraded RAID, it falls apart. 
> Because of this, it is possible to lose a huge amount of data due to one 
> tiny read error, which particularly makes raid5 the sword of Damocles.
> 
> But one question keeps me increasingly frustrated. Yes, during its 
> normal functioning, it totally makes sense to kick a faulty device out 
> of an array. But if we're running a degraded array, and doing so will 
> definitely result in massive data loss, why not just return a read error 
> instead? Just add a little check: on error, if degraded -> then just 
> return an error. I believe this is the dream of everyone who has ever 
> dealt with RAIDs.
> 
> With RAID, the first priority is keeping data safe. Yes, it's not an 
> alternative to backups and all that, but still - if we hit an error on a 
> degraded array, the array should scream and panic and send all kinds of 
> warnings, but definitely NOT collapse and warrant a visit to the RAID 
> recovery laboratory (or this mailing list). Imagine how much headache 
> and lost hair would that relieve!..
> 
> Now, I'm probably not the first one to think of such a bright idea. So 
> there must be a very good reason why this is not possible; I don't think 
> the problem is just that "the existing behaviour is preferred, and 
> anyone who does not agree is an idiot". If not for enterprise use, then 
> at least it would be very useful for the "home archive" scenario when 
> "uptime" and "absense of errors" hold much less meaning than "losing one 
> file and not all the data". So, why is this not possible?..

Likewise, when the first disk fails, one could mark it as kind of in an error state,
and keep it running, and if one gets a read error, then you could get
the data from the good disks.

Often read errors can be remedied by writing data to the failing disk.
The good data could then be obtained from the good parts of the array.

This behaviour could be optional and could even be set during operation.

Best regards
keld


* Re: Why not just return an error?
  2016-10-07  5:26 ` keld
@ 2016-10-07  8:21   ` Rudy Zijlstra
  2016-10-07  9:30     ` keld
  0 siblings, 1 reply; 21+ messages in thread
From: Rudy Zijlstra @ 2016-10-07  8:21 UTC (permalink / raw)
  To: keld, Dark Penguin; +Cc: linux-raid



On 07-10-16 at 07:26, keld@keldix.com wrote:
> On Fri, Oct 07, 2016 at 02:32:40AM +0300, Dark Penguin wrote:
>> Greetings!
>>
>> The more I read about md-raid, the more I notice that the biggest
>> problem of it: if you hit an error on a degraded RAID, it falls apart.
>> Because of this, it is possible to lose a huge amount of data due to one
>> tiny read error, which particularly makes raid5 the sword of Damocles.
>>
>> But one question keeps me increasingly frustrated. Yes, during its
>> normal functioning, it totally makes sense to kick a faulty device out
>> of an array. But if we're running a degraded array, and doing so will
>> definitely result in massive data loss, why not just return a read error
>> instead? Just add a little check: on error, if degraded -> then just
>> return an error. I believe this is the dream of everyone who has ever
>> dealt with RAIDs.
>>
>> With RAID, the first priority is keeping data safe. Yes, it's not an
>> alternative to backups and all that, but still - if we hit an error on a
>> degraded array, the array should scream and panic and send all kinds of
>> warnings, but definitely NOT collapse and warrant a visit to the RAID
>> recovery laboratory (or this mailing list). Imagine how much headache
>> and lost hair would that relieve!..
>>
>> Now, I'm probably not the first one to think of such a bright idea. So
>> there must be a very good reason why this is not possible; I don't think
>> the problem is just that "the existing behaviour is preferred, and
>> anyone who does not agree is an idiot". If not for enterprise use, then
>> at least it would be very useful for the "home archive" scenario when
>> "uptime" and "absense of errors" hold much less meaning than "losing one
>> file and not all the data". So, why is this not possible?..
> Likewise, when the first disk fails, one could mark it as kind of in an error state,
> and keep it running, and if one gets a read error, then you could get
> the data from the good disks.
>
> Often read errors can be remedied by writing data to the failing disk.
> The good data could then be obtained from the good parts of the array.
>
> This behaviour could be optional and could even be set during operation.
>
> Best regards
> keld

One big reason is human behaviour. And it is human behaviour that in the 
end causes all the collapsed raids. I have lost count of how often I have 
seen requests for help once the raid had collapsed, while the earlier 
signal, where the RAID had become degraded, was ignored. This means that 
if you only give an error message and keep going, you will -- most 
likely at an increasing rate -- get errors in the files. Very quickly it 
will become impossible to say which file is correct and which is not. 
Essentially, at that point you have lost all the information, with NO 
ability to recover. Unless you have a backup....

That is one of the big reasons the behaviour is as it is. RAID is 
intended to guarantee the consistency and correctness of the stored 
data. When this becomes impossible, the only way out is to signal this 
clearly. Even a collapsed RAID has more consistent data (although 
it takes effort to recover) than a corrupted RAID, which would be the 
result of your proposal. The corruption resulting from your proposal 
above CANNOT be recovered.


Cheers

Rudy


* Re: Why not just return an error?
  2016-10-07  8:21   ` Rudy Zijlstra
@ 2016-10-07  9:30     ` keld
  0 siblings, 0 replies; 21+ messages in thread
From: keld @ 2016-10-07  9:30 UTC (permalink / raw)
  To: Rudy Zijlstra; +Cc: Dark Penguin, linux-raid

On Fri, Oct 07, 2016 at 10:21:26AM +0200, Rudy Zijlstra wrote:
> 
> 
> On 07-10-16 at 07:26, keld@keldix.com wrote:
> >On Fri, Oct 07, 2016 at 02:32:40AM +0300, Dark Penguin wrote:
> >>Greetings!
> >>
> >>The more I read about md-raid, the more I notice that the biggest
> >>problem of it: if you hit an error on a degraded RAID, it falls apart.
> >>Because of this, it is possible to lose a huge amount of data due to one
> >>tiny read error, which particularly makes raid5 the sword of Damocles.
> >>
> >>But one question keeps me increasingly frustrated. Yes, during its
> >>normal functioning, it totally makes sense to kick a faulty device out
> >>of an array. But if we're running a degraded array, and doing so will
> >>definitely result in massive data loss, why not just return a read error
> >>instead? Just add a little check: on error, if degraded -> then just
> >>return an error. I believe this is the dream of everyone who has ever
> >>dealt with RAIDs.
> >>
> >>With RAID, the first priority is keeping data safe. Yes, it's not an
> >>alternative to backups and all that, but still - if we hit an error on a
> >>degraded array, the array should scream and panic and send all kinds of
> >>warnings, but definitely NOT collapse and warrant a visit to the RAID
> >>recovery laboratory (or this mailing list). Imagine how much headache
> >>and lost hair would that relieve!..
> >>
> >>Now, I'm probably not the first one to think of such a bright idea. So
> >>there must be a very good reason why this is not possible; I don't think
> >>the problem is just that "the existing behaviour is preferred, and
> >>anyone who does not agree is an idiot". If not for enterprise use, then
> >>at least it would be very useful for the "home archive" scenario when
> >>"uptime" and "absense of errors" hold much less meaning than "losing one
> >>file and not all the data". So, why is this not possible?..
> >Likewise, when the first disk fails, one could mark it as kind of in an 
> >error state,
> >and keep it running, and if one gets a read error, then you could get
> >the data from the good disks.
> >
> >Often read errors can be remedied by writing data to the failing disk.
> >The good data could then be obtained from the good parts of the array.
> >
> >This behaviour could be optional and could even be set during operation.
> >
> >Best regards
> >keld
> 
> One big reason is human behaviour. And it is human behaviour that in the 
> end causes all the collapsed raids. I have lost count of how often I have 
> seen requests for help once the raid had collapsed, while the earlier 
> signal, where the RAID had become degraded, was ignored. This means that 
> if you only give an error message and keep going, you will -- most 
> likely at an increasing rate -- get errors in the files. Very quickly it 
> will become impossible to say which file is correct and which is not. 
> Essentially, at that point you have lost all the information, with NO 
> ability to recover. Unless you have a backup....
> 
> That is one of the big reasons the behaviour is as it is. RAID is 
> intended to guarantee the consistency and correctness of the stored 
> data. When this becomes impossible, the only way out is to signal this 
> clearly. Even a collapsed RAID has more consistent data (although 
> it takes effort to recover) than a corrupted RAID, which would be the 
> result of your proposal. The corruption resulting from your proposal 
> above CANNOT be recovered.

I believe you are incorrect. As long as it is marked which parts of the array
are in error, we know which data is good.  Of course some data may be unobtainable,
but that may well be just a few files, and the rest will be good. A much better result
than all data being lost!

Anyway, this could be an optional feature, so that it can be chosen or not.

Best regards
keld


* Re: Why not just return an error?
  2016-10-06 23:32 Why not just return an error? Dark Penguin
  2016-10-07  5:26 ` keld
@ 2016-10-07 11:21 ` Andreas Klauer
  2016-10-07 14:43   ` Phil Turmel
  2016-10-07 14:19 ` Phil Turmel
  2 siblings, 1 reply; 21+ messages in thread
From: Andreas Klauer @ 2016-10-07 11:21 UTC (permalink / raw)
  To: Dark Penguin; +Cc: linux-raid

On Fri, Oct 07, 2016 at 02:32:40AM +0300, Dark Penguin wrote:
> why not just return a read error instead?

You make it sound like it solves all problems, but it does not.
Errors are just not part of the concept anywhere really.

If a filesystem encounters one, it might flip into read only mode;
if a program encounters one it might do whatever.
You still have a huge data loss, corrupt databases, et cetera.

Even so, is that not what you have with "bad block log" enabled, 
within reason? I disable it everywhere. I want my disks kicked.
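
For reference, the per-member list can be inspected, and dropped at
assembly time, roughly like this (device names are placeholders):

  mdadm --examine-badblocks /dev/sdc1       # sectors md has given up on
  # drop the list at assembly time (force-no-bbl if it isn't empty)
  mdadm --assemble /dev/md0 --update=no-bbl /dev/sd[abc]1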

Using cosmetics to hide errors only works to a certain limit. 
In the end, RAID only works if the disks work. RAID 5 with 
two dead disks is dead, no way to get around that. Disks go bad 
and need to be replaced, if you don't do that, you'll just fail 
even more horribly later on.

> I believe this is the dream of everyone who has ever 
> dealt with RAIDs.

My dream is different. I don't want errors. I want it to work. ;)
And it does, as long as you make sure your disks are healthy.

And if you make every effort to keep broken disks in your arrays, 
it just won't work. All promises are off - RAID promises to survive 
one or two dead disks, but that's only if all other disks are in 
perfect working order for the time it takes to rebuild.

Your disk produces read errors, or needs 3 minutes to read a single sector, 
what use is it to anyone? I'm not letting those disks stay, no matter how 
many more people preach that "read errors are normal". No. They're not. 
Such disks are utter and complete trash and have to go.

Don't wait for MD to kick disks out either. Check your disks. 
Actually replace them if they have errors. Most RAIDs die due 
to people not monitoring their disks, or delaying replacements.

Replacing disks costs money but that is the price you have to pay 
for the luxury of using RAID (especially at home) in the first place. 
When buying a RAID system, the money for the next replacement disk 
should always be planned into your budget. If you max it out or 
overdraw your budget for those fancy enterprise RAID disks, 
you'll find they die just the same.

Also make backups. RAID never replaces backups.

Regards
Andreas Klauer


* Re: Why not just return an error?
  2016-10-06 23:32 Why not just return an error? Dark Penguin
  2016-10-07  5:26 ` keld
  2016-10-07 11:21 ` Andreas Klauer
@ 2016-10-07 14:19 ` Phil Turmel
  2 siblings, 0 replies; 21+ messages in thread
From: Phil Turmel @ 2016-10-07 14:19 UTC (permalink / raw)
  To: Dark Penguin, linux-raid

On 10/06/2016 07:32 PM, Dark Penguin wrote:
> Greetings!
> 
> The more I read about md-raid, the more I notice that the biggest
> problem of it: if you hit an error on a degraded RAID, it falls apart.
> Because of this, it is possible to lose a huge amount of data due to one
> tiny read error, which particularly makes raid5 the sword of Damocles.

Because raid is about uptime through failures.  It's not backup, it's
not data consistency.  A degraded array is supposed to be a temporary
state -- the time it takes to install a new drive and rebuild.  A
single-degraded raid6 still has redundancy to carry you through a read
error during rebuild.  Raid5 does not.  That's it.  There's no other
magic, and anything else would be more bug-inducing complexity.

A degraded raid5 isn't raid anymore, just "aid".  You can minimize the
odds of a read error during rebuild by properly scrubbing your arrays
while they are non-degraded, but drive specifications make it clear that
your odds won't be good on large arrays.
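
A scrub is just a nudge to md's sync machinery; roughly, with /dev/md0
standing in for your array:

  echo check > /sys/block/md0/md/sync_action   # read everything, fix read errors
  cat /proc/mdstat                             # watch progress
  cat /sys/block/md0/md/mismatch_cnt           # inconsistent stripes found

Many distros ship a cron or systemd job that does this periodically.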

> But one question keeps me increasingly frustrated. Yes, during its
> normal functioning, it totally makes sense to kick a faulty device out
> of an array.

{ Possible misconception here: linux raid arrays don't kick out drives
just for read errors.  MD raid will attempt to *fix* the bad sector
using the data from the other drives.  Only if the fix fails will the
drive be ejected.  Timeout mismatch guarantees that the fix will fail. }
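
{ The usual check and fix for that mismatch, roughly, per member drive
(sdX standing in for each member):

  smartctl -l scterc /dev/sdX               # does the drive support ERC?
  smartctl -l scterc,70,70 /dev/sdX         # if so, cap error recovery at 7 seconds
  echo 180 > /sys/block/sdX/device/timeout  # if not, raise the kernel's timeout

The scterc setting is volatile on most drives, so it has to be reapplied
after every boot. }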

> But if we're running a degraded array, and doing so will
> definitely result in massive data loss, why not just return a read error
> instead? Just add a little check: on error, if degraded -> then just
> return an error. I believe this is the dream of everyone who has ever
> dealt with RAIDs.

Stopping the array *preserves* data.  The block layer has no concept of
what's on top, and an error in one place that isn't handled could easily
turn into corruption in otherwise good places.  Layered block devices
require a sysadmin to evaluate the situation.

> With RAID, the first priority is keeping data safe. Yes, it's not an
> alternative to backups and all that, but still - if we hit an error on a
> degraded array, the array should scream and panic and send all kinds of
> warnings, but definitely NOT collapse and warrant a visit to the RAID
> recovery laboratory (or this mailing list). Imagine how much headache
> and lost hair would that relieve!..

Linux raid is widely used.  Traffic on this list is relatively small.
I'm quite sure 99.99% of linux raid users are dealing with these events
just fine:  ddrescue the troublesome drive to another, reassemble with
that, then wipe or replace the original.  Whine to the powers that be
that raid6 would have kept their array up through the event so could
they please fund another drive?  Drown sorrows in beer if the PTB say no.
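
In outline, with sdc1 as the troublesome member and sde1 its new home
(names made up, adjust to your layout):

  ddrescue -f /dev/sdc1 /dev/sde1 rescue.map      # copy what's readable, map what isn't
  ddrescue -f -r3 /dev/sdc1 /dev/sde1 rescue.map  # a few retries on the bad spots

  mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sde1

Then fsck, then deal with whatever the map says couldn't be read.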

> Now, I'm probably not the first one to think of such a bright idea. So
> there must be a very good reason why this is not possible; I don't think
> the problem is just that "the existing behaviour is preferred, and
> anyone who does not agree is an idiot". If not for enterprise use, then
> at least it would be very useful for the "home archive" scenario when
> "uptime" and "absense of errors" hold much less meaning than "losing one
> file and not all the data". So, why is this not possible?..

No, you aren't the first to want a magic wand.  Sorry.

Phil


* Re: Why not just return an error?
  2016-10-07 11:21 ` Andreas Klauer
@ 2016-10-07 14:43   ` Phil Turmel
  2016-10-07 16:23     ` Dark Penguin
  0 siblings, 1 reply; 21+ messages in thread
From: Phil Turmel @ 2016-10-07 14:43 UTC (permalink / raw)
  To: Andreas Klauer, Dark Penguin; +Cc: linux-raid

Good morning Andreas,

On 10/07/2016 07:21 AM, Andreas Klauer wrote:
> On Fri, Oct 07, 2016 at 02:32:40AM +0300, Dark Penguin wrote:
>> why not just return a read error instead?
> 
> You make it sound like it solves all problems, but it does not.
> Errors are just not part of the concept anywhere really.

That's not strictly true. The majority of read errors on large modern
drives are fixable by writing over the troublesome sector.  That may or
may not relocate the sector to the drive's spare area.  Read error
locations that haven't yet been overwritten are identified in the drive
firmware as "Pending Relocations", since the drive doesn't yet know if
the problem is a true media defect or just a write error (power
transient during write, whatever).

Since brand new drives almost never have errors, people assume that's
normal.  Get three or four years in and you see that's not true.  In my
experience, when actual relocations hit double digits, it's time to
replace the drive.  The drive is still operating within spec, though --
it won't be a warranty replacement.
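
The counters in question, if you want to eyeball them by hand (sdX is a
placeholder):

  smartctl -A /dev/sdX | \
    grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'

Pending sectors that clear after a rewrite were probably transient; a
reallocation count that keeps climbing is the drive telling you something.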

> If a filesystem encounters one, it might flip into read only mode;
> if a program encounters one it might do whatever.
> You still have a huge data loss, corrupt databases, et cetera.

Concur.

> Even so, is that not what you have with "bad block log" enabled, 
> within reason? I disable it everywhere. I want my disks kicked.

I want my disks *fixed* if possible, not kicked.  If they're kicked, the
rest of the good data on that disk is unavailable for keeping my array
running.  I want to see the relocations growing in my daily logwatch
reports so I can use mdadm --replace to maintain the array without *any*
loss of redundancy.
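
Roughly, with sdc1 as the ailing member of /dev/md0 and sde1 the
newcomer (names made up):

  mdadm /dev/md0 --add /dev/sde1                       # new disk goes in as a spare
  mdadm /dev/md0 --replace /dev/sdc1 --with /dev/sde1  # copy while keeping redundancy
  mdadm /dev/md0 --remove /dev/sdc1                    # old one is faulty once the copy is done

Anything the old drive can't read gets reconstructed from the other
members during the copy.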

> Using cosmetics to hide errors only works to a certain limit. 
> In the end, RAID only works if the disks work. RAID 5 with 
> two dead disks is dead, no way to get around that. Disks go bad 
> and need to be replaced, if you don't do that, you'll just fail 
> even more horribly later on.

Concur.  We seem to differ on where to draw the line on "bad".

> Your disk produces read errors, or needs 3 minutes to read a single sector, 
> what use is it to anyone? I'm not letting those disks stay, no matter how 
> many more people preach that "read errors are normal". No. They're not. 
> Such disks are utter and complete trash and have to go.

Really?  You get rid of drives on the first read error event?  If you're
discarding them, I'll pay shipping for you to send them to me.  That
would be an especially cost effective source of drives for me. None of
the green or desktop POSes, though.  (-:  Or are you just not noticing
the read errors because MD is silently fixing them for you?

> Don't wait for MD to kick disks out either. Check your disks. 
> Actually replace them if they have errors. Most RAIDs die due 
> to people not monitoring their disks, or delaying replacements.

Yup.

> Replacing disks costs money but that is the price you have to pay 
> for the luxury of using RAID (especially at home) in the first place. 
> When buying a RAID system, the money for the next replacement disk 
> should always be planned into your budget. If you max it out or 
> overdraw your budget for those fancy enterprise RAID disks, 
> you'll find they die just the same.

Enterprise drives are easily justified for heavily loaded arrays in a
small shop.  NAS drives are just fine for small business and home media
servers.  Green and modern desktop drives are utterly unsuited to raid duty.

> Also make backups. RAID never replaces backups.

Indeed.

Phil


* Re: Why not just return an error?
  2016-10-07 14:43   ` Phil Turmel
@ 2016-10-07 16:23     ` Dark Penguin
  2016-10-07 16:52       ` Phil Turmel
  0 siblings, 1 reply; 21+ messages in thread
From: Dark Penguin @ 2016-10-07 16:23 UTC (permalink / raw)
  To: Phil Turmel, Andreas Klauer, Rudy Zijlstra, keld; +Cc: linux-raid

> Likewise, when the first disk fails, one could mark it as kind of in an error state,
> and keep it running, and if one gets a read error, then you could get
> the data from the good disks.

Yes!! If a drive is "faulty", it means "you should replace it because it 
is failing"; there is no need to actually stop using it and degrade the 
whole RAID operation! What's more, it would be extremely useful at 
rebuilding without any performance loss: let the array work in degraded 
mode, while the faulty drive is being copied to the new one, with only 
read errors reconstructed from the rest of the drives! But that's a 
different issue, and not a very good idea for other reasons.


> One big reason is human behaviour. And it is human behaviour that in the
> end causes all the collapsed raids.

"Human behaviour", that's what I'm talking about. If the only reason to 
do it is to force people to do what is necessary, that approach is 
called "Windows". :) And I do not suggest that it should be the default 
behaviour; instead, we should have an option "--idiotmode 
--yes-i-know-what-i-am-doing" at RAID creation for those who 
specifically want to take the risks.

And of course, no broken files will appear if we suffer from read 
*errors*. We do not suffer from *incorrect reads*, right?..


> You make it sound like it solves all problems, but it does not.
> Errors are just not part of the concept anywhere really.

It does not "solve all problems", but it lets me solve my problems my 
way, and not "the only correct and intended way" - which is what Linux 
is good at. :)


>> > I believe this is the dream of everyone who has ever dealt with RAIDs.
>
> My dream is different. I don't want errors. I want it to work. ;)
> And it does, as long as you make sure your disks are healthy.

I do not suggest that we do it my way and not yours - we have an option 
to do it your way, but we do not have one to do it my way, that's the 
problem. :)

Anyway, if I had a collapsed RAID-5, I would want to at least have an 
easy option to start it in a read-only mode in the last-known working 
state, while the faulty drives are still not out of sync, and recover 
data easily (to my single backup drive), or continue using the array for 
a while, manually deleting one "bad" file if necessary; this is of 
course not a "good thing" to do, but this way, RAID would be at least 
not worse than single drives with faulty sectors, which are capable of 
that, while RAIDs are not! I would be fine with that in my archive - as 
I'm fine with some less important parts of the archive being on faulty 
single drives. It's just that I don't want to lose the whole drive due 
to a hardware failure - and RAID adds more ways for that to happen, 
instead of offering more protection against it.


>> > Using cosmetics to hide errors only works to a certain limit.
>> > In the end, RAID only works if the disks work. RAID 5 with
>> > two dead disks is dead, no way to get around that. Disks go bad
>> > and need to be replaced, if you don't do that, you'll just fail
>> > even more horribly later on.
>
> Concur.  We seem to differ on where to draw the line on "bad".

And I think that line should be easy to move, so that anyone could 
choose their own! I understand that RAID is meant for "uptime, not 
backups" - for enterprise production. And everything that you say is 
correct about this case. However, there are other uses - like mirroring 
my backup archive to protect against whole-drive failures. And in this 
case, I want different behaviour; I can take in onto myself to make sure 
a read error won't make my filesystems go into read-only mode and break 
anything, I really know what I'm doing, and I don't need my computer to 
tell me that RAID is not supposed to be used in this way. And it 
shouldn't add a lot of complex code - just a test "if idiotmode and 
lastdisk then return error, else kick drive; shout like crazy either 
way". :)

It's just that everyone has their own opinion on where to draw the line, 
and the "intended" one should of course be preached, but not forced!

-- 
darkpenguin


* Re: Why not just return an error?
  2016-10-07 16:23     ` Dark Penguin
@ 2016-10-07 16:52       ` Phil Turmel
  2016-10-07 17:44         ` Dark Penguin
  0 siblings, 1 reply; 21+ messages in thread
From: Phil Turmel @ 2016-10-07 16:52 UTC (permalink / raw)
  To: Dark Penguin, Andreas Klauer, Rudy Zijlstra, keld; +Cc: linux-raid

Hi DP,

{It's good that you are trimming replies, but don't cut the ID of who
wrote what. }

On 10/07/2016 12:23 PM, Dark Penguin wrote:
>> Likewise, when the first disk fails, one could mark it as kind of in
>> an error state,
>> and keep it running, and if one gets a read error, then you could get
>> the data from the good disks.
> 
> Yes!! If a drive is "faulty", it means "you should replace it because it
> is failing"; there is no need to actually stop using it and degrade the
> whole RAID operation! What's more, it would be extremely useful at
> rebuilding without any performance loss: let the array work in degraded
> mode, while the faulty drive is being copied to the new one, with only
> read errors reconstructed from the rest of the drives! But that's a
> different issue, and not a very good idea for other reasons.

MD raid already does as much of this as it can, as I described.

>> One big reason is human behaviour. And it is human behaviour that in the
>> end causes all the collapsed raids.
> 
> "Human behaviour", that's what I'm talking about. If the only reason to
> do it is to force people to do what is necessary, that approach is
> called "Windows". :) And I do not suggest that it should be the default
> behaviour; instead, we should have an option "--idiotmode
> --yes-i-know-what-i-am-doing" at RAID creation for those who
> specifically want to take the risks.
> 
> And of course, no broken files will appear if we suffer from read
> *errors*. We do not suffer from *incorrect reads*, right?..

You want to push the failure condition from being "broken raid with
likely salvageable data, except for one sector" to "repeated errors to
the upper layers with unknowable corruption as side effects".

>> You make it sound like it solves all problems, but it does not.
>> Errors are just not part of the concept anywhere really.
> 
> It does not "solve all problems", but it lets me solve my problems my
> way, and not "the only correct and intended way" - which is what Linux
> is good at. :)

Then patch your kernel with your desired behavior.  "Free software"
doesn't mean someone writes what you want for free.  And I disagree with
you, so would object to it being put in the mainline kernel.

>>> > I believe this is the dream of everyone who has ever dealt with RAIDs.
>>
>> My dream is different. I don't want errors. I want it to work. ;)
>> And it does, as long as you make sure your disks are healthy.
> 
> I do not suggest that we do it my way and not yours - we have an option
> to do it your way, but we do not have one to do it my way, that's the
> problem. :)

Write the code to add the option you want.

> Anyway, if I had a collapsed RAID-5, I would want to at least have an
> easy option to start it in a read-only mode in the last-known working
> state, while the faulty drives are still not out of sync, and recover
> data easily (to my single backup drive), or continue using the array for
> a while, manually deleting one "bad" file if necessary; this is of
> course not a "good thing" to do, but this way, RAID would be at least
> not worse than single drives with faulty sectors, which are capable of
> that, while RAIDs are not! I would be fine with that in my archive - as
> I'm fine with some less important parts of the archive being on faulty
> single drives. It's just that I don't want to lose the whole drive due
> to a hardware failure - and RAID adds more ways for that to happen,
> instead of offering more protection against it.

MD raid has no idea what is at any given sector.  And with a
near-infinite variety of layering choices, there's no way it's going to.
 That's why *you* have to do this.  You trimmed my description of the
only "easy option" actually trustable.

> It's just that everyone has their own opinion on where to draw the line,
> and the "intended" one should of course be preached, but not forced!

The "line" I was referring to is the decision of when to throw away a
drive vs. recondition it.  That's already in your hands.

Phil


* Re: Why not just return an error?
  2016-10-07 16:52       ` Phil Turmel
@ 2016-10-07 17:44         ` Dark Penguin
  2016-10-07 18:41           ` Phil Turmel
  2016-10-10 20:47           ` Anthony Youngman
  0 siblings, 2 replies; 21+ messages in thread
From: Dark Penguin @ 2016-10-07 17:44 UTC (permalink / raw)
  To: Phil Turmel, Andreas Klauer, Rudy Zijlstra, keld; +Cc: linux-raid

On 07/10/16 19:52, Phil Turmel wrote:
> Hi DP,
>
> {It's good that you are trimming replies, but don't cut the ID of who
> wrote what. }

Oh, yeah, sorry.


> You want to push the failure condition from being "broken raid with
> likely salvageable data, except for one sector" to "repeated errors to
> the upper layers with unknowable corruption as side effects".

That actually describes it pretty well, yes. %) Being able to choose a 
failure condition most suitable for your specific situation, and being 
able to push it that far and still have a working RAID if you want that.


> Then patch your kernel with your desired behavior.  "Free software"
> doesn't mean someone writes what you want for free.  And I disagree with
> you, so would object to it being put in the mainline kernel.

Yes, that's one of the things on my TODO list once I become a developer 
able to do that. :) I just thought I'm probably not the only one who 
wants that, and so I wanted to learn why it is not possible, and listen 
to what other people really think about it.


>> Anyway, if I had a collapsed RAID-5, I would want to at least have an
>> easy option to start it in a read-only mode in the last-known working
>> state, while the faulty drives are still not out of sync, and recover
>> data easily (to my single backup drive), or continue using the array for
>> a while, manually deleting one "bad" file if necessary; this is of
>> course not a "good thing" to do, but this way, RAID would be at least
>> not worse than single drives with faulty sectors, which are capable of
>> that, while RAIDs are not! I would be fine with that in my archive - as
>> I'm fine with some less important parts of the archive being on faulty
>> single drives. It's just that I don't want to lose the whole drive due
>> to a hardware failure - and RAID adds more ways for that to happen,
>> instead of offering more protection against it.
>
> MD raid has no idea what is at any given sector.  And with a
> near-infinite variety of layering choices, there's no way it's going to.
>   That's why *you* have to do this.  You trimmed my description of the
> only "easy option" actually trustable.

I actually wanted to ask about that. Can you really ddrescue a drive 
with a "hole" in it, re-add it and expect it to work?.. What happens if 
you try to read from that "hole" again? And while I'm talking about 
re-adding, when does it become impossible to "re-add" a drive?..


-- 
darkpenguin


* Re: Why not just return an error?
  2016-10-07 17:44         ` Dark Penguin
@ 2016-10-07 18:41           ` Phil Turmel
  2016-10-07 20:39             ` Dark Penguin
  2016-10-07 23:11             ` Edward Kuns
  2016-10-10 20:47           ` Anthony Youngman
  1 sibling, 2 replies; 21+ messages in thread
From: Phil Turmel @ 2016-10-07 18:41 UTC (permalink / raw)
  To: Dark Penguin, Andreas Klauer, Rudy Zijlstra, keld; +Cc: linux-raid

On 10/07/2016 01:44 PM, Dark Penguin wrote:
> On 07/10/16 19:52, Phil Turmel wrote:

>> MD raid has no idea what is at any given sector.  And with a
>> near-infinite variety of layering choices, there's no way it's going to.
>>   That's why *you* have to do this.  You trimmed my description of the
>> only "easy option" actually trustable.
> 
> I actually wanted to ask about that. Can you really ddrescue a drive
> with a "hole" in it, re-add it and expect it to work?.. What happens if
> you try to read from that "hole" again? And while I'm talking about
> re-adding, when does it become impossible to "re-add" a drive?..

Yes, ddrescue replaces unreadable areas with zeroes.  If those blocks
were part of a file, then the file will have zeroes in it.  But they
might have been where an inode or dirent were stored, in which case you
get orphaned data elsewhere.  You need fsck to minimize that.

ddrescue can provide a listing of the sectors it replaced so you can use
filesystem forensic tools to pinpoint the problems (which file, etc).
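
Roughly, for an ext filesystem sitting directly on the array (the block
and inode numbers below are made-up examples, and you still have to
translate the map's byte offsets through the md data offset and the
filesystem block size yourself):

  awk '$3 == "-"' rescue.map             # byte ranges ddrescue could not read

  debugfs -R "icheck 123456" /dev/md0    # fs block -> inode
  debugfs -R "ncheck 7890" /dev/md0      # inode -> pathname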

Note that all of the above are manual operations -- mdadm has no
knowledge of the upper layers.

None of the above uses --re-add.  Just assembly or forced assembly.
Re-add is only to return a kicked drive to a *functional* array when the
failure reason isn't really the drive.  (Controller, cable, power
supply, etc.)  And re-add is only helpful if the array members have
write-intent bitmaps so MD can figure out which parts of the re-added
disk are out of date.  Re-add can be used if a drive is kicked for
timeout mismatch, but is only helpful if the mismatch is addressed first.
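
For completeness, that path looks roughly like this (names are
placeholders):

  mdadm --grow --bitmap=internal /dev/md0   # add the bitmap beforehand
  # ...fix the cable/controller/power problem...
  mdadm /dev/md0 --re-add /dev/sdc1         # only the stale regions resync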

Phil


* Re: Why not just return an error?
  2016-10-07 18:41           ` Phil Turmel
@ 2016-10-07 20:39             ` Dark Penguin
  2016-10-07 23:11             ` Edward Kuns
  1 sibling, 0 replies; 21+ messages in thread
From: Dark Penguin @ 2016-10-07 20:39 UTC (permalink / raw)
  To: Phil Turmel, Andreas Klauer, Rudy Zijlstra, keld, linux-raid

>> On 07/10/16 19:52, Phil Turmel wrote:
>
>>> MD raid has no idea what is at any given sector.  And with a
>>> near-infinite variety of layering choices, there's no way it's going to.
>>>    That's why *you* have to do this.  You trimmed my description of the
>>> only "easy option" actually trustable.
>>
>> I actually wanted to ask about that. Can you really ddrescue a drive
>> with a "hole" in it, re-add it and expect it to work?.. What happens if
>> you try to read from that "hole" again? And while I'm talking about
>> re-adding, when does it become impossible to "re-add" a drive?..
>
> Yes, ddrescue replaces unreadable areas with zeroes.  If those blocks
> were part of a file, then the file will have zeroes in it.  But they
> might have been where an inode or dirent were stored, in which case you
> get orphaned data elsewhere.  You need fsck to minimize that.

Ah, yes - in this case it's the only drive with this piece of 
information, and md doesn't keep any checksums or anything, so it will 
simply return those zeroes. Thanks for explaining this!


> ddrescue can provide a listing of the sectors it replaced so you can use
> filesystem forensic tools to pinpoint the problems (which file, etc).
>
> Note that all of the above are manual operations -- mdadm has no
> knowledge of the upper layers.
>
> None of the above uses --re-add.  Just assembly or forced assembly.
> Re-add is only to return a kicked drive to a *functional* array when the
> failure reason isn't really the drive.  (Controller, cable, power
> supply, etc.)  And re-add is only helpful if the array members have
> write-intent bitmaps so MD can figure out which parts of the re-added
> disk are out of date.  Re-add can be used if a drive is kicked for
> timeout mismatch, but is only helpful if the mismatch is addressed first.

"Forced assembly"... That's one thing I've missed. So forced-assembling 
a faulty drive back into a collapsed array after each failure would 
basically do what I wanted to do - and with no inconsistencies, because 
the array stops the moment the drive was kicked; but I can see why this 
is not a good idea. %)

So, "re-adding" is only possible with a functional array, and only when 
a write-intent bitmap is used. But I remember clearly that not long ago, 
one of my drives failed (most likely due to a cable popping off) and 
refused to re-add into a mirror with a bitmap, so I'm still wondering 
why it was not possible. At least in theory, as long as there is a 
bitmap, it should be possible to re-add, no matter how much later, right?..


-- 
darkpenguin


* Re: Why not just return an error?
  2016-10-07 18:41           ` Phil Turmel
  2016-10-07 20:39             ` Dark Penguin
@ 2016-10-07 23:11             ` Edward Kuns
  1 sibling, 0 replies; 21+ messages in thread
From: Edward Kuns @ 2016-10-07 23:11 UTC (permalink / raw)
  To: Phil Turmel; +Cc: Dark Penguin, Andreas Klauer, Rudy Zijlstra, keld, Linux-RAID

On Fri, Oct 7, 2016 at 9:43 AM, Phil Turmel <philip@turmel.org> wrote:
> I want to see the relocations growing in my daily logwatch  reports so
> I can use mdadm --replace to maintain the array without *any* loss
> of redundancy.

Is this report something you added to base logwatch?  I don't see any
such report in my daily logwatch.  Maybe you only see this if the
relocations are non-zero?

Let's say one has a RAID-5 and two drives develop bad sectors (in
different areas) at the same time that for whatever reason are not
repairable.  (Or someone ignores errors on one drive long enough that
another develops problems, or someone never did scrubbing so didn't
find out about bad sectors in one drive until multiple drives had bad
sectors.)  OK, you ddrescue the two drives and reassemble the array.
What will a RAID scrub do when it gets to the area that was zeroed?
Clearly the parity will be invalid for those stripes.

        Thanks,

            Eddie


* Re: Why not just return an error?
  2016-10-07 17:44         ` Dark Penguin
  2016-10-07 18:41           ` Phil Turmel
@ 2016-10-10 20:47           ` Anthony Youngman
  2016-10-10 21:37             ` Andreas Klauer
  2016-10-10 22:10             ` Wakko Warner
  1 sibling, 2 replies; 21+ messages in thread
From: Anthony Youngman @ 2016-10-10 20:47 UTC (permalink / raw)
  To: Dark Penguin, Phil Turmel, Andreas Klauer, Rudy Zijlstra, keld; +Cc: linux-raid



On 07/10/16 18:44, Dark Penguin wrote:
>
> I actually wanted to ask about that. Can you really ddrescue a drive
> with a "hole" in it, re-add it and expect it to work?.. What happens if
> you try to read from that "hole" again? And while I'm talking about
> re-adding, when does it become impossible to "re-add" a drive?..

If you want to do some kernel development work, this is something you 
can do something about :-)

ddrescue creates a log of sectors that failed to copy. I've been 
thinking a bit about this, not least because other people have mentioned it.

Modern disk partitioning tools usually leave a chunk of space. What we 
want is some way of making ddrescue dump a signature on the disk, along 
with a list of all blocks that failed to copy. Then we need to patch the 
low-level disk access code so that it reads this list of "bad blocks" 
and returns a read error if any attempt is made to read one. If a block 
is written, it's removed from the list. In effect, this is a "bad block" 
list, only instead of being at the disk firmware level, it's at the OS's 
disk driver level.

That way, if you copy a damaged disk with errors, at least the 
filesystem layer will be told that the file is damaged, rather than 
being handed duff data with no indication that it is duff.

THIS IS NOT THERE TODAY, but if you want a kernel project, this isn't a 
bad one. This will mean that you can recover a broken raid with no data 
loss, provided you have enough drives to be able to assemble a redundant 
array, and you aren't unlucky enough to have two drives have an error in 
the same place.

Cheers,
Wol


* Re: Why not just return an error?
  2016-10-10 20:47           ` Anthony Youngman
@ 2016-10-10 21:37             ` Andreas Klauer
  2016-10-10 21:55               ` Wols Lists
  2016-10-10 22:10             ` Wakko Warner
  1 sibling, 1 reply; 21+ messages in thread
From: Andreas Klauer @ 2016-10-10 21:37 UTC (permalink / raw)
  To: Anthony Youngman
  Cc: Dark Penguin, Phil Turmel, Rudy Zijlstra, keld, linux-raid

On Mon, Oct 10, 2016 at 09:47:04PM +0100, Anthony Youngman wrote:
> with a list of all blocks that failed to copy. Then we need to patch the 
> low-level disk access code so that it reads this list of "bad blocks" 
> and returns a read error if any attempt is made to read one. If a block 

hdparm has that feature to mark sectors as bad (--make-bad-sector).
not sure how that behaves on a re-write by md. I never tried it myself.

Maybe you could also do something with device mapper. It does have 
an error target, and then there's the overlay. I wish dmsetup had 
some profiles/shortcuts/recipes to make creating such device mapper 
tidbits easier, or there were another common tool for those device 
mapper tricks...
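
By hand it's something like this (sizes are in 512-byte sectors and the
numbers are invented), which is exactly the kind of thing that wants a
wrapper:

  # sdc1, except an 8-sector hole that always returns an I/O error
  printf '%s\n' \
    '0        2097152  linear /dev/sdc1 0' \
    '2097152  8        error' \
    '2097160  18620408 linear /dev/sdc1 2097160' | dmsetup create sdc1-holes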

Regards
Andreas Klauer


* Re: Why not just return an error?
  2016-10-10 21:37             ` Andreas Klauer
@ 2016-10-10 21:55               ` Wols Lists
  2016-10-11  4:00                 ` Brad Campbell
  0 siblings, 1 reply; 21+ messages in thread
From: Wols Lists @ 2016-10-10 21:55 UTC (permalink / raw)
  To: Andreas Klauer; +Cc: Dark Penguin, Phil Turmel, Rudy Zijlstra, keld, linux-raid

On 10/10/16 22:37, Andreas Klauer wrote:
> On Mon, Oct 10, 2016 at 09:47:04PM +0100, Anthony Youngman wrote:
>> with a list of all blocks that failed to copy. Then we need to patch the 
>> low-level disk access code so that it reads this list of "bad blocks" 
>> and returns a read error if any attempt is made to read one. If a block 
> 
> hdparm has that feature to mark sectors as bad (--make-bad-sector).
> not sure how that behaves on a re-write by md. I never tried it myself.

I'm guessing it's useless ...

The point is that the disk sector is not bad. So you don't want to mark
it as bad on the disk. But you know that the *data* in that block is
bad, so you want the disk access layer to fake a read error when you try
to read it. The intent is to deliberately trigger a rewrite by md.
> 
> Maybe you could also do something with device mapper. It does have 
> an error target, and then there's the overlay. I wish dmsetup had 
> some profiles/shortcuts/reciped to make creation of such device mapper 
> tidbits easier or another common tool for those device mapper tricks...
> 
That certainly sounds plausible ...

Cheers,
Wol



* Re: Why not just return an error?
  2016-10-10 20:47           ` Anthony Youngman
  2016-10-10 21:37             ` Andreas Klauer
@ 2016-10-10 22:10             ` Wakko Warner
  1 sibling, 0 replies; 21+ messages in thread
From: Wakko Warner @ 2016-10-10 22:10 UTC (permalink / raw)
  To: Anthony Youngman; +Cc: linux-raid

(CCs trimmed)

Anthony Youngman wrote:
> 
> 
> On 07/10/16 18:44, Dark Penguin wrote:
> >
> >I actually wanted to ask about that. Can you really ddrescue a drive
> >with a "hole" in it, re-add it and expect it to work?.. What happens if
> >you try to read from that "hole" again? And while I'm talking about
> >re-adding, when does it become impossible to "re-add" a drive?..
> 
> If you want to do some kernel development work, this is something
> you can do something about :-)
> 
> ddrescue creates a log of sectors that failed to copy. I've been
> thinking a bit about this, not least because other people have
> mentioned it.

I've done disk rescues where I work and I came up with an idea to use the
device mapper targets to emulate this.  Why not just read the .log file and
create a mapping where if it's good, it goes to the disk, if bad, it goes to
error.  It obviously won't handle writes, but you can layer a snapshot
device on top of it.  When the "error" is corrected, it'll write to the
snapshot.  You can then tear everything down, and merge the snapshot into
the disk.  I tried something similar when I had a bad sector on a drive and
md kept kicking it out.  Fortunately it was in /usr and wasn't important.
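
Sketched out (gawk for the hex offsets; sdc1 is the rescued copy and
sdd1 scratch space for the snapshot's COW, both names made up):

  awk '!/^#/ && NF == 3 {
         s = strtonum($1)/512; l = strtonum($2)/512
         if ($3 == "+") print s, l, "linear /dev/sdc1", s
         else           print s, l, "error"
       }' rescue.map | dmsetup create sdc1-map

  dmsetup create sdc1-rw --table \
    "0 $(blockdev --getsz /dev/sdc1) snapshot /dev/mapper/sdc1-map /dev/sdd1 P 8"

Reads of the bad ranges keep failing until something writes them, writes
land in the COW, and you can merge or discard the lot afterwards. Whether
the snapshot's copy-out chokes on the error chunks for small writes is
something to test before trusting it.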

-- 
 Microsoft has beaten Volkswagen's world record.  Volkswagen only created 22
 million bugs.


* Re: Why not just return an error?
  2016-10-10 21:55               ` Wols Lists
@ 2016-10-11  4:00                 ` Brad Campbell
  2016-10-11  9:18                   ` Wols Lists
  0 siblings, 1 reply; 21+ messages in thread
From: Brad Campbell @ 2016-10-11  4:00 UTC (permalink / raw)
  To: Wols Lists, Andreas Klauer
  Cc: Dark Penguin, Phil Turmel, Rudy Zijlstra, keld, linux-raid

On 11/10/16 05:55, Wols Lists wrote:
> On 10/10/16 22:37, Andreas Klauer wrote:
>> On Mon, Oct 10, 2016 at 09:47:04PM +0100, Anthony Youngman wrote:
>>> with a list of all blocks that failed to copy. Then we need to patch the
>>> low-level disk access code so that it reads this list of "bad blocks"
>>> and returns a read error if any attempt is made to read one. If a block
>>
>> hdparm has that feature to mark sectors as bad (--make-bad-sector).
>> not sure how that behaves on a re-write by md. I never tried it myself.
>
> I'm guessing it's useless ...

Not useless at all.

> The point is that the disk sector is not bad. So you don't want to mark
> it as bad on the disk. But you know that the *data* in that block is
> bad, so you want the disk access layer to fake a read error when you try
> to read it. The intent is to deliberately trigger a rewrite by md.

I suggested this a while ago. Take the badblocks log, use hdparm to mark 
each bad sector as bad and put the drive back in the array. I even 
suggested potentially adding a feature to ddrescue to auto-mark the 
blocks as bad on the target drive.

When md reads from that bad sector it will get an immediate error from 
the drive, reconstruct the data and rewrite it, clearing the bad sector.
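
Per sector out of the ddrescue map that would be roughly (sector number
made up; it needs hdparm's scary flag and clobbers whatever is in that
sector):

  hdparm --yes-i-know-what-i-am-doing --make-bad-sector 123456 /dev/sde
  hdparm --read-sector 123456 /dev/sde   # now fails until something rewrites it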

This absolutely prevents a rescued disk from silently returning zeros in 
place of the lost data, and allows the good parts of the disk to 
participate in the array redundancy while stuff gets rectified.

This is only useful where you have several dud disks in an array and the 
bad sectors are not in the same stripes, but for that pathological case 
it would allow ddrescuing onto new drives and reconstructing the array 
without data loss. Whereas simply using ddrescued disks will happily 
return zeros where the holes are.

Regards,
Brad


* Re: Why not just return an error?
  2016-10-11  4:00                 ` Brad Campbell
@ 2016-10-11  9:18                   ` Wols Lists
  2016-10-11 10:01                     ` Brad Campbell
  0 siblings, 1 reply; 21+ messages in thread
From: Wols Lists @ 2016-10-11  9:18 UTC (permalink / raw)
  To: Brad Campbell, Andreas Klauer
  Cc: Dark Penguin, Phil Turmel, Rudy Zijlstra, keld, linux-raid

On 11/10/16 05:00, Brad Campbell wrote:
> On 11/10/16 05:55, Wols Lists wrote:
>> On 10/10/16 22:37, Andreas Klauer wrote:
>>> On Mon, Oct 10, 2016 at 09:47:04PM +0100, Anthony Youngman wrote:
>>>> with a list of all blocks that failed to copy. Then we need to patch
>>>> the
>>>> low-level disk access code so that it reads this list of "bad blocks"
>>>> and returns a read error if any attempt is made to read one. If a block
>>>
>>> hdparm has that feature to mark sectors as bad (--make-bad-sector).
>>> not sure how that behaves on a re-write by md. I never tried it myself.
>>
>> I'm guessing it's useless ...
> 
> Not useless at all.

Ahh...
> 
>> The point is that the disk sector is not bad. So you don't want to mark
>> it as bad on the disk. But you know that the *data* in that block is
>> bad, so you want the disk access layer to fake a read error when you try
>> to read it. The intent is to deliberately trigger a rewrite by md.
> 
> I suggested this a while ago. Take the badblocks log, use hdparm to mark
> each bad sector as bad and put the drive back in the array. I even
> suggested potentially adding a feature to ddrescue to auto-mark the
> blocks as bad on the target drive.

But does that mean that the drive thinks those sectors are bad, and that
they're then lost permanently at the hardware level? That's what I
thought the badblocks list did with hdparm, and that's what I was trying
to avoid.
> 
> When md reads from that bad sector it will get an immediate error from
> the drive, reconstruct the data and rewrite it, clearing the bad sector.
> 
> This absolutely prevents a rescued disk from returning zeros rather than
> bad data, and allows the good parts of the disk to participate in the
> array redundancy while stuff gets rectified.
> 
> This is only useful where you have several dud disks in an array and the
> bad sectors are not in the same stripes, but for that pathological case
> it would allow ddrescuing onto new drives and reconstructing the array
> without data loss. Whereas simply using ddrescued disks will happily
> return zeros where the holes are.
> 
My thoughts exactly :-) Indeed, it's probably you I got the idea from :-)

Cheers,
Wol



* Re: Why not just return an error?
  2016-10-11  9:18                   ` Wols Lists
@ 2016-10-11 10:01                     ` Brad Campbell
  2016-10-11 10:15                       ` Wols Lists
  0 siblings, 1 reply; 21+ messages in thread
From: Brad Campbell @ 2016-10-11 10:01 UTC (permalink / raw)
  To: Wols Lists, Andreas Klauer
  Cc: Dark Penguin, Phil Turmel, Rudy Zijlstra, keld, linux-raid

On 11/10/16 17:18, Wols Lists wrote:
> On 11/10/16 05:00, Brad Campbell wrote:

>>> The point is that the disk sector is not bad. So you don't want to mark
>>> it as bad on the disk. But you know that the *data* in that block is
>>> bad, so you want the disk access layer to fake a read error when you try
>>> to read it. The intent is to deliberately trigger a rewrite by md.
>>
>> I suggested this a while ago. Take the badblocks log, use hdparm to mark
>> each bad sector as bad and put the drive back in the array. I even
>> suggested potentially adding a feature to ddrescue to auto-mark the
>> blocks as bad on the target drive.
>
> But does that mean that the drive thinks those sectors are bad, and that
> they're then lost permanently at the hardware level? That's what I
> thought the badblocks list did with hdparm, and that's what I was trying
> to avoid.

I've not used the bad blocks list, but a cursory read would indicate it only 
records a bad block if the writeback fails. That won't ever happen with 
a bad sector created with hdparm. All hdparm does is corrupt the ECC on 
the block, so a read always returns an error. A write solves that issue nicely.

Regards,
Brad



* Re: Why not just return an error?
  2016-10-11 10:01                     ` Brad Campbell
@ 2016-10-11 10:15                       ` Wols Lists
  0 siblings, 0 replies; 21+ messages in thread
From: Wols Lists @ 2016-10-11 10:15 UTC (permalink / raw)
  To: Brad Campbell, Andreas Klauer
  Cc: Dark Penguin, Phil Turmel, Rudy Zijlstra, keld, linux-raid

On 11/10/16 11:01, Brad Campbell wrote:
> On 11/10/16 17:18, Wols Lists wrote:
>> On 11/10/16 05:00, Brad Campbell wrote:
> 
>>>> The point is that the disk sector is not bad. So you don't want to mark
>>>> it as bad on the disk. But you know that the *data* in that block is
>>>> bad, so you want the disk access layer to fake a read error when you
>>>> try
>>>> to read it. The intent is to deliberately trigger a rewrite by md.
>>>
>>> I suggested this a while ago. Take the badblocks log, use hdparm to mark
>>> each bad sector as bad and put the drive back in the array. I even
>>> suggested potentially adding a feature to ddrescue to auto-mark the
>>> blocks as bad on the target drive.
>>
>> But does that mean that the drive thinks those sectors are bad, and that
>> they're then lost permanently at the hardware level? That's what I
>> thought the badblocks list did with hdparm, and that's what I was trying
>> to avoid.
> 
> I've not used the bad blocks list, but a cursory read would indicate it only
> records a bad block if the writeback fails. That won't ever happen with
> a bad sector created with hdparm. All hdparm does is corrupt the ECC on
> the block, so a read always returns an error. A write solves that issue nicely.
> 
That's good to know. What happened with that suggestion for ddrescue?
Did they not like it, or was it the usual "show us the code and we'll
add it"? :-) So much to do, so little time :-)

I'm trying to build a little list of projects, partly as a result of
doing the wiki, that people wanting to get into raid programming (myself
included!) can do.

Cheers,
Wol



Thread overview: 21 messages
2016-10-06 23:32 Why not just return an error? Dark Penguin
2016-10-07  5:26 ` keld
2016-10-07  8:21   ` Rudy Zijlstra
2016-10-07  9:30     ` keld
2016-10-07 11:21 ` Andreas Klauer
2016-10-07 14:43   ` Phil Turmel
2016-10-07 16:23     ` Dark Penguin
2016-10-07 16:52       ` Phil Turmel
2016-10-07 17:44         ` Dark Penguin
2016-10-07 18:41           ` Phil Turmel
2016-10-07 20:39             ` Dark Penguin
2016-10-07 23:11             ` Edward Kuns
2016-10-10 20:47           ` Anthony Youngman
2016-10-10 21:37             ` Andreas Klauer
2016-10-10 21:55               ` Wols Lists
2016-10-11  4:00                 ` Brad Campbell
2016-10-11  9:18                   ` Wols Lists
2016-10-11 10:01                     ` Brad Campbell
2016-10-11 10:15                       ` Wols Lists
2016-10-10 22:10             ` Wakko Warner
2016-10-07 14:19 ` Phil Turmel
