Re: Maximizing failed disk replacement on a RAID5 array

From: Durval Menezes <durval.menezes@gmail.com>
To: John Robinson <john.robinson@anonymous.org.uk>
Cc: Linux RAID <linux-raid@vger.kernel.org>
Subject: Re: Maximizing failed disk replacement on a RAID5 array
Date: Sat, 11 Jun 2011 19:35:17 -0300	[thread overview]
Message-ID: <BANLkTimOqdb2VH26Zv=-VSJ4VYccAuv02w@mail.gmail.com> (raw)
In-Reply-To: <4DF1F117.5010604@anonymous.org.uk>

Hello John,

On Fri, Jun 10, 2011 at 7:25 AM, John Robinson
<john.robinson@anonymous.org.uk> wrote:
> On 07/06/2011 09:52, John Robinson wrote:
>>
>> On 06/06/2011 19:06, Durval Menezes wrote:
>> [...]
>>>
>>> It would be great to have a
>>> "duplicate-this-bad-old-disk-into-this-shiny-new-disk" functionality,
>>> as it would enable an almost-no-downtime disk replacement with
>>> minimum risk, but it seems we can't have everything... :-0 Maybe it's
>>> something for the wishlist?
>>
>> It's already on the wishlist, described as a hot replace.
>
> Actually I've been thinking about this. I think I'd rather the hot replace
> functionality did a normal rebuild from the still-good drives, and only if
> it came across a read error from those would it attempt to refer to the
> contents of the known-to-be-failing drive (and then also attempt to repair
> the read error on the supposedly-still-good drive that gave a read error, as
> already happens).

This looks like a very good idea. The old (failing) drive would be
kept "on reserve", ready to be accessed for eventual failed sectors on
the other old (good) drives...

> My rationale for this is as follows: if we want to hot-replace a drive
> that's known to be failing, we should trust it less than the remaining
> still-good drives, and treat it with kid gloves. It may be suffering from
> bit-rot. We'd rather not hit all the bad sectors on the failing drive,
> because each time we do that we send the drive into 7 seconds (or more, for
> cheap drives without TLER) of re-reading, plus any Linux-level re-reading
> there might be. Further, making the known-to-be-failing drive work extra
> hard (doing the equivalent of dd'ing from it while also still using it to
> serve its contents as an array member) might make it die completely before
> we've finished.

I agree completely.

> What will this do for rebuild time? Well, I don't think it'll be any slower.

I think it will actually be faster.

> On the one hand, you'd think that copying from one drive to another would be
> faster than a rebuild, because you're only reading 1 drive instead of N-1,
> but on the other, your array is going to run slowly (pretty much degraded
> speed) anyway because you're keeping one drive in constant use reading from
> it, and you risk it becoming much, much slower if you do run in to hundreds
> or thousands of read errors on the failing drive.
>
> So overall I think hot-replace should be a normal replace with a possible
> second source of data/parity.

Your reasoning sounds good to me.

> Thoughts?

Only sadness that it's not implemented yet... :-)

> Yes, I know, -ENOPATCH

Exactly :-)

Cheers,
-- 
   Durval Menezes.

>
> Cheers,
>
> John.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html