Re: raid1 with several old drives and a big new one

From: Chris Murphy <lists@colorremedies.com>
To: Eric Wong <e@80x24.org>
Cc: Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: raid1 with several old drives and a big new one
Date: Thu, 30 Jul 2020 20:57:38 -0600
Message-ID: <CAJCQCtS6fHYGBiHpqAJPu+-EoSzEKZ5YEaj4QjNxqPvO+JTACw@mail.gmail.com>
In-Reply-To: <20200731001652.GA28434@dcvr>

(first attempt did not go to the list)

On Thu, Jul 30, 2020 at 6:16 PM Eric Wong <e@80x24.org> wrote:
>
> Say I have three ancient 2TB HDDs and one new 6TB HDD, is there
> a way I can ensure one raid1 copy of the data stays on the new
> 6TB HDD?

Yes. Use mdadm --level=linear --raid-devices=2 to concatenate the two
2TB drives. Or use LVM (linear by default). Leave the 6TB out of this
regime. And now you have two block devices (one is the concat virtual
device) to do a raid1 with btrfs, and the 6TB will always get one of
the raid1 chunks.

There isn't a way to do this with btrfs alone.
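
Roughly, the layout above could be set up like the sketch below. It is a
dry run that only prints the commands. The device names (/dev/sdX,
/dev/sdY, /dev/sdZ, /dev/md0) are placeholders I made up, not anything
from your system, so check them against your own setup before running
anything for real.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of executing them.
# All device names are hypothetical placeholders (assumptions).
OLD_A=/dev/sdX   # first 2TB drive
OLD_B=/dev/sdY   # second 2TB drive
BIG=/dev/sdZ     # the 6TB drive
MD=/dev/md0      # md device that will hold the concat

plan_setup() {
    # Concatenate the two 2TB drives into one linear md device.
    echo "mdadm --create $MD --level=linear --raid-devices=2 $OLD_A $OLD_B"
    # Two-device btrfs raid1: the concat on one side, the 6TB on the other.
    echo "mkfs.btrfs -d raid1 -m raid1 $MD $BIG"
}

plan_setup
```

With only two block devices in the filesystem, every raid1 chunk has to
put one copy on each of them, which is why the 6TB always holds a copy.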

When one of the 2TB fails, there's some likelihood that it'll behave
like a partially failing device. Some reads and writes will succeed,
others won't. So you'll want a plan ready for that case. The ideal
scenario is a new 4+TB drive, and 'btrfs replace' to replace the md
concat device. Because a partially failing source can throw a large
number of errors during 'btrfs replace', you might want the -r option,
which only reads from the source device if no other good copy exists.
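
As a sketch, the replace might look like the following. Again this is a
dry run that just prints the commands, and /dev/md0, /dev/sdN and
/mnt/pool are names I invented for illustration. If the source device is
gone entirely, 'btrfs replace start' also accepts a devid in place of
the source device path.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of executing them.
# Device names and mount point are hypothetical placeholders.
MD=/dev/md0      # the failing md concat (source)
NEW=/dev/sdN     # the new 4+TB drive (target)
MNT=/mnt/pool    # where the btrfs filesystem is mounted

plan_replace() {
    # -r: read from the source only if no other good mirror exists.
    echo "btrfs replace start -r $MD $NEW $MNT"
    # Check progress of the running replace.
    echo "btrfs replace status $MNT"
}

plan_replace
```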

Following successful replace, an option is to break the 2x 2TB mdadm
concat apart, send the dead drive off for grinding, and the good 2TB
you can add as a 3rd device to the Btrfs. If it dies, same thing.
Preferably use 'btrfs replace' - it's faster and more reliable than
'btrfs device delete missing'.
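
One way the post-replace cleanup could go, as a dry-run sketch. The
names /dev/md0, /dev/sdY and /mnt/pool are placeholders of mine, and
this assumes the replace has already finished so nothing is using the md
device anymore.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of executing them.
# Device names and mount point are hypothetical placeholders.
MD=/dev/md0      # the old concat, no longer part of the filesystem
GOOD=/dev/sdY    # the surviving 2TB drive
MNT=/mnt/pool    # where the btrfs filesystem is mounted

plan_breakup() {
    # Stop the md array so its member drives are released.
    echo "mdadm --stop $MD"
    # Clear the md metadata from the surviving drive.
    echo "mdadm --zero-superblock $GOOD"
    # Add the surviving drive to btrfs as a third device.
    echo "btrfs device add $GOOD $MNT"
}

plan_breakup
```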

And on second thought...

You might do some rudimentary read/write benchmarks on all three
drives. I haven't found btrfs to be fussy about speed differences
between raid1 member drives. But if it turns out either of the 2TB
drives is slower than the 6TB, you could do raid0 instead of linear. If
so, I suggest either 32KiB or 64KiB for the mdadm --chunk size. The
default is 512KiB, which is not great for metadata-centric workloads.
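
A rough sketch of both steps, as a dry run that only prints commands.
The dd line is just one simple way to get a sequential read number (it
assumes GNU dd for iflag=direct), and the device names are placeholders
I picked. mdadm takes --chunk in KiB by default, so --chunk=64 means
64KiB.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of executing them.
# All device names are hypothetical placeholders (assumptions).
OLD_A=/dev/sdX   # first 2TB drive
OLD_B=/dev/sdY   # second 2TB drive
BIG=/dev/sdZ     # the 6TB drive
MD=/dev/md0      # md device for the raid0

plan_bench() {
    # Read-only sequential test: 1GiB from each drive, bypassing the cache.
    for dev in "$OLD_A" "$OLD_B" "$BIG"; do
        echo "dd if=$dev of=/dev/null bs=1M count=1024 iflag=direct"
    done
}

plan_raid0() {
    # raid0 across the two 2TB drives with a 64KiB chunk instead of 512KiB.
    echo "mdadm --create $MD --level=0 --chunk=64 --raid-devices=2 $OLD_A $OLD_B"
}

plan_bench
plan_raid0
```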

Of course, if one of them dies in the raid0 case, the error behavior
will be quite a lot more consistent: EIO on every other 64KiB strip. So
you'll definitely want the -r option when doing the replace.
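
To see why it's every other strip, here is a small sketch of the
mapping. It assumes a plain two-member raid0 with equal-size members and
a 64KiB chunk, so strips simply alternate between the drives. The
function name strip_member is made up for this example.

```shell
#!/bin/sh
# Sketch: which raid0 member holds a given byte offset of the md device.
# Assumes two equal-size members and a 64KiB chunk (simple alternation).
CHUNK=65536   # 64KiB chunk size in bytes
NDEV=2        # number of raid0 members

strip_member() {
    # Strip number is offset / chunk; members take strips round-robin.
    echo $(( ($1 / CHUNK) % NDEV ))
}

for off in 0 65536 131072 196608; do
    echo "offset $off -> member $(strip_member "$off")"
done
```

If member 1 is the dead one, every odd-numbered 64KiB strip fails and
every even-numbered one still reads fine.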

--
Chris Murphy