* raid1 with several old drives and a big new one
From: Eric Wong @ 2020-07-31  0:16 UTC
  To: linux-btrfs

Say I have three ancient 2TB HDDs and one new 6TB HDD, is there
a way I can ensure one raid1 copy of the data stays on the new
6TB HDD?

I expect the 2TB HDDs to fail sooner than the 6TB HDD given
their age (>5 years).

The devid balance filter only affects data which already exists
on the device, so that isn't suitable for this, right?

Thanks in advance.


* Re: raid1 with several old drives and a big new one
From: Chris Murphy @ 2020-07-31  2:57 UTC
  To: Eric Wong; +Cc: Btrfs BTRFS

(first attempt did not go to the list)


On Thu, Jul 30, 2020 at 6:16 PM Eric Wong <e@80x24.org> wrote:
>
> Say I have three ancient 2TB HDDs and one new 6TB HDD, is there
> a way I can ensure one raid1 copy of the data stays on the new
> 6TB HDD?

Yes. Use mdadm --level=linear --raid-devices=2 to concatenate the two
2TB drives. Or use LVM (linear by default). Leave the 6TB out of this
regime. And now you have two block devices (one is the concat virtual
device) to do a raid1 with btrfs, and the 6TB will always get one of
the raid1 chunks.
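
To put rough numbers on this layout, here is a minimal Python sketch of
usable raid1 capacity. It assumes the common rule of thumb that two-copy
raid1 capacity is capped by half the total and by what the other devices
can mirror for the largest one; the function name and the layouts are
illustrative only, and the three-drive concat is an assumption based on
the original question rather than on the command above.

```python
def raid1_usable(sizes_tb):
    """Approximate usable capacity of a two-copy raid1 over the given device sizes.

    Every chunk needs two copies on two different devices, so capacity is
    limited by half the total and by how much the remaining devices can
    mirror for the largest device.
    """
    total = sum(sizes_tb)
    largest = max(sizes_tb)
    return float(min(total / 2, total - largest))


# Hypothetical layouts, sizes in TB.
layouts = {
    "6TB + concat(2TB, 2TB)": [6, 4],
    "6TB + concat(2TB, 2TB, 2TB)": [6, 6],
    "btrfs directly on 2+2+2+6": [2, 2, 2, 6],
}
for name, sizes in layouts.items():
    print(f"{name}: {raid1_usable(sizes)} TB usable")
```

With only two devices visible to btrfs, every raid1 chunk pair necessarily
has one copy on the 6TB drive, whatever the capacity works out to.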

There isn't a way to do this with btrfs alone.

When one of the 2TB fails, there's some likelihood that it'll behave
like a partially failing device. Some reads and writes will succeed,
others won't. So you'll need a strategy prepared for that case.
The ideal scenario is a new 4+TB drive, using 'btrfs replace' to
replace the md concat device. Because a large number of errors is
possible during the 'btrfs replace', you might want to use the -r
option.

Following successful replace, an option is to break the 2x 2TB mdadm
concat apart, send the dead drive off for grinding, and the good 2TB
you can add as a 3rd device to the Btrfs. If it dies, same thing.
Preferably use 'btrfs replace' - it's faster and more reliable than
'btrfs delete missing'.

And on second thought...

You might do some rudimentary read/write benchmarks on all three
drives. I haven't found btrfs to be fussy about speed differences
between raid1 member drives. But if it turns out either of the 2TB
drives is slower than the 6TB, you could do raid0 instead of linear.
If so, I suggest either 32KiB or 64KiB for the mdadm --chunk size.
The default is 512KiB, which is not great for metadata-centric
workloads.

Of course, if one of them dies, the error behavior will be quite a lot
more consistent: EIO on every other 64KiB strip. So you'll definitely
want the -r option when doing the replace.
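
The difference in failure pattern between linear and raid0 can be
sketched with a simplified address mapping. This is a toy model, not
mdadm's actual code: it assumes equal-sized members, a 64KiB chunk, and
hypothetical helper names, and it only reports which strips would land
on a dead member.

```python
KIB = 1024
STRIP = 64 * KIB  # assumed chunk size, as suggested above


def linear_member(offset, member_sizes):
    """Index of the member holding byte `offset` in a linear concat."""
    for index, size in enumerate(member_sizes):
        if offset < size:
            return index
        offset -= size
    raise ValueError("offset beyond end of array")


def raid0_member(offset, n_members, chunk_size):
    """Index of the member holding byte `offset` in a striped array."""
    return (offset // chunk_size) % n_members


def failing_strips(member_of, failed, strip_size, n_strips):
    """Strip numbers (out of the first n_strips) that map to the failed member."""
    return [i for i in range(n_strips) if member_of(i * strip_size) == failed]


# Toy array: two members of four strips each; member 1 has failed.
members = [4 * STRIP, 4 * STRIP]
linear_bad = failing_strips(lambda off: linear_member(off, members), 1, STRIP, 8)
raid0_bad = failing_strips(lambda off: raid0_member(off, 2, STRIP), 1, STRIP, 8)
print("linear:", linear_bad)
print("raid0: ", raid0_bad)
```

In this model the linear array loses one contiguous half of its address
space, while the striped array loses every other strip.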

--
Chris Murphy


* Re: raid1 with several old drives and a big new one
From: Eric Wong @ 2020-07-31  3:22 UTC
  To: Chris Murphy; +Cc: linux-btrfs

Chris Murphy <lists@colorremedies.com> wrote:
> On Thu, Jul 30, 2020 at 6:16 PM Eric Wong <e@80x24.org> wrote:
> >
> > Say I have three ancient 2TB HDDs and one new 6TB HDD, is there
> > a way I can ensure one raid1 copy of the data stays on the new
> > 6TB HDD?
> 
> Yes. Use mdadm --level=linear --raid-devices=2 to concatenate the two
> 2TB drives. Or use LVM (linear by default). Leave the 6TB out of this
> regime. And now you have two block devices (one is the concat virtual
> device) to do a raid1 with btrfs, and the 6TB will always get one of
> the raid1 chunks.
> 
> There isn't a way to do this with btrfs alone.

Thanks for the response(s), I was hoping to simplify my stack
with btrfs alone.

> When one of the 2TB fails, there's some likelihood that it'll behave
> like a partially failing device. Some reads and writes will succeed,
> others won't. So you'll need a strategy prepared for that case.
> The ideal scenario is a new 4+TB drive, using 'btrfs replace' to
> replace the md concat device. Because a large number of errors is
> possible during the 'btrfs replace', you might want to use the -r
> option.

If I went ahead with btrfs alone and am prepared to lose some
(not "all") files; could part of the FS remain usable (and the
rest restorable from slow backups) w/o involving LVM?

I could make metadata (and maybe system chunks?) raid1c3 or even
raid1c4 since they seem small and important enough with ancient
HW in play.
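
As a quick reference for the profiles mentioned here, a minimal Python
sketch of how many lost devices each mirroring profile can absorb. It
assumes only that raid1, raid1c3 and raid1c4 keep 2, 3 and 4 copies
respectively, each on a different device; the helper name is made up
for illustration.

```python
# Copies kept by each btrfs mirroring profile; every copy is on a different device.
COPIES = {"raid1": 2, "raid1c3": 3, "raid1c4": 4}


def survives(profile, devices_lost):
    """True if at least one copy of every chunk is guaranteed to remain."""
    return devices_lost < COPIES[profile]


for profile, copies in COPIES.items():
    print(f"{profile}: {copies} copies, tolerates {copies - 1} lost device(s)")
```

So with raid1 data and raid1c3 metadata, losing two drives would leave
the metadata intact in this model, while any data chunk that had both
copies on the lost drives would be gone.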

I mainly wanted raid1 because restoring from backups is slow;
and btrfs would let me grow a single FS without much planning
or having to find identical or even similar drives.

> And on second thought...
> 
> You might do some rudimentary read/write benchmarks on all three

<snip>
Not performance critical at all, all that is on SSD :)


* Re: raid1 with several old drives and a big new one
From: Chris Murphy @ 2020-07-31  3:35 UTC
  To: Eric Wong; +Cc: Btrfs BTRFS

On Thu, Jul 30, 2020 at 9:22 PM Eric Wong <e@80x24.org> wrote:
>
> Chris Murphy <lists@colorremedies.com> wrote:

> > When one of the 2TB fails, there's some likelihood that it'll behave
> > like a partially failing device. Some reads and writes will succeed,
> > others won't. So you'll need a strategy prepared for that case.
> > The ideal scenario is a new 4+TB drive, using 'btrfs replace' to
> > replace the md concat device. Because a large number of errors is
> > possible during the 'btrfs replace', you might want to use the -r
> > option.
>
> If I went ahead with btrfs alone and am prepared to lose some
> (not "all") files; could part of the FS remain usable (and the
> rest restorable from slow backups) w/o involving LVM?
>
> I could make metadata (and maybe system chunks?) raid1c3 or even
> raid1c4 since they seem small and important enough with ancient
> HW in play.

Yes. I'm not sure whether it will mount rw,degraded if 2 devices are
missing, though; it might insist on read-only.


-- 
Chris Murphy


* Re: raid1 with several old drives and a big new one
From: Alberto Bursi @ 2020-07-31  8:29 UTC
  To: Eric Wong, linux-btrfs


On 31/07/20 02:16, Eric Wong wrote:
> Say I have three ancient 2TB HDDs and one new 6TB HDD, is there
> a way I can ensure one raid1 copy of the data stays on the new
> 6TB HDD?
>
> I expect the 2TB HDDs to fail sooner than the 6TB HDD given
> their age (>5 years).
>
> The devid balance filter only affects data which already exists
> on the device, so that isn't suitable for this, right?
>
> Thanks in advance.


I'm not sure what the problem is. OK, maybe the drives are old and
more likely to fail, but why would more than one drive fail at once?

-Alberto



* Re: raid1 with several old drives and a big new one
From: Eric Wong @ 2020-07-31 10:06 UTC
  To: Alberto Bursi; +Cc: linux-btrfs

Alberto Bursi <bobafetthotmail@gmail.com> wrote:
> On 31/07/20 02:16, Eric Wong wrote:
> > Say I have three ancient 2TB HDDs and one new 6TB HDD, is there
> > a way I can ensure one raid1 copy of the data stays on the new
> > 6TB HDD?
> > 
> > I expect the 2TB HDDs to fail sooner than the 6TB HDD given
> > their age (>5 years).
> > 
> 
> I'm not sure what the problem is. OK, maybe the drives are old and
> more likely to fail, but why would more than one drive fail at once?

Why wouldn't they?  Otherwise there'd be no reason for RAID6 to
exist over RAID5.  Recovery puts more stress on the remaining
drives and increases the likelihood of another drive in the pool
failing.  I've seen HW RAID5 arrays lost like this in a
previous life (I didn't manage to convince the other sysadmins
to use RAID6 :<).


* Re: raid1 with several old drives and a big new one
From: Adam Borowski @ 2020-07-31 16:13 UTC
  To: Eric Wong; +Cc: linux-btrfs

On Fri, Jul 31, 2020 at 12:16:52AM +0000, Eric Wong wrote:
> Say I have three ancient 2TB HDDs and one new 6TB HDD, is there
> a way I can ensure one raid1 copy of the data stays on the new
> 6TB HDD?
> 
> I expect the 2TB HDDs to fail sooner than the 6TB HDD given
> their age (>5 years).

While there's no good way to do so in general, in your case, there's no way
for any new block group to be allocated without the big disk.

Btrfs' allocation algorithm is: always pick the disk with most free space
left.  Besides being simple, this guarantees optimally utilizing available
space.

And, for 2+2+2+6, no scheme that doesn't waste space could possibly place
raid1 copies without having one on the biggest disk.

Thus, all you need is to balance once.
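
The argument above can be checked with a small Python simulation. It is
only a model of the rule as stated (each raid1 block group takes one
chunk from each of the two devices with the most free space), with
assumed 1GiB chunks and disk sizes rounded to GiB; it is not the real
btrfs allocator.

```python
def simulate_raid1(free_gib, chunk_gib=1):
    """Allocate raid1 block groups until fewer than two devices have room.

    Model: each block group takes one chunk from each of the two devices
    with the most free space. Returns a list of (dev_a, dev_b) index pairs.
    """
    free = list(free_gib)
    block_groups = []
    while True:
        # Rank device indices by free space, largest first.
        ranked = sorted(range(len(free)), key=lambda d: free[d], reverse=True)
        a, b = ranked[0], ranked[1]
        if free[b] < chunk_gib:
            break  # the second-emptiest device is full: no more mirrored pairs
        free[a] -= chunk_gib
        free[b] -= chunk_gib
        block_groups.append((a, b))
    return block_groups


# Device 3 is the 6TB disk; devices 0-2 are the 2TB disks.
groups = simulate_raid1([2000, 2000, 2000, 6000])
without_big = [g for g in groups if 3 not in g]
print(len(groups), "block groups,", len(without_big), "without the big disk")
```

In this idealized model all 6000 block groups include the 6TB disk,
because its free space always equals the combined free space of the
three small disks and so never drops below any one of them.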

> The devid balance filter only affects data which already exists
> on the device, so that isn't suitable for this, right?

Yeah, balance affects existing data, but doesn't have a lingering effect on
new allocations.


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁
⢿⡄⠘⠷⠚⠋⠀ It's time to migrate your Imaginary Protocol from version 4i to 6i.
⠈⠳⣄⠀⠀⠀⠀


* Re: raid1 with several old drives and a big new one
From: Zygo Blaxell @ 2020-08-01  3:40 UTC
  To: Adam Borowski; +Cc: Eric Wong, linux-btrfs

On Fri, Jul 31, 2020 at 06:13:07PM +0200, Adam Borowski wrote:
> On Fri, Jul 31, 2020 at 12:16:52AM +0000, Eric Wong wrote:
> > Say I have three ancient 2TB HDDs and one new 6TB HDD, is there
> > a way I can ensure one raid1 copy of the data stays on the new
> > 6TB HDD?
> > 
> > I expect the 2TB HDDs to fail sooner than the 6TB HDD given
> > their age (>5 years).

It might be a good idea to run 'btrfs replace' on one of the two 2TB
disks instead of 'device add'.  That will move one copy of the data
very quickly to the new disk.  You then resize the new disk to 6TB (or
'max'), then add the 2TB disk back into the array with btrfs dev add.
This will leave you with 1 full 2TB disk, 1 empty 2TB disk, and a 6TB
disk with 2TB of data on it.

In that case you don't even need to balance--the empty 2TB drive will
fill up with BGs that contain one chunk from the 2TB drive and one
from 6TB, since the allocator will pick the two emptiest drives first.
Everything will be mirrored on the 6TB drive (probably, see below).

The variation in write load might also shift the date when the drives
eventually do fail, so they'll be less likely to fail at the same time.

> While there's no good way to do so in general, in your case, there's no way
> for any new block group to be allocated without the big disk.
> 
> Btrfs' allocation algorithm is: always pick the disk with most free space
> left.  Besides being simple, this guarantees optimally utilizing available
> space.

That is the theory; however, practice is a little different.

Sometimes btrfs just doesn't follow its own rules.  I've filled in
big raid1 arrays with lopsided disks like this, and ended up with one
block group out of every few thousand with a chunk from each of the
two smaller disks.  I guess it's a race condition, possibly triggered
by scrub or balance marking block groups readonly, but I've never fully
investigated.  When the larger disk is _exactly_ the same size as the two
smaller disks, having one block group in the wrong place can be annoying,
as it reduces capacity.

If two disks fail, btrfs will count the number of failing disks and say
"nope, can't mount this degraded raid1, sorry" if even one block group
in the filesystem contains both failing disks.
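
A minimal Python sketch of that rule as described here: a degraded mount
is refused if any raid1 block group has both of its copies on failed
devices. The device names and the function are hypothetical, and this is
a simplification of the description above rather than the kernel's
actual check.

```python
def can_mount_degraded(block_groups, failed):
    """Model: allow the mount only if no block group lost all of its copies.

    block_groups: iterable of sets of device names holding the two copies.
    failed: set of device names that are missing.
    """
    failed = set(failed)
    return not any(set(bg) <= failed for bg in block_groups)


healthy = [{"6TB", "2TB-a"}, {"6TB", "2TB-b"}, {"6TB", "2TB-c"}]
stray = healthy + [{"2TB-a", "2TB-b"}]  # one misplaced block group
failed = {"2TB-a", "2TB-b"}

print(can_mount_degraded(healthy, failed))  # True
print(can_mount_degraded(stray, failed))    # False
```

A single stray block group among thousands is enough to flip the result,
which is why the allocator's occasional exceptions matter.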

In any case, the behavior isn't strictly guaranteed here--btrfs *can*
allocate a block group across the two smaller disks, even though it
normally would not; therefore, there's a risk that it might do so
unexpectedly.

Contrast with combining the two 2TB disks (e.g. with mdadm-raid0 or
linear, or LVM), where btrfs is presented with exactly two devices and
has exactly one option to allocate mirror devices on them.

> And, for 2+2+2+6, no scheme that doesn't waste space could possibly place
> raid1 copies without having one on the biggest disk.
> 
> Thus, all you need is to balance once.
> 
> > The devid balance filter only affects data which already exists
> > on the device, so that isn't suitable for this, right?
> 
> Yeah, balance affects existing data, but doesn't have a lingering effect on
> new allocations.
> 
> Meow!
> -- 
> ⢀⣴⠾⠻⢶⣦⠀
> ⣾⠁⢠⠒⠀⣿⡁
> ⢿⡄⠘⠷⠚⠋⠀ It's time to migrate your Imaginary Protocol from version 4i to 6i.
> ⠈⠳⣄⠀⠀⠀⠀


* Re: raid1 with several old drives and a big new one
From: Roman Mamedov @ 2020-08-01  9:05 UTC
  To: Chris Murphy; +Cc: Eric Wong, Btrfs BTRFS

On Thu, 30 Jul 2020 20:57:38 -0600
Chris Murphy <lists@colorremedies.com> wrote:

> On Thu, Jul 30, 2020 at 6:16 PM Eric Wong <e@80x24.org> wrote:
> >
> > Say I have three ancient 2TB HDDs and one new 6TB HDD, is there
> > a way I can ensure one raid1 copy of the data stays on the new
> > 6TB HDD?
> 
> Yes. Use mdadm --level=linear --raid-devices=2 to concatenate the two
> 2TB drives.

Or go with a RAID0 for this, to get a nice performance benefit as well. It is
a bad idea in any case to hope for any data recoverability from a half-failed
linear "array".

-- 
With respect,
Roman


Thread overview: 9+ messages
2020-07-31  0:16 raid1 with several old drives and a big new one Eric Wong
2020-07-31  2:57 ` Chris Murphy
2020-07-31  3:22   ` Eric Wong
2020-07-31  3:35     ` Chris Murphy
2020-08-01  9:05   ` Roman Mamedov
2020-07-31  8:29 ` Alberto Bursi
2020-07-31 10:06   ` Eric Wong
2020-07-31 16:13 ` Adam Borowski
2020-08-01  3:40   ` Zygo Blaxell
