Re: adding new devices to degraded raid1

From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Eric Wong <e@80x24.org>
Cc: kreijack@inwind.it, linux-btrfs@vger.kernel.org
Subject: Re: adding new devices to degraded raid1
Date: Sat, 29 Aug 2020 14:46:10 -0400
Message-ID: <20200829184610.GW5890@hungrycats.org>
In-Reply-To: <20200829004240.GA32462@dcvr>

On Sat, Aug 29, 2020 at 12:42:40AM +0000, Eric Wong wrote:
> Zygo Blaxell <ce3g8jdj@umail.furryterror.org> wrote:
> > Remove makes a copy of every extent, updates every reference to the
> > extent, then deletes the original extents.  Very seek-heavy--including
> > seeks between reads and writes on the same drive--and the work is roughly
> > proportional to the number of reflinks, so dedupe and snapshots push
> > the cost up.  About the only advantage of remove (and balance) is that
> > it consists of 95% existing btrfs read and write code, and it can handle
> > any relocation that does not require changing the size or content of an
> > extent (including all possible conversions).
> 
> Does that mean remove speed would be closer to replace on good SSDs?

It will be better, but there is still a cost for reading and writing
non-contiguously.  "Good SSD" depends on what the SSD is good at.
An SSD rated for NAS or caching use would be OK, but a high-performance
desktop SSD could hit big write-multiplication penalties.  A couple of
brand names starting with "S" have 5-second IO stalls when their internal
caches get full.  Proportionally, the ratio between the best and worst
IO latency in these SSD models is as bad as in SMR drives.  Also, there are
CPU and IO latency costs for 'remove' in the host that don't go away
no matter how good the disks are.
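To make the cost argument from the quoted text concrete (remove pays per
extent and per reference, replace pays roughly per byte), here is a toy
model.  It is only an illustration under stated assumptions: the Extent
type, the cost functions, and every latency/bandwidth number are
hypothetical, not measurements of btrfs or of any particular drive.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    size_bytes: int
    refs: int  # hypothetical: number of references (reflinks/snapshots) to this extent


def remove_cost(extents, seek_s, bw_bytes_s, ref_update_s):
    """Toy model of 'remove': each extent is read and rewritten elsewhere
    (one seek for the read, one for the write, since access is
    non-contiguous), then every reference to it gets a metadata update."""
    total = 0.0
    for e in extents:
        total += 2 * seek_s + 2 * e.size_bytes / bw_bytes_s  # read + write
        total += e.refs * ref_update_s  # per-reference update cost
    return total


def replace_cost(extents, bw_bytes_s):
    """Toy model of 'replace': one sequential copy of the data; the number
    of references does not matter because nothing in the extent tree moves."""
    return sum(e.size_bytes for e in extents) / bw_bytes_s


# Hypothetical SSD-ish parameters: 0.1 ms seek, 500 MB/s, 50 us per ref update.
plain = [Extent(128 * 1024, 1) for _ in range(1000)]
snapshotted = [Extent(128 * 1024, 50) for _ in range(1000)]
print(remove_cost(plain, 1e-4, 5e8, 5e-5), replace_cost(plain, 5e8))
print(remove_cost(snapshotted, 1e-4, 5e8, 5e-5), replace_cost(snapshotted, 5e8))
```

Even with a fast seek time, the remove estimate grows with the reference
count while the replace estimate stays flat, which is the dedupe/snapshot
effect described above.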

> > Arguably this isn't necessary.  Remove could copy a complete block group,
> > the same way replace does but to a different offset on each drive, and
> > simply update the chunk tree with the new location of the block group
> > at the end.  Trouble is, nobody's implemented this approach in btrfs yet.
> > It would be a whole new code path with its very own new bugs to fix.
> 
> Ah, it seems like a ton of work for a use case that mainly
> affects hobbyists.  I won't hold my breath for it.

Well, by that argument, mdadm and lvm shouldn't be able to do it either,
and yet they have supported this style of reshape for years.
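For illustration, the block-group-level remove described in the quoted
text (copy a whole block group to a different offset on another drive,
then update the chunk tree with its new location once the copy is done)
can be sketched as a toy model.  This is not btrfs code: Device,
chunk_map, BG_SIZE and the free-offset allocator are all hypothetical,
and the sketch assumes the departing device is still readable.

```python
from dataclasses import dataclass, field

BG_SIZE = 4  # hypothetical tiny block group size in bytes, for illustration


@dataclass
class Device:
    name: str
    data: bytearray
    free: list = field(default_factory=list)  # hypothetical free physical offsets

    def alloc(self):
        return self.free.pop(0)


def remove_device_by_block_group(chunk_map, devices, victim):
    """Relocate every block group stripe on `victim` by copying the whole
    block group to a free offset on a device that does not already hold a
    copy, then updating that block group's chunk map entry.  Logical
    addresses never change, so no extent references are touched."""
    for logical, stripes in chunk_map.items():
        for i, (dev_name, off) in enumerate(stripes):
            if dev_name != victim:
                continue
            holders = {d for d, _ in stripes}
            target = next(d for d in devices.values()
                          if d.name not in holders and d.free)
            new_off = target.alloc()
            src = devices[dev_name].data
            target.data[new_off:new_off + BG_SIZE] = src[off:off + BG_SIZE]
            stripes[i] = (target.name, new_off)  # the only metadata update
    del devices[victim]


# raid1-style block group mirrored on "a" and "b"; remove "a".
devs = {
    "a": Device("a", bytearray(b"AAAA" + bytes(4)), [4]),
    "b": Device("b", bytearray(b"AAAA" + bytes(4)), [4]),
    "c": Device("c", bytearray(8), [0, 4]),
}
chunk_map = {0: [("a", 0), ("b", 0)]}
remove_device_by_block_group(chunk_map, devs, "a")
print(chunk_map)
```

The per-block-group copy is a large sequential transfer, like replace,
and the cost does not depend on how many reflinks point into the block
group.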