From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Phil Karn <karn@ka9q.net>
Cc: Paul Jones <paul@pauljones.id.au>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: Extremely slow device removals
Date: Sat, 2 May 2020 03:42:37 -0400
Message-ID: <20200502074237.GM10769@hungrycats.org>
In-Reply-To: <CAMwB8mhGkcM3DCTusuHAi-cQcr-FrA5cq4hVYfv+65zn_QjAig@mail.gmail.com>

On Sat, May 02, 2020 at 12:20:42AM -0700, Phil Karn wrote:
> So I'm trying to figure out the advantage of including RAID 1 inside
> btrfs instead of just running it over a conventional (fs-agnostic)
> RAID subsystem.
> 
> I was originally really intrigued by the idea of integrating RAID into
> the file system since it seemed like you could do more that way, or at
> least do things more efficiently. For example, when adding or
> replacing a mirror you'd only have to copy those parts of the disk
> that actually contain data. That promised better performance. But if
> those actually-used blocks are copied in small pieces and in random
> order so the operation is far slower than the logical equivalent of
> "dd if=disk1 of=disk2",

If you use btrfs replace to move data between drives then you get all
the advantages you describe.  Don't do 'device remove' if you can possibly
avoid it.

Array reshapes in btrfs are currently slower than they need to be, but
there's no on-disk-format reason why they can't be as fast as replace
in many cases.

> then what's left?

If there's data corruption on one disk, btrfs can detect it and replace
the lost data from the good copy.  Most block-level raid1's have a 50%
chance of corrupting the good copy with the bad one, and can only report
corruption as a difference in content between the drives (i.e. you have
to guess which is correct), if they bother to report corruption at all.
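
To make the difference concrete, here is a minimal sketch (not btrfs code)
of why a checksum stored apart from the data matters.  With it, a reader
can tell which mirror copy is good and rewrite the bad one; without it,
a mirror can only notice that the copies differ.  The function names are
invented for illustration, and zlib.crc32 is only a stand-in for whatever
checksum the filesystem actually uses.

```python
import zlib

def read_with_csum(copies, stored_csum):
    """Return (data, bad_indexes) using the first copy that matches the checksum.

    Copies that fail the checksum are overwritten in place from the good copy.
    """
    good = next((c for c in copies if zlib.crc32(c) == stored_csum), None)
    if good is None:
        raise IOError("all copies fail checksum")
    bad = [i for i, c in enumerate(copies) if zlib.crc32(c) != stored_csum]
    for i in bad:
        copies[i] = good  # repair the bad copy from the good one
    return good, bad

def read_blind(copies):
    """Checksum-less mirror: returns the first copy and whether the copies differ."""
    return copies[0], len(set(copies)) > 1
```

Given one corrupted copy and one intact copy, read_with_csum returns the
intact data and names the bad copy, while read_blind returns whichever copy
happens to be first and can only flag a mismatch.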

This allows you to take advantage of diverse redundant storage (e.g.
raid1 pairs composed of disks made by different vendors).  In btrfs
raid1, heterogeneous drive firmware maximizes the chance of having one
bug-free firmware, and scrub will tell you exactly which drive is bad.
In other raid1 implementations, a heterogeneous raid1 pair maximizes the
chance of one firmware in the array having a bug that corrupts the data
on the good drives, and doesn't tell you which drive is bad.
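
As a rough illustration of the "tells you exactly which drive is bad"
part, the sketch below checks every block on every device against an
independently stored checksum and tallies failures per device.  This is
a toy model, not how btrfs scrub is implemented: the scrub() function, the
device names, and the use of zlib.crc32 are all assumptions made for the
example.

```python
import zlib
from collections import Counter

def scrub(devices, csums):
    """Count checksum failures per device.

    devices: {name: [block bytes, ...]}, one list of mirrored blocks per device
    csums:   expected checksum for each block index
    """
    errors = Counter()
    for idx, expected in enumerate(csums):
        for name, blocks in devices.items():
            if zlib.crc32(blocks[idx]) != expected:
                errors[name] += 1
    return errors

blocks = [b"alpha", b"beta", b"gamma"]
csums = [zlib.crc32(b) for b in blocks]
devices = {
    "vendor_a": list(blocks),
    "vendor_b": [b"alpha", b"bet\x00", b"gamma"],  # simulated firmware corruption
}
report = scrub(devices, csums)
```

The per-device tally is the point: the error is attributed to vendor_b
rather than reported as an unattributed mismatch between the two drives.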

btrfs does not rely on lower level hardware for error detection, so you
can use cheap SSDs and SD cards that have no firmware capability to detect
or report the flash equivalent of UNC sectors as btrfs raid1 members.
I usually can squeeze about six months of extra life out of some very
substandard storage hardware with btrfs.

> Even the ability to use drives of different sizes isn't unique to
> btrfs. You can use LVM to concatenate smaller volumes into larger
> logical ones.
> 
> Phil
