From: Phil Karn <karn@ka9q.net>
To: Alexandru Dordea <alex@dordea.net>
Cc: Chris Murphy <lists@colorremedies.com>,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: Extremely slow device removals
Date: Thu, 30 Apr 2020 13:58:48 -0700
Message-ID: <de1c1b3e-09ac-bdc8-c2fe-d95c2bffb766@ka9q.net>
In-Reply-To: <848D59AB-5B64-4C32-BE21-7BC8A8B9821E@dordea.net>

On 4/30/20 13:27, Alexandru Dordea wrote:
> Hello,
>
> I’ve been encountering the same issue with RAID6 for months :)
> I have a btrfs RAID6 array of 15 x 8 TB HDDs plus 5 x 14 TB. One of
> the 15 x 8 TB drives crashed. I removed the faulty drive, and when I
> run the delete missing command the system load climbs, and recovering
> the 6.66 TB will take a few months. After 5 days of running, the
> missing data had decreased only to 6.10 TB.
> During this period the drives are at almost 100% utilization and R/W
> performance is degraded by more than 95%.
I see I have company, and that a more recent kernel has the same problem.
>
> R/W performance is not impacted when the delete/balance process is
> not running. (I don't know whether running balance single-threaded is
> a feature or a bug, but it's a shame that the process keeps only one
> CPU out of 48 at 100%.)

I'm using RAID-1 rather than 6, but for me there's very little CPU
consumption. Nor would I expect there to be, since the work is all in
the copying. I have 8 cores and am running Folding at Home (Covid-19
drug discovery) on 6 of them, but there seems to be plenty of CPU
available; idle time is consistently 12-15%. Still, I tried pausing FAH.
There was no discernible effect on the btrfs remove/copy operation, nor
would I expect there to be, since FAH is entirely CPU-bound and the
remove/copy is entirely I/O-bound. The 'btrfs device remove' command
uses relatively little CPU and always shows as waiting on disk in
'top'. Same for the kernel worker threads.
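
For anyone who wants to check the same thing on their own array, this
is roughly what I've been looking at (a sketch; the device names are
placeholders, and iostat assumes the sysstat package is installed):

  # Per-device utilization: %util near 100 with little CPU time
  # means the operation is I/O-bound, not CPU-bound
  $ iostat -x 5

  # The btrfs workers sit in state D (uninterruptible disk sleep)
  $ ps -eo pid,stat,wchan:30,comm | grep -i btrfs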

But just in case, I've scaled FAH back to 3 threads to see what happens.

I'm thinking maybe it's time to go back to dm-raid for RAID-1 and keep
btrfs only for its snapshots. Integrating RAID into the file system
seemed like a really good idea at the time, but even if I give that
up, the snapshots alone make btrfs worth keeping.
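
If I do go that way, the setup would look something like this (a
sketch, not commands I've run on this box; the device names and mount
points are placeholders):

  # Classic md RAID-1 underneath
  $ mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1

  # Single-device btrfs on top, so I keep the snapshots
  $ mkfs.btrfs /dev/md0
  $ mount /dev/md0 /data
  $ btrfs subvolume snapshot -r /data /data/.snapshots/data-2020-04-30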

When I ran XFS on top of dm-raid1, I'd periodically pull one drive, put
it in the safe, replace it with a blank drive, and let it rebuild. That
gave me a full image backup, and the rebuild went at full disk speed,
even though it had to copy unused space. Based on what I'm seeing now,
that trade-off is preferable: a full-disk image copy at sequential
speed is still much faster than copying only the used blocks in
semi-random order and shaking the hell out of my drives.
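
With md that rotation is just a fail/remove/add cycle (again a sketch;
adjust the device names):

  # Mark the outgoing drive failed and pull it from the array
  $ mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1

  # Swap in the blank drive, add it, and let md rebuild sequentially
  $ mdadm /dev/md0 --add /dev/sdc1
  $ cat /proc/mdstat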

I wonder if putting bcache and an SSD between btrfs and my drives would
help...? How about a few hundred GB of RAM (I have only 12)?
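
If anyone has tried it, I'm picturing the standard bcache setup (a
sketch, untested here; the device names and the cache-set UUID are
placeholders):

  # SSD as the cache device, HDD as the backing device
  $ make-bcache -C /dev/nvme0n1
  $ make-bcache -B /dev/sda

  # Attach the backing device to the cache set (UUID from make-bcache
  # or bcache-super-show), then enable writeback to absorb the
  # semi-random writes
  $ echo <cset-uuid> | sudo tee /sys/block/bcache0/bcache/attach
  $ echo writeback | sudo tee /sys/block/bcache0/bcache/cache_mode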

--Phil




