From: Phil Karn <karn@ka9q.net>
To: linux-btrfs@vger.kernel.org
Subject: Extremely slow device removals
Date: Tue, 28 Apr 2020 00:22:20 -0700
Message-ID: <8b647a7f-1223-fa9f-57c0-9a81a9bbeb27@ka9q.net>

I've been running btrfs in RAID1 mode on four 6TB drives for years. They
have 35K+ hours (about 4 years) of running time, and while they're still
passing SMART scans, I wanted to stop tempting fate. They were also
starting to get full (about 92%) and performance was beginning to suffer.

My plan: replace them with two new 16TB EXOS (Enterprise) drives from
Seagate.
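(For anyone following along, the clean way to do this, as I understand
it, is a pair of in-place "btrfs replace" operations, plus removes for
the two old drives that don't get a one-for-one successor. Device names
here are illustrative, not my actual ones:

  # swap two old drives for the new ones, in place
  btrfs replace start /dev/old1 /dev/new1 /mnt
  btrfs replace start /dev/old2 /dev/new2 /mnt

  # grow each new device to its full size; replace by itself
  # keeps the old device's size
  btrfs filesystem resize <devid-of-new1>:max /mnt
  btrfs filesystem resize <devid-of-new2>:max /mnt

  # then migrate the remaining two old drives off
  btrfs device remove /dev/old3 /dev/old4 /mnt

That's not what I actually did, as you'll see.)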

My first false start was a "device add" of one of the new drives
followed by a "device remove" on an old one. (It had been a while, and
I'd forgotten about "device replace".) This went extremely slowly, and
by morning
it had bombed with a message in the kernel log about running out of
space on (I think) the *old* drive. This seemed odd since the new drive
was still mostly empty.
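(From memory, that false start was just:

  btrfs device add /dev/new1 /mnt
  btrfs device remove /dev/old1 /mnt

with the "device remove" being the step that ran all night and then
bombed.)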

The filesystem also refused to remount right away, but given the furious
drive activity I decided to be patient. The filesystem mounted by
itself an hour or so later. There were plenty of "task hung" messages in
the kernel log, but they all seemed to be warnings. No lost data. Whew.

By now I remembered "device replace". But I'd already done "device add"
on the first new 16 TB drive. That gave me 5 drives online and no spare
slot for the second new drive.

I didn't want to repeat the "device remove" for fear of another
out-of-space failure. So I took a gamble.  I pulled one of the old 6TB
drives to make room for the second new 16TB drive, brought the array up
in degraded mode and started a "device replace missing" operation onto
the second new drive. 'iostat' showed just what I expected: a burst of
reads from one or more of the three old drives alternating with big
writes to the new drive. The data rates were reasonably consistent with
the I/O bandwidth limitations of my 10-year-old server. When it finished
the next day I pulled the old 6TB drive and replaced it with the second
new 16 TB drive. So far so good.
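(Roughly, and again from memory:

  mount -o degraded /dev/sdd3 /mnt
  btrfs replace start <devid-of-missing> /dev/new2 /mnt

"replace start" takes a devid in place of a source device that's no
longer present; "btrfs filesystem show" lists the devids.)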

I then began another "device replace". Since I wasn't forced to degrade
the array this time, I didn't. It's been several days, and it's nowhere
near half done. As far as I can tell, it's only making headway of maybe
100-200 GB/day, so at this rate it could take several more weeks to finish!
Moreover, when I run 'iostat' I see lots of writes *to* the drive
being replaced, usually in parallel with the same amount of data going
to one of the other drives.
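For the record, I'm watching it with the obvious tools, something
like:

  btrfs replace status /mnt   # overall percent complete
  iostat -x 5                 # per-drive throughput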

I'd expect lots of *reads from* the drive being replaced, but why are
there any writes to it at all? Is this just to keep the filesystem
consistent in case of a crash?

I'd already run data and metadata balance operations up to about 95%.
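(Meaning filtered balances along the lines of:

  btrfs balance start -dusage=95 /mnt
  btrfs balance start -musage=95 /mnt

which is what the "about 95%" refers to.)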

I hesitate to tempt fate by forcing the system down to do another
"device replace missing" operation. Can anyone explain why replacing a
missing device is so much faster than replacing an existing device? Is
it simply because, with no redundancy left against a drive loss, less
work needs to (or can) be done to protect against a crash?

Thanks.

Phil Karn

Here's some current system information.

Linux homer.ka9q.net 4.19.0-8-rt-amd64 #1 SMP PREEMPT RT Debian
4.19.98-1 (2020-01-26) x86_64 GNU/Linux

btrfs-progs v4.20.1
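
(The following is from "btrfs filesystem show" and "btrfs filesystem
df", taken while the replace is running:)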

Label: 'homer-btrfs'  uuid: 0d090428-8af8-4d23-99da-92f7176f82a7

Total devices 5 FS bytes used 9.89TiB
    devid    1 size 5.46TiB used 3.81TiB path /dev/sdd3
    devid    2 size 0.00B used 2.72TiB path /dev/sde3 [device currently being replaced]
    devid    4 size 5.46TiB used 5.10TiB path /dev/sdc3
    devid    5 size 14.32TiB used 6.08TiB path /dev/sdb4
    devid    6 size 14.32TiB used 2.08TiB path /dev/sda4

Data, RAID1: total=9.84TiB, used=9.84TiB
System, RAID1: total=32.00MiB, used=1.73MiB
Metadata, RAID1: total=52.00GiB, used=48.32GiB
GlobalReserve, single: total=512.00MiB, used=0.00B