From: Chris Murphy <lists@colorremedies.com>
To: Vladimir Panteleev <thecybershadow@gmail.com>
Cc: Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: "kernel BUG" and segmentation fault with "device delete"
Date: Fri, 5 Jul 2019 15:43:13 -0600	[thread overview]
Message-ID: <CAJCQCtRhXukLGrWTK1D5TLRhxwF6e31oewOSNDg2TAxSanavMA@mail.gmail.com> (raw)
In-Reply-To: <966f5562-1993-2a4f-0d6d-5cea69d6e1c6@gmail.com>

On Thu, Jul 4, 2019 at 10:39 PM Vladimir Panteleev
<thecybershadow@gmail.com> wrote:
>
> Hi,
>
> I'm trying to convert a data=RAID10,metadata=RAID1 (4 disks) array to
> RAID1 (2 disks). The array was less than half full, and I disconnected
> two parity drives, leaving two that contained one copy of all data.

There's no parity on either raid10 or raid1; both are mirroring
profiles. But I can't tell from the above exactly when each drive was
disconnected. In this scenario you need to convert to raid1 first and
wait for that balance to complete successfully before you can do a
device remove. That's clear. Also clear: you must use 'btrfs device
remove', and it must complete before that device is disconnected.

What I've never tried, but the man page implies, is that you can
specify two devices at one time for 'btrfs device remove', if the
profile and the number of devices permit it. So the exact order of
commands you used is really important for understanding the problem
and the solution, including whether there might be a bug.
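
For the record, the safe sequence, with all four drives still
connected, looks something like this (the mount point and device
names are illustrative, and I haven't tested this exact combination):

    # convert data block groups to raid1; metadata is already raid1
    btrfs balance start -dconvert=raid1 /mnt

    # once the balance finishes, remove both devices in one command,
    # while they are still physically attached
    btrfs device remove /dev/sdb1 /dev/sdc1 /mnt

Only after 'device remove' returns successfully is it safe to
physically disconnect those drives.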


>
> After stubbing out btrfs_check_rw_degradable (because btrfs currently
> can't realize when it has all drives needed for RAID10),

Uhh? This implies it was still raid10 when you disconnected two
drives of a four-drive raid10. That's definitely data loss territory.
However, your 'btrfs fi us' output suggests only raid1 chunks. What
I'm suspicious of is this:

>>Data,RAID1: Size:2.66TiB, Used:2.66TiB
>>  /dev/sdd1   2.66TiB
>>  /dev/sdf1   2.66TiB

All data block groups are only on sdf1 and sdd1.

>>Metadata,RAID1: Size:57.00GiB, Used:52.58GiB
>>  /dev/sdd1  57.00GiB
>>  /dev/sdf1  37.00GiB
>>  missing    20.00GiB

There's metadata still on one of the missing devices. You need to
physically reconnect this device. The device removal did not complete
before this device was physically disconnected.
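
Once it's reconnected, you should be able to mount normally (not
degraded) and finish the removal, something like this (mount point
illustrative):

    # confirm every device is present, and note each drive's devid
    btrfs filesystem show /mnt

    # remove the remaining unwanted device by path or by devid
    btrfs device remove <devid> /mnt

'device remove' accepts a devid in place of a device path, which
helps when device names have shuffled after reconnecting.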

>>System,RAID1: Size:8.00MiB, Used:416.00KiB
>>  /dev/sdd1   8.00MiB
>>  missing     8.00MiB

This is potentially worse, because it means there's only one copy of
the system chunk, on sdd1. It has not been replicated to sdf1, but is
on the missing device. So it definitely sounds like the missing
device was physically removed before the 'device remove' command
finished.

Depending on degraded operation for this task is the wrong strategy.
You needed to run 'btrfs device delete/remove' before physically
disconnecting these drives.
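
As a sanity check before pulling any drive, something like this
(mount point illustrative) shows per-device allocation:

    # list every device still in the filesystem and what's allocated
    btrfs device usage /mnt

A drive that was successfully removed won't appear in that output at
all, because 'device remove' takes it out of the filesystem entirely.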


> I've
> successfully mounted rw+degraded, balance-converted all RAID10 data to
> RAID1, and then btrfs-device-delete-d one of the missing drives. It
> fails at deleting the second.

OK, you definitely did this incorrectly: you expected to disconnect
two devices at the same time and then run 'btrfs device delete
missing', instead of explicitly deleting the drives by devid before
physically disconnecting them.

It sounds to me like you had a successful conversion from a 4-disk
raid10 to a 4-disk raid1. But then you assumed there are sufficient
copies of all data and metadata on each drive. That is not the case
with Btrfs. The drives are not mirrored; the block groups are
mirrored. Btrfs raid1 tolerates exactly one device loss, not two.
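
To illustrate with a made-up chunk layout (not from your
filesystem):

    chunk A: copy on sdb1, copy on sdc1
    chunk B: copy on sdc1, copy on sdd1
    chunk C: copy on sdd1, copy on sdf1

Pull sdb1 and sdc1 at the same time and both copies of chunk A are
gone, even though two of the four drives survive.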


-- 
Chris Murphy


Thread overview: 17+ messages
2019-07-05  4:39 "kernel BUG" and segmentation fault with "device delete" Vladimir Panteleev
2019-07-05  7:01 ` Vladimir Panteleev
2019-07-05  9:42 ` Andrei Borzenkov
2019-07-05 10:20   ` Vladimir Panteleev
2019-07-05 21:48     ` Chris Murphy
2019-07-05 22:04       ` Chris Murphy
2019-07-05 21:43 ` Chris Murphy [this message]
2019-07-06  0:05   ` Vladimir Panteleev
2019-07-06  2:38     ` Chris Murphy
2019-07-06  3:37       ` Vladimir Panteleev
2019-07-06 17:36         ` Chris Murphy
2019-07-06  5:01 ` Qu Wenruo
2019-07-06  5:13   ` Vladimir Panteleev
2019-07-06  5:51     ` Qu Wenruo
2019-07-06 15:09       ` Vladimir Panteleev
2019-07-20 10:59       ` Vladimir Panteleev
2019-08-08 20:40         ` Vladimir Panteleev
