linux-btrfs.vger.kernel.org archive mirror
From: Vladimir Panteleev <thecybershadow@gmail.com>
To: Chris Murphy <lists@colorremedies.com>
Cc: Btrfs BTRFS <linux-btrfs@vger.kernel.org>,
	Qu Wenruo <quwenruo.btrfs@gmx.com>
Subject: Re: "kernel BUG" and segmentation fault with "device delete"
Date: Sat, 6 Jul 2019 03:37:58 +0000
Message-ID: <0212c1f0-f02d-bf0f-5748-b1332b6bbfad@gmail.com>
In-Reply-To: <CAJCQCtS87cQV4PWuDRaQmmY-N03XmGqN2hh8EQv8BqqVGRuxbw@mail.gmail.com>

On 06/07/2019 02.38, Chris Murphy wrote:
> On Fri, Jul 5, 2019 at 6:05 PM Vladimir Panteleev
> <thecybershadow@gmail.com> wrote:
>> Unfortunately as mentioned before that wasn't an option. I was
>> performing this operation on a DM snapshot target backed by a file that
>> certainly could not fit the result of a RAID10-to-RAID1 rebalance.
> 
> Then the total operation isn't possible. Maybe you could have made the
> volume a seed, and then create a single device sprout on a new single
> target, and later convert that sprout to raid1. But I'm not sure of
> the state of multiple device seeds.

That's an interesting idea, thanks; I'll be sure to explore it if I run 
into this situation again.

>> What I found surprising was that "btrfs device delete missing" deletes
>> exactly one device, instead of all missing devices. But that might
>> simply be because a filesystem with RAID10 blocks should not have been
>> mountable rw with two missing drives in the first place.
> 
> It's a really good question for developers if there is a good reason
> to permit rw mount of a volume that's missing two or more devices for
> raid 1, 10, or 5; and missing three or more for raid6. I cannot think
> of a good reason to allow degraded,rw mounts for a raid10 missing two
> devices.

Sorry, the code currently does not permit mounting a RAID10 filesystem 
rw with more than one device missing. I had to patch my kernel to force 
it to do so, as I was working on the assumption that the two remaining 
drives contained a copy of all data (which turned out to be true).

> Wow, that's really interesting. So you did 'btrfs replace start' for
> the devid of one of the missing drives, with a loop device as the
> replacement, and that worked and finished?!

Yes, that's right.
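
For readers following along, the replacement step described above can be sketched roughly as follows. This is only a dry-run illustration that prints commands rather than running them. The devid (2) and the loop device (/dev/loop0) are taken from the listing quoted further down; the backing file path, its size, and the mount point are hypothetical placeholders, not values from this thread.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of executing them.

# $1 = devid of the missing device
# $2 = sparse backing file (hypothetical path)
# $3 = loop device (normally whatever losetup hands out)
# $4 = size of the backing file (hypothetical; must be at least as
#      large as the device being replaced)
# $5 = mount point of the degraded filesystem (hypothetical)
replace_plan() {
    # Create a sparse file; it only consumes space as it is written to.
    echo "truncate -s $4 $2"
    # Attach the file to a loop device so it can act as a block device.
    echo "losetup $3 $2"
    # Replace the missing devid with the loop device.
    echo "btrfs replace start $1 $3 $5"
    # Check progress of the replace operation.
    echo "btrfs replace status $5"
}

replace_plan 2 /var/tmp/devid2.img /dev/loop0 8T /mnt/btrfs
```

The function only echoes its plan, so it is safe to run anywhere; the printed commands would need root and real devices.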

> Does this three device volume mount rw and not degraded? I guess it
> must have because 'btrfs fi us' worked on it.
> 
>          devid    1 size 7.28TiB used 2.71TiB path /dev/sdd1
>          devid    2 size 7.28TiB used 22.01GiB path /dev/loop0
>          devid    3 size 7.28TiB used 2.69TiB path /dev/sdf1

Indeed - with the loop device attached, I can mount the filesystem rw 
just fine without any mount flags, with a stock kernel.

> OK so what happens now if you try to 'btrfs device remove /dev/loop0' ?

Unfortunately it fails in the same way (warning followed by "kernel 
BUG"). The same thing happens if I try to rebalance the metadata.

> Well there's definitely something screwy if Btrfs needs something on a
> missing drive, which is indicated by its refusal to remove it from the
> volume, and yet at the same time it's possible to e.g. rsync every file to
> /dev/null without any errors. That's a bug somewhere.

As I understand it, I don't think it actually "needs" any data from that 
device; it's just having trouble updating some metadata as it tries to 
move one redundant copy of the data from there to somewhere else. It's 
not refusing to remove the device either; rather, it tries to remove it 
and fails.

> I'm not a developer but a dev very well might need to have a simple
> reproducer for this in order to locate the problem. But the call trace
> might tell them what they need to know. I'm not sure.

What I'm going to try to do next is to create another COW layer on top 
of the three devices I have, attach them to a virtual machine, and boot 
that (as it's not fun to reboot the physical machine each time the code 
crashes). Then I could maybe poke the related kernel code to try to 
understand the problem better.
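
A minimal sketch of what such a COW layer might look like, using the device-mapper snapshot target with a non-persistent exception store on a loop-backed sparse file. It is a dry run that only prints commands. The origin device name comes from the listing quoted above; the COW file path, its 16G size, the loop device, the sector count, and the "cow-" name prefix are hypothetical placeholders.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands rather than running them.

# Build a device-mapper table line for a non-persistent snapshot.
#   $1 = origin size in 512-byte sectors (e.g. from `blockdev --getsz`)
#   $2 = origin block device
#   $3 = COW block device (e.g. a loop device over a sparse file)
# "N" = non-persistent exception store, "8" = chunk size in sectors.
snapshot_table() {
    printf '0 %s snapshot %s %s N 8\n' "$1" "$2" "$3"
}

# Print the steps for putting a COW layer over one device.
#   $1 = origin device, $2 = sparse COW file (hypothetical),
#   $3 = loop device (hypothetical), $4 = origin size in sectors
cow_plan() {
    name="cow-$(basename "$1")"
    echo "truncate -s 16G $2"
    echo "losetup $3 $2"
    echo "dmsetup create $name --table '$(snapshot_table "$4" "$1" "$3")'"
}

# Illustrative call; 2048 is a made-up sector count, not the real size.
cow_plan /dev/sdd1 /var/tmp/sdd1.cow /dev/loop1 2048
```

Repeating this per device would give a set of /dev/mapper/cow-* nodes that could be handed to a virtual machine, with all writes landing in the sparse files instead of on the underlying devices.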

-- 
Best regards,
  Vladimir
