All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Sterba <dsterba@suse.cz>
To: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
Cc: dsterba@suse.cz, clm@fb.com, josef@toxicpanda.com,
	dsterba@suse.com, anand.jain@oracle.com,
	linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org,
	skhan@linuxfoundation.org, gregkh@linuxfoundation.org,
	linux-kernel-mentees@lists.linuxfoundation.org,
	syzbot+a70e2ad0879f160b9217@syzkaller.appspotmail.com
Subject: Re: [PATCH v2] btrfs: fix rw device counting in __btrfs_free_extra_devids
Date: Fri, 13 Aug 2021 10:51:38 +0200	[thread overview]
Message-ID: <20210813085137.GQ5047@twin.jikos.cz> (raw)
In-Reply-To: <1e0aafb2-9e55-5f64-d347-1765de0560c5@gmail.com>

On Fri, Aug 13, 2021 at 01:31:25AM +0800, Desmond Cheong Zhi Xi wrote:
> On 12/8/21 11:50 pm, David Sterba wrote:
> > On Thu, Aug 12, 2021 at 11:43:16PM +0800, Desmond Cheong Zhi Xi wrote:
> >> On 12/8/21 6:38 pm, David Sterba wrote:
> >>> On Tue, Jul 27, 2021 at 03:13:03PM +0800, Desmond Cheong Zhi Xi wrote:
> >>>> --- a/fs/btrfs/volumes.c
> >>>> +++ b/fs/btrfs/volumes.c
> >>>> @@ -1078,6 +1078,7 @@ static void __btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices,
> >>>>    		if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
> >>>>    			list_del_init(&device->dev_alloc_list);
> >>>>    			clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
> >>>> +			fs_devices->rw_devices--;
> >>>>    		}
> >>>>    		list_del_init(&device->dev_list);
> >>>>    		fs_devices->num_devices--;
> >>>
> >>> I've hit a crash on master branch with stacktrace very similar to one
> >>> this bug was supposed to fix. It's a failed assertion on device close.
> >>> This patch was the last one to touch it and it matches some of the
> >>> keywords, namely the BTRFS_DEV_STATE_REPLACE_TGT bit that used to be in
> >>> the original patch but was not reinstated in your fix.
> >>>
> >>> I'm not sure how reproducible it is, right now I have only one instance
> >>> and am hunting another strange problem. They could be related.
> >>>
> >>> assertion failed: !test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state), in fs/btrfs/volumes.c:1150
> >>>
> >>> https://susepaste.org/view/raw/18223056 full log with other stacktraces,
> >>> possibly relatedg
> >>>
> >>
> >> Looking at the logs, it seems that a dev_replace was started, then
> >> suspended. But it wasn't canceled or resumed before the fs devices were
> >> closed.
> >>
> >> I'll investigate further, just throwing some observations out there.
> > 
> > Thanks. I'm testing the patch revert, no crash after first loop, I'll
> > run a few more to be sure as it's not entirely reliable.
> > 
> > Sending the revert is option of last resort as we're approaching end of
> > 5.14 dev cycle and the crash prevents testing (unlike the fuzzer
> > warning).
> > 
> 
> I might be missing something, so any thoughts would be appreciated. But 
> I don't think the assertion in btrfs_close_one_device is correct.
> 
>  From what I see, this crash happens when close_ctree is called while a 
> dev_replace hasn't completed. In close_ctree, we suspend the 
> dev_replace, but keep the replace target around so that we can resume 
> the dev_replace procedure when we mount the root again. This is the call 
> trace:
> 
>    close_ctree():
>      btrfs_dev_replace_suspend_for_unmount();
>      btrfs_close_devices():
>        btrfs_close_fs_devices():
>          btrfs_close_one_device():
>            ASSERT(!test_bit(BTRFS_DEV_STATE_REPLACE_TGT, 
> &device->dev_state));
> 
> However, since the replace target sticks around, there is a device with 
> BTRFS_DEV_STATE_REPLACE_TGT set, and we fail the assertion in 
> btrfs_close_one_device.
> 
> Two options I can think of:
> 
> - We could remove the assertion.
> 
> - Or we could clear the BTRFS_DEV_STATE_REPLACE_TGT bit in 
> btrfs_dev_replace_suspend_for_unmount. This is fine since the bit is set 
> again in btrfs_init_dev_replace if the dev_replace->replace_state is 
> BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED. But this approach strikes me as 
> a little odd because the device is still the replace target when 
> mounting in the future.

The option #2 does not sound safe because the TGT bit is checked in
several places where device list is queried for various reasons, even
without a mounted filesystem.

Removing the assertion makes more sense but I'm still not convinced that
the this is expected/allowed state of a closed device.

WARNING: multiple messages have this Message-ID (diff)
From: David Sterba <dsterba@suse.cz>
To: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
Cc: linux-kernel@vger.kernel.org,
	syzbot+a70e2ad0879f160b9217@syzkaller.appspotmail.com,
	dsterba@suse.cz, anand.jain@oracle.com, josef@toxicpanda.com,
	clm@fb.com, dsterba@suse.com,
	linux-kernel-mentees@lists.linuxfoundation.org,
	linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v2] btrfs: fix rw device counting in __btrfs_free_extra_devids
Date: Fri, 13 Aug 2021 10:51:38 +0200	[thread overview]
Message-ID: <20210813085137.GQ5047@twin.jikos.cz> (raw)
In-Reply-To: <1e0aafb2-9e55-5f64-d347-1765de0560c5@gmail.com>

On Fri, Aug 13, 2021 at 01:31:25AM +0800, Desmond Cheong Zhi Xi wrote:
> On 12/8/21 11:50 pm, David Sterba wrote:
> > On Thu, Aug 12, 2021 at 11:43:16PM +0800, Desmond Cheong Zhi Xi wrote:
> >> On 12/8/21 6:38 pm, David Sterba wrote:
> >>> On Tue, Jul 27, 2021 at 03:13:03PM +0800, Desmond Cheong Zhi Xi wrote:
> >>>> --- a/fs/btrfs/volumes.c
> >>>> +++ b/fs/btrfs/volumes.c
> >>>> @@ -1078,6 +1078,7 @@ static void __btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices,
> >>>>    		if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
> >>>>    			list_del_init(&device->dev_alloc_list);
> >>>>    			clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
> >>>> +			fs_devices->rw_devices--;
> >>>>    		}
> >>>>    		list_del_init(&device->dev_list);
> >>>>    		fs_devices->num_devices--;
> >>>
> >>> I've hit a crash on master branch with stacktrace very similar to one
> >>> this bug was supposed to fix. It's a failed assertion on device close.
> >>> This patch was the last one to touch it and it matches some of the
> >>> keywords, namely the BTRFS_DEV_STATE_REPLACE_TGT bit that used to be in
> >>> the original patch but was not reinstated in your fix.
> >>>
> >>> I'm not sure how reproducible it is, right now I have only one instance
> >>> and am hunting another strange problem. They could be related.
> >>>
> >>> assertion failed: !test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state), in fs/btrfs/volumes.c:1150
> >>>
> >>> https://susepaste.org/view/raw/18223056 full log with other stacktraces,
> >>> possibly relatedg
> >>>
> >>
> >> Looking at the logs, it seems that a dev_replace was started, then
> >> suspended. But it wasn't canceled or resumed before the fs devices were
> >> closed.
> >>
> >> I'll investigate further, just throwing some observations out there.
> > 
> > Thanks. I'm testing the patch revert, no crash after first loop, I'll
> > run a few more to be sure as it's not entirely reliable.
> > 
> > Sending the revert is option of last resort as we're approaching end of
> > 5.14 dev cycle and the crash prevents testing (unlike the fuzzer
> > warning).
> > 
> 
> I might be missing something, so any thoughts would be appreciated. But 
> I don't think the assertion in btrfs_close_one_device is correct.
> 
>  From what I see, this crash happens when close_ctree is called while a 
> dev_replace hasn't completed. In close_ctree, we suspend the 
> dev_replace, but keep the replace target around so that we can resume 
> the dev_replace procedure when we mount the root again. This is the call 
> trace:
> 
>    close_ctree():
>      btrfs_dev_replace_suspend_for_unmount();
>      btrfs_close_devices():
>        btrfs_close_fs_devices():
>          btrfs_close_one_device():
>            ASSERT(!test_bit(BTRFS_DEV_STATE_REPLACE_TGT, 
> &device->dev_state));
> 
> However, since the replace target sticks around, there is a device with 
> BTRFS_DEV_STATE_REPLACE_TGT set, and we fail the assertion in 
> btrfs_close_one_device.
> 
> Two options I can think of:
> 
> - We could remove the assertion.
> 
> - Or we could clear the BTRFS_DEV_STATE_REPLACE_TGT bit in 
> btrfs_dev_replace_suspend_for_unmount. This is fine since the bit is set 
> again in btrfs_init_dev_replace if the dev_replace->replace_state is 
> BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED. But this approach strikes me as 
> a little odd because the device is still the replace target when 
> mounting in the future.

The option #2 does not sound safe because the TGT bit is checked in
several places where device list is queried for various reasons, even
without a mounted filesystem.

Removing the assertion makes more sense but I'm still not convinced that
the this is expected/allowed state of a closed device.
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

  reply	other threads:[~2021-08-13  8:54 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-27  7:13 [PATCH v2] btrfs: fix rw device counting in __btrfs_free_extra_devids Desmond Cheong Zhi Xi
2021-07-27  7:13 ` Desmond Cheong Zhi Xi
2021-07-28 12:58 ` David Sterba
2021-07-28 12:58   ` David Sterba
2021-08-12 10:38 ` David Sterba
2021-08-12 10:38   ` David Sterba
2021-08-12 15:43   ` Desmond Cheong Zhi Xi
2021-08-12 15:43     ` Desmond Cheong Zhi Xi
2021-08-12 15:50     ` David Sterba
2021-08-12 15:50       ` David Sterba
2021-08-12 17:31       ` Desmond Cheong Zhi Xi
2021-08-12 17:31         ` Desmond Cheong Zhi Xi
2021-08-13  8:51         ` David Sterba [this message]
2021-08-13  8:51           ` David Sterba
2021-08-13  9:57           ` Desmond Cheong Zhi Xi
2021-08-13  9:57             ` Desmond Cheong Zhi Xi
2021-08-13 10:30             ` David Sterba
2021-08-13 10:30               ` David Sterba
2021-08-19 17:11               ` Desmond Cheong Zhi Xi
2021-08-19 17:11                 ` Desmond Cheong Zhi Xi
2021-08-19 17:34                 ` David Sterba
2021-08-19 17:34                   ` David Sterba
2021-08-20  3:09                   ` Desmond Cheong Zhi Xi
2021-08-20  3:09                     ` Desmond Cheong Zhi Xi
2021-08-20 10:58                     ` David Sterba
2021-08-20 10:58                       ` David Sterba
2021-08-20 17:53                       ` Desmond Cheong Zhi Xi
2021-08-20 17:53                         ` Desmond Cheong Zhi Xi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210813085137.GQ5047@twin.jikos.cz \
    --to=dsterba@suse.cz \
    --cc=anand.jain@oracle.com \
    --cc=clm@fb.com \
    --cc=desmondcheongzx@gmail.com \
    --cc=dsterba@suse.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=josef@toxicpanda.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-kernel-mentees@lists.linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=skhan@linuxfoundation.org \
    --cc=syzbot+a70e2ad0879f160b9217@syzkaller.appspotmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.