* [PATCH 0/2] fix issues during suspended replace operation @ 2022-08-12 10:32 Anand Jain 2022-08-12 10:32 ` [PATCH 1/2] btrfs: fix assert during replace-cancel of suspended replace-operation Anand Jain ` (2 more replies) 0 siblings, 3 replies; 7+ messages in thread From: Anand Jain @ 2022-08-12 10:32 UTC (permalink / raw) To: linux-btrfs While trying to reproduce the issue reported in the ML. Found few issues as in the individual independent patches below. Anand Jain (2): btrfs: fix assert during replace-cancel of suspended replace-operation btrfs: add info when mount fails due to stale replace target fs/btrfs/dev-replace.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) -- 2.33.1 ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/2] btrfs: fix assert during replace-cancel of suspended replace-operation 2022-08-12 10:32 [PATCH 0/2] fix issues during suspended replace operation Anand Jain @ 2022-08-12 10:32 ` Anand Jain 2022-09-22 10:00 ` Filipe Manana 2022-08-12 10:32 ` [PATCH 2/2] btrfs: add info when mount fails due to stale replace target Anand Jain 2022-08-22 19:36 ` [PATCH 0/2] fix issues during suspended replace operation David Sterba 2 siblings, 1 reply; 7+ messages in thread From: Anand Jain @ 2022-08-12 10:32 UTC (permalink / raw) To: linux-btrfs If the filesystem mounts with the replace-operation in a suspended state and try to cancel the suspended replace-operation, we hit the assert. The assert came from the commit fe97e2e173af ("btrfs: dev-replace: replace's scrub must not be running in suspended state") that was actually not required. So just remove it. $ mount /dev/sda5 /btrfs BTRFS info (device sda5): cannot continue dev_replace, tgtdev is missing BTRFS info (device sda5): you may cancel the operation after 'mount -o degraded' $ mount -o degraded /dev/sda5 /btrfs <-- success. $ btrfs replace cancel /btrfs kernel: assertion failed: ret != -ENOTCONN, in fs/btrfs/dev-replace.c:1131 kernel: ------------[ cut here ]------------ kernel: kernel BUG at fs/btrfs/ctree.h:3750! After the patch: $ btrfs replace cancel /btrfs BTRFS info (device sda5): suspended dev_replace from /dev/sda5 (devid 1) to <missing disk> canceled Signed-off-by: Anand Jain <anand.jain@oracle.com> --- fs/btrfs/dev-replace.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c index 488f2105c5d0..9d46a702bc11 100644 --- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -1124,8 +1124,7 @@ int btrfs_dev_replace_cancel(struct btrfs_fs_info *fs_info) up_write(&dev_replace->rwsem); /* Scrub for replace must not be running in suspended state */ - ret = btrfs_scrub_cancel(fs_info); - ASSERT(ret != -ENOTCONN); + btrfs_scrub_cancel(fs_info); trans = btrfs_start_transaction(root, 0); if (IS_ERR(trans)) { -- 2.33.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] btrfs: fix assert during replace-cancel of suspended replace-operation 2022-08-12 10:32 ` [PATCH 1/2] btrfs: fix assert during replace-cancel of suspended replace-operation Anand Jain @ 2022-09-22 10:00 ` Filipe Manana 2022-09-22 11:28 ` Anand Jain 0 siblings, 1 reply; 7+ messages in thread From: Filipe Manana @ 2022-09-22 10:00 UTC (permalink / raw) To: Anand Jain; +Cc: linux-btrfs On Fri, Aug 12, 2022 at 11:56 AM Anand Jain <anand.jain@oracle.com> wrote: > > If the filesystem mounts with the replace-operation in a suspended state > and try to cancel the suspended replace-operation, we hit the assert. The > assert came from the commit fe97e2e173af ("btrfs: dev-replace: replace's > scrub must not be running in suspended state") that was actually not > required. So just remove it. > > $ mount /dev/sda5 /btrfs > > BTRFS info (device sda5): cannot continue dev_replace, tgtdev is missing > BTRFS info (device sda5): you may cancel the operation after 'mount -o degraded' > > $ mount -o degraded /dev/sda5 /btrfs <-- success. > > $ btrfs replace cancel /btrfs > > kernel: assertion failed: ret != -ENOTCONN, in fs/btrfs/dev-replace.c:1131 > kernel: ------------[ cut here ]------------ > kernel: kernel BUG at fs/btrfs/ctree.h:3750! > > After the patch: > > $ btrfs replace cancel /btrfs > > BTRFS info (device sda5): suspended dev_replace from /dev/sda5 (devid 1) to <missing disk> canceled Anand, can you please add a test case to fstests? This is a scenario with no coverage at all in fstests, therefore specially useful to have it there. Thanks. > > Signed-off-by: Anand Jain <anand.jain@oracle.com> > --- > fs/btrfs/dev-replace.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c > index 488f2105c5d0..9d46a702bc11 100644 > --- a/fs/btrfs/dev-replace.c > +++ b/fs/btrfs/dev-replace.c > @@ -1124,8 +1124,7 @@ int btrfs_dev_replace_cancel(struct btrfs_fs_info *fs_info) > up_write(&dev_replace->rwsem); > > /* Scrub for replace must not be running in suspended state */ > - ret = btrfs_scrub_cancel(fs_info); > - ASSERT(ret != -ENOTCONN); > + btrfs_scrub_cancel(fs_info); > > trans = btrfs_start_transaction(root, 0); > if (IS_ERR(trans)) { > -- > 2.33.1 > -- Filipe David Manana, “Whether you think you can, or you think you can't — you're right.” ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] btrfs: fix assert during replace-cancel of suspended replace-operation 2022-09-22 10:00 ` Filipe Manana @ 2022-09-22 11:28 ` Anand Jain 0 siblings, 0 replies; 7+ messages in thread From: Anand Jain @ 2022-09-22 11:28 UTC (permalink / raw) To: fdmanana; +Cc: linux-btrfs On 22/09/2022 18:00, Filipe Manana wrote: > On Fri, Aug 12, 2022 at 11:56 AM Anand Jain <anand.jain@oracle.com> wrote: >> >> If the filesystem mounts with the replace-operation in a suspended state >> and try to cancel the suspended replace-operation, we hit the assert. The >> assert came from the commit fe97e2e173af ("btrfs: dev-replace: replace's >> scrub must not be running in suspended state") that was actually not >> required. So just remove it. >> >> $ mount /dev/sda5 /btrfs >> >> BTRFS info (device sda5): cannot continue dev_replace, tgtdev is missing >> BTRFS info (device sda5): you may cancel the operation after 'mount -o degraded' >> >> $ mount -o degraded /dev/sda5 /btrfs <-- success. >> >> $ btrfs replace cancel /btrfs >> >> kernel: assertion failed: ret != -ENOTCONN, in fs/btrfs/dev-replace.c:1131 >> kernel: ------------[ cut here ]------------ >> kernel: kernel BUG at fs/btrfs/ctree.h:3750! >> >> After the patch: >> >> $ btrfs replace cancel /btrfs >> >> BTRFS info (device sda5): suspended dev_replace from /dev/sda5 (devid 1) to <missing disk> canceled > > Anand, can you please add a test case to fstests? > This is a scenario with no coverage at all in fstests, therefore > specially useful to have it there. > I thought about it before and found that unless we implement the replace-pause sub-command, we can't get a pending replace item in an unmounted btrfs using a script. However, to test it manually, I did an abrupt reboot (or power-off, I think). Thanks. > Thanks. > >> >> Signed-off-by: Anand Jain <anand.jain@oracle.com> >> --- >> fs/btrfs/dev-replace.c | 3 +-- >> 1 file changed, 1 insertion(+), 2 deletions(-) >> >> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c >> index 488f2105c5d0..9d46a702bc11 100644 >> --- a/fs/btrfs/dev-replace.c >> +++ b/fs/btrfs/dev-replace.c >> @@ -1124,8 +1124,7 @@ int btrfs_dev_replace_cancel(struct btrfs_fs_info *fs_info) >> up_write(&dev_replace->rwsem); >> >> /* Scrub for replace must not be running in suspended state */ >> - ret = btrfs_scrub_cancel(fs_info); >> - ASSERT(ret != -ENOTCONN); >> + btrfs_scrub_cancel(fs_info); >> >> trans = btrfs_start_transaction(root, 0); >> if (IS_ERR(trans)) { >> -- >> 2.33.1 >> > > ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 2/2] btrfs: add info when mount fails due to stale replace target 2022-08-12 10:32 [PATCH 0/2] fix issues during suspended replace operation Anand Jain 2022-08-12 10:32 ` [PATCH 1/2] btrfs: fix assert during replace-cancel of suspended replace-operation Anand Jain @ 2022-08-12 10:32 ` Anand Jain 2022-08-22 19:36 ` David Sterba 2022-08-22 19:36 ` [PATCH 0/2] fix issues during suspended replace operation David Sterba 2 siblings, 1 reply; 7+ messages in thread From: Anand Jain @ 2022-08-12 10:32 UTC (permalink / raw) To: linux-btrfs; +Cc: Samuel Greiner If the replace-target device re-appears after the suspended replace is cancelled, it blocks the mount operation as it can't find the matching replace-item in the metadata. As shown below, BTRFS error (device sda5): replace devid present without an active replace item To overcome this situation, the user can run the command btrfs device scan --forget <device-path-to-devid=0> and try the mount command again. And also, to avoid repeating the issue, superblock on the devid=0 must be wiped. wipefs -a device-path-to-devid=0. This patch adds some info when this situation occurs. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reported-by: Samuel Greiner <samuel@balkonien.org> Link: https://lore.kernel.org/linux-btrfs/b4f62b10-b295-26ea-71f9-9a5c9299d42c@balkonien.org/T/ --- fs/btrfs/dev-replace.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c index 9d46a702bc11..7202b76ce59f 100644 --- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -166,6 +166,8 @@ int btrfs_init_dev_replace(struct btrfs_fs_info *fs_info) if (btrfs_find_device(fs_info->fs_devices, &args)) { btrfs_err(fs_info, "replace devid present without an active replace item"); + btrfs_info(fs_info, + "mount after the command 'btrfs deivce scan --forget <devpath-of-id-0>'"); ret = -EUCLEAN; } else { dev_replace->srcdev = NULL; -- 2.33.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] btrfs: add info when mount fails due to stale replace target 2022-08-12 10:32 ` [PATCH 2/2] btrfs: add info when mount fails due to stale replace target Anand Jain @ 2022-08-22 19:36 ` David Sterba 0 siblings, 0 replies; 7+ messages in thread From: David Sterba @ 2022-08-22 19:36 UTC (permalink / raw) To: Anand Jain; +Cc: linux-btrfs, Samuel Greiner On Fri, Aug 12, 2022 at 06:32:19PM +0800, Anand Jain wrote: > If the replace-target device re-appears after the suspended replace is > cancelled, it blocks the mount operation as it can't find the matching > replace-item in the metadata. As shown below, > > BTRFS error (device sda5): replace devid present without an active replace item > > To overcome this situation, the user can run the command > > btrfs device scan --forget <device-path-to-devid=0> > > and try the mount command again. And also, to avoid repeating the issue, > superblock on the devid=0 must be wiped. > > wipefs -a device-path-to-devid=0. > > This patch adds some info when this situation occurs. > > Signed-off-by: Anand Jain <anand.jain@oracle.com> > Reported-by: Samuel Greiner <samuel@balkonien.org> > Link: https://lore.kernel.org/linux-btrfs/b4f62b10-b295-26ea-71f9-9a5c9299d42c@balkonien.org/T/ > --- > fs/btrfs/dev-replace.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c > index 9d46a702bc11..7202b76ce59f 100644 > --- a/fs/btrfs/dev-replace.c > +++ b/fs/btrfs/dev-replace.c > @@ -166,6 +166,8 @@ int btrfs_init_dev_replace(struct btrfs_fs_info *fs_info) > if (btrfs_find_device(fs_info->fs_devices, &args)) { > btrfs_err(fs_info, > "replace devid present without an active replace item"); > + btrfs_info(fs_info, > + "mount after the command 'btrfs deivce scan --forget <devpath-of-id-0>'"); The messages should be on the same level and in one message, I've reprhrased it a bit. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 0/2] fix issues during suspended replace operation 2022-08-12 10:32 [PATCH 0/2] fix issues during suspended replace operation Anand Jain 2022-08-12 10:32 ` [PATCH 1/2] btrfs: fix assert during replace-cancel of suspended replace-operation Anand Jain 2022-08-12 10:32 ` [PATCH 2/2] btrfs: add info when mount fails due to stale replace target Anand Jain @ 2022-08-22 19:36 ` David Sterba 2 siblings, 0 replies; 7+ messages in thread From: David Sterba @ 2022-08-22 19:36 UTC (permalink / raw) To: Anand Jain; +Cc: linux-btrfs On Fri, Aug 12, 2022 at 06:32:17PM +0800, Anand Jain wrote: > While trying to reproduce the issue reported in the ML. Found few > issues as in the individual independent patches below. > > Anand Jain (2): > btrfs: fix assert during replace-cancel of suspended replace-operation > btrfs: add info when mount fails due to stale replace target Added to misc-next, thanks. ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2022-09-22 11:28 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-08-12 10:32 [PATCH 0/2] fix issues during suspended replace operation Anand Jain 2022-08-12 10:32 ` [PATCH 1/2] btrfs: fix assert during replace-cancel of suspended replace-operation Anand Jain 2022-09-22 10:00 ` Filipe Manana 2022-09-22 11:28 ` Anand Jain 2022-08-12 10:32 ` [PATCH 2/2] btrfs: add info when mount fails due to stale replace target Anand Jain 2022-08-22 19:36 ` David Sterba 2022-08-22 19:36 ` [PATCH 0/2] fix issues during suspended replace operation David Sterba
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).