* [PATCH 0/2] fix issues during suspended replace operation
From: Anand Jain @ 2022-08-12 10:32 UTC
To: linux-btrfs
While trying to reproduce the issue reported on the mailing list, I
found a few issues, addressed in the independent patches below.
Anand Jain (2):
btrfs: fix assert during replace-cancel of suspended replace-operation
btrfs: add info when mount fails due to stale replace target
fs/btrfs/dev-replace.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
--
2.33.1
* [PATCH 1/2] btrfs: fix assert during replace-cancel of suspended replace-operation
From: Anand Jain @ 2022-08-12 10:32 UTC
To: linux-btrfs
If the filesystem is mounted with the replace operation in a suspended
state and we then try to cancel the suspended replace operation, we hit
an assert. The assert was added by commit fe97e2e173af ("btrfs:
dev-replace: replace's scrub must not be running in suspended state")
and was not actually required, so just remove it.
$ mount /dev/sda5 /btrfs
BTRFS info (device sda5): cannot continue dev_replace, tgtdev is missing
BTRFS info (device sda5): you may cancel the operation after 'mount -o degraded'
$ mount -o degraded /dev/sda5 /btrfs <-- success.
$ btrfs replace cancel /btrfs
kernel: assertion failed: ret != -ENOTCONN, in fs/btrfs/dev-replace.c:1131
kernel: ------------[ cut here ]------------
kernel: kernel BUG at fs/btrfs/ctree.h:3750!
After the patch:
$ btrfs replace cancel /btrfs
BTRFS info (device sda5): suspended dev_replace from /dev/sda5 (devid 1) to <missing disk> canceled
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
fs/btrfs/dev-replace.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 488f2105c5d0..9d46a702bc11 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -1124,8 +1124,7 @@ int btrfs_dev_replace_cancel(struct btrfs_fs_info *fs_info)
up_write(&dev_replace->rwsem);
/* Scrub for replace must not be running in suspended state */
- ret = btrfs_scrub_cancel(fs_info);
- ASSERT(ret != -ENOTCONN);
+ btrfs_scrub_cancel(fs_info);
trans = btrfs_start_transaction(root, 0);
if (IS_ERR(trans)) {
--
2.33.1
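The failure mode described in the commit message can be illustrated with a small userspace sketch. It is not the kernel code: struct fake_fs and the three functions are hypothetical stand-ins, and the sketch assumes, as the assertion text suggests, that the scrub cancel reports -ENOTCONN when no scrub is running, which is always the case for a replace that is still suspended after mount.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for filesystem state; not the kernel's btrfs_fs_info. */
struct fake_fs {
	int scrubs_running;     /* number of scrubs in flight */
	int replace_suspended;  /* 1 while a replace is suspended */
};

/* Models a scrub cancel that reports -ENOTCONN when nothing is running. */
static int fake_scrub_cancel(struct fake_fs *fs)
{
	assert(fs != NULL);
	if (fs->scrubs_running == 0)
		return -ENOTCONN;
	fs->scrubs_running = 0;
	return 0;
}

/* Pre-patch shape: returns -1 where the kernel ASSERT would have fired. */
static int cancel_suspended_old(struct fake_fs *fs)
{
	int ret = fake_scrub_cancel(fs);

	if (ret == -ENOTCONN)
		return -1;
	fs->replace_suspended = 0;
	return 0;
}

/* Post-patch shape: the scrub-cancel result is ignored. */
static int cancel_suspended_new(struct fake_fs *fs)
{
	fake_scrub_cancel(fs);
	fs->replace_suspended = 0;
	return 0;
}
```

With scrubs_running == 0 and replace_suspended == 1, the old shape stops at the check and leaves the replace suspended, while the new shape goes on to clear the suspended state.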
* [PATCH 2/2] btrfs: add info when mount fails due to stale replace target
From: Anand Jain @ 2022-08-12 10:32 UTC
To: linux-btrfs; +Cc: Samuel Greiner
If the replace-target device reappears after the suspended replace has
been cancelled, it blocks the mount operation because the matching
replace item can no longer be found in the metadata, as shown below:
BTRFS error (device sda5): replace devid present without an active replace item
To overcome this situation, the user can run the command
btrfs device scan --forget <device-path-to-devid=0>
and try the mount command again. Also, to avoid repeating the issue,
the superblock on devid=0 must be wiped:
wipefs -a <device-path-to-devid=0>
This patch adds some info when this situation occurs.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reported-by: Samuel Greiner <samuel@balkonien.org>
Link: https://lore.kernel.org/linux-btrfs/b4f62b10-b295-26ea-71f9-9a5c9299d42c@balkonien.org/T/
---
fs/btrfs/dev-replace.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 9d46a702bc11..7202b76ce59f 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -166,6 +166,8 @@ int btrfs_init_dev_replace(struct btrfs_fs_info *fs_info)
if (btrfs_find_device(fs_info->fs_devices, &args)) {
btrfs_err(fs_info,
"replace devid present without an active replace item");
+ btrfs_info(fs_info,
+ "mount after the command 'btrfs device scan --forget <devpath-of-id-0>'");
ret = -EUCLEAN;
} else {
dev_replace->srcdev = NULL;
--
2.33.1
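The condition that rejects the mount can be summarized with a small userspace sketch. It is only a model of the situation the commit message describes (no replace item in the metadata, yet a device with devid 0 has been scanned); struct mount_view and check_stale_replace_target() are hypothetical names, not kernel symbols.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef EUCLEAN
#define EUCLEAN 117 /* Linux value; fallback for platforms that lack it */
#endif

/* Hypothetical summary of what mount sees; not a kernel structure. */
struct mount_view {
	bool replace_item_found; /* a dev-replace item exists in the metadata */
	bool devid0_scanned;     /* a device with devid 0 is registered */
};

/*
 * Returns -EUCLEAN for a stale replace target: the target device is
 * present although no replace item refers to it. Returns 0 otherwise.
 */
static int check_stale_replace_target(const struct mount_view *v)
{
	assert(v != NULL);
	if (!v->replace_item_found && v->devid0_scanned)
		return -EUCLEAN;
	return 0;
}
```

In this model, running 'btrfs device scan --forget' on the stale target corresponds to clearing devid0_scanned, after which the check passes and the mount can be retried.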
* Re: [PATCH 0/2] fix issues during suspended replace operation
From: David Sterba @ 2022-08-22 19:36 UTC
To: Anand Jain; +Cc: linux-btrfs
On Fri, Aug 12, 2022 at 06:32:17PM +0800, Anand Jain wrote:
> While trying to reproduce the issue reported in the ML. Found few
> issues as in the individual independent patches below.
>
> Anand Jain (2):
> btrfs: fix assert during replace-cancel of suspended replace-operation
> btrfs: add info when mount fails due to stale replace target
Added to misc-next, thanks.
* Re: [PATCH 2/2] btrfs: add info when mount fails due to stale replace target
From: David Sterba @ 2022-08-22 19:36 UTC
To: Anand Jain; +Cc: linux-btrfs, Samuel Greiner
On Fri, Aug 12, 2022 at 06:32:19PM +0800, Anand Jain wrote:
> If the replace-target device re-appears after the suspended replace is
> cancelled, it blocks the mount operation as it can't find the matching
> replace-item in the metadata. As shown below,
>
> BTRFS error (device sda5): replace devid present without an active replace item
>
> To overcome this situation, the user can run the command
>
> btrfs device scan --forget <device-path-to-devid=0>
>
> and try the mount command again. And also, to avoid repeating the issue,
> superblock on the devid=0 must be wiped.
>
> wipefs -a device-path-to-devid=0.
>
> This patch adds some info when this situation occurs.
>
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> Reported-by: Samuel Greiner <samuel@balkonien.org>
> Link: https://lore.kernel.org/linux-btrfs/b4f62b10-b295-26ea-71f9-9a5c9299d42c@balkonien.org/T/
> ---
> fs/btrfs/dev-replace.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
> index 9d46a702bc11..7202b76ce59f 100644
> --- a/fs/btrfs/dev-replace.c
> +++ b/fs/btrfs/dev-replace.c
> @@ -166,6 +166,8 @@ int btrfs_init_dev_replace(struct btrfs_fs_info *fs_info)
> if (btrfs_find_device(fs_info->fs_devices, &args)) {
> btrfs_err(fs_info,
> "replace devid present without an active replace item");
> + btrfs_info(fs_info,
> + "mount after the command 'btrfs device scan --forget <devpath-of-id-0>'");
The messages should be at the same level and in one message; I've
rephrased it a bit.
* Re: [PATCH 1/2] btrfs: fix assert during replace-cancel of suspended replace-operation
From: Filipe Manana @ 2022-09-22 10:00 UTC
To: Anand Jain; +Cc: linux-btrfs
On Fri, Aug 12, 2022 at 11:56 AM Anand Jain <anand.jain@oracle.com> wrote:
>
> If the filesystem mounts with the replace-operation in a suspended state
> and try to cancel the suspended replace-operation, we hit the assert. The
> assert came from the commit fe97e2e173af ("btrfs: dev-replace: replace's
> scrub must not be running in suspended state") that was actually not
> required. So just remove it.
>
> $ mount /dev/sda5 /btrfs
>
> BTRFS info (device sda5): cannot continue dev_replace, tgtdev is missing
> BTRFS info (device sda5): you may cancel the operation after 'mount -o degraded'
>
> $ mount -o degraded /dev/sda5 /btrfs <-- success.
>
> $ btrfs replace cancel /btrfs
>
> kernel: assertion failed: ret != -ENOTCONN, in fs/btrfs/dev-replace.c:1131
> kernel: ------------[ cut here ]------------
> kernel: kernel BUG at fs/btrfs/ctree.h:3750!
>
> After the patch:
>
> $ btrfs replace cancel /btrfs
>
> BTRFS info (device sda5): suspended dev_replace from /dev/sda5 (devid 1) to <missing disk> canceled
Anand, can you please add a test case to fstests?
This scenario has no coverage at all in fstests, so it would be
especially useful to have it there.
Thanks.
>
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
> fs/btrfs/dev-replace.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
> index 488f2105c5d0..9d46a702bc11 100644
> --- a/fs/btrfs/dev-replace.c
> +++ b/fs/btrfs/dev-replace.c
> @@ -1124,8 +1124,7 @@ int btrfs_dev_replace_cancel(struct btrfs_fs_info *fs_info)
> up_write(&dev_replace->rwsem);
>
> /* Scrub for replace must not be running in suspended state */
> - ret = btrfs_scrub_cancel(fs_info);
> - ASSERT(ret != -ENOTCONN);
> + btrfs_scrub_cancel(fs_info);
>
> trans = btrfs_start_transaction(root, 0);
> if (IS_ERR(trans)) {
> --
> 2.33.1
>
--
Filipe David Manana,
“Whether you think you can, or you think you can't — you're right.”
* Re: [PATCH 1/2] btrfs: fix assert during replace-cancel of suspended replace-operation
From: Anand Jain @ 2022-09-22 11:28 UTC
To: fdmanana; +Cc: linux-btrfs
On 22/09/2022 18:00, Filipe Manana wrote:
> On Fri, Aug 12, 2022 at 11:56 AM Anand Jain <anand.jain@oracle.com> wrote:
>>
>> If the filesystem mounts with the replace-operation in a suspended state
>> and try to cancel the suspended replace-operation, we hit the assert. The
>> assert came from the commit fe97e2e173af ("btrfs: dev-replace: replace's
>> scrub must not be running in suspended state") that was actually not
>> required. So just remove it.
>>
>> $ mount /dev/sda5 /btrfs
>>
>> BTRFS info (device sda5): cannot continue dev_replace, tgtdev is missing
>> BTRFS info (device sda5): you may cancel the operation after 'mount -o degraded'
>>
>> $ mount -o degraded /dev/sda5 /btrfs <-- success.
>>
>> $ btrfs replace cancel /btrfs
>>
>> kernel: assertion failed: ret != -ENOTCONN, in fs/btrfs/dev-replace.c:1131
>> kernel: ------------[ cut here ]------------
>> kernel: kernel BUG at fs/btrfs/ctree.h:3750!
>>
>> After the patch:
>>
>> $ btrfs replace cancel /btrfs
>>
>> BTRFS info (device sda5): suspended dev_replace from /dev/sda5 (devid 1) to <missing disk> canceled
>
> Anand, can you please add a test case to fstests?
> This is a scenario with no coverage at all in fstests, therefore
> specially useful to have it there.
>
I thought about it before and found that unless we implement the
replace-pause sub-command, we can't get a pending replace item in
an unmounted btrfs using a script.
However, to test it manually, I did an abrupt reboot (or power-off,
I think).
Thanks.
> Thanks.
>
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>> fs/btrfs/dev-replace.c | 3 +--
>> 1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
>> index 488f2105c5d0..9d46a702bc11 100644
>> --- a/fs/btrfs/dev-replace.c
>> +++ b/fs/btrfs/dev-replace.c
>> @@ -1124,8 +1124,7 @@ int btrfs_dev_replace_cancel(struct btrfs_fs_info *fs_info)
>> up_write(&dev_replace->rwsem);
>>
>> /* Scrub for replace must not be running in suspended state */
>> - ret = btrfs_scrub_cancel(fs_info);
>> - ASSERT(ret != -ENOTCONN);
>> + btrfs_scrub_cancel(fs_info);
>>
>> trans = btrfs_start_transaction(root, 0);
>> if (IS_ERR(trans)) {
>> --
>> 2.33.1
>>
>
>
Thread overview: 7+ messages
2022-08-12 10:32 [PATCH 0/2] fix issues during suspended replace operation Anand Jain
2022-08-12 10:32 ` [PATCH 1/2] btrfs: fix assert during replace-cancel of suspended replace-operation Anand Jain
2022-09-22 10:00 ` Filipe Manana
2022-09-22 11:28 ` Anand Jain
2022-08-12 10:32 ` [PATCH 2/2] btrfs: add info when mount fails due to stale replace target Anand Jain
2022-08-22 19:36 ` David Sterba
2022-08-22 19:36 ` [PATCH 0/2] fix issues during suspended replace operation David Sterba