linux-btrfs.vger.kernel.org archive mirror
* [PATCH 0/2] fix issues during suspended replace operation
@ 2022-08-12 10:32 Anand Jain
  2022-08-12 10:32 ` [PATCH 1/2] btrfs: fix assert during replace-cancel of suspended replace-operation Anand Jain
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Anand Jain @ 2022-08-12 10:32 UTC (permalink / raw)
  To: linux-btrfs

While trying to reproduce the issue reported on the mailing list, I found
a few issues. They are fixed in the two independent patches below.

Anand Jain (2):
  btrfs: fix assert during replace-cancel of suspended replace-operation
  btrfs: add info when mount fails due to stale replace target

 fs/btrfs/dev-replace.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

-- 
2.33.1



* [PATCH 1/2] btrfs: fix assert during replace-cancel of suspended replace-operation
  2022-08-12 10:32 [PATCH 0/2] fix issues during suspended replace operation Anand Jain
@ 2022-08-12 10:32 ` Anand Jain
  2022-09-22 10:00   ` Filipe Manana
  2022-08-12 10:32 ` [PATCH 2/2] btrfs: add info when mount fails due to stale replace target Anand Jain
  2022-08-22 19:36 ` [PATCH 0/2] fix issues during suspended replace operation David Sterba
  2 siblings, 1 reply; 7+ messages in thread
From: Anand Jain @ 2022-08-12 10:32 UTC (permalink / raw)
  To: linux-btrfs

If the filesystem is mounted with the replace-operation in a suspended
state and we then try to cancel the suspended replace-operation, we hit
the assert shown below. The assert was added by commit fe97e2e173af
("btrfs: dev-replace: replace's scrub must not be running in suspended
state") and is not actually required, so just remove it.

 $ mount /dev/sda5 /btrfs

    BTRFS info (device sda5): cannot continue dev_replace, tgtdev is missing
    BTRFS info (device sda5): you may cancel the operation after 'mount -o degraded'

 $ mount -o degraded /dev/sda5 /btrfs <-- success.

 $ btrfs replace cancel /btrfs

    kernel: assertion failed: ret != -ENOTCONN, in fs/btrfs/dev-replace.c:1131
    kernel: ------------[ cut here ]------------
    kernel: kernel BUG at fs/btrfs/ctree.h:3750!

After the patch:

 $ btrfs replace cancel /btrfs

    BTRFS info (device sda5): suspended dev_replace from /dev/sda5 (devid 1) to <missing disk> canceled

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/dev-replace.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 488f2105c5d0..9d46a702bc11 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -1124,8 +1124,7 @@ int btrfs_dev_replace_cancel(struct btrfs_fs_info *fs_info)
 		up_write(&dev_replace->rwsem);
 
 		/* Scrub for replace must not be running in suspended state */
-		ret = btrfs_scrub_cancel(fs_info);
-		ASSERT(ret != -ENOTCONN);
+		btrfs_scrub_cancel(fs_info);
 
 		trans = btrfs_start_transaction(root, 0);
 		if (IS_ERR(trans)) {
-- 
2.33.1
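
For readers following the diff, here is a minimal userspace sketch of why
the old assert fires and what the patch changes. It is not kernel code:
struct model_fs_info, model_scrub_cancel(), old_cancel_would_assert() and
new_cancel_suspended() are hypothetical stand-ins, and the sketch assumes
that btrfs_scrub_cancel() returns -ENOTCONN when no scrub is running,
which is what the assertion message in the log above suggests.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parts of struct btrfs_fs_info used here. */
struct model_fs_info {
	int scrubs_running;	/* scrub workers currently active */
	bool replace_suspended;	/* replace item on disk, target missing */
};

/*
 * Model of btrfs_scrub_cancel(). Assumption: it reports -ENOTCONN when
 * there is nothing to cancel and 0 after stopping a running scrub.
 */
static int model_scrub_cancel(struct model_fs_info *fs)
{
	assert(fs != NULL);
	if (fs->scrubs_running == 0)
		return -ENOTCONN;
	fs->scrubs_running = 0;
	return 0;
}

/* Pre-patch logic: true when ASSERT(ret != -ENOTCONN) would have fired. */
static bool old_cancel_would_assert(struct model_fs_info *fs)
{
	int ret = model_scrub_cancel(fs);

	return ret == -ENOTCONN;
}

/* Post-patch logic: the scrub-cancel result is ignored, cancel proceeds. */
static int new_cancel_suspended(struct model_fs_info *fs)
{
	(void)model_scrub_cancel(fs);
	fs->replace_suspended = false;
	return 0;
}
```

With scrubs_running == 0 and replace_suspended == true, which models the
state right after 'mount -o degraded', old_cancel_would_assert() returns
true while new_cancel_suspended() returns 0 and clears the suspended
flag.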



* [PATCH 2/2] btrfs: add info when mount fails due to stale replace target
  2022-08-12 10:32 [PATCH 0/2] fix issues during suspended replace operation Anand Jain
  2022-08-12 10:32 ` [PATCH 1/2] btrfs: fix assert during replace-cancel of suspended replace-operation Anand Jain
@ 2022-08-12 10:32 ` Anand Jain
  2022-08-22 19:36   ` David Sterba
  2022-08-22 19:36 ` [PATCH 0/2] fix issues during suspended replace operation David Sterba
  2 siblings, 1 reply; 7+ messages in thread
From: Anand Jain @ 2022-08-12 10:32 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Samuel Greiner

If the replace-target device reappears after the suspended replace is
cancelled, it blocks the mount operation because the matching
replace-item can no longer be found in the metadata, as shown below:

   BTRFS error (device sda5): replace devid present without an active replace item

To overcome this situation, the user can run the command

   btrfs device scan --forget <device-path-to-devid=0>

and try the mount command again. To avoid hitting the issue again, the
superblock on devid=0 must also be wiped:

   wipefs -a <device-path-to-devid=0>

This patch adds an info message when this situation occurs.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reported-by: Samuel Greiner <samuel@balkonien.org>
Link: https://lore.kernel.org/linux-btrfs/b4f62b10-b295-26ea-71f9-9a5c9299d42c@balkonien.org/T/
---
 fs/btrfs/dev-replace.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 9d46a702bc11..7202b76ce59f 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -166,6 +166,8 @@ int btrfs_init_dev_replace(struct btrfs_fs_info *fs_info)
 		if (btrfs_find_device(fs_info->fs_devices, &args)) {
 			btrfs_err(fs_info,
 			"replace devid present without an active replace item");
+			btrfs_info(fs_info,
+	"mount after the command 'btrfs device scan --forget <devpath-of-id-0>'");
 			ret = -EUCLEAN;
 		} else {
 			dev_replace->srcdev = NULL;
-- 
2.33.1
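
The mount-time check touched by this hunk can be sketched in userspace as
follows. This is an illustration under stated assumptions, not the kernel
implementation: struct model_scan_state, model_find_device() and
model_init_dev_replace() are hypothetical names, and the only facts taken
from the patch are that the stale target carries devid 0 and that finding
it without a replace item yields -EUCLEAN.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#ifndef EUCLEAN
#define EUCLEAN 117	/* Linux value; fallback for non-Linux hosts */
#endif

/* The replace target is registered as devid 0, per the commit message. */
#define MODEL_REPLACE_DEVID 0ULL

/* Hypothetical view of what device scan and the metadata report. */
struct model_scan_state {
	const uint64_t *devids;	/* devids registered by device scan */
	size_t ndevs;
	bool has_replace_item;	/* replace item present in the metadata */
};

static bool model_find_device(const struct model_scan_state *st,
			      uint64_t devid)
{
	for (size_t i = 0; i < st->ndevs; i++) {
		if (st->devids[i] == devid)
			return true;
	}
	return false;
}

/*
 * No replace item but a scanned devid 0: the stale target blocks the
 * mount with -EUCLEAN. Any other combination is fine in this model.
 */
static int model_init_dev_replace(const struct model_scan_state *st)
{
	assert(st != NULL);
	if (st->has_replace_item)
		return 0;
	if (model_find_device(st, MODEL_REPLACE_DEVID))
		return -EUCLEAN;
	return 0;
}
```

In this model, 'btrfs device scan --forget' corresponds to dropping devid
0 from the devids array, after which model_init_dev_replace() returns 0.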



* Re: [PATCH 0/2] fix issues during suspended replace operation
  2022-08-12 10:32 [PATCH 0/2] fix issues during suspended replace operation Anand Jain
  2022-08-12 10:32 ` [PATCH 1/2] btrfs: fix assert during replace-cancel of suspended replace-operation Anand Jain
  2022-08-12 10:32 ` [PATCH 2/2] btrfs: add info when mount fails due to stale replace target Anand Jain
@ 2022-08-22 19:36 ` David Sterba
  2 siblings, 0 replies; 7+ messages in thread
From: David Sterba @ 2022-08-22 19:36 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs

On Fri, Aug 12, 2022 at 06:32:17PM +0800, Anand Jain wrote:
> While trying to reproduce the issue reported in the ML. Found few
> issues as in the individual independent patches below.
> 
> Anand Jain (2):
>   btrfs: fix assert during replace-cancel of suspended replace-operation
>   btrfs: add info when mount fails due to stale replace target

Added to misc-next, thanks.


* Re: [PATCH 2/2] btrfs: add info when mount fails due to stale replace target
  2022-08-12 10:32 ` [PATCH 2/2] btrfs: add info when mount fails due to stale replace target Anand Jain
@ 2022-08-22 19:36   ` David Sterba
  0 siblings, 0 replies; 7+ messages in thread
From: David Sterba @ 2022-08-22 19:36 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs, Samuel Greiner

On Fri, Aug 12, 2022 at 06:32:19PM +0800, Anand Jain wrote:
> If the replace-target device re-appears after the suspended replace is
> cancelled, it blocks the mount operation as it can't find the matching
> replace-item in the metadata. As shown below,
> 
>    BTRFS error (device sda5): replace devid present without an active replace item
> 
> To overcome this situation, the user can run the command
> 
>    btrfs device scan --forget <device-path-to-devid=0>
> 
> and try the mount command again. And also, to avoid repeating the issue,
> superblock on the devid=0 must be wiped.
> 
>    wipefs -a device-path-to-devid=0.
> 
> This patch adds some info when this situation occurs.
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> Reported-by: Samuel Greiner <samuel@balkonien.org>
> Link: https://lore.kernel.org/linux-btrfs/b4f62b10-b295-26ea-71f9-9a5c9299d42c@balkonien.org/T/
> ---
>  fs/btrfs/dev-replace.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
> index 9d46a702bc11..7202b76ce59f 100644
> --- a/fs/btrfs/dev-replace.c
> +++ b/fs/btrfs/dev-replace.c
> @@ -166,6 +166,8 @@ int btrfs_init_dev_replace(struct btrfs_fs_info *fs_info)
>  		if (btrfs_find_device(fs_info->fs_devices, &args)) {
>  			btrfs_err(fs_info,
>  			"replace devid present without an active replace item");
> +			btrfs_info(fs_info,
> +	"mount after the command 'btrfs deivce scan --forget <devpath-of-id-0>'");

The messages should be on the same level and in one message, I've
rephrased it a bit.


* Re: [PATCH 1/2] btrfs: fix assert during replace-cancel of suspended replace-operation
  2022-08-12 10:32 ` [PATCH 1/2] btrfs: fix assert during replace-cancel of suspended replace-operation Anand Jain
@ 2022-09-22 10:00   ` Filipe Manana
  2022-09-22 11:28     ` Anand Jain
  0 siblings, 1 reply; 7+ messages in thread
From: Filipe Manana @ 2022-09-22 10:00 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs

On Fri, Aug 12, 2022 at 11:56 AM Anand Jain <anand.jain@oracle.com> wrote:
>
> If the filesystem mounts with the replace-operation in a suspended state
> and try to cancel the suspended replace-operation, we hit the assert. The
> assert came from the commit fe97e2e173af ("btrfs: dev-replace: replace's
> scrub must not be running in suspended state") that was actually not
> required. So just remove it.
>
>  $ mount /dev/sda5 /btrfs
>
>     BTRFS info (device sda5): cannot continue dev_replace, tgtdev is missing
>     BTRFS info (device sda5): you may cancel the operation after 'mount -o degraded'
>
>  $ mount -o degraded /dev/sda5 /btrfs <-- success.
>
>  $ btrfs replace cancel /btrfs
>
>     kernel: assertion failed: ret != -ENOTCONN, in fs/btrfs/dev-replace.c:1131
>     kernel: ------------[ cut here ]------------
>     kernel: kernel BUG at fs/btrfs/ctree.h:3750!
>
> After the patch:
>
>  $ btrfs replace cancel /btrfs
>
>     BTRFS info (device sda5): suspended dev_replace from /dev/sda5 (devid 1) to <missing disk> canceled

Anand, can you please add a test case to fstests?
This scenario has no coverage at all in fstests, so it would be
especially useful to have it there.

Thanks.

>
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
>  fs/btrfs/dev-replace.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
> index 488f2105c5d0..9d46a702bc11 100644
> --- a/fs/btrfs/dev-replace.c
> +++ b/fs/btrfs/dev-replace.c
> @@ -1124,8 +1124,7 @@ int btrfs_dev_replace_cancel(struct btrfs_fs_info *fs_info)
>                 up_write(&dev_replace->rwsem);
>
>                 /* Scrub for replace must not be running in suspended state */
> -               ret = btrfs_scrub_cancel(fs_info);
> -               ASSERT(ret != -ENOTCONN);
> +               btrfs_scrub_cancel(fs_info);
>
>                 trans = btrfs_start_transaction(root, 0);
>                 if (IS_ERR(trans)) {
> --
> 2.33.1
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”


* Re: [PATCH 1/2] btrfs: fix assert during replace-cancel of suspended replace-operation
  2022-09-22 10:00   ` Filipe Manana
@ 2022-09-22 11:28     ` Anand Jain
  0 siblings, 0 replies; 7+ messages in thread
From: Anand Jain @ 2022-09-22 11:28 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs



On 22/09/2022 18:00, Filipe Manana wrote:
> On Fri, Aug 12, 2022 at 11:56 AM Anand Jain <anand.jain@oracle.com> wrote:
>>
>> If the filesystem mounts with the replace-operation in a suspended state
>> and try to cancel the suspended replace-operation, we hit the assert. The
>> assert came from the commit fe97e2e173af ("btrfs: dev-replace: replace's
>> scrub must not be running in suspended state") that was actually not
>> required. So just remove it.
>>
>>   $ mount /dev/sda5 /btrfs
>>
>>      BTRFS info (device sda5): cannot continue dev_replace, tgtdev is missing
>>      BTRFS info (device sda5): you may cancel the operation after 'mount -o degraded'
>>
>>   $ mount -o degraded /dev/sda5 /btrfs <-- success.
>>
>>   $ btrfs replace cancel /btrfs
>>
>>      kernel: assertion failed: ret != -ENOTCONN, in fs/btrfs/dev-replace.c:1131
>>      kernel: ------------[ cut here ]------------
>>      kernel: kernel BUG at fs/btrfs/ctree.h:3750!
>>
>> After the patch:
>>
>>   $ btrfs replace cancel /btrfs
>>
>>      BTRFS info (device sda5): suspended dev_replace from /dev/sda5 (devid 1) to <missing disk> canceled
> 
> Anand, can you please add a test case to fstests?
> This is a scenario with no coverage at all in fstests, therefore
> specially useful to have it there.
> 

I thought about it before and found that unless we implement the
replace-pause sub-command, we can't get a pending replace item in
an unmounted btrfs using a script.

However, to test it manually, I did an abrupt reboot (or power-off,
I think).

Thanks.

> Thanks.
> 
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>>   fs/btrfs/dev-replace.c | 3 +--
>>   1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
>> index 488f2105c5d0..9d46a702bc11 100644
>> --- a/fs/btrfs/dev-replace.c
>> +++ b/fs/btrfs/dev-replace.c
>> @@ -1124,8 +1124,7 @@ int btrfs_dev_replace_cancel(struct btrfs_fs_info *fs_info)
>>                  up_write(&dev_replace->rwsem);
>>
>>                  /* Scrub for replace must not be running in suspended state */
>> -               ret = btrfs_scrub_cancel(fs_info);
>> -               ASSERT(ret != -ENOTCONN);
>> +               btrfs_scrub_cancel(fs_info);
>>
>>                  trans = btrfs_start_transaction(root, 0);
>>                  if (IS_ERR(trans)) {
>> --
>> 2.33.1
>>
> 
> 

