All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] fix issues during suspended replace operation
@ 2022-08-12 10:32 Anand Jain
  2022-08-12 10:32 ` [PATCH 1/2] btrfs: fix assert during replace-cancel of suspended replace-operation Anand Jain
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Anand Jain @ 2022-08-12 10:32 UTC (permalink / raw)
  To: linux-btrfs

While trying to reproduce the issue reported in the ML. Found few
issues as in the individual independent patches below.

Anand Jain (2):
  btrfs: fix assert during replace-cancel of suspended replace-operation
  btrfs: add info when mount fails due to stale replace target

 fs/btrfs/dev-replace.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

-- 
2.33.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] btrfs: fix assert during replace-cancel of suspended replace-operation
  2022-08-12 10:32 [PATCH 0/2] fix issues during suspended replace operation Anand Jain
@ 2022-08-12 10:32 ` Anand Jain
  2022-09-22 10:00   ` Filipe Manana
  2022-08-12 10:32 ` [PATCH 2/2] btrfs: add info when mount fails due to stale replace target Anand Jain
  2022-08-22 19:36 ` [PATCH 0/2] fix issues during suspended replace operation David Sterba
  2 siblings, 1 reply; 7+ messages in thread
From: Anand Jain @ 2022-08-12 10:32 UTC (permalink / raw)
  To: linux-btrfs

If the filesystem mounts with the replace-operation in a suspended state
and try to cancel the suspended replace-operation, we hit the assert. The
assert came from the commit fe97e2e173af ("btrfs: dev-replace: replace's
scrub must not be running in suspended state") that was actually not
required. So just remove it.

 $ mount /dev/sda5 /btrfs

    BTRFS info (device sda5): cannot continue dev_replace, tgtdev is missing
    BTRFS info (device sda5): you may cancel the operation after 'mount -o degraded'

 $ mount -o degraded /dev/sda5 /btrfs <-- success.

 $ btrfs replace cancel /btrfs

    kernel: assertion failed: ret != -ENOTCONN, in fs/btrfs/dev-replace.c:1131
    kernel: ------------[ cut here ]------------
    kernel: kernel BUG at fs/btrfs/ctree.h:3750!

After the patch:

 $ btrfs replace cancel /btrfs

    BTRFS info (device sda5): suspended dev_replace from /dev/sda5 (devid 1) to <missing disk> canceled

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/dev-replace.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 488f2105c5d0..9d46a702bc11 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -1124,8 +1124,7 @@ int btrfs_dev_replace_cancel(struct btrfs_fs_info *fs_info)
 		up_write(&dev_replace->rwsem);
 
 		/* Scrub for replace must not be running in suspended state */
-		ret = btrfs_scrub_cancel(fs_info);
-		ASSERT(ret != -ENOTCONN);
+		btrfs_scrub_cancel(fs_info);
 
 		trans = btrfs_start_transaction(root, 0);
 		if (IS_ERR(trans)) {
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] btrfs: add info when mount fails due to stale replace target
  2022-08-12 10:32 [PATCH 0/2] fix issues during suspended replace operation Anand Jain
  2022-08-12 10:32 ` [PATCH 1/2] btrfs: fix assert during replace-cancel of suspended replace-operation Anand Jain
@ 2022-08-12 10:32 ` Anand Jain
  2022-08-22 19:36   ` David Sterba
  2022-08-22 19:36 ` [PATCH 0/2] fix issues during suspended replace operation David Sterba
  2 siblings, 1 reply; 7+ messages in thread
From: Anand Jain @ 2022-08-12 10:32 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Samuel Greiner

If the replace-target device re-appears after the suspended replace is
cancelled, it blocks the mount operation as it can't find the matching
replace-item in the metadata. As shown below,

   BTRFS error (device sda5): replace devid present without an active replace item

To overcome this situation, the user can run the command

   btrfs device scan --forget <device-path-to-devid=0>

and try the mount command again. And also, to avoid repeating the issue,
superblock on the devid=0 must be wiped.

   wipefs -a device-path-to-devid=0.

This patch adds some info when this situation occurs.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reported-by: Samuel Greiner <samuel@balkonien.org>
Link: https://lore.kernel.org/linux-btrfs/b4f62b10-b295-26ea-71f9-9a5c9299d42c@balkonien.org/T/
---
 fs/btrfs/dev-replace.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 9d46a702bc11..7202b76ce59f 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -166,6 +166,8 @@ int btrfs_init_dev_replace(struct btrfs_fs_info *fs_info)
 		if (btrfs_find_device(fs_info->fs_devices, &args)) {
 			btrfs_err(fs_info,
 			"replace devid present without an active replace item");
+			btrfs_info(fs_info,
+	"mount after the command 'btrfs deivce scan --forget <devpath-of-id-0>'");
 			ret = -EUCLEAN;
 		} else {
 			dev_replace->srcdev = NULL;
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/2] fix issues during suspended replace operation
  2022-08-12 10:32 [PATCH 0/2] fix issues during suspended replace operation Anand Jain
  2022-08-12 10:32 ` [PATCH 1/2] btrfs: fix assert during replace-cancel of suspended replace-operation Anand Jain
  2022-08-12 10:32 ` [PATCH 2/2] btrfs: add info when mount fails due to stale replace target Anand Jain
@ 2022-08-22 19:36 ` David Sterba
  2 siblings, 0 replies; 7+ messages in thread
From: David Sterba @ 2022-08-22 19:36 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs

On Fri, Aug 12, 2022 at 06:32:17PM +0800, Anand Jain wrote:
> While trying to reproduce the issue reported in the ML. Found few
> issues as in the individual independent patches below.
> 
> Anand Jain (2):
>   btrfs: fix assert during replace-cancel of suspended replace-operation
>   btrfs: add info when mount fails due to stale replace target

Added to misc-next, thanks.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] btrfs: add info when mount fails due to stale replace target
  2022-08-12 10:32 ` [PATCH 2/2] btrfs: add info when mount fails due to stale replace target Anand Jain
@ 2022-08-22 19:36   ` David Sterba
  0 siblings, 0 replies; 7+ messages in thread
From: David Sterba @ 2022-08-22 19:36 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs, Samuel Greiner

On Fri, Aug 12, 2022 at 06:32:19PM +0800, Anand Jain wrote:
> If the replace-target device re-appears after the suspended replace is
> cancelled, it blocks the mount operation as it can't find the matching
> replace-item in the metadata. As shown below,
> 
>    BTRFS error (device sda5): replace devid present without an active replace item
> 
> To overcome this situation, the user can run the command
> 
>    btrfs device scan --forget <device-path-to-devid=0>
> 
> and try the mount command again. And also, to avoid repeating the issue,
> superblock on the devid=0 must be wiped.
> 
>    wipefs -a device-path-to-devid=0.
> 
> This patch adds some info when this situation occurs.
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> Reported-by: Samuel Greiner <samuel@balkonien.org>
> Link: https://lore.kernel.org/linux-btrfs/b4f62b10-b295-26ea-71f9-9a5c9299d42c@balkonien.org/T/
> ---
>  fs/btrfs/dev-replace.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
> index 9d46a702bc11..7202b76ce59f 100644
> --- a/fs/btrfs/dev-replace.c
> +++ b/fs/btrfs/dev-replace.c
> @@ -166,6 +166,8 @@ int btrfs_init_dev_replace(struct btrfs_fs_info *fs_info)
>  		if (btrfs_find_device(fs_info->fs_devices, &args)) {
>  			btrfs_err(fs_info,
>  			"replace devid present without an active replace item");
> +			btrfs_info(fs_info,
> +	"mount after the command 'btrfs deivce scan --forget <devpath-of-id-0>'");

The messages should be on the same level and in one message, I've
reprhrased it a bit.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] btrfs: fix assert during replace-cancel of suspended replace-operation
  2022-08-12 10:32 ` [PATCH 1/2] btrfs: fix assert during replace-cancel of suspended replace-operation Anand Jain
@ 2022-09-22 10:00   ` Filipe Manana
  2022-09-22 11:28     ` Anand Jain
  0 siblings, 1 reply; 7+ messages in thread
From: Filipe Manana @ 2022-09-22 10:00 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs

On Fri, Aug 12, 2022 at 11:56 AM Anand Jain <anand.jain@oracle.com> wrote:
>
> If the filesystem mounts with the replace-operation in a suspended state
> and try to cancel the suspended replace-operation, we hit the assert. The
> assert came from the commit fe97e2e173af ("btrfs: dev-replace: replace's
> scrub must not be running in suspended state") that was actually not
> required. So just remove it.
>
>  $ mount /dev/sda5 /btrfs
>
>     BTRFS info (device sda5): cannot continue dev_replace, tgtdev is missing
>     BTRFS info (device sda5): you may cancel the operation after 'mount -o degraded'
>
>  $ mount -o degraded /dev/sda5 /btrfs <-- success.
>
>  $ btrfs replace cancel /btrfs
>
>     kernel: assertion failed: ret != -ENOTCONN, in fs/btrfs/dev-replace.c:1131
>     kernel: ------------[ cut here ]------------
>     kernel: kernel BUG at fs/btrfs/ctree.h:3750!
>
> After the patch:
>
>  $ btrfs replace cancel /btrfs
>
>     BTRFS info (device sda5): suspended dev_replace from /dev/sda5 (devid 1) to <missing disk> canceled

Anand, can you please add a test case to fstests?
This is a scenario with no coverage at all in fstests, therefore
specially useful to have it there.

Thanks.

>
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
>  fs/btrfs/dev-replace.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
> index 488f2105c5d0..9d46a702bc11 100644
> --- a/fs/btrfs/dev-replace.c
> +++ b/fs/btrfs/dev-replace.c
> @@ -1124,8 +1124,7 @@ int btrfs_dev_replace_cancel(struct btrfs_fs_info *fs_info)
>                 up_write(&dev_replace->rwsem);
>
>                 /* Scrub for replace must not be running in suspended state */
> -               ret = btrfs_scrub_cancel(fs_info);
> -               ASSERT(ret != -ENOTCONN);
> +               btrfs_scrub_cancel(fs_info);
>
>                 trans = btrfs_start_transaction(root, 0);
>                 if (IS_ERR(trans)) {
> --
> 2.33.1
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] btrfs: fix assert during replace-cancel of suspended replace-operation
  2022-09-22 10:00   ` Filipe Manana
@ 2022-09-22 11:28     ` Anand Jain
  0 siblings, 0 replies; 7+ messages in thread
From: Anand Jain @ 2022-09-22 11:28 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs



On 22/09/2022 18:00, Filipe Manana wrote:
> On Fri, Aug 12, 2022 at 11:56 AM Anand Jain <anand.jain@oracle.com> wrote:
>>
>> If the filesystem mounts with the replace-operation in a suspended state
>> and try to cancel the suspended replace-operation, we hit the assert. The
>> assert came from the commit fe97e2e173af ("btrfs: dev-replace: replace's
>> scrub must not be running in suspended state") that was actually not
>> required. So just remove it.
>>
>>   $ mount /dev/sda5 /btrfs
>>
>>      BTRFS info (device sda5): cannot continue dev_replace, tgtdev is missing
>>      BTRFS info (device sda5): you may cancel the operation after 'mount -o degraded'
>>
>>   $ mount -o degraded /dev/sda5 /btrfs <-- success.
>>
>>   $ btrfs replace cancel /btrfs
>>
>>      kernel: assertion failed: ret != -ENOTCONN, in fs/btrfs/dev-replace.c:1131
>>      kernel: ------------[ cut here ]------------
>>      kernel: kernel BUG at fs/btrfs/ctree.h:3750!
>>
>> After the patch:
>>
>>   $ btrfs replace cancel /btrfs
>>
>>      BTRFS info (device sda5): suspended dev_replace from /dev/sda5 (devid 1) to <missing disk> canceled
> 
> Anand, can you please add a test case to fstests?
> This is a scenario with no coverage at all in fstests, therefore
> specially useful to have it there.
> 

I thought about it before and found that unless we implement the
replace-pause sub-command, we can't get a pending replace item in
an unmounted btrfs using a script.

However, to test it manually, I did an abrupt reboot (or power-off,
I think).

Thanks.

> Thanks.
> 
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>>   fs/btrfs/dev-replace.c | 3 +--
>>   1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
>> index 488f2105c5d0..9d46a702bc11 100644
>> --- a/fs/btrfs/dev-replace.c
>> +++ b/fs/btrfs/dev-replace.c
>> @@ -1124,8 +1124,7 @@ int btrfs_dev_replace_cancel(struct btrfs_fs_info *fs_info)
>>                  up_write(&dev_replace->rwsem);
>>
>>                  /* Scrub for replace must not be running in suspended state */
>> -               ret = btrfs_scrub_cancel(fs_info);
>> -               ASSERT(ret != -ENOTCONN);
>> +               btrfs_scrub_cancel(fs_info);
>>
>>                  trans = btrfs_start_transaction(root, 0);
>>                  if (IS_ERR(trans)) {
>> --
>> 2.33.1
>>
> 
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-09-22 11:28 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-12 10:32 [PATCH 0/2] fix issues during suspended replace operation Anand Jain
2022-08-12 10:32 ` [PATCH 1/2] btrfs: fix assert during replace-cancel of suspended replace-operation Anand Jain
2022-09-22 10:00   ` Filipe Manana
2022-09-22 11:28     ` Anand Jain
2022-08-12 10:32 ` [PATCH 2/2] btrfs: add info when mount fails due to stale replace target Anand Jain
2022-08-22 19:36   ` David Sterba
2022-08-22 19:36 ` [PATCH 0/2] fix issues during suspended replace operation David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.