linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] btrfs: scrub: small enhancement related to super block errors
@ 2022-08-02  6:53 Qu Wenruo
  2022-08-02  6:53 ` [PATCH 1/2] btrfs: scrub: properly report super block errors in dmesg Qu Wenruo
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Qu Wenruo @ 2022-08-02  6:53 UTC (permalink / raw)
  To: linux-btrfs

Recently during a new test case to verify btrfs can handle a fully
corrupted disk (but with a good primary super block), I found that scrub
doesn't really try to repair super blocks.

And scrub doesn't report super block errors in dmesg either.

So this small patchset is to fix the problems by:

- Add explicit error messages for super block errors
- Try to fix super block errors by committing a transaction

Qu Wenruo (2):
  btrfs: scrub: properly report super block errors in dmesg
  btrfs: scrub: try to fix super block errors

 fs/btrfs/scrub.c | 71 ++++++++++++++++++++++++++++++++++--------------
 1 file changed, 50 insertions(+), 21 deletions(-)

-- 
2.37.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/2] btrfs: scrub: properly report super block errors in dmesg
  2022-08-02  6:53 [PATCH 0/2] btrfs: scrub: small enhancement related to super block errors Qu Wenruo
@ 2022-08-02  6:53 ` Qu Wenruo
  2022-08-02  6:53 ` [PATCH 2/2] btrfs: scrub: try to fix super block errors Qu Wenruo
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Qu Wenruo @ 2022-08-02  6:53 UTC (permalink / raw)
  To: linux-btrfs

[PROBLEM]

Unlike data/metadata corruption, if scrub detected some error in the
super block, the only error message is from the updated device status:

 BTRFS info (device dm-1): scrub: started on devid 2
 BTRFS error (device dm-1): bdev /dev/mapper/test-scratch2 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
 BTRFS info (device dm-1): scrub: finished on devid 2 with status: 0

This is not helpful at all.

[CAUSE]
Unlike data/metadata error reporting, there is no visible report in
kernel dmesg to report supper block errors.

In fact, return value of scrub_checksum_super() is intentionally
skipped, thus scrub_handle_errored_block() will never be called for
super blocks.

[FIX]
Make super block errors to output an error message, now the full
dmesg would looks like this:

 BTRFS info (device dm-1): scrub: started on devid 2
 BTRFS warning (device dm-1): super block error on dev /dev/mapper/test-scratch2, physical 67108864
 BTRFS error (device dm-1): bdev /dev/mapper/test-scratch2 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
 BTRFS info (device dm-1): scrub: finished on devid 2 with status: 0
 BTRFS info (device dm-1): scrub: started on devid 2

This fix involves:

- Move the super_errors reporting to scrub_handle_errored_block()
  This allows the device status message to show after the super block
  error message.
  But now we no longer distinguish super block corruption and generation
  mismatch, now all counted as corruption.

- Properly check the return value from scrub_checksum_super()
- Add extra super block error reporting for scrub_print_warning().

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c | 34 +++++++++++++---------------------
 1 file changed, 13 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 3afe5fa50a63..0330b1ba1a6a 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -729,6 +729,13 @@ static void scrub_print_warning(const char *errstr, struct scrub_block *sblock)
 	dev = sblock->sectors[0]->dev;
 	fs_info = sblock->sctx->fs_info;
 
+	/* Super block error, no need to search extent tree. */
+	if (sblock->sectors[0]->flags & BTRFS_EXTENT_FLAG_SUPER) {
+		btrfs_warn_in_rcu(fs_info, "%s on dev %s, physical %llu",
+			errstr, rcu_str_deref(dev->name),
+			sblock->sectors[0]->physical);
+		return;
+	}
 	path = btrfs_alloc_path();
 	if (!path)
 		return;
@@ -804,7 +811,7 @@ static inline void scrub_put_recover(struct btrfs_fs_info *fs_info,
 static int scrub_handle_errored_block(struct scrub_block *sblock_to_check)
 {
 	struct scrub_ctx *sctx = sblock_to_check->sctx;
-	struct btrfs_device *dev;
+	struct btrfs_device *dev = sblock_to_check->sectors[0]->dev;
 	struct btrfs_fs_info *fs_info;
 	u64 logical;
 	unsigned int failed_mirror_index;
@@ -825,13 +832,16 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check)
 	fs_info = sctx->fs_info;
 	if (sblock_to_check->sectors[0]->flags & BTRFS_EXTENT_FLAG_SUPER) {
 		/*
-		 * if we find an error in a super block, we just report it.
+		 * If we find an error in a super block, we just report it.
 		 * They will get written with the next transaction commit
 		 * anyway
 		 */
+		scrub_print_warning("super block error", sblock_to_check);
 		spin_lock(&sctx->stat_lock);
 		++sctx->stat.super_errors;
 		spin_unlock(&sctx->stat_lock);
+		btrfs_dev_stat_inc_and_print(dev,
+					     BTRFS_DEV_STAT_CORRUPTION_ERRS);
 		return 0;
 	}
 	logical = sblock_to_check->sectors[0]->logical;
@@ -840,7 +850,6 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check)
 	is_metadata = !(sblock_to_check->sectors[0]->flags &
 			BTRFS_EXTENT_FLAG_DATA);
 	have_csum = sblock_to_check->sectors[0]->have_csum;
-	dev = sblock_to_check->sectors[0]->dev;
 
 	if (!sctx->is_dev_replace && btrfs_repair_one_zone(fs_info, logical))
 		return 0;
@@ -1762,7 +1771,7 @@ static int scrub_checksum(struct scrub_block *sblock)
 	else if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK)
 		ret = scrub_checksum_tree_block(sblock);
 	else if (flags & BTRFS_EXTENT_FLAG_SUPER)
-		(void)scrub_checksum_super(sblock);
+		ret = scrub_checksum_super(sblock);
 	else
 		WARN_ON(1);
 	if (ret)
@@ -1901,23 +1910,6 @@ static int scrub_checksum_super(struct scrub_block *sblock)
 	if (memcmp(calculated_csum, s->csum, sctx->fs_info->csum_size))
 		++fail_cor;
 
-	if (fail_cor + fail_gen) {
-		/*
-		 * if we find an error in a super block, we just report it.
-		 * They will get written with the next transaction commit
-		 * anyway
-		 */
-		spin_lock(&sctx->stat_lock);
-		++sctx->stat.super_errors;
-		spin_unlock(&sctx->stat_lock);
-		if (fail_cor)
-			btrfs_dev_stat_inc_and_print(sector->dev,
-				BTRFS_DEV_STAT_CORRUPTION_ERRS);
-		else
-			btrfs_dev_stat_inc_and_print(sector->dev,
-				BTRFS_DEV_STAT_GENERATION_ERRS);
-	}
-
 	return fail_cor + fail_gen;
 }
 
-- 
2.37.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/2] btrfs: scrub: try to fix super block errors
  2022-08-02  6:53 [PATCH 0/2] btrfs: scrub: small enhancement related to super block errors Qu Wenruo
  2022-08-02  6:53 ` [PATCH 1/2] btrfs: scrub: properly report super block errors in dmesg Qu Wenruo
@ 2022-08-02  6:53 ` Qu Wenruo
  2022-08-03 12:55   ` David Sterba
  2022-08-02 14:22 ` [PATCH 0/2] btrfs: scrub: small enhancement related to " David Sterba
  2022-08-04 14:05 ` David Sterba
  3 siblings, 1 reply; 12+ messages in thread
From: Qu Wenruo @ 2022-08-02  6:53 UTC (permalink / raw)
  To: linux-btrfs

[BUG]
The following script shows that, although scrub can detect super block
errors, it never tries to fix it:

	mkfs.btrfs -f -d raid1 -m raid1 $dev1 $dev2
	xfs_io -c "pwrite 67108864 4k" $dev2

	mount $dev1 $mnt
	btrfs scrub start -B $dev2
	btrfs scrub start -Br $dev2
	umount $mnt

The first scrub reports the super error correctly:

  scrub done for f3289218-abd3-41ac-a630-202f766c0859
  Scrub started:    Tue Aug  2 14:44:11 2022
  Status:           finished
  Duration:         0:00:00
  Total to scrub:   1.26GiB
  Rate:             0.00B/s
  Error summary:    super=1
    Corrected:      0
    Uncorrectable:  0
    Unverified:     0

But the second read-only scrub still reports the same super error:

  Scrub started:    Tue Aug  2 14:44:11 2022
  Status:           finished
  Duration:         0:00:00
  Total to scrub:   1.26GiB
  Rate:             0.00B/s
  Error summary:    super=1
    Corrected:      0
    Uncorrectable:  0
    Unverified:     0

[CAUSE]
The comments already shows that super block can be easily fixed by
committing a transaction:

		/*
		 * If we find an error in a super block, we just report it.
		 * They will get written with the next transaction commit
		 * anyway
		 */

But the truth is, such assumption is not always true, and since scrub
should try to repair every error it found (except for read-only scrub),
we should really actively commit a transaction to fix this.

[FIX]
Just commit a transaction if we found any super block errors, after
everything else is done.

We can not do this just after scrub_supers(), as
btrfs_commit_transaction() will try to pause and wait for the running
scrub, thus we can not call it with scrub_lock hold.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 0330b1ba1a6a..d73c05b74992 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -4094,6 +4094,7 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 	int ret;
 	struct btrfs_device *dev;
 	unsigned int nofs_flag;
+	bool need_commit = false;
 
 	if (btrfs_fs_closing(fs_info))
 		return -EAGAIN;
@@ -4197,6 +4198,12 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 	 */
 	nofs_flag = memalloc_nofs_save();
 	if (!is_dev_replace) {
+		u64 old_super_errors;
+
+		spin_lock(&sctx->stat_lock);
+		old_super_errors = sctx->stat.super_errors;
+		spin_unlock(&sctx->stat_lock);
+
 		btrfs_info(fs_info, "scrub: started on devid %llu", devid);
 		/*
 		 * by holding device list mutex, we can
@@ -4205,6 +4212,16 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 		mutex_lock(&fs_info->fs_devices->device_list_mutex);
 		ret = scrub_supers(sctx, dev);
 		mutex_unlock(&fs_info->fs_devices->device_list_mutex);
+
+		spin_lock(&sctx->stat_lock);
+		/*
+		 * Super block errors found, but we can not commit transaction
+		 * at current context, since btrfs_commit_transaction() need
+		 * to pause the current running scrub (hold by ourselves).
+		 */
+		if (sctx->stat.super_errors > old_super_errors && !sctx->readonly)
+			need_commit = true;
+		spin_unlock(&sctx->stat_lock);
 	}
 
 	if (!ret)
@@ -4231,6 +4248,26 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 	scrub_workers_put(fs_info);
 	scrub_put_ctx(sctx);
 
+	/*
+	 * We found some super block errors before, now try to force a
+	 * transaction commit, as all scrub has finished, we're safe to
+	 * commit a transaction.
+	 */
+	if (need_commit) {
+		struct btrfs_trans_handle *trans;
+
+		trans = btrfs_start_transaction(fs_info->tree_root, 0);
+		if (IS_ERR(trans)) {
+			ret = PTR_ERR(trans);
+			btrfs_err(fs_info,
+	"scrub: failed to start transaction to fix super block errors: %d", ret);
+			return ret;
+		}
+		ret = btrfs_commit_transaction(trans);
+		if (ret < 0)
+			btrfs_err(fs_info,
+	"scrub: failed to commit transaction to fix super block errors: %d", ret);
+	}
 	return ret;
 out:
 	scrub_workers_put(fs_info);
-- 
2.37.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] btrfs: scrub: small enhancement related to super block errors
  2022-08-02  6:53 [PATCH 0/2] btrfs: scrub: small enhancement related to super block errors Qu Wenruo
  2022-08-02  6:53 ` [PATCH 1/2] btrfs: scrub: properly report super block errors in dmesg Qu Wenruo
  2022-08-02  6:53 ` [PATCH 2/2] btrfs: scrub: try to fix super block errors Qu Wenruo
@ 2022-08-02 14:22 ` David Sterba
  2022-08-02 22:18   ` Qu Wenruo
  2022-08-04 14:05 ` David Sterba
  3 siblings, 1 reply; 12+ messages in thread
From: David Sterba @ 2022-08-02 14:22 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Aug 02, 2022 at 02:53:01PM +0800, Qu Wenruo wrote:
> Recently during a new test case to verify btrfs can handle a fully
> corrupted disk (but with a good primary super block), I found that scrub
> doesn't really try to repair super blocks.

As the comment says, superblock gets overwritten eventually, I'm not
sure we need to start the commit if it's found to be corrupted.

> And scrub doesn't report super block errors in dmesg either.

Yeah it makes sense to report it.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] btrfs: scrub: small enhancement related to super block errors
  2022-08-02 14:22 ` [PATCH 0/2] btrfs: scrub: small enhancement related to " David Sterba
@ 2022-08-02 22:18   ` Qu Wenruo
  2022-08-03 12:50     ` David Sterba
  0 siblings, 1 reply; 12+ messages in thread
From: Qu Wenruo @ 2022-08-02 22:18 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs



On 2022/8/2 22:22, David Sterba wrote:
> On Tue, Aug 02, 2022 at 02:53:01PM +0800, Qu Wenruo wrote:
>> Recently during a new test case to verify btrfs can handle a fully
>> corrupted disk (but with a good primary super block), I found that scrub
>> doesn't really try to repair super blocks.
>
> As the comment says, superblock gets overwritten eventually, I'm not
> sure we need to start the commit if it's found to be corrupted.

The problem is, if someone like me to expect scrub to repair everything
it can, thus doing scrub (to repair) then scrub (to check), then the
second scrub still report super block error is definitely not expected.

Furthermore, just committing one transaction for one device doesn't look
that costly to me.

Thus the choice not to do anything looks pretty weird to me.

Thanks,
Qu
>
>> And scrub doesn't report super block errors in dmesg either.
>
> Yeah it makes sense to report it.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] btrfs: scrub: small enhancement related to super block errors
  2022-08-02 22:18   ` Qu Wenruo
@ 2022-08-03 12:50     ` David Sterba
  0 siblings, 0 replies; 12+ messages in thread
From: David Sterba @ 2022-08-03 12:50 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Wed, Aug 03, 2022 at 06:18:20AM +0800, Qu Wenruo wrote:
> 
> 
> On 2022/8/2 22:22, David Sterba wrote:
> > On Tue, Aug 02, 2022 at 02:53:01PM +0800, Qu Wenruo wrote:
> >> Recently during a new test case to verify btrfs can handle a fully
> >> corrupted disk (but with a good primary super block), I found that scrub
> >> doesn't really try to repair super blocks.
> >
> > As the comment says, superblock gets overwritten eventually, I'm not
> > sure we need to start the commit if it's found to be corrupted.
> 
> The problem is, if someone like me to expect scrub to repair everything
> it can, thus doing scrub (to repair) then scrub (to check), then the
> second scrub still report super block error is definitely not expected.

That's a good point, scrub should fix what it could. That the superblock
could be skipped is an optimization but for a correctness feature it's
not the main criteria.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] btrfs: scrub: try to fix super block errors
  2022-08-02  6:53 ` [PATCH 2/2] btrfs: scrub: try to fix super block errors Qu Wenruo
@ 2022-08-03 12:55   ` David Sterba
  2022-08-03 21:49     ` Qu Wenruo
  0 siblings, 1 reply; 12+ messages in thread
From: David Sterba @ 2022-08-03 12:55 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Aug 02, 2022 at 02:53:03PM +0800, Qu Wenruo wrote:
> @@ -4231,6 +4248,26 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
>  	scrub_workers_put(fs_info);
>  	scrub_put_ctx(sctx);
>  
> +	/*
> +	 * We found some super block errors before, now try to force a
> +	 * transaction commit, as all scrub has finished, we're safe to
> +	 * commit a transaction.
> +	 */

Scrub can be started in read-only mode, which is basicaly report-only
mode, so forcing the transaction commit should be also skipped. It would
fail with -EROFS right at the beginning of transaction start.

> +	if (need_commit) {
> +		struct btrfs_trans_handle *trans;
> +
> +		trans = btrfs_start_transaction(fs_info->tree_root, 0);
> +		if (IS_ERR(trans)) {
> +			ret = PTR_ERR(trans);
> +			btrfs_err(fs_info,
> +	"scrub: failed to start transaction to fix super block errors: %d", ret);
> +			return ret;
> +		}
> +		ret = btrfs_commit_transaction(trans);
> +		if (ret < 0)
> +			btrfs_err(fs_info,
> +	"scrub: failed to commit transaction to fix super block errors: %d", ret);
> +	}
>  	return ret;
>  out:
>  	scrub_workers_put(fs_info);

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] btrfs: scrub: try to fix super block errors
  2022-08-03 12:55   ` David Sterba
@ 2022-08-03 21:49     ` Qu Wenruo
  2022-08-04 13:06       ` David Sterba
  0 siblings, 1 reply; 12+ messages in thread
From: Qu Wenruo @ 2022-08-03 21:49 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs



On 2022/8/3 20:55, David Sterba wrote:
> On Tue, Aug 02, 2022 at 02:53:03PM +0800, Qu Wenruo wrote:
>> @@ -4231,6 +4248,26 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
>>   	scrub_workers_put(fs_info);
>>   	scrub_put_ctx(sctx);
>>
>> +	/*
>> +	 * We found some super block errors before, now try to force a
>> +	 * transaction commit, as all scrub has finished, we're safe to
>> +	 * commit a transaction.
>> +	 */
>
> Scrub can be started in read-only mode, which is basicaly report-only
> mode, so forcing the transaction commit should be also skipped. It would
> fail with -EROFS right at the beginning of transaction start.

It's already checked in the code:

+		if (sctx->stat.super_errors > old_super_errors && !sctx->readonly)
+			need_commit = true;

Thanks,
Qu
>
>> +	if (need_commit) {
>> +		struct btrfs_trans_handle *trans;
>> +
>> +		trans = btrfs_start_transaction(fs_info->tree_root, 0);
>> +		if (IS_ERR(trans)) {
>> +			ret = PTR_ERR(trans);
>> +			btrfs_err(fs_info,
>> +	"scrub: failed to start transaction to fix super block errors: %d", ret);
>> +			return ret;
>> +		}
>> +		ret = btrfs_commit_transaction(trans);
>> +		if (ret < 0)
>> +			btrfs_err(fs_info,
>> +	"scrub: failed to commit transaction to fix super block errors: %d", ret);
>> +	}
>>   	return ret;
>>   out:
>>   	scrub_workers_put(fs_info);

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] btrfs: scrub: try to fix super block errors
  2022-08-03 21:49     ` Qu Wenruo
@ 2022-08-04 13:06       ` David Sterba
  2022-08-06  6:01         ` Qu Wenruo
  0 siblings, 1 reply; 12+ messages in thread
From: David Sterba @ 2022-08-04 13:06 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Thu, Aug 04, 2022 at 05:49:20AM +0800, Qu Wenruo wrote:
> 
> 
> On 2022/8/3 20:55, David Sterba wrote:
> > On Tue, Aug 02, 2022 at 02:53:03PM +0800, Qu Wenruo wrote:
> >> @@ -4231,6 +4248,26 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
> >>   	scrub_workers_put(fs_info);
> >>   	scrub_put_ctx(sctx);
> >>
> >> +	/*
> >> +	 * We found some super block errors before, now try to force a
> >> +	 * transaction commit, as all scrub has finished, we're safe to
> >> +	 * commit a transaction.
> >> +	 */
> >
> > Scrub can be started in read-only mode, which is basicaly report-only
> > mode, so forcing the transaction commit should be also skipped. It would
> > fail with -EROFS right at the beginning of transaction start.
> 
> It's already checked in the code:
> 
> +		if (sctx->stat.super_errors > old_super_errors && !sctx->readonly)
> +			need_commit = true;

Great, I overlooked it and searched only for BTRFS_SCRUB_READONLY.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] btrfs: scrub: small enhancement related to super block errors
  2022-08-02  6:53 [PATCH 0/2] btrfs: scrub: small enhancement related to super block errors Qu Wenruo
                   ` (2 preceding siblings ...)
  2022-08-02 14:22 ` [PATCH 0/2] btrfs: scrub: small enhancement related to " David Sterba
@ 2022-08-04 14:05 ` David Sterba
  3 siblings, 0 replies; 12+ messages in thread
From: David Sterba @ 2022-08-04 14:05 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Aug 02, 2022 at 02:53:01PM +0800, Qu Wenruo wrote:
> Recently during a new test case to verify btrfs can handle a fully
> corrupted disk (but with a good primary super block), I found that scrub
> doesn't really try to repair super blocks.
> 
> And scrub doesn't report super block errors in dmesg either.
> 
> So this small patchset is to fix the problems by:
> 
> - Add explicit error messages for super block errors
> - Try to fix super block errors by committing a transaction
> 
> Qu Wenruo (2):
>   btrfs: scrub: properly report super block errors in dmesg
>   btrfs: scrub: try to fix super block errors

Added to misc-next, thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] btrfs: scrub: try to fix super block errors
  2022-08-04 13:06       ` David Sterba
@ 2022-08-06  6:01         ` Qu Wenruo
  2022-08-06  6:30           ` Qu Wenruo
  0 siblings, 1 reply; 12+ messages in thread
From: Qu Wenruo @ 2022-08-06  6:01 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs



On 2022/8/4 21:06, David Sterba wrote:
> On Thu, Aug 04, 2022 at 05:49:20AM +0800, Qu Wenruo wrote:
>>
>>
>> On 2022/8/3 20:55, David Sterba wrote:
>>> On Tue, Aug 02, 2022 at 02:53:03PM +0800, Qu Wenruo wrote:
>>>> @@ -4231,6 +4248,26 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
>>>>    	scrub_workers_put(fs_info);
>>>>    	scrub_put_ctx(sctx);
>>>>
>>>> +	/*
>>>> +	 * We found some super block errors before, now try to force a
>>>> +	 * transaction commit, as all scrub has finished, we're safe to
>>>> +	 * commit a transaction.
>>>> +	 */
>>>
>>> Scrub can be started in read-only mode, which is basicaly report-only
>>> mode, so forcing the transaction commit should be also skipped. It would
>>> fail with -EROFS right at the beginning of transaction start.
>>
>> It's already checked in the code:
>>
>> +		if (sctx->stat.super_errors > old_super_errors && !sctx->readonly)
>> +			need_commit = true;
>
> Great, I overlooked it and searched only for BTRFS_SCRUB_READONLY.

My bad, I didn't test it with replace group, and it can cause hang in
btrfs/100.

The cause is, dev-replace will call btrfs_scrub_dev() with extra
dev-replace related locking.
However btrfs_commit_transaction() will also need to wait that lock,
thus we will dead lock there.

I'll fix the last patch, meanwhile please remove it from misc-next.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] btrfs: scrub: try to fix super block errors
  2022-08-06  6:01         ` Qu Wenruo
@ 2022-08-06  6:30           ` Qu Wenruo
  0 siblings, 0 replies; 12+ messages in thread
From: Qu Wenruo @ 2022-08-06  6:30 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs



On 2022/8/6 14:01, Qu Wenruo wrote:
>
>
> On 2022/8/4 21:06, David Sterba wrote:
>> On Thu, Aug 04, 2022 at 05:49:20AM +0800, Qu Wenruo wrote:
>>>
>>>
>>> On 2022/8/3 20:55, David Sterba wrote:
>>>> On Tue, Aug 02, 2022 at 02:53:03PM +0800, Qu Wenruo wrote:
>>>>> @@ -4231,6 +4248,26 @@ int btrfs_scrub_dev(struct btrfs_fs_info
>>>>> *fs_info, u64 devid, u64 start,
>>>>>        scrub_workers_put(fs_info);
>>>>>        scrub_put_ctx(sctx);
>>>>>
>>>>> +    /*
>>>>> +     * We found some super block errors before, now try to force a
>>>>> +     * transaction commit, as all scrub has finished, we're safe to
>>>>> +     * commit a transaction.
>>>>> +     */
>>>>
>>>> Scrub can be started in read-only mode, which is basicaly report-only
>>>> mode, so forcing the transaction commit should be also skipped. It
>>>> would
>>>> fail with -EROFS right at the beginning of transaction start.
>>>
>>> It's already checked in the code:
>>>
>>> +        if (sctx->stat.super_errors > old_super_errors &&
>>> !sctx->readonly)
>>> +            need_commit = true;
>>
>> Great, I overlooked it and searched only for BTRFS_SCRUB_READONLY.
>
> My bad, I didn't test it with replace group, and it can cause hang in
> btrfs/100.
>
> The cause is, dev-replace will call btrfs_scrub_dev() with extra
> dev-replace related locking.
> However btrfs_commit_transaction() will also need to wait that lock,
> thus we will dead lock there.

Ops, this is not the case, it looks like the discard related bug.

Although dev-replace is holding replace related locks, we should still
be able to commit transaction during dev-replace.

Thus nothing needs to be done here.

Thanks,
Qu
>
> I'll fix the last patch, meanwhile please remove it from misc-next.
>
> Thanks,
> Qu

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-08-06  6:30 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-02  6:53 [PATCH 0/2] btrfs: scrub: small enhancement related to super block errors Qu Wenruo
2022-08-02  6:53 ` [PATCH 1/2] btrfs: scrub: properly report super block errors in dmesg Qu Wenruo
2022-08-02  6:53 ` [PATCH 2/2] btrfs: scrub: try to fix super block errors Qu Wenruo
2022-08-03 12:55   ` David Sterba
2022-08-03 21:49     ` Qu Wenruo
2022-08-04 13:06       ` David Sterba
2022-08-06  6:01         ` Qu Wenruo
2022-08-06  6:30           ` Qu Wenruo
2022-08-02 14:22 ` [PATCH 0/2] btrfs: scrub: small enhancement related to " David Sterba
2022-08-02 22:18   ` Qu Wenruo
2022-08-03 12:50     ` David Sterba
2022-08-04 14:05 ` David Sterba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).