* [PATCH 00/11] btrfs: scrub: use a more reader friendly code to implement scrub_simple_mirror()
@ 2023-01-16  7:04 Qu Wenruo
  2023-01-16  7:04 ` [PATCH 01/11] btrfs: remove unused @path inside scrub_stripe() Qu Wenruo
                   ` (11 more replies)
  0 siblings, 12 replies; 17+ messages in thread
From: Qu Wenruo @ 2023-01-16  7:04 UTC (permalink / raw)
  To: linux-btrfs

This is the formal version of the previous PoC patchset "btrfs: scrub:
rework to get rid of the complex bio formshaping"

The idea is pretty straightforward for scrub:

- Fetch the extent info and csum for the whole BTRFS_STRIPE_LEN range

- Read the full BTRFS_STRIPE_LEN range

- Verify the contents using the extent info and csum at endio time

- Wait for the above BTRFS_STRIPE_LEN range read to finish.

- If we have failed sectors, read extra mirrors to repair them.

- If we have failed sectors, writeback the repaired ones.

- If we're doing dev-replace, writeback all good sectors to the target
  device.

Although the workflow is still mostly the same as in the old scrub
infrastructure, the implementation uses a submit-and-wait method.

Thus it provides a very straightforward code base:

		scrub_reset_stripe(stripe);
		ret = scrub_find_fill_first_stripe(extent_root, csum_root, bg,
				cur_logical, logical_end - cur_logical, stripe);
		stripe->physical = physical + stripe->logical - logical_start;
		scrub_throttle_dev_io(sctx, device, BTRFS_STRIPE_LEN);
		scrub_submit_read_one_stripe(stripe);
		wait_scrub_stripe(stripe);
		scrub_repair_one_stripe(stripe);
		scrub_write_repaired_sectors(sctx, stripe);
		scrub_report_stripe_errors(sctx, stripe);
		if (sctx->is_dev_replace)
			scrub_write_replace_sectors(sctx, stripe);
		cur_logical = stripe->logical + BTRFS_STRIPE_LEN;

Thus it covers all the core logic in one function.

By contrast, the old code goes through various workqueues, endio
function jumps, and extra bio form shaping.

Currently the patchset only covers profiles other than RAID56 parity
stripes, thus the old infrastructure is still kept for RAID56 parity
scrub usage.

Even so, the patchset is already large enough for review.

The current patchset can already pass all scrub and replace tests.

[BENCHMARK]

However there is a cost.
Since our block size is limited to 64K, it is a much smaller block size
than the original one.

Thus for the worst case scenario (all data is contiguous, and the
profile is RAID0, which adds extra splits), scrub performance takes a
20% drop:

Old:
 
 Duration:         0:00:19
 Total to scrub:   10.52GiB
 Rate:             449.50MiB/s

New:

 Duration:         0:00:24
 Total to scrub:   10.52GiB
 Rate:             355.86MiB/s

The benchmark uses a SATA SSD directly attached to the VM.

[NEED FEEDBACK]

Is a 20% performance drop acceptable?

I have seen some customers asking for ways to slow down scrub,
but not to speed it up.
Thus I'm not sure whether such a performance drop is a curse or a
blessing.

And if needed, I can enlarge the block size by submitting multiple
stripes at once.
But in that case, we will need some extra code to do multi-stripe
scrub, e.g. something like the sketch below.
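
A rough sketch only (nr_stripes and the stripes[] array are
hypothetical, the helpers are the ones used in this series):

	/* Submit the reads for a whole group of stripes in one go... */
	for (i = 0; i < nr_stripes; i++)
		scrub_submit_read_one_stripe(stripes[i]);

	/* ...then wait for all of them before verification/repair. */
	for (i = 0; i < nr_stripes; i++)
		wait_scrub_stripe(stripes[i]);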

Qu Wenruo (11):
  btrfs: remove unused @path inside scrub_stripe()
  btrfs: remove @root and @csum_root arguments from
    scrub_simple_mirror()
  btrfs: scrub: use dedicated super block verification function to scrub
    one super block
  btrfs: scrub: introduce the structure for new BTRFS_STRIPE_LEN based
    interface
  btrfs: scrub: introduce a helper to find and fill the sector info for
    a scrub_stripe
  btrfs: scrub: introduce a helper to verify one metadata
  btrfs: scrub: introduce a helper to verify one scrub_stripe
  btrfs: scrub: introduce the repair functionality for scrub_stripe
  btrfs: scrub: introduce a writeback helper for scrub_stripe
  btrfs: scrub: introduce error reporting functionality for scrub_stripe
  btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe
    infrastructure

 fs/btrfs/file-item.c |    9 +-
 fs/btrfs/file-item.h |    3 +-
 fs/btrfs/raid56.c    |    2 +-
 fs/btrfs/scrub.c     | 1521 ++++++++++++++++++++++++++++++------------
 fs/btrfs/volumes.c   |   10 +-
 fs/btrfs/volumes.h   |    2 +
 6 files changed, 1111 insertions(+), 436 deletions(-)

-- 
2.39.0



* [PATCH 01/11] btrfs: remove unused @path inside scrub_stripe()
  2023-01-16  7:04 [PATCH 00/11] btrfs: scrub: use a more reader friendly code to implement scrub_simple_mirror() Qu Wenruo
@ 2023-01-16  7:04 ` Qu Wenruo
  2023-03-01 20:25   ` David Sterba
  2023-01-16  7:04 ` [PATCH 02/11] btrfs: remove @root and @csum_root arguments from scrub_simple_mirror() Qu Wenruo
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 17+ messages in thread
From: Qu Wenruo @ 2023-01-16  7:04 UTC (permalink / raw)
  To: linux-btrfs

The variable @path is no longer passed into any call sites after commit
18d30ab96149 ("btrfs: scrub: use scrub_simple_mirror() to handle RAID56
data stripe scrub"), thus we can remove the variable completely.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c | 15 ---------------
 1 file changed, 15 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 10c26bc8e60e..e241e2a35bfd 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3607,7 +3607,6 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 					   struct btrfs_device *scrub_dev,
 					   int stripe_index)
 {
-	struct btrfs_path *path;
 	struct btrfs_fs_info *fs_info = sctx->fs_info;
 	struct btrfs_root *root;
 	struct btrfs_root *csum_root;
@@ -3629,19 +3628,6 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 	u64 stripe_end;
 	int stop_loop = 0;
 
-	path = btrfs_alloc_path();
-	if (!path)
-		return -ENOMEM;
-
-	/*
-	 * work on commit root. The related disk blocks are static as
-	 * long as COW is applied. This means, it is save to rewrite
-	 * them to repair disk errors without any race conditions
-	 */
-	path->search_commit_root = 1;
-	path->skip_locking = 1;
-	path->reada = READA_FORWARD;
-
 	wait_event(sctx->list_wait,
 		   atomic_read(&sctx->bios_in_flight) == 0);
 	scrub_blocked_if_needed(fs_info);
@@ -3761,7 +3747,6 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 	mutex_unlock(&sctx->wr_lock);
 
 	blk_finish_plug(&plug);
-	btrfs_free_path(path);
 
 	if (sctx->is_dev_replace && ret >= 0) {
 		int ret2;
-- 
2.39.0



* [PATCH 02/11] btrfs: remove @root and @csum_root arguments from scrub_simple_mirror()
  2023-01-16  7:04 [PATCH 00/11] btrfs: scrub: use a more reader friendly code to implement scrub_simple_mirror() Qu Wenruo
  2023-01-16  7:04 ` [PATCH 01/11] btrfs: remove unused @path inside scrub_stripe() Qu Wenruo
@ 2023-01-16  7:04 ` Qu Wenruo
  2023-03-01 20:29   ` David Sterba
  2023-01-16  7:04 ` [PATCH 03/11] btrfs: scrub: use dedicated super block verification function to scrub one super block Qu Wenruo
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 17+ messages in thread
From: Qu Wenruo @ 2023-01-16  7:04 UTC (permalink / raw)
  To: linux-btrfs

Grabbing the extent/csum root for extent tree v2 is not that huge a
workload, it's just an rb tree search.

Thus there is really not much need to pre-fetch it and pass it all the
way from scrub_stripe().

And since we already have more than enough arguments in
scrub_simple_mirror() and scrub_simple_stripe(), it's better to remove
them and only grab those roots inside scrub_simple_mirror().

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c | 28 +++++++++-------------------
 1 file changed, 9 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index e241e2a35bfd..288d7e2a6cdf 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3410,8 +3410,6 @@ static int sync_write_pointer_for_zoned(struct scrub_ctx *sctx, u64 logical,
  * and @logical_length parameter.
  */
 static int scrub_simple_mirror(struct scrub_ctx *sctx,
-			       struct btrfs_root *extent_root,
-			       struct btrfs_root *csum_root,
 			       struct btrfs_block_group *bg,
 			       struct map_lookup *map,
 			       u64 logical_start, u64 logical_length,
@@ -3419,6 +3417,8 @@ static int scrub_simple_mirror(struct scrub_ctx *sctx,
 			       u64 physical, int mirror_num)
 {
 	struct btrfs_fs_info *fs_info = sctx->fs_info;
+	struct btrfs_root *csum_root = btrfs_csum_root(fs_info, bg->start);
+	struct btrfs_root *extent_root = btrfs_extent_root(fs_info, bg->start);
 	const u64 logical_end = logical_start + logical_length;
 	/* An artificial limit, inherit from old scrub behavior */
 	const u32 max_length = SZ_64K;
@@ -3567,8 +3567,6 @@ static int simple_stripe_mirror_num(struct map_lookup *map, int stripe_index)
 }
 
 static int scrub_simple_stripe(struct scrub_ctx *sctx,
-			       struct btrfs_root *extent_root,
-			       struct btrfs_root *csum_root,
 			       struct btrfs_block_group *bg,
 			       struct map_lookup *map,
 			       struct btrfs_device *device,
@@ -3588,9 +3586,9 @@ static int scrub_simple_stripe(struct scrub_ctx *sctx,
 		 * just RAID1, so we can reuse scrub_simple_mirror() to scrub
 		 * this stripe.
 		 */
-		ret = scrub_simple_mirror(sctx, extent_root, csum_root, bg, map,
-					  cur_logical, map->stripe_len, device,
-					  cur_physical, mirror_num);
+		ret = scrub_simple_mirror(sctx, bg, map, cur_logical,
+					  map->stripe_len, device, cur_physical,
+					  mirror_num);
 		if (ret)
 			return ret;
 		/* Skip to next stripe which belongs to the target device */
@@ -3608,8 +3606,6 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 					   int stripe_index)
 {
 	struct btrfs_fs_info *fs_info = sctx->fs_info;
-	struct btrfs_root *root;
-	struct btrfs_root *csum_root;
 	struct blk_plug plug;
 	struct map_lookup *map = em->map_lookup;
 	const u64 profile = map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK;
@@ -3632,9 +3628,6 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 		   atomic_read(&sctx->bios_in_flight) == 0);
 	scrub_blocked_if_needed(fs_info);
 
-	root = btrfs_extent_root(fs_info, bg->start);
-	csum_root = btrfs_csum_root(fs_info, bg->start);
-
 	/*
 	 * collect all data csums for the stripe to avoid seeking during
 	 * the scrub. This might currently (crc32) end up to be about 1MB
@@ -3666,16 +3659,14 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 		 * Only @physical and @mirror_num needs to calculated using
 		 * @stripe_index.
 		 */
-		ret = scrub_simple_mirror(sctx, root, csum_root, bg, map,
-				bg->start, bg->length, scrub_dev,
-				map->stripes[stripe_index].physical,
+		ret = scrub_simple_mirror(sctx, bg, map, bg->start, bg->length,
+				scrub_dev, map->stripes[stripe_index].physical,
 				stripe_index + 1);
 		offset = 0;
 		goto out;
 	}
 	if (profile & (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID10)) {
-		ret = scrub_simple_stripe(sctx, root, csum_root, bg, map,
-					  scrub_dev, stripe_index);
+		ret = scrub_simple_stripe(sctx, bg, map, scrub_dev, stripe_index);
 		offset = map->stripe_len * (stripe_index / map->sub_stripes);
 		goto out;
 	}
@@ -3721,8 +3712,7 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 		 * We can reuse scrub_simple_mirror() here, as the repair part
 		 * is still based on @mirror_num.
 		 */
-		ret = scrub_simple_mirror(sctx, root, csum_root, bg, map,
-					  logical, map->stripe_len,
+		ret = scrub_simple_mirror(sctx, bg, map, logical, map->stripe_len,
 					  scrub_dev, physical, 1);
 		if (ret < 0)
 			goto out;
-- 
2.39.0



* [PATCH 03/11] btrfs: scrub: use dedicated super block verification function to scrub one super block
  2023-01-16  7:04 [PATCH 00/11] btrfs: scrub: use a more reader friendly code to implement scrub_simple_mirror() Qu Wenruo
  2023-01-16  7:04 ` [PATCH 01/11] btrfs: remove unused @path inside scrub_stripe() Qu Wenruo
  2023-01-16  7:04 ` [PATCH 02/11] btrfs: remove @root and @csum_root arguments from scrub_simple_mirror() Qu Wenruo
@ 2023-01-16  7:04 ` Qu Wenruo
  2023-01-19  4:46   ` li zhang
  2023-01-16  7:04 ` [PATCH 04/11] btrfs: scrub: introduce the structure for new BTRFS_STRIPE_LEN based interface Qu Wenruo
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 17+ messages in thread
From: Qu Wenruo @ 2023-01-16  7:04 UTC (permalink / raw)
  To: linux-btrfs

There is really no need to go through the super complex scrub_sectors()
just to handle super blocks.

This patch will introduce a dedicated function (less than 50 lines) to
handle super block scrubbing.

This new function will introduce a behavior change: instead of using
the complex but concurrent scrub_bio system, here we just go
submit-and-wait.

There is really not much sense in caring about the performance of super
block scrubbing. Each device only has 3 super blocks at most, and they
are already scattered around the device.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c | 53 ++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 45 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 288d7e2a6cdf..e554a9904d2a 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -4143,18 +4143,59 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 	return ret;
 }
 
+static int scrub_one_super(struct scrub_ctx *sctx, struct btrfs_device *dev,
+			   struct page *page, u64 physical, u64 generation)
+{
+	struct btrfs_fs_info *fs_info = sctx->fs_info;
+	struct bio_vec bvec;
+	struct bio bio;
+	struct btrfs_super_block *sb = page_address(page);
+	int ret;
+
+	bio_init(&bio, dev->bdev, &bvec, 1, REQ_OP_READ);
+	bio.bi_iter.bi_sector = physical >> SECTOR_SHIFT;
+	bio_add_page(&bio, page, BTRFS_SUPER_INFO_SIZE, 0);
+	ret = submit_bio_wait(&bio);
+	bio_uninit(&bio);
+
+	if (ret < 0)
+		return ret;
+	ret = btrfs_check_super_csum(fs_info, sb);
+	if (ret != 0) {
+		btrfs_err_rl(fs_info,
+			"super block at physical %llu devid %llu has bad csum",
+			physical, dev->devid);
+		return -EIO;
+	}
+	if (btrfs_super_generation(sb) != generation) {
+		btrfs_err_rl(fs_info,
+"super block at physical %llu devid %llu has bad generation, has %llu expect %llu",
+			     physical, dev->devid,
+			     btrfs_super_generation(sb), generation);
+		return -EUCLEAN;
+	}
+
+	ret = btrfs_validate_super(fs_info, sb, -1);
+	return ret;
+}
+
 static noinline_for_stack int scrub_supers(struct scrub_ctx *sctx,
 					   struct btrfs_device *scrub_dev)
 {
 	int	i;
 	u64	bytenr;
 	u64	gen;
-	int	ret;
+	int	ret = 0;
+	struct page *page;
 	struct btrfs_fs_info *fs_info = sctx->fs_info;
 
 	if (BTRFS_FS_ERROR(fs_info))
 		return -EROFS;
 
+	page = alloc_page(GFP_KERNEL);
+	if (!page)
+		return -ENOMEM;
+
 	/* Seed devices of a new filesystem has their own generation. */
 	if (scrub_dev->fs_devices != fs_info->fs_devices)
 		gen = scrub_dev->generation;
@@ -4169,15 +4210,11 @@ static noinline_for_stack int scrub_supers(struct scrub_ctx *sctx,
 		if (!btrfs_check_super_location(scrub_dev, bytenr))
 			continue;
 
-		ret = scrub_sectors(sctx, bytenr, BTRFS_SUPER_INFO_SIZE, bytenr,
-				    scrub_dev, BTRFS_EXTENT_FLAG_SUPER, gen, i,
-				    NULL, bytenr);
+		ret = scrub_one_super(sctx, scrub_dev, page, bytenr, gen);
 		if (ret)
-			return ret;
+			break;
 	}
-	wait_event(sctx->list_wait, atomic_read(&sctx->bios_in_flight) == 0);
-
-	return 0;
+	return ret;
 }
 
 static void scrub_workers_put(struct btrfs_fs_info *fs_info)
-- 
2.39.0



* [PATCH 04/11] btrfs: scrub: introduce the structure for new BTRFS_STRIPE_LEN based interface
  2023-01-16  7:04 [PATCH 00/11] btrfs: scrub: use a more reader friendly code to implement scrub_simple_mirror() Qu Wenruo
                   ` (2 preceding siblings ...)
  2023-01-16  7:04 ` [PATCH 03/11] btrfs: scrub: use dedicated super block verification function to scrub one super block Qu Wenruo
@ 2023-01-16  7:04 ` Qu Wenruo
  2023-01-16  7:04 ` [PATCH 05/11] btrfs: scrub: introduce a helper to find and fill the sector info for a scrub_stripe Qu Wenruo
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Qu Wenruo @ 2023-01-16  7:04 UTC (permalink / raw)
  To: linux-btrfs

This patch introduces the following structures:

- scrub_sector_verification
  Contains all the needed info to verify one sector (data or metadata).

- scrub_stripe
  Contains all needed members (mostly bitmap based) to scrub one stripe
  (with a length of BTRFS_STRIPE_LEN).

The basic idea is, we keep the existing per-device scrub behavior, but
get rid of the bio form shaping by always reading the full
BTRFS_STRIPE_LEN range.

This means we will read some sectors which are not scrub targets, but
that's fine. At write back time we still only submit repaired sectors.

With every read submitted as a full BTRFS_STRIPE_LEN, there should not
be much need for a complex bio form shaping mechanism.
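
For reference, the intended lifecycle of a scrub_stripe is roughly the
following (a simplified sketch; the read submission and repair helpers
only arrive in later patches of this series):

	stripe = alloc_scrub_stripe(fs_info, bg);
	if (!stripe)
		return -ENOMEM;
	/* Fill the extent/csum info and submit the read (later patches). */
	wait_scrub_stripe(stripe);
	/* Check the error bitmaps, then repair/writeback as needed. */
	free_scrub_stripe(stripe);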

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c | 147 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/scrub.h |   8 +++
 2 files changed, 155 insertions(+)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index e554a9904d2a..3beb11153dbf 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -70,6 +70,91 @@ struct scrub_ctx;
  */
 #define BTRFS_MAX_MIRRORS (4 + 1)
 
+/* Represent one sector and its needed info to verify the content. */
+struct scrub_sector_verification {
+	bool is_metadata;
+
+	union {
+		/*
+		 * Csum pointer for data csum verification.
+		 * Should point to a sector csum inside scrub_stripe::csums.
+		 *
+		 * NULL if this data sector has no csum.
+		 */
+		u8 *csum;
+
+		/*
+		 * Extra info for metadata verification.
+		 * All sectors inside a tree block shares the same
+		 * All sectors inside a tree block share the same
+		 * generation.
+		u64 generation;
+	};
+};
+
+/*
+ * Represent one continuous range with a length of BTRFS_STRIPE_LEN.
+ */
+struct scrub_stripe {
+	struct btrfs_block_group *bg;
+
+	struct page *pages[BTRFS_STRIPE_LEN / PAGE_SIZE];
+	struct scrub_sector_verification *sectors;
+
+	struct btrfs_device *dev;
+
+	u64 logical;
+	u64 physical;
+
+	u16 mirror_num;
+
+	/* Should be BTRFS_STRIPE_LEN / sectorsize. */
+	u16 nr_sectors;
+
+	atomic_t pending_io;
+	wait_queue_head_t io_wait;
+
+	/* Indicates which sectors are covered by extent items. */
+	unsigned long extent_sector_bitmap;
+
+	/*
+	 * Records the errors found after the initial read.
+	 * This will be used for repair, as any sector with error needs repair
+	 * (if found a good copy).
+	 */
+	unsigned long init_error_bitmap;
+
+	/*
+	 * After reading another copy and doing verification, sectors that
+	 * got repaired will have their bits cleared.
+	 */
+	unsigned long current_error_bitmap;
+
+	/*
+	 * The following error_bitmap are all for the initial read operation.
+	 * After the initial read, we should not touch those error bitmaps, as
+	 * they will later be used to do error reporting.
+	 *
+	 * Indicates IO errors during read.
+	 */
+	unsigned long io_error_bitmap;
+
+	/* For both metadata and data. */
+	unsigned long csum_error_bitmap;
+
+	/*
+	 * Indicates metadata specific errors.
+	 * (basic sanity checks to transid errors)
+	 */
+	unsigned long meta_error_bitmap;
+
+	/*
+	 * Checksum for the whole stripe if this stripe is inside a data block
+	 * group.
+	 */
+	u8 *csums;
+};
+
 struct scrub_recover {
 	refcount_t		refs;
 	struct btrfs_io_context	*bioc;
@@ -266,6 +351,68 @@ static void detach_scrub_page_private(struct page *page)
 #endif
 }
 
+static void free_scrub_stripe(struct scrub_stripe *stripe)
+{
+	int i;
+
+	if (!stripe)
+		return;
+
+	for (i = 0; i < BTRFS_STRIPE_LEN >> PAGE_SHIFT; i++) {
+		if (stripe->pages[i])
+			__free_page(stripe->pages[i]);
+	}
+	kfree(stripe->sectors);
+	kfree(stripe->csums);
+	kfree(stripe);
+}
+
+struct scrub_stripe *alloc_scrub_stripe(struct btrfs_fs_info *fs_info,
+					struct btrfs_block_group *bg)
+{
+	struct scrub_stripe *stripe;
+	int ret;
+
+	stripe = kzalloc(sizeof(*stripe), GFP_KERNEL);
+	if (!stripe)
+		return NULL;
+
+	stripe->nr_sectors = BTRFS_STRIPE_LEN >> fs_info->sectorsize_bits;
+	stripe->bg = bg;
+
+	init_waitqueue_head(&stripe->io_wait);
+	atomic_set(&stripe->pending_io, 0);
+
+
+	ret = btrfs_alloc_page_array(BTRFS_STRIPE_LEN >> PAGE_SHIFT,
+				     stripe->pages);
+	if (ret < 0)
+		goto cleanup;
+
+	stripe->sectors = kcalloc(stripe->nr_sectors,
+				  sizeof(struct scrub_sector_verification),
+				  GFP_KERNEL);
+	if (!stripe->sectors)
+		goto cleanup;
+
+	if (bg->flags & BTRFS_BLOCK_GROUP_DATA) {
+		stripe->csums = kzalloc(
+			(BTRFS_STRIPE_LEN >> fs_info->sectorsize_bits) *
+			fs_info->csum_size, GFP_KERNEL);
+		if (!stripe->csums)
+			goto cleanup;
+	}
+	return stripe;
+cleanup:
+	free_scrub_stripe(stripe);
+	return NULL;
+}
+
+void wait_scrub_stripe(struct scrub_stripe *stripe)
+{
+	wait_event(stripe->io_wait, atomic_read(&stripe->pending_io) == 0);
+}
+
 static struct scrub_block *alloc_scrub_block(struct scrub_ctx *sctx,
 					     struct btrfs_device *dev,
 					     u64 logical, u64 physical,
diff --git a/fs/btrfs/scrub.h b/fs/btrfs/scrub.h
index 7639103ebf9d..a035375083f0 100644
--- a/fs/btrfs/scrub.h
+++ b/fs/btrfs/scrub.h
@@ -13,4 +13,12 @@ int btrfs_scrub_cancel_dev(struct btrfs_device *dev);
 int btrfs_scrub_progress(struct btrfs_fs_info *fs_info, u64 devid,
 			 struct btrfs_scrub_progress *progress);
 
+/*
+ * The following functions are temporary exports to avoid warning on unused
+ * static functions.
+ */
+struct scrub_stripe *alloc_scrub_stripe(struct btrfs_fs_info *fs_info,
+					struct btrfs_block_group *bg);
+void wait_scrub_stripe(struct scrub_stripe *stripe);
+
 #endif
-- 
2.39.0



* [PATCH 05/11] btrfs: scrub: introduce a helper to find and fill the sector info for a scrub_stripe
  2023-01-16  7:04 [PATCH 00/11] btrfs: scrub: use a more reader friendly code to implement scrub_simple_mirror() Qu Wenruo
                   ` (3 preceding siblings ...)
  2023-01-16  7:04 ` [PATCH 04/11] btrfs: scrub: introduce the structure for new BTRFS_STRIPE_LEN based interface Qu Wenruo
@ 2023-01-16  7:04 ` Qu Wenruo
  2023-01-16  7:04 ` [PATCH 06/11] btrfs: scrub: introduce a helper to verify one metadata Qu Wenruo
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Qu Wenruo @ 2023-01-16  7:04 UTC (permalink / raw)
  To: linux-btrfs

The new helper will search the extent tree to find the first extent of a
logical range, then fill the sectors array by two loops:

- Loop 1 to fill common bits and metadata generation

- Loop 2 to fill csum data (only for data bgs)
  This loop will use the new btrfs_lookup_csums_bitmap() to fill
  the full csum buffer, and set scrub_sector_verification::csum.

With all the needed info filled in by this function, later we only need
to submit and verify the stripe.

Here we temporarily export the helper to avoid a warning on unused
static functions.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/file-item.c |   9 ++-
 fs/btrfs/file-item.h |   3 +-
 fs/btrfs/raid56.c    |   2 +-
 fs/btrfs/scrub.c     | 134 +++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/scrub.h     |   6 ++
 5 files changed, 151 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 5de73466b2ca..70c906ffd1ad 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -671,7 +671,8 @@ int btrfs_lookup_csums_list(struct btrfs_root *root, u64 start, u64 end,
  * in is large enough to contain all csums.
  */
 int btrfs_lookup_csums_bitmap(struct btrfs_root *root, u64 start, u64 end,
-			      u8 *csum_buf, unsigned long *csum_bitmap)
+			      u8 *csum_buf, unsigned long *csum_bitmap,
+			      bool search_commit)
 {
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	struct btrfs_key key;
@@ -688,6 +689,12 @@ int btrfs_lookup_csums_bitmap(struct btrfs_root *root, u64 start, u64 end,
 	if (!path)
 		return -ENOMEM;
 
+	if (search_commit) {
+		path->skip_locking = 1;
+		path->reada = READA_FORWARD;
+		path->search_commit_root = 1;
+	}
+
 	key.objectid = BTRFS_EXTENT_CSUM_OBJECTID;
 	key.type = BTRFS_EXTENT_CSUM_KEY;
 	key.offset = start;
diff --git a/fs/btrfs/file-item.h b/fs/btrfs/file-item.h
index 031225668434..64f4e5ca394a 100644
--- a/fs/btrfs/file-item.h
+++ b/fs/btrfs/file-item.h
@@ -55,7 +55,8 @@ int btrfs_lookup_csums_list(struct btrfs_root *root, u64 start, u64 end,
 			    struct list_head *list, int search_commit,
 			    bool nowait);
 int btrfs_lookup_csums_bitmap(struct btrfs_root *root, u64 start, u64 end,
-			      u8 *csum_buf, unsigned long *csum_bitmap);
+			      u8 *csum_buf, unsigned long *csum_bitmap,
+			      bool search_commit);
 void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
 				     const struct btrfs_path *path,
 				     struct btrfs_file_extent_item *fi,
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 6a2cf754912d..65ed4f326fb9 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -2171,7 +2171,7 @@ static void fill_data_csums(struct btrfs_raid_bio *rbio)
 	}
 
 	ret = btrfs_lookup_csums_bitmap(csum_root, start, start + len - 1,
-					rbio->csum_buf, rbio->csum_bitmap);
+					rbio->csum_buf, rbio->csum_bitmap, false);
 	if (ret < 0)
 		goto error;
 	if (bitmap_empty(rbio->csum_bitmap, len >> fs_info->sectorsize_bits))
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 3beb11153dbf..ca58a2e61b4c 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3548,6 +3548,140 @@ static int sync_write_pointer_for_zoned(struct scrub_ctx *sctx, u64 logical,
 	return ret;
 }
 
+static void fill_one_extent_info(struct btrfs_fs_info *fs_info,
+				 struct scrub_stripe *stripe,
+				 u64 extent_start, u64 extent_len,
+				 u64 extent_flags, u64 extent_gen)
+{
+	u64 cur_logical;
+
+	for (cur_logical = max(stripe->logical, extent_start);
+	     cur_logical < min(stripe->logical + BTRFS_STRIPE_LEN,
+			       extent_start + extent_len);
+	     cur_logical += fs_info->sectorsize) {
+		const int nr_sector = (cur_logical - stripe->logical) >>
+				      fs_info->sectorsize_bits;
+		struct scrub_sector_verification *sector =
+			&stripe->sectors[nr_sector];
+
+		set_bit(nr_sector, &stripe->extent_sector_bitmap);
+		if (extent_flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) {
+			sector->is_metadata = true;
+			sector->generation = extent_gen;
+		}
+	}
+}
+
+/*
+ * Locate one stripe which has at least one extent in its range.
+ *
+ * Return 0 if found such stripe, and store its info into @stripe.
+ * Return >0 if there is no such stripe in the specified range.
+ * Return <0 for error.
+ */
+int scrub_find_fill_first_stripe(struct btrfs_root *extent_root,
+				 struct btrfs_root *csum_root,
+				 struct btrfs_block_group *bg,
+				 u64 logical_start, u64 logical_len,
+				 struct scrub_stripe *stripe)
+{
+	struct btrfs_fs_info *fs_info = extent_root->fs_info;
+	const u64 logical_end = logical_start + logical_len;
+	struct btrfs_path path = { 0 };
+	u64 cur_logical = logical_start;
+	u64 stripe_end;
+	u64 extent_start;
+	u64 extent_len;
+	u64 extent_flags;
+	u64 extent_gen;
+	int ret;
+
+	memset(stripe->sectors, 0, sizeof(struct scrub_sector_verification) *
+				   stripe->nr_sectors);
+	bitmap_zero(&stripe->init_error_bitmap, stripe->nr_sectors);
+	bitmap_zero(&stripe->current_error_bitmap, stripe->nr_sectors);
+
+	/* The range must be inside the bg */
+	ASSERT(logical_start >= bg->start && logical_end <= bg->start + bg->length);
+
+	path.search_commit_root = 1;
+	path.skip_locking = 1;
+
+	ret = find_first_extent_item(extent_root, &path, logical_start,
+				     logical_len);
+	/* Either error or not found. */
+	if (ret)
+		goto out;
+	get_extent_info(&path, &extent_start, &extent_len,
+			&extent_flags, &extent_gen);
+	cur_logical = max(extent_start, cur_logical);
+
+	/*
+	 * Round down to stripe boundary.
+	 *
+	 * The extra calculation against bg->start is to handle block groups
+	 * whose logical bytenr is not BTRFS_STRIPE_LEN aligned.
+	 */
+	stripe->logical = round_down(cur_logical - bg->start, BTRFS_STRIPE_LEN) +
+			  bg->start;
+	stripe_end = stripe->logical + BTRFS_STRIPE_LEN - 1;
+
+	/* Fill the first extent info into stripe->sectors[] array. */
+	fill_one_extent_info(fs_info, stripe, extent_start, extent_len,
+			     extent_flags, extent_gen);
+	cur_logical = extent_start + extent_len;
+
+	/* Fill the extent info for the remaining sectors. */
+	while (cur_logical <= stripe_end) {
+		ret = find_first_extent_item(extent_root, &path, cur_logical,
+					     stripe_end - cur_logical + 1);
+		if (ret < 0)
+			goto out;
+		if (ret > 0) {
+			ret = 0;
+			break;
+		}
+		get_extent_info(&path, &extent_start, &extent_len,
+				&extent_flags, &extent_gen);
+		fill_one_extent_info(fs_info, stripe, extent_start, extent_len,
+				     extent_flags, extent_gen);
+		cur_logical = extent_start + extent_len;
+	}
+
+	/* Now fill the data csum. */
+	if (bg->flags & BTRFS_BLOCK_GROUP_DATA) {
+		int sector_nr;
+		unsigned long csum_bitmap = 0;
+
+		/* Csum space should have already been allocated. */
+		ASSERT(stripe->csums);
+
+		/*
+		 * Our csum bitmap should be large enough, as BTRFS_STRIPE_LEN
+		 * should contain at most 16 sectors.
+		 */
+		ASSERT(BITS_PER_LONG >=
+		       BTRFS_STRIPE_LEN >> fs_info->sectorsize_bits);
+
+		ret = btrfs_lookup_csums_bitmap(csum_root, stripe->logical,
+						stripe_end, stripe->csums,
+						&csum_bitmap, true);
+		if (ret < 0)
+			goto out;
+		if (ret > 0)
+			ret = 0;
+
+		for_each_set_bit(sector_nr, &csum_bitmap, stripe->nr_sectors) {
+			stripe->sectors[sector_nr].csum = stripe->csums +
+				sector_nr * fs_info->csum_size;
+		}
+	}
+out:
+	btrfs_release_path(&path);
+	return ret;
+}
+
+
 /*
  * Scrub one range which can only has simple mirror based profile.
  * (Including all range in SINGLE/DUP/RAID1/RAID1C*, and each stripe in
diff --git a/fs/btrfs/scrub.h b/fs/btrfs/scrub.h
index a035375083f0..ce402ad84c17 100644
--- a/fs/btrfs/scrub.h
+++ b/fs/btrfs/scrub.h
@@ -17,8 +17,14 @@ int btrfs_scrub_progress(struct btrfs_fs_info *fs_info, u64 devid,
  * The following functions are temporary exports to avoid warning on unused
  * static functions.
  */
+struct scrub_stripe;
 struct scrub_stripe *alloc_scrub_stripe(struct btrfs_fs_info *fs_info,
 					struct btrfs_block_group *bg);
 void wait_scrub_stripe(struct scrub_stripe *stripe);
+int scrub_find_fill_first_stripe(struct btrfs_root *extent_root,
+				 struct btrfs_root *csum_root,
+				 struct btrfs_block_group *bg,
+				 u64 logical_start, u64 logical_len,
+				 struct scrub_stripe *stripe);
 
 #endif
-- 
2.39.0



* [PATCH 06/11] btrfs: scrub: introduce a helper to verify one metadata
  2023-01-16  7:04 [PATCH 00/11] btrfs: scrub: use a more reader friendly code to implement scrub_simple_mirror() Qu Wenruo
                   ` (4 preceding siblings ...)
  2023-01-16  7:04 ` [PATCH 05/11] btrfs: scrub: introduce a helper to find and fill the sector info for a scrub_stripe Qu Wenruo
@ 2023-01-16  7:04 ` Qu Wenruo
  2023-01-16  7:04 ` [PATCH 07/11] btrfs: scrub: introduce a helper to verify one scrub_stripe Qu Wenruo
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Qu Wenruo @ 2023-01-16  7:04 UTC (permalink / raw)
  To: linux-btrfs

The new helper, scrub_verify_one_metadata(), is almost the same as
scrub_checksum_tree_block().

The difference is in how we grab the pages.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c | 103 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/scrub.h |   1 +
 2 files changed, 104 insertions(+)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index ca58a2e61b4c..ced98216f69e 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2166,6 +2166,109 @@ static int scrub_checksum_data(struct scrub_block *sblock)
 	return sblock->checksum_error;
 }
 
+static struct page *scrub_stripe_get_page(struct scrub_stripe *stripe,
+					  int sector_nr)
+{
+	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
+	int page_index = sector_nr << fs_info->sectorsize_bits >> PAGE_SHIFT;
+
+	return stripe->pages[page_index];
+}
+
+static unsigned int scrub_stripe_get_page_offset(struct scrub_stripe *stripe,
+						 int sector_nr)
+{
+	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
+
+	return offset_in_page(sector_nr << fs_info->sectorsize_bits);
+}
+
+void scrub_verify_one_metadata(struct scrub_stripe *stripe, int sector_nr)
+{
+	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
+	const unsigned int sectors_per_tree = fs_info->nodesize >>
+					      fs_info->sectorsize_bits;
+	const u64 logical = stripe->logical + (sector_nr << fs_info->sectorsize_bits);
+	const struct page *first_page = scrub_stripe_get_page(stripe, sector_nr);
+	const unsigned int first_off = scrub_stripe_get_page_offset(stripe, sector_nr);
+	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
+	u8 on_disk_csum[BTRFS_CSUM_SIZE];
+	u8 calculated_csum[BTRFS_CSUM_SIZE];
+	struct btrfs_header *h;
+	int i;
+
+	/*
+	 * Here we don't have a good way to attach the pages (and subpages)
+	 * to a dummy extent buffer, thus we have to directly grab the members
+	 * from pages.
+	 */
+	h = (struct btrfs_header *)(page_address(first_page) + first_off);
+	memcpy(on_disk_csum, h->csum, fs_info->csum_size);
+
+	if (logical != btrfs_stack_header_bytenr(h)) {
+		bitmap_set(&stripe->meta_error_bitmap, sector_nr,
+			   sectors_per_tree);
+		btrfs_warn_rl(fs_info,
+		"tree block %llu mirror %u has bad bytenr, has %llu want %llu",
+			      logical, stripe->mirror_num,
+			      btrfs_stack_header_bytenr(h), logical);
+		return;
+	}
+	if (memcmp(h->fsid, fs_info->fs_devices->fsid, BTRFS_FSID_SIZE)) {
+		bitmap_set(&stripe->meta_error_bitmap, sector_nr,
+			   sectors_per_tree);
+		btrfs_warn_rl(fs_info,
+		"tree block %llu mirror %u has bad fsid, has %pU want %pU",
+			      logical, stripe->mirror_num,
+			      h->fsid, fs_info->fs_devices->fsid);
+		return;
+	}
+	if (memcmp(h->chunk_tree_uuid, fs_info->chunk_tree_uuid,
+		   BTRFS_UUID_SIZE)) {
+		bitmap_set(&stripe->meta_error_bitmap, sector_nr,
+			   sectors_per_tree);
+		btrfs_warn_rl(fs_info,
+		"tree block %llu mirror %u has bad chunk tree uuid, has %pU want %pU",
+			      logical, stripe->mirror_num,
+			      h->chunk_tree_uuid, fs_info->chunk_tree_uuid);
+		return;
+	}
+
+	/* Now check tree block csum. */
+	shash->tfm = fs_info->csum_shash;
+	crypto_shash_init(shash);
+	crypto_shash_update(shash, page_address(first_page) + first_off +
+			    BTRFS_CSUM_SIZE, fs_info->sectorsize - BTRFS_CSUM_SIZE);
+
+	for (i = sector_nr + 1; i < sector_nr + sectors_per_tree; i++) {
+		struct page *page = scrub_stripe_get_page(stripe, i);
+		unsigned int page_off = scrub_stripe_get_page_offset(stripe, i);
+
+		crypto_shash_update(shash, page_address(page) + page_off,
+				    fs_info->sectorsize);
+	}
+	crypto_shash_final(shash, calculated_csum);
+	if (memcmp(calculated_csum, on_disk_csum, fs_info->csum_size)) {
+		bitmap_set(&stripe->meta_error_bitmap, sector_nr,
+			   sectors_per_tree);
+		btrfs_warn_rl(fs_info,
+		"tree block %llu mirror %u has bad csum, has " CSUM_FMT " want " CSUM_FMT,
+			      logical, stripe->mirror_num,
+			      CSUM_FMT_VALUE(fs_info->csum_size, on_disk_csum),
+			      CSUM_FMT_VALUE(fs_info->csum_size, calculated_csum));
+		return;
+	}
+	if (stripe->sectors[sector_nr].generation != btrfs_stack_header_generation(h)) {
+		bitmap_set(&stripe->meta_error_bitmap, sector_nr,
+			   sectors_per_tree);
+		btrfs_warn_rl(fs_info,
+		"tree block %llu mirror %u has bad generation, has %llu want %llu",
+			      logical, stripe->mirror_num,
+			      btrfs_stack_header_generation(h),
+			      stripe->sectors[sector_nr].generation);
+	}
+}
+
 static int scrub_checksum_tree_block(struct scrub_block *sblock)
 {
 	struct scrub_ctx *sctx = sblock->sctx;
diff --git a/fs/btrfs/scrub.h b/fs/btrfs/scrub.h
index ce402ad84c17..d9f869fd4d7c 100644
--- a/fs/btrfs/scrub.h
+++ b/fs/btrfs/scrub.h
@@ -26,5 +26,6 @@ int scrub_find_fill_first_stripe(struct btrfs_root *extent_root,
 				 struct btrfs_block_group *bg,
 				 u64 logical_start, u64 logical_len,
 				 struct scrub_stripe *stripe);
+void scrub_verify_one_metadata(struct scrub_stripe *stripe, int sector_nr);
 
 #endif
-- 
2.39.0



* [PATCH 07/11] btrfs: scrub: introduce a helper to verify one scrub_stripe
  2023-01-16  7:04 [PATCH 00/11] btrfs: scrub: use a more reader friendly code to implement scrub_simple_mirror() Qu Wenruo
                   ` (5 preceding siblings ...)
  2023-01-16  7:04 ` [PATCH 06/11] btrfs: scrub: introduce a helper to verify one metadata Qu Wenruo
@ 2023-01-16  7:04 ` Qu Wenruo
  2023-01-16  7:04 ` [PATCH 08/11] btrfs: scrub: introduce the repair functionality for scrub_stripe Qu Wenruo
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Qu Wenruo @ 2023-01-16  7:04 UTC (permalink / raw)
  To: linux-btrfs

The new helper, scrub_verify_one_stripe(), is not much different from
the old scrub code.

The difference is mostly in how we grab pages and their offsets.

Currently the helper will only verify and update the error bitmaps; we
don't yet output any error message for data verification, as that can
only be done after we have either repaired the stripe or exhausted all
the mirrors.
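
For example, with a 4K sector size (16 sectors per 64K stripe), if
sector 3 hit an IO error and sector 5 a csum error, the bitmaps combine
as follows (an illustrative sketch):

	io_error_bitmap   = 0b0000000000001000
	csum_error_bitmap = 0b0000000000100000
	init_error_bitmap = 0b0000000000101000	(OR of all error bitmaps)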

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/scrub.h |  2 +-
 2 files changed, 81 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index ced98216f69e..656b5395adf5 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2183,7 +2183,7 @@ static unsigned int scrub_stripe_get_page_offset(struct scrub_stripe *stripe,
 	return offset_in_page(sector_nr << fs_info->sectorsize_bits);
 }
 
-void scrub_verify_one_metadata(struct scrub_stripe *stripe, int sector_nr)
+static void scrub_verify_one_metadata(struct scrub_stripe *stripe, int sector_nr)
 {
 	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
 	const unsigned int sectors_per_tree = fs_info->nodesize >>
@@ -2269,6 +2269,85 @@ void scrub_verify_one_metadata(struct scrub_stripe *stripe, int sector_nr)
 	}
 }
 
+static void scrub_verify_one_sector(struct scrub_stripe *stripe,
+				    int sector_nr)
+{
+	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
+	struct scrub_sector_verification *sector = &stripe->sectors[sector_nr];
+	const unsigned int sectors_per_tree = fs_info->nodesize >>
+					      fs_info->sectorsize_bits;
+	struct page *page = scrub_stripe_get_page(stripe, sector_nr);
+	unsigned int pgoff = scrub_stripe_get_page_offset(stripe, sector_nr);
+
+	ASSERT(sector_nr >= 0 && sector_nr < stripe->nr_sectors);
+
+	/* Sector not utilized, skip it. */
+	if (!test_bit(sector_nr, &stripe->extent_sector_bitmap))
+		return;
+
+	/* IO error, no need to check. */
+	if (test_bit(sector_nr, &stripe->io_error_bitmap))
+		return;
+
+	/* Metadata, verify the full tree block. */
+	if (sector->is_metadata) {
+		/*
+		 * Check if the tree block crosses the stripe boundary.
+		 * If it crosses the boundary, we can not verify it but
+		 * only give a warning.
+		 *
+		 * This can only happen in very old fs where chunks are not
+		 * ensured to be stripe aligned.
+		 */
+		if (unlikely(sector_nr + sectors_per_tree > stripe->nr_sectors)) {
+			btrfs_warn_rl(fs_info,
+			"tree block at %llu crosses stripe boundary %llu",
+				      stripe->logical +
+				      (sector_nr << fs_info->sectorsize_bits),
+				      stripe->logical);
+			return;
+		}
+		scrub_verify_one_metadata(stripe, sector_nr);
+		return;
+	}
+
+	/* Data is much easier, we just verify the data csum (if we have). */
+	if (sector->csum) {
+		int ret;
+		u8 csum_buf[BTRFS_CSUM_SIZE];
+
+		ret = btrfs_check_sector_csum(fs_info, page, pgoff, csum_buf,
+					      sector->csum);
+		if (ret < 0)
+			set_bit(sector_nr, &stripe->csum_error_bitmap);
+	}
+}
+
+void scrub_verify_one_stripe(struct scrub_stripe *stripe)
+{
+	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
+	const unsigned int sectors_per_tree = fs_info->nodesize >>
+					      fs_info->sectorsize_bits;
+	int sector_nr;
+
+	for_each_set_bit(sector_nr, &stripe->extent_sector_bitmap,
+			 stripe->nr_sectors) {
+		scrub_verify_one_sector(stripe, sector_nr);
+		if (stripe->sectors[sector_nr].is_metadata)
+			sector_nr += sectors_per_tree - 1;
+	}
+	/*
+	 * All error bitmaps are updated. OR all errors into the
+	 * init_error_bitmap for later repair runs.
+	 */
+	bitmap_or(&stripe->init_error_bitmap, &stripe->io_error_bitmap,
+		  &stripe->csum_error_bitmap, stripe->nr_sectors);
+	bitmap_or(&stripe->init_error_bitmap, &stripe->init_error_bitmap,
+		  &stripe->meta_error_bitmap, stripe->nr_sectors);
+	bitmap_copy(&stripe->current_error_bitmap, &stripe->init_error_bitmap,
+		    stripe->nr_sectors);
+}
+
 static int scrub_checksum_tree_block(struct scrub_block *sblock)
 {
 	struct scrub_ctx *sctx = sblock->sctx;
diff --git a/fs/btrfs/scrub.h b/fs/btrfs/scrub.h
index d9f869fd4d7c..73fa6744a6a7 100644
--- a/fs/btrfs/scrub.h
+++ b/fs/btrfs/scrub.h
@@ -26,6 +26,6 @@ int scrub_find_fill_first_stripe(struct btrfs_root *extent_root,
 				 struct btrfs_block_group *bg,
 				 u64 logical_start, u64 logical_len,
 				 struct scrub_stripe *stripe);
-void scrub_verify_one_metadata(struct scrub_stripe *stripe, int sector_nr);
+void scrub_verify_one_stripe(struct scrub_stripe *stripe);
 
 #endif
-- 
2.39.0



* [PATCH 08/11] btrfs: scrub: introduce the repair functionality for scrub_stripe
  2023-01-16  7:04 [PATCH 00/11] btrfs: scrub: use a more reader friendly code to implement scrub_simple_mirror() Qu Wenruo
                   ` (6 preceding siblings ...)
  2023-01-16  7:04 ` [PATCH 07/11] btrfs: scrub: introduce a helper to verify one scrub_stripe Qu Wenruo
@ 2023-01-16  7:04 ` Qu Wenruo
  2023-01-16  7:04 ` [PATCH 09/11] btrfs: scrub: introduce a writeback helper " Qu Wenruo
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Qu Wenruo @ 2023-01-16  7:04 UTC (permalink / raw)
  To: linux-btrfs

The new helper, scrub_repair_one_stripe(), tries to repair a
scrub_stripe.

The repair is done by:

- Submit reads for all remaining mirrors in one go

- Verification is done in the endio function of each mirror's read

- Wait for all the above reads to finish

- Copy repaired sectors back to the original stripe,
  and clear the corresponding current_error_bitmap bits of the original
  stripe.

- Check if we repaired all the sectors

This is a little different from the original repair behavior:

- We don't re-read the current mirror

  The old behavior is to submit reads concurrently for all mirrors,
  including the current bad mirror.

  I didn't see any point in such a retry, thus we only read from the
  remaining mirrors, as in the example below.
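
For example, with nr_copies == 3 and the initial bad mirror being 2,
the repair path reads mirror 3 first, then mirror 1, following the
get_next_mirror() helper added in this patch:

	get_next_mirror(2, 3) == 3
	get_next_mirror(3, 3) == 1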

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c   | 208 ++++++++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/scrub.h   |   5 +-
 fs/btrfs/volumes.c |  10 ++-
 fs/btrfs/volumes.h |   2 +
 4 files changed, 216 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 656b5395adf5..92976310f899 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -24,6 +24,7 @@
 #include "accessors.h"
 #include "file-item.h"
 #include "scrub.h"
+#include "bio.h"
 
 /*
  * This is only the first step towards a full-features scrub. It reads all
@@ -367,8 +368,8 @@ static void free_scrub_stripe(struct scrub_stripe *stripe)
 	kfree(stripe);
 }
 
-struct scrub_stripe *alloc_scrub_stripe(struct btrfs_fs_info *fs_info,
-					struct btrfs_block_group *bg)
+static struct scrub_stripe *alloc_scrub_stripe(struct btrfs_fs_info *fs_info,
+					       struct btrfs_block_group *bg)
 {
 	struct scrub_stripe *stripe;
 	int ret;
@@ -408,11 +409,49 @@ struct scrub_stripe *alloc_scrub_stripe(struct btrfs_fs_info *fs_info,
 	return NULL;
 }
 
-void wait_scrub_stripe(struct scrub_stripe *stripe)
+static void wait_scrub_stripe(struct scrub_stripe *stripe)
 {
 	wait_event(stripe->io_wait, atomic_read(&stripe->pending_io) == 0);
 }
 
+/*
+ * Clones everything except the pages from @src.
+ *
+ * Used for repair path to read extra mirrors without re-doing the csum/extent
+ * tree search.
+ */
+static struct scrub_stripe *clone_scrub_stripe(struct btrfs_fs_info *fs_info,
+					       const struct scrub_stripe *src)
+{
+	struct scrub_stripe *dst;
+	int sector_nr;
+
+	dst = alloc_scrub_stripe(fs_info, src->bg);
+	if (!dst)
+		return NULL;
+	dst->logical = src->logical;
+	if (src->csums)
+		memcpy(dst->csums, src->csums,
+		       src->nr_sectors * fs_info->csum_size);
+	bitmap_copy(&dst->extent_sector_bitmap, &src->extent_sector_bitmap,
+		    src->nr_sectors);
+	for_each_set_bit(sector_nr, &src->extent_sector_bitmap, src->nr_sectors) {
+		dst->sectors[sector_nr].is_metadata =
+			src->sectors[sector_nr].is_metadata;
+		/* For meta, copy the generation number. */
+		if (src->sectors[sector_nr].is_metadata) {
+			dst->sectors[sector_nr].generation =
+				src->sectors[sector_nr].generation;
+			continue;
+		}
+		/* For data, only update csum pointer if there is data csum. */
+		if (src->sectors[sector_nr].csum)
+			dst->sectors[sector_nr].csum = dst->csums +
+				sector_nr * fs_info->csum_size;
+	}
+	return dst;
+}
+
 static struct scrub_block *alloc_scrub_block(struct scrub_ctx *sctx,
 					     struct btrfs_device *dev,
 					     u64 logical, u64 physical,
@@ -2323,7 +2362,7 @@ static void scrub_verify_one_sector(struct scrub_stripe *stripe,
 	}
 }
 
-void scrub_verify_one_stripe(struct scrub_stripe *stripe)
+static void scrub_verify_one_stripe(struct scrub_stripe *stripe)
 {
 	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
 	const unsigned int sectors_per_tree = fs_info->nodesize >>
@@ -2348,6 +2387,167 @@ void scrub_verify_one_stripe(struct scrub_stripe *stripe)
 		    stripe->nr_sectors);
 }
 
+static void scrub_read_endio(struct btrfs_bio *bbio)
+{
+	struct scrub_stripe *stripe = bbio->private;
+
+	if (bbio->bio.bi_status) {
+		bitmap_set(&stripe->io_error_bitmap, 0, stripe->nr_sectors);
+		bitmap_set(&stripe->init_error_bitmap, 0, stripe->nr_sectors);
+	}
+	bio_put(&bbio->bio);
+	scrub_verify_one_stripe(stripe);
+	if (atomic_dec_and_test(&stripe->pending_io))
+		wake_up(&stripe->io_wait);
+}
+
+static void scrub_submit_read_one_stripe(struct scrub_stripe *stripe)
+{
+	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
+	struct bio *bio;
+	int ret;
+	int i;
+
+	ASSERT(stripe->mirror_num >= 1);
+
+	ASSERT(atomic_read(&stripe->pending_io) == 0);
+	bio = btrfs_bio_alloc(BTRFS_STRIPE_LEN >> PAGE_SHIFT, REQ_OP_READ,
+			      scrub_read_endio, stripe);
+	/* Backed by mempool, should not fail. */
+	ASSERT(bio);
+
+	bio->bi_iter.bi_sector = stripe->logical >> SECTOR_SHIFT;
+
+	for (i = 0; i < BTRFS_STRIPE_LEN >> PAGE_SHIFT; i++) {
+		ret = bio_add_page(bio, stripe->pages[i], PAGE_SIZE, 0);
+		ASSERT(ret == PAGE_SIZE);
+	}
+	atomic_inc(&stripe->pending_io);
+	btrfs_submit_bio(fs_info, bio, stripe->mirror_num);
+}
+
+static int get_next_mirror(int mirror, int nr_copies)
+{
+	int ret = mirror + 1;
+
+	if (ret > nr_copies)
+		return ret - nr_copies;
+	return ret;
+}
+
+static void scrub_repair_from_mirror(struct scrub_stripe *orig,
+				     struct scrub_stripe *repair)
+{
+	struct btrfs_fs_info *fs_info = orig->bg->fs_info;
+	const int nr_sectors = orig->nr_sectors;
+	int sector_nr;
+
+	/* IO should have completed for both stripes. */
+	ASSERT(atomic_read(&orig->pending_io) == 0);
+	ASSERT(atomic_read(&repair->pending_io) == 0);
+
+	for_each_set_bit(sector_nr, &orig->extent_sector_bitmap, nr_sectors) {
+		int page_index = (sector_nr << fs_info->sectorsize_bits) >>
+				 PAGE_SHIFT;
+		int pgoff = offset_in_page(sector_nr << fs_info->sectorsize_bits);
+
+		if (test_bit(sector_nr, &orig->current_error_bitmap) &&
+		    !test_bit(sector_nr, &repair->current_error_bitmap)) {
+
+			/* Copy the repaired content. */
+			memcpy_page(orig->pages[page_index], pgoff,
+				    repair->pages[page_index], pgoff,
+				    fs_info->sectorsize);
+			/*
+			 * And clear the bit in @current_error_bitmap, so
+			 * later we know we need to write this sector back.
+			 */
+			clear_bit(sector_nr, &orig->current_error_bitmap);
+		}
+	}
+}
+
+int scrub_repair_one_stripe(struct scrub_stripe *stripe)
+{
+	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
+	struct scrub_stripe **repair_stripes;
+	int nr_copies;
+	int ret = 0;
+	int i;
+
+	/*
+	 * The stripe should only have been verified once, thus its init and
+	 * current error bitmap should match.
+	 */
+	ASSERT(bitmap_equal(&stripe->current_error_bitmap,
+			    &stripe->init_error_bitmap, stripe->nr_sectors));
+
+	/* The stripe has no error from the beginning. */
+	if (bitmap_empty(&stripe->init_error_bitmap, stripe->nr_sectors))
+		return 0;
+
+	/*
+	 * If we're called for replace, the target device is just garbage,
+	 * no need to use it as an extra mirror.
+	 * If we're called for scrub, replace can not be executing at the same
+	 * time.
+	 *
+	 * Thus by all means, we don't want to use replace target at an extra
+	 * Thus by all means, we don't want to use the replace target as an
+	 * extra copy.
+	nr_copies = btrfs_num_copies_no_extra_mirror(fs_info, stripe->logical,
+						     fs_info->sectorsize);
+	/*
+	 * No extra mirrors to go. Still return 0, as we didn't have errors
+	 * repairing the stripe.
+	 */
+	if (nr_copies == 1)
+		return 0;
+
+	repair_stripes = kcalloc(nr_copies - 1, sizeof(struct scrub_stripe *),
+				 GFP_KERNEL);
+	if (!repair_stripes)
+		return -ENOMEM;
+
+	/* Get all remaining stripes setup. */
+	for (i = 0; i < nr_copies - 1; i++) {
+		int next_mirror = get_next_mirror(i + stripe->mirror_num, nr_copies);
+
+		repair_stripes[i] = clone_scrub_stripe(fs_info, stripe);
+		if (!repair_stripes[i]) {
+			ret = -ENOMEM;
+			goto out;
+		}
+		repair_stripes[i]->mirror_num = next_mirror;
+	}
+
+	/* Submit read for all remaining mirrors. */
+	for (i = 0; i < nr_copies - 1; i++)
+		scrub_submit_read_one_stripe(repair_stripes[i]);
+
+	/* Wait for all reads to finish. */
+	for (i = 0; i < nr_copies - 1; i++)
+		wait_scrub_stripe(repair_stripes[i]);
+
+	/* Repair from the remaining mirrors. */
+	for (i = 0; i < nr_copies - 1; i++) {
+		scrub_repair_from_mirror(stripe, repair_stripes[i]);
+
+		/* Already repaired all bad sectors. */
+		if (bitmap_empty(&stripe->current_error_bitmap,
+				 stripe->nr_sectors))
+			break;
+	}
+
+out:
+	for (i = 0; i < nr_copies - 1; i++) {
+		if (repair_stripes[i])
+			free_scrub_stripe(repair_stripes[i]);
+	}
+	kfree(repair_stripes);
+	return ret;
+}
+
 static int scrub_checksum_tree_block(struct scrub_block *sblock)
 {
 	struct scrub_ctx *sctx = sblock->sctx;
diff --git a/fs/btrfs/scrub.h b/fs/btrfs/scrub.h
index 73fa6744a6a7..fc732187db16 100644
--- a/fs/btrfs/scrub.h
+++ b/fs/btrfs/scrub.h
@@ -18,14 +18,11 @@ int btrfs_scrub_progress(struct btrfs_fs_info *fs_info, u64 devid,
  * static functions.
  */
 struct scrub_stripe;
-struct scrub_stripe *alloc_scrub_stripe(struct btrfs_fs_info *fs_info,
-					struct btrfs_block_group *bg);
-void wait_scrub_stripe(struct scrub_stripe *stripe);
 int scrub_find_fill_first_stripe(struct btrfs_root *extent_root,
 				 struct btrfs_root *csum_root,
 				 struct btrfs_block_group *bg,
 				 u64 logical_start, u64 logical_len,
 				 struct scrub_stripe *stripe);
-void scrub_verify_one_stripe(struct scrub_stripe *stripe);
+int scrub_repair_one_stripe(struct scrub_stripe *stripe);
 
 #endif
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index bcfef75b97da..7bf572632e55 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5715,7 +5715,9 @@ void btrfs_mapping_tree_free(struct extent_map_tree *tree)
 	}
 }
 
-int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
+/* For call sites which don't want using replace target as an extra mirror. */
+int btrfs_num_copies_no_extra_mirror(struct btrfs_fs_info *fs_info, u64 logical,
+				     u64 len)
 {
 	struct extent_map *em;
 	struct map_lookup *map;
@@ -5750,6 +5752,12 @@ int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
 		 */
 		ret = map->num_stripes;
 	free_extent_map(em);
+	return ret;
+}
+
+int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
+{
+	int ret = btrfs_num_copies_no_extra_mirror(fs_info, logical, len);
 
 	down_read(&fs_info->dev_replace.rwsem);
 	if (btrfs_dev_replace_is_ongoing(&fs_info->dev_replace) &&
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 6b7a05f6cf82..840beaee56f1 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -578,6 +578,8 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info,
 		    struct block_device **bdev, fmode_t *mode);
 void __exit btrfs_cleanup_fs_uuids(void);
 int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 logical, u64 len);
+int btrfs_num_copies_no_extra_mirror(struct btrfs_fs_info *fs_info, u64 logical,
+				     u64 len);
 int btrfs_grow_device(struct btrfs_trans_handle *trans,
 		      struct btrfs_device *device, u64 new_size);
 struct btrfs_device *btrfs_find_device(const struct btrfs_fs_devices *fs_devices,
-- 
2.39.0



* [PATCH 09/11] btrfs: scrub: introduce a writeback helper for scrub_stripe
  2023-01-16  7:04 [PATCH 00/11] btrfs: scrub: use a more reader friendly code to implement scrub_simple_mirror() Qu Wenruo
                   ` (7 preceding siblings ...)
  2023-01-16  7:04 ` [PATCH 08/11] btrfs: scrub: introduce the repair functionality for scrub_stripe Qu Wenruo
@ 2023-01-16  7:04 ` Qu Wenruo
  2023-01-16  7:04 ` [PATCH 10/11] btrfs: scrub: introduce error reporting functionality " Qu Wenruo
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Qu Wenruo @ 2023-01-16  7:04 UTC (permalink / raw)
  To: linux-btrfs

Add a new helper, scrub_write_wait_sectors(), to write specified
sectors back to the target disk.

There are several differences compared to the read path:

- Do not utilize btrfs_submit_bio()
  The problem with btrfs_submit_bio() is that, for the dev-replace
  case, it will write the data to both the original and the target
  devices.

  This can cause problems for zoned devices.
  Thus here we manually write back to the specified device instead.

- We can not write the full stripe back
  We can only write the sectors we have.
  There will be two call sites later: one for repaired sectors, and one
  for all utilized sectors of dev-replace.

  Thus the callers should specify their own write_bitmap, e.g. as in
  the repair-path sketch below.
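
A rough sketch of a repair-path caller (illustrative only; the real
call site arrives in the final patch of the series):

	unsigned long write_bitmap;

	/* Repaired sectors = initial errors minus the remaining errors. */
	bitmap_andnot(&write_bitmap, &stripe->init_error_bitmap,
		      &stripe->current_error_bitmap, stripe->nr_sectors);
	scrub_write_wait_sectors(sctx, stripe, stripe->dev,
				 stripe->physical, &write_bitmap);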

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c | 118 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/scrub.h |   4 ++
 2 files changed, 122 insertions(+)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 92976310f899..638dece0b863 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -149,6 +149,15 @@ struct scrub_stripe {
 	 */
 	unsigned long meta_error_bitmap;
 
+	/* This is only for write back cases (repair or replace). */
+	unsigned long write_error_bitmap;
+
+	/*
+	 * Spinlock for write_error_bitmap, as that's the only case we can have
+	 * multiple bios for one stripe.
+	 */
+	spinlock_t write_error_bitmap_lock;
+
 	/*
 	 * Checksum for the whole stripe if this stripe is inside a data block
 	 * group.
@@ -383,6 +392,7 @@ static struct scrub_stripe *alloc_scrub_stripe(struct btrfs_fs_info *fs_info,
 
 	init_waitqueue_head(&stripe->io_wait);
 	atomic_set(&stripe->pending_io, 0);
+	spin_lock_init(&stripe->write_error_bitmap_lock);
 
 
 	ret = btrfs_alloc_page_array(BTRFS_STRIPE_LEN >> PAGE_SHIFT,
@@ -2467,6 +2477,114 @@ static void scrub_repair_from_mirror(struct scrub_stripe *orig,
 	}
 }
 
+static void scrub_write_endio(struct bio *bio)
+{
+	struct scrub_stripe *stripe = bio->bi_private;
+	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
+	struct bio_vec *first_bvec = bio_first_bvec_all(bio);
+	struct bio_vec *bvec;
+	unsigned long flags;
+	int bio_size = 0;
+	int first_sector_nr;
+	int i;
+
+	bio_for_each_bvec_all(bvec, bio, i)
+		bio_size += bvec->bv_len;
+
+	for (i = 0; i < BTRFS_STRIPE_LEN >> PAGE_SHIFT; i++) {
+		if (stripe->pages[i] == first_bvec->bv_page)
+			break;
+	}
+	/*
+	 * Since our pages should all be from stripe->pages[], we should find
+	 * the page.
+	 */
+	ASSERT(i < BTRFS_STRIPE_LEN >> PAGE_SHIFT);
+	first_sector_nr = ((i << PAGE_SHIFT) + first_bvec->bv_offset) >>
+			  fs_info->sectorsize_bits;
+
+	if (bio->bi_status) {
+		spin_lock_irqsave(&stripe->write_error_bitmap_lock, flags);
+		bitmap_set(&stripe->write_error_bitmap, first_sector_nr,
+			   bio_size >> fs_info->sectorsize_bits);
+		spin_unlock_irqrestore(&stripe->write_error_bitmap_lock, flags);
+	}
+	bio_put(bio);
+
+	if (atomic_dec_and_test(&stripe->pending_io))
+		wake_up(&stripe->io_wait);
+}
+
+/*
+ * Write back the sectors specified by @write_bitmap to @physical of @device.
+ *
+ * Here we cannot utilize btrfs_submit_bio(), as in the dev-replace case it
+ * will write to both the original and the replace target devices.
+ * Such duplicated writes can cause problems for zoned devices.
+ *
+ * Thus here we submit bio directly to @device.
+ */
+void scrub_write_wait_sectors(struct scrub_ctx *sctx,
+			      struct scrub_stripe *stripe,
+			      struct btrfs_device *dev, u64 physical,
+			      unsigned long *write_bitmap)
+{
+	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
+	struct bio *bio = NULL;
+	bool zoned = btrfs_is_zoned(fs_info);
+	int sector_nr;
+
+	/* Missing devices, skip it. */
+	if (!dev->bdev)
+		return;
+
+	ASSERT(atomic_read(&stripe->pending_io) == 0);
+
+	/* Go through each sector in @write_bitmap. */
+	for_each_set_bit(sector_nr, write_bitmap, stripe->nr_sectors) {
+		struct page *page = scrub_stripe_get_page(stripe, sector_nr);
+		unsigned int pgoff = scrub_stripe_get_page_offset(stripe,
+								  sector_nr);
+		int ret;
+
+		/* We should only write back sectors covered by an extent. */
+		ASSERT(test_bit(sector_nr, &stripe->extent_sector_bitmap));
+
+		/* No bio allocated or we cannot merge with the previous sector. */
+		if (!bio || sector_nr == 0 ||
+		    !test_bit(sector_nr - 1, write_bitmap)) {
+			if (bio) {
+				fill_writer_pointer_gap(sctx, physical +
+					(sector_nr << fs_info->sectorsize_bits));
+				atomic_inc(&stripe->pending_io);
+				submit_bio(bio);
+				if (zoned)
+					wait_scrub_stripe(stripe);
+			}
+			bio = bio_alloc(dev->bdev, BTRFS_STRIPE_LEN >> PAGE_SHIFT,
+					REQ_OP_WRITE, GFP_KERNEL);
+			ASSERT(bio);
+
+			bio->bi_private = stripe;
+			bio->bi_end_io = scrub_write_endio;
+			bio->bi_iter.bi_sector = (physical +
+				(sector_nr << fs_info->sectorsize_bits)) >>
+				SECTOR_SHIFT;
+		}
+		ret = bio_add_page(bio, page, fs_info->sectorsize, pgoff);
+		ASSERT(ret == fs_info->sectorsize);
+	}
+	if (bio) {
+		fill_writer_pointer_gap(sctx, bio->bi_iter.bi_sector << SECTOR_SHIFT);
+		atomic_inc(&stripe->pending_io);
+		submit_bio(bio);
+		if (zoned)
+			wait_scrub_stripe(stripe);
+	}
+
+	wait_scrub_stripe(stripe);
+}
+
 int scrub_repair_one_stripe(struct scrub_stripe *stripe)
 {
 	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
diff --git a/fs/btrfs/scrub.h b/fs/btrfs/scrub.h
index fc732187db16..03f84acd31ce 100644
--- a/fs/btrfs/scrub.h
+++ b/fs/btrfs/scrub.h
@@ -24,5 +24,9 @@ int scrub_find_fill_first_stripe(struct btrfs_root *extent_root,
 				 u64 logical_start, u64 logical_len,
 				 struct scrub_stripe *stripe);
 int scrub_repair_one_stripe(struct scrub_stripe *stripe);
+void scrub_write_wait_sectors(struct scrub_ctx *sctx,
+			      struct scrub_stripe *stripe,
+			      struct btrfs_device *dev, u64 physical,
+			      unsigned long *write_bitmap);
 
 #endif
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 10/11] btrfs: scrub: introduce error reporting functionality for scrub_stripe
  2023-01-16  7:04 [PATCH 00/11] btrfs: scrub: use a more reader friendly code to implement scrub_simple_mirror() Qu Wenruo
                   ` (8 preceding siblings ...)
  2023-01-16  7:04 ` [PATCH 09/11] btrfs: scrub: introduce a writeback helper " Qu Wenruo
@ 2023-01-16  7:04 ` Qu Wenruo
  2023-01-16  7:04 ` [PATCH 11/11] btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure Qu Wenruo
  2023-01-18 20:01 ` [PATCH 00/11] btrfs: scrub: use a more reader friendly code to implement scrub_simple_mirror() David Sterba
  11 siblings, 0 replies; 17+ messages in thread
From: Qu Wenruo @ 2023-01-16  7:04 UTC (permalink / raw)
  To: linux-btrfs

The new helper, scrub_report_stripe_errors(), will report the result of the
scrub to dmesg.

The main reporting is done by introducing a new helper,
scrub_print_common_warning(), which keeps mostly the same content as
scrub_print_warning(), but without the need for a scrub_block.

Since we're reporting the errors, it's also the perfect time to update
the scrub stats.
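
For illustration, with the message formats introduced below, a repaired
and an unrepairable data sector would show up in dmesg roughly like
this (device names and offsets are made up):

  BTRFS error (device sda): fixed up error at logical 131072 on dev /dev/sda physical 1048576
  BTRFS error (device sda): unable to fixup (regular) error at logical 196608 on dev /dev/sda physical 1114112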

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c | 163 ++++++++++++++++++++++++++++++++++++++++++++---
 fs/btrfs/scrub.h |   2 +
 2 files changed, 156 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 638dece0b863..d587ca1fb8e7 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -112,6 +112,13 @@ struct scrub_stripe {
 	/* Should be BTRFS_STRIPE_LEN / sectorsize. */
 	u16 nr_sectors;
 
+	/*
+	 * How many data/meta extents are in this stripe.
+	 * Only for scrub stat reporting purposes.
+	 */
+	u16 nr_data_extents;
+	u16 nr_meta_extents;
+
 	atomic_t pending_io;
 	wait_queue_head_t io_wait;
 
@@ -1094,9 +1101,9 @@ static int scrub_print_warning_inode(u64 inum, u64 offset, u64 num_bytes,
 	return 0;
 }
 
-static void scrub_print_warning(const char *errstr, struct scrub_block *sblock)
+static void scrub_print_common_warning(const char *errstr, struct btrfs_device *dev,
+				       bool is_super, u64 logical, u64 physical)
 {
-	struct btrfs_device *dev;
 	struct btrfs_fs_info *fs_info;
 	struct btrfs_path *path;
 	struct btrfs_key found_key;
@@ -1110,22 +1117,20 @@ static void scrub_print_warning(const char *errstr, struct scrub_block *sblock)
 	u8 ref_level = 0;
 	int ret;
 
-	WARN_ON(sblock->sector_count < 1);
-	dev = sblock->dev;
-	fs_info = sblock->sctx->fs_info;
+	fs_info = dev->fs_info;
 
 	/* Super block error, no need to search extent tree. */
-	if (sblock->sectors[0]->flags & BTRFS_EXTENT_FLAG_SUPER) {
+	if (is_super) {
 		btrfs_warn_in_rcu(fs_info, "%s on device %s, physical %llu",
-			errstr, btrfs_dev_name(dev), sblock->physical);
+			errstr, btrfs_dev_name(dev), physical);
 		return;
 	}
 	path = btrfs_alloc_path();
 	if (!path)
 		return;
 
-	swarn.physical = sblock->physical;
-	swarn.logical = sblock->logical;
+	swarn.physical = physical;
+	swarn.logical = logical;
 	swarn.errstr = errstr;
 	swarn.dev = NULL;
 
@@ -1174,6 +1179,13 @@ static void scrub_print_warning(const char *errstr, struct scrub_block *sblock)
 	btrfs_free_path(path);
 }
 
+static void scrub_print_warning(const char *errstr, struct scrub_block *sblock)
+{
+	scrub_print_common_warning(errstr, sblock->dev,
+			sblock->sectors[0]->flags & BTRFS_EXTENT_FLAG_SUPER,
+			sblock->logical, sblock->physical);
+}
+
 static inline void scrub_get_recover(struct scrub_recover *recover)
 {
 	refcount_inc(&recover->refs);
@@ -2666,6 +2678,131 @@ int scrub_repair_one_stripe(struct scrub_stripe *stripe)
 	return ret;
 }
 
+void scrub_report_stripe_errors(struct scrub_ctx *sctx,
+				struct scrub_stripe *stripe)
+{
+	static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
+				      DEFAULT_RATELIMIT_BURST);
+	struct btrfs_fs_info *fs_info = sctx->fs_info;
+	struct btrfs_device *dev = NULL;
+	u64 physical = 0;
+	int nr_data_sectors = 0;
+	int nr_meta_sectors = 0;
+	int nr_nodatacsum_sectors = 0;
+	int nr_repaired_sectors = 0;
+	int sector_nr;
+
+	/*
+	 * Init the info needed for error reporting.
+	 *
+	 * Although our scrub_stripe infrastructure is mostly based on
+	 * btrfs_submit_bio(), error reporting still needs the dev and physical.
+	 */
+	if (!bitmap_empty(&stripe->init_error_bitmap, stripe->nr_sectors)) {
+		u64 mapped_len = fs_info->sectorsize;
+		struct btrfs_io_context *bioc = NULL;
+		int stripe_index = stripe->mirror_num - 1;
+		int ret;
+
+		/* For scrub, our mirror_num should always start at 1. */
+		ASSERT(stripe->mirror_num >= 1);
+		ret = btrfs_map_sblock(fs_info, BTRFS_MAP_GET_READ_MIRRORS,
+				      stripe->logical, &mapped_len, &bioc);
+		/*
+		 * If we failed, dev will be NULL, and later detailed reports
+		 * will just be skipped.
+		 */
+		if (ret < 0)
+			goto skip;
+		physical = bioc->stripes[stripe_index].physical;
+		dev = bioc->stripes[stripe_index].dev;
+		btrfs_put_bioc(bioc);
+	}
+
+skip:
+	for_each_set_bit(sector_nr, &stripe->extent_sector_bitmap,
+			 stripe->nr_sectors) {
+		bool repaired = false;
+
+		if (stripe->sectors[sector_nr].is_metadata) {
+			nr_meta_sectors++;
+		} else {
+			nr_data_sectors++;
+			if (!stripe->sectors[sector_nr].csum)
+				nr_nodatacsum_sectors++;
+		}
+
+		if (test_bit(sector_nr, &stripe->init_error_bitmap) &&
+		    !test_bit(sector_nr, &stripe->current_error_bitmap)) {
+			nr_repaired_sectors++;
+			repaired = true;
+		}
+
+		/* Good sector from the beginning, nothing needs to be done. */
+		if (!test_bit(sector_nr, &stripe->init_error_bitmap))
+			continue;
+
+		/*
+		 * Report errors for the corrupted sectors. If repaired, just
+		 * output a message saying the sector has been fixed.
+		 */
+		if (repaired) {
+			if (dev)
+				btrfs_err_rl_in_rcu(fs_info,
+			"fixed up error at logical %llu on dev %s physical %llu",
+					    stripe->logical, btrfs_dev_name(dev),
+					    physical);
+			else
+				btrfs_err_rl_in_rcu(fs_info,
+			"fixed up error at logical %llu on mirror %u",
+					    stripe->logical, stripe->mirror_num);
+			continue;
+		}
+
+		/* The remaining are all for unrepaired. */
+		if (dev)
+			btrfs_err_rl_in_rcu(fs_info,
+	"unable to fixup (regular) error at logical %llu on dev %s physical %llu",
+					    stripe->logical, btrfs_dev_name(dev),
+					    physical);
+		else
+			btrfs_err_rl_in_rcu(fs_info,
+	"unable to fixup (regular) error at logical %llu on mirror %u",
+					    stripe->logical, stripe->mirror_num);
+
+		if (test_bit(sector_nr, &stripe->io_error_bitmap))
+			if (__ratelimit(&rs) && dev)
+				scrub_print_common_warning("i/o error", dev, false,
+						     stripe->logical, physical);
+		if (test_bit(sector_nr, &stripe->csum_error_bitmap))
+			if (__ratelimit(&rs) && dev)
+				scrub_print_common_warning("checksum error", dev, false,
+						     stripe->logical, physical);
+		if (test_bit(sector_nr, &stripe->meta_error_bitmap))
+			if (__ratelimit(&rs) && dev)
+				scrub_print_common_warning("header error", dev, false,
+						     stripe->logical, physical);
+	}
+
+	spin_lock(&sctx->stat_lock);
+	sctx->stat.data_extents_scrubbed += stripe->nr_data_extents;
+	sctx->stat.tree_extents_scrubbed += stripe->nr_meta_extents;
+	sctx->stat.data_bytes_scrubbed += nr_data_sectors <<
+					  fs_info->sectorsize_bits;
+	sctx->stat.tree_bytes_scrubbed += nr_meta_sectors <<
+					  fs_info->sectorsize_bits;
+	sctx->stat.read_errors +=
+		bitmap_weight(&stripe->io_error_bitmap, stripe->nr_sectors);
+	sctx->stat.csum_errors +=
+		bitmap_weight(&stripe->csum_error_bitmap, stripe->nr_sectors);
+	sctx->stat.verify_errors +=
+		bitmap_weight(&stripe->meta_error_bitmap, stripe->nr_sectors);
+	sctx->stat.uncorrectable_errors +=
+		bitmap_weight(&stripe->current_error_bitmap, stripe->nr_sectors);
+	sctx->stat.corrected_errors += nr_repaired_sectors;
+	spin_unlock(&sctx->stat_lock);
+}
+
 static int scrub_checksum_tree_block(struct scrub_block *sblock)
 {
 	struct scrub_ctx *sctx = sblock->sctx;
@@ -4114,6 +4251,10 @@ int scrub_find_fill_first_stripe(struct btrfs_root *extent_root,
 		goto out;
 	get_extent_info(&path, &extent_start, &extent_len,
 			&extent_flags, &extent_gen);
+	if (extent_flags & BTRFS_EXTENT_FLAG_TREE_BLOCK)
+		stripe->nr_meta_extents++;
+	if (extent_flags & BTRFS_EXTENT_FLAG_DATA)
+		stripe->nr_data_extents++;
 	cur_logical = max(extent_start, cur_logical);
 
 	/*
@@ -4143,6 +4284,10 @@ int scrub_find_fill_first_stripe(struct btrfs_root *extent_root,
 		}
 		get_extent_info(&path, &extent_start, &extent_len,
 				&extent_flags, &extent_gen);
+		if (extent_flags & BTRFS_EXTENT_FLAG_TREE_BLOCK)
+			stripe->nr_meta_extents++;
+		if (extent_flags & BTRFS_EXTENT_FLAG_DATA)
+			stripe->nr_data_extents++;
 		fill_one_extent_info(fs_info, stripe, extent_start, extent_len,
 				     extent_flags, extent_gen);
 		cur_logical = extent_start + extent_len;
diff --git a/fs/btrfs/scrub.h b/fs/btrfs/scrub.h
index 03f84acd31ce..fc294a250945 100644
--- a/fs/btrfs/scrub.h
+++ b/fs/btrfs/scrub.h
@@ -28,5 +28,7 @@ void scrub_write_wait_sectors(struct scrub_ctx *sctx,
 			      struct scrub_stripe *stripe,
 			      struct btrfs_device *dev, u64 physical,
 			      unsigned long *write_bitmap);
+void scrub_report_stripe_errors(struct scrub_ctx *sctx,
+				struct scrub_stripe *stripe);
 
 #endif
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 11/11] btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure
  2023-01-16  7:04 [PATCH 00/11] btrfs: scrub: use a more reader friendly code to implement scrub_simple_mirror() Qu Wenruo
                   ` (9 preceding siblings ...)
  2023-01-16  7:04 ` [PATCH 10/11] btrfs: scrub: introduce error reporting functionality " Qu Wenruo
@ 2023-01-16  7:04 ` Qu Wenruo
  2023-01-18 20:01 ` [PATCH 00/11] btrfs: scrub: use a more reader friendly code to implement scrub_simple_mirror() David Sterba
  11 siblings, 0 replies; 17+ messages in thread
From: Qu Wenruo @ 2023-01-16  7:04 UTC (permalink / raw)
  To: linux-btrfs

Switch scrub_simple_mirror() to the new scrub_stripe infrastructure.

Since scrub_simple_mirror() is the core part of scrub (only RAID56
P/Q stripes don't utilize it), we can get rid of a big hunk of code,
mostly scrub_extent() and scrub_sectors().

There is one functional change:

- Scrub speed throttling now only affects reads on the scrubbing device.
  Writes (for repair and replace) and reads from other mirrors are no
  longer throttled; see the sketch below.
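
In the new per-stripe loop this looks roughly like the following
(condensed from the scrub_simple_mirror() hunk in the diff below):

	/* Only the read from the device being scrubbed is throttled. */
	scrub_throttle_dev_io(sctx, device, BTRFS_STRIPE_LEN);
	scrub_submit_read_one_stripe(stripe);
	wait_scrub_stripe(stripe);
	/* Repair reads and all writeback run unthrottled. */
	scrub_repair_one_stripe(stripe);
	scrub_write_repaired_sectors(sctx, stripe);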

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c | 523 +++++++++++------------------------------------
 fs/btrfs/scrub.h |  18 --
 2 files changed, 121 insertions(+), 420 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index d587ca1fb8e7..6ff3f0d74683 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -619,18 +619,9 @@ static void scrub_sector_get(struct scrub_sector *sector);
 static void scrub_sector_put(struct scrub_sector *sector);
 static void scrub_parity_get(struct scrub_parity *sparity);
 static void scrub_parity_put(struct scrub_parity *sparity);
-static int scrub_sectors(struct scrub_ctx *sctx, u64 logical, u32 len,
-			 u64 physical, struct btrfs_device *dev, u64 flags,
-			 u64 gen, int mirror_num, u8 *csum,
-			 u64 physical_for_dev_replace);
 static void scrub_bio_end_io(struct bio *bio);
 static void scrub_bio_end_io_worker(struct work_struct *work);
 static void scrub_block_complete(struct scrub_block *sblock);
-static void scrub_find_good_copy(struct btrfs_fs_info *fs_info,
-				 u64 extent_logical, u32 extent_len,
-				 u64 *extent_physical,
-				 struct btrfs_device **extent_dev,
-				 int *extent_mirror_num);
 static int scrub_add_sector_to_wr_bio(struct scrub_ctx *sctx,
 				      struct scrub_sector *sector);
 static void scrub_wr_submit(struct scrub_ctx *sctx);
@@ -2536,10 +2527,10 @@ static void scrub_write_endio(struct bio *bio)
  *
  * Thus here we submit bio directly to @device.
  */
-void scrub_write_wait_sectors(struct scrub_ctx *sctx,
-			      struct scrub_stripe *stripe,
-			      struct btrfs_device *dev, u64 physical,
-			      unsigned long *write_bitmap)
+static void scrub_write_wait_sectors(struct scrub_ctx *sctx,
+				     struct scrub_stripe *stripe,
+				     struct btrfs_device *dev, u64 physical,
+				     unsigned long *write_bitmap)
 {
 	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
 	struct bio *bio = NULL;
@@ -2597,7 +2588,7 @@ void scrub_write_wait_sectors(struct scrub_ctx *sctx,
 	wait_scrub_stripe(stripe);
 }
 
-int scrub_repair_one_stripe(struct scrub_stripe *stripe)
+static int scrub_repair_one_stripe(struct scrub_stripe *stripe)
 {
 	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
 	struct scrub_stripe **repair_stripes;
@@ -2678,8 +2669,8 @@ int scrub_repair_one_stripe(struct scrub_stripe *stripe)
 	return ret;
 }
 
-void scrub_report_stripe_errors(struct scrub_ctx *sctx,
-				struct scrub_stripe *stripe)
+static void scrub_report_stripe_errors(struct scrub_ctx *sctx,
+				       struct scrub_stripe *stripe)
 {
 	static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
 				      DEFAULT_RATELIMIT_BURST);
@@ -2968,22 +2959,16 @@ static void scrub_sector_put(struct scrub_sector *sector)
 		kfree(sector);
 }
 
-/*
- * Throttling of IO submission, bandwidth-limit based, the timeslice is 1
- * second.  Limit can be set via /sys/fs/UUID/devinfo/devid/scrub_speed_max.
- */
-static void scrub_throttle(struct scrub_ctx *sctx)
+static void scrub_throttle_dev_io(struct scrub_ctx *sctx,
+				  struct btrfs_device *device,
+				  unsigned int bio_size)
 {
 	const int time_slice = 1000;
-	struct scrub_bio *sbio;
-	struct btrfs_device *device;
 	s64 delta;
 	ktime_t now;
 	u32 div;
 	u64 bwlimit;
 
-	sbio = sctx->bios[sctx->curr];
-	device = sbio->dev;
 	bwlimit = READ_ONCE(device->scrub_speed_max);
 	if (bwlimit == 0)
 		return;
@@ -3005,7 +2990,7 @@ static void scrub_throttle(struct scrub_ctx *sctx)
 	/* Still in the time to send? */
 	if (ktime_before(now, sctx->throttle_deadline)) {
 		/* If current bio is within the limit, send it */
-		sctx->throttle_sent += sbio->bio->bi_iter.bi_size;
+		sctx->throttle_sent += bio_size;
 		if (sctx->throttle_sent <= div_u64(bwlimit, div))
 			return;
 
@@ -3027,6 +3012,17 @@ static void scrub_throttle(struct scrub_ctx *sctx)
 	sctx->throttle_deadline = 0;
 }
 
+/*
+ * Throttling of IO submission, bandwidth-limit based, the timeslice is 1
+ * second.  Limit can be set via /sys/fs/UUID/devinfo/devid/scrub_speed_max.
+ */
+static void scrub_throttle(struct scrub_ctx *sctx)
+{
+	struct scrub_bio *sbio = sctx->bios[sctx->curr];
+
+	scrub_throttle_dev_io(sctx, sbio->dev, sbio->bio->bi_iter.bi_size);
+}
+
 static void scrub_submit(struct scrub_ctx *sctx)
 {
 	struct scrub_bio *sbio;
@@ -3111,202 +3107,6 @@ static int scrub_add_sector_to_rd_bio(struct scrub_ctx *sctx,
 	return 0;
 }
 
-static void scrub_missing_raid56_end_io(struct bio *bio)
-{
-	struct scrub_block *sblock = bio->bi_private;
-	struct btrfs_fs_info *fs_info = sblock->sctx->fs_info;
-
-	btrfs_bio_counter_dec(fs_info);
-	if (bio->bi_status)
-		sblock->no_io_error_seen = 0;
-
-	bio_put(bio);
-
-	queue_work(fs_info->scrub_workers, &sblock->work);
-}
-
-static void scrub_missing_raid56_worker(struct work_struct *work)
-{
-	struct scrub_block *sblock = container_of(work, struct scrub_block, work);
-	struct scrub_ctx *sctx = sblock->sctx;
-	struct btrfs_fs_info *fs_info = sctx->fs_info;
-	u64 logical;
-	struct btrfs_device *dev;
-
-	logical = sblock->logical;
-	dev = sblock->dev;
-
-	if (sblock->no_io_error_seen)
-		scrub_recheck_block_checksum(sblock);
-
-	if (!sblock->no_io_error_seen) {
-		spin_lock(&sctx->stat_lock);
-		sctx->stat.read_errors++;
-		spin_unlock(&sctx->stat_lock);
-		btrfs_err_rl_in_rcu(fs_info,
-			"IO error rebuilding logical %llu for dev %s",
-			logical, btrfs_dev_name(dev));
-	} else if (sblock->header_error || sblock->checksum_error) {
-		spin_lock(&sctx->stat_lock);
-		sctx->stat.uncorrectable_errors++;
-		spin_unlock(&sctx->stat_lock);
-		btrfs_err_rl_in_rcu(fs_info,
-			"failed to rebuild valid logical %llu for dev %s",
-			logical, btrfs_dev_name(dev));
-	} else {
-		scrub_write_block_to_dev_replace(sblock);
-	}
-
-	if (sctx->is_dev_replace && sctx->flush_all_writes) {
-		mutex_lock(&sctx->wr_lock);
-		scrub_wr_submit(sctx);
-		mutex_unlock(&sctx->wr_lock);
-	}
-
-	scrub_block_put(sblock);
-	scrub_pending_bio_dec(sctx);
-}
-
-static void scrub_missing_raid56_pages(struct scrub_block *sblock)
-{
-	struct scrub_ctx *sctx = sblock->sctx;
-	struct btrfs_fs_info *fs_info = sctx->fs_info;
-	u64 length = sblock->sector_count << fs_info->sectorsize_bits;
-	u64 logical = sblock->logical;
-	struct btrfs_io_context *bioc = NULL;
-	struct bio *bio;
-	struct btrfs_raid_bio *rbio;
-	int ret;
-	int i;
-
-	btrfs_bio_counter_inc_blocked(fs_info);
-	ret = btrfs_map_sblock(fs_info, BTRFS_MAP_GET_READ_MIRRORS, logical,
-			       &length, &bioc);
-	if (ret || !bioc || !bioc->raid_map)
-		goto bioc_out;
-
-	if (WARN_ON(!sctx->is_dev_replace ||
-		    !(bioc->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK))) {
-		/*
-		 * We shouldn't be scrubbing a missing device. Even for dev
-		 * replace, we should only get here for RAID 5/6. We either
-		 * managed to mount something with no mirrors remaining or
-		 * there's a bug in scrub_find_good_copy()/btrfs_map_block().
-		 */
-		goto bioc_out;
-	}
-
-	bio = bio_alloc(NULL, BIO_MAX_VECS, REQ_OP_READ, GFP_NOFS);
-	bio->bi_iter.bi_sector = logical >> 9;
-	bio->bi_private = sblock;
-	bio->bi_end_io = scrub_missing_raid56_end_io;
-
-	rbio = raid56_alloc_missing_rbio(bio, bioc);
-	if (!rbio)
-		goto rbio_out;
-
-	for (i = 0; i < sblock->sector_count; i++) {
-		struct scrub_sector *sector = sblock->sectors[i];
-
-		raid56_add_scrub_pages(rbio, scrub_sector_get_page(sector),
-				       scrub_sector_get_page_offset(sector),
-				       sector->offset + sector->sblock->logical);
-	}
-
-	INIT_WORK(&sblock->work, scrub_missing_raid56_worker);
-	scrub_block_get(sblock);
-	scrub_pending_bio_inc(sctx);
-	raid56_submit_missing_rbio(rbio);
-	btrfs_put_bioc(bioc);
-	return;
-
-rbio_out:
-	bio_put(bio);
-bioc_out:
-	btrfs_bio_counter_dec(fs_info);
-	btrfs_put_bioc(bioc);
-	spin_lock(&sctx->stat_lock);
-	sctx->stat.malloc_errors++;
-	spin_unlock(&sctx->stat_lock);
-}
-
-static int scrub_sectors(struct scrub_ctx *sctx, u64 logical, u32 len,
-		       u64 physical, struct btrfs_device *dev, u64 flags,
-		       u64 gen, int mirror_num, u8 *csum,
-		       u64 physical_for_dev_replace)
-{
-	struct scrub_block *sblock;
-	const u32 sectorsize = sctx->fs_info->sectorsize;
-	int index;
-
-	sblock = alloc_scrub_block(sctx, dev, logical, physical,
-				   physical_for_dev_replace, mirror_num);
-	if (!sblock) {
-		spin_lock(&sctx->stat_lock);
-		sctx->stat.malloc_errors++;
-		spin_unlock(&sctx->stat_lock);
-		return -ENOMEM;
-	}
-
-	for (index = 0; len > 0; index++) {
-		struct scrub_sector *sector;
-		/*
-		 * Here we will allocate one page for one sector to scrub.
-		 * This is fine if PAGE_SIZE == sectorsize, but will cost
-		 * more memory for PAGE_SIZE > sectorsize case.
-		 */
-		u32 l = min(sectorsize, len);
-
-		sector = alloc_scrub_sector(sblock, logical);
-		if (!sector) {
-			spin_lock(&sctx->stat_lock);
-			sctx->stat.malloc_errors++;
-			spin_unlock(&sctx->stat_lock);
-			scrub_block_put(sblock);
-			return -ENOMEM;
-		}
-		sector->flags = flags;
-		sector->generation = gen;
-		if (csum) {
-			sector->have_csum = 1;
-			memcpy(sector->csum, csum, sctx->fs_info->csum_size);
-		} else {
-			sector->have_csum = 0;
-		}
-		len -= l;
-		logical += l;
-		physical += l;
-		physical_for_dev_replace += l;
-	}
-
-	WARN_ON(sblock->sector_count == 0);
-	if (test_bit(BTRFS_DEV_STATE_MISSING, &dev->dev_state)) {
-		/*
-		 * This case should only be hit for RAID 5/6 device replace. See
-		 * the comment in scrub_missing_raid56_pages() for details.
-		 */
-		scrub_missing_raid56_pages(sblock);
-	} else {
-		for (index = 0; index < sblock->sector_count; index++) {
-			struct scrub_sector *sector = sblock->sectors[index];
-			int ret;
-
-			ret = scrub_add_sector_to_rd_bio(sctx, sector);
-			if (ret) {
-				scrub_block_put(sblock);
-				return ret;
-			}
-		}
-
-		if (flags & BTRFS_EXTENT_FLAG_SUPER)
-			scrub_submit(sctx);
-	}
-
-	/* last one frees, either here or in bio completion for last page */
-	scrub_block_put(sblock);
-	return 0;
-}
-
 static void scrub_bio_end_io(struct bio *bio)
 {
 	struct scrub_bio *sbio = bio->bi_private;
@@ -3491,77 +3291,6 @@ static int scrub_find_csum(struct scrub_ctx *sctx, u64 logical, u8 *csum)
 	return 1;
 }
 
-/* scrub extent tries to collect up to 64 kB for each bio */
-static int scrub_extent(struct scrub_ctx *sctx, struct map_lookup *map,
-			u64 logical, u32 len,
-			u64 physical, struct btrfs_device *dev, u64 flags,
-			u64 gen, int mirror_num)
-{
-	struct btrfs_device *src_dev = dev;
-	u64 src_physical = physical;
-	int src_mirror = mirror_num;
-	int ret;
-	u8 csum[BTRFS_CSUM_SIZE];
-	u32 blocksize;
-
-	if (flags & BTRFS_EXTENT_FLAG_DATA) {
-		if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK)
-			blocksize = map->stripe_len;
-		else
-			blocksize = sctx->fs_info->sectorsize;
-		spin_lock(&sctx->stat_lock);
-		sctx->stat.data_extents_scrubbed++;
-		sctx->stat.data_bytes_scrubbed += len;
-		spin_unlock(&sctx->stat_lock);
-	} else if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) {
-		if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK)
-			blocksize = map->stripe_len;
-		else
-			blocksize = sctx->fs_info->nodesize;
-		spin_lock(&sctx->stat_lock);
-		sctx->stat.tree_extents_scrubbed++;
-		sctx->stat.tree_bytes_scrubbed += len;
-		spin_unlock(&sctx->stat_lock);
-	} else {
-		blocksize = sctx->fs_info->sectorsize;
-		WARN_ON(1);
-	}
-
-	/*
-	 * For dev-replace case, we can have @dev being a missing device.
-	 * Regular scrub will avoid its execution on missing device at all,
-	 * as that would trigger tons of read error.
-	 *
-	 * Reading from missing device will cause read error counts to
-	 * increase unnecessarily.
-	 * So here we change the read source to a good mirror.
-	 */
-	if (sctx->is_dev_replace && !dev->bdev)
-		scrub_find_good_copy(sctx->fs_info, logical, len, &src_physical,
-				     &src_dev, &src_mirror);
-	while (len) {
-		u32 l = min(len, blocksize);
-		int have_csum = 0;
-
-		if (flags & BTRFS_EXTENT_FLAG_DATA) {
-			/* push csums to sbio */
-			have_csum = scrub_find_csum(sctx, logical, csum);
-			if (have_csum == 0)
-				++sctx->stat.no_csum;
-		}
-		ret = scrub_sectors(sctx, logical, l, src_physical, src_dev,
-				    flags, gen, src_mirror,
-				    have_csum ? csum : NULL, physical);
-		if (ret)
-			return ret;
-		len -= l;
-		logical += l;
-		physical += l;
-		src_physical += l;
-	}
-	return 0;
-}
-
 static int scrub_sectors_for_parity(struct scrub_parity *sparity,
 				  u64 logical, u32 len,
 				  u64 physical, struct btrfs_device *dev,
@@ -4145,20 +3874,6 @@ static noinline_for_stack int scrub_raid56_parity(struct scrub_ctx *sctx,
 	return ret < 0 ? ret : 0;
 }
 
-static void sync_replace_for_zoned(struct scrub_ctx *sctx)
-{
-	if (!btrfs_is_zoned(sctx->fs_info))
-		return;
-
-	sctx->flush_all_writes = true;
-	scrub_submit(sctx);
-	mutex_lock(&sctx->wr_lock);
-	scrub_wr_submit(sctx);
-	mutex_unlock(&sctx->wr_lock);
-
-	wait_event(sctx->list_wait, atomic_read(&sctx->bios_in_flight) == 0);
-}
-
 static int sync_write_pointer_for_zoned(struct scrub_ctx *sctx, u64 logical,
 					u64 physical, u64 physical_end)
 {
@@ -4216,11 +3931,11 @@ static void fill_one_extent_info(struct btrfs_fs_info *fs_info,
  * Return >0 if there is no such stripe in the specified range.
  * Return <0 for error.
  */
-int scrub_find_fill_first_stripe(struct btrfs_root *extent_root,
-				 struct btrfs_root *csum_root,
-				 struct btrfs_block_group *bg,
-				 u64 logical_start, u64 logical_len,
-				 struct scrub_stripe *stripe)
+static int scrub_find_fill_first_stripe(struct btrfs_root *extent_root,
+					struct btrfs_root *csum_root,
+					struct btrfs_block_group *bg,
+					u64 logical_start, u64 logical_len,
+					struct scrub_stripe *stripe)
 {
 	struct btrfs_fs_info *fs_info = extent_root->fs_info;
 	const u64 logical_end = logical_start + logical_len;
@@ -4326,6 +4041,67 @@ int scrub_find_fill_first_stripe(struct btrfs_root *extent_root,
 	return ret;
 }
 
+/* Reset the bitmaps and sector info for the next stripe. */
+static void scrub_reset_stripe(struct scrub_stripe *stripe)
+{
+	int i;
+
+	stripe->extent_sector_bitmap = 0;
+	stripe->init_error_bitmap = 0;
+	stripe->current_error_bitmap = 0;
+
+	stripe->io_error_bitmap = 0;
+	stripe->csum_error_bitmap = 0;
+	stripe->meta_error_bitmap = 0;
+	stripe->write_error_bitmap = 0;
+
+	stripe->nr_meta_extents = 0;
+	stripe->nr_data_extents = 0;
+
+	for (i = 0; i < stripe->nr_sectors; i++) {
+		stripe->sectors[i].is_metadata = false;
+		stripe->sectors[i].csum = NULL;
+		stripe->sectors[i].generation = 0;
+	}
+}
+
+static void scrub_write_repaired_sectors(struct scrub_ctx *sctx,
+					 struct scrub_stripe *stripe)
+{
+	unsigned long write_bitmap;
+
+	if (sctx->readonly)
+		return;
+
+	/*
+	 * Repaired sectors are in @init_error_bitmap, but not in
+	 * @current_error_bitmap.
+	 */
+	bitmap_andnot(&write_bitmap, &stripe->init_error_bitmap,
+		      &stripe->current_error_bitmap, stripe->nr_sectors);
+	scrub_write_wait_sectors(sctx, stripe, stripe->dev, stripe->physical,
+				 &write_bitmap);
+}
+
+static void scrub_write_replace_sectors(struct scrub_ctx *sctx,
+					struct scrub_stripe *stripe)
+{
+	unsigned long write_bitmap;
+
+	if (sctx->readonly)
+		return;
+
+	/*
+	 * Sectors to replace are in @extent_sector_bitmap, but not in
+	 * @current_error_bitmap.
+	 *
+	 * If we have unrepaired sectors, we have no choice but to skip them.
+	 */
+	bitmap_andnot(&write_bitmap, &stripe->extent_sector_bitmap,
+		      &stripe->current_error_bitmap, stripe->nr_sectors);
+	scrub_write_wait_sectors(sctx, stripe, sctx->wr_tgtdev, stripe->physical,
+				 &write_bitmap);
+}
 
 /*
  * Scrub one range which can only has simple mirror based profile.
@@ -4336,6 +4112,7 @@ int scrub_find_fill_first_stripe(struct btrfs_root *extent_root,
  * and @logical_length parameter.
  */
 static int scrub_simple_mirror(struct scrub_ctx *sctx,
+			       struct scrub_stripe *stripe,
 			       struct btrfs_block_group *bg,
 			       struct map_lookup *map,
 			       u64 logical_start, u64 logical_length,
@@ -4346,25 +4123,17 @@ static int scrub_simple_mirror(struct scrub_ctx *sctx,
 	struct btrfs_root *csum_root = btrfs_csum_root(fs_info, bg->start);
 	struct btrfs_root *extent_root = btrfs_extent_root(fs_info, bg->start);
 	const u64 logical_end = logical_start + logical_length;
-	/* An artificial limit, inherit from old scrub behavior */
-	const u32 max_length = SZ_64K;
-	struct btrfs_path path = { 0 };
 	u64 cur_logical = logical_start;
 	int ret;
 
 	/* The range must be inside the bg */
 	ASSERT(logical_start >= bg->start && logical_end <= bg->start + bg->length);
 
-	path.search_commit_root = 1;
-	path.skip_locking = 1;
+	stripe->mirror_num = mirror_num;
+	stripe->dev = device;
+
 	/* Go through each extent items inside the logical range */
 	while (cur_logical < logical_end) {
-		u64 extent_start;
-		u64 extent_len;
-		u64 extent_flags;
-		u64 extent_gen;
-		u64 scrub_len;
-
 		/* Canceled? */
 		if (atomic_read(&fs_info->scrub_cancel_req) ||
 		    atomic_read(&sctx->cancel_req)) {
@@ -4393,8 +4162,9 @@ static int scrub_simple_mirror(struct scrub_ctx *sctx,
 		}
 		spin_unlock(&bg->lock);
 
-		ret = find_first_extent_item(extent_root, &path, cur_logical,
-					     logical_end - cur_logical);
+		scrub_reset_stripe(stripe);
+		ret = scrub_find_fill_first_stripe(extent_root, csum_root, bg,
+				cur_logical, logical_end - cur_logical, stripe);
 		if (ret > 0) {
 			/* No more extent, just update the accounting */
 			sctx->stat.last_physical = physical + logical_length;
@@ -4403,56 +4173,20 @@ static int scrub_simple_mirror(struct scrub_ctx *sctx,
 		}
 		if (ret < 0)
 			break;
-		get_extent_info(&path, &extent_start, &extent_len,
-				&extent_flags, &extent_gen);
-		/* Skip hole range which doesn't have any extent */
-		cur_logical = max(extent_start, cur_logical);
-
-		/*
-		 * Scrub len has three limits:
-		 * - Extent size limit
-		 * - Scrub range limit
-		 *   This is especially imporatant for RAID0/RAID10 to reuse
-		 *   this function
-		 * - Max scrub size limit
-		 */
-		scrub_len = min(min(extent_start + extent_len,
-				    logical_end), cur_logical + max_length) -
-			    cur_logical;
-
-		if (extent_flags & BTRFS_EXTENT_FLAG_DATA) {
-			ret = btrfs_lookup_csums_list(csum_root, cur_logical,
-					cur_logical + scrub_len - 1,
-					&sctx->csum_list, 1, false);
-			if (ret)
-				break;
-		}
-		if ((extent_flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) &&
-		    does_range_cross_boundary(extent_start, extent_len,
-					      logical_start, logical_length)) {
-			btrfs_err(fs_info,
-"scrub: tree block %llu spanning boundaries, ignored. boundary=[%llu, %llu)",
-				  extent_start, logical_start, logical_end);
-			spin_lock(&sctx->stat_lock);
-			sctx->stat.uncorrectable_errors++;
-			spin_unlock(&sctx->stat_lock);
-			cur_logical += scrub_len;
-			continue;
-		}
-		ret = scrub_extent(sctx, map, cur_logical, scrub_len,
-				   cur_logical - logical_start + physical,
-				   device, extent_flags, extent_gen,
-				   mirror_num);
-		scrub_free_csums(sctx);
-		if (ret)
-			break;
+		stripe->physical = physical + stripe->logical - logical_start;
+		scrub_throttle_dev_io(sctx, device, BTRFS_STRIPE_LEN);
+		scrub_submit_read_one_stripe(stripe);
+		wait_scrub_stripe(stripe);
+		scrub_repair_one_stripe(stripe);
+		scrub_write_repaired_sectors(sctx, stripe);
+		scrub_report_stripe_errors(sctx, stripe);
 		if (sctx->is_dev_replace)
-			sync_replace_for_zoned(sctx);
-		cur_logical += scrub_len;
+			scrub_write_replace_sectors(sctx, stripe);
+
+		cur_logical = stripe->logical + BTRFS_STRIPE_LEN;
 		/* Don't hold CPU for too long time */
 		cond_resched();
 	}
-	btrfs_release_path(&path);
 	return ret;
 }
 
@@ -4493,6 +4227,7 @@ static int simple_stripe_mirror_num(struct map_lookup *map, int stripe_index)
 }
 
 static int scrub_simple_stripe(struct scrub_ctx *sctx,
+			       struct scrub_stripe *stripe,
 			       struct btrfs_block_group *bg,
 			       struct map_lookup *map,
 			       struct btrfs_device *device,
@@ -4512,7 +4247,7 @@ static int scrub_simple_stripe(struct scrub_ctx *sctx,
 		 * just RAID1, so we can reuse scrub_simple_mirror() to scrub
 		 * this stripe.
 		 */
-		ret = scrub_simple_mirror(sctx, bg, map, cur_logical,
+		ret = scrub_simple_mirror(sctx, stripe, bg, map, cur_logical,
 					  map->stripe_len, device, cur_physical,
 					  mirror_num);
 		if (ret)
@@ -4532,6 +4267,7 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 					   int stripe_index)
 {
 	struct btrfs_fs_info *fs_info = sctx->fs_info;
+	struct scrub_stripe *stripe;
 	struct blk_plug plug;
 	struct map_lookup *map = em->map_lookup;
 	const u64 profile = map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK;
@@ -4550,10 +4286,15 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 	u64 stripe_end;
 	int stop_loop = 0;
 
+	stripe = alloc_scrub_stripe(fs_info, bg);
+	if (!stripe)
+		return -ENOMEM;
+
 	wait_event(sctx->list_wait,
 		   atomic_read(&sctx->bios_in_flight) == 0);
 	scrub_blocked_if_needed(fs_info);
 
+
 	/*
 	 * collect all data csums for the stripe to avoid seeking during
 	 * the scrub. This might currently (crc32) end up to be about 1MB
@@ -4585,14 +4326,16 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 		 * Only @physical and @mirror_num needs to calculated using
 		 * @stripe_index.
 		 */
-		ret = scrub_simple_mirror(sctx, bg, map, bg->start, bg->length,
-				scrub_dev, map->stripes[stripe_index].physical,
+		ret = scrub_simple_mirror(sctx, stripe, bg, map, bg->start,
+				bg->length, scrub_dev,
+				map->stripes[stripe_index].physical,
 				stripe_index + 1);
 		offset = 0;
 		goto out;
 	}
 	if (profile & (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID10)) {
-		ret = scrub_simple_stripe(sctx, bg, map, scrub_dev, stripe_index);
+		ret = scrub_simple_stripe(sctx, stripe, bg, map, scrub_dev,
+					  stripe_index);
 		offset = map->stripe_len * (stripe_index / map->sub_stripes);
 		goto out;
 	}
@@ -4638,8 +4381,8 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 		 * We can reuse scrub_simple_mirror() here, as the repair part
 		 * is still based on @mirror_num.
 		 */
-		ret = scrub_simple_mirror(sctx, bg, map, logical, map->stripe_len,
-					  scrub_dev, physical, 1);
+		ret = scrub_simple_mirror(sctx, stripe, bg, map, logical,
+					  map->stripe_len, scrub_dev, physical, 1);
 		if (ret < 0)
 			goto out;
 next:
@@ -4674,6 +4417,7 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 		if (ret2)
 			ret = ret2;
 	}
+	free_scrub_stripe(stripe);
 
 	return ret < 0 ? ret : 0;
 }
@@ -5479,28 +5223,3 @@ int btrfs_scrub_progress(struct btrfs_fs_info *fs_info, u64 devid,
 
 	return dev ? (sctx ? 0 : -ENOTCONN) : -ENODEV;
 }
-
-static void scrub_find_good_copy(struct btrfs_fs_info *fs_info,
-				 u64 extent_logical, u32 extent_len,
-				 u64 *extent_physical,
-				 struct btrfs_device **extent_dev,
-				 int *extent_mirror_num)
-{
-	u64 mapped_length;
-	struct btrfs_io_context *bioc = NULL;
-	int ret;
-
-	mapped_length = extent_len;
-	ret = btrfs_map_block(fs_info, BTRFS_MAP_READ, extent_logical,
-			      &mapped_length, &bioc, 0);
-	if (ret || !bioc || mapped_length < extent_len ||
-	    !bioc->stripes[0].dev->bdev) {
-		btrfs_put_bioc(bioc);
-		return;
-	}
-
-	*extent_physical = bioc->stripes[0].physical;
-	*extent_mirror_num = bioc->mirror_num;
-	*extent_dev = bioc->stripes[0].dev;
-	btrfs_put_bioc(bioc);
-}
diff --git a/fs/btrfs/scrub.h b/fs/btrfs/scrub.h
index fc294a250945..7639103ebf9d 100644
--- a/fs/btrfs/scrub.h
+++ b/fs/btrfs/scrub.h
@@ -13,22 +13,4 @@ int btrfs_scrub_cancel_dev(struct btrfs_device *dev);
 int btrfs_scrub_progress(struct btrfs_fs_info *fs_info, u64 devid,
 			 struct btrfs_scrub_progress *progress);
 
-/*
- * The following functions are temporary exports to avoid warning on unused
- * static functions.
- */
-struct scrub_stripe;
-int scrub_find_fill_first_stripe(struct btrfs_root *extent_root,
-				 struct btrfs_root *csum_root,
-				 struct btrfs_block_group *bg,
-				 u64 logical_start, u64 logical_len,
-				 struct scrub_stripe *stripe);
-int scrub_repair_one_stripe(struct scrub_stripe *stripe);
-void scrub_write_wait_sectors(struct scrub_ctx *sctx,
-			      struct scrub_stripe *stripe,
-			      struct btrfs_device *dev, u64 physical,
-			      unsigned long *write_bitmap);
-void scrub_report_stripe_errors(struct scrub_ctx *sctx,
-				struct scrub_stripe *stripe);
-
 #endif
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 00/11] btrfs: scrub: use a more reader friendly code to implement scrub_simple_mirror()
  2023-01-16  7:04 [PATCH 00/11] btrfs: scrub: use a more reader friendly code to implement scrub_simple_mirror() Qu Wenruo
                   ` (10 preceding siblings ...)
  2023-01-16  7:04 ` [PATCH 11/11] btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure Qu Wenruo
@ 2023-01-18 20:01 ` David Sterba
  11 siblings, 0 replies; 17+ messages in thread
From: David Sterba @ 2023-01-18 20:01 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Mon, Jan 16, 2023 at 03:04:11PM +0800, Qu Wenruo wrote:
> This is the formal version of the previous PoC patchset "btrfs: scrub:
> rework to get rid of the complex bio formshaping"
> 
> The idea is pretty straight-forward for scrub:
> 
> - Fetch the extent info and csum for the whole BTRFS_STRIPE_LEN range
> 
> - Read the full BTRFS_STRIPE_LEN range
> 
> - Verify the contents using the extent info and csum at endio time
> 
> - Wait for above BTRFS_STRIPE_LEN range read to finish.
> 
> - If we have failed sectors, read extra mirrors to repair them.
> 
> - If we have failed sectors, writeback the repaired ones.
> 
> - If we're doing dev-replace, writeback all good sectors to the target
>   device.
> 
> Although the workflow is still mostly the same as the old scrub
> infrastructure, the implementation goes submit-and-wait method.
> 
> Thus it provides a very straight-forward code basis:
> 
> 		scrub_reset_stripe(stripe);
> 		ret = scrub_find_fill_first_stripe(extent_root, csum_root, bg,
> 				cur_logical, logical_end - cur_logical, stripe);
> 		stripe->physical = physical + stripe->logical - logical_start;
> 		scrub_throttle_dev_io(sctx, device, BTRFS_STRIPE_LEN);
> 		scrub_submit_read_one_stripe(stripe);
> 		wait_scrub_stripe(stripe);
> 		scrub_repair_one_stripe(stripe);
> 		scrub_write_repaired_sectors(sctx, stripe);
> 		scrub_report_stripe_errors(sctx, stripe);
> 		if (sctx->is_dev_replace)
> 			scrub_write_replace_sectors(sctx, stripe);
> 		cur_logical = stripe->logical + BTRFS_STRIPE_LEN;
> 
> Thus it covers all the core logic in one function.
> 
> By contrast the old code goes various workqueue, endio function jumps,
> and extra bio formshaping.
> 
> Currently the patchset only covers profiles other than RAID56 parity
> stripes.
> Thus old infrastructure is still kept for RAID56 parity scrub usage.
> 
> But still the patchset is already large enough for review.
> 
> The current patchset can already pass all scrub and replace tests.
> 
> [BENCHMARK]
> 
> However there is a cost.
> Since our block size is limited to 64K, it's much smaller block size
> compared to the original one.
> 
> Thus for the worst case scenario (all data are continuous, and the
> profiles is RAID0 for extra splits), the scrub performance got a 20%
> drop:
> 
> Old:
>  
>  Duration:         0:00:19
>  Total to scrub:   10.52GiB
>  Rate:             449.50MiB/s
> 
> New:
> 
>  Duration:         0:00:24
>  Total to scrub:   10.52GiB
>  Rate:             355.86MiB/s
> 
> The benchmark is using an SATA SSD directly attached to the VM.
> 
> [NEED FEEDBACK]
> 
> Is 20% drop perf acceptable?
> 
> I have seen some customers asking for ways to slow down scrub,
> but not to speed it up.
> Thus I'm not sure if a native performance drop is a curse or a bless.
> 
> Any if needed, I can enlarge the block size by submitting multiple
> stripes instead.
> But in that case, we will need some extra code to do multiple stripe
> scrub.

20% seems like a lot; however, if this means that more IO from other
applications can be submitted while scrub is running, it might not be
that bad. Scrub limiting is not done by default, so this could help.

The point against that is: what if I have an idle machine and want to
let the scrub run as fast as possible? As the request size and IO timing
depend on the device type, it's not exactly the same as limiting the
bandwidth. In your case you have an SSD; no HDD can do 449 or 355. An
option to 'scrub start' to set the request size, or some human-friendly
name like "big request", could be added. But I'm not sure if this is the
best way; it would be yet another tunable, and we might try using
cgroups instead.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 03/11] btrfs: scrub: use dedicated super block verification function to scrub one super block
  2023-01-16  7:04 ` [PATCH 03/11] btrfs: scrub: use dedicated super block verification function to scrub one super block Qu Wenruo
@ 2023-01-19  4:46   ` li zhang
  2023-01-19  6:57     ` Qu Wenruo
  0 siblings, 1 reply; 17+ messages in thread
From: li zhang @ 2023-01-19  4:46 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

Qu Wenruo <wqu@suse.com> 于2023年1月16日周一 15:06写道:
>
> There is really no need to go through the super complex scrub_sectors()
> to just handle super blocks.
>
> This patch will introduce a dedicated function (less than 50 lines) to
> handle super block scrubing.
>
> This new function will introduce a behavior change, instead of using the
> complex but concurrent scrub_bio system, here we just go
> submit-and-wait.
>
> There is really not much sense to care the performance of super block
> scrubbing. It only has 3 super blocks at most, and they are all scattered
> around the devices already.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/scrub.c | 53 ++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 45 insertions(+), 8 deletions(-)
>
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index 288d7e2a6cdf..e554a9904d2a 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -4143,18 +4143,59 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
>         return ret;
>  }
>
> +static int scrub_one_super(struct scrub_ctx *sctx, struct btrfs_device *dev,
> +                          struct page *page, u64 physical, u64 generation)
> +{
> +       struct btrfs_fs_info *fs_info = sctx->fs_info;
> +       struct bio_vec bvec;
> +       struct bio bio;
> +       struct btrfs_super_block *sb = page_address(page);
> +       int ret;
> +
> +       bio_init(&bio, dev->bdev, &bvec, 1, REQ_OP_READ);
> +       bio.bi_iter.bi_sector = physical >> SECTOR_SHIFT;
> +       bio_add_page(&bio, page, BTRFS_SUPER_INFO_SIZE, 0);
> +       ret = submit_bio_wait(&bio);
> +       bio_uninit(&bio);
> +
> +       if (ret < 0)
> +               return ret;
> +       ret = btrfs_check_super_csum(fs_info, sb);
> +       if (ret != 0) {
> +               btrfs_err_rl(fs_info,
> +                       "super block at physical %llu devid %llu has bad csum",
> +                       physical, dev->devid);
> +               return -EIO;
> +       }
> +       if (btrfs_super_generation(sb) != generation) {
> +               btrfs_err_rl(fs_info,
> +"super block at physical %llu devid %llu has bad generation, has %llu expect %llu",
> +                            physical, dev->devid,
> +                            btrfs_super_generation(sb), generation);
> +               return -EUCLEAN;
> +       }
> +
> +       ret = btrfs_validate_super(fs_info, sb, -1);
> +       return ret;
> +}
> +
>  static noinline_for_stack int scrub_supers(struct scrub_ctx *sctx,
>                                            struct btrfs_device *scrub_dev)
>  {
>         int     i;
>         u64     bytenr;
>         u64     gen;
> -       int     ret;
> +       int     ret = 0;
> +       struct page *page;
>         struct btrfs_fs_info *fs_info = sctx->fs_info;
>
>         if (BTRFS_FS_ERROR(fs_info))
>                 return -EROFS;
>
> +       page = alloc_page(GFP_KERNEL);

Does this page need to be freed?


> +       if (!page)
> +               return -ENOMEM;
> +
>         /* Seed devices of a new filesystem has their own generation. */
>         if (scrub_dev->fs_devices != fs_info->fs_devices)
>                 gen = scrub_dev->generation;
> @@ -4169,15 +4210,11 @@ static noinline_for_stack int scrub_supers(struct scrub_ctx *sctx,
>                 if (!btrfs_check_super_location(scrub_dev, bytenr))
>                         continue;
>
> -               ret = scrub_sectors(sctx, bytenr, BTRFS_SUPER_INFO_SIZE, bytenr,
> -                                   scrub_dev, BTRFS_EXTENT_FLAG_SUPER, gen, i,
> -                                   NULL, bytenr);
> +               ret = scrub_one_super(sctx, scrub_dev, page, bytenr, gen);
>                 if (ret)
> -                       return ret;
> +                       break;
>         }
> -       wait_event(sctx->list_wait, atomic_read(&sctx->bios_in_flight) == 0);
> -
> -       return 0;
> +       return ret;
>  }
>
>  static void scrub_workers_put(struct btrfs_fs_info *fs_info)
> --
> 2.39.0
>

thanks
Li

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 03/11] btrfs: scrub: use dedicated super block verification function to scrub one super block
  2023-01-19  4:46   ` li zhang
@ 2023-01-19  6:57     ` Qu Wenruo
  0 siblings, 0 replies; 17+ messages in thread
From: Qu Wenruo @ 2023-01-19  6:57 UTC (permalink / raw)
  To: li zhang, Qu Wenruo; +Cc: linux-btrfs



On 2023/1/19 12:46, li zhang wrote:
> Qu Wenruo <wqu@suse.com> 于2023年1月16日周一 15:06写道:
>>
>> There is really no need to go through the super complex scrub_sectors()
>> to just handle super blocks.
>>
>> This patch will introduce a dedicated function (less than 50 lines) to
>> handle super block scrubing.
>>
>> This new function will introduce a behavior change, instead of using the
>> complex but concurrent scrub_bio system, here we just go
>> submit-and-wait.
>>
>> There is really not much sense to care the performance of super block
>> scrubbing. It only has 3 super blocks at most, and they are all scattered
>> around the devices already.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   fs/btrfs/scrub.c | 53 ++++++++++++++++++++++++++++++++++++++++--------
>>   1 file changed, 45 insertions(+), 8 deletions(-)
>>
>> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
>> index 288d7e2a6cdf..e554a9904d2a 100644
>> --- a/fs/btrfs/scrub.c
>> +++ b/fs/btrfs/scrub.c
>> @@ -4143,18 +4143,59 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
>>          return ret;
>>   }
>>
>> +static int scrub_one_super(struct scrub_ctx *sctx, struct btrfs_device *dev,
>> +                          struct page *page, u64 physical, u64 generation)
>> +{
>> +       struct btrfs_fs_info *fs_info = sctx->fs_info;
>> +       struct bio_vec bvec;
>> +       struct bio bio;
>> +       struct btrfs_super_block *sb = page_address(page);
>> +       int ret;
>> +
>> +       bio_init(&bio, dev->bdev, &bvec, 1, REQ_OP_READ);
>> +       bio.bi_iter.bi_sector = physical >> SECTOR_SHIFT;
>> +       bio_add_page(&bio, page, BTRFS_SUPER_INFO_SIZE, 0);
>> +       ret = submit_bio_wait(&bio);
>> +       bio_uninit(&bio);
>> +
>> +       if (ret < 0)
>> +               return ret;
>> +       ret = btrfs_check_super_csum(fs_info, sb);
>> +       if (ret != 0) {
>> +               btrfs_err_rl(fs_info,
>> +                       "super block at physical %llu devid %llu has bad csum",
>> +                       physical, dev->devid);
>> +               return -EIO;
>> +       }
>> +       if (btrfs_super_generation(sb) != generation) {
>> +               btrfs_err_rl(fs_info,
>> +"super block at physical %llu devid %llu has bad generation, has %llu expect %llu",
>> +                            physical, dev->devid,
>> +                            btrfs_super_generation(sb), generation);
>> +               return -EUCLEAN;
>> +       }
>> +
>> +       ret = btrfs_validate_super(fs_info, sb, -1);
>> +       return ret;
>> +}
>> +
>>   static noinline_for_stack int scrub_supers(struct scrub_ctx *sctx,
>>                                             struct btrfs_device *scrub_dev)
>>   {
>>          int     i;
>>          u64     bytenr;
>>          u64     gen;
>> -       int     ret;
>> +       int     ret = 0;
>> +       struct page *page;
>>          struct btrfs_fs_info *fs_info = sctx->fs_info;
>>
>>          if (BTRFS_FS_ERROR(fs_info))
>>                  return -EROFS;
>>
>> +       page = alloc_page(GFP_KERNEL);
> 
> Does this page need to be freed?

Nice find, you're completely right; I forgot to free the page.

I'll update it in my github branch first.
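
The fix should be as simple as freeing the page at the end of
scrub_supers(); something like this (untested sketch):

		ret = scrub_one_super(sctx, scrub_dev, page, bytenr, gen);
		if (ret)
			break;
	}
	__free_page(page);
	return ret;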

Thanks,
Qu
> 
> 
>> +       if (!page)
>> +               return -ENOMEM;
>> +
>>          /* Seed devices of a new filesystem has their own generation. */
>>          if (scrub_dev->fs_devices != fs_info->fs_devices)
>>                  gen = scrub_dev->generation;
>> @@ -4169,15 +4210,11 @@ static noinline_for_stack int scrub_supers(struct scrub_ctx *sctx,
>>                  if (!btrfs_check_super_location(scrub_dev, bytenr))
>>                          continue;
>>
>> -               ret = scrub_sectors(sctx, bytenr, BTRFS_SUPER_INFO_SIZE, bytenr,
>> -                                   scrub_dev, BTRFS_EXTENT_FLAG_SUPER, gen, i,
>> -                                   NULL, bytenr);
>> +               ret = scrub_one_super(sctx, scrub_dev, page, bytenr, gen);
>>                  if (ret)
>> -                       return ret;
>> +                       break;
>>          }
>> -       wait_event(sctx->list_wait, atomic_read(&sctx->bios_in_flight) == 0);
>> -
>> -       return 0;
>> +       return ret;
>>   }
>>
>>   static void scrub_workers_put(struct btrfs_fs_info *fs_info)
>> --
>> 2.39.0
>>
> 
> thanks
> Li

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 01/11] btrfs: remove unused @path inside scrub_stripe()
  2023-01-16  7:04 ` [PATCH 01/11] btrfs: remove unused @path inside scrub_stripe() Qu Wenruo
@ 2023-03-01 20:25   ` David Sterba
  0 siblings, 0 replies; 17+ messages in thread
From: David Sterba @ 2023-03-01 20:25 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Mon, Jan 16, 2023 at 03:04:12PM +0800, Qu Wenruo wrote:
> The variable @path is no longer passed into any call sites after commit
> 18d30ab96149 ("btrfs: scrub: use scrub_simple_mirror() to handle RAID56
> data stripe scrub"), thus we can remove the variable completely.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

This is an independent patch, now added to misc-next, thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 02/11] btrfs: remove @root and @csum_root arguments from scrub_simple_mirror()
  2023-01-16  7:04 ` [PATCH 02/11] btrfs: remove @root and @csum_root arguments from scrub_simple_mirror() Qu Wenruo
@ 2023-03-01 20:29   ` David Sterba
  0 siblings, 0 replies; 17+ messages in thread
From: David Sterba @ 2023-03-01 20:29 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Mon, Jan 16, 2023 at 03:04:13PM +0800, Qu Wenruo wrote:
> Grabbing extent/csum root for extent tree v2 is not that a huge
> workload, it's just a rb tree search.
> 
> Thus there is really not much need to pre-fetch it and pass it all the
> way from scrub_stripe().
> 
> And we already have more than enough arguments in scrub_simple_mirror()
> and scrub_simple_stripe(), it's better to remove them and only grab
> those roots in scrub_simple_mirror().
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

This one also applies and is an independent cleanup, added to misc-next.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2023-03-01 20:35 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-16  7:04 [PATCH 00/11] btrfs: scrub: use a more reader friendly code to implement scrub_simple_mirror() Qu Wenruo
2023-01-16  7:04 ` [PATCH 01/11] btrfs: remove unused @path inside scrub_stripe() Qu Wenruo
2023-03-01 20:25   ` David Sterba
2023-01-16  7:04 ` [PATCH 02/11] btrfs: remove @root and @csum_root arguments from scrub_simple_mirror() Qu Wenruo
2023-03-01 20:29   ` David Sterba
2023-01-16  7:04 ` [PATCH 03/11] btrfs: scrub: use dedicated super block verification function to scrub one super block Qu Wenruo
2023-01-19  4:46   ` li zhang
2023-01-19  6:57     ` Qu Wenruo
2023-01-16  7:04 ` [PATCH 04/11] btrfs: scrub: introduce the structure for new BTRFS_STRIPE_LEN based interface Qu Wenruo
2023-01-16  7:04 ` [PATCH 05/11] btrfs: scrub: introduce a helper to find and fill the sector info for a scrub_stripe Qu Wenruo
2023-01-16  7:04 ` [PATCH 06/11] btrfs: scrub: introduce a helper to verify one metadata Qu Wenruo
2023-01-16  7:04 ` [PATCH 07/11] btrfs: scrub: introduce a helper to verify one scrub_stripe Qu Wenruo
2023-01-16  7:04 ` [PATCH 08/11] btrfs: scrub: introduce the repair functionality for scrub_stripe Qu Wenruo
2023-01-16  7:04 ` [PATCH 09/11] btrfs: scrub: introduce a writeback helper " Qu Wenruo
2023-01-16  7:04 ` [PATCH 10/11] btrfs: scrub: introduce error reporting functionality " Qu Wenruo
2023-01-16  7:04 ` [PATCH 11/11] btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure Qu Wenruo
2023-01-18 20:01 ` [PATCH 00/11] btrfs: scrub: use a more reader friendly code to implement scrub_simple_mirror() David Sterba
