* [PATCH v7 00/13] btrfs: scrub: use a more reader friendly code to implement scrub_simple_mirror()
@ 2023-03-28 23:56 Qu Wenruo
  2023-03-28 23:56 ` [PATCH v7 01/13] btrfs: scrub: use dedicated super block verification function to scrub one super block Qu Wenruo
                   ` (12 more replies)
  0 siblings, 13 replies; 32+ messages in thread
From: Qu Wenruo @ 2023-03-28 23:56 UTC (permalink / raw)
  To: linux-btrfs

This series can be found in my github repo:

https://github.com/adam900710/linux/tree/scrub_stripe

It's recommended to fetch from the repo, as our misc-next seems to
change pretty rapidly.

[Changelog]
v7:
- Fix a bug where scrub_stripe::extent_sector_bitmap was not cleared
  Exposed during my development of the new RAID56 scrub code.

  The extent_sector_bitmap indicates whether a sector is utilized by
  any extent.
  If that bitmap is not cleared before running the next stripe, we
  can treat unused sectors as NODATASUM data.

  This is not a big deal for non-RAID56 profiles, as they skip empty
  stripes through other methods.
  But it can be very problematic for RAID56, as it needs to scrub
  the data stripes and then the P/Q stripes.

  Such an inherited bitmap makes RAID56 scrub all those unused
  full stripes and greatly slows down the scrub.
  (See the sketch after this changelog for the reset helper.)

v6:
- Fix a bug in zoned block group repair
  Exposed during my development of the new RAID56 scrub code.

  We wrongly used @stripe to determine whether a block group needs to
  be queued for repair.

  But @stripe is only the last stripe we checked, and it may not have
  any error at all.

  The correct way is to go through all the stripes and queue the repair
  if we find any error.

v5:
- Fix a bug that unconditionally repaired a zoned block group
  Now the repair is only triggered if there was any initial read failure.

  Huge thanks to Johannes for the initial ZNS tests.

v4:
- Add a dedicated patch to add btrfs_bio::fs_info
  Along with a dedicated allocator for scrub btrfs bios.
  The dedicated allocator is needed because scrub and regular
  btrfs bios have very different mandatory members (fs_info vs inode +
  file_offset).
  For now I believe a separate allocator is the better choice.

- Some code style changes
  * No more single letter temporary structures copied from the old scrub code
  * Use "for (int i = 0; ...)" when possible
  * Some new line fixes
  * Extra brackets for ternary operators
  * A new macro for (BTRFS_STRIPE_LEN >> PAGE_SHIFT)
  * Use an enum for the scrub_stripe::state bit flags
  * Use extra brackets for double shifts
  * Use explicit != 0 or == 0 when comparing memcmp() results
  * Remove unnecessary ASSERT()s after btrfs_bio allocation

v3:
- Add a dedicated @fs_info member to btrfs_bio
  Unfortunately, although we have a 32 byte hole between @end_io_work
  and @bio, the compiler still chooses not to use that hole for whatever
  reason. Thus this increases the size of btrfs_bio by 4 bytes.

- Rebased to the latest misc-next

- Fix various whitespace errors not caught by btrfs-workflow

v2:
- Use batched scrub_stripe submission
  This allows much better performance compared to the old scrub code.

- Add scrub specific bio layer helpers
  This makes the scrub code rely completely on logical bytenr +
  mirror_num.
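
The gist of the v7 fix is to clear every per-stripe bitmap before a
scrub_stripe is reused. The reset helper as it appears in patch 6,
shown here for convenience:

	static void scrub_stripe_reset_bitmaps(struct scrub_stripe *stripe)
	{
		/*
		 * Clear all trace of the previous run, including which
		 * sectors were covered by extents, so a reused stripe can
		 * not inherit extent_sector_bitmap from the last stripe.
		 */
		stripe->extent_sector_bitmap = 0;
		stripe->init_error_bitmap = 0;
		stripe->error_bitmap = 0;
		stripe->io_error_bitmap = 0;
		stripe->csum_error_bitmap = 0;
		stripe->meta_error_bitmap = 0;
	}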

[PROBLEMS OF OLD SCRUB]

- Too many delayed jumps, making it hard to read
  Even starting from scrub_simple_mirror(), we go through the following
  chain of functions:

  scrub_extent()
       |
       v
  scrub_sectors()
       |
       v
  scrub_add_sector_to_rd_bio()
       | endio function
       v
  scrub_bio_end_io()
       | delayed work
       v
  scrub_bio_end_io_worker()
       |
       v
  scrub_block_complete()
       |
       v
  scrub_handle_errored_blocks()
       |
       v
  scrub_recheck_block()
       |
       v
  scrub_repair_sector_from_good_copy()

  Not to mention the hidden jumps in certain branches.

- IOPS inefficient for fragmented extents

  The real block size of a scrub read is between 4K and 128K.
  If the extents are not adjacent, the block size drops to 4K, which
  is an IOPS disaster: a fully fragmented range needs 32 times as many
  read requests as one read at the maximum 128K block size.

- The logical -> physical mapping is all hardcoded inside scrub itself
  It makes no use of the existing bio facilities, and even implements
  its own RAID56 recovery wrapper.

[NEW SCRUB_STRIPE BASED SOLUTION]

- Overall streamlined code base

  queue_scrub_stripe()
     |
     v
  scrub_find_fill_first_stripe()
     |
     v
  done

  Or

  queue_scrub_stripe()
     |
     v
  flush_scrub_stripes()
     |
     v
  scrub_submit_initial_read()
     | endio function
     v
  scrub_read_endio()
     | delayed work
     v
  scrub_stripe_read_repair_worker()
     |
     v
  scrub_verify_one_stripe()
     |
     v
  scrub_stripe_submit_repair_read()
     |
     v
  scrub_write_sectors()
     |
     v
  scrub_stripe_report_errors()

  Only one endio and one delayed work; all other work is done in a
  sequential workflow.

- Always read in 64KiB block size
  The real block size of reads now starts at 64KiB and goes up to 512KiB.
  This already results in better performance even for the worst case:

  With patchset:	404.81MiB/s
  Without patchset:	369.30MiB/s

  Around 10% performance improvement on a SATA SSD.

- All reads and writes are logical bytenr/mirror_num based

  With the new single stripe fast path in btrfs_submit_bio(), scrub can
  reuse most of the bio layer code, resulting in much simpler scrub code
  (see the sketch below).
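
  A rough sketch of what this looks like for a caller, using the helpers
  introduced in patches 2 and 3 (the endio callback and the page/offset
  variables here are illustrative placeholders):

	struct btrfs_bio *bbio;

	/* Allocate a scrub specific btrfs_bio, no inode needed. */
	bbio = btrfs_scrub_bio_alloc(REQ_OP_READ, fs_info,
				     read_endio, stripe);
	/* Address the read purely by logical bytenr... */
	bbio->bio.bi_iter.bi_sector = logical >> SECTOR_SHIFT;
	bio_add_page(&bbio->bio, page, fs_info->sectorsize, pgoff);
	/* ...and let the bio layer map it to mirror @mirror_num. */
	btrfs_submit_scrub_read(bbio, mirror_num);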

[TODO]

- More testing on zoned devices
  The patchset can already pass all scrub/replace test groups with
  regular devices.

- Cleanup of the RAID56 path
  Already done in the github repo.
  It still needs some patch splitting, as the cleanup itself is already
  twice the size of the submitted series.

Qu Wenruo (13):
  btrfs: scrub: use dedicated super block verification function to scrub
    one super block
  btrfs: introduce a new allocator for scrub specific btrfs_bio
  btrfs: introduce a new helper to submit read bio for scrub
  btrfs: introduce a new helper to submit write bio for scrub
  btrfs: scrub: introduce the structure for new BTRFS_STRIPE_LEN based
    interface
  btrfs: scrub: introduce a helper to find and fill the sector info for
    a scrub_stripe
  btrfs: scrub: introduce a helper to verify one metadata
  btrfs: scrub: introduce a helper to verify one scrub_stripe
  btrfs: scrub: introduce the main read repair worker for scrub_stripe
  btrfs: scrub: introduce a writeback helper for scrub_stripe
  btrfs: scrub: introduce error reporting functionality for scrub_stripe
  btrfs: scrub: introduce the helper to queue a stripe for scrub
  btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe
    infrastructure

 fs/btrfs/bio.c       |  165 ++++-
 fs/btrfs/bio.h       |   22 +-
 fs/btrfs/file-item.c |    9 +-
 fs/btrfs/file-item.h |    3 +-
 fs/btrfs/raid56.c    |    2 +-
 fs/btrfs/scrub.c     | 1662 ++++++++++++++++++++++++++++++------------
 6 files changed, 1378 insertions(+), 485 deletions(-)

-- 
2.39.2


* [PATCH v7 01/13] btrfs: scrub: use dedicated super block verification function to scrub one super block
  2023-03-28 23:56 [PATCH v7 00/13] btrfs: scrub: use a more reader friendly code to implement scrub_simple_mirror() Qu Wenruo
@ 2023-03-28 23:56 ` Qu Wenruo
  2023-03-29  0:29   ` Anand Jain
                     ` (2 more replies)
  2023-03-28 23:56 ` [PATCH v7 02/13] btrfs: introduce a new allocator for scrub specific btrfs_bio Qu Wenruo
                   ` (11 subsequent siblings)
  12 siblings, 3 replies; 32+ messages in thread
From: Qu Wenruo @ 2023-03-28 23:56 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

There is really no need to go through the super complex scrub_sectors()
just to handle super blocks.

This patch introduces a dedicated function (less than 50 lines) to
handle super block scrubbing.

The new function introduces one behavior change: instead of using the
complex but concurrent scrub_bio system, we just go submit-and-wait.

There is not much point in caring about the performance of super block
scrubbing; each device has at most 3 super blocks, and they are already
scattered around the device.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/scrub.c | 54 +++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 46 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 3cdf73277e7e..e765eb8b8bcf 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -4243,18 +4243,59 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 	return ret;
 }
 
+static int scrub_one_super(struct scrub_ctx *sctx, struct btrfs_device *dev,
+			   struct page *page, u64 physical, u64 generation)
+{
+	struct btrfs_fs_info *fs_info = sctx->fs_info;
+	struct bio_vec bvec;
+	struct bio bio;
+	struct btrfs_super_block *sb = page_address(page);
+	int ret;
+
+	bio_init(&bio, dev->bdev, &bvec, 1, REQ_OP_READ);
+	bio.bi_iter.bi_sector = physical >> SECTOR_SHIFT;
+	bio_add_page(&bio, page, BTRFS_SUPER_INFO_SIZE, 0);
+	ret = submit_bio_wait(&bio);
+	bio_uninit(&bio);
+
+	if (ret < 0)
+		return ret;
+	ret = btrfs_check_super_csum(fs_info, sb);
+	if (ret != 0) {
+		btrfs_err_rl(fs_info,
+			"super block at physical %llu devid %llu has bad csum",
+			physical, dev->devid);
+		return -EIO;
+	}
+	if (btrfs_super_generation(sb) != generation) {
+		btrfs_err_rl(fs_info,
+"super block at physical %llu devid %llu has bad generation, has %llu expect %llu",
+			     physical, dev->devid,
+			     btrfs_super_generation(sb), generation);
+		return -EUCLEAN;
+	}
+
+	ret = btrfs_validate_super(fs_info, sb, -1);
+	return ret;
+}
+
 static noinline_for_stack int scrub_supers(struct scrub_ctx *sctx,
 					   struct btrfs_device *scrub_dev)
 {
 	int	i;
 	u64	bytenr;
 	u64	gen;
-	int	ret;
+	int	ret = 0;
+	struct page *page;
 	struct btrfs_fs_info *fs_info = sctx->fs_info;
 
 	if (BTRFS_FS_ERROR(fs_info))
 		return -EROFS;
 
+	page = alloc_page(GFP_KERNEL);
+	if (!page)
+		return -ENOMEM;
+
 	/* Seed devices of a new filesystem has their own generation. */
 	if (scrub_dev->fs_devices != fs_info->fs_devices)
 		gen = scrub_dev->generation;
@@ -4269,15 +4310,12 @@ static noinline_for_stack int scrub_supers(struct scrub_ctx *sctx,
 		if (!btrfs_check_super_location(scrub_dev, bytenr))
 			continue;
 
-		ret = scrub_sectors(sctx, bytenr, BTRFS_SUPER_INFO_SIZE, bytenr,
-				    scrub_dev, BTRFS_EXTENT_FLAG_SUPER, gen, i,
-				    NULL, bytenr);
+		ret = scrub_one_super(sctx, scrub_dev, page, bytenr, gen);
 		if (ret)
-			return ret;
+			break;
 	}
-	wait_event(sctx->list_wait, atomic_read(&sctx->bios_in_flight) == 0);
-
-	return 0;
+	__free_page(page);
+	return ret;
 }
 
 static void scrub_workers_put(struct btrfs_fs_info *fs_info)
-- 
2.39.2


* [PATCH v7 02/13] btrfs: introduce a new allocator for scrub specific btrfs_bio
  2023-03-28 23:56 [PATCH v7 00/13] btrfs: scrub: use a more reader friendly code to implement scrub_simple_mirror() Qu Wenruo
  2023-03-28 23:56 ` [PATCH v7 01/13] btrfs: scrub: use dedicated super block verification function to scrub one super block Qu Wenruo
@ 2023-03-28 23:56 ` Qu Wenruo
  2023-03-29 23:32   ` Christoph Hellwig
  2023-03-28 23:56 ` [PATCH v7 03/13] btrfs: introduce a new helper to submit read bio for scrub Qu Wenruo
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Qu Wenruo @ 2023-03-28 23:56 UTC (permalink / raw)
  To: linux-btrfs

Currently we're doing a lot of work for btrfs_bio:

- Checksum verification for data read bios
- Bio splits if it crosses stripe boundary
- Read repair for data read bios

However for the incoming scrub patches, we don't want any of that extra
functionality, just the pure logical + mirror -> physical mapping
ability.

Thus here we introduce:

- btrfs_bio::fs_info
  This is for the new scrub specific btrfs_bio, which does not
  populate btrfs_bio::inode.
  Thus we need such a new member to grab the fs_info.

  This new member will always be populated.

- btrfs_scrub_bio_alloc() helper
  The main differences from btrfs_bio_alloc() are:
  * No need for nr_vecs
    As we know a scrub bio should not cross the stripe boundary.

  * Use an @fs_info parameter to replace the @inode one

- An extra ASSERT() to make sure btrfs_bio::fs_info is populated

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/bio.c | 25 +++++++++++++++++++++++++
 fs/btrfs/bio.h | 19 ++++++++++++++++++-
 2 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index cf09c6271edb..c1edadc17260 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -36,6 +36,7 @@ void btrfs_bio_init(struct btrfs_bio *bbio, struct btrfs_inode *inode,
 {
 	memset(bbio, 0, offsetof(struct btrfs_bio, bio));
 	bbio->inode = inode;
+	bbio->fs_info = inode->root->fs_info;
 	bbio->end_io = end_io;
 	bbio->private = private;
 	atomic_set(&bbio->pending_ios, 1);
@@ -61,6 +62,30 @@ struct btrfs_bio *btrfs_bio_alloc(unsigned int nr_vecs, blk_opf_t opf,
 	return bbio;
 }
 
+/*
+ * Allocate a scrub specific btrfs_bio structure.
+ *
+ * This btrfs_bio does not go through any of the special btrfs handling
+ * like checksum verification or read-repair.
+ */
+struct btrfs_bio *btrfs_scrub_bio_alloc(blk_opf_t opf,
+					struct btrfs_fs_info *fs_info,
+					btrfs_bio_end_io_t end_io, void *private)
+{
+	struct btrfs_bio *bbio;
+	struct bio *bio;
+
+	bio = bio_alloc_bioset(NULL, BTRFS_STRIPE_LEN >> fs_info->sectorsize_bits,
+			       opf, GFP_NOFS, &btrfs_bioset);
+	bbio = btrfs_bio(bio);
+	memset(bbio, 0, offsetof(struct btrfs_bio, bio));
+	bbio->fs_info = fs_info;
+	bbio->end_io = end_io;
+	bbio->private = private;
+	atomic_set(&bbio->pending_ios, 1);
+	return bbio;
+}
+
 static struct btrfs_bio *btrfs_split_bio(struct btrfs_fs_info *fs_info,
 					 struct btrfs_bio *orig_bbio,
 					 u64 map_length, bool use_append)
diff --git a/fs/btrfs/bio.h b/fs/btrfs/bio.h
index dbf125f6fa33..3b97ce54140a 100644
--- a/fs/btrfs/bio.h
+++ b/fs/btrfs/bio.h
@@ -30,7 +30,12 @@ typedef void (*btrfs_bio_end_io_t)(struct btrfs_bio *bbio);
  * passed to btrfs_submit_bio for mapping to the physical devices.
  */
 struct btrfs_bio {
-	/* Inode and offset into it that this I/O operates on. */
+	/*
+	 * Inode and offset into it that this I/O operates on.
+	 *
+	 * @inode can be NULL for callers who don't want any advanced features
+	 * like read-time repair.
+	 */
 	struct btrfs_inode *inode;
 	u64 file_offset;
 
@@ -58,6 +63,15 @@ struct btrfs_bio {
 	atomic_t pending_ios;
 	struct work_struct end_io_work;
 
+	/*
+	 * For callers that only want to read/write at a logical bytenr,
+	 * @inode can be NULL, and we need this @fs_info pointer to grab
+	 * the corresponding fs_info.
+	 *
+	 * Should always be populated.
+	 */
+	struct btrfs_fs_info *fs_info;
+
 	/*
 	 * This member must come last, bio_alloc_bioset will allocate enough
 	 * bytes for entire btrfs_bio but relies on bio being last.
@@ -78,6 +92,9 @@ void btrfs_bio_init(struct btrfs_bio *bbio, struct btrfs_inode *inode,
 struct btrfs_bio *btrfs_bio_alloc(unsigned int nr_vecs, blk_opf_t opf,
 				  struct btrfs_inode *inode,
 				  btrfs_bio_end_io_t end_io, void *private);
+struct btrfs_bio *btrfs_scrub_bio_alloc(blk_opf_t opf,
+					struct btrfs_fs_info *fs_info,
+					btrfs_bio_end_io_t end_io, void *private);
 
 static inline void btrfs_bio_end_io(struct btrfs_bio *bbio, blk_status_t status)
 {
-- 
2.39.2


* [PATCH v7 03/13] btrfs: introduce a new helper to submit read bio for scrub
  2023-03-28 23:56 [PATCH v7 00/13] btrfs: scrub: use a more reader friendly code to implement scrub_simple_mirror() Qu Wenruo
  2023-03-28 23:56 ` [PATCH v7 01/13] btrfs: scrub: use dedicated super block verification function to scrub one super block Qu Wenruo
  2023-03-28 23:56 ` [PATCH v7 02/13] btrfs: introduce a new allocator for scrub specific btrfs_bio Qu Wenruo
@ 2023-03-28 23:56 ` Qu Wenruo
  2023-03-29 23:33   ` Christoph Hellwig
  2023-03-28 23:56 ` [PATCH v7 04/13] btrfs: introduce a new helper to submit write " Qu Wenruo
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Qu Wenruo @ 2023-03-28 23:56 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

The new helper, btrfs_submit_scrub_read(), is mostly a subset of
btrfs_submit_bio(), with the following limitations:

- Only supports read
- @mirror_num must be > 0
- No read-time repair nor checksum verification
- The @bbio must not cross the stripe boundary

This provides the basis for unified read repair in scrub, as scrub no
longer needs to handle RAID56 recovery all by itself, and scrub of
RAID56 data stripes can share the same read and repair code.

The repair part is the same as for non-RAID56: we only need to try the
next mirror.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/bio.c | 48 ++++++++++++++++++++++++++++++++++++++++++++----
 fs/btrfs/bio.h |  1 +
 2 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index c1edadc17260..bdef346c542c 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -333,8 +333,8 @@ static void btrfs_end_bio_work(struct work_struct *work)
 {
 	struct btrfs_bio *bbio = container_of(work, struct btrfs_bio, end_io_work);
 
-	/* Metadata reads are checked and repaired by the submitter. */
-	if (bbio->bio.bi_opf & REQ_META)
+	/* Metadata or scrub reads are checked and repaired by the submitter. */
+	if (bbio->bio.bi_opf & REQ_META || !bbio->inode)
 		bbio->end_io(bbio);
 	else
 		btrfs_check_read_bio(bbio, bbio->bio.bi_private);
@@ -344,7 +344,7 @@ static void btrfs_simple_end_io(struct bio *bio)
 {
 	struct btrfs_bio *bbio = btrfs_bio(bio);
 	struct btrfs_device *dev = bio->bi_private;
-	struct btrfs_fs_info *fs_info = bbio->inode->root->fs_info;
+	struct btrfs_fs_info *fs_info = bbio->fs_info;
 
 	btrfs_bio_counter_dec(fs_info);
 
@@ -368,7 +368,8 @@ static void btrfs_raid56_end_io(struct bio *bio)
 
 	btrfs_bio_counter_dec(bioc->fs_info);
 	bbio->mirror_num = bioc->mirror_num;
-	if (bio_op(bio) == REQ_OP_READ && !(bbio->bio.bi_opf & REQ_META))
+	if (bio_op(bio) == REQ_OP_READ && bbio->inode &&
+	    !(bbio->bio.bi_opf & REQ_META))
 		btrfs_check_read_bio(bbio, NULL);
 	else
 		btrfs_orig_bbio_end_io(bbio);
@@ -714,6 +715,45 @@ static bool btrfs_submit_chunk(struct btrfs_bio *bbio, int mirror_num)
 	return true;
 }
 
+/*
+ * Scrub read special version, with extra limits:
+ *
+ * - Only support read for scrub usage
+ * - @mirror_num must be >0
+ * - No read-time repair nor checksum verification.
+ * - The @bbio must not cross stripe boundary.
+ */
+void btrfs_submit_scrub_read(struct btrfs_bio *bbio, int mirror_num)
+{
+	struct btrfs_fs_info *fs_info = bbio->fs_info;
+	u64 logical = bbio->bio.bi_iter.bi_sector << SECTOR_SHIFT;
+	u64 length = bbio->bio.bi_iter.bi_size;
+	u64 map_length = length;
+	struct btrfs_io_context *bioc = NULL;
+	struct btrfs_io_stripe smap;
+	int ret;
+
+	ASSERT(fs_info);
+	ASSERT(mirror_num > 0);
+	ASSERT(btrfs_op(&bbio->bio) == BTRFS_MAP_READ);
+	ASSERT(!bbio->inode);
+
+	btrfs_bio_counter_inc_blocked(fs_info);
+	ret = __btrfs_map_block(fs_info, btrfs_op(&bbio->bio), logical,
+				&map_length, &bioc, &smap, &mirror_num, 1);
+	if (ret)
+		goto fail;
+
+	/* Caller should ensure the @bbio doesn't cross stripe boundary. */
+	ASSERT(map_length >= length);
+	__btrfs_submit_bio(&bbio->bio, bioc, &smap, mirror_num);
+	return;
+
+fail:
+	btrfs_bio_counter_dec(fs_info);
+	btrfs_bio_end_io(bbio, errno_to_blk_status(ret));
+}
+
 void btrfs_submit_bio(struct btrfs_bio *bbio, int mirror_num)
 {
 	while (!btrfs_submit_chunk(bbio, mirror_num))
diff --git a/fs/btrfs/bio.h b/fs/btrfs/bio.h
index 3b97ce54140a..afbcf318fdda 100644
--- a/fs/btrfs/bio.h
+++ b/fs/btrfs/bio.h
@@ -106,6 +106,7 @@ static inline void btrfs_bio_end_io(struct btrfs_bio *bbio, blk_status_t status)
 #define REQ_BTRFS_ONE_ORDERED			REQ_DRV
 
 void btrfs_submit_bio(struct btrfs_bio *bbio, int mirror_num);
+void btrfs_submit_scrub_read(struct btrfs_bio *bbio, int mirror_num);
 int btrfs_repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
 			    u64 length, u64 logical, struct page *page,
 			    unsigned int pg_offset, int mirror_num);
-- 
2.39.2


* [PATCH v7 04/13] btrfs: introduce a new helper to submit write bio for scrub
  2023-03-28 23:56 [PATCH v7 00/13] btrfs: scrub: use a more reader friendly code to implement scrub_simple_mirror() Qu Wenruo
                   ` (2 preceding siblings ...)
  2023-03-28 23:56 ` [PATCH v7 03/13] btrfs: introduce a new helper to submit read bio for scrub Qu Wenruo
@ 2023-03-28 23:56 ` Qu Wenruo
  2023-03-29 23:33   ` Christoph Hellwig
  2023-03-28 23:56 ` [PATCH v7 05/13] btrfs: scrub: introduce the structure for new BTRFS_STRIPE_LEN based interface Qu Wenruo
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Qu Wenruo @ 2023-03-28 23:56 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

Just like the special scrub read, scrub write also has its own special
requirements:

- Only write back to a single device
  Even for read-repair on RAID56, we only update the corrupted data
  stripe itself, without triggering the full RMW path.

  This makes scrub writeback a perfect match for the single stripe
  quick path.

- Requires a valid @mirror_num
  For the RAID56 case, only @mirror_num == 1 is supported.
  For non-RAID56 cases, we need @mirror_num to locate our stripe.

- Need to manually specify if it's for dev-replace
  On the scrub path we can write back to the original device (for
  read-repair) and to the target device (for replace) at the same
  time, but with different sectors (read-repair only writes repaired
  sectors, while dev-replace writes all good sectors).

  So here we need a bool to specify which case it is.

- No data csum generation
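
A hedged sketch of the expected call pattern (the actual writeback
helper comes later in the series; the endio callback and the page/offset
variables are illustrative placeholders, while sctx->is_dev_replace is
the existing scrub context flag):

	/* Write the repaired sectors back, still purely logical based. */
	bbio = btrfs_scrub_bio_alloc(REQ_OP_WRITE, fs_info,
				     write_endio, stripe);
	bbio->bio.bi_iter.bi_sector = logical >> SECTOR_SHIFT;
	bio_add_page(&bbio->bio, page, fs_info->sectorsize, pgoff);

	/*
	 * Pass true for dev-replace, so the single resulting stripe is
	 * steered from the replace source device to the target device.
	 */
	btrfs_submit_scrub_write(bbio, stripe->mirror_num,
				 sctx->is_dev_replace);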

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/bio.c | 92 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/bio.h |  2 ++
 2 files changed, 94 insertions(+)

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index bdef346c542c..34902d58bb4b 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -754,6 +754,98 @@ void btrfs_submit_scrub_read(struct btrfs_bio *bbio, int mirror_num)
 	btrfs_bio_end_io(bbio, errno_to_blk_status(ret));
 }
 
+/*
+ * Scrub write special version. Some extra limits:
+ *
+ * - Only support write back for dev-replace and scrub read-repair.
+ *   This means, the write bio, even for RAID56, would only
+ *   be mapped to single device.
+ *
+ * - @mirror_num must be >0.
+ *   To indicate which mirror to be written.
+ *   If it's RAID56, it must be 1 (data stripes).
+ *
+ * - The @bbio must not cross stripe boundary.
+ *
+ * - If @dev_replace is true, the resulting stripe must be mapped to
+ *   the replace source device.
+ *
+ * - No csum generation.
+ */
+void btrfs_submit_scrub_write(struct btrfs_bio *bbio, int mirror_num,
+			      bool dev_replace)
+{
+	struct btrfs_fs_info *fs_info = bbio->fs_info;
+	u64 logical = bbio->bio.bi_iter.bi_sector << SECTOR_SHIFT;
+	u64 length = bbio->bio.bi_iter.bi_size;
+	u64 map_length = length;
+	struct btrfs_io_context *bioc = NULL;
+	struct btrfs_io_stripe smap;
+	int ret;
+
+	ASSERT(fs_info);
+	ASSERT(mirror_num > 0);
+	ASSERT(btrfs_op(&bbio->bio) == BTRFS_MAP_WRITE);
+	ASSERT(!bbio->inode);
+
+	btrfs_bio_counter_inc_blocked(fs_info);
+	ret = __btrfs_map_block(fs_info, btrfs_op(&bbio->bio), logical,
+				&map_length, &bioc, &smap, &mirror_num, 1);
+	if (ret)
+		goto fail;
+
+	/* Caller should ensure the @bbio doesn't cross stripe boundary. */
+	ASSERT(map_length >= length);
+	if (btrfs_op(&bbio->bio) == BTRFS_MAP_WRITE && btrfs_is_zoned(fs_info)) {
+		bbio->bio.bi_opf &= ~REQ_OP_WRITE;
+		bbio->bio.bi_opf |= REQ_OP_ZONE_APPEND;
+	}
+
+	if (!bioc)
+		goto submit;
+
+	/* Map the RAID56 multi-stripe writes to a single one. */
+	if (bioc->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
+		int data_stripes = (bioc->map_type & BTRFS_BLOCK_GROUP_RAID5) ?
+				    bioc->num_stripes - 1 : bioc->num_stripes - 2;
+		int i;
+
+		/* This special write only works for data stripes. */
+		ASSERT(mirror_num == 1);
+		for (i = 0; i < data_stripes; i++) {
+			u64 stripe_start = bioc->full_stripe_logical +
+					   (i << BTRFS_STRIPE_LEN_SHIFT);
+
+			if (logical >= stripe_start &&
+			    logical < stripe_start + BTRFS_STRIPE_LEN)
+				break;
+		}
+		ASSERT(i < data_stripes);
+		smap.dev = bioc->stripes[i].dev;
+		smap.physical = bioc->stripes[i].physical +
+				((logical - bioc->full_stripe_logical) &
+				 BTRFS_STRIPE_LEN_MASK);
+		goto submit;
+	}
+	ASSERT(mirror_num <= bioc->num_stripes);
+	smap.dev = bioc->stripes[mirror_num - 1].dev;
+	smap.physical = bioc->stripes[mirror_num - 1].physical;
+submit:
+	ASSERT(smap.dev);
+	btrfs_put_bioc(bioc);
+	bioc = NULL;
+	if (dev_replace) {
+		ASSERT(smap.dev == fs_info->dev_replace.srcdev);
+		smap.dev = fs_info->dev_replace.tgtdev;
+	}
+	__btrfs_submit_bio(&bbio->bio, bioc, &smap, mirror_num);
+	return;
+
+fail:
+	btrfs_bio_counter_dec(fs_info);
+	btrfs_bio_end_io(bbio, errno_to_blk_status(ret));
+}
+
 void btrfs_submit_bio(struct btrfs_bio *bbio, int mirror_num)
 {
 	while (!btrfs_submit_chunk(bbio, mirror_num))
diff --git a/fs/btrfs/bio.h b/fs/btrfs/bio.h
index afbcf318fdda..ad5a6a558662 100644
--- a/fs/btrfs/bio.h
+++ b/fs/btrfs/bio.h
@@ -107,6 +107,8 @@ static inline void btrfs_bio_end_io(struct btrfs_bio *bbio, blk_status_t status)
 
 void btrfs_submit_bio(struct btrfs_bio *bbio, int mirror_num);
 void btrfs_submit_scrub_read(struct btrfs_bio *bbio, int mirror_num);
+void btrfs_submit_scrub_write(struct btrfs_bio *bbio, int mirror_num,
+			      bool dev_replace);
 int btrfs_repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
 			    u64 length, u64 logical, struct page *page,
 			    unsigned int pg_offset, int mirror_num);
-- 
2.39.2


* [PATCH v7 05/13] btrfs: scrub: introduce the structure for new BTRFS_STRIPE_LEN based interface
  2023-03-28 23:56 [PATCH v7 00/13] btrfs: scrub: use a more reader friendly code to implement scrub_simple_mirror() Qu Wenruo
                   ` (3 preceding siblings ...)
  2023-03-28 23:56 ` [PATCH v7 04/13] btrfs: introduce a new helper to submit write " Qu Wenruo
@ 2023-03-28 23:56 ` Qu Wenruo
  2023-03-28 23:56 ` [PATCH v7 06/13] btrfs: scrub: introduce a helper to find and fill the sector info for a scrub_stripe Qu Wenruo
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Qu Wenruo @ 2023-03-28 23:56 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

This patch introduces the following structures:

- scrub_sector_verification
  Contains all the needed info to verify one sector (data or metadata).

- scrub_stripe
  Contains all needed members (mostly bitmap based) to scrub one stripe
  (with a length of BTRFS_STRIPE_LEN).

The basic idea is to keep the existing per-device scrub behavior, but
merge all the old scrub_bio handling into one generic structure, and
read the full BTRFS_STRIPE_LEN stripe on the first try.

This means we will read some sectors which are not scrub targets, but
that's fine. At dev-replace time we only write back the utilized and
good sectors, and for read-repair we only write back the repaired
sectors.

With every read submitted in BTRFS_STRIPE_LEN units, the need for
complex bio form-shaping is gone.
Although to get the same performance as the old scrub behavior, we need
to submit the initial reads for two stripes at once.
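
To put concrete numbers on the sizing: with the default 4KiB sectorsize,
one BTRFS_STRIPE_LEN (64KiB) stripe covers 64K / 4K = 16 sectors and, on
4KiB-page systems, needs 64K / 4K = 16 pages (SCRUB_STRIPE_PAGES), so
every per-sector state bitmap in scrub_stripe fits in a single unsigned
long.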

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/scrub.c | 142 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/scrub.h |   8 +++
 2 files changed, 150 insertions(+)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index e765eb8b8bcf..05ecd1e5c513 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -70,6 +70,94 @@ struct scrub_ctx;
  */
 #define BTRFS_MAX_MIRRORS (4 + 1)
 
+/* Represent one sector and its needed info to verify the content. */
+struct scrub_sector_verification {
+	bool is_metadata;
+
+	union {
+		/*
+		 * Csum pointer for data csum verification.
+		 * Should point to a sector csum inside scrub_stripe::csums.
+		 *
+		 * NULL if this data sector has no csum.
+		 */
+		u8 *csum;
+
+		/*
+		 * Extra info for metadata verification.
+		 * All sectors inside a tree block share the same
+		 * generation.
+		 */
+		u64 generation;
+	};
+};
+
+enum scrub_stripe_flags {
+	/* Set when @mirror_num, @dev, @physical and @logical are set. */
+	SCRUB_STRIPE_FLAG_INITIALIZED,
+
+	/* Set when the read-repair is finished. */
+	SCRUB_STRIPE_FLAG_REPAIR_DONE,
+};
+
+#define SCRUB_STRIPE_PAGES		(BTRFS_STRIPE_LEN / PAGE_SIZE)
+/*
+ * Represent one continuous range with a length of BTRFS_STRIPE_LEN.
+ */
+struct scrub_stripe {
+	struct btrfs_block_group *bg;
+
+	struct page *pages[SCRUB_STRIPE_PAGES];
+	struct scrub_sector_verification *sectors;
+
+	struct btrfs_device *dev;
+	u64 logical;
+	u64 physical;
+
+	u16 mirror_num;
+
+	/* Should be BTRFS_STRIPE_LEN / sectorsize. */
+	u16 nr_sectors;
+
+	atomic_t pending_io;
+	wait_queue_head_t io_wait;
+
+	/*
+	 * Indicates the states of the stripe.
+	 * Bits are defined in scrub_stripe_flags enum.
+	 */
+	unsigned long state;
+
+	/* Indicates which sectors are covered by extent items. */
+	unsigned long extent_sector_bitmap;
+
+	/*
+	 * The errors hit during the initial read of the stripe.
+	 *
+	 * Would be utilized for error reporting and repair.
+	 */
+	unsigned long init_error_bitmap;
+
+	/*
+	 * The following error bitmaps are all for the current status.
+	 * Every time we submit a new read, those bitmaps may be updated.
+	 *
+	 * error_bitmap = io_error_bitmap | csum_error_bitmap | meta_error_bitmap;
+	 *
+	 * IO and csum errors can happen for both metadata and data.
+	 */
+	unsigned long error_bitmap;
+	unsigned long io_error_bitmap;
+	unsigned long csum_error_bitmap;
+	unsigned long meta_error_bitmap;
+
+	/*
+	 * Checksum for the whole stripe if this stripe is inside a data block
+	 * group.
+	 */
+	u8 *csums;
+};
+
 struct scrub_recover {
 	refcount_t		refs;
 	struct btrfs_io_context	*bioc;
@@ -266,6 +354,60 @@ static void detach_scrub_page_private(struct page *page)
 #endif
 }
 
+static void release_scrub_stripe(struct scrub_stripe *stripe)
+{
+	if (!stripe)
+		return;
+
+	for (int i = 0; i < SCRUB_STRIPE_PAGES; i++) {
+		if (stripe->pages[i])
+			__free_page(stripe->pages[i]);
+		stripe->pages[i] = NULL;
+	}
+	kfree(stripe->sectors);
+	kfree(stripe->csums);
+	stripe->sectors = NULL;
+	stripe->csums = NULL;
+	stripe->state = 0;
+}
+
+int init_scrub_stripe(struct btrfs_fs_info *fs_info, struct scrub_stripe *stripe)
+{
+	int ret;
+
+	memset(stripe, 0, sizeof(*stripe));
+
+	stripe->nr_sectors = BTRFS_STRIPE_LEN >> fs_info->sectorsize_bits;
+	stripe->state = 0;
+
+	init_waitqueue_head(&stripe->io_wait);
+	atomic_set(&stripe->pending_io, 0);
+
+	ret = btrfs_alloc_page_array(SCRUB_STRIPE_PAGES, stripe->pages);
+	if (ret < 0)
+		goto error;
+
+	stripe->sectors = kcalloc(stripe->nr_sectors,
+				  sizeof(struct scrub_sector_verification),
+				  GFP_KERNEL);
+	if (!stripe->sectors)
+		goto error;
+
+	stripe->csums = kzalloc((BTRFS_STRIPE_LEN >> fs_info->sectorsize_bits) *
+				fs_info->csum_size, GFP_KERNEL);
+	if (!stripe->csums)
+		goto error;
+	return 0;
+error:
+	release_scrub_stripe(stripe);
+	return -ENOMEM;
+}
+
+void wait_scrub_stripe_io(struct scrub_stripe *stripe)
+{
+	wait_event(stripe->io_wait, atomic_read(&stripe->pending_io) == 0);
+}
+
 static struct scrub_block *alloc_scrub_block(struct scrub_ctx *sctx,
 					     struct btrfs_device *dev,
 					     u64 logical, u64 physical,
diff --git a/fs/btrfs/scrub.h b/fs/btrfs/scrub.h
index 7639103ebf9d..e04764f8bb7e 100644
--- a/fs/btrfs/scrub.h
+++ b/fs/btrfs/scrub.h
@@ -13,4 +13,12 @@ int btrfs_scrub_cancel_dev(struct btrfs_device *dev);
 int btrfs_scrub_progress(struct btrfs_fs_info *fs_info, u64 devid,
 			 struct btrfs_scrub_progress *progress);
 
+/*
+ * The following functions are temporary exports to avoid warning on unused
+ * static functions.
+ */
+struct scrub_stripe;
+int init_scrub_stripe(struct btrfs_fs_info *fs_info, struct scrub_stripe *stripe);
+void wait_scrub_stripe_io(struct scrub_stripe *stripe);
+
 #endif
-- 
2.39.2


* [PATCH v7 06/13] btrfs: scrub: introduce a helper to find and fill the sector info for a scrub_stripe
  2023-03-28 23:56 [PATCH v7 00/13] btrfs: scrub: use a more reader friendly code to implement scrub_simple_mirror() Qu Wenruo
                   ` (4 preceding siblings ...)
  2023-03-28 23:56 ` [PATCH v7 05/13] btrfs: scrub: introduce the structure for new BTRFS_STRIPE_LEN based interface Qu Wenruo
@ 2023-03-28 23:56 ` Qu Wenruo
  2023-03-28 23:56 ` [PATCH v7 07/13] btrfs: scrub: introduce a helper to verify one metadata Qu Wenruo
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Qu Wenruo @ 2023-03-28 23:56 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

The new helper searches the extent tree to find the first extent of a
logical range, then fills the sectors array with two loops:

- Loop 1 fills the common bits and the metadata generation

- Loop 2 fills the csum data (only for data block groups)
  This loop uses the new btrfs_lookup_csums_bitmap() to fill
  the full csum buffer, and sets scrub_sector_verification::csum.

With all the needed info filled in by this function, later we only need
to submit and verify the stripe.

Here we temporarily export the helper to avoid a warning on an unused
static function.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/file-item.c |   9 ++-
 fs/btrfs/file-item.h |   3 +-
 fs/btrfs/raid56.c    |   2 +-
 fs/btrfs/scrub.c     | 149 +++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/scrub.h     |   4 ++
 5 files changed, 164 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 1ce306cea690..018c711a0bc8 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -597,7 +597,8 @@ int btrfs_lookup_csums_list(struct btrfs_root *root, u64 start, u64 end,
  * in is large enough to contain all csums.
  */
 int btrfs_lookup_csums_bitmap(struct btrfs_root *root, u64 start, u64 end,
-			      u8 *csum_buf, unsigned long *csum_bitmap)
+			      u8 *csum_buf, unsigned long *csum_bitmap,
+			      bool search_commit)
 {
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	struct btrfs_key key;
@@ -614,6 +615,12 @@ int btrfs_lookup_csums_bitmap(struct btrfs_root *root, u64 start, u64 end,
 	if (!path)
 		return -ENOMEM;
 
+	if (search_commit) {
+		path->skip_locking = 1;
+		path->reada = READA_FORWARD;
+		path->search_commit_root = 1;
+	}
+
 	key.objectid = BTRFS_EXTENT_CSUM_OBJECTID;
 	key.type = BTRFS_EXTENT_CSUM_KEY;
 	key.offset = start;
diff --git a/fs/btrfs/file-item.h b/fs/btrfs/file-item.h
index cd7f2ae515c0..6be8725cd574 100644
--- a/fs/btrfs/file-item.h
+++ b/fs/btrfs/file-item.h
@@ -57,7 +57,8 @@ int btrfs_lookup_csums_list(struct btrfs_root *root, u64 start, u64 end,
 			    struct list_head *list, int search_commit,
 			    bool nowait);
 int btrfs_lookup_csums_bitmap(struct btrfs_root *root, u64 start, u64 end,
-			      u8 *csum_buf, unsigned long *csum_bitmap);
+			      u8 *csum_buf, unsigned long *csum_bitmap,
+			      bool search_commit);
 void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
 				     const struct btrfs_path *path,
 				     struct btrfs_file_extent_item *fi,
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 6cbbaa6c06ca..a64b40000d12 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -2113,7 +2113,7 @@ static void fill_data_csums(struct btrfs_raid_bio *rbio)
 	}
 
 	ret = btrfs_lookup_csums_bitmap(csum_root, start, start + len - 1,
-					rbio->csum_buf, rbio->csum_bitmap);
+					rbio->csum_buf, rbio->csum_bitmap, false);
 	if (ret < 0)
 		goto error;
 	if (bitmap_empty(rbio->csum_bitmap, len >> fs_info->sectorsize_bits))
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 05ecd1e5c513..77c9b615c929 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3642,6 +3642,155 @@ static int sync_write_pointer_for_zoned(struct scrub_ctx *sctx, u64 logical,
 	return ret;
 }
 
+static void fill_one_extent_info(struct btrfs_fs_info *fs_info,
+				 struct scrub_stripe *stripe,
+				 u64 extent_start, u64 extent_len,
+				 u64 extent_flags, u64 extent_gen)
+{
+	u64 cur_logical;
+
+	for (cur_logical = max(stripe->logical, extent_start);
+	     cur_logical < min(stripe->logical + BTRFS_STRIPE_LEN,
+			       extent_start + extent_len);
+	     cur_logical += fs_info->sectorsize) {
+		const int nr_sector = (cur_logical - stripe->logical) >>
+				      fs_info->sectorsize_bits;
+		struct scrub_sector_verification *sector =
+			&stripe->sectors[nr_sector];
+
+		set_bit(nr_sector, &stripe->extent_sector_bitmap);
+		if (extent_flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) {
+			sector->is_metadata = true;
+			sector->generation = extent_gen;
+		}
+	}
+}
+
+static void scrub_stripe_reset_bitmaps(struct scrub_stripe *stripe)
+{
+	stripe->extent_sector_bitmap = 0;
+	stripe->init_error_bitmap = 0;
+	stripe->error_bitmap = 0;
+	stripe->io_error_bitmap = 0;
+	stripe->csum_error_bitmap = 0;
+	stripe->meta_error_bitmap = 0;
+}
+
+/*
+ * Locate one stripe which has at least one extent in its range.
+ *
+ * Return 0 if found such stripe, and store its info into @stripe.
+ * Return >0 if there is no such stripe in the specified range.
+ * Return <0 for error.
+ */
+int scrub_find_fill_first_stripe(struct btrfs_block_group *bg,
+				 struct btrfs_device *dev, u64 physical,
+				 int mirror_num, u64 logical_start,
+				 u32 logical_len, struct scrub_stripe *stripe)
+{
+	struct btrfs_fs_info *fs_info = bg->fs_info;
+	struct btrfs_root *extent_root = btrfs_extent_root(fs_info, bg->start);
+	struct btrfs_root *csum_root = btrfs_csum_root(fs_info, bg->start);
+	const u64 logical_end = logical_start + logical_len;
+	struct btrfs_path path = { 0 };
+	u64 cur_logical = logical_start;
+	u64 stripe_end;
+	u64 extent_start;
+	u64 extent_len;
+	u64 extent_flags;
+	u64 extent_gen;
+	int ret;
+
+	memset(stripe->sectors, 0, sizeof(struct scrub_sector_verification) *
+				   stripe->nr_sectors);
+	scrub_stripe_reset_bitmaps(stripe);
+
+	/* The range must be inside the bg */
+	ASSERT(logical_start >= bg->start && logical_end <= bg->start + bg->length);
+
+	path.search_commit_root = 1;
+	path.skip_locking = 1;
+
+	ret = find_first_extent_item(extent_root, &path, logical_start,
+				     logical_len);
+	/* Either error or not found. */
+	if (ret)
+		goto out;
+	get_extent_info(&path, &extent_start, &extent_len,
+			&extent_flags, &extent_gen);
+	cur_logical = max(extent_start, cur_logical);
+
+	/*
+	 * Round down to stripe boundary.
+	 *
+	 * The extra calculation against bg->start is to handle block groups
+	 * whose logical bytenr is not BTRFS_STRIPE_LEN aligned.
+	 */
+	stripe->logical = round_down(cur_logical - bg->start, BTRFS_STRIPE_LEN) +
+			  bg->start;
+	stripe->physical = physical + stripe->logical - logical_start;
+	stripe->dev = dev;
+	stripe->bg = bg;
+	stripe->mirror_num = mirror_num;
+	stripe_end = stripe->logical + BTRFS_STRIPE_LEN - 1;
+
+	/* Fill the first extent info into stripe->sectors[] array. */
+	fill_one_extent_info(fs_info, stripe, extent_start, extent_len,
+			     extent_flags, extent_gen);
+	cur_logical = extent_start + extent_len;
+
+	/* Fill the extent info for the remaining sectors. */
+	while (cur_logical <= stripe_end) {
+		ret = find_first_extent_item(extent_root, &path, cur_logical,
+					     stripe_end - cur_logical + 1);
+		if (ret < 0)
+			goto out;
+		if (ret > 0) {
+			ret = 0;
+			break;
+		}
+		get_extent_info(&path, &extent_start, &extent_len,
+				&extent_flags, &extent_gen);
+		fill_one_extent_info(fs_info, stripe, extent_start, extent_len,
+				     extent_flags, extent_gen);
+		cur_logical = extent_start + extent_len;
+	}
+
+	/* Now fill the data csum. */
+	if (bg->flags & BTRFS_BLOCK_GROUP_DATA) {
+		int sector_nr;
+		unsigned long csum_bitmap = 0;
+
+		/* Csum space should have already been allocated. */
+		ASSERT(stripe->csums);
+
+		/*
+		 * Our csum bitmap should be large enough, as BTRFS_STRIPE_LEN
+		 * should contain at most 16 sectors.
+		 */
+		ASSERT(BITS_PER_LONG >=
+		       BTRFS_STRIPE_LEN >> fs_info->sectorsize_bits);
+
+		ret = btrfs_lookup_csums_bitmap(csum_root, stripe->logical,
+						stripe_end, stripe->csums,
+						&csum_bitmap, true);
+		if (ret < 0)
+			goto out;
+		if (ret > 0)
+			ret = 0;
+
+		for_each_set_bit(sector_nr, &csum_bitmap, stripe->nr_sectors) {
+			stripe->sectors[sector_nr].csum = stripe->csums +
+				sector_nr * fs_info->csum_size;
+		}
+	}
+	set_bit(SCRUB_STRIPE_FLAG_INITIALIZED, &stripe->state);
+out:
+	btrfs_release_path(&path);
+	return ret;
+}
+
+
 /*
  * Scrub one range which can only has simple mirror based profile.
  * (Including all range in SINGLE/DUP/RAID1/RAID1C*, and each stripe in
diff --git a/fs/btrfs/scrub.h b/fs/btrfs/scrub.h
index e04764f8bb7e..27019d86b539 100644
--- a/fs/btrfs/scrub.h
+++ b/fs/btrfs/scrub.h
@@ -20,5 +20,9 @@ int btrfs_scrub_progress(struct btrfs_fs_info *fs_info, u64 devid,
 struct scrub_stripe;
 int init_scrub_stripe(struct btrfs_fs_info *fs_info, struct scrub_stripe *stripe);
 void wait_scrub_stripe_io(struct scrub_stripe *stripe);
+int scrub_find_fill_first_stripe(struct btrfs_block_group *bg,
+				 struct btrfs_device *dev, u64 physical,
+				 int mirror_num, u64 logical_start,
+				 u32 logical_len, struct scrub_stripe *stripe);
 
 #endif
-- 
2.39.2


* [PATCH v7 07/13] btrfs: scrub: introduce a helper to verify one metadata
  2023-03-28 23:56 [PATCH v7 00/13] btrfs: scrub: use a more reader friendly code to implement scrub_simple_mirror() Qu Wenruo
                   ` (5 preceding siblings ...)
  2023-03-28 23:56 ` [PATCH v7 06/13] btrfs: scrub: introduce a helper to find and fill the sector info for a scrub_stripe Qu Wenruo
@ 2023-03-28 23:56 ` Qu Wenruo
  2023-03-28 23:56 ` [PATCH v7 08/13] btrfs: scrub: introduce a helper to verify one scrub_stripe Qu Wenruo
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Qu Wenruo @ 2023-03-28 23:56 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

The new helper, scrub_verify_one_metadata(), is almost the same as
scrub_checksum_tree_block().

The difference is in how we grab the pages.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/scrub.c | 116 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/scrub.h |   1 +
 2 files changed, 117 insertions(+)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 77c9b615c929..60d4d4ebb359 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2159,6 +2159,122 @@ static int scrub_checksum_data(struct scrub_block *sblock)
 	return sblock->checksum_error;
 }
 
+static struct page *scrub_stripe_get_page(struct scrub_stripe *stripe,
+					  int sector_nr)
+{
+	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
+	int page_index = (sector_nr << fs_info->sectorsize_bits) >> PAGE_SHIFT;
+
+	return stripe->pages[page_index];
+}
+
+static unsigned int scrub_stripe_get_page_offset(struct scrub_stripe *stripe,
+						 int sector_nr)
+{
+	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
+
+	return offset_in_page(sector_nr << fs_info->sectorsize_bits);
+}
+
+void scrub_verify_one_metadata(struct scrub_stripe *stripe, int sector_nr)
+{
+	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
+	const unsigned int sectors_per_tree = fs_info->nodesize >>
+					      fs_info->sectorsize_bits;
+	const u64 logical = stripe->logical + (sector_nr << fs_info->sectorsize_bits);
+	const struct page *first_page = scrub_stripe_get_page(stripe, sector_nr);
+	const unsigned int first_off = scrub_stripe_get_page_offset(stripe, sector_nr);
+	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
+	u8 on_disk_csum[BTRFS_CSUM_SIZE];
+	u8 calculated_csum[BTRFS_CSUM_SIZE];
+	struct btrfs_header *header;
+
+	/*
+	 * Here we don't have a good way to attach the pages (and subpages)
+	 * to a dummy extent buffer, thus we have to directly grab the members
+	 * from pages.
+	 */
+	header = (struct btrfs_header *)(page_address(first_page) + first_off);
+	memcpy(on_disk_csum, header->csum, fs_info->csum_size);
+
+	if (logical != btrfs_stack_header_bytenr(header)) {
+		bitmap_set(&stripe->csum_error_bitmap, sector_nr,
+			   sectors_per_tree);
+		bitmap_set(&stripe->error_bitmap, sector_nr,
+			   sectors_per_tree);
+		btrfs_warn_rl(fs_info,
+		"tree block %llu mirror %u has bad bytenr, has %llu want %llu",
+			      logical, stripe->mirror_num,
+			      btrfs_stack_header_bytenr(header), logical);
+		return;
+	}
+	if (memcmp(header->fsid, fs_info->fs_devices->fsid, BTRFS_FSID_SIZE) != 0) {
+		bitmap_set(&stripe->meta_error_bitmap, sector_nr,
+			   sectors_per_tree);
+		bitmap_set(&stripe->error_bitmap, sector_nr,
+			   sectors_per_tree);
+		btrfs_warn_rl(fs_info,
+		"tree block %llu mirror %u has bad fsid, has %pU want %pU",
+			      logical, stripe->mirror_num,
+			      header->fsid, fs_info->fs_devices->fsid);
+		return;
+	}
+	if (memcmp(header->chunk_tree_uuid, fs_info->chunk_tree_uuid,
+		   BTRFS_UUID_SIZE) != 0) {
+		bitmap_set(&stripe->meta_error_bitmap, sector_nr,
+			   sectors_per_tree);
+		bitmap_set(&stripe->error_bitmap, sector_nr,
+			   sectors_per_tree);
+		btrfs_warn_rl(fs_info,
+		"tree block %llu mirror %u has bad chunk tree uuid, has %pU want %pU",
+			      logical, stripe->mirror_num,
+			      header->chunk_tree_uuid, fs_info->chunk_tree_uuid);
+		return;
+	}
+
+	/* Now check tree block csum. */
+	shash->tfm = fs_info->csum_shash;
+	crypto_shash_init(shash);
+	crypto_shash_update(shash, page_address(first_page) + first_off +
+			    BTRFS_CSUM_SIZE, fs_info->sectorsize - BTRFS_CSUM_SIZE);
+
+	for (int i = sector_nr + 1; i < sector_nr + sectors_per_tree; i++) {
+		struct page *page = scrub_stripe_get_page(stripe, i);
+		unsigned int page_off = scrub_stripe_get_page_offset(stripe, i);
+
+		crypto_shash_update(shash, page_address(page) + page_off,
+				    fs_info->sectorsize);
+	}
+	crypto_shash_final(shash, calculated_csum);
+	if (memcmp(calculated_csum, on_disk_csum, fs_info->csum_size) != 0) {
+		bitmap_set(&stripe->meta_error_bitmap, sector_nr,
+			   sectors_per_tree);
+		bitmap_set(&stripe->error_bitmap, sector_nr,
+			   sectors_per_tree);
+		btrfs_warn_rl(fs_info,
+		"tree block %llu mirror %u has bad csum, has " CSUM_FMT " want " CSUM_FMT,
+			      logical, stripe->mirror_num,
+			      CSUM_FMT_VALUE(fs_info->csum_size, on_disk_csum),
+			      CSUM_FMT_VALUE(fs_info->csum_size, calculated_csum));
+		return;
+	}
+	if (stripe->sectors[sector_nr].generation !=
+	    btrfs_stack_header_generation(header)) {
+		bitmap_set(&stripe->meta_error_bitmap, sector_nr,
+			   sectors_per_tree);
+		bitmap_set(&stripe->error_bitmap, sector_nr,
+			   sectors_per_tree);
+		btrfs_warn_rl(fs_info,
+		"tree block %llu mirror %u has bad geneartion, has %llu want %llu",
+			      logical, stripe->mirror_num,
+			      btrfs_stack_header_generation(header),
+			      stripe->sectors[sector_nr].generation);
+	}
+	bitmap_clear(&stripe->error_bitmap, sector_nr, sectors_per_tree);
+	bitmap_clear(&stripe->csum_error_bitmap, sector_nr, sectors_per_tree);
+	bitmap_clear(&stripe->meta_error_bitmap, sector_nr, sectors_per_tree);
+}
+
 static int scrub_checksum_tree_block(struct scrub_block *sblock)
 {
 	struct scrub_ctx *sctx = sblock->sctx;
diff --git a/fs/btrfs/scrub.h b/fs/btrfs/scrub.h
index 27019d86b539..0d8bdc7df89c 100644
--- a/fs/btrfs/scrub.h
+++ b/fs/btrfs/scrub.h
@@ -24,5 +24,6 @@ int scrub_find_fill_first_stripe(struct btrfs_block_group *bg,
 				 struct btrfs_device *dev, u64 physical,
 				 int mirror_num, u64 logical_start,
 				 u32 logical_len, struct scrub_stripe *stripe);
+void scrub_verify_one_metadata(struct scrub_stripe *stripe, int sector_nr);
 
 #endif
-- 
2.39.2


* [PATCH v7 08/13] btrfs: scrub: introduce a helper to verify one scrub_stripe
  2023-03-28 23:56 [PATCH v7 00/13] btrfs: scrub: use a more reader friendly code to implement scrub_simple_mirror() Qu Wenruo
                   ` (6 preceding siblings ...)
  2023-03-28 23:56 ` [PATCH v7 07/13] btrfs: scrub: introduce a helper to verify one metadata Qu Wenruo
@ 2023-03-28 23:56 ` Qu Wenruo
  2023-03-28 23:56 ` [PATCH v7 09/13] btrfs: scrub: introduce the main read repair worker for scrub_stripe Qu Wenruo
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Qu Wenruo @ 2023-03-28 23:56 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

The new helper, scrub_verify_one_stripe(), shares the same main
workflow as the old scrub code.

The major differences are:

- How pages/page_offset are grabbed
  Everything can be grabbed from scrub_stripe easily.

- When error reporting happens
  Currently the helper only verifies the sectors, without doing any
  error reporting.
  The error reporting will be done after we have done the repair.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/scrub.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/scrub.h |  2 +-
 2 files changed, 80 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 60d4d4ebb359..7fb661f5cb2b 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2176,7 +2176,7 @@ static unsigned int scrub_stripe_get_page_offset(struct scrub_stripe *stripe,
 	return offset_in_page(sector_nr << fs_info->sectorsize_bits);
 }
 
-void scrub_verify_one_metadata(struct scrub_stripe *stripe, int sector_nr)
+static void scrub_verify_one_metadata(struct scrub_stripe *stripe, int sector_nr)
 {
 	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
 	const unsigned int sectors_per_tree = fs_info->nodesize >>
@@ -2275,6 +2275,84 @@ void scrub_verify_one_metadata(struct scrub_stripe *stripe, int sector_nr)
 	bitmap_clear(&stripe->meta_error_bitmap, sector_nr, sectors_per_tree);
 }
 
+static void scrub_verify_one_sector(struct scrub_stripe *stripe,
+				    int sector_nr)
+{
+	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
+	struct scrub_sector_verification *sector = &stripe->sectors[sector_nr];
+	const unsigned int sectors_per_tree = fs_info->nodesize >>
+					      fs_info->sectorsize_bits;
+	struct page *page = scrub_stripe_get_page(stripe, sector_nr);
+	unsigned int pgoff = scrub_stripe_get_page_offset(stripe, sector_nr);
+	u8 csum_buf[BTRFS_CSUM_SIZE];
+	int ret;
+
+	ASSERT(sector_nr >= 0 && sector_nr < stripe->nr_sectors);
+
+	/* Sector not utilized, skip it. */
+	if (!test_bit(sector_nr, &stripe->extent_sector_bitmap))
+		return;
+
+	/* IO error, no need to check. */
+	if (test_bit(sector_nr, &stripe->io_error_bitmap))
+		return;
+
+	/* Metadata, verify the full tree block. */
+	if (sector->is_metadata) {
+		/*
+		 * Check if the tree block crosses the stripe boundary.
+		 * If it crosses the boundary, we can not verify it but
+		 * only give a warning.
+		 *
+		 * This can only happen in very old fs where chunks are not
+		 * ensured to be stripe aligned.
+		 */
+		if (unlikely(sector_nr + sectors_per_tree > stripe->nr_sectors)) {
+			btrfs_warn_rl(fs_info,
+			"tree block at %llu crosses stripe boundary %llu",
+				      stripe->logical +
+				      (sector_nr << fs_info->sectorsize_bits),
+				      stripe->logical);
+			return;
+		}
+		scrub_verify_one_metadata(stripe, sector_nr);
+		return;
+	}
+
+	/*
+	 * Data is much easier, we just verify the data csum (if we have one).
+	 * For cases without csum, we have no other choice but to trust it.
+	 */
+	if (!sector->csum) {
+		clear_bit(sector_nr, &stripe->error_bitmap);
+		return;
+	}
+
+	ret = btrfs_check_sector_csum(fs_info, page, pgoff, csum_buf, sector->csum);
+	if (ret < 0) {
+		set_bit(sector_nr, &stripe->csum_error_bitmap);
+		set_bit(sector_nr, &stripe->error_bitmap);
+	} else {
+		clear_bit(sector_nr, &stripe->csum_error_bitmap);
+		clear_bit(sector_nr, &stripe->error_bitmap);
+	}
+}
+
+/* Verify specified sectors of a stripe. */
+void scrub_verify_one_stripe(struct scrub_stripe *stripe, unsigned long bitmap)
+{
+	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
+	const unsigned int sectors_per_tree = fs_info->nodesize >>
+					      fs_info->sectorsize_bits;
+	int sector_nr;
+
+	for_each_set_bit(sector_nr, &bitmap, stripe->nr_sectors) {
+		scrub_verify_one_sector(stripe, sector_nr);
+		if (stripe->sectors[sector_nr].is_metadata)
+			sector_nr += sectors_per_tree - 1;
+	}
+}
+
 static int scrub_checksum_tree_block(struct scrub_block *sblock)
 {
 	struct scrub_ctx *sctx = sblock->sctx;
diff --git a/fs/btrfs/scrub.h b/fs/btrfs/scrub.h
index 0d8bdc7df89c..45ff7e149806 100644
--- a/fs/btrfs/scrub.h
+++ b/fs/btrfs/scrub.h
@@ -24,6 +24,6 @@ int scrub_find_fill_first_stripe(struct btrfs_block_group *bg,
 				 struct btrfs_device *dev, u64 physical,
 				 int mirror_num, u64 logical_start,
 				 u32 logical_len, struct scrub_stripe *stripe);
-void scrub_verify_one_metadata(struct scrub_stripe *stripe, int sector_nr);
+void scrub_verify_one_stripe(struct scrub_stripe *stripe, unsigned long bitmap);
 
 #endif
-- 
2.39.2


* [PATCH v7 09/13] btrfs: scrub: introduce the main read repair worker for scrub_stripe
  2023-03-28 23:56 [PATCH v7 00/13] btrfs: scrub: use a more reader friendly code to implement scrub_simple_mirror() Qu Wenruo
                   ` (7 preceding siblings ...)
  2023-03-28 23:56 ` [PATCH v7 08/13] btrfs: scrub: introduce a helper to verify one scrub_stripe Qu Wenruo
@ 2023-03-28 23:56 ` Qu Wenruo
  2023-03-28 23:56 ` [PATCH v7 10/13] btrfs: scrub: introduce a writeback helper " Qu Wenruo
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Qu Wenruo @ 2023-03-28 23:56 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

The new helper, scrub_stripe_read_repair_worker(), handles the
read-repair part:

- Wait for the previously submitted read IO to finish

- Verify the contents of the stripe

- Go through the remaining mirrors, using as large a blocksize as
  possible
  At this stage, we just read out all the failed sectors from each
  mirror and re-verify.
  If there are no more failed sectors, we can exit.

- Go through all mirrors again, sector-by-sector this time
  This time we read sector by sector, to address cases where one bad
  sector mismatches the drive's internal checksum and causes the whole
  read range to fail.

  We keep this recovery method as the last resort, as sector-by-sector
  reading is slow, and reads from other mirrors may have already fixed
  the errors.
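
A compressed sketch of the large-blocksize rotation described above,
using the helpers this patch introduces (error handling and the final
sector-by-sector pass are elided, and @num_copies would come from
btrfs_num_copies()):

	for (mirror = calc_next_mirror(stripe->mirror_num, num_copies);
	     mirror != stripe->mirror_num;
	     mirror = calc_next_mirror(mirror, num_copies)) {
		/* Re-read all still-failed sectors from this mirror. */
		scrub_stripe_submit_repair_read(stripe, mirror,
						BTRFS_STRIPE_LEN, false);
		wait_scrub_stripe_io(stripe);

		/* Re-verify only the sectors that were still bad. */
		scrub_verify_one_stripe(stripe, stripe->error_bitmap);
		if (bitmap_empty(&stripe->error_bitmap, stripe->nr_sectors))
			break;
	}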

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/scrub.c | 210 ++++++++++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/scrub.h |   3 +-
 2 files changed, 209 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 7fb661f5cb2b..d65286a6de5a 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -121,6 +121,7 @@ struct scrub_stripe {
 
 	atomic_t pending_io;
 	wait_queue_head_t io_wait;
+	wait_queue_head_t repair_wait;
 
 	/*
 	 * Indicates the states of the stripe.
@@ -156,6 +157,8 @@ struct scrub_stripe {
 	 * group.
 	 */
 	u8 *csums;
+
+	struct work_struct work;
 };
 
 struct scrub_recover {
@@ -381,6 +384,7 @@ int init_scrub_stripe(struct btrfs_fs_info *fs_info, struct scrub_stripe *stripe
 	stripe->state = 0;
 
 	init_waitqueue_head(&stripe->io_wait);
+	init_waitqueue_head(&stripe->repair_wait);
 	atomic_set(&stripe->pending_io, 0);
 
 	ret = btrfs_alloc_page_array(SCRUB_STRIPE_PAGES, stripe->pages);
@@ -403,7 +407,7 @@ int init_scrub_stripe(struct btrfs_fs_info *fs_info, struct scrub_stripe *stripe
 	return -ENOMEM;
 }
 
-void wait_scrub_stripe_io(struct scrub_stripe *stripe)
+static void wait_scrub_stripe_io(struct scrub_stripe *stripe)
 {
 	wait_event(stripe->io_wait, atomic_read(&stripe->pending_io) == 0);
 }
@@ -2339,7 +2343,8 @@ static void scrub_verify_one_sector(struct scrub_stripe *stripe,
 }
 
 /* Verify specified sectors of a stripe. */
-void scrub_verify_one_stripe(struct scrub_stripe *stripe, unsigned long bitmap)
+static void scrub_verify_one_stripe(struct scrub_stripe *stripe,
+				    unsigned long bitmap)
 {
 	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
 	const unsigned int sectors_per_tree = fs_info->nodesize >>
@@ -2353,6 +2358,207 @@ void scrub_verify_one_stripe(struct scrub_stripe *stripe, unsigned long bitmap)
 	}
 }
 
+static int calc_sector_number(struct scrub_stripe *stripe,
+			      struct bio_vec *first_bvec)
+{
+	int i;
+
+	for (i = 0; i < stripe->nr_sectors; i++) {
+		if (scrub_stripe_get_page(stripe, i) == first_bvec->bv_page &&
+		    scrub_stripe_get_page_offset(stripe, i) == first_bvec->bv_offset)
+			break;
+	}
+	ASSERT(i < stripe->nr_sectors);
+	return i;
+}
+
+/*
+ * Repair read differs from the regular read in the following ways:
+ *
+ * - Only reads the failed sectors
+ * - May have extra blocksize limits
+ */
+static void scrub_repair_read_endio(struct btrfs_bio *bbio)
+{
+	struct scrub_stripe *stripe = bbio->private;
+	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
+	struct bio_vec *bvec;
+	int sector_nr = calc_sector_number(stripe,
+					   bio_first_bvec_all(&bbio->bio));
+	int bio_size = 0;
+	int i;
+
+	ASSERT(sector_nr < stripe->nr_sectors);
+
+	bio_for_each_bvec_all(bvec, &bbio->bio, i)
+		bio_size += bvec->bv_len;
+
+	if (bbio->bio.bi_status) {
+		bitmap_set(&stripe->io_error_bitmap, sector_nr,
+			   bio_size >> fs_info->sectorsize_bits);
+		bitmap_set(&stripe->error_bitmap, sector_nr,
+			   bio_size >> fs_info->sectorsize_bits);
+	} else {
+		bitmap_clear(&stripe->io_error_bitmap, sector_nr,
+			     bio_size >> fs_info->sectorsize_bits);
+	}
+	bio_put(&bbio->bio);
+	if (atomic_dec_and_test(&stripe->pending_io))
+		wake_up(&stripe->io_wait);
+}
+
+static int calc_next_mirror(int mirror, int num_copies)
+{
+	ASSERT(mirror <= num_copies);
+	return (mirror + 1 > num_copies) ? 1 : mirror + 1;
+}
+
+static void scrub_stripe_submit_repair_read(struct scrub_stripe *stripe,
+					    int mirror, int blocksize,
+					    bool wait)
+{
+	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
+	struct btrfs_bio *bbio = NULL;
+	const unsigned long old_error_bitmap = stripe->error_bitmap;
+	int i;
+
+	ASSERT(stripe->mirror_num >= 1);
+	ASSERT(atomic_read(&stripe->pending_io) == 0);
+
+	for_each_set_bit(i, &old_error_bitmap, stripe->nr_sectors) {
+		struct page *page;
+		int pgoff;
+		int ret;
+
+		page = scrub_stripe_get_page(stripe, i);
+		pgoff = scrub_stripe_get_page_offset(stripe, i);
+
+		/* The current sector cannot be merged, submit the bio. */
+		if (bbio && ((i > 0 && !test_bit(i - 1, &stripe->error_bitmap)) ||
+			     bbio->bio.bi_iter.bi_size >= blocksize)) {
+			ASSERT(bbio->bio.bi_iter.bi_size);
+			atomic_inc(&stripe->pending_io);
+			btrfs_submit_scrub_read(bbio, mirror);
+			if (wait)
+				wait_scrub_stripe_io(stripe);
+			bbio = NULL;
+		}
+
+		if (!bbio) {
+			bbio = btrfs_scrub_bio_alloc(REQ_OP_READ, fs_info,
+					scrub_repair_read_endio, stripe);
+			bbio->bio.bi_iter.bi_sector = (stripe->logical +
+				(i << fs_info->sectorsize_bits)) >> SECTOR_SHIFT;
+		}
+
+		ret = bio_add_page(&bbio->bio, page, fs_info->sectorsize, pgoff);
+		ASSERT(ret == fs_info->sectorsize);
+	}
+	if (bbio) {
+		ASSERT(bbio->bio.bi_iter.bi_size);
+		atomic_inc(&stripe->pending_io);
+		btrfs_submit_scrub_read(bbio, mirror);
+		if (wait)
+			wait_scrub_stripe_io(stripe);
+	}
+}
+
+/*
+ * The main entry point for all read related scrub work, including:
+ *
+ * - Wait for the initial read to finish
+ * - Verify and locate any bad sectors
+ * - Go through the remaining mirrors and try to read as large blocksize as
+ *   possible
+ *
+ * - Go through all mirrors (including the failed mirror) sector-by-sector
+ *
+ * Writeback does not happen here, as it needs extra synchronization.
+ */
+static void scrub_stripe_read_repair_worker(struct work_struct *work)
+{
+	struct scrub_stripe *stripe = container_of(work, struct scrub_stripe,
+						   work);
+	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
+	int num_copies = btrfs_num_copies(fs_info, stripe->bg->start,
+					  stripe->bg->length);
+	int mirror;
+	int i;
+
+	ASSERT(stripe->mirror_num > 0);
+
+	wait_scrub_stripe_io(stripe);
+	scrub_verify_one_stripe(stripe, stripe->extent_sector_bitmap);
+	/* Save the initial failed bitmap for later repair and error reporting. */
+	stripe->init_error_bitmap = stripe->error_bitmap;
+
+	if (bitmap_empty(&stripe->init_error_bitmap, stripe->nr_sectors))
+		goto out;
+
+	/*
+	 * Try all remaining mirrors.
+	 *
+	 * Here we still try to read as large a block as possible, as this is
+	 * faster and we have extra safety nets to rely on.
+	 */
+	for (mirror = calc_next_mirror(stripe->mirror_num, num_copies);
+	     mirror != stripe->mirror_num;
+	     mirror = calc_next_mirror(mirror, num_copies)) {
+		const unsigned long old_error_bitmap = stripe->error_bitmap;
+
+		scrub_stripe_submit_repair_read(stripe, mirror,
+						BTRFS_STRIPE_LEN, false);
+		wait_scrub_stripe_io(stripe);
+		scrub_verify_one_stripe(stripe, old_error_bitmap);
+		if (bitmap_empty(&stripe->error_bitmap, stripe->nr_sectors))
+			goto out;
+	}
+
+	/*
+	 * Last safety net: try to re-check all mirrors, including the failed
+	 * one, sector by sector.
+	 *
+	 * If one sector fails the drive's internal csum, the whole read
+	 * containing the offending sector will be marked as an error.
+	 * Thus here we do sector-by-sector reads.
+	 *
+	 * This can be slow, thus we only try it as the last resort.
+	 */
+
+	for (i = 0, mirror = stripe->mirror_num; i < num_copies;
+	     i++, mirror = calc_next_mirror(mirror, num_copies)) {
+		const unsigned long old_error_bitmap = stripe->error_bitmap;
+
+		scrub_stripe_submit_repair_read(stripe, mirror,
+						fs_info->sectorsize, true);
+		wait_scrub_stripe_io(stripe);
+		scrub_verify_one_stripe(stripe, old_error_bitmap);
+		if (bitmap_empty(&stripe->error_bitmap, stripe->nr_sectors))
+			goto out;
+	}
+out:
+	set_bit(SCRUB_STRIPE_FLAG_REPAIR_DONE, &stripe->state);
+	wake_up(&stripe->repair_wait);
+}
+
+void scrub_read_endio(struct btrfs_bio *bbio)
+{
+	struct scrub_stripe *stripe = bbio->private;
+
+	if (bbio->bio.bi_status) {
+		bitmap_set(&stripe->io_error_bitmap, 0, stripe->nr_sectors);
+		bitmap_set(&stripe->error_bitmap, 0, stripe->nr_sectors);
+	} else {
+		bitmap_clear(&stripe->io_error_bitmap, 0, stripe->nr_sectors);
+	}
+	bio_put(&bbio->bio);
+	if (atomic_dec_and_test(&stripe->pending_io)) {
+		wake_up(&stripe->io_wait);
+		INIT_WORK(&stripe->work, scrub_stripe_read_repair_worker);
+		queue_work(stripe->bg->fs_info->scrub_workers, &stripe->work);
+	}
+}
+
 static int scrub_checksum_tree_block(struct scrub_block *sblock)
 {
 	struct scrub_ctx *sctx = sblock->sctx;
diff --git a/fs/btrfs/scrub.h b/fs/btrfs/scrub.h
index 45ff7e149806..bcc9d398fe07 100644
--- a/fs/btrfs/scrub.h
+++ b/fs/btrfs/scrub.h
@@ -19,11 +19,10 @@ int btrfs_scrub_progress(struct btrfs_fs_info *fs_info, u64 devid,
  */
 struct scrub_stripe;
 int init_scrub_stripe(struct btrfs_fs_info *fs_info, struct scrub_stripe *stripe);
-void wait_scrub_stripe_io(struct scrub_stripe *stripe);
 int scrub_find_fill_first_stripe(struct btrfs_block_group *bg,
 				 struct btrfs_device *dev, u64 physical,
 				 int mirror_num, u64 logical_start,
 				 u32 logical_len, struct scrub_stripe *stripe);
-void scrub_verify_one_stripe(struct scrub_stripe *stripe, unsigned long bitmap);
+void scrub_read_endio(struct btrfs_bio *bbio);
 
 #endif
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v7 10/13] btrfs: scrub: introduce a writeback helper for scrub_stripe
  2023-03-28 23:56 [PATCH v7 00/13] btrfs: scrub: use a more reader friendly code to implement scrub_simple_mirror() Qu Wenruo
                   ` (8 preceding siblings ...)
  2023-03-28 23:56 ` [PATCH v7 09/13] btrfs: scrub: introduce the main read repair worker for scrub_stripe Qu Wenruo
@ 2023-03-28 23:56 ` Qu Wenruo
  2023-03-28 23:56 ` [PATCH v7 11/13] btrfs: scrub: introduce error reporting functionality " Qu Wenruo
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Qu Wenruo @ 2023-03-28 23:56 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

Add a new helper, scrub_write_sectors(), to submit write bios for
specified sectors to the target disk.

There are several differences compared to read path:

- Utilize btrfs_submit_scrub_write()
  We still rely on the @mirror_num based writeback, but the
  requirement is a little different from regular writeback or
  read, thus we have to call btrfs_submit_scrub_write().

- We cannot write the full stripe back
  We can only write the sectors we have.
  There will be two call sites later: one for repaired sectors,
  and one for all the utilized sectors of dev-replace.

  Thus the callers must specify their own write_bitmap.

This function only submits the bios and does not wait for them to
finish, except in the zoned case.

The caller must explicitly wait for the IO to finish.
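
As an illustration, a hedged sketch of the expected caller pattern (the
repaired bitmap computation mirrors what a later patch in this series
does; the variable names here are only for the example):

	/* Write back only the sectors the repair pass managed to fix. */
	unsigned long repaired;

	bitmap_andnot(&repaired, &stripe->init_error_bitmap,
		      &stripe->error_bitmap, stripe->nr_sectors);
	scrub_write_sectors(sctx, stripe, repaired, false);

	/* scrub_write_sectors() only submits, the caller waits explicitly. */
	wait_scrub_stripe_io(stripe);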

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/scrub.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/scrub.h |  3 ++
 2 files changed, 98 insertions(+)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index d65286a6de5a..b6d2361a6ddb 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -152,6 +152,12 @@ struct scrub_stripe {
 	unsigned long csum_error_bitmap;
 	unsigned long meta_error_bitmap;
 
+	/* For writeback (repair or replace) error report. */
+	unsigned long write_error_bitmap;
+
+	/* Writeback can be concurrent, thus we need to protect the bitmap. */
+	spinlock_t write_error_lock;
+
 	/*
 	 * Checksum for the whole stripe if this stripe is inside a data block
 	 * group.
@@ -386,6 +392,7 @@ int init_scrub_stripe(struct btrfs_fs_info *fs_info, struct scrub_stripe *stripe
 	init_waitqueue_head(&stripe->io_wait);
 	init_waitqueue_head(&stripe->repair_wait);
 	atomic_set(&stripe->pending_io, 0);
+	spin_lock_init(&stripe->write_error_lock);
 
 	ret = btrfs_alloc_page_array(SCRUB_STRIPE_PAGES, stripe->pages);
 	if (ret < 0)
@@ -2559,6 +2566,94 @@ void scrub_read_endio(struct btrfs_bio *bbio)
 	}
 }
 
+static void scrub_write_endio(struct btrfs_bio *bbio)
+{
+	struct scrub_stripe *stripe = bbio->private;
+	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
+	struct bio_vec *bvec;
+	unsigned long flags;
+	int sector_nr = calc_sector_number(stripe, bio_first_bvec_all(&bbio->bio));
+	unsigned int bio_size = 0;
+	int i;
+
+	bio_for_each_bvec_all(bvec, &bbio->bio, i)
+		bio_size += bvec->bv_len;
+
+	if (bbio->bio.bi_status) {
+		spin_lock_irqsave(&stripe->write_error_lock, flags);
+		bitmap_set(&stripe->write_error_bitmap, sector_nr,
+				bio_size >> fs_info->sectorsize_bits);
+		spin_unlock_irqrestore(&stripe->write_error_lock, flags);
+	}
+	bio_put(&bbio->bio);
+
+	if (atomic_dec_and_test(&stripe->pending_io))
+		wake_up(&stripe->io_wait);
+}
+
+/*
+ * Submit the write bio(s) for the sectors specified by @write_bitmap.
+ *
+ * Here we utilize btrfs_submit_scrub_write(), which has some extra benefits:
+ *
+ * - Only needs logical bytenr and mirror_num
+ *   Just like the scrub read path
+ *
+ * - Only results in writes to the specified mirror
+ *   Unlike the regular writeback path, which would write back to all stripes
+ *
+ * - Handles dev-replace and read-repair writeback differently
+ */
+void scrub_write_sectors(struct scrub_ctx *sctx,
+			struct scrub_stripe *stripe,
+			unsigned long write_bitmap, bool dev_replace)
+{
+	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
+	struct btrfs_bio *bbio = NULL;
+	bool zoned = btrfs_is_zoned(fs_info);
+	int sector_nr;
+
+	for_each_set_bit(sector_nr, &write_bitmap, stripe->nr_sectors) {
+		struct page *page = scrub_stripe_get_page(stripe, sector_nr);
+		unsigned int pgoff = scrub_stripe_get_page_offset(stripe,
+								  sector_nr);
+		int ret;
+
+		/* We should only write back sectors covered by an extent. */
+		ASSERT(test_bit(sector_nr, &stripe->extent_sector_bitmap));
+
+		/* Cannot merge with the previous sector, submit the current one. */
+		if (bbio && sector_nr && !test_bit(sector_nr - 1, &write_bitmap)) {
+			fill_writer_pointer_gap(sctx, stripe->physical +
+					(sector_nr << fs_info->sectorsize_bits));
+			atomic_inc(&stripe->pending_io);
+			btrfs_submit_scrub_write(bbio, stripe->mirror_num,
+						 dev_replace);
+			/* For zoned writeback, queue depth must be 1. */
+			if (zoned)
+				wait_scrub_stripe_io(stripe);
+			bbio = NULL;
+		}
+		if (!bbio) {
+			bbio = btrfs_scrub_bio_alloc(REQ_OP_WRITE, fs_info,
+						     scrub_write_endio, stripe);
+			bbio->bio.bi_iter.bi_sector = (stripe->logical +
+				(sector_nr << fs_info->sectorsize_bits)) >>
+				SECTOR_SHIFT;
+		}
+		ret = bio_add_page(&bbio->bio, page, fs_info->sectorsize, pgoff);
+		ASSERT(ret == fs_info->sectorsize);
+	}
+	if (bbio) {
+		fill_writer_pointer_gap(sctx, bbio->bio.bi_iter.bi_sector <<
+					SECTOR_SHIFT);
+		atomic_inc(&stripe->pending_io);
+		btrfs_submit_scrub_write(bbio, stripe->mirror_num, dev_replace);
+		if (zoned)
+			wait_scrub_stripe_io(stripe);
+	}
+}
+
 static int scrub_checksum_tree_block(struct scrub_block *sblock)
 {
 	struct scrub_ctx *sctx = sblock->sctx;
diff --git a/fs/btrfs/scrub.h b/fs/btrfs/scrub.h
index bcc9d398fe07..3027d4c23ee8 100644
--- a/fs/btrfs/scrub.h
+++ b/fs/btrfs/scrub.h
@@ -24,5 +24,8 @@ int scrub_find_fill_first_stripe(struct btrfs_block_group *bg,
 				 int mirror_num, u64 logical_start,
 				 u32 logical_len, struct scrub_stripe *stripe);
 void scrub_read_endio(struct btrfs_bio *bbio);
+void scrub_write_sectors(struct scrub_ctx *sctx,
+			struct scrub_stripe *stripe,
+			unsigned long write_bitmap, bool dev_replace);
 
 #endif
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v7 11/13] btrfs: scrub: introduce error reporting functionality for scrub_stripe
  2023-03-28 23:56 [PATCH v7 00/13] btrfs: scrub: use a more reader friendly code to implement scrub_simple_mirror() Qu Wenruo
                   ` (9 preceding siblings ...)
  2023-03-28 23:56 ` [PATCH v7 10/13] btrfs: scrub: introduce a writeback helper " Qu Wenruo
@ 2023-03-28 23:56 ` Qu Wenruo
  2023-03-28 23:56 ` [PATCH v7 12/13] btrfs: scrub: introduce the helper to queue a stripe for scrub Qu Wenruo
  2023-03-28 23:56 ` [PATCH v7 13/13] btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure Qu Wenruo
  12 siblings, 0 replies; 32+ messages in thread
From: Qu Wenruo @ 2023-03-28 23:56 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

The new helper, scrub_stripe_report_errors(), will report the result of the
scrub to dmesg.

The main reporting is done by introducing a new helper,
scrub_print_common_warning(), which has mostly the same content as
scrub_print_warning(), but without the need for a scrub_block.

Since we're reporting the errors, it's also the perfect time to update
the scrub stats.
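
As a reference for how a sector is classified, a small hedged sketch
(the helper name is made up for the example; the patch open-codes this
check):

	/*
	 * A sector counts as repaired if it was bad after the initial
	 * read (init_error_bitmap) but is no longer bad after all the
	 * repair attempts (error_bitmap).
	 */
	static bool sector_repaired(struct scrub_stripe *stripe, int sector_nr)
	{
		return test_bit(sector_nr, &stripe->init_error_bitmap) &&
		       !test_bit(sector_nr, &stripe->error_bitmap);
	}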

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/scrub.c | 167 ++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 158 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index b6d2361a6ddb..7c331316ad92 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -105,6 +105,7 @@ enum scrub_stripe_flags {
  * Represent one continuous range with a length of BTRFS_STRIPE_LEN.
  */
 struct scrub_stripe {
+	struct scrub_ctx *sctx;
 	struct btrfs_block_group *bg;
 
 	struct page *pages[SCRUB_STRIPE_PAGES];
@@ -119,6 +120,13 @@ struct scrub_stripe {
 	/* Should be BTRFS_STRIPE_LEN / sectorsize. */
 	u16 nr_sectors;
 
+	/*
+	 * How many data/meta extents are in this stripe.
+	 * Only for scrub stat report purpose.
+	 */
+	u16 nr_data_extents;
+	u16 nr_meta_extents;
+
 	atomic_t pending_io;
 	wait_queue_head_t io_wait;
 	wait_queue_head_t repair_wait;
@@ -377,6 +385,7 @@ static void release_scrub_stripe(struct scrub_stripe *stripe)
 	kfree(stripe->csums);
 	stripe->sectors = NULL;
 	stripe->csums = NULL;
+	stripe->sctx = NULL;
 	stripe->state = 0;
 }
 
@@ -1046,9 +1055,9 @@ static int scrub_print_warning_inode(u64 inum, u64 offset, u64 num_bytes,
 	return 0;
 }
 
-static void scrub_print_warning(const char *errstr, struct scrub_block *sblock)
+static void scrub_print_common_warning(const char *errstr, struct btrfs_device *dev,
+				       bool is_super, u64 logical, u64 physical)
 {
-	struct btrfs_device *dev;
 	struct btrfs_fs_info *fs_info;
 	struct btrfs_path *path;
 	struct btrfs_key found_key;
@@ -1062,22 +1071,20 @@ static void scrub_print_warning(const char *errstr, struct scrub_block *sblock)
 	u8 ref_level = 0;
 	int ret;
 
-	WARN_ON(sblock->sector_count < 1);
-	dev = sblock->dev;
-	fs_info = sblock->sctx->fs_info;
+	fs_info = dev->fs_info;
 
 	/* Super block error, no need to search extent tree. */
-	if (sblock->sectors[0]->flags & BTRFS_EXTENT_FLAG_SUPER) {
+	if (is_super) {
 		btrfs_warn_in_rcu(fs_info, "%s on device %s, physical %llu",
-			errstr, btrfs_dev_name(dev), sblock->physical);
+			errstr, btrfs_dev_name(dev), physical);
 		return;
 	}
 	path = btrfs_alloc_path();
 	if (!path)
 		return;
 
-	swarn.physical = sblock->physical;
-	swarn.logical = sblock->logical;
+	swarn.physical = physical;
+	swarn.logical = logical;
 	swarn.errstr = errstr;
 	swarn.dev = NULL;
 
@@ -1126,6 +1133,13 @@ static void scrub_print_warning(const char *errstr, struct scrub_block *sblock)
 	btrfs_free_path(path);
 }
 
+static void scrub_print_warning(const char *errstr, struct scrub_block *sblock)
+{
+	scrub_print_common_warning(errstr, sblock->dev,
+			sblock->sectors[0]->flags & BTRFS_EXTENT_FLAG_SUPER,
+			sblock->logical, sblock->physical);
+}
+
 static inline void scrub_get_recover(struct scrub_recover *recover)
 {
 	refcount_inc(&recover->refs);
@@ -2470,6 +2484,132 @@ static void scrub_stripe_submit_repair_read(struct scrub_stripe *stripe,
 	}
 }
 
+static void scrub_stripe_report_errors(struct scrub_ctx *sctx,
+				       struct scrub_stripe *stripe)
+{
+	static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
+				      DEFAULT_RATELIMIT_BURST);
+	struct btrfs_fs_info *fs_info = sctx->fs_info;
+	struct btrfs_device *dev = NULL;
+	u64 physical = 0;
+	int nr_data_sectors = 0;
+	int nr_meta_sectors = 0;
+	int nr_nodatacsum_sectors = 0;
+	int nr_repaired_sectors = 0;
+	int sector_nr;
+
+	/*
+	 * Init the needed info for error reporting.
+	 *
+	 * Although our scrub_stripe infrastructure is mostly based on
+	 * btrfs_submit_bio(), error reporting still needs dev and physical.
+	 */
+	if (!bitmap_empty(&stripe->init_error_bitmap, stripe->nr_sectors)) {
+		u64 mapped_len = fs_info->sectorsize;
+		struct btrfs_io_context *bioc = NULL;
+		int stripe_index = stripe->mirror_num - 1;
+		int ret;
+
+		/* For scrub, our mirror_num should always start at 1. */
+		ASSERT(stripe->mirror_num >= 1);
+		ret = btrfs_map_sblock(fs_info, BTRFS_MAP_GET_READ_MIRRORS,
+				      stripe->logical, &mapped_len, &bioc);
+		/*
+		 * If we failed, dev will be NULL, and later detailed reports
+		 * will just be skipped.
+		 */
+		if (ret < 0)
+			goto skip;
+		physical = bioc->stripes[stripe_index].physical;
+		dev = bioc->stripes[stripe_index].dev;
+		btrfs_put_bioc(bioc);
+	}
+
+skip:
+	for_each_set_bit(sector_nr, &stripe->extent_sector_bitmap,
+			 stripe->nr_sectors) {
+		bool repaired = false;
+
+		if (stripe->sectors[sector_nr].is_metadata) {
+			nr_meta_sectors++;
+		} else {
+			nr_data_sectors++;
+			if (!stripe->sectors[sector_nr].csum)
+				nr_nodatacsum_sectors++;
+		}
+
+		if (test_bit(sector_nr, &stripe->init_error_bitmap) &&
+		    !test_bit(sector_nr, &stripe->error_bitmap)) {
+			nr_repaired_sectors++;
+			repaired = true;
+		}
+
+		/* Good sector from the beginning, nothing needs to be done. */
+		if (!test_bit(sector_nr, &stripe->init_error_bitmap))
+			continue;
+
+		/*
+		 * Report errors for the corrupted sectors.
+		 * If repaired, just output the repaired message.
+		 */
+		if (repaired) {
+			if (dev)
+				btrfs_err_rl_in_rcu(fs_info,
+			"fixed up error at logical %llu on dev %s physical %llu",
+					    stripe->logical, btrfs_dev_name(dev),
+					    physical);
+			else
+				btrfs_err_rl_in_rcu(fs_info,
+			"fixed up error at logical %llu on mirror %u",
+					    stripe->logical, stripe->mirror_num);
+			continue;
+		}
+
+		/* The remaining cases are all unrepaired sectors. */
+		if (dev)
+			btrfs_err_rl_in_rcu(fs_info,
+	"unable to fixup (regular) error at logical %llu on dev %s physical %llu",
+					    stripe->logical, btrfs_dev_name(dev),
+					    physical);
+		else
+			btrfs_err_rl_in_rcu(fs_info,
+	"unable to fixup (regular) error at logical %llu on mirror %u",
+					    stripe->logical, stripe->mirror_num);
+
+		if (test_bit(sector_nr, &stripe->io_error_bitmap))
+			if (__ratelimit(&rs) && dev)
+				scrub_print_common_warning("i/o error", dev, false,
+						     stripe->logical, physical);
+		if (test_bit(sector_nr, &stripe->csum_error_bitmap))
+			if (__ratelimit(&rs) && dev)
+				scrub_print_common_warning("checksum error", dev, false,
+						     stripe->logical, physical);
+		if (test_bit(sector_nr, &stripe->meta_error_bitmap))
+			if (__ratelimit(&rs) && dev)
+				scrub_print_common_warning("header error", dev, false,
+						     stripe->logical, physical);
+	}
+
+	spin_lock(&sctx->stat_lock);
+	sctx->stat.data_extents_scrubbed += stripe->nr_data_extents;
+	sctx->stat.tree_extents_scrubbed += stripe->nr_meta_extents;
+	sctx->stat.data_bytes_scrubbed += nr_data_sectors <<
+					  fs_info->sectorsize_bits;
+	sctx->stat.tree_bytes_scrubbed += nr_meta_sectors <<
+					  fs_info->sectorsize_bits;
+	sctx->stat.no_csum += nr_nodatacsum_sectors;
+	sctx->stat.read_errors +=
+		bitmap_weight(&stripe->io_error_bitmap, stripe->nr_sectors);
+	sctx->stat.csum_errors +=
+		bitmap_weight(&stripe->csum_error_bitmap, stripe->nr_sectors);
+	sctx->stat.verify_errors +=
+		bitmap_weight(&stripe->meta_error_bitmap, stripe->nr_sectors);
+	sctx->stat.uncorrectable_errors +=
+		bitmap_weight(&stripe->error_bitmap, stripe->nr_sectors);
+	sctx->stat.corrected_errors += nr_repaired_sectors;
+	spin_unlock(&sctx->stat_lock);
+}
+
 /*
  * The main entrance for all read related scrub work, including:
  *
@@ -2544,6 +2684,7 @@ static void scrub_stripe_read_repair_worker(struct work_struct *work)
 			goto out;
 	}
 out:
+	scrub_stripe_report_errors(stripe->sctx, stripe);
 	set_bit(SCRUB_STRIPE_FLAG_REPAIR_DONE, &stripe->state);
 	wake_up(&stripe->repair_wait);
 }
@@ -4213,6 +4354,10 @@ int scrub_find_fill_first_stripe(struct btrfs_block_group *bg,
 		goto out;
 	get_extent_info(&path, &extent_start, &extent_len,
 			&extent_flags, &extent_gen);
+	if (extent_flags & BTRFS_EXTENT_FLAG_TREE_BLOCK)
+		stripe->nr_meta_extents++;
+	if (extent_flags & BTRFS_EXTENT_FLAG_DATA)
+		stripe->nr_data_extents++;
 	cur_logical = max(extent_start, cur_logical);
 
 	/*
@@ -4246,6 +4391,10 @@ int scrub_find_fill_first_stripe(struct btrfs_block_group *bg,
 		}
 		get_extent_info(&path, &extent_start, &extent_len,
 				&extent_flags, &extent_gen);
+		if (extent_flags & BTRFS_EXTENT_FLAG_TREE_BLOCK)
+			stripe->nr_meta_extents++;
+		if (extent_flags & BTRFS_EXTENT_FLAG_DATA)
+			stripe->nr_data_extents++;
 		fill_one_extent_info(fs_info, stripe, extent_start, extent_len,
 				     extent_flags, extent_gen);
 		cur_logical = extent_start + extent_len;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v7 12/13] btrfs: scrub: introduce the helper to queue a stripe for scrub
  2023-03-28 23:56 [PATCH v7 00/13] btrfs: scrub: use a more reader friendly code to implement scrub_simple_mirror() Qu Wenruo
                   ` (10 preceding siblings ...)
  2023-03-28 23:56 ` [PATCH v7 11/13] btrfs: scrub: introduce error reporting functionality " Qu Wenruo
@ 2023-03-28 23:56 ` Qu Wenruo
  2023-03-28 23:56 ` [PATCH v7 13/13] btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure Qu Wenruo
  12 siblings, 0 replies; 32+ messages in thread
From: Qu Wenruo @ 2023-03-28 23:56 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

The new helper, queue_scrub_stripe(), tries to queue a stripe for
scrub.
If all stripes are already in use, we submit all the existing stripes
and wait for them to finish.

Currently we queue up to 8 stripes, enlarging the submitted size to
512KiB to improve performance.
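
For illustration, a hedged sketch of the intended caller pattern (a
later patch in this series wires scrub_simple_mirror() up this way):

	while (cur_logical < logical_end) {
		ret = queue_scrub_stripe(sctx, bg, dev, mirror_num,
					 cur_logical,
					 logical_end - cur_logical,
					 cur_physical);
		if (ret)	/* >0: no more extents, <0: error */
			break;
		/* Advance past the stripe we just queued. */
		cur_logical = sctx->stripes[sctx->cur_stripe - 1].logical +
			      BTRFS_STRIPE_LEN;
	}
	/* Flush whatever is still queued at the end. */
	flush_scrub_stripes(sctx);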

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/scrub.c | 187 ++++++++++++++++++++++++++++++++++++++++++++---
 fs/btrfs/scrub.h |  13 +---
 2 files changed, 182 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 7c331316ad92..0e390fe108d3 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -50,6 +50,7 @@ struct scrub_ctx;
  */
 #define SCRUB_SECTORS_PER_BIO	32	/* 128KiB per bio for 4KiB pages */
 #define SCRUB_BIOS_PER_SCTX	64	/* 8MiB per device in flight for 4KiB pages */
+#define SCRUB_STRIPES_PER_SCTX	8	/* That is 8 64K stripes per device. */
 
 /*
  * The following value times PAGE_SIZE needs to be large enough to match the
@@ -277,9 +278,11 @@ struct scrub_parity {
 
 struct scrub_ctx {
 	struct scrub_bio	*bios[SCRUB_BIOS_PER_SCTX];
+	struct scrub_stripe	stripes[SCRUB_STRIPES_PER_SCTX];
 	struct btrfs_fs_info	*fs_info;
 	int			first_free;
 	int			curr;
+	int			cur_stripe;
 	atomic_t		bios_in_flight;
 	atomic_t		workers_pending;
 	spinlock_t		list_lock;
@@ -389,7 +392,8 @@ static void release_scrub_stripe(struct scrub_stripe *stripe)
 	stripe->state = 0;
 }
 
-int init_scrub_stripe(struct btrfs_fs_info *fs_info, struct scrub_stripe *stripe)
+static int init_scrub_stripe(struct btrfs_fs_info *fs_info,
+			     struct scrub_stripe *stripe)
 {
 	int ret;
 
@@ -895,6 +899,9 @@ static noinline_for_stack void scrub_free_ctx(struct scrub_ctx *sctx)
 		kfree(sbio);
 	}
 
+	for (i = 0; i < SCRUB_STRIPES_PER_SCTX; i++)
+		release_scrub_stripe(&sctx->stripes[i]);
+
 	kfree(sctx->wr_curr_bio);
 	scrub_free_csums(sctx);
 	kfree(sctx);
@@ -939,6 +946,14 @@ static noinline_for_stack struct scrub_ctx *scrub_setup_ctx(
 		else
 			sctx->bios[i]->next_free = -1;
 	}
+	for (i = 0; i < SCRUB_STRIPES_PER_SCTX; i++) {
+		int ret;
+
+		ret = init_scrub_stripe(fs_info, &sctx->stripes[i]);
+		if (ret < 0)
+			goto nomem;
+		sctx->stripes[i].sctx = sctx;
+	}
 	sctx->first_free = 0;
 	atomic_set(&sctx->bios_in_flight, 0);
 	atomic_set(&sctx->workers_pending, 0);
@@ -2689,7 +2704,7 @@ static void scrub_stripe_read_repair_worker(struct work_struct *work)
 	wake_up(&stripe->repair_wait);
 }
 
-void scrub_read_endio(struct btrfs_bio *bbio)
+static void scrub_read_endio(struct btrfs_bio *bbio)
 {
 	struct scrub_stripe *stripe = bbio->private;
 
@@ -2745,9 +2760,9 @@ static void scrub_write_endio(struct btrfs_bio *bbio)
  *
  * - Handle dev-replace and read-repair writeback differently
  */
-void scrub_write_sectors(struct scrub_ctx *sctx,
-			struct scrub_stripe *stripe,
-			unsigned long write_bitmap, bool dev_replace)
+static void scrub_write_sectors(struct scrub_ctx *sctx,
+				struct scrub_stripe *stripe,
+				unsigned long write_bitmap, bool dev_replace)
 {
 	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
 	struct btrfs_bio *bbio = NULL;
@@ -4319,10 +4334,11 @@ static void scrub_stripe_reset_bitmaps(struct scrub_stripe *stripe)
  * Return >0 if there is no such stripe in the specified range.
  * Return <0 for error.
  */
-int scrub_find_fill_first_stripe(struct btrfs_block_group *bg,
-				 struct btrfs_device *dev, u64 physical,
-				 int mirror_num, u64 logical_start,
-				 u32 logical_len, struct scrub_stripe *stripe)
+static int scrub_find_fill_first_stripe(struct btrfs_block_group *bg,
+					struct btrfs_device *dev, u64 physical,
+					int mirror_num, u64 logical_start,
+					u32 logical_len,
+					struct scrub_stripe *stripe)
 {
 	struct btrfs_fs_info *fs_info = bg->fs_info;
 	struct btrfs_root *extent_root = btrfs_extent_root(fs_info, bg->start);
@@ -4434,6 +4450,159 @@ int scrub_find_fill_first_stripe(struct btrfs_block_group *bg,
 	return ret;
 }
 
+static void scrub_reset_stripe(struct scrub_stripe *stripe)
+{
+	scrub_stripe_reset_bitmaps(stripe);
+
+	stripe->nr_meta_extents = 0;
+	stripe->nr_data_extents = 0;
+	stripe->state = 0;
+
+	for (int i = 0; i < stripe->nr_sectors; i++) {
+		stripe->sectors[i].is_metadata = false;
+		stripe->sectors[i].csum = NULL;
+		stripe->sectors[i].generation = 0;
+	}
+}
+
+static void scrub_submit_initial_read(struct scrub_ctx *sctx,
+				      struct scrub_stripe *stripe)
+{
+	struct btrfs_fs_info *fs_info = sctx->fs_info;
+	struct btrfs_bio *bbio;
+	int mirror = stripe->mirror_num;
+
+	ASSERT(stripe->bg);
+	ASSERT(stripe->mirror_num > 0);
+	ASSERT(test_bit(SCRUB_STRIPE_FLAG_INITIALIZED, &stripe->state));
+
+	bbio = btrfs_scrub_bio_alloc(REQ_OP_READ, fs_info, scrub_read_endio,
+				     stripe);
+
+	/* Read the whole stripe. */
+	bbio->bio.bi_iter.bi_sector = stripe->logical >> SECTOR_SHIFT;
+	for (int i = 0; i < BTRFS_STRIPE_LEN >> PAGE_SHIFT; i++) {
+		int ret;
+
+		ret = bio_add_page(&bbio->bio, stripe->pages[i], PAGE_SIZE, 0);
+		/* We should have allocated enough bio vectors. */
+		ASSERT(ret == PAGE_SIZE);
+	}
+	atomic_inc(&stripe->pending_io);
+
+	/*
+	 * For dev-replace, if the user asks to avoid the source dev, or
+	 * the device is missing, we try the next mirror instead.
+	 */
+	if (sctx->is_dev_replace &&
+	    (fs_info->dev_replace.cont_reading_from_srcdev_mode ==
+	     BTRFS_DEV_REPLACE_ITEM_CONT_READING_FROM_SRCDEV_MODE_AVOID ||
+	     !stripe->dev->bdev)) {
+		int num_copies = btrfs_num_copies(fs_info, stripe->bg->start,
+						  stripe->bg->length);
+		mirror = calc_next_mirror(mirror, num_copies);
+	}
+	btrfs_submit_scrub_read(bbio, mirror);
+}
+
+static void flush_scrub_stripes(struct scrub_ctx *sctx)
+{
+	struct btrfs_fs_info *fs_info = sctx->fs_info;
+	struct scrub_stripe *stripe;
+	const int nr_stripes = sctx->cur_stripe;
+
+	if (!nr_stripes)
+		return;
+
+	ASSERT(test_bit(SCRUB_STRIPE_FLAG_INITIALIZED, &sctx->stripes[0].state));
+	for (int i = 0; i < nr_stripes; i++) {
+		stripe = &sctx->stripes[i];
+		scrub_submit_initial_read(sctx, stripe);
+	}
+
+	for (int i = 0; i < nr_stripes; i++) {
+		stripe = &sctx->stripes[i];
+
+		wait_event(stripe->repair_wait,
+			   test_bit(SCRUB_STRIPE_FLAG_REPAIR_DONE,
+				    &stripe->state));
+	}
+
+	/*
+	 * Submit the repaired sectors.
+	 * For the zoned case, we cannot do the repair in place, but
+	 * instead queue the bg to be relocated.
+	 */
+	if (btrfs_is_zoned(fs_info)) {
+		for (int i = 0; i < nr_stripes; i++) {
+			stripe = &sctx->stripes[i];
+
+			if (!bitmap_empty(&stripe->error_bitmap, stripe->nr_sectors)) {
+				btrfs_repair_one_zone(fs_info, sctx->stripes[0].bg->start);
+				break;
+			}
+		}
+	} else {
+		for (int i = 0; i < nr_stripes; i++) {
+			unsigned long repaired;
+
+			stripe = &sctx->stripes[i];
+
+			bitmap_andnot(&repaired, &stripe->init_error_bitmap,
+				      &stripe->error_bitmap, stripe->nr_sectors);
+			scrub_write_sectors(sctx, stripe, repaired, false);
+		}
+	}
+
+	/* Submit for dev-replace. */
+	if (sctx->is_dev_replace) {
+		for (int i = 0; i < nr_stripes; i++) {
+			unsigned long good;
+
+			stripe = &sctx->stripes[i];
+
+			ASSERT(stripe->dev == fs_info->dev_replace.srcdev);
+
+			bitmap_andnot(&good, &stripe->extent_sector_bitmap,
+				      &stripe->error_bitmap, stripe->nr_sectors);
+			scrub_write_sectors(sctx, stripe, good, true);
+		}
+	}
+
+	/* Wait for above writebacks to finish. */
+	for (int i = 0; i < nr_stripes; i++) {
+		stripe = &sctx->stripes[i];
+
+		wait_scrub_stripe_io(stripe);
+		scrub_reset_stripe(stripe);
+	}
+	sctx->cur_stripe = 0;
+}
+
+int queue_scrub_stripe(struct scrub_ctx *sctx,
+		       struct btrfs_block_group *bg,
+		       struct btrfs_device *dev, int mirror_num,
+		       u64 logical, u32 length, u64 physical)
+{
+	struct scrub_stripe *stripe;
+	int ret;
+
+	/* No available slot, submit all stripes and wait for them. */
+	if (sctx->cur_stripe >= SCRUB_STRIPES_PER_SCTX)
+		flush_scrub_stripes(sctx);
+
+	stripe = &sctx->stripes[sctx->cur_stripe];
+
+	/* We can queue one stripe using the remaining slot. */
+	scrub_reset_stripe(stripe);
+	ret = scrub_find_fill_first_stripe(bg, dev, physical, mirror_num,
+					   logical, length, stripe);
+	/* Either >0 for no more extents or <0 for error. */
+	if (ret)
+		return ret;
+	sctx->cur_stripe++;
+	return 0;
+}
 
 /*
  * Scrub one range which can only has simple mirror based profile.
diff --git a/fs/btrfs/scrub.h b/fs/btrfs/scrub.h
index 3027d4c23ee8..fb9d906f5a17 100644
--- a/fs/btrfs/scrub.h
+++ b/fs/btrfs/scrub.h
@@ -18,14 +18,9 @@ int btrfs_scrub_progress(struct btrfs_fs_info *fs_info, u64 devid,
  * static functions.
  */
 struct scrub_stripe;
-int init_scrub_stripe(struct btrfs_fs_info *fs_info, struct scrub_stripe *stripe);
-int scrub_find_fill_first_stripe(struct btrfs_block_group *bg,
-				 struct btrfs_device *dev, u64 physical,
-				 int mirror_num, u64 logical_start,
-				 u32 logical_len, struct scrub_stripe *stripe);
-void scrub_read_endio(struct btrfs_bio *bbio);
-void scrub_write_sectors(struct scrub_ctx *sctx,
-			struct scrub_stripe *stripe,
-			unsigned long write_bitmap, bool dev_replace);
+int queue_scrub_stripe(struct scrub_ctx *sctx,
+		       struct btrfs_block_group *bg,
+		       struct btrfs_device *dev, int mirror_num,
+		       u64 logical, u32 length, u64 physical);
 
 #endif
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v7 13/13] btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure
  2023-03-28 23:56 [PATCH v7 00/13] btrfs: scrub: use a more reader friendly code to implement scrub_simple_mirror() Qu Wenruo
                   ` (11 preceding siblings ...)
  2023-03-28 23:56 ` [PATCH v7 12/13] btrfs: scrub: introduce the helper to queue a stripe for scrub Qu Wenruo
@ 2023-03-28 23:56 ` Qu Wenruo
  12 siblings, 0 replies; 32+ messages in thread
From: Qu Wenruo @ 2023-03-28 23:56 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

Switch scrub_simple_mirror() to the new scrub_stripe infrastructure.

Since scrub_simple_mirror() is the core part of scrub (only RAID56
P/Q stripes don't utilize it), we can get rid of a big hunk of code,
mostly scrub_extent() and scrub_sectors().

There is a functionality change:

- Scrub speed throttle now only affects reads on the scrubbing device
  Writes (for repair and replace) and reads from other mirrors are
  not throttled.
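
For reference, with this change the throttling happens once per batch in
flush_scrub_stripes(), right before the initial reads are submitted (as
in the hunk below):

	scrub_throttle_dev_io(sctx, sctx->stripes[0].dev,
			      nr_stripes << BTRFS_STRIPE_LEN_SHIFT);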

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/scrub.c | 494 +++--------------------------------------------
 fs/btrfs/scrub.h |  10 -
 2 files changed, 30 insertions(+), 474 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 0e390fe108d3..42e18cc905b8 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -582,10 +582,6 @@ static void scrub_sector_get(struct scrub_sector *sector);
 static void scrub_sector_put(struct scrub_sector *sector);
 static void scrub_parity_get(struct scrub_parity *sparity);
 static void scrub_parity_put(struct scrub_parity *sparity);
-static int scrub_sectors(struct scrub_ctx *sctx, u64 logical, u32 len,
-			 u64 physical, struct btrfs_device *dev, u64 flags,
-			 u64 gen, int mirror_num, u8 *csum,
-			 u64 physical_for_dev_replace);
 static void scrub_bio_end_io(struct bio *bio);
 static void scrub_bio_end_io_worker(struct work_struct *work);
 static void scrub_block_complete(struct scrub_block *sblock);
@@ -2975,22 +2971,16 @@ static void scrub_sector_put(struct scrub_sector *sector)
 		kfree(sector);
 }
 
-/*
- * Throttling of IO submission, bandwidth-limit based, the timeslice is 1
- * second.  Limit can be set via /sys/fs/UUID/devinfo/devid/scrub_speed_max.
- */
-static void scrub_throttle(struct scrub_ctx *sctx)
+static void scrub_throttle_dev_io(struct scrub_ctx *sctx,
+				  struct btrfs_device *device,
+				  unsigned int bio_size)
 {
 	const int time_slice = 1000;
-	struct scrub_bio *sbio;
-	struct btrfs_device *device;
 	s64 delta;
 	ktime_t now;
 	u32 div;
 	u64 bwlimit;
 
-	sbio = sctx->bios[sctx->curr];
-	device = sbio->dev;
 	bwlimit = READ_ONCE(device->scrub_speed_max);
 	if (bwlimit == 0)
 		return;
@@ -3012,7 +3002,7 @@ static void scrub_throttle(struct scrub_ctx *sctx)
 	/* Still in the time to send? */
 	if (ktime_before(now, sctx->throttle_deadline)) {
 		/* If current bio is within the limit, send it */
-		sctx->throttle_sent += sbio->bio->bi_iter.bi_size;
+		sctx->throttle_sent += bio_size;
 		if (sctx->throttle_sent <= div_u64(bwlimit, div))
 			return;
 
@@ -3034,6 +3024,17 @@ static void scrub_throttle(struct scrub_ctx *sctx)
 	sctx->throttle_deadline = 0;
 }
 
+/*
+ * Throttling of IO submission, bandwidth-limit based, the timeslice is 1
+ * second.  Limit can be set via /sys/fs/UUID/devinfo/devid/scrub_speed_max.
+ */
+static void scrub_throttle(struct scrub_ctx *sctx)
+{
+	struct scrub_bio *sbio = sctx->bios[sctx->curr];
+
+	scrub_throttle_dev_io(sctx, sbio->dev, sbio->bio->bi_iter.bi_size);
+}
+
 static void scrub_submit(struct scrub_ctx *sctx)
 {
 	struct scrub_bio *sbio;
@@ -3118,202 +3119,6 @@ static int scrub_add_sector_to_rd_bio(struct scrub_ctx *sctx,
 	return 0;
 }
 
-static void scrub_missing_raid56_end_io(struct bio *bio)
-{
-	struct scrub_block *sblock = bio->bi_private;
-	struct btrfs_fs_info *fs_info = sblock->sctx->fs_info;
-
-	btrfs_bio_counter_dec(fs_info);
-	if (bio->bi_status)
-		sblock->no_io_error_seen = 0;
-
-	bio_put(bio);
-
-	queue_work(fs_info->scrub_workers, &sblock->work);
-}
-
-static void scrub_missing_raid56_worker(struct work_struct *work)
-{
-	struct scrub_block *sblock = container_of(work, struct scrub_block, work);
-	struct scrub_ctx *sctx = sblock->sctx;
-	struct btrfs_fs_info *fs_info = sctx->fs_info;
-	u64 logical;
-	struct btrfs_device *dev;
-
-	logical = sblock->logical;
-	dev = sblock->dev;
-
-	if (sblock->no_io_error_seen)
-		scrub_recheck_block_checksum(sblock);
-
-	if (!sblock->no_io_error_seen) {
-		spin_lock(&sctx->stat_lock);
-		sctx->stat.read_errors++;
-		spin_unlock(&sctx->stat_lock);
-		btrfs_err_rl_in_rcu(fs_info,
-			"IO error rebuilding logical %llu for dev %s",
-			logical, btrfs_dev_name(dev));
-	} else if (sblock->header_error || sblock->checksum_error) {
-		spin_lock(&sctx->stat_lock);
-		sctx->stat.uncorrectable_errors++;
-		spin_unlock(&sctx->stat_lock);
-		btrfs_err_rl_in_rcu(fs_info,
-			"failed to rebuild valid logical %llu for dev %s",
-			logical, btrfs_dev_name(dev));
-	} else {
-		scrub_write_block_to_dev_replace(sblock);
-	}
-
-	if (sctx->is_dev_replace && sctx->flush_all_writes) {
-		mutex_lock(&sctx->wr_lock);
-		scrub_wr_submit(sctx);
-		mutex_unlock(&sctx->wr_lock);
-	}
-
-	scrub_block_put(sblock);
-	scrub_pending_bio_dec(sctx);
-}
-
-static void scrub_missing_raid56_pages(struct scrub_block *sblock)
-{
-	struct scrub_ctx *sctx = sblock->sctx;
-	struct btrfs_fs_info *fs_info = sctx->fs_info;
-	u64 length = sblock->sector_count << fs_info->sectorsize_bits;
-	u64 logical = sblock->logical;
-	struct btrfs_io_context *bioc = NULL;
-	struct bio *bio;
-	struct btrfs_raid_bio *rbio;
-	int ret;
-	int i;
-
-	btrfs_bio_counter_inc_blocked(fs_info);
-	ret = btrfs_map_sblock(fs_info, BTRFS_MAP_GET_READ_MIRRORS, logical,
-			       &length, &bioc);
-	if (ret || !bioc)
-		goto bioc_out;
-
-	if (WARN_ON(!sctx->is_dev_replace ||
-		    !(bioc->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK))) {
-		/*
-		 * We shouldn't be scrubbing a missing device. Even for dev
-		 * replace, we should only get here for RAID 5/6. We either
-		 * managed to mount something with no mirrors remaining or
-		 * there's a bug in scrub_find_good_copy()/btrfs_map_block().
-		 */
-		goto bioc_out;
-	}
-
-	bio = bio_alloc(NULL, BIO_MAX_VECS, REQ_OP_READ, GFP_NOFS);
-	bio->bi_iter.bi_sector = logical >> 9;
-	bio->bi_private = sblock;
-	bio->bi_end_io = scrub_missing_raid56_end_io;
-
-	rbio = raid56_alloc_missing_rbio(bio, bioc);
-	if (!rbio)
-		goto rbio_out;
-
-	for (i = 0; i < sblock->sector_count; i++) {
-		struct scrub_sector *sector = sblock->sectors[i];
-
-		raid56_add_scrub_pages(rbio, scrub_sector_get_page(sector),
-				       scrub_sector_get_page_offset(sector),
-				       sector->offset + sector->sblock->logical);
-	}
-
-	INIT_WORK(&sblock->work, scrub_missing_raid56_worker);
-	scrub_block_get(sblock);
-	scrub_pending_bio_inc(sctx);
-	raid56_submit_missing_rbio(rbio);
-	btrfs_put_bioc(bioc);
-	return;
-
-rbio_out:
-	bio_put(bio);
-bioc_out:
-	btrfs_bio_counter_dec(fs_info);
-	btrfs_put_bioc(bioc);
-	spin_lock(&sctx->stat_lock);
-	sctx->stat.malloc_errors++;
-	spin_unlock(&sctx->stat_lock);
-}
-
-static int scrub_sectors(struct scrub_ctx *sctx, u64 logical, u32 len,
-		       u64 physical, struct btrfs_device *dev, u64 flags,
-		       u64 gen, int mirror_num, u8 *csum,
-		       u64 physical_for_dev_replace)
-{
-	struct scrub_block *sblock;
-	const u32 sectorsize = sctx->fs_info->sectorsize;
-	int index;
-
-	sblock = alloc_scrub_block(sctx, dev, logical, physical,
-				   physical_for_dev_replace, mirror_num);
-	if (!sblock) {
-		spin_lock(&sctx->stat_lock);
-		sctx->stat.malloc_errors++;
-		spin_unlock(&sctx->stat_lock);
-		return -ENOMEM;
-	}
-
-	for (index = 0; len > 0; index++) {
-		struct scrub_sector *sector;
-		/*
-		 * Here we will allocate one page for one sector to scrub.
-		 * This is fine if PAGE_SIZE == sectorsize, but will cost
-		 * more memory for PAGE_SIZE > sectorsize case.
-		 */
-		u32 l = min(sectorsize, len);
-
-		sector = alloc_scrub_sector(sblock, logical);
-		if (!sector) {
-			spin_lock(&sctx->stat_lock);
-			sctx->stat.malloc_errors++;
-			spin_unlock(&sctx->stat_lock);
-			scrub_block_put(sblock);
-			return -ENOMEM;
-		}
-		sector->flags = flags;
-		sector->generation = gen;
-		if (csum) {
-			sector->have_csum = 1;
-			memcpy(sector->csum, csum, sctx->fs_info->csum_size);
-		} else {
-			sector->have_csum = 0;
-		}
-		len -= l;
-		logical += l;
-		physical += l;
-		physical_for_dev_replace += l;
-	}
-
-	WARN_ON(sblock->sector_count == 0);
-	if (test_bit(BTRFS_DEV_STATE_MISSING, &dev->dev_state)) {
-		/*
-		 * This case should only be hit for RAID 5/6 device replace. See
-		 * the comment in scrub_missing_raid56_pages() for details.
-		 */
-		scrub_missing_raid56_pages(sblock);
-	} else {
-		for (index = 0; index < sblock->sector_count; index++) {
-			struct scrub_sector *sector = sblock->sectors[index];
-			int ret;
-
-			ret = scrub_add_sector_to_rd_bio(sctx, sector);
-			if (ret) {
-				scrub_block_put(sblock);
-				return ret;
-			}
-		}
-
-		if (flags & BTRFS_EXTENT_FLAG_SUPER)
-			scrub_submit(sctx);
-	}
-
-	/* last one frees, either here or in bio completion for last page */
-	scrub_block_put(sblock);
-	return 0;
-}
-
 static void scrub_bio_end_io(struct bio *bio)
 {
 	struct scrub_bio *sbio = bio->bi_private;
@@ -3498,179 +3303,6 @@ static int scrub_find_csum(struct scrub_ctx *sctx, u64 logical, u8 *csum)
 	return 1;
 }
 
-static bool should_use_device(struct btrfs_fs_info *fs_info,
-			      struct btrfs_device *dev,
-			      bool follow_replace_read_mode)
-{
-	struct btrfs_device *replace_srcdev = fs_info->dev_replace.srcdev;
-	struct btrfs_device *replace_tgtdev = fs_info->dev_replace.tgtdev;
-
-	if (!dev->bdev)
-		return false;
-
-	/*
-	 * We're doing scrub/replace, if it's pure scrub, no tgtdev should be
-	 * here.  If it's replace, we're going to write data to tgtdev, thus
-	 * the current data of the tgtdev is all garbage, thus we can not use
-	 * it at all.
-	 */
-	if (dev == replace_tgtdev)
-		return false;
-
-	/* No need to follow replace read mode, any existing device is fine. */
-	if (!follow_replace_read_mode)
-		return true;
-
-	/* Need to follow the mode. */
-	if (fs_info->dev_replace.cont_reading_from_srcdev_mode ==
-	    BTRFS_DEV_REPLACE_ITEM_CONT_READING_FROM_SRCDEV_MODE_AVOID)
-		return dev != replace_srcdev;
-	return true;
-}
-static int scrub_find_good_copy(struct btrfs_fs_info *fs_info,
-				u64 extent_logical, u32 extent_len,
-				u64 *extent_physical,
-				struct btrfs_device **extent_dev,
-				int *extent_mirror_num)
-{
-	u64 mapped_length;
-	struct btrfs_io_context *bioc = NULL;
-	int ret;
-	int i;
-
-	mapped_length = extent_len;
-	ret = btrfs_map_block(fs_info, BTRFS_MAP_GET_READ_MIRRORS,
-			      extent_logical, &mapped_length, &bioc, 0);
-	if (ret || !bioc || mapped_length < extent_len) {
-		btrfs_put_bioc(bioc);
-		btrfs_err_rl(fs_info, "btrfs_map_block() failed for logical %llu: %d",
-				extent_logical, ret);
-		return -EIO;
-	}
-
-	/*
-	 * First loop to exclude all missing devices and the source device if
-	 * needed.  And we don't want to use target device as mirror either, as
-	 * we're doing the replace, the target device range contains nothing.
-	 */
-	for (i = 0; i < bioc->num_stripes - bioc->replace_nr_stripes; i++) {
-		struct btrfs_io_stripe *stripe = &bioc->stripes[i];
-
-		if (!should_use_device(fs_info, stripe->dev, true))
-			continue;
-		goto found;
-	}
-	/*
-	 * We didn't find any alternative mirrors, we have to break our replace
-	 * read mode, or we can not read at all.
-	 */
-	for (i = 0; i < bioc->num_stripes - bioc->replace_nr_stripes; i++) {
-		struct btrfs_io_stripe *stripe = &bioc->stripes[i];
-
-		if (!should_use_device(fs_info, stripe->dev, false))
-			continue;
-		goto found;
-	}
-
-	btrfs_err_rl(fs_info, "failed to find any live mirror for logical %llu",
-			extent_logical);
-	return -EIO;
-
-found:
-	*extent_physical = bioc->stripes[i].physical;
-	*extent_mirror_num = i + 1;
-	*extent_dev = bioc->stripes[i].dev;
-	btrfs_put_bioc(bioc);
-	return 0;
-}
-
-static bool scrub_need_different_mirror(struct scrub_ctx *sctx,
-					struct map_lookup *map,
-					struct btrfs_device *dev)
-{
-	/*
-	 * For RAID56, all the extra mirrors are rebuilt from other P/Q,
-	 * cannot utilize other mirrors directly.
-	 */
-	if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK)
-		return false;
-
-	if (!dev->bdev)
-		return true;
-
-	return sctx->fs_info->dev_replace.cont_reading_from_srcdev_mode ==
-		BTRFS_DEV_REPLACE_ITEM_CONT_READING_FROM_SRCDEV_MODE_AVOID;
-}
-
-/* scrub extent tries to collect up to 64 kB for each bio */
-static int scrub_extent(struct scrub_ctx *sctx, struct map_lookup *map,
-			u64 logical, u32 len,
-			u64 physical, struct btrfs_device *dev, u64 flags,
-			u64 gen, int mirror_num)
-{
-	struct btrfs_device *src_dev = dev;
-	u64 src_physical = physical;
-	int src_mirror = mirror_num;
-	int ret;
-	u8 csum[BTRFS_CSUM_SIZE];
-	u32 blocksize;
-
-	if (flags & BTRFS_EXTENT_FLAG_DATA) {
-		if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK)
-			blocksize = BTRFS_STRIPE_LEN;
-		else
-			blocksize = sctx->fs_info->sectorsize;
-		spin_lock(&sctx->stat_lock);
-		sctx->stat.data_extents_scrubbed++;
-		sctx->stat.data_bytes_scrubbed += len;
-		spin_unlock(&sctx->stat_lock);
-	} else if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) {
-		if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK)
-			blocksize = BTRFS_STRIPE_LEN;
-		else
-			blocksize = sctx->fs_info->nodesize;
-		spin_lock(&sctx->stat_lock);
-		sctx->stat.tree_extents_scrubbed++;
-		sctx->stat.tree_bytes_scrubbed += len;
-		spin_unlock(&sctx->stat_lock);
-	} else {
-		blocksize = sctx->fs_info->sectorsize;
-		WARN_ON(1);
-	}
-
-	/*
-	 * For dev-replace case, we can have @dev being a missing device, or
-	 * we want to avoid reading from the source device if possible.
-	 */
-	if (sctx->is_dev_replace && scrub_need_different_mirror(sctx, map, dev)) {
-		ret = scrub_find_good_copy(sctx->fs_info, logical, len,
-					   &src_physical, &src_dev, &src_mirror);
-		if (ret < 0)
-			return ret;
-	}
-	while (len) {
-		u32 l = min(len, blocksize);
-		int have_csum = 0;
-
-		if (flags & BTRFS_EXTENT_FLAG_DATA) {
-			/* push csums to sbio */
-			have_csum = scrub_find_csum(sctx, logical, csum);
-			if (have_csum == 0)
-				++sctx->stat.no_csum;
-		}
-		ret = scrub_sectors(sctx, logical, l, src_physical, src_dev,
-				    flags, gen, src_mirror,
-				    have_csum ? csum : NULL, physical);
-		if (ret)
-			return ret;
-		len -= l;
-		logical += l;
-		physical += l;
-		src_physical += l;
-	}
-	return 0;
-}
-
 static int scrub_sectors_for_parity(struct scrub_parity *sparity,
 				  u64 logical, u32 len,
 				  u64 physical, struct btrfs_device *dev,
@@ -4253,20 +3885,6 @@ static noinline_for_stack int scrub_raid56_parity(struct scrub_ctx *sctx,
 	return ret < 0 ? ret : 0;
 }
 
-static void sync_replace_for_zoned(struct scrub_ctx *sctx)
-{
-	if (!btrfs_is_zoned(sctx->fs_info))
-		return;
-
-	sctx->flush_all_writes = true;
-	scrub_submit(sctx);
-	mutex_lock(&sctx->wr_lock);
-	scrub_wr_submit(sctx);
-	mutex_unlock(&sctx->wr_lock);
-
-	wait_event(sctx->list_wait, atomic_read(&sctx->bios_in_flight) == 0);
-}
-
 static int sync_write_pointer_for_zoned(struct scrub_ctx *sctx, u64 logical,
 					u64 physical, u64 physical_end)
 {
@@ -4515,6 +4133,9 @@ static void flush_scrub_stripes(struct scrub_ctx *sctx)
 		return;
 
 	ASSERT(test_bit(SCRUB_STRIPE_FLAG_INITIALIZED, &sctx->stripes[0].state));
+
+	scrub_throttle_dev_io(sctx, sctx->stripes[0].dev,
+			      nr_stripes << BTRFS_STRIPE_LEN_SHIFT);
 	for (int i = 0; i < nr_stripes; i++) {
 		stripe = &sctx->stripes[i];
 		scrub_submit_initial_read(sctx, stripe);
@@ -4579,10 +4200,10 @@ static void flush_scrub_stripes(struct scrub_ctx *sctx)
 	sctx->cur_stripe = 0;
 }
 
-int queue_scrub_stripe(struct scrub_ctx *sctx,
-		       struct btrfs_block_group *bg,
-		       struct btrfs_device *dev, int mirror_num,
-		       u64 logical, u32 length, u64 physical)
+static int queue_scrub_stripe(struct scrub_ctx *sctx,
+			      struct btrfs_block_group *bg,
+			      struct btrfs_device *dev, int mirror_num,
+			      u64 logical, u32 length, u64 physical)
 {
 	struct scrub_stripe *stripe;
 	int ret;
@@ -4620,11 +4241,8 @@ static int scrub_simple_mirror(struct scrub_ctx *sctx,
 			       u64 physical, int mirror_num)
 {
 	struct btrfs_fs_info *fs_info = sctx->fs_info;
-	struct btrfs_root *csum_root = btrfs_csum_root(fs_info, bg->start);
-	struct btrfs_root *extent_root = btrfs_extent_root(fs_info, bg->start);
 	const u64 logical_end = logical_start + logical_length;
 	/* An artificial limit, inherit from old scrub behavior */
-	const u32 max_length = SZ_64K;
 	struct btrfs_path path = { 0 };
 	u64 cur_logical = logical_start;
 	int ret;
@@ -4636,11 +4254,7 @@ static int scrub_simple_mirror(struct scrub_ctx *sctx,
 	path.skip_locking = 1;
 	/* Go through each extent items inside the logical range */
 	while (cur_logical < logical_end) {
-		u64 extent_start;
-		u64 extent_len;
-		u64 extent_flags;
-		u64 extent_gen;
-		u64 scrub_len;
+		u64 cur_physical = physical + cur_logical - logical_start;
 
 		/* Canceled? */
 		if (atomic_read(&fs_info->scrub_cancel_req) ||
@@ -4670,8 +4284,9 @@ static int scrub_simple_mirror(struct scrub_ctx *sctx,
 		}
 		spin_unlock(&bg->lock);
 
-		ret = find_first_extent_item(extent_root, &path, cur_logical,
-					     logical_end - cur_logical);
+		ret = queue_scrub_stripe(sctx, bg, device, mirror_num,
+					 cur_logical, logical_end - cur_logical,
+					 cur_physical);
 		if (ret > 0) {
 			/* No more extent, just update the accounting */
 			sctx->stat.last_physical = physical + logical_length;
@@ -4680,52 +4295,11 @@ static int scrub_simple_mirror(struct scrub_ctx *sctx,
 		}
 		if (ret < 0)
 			break;
-		get_extent_info(&path, &extent_start, &extent_len,
-				&extent_flags, &extent_gen);
-		/* Skip hole range which doesn't have any extent */
-		cur_logical = max(extent_start, cur_logical);
 
-		/*
-		 * Scrub len has three limits:
-		 * - Extent size limit
-		 * - Scrub range limit
-		 *   This is especially imporatant for RAID0/RAID10 to reuse
-		 *   this function
-		 * - Max scrub size limit
-		 */
-		scrub_len = min(min(extent_start + extent_len,
-				    logical_end), cur_logical + max_length) -
-			    cur_logical;
+		ASSERT(sctx->cur_stripe > 0);
+		cur_logical = sctx->stripes[sctx->cur_stripe - 1].logical
+			      + BTRFS_STRIPE_LEN;
 
-		if (extent_flags & BTRFS_EXTENT_FLAG_DATA) {
-			ret = btrfs_lookup_csums_list(csum_root, cur_logical,
-					cur_logical + scrub_len - 1,
-					&sctx->csum_list, 1, false);
-			if (ret)
-				break;
-		}
-		if ((extent_flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) &&
-		    does_range_cross_boundary(extent_start, extent_len,
-					      logical_start, logical_length)) {
-			btrfs_err(fs_info,
-"scrub: tree block %llu spanning boundaries, ignored. boundary=[%llu, %llu)",
-				  extent_start, logical_start, logical_end);
-			spin_lock(&sctx->stat_lock);
-			sctx->stat.uncorrectable_errors++;
-			spin_unlock(&sctx->stat_lock);
-			cur_logical += scrub_len;
-			continue;
-		}
-		ret = scrub_extent(sctx, map, cur_logical, scrub_len,
-				   cur_logical - logical_start + physical,
-				   device, extent_flags, extent_gen,
-				   mirror_num);
-		scrub_free_csums(sctx);
-		if (ret)
-			break;
-		if (sctx->is_dev_replace)
-			sync_replace_for_zoned(sctx);
-		cur_logical += scrub_len;
 		/* Don't hold CPU for too long time */
 		cond_resched();
 	}
@@ -4810,7 +4384,6 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 					   int stripe_index)
 {
 	struct btrfs_fs_info *fs_info = sctx->fs_info;
-	struct blk_plug plug;
 	struct map_lookup *map = em->map_lookup;
 	const u64 profile = map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK;
 	const u64 chunk_logical = bg->start;
@@ -4832,12 +4405,6 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 		   atomic_read(&sctx->bios_in_flight) == 0);
 	scrub_blocked_if_needed(fs_info);
 
-	/*
-	 * collect all data csums for the stripe to avoid seeking during
-	 * the scrub. This might currently (crc32) end up to be about 1MB
-	 */
-	blk_start_plug(&plug);
-
 	if (sctx->is_dev_replace &&
 	    btrfs_dev_is_sequential(sctx->wr_tgtdev, physical)) {
 		mutex_lock(&sctx->wr_lock);
@@ -4939,8 +4506,7 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 	mutex_lock(&sctx->wr_lock);
 	scrub_wr_submit(sctx);
 	mutex_unlock(&sctx->wr_lock);
-
-	blk_finish_plug(&plug);
+	flush_scrub_stripes(sctx);
 
 	if (sctx->is_dev_replace && ret >= 0) {
 		int ret2;
diff --git a/fs/btrfs/scrub.h b/fs/btrfs/scrub.h
index fb9d906f5a17..7639103ebf9d 100644
--- a/fs/btrfs/scrub.h
+++ b/fs/btrfs/scrub.h
@@ -13,14 +13,4 @@ int btrfs_scrub_cancel_dev(struct btrfs_device *dev);
 int btrfs_scrub_progress(struct btrfs_fs_info *fs_info, u64 devid,
 			 struct btrfs_scrub_progress *progress);
 
-/*
- * The following functions are temporary exports to avoid warning on unused
- * static functions.
- */
-struct scrub_stripe;
-int queue_scrub_stripe(struct scrub_ctx *sctx,
-		       struct btrfs_block_group *bg,
-		       struct btrfs_device *dev, int mirror_num,
-		       u64 logical, u32 length, u64 physical);
-
 #endif
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 01/13] btrfs: scrub: use dedicated super block verification function to scrub one super block
  2023-03-28 23:56 ` [PATCH v7 01/13] btrfs: scrub: use dedicated super block verification function to scrub one super block Qu Wenruo
@ 2023-03-29  0:29   ` Anand Jain
  2023-03-29  9:20   ` Johannes Thumshirn
  2023-03-29 23:25   ` Christoph Hellwig
  2 siblings, 0 replies; 32+ messages in thread
From: Anand Jain @ 2023-03-29  0:29 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: David Sterba


This patch remains the same as v3, which was reviewed.

Reviewed-by: Anand Jain <anand.jain@oracle.com>



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 01/13] btrfs: scrub: use dedicated super block verification function to scrub one super block
  2023-03-28 23:56 ` [PATCH v7 01/13] btrfs: scrub: use dedicated super block verification function to scrub one super block Qu Wenruo
  2023-03-29  0:29   ` Anand Jain
@ 2023-03-29  9:20   ` Johannes Thumshirn
  2023-03-29  9:53     ` Qu Wenruo
  2023-03-29 23:25   ` Christoph Hellwig
  2 siblings, 1 reply; 32+ messages in thread
From: Johannes Thumshirn @ 2023-03-29  9:20 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: David Sterba

On 29.03.23 01:57, Qu Wenruo wrote:
> +
> +	bio_init(&bio, dev->bdev, &bvec, 1, REQ_OP_READ);
> +	bio.bi_iter.bi_sector = physical >> SECTOR_SHIFT;
> +	bio_add_page(&bio, page, BTRFS_SUPER_INFO_SIZE, 0);

Shouldn't we use __bio_add_page() here, as it is only adding 
one page to a newly crafted bio?
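
A hedged sketch of the suggested variant, assuming the surrounding code
stays as quoted above:

	bio_init(&bio, dev->bdev, &bvec, 1, REQ_OP_READ);
	bio.bi_iter.bi_sector = physical >> SECTOR_SHIFT;
	/* The bio was initialized with exactly one bvec, so this cannot fail. */
	__bio_add_page(&bio, page, BTRFS_SUPER_INFO_SIZE, 0);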

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 01/13] btrfs: scrub: use dedicated super block verification function to scrub one super block
  2023-03-29  9:20   ` Johannes Thumshirn
@ 2023-03-29  9:53     ` Qu Wenruo
  0 siblings, 0 replies; 32+ messages in thread
From: Qu Wenruo @ 2023-03-29  9:53 UTC (permalink / raw)
  To: Johannes Thumshirn, Qu Wenruo, linux-btrfs; +Cc: David Sterba



On 2023/3/29 17:20, Johannes Thumshirn wrote:
> On 29.03.23 01:57, Qu Wenruo wrote:
>> +
>> +	bio_init(&bio, dev->bdev, &bvec, 1, REQ_OP_READ);
>> +	bio.bi_iter.bi_sector = physical >> SECTOR_SHIFT;
>> +	bio_add_page(&bio, page, BTRFS_SUPER_INFO_SIZE, 0);
> 
> Shouldn't we use __bio_add_page() here, as it is only adding
> one page to a newly crafted bio?

Oh, I always forget that helper....

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 01/13] btrfs: scrub: use dedicated super block verification function to scrub one super block
  2023-03-28 23:56 ` [PATCH v7 01/13] btrfs: scrub: use dedicated super block verification function to scrub one super block Qu Wenruo
  2023-03-29  0:29   ` Anand Jain
  2023-03-29  9:20   ` Johannes Thumshirn
@ 2023-03-29 23:25   ` Christoph Hellwig
  2 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2023-03-29 23:25 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, David Sterba

On Wed, Mar 29, 2023 at 07:56:08AM +0800, Qu Wenruo wrote:
> +	ret = btrfs_validate_super(fs_info, sb, -1);
> +	return ret;

Just return directly here instead of assigning to ret just to return it
in the next line.
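
I.e. just:

	return btrfs_validate_super(fs_info, sb, -1);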

Otherwise this looks good minus the __bio_add_page nit:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 02/13] btrfs: introduce a new allocator for scrub specific btrfs_bio
  2023-03-28 23:56 ` [PATCH v7 02/13] btrfs: introduce a new allocator for scrub specific btrfs_bio Qu Wenruo
@ 2023-03-29 23:32   ` Christoph Hellwig
  2023-03-29 23:39     ` Qu Wenruo
  0 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2023-03-29 23:32 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

> +struct btrfs_bio *btrfs_scrub_bio_alloc(blk_opf_t opf,
> +					struct btrfs_fs_info *fs_info,
> +					btrfs_bio_end_io_t end_io, void *private)
> +{
> +	struct btrfs_bio *bbio;
> +	struct bio *bio;
> +
> +	bio = bio_alloc_bioset(NULL, BTRFS_STRIPE_LEN >> fs_info->sectorsize_bits,
> +			       opf, GFP_NOFS, &btrfs_bioset);
> +	bbio = btrfs_bio(bio);
> +	memset(bbio, 0, offsetof(struct btrfs_bio, bio));
> +	bbio->fs_info = fs_info;
> +	bbio->end_io = end_io;
> +	bbio->private = private;
> +	atomic_set(&bbio->pending_ios, 1);
> +	return bbio;

As mentioned in the last round, I'm not too happy with this.  With
the inode and file_offset being optional now we might as well drop them
as arguments from btrfs_bio_alloc/btrfs_bio_init and just pass a nr_vecs
instead and make this new allocator obsolete, and it would be a much
better result.
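
A rough sketch of the suggested shape (hypothetical signature, not the
final form):

	struct btrfs_bio *btrfs_bio_alloc(unsigned int nr_vecs, blk_opf_t opf,
					  struct btrfs_fs_info *fs_info,
					  btrfs_bio_end_io_t end_io,
					  void *private);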

I'd prefer to just do it now rather than doing another series changing
it a little later.

> +	/*
> +	 * Inode and offset into it that this I/O operates on.
> +	 *
> +	 * @inode can be NULL for callers who don't want any advanced features
> +	 * like read-time repair.
> +	 */
>  	struct btrfs_inode *inode;
>  	u64 file_offset;

I don't think these negative comments are nice for the reader.  I'd do:

	/*
	 * Inode and offset into it that this I/O operates on.
	 * Only set for data I/O.
	 */

> +	/*
> +	 * For cases where callers only want to read/write from a logical
> +	 * bytenr, in that case @inode can be NULL, and we need such
> +	 * @fs_info pointer to grab the corresponding fs_info.
> +	 *
> +	 * Should always be populated.
> +	 */
> +	struct btrfs_fs_info *fs_info;

Again here, this comment only makes sense for people following the
development history of this particular patch series.  Once that is in,
the reason why people used an inode before is irrelevant.  The only
useful bit left here is that it must always be populated, but I'm not
even sure I'd add that.  So all we might need is:

	/* File system that this I/O operates on. */

What would be good in this patch is to replace the
existing bbio->inode->root->fs_info dereferences with bbio->fs_info
ASAP, though.
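
E.g. (illustrative):

-	struct btrfs_fs_info *fs_info = bbio->inode->root->fs_info;
+	struct btrfs_fs_info *fs_info = bbio->fs_info;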

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 03/13] btrfs: introduce a new helper to submit read bio for scrub
  2023-03-28 23:56 ` [PATCH v7 03/13] btrfs: introduce a new helper to submit read bio for scrub Qu Wenruo
@ 2023-03-29 23:33   ` Christoph Hellwig
  2023-03-29 23:41     ` Qu Wenruo
  2023-03-30  6:43     ` Qu Wenruo
  0 siblings, 2 replies; 32+ messages in thread
From: Christoph Hellwig @ 2023-03-29 23:33 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, David Sterba

On Wed, Mar 29, 2023 at 07:56:10AM +0800, Qu Wenruo wrote:
> The new helper, btrfs_submit_scrub_read(), would be mostly a subset of
> btrfs_submit_bio(), with the following limitations:
> 
> - Only supports read
> - @mirror_num must be > 0
> - No read-time repair nor checksum verification
> - The @bbio must not cross stripe boundary
> 
> This would provide the basis for unified read repair for scrub, as we
> no longer need to handle RAID56 recovery entirely within scrub, and
> RAID56 data stripe scrub can share the same read and repair code.
> 
> The repair part would be the same as non-RAID56, as we only need to try
> the next mirror.

Didn't we just agree that we do not need another magic helper?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 04/13] btrfs: introduce a new helper to submit write bio for scrub
  2023-03-28 23:56 ` [PATCH v7 04/13] btrfs: introduce a new helper to submit write " Qu Wenruo
@ 2023-03-29 23:33   ` Christoph Hellwig
  2023-03-30  6:47     ` Qu Wenruo
  0 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2023-03-29 23:33 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, David Sterba

Same here, hadn't we just decided to share code with
btrfs_repair_io_failure?


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 02/13] btrfs: introduce a new allocator for scrub specific btrfs_bio
  2023-03-29 23:32   ` Christoph Hellwig
@ 2023-03-29 23:39     ` Qu Wenruo
  2023-03-29 23:47       ` Christoph Hellwig
  0 siblings, 1 reply; 32+ messages in thread
From: Qu Wenruo @ 2023-03-29 23:39 UTC (permalink / raw)
  To: Christoph Hellwig, Qu Wenruo; +Cc: linux-btrfs



On 2023/3/30 07:32, Christoph Hellwig wrote:
>> +struct btrfs_bio *btrfs_scrub_bio_alloc(blk_opf_t opf,
>> +					struct btrfs_fs_info *fs_info,
>> +					btrfs_bio_end_io_t end_io, void *private)
>> +{
>> +	struct btrfs_bio *bbio;
>> +	struct bio *bio;
>> +
>> +	bio = bio_alloc_bioset(NULL, BTRFS_STRIPE_LEN >> fs_info->sectorsize_bits,
>> +			       opf, GFP_NOFS, &btrfs_bioset);
>> +	bbio = btrfs_bio(bio);
>> +	memset(bbio, 0, offsetof(struct btrfs_bio, bio));
>> +	bbio->fs_info = fs_info;
>> +	bbio->end_io = end_io;
>> +	bbio->private = private;
>> +	atomic_set(&bbio->pending_ios, 1);
>> +	return bbio;
> 
> As mentioned in the last round, I'm not too happy with this.  With
> the inode and file_offset being optional now we might as well drop them
> as arguments from btrfs_bio_alloc/btrfs_bio_init and just pass a nr_vecs
> instead and make this new allocator obsolete, and it would be a much
> better result.

I would be happier if we dropped @inode and @file_offset from
btrfs_bio_alloc().

In that case, I'm completely fine with dropping the new allocator.

A quick glance at the code base shows that
btrfs_bio_alloc()/btrfs_bio_init() are not called that frequently, so
the change may be small enough to be done in this series.


But as is my usual tendency, it would still be better to have some
ASSERT()s to make sure we're properly populating @inode for the call
sites that need it.

Otherwise we can easily pass some bbio which should go through csum
verification without a valid @inode pointer, and it would be treated as
NODATASUM.
Such a problem can be very hard to detect.

Any good suggestions?

> 
> I'd prefer to just do it now rather than doing another series changing
> it a little later.
> 
>> +	/*
>> +	 * Inode and offset into it that this I/O operates on.
>> +	 *
>> +	 * @inode can be NULL for callers who don't want any advanced features
>> +	 * like read-time repair.
>> +	 */
>>   	struct btrfs_inode *inode;
>>   	u64 file_offset;
> 
> I don't think these negative comments are nice for the reader.  I'd do:
> 
> 	/*
> 	 * Inode and offset into it that this I/O operates on.
> 	 * Only set for data I/O.
> 	 */
> 
>> +	/*
>> +	 * For cases where callers only want to read/write from a logical
>> +	 * bytenr, in that case @inode can be NULL, and we need such
>> +	 * @fs_info pointer to grab the corresponding fs_info.
>> +	 *
>> +	 * Should always be populated.
>> +	 */
>> +	struct btrfs_fs_info *fs_info;
> 
> Again here, this comment only makes sense for people following the
> development history of this particular patch series.  Once that is in,
> the reason why people used an inode before is irrelevant.  The only
> useful bit left here is that it must always be populated, but I'm not
> even sure I'd add that.  So all we might need is:
> 
> 	/* File system that this I/O operates on. */
> 
> What would be good in this patch is to replace the
> existing bbio->inode->root->fs_info dereferences with bbio->fs_info
> ASAP, though.
Indeed, the comments and extra @fs_info chain would be updated in the 
next version.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 03/13] btrfs: introduce a new helper to submit read bio for scrub
  2023-03-29 23:33   ` Christoph Hellwig
@ 2023-03-29 23:41     ` Qu Wenruo
  2023-03-30  6:43     ` Qu Wenruo
  1 sibling, 0 replies; 32+ messages in thread
From: Qu Wenruo @ 2023-03-29 23:41 UTC (permalink / raw)
  To: Christoph Hellwig, Qu Wenruo; +Cc: linux-btrfs, David Sterba



On 2023/3/30 07:33, Christoph Hellwig wrote:
> On Wed, Mar 29, 2023 at 07:56:10AM +0800, Qu Wenruo wrote:
>> The new helper, btrfs_submit_scrub_read(), would be mostly a subset of
>> btrfs_submit_bio(), with the following limitations:
>>
>> - Only supports read
>> - @mirror_num must be > 0
>> - No read-time repair nor checksum verification
>> - The @bbio must not cross stripe boundary
>>
>> This would provide the basis for unified read repair for scrub, as we
>> no longer need to handle RAID56 recovery entirely within scrub, and
>> RAID56 data stripe scrub can share the same read and repair code.
>>
>> The repair part would be the same as non-RAID56, as we only need to try
>> the next mirror.
> 
> Didn't we just agree that we do not need another magic helper?

I will address this in the next (and bigger) update; sorry that the
recent updates have not yet taken the higher level changes into
consideration.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 02/13] btrfs: introduce a new allocator for scrub specific btrfs_bio
  2023-03-29 23:39     ` Qu Wenruo
@ 2023-03-29 23:47       ` Christoph Hellwig
  2023-03-29 23:51         ` Qu Wenruo
  0 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2023-03-29 23:47 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Christoph Hellwig, Qu Wenruo, linux-btrfs

On Thu, Mar 30, 2023 at 07:39:43AM +0800, Qu Wenruo wrote:
> But as is my usual tendency, it would still be better to have some
> ASSERT()s to make sure we're properly populating @inode for the call
> sites that need it.

Well, where would you place them?  The only place where we could do it
would be in btrfs_submit_chunk, but without having the inode there is
no way to know if we would have needed one.

> Otherwise we can easily pass some bbio which should go through csum
> verification without a valid @inode pointer, and it would be treated
> as NODATASUM.  Such a problem can be very hard to detect.

That's what we have tests for: they check that csums were there and
used, so such problems get detected and fixed.

In the new world order I'd rather see bbio->inode != NULL as a flag to
run the checksumming and repair infrastructure.  Especially as with a
bit more work I'll be able to never set bbio->inode for metadata
I/O either, and possibly get rid of the btree inode entirely.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 02/13] btrfs: introduce a new allocator for scrub specific btrfs_bio
  2023-03-29 23:47       ` Christoph Hellwig
@ 2023-03-29 23:51         ` Qu Wenruo
  2023-03-29 23:54           ` Christoph Hellwig
  0 siblings, 1 reply; 32+ messages in thread
From: Qu Wenruo @ 2023-03-29 23:51 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Qu Wenruo, linux-btrfs



On 2023/3/30 07:47, Christoph Hellwig wrote:
> On Thu, Mar 30, 2023 at 07:39:43AM +0800, Qu Wenruo wrote:
>> But as is my usual tendency, it would still be better to have some
>> ASSERT()s to make sure we're properly populating @inode for the call
>> sites that need it.
> 
> Well, where would you place them?  The only place where we could do it
> would be in btrfs_submit_chunk, but without having the inode there is
> no way to know if we would have needed one.

Can we do something like "if (bbio->file_offset) ASSERT(bbio->inode);"?

> 
>> Otherwise we can easily pass some bbio which should go through csum
>> verification without a valid @inode pointer, and it would be treated
>> as NODATASUM.  Such a problem can be very hard to detect.
> 
> That's what we have tests for: they check that csums were there and
> used, so such problems get detected and fixed.

OK, I guess I'd better change my ASSERT()-happy behavior...

Thanks,
Qu

> 
> In the new world order I'd rather see bbio->inode != NULL as a flag to
> run the checksumming and repair infrastructure.  Especially as with a
> bit more work I'll be able to never set bbio->inode for metadata
> I/O either, and possibly get rid of the btree inode entirely.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 02/13] btrfs: introduce a new allocator for scrub specific btrfs_bio
  2023-03-29 23:51         ` Qu Wenruo
@ 2023-03-29 23:54           ` Christoph Hellwig
  0 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2023-03-29 23:54 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Christoph Hellwig, Qu Wenruo, linux-btrfs

On Thu, Mar 30, 2023 at 07:51:10AM +0800, Qu Wenruo wrote:
> On 2023/3/30 07:47, Christoph Hellwig wrote:
> > On Thu, Mar 30, 2023 at 07:39:43AM +0800, Qu Wenruo wrote:
> > > But as is my usual tendency, it would still be better to have some
> > > ASSERT()s to make sure we're properly populating @inode for the call
> > > sites that need it.
> > 
> > Well, where would you place them?  The only place where we could do it
> > would be in btrfs_submit_chunk, but without having the inode there is
> > no way to know if we would have needed one.
> 
> Can we do something like "if (bbio->file_offset) ASSERT(bbio->inode);"?

We could do it, but it's also a bit confusing, as 0 is a perfectly
valid file offset as well.  But I guess the inverse would make a little
more sense:

	ASSERT(bbio->inode || bbio->file_offset == 0);


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 03/13] btrfs: introduce a new helper to submit read bio for scrub
  2023-03-29 23:33   ` Christoph Hellwig
  2023-03-29 23:41     ` Qu Wenruo
@ 2023-03-30  6:43     ` Qu Wenruo
  2023-03-30 21:50       ` Christoph Hellwig
  1 sibling, 1 reply; 32+ messages in thread
From: Qu Wenruo @ 2023-03-30  6:43 UTC (permalink / raw)
  To: Christoph Hellwig, Qu Wenruo; +Cc: linux-btrfs, David Sterba



On 2023/3/30 07:33, Christoph Hellwig wrote:
> On Wed, Mar 29, 2023 at 07:56:10AM +0800, Qu Wenruo wrote:
>> The new helper, btrfs_submit_scrub_read(), would be mostly a subset of
>> btrfs_submit_bio(), with the following limitations:
>>
>> - Only supports read
>> - @mirror_num must be > 0
>> - No read-time repair nor checksum verification
>> - The @bbio must not cross stripe boundary
>>
>> This would provide the basis for unified read repair for scrub, as we
>> no longer need to handle RAID56 recovery entirely within scrub, and
>> RAID56 data stripe scrub can share the same read and repair code.
>>
>> The repair part would be the same as non-RAID56, as we only need to try
>> the next mirror.
> 
> Didn't we just agree that we do not need another magic helper?

I have changed the code in the github repo so the read path just goes
through btrfs_submit_bio().

The patch adds bbio::fs_info and all the extra skips for the case where
bbio->inode is NULL:
https://github.com/adam900710/linux/commit/23c5a1bf8ce98f205de574a15a0eb0518e56e80c

Mind giving it a preview?

Thanks,
Qu

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 04/13] btrfs: introduce a new helper to submit write bio for scrub
  2023-03-29 23:33   ` Christoph Hellwig
@ 2023-03-30  6:47     ` Qu Wenruo
  2023-03-30 22:13       ` Christoph Hellwig
  0 siblings, 1 reply; 32+ messages in thread
From: Qu Wenruo @ 2023-03-30  6:47 UTC (permalink / raw)
  To: Christoph Hellwig, Qu Wenruo; +Cc: linux-btrfs, David Sterba



On 2023/3/30 07:33, Christoph Hellwig wrote:
> Same here, hadn't we just decided to share code with
> btrfs_repair_io_failure?
> 

Due to the interface differences (read-repair uses a plain bio and
submit_bio_wait(), while scrub uses btrfs_bio), I can't find a perfect
single interface for both uses.

Thus I went with a common helper, map_repair_smap(), to grab the device
and its physical offset, and used it to implement both the old
btrfs_repair_io_failure() and the new btrfs_submit_repair_write()
interface.

The new one would be utilized by scrub.
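
For context, the helper's shape is presumably something like the
following (hypothetical sketch; the actual implementation is in the
patch linked below):

	/* Map a logical range to a single device + physical (smap) for
	 * repair I/O; the range must not cross a stripe boundary. */
	static int map_repair_smap(struct btrfs_fs_info *fs_info, u64 logical,
				   u64 *length, struct btrfs_io_stripe *smap,
				   int mirror_num);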

The patch is here:
https://github.com/adam900710/linux/commit/e5740c4e7bbf5b3ab368a19459e5b2331c730289

Mind giving it a preview?

The whole series has been tested without problems on non-zoned devices.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 03/13] btrfs: introduce a new helper to submit read bio for scrub
  2023-03-30  6:43     ` Qu Wenruo
@ 2023-03-30 21:50       ` Christoph Hellwig
  0 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2023-03-30 21:50 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Christoph Hellwig, Qu Wenruo, linux-btrfs, David Sterba

On Thu, Mar 30, 2023 at 02:43:02PM +0800, Qu Wenruo wrote:
> > > 
> > > The repair part would be the same as non-RAID56, as we only need to try
> > > the next mirror.
> > 
> > Didn't we just agree that we do not need another magic helper?
> 
> I have changed the code in the github repo so the read path just goes
> through btrfs_submit_bio().
> 
> The patch adds bbio::fs_info and all the extra skips for the case
> where bbio->inode is NULL:
> https://github.com/adam900710/linux/commit/23c5a1bf8ce98f205de574a15a0eb0518e56e80c

This seems to be a loose object in the repository.  I looked at
d8c93c7078fa8d75bd3e12f8a30df072831f1c47 on the scrub_stripe branch
instead.

This looks generally good to me.  A bunch of really minor nits:

In btrfs_check_read_bio the comment for the assert reads a little
strangely to me; maybe:

	/* Read-repair requires the inode field to be set by the submitter. */

In btrfs_end_bio_work I'd rework the check to check the inode first,
and add braces around the & check and test positively, i.e.

	if (bbio->inode && !(bbio->bio.bi_opf & REQ_META))
		btrfs_check_read_bio()
	else
		bbio->end_io(bbio);

as that reads easier.  It also matches what btrfs_raid56_end_io does.

In btrfs_submit_bio I'd add an empty line between the assert and
the while loop.

In alloc_new_bio I'd keep the sector initialization before the inode
and file_offset.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 04/13] btrfs: introduce a new helper to submit write bio for scrub
  2023-03-30  6:47     ` Qu Wenruo
@ 2023-03-30 22:13       ` Christoph Hellwig
  2023-03-31  0:48         ` Qu Wenruo
  2023-03-31  0:56         ` Qu Wenruo
  0 siblings, 2 replies; 32+ messages in thread
From: Christoph Hellwig @ 2023-03-30 22:13 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Christoph Hellwig, Qu Wenruo, linux-btrfs, David Sterba

On Thu, Mar 30, 2023 at 02:47:08PM +0800, Qu Wenruo wrote:
> Due to the interface differences (read-repair uses a plain bio and
> submit_bio_wait(), while scrub uses btrfs_bio), I can't find a perfect
> single interface for both uses.

> Thus I went with a common helper, map_repair_smap(), to grab the device
> and its physical offset, and used it to implement both the old
> btrfs_repair_io_failure() and the new btrfs_submit_repair_write()
> interface.

I think this map helper approach looks very nice.

A bunch of comments, though:

 - map_repair_smap doesn't deal with bios at all, so I think it should
   go into volumes.c instead and be renamed to something like
   btrfs_map_repair_block
 - it really needs a better comment describing what it can be used
   for and what the assumptions are
 - The smap.dev ASSERT in both callers of map_repair_smap is odd,
   I'd expect a valid dev to just be part of the contract of that
   helper.
 - shouldn't the ASSERT check that map_length is >= length, as a shorter
   map is the real problem here?  A larger one doesn't seem problematic.
   Instead of just an assert I'd probably do an actual error check with
   a WARN_ON_ONCE (see the sketch after this list).
 - I'd factor the raid56 branch into a separate helper
 - the non-raid56 case should probably call set_io_stripe, to make
   Johannes' life with the raid stripe tree easier

 - all new code in bio.c should stay below btrfs_submit_bio to keep
   the main I/O path together
 - typo: "No csum geneartion" should be: "No csum generation"
 - why can btrfs_submit_repair_write skip the
   BTRFS_DEV_STATE_WRITEABLE check from btrfs_repair_io_failure?
   Shouldn't that and the ->bdev check be lifted into map_repair_smap?
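
E.g. a sketch of the map_length check suggested above (names
hypothetical):

	if (WARN_ON_ONCE(map_length < length))
		return -EIO;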

> 
> The whole series has been tested without problems on non-zoned devices.
> 
> Thanks,
> Qu
---end quoted text---

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 04/13] btrfs: introduce a new helper to submit write bio for scrub
  2023-03-30 22:13       ` Christoph Hellwig
@ 2023-03-31  0:48         ` Qu Wenruo
  2023-03-31  0:56         ` Qu Wenruo
  1 sibling, 0 replies; 32+ messages in thread
From: Qu Wenruo @ 2023-03-31  0:48 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Qu Wenruo, linux-btrfs, David Sterba



On 2023/3/31 06:13, Christoph Hellwig wrote:
> On Thu, Mar 30, 2023 at 02:47:08PM +0800, Qu Wenruo wrote:
>> Due to the interface differences (read-repair uses a plain bio and
>> submit_bio_wait(), while scrub uses btrfs_bio), I can't find a perfect
>> single interface for both uses.
> 
>> Thus I went with a common helper, map_repair_smap(), to grab the device
>> and its physical offset, and used it to implement both the old
>> btrfs_repair_io_failure() and the new btrfs_submit_repair_write()
>> interface.
> 
> I think this map helper approach looks very nice.
> 
> A bunch of comments, though:
> 
>   - map_repair_smap doesn't deal with bios at all, so I think it should
>     go into volumes.c instead and be renamed to something like
>     btrfs_map_repair_block
>   - it really needs a better comment describing what it can be used
>     for and what the assumptions are
>   - The smap.dev ASSERT in both callers of map_repair_smap is odd,
>     I'd expect a valid dev to just be part of the contract of that
>     helper.
>   - shouldn't the ASSERT check that map_length is >= length, as a shorter
>     map is the real problem here?  A larger one doesn't seem problematic.
>     Instead of just an assert I'd probably do an actual error check with
>     a WARN_ON_ONCE.

A larger mapped length is problematic (though not crash-level) for the
following reasons:

- We can only map the first part
   Since we only have @smap, the remaining part won't be repaired.

   If the caller were aware of this and handled the remaining part, it
   would be completely fine, but that's not the case.

- The only two callers both handle the stripe boundary correctly
   Read-repair only submits sector sized writes, while the new scrub is
   fully stripe based.

   If we hit a stripe larger than what we expect, something must be
   wrong in the callers.

But overall, this case should not happen, and even if it does, it
should show up in a dev environment first.

Thus I just put an ASSERT() there, so that it bothers us developers
rather than the end users.

>   - I'd factor the raid56 branch into a separate helper
>   - the non-raid56 case should probably call set_io_stripe, to make
>     Johannes' life with the raid stripe tree easier
> 
>   - all new code in bio.c should stay below btrfs_submit_bio to keep
>     the main I/O path together
>   - typo: "No csum geneartion" should be: "No csum generation"
>   - why can btrfs_submit_repair_write skip the
>     BTRFS_DEV_STATE_WRITEABLE check from btrfs_repair_io_failure?
>     Shouldn't that and the ->bdev check be lifted into map_repair_smap?

The missing device check is skipped because btrfs_submit_dev_bio() would 
handle it, thus we can avoid duplicated checks.

While btrfs_repair_io_failure() doesn't use bbio, it has to do the 
checks all by itself.

The other comments will be reflected in the next update.

Thanks,
Qu

> 
>>
>> The whole series has been tested without problems on non-zoned devices.
>>
>> Thanks,
>> Qu
> ---end quoted text---

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 04/13] btrfs: introduce a new helper to submit write bio for scrub
  2023-03-30 22:13       ` Christoph Hellwig
  2023-03-31  0:48         ` Qu Wenruo
@ 2023-03-31  0:56         ` Qu Wenruo
  1 sibling, 0 replies; 32+ messages in thread
From: Qu Wenruo @ 2023-03-31  0:56 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Qu Wenruo, linux-btrfs, David Sterba



On 2023/3/31 06:13, Christoph Hellwig wrote:
> On Thu, Mar 30, 2023 at 02:47:08PM +0800, Qu Wenruo wrote:
>> Due to the interface differences (read-repair uses a plain bio and
>> submit_bio_wait(), while scrub uses btrfs_bio), I can't find a perfect
>> single interface for both uses.
> 
>> Thus I went with a common helper, map_repair_smap(), to grab the device
>> and its physical offset, and used it to implement both the old
>> btrfs_repair_io_failure() and the new btrfs_submit_repair_write()
>> interface.
> 
> I think this map helper approach looks very nice.
> 
> A bunch of comments, though:
> 
>   - map_repair_smap doesn't deal with bios at all, so I think it should
>     go into volumes.c instead and be renamed to something like
>     btrfs_map_repair_block
>   - it really needs a better comment describing what it can be used
>     for and what the assumptions are
>   - The smap.dev ASSERT in both callers of map_repair_smap is odd,
>     I'd expect a valid dev to just be part of the contract of that
>     helper.
>   - shouldn't the ASSERT check that map_length is >= length, as a shorter
>     map is the real problem here?  A larger one doesn't seem problematic.
>     Instead of just an assert I'd probably do an actual error check with
>     a WARN_ON_ONCE.
>   - I'd factor the raid56 branch into a separate helper
>   - the non-raid56 case should probably call set_io_stripe, to make
>     Johannes' life with the raid stripe tree easier

Forgot to mention, set_io_stripe() is only available inside
__btrfs_map_block(), as that has a direct grab on the map_lookup
structure.

But in our case we are a caller of __btrfs_map_block(), thus we have no
grab on map_lookup, only a bioc.

Thanks,
Qu
> 
>   - all new code in bio.c should stay below btrfs_submit_bio to keep
>     the main I/O path together
>   - typo: "No csum geneartion" should be: "No csum generation"
>   - why can btrfs_submit_repair_write skip the
>     BTRFS_DEV_STATE_WRITEABLE check from btrfs_repair_io_failure?
>     Shouldn't that and the ->bdev check be lifted into map_repair_smap?
> 
>>
>> The whole series has been tested without problems on non-zoned devices.
>>
>> Thanks,
>> Qu
> ---end quoted text---

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2023-03-31  0:57 UTC | newest]

Thread overview: 32+ messages
2023-03-28 23:56 [PATCH v7 00/13] btrfs: scrub: use a more reader friendly code to implement scrub_simple_mirror() Qu Wenruo
2023-03-28 23:56 ` [PATCH v7 01/13] btrfs: scrub: use dedicated super block verification function to scrub one super block Qu Wenruo
2023-03-29  0:29   ` Anand Jain
2023-03-29  9:20   ` Johannes Thumshirn
2023-03-29  9:53     ` Qu Wenruo
2023-03-29 23:25   ` Christoph Hellwig
2023-03-28 23:56 ` [PATCH v7 02/13] btrfs: introduce a new allocator for scrub specific btrfs_bio Qu Wenruo
2023-03-29 23:32   ` Christoph Hellwig
2023-03-29 23:39     ` Qu Wenruo
2023-03-29 23:47       ` Christoph Hellwig
2023-03-29 23:51         ` Qu Wenruo
2023-03-29 23:54           ` Christoph Hellwig
2023-03-28 23:56 ` [PATCH v7 03/13] btrfs: introduce a new helper to submit read bio for scrub Qu Wenruo
2023-03-29 23:33   ` Christoph Hellwig
2023-03-29 23:41     ` Qu Wenruo
2023-03-30  6:43     ` Qu Wenruo
2023-03-30 21:50       ` Christoph Hellwig
2023-03-28 23:56 ` [PATCH v7 04/13] btrfs: introduce a new helper to submit write " Qu Wenruo
2023-03-29 23:33   ` Christoph Hellwig
2023-03-30  6:47     ` Qu Wenruo
2023-03-30 22:13       ` Christoph Hellwig
2023-03-31  0:48         ` Qu Wenruo
2023-03-31  0:56         ` Qu Wenruo
2023-03-28 23:56 ` [PATCH v7 05/13] btrfs: scrub: introduce the structure for new BTRFS_STRIPE_LEN based interface Qu Wenruo
2023-03-28 23:56 ` [PATCH v7 06/13] btrfs: scrub: introduce a helper to find and fill the sector info for a scrub_stripe Qu Wenruo
2023-03-28 23:56 ` [PATCH v7 07/13] btrfs: scrub: introduce a helper to verify one metadata Qu Wenruo
2023-03-28 23:56 ` [PATCH v7 08/13] btrfs: scrub: introduce a helper to verify one scrub_stripe Qu Wenruo
2023-03-28 23:56 ` [PATCH v7 09/13] btrfs: scrub: introduce the main read repair worker for scrub_stripe Qu Wenruo
2023-03-28 23:56 ` [PATCH v7 10/13] btrfs: scrub: introduce a writeback helper " Qu Wenruo
2023-03-28 23:56 ` [PATCH v7 11/13] btrfs: scrub: introduce error reporting functionality " Qu Wenruo
2023-03-28 23:56 ` [PATCH v7 12/13] btrfs: scrub: introduce the helper to queue a stripe for scrub Qu Wenruo
2023-03-28 23:56 ` [PATCH v7 13/13] btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure Qu Wenruo
