* simple synchronous read repair
@ 2022-05-17 14:50 Christoph Hellwig
  2022-05-17 14:50 ` [PATCH 01/15] btrfs: introduce a pure data checksum checking helper Christoph Hellwig
                   ` (14 more replies)
  0 siblings, 15 replies; 55+ messages in thread
From: Christoph Hellwig @ 2022-05-17 14:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: Qu Wenruo, linux-btrfs

Hi all,

this is my take on the read repair code.  It borrows a lot of concepts
and patches from Qu's attempt.  The big difference is that it does away
with multiple in-flight repair bios and instead issues just one at a
time, while trying to make each one as big as possible.

My aim here is mostly to fix up the messy I/O completions for the
direct I/O path, which make further bio optimizations rather annoying,
but the new code also gives better I/O patterns for repair (although
I'm not sure anyone cares) and removes a fair chunk of code.

Git tree:

   git://git.infradead.org/users/hch/misc.git read_repair

Gitweb:

   http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/read_repair

Diffstat:
 fs/btrfs/Makefile            |    2 
 fs/btrfs/btrfs_inode.h       |    5 
 fs/btrfs/compression.c       |   13 
 fs/btrfs/ctree.h             |    9 
 fs/btrfs/extent-io-tree.h    |   15 -
 fs/btrfs/extent_io.c         |  597 ++++++++-----------------------------------
 fs/btrfs/extent_io.h         |   27 -
 fs/btrfs/inode.c             |  138 +++------
 fs/btrfs/read-repair.c       |  211 +++++++++++++++
 fs/btrfs/read-repair.h       |   33 ++
 fs/btrfs/super.c             |    9 
 fs/btrfs/volumes.c           |   27 +
 fs/btrfs/volumes.h           |   14 +
 include/trace/events/btrfs.h |    1 
 14 files changed, 480 insertions(+), 621 deletions(-)


* [PATCH 01/15] btrfs: introduce a pure data checksum checking helper
  2022-05-17 14:50 simple synchronous read repair Christoph Hellwig
@ 2022-05-17 14:50 ` Christoph Hellwig
  2022-05-17 14:59   ` Johannes Thumshirn
  2022-05-20  8:45   ` Nikolay Borisov
  2022-05-17 14:50 ` [PATCH 02/15] btrfs: quit early if the fs has no RAID56 support for raid56 related checks Christoph Hellwig
                   ` (13 subsequent siblings)
  14 siblings, 2 replies; 55+ messages in thread
From: Christoph Hellwig @ 2022-05-17 14:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: Qu Wenruo, linux-btrfs

From: Qu Wenruo <wqu@suse.com>

Although we have several pieces of data csum verification code, we
never had a function that just verifies the checksum of one sector.

check_data_csum() does extra work for error reporting, and thus
requires a lot of extra context like the file offset, bio_offset etc.

btrfs_verify_data_csum() is even worse: it uses the page checked flag,
which means it cannot be used for direct IO pages.

Here we introduce a new helper, btrfs_check_data_sector(), which only
takes a sector inside a page and a pointer to the expected checksum.

We use this function to implement check_data_csum(), and export it for
the incoming patches.
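
For illustration, a minimal call-site sketch based on the diff below
(the surrounding names are taken from the existing callers, not new
API):

	u8 csum[BTRFS_CSUM_SIZE];

	/* Returns 0 on match, -EIO on mismatch; csum holds the computed digest. */
	if (btrfs_check_data_sector(fs_info, page, pg_offset, csum,
				    csum_expected))
		btrfs_print_data_csum_error(inode, disk_start, csum,
					    csum_expected, mirror_num);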

Signed-off-by: Qu Wenruo <wqu@suse.com>
[hch: keep passing the csum array as an argument, as the callers want
      to print it]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/compression.c | 13 ++++---------
 fs/btrfs/ctree.h       |  2 ++
 fs/btrfs/inode.c       | 40 ++++++++++++++++++++++++++++------------
 3 files changed, 34 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index f4564f32f6d93..7212f63dec858 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -147,12 +147,10 @@ static int check_compressed_csum(struct btrfs_inode *inode, struct bio *bio,
 				 u64 disk_start)
 {
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
-	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
 	const u32 csum_size = fs_info->csum_size;
 	const u32 sectorsize = fs_info->sectorsize;
 	struct page *page;
 	unsigned int i;
-	char *kaddr;
 	u8 csum[BTRFS_CSUM_SIZE];
 	struct compressed_bio *cb = bio->bi_private;
 	u8 *cb_sum = cb->sums;
@@ -161,8 +159,6 @@ static int check_compressed_csum(struct btrfs_inode *inode, struct bio *bio,
 	    test_bit(BTRFS_FS_STATE_NO_CSUMS, &fs_info->fs_state))
 		return 0;
 
-	shash->tfm = fs_info->csum_shash;
-
 	for (i = 0; i < cb->nr_pages; i++) {
 		u32 pg_offset;
 		u32 bytes_left = PAGE_SIZE;
@@ -175,12 +171,11 @@ static int check_compressed_csum(struct btrfs_inode *inode, struct bio *bio,
 		/* Hash through the page sector by sector */
 		for (pg_offset = 0; pg_offset < bytes_left;
 		     pg_offset += sectorsize) {
-			kaddr = kmap_atomic(page);
-			crypto_shash_digest(shash, kaddr + pg_offset,
-					    sectorsize, csum);
-			kunmap_atomic(kaddr);
+			int ret;
 
-			if (memcmp(&csum, cb_sum, csum_size) != 0) {
+			ret = btrfs_check_data_sector(fs_info, page, pg_offset,
+						      csum, cb_sum);
+			if (ret) {
 				btrfs_print_data_csum_error(inode, disk_start,
 						csum, cb_sum, cb->mirror_num);
 				if (btrfs_bio(bio)->device)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 0e49b1a0c0716..4713d59ed1105 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3253,6 +3253,8 @@ u64 btrfs_file_extent_end(const struct btrfs_path *path);
 /* inode.c */
 void btrfs_submit_data_bio(struct inode *inode, struct bio *bio,
 			   int mirror_num, enum btrfs_compression_type compress_type);
+int btrfs_check_data_sector(struct btrfs_fs_info *fs_info, struct page *page,
+			    u32 pgoff, u8 *csum, u8 *csum_expected);
 unsigned int btrfs_verify_data_csum(struct btrfs_bio *bbio,
 				    u32 bio_offset, struct page *page,
 				    u64 start, u64 end);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index da13bd0d10f12..76169e4e3ec36 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3326,6 +3326,29 @@ void btrfs_writepage_endio_finish_ordered(struct btrfs_inode *inode,
 				       finish_ordered_fn, uptodate);
 }
 
+/*
+ * Verify the checksum for a single sector without any extra action that
+ * depends on the type of I/O.
+ */
+int btrfs_check_data_sector(struct btrfs_fs_info *fs_info, struct page *page,
+			    u32 pgoff, u8 *csum, u8 *csum_expected)
+{
+	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
+	char *kaddr;
+
+	ASSERT(pgoff + fs_info->sectorsize <= PAGE_SIZE);
+
+	shash->tfm = fs_info->csum_shash;
+
+	kaddr = kmap_local_page(page) + pgoff;
+	crypto_shash_digest(shash, kaddr, fs_info->sectorsize, csum);
+	kunmap_local(kaddr);
+
+	if (memcmp(csum, csum_expected, fs_info->csum_size))
+		return -EIO;
+	return 0;
+}
+
 /*
  * check_data_csum - verify checksum of one sector of uncompressed data
  * @inode:	inode
@@ -3336,14 +3359,15 @@ void btrfs_writepage_endio_finish_ordered(struct btrfs_inode *inode,
  * @start:	logical offset in the file
  *
  * The length of such check is always one sector size.
+ *
+ * When a csum mismatch is detected, also report the error and fill the
+ * corrupted range with zero (hence the extra parameters).
  */
 static int check_data_csum(struct inode *inode, struct btrfs_bio *bbio,
 			   u32 bio_offset, struct page *page, u32 pgoff,
 			   u64 start)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
-	char *kaddr;
 	u32 len = fs_info->sectorsize;
 	const u32 csum_size = fs_info->csum_size;
 	unsigned int offset_sectors;
@@ -3355,17 +3379,9 @@ static int check_data_csum(struct inode *inode, struct btrfs_bio *bbio,
 	offset_sectors = bio_offset >> fs_info->sectorsize_bits;
 	csum_expected = ((u8 *)bbio->csum) + offset_sectors * csum_size;
 
-	kaddr = kmap_atomic(page);
-	shash->tfm = fs_info->csum_shash;
-
-	crypto_shash_digest(shash, kaddr + pgoff, len, csum);
-	kunmap_atomic(kaddr);
-
-	if (memcmp(csum, csum_expected, csum_size))
-		goto zeroit;
+	if (!btrfs_check_data_sector(fs_info, page, pgoff, csum, csum_expected))
+		return 0;
 
-	return 0;
-zeroit:
 	btrfs_print_data_csum_error(BTRFS_I(inode), start, csum, csum_expected,
 				    bbio->mirror_num);
 	if (bbio->device)
-- 
2.30.2



* [PATCH 02/15] btrfs: quit early if the fs has no RAID56 support for raid56 related checks
  2022-05-17 14:50 simple synchronous read repair Christoph Hellwig
  2022-05-17 14:50 ` [PATCH 01/15] btrfs: introduce a pure data checksum checking helper Christoph Hellwig
@ 2022-05-17 14:50 ` Christoph Hellwig
  2022-05-17 15:00   ` Johannes Thumshirn
                     ` (2 more replies)
  2022-05-17 14:50 ` [PATCH 03/15] btrfs: save the original bi_iter into btrfs_bio for buffered read Christoph Hellwig
                   ` (12 subsequent siblings)
  14 siblings, 3 replies; 55+ messages in thread
From: Christoph Hellwig @ 2022-05-17 14:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: Qu Wenruo, linux-btrfs

From: Qu Wenruo <wqu@suse.com>

The following functions do special handling for RAID56 chunks:

- btrfs_is_parity_mirror()
  Check if the range is in RAID56 chunks.

- btrfs_full_stripe_len()
  Either return sectorsize for non-RAID56 profiles or full stripe length
  for RAID56 chunks.

But if a filesystem has no RAID56 chunks at all, it will not have the
RAID56 incompat flag set, and we can skip the chunk tree lookup
completely.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/volumes.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 58f3eece8a48c..0819db46dbc42 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5769,6 +5769,9 @@ unsigned long btrfs_full_stripe_len(struct btrfs_fs_info *fs_info,
 	struct map_lookup *map;
 	unsigned long len = fs_info->sectorsize;
 
+	if (!btrfs_fs_incompat(fs_info, RAID56))
+		return len;
+
 	em = btrfs_get_chunk_map(fs_info, logical, len);
 
 	if (!WARN_ON(IS_ERR(em))) {
@@ -5786,6 +5789,9 @@ int btrfs_is_parity_mirror(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
 	struct map_lookup *map;
 	int ret = 0;
 
+	if (!btrfs_fs_incompat(fs_info, RAID56))
+		return 0;
+
 	em = btrfs_get_chunk_map(fs_info, logical, len);
 
 	if(!WARN_ON(IS_ERR(em))) {
-- 
2.30.2



* [PATCH 03/15] btrfs: save the original bi_iter into btrfs_bio for buffered read
  2022-05-17 14:50 simple synchronous read repair Christoph Hellwig
  2022-05-17 14:50 ` [PATCH 01/15] btrfs: introduce a pure data checksum checking helper Christoph Hellwig
  2022-05-17 14:50 ` [PATCH 02/15] btrfs: quit early if the fs has no RAID56 support for raid56 related checks Christoph Hellwig
@ 2022-05-17 14:50 ` Christoph Hellwig
  2022-05-17 14:50 ` [PATCH 04/15] btrfs: remove duplicated parameters from submit_data_read_repair() Christoph Hellwig
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 55+ messages in thread
From: Christoph Hellwig @ 2022-05-17 14:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: Qu Wenruo, linux-btrfs

From: Qu Wenruo <wqu@suse.com>

Although we have btrfs_bio::iter, it currently has very limited usage:

- RAID56
  Where it is not needed at all

- btrfs_bio_clone()
  This is used mostly for direct IO.

For the incoming read repair patches, we want to grab the original
logical bytenr, and be able to iterate over the range of the bio (no
matter whether it's cloned).

So this patch also saves btrfs_bio::iter for buffered read bios at
submit_one_bio(), and for the sake of consistency also saves it for
direct IO at btrfs_submit_dio_bio().

The reason we don't save the iter in btrfs_map_bio() is that
btrfs_map_bio() has to handle various bios, with or without the
btrfs_bio bioset, and we want to keep it handling only plain bios
without bothering with the bioset.
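
The saved iter lets completion and repair handlers walk the whole
original range even after the driver has advanced bio->bi_iter, e.g.
(minimal sketch using the existing block layer iterator helper):

	struct bvec_iter iter;
	struct bio_vec bvec;

	/*
	 * bio->bi_iter has been consumed by the driver at completion time,
	 * but bbio->iter still describes the full original range.
	 */
	__bio_for_each_segment(bvec, &bbio->bio, iter, bbio->iter) {
		/* process bvec ... */
	}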

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 12 ++++++++++++
 fs/btrfs/inode.c     |  2 ++
 2 files changed, 14 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 588c7c606a2c6..8fe5f505d6e92 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -188,6 +188,18 @@ static void submit_one_bio(struct bio *bio, int mirror_num,
 	/* Caller should ensure the bio has at least some range added */
 	ASSERT(bio->bi_iter.bi_size);
 
+	/*
+	 * Save the original bi_iter for read bios, as read repair wants the
+	 * original logical bytenr.
+	 *
+	 * We don't do this in btrfs_map_bio() because that function is
+	 * bioset independent.
+	 * We can later pass bios without a btrfs_bio or with another bioset into
+	 * btrfs_map_bio().
+	 */
+	if (bio_op(bio) == REQ_OP_READ)
+		btrfs_bio(bio)->iter = bio->bi_iter;
+
 	if (is_data_inode(tree->private_data))
 		btrfs_submit_data_bio(tree->private_data, bio, mirror_num,
 					    compress_type);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 76169e4e3ec36..cb0ad1971be30 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7987,6 +7987,8 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
 		ret = btrfs_bio_wq_end_io(fs_info, bio, BTRFS_WQ_ENDIO_DATA);
 		if (ret)
 			goto err;
+		/* Check submit_one_bio() for the reason. */
+		btrfs_bio(bio)->iter = bio->bi_iter;
 	}
 
 	if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)
-- 
2.30.2



* [PATCH 04/15] btrfs: remove duplicated parameters from submit_data_read_repair()
  2022-05-17 14:50 simple synchronous read repair Christoph Hellwig
                   ` (2 preceding siblings ...)
  2022-05-17 14:50 ` [PATCH 03/15] btrfs: save the original bi_iter into btrfs_bio for buffered read Christoph Hellwig
@ 2022-05-17 14:50 ` Christoph Hellwig
  2022-05-17 15:35   ` Johannes Thumshirn
  2022-05-20 10:05   ` Nikolay Borisov
  2022-05-17 14:50 ` [PATCH 05/15] btrfs: add a helper to iterate through a btrfs_bio with sector sized chunks Christoph Hellwig
                   ` (10 subsequent siblings)
  14 siblings, 2 replies; 55+ messages in thread
From: Christoph Hellwig @ 2022-05-17 14:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: Qu Wenruo, linux-btrfs

From: Qu Wenruo <wqu@suse.com>

The function submit_data_read_repair() is only called in the buffered
data read path, thus those parameters can be calculated from the bvec
directly:

- start
  start = page_offset(bvec->bv_page) + bvec->bv_offset;

- end
  end = start + bvec->bv_len - 1;

- page
  page = bvec->bv_page;

- pgoff
  pgoff = bvec->bv_offset;

Thus we can safely replace those 4 parameters with just one bio_vec.

Also remove the unused return value.

Signed-off-by: Qu Wenruo <wqu@suse.com>
[hch: also remove the return value]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 28 +++++++++++-----------------
 1 file changed, 11 insertions(+), 17 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 8fe5f505d6e92..cee205d8d4bac 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2739,18 +2739,17 @@ static void end_page_read(struct page *page, bool uptodate, u64 start, u32 len)
 		btrfs_subpage_end_reader(fs_info, page, start, len);
 }
 
-static blk_status_t submit_data_read_repair(struct inode *inode,
-					    struct bio *failed_bio,
-					    u32 bio_offset, struct page *page,
-					    unsigned int pgoff,
-					    u64 start, u64 end,
-					    int failed_mirror,
-					    unsigned int error_bitmap)
+static void submit_data_read_repair(struct inode *inode, struct bio *failed_bio,
+		u32 bio_offset, const struct bio_vec *bvec, int failed_mirror,
+		unsigned int error_bitmap)
 {
+	const unsigned int pgoff = bvec->bv_offset;
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+	struct page *page = bvec->bv_page;
+	const u64 start = page_offset(bvec->bv_page) + bvec->bv_offset;
+	const u64 end = start + bvec->bv_len - 1;
 	const u32 sectorsize = fs_info->sectorsize;
 	const int nr_bits = (end + 1 - start) >> fs_info->sectorsize_bits;
-	int error = 0;
 	int i;
 
 	BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE);
@@ -2797,11 +2796,9 @@ static blk_status_t submit_data_read_repair(struct inode *inode,
 			continue;
 		}
 		/*
-		 * Repair failed, just record the error but still continue.
-		 * Or the remaining sectors will not be properly unlocked.
+		 * Continue on failed repair, otherwise the remaining sectors
+		 * will not be properly unlocked.
 		 */
-		if (!error)
-			error = ret;
 next:
 		end_page_read(page, uptodate, start + offset, sectorsize);
 		if (uptodate)
@@ -2814,7 +2811,6 @@ static blk_status_t submit_data_read_repair(struct inode *inode,
 				start + offset + sectorsize - 1,
 				&cached);
 	}
-	return errno_to_blk_status(error);
 }
 
 /* lots and lots of room for performance fixes in the end_bio funcs */
@@ -3105,10 +3101,8 @@ static void end_bio_extent_readpage(struct bio *bio)
 			 * submit_data_read_repair() will handle all the good
 			 * and bad sectors, we just continue to the next bvec.
 			 */
-			submit_data_read_repair(inode, bio, bio_offset, page,
-						start - page_offset(page),
-						start, end, mirror,
-						error_bitmap);
+			submit_data_read_repair(inode, bio, bio_offset, bvec,
+						mirror, error_bitmap);
 
 			ASSERT(bio_offset + len > bio_offset);
 			bio_offset += len;
-- 
2.30.2



* [PATCH 05/15] btrfs: add a helper to iterate through a btrfs_bio with sector sized chunks
  2022-05-17 14:50 simple synchronous read repair Christoph Hellwig
                   ` (3 preceding siblings ...)
  2022-05-17 14:50 ` [PATCH 04/15] btrfs: remove duplicated parameters from submit_data_read_repair() Christoph Hellwig
@ 2022-05-17 14:50 ` Christoph Hellwig
  2022-05-17 15:27   ` Johannes Thumshirn
  2022-05-17 14:50 ` [PATCH 06/15] btrfs: make repair_io_failure available outside of extent_io.c Christoph Hellwig
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 55+ messages in thread
From: Christoph Hellwig @ 2022-05-17 14:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: Qu Wenruo, linux-btrfs

From: Qu Wenruo <wqu@suse.com>

Add a helper that works similarly to __bio_for_each_segment, but instead
of iterating over PAGE_SIZE chunks it iterates over each sector.
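
As a usage sketch (mirroring how later patches in this series use the
macro; process_sector() is a hypothetical stand-in):

	struct bvec_iter iter;
	struct bio_vec bv;
	u32 offset;

	/*
	 * bv covers exactly one sector per iteration; offset is the byte
	 * offset from the start of the saved bio iterator.
	 */
	btrfs_bio_for_each_sector(fs_info, bv, bbio, iter, offset)
		process_sector(bv.bv_page, bv.bv_offset, offset);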

Signed-off-by: Qu Wenruo <wqu@suse.com>
[hch: split from a larger patch, and iterate over the offset instead of
      the offset bits]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/volumes.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 12b2af9260e92..6f784d4f54664 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -391,6 +391,18 @@ static inline void btrfs_bio_free_csum(struct btrfs_bio *bbio)
 	}
 }
 
+/*
+ * Iterate through a btrfs_bio (@bbio) on a per-sector basis.
+ */
+#define btrfs_bio_for_each_sector(fs_info, bvl, bbio, iter, bio_offset)	\
+	for ((iter) = (bbio)->iter, (bio_offset) = 0;			\
+	     (iter).bi_size &&					\
+	     (((bvl) = bio_iter_iovec((&(bbio)->bio), (iter))), 1);	\
+	     (bio_offset) += fs_info->sectorsize,			\
+	     bio_advance_iter_single(&(bbio)->bio, &(iter),		\
+	     (fs_info)->sectorsize))
+
+
 struct btrfs_io_stripe {
 	struct btrfs_device *dev;
 	u64 physical;
-- 
2.30.2



* [PATCH 06/15] btrfs: make repair_io_failure available outside of extent_io.c
  2022-05-17 14:50 simple synchronous read repair Christoph Hellwig
                   ` (4 preceding siblings ...)
  2022-05-17 14:50 ` [PATCH 05/15] btrfs: add a helper to iterate through a btrfs_bio with sector sized chunks Christoph Hellwig
@ 2022-05-17 14:50 ` Christoph Hellwig
  2022-05-17 15:18   ` Johannes Thumshirn
  2022-05-17 14:50 ` [PATCH 07/15] btrfs: factor out a helper to end a single sector from submit_data_read_repair Christoph Hellwig
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 55+ messages in thread
From: Christoph Hellwig @ 2022-05-17 14:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: Qu Wenruo, linux-btrfs

From: Qu Wenruo <wqu@suse.com>

Remove the static so that the function can be used by the new read
repair code, and give it a btrfs_ prefix.

Signed-off-by: Qu Wenruo <wqu@suse.com>
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 19 ++++++++++---------
 fs/btrfs/extent_io.h |  3 +++
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index cee205d8d4bac..75466747a252c 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2326,9 +2326,9 @@ int free_io_failure(struct extent_io_tree *failure_tree,
  * currently, there can be no more than two copies of every data bit. thus,
  * exactly one rewrite is required.
  */
-static int repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
-			     u64 length, u64 logical, struct page *page,
-			     unsigned int pg_offset, int mirror_num)
+int btrfs_repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
+			    u64 length, u64 logical, struct page *page,
+			    unsigned int pg_offset, int mirror_num)
 {
 	struct btrfs_device *dev;
 	struct bio_vec bvec;
@@ -2420,8 +2420,9 @@ int btrfs_repair_eb_io_failure(const struct extent_buffer *eb, int mirror_num)
 	for (i = 0; i < num_pages; i++) {
 		struct page *p = eb->pages[i];
 
-		ret = repair_io_failure(fs_info, 0, start, PAGE_SIZE, start, p,
-					start - page_offset(p), mirror_num);
+		ret = btrfs_repair_io_failure(fs_info, 0, start, PAGE_SIZE,
+					      start, p, start - page_offset(p),
+					      mirror_num);
 		if (ret)
 			break;
 		start += PAGE_SIZE;
@@ -2471,9 +2472,9 @@ int clean_io_failure(struct btrfs_fs_info *fs_info,
 		num_copies = btrfs_num_copies(fs_info, failrec->logical,
 					      failrec->len);
 		if (num_copies > 1)  {
-			repair_io_failure(fs_info, ino, start, failrec->len,
-					  failrec->logical, page, pg_offset,
-					  failrec->failed_mirror);
+			btrfs_repair_io_failure(fs_info, ino, start,
+					failrec->len, failrec->logical,
+					page, pg_offset, failrec->failed_mirror);
 		}
 	}
 
@@ -2631,7 +2632,7 @@ static bool btrfs_check_repairable(struct inode *inode,
 	 *
 	 * Since we're only doing repair for one sector, we only need to get
 	 * a good copy of the failed sector and if we succeed, we have setup
-	 * everything for repair_io_failure to do the rest for us.
+	 * everything for btrfs_repair_io_failure() to do the rest for us.
 	 */
 	ASSERT(failed_mirror);
 	failrec->failed_mirror = failed_mirror;
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 956fa434df435..6cdcea1551a66 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -276,6 +276,9 @@ int btrfs_repair_one_sector(struct inode *inode,
 			    struct page *page, unsigned int pgoff,
 			    u64 start, int failed_mirror,
 			    submit_bio_hook_t *submit_bio_hook);
+int btrfs_repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
+			    u64 length, u64 logical, struct page *page,
+			    unsigned int pg_offset, int mirror_num);
 
 #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
 bool find_lock_delalloc_range(struct inode *inode,
-- 
2.30.2



* [PATCH 07/15] btrfs: factor out a helper to end a single sector from submit_data_read_repair
  2022-05-17 14:50 simple synchronous read repair Christoph Hellwig
                   ` (5 preceding siblings ...)
  2022-05-17 14:50 ` [PATCH 06/15] btrfs: make repair_io_failure available outside of extent_io.c Christoph Hellwig
@ 2022-05-17 14:50 ` Christoph Hellwig
  2022-05-17 15:18   ` Johannes Thumshirn
  2022-05-17 22:17   ` Qu Wenruo
  2022-05-17 14:50 ` [PATCH 08/15] btrfs: refactor end_bio_extent_readpage Christoph Hellwig
                   ` (7 subsequent siblings)
  14 siblings, 2 replies; 55+ messages in thread
From: Christoph Hellwig @ 2022-05-17 14:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: Qu Wenruo, linux-btrfs

Add a helper to end I/O on a single sector, which will come in handy with
the new read repair code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 75466747a252c..f96d5b7071813 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2740,6 +2740,20 @@ static void end_page_read(struct page *page, bool uptodate, u64 start, u32 len)
 		btrfs_subpage_end_reader(fs_info, page, start, len);
 }
 
+static void end_sector_io(struct page *page, u64 offset, bool uptodate)
+{
+	struct inode *inode = page->mapping->host;
+	u32 sectorsize = btrfs_sb(inode->i_sb)->sectorsize;
+	struct extent_state *cached = NULL;
+
+	end_page_read(page, uptodate, offset, sectorsize);
+	if (uptodate)
+		set_extent_uptodate(&BTRFS_I(inode)->io_tree, offset,
+				offset + sectorsize - 1, &cached, GFP_ATOMIC);
+	unlock_extent_cached_atomic(&BTRFS_I(inode)->io_tree, offset,
+			offset + sectorsize - 1, &cached);
+}
+
 static void submit_data_read_repair(struct inode *inode, struct bio *failed_bio,
 		u32 bio_offset, const struct bio_vec *bvec, int failed_mirror,
 		unsigned int error_bitmap)
@@ -2770,7 +2784,6 @@ static void submit_data_read_repair(struct inode *inode, struct bio *failed_bio,
 	/* Iterate through all the sectors in the range */
 	for (i = 0; i < nr_bits; i++) {
 		const unsigned int offset = i * sectorsize;
-		struct extent_state *cached = NULL;
 		bool uptodate = false;
 		int ret;
 
@@ -2801,16 +2814,7 @@ static void submit_data_read_repair(struct inode *inode, struct bio *failed_bio,
 		 * will not be properly unlocked.
 		 */
 next:
-		end_page_read(page, uptodate, start + offset, sectorsize);
-		if (uptodate)
-			set_extent_uptodate(&BTRFS_I(inode)->io_tree,
-					start + offset,
-					start + offset + sectorsize - 1,
-					&cached, GFP_ATOMIC);
-		unlock_extent_cached_atomic(&BTRFS_I(inode)->io_tree,
-				start + offset,
-				start + offset + sectorsize - 1,
-				&cached);
+		end_sector_io(page, start + offset, uptodate);
 	}
 }
 
-- 
2.30.2



* [PATCH 08/15] btrfs: refactor end_bio_extent_readpage
  2022-05-17 14:50 simple synchronous read repair Christoph Hellwig
                   ` (6 preceding siblings ...)
  2022-05-17 14:50 ` [PATCH 07/15] btrfs: factor out a helper to end a single sector from submit_data_read_repair Christoph Hellwig
@ 2022-05-17 14:50 ` Christoph Hellwig
  2022-05-17 22:22   ` Qu Wenruo
  2022-05-17 14:50 ` [PATCH 09/15] btrfs: factor out a btrfs_csum_ptr helper Christoph Hellwig
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 55+ messages in thread
From: Christoph Hellwig @ 2022-05-17 14:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: Qu Wenruo, linux-btrfs

Untangle the goto mess and remove the pointless 'ret' local variable.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 87 +++++++++++++++++++++-----------------------
 1 file changed, 41 insertions(+), 46 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index f96d5b7071813..1ba2d4b194f2e 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3028,7 +3028,6 @@ static void end_bio_extent_readpage(struct bio *bio)
 	 */
 	u32 bio_offset = 0;
 	int mirror;
-	int ret;
 	struct bvec_iter_all iter_all;
 
 	ASSERT(!bio_flagged(bio, BIO_CLONED));
@@ -3039,6 +3038,7 @@ static void end_bio_extent_readpage(struct bio *bio)
 		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 		const u32 sectorsize = fs_info->sectorsize;
 		unsigned int error_bitmap = (unsigned int)-1;
+		bool repair = false;
 		u64 start;
 		u64 end;
 		u32 len;
@@ -3076,55 +3076,23 @@ static void end_bio_extent_readpage(struct bio *bio)
 			if (is_data_inode(inode)) {
 				error_bitmap = btrfs_verify_data_csum(bbio,
 						bio_offset, page, start, end);
-				ret = error_bitmap;
+				if (error_bitmap)
+					uptodate = false;
 			} else {
-				ret = btrfs_validate_metadata_buffer(bbio,
-					page, start, end, mirror);
+				if (btrfs_validate_metadata_buffer(bbio,
+						page, start, end, mirror))
+					uptodate = false;
 			}
-			if (ret)
-				uptodate = false;
-			else
-				clean_io_failure(BTRFS_I(inode)->root->fs_info,
-						 failure_tree, tree, start,
-						 page,
-						 btrfs_ino(BTRFS_I(inode)), 0);
 		}
 
-		if (likely(uptodate))
-			goto readpage_ok;
-
-		if (is_data_inode(inode)) {
-			/*
-			 * If we failed to submit the IO at all we'll have a
-			 * mirror_num == 0, in which case we need to just mark
-			 * the page with an error and unlock it and carry on.
-			 */
-			if (mirror == 0)
-				goto readpage_ok;
-
-			/*
-			 * submit_data_read_repair() will handle all the good
-			 * and bad sectors, we just continue to the next bvec.
-			 */
-			submit_data_read_repair(inode, bio, bio_offset, bvec,
-						mirror, error_bitmap);
-
-			ASSERT(bio_offset + len > bio_offset);
-			bio_offset += len;
-			continue;
-		} else {
-			struct extent_buffer *eb;
-
-			eb = find_extent_buffer_readpage(fs_info, page, start);
-			set_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags);
-			eb->read_mirror = mirror;
-			atomic_dec(&eb->io_pages);
-		}
-readpage_ok:
 		if (likely(uptodate)) {
 			loff_t i_size = i_size_read(inode);
 			pgoff_t end_index = i_size >> PAGE_SHIFT;
 
+			clean_io_failure(BTRFS_I(inode)->root->fs_info,
+					 failure_tree, tree, start, page,
+					 btrfs_ino(BTRFS_I(inode)), 0);
+
 			/*
 			 * Zero out the remaining part if this range straddles
 			 * i_size.
@@ -3141,14 +3109,41 @@ static void end_bio_extent_readpage(struct bio *bio)
 				zero_user_segment(page, zero_start,
 						  offset_in_page(end) + 1);
 			}
+		} else if (is_data_inode(inode)) {
+			/*
+			 * Only try to repair bios that actually made it to a
+			 * device.  If the bio failed to be submitted mirror
+			 * is 0 and we need to fail it without retrying.
+			 */
+			if (mirror > 0)
+				repair = true;
+		} else {
+			struct extent_buffer *eb;
+
+			eb = find_extent_buffer_readpage(fs_info, page, start);
+			set_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags);
+			eb->read_mirror = mirror;
+			atomic_dec(&eb->io_pages);
 		}
+
+		if (repair) {
+			/*
+			 * submit_data_read_repair() will handle all the good
+			 * and bad sectors, we just continue to the next bvec.
+			 */
+			submit_data_read_repair(inode, bio, bio_offset, bvec,
+						mirror, error_bitmap);
+		} else {
+			/* Update page status and unlock */
+			end_page_read(page, uptodate, start, len);
+			endio_readpage_release_extent(&processed,
+					BTRFS_I(inode), start, end,
+					PageUptodate(page));
+		}
+
 		ASSERT(bio_offset + len > bio_offset);
 		bio_offset += len;
 
-		/* Update page status and unlock */
-		end_page_read(page, uptodate, start, len);
-		endio_readpage_release_extent(&processed, BTRFS_I(inode),
-					      start, end, PageUptodate(page));
 	}
 	/* Release the last extent */
 	endio_readpage_release_extent(&processed, NULL, 0, 0, false);
-- 
2.30.2



* [PATCH 09/15] btrfs: factor out a btrfs_csum_ptr helper
  2022-05-17 14:50 simple synchronous read repair Christoph Hellwig
                   ` (7 preceding siblings ...)
  2022-05-17 14:50 ` [PATCH 08/15] btrfs: refactor end_bio_extent_readpage Christoph Hellwig
@ 2022-05-17 14:50 ` Christoph Hellwig
  2022-05-17 15:24   ` Johannes Thumshirn
  2022-05-17 14:50 ` [PATCH 10/15] btrfs: add a btrfs_map_bio_wait helper Christoph Hellwig
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 55+ messages in thread
From: Christoph Hellwig @ 2022-05-17 14:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: Qu Wenruo, linux-btrfs

Add a helper to find the csum for a byte offset into the csum buffer.
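
For example, assuming the default 4K sector size and 4-byte crc32c
checksums, the csum for the sector at byte offset 8192 into the bio
lives 8 bytes into the csum buffer:

	/* (8192 >> sectorsize_bits) * csum_size == (8192 >> 12) * 4 == 8 */
	u8 *csum_expected = btrfs_csum_ptr(fs_info, bbio->csum, 8192);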

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/ctree.h |  7 +++++++
 fs/btrfs/inode.c | 13 +++----------
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 4713d59ed1105..4ab41cb57f216 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2733,6 +2733,13 @@ int btrfs_get_extent_inline_ref_type(const struct extent_buffer *eb,
 				     enum btrfs_inline_ref_type is_data);
 u64 hash_extent_data_ref(u64 root_objectid, u64 owner, u64 offset);
 
+static inline u8 *btrfs_csum_ptr(struct btrfs_fs_info *fs_info, u8 *csums,
+		u64 offset)
+{
+	return csums +
+		((offset >> fs_info->sectorsize_bits) * fs_info->csum_size);
+}
+
 /*
  * Take the number of bytes to be checksummmed and figure out how many leaves
  * it would require to store the csums for that many bytes.
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index cb0ad1971be30..c771136116151 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3369,15 +3369,12 @@ static int check_data_csum(struct inode *inode, struct btrfs_bio *bbio,
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	u32 len = fs_info->sectorsize;
-	const u32 csum_size = fs_info->csum_size;
-	unsigned int offset_sectors;
 	u8 *csum_expected;
 	u8 csum[BTRFS_CSUM_SIZE];
 
 	ASSERT(pgoff + len <= PAGE_SIZE);
 
-	offset_sectors = bio_offset >> fs_info->sectorsize_bits;
-	csum_expected = ((u8 *)bbio->csum) + offset_sectors * csum_size;
+	csum_expected = btrfs_csum_ptr(fs_info, bbio->csum, bio_offset);
 
 	if (!btrfs_check_data_sector(fs_info, page, pgoff, csum, csum_expected))
 		return 0;
@@ -8007,12 +8004,8 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
 		if (ret)
 			goto err;
 	} else {
-		u64 csum_offset;
-
-		csum_offset = file_offset - dip->file_offset;
-		csum_offset >>= fs_info->sectorsize_bits;
-		csum_offset *= fs_info->csum_size;
-		btrfs_bio(bio)->csum = dip->csums + csum_offset;
+		btrfs_bio(bio)->csum = btrfs_csum_ptr(fs_info, dip->csums,
+				file_offset - dip->file_offset);
 	}
 map:
 	ret = btrfs_map_bio(fs_info, bio, 0);
-- 
2.30.2



* [PATCH 10/15] btrfs: add a btrfs_map_bio_wait helper
  2022-05-17 14:50 simple synchronous read repair Christoph Hellwig
                   ` (8 preceding siblings ...)
  2022-05-17 14:50 ` [PATCH 09/15] btrfs: factor out a btrfs_csum_ptr helper Christoph Hellwig
@ 2022-05-17 14:50 ` Christoph Hellwig
  2022-05-17 15:37   ` Johannes Thumshirn
  2022-05-17 22:26   ` Qu Wenruo
  2022-05-17 14:50 ` [PATCH 11/15] btrfs: set ->file_offset in end_bio_extent_readpage Christoph Hellwig
                   ` (4 subsequent siblings)
  14 siblings, 2 replies; 55+ messages in thread
From: Christoph Hellwig @ 2022-05-17 14:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: Qu Wenruo, linux-btrfs

This helper works like submit_bio_wait, but goes through the btrfs bio
mapping using btrfs_map_bio.
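
Usage follows the usual synchronous pattern, e.g. (sketch; mirror_num
as appropriate for the caller):

	blk_status_t ret;

	ret = btrfs_map_bio_wait(fs_info, &bbio->bio, mirror_num);
	/* the bio has completed (or failed to submit) once this returns */

Note that it sleeps waiting for the completion, so callers must be in
process context.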

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/volumes.c | 21 +++++++++++++++++++++
 fs/btrfs/volumes.h |  2 ++
 2 files changed, 23 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 0819db46dbc42..8925bc606db7e 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6818,6 +6818,27 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 	return BLK_STS_OK;
 }
 
+static void btrfs_end_io_sync(struct bio *bio)
+{
+	complete(bio->bi_private);
+}
+
+blk_status_t btrfs_map_bio_wait(struct btrfs_fs_info *fs_info, struct bio *bio,
+		int mirror)
+{
+	DECLARE_COMPLETION_ONSTACK(done);
+	blk_status_t ret;
+
+	bio->bi_private = &done;
+	bio->bi_end_io = btrfs_end_io_sync;
+	ret = btrfs_map_bio(fs_info, bio, mirror);
+	if (ret)
+		return ret;
+
+	wait_for_completion_io(&done);
+	return bio->bi_status;
+}
+
 static bool dev_args_match_fs_devices(const struct btrfs_dev_lookup_args *args,
 				      const struct btrfs_fs_devices *fs_devices)
 {
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 6f784d4f54664..b346f6c401515 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -555,6 +555,8 @@ struct btrfs_block_group *btrfs_create_chunk(struct btrfs_trans_handle *trans,
 void btrfs_mapping_tree_free(struct extent_map_tree *tree);
 blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 			   int mirror_num);
+blk_status_t btrfs_map_bio_wait(struct btrfs_fs_info *fs_info, struct bio *bio,
+		int mirror);
 int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
 		       fmode_t flags, void *holder);
 struct btrfs_device *btrfs_scan_one_device(const char *path,
-- 
2.30.2



* [PATCH 11/15] btrfs: set ->file_offset in end_bio_extent_readpage
  2022-05-17 14:50 simple synchronous read repair Christoph Hellwig
                   ` (9 preceding siblings ...)
  2022-05-17 14:50 ` [PATCH 10/15] btrfs: add a btrfs_map_bio_wait helper Christoph Hellwig
@ 2022-05-17 14:50 ` Christoph Hellwig
  2022-05-17 22:47   ` Qu Wenruo
  2022-05-17 14:50 ` [PATCH 12/15] btrfs: add new read repair infrastructure Christoph Hellwig
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 55+ messages in thread
From: Christoph Hellwig @ 2022-05-17 14:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: Qu Wenruo, linux-btrfs

The new repair code expects ->file_offset to be set for all bios.  Set
it just after entering end_bio_extent_readpage.  As that requires
looking at the first vector before the first loop iteration, also use
that opportunity to set various file-wide variables just once instead
of once per loop iteration.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 1ba2d4b194f2e..cbe3ab24af9e5 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3018,25 +3018,30 @@ static struct extent_buffer *find_extent_buffer_readpage(
  */
 static void end_bio_extent_readpage(struct bio *bio)
 {
+	struct bio_vec *first_vec = bio_first_bvec_all(bio);
+	struct inode *inode = first_vec->bv_page->mapping->host;
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+	const u32 sectorsize = fs_info->sectorsize;
 	struct bio_vec *bvec;
 	struct btrfs_bio *bbio = btrfs_bio(bio);
-	struct extent_io_tree *tree, *failure_tree;
+	int mirror = bbio->mirror_num;
+	struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
+	struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree;
+	bool uptodate = !bio->bi_status;
 	struct processed_extent processed = { 0 };
 	/*
 	 * The offset to the beginning of a bio, since one bio can never be
 	 * larger than UINT_MAX, u32 here is enough.
 	 */
 	u32 bio_offset = 0;
-	int mirror;
 	struct bvec_iter_all iter_all;
 
+	btrfs_bio(bio)->file_offset =
+		page_offset(first_vec->bv_page) + first_vec->bv_offset;
+
 	ASSERT(!bio_flagged(bio, BIO_CLONED));
 	bio_for_each_segment_all(bvec, bio, iter_all) {
-		bool uptodate = !bio->bi_status;
 		struct page *page = bvec->bv_page;
-		struct inode *inode = page->mapping->host;
-		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-		const u32 sectorsize = fs_info->sectorsize;
 		unsigned int error_bitmap = (unsigned int)-1;
 		bool repair = false;
 		u64 start;
@@ -3046,9 +3051,7 @@ static void end_bio_extent_readpage(struct bio *bio)
 		btrfs_debug(fs_info,
 			"end_bio_extent_readpage: bi_sector=%llu, err=%d, mirror=%u",
 			bio->bi_iter.bi_sector, bio->bi_status,
-			bbio->mirror_num);
-		tree = &BTRFS_I(inode)->io_tree;
-		failure_tree = &BTRFS_I(inode)->io_failure_tree;
+			mirror);
 
 		/*
 		 * We always issue full-sector reads, but if some block in a
@@ -3071,7 +3074,6 @@ static void end_bio_extent_readpage(struct bio *bio)
 		end = start + bvec->bv_len - 1;
 		len = bvec->bv_len;
 
-		mirror = bbio->mirror_num;
 		if (likely(uptodate)) {
 			if (is_data_inode(inode)) {
 				error_bitmap = btrfs_verify_data_csum(bbio,
-- 
2.30.2



* [PATCH 12/15] btrfs: add new read repair infrastructure
  2022-05-17 14:50 simple synchronous read repair Christoph Hellwig
                   ` (10 preceding siblings ...)
  2022-05-17 14:50 ` [PATCH 11/15] btrfs: set ->file_offset in end_bio_extent_readpage Christoph Hellwig
@ 2022-05-17 14:50 ` Christoph Hellwig
  2022-05-17 23:04   ` Qu Wenruo
  2022-05-17 14:50 ` [PATCH 13/15] btrfs: use the new read repair code for direct I/O Christoph Hellwig
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 55+ messages in thread
From: Christoph Hellwig @ 2022-05-17 14:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: Qu Wenruo, linux-btrfs

This adds a new read repair implementation for btrfs.  It is synchronous
in that the end I/O handlers call it and get back the result, instead
of potentially getting multiple concurrent calls back into the original
end I/O handler.  The synchronous nature has the following advantages:

 - there is no need for a per-I/O tree of I/O failures, as everything
   related to the I/O failure can be handled locally
 - not having separate repair end I/O helpers will in the future help
   to reuse the direct I/O bio from iomap for the actual submission and
   thus remove the btrfs_dio_private infrastructure

Because submitting many sector-sized synchronous I/Os would be very
slow when multiple sectors (or a whole read) fail, this new code
instead submits a single read and repair write bio for each contiguous
bad section.  It uses a clone of the bio to do that and thus does not
need to allocate any extra bio_vecs.  Note that this cloning is open
coded instead of using the block layer clone helpers, as the clone is
based on the saved iter in the btrfs_bio and not on bio.bi_iter, which
by the time the repair code is called has been advanced by the
low-level driver.
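
For illustration, the intended calling pattern looks roughly like this
(a sketch matching the direct I/O and buffered conversions in the
following patches; sector_ok() is a hypothetical stand-in for the
per-sector checksum verification):

	struct btrfs_read_repair rr = { };
	blk_status_t ret = BLK_STS_OK;
	struct bvec_iter iter;
	struct bio_vec bv;
	u32 offset;

	btrfs_bio_for_each_sector(fs_info, bv, bbio, iter, offset) {
		if (sector_ok(bbio, &bv, offset)) {
			/* good sector: close and run any open repair batch */
			if (!btrfs_read_repair_end(&rr, bbio, inode,
					offset, NULL))
				ret = BLK_STS_IOERR;
		} else {
			/* bad sector: open a batch or extend the current one */
			if (!btrfs_read_repair_add(&rr, bbio, inode, offset))
				ret = BLK_STS_IOERR;
		}
	}
	/* close a batch that extends to the end of the bio */
	if (!btrfs_read_repair_end(&rr, bbio, inode, offset, NULL))
		ret = BLK_STS_IOERR;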

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/Makefile      |   2 +-
 fs/btrfs/read-repair.c | 211 +++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/read-repair.h |  33 +++++++
 fs/btrfs/super.c       |   9 +-
 4 files changed, 252 insertions(+), 3 deletions(-)
 create mode 100644 fs/btrfs/read-repair.c
 create mode 100644 fs/btrfs/read-repair.h

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 99f9995670ea3..0b2605c750cab 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -31,7 +31,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
 	   backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \
 	   uuid-tree.o props.o free-space-tree.o tree-checker.o space-info.o \
 	   block-rsv.o delalloc-space.o block-group.o discard.o reflink.o \
-	   subpage.o tree-mod-log.o
+	   subpage.o tree-mod-log.o read-repair.o
 
 btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
 btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
diff --git a/fs/btrfs/read-repair.c b/fs/btrfs/read-repair.c
new file mode 100644
index 0000000000000..3ac93bfe09e4f
--- /dev/null
+++ b/fs/btrfs/read-repair.c
@@ -0,0 +1,211 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2022 Christoph Hellwig.
+ */
+#include "ctree.h"
+#include "volumes.h"
+#include "read-repair.h"
+#include "btrfs_inode.h"
+
+static struct bio_set read_repair_bioset;
+
+static int next_mirror(struct btrfs_read_repair *rr, int cur_mirror)
+{
+	if (cur_mirror == rr->num_copies)
+		return cur_mirror + 1 - rr->num_copies;
+	return cur_mirror + 1;
+}
+
+static int prev_mirror(struct btrfs_read_repair *rr, int cur_mirror)
+{
+	if (cur_mirror == 1)
+		return rr->num_copies;
+	return cur_mirror - 1;
+}
+
+/*
+ * Clone a new bio from the src_bbio, using the saved iter in the btrfs_bio
+ * instead of bi_iter.
+ */
+static struct btrfs_bio *btrfs_repair_bio_clone(struct btrfs_bio *src_bbio,
+		u64 offset, u32 size, unsigned int op)
+{
+	struct btrfs_bio *bbio;
+	struct bio *bio;
+
+	bio = bio_alloc_bioset(NULL, 0, op | REQ_SYNC, GFP_NOFS,
+			       &read_repair_bioset);
+	bio_set_flag(bio, BIO_CLONED);
+
+	bio->bi_io_vec = src_bbio->bio.bi_io_vec;
+	bio->bi_iter = src_bbio->iter;
+	bio_advance(bio, offset);
+	bio->bi_iter.bi_size = size;
+
+	bbio = btrfs_bio(bio);
+	memset(bbio, 0, offsetof(struct btrfs_bio, bio));
+	bbio->iter = bbio->bio.bi_iter;
+	bbio->file_offset = src_bbio->file_offset + offset;
+
+	return bbio;
+}
+
+static void btrfs_repair_one_mirror(struct btrfs_bio *read_bbio,
+		struct btrfs_bio *failed_bbio, struct inode *inode,
+		int bad_mirror)
+{
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+	struct btrfs_inode *bi = BTRFS_I(inode);
+	u64 logical = read_bbio->iter.bi_sector << SECTOR_SHIFT;
+	u64 file_offset = read_bbio->file_offset;
+	struct btrfs_bio *write_bbio;
+	blk_status_t ret;
+
+	/*
+	 * For zoned file systems repair has to relocate the whole zone.
+	 */
+	if (btrfs_repair_one_zone(fs_info, logical))
+		return;
+
+	/*
+	 * For RAID56, we cannot just write the bad data back, as any write
+	 * will trigger RMW and read back the corrupted on-disk stripe, causing
+	 * further damage.
+	 *
+	 * Perform a special low-level repair that bypasses btrfs_map_bio.
+	 */
+	if (btrfs_is_parity_mirror(fs_info, logical, fs_info->sectorsize)) {
+		struct bvec_iter iter;
+		struct bio_vec bv;
+		u32 offset;
+
+		btrfs_bio_for_each_sector(fs_info, bv, read_bbio, iter, offset)
+			btrfs_repair_io_failure(fs_info, btrfs_ino(bi),
+					file_offset + offset,
+					fs_info->sectorsize,
+					logical + offset,
+					bv.bv_page, bv.bv_offset,
+					bad_mirror);
+		return;
+	}
+
+	/*
+	 * Otherwise just clone the whole bio and write it back to the
+	 * previously bad mirror.
+	 */
+	write_bbio = btrfs_repair_bio_clone(read_bbio, 0,
+			read_bbio->iter.bi_size, REQ_OP_WRITE);
+	ret = btrfs_map_bio_wait(fs_info, &write_bbio->bio, bad_mirror);
+	bio_put(&write_bbio->bio);
+
+	btrfs_info_rl(fs_info,
+		"%s: root %lld ino %llu off %llu logical %llu/%u from good mirror %d",
+		ret ? "failed to correct read error" : "read error corrected",
+		bi->root->root_key.objectid, btrfs_ino(bi),
+		file_offset, logical, read_bbio->iter.bi_size, bad_mirror);
+}
+
+static bool btrfs_repair_read_bio(struct btrfs_bio *bbio,
+		struct btrfs_bio *failed_bbio, struct inode *inode,
+		int read_mirror)
+{
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+	u32 start_offset = bbio->file_offset - failed_bbio->file_offset;
+	u8 csum[BTRFS_CSUM_SIZE];
+	struct bvec_iter iter;
+	struct bio_vec bv;
+	u32 offset;
+
+	if (btrfs_map_bio_wait(fs_info, &bbio->bio, read_mirror))
+		return false;
+
+	if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)
+		return true;
+
+	btrfs_bio_for_each_sector(fs_info, bv, bbio, iter, offset) {
+		u8 *expected_csum =
+			btrfs_csum_ptr(fs_info, failed_bbio->csum,
+					start_offset + offset);
+
+		if (btrfs_check_data_sector(fs_info, bv.bv_page, bv.bv_offset,
+				csum, expected_csum))
+			return false;
+	}
+
+	return true;
+}
+
+bool __btrfs_read_repair_end(struct btrfs_read_repair *rr,
+		struct btrfs_bio *failed_bbio, struct inode *inode,
+		u64 end_offset, repair_endio_t endio)
+{
+	u8 bad_mirror = failed_bbio->mirror_num;
+	u8 read_mirror = next_mirror(rr, bad_mirror);
+	struct btrfs_bio *read_bbio = NULL;
+	bool uptodate = false;
+
+	do {
+		if (read_bbio)
+			bio_put(&read_bbio->bio);
+		read_bbio = btrfs_repair_bio_clone(failed_bbio,
+				rr->start_offset, end_offset - rr->start_offset,
+				REQ_OP_READ);
+		if (btrfs_repair_read_bio(read_bbio, failed_bbio, inode,
+				read_mirror)) {
+			do {
+				btrfs_repair_one_mirror(read_bbio, failed_bbio,
+						inode, bad_mirror);
+			} while ((bad_mirror = prev_mirror(rr, bad_mirror)) !=
+					failed_bbio->mirror_num);
+			uptodate = true;
+			break;
+		}
+		bad_mirror = read_mirror;
+		read_mirror = next_mirror(rr, bad_mirror);
+	} while (read_mirror != failed_bbio->mirror_num);
+
+	if (endio)
+		endio(read_bbio, inode, uptodate);
+	bio_put(&read_bbio->bio);
+
+	rr->in_use = false;
+	return uptodate;
+}
+
+bool btrfs_read_repair_add(struct btrfs_read_repair *rr,
+		struct btrfs_bio *failed_bbio, struct inode *inode,
+		u64 start_offset)
+{
+	if (rr->in_use)
+		return true;
+
+	if (!rr->num_copies) {
+		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+
+		rr->num_copies = btrfs_num_copies(fs_info,
+				failed_bbio->iter.bi_sector << SECTOR_SHIFT,
+				failed_bbio->iter.bi_size);
+	}
+
+	/*
+	 * If there is no other copy of the data to recover from, give up now
+	 * and don't even try to build up a larger batch.
+	 */
+	if (rr->num_copies < 2)
+		return false;
+
+	rr->in_use = true;
+	rr->start_offset = start_offset;
+	return true;
+}
+
+void btrfs_read_repair_exit(void)
+{
+	bioset_exit(&read_repair_bioset);
+}
+
+int __init btrfs_read_repair_init(void)
+{
+	return bioset_init(&read_repair_bioset, BIO_POOL_SIZE,
+			offsetof(struct btrfs_bio, bio), 0);
+}
diff --git a/fs/btrfs/read-repair.h b/fs/btrfs/read-repair.h
new file mode 100644
index 0000000000000..e371790af2b3e
--- /dev/null
+++ b/fs/btrfs/read-repair.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef BTRFS_READ_REPAIR_H
+#define BTRFS_READ_REPAIR_H
+
+struct btrfs_read_repair {
+	u64 start_offset;
+	bool in_use;
+	int num_copies;
+};
+
+typedef void (*repair_endio_t)(struct btrfs_bio *repair_bbio,
+		struct inode *inode, bool uptodate);
+
+bool btrfs_read_repair_add(struct btrfs_read_repair *rr,
+		struct btrfs_bio *failed_bbio, struct inode *inode,
+		u64 bio_offset);
+bool __btrfs_read_repair_end(struct btrfs_read_repair *rr,
+		struct btrfs_bio *failed_bbio, struct inode *inode,
+		u64 end_offset, repair_endio_t end_io);
+static inline bool btrfs_read_repair_end(struct btrfs_read_repair *rr,
+		struct btrfs_bio *failed_bbio, struct inode *inode,
+		u64 end_offset, repair_endio_t endio)
+{
+	if (!rr->in_use)
+		return true;
+	return __btrfs_read_repair_end(rr, failed_bbio, inode, end_offset,
+			endio);
+}
+
+int __init btrfs_read_repair_init(void);
+void btrfs_read_repair_exit(void);
+
+#endif /* BTRFS_READ_REPAIR_H */
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index b1fdc6a26c76e..b33ad892c3058 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -48,6 +48,7 @@
 #include "block-group.h"
 #include "discard.h"
 #include "qgroup.h"
+#include "read-repair.h"
 #define CREATE_TRACE_POINTS
 #include <trace/events/btrfs.h>
 
@@ -2641,10 +2642,12 @@ static int __init init_btrfs_fs(void)
 	err = extent_io_init();
 	if (err)
 		goto free_cachep;
-
-	err = extent_state_cache_init();
+	err = btrfs_read_repair_init();
 	if (err)
 		goto free_extent_io;
+	err = extent_state_cache_init();
+	if (err)
+		goto free_read_repair;
 
 	err = extent_map_init();
 	if (err)
@@ -2708,6 +2711,8 @@ static int __init init_btrfs_fs(void)
 	extent_map_exit();
 free_extent_state_cache:
 	extent_state_cache_exit();
+free_read_repair:
+	btrfs_read_repair_exit();
 free_extent_io:
 	extent_io_exit();
 free_cachep:
-- 
2.30.2



* [PATCH 13/15] btrfs: use the new read repair code for direct I/O
  2022-05-17 14:50 simple synchronous read repair Christoph Hellwig
                   ` (11 preceding siblings ...)
  2022-05-17 14:50 ` [PATCH 12/15] btrfs: add new read repair infrastructure Christoph Hellwig
@ 2022-05-17 14:50 ` Christoph Hellwig
  2022-05-17 14:50 ` [PATCH 14/15] btrfs: use the new read repair code for buffered reads Christoph Hellwig
  2022-05-17 14:50 ` [PATCH 15/15] btrfs: remove io_failure_record infrastructure completely Christoph Hellwig
  14 siblings, 0 replies; 55+ messages in thread
From: Christoph Hellwig @ 2022-05-17 14:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: Qu Wenruo, linux-btrfs

Rewrite btrfs_check_read_dio_bio to use btrfs_bio_for_each_sector and
start/end a repair session as needed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/inode.c | 76 +++++++++++++-----------------------------------
 1 file changed, 21 insertions(+), 55 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index c771136116151..51cca8f343b72 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -55,6 +55,7 @@
 #include "zoned.h"
 #include "subpage.h"
 #include "inode-item.h"
+#include "read-repair.h"
 
 struct btrfs_iget_args {
 	u64 ino;
@@ -7861,71 +7862,36 @@ static void btrfs_dio_private_put(struct btrfs_dio_private *dip)
 	bio_endio(&dip->bio);
 }
 
-static void submit_dio_repair_bio(struct inode *inode, struct bio *bio,
-				  int mirror_num,
-				  enum btrfs_compression_type compress_type)
-{
-	struct btrfs_dio_private *dip = bio->bi_private;
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-
-	BUG_ON(bio_op(bio) == REQ_OP_WRITE);
-
-	if (btrfs_bio_wq_end_io(fs_info, bio, BTRFS_WQ_ENDIO_DATA))
-		return;
-
-	refcount_inc(&dip->refs);
-	if (btrfs_map_bio(fs_info, bio, mirror_num))
-		refcount_dec(&dip->refs);
-}
-
 static blk_status_t btrfs_check_read_dio_bio(struct btrfs_dio_private *dip,
 					     struct btrfs_bio *bbio,
 					     const bool uptodate)
 {
 	struct inode *inode = dip->inode;
 	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
-	const u32 sectorsize = fs_info->sectorsize;
-	struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree;
-	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
 	const bool csum = !(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM);
-	struct bio_vec bvec;
+	struct btrfs_read_repair rr = { };
+	blk_status_t ret = BLK_STS_OK;
 	struct bvec_iter iter;
-	u32 bio_offset = 0;
-	blk_status_t err = BLK_STS_OK;
-
-	__bio_for_each_segment(bvec, &bbio->bio, iter, bbio->iter) {
-		unsigned int i, nr_sectors, pgoff;
-
-		nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info, bvec.bv_len);
-		pgoff = bvec.bv_offset;
-		for (i = 0; i < nr_sectors; i++) {
-			u64 start = bbio->file_offset + bio_offset;
-
-			ASSERT(pgoff < PAGE_SIZE);
-			if (uptodate &&
-			    (!csum || !check_data_csum(inode, bbio,
-						       bio_offset, bvec.bv_page,
-						       pgoff, start))) {
-				clean_io_failure(fs_info, failure_tree, io_tree,
-						 start, bvec.bv_page,
-						 btrfs_ino(BTRFS_I(inode)),
-						 pgoff);
-			} else {
-				int ret;
-
-				ret = btrfs_repair_one_sector(inode, &bbio->bio,
-						bio_offset, bvec.bv_page, pgoff,
-						start, bbio->mirror_num,
-						submit_dio_repair_bio);
-				if (ret)
-					err = errno_to_blk_status(ret);
-			}
-			ASSERT(bio_offset + sectorsize > bio_offset);
-			bio_offset += sectorsize;
-			pgoff += sectorsize;
+	struct bio_vec bv;
+	u32 offset;
+
+	btrfs_bio_for_each_sector(fs_info, bv, bbio, iter, offset) {
+		if (uptodate &&
+		    (!csum || !check_data_csum(inode, bbio, offset,
+				bv.bv_page, bv.bv_offset,
+				bbio->file_offset + offset))) {
+			if (!btrfs_read_repair_end(&rr, bbio, inode, offset,
+					NULL))
+				ret = BLK_STS_IOERR;
+		} else {
+			if (!btrfs_read_repair_add(&rr, bbio, inode, offset))
+				ret = BLK_STS_IOERR;
 		}
 	}
-	return err;
+
+	if (!btrfs_read_repair_end(&rr, bbio, inode, offset, NULL))
+		ret = BLK_STS_IOERR;
+	return ret;
 }
 
 static void __endio_write_update_ordered(struct btrfs_inode *inode,
-- 
2.30.2



* [PATCH 14/15] btrfs: use the new read repair code for buffered reads
  2022-05-17 14:50 simple synchronous read repair Christoph Hellwig
                   ` (12 preceding siblings ...)
  2022-05-17 14:50 ` [PATCH 13/15] btrfs: use the new read repair code for direct I/O Christoph Hellwig
@ 2022-05-17 14:50 ` Christoph Hellwig
  2022-05-17 14:50 ` [PATCH 15/15] btrfs: remove io_failure_record infrastructure completely Christoph Hellwig
  14 siblings, 0 replies; 55+ messages in thread
From: Christoph Hellwig @ 2022-05-17 14:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: Qu Wenruo, linux-btrfs

Start/end a repair session as needed in end_bio_extent_readpage and
submit_data_read_repair.  Unlike direct I/O, the buffered I/O handler
completes I/O on a per-sector basis and thus needs to pass an endio
handler to the repair code, which unlocks all pages and marks them
as either uptodate or not.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 76 ++++++++++++++++++++------------------------
 1 file changed, 35 insertions(+), 41 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index cbe3ab24af9e5..093e3ac28fe21 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -30,6 +30,7 @@
 #include "zoned.h"
 #include "block-group.h"
 #include "compression.h"
+#include "read-repair.h"
 
 static struct kmem_cache *extent_state_cache;
 static struct kmem_cache *extent_buffer_cache;
@@ -2754,14 +2755,27 @@ static void end_sector_io(struct page *page, u64 offset, bool uptodate)
 			offset + sectorsize - 1, &cached);
 }
 
+static void end_read_repair(struct btrfs_bio *repair_bbio, struct inode *inode,
+		bool uptodate)
+{
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+	struct bvec_iter iter;
+	struct bio_vec bv;
+	u32 offset;
+
+	btrfs_bio_for_each_sector(fs_info, bv, repair_bbio, iter, offset)
+		end_sector_io(bv.bv_page, repair_bbio->file_offset + offset,
+				uptodate);
+}
+
 static void submit_data_read_repair(struct inode *inode, struct bio *failed_bio,
 		u32 bio_offset, const struct bio_vec *bvec, int failed_mirror,
-		unsigned int error_bitmap)
+		unsigned int error_bitmap, struct btrfs_read_repair *rr)
 {
-	const unsigned int pgoff = bvec->bv_offset;
+	struct btrfs_bio *failed_bbio = btrfs_bio(failed_bio);
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+	u64 start = page_offset(bvec->bv_page) + bvec->bv_offset;
 	struct page *page = bvec->bv_page;
-	const u64 start = page_offset(bvec->bv_page) + bvec->bv_offset;
 	const u64 end = start + bvec->bv_len - 1;
 	const u32 sectorsize = fs_info->sectorsize;
 	const int nr_bits = (end + 1 - start) >> fs_info->sectorsize_bits;
@@ -2783,38 +2797,17 @@ static void submit_data_read_repair(struct inode *inode, struct bio *failed_bio,
 
 	/* Iterate through all the sectors in the range */
 	for (i = 0; i < nr_bits; i++) {
-		const unsigned int offset = i * sectorsize;
-		bool uptodate = false;
-		int ret;
-
-		if (!(error_bitmap & (1U << i))) {
-			/*
-			 * This sector has no error, just end the page read
-			 * and unlock the range.
-			 */
-			uptodate = true;
-			goto next;
+		bool uptodate = !(error_bitmap & (1U << i));
+
+		if (uptodate ||
+		    !btrfs_read_repair_add(rr, failed_bbio, inode,
+		    		bio_offset)) {
+			btrfs_read_repair_end(rr, failed_bbio, inode,
+					bio_offset, end_read_repair);
+			end_sector_io(page, start, uptodate);
 		}
-
-		ret = btrfs_repair_one_sector(inode, failed_bio,
-				bio_offset + offset,
-				page, pgoff + offset, start + offset,
-				failed_mirror, btrfs_submit_data_bio);
-		if (!ret) {
-			/*
-			 * We have submitted the read repair, the page release
-			 * will be handled by the endio function of the
-			 * submitted repair bio.
-			 * Thus we don't need to do any thing here.
-			 */
-			continue;
-		}
-		/*
-		 * Continue on failed repair, otherwise the remaining sectors
-		 * will not be properly unlocked.
-		 */
-next:
-		end_sector_io(page, start + offset, uptodate);
+		bio_offset += sectorsize;
+		start += sectorsize;
 	}
 }
 
@@ -3025,8 +3018,6 @@ static void end_bio_extent_readpage(struct bio *bio)
 	struct bio_vec *bvec;
 	struct btrfs_bio *bbio = btrfs_bio(bio);
 	int mirror = bbio->mirror_num;
-	struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
-	struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree;
 	bool uptodate = !bio->bi_status;
 	struct processed_extent processed = { 0 };
 	/*
@@ -3035,6 +3026,7 @@ static void end_bio_extent_readpage(struct bio *bio)
 	 */
 	u32 bio_offset = 0;
 	struct bvec_iter_all iter_all;
+	struct btrfs_read_repair rr = { };
 
 	btrfs_bio(bio)->file_offset =
 		page_offset(first_vec->bv_page) + first_vec->bv_offset;
@@ -3091,10 +3083,6 @@ static void end_bio_extent_readpage(struct bio *bio)
 			loff_t i_size = i_size_read(inode);
 			pgoff_t end_index = i_size >> PAGE_SHIFT;
 
-			clean_io_failure(BTRFS_I(inode)->root->fs_info,
-					 failure_tree, tree, start, page,
-					 btrfs_ino(BTRFS_I(inode)), 0);
-
 			/*
 			 * Zero out the remaining part if this range straddles
 			 * i_size.
@@ -3134,9 +3122,13 @@ static void end_bio_extent_readpage(struct bio *bio)
 			 * and bad sectors, we just continue to the next bvec.
 			 */
 			submit_data_read_repair(inode, bio, bio_offset, bvec,
-						mirror, error_bitmap);
+						mirror, error_bitmap,
+						&rr);
+
 		} else {
 			/* Update page status and unlock */
+			btrfs_read_repair_end(&rr, btrfs_bio(bio), inode,
+					bio_offset, end_read_repair);
 			end_page_read(page, uptodate, start, len);
 			endio_readpage_release_extent(&processed,
 					BTRFS_I(inode), start, end,
@@ -3148,6 +3140,8 @@ static void end_bio_extent_readpage(struct bio *bio)
 
 	}
 	/* Release the last extent */
+	btrfs_read_repair_end(&rr, btrfs_bio(bio), inode, bio_offset,
+			end_read_repair);
 	endio_readpage_release_extent(&processed, NULL, 0, 0, false);
 	btrfs_bio_free_csum(bbio);
 	bio_put(bio);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 15/15] btrfs: remove io_failure_record infrastructure completely
  2022-05-17 14:50 simple synchronous read repair Christoph Hellwig
                   ` (13 preceding siblings ...)
  2022-05-17 14:50 ` [PATCH 14/15] btrfs: use the new read repair code for buffered reads Christoph Hellwig
@ 2022-05-17 14:50 ` Christoph Hellwig
  14 siblings, 0 replies; 55+ messages in thread
From: Christoph Hellwig @ 2022-05-17 14:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: Qu Wenruo, linux-btrfs

From: Qu Wenruo <wqu@suse.com>

Since read repair is now always handled by btrfs_read_repair_ctrl,
which only lives inside the endio function, we no longer need to record
which range failed and on which mirror.

By the time a data read finally fails, we have already tried every
mirror we have, so there is no need to record the failed range.

Thus this patch removes the whole io_failure_record structure and its
related functions.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/btrfs_inode.h       |   5 -
 fs/btrfs/extent-io-tree.h    |  15 --
 fs/btrfs/extent_io.c         | 365 -----------------------------------
 fs/btrfs/extent_io.h         |  24 ---
 fs/btrfs/inode.c             |   7 -
 include/trace/events/btrfs.h |   1 -
 6 files changed, 417 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 33811e896623f..3eeba0eb9f16b 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -91,11 +91,6 @@ struct btrfs_inode {
 	/* the io_tree does range state (DIRTY, LOCKED etc) */
 	struct extent_io_tree io_tree;
 
-	/* special utility tree used to record which mirrors have already been
-	 * tried when checksums fail for a given block
-	 */
-	struct extent_io_tree io_failure_tree;
-
 	/*
 	 * Keep track of where the inode has extent items mapped in order to
 	 * make sure the i_size adjustments are accurate
diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
index c3eb52dbe61cc..8ab9b6cd53ed4 100644
--- a/fs/btrfs/extent-io-tree.h
+++ b/fs/btrfs/extent-io-tree.h
@@ -56,7 +56,6 @@ enum {
 	IO_TREE_FS_EXCLUDED_EXTENTS,
 	IO_TREE_BTREE_INODE_IO,
 	IO_TREE_INODE_IO,
-	IO_TREE_INODE_IO_FAILURE,
 	IO_TREE_RELOC_BLOCKS,
 	IO_TREE_TRANS_DIRTY_PAGES,
 	IO_TREE_ROOT_DIRTY_LOG_PAGES,
@@ -250,18 +249,4 @@ bool btrfs_find_delalloc_range(struct extent_io_tree *tree, u64 *start,
 			       u64 *end, u64 max_bytes,
 			       struct extent_state **cached_state);
 
-/* This should be reworked in the future and put elsewhere. */
-struct io_failure_record *get_state_failrec(struct extent_io_tree *tree, u64 start);
-int set_state_failrec(struct extent_io_tree *tree, u64 start,
-		      struct io_failure_record *failrec);
-void btrfs_free_io_failure_record(struct btrfs_inode *inode, u64 start,
-		u64 end);
-int free_io_failure(struct extent_io_tree *failure_tree,
-		    struct extent_io_tree *io_tree,
-		    struct io_failure_record *rec);
-int clean_io_failure(struct btrfs_fs_info *fs_info,
-		     struct extent_io_tree *failure_tree,
-		     struct extent_io_tree *io_tree, u64 start,
-		     struct page *page, u64 ino, unsigned int pg_offset);
-
 #endif /* BTRFS_EXTENT_IO_TREE_H */
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 093e3ac28fe21..b115ba7a4902b 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2177,66 +2177,6 @@ u64 count_range_bits(struct extent_io_tree *tree,
 	return total_bytes;
 }
 
-/*
- * set the private field for a given byte offset in the tree.  If there isn't
- * an extent_state there already, this does nothing.
- */
-int set_state_failrec(struct extent_io_tree *tree, u64 start,
-		      struct io_failure_record *failrec)
-{
-	struct rb_node *node;
-	struct extent_state *state;
-	int ret = 0;
-
-	spin_lock(&tree->lock);
-	/*
-	 * this search will find all the extents that end after
-	 * our range starts.
-	 */
-	node = tree_search(tree, start);
-	if (!node) {
-		ret = -ENOENT;
-		goto out;
-	}
-	state = rb_entry(node, struct extent_state, rb_node);
-	if (state->start != start) {
-		ret = -ENOENT;
-		goto out;
-	}
-	state->failrec = failrec;
-out:
-	spin_unlock(&tree->lock);
-	return ret;
-}
-
-struct io_failure_record *get_state_failrec(struct extent_io_tree *tree, u64 start)
-{
-	struct rb_node *node;
-	struct extent_state *state;
-	struct io_failure_record *failrec;
-
-	spin_lock(&tree->lock);
-	/*
-	 * this search will find all the extents that end after
-	 * our range starts.
-	 */
-	node = tree_search(tree, start);
-	if (!node) {
-		failrec = ERR_PTR(-ENOENT);
-		goto out;
-	}
-	state = rb_entry(node, struct extent_state, rb_node);
-	if (state->start != start) {
-		failrec = ERR_PTR(-ENOENT);
-		goto out;
-	}
-
-	failrec = state->failrec;
-out:
-	spin_unlock(&tree->lock);
-	return failrec;
-}
-
 /*
  * searches a range in the state tree for a given mask.
  * If 'filled' == 1, this returns 1 only if every extent in the tree
@@ -2293,30 +2233,6 @@ int test_range_bit(struct extent_io_tree *tree, u64 start, u64 end,
 	return bitset;
 }
 
-int free_io_failure(struct extent_io_tree *failure_tree,
-		    struct extent_io_tree *io_tree,
-		    struct io_failure_record *rec)
-{
-	int ret;
-	int err = 0;
-
-	set_state_failrec(failure_tree, rec->start, NULL);
-	ret = clear_extent_bits(failure_tree, rec->start,
-				rec->start + rec->len - 1,
-				EXTENT_LOCKED | EXTENT_DIRTY);
-	if (ret)
-		err = ret;
-
-	ret = clear_extent_bits(io_tree, rec->start,
-				rec->start + rec->len - 1,
-				EXTENT_DAMAGED);
-	if (ret && !err)
-		err = ret;
-
-	kfree(rec);
-	return err;
-}
-
 /*
  * this bypasses the standard btrfs submit functions deliberately, as
  * the standard behavior is to write all copies in a raid setup. here we only
@@ -2432,287 +2348,6 @@ int btrfs_repair_eb_io_failure(const struct extent_buffer *eb, int mirror_num)
 	return ret;
 }
 
-/*
- * each time an IO finishes, we do a fast check in the IO failure tree
- * to see if we need to process or clean up an io_failure_record
- */
-int clean_io_failure(struct btrfs_fs_info *fs_info,
-		     struct extent_io_tree *failure_tree,
-		     struct extent_io_tree *io_tree, u64 start,
-		     struct page *page, u64 ino, unsigned int pg_offset)
-{
-	u64 private;
-	struct io_failure_record *failrec;
-	struct extent_state *state;
-	int num_copies;
-	int ret;
-
-	private = 0;
-	ret = count_range_bits(failure_tree, &private, (u64)-1, 1,
-			       EXTENT_DIRTY, 0);
-	if (!ret)
-		return 0;
-
-	failrec = get_state_failrec(failure_tree, start);
-	if (IS_ERR(failrec))
-		return 0;
-
-	BUG_ON(!failrec->this_mirror);
-
-	if (sb_rdonly(fs_info->sb))
-		goto out;
-
-	spin_lock(&io_tree->lock);
-	state = find_first_extent_bit_state(io_tree,
-					    failrec->start,
-					    EXTENT_LOCKED);
-	spin_unlock(&io_tree->lock);
-
-	if (state && state->start <= failrec->start &&
-	    state->end >= failrec->start + failrec->len - 1) {
-		num_copies = btrfs_num_copies(fs_info, failrec->logical,
-					      failrec->len);
-		if (num_copies > 1)  {
-			btrfs_repair_io_failure(fs_info, ino, start,
-					failrec->len, failrec->logical,
-					page, pg_offset, failrec->failed_mirror);
-		}
-	}
-
-out:
-	free_io_failure(failure_tree, io_tree, failrec);
-
-	return 0;
-}
-
-/*
- * Can be called when
- * - hold extent lock
- * - under ordered extent
- * - the inode is freeing
- */
-void btrfs_free_io_failure_record(struct btrfs_inode *inode, u64 start, u64 end)
-{
-	struct extent_io_tree *failure_tree = &inode->io_failure_tree;
-	struct io_failure_record *failrec;
-	struct extent_state *state, *next;
-
-	if (RB_EMPTY_ROOT(&failure_tree->state))
-		return;
-
-	spin_lock(&failure_tree->lock);
-	state = find_first_extent_bit_state(failure_tree, start, EXTENT_DIRTY);
-	while (state) {
-		if (state->start > end)
-			break;
-
-		ASSERT(state->end <= end);
-
-		next = next_state(state);
-
-		failrec = state->failrec;
-		free_extent_state(state);
-		kfree(failrec);
-
-		state = next;
-	}
-	spin_unlock(&failure_tree->lock);
-}
-
-static struct io_failure_record *btrfs_get_io_failure_record(struct inode *inode,
-							     u64 start)
-{
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	struct io_failure_record *failrec;
-	struct extent_map *em;
-	struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree;
-	struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
-	struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree;
-	const u32 sectorsize = fs_info->sectorsize;
-	int ret;
-	u64 logical;
-
-	failrec = get_state_failrec(failure_tree, start);
-	if (!IS_ERR(failrec)) {
-		btrfs_debug(fs_info,
-	"Get IO Failure Record: (found) logical=%llu, start=%llu, len=%llu",
-			failrec->logical, failrec->start, failrec->len);
-		/*
-		 * when data can be on disk more than twice, add to failrec here
-		 * (e.g. with a list for failed_mirror) to make
-		 * clean_io_failure() clean all those errors at once.
-		 */
-
-		return failrec;
-	}
-
-	failrec = kzalloc(sizeof(*failrec), GFP_NOFS);
-	if (!failrec)
-		return ERR_PTR(-ENOMEM);
-
-	failrec->start = start;
-	failrec->len = sectorsize;
-	failrec->this_mirror = 0;
-	failrec->compress_type = BTRFS_COMPRESS_NONE;
-
-	read_lock(&em_tree->lock);
-	em = lookup_extent_mapping(em_tree, start, failrec->len);
-	if (!em) {
-		read_unlock(&em_tree->lock);
-		kfree(failrec);
-		return ERR_PTR(-EIO);
-	}
-
-	if (em->start > start || em->start + em->len <= start) {
-		free_extent_map(em);
-		em = NULL;
-	}
-	read_unlock(&em_tree->lock);
-	if (!em) {
-		kfree(failrec);
-		return ERR_PTR(-EIO);
-	}
-
-	logical = start - em->start;
-	logical = em->block_start + logical;
-	if (test_bit(EXTENT_FLAG_COMPRESSED, &em->flags)) {
-		logical = em->block_start;
-		failrec->compress_type = em->compress_type;
-	}
-
-	btrfs_debug(fs_info,
-		    "Get IO Failure Record: (new) logical=%llu, start=%llu, len=%llu",
-		    logical, start, failrec->len);
-
-	failrec->logical = logical;
-	free_extent_map(em);
-
-	/* Set the bits in the private failure tree */
-	ret = set_extent_bits(failure_tree, start, start + sectorsize - 1,
-			      EXTENT_LOCKED | EXTENT_DIRTY);
-	if (ret >= 0) {
-		ret = set_state_failrec(failure_tree, start, failrec);
-		/* Set the bits in the inode's tree */
-		ret = set_extent_bits(tree, start, start + sectorsize - 1,
-				      EXTENT_DAMAGED);
-	} else if (ret < 0) {
-		kfree(failrec);
-		return ERR_PTR(ret);
-	}
-
-	return failrec;
-}
-
-static bool btrfs_check_repairable(struct inode *inode,
-				   struct io_failure_record *failrec,
-				   int failed_mirror)
-{
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	int num_copies;
-
-	num_copies = btrfs_num_copies(fs_info, failrec->logical, failrec->len);
-	if (num_copies == 1) {
-		/*
-		 * we only have a single copy of the data, so don't bother with
-		 * all the retry and error correction code that follows. no
-		 * matter what the error is, it is very likely to persist.
-		 */
-		btrfs_debug(fs_info,
-			"Check Repairable: cannot repair, num_copies=%d, next_mirror %d, failed_mirror %d",
-			num_copies, failrec->this_mirror, failed_mirror);
-		return false;
-	}
-
-	/* The failure record should only contain one sector */
-	ASSERT(failrec->len == fs_info->sectorsize);
-
-	/*
-	 * There are two premises:
-	 * a) deliver good data to the caller
-	 * b) correct the bad sectors on disk
-	 *
-	 * Since we're only doing repair for one sector, we only need to get
-	 * a good copy of the failed sector and if we succeed, we have setup
-	 * everything for btrfs_repair_io_failure() to do the rest for us.
-	 */
-	ASSERT(failed_mirror);
-	failrec->failed_mirror = failed_mirror;
-	failrec->this_mirror++;
-	if (failrec->this_mirror == failed_mirror)
-		failrec->this_mirror++;
-
-	if (failrec->this_mirror > num_copies) {
-		btrfs_debug(fs_info,
-			"Check Repairable: (fail) num_copies=%d, next_mirror %d, failed_mirror %d",
-			num_copies, failrec->this_mirror, failed_mirror);
-		return false;
-	}
-
-	return true;
-}
-
-int btrfs_repair_one_sector(struct inode *inode,
-			    struct bio *failed_bio, u32 bio_offset,
-			    struct page *page, unsigned int pgoff,
-			    u64 start, int failed_mirror,
-			    submit_bio_hook_t *submit_bio_hook)
-{
-	struct io_failure_record *failrec;
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
-	struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree;
-	struct btrfs_bio *failed_bbio = btrfs_bio(failed_bio);
-	const int icsum = bio_offset >> fs_info->sectorsize_bits;
-	struct bio *repair_bio;
-	struct btrfs_bio *repair_bbio;
-
-	btrfs_debug(fs_info,
-		   "repair read error: read error at %llu", start);
-
-	BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE);
-
-	failrec = btrfs_get_io_failure_record(inode, start);
-	if (IS_ERR(failrec))
-		return PTR_ERR(failrec);
-
-
-	if (!btrfs_check_repairable(inode, failrec, failed_mirror)) {
-		free_io_failure(failure_tree, tree, failrec);
-		return -EIO;
-	}
-
-	repair_bio = btrfs_bio_alloc(1);
-	repair_bbio = btrfs_bio(repair_bio);
-	repair_bbio->file_offset = start;
-	repair_bio->bi_opf = REQ_OP_READ;
-	repair_bio->bi_end_io = failed_bio->bi_end_io;
-	repair_bio->bi_iter.bi_sector = failrec->logical >> 9;
-	repair_bio->bi_private = failed_bio->bi_private;
-
-	if (failed_bbio->csum) {
-		const u32 csum_size = fs_info->csum_size;
-
-		repair_bbio->csum = repair_bbio->csum_inline;
-		memcpy(repair_bbio->csum,
-		       failed_bbio->csum + csum_size * icsum, csum_size);
-	}
-
-	bio_add_page(repair_bio, page, failrec->len, pgoff);
-	repair_bbio->iter = repair_bio->bi_iter;
-
-	btrfs_debug(btrfs_sb(inode->i_sb),
-		    "repair read error: submitting new read to mirror %d",
-		    failrec->this_mirror);
-
-	/*
-	 * At this point we have a bio, so any errors from submit_bio_hook()
-	 * will be handled by the endio on the repair_bio, so we can't return an
-	 * error here.
-	 */
-	submit_bio_hook(inode, repair_bio, failrec->this_mirror, failrec->compress_type);
-	return BLK_STS_OK;
-}
-
 static void end_page_read(struct page *page, bool uptodate, u64 start, u32 len)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 6cdcea1551a66..e46fe23f6aff4 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -61,7 +61,6 @@ struct btrfs_root;
 struct btrfs_inode;
 struct btrfs_io_bio;
 struct btrfs_fs_info;
-struct io_failure_record;
 struct extent_io_tree;
 
 typedef void (submit_bio_hook_t)(struct inode *inode, struct bio *bio,
@@ -253,29 +252,6 @@ struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size);
 void end_extent_writepage(struct page *page, int err, u64 start, u64 end);
 int btrfs_repair_eb_io_failure(const struct extent_buffer *eb, int mirror_num);
 
-/*
- * When IO fails, either with EIO or csum verification fails, we
- * try other mirrors that might have a good copy of the data.  This
- * io_failure_record is used to record state as we go through all the
- * mirrors.  If another mirror has good data, the sector is set up to date
- * and things continue.  If a good mirror can't be found, the original
- * bio end_io callback is called to indicate things have failed.
- */
-struct io_failure_record {
-	struct page *page;
-	u64 start;
-	u64 len;
-	u64 logical;
-	enum btrfs_compression_type compress_type;
-	int this_mirror;
-	int failed_mirror;
-};
-
-int btrfs_repair_one_sector(struct inode *inode,
-			    struct bio *failed_bio, u32 bio_offset,
-			    struct page *page, unsigned int pgoff,
-			    u64 start, int failed_mirror,
-			    submit_bio_hook_t *submit_bio_hook);
 int btrfs_repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
 			    u64 length, u64 logical, struct page *page,
 			    unsigned int pg_offset, int mirror_num);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 51cca8f343b72..37c1af902a84b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3143,8 +3143,6 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
 					ordered_extent->disk_num_bytes);
 	}
 
-	btrfs_free_io_failure_record(inode, start, end);
-
 	if (test_bit(BTRFS_ORDERED_TRUNCATED, &ordered_extent->flags)) {
 		truncated = true;
 		logical_len = ordered_extent->truncated_len;
@@ -5353,8 +5351,6 @@ void btrfs_evict_inode(struct inode *inode)
 	if (is_bad_inode(inode))
 		goto no_delete;
 
-	btrfs_free_io_failure_record(BTRFS_I(inode), 0, (u64)-1);
-
 	if (test_bit(BTRFS_FS_LOG_RECOVERING, &fs_info->flags))
 		goto no_delete;
 
@@ -8838,12 +8834,9 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
 	inode = &ei->vfs_inode;
 	extent_map_tree_init(&ei->extent_tree);
 	extent_io_tree_init(fs_info, &ei->io_tree, IO_TREE_INODE_IO, inode);
-	extent_io_tree_init(fs_info, &ei->io_failure_tree,
-			    IO_TREE_INODE_IO_FAILURE, inode);
 	extent_io_tree_init(fs_info, &ei->file_extent_tree,
 			    IO_TREE_INODE_FILE_EXTENT, inode);
 	ei->io_tree.track_uptodate = true;
-	ei->io_failure_tree.track_uptodate = true;
 	atomic_set(&ei->sync_writers, 0);
 	mutex_init(&ei->log_mutex);
 	btrfs_ordered_inode_tree_init(&ei->ordered_tree);
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index 290f07eb050af..764e9643c123c 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -82,7 +82,6 @@ struct btrfs_space_info;
 	EM( IO_TREE_FS_EXCLUDED_EXTENTS,  "EXCLUDED_EXTENTS")	    \
 	EM( IO_TREE_BTREE_INODE_IO,	  "BTREE_INODE_IO")	    \
 	EM( IO_TREE_INODE_IO,		  "INODE_IO")		    \
-	EM( IO_TREE_INODE_IO_FAILURE,	  "INODE_IO_FAILURE")	    \
 	EM( IO_TREE_RELOC_BLOCKS,	  "RELOC_BLOCKS")	    \
 	EM( IO_TREE_TRANS_DIRTY_PAGES,	  "TRANS_DIRTY_PAGES")      \
 	EM( IO_TREE_ROOT_DIRTY_LOG_PAGES, "ROOT_DIRTY_LOG_PAGES")   \
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH 01/15] btrfs: introduce a pure data checksum checking helper
  2022-05-17 14:50 ` [PATCH 01/15] btrfs: introduce a pure data checksum checking helper Christoph Hellwig
@ 2022-05-17 14:59   ` Johannes Thumshirn
  2022-05-18  8:44     ` Christoph Hellwig
  2022-05-20  8:45   ` Nikolay Borisov
  1 sibling, 1 reply; 55+ messages in thread
From: Johannes Thumshirn @ 2022-05-17 14:59 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Qu Wenruo, linux-btrfs

On 17/05/2022 16:51, Christoph Hellwig wrote:
> -	if (memcmp(csum, csum_expected, csum_size))
> -		goto zeroit;
> +	if (!btrfs_check_data_sector(fs_info, page, pgoff, csum, csum_expected))
> +		return 0;
>  
> -	return 0;
> -zeroit:

This makes the read flow a bit awkward IMHO, as it returns in the middle of the
function with the "good" condition and then continues with error handling.

How about:

	if (btrfs_check_data_sector(...))
		goto zeroit;

	return 0;

zeroit:
	btrfs_print_data_csum_error(...);

	

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/15] btrfs: quit early if the fs has no RAID56 support for raid56 related checks
  2022-05-17 14:50 ` [PATCH 02/15] btrfs: quit early if the fs has no RAID56 support for raid56 related checks Christoph Hellwig
@ 2022-05-17 15:00   ` Johannes Thumshirn
  2022-05-18 17:07   ` Anand Jain
  2022-05-20  8:47   ` Nikolay Borisov
  2 siblings, 0 replies; 55+ messages in thread
From: Johannes Thumshirn @ 2022-05-17 15:00 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Qu Wenruo, linux-btrfs

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 07/15] btrfs: factor out a helper to end a single sector from submit_data_read_repair
  2022-05-17 14:50 ` [PATCH 07/15] btrfs: factor out a helper to end a single sector from submit_data_read_repair Christoph Hellwig
@ 2022-05-17 15:18   ` Johannes Thumshirn
  2022-05-17 22:17   ` Qu Wenruo
  1 sibling, 0 replies; 55+ messages in thread
From: Johannes Thumshirn @ 2022-05-17 15:18 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Qu Wenruo, linux-btrfs

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 06/15] btrfs: make repair_io_failure available outside of extent_io.c
  2022-05-17 14:50 ` [PATCH 06/15] btrfs: make repair_io_failure available outside of extent_io.c Christoph Hellwig
@ 2022-05-17 15:18   ` Johannes Thumshirn
  0 siblings, 0 replies; 55+ messages in thread
From: Johannes Thumshirn @ 2022-05-17 15:18 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Qu Wenruo, linux-btrfs

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 09/15] btrfs: factor out a btrfs_csum_ptr helper
  2022-05-17 14:50 ` [PATCH 09/15] btrfs: factor out a btrfs_csum_ptr helper Christoph Hellwig
@ 2022-05-17 15:24   ` Johannes Thumshirn
  2022-05-18  8:45     ` Christoph Hellwig
  0 siblings, 1 reply; 55+ messages in thread
From: Johannes Thumshirn @ 2022-05-17 15:24 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Qu Wenruo, linux-btrfs

On 17/05/2022 16:52, Christoph Hellwig wrote:
> +static inline u8 *btrfs_csum_ptr(struct btrfs_fs_info *fs_info, u8 *csums,
> +		u64 offset)
> +{
> +	return csums +
> +		((offset >> fs_info->sectorsize_bits) * fs_info->csum_size);

I guess a local variable for holding 'offset >> fs_info->sectorsize_bits'
wouldn't have hurt readability and the compiler would've optimized it away,
but that's just me nitpicking.
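
I.e. something like this (untested, just to illustrate the nitpick):

static inline u8 *btrfs_csum_ptr(struct btrfs_fs_info *fs_info, u8 *csums,
		u64 offset)
{
	/* index of the sector inside the csum array */
	u64 index = offset >> fs_info->sectorsize_bits;

	return csums + index * fs_info->csum_size;
}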

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 05/15] btrfs: add a helper to iterate through a btrfs_bio with sector sized chunks
  2022-05-17 14:50 ` [PATCH 05/15] btrfs: add a helper to iterate through a btrfs_bio with sector sized chunks Christoph Hellwig
@ 2022-05-17 15:27   ` Johannes Thumshirn
  2022-05-18  8:46     ` Christoph Hellwig
  0 siblings, 1 reply; 55+ messages in thread
From: Johannes Thumshirn @ 2022-05-17 15:27 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Qu Wenruo, linux-btrfs

On 17/05/2022 16:51, Christoph Hellwig wrote:
> +/*
> + * Iterate through a btrfs_bio (@bbio) on a per-sector basis.
> + */
> +#define btrfs_bio_for_each_sector(fs_info, bvl, bbio, iter, bio_offset)	\
> +	for ((iter) = (bbio)->iter, (bio_offset) = 0;			\
> +	     (iter).bi_size &&					\
> +	     (((bvl) = bio_iter_iovec((&(bbio)->bio), (iter))), 1);	\
> +	     (bio_offset) += fs_info->sectorsize,			\
> +	     bio_advance_iter_single(&(bbio)->bio, &(iter),		\
> +	     (fs_info)->sectorsize))
> +
> +

This is first used in patch 12 why not move there?

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 04/15] btrfs: remove duplicated parameters from submit_data_read_repair()
  2022-05-17 14:50 ` [PATCH 04/15] btrfs: remove duplicated parameters from submit_data_read_repair() Christoph Hellwig
@ 2022-05-17 15:35   ` Johannes Thumshirn
  2022-05-20 10:05   ` Nikolay Borisov
  1 sibling, 0 replies; 55+ messages in thread
From: Johannes Thumshirn @ 2022-05-17 15:35 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Qu Wenruo, linux-btrfs

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 10/15] btrfs: add a btrfs_map_bio_wait helper
  2022-05-17 14:50 ` [PATCH 10/15] btrfs: add a btrfs_map_bio_wait helper Christoph Hellwig
@ 2022-05-17 15:37   ` Johannes Thumshirn
  2022-05-17 22:26   ` Qu Wenruo
  1 sibling, 0 replies; 55+ messages in thread
From: Johannes Thumshirn @ 2022-05-17 15:37 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Qu Wenruo, linux-btrfs

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 07/15] btrfs: factor out a helper to end a single sector from submit_data_read_repair
  2022-05-17 14:50 ` [PATCH 07/15] btrfs: factor out a helper to end a single sector from submit_data_read_repair Christoph Hellwig
  2022-05-17 15:18   ` Johannes Thumshirn
@ 2022-05-17 22:17   ` Qu Wenruo
  1 sibling, 0 replies; 55+ messages in thread
From: Qu Wenruo @ 2022-05-17 22:17 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Qu Wenruo, linux-btrfs



On 2022/5/17 22:50, Christoph Hellwig wrote:
> Add a helper to end I/O on a single sector, which will come in handy with
> the new read repair code.

The code looks good to me.

Reviewed-by: Qu Wenruo <wqu@suse.com>

Just one thing to mention: I also considered such a refactor, but the
main reason I didn't do it is that there are already more than enough
helpers, and without good enough naming things can easily get messy.

Although it looks like we're fine with the new helper for now.

Thanks,
Qu
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   fs/btrfs/extent_io.c | 26 +++++++++++++++-----------
>   1 file changed, 15 insertions(+), 11 deletions(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 75466747a252c..f96d5b7071813 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2740,6 +2740,20 @@ static void end_page_read(struct page *page, bool uptodate, u64 start, u32 len)
>   		btrfs_subpage_end_reader(fs_info, page, start, len);
>   }
>
> +static void end_sector_io(struct page *page, u64 offset, bool uptodate)
> +{
> +	struct inode *inode = page->mapping->host;
> +	u32 sectorsize = btrfs_sb(inode->i_sb)->sectorsize;
> +	struct extent_state *cached = NULL;
> +
> +	end_page_read(page, uptodate, offset, sectorsize);
> +	if (uptodate)
> +		set_extent_uptodate(&BTRFS_I(inode)->io_tree, offset,
> +				offset + sectorsize - 1, &cached, GFP_ATOMIC);
> +	unlock_extent_cached_atomic(&BTRFS_I(inode)->io_tree, offset,
> +			offset + sectorsize - 1, &cached);
> +}
> +
>   static void submit_data_read_repair(struct inode *inode, struct bio *failed_bio,
>   		u32 bio_offset, const struct bio_vec *bvec, int failed_mirror,
>   		unsigned int error_bitmap)
> @@ -2770,7 +2784,6 @@ static void submit_data_read_repair(struct inode *inode, struct bio *failed_bio,
>   	/* Iterate through all the sectors in the range */
>   	for (i = 0; i < nr_bits; i++) {
>   		const unsigned int offset = i * sectorsize;
> -		struct extent_state *cached = NULL;
>   		bool uptodate = false;
>   		int ret;
>
> @@ -2801,16 +2814,7 @@ static void submit_data_read_repair(struct inode *inode, struct bio *failed_bio,
>   		 * will not be properly unlocked.
>   		 */
>   next:
> -		end_page_read(page, uptodate, start + offset, sectorsize);
> -		if (uptodate)
> -			set_extent_uptodate(&BTRFS_I(inode)->io_tree,
> -					start + offset,
> -					start + offset + sectorsize - 1,
> -					&cached, GFP_ATOMIC);
> -		unlock_extent_cached_atomic(&BTRFS_I(inode)->io_tree,
> -				start + offset,
> -				start + offset + sectorsize - 1,
> -				&cached);
> +		end_sector_io(page, start + offset, uptodate);
>   	}
>   }
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 08/15] btrfs: refactor end_bio_extent_readpage
  2022-05-17 14:50 ` [PATCH 08/15] btrfs: refactor end_bio_extent_readpage Christoph Hellwig
@ 2022-05-17 22:22   ` Qu Wenruo
  2022-05-18  8:48     ` Christoph Hellwig
  0 siblings, 1 reply; 55+ messages in thread
From: Qu Wenruo @ 2022-05-17 22:22 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Qu Wenruo, linux-btrfs



On 2022/5/17 22:50, Christoph Hellwig wrote:
> Untangle the goto mess and remove the pointless 'ret' local variable.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   fs/btrfs/extent_io.c | 87 +++++++++++++++++++++-----------------------
>   1 file changed, 41 insertions(+), 46 deletions(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index f96d5b7071813..1ba2d4b194f2e 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -3028,7 +3028,6 @@ static void end_bio_extent_readpage(struct bio *bio)
>   	 */
>   	u32 bio_offset = 0;
>   	int mirror;
> -	int ret;
>   	struct bvec_iter_all iter_all;
>
>   	ASSERT(!bio_flagged(bio, BIO_CLONED));
> @@ -3039,6 +3038,7 @@ static void end_bio_extent_readpage(struct bio *bio)
>   		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>   		const u32 sectorsize = fs_info->sectorsize;
>   		unsigned int error_bitmap = (unsigned int)-1;
> +		bool repair = false;
>   		u64 start;
>   		u64 end;
>   		u32 len;
> @@ -3076,55 +3076,23 @@ static void end_bio_extent_readpage(struct bio *bio)
>   			if (is_data_inode(inode)) {
>   				error_bitmap = btrfs_verify_data_csum(bbio,
>   						bio_offset, page, start, end);
> -				ret = error_bitmap;
> +				if (error_bitmap)
> +					uptodate = false;
>   			} else {
> -				ret = btrfs_validate_metadata_buffer(bbio,
> -					page, start, end, mirror);
> +				if (btrfs_validate_metadata_buffer(bbio,
> +						page, start, end, mirror))
> +					uptodate = false;
>   			}
> -			if (ret)
> -				uptodate = false;
> -			else
> -				clean_io_failure(BTRFS_I(inode)->root->fs_info,
> -						 failure_tree, tree, start,
> -						 page,
> -						 btrfs_ino(BTRFS_I(inode)), 0);
>   		}
>
> -		if (likely(uptodate))
> -			goto readpage_ok;
> -
> -		if (is_data_inode(inode)) {
> -			/*
> -			 * If we failed to submit the IO at all we'll have a
> -			 * mirror_num == 0, in which case we need to just mark
> -			 * the page with an error and unlock it and carry on.
> -			 */
> -			if (mirror == 0)
> -				goto readpage_ok;
> -
> -			/*
> -			 * submit_data_read_repair() will handle all the good
> -			 * and bad sectors, we just continue to the next bvec.
> -			 */
> -			submit_data_read_repair(inode, bio, bio_offset, bvec,
> -						mirror, error_bitmap);
> -
> -			ASSERT(bio_offset + len > bio_offset);
> -			bio_offset += len;
> -			continue;
> -		} else {
> -			struct extent_buffer *eb;
> -
> -			eb = find_extent_buffer_readpage(fs_info, page, start);
> -			set_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags);
> -			eb->read_mirror = mirror;
> -			atomic_dec(&eb->io_pages);
> -		}
> -readpage_ok:
>   		if (likely(uptodate)) {
>   			loff_t i_size = i_size_read(inode);
>   			pgoff_t end_index = i_size >> PAGE_SHIFT;
>
> +			clean_io_failure(BTRFS_I(inode)->root->fs_info,
> +					 failure_tree, tree, start, page,
> +					 btrfs_ino(BTRFS_I(inode)), 0);
> +
>   			/*
>   			 * Zero out the remaining part if this range straddles
>   			 * i_size.
> @@ -3141,14 +3109,41 @@ static void end_bio_extent_readpage(struct bio *bio)
>   				zero_user_segment(page, zero_start,
>   						  offset_in_page(end) + 1);
>   			}
> +		} else if (is_data_inode(inode)) {
> +			/*
> +			 * Only try to repair bios that actually made it to a
> +			 * device.  If the bio failed to be submitted mirror
> +			 * is 0 and we need to fail it without retrying.
> +			 */
> +			if (mirror > 0)
> +				repair = true;

In fact, you can do even better by also checking the number of copies
the bio has.

That is what I missed in all my patchsets, which rely completely on the
mirror-based loop to exit.
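
I.e. something like this in end_bio_extent_readpage() (untested sketch;
'logical' stands for the logical address of the failed range, which
would still need to be looked up at this point):

		} else if (is_data_inode(inode)) {
			/*
			 * Only try to repair bios that actually made it to a
			 * device and that have at least one more copy to
			 * read from.
			 */
			if (mirror > 0 &&
			    btrfs_num_copies(fs_info, logical, len) > 1)
				repair = true;
		}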

> +		} else {
> +			struct extent_buffer *eb;
> +
> +			eb = find_extent_buffer_readpage(fs_info, page, start);
> +			set_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags);
> +			eb->read_mirror = mirror;
> +			atomic_dec(&eb->io_pages);
>   		}
> +
> +		if (repair) {
> +			/*
> +			 * submit_data_read_repair() will handle all the good
> +			 * and bad sectors, we just continue to the next bvec.
> +			 */
> +			submit_data_read_repair(inode, bio, bio_offset, bvec,
> +						mirror, error_bitmap);
> +		} else {
> +			/* Update page status and unlock */
> +			end_page_read(page, uptodate, start, len);
> +			endio_readpage_release_extent(&processed,
> +					BTRFS_I(inode), start, end,
> +					PageUptodate(page));

Another reason I didn't introduce end_sector_io():

the code here looks very similar to end_sector_io(), but due to the
@processed interface it's not compatible with it; otherwise we would do
the unlock with many more io_tree operations on much smaller ranges,
which reduces performance.

Thanks,
Qu

> +		}
> +
>   		ASSERT(bio_offset + len > bio_offset);
>   		bio_offset += len;
>
> -		/* Update page status and unlock */
> -		end_page_read(page, uptodate, start, len);
> -		endio_readpage_release_extent(&processed, BTRFS_I(inode),
> -					      start, end, PageUptodate(page));
>   	}
>   	/* Release the last extent */
>   	endio_readpage_release_extent(&processed, NULL, 0, 0, false);

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 10/15] btrfs: add a btrfs_map_bio_wait helper
  2022-05-17 14:50 ` [PATCH 10/15] btrfs: add a btrfs_map_bio_wait helper Christoph Hellwig
  2022-05-17 15:37   ` Johannes Thumshirn
@ 2022-05-17 22:26   ` Qu Wenruo
  2022-05-18  8:47     ` Christoph Hellwig
  1 sibling, 1 reply; 55+ messages in thread
From: Qu Wenruo @ 2022-05-17 22:26 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Qu Wenruo, linux-btrfs



On 2022/5/17 22:50, Christoph Hellwig wrote:
> This helper works like submit_bio_wait, but goes through the btrfs bio
> mapping using btrfs_map_bio.

I hate the naming of btrfs_map_bio(), which should be
btrfs_map_and_submit_bio(), but I also totally understand my poor naming
scheme is even worse for most cases.

Maybe we can add the "submit" part into the new function?
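
E.g. (signature only, the name is purely a suggestion):

blk_status_t btrfs_map_and_submit_bio_wait(struct btrfs_fs_info *fs_info,
		struct bio *bio, int mirror);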
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   fs/btrfs/volumes.c | 21 +++++++++++++++++++++
>   fs/btrfs/volumes.h |  2 ++
>   2 files changed, 23 insertions(+)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 0819db46dbc42..8925bc606db7e 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -6818,6 +6818,27 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
>   	return BLK_STS_OK;
>   }
>
> +static void btrfs_end_io_sync(struct bio *bio)
> +{
> +	complete(bio->bi_private);
> +}
> +
> +blk_status_t btrfs_map_bio_wait(struct btrfs_fs_info *fs_info, struct bio *bio,
> +		int mirror)
> +{
> +	DECLARE_COMPLETION_ONSTACK(done);
> +	blk_status_t ret;

Is there any lockdep assert to make sure we're in wq context?
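
At a minimum something like this at the top of the helper would help
(just a sketch; might_sleep() only asserts that the context may sleep,
it is not a real workqueue-context check):

	/* waiting for the completion below must not happen in atomic context */
	might_sleep();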

Despite these nitpicks, it looks good to me.

Thanks,
Qu

> +
> +	bio->bi_private = &done;
> +	bio->bi_end_io = btrfs_end_io_sync;
> +	ret = btrfs_map_bio(fs_info, bio, mirror);
> +	if (ret)
> +		return ret;
> +
> +	wait_for_completion_io(&done);
> +	return bio->bi_status;
> +}
> +
>   static bool dev_args_match_fs_devices(const struct btrfs_dev_lookup_args *args,
>   				      const struct btrfs_fs_devices *fs_devices)
>   {
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 6f784d4f54664..b346f6c401515 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -555,6 +555,8 @@ struct btrfs_block_group *btrfs_create_chunk(struct btrfs_trans_handle *trans,
>   void btrfs_mapping_tree_free(struct extent_map_tree *tree);
>   blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
>   			   int mirror_num);
> +blk_status_t btrfs_map_bio_wait(struct btrfs_fs_info *fs_info, struct bio *bio,
> +		int mirror);
>   int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
>   		       fmode_t flags, void *holder);
>   struct btrfs_device *btrfs_scan_one_device(const char *path,

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 11/15] btrfs: set ->file_offset in end_bio_extent_readpage
  2022-05-17 14:50 ` [PATCH 11/15] btrfs: set ->file_offset in end_bio_extent_readpage Christoph Hellwig
@ 2022-05-17 22:47   ` Qu Wenruo
  0 siblings, 0 replies; 55+ messages in thread
From: Qu Wenruo @ 2022-05-17 22:47 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Qu Wenruo, linux-btrfs



On 2022/5/17 22:50, Christoph Hellwig wrote:
> The new repair code expects ->file_offset to be set for all bios.  Set
> it just after entering end_bio_extent_readpage.  As that requires looking
> at the first vector before the first loop iteration, also use that
> opportunity to set various file-wide variables just once instead of once
> per loop iteration.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu
> ---
>   fs/btrfs/extent_io.c | 22 ++++++++++++----------
>   1 file changed, 12 insertions(+), 10 deletions(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 1ba2d4b194f2e..cbe3ab24af9e5 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -3018,25 +3018,30 @@ static struct extent_buffer *find_extent_buffer_readpage(
>    */
>   static void end_bio_extent_readpage(struct bio *bio)
>   {
> +	struct bio_vec *first_vec = bio_first_bvec_all(bio);
> +	struct inode *inode = first_vec->bv_page->mapping->host;
> +	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> +	const u32 sectorsize = fs_info->sectorsize;
>   	struct bio_vec *bvec;
>   	struct btrfs_bio *bbio = btrfs_bio(bio);
> -	struct extent_io_tree *tree, *failure_tree;
> +	int mirror = bbio->mirror_num;
> +	struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
> +	struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree;
> +	bool uptodate = !bio->bi_status;
>   	struct processed_extent processed = { 0 };
>   	/*
>   	 * The offset to the beginning of a bio, since one bio can never be
>   	 * larger than UINT_MAX, u32 here is enough.
>   	 */
>   	u32 bio_offset = 0;
> -	int mirror;
>   	struct bvec_iter_all iter_all;
>
> +	btrfs_bio(bio)->file_offset =
> +		page_offset(first_vec->bv_page) + first_vec->bv_offset;
> +
>   	ASSERT(!bio_flagged(bio, BIO_CLONED));
>   	bio_for_each_segment_all(bvec, bio, iter_all) {
> -		bool uptodate = !bio->bi_status;
>   		struct page *page = bvec->bv_page;
> -		struct inode *inode = page->mapping->host;
> -		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> -		const u32 sectorsize = fs_info->sectorsize;
>   		unsigned int error_bitmap = (unsigned int)-1;
>   		bool repair = false;
>   		u64 start;
> @@ -3046,9 +3051,7 @@ static void end_bio_extent_readpage(struct bio *bio)
>   		btrfs_debug(fs_info,
>   			"end_bio_extent_readpage: bi_sector=%llu, err=%d, mirror=%u",
>   			bio->bi_iter.bi_sector, bio->bi_status,
> -			bbio->mirror_num);
> -		tree = &BTRFS_I(inode)->io_tree;
> -		failure_tree = &BTRFS_I(inode)->io_failure_tree;
> +			mirror);
>
>   		/*
>   		 * We always issue full-sector reads, but if some block in a
> @@ -3071,7 +3074,6 @@ static void end_bio_extent_readpage(struct bio *bio)
>   		end = start + bvec->bv_len - 1;
>   		len = bvec->bv_len;
>
> -		mirror = bbio->mirror_num;
>   		if (likely(uptodate)) {
>   			if (is_data_inode(inode)) {
>   				error_bitmap = btrfs_verify_data_csum(bbio,

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 12/15] btrfs: add new read repair infrastructure
  2022-05-17 14:50 ` [PATCH 12/15] btrfs: add new read repair infrastructure Christoph Hellwig
@ 2022-05-17 23:04   ` Qu Wenruo
  2022-05-18  8:54     ` Christoph Hellwig
                       ` (2 more replies)
  0 siblings, 3 replies; 55+ messages in thread
From: Qu Wenruo @ 2022-05-17 23:04 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Qu Wenruo, linux-btrfs



On 2022/5/17 22:50, Christoph Hellwig wrote:
> This adds a new read repair implementation for btrfs.  It is synchronous
> in that the end I/O handlers call it, and will get back the results
> instead of potentially getting multiple concurrent calls back into the
> original end I/O handler.  The synchronous nature has the following
> advantages:
>
>   - there is no need for a per-I/O tree of I/O failures, as everything
>     related to the I/O failure can be handled locally
>   - not having separate repair end I/O helpers will in the future help
>     to reuse the direct I/O bio from iomap for the actual submission and
>     thus remove the btrfs_dio_private infrastructure
>
> Because submitting many sector size synchronous I/Os would be very
> slow when multiple sectors (or a whole read) fail, this new code instead
> submits a single read and repair write bio for each contiguous section.
> It uses a clone of the bio to do that and thus does not need to allocate
> any extra bio_vecs.  Note that this cloning is open coded instead of
> using the block layer clone helpers as the clone is based on the saved
> iter in the btrfs_bio, and not bio.bi_iter, which at the point that the
> repair code is called has been advanced by the low-level driver.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   fs/btrfs/Makefile      |   2 +-
>   fs/btrfs/read-repair.c | 211 +++++++++++++++++++++++++++++++++++++++++
>   fs/btrfs/read-repair.h |  33 +++++++
>   fs/btrfs/super.c       |   9 +-
>   4 files changed, 252 insertions(+), 3 deletions(-)
>   create mode 100644 fs/btrfs/read-repair.c
>   create mode 100644 fs/btrfs/read-repair.h
>
> diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
> index 99f9995670ea3..0b2605c750cab 100644
> --- a/fs/btrfs/Makefile
> +++ b/fs/btrfs/Makefile
> @@ -31,7 +31,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
>   	   backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \
>   	   uuid-tree.o props.o free-space-tree.o tree-checker.o space-info.o \
>   	   block-rsv.o delalloc-space.o block-group.o discard.o reflink.o \
> -	   subpage.o tree-mod-log.o
> +	   subpage.o tree-mod-log.o read-repair.o
>
>   btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
>   btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
> diff --git a/fs/btrfs/read-repair.c b/fs/btrfs/read-repair.c
> new file mode 100644
> index 0000000000000..3ac93bfe09e4f
> --- /dev/null
> +++ b/fs/btrfs/read-repair.c
> @@ -0,0 +1,211 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2022 Christoph Hellwig.
> + */
> +#include "ctree.h"
> +#include "volumes.h"
> +#include "read-repair.h"
> +#include "btrfs_inode.h"
> +
> +static struct bio_set read_repair_bioset;
> +
> +static int next_mirror(struct btrfs_read_repair *rr, int cur_mirror)
> +{
> +	if (cur_mirror == rr->num_copies)
> +		return cur_mirror + 1 - rr->num_copies;
> +	return cur_mirror + 1;
> +}
> +
> +static int prev_mirror(struct btrfs_read_repair *rr, int cur_mirror)
> +{
> +	if (cur_mirror == 1)
> +		return rr->num_copies;
> +	return cur_mirror - 1;
> +}
> +
> +/*
> + * Clone a new bio from the src_bbio, using the saved iter in the btrfs_bio
> + * instead of bi_iter.
> + */
> +static struct btrfs_bio *btrfs_repair_bio_clone(struct btrfs_bio *src_bbio,
> +		u64 offset, u32 size, unsigned int op)
> +{
> +	struct btrfs_bio *bbio;
> +	struct bio *bio;
> +
> +	bio = bio_alloc_bioset(NULL, 0, op | REQ_SYNC, GFP_NOFS,
> +			       &read_repair_bioset);
> +	bio_set_flag(bio, BIO_CLONED);

Do we need to bother setting the CLONED flag?

Without the CLONED flag, we could simply use bio_for_each_segment_all()
in the endio function without needing bbio->iter, which would save some
memory.
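
I.e. roughly this in the endio path (sketch only; it assumes the repair
bio's bi_io_vec starts exactly at the repaired range, so iterating from
the start of the vec array is correct):

	struct bvec_iter_all iter_all;
	struct bio_vec *bvec;

	bio_for_each_segment_all(bvec, &read_bbio->bio, iter_all) {
		/* verify or consume each repaired sector here */
	}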

> +
> +	bio->bi_io_vec = src_bbio->bio.bi_io_vec;
> +	bio->bi_iter = src_bbio->iter;
> +	bio_advance(bio, offset);
> +	bio->bi_iter.bi_size = size;
> +
> +	bbio = btrfs_bio(bio);
> +	memset(bbio, 0, offsetof(struct btrfs_bio, bio));
> +	bbio->iter = bbio->bio.bi_iter;
> +	bbio->file_offset = src_bbio->file_offset + offset;
> +
> +	return bbio;
> +}
> +
> +static void btrfs_repair_one_mirror(struct btrfs_bio *read_bbio,
> +		struct btrfs_bio *failed_bbio, struct inode *inode,
> +		int bad_mirror)
> +{
> +	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> +	struct btrfs_inode *bi = BTRFS_I(inode);
> +	u64 logical = read_bbio->iter.bi_sector << SECTOR_SHIFT;
> +	u64 file_offset = read_bbio->file_offset;
> +	struct btrfs_bio *write_bbio;
> +	blk_status_t ret;
> +
> +	/*
> +	 * For zoned file systems repair has to relocate the whole zone.
> +	 */
> +	if (btrfs_repair_one_zone(fs_info, logical))
> +		return;
> +
> +	/*
> +	 * For RAID56, we can not just write the bad data back, as any write
> +	 * will trigger RMW and read back the corrupted on-disk stripe, causing
> +	 * further damage.
> +	 *
> +	 * Perform a special low-level repair that bypasses btrfs_map_bio.
> +	 */
> +	if (btrfs_is_parity_mirror(fs_info, logical, fs_info->sectorsize)) {
> +		struct bvec_iter iter;
> +		struct bio_vec bv;
> +		u32 offset;
> +
> +		btrfs_bio_for_each_sector(fs_info, bv, read_bbio, iter, offset)
> +			btrfs_repair_io_failure(fs_info, btrfs_ino(bi),
> +					file_offset + offset,
> +					fs_info->sectorsize,
> +					logical + offset,
> +					bv.bv_page, bv.bv_offset,
> +					bad_mirror);
> +		return;
> +	}
> +
> +	/*
> +	 * Otherwise just clone the whole bio and write it back to the
> +	 * previously bad mirror.
> +	 */
> +	write_bbio = btrfs_repair_bio_clone(read_bbio, 0,
> +			read_bbio->iter.bi_size, REQ_OP_WRITE);

Do we need to clone the whole bio?

Considering that read repair is already the cold path in most cases,
having more than one corruption is rarer still.

> +	ret = btrfs_map_bio_wait(fs_info, &write_bbio->bio, bad_mirror);
> +	bio_put(&write_bbio->bio);
> +
> +	btrfs_info_rl(fs_info,
> +		"%s: root %lld ino %llu off %llu logical %llu/%u from good mirror %d",
> +		ret ? "failed to correct read error" : "read error corrected",
> +		bi->root->root_key.objectid, btrfs_ino(bi),
> +		file_offset, logical, read_bbio->iter.bi_size, bad_mirror);
> +}
> +
> +static bool btrfs_repair_read_bio(struct btrfs_bio *bbio,
> +		struct btrfs_bio *failed_bbio, struct inode *inode,
> +		int read_mirror)
> +{
> +	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> +	u32 start_offset = bbio->file_offset - failed_bbio->file_offset;
> +	u8 csum[BTRFS_CSUM_SIZE];
> +	struct bvec_iter iter;
> +	struct bio_vec bv;
> +	u32 offset;
> +
> +	if (btrfs_map_bio_wait(fs_info, &bbio->bio, read_mirror))
> +		return false;
> +
> +	if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)
> +		return true;
> +
> +	btrfs_bio_for_each_sector(fs_info, bv, bbio, iter, offset) {
> +		u8 *expected_csum =
> +			btrfs_csum_ptr(fs_info, failed_bbio->csum,
> +					start_offset + offset);
> +
> +		if (btrfs_check_data_sector(fs_info, bv.bv_page, bv.bv_offset,
> +				csum, expected_csum))
> +			return false;
> +	}
> +
> +	return true;
> +}
> +
> +bool __btrfs_read_repair_end(struct btrfs_read_repair *rr,
> +		struct btrfs_bio *failed_bbio, struct inode *inode,
> +		u64 end_offset, repair_endio_t endio)

The reason I don't use "end" in my patchset is that anything with "end"
in the name makes people think it is related to endio.

And in fact, when checking this function I really thought it was
endio-related, but it's the equivalent of btrfs_read_repair_finish().

> +{
> +	u8 bad_mirror = failed_bbio->mirror_num;
> +	u8 read_mirror = next_mirror(rr, bad_mirror);
> +	struct btrfs_bio *read_bbio = NULL;
> +	bool uptodate = false;
> +
> +	do {
> +		if (read_bbio)
> +			bio_put(&read_bbio->bio);
> +		read_bbio = btrfs_repair_bio_clone(failed_bbio,
> +				rr->start_offset, end_offset - rr->start_offset,
> +				REQ_OP_READ);
> +		if (btrfs_repair_read_bio(read_bbio, failed_bbio, inode,
> +				read_mirror)) {

A big NONO here.

Function btrfs_repair_read_bio() will only return true if all of its
data matches its csums.

Consider the following case:

Profile RAID1C3, 2 sectors to read, the initial mirror is 1.

Mirror 1:	|X|X|
Mirror 2:	|X| |
Mirror 3:	| |X|

Now we will get -EIO, but in reality we can repair the read by using
the first sector from mirror 3 and the second sector from mirror 2.

This is a behavior regression.

And that's why the original repair code and all my patchsets do a
sector-by-sector check, not a full-range check.
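
In pseudo-C the per-sector retry looks roughly like this
(read_and_check_one_sector() is a hypothetical helper, just to
illustrate the difference from a full-range check):

	for (i = 0; i < nr_sectors; i++) {
		int mirror = next_mirror(rr, failed_mirror);

		do {
			/* read sector i from 'mirror' and verify its csum */
			if (read_and_check_one_sector(inode, bbio, i, mirror))
				break;
			mirror = next_mirror(rr, mirror);
		} while (mirror != failed_mirror);

		/* every other mirror failed for this sector */
		if (mirror == failed_mirror)
			uptodate = false;
	}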

Thanks,
Qu

> +			do {
> +		    		btrfs_repair_one_mirror(read_bbio, failed_bbio,
> +						inode, bad_mirror);
> +			} while ((bad_mirror = prev_mirror(rr, bad_mirror)) !=
> +					failed_bbio->mirror_num);
> +			uptodate = true;
> +			break;
> +		}
> +		bad_mirror = read_mirror;
> +		read_mirror = next_mirror(rr, bad_mirror);
> +	} while (read_mirror != failed_bbio->mirror_num);
> +
> +	if (endio)
> +		endio(read_bbio, inode, uptodate);
> +	bio_put(&read_bbio->bio);
> +
> +	rr->in_use = false;
> +	return uptodate;
> +}
> +
> +bool btrfs_read_repair_add(struct btrfs_read_repair *rr,
> +		struct btrfs_bio *failed_bbio, struct inode *inode,
> +		u64 start_offset)
> +{
> +	if (rr->in_use)
> +		return true;
> +
> +	if (!rr->num_copies) {
> +		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> +
> +		rr->num_copies = btrfs_num_copies(fs_info,
> +				failed_bbio->iter.bi_sector << SECTOR_SHIFT,
> +				failed_bbio->iter.bi_size);
> +	}
> +
> +	/*
> +	 * If there is no other copy of the data to recover from, give up now
> +	 * and don't even try to build up a larger batch.
> +	 */
> +	if (rr->num_copies < 2)
> +		return false;
> +
> +	rr->in_use = true;
> +	rr->start_offset = start_offset;
> +	return true;
> +}
> +
> +void btrfs_read_repair_exit(void)
> +{
> +	bioset_exit(&read_repair_bioset);
> +}
> +
> +int __init btrfs_read_repair_init(void)
> +{
> +	return bioset_init(&read_repair_bioset, BIO_POOL_SIZE,
> +			offsetof(struct btrfs_bio, bio), 0);
> +}
> diff --git a/fs/btrfs/read-repair.h b/fs/btrfs/read-repair.h
> new file mode 100644
> index 0000000000000..e371790af2b3e
> --- /dev/null
> +++ b/fs/btrfs/read-repair.h
> @@ -0,0 +1,33 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef BTRFS_READ_REPAIR_H
> +#define BTRFS_READ_REPAIR_H
> +
> +struct btrfs_read_repair {
> +	u64 start_offset;
> +	bool in_use;
> +	int num_copies;
> +};
> +
> +typedef void (*repair_endio_t)(struct btrfs_bio *repair_bbio,
> +		struct inode *inode, bool uptodate);
> +
> +bool btrfs_read_repair_add(struct btrfs_read_repair *rr,
> +		struct btrfs_bio *failed_bbio, struct inode *inode,
> +		u64 bio_offset);
> +bool __btrfs_read_repair_end(struct btrfs_read_repair *rr,
> +		struct btrfs_bio *failed_bbio, struct inode *inode,
> +		u64 end_offset, repair_endio_t end_io);
> +static inline bool btrfs_read_repair_end(struct btrfs_read_repair *rr,
> +		struct btrfs_bio *failed_bbio, struct inode *inode,
> +		u64 end_offset, repair_endio_t endio)
> +{
> +	if (!rr->in_use)
> +		return true;
> +	return __btrfs_read_repair_end(rr, failed_bbio, inode, end_offset,
> +			endio);
> +}
> +
> +int __init btrfs_read_repair_init(void);
> +void btrfs_read_repair_exit(void);
> +
> +#endif /* BTRFS_READ_REPAIR_H */
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index b1fdc6a26c76e..b33ad892c3058 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -48,6 +48,7 @@
>   #include "block-group.h"
>   #include "discard.h"
>   #include "qgroup.h"
> +#include "read-repair.h"
>   #define CREATE_TRACE_POINTS
>   #include <trace/events/btrfs.h>
>
> @@ -2641,10 +2642,12 @@ static int __init init_btrfs_fs(void)
>   	err = extent_io_init();
>   	if (err)
>   		goto free_cachep;
> -
> -	err = extent_state_cache_init();
> +	err = btrfs_read_repair_init();
>   	if (err)
>   		goto free_extent_io;
> +	err = extent_state_cache_init();
> +	if (err)
> +		goto free_read_repair;
>
>   	err = extent_map_init();
>   	if (err)
> @@ -2708,6 +2711,8 @@ static int __init init_btrfs_fs(void)
>   	extent_map_exit();
>   free_extent_state_cache:
>   	extent_state_cache_exit();
> +free_read_repair:
> +	btrfs_read_repair_exit();
>   free_extent_io:
>   	extent_io_exit();
>   free_cachep:

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 01/15] btrfs: introduce a pure data checksum checking helper
  2022-05-17 14:59   ` Johannes Thumshirn
@ 2022-05-18  8:44     ` Christoph Hellwig
  0 siblings, 0 replies; 55+ messages in thread
From: Christoph Hellwig @ 2022-05-18  8:44 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba,
	Qu Wenruo, linux-btrfs

On Tue, May 17, 2022 at 02:59:13PM +0000, Johannes Thumshirn wrote:
> This makes the read flow a bit awkward IMHO, as it returns in the middle of the
> function with the "good" condition and then continues with error handling.
> 
> How about:
> 
> 	if (btrfs_check_data_sector(...))
> 		goto zeroit;
> 
> 	return 0;
> 
> zeroit:
> 	btrfs_print_data_csum_error(...);

Well, the flow was just as bad before, but otherwise I agree.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 09/15] btrfs: factor out a btrfs_csum_ptr helper
  2022-05-17 15:24   ` Johannes Thumshirn
@ 2022-05-18  8:45     ` Christoph Hellwig
  0 siblings, 0 replies; 55+ messages in thread
From: Christoph Hellwig @ 2022-05-18  8:45 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba,
	Qu Wenruo, linux-btrfs

On Tue, May 17, 2022 at 03:24:26PM +0000, Johannes Thumshirn wrote:
> On 17/05/2022 16:52, Christoph Hellwig wrote:
> > +static inline u8 *btrfs_csum_ptr(struct btrfs_fs_info *fs_info, u8 *csums,
> > +		u64 offset)
> > +{
> > +	return csums +
> > +		((offset >> fs_info->sectorsize_bits) * fs_info->csum_size);
> 
> I guess a local variable for holding 'offset >> fs_info->sectorsize_bits'
> wouldn't have hurt readability and the compiler would've optimized it away,
> but that's just me nitpicking.

I can do that if there is a strong opinion, but I'm not sure it would
help readability at all.
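
For comparison, a minimal sketch of the variant Johannes suggests, with
the same behavior as the quoted helper, just the shift pulled into a
local variable:

	static inline u8 *btrfs_csum_ptr(struct btrfs_fs_info *fs_info, u8 *csums,
			u64 offset)
	{
		/* number of full sectors before @offset in the bio */
		u64 index = offset >> fs_info->sectorsize_bits;

		return csums + index * fs_info->csum_size;
	}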

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 05/15] btrfs: add a helper to iterate through a btrfs_bio with sector sized chunks
  2022-05-17 15:27   ` Johannes Thumshirn
@ 2022-05-18  8:46     ` Christoph Hellwig
  2022-05-18 10:07       ` Qu Wenruo
  0 siblings, 1 reply; 55+ messages in thread
From: Christoph Hellwig @ 2022-05-18  8:46 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba,
	Qu Wenruo, linux-btrfs

On Tue, May 17, 2022 at 03:27:35PM +0000, Johannes Thumshirn wrote:
> On 17/05/2022 16:51, Christoph Hellwig wrote:
> > +/*
> > + * Iterate through a btrfs_bio (@bbio) on a per-sector basis.
> > + */
> > +#define btrfs_bio_for_each_sector(fs_info, bvl, bbio, iter, bio_offset)	\
> > +	for ((iter) = (bbio)->iter, (bio_offset) = 0;			\
> > +	     (iter).bi_size &&					\
> > +	     (((bvl) = bio_iter_iovec((&(bbio)->bio), (iter))), 1);	\
> > +	     (bio_offset) += fs_info->sectorsize,			\
> > +	     bio_advance_iter_single(&(bbio)->bio, &(iter),		\
> > +	     (fs_info)->sectorsize))
> > +
> > +
> 
> This is first used in patch 12 why not move there?

Because it is a logically separate change that doesn't really have
anything to do with the repair code, except that it happens to be the
first user?
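
To make the macro concrete, a minimal usage sketch; the csum lookup via
btrfs_csum_ptr() (from patch 09) is just one plausible loop body, not
taken from the actual series:

	struct bio_vec bvl;
	struct bvec_iter iter;
	u32 bio_offset;

	btrfs_bio_for_each_sector(fs_info, bvl, bbio, iter, bio_offset) {
		/* bvl covers exactly one sector of the bio */
		u8 *csum = btrfs_csum_ptr(fs_info, bbio->csum, bio_offset);

		/*
		 * Verify or otherwise process the sector at
		 * bvl.bv_page + bvl.bv_offset here.
		 */
	}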

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 10/15] btrfs: add a btrfs_map_bio_wait helper
  2022-05-17 22:26   ` Qu Wenruo
@ 2022-05-18  8:47     ` Christoph Hellwig
  0 siblings, 0 replies; 55+ messages in thread
From: Christoph Hellwig @ 2022-05-18  8:47 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba,
	Qu Wenruo, linux-btrfs

On Wed, May 18, 2022 at 06:26:08AM +0800, Qu Wenruo wrote:
>
>
> On 2022/5/17 22:50, Christoph Hellwig wrote:
>> This helper works like submit_bio_wait, but goes through the btrfs bio
>> mapping using btrfs_map_bio.
>
> I hate the naming of btrfs_map_bio(), which should be
> btrfs_map_and_submit_bio(), but I also totally understand my poor naming
> scheme is even worse for most cases.
>
> Maybe we can add the "submit" part to the new function's name?

I was tempted to rename btrfs_map_bio to just btrfs_submit_bio a
few times, but always had more important things to do first.  But
either way this helper should match the naming of the main async
function.

>> +	DECLARE_COMPLETION_ONSTACK(done);
>> +	blk_status_t ret;
>
> Is there any lockdep assert to make sure we're in wq context?


No.  You can build some asserts manually, but wait_for_completion will
already do that, so I'm not sure what the benefit would be.
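
For readers following along, a minimal sketch of what such a helper can
look like, pieced together from the fragments quoted above; the endio
function and the exact argument list are assumptions, not the actual
patch:

	static void btrfs_end_bio_wait(struct bio *bio)
	{
		complete(bio->bi_private);
	}

	blk_status_t btrfs_map_bio_wait(struct btrfs_fs_info *fs_info,
			struct bio *bio, int mirror_num)
	{
		DECLARE_COMPLETION_ONSTACK(done);
		blk_status_t ret;

		bio->bi_private = &done;
		bio->bi_end_io = btrfs_end_bio_wait;
		ret = btrfs_map_bio(fs_info, bio, mirror_num);
		if (ret)
			return ret;

		wait_for_completion_io(&done);
		return bio->bi_status;
	}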


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 08/15] btrfs: refactor end_bio_extent_readpage
  2022-05-17 22:22   ` Qu Wenruo
@ 2022-05-18  8:48     ` Christoph Hellwig
  0 siblings, 0 replies; 55+ messages in thread
From: Christoph Hellwig @ 2022-05-18  8:48 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba,
	Qu Wenruo, linux-btrfs

On Wed, May 18, 2022 at 06:22:54AM +0800, Qu Wenruo wrote:
>> +		} else if (is_data_inode(inode)) {
>> +			/*
>> +			 * Only try to repair bios that actually made it to a
>> +			 * device.  If the bio failed to be submitted mirror
>> +			 * is 0 and we need to fail it without retrying.
>> +			 */
>> +			if (mirror > 0)
>> +				repair = true;
>
> In fact, you can do it even better, by also checking the number of
> copies the bio has.

That's done first thing in the actual new repair code.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 12/15] btrfs: add new read repair infrastructure
  2022-05-17 23:04   ` Qu Wenruo
@ 2022-05-18  8:54     ` Christoph Hellwig
  2022-05-18 10:20       ` Qu Wenruo
  2022-05-19  9:36     ` Christoph Hellwig
  2022-05-20 15:25     ` [PATCH 12/15] btrfs: add new read repair infrastructure Christoph Hellwig
  2 siblings, 1 reply; 55+ messages in thread
From: Christoph Hellwig @ 2022-05-18  8:54 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba,
	Qu Wenruo, linux-btrfs

On Wed, May 18, 2022 at 07:04:22AM +0800, Qu Wenruo wrote:
>> +static struct btrfs_bio *btrfs_repair_bio_clone(struct btrfs_bio *src_bbio,
>> +		u64 offset, u32 size, unsigned int op)
>> +{
>> +	struct btrfs_bio *bbio;
>> +	struct bio *bio;
>> +
>> +	bio = bio_alloc_bioset(NULL, 0, op | REQ_SYNC, GFP_NOFS,
>> +			       &read_repair_bioset);
>> +	bio_set_flag(bio, BIO_CLONED);
>
> Do we need to bother setting the CLONED flag?

The CLONED flag should never be set.  Except for the one bogus check
in btrfs that I have a pending removal for, it is only used for debugging
checks.

> Without CLONED flag, we can easily go bio_for_each_segment_all() in the
> endio function without the need of bbio->iter, thus can save some memory.

bio_for_each_segment_all ignores the iter and walks over bi_io_vec
directly.  And that is something we absolutely can't do here, as the
bio reuses the bio_vecs from the failed bio and, depending on what
failed, reduces the size.

>> +	/*
>> +	 * Otherwise just clone the whole bio and write it back to the
>> +	 * previously bad mirror.
>> +	 */
>> +	write_bbio = btrfs_repair_bio_clone(read_bbio, 0,
>> +			read_bbio->iter.bi_size, REQ_OP_WRITE);
>
> Do we need to clone the whole bio?
>
> Considering that in most cases read repair is already the cold path,
> having more than one corruption is already rare.

The read bio is trimmed to only cover the bad area, so this already
potentially does not cover the whole failed bbio.  But except for
the case you note below we do need to write back the whole bio.

>> +bool __btrfs_read_repair_end(struct btrfs_read_repair *rr,
>> +		struct btrfs_bio *failed_bbio, struct inode *inode,
>> +		u64 end_offset, repair_endio_t endio)
>
> The reason I don't use "end" in my patchset is, anything related to "end"
> will lead people to think it has something to do with endio.

That seems like an odd connotation.  Without io in the word, end just
connotes the end of a range in most of the Linux I/O stack.

> And in fact when checking the function, I really thought it's something
> related to endio, but it's the equivalent of btrfs_read_repair_finish().

But if there is a strong preference I can use finish instead of end.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 05/15] btrfs: add a helper to iterate through a btrfs_bio with sector sized chunks
  2022-05-18  8:46     ` Christoph Hellwig
@ 2022-05-18 10:07       ` Qu Wenruo
  2022-05-20 16:27         ` Christoph Hellwig
  0 siblings, 1 reply; 55+ messages in thread
From: Qu Wenruo @ 2022-05-18 10:07 UTC (permalink / raw)
  To: Christoph Hellwig, Johannes Thumshirn
  Cc: Chris Mason, Josef Bacik, David Sterba, Qu Wenruo, linux-btrfs



On 2022/5/18 16:46, Christoph Hellwig wrote:
> On Tue, May 17, 2022 at 03:27:35PM +0000, Johannes Thumshirn wrote:
>> On 17/05/2022 16:51, Christoph Hellwig wrote:
>>> +/*
>>> + * Iterate through a btrfs_bio (@bbio) on a per-sector basis.
>>> + */
>>> +#define btrfs_bio_for_each_sector(fs_info, bvl, bbio, iter, bio_offset)	\
>>> +	for ((iter) = (bbio)->iter, (bio_offset) = 0;			\
>>> +	     (iter).bi_size &&					\
>>> +	     (((bvl) = bio_iter_iovec((&(bbio)->bio), (iter))), 1);	\
>>> +	     (bio_offset) += fs_info->sectorsize,			\
>>> +	     bio_advance_iter_single(&(bbio)->bio, &(iter),		\
>>> +	     (fs_info)->sectorsize))
>>> +
>>> +
>>
>> This is first used in patch 12 why not move there?
>
> Because it is a logically separate change that doesn't really have
> anything to do with the repair code, except that it happens to be the
> first user?

In fact, there are some other call sites that can benefit from the helper,
like btrfs_csum_one_bio().

Thus if you can convert those call sites, there will be no question about
introducing the helper in one patch.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 12/15] btrfs: add new read repair infrastructure
  2022-05-18  8:54     ` Christoph Hellwig
@ 2022-05-18 10:20       ` Qu Wenruo
  2022-05-18 12:48         ` Christoph Hellwig
  0 siblings, 1 reply; 55+ messages in thread
From: Qu Wenruo @ 2022-05-18 10:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chris Mason, Josef Bacik, David Sterba, Qu Wenruo, linux-btrfs



On 2022/5/18 16:54, Christoph Hellwig wrote:
> On Wed, May 18, 2022 at 07:04:22AM +0800, Qu Wenruo wrote:
>>> +static struct btrfs_bio *btrfs_repair_bio_clone(struct btrfs_bio *src_bbio,
>>> +		u64 offset, u32 size, unsigned int op)
>>> +{
>>> +	struct btrfs_bio *bbio;
>>> +	struct bio *bio;
>>> +
>>> +	bio = bio_alloc_bioset(NULL, 0, op | REQ_SYNC, GFP_NOFS,
>>> +			       &read_repair_bioset);
>>> +	bio_set_flag(bio, BIO_CLONED);
>>
>> Do we need to bother setting the CLONED flag?
>
> The CLONED flag should never be set.  Except for the one bogus check
> in btrfs that I have a pending removal for, it is only used for debugging
> checks.
>
>> Without CLONED flag, we can easily go bio_for_each_segment_all() in the
>> endio function without the need of bbio->iter, thus can save some memory.
>
> bio_for_each_segment_all ignores the iter and walks over bi_io_vec
> directly.  And that is something we absolutely can't do here, as the
> bio reuses the bio_vecs from the failed bio and, depending on what
> failed, reduces the size.

My bad, I saw the bio_alloc_bioset() but didn't check that it allocates a
bi_io_vec with size 0 and soon reuses the original bi_io_vec.

So the function matches its name, it's really a bio clone.

And it's very different from my version, which really allocates a new
bio with a large enough bi_io_vec and then adds back the needed sectors
from the original bio.

Then I guess the BIO_CLONED flag is completely fine.

But in that case, you may want to call bio_alloc_clone() directly? It
can handle a bioset without problem.
>
>>> +	/*
>>> +	 * Otherwise just clone the whole bio and write it back to the
>>> +	 * previously bad mirror.
>>> +	 */
>>> +	write_bbio = btrfs_repair_bio_clone(read_bbio, 0,
>>> +			read_bbio->iter.bi_size, REQ_OP_WRITE);
>>
>> Do we need to clone the whole bio?
>>
>> Considering that in most cases read repair is already the cold path,
>> having more than one corruption is already rare.
>
> The read bio is trimmed to only cover the bad area, so this already
> potentially does not cover the whole failed bbio.  But except for
> the case you note below we do need to write back the whole bio.
>
>>> +bool __btrfs_read_repair_end(struct btrfs_read_repair *rr,
>>> +		struct btrfs_bio *failed_bbio, struct inode *inode,
>>> +		u64 end_offset, repair_endio_t endio)
>>
>> The reason I don't use "end" in my patchset is, anything related to "end"
>> will lead people to think it has something to do with endio.
>
> That seems like an odd connotation.  Without io in the word, end just
> connotes the end of a range in most of the Linux I/O stack.

OK, thanks for explaining this, I guess it's my problem linking "end" to
"end_io" or "endio".

Then no problem using this current naming.

Thanks,
Qu
>
>> And in fact when checking the function, I really thought it's something
>> related to endio, but it's the equivalent of btrfs_read_repair_finish().
>
> But if there is a strong preference I can use finish instead of end.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 12/15] btrfs: add new read repair infrastructure
  2022-05-18 10:20       ` Qu Wenruo
@ 2022-05-18 12:48         ` Christoph Hellwig
  0 siblings, 0 replies; 55+ messages in thread
From: Christoph Hellwig @ 2022-05-18 12:48 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba,
	Qu Wenruo, linux-btrfs

On Wed, May 18, 2022 at 06:20:53PM +0800, Qu Wenruo wrote:
> My bad, I saw the bio_alloc_bioset() but didn't check that it allocates a
> bi_io_vec with size 0 and soon reuses the original bi_io_vec.
>
> So the function matches its name, it's really a bio clone.
>
> And it's very different from my version, which really allocates a new
> bio with a large enough bi_io_vec and then adds back the needed sectors
> from the original bio.
>
> Then I guess the BIO_CLONED flag is completely fine.
>
> But in that case, you may want to call bio_alloc_clone() directly? It
> can handle a bioset without problem.

The big difference is that bio_alloc_clone directly looks at
bio->bi_iter, while we must look at btrfs_bio->iter as bio->bi_iter
may already be consumed by the driver.  I thought the comment in the
code explains that, but maybe I need to improve it.
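
In other words, the repair clone has to be set up from the iter saved at
submission time, roughly like this (a sketch of the idea, not the exact
patch):

	/* share the failed bio's bio_vec array instead of copying pages */
	bio = bio_alloc_bioset(NULL, 0, op | REQ_SYNC, GFP_NOFS,
			       &read_repair_bioset);
	bio->bi_io_vec = src_bbio->bio.bi_io_vec;

	/* start from the iter saved at submit time, not bio->bi_iter */
	bio->bi_iter = src_bbio->iter;
	bio_advance(bio, offset);
	bio->bi_iter.bi_size = size;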

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/15] btrfs: quit early if the fs has no RAID56 support for raid56 related checks
  2022-05-17 14:50 ` [PATCH 02/15] btrfs: quit early if the fs has no RAID56 support for raid56 related checks Christoph Hellwig
  2022-05-17 15:00   ` Johannes Thumshirn
@ 2022-05-18 17:07   ` Anand Jain
  2022-05-20  8:47   ` Nikolay Borisov
  2 siblings, 0 replies; 55+ messages in thread
From: Anand Jain @ 2022-05-18 17:07 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Qu Wenruo, linux-btrfs

On 5/17/22 20:20, Christoph Hellwig wrote:
> From: Qu Wenruo <wqu@suse.com>
> 
> The following functions do special handling for RAID56 chunks:
> 
> - btrfs_is_parity_mirror()
>    Check if the range is in RAID56 chunks.
> 
> - btrfs_full_stripe_len()
>    Either return sectorsize for non-RAID56 profiles or full stripe length
>    for RAID56 chunks.
> 
> But if a filesystem has no RAID56 chunks, it will not have the RAID56
> incompat flag, and we can skip the chunk tree lookup completely.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

LGTM.

Reviewed-by: Anand Jain <anand.jain@oracle.com>



> ---
>   fs/btrfs/volumes.c | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 58f3eece8a48c..0819db46dbc42 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -5769,6 +5769,9 @@ unsigned long btrfs_full_stripe_len(struct btrfs_fs_info *fs_info,
>   	struct map_lookup *map;
>   	unsigned long len = fs_info->sectorsize;
>   
> +	if (!btrfs_fs_incompat(fs_info, RAID56))
> +		return len;
> +
>   	em = btrfs_get_chunk_map(fs_info, logical, len);
>   
>   	if (!WARN_ON(IS_ERR(em))) {
> @@ -5786,6 +5789,9 @@ int btrfs_is_parity_mirror(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
>   	struct map_lookup *map;
>   	int ret = 0;
>   
> +	if (!btrfs_fs_incompat(fs_info, RAID56))
> +		return 0;
> +
>   	em = btrfs_get_chunk_map(fs_info, logical, len);
>   
>   	if(!WARN_ON(IS_ERR(em))) {


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 12/15] btrfs: add new read repair infrastructure
  2022-05-17 23:04   ` Qu Wenruo
  2022-05-18  8:54     ` Christoph Hellwig
@ 2022-05-19  9:36     ` Christoph Hellwig
  2022-05-19 10:41       ` Qu Wenruo
  2022-05-20 15:25     ` [PATCH 12/15] btrfs: add new read repair infrastructure Christoph Hellwig
  2 siblings, 1 reply; 55+ messages in thread
From: Christoph Hellwig @ 2022-05-19  9:36 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba,
	Qu Wenruo, linux-btrfs

On Wed, May 18, 2022 at 07:04:22AM +0800, Qu Wenruo wrote:
> Function btrfs_repair_read_bio() will only return true if all of its
> data matches csum.
>
> Consider the following case:
>
> Profile RAID1C3, 2 sectors to read, the initial mirror is 1.
>
> Mirror 1:	|X|X|
> Mirror 2:	|X| |
> Mirror 3:	| |X|
>
> Now we will get -EIO, but in reality, we can repair the read by using
> the first sector from mirror 3 and the 2nd sector from mirror 2.

I tried to write a test case for this by copying btrfs/140 and then
as a first step extending it to three mirrors using the raid1c3
profile.  But it seems that the tricks used there don't work,
as the code in btrfs/140 relies on the fact that the btrfs logical
address reported by filefrag is reported by dump-tree as the item
"index" in this line:

item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 137756672) itemoff 15751 itemsiz

but for the raid1c3 profile that line reports something entirely
different.

for raid1:

logical: 137756672
item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 137756672) itemoff 15751 itemsize 112

for raid1c3:

logical: 343998464
item 5 key (FIRST_CHUNK_TREE CHUNK_ITEM 298844160) itemoff 15621 itemsize 144

any idea how to find physical sectors to corrupt for raid1c3?

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 12/15] btrfs: add new read repair infrastructure
  2022-05-19  9:36     ` Christoph Hellwig
@ 2022-05-19 10:41       ` Qu Wenruo
  2022-05-19 10:45         ` Nikolay Borisov
                           ` (2 more replies)
  0 siblings, 3 replies; 55+ messages in thread
From: Qu Wenruo @ 2022-05-19 10:41 UTC (permalink / raw)
  To: Christoph Hellwig, Qu Wenruo
  Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs



On 2022/5/19 17:36, Christoph Hellwig wrote:
> On Wed, May 18, 2022 at 07:04:22AM +0800, Qu Wenruo wrote:
>> Function btrfs_repair_read_bio() will only return true if all of its
>> data matches csum.
>>
>> Consider the following case:
>>
>> Profile RAID1C3, 2 sectors to read, the initial mirror is 1.
>>
>> Mirror 1:	|X|X|
>> Mirror 2:	|X| |
>> Mirror 3:	| |X|
>>
>> Now we will get -EIO, but in reality, we can repair the read by using
>> the first sector from mirror 3 and the 2nd sector from mirror 2.
> 
> I tried to write a test case for this by copying btrfs/140 and then
> as a first step extending it to three mirrors using the raid1c3
> profile.  But it seems that the tricks used there don't work,
> as the code in btrfs/140 relies on the fact that the btrfs logical
> address reported by filefrag is reported by dump-tree as the item
> "index" in this line:
> 
> item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 137756672) itemoff 15751 itemsiz
> 
> but for the raid1c3 profile that line reports something entirely
> different.
> 
> for raid1:
> 
> logical: 137756672
> item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 137756672) itemoff 15751 itemsize 112
> 
> for raid1c3:
> 
> logical: 343998464
> item 5 key (FIRST_CHUNK_TREE CHUNK_ITEM 298844160) itemoff 15621 itemsize 144
> 
> any idea how to find physical sectors to corrupt for raid1c3?
> 

I also recently hit weird cases where the extent allocator no longer puts
the first data extent at the beginning of a chunk.

So in that case, the best solution is to use "btrfs-map-logical -l
343998464", which will directly return the physical offset of the wanted
logical on each involved device.

Although we need to note:

- btrfs-map-logical may not always be shipped in progs in the future
   This tool really looks like a debug tool. I'm not sure if we will keep
   shipping it (personally I really hope to)

- btrfs-map-logical only returns data stripes
   Thus it doesn't work for RAID56, just in case you want to use it.

Despite the weird extent logical bytenr, everything should be fine with 
btrfs-map-logical.

I'll take some time looking into the weird behavior change.

Thanks,
Qu


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 12/15] btrfs: add new read repair infrastructure
  2022-05-19 10:41       ` Qu Wenruo
@ 2022-05-19 10:45         ` Nikolay Borisov
  2022-05-19 10:46           ` Qu Wenruo
  2022-05-19 10:50         ` Christoph Hellwig
  2022-05-20  6:43         ` Why btrfs no longer allocate the extent at the beginning of an empty chunk (was: Re: [PATCH 12/15] btrfs: add new read repair infrastructure) Qu Wenruo
  2 siblings, 1 reply; 55+ messages in thread
From: Nikolay Borisov @ 2022-05-19 10:45 UTC (permalink / raw)
  To: Qu Wenruo, Christoph Hellwig, Qu Wenruo
  Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs



On 19.05.22 at 13:41, Qu Wenruo wrote:
> 
> 
> On 2022/5/19 17:36, Christoph Hellwig wrote:
>> On Wed, May 18, 2022 at 07:04:22AM +0800, Qu Wenruo wrote:
>>> Function btrfs_repair_read_bio() will only return true if all of its
>>> data matches csum.
>>>
>>> Consider the following case:
>>>
>>> Profile RAID1C3, 2 sectors to read, the initial mirror is 1.
>>>
>>> Mirror 1:    |X|X|
>>> Mirror 2:    |X| |
>>> Mirror 3:    | |X|
>>>
>>> Now we will get -EIO, but in reality, we can repair the read by using
>>> the first sector from mirror 3 and the 2nd sector from mirror 2.
>>
>> I tried to write a test case for this by copying btrfs/140 and then
>> as a first step extending it to three mirrors using the raid1c3
>> profile.  But it seems that the tricks used there don't work,
>> as the code in btrfs/140 relies on the fact that the btrfs logical
>> address reported by filefrag is reported by dump-tree as the item
>> "index" in this line:
>>
>> item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 137756672) itemoff 15751 itemsiz
>>
>> but for the raid1c3 profile that line reports something entirely
>> different.
>>
>> for raid1:
>>
>> logical: 137756672
>> item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 137756672) itemoff 15751 
>> itemsize 112
>>
>> for raid1c3:
>>
>> logical: 343998464
>> item 5 key (FIRST_CHUNK_TREE CHUNK_ITEM 298844160) itemoff 15621 
>> itemsize 144
>>
>> any idea how to find physical sectors to corrupt for raid1c3?
>>
> 
> I also recently hit weird cases where the extent allocator no longer puts
> the first data extent at the beginning of a chunk.
> 
> So in that case, the best solution is to use "btrfs-map-logical -l
> 343998464", which will directly return the physical offset of the wanted
> logical on each involved device.

Any reason why this is kept as a separate tool and not simply integrated 
into btrfs-progs as a separate command?

> 
> Although we need to note:
> 
> - btrfs-map-logical may not always be shipped in progs in the future
>    This tool really looks like a debug tool. I'm not sure if we will keep
>    shipping it (personally I really hope to)
> 
> - btrfs-map-logical only returns data stripes
>    Thus it doesn't work for RAID56, just in case you want to use it.
> 
> Despite the weird extent logical bytenr, everything should be fine with 
> btrfs-map-logical.
> 
> I'll take some time looking into the weird behavior change.
> 
> Thanks,
> Qu
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 12/15] btrfs: add new read repair infrastructure
  2022-05-19 10:45         ` Nikolay Borisov
@ 2022-05-19 10:46           ` Qu Wenruo
  0 siblings, 0 replies; 55+ messages in thread
From: Qu Wenruo @ 2022-05-19 10:46 UTC (permalink / raw)
  To: Nikolay Borisov, Qu Wenruo, Christoph Hellwig
  Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs



On 2022/5/19 18:45, Nikolay Borisov wrote:
>
>
> On 19.05.22 at 13:41, Qu Wenruo wrote:
>>
>>
>> On 2022/5/19 17:36, Christoph Hellwig wrote:
>>> On Wed, May 18, 2022 at 07:04:22AM +0800, Qu Wenruo wrote:
>>>> Function btrfs_repair_read_bio() will only return true if all of its
>>>> data matches csum.
>>>>
>>>> Consider the following case:
>>>>
>>>> Profile RAID1C3, 2 sectors to read, the initial mirror is 1.
>>>>
>>>> Mirror 1:    |X|X|
>>>> Mirror 2:    |X| |
>>>> Mirror 3:    | |X|
>>>>
>>>> Now we will get -EIO, but in reality, we can repair the read by using
>>>> the first sector from mirror 3 and the 2nd sector from mirror 2.
>>>
>>> I tried to write a test case for this by copying btrfs/140 and then
>>> as a first step extending it to three mirrors using the raid1c3
>>> profile.  But it seems that the tricks used there don't work,
>>> as the code in btrfs/140 relies on the fact that the btrfs logical
>>> address reported by filefrag is reported by dump-tree as the item
>>> "index" in this line:
>>>
>>> item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 137756672) itemoff 15751 itemsiz
>>>
>>> but for the raid1c3 profile that line reports something entirely
>>> different.
>>>
>>> for raid1:
>>>
>>> logical: 137756672
>>> item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 137756672) itemoff 15751
>>> itemsize 112
>>>
>>> for raid1c3:
>>>
>>> logical: 343998464
>>> item 5 key (FIRST_CHUNK_TREE CHUNK_ITEM 298844160) itemoff 15621
>>> itemsize 144
>>>
>>> any idea how to find physical sectors to corrupt for raid1c3?
>>>
>>
>> I also recently hit weird cases where the extent allocator no longer puts
>> the first data extent at the beginning of a chunk.
>>
>> So in that case, the best solution is to use "btrfs-map-logical -l
>> 343998464", which will directly return the physical offset of the
>> wanted logical on each involved device.
>
> Any reason why this is kept as a separate tool and not simply integrated
> into btrfs-progs as a separate command?

For historical reasons, I guess.

I'm totally fine with moving it into a btrfs-ins subcommand.

Thanks,
Qu
>
>>
>> Although we need to note:
>>
>> - btrfs-map-logical may not always be shipped in progs in the future
>>    This tool really looks like a debug tool. I'm not sure if we will keep
>>    shipping it (personally I really hope to)
>>
>> - btrfs-map-logical only returns data stripes
>>    Thus it doesn't work for RAID56, just in case you want to use it.
>>
>> Despite the weird extent logical bytenr, everything should be fine
>> with btrfs-map-logical.
>>
>> I'll take some time looking into the weird behavior change.
>>
>> Thanks,
>> Qu
>>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 12/15] btrfs: add new read repair infrastructure
  2022-05-19 10:41       ` Qu Wenruo
  2022-05-19 10:45         ` Nikolay Borisov
@ 2022-05-19 10:50         ` Christoph Hellwig
  2022-05-19 11:27           ` Qu Wenruo
  2022-05-20  6:43         ` Why btrfs no longer allocate the extent at the beginning of an empty chunk (was: Re: [PATCH 12/15] btrfs: add new read repair infrastructure) Qu Wenruo
  2 siblings, 1 reply; 55+ messages in thread
From: Christoph Hellwig @ 2022-05-19 10:50 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Qu Wenruo, Chris Mason, Josef Bacik,
	David Sterba, linux-btrfs

On Thu, May 19, 2022 at 06:41:52PM +0800, Qu Wenruo wrote:
> So in that case, the best solution is to use "btrfs-map-logical -l 
> 343998464", which will directly return the physical offset of the wanted 
> logical on each involved device.
>
> Although we need to note:
>
> - btrfs-map-logical may not always be shipped in progs in the future
>   This tool really looks like a debug tool. I'm not sure if we will keep
>   shipping it (personally I really hope to)
>
> - btrfs-map-logical only returns data stripes
>   Thus it doesn't work for RAID56, just in case you want to use it.
>
> Despite the weird extent logical bytenr, everything should be fine with 
> btrfs-map-logical.


Oh, nice, this is much better than the hackery in the existing tests.

That being said, the -c option does not seem to work.  No matter what
I specify it always returns all three mirrors.  I guess a little
awk will fix that, but the behavior seems odd.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 12/15] btrfs: add new read repair infrastructure
  2022-05-19 10:50         ` Christoph Hellwig
@ 2022-05-19 11:27           ` Qu Wenruo
  0 siblings, 0 replies; 55+ messages in thread
From: Qu Wenruo @ 2022-05-19 11:27 UTC (permalink / raw)
  To: Christoph Hellwig, Qu Wenruo
  Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs



On 2022/5/19 18:50, Christoph Hellwig wrote:
> On Thu, May 19, 2022 at 06:41:52PM +0800, Qu Wenruo wrote:
>> So in that case, the best solution is to use "btrfs-map-logical -l
>> 343998464", which will directly return the physical offset of the wanted
>> logical on each involved device.
>>
>> Although we need to note:
>>
>> - btrfs-map-logical may not always be shipped in progs in the future
>>    This tool really looks like a debug tool. I'm not sure if we will keep
>>    shipping it (personally I really hope to)
>>
>> - btrfs-map-logical only returns data stripes
>>    Thus it doesn't work for RAID56, just in case you want to use it.
>>
>> Despite the weird extent logical bytenr, everything should be fine with
>> btrfs-map-logical.
>
>
> Oh, nice, this is much better than the hackery in the existing tests.
>
> That being said, the -c option does not seem to work.  No matter what
> I specify it always returns all three mirrors.  I guess a little
> awk will fix that, but the behavior seems odd.

Currently the copy number is only used for the -o option, not for regular
chunk mapping printing.

Thus we still need awk to fix the awkward behavior. (no pun intended)

Another problem with btrfs-map-logical is that it doesn't work if the
range has no data/metadata extent.

So Nik is right, we need a better tool integrated into btrfs-ins, rather
than this historical one.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Why btrfs no longer allocates the extent at the beginning of an empty chunk (was: Re: [PATCH 12/15] btrfs: add new read repair infrastructure)
  2022-05-19 10:41       ` Qu Wenruo
  2022-05-19 10:45         ` Nikolay Borisov
  2022-05-19 10:50         ` Christoph Hellwig
@ 2022-05-20  6:43         ` Qu Wenruo
  2 siblings, 0 replies; 55+ messages in thread
From: Qu Wenruo @ 2022-05-20  6:43 UTC (permalink / raw)
  To: Qu Wenruo, Christoph Hellwig
  Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs

>> I tried to write a test case for this by copying btrfs/140 and then
>> as a first step extending it to three mirrors using the raid1c3
>> profile.  But it seems that the tricks used there don't work,
>> as the code in btrfs/140 relies on the fact that the btrfs logical
>> address reported by filefrag is reported by dump-tree as the item
>> "index" in this line:
>>
>> item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 137756672) itemoff 15751 itemsiz
>>
>> but for the raid1c3 profile that line reports something entirely
>> different.
>>
>> for raid1:
>>
>> logical: 137756672
>> item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 137756672) itemoff 15751
>> itemsize 112
>>
>> for raid1c3:
>>
>> logical: 343998464
>> item 5 key (FIRST_CHUNK_TREE CHUNK_ITEM 298844160) itemoff 15621
>> itemsize 144
>>
>> any idea how to find physical sectors to corrupt for raid1c3?
>>
>
> I also recently hit weird cases where the extent allocator no longer puts
> the first data extent at the beginning of a chunk.

Thankfully, this is not a bug, but a combination of seemingly
straightforward behaviors, which leads to a weird combined result.

It took me a short adventure into the free space handling to find the
problem.

For my example, I'm using 3x10G disks, and running RAID0 for data, RAID1
for metadata:

    Label:              (null)
    UUID:               bb10a539-0344-445a-9e77-bbda65d79366
    Node size:          16384
    Sector size:        4096
    Filesystem size:    30.00GiB
    Block group profiles:
      Data:             RAID0             3.00GiB
      Metadata:         RAID1           256.00MiB
      System:           RAID1             8.00MiB
    SSD detected:       no
    Zoned device:       no
    Incompat features:  extref, skinny-metadata, no-holes
    Runtime features:   free-space-tree
    Checksum:           crc32c
    Number of devices:  3
    Devices:
       ID        SIZE  PATH
        1    10.00GiB  /dev/test/scratch1
        2    10.00GiB  /dev/test/scratch2
        3    10.00GiB  /dev/test/scratch3

The 3GiB data chunk (at logical 298844160, length 3GiB) is completely
empty, but note that btrfs needs to avoid allocating extents over
superblock reservations.

And we have one logical bytenr 434110464, which is at the superblock
location of /dev/test/scratch1.

So the free space of that 3GiB chunk is split into two parts:

[298844160, +135266304)
[434176000, +3085893632)

Notice the latter part is much larger.
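
The numbers add up: the two ranges are exactly the 3GiB chunk minus a
64KiB hole at the superblock reservation:

	298844160 + 135266304  = 434110464   (start of the reserved area)
	434110464 +     65536  = 434176000   (first free byte after it)
	434176000 + 3085893632 = 3520069632  (= 298844160 + 3GiB, chunk end)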

So far so good, but there is another thing involved: the cached free
space behavior.

In find_free_space(), if we are searching from the beginning of a block
group, we will use `rb_first_cached(&ctl->free_space_bytes);`

But the free_space_bytes rbtree is not sorted by logical bytenr, but by
the amount of free space, and the leftmost entry is the one with the
most free space.
So instead of choosing [298844160, +135266304), we choose [434176000,
+3085893632), which has much more free space.
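
Expressed as simplified code, the lookup described above boils down to
something like this (assuming the bytes_index rbtree of the cached free
space code; not a verbatim quote of find_free_space()):

	struct rb_node *node;
	struct btrfs_free_space *entry;

	if (offset == block_group->start) {
		/*
		 * free_space_bytes is sorted by entry->bytes, largest
		 * first, so the leftmost node is the biggest free extent,
		 * not the one with the lowest bytenr.
		 */
		node = rb_first_cached(&ctl->free_space_bytes);
		entry = rb_entry(node, struct btrfs_free_space, bytes_index);
		/* -> picks [434176000, +3085893632) over the smaller range */
	}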


Thus we got the seemingly weird bytenr, 434176000, for our first data
extent.


And each behavior by itself is completely sane and straightforward:
we can not use space reserved for superblocks, and we should prefer the
free space entry with the most free space.

But in the end, combining the two gives us an allocator that does not
return the beginning of a seemingly empty chunk.

So in short, we should not rely on the dirty dump-tree hacks, but on a
better version of btrfs-map-logical to grab the real physical offset of
a logical bytenr.

Thanks,
Qu




^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 01/15] btrfs: introduce a pure data checksum checking helper
  2022-05-17 14:50 ` [PATCH 01/15] btrfs: introduce a pure data checksum checking helper Christoph Hellwig
  2022-05-17 14:59   ` Johannes Thumshirn
@ 2022-05-20  8:45   ` Nikolay Borisov
  2022-05-20 16:24     ` Christoph Hellwig
  1 sibling, 1 reply; 55+ messages in thread
From: Nikolay Borisov @ 2022-05-20  8:45 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Qu Wenruo, linux-btrfs



On 17.05.22 at 17:50, Christoph Hellwig wrote:
> From: Qu Wenruo <wqu@suse.com>
> 
> Although we have several pieces of data csum verification code, we never
> have a function really just to verify the checksum for one sector.
> 
> Function check_data_csum() does extra work for error reporting, thus it
> requires a lot of extra things like file offset, bio_offset etc.
> 
> Function btrfs_verify_data_csum() is even worse, it will utilize the page
> checked flag, which means it can not be utilized for direct IO pages.
> 
> Here we introduce a new helper, btrfs_check_data_sector(), which really
> only accepts a sector in a page and an expected checksum pointer.
> 
> We use this function to implement check_data_csum(), and export it for
> an incoming patch.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> [hch: keep passing the csum array as an arguments, as the callers want
>        to print it]
> Signed-off-by: Christoph Hellwig <hch@lst.de>


Reviewed-by: Nikolay Borisov <nborisov@suse.com>

Though I find the naming of the function suboptimal, because we now have
a bunch of *_check_* functions, and unless you go in and read the body of
the new helper you have no idea what 'check' entails and that it is
about verifying checksums.


For data reads we have:

btrfs_verify_data_csum -> check_data_csum -> btrfs_check_data_sector

I.e. the newly introduced function should have csum in its name, i.e.
btrfs_check_data_sector_csum.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/15] btrfs: quit early if the fs has no RAID56 support for raid56 related checks
  2022-05-17 14:50 ` [PATCH 02/15] btrfs: quit early if the fs has no RAID56 support for raid56 related checks Christoph Hellwig
  2022-05-17 15:00   ` Johannes Thumshirn
  2022-05-18 17:07   ` Anand Jain
@ 2022-05-20  8:47   ` Nikolay Borisov
  2022-05-20 16:25     ` Christoph Hellwig
  2 siblings, 1 reply; 55+ messages in thread
From: Nikolay Borisov @ 2022-05-20  8:47 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Qu Wenruo, linux-btrfs



On 17.05.22 at 17:50, Christoph Hellwig wrote:
> From: Qu Wenruo <wqu@suse.com>
> 
> The following functions do special handling for RAID56 chunks:
> 
> - btrfs_is_parity_mirror()
>    Check if the range is in RAID56 chunks.
> 
> - btrfs_full_stripe_len()
>    Either return sectorsize for non-RAID56 profiles or full stripe length
>    for RAID56 chunks.
> 
> But if a filesystem has no RAID56 chunks, it will not have the RAID56
> incompat flag, and we can skip the chunk tree lookup completely.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>


This seems rather unrelated to the rest of the series so it can go 
independently and ideally should have been a separate patch of its own.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 04/15] btrfs: remove duplicated parameters from submit_data_read_repair()
  2022-05-17 14:50 ` [PATCH 04/15] btrfs: remove duplicated parameters from submit_data_read_repair() Christoph Hellwig
  2022-05-17 15:35   ` Johannes Thumshirn
@ 2022-05-20 10:05   ` Nikolay Borisov
  1 sibling, 0 replies; 55+ messages in thread
From: Nikolay Borisov @ 2022-05-20 10:05 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Qu Wenruo, linux-btrfs



On 17.05.22 at 17:50, Christoph Hellwig wrote:
> From: Qu Wenruo <wqu@suse.com>
> 
> The function submit_data_read_repair() is only called for buffered data
> read path, thus those members can be calculated using bvec directly:
> 
> - start
>    start = page_offset(bvec->bv_page) + bvec->bv_offset;
> 
> - end
>    end = start + bvec->bv_len - 1;
> 
> - page
>    page = bvec->bv_page;
> 
> - pgoff
>    pgoff = bvec->bv_offset;
> 
> Thus we can safely replace those 4 parameters with just one bio_vec.
> 
> Also remove the unused return value.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> [hch: also remove the return value]
> Signed-off-by: Christoph Hellwig <hch@lst.de>


Reviewed-by: Nikolay Borisov <nborisov@suse.com>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 12/15] btrfs: add new read repair infrastructure
  2022-05-17 23:04   ` Qu Wenruo
  2022-05-18  8:54     ` Christoph Hellwig
  2022-05-19  9:36     ` Christoph Hellwig
@ 2022-05-20 15:25     ` Christoph Hellwig
  2 siblings, 0 replies; 55+ messages in thread
From: Christoph Hellwig @ 2022-05-20 15:25 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba,
	Qu Wenruo, linux-btrfs

On Wed, May 18, 2022 at 07:04:22AM +0800, Qu Wenruo wrote:
> A big NONO here.
>
> Function btrfs_repair_read_bio() will only return true if all of its
> data matches csum.
>
> Consider the following case:
>
> Profile RAID1C3, 2 sectors to read, the initial mirror is 1.
>
> Mirror 1:	|X|X|
> Mirror 2:	|X| |
> Mirror 3:	| |X|
>
> Now we will get -EIO, but in reality, we can repair the read by using
> the first sector from mirror 3 and the 2nd sector from mirror 2.
>
> This is a behavior regression.

Now that I've written tests and code to treat this properly I have to
say that at least on x86 I can't actually trigger this behavior
regression and had to add instrumentation to see a low-level change
in behavior.

Why?

For some reason (and I'd love to see an explanation for it!), btrfs
limits direct I/O reads to a single sector, so for direct I/O it is
impossible to hit this case as all bios (and thus repair bios) are
limited to a single sector.

For buffered I/O we can hit this case, but all ->readahead errors are
ignored and the pages actually needed are eventually retried using
->readpage.  And ->readpage is limited to a single sector.  But given
that btrfs does not seem to support sub-4K sector sizes I can't
actually test the sub-sector code here - I assume this might be
an issue on larger page size systems.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 01/15] btrfs: introduce a pure data checksum checking helper
  2022-05-20  8:45   ` Nikolay Borisov
@ 2022-05-20 16:24     ` Christoph Hellwig
  0 siblings, 0 replies; 55+ messages in thread
From: Christoph Hellwig @ 2022-05-20 16:24 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba,
	Qu Wenruo, linux-btrfs

On Fri, May 20, 2022 at 11:45:26AM +0300, Nikolay Borisov wrote:
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
>
> Though I find the naming of the function suboptimal, because we now have a
> bunch of *_check_* functions, and unless you go in and read the body of the
> new helper you have no idea what 'check' entails and that it is about
> verifying checksums.
>
>
> For data reads we have:
>
> btrfs_verify_data_csum -> check_data_csum -> btrfs_check_data_sector
>
> I.e. the newly introduced function should have csum in its name, i.e.
> btrfs_check_data_sector_csum.

If we rename things anyway, do we really need the data in here?
Why not simply btrfs_check_sector_csum?

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/15] btrfs: quit early if the fs has no RAID56 support for raid56 related checks
  2022-05-20  8:47   ` Nikolay Borisov
@ 2022-05-20 16:25     ` Christoph Hellwig
  2022-05-20 22:36       ` Qu Wenruo
  0 siblings, 1 reply; 55+ messages in thread
From: Christoph Hellwig @ 2022-05-20 16:25 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba,
	Qu Wenruo, linux-btrfs

On Fri, May 20, 2022 at 11:47:31AM +0300, Nikolay Borisov wrote:
> This seems rather unrelated to the rest of the series so it can go 
> independently and ideally should have been a separate patch of its own.

As far as I can tell it just speeds up functions used here, so yes.

Qu, do you want to send this out separately?

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 05/15] btrfs: add a helper to iterate through a btrfs_bio with sector sized chunks
  2022-05-18 10:07       ` Qu Wenruo
@ 2022-05-20 16:27         ` Christoph Hellwig
  2022-05-21  1:16           ` Qu Wenruo
  0 siblings, 1 reply; 55+ messages in thread
From: Christoph Hellwig @ 2022-05-20 16:27 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Johannes Thumshirn, Chris Mason, Josef Bacik,
	David Sterba, Qu Wenruo, linux-btrfs

On Wed, May 18, 2022 at 06:07:38PM +0800, Qu Wenruo wrote:
>> Because it is a logically separate change that doesn't really have
>> anything to do with the repair code, except that it happens to be the
>> first user?
>
> In fact, there are some other call sites that can benefit from the helper,
> like btrfs_csum_one_bio().
>
> Thus if you can convert those call sites, there will be no question about
> introducing the helper in one patch.

Yes, there is a bunch of places where it would be pretty useful.  But
the series is already large enough as is, so for now I'd rather
keep that for another round.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 02/15] btrfs: quit early if the fs has no RAID56 support for raid56 related checks
  2022-05-20 16:25     ` Christoph Hellwig
@ 2022-05-20 22:36       ` Qu Wenruo
  0 siblings, 0 replies; 55+ messages in thread
From: Qu Wenruo @ 2022-05-20 22:36 UTC (permalink / raw)
  To: Christoph Hellwig, Nikolay Borisov
  Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs



On 2022/5/21 00:25, Christoph Hellwig wrote:
> On Fri, May 20, 2022 at 11:47:31AM +0300, Nikolay Borisov wrote:
>> This seems rather unrelated to the rest of the series so it can go
>> independently and ideally should have been a separate patch of its own.
> 
> As far as I can tell it just speeds up functions used here, so yes.
> 
> Qu, do you want to send this out separately?
> 

I guess if needed, David can pick this up independently?

Thanks,
Qu


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 05/15] btrfs: add a helper to iterate through a btrfs_bio with sector sized chunks
  2022-05-20 16:27         ` Christoph Hellwig
@ 2022-05-21  1:16           ` Qu Wenruo
  0 siblings, 0 replies; 55+ messages in thread
From: Qu Wenruo @ 2022-05-21  1:16 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Johannes Thumshirn, Chris Mason, Josef Bacik, David Sterba,
	Qu Wenruo, linux-btrfs



On 2022/5/21 00:27, Christoph Hellwig wrote:
> On Wed, May 18, 2022 at 06:07:38PM +0800, Qu Wenruo wrote:
>>> Because it is a logically separate change that doesn't really have
>>> anything to do with the repair code, except that it happens to be the
>>> first user?
>>
>> In fact, there are some other call sites that can benefit from the helper,
>> like btrfs_csum_one_bio().
>>
>> Thus if you can convert those call sites, there will be no question about
>> introducing the helper in one patch.
>
> Yes, there is a bunch of places where it would be pretty useful.  But
> the series is already large enough as is, so for now I'd rather
> keep that for another round.

After more thought (refreshing bike riding), considering there are
already quite a few independent fixes and cleanups, it looks reasonable
to me to split the patchset into two parts:

1. Independent cleanups/fixes/refactors
    Those can be queued for v5.20.
    And especially for this helper, I'd like to rework a lot of call
    sites to use this helper to simplify the code.

2. Real read repair refactor
    In fact I also want to revive the old initial RFC version, and
    compare it with the different versions.

Does this sound OK?

Thanks,
Qu

^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2022-05-21  1:16 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-17 14:50 simple synchronous read repair Christoph Hellwig
2022-05-17 14:50 ` [PATCH 01/15] btrfs: introduce a pure data checksum checking helper Christoph Hellwig
2022-05-17 14:59   ` Johannes Thumshirn
2022-05-18  8:44     ` Christoph Hellwig
2022-05-20  8:45   ` Nikolay Borisov
2022-05-20 16:24     ` Christoph Hellwig
2022-05-17 14:50 ` [PATCH 02/15] btrfs: quit early if the fs has no RAID56 support for raid56 related checks Christoph Hellwig
2022-05-17 15:00   ` Johannes Thumshirn
2022-05-18 17:07   ` Anand Jain
2022-05-20  8:47   ` Nikolay Borisov
2022-05-20 16:25     ` Christoph Hellwig
2022-05-20 22:36       ` Qu Wenruo
2022-05-17 14:50 ` [PATCH 03/15] btrfs: save the original bi_iter into btrfs_bio for buffered read Christoph Hellwig
2022-05-17 14:50 ` [PATCH 04/15] btrfs: remove duplicated parameters from submit_data_read_repair() Christoph Hellwig
2022-05-17 15:35   ` Johannes Thumshirn
2022-05-20 10:05   ` Nikolay Borisov
2022-05-17 14:50 ` [PATCH 05/15] btrfs: add a helper to iterate through a btrfs_bio with sector sized chunks Christoph Hellwig
2022-05-17 15:27   ` Johannes Thumshirn
2022-05-18  8:46     ` Christoph Hellwig
2022-05-18 10:07       ` Qu Wenruo
2022-05-20 16:27         ` Christoph Hellwig
2022-05-21  1:16           ` Qu Wenruo
2022-05-17 14:50 ` [PATCH 06/15] btrfs: make repair_io_failure available outside of extent_io.c Christoph Hellwig
2022-05-17 15:18   ` Johannes Thumshirn
2022-05-17 14:50 ` [PATCH 07/15] btrfs: factor out a helper to end a single sector from submit_data_read_repair Christoph Hellwig
2022-05-17 15:18   ` Johannes Thumshirn
2022-05-17 22:17   ` Qu Wenruo
2022-05-17 14:50 ` [PATCH 08/15] btrfs: refactor end_bio_extent_readpage Christoph Hellwig
2022-05-17 22:22   ` Qu Wenruo
2022-05-18  8:48     ` Christoph Hellwig
2022-05-17 14:50 ` [PATCH 09/15] btrfs: factor out a btrfs_csum_ptr helper Christoph Hellwig
2022-05-17 15:24   ` Johannes Thumshirn
2022-05-18  8:45     ` Christoph Hellwig
2022-05-17 14:50 ` [PATCH 10/15] btrfs: add a btrfs_map_bio_wait helper Christoph Hellwig
2022-05-17 15:37   ` Johannes Thumshirn
2022-05-17 22:26   ` Qu Wenruo
2022-05-18  8:47     ` Christoph Hellwig
2022-05-17 14:50 ` [PATCH 11/15] btrfs: set ->file_offset in end_bio_extent_readpage Christoph Hellwig
2022-05-17 22:47   ` Qu Wenruo
2022-05-17 14:50 ` [PATCH 12/15] btrfs: add new read repair infrastructure Christoph Hellwig
2022-05-17 23:04   ` Qu Wenruo
2022-05-18  8:54     ` Christoph Hellwig
2022-05-18 10:20       ` Qu Wenruo
2022-05-18 12:48         ` Christoph Hellwig
2022-05-19  9:36     ` Christoph Hellwig
2022-05-19 10:41       ` Qu Wenruo
2022-05-19 10:45         ` Nikolay Borisov
2022-05-19 10:46           ` Qu Wenruo
2022-05-19 10:50         ` Christoph Hellwig
2022-05-19 11:27           ` Qu Wenruo
2022-05-20  6:43         ` Why btrfs no longer allocates the extent at the beginning of an empty chunk (was: Re: [PATCH 12/15] btrfs: add new read repair infrastructure) Qu Wenruo
2022-05-20 15:25     ` [PATCH 12/15] btrfs: add new read repair infrastructure Christoph Hellwig
2022-05-17 14:50 ` [PATCH 13/15] btrfs: use the new read repair code for direct I/O Christoph Hellwig
2022-05-17 14:50 ` [PATCH 14/15] btrfs: use the new read repair code for buffered reads Christoph Hellwig
2022-05-17 14:50 ` [PATCH 15/15] btrfs: remove io_failure_record infrastructure completely Christoph Hellwig
