[PATCH 0/7] btrfs: synchronous (but super simple) read-repair rework

* [PATCH 0/7] btrfs: synchronous (but super simple) read-repair rework
@ 2022-05-23  1:48 Qu Wenruo
  2022-05-23  1:48 ` [PATCH 1/7] btrfs: save the original bi_iter into btrfs_bio for buffered read Qu Wenruo
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: Qu Wenruo @ 2022-05-23  1:48 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Christoph Hellwig

This is the initial RFC version, revived and rebased on Christoph's
cleanup series.

The branch can be fetched from my repo:
https://github.com/adam900710/linux/tree/read_repair

The revived read-repair rests on the following assumptions:

- Read-repair is already a cold path
- Multiple corruptions in a single read are even rarer in the real world

With these two assumptions combined, we can safely sacrifice read-repair
performance by making read-repair completely synchronous.
(The original code also works sector-by-sector, but in an asynchronous
way.)

Now read-repair is done on a sector-by-sector basis:

1) Try to read the next mirror (if there is one)
2) Verify the csum (if there is one)
3) If the read failed or the csum mismatched, go back to 1)

All reads (from the next mirror) and writes (to the previous bad mirror)
are done synchronously.
This means we wait for the read, then also wait for the write.

This is no doubt slow, but we should be fine with that: once data is
corrupted, the priority is correctness, not performance.
Not to mention that this performance penalty only applies to the cold path.
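
To make the flow above concrete, here is a minimal userspace sketch of the
retry loop. It is not the kernel code: struct toy_mirror, toy_csum() and
toy_read_repair_sector() are hypothetical names, the "mirrors" are plain
in-memory buffers, and the checksum is a toy byte sum. It only assumes what
is described above: mirrors are numbered from 1, each read is finished
before the next step, and the good copy is written back to the mirror that
was tried just before.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 8	/* toy sector size, for illustration only */

/* Hypothetical in-memory "mirror": one sector of data plus a read-error flag. */
struct toy_mirror {
	uint8_t data[SECTOR_SIZE];
	bool read_fails;
};

/* Toy checksum (byte sum); the real code uses the filesystem's csum. */
static uint32_t toy_csum(const uint8_t *buf)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < SECTOR_SIZE; i++)
		sum += buf[i];
	return sum;
}

/*
 * Mirrors are numbered 1..num_copies. Starting after failed_mirror, read each
 * remaining mirror in turn and stop at the first one that reads and verifies.
 * On success the good data is written back to the mirror tried just before.
 * Returns the good mirror number, or -1 if every other mirror is bad.
 */
static int toy_read_repair_sector(struct toy_mirror *mirrors, int num_copies,
				  int failed_mirror,
				  const uint32_t *expected_csum, uint8_t *out)
{
	if (num_copies <= 1)
		return -1;

	for (int i = failed_mirror % num_copies + 1; i != failed_mirror;
	     i = i % num_copies + 1) {
		const struct toy_mirror *m = &mirrors[i - 1];
		const int prev = (i == 1) ? num_copies : i - 1;

		/* 1) "Synchronous" read: we do nothing else until it is done. */
		if (m->read_fails)
			continue;
		memcpy(out, m->data, SECTOR_SIZE);

		/* 2) Verify the csum, if there is one. */
		if (expected_csum && toy_csum(out) != *expected_csum)
			continue;	/* 3) back to 1) with the next mirror */

		/* "Synchronous" write of the good copy to the previous bad mirror. */
		memcpy(mirrors[prev - 1].data, out, SECTOR_SIZE);
		return i;
	}
	return -1;
}
```

With three copies where mirror 1 is corrupted, mirror 2 returns a read error
and mirror 3 is good, a call with failed_mirror = 1 would return 3 and
rewrite mirror 2 only, since only the immediately preceding mirror is
written back in this sketch.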

The advantage of this method is that the helper, btrfs_read_repair_sector(),
is less than 100 lines and straightforward to read and maintain.
And, as with all the later read-repair versions, we get rid of
btrfs_inode::io_failure_tree completely.

Since that helper only needs to manage the content of the page, it does
not have to bother with page status updates, and thus can easily be
applied to any endio context (both buffered and direct IO paths).

The downside: since that helper is so simple, there is no need to
introduce a btrfs_read_repair_ctl structure, so the argument list of
that helper is a little longer.

Cc: Christoph Hellwig <hch@lst.de>

Christoph Hellwig (1):
  btrfs: add a btrfs_map_bio_wait helper

Qu Wenruo (6):
  btrfs: save the original bi_iter into btrfs_bio for buffered read
  btrfs: make repair_io_failure available outside of extent_io.c
  btrfs: add new read repair infrastructure
  btrfs: use the new read repair code for buffered reads
  btrfs: use the new read repair code for direct I/O
  btrfs: remove io_failure_record infrastructure completely

 fs/btrfs/Makefile            |   2 +-
 fs/btrfs/btrfs_inode.h       |   5 -
 fs/btrfs/extent-io-tree.h    |  15 --
 fs/btrfs/extent_io.c         | 424 +++--------------------------------
 fs/btrfs/extent_io.h         |  27 +--
 fs/btrfs/inode.c             |  54 ++---
 fs/btrfs/read-repair.c       |  74 ++++++
 fs/btrfs/read-repair.h       |  13 ++
 fs/btrfs/volumes.c           |  21 ++
 fs/btrfs/volumes.h           |   2 +
 include/trace/events/btrfs.h |   1 -
 11 files changed, 164 insertions(+), 474 deletions(-)
 create mode 100644 fs/btrfs/read-repair.c
 create mode 100644 fs/btrfs/read-repair.h

-- 
2.36.1

* [PATCH 1/7] btrfs: save the original bi_iter into btrfs_bio for buffered read
  2022-05-23  1:48 [PATCH 0/7] btrfs: synchronous (but super simple) read-repair rework Qu Wenruo
@ 2022-05-23  1:48 ` Qu Wenruo
  2022-05-23  1:48 ` [PATCH 2/7] btrfs: make repair_io_failure available outside of extent_io.c Qu Wenruo
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2022-05-23  1:48 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Christoph Hellwig

Although we have btrfs_bio::iter, it currently has very limited usage:

- RAID56
  Where it is not actually needed at all

- btrfs_bio_clone()
  This is used mostly for direct IO.

For the incoming read repair patches, we want to grab the original
logical bytenr, and be able to iterate the range of the bio (no matter
if it's cloned).

So this patch also saves btrfs_bio::iter for buffered read bios at
submit_one_bio().
And for the sake of consistency, it also saves btrfs_bio::iter for
direct IO at btrfs_submit_dio_bio().

The reason we don't save the iter in btrfs_map_bio() is that
btrfs_map_bio() has to handle various bios, with or without the
btrfs_bio bioset.
And we want btrfs_map_bio() to handle plain bios only, without
bothering with the bioset.
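
As a rough illustration of why the snapshot is useful: by the time an endio
handler runs, the live bi_iter may no longer describe the original range
(for example, it can be advanced as the bio is processed). The sketch below
uses hypothetical stand-in types (toy_iter, toy_bbio) rather than the real
struct bvec_iter and struct btrfs_bio, and only shows that a copy taken at
submission time still yields the original logical bytenr.

```c
#include <stdint.h>

#define SECTOR_SHIFT 9	/* block layer sectors are 512 bytes */

/* Hypothetical stand-in for struct bvec_iter. */
struct toy_iter {
	uint64_t bi_sector;	/* current position, in 512-byte sectors */
	uint32_t bi_size;	/* bytes remaining */
};

/* Hypothetical stand-in for struct btrfs_bio. */
struct toy_bbio {
	struct toy_iter bi_iter;	/* live iterator, changes during I/O */
	struct toy_iter iter;		/* snapshot taken at submission */
};

/* Take the snapshot, like the assignment added at submit time. */
static void toy_submit(struct toy_bbio *bbio)
{
	bbio->iter = bbio->bi_iter;
}

/* Simulate the live iterator being advanced as bytes are processed. */
static void toy_advance(struct toy_bbio *bbio, uint32_t bytes)
{
	bbio->bi_iter.bi_sector += bytes >> SECTOR_SHIFT;
	bbio->bi_iter.bi_size -= bytes;
}

/* Logical bytenr of the sector at bio_offset, computed from the snapshot. */
static uint64_t toy_sector_logical(const struct toy_bbio *bbio,
				   uint32_t bio_offset)
{
	return (bbio->iter.bi_sector << SECTOR_SHIFT) + bio_offset;
}
```

After toy_advance() has consumed the whole bio, bi_iter.bi_size is 0, yet
toy_sector_logical() still returns the bytenr relative to the original
start, which is what the repair code wants.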

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 7 +++++++
 fs/btrfs/inode.c     | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 1d144f655f65..1bd1b1253f9d 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -188,6 +188,13 @@ static void submit_one_bio(struct bio *bio, int mirror_num,
 	/* Caller should ensure the bio has at least some range added */
 	ASSERT(bio->bi_iter.bi_size);
 
+	/*
+	 * Save the original bi_iter for read bios, as read repair wants the
+	 * original logical bytenr.
+	 */
+	if (bio_op(bio) == REQ_OP_READ)
+		btrfs_bio(bio)->iter = bio->bi_iter;
+
 	if (is_data_inode(tree->private_data))
 		btrfs_submit_data_bio(tree->private_data, bio, mirror_num,
 					    compress_type);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 34466b543ed9..dd0882e1b982 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7974,6 +7974,8 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
 		ret = btrfs_bio_wq_end_io(fs_info, bio, BTRFS_WQ_ENDIO_DATA);
 		if (ret)
 			goto err;
+		/* Check submit_one_bio() for the reason. */
+		btrfs_bio(bio)->iter = bio->bi_iter;
 	}
 
 	if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)
-- 
2.36.1


* [PATCH 2/7] btrfs: make repair_io_failure available outside of extent_io.c
  2022-05-23  1:48 [PATCH 0/7] btrfs: synchronous (but super simple) read-repair rework Qu Wenruo
  2022-05-23  1:48 ` [PATCH 1/7] btrfs: save the original bi_iter into btrfs_bio for buffered read Qu Wenruo
@ 2022-05-23  1:48 ` Qu Wenruo
  2022-05-23  1:48 ` [PATCH 3/7] btrfs: add a btrfs_map_bio_wait helper Qu Wenruo
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2022-05-23  1:48 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Christoph Hellwig, Johannes Thumshirn

Remove the static so that the function can be used by the new read
repair code, and give it a btrfs_ prefix.

Signed-off-by: Qu Wenruo <wqu@suse.com>
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/extent_io.c | 19 ++++++++++---------
 fs/btrfs/extent_io.h |  3 +++
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 1bd1b1253f9d..1083d6cfa858 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2321,9 +2321,9 @@ int free_io_failure(struct extent_io_tree *failure_tree,
  * currently, there can be no more than two copies of every data bit. thus,
  * exactly one rewrite is required.
  */
-static int repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
-			     u64 length, u64 logical, struct page *page,
-			     unsigned int pg_offset, int mirror_num)
+int btrfs_repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
+			    u64 length, u64 logical, struct page *page,
+			    unsigned int pg_offset, int mirror_num)
 {
 	struct btrfs_device *dev;
 	struct bio_vec bvec;
@@ -2415,8 +2415,9 @@ int btrfs_repair_eb_io_failure(const struct extent_buffer *eb, int mirror_num)
 	for (i = 0; i < num_pages; i++) {
 		struct page *p = eb->pages[i];
 
-		ret = repair_io_failure(fs_info, 0, start, PAGE_SIZE, start, p,
-					start - page_offset(p), mirror_num);
+		ret = btrfs_repair_io_failure(fs_info, 0, start, PAGE_SIZE,
+					      start, p, start - page_offset(p),
+					      mirror_num);
 		if (ret)
 			break;
 		start += PAGE_SIZE;
@@ -2466,9 +2467,9 @@ int clean_io_failure(struct btrfs_fs_info *fs_info,
 		num_copies = btrfs_num_copies(fs_info, failrec->logical,
 					      failrec->len);
 		if (num_copies > 1)  {
-			repair_io_failure(fs_info, ino, start, failrec->len,
-					  failrec->logical, page, pg_offset,
-					  failrec->failed_mirror);
+			btrfs_repair_io_failure(fs_info, ino, start,
+					failrec->len, failrec->logical,
+					page, pg_offset, failrec->failed_mirror);
 		}
 	}
 
@@ -2626,7 +2627,7 @@ static bool btrfs_check_repairable(struct inode *inode,
 	 *
 	 * Since we're only doing repair for one sector, we only need to get
 	 * a good copy of the failed sector and if we succeed, we have setup
-	 * everything for repair_io_failure to do the rest for us.
+	 * everything for btrfs_repair_io_failure() to do the rest for us.
 	 */
 	ASSERT(failed_mirror);
 	failrec->failed_mirror = failed_mirror;
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 956fa434df43..6cdcea1551a6 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -276,6 +276,9 @@ int btrfs_repair_one_sector(struct inode *inode,
 			    struct page *page, unsigned int pgoff,
 			    u64 start, int failed_mirror,
 			    submit_bio_hook_t *submit_bio_hook);
+int btrfs_repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
+			    u64 length, u64 logical, struct page *page,
+			    unsigned int pg_offset, int mirror_num);
 
 #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
 bool find_lock_delalloc_range(struct inode *inode,
-- 
2.36.1


* [PATCH 3/7] btrfs: add a btrfs_map_bio_wait helper
  2022-05-23  1:48 [PATCH 0/7] btrfs: synchronous (but super simple) read-repair rework Qu Wenruo
  2022-05-23  1:48 ` [PATCH 1/7] btrfs: save the original bi_iter into btrfs_bio for buffered read Qu Wenruo
  2022-05-23  1:48 ` [PATCH 2/7] btrfs: make repair_io_failure available outside of extent_io.c Qu Wenruo
@ 2022-05-23  1:48 ` Qu Wenruo
  2022-05-23  1:48 ` [PATCH 4/7] btrfs: add new read repair infrastructure Qu Wenruo
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2022-05-23  1:48 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Christoph Hellwig, Johannes Thumshirn

From: Christoph Hellwig <hch@lst.de>

This helper works like submit_bio_wait(), but goes through the btrfs bio
mapping using btrfs_map_bio().
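
The general pattern (turning an asynchronous submission into a synchronous
call by pointing the completion callback at an on-stack "done" marker) can
be sketched in userspace as follows. Everything here is hypothetical:
toy_req, toy_submit_async() and toy_device_poll() stand in for the bio and
the device, and the busy loop stands in for sleeping on a completion, which
is what the kernel helper does.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical async request: end_io fires when the I/O is finished. */
struct toy_req {
	int status;
	void *private;
	void (*end_io)(struct toy_req *req);
};

/* Toy "device": holds one pending request until toy_device_poll() runs. */
static struct toy_req *pending;

/* Queue the request; on failure end_io is never called. */
static int toy_submit_async(struct toy_req *req)
{
	if (pending)
		return -1;
	pending = req;
	return 0;
}

/* Complete the pending request, if any, and run its callback. */
static void toy_device_poll(void)
{
	struct toy_req *req = pending;

	if (!req)
		return;
	pending = NULL;
	req->status = 0;
	req->end_io(req);
}

/* Completion callback: just flag the waiter's on-stack marker. */
static void toy_end_io_sync(struct toy_req *req)
{
	*(bool *)req->private = true;
}

/* Submit and wait; returns the submit error or the final I/O status. */
static int toy_submit_wait(struct toy_req *req)
{
	bool done = false;
	int ret;

	req->private = &done;
	req->end_io = toy_end_io_sync;
	ret = toy_submit_async(req);
	if (ret)
		return ret;

	while (!done)
		toy_device_poll();
	return req->status;
}
```

The point of the design is that the caller's stack frame owns the "done"
marker, so no extra allocation or bookkeeping is needed, at the cost of
blocking the caller until the I/O finishes.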

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/volumes.c | 21 +++++++++++++++++++++
 fs/btrfs/volumes.h |  2 ++
 2 files changed, 23 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 0819db46dbc4..8925bc606db7 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6818,6 +6818,27 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 	return BLK_STS_OK;
 }
 
+static void btrfs_end_io_sync(struct bio *bio)
+{
+	complete(bio->bi_private);
+}
+
+blk_status_t btrfs_map_bio_wait(struct btrfs_fs_info *fs_info, struct bio *bio,
+		int mirror)
+{
+	DECLARE_COMPLETION_ONSTACK(done);
+	blk_status_t ret;
+
+	bio->bi_private = &done;
+	bio->bi_end_io = btrfs_end_io_sync;
+	ret = btrfs_map_bio(fs_info, bio, mirror);
+	if (ret)
+		return ret;
+
+	wait_for_completion_io(&done);
+	return bio->bi_status;
+}
+
 static bool dev_args_match_fs_devices(const struct btrfs_dev_lookup_args *args,
 				      const struct btrfs_fs_devices *fs_devices)
 {
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 6f784d4f5466..b346f6c40151 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -555,6 +555,8 @@ struct btrfs_block_group *btrfs_create_chunk(struct btrfs_trans_handle *trans,
 void btrfs_mapping_tree_free(struct extent_map_tree *tree);
 blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 			   int mirror_num);
+blk_status_t btrfs_map_bio_wait(struct btrfs_fs_info *fs_info, struct bio *bio,
+		int mirror);
 int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
 		       fmode_t flags, void *holder);
 struct btrfs_device *btrfs_scan_one_device(const char *path,
-- 
2.36.1


* [PATCH 4/7] btrfs: add new read repair infrastructure
  2022-05-23  1:48 [PATCH 0/7] btrfs: synchronous (but super simple) read-repair rework Qu Wenruo
                   ` (2 preceding siblings ...)
  2022-05-23  1:48 ` [PATCH 3/7] btrfs: add a btrfs_map_bio_wait helper Qu Wenruo
@ 2022-05-23  1:48 ` Qu Wenruo
  2022-05-23  1:48 ` [PATCH 5/7] btrfs: use the new read repair code for buffered reads Qu Wenruo
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2022-05-23  1:48 UTC (permalink / raw)
  To: linux-btrfs

The new infrastructure only has one function,
btrfs_read_repair_sector(), which will try to get the correct content of
that sector.

The idea of the function is very straightforward:

1) Try to read the next mirror (if possible)
2) Verify the csum (if there is one)
3) Go back to 1) if the csum mismatches or the read failed
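
As a small worked example of what "next mirror" means with 1-based mirror
numbers, the sketch below computes the order in which the other mirrors
would be tried. next_mirror() and prev_mirror() follow the same wrap-around
arithmetic as the helpers in this patch; retry_order() is a hypothetical
function added only for illustration.

```c
#include <stddef.h>

/* Next mirror in 1..num_copies, wrapping from num_copies back to 1. */
static int next_mirror(int cur, int num_copies)
{
	return (cur + 1 > num_copies) ? cur + 1 - num_copies : cur + 1;
}

/* Previous mirror in 1..num_copies, wrapping from 1 back to num_copies. */
static int prev_mirror(int cur, int num_copies)
{
	return (cur - 1 <= 0) ? num_copies : cur - 1;
}

/*
 * Fill order[] with the mirrors to try after failed_mirror, stopping once
 * we are back at failed_mirror. Returns the number of entries written.
 */
static size_t retry_order(int failed_mirror, int num_copies,
			  int *order, size_t max)
{
	size_t n = 0;

	for (int i = next_mirror(failed_mirror, num_copies);
	     i != failed_mirror && n < max; i = next_mirror(i, num_copies))
		order[n++] = i;
	return n;
}
```

For example, with three copies and failed_mirror = 2, the order is 3 then 1.
If mirror 1 turns out to be good, prev_mirror(1, 3) is 3, so mirror 3 is
the one that gets rewritten.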

All bio submission is synchronous, meaning we wait for the submitted bio
to finish before continuing.

This can be a performance bottleneck, but consider that:

- Read-repair is already a cold path
- More than one corruption in one read bio is even rarer

Thus I don't think we should spend tons of code on a very cold path, not
to mention that complex code itself can be bug prone and harder to maintain.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/Makefile      |  2 +-
 fs/btrfs/read-repair.c | 74 ++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/read-repair.h | 13 ++++++++
 3 files changed, 88 insertions(+), 1 deletion(-)
 create mode 100644 fs/btrfs/read-repair.c
 create mode 100644 fs/btrfs/read-repair.h

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 99f9995670ea..0b2605c750ca 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -31,7 +31,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
 	   backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \
 	   uuid-tree.o props.o free-space-tree.o tree-checker.o space-info.o \
 	   block-rsv.o delalloc-space.o block-group.o discard.o reflink.o \
-	   subpage.o tree-mod-log.o
+	   subpage.o tree-mod-log.o read-repair.o
 
 btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
 btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
diff --git a/fs/btrfs/read-repair.c b/fs/btrfs/read-repair.c
new file mode 100644
index 000000000000..e3175e27bcbb
--- /dev/null
+++ b/fs/btrfs/read-repair.c
@@ -0,0 +1,74 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/bio.h>
+#include "ctree.h"
+#include "volumes.h"
+#include "read-repair.h"
+#include "btrfs_inode.h"
+
+static int get_next_mirror(int cur_mirror, int num_copies)
+{
+	/* In the context of read-repair, we never use 0 as mirror_num. */
+	ASSERT(cur_mirror);
+	return (cur_mirror + 1 > num_copies) ? (cur_mirror + 1 - num_copies) :
+		cur_mirror + 1;
+}
+
+static int get_prev_mirror(int cur_mirror, int num_copies)
+{
+	/* In the context of read-repair, we never use 0 as mirror_num. */
+	ASSERT(cur_mirror);
+	return (cur_mirror - 1 <= 0) ? (num_copies) : cur_mirror - 1;
+}
+
+int btrfs_read_repair_sector(struct inode *inode,
+			     struct page *page, unsigned int pgoff,
+			     u64 logical, u64 file_off, int failed_mirror,
+			     int num_copies, u8 *expected_csum)
+{
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+	bool uptodate = false;
+	int i;
+
+	/* No more mirrors to retry. */
+	if (num_copies <= 1)
+		return -EIO;
+
+	for (i = get_next_mirror(failed_mirror, num_copies); i != failed_mirror;
+	     i = get_next_mirror(i, num_copies)) {
+		u8 csum[BTRFS_CSUM_SIZE];
+		struct bio *read_bio;
+		int ret;
+
+		read_bio = bio_alloc(NULL, 1, REQ_OP_READ | REQ_SYNC, GFP_NOFS);
+		if (!read_bio)
+			return -EIO;
+		__bio_add_page(read_bio, page, fs_info->sectorsize, pgoff);
+		read_bio->bi_iter.bi_sector = logical >> SECTOR_SHIFT;
+
+		ret = btrfs_map_bio_wait(fs_info, read_bio, i);
+		/* Non-zero blk_status_t means read failed, try next mirror. */
+		if (ret)
+			continue;
+
+		if (expected_csum) {
+			ret = btrfs_check_sector_csum(fs_info, page, pgoff,
+						      csum, expected_csum);
+			if (!ret)
+				uptodate = true;
+		} else {
+			uptodate = true;
+		}
+
+		if (uptodate) {
+			btrfs_repair_io_failure(fs_info,
+					btrfs_ino(BTRFS_I(inode)), file_off,
+					fs_info->sectorsize, logical, page,
+					pgoff, get_prev_mirror(i, num_copies));
+			break;
+		}
+	}
+	if (!uptodate)
+		return -EIO;
+	return 0;
+}
diff --git a/fs/btrfs/read-repair.h b/fs/btrfs/read-repair.h
new file mode 100644
index 000000000000..e984ab0b5b18
--- /dev/null
+++ b/fs/btrfs/read-repair.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef BTRFS_READ_REPAIR_H
+#define BTRFS_READ_REPAIR_H
+
+#include <linux/blk_types.h>
+#include <linux/fs.h>
+
+int btrfs_read_repair_sector(struct inode *inode,
+			     struct page *page, unsigned int pgoff,
+			     u64 logical, u64 file_off, int failed_mirror,
+			     int num_copies, u8 *expected_csum);
+#endif
-- 
2.36.1


* [PATCH 5/7] btrfs: use the new read repair code for buffered reads
  2022-05-23  1:48 [PATCH 0/7] btrfs: synchronous (but super simple) read-repair rework Qu Wenruo
                   ` (3 preceding siblings ...)
  2022-05-23  1:48 ` [PATCH 4/7] btrfs: add new read repair infrastructure Qu Wenruo
@ 2022-05-23  1:48 ` Qu Wenruo
  2022-05-23  1:48 ` [PATCH 6/7] btrfs: use the new read repair code for direct I/O Qu Wenruo
  2022-05-23  1:48 ` [PATCH 7/7] btrfs: remove io_failure_record infrastructure completely Qu Wenruo
  6 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2022-05-23  1:48 UTC (permalink / raw)
  To: linux-btrfs

Just call the new btrfs_read_repair_sector() to replace the old
btrfs_repair_one_sector().

And since the new helper only handles the page content, the caller still
needs to handle the page status update and unlock.
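
The per-sector arithmetic the caller has to do can be sketched in userspace
like this. It is an illustration under assumptions, not the kernel code:
toy_bad_sector and toy_collect_bad_sectors() are hypothetical, and the csum
lookup assumes one fixed-size csum per sector, indexed by the sector's
offset inside the bio.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result: what would be handed to the repair helper. */
struct toy_bad_sector {
	uint64_t logical;		/* logical bytenr of the sector */
	const uint8_t *expected_csum;	/* NULL when the data has no csum */
};

/*
 * Walk the sectors of one bvec. For every bit set in error_bitmap, compute
 * the sector's logical bytenr and the address of its expected csum.
 * Returns how many bad sectors were recorded in out[].
 */
static size_t toy_collect_bad_sectors(uint64_t bio_logical, uint32_t bio_offset,
				      uint32_t bv_len, uint32_t error_bitmap,
				      unsigned int sectorsize_bits,
				      const uint8_t *csums, size_t csum_size,
				      struct toy_bad_sector *out)
{
	const uint32_t sectorsize = 1U << sectorsize_bits;
	const unsigned int nr_bits = bv_len >> sectorsize_bits;
	size_t found = 0;

	for (unsigned int i = 0; i < nr_bits; i++) {
		const uint32_t offset = i * sectorsize;
		const uint32_t in_bio = bio_offset + offset;

		/* Sector read fine, nothing to repair. */
		if (!(error_bitmap & (1U << i)))
			continue;

		out[found].logical = bio_logical + in_bio;
		out[found].expected_csum = csums ?
			csums + (size_t)(in_bio >> sectorsize_bits) * csum_size :
			NULL;
		found++;
	}
	return found;
}
```

With 4K sectors, a bvec that starts 8K into a bio at logical 1M, and bits 0
and 2 set in the bitmap, the two bad sectors are at 1M + 8K and 1M + 16K,
and their csums are entries 2 and 4 of the csum array.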

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 34 +++++++++++++++++++---------------
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 1083d6cfa858..cf32b2ff0568 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -30,6 +30,7 @@
 #include "zoned.h"
 #include "block-group.h"
 #include "compression.h"
+#include "read-repair.h"
 
 static struct kmem_cache *extent_state_cache;
 static struct kmem_cache *extent_buffer_cache;
@@ -2756,10 +2757,13 @@ static void submit_data_read_repair(struct inode *inode, struct bio *failed_bio,
 	const unsigned int pgoff = bvec->bv_offset;
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct page *page = bvec->bv_page;
+	struct btrfs_bio *failed_bbio = btrfs_bio(failed_bio);
+	const u64 bio_logical = failed_bbio->iter.bi_sector << SECTOR_SHIFT;
 	const u64 start = page_offset(bvec->bv_page) + bvec->bv_offset;
 	const u64 end = start + bvec->bv_len - 1;
 	const u32 sectorsize = fs_info->sectorsize;
 	const int nr_bits = (end + 1 - start) >> fs_info->sectorsize_bits;
+	int num_copies;
 	int i;
 
 	BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE);
@@ -2776,10 +2780,13 @@ static void submit_data_read_repair(struct inode *inode, struct bio *failed_bio,
 	 */
 	ASSERT(page->mapping && !bio_flagged(failed_bio, BIO_CLONED));
 
+	num_copies = btrfs_num_copies(fs_info, bio_logical, fs_info->sectorsize);
+
 	/* Iterate through all the sectors in the range */
 	for (i = 0; i < nr_bits; i++) {
 		const unsigned int offset = i * sectorsize;
 		bool uptodate = false;
+		u8 *expected_csum = NULL;
 		int ret;
 
 		if (!(error_bitmap & (1U << i))) {
@@ -2791,22 +2798,19 @@ static void submit_data_read_repair(struct inode *inode, struct bio *failed_bio,
 			goto next;
 		}
 
-		ret = btrfs_repair_one_sector(inode, failed_bio,
-				bio_offset + offset,
-				page, pgoff + offset, start + offset,
-				failed_mirror, btrfs_submit_data_bio);
-		if (!ret) {
-			/*
-			 * We have submitted the read repair, the page release
-			 * will be handled by the endio function of the
-			 * submitted repair bio.
-			 * Thus we don't need to do any thing here.
-			 */
-			continue;
-		}
+		if (failed_bbio->csum)
+			expected_csum = btrfs_csum_ptr(fs_info,
+					failed_bbio->csum, bio_offset + offset);
+
+		ret = btrfs_read_repair_sector(inode, page, pgoff + offset,
+				bio_logical + bio_offset + offset,
+				start + offset, failed_bbio->mirror_num,
+				num_copies, expected_csum);
+		if (!ret)
+			uptodate = true;
 		/*
-		 * Continue on failed repair, otherwise the remaining sectors
-		 * will not be properly unlocked.
+		 * If above repair failed, we have tried all mirrors, time to
+		 * release the corrupted sector.
 		 */
 next:
 		end_sector_io(page, start + offset, uptodate);
-- 
2.36.1


* [PATCH 6/7] btrfs: use the new read repair code for direct I/O
  2022-05-23  1:48 [PATCH 0/7] btrfs: synchronous (but super simple) read-repair rework Qu Wenruo
                   ` (4 preceding siblings ...)
  2022-05-23  1:48 ` [PATCH 5/7] btrfs: use the new read repair code for buffered reads Qu Wenruo
@ 2022-05-23  1:48 ` Qu Wenruo
  2022-05-23  1:48 ` [PATCH 7/7] btrfs: remove io_failure_record infrastructure completely Qu Wenruo
  6 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2022-05-23  1:48 UTC (permalink / raw)
  To: linux-btrfs

Just convert the btrfs_repair_one_sector() call to
btrfs_read_repair_sector().

And we can remove the dio specific repair helper now.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 35 ++++++++++++++---------------------
 1 file changed, 14 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index dd0882e1b982..2d52a19e02cf 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -55,6 +55,7 @@
 #include "zoned.h"
 #include "subpage.h"
 #include "inode-item.h"
+#include "read-repair.h"
 
 struct btrfs_iget_args {
 	u64 ino;
@@ -7863,23 +7864,6 @@ static void btrfs_dio_private_put(struct btrfs_dio_private *dip)
 	bio_endio(&dip->bio);
 }
 
-static void submit_dio_repair_bio(struct inode *inode, struct bio *bio,
-				  int mirror_num,
-				  enum btrfs_compression_type compress_type)
-{
-	struct btrfs_dio_private *dip = bio->bi_private;
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-
-	BUG_ON(bio_op(bio) == REQ_OP_WRITE);
-
-	if (btrfs_bio_wq_end_io(fs_info, bio, BTRFS_WQ_ENDIO_DATA))
-		return;
-
-	refcount_inc(&dip->refs);
-	if (btrfs_map_bio(fs_info, bio, mirror_num))
-		refcount_dec(&dip->refs);
-}
-
 static blk_status_t btrfs_check_read_dio_bio(struct btrfs_dio_private *dip,
 					     struct btrfs_bio *bbio,
 					     const bool uptodate)
@@ -7904,12 +7888,21 @@ static blk_status_t btrfs_check_read_dio_bio(struct btrfs_dio_private *dip,
 					 bv.bv_page, btrfs_ino(BTRFS_I(inode)),
 					 bv.bv_offset);
 		} else {
+			u8 *csum_expected = NULL;
+			const u64 logical = (bbio->iter.bi_sector <<
+					     SECTOR_SHIFT) + offset;
+			int num_copies;
 			int ret;
 
-			ret = btrfs_repair_one_sector(inode, &bbio->bio, offset,
-					bv.bv_page, bv.bv_offset, start,
-					bbio->mirror_num,
-					submit_dio_repair_bio);
+			if (bbio->csum)
+				csum_expected = btrfs_csum_ptr(fs_info,
+						bbio->csum, offset);
+			num_copies = btrfs_num_copies(fs_info, logical,
+						      fs_info->sectorsize);
+
+			ret = btrfs_read_repair_sector(inode, bv.bv_page,
+					bv.bv_offset, logical, start,
+					bbio->mirror_num, num_copies, csum_expected);
 			if (ret)
 				err = errno_to_blk_status(ret);
 		}
-- 
2.36.1


* [PATCH 7/7] btrfs: remove io_failure_record infrastructure completely
  2022-05-23  1:48 [PATCH 0/7] btrfs: synchronous (but super simple) read-repair rework Qu Wenruo
                   ` (5 preceding siblings ...)
  2022-05-23  1:48 ` [PATCH 6/7] btrfs: use the new read repair code for direct I/O Qu Wenruo
@ 2022-05-23  1:48 ` Qu Wenruo
  6 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2022-05-23  1:48 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Christoph Hellwig

Read repair is now always handled by btrfs_read_repair_sector(), which
runs entirely inside the endio function.

This means we no longer need to record which range failed, or on which
mirror.

Now if we fail to read some data page, we have already tried every
mirror we have, thus there is no need to record the failed range.

Thus this patch can remove the whole io_failure_record structure and its
related functions.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/btrfs_inode.h       |   5 -
 fs/btrfs/extent-io-tree.h    |  15 --
 fs/btrfs/extent_io.c         | 372 -----------------------------------
 fs/btrfs/extent_io.h         |  24 ---
 fs/btrfs/inode.c             |  17 +-
 include/trace/events/btrfs.h |   1 -
 6 files changed, 2 insertions(+), 432 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 33811e896623..3eeba0eb9f16 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -91,11 +91,6 @@ struct btrfs_inode {
 	/* the io_tree does range state (DIRTY, LOCKED etc) */
 	struct extent_io_tree io_tree;
 
-	/* special utility tree used to record which mirrors have already been
-	 * tried when checksums fail for a given block
-	 */
-	struct extent_io_tree io_failure_tree;
-
 	/*
 	 * Keep track of where the inode has extent items mapped in order to
 	 * make sure the i_size adjustments are accurate
diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
index c3eb52dbe61c..8ab9b6cd53ed 100644
--- a/fs/btrfs/extent-io-tree.h
+++ b/fs/btrfs/extent-io-tree.h
@@ -56,7 +56,6 @@ enum {
 	IO_TREE_FS_EXCLUDED_EXTENTS,
 	IO_TREE_BTREE_INODE_IO,
 	IO_TREE_INODE_IO,
-	IO_TREE_INODE_IO_FAILURE,
 	IO_TREE_RELOC_BLOCKS,
 	IO_TREE_TRANS_DIRTY_PAGES,
 	IO_TREE_ROOT_DIRTY_LOG_PAGES,
@@ -250,18 +249,4 @@ bool btrfs_find_delalloc_range(struct extent_io_tree *tree, u64 *start,
 			       u64 *end, u64 max_bytes,
 			       struct extent_state **cached_state);
 
-/* This should be reworked in the future and put elsewhere. */
-struct io_failure_record *get_state_failrec(struct extent_io_tree *tree, u64 start);
-int set_state_failrec(struct extent_io_tree *tree, u64 start,
-		      struct io_failure_record *failrec);
-void btrfs_free_io_failure_record(struct btrfs_inode *inode, u64 start,
-		u64 end);
-int free_io_failure(struct extent_io_tree *failure_tree,
-		    struct extent_io_tree *io_tree,
-		    struct io_failure_record *rec);
-int clean_io_failure(struct btrfs_fs_info *fs_info,
-		     struct extent_io_tree *failure_tree,
-		     struct extent_io_tree *io_tree, u64 start,
-		     struct page *page, u64 ino, unsigned int pg_offset);
-
 #endif /* BTRFS_EXTENT_IO_TREE_H */
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index cf32b2ff0568..36bc8d45a5f3 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2172,66 +2172,6 @@ u64 count_range_bits(struct extent_io_tree *tree,
 	return total_bytes;
 }
 
-/*
- * set the private field for a given byte offset in the tree.  If there isn't
- * an extent_state there already, this does nothing.
- */
-int set_state_failrec(struct extent_io_tree *tree, u64 start,
-		      struct io_failure_record *failrec)
-{
-	struct rb_node *node;
-	struct extent_state *state;
-	int ret = 0;
-
-	spin_lock(&tree->lock);
-	/*
-	 * this search will find all the extents that end after
-	 * our range starts.
-	 */
-	node = tree_search(tree, start);
-	if (!node) {
-		ret = -ENOENT;
-		goto out;
-	}
-	state = rb_entry(node, struct extent_state, rb_node);
-	if (state->start != start) {
-		ret = -ENOENT;
-		goto out;
-	}
-	state->failrec = failrec;
-out:
-	spin_unlock(&tree->lock);
-	return ret;
-}
-
-struct io_failure_record *get_state_failrec(struct extent_io_tree *tree, u64 start)
-{
-	struct rb_node *node;
-	struct extent_state *state;
-	struct io_failure_record *failrec;
-
-	spin_lock(&tree->lock);
-	/*
-	 * this search will find all the extents that end after
-	 * our range starts.
-	 */
-	node = tree_search(tree, start);
-	if (!node) {
-		failrec = ERR_PTR(-ENOENT);
-		goto out;
-	}
-	state = rb_entry(node, struct extent_state, rb_node);
-	if (state->start != start) {
-		failrec = ERR_PTR(-ENOENT);
-		goto out;
-	}
-
-	failrec = state->failrec;
-out:
-	spin_unlock(&tree->lock);
-	return failrec;
-}
-
 /*
  * searches a range in the state tree for a given mask.
  * If 'filled' == 1, this returns 1 only if every extent in the tree
@@ -2288,30 +2228,6 @@ int test_range_bit(struct extent_io_tree *tree, u64 start, u64 end,
 	return bitset;
 }
 
-int free_io_failure(struct extent_io_tree *failure_tree,
-		    struct extent_io_tree *io_tree,
-		    struct io_failure_record *rec)
-{
-	int ret;
-	int err = 0;
-
-	set_state_failrec(failure_tree, rec->start, NULL);
-	ret = clear_extent_bits(failure_tree, rec->start,
-				rec->start + rec->len - 1,
-				EXTENT_LOCKED | EXTENT_DIRTY);
-	if (ret)
-		err = ret;
-
-	ret = clear_extent_bits(io_tree, rec->start,
-				rec->start + rec->len - 1,
-				EXTENT_DAMAGED);
-	if (ret && !err)
-		err = ret;
-
-	kfree(rec);
-	return err;
-}
-
 /*
  * this bypasses the standard btrfs submit functions deliberately, as
  * the standard behavior is to write all copies in a raid setup. here we only
@@ -2427,287 +2343,6 @@ int btrfs_repair_eb_io_failure(const struct extent_buffer *eb, int mirror_num)
 	return ret;
 }
 
-/*
- * each time an IO finishes, we do a fast check in the IO failure tree
- * to see if we need to process or clean up an io_failure_record
- */
-int clean_io_failure(struct btrfs_fs_info *fs_info,
-		     struct extent_io_tree *failure_tree,
-		     struct extent_io_tree *io_tree, u64 start,
-		     struct page *page, u64 ino, unsigned int pg_offset)
-{
-	u64 private;
-	struct io_failure_record *failrec;
-	struct extent_state *state;
-	int num_copies;
-	int ret;
-
-	private = 0;
-	ret = count_range_bits(failure_tree, &private, (u64)-1, 1,
-			       EXTENT_DIRTY, 0);
-	if (!ret)
-		return 0;
-
-	failrec = get_state_failrec(failure_tree, start);
-	if (IS_ERR(failrec))
-		return 0;
-
-	BUG_ON(!failrec->this_mirror);
-
-	if (sb_rdonly(fs_info->sb))
-		goto out;
-
-	spin_lock(&io_tree->lock);
-	state = find_first_extent_bit_state(io_tree,
-					    failrec->start,
-					    EXTENT_LOCKED);
-	spin_unlock(&io_tree->lock);
-
-	if (state && state->start <= failrec->start &&
-	    state->end >= failrec->start + failrec->len - 1) {
-		num_copies = btrfs_num_copies(fs_info, failrec->logical,
-					      failrec->len);
-		if (num_copies > 1)  {
-			btrfs_repair_io_failure(fs_info, ino, start,
-					failrec->len, failrec->logical,
-					page, pg_offset, failrec->failed_mirror);
-		}
-	}
-
-out:
-	free_io_failure(failure_tree, io_tree, failrec);
-
-	return 0;
-}
-
-/*
- * Can be called when
- * - hold extent lock
- * - under ordered extent
- * - the inode is freeing
- */
-void btrfs_free_io_failure_record(struct btrfs_inode *inode, u64 start, u64 end)
-{
-	struct extent_io_tree *failure_tree = &inode->io_failure_tree;
-	struct io_failure_record *failrec;
-	struct extent_state *state, *next;
-
-	if (RB_EMPTY_ROOT(&failure_tree->state))
-		return;
-
-	spin_lock(&failure_tree->lock);
-	state = find_first_extent_bit_state(failure_tree, start, EXTENT_DIRTY);
-	while (state) {
-		if (state->start > end)
-			break;
-
-		ASSERT(state->end <= end);
-
-		next = next_state(state);
-
-		failrec = state->failrec;
-		free_extent_state(state);
-		kfree(failrec);
-
-		state = next;
-	}
-	spin_unlock(&failure_tree->lock);
-}
-
-static struct io_failure_record *btrfs_get_io_failure_record(struct inode *inode,
-							     u64 start)
-{
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	struct io_failure_record *failrec;
-	struct extent_map *em;
-	struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree;
-	struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
-	struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree;
-	const u32 sectorsize = fs_info->sectorsize;
-	int ret;
-	u64 logical;
-
-	failrec = get_state_failrec(failure_tree, start);
-	if (!IS_ERR(failrec)) {
-		btrfs_debug(fs_info,
-	"Get IO Failure Record: (found) logical=%llu, start=%llu, len=%llu",
-			failrec->logical, failrec->start, failrec->len);
-		/*
-		 * when data can be on disk more than twice, add to failrec here
-		 * (e.g. with a list for failed_mirror) to make
-		 * clean_io_failure() clean all those errors at once.
-		 */
-
-		return failrec;
-	}
-
-	failrec = kzalloc(sizeof(*failrec), GFP_NOFS);
-	if (!failrec)
-		return ERR_PTR(-ENOMEM);
-
-	failrec->start = start;
-	failrec->len = sectorsize;
-	failrec->this_mirror = 0;
-	failrec->compress_type = BTRFS_COMPRESS_NONE;
-
-	read_lock(&em_tree->lock);
-	em = lookup_extent_mapping(em_tree, start, failrec->len);
-	if (!em) {
-		read_unlock(&em_tree->lock);
-		kfree(failrec);
-		return ERR_PTR(-EIO);
-	}
-
-	if (em->start > start || em->start + em->len <= start) {
-		free_extent_map(em);
-		em = NULL;
-	}
-	read_unlock(&em_tree->lock);
-	if (!em) {
-		kfree(failrec);
-		return ERR_PTR(-EIO);
-	}
-
-	logical = start - em->start;
-	logical = em->block_start + logical;
-	if (test_bit(EXTENT_FLAG_COMPRESSED, &em->flags)) {
-		logical = em->block_start;
-		failrec->compress_type = em->compress_type;
-	}
-
-	btrfs_debug(fs_info,
-		    "Get IO Failure Record: (new) logical=%llu, start=%llu, len=%llu",
-		    logical, start, failrec->len);
-
-	failrec->logical = logical;
-	free_extent_map(em);
-
-	/* Set the bits in the private failure tree */
-	ret = set_extent_bits(failure_tree, start, start + sectorsize - 1,
-			      EXTENT_LOCKED | EXTENT_DIRTY);
-	if (ret >= 0) {
-		ret = set_state_failrec(failure_tree, start, failrec);
-		/* Set the bits in the inode's tree */
-		ret = set_extent_bits(tree, start, start + sectorsize - 1,
-				      EXTENT_DAMAGED);
-	} else if (ret < 0) {
-		kfree(failrec);
-		return ERR_PTR(ret);
-	}
-
-	return failrec;
-}
-
-static bool btrfs_check_repairable(struct inode *inode,
-				   struct io_failure_record *failrec,
-				   int failed_mirror)
-{
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	int num_copies;
-
-	num_copies = btrfs_num_copies(fs_info, failrec->logical, failrec->len);
-	if (num_copies == 1) {
-		/*
-		 * we only have a single copy of the data, so don't bother with
-		 * all the retry and error correction code that follows. no
-		 * matter what the error is, it is very likely to persist.
-		 */
-		btrfs_debug(fs_info,
-			"Check Repairable: cannot repair, num_copies=%d, next_mirror %d, failed_mirror %d",
-			num_copies, failrec->this_mirror, failed_mirror);
-		return false;
-	}
-
-	/* The failure record should only contain one sector */
-	ASSERT(failrec->len == fs_info->sectorsize);
-
-	/*
-	 * There are two premises:
-	 * a) deliver good data to the caller
-	 * b) correct the bad sectors on disk
-	 *
-	 * Since we're only doing repair for one sector, we only need to get
-	 * a good copy of the failed sector and if we succeed, we have setup
-	 * everything for btrfs_repair_io_failure() to do the rest for us.
-	 */
-	ASSERT(failed_mirror);
-	failrec->failed_mirror = failed_mirror;
-	failrec->this_mirror++;
-	if (failrec->this_mirror == failed_mirror)
-		failrec->this_mirror++;
-
-	if (failrec->this_mirror > num_copies) {
-		btrfs_debug(fs_info,
-			"Check Repairable: (fail) num_copies=%d, next_mirror %d, failed_mirror %d",
-			num_copies, failrec->this_mirror, failed_mirror);
-		return false;
-	}
-
-	return true;
-}
-
-int btrfs_repair_one_sector(struct inode *inode,
-			    struct bio *failed_bio, u32 bio_offset,
-			    struct page *page, unsigned int pgoff,
-			    u64 start, int failed_mirror,
-			    submit_bio_hook_t *submit_bio_hook)
-{
-	struct io_failure_record *failrec;
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
-	struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree;
-	struct btrfs_bio *failed_bbio = btrfs_bio(failed_bio);
-	const int icsum = bio_offset >> fs_info->sectorsize_bits;
-	struct bio *repair_bio;
-	struct btrfs_bio *repair_bbio;
-
-	btrfs_debug(fs_info,
-		   "repair read error: read error at %llu", start);
-
-	BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE);
-
-	failrec = btrfs_get_io_failure_record(inode, start);
-	if (IS_ERR(failrec))
-		return PTR_ERR(failrec);
-
-
-	if (!btrfs_check_repairable(inode, failrec, failed_mirror)) {
-		free_io_failure(failure_tree, tree, failrec);
-		return -EIO;
-	}
-
-	repair_bio = btrfs_bio_alloc(1);
-	repair_bbio = btrfs_bio(repair_bio);
-	repair_bbio->file_offset = start;
-	repair_bio->bi_opf = REQ_OP_READ;
-	repair_bio->bi_end_io = failed_bio->bi_end_io;
-	repair_bio->bi_iter.bi_sector = failrec->logical >> 9;
-	repair_bio->bi_private = failed_bio->bi_private;
-
-	if (failed_bbio->csum) {
-		const u32 csum_size = fs_info->csum_size;
-
-		repair_bbio->csum = repair_bbio->csum_inline;
-		memcpy(repair_bbio->csum,
-		       failed_bbio->csum + csum_size * icsum, csum_size);
-	}
-
-	bio_add_page(repair_bio, page, failrec->len, pgoff);
-	repair_bbio->iter = repair_bio->bi_iter;
-
-	btrfs_debug(btrfs_sb(inode->i_sb),
-		    "repair read error: submitting new read to mirror %d",
-		    failrec->this_mirror);
-
-	/*
-	 * At this point we have a bio, so any errors from submit_bio_hook()
-	 * will be handled by the endio on the repair_bio, so we can't return an
-	 * error here.
-	 */
-	submit_bio_hook(inode, repair_bio, failrec->this_mirror, failrec->compress_type);
-	return BLK_STS_OK;
-}
-
 static void end_page_read(struct page *page, bool uptodate, u64 start, u32 len)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
@@ -3019,7 +2654,6 @@ static void end_bio_extent_readpage(struct bio *bio)
 {
 	struct bio_vec *bvec;
 	struct btrfs_bio *bbio = btrfs_bio(bio);
-	struct extent_io_tree *tree, *failure_tree;
 	struct processed_extent processed = { 0 };
 	/*
 	 * The offset to the beginning of a bio, since one bio can never be
@@ -3046,8 +2680,6 @@ static void end_bio_extent_readpage(struct bio *bio)
 			"end_bio_extent_readpage: bi_sector=%llu, err=%d, mirror=%u",
 			bio->bi_iter.bi_sector, bio->bi_status,
 			bbio->mirror_num);
-		tree = &BTRFS_I(inode)->io_tree;
-		failure_tree = &BTRFS_I(inode)->io_failure_tree;
 
 		/*
 		 * We always issue full-sector reads, but if some block in a
@@ -3088,10 +2720,6 @@ static void end_bio_extent_readpage(struct bio *bio)
 			loff_t i_size = i_size_read(inode);
 			pgoff_t end_index = i_size >> PAGE_SHIFT;
 
-			clean_io_failure(BTRFS_I(inode)->root->fs_info,
-					 failure_tree, tree, start, page,
-					 btrfs_ino(BTRFS_I(inode)), 0);
-
 			/*
 			 * Zero out the remaining part if this range straddles
 			 * i_size.
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 6cdcea1551a6..e46fe23f6aff 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -61,7 +61,6 @@ struct btrfs_root;
 struct btrfs_inode;
 struct btrfs_io_bio;
 struct btrfs_fs_info;
-struct io_failure_record;
 struct extent_io_tree;
 
 typedef void (submit_bio_hook_t)(struct inode *inode, struct bio *bio,
@@ -253,29 +252,6 @@ struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size);
 void end_extent_writepage(struct page *page, int err, u64 start, u64 end);
 int btrfs_repair_eb_io_failure(const struct extent_buffer *eb, int mirror_num);
 
-/*
- * When IO fails, either with EIO or csum verification fails, we
- * try other mirrors that might have a good copy of the data.  This
- * io_failure_record is used to record state as we go through all the
- * mirrors.  If another mirror has good data, the sector is set up to date
- * and things continue.  If a good mirror can't be found, the original
- * bio end_io callback is called to indicate things have failed.
- */
-struct io_failure_record {
-	struct page *page;
-	u64 start;
-	u64 len;
-	u64 logical;
-	enum btrfs_compression_type compress_type;
-	int this_mirror;
-	int failed_mirror;
-};
-
-int btrfs_repair_one_sector(struct inode *inode,
-			    struct bio *failed_bio, u32 bio_offset,
-			    struct page *page, unsigned int pgoff,
-			    u64 start, int failed_mirror,
-			    submit_bio_hook_t *submit_bio_hook);
 int btrfs_repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
 			    u64 length, u64 logical, struct page *page,
 			    unsigned int pg_offset, int mirror_num);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2d52a19e02cf..e08e8cb79055 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3143,8 +3143,6 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
 					ordered_extent->disk_num_bytes);
 	}
 
-	btrfs_free_io_failure_record(inode, start, end);
-
 	if (test_bit(BTRFS_ORDERED_TRUNCATED, &ordered_extent->flags)) {
 		truncated = true;
 		logical_len = ordered_extent->truncated_len;
@@ -5355,8 +5353,6 @@ void btrfs_evict_inode(struct inode *inode)
 	if (is_bad_inode(inode))
 		goto no_delete;
 
-	btrfs_free_io_failure_record(BTRFS_I(inode), 0, (u64)-1);
-
 	if (test_bit(BTRFS_FS_LOG_RECOVERING, &fs_info->flags))
 		goto no_delete;
 
@@ -7870,8 +7866,6 @@ static blk_status_t btrfs_check_read_dio_bio(struct btrfs_dio_private *dip,
 {
 	struct inode *inode = dip->inode;
 	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
-	struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree;
-	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
 	const bool csum = !(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM);
 	blk_status_t err = BLK_STS_OK;
 	struct bvec_iter iter;
@@ -7881,13 +7875,9 @@ static blk_status_t btrfs_check_read_dio_bio(struct btrfs_dio_private *dip,
 	btrfs_bio_for_each_sector(fs_info, bv, bbio, iter, offset) {
 		u64 start = bbio->file_offset + offset;
 
-		if (uptodate &&
-		    (!csum || !check_data_csum(inode, bbio, offset, bv.bv_page,
+		if (!uptodate ||
+		    (csum && check_data_csum(inode, bbio, offset, bv.bv_page,
 				bv.bv_offset, start))) {
-			clean_io_failure(fs_info, failure_tree, io_tree, start,
-					 bv.bv_page, btrfs_ino(BTRFS_I(inode)),
-					 bv.bv_offset);
-		} else {
 			u8 *csum_expected = NULL;
 			const u64 logical = (bbio->iter.bi_sector <<
 					     SECTOR_SHIFT) + offset;
@@ -8855,12 +8845,9 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
 	inode = &ei->vfs_inode;
 	extent_map_tree_init(&ei->extent_tree);
 	extent_io_tree_init(fs_info, &ei->io_tree, IO_TREE_INODE_IO, inode);
-	extent_io_tree_init(fs_info, &ei->io_failure_tree,
-			    IO_TREE_INODE_IO_FAILURE, inode);
 	extent_io_tree_init(fs_info, &ei->file_extent_tree,
 			    IO_TREE_INODE_FILE_EXTENT, inode);
 	ei->io_tree.track_uptodate = true;
-	ei->io_failure_tree.track_uptodate = true;
 	atomic_set(&ei->sync_writers, 0);
 	mutex_init(&ei->log_mutex);
 	btrfs_ordered_inode_tree_init(&ei->ordered_tree);
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index 290f07eb050a..764e9643c123 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -82,7 +82,6 @@ struct btrfs_space_info;
 	EM( IO_TREE_FS_EXCLUDED_EXTENTS,  "EXCLUDED_EXTENTS")	    \
 	EM( IO_TREE_BTREE_INODE_IO,	  "BTREE_INODE_IO")	    \
 	EM( IO_TREE_INODE_IO,		  "INODE_IO")		    \
-	EM( IO_TREE_INODE_IO_FAILURE,	  "INODE_IO_FAILURE")	    \
 	EM( IO_TREE_RELOC_BLOCKS,	  "RELOC_BLOCKS")	    \
 	EM( IO_TREE_TRANS_DIRTY_PAGES,	  "TRANS_DIRTY_PAGES")      \
 	EM( IO_TREE_ROOT_DIRTY_LOG_PAGES, "ROOT_DIRTY_LOG_PAGES")   \
-- 
2.36.1
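
For readers following the removal: the mirror selection that the deleted
btrfs_check_repairable() performed can be modelled in userspace roughly as
below. This is only a sketch. struct failrec_model and pick_next_mirror() are
hypothetical names that do not exist in the kernel, and the model keeps just
the two io_failure_record fields that the selection logic touched.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the io_failure_record fields used here. */
struct failrec_model {
	int this_mirror;   /* last mirror tried; 0 means none tried yet */
	int failed_mirror; /* mirror the original read failed on */
};

/*
 * Models the selection logic of the removed btrfs_check_repairable():
 * advance to the next mirror, skip the one that already failed, and give
 * up once every copy has been tried. Mirror numbers are 1-based.
 */
static bool pick_next_mirror(struct failrec_model *rec, int num_copies,
			     int failed_mirror)
{
	/* A single copy leaves nothing to retry against. */
	if (num_copies == 1)
		return false;

	assert(failed_mirror > 0);
	rec->failed_mirror = failed_mirror;
	rec->this_mirror++;
	if (rec->this_mirror == failed_mirror)
		rec->this_mirror++;

	return rec->this_mirror <= num_copies;
}
```

With three copies and a failure on mirror 2, successive calls on the same
record select mirror 1, then mirror 3, and then report that nothing is left
to try.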

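The btrfs_check_read_dio_bio() hunk drops the "sector is good" branch and
keeps only the failure branch, so the condition is inverted. A small
standalone check, under the assumption that a non-zero return from
check_data_csum() means a csum mismatch (as the old
"!check_data_csum(...)" good path implies), shows the new condition is the
exact complement of the old one. The function names below are hypothetical
and exist only for this illustration.

```c
#include <stdbool.h>

/* Old condition: true when the sector needed no repair. */
static bool old_sector_ok(bool uptodate, bool csum, bool mismatch)
{
	return uptodate && (!csum || !mismatch);
}

/* New condition: true when the sector must go through read-repair. */
static bool new_sector_bad(bool uptodate, bool csum, bool mismatch)
{
	return !uptodate || (csum && mismatch);
}

/* Walk all 8 input combinations; the two must never agree. */
static bool conditions_are_complementary(void)
{
	for (int bits = 0; bits < 8; bits++) {
		bool uptodate = bits & 1;
		bool csum = bits & 2;
		bool mismatch = bits & 4;

		if (old_sector_ok(uptodate, csum, mismatch) ==
		    new_sector_bad(uptodate, csum, mismatch))
			return false;
	}
	return true;
}
```

Short-circuit evaluation is also preserved: in both forms the csum check
only runs when the read was uptodate and the inode has data checksums.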

Thread overview: 8+ messages
2022-05-23  1:48 [PATCH 0/7] btrfs: synchronous (but super simple) read-repair rework Qu Wenruo
2022-05-23  1:48 ` [PATCH 1/7] btrfs: save the original bi_iter into btrfs_bio for buffered read Qu Wenruo
2022-05-23  1:48 ` [PATCH 2/7] btrfs: make repair_io_failure available outside of extent_io.c Qu Wenruo
2022-05-23  1:48 ` [PATCH 3/7] btrfs: add a btrfs_map_bio_wait helper Qu Wenruo
2022-05-23  1:48 ` [PATCH 4/7] btrfs: add new read repair infrastructure Qu Wenruo
2022-05-23  1:48 ` [PATCH 5/7] btrfs: use the new read repair code for buffered reads Qu Wenruo
2022-05-23  1:48 ` [PATCH 6/7] btrfs: use the new read repair code for direct I/O Qu Wenruo
2022-05-23  1:48 ` [PATCH 7/7] btrfs: remove io_failure_record infrastructure completely Qu Wenruo