All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 00/15] btrfs: read repair/direct I/O improvements
@ 2020-04-16 21:46 Omar Sandoval
  2020-04-16 21:46 ` [PATCH v2 01/15] block: add bio_for_each_bvec_all() Omar Sandoval
                   ` (16 more replies)
  0 siblings, 17 replies; 37+ messages in thread
From: Omar Sandoval @ 2020-04-16 21:46 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team, Jens Axboe, Christoph Hellwig

From: Omar Sandoval <osandov@fb.com>

Hi,

This is version 2 of my series of fixes, cleanups, and improvements to
direct I/O and read repair. It's preparation for adding read repair to
my RWF_ENCODED series [1], but it can go in independently.

Patch 1 adds a new bio helper needed for patch 4. Patches 2 and 3 are
direct I/O error handling fixes. Patch 4 is a buffered read repair fix.
Patch 5 is a buffered read repair improvement. Patches 6-9 are trivial
cleanups. Patch 10 converts direct I/O to use refcount_t, which would've
helped catch the bug fixed by patch 2. Patches 11-14 drastically
simplify the direct I/O code. Patch 15 unifies buffered and direct I/O
read repair, which also makes direct I/O repair actually do validation
for large failed reads instead of rewriting the whole thing.

Overall, this is net about -350 lines of code and actually makes direct
I/O more functional.

Note that this series causes btrfs/142 to fail. This is a bug in the
test, as it assumes that direct I/O doesn't do read validation. This is
fixed by the fstests patch "btrfs/14{2,3}: use dm-dust instead of
fail_make_request" which I sent up yesterday [2].

Jens and Christoph are cc'd for patches 1 and 4. Instead of looking at
the bio internals like I did in v1, I added a new
bio_for_each_bvec_all() helper.

Changes from v1 [3]:

* Added patch 1 with bio_for_each_bvec_all() helper
* Dropped patch 8 which moved struct definition
* Fixed performance regression [4] in patch 13 caused by accidentally
  making all direct I/O submission asynchronous
* Fixed uninitialized assert in patch 12
* Fixed misplaced assert in btrfs_check_read_dio_bio in patch 13
* Added reviewed-bys
* Refactored btrfs_submit_direct() and btrfs_submit_direct_hook() in
  patch 2 (I didn't add Nikolay's reviewed-by to that one because the
  new patch looks fairly different from patch 1 in v1)
* Rewrapped csum calculations in patch 10 for easier readability
* Clarified several commit messages and comments

This series is based on misc-next.

Thanks!

1: https://lore.kernel.org/linux-fsdevel/cover.1582930832.git.osandov@fb.com/
2: https://lore.kernel.org/fstests/d992390752c612acd0893ee3db929e77affded2b.1586983958.git.osandov@fb.com/
3: https://lore.kernel.org/linux-btrfs/cover.1583789410.git.osandov@fb.com/
4: https://lore.kernel.org/lkml/20200331090145.GH11705@shao2-debian/

Omar Sandoval (15):
  block: add bio_for_each_bvec_all()
  btrfs: fix error handling when submitting direct I/O bio
  btrfs: fix double __endio_write_update_ordered in direct I/O
  btrfs: look at full bi_io_vec for repair decision
  btrfs: don't do repair validation for checksum errors
  btrfs: clarify btrfs_lookup_bio_sums documentation
  btrfs: rename __readpage_endio_check to check_data_csum
  btrfs: make btrfs_check_repairable() static
  btrfs: kill btrfs_dio_private->private
  btrfs: convert btrfs_dio_private->pending_bios to refcount_t
  btrfs: put direct I/O checksums in btrfs_dio_private instead of bio
  btrfs: get rid of one layer of bios in direct I/O
  btrfs: simplify direct I/O read repair
  btrfs: get rid of endio_repair_workers
  btrfs: unify buffered and direct I/O read repair

 .clang-format                   |   1 +
 Documentation/block/biovecs.rst |   2 +
 fs/btrfs/btrfs_inode.h          |  26 +-
 fs/btrfs/ctree.h                |   1 -
 fs/btrfs/disk-io.c              |   8 +-
 fs/btrfs/disk-io.h              |   1 -
 fs/btrfs/extent_io.c            | 152 ++++---
 fs/btrfs/extent_io.h            |  19 +-
 fs/btrfs/file-item.c            |  11 +-
 fs/btrfs/inode.c                | 745 ++++++++------------------------
 include/linux/bio.h             |   8 +
 11 files changed, 312 insertions(+), 662 deletions(-)

-- 
2.26.1

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2 01/15] block: add bio_for_each_bvec_all()
  2020-04-16 21:46 [PATCH v2 00/15] btrfs: read repair/direct I/O improvements Omar Sandoval
@ 2020-04-16 21:46 ` Omar Sandoval
  2020-04-17 12:56   ` Johannes Thumshirn
  2020-04-16 21:46 ` [PATCH v2 02/15] btrfs: fix error handling when submitting direct I/O bio Omar Sandoval
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 37+ messages in thread
From: Omar Sandoval @ 2020-04-16 21:46 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team, Jens Axboe, Christoph Hellwig

From: Omar Sandoval <osandov@fb.com>

An upcoming Btrfs fix needs to know the original size of a non-cloned
bios. Rather than accessing the bvec table directly, let's add a
bio_for_each_bvec_all() accessor.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 .clang-format                   | 1 +
 Documentation/block/biovecs.rst | 2 ++
 include/linux/bio.h             | 8 ++++++++
 3 files changed, 11 insertions(+)

diff --git a/.clang-format b/.clang-format
index 6ec5558b516b..1d6e39ad2454 100644
--- a/.clang-format
+++ b/.clang-format
@@ -80,6 +80,7 @@ ForEachMacros:
   - 'ax25_uid_for_each'
   - '__bio_for_each_bvec'
   - 'bio_for_each_bvec'
+  - 'bio_for_each_bvec_all'
   - 'bio_for_each_integrity_vec'
   - '__bio_for_each_segment'
   - 'bio_for_each_segment'
diff --git a/Documentation/block/biovecs.rst b/Documentation/block/biovecs.rst
index ad303a2569d3..36771a131b56 100644
--- a/Documentation/block/biovecs.rst
+++ b/Documentation/block/biovecs.rst
@@ -129,6 +129,7 @@ Usage of helpers:
 ::
 
 	bio_for_each_segment_all()
+	bio_for_each_bvec_all()
 	bio_first_bvec_all()
 	bio_first_page_all()
 	bio_last_bvec_all()
@@ -143,4 +144,5 @@ Usage of helpers:
   bio_vec' will contain a multi-page IO vector during the iteration::
 
 	bio_for_each_bvec()
+	bio_for_each_bvec_all()
 	rq_for_each_bvec()
diff --git a/include/linux/bio.h b/include/linux/bio.h
index c1c0f9ea4e63..c506b26f273f 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -169,6 +169,14 @@ static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
 #define bio_for_each_bvec(bvl, bio, iter)			\
 	__bio_for_each_bvec(bvl, bio, iter, (bio)->bi_iter)
 
+/*
+ * Iterate over all multi-page bvecs. Drivers shouldn't use this version for the
+ * same reasons as bio_for_each_segment_all().
+ */
+#define bio_for_each_bvec_all(bvl, bio, i)		\
+	for (i = 0, bvl = bio_first_bvec_all(bio);	\
+	     i < (bio)->bi_vcnt; i++, bvl++)		\
+
 #define bio_iter_last(bvec, iter) ((iter).bi_size == (bvec).bv_len)
 
 static inline unsigned bio_segments(struct bio *bio)
-- 
2.26.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 02/15] btrfs: fix error handling when submitting direct I/O bio
  2020-04-16 21:46 [PATCH v2 00/15] btrfs: read repair/direct I/O improvements Omar Sandoval
  2020-04-16 21:46 ` [PATCH v2 01/15] block: add bio_for_each_bvec_all() Omar Sandoval
@ 2020-04-16 21:46 ` Omar Sandoval
  2020-04-17 13:01   ` Johannes Thumshirn
  2020-04-16 21:46 ` [PATCH v2 03/15] btrfs: fix double __endio_write_update_ordered in direct I/O Omar Sandoval
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 37+ messages in thread
From: Omar Sandoval @ 2020-04-16 21:46 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team, Jens Axboe, Christoph Hellwig

From: Omar Sandoval <osandov@fb.com>

In btrfs_submit_direct_hook(), if a direct I/O write doesn't span a RAID
stripe or chunk, we submit orig_bio without cloning it. In this case, we
don't increment pending_bios. Then, if btrfs_submit_dio_bio() fails, we
decrement pending_bios to -1, and we never complete orig_bio. Fix it by
initializing pending_bios to 1 instead of incrementing later.

Fixing this exposes another bug: we put orig_bio prematurely and then
put it again from end_io. Fix it by not putting orig_bio.

After this change, pending_bios is really more of a reference count, but
I'll leave that cleanup separate to keep the fix small.

Fixes: e65e15355429 ("btrfs: fix panic caused by direct IO")
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/inode.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 259239b33370..b628c319a5b6 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7939,7 +7939,6 @@ static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip)
 
 	/* bio split */
 	ASSERT(geom.len <= INT_MAX);
-	atomic_inc(&dip->pending_bios);
 	do {
 		clone_len = min_t(int, submit_len, geom.len);
 
@@ -7989,7 +7988,8 @@ static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip)
 	if (!status)
 		return 0;
 
-	bio_put(bio);
+	if (bio != orig_bio)
+		bio_put(bio);
 out_err:
 	dip->errors = 1;
 	/*
@@ -8030,7 +8030,7 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
 	bio->bi_private = dip;
 	dip->orig_bio = bio;
 	dip->dio_bio = dio_bio;
-	atomic_set(&dip->pending_bios, 0);
+	atomic_set(&dip->pending_bios, 1);
 	io_bio = btrfs_io_bio(bio);
 	io_bio->logical = file_offset;
 
-- 
2.26.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 03/15] btrfs: fix double __endio_write_update_ordered in direct I/O
  2020-04-16 21:46 [PATCH v2 00/15] btrfs: read repair/direct I/O improvements Omar Sandoval
  2020-04-16 21:46 ` [PATCH v2 01/15] block: add bio_for_each_bvec_all() Omar Sandoval
  2020-04-16 21:46 ` [PATCH v2 02/15] btrfs: fix error handling when submitting direct I/O bio Omar Sandoval
@ 2020-04-16 21:46 ` Omar Sandoval
  2020-04-17 17:53   ` Johannes Thumshirn
  2020-04-21 10:44   ` Nikolay Borisov
  2020-04-16 21:46 ` [PATCH v2 04/15] btrfs: look at full bi_io_vec for repair decision Omar Sandoval
                   ` (13 subsequent siblings)
  16 siblings, 2 replies; 37+ messages in thread
From: Omar Sandoval @ 2020-04-16 21:46 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team, Jens Axboe, Christoph Hellwig

From: Omar Sandoval <osandov@fb.com>

In btrfs_submit_direct(), if we fail to allocate the btrfs_dio_private,
we complete the ordered extent range. However, we don't mark that the
range doesn't need to be cleaned up from btrfs_direct_IO() until later.
Therefore, if we fail to allocate the btrfs_dio_private, we complete the
ordered extent range twice. We could fix this by updating
unsubmitted_oe_range earlier, but it's cleaner to reorganize the code so
that creating the btrfs_dio_private and submitting the bios are
separate, and once the btrfs_dio_private is created, cleanup always
happens through the btrfs_dio_private.

Fixes: f28a49287817 ("Btrfs: fix leaking of ordered extents after direct IO write error")
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/inode.c | 174 ++++++++++++++++++-----------------------------
 1 file changed, 66 insertions(+), 108 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b628c319a5b6..f6ce9749adb6 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7903,14 +7903,60 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
 	return ret;
 }
 
-static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip)
+/*
+ * If this succeeds, the btrfs_dio_private is responsible for cleaning up locked
+ * or ordered extents whether or not we submit any bios.
+ */
+static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio,
+							  struct inode *inode,
+							  loff_t file_offset)
 {
-	struct inode *inode = dip->inode;
+	const bool write = (bio_op(dio_bio) == REQ_OP_WRITE);
+	struct btrfs_dio_private *dip;
+	struct bio *bio;
+
+	dip = kzalloc(sizeof(*dip), GFP_NOFS);
+	if (!dip)
+		return NULL;
+
+	bio = btrfs_bio_clone(dio_bio);
+	bio->bi_private = dip;
+	btrfs_io_bio(bio)->logical = file_offset;
+
+	dip->private = dio_bio->bi_private;
+	dip->inode = inode;
+	dip->logical_offset = file_offset;
+	dip->bytes = dio_bio->bi_iter.bi_size;
+	dip->disk_bytenr = (u64)dio_bio->bi_iter.bi_sector << 9;
+	dip->orig_bio = bio;
+	dip->dio_bio = dio_bio;
+	atomic_set(&dip->pending_bios, 1);
+
+	if (write) {
+		struct btrfs_dio_data *dio_data = current->journal_info;
+
+		dio_data->unsubmitted_oe_range_end = dip->logical_offset +
+			dip->bytes;
+		dio_data->unsubmitted_oe_range_start =
+			dio_data->unsubmitted_oe_range_end;
+
+		bio->bi_end_io = btrfs_endio_direct_write;
+	} else {
+		bio->bi_end_io = btrfs_endio_direct_read;
+		dip->subio_endio = btrfs_subio_endio_read;
+	}
+	return dip;
+}
+
+static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
+				loff_t file_offset)
+{
+	const bool write = (bio_op(dio_bio) == REQ_OP_WRITE);
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+	struct btrfs_dio_private *dip;
 	struct bio *bio;
-	struct bio *orig_bio = dip->orig_bio;
-	u64 start_sector = orig_bio->bi_iter.bi_sector;
-	u64 file_offset = dip->logical_offset;
+	struct bio *orig_bio;
+	u64 start_sector;
 	int async_submit = 0;
 	u64 submit_len;
 	int clone_offset = 0;
@@ -7919,11 +7965,24 @@ static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip)
 	blk_status_t status;
 	struct btrfs_io_geometry geom;
 
+	dip = btrfs_create_dio_private(dio_bio, inode, file_offset);
+	if (!dip) {
+		if (!write) {
+			unlock_extent(&BTRFS_I(inode)->io_tree, file_offset,
+				file_offset + dio_bio->bi_iter.bi_size - 1);
+		}
+		dio_bio->bi_status = BLK_STS_RESOURCE;
+		dio_end_io(dio_bio);
+		return;
+	}
+
+	orig_bio = dip->orig_bio;
+	start_sector = orig_bio->bi_iter.bi_sector;
 	submit_len = orig_bio->bi_iter.bi_size;
 	ret = btrfs_get_io_geometry(fs_info, btrfs_op(orig_bio),
 				    start_sector << 9, submit_len, &geom);
 	if (ret)
-		return -EIO;
+		goto out_err;
 
 	if (geom.len >= submit_len) {
 		bio = orig_bio;
@@ -7986,7 +8045,7 @@ static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip)
 submit:
 	status = btrfs_submit_dio_bio(bio, inode, file_offset, async_submit);
 	if (!status)
-		return 0;
+		return;
 
 	if (bio != orig_bio)
 		bio_put(bio);
@@ -8000,107 +8059,6 @@ static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip)
 	 */
 	if (atomic_dec_and_test(&dip->pending_bios))
 		bio_io_error(dip->orig_bio);
-
-	/* bio_end_io() will handle error, so we needn't return it */
-	return 0;
-}
-
-static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
-				loff_t file_offset)
-{
-	struct btrfs_dio_private *dip = NULL;
-	struct bio *bio = NULL;
-	struct btrfs_io_bio *io_bio;
-	bool write = (bio_op(dio_bio) == REQ_OP_WRITE);
-	int ret = 0;
-
-	bio = btrfs_bio_clone(dio_bio);
-
-	dip = kzalloc(sizeof(*dip), GFP_NOFS);
-	if (!dip) {
-		ret = -ENOMEM;
-		goto free_ordered;
-	}
-
-	dip->private = dio_bio->bi_private;
-	dip->inode = inode;
-	dip->logical_offset = file_offset;
-	dip->bytes = dio_bio->bi_iter.bi_size;
-	dip->disk_bytenr = (u64)dio_bio->bi_iter.bi_sector << 9;
-	bio->bi_private = dip;
-	dip->orig_bio = bio;
-	dip->dio_bio = dio_bio;
-	atomic_set(&dip->pending_bios, 1);
-	io_bio = btrfs_io_bio(bio);
-	io_bio->logical = file_offset;
-
-	if (write) {
-		bio->bi_end_io = btrfs_endio_direct_write;
-	} else {
-		bio->bi_end_io = btrfs_endio_direct_read;
-		dip->subio_endio = btrfs_subio_endio_read;
-	}
-
-	/*
-	 * Reset the range for unsubmitted ordered extents (to a 0 length range)
-	 * even if we fail to submit a bio, because in such case we do the
-	 * corresponding error handling below and it must not be done a second
-	 * time by btrfs_direct_IO().
-	 */
-	if (write) {
-		struct btrfs_dio_data *dio_data = current->journal_info;
-
-		dio_data->unsubmitted_oe_range_end = dip->logical_offset +
-			dip->bytes;
-		dio_data->unsubmitted_oe_range_start =
-			dio_data->unsubmitted_oe_range_end;
-	}
-
-	ret = btrfs_submit_direct_hook(dip);
-	if (!ret)
-		return;
-
-	btrfs_io_bio_free_csum(io_bio);
-
-free_ordered:
-	/*
-	 * If we arrived here it means either we failed to submit the dip
-	 * or we either failed to clone the dio_bio or failed to allocate the
-	 * dip. If we cloned the dio_bio and allocated the dip, we can just
-	 * call bio_endio against our io_bio so that we get proper resource
-	 * cleanup if we fail to submit the dip, otherwise, we must do the
-	 * same as btrfs_endio_direct_[write|read] because we can't call these
-	 * callbacks - they require an allocated dip and a clone of dio_bio.
-	 */
-	if (bio && dip) {
-		bio_io_error(bio);
-		/*
-		 * The end io callbacks free our dip, do the final put on bio
-		 * and all the cleanup and final put for dio_bio (through
-		 * dio_end_io()).
-		 */
-		dip = NULL;
-		bio = NULL;
-	} else {
-		if (write)
-			__endio_write_update_ordered(inode,
-						file_offset,
-						dio_bio->bi_iter.bi_size,
-						false);
-		else
-			unlock_extent(&BTRFS_I(inode)->io_tree, file_offset,
-			      file_offset + dio_bio->bi_iter.bi_size - 1);
-
-		dio_bio->bi_status = BLK_STS_IOERR;
-		/*
-		 * Releases and cleans up our dio_bio, no need to bio_put()
-		 * nor bio_endio()/bio_io_error() against dio_bio.
-		 */
-		dio_end_io(dio_bio);
-	}
-	if (bio)
-		bio_put(bio);
-	kfree(dip);
 }
 
 static ssize_t check_direct_IO(struct btrfs_fs_info *fs_info,
-- 
2.26.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 04/15] btrfs: look at full bi_io_vec for repair decision
  2020-04-16 21:46 [PATCH v2 00/15] btrfs: read repair/direct I/O improvements Omar Sandoval
                   ` (2 preceding siblings ...)
  2020-04-16 21:46 ` [PATCH v2 03/15] btrfs: fix double __endio_write_update_ordered in direct I/O Omar Sandoval
@ 2020-04-16 21:46 ` Omar Sandoval
  2020-04-17 17:56   ` Johannes Thumshirn
  2020-04-16 21:46 ` [PATCH v2 05/15] btrfs: don't do repair validation for checksum errors Omar Sandoval
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 37+ messages in thread
From: Omar Sandoval @ 2020-04-16 21:46 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team, Jens Axboe, Christoph Hellwig

From: Omar Sandoval <osandov@fb.com>

Read repair does two things: it finds a good copy of data to return to
the reader, and it corrects the bad copy on disk. If a read of multiple
sectors has an I/O error, repair does an extra "validation" step that
issues a separate read for each sector. This allows us to find the exact
failing sectors and only rewrite those.

This heuristic is implemented in
bio_readpage_error()/btrfs_check_repairable() as:

	failed_bio_pages = failed_bio->bi_iter.bi_size >> PAGE_SHIFT;
	if (failed_bio_pages > 1)
		do validation

However, at this point, bi_iter may have already been advanced. This
means that we'll skip the validation step and rewrite the entire failed
read.

Fix it by getting the actual size from the biovec (which we can do
because this is only called for non-cloned bios, although that will
change in a later commit).

Fixes: 8a2ee44a371c ("btrfs: look at bi_size for repair decisions")
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/extent_io.c | 33 +++++++++++++++++++++++++++------
 fs/btrfs/extent_io.h |  5 +++--
 2 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 39e45b8a5031..712f49607d3a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2537,8 +2537,9 @@ int btrfs_get_io_failure_record(struct inode *inode, u64 start, u64 end,
 	return 0;
 }
 
-bool btrfs_check_repairable(struct inode *inode, unsigned failed_bio_pages,
-			   struct io_failure_record *failrec, int failed_mirror)
+bool btrfs_check_repairable(struct inode *inode, bool need_validation,
+			    struct io_failure_record *failrec,
+			    int failed_mirror)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	int num_copies;
@@ -2561,7 +2562,7 @@ bool btrfs_check_repairable(struct inode *inode, unsigned failed_bio_pages,
 	 *	a) deliver good data to the caller
 	 *	b) correct the bad sectors on disk
 	 */
-	if (failed_bio_pages > 1) {
+	if (need_validation) {
 		/*
 		 * to fulfill b), we need to know the exact failing sectors, as
 		 * we don't want to rewrite any more than the failed ones. thus,
@@ -2633,6 +2634,24 @@ struct bio *btrfs_create_repair_bio(struct inode *inode, struct bio *failed_bio,
 	return bio;
 }
 
+static bool btrfs_io_needs_validation(struct inode *inode, struct bio *bio)
+{
+	struct bio_vec *bvec;
+	u64 len = 0;
+	int i;
+
+	/*
+	 * We need to validate each sector individually if the failed I/O was
+	 * for multiple sectors.
+	 */
+	bio_for_each_bvec_all(bvec, bio, i) {
+		len += bvec->bv_len;
+		if (len > inode->i_sb->s_blocksize)
+			return true;
+	}
+	return false;
+}
+
 /*
  * This is a generic handler for readpage errors. If other copies exist, read
  * those and write back good data to the failed position. Does not investigate
@@ -2647,11 +2666,11 @@ static int bio_readpage_error(struct bio *failed_bio, u64 phy_offset,
 	struct inode *inode = page->mapping->host;
 	struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
 	struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree;
+	bool need_validation;
 	struct bio *bio;
 	int read_mode = 0;
 	blk_status_t status;
 	int ret;
-	unsigned failed_bio_pages = failed_bio->bi_iter.bi_size >> PAGE_SHIFT;
 
 	BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE);
 
@@ -2659,13 +2678,15 @@ static int bio_readpage_error(struct bio *failed_bio, u64 phy_offset,
 	if (ret)
 		return ret;
 
-	if (!btrfs_check_repairable(inode, failed_bio_pages, failrec,
+	need_validation = btrfs_io_needs_validation(inode, failed_bio);
+
+	if (!btrfs_check_repairable(inode, need_validation, failrec,
 				    failed_mirror)) {
 		free_io_failure(failure_tree, tree, failrec);
 		return -EIO;
 	}
 
-	if (failed_bio_pages > 1)
+	if (need_validation)
 		read_mode |= REQ_FAILFAST_DEV;
 
 	phy_offset >>= inode->i_sb->s_blocksize_bits;
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 2ed65bd0760e..26c0fce0bb64 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -312,8 +312,9 @@ struct io_failure_record {
 };
 
 
-bool btrfs_check_repairable(struct inode *inode, unsigned failed_bio_pages,
-			    struct io_failure_record *failrec, int fail_mirror);
+bool btrfs_check_repairable(struct inode *inode, bool need_validation,
+			    struct io_failure_record *failrec,
+			    int failed_mirror);
 struct bio *btrfs_create_repair_bio(struct inode *inode, struct bio *failed_bio,
 				    struct io_failure_record *failrec,
 				    struct page *page, int pg_offset, int icsum,
-- 
2.26.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 05/15] btrfs: don't do repair validation for checksum errors
  2020-04-16 21:46 [PATCH v2 00/15] btrfs: read repair/direct I/O improvements Omar Sandoval
                   ` (3 preceding siblings ...)
  2020-04-16 21:46 ` [PATCH v2 04/15] btrfs: look at full bi_io_vec for repair decision Omar Sandoval
@ 2020-04-16 21:46 ` Omar Sandoval
  2020-04-17 17:59   ` Johannes Thumshirn
  2020-04-16 21:46 ` [PATCH v2 06/15] btrfs: clarify btrfs_lookup_bio_sums documentation Omar Sandoval
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 37+ messages in thread
From: Omar Sandoval @ 2020-04-16 21:46 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team, Jens Axboe, Christoph Hellwig

From: Omar Sandoval <osandov@fb.com>

The purpose of the validation step is to distinguish between good and
bad sectors in a failed multi-sector read. If a multi-sector read
succeeded but some of those sectors had checksum errors, we don't need
to validate anything; we know the sectors with bad checksums need to be
repaired.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/extent_io.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 712f49607d3a..25dd42437cbd 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2640,6 +2640,14 @@ static bool btrfs_io_needs_validation(struct inode *inode, struct bio *bio)
 	u64 len = 0;
 	int i;
 
+	/*
+	 * If bi_status is BLK_STS_OK, then this was a checksum error, not an
+	 * I/O error. In this case, we already know exactly which sector was
+	 * bad, so we don't need to validate.
+	 */
+	if (bio->bi_status == BLK_STS_OK)
+		return false;
+
 	/*
 	 * We need to validate each sector individually if the failed I/O was
 	 * for multiple sectors.
-- 
2.26.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 06/15] btrfs: clarify btrfs_lookup_bio_sums documentation
  2020-04-16 21:46 [PATCH v2 00/15] btrfs: read repair/direct I/O improvements Omar Sandoval
                   ` (4 preceding siblings ...)
  2020-04-16 21:46 ` [PATCH v2 05/15] btrfs: don't do repair validation for checksum errors Omar Sandoval
@ 2020-04-16 21:46 ` Omar Sandoval
  2020-04-17 18:01   ` Johannes Thumshirn
  2020-04-21 11:17   ` Nikolay Borisov
  2020-04-16 21:46 ` [PATCH v2 07/15] btrfs: rename __readpage_endio_check to check_data_csum Omar Sandoval
                   ` (10 subsequent siblings)
  16 siblings, 2 replies; 37+ messages in thread
From: Omar Sandoval @ 2020-04-16 21:46 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team, Jens Axboe, Christoph Hellwig

From: Omar Sandoval <osandov@fb.com>

Fix a couple of issues in the btrfs_lookup_bio_sums documentation:

* The bio doesn't need to be a btrfs_io_bio if dst was provided. Move
  the declaration in the code to make that clear, too.
* dst must be large enough to hold nblocks * csum_size, not just
  csum_size.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/file-item.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index b618ad5339ba..22cbb4da6d42 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -242,11 +242,13 @@ int btrfs_lookup_file_extent(struct btrfs_trans_handle *trans,
 /**
  * btrfs_lookup_bio_sums - Look up checksums for a bio.
  * @inode: inode that the bio is for.
- * @bio: bio embedded in btrfs_io_bio.
+ * @bio: bio to look up.
  * @offset: Unless (u64)-1, look up checksums for this offset in the file.
  *          If (u64)-1, use the page offsets from the bio instead.
- * @dst: Buffer of size btrfs_super_csum_size() used to return checksum. If
- *       NULL, the checksum is returned in btrfs_io_bio(bio)->csum instead.
+ * @dst: Buffer of size nblocks * btrfs_super_csum_size() used to return
+ *       checksum (nblocks = bio->bi_iter.bi_size / fs_info->sectorsize). If
+ *       NULL, the checksum buffer is allocated and returned in
+ *       btrfs_io_bio(bio)->csum instead.
  *
  * Return: BLK_STS_RESOURCE if allocating memory fails, BLK_STS_OK otherwise.
  */
@@ -256,7 +258,6 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio,
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct bio_vec bvec;
 	struct bvec_iter iter;
-	struct btrfs_io_bio *btrfs_bio = btrfs_io_bio(bio);
 	struct btrfs_csum_item *item = NULL;
 	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
 	struct btrfs_path *path;
@@ -277,6 +278,8 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio,
 
 	nblocks = bio->bi_iter.bi_size >> inode->i_sb->s_blocksize_bits;
 	if (!dst) {
+		struct btrfs_io_bio *btrfs_bio = btrfs_io_bio(bio);
+
 		if (nblocks * csum_size > BTRFS_BIO_INLINE_CSUM_SIZE) {
 			btrfs_bio->csum = kmalloc_array(nblocks, csum_size,
 							GFP_NOFS);
-- 
2.26.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 07/15] btrfs: rename __readpage_endio_check to check_data_csum
  2020-04-16 21:46 [PATCH v2 00/15] btrfs: read repair/direct I/O improvements Omar Sandoval
                   ` (5 preceding siblings ...)
  2020-04-16 21:46 ` [PATCH v2 06/15] btrfs: clarify btrfs_lookup_bio_sums documentation Omar Sandoval
@ 2020-04-16 21:46 ` Omar Sandoval
  2020-04-16 21:46 ` [PATCH v2 08/15] btrfs: make btrfs_check_repairable() static Omar Sandoval
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 37+ messages in thread
From: Omar Sandoval @ 2020-04-16 21:46 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team, Jens Axboe, Christoph Hellwig

From: Omar Sandoval <osandov@fb.com>

__readpage_endio_check() is also used from the direct I/O read code, so
give it a more descriptive name.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/inode.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index f6ce9749adb6..eb3fcdc7c337 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2726,10 +2726,9 @@ void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start,
 	btrfs_queue_work(wq, &ordered_extent->work);
 }
 
-static int __readpage_endio_check(struct inode *inode,
-				  struct btrfs_io_bio *io_bio,
-				  int icsum, struct page *page,
-				  int pgoff, u64 start, size_t len)
+static int check_data_csum(struct inode *inode, struct btrfs_io_bio *io_bio,
+			   int icsum, struct page *page, int pgoff, u64 start,
+			   size_t len)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
@@ -2790,8 +2789,8 @@ static int btrfs_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
 	}
 
 	phy_offset >>= inode->i_sb->s_blocksize_bits;
-	return __readpage_endio_check(inode, io_bio, phy_offset, page, offset,
-				      start, (size_t)(end - start + 1));
+	return check_data_csum(inode, io_bio, phy_offset, page, offset, start,
+			       (size_t)(end - start + 1));
 }
 
 /*
@@ -7584,9 +7583,9 @@ static void btrfs_retry_endio(struct bio *bio)
 
 	ASSERT(!bio_flagged(bio, BIO_CLONED));
 	bio_for_each_segment_all(bvec, bio, iter_all) {
-		ret = __readpage_endio_check(inode, io_bio, i, bvec->bv_page,
-					     bvec->bv_offset, done->start,
-					     bvec->bv_len);
+		ret = check_data_csum(inode, io_bio, i, bvec->bv_page,
+				      bvec->bv_offset, done->start,
+				      bvec->bv_len);
 		if (!ret)
 			clean_io_failure(BTRFS_I(inode)->root->fs_info,
 					 failure_tree, io_tree, done->start,
@@ -7636,8 +7635,9 @@ static blk_status_t __btrfs_subio_endio_read(struct inode *inode,
 next_block:
 		if (uptodate) {
 			csum_pos = BTRFS_BYTES_TO_BLKS(fs_info, offset);
-			ret = __readpage_endio_check(inode, io_bio, csum_pos,
-					bvec.bv_page, pgoff, start, sectorsize);
+			ret = check_data_csum(inode, io_bio, csum_pos,
+					      bvec.bv_page, pgoff, start,
+					      sectorsize);
 			if (likely(!ret))
 				goto next;
 		}
-- 
2.26.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 08/15] btrfs: make btrfs_check_repairable() static
  2020-04-16 21:46 [PATCH v2 00/15] btrfs: read repair/direct I/O improvements Omar Sandoval
                   ` (6 preceding siblings ...)
  2020-04-16 21:46 ` [PATCH v2 07/15] btrfs: rename __readpage_endio_check to check_data_csum Omar Sandoval
@ 2020-04-16 21:46 ` Omar Sandoval
  2020-04-16 21:46 ` [PATCH v2 09/15] btrfs: kill btrfs_dio_private->private Omar Sandoval
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 37+ messages in thread
From: Omar Sandoval @ 2020-04-16 21:46 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team, Jens Axboe, Christoph Hellwig

From: Omar Sandoval <osandov@fb.com>

Since its introduction in commit 2fe6303e7cd0 ("Btrfs: split
bio_readpage_error into several functions"), btrfs_check_repairable()
has only been used from extent_io.c where it is defined.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/extent_io.c | 7 ++++---
 fs/btrfs/extent_io.h | 3 ---
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 25dd42437cbd..85e98ba349a8 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2537,9 +2537,10 @@ int btrfs_get_io_failure_record(struct inode *inode, u64 start, u64 end,
 	return 0;
 }
 
-bool btrfs_check_repairable(struct inode *inode, bool need_validation,
-			    struct io_failure_record *failrec,
-			    int failed_mirror)
+static bool btrfs_check_repairable(struct inode *inode,
+				   bool need_validation,
+				   struct io_failure_record *failrec,
+				   int failed_mirror)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	int num_copies;
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 26c0fce0bb64..f4dfac756455 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -312,9 +312,6 @@ struct io_failure_record {
 };
 
 
-bool btrfs_check_repairable(struct inode *inode, bool need_validation,
-			    struct io_failure_record *failrec,
-			    int failed_mirror);
 struct bio *btrfs_create_repair_bio(struct inode *inode, struct bio *failed_bio,
 				    struct io_failure_record *failrec,
 				    struct page *page, int pg_offset, int icsum,
-- 
2.26.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 09/15] btrfs: kill btrfs_dio_private->private
  2020-04-16 21:46 [PATCH v2 00/15] btrfs: read repair/direct I/O improvements Omar Sandoval
                   ` (7 preceding siblings ...)
  2020-04-16 21:46 ` [PATCH v2 08/15] btrfs: make btrfs_check_repairable() static Omar Sandoval
@ 2020-04-16 21:46 ` Omar Sandoval
  2020-04-17 18:02   ` Johannes Thumshirn
  2020-04-16 21:46 ` [PATCH v2 10/15] btrfs: convert btrfs_dio_private->pending_bios to refcount_t Omar Sandoval
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 37+ messages in thread
From: Omar Sandoval @ 2020-04-16 21:46 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team, Jens Axboe, Christoph Hellwig

From: Omar Sandoval <osandov@fb.com>

We haven't used this since commit 9be3395bcd4a ("Btrfs: use a btrfs
bioset instead of abusing bio internals").

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/btrfs_inode.h | 1 -
 fs/btrfs/inode.c       | 1 -
 2 files changed, 2 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 27a1fefce508..ad36685ce046 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -301,7 +301,6 @@ struct btrfs_dio_private {
 	u64 logical_offset;
 	u64 disk_bytenr;
 	u64 bytes;
-	void *private;
 
 	/* number of bios pending for this dio */
 	atomic_t pending_bios;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index eb3fcdc7c337..d7cf248dd634 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7923,7 +7923,6 @@ static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio,
 	bio->bi_private = dip;
 	btrfs_io_bio(bio)->logical = file_offset;
 
-	dip->private = dio_bio->bi_private;
 	dip->inode = inode;
 	dip->logical_offset = file_offset;
 	dip->bytes = dio_bio->bi_iter.bi_size;
-- 
2.26.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 10/15] btrfs: convert btrfs_dio_private->pending_bios to refcount_t
  2020-04-16 21:46 [PATCH v2 00/15] btrfs: read repair/direct I/O improvements Omar Sandoval
                   ` (8 preceding siblings ...)
  2020-04-16 21:46 ` [PATCH v2 09/15] btrfs: kill btrfs_dio_private->private Omar Sandoval
@ 2020-04-16 21:46 ` Omar Sandoval
  2020-04-17 18:03   ` Johannes Thumshirn
  2020-04-16 21:46 ` [PATCH v2 11/15] btrfs: put direct I/O checksums in btrfs_dio_private instead of bio Omar Sandoval
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 37+ messages in thread
From: Omar Sandoval @ 2020-04-16 21:46 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team, Jens Axboe, Christoph Hellwig

From: Omar Sandoval <osandov@fb.com>

This is really a reference count now, so convert it to refcount_t and
rename it to refs.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/btrfs_inode.h |  8 ++++++--
 fs/btrfs/inode.c       | 10 +++++-----
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index ad36685ce046..b965fa5429ec 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -7,6 +7,7 @@
 #define BTRFS_INODE_H
 
 #include <linux/hash.h>
+#include <linux/refcount.h>
 #include "extent_map.h"
 #include "extent_io.h"
 #include "ordered-data.h"
@@ -302,8 +303,11 @@ struct btrfs_dio_private {
 	u64 disk_bytenr;
 	u64 bytes;
 
-	/* number of bios pending for this dio */
-	atomic_t pending_bios;
+	/*
+	 * References to this structure. There is one reference per in-flight
+	 * bio plus one while we're still setting up.
+	 */
+	refcount_t refs;
 
 	/* IO errors */
 	int errors;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d7cf248dd634..4b1102f2e6b8 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7811,7 +7811,7 @@ static void btrfs_end_dio_bio(struct bio *bio)
 	}
 
 	/* if there are more bios still pending for this dio, just exit */
-	if (!atomic_dec_and_test(&dip->pending_bios))
+	if (!refcount_dec_and_test(&dip->refs))
 		goto out;
 
 	if (dip->errors) {
@@ -7929,7 +7929,7 @@ static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio,
 	dip->disk_bytenr = (u64)dio_bio->bi_iter.bi_sector << 9;
 	dip->orig_bio = bio;
 	dip->dio_bio = dio_bio;
-	atomic_set(&dip->pending_bios, 1);
+	refcount_set(&dip->refs, 1);
 
 	if (write) {
 		struct btrfs_dio_data *dio_data = current->journal_info;
@@ -8021,13 +8021,13 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
 		 * count. Otherwise, the dip might get freed before we're
 		 * done setting it up.
 		 */
-		atomic_inc(&dip->pending_bios);
+		refcount_inc(&dip->refs);
 
 		status = btrfs_submit_dio_bio(bio, inode, file_offset,
 						async_submit);
 		if (status) {
 			bio_put(bio);
-			atomic_dec(&dip->pending_bios);
+			refcount_dec(&dip->refs);
 			goto out_err;
 		}
 
@@ -8056,7 +8056,7 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
 	 * atomic operations with a return value are fully ordered as per
 	 * atomic_t.txt
 	 */
-	if (atomic_dec_and_test(&dip->pending_bios))
+	if (refcount_dec_and_test(&dip->refs))
 		bio_io_error(dip->orig_bio);
 }
 
-- 
2.26.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 11/15] btrfs: put direct I/O checksums in btrfs_dio_private instead of bio
  2020-04-16 21:46 [PATCH v2 00/15] btrfs: read repair/direct I/O improvements Omar Sandoval
                   ` (9 preceding siblings ...)
  2020-04-16 21:46 ` [PATCH v2 10/15] btrfs: convert btrfs_dio_private->pending_bios to refcount_t Omar Sandoval
@ 2020-04-16 21:46 ` Omar Sandoval
  2020-04-17 18:06   ` Johannes Thumshirn
  2020-04-21 22:50   ` David Sterba
  2020-04-16 21:46 ` [PATCH v2 12/15] btrfs: get rid of one layer of bios in direct I/O Omar Sandoval
                   ` (5 subsequent siblings)
  16 siblings, 2 replies; 37+ messages in thread
From: Omar Sandoval @ 2020-04-16 21:46 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team, Jens Axboe, Christoph Hellwig

From: Omar Sandoval <osandov@fb.com>

The next commit will get rid of btrfs_dio_private->orig_bio. The only
thing we really need it for is containing all of the checksums, but we
can easily put the checksum array in btrfs_dio_private and have the
submitted bios reference the array. We can also look the checksums up
while we're setting up instead of the current awkward logic that looks
them up for orig_bio when the first split bio is submitted.

(Interestingly, btrfs_dio_private did contain the
checksums before commit 23ea8e5a0767 ("Btrfs: load checksum data once
when submitting a direct read io"), but it didn't look them up up
front.)

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/btrfs_inode.h |  3 ++
 fs/btrfs/inode.c       | 70 +++++++++++++++++++-----------------------
 2 files changed, 34 insertions(+), 39 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index b965fa5429ec..94476a8be4cc 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -324,6 +324,9 @@ struct btrfs_dio_private {
 	 */
 	blk_status_t (*subio_endio)(struct inode *, struct btrfs_io_bio *,
 			blk_status_t);
+
+	/* Checksums. */
+	u8 sums[];
 };
 
 /*
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4b1102f2e6b8..fe87c465b13c 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7712,7 +7712,6 @@ static void btrfs_endio_direct_read(struct bio *bio)
 
 	dio_bio->bi_status = err;
 	dio_end_io(dio_bio);
-	btrfs_io_bio_free_csum(io_bio);
 	bio_put(bio);
 }
 
@@ -7824,39 +7823,6 @@ static void btrfs_end_dio_bio(struct bio *bio)
 	bio_put(bio);
 }
 
-static inline blk_status_t btrfs_lookup_and_bind_dio_csum(struct inode *inode,
-						 struct btrfs_dio_private *dip,
-						 struct bio *bio,
-						 u64 file_offset)
-{
-	struct btrfs_io_bio *io_bio = btrfs_io_bio(bio);
-	struct btrfs_io_bio *orig_io_bio = btrfs_io_bio(dip->orig_bio);
-	u16 csum_size;
-	blk_status_t ret;
-
-	/*
-	 * We load all the csum data we need when we submit
-	 * the first bio to reduce the csum tree search and
-	 * contention.
-	 */
-	if (dip->logical_offset == file_offset) {
-		ret = btrfs_lookup_bio_sums(inode, dip->orig_bio, file_offset,
-					    NULL);
-		if (ret)
-			return ret;
-	}
-
-	if (bio == dip->orig_bio)
-		return 0;
-
-	file_offset -= dip->logical_offset;
-	file_offset >>= inode->i_sb->s_blocksize_bits;
-	csum_size = btrfs_super_csum_size(btrfs_sb(inode->i_sb)->super_copy);
-	io_bio->csum = orig_io_bio->csum + csum_size * file_offset;
-
-	return 0;
-}
-
 static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
 		struct inode *inode, u64 file_offset, int async_submit)
 {
@@ -7892,10 +7858,12 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
 		if (ret)
 			goto err;
 	} else {
-		ret = btrfs_lookup_and_bind_dio_csum(inode, dip, bio,
-						     file_offset);
-		if (ret)
-			goto err;
+		u64 csum_offset;
+
+		csum_offset = file_offset - dip->logical_offset;
+		csum_offset >>= inode->i_sb->s_blocksize_bits;
+		csum_offset *= btrfs_super_csum_size(fs_info->super_copy);
+		btrfs_io_bio(bio)->csum = dip->sums + csum_offset;
 	}
 map:
 	ret = btrfs_map_bio(fs_info, bio, 0);
@@ -7912,10 +7880,22 @@ static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio,
 							  loff_t file_offset)
 {
 	const bool write = (bio_op(dio_bio) == REQ_OP_WRITE);
+	const bool csum = !(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM);
+	size_t dip_size;
 	struct btrfs_dio_private *dip;
 	struct bio *bio;
 
-	dip = kzalloc(sizeof(*dip), GFP_NOFS);
+	dip_size = sizeof(*dip);
+	if (!write && csum) {
+		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+		u16 csum_size = btrfs_super_csum_size(fs_info->super_copy);
+		size_t nblocks;
+
+		nblocks = dio_bio->bi_iter.bi_size >> inode->i_sb->s_blocksize_bits;
+		dip_size += csum_size * nblocks;
+	}
+
+	dip = kzalloc(dip_size, GFP_NOFS);
 	if (!dip)
 		return NULL;
 
@@ -7951,6 +7931,7 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
 				loff_t file_offset)
 {
 	const bool write = (bio_op(dio_bio) == REQ_OP_WRITE);
+	const bool csum = !(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM);
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct btrfs_dio_private *dip;
 	struct bio *bio;
@@ -7975,6 +7956,17 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
 		return;
 	}
 
+	if (!write && csum) {
+		/*
+		 * Load the csums up front to reduce csum tree searches and
+		 * contention when submitting bios.
+		 */
+		status = btrfs_lookup_bio_sums(inode, dio_bio, file_offset,
+					       dip->sums);
+		if (status != BLK_STS_OK)
+			goto out_err;
+	}
+
 	orig_bio = dip->orig_bio;
 	start_sector = orig_bio->bi_iter.bi_sector;
 	submit_len = orig_bio->bi_iter.bi_size;
-- 
2.26.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 12/15] btrfs: get rid of one layer of bios in direct I/O
  2020-04-16 21:46 [PATCH v2 00/15] btrfs: read repair/direct I/O improvements Omar Sandoval
                   ` (10 preceding siblings ...)
  2020-04-16 21:46 ` [PATCH v2 11/15] btrfs: put direct I/O checksums in btrfs_dio_private instead of bio Omar Sandoval
@ 2020-04-16 21:46 ` Omar Sandoval
  2020-04-21 13:00   ` Nikolay Borisov
  2020-04-16 21:46 ` [PATCH v2 13/15] btrfs: simplify direct I/O read repair Omar Sandoval
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 37+ messages in thread
From: Omar Sandoval @ 2020-04-16 21:46 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team, Jens Axboe, Christoph Hellwig

From: Omar Sandoval <osandov@fb.com>

In the worst case, there are _4_ layers of bios in the Btrfs direct I/O
path:

1. The bio created by the generic direct I/O code (dio_bio).
2. A clone of dio_bio we create in btrfs_submit_direct() to represent
   the entire direct I/O range (orig_bio).
3. A partial clone of orig_bio limited to the size of a RAID stripe that
   we create in btrfs_submit_direct_hook().
4. Clones of each of those split bios for each RAID stripe that we
   create in btrfs_map_bio().

As of the previous commit, the second layer (orig_bio) is no longer
needed for anything: we can split dio_bio instead, and complete dio_bio
directly when all of the cloned bios complete. This lets us clean up a
bunch of cruft, including dip->subio_endio and dip->errors (we can use
dio_bio->bi_status instead). It also enables the next big cleanup of
direct I/O read repair.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/btrfs_inode.h |  16 ----
 fs/btrfs/inode.c       | 185 +++++++++++++++--------------------------
 2 files changed, 65 insertions(+), 136 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 94476a8be4cc..3fb2f5a11ee3 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -294,11 +294,8 @@ static inline int btrfs_inode_in_log(struct btrfs_inode *inode, u64 generation)
 	return ret;
 }
 
-#define BTRFS_DIO_ORIG_BIO_SUBMITTED	0x1
-
 struct btrfs_dio_private {
 	struct inode *inode;
-	unsigned long flags;
 	u64 logical_offset;
 	u64 disk_bytenr;
 	u64 bytes;
@@ -309,22 +306,9 @@ struct btrfs_dio_private {
 	 */
 	refcount_t refs;
 
-	/* IO errors */
-	int errors;
-
-	/* orig_bio is our btrfs_io_bio */
-	struct bio *orig_bio;
-
 	/* dio_bio came from fs/direct-io.c */
 	struct bio *dio_bio;
 
-	/*
-	 * The original bio may be split to several sub-bios, this is
-	 * done during endio of sub-bios
-	 */
-	blk_status_t (*subio_endio)(struct inode *, struct btrfs_io_bio *,
-			blk_status_t);
-
 	/* Checksums. */
 	u8 sums[];
 };
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index fe87c465b13c..79b884d2f3ed 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7356,6 +7356,29 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock,
 	return ret;
 }
 
+static void btrfs_dio_private_put(struct btrfs_dio_private *dip)
+{
+	/*
+	 * This implies a barrier so that stores to dio_bio->bi_status before
+	 * this and loads of dio_bio->bi_status after this are fully ordered.
+	 */
+	if (!refcount_dec_and_test(&dip->refs))
+		return;
+
+	if (bio_op(dip->dio_bio) == REQ_OP_WRITE) {
+		__endio_write_update_ordered(dip->inode, dip->logical_offset,
+					     dip->bytes,
+					     !dip->dio_bio->bi_status);
+	} else {
+		unlock_extent(&BTRFS_I(dip->inode)->io_tree,
+			      dip->logical_offset,
+			      dip->logical_offset + dip->bytes - 1);
+	}
+
+	dio_end_io(dip->dio_bio);
+	kfree(dip);
+}
+
 static inline blk_status_t submit_dio_repair_bio(struct inode *inode,
 						 struct bio *bio,
 						 int mirror_num)
@@ -7678,8 +7701,9 @@ static blk_status_t __btrfs_subio_endio_read(struct inode *inode,
 	return err;
 }
 
-static blk_status_t btrfs_subio_endio_read(struct inode *inode,
-		struct btrfs_io_bio *io_bio, blk_status_t err)
+static blk_status_t btrfs_check_read_dio_bio(struct inode *inode,
+					     struct btrfs_io_bio *io_bio,
+					     blk_status_t err)
 {
 	bool skip_csum = BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM;
 
@@ -7693,28 +7717,6 @@ static blk_status_t btrfs_subio_endio_read(struct inode *inode,
 	}
 }
 
-static void btrfs_endio_direct_read(struct bio *bio)
-{
-	struct btrfs_dio_private *dip = bio->bi_private;
-	struct inode *inode = dip->inode;
-	struct bio *dio_bio;
-	struct btrfs_io_bio *io_bio = btrfs_io_bio(bio);
-	blk_status_t err = bio->bi_status;
-
-	if (dip->flags & BTRFS_DIO_ORIG_BIO_SUBMITTED)
-		err = btrfs_subio_endio_read(inode, io_bio, err);
-
-	unlock_extent(&BTRFS_I(inode)->io_tree, dip->logical_offset,
-		      dip->logical_offset + dip->bytes - 1);
-	dio_bio = dip->dio_bio;
-
-	kfree(dip);
-
-	dio_bio->bi_status = err;
-	dio_end_io(dio_bio);
-	bio_put(bio);
-}
-
 static void __endio_write_update_ordered(struct inode *inode,
 					 const u64 offset, const u64 bytes,
 					 const bool uptodate)
@@ -7758,21 +7760,6 @@ static void __endio_write_update_ordered(struct inode *inode,
 	}
 }
 
-static void btrfs_endio_direct_write(struct bio *bio)
-{
-	struct btrfs_dio_private *dip = bio->bi_private;
-	struct bio *dio_bio = dip->dio_bio;
-
-	__endio_write_update_ordered(dip->inode, dip->logical_offset,
-				     dip->bytes, !bio->bi_status);
-
-	kfree(dip);
-
-	dio_bio->bi_status = bio->bi_status;
-	dio_end_io(dio_bio);
-	bio_put(bio);
-}
-
 static blk_status_t btrfs_submit_bio_start_direct_io(void *private_data,
 				    struct bio *bio, u64 offset)
 {
@@ -7796,31 +7783,16 @@ static void btrfs_end_dio_bio(struct bio *bio)
 			   (unsigned long long)bio->bi_iter.bi_sector,
 			   bio->bi_iter.bi_size, err);
 
-	if (dip->subio_endio)
-		err = dip->subio_endio(dip->inode, btrfs_io_bio(bio), err);
-
-	if (err) {
-		/*
-		 * We want to perceive the errors flag being set before
-		 * decrementing the reference count. We don't need a barrier
-		 * since atomic operations with a return value are fully
-		 * ordered as per atomic_t.txt
-		 */
-		dip->errors = 1;
+	if (bio_op(bio) == REQ_OP_READ) {
+		err = btrfs_check_read_dio_bio(dip->inode, btrfs_io_bio(bio),
+					       err);
 	}
 
-	/* if there are more bios still pending for this dio, just exit */
-	if (!refcount_dec_and_test(&dip->refs))
-		goto out;
+	if (err)
+		dip->dio_bio->bi_status = err;
 
-	if (dip->errors) {
-		bio_io_error(dip->orig_bio);
-	} else {
-		dip->dio_bio->bi_status = BLK_STS_OK;
-		bio_endio(dip->orig_bio);
-	}
-out:
 	bio_put(bio);
+	btrfs_dio_private_put(dip);
 }
 
 static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
@@ -7883,7 +7855,6 @@ static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio,
 	const bool csum = !(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM);
 	size_t dip_size;
 	struct btrfs_dio_private *dip;
-	struct bio *bio;
 
 	dip_size = sizeof(*dip);
 	if (!write && csum) {
@@ -7899,15 +7870,10 @@ static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio,
 	if (!dip)
 		return NULL;
 
-	bio = btrfs_bio_clone(dio_bio);
-	bio->bi_private = dip;
-	btrfs_io_bio(bio)->logical = file_offset;
-
 	dip->inode = inode;
 	dip->logical_offset = file_offset;
 	dip->bytes = dio_bio->bi_iter.bi_size;
 	dip->disk_bytenr = (u64)dio_bio->bi_iter.bi_sector << 9;
-	dip->orig_bio = bio;
 	dip->dio_bio = dio_bio;
 	refcount_set(&dip->refs, 1);
 
@@ -7918,11 +7884,6 @@ static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio,
 			dip->bytes;
 		dio_data->unsubmitted_oe_range_start =
 			dio_data->unsubmitted_oe_range_end;
-
-		bio->bi_end_io = btrfs_endio_direct_write;
-	} else {
-		bio->bi_end_io = btrfs_endio_direct_read;
-		dip->subio_endio = btrfs_subio_endio_read;
 	}
 	return dip;
 }
@@ -7933,9 +7894,10 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
 	const bool write = (bio_op(dio_bio) == REQ_OP_WRITE);
 	const bool csum = !(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM);
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+	const bool raid56 = (btrfs_data_alloc_profile(fs_info) &
+			     BTRFS_BLOCK_GROUP_RAID56_MASK);
 	struct btrfs_dio_private *dip;
 	struct bio *bio;
-	struct bio *orig_bio;
 	u64 start_sector;
 	int async_submit = 0;
 	u64 submit_len;
@@ -7967,89 +7929,72 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
 			goto out_err;
 	}
 
-	orig_bio = dip->orig_bio;
-	start_sector = orig_bio->bi_iter.bi_sector;
-	submit_len = orig_bio->bi_iter.bi_size;
-	ret = btrfs_get_io_geometry(fs_info, btrfs_op(orig_bio),
-				    start_sector << 9, submit_len, &geom);
-	if (ret)
-		goto out_err;
+	start_sector = dio_bio->bi_iter.bi_sector;
+	submit_len = dio_bio->bi_iter.bi_size;
 
-	if (geom.len >= submit_len) {
-		bio = orig_bio;
-		dip->flags |= BTRFS_DIO_ORIG_BIO_SUBMITTED;
-		goto submit;
-	}
-
-	/* async crcs make it difficult to collect full stripe writes. */
-	if (btrfs_data_alloc_profile(fs_info) & BTRFS_BLOCK_GROUP_RAID56_MASK)
-		async_submit = 0;
-	else
-		async_submit = 1;
-
-	/* bio split */
-	ASSERT(geom.len <= INT_MAX);
 	do {
+		ret = btrfs_get_io_geometry(fs_info, btrfs_op(dio_bio),
+					    start_sector << 9, submit_len,
+					    &geom);
+		if (ret) {
+			status = errno_to_blk_status(ret);
+			goto out_err;
+		}
+		ASSERT(geom.len <= INT_MAX);
+
 		clone_len = min_t(int, submit_len, geom.len);
 
 		/*
 		 * This will never fail as it's passing GPF_NOFS and
 		 * the allocation is backed by btrfs_bioset.
 		 */
-		bio = btrfs_bio_clone_partial(orig_bio, clone_offset,
-					      clone_len);
+		bio = btrfs_bio_clone_partial(dio_bio, clone_offset, clone_len);
 		bio->bi_private = dip;
 		bio->bi_end_io = btrfs_end_dio_bio;
 		btrfs_io_bio(bio)->logical = file_offset;
 
 		ASSERT(submit_len >= clone_len);
 		submit_len -= clone_len;
-		if (submit_len == 0)
-			break;
 
 		/*
 		 * Increase the count before we submit the bio so we know
 		 * the end IO handler won't happen before we increase the
 		 * count. Otherwise, the dip might get freed before we're
 		 * done setting it up.
+		 *
+		 * We transfer the initial reference to the last bio, so we
+		 * don't need to increment the reference count for the last one.
 		 */
-		refcount_inc(&dip->refs);
+		if (submit_len > 0) {
+			refcount_inc(&dip->refs);
+			/*
+			 * If we are submitting more than one bio, submit them
+			 * all asynchronously. The exception is RAID 5 or 6, as
+			 * asynchronous checksums make it difficult to collect
+			 * full stripe writes.
+			 */
+			if (!raid56)
+				async_submit = 1;
+		}
 
 		status = btrfs_submit_dio_bio(bio, inode, file_offset,
 						async_submit);
 		if (status) {
 			bio_put(bio);
-			refcount_dec(&dip->refs);
+			if (submit_len > 0)
+				refcount_dec(&dip->refs);
 			goto out_err;
 		}
 
 		clone_offset += clone_len;
 		start_sector += clone_len >> 9;
 		file_offset += clone_len;
-
-		ret = btrfs_get_io_geometry(fs_info, btrfs_op(orig_bio),
-				      start_sector << 9, submit_len, &geom);
-		if (ret)
-			goto out_err;
 	} while (submit_len > 0);
+	return;
 
-submit:
-	status = btrfs_submit_dio_bio(bio, inode, file_offset, async_submit);
-	if (!status)
-		return;
-
-	if (bio != orig_bio)
-		bio_put(bio);
 out_err:
-	dip->errors = 1;
-	/*
-	 * Before atomic variable goto zero, we must  make sure dip->errors is
-	 * perceived to be set. This ordering is ensured by the fact that an
-	 * atomic operations with a return value are fully ordered as per
-	 * atomic_t.txt
-	 */
-	if (refcount_dec_and_test(&dip->refs))
-		bio_io_error(dip->orig_bio);
+	dip->dio_bio->bi_status = status;
+	btrfs_dio_private_put(dip);
 }
 
 static ssize_t check_direct_IO(struct btrfs_fs_info *fs_info,
-- 
2.26.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 13/15] btrfs: simplify direct I/O read repair
  2020-04-16 21:46 [PATCH v2 00/15] btrfs: read repair/direct I/O improvements Omar Sandoval
                   ` (11 preceding siblings ...)
  2020-04-16 21:46 ` [PATCH v2 12/15] btrfs: get rid of one layer of bios in direct I/O Omar Sandoval
@ 2020-04-16 21:46 ` Omar Sandoval
  2020-04-21 13:53   ` Nikolay Borisov
  2020-04-16 21:46 ` [PATCH v2 14/15] btrfs: get rid of endio_repair_workers Omar Sandoval
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 37+ messages in thread
From: Omar Sandoval @ 2020-04-16 21:46 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team, Jens Axboe, Christoph Hellwig

From: Omar Sandoval <osandov@fb.com>

Direct I/O read repair was originally implemented in commit 8b110e393c5a
("Btrfs: implement repair function when direct read fails"). This
implementation is unnecessarily complicated. There is major code
duplication between __btrfs_subio_endio_read() (checks checksums and
handles I/O errors for files with checksums),
__btrfs_correct_data_nocsum() (handles I/O errors for files without
checksums), btrfs_retry_endio() (checks checksums and handles I/O errors
for retries of files with checksums), and btrfs_retry_endio_nocsum()
(handles I/O errors for retries of files without checksum). If it sounds
like these should be one function, that's because they should.
Additionally, these functions are very hard to follow due to their
excessive use of goto.

This commit replaces the original implementation. After the previous
commit getting rid of orig_bio, we can reuse the same endio callback for
repair I/O and the original I/O, we just need to track the file offset
and original iterator in the repair bio. We can also unify the handling
of files with and without checksums and simplify the control flow. We
also no longer have to wait for each repair I/O to complete one by one.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/extent_io.c |   2 +
 fs/btrfs/inode.c     | 268 +++++++------------------------------------
 2 files changed, 44 insertions(+), 226 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 85e98ba349a8..6e1d97bb7652 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2631,6 +2631,8 @@ struct bio *btrfs_create_repair_bio(struct inode *inode, struct bio *failed_bio,
 	}
 
 	bio_add_page(bio, page, failrec->len, pg_offset);
+	btrfs_io_bio(bio)->logical = failrec->start;
+	btrfs_io_bio(bio)->iter = bio->bi_iter;
 
 	return bio;
 }
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 79b884d2f3ed..2580f2d251d4 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7435,19 +7435,17 @@ static int btrfs_check_dio_repairable(struct inode *inode,
 
 static blk_status_t dio_read_error(struct inode *inode, struct bio *failed_bio,
 				   struct page *page, unsigned int pgoff,
-				   u64 start, u64 end, int failed_mirror,
-				   bio_end_io_t *repair_endio, void *repair_arg)
+				   u64 start, u64 end, int failed_mirror)
 {
+	struct btrfs_dio_private *dip = failed_bio->bi_private;
 	struct io_failure_record *failrec;
 	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
 	struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree;
 	struct bio *bio;
 	int isector;
 	unsigned int read_mode = 0;
-	int segs;
 	int ret;
 	blk_status_t status;
-	struct bio_vec bvec;
 
 	BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE);
 
@@ -7462,261 +7460,79 @@ static blk_status_t dio_read_error(struct inode *inode, struct bio *failed_bio,
 		return BLK_STS_IOERR;
 	}
 
-	segs = bio_segments(failed_bio);
-	bio_get_first_bvec(failed_bio, &bvec);
-	if (segs > 1 ||
-	    (bvec.bv_len > btrfs_inode_sectorsize(inode)))
+	if (btrfs_io_bio(failed_bio)->iter.bi_size > inode->i_sb->s_blocksize)
 		read_mode |= REQ_FAILFAST_DEV;
 
 	isector = start - btrfs_io_bio(failed_bio)->logical;
 	isector >>= inode->i_sb->s_blocksize_bits;
-	bio = btrfs_create_repair_bio(inode, failed_bio, failrec, page,
-				pgoff, isector, repair_endio, repair_arg);
+	bio = btrfs_create_repair_bio(inode, failed_bio, failrec, page, pgoff,
+				      isector, failed_bio->bi_end_io, dip);
 	bio->bi_opf = REQ_OP_READ | read_mode;
 
 	btrfs_debug(BTRFS_I(inode)->root->fs_info,
 		    "repair DIO read error: submitting new dio read[%#x] to this_mirror=%d, in_validation=%d",
 		    read_mode, failrec->this_mirror, failrec->in_validation);
 
+	refcount_inc(&dip->refs);
 	status = submit_dio_repair_bio(inode, bio, failrec->this_mirror);
 	if (status) {
 		free_io_failure(failure_tree, io_tree, failrec);
 		bio_put(bio);
+		refcount_dec(&dip->refs);
 	}
 
 	return status;
 }
 
-struct btrfs_retry_complete {
-	struct completion done;
-	struct inode *inode;
-	u64 start;
-	int uptodate;
-};
-
-static void btrfs_retry_endio_nocsum(struct bio *bio)
-{
-	struct btrfs_retry_complete *done = bio->bi_private;
-	struct inode *inode = done->inode;
-	struct bio_vec *bvec;
-	struct extent_io_tree *io_tree, *failure_tree;
-	struct bvec_iter_all iter_all;
-
-	if (bio->bi_status)
-		goto end;
-
-	ASSERT(bio->bi_vcnt == 1);
-	io_tree = &BTRFS_I(inode)->io_tree;
-	failure_tree = &BTRFS_I(inode)->io_failure_tree;
-	ASSERT(bio_first_bvec_all(bio)->bv_len == btrfs_inode_sectorsize(inode));
-
-	done->uptodate = 1;
-	ASSERT(!bio_flagged(bio, BIO_CLONED));
-	bio_for_each_segment_all(bvec, bio, iter_all)
-		clean_io_failure(BTRFS_I(inode)->root->fs_info, failure_tree,
-				 io_tree, done->start, bvec->bv_page,
-				 btrfs_ino(BTRFS_I(inode)), 0);
-end:
-	complete(&done->done);
-	bio_put(bio);
-}
-
-static blk_status_t __btrfs_correct_data_nocsum(struct inode *inode,
-						struct btrfs_io_bio *io_bio)
+static blk_status_t btrfs_check_read_dio_bio(struct inode *inode,
+					     struct btrfs_io_bio *io_bio,
+					     const bool uptodate)
 {
-	struct btrfs_fs_info *fs_info;
+	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
+	u32 sectorsize = fs_info->sectorsize;
+	struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree;
+	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
+	const bool csum = !(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM);
 	struct bio_vec bvec;
 	struct bvec_iter iter;
-	struct btrfs_retry_complete done;
-	u64 start;
-	unsigned int pgoff;
-	u32 sectorsize;
-	int nr_sectors;
-	blk_status_t ret;
+	u64 start = io_bio->logical;
+	int icsum = 0;
 	blk_status_t err = BLK_STS_OK;
 
-	fs_info = BTRFS_I(inode)->root->fs_info;
-	sectorsize = fs_info->sectorsize;
-
-	start = io_bio->logical;
-	done.inode = inode;
-	io_bio->bio.bi_iter = io_bio->iter;
+	__bio_for_each_segment(bvec, &io_bio->bio, iter, io_bio->iter) {
+		unsigned int i, nr_sectors, pgoff;
 
-	bio_for_each_segment(bvec, &io_bio->bio, iter) {
 		nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info, bvec.bv_len);
 		pgoff = bvec.bv_offset;
-
-next_block_or_try_again:
-		done.uptodate = 0;
-		done.start = start;
-		init_completion(&done.done);
-
-		ret = dio_read_error(inode, &io_bio->bio, bvec.bv_page,
-				pgoff, start, start + sectorsize - 1,
-				io_bio->mirror_num,
-				btrfs_retry_endio_nocsum, &done);
-		if (ret) {
-			err = ret;
-			goto next;
-		}
-
-		wait_for_completion_io(&done.done);
-
-		if (!done.uptodate) {
-			/* We might have another mirror, so try again */
-			goto next_block_or_try_again;
-		}
-
-next:
-		start += sectorsize;
-
-		nr_sectors--;
-		if (nr_sectors) {
-			pgoff += sectorsize;
+		for (i = 0; i < nr_sectors; i++) {
 			ASSERT(pgoff < PAGE_SIZE);
-			goto next_block_or_try_again;
-		}
-	}
-
-	return err;
-}
-
-static void btrfs_retry_endio(struct bio *bio)
-{
-	struct btrfs_retry_complete *done = bio->bi_private;
-	struct btrfs_io_bio *io_bio = btrfs_io_bio(bio);
-	struct extent_io_tree *io_tree, *failure_tree;
-	struct inode *inode = done->inode;
-	struct bio_vec *bvec;
-	int uptodate;
-	int ret;
-	int i = 0;
-	struct bvec_iter_all iter_all;
-
-	if (bio->bi_status)
-		goto end;
-
-	uptodate = 1;
-
-	ASSERT(bio->bi_vcnt == 1);
-	ASSERT(bio_first_bvec_all(bio)->bv_len == btrfs_inode_sectorsize(done->inode));
-
-	io_tree = &BTRFS_I(inode)->io_tree;
-	failure_tree = &BTRFS_I(inode)->io_failure_tree;
-
-	ASSERT(!bio_flagged(bio, BIO_CLONED));
-	bio_for_each_segment_all(bvec, bio, iter_all) {
-		ret = check_data_csum(inode, io_bio, i, bvec->bv_page,
-				      bvec->bv_offset, done->start,
-				      bvec->bv_len);
-		if (!ret)
-			clean_io_failure(BTRFS_I(inode)->root->fs_info,
-					 failure_tree, io_tree, done->start,
-					 bvec->bv_page,
-					 btrfs_ino(BTRFS_I(inode)),
-					 bvec->bv_offset);
-		else
-			uptodate = 0;
-		i++;
-	}
-
-	done->uptodate = uptodate;
-end:
-	complete(&done->done);
-	bio_put(bio);
-}
-
-static blk_status_t __btrfs_subio_endio_read(struct inode *inode,
-		struct btrfs_io_bio *io_bio, blk_status_t err)
-{
-	struct btrfs_fs_info *fs_info;
-	struct bio_vec bvec;
-	struct bvec_iter iter;
-	struct btrfs_retry_complete done;
-	u64 start;
-	u64 offset = 0;
-	u32 sectorsize;
-	int nr_sectors;
-	unsigned int pgoff;
-	int csum_pos;
-	bool uptodate = (err == 0);
-	int ret;
-	blk_status_t status;
-
-	fs_info = BTRFS_I(inode)->root->fs_info;
-	sectorsize = fs_info->sectorsize;
-
-	err = BLK_STS_OK;
-	start = io_bio->logical;
-	done.inode = inode;
-	io_bio->bio.bi_iter = io_bio->iter;
-
-	bio_for_each_segment(bvec, &io_bio->bio, iter) {
-		nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info, bvec.bv_len);
-
-		pgoff = bvec.bv_offset;
-next_block:
-		if (uptodate) {
-			csum_pos = BTRFS_BYTES_TO_BLKS(fs_info, offset);
-			ret = check_data_csum(inode, io_bio, csum_pos,
-					      bvec.bv_page, pgoff, start,
-					      sectorsize);
-			if (likely(!ret))
-				goto next;
-		}
-try_again:
-		done.uptodate = 0;
-		done.start = start;
-		init_completion(&done.done);
-
-		status = dio_read_error(inode, &io_bio->bio, bvec.bv_page,
-					pgoff, start, start + sectorsize - 1,
-					io_bio->mirror_num, btrfs_retry_endio,
-					&done);
-		if (status) {
-			err = status;
-			goto next;
-		}
-
-		wait_for_completion_io(&done.done);
-
-		if (!done.uptodate) {
-			/* We might have another mirror, so try again */
-			goto try_again;
-		}
-next:
-		offset += sectorsize;
-		start += sectorsize;
-
-		ASSERT(nr_sectors);
-
-		nr_sectors--;
-		if (nr_sectors) {
+			if (uptodate &&
+			    (!csum || !check_data_csum(inode, io_bio, icsum,
+						       bvec.bv_page, pgoff,
+						       start, sectorsize))) {
+				clean_io_failure(fs_info, failure_tree, io_tree,
+						 start, bvec.bv_page,
+						 btrfs_ino(BTRFS_I(inode)),
+						 pgoff);
+			} else {
+				blk_status_t status;
+
+				status = dio_read_error(inode, &io_bio->bio,
+							bvec.bv_page, pgoff,
+							start,
+							start + sectorsize - 1,
+							io_bio->mirror_num);
+				if (status)
+					err = status;
+			}
+			start += sectorsize;
+			icsum++;
 			pgoff += sectorsize;
-			ASSERT(pgoff < PAGE_SIZE);
-			goto next_block;
 		}
 	}
-
 	return err;
 }
 
-static blk_status_t btrfs_check_read_dio_bio(struct inode *inode,
-					     struct btrfs_io_bio *io_bio,
-					     blk_status_t err)
-{
-	bool skip_csum = BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM;
-
-	if (skip_csum) {
-		if (unlikely(err))
-			return __btrfs_correct_data_nocsum(inode, io_bio);
-		else
-			return BLK_STS_OK;
-	} else {
-		return __btrfs_subio_endio_read(inode, io_bio, err);
-	}
-}
-
 static void __endio_write_update_ordered(struct inode *inode,
 					 const u64 offset, const u64 bytes,
 					 const bool uptodate)
@@ -7785,7 +7601,7 @@ static void btrfs_end_dio_bio(struct bio *bio)
 
 	if (bio_op(bio) == REQ_OP_READ) {
 		err = btrfs_check_read_dio_bio(dip->inode, btrfs_io_bio(bio),
-					       err);
+					       !err);
 	}
 
 	if (err)
-- 
2.26.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 14/15] btrfs: get rid of endio_repair_workers
  2020-04-16 21:46 [PATCH v2 00/15] btrfs: read repair/direct I/O improvements Omar Sandoval
                   ` (12 preceding siblings ...)
  2020-04-16 21:46 ` [PATCH v2 13/15] btrfs: simplify direct I/O read repair Omar Sandoval
@ 2020-04-16 21:46 ` Omar Sandoval
  2020-04-16 21:46 ` [PATCH v2 15/15] btrfs: unify buffered and direct I/O read repair Omar Sandoval
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 37+ messages in thread
From: Omar Sandoval @ 2020-04-16 21:46 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team, Jens Axboe, Christoph Hellwig

From: Omar Sandoval <osandov@fb.com>

This was originally added in commit 8b110e393c5a ("Btrfs: implement
repair function when direct read fails") to avoid a deadlock. In that
commit, the direct I/O read endio executes on the endio_workers
workqueue, submits a repair bio, and waits for it to complete. The
repair bio endio must execute on a different workqueue, otherwise it
could block on the endio_workers workqueue becoming available, which
won't happen because the original endio is blocked on the repair bio.

As of the previous commit, the original endio doesn't wait for the
repair bio, so this separate workqueue is unnecessary.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/ctree.h   | 1 -
 fs/btrfs/disk-io.c | 8 +-------
 fs/btrfs/disk-io.h | 1 -
 fs/btrfs/inode.c   | 2 +-
 4 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index c322568231a4..91b9052f315e 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -758,7 +758,6 @@ struct btrfs_fs_info {
 	struct btrfs_workqueue *endio_workers;
 	struct btrfs_workqueue *endio_meta_workers;
 	struct btrfs_workqueue *endio_raid56_workers;
-	struct btrfs_workqueue *endio_repair_workers;
 	struct btrfs_workqueue *rmw_workers;
 	struct btrfs_workqueue *endio_meta_write_workers;
 	struct btrfs_workqueue *endio_write_workers;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a6cb5cbbdb9f..22efd6defcf7 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -709,9 +709,7 @@ static void end_workqueue_bio(struct bio *bio)
 		else
 			wq = fs_info->endio_write_workers;
 	} else {
-		if (unlikely(end_io_wq->metadata == BTRFS_WQ_ENDIO_DIO_REPAIR))
-			wq = fs_info->endio_repair_workers;
-		else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_RAID56)
+		if (end_io_wq->metadata == BTRFS_WQ_ENDIO_RAID56)
 			wq = fs_info->endio_raid56_workers;
 		else if (end_io_wq->metadata)
 			wq = fs_info->endio_meta_workers;
@@ -1942,7 +1940,6 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info)
 	btrfs_destroy_workqueue(fs_info->workers);
 	btrfs_destroy_workqueue(fs_info->endio_workers);
 	btrfs_destroy_workqueue(fs_info->endio_raid56_workers);
-	btrfs_destroy_workqueue(fs_info->endio_repair_workers);
 	btrfs_destroy_workqueue(fs_info->rmw_workers);
 	btrfs_destroy_workqueue(fs_info->endio_write_workers);
 	btrfs_destroy_workqueue(fs_info->endio_freespace_worker);
@@ -2148,8 +2145,6 @@ static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info,
 	fs_info->endio_raid56_workers =
 		btrfs_alloc_workqueue(fs_info, "endio-raid56", flags,
 				      max_active, 4);
-	fs_info->endio_repair_workers =
-		btrfs_alloc_workqueue(fs_info, "endio-repair", flags, 1, 0);
 	fs_info->rmw_workers =
 		btrfs_alloc_workqueue(fs_info, "rmw", flags, max_active, 2);
 	fs_info->endio_write_workers =
@@ -2173,7 +2168,6 @@ static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info,
 	      fs_info->flush_workers &&
 	      fs_info->endio_workers && fs_info->endio_meta_workers &&
 	      fs_info->endio_meta_write_workers &&
-	      fs_info->endio_repair_workers &&
 	      fs_info->endio_write_workers && fs_info->endio_raid56_workers &&
 	      fs_info->endio_freespace_worker && fs_info->rmw_workers &&
 	      fs_info->caching_workers && fs_info->readahead_workers &&
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index cd629113f61c..734bc5270b6a 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -25,7 +25,6 @@ enum btrfs_wq_endio_type {
 	BTRFS_WQ_ENDIO_METADATA,
 	BTRFS_WQ_ENDIO_FREE_SPACE,
 	BTRFS_WQ_ENDIO_RAID56,
-	BTRFS_WQ_ENDIO_DIO_REPAIR,
 };
 
 static inline u64 btrfs_sb_offset(int mirror)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2580f2d251d4..72fb398a88f7 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7388,7 +7388,7 @@ static inline blk_status_t submit_dio_repair_bio(struct inode *inode,
 
 	BUG_ON(bio_op(bio) == REQ_OP_WRITE);
 
-	ret = btrfs_bio_wq_end_io(fs_info, bio, BTRFS_WQ_ENDIO_DIO_REPAIR);
+	ret = btrfs_bio_wq_end_io(fs_info, bio, BTRFS_WQ_ENDIO_DATA);
 	if (ret)
 		return ret;
 
-- 
2.26.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 15/15] btrfs: unify buffered and direct I/O read repair
  2020-04-16 21:46 [PATCH v2 00/15] btrfs: read repair/direct I/O improvements Omar Sandoval
                   ` (13 preceding siblings ...)
  2020-04-16 21:46 ` [PATCH v2 14/15] btrfs: get rid of endio_repair_workers Omar Sandoval
@ 2020-04-16 21:46 ` Omar Sandoval
  2020-04-21 23:38   ` David Sterba
  2020-04-17 11:03 ` [PATCH v2 00/15] btrfs: read repair/direct I/O improvements David Sterba
  2020-04-21 23:46 ` David Sterba
  16 siblings, 1 reply; 37+ messages in thread
From: Omar Sandoval @ 2020-04-16 21:46 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team, Jens Axboe, Christoph Hellwig

From: Omar Sandoval <osandov@fb.com>

Currently, direct I/O has its own versions of bio_readpage_error() and
btrfs_check_repairable() (dio_read_error() and
btrfs_check_dio_repairable(), respectively). The main difference is that
the direct I/O version doesn't do read validation. The rework of direct
I/O repair makes it possible to do validation, so we can get rid of
btrfs_check_dio_repairable() and combine bio_readpage_error() and
dio_read_error() into a new helper, btrfs_submit_read_repair().

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/extent_io.c | 130 ++++++++++++++++++++-----------------------
 fs/btrfs/extent_io.h |  17 ++++--
 fs/btrfs/inode.c     | 103 ++++------------------------------
 3 files changed, 82 insertions(+), 168 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 6e1d97bb7652..4c59ff5b0e3c 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2602,41 +2602,6 @@ static bool btrfs_check_repairable(struct inode *inode,
 	return true;
 }
 
-
-struct bio *btrfs_create_repair_bio(struct inode *inode, struct bio *failed_bio,
-				    struct io_failure_record *failrec,
-				    struct page *page, int pg_offset, int icsum,
-				    bio_end_io_t *endio_func, void *data)
-{
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	struct bio *bio;
-	struct btrfs_io_bio *btrfs_failed_bio;
-	struct btrfs_io_bio *btrfs_bio;
-
-	bio = btrfs_io_bio_alloc(1);
-	bio->bi_end_io = endio_func;
-	bio->bi_iter.bi_sector = failrec->logical >> 9;
-	bio->bi_iter.bi_size = 0;
-	bio->bi_private = data;
-
-	btrfs_failed_bio = btrfs_io_bio(failed_bio);
-	if (btrfs_failed_bio->csum) {
-		u16 csum_size = btrfs_super_csum_size(fs_info->super_copy);
-
-		btrfs_bio = btrfs_io_bio(bio);
-		btrfs_bio->csum = btrfs_bio->csum_inline;
-		icsum *= csum_size;
-		memcpy(btrfs_bio->csum, btrfs_failed_bio->csum + icsum,
-		       csum_size);
-	}
-
-	bio_add_page(bio, page, failrec->len, pg_offset);
-	btrfs_io_bio(bio)->logical = failrec->start;
-	btrfs_io_bio(bio)->iter = bio->bi_iter;
-
-	return bio;
-}
-
 static bool btrfs_io_needs_validation(struct inode *inode, struct bio *bio)
 {
 	struct bio_vec *bvec;
@@ -2654,72 +2619,96 @@ static bool btrfs_io_needs_validation(struct inode *inode, struct bio *bio)
 	/*
 	 * We need to validate each sector individually if the failed I/O was
 	 * for multiple sectors.
+	 *
+	 * There are a few possible bios that can end up here:
+	 * 1. A buffered read bio, which is not cloned.
+	 * 2. A direct I/O read bio, which is cloned.
+	 * 3. A (buffered or direct) repair bio, which is not cloned.
+	 *
+	 * For cloned bios (case 2), we can get the size from
+	 * btrfs_io_bio->iter; for non-cloned bios (cases 1 and 3), we can get
+	 * it from the bvecs.
 	 */
-	bio_for_each_bvec_all(bvec, bio, i) {
-		len += bvec->bv_len;
-		if (len > inode->i_sb->s_blocksize)
+	if (bio_flagged(bio, BIO_CLONED)) {
+		if (btrfs_io_bio(bio)->iter.bi_size > inode->i_sb->s_blocksize)
 			return true;
+	} else {
+		bio_for_each_bvec_all(bvec, bio, i) {
+			len += bvec->bv_len;
+			if (len > inode->i_sb->s_blocksize)
+				return true;
+		}
 	}
 	return false;
 }
 
-/*
- * This is a generic handler for readpage errors. If other copies exist, read
- * those and write back good data to the failed position. Does not investigate
- * in remapping the failed extent elsewhere, hoping the device will be smart
- * enough to do this as needed
- */
-static int bio_readpage_error(struct bio *failed_bio, u64 phy_offset,
-			      struct page *page, u64 start, u64 end,
-			      int failed_mirror)
+blk_status_t btrfs_submit_read_repair(struct inode *inode,
+				      struct bio *failed_bio, u64 phy_offset,
+				      struct page *page, unsigned int pgoff,
+				      u64 start, u64 end, int failed_mirror,
+				      submit_bio_hook_t *submit_bio_hook)
 {
 	struct io_failure_record *failrec;
-	struct inode *inode = page->mapping->host;
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
 	struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree;
+	struct btrfs_io_bio *failed_io_bio = btrfs_io_bio(failed_bio);
+	int icsum = phy_offset >> inode->i_sb->s_blocksize_bits;
 	bool need_validation;
-	struct bio *bio;
-	int read_mode = 0;
+	struct bio *repair_bio;
+	struct btrfs_io_bio *repair_io_bio;
 	blk_status_t status;
 	int ret;
 
+	btrfs_info(btrfs_sb(inode->i_sb),
+		   "Repair Read Error: read error at %llu", start);
+
 	BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE);
 
 	ret = btrfs_get_io_failure_record(inode, start, end, &failrec);
 	if (ret)
-		return ret;
+		return errno_to_blk_status(ret);
 
 	need_validation = btrfs_io_needs_validation(inode, failed_bio);
 
 	if (!btrfs_check_repairable(inode, need_validation, failrec,
 				    failed_mirror)) {
 		free_io_failure(failure_tree, tree, failrec);
-		return -EIO;
+		return BLK_STS_IOERR;
 	}
 
+	repair_bio = btrfs_io_bio_alloc(1);
+	repair_io_bio = btrfs_io_bio(repair_bio);
+	repair_bio->bi_opf = REQ_OP_READ;
 	if (need_validation)
-		read_mode |= REQ_FAILFAST_DEV;
+		repair_bio->bi_opf |= REQ_FAILFAST_DEV;
+	repair_bio->bi_end_io = failed_bio->bi_end_io;
+	repair_bio->bi_iter.bi_sector = failrec->logical >> 9;
+	repair_bio->bi_private = failed_bio->bi_private;
 
-	phy_offset >>= inode->i_sb->s_blocksize_bits;
-	bio = btrfs_create_repair_bio(inode, failed_bio, failrec, page,
-				      start - page_offset(page),
-				      (int)phy_offset, failed_bio->bi_end_io,
-				      NULL);
-	bio->bi_opf = REQ_OP_READ | read_mode;
+	if (failed_io_bio->csum) {
+		u16 csum_size = btrfs_super_csum_size(fs_info->super_copy);
+
+		repair_io_bio->csum = repair_io_bio->csum_inline;
+		memcpy(repair_io_bio->csum,
+		       failed_io_bio->csum + csum_size * icsum, csum_size);
+	}
+
+	bio_add_page(repair_bio, page, failrec->len, pgoff);
+	repair_io_bio->logical = failrec->start;
+	repair_io_bio->iter = repair_bio->bi_iter;
 
 	btrfs_debug(btrfs_sb(inode->i_sb),
-		"Repair Read Error: submitting new read[%#x] to this_mirror=%d, in_validation=%d",
-		read_mode, failrec->this_mirror, failrec->in_validation);
+"Repair Read Error: submitting new read to this_mirror=%d, in_validation=%d",
+		    failrec->this_mirror, failrec->in_validation);
 
-	status = tree->ops->submit_bio_hook(tree->private_data, bio, failrec->this_mirror,
-					 failrec->bio_flags);
+	status = submit_bio_hook(inode, repair_bio, failrec->this_mirror,
+				 failrec->bio_flags);
 	if (status) {
 		free_io_failure(failure_tree, tree, failrec);
-		bio_put(bio);
-		ret = blk_status_to_errno(status);
+		bio_put(repair_bio);
 	}
-
-	return ret;
+	return status;
 }
 
 /* lots and lots of room for performance fixes in the end_bio funcs */
@@ -2891,9 +2880,10 @@ static void end_bio_extent_readpage(struct bio *bio)
 			 * If it can't handle the error it will return -EIO and
 			 * we remain responsible for that page.
 			 */
-			ret = bio_readpage_error(bio, offset, page, start, end,
-						 mirror);
-			if (ret == 0) {
+			if (!btrfs_submit_read_repair(inode, bio, offset, page,
+						start - page_offset(page),
+						start, end, mirror,
+						tree->ops->submit_bio_hook)) {
 				uptodate = !bio->bi_status;
 				offset += len;
 				continue;
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index f4dfac756455..a2842b2d9a98 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -66,6 +66,10 @@ struct btrfs_io_bio;
 struct io_failure_record;
 struct extent_io_tree;
 
+typedef blk_status_t (submit_bio_hook_t)(struct inode *inode, struct bio *bio,
+					 int mirror_num,
+					 unsigned long bio_flags);
+
 typedef blk_status_t (extent_submit_bio_start_t)(void *private_data,
 		struct bio *bio, u64 bio_offset);
 
@@ -74,8 +78,7 @@ struct extent_io_ops {
 	 * The following callbacks must be always defined, the function
 	 * pointer will be called unconditionally.
 	 */
-	blk_status_t (*submit_bio_hook)(struct inode *inode, struct bio *bio,
-					int mirror_num, unsigned long bio_flags);
+	submit_bio_hook_t *submit_bio_hook;
 	int (*readpage_end_io_hook)(struct btrfs_io_bio *io_bio, u64 phy_offset,
 				    struct page *page, u64 start, u64 end,
 				    int mirror);
@@ -312,10 +315,12 @@ struct io_failure_record {
 };
 
 
-struct bio *btrfs_create_repair_bio(struct inode *inode, struct bio *failed_bio,
-				    struct io_failure_record *failrec,
-				    struct page *page, int pg_offset, int icsum,
-				    bio_end_io_t *endio_func, void *data);
+blk_status_t btrfs_submit_read_repair(struct inode *inode,
+				      struct bio *failed_bio, u64 phy_offset,
+				      struct page *page, unsigned int pgoff,
+				      u64 start, u64 end, int failed_mirror,
+				      submit_bio_hook_t *submit_bio_hook);
+
 #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
 bool find_lock_delalloc_range(struct inode *inode,
 			     struct page *locked_page, u64 *start,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 72fb398a88f7..a4e9f9d0a43d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7379,10 +7379,11 @@ static void btrfs_dio_private_put(struct btrfs_dio_private *dip)
 	kfree(dip);
 }
 
-static inline blk_status_t submit_dio_repair_bio(struct inode *inode,
-						 struct bio *bio,
-						 int mirror_num)
+static blk_status_t submit_dio_repair_bio(struct inode *inode, struct bio *bio,
+					  int mirror_num,
+					  unsigned long bio_flags)
 {
+	struct btrfs_dio_private *dip = bio->bi_private;
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	blk_status_t ret;
 
@@ -7392,96 +7393,11 @@ static inline blk_status_t submit_dio_repair_bio(struct inode *inode,
 	if (ret)
 		return ret;
 
+	refcount_inc(&dip->refs);
 	ret = btrfs_map_bio(fs_info, bio, mirror_num);
-
-	return ret;
-}
-
-static int btrfs_check_dio_repairable(struct inode *inode,
-				      struct bio *failed_bio,
-				      struct io_failure_record *failrec,
-				      int failed_mirror)
-{
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	int num_copies;
-
-	num_copies = btrfs_num_copies(fs_info, failrec->logical, failrec->len);
-	if (num_copies == 1) {
-		/*
-		 * we only have a single copy of the data, so don't bother with
-		 * all the retry and error correction code that follows. no
-		 * matter what the error is, it is very likely to persist.
-		 */
-		btrfs_debug(fs_info,
-			"Check DIO Repairable: cannot repair, num_copies=%d, next_mirror %d, failed_mirror %d",
-			num_copies, failrec->this_mirror, failed_mirror);
-		return 0;
-	}
-
-	failrec->failed_mirror = failed_mirror;
-	failrec->this_mirror++;
-	if (failrec->this_mirror == failed_mirror)
-		failrec->this_mirror++;
-
-	if (failrec->this_mirror > num_copies) {
-		btrfs_debug(fs_info,
-			"Check DIO Repairable: (fail) num_copies=%d, next_mirror %d, failed_mirror %d",
-			num_copies, failrec->this_mirror, failed_mirror);
-		return 0;
-	}
-
-	return 1;
-}
-
-static blk_status_t dio_read_error(struct inode *inode, struct bio *failed_bio,
-				   struct page *page, unsigned int pgoff,
-				   u64 start, u64 end, int failed_mirror)
-{
-	struct btrfs_dio_private *dip = failed_bio->bi_private;
-	struct io_failure_record *failrec;
-	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
-	struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree;
-	struct bio *bio;
-	int isector;
-	unsigned int read_mode = 0;
-	int ret;
-	blk_status_t status;
-
-	BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE);
-
-	ret = btrfs_get_io_failure_record(inode, start, end, &failrec);
 	if (ret)
-		return errno_to_blk_status(ret);
-
-	ret = btrfs_check_dio_repairable(inode, failed_bio, failrec,
-					 failed_mirror);
-	if (!ret) {
-		free_io_failure(failure_tree, io_tree, failrec);
-		return BLK_STS_IOERR;
-	}
-
-	if (btrfs_io_bio(failed_bio)->iter.bi_size > inode->i_sb->s_blocksize)
-		read_mode |= REQ_FAILFAST_DEV;
-
-	isector = start - btrfs_io_bio(failed_bio)->logical;
-	isector >>= inode->i_sb->s_blocksize_bits;
-	bio = btrfs_create_repair_bio(inode, failed_bio, failrec, page, pgoff,
-				      isector, failed_bio->bi_end_io, dip);
-	bio->bi_opf = REQ_OP_READ | read_mode;
-
-	btrfs_debug(BTRFS_I(inode)->root->fs_info,
-		    "repair DIO read error: submitting new dio read[%#x] to this_mirror=%d, in_validation=%d",
-		    read_mode, failrec->this_mirror, failrec->in_validation);
-
-	refcount_inc(&dip->refs);
-	status = submit_dio_repair_bio(inode, bio, failrec->this_mirror);
-	if (status) {
-		free_io_failure(failure_tree, io_tree, failrec);
-		bio_put(bio);
 		refcount_dec(&dip->refs);
-	}
-
-	return status;
+	return ret;
 }
 
 static blk_status_t btrfs_check_read_dio_bio(struct inode *inode,
@@ -7517,11 +7433,14 @@ static blk_status_t btrfs_check_read_dio_bio(struct inode *inode,
 			} else {
 				blk_status_t status;
 
-				status = dio_read_error(inode, &io_bio->bio,
+				status = btrfs_submit_read_repair(inode,
+							&io_bio->bio,
+							start - io_bio->logical,
 							bvec.bv_page, pgoff,
 							start,
 							start + sectorsize - 1,
-							io_bio->mirror_num);
+							io_bio->mirror_num,
+							submit_dio_repair_bio);
 				if (status)
 					err = status;
 			}
-- 
2.26.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 00/15] btrfs: read repair/direct I/O improvements
  2020-04-16 21:46 [PATCH v2 00/15] btrfs: read repair/direct I/O improvements Omar Sandoval
                   ` (14 preceding siblings ...)
  2020-04-16 21:46 ` [PATCH v2 15/15] btrfs: unify buffered and direct I/O read repair Omar Sandoval
@ 2020-04-17 11:03 ` David Sterba
  2020-04-21 23:46 ` David Sterba
  16 siblings, 0 replies; 37+ messages in thread
From: David Sterba @ 2020-04-17 11:03 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-btrfs, kernel-team, Jens Axboe, Christoph Hellwig

On Thu, Apr 16, 2020 at 02:46:10PM -0700, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> Hi,
> 
> This is version 2 of my series of fixes, cleanups, and improvements to
> direct I/O and read repair. It's preparation for adding read repair to
> my RWF_ENCODED series [1], but it can go in independently.
> 
> Patch 1 adds a new bio helper needed for patch 4. Patches 2 and 3 are
> direct I/O error handling fixes. Patch 4 is a buffered read repair fix.
> Patch 5 is a buffered read repair improvement. Patches 6-9 are trivial
> cleanups. Patch 10 converts direct I/O to use refcount_t, which would've
> helped catch the bug fixed by patch 2. Patches 11-14 drastically
> simplify the direct I/O code. Patch 15 unifies buffered and direct I/O
> read repair, which also makes direct I/O repair actually do validation
> for large failed reads instead of rewriting the whole thing.
> 
> Overall, this is net about -350 lines of code and actually makes direct
> I/O more functional.
> 
> Note that this series causes btrfs/142 to fail. This is a bug in the
> test, as it assumes that direct I/O doesn't do read validation. This is
> fixed by the fstests patch "btrfs/14{2,3}: use dm-dust instead of
> fail_make_request" which I sent up yesterday [2].
> 
> Jens and Christoph are cc'd for patches 1 and 4. Instead of looking at
> the bio internals like I did in v1, I added a new
> bio_for_each_bvec_all() helper.
> 
> Changes from v1 [3]:
> 
> * Added patch 1 with bio_for_each_bvec_all() helper
> * Dropped patch 8 which moved struct definition
> * Fixed performance regression [4] in patch 13 caused by accidentally
>   making all direct I/O submission asynchronous
> * Fixed uninitialized assert in patch 12
> * Fixed misplaced assert in btrfs_check_read_dio_bio in patch 13
> * Added reviewed-bys
> * Refactored btrfs_submit_direct() and btrfs_submit_direct_hook() in
>   patch 2 (I didn't add Nikolay's reviewed-by to that one because the
>   new patch looks fairly different from patch 1 in v1)
> * Rewrapped csum calculations in patch 10 for easier readability
> * Clarified several commit messages and comments

Thanks, I haven't read the changes yet but fstests passed. There are
some conflicts with the dio-iomap patches, all seem to be in the
expected range and solvable.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 01/15] block: add bio_for_each_bvec_all()
  2020-04-16 21:46 ` [PATCH v2 01/15] block: add bio_for_each_bvec_all() Omar Sandoval
@ 2020-04-17 12:56   ` Johannes Thumshirn
  0 siblings, 0 replies; 37+ messages in thread
From: Johannes Thumshirn @ 2020-04-17 12:56 UTC (permalink / raw)
  To: Omar Sandoval, linux-btrfs; +Cc: kernel-team, Jens Axboe, Christoph Hellwig

Looks good from my PoV
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 02/15] btrfs: fix error handling when submitting direct I/O bio
  2020-04-16 21:46 ` [PATCH v2 02/15] btrfs: fix error handling when submitting direct I/O bio Omar Sandoval
@ 2020-04-17 13:01   ` Johannes Thumshirn
  0 siblings, 0 replies; 37+ messages in thread
From: Johannes Thumshirn @ 2020-04-17 13:01 UTC (permalink / raw)
  To: Omar Sandoval, linux-btrfs; +Cc: kernel-team, Jens Axboe, Christoph Hellwig

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 03/15] btrfs: fix double __endio_write_update_ordered in direct I/O
  2020-04-16 21:46 ` [PATCH v2 03/15] btrfs: fix double __endio_write_update_ordered in direct I/O Omar Sandoval
@ 2020-04-17 17:53   ` Johannes Thumshirn
  2020-04-20 15:45     ` David Sterba
  2020-04-21 10:44   ` Nikolay Borisov
  1 sibling, 1 reply; 37+ messages in thread
From: Johannes Thumshirn @ 2020-04-17 17:53 UTC (permalink / raw)
  To: Omar Sandoval, linux-btrfs; +Cc: kernel-team, Jens Axboe, Christoph Hellwig

On 16/04/2020 23:47, Omar Sandoval wrote:
[...]
> +static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio,
> +							  struct inode *inode,
> +							  loff_t file_offset)
>   {
> -	struct inode *inode = dip->inode;
> +	const bool write = (bio_op(dio_bio) == REQ_OP_WRITE);

Nit: I think the braces aren't needed here.

[...]

> +static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
> +				loff_t file_offset)
> +{
> +	const bool write = (bio_op(dio_bio) == REQ_OP_WRITE);

Same here

Anyways:
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 04/15] btrfs: look at full bi_io_vec for repair decision
  2020-04-16 21:46 ` [PATCH v2 04/15] btrfs: look at full bi_io_vec for repair decision Omar Sandoval
@ 2020-04-17 17:56   ` Johannes Thumshirn
  0 siblings, 0 replies; 37+ messages in thread
From: Johannes Thumshirn @ 2020-04-17 17:56 UTC (permalink / raw)
  To: Omar Sandoval, linux-btrfs; +Cc: kernel-team, Jens Axboe, Christoph Hellwig

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 05/15] btrfs: don't do repair validation for checksum errors
  2020-04-16 21:46 ` [PATCH v2 05/15] btrfs: don't do repair validation for checksum errors Omar Sandoval
@ 2020-04-17 17:59   ` Johannes Thumshirn
  0 siblings, 0 replies; 37+ messages in thread
From: Johannes Thumshirn @ 2020-04-17 17:59 UTC (permalink / raw)
  To: Omar Sandoval, linux-btrfs; +Cc: kernel-team, Jens Axboe, Christoph Hellwig

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 06/15] btrfs: clarify btrfs_lookup_bio_sums documentation
  2020-04-16 21:46 ` [PATCH v2 06/15] btrfs: clarify btrfs_lookup_bio_sums documentation Omar Sandoval
@ 2020-04-17 18:01   ` Johannes Thumshirn
  2020-04-21 11:17   ` Nikolay Borisov
  1 sibling, 0 replies; 37+ messages in thread
From: Johannes Thumshirn @ 2020-04-17 18:01 UTC (permalink / raw)
  To: Omar Sandoval, linux-btrfs; +Cc: kernel-team, Jens Axboe, Christoph Hellwig

On 16/04/2020 23:47, Omar Sandoval wrote:
> Fix a couple of issues in the btrfs_lookup_bio_sums documentation:
> 
> * The bio doesn't need to be a btrfs_io_bio if dst was provided. Move
>    the declaration in the code to make that clear, too.
> * dst must be large enough to hold nblocks * csum_size, not just
>    csum_size.

And also reduce the scope of 'btrfs_bio'?

Otherwise,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 09/15] btrfs: kill btrfs_dio_private->private
  2020-04-16 21:46 ` [PATCH v2 09/15] btrfs: kill btrfs_dio_private->private Omar Sandoval
@ 2020-04-17 18:02   ` Johannes Thumshirn
  0 siblings, 0 replies; 37+ messages in thread
From: Johannes Thumshirn @ 2020-04-17 18:02 UTC (permalink / raw)
  To: Omar Sandoval, linux-btrfs; +Cc: kernel-team, Jens Axboe, Christoph Hellwig

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 10/15] btrfs: convert btrfs_dio_private->pending_bios to refcount_t
  2020-04-16 21:46 ` [PATCH v2 10/15] btrfs: convert btrfs_dio_private->pending_bios to refcount_t Omar Sandoval
@ 2020-04-17 18:03   ` Johannes Thumshirn
  0 siblings, 0 replies; 37+ messages in thread
From: Johannes Thumshirn @ 2020-04-17 18:03 UTC (permalink / raw)
  To: Omar Sandoval, linux-btrfs; +Cc: kernel-team, Jens Axboe, Christoph Hellwig

Easy enough,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 11/15] btrfs: put direct I/O checksums in btrfs_dio_private instead of bio
  2020-04-16 21:46 ` [PATCH v2 11/15] btrfs: put direct I/O checksums in btrfs_dio_private instead of bio Omar Sandoval
@ 2020-04-17 18:06   ` Johannes Thumshirn
  2020-04-21 22:50   ` David Sterba
  1 sibling, 0 replies; 37+ messages in thread
From: Johannes Thumshirn @ 2020-04-17 18:06 UTC (permalink / raw)
  To: Omar Sandoval, linux-btrfs; +Cc: kernel-team, Jens Axboe, Christoph Hellwig

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 03/15] btrfs: fix double __endio_write_update_ordered in direct I/O
  2020-04-17 17:53   ` Johannes Thumshirn
@ 2020-04-20 15:45     ` David Sterba
  0 siblings, 0 replies; 37+ messages in thread
From: David Sterba @ 2020-04-20 15:45 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Omar Sandoval, linux-btrfs, kernel-team, Jens Axboe, Christoph Hellwig

On Fri, Apr 17, 2020 at 05:53:49PM +0000, Johannes Thumshirn wrote:
> On 16/04/2020 23:47, Omar Sandoval wrote:
> [...]
> > +static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio,
> > +							  struct inode *inode,
> > +							  loff_t file_offset)
> >   {
> > -	struct inode *inode = dip->inode;
> > +	const bool write = (bio_op(dio_bio) == REQ_OP_WRITE);
> 
> Nit: I think the braces aren't needed here.

For readability I prefer to keep them, it makes the operator precedence
obvious.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 03/15] btrfs: fix double __endio_write_update_ordered in direct I/O
  2020-04-16 21:46 ` [PATCH v2 03/15] btrfs: fix double __endio_write_update_ordered in direct I/O Omar Sandoval
  2020-04-17 17:53   ` Johannes Thumshirn
@ 2020-04-21 10:44   ` Nikolay Borisov
  2020-04-21 22:26     ` David Sterba
  1 sibling, 1 reply; 37+ messages in thread
From: Nikolay Borisov @ 2020-04-21 10:44 UTC (permalink / raw)
  To: Omar Sandoval, linux-btrfs; +Cc: kernel-team, Jens Axboe, Christoph Hellwig



On 17.04.20 г. 0:46 ч., Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> In btrfs_submit_direct(), if we fail to allocate the btrfs_dio_private,
> we complete the ordered extent range. However, we don't mark that the
> range doesn't need to be cleaned up from btrfs_direct_IO() until later.
> Therefore, if we fail to allocate the btrfs_dio_private, we complete the
> ordered extent range twice. We could fix this by updating
> unsubmitted_oe_range earlier, but it's cleaner to reorganize the code so
> that creating the btrfs_dio_private and submitting the bios are
> separate, and once the btrfs_dio_private is created, cleanup always
> happens through the btrfs_dio_private.
> 
> Fixes: f28a49287817 ("Btrfs: fix leaking of ordered extents after direct IO write error")
> Signed-off-by: Omar Sandoval <osandov@fb.com>

Generally looks ok, so :

Reviewed-by: Nikolay Borisov <nborisov@suse.com>, however please see
below for some remarks on the logic around unsubmitted_oe_range_start/end

> ---
>  fs/btrfs/inode.c | 174 ++++++++++++++++++-----------------------------
>  1 file changed, 66 insertions(+), 108 deletions(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index b628c319a5b6..f6ce9749adb6 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -7903,14 +7903,60 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
>  	return ret;
>  }
>  
> -static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip)
> +/*
> + * If this succeeds, the btrfs_dio_private is responsible for cleaning up locked
> + * or ordered extents whether or not we submit any bios.
> + */
> +static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio,
> +							  struct inode *inode,
> +							  loff_t file_offset)
>  {
> -	struct inode *inode = dip->inode;
> +	const bool write = (bio_op(dio_bio) == REQ_OP_WRITE);
> +	struct btrfs_dio_private *dip;
> +	struct bio *bio;
> +
> +	dip = kzalloc(sizeof(*dip), GFP_NOFS);
> +	if (!dip)
> +		return NULL;
> +
> +	bio = btrfs_bio_clone(dio_bio);
> +	bio->bi_private = dip;
> +	btrfs_io_bio(bio)->logical = file_offset;
> +
> +	dip->private = dio_bio->bi_private;
> +	dip->inode = inode;
> +	dip->logical_offset = file_offset;
> +	dip->bytes = dio_bio->bi_iter.bi_size;
> +	dip->disk_bytenr = (u64)dio_bio->bi_iter.bi_sector << 9;
> +	dip->orig_bio = bio;
> +	dip->dio_bio = dio_bio;
> +	atomic_set(&dip->pending_bios, 1);
> +
> +	if (write) {
> +		struct btrfs_dio_data *dio_data = current->journal_info;
> +
> +		dio_data->unsubmitted_oe_range_end = dip->logical_offset +
> +			dip->bytes;
> +		dio_data->unsubmitted_oe_range_start =
> +			dio_data->unsubmitted_oe_range_end;

The logic around those 2 members is really subtle. We really have the
following:

1. btrfs_direct_IO sets those two to the same value.

2. When we call __blockdev_direct_IO unless
btrfs_get_blocks_direct->btrfs_get_blocks_direct_write is called to
modify unsubmitted_oe_range_start so that start < end. Cleanup won't
happen.

3. We come into btrfs_submit_direct - if it dip allocation fails we'd
return with oe_range_end now modified so cleanup will happen.

4. If we manage to allocate the dip we reset the unsubmitted range
members to be equal so that cleanup happens from btrfs_endio_direct_write.

This 4-step logic is not really obvious, especially given it's scattered
across 3 functions. Perhaps a comment saying that having the 2 members
being equal means cleanup in btrfs_DIRECT_io is disabled.

I'd rather have it spelled out in the changelog, I guess David can also
do that ?

> +
> +		bio->bi_end_io = btrfs_endio_direct_write;
> +	} else {
> +		bio->bi_end_io = btrfs_endio_direct_read;
> +		dip->subio_endio = btrfs_subio_endio_read;
> +	}
> +	return dip;
> +}
> +

<snip>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 06/15] btrfs: clarify btrfs_lookup_bio_sums documentation
  2020-04-16 21:46 ` [PATCH v2 06/15] btrfs: clarify btrfs_lookup_bio_sums documentation Omar Sandoval
  2020-04-17 18:01   ` Johannes Thumshirn
@ 2020-04-21 11:17   ` Nikolay Borisov
  1 sibling, 0 replies; 37+ messages in thread
From: Nikolay Borisov @ 2020-04-21 11:17 UTC (permalink / raw)
  To: Omar Sandoval, linux-btrfs; +Cc: kernel-team, Jens Axboe, Christoph Hellwig



On 17.04.20 г. 0:46 ч., Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> Fix a couple of issues in the btrfs_lookup_bio_sums documentation:
> 
> * The bio doesn't need to be a btrfs_io_bio if dst was provided. Move
>   the declaration in the code to make that clear, too.
> * dst must be large enough to hold nblocks * csum_size, not just
>   csum_size.
> 
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> Signed-off-by: Omar Sandoval <osandov@fb.com>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 12/15] btrfs: get rid of one layer of bios in direct I/O
  2020-04-16 21:46 ` [PATCH v2 12/15] btrfs: get rid of one layer of bios in direct I/O Omar Sandoval
@ 2020-04-21 13:00   ` Nikolay Borisov
  2020-04-21 23:11     ` David Sterba
  0 siblings, 1 reply; 37+ messages in thread
From: Nikolay Borisov @ 2020-04-21 13:00 UTC (permalink / raw)
  To: Omar Sandoval, linux-btrfs; +Cc: kernel-team, Jens Axboe, Christoph Hellwig



On 17.04.20 г. 0:46 ч., Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> In the worst case, there are _4_ layers of bios in the Btrfs direct I/O
> path:
> 
> 1. The bio created by the generic direct I/O code (dio_bio).
> 2. A clone of dio_bio we create in btrfs_submit_direct() to represent
>    the entire direct I/O range (orig_bio).
> 3. A partial clone of orig_bio limited to the size of a RAID stripe that
>    we create in btrfs_submit_direct_hook().
> 4. Clones of each of those split bios for each RAID stripe that we
>    create in btrfs_map_bio().
> 
> As of the previous commit, the second layer (orig_bio) is no longer
> needed for anything: we can split dio_bio instead, and complete dio_bio
> directly when all of the cloned bios complete. This lets us clean up a
> bunch of cruft, including dip->subio_endio and dip->errors (we can use
> dio_bio->bi_status instead). It also enables the next big cleanup of
> direct I/O read repair.

nit: Please explicitly mention that btrfs_dio_private_put is now not
only putting a structure and doing cleanups but also serves as the io
completion routine for dio_bio. Given the name of the function and its
purpose I find this a bit counter-intuitive. But since this is rather
subjective I'd like to ask David if he also sees it as a bit surprising?
> 
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> Signed-off-by: Omar Sandoval <osandov@fb.com>
> ---
>  fs/btrfs/btrfs_inode.h |  16 ----
>  fs/btrfs/inode.c       | 185 +++++++++++++++--------------------------
>  2 files changed, 65 insertions(+), 136 deletions(-)
> 

<snip>

> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index fe87c465b13c..79b884d2f3ed 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -7356,6 +7356,29 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock,
>  	return ret;
>  }
>  
> +static void btrfs_dio_private_put(struct btrfs_dio_private *dip)
> +{

The way you've structured the code is proliferating something which has
been identified as bad practice, namely - asymmetry between  "refcount
get" and "refcount put" operations. The put is nicely encapsulated
behind an aptly named function, however getting the reference is really
an open-coded refcount_inc. This has lead to at least 1 bug in the past
(recently fixed by Filipe) since transaction's refcount management is
similar. So I'd rather have the refcount_inc(dip->refs) be encapsulated
behind a simple btrfs_dio_private_get() helper.

> +	/*
> +	 * This implies a barrier so that stores to dio_bio->bi_status before
> +	 * this and loads of dio_bio->bi_status after this are fully ordered.
> +	 */
> +	if (!refcount_dec_and_test(&dip->refs))
> +		return;
> +
> +	if (bio_op(dip->dio_bio) == REQ_OP_WRITE) {
> +		__endio_write_update_ordered(dip->inode, dip->logical_offset,
> +					     dip->bytes,
> +					     !dip->dio_bio->bi_status);
> +	} else {
> +		unlock_extent(&BTRFS_I(dip->inode)->io_tree,
> +			      dip->logical_offset,
> +			      dip->logical_offset + dip->bytes - 1);
> +	}
> +
> +	dio_end_io(dip->dio_bio);
> +	kfree(dip);
> +}
> +
>  static inline blk_status_t submit_dio_repair_bio(struct inode *inode,
>  						 struct bio *bio,
>  						 int mirror_num)

<snip>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 13/15] btrfs: simplify direct I/O read repair
  2020-04-16 21:46 ` [PATCH v2 13/15] btrfs: simplify direct I/O read repair Omar Sandoval
@ 2020-04-21 13:53   ` Nikolay Borisov
  2020-04-21 14:40     ` Nikolay Borisov
  0 siblings, 1 reply; 37+ messages in thread
From: Nikolay Borisov @ 2020-04-21 13:53 UTC (permalink / raw)
  To: Omar Sandoval, linux-btrfs; +Cc: kernel-team, Jens Axboe, Christoph Hellwig



On 17.04.20 г. 0:46 ч., Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> Direct I/O read repair was originally implemented in commit 8b110e393c5a
> ("Btrfs: implement repair function when direct read fails"). This
> implementation is unnecessarily complicated. There is major code
> duplication between __btrfs_subio_endio_read() (checks checksums and
> handles I/O errors for files with checksums),
> __btrfs_correct_data_nocsum() (handles I/O errors for files without
> checksums), btrfs_retry_endio() (checks checksums and handles I/O errors
> for retries of files with checksums), and btrfs_retry_endio_nocsum()
> (handles I/O errors for retries of files without checksum). If it sounds
> like these should be one function, that's because they should.
> Additionally, these functions are very hard to follow due to their
> excessive use of goto.
> 
> This commit replaces the original implementation. After the previous
> commit getting rid of orig_bio, we can reuse the same endio callback for
> repair I/O and the original I/O, we just need to track the file offset
> and original iterator in the repair bio. We can also unify the handling
> of files with and without checksums and simplify the control flow. We
> also no longer have to wait for each repair I/O to complete one by one.
> 
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> Signed-off-by: Omar Sandoval <osandov@fb.com>
> ---
>  fs/btrfs/extent_io.c |   2 +
>  fs/btrfs/inode.c     | 268 +++++++------------------------------------
>  2 files changed, 44 insertions(+), 226 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 85e98ba349a8..6e1d97bb7652 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2631,6 +2631,8 @@ struct bio *btrfs_create_repair_bio(struct inode *inode, struct bio *failed_bio,
>  	}
>  
>  	bio_add_page(bio, page, failrec->len, pg_offset);
> +	btrfs_io_bio(bio)->logical = failrec->start;
> +	btrfs_io_bio(bio)->iter = bio->bi_iter;
>  
>  	return bio;
>  }
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 79b884d2f3ed..2580f2d251d4 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -7435,19 +7435,17 @@ static int btrfs_check_dio_repairable(struct inode *inode,
>  
>  static blk_status_t dio_read_error(struct inode *inode, struct bio *failed_bio,
>  				   struct page *page, unsigned int pgoff,
> -				   u64 start, u64 end, int failed_mirror,
> -				   bio_end_io_t *repair_endio, void *repair_arg)
> +				   u64 start, u64 end, int failed_mirror)
>  {
> +	struct btrfs_dio_private *dip = failed_bio->bi_private;
>  	struct io_failure_record *failrec;
>  	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
>  	struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree;
>  	struct bio *bio;
>  	int isector;
>  	unsigned int read_mode = 0;
> -	int segs;
>  	int ret;
>  	blk_status_t status;
> -	struct bio_vec bvec;
>  
>  	BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE);
>  
> @@ -7462,261 +7460,79 @@ static blk_status_t dio_read_error(struct inode *inode, struct bio *failed_bio,
>  		return BLK_STS_IOERR;
>  	}
>  
> -	segs = bio_segments(failed_bio);
> -	bio_get_first_bvec(failed_bio, &bvec);
> -	if (segs > 1 ||
> -	    (bvec.bv_len > btrfs_inode_sectorsize(inode)))
> +	if (btrfs_io_bio(failed_bio)->iter.bi_size > inode->i_sb->s_blocksize)
>  		read_mode |= REQ_FAILFAST_DEV;
>  
>  	isector = start - btrfs_io_bio(failed_bio)->logical;
>  	isector >>= inode->i_sb->s_blocksize_bits;
> -	bio = btrfs_create_repair_bio(inode, failed_bio, failrec, page,
> -				pgoff, isector, repair_endio, repair_arg);
> +	bio = btrfs_create_repair_bio(inode, failed_bio, failrec, page, pgoff,
> +				      isector, failed_bio->bi_end_io, dip);
>  	bio->bi_opf = REQ_OP_READ | read_mode;
>  
>  	btrfs_debug(BTRFS_I(inode)->root->fs_info,
>  		    "repair DIO read error: submitting new dio read[%#x] to this_mirror=%d, in_validation=%d",
>  		    read_mode, failrec->this_mirror, failrec->in_validation);
>  
> +	refcount_inc(&dip->refs);
>  	status = submit_dio_repair_bio(inode, bio, failrec->this_mirror);
>  	if (status) {
>  		free_io_failure(failure_tree, io_tree, failrec);
>  		bio_put(bio);
> +		refcount_dec(&dip->refs);
>  	}
>  
>  	return status;
>  }
>  
> -struct btrfs_retry_complete {
> -	struct completion done;
> -	struct inode *inode;
> -	u64 start;
> -	int uptodate;
> -};
> -
> -static void btrfs_retry_endio_nocsum(struct bio *bio)
> -{
> -	struct btrfs_retry_complete *done = bio->bi_private;
> -	struct inode *inode = done->inode;
> -	struct bio_vec *bvec;
> -	struct extent_io_tree *io_tree, *failure_tree;
> -	struct bvec_iter_all iter_all;
> -
> -	if (bio->bi_status)
> -		goto end;
> -
> -	ASSERT(bio->bi_vcnt == 1);
> -	io_tree = &BTRFS_I(inode)->io_tree;
> -	failure_tree = &BTRFS_I(inode)->io_failure_tree;
> -	ASSERT(bio_first_bvec_all(bio)->bv_len == btrfs_inode_sectorsize(inode));
> -
> -	done->uptodate = 1;
> -	ASSERT(!bio_flagged(bio, BIO_CLONED));
> -	bio_for_each_segment_all(bvec, bio, iter_all)
> -		clean_io_failure(BTRFS_I(inode)->root->fs_info, failure_tree,
> -				 io_tree, done->start, bvec->bv_page,
> -				 btrfs_ino(BTRFS_I(inode)), 0);
> -end:
> -	complete(&done->done);
> -	bio_put(bio);
> -}
> -
> -static blk_status_t __btrfs_correct_data_nocsum(struct inode *inode,
> -						struct btrfs_io_bio *io_bio)
> +static blk_status_t btrfs_check_read_dio_bio(struct inode *inode,
> +					     struct btrfs_io_bio *io_bio,
> +					     const bool uptodate)
>  {
> -	struct btrfs_fs_info *fs_info;
> +	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
> +	u32 sectorsize = fs_info->sectorsize;
> +	struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree;
> +	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
> +	const bool csum = !(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM);
>  	struct bio_vec bvec;
>  	struct bvec_iter iter;
> -	struct btrfs_retry_complete done;
> -	u64 start;
> -	unsigned int pgoff;
> -	u32 sectorsize;
> -	int nr_sectors;
> -	blk_status_t ret;
> +	u64 start = io_bio->logical;
> +	int icsum = 0;
>  	blk_status_t err = BLK_STS_OK;
>  
> -	fs_info = BTRFS_I(inode)->root->fs_info;
> -	sectorsize = fs_info->sectorsize;
> -
> -	start = io_bio->logical;
> -	done.inode = inode;
> -	io_bio->bio.bi_iter = io_bio->iter;
> +	__bio_for_each_segment(bvec, &io_bio->bio, iter, io_bio->iter) {
> +		unsigned int i, nr_sectors, pgoff;
>  		nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info, bvec.bv_len);
>  		pgoff = bvec.bv_offset;
<snip>
> +		for (i = 0; i < nr_sectors; i++) {
>  			ASSERT(pgoff < PAGE_SIZE);
<snip>
> +			if (uptodate &&
> +			    (!csum || !check_data_csum(inode, io_bio, icsum,
> +						       bvec.bv_page, pgoff,
> +						       start, sectorsize))) {
> +				clean_io_failure(fs_info, failure_tree, io_tree,
> +						 start, bvec.bv_page,
> +						 btrfs_ino(BTRFS_I(inode)),
> +						 pgoff);

Why do you have to call clean_io_failure in case there isn't a failure
since that function:

a) Will return because the tree is empty altogether (count_range_bits
returns 0).

b) Or if there was a corruption a different segment - get_state_failrec
would return -ENOENT.

Can't this be reworked to:

if (!uptodate || (csum && check_data_csum)) {
dio_read_error
}

> +			} else {
> +				blk_status_t status;
> +
> +				status = dio_read_error(inode, &io_bio->bio,
> +							bvec.bv_page, pgoff,
> +							start,
> +							start + sectorsize - 1,
> +							io_bio->mirror_num);
> +				if (status)
> +					err = status;
> +			}
> +			start += sectorsize;
> +			icsum++;
>  			pgoff += sectorsize;
> -			ASSERT(pgoff < PAGE_SIZE);
> -			goto next_block;
>  		}
>  	}
> -
>  	return err;
>  }

<snip>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 13/15] btrfs: simplify direct I/O read repair
  2020-04-21 13:53   ` Nikolay Borisov
@ 2020-04-21 14:40     ` Nikolay Borisov
  0 siblings, 0 replies; 37+ messages in thread
From: Nikolay Borisov @ 2020-04-21 14:40 UTC (permalink / raw)
  To: Omar Sandoval, linux-btrfs; +Cc: kernel-team, Jens Axboe, Christoph Hellwig



On 21.04.20 г. 16:53 ч., Nikolay Borisov wrote:
> 
> 
> On 17.04.20 г. 0:46 ч., Omar Sandoval wrote:
>> From: Omar Sandoval <osandov@fb.com>
>>
>> Direct I/O read repair was originally implemented in commit 8b110e393c5a
>> ("Btrfs: implement repair function when direct read fails"). This
>> implementation is unnecessarily complicated. There is major code
>> duplication between __btrfs_subio_endio_read() (checks checksums and
>> handles I/O errors for files with checksums),
>> __btrfs_correct_data_nocsum() (handles I/O errors for files without
>> checksums), btrfs_retry_endio() (checks checksums and handles I/O errors
>> for retries of files with checksums), and btrfs_retry_endio_nocsum()
>> (handles I/O errors for retries of files without checksum). If it sounds
>> like these should be one function, that's because they should.
>> Additionally, these functions are very hard to follow due to their
>> excessive use of goto.
>>
>> This commit replaces the original implementation. After the previous
>> commit getting rid of orig_bio, we can reuse the same endio callback for
>> repair I/O and the original I/O, we just need to track the file offset
>> and original iterator in the repair bio. We can also unify the handling
>> of files with and without checksums and simplify the control flow. We
>> also no longer have to wait for each repair I/O to complete one by one.
>>
>> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
>> Signed-off-by: Omar Sandoval <osandov@fb.com>
>> ---
>>  fs/btrfs/extent_io.c |   2 +
>>  fs/btrfs/inode.c     | 268 +++++++------------------------------------
>>  2 files changed, 44 insertions(+), 226 deletions(-)
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index 85e98ba349a8..6e1d97bb7652 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -2631,6 +2631,8 @@ struct bio *btrfs_create_repair_bio(struct inode *inode, struct bio *failed_bio,
>>  	}
>>  
>>  	bio_add_page(bio, page, failrec->len, pg_offset);
>> +	btrfs_io_bio(bio)->logical = failrec->start;
>> +	btrfs_io_bio(bio)->iter = bio->bi_iter;
>>  
>>  	return bio;
>>  }
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index 79b884d2f3ed..2580f2d251d4 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -7435,19 +7435,17 @@ static int btrfs_check_dio_repairable(struct inode *inode,
>>  
>>  static blk_status_t dio_read_error(struct inode *inode, struct bio *failed_bio,
>>  				   struct page *page, unsigned int pgoff,
>> -				   u64 start, u64 end, int failed_mirror,
>> -				   bio_end_io_t *repair_endio, void *repair_arg)
>> +				   u64 start, u64 end, int failed_mirror)
>>  {
>> +	struct btrfs_dio_private *dip = failed_bio->bi_private;
>>  	struct io_failure_record *failrec;
>>  	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
>>  	struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree;
>>  	struct bio *bio;
>>  	int isector;
>>  	unsigned int read_mode = 0;
>> -	int segs;
>>  	int ret;
>>  	blk_status_t status;
>> -	struct bio_vec bvec;
>>  
>>  	BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE);
>>  
>> @@ -7462,261 +7460,79 @@ static blk_status_t dio_read_error(struct inode *inode, struct bio *failed_bio,
>>  		return BLK_STS_IOERR;
>>  	}
>>  
>> -	segs = bio_segments(failed_bio);
>> -	bio_get_first_bvec(failed_bio, &bvec);
>> -	if (segs > 1 ||
>> -	    (bvec.bv_len > btrfs_inode_sectorsize(inode)))
>> +	if (btrfs_io_bio(failed_bio)->iter.bi_size > inode->i_sb->s_blocksize)
>>  		read_mode |= REQ_FAILFAST_DEV;
>>  
>>  	isector = start - btrfs_io_bio(failed_bio)->logical;
>>  	isector >>= inode->i_sb->s_blocksize_bits;
>> -	bio = btrfs_create_repair_bio(inode, failed_bio, failrec, page,
>> -				pgoff, isector, repair_endio, repair_arg);
>> +	bio = btrfs_create_repair_bio(inode, failed_bio, failrec, page, pgoff,
>> +				      isector, failed_bio->bi_end_io, dip);
>>  	bio->bi_opf = REQ_OP_READ | read_mode;
>>  
>>  	btrfs_debug(BTRFS_I(inode)->root->fs_info,
>>  		    "repair DIO read error: submitting new dio read[%#x] to this_mirror=%d, in_validation=%d",
>>  		    read_mode, failrec->this_mirror, failrec->in_validation);
>>  
>> +	refcount_inc(&dip->refs);
>>  	status = submit_dio_repair_bio(inode, bio, failrec->this_mirror);
>>  	if (status) {
>>  		free_io_failure(failure_tree, io_tree, failrec);
>>  		bio_put(bio);
>> +		refcount_dec(&dip->refs);
>>  	}
>>  
>>  	return status;
>>  }
>>  
>> -struct btrfs_retry_complete {
>> -	struct completion done;
>> -	struct inode *inode;
>> -	u64 start;
>> -	int uptodate;
>> -};
>> -
>> -static void btrfs_retry_endio_nocsum(struct bio *bio)
>> -{
>> -	struct btrfs_retry_complete *done = bio->bi_private;
>> -	struct inode *inode = done->inode;
>> -	struct bio_vec *bvec;
>> -	struct extent_io_tree *io_tree, *failure_tree;
>> -	struct bvec_iter_all iter_all;
>> -
>> -	if (bio->bi_status)
>> -		goto end;
>> -
>> -	ASSERT(bio->bi_vcnt == 1);
>> -	io_tree = &BTRFS_I(inode)->io_tree;
>> -	failure_tree = &BTRFS_I(inode)->io_failure_tree;
>> -	ASSERT(bio_first_bvec_all(bio)->bv_len == btrfs_inode_sectorsize(inode));
>> -
>> -	done->uptodate = 1;
>> -	ASSERT(!bio_flagged(bio, BIO_CLONED));
>> -	bio_for_each_segment_all(bvec, bio, iter_all)
>> -		clean_io_failure(BTRFS_I(inode)->root->fs_info, failure_tree,
>> -				 io_tree, done->start, bvec->bv_page,
>> -				 btrfs_ino(BTRFS_I(inode)), 0);
>> -end:
>> -	complete(&done->done);
>> -	bio_put(bio);
>> -}
>> -
>> -static blk_status_t __btrfs_correct_data_nocsum(struct inode *inode,
>> -						struct btrfs_io_bio *io_bio)
>> +static blk_status_t btrfs_check_read_dio_bio(struct inode *inode,
>> +					     struct btrfs_io_bio *io_bio,
>> +					     const bool uptodate)
>>  {
>> -	struct btrfs_fs_info *fs_info;
>> +	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
>> +	u32 sectorsize = fs_info->sectorsize;
>> +	struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree;
>> +	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
>> +	const bool csum = !(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM);
>>  	struct bio_vec bvec;
>>  	struct bvec_iter iter;
>> -	struct btrfs_retry_complete done;
>> -	u64 start;
>> -	unsigned int pgoff;
>> -	u32 sectorsize;
>> -	int nr_sectors;
>> -	blk_status_t ret;
>> +	u64 start = io_bio->logical;
>> +	int icsum = 0;
>>  	blk_status_t err = BLK_STS_OK;
>>  
>> -	fs_info = BTRFS_I(inode)->root->fs_info;
>> -	sectorsize = fs_info->sectorsize;
>> -
>> -	start = io_bio->logical;
>> -	done.inode = inode;
>> -	io_bio->bio.bi_iter = io_bio->iter;
>> +	__bio_for_each_segment(bvec, &io_bio->bio, iter, io_bio->iter) {
>> +		unsigned int i, nr_sectors, pgoff;
>>  		nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info, bvec.bv_len);
>>  		pgoff = bvec.bv_offset;
> <snip>
>> +		for (i = 0; i < nr_sectors; i++) {
>>  			ASSERT(pgoff < PAGE_SIZE);
> <snip>
>> +			if (uptodate &&
>> +			    (!csum || !check_data_csum(inode, io_bio, icsum,
>> +						       bvec.bv_page, pgoff,
>> +						       start, sectorsize))) {
>> +				clean_io_failure(fs_info, failure_tree, io_tree,
>> +						 start, bvec.bv_page,
>> +						 btrfs_ino(BTRFS_I(inode)),
>> +						 pgoff);
> 
> Why do you have to call clean_io_failure in case there isn't a failure
> since that function:
> 
> a) Will return because the tree is empty altogether (count_range_bits
> returns 0).
> 
> b) Or if there was a corruption a different segment - get_state_failrec
> would return -ENOENT.
> 
> Can't this be reworked to:
> 
> if (!uptodate || (csum && check_data_csum)) {
> dio_read_error
> }

Thinking about it a bit more this seems to handle the case when we have
failed one read and subsequently queued a repair bio. When this repair
bio finishes we'll execute the clean_ioo_failure to remove the failure
rec created initially. In this case this patch LGTM, so:

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> 
>> +			} else {
>> +				blk_status_t status;
>> +
>> +				status = dio_read_error(inode, &io_bio->bio,
>> +							bvec.bv_page, pgoff,
>> +							start,
>> +							start + sectorsize - 1,
>> +							io_bio->mirror_num);
>> +				if (status)
>> +					err = status;
>> +			}
>> +			start += sectorsize;
>> +			icsum++;
>>  			pgoff += sectorsize;
>> -			ASSERT(pgoff < PAGE_SIZE);
>> -			goto next_block;
>>  		}
>>  	}
>> -
>>  	return err;
>>  }
> 
> <snip>
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 03/15] btrfs: fix double __endio_write_update_ordered in direct I/O
  2020-04-21 10:44   ` Nikolay Borisov
@ 2020-04-21 22:26     ` David Sterba
  0 siblings, 0 replies; 37+ messages in thread
From: David Sterba @ 2020-04-21 22:26 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: Omar Sandoval, linux-btrfs, kernel-team, Jens Axboe, Christoph Hellwig

On Tue, Apr 21, 2020 at 01:44:25PM +0300, Nikolay Borisov wrote:
> 
> 
> On 17.04.20 г. 0:46 ч., Omar Sandoval wrote:
> > From: Omar Sandoval <osandov@fb.com>
> > 
> > In btrfs_submit_direct(), if we fail to allocate the btrfs_dio_private,
> > we complete the ordered extent range. However, we don't mark that the
> > range doesn't need to be cleaned up from btrfs_direct_IO() until later.
> > Therefore, if we fail to allocate the btrfs_dio_private, we complete the
> > ordered extent range twice. We could fix this by updating
> > unsubmitted_oe_range earlier, but it's cleaner to reorganize the code so
> > that creating the btrfs_dio_private and submitting the bios are
> > separate, and once the btrfs_dio_private is created, cleanup always
> > happens through the btrfs_dio_private.
> > 
> > Fixes: f28a49287817 ("Btrfs: fix leaking of ordered extents after direct IO write error")
> > Signed-off-by: Omar Sandoval <osandov@fb.com>
> 
> Generally looks ok, so :
> 
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>, however please see
> below for some remarks on the logic around unsubmitted_oe_range_start/end
> 
> > ---
> >  fs/btrfs/inode.c | 174 ++++++++++++++++++-----------------------------
> >  1 file changed, 66 insertions(+), 108 deletions(-)
> > 
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index b628c319a5b6..f6ce9749adb6 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -7903,14 +7903,60 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
> >  	return ret;
> >  }
> >  
> > -static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip)
> > +/*
> > + * If this succeeds, the btrfs_dio_private is responsible for cleaning up locked
> > + * or ordered extents whether or not we submit any bios.
> > + */
> > +static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio,
> > +							  struct inode *inode,
> > +							  loff_t file_offset)
> >  {
> > -	struct inode *inode = dip->inode;
> > +	const bool write = (bio_op(dio_bio) == REQ_OP_WRITE);
> > +	struct btrfs_dio_private *dip;
> > +	struct bio *bio;
> > +
> > +	dip = kzalloc(sizeof(*dip), GFP_NOFS);
> > +	if (!dip)
> > +		return NULL;
> > +
> > +	bio = btrfs_bio_clone(dio_bio);
> > +	bio->bi_private = dip;
> > +	btrfs_io_bio(bio)->logical = file_offset;
> > +
> > +	dip->private = dio_bio->bi_private;
> > +	dip->inode = inode;
> > +	dip->logical_offset = file_offset;
> > +	dip->bytes = dio_bio->bi_iter.bi_size;
> > +	dip->disk_bytenr = (u64)dio_bio->bi_iter.bi_sector << 9;
> > +	dip->orig_bio = bio;
> > +	dip->dio_bio = dio_bio;
> > +	atomic_set(&dip->pending_bios, 1);
> > +
> > +	if (write) {
> > +		struct btrfs_dio_data *dio_data = current->journal_info;
> > +
> > +		dio_data->unsubmitted_oe_range_end = dip->logical_offset +
> > +			dip->bytes;
> > +		dio_data->unsubmitted_oe_range_start =
> > +			dio_data->unsubmitted_oe_range_end;
> 
> The logic around those 2 members is really subtle. We really have the
> following:
> 
> 1. btrfs_direct_IO sets those two to the same value.
> 
> 2. When we call __blockdev_direct_IO unless
> btrfs_get_blocks_direct->btrfs_get_blocks_direct_write is called to
> modify unsubmitted_oe_range_start so that start < end. Cleanup won't
> happen.
> 
> 3. We come into btrfs_submit_direct - if it dip allocation fails we'd
> return with oe_range_end now modified so cleanup will happen.
> 
> 4. If we manage to allocate the dip we reset the unsubmitted range
> members to be equal so that cleanup happens from btrfs_endio_direct_write.
> 
> This 4-step logic is not really obvious, especially given it's scattered
> across 3 functions. Perhaps a comment saying that having the 2 members
> being equal means cleanup in btrfs_DIRECT_io is disabled.
> 
> I'd rather have it spelled out in the changelog, I guess David can also
> do that ?

The above added to changelog and a brief comment added, thanks.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 11/15] btrfs: put direct I/O checksums in btrfs_dio_private instead of bio
  2020-04-16 21:46 ` [PATCH v2 11/15] btrfs: put direct I/O checksums in btrfs_dio_private instead of bio Omar Sandoval
  2020-04-17 18:06   ` Johannes Thumshirn
@ 2020-04-21 22:50   ` David Sterba
  1 sibling, 0 replies; 37+ messages in thread
From: David Sterba @ 2020-04-21 22:50 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-btrfs, kernel-team, Jens Axboe, Christoph Hellwig

On Thu, Apr 16, 2020 at 02:46:21PM -0700, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> The next commit will get rid of btrfs_dio_private->orig_bio. The only
> thing we really need it for is containing all of the checksums, but we
> can easily put the checksum array in btrfs_dio_private and have the
> submitted bios reference the array. We can also look the checksums up
> while we're setting up instead of the current awkward logic that looks
> them up for orig_bio when the first split bio is submitted.
> 
> (Interestingly, btrfs_dio_private did contain the
> checksums before commit 23ea8e5a0767 ("Btrfs: load checksum data once
> when submitting a direct read io"), but it didn't look them up up
> front.)
> 
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> Signed-off-by: Omar Sandoval <osandov@fb.com>
> ---
>  fs/btrfs/btrfs_inode.h |  3 ++
>  fs/btrfs/inode.c       | 70 +++++++++++++++++++-----------------------
>  2 files changed, 34 insertions(+), 39 deletions(-)
> 
> diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
> index b965fa5429ec..94476a8be4cc 100644
> --- a/fs/btrfs/btrfs_inode.h
> +++ b/fs/btrfs/btrfs_inode.h
> @@ -324,6 +324,9 @@ struct btrfs_dio_private {
>  	 */
>  	blk_status_t (*subio_endio)(struct inode *, struct btrfs_io_bio *,
>  			blk_status_t);
> +
> +	/* Checksums. */
> +	u8 sums[];

I've renamed it to 'csums'

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 12/15] btrfs: get rid of one layer of bios in direct I/O
  2020-04-21 13:00   ` Nikolay Borisov
@ 2020-04-21 23:11     ` David Sterba
  0 siblings, 0 replies; 37+ messages in thread
From: David Sterba @ 2020-04-21 23:11 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: Omar Sandoval, linux-btrfs, kernel-team, Jens Axboe, Christoph Hellwig

On Tue, Apr 21, 2020 at 04:00:56PM +0300, Nikolay Borisov wrote:
> On 17.04.20 г. 0:46 ч., Omar Sandoval wrote:
> > From: Omar Sandoval <osandov@fb.com>
> > 
> > In the worst case, there are _4_ layers of bios in the Btrfs direct I/O
> > path:
> > 
> > 1. The bio created by the generic direct I/O code (dio_bio).
> > 2. A clone of dio_bio we create in btrfs_submit_direct() to represent
> >    the entire direct I/O range (orig_bio).
> > 3. A partial clone of orig_bio limited to the size of a RAID stripe that
> >    we create in btrfs_submit_direct_hook().
> > 4. Clones of each of those split bios for each RAID stripe that we
> >    create in btrfs_map_bio().
> > 
> > As of the previous commit, the second layer (orig_bio) is no longer
> > needed for anything: we can split dio_bio instead, and complete dio_bio
> > directly when all of the cloned bios complete. This lets us clean up a
> > bunch of cruft, including dip->subio_endio and dip->errors (we can use
> > dio_bio->bi_status instead). It also enables the next big cleanup of
> > direct I/O read repair.
> 
> nit: Please explicitly mention that btrfs_dio_private_put is now not
> only putting a structure and doing cleanups but also serves as the io
> completion routine for dio_bio. Given the name of the function and its
> purpose I find this a bit counter-intuitive. But since this is rather
> subjective I'd like to ask David if he also sees it as a bit surprising?

I tend to agree that mixing put and end_io together is counter
intuitive, but I don't se a way how to separate them.

> > 
> > Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> > Signed-off-by: Omar Sandoval <osandov@fb.com>
> > ---
> >  fs/btrfs/btrfs_inode.h |  16 ----
> >  fs/btrfs/inode.c       | 185 +++++++++++++++--------------------------
> >  2 files changed, 65 insertions(+), 136 deletions(-)
> > 
> 
> <snip>
> 
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index fe87c465b13c..79b884d2f3ed 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -7356,6 +7356,29 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock,
> >  	return ret;
> >  }
> >  
> > +static void btrfs_dio_private_put(struct btrfs_dio_private *dip)
> > +{
> 
> The way you've structured the code is proliferating something which has
> been identified as bad practice, namely - asymmetry between  "refcount
> get" and "refcount put" operations. The put is nicely encapsulated
> behind an aptly named function, however getting the reference is really
> an open-coded refcount_inc. This has lead to at least 1 bug in the past
> (recently fixed by Filipe) since transaction's refcount management is
> similar. So I'd rather have the refcount_inc(dip->refs) be encapsulated
> behind a simple btrfs_dio_private_get() helper.

btrfs_dio_private_get for refcount_inc would be ok, but I'm not sure if
you also want the put+end_io for btrfs_dio_private_put.

Eg. in submit_dio_repair_bio, there's plain refcount_inc/refcount_dec,
here the end_io semantics is not expected (and won't probably happen).

In btrfs_submit_direct there's the conditional inc/dec, when transfering
the 1st reference from the init.

Suggestions welcome.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 15/15] btrfs: unify buffered and direct I/O read repair
  2020-04-16 21:46 ` [PATCH v2 15/15] btrfs: unify buffered and direct I/O read repair Omar Sandoval
@ 2020-04-21 23:38   ` David Sterba
  0 siblings, 0 replies; 37+ messages in thread
From: David Sterba @ 2020-04-21 23:38 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-btrfs, kernel-team, Jens Axboe, Christoph Hellwig

On Thu, Apr 16, 2020 at 02:46:25PM -0700, Omar Sandoval wrote:
> +blk_status_t btrfs_submit_read_repair(struct inode *inode,
> +				      struct bio *failed_bio, u64 phy_offset,
> +				      struct page *page, unsigned int pgoff,
> +				      u64 start, u64 end, int failed_mirror,
> +				      submit_bio_hook_t *submit_bio_hook)
>  {
>  	struct io_failure_record *failrec;
> -	struct inode *inode = page->mapping->host;
> +	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>  	struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
>  	struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree;
> +	struct btrfs_io_bio *failed_io_bio = btrfs_io_bio(failed_bio);
> +	int icsum = phy_offset >> inode->i_sb->s_blocksize_bits;
>  	bool need_validation;
> -	struct bio *bio;
> -	int read_mode = 0;
> +	struct bio *repair_bio;
> +	struct btrfs_io_bio *repair_io_bio;
>  	blk_status_t status;
>  	int ret;
>  
> +	btrfs_info(btrfs_sb(inode->i_sb),
> +		   "Repair Read Error: read error at %llu", start);

This is a new message, there's another one with 'repair read error' but
at 'debug' level, so I'd set that one to debug too. Or maybe drop it
altogether, we want to use the uevents mechanism to communicate such
things to user and not flood the system log.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 00/15] btrfs: read repair/direct I/O improvements
  2020-04-16 21:46 [PATCH v2 00/15] btrfs: read repair/direct I/O improvements Omar Sandoval
                   ` (15 preceding siblings ...)
  2020-04-17 11:03 ` [PATCH v2 00/15] btrfs: read repair/direct I/O improvements David Sterba
@ 2020-04-21 23:46 ` David Sterba
  16 siblings, 0 replies; 37+ messages in thread
From: David Sterba @ 2020-04-21 23:46 UTC (permalink / raw)
  To: Omar Sandoval
  Cc: linux-btrfs, kernel-team, Jens Axboe, Christoph Hellwig, rgoldwyn

On Thu, Apr 16, 2020 at 02:46:10PM -0700, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> Omar Sandoval (15):
>   block: add bio_for_each_bvec_all()
>   btrfs: fix error handling when submitting direct I/O bio
>   btrfs: fix double __endio_write_update_ordered in direct I/O
>   btrfs: look at full bi_io_vec for repair decision
>   btrfs: don't do repair validation for checksum errors
>   btrfs: clarify btrfs_lookup_bio_sums documentation
>   btrfs: rename __readpage_endio_check to check_data_csum
>   btrfs: make btrfs_check_repairable() static
>   btrfs: kill btrfs_dio_private->private
>   btrfs: convert btrfs_dio_private->pending_bios to refcount_t
>   btrfs: put direct I/O checksums in btrfs_dio_private instead of bio
>   btrfs: get rid of one layer of bios in direct I/O
>   btrfs: simplify direct I/O read repair
>   btrfs: get rid of endio_repair_workers
>   btrfs: unify buffered and direct I/O read repair

Thanks to all who did the reviews, I'm adding the patchset to misc-next.
There are some minor updates, in changelogs or in code. There are also
some comments that might lead to more fixups but I don't think it's
too serious to hold the patches unmerged.

If there are futher comments or requests for clarification, changelog
updates, please let me know, I'll do that directly. We need to give it a
test and also to provide a base for the dio-iomap patchset.

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2020-04-21 23:47 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-16 21:46 [PATCH v2 00/15] btrfs: read repair/direct I/O improvements Omar Sandoval
2020-04-16 21:46 ` [PATCH v2 01/15] block: add bio_for_each_bvec_all() Omar Sandoval
2020-04-17 12:56   ` Johannes Thumshirn
2020-04-16 21:46 ` [PATCH v2 02/15] btrfs: fix error handling when submitting direct I/O bio Omar Sandoval
2020-04-17 13:01   ` Johannes Thumshirn
2020-04-16 21:46 ` [PATCH v2 03/15] btrfs: fix double __endio_write_update_ordered in direct I/O Omar Sandoval
2020-04-17 17:53   ` Johannes Thumshirn
2020-04-20 15:45     ` David Sterba
2020-04-21 10:44   ` Nikolay Borisov
2020-04-21 22:26     ` David Sterba
2020-04-16 21:46 ` [PATCH v2 04/15] btrfs: look at full bi_io_vec for repair decision Omar Sandoval
2020-04-17 17:56   ` Johannes Thumshirn
2020-04-16 21:46 ` [PATCH v2 05/15] btrfs: don't do repair validation for checksum errors Omar Sandoval
2020-04-17 17:59   ` Johannes Thumshirn
2020-04-16 21:46 ` [PATCH v2 06/15] btrfs: clarify btrfs_lookup_bio_sums documentation Omar Sandoval
2020-04-17 18:01   ` Johannes Thumshirn
2020-04-21 11:17   ` Nikolay Borisov
2020-04-16 21:46 ` [PATCH v2 07/15] btrfs: rename __readpage_endio_check to check_data_csum Omar Sandoval
2020-04-16 21:46 ` [PATCH v2 08/15] btrfs: make btrfs_check_repairable() static Omar Sandoval
2020-04-16 21:46 ` [PATCH v2 09/15] btrfs: kill btrfs_dio_private->private Omar Sandoval
2020-04-17 18:02   ` Johannes Thumshirn
2020-04-16 21:46 ` [PATCH v2 10/15] btrfs: convert btrfs_dio_private->pending_bios to refcount_t Omar Sandoval
2020-04-17 18:03   ` Johannes Thumshirn
2020-04-16 21:46 ` [PATCH v2 11/15] btrfs: put direct I/O checksums in btrfs_dio_private instead of bio Omar Sandoval
2020-04-17 18:06   ` Johannes Thumshirn
2020-04-21 22:50   ` David Sterba
2020-04-16 21:46 ` [PATCH v2 12/15] btrfs: get rid of one layer of bios in direct I/O Omar Sandoval
2020-04-21 13:00   ` Nikolay Borisov
2020-04-21 23:11     ` David Sterba
2020-04-16 21:46 ` [PATCH v2 13/15] btrfs: simplify direct I/O read repair Omar Sandoval
2020-04-21 13:53   ` Nikolay Borisov
2020-04-21 14:40     ` Nikolay Borisov
2020-04-16 21:46 ` [PATCH v2 14/15] btrfs: get rid of endio_repair_workers Omar Sandoval
2020-04-16 21:46 ` [PATCH v2 15/15] btrfs: unify buffered and direct I/O read repair Omar Sandoval
2020-04-21 23:38   ` David Sterba
2020-04-17 11:03 ` [PATCH v2 00/15] btrfs: read repair/direct I/O improvements David Sterba
2020-04-21 23:46 ` David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.