* misc btrfs cleanups
@ 2022-05-22 11:47 Christoph Hellwig
  2022-05-22 11:47 ` [PATCH 1/8] btrfs: quit early if the fs has no RAID56 support for raid56 related checks Christoph Hellwig
                   ` (8 more replies)
  0 siblings, 9 replies; 36+ messages in thread
From: Christoph Hellwig @ 2022-05-22 11:47 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: Qu Wenruo, linux-btrfs

Hi all,

this series has a bunch of random cleanups and factored out helpers split
from the read repair series.

Diffstat:
 compression.c |   13 +----
 ctree.h       |   10 ++++
 extent_io.c   |  137 +++++++++++++++++++++++++++-------------------------------
 inode.c       |  107 ++++++++++++++++++++++-----------------------
 volumes.c     |    6 ++
 volumes.h     |   12 +++++
 6 files changed, 150 insertions(+), 135 deletions(-)


* [PATCH 1/8] btrfs: quit early if the fs has no RAID56 support for raid56 related checks
  2022-05-22 11:47 misc btrfs cleanups Christoph Hellwig
@ 2022-05-22 11:47 ` Christoph Hellwig
  2022-05-22 11:47 ` [PATCH 2/8] btrfs: introduce a pure data checksum checking helper Christoph Hellwig
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2022-05-22 11:47 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Qu Wenruo, linux-btrfs, Nikolay Borisov, Anand Jain, Johannes Thumshirn

From: Qu Wenruo <wqu@suse.com>

The following functions do special handling for RAID56 chunks:

- btrfs_is_parity_mirror()
  Checks if the range is in RAID56 chunks.

- btrfs_full_stripe_len()
  Either returns the sectorsize for non-RAID56 profiles or the full
  stripe length for RAID56 chunks.

But if a filesystem has no RAID56 chunks at all, it will not have the
RAID56 incompat flag set, and we can skip the chunk tree lookup
completely.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/volumes.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 58f3eece8a48c..0819db46dbc42 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5769,6 +5769,9 @@ unsigned long btrfs_full_stripe_len(struct btrfs_fs_info *fs_info,
 	struct map_lookup *map;
 	unsigned long len = fs_info->sectorsize;
 
+	if (!btrfs_fs_incompat(fs_info, RAID56))
+		return len;
+
 	em = btrfs_get_chunk_map(fs_info, logical, len);
 
 	if (!WARN_ON(IS_ERR(em))) {
@@ -5786,6 +5789,9 @@ int btrfs_is_parity_mirror(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
 	struct map_lookup *map;
 	int ret = 0;
 
+	if (!btrfs_fs_incompat(fs_info, RAID56))
+		return 0;
+
 	em = btrfs_get_chunk_map(fs_info, logical, len);
 
 	if(!WARN_ON(IS_ERR(em))) {
-- 
2.30.2



* [PATCH 2/8] btrfs: introduce a pure data checksum checking helper
  2022-05-22 11:47 misc btrfs cleanups Christoph Hellwig
  2022-05-22 11:47 ` [PATCH 1/8] btrfs: quit early if the fs has no RAID56 support for raid56 related checks Christoph Hellwig
@ 2022-05-22 11:47 ` Christoph Hellwig
  2022-05-23  0:38   ` Qu Wenruo
  2022-05-22 11:47 ` [PATCH 3/8] btrfs: remove duplicated parameters from submit_data_read_repair() Christoph Hellwig
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2022-05-22 11:47 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Qu Wenruo, linux-btrfs, Nikolay Borisov

From: Qu Wenruo <wqu@suse.com>

Although we have several pieces of data csum verification code, we never
had a function that just verifies the checksum of one sector.

Function check_data_csum() does extra work for error reporting, thus it
requires a lot of extra things like the file offset, bio_offset etc.

Function btrfs_verify_data_csum() is even worse, as it utilizes the page
Checked flag, which means it can not be used for direct IO pages.

Here we introduce a new helper, btrfs_check_sector_csum(), which only
accepts a sector inside a page and the expected checksum pointer.

We use this function to implement check_data_csum(), and export it for
an incoming patch.

Signed-off-by: Qu Wenruo <wqu@suse.com>
[hch: keep passing the csum array as an argument, as the callers want
      to print it, rename per request]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/compression.c | 13 ++++---------
 fs/btrfs/ctree.h       |  2 ++
 fs/btrfs/inode.c       | 38 ++++++++++++++++++++++++++++----------
 3 files changed, 34 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index f4564f32f6d93..6ab82e142f1f8 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -147,12 +147,10 @@ static int check_compressed_csum(struct btrfs_inode *inode, struct bio *bio,
 				 u64 disk_start)
 {
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
-	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
 	const u32 csum_size = fs_info->csum_size;
 	const u32 sectorsize = fs_info->sectorsize;
 	struct page *page;
 	unsigned int i;
-	char *kaddr;
 	u8 csum[BTRFS_CSUM_SIZE];
 	struct compressed_bio *cb = bio->bi_private;
 	u8 *cb_sum = cb->sums;
@@ -161,8 +159,6 @@ static int check_compressed_csum(struct btrfs_inode *inode, struct bio *bio,
 	    test_bit(BTRFS_FS_STATE_NO_CSUMS, &fs_info->fs_state))
 		return 0;
 
-	shash->tfm = fs_info->csum_shash;
-
 	for (i = 0; i < cb->nr_pages; i++) {
 		u32 pg_offset;
 		u32 bytes_left = PAGE_SIZE;
@@ -175,12 +171,11 @@ static int check_compressed_csum(struct btrfs_inode *inode, struct bio *bio,
 		/* Hash through the page sector by sector */
 		for (pg_offset = 0; pg_offset < bytes_left;
 		     pg_offset += sectorsize) {
-			kaddr = kmap_atomic(page);
-			crypto_shash_digest(shash, kaddr + pg_offset,
-					    sectorsize, csum);
-			kunmap_atomic(kaddr);
+			int ret;
 
-			if (memcmp(&csum, cb_sum, csum_size) != 0) {
+			ret = btrfs_check_sector_csum(fs_info, page, pg_offset,
+						      csum, cb_sum);
+			if (ret) {
 				btrfs_print_data_csum_error(inode, disk_start,
 						csum, cb_sum, cb->mirror_num);
 				if (btrfs_bio(bio)->device)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 0e49b1a0c0716..8dd7d36b83ecb 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3253,6 +3253,8 @@ u64 btrfs_file_extent_end(const struct btrfs_path *path);
 /* inode.c */
 void btrfs_submit_data_bio(struct inode *inode, struct bio *bio,
 			   int mirror_num, enum btrfs_compression_type compress_type);
+int btrfs_check_sector_csum(struct btrfs_fs_info *fs_info, struct page *page,
+			    u32 pgoff, u8 *csum, u8 *csum_expected);
 unsigned int btrfs_verify_data_csum(struct btrfs_bio *bbio,
 				    u32 bio_offset, struct page *page,
 				    u64 start, u64 end);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index da13bd0d10f12..e4acdec9ffc69 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3326,6 +3326,29 @@ void btrfs_writepage_endio_finish_ordered(struct btrfs_inode *inode,
 				       finish_ordered_fn, uptodate);
 }
 
+/*
+ * Verify the checksum for a single sector without any extra action that
+ * depend on the type of I/O.
+ */
+int btrfs_check_sector_csum(struct btrfs_fs_info *fs_info, struct page *page,
+			    u32 pgoff, u8 *csum, u8 *csum_expected)
+{
+	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
+	char *kaddr;
+
+	ASSERT(pgoff + fs_info->sectorsize <= PAGE_SIZE);
+
+	shash->tfm = fs_info->csum_shash;
+
+	kaddr = kmap_local_page(page) + pgoff;
+	crypto_shash_digest(shash, kaddr, fs_info->sectorsize, csum);
+	kunmap_local(kaddr);
+
+	if (memcmp(csum, csum_expected, fs_info->csum_size))
+		return -EIO;
+	return 0;
+}
+
 /*
  * check_data_csum - verify checksum of one sector of uncompressed data
  * @inode:	inode
@@ -3336,14 +3359,15 @@ void btrfs_writepage_endio_finish_ordered(struct btrfs_inode *inode,
  * @start:	logical offset in the file
  *
  * The length of such check is always one sector size.
+ *
+ * When csum mismatch detected, we will also report the error and fill the
+ * corrupted range with zero. (thus it needs the extra parameters)
  */
 static int check_data_csum(struct inode *inode, struct btrfs_bio *bbio,
 			   u32 bio_offset, struct page *page, u32 pgoff,
 			   u64 start)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
-	char *kaddr;
 	u32 len = fs_info->sectorsize;
 	const u32 csum_size = fs_info->csum_size;
 	unsigned int offset_sectors;
@@ -3355,16 +3379,10 @@ static int check_data_csum(struct inode *inode, struct btrfs_bio *bbio,
 	offset_sectors = bio_offset >> fs_info->sectorsize_bits;
 	csum_expected = ((u8 *)bbio->csum) + offset_sectors * csum_size;
 
-	kaddr = kmap_atomic(page);
-	shash->tfm = fs_info->csum_shash;
-
-	crypto_shash_digest(shash, kaddr + pgoff, len, csum);
-	kunmap_atomic(kaddr);
-
-	if (memcmp(csum, csum_expected, csum_size))
+	if (btrfs_check_sector_csum(fs_info, page, pgoff, csum, csum_expected))
 		goto zeroit;
-
 	return 0;
+
 zeroit:
 	btrfs_print_data_csum_error(BTRFS_I(inode), start, csum, csum_expected,
 				    bbio->mirror_num);
-- 
2.30.2
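
For reference, a minimal caller sketch of the new helper (the surrounding
variables here are hypothetical and not part of the patch; the pattern
matches the check_compressed_csum() conversion above):

	u8 csum[BTRFS_CSUM_SIZE];

	/*
	 * Hypothetical caller: verify the sector at @pgoff of @page against
	 * the expected checksum.  On mismatch the computed digest is left in
	 * @csum so it can be printed in the error report.
	 */
	if (btrfs_check_sector_csum(fs_info, page, pgoff, csum, csum_expected)) {
		btrfs_print_data_csum_error(BTRFS_I(inode), start, csum,
					    csum_expected, mirror_num);
		return -EIO;
	}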



* [PATCH 3/8] btrfs: remove duplicated parameters from submit_data_read_repair()
  2022-05-22 11:47 misc btrfs cleanups Christoph Hellwig
  2022-05-22 11:47 ` [PATCH 1/8] btrfs: quit early if the fs has no RAID56 support for raid56 related checks Christoph Hellwig
  2022-05-22 11:47 ` [PATCH 2/8] btrfs: introduce a pure data checksum checking helper Christoph Hellwig
@ 2022-05-22 11:47 ` Christoph Hellwig
  2022-05-22 11:47 ` [PATCH 4/8] btrfs: factor out a helper to end a single sector buffered I/O Christoph Hellwig
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2022-05-22 11:47 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Qu Wenruo, linux-btrfs, Nikolay Borisov, Johannes Thumshirn

From: Qu Wenruo <wqu@suse.com>

The function submit_data_read_repair() is only called for the buffered
data read path, thus those parameters can be calculated from the bvec
directly:

- start
  start = page_offset(bvec->bv_page) + bvec->bv_offset;

- end
  end = start + bvec->bv_len - 1;

- page
  page = bvec->bv_page;

- pgoff
  pgoff = bvec->bv_offset;

Thus we can safely replace those 4 parameters with just one bio_vec.

Also remove the unused return value.

Signed-off-by: Qu Wenruo <wqu@suse.com>
[hch: also remove the return value]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/extent_io.c | 28 +++++++++++-----------------
 1 file changed, 11 insertions(+), 17 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 588c7c606a2c6..dbec9be6daf9f 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2727,18 +2727,17 @@ static void end_page_read(struct page *page, bool uptodate, u64 start, u32 len)
 		btrfs_subpage_end_reader(fs_info, page, start, len);
 }
 
-static blk_status_t submit_data_read_repair(struct inode *inode,
-					    struct bio *failed_bio,
-					    u32 bio_offset, struct page *page,
-					    unsigned int pgoff,
-					    u64 start, u64 end,
-					    int failed_mirror,
-					    unsigned int error_bitmap)
+static void submit_data_read_repair(struct inode *inode, struct bio *failed_bio,
+		u32 bio_offset, const struct bio_vec *bvec, int failed_mirror,
+		unsigned int error_bitmap)
 {
+	const unsigned int pgoff = bvec->bv_offset;
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+	struct page *page = bvec->bv_page;
+	const u64 start = page_offset(bvec->bv_page) + bvec->bv_offset;
+	const u64 end = start + bvec->bv_len - 1;
 	const u32 sectorsize = fs_info->sectorsize;
 	const int nr_bits = (end + 1 - start) >> fs_info->sectorsize_bits;
-	int error = 0;
 	int i;
 
 	BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE);
@@ -2785,11 +2784,9 @@ static blk_status_t submit_data_read_repair(struct inode *inode,
 			continue;
 		}
 		/*
-		 * Repair failed, just record the error but still continue.
-		 * Or the remaining sectors will not be properly unlocked.
+		 * Continue on failed repair, otherwise the remaining sectors
+		 * will not be properly unlocked.
 		 */
-		if (!error)
-			error = ret;
 next:
 		end_page_read(page, uptodate, start + offset, sectorsize);
 		if (uptodate)
@@ -2802,7 +2799,6 @@ static blk_status_t submit_data_read_repair(struct inode *inode,
 				start + offset + sectorsize - 1,
 				&cached);
 	}
-	return errno_to_blk_status(error);
 }
 
 /* lots and lots of room for performance fixes in the end_bio funcs */
@@ -3093,10 +3089,8 @@ static void end_bio_extent_readpage(struct bio *bio)
 			 * submit_data_read_repair() will handle all the good
 			 * and bad sectors, we just continue to the next bvec.
 			 */
-			submit_data_read_repair(inode, bio, bio_offset, page,
-						start - page_offset(page),
-						start, end, mirror,
-						error_bitmap);
+			submit_data_read_repair(inode, bio, bio_offset, bvec,
+						mirror, error_bitmap);
 
 			ASSERT(bio_offset + len > bio_offset);
 			bio_offset += len;
-- 
2.30.2



* [PATCH 4/8] btrfs: factor out a helper to end a single sector buffered I/O
  2022-05-22 11:47 misc btrfs cleanups Christoph Hellwig
                   ` (2 preceding siblings ...)
  2022-05-22 11:47 ` [PATCH 3/8] btrfs: remove duplicated parameters from submit_data_read_repair() Christoph Hellwig
@ 2022-05-22 11:47 ` Christoph Hellwig
  2022-05-25 14:54   ` Nikolay Borisov
  2022-05-22 11:47 ` [PATCH 5/8] btrfs: refactor end_bio_extent_readpage Christoph Hellwig
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2022-05-22 11:47 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Qu Wenruo, linux-btrfs, Johannes Thumshirn

Add a helper to end I/O on a single sector, which will come in handy with
the new read repair code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index dbec9be6daf9f..e87474f5b9415 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2727,6 +2727,20 @@ static void end_page_read(struct page *page, bool uptodate, u64 start, u32 len)
 		btrfs_subpage_end_reader(fs_info, page, start, len);
 }
 
+static void end_sector_io(struct page *page, u64 offset, bool uptodate)
+{
+	struct inode *inode = page->mapping->host;
+	u32 sectorsize = btrfs_sb(inode->i_sb)->sectorsize;
+	struct extent_state *cached = NULL;
+
+	end_page_read(page, uptodate, offset, sectorsize);
+	if (uptodate)
+		set_extent_uptodate(&BTRFS_I(inode)->io_tree, offset,
+				offset + sectorsize - 1, &cached, GFP_ATOMIC);
+	unlock_extent_cached_atomic(&BTRFS_I(inode)->io_tree, offset,
+			offset + sectorsize - 1, &cached);
+}
+
 static void submit_data_read_repair(struct inode *inode, struct bio *failed_bio,
 		u32 bio_offset, const struct bio_vec *bvec, int failed_mirror,
 		unsigned int error_bitmap)
@@ -2757,7 +2771,6 @@ static void submit_data_read_repair(struct inode *inode, struct bio *failed_bio,
 	/* Iterate through all the sectors in the range */
 	for (i = 0; i < nr_bits; i++) {
 		const unsigned int offset = i * sectorsize;
-		struct extent_state *cached = NULL;
 		bool uptodate = false;
 		int ret;
 
@@ -2788,16 +2801,7 @@ static void submit_data_read_repair(struct inode *inode, struct bio *failed_bio,
 		 * will not be properly unlocked.
 		 */
 next:
-		end_page_read(page, uptodate, start + offset, sectorsize);
-		if (uptodate)
-			set_extent_uptodate(&BTRFS_I(inode)->io_tree,
-					start + offset,
-					start + offset + sectorsize - 1,
-					&cached, GFP_ATOMIC);
-		unlock_extent_cached_atomic(&BTRFS_I(inode)->io_tree,
-				start + offset,
-				start + offset + sectorsize - 1,
-				&cached);
+		end_sector_io(page, start + offset, uptodate);
 	}
 }
 
-- 
2.30.2



* [PATCH 5/8] btrfs: refactor end_bio_extent_readpage
  2022-05-22 11:47 misc btrfs cleanups Christoph Hellwig
                   ` (3 preceding siblings ...)
  2022-05-22 11:47 ` [PATCH 4/8] btrfs: factor out a helper to end a single sector buffered I/O Christoph Hellwig
@ 2022-05-22 11:47 ` Christoph Hellwig
  2022-05-25 14:57   ` Nikolay Borisov
  2022-05-22 11:47 ` [PATCH 6/8] btrfs: factor out a btrfs_csum_ptr helper Christoph Hellwig
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2022-05-22 11:47 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: Qu Wenruo, linux-btrfs

Untangle the goto mess and remove the pointless 'ret' local variable.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 87 +++++++++++++++++++++-----------------------
 1 file changed, 41 insertions(+), 46 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index e87474f5b9415..1d144f655f653 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3015,7 +3015,6 @@ static void end_bio_extent_readpage(struct bio *bio)
 	 */
 	u32 bio_offset = 0;
 	int mirror;
-	int ret;
 	struct bvec_iter_all iter_all;
 
 	ASSERT(!bio_flagged(bio, BIO_CLONED));
@@ -3026,6 +3025,7 @@ static void end_bio_extent_readpage(struct bio *bio)
 		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 		const u32 sectorsize = fs_info->sectorsize;
 		unsigned int error_bitmap = (unsigned int)-1;
+		bool repair = false;
 		u64 start;
 		u64 end;
 		u32 len;
@@ -3063,55 +3063,23 @@ static void end_bio_extent_readpage(struct bio *bio)
 			if (is_data_inode(inode)) {
 				error_bitmap = btrfs_verify_data_csum(bbio,
 						bio_offset, page, start, end);
-				ret = error_bitmap;
+				if (error_bitmap)
+					uptodate = false;
 			} else {
-				ret = btrfs_validate_metadata_buffer(bbio,
-					page, start, end, mirror);
+				if (btrfs_validate_metadata_buffer(bbio,
+						page, start, end, mirror))
+					uptodate = false;
 			}
-			if (ret)
-				uptodate = false;
-			else
-				clean_io_failure(BTRFS_I(inode)->root->fs_info,
-						 failure_tree, tree, start,
-						 page,
-						 btrfs_ino(BTRFS_I(inode)), 0);
 		}
 
-		if (likely(uptodate))
-			goto readpage_ok;
-
-		if (is_data_inode(inode)) {
-			/*
-			 * If we failed to submit the IO at all we'll have a
-			 * mirror_num == 0, in which case we need to just mark
-			 * the page with an error and unlock it and carry on.
-			 */
-			if (mirror == 0)
-				goto readpage_ok;
-
-			/*
-			 * submit_data_read_repair() will handle all the good
-			 * and bad sectors, we just continue to the next bvec.
-			 */
-			submit_data_read_repair(inode, bio, bio_offset, bvec,
-						mirror, error_bitmap);
-
-			ASSERT(bio_offset + len > bio_offset);
-			bio_offset += len;
-			continue;
-		} else {
-			struct extent_buffer *eb;
-
-			eb = find_extent_buffer_readpage(fs_info, page, start);
-			set_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags);
-			eb->read_mirror = mirror;
-			atomic_dec(&eb->io_pages);
-		}
-readpage_ok:
 		if (likely(uptodate)) {
 			loff_t i_size = i_size_read(inode);
 			pgoff_t end_index = i_size >> PAGE_SHIFT;
 
+			clean_io_failure(BTRFS_I(inode)->root->fs_info,
+					 failure_tree, tree, start, page,
+					 btrfs_ino(BTRFS_I(inode)), 0);
+
 			/*
 			 * Zero out the remaining part if this range straddles
 			 * i_size.
@@ -3128,14 +3096,41 @@ static void end_bio_extent_readpage(struct bio *bio)
 				zero_user_segment(page, zero_start,
 						  offset_in_page(end) + 1);
 			}
+		} else if (is_data_inode(inode)) {
+			/*
+			 * Only try to repair bios that actually made it to a
+			 * device.  If the bio failed to be submitted mirror
+			 * is 0 and we need to fail it without retrying.
+			 */
+			if (mirror > 0)
+				repair = true;
+		} else {
+			struct extent_buffer *eb;
+
+			eb = find_extent_buffer_readpage(fs_info, page, start);
+			set_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags);
+			eb->read_mirror = mirror;
+			atomic_dec(&eb->io_pages);
 		}
+
+		if (repair) {
+			/*
+			 * submit_data_read_repair() will handle all the good
+			 * and bad sectors, we just continue to the next bvec.
+			 */
+			submit_data_read_repair(inode, bio, bio_offset, bvec,
+						mirror, error_bitmap);
+		} else {
+			/* Update page status and unlock */
+			end_page_read(page, uptodate, start, len);
+			endio_readpage_release_extent(&processed,
+					BTRFS_I(inode), start, end,
+					PageUptodate(page));
+		}
+
 		ASSERT(bio_offset + len > bio_offset);
 		bio_offset += len;
 
-		/* Update page status and unlock */
-		end_page_read(page, uptodate, start, len);
-		endio_readpage_release_extent(&processed, BTRFS_I(inode),
-					      start, end, PageUptodate(page));
 	}
 	/* Release the last extent */
 	endio_readpage_release_extent(&processed, NULL, 0, 0, false);
-- 
2.30.2



* [PATCH 6/8] btrfs: factor out a btrfs_csum_ptr helper
  2022-05-22 11:47 misc btrfs cleanups Christoph Hellwig
                   ` (4 preceding siblings ...)
  2022-05-22 11:47 ` [PATCH 5/8] btrfs: refactor end_bio_extent_readpage Christoph Hellwig
@ 2022-05-22 11:47 ` Christoph Hellwig
  2022-05-25 14:32   ` Nikolay Borisov
  2022-05-22 11:47 ` [PATCH 7/8] btrfs: add a helper to iterate through a btrfs_bio with sector sized chunks Christoph Hellwig
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2022-05-22 11:47 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Qu Wenruo, linux-btrfs, Johannes Thumshirn

Add a helper to find the csum for a byte offset into the csum buffer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/ctree.h |  8 ++++++++
 fs/btrfs/inode.c | 13 +++----------
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8dd7d36b83ecb..d982ea62c521b 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2733,6 +2733,14 @@ int btrfs_get_extent_inline_ref_type(const struct extent_buffer *eb,
 				     enum btrfs_inline_ref_type is_data);
 u64 hash_extent_data_ref(u64 root_objectid, u64 owner, u64 offset);
 
+static inline u8 *btrfs_csum_ptr(struct btrfs_fs_info *fs_info, u8 *csums,
+		u64 offset)
+{
+	u64 offset_in_sectors = offset >> fs_info->sectorsize_bits;
+
+	return csums + offset_in_sectors * fs_info->csum_size;
+}
+
 /*
  * Take the number of bytes to be checksummmed and figure out how many leaves
  * it would require to store the csums for that many bytes.
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e4acdec9ffc69..095632977a798 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3369,15 +3369,12 @@ static int check_data_csum(struct inode *inode, struct btrfs_bio *bbio,
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	u32 len = fs_info->sectorsize;
-	const u32 csum_size = fs_info->csum_size;
-	unsigned int offset_sectors;
 	u8 *csum_expected;
 	u8 csum[BTRFS_CSUM_SIZE];
 
 	ASSERT(pgoff + len <= PAGE_SIZE);
 
-	offset_sectors = bio_offset >> fs_info->sectorsize_bits;
-	csum_expected = ((u8 *)bbio->csum) + offset_sectors * csum_size;
+	csum_expected = btrfs_csum_ptr(fs_info, bbio->csum, bio_offset);
 
 	if (btrfs_check_sector_csum(fs_info, page, pgoff, csum, csum_expected))
 		goto zeroit;
@@ -8007,12 +8004,8 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
 		if (ret)
 			goto err;
 	} else {
-		u64 csum_offset;
-
-		csum_offset = file_offset - dip->file_offset;
-		csum_offset >>= fs_info->sectorsize_bits;
-		csum_offset *= fs_info->csum_size;
-		btrfs_bio(bio)->csum = dip->csums + csum_offset;
+		btrfs_bio(bio)->csum = btrfs_csum_ptr(fs_info, dip->csums,
+				file_offset - dip->file_offset);
 	}
 map:
 	ret = btrfs_map_bio(fs_info, bio, 0);
-- 
2.30.2
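
For illustration, a worked example of the offset math in the new helper,
assuming the common 4 KiB sector size (sectorsize_bits == 12) and 4-byte
crc32c checksums:

	/*
	 * btrfs_csum_ptr(fs_info, csums, 8192):
	 *   offset_in_sectors = 8192 >> 12 = 2
	 *   return csums + 2 * 4  (= csums + 8)
	 * i.e. a pointer to the checksum of the third sector in the buffer.
	 */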



* [PATCH 7/8] btrfs: add a helper to iterate through a btrfs_bio with sector sized chunks
  2022-05-22 11:47 misc btrfs cleanups Christoph Hellwig
                   ` (5 preceding siblings ...)
  2022-05-22 11:47 ` [PATCH 6/8] btrfs: factor out a btrfs_csum_ptr helper Christoph Hellwig
@ 2022-05-22 11:47 ` Christoph Hellwig
  2022-05-26 12:58   ` Nikolay Borisov
  2022-05-22 11:47 ` [PATCH 8/8] btrfs: use btrfs_bio_for_each_sector in btrfs_check_read_dio_bio Christoph Hellwig
  2022-05-25 20:20 ` misc btrfs cleanups David Sterba
  8 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2022-05-22 11:47 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: Qu Wenruo, linux-btrfs

From: Qu Wenruo <wqu@suse.com>

Add a helper that works similarly to __bio_for_each_segment, but instead
of iterating over PAGE_SIZE chunks it iterates over each sector.

Signed-off-by: Qu Wenruo <wqu@suse.com>
[hch: split from a larger patch, and iterate over the offset instead of
      the offset bits]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/volumes.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 12b2af9260e92..6f784d4f54664 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -391,6 +391,18 @@ static inline void btrfs_bio_free_csum(struct btrfs_bio *bbio)
 	}
 }
 
+/*
+ * Iterate through a btrfs_bio (@bbio) on a per-sector basis.
+ */
+#define btrfs_bio_for_each_sector(fs_info, bvl, bbio, iter, bio_offset)	\
+	for ((iter) = (bbio)->iter, (bio_offset) = 0;			\
+	     (iter).bi_size &&					\
+	     (((bvl) = bio_iter_iovec((&(bbio)->bio), (iter))), 1);	\
+	     (bio_offset) += fs_info->sectorsize,			\
+	     bio_advance_iter_single(&(bbio)->bio, &(iter),		\
+	     (fs_info)->sectorsize))
+
+
 struct btrfs_io_stripe {
 	struct btrfs_device *dev;
 	u64 physical;
-- 
2.30.2
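
A minimal usage sketch of the new iterator (the loop body here is only
illustrative; patch 8 converts btrfs_check_read_dio_bio() to the real
thing):

	struct bvec_iter iter;
	struct bio_vec bv;
	u32 bio_offset;

	btrfs_bio_for_each_sector(fs_info, bv, bbio, iter, bio_offset) {
		/*
		 * Each iteration advances by one sector: bv.bv_page and
		 * bv.bv_offset locate the start of the current sector, and
		 * bio_offset is the byte offset of that sector from the
		 * start of the btrfs_bio.
		 */
		u64 start = bbio->file_offset + bio_offset;

		/* ... per-sector work such as checksum verification ... */
	}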



* [PATCH 8/8] btrfs: use btrfs_bio_for_each_sector in btrfs_check_read_dio_bio
  2022-05-22 11:47 misc btrfs cleanups Christoph Hellwig
                   ` (6 preceding siblings ...)
  2022-05-22 11:47 ` [PATCH 7/8] btrfs: add a helper to iterate through a btrfs_bio with sector sized chunks Christoph Hellwig
@ 2022-05-22 11:47 ` Christoph Hellwig
  2022-05-22 12:21   ` Qu Wenruo
  2022-05-26 12:58   ` Nikolay Borisov
  2022-05-25 20:20 ` misc btrfs cleanups David Sterba
  8 siblings, 2 replies; 36+ messages in thread
From: Christoph Hellwig @ 2022-05-22 11:47 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: Qu Wenruo, linux-btrfs

Use the new btrfs_bio_for_each_sector iterator to simplify
btrfs_check_read_dio_bio.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/inode.c | 56 +++++++++++++++++++-----------------------------
 1 file changed, 22 insertions(+), 34 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 095632977a798..34466b543ed97 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7886,47 +7886,35 @@ static blk_status_t btrfs_check_read_dio_bio(struct btrfs_dio_private *dip,
 {
 	struct inode *inode = dip->inode;
 	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
-	const u32 sectorsize = fs_info->sectorsize;
 	struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree;
 	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
 	const bool csum = !(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM);
-	struct bio_vec bvec;
-	struct bvec_iter iter;
-	u32 bio_offset = 0;
 	blk_status_t err = BLK_STS_OK;
+	struct bvec_iter iter;
+	struct bio_vec bv;
+	u32 offset;
+
+	btrfs_bio_for_each_sector(fs_info, bv, bbio, iter, offset) {
+		u64 start = bbio->file_offset + offset;
+
+		if (uptodate &&
+		    (!csum || !check_data_csum(inode, bbio, offset, bv.bv_page,
+				bv.bv_offset, start))) {
+			clean_io_failure(fs_info, failure_tree, io_tree, start,
+					 bv.bv_page, btrfs_ino(BTRFS_I(inode)),
+					 bv.bv_offset);
+		} else {
+			int ret;
 
-	__bio_for_each_segment(bvec, &bbio->bio, iter, bbio->iter) {
-		unsigned int i, nr_sectors, pgoff;
-
-		nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info, bvec.bv_len);
-		pgoff = bvec.bv_offset;
-		for (i = 0; i < nr_sectors; i++) {
-			u64 start = bbio->file_offset + bio_offset;
-
-			ASSERT(pgoff < PAGE_SIZE);
-			if (uptodate &&
-			    (!csum || !check_data_csum(inode, bbio,
-						       bio_offset, bvec.bv_page,
-						       pgoff, start))) {
-				clean_io_failure(fs_info, failure_tree, io_tree,
-						 start, bvec.bv_page,
-						 btrfs_ino(BTRFS_I(inode)),
-						 pgoff);
-			} else {
-				int ret;
-
-				ret = btrfs_repair_one_sector(inode, &bbio->bio,
-						bio_offset, bvec.bv_page, pgoff,
-						start, bbio->mirror_num,
-						submit_dio_repair_bio);
-				if (ret)
-					err = errno_to_blk_status(ret);
-			}
-			ASSERT(bio_offset + sectorsize > bio_offset);
-			bio_offset += sectorsize;
-			pgoff += sectorsize;
+			ret = btrfs_repair_one_sector(inode, &bbio->bio, offset,
+					bv.bv_page, bv.bv_offset, start,
+					bbio->mirror_num,
+					submit_dio_repair_bio);
+			if (ret)
+				err = errno_to_blk_status(ret);
 		}
 	}
+
 	return err;
 }
 
-- 
2.30.2



* Re: [PATCH 8/8] btrfs: use btrfs_bio_for_each_sector in btrfs_check_read_dio_bio
  2022-05-22 11:47 ` [PATCH 8/8] btrfs: use btrfs_bio_for_each_sector in btrfs_check_read_dio_bio Christoph Hellwig
@ 2022-05-22 12:21   ` Qu Wenruo
  2022-05-22 12:31     ` Christoph Hellwig
  2022-05-26 12:58   ` Nikolay Borisov
  1 sibling, 1 reply; 36+ messages in thread
From: Qu Wenruo @ 2022-05-22 12:21 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Qu Wenruo, linux-btrfs



On 2022/5/22 19:47, Christoph Hellwig wrote:
> Use the new btrfs_bio_for_each_sector iterator to simplify
> btrfs_check_read_dio_bio.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Qu Wenruo <wqu@suse.com>

In fact, in my version I also want to convert the buffered read endio to
use the helper, and get rid of the error bitmap thing.

But you're so fast sending out the patchset...

Thanks,
Qu
> ---
>   fs/btrfs/inode.c | 56 +++++++++++++++++++-----------------------------
>   1 file changed, 22 insertions(+), 34 deletions(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 095632977a798..34466b543ed97 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -7886,47 +7886,35 @@ static blk_status_t btrfs_check_read_dio_bio(struct btrfs_dio_private *dip,
>   {
>   	struct inode *inode = dip->inode;
>   	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
> -	const u32 sectorsize = fs_info->sectorsize;
>   	struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree;
>   	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
>   	const bool csum = !(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM);
> -	struct bio_vec bvec;
> -	struct bvec_iter iter;
> -	u32 bio_offset = 0;
>   	blk_status_t err = BLK_STS_OK;
> +	struct bvec_iter iter;
> +	struct bio_vec bv;
> +	u32 offset;
> +
> +	btrfs_bio_for_each_sector(fs_info, bv, bbio, iter, offset) {
> +		u64 start = bbio->file_offset + offset;
> +
> +		if (uptodate &&
> +		    (!csum || !check_data_csum(inode, bbio, offset, bv.bv_page,
> +				bv.bv_offset, start))) {
> +			clean_io_failure(fs_info, failure_tree, io_tree, start,
> +					 bv.bv_page, btrfs_ino(BTRFS_I(inode)),
> +					 bv.bv_offset);
> +		} else {
> +			int ret;
>
> -	__bio_for_each_segment(bvec, &bbio->bio, iter, bbio->iter) {
> -		unsigned int i, nr_sectors, pgoff;
> -
> -		nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info, bvec.bv_len);
> -		pgoff = bvec.bv_offset;
> -		for (i = 0; i < nr_sectors; i++) {
> -			u64 start = bbio->file_offset + bio_offset;
> -
> -			ASSERT(pgoff < PAGE_SIZE);
> -			if (uptodate &&
> -			    (!csum || !check_data_csum(inode, bbio,
> -						       bio_offset, bvec.bv_page,
> -						       pgoff, start))) {
> -				clean_io_failure(fs_info, failure_tree, io_tree,
> -						 start, bvec.bv_page,
> -						 btrfs_ino(BTRFS_I(inode)),
> -						 pgoff);
> -			} else {
> -				int ret;
> -
> -				ret = btrfs_repair_one_sector(inode, &bbio->bio,
> -						bio_offset, bvec.bv_page, pgoff,
> -						start, bbio->mirror_num,
> -						submit_dio_repair_bio);
> -				if (ret)
> -					err = errno_to_blk_status(ret);
> -			}
> -			ASSERT(bio_offset + sectorsize > bio_offset);
> -			bio_offset += sectorsize;
> -			pgoff += sectorsize;
> +			ret = btrfs_repair_one_sector(inode, &bbio->bio, offset,
> +					bv.bv_page, bv.bv_offset, start,
> +					bbio->mirror_num,
> +					submit_dio_repair_bio);
> +			if (ret)
> +				err = errno_to_blk_status(ret);
>   		}
>   	}
> +
>   	return err;
>   }
>


* Re: [PATCH 8/8] btrfs: use btrfs_bio_for_each_sector in btrfs_check_read_dio_bio
  2022-05-22 12:21   ` Qu Wenruo
@ 2022-05-22 12:31     ` Christoph Hellwig
  2022-05-22 12:38       ` Qu Wenruo
  0 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2022-05-22 12:31 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba,
	Qu Wenruo, linux-btrfs

On Sun, May 22, 2022 at 08:21:47PM +0800, Qu Wenruo wrote:
> In fact, in my version I also want to convert the buffered read endio to
> use the helper, and get rid of the error bitmap thing.

Yes, the buffered end I/O path could use some more love.  I also wonder
if splitting the data vs metadata case might be a good idea as well.  But
maybe we can finish the read repair series first and do that next?
'cause now a lot of the bio work will depend on the read repair, and
I don't want to block it on yet another series.


* Re: [PATCH 8/8] btrfs: use btrfs_bio_for_each_sector in btrfs_check_read_dio_bio
  2022-05-22 12:31     ` Christoph Hellwig
@ 2022-05-22 12:38       ` Qu Wenruo
  2022-05-22 12:53         ` Christoph Hellwig
  0 siblings, 1 reply; 36+ messages in thread
From: Qu Wenruo @ 2022-05-22 12:38 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chris Mason, Josef Bacik, David Sterba, Qu Wenruo, linux-btrfs



On 2022/5/22 20:31, Christoph Hellwig wrote:
> On Sun, May 22, 2022 at 08:21:47PM +0800, Qu Wenruo wrote:
>> In fact, in my version I also want to convert the buffered read endio to
>> use the helper, and get rid of the error bitmap thing.
>
> Yes, the buffered end I/O path could use some more love.  I also wonder
> if splitting the data vs metadata case might be good idea as well.  But
> maybe we can finish the read repair series first and do that next?

OK, that makes sense.

> 'cause now a lot of the bio works will depend on the read repair, and
> I don't want to block it on yet another series..

Although I believe we will have to spend more time on the read repair
code/functionality.

Especially since all our submitted versions have their own problems.

From the basic handling of checker-pattern corruption, to the trade-off
between memory allocation and performance for reads on corrupted data.

Thanks,
Qu


* Re: [PATCH 8/8] btrfs: use btrfs_bio_for_each_sector in btrfs_check_read_dio_bio
  2022-05-22 12:38       ` Qu Wenruo
@ 2022-05-22 12:53         ` Christoph Hellwig
  2022-05-23  0:07           ` Qu Wenruo
  0 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2022-05-22 12:53 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba,
	Qu Wenruo, linux-btrfs

On Sun, May 22, 2022 at 08:38:50PM +0800, Qu Wenruo wrote:
>> 'cause now a lot of the bio works will depend on the read repair, and
>> I don't want to block it on yet another series..
>
> Although I believe we will have to take more time on the read repair
> code/functionality.
>
> Especially all our submitted version have their own problems.
>
> From the basic handling of checker pattern corruption, to the trade-off
> between memory allocation and performance for read on corrupted data.

I've already pushed out a new version here:

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/btrfs-read_repair

so feel free to take a look.  I just don't want to spam the list with it
quite yet with this series outstanding.


* Re: [PATCH 8/8] btrfs: use btrfs_bio_for_each_sector in btrfs_check_read_dio_bio
  2022-05-22 12:53         ` Christoph Hellwig
@ 2022-05-23  0:07           ` Qu Wenruo
  2022-05-23  6:26             ` Christoph Hellwig
  0 siblings, 1 reply; 36+ messages in thread
From: Qu Wenruo @ 2022-05-23  0:07 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chris Mason, Josef Bacik, David Sterba, Qu Wenruo, linux-btrfs



On 2022/5/22 20:53, Christoph Hellwig wrote:
> On Sun, May 22, 2022 at 08:38:50PM +0800, Qu Wenruo wrote:
>>> 'cause now a lot of the bio works will depend on the read repair, and
>>> I don't want to block it on yet another series..
>>
>> Although I believe we will have to take more time on the read repair
>> code/functionality.
>>
>> Especially all our submitted version have their own problems.
>>
>>  From the basic handling of checker pattern corruption, to the trade-off
>> between memory allocation and performance for read on corrupted data.
>
> I've already pushed out a new version here:
>
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/btrfs-read_repair
>
> so feel free to take a look.  I just don't want to spam the list with it
> quite yet with this series outstanding.

I checked the code, but I still find the code in the patch "btrfs: add new
read repair infrastructure" not that intuitive.

- Why do we bother with different repair methods in btrfs_repair_one_mirror()?
   In fact btrfs_repair_io_failure() can handle all profiles.

   Then why do we go back to writing the whole bio?
   The only reason I can think of is that we're still trying to do some
   "optimization".

   But all our bio submission is already synchronous, so I doubt such
   "optimization" would make much difference.

- The bio truncation
   This really looks like a band-aid to address the checker-pattern
   corruption.
   I wonder why we don't just do per-sector read/write like:

+	/* Init read bio to contain that corrupted sector only */
+	for (i = get_next_mirror(init_mirror); i != init_mirror; i++) {
+		int ret;
+
+		ret = btrfs_map_bio_wait(inode, read_bio, i);
+		/* Failed to submit, try next mirror */
+		if (ret < 0)
+			continue;
+
+		/* Verify the checksum */
+		if (failed_bbio->csum)
+			ret = btrfs_check_data_sector(fs_info, page,
+					pgoff, repair_bbio->csum);
+		if (ret == 0) {
+			found_good = true;
+			btrfs_repair_io_failure();
+			break;
+		}
+	}
+	if (!found_good)
+		return -EIO;

To me, the "optimization" of batched read/write is only relevant if we
have tons of corrupted sectors in a read bio, which I don't believe is a
hot path in the real world anyway.

Thanks,
Qu


* Re: [PATCH 2/8] btrfs: introduce a pure data checksum checking helper
  2022-05-22 11:47 ` [PATCH 2/8] btrfs: introduce a pure data checksum checking helper Christoph Hellwig
@ 2022-05-23  0:38   ` Qu Wenruo
  2022-05-24  7:24     ` Christoph Hellwig
  0 siblings, 1 reply; 36+ messages in thread
From: Qu Wenruo @ 2022-05-23  0:38 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Qu Wenruo, linux-btrfs, Nikolay Borisov



On 2022/5/22 19:47, Christoph Hellwig wrote:
> From: Qu Wenruo <wqu@suse.com>
>
> Although we have several data csum verification code, we never have a
> function really just to verify checksum for one sector.
>
> Function check_data_csum() do extra work for error reporting, thus it
> requires a lot of extra things like file offset, bio_offset etc.
>
> Function btrfs_verify_data_csum() is even worse, it will utizlie page
> checked flag, which means it can not be utilized for direct IO pages.
>
> Here we introduce a new helper, btrfs_check_sector_csum(), which really
> only accept a sector in page, and expected checksum pointer.
>
> We use this function to implement check_data_csum(), and export it for
> incoming patch.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> [hch: keep passing the csum array as an arguments, as the callers want
>        to print it, rename per request]

Mind constifying the @csum_expected parameter?

Thanks,
Qu
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
> ---
>   fs/btrfs/compression.c | 13 ++++---------
>   fs/btrfs/ctree.h       |  2 ++
>   fs/btrfs/inode.c       | 38 ++++++++++++++++++++++++++++----------
>   3 files changed, 34 insertions(+), 19 deletions(-)
>
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index f4564f32f6d93..6ab82e142f1f8 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -147,12 +147,10 @@ static int check_compressed_csum(struct btrfs_inode *inode, struct bio *bio,
>   				 u64 disk_start)
>   {
>   	struct btrfs_fs_info *fs_info = inode->root->fs_info;
> -	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
>   	const u32 csum_size = fs_info->csum_size;
>   	const u32 sectorsize = fs_info->sectorsize;
>   	struct page *page;
>   	unsigned int i;
> -	char *kaddr;
>   	u8 csum[BTRFS_CSUM_SIZE];
>   	struct compressed_bio *cb = bio->bi_private;
>   	u8 *cb_sum = cb->sums;
> @@ -161,8 +159,6 @@ static int check_compressed_csum(struct btrfs_inode *inode, struct bio *bio,
>   	    test_bit(BTRFS_FS_STATE_NO_CSUMS, &fs_info->fs_state))
>   		return 0;
>
> -	shash->tfm = fs_info->csum_shash;
> -
>   	for (i = 0; i < cb->nr_pages; i++) {
>   		u32 pg_offset;
>   		u32 bytes_left = PAGE_SIZE;
> @@ -175,12 +171,11 @@ static int check_compressed_csum(struct btrfs_inode *inode, struct bio *bio,
>   		/* Hash through the page sector by sector */
>   		for (pg_offset = 0; pg_offset < bytes_left;
>   		     pg_offset += sectorsize) {
> -			kaddr = kmap_atomic(page);
> -			crypto_shash_digest(shash, kaddr + pg_offset,
> -					    sectorsize, csum);
> -			kunmap_atomic(kaddr);
> +			int ret;
>
> -			if (memcmp(&csum, cb_sum, csum_size) != 0) {
> +			ret = btrfs_check_sector_csum(fs_info, page, pg_offset,
> +						      csum, cb_sum);
> +			if (ret) {
>   				btrfs_print_data_csum_error(inode, disk_start,
>   						csum, cb_sum, cb->mirror_num);
>   				if (btrfs_bio(bio)->device)
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 0e49b1a0c0716..8dd7d36b83ecb 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -3253,6 +3253,8 @@ u64 btrfs_file_extent_end(const struct btrfs_path *path);
>   /* inode.c */
>   void btrfs_submit_data_bio(struct inode *inode, struct bio *bio,
>   			   int mirror_num, enum btrfs_compression_type compress_type);
> +int btrfs_check_sector_csum(struct btrfs_fs_info *fs_info, struct page *page,
> +			    u32 pgoff, u8 *csum, u8 *csum_expected);
>   unsigned int btrfs_verify_data_csum(struct btrfs_bio *bbio,
>   				    u32 bio_offset, struct page *page,
>   				    u64 start, u64 end);
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index da13bd0d10f12..e4acdec9ffc69 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -3326,6 +3326,29 @@ void btrfs_writepage_endio_finish_ordered(struct btrfs_inode *inode,
>   				       finish_ordered_fn, uptodate);
>   }
>
> +/*
> + * Verify the checksum for a single sector without any extra action that
> + * depend on the type of I/O.
> + */
> +int btrfs_check_sector_csum(struct btrfs_fs_info *fs_info, struct page *page,
> +			    u32 pgoff, u8 *csum, u8 *csum_expected)
> +{
> +	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
> +	char *kaddr;
> +
> +	ASSERT(pgoff + fs_info->sectorsize <= PAGE_SIZE);
> +
> +	shash->tfm = fs_info->csum_shash;
> +
> +	kaddr = kmap_local_page(page) + pgoff;
> +	crypto_shash_digest(shash, kaddr, fs_info->sectorsize, csum);
> +	kunmap_local(kaddr);
> +
> +	if (memcmp(csum, csum_expected, fs_info->csum_size))
> +		return -EIO;
> +	return 0;
> +}
> +
>   /*
>    * check_data_csum - verify checksum of one sector of uncompressed data
>    * @inode:	inode
> @@ -3336,14 +3359,15 @@ void btrfs_writepage_endio_finish_ordered(struct btrfs_inode *inode,
>    * @start:	logical offset in the file
>    *
>    * The length of such check is always one sector size.
> + *
> + * When csum mismatch detected, we will also report the error and fill the
> + * corrupted range with zero. (thus it needs the extra parameters)
>    */
>   static int check_data_csum(struct inode *inode, struct btrfs_bio *bbio,
>   			   u32 bio_offset, struct page *page, u32 pgoff,
>   			   u64 start)
>   {
>   	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> -	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
> -	char *kaddr;
>   	u32 len = fs_info->sectorsize;
>   	const u32 csum_size = fs_info->csum_size;
>   	unsigned int offset_sectors;
> @@ -3355,16 +3379,10 @@ static int check_data_csum(struct inode *inode, struct btrfs_bio *bbio,
>   	offset_sectors = bio_offset >> fs_info->sectorsize_bits;
>   	csum_expected = ((u8 *)bbio->csum) + offset_sectors * csum_size;
>
> -	kaddr = kmap_atomic(page);
> -	shash->tfm = fs_info->csum_shash;
> -
> -	crypto_shash_digest(shash, kaddr + pgoff, len, csum);
> -	kunmap_atomic(kaddr);
> -
> -	if (memcmp(csum, csum_expected, csum_size))
> +	if (btrfs_check_sector_csum(fs_info, page, pgoff, csum, csum_expected))
>   		goto zeroit;
> -
>   	return 0;
> +
>   zeroit:
>   	btrfs_print_data_csum_error(BTRFS_I(inode), start, csum, csum_expected,
>   				    bbio->mirror_num);


* Re: [PATCH 8/8] btrfs: use btrfs_bio_for_each_sector in btrfs_check_read_dio_bio
  2022-05-23  0:07           ` Qu Wenruo
@ 2022-05-23  6:26             ` Christoph Hellwig
  2022-05-23  7:46               ` Qu Wenruo
  0 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2022-05-23  6:26 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba,
	Qu Wenruo, linux-btrfs

On Mon, May 23, 2022 at 08:07:01AM +0800, Qu Wenruo wrote:
>
> I checked the code, but still find the code in patch "btrfs: add new
> read repair infrastructure" not that instinctive.
>
> - Why we bother different repair methods in btrfs_repair_one_mirror()?
>   In fact btrfs_repair_io_failure() can handle all profiles.

Because btrfs_repair_io_failure can't handle multiple-page I/O.  It
also is rather cumbersome because it bypasses the normal bio
mapping.  As a follow-on I'd rather move it over to btrfs_map_bio
with a special flag for the single mirror parity write rather than that
hack.

>   Then why we go back to write the whole bio?

Because the whole bio at this point is all the bad sectors.  There
is no point in writing only parts of the bio because that would leave
corruption on disk.

>   The only reason I can think of is, we're still trying to do some
>   "optimization".
>
>   But all our bio submission is already synchronous, I doubt such
>   "optimization" would make much difference.

Can you explain what you mean here?

>
> - The bio truncation
>   This really looks like a bandage to address the checker pattern
>   corruption.
>   I doubt why not just do per-sector read/write like:

Because now you wait for each I/O.

> To me, the "optimization" of batched read/write is only relevant if we
> have tons of corrupted sectors in a read bio, which I don't believe is a
> hot path in real world anyway.

It is very easy low-hanging fruit, and actually pretty common for
actual failures of components.  In other words: single sector failures
happen sometimes, usually due to corruption on the transfer wire.  If
actual corruption happens on the drive it will usually fail quite
big areas.  These are rather rare these days as a lot of devices shut
down before returning bad data, but if it happens you'll rarely see
just a single sector.


* Re: [PATCH 8/8] btrfs: use btrfs_bio_for_each_sector in btrfs_check_read_dio_bio
  2022-05-23  6:26             ` Christoph Hellwig
@ 2022-05-23  7:46               ` Qu Wenruo
  2022-05-24  7:32                 ` Christoph Hellwig
  0 siblings, 1 reply; 36+ messages in thread
From: Qu Wenruo @ 2022-05-23  7:46 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chris Mason, Josef Bacik, David Sterba, Qu Wenruo, linux-btrfs



On 2022/5/23 14:26, Christoph Hellwig wrote:
> On Mon, May 23, 2022 at 08:07:01AM +0800, Qu Wenruo wrote:
>>
>> I checked the code, but still find the code in patch "btrfs: add new
>> read repair infrastructure" not that instinctive.
>>
>> - Why we bother different repair methods in btrfs_repair_one_mirror()?
>>    In fact btrfs_repair_io_failure() can handle all profiles.
>
> Becasue btrfs_repair_io_failure can't handle multiple-page I/O.  It
> is also is rather cumbersome because it bypassed the normal bio
> mapping.  As a follow on I'd rather move it over to btrfs_map_bio
> with a special flag for the single mirror parity write rather than that
> hack.

In fact, so far for all callers of btrfs_repair_io_failure() we are
always handling things inside one stripe.

Thus we can easily enhance that function to handle multi-page ranges.

Although a dedicated btrfs_map_bio() flag seems more generic and better.
>
>>    Then why we go back to write the whole bio?
>
> Because the whole bio at this point is all the bad sectors.  There
> is no point in writing only parts of the bio because that would leave
> corruption on disk.
>
>>    The only reason I can think of is, we're still trying to do some
>>    "optimization".
>>
>>    But all our bio submission is already synchronous, I doubt such
>>    "optimization" would make much difference.
>
> Can you explain what you mean here?

We wait for the read bio anyway, so I doubt the batched write part is that
important.

If you really want, I can try to make the write part asynchronous, while
still keeping the read part synchronous and easier to read.

>
>>
>> - The bio truncation
>>    This really looks like a bandage to address the checker pattern
>>    corruption.
>>    I doubt why not just do per-sector read/write like:
>
> Because now you wait for each I/O.

Yeah, that's the biggest performance drop here, no doubt about that.

>
>> To me, the "optimization" of batched read/write is only relevant if we
>> have tons of corrupted sectors in a read bio, which I don't believe is a
>> hot path in real world anyway.
>
> It is is very easy low hanging fruit,

Unfortunately, this is not that simple a fruit, or I wouldn't have gone
the full bitmap way.

In your current version, the do {} while() loop iterates through all
mirrors.

But for the following case, we will hit problems thanks to RAID1C3 again:

Mirror 1 	|X|X|X|X|
Mirror 2	|X| |X| |
Mirror 3	| |X| |X|

We hit mirror 1 initially, thus @initial_mirror is 1.

Then when we try mirror 2, since the first sector is still bad, we jump
to the next mirror.

For mirror 3, we fix the first sector only. Then the 2nd sector is still
read from mirror 3 and doesn't pass.
Now we have no more mirrors, and still return -EIO.

What saves the day is the VFS read retry, which will retry the range
sector by sector, and gets every sector fixed.


Unfortunately we waste a lot of code to hack the bio size, but it still
doesn't work, and falls back to exactly the same sector-by-sector wait
behavior.

So my points still stand: if we want to do batched handling, either we
go with a bitmap or we give up.

Such a hacky band-aid seems to work at first glance and will pass your new
test cases, but it doesn't do any better than sector-by-sector waiting.
(Forgot to mention, the new RAID1C3 test case may also be flawed, as any
read on other mirrors will cause read-repair, screwing up our later
retry, thus we must check the pid first before doing any read.)

Thus it's a low hanging but rotten fruit.

Thanks,
Qu

> and actually pretty common for
> actual failures of components.  In other words:  single sector failures
> happen some times, usually due to corruption on the transfer wire.  If
> actual corruption happens on the driver it will usually fail quite
> bit areas.  These are rather rare these days as a lot of device shut
> down before returning bad data, but if it happens you'll rarely see
> just a single sector.


* Re: [PATCH 2/8] btrfs: introduce a pure data checksum checking helper
  2022-05-23  0:38   ` Qu Wenruo
@ 2022-05-24  7:24     ` Christoph Hellwig
  2022-05-24  8:07       ` Qu Wenruo
  0 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2022-05-24  7:24 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba,
	Qu Wenruo, linux-btrfs, Nikolay Borisov

On Mon, May 23, 2022 at 08:38:13AM +0800, Qu Wenruo wrote:
>
>
> On 2022/5/22 19:47, Christoph Hellwig wrote:
>> From: Qu Wenruo <wqu@suse.com>
>>
>> Although we have several data csum verification code, we never have a
>> function really just to verify checksum for one sector.
>>
>> Function check_data_csum() do extra work for error reporting, thus it
>> requires a lot of extra things like file offset, bio_offset etc.
>>
>> Function btrfs_verify_data_csum() is even worse, it will utizlie page
>> checked flag, which means it can not be utilized for direct IO pages.
>>
>> Here we introduce a new helper, btrfs_check_sector_csum(), which really
>> only accept a sector in page, and expected checksum pointer.
>>
>> We use this function to implement check_data_csum(), and export it for
>> incoming patch.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> [hch: keep passing the csum array as an arguments, as the callers want
>>        to print it, rename per request]
>
> Mind to constify the @csum_expected parameter?

This would be the incremental diff; if Dave cares deeply he can fold
it in:

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index d982ea62c521b..f01ce82af8ca9 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3262,7 +3262,7 @@ u64 btrfs_file_extent_end(const struct btrfs_path *path);
 void btrfs_submit_data_bio(struct inode *inode, struct bio *bio,
 			   int mirror_num, enum btrfs_compression_type compress_type);
 int btrfs_check_sector_csum(struct btrfs_fs_info *fs_info, struct page *page,
-			    u32 pgoff, u8 *csum, u8 *csum_expected);
+			    u32 pgoff, u8 *csum, u8 * const csum_expected);
 unsigned int btrfs_verify_data_csum(struct btrfs_bio *bbio,
 				    u32 bio_offset, struct page *page,
 				    u64 start, u64 end);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index bfc0b0035b03c..c344ed0e057ac 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3330,7 +3330,7 @@ void btrfs_writepage_endio_finish_ordered(struct btrfs_inode *inode,
  * depend on the type of I/O.
  */
 int btrfs_check_sector_csum(struct btrfs_fs_info *fs_info, struct page *page,
-			    u32 pgoff, u8 *csum, u8 *csum_expected)
+			    u32 pgoff, u8 *csum, u8 * const csum_expected)
 {
 	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
 	char *kaddr;


* Re: [PATCH 8/8] btrfs: use btrfs_bio_for_each_sector in btrfs_check_read_dio_bio
  2022-05-23  7:46               ` Qu Wenruo
@ 2022-05-24  7:32                 ` Christoph Hellwig
  2022-05-24  8:04                   ` Qu Wenruo
  0 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2022-05-24  7:32 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba,
	Qu Wenruo, linux-btrfs

On Mon, May 23, 2022 at 03:46:02PM +0800, Qu Wenruo wrote:
>> Becasue btrfs_repair_io_failure can't handle multiple-page I/O.  It
>> is also is rather cumbersome because it bypassed the normal bio
>> mapping.  As a follow on I'd rather move it over to btrfs_map_bio
>> with a special flag for the single mirror parity write rather than that
>> hack.
>
> In fact so far for all callers of btrfs_repair_io_failure(), we are
> always handling things inside one stripe.
>
> Thus we can easily enhance that function to handle multi page ranges.
>
> Although a dedicated btrfs_map_bio() flag seems more generic and better.

I did think of moving btrfs_repair_io_failure over to my new
infrastructure in fact, because it seems inherently possible.  Just
not the highest priority right now.

>> Because the whole bio at this point is all the bad sectors.  There
>> is no point in writing only parts of the bio because that would leave
>> corruption on disk.
>>
>>>    The only reason I can think of is, we're still trying to do some
>>>    "optimization".
>>>
>>>    But all our bio submission is already synchronous, I doubt such
>>>    "optimization" would make much difference.
>>
>> Can you explain what you mean here?
>
> We wait for the read bio anyway, I doubt the batched write part is that
> important.

I still don't understand the point.  Once we read more than a single
page, writing it back as a batch is completely trivial as shown by
this series.  Why would we not do it?

>
> If you really want, I can try to make the write part asynchronous, while
> still keeping the read part synchronous, and easier to read.

Asynchronous writes get us back into all the I/O completion handler
complexities, which was the whole reason to start on the synchronous
repair.

> In your current version, the do {} while() loop iterates through all
> mirrors.
>
> But for the following case, we will hit problems thanks to RAID1C3 again:
>
> Mirror 1 	|X|X|X|X|
> Mirror 2	|X| |X| |
> Mirror 3	| |X| |X|
>
> We hit mirror 1 initially, thus @initial_mirror is 1.
>
> Then when we try mirror 2, since the first sector is still bad, we jump
> to the next mirror.
>
> For mirror 3, we fixed the first sector only. Then the 2nd sector is still
> from mirror 3 and didn't pass.
> Now we have no more mirrors, and still return -EIO.

Can you share a test case?  The code resets initial_mirror as soon as
we make any progress, so that should not happen.
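
To make the intent concrete, the retry logic is roughly shaped like the
sketch below (simplified, all helper names made up, not the actual
patch):

	static int repair_range_sketch(u64 start, u32 len, int initial_mirror,
				       int num_copies, u32 sectorsize)
	{
		int mirror = next_mirror(initial_mirror, num_copies);

		while (len) {
			read_range_from_mirror(start, len, mirror);

			/* keep every leading sector that now passes its checksum */
			while (len && sector_checksum_ok(start)) {
				start += sectorsize;
				len -= sectorsize;
				/* progress was made: restart the accounting here */
				initial_mirror = mirror;
			}
			if (!len)
				break;		/* everything repaired */

			mirror = next_mirror(mirror, num_copies);
			if (mirror == initial_mirror)
				return -EIO;	/* full circle without progress */
		}
		return 0;
	}

A mirror that makes partial progress becomes the new reference point, so
-EIO is only returned after a full pass over all copies fixes nothing.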

> So my points still stand, if we want to do batched handling, either we
> go bitmap or we give up.

Why?  For the very common case of clustered corruption or entirely
failing reads it is significantly faster than a simple synchronous
read of each sector, and also much better than the existing code.
It also is a lot less code than the existing code base, and (maybe
I'm biased) a lot more readable.

Bitmaps only help you with randomly splattered corruption, which simply
is not how SSDs or hard drives actually fail.

> Such a hacky bandage seems to work at first glance and will pass your new
> test cases, but it doesn't do it any better than sector-by-sector waiting.
> (Forgot to mention, the new RAID1C3 test case may also be flawed, as any
> read on other mirrors will cause read-repair, screwing up our later
> retry, thus we must check pid first before doing any read.)

The updated version uses the read-from-mirror loop from btrfs/142,
which cleverly uses bash internals to not issue the read if it would
be done using the wrong mirror.  That also really nicely speeds up
the tests, including the existing 140 and 141 ones.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 8/8] btrfs: use btrfs_bio_for_each_sector in btrfs_check_read_dio_bio
  2022-05-24  7:32                 ` Christoph Hellwig
@ 2022-05-24  8:04                   ` Qu Wenruo
  2022-05-24  8:21                     ` Qu Wenruo
  0 siblings, 1 reply; 36+ messages in thread
From: Qu Wenruo @ 2022-05-24  8:04 UTC (permalink / raw)
  To: Christoph Hellwig, Qu Wenruo
  Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs



On 2022/5/24 15:32, Christoph Hellwig wrote:
> On Mon, May 23, 2022 at 03:46:02PM +0800, Qu Wenruo wrote:
>>> Because btrfs_repair_io_failure can't handle multiple-page I/O.  It
>>> is also rather cumbersome because it bypasses the normal bio
>>> mapping.  As a follow-on I'd rather move it over to btrfs_map_bio
>>> with a special flag for the single mirror parity write rather than that
>>> hack.
>>
>> In fact so far for all callers of btrfs_repair_io_failure(), we are
>> always handling things inside one stripe.
>>
>> Thus we can easily enhance that function to handle multi page ranges.
>>
>> Although a dedicated btrfs_map_bio() flag seems more generic and better.
>
> I did think of moving btrfs_repair_io_failure over to my new
> infrastructure in fact, because it seems inherently possible.  Just
> not the highest priority right now.
>
>>> Because the whole bio at this point is all the bad sectors.  There
>>> is no point in writing only parts of the bio because that would leave
>>> corruption on disk.
>>>
>>>>     The only reason I can think of is, we're still trying to do some
>>>>     "optimization".
>>>>
>>>>     But all our bio submission is already synchronous, I doubt such
>>>>     "optimization" would make much difference.
>>>
>>> Can you explain what you mean here?
>>
>> We wait for the read bio anyway, I doubt the batched write part is that
>> important.
>
> I still don't understand the point.  Once we read more than a single
> page, writing it back as a batch is completely trivial as shown by
> this series.  Why would we not do it?
>
>>
>> If you really want, I can try to make the write part asynchronous, while
>> still keeping the read part synchronous, and easier to read.
>
> Asynchronous writes get us back into all the I/O completion handler
> complexities, which was the whole reason to start on the synchronous
> repair.
>
>> In your current version, the do {} while() loop iterates through all
>> mirrors.
>>
>> But for the following case, we will hit problems thanks to RAID1C3 again:
>>
>> Mirror 1 	|X|X|X|X|
>> Mirror 2	|X| |X| |
>> Mirror 3	| |X| |X|
>>
>> We hit mirror 1 initially, thus @initial_mirror is 1.
>>
>> Then when we try mirror 2, since the first sector is still bad, we jump
>> to the next mirror.
>>
>> For mirror 3, we fixed the first sector only. Then the 2nd sector is still
>> from mirror 3 and didn't pass.
>> Now we have no more mirrors, and still return -EIO.
>
> Can you share a test case?

Unfortunately no real test case can work here.

The problem is, the VFS will try to re-read with a smaller block size.
In that case, it falls back to sector-by-sector repair, thus even if we
have some policy terribly wrong, as long as the sector-by-sector
behavior is fine, it will be hidden.

That's why when I did my bitmap version, I had to add extra trace events
to make sure it's not the retry, but really my read-repair code doing
the correct repair, without triggering the re-read.

>  The code resets initial_mirror as soon as
> we make any progress, so that should not happen.

Oh, didn't see that part.

And in that case, yes, it will work for the checker pattern, although
not really any more efficient.

We will do 4 different reads to fix it, no better than sector-by-sector
repair.
And worse than 2 reads from the bitmap version.

But I get your point, it can handle continuous corruption better than
sector-by-sector, and no worse than bitmap version in that case.

>
>> So my points still stand, if we want to do batched handling, either we
>> go bitmap or we give up.
>
> Why?  For the very common case of clustered corruption or entirely
> failing reads it is significantly faster than a simple synchronous
> read of each sector, and also much better than the existing code.
> It also is a lot less code than the existing code base, and (maybe
> I'm biased) a lot more readable.

The problem here is, yes, you can pick the common case one, but it comes
with the burden of worst cases too.

And for the readable part, I strongly doubt.

The things like resetting initial_mirror, making the naming "initial"
meaningless.
And the reset on the length part is also very quirky.

Yes, my bitmap version is super complex, that's no doubt, but the idea
and code should be very straightforward.

Just loop all mirrors until the bad bitmap is all zero. No need to reset
the length or whatever halfway, bitmap and mirror number is the only
iterator.

And even for the bitmap preallocation failure part, we have VFS to bail
us out.
And code wise, it's not that much simpler, if you ignore the bitmap
pre-allocation part...

And for the ultimate readability, the sector-by-sector method can not be
beaten.
Thus I'm not a particular fan of any middle ground here.

>
> Bitmaps only help you with randomly splattered corruption, which simply
> is not how SSDs or hard drives actually fail.

But that's the case you have to take into consideration.

Even in cases where a real-world SSD corrupts a big range of data, we
can still submit a read that crosses the corruption boundary.

>
>> Such a hacky bandage seems to work at first glance and will pass your new
>> test cases, but it doesn't do it any better than sector-by-sector waiting.
>> (Forgot to mention, the new RAID1C3 test case may also be flawed, as any
>> read on other mirrors will cause read-repair, screwing up our later
>> retry, thus we must check pid first before doing any read.)
>
> The updated version uses the read-from-mirror loop from btrfs/142,
> which cleverly uses bash internals to not issue the read if it would
> be done using the wrong mirror.  That also really nicely speeds up
> the tests, including the existing 140 and 141 ones.
>
That's wonderful.

However the smaller problem is still there: we have no way to know if
it's the read-repair itself doing its part correctly, or the VFS retry
saving our day.

But yeah, btrfs/142 method is much better and should be backported to
btrfs/140 and btrfs/141.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/8] btrfs: introduce a pure data checksum checking helper
  2022-05-24  7:24     ` Christoph Hellwig
@ 2022-05-24  8:07       ` Qu Wenruo
  2022-05-24  8:11         ` Christoph Hellwig
  0 siblings, 1 reply; 36+ messages in thread
From: Qu Wenruo @ 2022-05-24  8:07 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chris Mason, Josef Bacik, David Sterba, Qu Wenruo, linux-btrfs,
	Nikolay Borisov



On 2022/5/24 15:24, Christoph Hellwig wrote:
> On Mon, May 23, 2022 at 08:38:13AM +0800, Qu Wenruo wrote:
>>
>>
>> On 2022/5/22 19:47, Christoph Hellwig wrote:
>>> From: Qu Wenruo <wqu@suse.com>
>>>
>>> Although we have several data csum verification code paths, we never
>>> had a function that just verifies the checksum of one sector.
>>>
>>> Function check_data_csum() does extra work for error reporting, thus it
>>> requires a lot of extra things like file offset, bio_offset etc.
>>>
>>> Function btrfs_verify_data_csum() is even worse, it will utilize the page
>>> checked flag, which means it can not be used for direct IO pages.
>>>
>>> Here we introduce a new helper, btrfs_check_sector_csum(), which only
>>> accepts a sector in a page and an expected checksum pointer.
>>>
>>> We use this function to implement check_data_csum(), and export it for
>>> an incoming patch.
>>>
>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>> [hch: keep passing the csum array as an argument, as the callers want
>>>         to print it, rename per request]
>>
>> Would you mind constifying the @csum_expected parameter?
>
> This would be the incremental diff, if Dave cares deeply he can fold
> it in:
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index d982ea62c521b..f01ce82af8ca9 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -3262,7 +3262,7 @@ u64 btrfs_file_extent_end(const struct btrfs_path *path);
>   void btrfs_submit_data_bio(struct inode *inode, struct bio *bio,
>   			   int mirror_num, enum btrfs_compression_type compress_type);
>   int btrfs_check_sector_csum(struct btrfs_fs_info *fs_info, struct page *page,
> -			    u32 pgoff, u8 *csum, u8 *csum_expected);
> +			    u32 pgoff, u8 *csum, u8 * const csum_expected);

Shouldn't it be "const u8 *" instead?

Anyway, normally David will handle it manually if needed.

I'm talking about this one just because my version is passing
"const u8 *" for its csum_expected pointer, which caused a warning here.

Thanks,
Qu

>   unsigned int btrfs_verify_data_csum(struct btrfs_bio *bbio,
>   				    u32 bio_offset, struct page *page,
>   				    u64 start, u64 end);
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index bfc0b0035b03c..c344ed0e057ac 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -3330,7 +3330,7 @@ void btrfs_writepage_endio_finish_ordered(struct btrfs_inode *inode,
>    * depend on the type of I/O.
>    */
>   int btrfs_check_sector_csum(struct btrfs_fs_info *fs_info, struct page *page,
> -			    u32 pgoff, u8 *csum, u8 *csum_expected)
> +			    u32 pgoff, u8 *csum, u8 * const csum_expected)
>   {
>   	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
>   	char *kaddr;

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/8] btrfs: introduce a pure data checksum checking helper
  2022-05-24  8:07       ` Qu Wenruo
@ 2022-05-24  8:11         ` Christoph Hellwig
  2022-05-25 16:20           ` David Sterba
  0 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2022-05-24  8:11 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba,
	Qu Wenruo, linux-btrfs, Nikolay Borisov

On Tue, May 24, 2022 at 04:07:41PM +0800, Qu Wenruo wrote:
>> -			    u32 pgoff, u8 *csum, u8 *csum_expected);
>> +			    u32 pgoff, u8 *csum, u8 * const csum_expected);
>
> Shouldn't it be "const u8 *" instead?

No, that would bind to the pointer.  But we care about the data.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 8/8] btrfs: use btrfs_bio_for_each_sector in btrfs_check_read_dio_bio
  2022-05-24  8:04                   ` Qu Wenruo
@ 2022-05-24  8:21                     ` Qu Wenruo
  2022-05-24 12:08                       ` Christoph Hellwig
  0 siblings, 1 reply; 36+ messages in thread
From: Qu Wenruo @ 2022-05-24  8:21 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs



On 2022/5/24 16:04, Qu Wenruo wrote:
>
>
> On 2022/5/24 15:32, Christoph Hellwig wrote:
>> On Mon, May 23, 2022 at 03:46:02PM +0800, Qu Wenruo wrote:
>>>> Because btrfs_repair_io_failure can't handle multiple-page I/O.  It
>>>> is also rather cumbersome because it bypasses the normal bio
>>>> mapping.  As a follow-on I'd rather move it over to btrfs_map_bio
>>>> with a special flag for the single mirror parity write rather than that
>>>> hack.
>>>
>>> In fact so far for all callers of btrfs_repair_io_failure(), we are
>>> always handling things inside one stripe.
>>>
>>> Thus we can easily enhance that function to handle multi page ranges.
>>>
>>> Although a dedicated btrfs_map_bio() flag seems more generic and
>>> better.
>>
>> I did think of moving btrfs_repair_io_failure over to my new
>> infrastructure in fact, because it seems inherently possible.  Just
>> not the highest priority right now.
>>
>>>> Because the whole bio at this point is all the bad sectors.  There
>>>> is no point in writing only parts of the bio because that would leave
>>>> corruption on disk.
>>>>
>>>>>     The only reason I can think of is, we're still trying to do some
>>>>>     "optimization".
>>>>>
>>>>>     But all our bio submission is already synchronous, I doubt such
>>>>>     "optimization" would make much difference.
>>>>
>>>> Can you explain what you mean here?
>>>
>>> We wait for the read bio anyway, I doubt the batched write part is that
>>> important.
>>
>> I still don't understand the point.  Once we read more than a single
>> page, writing it back as a batch is completely trivial as shown by
>> this series.  Why would we not do it?
>>
>>>
>>> If you really want, I can try to make the write part asynchronous, while
>>> still keeping the read part synchronous, and easier to read.
>>
>> Asynchronous writes get us back into all the I/O completion handler
>> complexities, which was the whole reason to start on the synchronous
>> repair.
>>
>>> In your current version, the do {} while() loop iterates through all
>>> mirrors.
>>>
>>> But for the following case, we will hit problems thanks to RAID1C3
>>> again:
>>>
>>> Mirror 1     |X|X|X|X|
>>> Mirror 2    |X| |X| |
>>> Mirror 3    | |X| |X|
>>>
>>> We hit mirror 1 initially, thus @initial_mirror is 1.
>>>
>>> Then when we try mirror 2, since the first sector is still bad, we jump
>>> to the next mirror.
>>>
>>> For mirror 3, we fixed the first sector only. Then the 2nd sector is still
>>> from mirror 3 and didn't pass.
>>> Now we have no more mirrors, and still return -EIO.
>>
>> Can you share a test case?
>
> Unfortunately no real test case can work here.
>
> The problem is, the VFS will try to re-read with a smaller block size.
> In that case, it falls back to sector-by-sector repair, thus even if we
> have some policy terribly wrong, as long as the sector-by-sector
> behavior is fine, it will be hidden.
>
> That's why when I did my bitmap version, I had to add extra trace events
> to make sure it's not the retry, but really my read-repair code doing
> the correct repair, without triggering the re-read.
>
>>  The code resets initial_mirror as soon as
>> we make any progress, so that should not happen.
>
> Oh, didn't see that part.
>
> And in that case, yes, it will work for the checker pattern, although
> not really any more efficient.
>
> We will do 4 different reads to fix it, no better than sector-by-sector
> repair.
> And worse than 2 reads from the bitmap version.
>
> But I get your point, it can handle continuous corruption better than
> sector-by-sector, and no worse than bitmap version in that case.
>
>>
>>> So my points still stand, if we want to do batched handling, either we
>>> go bitmap or we give up.
>>
>> Why?  For the very common case of clustered corruption or entirely
>> failing reads it is significantly faster than a simple synchronous
>> read of each sector, and also much better than the existing code.
>> It also is a lot less code than the existing code base, and (maybe
>> I'm biased) a lot more readable.
>
> The problem here is, yes, you can pick the common case one, but it comes
> with the burden of worst cases too.
>
> And for the readable part, I strongly doubt.
>
> The things like resetting initial_mirror, making the naming "initial"
> meaningless.
> And the reset on the length part is also very quirky.

In fact, if you didn't do the initial_mirror and length change (which is
a big disaster of readability, to change iterator in a loop, at least to
me), and rely on the VFS re-read behavior to fall back to sector by
sector read, I would call it better readability...

Thanks,
Qu

>
> Yes, my bitmap version is super complex, that's no doubt, but the idea
> and code should be very straightforward.
>
> Just loop all mirrors until the bad bitmap is all zero. No need to reset
> the length or whatever halfway, bitmap and mirror number is the only
> iterator.
>
> And even for the bitmap preallocation failure part, we have VFS to bail
> us out.
> And code wise, it's not that much simpler, if you ignore the bitmap
> pre-allocation part...
>
> And for the ultimate readability, the sector-by-sector method can not be
> beaten.
> Thus I'm not a particular fan of any middle ground here.
>
>>
>> Bitmaps only help you with randomly splattered corruption, which simply
>> is not how SSDs or hard drives actually fail.
>
> But that's the case you have to take into consideration.
>
> Even in cases where a real-world SSD corrupts a big range of data, we
> can still submit a read that crosses the corruption boundary.
>
>>
>>> Such a hacky bandage seems to work at first glance and will pass your new
>>> test cases, but it doesn't do it any better than sector-by-sector
>>> waiting.
>>> (Forgot to mention, the new RAID1C3 test case may also be flawed, as any
>>> read on other mirrors will cause read-repair, screwing up our later
>>> retry, thus we must check pid first before doing any read.)
>>
>> The updated version uses the read-from-mirror loop from btrfs/142,
>> which cleverly uses bash internals to not issue the read if it would
>> be done using the wrong mirror.  That also really nicely speeds up
>> the tests, including the existing 140 and 141 ones.
>>
> That's wonderful.
>
> However the smaller problem is still there: we have no way to know if
> it's the read-repair itself doing its part correctly, or the VFS retry
> saving our day.
>
> But yeah, btrfs/142 method is much better and should be backported to
> btrfs/140 and btrfs/141.
>
> Thanks,
> Qu

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 8/8] btrfs: use btrfs_bio_for_each_sector in btrfs_check_read_dio_bio
  2022-05-24  8:21                     ` Qu Wenruo
@ 2022-05-24 12:08                       ` Christoph Hellwig
  2022-05-24 13:13                         ` Qu Wenruo
  0 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2022-05-24 12:08 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Tue, May 24, 2022 at 04:21:38PM +0800, Qu Wenruo wrote:
>> The things like resetting initial_mirror, making the naming "initial"
>> meaningless.
>> And the reset on the length part is also very quirky.
>
> In fact, if you didn't do the initial_mirror and length change (which is
> a big disaster of readability, to change iterator in a loop, at least to
> me),

So what is the big problem there?  Do I need more extensive documentation
or is there anything in this concept that is just too confusing?

> and rely on the VFS re-read behavior to fall back to sector by
> sector read, I would call it better readability...

I don't think relying on undocumented VFS behavior is a good idea.  It
will also not work for direct I/O if any single direct I/O bio has
ever more than a single page, which is something btrfs badly needs
if it wants to get any kind of performance out of direct I/O.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 8/8] btrfs: use btrfs_bio_for_each_sector in btrfs_check_read_dio_bio
  2022-05-24 12:08                       ` Christoph Hellwig
@ 2022-05-24 13:13                         ` Qu Wenruo
  2022-05-24 14:02                           ` Qu Wenruo
  2022-05-25 15:12                           ` Nikolay Borisov
  0 siblings, 2 replies; 36+ messages in thread
From: Qu Wenruo @ 2022-05-24 13:13 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs



On 2022/5/24 20:08, Christoph Hellwig wrote:
> On Tue, May 24, 2022 at 04:21:38PM +0800, Qu Wenruo wrote:
>>> The things like resetting initial_mirror, making the naming "initial"
>>> meaningless.
>>> And the reset on the length part is also very quirky.
>>
>> In fact, if you didn't do the initial_mirror and length change (which is
>> a big disaster of readability, to change iterator in a loop, at least to
>> me),
>
> So what is the big problem there?  Do I need more extensive documentation
> or is there anything in this concept that is just too confusing?

Modifying the iterator inside a loop is the biggest problem to me.

Yes, extra comments can help, but that doesn't help the readability.

And that's why most of the guys here prefer a for () loop if we can.

Thanks,
Qu
>
>> and rely on the VFS re-read behavior to fall back to sector by
>> sector read, I would call it better readability...
>
> I don't think relying on undocumented VFS behavior is a good idea.  It
> will also not work for direct I/O if any single direct I/O bio has
> ever more than a single page, which is something btrfs badly needs
> if it wants to get any kind of performance out of direct I/O.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 8/8] btrfs: use btrfs_bio_for_each_sector in btrfs_check_read_dio_bio
  2022-05-24 13:13                         ` Qu Wenruo
@ 2022-05-24 14:02                           ` Qu Wenruo
  2022-05-25 15:12                           ` Nikolay Borisov
  1 sibling, 0 replies; 36+ messages in thread
From: Qu Wenruo @ 2022-05-24 14:02 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs



On 2022/5/24 21:13, Qu Wenruo wrote:
>
>
> On 2022/5/24 20:08, Christoph Hellwig wrote:
>> On Tue, May 24, 2022 at 04:21:38PM +0800, Qu Wenruo wrote:
>>>> The things like resetting initial_mirror, making the naming "initial"
>>>> meaningless.
>>>> And the reset on the length part is also very quirky.
>>>
>>> In fact, if you didn't do the initial_mirror and length change (which is
>>> a big disaster of readability, to change iterator in a loop, at least to
>>> me),
>>
>> So what is the big problem there?  Do I need more extensive documentation
>> or is there anything in this concept that is just too confusing?
>
> Modifying the iterator inside a loop is the biggest problem to me.
>
> Yes, extra comments can help, but that doesn't help the readability.
>
> And that's why most of the guys here prefer a for () loop if we can.

Just allow me to make another attempt at this.

This time, still bitmap based, but with no pre-/runtime allocation.
Just use an on-stack structure to contain a u32 bitmap.

And we no longer use the bitmap for the whole bio, but only from the
first corrupted sector to the next good sector.

If the bitmap is not large enough to contain the next sector, we just
finish the current batch.

Now we should have all the best things: batched submission, no memory
allocation, and no modification of the iterator inside a loop.

Although the cost is that we are still doing it the bitmap way, and we
have more call sites of btrfs_read_repair_finish().
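
Roughly the shape I have in mind, just to illustrate (all names made up,
not a real patch):

	/* On-stack state for one batch of corrupted sectors. */
	struct read_repair_batch {
		u64 start;		/* logical address of the first bad sector */
		u32 bad_bitmap;		/* bit n set: sector n of the window is bad */
		int nr_sectors;		/* sectors covered by the window so far */
	};

	/* Returns false once the window is full and the batch must be finished. */
	static bool batch_add_sector(struct read_repair_batch *batch, u64 logical,
				     bool bad, u32 sectorsize_bits)
	{
		int nr = (logical - batch->start) >> sectorsize_bits;

		if (nr >= BITS_PER_TYPE(u32))
			return false;
		if (bad)
			batch->bad_bitmap |= 1U << nr;
		batch->nr_sectors = nr + 1;
		return true;
	}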

Thanks,
Qu
>
> Thanks,
> Qu
>>
>>> and rely on the VFS re-read behavior to fall back to sector by
>>> sector read, I would call it better readability...
>>
>> I don't think relying on undocumented VFS behavior is a good idea.  It
>> will also not work for direct I/O if any single direct I/O bio has
>> ever more than a single page, which is something btrfs badly needs
>> if it wants to get any kind of performance out of direct I/O.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 6/8] btrfs: factor out a btrfs_csum_ptr helper
  2022-05-22 11:47 ` [PATCH 6/8] btrfs: factor out a btrfs_csum_ptr helper Christoph Hellwig
@ 2022-05-25 14:32   ` Nikolay Borisov
  0 siblings, 0 replies; 36+ messages in thread
From: Nikolay Borisov @ 2022-05-25 14:32 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Qu Wenruo, linux-btrfs, Johannes Thumshirn



On 22.05.22 г. 14:47 ч., Christoph Hellwig wrote:
> Add a helper to find the csum for a byte offset into the csum buffer.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
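
The helper boils down to simple pointer arithmetic; roughly (a sketch
with assumed field names, not the exact patch):

	static inline u8 *csum_ptr_sketch(const struct btrfs_fs_info *fs_info,
					  u8 *csums, u64 offset)
	{
		/* one csum_size entry per sector covered by the bio */
		return csums + (offset >> fs_info->sectorsize_bits) *
			       fs_info->csum_size;
	}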

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 4/8] btrfs: factor out a helper to end a single sector buffere I/O
  2022-05-22 11:47 ` [PATCH 4/8] btrfs: factor out a helper to end a single sector buffere I/O Christoph Hellwig
@ 2022-05-25 14:54   ` Nikolay Borisov
  0 siblings, 0 replies; 36+ messages in thread
From: Nikolay Borisov @ 2022-05-25 14:54 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Qu Wenruo, linux-btrfs, Johannes Thumshirn



On 22.05.22 г. 14:47 ч., Christoph Hellwig wrote:
> Add a helper to end I/O on a single sector, which will come in handy with
> the new read repair code.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> Reviewed-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 5/8] btrfs: refactor end_bio_extent_readpage
  2022-05-22 11:47 ` [PATCH 5/8] btrfs: refactor end_bio_extent_readpage Christoph Hellwig
@ 2022-05-25 14:57   ` Nikolay Borisov
  0 siblings, 0 replies; 36+ messages in thread
From: Nikolay Borisov @ 2022-05-25 14:57 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Qu Wenruo, linux-btrfs



On 22.05.22 г. 14:47 ч., Christoph Hellwig wrote:
> Untangle the goto mess and remove the pointless 'ret' local variable.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 8/8] btrfs: use btrfs_bio_for_each_sector in btrfs_check_read_dio_bio
  2022-05-24 13:13                         ` Qu Wenruo
  2022-05-24 14:02                           ` Qu Wenruo
@ 2022-05-25 15:12                           ` Nikolay Borisov
  2022-05-25 22:46                             ` Qu Wenruo
  1 sibling, 1 reply; 36+ messages in thread
From: Nikolay Borisov @ 2022-05-25 15:12 UTC (permalink / raw)
  To: Qu Wenruo, Christoph Hellwig
  Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs



On 24.05.22 г. 16:13 ч., Qu Wenruo wrote:
> 
> 
> On 2022/5/24 20:08, Christoph Hellwig wrote:
>> On Tue, May 24, 2022 at 04:21:38PM +0800, Qu Wenruo wrote:
>>>> The things like resetting initial_mirror, making the naming "initial"
>>>> meaningless.
>>>> And the reset on the length part is also very quirky.
>>>
>>> In fact, if you didn't do the initial_mirror and length change (which is
>>> a big disaster of readability, to change iterator in a loop, at least to
>>> me),
>>
>> So what is the big problem there?  Do I need more extensive documentation
>> or is there anything in this concept that is just too confusing?
> 
> Modifying the iterator inside a loop is the biggest problem to me.

What iterator gets modified? The code defines a local one, which gets 
initialized by copying the state from the bbio. Then the loop works as 
usual, that's not confusing and is in line with how the other bio vec 
iterator macros work.
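
Roughly the pattern in question (a simplified sketch; field and helper
names are assumed, not the exact macro from the patch):

	struct bvec_iter iter = bbio->iter;	/* local copy, bbio state untouched */
	u32 offset = 0;

	while (iter.bi_size) {
		struct bio_vec bv = bio_iter_iovec(&bbio->bio, iter);

		/* clamp the current bvec to a single sector */
		bv.bv_len = min_t(u32, bv.bv_len, fs_info->sectorsize);

		/* ... verify or repair the sector at 'offset' here ... */

		bio_advance_iter_single(&bbio->bio, &iter, fs_info->sectorsize);
		offset += fs_info->sectorsize;
	}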

> 
> Yes, extra comments can help, but that doesn't help the readability.
> 
>> And that's why most of the guys here prefer a for () loop if we can.
> 
> Thanks,
> Qu
>>
>>> and rely on the VFS re-read behavior to fall back to sector by
>>> sector read, I would call it better readability...
>>
>> I don't think relying on undocumented VFS behavior is a good idea.  It
>> will also not work for direct I/O if any single direct I/O bio has
>> ever more than a single page, which is something btrfs badly needs
>> if it wants to get any kind of performance out of direct I/O.
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/8] btrfs: introduce a pure data checksum checking helper
  2022-05-24  8:11         ` Christoph Hellwig
@ 2022-05-25 16:20           ` David Sterba
  0 siblings, 0 replies; 36+ messages in thread
From: David Sterba @ 2022-05-25 16:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Qu Wenruo, Chris Mason, Josef Bacik, David Sterba, Qu Wenruo,
	linux-btrfs, Nikolay Borisov

On Tue, May 24, 2022 at 10:11:10AM +0200, Christoph Hellwig wrote:
> On Tue, May 24, 2022 at 04:07:41PM +0800, Qu Wenruo wrote:
> >> -			    u32 pgoff, u8 *csum, u8 *csum_expected);
> >> +			    u32 pgoff, u8 *csum, u8 * const csum_expected);
> >
> > Shouldn't it be "const u8 *" instead?
> 
> No, that would bind to the pointer.  But we care about the data.

Then it could be 'const u8 * const csum_expected'. The consts are not
everywhere I would like to see them, so in new code I care a bit more and
add them when applying patches.
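
For the record, how the three forms differ (a general C note, nothing
btrfs-specific; the function below is a throwaway example):

	#include <linux/types.h>	/* for u8 */

	static void const_placement_example(const u8 *data_const,
					    u8 *const ptr_const,
					    const u8 *const both_const)
	{
		u8 tmp;

		tmp = data_const[0];	/* reading is fine */
		data_const++;		/* the pointer itself may move */
		/* data_const[0] = 0;	would not compile: the data is read-only */

		ptr_const[0] = tmp;	/* the pointed-to bytes may be written */
		/* ptr_const++;		would not compile: the pointer is fixed */

		tmp = both_const[0];	/* both_const allows neither to change */
	}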

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: misc btrfs cleanups
  2022-05-22 11:47 misc btrfs cleanups Christoph Hellwig
                   ` (7 preceding siblings ...)
  2022-05-22 11:47 ` [PATCH 8/8] btrfs: use btrfs_bio_for_each_sector in btrfs_check_read_dio_bio Christoph Hellwig
@ 2022-05-25 20:20 ` David Sterba
  2022-05-27 15:20   ` David Sterba
  8 siblings, 1 reply; 36+ messages in thread
From: David Sterba @ 2022-05-25 20:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chris Mason, Josef Bacik, David Sterba, Qu Wenruo, linux-btrfs

On Sun, May 22, 2022 at 01:47:46PM +0200, Christoph Hellwig wrote:
> Hi all,
> 
> this series has a bunch of random cleanups and factored out helpers split
> from the read repair series.

Thanks, I went through it, did some coding style fixups. Now it's under
tests and I'll move it to misc-next after that so the repair changes can
be based on that.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 8/8] btrfs: use btrfs_bio_for_each_sector in btrfs_check_read_dio_bio
  2022-05-25 15:12                           ` Nikolay Borisov
@ 2022-05-25 22:46                             ` Qu Wenruo
  0 siblings, 0 replies; 36+ messages in thread
From: Qu Wenruo @ 2022-05-25 22:46 UTC (permalink / raw)
  To: Nikolay Borisov, Christoph Hellwig
  Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs



On 2022/5/25 23:12, Nikolay Borisov wrote:
>
>
> On 24.05.22 г. 16:13 ч., Qu Wenruo wrote:
>>
>>
>> On 2022/5/24 20:08, Christoph Hellwig wrote:
>>> On Tue, May 24, 2022 at 04:21:38PM +0800, Qu Wenruo wrote:
>>>>> The things like resetting initial_mirror, making the naming "initial"
>>>>> meaningless.
>>>>> And the reset on the length part is also very quirky.
>>>>
>>>> In fact, if you didn't do the initial_mirror and length change
>>>> (which is
>>>> a big disaster of readability, to change iterator in a loop, at
>>>> least to
>>>> me),
>>>
>>> So what is the big problem there?  Do I need more extensive
>>> documentation
>>> or is there anything in this concept that is just too confusing?
>>
>> Modifying the iterator inside a loop is the biggest problem to me.
>
> What iterator gets modified?

Mirror number.

Thanks,
Qu

> The code defines a local one, which gets
> initialized by copying the state from the bbio. Then the loop works as
> usual, that's not confusing and is in line with how the other bio vec
> iterator macros work.
>
>>
>> Yes, extra comments can help, but that doesn't help the readability.
>>
>> And that's why most of the guys here prefer a for () loop if we can.
>>
>> Thanks,
>> Qu
>>>
>>>> and rely on the VFS re-read behavior to fall back to sector by
>>>> sector read, I would call it better readability...
>>>
>>> I don't think relying on undocumented VFS behavior is a good idea.  It
>>> will also not work for direct I/O if any single direct I/O bio has
>>> ever more than a single page, which is something btrfs badly needs
>>> if it wants to get any kind of performance out of direct I/O.
>>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 7/8] btrfs: add a helper to iterate through a btrfs_bio with sector sized chunks
  2022-05-22 11:47 ` [PATCH 7/8] btrfs: add a helper to iterate through a btrfs_bio with sector sized chunks Christoph Hellwig
@ 2022-05-26 12:58   ` Nikolay Borisov
  0 siblings, 0 replies; 36+ messages in thread
From: Nikolay Borisov @ 2022-05-26 12:58 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Qu Wenruo, linux-btrfs



On 22.05.22 г. 14:47 ч., Christoph Hellwig wrote:
> From: Qu Wenruo <wqu@suse.com>
> 
> Add a helper that works similar to __bio_for_each_segment, but instead of
> iterating over PAGE_SIZE chunks it iterates over each sector.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> [hch: split from a larger patch, and iterate over the offset instead of
>        the offset bits]
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 8/8] btrfs: use btrfs_bio_for_each_sector in btrfs_check_read_dio_bio
  2022-05-22 11:47 ` [PATCH 8/8] btrfs: use btrfs_bio_for_each_sector in btrfs_check_read_dio_bio Christoph Hellwig
  2022-05-22 12:21   ` Qu Wenruo
@ 2022-05-26 12:58   ` Nikolay Borisov
  1 sibling, 0 replies; 36+ messages in thread
From: Nikolay Borisov @ 2022-05-26 12:58 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Qu Wenruo, linux-btrfs



On 22.05.22 г. 14:47 ч., Christoph Hellwig wrote:
> Use the new btrfs_bio_for_each_sector iterator to simplify
> btrfs_check_read_dio_bio.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: misc btrfs cleanups
  2022-05-25 20:20 ` misc btrfs cleanups David Sterba
@ 2022-05-27 15:20   ` David Sterba
  0 siblings, 0 replies; 36+ messages in thread
From: David Sterba @ 2022-05-27 15:20 UTC (permalink / raw)
  To: dsterba, Christoph Hellwig, Chris Mason, Josef Bacik,
	David Sterba, Qu Wenruo, linux-btrfs

On Wed, May 25, 2022 at 10:20:12PM +0200, David Sterba wrote:
> On Sun, May 22, 2022 at 01:47:46PM +0200, Christoph Hellwig wrote:
> > Hi all,
> > 
> > this series has a bunch of random cleanups and factored out helpers split
> > from the read repair series.
> 
> Thanks, I went through it, did some coding style fixups. Now it's under
> tests and I'll move it to misc-next after that so the repair changes can
> be based on that.

Moved to misc-next.

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2022-05-27 15:25 UTC | newest]

Thread overview: 36+ messages
2022-05-22 11:47 misc btrfs cleanups Christoph Hellwig
2022-05-22 11:47 ` [PATCH 1/8] btrfs: quit early if the fs has no RAID56 support for raid56 related checks Christoph Hellwig
2022-05-22 11:47 ` [PATCH 2/8] btrfs: introduce a pure data checksum checking helper Christoph Hellwig
2022-05-23  0:38   ` Qu Wenruo
2022-05-24  7:24     ` Christoph Hellwig
2022-05-24  8:07       ` Qu Wenruo
2022-05-24  8:11         ` Christoph Hellwig
2022-05-25 16:20           ` David Sterba
2022-05-22 11:47 ` [PATCH 3/8] btrfs: remove duplicated parameters from submit_data_read_repair() Christoph Hellwig
2022-05-22 11:47 ` [PATCH 4/8] btrfs: factor out a helper to end a single sector buffere I/O Christoph Hellwig
2022-05-25 14:54   ` Nikolay Borisov
2022-05-22 11:47 ` [PATCH 5/8] btrfs: refactor end_bio_extent_readpage Christoph Hellwig
2022-05-25 14:57   ` Nikolay Borisov
2022-05-22 11:47 ` [PATCH 6/8] btrfs: factor out a btrfs_csum_ptr helper Christoph Hellwig
2022-05-25 14:32   ` Nikolay Borisov
2022-05-22 11:47 ` [PATCH 7/8] btrfs: add a helper to iterate through a btrfs_bio with sector sized chunks Christoph Hellwig
2022-05-26 12:58   ` Nikolay Borisov
2022-05-22 11:47 ` [PATCH 8/8] btrfs: use btrfs_bio_for_each_sector in btrfs_check_read_dio_bio Christoph Hellwig
2022-05-22 12:21   ` Qu Wenruo
2022-05-22 12:31     ` Christoph Hellwig
2022-05-22 12:38       ` Qu Wenruo
2022-05-22 12:53         ` Christoph Hellwig
2022-05-23  0:07           ` Qu Wenruo
2022-05-23  6:26             ` Christoph Hellwig
2022-05-23  7:46               ` Qu Wenruo
2022-05-24  7:32                 ` Christoph Hellwig
2022-05-24  8:04                   ` Qu Wenruo
2022-05-24  8:21                     ` Qu Wenruo
2022-05-24 12:08                       ` Christoph Hellwig
2022-05-24 13:13                         ` Qu Wenruo
2022-05-24 14:02                           ` Qu Wenruo
2022-05-25 15:12                           ` Nikolay Borisov
2022-05-25 22:46                             ` Qu Wenruo
2022-05-26 12:58   ` Nikolay Borisov
2022-05-25 20:20 ` misc btrfs cleanups David Sterba
2022-05-27 15:20   ` David Sterba
