linux-btrfs.vger.kernel.org archive mirror
* [PATCH 0/3] btrfs: btrfs_lookup_bio_sums() related fixes
@ 2020-10-28  7:24 Qu Wenruo
  2020-10-28  7:24 ` [PATCH 1/3] btrfs: file-item: use nodesize to determine whether we need readahead for btrfs_lookup_bio_sums() Qu Wenruo
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Qu Wenruo @ 2020-10-28  7:24 UTC (permalink / raw)
  To: linux-btrfs

This is the final piece needed for the subpage read/write patchset to
pass all fsstress tests.

Before these 3 patches, there is a very low chance of hitting the
following errors:
- False bad data csum error
  The expected csum is always 0x00000000, while it should be a real
  checksum value.

- WARN_ON_ONCE(count) hit
  Something is definitely wrong in btrfs_lookup_bio_sums().


After more debugging, it turned out that even on x86_64 there is
something important to notice: the bvecs in btrfs_lookup_bio_sums()
are not always in linear bytenr order:

  btrfs_lookup_bio_sums: file_offset=-1 expected_bvec_offset=532480 page_offset=716800 bv_offset=0
  btrfs_lookup_bio_sums: orig_file_offset=520192 bvec_index=3 root=259 ino=260 page_owner_ino=260

This is from an x86_64 run, with extra debug info added to the
bio_for_each_segment() loop.
It shows that, although bio->bi_iter.bi_size is correct and each page
is properly mapped, the bvecs are not guaranteed to be in bytenr order.
This could ruin subpage support: if we hit a bvec which belongs to the
end of the bio and find a lot of csums for it, we may skip the csums
for other bvecs.

To address the problem, the 3rd patch introduces a csum search that is
independent of the bio layout.
This not only simplifies the main loop (just one small main loop), but
also frees us from the out-of-order bvec problems.

The other two patches are small enhancements found during development.

With the patchset, the subpage case can survive (mostly) infinite
fsstress runs, and the regular page size case still passes all existing
fstests.

Since these patches contain nothing specific to subpage, and are by
nature mostly renames and refactoring, they can be submitted right now,
with or without the subpage patches.

Qu Wenruo (3):
  btrfs: file-item: use nodesize to determine whether we need readahead
    for btrfs_lookup_bio_sums()
  btrfs: ordered-data: rename parameter @len to @nr_sectors
  btrfs: file-item: refactor btrfs_lookup_bio_sums() to handle
    out-of-order bvecs

 fs/btrfs/file-item.c    | 242 ++++++++++++++++++++++++----------------
 fs/btrfs/ordered-data.c |   9 +-
 fs/btrfs/ordered-data.h |   2 +-
 3 files changed, 154 insertions(+), 99 deletions(-)

-- 
2.29.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/3] btrfs: file-item: use nodesize to determine whether we need readahead for btrfs_lookup_bio_sums()
  2020-10-28  7:24 [PATCH 0/3] btrfs: btrfs_lookup_bio_sums() related fixes Qu Wenruo
@ 2020-10-28  7:24 ` Qu Wenruo
  2020-10-29  7:25   ` Nikolay Borisov
  2020-10-28  7:24 ` [PATCH 2/3] btrfs: ordered-data: rename parameter @len to @nr_sectors Qu Wenruo
  2020-10-28  7:24 ` [PATCH 3/3] btrfs: file-item: refactor btrfs_lookup_bio_sums() to handle out-of-order bvecs Qu Wenruo
  2 siblings, 1 reply; 12+ messages in thread
From: Qu Wenruo @ 2020-10-28  7:24 UTC (permalink / raw)
  To: linux-btrfs

In btrfs_lookup_bio_sums() if the bio is pretty large, we want to
readahead the csum tree.

However the threshold is a magic number, (PAGE_SIZE * 8), from the
initial btrfs merge.

The meaning of the value is pretty hard to guess, especially since the
number dates from an era when 4K sectorsize was the default and only
CRC32 was supported.

For the most common btrfs setup, CRC32 csum algorithm and 4K
sectorsize, it means that a mere 32K read would kick in readahead,
while the csums themselves are only 32 bytes in total.

Now let's be more reasonable by taking both csum size and node size into
consideration.

If the total csum size for the bio is larger than one node, then we
kick in the readahead.
For the current btrfs defaults, the threshold works out to 16M.

This change should not observably affect performance; it is mostly a
readability enhancement.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/file-item.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 7d5ec71615b8..fbc60948b2c4 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -295,7 +295,11 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio,
 		csum = dst;
 	}
 
-	if (bio->bi_iter.bi_size > PAGE_SIZE * 8)
+	/*
+	 * If the needed csum size is larger than a node, kicking in readahead
+	 * for the csum tree is a good idea.
+	 */
+	if (nblocks * csum_size > fs_info->nodesize)
 		path->reada = READA_FORWARD;
 
 	/*
-- 
2.29.1



* [PATCH 2/3] btrfs: ordered-data: rename parameter @len to @nr_sectors
  2020-10-28  7:24 [PATCH 0/3] btrfs: btrfs_lookup_bio_sums() related fixes Qu Wenruo
  2020-10-28  7:24 ` [PATCH 1/3] btrfs: file-item: use nodesize to determine whether we need readahead for btrfs_lookup_bio_sums() Qu Wenruo
@ 2020-10-28  7:24 ` Qu Wenruo
  2020-10-29  7:32   ` Nikolay Borisov
  2020-11-03 19:16   ` David Sterba
  2020-10-28  7:24 ` [PATCH 3/3] btrfs: file-item: refactor btrfs_lookup_bio_sums() to handle out-of-order bvecs Qu Wenruo
  2 siblings, 2 replies; 12+ messages in thread
From: Qu Wenruo @ 2020-10-28  7:24 UTC (permalink / raw)
  To: linux-btrfs

The parameter is the number of sectors of the range to search.
While most "len" parameters used in other locations are in bytes, this
can lead to confusion.

Rename @len to @nr_sectors to make the unit explicit and avoid
confusion.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ordered-data.c | 9 ++++++---
 fs/btrfs/ordered-data.h | 2 +-
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index ebac13389e7e..10c13f8a1603 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -802,9 +802,11 @@ btrfs_lookup_first_ordered_extent(struct inode *inode, u64 file_offset)
  * search the ordered extents for one corresponding to 'offset' and
  * try to find a checksum.  This is used because we allow pages to
  * be reclaimed before their checksum is actually put into the btree
+ *
+ * @nr_sectors:	The length of the search range, in sectors.
  */
 int btrfs_find_ordered_sum(struct inode *inode, u64 offset, u64 disk_bytenr,
-			   u8 *sum, int len)
+			   u8 *sum, int nr_sectors)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct btrfs_ordered_sum *ordered_sum;
@@ -828,12 +830,13 @@ int btrfs_find_ordered_sum(struct inode *inode, u64 offset, u64 disk_bytenr,
 			    inode->i_sb->s_blocksize_bits;
 			num_sectors = ordered_sum->len >>
 				      inode->i_sb->s_blocksize_bits;
-			num_sectors = min_t(int, len - index, num_sectors - i);
+			num_sectors = min_t(int, nr_sectors - index,
+					    num_sectors - i);
 			memcpy(sum + index, ordered_sum->sums + i * csum_size,
 			       num_sectors * csum_size);
 
 			index += (int)num_sectors * csum_size;
-			if (index == len)
+			if (index == nr_sectors)
 				goto out;
 			disk_bytenr += num_sectors * sectorsize;
 		}
diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
index d61ea9c880a3..3ffc1c27ee30 100644
--- a/fs/btrfs/ordered-data.h
+++ b/fs/btrfs/ordered-data.h
@@ -175,7 +175,7 @@ struct btrfs_ordered_extent *btrfs_lookup_ordered_range(
 		u64 file_offset,
 		u64 len);
 int btrfs_find_ordered_sum(struct inode *inode, u64 offset, u64 disk_bytenr,
-			   u8 *sum, int len);
+			   u8 *sum, int nr_sectors);
 u64 btrfs_wait_ordered_extents(struct btrfs_root *root, u64 nr,
 			       const u64 range_start, const u64 range_len);
 void btrfs_wait_ordered_roots(struct btrfs_fs_info *fs_info, u64 nr,
-- 
2.29.1



* [PATCH 3/3] btrfs: file-item: refactor btrfs_lookup_bio_sums() to handle out-of-order bvecs
  2020-10-28  7:24 [PATCH 0/3] btrfs: btrfs_lookup_bio_sums() related fixes Qu Wenruo
  2020-10-28  7:24 ` [PATCH 1/3] btrfs: file-item: use nodesize to determine whether we need readahead for btrfs_lookup_bio_sums() Qu Wenruo
  2020-10-28  7:24 ` [PATCH 2/3] btrfs: ordered-data: rename parameter @len to @nr_sectors Qu Wenruo
@ 2020-10-28  7:24 ` Qu Wenruo
  2020-11-03 19:46   ` David Sterba
  2 siblings, 1 reply; 12+ messages in thread
From: Qu Wenruo @ 2020-10-28  7:24 UTC (permalink / raw)
  To: linux-btrfs

Refactor btrfs_lookup_bio_sums() by:
- Extract the csum tree lookup into find_csum_tree_sums()
  The new function will handle the csum search in csum tree.
  The return value is the same as btrfs_find_ordered_sum(): the number
  of sectors for which we found a checksum.

- Change how we do the main loop
  The parameter bio is in fact just a distraction.
  The only needed info from bio is:
  * the on-disk bytenr
  * the file_offset (only needed from the bio when file_offset == (u64)-1)
  * the length

  After extracting the above info, we can do the search without the bio
  at all, which makes the main loop much simpler:

	for (cur_offset = orig_file_offset; cur_offset < orig_file_offset + orig_len;
	     cur_offset += count * sectorsize) {
		/* Lookup ordered csum */
		count = btrfs_find_ordered_sum(inode, cur_offset,
					       cur_disk_bytenr, csum_dst,
					       search_len / sectorsize);
		if (count)
			continue;
		/* Lookup csum tree */
		count = find_csum_tree_sums(fs_info, path, cur_disk_bytenr,
					    search_len, csum_dst);
		if (!count) {
			/* Csum hole handling */
		}
	}

- Use a single variable as the core to calculate all other offsets
  Instead of many variables of different types, we use only one core
  variable, cur_offset, which represents the current file offset.

  All involved values can be calculated from that core variable, and
  those variables are only visible in the inner loop.

    diff_sectors = (cur_offset - orig_file_offset) / sectorsize;
    cur_disk_bytenr = orig_disk_bytenr + diff_sectors * sectorsize;
    csum_dst = csum + diff_sectors * csum_size;

- Fix bugs related to the bio iteration
  There are several hidden pitfalls related to the csum lookup.
  The biggest one is related to the bvec iteration.
  There are cases where the bvecs are not in linear bytenr order, like
  the following case with added debug info:
    btrfs_lookup_bio_sums: file_offset=-1 expected_bvec_offset=532480 page_offset=716800 bv_offset=0
    btrfs_lookup_bio_sums: orig_file_offset=520192 bvec_index=3 root=259 ino=260 page_owner_ino=260

  This is even more dangerous for subpage support, as in the subpage
  case we can have bvecs with non-zero bv_offset, and if they get
  reordered, we can easily skip csums incorrectly and trigger false
  csum warnings.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/file-item.c | 236 ++++++++++++++++++++++++++-----------------
 1 file changed, 142 insertions(+), 94 deletions(-)

diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index fbc60948b2c4..5f60ce6f227a 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -239,44 +239,117 @@ int btrfs_lookup_file_extent(struct btrfs_trans_handle *trans,
 	return ret;
 }
 
+/*
+ * Helper to find csums for logical bytenr range
+ * [disk_bytenr, disk_bytenr + len) and restore the result to @dst.
+ *
+ * Return >0 for the number of sectors we found.
+ * Return 0 for the range [disk_bytenr, disk_bytenr + sectorsize) has no csum
+ * for it. Caller may want to try next sector until one range is hit.
+ * Return <0 for fatal error.
+ */
+static int find_csum_tree_sums(struct btrfs_fs_info *fs_info,
+			       struct btrfs_path *path, u64 disk_bytenr,
+			       u64 len, u8 *dst)
+{
+	struct btrfs_csum_item *item = NULL;
+	struct btrfs_key key;
+	u32 csum_size = btrfs_super_csum_size(fs_info->super_copy);
+	u32 sectorsize = fs_info->sectorsize;
+	int ret;
+	u64 csum_start;
+	u64 csum_len;
+
+	ASSERT(IS_ALIGNED(disk_bytenr, sectorsize) &&
+	       IS_ALIGNED(len, sectorsize));
+
+	/* Check if the current csum item covers disk_bytenr */
+	if (path->nodes[0]) {
+		item = btrfs_item_ptr(path->nodes[0], path->slots[0],
+				      struct btrfs_csum_item);
+		btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
+		csum_start = key.offset;
+		csum_len = btrfs_item_size_nr(path->nodes[0], path->slots[0]) /
+			   csum_size * sectorsize;
+
+		if (csum_start <= disk_bytenr &&
+		    csum_start + csum_len > disk_bytenr)
+			goto found;
+	}
+
+	/* Current item doesn't contain the desired range, re-search */
+	btrfs_release_path(path);
+	item = btrfs_lookup_csum(NULL, fs_info->csum_root, path,
+				 disk_bytenr, 0);
+	if (IS_ERR(item)) {
+		ret = PTR_ERR(item);
+		goto out;
+	}
+found:
+	btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
+	csum_start = key.offset;
+	csum_len = btrfs_item_size_nr(path->nodes[0], path->slots[0]) /
+		   csum_size * sectorsize;
+	ASSERT(csum_start <= disk_bytenr &&
+	       csum_start + csum_len > disk_bytenr);
+
+	ret = div_u64(min(csum_start + csum_len, disk_bytenr + len) -
+		      disk_bytenr, sectorsize);
+	read_extent_buffer(path->nodes[0], dst, (unsigned long)item,
+			ret * csum_size);
+out:
+	if (ret == -ENOENT) {
+		ret = 0;
+		memset(dst, 0, csum_size);
+	}
+	return ret;
+}
+
 /**
  * btrfs_lookup_bio_sums - Look up checksums for a bio.
- * @inode: inode that the bio is for.
- * @bio: bio to look up.
- * @offset: Unless (u64)-1, look up checksums for this offset in the file.
- *          If (u64)-1, use the page offsets from the bio instead.
- * @dst: Buffer of size nblocks * btrfs_super_csum_size() used to return
- *       checksum (nblocks = bio->bi_iter.bi_size / fs_info->sectorsize). If
- *       NULL, the checksum buffer is allocated and returned in
- *       btrfs_io_bio(bio)->csum instead.
+ * @inode:		Inode that the bio is for.
+ * @bio:		Bio to look up.
+ *			NOTE: The bio is only used to determine the file_offset
+ *			and length.
+ * @file_offset:	File offset of the bio.
+ * 			If (u64)-1, will use the bio to determine the
+ * 			file offset.
+ * @dst:		Csum destination.
+ * 			Should be at least (bio->bi_iter.bi_size /
+ * 			fs_info->sectorsize * csum_size) bytes in size.
  *
  * Return: BLK_STS_RESOURCE if allocating memory fails, BLK_STS_OK otherwise.
  */
 blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio,
-				   u64 offset, u8 *dst)
+				   u64 file_offset, u8 *dst)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	struct bio_vec bvec;
-	struct bvec_iter iter;
-	struct btrfs_csum_item *item = NULL;
 	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
 	struct btrfs_path *path;
-	const bool page_offsets = (offset == (u64)-1);
+	u32 csum_size = btrfs_super_csum_size(fs_info->super_copy);
+	u32 sectorsize = fs_info->sectorsize;
+	u64 orig_file_offset;
+	u64 orig_len;
+	u64 orig_disk_bytenr = bio->bi_iter.bi_sector << 9;
+	/* Current file offset, is used to calculate all other values */
+	u64 cur_offset;
 	u8 *csum;
-	u64 item_start_offset = 0;
-	u64 item_last_offset = 0;
-	u64 disk_bytenr;
-	u64 page_bytes_left;
-	u32 diff;
 	int nblocks;
 	int count = 0;
-	u16 csum_size = btrfs_super_csum_size(fs_info->super_copy);
+
+	if (file_offset == (u64)-1)
+		orig_file_offset = page_offset(bio_first_page_all(bio)) +
+				   bio_first_bvec_all(bio)->bv_offset;
+	else
+		orig_file_offset = file_offset;
+
+	orig_len = bio->bi_iter.bi_size;
+	nblocks = orig_len >> inode->i_sb->s_blocksize_bits;
 
 	path = btrfs_alloc_path();
 	if (!path)
 		return BLK_STS_RESOURCE;
 
-	nblocks = bio->bi_iter.bi_size >> inode->i_sb->s_blocksize_bits;
 	if (!dst) {
 		struct btrfs_io_bio *btrfs_bio = btrfs_io_bio(bio);
 
@@ -313,85 +386,60 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio,
 		path->skip_locking = 1;
 	}
 
-	disk_bytenr = (u64)bio->bi_iter.bi_sector << 9;
-
-	bio_for_each_segment(bvec, bio, iter) {
-		page_bytes_left = bvec.bv_len;
-		if (count)
-			goto next;
-
-		if (page_offsets)
-			offset = page_offset(bvec.bv_page) + bvec.bv_offset;
-		count = btrfs_find_ordered_sum(inode, offset, disk_bytenr,
-					       csum, nblocks);
+	/*
+	 * In fact, for csum lookup we don't really need the bio at all.
+	 *
+	 * We know the on-disk bytenr, the file_offset, and the length.
+	 * That's enough to search for csums. The bio is in fact just a
+	 * distraction, and following the bio bvecs would make things much
+	 * harder, as we could have subpage bvecs (with different bv_len)
+	 * and non-linear bvecs.
+	 *
+	 * So here we don't bother with the bio at all, just use @cur_offset
+	 * to do the iteration.
+	 */
+	for (cur_offset = orig_file_offset; cur_offset < orig_file_offset + orig_len;
+	     cur_offset += count * sectorsize) {
+		u64 cur_disk_bytenr;
+		int search_len = orig_file_offset + orig_len - cur_offset;
+		int diff_sectors;
+		u8 *csum_dst;
+
+		diff_sectors = div_u64(cur_offset - orig_file_offset,
+				       sectorsize);
+		cur_disk_bytenr = orig_disk_bytenr +
+				  diff_sectors * sectorsize;
+		csum_dst = csum + diff_sectors * csum_size;
+
+		count = btrfs_find_ordered_sum(inode, cur_offset,
+					       cur_disk_bytenr, csum_dst,
+					       search_len / sectorsize);
 		if (count)
-			goto found;
-
-		if (!item || disk_bytenr < item_start_offset ||
-		    disk_bytenr >= item_last_offset) {
-			struct btrfs_key found_key;
-			u32 item_size;
-
-			if (item)
-				btrfs_release_path(path);
-			item = btrfs_lookup_csum(NULL, fs_info->csum_root,
-						 path, disk_bytenr, 0);
-			if (IS_ERR(item)) {
-				count = 1;
-				memset(csum, 0, csum_size);
-				if (BTRFS_I(inode)->root->root_key.objectid ==
-				    BTRFS_DATA_RELOC_TREE_OBJECTID) {
-					set_extent_bits(io_tree, offset,
-						offset + fs_info->sectorsize - 1,
-						EXTENT_NODATASUM);
-				} else {
-					btrfs_info_rl(fs_info,
-						   "no csum found for inode %llu start %llu",
-					       btrfs_ino(BTRFS_I(inode)), offset);
-				}
-				item = NULL;
-				btrfs_release_path(path);
-				goto found;
+			continue;
+		count = find_csum_tree_sums(fs_info, path, cur_disk_bytenr,
+					    search_len, csum_dst);
+		if (!count) {
+			/*
+			 * For not found case, the csum has been zeroed
+			 * in find_csum_tree_sums() already, just skip
+			 * to next sector.
+			 */
+			count = 1;
+			if (BTRFS_I(inode)->root->root_key.objectid ==
+			    BTRFS_DATA_RELOC_TREE_OBJECTID) {
+				set_extent_bits(io_tree, cur_offset,
+					cur_offset + sectorsize - 1,
+					EXTENT_NODATASUM);
+			} else {
+				btrfs_warn_rl(fs_info,
+		"csum hole found for root %lld inode %llu range [%llu, %llu)",
+				BTRFS_I(inode)->root->root_key.objectid,
+				btrfs_ino(BTRFS_I(inode)),
+				cur_offset, cur_offset + sectorsize);
 			}
-			btrfs_item_key_to_cpu(path->nodes[0], &found_key,
-					      path->slots[0]);
-
-			item_start_offset = found_key.offset;
-			item_size = btrfs_item_size_nr(path->nodes[0],
-						       path->slots[0]);
-			item_last_offset = item_start_offset +
-				(item_size / csum_size) *
-				fs_info->sectorsize;
-			item = btrfs_item_ptr(path->nodes[0], path->slots[0],
-					      struct btrfs_csum_item);
-		}
-		/*
-		 * this byte range must be able to fit inside
-		 * a single leaf so it will also fit inside a u32
-		 */
-		diff = disk_bytenr - item_start_offset;
-		diff = diff / fs_info->sectorsize;
-		diff = diff * csum_size;
-		count = min_t(int, nblocks, (item_last_offset - disk_bytenr) >>
-					    inode->i_sb->s_blocksize_bits);
-		read_extent_buffer(path->nodes[0], csum,
-				   ((unsigned long)item) + diff,
-				   csum_size * count);
-found:
-		csum += count * csum_size;
-		nblocks -= count;
-next:
-		while (count > 0) {
-			count--;
-			disk_bytenr += fs_info->sectorsize;
-			offset += fs_info->sectorsize;
-			page_bytes_left -= fs_info->sectorsize;
-			if (!page_bytes_left)
-				break; /* move to next bio */
 		}
 	}
 
-	WARN_ON_ONCE(count);
 	btrfs_free_path(path);
 	return BLK_STS_OK;
 }
-- 
2.29.1



* Re: [PATCH 1/3] btrfs: file-item: use nodesize to determine whether we need readahead for btrfs_lookup_bio_sums()
  2020-10-28  7:24 ` [PATCH 1/3] btrfs: file-item: use nodesize to determine whether we need readahead for btrfs_lookup_bio_sums() Qu Wenruo
@ 2020-10-29  7:25   ` Nikolay Borisov
  0 siblings, 0 replies; 12+ messages in thread
From: Nikolay Borisov @ 2020-10-29  7:25 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 28.10.20 at 9:24, Qu Wenruo wrote:
> In btrfs_lookup_bio_sums() if the bio is pretty large, we want to
> readahead the csum tree.
> 
> However the threshold is a magic number, (PAGE_SIZE * 8), from the
> initial btrfs merge.
> 
> The meaning of the value is pretty hard to guess, especially since the
> number dates from an era when 4K sectorsize was the default and only
> CRC32 was supported.
> 
> For the most common btrfs setup, CRC32 csum algorithm and 4K
> sectorsize, it means that a mere 32K read would kick in readahead,
> while the csums themselves are only 32 bytes in total.
> 
> Now let's be more reasonable by taking both csum size and node size into
> consideration.
> 
> If the total csum size for the bio is larger than one node, then we
> kick in the readahead.
> For the current btrfs defaults, the threshold works out to 16M.
> 
> This change should not observably affect performance; it is mostly a
> readability enhancement.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>


* Re: [PATCH 2/3] btrfs: ordered-data: rename parameter @len to @nr_sectors
  2020-10-28  7:24 ` [PATCH 2/3] btrfs: ordered-data: rename parameter @len to @nr_sectors Qu Wenruo
@ 2020-10-29  7:32   ` Nikolay Borisov
  2020-11-03 19:16   ` David Sterba
  1 sibling, 0 replies; 12+ messages in thread
From: Nikolay Borisov @ 2020-10-29  7:32 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 28.10.20 at 9:24, Qu Wenruo wrote:
> The parameter is the number of sectors of the range to search.
> While most "len" parameters used in other locations are in bytes, this
> can lead to confusion.
> 
> Rename @len to @nr_sectors to make the unit explicit and avoid confusion.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> ---
>  fs/btrfs/ordered-data.c | 9 ++++++---
>  fs/btrfs/ordered-data.h | 2 +-
>  2 files changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
> index ebac13389e7e..10c13f8a1603 100644
> --- a/fs/btrfs/ordered-data.c
> +++ b/fs/btrfs/ordered-data.c
> @@ -802,9 +802,11 @@ btrfs_lookup_first_ordered_extent(struct inode *inode, u64 file_offset)
>   * search the ordered extents for one corresponding to 'offset' and
>   * try to find a checksum.  This is used because we allow pages to
>   * be reclaimed before their checksum is actually put into the btree
> + *
> + * @nr_sectors:	The length of the search range, in sectors.
>   */
>  int btrfs_find_ordered_sum(struct inode *inode, u64 offset, u64 disk_bytenr,
> -			   u8 *sum, int len)
> +			   u8 *sum, int nr_sectors)
>  {
>  	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>  	struct btrfs_ordered_sum *ordered_sum;
> @@ -828,12 +830,13 @@ int btrfs_find_ordered_sum(struct inode *inode, u64 offset, u64 disk_bytenr,

FWIW: The loop can be made slightly more readable by applying the following hunk:

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 9a277a475a1c..0b41499bb45a 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -878,8 +878,8 @@ int btrfs_find_ordered_sum(struct btrfs_inode *inode, u64 offset,
 
        spin_lock_irq(&tree->lock);
        list_for_each_entry_reverse(ordered_sum, &ordered->list, list) {
-               if (disk_bytenr >= ordered_sum->bytenr &&
-                   disk_bytenr < ordered_sum->bytenr + ordered_sum->len) {
+               if (in_range(disk_bytenr, ordered_sum->bytenr,
+                            ordered_sum->len)) {
                        i = (disk_bytenr - ordered_sum->bytenr) >> blocksize_bits;
                        num_sectors = ordered_sum->len >> blocksize_bits;
                        num_sectors = min_t(int, len - index, num_sectors - i);


>  			    inode->i_sb->s_blocksize_bits;
>  			num_sectors = ordered_sum->len >>
>  				      inode->i_sb->s_blocksize_bits;
> -			num_sectors = min_t(int, len - index, num_sectors - i);
> +			num_sectors = min_t(int, nr_sectors - index,
> +					    num_sectors - i);
>  			memcpy(sum + index, ordered_sum->sums + i * csum_size,
>  			       num_sectors * csum_size);
>  
>  			index += (int)num_sectors * csum_size;
> -			if (index == len)
> +			if (index == nr_sectors)
>  				goto out;
>  			disk_bytenr += num_sectors * sectorsize;
>  		}

<snip>


* Re: [PATCH 2/3] btrfs: ordered-data: rename parameter @len to @nr_sectors
  2020-10-28  7:24 ` [PATCH 2/3] btrfs: ordered-data: rename parameter @len to @nr_sectors Qu Wenruo
  2020-10-29  7:32   ` Nikolay Borisov
@ 2020-11-03 19:16   ` David Sterba
  2020-11-03 19:23     ` Amy Parker
  1 sibling, 1 reply; 12+ messages in thread
From: David Sterba @ 2020-11-03 19:16 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Wed, Oct 28, 2020 at 03:24:31PM +0800, Qu Wenruo wrote:
> The parameter is the number of sectors of the range to search.
> While most "len" parameters used in other locations are in bytes, this
> can lead to confusion.
> 
> Rename @len to @nr_sectors to make the unit explicit and avoid confusion.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/ordered-data.c | 9 ++++++---
>  fs/btrfs/ordered-data.h | 2 +-
>  2 files changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
> index ebac13389e7e..10c13f8a1603 100644
> --- a/fs/btrfs/ordered-data.c
> +++ b/fs/btrfs/ordered-data.c
> @@ -802,9 +802,11 @@ btrfs_lookup_first_ordered_extent(struct inode *inode, u64 file_offset)
>   * search the ordered extents for one corresponding to 'offset' and
>   * try to find a checksum.  This is used because we allow pages to
>   * be reclaimed before their checksum is actually put into the btree
> + *
> + * @nr_sectors:	The length of the search range, in sectors.

Please add all parameters

>   */
>  int btrfs_find_ordered_sum(struct inode *inode, u64 offset, u64 disk_bytenr,
> -			   u8 *sum, int len)
> +			   u8 *sum, int nr_sectors)


* Re: [PATCH 2/3] btrfs: ordered-data: rename parameter @len to @nr_sectors
  2020-11-03 19:16   ` David Sterba
@ 2020-11-03 19:23     ` Amy Parker
  2020-11-03 23:36       ` Qu Wenruo
  0 siblings, 1 reply; 12+ messages in thread
From: Amy Parker @ 2020-11-03 19:23 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs

On Tue, Nov 3, 2020 at 11:18 AM David Sterba <dsterba@suse.cz> wrote:
>
> On Wed, Oct 28, 2020 at 03:24:31PM +0800, Qu Wenruo wrote:
> > The parameter is the number of sectors of the range to search.
> > While most "len" parameters used in other locations are in bytes,
> > this can lead to confusion.
> >
> > Rename @len to @nr_sectors to make the unit explicit and avoid confusion.
> >
> > Signed-off-by: Qu Wenruo <wqu@suse.com>
> > ---
> >  fs/btrfs/ordered-data.c | 9 ++++++---
> >  fs/btrfs/ordered-data.h | 2 +-
> >  2 files changed, 7 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
> > index ebac13389e7e..10c13f8a1603 100644
> > --- a/fs/btrfs/ordered-data.c
> > +++ b/fs/btrfs/ordered-data.c
> > @@ -802,9 +802,11 @@ btrfs_lookup_first_ordered_extent(struct inode *inode, u64 file_offset)
> >   * search the ordered extents for one corresponding to 'offset' and
> >   * try to find a checksum.  This is used because we allow pages to
> >   * be reclaimed before their checksum is actually put into the btree
> > + *
> > + * @nr_sectors:      The length of the search range, in sectors.
>
> Please add all parameters

Yeah, having all parameters will be important for documentation
purposes. Shouldn't need too much effort, just quick descriptions
of the parameters.

Best regards,
Amy Parker
(they/them)

>
> >   */
> >  int btrfs_find_ordered_sum(struct inode *inode, u64 offset, u64 disk_bytenr,
> > -                        u8 *sum, int len)
> > +                        u8 *sum, int nr_sectors)


* Re: [PATCH 3/3] btrfs: file-item: refactor btrfs_lookup_bio_sums() to handle out-of-order bvecs
  2020-10-28  7:24 ` [PATCH 3/3] btrfs: file-item: refactor btrfs_lookup_bio_sums() to handle out-of-order bvecs Qu Wenruo
@ 2020-11-03 19:46   ` David Sterba
  2020-11-03 23:42     ` Qu Wenruo
  0 siblings, 1 reply; 12+ messages in thread
From: David Sterba @ 2020-11-03 19:46 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Wed, Oct 28, 2020 at 03:24:32PM +0800, Qu Wenruo wrote:
>  
> +/*
> + * Helper to find csums for logical bytenr range

    * Find checksums for logical bytenr range

'Helper' or 'This function' are redundant

> + * [disk_bytenr, disk_bytenr + len) and restore the result to @dst.
                                           ^^^^^^^
					   store

So this replaces the individual parameter description and is better
readable, ok.

> + *
> + * Return >0 for the number of sectors we found.
> + * Return 0 for the range [disk_bytenr, disk_bytenr + sectorsize) has no csum
> + * for it. Caller may want to try next sector until one range is hit.
> + * Return <0 for fatal error.
> + */
> +static int find_csum_tree_sums(struct btrfs_fs_info *fs_info,
> +			       struct btrfs_path *path, u64 disk_bytenr,
> +			       u64 len, u8 *dst)
> +{
> +	struct btrfs_csum_item *item = NULL;
> +	struct btrfs_key key;
> +	u32 csum_size = btrfs_super_csum_size(fs_info->super_copy);

This could use the fs_info->csum_size now that we have it

> +	u32 sectorsize = fs_info->sectorsize;
> +	int ret;
> +	u64 csum_start;
> +	u64 csum_len;
> +
> +	ASSERT(IS_ALIGNED(disk_bytenr, sectorsize) &&
> +	       IS_ALIGNED(len, sectorsize));
> +
> +	/* Check if the current csum item covers disk_bytenr */
> +	if (path->nodes[0]) {
> +		item = btrfs_item_ptr(path->nodes[0], path->slots[0],
> +				      struct btrfs_csum_item);
> +		btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
> +		csum_start = key.offset;
> +		csum_len = btrfs_item_size_nr(path->nodes[0], path->slots[0]) /
> +			   csum_size * sectorsize;

			path->slots[0]) / csum_size * sectorsize

This expression would be better on one line

> +		if (csum_start <= disk_bytenr &&
> +		    csum_start + csum_len > disk_bytenr)
> +			goto found;
> +	}
> +
> +	/* Current item doesn't contain the desired range, re-search */
                                                           ^^^^^^^^^
							   search again

> +	btrfs_release_path(path);
> +	item = btrfs_lookup_csum(NULL, fs_info->csum_root, path,
> +				 disk_bytenr, 0);
> +	if (IS_ERR(item)) {
> +		ret = PTR_ERR(item);
> +		goto out;
> +	}
> +found:
> +	btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
> +	csum_start = key.offset;
> +	csum_len = btrfs_item_size_nr(path->nodes[0], path->slots[0]) /
> +		   csum_size * sectorsize;

same, on one line

> +	ASSERT(csum_start <= disk_bytenr &&
> +	       csum_start + csum_len > disk_bytenr);
> +
> +	ret = div_u64(min(csum_start + csum_len, disk_bytenr + len) -
> +		      disk_bytenr, sectorsize);

This can use the sectorsize_bits

> +	read_extent_buffer(path->nodes[0], dst, (unsigned long)item,
> +			ret * csum_size);
> +out:
> +	if (ret == -ENOENT) {
> +		ret = 0;
> +		memset(dst, 0, csum_size);
> +	}
> +	return ret;
> +}
> +
>  /**
>   * btrfs_lookup_bio_sums - Look up checksums for a bio.
> - * @inode: inode that the bio is for.
> - * @bio: bio to look up.
> - * @offset: Unless (u64)-1, look up checksums for this offset in the file.
> - *          If (u64)-1, use the page offsets from the bio instead.
> - * @dst: Buffer of size nblocks * btrfs_super_csum_size() used to return
> - *       checksum (nblocks = bio->bi_iter.bi_size / fs_info->sectorsize). If
> - *       NULL, the checksum buffer is allocated and returned in
> - *       btrfs_io_bio(bio)->csum instead.
> + * @inode:		Inode that the bio is for.
> + * @bio:		Bio to look up.
> + *			NOTE: The bio is only used to determine the file_offset
> + *			and length.
> + * @file_offset:	File offset of the bio.
> + * 			If (u64)-1, will use the bio to determine the
> + * 			file offset.
> + * @dst:		Csum destination.
> + * 			Should be at least (bio->bi_iter.bi_size /
> + * 			fs_info->sectorsize * csum_size) bytes in size.

This is wasting too much vertical space, please align the descriptions
by @file_offset: + 1 space after

>   *
>   * Return: BLK_STS_RESOURCE if allocating memory fails, BLK_STS_OK otherwise.
>   */
>  blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio,
> -				   u64 offset, u8 *dst)
> +				   u64 file_offset, u8 *dst)
>  {
>  	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> -	struct bio_vec bvec;
> -	struct bvec_iter iter;
> -	struct btrfs_csum_item *item = NULL;
>  	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
>  	struct btrfs_path *path;
> -	const bool page_offsets = (offset == (u64)-1);
> +	u32 csum_size = btrfs_super_csum_size(fs_info->super_copy);

fs_info->csum_size

> +	u32 sectorsize = fs_info->sectorsize;
> +	u64 orig_file_offset;
> +	u64 orig_len;
> +	u64 orig_disk_bytenr = bio->bi_iter.bi_sector << 9;
> +	/* Current file offset, is used to calculate all other values */
> +	u64 cur_offset;
>  	u8 *csum;
> -	u64 item_start_offset = 0;
> -	u64 item_last_offset = 0;
> -	u64 disk_bytenr;
> -	u64 page_bytes_left;
> -	u32 diff;
>  	int nblocks;
>  	int count = 0;
> -	u16 csum_size = btrfs_super_csum_size(fs_info->super_copy);
> +
> +	if (file_offset == (u64)-1)
> +		orig_file_offset = page_offset(bio_first_page_all(bio)) +
> +				   bio_first_bvec_all(bio)->bv_offset;
> +	else
> +		orig_file_offset = file_offset;
> +
> +	orig_len = bio->bi_iter.bi_size;
> +	nblocks = orig_len >> inode->i_sb->s_blocksize_bits;
>  
>  	path = btrfs_alloc_path();
>  	if (!path)
>  		return BLK_STS_RESOURCE;
>  
> -	nblocks = bio->bi_iter.bi_size >> inode->i_sb->s_blocksize_bits;
>  	if (!dst) {
>  		struct btrfs_io_bio *btrfs_bio = btrfs_io_bio(bio);
>  
> @@ -313,85 +386,60 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio,
>  		path->skip_locking = 1;
>  	}
>  
> -	disk_bytenr = (u64)bio->bi_iter.bi_sector << 9;
> -
> -	bio_for_each_segment(bvec, bio, iter) {
> -		page_bytes_left = bvec.bv_len;
> -		if (count)
> -			goto next;
> -
> -		if (page_offsets)
> -			offset = page_offset(bvec.bv_page) + bvec.bv_offset;
> -		count = btrfs_find_ordered_sum(inode, offset, disk_bytenr,
> -					       csum, nblocks);
> +	/*
> +	 * In fact, for csum lookup we don't really need bio at all.
> +	 *
> +	 * We know the on-disk bytenr, the file_offset, and length.
> +	 * That's enough to search csum. The bio is in fact just a distraction
> +	 * and following bio bvec would make thing much hard to go.
> +	 * As we could have subpage bvec (with different bv_len) and non-linear
> +	 * bvec.
> +	 *
> +	 * So here we don't bother bio at all, just use @cur_offset to do the
> +	 * iteration.

This comment style is maybe more suitable for changelog but in the code
it comes in the context of the code so I'd expect something like:

	We don't need to use bio because we already know the on-disk
	bytenr, file_offset and length. For subpage bvec we can even
	have different bv_len than PAGE_SIZE or non-linear bvec.

Though, if there's redundant information in the bio it can be
cross-checked by asserts.

> +	 */
> +	for (cur_offset = orig_file_offset; cur_offset < orig_file_offset + orig_len;
> +	     cur_offset += count * sectorsize) {
> +		u64 cur_disk_bytenr;
> +		int search_len = orig_file_offset + orig_len - cur_offset;
> +		int diff_sectors;

Int types mixed with u64

> +		u8 *csum_dst;
> +
> +		diff_sectors = div_u64(cur_offset - orig_file_offset,
> +				       sectorsize);
> +		cur_disk_bytenr = orig_disk_bytenr +
> +				  diff_sectors * sectorsize;
> +		csum_dst = csum + diff_sectors * csum_size;
> +
> +		count = btrfs_find_ordered_sum(inode, cur_offset,
> +					       cur_disk_bytenr, csum_dst,
> +					       search_len / sectorsize);
>  		if (count)
> -			goto found;
> -
> -		if (!item || disk_bytenr < item_start_offset ||
> -		    disk_bytenr >= item_last_offset) {
> -			struct btrfs_key found_key;
> -			u32 item_size;
> -
> -			if (item)
> -				btrfs_release_path(path);
> -			item = btrfs_lookup_csum(NULL, fs_info->csum_root,
> -						 path, disk_bytenr, 0);
> -			if (IS_ERR(item)) {
> -				count = 1;
> -				memset(csum, 0, csum_size);
> -				if (BTRFS_I(inode)->root->root_key.objectid ==
> -				    BTRFS_DATA_RELOC_TREE_OBJECTID) {
> -					set_extent_bits(io_tree, offset,
> -						offset + fs_info->sectorsize - 1,
> -						EXTENT_NODATASUM);
> -				} else {
> -					btrfs_info_rl(fs_info,
> -						   "no csum found for inode %llu start %llu",
> -					       btrfs_ino(BTRFS_I(inode)), offset);
> -				}
> -				item = NULL;
> -				btrfs_release_path(path);
> -				goto found;
> +			continue;
> +		count = find_csum_tree_sums(fs_info, path, cur_disk_bytenr,
> +					    search_len, csum_dst);
> +		if (!count) {
> +			/*
> +			 * For not found case, the csum has been zeroed
> +			 * in find_csum_tree_sums() already, just skip
> +			 * to next sector.
> +			 */
> +			count = 1;
> +			if (BTRFS_I(inode)->root->root_key.objectid ==
> +			    BTRFS_DATA_RELOC_TREE_OBJECTID) {
> +				set_extent_bits(io_tree, cur_offset,
> +					cur_offset + sectorsize - 1,
> +					EXTENT_NODATASUM);
> +			} else {
> +				btrfs_warn_rl(fs_info,
> +		"csum hole found for root %lld inode %llu range [%llu, %llu)",
> +				BTRFS_I(inode)->root->root_key.objectid,
> +				btrfs_ino(BTRFS_I(inode)),
> +				cur_offset, cur_offset + sectorsize);
>  			}
> -			btrfs_item_key_to_cpu(path->nodes[0], &found_key,
> -					      path->slots[0]);
> -
> -			item_start_offset = found_key.offset;
> -			item_size = btrfs_item_size_nr(path->nodes[0],
> -						       path->slots[0]);
> -			item_last_offset = item_start_offset +
> -				(item_size / csum_size) *
> -				fs_info->sectorsize;
> -			item = btrfs_item_ptr(path->nodes[0], path->slots[0],
> -					      struct btrfs_csum_item);
> -		}
> -		/*
> -		 * this byte range must be able to fit inside
> -		 * a single leaf so it will also fit inside a u32
> -		 */
> -		diff = disk_bytenr - item_start_offset;
> -		diff = diff / fs_info->sectorsize;
> -		diff = diff * csum_size;
> -		count = min_t(int, nblocks, (item_last_offset - disk_bytenr) >>
> -					    inode->i_sb->s_blocksize_bits);
> -		read_extent_buffer(path->nodes[0], csum,
> -				   ((unsigned long)item) + diff,
> -				   csum_size * count);
> -found:
> -		csum += count * csum_size;
> -		nblocks -= count;
> -next:
> -		while (count > 0) {
> -			count--;
> -			disk_bytenr += fs_info->sectorsize;
> -			offset += fs_info->sectorsize;
> -			page_bytes_left -= fs_info->sectorsize;
> -			if (!page_bytes_left)
> -				break; /* move to next bio */
>  		}
>  	}
>  
> -	WARN_ON_ONCE(count);
>  	btrfs_free_path(path);
>  	return BLK_STS_OK;
>  }
> -- 
> 2.29.1

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] btrfs: ordered-data: rename parameter @len to @nr_sectors
  2020-11-03 19:23     ` Amy Parker
@ 2020-11-03 23:36       ` Qu Wenruo
  0 siblings, 0 replies; 12+ messages in thread
From: Qu Wenruo @ 2020-11-03 23:36 UTC (permalink / raw)
  To: Amy Parker, dsterba, Qu Wenruo, linux-btrfs





On 2020/11/4 3:23 AM, Amy Parker wrote:
> On Tue, Nov 3, 2020 at 11:18 AM David Sterba <dsterba@suse.cz> wrote:
>>
>> On Wed, Oct 28, 2020 at 03:24:31PM +0800, Qu Wenruo wrote:
>>> The parameter is the number of sectors of the range to search.
>>> While most "len" we used in other locations are in byte size, this can
>>> lead to confusion.
>>>
>>> Rename @len to @nr_sectors to make it more clear and avoid confusion.
>>>
>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>> ---
>>>  fs/btrfs/ordered-data.c | 9 ++++++---
>>>  fs/btrfs/ordered-data.h | 2 +-
>>>  2 files changed, 7 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
>>> index ebac13389e7e..10c13f8a1603 100644
>>> --- a/fs/btrfs/ordered-data.c
>>> +++ b/fs/btrfs/ordered-data.c
>>> @@ -802,9 +802,11 @@ btrfs_lookup_first_ordered_extent(struct inode *inode, u64 file_offset)
>>>   * search the ordered extents for one corresponding to 'offset' and
>>>   * try to find a checksum.  This is used because we allow pages to
>>>   * be reclaimed before their checksum is actually put into the btree
>>> + *
>>> + * @nr_sectors:      The length of the search range, in sectors.
>>
>> Please add all parameters
> 
> Yeah, having all parameters will be important for documentation
> purposes. Shouldn't need too much effort, just quick descriptions
> of the parameters.

No need anymore, as the function will be removed completely.

Thanks,
Qu
> 
> Best regards,
> Amy Parker
> (they/them)
> 
>>
>>>   */
>>>  int btrfs_find_ordered_sum(struct inode *inode, u64 offset, u64 disk_bytenr,
>>> -                        u8 *sum, int len)
>>> +                        u8 *sum, int nr_sectors)




* Re: [PATCH 3/3] btrfs: file-item: refactor btrfs_lookup_bio_sums() to handle out-of-order bvecs
  2020-11-03 19:46   ` David Sterba
@ 2020-11-03 23:42     ` Qu Wenruo
  2020-11-16 16:27       ` David Sterba
  0 siblings, 1 reply; 12+ messages in thread
From: Qu Wenruo @ 2020-11-03 23:42 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs





On 2020/11/4 3:46 AM, David Sterba wrote:
> On Wed, Oct 28, 2020 at 03:24:32PM +0800, Qu Wenruo wrote:
>>  
>> +/*
>> + * Helper to find csums for logical bytenr range
> 
>     * Find checksums for logical bytenr range
> 
> 'Helper' or 'This function' are redundant
> 
>> + * [disk_bytenr, disk_bytenr + len) and restore the result to @dst.
>                                            ^^^^^^^
> 					   store
> 
> So this replaces the individual parameter description and is better
> readable, ok.
> 
>> + *
>> + * Return >0 for the number of sectors we found.
>> + * Return 0 for the range [disk_bytenr, disk_bytenr + sectorsize) has no csum
>> + * for it. Caller may want to try next sector until one range is hit.
>> + * Return <0 for fatal error.
>> + */
>> +static int find_csum_tree_sums(struct btrfs_fs_info *fs_info,
>> +			       struct btrfs_path *path, u64 disk_bytenr,
>> +			       u64 len, u8 *dst)
>> +{
>> +	struct btrfs_csum_item *item = NULL;
>> +	struct btrfs_key key;
>> +	u32 csum_size = btrfs_super_csum_size(fs_info->super_copy);
> 
> This could use the fs_info->csum_size now that we have it
> 
>> +	u32 sectorsize = fs_info->sectorsize;
>> +	int ret;
>> +	u64 csum_start;
>> +	u64 csum_len;
>> +
>> +	ASSERT(IS_ALIGNED(disk_bytenr, sectorsize) &&
>> +	       IS_ALIGNED(len, sectorsize));
>> +
>> +	/* Check if the current csum item covers disk_bytenr */
>> +	if (path->nodes[0]) {
>> +		item = btrfs_item_ptr(path->nodes[0], path->slots[0],
>> +				      struct btrfs_csum_item);
>> +		btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
>> +		csum_start = key.offset;
>> +		csum_len = btrfs_item_size_nr(path->nodes[0], path->slots[0]) /
>> +			   csum_size * sectorsize;
> 
> 			path->slots[0]) / csum_size * sectorsize
> 
> This expresission would be better on one line

But it's already over 80 characters.

Or maybe I could use a small helper to do the csum_len calculation like
calc_csum_length(path)?


> 
>> +		if (csum_start <= disk_bytenr &&
>> +		    csum_start + csum_len > disk_bytenr)
>> +			goto found;
>> +	}
>> +
>> +	/* Current item doesn't contain the desired range, re-search */
>                                                            ^^^^^^^^^
> 							   search again
> 
>> +	btrfs_release_path(path);
>> +	item = btrfs_lookup_csum(NULL, fs_info->csum_root, path,
>> +				 disk_bytenr, 0);
>> +	if (IS_ERR(item)) {
>> +		ret = PTR_ERR(item);
>> +		goto out;
>> +	}
>> +found:
>> +	btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
>> +	csum_start = key.offset;
>> +	csum_len = btrfs_item_size_nr(path->nodes[0], path->slots[0]) /
>> +		   csum_size * sectorsize;
> 
> same, on one line
> 
>> +	ASSERT(csum_start <= disk_bytenr &&
>> +	       csum_start + csum_len > disk_bytenr);
>> +
>> +	ret = div_u64(min(csum_start + csum_len, disk_bytenr + len) -
>> +		      disk_bytenr, sectorsize);
> 
> This can use the sectorsize_bits

Already done in the latest version. "[PATCH 24/32] btrfs: file-item:
refactor btrfs_lookup_bio_sums() to handle out-of-order bvecs".

...
>> +	/*
>> +	 * In fact, for csum lookup we don't really need bio at all.
>> +	 *
>> +	 * We know the on-disk bytenr, the file_offset, and length.
>> +	 * That's enough to search csum. The bio is in fact just a distraction
>> +	 * and following bio bvec would make thing much hard to go.
>> +	 * As we could have subpage bvec (with different bv_len) and non-linear
>> +	 * bvec.
>> +	 *
>> +	 * So here we don't bother bio at all, just use @cur_offset to do the
>> +	 * iteration.
> 
> This comment style is maybe more suitable for changelog but in the code
> it comes in the context of the code so I'd expect something like:
> 
> 	We don't need to use bio because we already know the on-disk
> 	bytenr, file_offset and length. For subpage bvec we can even
> 	have different bv_len than PAGE_SIZE or non-linear bvec.

Indeed looks better.

> 
> Though, if there's redundant information in the bio it can be
> cross-checked by asserts.

The assert() can be done, but since we no longer do bvec iteration at
all, I doubt it's really needed.

Thanks,
Qu

> 
>> +	 */
>> +	for (cur_offset = orig_file_offset; cur_offset < orig_file_offset + orig_len;
>> +	     cur_offset += count * sectorsize) {
>> +		u64 cur_disk_bytenr;
>> +		int search_len = orig_file_offset + orig_len - cur_offset;
>> +		int diff_sectors;
> 
> Int types mixed with u64
> 
>> +		u8 *csum_dst;
>> +
>> +		diff_sectors = div_u64(cur_offset - orig_file_offset,
>> +				       sectorsize);
>> +		cur_disk_bytenr = orig_disk_bytenr +
>> +				  diff_sectors * sectorsize;
>> +		csum_dst = csum + diff_sectors * csum_size;
>> +
>> +		count = btrfs_find_ordered_sum(inode, cur_offset,
>> +					       cur_disk_bytenr, csum_dst,
>> +					       search_len / sectorsize);
>>  		if (count)
>> -			goto found;
>> -
>> -		if (!item || disk_bytenr < item_start_offset ||
>> -		    disk_bytenr >= item_last_offset) {
>> -			struct btrfs_key found_key;
>> -			u32 item_size;
>> -
>> -			if (item)
>> -				btrfs_release_path(path);
>> -			item = btrfs_lookup_csum(NULL, fs_info->csum_root,
>> -						 path, disk_bytenr, 0);
>> -			if (IS_ERR(item)) {
>> -				count = 1;
>> -				memset(csum, 0, csum_size);
>> -				if (BTRFS_I(inode)->root->root_key.objectid ==
>> -				    BTRFS_DATA_RELOC_TREE_OBJECTID) {
>> -					set_extent_bits(io_tree, offset,
>> -						offset + fs_info->sectorsize - 1,
>> -						EXTENT_NODATASUM);
>> -				} else {
>> -					btrfs_info_rl(fs_info,
>> -						   "no csum found for inode %llu start %llu",
>> -					       btrfs_ino(BTRFS_I(inode)), offset);
>> -				}
>> -				item = NULL;
>> -				btrfs_release_path(path);
>> -				goto found;
>> +			continue;
>> +		count = find_csum_tree_sums(fs_info, path, cur_disk_bytenr,
>> +					    search_len, csum_dst);
>> +		if (!count) {
>> +			/*
>> +			 * For not found case, the csum has been zeroed
>> +			 * in find_csum_tree_sums() already, just skip
>> +			 * to next sector.
>> +			 */
>> +			count = 1;
>> +			if (BTRFS_I(inode)->root->root_key.objectid ==
>> +			    BTRFS_DATA_RELOC_TREE_OBJECTID) {
>> +				set_extent_bits(io_tree, cur_offset,
>> +					cur_offset + sectorsize - 1,
>> +					EXTENT_NODATASUM);
>> +			} else {
>> +				btrfs_warn_rl(fs_info,
>> +		"csum hole found for root %lld inode %llu range [%llu, %llu)",
>> +				BTRFS_I(inode)->root->root_key.objectid,
>> +				btrfs_ino(BTRFS_I(inode)),
>> +				cur_offset, cur_offset + sectorsize);
>>  			}
>> -			btrfs_item_key_to_cpu(path->nodes[0], &found_key,
>> -					      path->slots[0]);
>> -
>> -			item_start_offset = found_key.offset;
>> -			item_size = btrfs_item_size_nr(path->nodes[0],
>> -						       path->slots[0]);
>> -			item_last_offset = item_start_offset +
>> -				(item_size / csum_size) *
>> -				fs_info->sectorsize;
>> -			item = btrfs_item_ptr(path->nodes[0], path->slots[0],
>> -					      struct btrfs_csum_item);
>> -		}
>> -		/*
>> -		 * this byte range must be able to fit inside
>> -		 * a single leaf so it will also fit inside a u32
>> -		 */
>> -		diff = disk_bytenr - item_start_offset;
>> -		diff = diff / fs_info->sectorsize;
>> -		diff = diff * csum_size;
>> -		count = min_t(int, nblocks, (item_last_offset - disk_bytenr) >>
>> -					    inode->i_sb->s_blocksize_bits);
>> -		read_extent_buffer(path->nodes[0], csum,
>> -				   ((unsigned long)item) + diff,
>> -				   csum_size * count);
>> -found:
>> -		csum += count * csum_size;
>> -		nblocks -= count;
>> -next:
>> -		while (count > 0) {
>> -			count--;
>> -			disk_bytenr += fs_info->sectorsize;
>> -			offset += fs_info->sectorsize;
>> -			page_bytes_left -= fs_info->sectorsize;
>> -			if (!page_bytes_left)
>> -				break; /* move to next bio */
>>  		}
>>  	}
>>  
>> -	WARN_ON_ONCE(count);
>>  	btrfs_free_path(path);
>>  	return BLK_STS_OK;
>>  }
>> -- 
>> 2.29.1




* Re: [PATCH 3/3] btrfs: file-item: refactor btrfs_lookup_bio_sums() to handle out-of-order bvecs
  2020-11-03 23:42     ` Qu Wenruo
@ 2020-11-16 16:27       ` David Sterba
  0 siblings, 0 replies; 12+ messages in thread
From: David Sterba @ 2020-11-16 16:27 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Wed, Nov 04, 2020 at 07:42:21AM +0800, Qu Wenruo wrote:
> >> +	if (path->nodes[0]) {
> >> +		item = btrfs_item_ptr(path->nodes[0], path->slots[0],
> >> +				      struct btrfs_csum_item);
> >> +		btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
> >> +		csum_start = key.offset;
> >> +		csum_len = btrfs_item_size_nr(path->nodes[0], path->slots[0]) /
> >> +			   csum_size * sectorsize;
> > 
> > 			path->slots[0]) / csum_size * sectorsize
> > 
> > This expresission would be better on one line
> 
> But it's already over 80 charactors.
> 
> Or maybe I could use a small helper to do the csum_len calcuation like
> calc_csum_lenght(path)?

If it's a slight 80-column overflow I join the lines; there are
exceptions, like a line ending with ); or ) {, or when the first part of
the word is enough to understand.


end of thread, other threads:[~2020-11-16 16:28 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-28  7:24 [PATCH 0/3] btrfs: btrfs_lookup_bio_sums() related fixes Qu Wenruo
2020-10-28  7:24 ` [PATCH 1/3] btrfs: file-item: use nodesize to determine whether we need readhead for btrfs_lookup_bio_sums() Qu Wenruo
2020-10-29  7:25   ` Nikolay Borisov
2020-10-28  7:24 ` [PATCH 2/3] btrfs: ordered-data: rename parameter @len to @nr_sectors Qu Wenruo
2020-10-29  7:32   ` Nikolay Borisov
2020-11-03 19:16   ` David Sterba
2020-11-03 19:23     ` Amy Parker
2020-11-03 23:36       ` Qu Wenruo
2020-10-28  7:24 ` [PATCH 3/3] btrfs: file-item: refactor btrfs_lookup_bio_sums() to handle out-of-order bvecs Qu Wenruo
2020-11-03 19:46   ` David Sterba
2020-11-03 23:42     ` Qu Wenruo
2020-11-16 16:27       ` David Sterba
