* [PATCH v2 00/24] btrfs: preparation patches for subpage support
@ 2020-11-13 12:51 Qu Wenruo
  2020-11-13 12:51 ` [PATCH v2 01/24] btrfs: tests: fix free space tree test failure on 64K page system Qu Wenruo
                   ` (24 more replies)
  0 siblings, 25 replies; 67+ messages in thread
From: Qu Wenruo @ 2020-11-13 12:51 UTC (permalink / raw)
  To: linux-btrfs

This is the rebased preparation branch for all patches not yet merged into
misc-next.

The branch (with RO mount support using page->private) can be fetched
from GitHub:
https://github.com/adam900710/linux/tree/subpage

This patchset includes all the unmerged preparation patches for subpage
support.

The patchset is sent without the main core of subpage support, since I
have learned that bombarding reviewers with a huge patchset won't make
them happy; it only makes the author happy (for a very short time).

Thanks to the hard work from David, there are only 24 patches left
unmerged.

Patch 01:	A special hotfix for the selftests. Should be folded
		into the offending patch.
Patch 02:	Fix to remove invalid test cases for 64K sector size.
Patch 03~08:	Refactors around the endio functions (both data and
		metadata).
Patch 09~14:	Update metadata related handling to support subpage.
Patch 15:	Make extent io bits u32 (no longer utilized by the
		later subpage patches, but the cleanup should still
		make sense).
Patch 16~18:	Refactor btrfs_lookup_bio_sums().
Patch 19~23:	Scrub support for subpage.
		Since scrub is completely unrelated to regular data/meta
		read/write, the scrub support for subpage can be
		implemented independently and easily.
Patch 24:	Page->private related refactor, which will be utilized
		by later subpage patches soon.

Changelog:
v1:
- Separate prep patches from the huge subpage patchset

- Rebased to misc-next

- Add more commit message for patch "btrfs: extent_io: remove the
  extent_start/extent_len for end_bio_extent_readpage()"
  Now with a runtime example to explain why the new code does the same
  thing as the old.

- Fix the assert_spin_lock() usage
  What we really want is lockdep_assert_held()

- Re-iterate the reason why some extent io tests are invalid
  This is especially important since later patches will reduce
  extent_buffer::pages[] to the bare minimum, killing the ability to
  handle certain invalid extent buffers.

- Use sectorsize_bits for division
  During the conversion, we should only use sectorsize_bits for
  division; this avoids the hassle of doing 64-bit division on 32-bit
  systems.
  But we should not use sectorsize_bits blindly, as a bit shift is not
  as self-explanatory as a multiplication/division (see the sketch
  right after this changelog).

- Address the comments for the btrfs_lookup_bio_sums() cleanup patchset
  From naming to macro usage, addressing those comments should further
  improve the readability.

v2:
- Remove new extent_io tree features
  Now we won't utilize the extent io tree for subpage support, thus the
  new features, along with some aggressive refactoring, are no longer
  needed.

- Reduce extent_io tree operations to reduce endio latency
  Although the extent_io tree can track a lot of things, like page
  status, it has obvious overhead, namely the tree search.
  So keep the original behavior by calling the extent_io operations
  only on one big merged extent, to reduce latency.


Qu Wenruo (24):
  btrfs: tests: fix free space tree test failure on 64K page system
  btrfs: extent-io-tests: remove invalid tests
  btrfs: extent_io: replace extent_start/extent_len with better
    structure for end_bio_extent_readpage()
  btrfs: extent_io: introduce helper to handle page status update in
    end_bio_extent_readpage()
  btrfs: extent_io: extract the btree page submission code into its own
    helper function
  btrfs: remove the phy_offset parameter for
    btrfs_validate_metadata_buffer()
  btrfs: pass bio_offset to check_data_csum() directly
  btrfs: inode: make btrfs_verify_data_csum() follow sector size
  btrfs: extent_io: calculate inline extent buffer page size based on
    page size
  btrfs: introduce a helper to determine if the sectorsize is smaller
    than PAGE_SIZE
  btrfs: extent_io: don't allow tree block to cross page boundary for
    subpage support
  btrfs: extent_io: update num_extent_pages() to support subpage sized
    extent buffer
  btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors
  btrfs: disk-io: only clear EXTENT_LOCK bit for extent_invalidatepage()
  btrfs: extent-io: make type of extent_state::state to be at least 32
    bits
  btrfs: file-item: use nodesize to determine whether we need readahead
    for btrfs_lookup_bio_sums()
  btrfs: file-item: remove the btrfs_find_ordered_sum() call in
    btrfs_lookup_bio_sums()
  btrfs: file-item: refactor btrfs_lookup_bio_sums() to handle
    out-of-order bvecs
  btrfs: scrub: remove the anonymous structure from scrub_page
  btrfs: scrub: always allocate one full page for one sector for RAID56
  btrfs: scrub: support subpage tree block scrub
  btrfs: scrub: support subpage data scrub
  btrfs: scrub: allow scrub to work with subpage sectorsize
  btrfs: extent_io: Use detach_page_private() for alloc_extent_buffer()

 fs/btrfs/compression.c           |   5 +-
 fs/btrfs/ctree.c                 |   5 +-
 fs/btrfs/ctree.h                 |  47 +++-
 fs/btrfs/disk-io.c               |   2 +-
 fs/btrfs/disk-io.h               |   2 +-
 fs/btrfs/extent-io-tree.h        |  33 +--
 fs/btrfs/extent_io.c             | 403 +++++++++++++++++++------------
 fs/btrfs/extent_io.h             |  21 +-
 fs/btrfs/file-item.c             | 259 +++++++++++++-------
 fs/btrfs/inode.c                 |  43 ++--
 fs/btrfs/ordered-data.c          |  44 ----
 fs/btrfs/ordered-data.h          |   2 -
 fs/btrfs/scrub.c                 |  57 +++--
 fs/btrfs/struct-funcs.c          |  18 +-
 fs/btrfs/tests/btrfs-tests.c     |   1 +
 fs/btrfs/tests/extent-io-tests.c |  26 +-
 16 files changed, 584 insertions(+), 384 deletions(-)

-- 
2.29.2



* [PATCH v2 01/24] btrfs: tests: fix free space tree test failure on 64K page system
  2020-11-13 12:51 [PATCH v2 00/24] btrfs: preparation patches for subpage support Qu Wenruo
@ 2020-11-13 12:51 ` Qu Wenruo
  2020-11-17 11:53   ` Nikolay Borisov
  2020-11-13 12:51 ` [PATCH v2 02/24] btrfs: extent-io-tests: remove invalid tests Qu Wenruo
                   ` (23 subsequent siblings)
  24 siblings, 1 reply; 67+ messages in thread
From: Qu Wenruo @ 2020-11-13 12:51 UTC (permalink / raw)
  To: linux-btrfs

When loading btrfs with selftests compiled in on a 64K page system, the
tests always fail:

  BTRFS: selftest: running free space tree tests
  BTRFS: selftest: fs/btrfs/tests/free-space-tree-tests.c:101 free space tree is invalid
  BTRFS: selftest: fs/btrfs/tests/free-space-tree-tests.c:529 test_empty_block_group [btrfs] failed with extents, sectorsize=65536, nodesize=65536, alignment=134217728

The cause is that, after commit 801dc0c9ff1f ("btrfs: replace div_u64 by
shift in free_space_bitmap_size"), we use fs_info->sectorsize_bits for
the free space tree bitmap size calculation.

But in commit fc59cfa7d2ab ("btrfs: use precalculated sectorsize_bits
from fs_info"), we only initialized sectorsize_bits for the non-testing
environment, leaving the default value of ilog2(4K) in place and
breaking the selftests on 64K page systems.
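
For reference, the calculation that goes wrong is roughly the following
(a sketch; the exact helper in free-space-tree.c may differ in detail):

	static unsigned long free_space_bitmap_size(const struct btrfs_fs_info *fs_info,
						    u64 size)
	{
		/*
		 * With sectorsize == 64K but sectorsize_bits left at the
		 * default ilog2(4K) == 12, the shift overcounts the number
		 * of sectors by 16x, so the free space tree layout no
		 * longer matches what the selftest expects.
		 */
		return DIV_ROUND_UP(size >> fs_info->sectorsize_bits,
				    BITS_PER_BYTE);
	}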

Fix it by also initializing sectorsize_bits in the selftests.

David, please fold this fix into the offending commit.

Fixes: fc59cfa7d2ab ("btrfs: use precalculated sectorsize_bits from fs_info")
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/tests/btrfs-tests.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/tests/btrfs-tests.c b/fs/btrfs/tests/btrfs-tests.c
index 8519f7746b2e..8ca334d554af 100644
--- a/fs/btrfs/tests/btrfs-tests.c
+++ b/fs/btrfs/tests/btrfs-tests.c
@@ -134,6 +134,7 @@ struct btrfs_fs_info *btrfs_alloc_dummy_fs_info(u32 nodesize, u32 sectorsize)
 
 	fs_info->nodesize = nodesize;
 	fs_info->sectorsize = sectorsize;
+	fs_info->sectorsize_bits = ilog2(sectorsize);
 	set_bit(BTRFS_FS_STATE_DUMMY_FS_INFO, &fs_info->fs_state);
 
 	test_mnt->mnt_sb->s_fs_info = fs_info;
-- 
2.29.2



* [PATCH v2 02/24] btrfs: extent-io-tests: remove invalid tests
  2020-11-13 12:51 [PATCH v2 00/24] btrfs: preparation patches for subpage support Qu Wenruo
  2020-11-13 12:51 ` [PATCH v2 01/24] btrfs: tests: fix free space tree test failure on 64K page system Qu Wenruo
@ 2020-11-13 12:51 ` Qu Wenruo
  2020-11-13 18:42   ` Josef Bacik
  2020-11-19 21:08   ` David Sterba
  2020-11-13 12:51 ` [PATCH v2 03/24] btrfs: extent_io: replace extent_start/extent_len with better structure for end_bio_extent_readpage() Qu Wenruo
                   ` (22 subsequent siblings)
  24 siblings, 2 replies; 67+ messages in thread
From: Qu Wenruo @ 2020-11-13 12:51 UTC (permalink / raw)
  To: linux-btrfs

In extent-io-test, there are two invalid tests:
- Invalid nodesize for test_eb_bitmaps()
  Instead of the sectorsize and nodesize combination passed in, we
  always use a hand-crafted nodesize, e.g.:

	len = (sectorsize < BTRFS_MAX_METADATA_BLOCKSIZE)
		? sectorsize * 4 : sectorsize;

  In the above case, with a 32K page size we would get a length of
  128K, which is beyond the max node size and obviously invalid.

  Thankfully most machines use either 4K or 64K page size, thus we
  haven't hit this case yet.

- Invalid extent buffer bytenr
  For 64K page size, the only combination we're going to test is
  sectorsize = nodesize = 64K.
  However in that case, we would try to test an eb whose bytenr is not
  sectorsize aligned:

	/* Do it over again with an extent buffer which isn't page-aligned. */
	eb = __alloc_dummy_extent_buffer(fs_info, nodesize / 2, len);

  Sector alignment is a hard requirement for any sector size.
  The only exception is the superblock, but anything else must follow
  sector size alignment.

  This is definitely an invalid test case.

This patch will fix both problems by:
- Honor the sectorsize/nodesize combination
  Now we won't bother to hand-craft a strange length and use it as
  nodesize.

- Use sectorsize as the extent buffer start for the second run
  This tests the case where the extent buffer is aligned to sectorsize
  but not necessarily aligned to nodesize.

Please note that later subpage related cleanups will reduce
extent_buffer::pages[] to exactly what we need, making sector
unaligned extent buffer operations cause problems.

Since only the extent_io selftests utilize this invalid feature, this
patch is required for all later cleanups/refactors.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/tests/extent-io-tests.c | 26 +++++++++++---------------
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/tests/extent-io-tests.c b/fs/btrfs/tests/extent-io-tests.c
index df7ce874a74b..73e96d505f4f 100644
--- a/fs/btrfs/tests/extent-io-tests.c
+++ b/fs/btrfs/tests/extent-io-tests.c
@@ -379,54 +379,50 @@ static int __test_eb_bitmaps(unsigned long *bitmap, struct extent_buffer *eb,
 static int test_eb_bitmaps(u32 sectorsize, u32 nodesize)
 {
 	struct btrfs_fs_info *fs_info;
-	unsigned long len;
 	unsigned long *bitmap = NULL;
 	struct extent_buffer *eb = NULL;
 	int ret;
 
 	test_msg("running extent buffer bitmap tests");
 
-	/*
-	 * In ppc64, sectorsize can be 64K, thus 4 * 64K will be larger than
-	 * BTRFS_MAX_METADATA_BLOCKSIZE.
-	 */
-	len = (sectorsize < BTRFS_MAX_METADATA_BLOCKSIZE)
-		? sectorsize * 4 : sectorsize;
-
-	fs_info = btrfs_alloc_dummy_fs_info(len, len);
+	fs_info = btrfs_alloc_dummy_fs_info(nodesize, sectorsize);
 	if (!fs_info) {
 		test_std_err(TEST_ALLOC_FS_INFO);
 		return -ENOMEM;
 	}
 
-	bitmap = kmalloc(len, GFP_KERNEL);
+	bitmap = kmalloc(nodesize, GFP_KERNEL);
 	if (!bitmap) {
 		test_err("couldn't allocate test bitmap");
 		ret = -ENOMEM;
 		goto out;
 	}
 
-	eb = __alloc_dummy_extent_buffer(fs_info, 0, len);
+	eb = __alloc_dummy_extent_buffer(fs_info, 0, nodesize);
 	if (!eb) {
 		test_std_err(TEST_ALLOC_ROOT);
 		ret = -ENOMEM;
 		goto out;
 	}
 
-	ret = __test_eb_bitmaps(bitmap, eb, len);
+	ret = __test_eb_bitmaps(bitmap, eb, nodesize);
 	if (ret)
 		goto out;
 
-	/* Do it over again with an extent buffer which isn't page-aligned. */
 	free_extent_buffer(eb);
-	eb = __alloc_dummy_extent_buffer(fs_info, nodesize / 2, len);
+
+	/*
+	 * Test again for case where the tree block is sectorsize aligned but
+	 * not nodesize aligned.
+	 */
+	eb = __alloc_dummy_extent_buffer(fs_info, sectorsize, nodesize);
 	if (!eb) {
 		test_std_err(TEST_ALLOC_ROOT);
 		ret = -ENOMEM;
 		goto out;
 	}
 
-	ret = __test_eb_bitmaps(bitmap, eb, len);
+	ret = __test_eb_bitmaps(bitmap, eb, nodesize);
 out:
 	free_extent_buffer(eb);
 	kfree(bitmap);
-- 
2.29.2



* [PATCH v2 03/24] btrfs: extent_io: replace extent_start/extent_len with better structure for end_bio_extent_readpage()
  2020-11-13 12:51 [PATCH v2 00/24] btrfs: preparation patches for subpage support Qu Wenruo
  2020-11-13 12:51 ` [PATCH v2 01/24] btrfs: tests: fix free space tree test failure on 64K page system Qu Wenruo
  2020-11-13 12:51 ` [PATCH v2 02/24] btrfs: extent-io-tests: remove invalid tests Qu Wenruo
@ 2020-11-13 12:51 ` Qu Wenruo
  2020-11-13 19:13   ` Josef Bacik
                     ` (2 more replies)
  2020-11-13 12:51 ` [PATCH v2 04/24] btrfs: extent_io: introduce helper to handle page status update in end_bio_extent_readpage() Qu Wenruo
                   ` (21 subsequent siblings)
  24 siblings, 3 replies; 67+ messages in thread
From: Qu Wenruo @ 2020-11-13 12:51 UTC (permalink / raw)
  To: linux-btrfs

In end_bio_extent_readpage() we had a strange dance around
extent_start/extent_len.

Hidden behind the strange dance is the fact that it's just calling
endio_readpage_release_extent() on each contiguous bvec range.

Here is an example to explain the original work flow:
  Bio is for inode 257, containing 2 pages, for range [1M, 1M+8K)

  end_bio_extent_readpage() entered
  |- extent_start = 0;
  |- extent_end = 0;
  |- bio_for_each_segment_all() {
  |  |- /* Got the 1st bvec */
  |  |- start = SZ_1M;
  |  |- end = SZ_1M + SZ_4K - 1;
  |  |- update = 1;
  |  |- if (extent_len == 0) {
  |  |  |- extent_start = start; /* SZ_1M */
  |  |  |- extent_len = end + 1 - start; /* SZ_4K */
  |  |  }
  |  |
  |  |- /* Got the 2nd bvec */
  |  |- start = SZ_1M + SZ_4K;
  |  |- end = SZ_1M + SZ_8K - 1;
  |  |- update = 1;
  |  |- if (extent_start + extent_len == start) {
  |  |  |- extent_len += end + 1 - start; /* SZ_8K */
  |  |  }
  |  } /* All bio vec iterated */
  |
  |- if (extent_len) {
     |- endio_readpage_release_extent(tree, extent_start, extent_len,
				      update);
	/* extent_start == SZ_1M, extent_len == SZ_8K, uptodate = 1 */

As the above flow shows, the existing code in end_bio_extent_readpage()
just accumulates extent_start/extent_len and, when the contiguous range
breaks, calls endio_readpage_release_extent() for the range.

However the current behavior leaves several things unconsidered:
- The inode can change
  Pages in a bio don't need contiguous page_offset.
  This means even pages from different inodes can be packed into one
  bio.

- A bvec can cross the page boundary
  With the multi-page bvec feature, bvec->bv_len can go beyond the
  bvec->bv_page boundary.

- Poor readability

This patch addresses the problems by:
- Introducing a proper structure, processed_extent, to record the
  processed extent range
- Integrating the inode/start/end/uptodate checks into
  endio_readpage_release_extent()
- Adding more comments on each step.
  This should greatly improve the readability; now there are only two
  endio_readpage_release_extent() calls in end_bio_extent_readpage().

- Adding an inode contiguity check
  Now we also ensure the inode is the same before checking whether the
  range is contiguous.
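
With the new helper introduced in the diff below, the same example bio
is handled like this (a sketch, values as in the example above):

	/* Bio for inode 257, two 4K pages covering [1M, 1M + 8K) */
	struct processed_extent processed = { 0 };

	/* 1st bvec: @processed is empty, just record [1M, 1M + 4K - 1] */
	endio_readpage_release_extent(&processed, BTRFS_I(inode), SZ_1M,
				      SZ_1M + SZ_4K - 1, true);
	/* 2nd bvec: contig with @processed, only extend end to 1M + 8K - 1 */
	endio_readpage_release_extent(&processed, BTRFS_I(inode), SZ_1M + SZ_4K,
				      SZ_1M + SZ_8K - 1, true);
	/* @inode == NULL forces the accumulated [1M, 1M + 8K) to be released */
	endio_readpage_release_extent(&processed, NULL, 0, 0, 0);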

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 102 +++++++++++++++++++++++++++++--------------
 1 file changed, 69 insertions(+), 33 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 3bbb3bdd395b..b5b3700943e0 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2779,16 +2779,74 @@ static void end_bio_extent_writepage(struct bio *bio)
 	bio_put(bio);
 }
 
+/*
+ * Records previously processed extent range.
+ *
+ * For endio_readpage_release_extent() to handle a full extent range, reducing
+ * the extent io operations.
+ */
+struct processed_extent {
+	struct btrfs_inode *inode;
+	u64 start;	/* file offset in @inode */
+	u64 end;	/* file offset in @inode */
+	bool uptodate;
+};
+
+/*
+ * Try to release processed extent range.
+ *
+ * May not release the extent range right now if the current range is contig
+ * with processed extent.
+ *
+ * Will release the processed extent when any of @inode, @uptodate or the
+ * range contiguity no longer matches the processed range.
+ * Passing @inode == NULL forces the processed extent to be released.
+ */
 static void
-endio_readpage_release_extent(struct extent_io_tree *tree, u64 start, u64 len,
-			      int uptodate)
+endio_readpage_release_extent(struct processed_extent *processed,
+			      struct btrfs_inode *inode, u64 start, u64 end,
+			      bool uptodate)
 {
 	struct extent_state *cached = NULL;
-	u64 end = start + len - 1;
+	struct extent_io_tree *tree;
 
-	if (uptodate && tree->track_uptodate)
-		set_extent_uptodate(tree, start, end, &cached, GFP_ATOMIC);
-	unlock_extent_cached_atomic(tree, start, end, &cached);
+	/* We're the first extent, initialize @processed */
+	if (!processed->inode)
+		goto update;
+
+	/*
+	 * Contiguous with the processed extent, just update the end.
+	 *
+	 * Several things to notice:
+	 * - Bio can be merged as long as on-disk bytenr is contig
+	 *   This means we can have page belonging to other inodes, thus need to
+	 *   check if the inode matches.
+	 * - Bvec can contain range beyond current page for multi-page bvec
+	 *   Thus we need to do processed->end + 1 >= start check
+	 */
+	if (processed->inode == inode && processed->uptodate == uptodate &&
+	    processed->end + 1 >= start && end >= processed->end) {
+		processed->end = end;
+		return;
+	}
+
+	tree = &processed->inode->io_tree;
+	/*
+	 * Now we have a range not contig with processed range, release the
+	 * processed range now.
+	 */
+	if (processed->uptodate && tree->track_uptodate)
+		set_extent_uptodate(tree, processed->start, processed->end,
+				    &cached, GFP_ATOMIC);
+	unlock_extent_cached_atomic(tree, processed->start, processed->end,
+				    &cached);
+
+update:
+	/* Update @processed to current range */
+	processed->inode = inode;
+	processed->start = start;
+	processed->end = end;
+	processed->uptodate = uptodate;
 }
 
 /*
@@ -2808,12 +2866,11 @@ static void end_bio_extent_readpage(struct bio *bio)
 	int uptodate = !bio->bi_status;
 	struct btrfs_io_bio *io_bio = btrfs_io_bio(bio);
 	struct extent_io_tree *tree, *failure_tree;
+	struct processed_extent processed = { 0 };
 	u64 offset = 0;
 	u64 start;
 	u64 end;
 	u64 len;
-	u64 extent_start = 0;
-	u64 extent_len = 0;
 	int mirror;
 	int ret;
 	struct bvec_iter_all iter_all;
@@ -2922,32 +2979,11 @@ static void end_bio_extent_readpage(struct bio *bio)
 		unlock_page(page);
 		offset += len;
 
-		if (unlikely(!uptodate)) {
-			if (extent_len) {
-				endio_readpage_release_extent(tree,
-							      extent_start,
-							      extent_len, 1);
-				extent_start = 0;
-				extent_len = 0;
-			}
-			endio_readpage_release_extent(tree, start,
-						      end - start + 1, 0);
-		} else if (!extent_len) {
-			extent_start = start;
-			extent_len = end + 1 - start;
-		} else if (extent_start + extent_len == start) {
-			extent_len += end + 1 - start;
-		} else {
-			endio_readpage_release_extent(tree, extent_start,
-						      extent_len, uptodate);
-			extent_start = start;
-			extent_len = end + 1 - start;
-		}
+		endio_readpage_release_extent(&processed, BTRFS_I(inode),
+					      start, end, uptodate);
 	}
-
-	if (extent_len)
-		endio_readpage_release_extent(tree, extent_start, extent_len,
-					      uptodate);
+	/* Release the last extent */
+	endio_readpage_release_extent(&processed, NULL, 0, 0, 0);
 	btrfs_io_bio_free_csum(io_bio);
 	bio_put(bio);
 }
-- 
2.29.2



* [PATCH v2 04/24] btrfs: extent_io: introduce helper to handle page status update in end_bio_extent_readpage()
  2020-11-13 12:51 [PATCH v2 00/24] btrfs: preparation patches for subpage support Qu Wenruo
                   ` (2 preceding siblings ...)
  2020-11-13 12:51 ` [PATCH v2 03/24] btrfs: extent_io: replace extent_start/extent_len with better structure for end_bio_extent_readpage() Qu Wenruo
@ 2020-11-13 12:51 ` Qu Wenruo
  2020-11-13 19:18   ` Josef Bacik
                     ` (2 more replies)
  2020-11-13 12:51 ` [PATCH v2 05/24] btrfs: extent_io: extract the btree page submission code into its own helper function Qu Wenruo
                   ` (20 subsequent siblings)
  24 siblings, 3 replies; 67+ messages in thread
From: Qu Wenruo @ 2020-11-13 12:51 UTC (permalink / raw)
  To: linux-btrfs

Introduce a new helper, endio_readpage_update_page_status(), to handle
the page status update in end_bio_extent_readpage().

The refactor itself is nothing really interesting; the point here is to
provide the basis for later subpage support, where the page status
update can be more complex than in the current code.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index b5b3700943e0..caafe44542e8 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2849,6 +2849,17 @@ endio_readpage_release_extent(struct processed_extent *processed,
 	processed->uptodate = uptodate;
 }
 
+static void endio_readpage_update_page_status(struct page *page, bool uptodate)
+{
+	if (uptodate) {
+		SetPageUptodate(page);
+	} else {
+		ClearPageUptodate(page);
+		SetPageError(page);
+	}
+	unlock_page(page);
+}
+
 /*
  * after a readpage IO is done, we need to:
  * clear the uptodate bits on error
@@ -2971,14 +2982,10 @@ static void end_bio_extent_readpage(struct bio *bio)
 			off = offset_in_page(i_size);
 			if (page->index == end_index && off)
 				zero_user_segment(page, off, PAGE_SIZE);
-			SetPageUptodate(page);
-		} else {
-			ClearPageUptodate(page);
-			SetPageError(page);
 		}
-		unlock_page(page);
 		offset += len;
 
+		endio_readpage_update_page_status(page, uptodate);
 		endio_readpage_release_extent(&processed, BTRFS_I(inode),
 					      start, end, uptodate);
 	}
-- 
2.29.2



* [PATCH v2 05/24] btrfs: extent_io: extract the btree page submission code into its own helper function
  2020-11-13 12:51 [PATCH v2 00/24] btrfs: preparation patches for subpage support Qu Wenruo
                   ` (3 preceding siblings ...)
  2020-11-13 12:51 ` [PATCH v2 04/24] btrfs: extent_io: introduce helper to handle page status update in end_bio_extent_readpage() Qu Wenruo
@ 2020-11-13 12:51 ` Qu Wenruo
  2020-11-13 19:22   ` Josef Bacik
                     ` (2 more replies)
  2020-11-13 12:51 ` [PATCH v2 06/24] btrfs: remove the phy_offset parameter for btrfs_validate_metadata_buffer() Qu Wenruo
                   ` (19 subsequent siblings)
  24 siblings, 3 replies; 67+ messages in thread
From: Qu Wenruo @ 2020-11-13 12:51 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

In btree_write_cache_pages() we have the btree page submission routine
buried deeply in a nested loop.

This patch extracts that part of the code into a helper function,
submit_eb_page(), which does the same work.

Also, since submit_eb_page() can now return >0 for a successful extent
buffer submission, remove the "ASSERT(ret <= 0);" line.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/extent_io.c | 116 +++++++++++++++++++++++++------------------
 1 file changed, 69 insertions(+), 47 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index caafe44542e8..fd4845b06989 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3987,10 +3987,75 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
 	return ret;
 }
 
+/*
+ * Submit one page of one extent buffer.
+ *
+ * Submitting one page submits its whole extent buffer, so pages whose
+ * extent buffer has already been submitted are skipped.
+ *
+ * @page:	The page of one extent buffer
+ * @eb_context:	To determine if we need to submit this page. If current page
+ *		belongs to this eb, we don't need to submit.
+ *
+ * Return >0 if we have submitted the extent buffer successfully.
+ * Return 0 if we don't need to do anything for the page.
+ * Return <0 for fatal error.
+ */
+static int submit_eb_page(struct page *page, struct writeback_control *wbc,
+			  struct extent_page_data *epd,
+			  struct extent_buffer **eb_context)
+{
+	struct address_space *mapping = page->mapping;
+	struct extent_buffer *eb;
+	int ret;
+
+	if (!PagePrivate(page))
+		return 0;
+
+	spin_lock(&mapping->private_lock);
+	if (!PagePrivate(page)) {
+		spin_unlock(&mapping->private_lock);
+		return 0;
+	}
+
+	eb = (struct extent_buffer *)page->private;
+
+	/*
+	 * Shouldn't happen and normally this would be a BUG_ON but no sense
+	 * in crashing the users box for something we can survive anyway.
+	 */
+	if (WARN_ON(!eb)) {
+		spin_unlock(&mapping->private_lock);
+		return 0;
+	}
+
+	if (eb == *eb_context) {
+		spin_unlock(&mapping->private_lock);
+		return 0;
+	}
+	ret = atomic_inc_not_zero(&eb->refs);
+	spin_unlock(&mapping->private_lock);
+	if (!ret)
+		return 0;
+
+	*eb_context = eb;
+
+	ret = lock_extent_buffer_for_io(eb, epd);
+	if (ret <= 0) {
+		free_extent_buffer(eb);
+		return ret;
+	}
+	ret = write_one_eb(eb, wbc, epd);
+	free_extent_buffer(eb);
+	if (ret < 0)
+		return ret;
+	return 1;
+}
+
 int btree_write_cache_pages(struct address_space *mapping,
 				   struct writeback_control *wbc)
 {
-	struct extent_buffer *eb, *prev_eb = NULL;
+	struct extent_buffer *eb_context = NULL;
 	struct extent_page_data epd = {
 		.bio = NULL,
 		.extent_locked = 0,
@@ -4036,55 +4101,13 @@ int btree_write_cache_pages(struct address_space *mapping,
 		for (i = 0; i < nr_pages; i++) {
 			struct page *page = pvec.pages[i];
 
-			if (!PagePrivate(page))
-				continue;
-
-			spin_lock(&mapping->private_lock);
-			if (!PagePrivate(page)) {
-				spin_unlock(&mapping->private_lock);
-				continue;
-			}
-
-			eb = (struct extent_buffer *)page->private;
-
-			/*
-			 * Shouldn't happen and normally this would be a BUG_ON
-			 * but no sense in crashing the users box for something
-			 * we can survive anyway.
-			 */
-			if (WARN_ON(!eb)) {
-				spin_unlock(&mapping->private_lock);
+			ret = submit_eb_page(page, wbc, &epd, &eb_context);
+			if (ret == 0)
 				continue;
-			}
-
-			if (eb == prev_eb) {
-				spin_unlock(&mapping->private_lock);
-				continue;
-			}
-
-			ret = atomic_inc_not_zero(&eb->refs);
-			spin_unlock(&mapping->private_lock);
-			if (!ret)
-				continue;
-
-			prev_eb = eb;
-			ret = lock_extent_buffer_for_io(eb, &epd);
-			if (!ret) {
-				free_extent_buffer(eb);
-				continue;
-			} else if (ret < 0) {
-				done = 1;
-				free_extent_buffer(eb);
-				break;
-			}
-
-			ret = write_one_eb(eb, wbc, &epd);
-			if (ret) {
+			if (ret < 0) {
 				done = 1;
-				free_extent_buffer(eb);
 				break;
 			}
-			free_extent_buffer(eb);
 
 			/*
 			 * the filesystem may choose to bump up nr_to_write.
@@ -4105,7 +4128,6 @@ int btree_write_cache_pages(struct address_space *mapping,
 		index = 0;
 		goto retry;
 	}
-	ASSERT(ret <= 0);
 	if (ret < 0) {
 		end_write_bio(&epd, ret);
 		return ret;
-- 
2.29.2



* [PATCH v2 06/24] btrfs: remove the phy_offset parameter for btrfs_validate_metadata_buffer()
  2020-11-13 12:51 [PATCH v2 00/24] btrfs: preparation patches for subpage support Qu Wenruo
                   ` (4 preceding siblings ...)
  2020-11-13 12:51 ` [PATCH v2 05/24] btrfs: extent_io: extract the btree page submission code into its own helper function Qu Wenruo
@ 2020-11-13 12:51 ` Qu Wenruo
  2020-11-19 21:09   ` David Sterba
  2020-11-13 12:51 ` [PATCH v2 07/24] btrfs: pass bio_offset to check_data_csum() directly Qu Wenruo
                   ` (18 subsequent siblings)
  24 siblings, 1 reply; 67+ messages in thread
From: Qu Wenruo @ 2020-11-13 12:51 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

Parameter @phy_offset is the offset against bio->bi_iter.bi_sector.
@phy_offset is mostly used by data IO to look up the csum in the
btrfs_io_bio.

But for metadata it's completely useless, as metadata stores its own
csum in its btrfs_header.

Remove this useless parameter from btrfs_validate_metadata_buffer().

Just an extra note on parameters @start and @end: they are not utilized
at all in the current sectorsize == PAGE_SIZE case, as we can grab the
eb directly from the page.

But those two parameters are very important for later subpage support,
thus @start/@end are not touched here.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/disk-io.c   | 2 +-
 fs/btrfs/disk-io.h   | 2 +-
 fs/btrfs/extent_io.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f0161d2ae2a4..8a558a43818d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -593,7 +593,7 @@ static int validate_extent_buffer(struct extent_buffer *eb)
 	return ret;
 }
 
-int btrfs_validate_metadata_buffer(struct btrfs_io_bio *io_bio, u64 phy_offset,
+int btrfs_validate_metadata_buffer(struct btrfs_io_bio *io_bio,
 				   struct page *page, u64 start, u64 end,
 				   int mirror)
 {
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index e75ea6092942..40e81d6e481e 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -79,7 +79,7 @@ void btrfs_btree_balance_dirty(struct btrfs_fs_info *fs_info);
 void btrfs_btree_balance_dirty_nodelay(struct btrfs_fs_info *fs_info);
 void btrfs_drop_and_free_fs_root(struct btrfs_fs_info *fs_info,
 				 struct btrfs_root *root);
-int btrfs_validate_metadata_buffer(struct btrfs_io_bio *io_bio, u64 phy_offset,
+int btrfs_validate_metadata_buffer(struct btrfs_io_bio *io_bio,
 				   struct page *page, u64 start, u64 end,
 				   int mirror);
 blk_status_t btrfs_submit_metadata_bio(struct inode *inode, struct bio *bio,
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index fd4845b06989..9d394485a60c 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2928,7 +2928,7 @@ static void end_bio_extent_readpage(struct bio *bio)
 							     start, end, mirror);
 			else
 				ret = btrfs_validate_metadata_buffer(io_bio,
-					offset, page, start, end, mirror);
+					page, start, end, mirror);
 			if (ret)
 				uptodate = 0;
 			else
-- 
2.29.2



* [PATCH v2 07/24] btrfs: pass bio_offset to check_data_csum() directly
  2020-11-13 12:51 [PATCH v2 00/24] btrfs: preparation patches for subpage support Qu Wenruo
                   ` (5 preceding siblings ...)
  2020-11-13 12:51 ` [PATCH v2 06/24] btrfs: remove the phy_offset parameter for btrfs_validate_metadata_buffer() Qu Wenruo
@ 2020-11-13 12:51 ` Qu Wenruo
  2020-11-13 12:51 ` [PATCH v2 08/24] btrfs: inode: make btrfs_verify_data_csum() follow sector size Qu Wenruo
                   ` (17 subsequent siblings)
  24 siblings, 0 replies; 67+ messages in thread
From: Qu Wenruo @ 2020-11-13 12:51 UTC (permalink / raw)
  To: linux-btrfs

Parameter @icsum of check_data_csum() is a little hard to understand,
and so is @phy_offset of btrfs_verify_data_csum().

Both parameters are pre-calculated values for the csum lookup.

Instead of passing some calculated value, just pass @bio_offset and let
the final and only user, check_data_csum(), calculate whatever it
needs.
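
To illustrate the new mapping (a sketch with illustrative numbers,
assuming 4K sectorsize and 4-byte crc32c checksums):

	/*
	 * bio_offset = 8192, i.e. the third sector of the bio:
	 *   offset_sectors = bio_offset >> sectorsize_bits = 8192 >> 12 = 2
	 *   csum_expected  = io_bio->csum + offset_sectors * csum_size
	 * which points at the third csum in the array.
	 */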

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ctree.h     |  2 +-
 fs/btrfs/extent_io.c | 14 ++++++++------
 fs/btrfs/inode.c     | 26 ++++++++++++++++----------
 3 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 679da4920c92..99955b6bfc62 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3046,7 +3046,7 @@ u64 btrfs_file_extent_end(const struct btrfs_path *path);
 /* inode.c */
 blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio,
 				   int mirror_num, unsigned long bio_flags);
-int btrfs_verify_data_csum(struct btrfs_io_bio *io_bio, u64 phy_offset,
+int btrfs_verify_data_csum(struct btrfs_io_bio *io_bio, u64 bio_offset,
 			   struct page *page, u64 start, u64 end, int mirror);
 struct extent_map *btrfs_get_extent_fiemap(struct btrfs_inode *inode,
 					   u64 start, u64 len);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 9d394485a60c..37dd103213f9 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2878,7 +2878,7 @@ static void end_bio_extent_readpage(struct bio *bio)
 	struct btrfs_io_bio *io_bio = btrfs_io_bio(bio);
 	struct extent_io_tree *tree, *failure_tree;
 	struct processed_extent processed = { 0 };
-	u64 offset = 0;
+	u64 bio_offset = 0;
 	u64 start;
 	u64 end;
 	u64 len;
@@ -2924,8 +2924,9 @@ static void end_bio_extent_readpage(struct bio *bio)
 		mirror = io_bio->mirror_num;
 		if (likely(uptodate)) {
 			if (is_data_inode(inode))
-				ret = btrfs_verify_data_csum(io_bio, offset, page,
-							     start, end, mirror);
+				ret = btrfs_verify_data_csum(io_bio,
+						bio_offset, page, start, end,
+						mirror);
 			else
 				ret = btrfs_validate_metadata_buffer(io_bio,
 					page, start, end, mirror);
@@ -2953,12 +2954,13 @@ static void end_bio_extent_readpage(struct bio *bio)
 			 * If it can't handle the error it will return -EIO and
 			 * we remain responsible for that page.
 			 */
-			if (!btrfs_submit_read_repair(inode, bio, offset, page,
+			if (!btrfs_submit_read_repair(inode, bio, bio_offset,
+						page,
 						start - page_offset(page),
 						start, end, mirror,
 						btrfs_submit_data_bio)) {
 				uptodate = !bio->bi_status;
-				offset += len;
+				bio_offset += len;
 				continue;
 			}
 		} else {
@@ -2983,7 +2985,7 @@ static void end_bio_extent_readpage(struct bio *bio)
 			if (page->index == end_index && off)
 				zero_user_segment(page, off, PAGE_SIZE);
 		}
-		offset += len;
+		bio_offset += len;
 
 		endio_readpage_update_page_status(page, uptodate);
 		endio_readpage_release_extent(&processed, BTRFS_I(inode),
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 435b270430e3..bcf3152d0efb 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2931,26 +2931,28 @@ void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start,
  * check_data_csum - verify checksum of one sector of uncompressed data
  * @inode:	the inode
  * @io_bio:	btrfs_io_bio which contains the csum
- * @icsum:	checksum index in the io_bio->csum array, size of csum_size
+ * @bio_offset:	the offset to the beginning of the bio (in bytes)
  * @page:	page where is the data to be verified
  * @pgoff:	offset inside the page
  *
  * The length of such check is always one sector size.
  */
 static int check_data_csum(struct inode *inode, struct btrfs_io_bio *io_bio,
-			   int icsum, struct page *page, int pgoff)
+			   u64 bio_offset, struct page *page, int pgoff)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
 	char *kaddr;
 	u32 len = fs_info->sectorsize;
 	const u32 csum_size = fs_info->csum_size;
+	int offset_sectors;
 	u8 *csum_expected;
 	u8 csum[BTRFS_CSUM_SIZE];
 
 	ASSERT(pgoff + len <= PAGE_SIZE);
 
-	csum_expected = ((u8 *)io_bio->csum) + icsum * csum_size;
+	offset_sectors = bio_offset >> fs_info->sectorsize_bits;
+	csum_expected = ((u8 *)io_bio->csum) + offset_sectors * csum_size;
 
 	kaddr = kmap_atomic(page);
 	shash->tfm = fs_info->csum_shash;
@@ -2978,8 +2980,13 @@ static int check_data_csum(struct inode *inode, struct btrfs_io_bio *io_bio,
  * when reads are done, we need to check csums to verify the data is correct
  * if there's a match, we allow the bio to finish.  If not, the code in
  * extent_io.c will try to find good copies for us.
+ *
+ * @bio_offset:	The offset to the beginning of the bio (in bytes)
+ * @start:	The file offset of the range start
+ * @end:	The file offset of the range end (inclusive)
+ * @mirror:	The mirror number
  */
-int btrfs_verify_data_csum(struct btrfs_io_bio *io_bio, u64 phy_offset,
+int btrfs_verify_data_csum(struct btrfs_io_bio *io_bio, u64 bio_offset,
 			   struct page *page, u64 start, u64 end, int mirror)
 {
 	size_t offset = start - page_offset(page);
@@ -3004,8 +3011,7 @@ int btrfs_verify_data_csum(struct btrfs_io_bio *io_bio, u64 phy_offset,
 		return 0;
 	}
 
-	phy_offset >>= root->fs_info->sectorsize_bits;
-	return check_data_csum(inode, io_bio, phy_offset, page, offset);
+	return check_data_csum(inode, io_bio, bio_offset, page, offset);
 }
 
 /*
@@ -7716,7 +7722,7 @@ static blk_status_t btrfs_check_read_dio_bio(struct inode *inode,
 	struct bio_vec bvec;
 	struct bvec_iter iter;
 	u64 start = io_bio->logical;
-	int icsum = 0;
+	u64 bio_offset = 0;
 	blk_status_t err = BLK_STS_OK;
 
 	__bio_for_each_segment(bvec, &io_bio->bio, iter, io_bio->iter) {
@@ -7727,8 +7733,8 @@ static blk_status_t btrfs_check_read_dio_bio(struct inode *inode,
 		for (i = 0; i < nr_sectors; i++) {
 			ASSERT(pgoff < PAGE_SIZE);
 			if (uptodate &&
-			    (!csum || !check_data_csum(inode, io_bio, icsum,
-						       bvec.bv_page, pgoff))) {
+			    (!csum || !check_data_csum(inode, io_bio,
+					bio_offset, bvec.bv_page, pgoff))) {
 				clean_io_failure(fs_info, failure_tree, io_tree,
 						 start, bvec.bv_page,
 						 btrfs_ino(BTRFS_I(inode)),
@@ -7748,7 +7754,7 @@ static blk_status_t btrfs_check_read_dio_bio(struct inode *inode,
 					err = status;
 			}
 			start += sectorsize;
-			icsum++;
+			bio_offset += sectorsize;
 			pgoff += sectorsize;
 		}
 	}
-- 
2.29.2



* [PATCH v2 08/24] btrfs: inode: make btrfs_verify_data_csum() follow sector size
  2020-11-13 12:51 [PATCH v2 00/24] btrfs: preparation patches for subpage support Qu Wenruo
                   ` (6 preceding siblings ...)
  2020-11-13 12:51 ` [PATCH v2 07/24] btrfs: pass bio_offset to check_data_csum() directly Qu Wenruo
@ 2020-11-13 12:51 ` Qu Wenruo
  2020-11-13 19:43   ` Josef Bacik
  2020-11-13 12:51 ` [PATCH v2 09/24] btrfs: extent_io: calculate inline extent buffer page size based on page size Qu Wenruo
                   ` (16 subsequent siblings)
  24 siblings, 1 reply; 67+ messages in thread
From: Qu Wenruo @ 2020-11-13 12:51 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

Currently btrfs_verify_data_csum() just passes the whole page to
check_data_csum(), which is fine since we only support sectorsize ==
PAGE_SIZE.

To support subpage, we need to properly honor per-sector checksum
verification, just like what we do in the dio read path.

This patch does the csum verification in a for loop, starting with
pg_off == start - page_offset(page) and advancing by sectorsize on
each iteration.

For the sectorsize == PAGE_SIZE case, pg_off will always be 0, and we
finish after just one iteration.

For the subpage case, the loop iterates over each sector, and if any
sector fails verification we return an error.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index bcf3152d0efb..3f19e0e19c96 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2989,10 +2989,11 @@ static int check_data_csum(struct inode *inode, struct btrfs_io_bio *io_bio,
 int btrfs_verify_data_csum(struct btrfs_io_bio *io_bio, u64 bio_offset,
 			   struct page *page, u64 start, u64 end, int mirror)
 {
-	size_t offset = start - page_offset(page);
+	u64 pg_off;
 	struct inode *inode = page->mapping->host;
 	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
 	struct btrfs_root *root = BTRFS_I(inode)->root;
+	u32 sectorsize = root->fs_info->sectorsize;
 
 	if (PageChecked(page)) {
 		ClearPageChecked(page);
@@ -3011,7 +3012,16 @@ int btrfs_verify_data_csum(struct btrfs_io_bio *io_bio, u64 bio_offset,
 		return 0;
 	}
 
-	return check_data_csum(inode, io_bio, bio_offset, page, offset);
+	for (pg_off = start - page_offset(page);
+	     pg_off < end - page_offset(page);
+	     pg_off += sectorsize, bio_offset += sectorsize) {
+		int ret;
+
+		ret = check_data_csum(inode, io_bio, bio_offset, page, pg_off);
+		if (ret < 0)
+			return -EIO;
+	}
+	return 0;
 }
 
 /*
-- 
2.29.2



* [PATCH v2 09/24] btrfs: extent_io: calculate inline extent buffer page size based on page size
  2020-11-13 12:51 [PATCH v2 00/24] btrfs: preparation patches for subpage support Qu Wenruo
                   ` (7 preceding siblings ...)
  2020-11-13 12:51 ` [PATCH v2 08/24] btrfs: inode: make btrfs_verify_data_csum() follow sector size Qu Wenruo
@ 2020-11-13 12:51 ` Qu Wenruo
  2020-11-13 19:47   ` Josef Bacik
  2020-11-13 12:51 ` [PATCH v2 10/24] btrfs: introduce a helper to determine if the sectorsize is smaller than PAGE_SIZE Qu Wenruo
                   ` (15 subsequent siblings)
  24 siblings, 1 reply; 67+ messages in thread
From: Qu Wenruo @ 2020-11-13 12:51 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov, David Sterba

Btrfs only supports 64K as the max node size, thus for a 4K page
system, we would have at most 16 pages for one extent buffer.

For a system using 64K page size, we would really have just one single
page.

While we always use 16 pages for extent_buffer::pages[], this means
that for systems using 64K pages we are wasting memory on the 15 pages
which will never be utilized.

So this patch changes how the extent_buffer::pages[] array size is
calculated; now it is derived from BTRFS_MAX_METADATA_BLOCKSIZE and
PAGE_SIZE.

For systems using 4K page size, it will stay 16 pages.
For systems using 64K page size, it will be just 1 page.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/extent_io.c | 7 +------
 fs/btrfs/extent_io.h | 8 +++++---
 2 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 37dd103213f9..ca3eb095a298 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5042,12 +5042,7 @@ __alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start,
 	atomic_set(&eb->refs, 1);
 	atomic_set(&eb->io_pages, 0);
 
-	/*
-	 * Sanity checks, currently the maximum is 64k covered by 16x 4k pages
-	 */
-	BUILD_BUG_ON(BTRFS_MAX_METADATA_BLOCKSIZE
-		> MAX_INLINE_EXTENT_BUFFER_SIZE);
-	BUG_ON(len > MAX_INLINE_EXTENT_BUFFER_SIZE);
+	ASSERT(len <= BTRFS_MAX_METADATA_BLOCKSIZE);
 
 	return eb;
 }
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index c76697fc3120..dfdef9c5c379 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -73,9 +73,11 @@ typedef blk_status_t (submit_bio_hook_t)(struct inode *inode, struct bio *bio,
 
 typedef blk_status_t (extent_submit_bio_start_t)(struct inode *inode,
 		struct bio *bio, u64 bio_offset);
-
-#define INLINE_EXTENT_BUFFER_PAGES 16
-#define MAX_INLINE_EXTENT_BUFFER_SIZE (INLINE_EXTENT_BUFFER_PAGES * PAGE_SIZE)
+/*
+ * SZ_64K equals BTRFS_MAX_METADATA_BLOCKSIZE; it is used here to avoid a
+ * circular include of "ctree.h".
+ */
+#define INLINE_EXTENT_BUFFER_PAGES (SZ_64K / PAGE_SIZE)
 struct extent_buffer {
 	u64 start;
 	unsigned long len;
-- 
2.29.2



* [PATCH v2 10/24] btrfs: introduce a helper to determine if the sectorsize is smaller than PAGE_SIZE
  2020-11-13 12:51 [PATCH v2 00/24] btrfs: preparation patches for subpage support Qu Wenruo
                   ` (8 preceding siblings ...)
  2020-11-13 12:51 ` [PATCH v2 09/24] btrfs: extent_io: calculate inline extent buffer page size based on page size Qu Wenruo
@ 2020-11-13 12:51 ` Qu Wenruo
  2020-11-16 22:51   ` David Sterba
  2020-11-13 12:51 ` [PATCH v2 11/24] btrfs: extent_io: don't allow tree block to cross page boundary for subpage support Qu Wenruo
                   ` (14 subsequent siblings)
  24 siblings, 1 reply; 67+ messages in thread
From: Qu Wenruo @ 2020-11-13 12:51 UTC (permalink / raw)
  To: linux-btrfs

Just to save us several letters for the incoming patches.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ctree.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 99955b6bfc62..93de6134c65c 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3660,6 +3660,11 @@ static inline int btrfs_defrag_cancelled(struct btrfs_fs_info *fs_info)
 	return signal_pending(current);
 }
 
+static inline bool btrfs_is_subpage(struct btrfs_fs_info *fs_info)
+{
+	return (fs_info->sectorsize < PAGE_SIZE);
+}
+
 #define in_range(b, first, len) ((b) >= (first) && (b) < (first) + (len))
 
 /* Sanity test specific functions */
-- 
2.29.2



* [PATCH v2 11/24] btrfs: extent_io: don't allow tree block to cross page boundary for subpage support
  2020-11-13 12:51 [PATCH v2 00/24] btrfs: preparation patches for subpage support Qu Wenruo
                   ` (9 preceding siblings ...)
  2020-11-13 12:51 ` [PATCH v2 10/24] btrfs: introduce a helper to determine if the sectorsize is smaller than PAGE_SIZE Qu Wenruo
@ 2020-11-13 12:51 ` Qu Wenruo
  2020-11-13 12:51 ` [PATCH v2 12/24] btrfs: extent_io: update num_extent_pages() to support subpage sized extent buffer Qu Wenruo
                   ` (13 subsequent siblings)
  24 siblings, 0 replies; 67+ messages in thread
From: Qu Wenruo @ 2020-11-13 12:51 UTC (permalink / raw)
  To: linux-btrfs

As a preparation for subpage sector size support (allowing filesystems
with a sector size smaller than the page size to be mounted): if the
sector size is smaller than the page size, we don't allow a tree block
to be read if it crosses a 64K(*) boundary.

The 64K is selected because:
- We are only going to support 64K page size for subpage for now
- 64K is also the max node size btrfs supports

This ensures that tree blocks are always contained in one page on a
system with 64K page size, which greatly simplifies the handling.
Otherwise we would need complex multi-page handling for tree blocks.

Currently there is no way to create such tree blocks.
The kernel has avoided allocating such tree blocks even on 4K page
systems, as such blocks can cause problems for RAID56 stripe scrubbing.

btrfs-progs, meanwhile, has had its chunk allocator fixed for convert
since 2016, and has extra checks to enforce the same behavior as the
kernel.

Just add such graceful checks for ancient filesystems.
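
A quick example of the new check (illustrative numbers, 64K page size,
16K nodesize):

	bytenr 48K: offset_in_page(48K) + 16K = 64K	-> fits in the page
	bytenr 56K: offset_in_page(56K) + 16K = 72K	-> crosses, -EINVAL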

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index ca3eb095a298..f87220167f99 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5261,6 +5261,13 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		btrfs_err(fs_info, "bad tree block start %llu", start);
 		return ERR_PTR(-EINVAL);
 	}
+	if (btrfs_is_subpage(fs_info) && offset_in_page(start) + len >
+			PAGE_SIZE) {
+		btrfs_err(fs_info,
+		"tree block crosses page boundary, start %llu nodesize %lu",
+			  start, len);
+		return ERR_PTR(-EINVAL);
+	}
 
 	eb = find_extent_buffer(fs_info, start);
 	if (eb)
-- 
2.29.2



* [PATCH v2 12/24] btrfs: extent_io: update num_extent_pages() to support subpage sized extent buffer
  2020-11-13 12:51 [PATCH v2 00/24] btrfs: preparation patches for subpage support Qu Wenruo
                   ` (10 preceding siblings ...)
  2020-11-13 12:51 ` [PATCH v2 11/24] btrfs: extent_io: don't allow tree block to cross page boundary for subpage support Qu Wenruo
@ 2020-11-13 12:51 ` Qu Wenruo
  2020-11-18 16:22   ` David Sterba
  2020-11-13 12:51 ` [PATCH v2 13/24] btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors Qu Wenruo
                   ` (12 subsequent siblings)
  24 siblings, 1 reply; 67+ messages in thread
From: Qu Wenruo @ 2020-11-13 12:51 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

For subpage sized extent buffers, we have ensured that no extent buffer
will cross a page boundary, thus we only need one page for any extent
buffer.

This patch updates the function num_extent_pages() to handle this case.
Now num_extent_pages() returns 1 for subpage sized extent buffers.
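
Worked out for both cases (illustrative, 16K nodesize):

	4K pages:  round_up(16K, 4K)  >> PAGE_SHIFT = 16K >> 12 = 4 pages
	64K pages: round_up(16K, 64K) >> PAGE_SHIFT = 64K >> 16 = 1 page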

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.h | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index dfdef9c5c379..d39ebaee5ff7 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -206,8 +206,15 @@ void btrfs_readahead_node_child(struct extent_buffer *node, int slot);
 
 static inline int num_extent_pages(const struct extent_buffer *eb)
 {
-	return (round_up(eb->start + eb->len, PAGE_SIZE) >> PAGE_SHIFT) -
-	       (eb->start >> PAGE_SHIFT);
+	/*
+	 * For sectorsize == PAGE_SIZE case, since eb is always aligned to
+	 * sectorsize, it's just eb->len >> PAGE_SHIFT.
+	 *
+	 * For sectorsize < PAGE_SIZE case, we only want to support 64K
+	 * PAGE_SIZE, and ensured all tree blocks won't cross page boundary.
+	 * So in that case we always get 1 page.
+	 */
+	return round_up(eb->len, PAGE_SIZE) >> PAGE_SHIFT;
 }
 
 static inline int extent_buffer_uptodate(const struct extent_buffer *eb)
-- 
2.29.2



* [PATCH v2 13/24] btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors
  2020-11-13 12:51 [PATCH v2 00/24] btrfs: preparation patches for subpage support Qu Wenruo
                   ` (11 preceding siblings ...)
  2020-11-13 12:51 ` [PATCH v2 12/24] btrfs: extent_io: update num_extent_pages() to support subpage sized extent buffer Qu Wenruo
@ 2020-11-13 12:51 ` Qu Wenruo
  2020-11-18 19:30   ` David Sterba
                     ` (2 more replies)
  2020-11-13 12:51 ` [PATCH v2 14/24] btrfs: disk-io: only clear EXTENT_LOCK bit for extent_invalidatepage() Qu Wenruo
                   ` (11 subsequent siblings)
  24 siblings, 3 replies; 67+ messages in thread
From: Qu Wenruo @ 2020-11-13 12:51 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

To support the sectorsize < PAGE_SIZE case, we need to take extra care
with the extent buffer accessors.

Since the sectorsize is smaller than PAGE_SIZE, one page can contain
multiple tree blocks, so we must use eb->start to determine the real
offset to read/write in the extent buffer accessors.

This patch introduces two helpers to do these:
- get_eb_page_index()
  This is to calculate the index to access extent_buffer::pages.
  It's just a simple wrapper around "start >> PAGE_SHIFT".

  For sectorsize == PAGE_SIZE case, nothing is changed.
  For sectorsize < PAGE_SIZE case, we always get index 0, and the
  existing page shift also works fine.

- get_eb_page_offset()
  This is to calculate the offset to access extent_buffer::pages.
  This needs to take extent_buffer::start into consideration.

  For sectorsize == PAGE_SIZE case, extent_buffer::start is always
  aligned to PAGE_SIZE, thus adding extent_buffer::start to
  offset_in_page() won't change the result.
  For sectorsize < PAGE_SIZE case, adding extent_buffer::start gives
  us the correct offset to access.
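
A worked example (illustrative numbers, 64K page size, 4K sectorsize,
16K nodesize): an extent buffer at bytenr 208K lives at offset 16K
inside its single page (208K % 64K == 16K), so reading byte 100 of the
eb becomes:

	unsigned long i = get_eb_page_index(100);	/* 100 >> 16 == 0 */
	size_t off = get_eb_page_offset(eb, 100);	/* 16K + 100 */
	char *kaddr = page_address(eb->pages[i]) + off;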

This patch will touch the following parts to cover all extent buffer
accessors:

- BTRFS_SETGET_HEADER_FUNCS()
- read_extent_buffer()
- read_extent_buffer_to_user()
- memcmp_extent_buffer()
- write_extent_buffer_chunk_tree_uuid()
- write_extent_buffer_fsid()
- write_extent_buffer()
- memzero_extent_buffer()
- copy_extent_buffer_full()
- copy_extent_buffer()
- memcpy_extent_buffer()
- memmove_extent_buffer()
- btrfs_get_token_##bits()
- btrfs_get_##bits()
- btrfs_set_token_##bits()
- btrfs_set_##bits()
- generic_bin_search()

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ctree.c        |  5 ++--
 fs/btrfs/ctree.h        | 38 ++++++++++++++++++++++--
 fs/btrfs/extent_io.c    | 65 ++++++++++++++++++++++++-----------------
 fs/btrfs/struct-funcs.c | 18 +++++++-----
 4 files changed, 87 insertions(+), 39 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 892b467347a3..fd4969883c2e 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1686,10 +1686,11 @@ static noinline int generic_bin_search(struct extent_buffer *eb,
 		oip = offset_in_page(offset);
 
 		if (oip + key_size <= PAGE_SIZE) {
-			const unsigned long idx = offset >> PAGE_SHIFT;
+			const unsigned long idx = get_eb_page_index(offset);
 			char *kaddr = page_address(eb->pages[idx]);
 
-			tmp = (struct btrfs_disk_key *)(kaddr + oip);
+			tmp = (struct btrfs_disk_key *)(kaddr +
+					get_eb_page_offset(eb, offset));
 		} else {
 			read_extent_buffer(eb, &unaligned, offset, key_size);
 			tmp = &unaligned;
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 93de6134c65c..d69ba24401ff 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1552,13 +1552,14 @@ static inline void btrfs_set_token_##name(struct btrfs_map_token *token,\
 #define BTRFS_SETGET_HEADER_FUNCS(name, type, member, bits)		\
 static inline u##bits btrfs_##name(const struct extent_buffer *eb)	\
 {									\
-	const type *p = page_address(eb->pages[0]);			\
+	const type *p = page_address(eb->pages[0]) +			\
+			offset_in_page(eb->start);			\
 	return get_unaligned_le##bits(&p->member);			\
 }									\
 static inline void btrfs_set_##name(const struct extent_buffer *eb,	\
 				    u##bits val)			\
 {									\
-	type *p = page_address(eb->pages[0]);				\
+	type *p = page_address(eb->pages[0]) + offset_in_page(eb->start); \
 	put_unaligned_le##bits(val, &p->member);			\
 }
 
@@ -3367,6 +3368,39 @@ static inline void assertfail(const char *expr, const char* file, int line) { }
 #define ASSERT(expr)	(void)(expr)
 #endif
 
+/*
+ * Get the correct offset inside the page of extent buffer.
+ *
+ * Will handle both sectorsize == PAGE_SIZE and sectorsize < PAGE_SIZE cases.
+ *
+ * @eb:		The target extent buffer
+ * @offset:	The offset inside the extent buffer
+ */
+static inline size_t get_eb_page_offset(const struct extent_buffer *eb,
+					unsigned long offset)
+{
+	/*
+	 * For sectorsize == PAGE_SIZE case, eb->start will always be aligned
+	 * to PAGE_SIZE, thus adding it won't cause any difference.
+	 *
+	 * For sectorsize < PAGE_SIZE, we must only read data that belongs to
+	 * the eb, thus we have to take the eb->start into consideration.
+	 */
+	return offset_in_page(offset + eb->start);
+}
+
+static inline unsigned long get_eb_page_index(unsigned long offset)
+{
+	/*
+	 * For sectorsize == PAGE_SIZE case, plain >> PAGE_SHIFT is enough.
+	 *
+	 * For sectorsize < PAGE_SIZE case, we only support 64K PAGE_SIZE,
+	 * and have ensured all tree blocks are contained in one page, thus
+	 * we always get index == 0.
+	 */
+	return offset >> PAGE_SHIFT;
+}
+
 /*
  * Use that for functions that are conditionally exported for sanity tests but
  * otherwise static
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index f87220167f99..2faab91f0e8e 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5696,12 +5696,12 @@ void read_extent_buffer(const struct extent_buffer *eb, void *dstv,
 	struct page *page;
 	char *kaddr;
 	char *dst = (char *)dstv;
-	unsigned long i = start >> PAGE_SHIFT;
+	unsigned long i = get_eb_page_index(start);
 
 	if (check_eb_range(eb, start, len))
 		return;
 
-	offset = offset_in_page(start);
+	offset = get_eb_page_offset(eb, start);
 
 	while (len > 0) {
 		page = eb->pages[i];
@@ -5726,13 +5726,13 @@ int read_extent_buffer_to_user_nofault(const struct extent_buffer *eb,
 	struct page *page;
 	char *kaddr;
 	char __user *dst = (char __user *)dstv;
-	unsigned long i = start >> PAGE_SHIFT;
+	unsigned long i = get_eb_page_index(start);
 	int ret = 0;
 
 	WARN_ON(start > eb->len);
 	WARN_ON(start + len > eb->start + eb->len);
 
-	offset = offset_in_page(start);
+	offset = get_eb_page_offset(eb, start);
 
 	while (len > 0) {
 		page = eb->pages[i];
@@ -5761,13 +5761,13 @@ int memcmp_extent_buffer(const struct extent_buffer *eb, const void *ptrv,
 	struct page *page;
 	char *kaddr;
 	char *ptr = (char *)ptrv;
-	unsigned long i = start >> PAGE_SHIFT;
+	unsigned long i = get_eb_page_index(start);
 	int ret = 0;
 
 	if (check_eb_range(eb, start, len))
 		return -EINVAL;
 
-	offset = offset_in_page(start);
+	offset = get_eb_page_offset(eb, start);
 
 	while (len > 0) {
 		page = eb->pages[i];
@@ -5793,7 +5793,7 @@ void write_extent_buffer_chunk_tree_uuid(const struct extent_buffer *eb,
 	char *kaddr;
 
 	WARN_ON(!PageUptodate(eb->pages[0]));
-	kaddr = page_address(eb->pages[0]);
+	kaddr = page_address(eb->pages[0]) + get_eb_page_offset(eb, 0);
 	memcpy(kaddr + offsetof(struct btrfs_header, chunk_tree_uuid), srcv,
 			BTRFS_FSID_SIZE);
 }
@@ -5803,7 +5803,7 @@ void write_extent_buffer_fsid(const struct extent_buffer *eb, const void *srcv)
 	char *kaddr;
 
 	WARN_ON(!PageUptodate(eb->pages[0]));
-	kaddr = page_address(eb->pages[0]);
+	kaddr = page_address(eb->pages[0]) + get_eb_page_offset(eb, 0);
 	memcpy(kaddr + offsetof(struct btrfs_header, fsid), srcv,
 			BTRFS_FSID_SIZE);
 }
@@ -5816,12 +5816,12 @@ void write_extent_buffer(const struct extent_buffer *eb, const void *srcv,
 	struct page *page;
 	char *kaddr;
 	char *src = (char *)srcv;
-	unsigned long i = start >> PAGE_SHIFT;
+	unsigned long i = get_eb_page_index(start);
 
 	if (check_eb_range(eb, start, len))
 		return;
 
-	offset = offset_in_page(start);
+	offset = get_eb_page_offset(eb, start);
 
 	while (len > 0) {
 		page = eb->pages[i];
@@ -5845,12 +5845,12 @@ void memzero_extent_buffer(const struct extent_buffer *eb, unsigned long start,
 	size_t offset;
 	struct page *page;
 	char *kaddr;
-	unsigned long i = start >> PAGE_SHIFT;
+	unsigned long i = get_eb_page_index(start);
 
 	if (check_eb_range(eb, start, len))
 		return;
 
-	offset = offset_in_page(start);
+	offset = get_eb_page_offset(eb, start);
 
 	while (len > 0) {
 		page = eb->pages[i];
@@ -5874,10 +5874,21 @@ void copy_extent_buffer_full(const struct extent_buffer *dst,
 
 	ASSERT(dst->len == src->len);
 
-	num_pages = num_extent_pages(dst);
-	for (i = 0; i < num_pages; i++)
-		copy_page(page_address(dst->pages[i]),
-				page_address(src->pages[i]));
+	if (dst->fs_info->sectorsize == PAGE_SIZE) {
+		num_pages = num_extent_pages(dst);
+		for (i = 0; i < num_pages; i++)
+			copy_page(page_address(dst->pages[i]),
+				  page_address(src->pages[i]));
+	} else {
+		unsigned long src_index = 0;
+		unsigned long dst_index = 0;
+		size_t src_offset = get_eb_page_offset(src, 0);
+		size_t dst_offset = get_eb_page_offset(dst, 0);
+
+		memcpy(page_address(dst->pages[dst_index]) + dst_offset,
+		       page_address(src->pages[src_index]) + src_offset,
+		       src->len);
+	}
 }
 
 void copy_extent_buffer(const struct extent_buffer *dst,
@@ -5890,7 +5901,7 @@ void copy_extent_buffer(const struct extent_buffer *dst,
 	size_t offset;
 	struct page *page;
 	char *kaddr;
-	unsigned long i = dst_offset >> PAGE_SHIFT;
+	unsigned long i = get_eb_page_index(dst_offset);
 
 	if (check_eb_range(dst, dst_offset, len) ||
 	    check_eb_range(src, src_offset, len))
@@ -5898,7 +5909,7 @@ void copy_extent_buffer(const struct extent_buffer *dst,
 
 	WARN_ON(src->len != dst_len);
 
-	offset = offset_in_page(dst_offset);
+	offset = get_eb_page_offset(dst, dst_offset);
 
 	while (len > 0) {
 		page = dst->pages[i];
@@ -5942,7 +5953,7 @@ static inline void eb_bitmap_offset(const struct extent_buffer *eb,
 	 * the bitmap item in the extent buffer + the offset of the byte in the
 	 * bitmap item.
 	 */
-	offset = start + byte_offset;
+	offset = start + offset_in_page(eb->start) + byte_offset;
 
 	*page_index = offset >> PAGE_SHIFT;
 	*page_offset = offset_in_page(offset);
@@ -6096,11 +6107,11 @@ void memcpy_extent_buffer(const struct extent_buffer *dst,
 		return;
 
 	while (len > 0) {
-		dst_off_in_page = offset_in_page(dst_offset);
-		src_off_in_page = offset_in_page(src_offset);
+		dst_off_in_page = get_eb_page_offset(dst, dst_offset);
+		src_off_in_page = get_eb_page_offset(dst, src_offset);
 
-		dst_i = dst_offset >> PAGE_SHIFT;
-		src_i = src_offset >> PAGE_SHIFT;
+		dst_i = get_eb_page_index(dst_offset);
+		src_i = get_eb_page_index(src_offset);
 
 		cur = min(len, (unsigned long)(PAGE_SIZE -
 					       src_off_in_page));
@@ -6136,11 +6147,11 @@ void memmove_extent_buffer(const struct extent_buffer *dst,
 		return;
 	}
 	while (len > 0) {
-		dst_i = dst_end >> PAGE_SHIFT;
-		src_i = src_end >> PAGE_SHIFT;
+		dst_i = get_eb_page_index(dst_end);
+		src_i = get_eb_page_index(src_end);
 
-		dst_off_in_page = offset_in_page(dst_end);
-		src_off_in_page = offset_in_page(src_end);
+		dst_off_in_page = get_eb_page_offset(dst, dst_end);
+		src_off_in_page = get_eb_page_offset(dst, src_end);
 
 		cur = min_t(unsigned long, len, src_off_in_page + 1);
 		cur = min(cur, dst_off_in_page + 1);
diff --git a/fs/btrfs/struct-funcs.c b/fs/btrfs/struct-funcs.c
index c46be27be700..b51fec5c8df3 100644
--- a/fs/btrfs/struct-funcs.c
+++ b/fs/btrfs/struct-funcs.c
@@ -57,8 +57,9 @@ u##bits btrfs_get_token_##bits(struct btrfs_map_token *token,		\
 			       const void *ptr, unsigned long off)	\
 {									\
 	const unsigned long member_offset = (unsigned long)ptr + off;	\
-	const unsigned long idx = member_offset >> PAGE_SHIFT;		\
-	const unsigned long oip = offset_in_page(member_offset);	\
+	const unsigned long idx = get_eb_page_index(member_offset);	\
+	const unsigned long oip = get_eb_page_offset(token->eb,		\
+						     member_offset);	\
 	const int size = sizeof(u##bits);				\
 	u8 lebytes[sizeof(u##bits)];					\
 	const int part = PAGE_SIZE - oip;				\
@@ -85,8 +86,8 @@ u##bits btrfs_get_##bits(const struct extent_buffer *eb,		\
 			 const void *ptr, unsigned long off)		\
 {									\
 	const unsigned long member_offset = (unsigned long)ptr + off;	\
-	const unsigned long oip = offset_in_page(member_offset);	\
-	const unsigned long idx = member_offset >> PAGE_SHIFT;		\
+	const unsigned long oip = get_eb_page_offset(eb, member_offset);\
+	const unsigned long idx = get_eb_page_index(member_offset);	\
 	char *kaddr = page_address(eb->pages[idx]);			\
 	const int size = sizeof(u##bits);				\
 	const int part = PAGE_SIZE - oip;				\
@@ -106,8 +107,9 @@ void btrfs_set_token_##bits(struct btrfs_map_token *token,		\
 			    u##bits val)				\
 {									\
 	const unsigned long member_offset = (unsigned long)ptr + off;	\
-	const unsigned long idx = member_offset >> PAGE_SHIFT;		\
-	const unsigned long oip = offset_in_page(member_offset);	\
+	const unsigned long idx = get_eb_page_index(member_offset);	\
+	const unsigned long oip = get_eb_page_offset(token->eb,		\
+						     member_offset);	\
 	const int size = sizeof(u##bits);				\
 	u8 lebytes[sizeof(u##bits)];					\
 	const int part = PAGE_SIZE - oip;				\
@@ -136,8 +138,8 @@ void btrfs_set_##bits(const struct extent_buffer *eb, void *ptr,	\
 		      unsigned long off, u##bits val)			\
 {									\
 	const unsigned long member_offset = (unsigned long)ptr + off;	\
-	const unsigned long oip = offset_in_page(member_offset);	\
-	const unsigned long idx = member_offset >> PAGE_SHIFT;		\
+	const unsigned long oip = get_eb_page_offset(eb, member_offset);\
+	const unsigned long idx = get_eb_page_index(member_offset);	\
 	char *kaddr = page_address(eb->pages[idx]);			\
 	const int size = sizeof(u##bits);				\
 	const int part = PAGE_SIZE - oip;				\
-- 
2.29.2


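As a side note for readers skimming the diff above, here is a minimal user-space sketch of the offset math that the new get_eb_page_offset()/get_eb_page_index() helpers perform. The 64K page size and all addresses are illustrative assumptions, and this is plain C, not btrfs code:

	#include <assert.h>
	#include <stdio.h>

	#define PAGE_SHIFT	16			/* assumes 64K pages */
	#define PAGE_SIZE	(1UL << PAGE_SHIFT)

	static unsigned long offset_in_page(unsigned long pos)
	{
		return pos & (PAGE_SIZE - 1);
	}

	int main(void)
	{
		unsigned long eb_start = 0x34000;	/* eb starts 16K into a page */
		unsigned long offset = 0x100;		/* offset inside the eb */

		/* get_eb_page_offset(): fold eb->start into the in-page offset */
		unsigned long page_off = offset_in_page(offset + eb_start);
		/* get_eb_page_index(): one 64K page holds the whole tree block */
		unsigned long index = offset >> PAGE_SHIFT;

		assert(index == 0 && page_off == 0x4100);
		printf("page %lu, in-page offset 0x%lx\n", index, page_off);
		return 0;
	}
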
^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v2 14/24] btrfs: disk-io: only clear EXTENT_LOCK bit for extent_invalidatepage()
  2020-11-13 12:51 [PATCH v2 00/24] btrfs: preparation patches for subpage support Qu Wenruo
                   ` (12 preceding siblings ...)
  2020-11-13 12:51 ` [PATCH v2 13/24] btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors Qu Wenruo
@ 2020-11-13 12:51 ` Qu Wenruo
  2020-11-19 21:09   ` David Sterba
  2020-11-13 12:51 ` [PATCH v2 15/24] btrfs: extent-io: make type of extent_state::state to be at least 32 bits Qu Wenruo
                   ` (10 subsequent siblings)
  24 siblings, 1 reply; 67+ messages in thread
From: Qu Wenruo @ 2020-11-13 12:51 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

extent_invalidatepage() calls clear_extent_bit() with delete == 1, which
tries to clear all existing bits in the range.

This is currently fine, since the btree io tree only utilizes the
EXTENT_LOCKED bit.
But it would become a problem for later subpage support, which will
utilize extra io tree bits to represent extra info.

So convert that clear_extent_bit() call to unlock_extent_cached().

Since the current code only utilizes the EXTENT_LOCKED bit, this doesn't
change the behavior, but it provides a much cleaner basis for the
incoming subpage support.
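
As a toy illustration of the behavioral point (plain C, not btrfs code; EXTENT_SUBPAGE is a made-up stand-in for a future bit), the difference between the two calls is whether non-lock bits survive:

	#include <assert.h>
	#include <stdint.h>

	#define EXTENT_LOCKED	(1U << 0)
	#define EXTENT_SUBPAGE	(1U << 1)	/* hypothetical future bit */

	int main(void)
	{
		uint32_t state = EXTENT_LOCKED | EXTENT_SUBPAGE;

		/* clear_extent_bit() with delete == 1: all bits are dropped */
		uint32_t after_delete = 0;

		/* unlock_extent_cached(): only the lock bit is cleared */
		uint32_t after_unlock = state & ~EXTENT_LOCKED;

		assert(after_delete == 0);
		assert(after_unlock == EXTENT_SUBPAGE);
		return 0;
	}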

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 2faab91f0e8e..1c240448be83 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4463,14 +4463,22 @@ int extent_invalidatepage(struct extent_io_tree *tree,
 	u64 end = start + PAGE_SIZE - 1;
 	size_t blocksize = page->mapping->host->i_sb->s_blocksize;
 
+	/* This function is only called for btree */
+	ASSERT(tree->owner == IO_TREE_BTREE_INODE_IO);
+
 	start += ALIGN(offset, blocksize);
 	if (start > end)
 		return 0;
 
 	lock_extent_bits(tree, start, end, &cached_state);
 	wait_on_page_writeback(page);
-	clear_extent_bit(tree, start, end, EXTENT_LOCKED | EXTENT_DELALLOC |
-			 EXTENT_DO_ACCOUNTING, 1, 1, &cached_state);
+
+	/*
+	 * Currently for btree io tree, only EXTENT_LOCKED is utilized,
+	 * so here we only need to unlock the extent range to free any
+	 * existing extent state.
+	 */
+	unlock_extent_cached(tree, start, end, &cached_state);
 	return 0;
 }
 
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v2 15/24] btrfs: extent-io: make type of extent_state::state to be at least 32 bits
  2020-11-13 12:51 [PATCH v2 00/24] btrfs: preparation patches for subpage support Qu Wenruo
                   ` (13 preceding siblings ...)
  2020-11-13 12:51 ` [PATCH v2 14/24] btrfs: disk-io: only clear EXTENT_LOCK bit for extent_invalidatepage() Qu Wenruo
@ 2020-11-13 12:51 ` Qu Wenruo
  2020-11-13 18:40   ` Josef Bacik
                     ` (2 more replies)
  2020-11-13 12:51 ` [PATCH v2 16/24] btrfs: file-item: use nodesize to determine whether we need readahead for btrfs_lookup_bio_sums() Qu Wenruo
                   ` (9 subsequent siblings)
  24 siblings, 3 replies; 67+ messages in thread
From: Qu Wenruo @ 2020-11-13 12:51 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

Currently we use 'unsigned' for extent_state::state, which is only
guaranteed to be at least 16 bits.

But the incoming subpage support is going to introduce more bits, taking
us beyond 16 bits.

To support this, make extent_state::state at least 32 bits, and use "u32"
explicitly to make the maximum number of supported bits clear.

This doesn't increase the memory usage on x86_64, but may affect other
architectures.
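
A standalone C sketch of the guarantee at play (compiles anywhere with C11; uint32_t is the user-space equivalent of the kernel's u32):

	#include <limits.h>
	#include <stdint.h>

	/* ISO C only promises 'unsigned int' can hold 65535, i.e. 16 bits;
	 * uint32_t is exactly 32 bits by definition. */
	_Static_assert(UINT_MAX >= 0xffffU, "all the standard guarantees");
	_Static_assert(UINT32_MAX == 0xffffffffU, "u32 always fits 32 bits");

	int main(void)
	{
		uint32_t state = UINT32_C(1) << 31;	/* bits above 15 are safe */

		return state ? 0 : 1;
	}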

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent-io-tree.h | 33 ++++++++++++------------
 fs/btrfs/extent_io.c      | 54 +++++++++++++++++++--------------------
 fs/btrfs/extent_io.h      |  2 +-
 3 files changed, 44 insertions(+), 45 deletions(-)

diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
index 5334b3772f18..042ad97ca08a 100644
--- a/fs/btrfs/extent-io-tree.h
+++ b/fs/btrfs/extent-io-tree.h
@@ -35,6 +35,7 @@ struct io_failure_record;
  * delalloc bytes decremented, in an atomic way to prevent races with stat(2).
  */
 #define EXTENT_ADD_INODE_BYTES  (1U << 15)
+
 #define EXTENT_DO_ACCOUNTING    (EXTENT_CLEAR_META_RESV | \
 				 EXTENT_CLEAR_DATA_RESV)
 #define EXTENT_CTLBITS		(EXTENT_DO_ACCOUNTING | \
@@ -87,7 +88,7 @@ struct extent_state {
 	/* ADD NEW ELEMENTS AFTER THIS */
 	wait_queue_head_t wq;
 	refcount_t refs;
-	unsigned state;
+	u32 state;
 
 	struct io_failure_record *failrec;
 
@@ -119,19 +120,19 @@ void __cold extent_io_exit(void);
 
 u64 count_range_bits(struct extent_io_tree *tree,
 		     u64 *start, u64 search_end,
-		     u64 max_bytes, unsigned bits, int contig);
+		     u64 max_bytes, u32 bits, int contig);
 
 void free_extent_state(struct extent_state *state);
 int test_range_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		   unsigned bits, int filled,
+		   u32 bits, int filled,
 		   struct extent_state *cached_state);
 int clear_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
-		unsigned bits, struct extent_changeset *changeset);
+			     u32 bits, struct extent_changeset *changeset);
 int clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		     unsigned bits, int wake, int delete,
+		     u32 bits, int wake, int delete,
 		     struct extent_state **cached);
 int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		     unsigned bits, int wake, int delete,
+		     u32 bits, int wake, int delete,
 		     struct extent_state **cached, gfp_t mask,
 		     struct extent_changeset *changeset);
 
@@ -155,7 +156,7 @@ static inline int unlock_extent_cached_atomic(struct extent_io_tree *tree,
 }
 
 static inline int clear_extent_bits(struct extent_io_tree *tree, u64 start,
-		u64 end, unsigned bits)
+				    u64 end, u32 bits)
 {
 	int wake = 0;
 
@@ -166,14 +167,14 @@ static inline int clear_extent_bits(struct extent_io_tree *tree, u64 start,
 }
 
 int set_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
-			   unsigned bits, struct extent_changeset *changeset);
+			   u32 bits, struct extent_changeset *changeset);
 int set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		   unsigned bits, struct extent_state **cached_state, gfp_t mask);
+		   u32 bits, struct extent_state **cached_state, gfp_t mask);
 int set_extent_bits_nowait(struct extent_io_tree *tree, u64 start, u64 end,
-			   unsigned bits);
+			   u32 bits);
 
 static inline int set_extent_bits(struct extent_io_tree *tree, u64 start,
-		u64 end, unsigned bits)
+		u64 end, u32 bits)
 {
 	return set_extent_bit(tree, start, end, bits, NULL, GFP_NOFS);
 }
@@ -200,11 +201,11 @@ static inline int clear_extent_dirty(struct extent_io_tree *tree, u64 start,
 }
 
 int convert_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		       unsigned bits, unsigned clear_bits,
+		       u32 bits, u32 clear_bits,
 		       struct extent_state **cached_state);
 
 static inline int set_extent_delalloc(struct extent_io_tree *tree, u64 start,
-				      u64 end, unsigned int extra_bits,
+				      u64 end, u32 extra_bits,
 				      struct extent_state **cached_state)
 {
 	return set_extent_bit(tree, start, end,
@@ -234,12 +235,12 @@ static inline int set_extent_uptodate(struct extent_io_tree *tree, u64 start,
 }
 
 int find_first_extent_bit(struct extent_io_tree *tree, u64 start,
-			  u64 *start_ret, u64 *end_ret, unsigned bits,
+			  u64 *start_ret, u64 *end_ret, u32 bits,
 			  struct extent_state **cached_state);
 void find_first_clear_extent_bit(struct extent_io_tree *tree, u64 start,
-				 u64 *start_ret, u64 *end_ret, unsigned bits);
+				 u64 *start_ret, u64 *end_ret, u32 bits);
 int find_contiguous_extent_bit(struct extent_io_tree *tree, u64 start,
-			       u64 *start_ret, u64 *end_ret, unsigned bits);
+			       u64 *start_ret, u64 *end_ret, u32 bits);
 int extent_invalidatepage(struct extent_io_tree *tree,
 			  struct page *page, unsigned long offset);
 bool btrfs_find_delalloc_range(struct extent_io_tree *tree, u64 *start,
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 1c240448be83..f305777ee1a3 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -142,7 +142,7 @@ struct extent_page_data {
 	unsigned int sync_io:1;
 };
 
-static int add_extent_changeset(struct extent_state *state, unsigned bits,
+static int add_extent_changeset(struct extent_state *state, u32 bits,
 				 struct extent_changeset *changeset,
 				 int set)
 {
@@ -530,7 +530,7 @@ static void merge_state(struct extent_io_tree *tree,
 }
 
 static void set_state_bits(struct extent_io_tree *tree,
-			   struct extent_state *state, unsigned *bits,
+			   struct extent_state *state, u32 *bits,
 			   struct extent_changeset *changeset);
 
 /*
@@ -547,7 +547,7 @@ static int insert_state(struct extent_io_tree *tree,
 			struct extent_state *state, u64 start, u64 end,
 			struct rb_node ***p,
 			struct rb_node **parent,
-			unsigned *bits, struct extent_changeset *changeset)
+			u32 *bits, struct extent_changeset *changeset)
 {
 	struct rb_node *node;
 
@@ -628,11 +628,11 @@ static struct extent_state *next_state(struct extent_state *state)
  */
 static struct extent_state *clear_state_bit(struct extent_io_tree *tree,
 					    struct extent_state *state,
-					    unsigned *bits, int wake,
+					    u32 *bits, int wake,
 					    struct extent_changeset *changeset)
 {
 	struct extent_state *next;
-	unsigned bits_to_clear = *bits & ~EXTENT_CTLBITS;
+	u32 bits_to_clear = *bits & ~EXTENT_CTLBITS;
 	int ret;
 
 	if ((bits_to_clear & EXTENT_DIRTY) && (state->state & EXTENT_DIRTY)) {
@@ -695,9 +695,9 @@ static void extent_io_tree_panic(struct extent_io_tree *tree, int err)
  * This takes the tree lock, and returns 0 on success and < 0 on error.
  */
 int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-			      unsigned bits, int wake, int delete,
-			      struct extent_state **cached_state,
-			      gfp_t mask, struct extent_changeset *changeset)
+		       u32 bits, int wake, int delete,
+		       struct extent_state **cached_state,
+		       gfp_t mask, struct extent_changeset *changeset)
 {
 	struct extent_state *state;
 	struct extent_state *cached;
@@ -868,7 +868,7 @@ static void wait_on_state(struct extent_io_tree *tree,
  * The tree lock is taken by this function
  */
 static void wait_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-			    unsigned long bits)
+			    u32 bits)
 {
 	struct extent_state *state;
 	struct rb_node *node;
@@ -915,9 +915,9 @@ static void wait_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 
 static void set_state_bits(struct extent_io_tree *tree,
 			   struct extent_state *state,
-			   unsigned *bits, struct extent_changeset *changeset)
+			   u32 *bits, struct extent_changeset *changeset)
 {
-	unsigned bits_to_set = *bits & ~EXTENT_CTLBITS;
+	u32 bits_to_set = *bits & ~EXTENT_CTLBITS;
 	int ret;
 
 	if (tree->private_data && is_data_inode(tree->private_data))
@@ -964,8 +964,8 @@ static void cache_state(struct extent_state *state,
 
 static int __must_check
 __set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		 unsigned bits, unsigned exclusive_bits,
-		 u64 *failed_start, struct extent_state **cached_state,
+		 u32 bits, u32 exclusive_bits, u64 *failed_start,
+		 struct extent_state **cached_state,
 		 gfp_t mask, struct extent_changeset *changeset)
 {
 	struct extent_state *state;
@@ -1184,7 +1184,7 @@ __set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 }
 
 int set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		   unsigned bits, struct extent_state **cached_state, gfp_t mask)
+		   u32 bits, struct extent_state **cached_state, gfp_t mask)
 {
 	return __set_extent_bit(tree, start, end, bits, 0, NULL, cached_state,
 			        mask, NULL);
@@ -1210,7 +1210,7 @@ int set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
  * All allocations are done with GFP_NOFS.
  */
 int convert_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		       unsigned bits, unsigned clear_bits,
+		       u32 bits, u32 clear_bits,
 		       struct extent_state **cached_state)
 {
 	struct extent_state *state;
@@ -1411,7 +1411,7 @@ int convert_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 
 /* wrappers around set/clear extent bit */
 int set_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
-			   unsigned bits, struct extent_changeset *changeset)
+			   u32 bits, struct extent_changeset *changeset)
 {
 	/*
 	 * We don't support EXTENT_LOCKED yet, as current changeset will
@@ -1426,14 +1426,14 @@ int set_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
 }
 
 int set_extent_bits_nowait(struct extent_io_tree *tree, u64 start, u64 end,
-			   unsigned bits)
+			   u32 bits)
 {
 	return __set_extent_bit(tree, start, end, bits, 0, NULL, NULL,
 				GFP_NOWAIT, NULL);
 }
 
 int clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		     unsigned bits, int wake, int delete,
+		     u32 bits, int wake, int delete,
 		     struct extent_state **cached)
 {
 	return __clear_extent_bit(tree, start, end, bits, wake, delete,
@@ -1441,7 +1441,7 @@ int clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 }
 
 int clear_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
-		unsigned bits, struct extent_changeset *changeset)
+		u32 bits, struct extent_changeset *changeset)
 {
 	/*
 	 * Don't support EXTENT_LOCKED case, same reason as
@@ -1529,8 +1529,7 @@ void extent_range_redirty_for_io(struct inode *inode, u64 start, u64 end)
  * nothing was found after 'start'
  */
 static struct extent_state *
-find_first_extent_bit_state(struct extent_io_tree *tree,
-			    u64 start, unsigned bits)
+find_first_extent_bit_state(struct extent_io_tree *tree, u64 start, u32 bits)
 {
 	struct rb_node *node;
 	struct extent_state *state;
@@ -1565,7 +1564,7 @@ find_first_extent_bit_state(struct extent_io_tree *tree,
  * Return 1 if we found nothing.
  */
 int find_first_extent_bit(struct extent_io_tree *tree, u64 start,
-			  u64 *start_ret, u64 *end_ret, unsigned bits,
+			  u64 *start_ret, u64 *end_ret, u32 bits,
 			  struct extent_state **cached_state)
 {
 	struct extent_state *state;
@@ -1616,7 +1615,7 @@ int find_first_extent_bit(struct extent_io_tree *tree, u64 start,
  * returned will be the full contiguous area with the bits set.
  */
 int find_contiguous_extent_bit(struct extent_io_tree *tree, u64 start,
-			       u64 *start_ret, u64 *end_ret, unsigned bits)
+			       u64 *start_ret, u64 *end_ret, u32 bits)
 {
 	struct extent_state *state;
 	int ret = 1;
@@ -1653,7 +1652,7 @@ int find_contiguous_extent_bit(struct extent_io_tree *tree, u64 start,
  * trim @end_ret to the appropriate size.
  */
 void find_first_clear_extent_bit(struct extent_io_tree *tree, u64 start,
-				 u64 *start_ret, u64 *end_ret, unsigned bits)
+				 u64 *start_ret, u64 *end_ret, u32 bits)
 {
 	struct extent_state *state;
 	struct rb_node *node, *prev = NULL, *next;
@@ -2024,8 +2023,7 @@ static int __process_pages_contig(struct address_space *mapping,
 
 void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
 				  struct page *locked_page,
-				  unsigned clear_bits,
-				  unsigned long page_ops)
+				  u32 clear_bits, unsigned long page_ops)
 {
 	clear_extent_bit(&inode->io_tree, start, end, clear_bits, 1, 0, NULL);
 
@@ -2041,7 +2039,7 @@ void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
  */
 u64 count_range_bits(struct extent_io_tree *tree,
 		     u64 *start, u64 search_end, u64 max_bytes,
-		     unsigned bits, int contig)
+		     u32 bits, int contig)
 {
 	struct rb_node *node;
 	struct extent_state *state;
@@ -2161,7 +2159,7 @@ struct io_failure_record *get_state_failrec(struct extent_io_tree *tree, u64 sta
  * range is found set.
  */
 int test_range_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		   unsigned bits, int filled, struct extent_state *cached)
+		   u32 bits, int filled, struct extent_state *cached)
 {
 	struct extent_state *state = NULL;
 	struct rb_node *node;
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index d39ebaee5ff7..0123c75ee203 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -265,7 +265,7 @@ void extent_range_clear_dirty_for_io(struct inode *inode, u64 start, u64 end);
 void extent_range_redirty_for_io(struct inode *inode, u64 start, u64 end);
 void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
 				  struct page *locked_page,
-				  unsigned bits_to_clear,
+				  u32 bits_to_clear,
 				  unsigned long page_ops);
 struct bio *btrfs_bio_alloc(u64 first_byte);
 struct bio *btrfs_io_bio_alloc(unsigned int nr_iovecs);
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v2 16/24] btrfs: file-item: use nodesize to determine whether we need readahead for btrfs_lookup_bio_sums()
  2020-11-13 12:51 [PATCH v2 00/24] btrfs: preparation patches for subpage support Qu Wenruo
                   ` (14 preceding siblings ...)
  2020-11-13 12:51 ` [PATCH v2 15/24] btrfs: extent-io: make type of extent_state::state to be at least 32 bits Qu Wenruo
@ 2020-11-13 12:51 ` Qu Wenruo
  2020-11-19 21:09   ` David Sterba
  2020-11-13 12:51 ` [PATCH v2 17/24] btrfs: file-item: remove the btrfs_find_ordered_sum() call in btrfs_lookup_bio_sums() Qu Wenruo
                   ` (8 subsequent siblings)
  24 siblings, 1 reply; 67+ messages in thread
From: Qu Wenruo @ 2020-11-13 12:51 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

In btrfs_lookup_bio_sums(), if the bio is pretty large, we want to
readahead the csum tree.

However the threshold is a magic number, (PAGE_SIZE * 8), dating back to
the initial btrfs merge.

The meaning of the value is pretty hard to guess, especially since it
comes from an age when 4K sectorsize was the default and only CRC32 was
supported.

For the most common btrfs setup, CRC32 csum and 4K sectorsize, it means
a read of just 32K would kick readahead, while the csums themselves are
only 32 bytes in size.

Now let's be more reasonable by taking both csum size and node size into
consideration.

If the csums needed for the bio are larger than one leaf can hold, then
we kick the readahead.
For the current default btrfs this means the threshold becomes 16M.

This change should not observably affect performance; it is mostly a
readability enhancement.
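
The rough arithmetic behind the 16M figure, assuming the default 16K nodesize, 4-byte CRC32 csums, and 4K sectors (leaf header and item overhead ignored, so this is an approximation):

	#include <stdio.h>

	int main(void)
	{
		unsigned int nodesize = 16 * 1024;	/* default leaf size */
		unsigned int csum_size = 4;		/* CRC32 */
		unsigned int sectorsize = 4096;

		/* roughly how many csums fit in one leaf */
		unsigned int csums_per_leaf = nodesize / csum_size;

		/* readahead now kicks in only past this much bio data */
		unsigned long threshold = (unsigned long)csums_per_leaf * sectorsize;

		printf("~%lu MiB\n", threshold >> 20);	/* prints ~16 MiB */
		return 0;
	}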

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/file-item.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 40daf1a4b46c..73896eb9ead0 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -297,7 +297,11 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio,
 		csum = dst;
 	}
 
-	if (bio->bi_iter.bi_size > PAGE_SIZE * 8)
+	/*
+	 * If needed number of sectors is larger than one leaf can contain,
+	 * kick the readahead for csum tree would be a good idea.
+	 */
+	if (nblocks > fs_info->csums_per_leaf)
 		path->reada = READA_FORWARD;
 
 	/*
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v2 17/24] btrfs: file-item: remove the btrfs_find_ordered_sum() call in btrfs_lookup_bio_sums()
  2020-11-13 12:51 [PATCH v2 00/24] btrfs: preparation patches for subpage support Qu Wenruo
                   ` (15 preceding siblings ...)
  2020-11-13 12:51 ` [PATCH v2 16/24] btrfs: file-item: use nodesize to determine whether we need readahead for btrfs_lookup_bio_sums() Qu Wenruo
@ 2020-11-13 12:51 ` Qu Wenruo
  2020-11-13 12:51 ` [PATCH v2 18/24] btrfs: file-item: refactor btrfs_lookup_bio_sums() to handle out-of-order bvecs Qu Wenruo
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 67+ messages in thread
From: Qu Wenruo @ 2020-11-13 12:51 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

The function btrfs_lookup_bio_sums() is only called for read bios, while
btrfs_find_ordered_sum() searches ordered extent sums, which only exist
for the write path.

This means that to read a page we either:
- Submit a read bio if it's not uptodate
  In that case we only need to search the csum tree for csums.

- Find the page already uptodate
  It can be marked uptodate by a previous read, or by being marked
  dirty, as we always mark a page uptodate when dirtying it.
  In that case we don't need to submit a read bio at all, thus there is
  no need to search for any csum.

So remove the btrfs_find_ordered_sum() call in btrfs_lookup_bio_sums().
And since btrfs_lookup_bio_sums() is the only caller of
btrfs_find_ordered_sum(), also remove its implementation.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/file-item.c    | 16 ++++++++++-----
 fs/btrfs/ordered-data.c | 44 -----------------------------------------
 fs/btrfs/ordered-data.h |  2 --
 3 files changed, 11 insertions(+), 51 deletions(-)

diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 73896eb9ead0..a3e328406d00 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -239,7 +239,8 @@ int btrfs_lookup_file_extent(struct btrfs_trans_handle *trans,
 }
 
 /**
- * btrfs_lookup_bio_sums - Look up checksums for a bio.
+ * btrfs_lookup_bio_sums - Look up checksums for a read bio.
+ *
  * @inode: inode that the bio is for.
  * @bio: bio to look up.
  * @offset: Unless (u64)-1, look up checksums for this offset in the file.
@@ -274,6 +275,15 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio,
 	if (!fs_info->csum_root || (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM))
 		return BLK_STS_OK;
 
+	/*
+	 * This function is only called for read bio.
+	 *
+	 * This means several things:
+	 * - All of our csums should only be in csum tree
+	 *   No ordered extents csums. As ordered extents are only for write
+	 *   path.
+	 */
+	ASSERT(bio_op(bio) == REQ_OP_READ);
 	path = btrfs_alloc_path();
 	if (!path)
 		return BLK_STS_RESOURCE;
@@ -324,10 +334,6 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio,
 
 		if (page_offsets)
 			offset = page_offset(bvec.bv_page) + bvec.bv_offset;
-		count = btrfs_find_ordered_sum(BTRFS_I(inode), offset,
-					       disk_bytenr, csum, nblocks);
-		if (count)
-			goto found;
 
 		if (!item || disk_bytenr < item_start_offset ||
 		    disk_bytenr >= item_last_offset) {
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 0d61f9fefc02..79d366a36223 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -854,50 +854,6 @@ btrfs_lookup_first_ordered_extent(struct btrfs_inode *inode, u64 file_offset)
 	return entry;
 }
 
-/*
- * search the ordered extents for one corresponding to 'offset' and
- * try to find a checksum.  This is used because we allow pages to
- * be reclaimed before their checksum is actually put into the btree
- */
-int btrfs_find_ordered_sum(struct btrfs_inode *inode, u64 offset,
-			   u64 disk_bytenr, u8 *sum, int len)
-{
-	struct btrfs_fs_info *fs_info = inode->root->fs_info;
-	struct btrfs_ordered_sum *ordered_sum;
-	struct btrfs_ordered_extent *ordered;
-	struct btrfs_ordered_inode_tree *tree = &inode->ordered_tree;
-	unsigned long num_sectors;
-	unsigned long i;
-	const u32 csum_size = fs_info->csum_size;
-	int index = 0;
-
-	ordered = btrfs_lookup_ordered_extent(inode, offset);
-	if (!ordered)
-		return 0;
-
-	spin_lock_irq(&tree->lock);
-	list_for_each_entry_reverse(ordered_sum, &ordered->list, list) {
-		if (disk_bytenr >= ordered_sum->bytenr &&
-		    disk_bytenr < ordered_sum->bytenr + ordered_sum->len) {
-			i = (disk_bytenr - ordered_sum->bytenr) >>
-			    fs_info->sectorsize_bits;
-			num_sectors = ordered_sum->len >> fs_info->sectorsize_bits;
-			num_sectors = min_t(int, len - index, num_sectors - i);
-			memcpy(sum + index, ordered_sum->sums + i * csum_size,
-			       num_sectors * csum_size);
-
-			index += (int)num_sectors * csum_size;
-			if (index == len)
-				goto out;
-			disk_bytenr += num_sectors * fs_info->sectorsize;
-		}
-	}
-out:
-	spin_unlock_irq(&tree->lock);
-	btrfs_put_ordered_extent(ordered);
-	return index;
-}
-
 /*
  * btrfs_flush_ordered_range - Lock the passed range and ensures all pending
  * ordered extents in it are run to completion.
diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
index 367269effd6a..0bfa82b58e23 100644
--- a/fs/btrfs/ordered-data.h
+++ b/fs/btrfs/ordered-data.h
@@ -183,8 +183,6 @@ struct btrfs_ordered_extent *btrfs_lookup_ordered_range(
 		u64 len);
 void btrfs_get_ordered_extents_for_logging(struct btrfs_inode *inode,
 					   struct list_head *list);
-int btrfs_find_ordered_sum(struct btrfs_inode *inode, u64 offset,
-			   u64 disk_bytenr, u8 *sum, int len);
 u64 btrfs_wait_ordered_extents(struct btrfs_root *root, u64 nr,
 			       const u64 range_start, const u64 range_len);
 void btrfs_wait_ordered_roots(struct btrfs_fs_info *fs_info, u64 nr,
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v2 18/24] btrfs: file-item: refactor btrfs_lookup_bio_sums() to handle out-of-order bvecs
  2020-11-13 12:51 [PATCH v2 00/24] btrfs: preparation patches for subpage support Qu Wenruo
                   ` (16 preceding siblings ...)
  2020-11-13 12:51 ` [PATCH v2 17/24] btrfs: file-item: remove the btrfs_find_ordered_sum() call in btrfs_lookup_bio_sums() Qu Wenruo
@ 2020-11-13 12:51 ` Qu Wenruo
  2020-11-18 16:27   ` David Sterba
  2020-11-13 12:51 ` [PATCH v2 19/24] btrfs: scrub: remove the anonymous structure from scrub_page Qu Wenruo
                   ` (6 subsequent siblings)
  24 siblings, 1 reply; 67+ messages in thread
From: Qu Wenruo @ 2020-11-13 12:51 UTC (permalink / raw)
  To: linux-btrfs

Refactor btrfs_lookup_bio_sums() by:
- Remove the @file_offset parameter
  There are two factors making the @file_offset parameter useless:

  * For csum lookup in csum tree, file offset makes no sense
    We only need disk_bytenr, which is unrelated to file_offset

  * page_offset (file offset) of each bvec is not necessarily contiguous.
    Pages can be added to the same bio as long as their on-disk bytenr
    is contiguous, meaning we could have pages at different file offsets
    in the same bio.

  Thus passing file_offset makes no sense anymore.
  The only user of file_offset is the data reloc inode, for which we
  will use a new function, search_file_offset_in_bio().

- Extract the csum tree lookup into search_csum_tree()
  The new function will handle the csum search in the csum tree.
  The return value follows the same convention as
  btrfs_find_ordered_sum(), returning the number of sectors found that
  have a checksum.

- Change how we do the main loop
  The only info needed from the bio is:
  * the on-disk bytenr
  * the length

  After extracting the above info, we can do the search without the bio
  at all, which makes the main loop much simpler:

	for (cur_disk_bytenr = orig_disk_bytenr;
	     cur_disk_bytenr < orig_disk_bytenr + orig_len;
	     cur_disk_bytenr += count * sectorsize) {

		/* Lookup csum tree */
		count = search_csum_tree(fs_info, path, cur_disk_bytenr,
					 search_len, csum_dst);
		if (!count) {
			/* Csum hole handling */
		}
	}

- Use a single variable as the core for calculating all other offsets
  Instead of many variables of different types, we use only one core
  variable, cur_disk_bytenr, which represents the current disk bytenr.

  All involved values can be calculated from that core variable, and
  all those variables will only be visible in the inner loop.

All the above refactoring makes btrfs_lookup_bio_sums() much more robust
than it used to be, especially regarding the file offset lookup.
Now the file_offset lookup is only needed for the data reloc inode;
otherwise we don't need to bother with file_offset at all.
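
A small user-space sketch of the "single core variable" arithmetic, with illustrative constants (4K sectors, a 64K bio; not btrfs code):

	#include <assert.h>
	#include <stdint.h>

	#define SECTORSIZE_BITS	12	/* assumes 4K sectors */

	int main(void)
	{
		uint64_t orig_disk_bytenr = 1ULL << 30;		/* illustrative */
		uint64_t orig_len = 64 * 1024;			/* bio size */
		uint64_t cur = orig_disk_bytenr + 5 * 4096;	/* 5 sectors in */

		/* everything else derives from the core variable 'cur' */
		uint64_t search_len = orig_disk_bytenr + orig_len - cur;
		int sector_offset = (cur - orig_disk_bytenr) >> SECTORSIZE_BITS;

		assert(sector_offset == 5);
		assert(search_len == 11 * 4096);
		return 0;
	}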

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/compression.c |   5 +-
 fs/btrfs/ctree.h       |   2 +-
 fs/btrfs/file-item.c   | 239 +++++++++++++++++++++++++++--------------
 fs/btrfs/inode.c       |   5 +-
 4 files changed, 162 insertions(+), 89 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 4e022ed72d2f..3fb6fde2ca13 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -719,8 +719,7 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 			 */
 			refcount_inc(&cb->pending_bios);
 
-			ret = btrfs_lookup_bio_sums(inode, comp_bio, (u64)-1,
-						    sums);
+			ret = btrfs_lookup_bio_sums(inode, comp_bio, sums);
 			BUG_ON(ret); /* -ENOMEM */
 
 			nr_sectors = DIV_ROUND_UP(comp_bio->bi_iter.bi_size,
@@ -746,7 +745,7 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 	ret = btrfs_bio_wq_end_io(fs_info, comp_bio, BTRFS_WQ_ENDIO_DATA);
 	BUG_ON(ret); /* -ENOMEM */
 
-	ret = btrfs_lookup_bio_sums(inode, comp_bio, (u64)-1, sums);
+	ret = btrfs_lookup_bio_sums(inode, comp_bio, sums);
 	BUG_ON(ret); /* -ENOMEM */
 
 	ret = btrfs_map_bio(fs_info, comp_bio, mirror_num);
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index d69ba24401ff..39cc81e02cdb 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3014,7 +3014,7 @@ struct btrfs_dio_private;
 int btrfs_del_csums(struct btrfs_trans_handle *trans,
 		    struct btrfs_root *root, u64 bytenr, u64 len);
 blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio,
-				   u64 offset, u8 *dst);
+				   u8 *dst);
 int btrfs_insert_file_extent(struct btrfs_trans_handle *trans,
 			     struct btrfs_root *root,
 			     u64 objectid, u64 pos,
diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index a3e328406d00..06763fabde38 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -238,13 +238,118 @@ int btrfs_lookup_file_extent(struct btrfs_trans_handle *trans,
 	return ret;
 }
 
+/*
+ * Find csums for logical bytenr range
+ * [disk_bytenr, disk_bytenr + len) and restore the result to @dst.
+ *
+ * Return >0 for the number of sectors we found.
+ * Return 0 for the range [disk_bytenr, disk_bytenr + sectorsize) has no csum
+ * for it. Caller may want to try next sector until one range is hit.
+ * Return <0 for fatal error.
+ */
+static int search_csum_tree(struct btrfs_fs_info *fs_info,
+			    struct btrfs_path *path, u64 disk_bytenr,
+			    u64 len, u8 *dst)
+{
+	struct btrfs_csum_item *item = NULL;
+	struct btrfs_key key;
+	u32 csum_size = fs_info->csum_size;
+	u32 sectorsize = fs_info->sectorsize;
+	u32 itemsize;
+	int ret;
+	u64 csum_start;
+	u64 csum_len;
+
+	ASSERT(IS_ALIGNED(disk_bytenr, sectorsize) &&
+	       IS_ALIGNED(len, sectorsize));
+
+	/* Check if the current csum item covers disk_bytenr */
+	if (path->nodes[0]) {
+		item = btrfs_item_ptr(path->nodes[0], path->slots[0],
+				      struct btrfs_csum_item);
+		btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
+		itemsize = btrfs_item_size_nr(path->nodes[0], path->slots[0]);
+
+		csum_start = key.offset;
+		csum_len = (itemsize / csum_size) * sectorsize;
+
+		if (in_range(disk_bytenr, csum_start, csum_len))
+			goto found;
+	}
+
+	/* Current item doesn't contain the desired range, re-search */
+	btrfs_release_path(path);
+	item = btrfs_lookup_csum(NULL, fs_info->csum_root, path,
+				 disk_bytenr, 0);
+	if (IS_ERR(item)) {
+		ret = PTR_ERR(item);
+		goto out;
+	}
+	btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
+	itemsize = btrfs_item_size_nr(path->nodes[0], path->slots[0]);
+
+	csum_start = key.offset;
+	csum_len = (itemsize / csum_size) * sectorsize;
+	ASSERT(in_range(disk_bytenr, csum_start, csum_len));
+
+found:
+	ret = (min(csum_start + csum_len, disk_bytenr + len) -
+		   disk_bytenr) >> fs_info->sectorsize_bits;
+	read_extent_buffer(path->nodes[0], dst, (unsigned long)item,
+			ret * csum_size);
+out:
+	if (ret == -ENOENT)
+		ret = 0;
+	return ret;
+}
+
+/*
+ * A helper to locate the file_offset of @cur_disk_bytenr of a @bio.
+ *
+ * Bio of btrfs represents read range of
+ * [bi_sector << 9, bi_sector << 9 + bi_size).
+ * Knowing this, we can iterate through each bvec to locate the page belong to
+ * @cur_disk_bytenr and get the file offset.
+ *
+ * @inode is used to determine the bvec page really belongs to @inode.
+ *
+ * Return 0 if we can't find the file offset;
+ * Return >0 if we find the file offset and restore it to @file_offset_ret
+ */
+static int search_file_offset_in_bio(struct bio *bio, struct inode *inode,
+				     u64 disk_bytenr, u64 *file_offset_ret)
+{
+	struct bvec_iter iter;
+	struct bio_vec bvec;
+	u64 cur = bio->bi_iter.bi_sector << 9;
+	int ret = 0;
+
+	bio_for_each_segment(bvec, bio, iter) {
+		struct page *page = bvec.bv_page;
+
+		if (cur > disk_bytenr)
+			break;
+		if (cur + bvec.bv_len <= disk_bytenr) {
+			cur += bvec.bv_len;
+			continue;
+		}
+		ASSERT(in_range(disk_bytenr, cur, bvec.bv_len));
+		if (page->mapping && page->mapping->host &&
+		    page->mapping->host == inode) {
+			ret = 1;
+			*file_offset_ret = page_offset(page) + bvec.bv_offset
+				+ disk_bytenr - cur;
+			break;
+		}
+	}
+	return ret;
+}
+
 /**
- * btrfs_lookup_bio_sums - Look up checksums for a read bio.
+ * Lookup the csum for the read bio in csum tree.
  *
  * @inode: inode that the bio is for.
  * @bio: bio to look up.
- * @offset: Unless (u64)-1, look up checksums for this offset in the file.
- *          If (u64)-1, use the page offsets from the bio instead.
  * @dst: Buffer of size nblocks * btrfs_super_csum_size() used to return
  *       checksum (nblocks = bio->bi_iter.bi_size / fs_info->sectorsize). If
  *       NULL, the checksum buffer is allocated and returned in
@@ -253,22 +358,17 @@ int btrfs_lookup_file_extent(struct btrfs_trans_handle *trans,
  * Return: BLK_STS_RESOURCE if allocating memory fails, BLK_STS_OK otherwise.
  */
 blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio,
-				   u64 offset, u8 *dst)
+				   u8 *dst)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	struct bio_vec bvec;
-	struct bvec_iter iter;
-	struct btrfs_csum_item *item = NULL;
 	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
 	struct btrfs_path *path;
-	const bool page_offsets = (offset == (u64)-1);
+	u32 sectorsize = fs_info->sectorsize;
+	u64 orig_len = bio->bi_iter.bi_size;
+	u64 orig_disk_bytenr = bio->bi_iter.bi_sector << 9;
+	u64 cur_disk_bytenr;
 	u8 *csum;
-	u64 item_start_offset = 0;
-	u64 item_last_offset = 0;
-	u64 disk_bytenr;
-	u64 page_bytes_left;
-	u32 diff;
-	int nblocks;
+	int nblocks = orig_len >> fs_info->sectorsize_bits;
 	int count = 0;
 	const u32 csum_size = fs_info->csum_size;
 
@@ -282,13 +382,16 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio,
 	 * - All of our csums should only be in csum tree
 	 *   No ordered extents csums. As ordered extents are only for write
 	 *   path.
+	 * - No need to bother any other info from bvec
+	 *   Since we're looking up csums, the only important info is the
+	 *   disk_bytenr and the length, which can all be extracted from
+	 *   bi_iter directly.
 	 */
 	ASSERT(bio_op(bio) == REQ_OP_READ);
 	path = btrfs_alloc_path();
 	if (!path)
 		return BLK_STS_RESOURCE;
 
-	nblocks = bio->bi_iter.bi_size >> fs_info->sectorsize_bits;
 	if (!dst) {
 		struct btrfs_io_bio *btrfs_bio = btrfs_io_bio(bio);
 
@@ -325,81 +428,53 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio,
 		path->skip_locking = 1;
 	}
 
-	disk_bytenr = (u64)bio->bi_iter.bi_sector << 9;
+	for (cur_disk_bytenr = orig_disk_bytenr;
+	     cur_disk_bytenr < orig_disk_bytenr + orig_len;
+	     cur_disk_bytenr += (count * sectorsize)) {
+		u64 search_len = orig_disk_bytenr + orig_len - cur_disk_bytenr;
+		int sector_offset;
+		u8 *csum_dst;
 
-	bio_for_each_segment(bvec, bio, iter) {
-		page_bytes_left = bvec.bv_len;
-		if (count)
-			goto next;
+		sector_offset = (cur_disk_bytenr - orig_disk_bytenr) >>
+				 fs_info->sectorsize_bits;
+		csum_dst = csum + sector_offset * csum_size;
 
-		if (page_offsets)
-			offset = page_offset(bvec.bv_page) + bvec.bv_offset;
+		count = search_csum_tree(fs_info, path, cur_disk_bytenr,
+					 search_len, csum_dst);
+		if (count <= 0) {
+			/*
+			 * Either we hit a critical error or we didn't find
+			 * the csum.
+			 * Either way, we put zero into the csums dst, and just
+			 * skip to next sector for a better luck.
+			 */
+			memset(csum_dst, 0, csum_size);
+			count = 1;
 
-		if (!item || disk_bytenr < item_start_offset ||
-		    disk_bytenr >= item_last_offset) {
-			struct btrfs_key found_key;
-			u32 item_size;
-
-			if (item)
-				btrfs_release_path(path);
-			item = btrfs_lookup_csum(NULL, fs_info->csum_root,
-						 path, disk_bytenr, 0);
-			if (IS_ERR(item)) {
-				count = 1;
-				memset(csum, 0, csum_size);
-				if (BTRFS_I(inode)->root->root_key.objectid ==
-				    BTRFS_DATA_RELOC_TREE_OBJECTID) {
-					set_extent_bits(io_tree, offset,
-						offset + fs_info->sectorsize - 1,
+			/*
+			 * For data reloc inode, we need to mark the
+			 * range NODATASUM so that balance won't report
+			 * false csum error.
+			 */
+			if (BTRFS_I(inode)->root->root_key.objectid ==
+			    BTRFS_DATA_RELOC_TREE_OBJECTID) {
+				u64 file_offset;
+				int ret;
+
+				ret = search_file_offset_in_bio(bio, inode,
+						cur_disk_bytenr, &file_offset);
+				if (ret)
+					set_extent_bits(io_tree, file_offset,
+						file_offset + sectorsize - 1,
 						EXTENT_NODATASUM);
-				} else {
-					btrfs_info_rl(fs_info,
-						   "no csum found for inode %llu start %llu",
-					       btrfs_ino(BTRFS_I(inode)), offset);
-				}
-				item = NULL;
-				btrfs_release_path(path);
-				goto found;
+			} else {
+				btrfs_warn_rl(fs_info,
+			"csum hole found for disk bytenr range [%llu, %llu)",
+				cur_disk_bytenr, cur_disk_bytenr + sectorsize);
 			}
-			btrfs_item_key_to_cpu(path->nodes[0], &found_key,
-					      path->slots[0]);
-
-			item_start_offset = found_key.offset;
-			item_size = btrfs_item_size_nr(path->nodes[0],
-						       path->slots[0]);
-			item_last_offset = item_start_offset +
-				(item_size / csum_size) *
-				fs_info->sectorsize;
-			item = btrfs_item_ptr(path->nodes[0], path->slots[0],
-					      struct btrfs_csum_item);
-		}
-		/*
-		 * this byte range must be able to fit inside
-		 * a single leaf so it will also fit inside a u32
-		 */
-		diff = disk_bytenr - item_start_offset;
-		diff = diff >> fs_info->sectorsize_bits;
-		diff = diff * csum_size;
-		count = min_t(int, nblocks, (item_last_offset - disk_bytenr) >>
-					    fs_info->sectorsize_bits);
-		read_extent_buffer(path->nodes[0], csum,
-				   ((unsigned long)item) + diff,
-				   csum_size * count);
-found:
-		csum += count * csum_size;
-		nblocks -= count;
-next:
-		while (count > 0) {
-			count--;
-			disk_bytenr += fs_info->sectorsize;
-			offset += fs_info->sectorsize;
-			page_bytes_left -= fs_info->sectorsize;
-			if (!page_bytes_left)
-				break; /* move to next bio */
 		}
 	}
 
-	WARN_ON_ONCE(count);
 	btrfs_free_path(path);
 	return BLK_STS_OK;
 }
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3f19e0e19c96..750aa3770d8f 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2255,7 +2255,7 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio,
 			 * need to csum or not, which is why we ignore skip_sum
 			 * here.
 			 */
-			ret = btrfs_lookup_bio_sums(inode, bio, (u64)-1, NULL);
+			ret = btrfs_lookup_bio_sums(inode, bio, NULL);
 			if (ret)
 				goto out;
 		}
@@ -7965,8 +7965,7 @@ static blk_qc_t btrfs_submit_direct(struct inode *inode, struct iomap *iomap,
 		 *
 		 * If we have csums disabled this will do nothing.
 		 */
-		status = btrfs_lookup_bio_sums(inode, dio_bio, file_offset,
-					       dip->csums);
+		status = btrfs_lookup_bio_sums(inode, dio_bio, dip->csums);
 		if (status != BLK_STS_OK)
 			goto out_err;
 	}
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v2 19/24] btrfs: scrub: remove the anonymous structure from scrub_page
  2020-11-13 12:51 [PATCH v2 00/24] btrfs: preparation patches for subpage support Qu Wenruo
                   ` (17 preceding siblings ...)
  2020-11-13 12:51 ` [PATCH v2 18/24] btrfs: file-item: refactor btrfs_lookup_bio_sums() to handle out-of-order bvecs Qu Wenruo
@ 2020-11-13 12:51 ` Qu Wenruo
  2020-11-18 19:00   ` David Sterba
  2020-11-19 21:09   ` David Sterba
  2020-11-13 12:51 ` [PATCH v2 20/24] btrfs: scrub: always allocate one full page for one sector for RAID56 Qu Wenruo
                   ` (5 subsequent siblings)
  24 siblings, 2 replies; 67+ messages in thread
From: Qu Wenruo @ 2020-11-13 12:51 UTC (permalink / raw)
  To: linux-btrfs

That anonymous structure serves no special purpose; just replace it with
regular members.
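
A quick way to compare the two layouts (sizes are compiler and ABI dependent, so this is only an illustration; also note that a 1-bit signed 'int' bitfield stores 0 or -1, so such a field should only be tested for truth):

	#include <stdio.h>

	struct sp_old {			/* the anonymous struct variant */
		struct {
			unsigned int mirror_num:8;
			unsigned int have_csum:1;
			unsigned int io_error:1;
		};
	};

	struct sp_new {			/* plain members, as in the patch */
		unsigned char mirror_num;
		int have_csum:1;
		int io_error:1;
	};

	int main(void)
	{
		printf("old %zu bytes, new %zu bytes\n",
		       sizeof(struct sp_old), sizeof(struct sp_new));
		return 0;
	}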

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 57ee06d92150..06f6428b97d1 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -71,11 +71,9 @@ struct scrub_page {
 	u64			physical;
 	u64			physical_for_dev_replace;
 	atomic_t		refs;
-	struct {
-		unsigned int	mirror_num:8;
-		unsigned int	have_csum:1;
-		unsigned int	io_error:1;
-	};
+	u8			mirror_num;
+	int			have_csum:1;
+	int			io_error:1;
 	u8			csum[BTRFS_CSUM_SIZE];
 
 	struct scrub_recover	*recover;
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v2 20/24] btrfs: scrub: always allocate one full page for one sector for RAID56
  2020-11-13 12:51 [PATCH v2 00/24] btrfs: preparation patches for subpage support Qu Wenruo
                   ` (18 preceding siblings ...)
  2020-11-13 12:51 ` [PATCH v2 19/24] btrfs: scrub: remove the anonymous structure from scrub_page Qu Wenruo
@ 2020-11-13 12:51 ` Qu Wenruo
  2020-11-13 12:51 ` [PATCH v2 21/24] btrfs: scrub: support subpage tree block scrub Qu Wenruo
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 67+ messages in thread
From: Qu Wenruo @ 2020-11-13 12:51 UTC (permalink / raw)
  To: linux-btrfs

For scrub_pages() and scrub_pages_for_parity(), we currently allocate
one scrub_page structure for one page.

This is fine if we only read/write one sector at a time.
But for cases like scrubbing RAID56, we need to read/write the full
stripe, which is 64K in size.

With a subpage sector size, we would submit the read in just one page,
which is normally a good thing, but the RAID56 code expects to see one
sector, not the full stripe, in its endio function.
This could lead to a wrong parity checksum for RAID56 on subpage.

To make the existing code work well for the subpage case, we take a
shortcut here by always allocating a full page for each sector.

This should provide the basis to make RAID56 work for the subpage case.

The cost is pretty obvious: for one RAID56 stripe we now always need 16
pages. For the supported subpage situation (64K page size, 4K sector
size), this means we need a full megabyte to scrub just one RAID56
stripe.

And for data scrub, each 4K sector will also need one 64K page.

This is mostly just a workaround; the proper fix is a much larger
project, using scrub_block to replace scrub_page, and allowing
scrub_block to handle multiple pages, csums, and a csum_bitmap to avoid
allocating one page for each sector.
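
The arithmetic behind the cost estimate (64K pages, 4K sectors, one 64K RAID56 stripe; plain C for illustration):

	#include <stdio.h>

	int main(void)
	{
		unsigned int page_size = 64 * 1024;	/* subpage target */
		unsigned int sectorsize = 4096;
		unsigned int stripe_len = 64 * 1024;	/* one RAID56 stripe */

		unsigned int sectors = stripe_len / sectorsize;	/* 16 */
		/* one full page is now allocated for every sector */
		unsigned long bytes = (unsigned long)sectors * page_size;

		printf("%u pages, %lu KiB\n", sectors, bytes >> 10);	/* 1024 KiB */
		return 0;
	}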

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 06f6428b97d1..c41911604919 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2152,6 +2152,7 @@ static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
 		       u64 physical_for_dev_replace)
 {
 	struct scrub_block *sblock;
+	u32 sectorsize = sctx->fs_info->sectorsize;
 	int index;
 
 	sblock = kzalloc(sizeof(*sblock), GFP_KERNEL);
@@ -2170,7 +2171,15 @@ static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
 
 	for (index = 0; len > 0; index++) {
 		struct scrub_page *spage;
-		u64 l = min_t(u64, len, PAGE_SIZE);
+		/*
+		 * Here we will allocate one page for one sector to scrub.
+		 * This is fine if PAGE_SIZE == sectorsize, but will cost
+		 * more memory for PAGE_SIZE > sectorsize case.
+		 *
+		 * TODO: Make scrub_block to handle multiple pages and csums,
+		 * so that we don't need scrub_page structure at all.
+		 */
+		u32 l = min_t(u32, sectorsize, len);
 
 		spage = kzalloc(sizeof(*spage), GFP_KERNEL);
 		if (!spage) {
@@ -2485,8 +2494,11 @@ static int scrub_pages_for_parity(struct scrub_parity *sparity,
 {
 	struct scrub_ctx *sctx = sparity->sctx;
 	struct scrub_block *sblock;
+	u32 sectorsize = sctx->fs_info->sectorsize;
 	int index;
 
+	ASSERT(IS_ALIGNED(len, sectorsize));
+
 	sblock = kzalloc(sizeof(*sblock), GFP_KERNEL);
 	if (!sblock) {
 		spin_lock(&sctx->stat_lock);
@@ -2505,7 +2517,8 @@ static int scrub_pages_for_parity(struct scrub_parity *sparity,
 
 	for (index = 0; len > 0; index++) {
 		struct scrub_page *spage;
-		u64 l = min_t(u64, len, PAGE_SIZE);
+		/* Check scrub_page() for the reason why we use sectorsize */
+		u32 l = sectorsize;
 
 		spage = kzalloc(sizeof(*spage), GFP_KERNEL);
 		if (!spage) {
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v2 21/24] btrfs: scrub: support subpage tree block scrub
  2020-11-13 12:51 [PATCH v2 00/24] btrfs: preparation patches for subpage support Qu Wenruo
                   ` (19 preceding siblings ...)
  2020-11-13 12:51 ` [PATCH v2 20/24] btrfs: scrub: always allocate one full page for one sector for RAID56 Qu Wenruo
@ 2020-11-13 12:51 ` Qu Wenruo
  2020-11-13 12:51 ` [PATCH v2 22/24] btrfs: scrub: support subpage data scrub Qu Wenruo
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 67+ messages in thread
From: Qu Wenruo @ 2020-11-13 12:51 UTC (permalink / raw)
  To: linux-btrfs

To support subpage tree block scrub, scrub_checksum_tree_block() only
needs to learn 2 new tricks:

- Follow the sector size
  Now that scrub_page only represents one sector, we need to follow it
  properly.

- Run the checksum on all sectors
  Since scrub_page only represents one sector, we need to run the hash
  on all sectors, no longer on just (nodesize >> PAGE_SHIFT) pages.
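
A sketch of the new loop shape; hash_update() here is a stand-in for crypto_shash_update(), not the real crypto API, and the sector layout is illustrative:

	#include <stddef.h>

	#define BTRFS_CSUM_SIZE	32

	static void hash_update(const void *data, size_t len)
	{
		(void)data; (void)len;	/* stand-in for crypto_shash_update() */
	}

	/* The first sector skips the inline csum, the rest are hashed whole */
	static void checksum_tree_block(char *sectors[], unsigned int num_sectors,
					unsigned int sectorsize)
	{
		unsigned int i;

		hash_update(sectors[0] + BTRFS_CSUM_SIZE,
			    sectorsize - BTRFS_CSUM_SIZE);
		for (i = 1; i < num_sectors; i++)
			hash_update(sectors[i], sectorsize);
	}

	int main(void)
	{
		static char sec[4][4096];	/* 16K node, 4K sectors */
		char *sectors[] = { sec[0], sec[1], sec[2], sec[3] };

		checksum_tree_block(sectors, 4, 4096);
		return 0;
	}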

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index c41911604919..ed63921789d4 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -1807,15 +1807,20 @@ static int scrub_checksum_tree_block(struct scrub_block *sblock)
 	struct scrub_ctx *sctx = sblock->sctx;
 	struct btrfs_header *h;
 	struct btrfs_fs_info *fs_info = sctx->fs_info;
+	u32 sectorsize = sctx->fs_info->sectorsize;
 	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
 	u8 calculated_csum[BTRFS_CSUM_SIZE];
 	u8 on_disk_csum[BTRFS_CSUM_SIZE];
-	const int num_pages = sctx->fs_info->nodesize >> PAGE_SHIFT;
+	const int num_sectors = fs_info->nodesize >> fs_info->sectorsize_bits;
 	int i;
 	struct scrub_page *spage;
 	char *kaddr;
 
 	BUG_ON(sblock->page_count < 1);
+
+	/* Each pagev[] is in fact just one sector, not a full page */
+	ASSERT(sblock->page_count == num_sectors);
+
 	spage = sblock->pagev[0];
 	kaddr = page_address(spage->page);
 	h = (struct btrfs_header *)kaddr;
@@ -1844,11 +1849,11 @@ static int scrub_checksum_tree_block(struct scrub_block *sblock)
 	shash->tfm = fs_info->csum_shash;
 	crypto_shash_init(shash);
 	crypto_shash_update(shash, kaddr + BTRFS_CSUM_SIZE,
-			    PAGE_SIZE - BTRFS_CSUM_SIZE);
+			    sectorsize - BTRFS_CSUM_SIZE);
 
-	for (i = 1; i < num_pages; i++) {
+	for (i = 1; i < num_sectors; i++) {
 		kaddr = page_address(sblock->pagev[i]->page);
-		crypto_shash_update(shash, kaddr, PAGE_SIZE);
+		crypto_shash_update(shash, kaddr, sectorsize);
 	}
 
 	crypto_shash_final(shash, calculated_csum);
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v2 22/24] btrfs: scrub: support subpage data scrub
  2020-11-13 12:51 [PATCH v2 00/24] btrfs: preparation patches for subpage support Qu Wenruo
                   ` (20 preceding siblings ...)
  2020-11-13 12:51 ` [PATCH v2 21/24] btrfs: scrub: support subpage tree block scrub Qu Wenruo
@ 2020-11-13 12:51 ` Qu Wenruo
  2020-11-18 16:29   ` David Sterba
  2020-11-13 12:51 ` [PATCH v2 23/24] btrfs: scrub: allow scrub to work with subpage sectorsize Qu Wenruo
                   ` (2 subsequent siblings)
  24 siblings, 1 reply; 67+ messages in thread
From: Qu Wenruo @ 2020-11-13 12:51 UTC (permalink / raw)
  To: linux-btrfs

Btrfs scrub is in fact much more flexible than the buffered data write
path, as we can read unaligned subpage data into page offset 0.

This ability makes subpage support much easier: we just need to check
each scrub_page::page_len and ensure we only calculate the hash for
[0, page_len) of a page, and call it a day for subpage scrub support.

There is a small thing to notice: for the subpage case, we still do
sector by sector scrub.
This means we will submit a read bio for each sector to scrub,
resulting in the same amount of read bios as on 4K page systems.

This behavior can be considered a good thing if we want everything to
behave the same as on 4K page systems.
But it also means we're wasting the ability to submit larger bios
using the 64K page size.
This is another problem to consider in the future.
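
To put numbers on it (illustrative values): scrubbing 64K of data with
a 4K sectorsize issues 16 separate read bios on a 64K page system,
where a single 64K bio would have been possible.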

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index ed63921789d4..4be376f2a82f 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -1781,8 +1781,9 @@ static int scrub_checksum_data(struct scrub_block *sblock)
 	struct scrub_ctx *sctx = sblock->sctx;
 	struct btrfs_fs_info *fs_info = sctx->fs_info;
 	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
-	u8 csum[BTRFS_CSUM_SIZE];
 	struct scrub_page *spage;
+	u32 sectorsize = fs_info->sectorsize;
+	u8 csum[BTRFS_CSUM_SIZE];
 	char *kaddr;
 
 	BUG_ON(sblock->page_count < 1);
@@ -1794,11 +1795,15 @@ static int scrub_checksum_data(struct scrub_block *sblock)
 
 	shash->tfm = fs_info->csum_shash;
 	crypto_shash_init(shash);
-	crypto_shash_digest(shash, kaddr, PAGE_SIZE, csum);
+
+	/*
+	 * In scrub_pages() and scrub_pages_for_parity() we ensure
+	 * each spage only contains just one sector of data.
+	 */
+	crypto_shash_digest(shash, kaddr, sectorsize, csum);
 
 	if (memcmp(csum, spage->csum, sctx->fs_info->csum_size))
 		sblock->checksum_error = 1;
-
 	return sblock->checksum_error;
 }
 
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v2 23/24] btrfs: scrub: allow scrub to work with subpage sectorsize
  2020-11-13 12:51 [PATCH v2 00/24] btrfs: preparation patches for subpage support Qu Wenruo
                   ` (21 preceding siblings ...)
  2020-11-13 12:51 ` [PATCH v2 22/24] btrfs: scrub: support subpage data scrub Qu Wenruo
@ 2020-11-13 12:51 ` Qu Wenruo
  2020-11-13 12:51 ` [PATCH v2 24/24] btrfs: extent_io: Use detach_page_private() for alloc_extent_buffer() Qu Wenruo
  2020-11-13 20:05 ` [PATCH v2 00/24] btrfs: preparation patches for subpage support Josef Bacik
  24 siblings, 0 replies; 67+ messages in thread
From: Qu Wenruo @ 2020-11-13 12:51 UTC (permalink / raw)
  To: linux-btrfs

Just remove the error-out branch for the (sectorsize != PAGE_SIZE)
case, since scrub can now handle subpage sector sizes.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 4be376f2a82f..f712a40ee77c 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3872,14 +3872,6 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 		return -EINVAL;
 	}
 
-	if (fs_info->sectorsize != PAGE_SIZE) {
-		/* not supported for data w/o checksums */
-		btrfs_err_rl(fs_info,
-			   "scrub: size assumption sectorsize != PAGE_SIZE (%d != %lu) fails",
-		       fs_info->sectorsize, PAGE_SIZE);
-		return -EINVAL;
-	}
-
 	if (fs_info->nodesize >
 	    PAGE_SIZE * SCRUB_MAX_PAGES_PER_BLOCK ||
 	    fs_info->sectorsize > PAGE_SIZE * SCRUB_MAX_PAGES_PER_BLOCK) {
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v2 24/24] btrfs: extent_io: Use detach_page_private() for alloc_extent_buffer()
  2020-11-13 12:51 [PATCH v2 00/24] btrfs: preparation patches for subpage support Qu Wenruo
                   ` (22 preceding siblings ...)
  2020-11-13 12:51 ` [PATCH v2 23/24] btrfs: scrub: allow scrub to work with subpage sectorsize Qu Wenruo
@ 2020-11-13 12:51 ` Qu Wenruo
  2020-11-13 20:05 ` [PATCH v2 00/24] btrfs: preparation patches for subpage support Josef Bacik
  24 siblings, 0 replies; 67+ messages in thread
From: Qu Wenruo @ 2020-11-13 12:51 UTC (permalink / raw)
  To: linux-btrfs

In alloc_extent_buffer(), after we get a page from the btree inode, we
check whether that page has a private pointer attached.

If attached, we check whether the existing extent buffer has proper
refs.
If not (the eb is being freed), we will detach that private eb pointer.

The point here is, we are detaching that eb pointer by calling:
- ClearPagePrivate()
- put_page()

The put_page() here is especially confusing, as it's decreasing the
ref taken by attach_page_private().
Without knowing that, it looks like the put_page() is for the
find_or_create_page() call, confusing the reader.

Since we're always modifying page private with attach_page_private()
and detach_page_private(), the open-coded ClearPagePrivate()/put_page()
pair here is really confusing.

Fix it by calling detach_page_private().
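
For reference, detach_page_private() (one of the generic pagecache
helpers) bundles exactly those steps; a rough sketch of its behavior,
not the verbatim kernel code:

	static inline void *detach_page_private(struct page *page)
	{
		void *data = (void *)page_private(page);

		if (!PagePrivate(page))
			return NULL;
		ClearPagePrivate(page);
		set_page_private(page, 0);
		/* Drop the ref taken by attach_page_private() */
		put_page(page);

		return data;
	}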

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index f305777ee1a3..55115f485d09 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5310,14 +5310,13 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 				goto free_eb;
 			}
 			exists = NULL;
+			WARN_ON(PageDirty(p));
 
 			/*
 			 * Do this so attach doesn't complain and we need to
 			 * drop the ref the old guy had.
 			 */
-			ClearPagePrivate(p);
-			WARN_ON(PageDirty(p));
-			put_page(p);
+			detach_page_private(p);
 		}
 		attach_extent_buffer_page(eb, p);
 		spin_unlock(&mapping->private_lock);
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 15/24] btrfs: extent-io: make type of extent_state::state to be at least 32 bits
  2020-11-13 12:51 ` [PATCH v2 15/24] btrfs: extent-io: make type of extent_state::state to be at least 32 bits Qu Wenruo
@ 2020-11-13 18:40   ` Josef Bacik
  2020-11-18 16:11   ` David Sterba
  2020-11-19 21:09   ` David Sterba
  2 siblings, 0 replies; 67+ messages in thread
From: Josef Bacik @ 2020-11-13 18:40 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: Nikolay Borisov

On 11/13/20 7:51 AM, Qu Wenruo wrote:
> Currently we use 'unsigned' for extent_state::state, which is only ensured
> to be at least 16 bits.
> 
> But for incoming subpage support, we are going to introduce more bits,
> thus we will go beyond 16 bits.
> 
> To support this, make extent_state::state at least 32bit and to be more
> explicit, we use "u32" to be clear about the max supported bits.
> 
> This doesn't increase the memory usage for x86_64, but may affect other
> architectures.
> 
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
> Signed-off-by: Qu Wenruo <wqu@suse.com>

This failed to apply cleanly; I'll review everything up to this.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 02/24] btrfs: extent-io-tests: remove invalid tests
  2020-11-13 12:51 ` [PATCH v2 02/24] btrfs: extent-io-tests: remove invalid tests Qu Wenruo
@ 2020-11-13 18:42   ` Josef Bacik
  2020-11-19 21:08   ` David Sterba
  1 sibling, 0 replies; 67+ messages in thread
From: Josef Bacik @ 2020-11-13 18:42 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 11/13/20 7:51 AM, Qu Wenruo wrote:
> In extent-io-test, there are two invalid tests:
> - Invalid nodesize for test_eb_bitmaps()
>    Instead of the sectorsize and nodesize combination passed in, we're
>    always using hand-crafted nodesize, e.g:
> 
> 	len = (sectorsize < BTRFS_MAX_METADATA_BLOCKSIZE)
> 		? sectorsize * 4 : sectorsize;
> 
>    In the above case, if we have a 32K page size, then we will get a
>    length of 128K, which is beyond the max node size, and obviously
>    invalid.
> 
>    Thankfully most machines have either 4K or 64K page size, thus we
>    haven't yet hit such a case.
> 
> - Invalid extent buffer bytenr
>    For 64K page size, the only combination we're going to test is
>    sectorsize = nodesize = 64K.
>    However in that case, we will try to test an eb whose bytenr is not
>    sectorsize aligned:
> 
> 	/* Do it over again with an extent buffer which isn't page-aligned. */
> 	eb = __alloc_dummy_extent_buffer(fs_info, nodesize / 2, len);
> 
>    Sector alignment is a hard requirement for any sector size.
>    The only exception is the superblock; anything else should follow
>    sector size alignment.
> 
>    This is definitely an invalid test case.
> 
> This patch will fix both problems by:
> - Honor the sectorsize/nodesize combination
>    Now we won't bother to hand-craft a strange length and use it as
>    nodesize.
> 
> - Use sectorsize as the 2nd run extent buffer start
>    This would test the case where extent buffer is aligned to sectorsize
>    but not always aligned to nodesize.
> 
> Please note that later subpage related cleanups will reduce
> extent_buffer::pages[] to exactly what we need, making sector
> unaligned extent buffer operations cause problems.
> 
> Since only extent_io self tests utilize this invalid feature, this
> patch is required for all later cleanup/refactors.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 03/24] btrfs: extent_io: replace extent_start/extent_len with better structure for end_bio_extent_readpage()
  2020-11-13 12:51 ` [PATCH v2 03/24] btrfs: extent_io: replace extent_start/extent_len with better structure for end_bio_extent_readpage() Qu Wenruo
@ 2020-11-13 19:13   ` Josef Bacik
  2020-11-18 16:05   ` David Sterba
  2020-11-19 21:08   ` David Sterba
  2 siblings, 0 replies; 67+ messages in thread
From: Josef Bacik @ 2020-11-13 19:13 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 11/13/20 7:51 AM, Qu Wenruo wrote:
> In end_bio_extent_readpage() we had a strange dance around
> extent_start/extent_len.
> 
> Hidden behind the strange dance is the fact that it's just calling
> endio_readpage_release_extent() on each bvec range.
> 
> Here is an example to explain the original work flow:
>    Bio is for inode 257, containing 2 pages, for range [1M, 1M+8K)
> 
>    end_bio_extent_readpage() entered
>    |- extent_start = 0;
>    |- extent_end = 0;
>    |- bio_for_each_segment_all() {
>    |  |- /* Got the 1st bvec */
>    |  |- start = SZ_1M;
>    |  |- end = SZ_1M + SZ_4K - 1;
>    |  |- update = 1;
>    |  |- if (extent_len == 0) {
>    |  |  |- extent_start = start; /* SZ_1M */
>    |  |  |- extent_len = end + 1 - start; /* SZ_4K */
>    |  |  }
>    |  |
>    |  |- /* Got the 2nd bvec */
>    |  |- start = SZ_1M + 4K;
>    |  |- end = SZ_1M + 8K - 1;
>    |  |- update = 1;
>    |  |- if (extent_start + extent_len == start) {
>    |  |  |- extent_len += end + 1 - start; /* SZ_8K */
>    |  |  }
>    |  } /* All bio vec iterated */
>    |
>    |- if (extent_len) {
>       |- endio_readpage_release_extent(tree, extent_start, extent_len,
> 				      update);
> 	/* extent_start == SZ_1M, extent_len == SZ_8K, uptodate = 1 */
> 
> As the above flow shows, the existing code in end_bio_extent_readpage()
> just accumulates extent_start/extent_len, and when the contiguous range
> breaks, calls endio_readpage_release_extent() for the range.
> 
> However current behavior has something not really considered:
> - The inode can change
>    The pages in a bio don't need to have contiguous page_offset.
>    This means even pages from different inodes can be packed into one
>    bio.
> 
> - Bvec cross page boundary
>    There is a feature called multi-page bvec, where bvec->bv_len can go
>    beyond bvec->bv_page boundary.
> 
> - Poor readability
> 
> This patch will address the problem by:
> - Introduce a proper structure, processed_extent, to record processed
>    extent range
> - Integrate inode/start/end/uptodate check into
>    endio_readpage_release_extent()
> - Add more comment on each step.
>    This should greatly improve the readability, now in
>    end_bio_extent_readpage() there are only two
>    endio_readpage_release_extent() calls.
> 
> - Add inode contig check
>    Now we also ensure the inode is the same before checking the range
>    contig.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

I like this much better,

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 04/24] btrfs: extent_io: introduce helper to handle page status update in end_bio_extent_readpage()
  2020-11-13 12:51 ` [PATCH v2 04/24] btrfs: extent_io: introduce helper to handle page status update in end_bio_extent_readpage() Qu Wenruo
@ 2020-11-13 19:18   ` Josef Bacik
  2020-11-18 20:27   ` David Sterba
  2020-11-19 21:08   ` David Sterba
  2 siblings, 0 replies; 67+ messages in thread
From: Josef Bacik @ 2020-11-13 19:18 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 11/13/20 7:51 AM, Qu Wenruo wrote:
> Introduce a new helper, endio_readpage_update_page_status(), to handle
> the page status update in end_bio_extent_readpage().
> 
> The refactor itself is nothing really interesting; the point here is
> to provide the basis for later subpage support, where the page status
> update can be more complex than in the current code.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 05/24] btrfs: extent_io: extract the btree page submission code into its own helper function
  2020-11-13 12:51 ` [PATCH v2 05/24] btrfs: extent_io: extract the btree page submission code into its own helper function Qu Wenruo
@ 2020-11-13 19:22   ` Josef Bacik
  2020-11-18 20:04   ` David Sterba
  2020-11-18 20:09   ` David Sterba
  2 siblings, 0 replies; 67+ messages in thread
From: Josef Bacik @ 2020-11-13 19:22 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: David Sterba

On 11/13/20 7:51 AM, Qu Wenruo wrote:
> In btree_write_cache_pages() we have a btree page submission routine
> buried deep inside a nested loop.
> 
> This patch will extract that part of code into a helper function,
> submit_eb_page(), to do the same work.
> 
> Also, since submit_eb_page() now can return >0 for successful extent
> buffer submission, remove the "ASSERT(ret <= 0);" line.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Signed-off-by: David Sterba <dsterba@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 08/24] btrfs: inode: make btrfs_verify_data_csum() follow sector size
  2020-11-13 12:51 ` [PATCH v2 08/24] btrfs: inode: make btrfs_verify_data_csum() follow sector size Qu Wenruo
@ 2020-11-13 19:43   ` Josef Bacik
  0 siblings, 0 replies; 67+ messages in thread
From: Josef Bacik @ 2020-11-13 19:43 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: Goldwyn Rodrigues

On 11/13/20 7:51 AM, Qu Wenruo wrote:
> Currently btrfs_verify_data_csum() just passes the whole page to
> check_data_csum(), which is fine since we only support sectorsize ==
> PAGE_SIZE.
> 
> To support subpage, we need to properly honor per-sector
> checksum verification, just like what we did in the dio read path.
> 
> This patch will do the csum verification in a for loop, starting with
> pg_off == start - page_offset(page) and increasing pg_off by
> sectorsize on each iteration.
> 
> For the sectorsize == PAGE_SIZE case, pg_off will always be 0, and we
> will finish after just one iteration.
> 
> For the subpage case, we loop to iterate over each sector, and if we
> find any error, we return an error.
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
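
Based on that description, the verification loop is roughly the
following (a sketch with an assumed check_data_csum() signature, not
the exact patch; @end is taken as exclusive here):

	u32 sectorsize = fs_info->sectorsize;
	u32 pg_off;

	for (pg_off = start - page_offset(page);
	     pg_off < end - page_offset(page);
	     pg_off += sectorsize) {
		/* Verify one sector; bail out on the first bad csum */
		int ret = check_data_csum(inode, io_bio, phy_offset,
					  page, pg_off);
		if (ret)
			return ret;
		phy_offset += sectorsize;
	}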

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 09/24] btrfs: extent_io: calculate inline extent buffer page size based on page size
  2020-11-13 12:51 ` [PATCH v2 09/24] btrfs: extent_io: calculate inline extent buffer page size based on page size Qu Wenruo
@ 2020-11-13 19:47   ` Josef Bacik
  2020-11-14  0:11     ` Qu Wenruo
  0 siblings, 1 reply; 67+ messages in thread
From: Josef Bacik @ 2020-11-13 19:47 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: Nikolay Borisov, David Sterba

On 11/13/20 7:51 AM, Qu Wenruo wrote:
> Btrfs only supports 64K as the max node size, thus for a 4K page system, we
> would have at most 16 pages for one extent buffer.
> 
> For a system using 64K page size, we would really have just one
> single page.
> 
> While we always use 16 pages for extent_buffer::pages[], this means for
> systems using 64K pages, we are wasting memory for the 15 pages which
> will never be utilized.
> 
> So this patch will change how the extent_buffer::pages[] array size is
> calculated; now it will be calculated using
> BTRFS_MAX_METADATA_BLOCKSIZE and PAGE_SIZE.
> 
> For systems using 4K page size, it will stay 16 pages.
> For systems using 64K page size, it will be just 1 page.
> 
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>   fs/btrfs/extent_io.c | 7 +------
>   fs/btrfs/extent_io.h | 8 +++++---
>   2 files changed, 6 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 37dd103213f9..ca3eb095a298 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -5042,12 +5042,7 @@ __alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start,
>   	atomic_set(&eb->refs, 1);
>   	atomic_set(&eb->io_pages, 0);
>   
> -	/*
> -	 * Sanity checks, currently the maximum is 64k covered by 16x 4k pages
> -	 */
> -	BUILD_BUG_ON(BTRFS_MAX_METADATA_BLOCKSIZE
> -		> MAX_INLINE_EXTENT_BUFFER_SIZE);
> -	BUG_ON(len > MAX_INLINE_EXTENT_BUFFER_SIZE);
> +	ASSERT(len <= BTRFS_MAX_METADATA_BLOCKSIZE);
>   
>   	return eb;
>   }
> diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
> index c76697fc3120..dfdef9c5c379 100644
> --- a/fs/btrfs/extent_io.h
> +++ b/fs/btrfs/extent_io.h
> @@ -73,9 +73,11 @@ typedef blk_status_t (submit_bio_hook_t)(struct inode *inode, struct bio *bio,
>   
>   typedef blk_status_t (extent_submit_bio_start_t)(struct inode *inode,
>   		struct bio *bio, u64 bio_offset);
> -
> -#define INLINE_EXTENT_BUFFER_PAGES 16
> -#define MAX_INLINE_EXTENT_BUFFER_SIZE (INLINE_EXTENT_BUFFER_PAGES * PAGE_SIZE)
> +/*
> + * The SZ_64K is BTRFS_MAX_METADATA_BLOCKSIZE, here just to avoid circle
> + * including "ctree.h".
> + */

Just a passing thought, maybe we should move that definition into btrfs_tree.h.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 00/24] btrfs: preparation patches for subpage support
  2020-11-13 12:51 [PATCH v2 00/24] btrfs: preparation patches for subpage support Qu Wenruo
                   ` (23 preceding siblings ...)
  2020-11-13 12:51 ` [PATCH v2 24/24] btrfs: extent_io: Use detach_page_private() for alloc_extent_buffer() Qu Wenruo
@ 2020-11-13 20:05 ` Josef Bacik
  24 siblings, 0 replies; 67+ messages in thread
From: Josef Bacik @ 2020-11-13 20:05 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 11/13/20 7:51 AM, Qu Wenruo wrote:
> This is the rebased preparation branch for all patches not yet merged into
> misc-next.
> 
> It can be fetched from github (with RO mount support using page->private)
> https://github.com/adam900710/linux/tree/subpage
> 
> This patchset includes all the unmerged preparation patches for subpage
> support.
> 
> The patchset is sent without the main core for subpage support, as
> myself has proven that, big patchset bombarding won't really make
> reviewers happy, but only make the author happy (for a very short time).
> 
> Thanks for the hard work from David, there are only 24 patches unmerged.
> 
> Patch 01:	Special hotfix, for selftests. Should be folded into the
> 		offending patch.
> Patch 02:	Fix to remove invalid test cases for 64K sector size.
> Patch 03~08:	Refactors around endio functions (both data and
> 		metadata)
> Patch 09~14:	Update metadata related handling to support subpage.

Sorry Dave, should have just done it this way, you can add

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

for 1-14.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 09/24] btrfs: extent_io: calculate inline extent buffer page size based on page size
  2020-11-13 19:47   ` Josef Bacik
@ 2020-11-14  0:11     ` Qu Wenruo
  0 siblings, 0 replies; 67+ messages in thread
From: Qu Wenruo @ 2020-11-14  0:11 UTC (permalink / raw)
  To: Josef Bacik, Qu Wenruo, linux-btrfs; +Cc: Nikolay Borisov, David Sterba



On 2020/11/14 上午3:47, Josef Bacik wrote:
> On 11/13/20 7:51 AM, Qu Wenruo wrote:
>> Btrfs only supports 64K as the max node size, thus for a 4K page system, we
>> would have at most 16 pages for one extent buffer.
>>
>> For a system using 64K page size, we would really have just one
>> single page.
>>
>> While we always use 16 pages for extent_buffer::pages[], this means for
>> systems using 64K pages, we are wasting memory for the 15 pages which
>> will never be utilized.
>>
>> So this patch will change how the extent_buffer::pages[] array size is
>> calculated; now it will be calculated using
>> BTRFS_MAX_METADATA_BLOCKSIZE and PAGE_SIZE.
>>
>> For systems using 4K page size, it will stay 16 pages.
>> For systems using 64K page size, it will be just 1 page.
>>
>> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> Signed-off-by: David Sterba <dsterba@suse.com>
>> ---
>>   fs/btrfs/extent_io.c | 7 +------
>>   fs/btrfs/extent_io.h | 8 +++++---
>>   2 files changed, 6 insertions(+), 9 deletions(-)
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index 37dd103213f9..ca3eb095a298 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -5042,12 +5042,7 @@ __alloc_extent_buffer(struct btrfs_fs_info
>> *fs_info, u64 start,
>>       atomic_set(&eb->refs, 1);
>>       atomic_set(&eb->io_pages, 0);
>>   -    /*
>> -     * Sanity checks, currently the maximum is 64k covered by 16x 4k
>> pages
>> -     */
>> -    BUILD_BUG_ON(BTRFS_MAX_METADATA_BLOCKSIZE
>> -        > MAX_INLINE_EXTENT_BUFFER_SIZE);
>> -    BUG_ON(len > MAX_INLINE_EXTENT_BUFFER_SIZE);
>> +    ASSERT(len <= BTRFS_MAX_METADATA_BLOCKSIZE);
>>         return eb;
>>   }
>> diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
>> index c76697fc3120..dfdef9c5c379 100644
>> --- a/fs/btrfs/extent_io.h
>> +++ b/fs/btrfs/extent_io.h
>> @@ -73,9 +73,11 @@ typedef blk_status_t (submit_bio_hook_t)(struct
>> inode *inode, struct bio *bio,
>>     typedef blk_status_t (extent_submit_bio_start_t)(struct inode *inode,
>>           struct bio *bio, u64 bio_offset);
>> -
>> -#define INLINE_EXTENT_BUFFER_PAGES 16
>> -#define MAX_INLINE_EXTENT_BUFFER_SIZE (INLINE_EXTENT_BUFFER_PAGES *
>> PAGE_SIZE)
>> +/*
>> + * The SZ_64K is BTRFS_MAX_METADATA_BLOCKSIZE, here just to avoid circle
>> + * including "ctree.h".
>> + */
>
> Just a passing thought, maybe we should move that definition into
> btrfs_tree.h.

Indeed, this is much better and solves the ctree.h inclusion loop.

That would be another patch though.

Thanks,
Qu
>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
>
> Thanks,
>
> Josef

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 10/24] btrfs: introduce a helper to determine if the sectorsize is smaller than PAGE_SIZE
  2020-11-13 12:51 ` [PATCH v2 10/24] btrfs: introduce a helper to determine if the sectorsize is smaller than PAGE_SIZE Qu Wenruo
@ 2020-11-16 22:51   ` David Sterba
  2020-11-16 23:50     ` Qu Wenruo
  0 siblings, 1 reply; 67+ messages in thread
From: David Sterba @ 2020-11-16 22:51 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Nov 13, 2020 at 08:51:35PM +0800, Qu Wenruo wrote:
> Just to save us several letters for the incoming patches.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/ctree.h | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 99955b6bfc62..93de6134c65c 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -3660,6 +3660,11 @@ static inline int btrfs_defrag_cancelled(struct btrfs_fs_info *fs_info)
>  	return signal_pending(current);
>  }
>  
> +static inline bool btrfs_is_subpage(struct btrfs_fs_info *fs_info)
> +{
> +	return (fs_info->sectorsize < PAGE_SIZE);

I'm not convinced we want to obscure the simple check; saving a few
letters does not sound like a compelling argument. There's nothing
wrong with having 'sectorsize < PAGE_SIZE' in conditions.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 10/24] btrfs: introduce a helper to determine if the sectorsize is smaller than PAGE_SIZE
  2020-11-16 22:51   ` David Sterba
@ 2020-11-16 23:50     ` Qu Wenruo
  2020-11-17  0:24       ` David Sterba
  0 siblings, 1 reply; 67+ messages in thread
From: Qu Wenruo @ 2020-11-16 23:50 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs





On 2020/11/17 上午6:51, David Sterba wrote:
> On Fri, Nov 13, 2020 at 08:51:35PM +0800, Qu Wenruo wrote:
>> Just to save us several letters for the incoming patches.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>  fs/btrfs/ctree.h | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index 99955b6bfc62..93de6134c65c 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -3660,6 +3660,11 @@ static inline int btrfs_defrag_cancelled(struct btrfs_fs_info *fs_info)
>>  	return signal_pending(current);
>>  }
>>  
>> +static inline bool btrfs_is_subpage(struct btrfs_fs_info *fs_info)
>> +{
>> +	return (fs_info->sectorsize < PAGE_SIZE);
> 
> I'm not convinced we want to obscure the simple check, saving a few
> letters does not sound like a compelling argument. Nothing wrong to have
> 'sectorsize < PAGE_SIZE' in conditions.
> 
OK, I can go without the helper.

But there are more things like:

struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);

Should we use a helper or just the above line?


Since we're here, allow me to ask for some advice on how to refactor
some code.

Another question is about my current RW branch.
We will have quite a few calls like:

spin_lock(&subpage->lock);
bitmap_set(subpage->uptodate_bitmap, bit_start, nbits);
if (bitmap_full(subpage->uptodate_bitmap, BTRFS_SUBPAGE_BITMAP_SIZE))
	SetPageUptodate(page);
spin_unlock(&subpage->lock);

The call sites are not that many, <=5, but there are several different
combinations, e.g. for endio we want to use the irqsave version of the
spinlock.

For data write, we want to convert bits in dirty_bitmap to
writeback_bitmap, and re-check page status.

Should I introduce some helpers for that too?
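
A minimal sketch of what one such helper could look like (names are
hypothetical, not from the actual branch; this is the irqsave flavor):

	static void btrfs_subpage_set_uptodate(struct btrfs_subpage *subpage,
					       struct page *page,
					       unsigned int bit_start,
					       unsigned int nbits)
	{
		unsigned long flags;

		/* irqsave so this is also safe from endio context */
		spin_lock_irqsave(&subpage->lock, flags);
		bitmap_set(subpage->uptodate_bitmap, bit_start, nbits);
		if (bitmap_full(subpage->uptodate_bitmap,
				BTRFS_SUBPAGE_BITMAP_SIZE))
			SetPageUptodate(page);
		spin_unlock_irqrestore(&subpage->lock, flags);
	}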

Thanks,
Qu



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 10/24] btrfs: introduce a helper to determine if the sectorsize is smaller than PAGE_SIZE
  2020-11-16 23:50     ` Qu Wenruo
@ 2020-11-17  0:24       ` David Sterba
  0 siblings, 0 replies; 67+ messages in thread
From: David Sterba @ 2020-11-17  0:24 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Tue, Nov 17, 2020 at 07:50:48AM +0800, Qu Wenruo wrote:
> On 2020/11/17 上午6:51, David Sterba wrote:
> > On Fri, Nov 13, 2020 at 08:51:35PM +0800, Qu Wenruo wrote:
> >> Just to save us several letters for the incoming patches.
> >>
> >> Signed-off-by: Qu Wenruo <wqu@suse.com>
> >> ---
> >>  fs/btrfs/ctree.h | 5 +++++
> >>  1 file changed, 5 insertions(+)
> >>
> >> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> >> index 99955b6bfc62..93de6134c65c 100644
> >> --- a/fs/btrfs/ctree.h
> >> +++ b/fs/btrfs/ctree.h
> >> @@ -3660,6 +3660,11 @@ static inline int btrfs_defrag_cancelled(struct btrfs_fs_info *fs_info)
> >>  	return signal_pending(current);
> >>  }
> >>  
> >> +static inline bool btrfs_is_subpage(struct btrfs_fs_info *fs_info)
> >> +{
> >> +	return (fs_info->sectorsize < PAGE_SIZE);
> > 
> > I'm not convinced we want to obscure the simple check, saving a few
> > letters does not sound like a compelling argument. Nothing wrong to have
> > 'sectorsize < PAGE_SIZE' in conditions.
> > 
> OK, I can go without the helper.
> 
> But there are more things like:
> 
> struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
> 
> Should we use a helper or just above line?

That's ok and we can clean it up later; the pointer chain is used in
several places, so that's not an uncommon pattern, and if we want to
use something compact then it's better to do that all at once.

> Since we're here, allow me to ask for some advice on how to refactor
> some code.
> 
> Another question is in my current RW branch.
> We will have quite some calls like:
> 
> spin_lock(&subpage->lock);
> bitmap_set(subpage->uptodate_bitmap, bit_start, nbits);
> if (bitmap_full(subpage->uptodate_bitmap, BTRFS_SUBPAGE_BITMAP_SIZE))
> 	SetPageUptodate(page);
> spin_unlock(&subpage->lock);
> 
> The call sites are not that many, <=5, but there are several different
> combinations, e.g. for endio we want to use irqsave version of spinlock.
> 
> For data write, we want to convert bits in dirty_bitmap to
> writeback_bitmap, and re-check page status.
> 
> Should I introduce some helpers for that too?

I haven't seen the code, but from what you say I think it could be hard
to have one common helper for all the cases. I can give you another
opinion after I see the actual patch, so if the code follows a pattern
of lock/update bitmap/check/unlock I think it's ok.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 01/24] btrfs: tests: fix free space tree test failure on 64K page system
  2020-11-13 12:51 ` [PATCH v2 01/24] btrfs: tests: fix free space tree test failure on 64K page system Qu Wenruo
@ 2020-11-17 11:53   ` Nikolay Borisov
  0 siblings, 0 replies; 67+ messages in thread
From: Nikolay Borisov @ 2020-11-17 11:53 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 13.11.20 г. 14:51 ч., Qu Wenruo wrote:
> When trying to load btrfs with selftest compiled in, on 64K page system
> the test will always fail:
> 
>   BTRFS: selftest: running free space tree tests
>   BTRFS: selftest: fs/btrfs/tests/free-space-tree-tests.c:101 free space tree is invalid
>   BTRFS: selftest: fs/btrfs/tests/free-space-tree-tests.c:529 test_empty_block_group [btrfs] failed with extents, sectorsize=65536, nodesize=65536, alignment=134217728
> 
> The cause is that, after commit 801dc0c9ff1f ("btrfs: replace div_u64 by
> shift in free_space_bitmap_size"), we use fs_info->sectorsize_bits for
> free space cache.
> 
> But in commit fc59cfa7d2ab ("btrfs: use precalculated sectorsize_bits
> from fs_info"), we only initialized the fs_info for non-testing
> environment, leaving the default bits to be ilog2(4K), screwing up the
> selftest on 64K page system.
> 
> Fix it by also honoring sectorsize_bits in the selftests.
> 
> David, please fold this fix into the offending commit.
> 
> Fixes: fc59cfa7d2ab ("btrfs: use precalculated sectorsize_bits from fs_info")
> Signed-off-by: Qu Wenruo <wqu@suse.com>

This can be dropped as it's already merged and authored by David.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 03/24] btrfs: extent_io: replace extent_start/extent_len with better structure for end_bio_extent_readpage()
  2020-11-13 12:51 ` [PATCH v2 03/24] btrfs: extent_io: replace extent_start/extent_len with better structure for end_bio_extent_readpage() Qu Wenruo
  2020-11-13 19:13   ` Josef Bacik
@ 2020-11-18 16:05   ` David Sterba
  2020-11-18 23:49     ` Qu Wenruo
  2020-11-19 21:08   ` David Sterba
  2 siblings, 1 reply; 67+ messages in thread
From: David Sterba @ 2020-11-18 16:05 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Nov 13, 2020 at 08:51:28PM +0800, Qu Wenruo wrote:
>  }
>  
> +/*
> + * Records previously processed extent range.
> + *
> + * For endio_readpage_release_extent() to handle a full extent range, reducing
> + * the extent io operations.
> + */
> +struct processed_extent {
> +	struct btrfs_inode *inode;
> +	u64 start;	/* file offset in @inode */
> +	u64 end;	/* file offset in @inode */

Please don't use in-line comments for struct members.

> +	bool uptodate;
> +};
> +
> +/*
> + * Try to release processed extent range.
> + *
> + * May not release the extent range right now if the current range is contig

'contig' means what? If it's for 'contiguous' then please spell it out
in text and use the abbreviated form only for variables.

> + * with processed extent.
> + *
> + * Will release processed extent when any of @inode, @uptodate, the range is
> + * no longer contig with processed range.
> + * Pass @inode == NULL will force processed extent to be released.
> + */
>  static void
> -endio_readpage_release_extent(struct extent_io_tree *tree, u64 start, u64 len,
> -			      int uptodate)
> +endio_readpage_release_extent(struct processed_extent *processed,
> +			      struct btrfs_inode *inode, u64 start, u64 end,
> +			      bool uptodate)
>  {
>  	struct extent_state *cached = NULL;
> -	u64 end = start + len - 1;
> +	struct extent_io_tree *tree;
>  
> -	if (uptodate && tree->track_uptodate)
> -		set_extent_uptodate(tree, start, end, &cached, GFP_ATOMIC);
> -	unlock_extent_cached_atomic(tree, start, end, &cached);
> +	/* We're the first extent, initialize @processed */
> +	if (!processed->inode)
> +		goto update;
> +
> +	/*
> +	 * Contig with processed extent. Just uptodate the end
> +	 *
> +	 * Several things to notice:
> +	 * - Bio can be merged as long as on-disk bytenr is contig
> +	 *   This means we can have page belonging to other inodes, thus need to
> +	 *   check if the inode matches.
> +	 * - Bvec can contain range beyond current page for multi-page bvec
> +	 *   Thus we need to do processed->end + 1 >= start check
> +	 */
> +	if (processed->inode == inode && processed->uptodate == uptodate &&
> +	    processed->end + 1 >= start && end >= processed->end) {
> +		processed->end = end;
> +		return;
> +	}
> +
> +	tree = &processed->inode->io_tree;
> +	/*
> +	 * Now we have a range not contig with processed range, release the
> +	 * processed range now.
> +	 */
> +	if (processed->uptodate && tree->track_uptodate)
> +		set_extent_uptodate(tree, processed->start, processed->end,
> +				    &cached, GFP_ATOMIC);
> +	unlock_extent_cached_atomic(tree, processed->start, processed->end,
> +				    &cached);
> +
> +update:
> +	/* Update @processed to current range */
> +	processed->inode = inode;
> +	processed->start = start;
> +	processed->end = end;
> +	processed->uptodate = uptodate;
>  }

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 15/24] btrfs: extent-io: make type of extent_state::state to be at least 32 bits
  2020-11-13 12:51 ` [PATCH v2 15/24] btrfs: extent-io: make type of extent_state::state to be at least 32 bits Qu Wenruo
  2020-11-13 18:40   ` Josef Bacik
@ 2020-11-18 16:11   ` David Sterba
  2020-11-18 23:48     ` Qu Wenruo
  2020-11-19 21:09   ` David Sterba
  2 siblings, 1 reply; 67+ messages in thread
From: David Sterba @ 2020-11-18 16:11 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Nikolay Borisov

On Fri, Nov 13, 2020 at 08:51:40PM +0800, Qu Wenruo wrote:
> Currently we use 'unsigned' for extent_state::state, which is only ensured
> to be at least 16 bits.

Ensured maybe by the C standard, but we use u32 and 'unsigned int'
interchangeably everywhere. There are some inferior architectures that
use different type widths, but all we care about is 32bit and 64bit.

> But for incoming subpage support, we are going to introduce more bits,
> thus we will go beyond 16 bits.
> 
> To support this, make extent_state::state at least 32bit and to be more
> explicit, we use "u32" to be clear about the max supported bits.

Yeah that's fine to make the expected width requirement explicit.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 12/24] btrfs: extent_io: update num_extent_pages() to support subpage sized extent buffer
  2020-11-13 12:51 ` [PATCH v2 12/24] btrfs: extent_io: update num_extent_pages() to support subpage sized extent buffer Qu Wenruo
@ 2020-11-18 16:22   ` David Sterba
  0 siblings, 0 replies; 67+ messages in thread
From: David Sterba @ 2020-11-18 16:22 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Nikolay Borisov

On Fri, Nov 13, 2020 at 08:51:37PM +0800, Qu Wenruo wrote:
> For subpage sized extent buffers, we have ensured no extent buffer
> will cross a page boundary, thus we only need one page for any extent
> buffer.
> 
> This patch will update the function num_extent_pages() to handle such
> a case.
> Now num_extent_pages() will return 1 for subpage sized extent buffers.
> 
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent_io.h | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
> index dfdef9c5c379..d39ebaee5ff7 100644
> --- a/fs/btrfs/extent_io.h
> +++ b/fs/btrfs/extent_io.h
> @@ -206,8 +206,15 @@ void btrfs_readahead_node_child(struct extent_buffer *node, int slot);
>  
>  static inline int num_extent_pages(const struct extent_buffer *eb)
>  {
> -	return (round_up(eb->start + eb->len, PAGE_SIZE) >> PAGE_SHIFT) -
> -	       (eb->start >> PAGE_SHIFT);
> +	/*
> +	 * For sectorsize == PAGE_SIZE case, since eb is always aligned to
> +	 * sectorsize, it's just (eb->len / PAGE_SIZE) >> PAGE_SHIFT.
> +	 *
> +	 * For sectorsize < PAGE_SIZE case, we only want to support 64K
> +	 * PAGE_SIZE, and ensured all tree blocks won't cross page boundary.
> +	 * So in that case we always got 1 page.
> +	 */
> +	return round_up(eb->len, PAGE_SIZE) >> PAGE_SHIFT;

	return max(eb->len, PAGE_SIZE) >> PAGE_SHIFT;

Looks simpler and is understandable. A bit more compact way

	return eb->len >> PAGE_SHIFT ?: 1;

would be too much, but maybe a potential optimization for later.
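
A quick check with illustrative values: on a 4K page system a 16K eb
gives max(16K, 4K) >> 12 = 4 pages, and on a 64K page system a 16K
subpage eb gives max(16K, 64K) >> 16 = 1 page, matching the round_up()
form in both cases.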

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 18/24] btrfs: file-item: refactor btrfs_lookup_bio_sums() to handle out-of-order bvecs
  2020-11-13 12:51 ` [PATCH v2 18/24] btrfs: file-item: refactor btrfs_lookup_bio_sums() to handle out-of-order bvecs Qu Wenruo
@ 2020-11-18 16:27   ` David Sterba
  2020-11-18 23:57     ` Qu Wenruo
  0 siblings, 1 reply; 67+ messages in thread
From: David Sterba @ 2020-11-18 16:27 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Nov 13, 2020 at 08:51:43PM +0800, Qu Wenruo wrote:
> @@ -325,81 +428,53 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio,
>  		path->skip_locking = 1;
>  	}
>  
> -	disk_bytenr = (u64)bio->bi_iter.bi_sector << 9;
> +	for (cur_disk_bytenr = orig_disk_bytenr;
> +	     cur_disk_bytenr < orig_disk_bytenr + orig_len;
> +	     cur_disk_bytenr += (count * sectorsize)) {
> +		u64 search_len = orig_disk_bytenr + orig_len - cur_disk_bytenr;
> +		int sector_offset;
> +		u8 *csum_dst;
>  
> -	bio_for_each_segment(bvec, bio, iter) {
> -		page_bytes_left = bvec.bv_len;
> -		if (count)
> -			goto next;
> +		sector_offset = (cur_disk_bytenr - orig_disk_bytenr) >>
> +				 fs_info->sectorsize_bits;

I replied about the mismatching types already, but because you've been
sending out the patchsets too often this got lost. I really don't want
to keep seeing the same things again. You left some int type mess in
the new subpage patchset as well.
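
For instance (an illustrative fix, not from the thread): sector_offset
above is declared int but holds the result of a u64 shift; declaring it
u32 (it is bounded by the bio length >> sectorsize_bits) would avoid
the implicit narrowing.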

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 22/24] btrfs: scrub: support subpage data scrub
  2020-11-13 12:51 ` [PATCH v2 22/24] btrfs: scrub: support subpage data scrub Qu Wenruo
@ 2020-11-18 16:29   ` David Sterba
  2020-11-18 23:38     ` Qu Wenruo
  0 siblings, 1 reply; 67+ messages in thread
From: David Sterba @ 2020-11-18 16:29 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Nov 13, 2020 at 08:51:47PM +0800, Qu Wenruo wrote:
> @@ -1781,8 +1781,9 @@ static int scrub_checksum_data(struct scrub_block *sblock)
>  	struct scrub_ctx *sctx = sblock->sctx;
>  	struct btrfs_fs_info *fs_info = sctx->fs_info;
>  	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
> -	u8 csum[BTRFS_CSUM_SIZE];
>  	struct scrub_page *spage;
> +	u32 sectorsize = fs_info->sectorsize;
> +	u8 csum[BTRFS_CSUM_SIZE];
>  	char *kaddr;
>  
>  	BUG_ON(sblock->page_count < 1);
> @@ -1794,11 +1795,15 @@ static int scrub_checksum_data(struct scrub_block *sblock)
>  
>  	shash->tfm = fs_info->csum_shash;
>  	crypto_shash_init(shash);
> -	crypto_shash_digest(shash, kaddr, PAGE_SIZE, csum);
> +
> +	/*
> +	 * In scrub_pages() and scrub_pages_for_parity() we ensure
> +	 * each spage only contains just one sector of data.
> +	 */
> +	crypto_shash_digest(shash, kaddr, sectorsize, csum);

Temporary variable is not needed for single use (sectorsize).

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 19/24] btrfs: scrub: remove the anonymous structure from scrub_page
  2020-11-13 12:51 ` [PATCH v2 19/24] btrfs: scrub: remove the anonymous structure from scrub_page Qu Wenruo
@ 2020-11-18 19:00   ` David Sterba
  2020-11-19 21:09   ` David Sterba
  1 sibling, 0 replies; 67+ messages in thread
From: David Sterba @ 2020-11-18 19:00 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Nov 13, 2020 at 08:51:44PM +0800, Qu Wenruo wrote:
> That anonymous structure serves no special purpose, just replace it
> with regular members.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/scrub.c | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index 57ee06d92150..06f6428b97d1 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -71,11 +71,9 @@ struct scrub_page {
>  	u64			physical;
>  	u64			physical_for_dev_replace;
>  	atomic_t		refs;
> -	struct {
> -		unsigned int	mirror_num:8;
> -		unsigned int	have_csum:1;
> -		unsigned int	io_error:1;
> -	};
> +	u8			mirror_num;
> +	int			have_csum:1;
> +	int			io_error:1;

Note that bitfields need to be unsigned. This could also be bool as
there's a bit of space, but that misaligns csum by 1 byte, so that's
for more evaluation and another patch.
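
For illustration (generic C, not from the patch): on compilers where a
plain 'int' bitfield is signed, as with GCC on Linux, a 1-bit field can
only hold 0 and -1:

	struct s {
		int flag:1;
	};
	struct s x = { .flag = 1 };
	/* x.flag reads back as -1, so a test like 'x.flag == 1'
	 * is never true */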

>  	u8			csum[BTRFS_CSUM_SIZE];
>  
>  	struct scrub_recover	*recover;
> --

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 13/24] btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors
  2020-11-13 12:51 ` [PATCH v2 13/24] btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors Qu Wenruo
@ 2020-11-18 19:30   ` David Sterba
  2020-11-18 19:38   ` David Sterba
  2020-11-18 19:48   ` David Sterba
  2 siblings, 0 replies; 67+ messages in thread
From: David Sterba @ 2020-11-18 19:30 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Goldwyn Rodrigues

On Fri, Nov 13, 2020 at 08:51:38PM +0800, Qu Wenruo wrote:
>  fs/btrfs/ctree.c        |  5 ++--
>  fs/btrfs/ctree.h        | 38 ++++++++++++++++++++++--
>  fs/btrfs/extent_io.c    | 65 ++++++++++++++++++++++++-----------------
>  fs/btrfs/struct-funcs.c | 18 +++++++-----
>  4 files changed, 87 insertions(+), 39 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 892b467347a3..fd4969883c2e 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -1686,10 +1686,11 @@ static noinline int generic_bin_search(struct extent_buffer *eb,
>  		oip = offset_in_page(offset);
>  
>  		if (oip + key_size <= PAGE_SIZE) {
> -			const unsigned long idx = offset >> PAGE_SHIFT;
> +			const unsigned long idx = get_eb_page_index(offset);

This could follow the pattern to extract the offset and index

>  			char *kaddr = page_address(eb->pages[idx]);
>  
> -			tmp = (struct btrfs_disk_key *)(kaddr + oip);
> +			tmp = (struct btrfs_disk_key *)(kaddr +
> +					get_eb_page_offset(eb, offset));

and not put it inside the expression.

>  		} else {
>  			read_extent_buffer(eb, &unaligned, offset, key_size);
>  			tmp = &unaligned;

> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -5874,10 +5874,21 @@ void copy_extent_buffer_full(const struct extent_buffer *dst,
>  
>  	ASSERT(dst->len == src->len);
>  
> -	num_pages = num_extent_pages(dst);
> -	for (i = 0; i < num_pages; i++)
> -		copy_page(page_address(dst->pages[i]),
> -				page_address(src->pages[i]));
> +	if (dst->fs_info->sectorsize == PAGE_SIZE) {
> +		num_pages = num_extent_pages(dst);
> +		for (i = 0; i < num_pages; i++)
> +			copy_page(page_address(dst->pages[i]),
> +				  page_address(src->pages[i]));
> +	} else {
> +		unsigned long src_index = 0;
> +		unsigned long dst_index = 0;

This is not necessary, 0 is fine in the context.

> +		size_t src_offset = get_eb_page_offset(src, 0);
> +		size_t dst_offset = get_eb_page_offset(dst, 0);
> +

As this is in the 'else' branch, an assert for the only valid case
should be here

		ASSERT(dst->fs_info->sectorsize < PAGE_SIZE);

> +		memcpy(page_address(dst->pages[dst_index]) + dst_offset,
> +		       page_address(src->pages[src_index]) + src_offset,
> +		       src->len);
> +	}

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 13/24] btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors
  2020-11-13 12:51 ` [PATCH v2 13/24] btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors Qu Wenruo
  2020-11-18 19:30   ` David Sterba
@ 2020-11-18 19:38   ` David Sterba
  2020-11-18 19:48   ` David Sterba
  2 siblings, 0 replies; 67+ messages in thread
From: David Sterba @ 2020-11-18 19:38 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Goldwyn Rodrigues

On Fri, Nov 13, 2020 at 08:51:38PM +0800, Qu Wenruo wrote:
> This patch introduces two helpers to do these:
> - get_eb_page_index()
>   This is to calculate the index to access extent_buffer::pages.
>   It's just a simple wrapper around "start >> PAGE_SHIFT".
> 
>   For sectorsize == PAGE_SIZE case, nothing is changed.
>   For sectorsize < PAGE_SIZE case, we always get index as 0, and
>   the existing page shift works also fine.
> 
> - get_eb_page_offset()

This is too confusing, it does not return the offset of the page but
the offset in the page, as it is a replacement for offset_in_page().
The get_eb_page_index() name is clear, but the other one is not.
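
Based on the description in the patch, the pair boils down to roughly
the following (a sketch, not the exact code):

	/* Which entry of eb->pages[] holds @offset */
	static inline unsigned long get_eb_page_index(unsigned long offset)
	{
		/* Subpage ebs fit in one page, so this is always 0 there */
		return offset >> PAGE_SHIFT;
	}

	/* Offset inside that page; for subpage, eb->start is not
	 * page aligned, so it has to be folded in */
	static inline size_t get_eb_page_offset(const struct extent_buffer *eb,
						unsigned long offset)
	{
		return offset_in_page(offset + eb->start);
	}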

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 13/24] btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors
  2020-11-13 12:51 ` [PATCH v2 13/24] btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors Qu Wenruo
  2020-11-18 19:30   ` David Sterba
  2020-11-18 19:38   ` David Sterba
@ 2020-11-18 19:48   ` David Sterba
  2020-11-24  6:20     ` Qu Wenruo
  2 siblings, 1 reply; 67+ messages in thread
From: David Sterba @ 2020-11-18 19:48 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Goldwyn Rodrigues

On Fri, Nov 13, 2020 at 08:51:38PM +0800, Qu Wenruo wrote:
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -1686,10 +1686,11 @@ static noinline int generic_bin_search(struct extent_buffer *eb,
>  		oip = offset_in_page(offset);
>  
>  		if (oip + key_size <= PAGE_SIZE) {
> -			const unsigned long idx = offset >> PAGE_SHIFT;
> +			const unsigned long idx = get_eb_page_index(offset);
>  			char *kaddr = page_address(eb->pages[idx]);
>  
> -			tmp = (struct btrfs_disk_key *)(kaddr + oip);
> +			tmp = (struct btrfs_disk_key *)(kaddr +
> +					get_eb_page_offset(eb, offset));

Here offset_in_page(offset) == get_eb_page_offset(eb, offset) and does
not need to be calculated again for both sector/page combinations. As
this is a hot path in search it sould be kept optimizied.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 05/24] btrfs: extent_io: extract the btree page submission code into its own helper function
  2020-11-13 12:51 ` [PATCH v2 05/24] btrfs: extent_io: extract the btree page submission code into its own helper function Qu Wenruo
  2020-11-13 19:22   ` Josef Bacik
@ 2020-11-18 20:04   ` David Sterba
  2020-11-18 20:09   ` David Sterba
  2 siblings, 0 replies; 67+ messages in thread
From: David Sterba @ 2020-11-18 20:04 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, David Sterba

On Fri, Nov 13, 2020 at 08:51:30PM +0800, Qu Wenruo wrote:
> In btree_write_cache_pages() we have a btree page submission routine
> buried deep inside a nested loop.
> 
> This patch will extract that part of code into a helper function,
> submit_eb_page(), to do the same work.
> 
> Also, since submit_eb_page() now can return >0 for successful extent
> buffer submission, remove the "ASSERT(ret <= 0);" line.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>  fs/btrfs/extent_io.c | 116 +++++++++++++++++++++++++------------------
>  1 file changed, 69 insertions(+), 47 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index caafe44542e8..fd4845b06989 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -3987,10 +3987,75 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
>  	return ret;
>  }
>  
> +/*
> + * Submit one page of one extent buffer.
> + *
> + * Will try to submit all pages of one extent buffer, thus will skip some pages
> + * if it's already submitted.

Note that this paragraph

> + *
> + * @page:	The page of one extent buffer
> + * @eb_context:	To determine if we need to submit this page. If current page
> + *		belongs to this eb, we don't need to submit.

belongs under the parameter description, I hope the example is
https://btrfs.wiki.kernel.org/index.php/Development_notes#Coding_style_preferences
instructive enough.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 05/24] btrfs: extent_io: extract the btree page submission code into its own helper function
  2020-11-13 12:51 ` [PATCH v2 05/24] btrfs: extent_io: extract the btree page submission code into its own helper function Qu Wenruo
  2020-11-13 19:22   ` Josef Bacik
  2020-11-18 20:04   ` David Sterba
@ 2020-11-18 20:09   ` David Sterba
  2 siblings, 0 replies; 67+ messages in thread
From: David Sterba @ 2020-11-18 20:09 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, David Sterba

On Fri, Nov 13, 2020 at 08:51:30PM +0800, Qu Wenruo wrote:
> +/*
> + * Submit one page of one extent buffer.
> + *
> + * Will try to submit all pages of one extent buffer, thus will skip some pages
> + * if it's already submitted.

I think I'm having deja vu: the above reads as contradictory (submit
one vs. submit all), and I recall replying that it should be clarified.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 04/24] btrfs: extent_io: introduce helper to handle page status update in end_bio_extent_readpage()
  2020-11-13 12:51 ` [PATCH v2 04/24] btrfs: extent_io: introduce helper to handle page status update in end_bio_extent_readpage() Qu Wenruo
  2020-11-13 19:18   ` Josef Bacik
@ 2020-11-18 20:27   ` David Sterba
  2020-11-18 23:43     ` Qu Wenruo
  2020-11-19 21:08   ` David Sterba
  2 siblings, 1 reply; 67+ messages in thread
From: David Sterba @ 2020-11-18 20:27 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Nov 13, 2020 at 08:51:29PM +0800, Qu Wenruo wrote:
> Introduce a new helper, endio_readpage_update_page_status(), to handle
> the page status update in end_bio_extent_readpage().
> 
> The refactor itself is nothing really interesting; the point here is
> to provide the basis for later subpage support, where the page status
> update can be more complex than in the current code.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent_io.c | 17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index b5b3700943e0..caafe44542e8 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2849,6 +2849,17 @@ endio_readpage_release_extent(struct processed_extent *processed,
>  	processed->uptodate = uptodate;
>  }
>  
> +static void endio_readpage_update_page_status(struct page *page, bool uptodate)
> +{
> +	if (uptodate) {
> +		SetPageUptodate(page);
> +	} else {
> +		ClearPageUptodate(page);
> +		SetPageError(page);
> +	}
> +	unlock_page(page);

That would be better left in the caller as it's quite important
information but the helper name does not say anything about it.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 22/24] btrfs: scrub: support subpage data scrub
  2020-11-18 16:29   ` David Sterba
@ 2020-11-18 23:38     ` Qu Wenruo
  0 siblings, 0 replies; 67+ messages in thread
From: Qu Wenruo @ 2020-11-18 23:38 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs





On 2020/11/19 上午12:29, David Sterba wrote:
> On Fri, Nov 13, 2020 at 08:51:47PM +0800, Qu Wenruo wrote:
>> @@ -1781,8 +1781,9 @@ static int scrub_checksum_data(struct scrub_block *sblock)
>>  	struct scrub_ctx *sctx = sblock->sctx;
>>  	struct btrfs_fs_info *fs_info = sctx->fs_info;
>>  	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
>> -	u8 csum[BTRFS_CSUM_SIZE];
>>  	struct scrub_page *spage;
>> +	u32 sectorsize = fs_info->sectorsize;
>> +	u8 csum[BTRFS_CSUM_SIZE];
>>  	char *kaddr;
>>  
>>  	BUG_ON(sblock->page_count < 1);
>> @@ -1794,11 +1795,15 @@ static int scrub_checksum_data(struct scrub_block *sblock)
>>  
>>  	shash->tfm = fs_info->csum_shash;
>>  	crypto_shash_init(shash);
>> -	crypto_shash_digest(shash, kaddr, PAGE_SIZE, csum);
>> +
>> +	/*
>> +	 * In scrub_pages() and scrub_pages_for_parity() we ensure
>> +	 * each spage only contains just one sector of data.
>> +	 */
>> +	crypto_shash_digest(shash, kaddr, sectorsize, csum);
> 
> Temporary variable is not needed for single use (sectorsize).
> 

Personally speaking, whether such a temporary variable is needed should
be determined at compile time.

For the reader, I don't see anything wrong with using such a variable,
especially as it saves some "fs_info->" typing and some line wrapping.

Thanks,
Qu



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 04/24] btrfs: extent_io: introduce helper to handle page status update in end_bio_extent_readpage()
  2020-11-18 20:27   ` David Sterba
@ 2020-11-18 23:43     ` Qu Wenruo
  2020-11-19 18:32       ` David Sterba
  0 siblings, 1 reply; 67+ messages in thread
From: Qu Wenruo @ 2020-11-18 23:43 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs





On 2020/11/19 上午4:27, David Sterba wrote:
> On Fri, Nov 13, 2020 at 08:51:29PM +0800, Qu Wenruo wrote:
>> Introduce a new helper, endio_readpage_release_extent(), to handle
>> update status update in end_bio_extent_readpage().
>>
>> The refactor itself is not really nothing interesting, the point here is
>> to provide the basis for later subpage support, where the page status
>> update can be more complex than current code.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>  fs/btrfs/extent_io.c | 17 ++++++++++++-----
>>  1 file changed, 12 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index b5b3700943e0..caafe44542e8 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -2849,6 +2849,17 @@ endio_readpage_release_extent(struct processed_extent *processed,
>>  	processed->uptodate = uptodate;
>>  }
>>  
>> +static void endio_readpage_update_page_status(struct page *page, bool uptodate)
>> +{
>> +	if (uptodate) {
>> +		SetPageUptodate(page);
>> +	} else {
>> +		ClearPageUptodate(page);
>> +		SetPageError(page);
>> +	}
>> +	unlock_page(page);
> 
> That would be better left in the caller as it's quite important
> information but the helper name does not say anything about it.
> 

It may be the case for now, but for the incoming subpage support, check
the patch
"btrfs: integrate page status update for read path into
begin/end_page_read()" to see why we want page unlocking to be done here.

Thanks,
Qu



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 15/24] btrfs: extent-io: make type of extent_state::state to be at least 32 bits
  2020-11-18 16:11   ` David Sterba
@ 2020-11-18 23:48     ` Qu Wenruo
  2020-11-19  7:18       ` Nikolay Borisov
  0 siblings, 1 reply; 67+ messages in thread
From: Qu Wenruo @ 2020-11-18 23:48 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs, Nikolay Borisov



On 2020/11/19 上午12:11, David Sterba wrote:
> On Fri, Nov 13, 2020 at 08:51:40PM +0800, Qu Wenruo wrote:
>> Currently we use 'unsigned' for extent_state::state, which is only ensured
>> to be at least 16 bits.
>
> Ensured maybe by the C standard but we use u32 and 'unsigned int'
> interchangably everywhere. There are some inferior architectures that
> use different type witdths, but all we care is 32bit and 64bit.

Personally speaking, that's not good practice at all.

You never know when you'll hit a new arch.

The best example here is subpage testing. I believe you're also
using aarch64 to do the testing, and thankfully it has all the same
widths here, but you can never be that sure.
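
If we keep relying on the 'unsigned int' width, we could at least make
the assumption fail loudly at build time, e.g. something like this in
any always-compiled function (just a sketch, not part of the patch):

	/* Fail the build on any arch where the width assumption breaks */
	BUILD_BUG_ON(BITS_PER_TYPE(unsigned int) < 32);

Using u32 directly makes the requirement part of the type instead, so
no such guard is needed.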

>
>> But for incoming subpage support, we are going to introduce more bits,
>> thus we will go beyond 16 bits.
>>
>> To support this, make extent_state::state at least 32bit and to be more
>> explicit, we use "u32" to be clear about the max supported bits.
>
> Yeah that's fine to make the expected width requirement explicit.
>

As long as this can be merged, I'm completely fine with it though.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 03/24] btrfs: extent_io: replace extent_start/extent_len with better structure for end_bio_extent_readpage()
  2020-11-18 16:05   ` David Sterba
@ 2020-11-18 23:49     ` Qu Wenruo
  2020-11-19 20:30       ` David Sterba
  0 siblings, 1 reply; 67+ messages in thread
From: Qu Wenruo @ 2020-11-18 23:49 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs





On 2020/11/19 12:05 AM, David Sterba wrote:
> On Fri, Nov 13, 2020 at 08:51:28PM +0800, Qu Wenruo wrote:
>>  }
>>  
>> +/*
>> + * Records previously processed extent range.
>> + *
>> + * For endio_readpage_release_extent() to handle a full extent range, reducing
>> + * the extent io operations.
>> + */
>> +struct processed_extent {
>> +	struct btrfs_inode *inode;
>> +	u64 start;	/* file offset in @inode */
>> +	u64 end;	/* file offset in @inode */
> 
> Please don't use the in-line comments for struct members.

Even for such a short description?

That seems a little overkill to me.

Thanks,
Qu
> 
>> +	bool uptodate;
>> +};
>> +
>> +/*
>> + * Try to release processed extent range.
>> + *
>> + * May not release the extent range right now if the current range is contig
> 
> 'contig' means what? If it's for 'contiguous' then please spell it out
> in text and use the abbreviated form only for variables.
> 
>> + * with processed extent.
>> + *
>> + * Will release processed extent when any of @inode, @uptodate, the range is
>> + * no longer contig with processed range.
>> + * Pass @inode == NULL will force processed extent to be released.
>> + */
>>  static void
>> -endio_readpage_release_extent(struct extent_io_tree *tree, u64 start, u64 len,
>> -			      int uptodate)
>> +endio_readpage_release_extent(struct processed_extent *processed,
>> +			      struct btrfs_inode *inode, u64 start, u64 end,
>> +			      bool uptodate)
>>  {
>>  	struct extent_state *cached = NULL;
>> -	u64 end = start + len - 1;
>> +	struct extent_io_tree *tree;
>>  
>> -	if (uptodate && tree->track_uptodate)
>> -		set_extent_uptodate(tree, start, end, &cached, GFP_ATOMIC);
>> -	unlock_extent_cached_atomic(tree, start, end, &cached);
>> +	/* We're the first extent, initialize @processed */
>> +	if (!processed->inode)
>> +		goto update;
>> +
>> +	/*
>> +	 * Contig with processed extent. Just uptodate the end
>> +	 *
>> +	 * Several things to notice:
>> +	 * - Bio can be merged as long as on-disk bytenr is contig
>> +	 *   This means we can have page belonging to other inodes, thus need to
>> +	 *   check if the inode matches.
>> +	 * - Bvec can contain range beyond current page for multi-page bvec
>> +	 *   Thus we need to do processed->end + 1 >= start check
>> +	 */
>> +	if (processed->inode == inode && processed->uptodate == uptodate &&
>> +	    processed->end + 1 >= start && end >= processed->end) {
>> +		processed->end = end;
>> +		return;
>> +	}
>> +
>> +	tree = &processed->inode->io_tree;
>> +	/*
>> +	 * Now we have a range not contig with processed range, release the
>> +	 * processed range now.
>> +	 */
>> +	if (processed->uptodate && tree->track_uptodate)
>> +		set_extent_uptodate(tree, processed->start, processed->end,
>> +				    &cached, GFP_ATOMIC);
>> +	unlock_extent_cached_atomic(tree, processed->start, processed->end,
>> +				    &cached);
>> +
>> +update:
>> +	/* Update @processed to current range */
>> +	processed->inode = inode;
>> +	processed->start = start;
>> +	processed->end = end;
>> +	processed->uptodate = uptodate;
>>  }



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 18/24] btrfs: file-item: refactor btrfs_lookup_bio_sums() to handle out-of-order bvecs
  2020-11-18 16:27   ` David Sterba
@ 2020-11-18 23:57     ` Qu Wenruo
  0 siblings, 0 replies; 67+ messages in thread
From: Qu Wenruo @ 2020-11-18 23:57 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs





On 2020/11/19 12:27 AM, David Sterba wrote:
> On Fri, Nov 13, 2020 at 08:51:43PM +0800, Qu Wenruo wrote:
>> @@ -325,81 +428,53 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio,
>>  		path->skip_locking = 1;
>>  	}
>>  
>> -	disk_bytenr = (u64)bio->bi_iter.bi_sector << 9;
>> +	for (cur_disk_bytenr = orig_disk_bytenr;
>> +	     cur_disk_bytenr < orig_disk_bytenr + orig_len;
>> +	     cur_disk_bytenr += (count * sectorsize)) {
>> +		u64 search_len = orig_disk_bytenr + orig_len - cur_disk_bytenr;
>> +		int sector_offset;
>> +		u8 *csum_dst;
>>  
>> -	bio_for_each_segment(bvec, bio, iter) {
>> -		page_bytes_left = bvec.bv_len;
>> -		if (count)
>> -			goto next;
>> +		sector_offset = (cur_disk_bytenr - orig_disk_bytenr) >>
>> +				 fs_info->sectorsize_bits;
> 
> I replied to the mismatching types already, but because you've been
> sending out the patchsets too often this got lost. I really don't want
> to keep seeing the same things again. You left some int type mess in
> the new subpage patchset as well.
> 

OK, let me be clear here: what's the problem with using int here?

If your concern is the width, no, 32bit is definitely enough.
On the right side of the assignment, the calculation is still done in
u64, and we should never have a bio larger than 4GiB, as bio already
has its size limited to UINT_MAX.

If your concern is the sign, I'm fine with changing it to unsigned int
or u32 to be more specific.

I believe using u64 everywhere is definitely overkill.
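
In other words, the u32 variant I'd be fine with is just:

	u32 sector_offset;

	/*
	 * The subtraction and shift happen in u64; the result is bounded
	 * by the bio size (< 4GiB) shifted down by sectorsize_bits, so
	 * narrowing to u32 is safe.
	 */
	sector_offset = (cur_disk_bytenr - orig_disk_bytenr) >>
			fs_info->sectorsize_bits;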

Thanks,
Qu



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 15/24] btrfs: extent-io: make type of extent_state::state to be at least 32 bits
  2020-11-18 23:48     ` Qu Wenruo
@ 2020-11-19  7:18       ` Nikolay Borisov
  0 siblings, 0 replies; 67+ messages in thread
From: Nikolay Borisov @ 2020-11-19  7:18 UTC (permalink / raw)
  To: Qu Wenruo, Qu Wenruo, linux-btrfs



On 19.11.20 at 1:48, Qu Wenruo wrote:
> 
> 
> On 2020/11/19 12:11 AM, David Sterba wrote:
>> On Fri, Nov 13, 2020 at 08:51:40PM +0800, Qu Wenruo wrote:
>>> Currently we use 'unsigned' for extent_state::state, which is only ensured
>>> to be at least 16 bits.
>>
>> Ensured maybe by the C standard, but we use u32 and 'unsigned int'
>> interchangeably everywhere. There are some inferior architectures that
>> use different type widths, but all we care about is 32bit and 64bit.
> 
> Personally speaking, that's not good practice at all.
> 
> You never know when you'll hit a new arch.
> 
> The best example here is subpage testing. I believe you're also
> using aarch64 to do the testing, and thankfully it has all the same
> widths here, but you can never be that sure.
> 

A lot of the kernel is coded with the assumption that int is at least
32 bits; if it weren't, a lot of things would break.

>>
>>> But for incoming subpage support, we are going to introduce more bits,
>>> thus we will go beyond 16 bits.
>>>
>>> To support this, make extent_state::state at least 32bit and to be more
>>> explicit, we use "u32" to be clear about the max supported bits.
>>
>> Yeah that's fine to make the expected width requirement explicit.
>>
> 
> As long as this can be merged, I'm completely fine though.
> 
> Thanks,
> Qu
> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 04/24] btrfs: extent_io: introduce helper to handle page status update in end_bio_extent_readpage()
  2020-11-18 23:43     ` Qu Wenruo
@ 2020-11-19 18:32       ` David Sterba
  0 siblings, 0 replies; 67+ messages in thread
From: David Sterba @ 2020-11-19 18:32 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Thu, Nov 19, 2020 at 07:43:56AM +0800, Qu Wenruo wrote:
> On 2020/11/19 4:27 AM, David Sterba wrote:
> > On Fri, Nov 13, 2020 at 08:51:29PM +0800, Qu Wenruo wrote:
> >> +static void endio_readpage_update_page_status(struct page *page, bool uptodate)
> >> +{
> >> +	if (uptodate) {
> >> +		SetPageUptodate(page);
> >> +	} else {
> >> +		ClearPageUptodate(page);
> >> +		SetPageError(page);
> >> +	}
> >> +	unlock_page(page);
> > 
> > That would be better left in the caller as it's quite important
> > information but the helper name does not say anything about it.
> 
> That may be the case for now, but for the incoming subpage support,
> check the patch "btrfs: integrate page status update for read path into
> begin/end_page_read()" to see why we want the page unlocking to be done
> here.

Ok. I added a comment to the call that it also unlocks the page.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 03/24] btrfs: extent_io: replace extent_start/extent_len with better structure for end_bio_extent_readpage()
  2020-11-18 23:49     ` Qu Wenruo
@ 2020-11-19 20:30       ` David Sterba
  0 siblings, 0 replies; 67+ messages in thread
From: David Sterba @ 2020-11-19 20:30 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs

On Thu, Nov 19, 2020 at 07:49:30AM +0800, Qu Wenruo wrote:
> 
> 
> On 2020/11/19 12:05 AM, David Sterba wrote:
> > On Fri, Nov 13, 2020 at 08:51:28PM +0800, Qu Wenruo wrote:
> >>  }
> >>  
> >> +/*
> >> + * Records previously processed extent range.
> >> + *
> >> + * For endio_readpage_release_extent() to handle a full extent range, reducing
> >> + * the extent io operations.
> >> + */
> >> +struct processed_extent {
> >> +	struct btrfs_inode *inode;
> >> +	u64 start;	/* file offset in @inode */
> >> +	u64 end;	/* file offset in @inode */
> > 
> > Please don't use the in-line comments for struct members.
> 
> Even for such a short description?
> 
> That seems a little overkill to me.

Everywhere, for consistency. The in-line comments tend to be too short
and not always an improvement.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 02/24] btrfs: extent-io-tests: remove invalid tests
  2020-11-13 12:51 ` [PATCH v2 02/24] btrfs: extent-io-tests: remove invalid tests Qu Wenruo
  2020-11-13 18:42   ` Josef Bacik
@ 2020-11-19 21:08   ` David Sterba
  1 sibling, 0 replies; 67+ messages in thread
From: David Sterba @ 2020-11-19 21:08 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Nov 13, 2020 at 08:51:27PM +0800, Qu Wenruo wrote:
> In extent-io-test, there are two invalid tests:
> - Invalid nodesize for test_eb_bitmaps()
>   Instead of the sectorsize and nodesize combination passed in, we're
>   always using hand-crafted nodesize, e.g:
> 
> 	len = (sectorsize < BTRFS_MAX_METADATA_BLOCKSIZE)
> 		? sectorsize * 4 : sectorsize;
> 
>   In above case, if we have 32K page size, then we will get a length of
>   128K, which is beyond max node size, and obviously invalid.
> 
>   Thankfully most machines are either 4K or 64K page size, thus we
>   haven't yet hit such case.
> 
> - Invalid extent buffer bytenr
>   For 64K page size, the only combination we're going to test is
>   sectorsize = nodesize = 64K.
>   However in that case, we will try to test an eb whose bytenr is not
>   sectorsize aligned:
> 
> 	/* Do it over again with an extent buffer which isn't page-aligned. */
> 	eb = __alloc_dummy_extent_buffer(fs_info, nodesize / 2, len);
> 
>   Sector alignment is a hard requirement for any sector size.
>   The only exception is superblock. But anything else should follow
>   sector size alignment.
> 
>   This is definitely an invalid test case.
> 
> This patch will fix both problems by:
> - Honor the sectorsize/nodesize combination
>   Now we won't bother to hand-craft a strange length and use it as
>   nodesize.
> 
> - Use sectorsize as the 2nd run extent buffer start
>   This would test the case where extent buffer is aligned to sectorsize
>   but not always aligned to nodesize.
> 
> Please note that later subpage related cleanup will reduce
> extent_buffer::pages[] to exactly what we need, making sector-unaligned
> extent buffer operations cause problems.
> 
> Since only extent_io self tests utilize this invalid feature, this
> patch is required for all later cleanup/refactors.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Added to misc-next.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 03/24] btrfs: extent_io: replace extent_start/extent_len with better structure for end_bio_extent_readpage()
  2020-11-13 12:51 ` [PATCH v2 03/24] btrfs: extent_io: replace extent_start/extent_len with better structure for end_bio_extent_readpage() Qu Wenruo
  2020-11-13 19:13   ` Josef Bacik
  2020-11-18 16:05   ` David Sterba
@ 2020-11-19 21:08   ` David Sterba
  2 siblings, 0 replies; 67+ messages in thread
From: David Sterba @ 2020-11-19 21:08 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Nov 13, 2020 at 08:51:28PM +0800, Qu Wenruo wrote:
> In end_bio_extent_readpage() we had a strange dance around
> extent_start/extent_len.
> 
> What hides behind the strange dance is that it's just calling
> endio_readpage_release_extent() on each bvec range.
> 
> Here is an example to explain the original work flow:
>   Bio is for inode 257, containing 2 pages, for range [1M, 1M+8K)
> 
>   end_bio_extent_readpage() entered
>   |- extent_start = 0;
>   |- extent_end = 0;
>   |- bio_for_each_segment_all() {
>   |  |- /* Got the 1st bvec */
>   |  |- start = SZ_1M;
>   |  |- end = SZ_1M + SZ_4K - 1;
>   |  |- update = 1;
>   |  |- if (extent_len == 0) {
>   |  |  |- extent_start = start; /* SZ_1M */
>   |  |  |- extent_len = end + 1 - start; /* SZ_4K */
>   |  |  }
>   |  |
>   |  |- /* Got the 2nd bvec */
>   |  |- start = SZ_1M + 4K;
>   |  |- end = SZ_1M + 8K - 1;
>   |  |- update = 1;
>   |  |- if (extent_start + extent_len == start) {
>   |  |  |- extent_len += end + 1 - start; /* SZ_8K */
>   |  |  }
>   |  } /* All bio vec iterated */
>   |
>   |- if (extent_len) {
>      |- endio_readpage_release_extent(tree, extent_start, extent_len,
> 				      update);
> 	/* extent_start == SZ_1M, extent_len == SZ_8K, uptodate = 1 */
> 
> As the above flow shows, the existing code in end_bio_extent_readpage()
> is just accumulate extent_start/extent_len, and when the contiguous range
> breaks, call endio_readpage_release_extent() for the range.
> 
> However, the current behavior has a few things not really considered:
> - The inode can change
>   For bios, their pages don't need to have contiguous page_offset.
>   This means even pages from different inodes can be packed into one
>   bio.
> 
> - Bvecs can cross page boundaries
>   There is a feature called multi-page bvec, where bvec->bv_len can go
>   beyond the bvec->bv_page boundary.
> 
> - Poor readability
> 
> This patch will address the problem by:
> - Introduce a proper structure, processed_extent, to record processed
>   extent range
> - Integrate inode/start/end/uptodate check into
>   endio_readpage_release_extent()
> - Add more comments on each step.
>   This should greatly improve the readability; now in
>   end_bio_extent_readpage() there are only two
>   endio_readpage_release_extent() calls.
> 
> - Add inode contig check
>   Now we also ensure the inode is the same before checking whether the
>   range is contiguous.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Added to misc-next.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 04/24] btrfs: extent_io: introduce helper to handle page status update in end_bio_extent_readpage()
  2020-11-13 12:51 ` [PATCH v2 04/24] btrfs: extent_io: introduce helper to handle page status update in end_bio_extent_readpage() Qu Wenruo
  2020-11-13 19:18   ` Josef Bacik
  2020-11-18 20:27   ` David Sterba
@ 2020-11-19 21:08   ` David Sterba
  2 siblings, 0 replies; 67+ messages in thread
From: David Sterba @ 2020-11-19 21:08 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Nov 13, 2020 at 08:51:29PM +0800, Qu Wenruo wrote:
> Introduce a new helper, endio_readpage_update_page_status(), to handle
> the page status update in end_bio_extent_readpage().
> 
> The refactor itself is nothing really interesting; the point here is
> to provide the basis for later subpage support, where the page status
> update can be more complex than the current code.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Added to misc-next.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 06/24] btrfs: remove the phy_offset parameter for btrfs_validate_metadata_buffer()
  2020-11-13 12:51 ` [PATCH v2 06/24] btrfs: remove the phy_offset parameter for btrfs_validate_metadata_buffer() Qu Wenruo
@ 2020-11-19 21:09   ` David Sterba
  0 siblings, 0 replies; 67+ messages in thread
From: David Sterba @ 2020-11-19 21:09 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Nikolay Borisov

On Fri, Nov 13, 2020 at 08:51:31PM +0800, Qu Wenruo wrote:
> Parameter @phy_offset is the offset against the bio->bi_iter.bi_sector.
> @phy_offset is mostly for data io to lookup the csum in btrfs_io_bio.
> 
> But for metadata it's completely useless, as metadata stores its own
> csum in its btrfs_header.
> 
> Remove this useless parameter from btrfs_validate_metadata_buffer().
> 
> Just an extra note for parameters @start and @end: they are not utilized
> at all for the current sectorsize == PAGE_SIZE case, as we can grab the
> eb directly from the page.
> 
> But those two parameters are very important for later subpage support,
> thus @start/@end are not touched here.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>

Added to misc-next.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 14/24] btrfs: disk-io: only clear EXTENT_LOCK bit for extent_invalidatepage()
  2020-11-13 12:51 ` [PATCH v2 14/24] btrfs: disk-io: only clear EXTENT_LOCK bit for extent_invalidatepage() Qu Wenruo
@ 2020-11-19 21:09   ` David Sterba
  0 siblings, 0 replies; 67+ messages in thread
From: David Sterba @ 2020-11-19 21:09 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Nikolay Borisov

On Fri, Nov 13, 2020 at 08:51:39PM +0800, Qu Wenruo wrote:
> extent_invalidatepage() will try to clear all possible bits, since it's
> calling clear_extent_bit() with delete == 1, which tries to clear all
> existing bits in the range.
> 
> This is currently fine, since the btree io tree only utilizes the
> EXTENT_LOCKED bit.
> But this could be a problem for later subpage support, which will
> utilize extra io tree bit to represent extra info.
> 
> This patch will just convert that clear_extent_bit() to
> unlock_extent_cached().
> 
> For the current code, since only the EXTENT_LOCKED bit is utilized, this
> doesn't change the behavior, but it provides a much cleaner basis for
> the incoming subpage support.
> 
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
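
For reference, the conversion boils down to something like this (sketched
from the description above; the exact argument lists may differ):

	/* Before: delete == 1 clears every bit set in the range */
	clear_extent_bit(tree, start, end, EXTENT_LOCKED | EXTENT_DELALLOC |
			 EXTENT_DO_ACCOUNTING, 1, 1, &cached_state);

	/* After: only the lock is dropped, any future subpage bits survive */
	unlock_extent_cached(tree, start, end, &cached_state);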

Added to misc-next.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 15/24] btrfs: extent-io: make type of extent_state::state to be at least 32 bits
  2020-11-13 12:51 ` [PATCH v2 15/24] btrfs: extent-io: make type of extent_state::state to be at least 32 bits Qu Wenruo
  2020-11-13 18:40   ` Josef Bacik
  2020-11-18 16:11   ` David Sterba
@ 2020-11-19 21:09   ` David Sterba
  2 siblings, 0 replies; 67+ messages in thread
From: David Sterba @ 2020-11-19 21:09 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Nikolay Borisov

On Fri, Nov 13, 2020 at 08:51:40PM +0800, Qu Wenruo wrote:
> Currently we use 'unsigned' for extent_state::state, which is only ensured
> to be at least 16 bits.
> 
> But for incoming subpage support, we are going to introduce more bits,
> thus we will go beyond 16 bits.
> 
> To support this, make extent_state::state at least 32bit and to be more
> explicit, we use "u32" to be clear about the max supported bits.
> 
> This doesn't increase the memory usage for x86_64, but may affect other
> architectures.
> 
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Added to misc-next.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 16/24] btrfs: file-item: use nodesize to determine whether we need readahead for btrfs_lookup_bio_sums()
  2020-11-13 12:51 ` [PATCH v2 16/24] btrfs: file-item: use nodesize to determine whether we need readahead for btrfs_lookup_bio_sums() Qu Wenruo
@ 2020-11-19 21:09   ` David Sterba
  0 siblings, 0 replies; 67+ messages in thread
From: David Sterba @ 2020-11-19 21:09 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Nikolay Borisov

On Fri, Nov 13, 2020 at 08:51:41PM +0800, Qu Wenruo wrote:
> In btrfs_lookup_bio_sums() if the bio is pretty large, we want to
> readahead the csum tree.
> 
> However the threshold is a hard-coded number, (PAGE_SIZE * 8), from the
> initial btrfs merge.
> 
> The meaning of the value is pretty hard to guess, especially since the
> number dates from the age when 4K sectorsize was the default and only
> CRC32 was supported.
> 
> For the most common btrfs setup, CRC32 csums and 4K sectorsize, it
> means just a 32K read would kick readahead, while the csums themselves
> total only 32 bytes.
> 
> Now let's be more reasonable by taking both csum size and node size into
> consideration.
> 
> If the csum size for the bio is larger than one leaf, then we kick the
> readahead.
> This means for current default btrfs, the threshold will be 16M.
> 
> This change should not change performance observably, thus this is mostly
> a readability enhancement.
> 
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
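
For reference, the arithmetic behind that 16M figure, assuming the
defaults (4-byte CRC32 csums, 4K sectorsize, 16K nodesize) and ignoring
leaf headers:

	csums per leaf:   16K / 4 bytes  = 4096
	data covered:     4096 * 4K      = 16M

So only a bio whose csums would overflow a single leaf kicks off the
readahead.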

Added to misc-next.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 19/24] btrfs: scrub: remove the anonymous structure from scrub_page
  2020-11-13 12:51 ` [PATCH v2 19/24] btrfs: scrub: remove the anonymous structure from scrub_page Qu Wenruo
  2020-11-18 19:00   ` David Sterba
@ 2020-11-19 21:09   ` David Sterba
  1 sibling, 0 replies; 67+ messages in thread
From: David Sterba @ 2020-11-19 21:09 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Nov 13, 2020 at 08:51:44PM +0800, Qu Wenruo wrote:
> That anonymous structure serves no special purpose, just replace it with
> regular members.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Added to misc-next.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 13/24] btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors
  2020-11-18 19:48   ` David Sterba
@ 2020-11-24  6:20     ` Qu Wenruo
  0 siblings, 0 replies; 67+ messages in thread
From: Qu Wenruo @ 2020-11-24  6:20 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs, Goldwyn Rodrigues





On 2020/11/19 3:48 AM, David Sterba wrote:
> On Fri, Nov 13, 2020 at 08:51:38PM +0800, Qu Wenruo wrote:
>> --- a/fs/btrfs/ctree.c
>> +++ b/fs/btrfs/ctree.c
>> @@ -1686,10 +1686,11 @@ static noinline int generic_bin_search(struct extent_buffer *eb,
>>  		oip = offset_in_page(offset);
>>  
>>  		if (oip + key_size <= PAGE_SIZE) {
>> -			const unsigned long idx = offset >> PAGE_SHIFT;
>> +			const unsigned long idx = get_eb_page_index(offset);
>>  			char *kaddr = page_address(eb->pages[idx]);
>>  
>> -			tmp = (struct btrfs_disk_key *)(kaddr + oip);
>> +			tmp = (struct btrfs_disk_key *)(kaddr +
>> +					get_eb_page_offset(eb, offset));
> 
> Here offset_in_page(offset) == get_eb_page_offset(eb, offset) and does
> not need to be calculated again for both sector/page combinations. As
> this is a hot path in search, it should be kept optimized.
> 
Now you've fallen into the pitfall.

offset_in_page(offset) != get_eb_page_offset(eb, offset) at all.

get_eb_page_offset(eb, offset) will add eb->start into the
offset_in_page() call.
This is especially important for subpage.

So here we still need to call get_eb_page_offset().
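
Roughly, the helper looks like this (simplified):

	static inline size_t get_eb_page_offset(const struct extent_buffer *eb,
						unsigned long offset)
	{
		/*
		 * For sectorsize == PAGE_SIZE, eb->start is always page
		 * aligned, so adding it changes nothing.  For subpage,
		 * several tree blocks share one page, so the eb's own
		 * offset inside the page must be added first.
		 */
		return offset_in_page(offset + eb->start);
	}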

Although I need to find a better name for it.

Thanks,
Qu



^ permalink raw reply	[flat|nested] 67+ messages in thread

end of thread, other threads:[~2020-11-24  6:20 UTC | newest]

Thread overview: 67+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-13 12:51 [PATCH v2 00/24] btrfs: preparation patches for subpage support Qu Wenruo
2020-11-13 12:51 ` [PATCH v2 01/24] btrfs: tests: fix free space tree test failure on 64K page system Qu Wenruo
2020-11-17 11:53   ` Nikolay Borisov
2020-11-13 12:51 ` [PATCH v2 02/24] btrfs: extent-io-tests: remove invalid tests Qu Wenruo
2020-11-13 18:42   ` Josef Bacik
2020-11-19 21:08   ` David Sterba
2020-11-13 12:51 ` [PATCH v2 03/24] btrfs: extent_io: replace extent_start/extent_len with better structure for end_bio_extent_readpage() Qu Wenruo
2020-11-13 19:13   ` Josef Bacik
2020-11-18 16:05   ` David Sterba
2020-11-18 23:49     ` Qu Wenruo
2020-11-19 20:30       ` David Sterba
2020-11-19 21:08   ` David Sterba
2020-11-13 12:51 ` [PATCH v2 04/24] btrfs: extent_io: introduce helper to handle page status update in end_bio_extent_readpage() Qu Wenruo
2020-11-13 19:18   ` Josef Bacik
2020-11-18 20:27   ` David Sterba
2020-11-18 23:43     ` Qu Wenruo
2020-11-19 18:32       ` David Sterba
2020-11-19 21:08   ` David Sterba
2020-11-13 12:51 ` [PATCH v2 05/24] btrfs: extent_io: extract the btree page submission code into its own helper function Qu Wenruo
2020-11-13 19:22   ` Josef Bacik
2020-11-18 20:04   ` David Sterba
2020-11-18 20:09   ` David Sterba
2020-11-13 12:51 ` [PATCH v2 06/24] btrfs: remove the phy_offset parameter for btrfs_validate_metadata_buffer() Qu Wenruo
2020-11-19 21:09   ` David Sterba
2020-11-13 12:51 ` [PATCH v2 07/24] btrfs: pass bio_offset to check_data_csum() directly Qu Wenruo
2020-11-13 12:51 ` [PATCH v2 08/24] btrfs: inode: make btrfs_verify_data_csum() follow sector size Qu Wenruo
2020-11-13 19:43   ` Josef Bacik
2020-11-13 12:51 ` [PATCH v2 09/24] btrfs: extent_io: calculate inline extent buffer page size based on page size Qu Wenruo
2020-11-13 19:47   ` Josef Bacik
2020-11-14  0:11     ` Qu Wenruo
2020-11-13 12:51 ` [PATCH v2 10/24] btrfs: introduce a helper to determine if the sectorsize is smaller than PAGE_SIZE Qu Wenruo
2020-11-16 22:51   ` David Sterba
2020-11-16 23:50     ` Qu Wenruo
2020-11-17  0:24       ` David Sterba
2020-11-13 12:51 ` [PATCH v2 11/24] btrfs: extent_io: don't allow tree block to cross page boundary for subpage support Qu Wenruo
2020-11-13 12:51 ` [PATCH v2 12/24] btrfs: extent_io: update num_extent_pages() to support subpage sized extent buffer Qu Wenruo
2020-11-18 16:22   ` David Sterba
2020-11-13 12:51 ` [PATCH v2 13/24] btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors Qu Wenruo
2020-11-18 19:30   ` David Sterba
2020-11-18 19:38   ` David Sterba
2020-11-18 19:48   ` David Sterba
2020-11-24  6:20     ` Qu Wenruo
2020-11-13 12:51 ` [PATCH v2 14/24] btrfs: disk-io: only clear EXTENT_LOCK bit for extent_invalidatepage() Qu Wenruo
2020-11-19 21:09   ` David Sterba
2020-11-13 12:51 ` [PATCH v2 15/24] btrfs: extent-io: make type of extent_state::state to be at least 32 bits Qu Wenruo
2020-11-13 18:40   ` Josef Bacik
2020-11-18 16:11   ` David Sterba
2020-11-18 23:48     ` Qu Wenruo
2020-11-19  7:18       ` Nikolay Borisov
2020-11-19 21:09   ` David Sterba
2020-11-13 12:51 ` [PATCH v2 16/24] btrfs: file-item: use nodesize to determine whether we need readahead for btrfs_lookup_bio_sums() Qu Wenruo
2020-11-19 21:09   ` David Sterba
2020-11-13 12:51 ` [PATCH v2 17/24] btrfs: file-item: remove the btrfs_find_ordered_sum() call in btrfs_lookup_bio_sums() Qu Wenruo
2020-11-13 12:51 ` [PATCH v2 18/24] btrfs: file-item: refactor btrfs_lookup_bio_sums() to handle out-of-order bvecs Qu Wenruo
2020-11-18 16:27   ` David Sterba
2020-11-18 23:57     ` Qu Wenruo
2020-11-13 12:51 ` [PATCH v2 19/24] btrfs: scrub: remove the anonymous structure from scrub_page Qu Wenruo
2020-11-18 19:00   ` David Sterba
2020-11-19 21:09   ` David Sterba
2020-11-13 12:51 ` [PATCH v2 20/24] btrfs: scrub: always allocate one full page for one sector for RAID56 Qu Wenruo
2020-11-13 12:51 ` [PATCH v2 21/24] btrfs: scrub: support subpage tree block scrub Qu Wenruo
2020-11-13 12:51 ` [PATCH v2 22/24] btrfs: scrub: support subpage data scrub Qu Wenruo
2020-11-18 16:29   ` David Sterba
2020-11-18 23:38     ` Qu Wenruo
2020-11-13 12:51 ` [PATCH v2 23/24] btrfs: scrub: allow scrub to work with subpage sectorsize Qu Wenruo
2020-11-13 12:51 ` [PATCH v2 24/24] btrfs: extent_io: Use detach_page_private() for alloc_extent_buffer() Qu Wenruo
2020-11-13 20:05 ` [PATCH v2 00/24] btrfs: preparation patches for subpage support Josef Bacik
