Linux-BTRFS Archive on lore.kernel.org
* [PATCH v2 00/19] btrfs: add read-only support for subpage sector size
@ 2020-09-15  5:35 Qu Wenruo
  2020-09-15  5:35 ` [PATCH v2 01/19] btrfs: extent-io-tests: remove invalid tests Qu Wenruo
                   ` (20 more replies)
  0 siblings, 21 replies; 58+ messages in thread
From: Qu Wenruo @ 2020-09-15  5:35 UTC (permalink / raw)
  To: linux-btrfs

Patches can be fetched from github:
https://github.com/adam900710/linux/tree/subpage

Currently btrfs only allows mounting a filesystem whose sectorsize equals
PAGE_SIZE.

That means a system with 64K page size can only use filesystems with a
64K sector size.
This is a big compatibility problem for btrfs.

This patchset partially solves the problem by allowing a 64K page size
system to mount a 4K sectorsize fs in read-only mode.

The main objective here is to remove the blockage in the code base and
pave the road to full RW mount support.

== What works ==

- Existing regular (sector size == page size) support
- Subpage read-only mount (with all self-tests and ASSERTs)
- Subpage metadata read (including all trees and inline extents, and csum checking)
- Subpage uncompressed data read (with csum checking)

== What doesn't work ==

- Read-write mount (see the subject)
- Compressed data read

== Challenges we meet ==

The main problem is metadata, where we have several limitations:
- We always read the full page for metadata
  In the subpage case, one full page can contain several tree blocks.

- We use page::private to point to the extent buffer
  This means we currently can only support a one-page-to-one-extent-buffer
  mapping.
  For subpage support, we need a one-page-to-multiple-extent-buffer
  mapping.


== Solutions ==

So here for the metadata part, we use the following methods to work
around the problem:

- Completely rely on extent_io_tree for metadata status/locking
  Now for subpage metadata, page::private is never utilized; it always
  points to NULL.
  Only the PagePrivate status bit is utilized; the other page status
  bits (locked/uptodate/dirty/...) are all ignored.

  Instead, the page lock is replaced by EXTENT_LOCKED of extent_io_tree,
  page uptodate is replaced by EXTENT_UPTODATE of extent_io_tree, and
  whether a range has an extent buffer is represented by EXTENT_NEW.

  This provides the full potential for later RW support.

- Do subpage read for metadata
  Now we do proper subpage reads for both data and metadata.
  For metadata we never merge bios for adjacent tree blocks, but always
  submit one bio per tree block.
  This allows us to do proper verification for each tree block.

For the data part, it's pretty simple: all existing infrastructure can be
easily converted to support subpage read, without any subpage-specific
handling yet.

== Patchset structure ==

The structure of the patchset:
Patch 01~15: Preparation patches for data and metadata subpage read support.
             These patches can be merged without problem, and work for
             both regular and subpage case.
	     This part can conflict with Nikolay's latest cleanups, but
	     the conflicts should be quite manageable.

Patch 16~19: Patches for metadata subpage read support.
	     The main part of the patchset. It converts metadata to
	     purely extent_io_tree based solution for subpage read.

	     In theory, the page sized routines can also be converted to
	     extent_io_tree, but that would be another topic in the
	     future.

The number of patches is the main reason I'm submitting them to the
mailing list now, as there are already quite a few preparation patches.

Qu Wenruo (19):
  btrfs: extent-io-tests: remove invalid tests
  btrfs: remove the unnecessary parameter @start and @len for
    check_data_csum()
  btrfs: calculate inline extent buffer page size based on page size
  btrfs: remove the open-code to read disk-key
  btrfs: make btrfs_fs_info::buffer_radix to take sector size devided
    values
  btrfs: don't allow tree block to cross page boundary for subpage
    support
  btrfs: update num_extent_pages() to support subpage sized extent
    buffer
  btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors
  btrfs: make csum_tree_block() handle sectorsize smaller than page size
  btrfs: add assert_spin_locked() for attach_extent_buffer_page()
  btrfs: extract the extent buffer verification from
    btree_readpage_end_io_hook()
  btrfs: extent_io: only require sector size alignment for page read
  btrfs: make btrfs_readpage_end_io_hook() follow sector size
  btrfs: make btree inode io_tree has its special owner
  btrfs: don't set extent_io_tree bits for btree inode at endio time
  btrfs: use extent_io_tree to handle subpage extent buffer allocation
  btrfs: implement subpage metadata read and its endio function
  btrfs: implement btree_readpage() and try_release_extent_buffer() for
    subpage
  btrfs: allow RO mount of 4K sector size fs on 64K page system

 fs/btrfs/btrfs_inode.h           |  12 +
 fs/btrfs/ctree.c                 |  13 +-
 fs/btrfs/ctree.h                 |  38 +++-
 fs/btrfs/disk-io.c               | 217 ++++++++++++++----
 fs/btrfs/extent-io-tree.h        |   8 +
 fs/btrfs/extent_io.c             | 376 +++++++++++++++++++++++++++----
 fs/btrfs/extent_io.h             |  19 +-
 fs/btrfs/inode.c                 |  40 +++-
 fs/btrfs/ordered-data.c          |   8 +
 fs/btrfs/qgroup.c                |   4 +
 fs/btrfs/struct-funcs.c          |  18 +-
 fs/btrfs/super.c                 |   7 +
 fs/btrfs/tests/extent-io-tests.c |  26 +--
 13 files changed, 642 insertions(+), 144 deletions(-)

-- 
2.28.0


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v2 01/19] btrfs: extent-io-tests: remove invalid tests
  2020-09-15  5:35 [PATCH v2 00/19] btrfs: add read-only support for subpage sector size Qu Wenruo
@ 2020-09-15  5:35 ` Qu Wenruo
  2020-09-15  5:35 ` [PATCH v2 02/19] btrfs: remove the unnecessary parameter @start and @len for check_data_csum() Qu Wenruo
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 58+ messages in thread
From: Qu Wenruo @ 2020-09-15  5:35 UTC (permalink / raw)
  To: linux-btrfs

In extent-io-tests, there are two invalid tests:
- Invalid nodesize for test_eb_bitmaps()
  Instead of the sectorsize and nodesize combination passed in, we're
  always using a hand-crafted nodesize.
  Although there is an extra check for 64K page size, we can still hit
  a case where PAGE_SIZE == 32K, giving us a 128K nodesize, which is
  larger than the maximum valid node size.

  Thankfully most machines use either 4K or 64K page size, thus we
  haven't hit such a case yet.

- Invalid extent buffer bytenr
  For 64K page size, the only combination we're going to test is
  sectorsize = nodesize = 64K.
  In that case, we'll try to create an extent buffer with a 32K bytenr,
  which is not aligned to sectorsize and thus invalid.

This patch fixes both problems by:
- Honoring the sectorsize/nodesize combination
  Now we won't bother to hand-craft a strange length and use it as
  nodesize.

- Using sectorsize as the extent buffer start for the second run
  This tests the case where the extent buffer is aligned to sectorsize
  but not necessarily aligned to nodesize.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/tests/extent-io-tests.c | 26 +++++++++++---------------
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/tests/extent-io-tests.c b/fs/btrfs/tests/extent-io-tests.c
index df7ce874a74b..73e96d505f4f 100644
--- a/fs/btrfs/tests/extent-io-tests.c
+++ b/fs/btrfs/tests/extent-io-tests.c
@@ -379,54 +379,50 @@ static int __test_eb_bitmaps(unsigned long *bitmap, struct extent_buffer *eb,
 static int test_eb_bitmaps(u32 sectorsize, u32 nodesize)
 {
 	struct btrfs_fs_info *fs_info;
-	unsigned long len;
 	unsigned long *bitmap = NULL;
 	struct extent_buffer *eb = NULL;
 	int ret;
 
 	test_msg("running extent buffer bitmap tests");
 
-	/*
-	 * In ppc64, sectorsize can be 64K, thus 4 * 64K will be larger than
-	 * BTRFS_MAX_METADATA_BLOCKSIZE.
-	 */
-	len = (sectorsize < BTRFS_MAX_METADATA_BLOCKSIZE)
-		? sectorsize * 4 : sectorsize;
-
-	fs_info = btrfs_alloc_dummy_fs_info(len, len);
+	fs_info = btrfs_alloc_dummy_fs_info(nodesize, sectorsize);
 	if (!fs_info) {
 		test_std_err(TEST_ALLOC_FS_INFO);
 		return -ENOMEM;
 	}
 
-	bitmap = kmalloc(len, GFP_KERNEL);
+	bitmap = kmalloc(nodesize, GFP_KERNEL);
 	if (!bitmap) {
 		test_err("couldn't allocate test bitmap");
 		ret = -ENOMEM;
 		goto out;
 	}
 
-	eb = __alloc_dummy_extent_buffer(fs_info, 0, len);
+	eb = __alloc_dummy_extent_buffer(fs_info, 0, nodesize);
 	if (!eb) {
 		test_std_err(TEST_ALLOC_ROOT);
 		ret = -ENOMEM;
 		goto out;
 	}
 
-	ret = __test_eb_bitmaps(bitmap, eb, len);
+	ret = __test_eb_bitmaps(bitmap, eb, nodesize);
 	if (ret)
 		goto out;
 
-	/* Do it over again with an extent buffer which isn't page-aligned. */
 	free_extent_buffer(eb);
-	eb = __alloc_dummy_extent_buffer(fs_info, nodesize / 2, len);
+
+	/*
+	 * Test again for case where the tree block is sectorsize aligned but
+	 * not nodesize aligned.
+	 */
+	eb = __alloc_dummy_extent_buffer(fs_info, sectorsize, nodesize);
 	if (!eb) {
 		test_std_err(TEST_ALLOC_ROOT);
 		ret = -ENOMEM;
 		goto out;
 	}
 
-	ret = __test_eb_bitmaps(bitmap, eb, len);
+	ret = __test_eb_bitmaps(bitmap, eb, nodesize);
 out:
 	free_extent_buffer(eb);
 	kfree(bitmap);
-- 
2.28.0



* [PATCH v2 02/19] btrfs: remove the unnecessary parameter @start and @len for check_data_csum()
  2020-09-15  5:35 [PATCH v2 00/19] btrfs: add read-only support for subpage sector size Qu Wenruo
  2020-09-15  5:35 ` [PATCH v2 01/19] btrfs: extent-io-tests: remove invalid tests Qu Wenruo
@ 2020-09-15  5:35 ` Qu Wenruo
  2020-09-15  8:39   ` Johannes Thumshirn
  2020-09-15  5:35 ` [PATCH v2 03/19] btrfs: calculate inline extent buffer page size based on page size Qu Wenruo
                   ` (18 subsequent siblings)
  20 siblings, 1 reply; 58+ messages in thread
From: Qu Wenruo @ 2020-09-15  5:35 UTC (permalink / raw)
  To: linux-btrfs

For check_data_csum(), the page we're using comes directly from the inode
mapping, thus it has a valid page_offset().

We can use (page_offset(page) + pgoff) to replace the @start parameter
completely, while @len should always be sectorsize.

While we're here, also add some comments, since there is quite some
confusion around terms like start/offset without explaining whether they
mean a file offset or a logical bytenr.

This should not affect the existing behavior, as for the current
sectorsize == PAGE_SIZE case, @pgoff is always 0, and @len is always
PAGE_SIZE (or sectorsize from the dio read path).

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 9570458aa847..bb4442950d2b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2793,17 +2793,30 @@ void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start,
 	btrfs_queue_work(wq, &ordered_extent->work);
 }
 
+/*
+ * Verify the checksum of one sector of uncompressed data.
+ *
+ * @inode:	The inode.
+ * @io_bio:	The btrfs_io_bio which contains the csum.
+ * @icsum:	The csum offset (by number of sectors).
+ * @page:	The page where the data to be verified is.
+ * @pgoff:	The offset inside the page.
+ *
+ * The length of such check is always one sector size.
+ */
 static int check_data_csum(struct inode *inode, struct btrfs_io_bio *io_bio,
-			   int icsum, struct page *page, int pgoff, u64 start,
-			   size_t len)
+			   int icsum, struct page *page, int pgoff)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
 	char *kaddr;
+	u32 len = fs_info->sectorsize;
 	u16 csum_size = btrfs_super_csum_size(fs_info->super_copy);
 	u8 *csum_expected;
 	u8 csum[BTRFS_CSUM_SIZE];
 
+	ASSERT(pgoff + len <= PAGE_SIZE);
+
 	csum_expected = ((u8 *)io_bio->csum) + icsum * csum_size;
 
 	kaddr = kmap_atomic(page);
@@ -2817,8 +2830,8 @@ static int check_data_csum(struct inode *inode, struct btrfs_io_bio *io_bio,
 	kunmap_atomic(kaddr);
 	return 0;
 zeroit:
-	btrfs_print_data_csum_error(BTRFS_I(inode), start, csum, csum_expected,
-				    io_bio->mirror_num);
+	btrfs_print_data_csum_error(BTRFS_I(inode), page_offset(page) + pgoff,
+				    csum, csum_expected, io_bio->mirror_num);
 	if (io_bio->device)
 		btrfs_dev_stat_inc_and_print(io_bio->device,
 					     BTRFS_DEV_STAT_CORRUPTION_ERRS);
@@ -2857,8 +2870,7 @@ static int btrfs_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
 	}
 
 	phy_offset >>= inode->i_sb->s_blocksize_bits;
-	return check_data_csum(inode, io_bio, phy_offset, page, offset, start,
-			       (size_t)(end - start + 1));
+	return check_data_csum(inode, io_bio, phy_offset, page, offset);
 }
 
 /*
@@ -7545,8 +7557,7 @@ static blk_status_t btrfs_check_read_dio_bio(struct inode *inode,
 			ASSERT(pgoff < PAGE_SIZE);
 			if (uptodate &&
 			    (!csum || !check_data_csum(inode, io_bio, icsum,
-						       bvec.bv_page, pgoff,
-						       start, sectorsize))) {
+						       bvec.bv_page, pgoff))) {
 				clean_io_failure(fs_info, failure_tree, io_tree,
 						 start, bvec.bv_page,
 						 btrfs_ino(BTRFS_I(inode)),
-- 
2.28.0



* [PATCH v2 03/19] btrfs: calculate inline extent buffer page size based on page size
  2020-09-15  5:35 [PATCH v2 00/19] btrfs: add read-only support for subpage sector size Qu Wenruo
  2020-09-15  5:35 ` [PATCH v2 01/19] btrfs: extent-io-tests: remove invalid tests Qu Wenruo
  2020-09-15  5:35 ` [PATCH v2 02/19] btrfs: remove the unnecessary parameter @start and @len for check_data_csum() Qu Wenruo
@ 2020-09-15  5:35 ` Qu Wenruo
  2020-09-15  8:35   ` Nikolay Borisov
  2020-09-15  8:40   ` Johannes Thumshirn
  2020-09-15  5:35 ` [PATCH v2 04/19] btrfs: remove the open-code to read disk-key Qu Wenruo
                   ` (17 subsequent siblings)
  20 siblings, 2 replies; 58+ messages in thread
From: Qu Wenruo @ 2020-09-15  5:35 UTC (permalink / raw)
  To: linux-btrfs

Btrfs only supports 64K as the maximum node size, thus for a 4K page
system, we would have at most 16 pages for one extent buffer.

For a system using 64K page size, we would really have just one
single page.

Since we always use 16 slots for extent_buffer::pages[], this means for
systems using 64K pages, we are wasting the memory of 15 page pointers
which will never be utilized.

So this patch changes how the extent_buffer::pages[] array size is
calculated; it is now derived from BTRFS_MAX_METADATA_BLOCKSIZE and
PAGE_SIZE.

For systems using 4K page size, it will stay 16 pages.
For systems using 64K page size, it will be just 1 page.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 6 +++---
 fs/btrfs/extent_io.h | 8 +++++---
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 6def411b2eba..674d3b82751f 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5004,9 +5004,9 @@ __alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start,
 	/*
 	 * Sanity checks, currently the maximum is 64k covered by 16x 4k pages
 	 */
-	BUILD_BUG_ON(BTRFS_MAX_METADATA_BLOCKSIZE
-		> MAX_INLINE_EXTENT_BUFFER_SIZE);
-	BUG_ON(len > MAX_INLINE_EXTENT_BUFFER_SIZE);
+	BUILD_BUG_ON(BTRFS_MAX_METADATA_BLOCKSIZE >
+		     INLINE_EXTENT_BUFFER_PAGES * PAGE_SIZE);
+	BUG_ON(len > BTRFS_MAX_METADATA_BLOCKSIZE);
 
 #ifdef CONFIG_BTRFS_DEBUG
 	eb->spinning_writers = 0;
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 00a88f2eb5ab..d511890702cc 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -85,9 +85,11 @@ struct extent_io_ops {
 				    int mirror);
 };
 
-
-#define INLINE_EXTENT_BUFFER_PAGES 16
-#define MAX_INLINE_EXTENT_BUFFER_SIZE (INLINE_EXTENT_BUFFER_PAGES * PAGE_SIZE)
+/*
+ * The SZ_64K is BTRFS_MAX_METADATA_BLOCKSIZE, here just to avoid circle
+ * including "ctree.h".
+ */
+#define INLINE_EXTENT_BUFFER_PAGES (SZ_64K / PAGE_SIZE)
 struct extent_buffer {
 	u64 start;
 	unsigned long len;
-- 
2.28.0



* [PATCH v2 04/19] btrfs: remove the open-code to read disk-key
  2020-09-15  5:35 [PATCH v2 00/19] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (2 preceding siblings ...)
  2020-09-15  5:35 ` [PATCH v2 03/19] btrfs: calculate inline extent buffer page size based on page size Qu Wenruo
@ 2020-09-15  5:35 ` Qu Wenruo
  2020-09-15  8:36   ` Nikolay Borisov
                     ` (2 more replies)
  2020-09-15  5:35 ` [PATCH v2 05/19] btrfs: make btrfs_fs_info::buffer_radix to take sector size devided values Qu Wenruo
                   ` (16 subsequent siblings)
  20 siblings, 3 replies; 58+ messages in thread
From: Qu Wenruo @ 2020-09-15  5:35 UTC (permalink / raw)
  To: linux-btrfs

generic_bin_search() distinguishes between reading a key which doesn't
cross a page boundary and one which does. However this distinction is
not necessary, since read_extent_buffer() handles both cases
transparently.

Just use read_extent_buffer() to streamline the code.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ctree.c | 13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index cd1cd673bc0b..e204e1320745 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1697,7 +1697,6 @@ static noinline int generic_bin_search(struct extent_buffer *eb,
 	}
 
 	while (low < high) {
-		unsigned long oip;
 		unsigned long offset;
 		struct btrfs_disk_key *tmp;
 		struct btrfs_disk_key unaligned;
@@ -1705,17 +1704,9 @@ static noinline int generic_bin_search(struct extent_buffer *eb,
 
 		mid = (low + high) / 2;
 		offset = p + mid * item_size;
-		oip = offset_in_page(offset);
 
-		if (oip + key_size <= PAGE_SIZE) {
-			const unsigned long idx = offset >> PAGE_SHIFT;
-			char *kaddr = page_address(eb->pages[idx]);
-
-			tmp = (struct btrfs_disk_key *)(kaddr + oip);
-		} else {
-			read_extent_buffer(eb, &unaligned, offset, key_size);
-			tmp = &unaligned;
-		}
+		read_extent_buffer(eb, &unaligned, offset, key_size);
+		tmp = &unaligned;
 
 		ret = comp_keys(tmp, key);
 
-- 
2.28.0



* [PATCH v2 05/19] btrfs: make btrfs_fs_info::buffer_radix to take sector size devided values
  2020-09-15  5:35 [PATCH v2 00/19] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (3 preceding siblings ...)
  2020-09-15  5:35 ` [PATCH v2 04/19] btrfs: remove the open-code to read disk-key Qu Wenruo
@ 2020-09-15  5:35 ` Qu Wenruo
  2020-09-15  8:27   ` Johannes Thumshirn
  2020-09-15 17:40   ` kernel test robot
  2020-09-15  5:35 ` [PATCH v2 06/19] btrfs: don't allow tree block to cross page boundary for subpage support Qu Wenruo
                   ` (15 subsequent siblings)
  20 siblings, 2 replies; 58+ messages in thread
From: Qu Wenruo @ 2020-09-15  5:35 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

For subpage sector size support, one page can contain multiple tree
blocks, thus we can no longer use (eb->start >> PAGE_SHIFT), or we could
easily look up an extent buffer that doesn't belong to us.

This patch uses (extent_buffer::start / sectorsize) as the index for the
radix tree, so that we get the correct extent buffer for subpage size
support.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/extent_io.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 674d3b82751f..e56aa68cd9fe 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5142,7 +5142,7 @@ struct extent_buffer *find_extent_buffer(struct btrfs_fs_info *fs_info,
 
 	rcu_read_lock();
 	eb = radix_tree_lookup(&fs_info->buffer_radix,
-			       start >> PAGE_SHIFT);
+			       start / fs_info->sectorsize);
 	if (eb && atomic_inc_not_zero(&eb->refs)) {
 		rcu_read_unlock();
 		/*
@@ -5194,7 +5194,7 @@ struct extent_buffer *alloc_test_extent_buffer(struct btrfs_fs_info *fs_info,
 	}
 	spin_lock(&fs_info->buffer_lock);
 	ret = radix_tree_insert(&fs_info->buffer_radix,
-				start >> PAGE_SHIFT, eb);
+				start / fs_info->sectorsize, eb);
 	spin_unlock(&fs_info->buffer_lock);
 	radix_tree_preload_end();
 	if (ret == -EEXIST) {
@@ -5302,7 +5302,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 
 	spin_lock(&fs_info->buffer_lock);
 	ret = radix_tree_insert(&fs_info->buffer_radix,
-				start >> PAGE_SHIFT, eb);
+				start / fs_info->sectorsize, eb);
 	spin_unlock(&fs_info->buffer_lock);
 	radix_tree_preload_end();
 	if (ret == -EEXIST) {
@@ -5358,7 +5358,7 @@ static int release_extent_buffer(struct extent_buffer *eb)
 
 			spin_lock(&fs_info->buffer_lock);
 			radix_tree_delete(&fs_info->buffer_radix,
-					  eb->start >> PAGE_SHIFT);
+					  eb->start / fs_info->sectorsize);
 			spin_unlock(&fs_info->buffer_lock);
 		} else {
 			spin_unlock(&eb->refs_lock);
-- 
2.28.0



* [PATCH v2 06/19] btrfs: don't allow tree block to cross page boundary for subpage support
  2020-09-15  5:35 [PATCH v2 00/19] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (4 preceding siblings ...)
  2020-09-15  5:35 ` [PATCH v2 05/19] btrfs: make btrfs_fs_info::buffer_radix to take sector size devided values Qu Wenruo
@ 2020-09-15  5:35 ` Qu Wenruo
  2020-09-15  8:37   ` Nikolay Borisov
  2020-09-15  8:44   ` Johannes Thumshirn
  2020-09-15  5:35 ` [PATCH v2 07/19] btrfs: update num_extent_pages() to support subpage sized extent buffer Qu Wenruo
                   ` (14 subsequent siblings)
  20 siblings, 2 replies; 58+ messages in thread
From: Qu Wenruo @ 2020-09-15  5:35 UTC (permalink / raw)
  To: linux-btrfs

As a preparation for subpage sector size support (allowing a filesystem
with sector size smaller than page size to be mounted), if the sector
size is smaller than the page size, we don't allow a tree block to be
read if it crosses a 64K(*) boundary.

The 64K is selected because:
- We are only going to support 64K page size for subpage
- 64K is also the max node size btrfs supports

This ensures that tree blocks are always contained in one page for a
system with 64K page size, which greatly simplifies the handling.

Otherwise we would need to do complex multi-page handling for tree
blocks.

Currently the only way to create such tree blocks crossing a 64K
boundary is by btrfs-convert, which will get fixed soon and doesn't have
widespread usage.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index e56aa68cd9fe..9af972999a09 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5232,6 +5232,13 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		btrfs_err(fs_info, "bad tree block start %llu", start);
 		return ERR_PTR(-EINVAL);
 	}
+	if (fs_info->sectorsize < PAGE_SIZE && round_down(start, PAGE_SIZE) !=
+	    round_down(start + len - 1, PAGE_SIZE)) {
+		btrfs_err(fs_info,
+		"tree block crosses page boundary, start %llu nodesize %lu",
+			  start, len);
+		return ERR_PTR(-EINVAL);
+	}
 
 	eb = find_extent_buffer(fs_info, start);
 	if (eb)
-- 
2.28.0



* [PATCH v2 07/19] btrfs: update num_extent_pages() to support subpage sized extent buffer
  2020-09-15  5:35 [PATCH v2 00/19] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (5 preceding siblings ...)
  2020-09-15  5:35 ` [PATCH v2 06/19] btrfs: don't allow tree block to cross page boundary for subpage support Qu Wenruo
@ 2020-09-15  5:35 ` Qu Wenruo
  2020-09-15  8:42   ` Johannes Thumshirn
  2020-09-15  5:35 ` [PATCH v2 08/19] btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors Qu Wenruo
                   ` (13 subsequent siblings)
  20 siblings, 1 reply; 58+ messages in thread
From: Qu Wenruo @ 2020-09-15  5:35 UTC (permalink / raw)
  To: linux-btrfs

For subpage sized extent buffers, we have ensured that no extent buffer
will cross a page boundary, thus we only need one page for any extent
buffer.

This patch updates the function num_extent_pages() to handle such a
case; now num_extent_pages() returns 1 for subpage sized extent
buffers.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.h | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index d511890702cc..61556d6e9363 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -229,8 +229,15 @@ void wait_on_extent_buffer_writeback(struct extent_buffer *eb);
 
 static inline int num_extent_pages(const struct extent_buffer *eb)
 {
-	return (round_up(eb->start + eb->len, PAGE_SIZE) >> PAGE_SHIFT) -
-	       (eb->start >> PAGE_SHIFT);
+	/*
+	 * For sectorsize == PAGE_SIZE case, since eb is always aligned to
+	 * sectorsize, it's just (eb->len / PAGE_SIZE) >> PAGE_SHIFT.
+	 *
+	 * For sectorsize < PAGE_SIZE case, we only want to support 64K
+	 * PAGE_SIZE, and ensured all tree blocks won't cross page boundary.
+	 * So in that case we always got 1 page.
+	 */
+	return (round_up(eb->len, PAGE_SIZE) >> PAGE_SHIFT);
 }
 
 static inline int extent_buffer_uptodate(const struct extent_buffer *eb)
-- 
2.28.0



* [PATCH v2 08/19] btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors
  2020-09-15  5:35 [PATCH v2 00/19] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (6 preceding siblings ...)
  2020-09-15  5:35 ` [PATCH v2 07/19] btrfs: update num_extent_pages() to support subpage sized extent buffer Qu Wenruo
@ 2020-09-15  5:35 ` Qu Wenruo
  2020-09-15  5:35 ` [PATCH v2 09/19] btrfs: make csum_tree_block() handle sectorsize smaller than page size Qu Wenruo
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 58+ messages in thread
From: Qu Wenruo @ 2020-09-15  5:35 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

To support the sectorsize < PAGE_SIZE case, we need to take extra care
with extent buffer accessors.

Since sectorsize is smaller than PAGE_SIZE, one page can contain
multiple tree blocks, so we must use eb->start to determine the real
offset to read/write for extent buffer accessors.

This patch introduces two helpers to do these:
- get_eb_page_index()
  This is to calculate the index to access extent_buffer::pages.
  It's just a simple wrapper around "start >> PAGE_SHIFT".

  For sectorsize == PAGE_SIZE case, nothing is changed.
  For sectorsize < PAGE_SIZE case, we always get index 0, and
  the existing page shift also works fine.

- get_eb_page_offset()
  This is to calculate the offset to access extent_buffer::pages.
  This needs to take extent_buffer::start into consideration.

  For sectorsize == PAGE_SIZE case, extent_buffer::start is always
  aligned to PAGE_SIZE, thus adding extent_buffer::start to
  offset_in_page() won't change the result.
  For sectorsize < PAGE_SIZE case, adding extent_buffer::start gives
  us the correct offset to access.

This patch will touch the following parts to cover all extent buffer
accessors:

- BTRFS_SETGET_HEADER_FUNCS()
- read_extent_buffer()
- read_extent_buffer_to_user()
- memcmp_extent_buffer()
- write_extent_buffer_chunk_tree_uuid()
- write_extent_buffer_fsid()
- write_extent_buffer()
- memzero_extent_buffer()
- copy_extent_buffer_full()
- copy_extent_buffer()
- memcpy_extent_buffer()
- memmove_extent_buffer()
- btrfs_get_token_##bits()
- btrfs_get_##bits()
- btrfs_set_token_##bits()
- btrfs_set_##bits()

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ctree.h        | 38 ++++++++++++++++++++++--
 fs/btrfs/extent_io.c    | 66 ++++++++++++++++++++++++-----------------
 fs/btrfs/struct-funcs.c | 18 ++++++-----
 3 files changed, 85 insertions(+), 37 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 9a72896bed2e..81d5a6cc97b5 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1448,14 +1448,15 @@ static inline void btrfs_set_token_##name(struct btrfs_map_token *token,\
 #define BTRFS_SETGET_HEADER_FUNCS(name, type, member, bits)		\
 static inline u##bits btrfs_##name(const struct extent_buffer *eb)	\
 {									\
-	const type *p = page_address(eb->pages[0]);			\
+	const type *p = page_address(eb->pages[0]) +			\
+			offset_in_page(eb->start);			\
 	u##bits res = le##bits##_to_cpu(p->member);			\
 	return res;							\
 }									\
 static inline void btrfs_set_##name(const struct extent_buffer *eb,	\
 				    u##bits val)			\
 {									\
-	type *p = page_address(eb->pages[0]);				\
+	type *p = page_address(eb->pages[0]) + offset_in_page(eb->start); \
 	p->member = cpu_to_le##bits(val);				\
 }
 
@@ -3241,6 +3242,39 @@ static inline void assertfail(const char *expr, const char* file, int line) { }
 #define ASSERT(expr)	(void)(expr)
 #endif
 
+/*
+ * Get the correct offset inside the page of extent buffer.
+ *
+ * Will handle both sectorsize == PAGE_SIZE and sectorsize < PAGE_SIZE cases.
+ *
+ * @eb:		The target extent buffer
+ * @start:	The offset inside the extent buffer
+ */
+static inline size_t get_eb_page_offset(const struct extent_buffer *eb,
+					unsigned long start)
+{
+	/*
+	 * For sectorsize == PAGE_SIZE case, eb->start will always be aligned
+	 * to PAGE_SIZE, thus adding it won't cause any difference.
+	 *
+	 * For sectorsize < PAGE_SIZE, we must only read the data belongs to
+	 * the eb, thus we have to take the eb->start into consideration.
+	 */
+	return offset_in_page(start + eb->start);
+}
+
+static inline unsigned long get_eb_page_index(unsigned long start)
+{
+	/*
+	 * For sectorsize == PAGE_SIZE case, plain >> PAGE_SHIFT is enough.
+	 *
+	 * For sectorsize < PAGE_SIZE case, we only support 64K PAGE_SIZE,
+	 * and has ensured all tree blocks are contained in one page, thus
+	 * we always get index == 0.
+	 */
+	return start >> PAGE_SHIFT;
+}
+
 /*
  * Use that for functions that are conditionally exported for sanity tests but
  * otherwise static
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 9af972999a09..ba9cc864d9c0 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5637,7 +5637,7 @@ void read_extent_buffer(const struct extent_buffer *eb, void *dstv,
 	struct page *page;
 	char *kaddr;
 	char *dst = (char *)dstv;
-	unsigned long i = start >> PAGE_SHIFT;
+	unsigned long i = get_eb_page_index(start);
 
 	if (start + len > eb->len) {
 		WARN(1, KERN_ERR "btrfs bad mapping eb start %llu len %lu, wanted %lu %lu\n",
@@ -5646,7 +5646,7 @@ void read_extent_buffer(const struct extent_buffer *eb, void *dstv,
 		return;
 	}
 
-	offset = offset_in_page(start);
+	offset = get_eb_page_offset(eb, start);
 
 	while (len > 0) {
 		page = eb->pages[i];
@@ -5671,13 +5671,13 @@ int read_extent_buffer_to_user(const struct extent_buffer *eb,
 	struct page *page;
 	char *kaddr;
 	char __user *dst = (char __user *)dstv;
-	unsigned long i = start >> PAGE_SHIFT;
+	unsigned long i = get_eb_page_index(start);
 	int ret = 0;
 
 	WARN_ON(start > eb->len);
 	WARN_ON(start + len > eb->start + eb->len);
 
-	offset = offset_in_page(start);
+	offset = get_eb_page_offset(eb, start);
 
 	while (len > 0) {
 		page = eb->pages[i];
@@ -5706,13 +5706,13 @@ int memcmp_extent_buffer(const struct extent_buffer *eb, const void *ptrv,
 	struct page *page;
 	char *kaddr;
 	char *ptr = (char *)ptrv;
-	unsigned long i = start >> PAGE_SHIFT;
+	unsigned long i = get_eb_page_index(start);
 	int ret = 0;
 
 	WARN_ON(start > eb->len);
 	WARN_ON(start + len > eb->start + eb->len);
 
-	offset = offset_in_page(start);
+	offset = get_eb_page_offset(eb, start);
 
 	while (len > 0) {
 		page = eb->pages[i];
@@ -5738,7 +5738,7 @@ void write_extent_buffer_chunk_tree_uuid(const struct extent_buffer *eb,
 	char *kaddr;
 
 	WARN_ON(!PageUptodate(eb->pages[0]));
-	kaddr = page_address(eb->pages[0]);
+	kaddr = page_address(eb->pages[0]) + get_eb_page_offset(eb, 0);
 	memcpy(kaddr + offsetof(struct btrfs_header, chunk_tree_uuid), srcv,
 			BTRFS_FSID_SIZE);
 }
@@ -5748,7 +5748,7 @@ void write_extent_buffer_fsid(const struct extent_buffer *eb, const void *srcv)
 	char *kaddr;
 
 	WARN_ON(!PageUptodate(eb->pages[0]));
-	kaddr = page_address(eb->pages[0]);
+	kaddr = page_address(eb->pages[0]) + get_eb_page_offset(eb, 0);
 	memcpy(kaddr + offsetof(struct btrfs_header, fsid), srcv,
 			BTRFS_FSID_SIZE);
 }
@@ -5761,12 +5761,12 @@ void write_extent_buffer(const struct extent_buffer *eb, const void *srcv,
 	struct page *page;
 	char *kaddr;
 	char *src = (char *)srcv;
-	unsigned long i = start >> PAGE_SHIFT;
+	unsigned long i = get_eb_page_index(start);
 
 	WARN_ON(start > eb->len);
 	WARN_ON(start + len > eb->start + eb->len);
 
-	offset = offset_in_page(start);
+	offset = get_eb_page_offset(eb, start);
 
 	while (len > 0) {
 		page = eb->pages[i];
@@ -5790,12 +5790,12 @@ void memzero_extent_buffer(const struct extent_buffer *eb, unsigned long start,
 	size_t offset;
 	struct page *page;
 	char *kaddr;
-	unsigned long i = start >> PAGE_SHIFT;
+	unsigned long i = get_eb_page_index(start);
 
 	WARN_ON(start > eb->len);
 	WARN_ON(start + len > eb->start + eb->len);
 
-	offset = offset_in_page(start);
+	offset = get_eb_page_offset(eb, start);
 
 	while (len > 0) {
 		page = eb->pages[i];
@@ -5819,10 +5819,22 @@ void copy_extent_buffer_full(const struct extent_buffer *dst,
 
 	ASSERT(dst->len == src->len);
 
-	num_pages = num_extent_pages(dst);
-	for (i = 0; i < num_pages; i++)
-		copy_page(page_address(dst->pages[i]),
-				page_address(src->pages[i]));
+	if (dst->fs_info->sectorsize == PAGE_SIZE) {
+		num_pages = num_extent_pages(dst);
+		for (i = 0; i < num_pages; i++)
+			copy_page(page_address(dst->pages[i]),
+				  page_address(src->pages[i]));
+	} else {
+		unsigned long src_index = get_eb_page_index(src->start);
+		unsigned long dst_index = get_eb_page_index(dst->start);
+		size_t src_offset = get_eb_page_offset(src, 0);
+		size_t dst_offset = get_eb_page_offset(dst, 0);
+
+		ASSERT(src_index == 0 && dst_index == 0);
+		memcpy(page_address(dst->pages[dst_index]) + dst_offset,
+		       page_address(src->pages[src_index]) + src_offset,
+		       src->len);
+	}
 }
 
 void copy_extent_buffer(const struct extent_buffer *dst,
@@ -5835,11 +5847,11 @@ void copy_extent_buffer(const struct extent_buffer *dst,
 	size_t offset;
 	struct page *page;
 	char *kaddr;
-	unsigned long i = dst_offset >> PAGE_SHIFT;
+	unsigned long i = get_eb_page_index(dst_offset);
 
 	WARN_ON(src->len != dst_len);
 
-	offset = offset_in_page(dst_offset);
+	offset = get_eb_page_offset(dst, dst_offset);
 
 	while (len > 0) {
 		page = dst->pages[i];
@@ -5883,7 +5895,7 @@ static inline void eb_bitmap_offset(const struct extent_buffer *eb,
 	 * the bitmap item in the extent buffer + the offset of the byte in the
 	 * bitmap item.
 	 */
-	offset = start + byte_offset;
+	offset = start + offset_in_page(eb->start) + byte_offset;
 
 	*page_index = offset >> PAGE_SHIFT;
 	*page_offset = offset_in_page(offset);
@@ -6047,11 +6059,11 @@ void memcpy_extent_buffer(const struct extent_buffer *dst,
 	}
 
 	while (len > 0) {
-		dst_off_in_page = offset_in_page(dst_offset);
-		src_off_in_page = offset_in_page(src_offset);
+		dst_off_in_page = get_eb_page_offset(dst, dst_offset);
+		src_off_in_page = get_eb_page_offset(dst, src_offset);
 
-		dst_i = dst_offset >> PAGE_SHIFT;
-		src_i = src_offset >> PAGE_SHIFT;
+		dst_i = get_eb_page_index(dst_offset);
+		src_i = get_eb_page_index(src_offset);
 
 		cur = min(len, (unsigned long)(PAGE_SIZE -
 					       src_off_in_page));
@@ -6097,11 +6109,11 @@ void memmove_extent_buffer(const struct extent_buffer *dst,
 		return;
 	}
 	while (len > 0) {
-		dst_i = dst_end >> PAGE_SHIFT;
-		src_i = src_end >> PAGE_SHIFT;
+		dst_i = get_eb_page_index(dst_end);
+		src_i = get_eb_page_index(src_end);
 
-		dst_off_in_page = offset_in_page(dst_end);
-		src_off_in_page = offset_in_page(src_end);
+		dst_off_in_page = get_eb_page_offset(dst, dst_end);
+		src_off_in_page = get_eb_page_offset(dst, src_end);
 
 		cur = min_t(unsigned long, len, src_off_in_page + 1);
 		cur = min(cur, dst_off_in_page + 1);
diff --git a/fs/btrfs/struct-funcs.c b/fs/btrfs/struct-funcs.c
index 079b059818e9..769901c2b3c9 100644
--- a/fs/btrfs/struct-funcs.c
+++ b/fs/btrfs/struct-funcs.c
@@ -67,8 +67,9 @@ u##bits btrfs_get_token_##bits(struct btrfs_map_token *token,		\
 			       const void *ptr, unsigned long off)	\
 {									\
 	const unsigned long member_offset = (unsigned long)ptr + off;	\
-	const unsigned long idx = member_offset >> PAGE_SHIFT;		\
-	const unsigned long oip = offset_in_page(member_offset);	\
+	const unsigned long idx = get_eb_page_index(member_offset);	\
+	const unsigned long oip = get_eb_page_offset(token->eb, 	\
+						     member_offset);	\
 	const int size = sizeof(u##bits);				\
 	u8 lebytes[sizeof(u##bits)];					\
 	const int part = PAGE_SIZE - oip;				\
@@ -95,8 +96,8 @@ u##bits btrfs_get_##bits(const struct extent_buffer *eb,		\
 			 const void *ptr, unsigned long off)		\
 {									\
 	const unsigned long member_offset = (unsigned long)ptr + off;	\
-	const unsigned long oip = offset_in_page(member_offset);	\
-	const unsigned long idx = member_offset >> PAGE_SHIFT;		\
+	const unsigned long oip = get_eb_page_offset(eb, member_offset);\
+	const unsigned long idx = get_eb_page_index(member_offset);	\
 	char *kaddr = page_address(eb->pages[idx]);			\
 	const int size = sizeof(u##bits);				\
 	const int part = PAGE_SIZE - oip;				\
@@ -116,8 +117,9 @@ void btrfs_set_token_##bits(struct btrfs_map_token *token,		\
 			    u##bits val)				\
 {									\
 	const unsigned long member_offset = (unsigned long)ptr + off;	\
-	const unsigned long idx = member_offset >> PAGE_SHIFT;		\
-	const unsigned long oip = offset_in_page(member_offset);	\
+	const unsigned long idx = get_eb_page_index(member_offset);	\
+	const unsigned long oip = get_eb_page_offset(token->eb,		\
+						     member_offset);	\
 	const int size = sizeof(u##bits);				\
 	u8 lebytes[sizeof(u##bits)];					\
 	const int part = PAGE_SIZE - oip;				\
@@ -146,8 +148,8 @@ void btrfs_set_##bits(const struct extent_buffer *eb, void *ptr,	\
 		      unsigned long off, u##bits val)			\
 {									\
 	const unsigned long member_offset = (unsigned long)ptr + off;	\
-	const unsigned long oip = offset_in_page(member_offset);	\
-	const unsigned long idx = member_offset >> PAGE_SHIFT;		\
+	const unsigned long oip = get_eb_page_offset(eb, member_offset);\
+	const unsigned long idx = get_eb_page_index(member_offset);	\
 	char *kaddr = page_address(eb->pages[idx]);			\
 	const int size = sizeof(u##bits);				\
 	const int part = PAGE_SIZE - oip;				\
-- 
2.28.0


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v2 09/19] btrfs: make csum_tree_block() handle sectorsize smaller than page size
  2020-09-15  5:35 [PATCH v2 00/19] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (7 preceding siblings ...)
  2020-09-15  5:35 ` [PATCH v2 08/19] btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors Qu Wenruo
@ 2020-09-15  5:35 ` Qu Wenruo
  2020-09-15  8:47   ` Johannes Thumshirn
  2020-09-15  5:35 ` [PATCH v2 10/19] btrfs: add assert_spin_locked() for attach_extent_buffer_page() Qu Wenruo
                   ` (11 subsequent siblings)
  20 siblings, 1 reply; 58+ messages in thread
From: Qu Wenruo @ 2020-09-15  5:35 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues, Nikolay Borisov

For subpage size support, we only need to handle the first page.

To make the code work for both cases, we modify the following behaviors:

- num_pages calculation
  Instead of "nodesize >> PAGE_SHIFT", use
  "DIV_ROUND_UP(nodesize, PAGE_SIZE)". This ensures we get at least one
  page for subpage support, while still getting the same result for
  regular page size.

- The length of the first run
  Instead of PAGE_SIZE - BTRFS_CSUM_SIZE, use
  min(PAGE_SIZE, nodesize) - BTRFS_CSUM_SIZE.
  This handles both cases correctly.

- The start location of the first run
  Instead of always using BTRFS_CSUM_SIZE as the csum start position,
  add offset_in_page(eb->start) to get the proper offset for both cases.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/disk-io.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f6bba7eb1fa1..62dbd9bbd381 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -257,16 +257,16 @@ struct extent_map *btree_get_extent(struct btrfs_inode *inode,
 static void csum_tree_block(struct extent_buffer *buf, u8 *result)
 {
 	struct btrfs_fs_info *fs_info = buf->fs_info;
-	const int num_pages = fs_info->nodesize >> PAGE_SHIFT;
+	const int num_pages = DIV_ROUND_UP(fs_info->nodesize, PAGE_SIZE);
 	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
 	char *kaddr;
 	int i;
 
 	shash->tfm = fs_info->csum_shash;
 	crypto_shash_init(shash);
-	kaddr = page_address(buf->pages[0]);
+	kaddr = page_address(buf->pages[0]) + offset_in_page(buf->start);
 	crypto_shash_update(shash, kaddr + BTRFS_CSUM_SIZE,
-			    PAGE_SIZE - BTRFS_CSUM_SIZE);
+		min_t(u32, PAGE_SIZE, fs_info->nodesize) - BTRFS_CSUM_SIZE);
 
 	for (i = 1; i < num_pages; i++) {
 		kaddr = page_address(buf->pages[i]);
-- 
2.28.0


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v2 10/19] btrfs: add assert_spin_locked() for attach_extent_buffer_page()
  2020-09-15  5:35 [PATCH v2 00/19] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (8 preceding siblings ...)
  2020-09-15  5:35 ` [PATCH v2 09/19] btrfs: make csum_tree_block() handle sectorsize smaller than page size Qu Wenruo
@ 2020-09-15  5:35 ` Qu Wenruo
  2020-09-15  8:52   ` Johannes Thumshirn
  2020-09-15  5:35 ` [PATCH v2 11/19] btrfs: extract the extent buffer verification from btree_readpage_end_io_hook() Qu Wenruo
                   ` (10 subsequent siblings)
  20 siblings, 1 reply; 58+ messages in thread
From: Qu Wenruo @ 2020-09-15  5:35 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

When calling attach_extent_buffer_page(), we're either attaching
anonymous pages, called from btrfs_clone_extent_buffer(), or attaching
btree_inode pages, called from alloc_extent_buffer().

For the latter case, we should hold page->mapping->private_lock to
avoid racing with modifications of page->private.

Add assert_spin_locked() for the case where we're called from
alloc_extent_buffer().

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/extent_io.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index ba9cc864d9c0..9d4a81997738 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3096,6 +3096,15 @@ static int submit_extent_page(unsigned int opf,
 static void attach_extent_buffer_page(struct extent_buffer *eb,
 				      struct page *page)
 {
+	/*
+	 * If the page is mapped to btree inode, we should hold the private
+	 * lock to prevent race.
+	 * For cloned or dummy extent buffers, their pages are not mapped and
+	 * will not race with any other ebs.
+	 */
+	if (page->mapping)
+		assert_spin_locked(&page->mapping->private_lock);
+
 	if (!PagePrivate(page))
 		attach_page_private(page, eb);
 	else
-- 
2.28.0


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v2 11/19] btrfs: extract the extent buffer verification from btree_readpage_end_io_hook()
  2020-09-15  5:35 [PATCH v2 00/19] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (9 preceding siblings ...)
  2020-09-15  5:35 ` [PATCH v2 10/19] btrfs: add assert_spin_locked() for attach_extent_buffer_page() Qu Wenruo
@ 2020-09-15  5:35 ` Qu Wenruo
  2020-09-15  5:35 ` [PATCH v2 12/19] btrfs: extent_io: only require sector size alignment for page read Qu Wenruo
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 58+ messages in thread
From: Qu Wenruo @ 2020-09-15  5:35 UTC (permalink / raw)
  To: linux-btrfs

Currently btree_readpage_end_io_hook() only needs to handle one extent
buffer, as one page maps to exactly one extent buffer.

But with the incoming subpage support, one page can be mapped to
multiple extent buffers, so the current code no longer works.

This refactor allows us to call btrfs_check_extent_buffer() on all
involved extent buffers from btree_readpage_end_io_hook() and other
locations.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 78 ++++++++++++++++++++++++++--------------------
 1 file changed, 44 insertions(+), 34 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 62dbd9bbd381..1ba16951ccaa 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -574,60 +574,37 @@ static int check_tree_block_fsid(struct extent_buffer *eb)
 	return ret;
 }
 
-static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
-				      u64 phy_offset, struct page *page,
-				      u64 start, u64 end, int mirror)
+/* Do basic extent buffer check at read time */
+static int btrfs_check_extent_buffer(struct extent_buffer *eb)
 {
-	u64 found_start;
-	int found_level;
-	struct extent_buffer *eb;
-	struct btrfs_fs_info *fs_info;
+	struct btrfs_fs_info *fs_info = eb->fs_info;
 	u16 csum_size;
-	int ret = 0;
+	u64 found_start;
+	u8 found_level;
 	u8 result[BTRFS_CSUM_SIZE];
-	int reads_done;
-
-	if (!page->private)
-		goto out;
+	int ret = 0;
 
-	eb = (struct extent_buffer *)page->private;
-	fs_info = eb->fs_info;
 	csum_size = btrfs_super_csum_size(fs_info->super_copy);
 
-	/* the pending IO might have been the only thing that kept this buffer
-	 * in memory.  Make sure we have a ref for all this other checks
-	 */
-	atomic_inc(&eb->refs);
-
-	reads_done = atomic_dec_and_test(&eb->io_pages);
-	if (!reads_done)
-		goto err;
-
-	eb->read_mirror = mirror;
-	if (test_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags)) {
-		ret = -EIO;
-		goto err;
-	}
-
 	found_start = btrfs_header_bytenr(eb);
 	if (found_start != eb->start) {
 		btrfs_err_rl(fs_info, "bad tree block start, want %llu have %llu",
 			     eb->start, found_start);
 		ret = -EIO;
-		goto err;
+		goto out;
 	}
 	if (check_tree_block_fsid(eb)) {
 		btrfs_err_rl(fs_info, "bad fsid on block %llu",
 			     eb->start);
 		ret = -EIO;
-		goto err;
+		goto out;
 	}
 	found_level = btrfs_header_level(eb);
 	if (found_level >= BTRFS_MAX_LEVEL) {
 		btrfs_err(fs_info, "bad tree block level %d on %llu",
 			  (int)btrfs_header_level(eb), eb->start);
 		ret = -EIO;
-		goto err;
+		goto out;
 	}
 
 	btrfs_set_buffer_lockdep_class(btrfs_header_owner(eb),
@@ -647,7 +624,7 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
 			      fs_info->sb->s_id, eb->start,
 			      val, found, btrfs_header_level(eb));
 		ret = -EUCLEAN;
-		goto err;
+		goto out;
 	}
 
 	/*
@@ -669,6 +646,40 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
 		btrfs_err(fs_info,
 			  "block=%llu read time tree block corruption detected",
 			  eb->start);
+out:
+	return ret;
+}
+
+static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
+				      u64 phy_offset, struct page *page,
+				      u64 start, u64 end, int mirror)
+{
+	struct extent_buffer *eb;
+	int ret = 0;
+	bool reads_done;
+
+	/* Metadata pages that go through IO should all have private set */
+	ASSERT(PagePrivate(page) && page->private);
+	eb = (struct extent_buffer *)page->private;
+
+	/*
+	 * The pending IO might have been the only thing that kept this buffer
+	 * in memory.  Make sure we have a ref for all these other checks
+	 */
+	atomic_inc(&eb->refs);
+
+	reads_done = atomic_dec_and_test(&eb->io_pages);
+	if (!reads_done)
+		goto err;
+
+	eb->read_mirror = mirror;
+	if (test_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags)) {
+		ret = -EIO;
+		goto err;
+	}
+
+	ret = btrfs_check_extent_buffer(eb);
+
 err:
 	if (reads_done &&
 	    test_and_clear_bit(EXTENT_BUFFER_READAHEAD, &eb->bflags))
@@ -684,7 +695,6 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
 		clear_extent_buffer_uptodate(eb);
 	}
 	free_extent_buffer(eb);
-out:
 	return ret;
 }
 
-- 
2.28.0


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v2 12/19] btrfs: extent_io: only require sector size alignment for page read
  2020-09-15  5:35 [PATCH v2 00/19] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (10 preceding siblings ...)
  2020-09-15  5:35 ` [PATCH v2 11/19] btrfs: extract the extent buffer verification from btree_readpage_end_io_hook() Qu Wenruo
@ 2020-09-15  5:35 ` Qu Wenruo
  2020-09-15  5:35 ` [PATCH v2 13/19] btrfs: make btrfs_readpage_end_io_hook() follow sector size Qu Wenruo
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 58+ messages in thread
From: Qu Wenruo @ 2020-09-15  5:35 UTC (permalink / raw)
  To: linux-btrfs

If we're reading a partial page, btrfs will warn about it, as our
reads/writes are always done in units of sector size, which currently
equals page size.

But for the incoming subpage RO support, data reads are only aligned
to sectorsize, which can be smaller than page size.

Thus change the warning condition to check against sectorsize instead.
The behavior is unchanged for the regular sectorsize == PAGE_SIZE case,
while no error is reported for subpage reads.

Also, pass the proper start/end, taking bv_offset into account, for
check_data_csum() to handle.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 9d4a81997738..f4ab59b3ce3c 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2819,6 +2819,7 @@ static void end_bio_extent_readpage(struct bio *bio)
 		struct page *page = bvec->bv_page;
 		struct inode *inode = page->mapping->host;
 		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+		u32 sectorsize = fs_info->sectorsize;
 		bool data_inode = btrfs_ino(BTRFS_I(inode))
 			!= BTRFS_BTREE_INODE_OBJECTID;
 
@@ -2829,13 +2830,17 @@ static void end_bio_extent_readpage(struct bio *bio)
 		tree = &BTRFS_I(inode)->io_tree;
 		failure_tree = &BTRFS_I(inode)->io_failure_tree;
 
-		/* We always issue full-page reads, but if some block
+		/*
+		 * We always issue full-sector reads, but if some block
 		 * in a page fails to read, blk_update_request() will
 		 * advance bv_offset and adjust bv_len to compensate.
-		 * Print a warning for nonzero offsets, and an error
-		 * if they don't add up to a full page.  */
-		if (bvec->bv_offset || bvec->bv_len != PAGE_SIZE) {
-			if (bvec->bv_offset + bvec->bv_len != PAGE_SIZE)
+		 * Print a warning for unaligned offsets, and an error
+		 * if they don't add up to a full sector.
+		 */
+		if (!IS_ALIGNED(bvec->bv_offset, sectorsize) ||
+		    !IS_ALIGNED(bvec->bv_offset + bvec->bv_len, sectorsize)) {
+			if (!IS_ALIGNED(bvec->bv_offset + bvec->bv_len,
+					sectorsize))
 				btrfs_err(fs_info,
 					"partial page read in btrfs with offset %u and length %u",
 					bvec->bv_offset, bvec->bv_len);
@@ -2845,8 +2850,8 @@ static void end_bio_extent_readpage(struct bio *bio)
 					bvec->bv_offset, bvec->bv_len);
 		}
 
-		start = page_offset(page);
-		end = start + bvec->bv_offset + bvec->bv_len - 1;
+		start = page_offset(page) + bvec->bv_offset;
+		end = start + bvec->bv_len - 1;
 		len = bvec->bv_len;
 
 		mirror = io_bio->mirror_num;
-- 
2.28.0


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v2 13/19] btrfs: make btrfs_readpage_end_io_hook() follow sector size
  2020-09-15  5:35 [PATCH v2 00/19] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (11 preceding siblings ...)
  2020-09-15  5:35 ` [PATCH v2 12/19] btrfs: extent_io: only require sector size alignment for page read Qu Wenruo
@ 2020-09-15  5:35 ` Qu Wenruo
  2020-09-15  5:35 ` [PATCH v2 14/19] btrfs: make btree inode io_tree has its special owner Qu Wenruo
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 58+ messages in thread
From: Qu Wenruo @ 2020-09-15  5:35 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

Currently btrfs_readpage_end_io_hook() just passes the whole page to
check_data_csum(), which is fine since we only support sectorsize ==
PAGE_SIZE.

For subpage RO support, we need to properly honor per-sector checksum
verification, just like we do in the dio read path.

This patch does the csum verification in a for loop, starting with
pg_off == start - page_offset(page) and incrementing by sectorsize for
each iteration.

For the sectorsize == PAGE_SIZE case, pg_off will always be 0, and we
finish after just one iteration.

For subpage, we do the proper per-sector loop.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index bb4442950d2b..544253dfcd0b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2851,9 +2851,12 @@ static int btrfs_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
 				      u64 start, u64 end, int mirror)
 {
 	size_t offset = start - page_offset(page);
+	size_t pg_off;
 	struct inode *inode = page->mapping->host;
 	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
 	struct btrfs_root *root = BTRFS_I(inode)->root;
+	u32 sectorsize = root->fs_info->sectorsize;
+	bool found_err = false;
 
 	if (PageChecked(page)) {
 		ClearPageChecked(page);
@@ -2870,7 +2873,17 @@ static int btrfs_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
 	}
 
 	phy_offset >>= inode->i_sb->s_blocksize_bits;
-	return check_data_csum(inode, io_bio, phy_offset, page, offset);
+	for (pg_off = offset; pg_off < end - page_offset(page);
+	     pg_off += sectorsize, phy_offset++) {
+		int ret;
+
+		ret = check_data_csum(inode, io_bio, phy_offset, page, pg_off);
+		if (ret < 0)
+			found_err = true;
+	}
+	if (found_err)
+		return -EIO;
+	return 0;
 }
 
 /*
-- 
2.28.0


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v2 14/19] btrfs: make btree inode io_tree has its special owner
  2020-09-15  5:35 [PATCH v2 00/19] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (12 preceding siblings ...)
  2020-09-15  5:35 ` [PATCH v2 13/19] btrfs: make btrfs_readpage_end_io_hook() follow sector size Qu Wenruo
@ 2020-09-15  5:35 ` Qu Wenruo
  2020-09-16  9:28   ` Johannes Thumshirn
                     ` (2 more replies)
  2020-09-15  5:35 ` [PATCH v2 15/19] btrfs: don't set extent_io_tree bits for btree inode at endio time Qu Wenruo
                   ` (6 subsequent siblings)
  20 siblings, 3 replies; 58+ messages in thread
From: Qu Wenruo @ 2020-09-15  5:35 UTC (permalink / raw)
  To: linux-btrfs

The btree inode's extent io tree is pretty special compared to those of
all other inodes: although it has a btrfs inode, it doesn't have the
track_uptodate bit set at all.

This means many things, like extent locking, don't even need to be
applied to the btree io tree.

Since it's so special, add a new owner value for it to make debugging a
little easier.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c        | 2 +-
 fs/btrfs/extent-io-tree.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 1ba16951ccaa..82a841bd0702 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2126,7 +2126,7 @@ static void btrfs_init_btree_inode(struct btrfs_fs_info *fs_info)
 
 	RB_CLEAR_NODE(&BTRFS_I(inode)->rb_node);
 	extent_io_tree_init(fs_info, &BTRFS_I(inode)->io_tree,
-			    IO_TREE_INODE_IO, inode);
+			    IO_TREE_BTREE_IO, inode);
 	BTRFS_I(inode)->io_tree.track_uptodate = false;
 	extent_map_tree_init(&BTRFS_I(inode)->extent_tree);
 
diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
index 219a09a2b734..21d128383bfd 100644
--- a/fs/btrfs/extent-io-tree.h
+++ b/fs/btrfs/extent-io-tree.h
@@ -40,6 +40,7 @@ struct io_failure_record;
 enum {
 	IO_TREE_FS_PINNED_EXTENTS,
 	IO_TREE_FS_EXCLUDED_EXTENTS,
+	IO_TREE_BTREE_IO,
 	IO_TREE_INODE_IO,
 	IO_TREE_INODE_IO_FAILURE,
 	IO_TREE_RELOC_BLOCKS,
-- 
2.28.0


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v2 15/19] btrfs: don't set extent_io_tree bits for btree inode at endio time
  2020-09-15  5:35 [PATCH v2 00/19] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (13 preceding siblings ...)
  2020-09-15  5:35 ` [PATCH v2 14/19] btrfs: make btree inode io_tree has its special owner Qu Wenruo
@ 2020-09-15  5:35 ` Qu Wenruo
  2020-09-15  5:35 ` [PATCH v2 16/19] btrfs: use extent_io_tree to handle subpage extent buffer allocation Qu Wenruo
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 58+ messages in thread
From: Qu Wenruo @ 2020-09-15  5:35 UTC (permalink / raw)
  To: linux-btrfs

For the btree inode, we don't use ordered extents, nor do we really use
its io tree for locking and uptodate bits.

But we still call lock_extent_bits() during endio callbacks.

This is fine and causes no problems in the current code base, but since
we're going to completely rely on the extent_io_tree to do all the
sector locking/uptodate/dirty tracking, it's better to decouple the
btree inode from endio callbacks.

There is only one caller who explicitly locks and unlocks the btree
inode io tree: verify_parent_transid().

But in verify_parent_transid(), its callers have already ensured the
pages are read out, either through a manual read_extent_buffer_pages()
call, or through an extent_buffer_uptodate() call (which checks the
UPTODATE bit of an extent buffer, only set after the pages are read).

Thus that extra locking makes no sense and can be removed completely.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c        |  5 -----
 fs/btrfs/extent-io-tree.h |  7 +++++++
 fs/btrfs/extent_io.c      | 19 ++++++++++++++-----
 fs/btrfs/ordered-data.c   |  8 ++++++++
 fs/btrfs/qgroup.c         |  4 ++++
 5 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 82a841bd0702..b526adf20f3e 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -286,7 +286,6 @@ static int verify_parent_transid(struct extent_io_tree *io_tree,
 				 struct extent_buffer *eb, u64 parent_transid,
 				 int atomic)
 {
-	struct extent_state *cached_state = NULL;
 	int ret;
 	bool need_lock = (current->journal_info == BTRFS_SEND_TRANS_STUB);
 
@@ -301,8 +300,6 @@ static int verify_parent_transid(struct extent_io_tree *io_tree,
 		btrfs_set_lock_blocking_read(eb);
 	}
 
-	lock_extent_bits(io_tree, eb->start, eb->start + eb->len - 1,
-			 &cached_state);
 	if (extent_buffer_uptodate(eb) &&
 	    btrfs_header_generation(eb) == parent_transid) {
 		ret = 0;
@@ -325,8 +322,6 @@ static int verify_parent_transid(struct extent_io_tree *io_tree,
 	if (!extent_buffer_under_io(eb))
 		clear_extent_buffer_uptodate(eb);
 out:
-	unlock_extent_cached(io_tree, eb->start, eb->start + eb->len - 1,
-			     &cached_state);
 	if (need_lock)
 		btrfs_tree_read_unlock_blocking(eb);
 	return ret;
diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
index 21d128383bfd..133cae8a88a6 100644
--- a/fs/btrfs/extent-io-tree.h
+++ b/fs/btrfs/extent-io-tree.h
@@ -56,6 +56,13 @@ struct extent_io_tree {
 	struct btrfs_fs_info *fs_info;
 	void *private_data;
 	u64 dirty_bytes;
+
+	/*
+	 * Whether the tree tracks things like locking and uptodate at endio time.
+	 *
+	 * File inodes have it set to true. Other inodes (including the btree
+	 * inode) set it to false.
+	 */
 	bool track_uptodate;
 
 	/* Who owns this io tree, should be one of IO_TREE_* */
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index f4ab59b3ce3c..16fe9f4313a1 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2782,9 +2782,15 @@ endio_readpage_release_extent(struct extent_io_tree *tree, u64 start, u64 len,
 	struct extent_state *cached = NULL;
 	u64 end = start + len - 1;
 
-	if (uptodate && tree->track_uptodate)
+	/*
+	 * Only update the UPTODATE and LOCK bits for regular inodes.
+	 * Btree io tree (without the track_uptodate bit) handles its own bits
+	 * manually.
+	 */
+	if (uptodate && tree->track_uptodate) {
 		set_extent_uptodate(tree, start, end, &cached, GFP_ATOMIC);
-	unlock_extent_cached_atomic(tree, start, end, &cached);
+		unlock_extent_cached_atomic(tree, start, end, &cached);
+	}
 }
 
 /*
@@ -4436,10 +4442,13 @@ int extent_invalidatepage(struct extent_io_tree *tree,
 	if (start > end)
 		return 0;
 
-	lock_extent_bits(tree, start, end, &cached_state);
+	if (tree->track_uptodate)
+		lock_extent_bits(tree, start, end, &cached_state);
 	wait_on_page_writeback(page);
-	clear_extent_bit(tree, start, end, EXTENT_LOCKED | EXTENT_DELALLOC |
-			 EXTENT_DO_ACCOUNTING, 1, 1, &cached_state);
+	if (tree->track_uptodate)
+		clear_extent_bit(tree, start, end, EXTENT_LOCKED |
+			EXTENT_DELALLOC | EXTENT_DO_ACCOUNTING, 1, 1,
+			&cached_state);
 	return 0;
 }
 
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index ebac13389e7e..0bb8ce1ab51f 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -865,6 +865,14 @@ void btrfs_lock_and_flush_ordered_range(struct btrfs_inode *inode, u64 start,
 	struct extent_state *cache = NULL;
 	struct extent_state **cachedp = &cache;
 
+	/*
+	 * The btree inode, which doesn't have the track_uptodate bit, doesn't
+	 * use ordered extents at all.
+	 * Exit directly.
+	 */
+	if (!inode->io_tree.track_uptodate)
+		return;
+
 	if (cached_state)
 		cachedp = cached_state;
 
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index c0f350c3a0cf..aeb04ccafaa8 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -3924,6 +3924,10 @@ void btrfs_qgroup_check_reserved_leak(struct btrfs_inode *inode)
 	struct ulist_iterator iter;
 	int ret;
 
+	/* This is btree inode, no need to check */
+	if (!inode->io_tree.track_uptodate)
+		return;
+
 	extent_changeset_init(&changeset);
 	ret = clear_record_extent_bits(&inode->io_tree, 0, (u64)-1,
 			EXTENT_QGROUP_RESERVED, &changeset);
-- 
2.28.0


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v2 16/19] btrfs: use extent_io_tree to handle subpage extent buffer allocation
  2020-09-15  5:35 [PATCH v2 00/19] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (14 preceding siblings ...)
  2020-09-15  5:35 ` [PATCH v2 15/19] btrfs: don't set extent_io_tree bits for btree inode at endio time Qu Wenruo
@ 2020-09-15  5:35 ` Qu Wenruo
  2020-09-15  5:35 ` [PATCH v2 17/19] btrfs: implement subpage metadata read and its endio function Qu Wenruo
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 58+ messages in thread
From: Qu Wenruo @ 2020-09-15  5:35 UTC (permalink / raw)
  To: linux-btrfs

Currently btrfs uses page::private as an indicator of who owns the
extent buffer, this method won't really work on subpage support, as one
page can contain several tree blocks (up to 16 for 4K node size and 64K
page size).

Instead, here we utilize btree extent io tree to handle them.
Now EXTENT_NEW means we have an extent buffer for the range.

This will affects the following functions:
- alloc_extent_buffer()
  Now for subpage we never use page->private to grab an existing eb.
  Instead, we rely on extra safenet in alloc_extent_buffer() to detect two
  callers on the same eb.

- btrfs_release_extent_buffer_pages()
  Now for subpage, we clear the EXTENT_NEW bit first, then check if the
  remaining range in the page still has the EXTENT_NEW bit set.
  If not, we clear the private bit for the page.

- attach_extent_buffer_page()
  Now we set the EXTENT_NEW bit for the new extent buffer to be attached,
  and attach page private with NULL as page::private.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/btrfs_inode.h | 12 +++++++
 fs/btrfs/disk-io.c     |  7 ++++
 fs/btrfs/extent_io.c   | 79 ++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 96 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index c47b6c6fea9f..cff818e0c406 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -217,6 +217,18 @@ static inline struct btrfs_inode *BTRFS_I(const struct inode *inode)
 	return container_of(inode, struct btrfs_inode, vfs_inode);
 }
 
+static inline struct btrfs_fs_info *page_to_fs_info(struct page *page)
+{
+	ASSERT(page->mapping);
+	return BTRFS_I(page->mapping->host)->root->fs_info;
+}
+
+static inline struct extent_io_tree
+*info_to_btree_io_tree(struct btrfs_fs_info *fs_info)
+{
+	return &BTRFS_I(fs_info->btree_inode)->io_tree;
+}
+
 static inline unsigned long btrfs_inode_hash(u64 objectid,
 					     const struct btrfs_root *root)
 {
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b526adf20f3e..2ef35eb7a6e1 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2120,6 +2120,13 @@ static void btrfs_init_btree_inode(struct btrfs_fs_info *fs_info)
 	inode->i_mapping->a_ops = &btree_aops;
 
 	RB_CLEAR_NODE(&BTRFS_I(inode)->rb_node);
+	/*
+	 * This extent io tree is subpage metadata specific.
+	 *
+	 * It uses the following bits to represent new meaning:
+	 * - EXTENT_NEW:	Has extent buffer allocated
+	 * - EXTENT_UPTODATE:	Has the latest metadata read from disk
+	 */
 	extent_io_tree_init(fs_info, &BTRFS_I(inode)->io_tree,
 			    IO_TREE_BTREE_IO, inode);
 	BTRFS_I(inode)->io_tree.track_uptodate = false;
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 16fe9f4313a1..2af6786e6ab4 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3116,6 +3116,20 @@ static void attach_extent_buffer_page(struct extent_buffer *eb,
 	if (page->mapping)
 		assert_spin_locked(&page->mapping->private_lock);
 
+	if (eb->fs_info->sectorsize < PAGE_SIZE && page->mapping) {
+		struct extent_io_tree *io_tree =
+			info_to_btree_io_tree(eb->fs_info);
+
+		if (!PagePrivate(page))
+			attach_page_private(page, NULL);
+
+	/* EXTENT_NEW indicates that we have an extent buffer */
+		set_extent_bit(io_tree, eb->start, eb->start + eb->len - 1,
+				EXTENT_NEW, NULL, NULL, GFP_ATOMIC);
+		eb->pages[0] = page;
+		return;
+	}
+
 	if (!PagePrivate(page))
 		attach_page_private(page, eb);
 	else
@@ -4943,6 +4957,37 @@ int extent_buffer_under_io(const struct extent_buffer *eb)
 		test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
 }
 
+static void detach_extent_buffer_subpage(struct extent_buffer *eb)
+{
+	struct btrfs_fs_info *fs_info = eb->fs_info;
+	struct extent_io_tree *io_tree = info_to_btree_io_tree(fs_info);
+	struct page *page = eb->pages[0];
+	bool mapped = !test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags);
+	int ret;
+
+	if (!page)
+		return;
+
+	if (mapped)
+		spin_lock(&page->mapping->private_lock);
+
+	/*
+	 * Clear the EXTENT_NEW bit from io tree, to indicate that there is
+	 * no longer an extent buffer in the range.
+	 */
+	__clear_extent_bit(io_tree, eb->start, eb->start + eb->len - 1,
+			   EXTENT_NEW, 0, 0, NULL, GFP_ATOMIC, NULL);
+
+	/* Test if we still have other extent buffer in the page range */
+	ret = test_range_bit(io_tree, round_down(eb->start, PAGE_SIZE),
+			     round_down(eb->start, PAGE_SIZE) + PAGE_SIZE - 1,
+			     EXTENT_NEW, 0, NULL);
+	if (!ret)
+		detach_page_private(eb->pages[0]);
+	if (mapped)
+		spin_unlock(&page->mapping->private_lock);
+}
+
 /*
  * Release all pages attached to the extent buffer.
  */
@@ -4954,6 +4999,9 @@ static void btrfs_release_extent_buffer_pages(struct extent_buffer *eb)
 
 	BUG_ON(extent_buffer_under_io(eb));
 
+	if (eb->fs_info->sectorsize < PAGE_SIZE)
+		return detach_extent_buffer_subpage(eb);
+
 	num_pages = num_extent_pages(eb);
 	for (i = 0; i < num_pages; i++) {
 		struct page *page = eb->pages[i];
@@ -5248,6 +5296,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 	struct extent_buffer *exists = NULL;
 	struct page *p;
 	struct address_space *mapping = fs_info->btree_inode->i_mapping;
+	bool subpage = (fs_info->sectorsize < PAGE_SIZE);
 	int uptodate = 1;
 	int ret;
 
@@ -5280,7 +5329,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		}
 
 		spin_lock(&mapping->private_lock);
-		if (PagePrivate(p)) {
+		if (PagePrivate(p) && !subpage) {
 			/*
 			 * We could have already allocated an eb for this page
 			 * and attached one so lets see if we can get a ref on
@@ -5321,8 +5370,21 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		 * we could crash.
 		 */
 	}
-	if (uptodate)
+	if (uptodate && !subpage)
 		set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
+	/*
+	 * For subpage, we must check the extent_io_tree to see if the eb is
+	 * really UPTODATE, as the page bit is no longer reliable now that we
+	 * can do subpage reads.
+	 */
+	if (subpage) {
+		struct extent_io_tree *io_tree = info_to_btree_io_tree(fs_info);
+
+		ret = test_range_bit(io_tree, eb->start, eb->start + eb->len - 1,
+				     EXTENT_UPTODATE, 1, NULL);
+		if (ret)
+			set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
+	}
 again:
 	ret = radix_tree_preload(GFP_NOFS);
 	if (ret) {
@@ -5361,6 +5423,19 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		if (eb->pages[i])
 			unlock_page(eb->pages[i]);
 	}
+	/*
+	 * For subpage case, btrfs_release_extent_buffer() will clear the
+	 * EXTENT_NEW bit if there is a page, and the EXTENT_NEW bit indicates
+	 * that we have an extent buffer in that range.
+	 *
+	 * Since we're here because we hit a race with another caller, who
+	 * succeeded in inserting the eb, we shouldn't clear that EXTENT_NEW
+	 * bit. So here we clean up the page manually.
+	 */
+	if (subpage) {
+		put_page(eb->pages[0]);
+		eb->pages[0] = NULL;
+	}
 
 	btrfs_release_extent_buffer(eb);
 	return exists;
-- 
2.28.0


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v2 17/19] btrfs: implement subpage metadata read and its endio function
  2020-09-15  5:35 [PATCH v2 00/19] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (15 preceding siblings ...)
  2020-09-15  5:35 ` [PATCH v2 16/19] btrfs: use extent_io_tree to handle subpage extent buffer allocation Qu Wenruo
@ 2020-09-15  5:35 ` Qu Wenruo
  2020-09-16  8:47   ` kernel test robot
  2020-09-15  5:35 ` [PATCH v2 18/19] btrfs: implement btree_readpage() and try_release_extent_buffer() for subpage Qu Wenruo
                   ` (3 subsequent siblings)
  20 siblings, 1 reply; 58+ messages in thread
From: Qu Wenruo @ 2020-09-15  5:35 UTC (permalink / raw)
  To: linux-btrfs

For subpage metadata read, since we rely completely on the io tree
rather than page bits, the read submission and endio functions differ
from the regular page size path.

For submission part:
- Do extent locking/waiting
  Instead of page locking/waiting, since we don't rely on page bits anymore.

- Submit extent page directly
  To simplify the process, as all metadata reads are always contained
  in one page.

For endio part:
- Do extent locking/waiting

This behavior has a small problem: extent locking/waiting can allocate
memory, thus both can fail.

Currently we're using GFP_ATOMIC, but we may still hit ENOMEM someday.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c   |  89 ++++++++++++++++++++++++++++++++++++++
 fs/btrfs/extent_io.c | 101 ++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 189 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 2ef35eb7a6e1..1de5a0fef2f5 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -645,6 +645,92 @@ static int btrfs_check_extent_buffer(struct extent_buffer *eb)
 	return ret;
 }
 
+static int btree_read_subpage_endio_hook(struct page *page, u64 start, u64 end,
+					 int mirror)
+{
+	struct btrfs_fs_info *fs_info = page_to_fs_info(page);
+	struct extent_io_tree *io_tree = info_to_btree_io_tree(fs_info);
+	struct extent_buffer *eb;
+	int reads_done;
+	int ret = 0;
+
+	if (!IS_ALIGNED(start, fs_info->sectorsize) ||
+	    !IS_ALIGNED(end - start + 1, fs_info->sectorsize) ||
+	    !IS_ALIGNED(end - start + 1, fs_info->nodesize)) {
+		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
+		btrfs_err(fs_info, "invalid tree read bytenr");
+		return -EUCLEAN;
+	}
+
+	/*
+	 * We don't allow bio merge for subpage metadata read, so we should
+	 * only get one eb for each endio hook.
+	 */
+	ASSERT(end == start + fs_info->nodesize - 1);
+	ASSERT(PagePrivate(page));
+
+	rcu_read_lock();
+	eb = radix_tree_lookup(&fs_info->buffer_radix,
+			       start / fs_info->sectorsize);
+	rcu_read_unlock();
+
+	/*
+	 * When we are reading one tree block, eb must have been
+	 * inserted into the radix tree. If not something is wrong.
+	 */
+	if (!eb) {
+		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
+		btrfs_err(fs_info,
+			"can't find extent buffer for bytenr %llu",
+			start);
+		return -EUCLEAN;
+	}
+	/*
+	 * The pending IO might have been the only thing that kept
+	 * this buffer in memory.  Make sure we have a ref for all
+	 * this other checks
+	 */
+	atomic_inc(&eb->refs);
+
+	reads_done = atomic_dec_and_test(&eb->io_pages);
+	/* Subpage metadata read must finish within one page read */
+	ASSERT(reads_done);
+
+	eb->read_mirror = mirror;
+	if (test_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags)) {
+		ret = -EIO;
+		goto err;
+	}
+	ret = btrfs_check_extent_buffer(eb);
+	if (ret < 0)
+		goto err;
+
+	if (test_and_clear_bit(EXTENT_BUFFER_READAHEAD, &eb->bflags))
+		btree_readahead_hook(eb, ret);
+
+	set_extent_buffer_uptodate(eb);
+
+	free_extent_buffer(eb);
+
+	/*
+	 * Since we don't use PageLocked() but extent_io_tree lock to lock the
+	 * range, we need to unlock the range here.
+	 */
+	unlock_extent(io_tree, start, end);
+	return ret;
+err:
+	/*
+	 * Our IO error hook is going to decrement io_pages
+	 * again, so we have to make sure it has something
+	 * to decrement.
+	 */
+	atomic_inc(&eb->io_pages);
+	clear_extent_buffer_uptodate(eb);
+	free_extent_buffer(eb);
+	unlock_extent(io_tree, start, end);
+	return ret;
+}
+
 static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
 				      u64 phy_offset, struct page *page,
 				      u64 start, u64 end, int mirror)
@@ -653,6 +739,9 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
 	int ret = 0;
 	bool reads_done;
 
+	if (page_to_fs_info(page)->sectorsize < PAGE_SIZE)
+		return btree_read_subpage_endio_hook(page, start, end, mirror);
+
 	/* Metadata pages that goes through IO should all have private set */
 	ASSERT(PagePrivate(page) && page->private);
 	eb = (struct extent_buffer *)page->private;
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 2af6786e6ab4..75437a55a986 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2923,7 +2923,13 @@ static void end_bio_extent_readpage(struct bio *bio)
 			ClearPageUptodate(page);
 			SetPageError(page);
 		}
-		unlock_page(page);
+
+		/*
+		 * For subpage metadata reads, the page is not locked; we
+		 * rely entirely on the EXTENT_LOCKED bit instead.
+		 */
+		if (!(fs_info->sectorsize < PAGE_SIZE && !data_inode))
+			unlock_page(page);
 		offset += len;
 
 		if (unlikely(!uptodate)) {
@@ -3064,6 +3070,14 @@ static int submit_extent_page(unsigned int opf,
 		else
 			contig = bio_end_sector(bio) == sector;
 
+		/*
+		 * For subpage metadata read, never merge requests, so that
+		 * the endio hook gets called for each metadata read.
+		 */
+		if (page_to_fs_info(page)->sectorsize < PAGE_SIZE &&
+		    tree->owner == IO_TREE_BTREE_IO)
+			ASSERT(force_bio_submit);
+
 		ASSERT(tree->ops);
 		if (btrfs_bio_fits_in_stripe(page, page_size, bio, bio_flags))
 			can_merge = false;
@@ -5593,6 +5607,15 @@ void clear_extent_buffer_uptodate(struct extent_buffer *eb)
 	int num_pages;
 
 	clear_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
+	/* For subpage eb, we don't set page bits, but extent_io_tree bits */
+	if (eb->fs_info->sectorsize < PAGE_SIZE) {
+		struct extent_io_tree *io_tree =
+			info_to_btree_io_tree(eb->fs_info);
+
+		clear_extent_bits(io_tree, eb->start, eb->start + eb->len - 1,
+				  EXTENT_UPTODATE);
+		return;
+	}
 	num_pages = num_extent_pages(eb);
 	for (i = 0; i < num_pages; i++) {
 		page = eb->pages[i];
@@ -5608,6 +5631,20 @@ void set_extent_buffer_uptodate(struct extent_buffer *eb)
 	int num_pages;
 
 	set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
+
+	/*
+	 * For subpage size, we ignore page bits, but need to set io tree range
+	 * uptodate
+	 */
+	if (eb->fs_info->sectorsize < PAGE_SIZE) {
+		struct extent_io_tree *io_tree =
+			info_to_btree_io_tree(eb->fs_info);
+
+		set_extent_uptodate(io_tree, eb->start, eb->start + eb->len - 1,
+				    NULL, GFP_ATOMIC);
+		return;
+	}
+
 	num_pages = num_extent_pages(eb);
 	for (i = 0; i < num_pages; i++) {
 		page = eb->pages[i];
@@ -5615,6 +5652,65 @@ void set_extent_buffer_uptodate(struct extent_buffer *eb)
 	}
 }
 
+static int read_extent_buffer_subpage(struct extent_buffer *eb, int wait,
+				      int mirror_num)
+{
+	struct btrfs_fs_info *fs_info = eb->fs_info;
+	struct extent_io_tree *io_tree = info_to_btree_io_tree(fs_info);
+	struct page *page = eb->pages[0];
+	struct bio *bio = NULL;
+	int ret = 0;
+
+	/* We don't lock the page, only the extent_io_tree for the btree inode */
+	if (wait == WAIT_NONE) {
+		ret = try_lock_extent(io_tree, eb->start,
+				      eb->start + eb->len - 1);
+		if (ret == 0)
+			return ret;
+	} else {
+		ret = lock_extent(io_tree, eb->start, eb->start + eb->len - 1);
+		if (ret < 0)
+			return ret;
+	}
+
+	ret = 0;
+	atomic_set(&eb->io_pages, 1);
+	if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags) ||
+	    test_range_bit(io_tree, eb->start, eb->start + eb->len - 1,
+			   EXTENT_UPTODATE, 1, NULL)) {
+		unlock_extent(io_tree, eb->start, eb->start + eb->len - 1);
+		return ret;
+	}
+
+	ret = submit_extent_page(REQ_OP_READ | REQ_META, NULL, page, eb->start,
+				 eb->len, eb->start - page_offset(page), &bio,
+				 end_bio_extent_readpage, mirror_num, 0, 0,
+				 true);
+	if (ret) {
+		/*
+		 * In the endio function, if we hit something wrong we will
+		 * increase io_pages, so here we need to decrease it on
+		 * the error path.
+		 */
+		atomic_dec(&eb->io_pages);
+	}
+	if (bio) {
+		int tmp;
+
+		tmp = submit_one_bio(bio, mirror_num, 0);
+		if (tmp < 0)
+			return tmp;
+	}
+	if (ret || wait != WAIT_COMPLETE)
+		return ret;
+
+	wait_extent_bit(io_tree, eb->start, eb->start + eb->len - 1,
+			EXTENT_LOCKED);
+	if (!test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags))
+		ret = -EIO;
+	return ret;
+}
+
 int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num)
 {
 	int i;
@@ -5631,6 +5727,9 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num)
 	if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags))
 		return 0;
 
+	if (eb->fs_info->sectorsize < PAGE_SIZE)
+		return read_extent_buffer_subpage(eb, wait, mirror_num);
+
 	num_pages = num_extent_pages(eb);
 	for (i = 0; i < num_pages; i++) {
 		page = eb->pages[i];
-- 
2.28.0


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v2 18/19] btrfs: implement btree_readpage() and try_release_extent_buffer() for subpage
  2020-09-15  5:35 [PATCH v2 00/19] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (16 preceding siblings ...)
  2020-09-15  5:35 ` [PATCH v2 17/19] btrfs: implement subpage metadata read and its endio function Qu Wenruo
@ 2020-09-15  5:35 ` Qu Wenruo
  2020-09-15  5:35 ` [PATCH v2 19/19] btrfs: allow RO mount of 4K sector size fs on 64K page system Qu Wenruo
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 58+ messages in thread
From: Qu Wenruo @ 2020-09-15  5:35 UTC (permalink / raw)
  To: linux-btrfs

For btree_readpage() we simply reject the call, as btree reads should
only be triggered by btrfs itself; VFS shouldn't need to bother.

For try_release_extent_buffer(), we just iterate through all the range
with EXTENT_NEW set, and try freeing each extent buffer.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c   |  6 +++++
 fs/btrfs/extent_io.c | 62 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 68 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 1de5a0fef2f5..769ffb191683 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1048,6 +1048,12 @@ static int btree_writepages(struct address_space *mapping,
 
 static int btree_readpage(struct file *file, struct page *page)
 {
+	/*
+	 * For subpage, we don't support VFS calling btree_readpage()
+	 * directly.
+	 */
+	if (page_to_fs_info(page)->sectorsize < PAGE_SIZE)
+		return -ENOTTY;
 	return extent_read_full_page(page, btree_get_extent, 0);
 }
 
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 75437a55a986..4eceae7183c9 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -6324,10 +6324,72 @@ void memmove_extent_buffer(const struct extent_buffer *dst,
 	}
 }
 
+static int try_release_subpage_eb(struct page *page)
+{
+	struct btrfs_fs_info *fs_info = page_to_fs_info(page);
+	struct extent_io_tree *io_tree = info_to_btree_io_tree(fs_info);
+	u64 cur = page_offset(page);
+	u64 end = page_offset(page) + PAGE_SIZE - 1;
+	int ret;
+
+	while (cur <= end) {
+		struct extent_buffer *eb;
+		u64 found_start;
+		u64 found_end;
+
+		spin_lock(&page->mapping->private_lock);
+		ret = find_first_extent_bit(io_tree, cur, &found_start,
+				&found_end, EXTENT_NEW, NULL);
+		/* Not found, or found range beyond end */
+		if (ret > 0 || found_start > end) {
+			spin_unlock(&page->mapping->private_lock);
+			goto out;
+		}
+
+		/* found_start can be smaller than cur */
+		cur = max(cur, found_start);
+
+		/*
+		 * Here we hold private_lock, so it is safe to look up the
+		 * radix tree.
+		 * We can't call find_extent_buffer(), which would increase
+		 * eb->refs.
+		 */
+		eb = radix_tree_lookup(&fs_info->buffer_radix,
+				cur / fs_info->sectorsize);
+		ASSERT(eb);
+		cur = eb->start + eb->len;
+
+		spin_lock(&eb->refs_lock);
+		if (atomic_read(&eb->refs) != 1 || extent_buffer_under_io(eb) ||
+		    !test_and_clear_bit(EXTENT_BUFFER_TREE_REF, &eb->bflags)) {
+			spin_unlock(&eb->refs_lock);
+			spin_unlock(&page->mapping->private_lock);
+			continue;
+		}
+		spin_unlock(&page->mapping->private_lock);
+		/*
+		 * Here we don't care about the return value; we will always check
+		 * the EXTENT_NEW bits at the end.
+		 */
+		release_extent_buffer(eb);
+	}
+out:
+	/* Finally check if there is any EXTENT_NEW bit in the range */
+	if (test_range_bit(io_tree, page_offset(page), end, EXTENT_NEW, 0,
+			   NULL))
+		ret = 0;
+	else
+		ret = 1;
+	return ret;
+}
+
 int try_release_extent_buffer(struct page *page)
 {
 	struct extent_buffer *eb;
 
+	if (page_to_fs_info(page)->sectorsize < PAGE_SIZE)
+		return try_release_subpage_eb(page);
 	/*
 	 * We need to make sure nobody is attaching this page to an eb right
 	 * now.
-- 
2.28.0


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v2 19/19] btrfs: allow RO mount of 4K sector size fs on 64K page system
  2020-09-15  5:35 [PATCH v2 00/19] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (17 preceding siblings ...)
  2020-09-15  5:35 ` [PATCH v2 18/19] btrfs: implement btree_readpage() and try_release_extent_buffer() for subpage Qu Wenruo
@ 2020-09-15  5:35 ` Qu Wenruo
  2020-09-16  1:35 ` [PATCH v2 00/19] btrfs: add read-only support for subpage sector size Qu Wenruo
  2020-09-16 16:18 ` Neal Gompa
  20 siblings, 0 replies; 58+ messages in thread
From: Qu Wenruo @ 2020-09-15  5:35 UTC (permalink / raw)
  To: linux-btrfs

This adds basic read-only (RO) mount support for 4K sector size on 64K
page systems.

Currently we only plan to support 4K and 64K page sizes.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 24 +++++++++++++++++++++---
 fs/btrfs/super.c   |  7 +++++++
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 769ffb191683..75bbe879ed18 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2538,13 +2538,21 @@ static int validate_super(struct btrfs_fs_info *fs_info,
 		btrfs_err(fs_info, "invalid sectorsize %llu", sectorsize);
 		ret = -EINVAL;
 	}
-	/* Only PAGE SIZE is supported yet */
-	if (sectorsize != PAGE_SIZE) {
+
+	/*
+	 * For 4K page size, we only support 4K sector size.
+	 * For 64K page size, we support RW for 64K sector size, and RO for
+	 * 4K sector size.
+	 */
+	if ((PAGE_SIZE == SZ_4K && sectorsize != PAGE_SIZE) ||
+	    (PAGE_SIZE == SZ_64K && (sectorsize != SZ_4K &&
+				     sectorsize != SZ_64K))) {
 		btrfs_err(fs_info,
-			"sectorsize %llu not supported yet, only support %lu",
+			"sectorsize %llu not supported yet for page size %lu",
 			sectorsize, PAGE_SIZE);
 		ret = -EINVAL;
 	}
+
 	if (!is_power_of_2(nodesize) || nodesize < sectorsize ||
 	    nodesize > BTRFS_MAX_METADATA_BLOCKSIZE) {
 		btrfs_err(fs_info, "invalid nodesize %llu", nodesize);
@@ -3192,6 +3200,16 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 		goto fail_alloc;
 	}
 
+	/* For 4K sector size support, it's only read-only yet */
+	if (PAGE_SIZE == SZ_64K && sectorsize == SZ_4K) {
+		if (!sb_rdonly(sb) || btrfs_super_log_root(disk_super)) {
+			btrfs_err(fs_info,
+				"subpage sector size only support RO yet");
+			err = -EINVAL;
+			goto fail_alloc;
+		}
+	}
+
 	ret = btrfs_init_workqueues(fs_info, fs_devices);
 	if (ret) {
 		err = ret;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 25967ecaaf0a..edc731780d64 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1922,6 +1922,13 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 			ret = -EINVAL;
 			goto restore;
 		}
+		if (fs_info->sectorsize < PAGE_SIZE) {
+			btrfs_warn(fs_info,
+	"read-write mount is not yet allowed for sector size %u page size %lu",
+				   fs_info->sectorsize, PAGE_SIZE);
+			ret = -EINVAL;
+			goto restore;
+		}
 
 		ret = btrfs_cleanup_fs_roots(fs_info);
 		if (ret)
-- 
2.28.0


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 05/19] btrfs: make btrfs_fs_info::buffer_radix to take sector size devided values
  2020-09-15  5:35 ` [PATCH v2 05/19] btrfs: make btrfs_fs_info::buffer_radix to take sector size devided values Qu Wenruo
@ 2020-09-15  8:27   ` Johannes Thumshirn
  2020-09-15 10:04     ` Qu Wenruo
  2020-09-15 17:40   ` kernel test robot
  1 sibling, 1 reply; 58+ messages in thread
From: Johannes Thumshirn @ 2020-09-15  8:27 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: Nikolay Borisov

On 15/09/2020 07:36, Qu Wenruo wrote:
>  	eb = radix_tree_lookup(&fs_info->buffer_radix,
> -			       start >> PAGE_SHIFT);
> +			       start / fs_info->sectorsize);


These should be div_u64(start, fs_info->sectorsize) I think.
At least 'start' is a u64.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 03/19] btrfs: calculate inline extent buffer page size based on page size
  2020-09-15  5:35 ` [PATCH v2 03/19] btrfs: calculate inline extent buffer page size based on page size Qu Wenruo
@ 2020-09-15  8:35   ` Nikolay Borisov
  2020-09-15 10:05     ` Qu Wenruo
  2020-09-15  8:40   ` Johannes Thumshirn
  1 sibling, 1 reply; 58+ messages in thread
From: Nikolay Borisov @ 2020-09-15  8:35 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 15.09.20 г. 8:35 ч., Qu Wenruo wrote:
> Btrfs only support 64K as max node size, thus for 4K page system, we
> would have at most 16 pages for one extent buffer.
> 
> For a system using 64K page size, we would really have just one
> single page.
> 
> While we always use 16 pages for extent_buffer::pages[], this means for
> systems using 64K pages, we are wasting memory for the 15 pages which
> will never be utilized.
> 
> So this patch will change how the extent_buffer::pages[] array size is
> calclulated, now it will be calculated using
> BTRFS_MAX_METADATA_BLOCKSIZE and PAGE_SIZE.
> 
> For systems using 4K page size, it will stay 16 pages.
> For systems using 64K page size, it will be just 1 page.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

<snip>

> diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
> index 00a88f2eb5ab..d511890702cc 100644
> --- a/fs/btrfs/extent_io.h
> +++ b/fs/btrfs/extent_io.h
> @@ -85,9 +85,11 @@ struct extent_io_ops {
>  				    int mirror);
>  };
>  
> -
> -#define INLINE_EXTENT_BUFFER_PAGES 16
> -#define MAX_INLINE_EXTENT_BUFFER_SIZE (INLINE_EXTENT_BUFFER_PAGES * PAGE_SIZE)
> +/*
> + * The SZ_64K is BTRFS_MAX_METADATA_BLOCKSIZE, here just to avoid circle
> + * including "ctree.h".
> + */
> +#define INLINE_EXTENT_BUFFER_PAGES (SZ_64K / PAGE_SIZE)

nit: Instead of doing this it would be better if this define is simply
moved next to BTRFS_MAX_METADATA_BLOCKSIZE so that every define relating
to eb's are clustered together.

>  struct extent_buffer {
>  	u64 start;
>  	unsigned long len;
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 04/19] btrfs: remove the open-code to read disk-key
  2020-09-15  5:35 ` [PATCH v2 04/19] btrfs: remove the open-code to read disk-key Qu Wenruo
@ 2020-09-15  8:36   ` Nikolay Borisov
  2020-09-15  8:40   ` Johannes Thumshirn
  2020-09-16 16:01   ` David Sterba
  2 siblings, 0 replies; 58+ messages in thread
From: Nikolay Borisov @ 2020-09-15  8:36 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 15.09.20 г. 8:35 ч., Qu Wenruo wrote:
> generic_bin_search() distinguishes between reading a key which doesn't
> cross a page and one which does. However this distinction is not
> necessary since read_extent_buffer handles both cases transparently.
> 
> Just use read_extent_buffer to streamline the code.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 06/19] btrfs: don't allow tree block to cross page boundary for subpage support
  2020-09-15  5:35 ` [PATCH v2 06/19] btrfs: don't allow tree block to cross page boundary for subpage support Qu Wenruo
@ 2020-09-15  8:37   ` Nikolay Borisov
  2020-09-15 10:06     ` Qu Wenruo
  2020-09-15  8:44   ` Johannes Thumshirn
  1 sibling, 1 reply; 58+ messages in thread
From: Nikolay Borisov @ 2020-09-15  8:37 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 15.09.20 г. 8:35 ч., Qu Wenruo wrote:
> As a preparation for subpage sector size support (allowing filesystem
> with sector size smaller than page size to be mounted) if the sector
> size is smaller than page size, we don't allow tree block to be read if
> it cross 64K(*) boundary.
> 
> The 64K is selected because:
> - We are only going to support 64K page size for subpage

What does "only support 64K page size for subpage" mean?

> - 64K is also the max node size btrfs supports
> 
> This ensures that, tree blocks are always contained in one page for a
> system with 64K page size, which can greatly simplify the handling.
> 
> Or we need to do complex multi-page handling for tree blocks.
> 
> Currently the only way to create such tree blocks crossing 64K boundary
> is by btrfs-convert, which will get fixed soon and doesn't get
> wide-spread usage.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent_io.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index e56aa68cd9fe..9af972999a09 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -5232,6 +5232,13 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
>  		btrfs_err(fs_info, "bad tree block start %llu", start);
>  		return ERR_PTR(-EINVAL);
>  	}
> +	if (fs_info->sectorsize < PAGE_SIZE && round_down(start, PAGE_SIZE) !=
> +	    round_down(start + len - 1, PAGE_SIZE)) {
> +		btrfs_err(fs_info,
> +		"tree block crosses page boundary, start %llu nodesize %lu",
> +			  start, len);
> +		return ERR_PTR(-EINVAL);
> +	}
>  
>  	eb = find_extent_buffer(fs_info, start);
>  	if (eb)
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 02/19] btrfs: remove the unnecessary parameter @start and @len for check_data_csum()
  2020-09-15  5:35 ` [PATCH v2 02/19] btrfs: remove the unnecessary parameter @start and @len for check_data_csum() Qu Wenruo
@ 2020-09-15  8:39   ` Johannes Thumshirn
  0 siblings, 0 replies; 58+ messages in thread
From: Johannes Thumshirn @ 2020-09-15  8:39 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 03/19] btrfs: calculate inline extent buffer page size based on page size
  2020-09-15  5:35 ` [PATCH v2 03/19] btrfs: calculate inline extent buffer page size based on page size Qu Wenruo
  2020-09-15  8:35   ` Nikolay Borisov
@ 2020-09-15  8:40   ` Johannes Thumshirn
  1 sibling, 0 replies; 58+ messages in thread
From: Johannes Thumshirn @ 2020-09-15  8:40 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 04/19] btrfs: remove the open-code to read disk-key
  2020-09-15  5:35 ` [PATCH v2 04/19] btrfs: remove the open-code to read disk-key Qu Wenruo
  2020-09-15  8:36   ` Nikolay Borisov
@ 2020-09-15  8:40   ` Johannes Thumshirn
  2020-09-16 16:01   ` David Sterba
  2 siblings, 0 replies; 58+ messages in thread
From: Johannes Thumshirn @ 2020-09-15  8:40 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 07/19] btrfs: update num_extent_pages() to support subpage sized extent buffer
  2020-09-15  5:35 ` [PATCH v2 07/19] btrfs: update num_extent_pages() to support subpage sized extent buffer Qu Wenruo
@ 2020-09-15  8:42   ` Johannes Thumshirn
  2020-09-15 10:07     ` Qu Wenruo
  2020-09-15 10:07     ` Qu Wenruo
  0 siblings, 2 replies; 58+ messages in thread
From: Johannes Thumshirn @ 2020-09-15  8:42 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 15/09/2020 07:36, Qu Wenruo wrote:
> +	 * For sectorsize < PAGE_SIZE case, we only want to support 64K
> +	 * PAGE_SIZE, and ensured all tree blocks won't cross page boundary.
> +	 * So in that case we always got 1 page.

Just to clarify, does this mean we won't support 512B sector size with 4K pages?

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 06/19] btrfs: don't allow tree block to cross page boundary for subpage support
  2020-09-15  5:35 ` [PATCH v2 06/19] btrfs: don't allow tree block to cross page boundary for subpage support Qu Wenruo
  2020-09-15  8:37   ` Nikolay Borisov
@ 2020-09-15  8:44   ` Johannes Thumshirn
  1 sibling, 0 replies; 58+ messages in thread
From: Johannes Thumshirn @ 2020-09-15  8:44 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 09/19] btrfs: make csum_tree_block() handle sectorsize smaller than page size
  2020-09-15  5:35 ` [PATCH v2 09/19] btrfs: make csum_tree_block() handle sectorsize smaller than page size Qu Wenruo
@ 2020-09-15  8:47   ` Johannes Thumshirn
  0 siblings, 0 replies; 58+ messages in thread
From: Johannes Thumshirn @ 2020-09-15  8:47 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: Goldwyn Rodrigues, Nikolay Borisov

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 10/19] btrfs: add assert_spin_locked() for attach_extent_buffer_page()
  2020-09-15  5:35 ` [PATCH v2 10/19] btrfs: add assert_spin_locked() for attach_extent_buffer_page() Qu Wenruo
@ 2020-09-15  8:52   ` Johannes Thumshirn
  0 siblings, 0 replies; 58+ messages in thread
From: Johannes Thumshirn @ 2020-09-15  8:52 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: Nikolay Borisov

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 05/19] btrfs: make btrfs_fs_info::buffer_radix to take sector size divided values
  2020-09-15  8:27   ` Johannes Thumshirn
@ 2020-09-15 10:04     ` Qu Wenruo
  2020-09-15 10:12       ` Johannes Thumshirn
  0 siblings, 1 reply; 58+ messages in thread
From: Qu Wenruo @ 2020-09-15 10:04 UTC (permalink / raw)
  To: Johannes Thumshirn, Qu Wenruo, linux-btrfs; +Cc: Nikolay Borisov



On 2020/9/15 4:27 PM, Johannes Thumshirn wrote:
> On 15/09/2020 07:36, Qu Wenruo wrote:
>>  	eb = radix_tree_lookup(&fs_info->buffer_radix,
>> -			       start >> PAGE_SHIFT);
>> +			       start / fs_info->sectorsize);
>
>
> These should be div_u64(start, fs_info->sectorsize) I think.
> At least 'start' is a u64.
>
Do we need to call div_u64() if it's u64/u32?

Anyway, the final version would utilize something like sector_shift.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 03/19] btrfs: calculate inline extent buffer page size based on page size
  2020-09-15  8:35   ` Nikolay Borisov
@ 2020-09-15 10:05     ` Qu Wenruo
  0 siblings, 0 replies; 58+ messages in thread
From: Qu Wenruo @ 2020-09-15 10:05 UTC (permalink / raw)
  To: Nikolay Borisov, Qu Wenruo, linux-btrfs



On 2020/9/15 4:35 PM, Nikolay Borisov wrote:
>
>
> On 15.09.20 at 8:35, Qu Wenruo wrote:
>> Btrfs only supports 64K as the max node size, thus for a 4K page system, we
>> would have at most 16 pages for one extent buffer.
>>
>> For a system using 64K page size, we would really have just one
>> single page.
>>
>> Yet we always use 16 pages for extent_buffer::pages[], which means for
>> systems using 64K pages, we are wasting memory on the 15 pages which
>> will never be utilized.
>>
>> So this patch will change how the extent_buffer::pages[] array size is
>> calculated, now it will be calculated using
>> BTRFS_MAX_METADATA_BLOCKSIZE and PAGE_SIZE.
>>
>> For systems using 4K page size, it will stay 16 pages.
>> For systems using 64K page size, it will be just 1 page.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
>
> <snip>
>
>> diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
>> index 00a88f2eb5ab..d511890702cc 100644
>> --- a/fs/btrfs/extent_io.h
>> +++ b/fs/btrfs/extent_io.h
>> @@ -85,9 +85,11 @@ struct extent_io_ops {
>>  				    int mirror);
>>  };
>>
>> -
>> -#define INLINE_EXTENT_BUFFER_PAGES 16
>> -#define MAX_INLINE_EXTENT_BUFFER_SIZE (INLINE_EXTENT_BUFFER_PAGES * PAGE_SIZE)
>> +/*
>> + * The SZ_64K is BTRFS_MAX_METADATA_BLOCKSIZE, here just to avoid circle
>> + * including "ctree.h".
>> + */
>> +#define INLINE_EXTENT_BUFFER_PAGES (SZ_64K / PAGE_SIZE)
>
> nit: Instead of doing this it would be better if this define is simply
> moved next to BTRFS_MAX_METADATA_BLOCKSIZE so that every define relating
> to ebs is clustered together.

It would be better to define the macro only in the extent buffer context.

We have put too many things into ctree.h, which is why we hit such
problems in the first place.

Thanks,
Qu
>
>>  struct extent_buffer {
>>  	u64 start;
>>  	unsigned long len;
>>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 06/19] btrfs: don't allow tree block to cross page boundary for subpage support
  2020-09-15  8:37   ` Nikolay Borisov
@ 2020-09-15 10:06     ` Qu Wenruo
  0 siblings, 0 replies; 58+ messages in thread
From: Qu Wenruo @ 2020-09-15 10:06 UTC (permalink / raw)
  To: Nikolay Borisov, Qu Wenruo, linux-btrfs



On 2020/9/15 4:37 PM, Nikolay Borisov wrote:
>
>
> On 15.09.20 at 8:35, Qu Wenruo wrote:
>> As a preparation for subpage sector size support (allowing filesystems
>> with sector size smaller than page size to be mounted): if the sector
>> size is smaller than the page size, we don't allow a tree block to be
>> read if it crosses a 64K(*) boundary.
>>
>> The 64K is selected because:
>> - We are only going to support 64K page size for subpage
>
> What does "only support 64K page size for subpage" mean?

That means we only support mounting a fs with sector size smaller than
page size on a 64K page size system.

Or to be more specific, we only support mounting a 4K sector size fs on
a 64K page size system.

Thanks,
Qu
>
>> - 64K is also the max node size btrfs supports
>>
>> This ensures that tree blocks are always contained in one page for a
>> system with 64K page size, which can greatly simplify the handling.
>>
>> Otherwise we would need complex multi-page handling for tree blocks.
>>
>> Currently the only way to create such tree blocks crossing a 64K
>> boundary is btrfs-convert, which will be fixed soon and doesn't see
>> widespread usage.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>  fs/btrfs/extent_io.c | 7 +++++++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index e56aa68cd9fe..9af972999a09 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -5232,6 +5232,13 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
>>  		btrfs_err(fs_info, "bad tree block start %llu", start);
>>  		return ERR_PTR(-EINVAL);
>>  	}
>> +	if (fs_info->sectorsize < PAGE_SIZE && round_down(start, PAGE_SIZE) !=
>> +	    round_down(start + len - 1, PAGE_SIZE)) {
>> +		btrfs_err(fs_info,
>> +		"tree block crosses page boundary, start %llu nodesize %lu",
>> +			  start, len);
>> +		return ERR_PTR(-EINVAL);
>> +	}
>>
>>  	eb = find_extent_buffer(fs_info, start);
>>  	if (eb)
>>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 07/19] btrfs: update num_extent_pages() to support subpage sized extent buffer
  2020-09-15  8:42   ` Johannes Thumshirn
@ 2020-09-15 10:07     ` Qu Wenruo
  2020-09-15 10:12       ` Johannes Thumshirn
  2020-09-15 10:07     ` Qu Wenruo
  1 sibling, 1 reply; 58+ messages in thread
From: Qu Wenruo @ 2020-09-15 10:07 UTC (permalink / raw)
  To: Johannes Thumshirn, Qu Wenruo, linux-btrfs

[-- Attachment #1.1: Type: text/plain, Size: 503 bytes --]



On 2020/9/15 4:42 PM, Johannes Thumshirn wrote:
> On 15/09/2020 07:36, Qu Wenruo wrote:
>> +	 * For sectorsize < PAGE_SIZE case, we only want to support 64K
>> +	 * PAGE_SIZE, and ensured all tree blocks won't cross page boundary.
>> +	 * So in that case we always got 1 page.
> 
> Just to clarify, does this mean we won't support 512B sector size with 4K pages?
> 
At least that's what I have planned.

I think the current 4K minimum sector size is pretty reasonable.

Thanks,
Qu


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 07/19] btrfs: update num_extent_pages() to support subpage sized extent buffer
  2020-09-15  8:42   ` Johannes Thumshirn
  2020-09-15 10:07     ` Qu Wenruo
@ 2020-09-15 10:07     ` Qu Wenruo
  1 sibling, 0 replies; 58+ messages in thread
From: Qu Wenruo @ 2020-09-15 10:07 UTC (permalink / raw)
  To: Johannes Thumshirn, Qu Wenruo, linux-btrfs

[-- Attachment #1.1: Type: text/plain, Size: 461 bytes --]



On 2020/9/15 4:42 PM, Johannes Thumshirn wrote:
> On 15/09/2020 07:36, Qu Wenruo wrote:
>> +	 * For sectorsize < PAGE_SIZE case, we only want to support 64K
>> +	 * PAGE_SIZE, and ensured all tree blocks won't cross page boundary.
>> +	 * So in that case we always got 1 page.
> 
> Just to clarify, does this mean we won't support 512B sector size with 4K pages?
> 
At least I don't plan to support sector size smaller than 4K.

Thanks,
Qu


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 05/19] btrfs: make btrfs_fs_info::buffer_radix to take sector size divided values
  2020-09-15 10:04     ` Qu Wenruo
@ 2020-09-15 10:12       ` Johannes Thumshirn
  0 siblings, 0 replies; 58+ messages in thread
From: Johannes Thumshirn @ 2020-09-15 10:12 UTC (permalink / raw)
  To: Qu Wenruo, Qu Wenruo, linux-btrfs; +Cc: Nikolay Borisov

On 15/09/2020 12:04, Qu Wenruo wrote:
> 
> 
> On 2020/9/15 4:27 PM, Johannes Thumshirn wrote:
>> On 15/09/2020 07:36, Qu Wenruo wrote:
>>>  	eb = radix_tree_lookup(&fs_info->buffer_radix,
>>> -			       start >> PAGE_SHIFT);
>>> +			       start / fs_info->sectorsize);
>>
>>
>> These should be div_u64(start, fs_info->sectorsize) I think.
>> At least 'start' is a u64.
>>
> Do we need to call div_u64() if it's u64/u32?

I'm not entirely sure either, that's the problem.

> 
> Anyway, the final version would utilize something like sector_shift.
> 
> Thanks,
> Qu
> 


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 07/19] btrfs: update num_extent_pages() to support subpage sized extent buffer
  2020-09-15 10:07     ` Qu Wenruo
@ 2020-09-15 10:12       ` Johannes Thumshirn
  0 siblings, 0 replies; 58+ messages in thread
From: Johannes Thumshirn @ 2020-09-15 10:12 UTC (permalink / raw)
  To: Qu Wenruo, Qu Wenruo, linux-btrfs

On 15/09/2020 12:07, Qu Wenruo wrote:
> 
> 
> On 2020/9/15 4:42 PM, Johannes Thumshirn wrote:
>> On 15/09/2020 07:36, Qu Wenruo wrote:
>>> +	 * For sectorsize < PAGE_SIZE case, we only want to support 64K
>>> +	 * PAGE_SIZE, and ensured all tree blocks won't cross page boundary.
>>> +	 * So in that case we always got 1 page.
>>
>> Just to clarify, does this mean we won't support 512B sector size with 4K pages?
>>
> At least that's what I have planned.
> 
> I think the current 4K minimum sector size is pretty reasonable.

OK thanks 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 05/19] btrfs: make btrfs_fs_info::buffer_radix to take sector size divided values
  2020-09-15  5:35 ` [PATCH v2 05/19] btrfs: make btrfs_fs_info::buffer_radix to take sector size divided values Qu Wenruo
  2020-09-15  8:27   ` Johannes Thumshirn
@ 2020-09-15 17:40   ` kernel test robot
  1 sibling, 0 replies; 58+ messages in thread
From: kernel test robot @ 2020-09-15 17:40 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: kbuild-all, Nikolay Borisov


[-- Attachment #1: Type: text/plain, Size: 5983 bytes --]

Hi Qu,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on v5.9-rc5]
[also build test ERROR on next-20200915]
[cannot apply to kdave/for-next btrfs/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Qu-Wenruo/btrfs-add-read-only-support-for-subpage-sector-size/20200915-133811
base:    856deb866d16e29bd65952e0289066f6078af773
config: m68k-randconfig-r014-20200913 (attached as .config)
compiler: m68k-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=m68k 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   m68k-linux-ld: fs/btrfs/extent_io.o: in function `alloc_extent_buffer':
>> fs/btrfs/extent_io.c:5305: undefined reference to `__udivdi3'
   m68k-linux-ld: fs/btrfs/extent_io.o: in function `release_extent_buffer':
   fs/btrfs/extent_io.c:5361: undefined reference to `__udivdi3'
   m68k-linux-ld: fs/btrfs/extent_io.o: in function `find_extent_buffer':
   fs/btrfs/extent_io.c:5145: undefined reference to `__udivdi3'

# https://github.com/0day-ci/linux/commit/d68d61d0719a047c653dcee58952ec46f5db5d00
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Qu-Wenruo/btrfs-add-read-only-support-for-subpage-sector-size/20200915-133811
git checkout d68d61d0719a047c653dcee58952ec46f5db5d00
vim +5305 fs/btrfs/extent_io.c

  5216	
  5217	struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
  5218						  u64 start)
  5219	{
  5220		unsigned long len = fs_info->nodesize;
  5221		int num_pages;
  5222		int i;
  5223		unsigned long index = start >> PAGE_SHIFT;
  5224		struct extent_buffer *eb;
  5225		struct extent_buffer *exists = NULL;
  5226		struct page *p;
  5227		struct address_space *mapping = fs_info->btree_inode->i_mapping;
  5228		int uptodate = 1;
  5229		int ret;
  5230	
  5231		if (!IS_ALIGNED(start, fs_info->sectorsize)) {
  5232			btrfs_err(fs_info, "bad tree block start %llu", start);
  5233			return ERR_PTR(-EINVAL);
  5234		}
  5235	
  5236		eb = find_extent_buffer(fs_info, start);
  5237		if (eb)
  5238			return eb;
  5239	
  5240		eb = __alloc_extent_buffer(fs_info, start, len);
  5241		if (!eb)
  5242			return ERR_PTR(-ENOMEM);
  5243	
  5244		num_pages = num_extent_pages(eb);
  5245		for (i = 0; i < num_pages; i++, index++) {
  5246			p = find_or_create_page(mapping, index, GFP_NOFS|__GFP_NOFAIL);
  5247			if (!p) {
  5248				exists = ERR_PTR(-ENOMEM);
  5249				goto free_eb;
  5250			}
  5251	
  5252			spin_lock(&mapping->private_lock);
  5253			if (PagePrivate(p)) {
  5254				/*
  5255				 * We could have already allocated an eb for this page
  5256				 * and attached one so lets see if we can get a ref on
  5257				 * the existing eb, and if we can we know it's good and
  5258				 * we can just return that one, else we know we can just
  5259				 * overwrite page->private.
  5260				 */
  5261				exists = (struct extent_buffer *)p->private;
  5262				if (atomic_inc_not_zero(&exists->refs)) {
  5263					spin_unlock(&mapping->private_lock);
  5264					unlock_page(p);
  5265					put_page(p);
  5266					mark_extent_buffer_accessed(exists, p);
  5267					goto free_eb;
  5268				}
  5269				exists = NULL;
  5270	
  5271				/*
  5272				 * Do this so attach doesn't complain and we need to
  5273				 * drop the ref the old guy had.
  5274				 */
  5275				ClearPagePrivate(p);
  5276				WARN_ON(PageDirty(p));
  5277				put_page(p);
  5278			}
  5279			attach_extent_buffer_page(eb, p);
  5280			spin_unlock(&mapping->private_lock);
  5281			WARN_ON(PageDirty(p));
  5282			eb->pages[i] = p;
  5283			if (!PageUptodate(p))
  5284				uptodate = 0;
  5285	
  5286			/*
  5287			 * We can't unlock the pages just yet since the extent buffer
  5288			 * hasn't been properly inserted in the radix tree, this
  5289			 * opens a race with btree_releasepage which can free a page
  5290			 * while we are still filling in all pages for the buffer and
  5291			 * we could crash.
  5292			 */
  5293		}
  5294		if (uptodate)
  5295			set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
  5296	again:
  5297		ret = radix_tree_preload(GFP_NOFS);
  5298		if (ret) {
  5299			exists = ERR_PTR(ret);
  5300			goto free_eb;
  5301		}
  5302	
  5303		spin_lock(&fs_info->buffer_lock);
  5304		ret = radix_tree_insert(&fs_info->buffer_radix,
> 5305					start / fs_info->sectorsize, eb);
  5306		spin_unlock(&fs_info->buffer_lock);
  5307		radix_tree_preload_end();
  5308		if (ret == -EEXIST) {
  5309			exists = find_extent_buffer(fs_info, start);
  5310			if (exists)
  5311				goto free_eb;
  5312			else
  5313				goto again;
  5314		}
  5315		/* add one reference for the tree */
  5316		check_buffer_tree_ref(eb);
  5317		set_bit(EXTENT_BUFFER_IN_TREE, &eb->bflags);
  5318	
  5319		/*
  5320		 * Now it's safe to unlock the pages because any calls to
  5321		 * btree_releasepage will correctly detect that a page belongs to a
  5322		 * live buffer and won't free them prematurely.
  5323		 */
  5324		for (i = 0; i < num_pages; i++)
  5325			unlock_page(eb->pages[i]);
  5326		return eb;
  5327	
  5328	free_eb:
  5329		WARN_ON(!atomic_dec_and_test(&eb->refs));
  5330		for (i = 0; i < num_pages; i++) {
  5331			if (eb->pages[i])
  5332				unlock_page(eb->pages[i]);
  5333		}
  5334	
  5335		btrfs_release_extent_buffer(eb);
  5336		return exists;
  5337	}
  5338	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 22530 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 00/19] btrfs: add read-only support for subpage sector size
  2020-09-15  5:35 [PATCH v2 00/19] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (18 preceding siblings ...)
  2020-09-15  5:35 ` [PATCH v2 19/19] btrfs: allow RO mount of 4K sector size fs on 64K page system Qu Wenruo
@ 2020-09-16  1:35 ` Qu Wenruo
  2020-09-16 16:18 ` Neal Gompa
  20 siblings, 0 replies; 58+ messages in thread
From: Qu Wenruo @ 2020-09-16  1:35 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1.1: Type: text/plain, Size: 6533 bytes --]



On 2020/9/15 1:35 PM, Qu Wenruo wrote:
> Patches can be fetched from github:
> https://github.com/adam900710/linux/tree/subpage
> 
> Currently btrfs only allows mounting a fs with sectorsize == PAGE_SIZE.
> 
> That means a 64K page size system can only use a 64K sector size fs.
> This brings a big compatibility problem for btrfs.
> 
> This patchset partially solves the problem by allowing a 64K page size
> system to mount a 4K sectorsize fs in read-only mode.
> 
> The main objective here is to remove the blockage in the code base and
> pave the road to full RW mount support.
> 
> == What works ==
> 
> Existing regular page sized sector size support
> Subpage read-only Mount (with all self tests and ASSERT)
> Subpage metadata read (including all trees and inline extents, and csum checking)
> Subpage uncompressed data read (with csum checking)
> 
> == What doesn't work ==
> 
> Read-write mount (see the subject)
> Compressed data read
> 
> == Challenge we meet ==
> 
> The main problem is metadata, where we have several limitations:
> - We always read the full page for metadata
>   In the subpage case, one full page can contain several tree blocks.
> 
> - We use page::private to point to extent buffer
>   This means we currently can only support one-page-to-one-extent-buffer
>   mapping.
>   For subpage size support, we need one-page-to-multiple-extent-buffer
>   mapping.
> 
> 
> == Solutions ==
> 
> So here for the metadata part, we use the following methods to
> work around the problem:
> 
> - Completely rely on extent_io_tree for metadata status/locking
>   Now for subpage metadata, page::private is never utilized. It always
>   points to NULL.
>   And we only utilize the Private page status; other statuses
>   (locked/uptodate/dirty/...) are all ignored.
> 
>   Instead, the page lock is replaced by EXTENT_LOCK of the extent_io_tree.
>   Page uptodate is replaced by EXTENT_UPTODATE of the extent_io_tree.
>   And whether a range has an extent buffer is represented by EXTENT_NEW.
> 
>   This provides the full potential for later RW support.
> 
> - Do subpage read for metadata
>   Now we do proper subpage read for both data and metadata.
>   For metadata we never merge bio for adjacent tree blocks, but always
>   submit one bio for one tree block.
>   This allows us to do proper verification for each tree blocks.
> 
> For the data part, it's pretty simple: all existing infrastructure can be
> easily converted to support subpage read, without any subpage specific
> handling yet.
> 
> == Patchset structure ==
> 
> The structure of the patchset:
> Patch 01~15: Preparation patches for data and metadata subpage read support.
>              These patches can be merged without problem, and work for
>              both regular and subpage case.
> 	     This part can conflict with Nikolay's latest cleanup, but
> 	     the conflicts should be pretty controllable.
> 
> Patch 16~19: Patches for metadata subpage read support.
> 	     The main part of the patchset. It converts metadata to
> 	     purely extent_io_tree based solution for subpage read.
> 
> 	     In theory, page sized routine can also be converted to
> 	     extent_io_tree. But that would be another topic in the
> 	     future.
> 
> The number of patches is the main reason I'm submitting them to the
> mailing list, as there are too many preparation patches already.

For the missing changelog:
v2:
- Migrating to extent_io_tree based status/locking mechanism
  This gets rid of the ad-hoc subpage_eb_mapping structure and extra
  timing to verify the extent buffers.

  This also brings some extra cleanups for the btree inode extent io tree
  hooks, which make no sense for either subpage or regular sector sizes.

  This also completely removes the requirement for page status like
  Locked/Uptodate/Dirty. Now metadata pages only utilize Private status,
  while private pointer is always NULL.

- Submit proper subpage sized reads for metadata
  With the help of the extent io tree, we no longer need to bother with
  full page reads. Now we submit subpage sized metadata reads and do
  subpage locking.

- Remove some unnecessary refactors
  Some refactors like extracting detach_extent_buffer_pages() don't
  really make the code cleaner. We can easily add a subpage specific
  branch instead.

- Address the comments from v1

Thanks,
Qu
> 
> Qu Wenruo (19):
>   btrfs: extent-io-tests: remove invalid tests
>   btrfs: remove the unnecessary parameter @start and @len for
>     check_data_csum()
>   btrfs: calculate inline extent buffer page size based on page size
>   btrfs: remove the open-code to read disk-key
>   btrfs: make btrfs_fs_info::buffer_radix to take sector size divided
>     values
>   btrfs: don't allow tree block to cross page boundary for subpage
>     support
>   btrfs: update num_extent_pages() to support subpage sized extent
>     buffer
>   btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors
>   btrfs: make csum_tree_block() handle sectorsize smaller than page size
>   btrfs: add assert_spin_locked() for attach_extent_buffer_page()
>   btrfs: extract the extent buffer verification from
>     btree_readpage_end_io_hook()
>   btrfs: extent_io: only require sector size alignment for page read
>   btrfs: make btrfs_readpage_end_io_hook() follow sector size
>   btrfs: make btree inode io_tree has its special owner
>   btrfs: don't set extent_io_tree bits for btree inode at endio time
>   btrfs: use extent_io_tree to handle subpage extent buffer allocation
>   btrfs: implement subpage metadata read and its endio function
>   btrfs: implement btree_readpage() and try_release_extent_buffer() for
>     subpage
>   btrfs: allow RO mount of 4K sector size fs on 64K page system
> 
>  fs/btrfs/btrfs_inode.h           |  12 +
>  fs/btrfs/ctree.c                 |  13 +-
>  fs/btrfs/ctree.h                 |  38 +++-
>  fs/btrfs/disk-io.c               | 217 ++++++++++++++----
>  fs/btrfs/extent-io-tree.h        |   8 +
>  fs/btrfs/extent_io.c             | 376 +++++++++++++++++++++++++++----
>  fs/btrfs/extent_io.h             |  19 +-
>  fs/btrfs/inode.c                 |  40 +++-
>  fs/btrfs/ordered-data.c          |   8 +
>  fs/btrfs/qgroup.c                |   4 +
>  fs/btrfs/struct-funcs.c          |  18 +-
>  fs/btrfs/super.c                 |   7 +
>  fs/btrfs/tests/extent-io-tests.c |  26 +--
>  13 files changed, 642 insertions(+), 144 deletions(-)
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 17/19] btrfs: implement subpage metadata read and its endio function
  2020-09-15  5:35 ` [PATCH v2 17/19] btrfs: implement subpage metadata read and its endio function Qu Wenruo
@ 2020-09-16  8:47   ` kernel test robot
  0 siblings, 0 replies; 58+ messages in thread
From: kernel test robot @ 2020-09-16  8:47 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: kbuild-all


[-- Attachment #1: Type: text/plain, Size: 2064 bytes --]

Hi Qu,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on v5.9-rc5]
[also build test ERROR on next-20200915]
[cannot apply to kdave/for-next btrfs/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Qu-Wenruo/btrfs-add-read-only-support-for-subpage-sector-size/20200915-133811
base:    856deb866d16e29bd65952e0289066f6078af773
config: nds32-randconfig-r003-20200916 (attached as .config)
compiler: nds32le-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=nds32 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   nds32le-linux-ld: fs/btrfs/disk-io.o: in function `btree_readpage_end_io_hook':
>> disk-io.c:(.text+0x2412): undefined reference to `__udivdi3'
>> nds32le-linux-ld: disk-io.c:(.text+0x2416): undefined reference to `__udivdi3'
   nds32le-linux-ld: nds32le-linux-ld: DWARF error: found dwarf version '36812', this reader only handles version 2, 3, 4 and 5 information
   fs/btrfs/extent_io.o: in function `release_extent_buffer':
   extent_io.c:(.text+0x6b08): undefined reference to `__udivdi3'
   nds32le-linux-ld: extent_io.c:(.text+0x6b0c): undefined reference to `__udivdi3'
   nds32le-linux-ld: fs/btrfs/extent_io.o: in function `find_extent_buffer':
   extent_io.c:(.text+0x6d1a): undefined reference to `__udivdi3'
   nds32le-linux-ld: fs/btrfs/extent_io.o:extent_io.c:(.text+0x6d1e): more undefined references to `__udivdi3' follow

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 28530 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 14/19] btrfs: make btree inode io_tree has its special owner
  2020-09-15  5:35 ` [PATCH v2 14/19] btrfs: make btree inode io_tree has its special owner Qu Wenruo
@ 2020-09-16  9:28   ` Johannes Thumshirn
  2020-09-16 16:06   ` David Sterba
  2020-09-22 14:14   ` David Sterba
  2 siblings, 0 replies; 58+ messages in thread
From: Johannes Thumshirn @ 2020-09-16  9:28 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 04/19] btrfs: remove the open-code to read disk-key
  2020-09-15  5:35 ` [PATCH v2 04/19] btrfs: remove the open-code to read disk-key Qu Wenruo
  2020-09-15  8:36   ` Nikolay Borisov
  2020-09-15  8:40   ` Johannes Thumshirn
@ 2020-09-16 16:01   ` David Sterba
  2020-09-17  8:02     ` Qu Wenruo
  2 siblings, 1 reply; 58+ messages in thread
From: David Sterba @ 2020-09-16 16:01 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Sep 15, 2020 at 01:35:17PM +0800, Qu Wenruo wrote:
> generic_bin_search() distinguishes between reading a key which doesn't
> cross a page and one which does. However this distinction is not
> necessary since read_extent_buffer handles both cases transparently.
> 
> Just use read_extent_buffer to streamline the code.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/ctree.c | 13 ++-----------
>  1 file changed, 2 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index cd1cd673bc0b..e204e1320745 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -1697,7 +1697,6 @@ static noinline int generic_bin_search(struct extent_buffer *eb,
>  	}
>  
>  	while (low < high) {
> -		unsigned long oip;
>  		unsigned long offset;
>  		struct btrfs_disk_key *tmp;
>  		struct btrfs_disk_key unaligned;
> @@ -1705,17 +1704,9 @@ static noinline int generic_bin_search(struct extent_buffer *eb,
>  
>  		mid = (low + high) / 2;
>  		offset = p + mid * item_size;
> -		oip = offset_in_page(offset);
>  
> -		if (oip + key_size <= PAGE_SIZE) {
> -			const unsigned long idx = offset >> PAGE_SHIFT;
> -			char *kaddr = page_address(eb->pages[idx]);
> -
> -			tmp = (struct btrfs_disk_key *)(kaddr + oip);
> -		} else {
> -			read_extent_buffer(eb, &unaligned, offset, key_size);
> -			tmp = &unaligned;
> -		}
> +		read_extent_buffer(eb, &unaligned, offset, key_size);
> +		tmp = &unaligned;

Reading directly from the page is a performance optimization on systems
with 4K pages, i.e. the majority. I'm not in favor of removing it just
to make the code look nicer.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 14/19] btrfs: make btree inode io_tree has its special owner
  2020-09-15  5:35 ` [PATCH v2 14/19] btrfs: make btree inode io_tree has its special owner Qu Wenruo
  2020-09-16  9:28   ` Johannes Thumshirn
@ 2020-09-16 16:06   ` David Sterba
  2020-09-17  0:02     ` Qu Wenruo
  2020-09-22 14:14   ` David Sterba
  2 siblings, 1 reply; 58+ messages in thread
From: David Sterba @ 2020-09-16 16:06 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Sep 15, 2020 at 01:35:27PM +0800, Qu Wenruo wrote:
> The btree inode is pretty special compared to all other inode extent io
> trees: although it has a btrfs inode, it doesn't have the track_uptodate
> bit at all.
> 
> This means a lot of things like extent locking don't even need to be
> applied to the btree io tree.
> 
> Since it's so special, add a new owner value for it to make debugging a
> little easier.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/disk-io.c        | 2 +-
>  fs/btrfs/extent-io-tree.h | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 1ba16951ccaa..82a841bd0702 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2126,7 +2126,7 @@ static void btrfs_init_btree_inode(struct btrfs_fs_info *fs_info)
>  
>  	RB_CLEAR_NODE(&BTRFS_I(inode)->rb_node);
>  	extent_io_tree_init(fs_info, &BTRFS_I(inode)->io_tree,
> -			    IO_TREE_INODE_IO, inode);
> +			    IO_TREE_BTREE_IO, inode);

This looks like an independent patch, so it could be taken separately.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 00/19] btrfs: add read-only support for subpage sector size
  2020-09-15  5:35 [PATCH v2 00/19] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (19 preceding siblings ...)
  2020-09-16  1:35 ` [PATCH v2 00/19] btrfs: add read-only support for subpage sector size Qu Wenruo
@ 2020-09-16 16:18 ` Neal Gompa
  2020-09-17  0:03   ` Qu Wenruo
  20 siblings, 1 reply; 58+ messages in thread
From: Neal Gompa @ 2020-09-16 16:18 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Btrfs BTRFS, Josef Bacik, Chris Murphy

On Tue, Sep 15, 2020 at 1:36 AM Qu Wenruo <wqu@suse.com> wrote:
>
> Patches can be fetched from github:
> https://github.com/adam900710/linux/tree/subpage
>
> Currently btrfs only allows mounting a fs with sectorsize == PAGE_SIZE.
>
> That means a 64K page size system can only use a 64K sector size fs.
> This brings a big compatibility problem for btrfs.
>
> This patchset partially solves the problem by allowing a 64K page size
> system to mount a 4K sectorsize fs in read-only mode.
>
> The main objective here is to remove the blockage in the code base and
> pave the road to full RW mount support.
>
>

Is there a reason we don't include a patch in here to just hardwire
the block size to 4K going forward?


-- 
真実はいつも一つ!/ Always, there's only one truth!


* Re: [PATCH v2 14/19] btrfs: make btree inode io_tree has its special owner
  2020-09-16 16:06   ` David Sterba
@ 2020-09-17  0:02     ` Qu Wenruo
  2020-09-17 12:50       ` David Sterba
  0 siblings, 1 reply; 58+ messages in thread
From: Qu Wenruo @ 2020-09-17  0:02 UTC (permalink / raw)
  To: dsterba, linux-btrfs



On 2020/9/17 12:06 AM, David Sterba wrote:
> On Tue, Sep 15, 2020 at 01:35:27PM +0800, Qu Wenruo wrote:
>> Btree inode is pretty special compared to all other inode extent io
>> trees, although it has a btrfs inode, it doesn't have the track_uptodate
>> bit at all.
>>
>> This means a lot of things like extent locking don't even need to be
>> applied to the btree io tree.
>>
>> Since it's so special, add a new owner value for it to make debugging a
>> little easier.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>  fs/btrfs/disk-io.c        | 2 +-
>>  fs/btrfs/extent-io-tree.h | 1 +
>>  2 files changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 1ba16951ccaa..82a841bd0702 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -2126,7 +2126,7 @@ static void btrfs_init_btree_inode(struct btrfs_fs_info *fs_info)
>>  
>>  	RB_CLEAR_NODE(&BTRFS_I(inode)->rb_node);
>>  	extent_io_tree_init(fs_info, &BTRFS_I(inode)->io_tree,
>> -			    IO_TREE_INODE_IO, inode);
>> +			    IO_TREE_BTREE_IO, inode);
> 
> This looks like an independent patch, so it could be taken separately.
> 
Errr, why?

We added a new owner for the btree inode io tree, and utilizing that new
owner in the same patch looks completely sane to me.

Or did I miss something?

Thanks,
Qu



* Re: [PATCH v2 00/19] btrfs: add read-only support for subpage sector size
  2020-09-16 16:18 ` Neal Gompa
@ 2020-09-17  0:03   ` Qu Wenruo
  2020-09-17  0:13     ` Neal Gompa
  0 siblings, 1 reply; 58+ messages in thread
From: Qu Wenruo @ 2020-09-17  0:03 UTC (permalink / raw)
  To: Neal Gompa; +Cc: Btrfs BTRFS, Josef Bacik, Chris Murphy



On 2020/9/17 12:18 AM, Neal Gompa wrote:
> On Tue, Sep 15, 2020 at 1:36 AM Qu Wenruo <wqu@suse.com> wrote:
>>
>> Patches can be fetched from github:
>> https://github.com/adam900710/linux/tree/subpage
>>
>> Currently btrfs only allows mounting a fs with sectorsize == PAGE_SIZE.
>>
>> That means a 64K page size system can only use a 64K sector size fs.
>> This brings a big compatibility problem for btrfs.
>>
>> This patchset partially solves the problem by allowing a 64K system to
>> mount a 4K sectorsize fs in read-only mode.
>>
>> The main objective here is to remove the blockage in the code base and
>> pave the road to full RW mount support.
>>
> 
> Is there a reason we don't include a patch in here to just hardwire
> the block size to 4K going forward?
> 

Did you mean making 4K sector size a hard requirement?

That would make existing 64K sector size filesystems unmountable.

Thanks,
Qu



* Re: [PATCH v2 00/19] btrfs: add read-only support for subpage sector size
  2020-09-17  0:03   ` Qu Wenruo
@ 2020-09-17  0:13     ` Neal Gompa
  2020-09-17  0:24       ` Qu Wenruo
  0 siblings, 1 reply; 58+ messages in thread
From: Neal Gompa @ 2020-09-17  0:13 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Btrfs BTRFS, Josef Bacik, Chris Murphy

On Wed, Sep 16, 2020 at 8:03 PM Qu Wenruo <wqu@suse.com> wrote:
>
>
>
> On 2020/9/17 上午12:18, Neal Gompa wrote:
> > On Tue, Sep 15, 2020 at 1:36 AM Qu Wenruo <wqu@suse.com> wrote:
> >>
> >> Patches can be fetched from github:
> >> https://github.com/adam900710/linux/tree/subpage
> >>
> >> Currently btrfs only allows mounting a fs with sectorsize == PAGE_SIZE.
> >>
> >> That means a 64K page size system can only use a 64K sector size fs.
> >> This brings a big compatibility problem for btrfs.
> >>
> >> This patchset partially solves the problem by allowing a 64K system to
> >> mount a 4K sectorsize fs in read-only mode.
> >>
> >> The main objective here is to remove the blockage in the code base and
> >> pave the road to full RW mount support.
> >>
> >
> > Is there a reason we don't include a patch in here to just hardwire
> > the block size to 4K going forward?
> >
>
> Did you mean to make 4K sector size the hard requirement?
>
> That would make existing 64K sector size fs unable to be mounted then.
>

I mean, make the 64K variant a legacy one and force 4K for all new
filesystems. That then simplifies this work to making it mountable and
usable as a legacy filesystem format.

I guess that would be an incompat flag, no?



--
真実はいつも一つ!/ Always, there's only one truth!


* Re: [PATCH v2 00/19] btrfs: add read-only support for subpage sector size
  2020-09-17  0:13     ` Neal Gompa
@ 2020-09-17  0:24       ` Qu Wenruo
  0 siblings, 0 replies; 58+ messages in thread
From: Qu Wenruo @ 2020-09-17  0:24 UTC (permalink / raw)
  To: Neal Gompa, Qu Wenruo; +Cc: Btrfs BTRFS, Josef Bacik, Chris Murphy




On 2020/9/17 8:13 AM, Neal Gompa wrote:
> On Wed, Sep 16, 2020 at 8:03 PM Qu Wenruo <wqu@suse.com> wrote:
>>
>>
>>
>> On 2020/9/17 上午12:18, Neal Gompa wrote:
>>> On Tue, Sep 15, 2020 at 1:36 AM Qu Wenruo <wqu@suse.com> wrote:
>>>>
>>>> Patches can be fetched from github:
>>>> https://github.com/adam900710/linux/tree/subpage
>>>>
> >>> Currently btrfs only allows mounting a fs with sectorsize == PAGE_SIZE.
> >>>
> >>> That means a 64K page size system can only use a 64K sector size fs.
> >>> This brings a big compatibility problem for btrfs.
> >>>
> >>> This patchset partially solves the problem by allowing a 64K system to
> >>> mount a 4K sectorsize fs in read-only mode.
> >>>
> >>> The main objective here is to remove the blockage in the code base and
> >>> pave the road to full RW mount support.
>>>>
>>>
>>> Is there a reason we don't include a patch in here to just hardwire
>>> the block size to 4K going forward?
>>>
>>
>> Did you mean to make 4K sector size the hard requirement?
>>
>> That would make existing 64K sector size fs unable to be mounted then.
>>
> 
> I mean, make the 64K variant a legacy one and force 4K for all new
> filesystems. That then simplifies this work to making it mountable and
> usable as a legacy filesystem format.

That's the plan.

But not for now, since currently the subpage support is only read-only.

> 
> I guess that would be an incompat flag, no?

No need for an incompat flag.

Older kernels simply can't mount a subpage fs.

Thanks,
Qu
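
To illustrate the point, here is a minimal sketch of the old-kernel
behavior: superblock validation rejects any sectorsize that differs from
the page size, so a subpage fs never mounts on an unpatched kernel and
no feature flag is involved. The function name and shape below are
illustrative only, not the actual kernel code:

```c
#include <assert.h>
#include <errno.h>

/*
 * Hedged sketch of why an older kernel simply cannot mount a subpage fs:
 * superblock validation refuses any sectorsize != PAGE_SIZE, so no
 * incompat flag is needed.  validate_sectorsize() is a simplified
 * stand-in, not a real kernel function.
 */
static int validate_sectorsize(unsigned long sectorsize, unsigned long page_size)
{
	if (sectorsize != page_size)
		return -EINVAL;	/* old kernels: mount fails here */
	return 0;		/* sectorsize == PAGE_SIZE: mount proceeds */
}
```

With a check of this shape, a 4K fs presented to a 64K-page kernel fails
at mount time, which is exactly the "just can't mount" behavior
described above.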

> 
> 
> 
> --
> 真実はいつも一つ!/ Always, there's only one truth!
> 




* Re: [PATCH v2 04/19] btrfs: remove the open-code to read disk-key
  2020-09-16 16:01   ` David Sterba
@ 2020-09-17  8:02     ` Qu Wenruo
  2020-09-17 12:37       ` David Sterba
  0 siblings, 1 reply; 58+ messages in thread
From: Qu Wenruo @ 2020-09-17  8:02 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs




On 2020/9/17 12:01 AM, David Sterba wrote:
> On Tue, Sep 15, 2020 at 01:35:17PM +0800, Qu Wenruo wrote:
>> generic_bin_search() distinguishes between reading a key which doesn't
>> cross a page and one which does. However this distinction is not
>> necessary since read_extent_buffer handles both cases transparently.
>>
>> Just use read_extent_buffer to streamline the code.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>  fs/btrfs/ctree.c | 13 ++-----------
>>  1 file changed, 2 insertions(+), 11 deletions(-)
>>
>> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
>> index cd1cd673bc0b..e204e1320745 100644
>> --- a/fs/btrfs/ctree.c
>> +++ b/fs/btrfs/ctree.c
>> @@ -1697,7 +1697,6 @@ static noinline int generic_bin_search(struct extent_buffer *eb,
>>  	}
>>  
>>  	while (low < high) {
>> -		unsigned long oip;
>>  		unsigned long offset;
>>  		struct btrfs_disk_key *tmp;
>>  		struct btrfs_disk_key unaligned;
>> @@ -1705,17 +1704,9 @@ static noinline int generic_bin_search(struct extent_buffer *eb,
>>  
>>  		mid = (low + high) / 2;
>>  		offset = p + mid * item_size;
>> -		oip = offset_in_page(offset);
>>  
>> -		if (oip + key_size <= PAGE_SIZE) {
>> -			const unsigned long idx = offset >> PAGE_SHIFT;
>> -			char *kaddr = page_address(eb->pages[idx]);
>> -
>> -			tmp = (struct btrfs_disk_key *)(kaddr + oip);
>> -		} else {
>> -			read_extent_buffer(eb, &unaligned, offset, key_size);
>> -			tmp = &unaligned;
>> -		}
>> +		read_extent_buffer(eb, &unaligned, offset, key_size);
>> +		tmp = &unaligned;
> 
> Reading from the first page is a performance optimization on systems
> with 4K pages, ie. the majority. I'm not in favor removing it just to
> make the code look nicer.
> 

For a 4K system, the optimization only saves the cost of one
read_extent_buffer() call.

In read_extent_buffer() it's still doing the same thing internally.

Otherwise we will need to manually call get_eb_page_offset() here to
make it work for subpage.

I wonder if it's really worth it.

Thanks,
Qu




* Re: [PATCH v2 04/19] btrfs: remove the open-code to read disk-key
  2020-09-17  8:02     ` Qu Wenruo
@ 2020-09-17 12:37       ` David Sterba
  2020-09-17 13:15         ` Qu Wenruo
  0 siblings, 1 reply; 58+ messages in thread
From: David Sterba @ 2020-09-17 12:37 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Thu, Sep 17, 2020 at 04:02:37PM +0800, Qu Wenruo wrote:
> On 2020/9/17 上午12:01, David Sterba wrote:
> > On Tue, Sep 15, 2020 at 01:35:17PM +0800, Qu Wenruo wrote:
> >> generic_bin_search() distinguishes between reading a key which doesn't
> >> cross a page and one which does. However this distinction is not
> >> necessary since read_extent_buffer handles both cases transparently.
> >>
> >> Just use read_extent_buffer to streamline the code.
> >>
> >> Signed-off-by: Qu Wenruo <wqu@suse.com>
> >> ---
> >>  fs/btrfs/ctree.c | 13 ++-----------
> >>  1 file changed, 2 insertions(+), 11 deletions(-)
> >>
> >> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> >> index cd1cd673bc0b..e204e1320745 100644
> >> --- a/fs/btrfs/ctree.c
> >> +++ b/fs/btrfs/ctree.c
> >> @@ -1697,7 +1697,6 @@ static noinline int generic_bin_search(struct extent_buffer *eb,
> >>  	}
> >>  
> >>  	while (low < high) {
> >> -		unsigned long oip;
> >>  		unsigned long offset;
> >>  		struct btrfs_disk_key *tmp;
> >>  		struct btrfs_disk_key unaligned;
> >> @@ -1705,17 +1704,9 @@ static noinline int generic_bin_search(struct extent_buffer *eb,
> >>  
> >>  		mid = (low + high) / 2;
> >>  		offset = p + mid * item_size;
> >> -		oip = offset_in_page(offset);
> >>  
> >> -		if (oip + key_size <= PAGE_SIZE) {
> >> -			const unsigned long idx = offset >> PAGE_SHIFT;
> >> -			char *kaddr = page_address(eb->pages[idx]);
> >> -
> >> -			tmp = (struct btrfs_disk_key *)(kaddr + oip);
> >> -		} else {
> >> -			read_extent_buffer(eb, &unaligned, offset, key_size);
> >> -			tmp = &unaligned;
> >> -		}
> >> +		read_extent_buffer(eb, &unaligned, offset, key_size);
> >> +		tmp = &unaligned;
> > 
> > Reading from the first page is a performance optimization on systems
> > with 4K pages, ie. the majority. I'm not in favor removing it just to
> > make the code look nicer.
> 
> For 4K system, with the optimization it only saves one
> read_extent_buffer() call cost.

This evaluation is wrong, you missed several things that
generic_bin_search and read_extent_buffer do.

generic_bin_search is called very often, once per search slot, so the
optimization is worthwhile here.

read_extent_buffer is used _only_ for keys that cross a page boundary,
so we need to read the bytes in two steps, and this is wrapped into a
function that we call in a limited number of cases.

In all other cases, when the whole key is contained in one page, the
access is inline in generic_bin_search, ie. no function call overhead.

> Or we will need to manually call get_eb_page_offset() here to make it
> work for subpage.

For a nodesize smaller than PAGE_SIZE there's no page crossing at all,
so using read_extent_buffer would make things worse.
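
For a rough sense of how much the fast path matters, the page-crossing
case described above is rare by construction. The small standalone demo
below counts how many key reads would straddle a 4K page; all sizes are
assumptions for illustration (17-byte disk keys, 33-byte key pointers,
key array starting at offset 0, unlike a real extent buffer):

```c
#include <assert.h>

#define DEMO_PAGE_SIZE 4096UL

/*
 * Illustration of how rarely a node key crosses a 4K page boundary.
 * Assumed sizes for the demo: a btrfs_disk_key is 17 bytes and a node
 * key pointer (disk key + blockptr + generation) is 33 bytes.  The key
 * array is assumed to start at offset 0, which it does not in a real
 * extent buffer, so treat the exact count as illustrative.
 */
static unsigned long count_crossing_keys(unsigned long nr_keys)
{
	const unsigned long key_size = 17, item_size = 33;
	unsigned long crossing = 0;

	for (unsigned long mid = 0; mid < nr_keys; mid++) {
		unsigned long oip = (mid * item_size) % DEMO_PAGE_SIZE;

		/* the copy-based slow path is needed only when the key
		 * straddles a page boundary */
		if (oip + key_size > DEMO_PAGE_SIZE)
			crossing++;
	}
	return crossing;
}
```

With these assumed sizes, only 16 of 4096 key positions (under 0.4%)
hit the slow path, so the inline in-page access covers the overwhelming
majority of comparisons.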


* Re: [PATCH v2 14/19] btrfs: make btree inode io_tree has its special owner
  2020-09-17  0:02     ` Qu Wenruo
@ 2020-09-17 12:50       ` David Sterba
  2020-09-18  8:18         ` Qu Wenruo
  0 siblings, 1 reply; 58+ messages in thread
From: David Sterba @ 2020-09-17 12:50 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, linux-btrfs

On Thu, Sep 17, 2020 at 08:02:05AM +0800, Qu Wenruo wrote:
> 
> 
> On 2020/9/17 上午12:06, David Sterba wrote:
> > On Tue, Sep 15, 2020 at 01:35:27PM +0800, Qu Wenruo wrote:
> >> Btree inode is pretty special compared to all other inode extent io
> >> trees, although it has a btrfs inode, it doesn't have the track_uptodate
> >> bit at all.
> >>
> >> This means a lot of things like extent locking don't even need to be
> >> applied to the btree io tree.
> >>
> >> Since it's so special, add a new owner value for it to make debugging a
> >> little easier.
> >>
> >> Signed-off-by: Qu Wenruo <wqu@suse.com>
> >> ---
> >>  fs/btrfs/disk-io.c        | 2 +-
> >>  fs/btrfs/extent-io-tree.h | 1 +
> >>  2 files changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> >> index 1ba16951ccaa..82a841bd0702 100644
> >> --- a/fs/btrfs/disk-io.c
> >> +++ b/fs/btrfs/disk-io.c
> >> @@ -2126,7 +2126,7 @@ static void btrfs_init_btree_inode(struct btrfs_fs_info *fs_info)
> >>  
> >>  	RB_CLEAR_NODE(&BTRFS_I(inode)->rb_node);
> >>  	extent_io_tree_init(fs_info, &BTRFS_I(inode)->io_tree,
> >> -			    IO_TREE_INODE_IO, inode);
> >> +			    IO_TREE_BTREE_IO, inode);
> > 
> > This looks like an independent patch, so it could be taken separately.
> > 
> Errr, why?
> 
> We added a new owner for the btree inode io tree, and utilizing that new
> owner in the same patch looks completely sane to me.
> 
> Or did I miss something?

The IO_TREE_* ids are only for debugging, and IO_TREE_INODE_IO is
supposed to be used for data inodes. But the btree_inode has that too,
which does not seem to follow the purpose of the tree id; you fix that
in this patch, and the fix applies to the current code too.
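
For context, the owner ids are a plain enum; a compressed sketch
(the two entry names come from the patch, while the ordering and the
other entries of the real enum in fs/btrfs/extent-io-tree.h are elided
and illustrative only):

```c
#include <assert.h>

/*
 * Abbreviated sketch of the io_tree owner ids discussed above.  The ids
 * exist purely for debugging: each extent io tree records what kind of
 * user owns it.  IO_TREE_BTREE_IO is the value the patch adds so the
 * btree inode no longer claims IO_TREE_INODE_IO.
 */
enum demo_io_tree_owner {
	IO_TREE_INODE_IO,	/* regular data inode io_tree */
	IO_TREE_BTREE_IO,	/* new: the btree inode's io_tree */
	/* ... other owners of the real enum elided ... */
};
```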


* Re: [PATCH v2 04/19] btrfs: remove the open-code to read disk-key
  2020-09-17 12:37       ` David Sterba
@ 2020-09-17 13:15         ` Qu Wenruo
  2020-09-17 22:41           ` David Sterba
  0 siblings, 1 reply; 58+ messages in thread
From: Qu Wenruo @ 2020-09-17 13:15 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs




On 2020/9/17 8:37 PM, David Sterba wrote:
> On Thu, Sep 17, 2020 at 04:02:37PM +0800, Qu Wenruo wrote:
>> On 2020/9/17 上午12:01, David Sterba wrote:
>>> On Tue, Sep 15, 2020 at 01:35:17PM +0800, Qu Wenruo wrote:
>>>> generic_bin_search() distinguishes between reading a key which doesn't
>>>> cross a page and one which does. However this distinction is not
>>>> necessary since read_extent_buffer handles both cases transparently.
>>>>
>>>> Just use read_extent_buffer to streamline the code.
>>>>
>>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>>> ---
>>>>  fs/btrfs/ctree.c | 13 ++-----------
>>>>  1 file changed, 2 insertions(+), 11 deletions(-)
>>>>
>>>> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
>>>> index cd1cd673bc0b..e204e1320745 100644
>>>> --- a/fs/btrfs/ctree.c
>>>> +++ b/fs/btrfs/ctree.c
>>>> @@ -1697,7 +1697,6 @@ static noinline int generic_bin_search(struct extent_buffer *eb,
>>>>  	}
>>>>  
>>>>  	while (low < high) {
>>>> -		unsigned long oip;
>>>>  		unsigned long offset;
>>>>  		struct btrfs_disk_key *tmp;
>>>>  		struct btrfs_disk_key unaligned;
>>>> @@ -1705,17 +1704,9 @@ static noinline int generic_bin_search(struct extent_buffer *eb,
>>>>  
>>>>  		mid = (low + high) / 2;
>>>>  		offset = p + mid * item_size;
>>>> -		oip = offset_in_page(offset);
>>>>  
>>>> -		if (oip + key_size <= PAGE_SIZE) {
>>>> -			const unsigned long idx = offset >> PAGE_SHIFT;
>>>> -			char *kaddr = page_address(eb->pages[idx]);
>>>> -
>>>> -			tmp = (struct btrfs_disk_key *)(kaddr + oip);
>>>> -		} else {
>>>> -			read_extent_buffer(eb, &unaligned, offset, key_size);
>>>> -			tmp = &unaligned;
>>>> -		}
>>>> +		read_extent_buffer(eb, &unaligned, offset, key_size);
>>>> +		tmp = &unaligned;
>>>
>>> Reading from the first page is a performance optimization on systems
>>> with 4K pages, ie. the majority. I'm not in favor removing it just to
>>> make the code look nicer.
>>
>> For 4K system, with the optimization it only saves one
>> read_extent_buffer() call cost.
> 
> This evaluation is wrong, you missed several things that
> generic_bin_search and read_extent_buffer do.
> 
> generic_bin_search is called very often, each search slot so
> optimization is worth here
> 
> read_extent_buffer is used _only_ for keys that cross page boundary, so
> we need to read the bytes in two steps and this is wrapped into a
> function that we call in a limited number of cases

Then to me, the better solution is to split read_extent_buffer() into
two parts.

Part 1 handles the same-page read and should be made inline.
Part 1 should be small enough, as it only involves the in-page offset
calculation, which is already done in the current generic_bin_search.

Part 2 then handles the cross-page case, and that part can stay a
function call.

Personally speaking, even though generic_bin_search() is a hot path, I
still don't believe it's worth it, as read_extent_buffer() itself is
also frequently called in other locations, and I have never seen special
handling for it anywhere else.

Anyway, I will use the get_eb_page_offset()/get_eb_page_index() macros
here first, or subpage will be completely screwed.

And then try the two-part solution for read_extent_buffer().

Thanks,
Qu

> 
> In all other cases, when the whole key is contained in the page the call
> is inline in generic_bin_search, ie. no function call overhead
> 
>> Or we will need to manually call get_eb_page_offset() here to make it
>> work for subpage.
> 
> For nodesize that is smaller than PAGE_SIZE there's no page crossing at
> all so using read_extent_buffer would be making things worse.
> 




* Re: [PATCH v2 04/19] btrfs: remove the open-code to read disk-key
  2020-09-17 13:15         ` Qu Wenruo
@ 2020-09-17 22:41           ` David Sterba
  2020-09-17 23:26             ` Qu Wenruo
  0 siblings, 1 reply; 58+ messages in thread
From: David Sterba @ 2020-09-17 22:41 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Thu, Sep 17, 2020 at 09:15:31PM +0800, Qu Wenruo wrote:
> Then to me, the better solution is to split read_extent_buffer() into
> two parts.
> 
> Part 1 handles the same-page read and should be made inline.
> Part 1 should be small enough, as it only involves the in-page offset
> calculation, which is already done in the current generic_bin_search.

Sounds easy, but the result is awful. The inlined part 1 is not that
small and is expanded for each call of read_extent_buffer: it explodes
the code size and increases the stack consumption of all callers.

> Then part 2 to handle the cross page case, and that part can be a
> function call.
> 
> Personally speaking, even though generic_bin_search() is a hot path, I
> still don't believe it's worth it, as read_extent_buffer() itself is
> also frequently called in other locations, and I have never seen special
> handling for it anywhere else.

The usage pattern is different: most other locations call
read_extent_buffer just once to read some data and process it. There are
very few functions that call it in a loop.

OTOH, bin_search jumps over the sorted array of node keys; it does not
even have to read the actual key for comparison, because it understands
the on-disk key and just sets a pointer. Calling read_extent_buffer for
each of them would just waste cycles copying the key into the tmp
variable.
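
The contrast can be sketched with a toy two-page buffer (all names here
are demo-only): the fast path just sets a pointer into the page, while
the generic path copies the requested range page by page, mirroring the
structure of read_extent_buffer's copy loop.

```c
#include <assert.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096UL

/*
 * Toy model of the two access patterns contrasted above, using two fake
 * pages instead of a real extent buffer.  An in-page key can be served
 * by a direct pointer (page base + offset) with no copy; the straddling
 * case needs this page-by-page copy loop.
 */
struct demo_eb { char *pages[2]; };

static void demo_read_eb(const struct demo_eb *eb, void *dstv,
			 unsigned long start, unsigned long len)
{
	unsigned long i = start / DEMO_PAGE_SIZE;
	unsigned long offset = start % DEMO_PAGE_SIZE;
	char *dst = dstv;

	while (len > 0) {
		unsigned long cur = len < DEMO_PAGE_SIZE - offset ?
				    len : DEMO_PAGE_SIZE - offset;

		memcpy(dst, eb->pages[i] + offset, cur);
		dst += cur;
		len -= cur;
		offset = 0;	/* later pages are read from their start */
		i++;
	}
}
```

For a key entirely inside one page both paths yield the same bytes, but
the pointer version avoids the copy into a temporary, which is the cycle
waste described above.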

> Anyway, I will use the get_eb_page_offset()/get_eb_page_index() macros
> here first, or subpage will be completely screwed.
> 
> And then try to use that two-part solution for read_extent_buffer().

Some numbers from a release build, patch below:

object size:

   text    data     bss     dec     hex filename
1099317   17972   14912 1132201  1146a9 pre/btrfs.ko
1165930   17972   14912 1198814  124ade post/btrfs.ko

DELTA: +66613

Stack usage meter:

send_clone                                                        +16 (128 -> 144)
btree_readpage_end_io_hook                                        +40 (168 -> 208)
btrfs_lookup_csum                                                  +8 (104 -> 112)
find_free_dev_extent_start                                         +8 (144 -> 152)
__btrfs_commit_inode_delayed_items                                 +8 (168 -> 176)
btrfs_exclude_logged_extents                                       +8 (72 -> 80)
btrfs_set_inode_index                                             +16 (88 -> 104)
btrfs_shrink_device                                                +8 (160 -> 168)
find_parent_nodes                                                  -8 (312 -> 304)
__add_to_free_space_tree                                          +16 (112 -> 128)
btrfs_truncate_inode_items                                         -8 (360 -> 352)
ref_get_fields                                                    +16 (48 -> 64)
btrfs_qgroup_trace_leaf_items                                      +8 (80 -> 88)
did_create_dir                                                     +8 (112 -> 120)
free_space_next_bitmap                                            +32 (56 -> 88)
btrfs_lookup_bio_sums                                             +24 (216 -> 240)
btrfs_read_qgroup_config                                           +8 (120 -> 128)
btrfs_check_ref_name_override                                     +16 (152 -> 168)
btrfs_uuid_tree_iterate                                            +8 (128 -> 136)
log_dir_items                                                     +16 (160 -> 176)
btrfs_ioctl_send                                                  +16 (216 -> 232)
btrfs_get_parent                                                  +16 (80 -> 96)
__btrfs_inc_extent_ref                                             +8 (128 -> 136)
btrfs_unlink_subvol                                               +16 (144 -> 160)
btrfs_del_csums                                                    +8 (184 -> 192)
btrfs_mount                                                       -16 (184 -> 168)
generic_bin_search                                                 +8 (104 -> 112)
btrfs_uuid_tree_add                                               +16 (128 -> 144)
free_space_test_bit                                                +8 (72 -> 80)
btrfs_init_dev_stats                                              +16 (160 -> 176)
btrfs_read_chunk_tree                                             +48 (208 -> 256)
process_all_refs                                                  +16 (104 -> 120)
... and this goes on

LOST (80):
        btrfs_ioctl_setflags                                       80

NEW (208):
        __read_extent_buffer                                       24
        get_order                                                   8
        btrfs_search_path_in_tree_user                            176
LOST/NEW DELTA:     +128
PRE/POST DELTA:    +1944

---

Here's the patch. I'm now quite sure we don't want to split
read_extent_buffer, and we should keep the bin_search optimization as is.

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index afac70ef0cc5..77c1df5771bf 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5584,7 +5584,7 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num)
 	return ret;
 }
 
-static bool report_eb_range(const struct extent_buffer *eb, unsigned long start,
+bool report_eb_range(const struct extent_buffer *eb, unsigned long start,
 			    unsigned long len)
 {
 	btrfs_warn(eb->fs_info,
@@ -5595,45 +5595,17 @@ static bool report_eb_range(const struct extent_buffer *eb, unsigned long start,
 	return true;
 }
 
-/*
- * Check if the [start, start + len) range is valid before reading/writing
- * the eb.
- * NOTE: @start and @len are offset inside the eb, not logical address.
- *
- * Caller should not touch the dst/src memory if this function returns error.
- */
-static inline int check_eb_range(const struct extent_buffer *eb,
-				 unsigned long start, unsigned long len)
-{
-	unsigned long offset;
-
-	/* start, start + len should not go beyond eb->len nor overflow */
-	if (unlikely(check_add_overflow(start, len, &offset) || offset > eb->len))
-		return report_eb_range(eb, start, len);
-
-	return false;
-}
-
-void read_extent_buffer(const struct extent_buffer *eb, void *dstv,
+void __read_extent_buffer(const struct extent_buffer *eb, void *dstv,
 			unsigned long start, unsigned long len)
 {
-	size_t cur;
-	size_t offset;
-	struct page *page;
-	char *kaddr;
+	unsigned long offset = offset_in_page(start);
 	char *dst = (char *)dstv;
 	unsigned long i = start >> PAGE_SHIFT;
 
-	if (check_eb_range(eb, start, len))
-		return;
-
-	offset = offset_in_page(start);
-
 	while (len > 0) {
-		page = eb->pages[i];
+		const char *kaddr = page_address(eb->pages[i]);
+		size_t cur = min(len, (PAGE_SIZE - offset));
 
-		cur = min(len, (PAGE_SIZE - offset));
-		kaddr = page_address(page);
 		memcpy(dst, kaddr + offset, cur);
 
 		dst += cur;
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 3bbc25b816ea..7ea53794f927 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -241,9 +241,57 @@ static inline int extent_buffer_uptodate(const struct extent_buffer *eb)
 
 int memcmp_extent_buffer(const struct extent_buffer *eb, const void *ptrv,
 			 unsigned long start, unsigned long len);
+/* NEW */
+
+bool report_eb_range(const struct extent_buffer *eb, unsigned long start,
+			    unsigned long len);
+void __read_extent_buffer(const struct extent_buffer *eb, void *dst,
+			unsigned long start,
+			unsigned long len);
+/*
+ * Check if the [start, start + len) range is valid before reading/writing
+ * the eb.
+ * NOTE: @start and @len are offset inside the eb, not logical address.
+ *
+ * Caller should not touch the dst/src memory if this function returns error.
+ */
+static inline int check_eb_range(const struct extent_buffer *eb,
+				 unsigned long start, unsigned long len)
+{
+	unsigned long offset;
+
+	/* start, start + len should not go beyond eb->len nor overflow */
+	if (unlikely(check_add_overflow(start, len, &offset) || offset > eb->len))
+		return report_eb_range(eb, start, len);
+
+	return false;
+}
+
+static inline void read_extent_buffer(const struct extent_buffer *eb, void *dstv,
+				      unsigned long start, unsigned long len)
+{
+	const unsigned long oip = offset_in_page(start);
+
+	if (check_eb_range(eb, start, len))
+		return;
+
+	if (likely(oip + len <= PAGE_SIZE)) {
+		const unsigned long idx = start >> PAGE_SHIFT;
+		const char *kaddr = page_address(eb->pages[idx]);
+
+		memcpy(dstv, kaddr + oip, len);
+		return;
+	}
+
+	__read_extent_buffer(eb, dstv, start, len);
+}
+
+/* END */
+/*
 void read_extent_buffer(const struct extent_buffer *eb, void *dst,
 			unsigned long start,
 			unsigned long len);
+*/
 int read_extent_buffer_to_user_nofault(const struct extent_buffer *eb,
 				       void __user *dst, unsigned long start,
 				       unsigned long len);


* Re: [PATCH v2 04/19] btrfs: remove the open-code to read disk-key
  2020-09-17 22:41           ` David Sterba
@ 2020-09-17 23:26             ` Qu Wenruo
  0 siblings, 0 replies; 58+ messages in thread
From: Qu Wenruo @ 2020-09-17 23:26 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs




On 2020/9/18 6:41 AM, David Sterba wrote:
> On Thu, Sep 17, 2020 at 09:15:31PM +0800, Qu Wenruo wrote:
>> Then to me, the better solution is to make read_extent_buffer() to be
>> split into two part.
>>
>> Part 1 to handle the same page read, which should be made inline.
>> The part 1 should be small enough, as it only involves the in-page
>> offset calculation, which is also already done in current
>> generic_bin_search.
> 
> Sounds easy, the result is awful. The inlined part 1 is not that small
> and explodes for each call of read_extent_buffer. Explodes in code size,
> increases stack consumption of all callers.
> 
>> Then part 2 to handle the cross page case, and that part can be a
>> function call.
>>
>> Personally speaking, even generic_bin_search() is a hot-path, I still
>> don't believe it's worthy, as read_extent_buffer() itself is also
>> frequently called in other locations, and I never see a special handling
>> for it in any other location.
> 
> The usage pattern is different, many other location calls
> read_extent_buffer just once to read some data and process. There are
> very few functions that call it in a loop.
> 
> OTOH, bin_search jumps over the sorted array of node keys, it does not
> even have to read the actual key for comparison because it understands
> the on-disk key and just sets the pointer. Calling read_extent_buffer
> for each of them will just waste cycles copying it to the tmp variable.
> 
>> Anyway, I will use the get_eb_page_offset()/get_eb_page_index() macros
>> here first, or subpage will be completely screwed.
>>
>> And then try to use that two-part solution for read_extent_buffer().
> 
> Some numbers from a release build, patch below:
> 
> object size:
> 
>    text    data     bss     dec     hex filename
> 1099317   17972   14912 1132201  1146a9 pre/btrfs.ko
> 1165930   17972   14912 1198814  124ade post/btrfs.ko
> 
> DELTA: +66613
> 
> Stack usage meter:
> 
> send_clone                                                        +16 (128 -> 144)
> btree_readpage_end_io_hook                                        +40 (168 -> 208)
> btrfs_lookup_csum                                                  +8 (104 -> 112)
> find_free_dev_extent_start                                         +8 (144 -> 152)
> __btrfs_commit_inode_delayed_items                                 +8 (168 -> 176)
> btrfs_exclude_logged_extents                                       +8 (72 -> 80)
> btrfs_set_inode_index                                             +16 (88 -> 104)
> btrfs_shrink_device                                                +8 (160 -> 168)
> find_parent_nodes                                                  -8 (312 -> 304)
> __add_to_free_space_tree                                          +16 (112 -> 128)
> btrfs_truncate_inode_items                                         -8 (360 -> 352)
> ref_get_fields                                                    +16 (48 -> 64)
> btrfs_qgroup_trace_leaf_items                                      +8 (80 -> 88)
> did_create_dir                                                     +8 (112 -> 120)
> free_space_next_bitmap                                            +32 (56 -> 88)
> btrfs_lookup_bio_sums                                             +24 (216 -> 240)
> btrfs_read_qgroup_config                                           +8 (120 -> 128)
> btrfs_check_ref_name_override                                     +16 (152 -> 168)
> btrfs_uuid_tree_iterate                                            +8 (128 -> 136)
> log_dir_items                                                     +16 (160 -> 176)
> btrfs_ioctl_send                                                  +16 (216 -> 232)
> btrfs_get_parent                                                  +16 (80 -> 96)
> __btrfs_inc_extent_ref                                             +8 (128 -> 136)
> btrfs_unlink_subvol                                               +16 (144 -> 160)
> btrfs_del_csums                                                    +8 (184 -> 192)
> btrfs_mount                                                       -16 (184 -> 168)
> generic_bin_search                                                 +8 (104 -> 112)
> btrfs_uuid_tree_add                                               +16 (128 -> 144)
> free_space_test_bit                                                +8 (72 -> 80)
> btrfs_init_dev_stats                                              +16 (160 -> 176)
> btrfs_read_chunk_tree                                             +48 (208 -> 256)
> process_all_refs                                                  +16 (104 -> 120)
> ... and this goes on
> 
> LOST (80):
>         btrfs_ioctl_setflags                                       80
> 
> NEW (208):
>         __read_extent_buffer                                       24
>         get_order                                                   8
>         btrfs_search_path_in_tree_user                            176
> LOST/NEW DELTA:     +128
> PRE/POST DELTA:    +1944
> 
> ---
> 
> Here's the patch. I'm now quite sure we don't want to split
> read_extent_buffer and keep the bin_search optimization as is.

Fine, I'll change the patch to use get_eb_page_offset/index() in
generic_bin_search().

> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index afac70ef0cc5..77c1df5771bf 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -5584,7 +5584,7 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num)
>  	return ret;
>  }
>  
> -static bool report_eb_range(const struct extent_buffer *eb, unsigned long start,
> +bool report_eb_range(const struct extent_buffer *eb, unsigned long start,
>  			    unsigned long len)
>  {
>  	btrfs_warn(eb->fs_info,
> @@ -5595,45 +5595,17 @@ static bool report_eb_range(const struct extent_buffer *eb, unsigned long start,
>  	return true;
>  }
>  
> -/*
> - * Check if the [start, start + len) range is valid before reading/writing
> - * the eb.
> - * NOTE: @start and @len are offset inside the eb, not logical address.
> - *
> - * Caller should not touch the dst/src memory if this function returns error.
> - */
> -static inline int check_eb_range(const struct extent_buffer *eb,
> -				 unsigned long start, unsigned long len)
> -{
> -	unsigned long offset;
> -
> -	/* start, start + len should not go beyond eb->len nor overflow */
> -	if (unlikely(check_add_overflow(start, len, &offset) || offset > eb->len))
> -		return report_eb_range(eb, start, len);
> -
> -	return false;
> -}
> -
> -void read_extent_buffer(const struct extent_buffer *eb, void *dstv,
> +void __read_extent_buffer(const struct extent_buffer *eb, void *dstv,
>  			unsigned long start, unsigned long len)
>  {
> -	size_t cur;
> -	size_t offset;
> -	struct page *page;
> -	char *kaddr;
> +	unsigned long offset = offset_in_page(start);
>  	char *dst = (char *)dstv;
>  	unsigned long i = start >> PAGE_SHIFT;
>  
> -	if (check_eb_range(eb, start, len))
> -		return;
> -
> -	offset = offset_in_page(start);
> -
>  	while (len > 0) {
> -		page = eb->pages[i];
> +		const char *kaddr = page_address(eb->pages[i]);
> +		size_t cur = min(len, (PAGE_SIZE - offset));
>  
> -		cur = min(len, (PAGE_SIZE - offset));
> -		kaddr = page_address(page);
>  		memcpy(dst, kaddr + offset, cur);
>  
>  		dst += cur;
> diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
> index 3bbc25b816ea..7ea53794f927 100644
> --- a/fs/btrfs/extent_io.h
> +++ b/fs/btrfs/extent_io.h
> @@ -241,9 +241,57 @@ static inline int extent_buffer_uptodate(const struct extent_buffer *eb)
>  
>  int memcmp_extent_buffer(const struct extent_buffer *eb, const void *ptrv,
>  			 unsigned long start, unsigned long len);
> +/* NEW */
> +
> +bool report_eb_range(const struct extent_buffer *eb, unsigned long start,
> +			    unsigned long len);
> +void __read_extent_buffer(const struct extent_buffer *eb, void *dst,
> +			unsigned long start,
> +			unsigned long len);
> +/*
> + * Check if the [start, start + len) range is valid before reading/writing
> + * the eb.
> + * NOTE: @start and @len are offset inside the eb, not logical address.
> + *
> + * Caller should not touch the dst/src memory if this function returns error.
> + */
> +static inline int check_eb_range(const struct extent_buffer *eb,
> +				 unsigned long start, unsigned long len)
> +{
> +	unsigned long offset;
> +
> +	/* start, start + len should not go beyond eb->len nor overflow */
> +	if (unlikely(check_add_overflow(start, len, &offset) || offset > eb->len))
> +		return report_eb_range(eb, start, len);
> +
> +	return false;
> +}
> +
> +static inline void read_extent_buffer(const struct extent_buffer *eb, void *dstv,
> +				      unsigned long start, unsigned long len)
> +{
> +	const unsigned long oip = offset_in_page(start);
> +
> +	if (check_eb_range(eb, start, len))
> +		return;
> +
> +	if (likely(oip + len <= PAGE_SIZE)) {
> +		const unsigned long idx = start >> PAGE_SHIFT;
> +		const char *kaddr = page_address(eb->pages[idx]);
> +
> +		memcpy(dstv, kaddr + oip, len);
> +		return;
> +	}
> +
> +	__read_extent_buffer(eb, dstv, start, len);
> +}
> +
> +/* END */
> +/*
>  void read_extent_buffer(const struct extent_buffer *eb, void *dst,
>  			unsigned long start,
>  			unsigned long len);
> +*/
>  int read_extent_buffer_to_user_nofault(const struct extent_buffer *eb,
>  				       void __user *dst, unsigned long start,
>  				       unsigned long len);
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 14/19] btrfs: make btree inode io_tree has its special owner
  2020-09-17 12:50       ` David Sterba
@ 2020-09-18  8:18         ` Qu Wenruo
  2020-09-22 14:06           ` David Sterba
  0 siblings, 1 reply; 58+ messages in thread
From: Qu Wenruo @ 2020-09-18  8:18 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs

[-- Attachment #1.1: Type: text/plain, Size: 2099 bytes --]



On 2020/9/17 8:50 PM, David Sterba wrote:
> On Thu, Sep 17, 2020 at 08:02:05AM +0800, Qu Wenruo wrote:
>>
>>
>>> On 2020/9/17 12:06 AM, David Sterba wrote:
>>> On Tue, Sep 15, 2020 at 01:35:27PM +0800, Qu Wenruo wrote:
>>>> The btree inode is pretty special compared to all other inode extent
>>>> io trees: although it has a btrfs inode, it doesn't have the
>>>> track_uptodate bit at all.
>>>>
>>>> This means a lot of things, like extent locking, don't even need to be
>>>> applied to the btree io tree.
>>>>
>>>> Since it's so special, add a new owner value for it to make debugging
>>>> a little easier.
>>>>
>>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>>> ---
>>>>  fs/btrfs/disk-io.c        | 2 +-
>>>>  fs/btrfs/extent-io-tree.h | 1 +
>>>>  2 files changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>>> index 1ba16951ccaa..82a841bd0702 100644
>>>> --- a/fs/btrfs/disk-io.c
>>>> +++ b/fs/btrfs/disk-io.c
>>>> @@ -2126,7 +2126,7 @@ static void btrfs_init_btree_inode(struct btrfs_fs_info *fs_info)
>>>>  
>>>>  	RB_CLEAR_NODE(&BTRFS_I(inode)->rb_node);
>>>>  	extent_io_tree_init(fs_info, &BTRFS_I(inode)->io_tree,
>>>> -			    IO_TREE_INODE_IO, inode);
>>>> +			    IO_TREE_BTREE_IO, inode);
>>>
>>> This looks like an independent patch, so it could be taken separately.
>>>
>> Errr, why?
>>
>> Adding a new owner for the btree inode io tree and utilizing that new
>> owner in the same patch looks completely sane to me.
>>
>> Or did I miss something?
> 
> The IO_TREE_* ids are only for debugging, and IO_TREE_INODE_IO is
> supposed to be used for data inodes. But the btree_inode has that too,
> which does not seem to follow the purpose of the tree id. You fix that
> in this patch, and it applies to the current code too.
> 
Oh, I didn't see it as a fix. But your point still stands, and makes sense.

And this patch would be the foundation for the later btree-io-tree-specific
re-mapping bits.
So if this patch can be applied to the current code, it would only be a
good thing.

Thanks,
Qu


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 14/19] btrfs: make btree inode io_tree has its special owner
  2020-09-18  8:18         ` Qu Wenruo
@ 2020-09-22 14:06           ` David Sterba
  0 siblings, 0 replies; 58+ messages in thread
From: David Sterba @ 2020-09-22 14:06 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Fri, Sep 18, 2020 at 04:18:27PM +0800, Qu Wenruo wrote:
> >>> This looks like an independent patch, so it could be taken separately.
> >>>
> >> Errr, why?
> >>
> >> Adding a new owner for the btree inode io tree and utilizing that new
> >> owner in the same patch looks completely sane to me.
> >>
> >> Or did I miss something?
> > 
> > The IO_TREE_* ids are only for debugging, and IO_TREE_INODE_IO is
> > supposed to be used for data inodes. But the btree_inode has that too,
> > which does not seem to follow the purpose of the tree id. You fix that
> > in this patch, and it applies to the current code too.
> > 
> Oh, I didn't see it as a fix. But your point still stands, and makes sense.
> 
> And this patch would be the foundation for the later btree-io-tree-specific
> re-mapping bits.
> So if this patch can be applied to the current code, it would only be a
> good thing.

Thanks for confirming, I'll add it to misc-next.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v2 14/19] btrfs: make btree inode io_tree has its special owner
  2020-09-15  5:35 ` [PATCH v2 14/19] btrfs: make btree inode io_tree has its special owner Qu Wenruo
  2020-09-16  9:28   ` Johannes Thumshirn
  2020-09-16 16:06   ` David Sterba
@ 2020-09-22 14:14   ` David Sterba
  2 siblings, 0 replies; 58+ messages in thread
From: David Sterba @ 2020-09-22 14:14 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Sep 15, 2020 at 01:35:27PM +0800, Qu Wenruo wrote:
> The btree inode is pretty special compared to all other inode extent
> io trees: although it has a btrfs inode, it doesn't have the
> track_uptodate bit at all.
> 
> This means a lot of things, like extent locking, don't even need to be
> applied to the btree io tree.
> 
> Since it's so special, add a new owner value for it to make debugging
> a little easier.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/disk-io.c        | 2 +-
>  fs/btrfs/extent-io-tree.h | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 1ba16951ccaa..82a841bd0702 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2126,7 +2126,7 @@ static void btrfs_init_btree_inode(struct btrfs_fs_info *fs_info)
>  
>  	RB_CLEAR_NODE(&BTRFS_I(inode)->rb_node);
>  	extent_io_tree_init(fs_info, &BTRFS_I(inode)->io_tree,
> -			    IO_TREE_INODE_IO, inode);
> +			    IO_TREE_BTREE_IO, inode);
>  	BTRFS_I(inode)->io_tree.track_uptodate = false;
>  	extent_map_tree_init(&BTRFS_I(inode)->extent_tree);
>  
> diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
> index 219a09a2b734..21d128383bfd 100644
> --- a/fs/btrfs/extent-io-tree.h
> +++ b/fs/btrfs/extent-io-tree.h
> @@ -40,6 +40,7 @@ struct io_failure_record;
>  enum {
>  	IO_TREE_FS_PINNED_EXTENTS,
>  	IO_TREE_FS_EXCLUDED_EXTENTS,
> +	IO_TREE_BTREE_IO,

I've renamed it to IO_TREE_BTREE_INODE_IO, and btw don't forget to check
whether the enums are exported to tracepoints. This one is, and it needs
to be added there as well.

>  	IO_TREE_INODE_IO,
>  	IO_TREE_INODE_IO_FAILURE,
>  	IO_TREE_RELOC_BLOCKS,
> -- 
> 2.28.0

^ permalink raw reply	[flat|nested] 58+ messages in thread

end of thread, back to index

Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-15  5:35 [PATCH v2 00/19] btrfs: add read-only support for subpage sector size Qu Wenruo
2020-09-15  5:35 ` [PATCH v2 01/19] btrfs: extent-io-tests: remove invalid tests Qu Wenruo
2020-09-15  5:35 ` [PATCH v2 02/19] btrfs: remove the unnecessary parameter @start and @len for check_data_csum() Qu Wenruo
2020-09-15  8:39   ` Johannes Thumshirn
2020-09-15  5:35 ` [PATCH v2 03/19] btrfs: calculate inline extent buffer page size based on page size Qu Wenruo
2020-09-15  8:35   ` Nikolay Borisov
2020-09-15 10:05     ` Qu Wenruo
2020-09-15  8:40   ` Johannes Thumshirn
2020-09-15  5:35 ` [PATCH v2 04/19] btrfs: remove the open-code to read disk-key Qu Wenruo
2020-09-15  8:36   ` Nikolay Borisov
2020-09-15  8:40   ` Johannes Thumshirn
2020-09-16 16:01   ` David Sterba
2020-09-17  8:02     ` Qu Wenruo
2020-09-17 12:37       ` David Sterba
2020-09-17 13:15         ` Qu Wenruo
2020-09-17 22:41           ` David Sterba
2020-09-17 23:26             ` Qu Wenruo
2020-09-15  5:35 ` [PATCH v2 05/19] btrfs: make btrfs_fs_info::buffer_radix to take sector size devided values Qu Wenruo
2020-09-15  8:27   ` Johannes Thumshirn
2020-09-15 10:04     ` Qu Wenruo
2020-09-15 10:12       ` Johannes Thumshirn
2020-09-15 17:40   ` kernel test robot
2020-09-15  5:35 ` [PATCH v2 06/19] btrfs: don't allow tree block to cross page boundary for subpage support Qu Wenruo
2020-09-15  8:37   ` Nikolay Borisov
2020-09-15 10:06     ` Qu Wenruo
2020-09-15  8:44   ` Johannes Thumshirn
2020-09-15  5:35 ` [PATCH v2 07/19] btrfs: update num_extent_pages() to support subpage sized extent buffer Qu Wenruo
2020-09-15  8:42   ` Johannes Thumshirn
2020-09-15 10:07     ` Qu Wenruo
2020-09-15 10:12       ` Johannes Thumshirn
2020-09-15 10:07     ` Qu Wenruo
2020-09-15  5:35 ` [PATCH v2 08/19] btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors Qu Wenruo
2020-09-15  5:35 ` [PATCH v2 09/19] btrfs: make csum_tree_block() handle sectorsize smaller than page size Qu Wenruo
2020-09-15  8:47   ` Johannes Thumshirn
2020-09-15  5:35 ` [PATCH v2 10/19] btrfs: add assert_spin_locked() for attach_extent_buffer_page() Qu Wenruo
2020-09-15  8:52   ` Johannes Thumshirn
2020-09-15  5:35 ` [PATCH v2 11/19] btrfs: extract the extent buffer verification from btree_readpage_end_io_hook() Qu Wenruo
2020-09-15  5:35 ` [PATCH v2 12/19] btrfs: extent_io: only require sector size alignment for page read Qu Wenruo
2020-09-15  5:35 ` [PATCH v2 13/19] btrfs: make btrfs_readpage_end_io_hook() follow sector size Qu Wenruo
2020-09-15  5:35 ` [PATCH v2 14/19] btrfs: make btree inode io_tree has its special owner Qu Wenruo
2020-09-16  9:28   ` Johannes Thumshirn
2020-09-16 16:06   ` David Sterba
2020-09-17  0:02     ` Qu Wenruo
2020-09-17 12:50       ` David Sterba
2020-09-18  8:18         ` Qu Wenruo
2020-09-22 14:06           ` David Sterba
2020-09-22 14:14   ` David Sterba
2020-09-15  5:35 ` [PATCH v2 15/19] btrfs: don't set extent_io_tree bits for btree inode at endio time Qu Wenruo
2020-09-15  5:35 ` [PATCH v2 16/19] btrfs: use extent_io_tree to handle subpage extent buffer allocation Qu Wenruo
2020-09-15  5:35 ` [PATCH v2 17/19] btrfs: implement subpage metadata read and its endio function Qu Wenruo
2020-09-16  8:47   ` kernel test robot
2020-09-15  5:35 ` [PATCH v2 18/19] btrfs: implement btree_readpage() and try_release_extent_buffer() for subpage Qu Wenruo
2020-09-15  5:35 ` [PATCH v2 19/19] btrfs: allow RO mount of 4K sector size fs on 64K page system Qu Wenruo
2020-09-16  1:35 ` [PATCH v2 00/19] btrfs: add read-only support for subpage sector size Qu Wenruo
2020-09-16 16:18 ` Neal Gompa
2020-09-17  0:03   ` Qu Wenruo
2020-09-17  0:13     ` Neal Gompa
2020-09-17  0:24       ` Qu Wenruo
