[PATCH v2 0/3] btrfs: support more pages sizes for the subpage support

linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v2 0/3] btrfs: support more pages sizes for the subpage support
@ 2022-01-11  2:34 Qu Wenruo
  2022-01-11  2:34 ` [PATCH v2 1/3] btrfs: use dummy extent buffer for super block sys chunk array read Qu Wenruo
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Qu Wenruo @ 2022-01-11  2:34 UTC (permalink / raw)
  To: linux-btrfs

The series can be fetched from github:
https://github.com/adam900710/linux/tree/metadata_subpage_switch

Previously we only support 64K page size with 4K sector size for subpage
support.

There are two reasons for such requirement:

- Make sure no matter what the nodesize is, metadata can be fit into one
  page
  This is to avoid multi-page metadata handling.

- We use u16 as bitmap
  That means we will waste extra memory for smaller page sizes.

The 2nd problem is already solved by compacting all bitmap into one
large bitmap, in commit 72a69cd03082 ("btrfs: subpage: pack all subpage
bitmaps into a larger bitmap").

And this patchset will address the first problem by going to non-subpage
routine if nodesize >= PAGE_SIZE.

This will still leave a small limitation, that for nodesize >= PAGE_SIZE
and sectorsize < PAGE_SIZE case, we can not allow a tree block to cross
page boundary.

Thankfully we have btrfs-check to detect such problem, and mkfs and
kernel chunk allocator have already ensured all our metadata will not
cross such page boundaries.

The following combinations has been tested:
(p: page_size s: sectorsize, n: nodesize)

- p=64K s=4K n=64K (aarch64, RK3399/CM4)
  To cover the new metadata path

- p=64K s=4K n=16K (aarch64, RK3399/CM4)
- p=4k s=4k n=16k (x86_64)
  The make sure no new bugs in the old path

- p=16K s=4K n=16K (aarch64, M1)
- p=16K s=4K n=64K (aarch64, M1)
  Special thanks to Su Yue. He contributes his VM on M1 to me to do
  extra tests, and exposed a bug in the btrfs_read_sys_array() affecting
  16K page size cases.

Changelog:
RFC->v1:
- Remove one ASSERT() which is causing false alert
  As we have no way to distinguish unmapped extent buffer with anonymous
  page used by DIO/compression.

- Expand the subpage support to any PAGE_SIZE > 4K
  There is still a limitation on not allowing metadata block crossing page
  boundaries, but that should already be rare nowadays.

  Another limit is we still don't support 256K page size due to it's
  beyond max compressed extent size.

v2:
- Add extra supported sectorsizes in sysfs interface
  Now for page size > 4K, all sector sizes up to page size will be
  supported.

- Fix a bug in btrfs_read_sys_array() that would cause false alert
  Now btrfs_read_sys_array() will use dummy eb, and
  set/clear_extent_buffer_uptodate() will handle both dummy and subpage
  cases properly.

- Fix a bug in check_eb_alignment() that would cause false alert
  Since we handle nodesize >= PAGE_SIZE cases with the same page based
  metadata routine, we need to make sure metadata is page aligned for
  that case.

Qu Wenruo (3):
  btrfs: use dummy extent buffer for super block sys chunk array read
  btrfs: make nodesize >= PAGE_SIZE case to reuse the non-subpage
    routine
  btrfs: expand subpage support to any PAGE_SIZE > 4K

 fs/btrfs/disk-io.c   |  20 ++++++---
 fs/btrfs/extent_io.c | 102 ++++++++++++++++++++++++++++---------------
 fs/btrfs/inode.c     |   2 +-
 fs/btrfs/subpage.c   |  30 ++++++-------
 fs/btrfs/subpage.h   |  25 +++++++++++
 fs/btrfs/sysfs.c     |  11 +++--
 fs/btrfs/volumes.c   |  27 ++----------
 7 files changed, 130 insertions(+), 87 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v2 1/3] btrfs: use dummy extent buffer for super block sys chunk array read
  2022-01-11  2:34 [PATCH v2 0/3] btrfs: support more pages sizes for the subpage support Qu Wenruo
@ 2022-01-11  2:34 ` Qu Wenruo
  2022-01-11  9:56   ` Qu Wenruo
  2022-01-11  2:34 ` [PATCH v2 2/3] btrfs: make nodesize >= PAGE_SIZE case to reuse the non-subpage routine Qu Wenruo
  2022-01-11  2:34 ` [PATCH v2 3/3] btrfs: expand subpage support to any PAGE_SIZE > 4K Qu Wenruo
  2 siblings, 1 reply; 5+ messages in thread
From: Qu Wenruo @ 2022-01-11  2:34 UTC (permalink / raw)
  To: linux-btrfs

[BUG]
During test for 16K page size with 4K sectorsize and 64K nodesize, btrfs
will reject the mount with the following dmesg:

 BTRFS warning (device dm-1): read-write for sector size 4096 with page size 16384 is experimental
 BTRFS error (device dm-1): tree block crosses page boundary, start 65536 nodesize 65536
 BTRFS error (device dm-1): failed to read the system array: -22
 BTRFS error (device dm-1): open_ctree failed

The rejection only happens with sectorsize=4K, page_size=16K and
nodesize=64K combination.

For sectorsize=4K, page_size=16K and nodesize=16K (default) case, it
works fine.

[CAUSE]
The problem is in how we read sys chunk array.

In function btrfs_read_sys_array(), we allocate an extent buffer and
copy super block into the extent buffer, so that we can use various
extent buffer helpers to do the work.

But the problem is, we're calling btrfs_find_create_tree_block(), which
will do all the validation check, including the page boundary check for
subpage cases.

[FIX]
In fact, we only need a dummy extent buffer, without all the checks for
a real extent buffer.

This patch will replace the btrfs_find_create_tree_block() call with
__alloc_dummy_extent_buffer().

By this we can:

- Set the extent buffer size to BTRFS_SUPER_INFO_SIZE
- Avoid the unnecessary validation checks

Also since we're here, remove some stale comments on setting the eb page
uptodate, as now set_extent_buffer_uptodate() can handle dummy eb cases
without any problem.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/volumes.c | 27 ++++-----------------------
 1 file changed, 4 insertions(+), 23 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index b07d382d53a8..00c7f5cf4b52 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -7373,7 +7373,6 @@ static int read_one_dev(struct extent_buffer *leaf,
 
 int btrfs_read_sys_array(struct btrfs_fs_info *fs_info)
 {
-	struct btrfs_root *root = fs_info->tree_root;
 	struct btrfs_super_block *super_copy = fs_info->super_copy;
 	struct extent_buffer *sb;
 	struct btrfs_disk_key *disk_key;
@@ -7388,31 +7387,13 @@ int btrfs_read_sys_array(struct btrfs_fs_info *fs_info)
 	u64 type;
 	struct btrfs_key key;
 
-	ASSERT(BTRFS_SUPER_INFO_SIZE <= fs_info->nodesize);
-	/*
-	 * This will create extent buffer of nodesize, superblock size is
-	 * fixed to BTRFS_SUPER_INFO_SIZE. If nodesize > sb size, this will
-	 * overallocate but we can keep it as-is, only the first page is used.
-	 */
-	sb = btrfs_find_create_tree_block(fs_info, BTRFS_SUPER_INFO_OFFSET,
-					  root->root_key.objectid, 0);
+	ASSERT(BTRFS_SUPER_INFO_SIZE <= fs_info->nodesize &&
+	       BTRFS_SUPER_INFO_SIZE <= PAGE_SIZE);
+	sb = __alloc_dummy_extent_buffer(fs_info, BTRFS_SUPER_INFO_OFFSET,
+					 BTRFS_SUPER_INFO_SIZE);
 	if (IS_ERR(sb))
 		return PTR_ERR(sb);
 	set_extent_buffer_uptodate(sb);
-	/*
-	 * The sb extent buffer is artificial and just used to read the system array.
-	 * set_extent_buffer_uptodate() call does not properly mark all it's
-	 * pages up-to-date when the page is larger: extent does not cover the
-	 * whole page and consequently check_page_uptodate does not find all
-	 * the page's extents up-to-date (the hole beyond sb),
-	 * write_extent_buffer then triggers a WARN_ON.
-	 *
-	 * Regular short extents go through mark_extent_buffer_dirty/writeback cycle,
-	 * but sb spans only this function. Add an explicit SetPageUptodate call
-	 * to silence the warning eg. on PowerPC 64.
-	 */
-	if (PAGE_SIZE > BTRFS_SUPER_INFO_SIZE)
-		SetPageUptodate(sb->pages[0]);
 
 	write_extent_buffer(sb, super_copy, 0, BTRFS_SUPER_INFO_SIZE);
 	array_size = btrfs_super_sys_array_size(super_copy);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v2 2/3] btrfs: make nodesize >= PAGE_SIZE case to reuse the non-subpage routine
  2022-01-11  2:34 [PATCH v2 0/3] btrfs: support more pages sizes for the subpage support Qu Wenruo
  2022-01-11  2:34 ` [PATCH v2 1/3] btrfs: use dummy extent buffer for super block sys chunk array read Qu Wenruo
@ 2022-01-11  2:34 ` Qu Wenruo
  2022-01-11  2:34 ` [PATCH v2 3/3] btrfs: expand subpage support to any PAGE_SIZE > 4K Qu Wenruo
  2 siblings, 0 replies; 5+ messages in thread
From: Qu Wenruo @ 2022-01-11  2:34 UTC (permalink / raw)
  To: linux-btrfs

The reason why we only support 64K page size for subpage is, for 64K
page size we can ensure no matter what the nodesize is, we can fit it
into one page.

When other page size comes, especially like 16K, the limitation is a bit
blockage.

To remove such limitation, we allow nodesize >= PAGE_SIZE case to go
the non-subpage routine.
By this, we can allow 4K sectorsize on 16K page size.

Although this introduces another smaller limitation, the metadata can
not cross page boundary, which is already met by most recent mkfs.

Another small improvement is, we can avoid the overhead for metadata if
nodesize >= PAGE_SIZE.
For 4K sector size and 64K page size/node size, or 4K sector size and
16K page size/node size, we don't need to allocate extra memory for the
metadata pages.

Please note that, this patch will not yet enable other page size support
yet.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c   |   4 +-
 fs/btrfs/extent_io.c | 102 ++++++++++++++++++++++++++++---------------
 fs/btrfs/inode.c     |   2 +-
 fs/btrfs/subpage.c   |  30 ++++++-------
 fs/btrfs/subpage.h   |  25 +++++++++++
 5 files changed, 110 insertions(+), 53 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 87a5addbedf6..884e0b543136 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -505,7 +505,7 @@ static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct bio_vec *bvec
 	u64 found_start;
 	struct extent_buffer *eb;
 
-	if (fs_info->sectorsize < PAGE_SIZE)
+	if (fs_info->nodesize < PAGE_SIZE)
 		return csum_dirty_subpage_buffers(fs_info, bvec);
 
 	eb = (struct extent_buffer *)page->private;
@@ -690,7 +690,7 @@ int btrfs_validate_metadata_buffer(struct btrfs_bio *bbio,
 
 	ASSERT(page->private);
 
-	if (btrfs_sb(page->mapping->host->i_sb)->sectorsize < PAGE_SIZE)
+	if (btrfs_sb(page->mapping->host->i_sb)->nodesize < PAGE_SIZE)
 		return validate_subpage_buffer(page, start, end, mirror);
 
 	eb = (struct extent_buffer *)page->private;
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d6d48ecf823c..55740c3ce0e3 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2710,7 +2710,7 @@ static void end_page_read(struct page *page, bool uptodate, u64 start, u32 len)
 		btrfs_page_set_error(fs_info, page, start, len);
 	}
 
-	if (fs_info->sectorsize == PAGE_SIZE)
+	if (!btrfs_is_subpage(fs_info, page))
 		unlock_page(page);
 	else
 		btrfs_subpage_end_reader(fs_info, page, start, len);
@@ -2943,7 +2943,7 @@ static void endio_readpage_release_extent(struct processed_extent *processed,
 static void begin_page_read(struct btrfs_fs_info *fs_info, struct page *page)
 {
 	ASSERT(PageLocked(page));
-	if (fs_info->sectorsize == PAGE_SIZE)
+	if (!btrfs_is_subpage(fs_info, page))
 		return;
 
 	ASSERT(PagePrivate(page));
@@ -2965,7 +2965,7 @@ static struct extent_buffer *find_extent_buffer_readpage(
 	 * For regular sectorsize, we can use page->private to grab extent
 	 * buffer
 	 */
-	if (fs_info->sectorsize == PAGE_SIZE) {
+	if (fs_info->nodesize >= PAGE_SIZE) {
 		ASSERT(PagePrivate(page) && page->private);
 		return (struct extent_buffer *)page->private;
 	}
@@ -3458,7 +3458,7 @@ static int attach_extent_buffer_page(struct extent_buffer *eb,
 	if (page->mapping)
 		lockdep_assert_held(&page->mapping->private_lock);
 
-	if (fs_info->sectorsize == PAGE_SIZE) {
+	if (fs_info->nodesize >= PAGE_SIZE) {
 		if (!PagePrivate(page))
 			attach_page_private(page, eb);
 		else
@@ -3493,7 +3493,7 @@ int set_page_extent_mapped(struct page *page)
 
 	fs_info = btrfs_sb(page->mapping->host->i_sb);
 
-	if (fs_info->sectorsize < PAGE_SIZE)
+	if (btrfs_is_subpage(fs_info, page))
 		return btrfs_attach_subpage(fs_info, page, BTRFS_SUBPAGE_DATA);
 
 	attach_page_private(page, (void *)EXTENT_PAGE_PRIVATE);
@@ -3510,7 +3510,7 @@ void clear_page_extent_mapped(struct page *page)
 		return;
 
 	fs_info = btrfs_sb(page->mapping->host->i_sb);
-	if (fs_info->sectorsize < PAGE_SIZE)
+	if (btrfs_is_subpage(fs_info, page))
 		return btrfs_detach_subpage(fs_info, page);
 
 	detach_page_private(page);
@@ -3868,7 +3868,7 @@ static void find_next_dirty_byte(struct btrfs_fs_info *fs_info,
 	 * For regular sector size == page size case, since one page only
 	 * contains one sector, we return the page offset directly.
 	 */
-	if (fs_info->sectorsize == PAGE_SIZE) {
+	if (!btrfs_is_subpage(fs_info, page)) {
 		*start = page_offset(page);
 		*end = page_offset(page) + PAGE_SIZE;
 		return;
@@ -4250,7 +4250,7 @@ static noinline_for_stack int lock_extent_buffer_for_io(struct extent_buffer *eb
 	 * Subpage metadata doesn't use page locking at all, so we can skip
 	 * the page locking.
 	 */
-	if (!ret || fs_info->sectorsize < PAGE_SIZE)
+	if (!ret || fs_info->nodesize < PAGE_SIZE)
 		return ret;
 
 	num_pages = num_extent_pages(eb);
@@ -4410,7 +4410,7 @@ static void end_bio_subpage_eb_writepage(struct bio *bio)
 	struct bvec_iter_all iter_all;
 
 	fs_info = btrfs_sb(bio_first_page_all(bio)->mapping->host->i_sb);
-	ASSERT(fs_info->sectorsize < PAGE_SIZE);
+	ASSERT(fs_info->nodesize < PAGE_SIZE);
 
 	ASSERT(!bio_flagged(bio, BIO_CLONED));
 	bio_for_each_segment_all(bvec, bio, iter_all) {
@@ -4737,7 +4737,7 @@ static int submit_eb_page(struct page *page, struct writeback_control *wbc,
 	if (!PagePrivate(page))
 		return 0;
 
-	if (btrfs_sb(page->mapping->host->i_sb)->sectorsize < PAGE_SIZE)
+	if (btrfs_sb(page->mapping->host->i_sb)->nodesize < PAGE_SIZE)
 		return submit_eb_subpage(page, wbc, epd);
 
 	spin_lock(&mapping->private_lock);
@@ -5793,7 +5793,7 @@ static void detach_extent_buffer_page(struct extent_buffer *eb, struct page *pag
 		return;
 	}
 
-	if (fs_info->sectorsize == PAGE_SIZE) {
+	if (fs_info->nodesize >= PAGE_SIZE) {
 		/*
 		 * We do this since we'll remove the pages after we've
 		 * removed the eb from the radix tree, so we could race
@@ -6113,7 +6113,7 @@ static struct extent_buffer *grab_extent_buffer(
 	 * don't try to insert two ebs for the same bytenr.  So here we always
 	 * return NULL and just continue.
 	 */
-	if (fs_info->sectorsize < PAGE_SIZE)
+	if (fs_info->nodesize < PAGE_SIZE)
 		return NULL;
 
 	/* Page not yet attached to an extent buffer */
@@ -6135,6 +6135,30 @@ static struct extent_buffer *grab_extent_buffer(
 	return NULL;
 }
 
+static int check_eb_alignment(struct btrfs_fs_info *fs_info, u64 start)
+{
+	if (!IS_ALIGNED(start, fs_info->sectorsize)) {
+		btrfs_err(fs_info, "bad tree block start %llu", start);
+		return -EINVAL;
+	}
+
+	if (fs_info->nodesize < PAGE_SIZE &&
+	    offset_in_page(start) + fs_info->nodesize > PAGE_SIZE) {
+		btrfs_err(fs_info,
+		"tree block crosses page boundary, start %llu nodesize %u",
+			  start, fs_info->nodesize);
+		return -EINVAL;
+	}
+	if (fs_info->nodesize >= PAGE_SIZE &&
+	    !IS_ALIGNED(start, PAGE_SIZE)) {
+		btrfs_err(fs_info,
+		"tree block is not page aligned, start %llu nodesize %u",
+			  start, fs_info->nodesize);
+		return -EINVAL;
+	}
+	return 0;
+}
+
 struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 					  u64 start, u64 owner_root, int level)
 {
@@ -6149,10 +6173,8 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 	int uptodate = 1;
 	int ret;
 
-	if (!IS_ALIGNED(start, fs_info->sectorsize)) {
-		btrfs_err(fs_info, "bad tree block start %llu", start);
+	if (check_eb_alignment(fs_info, start))
 		return ERR_PTR(-EINVAL);
-	}
 
 #if BITS_PER_LONG == 32
 	if (start >= MAX_LFS_FILESIZE) {
@@ -6165,14 +6187,6 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		btrfs_warn_32bit_limit(fs_info);
 #endif
 
-	if (fs_info->sectorsize < PAGE_SIZE &&
-	    offset_in_page(start) + len > PAGE_SIZE) {
-		btrfs_err(fs_info,
-		"tree block crosses page boundary, start %llu nodesize %lu",
-			  start, len);
-		return ERR_PTR(-EINVAL);
-	}
-
 	eb = find_extent_buffer(fs_info, start);
 	if (eb)
 		return eb;
@@ -6202,7 +6216,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		 * page, but it may change in the future for 16K page size
 		 * support, so we still preallocate the memory in the loop.
 		 */
-		if (fs_info->sectorsize < PAGE_SIZE) {
+		if (fs_info->nodesize < PAGE_SIZE) {
 			prealloc = btrfs_alloc_subpage(fs_info, BTRFS_SUBPAGE_METADATA);
 			if (IS_ERR(prealloc)) {
 				ret = PTR_ERR(prealloc);
@@ -6421,7 +6435,7 @@ void clear_extent_buffer_dirty(const struct extent_buffer *eb)
 	int num_pages;
 	struct page *page;
 
-	if (eb->fs_info->sectorsize < PAGE_SIZE)
+	if (eb->fs_info->nodesize < PAGE_SIZE)
 		return clear_subpage_extent_buffer_dirty(eb);
 
 	num_pages = num_extent_pages(eb);
@@ -6453,7 +6467,7 @@ bool set_extent_buffer_dirty(struct extent_buffer *eb)
 	WARN_ON(!test_bit(EXTENT_BUFFER_TREE_REF, &eb->bflags));
 
 	if (!was_dirty) {
-		bool subpage = eb->fs_info->sectorsize < PAGE_SIZE;
+		bool subpage = eb->fs_info->nodesize < PAGE_SIZE;
 
 		/*
 		 * For subpage case, we can have other extent buffers in the
@@ -6493,9 +6507,18 @@ void clear_extent_buffer_uptodate(struct extent_buffer *eb)
 	num_pages = num_extent_pages(eb);
 	for (i = 0; i < num_pages; i++) {
 		page = eb->pages[i];
-		if (page)
-			btrfs_page_clear_uptodate(fs_info, page,
-						  eb->start, eb->len);
+		if (!page)
+			continue;
+
+		/*
+		 * This is special handling for metadata subpage, as regular
+		 * btrfs_is_subpage() can not handle cloned/dummy metadata.
+		 */
+		if (fs_info->nodesize >= PAGE_SIZE)
+			ClearPageUptodate(page);
+		else
+			btrfs_subpage_clear_uptodate(fs_info, page, eb->start,
+						     eb->len);
 	}
 }
 
@@ -6510,7 +6533,16 @@ void set_extent_buffer_uptodate(struct extent_buffer *eb)
 	num_pages = num_extent_pages(eb);
 	for (i = 0; i < num_pages; i++) {
 		page = eb->pages[i];
-		btrfs_page_set_uptodate(fs_info, page, eb->start, eb->len);
+
+		/*
+		 * This is special handling for metadata subpage, as regular
+		 * btrfs_is_subpage() can not handle cloned/dummy metadata.
+		 */
+		if (fs_info->nodesize >= PAGE_SIZE)
+			SetPageUptodate(page);
+		else
+			btrfs_subpage_set_uptodate(fs_info, page, eb->start,
+						   eb->len);
 	}
 }
 
@@ -6605,7 +6637,7 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num)
 	if (unlikely(test_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags)))
 		return -EIO;
 
-	if (eb->fs_info->sectorsize < PAGE_SIZE)
+	if (eb->fs_info->nodesize < PAGE_SIZE)
 		return read_extent_buffer_subpage(eb, wait, mirror_num);
 
 	num_pages = num_extent_pages(eb);
@@ -6851,7 +6883,7 @@ static void assert_eb_page_uptodate(const struct extent_buffer *eb,
 {
 	struct btrfs_fs_info *fs_info = eb->fs_info;
 
-	if (fs_info->sectorsize < PAGE_SIZE) {
+	if (fs_info->nodesize < PAGE_SIZE) {
 		bool uptodate;
 
 		uptodate = btrfs_subpage_test_uptodate(fs_info, page,
@@ -6952,7 +6984,7 @@ void copy_extent_buffer_full(const struct extent_buffer *dst,
 
 	ASSERT(dst->len == src->len);
 
-	if (dst->fs_info->sectorsize == PAGE_SIZE) {
+	if (dst->fs_info->nodesize >= PAGE_SIZE) {
 		num_pages = num_extent_pages(dst);
 		for (i = 0; i < num_pages; i++)
 			copy_page(page_address(dst->pages[i]),
@@ -6961,7 +6993,7 @@ void copy_extent_buffer_full(const struct extent_buffer *dst,
 		size_t src_offset = get_eb_offset_in_page(src, 0);
 		size_t dst_offset = get_eb_offset_in_page(dst, 0);
 
-		ASSERT(src->fs_info->sectorsize < PAGE_SIZE);
+		ASSERT(src->fs_info->nodesize < PAGE_SIZE);
 		memcpy(page_address(dst->pages[0]) + dst_offset,
 		       page_address(src->pages[0]) + src_offset,
 		       src->len);
@@ -7354,7 +7386,7 @@ int try_release_extent_buffer(struct page *page)
 {
 	struct extent_buffer *eb;
 
-	if (btrfs_sb(page->mapping->host->i_sb)->sectorsize < PAGE_SIZE)
+	if (btrfs_sb(page->mapping->host->i_sb)->nodesize < PAGE_SIZE)
 		return try_release_subpage_extent_buffer(page);
 
 	/*
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3b2403b6127f..89e888409609 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8129,7 +8129,7 @@ static void wait_subpage_spinlock(struct page *page)
 	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
 	struct btrfs_subpage *subpage;
 
-	if (fs_info->sectorsize == PAGE_SIZE)
+	if (!btrfs_is_subpage(fs_info, page))
 		return;
 
 	ASSERT(PagePrivate(page) && page->private);
diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index 29bd8c7a7706..c7dea689da90 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -107,7 +107,7 @@ int btrfs_attach_subpage(const struct btrfs_fs_info *fs_info,
 		ASSERT(PageLocked(page));
 
 	/* Either not subpage, or the page already has private attached */
-	if (fs_info->sectorsize == PAGE_SIZE || PagePrivate(page))
+	if (!btrfs_is_subpage(fs_info, page) || PagePrivate(page))
 		return 0;
 
 	subpage = btrfs_alloc_subpage(fs_info, type);
@@ -124,7 +124,7 @@ void btrfs_detach_subpage(const struct btrfs_fs_info *fs_info,
 	struct btrfs_subpage *subpage;
 
 	/* Either not subpage, or already detached */
-	if (fs_info->sectorsize == PAGE_SIZE || !PagePrivate(page))
+	if (!btrfs_is_subpage(fs_info, page) || !PagePrivate(page))
 		return;
 
 	subpage = (struct btrfs_subpage *)detach_page_private(page);
@@ -175,7 +175,7 @@ void btrfs_page_inc_eb_refs(const struct btrfs_fs_info *fs_info,
 {
 	struct btrfs_subpage *subpage;
 
-	if (fs_info->sectorsize == PAGE_SIZE)
+	if (!btrfs_is_subpage(fs_info, page))
 		return;
 
 	ASSERT(PagePrivate(page) && page->mapping);
@@ -190,7 +190,7 @@ void btrfs_page_dec_eb_refs(const struct btrfs_fs_info *fs_info,
 {
 	struct btrfs_subpage *subpage;
 
-	if (fs_info->sectorsize == PAGE_SIZE)
+	if (!btrfs_is_subpage(fs_info, page))
 		return;
 
 	ASSERT(PagePrivate(page) && page->mapping);
@@ -319,7 +319,7 @@ bool btrfs_subpage_end_and_test_writer(const struct btrfs_fs_info *fs_info,
 int btrfs_page_start_writer_lock(const struct btrfs_fs_info *fs_info,
 		struct page *page, u64 start, u32 len)
 {
-	if (unlikely(!fs_info) || fs_info->sectorsize == PAGE_SIZE) {
+	if (unlikely(!fs_info) || !btrfs_is_subpage(fs_info, page)) {
 		lock_page(page);
 		return 0;
 	}
@@ -336,7 +336,7 @@ int btrfs_page_start_writer_lock(const struct btrfs_fs_info *fs_info,
 void btrfs_page_end_writer_lock(const struct btrfs_fs_info *fs_info,
 		struct page *page, u64 start, u32 len)
 {
-	if (unlikely(!fs_info) || fs_info->sectorsize == PAGE_SIZE)
+	if (unlikely(!fs_info) || !btrfs_is_subpage(fs_info, page))
 		return unlock_page(page);
 	btrfs_subpage_clamp_range(page, &start, &len);
 	if (btrfs_subpage_end_and_test_writer(fs_info, page, start, len))
@@ -620,7 +620,7 @@ IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(checked);
 void btrfs_page_set_##name(const struct btrfs_fs_info *fs_info,		\
 		struct page *page, u64 start, u32 len)			\
 {									\
-	if (unlikely(!fs_info) || fs_info->sectorsize == PAGE_SIZE) {	\
+	if (unlikely(!fs_info) || !btrfs_is_subpage(fs_info, page)) {	\
 		set_page_func(page);					\
 		return;							\
 	}								\
@@ -629,7 +629,7 @@ void btrfs_page_set_##name(const struct btrfs_fs_info *fs_info,		\
 void btrfs_page_clear_##name(const struct btrfs_fs_info *fs_info,	\
 		struct page *page, u64 start, u32 len)			\
 {									\
-	if (unlikely(!fs_info) || fs_info->sectorsize == PAGE_SIZE) {	\
+	if (unlikely(!fs_info) || !btrfs_is_subpage(fs_info, page)) {	\
 		clear_page_func(page);					\
 		return;							\
 	}								\
@@ -638,14 +638,14 @@ void btrfs_page_clear_##name(const struct btrfs_fs_info *fs_info,	\
 bool btrfs_page_test_##name(const struct btrfs_fs_info *fs_info,	\
 		struct page *page, u64 start, u32 len)			\
 {									\
-	if (unlikely(!fs_info) || fs_info->sectorsize == PAGE_SIZE)	\
+	if (unlikely(!fs_info) || !btrfs_is_subpage(fs_info, page))	\
 		return test_page_func(page);				\
 	return btrfs_subpage_test_##name(fs_info, page, start, len);	\
 }									\
 void btrfs_page_clamp_set_##name(const struct btrfs_fs_info *fs_info,	\
 		struct page *page, u64 start, u32 len)			\
 {									\
-	if (unlikely(!fs_info) || fs_info->sectorsize == PAGE_SIZE) {	\
+	if (unlikely(!fs_info) || !btrfs_is_subpage(fs_info, page)) {	\
 		set_page_func(page);					\
 		return;							\
 	}								\
@@ -655,7 +655,7 @@ void btrfs_page_clamp_set_##name(const struct btrfs_fs_info *fs_info,	\
 void btrfs_page_clamp_clear_##name(const struct btrfs_fs_info *fs_info, \
 		struct page *page, u64 start, u32 len)			\
 {									\
-	if (unlikely(!fs_info) || fs_info->sectorsize == PAGE_SIZE) {	\
+	if (unlikely(!fs_info) || !btrfs_is_subpage(fs_info, page)) {	\
 		clear_page_func(page);					\
 		return;							\
 	}								\
@@ -665,7 +665,7 @@ void btrfs_page_clamp_clear_##name(const struct btrfs_fs_info *fs_info, \
 bool btrfs_page_clamp_test_##name(const struct btrfs_fs_info *fs_info,	\
 		struct page *page, u64 start, u32 len)			\
 {									\
-	if (unlikely(!fs_info) || fs_info->sectorsize == PAGE_SIZE)	\
+	if (unlikely(!fs_info) || !btrfs_is_subpage(fs_info, page))	\
 		return test_page_func(page);				\
 	btrfs_subpage_clamp_range(page, &start, &len);			\
 	return btrfs_subpage_test_##name(fs_info, page, start, len);	\
@@ -694,7 +694,7 @@ void btrfs_page_assert_not_dirty(const struct btrfs_fs_info *fs_info,
 		return;
 
 	ASSERT(!PageDirty(page));
-	if (fs_info->sectorsize == PAGE_SIZE)
+	if (!btrfs_is_subpage(fs_info, page))
 		return;
 
 	ASSERT(PagePrivate(page) && page->private);
@@ -722,8 +722,8 @@ void btrfs_page_unlock_writer(struct btrfs_fs_info *fs_info, struct page *page,
 	struct btrfs_subpage *subpage;
 
 	ASSERT(PageLocked(page));
-	/* For regular page size case, we just unlock the page */
-	if (fs_info->sectorsize == PAGE_SIZE)
+	/* For non-subpage case, we just unlock the page */
+	if (!btrfs_is_subpage(fs_info, page))
 		return unlock_page(page);
 
 	ASSERT(PagePrivate(page) && page->private);
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index 7accb5c40d33..a87e53e8c24c 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -4,6 +4,7 @@
 #define BTRFS_SUBPAGE_H
 
 #include <linux/spinlock.h>
+#include "btrfs_inode.h"
 
 /*
  * Extra info for subpapge bitmap.
@@ -74,6 +75,30 @@ enum btrfs_subpage_type {
 	BTRFS_SUBPAGE_DATA,
 };
 
+static inline bool btrfs_is_subpage(const struct btrfs_fs_info *fs_info,
+				    struct page *page)
+{
+	if (fs_info->sectorsize >= PAGE_SIZE)
+		return false;
+
+	/*
+	 * Only data pages (either through DIO or compression) can have no
+	 * mapping. And if page->mapping->host is data inode, it's subpage.
+	 * As we have ruled our sectorsize >= PAGE_SIZE case already.
+	 */
+	if (!page->mapping || !page->mapping->host ||
+	    is_data_inode(page->mapping->host))
+		return true;
+
+	/*
+	 * Now the only remaining case is metadata, which we only go subpage
+	 * routine if nodesize < PAGE_SIZE.
+	 */
+	if (fs_info->nodesize < PAGE_SIZE)
+		return true;
+	return false;
+}
+
 void btrfs_init_subpage_info(struct btrfs_subpage_info *subpage_info, u32 sectorsize);
 int btrfs_attach_subpage(const struct btrfs_fs_info *fs_info,
 			 struct page *page, enum btrfs_subpage_type type);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v2 3/3] btrfs: expand subpage support to any PAGE_SIZE > 4K
  2022-01-11  2:34 [PATCH v2 0/3] btrfs: support more pages sizes for the subpage support Qu Wenruo
  2022-01-11  2:34 ` [PATCH v2 1/3] btrfs: use dummy extent buffer for super block sys chunk array read Qu Wenruo
  2022-01-11  2:34 ` [PATCH v2 2/3] btrfs: make nodesize >= PAGE_SIZE case to reuse the non-subpage routine Qu Wenruo
@ 2022-01-11  2:34 ` Qu Wenruo
  2 siblings, 0 replies; 5+ messages in thread
From: Qu Wenruo @ 2022-01-11  2:34 UTC (permalink / raw)
  To: linux-btrfs

With the recent change in metadata handling, we can handle metadata in
the following cases:

- nodesize < PAGE_SIZE and sectorsize < PAGE_SIZE
  Go subpage routine for both metadata and data.

- nodesize < PAGE_SIZE and sectorsize >= PAGE_SIZE
  Invalid case for now. As we require nodesize >= sectorsize.

- nodesize >= PAGE_SIZE and sectorsize < PAGE_SIZE
  Go subpage routine for data, but regular page routine for metadata.

- nodesize >= PAGE_SIZE and sectorsize >= PAGE_SIZE
  Go regular page routine for both metadata and data.

Previously we require 64K page size for subpage as we need to ensure any
tree block can be fit into one page, as at that time, we don't have
support subpage metadata crossing multiple pages.

But now since we allow nodesize >= PAGE_SIZE case to regular page
routine, there is no such limitation any more.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 16 +++++++++++-----
 fs/btrfs/sysfs.c   | 11 +++++------
 2 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 884e0b543136..ffb8c811aeea 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2752,12 +2752,18 @@ static int validate_super(struct btrfs_fs_info *fs_info,
 	}
 
 	/*
-	 * For 4K page size, we only support 4K sector size.
-	 * For 64K page size, we support 64K and 4K sector sizes.
+	 * We only support sectorsize <= PAGE_SIZE case.
+	 *
+	 * For 4K page size (e.g. x86_64), it only supports 4K sector size.
+	 * For 16K page size (e.g. PPC/ARM), it supports 4K, 8K and 16K
+	 * sector sizes.
+	 * For 64K and higher page size (e.g. PPC/ARM), it supports 4K, 8K,
+	 * 16K, 32K and 64K sector sizes (aka, all sector sizes).
+	 *
+	 * For the future, 4K will be the default sectorsize for all btrfs
+	 * and will get supported on all archtectures.
 	 */
-	if ((PAGE_SIZE == SZ_4K && sectorsize != PAGE_SIZE) ||
-	    (PAGE_SIZE == SZ_64K && (sectorsize != SZ_4K &&
-				     sectorsize != SZ_64K))) {
+	if (sectorsize > PAGE_SIZE) {
 		btrfs_err(fs_info,
 			"sectorsize %llu not yet supported for page size %lu",
 			sectorsize, PAGE_SIZE);
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index beb7f72d50b8..6a01be4ef3b4 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -390,13 +390,12 @@ static ssize_t supported_sectorsizes_show(struct kobject *kobj,
 					  char *buf)
 {
 	ssize_t ret = 0;
+	u32 sectorsize;
 
-	/* 4K sector size is also supported with 64K page size */
-	if (PAGE_SIZE == SZ_64K)
-		ret += sysfs_emit_at(buf, ret, "%u ", SZ_4K);
-
-	/* Only sectorsize == PAGE_SIZE is now supported */
-	ret += sysfs_emit_at(buf, ret, "%lu\n", PAGE_SIZE);
+	/* We support any sectorsize <= PAGE_SIZE */
+	for (sectorsize = SZ_4K; sectorsize <= PAGE_SIZE; sectorsize <<= 1)
+		ret += sysfs_emit_at(buf, ret, "%u%s", sectorsize,
+				     sectorsize == PAGE_SIZE ? "\n" : " ");
 
 	return ret;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 1/3] btrfs: use dummy extent buffer for super block sys chunk array read
  2022-01-11  2:34 ` [PATCH v2 1/3] btrfs: use dummy extent buffer for super block sys chunk array read Qu Wenruo
@ 2022-01-11  9:56   ` Qu Wenruo
  0 siblings, 0 replies; 5+ messages in thread
From: Qu Wenruo @ 2022-01-11  9:56 UTC (permalink / raw)
  To: linux-btrfs



On 2022/1/11 10:34, Qu Wenruo wrote:
> [BUG]
> During test for 16K page size with 4K sectorsize and 64K nodesize, btrfs
> will reject the mount with the following dmesg:
> 
>   BTRFS warning (device dm-1): read-write for sector size 4096 with page size 16384 is experimental
>   BTRFS error (device dm-1): tree block crosses page boundary, start 65536 nodesize 65536
>   BTRFS error (device dm-1): failed to read the system array: -22
>   BTRFS error (device dm-1): open_ctree failed

The message is not caused by this reason, but the wrong boundary check 
in the 2nd patch (which is already fixed in v2).

> 
> The rejection only happens with sectorsize=4K, page_size=16K and
> nodesize=64K combination.
> 
> For sectorsize=4K, page_size=16K and nodesize=16K (default) case, it
> works fine.
> 
> [CAUSE]
> The problem is in how we read sys chunk array.
> 
> In function btrfs_read_sys_array(), we allocate an extent buffer and
> copy super block into the extent buffer, so that we can use various
> extent buffer helpers to do the work.
> 
> But the problem is, we're calling btrfs_find_create_tree_block(), which
> will do all the validation check, including the page boundary check for
> subpage cases.
> 
> [FIX]
> In fact, we only need a dummy extent buffer, without all the checks for
> a real extent buffer.
> 
> This patch will replace the btrfs_find_create_tree_block() call with
> __alloc_dummy_extent_buffer().
> 
> By this we can:
> 
> - Set the extent buffer size to BTRFS_SUPER_INFO_SIZE
> - Avoid the unnecessary validation checks
> 
> Also since we're here, remove some stale comments on setting the eb page
> uptodate, as now set_extent_buffer_uptodate() can handle dummy eb cases
> without any problem.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>   fs/btrfs/volumes.c | 27 ++++-----------------------
>   1 file changed, 4 insertions(+), 23 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index b07d382d53a8..00c7f5cf4b52 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -7373,7 +7373,6 @@ static int read_one_dev(struct extent_buffer *leaf,
>   
>   int btrfs_read_sys_array(struct btrfs_fs_info *fs_info)
>   {
> -	struct btrfs_root *root = fs_info->tree_root;
>   	struct btrfs_super_block *super_copy = fs_info->super_copy;
>   	struct extent_buffer *sb;
>   	struct btrfs_disk_key *disk_key;
> @@ -7388,31 +7387,13 @@ int btrfs_read_sys_array(struct btrfs_fs_info *fs_info)
>   	u64 type;
>   	struct btrfs_key key;
>   
> -	ASSERT(BTRFS_SUPER_INFO_SIZE <= fs_info->nodesize);
> -	/*
> -	 * This will create extent buffer of nodesize, superblock size is
> -	 * fixed to BTRFS_SUPER_INFO_SIZE. If nodesize > sb size, this will
> -	 * overallocate but we can keep it as-is, only the first page is used.
> -	 */
> -	sb = btrfs_find_create_tree_block(fs_info, BTRFS_SUPER_INFO_OFFSET,
> -					  root->root_key.objectid, 0);
> +	ASSERT(BTRFS_SUPER_INFO_SIZE <= fs_info->nodesize &&
> +	       BTRFS_SUPER_INFO_SIZE <= PAGE_SIZE);
> +	sb = __alloc_dummy_extent_buffer(fs_info, BTRFS_SUPER_INFO_OFFSET,
> +					 BTRFS_SUPER_INFO_SIZE);

This is not good for 64K page size, 16K sector size.

It will cause crash in generic/205.

As later set_extent_buffer_uptodate() will use eb->len to set the page 
uptodate, using subpage routine.

But 4K is not aligned to 16K sectorsize, and will trigger an ASSERT().

Thus we need to use alloc_dummy_extent_buffer(), without "__" prefix here.

The patch has been updated into github.

Thanks,
Qu
>   	if (IS_ERR(sb))
>   		return PTR_ERR(sb);
>   	set_extent_buffer_uptodate(sb);
> -	/*
> -	 * The sb extent buffer is artificial and just used to read the system array.
> -	 * set_extent_buffer_uptodate() call does not properly mark all it's
> -	 * pages up-to-date when the page is larger: extent does not cover the
> -	 * whole page and consequently check_page_uptodate does not find all
> -	 * the page's extents up-to-date (the hole beyond sb),
> -	 * write_extent_buffer then triggers a WARN_ON.
> -	 *
> -	 * Regular short extents go through mark_extent_buffer_dirty/writeback cycle,
> -	 * but sb spans only this function. Add an explicit SetPageUptodate call
> -	 * to silence the warning eg. on PowerPC 64.
> -	 */
> -	if (PAGE_SIZE > BTRFS_SUPER_INFO_SIZE)
> -		SetPageUptodate(sb->pages[0]); >
>   	write_extent_buffer(sb, super_copy, 0, BTRFS_SUPER_INFO_SIZE);
>   	array_size = btrfs_super_sys_array_size(super_copy);


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-01-11  9:56 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-11  2:34 [PATCH v2 0/3] btrfs: support more pages sizes for the subpage support Qu Wenruo
2022-01-11  2:34 ` [PATCH v2 1/3] btrfs: use dummy extent buffer for super block sys chunk array read Qu Wenruo
2022-01-11  9:56   ` Qu Wenruo
2022-01-11  2:34 ` [PATCH v2 2/3] btrfs: make nodesize >= PAGE_SIZE case to reuse the non-subpage routine Qu Wenruo
2022-01-11  2:34 ` [PATCH v2 3/3] btrfs: expand subpage support to any PAGE_SIZE > 4K Qu Wenruo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).