* [PATCH v1 00/14] btrfs: add read-only support for subpage sector size
@ 2020-11-18  8:53 Qu Wenruo
  2020-11-18  8:53 ` [PATCH 01/14] btrfs: extent_io: Use detach_page_private() for alloc_extent_buffer() Qu Wenruo
                   ` (13 more replies)
  0 siblings, 14 replies; 20+ messages in thread
From: Qu Wenruo @ 2020-11-18  8:53 UTC (permalink / raw)
  To: linux-btrfs

Patches can be fetched from github:
https://github.com/adam900710/linux/tree/subpage
Currently the branch also contains RW metadata support (partly tested).
The plan is to send out the previous milestone (metadata RO) for review
once the next milestone (in this case, metadata RW) is mostly finished.

Please note that, for the following reasons, the patches are not expected
to apply without conflicts or checkpatch warnings:

- Ongoing development
  Since the development is still ongoing, the branch is not rebased that
  frequently.
  So until the development calms down, please don't complain about the
  checkpatch warnings or conflicts.

- Slow full kernel compile
  Even with cross distcc, it will still take tens of minutes to compile
  the full kernel.
  Not to mention the new regressions from the current cycle affecting my
  aarch64 environment.
  (regulator regression screwing up all RK3399 boards, lockdep bugs).

- Stupid checkpatch script
  ` if (PAGE_SIZE == SZ_64K && ) {}. `
  The above check is resolved at compile time, as both macros are fixed
  values, but checkpatch can't detect that and always wants the fixed
  value on the right of the "==".

- Dependency on previous patches
  This patchset mainly focuses on the read-only implementation.
  The prep patches were sent in a previous patchset.

== What works ==

Existing regular page sized sector size support
Subpage read-only Mount (with all self tests and ASSERT)
Subpage metadata read (including all trees and inline extents, and csum checking)
Subpage compressed/uncompressed data read (with csum checking)

== What doesn't work ==

Read-write mount (see the subject)

=== Need feedback ===
The following points need feedback from the community:

- The error handling for page::private memory allocation
  This introduces new failure patterns. And the iomap code is not a good
  example either (it just uses __GFP_NOFAIL and skips the NULL check).

- Whether to use helpers for various bitmap operations
  Almost all patches have some bitmap based operations to update page
  status. They share some patterns but are not completely the same.
  Thus I'm not sure whether it's a good idea to introduce a helper.

- u16 vs u32 bitmap
  Currently subpage support only needs 16 bits for its operations.
  But all the bitmap operations use 32 bits.

  This means:
  * Extra memory just gets wasted
    Memory usage for each bitmap gets doubled.
  * Ugly way to check if a range has all its bits set
    Currently we need to define a temporary bitmap, set the range in
    the temporary bitmap, then call bitmap_subset().
    If we use u16 directly, we can do it more easily with a bitwise AND.

- Should we handle subpage and regular sector size case separately?
  Handling them separately leaves the existing behavior untouched, and is
  thus mostly regression free. But this obviously bloats the code.

  Unifying to subpage would cause obvious memory overhead, and an obvious
  regression for 4K page systems.

  Currently I prefer to accept the extra code complexity in order to keep
  4K page systems regression free.
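
To make the "u16 vs u32 bitmap" point above concrete, here is a minimal
userspace sketch, assuming a 64K page with 4K sectors (so 16 bits per page).
The helper names subpage_range_mask() and subpage_range_all_set() are
hypothetical and are not part of this patchset; they only illustrate the
mask-and-compare check that a plain u16 would allow instead of a temporary
bitmap plus bitmap_subset().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 64K page / 4K sector = 16 sectors, one bit per sector. */
#define SUBPAGE_NR_BITS 16

/* Build a mask with @nbits bits set, starting at bit @start. */
static uint16_t subpage_range_mask(unsigned int start, unsigned int nbits)
{
	assert(nbits >= 1 && start + nbits <= SUBPAGE_NR_BITS);
	/* Compute in 32 bits so that nbits == 16 does not overflow the shift. */
	return (uint16_t)(((UINT32_C(1) << nbits) - 1) << start);
}

/* True only if every bit of [start, start + nbits) is set in @bitmap. */
static bool subpage_range_all_set(uint16_t bitmap, unsigned int start,
				  unsigned int nbits)
{
	uint16_t mask = subpage_range_mask(start, nbits);

	return (bitmap & mask) == mask;
}
```

For a 16K tree block at offset 16K inside the page, the range is bits 4-7,
so the whole check collapses to one AND and one compare against 0x00F0.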

=== Changelog ===
v1:
- Separate the main implementation from previous huge patchset
  Huge patchset doesn't make much sense.

- Use bitmap implementation
  Now page::private will be a pointer to btrfs_subpage structure, which
  contains bitmaps for various page status.


Qu Wenruo (14):
  btrfs: extent_io: Use detach_page_private() for alloc_extent_buffer()
  btrfs: extent_io: introduce a helper to grab an existing extent buffer
    from a page
  btrfs: extent_io: introduce the skeleton of btrfs_subpage structure
  btrfs: extent_io: make attach_extent_buffer_page() to handle subpage
    case
  btrfs: extent_io: make grab_extent_buffer_from_page() to handle
    subpage case
  btrfs: extent_io: support subpage for extent buffer page release
  btrfs: extent_io: make set/clear_extent_buffer_uptodate() to support
    subpage size
  btrfs: extent_io: implement try_release_extent_buffer()  for subpage
    metadata support
  btrfs: extent_io: introduce read_extent_buffer_subpage()
  btrfs: extent_io: make endio_readpage_update_page_status() to handle
    subpage case
  btrfs: disk-io: introduce subpage metadata validation check
  btrfs: introduce btrfs_subpage for data inodes
  btrfs: integrate page status update for read path into
    begin/end_page_read()
  btrfs: allow RO mount of 4K sector size fs on 64K page system

 fs/btrfs/compression.c      |  10 +-
 fs/btrfs/disk-io.c          | 107 ++++++-
 fs/btrfs/extent_io.c        | 566 +++++++++++++++++++++++++++++++-----
 fs/btrfs/extent_io.h        |  14 +-
 fs/btrfs/file.c             |  10 +-
 fs/btrfs/free-space-cache.c |  15 +-
 fs/btrfs/inode.c            |  12 +-
 fs/btrfs/ioctl.c            |   5 +-
 fs/btrfs/reflink.c          |   5 +-
 fs/btrfs/relocation.c       |  12 +-
 fs/btrfs/super.c            |   7 +
 11 files changed, 666 insertions(+), 97 deletions(-)

-- 
2.29.2



* [PATCH 01/14] btrfs: extent_io: Use detach_page_private() for alloc_extent_buffer()
  2020-11-18  8:53 [PATCH v1 00/14] btrfs: add read-only support for subpage sector size Qu Wenruo
@ 2020-11-18  8:53 ` Qu Wenruo
  2020-11-18 10:22   ` Johannes Thumshirn
  2020-11-18 15:56   ` David Sterba
  2020-11-18  8:53 ` [PATCH 02/14] btrfs: extent_io: introduce a helper to grab an existing extent buffer from a page Qu Wenruo
                   ` (12 subsequent siblings)
  13 siblings, 2 replies; 20+ messages in thread
From: Qu Wenruo @ 2020-11-18  8:53 UTC (permalink / raw)
  To: linux-btrfs

In alloc_extent_buffer(), after we get a page from the btree inode, we check
if that page has a private pointer attached.

If attached, we check if the existing extent buffer has proper refs.
If not (the eb is being freed), we will detach that private eb pointer.

The point here is, we are detaching that eb pointer by calling:
- ClearPagePrivate()
- put_page()

The put_page() here is especially confusing, as it's decreasing the ref
taken by attach_page_private().
Without knowing that, it looks like the put_page() is for the
find_or_create_page() call, confusing the reader.

Since we're always modifying page private with attach_page_private() and
detach_page_private(), the only open-coded detach_page_private() here is
really confusing.

Fix it by calling detach_page_private().
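
The ref pairing described above can be sketched in userspace as follows.
This is only a toy model under stated assumptions: struct toy_page and the
toy_*() helpers are hypothetical names, and they mimic (as far as I
understand them) what attach_page_private() and detach_page_private() do
with the page refcount, so that it is visible which get pairs with which put.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page; only the fields relevant to the argument. */
struct toy_page {
	int refcount;		/* models the page reference count */
	bool has_private;	/* models the PagePrivate flag */
	void *private_data;	/* models page->private */
};

/* Models attach_page_private(): take a page ref, store data, set the flag. */
static void toy_attach_page_private(struct toy_page *page, void *data)
{
	assert(!page->has_private);
	page->refcount++;
	page->private_data = data;
	page->has_private = true;
}

/* Models detach_page_private(): clear the flag, drop the ref from attach. */
static void *toy_detach_page_private(struct toy_page *page)
{
	void *data = page->private_data;

	if (!page->has_private)
		return NULL;
	page->has_private = false;
	page->private_data = NULL;
	page->refcount--;
	return data;
}
```

In this model the ref held by the caller (the find_or_create_page() analogue)
is never touched by the detach; only the ref taken at attach time is dropped.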

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index f305777ee1a3..55115f485d09 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5310,14 +5310,13 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 				goto free_eb;
 			}
 			exists = NULL;
+			WARN_ON(PageDirty(p));
 
 			/*
 			 * Do this so attach doesn't complain and we need to
 			 * drop the ref the old guy had.
 			 */
-			ClearPagePrivate(p);
-			WARN_ON(PageDirty(p));
-			put_page(p);
+			detach_page_private(page);
 		}
 		attach_extent_buffer_page(eb, p);
 		spin_unlock(&mapping->private_lock);
-- 
2.29.2



* [PATCH 02/14] btrfs: extent_io: introduce a helper to grab an existing extent buffer from a page
  2020-11-18  8:53 [PATCH v1 00/14] btrfs: add read-only support for subpage sector size Qu Wenruo
  2020-11-18  8:53 ` [PATCH 01/14] btrfs: extent_io: Use detach_page_private() for alloc_extent_buffer() Qu Wenruo
@ 2020-11-18  8:53 ` Qu Wenruo
  2020-11-18 10:26   ` Johannes Thumshirn
  2020-11-18  8:53 ` [PATCH 03/14] btrfs: extent_io: introduce the skeleton of btrfs_subpage structure Qu Wenruo
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 20+ messages in thread
From: Qu Wenruo @ 2020-11-18  8:53 UTC (permalink / raw)
  To: linux-btrfs

This patch will extract the code to grab an extent buffer from a page
into a helper, grab_extent_buffer_from_page().

This reduces one indent level, and provides the place for later
expansion for subpage support.
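
The core of the helper is the "take a ref only if the eb is not already
being freed" check. A minimal userspace sketch of that pattern with C11
atomics is below; toy_inc_not_zero() is a hypothetical name that only
imitates the semantics of the kernel's atomic_inc_not_zero(), it is not the
kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Increment @refs unless it is already zero.
 * Returns true if a reference was taken, false if the object is dying.
 */
static bool toy_inc_not_zero(atomic_int *refs)
{
	int old;

	assert(refs);
	old = atomic_load(refs);
	while (old != 0) {
		/* On failure, @old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(refs, &old, old + 1))
			return true;
	}
	return false;
}
```

A true return corresponds to the "return exists" path of the helper, a
false return to the path where the page is detached from the dying eb and
can be reused.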

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 60 ++++++++++++++++++++++++++------------------
 1 file changed, 36 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 55115f485d09..759d2f2292ed 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5249,6 +5249,36 @@ struct extent_buffer *alloc_test_extent_buffer(struct btrfs_fs_info *fs_info,
 }
 #endif
 
+static struct extent_buffer *grab_extent_buffer_from_page(struct page *page)
+{
+	struct extent_buffer *exists;
+
+	/* Page not yet attached to an extent buffer */
+	if (!PagePrivate(page))
+		return NULL;
+
+	/*
+	 * We could have already allocated an eb for this page
+	 * and attached one so lets see if we can get a ref on
+	 * the existing eb, and if we can we know it's good and
+	 * we can just return that one, else we know we can just
+	 * overwrite page->private.
+	 */
+	exists = (struct extent_buffer *)page->private;
+	if (atomic_inc_not_zero(&exists->refs)) {
+		mark_extent_buffer_accessed(exists, page);
+		return exists;
+	}
+
+	WARN_ON(PageDirty(page));
+	/*
+	 * The page belongs to an eb which is being freed.
+	 * Detach it from previous eb so that we can reuse it.
+	 */
+	detach_page_private(page);
+	return NULL;
+}
+
 struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 					  u64 start, u64 owner_root, int level)
 {
@@ -5293,30 +5323,12 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		}
 
 		spin_lock(&mapping->private_lock);
-		if (PagePrivate(p)) {
-			/*
-			 * We could have already allocated an eb for this page
-			 * and attached one so lets see if we can get a ref on
-			 * the existing eb, and if we can we know it's good and
-			 * we can just return that one, else we know we can just
-			 * overwrite page->private.
-			 */
-			exists = (struct extent_buffer *)p->private;
-			if (atomic_inc_not_zero(&exists->refs)) {
-				spin_unlock(&mapping->private_lock);
-				unlock_page(p);
-				put_page(p);
-				mark_extent_buffer_accessed(exists, p);
-				goto free_eb;
-			}
-			exists = NULL;
-			WARN_ON(PageDirty(p));
-
-			/*
-			 * Do this so attach doesn't complain and we need to
-			 * drop the ref the old guy had.
-			 */
-			detach_page_private(page);
+		exists = grab_extent_buffer_from_page(p);
+		if (exists) {
+			spin_unlock(&mapping->private_lock);
+			unlock_page(p);
+			put_page(p);
+			goto free_eb;
 		}
 		attach_extent_buffer_page(eb, p);
 		spin_unlock(&mapping->private_lock);
-- 
2.29.2



* [PATCH 03/14] btrfs: extent_io: introduce the skeleton of btrfs_subpage structure
  2020-11-18  8:53 [PATCH v1 00/14] btrfs: add read-only support for subpage sector size Qu Wenruo
  2020-11-18  8:53 ` [PATCH 01/14] btrfs: extent_io: Use detach_page_private() for alloc_extent_buffer() Qu Wenruo
  2020-11-18  8:53 ` [PATCH 02/14] btrfs: extent_io: introduce a helper to grab an existing extent buffer from a page Qu Wenruo
@ 2020-11-18  8:53 ` Qu Wenruo
  2020-11-18 10:53   ` Johannes Thumshirn
  2020-11-18  8:53 ` [PATCH 04/14] btrfs: extent_io: make attach_extent_buffer_page() to handle subpage case Qu Wenruo
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 20+ messages in thread
From: Qu Wenruo @ 2020-11-18  8:53 UTC (permalink / raw)
  To: linux-btrfs

For btrfs subpage support, we need a structure to record extra info for
a page so that we can know things like which sectors in the page are
uptodate/dirty.

This patch will introduce the skeleton structure for future btrfs
subpage support.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 32 ++++++++++++++++++++++++++++++++
 fs/btrfs/extent_io.h |  8 ++++++++
 2 files changed, 40 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 759d2f2292ed..2eaf09ff59ca 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5279,6 +5279,38 @@ static struct extent_buffer *grab_extent_buffer_from_page(struct page *page)
 	return NULL;
 }
 
+int btrfs_attach_subpage(struct btrfs_fs_info *fs_info, struct page *page)
+{
+	struct btrfs_subpage *subpage;
+
+	ASSERT(PageLocked(page));
+	/* Either not subpage, or the page already has private attached */
+	if (!btrfs_is_subpage(fs_info) || PagePrivate(page))
+		return 0;
+
+	subpage = kzalloc(sizeof(*subpage), GFP_NOFS);
+	if (!subpage)
+		return -ENOMEM;
+
+	spin_lock_init(&subpage->lock);
+	attach_page_private(page, subpage);
+	return 0;
+}
+
+void btrfs_detach_subpage(struct btrfs_fs_info *fs_info, struct page *page)
+{
+	struct btrfs_subpage *subpage;
+
+	/* Either not subpage, or already detached */
+	if (!btrfs_is_subpage(fs_info) || !PagePrivate(page))
+		return;
+
+	subpage = (struct btrfs_subpage *)detach_page_private(page);
+	ASSERT(subpage && bitmap_empty(subpage->tree_block_bitmap,
+				       BTRFS_SUBPAGE_BITMAP_SIZE));
+	kfree(subpage);
+}
+
 struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 					  u64 start, u64 owner_root, int level)
 {
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 0123c75ee203..4251bef25aac 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -307,6 +307,14 @@ blk_status_t btrfs_submit_read_repair(struct inode *inode,
 				      u64 start, u64 end, int failed_mirror,
 				      submit_bio_hook_t *submit_bio_hook);
 
+#define BTRFS_SUBPAGE_BITMAP_SIZE	(SZ_64K / SZ_4K)
+struct btrfs_subpage {
+	spinlock_t lock;
+	DECLARE_BITMAP(tree_block_bitmap, BTRFS_SUBPAGE_BITMAP_SIZE);
+};
+
+int btrfs_attach_subpage(struct btrfs_fs_info *fs_info, struct page *page);
+void btrfs_detach_subpage(struct btrfs_fs_info *fs_info, struct page *page);
 #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
 bool find_lock_delalloc_range(struct inode *inode,
 			     struct page *locked_page, u64 *start,
-- 
2.29.2



* [PATCH 04/14] btrfs: extent_io: make attach_extent_buffer_page() to handle subpage case
  2020-11-18  8:53 [PATCH v1 00/14] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (2 preceding siblings ...)
  2020-11-18  8:53 ` [PATCH 03/14] btrfs: extent_io: introduce the skeleton of btrfs_subpage structure Qu Wenruo
@ 2020-11-18  8:53 ` Qu Wenruo
  2020-11-18  8:53 ` [PATCH 05/14] btrfs: extent_io: make grab_extent_buffer_from_page() " Qu Wenruo
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Qu Wenruo @ 2020-11-18  8:53 UTC (permalink / raw)
  To: linux-btrfs

For subpage case, we need to allocate new memory for each metadata page.

So we need to:
- Allow attach_extent_buffer_page() to return int
  To indicate allocation failure

- Prealloc page->private for alloc_extent_buffer()
  We don't want to allocate memory with a spinlock held, so
  do the preallocation before we acquire the spin lock.

- Handle subpage and regular case differently in
  attach_extent_buffer_page()
  For regular case, just do the usual thing.
  For subpage case, allocate new memory and update the tree_block
  bitmap.
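
As a worked example of the tree_block bitmap update mentioned in the last
item above, here is a userspace sketch of the arithmetic. It assumes a 64K
page, 4K sectors (sectorsize_bits == 12) and a 16-bit bitmap; struct
toy_bit_range, toy_eb_bit_range() and toy_mark_tree_block() are
hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bit range covered by one tree block inside its page. */
struct toy_bit_range {
	unsigned int start;	/* first sector index inside the page */
	unsigned int nbits;	/* number of sectors covered */
};

/* Convert (eb start, eb length) into a bit range relative to @page_start. */
static struct toy_bit_range toy_eb_bit_range(uint64_t eb_start,
					     uint32_t eb_len,
					     uint64_t page_start,
					     unsigned int sectorsize_bits)
{
	struct toy_bit_range range;

	assert(eb_start >= page_start);
	range.start = (unsigned int)((eb_start - page_start) >> sectorsize_bits);
	range.nbits = eb_len >> sectorsize_bits;
	return range;
}

/* Return @bitmap with the bits of @range set (16 sectors per page assumed). */
static uint16_t toy_mark_tree_block(uint16_t bitmap, struct toy_bit_range range)
{
	assert(range.nbits >= 1 && range.start + range.nbits <= 16);
	return bitmap |
	       (uint16_t)(((UINT32_C(1) << range.nbits) - 1) << range.start);
}
```

With a page at offset 1M and a 16K tree block starting 16K into that page,
this gives start = 4 and nbits = 4, i.e. bits 4-7 of the bitmap.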

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 77 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 63 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 2eaf09ff59ca..94101d1e04eb 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3142,22 +3142,50 @@ static int submit_extent_page(unsigned int opf,
 	return ret;
 }
 
-static void attach_extent_buffer_page(struct extent_buffer *eb,
+static int attach_extent_buffer_page(struct extent_buffer *eb,
 				      struct page *page)
 {
-	/*
-	 * If the page is mapped to btree inode, we should hold the private
-	 * lock to prevent race.
-	 * For cloned or dummy extent buffers, their pages are not mapped and
-	 * will not race with any other ebs.
-	 */
-	if (page->mapping)
-		lockdep_assert_held(&page->mapping->private_lock);
+	struct btrfs_fs_info *fs_info = eb->fs_info;
+	struct btrfs_subpage *subpage;
+	int start;
+	int nbits;
+	int ret;
 
-	if (!PagePrivate(page))
-		attach_page_private(page, eb);
-	else
-		WARN_ON(page->private != (unsigned long)eb);
+	if (!btrfs_is_subpage(fs_info)) {
+		/*
+		 * If the page is mapped to btree inode, we should hold the
+		 * private lock to prevent race.
+		 * For cloned or dummy extent buffers, their pages are not
+		 * mapped and will not race with any other ebs.
+		 */
+		if (page->mapping)
+			lockdep_assert_held(&page->mapping->private_lock);
+
+		if (!PagePrivate(page))
+			attach_page_private(page, eb);
+		else
+			WARN_ON(page->private != (unsigned long)eb);
+		return 0;
+	}
+
+	/* Already mapped, just update the existing range */
+	if (PagePrivate(page))
+		goto update_bitmap;
+
+	/* Do new allocation to attach subpage */
+	ret = btrfs_attach_subpage(fs_info, page);
+	if (ret < 0)
+		return ret;
+
+update_bitmap:
+	start = (eb->start - page_offset(page)) >> fs_info->sectorsize_bits;
+	nbits = eb->len >> fs_info->sectorsize_bits;
+
+	subpage = (struct btrfs_subpage *)page->private;
+	spin_lock_bh(&subpage->lock);
+	bitmap_set(subpage->tree_block_bitmap, start, nbits);
+	spin_unlock_bh(&subpage->lock);
+	return 0;
 }
 
 void set_page_extent_mapped(struct page *page)
@@ -5065,12 +5093,19 @@ struct extent_buffer *btrfs_clone_extent_buffer(const struct extent_buffer *src)
 		return NULL;
 
 	for (i = 0; i < num_pages; i++) {
+		int ret;
+
 		p = alloc_page(GFP_NOFS);
 		if (!p) {
 			btrfs_release_extent_buffer(new);
 			return NULL;
 		}
-		attach_extent_buffer_page(new, p);
+		ret = attach_extent_buffer_page(new, p);
+		if (ret < 0) {
+			put_page(p);
+			btrfs_release_extent_buffer(new);
+			return NULL;
+		}
 		WARN_ON(PageDirty(p));
 		SetPageUptodate(p);
 		new->pages[i] = p;
@@ -5354,6 +5389,18 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 			goto free_eb;
 		}
 
+		/*
+		 * Preallocate page->private for subpage case, so that
+		 * we won't allocate memory with private_lock hold.
+		 */
+		ret = btrfs_attach_subpage(fs_info, p);
+		if (ret < 0) {
+			unlock_page(p);
+			put_page(p);
+			exists = ERR_PTR(-ENOMEM);
+			goto free_eb;
+		}
+
 		spin_lock(&mapping->private_lock);
 		exists = grab_extent_buffer_from_page(p);
 		if (exists) {
@@ -5362,8 +5409,10 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 			put_page(p);
 			goto free_eb;
 		}
+		/* Should not fail, as we have attached the subpage already */
 		attach_extent_buffer_page(eb, p);
 		spin_unlock(&mapping->private_lock);
+
 		WARN_ON(PageDirty(p));
 		eb->pages[i] = p;
 		if (!PageUptodate(p))
-- 
2.29.2



* [PATCH 05/14] btrfs: extent_io: make grab_extent_buffer_from_page() to handle subpage case
  2020-11-18  8:53 [PATCH v1 00/14] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (3 preceding siblings ...)
  2020-11-18  8:53 ` [PATCH 04/14] btrfs: extent_io: make attach_extent_buffer_page() to handle subpage case Qu Wenruo
@ 2020-11-18  8:53 ` Qu Wenruo
  2020-11-18  8:53 ` [PATCH 06/14] btrfs: extent_io: support subpage for extent buffer page release Qu Wenruo
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Qu Wenruo @ 2020-11-18  8:53 UTC (permalink / raw)
  To: linux-btrfs

For subpage case, grab_extent_buffer_from_page() can't really get an
extent buffer just from btrfs_subpage.

We do have btrfs_subpage::tree_block_bitmap, which could be used to
get the bytenr of an existing extent buffer, and then do a radix tree
search to grab that existing eb.

However, we are still doing the radix tree insert check in
alloc_extent_buffer(), thus we don't really need the extra hassle,
just let alloc_extent_buffer() handle the existing eb in the radix tree.

So for grab_extent_buffer_from_page(), just always return NULL for
subpage case.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 94101d1e04eb..f424a26a695e 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5284,10 +5284,19 @@ struct extent_buffer *alloc_test_extent_buffer(struct btrfs_fs_info *fs_info,
 }
 #endif
 
-static struct extent_buffer *grab_extent_buffer_from_page(struct page *page)
+static struct extent_buffer *grab_extent_buffer_from_page(
+		struct btrfs_fs_info *fs_info, struct page *page)
 {
 	struct extent_buffer *exists;
 
+	/*
+	 * For subpage case, we completely rely on radix tree to ensure we
+	 * don't try to insert two eb for the same bytenr.
+	 * So here we always return NULL and just continue.
+	 */
+	if (btrfs_is_subpage(fs_info))
+		return NULL;
+
 	/* Page not yet attached to an extent buffer */
 	if (!PagePrivate(page))
 		return NULL;
@@ -5402,7 +5411,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		}
 
 		spin_lock(&mapping->private_lock);
-		exists = grab_extent_buffer_from_page(p);
+		exists = grab_extent_buffer_from_page(fs_info, p);
 		if (exists) {
 			spin_unlock(&mapping->private_lock);
 			unlock_page(p);
-- 
2.29.2



* [PATCH 06/14] btrfs: extent_io: support subpage for extent buffer page release
  2020-11-18  8:53 [PATCH v1 00/14] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (4 preceding siblings ...)
  2020-11-18  8:53 ` [PATCH 05/14] btrfs: extent_io: make grab_extent_buffer_from_page() " Qu Wenruo
@ 2020-11-18  8:53 ` Qu Wenruo
  2020-11-18  8:53 ` [PATCH 07/14] btrfs: extent_io: make set/clear_extent_buffer_uptodate() to support subpage size Qu Wenruo
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Qu Wenruo @ 2020-11-18  8:53 UTC (permalink / raw)
  To: linux-btrfs

In btrfs_release_extent_buffer_pages(), we need to add extra handling
for subpage.

To do so, introduce a new helper, detach_extent_buffer_page(), to do
different handling for regular and subpage cases.

For subpage case, the new trick is to clear the range of current extent
buffer, and detach page private if and only if we're the last tree block
of the page.
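
The "detach only if we're the last tree block" rule can be sketched in
userspace like this, again assuming a 16-bit bitmap for a 64K page with 4K
sectors. toy_release_tree_block() is a hypothetical helper, not code from
this patch, and it leaves out the locking done in the real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Clear the bits of one tree block in @bitmap.
 * Returns true if no tree block is left in the page, which is the only
 * case where the caller would go on to detach page private.
 */
static bool toy_release_tree_block(uint16_t *bitmap, unsigned int start,
				   unsigned int nbits)
{
	uint16_t mask;

	assert(nbits >= 1 && start + nbits <= 16);
	mask = (uint16_t)(((UINT32_C(1) << nbits) - 1) << start);
	*bitmap &= (uint16_t)~mask;
	return *bitmap == 0;
}
```

With two 16K tree blocks in the page (bitmap 0x00FF), releasing the first
one returns false and keeps the private data, releasing the second one
returns true.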

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 70 +++++++++++++++++++++++++++++++++-----------
 1 file changed, 53 insertions(+), 17 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index f424a26a695e..090acf0e6a59 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4999,25 +4999,12 @@ int extent_buffer_under_io(const struct extent_buffer *eb)
 		test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
 }
 
-/*
- * Release all pages attached to the extent buffer.
- */
-static void btrfs_release_extent_buffer_pages(struct extent_buffer *eb)
+static void detach_extent_buffer_page(struct extent_buffer *eb,
+				      struct page *page)
 {
-	int i;
-	int num_pages;
-	int mapped = !test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags);
-
-	BUG_ON(extent_buffer_under_io(eb));
-
-	num_pages = num_extent_pages(eb);
-	for (i = 0; i < num_pages; i++) {
-		struct page *page = eb->pages[i];
+	struct btrfs_fs_info *fs_info = eb->fs_info;
 
-		if (!page)
-			continue;
-		if (mapped)
-			spin_lock(&page->mapping->private_lock);
+	if (!btrfs_is_subpage(fs_info)) {
 		/*
 		 * We do this since we'll remove the pages after we've
 		 * removed the eb from the radix tree, so we could race
@@ -5036,6 +5023,55 @@ static void btrfs_release_extent_buffer_pages(struct extent_buffer *eb)
 			 */
 			detach_page_private(page);
 		}
+	}
+
+	/*
+	 * For subpage case, clear the range in tree_block_bitmap,
+	 * and if we're the last one, detach private completely.
+	 */
+	if (PagePrivate(page)) {
+		struct btrfs_subpage *subpage;
+		int start = (eb->start - page_offset(page)) >>
+			    fs_info->sectorsize_bits;
+		int nbits = (eb->len) >> fs_info->sectorsize_bits;
+		bool last = false;
+
+		ASSERT(page_offset(page) <= eb->start &&
+		       eb->start + eb->len <= page_offset(page) + PAGE_SIZE);
+
+		subpage = (struct btrfs_subpage *)page->private;
+		spin_lock_bh(&subpage->lock);
+		bitmap_clear(subpage->tree_block_bitmap, start, nbits);
+		if (bitmap_empty(subpage->tree_block_bitmap,
+				 BTRFS_SUBPAGE_BITMAP_SIZE))
+			last = true;
+		spin_unlock_bh(&subpage->lock);
+		if (last)
+			btrfs_detach_subpage(fs_info, page);
+	}
+}
+
+/*
+ * Release all pages attached to the extent buffer.
+ */
+static void btrfs_release_extent_buffer_pages(struct extent_buffer *eb)
+{
+	int i;
+	int num_pages;
+	int mapped = !test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags);
+
+	ASSERT(!extent_buffer_under_io(eb));
+
+	num_pages = num_extent_pages(eb);
+	for (i = 0; i < num_pages; i++) {
+		struct page *page = eb->pages[i];
+
+		if (!page)
+			continue;
+		if (mapped)
+			spin_lock(&page->mapping->private_lock);
+
+		detach_extent_buffer_page(eb, page);
 
 		if (mapped)
 			spin_unlock(&page->mapping->private_lock);
-- 
2.29.2



* [PATCH 07/14] btrfs: extent_io: make set/clear_extent_buffer_uptodate() to support subpage size
  2020-11-18  8:53 [PATCH v1 00/14] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (5 preceding siblings ...)
  2020-11-18  8:53 ` [PATCH 06/14] btrfs: extent_io: support subpage for extent buffer page release Qu Wenruo
@ 2020-11-18  8:53 ` Qu Wenruo
  2020-11-18  8:53 ` [PATCH 08/14] btrfs: extent_io: implement try_release_extent_buffer() for subpage metadata support Qu Wenruo
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Qu Wenruo @ 2020-11-18  8:53 UTC (permalink / raw)
  To: linux-btrfs

For those functions, to support subpage size they just need the following work:
- set/clear uptodate bitmap
- set page Uptodate if the full range of the page is uptodate
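
A userspace sketch of the two steps above, assuming a 16-bit uptodate
bitmap (64K page, 4K sectors). struct toy_subpage and the toy_*() helpers
are hypothetical names; the page_uptodate field stands in for the
PageUptodate flag, and clearing it on any range clear is my reading of the
patch, not something the commit message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_subpage {
	uint16_t uptodate_bitmap;	/* one bit per sector of the page */
	bool page_uptodate;		/* models the PageUptodate flag */
};

static uint16_t toy_range_mask(unsigned int start, unsigned int nbits)
{
	assert(nbits >= 1 && start + nbits <= 16);
	return (uint16_t)(((UINT32_C(1) << nbits) - 1) << start);
}

/* Mark a range uptodate; the page becomes uptodate once all bits are set. */
static void toy_set_range_uptodate(struct toy_subpage *sp,
				   unsigned int start, unsigned int nbits)
{
	sp->uptodate_bitmap |= toy_range_mask(start, nbits);
	if (sp->uptodate_bitmap == UINT16_MAX)
		sp->page_uptodate = true;
}

/* Clear a range; a page with any stale sector is no longer uptodate. */
static void toy_clear_range_uptodate(struct toy_subpage *sp,
				     unsigned int start, unsigned int nbits)
{
	sp->uptodate_bitmap &= (uint16_t)~toy_range_mask(start, nbits);
	sp->page_uptodate = false;
}
```

With four 16K tree blocks in a 64K page, the page only becomes uptodate
after the fourth toy_set_range_uptodate() call.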

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 40 ++++++++++++++++++++++++++++++++++++----
 fs/btrfs/extent_io.h |  1 +
 2 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 090acf0e6a59..b3edd7fba5c8 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5663,10 +5663,24 @@ bool set_extent_buffer_dirty(struct extent_buffer *eb)
 
 void clear_extent_buffer_uptodate(struct extent_buffer *eb)
 {
-	int i;
-	struct page *page;
+	struct btrfs_fs_info *fs_info = eb->fs_info;
+	struct page *page = eb->pages[0];
 	int num_pages;
+	int i;
+
+	if (btrfs_is_subpage(fs_info)) {
+		struct btrfs_subpage *subpage;
+		int bit_start = (eb->start - page_offset(page)) >>
+				fs_info->sectorsize_bits;
+		int nbits = fs_info->nodesize >>
+				fs_info->sectorsize_bits;
 
+		subpage = (struct btrfs_subpage *)page->private;
+
+		spin_lock_bh(&subpage->lock);
+		bitmap_clear(subpage->uptodate_bitmap, bit_start, nbits);
+		spin_unlock_bh(&subpage->lock);
+	}
 	clear_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
 	num_pages = num_extent_pages(eb);
 	for (i = 0; i < num_pages; i++) {
@@ -5678,11 +5692,29 @@ void clear_extent_buffer_uptodate(struct extent_buffer *eb)
 
 void set_extent_buffer_uptodate(struct extent_buffer *eb)
 {
-	int i;
-	struct page *page;
+	struct btrfs_fs_info *fs_info = eb->fs_info;
+	struct page *page = eb->pages[0];
 	int num_pages;
+	int i;
 
 	set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
+	if (btrfs_is_subpage(fs_info)) {
+		struct btrfs_subpage *subpage;
+		int bit_start = (eb->start - page_offset(page)) >>
+				fs_info->sectorsize_bits;
+		int nbits = fs_info->nodesize >>
+				fs_info->sectorsize_bits;
+
+		subpage = (struct btrfs_subpage *)page->private;
+
+		spin_lock_bh(&subpage->lock);
+		bitmap_set(subpage->uptodate_bitmap, bit_start, nbits);
+		if (bitmap_full(subpage->uptodate_bitmap,
+				BTRFS_SUBPAGE_BITMAP_SIZE))
+			SetPageUptodate(page);
+		spin_unlock_bh(&subpage->lock);
+		return;
+	}
 	num_pages = num_extent_pages(eb);
 	for (i = 0; i < num_pages; i++) {
 		page = eb->pages[i];
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 4251bef25aac..11e1e013cb8c 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -311,6 +311,7 @@ blk_status_t btrfs_submit_read_repair(struct inode *inode,
 struct btrfs_subpage {
 	spinlock_t lock;
 	DECLARE_BITMAP(tree_block_bitmap, BTRFS_SUBPAGE_BITMAP_SIZE);
+	DECLARE_BITMAP(uptodate_bitmap, BTRFS_SUBPAGE_BITMAP_SIZE);
 };
 
 int btrfs_attach_subpage(struct btrfs_fs_info *fs_info, struct page *page);
-- 
2.29.2



* [PATCH 08/14] btrfs: extent_io: implement try_release_extent_buffer() for subpage metadata support
  2020-11-18  8:53 [PATCH v1 00/14] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (6 preceding siblings ...)
  2020-11-18  8:53 ` [PATCH 07/14] btrfs: extent_io: make set/clear_extent_buffer_uptodate() to support subpage size Qu Wenruo
@ 2020-11-18  8:53 ` Qu Wenruo
  2020-11-18  8:53 ` [PATCH 09/14] btrfs: extent_io: introduce read_extent_buffer_subpage() Qu Wenruo
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Qu Wenruo @ 2020-11-18  8:53 UTC (permalink / raw)
  To: linux-btrfs

Unlike the original try_release_extent_buffer,
try_release_subpage_extent_buffer() will iterate through
btrfs_subpage::tree_block_bitmap, and try to release each extent buffer.
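
The iteration over tree_block_bitmap can be sketched in userspace as below.
It assumes a 16-bit bitmap and that every tree block covers
nodesize / sectorsize consecutive bits; toy_collect_tree_blocks() is a
hypothetical helper which only computes the start bytenr of each tree block
and does none of the locking, radix tree lookup or release work.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Walk @bitmap and store the start bytenr of every tree block found in
 * @out (at most @out_len entries). Returns the number of entries stored.
 */
static unsigned int toy_collect_tree_blocks(uint16_t bitmap,
					    uint64_t page_start,
					    uint32_t sectorsize,
					    uint32_t nodesize,
					    uint64_t *out,
					    unsigned int out_len)
{
	unsigned int bits_per_block;
	unsigned int bit = 0;
	unsigned int found = 0;

	assert(sectorsize > 0 && nodesize >= sectorsize);
	bits_per_block = nodesize / sectorsize;

	while (bit < 16 && found < out_len) {
		/* Skip sectors that do not belong to any tree block. */
		if (!(bitmap & (1u << bit))) {
			bit++;
			continue;
		}
		out[found++] = page_start + (uint64_t)bit * sectorsize;
		/* Jump over the rest of this tree block. */
		bit += bits_per_block;
	}
	return found;
}
```

For a page at offset 64K holding two 16K tree blocks at bits 0-3 and 8-11,
the two start bytenrs are 65536 and 98304.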

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 69 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 69 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index b3edd7fba5c8..28f35eb06bf8 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -6340,10 +6340,79 @@ void memmove_extent_buffer(const struct extent_buffer *dst,
 	}
 }
 
+static int try_release_subpage_extent_buffer(struct page *page)
+{
+	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
+	u64 page_start = page_offset(page);
+	int bitmap_size = BTRFS_SUBPAGE_BITMAP_SIZE;
+	int bit_start = 0;
+	int ret;
+
+	while (bit_start < bitmap_size) {
+		struct btrfs_subpage *subpage;
+		struct extent_buffer *eb;
+		u64 start;
+
+		/*
+		 * Make sure the page still has private, as previous run can
+		 * detach the private
+		 */
+		spin_lock(&page->mapping->private_lock);
+		if (!PagePrivate(page)) {
+			spin_unlock(&page->mapping->private_lock);
+			break;
+		}
+		subpage = (struct btrfs_subpage *)page->private;
+		spin_unlock(&page->mapping->private_lock);
+
+		spin_lock_bh(&subpage->lock);
+		bit_start = find_next_bit(subpage->tree_block_bitmap,
+				BTRFS_SUBPAGE_BITMAP_SIZE, bit_start);
+		spin_unlock_bh(&subpage->lock);
+		if (bit_start >= bitmap_size)
+			break;
+		start = bit_start * fs_info->sectorsize + page_start;
+		bit_start += fs_info->nodesize >> fs_info->sectorsize_bits;
+		/*
+		 * Here we can't call find_extent_buffer() which will increase
+		 * eb->refs.
+		 */
+		rcu_read_lock();
+		eb = radix_tree_lookup(&fs_info->buffer_radix,
+				start >> fs_info->sectorsize_bits);
+		rcu_read_unlock();
+		ASSERT(eb);
+		spin_lock(&eb->refs_lock);
+		if (atomic_read(&eb->refs) != 1 || extent_buffer_under_io(eb) ||
+		    !test_and_clear_bit(EXTENT_BUFFER_TREE_REF, &eb->bflags)) {
+			spin_unlock(&eb->refs_lock);
+			continue;
+		}
+		/*
+		 * Here we don't care the return value, we will always check
+		 * the page private at the end.
+		 * And release_extent_buffer() will release the refs_lock.
+		 */
+		release_extent_buffer(eb);
+	}
+	/* Finally to check if we have cleared page private */
+	spin_lock(&page->mapping->private_lock);
+	if (!PagePrivate(page))
+		ret = 1;
+	else
+		ret = 0;
+	spin_unlock(&page->mapping->private_lock);
+	return ret;
+
+}
+
 int try_release_extent_buffer(struct page *page)
 {
 	struct extent_buffer *eb;
 
+	if (btrfs_is_subpage(btrfs_sb(page->mapping->host->i_sb)))
+		return try_release_subpage_extent_buffer(page);
+
 	/*
 	 * We need to make sure nobody is attaching this page to an eb right
 	 * now.
-- 
2.29.2



* [PATCH 09/14] btrfs: extent_io: introduce read_extent_buffer_subpage()
  2020-11-18  8:53 [PATCH v1 00/14] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (7 preceding siblings ...)
  2020-11-18  8:53 ` [PATCH 08/14] btrfs: extent_io: implement try_release_extent_buffer() for subpage metadata support Qu Wenruo
@ 2020-11-18  8:53 ` Qu Wenruo
  2020-11-18  8:53 ` [PATCH 10/14] btrfs: extent_io: make endio_readpage_update_page_status() to handle subpage case Qu Wenruo
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Qu Wenruo @ 2020-11-18  8:53 UTC (permalink / raw)
  To: linux-btrfs

Introduce a new helper, read_extent_buffer_subpage(), to do the subpage
extent buffer read.

The differences between the regular and subpage routines are:
- No page locking
  Here we completely rely on extent locking.
  Page locking can reduce the concurrency greatly: if we lock one
  page to read one extent buffer, all the other extent buffers in the
  same page will have to wait.

- Extent uptodate condition
  Besides the existing PageUptodate() and EXTENT_BUFFER_UPTODATE checks,
  we also need to check btrfs_subpage::uptodate_bitmap.

- No page loop
  Just one page, no need to loop, this greatly simplifies the subpage
  routine.
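
As a rough userspace sketch of the uptodate condition above (not the
kernel code): struct eb_read_state, eb_first_bit() and eb_needs_read()
are hypothetical names, and a 16-bit bitmap is assumed. Like the patch,
the sketch only tests the bit of the first sector of the extent buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the three states that are checked. */
struct eb_read_state {
	bool eb_uptodate;         /* models EXTENT_BUFFER_UPTODATE */
	bool page_uptodate;       /* models PageUptodate() */
	uint16_t uptodate_bitmap; /* models btrfs_subpage::uptodate_bitmap */
};

/* Bit index of the first sector of the extent buffer inside its page. */
static int eb_first_bit(uint64_t eb_start, uint64_t page_start,
			unsigned int sectorsize_bits)
{
	assert(eb_start >= page_start);
	return (int)((eb_start - page_start) >> sectorsize_bits);
}

/* A read is needed only if none of the three uptodate states is set. */
static bool eb_needs_read(const struct eb_read_state *s, uint64_t eb_start,
			  uint64_t page_start, unsigned int sectorsize_bits)
{
	int bit = eb_first_bit(eb_start, page_start, sectorsize_bits);

	assert(bit < 16);
	return !(s->eb_uptodate || s->page_uptodate ||
		 ((s->uptodate_bitmap >> bit) & 1));
}
```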

This patch only implements the bio submit part; there is no endio support yet.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c   |  1 +
 fs/btrfs/extent_io.c | 72 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 73 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 8a558a43818d..b395daf62086 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -604,6 +604,7 @@ int btrfs_validate_metadata_buffer(struct btrfs_io_bio *io_bio,
 	ASSERT(page->private);
 	eb = (struct extent_buffer *)page->private;
 
+
 	/*
 	 * The pending IO might have been the only thing that kept this buffer
 	 * in memory.  Make sure we have a ref for all this other checks
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 28f35eb06bf8..35aee688d6c1 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5722,6 +5722,75 @@ void set_extent_buffer_uptodate(struct extent_buffer *eb)
 	}
 }
 
+static int read_extent_buffer_subpage(struct extent_buffer *eb, int wait,
+				      int mirror_num)
+{
+	struct btrfs_fs_info *fs_info = eb->fs_info;
+	struct btrfs_subpage *subpage;
+	struct extent_io_tree *io_tree;
+	struct page *page = eb->pages[0];
+	struct bio *bio = NULL;
+	int start = (eb->start - page_offset(page)) >> fs_info->sectorsize_bits;
+	int ret = 0;
+
+	ASSERT(!test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags));
+	ASSERT(PagePrivate(page));
+	subpage = (struct btrfs_subpage *)page->private;
+	io_tree = &BTRFS_I(fs_info->btree_inode)->io_tree;
+
+	if (wait == WAIT_NONE) {
+		ret = try_lock_extent(io_tree, eb->start,
+				      eb->start + eb->len - 1);
+		if (ret <= 0)
+			return ret;
+	} else {
+		ret = lock_extent(io_tree, eb->start, eb->start + eb->len - 1);
+		if (ret < 0)
+			return ret;
+	}
+
+	ret = 0;
+	if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags) ||
+	    PageUptodate(page) || test_bit(start, subpage->uptodate_bitmap)) {
+		set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
+		unlock_extent(io_tree, eb->start, eb->start + eb->len - 1);
+		return ret;
+	}
+
+	clear_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags);
+	eb->read_mirror = 0;
+	atomic_set(&eb->io_pages, 1);
+	check_buffer_tree_ref(eb);
+
+	ret = submit_extent_page(REQ_OP_READ | REQ_META, NULL, page, eb->start,
+				 eb->len, eb->start - page_offset(page), &bio,
+				 end_bio_extent_readpage, mirror_num, 0, 0,
+				 true);
+	if (ret) {
+		/*
+		 * In the endio function, if we hit something wrong we will
+		 * increase the io_pages, so here we need to decrease it for error
+		 * path.
+		 */
+		atomic_dec(&eb->io_pages);
+	}
+	if (bio) {
+		int tmp;
+
+		tmp = submit_one_bio(bio, mirror_num, 0);
+		if (tmp < 0)
+			return tmp;
+	}
+	if (ret || wait != WAIT_COMPLETE)
+		return ret;
+
+	wait_extent_bit(io_tree, eb->start, eb->start + eb->len - 1,
+			EXTENT_LOCKED);
+	if (!test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags))
+		ret = -EIO;
+	return ret;
+}
+
 int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num)
 {
 	int i;
@@ -5738,6 +5807,9 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num)
 	if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags))
 		return 0;
 
+	if (btrfs_is_subpage(eb->fs_info))
+		return read_extent_buffer_subpage(eb, wait, mirror_num);
+
 	num_pages = num_extent_pages(eb);
 	for (i = 0; i < num_pages; i++) {
 		page = eb->pages[i];
-- 
2.29.2



* [PATCH 10/14] btrfs: extent_io: make endio_readpage_update_page_status() to handle subpage case
  2020-11-18  8:53 [PATCH v1 00/14] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (8 preceding siblings ...)
  2020-11-18  8:53 ` [PATCH 09/14] btrfs: extent_io: introduce read_extent_buffer_subpage() Qu Wenruo
@ 2020-11-18  8:53 ` Qu Wenruo
  2020-11-18  8:53 ` [PATCH 11/14] btrfs: disk-io: introduce subpage metadata validation check Qu Wenruo
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Qu Wenruo @ 2020-11-18  8:53 UTC (permalink / raw)
  To: linux-btrfs

To handle subpage status update, add the following new tricks:
- Set btrfs_subpage::error_bitmap
  Now if we hit an error, we set the corresponding bits in the error
  bitmap, then call ClearPageUptodate() and SetPageError().

- Update page status according to uptodate_bitmap
  Now we only call SetPageUptodate() when all sectors in the page are
  uptodate.
  Also, if the read clears all error bits, we call ClearPageError().

- No page unlock for metadata
  Since metadata doesn't utilize page locking at all, skip it for now.
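
The bitmap rules above can be modeled in userspace C roughly as follows.
This is a sketch under assumptions, not the patch itself: struct
subpage_sketch, range_mask() and update_status() are hypothetical, the
bitmaps are assumed to be 16 bits wide, and the subpage spinlock is left
out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SUBPAGE_BITS 16	/* assumption: 64K page, 4K sectors */

/* Hypothetical model of the per-page state touched by the endio path. */
struct subpage_sketch {
	uint16_t uptodate_bitmap;
	uint16_t error_bitmap;
	bool page_uptodate;	/* models the PageUptodate flag */
	bool page_error;	/* models the PageError flag */
};

/* Mask covering nbits bits starting at bit_start. */
static uint16_t range_mask(int bit_start, int nbits)
{
	assert(bit_start >= 0 && nbits > 0);
	assert(bit_start + nbits <= SUBPAGE_BITS);
	return (uint16_t)(((1u << nbits) - 1u) << bit_start);
}

static void update_status(struct subpage_sketch *sp, bool uptodate,
			  int bit_start, int nbits)
{
	uint16_t mask = range_mask(bit_start, nbits);

	if (!uptodate) {
		/* Error: record the range, page is no longer uptodate. */
		sp->error_bitmap |= mask;
		sp->page_uptodate = false;
		sp->page_error = true;
		return;
	}
	sp->uptodate_bitmap |= mask;
	sp->error_bitmap &= (uint16_t)~mask;
	/* Page flags follow the bitmaps as a whole. */
	if (sp->error_bitmap == 0)
		sp->page_error = false;
	if (sp->uptodate_bitmap == 0xFFFF)
		sp->page_uptodate = true;
}
```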

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 56 +++++++++++++++++++++++++++++++++++++++-----
 fs/btrfs/extent_io.h |  1 +
 2 files changed, 51 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 35aee688d6c1..236de0b6b20a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2847,15 +2847,59 @@ endio_readpage_release_extent(struct processed_extent *processed,
 	processed->uptodate = uptodate;
 }
 
-static void endio_readpage_update_page_status(struct page *page, bool uptodate)
+static void endio_readpage_update_page_status(struct page *page, bool uptodate,
+					      u64 start, u64 end)
 {
-	if (uptodate) {
-		SetPageUptodate(page);
-	} else {
+	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
+	struct btrfs_subpage *subpage;
+	int bit_start;
+	int nbits;
+	bool all_uptodate = false;
+	bool no_error = false;
+
+	ASSERT(page_offset(page) <= start &&
+		end <= page_offset(page) + PAGE_SIZE - 1);
+
+	if (!btrfs_is_subpage(fs_info)) {
+		if (uptodate) {
+			SetPageUptodate(page);
+		} else {
+			ClearPageUptodate(page);
+			SetPageError(page);
+		}
+		unlock_page(page);
+		return;
+	}
+
+	ASSERT(PagePrivate(page) && page->private);
+	subpage = (struct btrfs_subpage *)page->private;
+	bit_start = (start - page_offset(page)) >> fs_info->sectorsize_bits;
+	nbits = fs_info->nodesize >> fs_info->sectorsize_bits;
+
+	if (!uptodate) {
+		spin_lock_bh(&subpage->lock);
+		bitmap_set(subpage->error_bitmap, bit_start, nbits);
+		spin_unlock_bh(&subpage->lock);
+
 		ClearPageUptodate(page);
 		SetPageError(page);
+		return;
 	}
-	unlock_page(page);
+
+	spin_lock_bh(&subpage->lock);
+	bitmap_set(subpage->uptodate_bitmap, bit_start, nbits);
+	bitmap_clear(subpage->error_bitmap, bit_start, nbits);
+	if (bitmap_full(subpage->uptodate_bitmap, BTRFS_SUBPAGE_BITMAP_SIZE))
+		all_uptodate = true;
+	if (bitmap_empty(subpage->error_bitmap, BTRFS_SUBPAGE_BITMAP_SIZE))
+		no_error = true;
+	spin_unlock_bh(&subpage->lock);
+
+	if (no_error)
+		ClearPageError(page);
+	if (all_uptodate)
+		SetPageUptodate(page);
+	return;
 }
 
 /*
@@ -2985,7 +3029,7 @@ static void end_bio_extent_readpage(struct bio *bio)
 		}
 		bio_offset += len;
 
-		endio_readpage_update_page_status(page, uptodate);
+		endio_readpage_update_page_status(page, uptodate, start, end);
 		endio_readpage_release_extent(&processed, BTRFS_I(inode),
 					      start, end, uptodate);
 	}
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 11e1e013cb8c..b4d0e39ebceb 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -312,6 +312,7 @@ struct btrfs_subpage {
 	spinlock_t lock;
 	DECLARE_BITMAP(tree_block_bitmap, BTRFS_SUBPAGE_BITMAP_SIZE);
 	DECLARE_BITMAP(uptodate_bitmap, BTRFS_SUBPAGE_BITMAP_SIZE);
+	DECLARE_BITMAP(error_bitmap, BTRFS_SUBPAGE_BITMAP_SIZE);
 };
 
 int btrfs_attach_subpage(struct btrfs_fs_info *fs_info, struct page *page);
-- 
2.29.2



* [PATCH 11/14] btrfs: disk-io: introduce subpage metadata validation check
  2020-11-18  8:53 [PATCH v1 00/14] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (9 preceding siblings ...)
  2020-11-18  8:53 ` [PATCH 10/14] btrfs: extent_io: make endio_readpage_update_page_status() to handle subpage case Qu Wenruo
@ 2020-11-18  8:53 ` Qu Wenruo
  2020-11-18  8:53 ` [PATCH 12/14] btrfs: introduce btrfs_subpage for data inodes Qu Wenruo
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Qu Wenruo @ 2020-11-18  8:53 UTC (permalink / raw)
  To: linux-btrfs

For subpage metadata validation check, there are some differences:
- Read must finish in one bvec
  Since we're just reading one subpage range in one page, it should
  never be split into two bios nor two bvecs.

- How to grab the existing eb
  Instead of grabbing eb using page->private, we have to go search radix
  tree as we don't have any direct pointer at hand.
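
A minimal userspace sketch of the range sanity check and the lookup
index, for illustration. The helper names (is_aligned(),
subpage_read_range_valid(), eb_radix_index()) are hypothetical, and the
sketch folds the nodesize alignment check and the one-tree-block ASSERT
of the patch into a single length comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Power-of-two alignment test, similar in spirit to IS_ALIGNED(). */
static bool is_aligned(uint64_t value, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (value & (align - 1)) == 0;
}

/*
 * A subpage metadata read must cover exactly one tree block:
 * sector-aligned start, and a length equal to nodesize.
 */
static bool subpage_read_range_valid(uint64_t start, uint64_t end,
				     uint32_t sectorsize, uint32_t nodesize)
{
	uint64_t len = end - start + 1;

	return is_aligned(start, sectorsize) &&
	       is_aligned(len, sectorsize) &&
	       len == nodesize;
}

/* Index used for the extent buffer lookup: bytenr in units of sectors. */
static uint64_t eb_radix_index(uint64_t start, uint32_t sectorsize)
{
	assert(sectorsize != 0);
	return start / sectorsize;
}
```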

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 82 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 82 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b395daf62086..699b999c8ba3 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -593,6 +593,84 @@ static int validate_extent_buffer(struct extent_buffer *eb)
 	return ret;
 }
 
+static int validate_subpage_buffer(struct page *page, u64 start, u64 end,
+				   int mirror)
+{
+	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
+	struct extent_buffer *eb;
+	int reads_done;
+	int ret = 0;
+
+	if (!IS_ALIGNED(start, fs_info->sectorsize) ||
+	    !IS_ALIGNED(end - start + 1, fs_info->sectorsize) ||
+	    !IS_ALIGNED(end - start + 1, fs_info->nodesize)) {
+		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
+		btrfs_err(fs_info, "invalid tree read bytenr");
+		return -EUCLEAN;
+	}
+
+	/*
+	 * We don't allow bio merge for subpage metadata read, so we should
+	 * only get one eb for each endio hook.
+	 */
+	ASSERT(end == start + fs_info->nodesize - 1);
+	ASSERT(PagePrivate(page));
+
+	rcu_read_lock();
+	eb = radix_tree_lookup(&fs_info->buffer_radix,
+			       start / fs_info->sectorsize);
+	rcu_read_unlock();
+
+	/*
+	 * When we are reading one tree block, eb must have been
+	 * inserted into the radix tree. If not something is wrong.
+	 */
+	if (!eb) {
+		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
+		btrfs_err(fs_info,
+			"can't find extent buffer for bytenr %llu",
+			start);
+		return -EUCLEAN;
+	}
+	/*
+	 * The pending IO might have been the only thing that kept
+	 * this buffer in memory.  Make sure we have a ref for all
+	 * this other checks
+	 */
+	atomic_inc(&eb->refs);
+
+	reads_done = atomic_dec_and_test(&eb->io_pages);
+	/* Subpage read must finish in page read */
+	ASSERT(reads_done);
+
+	eb->read_mirror = mirror;
+	if (test_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags)) {
+		ret = -EIO;
+		goto err;
+	}
+	ret = validate_extent_buffer(eb);
+	if (ret < 0)
+		goto err;
+
+	if (test_and_clear_bit(EXTENT_BUFFER_READAHEAD, &eb->bflags))
+		btree_readahead_hook(eb, ret);
+
+	set_extent_buffer_uptodate(eb);
+
+	free_extent_buffer(eb);
+	return ret;
+err:
+	/*
+	 * our io error hook is going to dec the io pages
+	 * again, we have to make sure it has something to
+	 * decrement
+	 */
+	atomic_inc(&eb->io_pages);
+	clear_extent_buffer_uptodate(eb);
+	free_extent_buffer(eb);
+	return ret;
+}
+
 int btrfs_validate_metadata_buffer(struct btrfs_io_bio *io_bio,
 				   struct page *page, u64 start, u64 end,
 				   int mirror)
@@ -602,6 +680,10 @@ int btrfs_validate_metadata_buffer(struct btrfs_io_bio *io_bio,
 	int reads_done;
 
 	ASSERT(page->private);
+
+	if (btrfs_is_subpage(btrfs_sb(page->mapping->host->i_sb)))
+		return validate_subpage_buffer(page, start, end, mirror);
+
 	eb = (struct extent_buffer *)page->private;
 
 
-- 
2.29.2



* [PATCH 12/14] btrfs: introduce btrfs_subpage for data inodes
  2020-11-18  8:53 [PATCH v1 00/14] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (10 preceding siblings ...)
  2020-11-18  8:53 ` [PATCH 11/14] btrfs: disk-io: introduce subpage metadata validation check Qu Wenruo
@ 2020-11-18  8:53 ` Qu Wenruo
  2020-11-18  8:53 ` [PATCH 13/14] btrfs: integrate page status update for read path into begin/end_page_read() Qu Wenruo
  2020-11-18  8:53 ` [PATCH 14/14] btrfs: allow RO mount of 4K sector size fs on 64K page system Qu Wenruo
  13 siblings, 0 replies; 20+ messages in thread
From: Qu Wenruo @ 2020-11-18  8:53 UTC (permalink / raw)
  To: linux-btrfs

To support subpage sector size, data pages also need extra info to track
which sectors in a page are uptodate/dirty/...

This patch makes pages for data inodes get a btrfs_subpage structure
attached, and detached when the page is freed.

This patch also slightly changes when set_page_extent_mapped() is
called, to make sure:
- We have page->mapping set
  page->mapping->host is used to grab btrfs_fs_info, thus we can only
  call this function after the page is mapped to an inode.

  One call site attaches pages to the inode manually, thus we have to
  modify the timing of set_page_extent_mapped() a little.

- As soon as possible, before other operations
  Since memory allocation can fail, we have to do extra error handling.
  Calling set_page_extent_mapped() as soon as possible can simplify the
  error handling for several call sites.

The idea is pretty much the same as iomap_page, but with more bitmaps
for btrfs specific cases.

Currently the plan is to switch to iomap if iomap can provide sector
aligned write back (only write back dirty sectors, not the full page;
data balance requires this feature).

So we will stick to btrfs specific bitmap for now.
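
The attach/detach contract can be modeled in userspace C as below. It is
only a sketch: struct page_sketch, struct subpage_info, set_mapped() and
clear_mapped() are hypothetical stand-ins for the page private handling,
and calloc() stands in for the kernel allocation that can fail.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-page info, standing in for struct btrfs_subpage. */
struct subpage_info {
	unsigned long uptodate_bitmap;
};

/* Hypothetical model of the page private state. */
struct page_sketch {
	bool has_private;
	void *private_data;
};

/* Marker used for the regular (sectorsize == PAGE_SIZE) case. */
#define PAGE_PRIVATE_MARKER ((void *)1)

/* Idempotent attach; only the subpage case allocates and can fail. */
static int set_mapped(struct page_sketch *page, bool subpage)
{
	assert(page != NULL);
	if (page->has_private)
		return 0;
	if (!subpage) {
		page->private_data = PAGE_PRIVATE_MARKER;
		page->has_private = true;
		return 0;
	}
	page->private_data = calloc(1, sizeof(struct subpage_info));
	if (!page->private_data)
		return -ENOMEM;
	page->has_private = true;
	return 0;
}

/* Detach; frees the extra structure only in the subpage case. */
static void clear_mapped(struct page_sketch *page, bool subpage)
{
	assert(page != NULL);
	if (!page->has_private)
		return;
	if (subpage)
		free(page->private_data);
	page->private_data = NULL;
	page->has_private = false;
}
```

Because the attach step can now return -ENOMEM, calling it before any
extent locking or reservation keeps the unwind path of each caller
short.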

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/compression.c      | 10 ++++++--
 fs/btrfs/extent_io.c        | 47 +++++++++++++++++++++++++++++++++----
 fs/btrfs/extent_io.h        |  3 ++-
 fs/btrfs/file.c             | 10 +++++---
 fs/btrfs/free-space-cache.c | 15 +++++++++---
 fs/btrfs/inode.c            | 12 ++++++----
 fs/btrfs/ioctl.c            |  5 +++-
 fs/btrfs/reflink.c          |  5 +++-
 fs/btrfs/relocation.c       | 12 ++++++++--
 9 files changed, 98 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 3fb6fde2ca13..f0b119a910a4 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -542,13 +542,19 @@ static noinline int add_ra_bio_pages(struct inode *inode,
 			goto next;
 		}
 
-		end = last_offset + PAGE_SIZE - 1;
 		/*
 		 * at this point, we have a locked page in the page cache
 		 * for these bytes in the file.  But, we have to make
 		 * sure they map to this compressed extent on disk.
 		 */
-		set_page_extent_mapped(page);
+		ret = set_page_extent_mapped(page);
+		if (ret < 0) {
+			unlock_page(page);
+			put_page(page);
+			break;
+		}
+
+		end = last_offset + PAGE_SIZE - 1;
 		lock_extent(tree, last_offset, end);
 		read_lock(&em_tree->lock);
 		em = lookup_extent_mapping(em_tree, last_offset,
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 236de0b6b20a..3d1dee27db8a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3232,10 +3232,40 @@ static int attach_extent_buffer_page(struct extent_buffer *eb,
 	return 0;
 }
 
-void set_page_extent_mapped(struct page *page)
+int __must_check set_page_extent_mapped(struct page *page)
 {
-	if (!PagePrivate(page))
+	struct btrfs_fs_info *fs_info;
+
+	ASSERT(page->mapping);
+
+	if (PagePrivate(page))
+		return 0;
+
+	fs_info = btrfs_sb(page->mapping->host->i_sb);
+	if (!btrfs_is_subpage(fs_info)) {
 		attach_page_private(page, (void *)EXTENT_PAGE_PRIVATE);
+		return 0;
+	}
+
+	return btrfs_attach_subpage(fs_info, page);
+}
+
+void clear_page_extent_mapped(struct page *page)
+{
+	struct btrfs_fs_info *fs_info;
+
+	ASSERT(page->mapping);
+
+	if (!PagePrivate(page))
+		return;
+
+	fs_info = btrfs_sb(page->mapping->host->i_sb);
+	if (!btrfs_is_subpage(fs_info)) {
+		detach_page_private(page);
+		return;
+	}
+
+	btrfs_detach_subpage(fs_info, page);
 }
 
 static struct extent_map *
@@ -3292,7 +3322,12 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 	unsigned long this_bio_flag = 0;
 	struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
 
-	set_page_extent_mapped(page);
+	ret = set_page_extent_mapped(page);
+	if (ret < 0) {
+		unlock_extent(tree, start, end);
+		SetPageError(page);
+		goto out;
+	}
 
 	if (!PageUptodate(page)) {
 		if (cleancache_get_page(page) == 0) {
@@ -3737,7 +3772,11 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
 		flush_dcache_page(page);
 	}
 
-	set_page_extent_mapped(page);
+	ret = set_page_extent_mapped(page);
+	if (ret < 0) {
+		SetPageError(page);
+		goto done;
+	}
 
 	if (!epd->extent_locked) {
 		ret = writepage_delalloc(BTRFS_I(inode), page, wbc, start,
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index b4d0e39ebceb..01ec178a1ab9 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -181,7 +181,8 @@ int btree_write_cache_pages(struct address_space *mapping,
 void extent_readahead(struct readahead_control *rac);
 int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
 		  u64 start, u64 len);
-void set_page_extent_mapped(struct page *page);
+int __must_check set_page_extent_mapped(struct page *page);
+void clear_page_extent_mapped(struct page *page);
 
 struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 					  u64 start, u64 owner_root, int level);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 69147091f219..41188b751808 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1370,6 +1370,12 @@ static noinline int prepare_pages(struct inode *inode, struct page **pages,
 			goto fail;
 		}
 
+		err = set_page_extent_mapped(pages[i]);
+		if (err < 0) {
+			faili = i;
+			goto fail;
+		}
+
 		if (i == 0)
 			err = prepare_uptodate_page(inode, pages[i], pos,
 						    force_uptodate);
@@ -1467,10 +1473,8 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
 	 * We'll call btrfs_dirty_pages() later on, and that will flip around
 	 * delalloc bits and dirty the pages as required.
 	 */
-	for (i = 0; i < num_pages; i++) {
-		set_page_extent_mapped(pages[i]);
+	for (i = 0; i < num_pages; i++)
 		WARN_ON(!PageLocked(pages[i]));
-	}
 
 	return ret;
 }
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 58bd2d3e54db..115e2a7fe74a 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -385,11 +385,22 @@ static int io_ctl_prepare_pages(struct btrfs_io_ctl *io_ctl, bool uptodate)
 	int i;
 
 	for (i = 0; i < io_ctl->num_pages; i++) {
+		int ret;
+
 		page = find_or_create_page(inode->i_mapping, i, mask);
 		if (!page) {
 			io_ctl_drop_pages(io_ctl);
 			return -ENOMEM;
 		}
+
+		ret = set_page_extent_mapped(page);
+		if (ret < 0) {
+			unlock_page(page);
+			put_page(page);
+			io_ctl_drop_pages(io_ctl);
+			return -ENOMEM;
+		}
+
 		io_ctl->pages[i] = page;
 		if (uptodate && !PageUptodate(page)) {
 			btrfs_readpage(NULL, page);
@@ -409,10 +420,8 @@ static int io_ctl_prepare_pages(struct btrfs_io_ctl *io_ctl, bool uptodate)
 		}
 	}
 
-	for (i = 0; i < io_ctl->num_pages; i++) {
+	for (i = 0; i < io_ctl->num_pages; i++)
 		clear_page_dirty_for_io(io_ctl->pages[i]);
-		set_page_extent_mapped(io_ctl->pages[i]);
-	}
 
 	return 0;
 }
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 750aa3770d8f..b9918214cd23 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4717,6 +4717,9 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
 		ret = -ENOMEM;
 		goto out;
 	}
+	ret = set_page_extent_mapped(page);
+	if (ret < 0)
+		goto out_unlock;
 
 	if (!PageUptodate(page)) {
 		ret = btrfs_readpage(NULL, page);
@@ -4734,7 +4737,6 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
 	wait_on_page_writeback(page);
 
 	lock_extent_bits(io_tree, block_start, block_end, &cached_state);
-	set_page_extent_mapped(page);
 
 	ordered = btrfs_lookup_ordered_extent(BTRFS_I(inode), block_start);
 	if (ordered) {
@@ -8118,7 +8120,7 @@ static int __btrfs_releasepage(struct page *page, gfp_t gfp_flags)
 {
 	int ret = try_release_extent_mapping(page, gfp_flags);
 	if (ret == 1)
-		detach_page_private(page);
+		clear_page_extent_mapped(page);
 	return ret;
 }
 
@@ -8277,7 +8279,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 	}
 
 	ClearPageChecked(page);
-	detach_page_private(page);
+	clear_page_extent_mapped(page);
 }
 
 /*
@@ -8356,7 +8358,9 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
 	wait_on_page_writeback(page);
 
 	lock_extent_bits(io_tree, page_start, page_end, &cached_state);
-	set_page_extent_mapped(page);
+	ret = set_page_extent_mapped(page);
+	if (ret < 0)
+		goto out_unlock;
 
 	/*
 	 * we can't set the delalloc bits if there are pending ordered
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 2904f92c3813..56cc26d0e6db 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1307,6 +1307,10 @@ static int cluster_pages_for_defrag(struct inode *inode,
 		if (!page)
 			break;
 
+		ret = set_page_extent_mapped(page);
+		if (ret < 0)
+			break;
+
 		page_start = page_offset(page);
 		page_end = page_start + PAGE_SIZE - 1;
 		while (1) {
@@ -1428,7 +1432,6 @@ static int cluster_pages_for_defrag(struct inode *inode,
 	for (i = 0; i < i_done; i++) {
 		clear_page_dirty_for_io(pages[i]);
 		ClearPageChecked(pages[i]);
-		set_page_extent_mapped(pages[i]);
 		set_page_dirty(pages[i]);
 		unlock_page(pages[i]);
 		put_page(pages[i]);
diff --git a/fs/btrfs/reflink.c b/fs/btrfs/reflink.c
index 4bbc5f52b752..6f20536494e8 100644
--- a/fs/btrfs/reflink.c
+++ b/fs/btrfs/reflink.c
@@ -81,7 +81,10 @@ static int copy_inline_to_page(struct btrfs_inode *inode,
 		goto out_unlock;
 	}
 
-	set_page_extent_mapped(page);
+	ret = set_page_extent_mapped(page);
+	if (ret < 0)
+		goto out_unlock;
+
 	clear_extent_bit(&inode->io_tree, file_offset, range_end,
 			 EXTENT_DELALLOC | EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG,
 			 0, 0, NULL);
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index c5774a8e6ff7..c353b85f7027 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2699,6 +2699,16 @@ static int relocate_file_extent_cluster(struct inode *inode,
 				goto out;
 			}
 		}
+		ret = set_page_extent_mapped(page);
+		if (ret < 0) {
+			btrfs_delalloc_release_metadata(BTRFS_I(inode),
+						PAGE_SIZE, true);
+			btrfs_delalloc_release_extents(BTRFS_I(inode),
+						PAGE_SIZE);
+			unlock_page(page);
+			put_page(page);
+			goto out;
+		}
 
 		if (PageReadahead(page)) {
 			page_cache_async_readahead(inode->i_mapping,
@@ -2726,8 +2736,6 @@ static int relocate_file_extent_cluster(struct inode *inode,
 
 		lock_extent(&BTRFS_I(inode)->io_tree, page_start, page_end);
 
-		set_page_extent_mapped(page);
-
 		if (nr < cluster->nr &&
 		    page_start + offset == cluster->boundary[nr]) {
 			set_extent_bits(&BTRFS_I(inode)->io_tree,
-- 
2.29.2



* [PATCH 13/14] btrfs: integrate page status update for read path into begin/end_page_read()
  2020-11-18  8:53 [PATCH v1 00/14] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (11 preceding siblings ...)
  2020-11-18  8:53 ` [PATCH 12/14] btrfs: introduce btrfs_subpage for data inodes Qu Wenruo
@ 2020-11-18  8:53 ` Qu Wenruo
  2020-11-18  8:53 ` [PATCH 14/14] btrfs: allow RO mount of 4K sector size fs on 64K page system Qu Wenruo
  13 siblings, 0 replies; 20+ messages in thread
From: Qu Wenruo @ 2020-11-18  8:53 UTC (permalink / raw)
  To: linux-btrfs

In the btrfs data page read path, the page status update is handled in
two different locations:

  btrfs_do_readpage()
  {
	while (cur <= end) {
		/* No need to read from disk */
		if (HOLE/PREALLOC/INLINE) {
			memset();
			set_extent_uptodate();
			continue;
		}
		/* Read from disk */
		ret = submit_extent_page(end_bio_extent_readpage);
	}
  }

  end_bio_extent_readpage()
  {
	endio_readpage_update_page_status();
  }

This is fine for the sectorsize == PAGE_SIZE case, as for the above loop
we should only hit one branch and then exit.

But for subpage, there is more work to be done in the page status update:
- Page unlock condition
  Unlike the regular page size == sectorsize case, we can no longer just
  unlock a page.
  Only the last reader of the page can unlock the page.
  This means we can unlock the page either in the while() loop, or in
  the endio function.

- Page uptodate condition
  Since we have multiple sectors to read for a page, we can only mark
  the full page uptodate if all sectors are uptodate.

To handle both subpage and regular cases, introduce a pair of functions
to help handle the page status update:

- begin_page_read()
  For the regular case, it does nothing.
  For the subpage case, it updates the reader counter so that later
  end_page_read() can know who is the last one to unlock the page.

- end_page_read()
  This is just endio_readpage_update_page_status() renamed.
  The original name is a little too long and too specific for endio.

  The only new trick added is the condition for page unlock.
  Now for subpage data, we unlock the page if we're the last reader.

This not only provides the basis for subpage data read, but also hides
the special handling of page read from the main read loop.
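
The last-reader rule can be sketched with C11 atomics in userspace as
below. The names (struct subpage_readers, begin_read(), end_read()) are
hypothetical, the page lock is modeled as a plain flag, and the counter
starts at the number of sectors per page, as in begin_page_read().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: reader counter plus a flag for the page lock. */
struct subpage_readers {
	atomic_int readers;
	bool page_locked;
};

/* One pending reader per sector of the page. */
static void begin_read(struct subpage_readers *sp, unsigned int page_size,
		       unsigned int sectorsize_bits)
{
	atomic_store(&sp->readers, (int)(page_size >> sectorsize_bits));
	/* In the real read path the caller already holds the page lock. */
	sp->page_locked = true;
}

/*
 * Finish nbits sectors. Returns true if this call was the last reader
 * and therefore unlocked the page.
 */
static bool end_read(struct subpage_readers *sp, int nbits)
{
	assert(nbits > 0);
	/* atomic_fetch_sub() returns the old value; old == nbits means 0 now. */
	if (atomic_fetch_sub(&sp->readers, nbits) == nbits) {
		sp->page_locked = false;
		return true;
	}
	return false;
}
```

Whichever path finishes the final sectors, the read loop or the endio
function, is the one that sees the counter reach zero and unlocks the
page.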

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 45 +++++++++++++++++++++++++++++++-------------
 fs/btrfs/extent_io.h |  1 +
 2 files changed, 33 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 3d1dee27db8a..0b484df67dc3 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2847,8 +2847,19 @@ endio_readpage_release_extent(struct processed_extent *processed,
 	processed->uptodate = uptodate;
 }
 
-static void endio_readpage_update_page_status(struct page *page, bool uptodate,
-					      u64 start, u64 end)
+static void begin_page_read(struct btrfs_fs_info *fs_info, struct page *page)
+{
+	struct btrfs_subpage *subpage;
+
+	if (!btrfs_is_subpage(fs_info))
+		return;
+
+	ASSERT(PagePrivate(page) && page->private);
+	subpage = (struct btrfs_subpage *)page->private;
+	atomic_set(&subpage->readers, PAGE_SIZE >> fs_info->sectorsize_bits);
+}
+
+static void end_page_read(struct page *page, bool uptodate, u64 start, u64 end)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
 	struct btrfs_subpage *subpage;
@@ -2874,7 +2885,7 @@ static void endio_readpage_update_page_status(struct page *page, bool uptodate,
 	ASSERT(PagePrivate(page) && page->private);
 	subpage = (struct btrfs_subpage *)page->private;
 	bit_start = (start - page_offset(page)) >> fs_info->sectorsize_bits;
-	nbits = fs_info->nodesize >> fs_info->sectorsize_bits;
+	nbits = (end + 1 - start) >> fs_info->sectorsize_bits;
 
 	if (!uptodate) {
 		spin_lock_bh(&subpage->lock);
@@ -2899,7 +2910,14 @@ static void endio_readpage_update_page_status(struct page *page, bool uptodate,
 		ClearPageError(page);
 	if (all_uptodate)
 		SetPageUptodate(page);
-	return;
+
+	/*
+	 * For data, we still do page unlock, but that only happens when we're
+	 * the last reader of the page.
+	 */
+	if (page->mapping->host != fs_info->btree_inode &&
+	    atomic_sub_and_test(nbits, &subpage->readers))
+		unlock_page(page);
 }
 
 /*
@@ -3029,7 +3047,7 @@ static void end_bio_extent_readpage(struct bio *bio)
 		}
 		bio_offset += len;
 
-		endio_readpage_update_page_status(page, uptodate, start, end);
+		end_page_read(page, uptodate, start, end);
 		endio_readpage_release_extent(&processed, BTRFS_I(inode),
 					      start, end, uptodate);
 	}
@@ -3306,6 +3324,7 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 		      unsigned int read_flags, u64 *prev_em_start)
 {
 	struct inode *inode = page->mapping->host;
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	u64 start = page_offset(page);
 	const u64 end = start + PAGE_SIZE - 1;
 	u64 cur = start;
@@ -3349,6 +3368,7 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 			kunmap_atomic(userpage);
 		}
 	}
+	begin_page_read(fs_info, page);
 	while (cur <= end) {
 		bool force_bio_submit = false;
 		u64 offset;
@@ -3366,13 +3386,14 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 					    &cached, GFP_NOFS);
 			unlock_extent_cached(tree, cur,
 					     cur + iosize - 1, &cached);
+			end_page_read(page, true, cur, cur + iosize - 1);
 			break;
 		}
 		em = __get_extent_map(inode, page, pg_offset, cur,
 				      end - cur + 1, em_cached);
 		if (IS_ERR_OR_NULL(em)) {
-			SetPageError(page);
 			unlock_extent(tree, cur, end);
+			end_page_read(page, false, cur, end);
 			break;
 		}
 		extent_offset = cur - em->start;
@@ -3455,6 +3476,7 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 					    &cached, GFP_NOFS);
 			unlock_extent_cached(tree, cur,
 					     cur + iosize - 1, &cached);
+			end_page_read(page, true, cur, cur + iosize - 1);
 			cur = cur + iosize;
 			pg_offset += iosize;
 			continue;
@@ -3464,6 +3486,7 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 				   EXTENT_UPTODATE, 1, NULL)) {
 			check_page_uptodate(tree, page);
 			unlock_extent(tree, cur, cur + iosize - 1);
+			end_page_read(page, true, cur, cur + iosize - 1);
 			cur = cur + iosize;
 			pg_offset += iosize;
 			continue;
@@ -3472,8 +3495,8 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 		 * to date.  Error out
 		 */
 		if (block_start == EXTENT_MAP_INLINE) {
-			SetPageError(page);
 			unlock_extent(tree, cur, cur + iosize - 1);
+			end_page_read(page, false, cur, cur + iosize - 1);
 			cur = cur + iosize;
 			pg_offset += iosize;
 			continue;
@@ -3490,19 +3513,14 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 			nr++;
 			*bio_flags = this_bio_flag;
 		} else {
-			SetPageError(page);
 			unlock_extent(tree, cur, cur + iosize - 1);
+			end_page_read(page, false, cur, cur + iosize - 1);
 			goto out;
 		}
 		cur = cur + iosize;
 		pg_offset += iosize;
 	}
 out:
-	if (!nr) {
-		if (!PageError(page))
-			SetPageUptodate(page);
-		unlock_page(page);
-	}
 	return ret;
 }
 
@@ -5456,6 +5474,7 @@ int btrfs_attach_subpage(struct btrfs_fs_info *fs_info, struct page *page)
 		return -ENOMEM;
 
 	spin_lock_init(&subpage->lock);
+	atomic_set(&subpage->readers, 0);
 	attach_page_private(page, subpage);
 	return 0;
 }
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 01ec178a1ab9..e050490056a6 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -314,6 +314,7 @@ struct btrfs_subpage {
 	DECLARE_BITMAP(tree_block_bitmap, BTRFS_SUBPAGE_BITMAP_SIZE);
 	DECLARE_BITMAP(uptodate_bitmap, BTRFS_SUBPAGE_BITMAP_SIZE);
 	DECLARE_BITMAP(error_bitmap, BTRFS_SUBPAGE_BITMAP_SIZE);
+	atomic_t readers;
 };
 
 int btrfs_attach_subpage(struct btrfs_fs_info *fs_info, struct page *page);
-- 
2.29.2
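
For readers following the read path change above, here is a minimal userspace sketch of why every exit path now calls end_page_read() and why the final "if (!nr) unlock_page()" block can go away. It is a model under stated assumptions, not the patch's implementation: struct page_model, begin_page_read_model() and end_page_read_model() are hypothetical names, the counter is assumed to start at one reader per sector, and offsets are page-relative rather than file offsets.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE  65536u	/* assumed 64K page */
#define MODEL_SECTORSIZE 4096u	/* assumed 4K sector */

/* Hypothetical stand-in for struct page plus btrfs_subpage::readers. */
struct page_model {
	atomic_uint readers;	/* sectors still waiting for end_page_read */
	bool locked;
	bool error;
	bool uptodate;
};

/* Assumption: one pending reader per sector is accounted up front. */
static void begin_page_read_model(struct page_model *pg)
{
	pg->locked = true;
	pg->error = false;
	pg->uptodate = false;
	atomic_init(&pg->readers, MODEL_PAGE_SIZE / MODEL_SECTORSIZE);
}

/*
 * Finish the sectors in [start, end] (inclusive, page-relative).
 * Whoever finishes the last sector updates the page status and unlocks it.
 * The plain bool writes are only safe because this model is single-threaded.
 */
static void end_page_read_model(struct page_model *pg, bool uptodate,
				uint32_t start, uint32_t end)
{
	uint32_t len = end - start + 1;
	unsigned int nr = len / MODEL_SECTORSIZE;
	unsigned int prev;

	assert(start % MODEL_SECTORSIZE == 0 && len % MODEL_SECTORSIZE == 0);

	if (!uptodate)
		pg->error = true;

	prev = atomic_fetch_sub(&pg->readers, nr);
	assert(prev >= nr);
	if (prev == nr) {
		pg->uptodate = !pg->error;
		pg->locked = false;
	}
}
```

In this model a hole covering the first 16K followed by a 48K range leaves the page locked after the first call and unlocked after the second, so no separate unlock at "out:" is needed.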



* [PATCH 14/14] btrfs: allow RO mount of 4K sector size fs on 64K page system
  2020-11-18  8:53 [PATCH v1 00/14] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (12 preceding siblings ...)
  2020-11-18  8:53 ` [PATCH 13/14] btrfs: integrate page status update for read path into begin/end_page_read() Qu Wenruo
@ 2020-11-18  8:53 ` Qu Wenruo
  13 siblings, 0 replies; 20+ messages in thread
From: Qu Wenruo @ 2020-11-18  8:53 UTC (permalink / raw)
  To: linux-btrfs

This adds the basic RO mount ability for 4K sector size on a 64K page
system.

Currently we only plan to support 4K and 64K page systems.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 24 +++++++++++++++++++++---
 fs/btrfs/super.c   |  7 +++++++
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 699b999c8ba3..32bf623e3646 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2524,13 +2524,21 @@ static int validate_super(struct btrfs_fs_info *fs_info,
 		btrfs_err(fs_info, "invalid sectorsize %llu", sectorsize);
 		ret = -EINVAL;
 	}
-	/* Only PAGE SIZE is supported yet */
-	if (sectorsize != PAGE_SIZE) {
+
+	/*
+	 * For 4K page size, we only support 4K sector size.
+	 * For 64K page size, we support RW for 64K sector size, and RO for
+	 * 4K sector size.
+	 */
+	if ((SZ_4K == PAGE_SIZE && sectorsize != PAGE_SIZE) ||
+	    (SZ_64K == PAGE_SIZE && (sectorsize != SZ_4K &&
+				     sectorsize != SZ_64K))) {
 		btrfs_err(fs_info,
-			"sectorsize %llu not supported yet, only support %lu",
+			"sectorsize %llu not supported yet for page size %lu",
 			sectorsize, PAGE_SIZE);
 		ret = -EINVAL;
 	}
+
 	if (!is_power_of_2(nodesize) || nodesize < sectorsize ||
 	    nodesize > BTRFS_MAX_METADATA_BLOCKSIZE) {
 		btrfs_err(fs_info, "invalid nodesize %llu", nodesize);
@@ -3182,6 +3190,16 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 		goto fail_alloc;
 	}
 
+	/* For 4K sector size support, it's only read-only yet */
+	if (PAGE_SIZE == SZ_64K && sectorsize == SZ_4K) {
+		if (!sb_rdonly(sb) || btrfs_super_log_root(disk_super)) {
+			btrfs_err(fs_info,
+				"subpage sector size only support RO yet");
+			err = -EINVAL;
+			goto fail_alloc;
+		}
+	}
+
 	ret = btrfs_init_workqueues(fs_info, fs_devices);
 	if (ret) {
 		err = ret;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 6693cfc14dfd..5338d3a60e9b 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1970,6 +1970,13 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 			ret = -EINVAL;
 			goto restore;
 		}
+		if (btrfs_is_subpage(fs_info)) {
+			btrfs_warn(fs_info,
+	"read-write mount is not yet allowed for sector size %u page size %lu",
+				   fs_info->sectorsize, PAGE_SIZE);
+			ret = -EINVAL;
+			goto restore;
+		}
 
 		ret = btrfs_cleanup_fs_roots(fs_info);
 		if (ret)
-- 
2.29.2
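
The rules spread across validate_super(), open_ctree() and btrfs_remount() above can be summarized in a small standalone sketch. This is an illustration only: classify_sectorsize() and mount_allowed() are hypothetical helpers, not functions from the patch, and page sizes other than 4K and 64K are treated as unsupported here, which is an assumption (the check in the patch does not cover them).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_4K  4096u
#define SZ_64K 65536u

enum sector_support {
	SECTOR_UNSUPPORTED,
	SECTOR_RO_ONLY,		/* subpage case: 4K sectors on a 64K page */
	SECTOR_RW,		/* sector size equals page size */
};

/* Hypothetical helper: classify a (page size, sector size) pair. */
static enum sector_support classify_sectorsize(uint32_t page_size,
					       uint32_t sectorsize)
{
	/* Mirrors the earlier power-of-2 sanity check on sectorsize. */
	assert(sectorsize != 0 && (sectorsize & (sectorsize - 1)) == 0);

	if (page_size != SZ_4K && page_size != SZ_64K)
		return SECTOR_UNSUPPORTED;	/* assumption, see lead-in */
	if (sectorsize == page_size)
		return SECTOR_RW;
	if (page_size == SZ_64K && sectorsize == SZ_4K)
		return SECTOR_RO_ONLY;
	return SECTOR_UNSUPPORTED;
}

/*
 * Hypothetical helper: may this mount proceed?
 * For the subpage case the mount must be read-only and the super block
 * must not carry a log root (log replay would need to write).
 */
static bool mount_allowed(uint32_t page_size, uint32_t sectorsize,
			  bool rdonly, bool has_log_root)
{
	switch (classify_sectorsize(page_size, sectorsize)) {
	case SECTOR_RW:
		return true;
	case SECTOR_RO_ONLY:
		return rdonly && !has_log_root;
	default:
		return false;
	}
}
```

The remount hunk enforces the same restriction later in the filesystem's life: a subpage filesystem that was mounted RO cannot be switched to RW.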



* Re: [PATCH 01/14] btrfs: extent_io: Use detach_page_private() for alloc_extent_buffer()
  2020-11-18  8:53 ` [PATCH 01/14] btrfs: extent_io: Use detach_page_private() for alloc_extent_buffer() Qu Wenruo
@ 2020-11-18 10:22   ` Johannes Thumshirn
  2020-11-18 15:56   ` David Sterba
  1 sibling, 0 replies; 20+ messages in thread
From: Johannes Thumshirn @ 2020-11-18 10:22 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 18/11/2020 09:55, Qu Wenruo wrote:
> In alloc_extent_buffer(), after we got a page from btree inode, we check
> if that page has private pointer attached.
> 
> If attached, we check if the existing extent buffer has proper refs.
> If not (the eb is being freed), we will detach that private eb pointer.
> 
> The point here is, we are detaching that eb pointer by calling:
> - ClearPagePrivate()
> - put_page()
> 
> The put_page() here is especially confusing, as it's decreasing the ref
> caused by attach_page_private().
> Without knowing that, it looks like the put_page() is for the
> find_or_create_page() call, confusing the reader.
> 
> Since we're always modifying page private with attach_page_private() and
> detach_page_private(), the only open-coded detach_page_private() here is
> really confusing.
> 
> Fix it by calling detach_page_private().
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent_io.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index f305777ee1a3..55115f485d09 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -5310,14 +5310,13 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
>  				goto free_eb;
>  			}
>  			exists = NULL;
> +			WARN_ON(PageDirty(p));
>  
>  			/*
>  			 * Do this so attach doesn't complain and we need to
>  			 * drop the ref the old guy had.
>  			 */
> -			ClearPagePrivate(p);
> -			WARN_ON(PageDirty(p));
> -			put_page(p);
> +			detach_page_private(page);
>  		}
>  		attach_extent_buffer_page(eb, p);
>  		spin_unlock(&mapping->private_lock);
> 

There's one difference though, detach_page_private() does set a page's ->private to 0,
whereas in alloc_extent_buffer() we didn't do it.

I think setting it to 0 is more correct though, so
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
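
To make the difference concrete, here is a small userspace model of the two detach variants. struct fake_page and the *_model/open_coded_detach functions are hypothetical stand-ins for struct page and the kernel helpers; the sketch only mirrors the flag, private pointer and refcount handling discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parts of struct page that matter here. */
struct fake_page {
	int refcount;
	bool has_private;	/* models the PagePrivate flag */
	void *private_data;	/* models page->private */
};

/* Models attach_page_private(): take a ref, store pointer, set flag. */
static void attach_private_model(struct fake_page *page, void *data)
{
	page->refcount++;
	page->private_data = data;
	page->has_private = true;
}

/* Models the old open-coded sequence: ClearPagePrivate() + put_page(). */
static void open_coded_detach(struct fake_page *page)
{
	assert(page->refcount > 0);
	page->has_private = false;
	page->refcount--;
	/* page->private_data is left pointing at the old extent buffer */
}

/* Models detach_page_private(): also resets ->private to 0. */
static void *detach_private_model(struct fake_page *page)
{
	void *data = page->private_data;

	if (!page->has_private)
		return NULL;
	assert(page->refcount > 0);
	page->has_private = false;
	page->private_data = NULL;
	page->refcount--;
	return data;
}
```

Both variants drop the reference taken at attach time; only the helper leaves no stale pointer behind in ->private.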


* Re: [PATCH 02/14] btrfs: extent_io: introduce a helper to grab an existing extent buffer from a page
  2020-11-18  8:53 ` [PATCH 02/14] btrfs: extent_io: introduce a helper to grab an existing extent buffer from a page Qu Wenruo
@ 2020-11-18 10:26   ` Johannes Thumshirn
  0 siblings, 0 replies; 20+ messages in thread
From: Johannes Thumshirn @ 2020-11-18 10:26 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

Looks ok,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>


* Re: [PATCH 03/14] btrfs: extent_io: introduce the skeleton of btrfs_subpage structure
  2020-11-18  8:53 ` [PATCH 03/14] btrfs: extent_io: introduce the skeleton of btrfs_subpage structure Qu Wenruo
@ 2020-11-18 10:53   ` Johannes Thumshirn
  2020-11-18 11:45     ` Qu Wenruo
  0 siblings, 1 reply; 20+ messages in thread
From: Johannes Thumshirn @ 2020-11-18 10:53 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 18/11/2020 09:55, Qu Wenruo wrote:
> +	ASSERT(subpage && bitmap_empty(subpage->tree_block_bitmap,
> +				       BTRFS_SUBPAGE_BITMAP_SIZE));

Hmm, from this patch it's not clear to me what the bitmap is
supposed to do. Maybe add this ASSERT() to the patch manipulating the bitmap.


* Re: [PATCH 03/14] btrfs: extent_io: introduce the skeleton of btrfs_subpage structure
  2020-11-18 10:53   ` Johannes Thumshirn
@ 2020-11-18 11:45     ` Qu Wenruo
  0 siblings, 0 replies; 20+ messages in thread
From: Qu Wenruo @ 2020-11-18 11:45 UTC (permalink / raw)
  To: Johannes Thumshirn, linux-btrfs



On 2020/11/18 6:53 PM, Johannes Thumshirn wrote:
> On 18/11/2020 09:55, Qu Wenruo wrote:
>> +	ASSERT(subpage && bitmap_empty(subpage->tree_block_bitmap,
>> +				       BTRFS_SUBPAGE_BITMAP_SIZE));
> 
> Hmm, from this patch it's not clear to me what the bitmap is
> supposed to do. Maybe add this ASSERT() to the patch manipulating the bitmap.
> 
Indeed, this skeleton patch should not utilize any bit yet.

The tree_block_bitmap is mostly utilized by later patches, to indicate
which range of the page holds a tree block.

I need to improve the patch separation here.

Thanks for exposing the problem here,
Qu
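
As a rough illustration of that idea (a sketch under assumptions, not the code from the later patches): with 4K sectors on a 64K page, one bit per sector gives a 16-bit map, and a tree block occupies a contiguous run of bits. range_mask(), mark_tree_block() and has_tree_block() are hypothetical names, and the bit layout is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE  65536u	/* assumed 64K page */
#define MODEL_SECTORSIZE 4096u	/* assumed 4K sector, so 16 bits per page */

/* Bits covering [offset, offset + len) inside the page, one per sector. */
static uint16_t range_mask(uint32_t offset, uint32_t len)
{
	uint32_t first, nbits;

	assert(offset % MODEL_SECTORSIZE == 0 && len % MODEL_SECTORSIZE == 0);
	assert(len > 0 && offset + len <= MODEL_PAGE_SIZE);

	first = offset / MODEL_SECTORSIZE;
	nbits = len / MODEL_SECTORSIZE;
	return (uint16_t)(((1u << nbits) - 1u) << first);
}

/* Record that a tree block of nodesize bytes lives at this page offset. */
static uint16_t mark_tree_block(uint16_t bitmap, uint32_t offset,
				uint32_t nodesize)
{
	return bitmap | range_mask(offset, nodesize);
}

/* Does any tree block cover the sector at this page offset? */
static bool has_tree_block(uint16_t bitmap, uint32_t offset)
{
	assert(offset < MODEL_PAGE_SIZE);
	return bitmap & (1u << (offset / MODEL_SECTORSIZE));
}
```

For example, a 16K tree block at page offset 16K sets bits 4 to 7; an empty map then means no tree block is attached to the page, which is what the ASSERT() quoted above checks.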



* Re: [PATCH 01/14] btrfs: extent_io: Use detach_page_private() for alloc_extent_buffer()
  2020-11-18  8:53 ` [PATCH 01/14] btrfs: extent_io: Use detach_page_private() for alloc_extent_buffer() Qu Wenruo
  2020-11-18 10:22   ` Johannes Thumshirn
@ 2020-11-18 15:56   ` David Sterba
  1 sibling, 0 replies; 20+ messages in thread
From: David Sterba @ 2020-11-18 15:56 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Wed, Nov 18, 2020 at 04:53:06PM +0800, Qu Wenruo wrote:
> index f305777ee1a3..55115f485d09 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -5310,14 +5310,13 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
>  				goto free_eb;
>  			}
>  			exists = NULL;
> +			WARN_ON(PageDirty(p));
>  
>  			/*
>  			 * Do this so attach doesn't complain and we need to
>  			 * drop the ref the old guy had.
>  			 */
> -			ClearPagePrivate(p);
> -			WARN_ON(PageDirty(p));
> -			put_page(p);
> +			detach_page_private(page);

Does this compile? The page is in 'p', not in 'page'. The code is moved
in the next patch but each patch needs to compile.


Thread overview: 20+ messages
2020-11-18  8:53 [PATCH v1 00/14] btrfs: add read-only support for subpage sector size Qu Wenruo
2020-11-18  8:53 ` [PATCH 01/14] btrfs: extent_io: Use detach_page_private() for alloc_extent_buffer() Qu Wenruo
2020-11-18 10:22   ` Johannes Thumshirn
2020-11-18 15:56   ` David Sterba
2020-11-18  8:53 ` [PATCH 02/14] btrfs: extent_io: introduce a helper to grab an existing extent buffer from a page Qu Wenruo
2020-11-18 10:26   ` Johannes Thumshirn
2020-11-18  8:53 ` [PATCH 03/14] btrfs: extent_io: introduce the skeleton of btrfs_subpage structure Qu Wenruo
2020-11-18 10:53   ` Johannes Thumshirn
2020-11-18 11:45     ` Qu Wenruo
2020-11-18  8:53 ` [PATCH 04/14] btrfs: extent_io: make attach_extent_buffer_page() to handle subpage case Qu Wenruo
2020-11-18  8:53 ` [PATCH 05/14] btrfs: extent_io: make grab_extent_buffer_from_page() " Qu Wenruo
2020-11-18  8:53 ` [PATCH 06/14] btrfs: extent_io: support subpage for extent buffer page release Qu Wenruo
2020-11-18  8:53 ` [PATCH 07/14] btrfs: extent_io: make set/clear_extent_buffer_uptodate() to support subpage size Qu Wenruo
2020-11-18  8:53 ` [PATCH 08/14] btrfs: extent_io: implement try_release_extent_buffer() for subpage metadata support Qu Wenruo
2020-11-18  8:53 ` [PATCH 09/14] btrfs: extent_io: introduce read_extent_buffer_subpage() Qu Wenruo
2020-11-18  8:53 ` [PATCH 10/14] btrfs: extent_io: make endio_readpage_update_page_status() to handle subpage case Qu Wenruo
2020-11-18  8:53 ` [PATCH 11/14] btrfs: disk-io: introduce subpage metadata validation check Qu Wenruo
2020-11-18  8:53 ` [PATCH 12/14] btrfs: introduce btrfs_subpage for data inodes Qu Wenruo
2020-11-18  8:53 ` [PATCH 13/14] btrfs: integrate page status update for read path into begin/end_page_read() Qu Wenruo
2020-11-18  8:53 ` [PATCH 14/14] btrfs: allow RO mount of 4K sector size fs on 64K page system Qu Wenruo
