* [PATCH v5 00/18] btrfs: add read-only support for subpage sector size
@ 2021-01-26  8:33 Qu Wenruo
  2021-01-26  8:33 ` [PATCH v5 01/18] btrfs: merge PAGE_CLEAR_DIRTY and PAGE_SET_WRITEBACK to PAGE_START_WRITEBACK Qu Wenruo
                   ` (21 more replies)
  0 siblings, 22 replies; 52+ messages in thread
From: Qu Wenruo @ 2021-01-26  8:33 UTC (permalink / raw)
  To: linux-btrfs

Patches can be fetched from github:
https://github.com/adam900710/linux/tree/subpage
Currently the branch also contains partial RW data support (still some
ordered extent and data csum mismatch problems)

Great thanks to David/Nikolay/Josef for their effort reviewing and
merging the preparation patches into misc-next.

=== What works ===
Just from the patchset:
- Data read
  Both regular and compressed data, with csum check.

- Metadata read

This means that, with this patchset, 64K page systems can at least mount
a btrfs filesystem with 4K sector size read-only.
This should at least provide the ability to migrate data.

While the github branch already contains experimental RW support, there
are still ordered extent related bugs for me to fix.
Thus only the RO part is sent for review and testing.

=== Patchset structure ===
Patch 01~02:	Preparation patches which don't have functional change
Patch 03~12:	Subpage metadata allocation and freeing
Patch 13~15:	Subpage metadata read path
Patch 16~17:	Subpage data read path
Patch 18:	Enable subpage RO support

=== Changelog ===
v1:
- Separate the main implementation from the previous huge patchset
  A single huge patchset doesn't make much sense for review.

- Use bitmap implementation
  Now page::private will be a pointer to btrfs_subpage structure, which
  contains bitmaps for various page status.

v2:
- Use page::private as btrfs_subpage for extra info
  This replaces the old extent io tree based solution, which reduces
  latency and doesn't require memory allocation for its operations.

- Cherry-pick new preparation patches from RW development
  Those new preparation patches improve readability on their own.

v3:
- Make dummy extent buffers follow the same subpage accessors
  Fsstress exposed several ASSERT()s for dummy extent buffers.
  It turns out we need to make dummy extent buffers own the same
  btrfs_subpage structure for the eb accessors to work properly.

- Two new small __process_pages_contig() related preparation patches
  One to enhance the error handling path for locked_page in
  __process_pages_contig(), and one to merge one macro.

- Extent buffer refcount update
  Except for try_release_extent_buffer(), all other eb users will try to
  increase the refcount of the eb.
  For try_release_extent_buffer(), the eb refs check will happen inside
  the rcu critical section to avoid the eb being freed.

- Comment updates
  Addressing the comments from the mail list.

v4:
- Get rid of btrfs_subpage::tree_block_bitmap
  This is to reduce lock complexity (no need to bother with an extra
  subpage lock for metadata, all locks are existing locks).
  Now eb lookup mostly depends on the radix tree, with a little help from
  btrfs_subpage::under_alloc.
  I haven't experienced metadata related problems any more during my
  local fsstress tests.

- Fix a race involving the metadata page dirty bit
  The actual fix is in the metadata RW patchset though.

- Rebased to latest misc-next branch
  With 4 patches removed, as they are already in misc-next.

v5:
- Use the updated version from David as base
  Most comment/commit message updates should be kept as is.

- A new separate patch to move UNMAPPED bit set timing

- New comment on why we need to prealloc subpage inside a loop
  Mostly for future 16K page size support, where an eb can cross
  multiple pages.

- Remove one patch which is too RW specific
  Since it introduces a functional change which only makes sense for RW
  support, it's not a good idea to include it in the RO support.

- Error handling fixes
  Great thanks to Josef.

- Refactor btrfs_subpage allocation/freeing
  Now we have btrfs_alloc_subpage() and btrfs_free_subpage() helpers to
  do all the allocation/freeing.
  It's pretty easy to convert to kmem_cache using the above helpers.
  (Already internally tested using kmem_cache without problem; in fact,
   it was the problems found in the kmem_cache test that led to the new
   interface.)

- Use btrfs_subpage::eb_refs to replace old under_alloc
  This makes checking whether the page has any eb left much easier.

Qu Wenruo (18):
  btrfs: merge PAGE_CLEAR_DIRTY and PAGE_SET_WRITEBACK to
    PAGE_START_WRITEBACK
  btrfs: set UNMAPPED bit early in btrfs_clone_extent_buffer() for
    subpage support
  btrfs: introduce the skeleton of btrfs_subpage structure
  btrfs: make attach_extent_buffer_page() handle subpage case
  btrfs: make grab_extent_buffer_from_page() handle subpage case
  btrfs: support subpage for extent buffer page release
  btrfs: attach private to dummy extent buffer pages
  btrfs: introduce helpers for subpage uptodate status
  btrfs: introduce helpers for subpage error status
  btrfs: support subpage in set/clear_extent_buffer_uptodate()
  btrfs: support subpage in btrfs_clone_extent_buffer
  btrfs: support subpage in try_release_extent_buffer()
  btrfs: introduce read_extent_buffer_subpage()
  btrfs: support subpage in endio_readpage_update_page_status()
  btrfs: introduce subpage metadata validation check
  btrfs: introduce btrfs_subpage for data inodes
  btrfs: integrate page status update for data read path into
    begin/end_page_read()
  btrfs: allow RO mount of 4K sector size fs on 64K page system

 fs/btrfs/Makefile           |   3 +-
 fs/btrfs/compression.c      |  10 +-
 fs/btrfs/disk-io.c          |  81 +++++-
 fs/btrfs/extent_io.c        | 485 ++++++++++++++++++++++++++++++++----
 fs/btrfs/extent_io.h        |  15 +-
 fs/btrfs/file.c             |  24 +-
 fs/btrfs/free-space-cache.c |  15 +-
 fs/btrfs/inode.c            |  42 ++--
 fs/btrfs/ioctl.c            |   8 +-
 fs/btrfs/reflink.c          |   5 +-
 fs/btrfs/relocation.c       |  11 +-
 fs/btrfs/subpage.c          | 278 +++++++++++++++++++++
 fs/btrfs/subpage.h          |  92 +++++++
 fs/btrfs/super.c            |   8 +-
 14 files changed, 964 insertions(+), 113 deletions(-)
 create mode 100644 fs/btrfs/subpage.c
 create mode 100644 fs/btrfs/subpage.h

-- 
2.30.0



* [PATCH v5 01/18] btrfs: merge PAGE_CLEAR_DIRTY and PAGE_SET_WRITEBACK to PAGE_START_WRITEBACK
  2021-01-26  8:33 [PATCH v5 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
@ 2021-01-26  8:33 ` Qu Wenruo
  2021-01-27 15:56   ` Josef Bacik
  2021-01-26  8:33 ` [PATCH v5 02/18] btrfs: set UNMAPPED bit early in btrfs_clone_extent_buffer() for subpage support Qu Wenruo
                   ` (20 subsequent siblings)
  21 siblings, 1 reply; 52+ messages in thread
From: Qu Wenruo @ 2021-01-26  8:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

PAGE_CLEAR_DIRTY and PAGE_SET_WRITEBACK are two defines used in
__process_pages_contig(), to let the function know to clear the page
dirty bit and then set page writeback.

However, the page writeback and dirty bits are conflicting (at least for
the sector size == PAGE_SIZE case), which means these two bits always
have to be updated together.

This means we can merge PAGE_CLEAR_DIRTY and PAGE_SET_WRITEBACK into
PAGE_START_WRITEBACK.
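
As an illustration (this just condenses the hunks below), a typical
caller's page_ops change from:

	PAGE_UNLOCK | PAGE_CLEAR_DIRTY | PAGE_SET_WRITEBACK | PAGE_END_WRITEBACK

to:

	PAGE_UNLOCK | PAGE_START_WRITEBACK | PAGE_END_WRITEBACK

and __process_pages_contig() performs clear_page_dirty_for_io() plus
set_page_writeback() under the single PAGE_START_WRITEBACK bit.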

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/extent_io.c |  4 ++--
 fs/btrfs/extent_io.h | 12 ++++++------
 fs/btrfs/inode.c     | 28 ++++++++++------------------
 3 files changed, 18 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 7f689ad7709c..6cd81c6e8996 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1975,10 +1975,10 @@ static int __process_pages_contig(struct address_space *mapping,
 				pages_processed++;
 				continue;
 			}
-			if (page_ops & PAGE_CLEAR_DIRTY)
+			if (page_ops & PAGE_START_WRITEBACK) {
 				clear_page_dirty_for_io(pages[i]);
-			if (page_ops & PAGE_SET_WRITEBACK)
 				set_page_writeback(pages[i]);
+			}
 			if (page_ops & PAGE_SET_ERROR)
 				SetPageError(pages[i]);
 			if (page_ops & PAGE_END_WRITEBACK)
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 19221095c635..2d8187c84812 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -35,12 +35,12 @@ enum {
 
 /* these are flags for __process_pages_contig */
 #define PAGE_UNLOCK		(1 << 0)
-#define PAGE_CLEAR_DIRTY	(1 << 1)
-#define PAGE_SET_WRITEBACK	(1 << 2)
-#define PAGE_END_WRITEBACK	(1 << 3)
-#define PAGE_SET_PRIVATE2	(1 << 4)
-#define PAGE_SET_ERROR		(1 << 5)
-#define PAGE_LOCK		(1 << 6)
+/* Page starts writeback, clear dirty bit and set writeback bit */
+#define PAGE_START_WRITEBACK	(1 << 1)
+#define PAGE_END_WRITEBACK	(1 << 2)
+#define PAGE_SET_PRIVATE2	(1 << 3)
+#define PAGE_SET_ERROR		(1 << 4)
+#define PAGE_LOCK		(1 << 5)
 
 /*
  * page->private values.  Every page that is controlled by the extent
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ef6cb7b620d0..d1bb3cc8499b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -692,8 +692,7 @@ static noinline int compress_file_range(struct async_chunk *async_chunk)
 						     NULL,
 						     clear_flags,
 						     PAGE_UNLOCK |
-						     PAGE_CLEAR_DIRTY |
-						     PAGE_SET_WRITEBACK |
+						     PAGE_START_WRITEBACK |
 						     page_error_op |
 						     PAGE_END_WRITEBACK);
 
@@ -934,8 +933,7 @@ static noinline void submit_compressed_extents(struct async_chunk *async_chunk)
 				async_extent->start +
 				async_extent->ram_size - 1,
 				NULL, EXTENT_LOCKED | EXTENT_DELALLOC,
-				PAGE_UNLOCK | PAGE_CLEAR_DIRTY |
-				PAGE_SET_WRITEBACK);
+				PAGE_UNLOCK | PAGE_START_WRITEBACK);
 		if (btrfs_submit_compressed_write(inode, async_extent->start,
 				    async_extent->ram_size,
 				    ins.objectid,
@@ -971,9 +969,8 @@ static noinline void submit_compressed_extents(struct async_chunk *async_chunk)
 				     NULL, EXTENT_LOCKED | EXTENT_DELALLOC |
 				     EXTENT_DELALLOC_NEW |
 				     EXTENT_DEFRAG | EXTENT_DO_ACCOUNTING,
-				     PAGE_UNLOCK | PAGE_CLEAR_DIRTY |
-				     PAGE_SET_WRITEBACK | PAGE_END_WRITEBACK |
-				     PAGE_SET_ERROR);
+				     PAGE_UNLOCK | PAGE_START_WRITEBACK |
+				     PAGE_END_WRITEBACK | PAGE_SET_ERROR);
 	free_async_extent_pages(async_extent);
 	kfree(async_extent);
 	goto again;
@@ -1071,8 +1068,7 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
 				     EXTENT_LOCKED | EXTENT_DELALLOC |
 				     EXTENT_DELALLOC_NEW | EXTENT_DEFRAG |
 				     EXTENT_DO_ACCOUNTING, PAGE_UNLOCK |
-				     PAGE_CLEAR_DIRTY | PAGE_SET_WRITEBACK |
-				     PAGE_END_WRITEBACK);
+				     PAGE_START_WRITEBACK | PAGE_END_WRITEBACK);
 			*nr_written = *nr_written +
 			     (end - start + PAGE_SIZE) / PAGE_SIZE;
 			*page_started = 1;
@@ -1194,8 +1190,7 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
 out_unlock:
 	clear_bits = EXTENT_LOCKED | EXTENT_DELALLOC | EXTENT_DELALLOC_NEW |
 		EXTENT_DEFRAG | EXTENT_CLEAR_META_RESV;
-	page_ops = PAGE_UNLOCK | PAGE_CLEAR_DIRTY | PAGE_SET_WRITEBACK |
-		PAGE_END_WRITEBACK;
+	page_ops = PAGE_UNLOCK | PAGE_START_WRITEBACK | PAGE_END_WRITEBACK;
 	/*
 	 * If we reserved an extent for our delalloc range (or a subrange) and
 	 * failed to create the respective ordered extent, then it means that
@@ -1320,9 +1315,8 @@ static int cow_file_range_async(struct btrfs_inode *inode,
 		unsigned clear_bits = EXTENT_LOCKED | EXTENT_DELALLOC |
 			EXTENT_DELALLOC_NEW | EXTENT_DEFRAG |
 			EXTENT_DO_ACCOUNTING;
-		unsigned long page_ops = PAGE_UNLOCK | PAGE_CLEAR_DIRTY |
-			PAGE_SET_WRITEBACK | PAGE_END_WRITEBACK |
-			PAGE_SET_ERROR;
+		unsigned long page_ops = PAGE_UNLOCK | PAGE_START_WRITEBACK |
+					 PAGE_END_WRITEBACK | PAGE_SET_ERROR;
 
 		extent_clear_unlock_delalloc(inode, start, end, locked_page,
 					     clear_bits, page_ops);
@@ -1519,8 +1513,7 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 					     EXTENT_LOCKED | EXTENT_DELALLOC |
 					     EXTENT_DO_ACCOUNTING |
 					     EXTENT_DEFRAG, PAGE_UNLOCK |
-					     PAGE_CLEAR_DIRTY |
-					     PAGE_SET_WRITEBACK |
+					     PAGE_START_WRITEBACK |
 					     PAGE_END_WRITEBACK);
 		return -ENOMEM;
 	}
@@ -1842,8 +1835,7 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 					     locked_page, EXTENT_LOCKED |
 					     EXTENT_DELALLOC | EXTENT_DEFRAG |
 					     EXTENT_DO_ACCOUNTING, PAGE_UNLOCK |
-					     PAGE_CLEAR_DIRTY |
-					     PAGE_SET_WRITEBACK |
+					     PAGE_START_WRITEBACK |
 					     PAGE_END_WRITEBACK);
 	btrfs_free_path(path);
 	return ret;
-- 
2.30.0



* [PATCH v5 02/18] btrfs: set UNMAPPED bit early in btrfs_clone_extent_buffer() for subpage support
  2021-01-26  8:33 [PATCH v5 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
  2021-01-26  8:33 ` [PATCH v5 01/18] btrfs: merge PAGE_CLEAR_DIRTY and PAGE_SET_WRITEBACK to PAGE_START_WRITEBACK Qu Wenruo
@ 2021-01-26  8:33 ` Qu Wenruo
  2021-01-27 15:56   ` Josef Bacik
  2021-01-26  8:33 ` [PATCH v5 03/18] btrfs: introduce the skeleton of btrfs_subpage structure Qu Wenruo
                   ` (19 subsequent siblings)
  21 siblings, 1 reply; 52+ messages in thread
From: Qu Wenruo @ 2021-01-26  8:33 UTC (permalink / raw)
  To: linux-btrfs

For the incoming subpage support, UNMAPPED extent buffers will have
different behavior in btrfs_release_extent_buffer().

This means we need to set the UNMAPPED bit early, before calling
btrfs_release_extent_buffer().

Currently there is only one caller which relies on
btrfs_release_extent_buffer() in its error path while setting the
UNMAPPED bit late:
- btrfs_clone_extent_buffer()

Make it subpage compatible by setting the UNMAPPED bit early.  Since
we're here, also move the UPTODATE bit early.

There is another caller, __alloc_dummy_extent_buffer(), which sets the
UNMAPPED bit late, but that function cleans up the allocated pages
manually, thus no modification is needed there.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 6cd81c6e8996..a56391839aca 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5062,6 +5062,13 @@ struct extent_buffer *btrfs_clone_extent_buffer(const struct extent_buffer *src)
 	if (new == NULL)
 		return NULL;
 
+	/*
+	 * Set UNMAPPED before calling btrfs_release_extent_buffer(), as
+	 * btrfs_release_extent_buffer() has different behavior for
+	 * UNMAPPED subpage extent buffer.
+	 */
+	set_bit(EXTENT_BUFFER_UNMAPPED, &new->bflags);
+
 	for (i = 0; i < num_pages; i++) {
 		p = alloc_page(GFP_NOFS);
 		if (!p) {
@@ -5074,9 +5081,7 @@ struct extent_buffer *btrfs_clone_extent_buffer(const struct extent_buffer *src)
 		new->pages[i] = p;
 		copy_page(page_address(p), page_address(src->pages[i]));
 	}
-
 	set_bit(EXTENT_BUFFER_UPTODATE, &new->bflags);
-	set_bit(EXTENT_BUFFER_UNMAPPED, &new->bflags);
 
 	return new;
 }
-- 
2.30.0



* [PATCH v5 03/18] btrfs: introduce the skeleton of btrfs_subpage structure
  2021-01-26  8:33 [PATCH v5 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
  2021-01-26  8:33 ` [PATCH v5 01/18] btrfs: merge PAGE_CLEAR_DIRTY and PAGE_SET_WRITEBACK to PAGE_START_WRITEBACK Qu Wenruo
  2021-01-26  8:33 ` [PATCH v5 02/18] btrfs: set UNMAPPED bit early in btrfs_clone_extent_buffer() for subpage support Qu Wenruo
@ 2021-01-26  8:33 ` Qu Wenruo
  2021-01-26  8:33 ` [PATCH v5 04/18] btrfs: make attach_extent_buffer_page() handle subpage case Qu Wenruo
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: Qu Wenruo @ 2021-01-26  8:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Josef Bacik, David Sterba

For sectorsize < page size support, we need a structure to record extra
status info for each sector of a page.

Introduce the skeleton structure; all subpage related code will go to
subpage.[ch].

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/Makefile  |  3 ++-
 fs/btrfs/subpage.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/subpage.h | 33 +++++++++++++++++++++++++++++++++
 fs/btrfs/super.c   |  1 -
 4 files changed, 78 insertions(+), 2 deletions(-)
 create mode 100644 fs/btrfs/subpage.c
 create mode 100644 fs/btrfs/subpage.h

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 9f1b1a88e317..942562e11456 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -11,7 +11,8 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
 	   compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \
 	   reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \
 	   uuid-tree.o props.o free-space-tree.o tree-checker.o space-info.o \
-	   block-rsv.o delalloc-space.o block-group.o discard.o reflink.o
+	   block-rsv.o delalloc-space.o block-group.o discard.o reflink.o \
+	   subpage.o
 
 btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
 btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
new file mode 100644
index 000000000000..a3e5b6a13d54
--- /dev/null
+++ b/fs/btrfs/subpage.c
@@ -0,0 +1,43 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/slab.h>
+#include "ctree.h"
+#include "subpage.h"
+
+int btrfs_attach_subpage(const struct btrfs_fs_info *fs_info,
+			 struct page *page, enum btrfs_subpage_type type)
+{
+	struct btrfs_subpage *subpage;
+
+	/*
+	 * We have cases like a dummy extent buffer page, which is not mapped
+	 * and doesn't need to be locked.
+	 */
+	if (page->mapping)
+		ASSERT(PageLocked(page));
+	/* Either not subpage, or the page already has private attached */
+	if (fs_info->sectorsize == PAGE_SIZE || PagePrivate(page))
+		return 0;
+
+	subpage = kzalloc(sizeof(struct btrfs_subpage), GFP_NOFS);
+	if (!subpage)
+		return -ENOMEM;
+
+	spin_lock_init(&subpage->lock);
+	attach_page_private(page, subpage);
+	return 0;
+}
+
+void btrfs_detach_subpage(const struct btrfs_fs_info *fs_info,
+			  struct page *page)
+{
+	struct btrfs_subpage *subpage;
+
+	/* Either not subpage, or already detached */
+	if (fs_info->sectorsize == PAGE_SIZE || !PagePrivate(page))
+		return;
+
+	subpage = (struct btrfs_subpage *)detach_page_private(page);
+	ASSERT(subpage);
+	kfree(subpage);
+}
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
new file mode 100644
index 000000000000..676280bc7562
--- /dev/null
+++ b/fs/btrfs/subpage.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef BTRFS_SUBPAGE_H
+#define BTRFS_SUBPAGE_H
+
+#include <linux/spinlock.h>
+
+/*
+ * Maximum page size we support is 64K, minimum sector size is 4K, u16 bitmap
+ * is sufficient. Regular bitmap_* is not used due to size reasons.
+ */
+#define BTRFS_SUBPAGE_BITMAP_SIZE	16
+
+/*
+ * Structure to trace status of each sector inside a page, attached to
+ * page::private for both data and metadata inodes.
+ */
+struct btrfs_subpage {
+	/* Common members for both data and metadata pages */
+	spinlock_t lock;
+};
+
+enum btrfs_subpage_type {
+	BTRFS_SUBPAGE_METADATA,
+	BTRFS_SUBPAGE_DATA,
+};
+
+int btrfs_attach_subpage(const struct btrfs_fs_info *fs_info,
+			 struct page *page, enum btrfs_subpage_type type);
+void btrfs_detach_subpage(const struct btrfs_fs_info *fs_info,
+			  struct page *page);
+
+#endif
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 12d7d3be7cd4..919ed5c357e9 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -48,7 +48,6 @@
 #include "tests/btrfs-tests.h"
 #include "block-group.h"
 #include "discard.h"
-
 #include "qgroup.h"
 #define CREATE_TRACE_POINTS
 #include <trace/events/btrfs.h>
-- 
2.30.0



* [PATCH v5 04/18] btrfs: make attach_extent_buffer_page() handle subpage case
  2021-01-26  8:33 [PATCH v5 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (2 preceding siblings ...)
  2021-01-26  8:33 ` [PATCH v5 03/18] btrfs: introduce the skeleton of btrfs_subpage structure Qu Wenruo
@ 2021-01-26  8:33 ` Qu Wenruo
  2021-01-27 16:01   ` Josef Bacik
  2021-01-26  8:33 ` [PATCH v5 05/18] btrfs: make grab_extent_buffer_from_page() " Qu Wenruo
                   ` (17 subsequent siblings)
  21 siblings, 1 reply; 52+ messages in thread
From: Qu Wenruo @ 2021-01-26  8:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

For subpage case, we need to allocate additional memory for each
metadata page.

So we need to:

- Allow attach_extent_buffer_page() to return int to indicate allocation
  failure

- Allow manual pre-allocation of subpage memory for alloc_extent_buffer()
  As we don't want to use GFP_ATOMIC under a spinlock, we introduce the
  btrfs_alloc_subpage() and btrfs_free_subpage() functions for this
  purpose.
  (The simple wrapper for btrfs_free_subpage() is for a later conversion
   to kmem_cache; already internally tested without problem.)

- Preallocate btrfs_subpage structure for alloc_extent_buffer()
  We don't want to call memory allocation with spinlock held, so
  do preallocation before we acquire mapping->private_lock.

- Handle subpage and regular case differently in
  attach_extent_buffer_page()
  For the regular case, no change, just do the usual thing.
  For the subpage case, allocate new memory or use the preallocated
  memory.

For future subpage metadata, we will make use of the radix tree to grab
extent buffers.
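
A condensed view of the resulting alloc_extent_buffer() flow (this only
summarizes the hunks below, it is not additional code):

	/* May be a no-op for the regular sectorsize == PAGE_SIZE case */
	ret = btrfs_alloc_subpage(fs_info, &prealloc, BTRFS_SUBPAGE_METADATA);
	if (ret < 0)
		goto free_eb;	/* after unlocking and dropping the page */

	spin_lock(&mapping->private_lock);
	/* Consumes @prealloc, or frees it if the page already has private */
	ret = attach_extent_buffer_page(eb, p, prealloc);
	ASSERT(!ret);
	spin_unlock(&mapping->private_lock);

so no allocation happens while mapping->private_lock is held.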

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/extent_io.c | 69 +++++++++++++++++++++++++++++++++++++++-----
 fs/btrfs/subpage.c   | 30 +++++++++++++++----
 fs/btrfs/subpage.h   | 10 +++++++
 3 files changed, 96 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index a56391839aca..ea105cb69e3a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -24,6 +24,7 @@
 #include "rcu-string.h"
 #include "backref.h"
 #include "disk-io.h"
+#include "subpage.h"
 
 static struct kmem_cache *extent_state_cache;
 static struct kmem_cache *extent_buffer_cache;
@@ -3140,9 +3141,13 @@ static int submit_extent_page(unsigned int opf,
 	return ret;
 }
 
-static void attach_extent_buffer_page(struct extent_buffer *eb,
-				      struct page *page)
+static int attach_extent_buffer_page(struct extent_buffer *eb,
+				     struct page *page,
+				     struct btrfs_subpage *prealloc)
 {
+	struct btrfs_fs_info *fs_info = eb->fs_info;
+	int ret = 0;
+
 	/*
 	 * If the page is mapped to btree inode, we should hold the private
 	 * lock to prevent race.
@@ -3152,10 +3157,28 @@ static void attach_extent_buffer_page(struct extent_buffer *eb,
 	if (page->mapping)
 		lockdep_assert_held(&page->mapping->private_lock);
 
-	if (!PagePrivate(page))
-		attach_page_private(page, eb);
+	if (fs_info->sectorsize == PAGE_SIZE) {
+		if (!PagePrivate(page))
+			attach_page_private(page, eb);
+		else
+			WARN_ON(page->private != (unsigned long)eb);
+		return 0;
+	}
+
+	/* Already mapped, just free prealloc */
+	if (PagePrivate(page)) {
+		btrfs_free_subpage(prealloc);
+		return 0;
+	}
+
+	if (prealloc)
+		/* Has preallocated memory for subpage */
+		attach_page_private(page, prealloc);
 	else
-		WARN_ON(page->private != (unsigned long)eb);
+		/* Do new allocation to attach subpage */
+		ret = btrfs_attach_subpage(fs_info, page,
+					   BTRFS_SUBPAGE_METADATA);
+	return ret;
 }
 
 void set_page_extent_mapped(struct page *page)
@@ -5070,12 +5093,19 @@ struct extent_buffer *btrfs_clone_extent_buffer(const struct extent_buffer *src)
 	set_bit(EXTENT_BUFFER_UNMAPPED, &new->bflags);
 
 	for (i = 0; i < num_pages; i++) {
+		int ret;
+
 		p = alloc_page(GFP_NOFS);
 		if (!p) {
 			btrfs_release_extent_buffer(new);
 			return NULL;
 		}
-		attach_extent_buffer_page(new, p);
+		ret = attach_extent_buffer_page(new, p, NULL);
+		if (ret < 0) {
+			put_page(p);
+			btrfs_release_extent_buffer(new);
+			return NULL;
+		}
 		WARN_ON(PageDirty(p));
 		SetPageUptodate(p);
 		new->pages[i] = p;
@@ -5313,12 +5343,33 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 
 	num_pages = num_extent_pages(eb);
 	for (i = 0; i < num_pages; i++, index++) {
+		struct btrfs_subpage *prealloc = NULL;
+
 		p = find_or_create_page(mapping, index, GFP_NOFS|__GFP_NOFAIL);
 		if (!p) {
 			exists = ERR_PTR(-ENOMEM);
 			goto free_eb;
 		}
 
+		/*
+		 * Preallocate page->private for subpage case, so that we won't
+		 * allocate memory with private_lock hold.  The memory will be
+		 * freed by attach_extent_buffer_page() or freed manually if
+		 * we exit earlier.
+		 *
+		 * Although we have ensured one subpage eb can only have one
+		 * page, it may change in the future for 16K page size
+		 * support, so we still preallocate the memory in the loop.
+		 */
+		ret = btrfs_alloc_subpage(fs_info, &prealloc,
+					  BTRFS_SUBPAGE_METADATA);
+		if (ret < 0) {
+			unlock_page(p);
+			put_page(p);
+			exists = ERR_PTR(ret);
+			goto free_eb;
+		}
+
 		spin_lock(&mapping->private_lock);
 		exists = grab_extent_buffer(p);
 		if (exists) {
@@ -5326,10 +5377,14 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 			unlock_page(p);
 			put_page(p);
 			mark_extent_buffer_accessed(exists, p);
+			btrfs_free_subpage(prealloc);
 			goto free_eb;
 		}
-		attach_extent_buffer_page(eb, p);
+		/* Should not fail, as we have preallocated the memory */
+		ret = attach_extent_buffer_page(eb, p, prealloc);
+		ASSERT(!ret);
 		spin_unlock(&mapping->private_lock);
+
 		WARN_ON(PageDirty(p));
 		eb->pages[i] = p;
 		if (!PageUptodate(p))
diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index a3e5b6a13d54..61b28dfca20c 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -7,7 +7,8 @@
 int btrfs_attach_subpage(const struct btrfs_fs_info *fs_info,
 			 struct page *page, enum btrfs_subpage_type type)
 {
-	struct btrfs_subpage *subpage;
+	struct btrfs_subpage *subpage = NULL;
+	int ret;
 
 	/*
 	 * We have cases like a dummy extent buffer page, which is not mappped
@@ -19,11 +20,9 @@ int btrfs_attach_subpage(const struct btrfs_fs_info *fs_info,
 	if (fs_info->sectorsize == PAGE_SIZE || PagePrivate(page))
 		return 0;
 
-	subpage = kzalloc(sizeof(struct btrfs_subpage), GFP_NOFS);
-	if (!subpage)
-		return -ENOMEM;
-
-	spin_lock_init(&subpage->lock);
+	ret = btrfs_alloc_subpage(fs_info, &subpage, type);
+	if (ret < 0)
+		return ret;
 	attach_page_private(page, subpage);
 	return 0;
 }
@@ -39,5 +38,24 @@ void btrfs_detach_subpage(const struct btrfs_fs_info *fs_info,
 
 	subpage = (struct btrfs_subpage *)detach_page_private(page);
 	ASSERT(subpage);
+	btrfs_free_subpage(subpage);
+}
+
+int btrfs_alloc_subpage(const struct btrfs_fs_info *fs_info,
+			struct btrfs_subpage **ret,
+			enum btrfs_subpage_type type)
+{
+	if (fs_info->sectorsize == PAGE_SIZE)
+		return 0;
+
+	*ret = kzalloc(sizeof(struct btrfs_subpage), GFP_NOFS);
+	if (!*ret)
+		return -ENOMEM;
+	spin_lock_init(&(*ret)->lock);
+	return 0;
+}
+
+void btrfs_free_subpage(struct btrfs_subpage *subpage)
+{
 	kfree(subpage);
 }
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index 676280bc7562..7ba544bcc9c6 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -18,6 +18,10 @@
 struct btrfs_subpage {
 	/* Common members for both data and metadata pages */
 	spinlock_t lock;
+	union {
+		/* Structures only used by metadata */
+		/* Structures only used by data */
+	};
 };
 
 enum btrfs_subpage_type {
@@ -30,4 +34,10 @@ int btrfs_attach_subpage(const struct btrfs_fs_info *fs_info,
 void btrfs_detach_subpage(const struct btrfs_fs_info *fs_info,
 			  struct page *page);
 
+/* Allocate additional data where page represents more than one sector */
+int btrfs_alloc_subpage(const struct btrfs_fs_info *fs_info,
+			struct btrfs_subpage **ret,
+			enum btrfs_subpage_type type);
+void btrfs_free_subpage(struct btrfs_subpage *subpage);
+
 #endif
-- 
2.30.0



* [PATCH v5 05/18] btrfs: make grab_extent_buffer_from_page() handle subpage case
  2021-01-26  8:33 [PATCH v5 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (3 preceding siblings ...)
  2021-01-26  8:33 ` [PATCH v5 04/18] btrfs: make attach_extent_buffer_page() handle subpage case Qu Wenruo
@ 2021-01-26  8:33 ` Qu Wenruo
  2021-01-27 16:20   ` Josef Bacik
  2021-01-26  8:33 ` [PATCH v5 06/18] btrfs: support subpage for extent buffer page release Qu Wenruo
                   ` (16 subsequent siblings)
  21 siblings, 1 reply; 52+ messages in thread
From: Qu Wenruo @ 2021-01-26  8:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

For subpage case, grab_extent_buffer() can't really get an extent buffer
just from btrfs_subpage.

We have the radix tree lock protecting us from inserting the same eb
into the tree.  Thus we don't really need the extra hassle, just let
alloc_extent_buffer() handle the existing eb in the radix tree.

Now if two ebs are being allocated at the same time, one will fail with
-EEXIST when inserting into the radix tree.

So for grab_extent_buffer(), just always return NULL for the subpage case.
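
For reference, the duplicate detection we rely on is the existing radix
tree insert in alloc_extent_buffer() (not part of this diff, paraphrased
here):

	ret = radix_tree_insert(&fs_info->buffer_radix,
				start >> fs_info->sectorsize_bits, eb);
	if (ret == -EEXIST) {
		/* Reuse the eb that won the race, free the new one */
		exists = find_extent_buffer(fs_info, start);
		...
	}

so returning NULL from grab_extent_buffer() for subpage only defers the
duplicate detection to that insert.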

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/extent_io.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index ea105cb69e3a..16a29f63cfd1 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5282,10 +5282,19 @@ struct extent_buffer *alloc_test_extent_buffer(struct btrfs_fs_info *fs_info,
 }
 #endif
 
-static struct extent_buffer *grab_extent_buffer(struct page *page)
+static struct extent_buffer *grab_extent_buffer(
+		struct btrfs_fs_info *fs_info, struct page *page)
 {
 	struct extent_buffer *exists;
 
+	/*
+	 * For subpage case, we completely rely on radix tree to ensure we
+	 * don't try to insert two ebs for the same bytenr.  So here we always
+	 * return NULL and just continue.
+	 */
+	if (fs_info->sectorsize < PAGE_SIZE)
+		return NULL;
+
 	/* Page not yet attached to an extent buffer */
 	if (!PagePrivate(page))
 		return NULL;
@@ -5371,7 +5380,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		}
 
 		spin_lock(&mapping->private_lock);
-		exists = grab_extent_buffer(p);
+		exists = grab_extent_buffer(fs_info, p);
 		if (exists) {
 			spin_unlock(&mapping->private_lock);
 			unlock_page(p);
-- 
2.30.0



* [PATCH v5 06/18] btrfs: support subpage for extent buffer page release
  2021-01-26  8:33 [PATCH v5 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (4 preceding siblings ...)
  2021-01-26  8:33 ` [PATCH v5 05/18] btrfs: make grab_extent_buffer_from_page() " Qu Wenruo
@ 2021-01-26  8:33 ` Qu Wenruo
  2021-01-27 16:21   ` Josef Bacik
  2021-01-26  8:33 ` [PATCH v5 07/18] btrfs: attach private to dummy extent buffer pages Qu Wenruo
                   ` (15 subsequent siblings)
  21 siblings, 1 reply; 52+ messages in thread
From: Qu Wenruo @ 2021-01-26  8:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

In btrfs_release_extent_buffer_pages(), we need to add extra handling
for subpage.

Introduce a helper, detach_extent_buffer_page(), to do different
handling for regular and subpage cases.

For subpage case, handle detaching page private.

For unmapped (dummy or cloned) ebs, we can detach the page private
immediately as the page can only be attached to one unmapped eb.

For mapped ebs, we have to ensure there are no ebs in the page range
before we detach the private, as page->private is shared between all ebs
in the same page.

But there is a subpage specific race, where we can race with extent
buffer allocation and clear the page private while a new eb is still
being utilized, like this:

  Extent buffer A is the new extent buffer which will be allocated,
  while extent buffer B is the last existing extent buffer of the page.

  		T1 (eb A) 	 |		T2 (eb B)
  -------------------------------+------------------------------
  alloc_extent_buffer()		 | btrfs_release_extent_buffer_pages()
  |- p = find_or_create_page()   | |
  |- attach_extent_buffer_page() | |
  |				 | |- detach_extent_buffer_page()
  |				 |    |- if (!page_range_has_eb())
  |				 |    |  No new eb in the page range yet
  |				 |    |  As new eb A hasn't yet been
  |				 |    |  inserted into radix tree.
  |				 |    |- btrfs_detach_subpage()
  |				 |       |- detach_page_private();
  |- radix_tree_insert()	 |

  Then we have a metadata eb whose page has no private bit.

To avoid such a race, we introduce a subpage metadata-specific member,
btrfs_subpage::eb_refs.

In alloc_extent_buffer() we increase eb_refs in the critical section of
private_lock.  Then page_range_has_eb() will return true in
detach_extent_buffer_page(), and the page private will not be detached.

The section is marked by:

- btrfs_page_inc_eb_refs()
- btrfs_page_dec_eb_refs()
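
With that, the critical section in alloc_extent_buffer() looks roughly
like this (condensed from the hunks of this and the previous patches):

	spin_lock(&mapping->private_lock);
	exists = grab_extent_buffer(fs_info, p);
	if (exists) { ... }
	ret = attach_extent_buffer_page(eb, p, prealloc);
	ASSERT(!ret);
	/* From now on page_range_has_eb() sees the new eb */
	btrfs_page_inc_eb_refs(fs_info, p);
	spin_unlock(&mapping->private_lock);

and detach_extent_buffer_page() drops the ref under the same lock, only
detaching the page private once page_range_has_eb() reports no remaining
ebs.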

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/extent_io.c | 94 +++++++++++++++++++++++++++++++++++++-------
 fs/btrfs/subpage.c   | 42 ++++++++++++++++++++
 fs/btrfs/subpage.h   | 13 +++++-
 3 files changed, 133 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 16a29f63cfd1..118874926179 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4993,25 +4993,39 @@ int extent_buffer_under_io(const struct extent_buffer *eb)
 		test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
 }
 
-/*
- * Release all pages attached to the extent buffer.
- */
-static void btrfs_release_extent_buffer_pages(struct extent_buffer *eb)
+static bool page_range_has_eb(struct btrfs_fs_info *fs_info, struct page *page)
 {
-	int i;
-	int num_pages;
-	int mapped = !test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags);
+	struct btrfs_subpage *subpage;
 
-	BUG_ON(extent_buffer_under_io(eb));
+	lockdep_assert_held(&page->mapping->private_lock);
 
-	num_pages = num_extent_pages(eb);
-	for (i = 0; i < num_pages; i++) {
-		struct page *page = eb->pages[i];
+	if (PagePrivate(page)) {
+		subpage = (struct btrfs_subpage *)page->private;
+		if (atomic_read(&subpage->eb_refs))
+			return true;
+	}
+	return false;
+}
 
-		if (!page)
-			continue;
+static void detach_extent_buffer_page(struct extent_buffer *eb, struct page *page)
+{
+	struct btrfs_fs_info *fs_info = eb->fs_info;
+	const bool mapped = !test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags);
+
+	/*
+	 * For mapped eb, we're going to change the page private, which should
+	 * be done under the private_lock.
+	 */
+	if (mapped)
+		spin_lock(&page->mapping->private_lock);
+
+	if (!PagePrivate(page)) {
 		if (mapped)
-			spin_lock(&page->mapping->private_lock);
+			spin_unlock(&page->mapping->private_lock);
+		return;
+	}
+
+	if (fs_info->sectorsize == PAGE_SIZE) {
 		/*
 		 * We do this since we'll remove the pages after we've
 		 * removed the eb from the radix tree, so we could race
@@ -5030,9 +5044,49 @@ static void btrfs_release_extent_buffer_pages(struct extent_buffer *eb)
 			 */
 			detach_page_private(page);
 		}
-
 		if (mapped)
 			spin_unlock(&page->mapping->private_lock);
+		return;
+	}
+
+	/*
+	 * For subpage, we can have dummy eb with page private.  In this case,
+	 * we can directly detach the private as such page is only attached to
+	 * one dummy eb, no sharing.
+	 */
+	if (!mapped) {
+		btrfs_detach_subpage(fs_info, page);
+		return;
+	}
+
+	btrfs_page_dec_eb_refs(fs_info, page);
+
+	/*
+	 * We can only detach the page private if there are no other ebs in the
+	 * page range.
+	 */
+	if (!page_range_has_eb(fs_info, page))
+		btrfs_detach_subpage(fs_info, page);
+
+	spin_unlock(&page->mapping->private_lock);
+}
+
+/* Release all pages attached to the extent buffer */
+static void btrfs_release_extent_buffer_pages(struct extent_buffer *eb)
+{
+	int i;
+	int num_pages;
+
+	ASSERT(!extent_buffer_under_io(eb));
+
+	num_pages = num_extent_pages(eb);
+	for (i = 0; i < num_pages; i++) {
+		struct page *page = eb->pages[i];
+
+		if (!page)
+			continue;
+
+		detach_extent_buffer_page(eb, page);
 
 		/* One for when we allocated the page */
 		put_page(page);
@@ -5392,6 +5446,16 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		/* Should not fail, as we have preallocated the memory */
 		ret = attach_extent_buffer_page(eb, p, prealloc);
 		ASSERT(!ret);
+		/*
+		 * To inform we have extra eb under allocation, so that
+		 * detach_extent_buffer_page() won't release the page private
+		 * when the eb hasn't yet been inserted into radix tree.
+		 *
+		 * The ref will be decreased when the eb released the page, in
+		 * detach_extent_buffer_page().
+		 * Thus needs no special handling in error path.
+		 */
+		btrfs_page_inc_eb_refs(fs_info, p);
 		spin_unlock(&mapping->private_lock);
 
 		WARN_ON(PageDirty(p));
diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index 61b28dfca20c..a2a21fa0ea35 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -52,6 +52,8 @@ int btrfs_alloc_subpage(const struct btrfs_fs_info *fs_info,
 	if (!*ret)
 		return -ENOMEM;
 	spin_lock_init(&(*ret)->lock);
+	if (type == BTRFS_SUBPAGE_METADATA)
+		atomic_set(&(*ret)->eb_refs, 0);
 	return 0;
 }
 
@@ -59,3 +61,43 @@ void btrfs_free_subpage(struct btrfs_subpage *subpage)
 {
 	kfree(subpage);
 }
+
+/*
+ * Increase the eb_refs of current subpage.
+ *
+ * This is important for eb allocation, to prevent race with last eb freeing
+ * of the same page.
+ * With the eb_refs increased before the eb inserted into radix tree,
+ * detach_extent_buffer_page() won't detach the page private while we're still
+ * allocating the extent buffer.
+ */
+void btrfs_page_inc_eb_refs(const struct btrfs_fs_info *fs_info,
+			    struct page *page)
+{
+	struct btrfs_subpage *subpage;
+
+	if (fs_info->sectorsize == PAGE_SIZE)
+		return;
+
+	ASSERT(PagePrivate(page) && page->mapping);
+	lockdep_assert_held(&page->mapping->private_lock);
+
+	subpage = (struct btrfs_subpage *)page->private;
+	atomic_inc(&subpage->eb_refs);
+}
+
+void btrfs_page_dec_eb_refs(const struct btrfs_fs_info *fs_info,
+			    struct page *page)
+{
+	struct btrfs_subpage *subpage;
+
+	if (fs_info->sectorsize == PAGE_SIZE)
+		return;
+
+	ASSERT(PagePrivate(page) && page->mapping);
+	lockdep_assert_held(&page->mapping->private_lock);
+
+	subpage = (struct btrfs_subpage *)page->private;
+	ASSERT(atomic_read(&subpage->eb_refs));
+	atomic_dec(&subpage->eb_refs);
+}
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index 7ba544bcc9c6..eef2ecae77e0 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -4,6 +4,7 @@
 #define BTRFS_SUBPAGE_H
 
 #include <linux/spinlock.h>
+#include <linux/refcount.h>
 
 /*
  * Maximum page size we support is 64K, minimum sector size is 4K, u16 bitmap
@@ -19,7 +20,13 @@ struct btrfs_subpage {
 	/* Common members for both data and metadata pages */
 	spinlock_t lock;
 	union {
-		/* Structures only used by metadata */
+		/*
+		 * Structures only used by metadata
+		 *
+		 * @eb_refs should only be operated under private_lock, as it
+		 * manages whether the subpage can be detached.
+		 */
+		atomic_t eb_refs;
 		/* Structures only used by data */
 	};
 };
@@ -40,4 +47,8 @@ int btrfs_alloc_subpage(const struct btrfs_fs_info *fs_info,
 			enum btrfs_subpage_type type);
 void btrfs_free_subpage(struct btrfs_subpage *subpage);
 
+void btrfs_page_inc_eb_refs(const struct btrfs_fs_info *fs_info,
+			    struct page *page);
+void btrfs_page_dec_eb_refs(const struct btrfs_fs_info *fs_info,
+			    struct page *page);
 #endif
-- 
2.30.0



* [PATCH v5 07/18] btrfs: attach private to dummy extent buffer pages
  2021-01-26  8:33 [PATCH v5 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (5 preceding siblings ...)
  2021-01-26  8:33 ` [PATCH v5 06/18] btrfs: support subpage for extent buffer page release Qu Wenruo
@ 2021-01-26  8:33 ` Qu Wenruo
  2021-01-27 16:21   ` Josef Bacik
  2021-01-26  8:33 ` [PATCH v5 08/18] btrfs: introduce helpers for subpage uptodate status Qu Wenruo
                   ` (14 subsequent siblings)
  21 siblings, 1 reply; 52+ messages in thread
From: Qu Wenruo @ 2021-01-26  8:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

There are locations where we allocate dummy extent buffers for temporary
usage, like in tree_mod_log_rewind() or get_old_root().

These dummy extent buffers will be handled by the same eb accessors, and
if they don't have page::private, subpage eb accessors could fail.

To address such problems, make __alloc_dummy_extent_buffer() attach
page private for dummy extent buffers too.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/extent_io.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 118874926179..7ee28d94bae9 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5183,9 +5183,14 @@ struct extent_buffer *__alloc_dummy_extent_buffer(struct btrfs_fs_info *fs_info,
 
 	num_pages = num_extent_pages(eb);
 	for (i = 0; i < num_pages; i++) {
+		int ret;
+
 		eb->pages[i] = alloc_page(GFP_NOFS);
 		if (!eb->pages[i])
 			goto err;
+		ret = attach_extent_buffer_page(eb, eb->pages[i], NULL);
+		if (ret < 0)
+			goto err;
 	}
 	set_extent_buffer_uptodate(eb);
 	btrfs_set_header_nritems(eb, 0);
@@ -5193,8 +5198,10 @@ struct extent_buffer *__alloc_dummy_extent_buffer(struct btrfs_fs_info *fs_info,
 
 	return eb;
 err:
-	for (; i > 0; i--)
+	for (; i > 0; i--) {
+		detach_extent_buffer_page(eb, eb->pages[i - 1]);
 		__free_page(eb->pages[i - 1]);
+	}
 	__free_extent_buffer(eb);
 	return NULL;
 }
-- 
2.30.0



* [PATCH v5 08/18] btrfs: introduce helpers for subpage uptodate status
  2021-01-26  8:33 [PATCH v5 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (6 preceding siblings ...)
  2021-01-26  8:33 ` [PATCH v5 07/18] btrfs: attach private to dummy extent buffer pages Qu Wenruo
@ 2021-01-26  8:33 ` Qu Wenruo
  2021-01-27 16:34   ` Josef Bacik
  2021-01-26  8:33 ` [PATCH v5 09/18] btrfs: introduce helpers for subpage error status Qu Wenruo
                   ` (13 subsequent siblings)
  21 siblings, 1 reply; 52+ messages in thread
From: Qu Wenruo @ 2021-01-26  8:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

Introduce the following functions to handle subpage uptodate status:

- btrfs_subpage_set_uptodate()
- btrfs_subpage_clear_uptodate()
- btrfs_subpage_test_uptodate()
  These helpers can only be called when the page has subpage attached
  and the range is ensured to be inside the page.

- btrfs_page_set_uptodate()
- btrfs_page_clear_uptodate()
- btrfs_page_test_uptodate()
  These helpers can handle both regular sector size and subpage.
  Although the caller should still ensure that the range is inside the page.
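
A hypothetical usage sketch (the call site and offset are made up; only
the helper signatures come from this patch): after reading one 4K sector
of a 64K page, the read completion path could do

	btrfs_subpage_set_uptodate(fs_info, page, file_offset, fs_info->sectorsize);

which sets the matching bit in uptodate_bitmap and only calls
SetPageUptodate() once all 16 bits are set, while

	btrfs_page_set_uptodate(fs_info, page, file_offset, fs_info->sectorsize);

falls back to plain SetPageUptodate() for the sectorsize == PAGE_SIZE
case.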

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/subpage.c | 114 +++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/subpage.h |  28 +++++++++++
 2 files changed, 142 insertions(+)

diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index a2a21fa0ea35..50b56e58ca93 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -101,3 +101,117 @@ void btrfs_page_dec_eb_refs(const struct btrfs_fs_info *fs_info,
 	ASSERT(atomic_read(&subpage->eb_refs));
 	atomic_dec(&subpage->eb_refs);
 }
+
+/*
+ * Convert the [start, start + len) range into a u16 bitmap
+ *
+ * For example: if start == page_offset() + 16K, len = 16K, we get 0x00f0.
+ */
+static inline u16 btrfs_subpage_calc_bitmap(
+		const struct btrfs_fs_info *fs_info, struct page *page,
+		u64 start, u32 len)
+{
+	const int bit_start = offset_in_page(start) >> fs_info->sectorsize_bits;
+	const int nbits = len >> fs_info->sectorsize_bits;
+
+	/* Basic checks */
+	ASSERT(PagePrivate(page) && page->private);
+	ASSERT(IS_ALIGNED(start, fs_info->sectorsize) &&
+	       IS_ALIGNED(len, fs_info->sectorsize));
+
+	/*
+	 * The range check only works for mapped page, we can still have
+	 * unmapped page like dummy extent buffer pages.
+	 */
+	if (page->mapping)
+		ASSERT(page_offset(page) <= start &&
+			start + len <= page_offset(page) + PAGE_SIZE);
+	/*
+	 * Here nbits can be 16, thus can go beyond u16 range. We make the
+	 * first left shift to be calculate in unsigned long (at least u32),
+	 * then truncate the result to u16.
+	 */
+	return (u16)(((1UL << nbits) - 1) << bit_start);
+}
+
+void btrfs_subpage_set_uptodate(const struct btrfs_fs_info *fs_info,
+		struct page *page, u64 start, u32 len)
+{
+	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
+	const u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len);
+	unsigned long flags;
+
+	spin_lock_irqsave(&subpage->lock, flags);
+	subpage->uptodate_bitmap |= tmp;
+	if (subpage->uptodate_bitmap == U16_MAX)
+		SetPageUptodate(page);
+	spin_unlock_irqrestore(&subpage->lock, flags);
+}
+
+void btrfs_subpage_clear_uptodate(const struct btrfs_fs_info *fs_info,
+		struct page *page, u64 start, u32 len)
+{
+	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
+	const u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len);
+	unsigned long flags;
+
+	spin_lock_irqsave(&subpage->lock, flags);
+	subpage->uptodate_bitmap &= ~tmp;
+	ClearPageUptodate(page);
+	spin_unlock_irqrestore(&subpage->lock, flags);
+}
+
+/*
+ * Unlike set/clear which is dependent on each page status, for test all bits
+ * are tested in the same way.
+ */
+#define IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(name)				\
+bool btrfs_subpage_test_##name(const struct btrfs_fs_info *fs_info,	\
+		struct page *page, u64 start, u32 len)			\
+{									\
+	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private; \
+	const u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len); \
+	unsigned long flags;						\
+	bool ret;							\
+									\
+	spin_lock_irqsave(&subpage->lock, flags);			\
+	ret = ((subpage->name##_bitmap & tmp) == tmp);			\
+	spin_unlock_irqrestore(&subpage->lock, flags);			\
+	return ret;							\
+}
+IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(uptodate);
+
+/*
+ * Note that, in selftests (extent-io-tests), we can have empty fs_info passed
+ * in.  We only test sectorsize == PAGE_SIZE cases so far, thus we can fall
+ * back to regular sectorsize branch.
+ */
+#define IMPLEMENT_BTRFS_PAGE_OPS(name, set_page_func, clear_page_func,	\
+			       test_page_func)				\
+void btrfs_page_set_##name(const struct btrfs_fs_info *fs_info,		\
+		struct page *page, u64 start, u32 len)			\
+{									\
+	if (unlikely(!fs_info) || fs_info->sectorsize == PAGE_SIZE) {	\
+		set_page_func(page);					\
+		return;							\
+	}								\
+	btrfs_subpage_set_##name(fs_info, page, start, len);		\
+}									\
+void btrfs_page_clear_##name(const struct btrfs_fs_info *fs_info,	\
+		struct page *page, u64 start, u32 len)			\
+{									\
+	if (unlikely(!fs_info) || fs_info->sectorsize == PAGE_SIZE) {	\
+		clear_page_func(page);					\
+		return;							\
+	}								\
+	btrfs_subpage_clear_##name(fs_info, page, start, len);		\
+}									\
+bool btrfs_page_test_##name(const struct btrfs_fs_info *fs_info,	\
+		struct page *page, u64 start, u32 len)			\
+{									\
+	if (unlikely(!fs_info) || fs_info->sectorsize == PAGE_SIZE)	\
+		return test_page_func(page);				\
+	return btrfs_subpage_test_##name(fs_info, page, start, len);	\
+}
+IMPLEMENT_BTRFS_PAGE_OPS(uptodate, SetPageUptodate, ClearPageUptodate,
+			 PageUptodate);
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index eef2ecae77e0..1dee4eb1c5c1 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -19,6 +19,7 @@
 struct btrfs_subpage {
 	/* Common members for both data and metadata pages */
 	spinlock_t lock;
+	u16 uptodate_bitmap;
 	union {
 		/*
 		 * Structures only used by metadata
@@ -51,4 +52,31 @@ void btrfs_page_inc_eb_refs(const struct btrfs_fs_info *fs_info,
 			    struct page *page);
 void btrfs_page_dec_eb_refs(const struct btrfs_fs_info *fs_info,
 			    struct page *page);
+
+/*
+ * Template for subpage related operations.
+ *
+ * btrfs_subpage_*() are for call sites where the page has subpage attached and
+ * the range is ensured to be inside the page.
+ *
+ * btrfs_page_*() are for call sites where the page can either be subpage
+ * specific or regular page. The function will handle both cases.
+ * But the range still needs to be inside the page.
+ */
+#define DECLARE_BTRFS_SUBPAGE_OPS(name)					\
+void btrfs_subpage_set_##name(const struct btrfs_fs_info *fs_info,	\
+		struct page *page, u64 start, u32 len);			\
+void btrfs_subpage_clear_##name(const struct btrfs_fs_info *fs_info,	\
+		struct page *page, u64 start, u32 len);			\
+bool btrfs_subpage_test_##name(const struct btrfs_fs_info *fs_info,	\
+		struct page *page, u64 start, u32 len);			\
+void btrfs_page_set_##name(const struct btrfs_fs_info *fs_info,		\
+		struct page *page, u64 start, u32 len);			\
+void btrfs_page_clear_##name(const struct btrfs_fs_info *fs_info,	\
+		struct page *page, u64 start, u32 len);			\
+bool btrfs_page_test_##name(const struct btrfs_fs_info *fs_info,	\
+		struct page *page, u64 start, u32 len);
+
+DECLARE_BTRFS_SUBPAGE_OPS(uptodate);
+
 #endif
-- 
2.30.0



* [PATCH v5 09/18] btrfs: introduce helpers for subpage error status
  2021-01-26  8:33 [PATCH v5 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (7 preceding siblings ...)
  2021-01-26  8:33 ` [PATCH v5 08/18] btrfs: introduce helpers for subpage uptodate status Qu Wenruo
@ 2021-01-26  8:33 ` Qu Wenruo
  2021-01-27 16:34   ` Josef Bacik
  2021-01-26  8:33 ` [PATCH v5 10/18] btrfs: support subpage in set/clear_extent_buffer_uptodate() Qu Wenruo
                   ` (12 subsequent siblings)
  21 siblings, 1 reply; 52+ messages in thread
From: Qu Wenruo @ 2021-01-26  8:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

Introduce the following functions to handle subpage error status:

- btrfs_subpage_set_error()
- btrfs_subpage_clear_error()
- btrfs_subpage_test_error()
  These helpers can only be called when the page has subpage attached
  and the range is ensured to be inside the page.

- btrfs_page_set_error()
- btrfs_page_clear_error()
- btrfs_page_test_error()
  These helpers can handle both regular sector size and subpage without
  problem.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/subpage.c | 29 +++++++++++++++++++++++++++++
 fs/btrfs/subpage.h |  2 ++
 2 files changed, 31 insertions(+)

diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index 50b56e58ca93..2fe55a712557 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -161,6 +161,33 @@ void btrfs_subpage_clear_uptodate(const struct btrfs_fs_info *fs_info,
 	spin_unlock_irqrestore(&subpage->lock, flags);
 }
 
+void btrfs_subpage_set_error(const struct btrfs_fs_info *fs_info,
+		struct page *page, u64 start, u32 len)
+{
+	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
+	const u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len);
+	unsigned long flags;
+
+	spin_lock_irqsave(&subpage->lock, flags);
+	subpage->error_bitmap |= tmp;
+	SetPageError(page);
+	spin_unlock_irqrestore(&subpage->lock, flags);
+}
+
+void btrfs_subpage_clear_error(const struct btrfs_fs_info *fs_info,
+		struct page *page, u64 start, u32 len)
+{
+	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
+	const u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len);
+	unsigned long flags;
+
+	spin_lock_irqsave(&subpage->lock, flags);
+	subpage->error_bitmap &= ~tmp;
+	if (subpage->error_bitmap == 0)
+		ClearPageError(page);
+	spin_unlock_irqrestore(&subpage->lock, flags);
+}
+
 /*
  * Unlike set/clear which is dependent on each page status, for test all bits
  * are tested in the same way.
@@ -180,6 +207,7 @@ bool btrfs_subpage_test_##name(const struct btrfs_fs_info *fs_info,	\
 	return ret;							\
 }
 IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(uptodate);
+IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(error);
 
 /*
  * Note that, in selftests (extent-io-tests), we can have empty fs_info passed
@@ -215,3 +243,4 @@ bool btrfs_page_test_##name(const struct btrfs_fs_info *fs_info,	\
 }
 IMPLEMENT_BTRFS_PAGE_OPS(uptodate, SetPageUptodate, ClearPageUptodate,
 			 PageUptodate);
+IMPLEMENT_BTRFS_PAGE_OPS(error, SetPageError, ClearPageError, PageError);
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index 1dee4eb1c5c1..68cbfc4f6765 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -20,6 +20,7 @@ struct btrfs_subpage {
 	/* Common members for both data and metadata pages */
 	spinlock_t lock;
 	u16 uptodate_bitmap;
+	u16 error_bitmap;
 	union {
 		/*
 		 * Structures only used by metadata
@@ -78,5 +79,6 @@ bool btrfs_page_test_##name(const struct btrfs_fs_info *fs_info,	\
 		struct page *page, u64 start, u32 len);
 
 DECLARE_BTRFS_SUBPAGE_OPS(uptodate);
+DECLARE_BTRFS_SUBPAGE_OPS(error);
 
 #endif
-- 
2.30.0



* [PATCH v5 10/18] btrfs: support subpage in set/clear_extent_buffer_uptodate()
  2021-01-26  8:33 [PATCH v5 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (8 preceding siblings ...)
  2021-01-26  8:33 ` [PATCH v5 09/18] btrfs: introduce helpers for subpage error status Qu Wenruo
@ 2021-01-26  8:33 ` Qu Wenruo
  2021-01-27 16:35   ` Josef Bacik
  2021-01-26  8:33 ` [PATCH v5 11/18] btrfs: support subpage in btrfs_clone_extent_buffer Qu Wenruo
                   ` (11 subsequent siblings)
  21 siblings, 1 reply; 52+ messages in thread
From: Qu Wenruo @ 2021-01-26  8:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

To support subpage in set_extent_buffer_uptodate and
clear_extent_buffer_uptodate we only need to use the subpage-aware
helpers to update the page bits.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/extent_io.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 7ee28d94bae9..78fd36ba1f47 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5670,30 +5670,33 @@ bool set_extent_buffer_dirty(struct extent_buffer *eb)
 
 void clear_extent_buffer_uptodate(struct extent_buffer *eb)
 {
-	int i;
+	struct btrfs_fs_info *fs_info = eb->fs_info;
 	struct page *page;
 	int num_pages;
+	int i;
 
 	clear_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
 	num_pages = num_extent_pages(eb);
 	for (i = 0; i < num_pages; i++) {
 		page = eb->pages[i];
 		if (page)
-			ClearPageUptodate(page);
+			btrfs_page_clear_uptodate(fs_info, page,
+						  eb->start, eb->len);
 	}
 }
 
 void set_extent_buffer_uptodate(struct extent_buffer *eb)
 {
-	int i;
+	struct btrfs_fs_info *fs_info = eb->fs_info;
 	struct page *page;
 	int num_pages;
+	int i;
 
 	set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
 	num_pages = num_extent_pages(eb);
 	for (i = 0; i < num_pages; i++) {
 		page = eb->pages[i];
-		SetPageUptodate(page);
+		btrfs_page_set_uptodate(fs_info, page, eb->start, eb->len);
 	}
 }
 
-- 
2.30.0



* [PATCH v5 11/18] btrfs: support subpage in btrfs_clone_extent_buffer
  2021-01-26  8:33 [PATCH v5 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (9 preceding siblings ...)
  2021-01-26  8:33 ` [PATCH v5 10/18] btrfs: support subpage in set/clear_extent_buffer_uptodate() Qu Wenruo
@ 2021-01-26  8:33 ` Qu Wenruo
  2021-01-27 16:35   ` Josef Bacik
  2021-01-26  8:33 ` [PATCH v5 12/18] btrfs: support subpage in try_release_extent_buffer() Qu Wenruo
                   ` (10 subsequent siblings)
  21 siblings, 1 reply; 52+ messages in thread
From: Qu Wenruo @ 2021-01-26  8:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

btrfs_clone_extent_buffer() is mostly the same code as
__alloc_dummy_extent_buffer(), except for the extra page copy.

So to make it subpage compatible, we only need to:

- Call set_extent_buffer_uptodate() instead of SetPageUptodate()
  This sets the correct uptodate bit for both the subpage and regular
  sector size cases.

Since set_extent_buffer_uptodate() also sets the EXTENT_BUFFER_UPTODATE
bit, we no longer need to set that bit manually either.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/extent_io.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 78fd36ba1f47..d1c1bbc19226 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5161,11 +5161,10 @@ struct extent_buffer *btrfs_clone_extent_buffer(const struct extent_buffer *src)
 			return NULL;
 		}
 		WARN_ON(PageDirty(p));
-		SetPageUptodate(p);
 		new->pages[i] = p;
 		copy_page(page_address(p), page_address(src->pages[i]));
 	}
-	set_bit(EXTENT_BUFFER_UPTODATE, &new->bflags);
+	set_extent_buffer_uptodate(new);
 
 	return new;
 }
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v5 12/18] btrfs: support subpage in try_release_extent_buffer()
  2021-01-26  8:33 [PATCH v5 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (10 preceding siblings ...)
  2021-01-26  8:33 ` [PATCH v5 11/18] btrfs: support subpage in btrfs_clone_extent_buffer Qu Wenruo
@ 2021-01-26  8:33 ` Qu Wenruo
  2021-01-27 16:37   ` Josef Bacik
  2021-01-26  8:33 ` [PATCH v5 13/18] btrfs: introduce read_extent_buffer_subpage() Qu Wenruo
                   ` (9 subsequent siblings)
  21 siblings, 1 reply; 52+ messages in thread
From: Qu Wenruo @ 2021-01-26  8:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

Unlike the original try_release_extent_buffer(),
try_release_subpage_extent_buffer() will iterate through all the ebs in
the page, and try to release each.

We can release the full page only after there's no private attached,
which means all ebs of that page have been released as well.
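
A simplified sketch of the release loop (the real code in the diff below
also deals with eb refs and the required locking):

	while (cur < end) {
		/* Find the next eb at or after cur via the radix tree */
		eb = get_next_extent_buffer(fs_info, page, cur);
		if (!eb)
			break;
		cur = eb->start + eb->len;
		release_extent_buffer(eb);
	}
	/* Page private is only cleared once the last eb is gone */
	return !PagePrivate(page);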

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/extent_io.c | 106 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 104 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d1c1bbc19226..27d7f42f605e 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -6316,13 +6316,115 @@ void memmove_extent_buffer(const struct extent_buffer *dst,
 	}
 }
 
+static struct extent_buffer *get_next_extent_buffer(
+		struct btrfs_fs_info *fs_info, struct page *page, u64 bytenr)
+{
+	struct extent_buffer *gang[BTRFS_SUBPAGE_BITMAP_SIZE];
+	struct extent_buffer *found = NULL;
+	u64 page_start = page_offset(page);
+	int ret;
+	int i;
+
+	ASSERT(in_range(bytenr, page_start, PAGE_SIZE));
+	ASSERT(PAGE_SIZE / fs_info->nodesize <= BTRFS_SUBPAGE_BITMAP_SIZE);
+	lockdep_assert_held(&fs_info->buffer_lock);
+
+	ret = radix_tree_gang_lookup(&fs_info->buffer_radix, (void **)gang,
+			bytenr >> fs_info->sectorsize_bits,
+			PAGE_SIZE / fs_info->nodesize);
+	for (i = 0; i < ret; i++) {
+		/* Already beyond page end */
+		if (gang[i]->start >= page_start + PAGE_SIZE)
+			break;
+		/* Found one */
+		if (gang[i]->start >= bytenr) {
+			found = gang[i];
+			break;
+		}
+	}
+	return found;
+}
+
+static int try_release_subpage_extent_buffer(struct page *page)
+{
+	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
+	u64 cur = page_offset(page);
+	const u64 end = page_offset(page) + PAGE_SIZE;
+	int ret;
+
+	while (cur < end) {
+		struct extent_buffer *eb = NULL;
+
+		/*
+		 * Unlike try_release_extent_buffer() which uses page->private
+		 * to grab buffer, for subpage case we rely on radix tree, thus
+		 * we need to ensure radix tree consistency.
+		 *
+		 * We also want an atomic snapshot of the radix tree, thus go
+		 * with spinlock rather than RCU.
+		 */
+		spin_lock(&fs_info->buffer_lock);
+		eb = get_next_extent_buffer(fs_info, page, cur);
+		if (!eb) {
+			/* No more eb in the page range after or at cur */
+			spin_unlock(&fs_info->buffer_lock);
+			break;
+		}
+		cur = eb->start + eb->len;
+
+		/*
+		 * The same as try_release_extent_buffer(), to ensure the eb
+		 * won't disappear out from under us.
+		 */
+		spin_lock(&eb->refs_lock);
+		if (atomic_read(&eb->refs) != 1 || extent_buffer_under_io(eb)) {
+			spin_unlock(&eb->refs_lock);
+			spin_unlock(&fs_info->buffer_lock);
+			break;
+		}
+		spin_unlock(&fs_info->buffer_lock);
+
+		/*
+		 * If tree ref isn't set then we know the ref on this eb is a
+		 * real ref, so just return, this eb will likely be freed soon
+		 * anyway.
+		 */
+		if (!test_and_clear_bit(EXTENT_BUFFER_TREE_REF, &eb->bflags)) {
+			spin_unlock(&eb->refs_lock);
+			break;
+		}
+
+		/*
+		 * Here we don't care about the return value, we will always
+		 * check the page private at the end.  And
+		 * release_extent_buffer() will release the refs_lock.
+		 */
+		release_extent_buffer(eb);
+	}
+	/*
+	 * Finally to check if we have cleared page private, as if we have
+	 * released all ebs in the page, the page private should be cleared now.
+	 */
+	spin_lock(&page->mapping->private_lock);
+	if (!PagePrivate(page))
+		ret = 1;
+	else
+		ret = 0;
+	spin_unlock(&page->mapping->private_lock);
+	return ret;
+
+}
+
 int try_release_extent_buffer(struct page *page)
 {
 	struct extent_buffer *eb;
 
+	if (btrfs_sb(page->mapping->host->i_sb)->sectorsize < PAGE_SIZE)
+		return try_release_subpage_extent_buffer(page);
+
 	/*
-	 * We need to make sure nobody is attaching this page to an eb right
-	 * now.
+	 * We need to make sure nobody is changing page->private, as we rely on
+	 * page->private as the pointer to extent buffer.
 	 */
 	spin_lock(&page->mapping->private_lock);
 	if (!PagePrivate(page)) {
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v5 13/18] btrfs: introduce read_extent_buffer_subpage()
  2021-01-26  8:33 [PATCH v5 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (11 preceding siblings ...)
  2021-01-26  8:33 ` [PATCH v5 12/18] btrfs: support subpage in try_release_extent_buffer() Qu Wenruo
@ 2021-01-26  8:33 ` Qu Wenruo
  2021-01-27 16:39   ` Josef Bacik
  2021-01-26  8:33 ` [PATCH v5 14/18] btrfs: support subpage in endio_readpage_update_page_status() Qu Wenruo
                   ` (8 subsequent siblings)
  21 siblings, 1 reply; 52+ messages in thread
From: Qu Wenruo @ 2021-01-26  8:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

Introduce a helper, read_extent_buffer_subpage(), to do the subpage
extent buffer read.

The differences between the regular and subpage routines are:

- No page locking
  Here we completely rely on extent locking.
  Page locking would greatly reduce the concurrency: if we locked one
  page to read one extent buffer, all the other extent buffers in the
  same page would have to wait.

- Extent uptodate condition
  Besides the existing PageUptodate() and EXTENT_BUFFER_UPTODATE checks,
  we also need to check btrfs_subpage::uptodate_bitmap (see the sketch
  below).

- No page iteration
  There is just one page, no need to loop, which greatly simplifies the
  subpage routine.

This patch only implements the bio submit part, no endio support yet.
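
For the subpage case, the uptodate check thus combines all three sources
(a condensed sketch of what the new helper below does):

	if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags) ||
	    PageUptodate(page) ||
	    btrfs_subpage_test_uptodate(fs_info, page, eb->start, eb->len)) {
		/* Already uptodate, no need to submit a read bio */
		set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
		unlock_extent(io_tree, eb->start, eb->start + eb->len - 1);
		return 0;
	}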

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/extent_io.c | 70 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 27d7f42f605e..5ac8faf0f8e5 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5699,6 +5699,73 @@ void set_extent_buffer_uptodate(struct extent_buffer *eb)
 	}
 }
 
+static int read_extent_buffer_subpage(struct extent_buffer *eb, int wait,
+				      int mirror_num)
+{
+	struct btrfs_fs_info *fs_info = eb->fs_info;
+	struct extent_io_tree *io_tree;
+	struct page *page = eb->pages[0];
+	struct bio *bio = NULL;
+	int ret = 0;
+
+	ASSERT(!test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags));
+	ASSERT(PagePrivate(page));
+	io_tree = &BTRFS_I(fs_info->btree_inode)->io_tree;
+
+	if (wait == WAIT_NONE) {
+		ret = try_lock_extent(io_tree, eb->start,
+				      eb->start + eb->len - 1);
+		if (ret <= 0)
+			return ret;
+	} else {
+		ret = lock_extent(io_tree, eb->start, eb->start + eb->len - 1);
+		if (ret < 0)
+			return ret;
+	}
+
+	ret = 0;
+	if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags) ||
+	    PageUptodate(page) ||
+	    btrfs_subpage_test_uptodate(fs_info, page, eb->start, eb->len)) {
+		set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
+		unlock_extent(io_tree, eb->start, eb->start + eb->len - 1);
+		return ret;
+	}
+
+	clear_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags);
+	eb->read_mirror = 0;
+	atomic_set(&eb->io_pages, 1);
+	check_buffer_tree_ref(eb);
+	btrfs_subpage_clear_error(fs_info, page, eb->start, eb->len);
+
+	ret = submit_extent_page(REQ_OP_READ | REQ_META, NULL, page, eb->start,
+				 eb->len, eb->start - page_offset(page), &bio,
+				 end_bio_extent_readpage, mirror_num, 0, 0,
+				 true);
+	if (ret) {
+		/*
+		 * In the endio function, if we hit something wrong we will
+		 * increase the io_pages, so here we need to decrease it for
+		 * error path.
+		 */
+		atomic_dec(&eb->io_pages);
+	}
+	if (bio) {
+		int tmp;
+
+		tmp = submit_one_bio(bio, mirror_num, 0);
+		if (tmp < 0)
+			return tmp;
+	}
+	if (ret || wait != WAIT_COMPLETE)
+		return ret;
+
+	wait_extent_bit(io_tree, eb->start, eb->start + eb->len - 1, EXTENT_LOCKED);
+	if (!test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags))
+		ret = -EIO;
+	return ret;
+}
+
 int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num)
 {
 	int i;
@@ -5715,6 +5782,9 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num)
 	if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags))
 		return 0;
 
+	if (eb->fs_info->sectorsize < PAGE_SIZE)
+		return read_extent_buffer_subpage(eb, wait, mirror_num);
+
 	num_pages = num_extent_pages(eb);
 	for (i = 0; i < num_pages; i++) {
 		page = eb->pages[i];
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v5 14/18] btrfs: support subpage in endio_readpage_update_page_status()
  2021-01-26  8:33 [PATCH v5 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (12 preceding siblings ...)
  2021-01-26  8:33 ` [PATCH v5 13/18] btrfs: introduce read_extent_buffer_subpage() Qu Wenruo
@ 2021-01-26  8:33 ` Qu Wenruo
  2021-01-27 16:42   ` Josef Bacik
  2021-01-26  8:33 ` [PATCH v5 15/18] btrfs: introduce subpage metadata validation check Qu Wenruo
                   ` (7 subsequent siblings)
  21 siblings, 1 reply; 52+ messages in thread
From: Qu Wenruo @ 2021-01-26  8:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

To handle subpage status updates, add the following:

- Use the btrfs_page_*() subpage-aware helpers to update the page status
  Now both the regular and subpage cases are handled.

- No page unlock for subpage metadata
  Since subpage metadata doesn't utilize page locking at all, skip it
  (see the sketch below).
  Subpage data locking is handled in later commits.
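
A condensed sketch of the unlock logic after this patch:

	if (fs_info->sectorsize == PAGE_SIZE)
		unlock_page(page);
	/* Subpage locking will be handled in later patches */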

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/extent_io.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 5ac8faf0f8e5..139a8a77ed72 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2839,15 +2839,24 @@ static void endio_readpage_release_extent(struct processed_extent *processed,
 	processed->uptodate = uptodate;
 }
 
-static void endio_readpage_update_page_status(struct page *page, bool uptodate)
+static void endio_readpage_update_page_status(struct page *page, bool uptodate,
+					      u64 start, u32 len)
 {
+	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
+
+	ASSERT(page_offset(page) <= start &&
+		start + len <= page_offset(page) + PAGE_SIZE);
+
 	if (uptodate) {
-		SetPageUptodate(page);
+		btrfs_page_set_uptodate(fs_info, page, start, len);
 	} else {
-		ClearPageUptodate(page);
-		SetPageError(page);
+		btrfs_page_clear_uptodate(fs_info, page, start, len);
+		btrfs_page_set_error(fs_info, page, start, len);
 	}
-	unlock_page(page);
+
+	if (fs_info->sectorsize == PAGE_SIZE)
+		unlock_page(page);
+	/* Subpage locking will be handled in later patches */
 }
 
 /*
@@ -2984,7 +2993,7 @@ static void end_bio_extent_readpage(struct bio *bio)
 		bio_offset += len;
 
 		/* Update page status and unlock */
-		endio_readpage_update_page_status(page, uptodate);
+		endio_readpage_update_page_status(page, uptodate, start, len);
 		endio_readpage_release_extent(&processed, BTRFS_I(inode),
 					      start, end, uptodate);
 	}
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v5 15/18] btrfs: introduce subpage metadata validation check
  2021-01-26  8:33 [PATCH v5 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (13 preceding siblings ...)
  2021-01-26  8:33 ` [PATCH v5 14/18] btrfs: support subpage in endio_readpage_update_page_status() Qu Wenruo
@ 2021-01-26  8:33 ` Qu Wenruo
  2021-01-27 16:47   ` Josef Bacik
  2021-01-26  8:34 ` [PATCH v5 16/18] btrfs: introduce btrfs_subpage for data inodes Qu Wenruo
                   ` (6 subsequent siblings)
  21 siblings, 1 reply; 52+ messages in thread
From: Qu Wenruo @ 2021-01-26  8:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

For the subpage metadata validation check, there are some differences:

- Read must finish in one bvec
  Since we're just reading one subpage range in one page, it should
  never be split into two bios or two bvecs.

- How to grab the existing eb
  Instead of grabbing the eb using page->private, we have to search the
  radix tree, as we don't have any direct pointer at hand (see the
  sketch below).
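
A condensed sketch of the difference:

	/* Regular case: the eb comes straight from page private */
	eb = (struct extent_buffer *)page->private;

	/* Subpage case: look the eb up by bytenr in the radix tree */
	eb = find_extent_buffer(fs_info, start);
	ASSERT(eb);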

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/disk-io.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 5473bed6a7e8..0b10577ad2bd 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -591,6 +591,59 @@ static int validate_extent_buffer(struct extent_buffer *eb)
 	return ret;
 }
 
+static int validate_subpage_buffer(struct page *page, u64 start, u64 end,
+				   int mirror)
+{
+	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
+	struct extent_buffer *eb;
+	bool reads_done;
+	int ret = 0;
+
+	/*
+	 * We don't allow bio merge for subpage metadata read, so we should
+	 * only get one eb for each endio hook.
+	 */
+	ASSERT(end == start + fs_info->nodesize - 1);
+	ASSERT(PagePrivate(page));
+
+	eb = find_extent_buffer(fs_info, start);
+	/*
+	 * When we are reading one tree block, eb must have been inserted into
+	 * the radix tree. If not, something is wrong.
+	 */
+	ASSERT(eb);
+
+	reads_done = atomic_dec_and_test(&eb->io_pages);
+	/* Subpage read must finish in page read */
+	ASSERT(reads_done);
+
+	eb->read_mirror = mirror;
+	if (test_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags)) {
+		ret = -EIO;
+		goto err;
+	}
+	ret = validate_extent_buffer(eb);
+	if (ret < 0)
+		goto err;
+
+	if (test_and_clear_bit(EXTENT_BUFFER_READAHEAD, &eb->bflags))
+		btree_readahead_hook(eb, ret);
+
+	set_extent_buffer_uptodate(eb);
+
+	free_extent_buffer(eb);
+	return ret;
+err:
+	/*
+	 * end_bio_extent_readpage decrements io_pages in case of error,
+	 * make sure it has something to decrement.
+	 */
+	atomic_inc(&eb->io_pages);
+	clear_extent_buffer_uptodate(eb);
+	free_extent_buffer(eb);
+	return ret;
+}
+
 int btrfs_validate_metadata_buffer(struct btrfs_io_bio *io_bio,
 				   struct page *page, u64 start, u64 end,
 				   int mirror)
@@ -600,6 +653,10 @@ int btrfs_validate_metadata_buffer(struct btrfs_io_bio *io_bio,
 	int reads_done;
 
 	ASSERT(page->private);
+
+	if (btrfs_sb(page->mapping->host->i_sb)->sectorsize < PAGE_SIZE)
+		return validate_subpage_buffer(page, start, end, mirror);
+
 	eb = (struct extent_buffer *)page->private;
 
 	/*
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v5 16/18] btrfs: introduce btrfs_subpage for data inodes
  2021-01-26  8:33 [PATCH v5 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (14 preceding siblings ...)
  2021-01-26  8:33 ` [PATCH v5 15/18] btrfs: introduce subpage metadata validation check Qu Wenruo
@ 2021-01-26  8:34 ` Qu Wenruo
  2021-01-27 16:56   ` Josef Bacik
  2021-01-26  8:34 ` [PATCH v5 17/18] btrfs: integrate page status update for data read path into begin/end_page_read() Qu Wenruo
                   ` (5 subsequent siblings)
  21 siblings, 1 reply; 52+ messages in thread
From: Qu Wenruo @ 2021-01-26  8:34 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

To support subpage sector size, data also needs extra info to track
which sectors in a page are uptodate/dirty/...

This patch makes pages of data inodes get the btrfs_subpage structure
attached, and detaches it when the page is freed.

This patch also slightly changes the timing of set_page_extent_mapped()
to make sure:

- We have page->mapping set
  page->mapping->host is used to grab btrfs_fs_info, thus we can only
  call this function after the page is mapped to an inode.

  One call site attaches pages to the inode manually, thus we have to
  modify the timing of set_page_extent_mapped() a little.

- Call it as soon as possible, before other operations
  Since memory allocation can fail, we have to do extra error handling.
  Calling set_page_extent_mapped() as soon as possible simplifies the
  error handling for several call sites.

The idea is pretty much the same as iomap_page, but with more bitmaps
for btrfs-specific cases.

Currently the plan is to switch to iomap if iomap can provide
sector-aligned writeback (write back only the dirty sectors, not the
full page; data balance requires this feature).

So we will stick to the btrfs-specific bitmaps for now.
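
The resulting call pattern at the call sites looks roughly like this
(the exact error handling varies per call site, see the diff below):

	page = find_or_create_page(mapping, index, mask);
	if (!page)
		return -ENOMEM;

	/* As early as possible, right after the page is mapped */
	ret = set_page_extent_mapped(page);
	if (ret < 0) {
		unlock_page(page);
		put_page(page);
		return ret;
	}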

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/compression.c      | 10 ++++++--
 fs/btrfs/extent_io.c        | 46 +++++++++++++++++++++++++++++++++----
 fs/btrfs/extent_io.h        |  3 ++-
 fs/btrfs/file.c             | 24 ++++++++-----------
 fs/btrfs/free-space-cache.c | 15 +++++++++---
 fs/btrfs/inode.c            | 14 +++++++----
 fs/btrfs/ioctl.c            |  8 ++++++-
 fs/btrfs/reflink.c          |  5 +++-
 fs/btrfs/relocation.c       | 11 +++++++--
 9 files changed, 103 insertions(+), 33 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 5ae3fa0386b7..6d203acfdeb3 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -542,13 +542,19 @@ static noinline int add_ra_bio_pages(struct inode *inode,
 			goto next;
 		}
 
-		end = last_offset + PAGE_SIZE - 1;
 		/*
 		 * at this point, we have a locked page in the page cache
 		 * for these bytes in the file.  But, we have to make
 		 * sure they map to this compressed extent on disk.
 		 */
-		set_page_extent_mapped(page);
+		ret = set_page_extent_mapped(page);
+		if (ret < 0) {
+			unlock_page(page);
+			put_page(page);
+			break;
+		}
+
+		end = last_offset + PAGE_SIZE - 1;
 		lock_extent(tree, last_offset, end);
 		read_lock(&em_tree->lock);
 		em = lookup_extent_mapping(em_tree, last_offset,
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 139a8a77ed72..eeee3213daaa 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3190,10 +3190,39 @@ static int attach_extent_buffer_page(struct extent_buffer *eb,
 	return ret;
 }
 
-void set_page_extent_mapped(struct page *page)
+int set_page_extent_mapped(struct page *page)
 {
+	struct btrfs_fs_info *fs_info;
+
+	ASSERT(page->mapping);
+
+	if (PagePrivate(page))
+		return 0;
+
+	fs_info = btrfs_sb(page->mapping->host->i_sb);
+
+	if (fs_info->sectorsize < PAGE_SIZE)
+		return btrfs_attach_subpage(fs_info, page, BTRFS_SUBPAGE_DATA);
+
+	attach_page_private(page, (void *)EXTENT_PAGE_PRIVATE);
+	return 0;
+
+}
+
+void clear_page_extent_mapped(struct page *page)
+{
+	struct btrfs_fs_info *fs_info;
+
+	ASSERT(page->mapping);
+
 	if (!PagePrivate(page))
-		attach_page_private(page, (void *)EXTENT_PAGE_PRIVATE);
+		return;
+
+	fs_info = btrfs_sb(page->mapping->host->i_sb);
+	if (fs_info->sectorsize < PAGE_SIZE)
+		return btrfs_detach_subpage(fs_info, page);
+
+	detach_page_private(page);
 }
 
 static struct extent_map *
@@ -3250,7 +3279,12 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 	unsigned long this_bio_flag = 0;
 	struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
 
-	set_page_extent_mapped(page);
+	ret = set_page_extent_mapped(page);
+	if (ret < 0) {
+		unlock_extent(tree, start, end);
+		SetPageError(page);
+		goto out;
+	}
 
 	if (!PageUptodate(page)) {
 		if (cleancache_get_page(page) == 0) {
@@ -3690,7 +3724,11 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
 		flush_dcache_page(page);
 	}
 
-	set_page_extent_mapped(page);
+	ret = set_page_extent_mapped(page);
+	if (ret < 0) {
+		SetPageError(page);
+		goto done;
+	}
 
 	if (!epd->extent_locked) {
 		ret = writepage_delalloc(BTRFS_I(inode), page, wbc, start,
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 2d8187c84812..047b3e66897f 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -178,7 +178,8 @@ int btree_write_cache_pages(struct address_space *mapping,
 void extent_readahead(struct readahead_control *rac);
 int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
 		  u64 start, u64 len);
-void set_page_extent_mapped(struct page *page);
+int set_page_extent_mapped(struct page *page);
+void clear_page_extent_mapped(struct page *page);
 
 struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 					  u64 start, u64 owner_root, int level);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index d81ae1f518f2..63b290210eaa 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1369,6 +1369,12 @@ static noinline int prepare_pages(struct inode *inode, struct page **pages,
 			goto fail;
 		}
 
+		err = set_page_extent_mapped(pages[i]);
+		if (err < 0) {
+			faili = i;
+			goto fail;
+		}
+
 		if (i == 0)
 			err = prepare_uptodate_page(inode, pages[i], pos,
 						    force_uptodate);
@@ -1453,23 +1459,11 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
 	}
 
 	/*
-	 * It's possible the pages are dirty right now, but we don't want
-	 * to clean them yet because copy_from_user may catch a page fault
-	 * and we might have to fall back to one page at a time.  If that
-	 * happens, we'll unlock these pages and we'd have a window where
-	 * reclaim could sneak in and drop the once-dirty page on the floor
-	 * without writing it.
-	 *
-	 * We have the pages locked and the extent range locked, so there's
-	 * no way someone can start IO on any dirty pages in this range.
-	 *
-	 * We'll call btrfs_dirty_pages() later on, and that will flip around
-	 * delalloc bits and dirty the pages as required.
+	 * We should be called after prepare_pages() which should have
+	 * locked all pages in the range.
 	 */
-	for (i = 0; i < num_pages; i++) {
-		set_page_extent_mapped(pages[i]);
+	for (i = 0; i < num_pages; i++)
 		WARN_ON(!PageLocked(pages[i]));
-	}
 
 	return ret;
 }
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index fd6ddd6b8165..35e5c6ee0584 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -431,11 +431,22 @@ static int io_ctl_prepare_pages(struct btrfs_io_ctl *io_ctl, bool uptodate)
 	int i;
 
 	for (i = 0; i < io_ctl->num_pages; i++) {
+		int ret;
+
 		page = find_or_create_page(inode->i_mapping, i, mask);
 		if (!page) {
 			io_ctl_drop_pages(io_ctl);
 			return -ENOMEM;
 		}
+
+		ret = set_page_extent_mapped(page);
+		if (ret < 0) {
+			unlock_page(page);
+			put_page(page);
+			io_ctl_drop_pages(io_ctl);
+			return ret;
+		}
+
 		io_ctl->pages[i] = page;
 		if (uptodate && !PageUptodate(page)) {
 			btrfs_readpage(NULL, page);
@@ -455,10 +466,8 @@ static int io_ctl_prepare_pages(struct btrfs_io_ctl *io_ctl, bool uptodate)
 		}
 	}
 
-	for (i = 0; i < io_ctl->num_pages; i++) {
+	for (i = 0; i < io_ctl->num_pages; i++)
 		clear_page_dirty_for_io(io_ctl->pages[i]);
-		set_page_extent_mapped(io_ctl->pages[i]);
-	}
 
 	return 0;
 }
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d1bb3cc8499b..a18e3c950f07 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4712,6 +4712,9 @@ int btrfs_truncate_block(struct btrfs_inode *inode, loff_t from, loff_t len,
 		ret = -ENOMEM;
 		goto out;
 	}
+	ret = set_page_extent_mapped(page);
+	if (ret < 0)
+		goto out_unlock;
 
 	if (!PageUptodate(page)) {
 		ret = btrfs_readpage(NULL, page);
@@ -4729,7 +4732,6 @@ int btrfs_truncate_block(struct btrfs_inode *inode, loff_t from, loff_t len,
 	wait_on_page_writeback(page);
 
 	lock_extent_bits(io_tree, block_start, block_end, &cached_state);
-	set_page_extent_mapped(page);
 
 	ordered = btrfs_lookup_ordered_extent(inode, block_start);
 	if (ordered) {
@@ -8107,7 +8109,7 @@ static int __btrfs_releasepage(struct page *page, gfp_t gfp_flags)
 {
 	int ret = try_release_extent_mapping(page, gfp_flags);
 	if (ret == 1)
-		detach_page_private(page);
+		clear_page_extent_mapped(page);
 	return ret;
 }
 
@@ -8266,7 +8268,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 	}
 
 	ClearPageChecked(page);
-	detach_page_private(page);
+	clear_page_extent_mapped(page);
 }
 
 /*
@@ -8345,7 +8347,11 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
 	wait_on_page_writeback(page);
 
 	lock_extent_bits(io_tree, page_start, page_end, &cached_state);
-	set_page_extent_mapped(page);
+	ret2 = set_page_extent_mapped(page);
+	if (ret2 < 0) {
+		ret = vmf_error(ret2);
+		goto out_unlock;
+	}
 
 	/*
 	 * we can't set the delalloc bits if there are pending ordered
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 7f2935ea8d3a..e6a63f652235 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1314,6 +1314,13 @@ static int cluster_pages_for_defrag(struct inode *inode,
 		if (!page)
 			break;
 
+		ret = set_page_extent_mapped(page);
+		if (ret < 0) {
+			unlock_page(page);
+			put_page(page);
+			break;
+		}
+
 		page_start = page_offset(page);
 		page_end = page_start + PAGE_SIZE - 1;
 		while (1) {
@@ -1435,7 +1442,6 @@ static int cluster_pages_for_defrag(struct inode *inode,
 	for (i = 0; i < i_done; i++) {
 		clear_page_dirty_for_io(pages[i]);
 		ClearPageChecked(pages[i]);
-		set_page_extent_mapped(pages[i]);
 		set_page_dirty(pages[i]);
 		unlock_page(pages[i]);
 		put_page(pages[i]);
diff --git a/fs/btrfs/reflink.c b/fs/btrfs/reflink.c
index b03e7891394e..b24396cf2f99 100644
--- a/fs/btrfs/reflink.c
+++ b/fs/btrfs/reflink.c
@@ -81,7 +81,10 @@ static int copy_inline_to_page(struct btrfs_inode *inode,
 		goto out_unlock;
 	}
 
-	set_page_extent_mapped(page);
+	ret = set_page_extent_mapped(page);
+	if (ret < 0)
+		goto out_unlock;
+
 	clear_extent_bit(&inode->io_tree, file_offset, range_end,
 			 EXTENT_DELALLOC | EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG,
 			 0, 0, NULL);
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 9f2289bcdde6..2601fa19ff99 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2681,6 +2681,15 @@ static int relocate_file_extent_cluster(struct inode *inode,
 				goto out;
 			}
 		}
+		ret = set_page_extent_mapped(page);
+		if (ret < 0) {
+			btrfs_delalloc_release_metadata(BTRFS_I(inode),
+							PAGE_SIZE, true);
+			btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE);
+			unlock_page(page);
+			put_page(page);
+			goto out;
+		}
 
 		if (PageReadahead(page)) {
 			page_cache_async_readahead(inode->i_mapping,
@@ -2708,8 +2717,6 @@ static int relocate_file_extent_cluster(struct inode *inode,
 
 		lock_extent(&BTRFS_I(inode)->io_tree, page_start, page_end);
 
-		set_page_extent_mapped(page);
-
 		if (nr < cluster->nr &&
 		    page_start + offset == cluster->boundary[nr]) {
 			set_extent_bits(&BTRFS_I(inode)->io_tree,
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v5 17/18] btrfs: integrate page status update for data read path into begin/end_page_read()
  2021-01-26  8:33 [PATCH v5 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (15 preceding siblings ...)
  2021-01-26  8:34 ` [PATCH v5 16/18] btrfs: introduce btrfs_subpage for data inodes Qu Wenruo
@ 2021-01-26  8:34 ` Qu Wenruo
  2021-01-27 17:13   ` Josef Bacik
  2021-01-26  8:34 ` [PATCH v5 18/18] btrfs: allow RO mount of 4K sector size fs on 64K page system Qu Wenruo
                   ` (4 subsequent siblings)
  21 siblings, 1 reply; 52+ messages in thread
From: Qu Wenruo @ 2021-01-26  8:34 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

In the btrfs data page read path, the page status updates are handled in
two different locations:

  btrfs_do_read_page()
  {
	while (cur <= end) {
		/* No need to read from disk */
		if (HOLE/PREALLOC/INLINE){
			memset();
			set_extent_uptodate();
			continue;
		}
		/* Read from disk */
		ret = submit_extent_page(end_bio_extent_readpage);
  }

  end_bio_extent_readpage()
  {
	endio_readpage_update_page_status();
  }

This is fine for the sectorsize == PAGE_SIZE case, as in the above loop
we should only hit one branch and then exit.

But for subpage, there is more work to be done in the page status update:
- Page unlock condition
  Unlike the regular page size == sectorsize case, we can no longer
  blindly unlock a page.
  Only the last reader of the page can unlock the page.
  This means we can unlock the page either in the while() loop or in
  the endio function.

- Page uptodate condition
  Since we have multiple sectors to read for a page, we can only mark
  the full page uptodate if all sectors are uptodate.

To handle both the subpage and regular cases, introduce a pair of
functions to help handle the page status updates:

- begin_page_read()
  For the regular case, it does nothing.
  For the subpage case, it updates the reader counter so that a later
  end_page_read() can know who is the last one to unlock the page.

- end_page_read()
  This is just endio_readpage_update_page_status() renamed.
  The original name is a little too long and too specific for endio.

  The only new trick added is the condition for the page unlock.
  Now for subpage data, we unlock the page only if we're the last reader.

This not only provides the basis for subpage data read, but also hides
the special handling of the page read status from the main read loop.
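
The subpage reader accounting itself is simple (condensed from the
helpers added in subpage.c below):

	/* begin_page_read(), subpage case: one reader per sector */
	atomic_add(PAGE_SIZE >> fs_info->sectorsize_bits, &subpage->readers);

	/* end_page_read(), subpage data case */
	if (atomic_sub_and_test(len >> fs_info->sectorsize_bits,
				&subpage->readers))
		unlock_page(page);	/* we were the last reader */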

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/extent_io.c | 38 ++++++++++++++++++++----------
 fs/btrfs/subpage.c   | 56 ++++++++++++++++++++++++++++++++++----------
 fs/btrfs/subpage.h   |  8 +++++++
 3 files changed, 78 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index eeee3213daaa..7fc2c62d4eb9 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2839,8 +2839,17 @@ static void endio_readpage_release_extent(struct processed_extent *processed,
 	processed->uptodate = uptodate;
 }
 
-static void endio_readpage_update_page_status(struct page *page, bool uptodate,
-					      u64 start, u32 len)
+static void begin_data_page_read(struct btrfs_fs_info *fs_info, struct page *page)
+{
+	ASSERT(PageLocked(page));
+	if (fs_info->sectorsize == PAGE_SIZE)
+		return;
+
+	ASSERT(PagePrivate(page));
+	btrfs_subpage_start_reader(fs_info, page, page_offset(page), PAGE_SIZE);
+}
+
+static void end_page_read(struct page *page, bool uptodate, u64 start, u32 len)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
 
@@ -2856,7 +2865,12 @@ static void endio_readpage_update_page_status(struct page *page, bool uptodate,
 
 	if (fs_info->sectorsize == PAGE_SIZE)
 		unlock_page(page);
-	/* Subpage locking will be handled in later patches */
+	else if (is_data_inode(page->mapping->host))
+		/*
+		 * For subpage data, unlock the page if we're the last reader.
+		 * For subpage metadata, page lock is not utilized for read.
+		 */
+		btrfs_subpage_end_reader(fs_info, page, start, len);
 }
 
 /*
@@ -2993,7 +3007,7 @@ static void end_bio_extent_readpage(struct bio *bio)
 		bio_offset += len;
 
 		/* Update page status and unlock */
-		endio_readpage_update_page_status(page, uptodate, start, len);
+		end_page_read(page, uptodate, start, len);
 		endio_readpage_release_extent(&processed, BTRFS_I(inode),
 					      start, end, uptodate);
 	}
@@ -3263,6 +3277,7 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 		      unsigned int read_flags, u64 *prev_em_start)
 {
 	struct inode *inode = page->mapping->host;
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	u64 start = page_offset(page);
 	const u64 end = start + PAGE_SIZE - 1;
 	u64 cur = start;
@@ -3306,6 +3321,7 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 			kunmap_atomic(userpage);
 		}
 	}
+	begin_data_page_read(fs_info, page);
 	while (cur <= end) {
 		bool force_bio_submit = false;
 		u64 disk_bytenr;
@@ -3323,13 +3339,14 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 					    &cached, GFP_NOFS);
 			unlock_extent_cached(tree, cur,
 					     cur + iosize - 1, &cached);
+			end_page_read(page, true, cur, iosize);
 			break;
 		}
 		em = __get_extent_map(inode, page, pg_offset, cur,
 				      end - cur + 1, em_cached);
 		if (IS_ERR_OR_NULL(em)) {
-			SetPageError(page);
 			unlock_extent(tree, cur, end);
+			end_page_read(page, false, cur, end + 1 - cur);
 			break;
 		}
 		extent_offset = cur - em->start;
@@ -3412,6 +3429,7 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 					    &cached, GFP_NOFS);
 			unlock_extent_cached(tree, cur,
 					     cur + iosize - 1, &cached);
+			end_page_read(page, true, cur, iosize);
 			cur = cur + iosize;
 			pg_offset += iosize;
 			continue;
@@ -3421,6 +3439,7 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 				   EXTENT_UPTODATE, 1, NULL)) {
 			check_page_uptodate(tree, page);
 			unlock_extent(tree, cur, cur + iosize - 1);
+			end_page_read(page, true, cur, iosize);
 			cur = cur + iosize;
 			pg_offset += iosize;
 			continue;
@@ -3429,8 +3448,8 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 		 * to date.  Error out
 		 */
 		if (block_start == EXTENT_MAP_INLINE) {
-			SetPageError(page);
 			unlock_extent(tree, cur, cur + iosize - 1);
+			end_page_read(page, false, cur, iosize);
 			cur = cur + iosize;
 			pg_offset += iosize;
 			continue;
@@ -3447,19 +3466,14 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 			nr++;
 			*bio_flags = this_bio_flag;
 		} else {
-			SetPageError(page);
 			unlock_extent(tree, cur, cur + iosize - 1);
+			end_page_read(page, false, cur, iosize);
 			goto out;
 		}
 		cur = cur + iosize;
 		pg_offset += iosize;
 	}
 out:
-	if (!nr) {
-		if (!PageError(page))
-			SetPageUptodate(page);
-		unlock_page(page);
-	}
 	return ret;
 }
 
diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index 2fe55a712557..c85f0f1c7441 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -54,6 +54,8 @@ int btrfs_alloc_subpage(const struct btrfs_fs_info *fs_info,
 	spin_lock_init(&(*ret)->lock);
 	if (type == BTRFS_SUBPAGE_METADATA)
 		atomic_set(&(*ret)->eb_refs, 0);
+	else
+		atomic_set(&(*ret)->readers, 0);
 	return 0;
 }
 
@@ -102,23 +104,13 @@ void btrfs_page_dec_eb_refs(const struct btrfs_fs_info *fs_info,
 	atomic_dec(&subpage->eb_refs);
 }
 
-/*
- * Convert the [start, start + len) range into a u16 bitmap
- *
- * For example: if start == page_offset() + 16K, len = 16K, we get 0x00f0.
- */
-static inline u16 btrfs_subpage_calc_bitmap(
-		const struct btrfs_fs_info *fs_info, struct page *page,
-		u64 start, u32 len)
+static void btrfs_subpage_assert(const struct btrfs_fs_info *fs_info,
+	struct page *page, u64 start, u32 len)
 {
-	const int bit_start = offset_in_page(start) >> fs_info->sectorsize_bits;
-	const int nbits = len >> fs_info->sectorsize_bits;
-
 	/* Basic checks */
 	ASSERT(PagePrivate(page) && page->private);
 	ASSERT(IS_ALIGNED(start, fs_info->sectorsize) &&
 	       IS_ALIGNED(len, fs_info->sectorsize));
-
 	/*
 	 * The range check only works for mapped page, we can still have
 	 * unampped page like dummy extent buffer pages.
@@ -126,6 +118,46 @@ static inline u16 btrfs_subpage_calc_bitmap(
 	if (page->mapping)
 		ASSERT(page_offset(page) <= start &&
 			start + len <= page_offset(page) + PAGE_SIZE);
+}
+
+void btrfs_subpage_start_reader(const struct btrfs_fs_info *fs_info,
+		struct page *page, u64 start, u32 len)
+{
+	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
+	const int nbits = len >> fs_info->sectorsize_bits;
+	int ret;
+
+	btrfs_subpage_assert(fs_info, page, start, len);
+
+	ret = atomic_add_return(nbits, &subpage->readers);
+	ASSERT(ret == nbits);
+}
+
+void btrfs_subpage_end_reader(const struct btrfs_fs_info *fs_info,
+		struct page *page, u64 start, u32 len)
+{
+	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
+	const int nbits = len >> fs_info->sectorsize_bits;
+
+	btrfs_subpage_assert(fs_info, page, start, len);
+	ASSERT(atomic_read(&subpage->readers) >= nbits);
+	if (atomic_sub_and_test(nbits, &subpage->readers))
+		unlock_page(page);
+}
+
+/*
+ * Convert the [start, start + len) range into a u16 bitmap
+ *
+ * For example: if start == page_offset() + 16K, len = 16K, we get 0x00f0.
+ */
+static u16 btrfs_subpage_calc_bitmap(const struct btrfs_fs_info *fs_info,
+		struct page *page, u64 start, u32 len)
+{
+	const int bit_start = offset_in_page(start) >> fs_info->sectorsize_bits;
+	const int nbits = len >> fs_info->sectorsize_bits;
+
+	btrfs_subpage_assert(fs_info, page, start, len);
+
 	/*
 	 * Here nbits can be 16, thus can go beyond u16 range. We make the
 	 * first left shift to be calculate in unsigned long (at least u32),
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index 68cbfc4f6765..bf5f565c8d1d 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -30,6 +30,9 @@ struct btrfs_subpage {
 		 */
 		atomic_t eb_refs;
 		/* Structures only used by data */
+		struct {
+			atomic_t readers;
+		};
 	};
 };
 
@@ -54,6 +57,11 @@ void btrfs_page_inc_eb_refs(const struct btrfs_fs_info *fs_info,
 void btrfs_page_dec_eb_refs(const struct btrfs_fs_info *fs_info,
 			    struct page *page);
 
+void btrfs_subpage_start_reader(const struct btrfs_fs_info *fs_info,
+		struct page *page, u64 start, u32 len);
+void btrfs_subpage_end_reader(const struct btrfs_fs_info *fs_info,
+		struct page *page, u64 start, u32 len);
+
 /*
  * Template for subpage related operations.
  *
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v5 18/18] btrfs: allow RO mount of 4K sector size fs on 64K page system
  2021-01-26  8:33 [PATCH v5 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (16 preceding siblings ...)
  2021-01-26  8:34 ` [PATCH v5 17/18] btrfs: integrate page status update for data read path into begin/end_page_read() Qu Wenruo
@ 2021-01-26  8:34 ` Qu Wenruo
  2021-01-27 17:13   ` Josef Bacik
  2021-02-01 15:49   ` David Sterba
  2021-01-27 16:17 ` [PATCH v5 00/18] btrfs: add read-only support for subpage sector size Josef Bacik
                   ` (3 subsequent siblings)
  21 siblings, 2 replies; 52+ messages in thread
From: Qu Wenruo @ 2021-01-26  8:34 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

This adds the basic RO mount ability for 4K sector size on a 64K page
size system.

Currently we only plan to support 4K and 64K page sizes.
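
For example, on a 64K page size system a filesystem with 4K sector size
can now be mounted read-only, while a read-write mount (or a remount to
read-write) is still rejected with -EINVAL.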

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/disk-io.c | 24 +++++++++++++++++++++---
 fs/btrfs/super.c   |  7 +++++++
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 0b10577ad2bd..d74ee0a396ac 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2483,13 +2483,21 @@ static int validate_super(struct btrfs_fs_info *fs_info,
 		btrfs_err(fs_info, "invalid sectorsize %llu", sectorsize);
 		ret = -EINVAL;
 	}
-	/* Only PAGE SIZE is supported yet */
-	if (sectorsize != PAGE_SIZE) {
+
+	/*
+	 * For 4K page size, we only support 4K sector size.
+	 * For 64K page size, we support RW for 64K sector size, and RO for
+	 * 4K sector size.
+	 */
+	if ((SZ_4K == PAGE_SIZE && sectorsize != PAGE_SIZE) ||
+	    (SZ_64K == PAGE_SIZE && (sectorsize != SZ_4K &&
+				     sectorsize != SZ_64K))) {
 		btrfs_err(fs_info,
-			"sectorsize %llu not supported yet, only support %lu",
+			"sectorsize %llu not supported yet for page size %lu",
 			sectorsize, PAGE_SIZE);
 		ret = -EINVAL;
 	}
+
 	if (!is_power_of_2(nodesize) || nodesize < sectorsize ||
 	    nodesize > BTRFS_MAX_METADATA_BLOCKSIZE) {
 		btrfs_err(fs_info, "invalid nodesize %llu", nodesize);
@@ -3248,6 +3256,16 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 		goto fail_alloc;
 	}
 
+	/* For 4K sector size support, it's only read-only yet */
+	if (PAGE_SIZE == SZ_64K && sectorsize == SZ_4K) {
+		if (!sb_rdonly(sb) || btrfs_super_log_root(disk_super)) {
+			btrfs_err(fs_info,
+				"subpage sector size only support RO yet");
+			err = -EINVAL;
+			goto fail_alloc;
+		}
+	}
+
 	ret = btrfs_init_workqueues(fs_info, fs_devices);
 	if (ret) {
 		err = ret;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 919ed5c357e9..8be9985feeb0 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2027,6 +2027,13 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 			ret = -EINVAL;
 			goto restore;
 		}
+		if (fs_info->sectorsize < PAGE_SIZE) {
+			btrfs_warn(fs_info,
+	"read-write mount is not yet allowed for sector size %u page size %lu",
+				   fs_info->sectorsize, PAGE_SIZE);
+			ret = -EINVAL;
+			goto restore;
+		}
 
 		/*
 		 * NOTE: when remounting with a change that does writes, don't
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH v5 01/18] btrfs: merge PAGE_CLEAR_DIRTY and PAGE_SET_WRITEBACK to PAGE_START_WRITEBACK
  2021-01-26  8:33 ` [PATCH v5 01/18] btrfs: merge PAGE_CLEAR_DIRTY and PAGE_SET_WRITEBACK to PAGE_START_WRITEBACK Qu Wenruo
@ 2021-01-27 15:56   ` Josef Bacik
  0 siblings, 0 replies; 52+ messages in thread
From: Josef Bacik @ 2021-01-27 15:56 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: David Sterba

On 1/26/21 3:33 AM, Qu Wenruo wrote:
> PAGE_CLEAR_DIRTY and PAGE_SET_WRITEBACK are two defines used in
> __process_pages_contig(), to let the function know to clear page dirty
> bit and then set page writeback.
> 
> However page writeback and dirty bits are conflicting (at least for
> sector size == PAGE_SIZE case), this means these two have to be always
> updated together.
> 
> This means we can merge PAGE_CLEAR_DIRTY and PAGE_SET_WRITEBACK to
> PAGE_START_WRITEBACK.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Reviewed-by: David Sterba <dsterba@suse.com>
> Signed-off-by: David Sterba <dsterba@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v5 02/18] btrfs: set UNMAPPED bit early in btrfs_clone_extent_buffer() for subpage support
  2021-01-26  8:33 ` [PATCH v5 02/18] btrfs: set UNMAPPED bit early in btrfs_clone_extent_buffer() for subpage support Qu Wenruo
@ 2021-01-27 15:56   ` Josef Bacik
  0 siblings, 0 replies; 52+ messages in thread
From: Josef Bacik @ 2021-01-27 15:56 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 1/26/21 3:33 AM, Qu Wenruo wrote:
> For the incoming subpage support, UNMAPPED extent buffer will have
> different behavior in btrfs_release_extent_buffer().
> 
> This means we need to set UNMAPPED bit early before calling
> btrfs_release_extent_buffer().
> 
> Currently there is only one caller which relies on
> btrfs_release_extent_buffer() in its error path while set UNMAPPED bit
> late:
> - btrfs_clone_extent_buffer()
> 
> Make it subpage compatible by setting the UNMAPPED bit early, since
> we're here, also move the UPTODATE bit early.
> 
> There is another caller, __alloc_dummy_extent_buffer(), setting UNAMPPED
> bit late, but that function clean up the allocated page manually, thus
> no need for any modification.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v5 04/18] btrfs: make attach_extent_buffer_page() handle subpage case
  2021-01-26  8:33 ` [PATCH v5 04/18] btrfs: make attach_extent_buffer_page() handle subpage case Qu Wenruo
@ 2021-01-27 16:01   ` Josef Bacik
  0 siblings, 0 replies; 52+ messages in thread
From: Josef Bacik @ 2021-01-27 16:01 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: David Sterba

On 1/26/21 3:33 AM, Qu Wenruo wrote:
> For subpage case, we need to allocate additional memory for each
> metadata page.
> 
> So we need to:
> 
> - Allow attach_extent_buffer_page() to return int to indicate allocation
>    failure
> 
> - Allow manually pre-allocate subpage memory for alloc_extent_buffer()
>    As we don't want to use GFP_ATOMIC under spinlock, we introduce
>    btrfs_alloc_subpage() and btrfs_free_subpage() functions for this
>    purpose.
>    (The simple wrap for btrfs_free_subpage() is for later convert to
>     kmem_cache. Already internally tested without problem)
> 
> - Preallocate btrfs_subpage structure for alloc_extent_buffer()
>    We don't want to call memory allocation with spinlock held, so
>    do preallocation before we acquire mapping->private_lock.
> 
> - Handle subpage and regular case differently in
>    attach_extent_buffer_page()
>    For regular case, no change, just do the usual thing.
>    For subpage case, allocate new memory or use the preallocated memory.
> 
> For future subpage metadata, we will make use of radix tree to grab
> extent buffer.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Reviewed-by: David Sterba <dsterba@suse.com>
> Signed-off-by: David Sterba <dsterba@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v5 00/18] btrfs: add read-only support for subpage sector size
  2021-01-26  8:33 [PATCH v5 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (17 preceding siblings ...)
  2021-01-26  8:34 ` [PATCH v5 18/18] btrfs: allow RO mount of 4K sector size fs on 64K page system Qu Wenruo
@ 2021-01-27 16:17 ` Josef Bacik
  2021-01-28  0:30   ` Qu Wenruo
  2021-02-01 15:55 ` David Sterba
                   ` (2 subsequent siblings)
  21 siblings, 1 reply; 52+ messages in thread
From: Josef Bacik @ 2021-01-27 16:17 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 1/26/21 3:33 AM, Qu Wenruo wrote:
> Patches can be fetched from github:
> https://github.com/adam900710/linux/tree/subpage
> Currently the branch also contains partial RW data support (still some
> ordered extent and data csum mismatch problems)
> 
> Great thanks to David/Nikolay/Josef for their effort reviewing and
> merging the preparation patches into misc-next.
> 
> === What works ===
> Just from the patchset:
> - Data read
>    Both regular and compressed data, with csum check.
> 
> - Metadata read
> 
> This means, with these patchset, 64K page systems can at least mount
> btrfs with 4K sector size read-only.
> This should provide the ability to migrate data at least.
> 
> While on the github branch, there are already experimental RW supports,
> there are still ordered extent related bugs for me to fix.
> Thus only the RO part is sent for review and testing.
> 
> === Patchset structure ===
> Patch 01~02:	Preparation patches which don't have functional change
> Patch 03~12:	Subpage metadata allocation and freeing
> Patch 13~15:	Subpage metadata read path
> Patch 16~17:	Subpage data read path
> Patch 18:	Enable subpage RO support
> 
> === Changelog ===
> v1:
> - Separate the main implementation from previous huge patchset
>    Huge patchset doesn't make much sense.
> 
> - Use bitmap implementation
>    Now page::private will be a pointer to btrfs_subpage structure, which
>    contains bitmaps for various page status.
> 
> v2:
> - Use page::private as btrfs_subpage for extra info
>    This replace old extent io tree based solution, which reduces latency
>    and don't require memory allocation for its operations.
> 
> - Cherry-pick new preparation patches from RW development
>    Those new preparation patches improves the readability by their own.
> 
> v3:
> - Make dummy extent buffer to follow the same subpage accessors
>    Fsstress exposed several ASSERT() for dummy extent buffers.
>    It turns out we need to make dummy extent buffer to own the same
>    btrfs_subpage structure to make eb accessors to work properly
> 
> - Two new small __process_pages_contig() related preparation patches
>    One to make __process_pages_contig() to enhance the error handling
>    path for locked_page, one to merge one macro.
> 
> - Extent buffers refs count update
>    Except try_release_extent_buffer(), all other eb uses will try to
>    increase the ref count of the eb.
>    For try_release_extent_buffer(), the eb refs check will happen inside
>    the rcu critical section to avoid eb being freed.
> 
> - Comment updates
>    Addressing the comments from the mail list.
> 
> v4:
> - Get rid of btrfs_subpage::tree_block_bitmap
>    This is to reduce lock complexity (no need to bother extra subpage
>    lock for metadata, all locks are existing locks)
>    Now eb looking up mostly depends on radix tree, with small help from
>    btrfs_subpage::under_alloc.
>    Now I haven't experieneced metadata related problems any more during
>    my local fsstress tests.
> 
> - Fix a race where metadata page dirty bit can race
>    Fixed in the metadata RW patchset though.
> 
> - Rebased to latest misc-next branch
>    With 4 patches removed, as they are already in misc-next.
> 
> v5:
> - Use the updated version from David as base
>    Most comment/commit message update should be kept as is.
> 
> - A new separate patch to move UNMAPPED bit set timing
> 
> - New comment on why we need to prealloc subpage inside a loop
>    Mostly for further 16K page size support, where we can have
>    eb across multiple pages.
> 
> - Remove one patch which is too RW specific
>    Since it introduces functional change which only makes sense for RW
>    support, it's not a good idea to include it in RO support.
> 
> - Error handling fixes
>    Great thanks to Josef.
> 
> - Refactor btrfs_subpage allocation/freeing
>    Now we have btrfs_alloc_subpage() and btrfs_free_subpage() helpers to
>    do all the allocation/freeing.
>    It's pretty easy to convert to kmem_cache using above helpers.
>    (already internally tested using kmem_cache without problem, in fact
>     it's all the problems found in kmem_cache test leads to the new
>     interface)
> 
> - Use btrfs_subpage::eb_refs to replace old under_alloc
>    This makes checking whether the page has any eb left much easier.
> 
> Qu Wenruo (18):
>    btrfs: merge PAGE_CLEAR_DIRTY and PAGE_SET_WRITEBACK to
>      PAGE_START_WRITEBACK
>    btrfs: set UNMAPPED bit early in btrfs_clone_extent_buffer() for
>      subpage support
>    btrfs: introduce the skeleton of btrfs_subpage structure
>    btrfs: make attach_extent_buffer_page() handle subpage case
>    btrfs: make grab_extent_buffer_from_page() handle subpage case
>    btrfs: support subpage for extent buffer page release

I don't have this patch in my inbox so I can't reply to it directly, but you 
include refcount.h and then use normal atomics.  Please use the actual 
refcount_t, as it gets us all the debugging stuff that makes finding problems 
much easier.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v5 05/18] btrfs: make grab_extent_buffer_from_page() handle subpage case
  2021-01-26  8:33 ` [PATCH v5 05/18] btrfs: make grab_extent_buffer_from_page() " Qu Wenruo
@ 2021-01-27 16:20   ` Josef Bacik
  0 siblings, 0 replies; 52+ messages in thread
From: Josef Bacik @ 2021-01-27 16:20 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: David Sterba

On 1/26/21 3:33 AM, Qu Wenruo wrote:
> For subpage case, grab_extent_buffer() can't really get an extent buffer
> just from btrfs_subpage.
> 
> We have radix tree lock protecting us from inserting the same eb into
> the tree.  Thus we don't really need to do the extra hassle, just let
> alloc_extent_buffer() handle the existing eb in radix tree.
> 
> Now if two ebs are being allocated as the same time, one will fail with
> -EEIXST when inserting into the radix tree.
> 
> So for grab_extent_buffer(), just always return NULL for subpage case.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Reviewed-by: David Sterba <dsterba@suse.com>
> Signed-off-by: David Sterba <dsterba@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v5 06/18] btrfs: support subpage for extent buffer page release
  2021-01-26  8:33 ` [PATCH v5 06/18] btrfs: support subpage for extent buffer page release Qu Wenruo
@ 2021-01-27 16:21   ` Josef Bacik
  2021-02-01 15:32     ` David Sterba
  0 siblings, 1 reply; 52+ messages in thread
From: Josef Bacik @ 2021-01-27 16:21 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: David Sterba

On 1/26/21 3:33 AM, Qu Wenruo wrote:
> In btrfs_release_extent_buffer_pages(), we need to add extra handling
> for subpage.
> 
> Introduce a helper, detach_extent_buffer_page(), to do different
> handling for regular and subpage cases.
> 
> For subpage case, handle detaching page private.
> 
> For unmapped (dummy or cloned) ebs, we can detach the page private
> immediately as the page can only be attached to one unmapped eb.
> 
> For mapped ebs, we have to ensure there are no eb in the page range
> before we delete it, as page->private is shared between all ebs in the
> same page.
> 
> But there is a subpage specific race, where we can race with extent
> buffer allocation, and clear the page private while new eb is still
> being utilized, like this:
> 
>    Extent buffer A is the new extent buffer which will be allocated,
>    while extent buffer B is the last existing extent buffer of the page.
> 
>    		T1 (eb A) 	 |		T2 (eb B)
>    -------------------------------+------------------------------
>    alloc_extent_buffer()		 | btrfs_release_extent_buffer_pages()
>    |- p = find_or_create_page()   | |
>    |- attach_extent_buffer_page() | |
>    |				 | |- detach_extent_buffer_page()
>    |				 |    |- if (!page_range_has_eb())
>    |				 |    |  No new eb in the page range yet
>    |				 |    |  As new eb A hasn't yet been
>    |				 |    |  inserted into radix tree.
>    |				 |    |- btrfs_detach_subpage()
>    |				 |       |- detach_page_private();
>    |- radix_tree_insert()	 |
> 
>    Then we have a metadata eb whose page has no private bit.
> 
> To avoid such race, we introduce a subpage metadata-specific member,
> btrfs_subpage::eb_refs.
> 
> In alloc_extent_buffer() we increase eb_refs in the critical section of
> private_lock.  Then page_range_has_eb() will return true for
> detach_extent_buffer_page(), and will not detach page private.
> 
> The section is marked by:
> 
> - btrfs_page_inc_eb_refs()
> - btrfs_page_dec_eb_refs()
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Reviewed-by: David Sterba <dsterba@suse.com>
> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>   fs/btrfs/extent_io.c | 94 +++++++++++++++++++++++++++++++++++++-------
>   fs/btrfs/subpage.c   | 42 ++++++++++++++++++++
>   fs/btrfs/subpage.h   | 13 +++++-
>   3 files changed, 133 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 16a29f63cfd1..118874926179 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -4993,25 +4993,39 @@ int extent_buffer_under_io(const struct extent_buffer *eb)
>   		test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
>   }
>   
> -/*
> - * Release all pages attached to the extent buffer.
> - */
> -static void btrfs_release_extent_buffer_pages(struct extent_buffer *eb)
> +static bool page_range_has_eb(struct btrfs_fs_info *fs_info, struct page *page)
>   {
> -	int i;
> -	int num_pages;
> -	int mapped = !test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags);
> +	struct btrfs_subpage *subpage;
>   
> -	BUG_ON(extent_buffer_under_io(eb));
> +	lockdep_assert_held(&page->mapping->private_lock);
>   
> -	num_pages = num_extent_pages(eb);
> -	for (i = 0; i < num_pages; i++) {
> -		struct page *page = eb->pages[i];
> +	if (PagePrivate(page)) {
> +		subpage = (struct btrfs_subpage *)page->private;
> +		if (atomic_read(&subpage->eb_refs))
> +			return true;
> +	}
> +	return false;
> +}
>   
> -		if (!page)
> -			continue;
> +static void detach_extent_buffer_page(struct extent_buffer *eb, struct page *page)
> +{
> +	struct btrfs_fs_info *fs_info = eb->fs_info;
> +	const bool mapped = !test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags);
> +
> +	/*
> +	 * For mapped eb, we're going to change the page private, which should
> +	 * be done under the private_lock.
> +	 */
> +	if (mapped)
> +		spin_lock(&page->mapping->private_lock);
> +
> +	if (!PagePrivate(page)) {
>   		if (mapped)
> -			spin_lock(&page->mapping->private_lock);
> +			spin_unlock(&page->mapping->private_lock);
> +		return;
> +	}
> +
> +	if (fs_info->sectorsize == PAGE_SIZE) {
>   		/*
>   		 * We do this since we'll remove the pages after we've
>   		 * removed the eb from the radix tree, so we could race
> @@ -5030,9 +5044,49 @@ static void btrfs_release_extent_buffer_pages(struct extent_buffer *eb)
>   			 */
>   			detach_page_private(page);
>   		}
> -
>   		if (mapped)
>   			spin_unlock(&page->mapping->private_lock);
> +		return;
> +	}
> +
> +	/*
> +	 * For subpage, we can have dummy eb with page private.  In this case,
> +	 * we can directly detach the private as such page is only attached to
> +	 * one dummy eb, no sharing.
> +	 */
> +	if (!mapped) {
> +		btrfs_detach_subpage(fs_info, page);
> +		return;
> +	}
> +
> +	btrfs_page_dec_eb_refs(fs_info, page);
> +
> +	/*
> +	 * We can only detach the page private if there are no other ebs in the
> +	 * page range.
> +	 */
> +	if (!page_range_has_eb(fs_info, page))
> +		btrfs_detach_subpage(fs_info, page);
> +
> +	spin_unlock(&page->mapping->private_lock);
> +}
> +
> +/* Release all pages attached to the extent buffer */
> +static void btrfs_release_extent_buffer_pages(struct extent_buffer *eb)
> +{
> +	int i;
> +	int num_pages;
> +
> +	ASSERT(!extent_buffer_under_io(eb));
> +
> +	num_pages = num_extent_pages(eb);
> +	for (i = 0; i < num_pages; i++) {
> +		struct page *page = eb->pages[i];
> +
> +		if (!page)
> +			continue;
> +
> +		detach_extent_buffer_page(eb, page);
>   
>   		/* One for when we allocated the page */
>   		put_page(page);
> @@ -5392,6 +5446,16 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
>   		/* Should not fail, as we have preallocated the memory */
>   		ret = attach_extent_buffer_page(eb, p, prealloc);
>   		ASSERT(!ret);
> +		/*
> +		 * To inform we have extra eb under allocation, so that
> +		 * detach_extent_buffer_page() won't release the page private
> +		 * when the eb hasn't yet been inserted into radix tree.
> +		 *
> +		 * The ref will be decreased when the eb released the page, in
> +		 * detach_extent_buffer_page().
> +		 * Thus needs no special handling in error path.
> +		 */
> +		btrfs_page_inc_eb_refs(fs_info, p);
>   		spin_unlock(&mapping->private_lock);
>   
>   		WARN_ON(PageDirty(p));
> diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
> index 61b28dfca20c..a2a21fa0ea35 100644
> --- a/fs/btrfs/subpage.c
> +++ b/fs/btrfs/subpage.c
> @@ -52,6 +52,8 @@ int btrfs_alloc_subpage(const struct btrfs_fs_info *fs_info,
>   	if (!*ret)
>   		return -ENOMEM;
>   	spin_lock_init(&(*ret)->lock);
> +	if (type == BTRFS_SUBPAGE_METADATA)
> +		atomic_set(&(*ret)->eb_refs, 0);
>   	return 0;
>   }
>   
> @@ -59,3 +61,43 @@ void btrfs_free_subpage(struct btrfs_subpage *subpage)
>   {
>   	kfree(subpage);
>   }
> +
> +/*
> + * Increase the eb_refs of current subpage.
> + *
> + * This is important for eb allocation, to prevent race with last eb freeing
> + * of the same page.
> + * With the eb_refs increased before the eb inserted into radix tree,
> + * detach_extent_buffer_page() won't detach the page private while we're still
> + * allocating the extent buffer.
> + */
> +void btrfs_page_inc_eb_refs(const struct btrfs_fs_info *fs_info,
> +			    struct page *page)
> +{
> +	struct btrfs_subpage *subpage;
> +
> +	if (fs_info->sectorsize == PAGE_SIZE)
> +		return;
> +
> +	ASSERT(PagePrivate(page) && page->mapping);
> +	lockdep_assert_held(&page->mapping->private_lock);
> +
> +	subpage = (struct btrfs_subpage *)page->private;
> +	atomic_inc(&subpage->eb_refs);
> +}
> +
> +void btrfs_page_dec_eb_refs(const struct btrfs_fs_info *fs_info,
> +			    struct page *page)
> +{
> +	struct btrfs_subpage *subpage;
> +
> +	if (fs_info->sectorsize == PAGE_SIZE)
> +		return;
> +
> +	ASSERT(PagePrivate(page) && page->mapping);
> +	lockdep_assert_held(&page->mapping->private_lock);
> +
> +	subpage = (struct btrfs_subpage *)page->private;
> +	ASSERT(atomic_read(&subpage->eb_refs));
> +	atomic_dec(&subpage->eb_refs);
> +}
> diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
> index 7ba544bcc9c6..eef2ecae77e0 100644
> --- a/fs/btrfs/subpage.h
> +++ b/fs/btrfs/subpage.h
> @@ -4,6 +4,7 @@
>   #define BTRFS_SUBPAGE_H
>   
>   #include <linux/spinlock.h>
> +#include <linux/refcount.h>

I made this comment elsewhere, but the patch finally showed up in my email after 
I refreshed (thunderbird, wtf??).  Anyway, you include refcount.h here, but 
don't actually use refcount_t.  Please use refcount_t, so we get the benefit of 
the debugging from the helpers.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v5 07/18] btrfs: attach private to dummy extent buffer pages
  2021-01-26  8:33 ` [PATCH v5 07/18] btrfs: attach private to dummy extent buffer pages Qu Wenruo
@ 2021-01-27 16:21   ` Josef Bacik
  0 siblings, 0 replies; 52+ messages in thread
From: Josef Bacik @ 2021-01-27 16:21 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: David Sterba

On 1/26/21 3:33 AM, Qu Wenruo wrote:
> There are locations where we allocate dummy extent buffers for temporary
> usage, like in tree_mod_log_rewind() or get_old_root().
> 
> These dummy extent buffers will be handled by the same eb accessors, and
> if they don't have page::private, the subpage eb accessors could fail.
> 
> To address such problems, make __alloc_dummy_extent_buffer() attach
> page private for dummy extent buffers too.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Reviewed-by: David Sterba <dsterba@suse.com>
> Signed-off-by: David Sterba <dsterba@suse.com>

Yup I'm slow,

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v5 08/18] btrfs: introduce helpers for subpage uptodate status
  2021-01-26  8:33 ` [PATCH v5 08/18] btrfs: introduce helpers for subpage uptodate status Qu Wenruo
@ 2021-01-27 16:34   ` Josef Bacik
  0 siblings, 0 replies; 52+ messages in thread
From: Josef Bacik @ 2021-01-27 16:34 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: David Sterba

On 1/26/21 3:33 AM, Qu Wenruo wrote:
> Introduce the following functions to handle subpage uptodate status:
> 
> - btrfs_subpage_set_uptodate()
> - btrfs_subpage_clear_uptodate()
> - btrfs_subpage_test_uptodate()
>    These helpers can only be called when the page has subpage attached
>    and the range is ensured to be inside the page.
> 
> - btrfs_page_set_uptodate()
> - btrfs_page_clear_uptodate()
> - btrfs_page_test_uptodate()
>    These helpers can handle both regular sector size and subpage.
>    Although caller should still ensure that the range is inside the page.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Reviewed-by: David Sterba <dsterba@suse.com>
> Signed-off-by: David Sterba <dsterba@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 52+ messages in thread
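
A rough sketch of the dispatch described in the commit message above: the
btrfs_page_*() variant falls back to the page flag for the regular case and
defers to the subpage bitmap otherwise.  The exact signature is assumed from
this series, not quoted from the patch:

	void btrfs_page_set_uptodate(const struct btrfs_fs_info *fs_info,
				     struct page *page, u64 start, u32 len)
	{
		if (fs_info->sectorsize == PAGE_SIZE) {
			/* Regular case: sector size matches page size, use the page flag */
			SetPageUptodate(page);
			return;
		}
		/* Subpage case: track only the given range in the uptodate bitmap */
		btrfs_subpage_set_uptodate(fs_info, page, start, len);
	}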

* Re: [PATCH v5 09/18] btrfs: introduce helpers for subpage error status
  2021-01-26  8:33 ` [PATCH v5 09/18] btrfs: introduce helpers for subpage error status Qu Wenruo
@ 2021-01-27 16:34   ` Josef Bacik
  0 siblings, 0 replies; 52+ messages in thread
From: Josef Bacik @ 2021-01-27 16:34 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: David Sterba

On 1/26/21 3:33 AM, Qu Wenruo wrote:
> Introduce the following functions to handle subpage error status:
> 
> - btrfs_subpage_set_error()
> - btrfs_subpage_clear_error()
> - btrfs_subpage_test_error()
>    These helpers can only be called when the page has subpage attached
>    and the range is ensured to be inside the page.
> 
> - btrfs_page_set_error()
> - btrfs_page_clear_error()
> - btrfs_page_test_error()
>    These helpers can handle both regular sector size and subpage without
>    problem.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Reviewed-by: David Sterba <dsterba@suse.com>
> Signed-off-by: David Sterba <dsterba@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v5 10/18] btrfs: support subpage in set/clear_extent_buffer_uptodate()
  2021-01-26  8:33 ` [PATCH v5 10/18] btrfs: support subpage in set/clear_extent_buffer_uptodate() Qu Wenruo
@ 2021-01-27 16:35   ` Josef Bacik
  0 siblings, 0 replies; 52+ messages in thread
From: Josef Bacik @ 2021-01-27 16:35 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: David Sterba

On 1/26/21 3:33 AM, Qu Wenruo wrote:
> To support subpage in set_extent_buffer_uptodate and
> clear_extent_buffer_uptodate we only need to use the subpage-aware
> helpers to update the page bits.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Reviewed-by: David Sterba <dsterba@suse.com>
> Signed-off-by: David Sterba <dsterba@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v5 11/18] btrfs: support subpage in btrfs_clone_extent_buffer
  2021-01-26  8:33 ` [PATCH v5 11/18] btrfs: support subpage in btrfs_clone_extent_buffer Qu Wenruo
@ 2021-01-27 16:35   ` Josef Bacik
  0 siblings, 0 replies; 52+ messages in thread
From: Josef Bacik @ 2021-01-27 16:35 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: David Sterba

On 1/26/21 3:33 AM, Qu Wenruo wrote:
> For btrfs_clone_extent_buffer(), it's mostly the same code as
> __alloc_dummy_extent_buffer(), except it has an extra page copy.
> 
> So to make it subpage compatible, we only need to:
> 
> - Call set_extent_buffer_uptodate() instead of SetPageUptodate()
>    This will set correct uptodate bit for subpage and regular sector size
>    cases.
> 
> Since we're calling set_extent_buffer_uptodate() which will also set
> EXTENT_BUFFER_UPTODATE bit, we don't need to manually set that bit
> either.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Reviewed-by: David Sterba <dsterba@suse.com>
> Signed-off-by: David Sterba <dsterba@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v5 12/18] btrfs: support subpage in try_release_extent_buffer()
  2021-01-26  8:33 ` [PATCH v5 12/18] btrfs: support subpage in try_release_extent_buffer() Qu Wenruo
@ 2021-01-27 16:37   ` Josef Bacik
  0 siblings, 0 replies; 52+ messages in thread
From: Josef Bacik @ 2021-01-27 16:37 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: David Sterba

On 1/26/21 3:33 AM, Qu Wenruo wrote:
> Unlike the original try_release_extent_buffer(),
> try_release_subpage_extent_buffer() will iterate through all the ebs in
> the page, and try to release each.
> 
> We can release the full page only after there's no private attached,
> which means all ebs of that page have been released as well.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Reviewed-by: David Sterba <dsterba@suse.com>
> Signed-off-by: David Sterba <dsterba@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v5 13/18] btrfs: introduce read_extent_buffer_subpage()
  2021-01-26  8:33 ` [PATCH v5 13/18] btrfs: introduce read_extent_buffer_subpage() Qu Wenruo
@ 2021-01-27 16:39   ` Josef Bacik
  0 siblings, 0 replies; 52+ messages in thread
From: Josef Bacik @ 2021-01-27 16:39 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: David Sterba

On 1/26/21 3:33 AM, Qu Wenruo wrote:
> Introduce a helper, read_extent_buffer_subpage(), to do the subpage
> extent buffer read.
> 
> The difference between regular and subpage routines are:
> 
> - No page locking
>    Here we completely rely on extent locking.
>    Page locking can reduce the concurrency greatly, as if we lock one
>    page to read one extent buffer, all the other extent buffers in the
>    same page will have to wait.
> 
> - Extent uptodate condition
>    Despite the existing PageUptodate() and EXTENT_BUFFER_UPTODATE check,
>    We also need to check btrfs_subpage::uptodate_bitmap.
> 
> - No page iteration
>    Just one page, no need to loop; this greatly simplifies the subpage
>    routine.
> 
> This patch only implements the bio submit part, no endio support yet.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Reviewed-by: David Sterba <dsterba@suse.com>
> Signed-off-by: David Sterba <dsterba@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 52+ messages in thread
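
As a sketch of the extra uptodate condition mentioned above (the helper name
is taken from patch 08 of this series; the exact check in the patch may
differ), the subpage read can only be skipped when both the eb flag and the
subpage range agree:

	/* Skip the read only if the eb and the subpage range are both uptodate */
	if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags) &&
	    btrfs_page_test_uptodate(fs_info, page, eb->start, eb->len))
		return 0;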

* Re: [PATCH v5 14/18] btrfs: support subpage in endio_readpage_update_page_status()
  2021-01-26  8:33 ` [PATCH v5 14/18] btrfs: support subpage in endio_readpage_update_page_status() Qu Wenruo
@ 2021-01-27 16:42   ` Josef Bacik
  0 siblings, 0 replies; 52+ messages in thread
From: Josef Bacik @ 2021-01-27 16:42 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: David Sterba

On 1/26/21 3:33 AM, Qu Wenruo wrote:
> To handle subpage status update, add the following:
> 
> - Use btrfs_page_*() subpage-aware helpers to update page status
>    Now we can handle both cases well.
> 
> - No page unlock for subpage metadata
>    Since subpage metadata doesn't utilize page locking at all, skip it.
>    For subpage data locking, it's handled in later commits.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Reviewed-by: David Sterba <dsterba@suse.com>
> Signed-off-by: David Sterba <dsterba@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v5 15/18] btrfs: introduce subpage metadata validation check
  2021-01-26  8:33 ` [PATCH v5 15/18] btrfs: introduce subpage metadata validation check Qu Wenruo
@ 2021-01-27 16:47   ` Josef Bacik
  0 siblings, 0 replies; 52+ messages in thread
From: Josef Bacik @ 2021-01-27 16:47 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: David Sterba

On 1/26/21 3:33 AM, Qu Wenruo wrote:
> For subpage metadata validation check, there are some differences:
> 
> - Read must finish in one bvec
>    Since we're just reading one subpage range in one page, it should
>    never be split into two bios nor two bvecs.
> 
> - How to grab the existing eb
>    Instead of grabbing the eb using page->private, we have to search the
>    radix tree, as we don't have any direct pointer at hand.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Reviewed-by: David Sterba <dsterba@suse.com>
> Signed-off-by: David Sterba <dsterba@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 52+ messages in thread
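
As an illustration of the "grab the existing eb" point above (a sketch only;
the patch itself may use a different lookup helper or error code), the endio
path has to go through the radix tree since page->private no longer points at
a single eb:

	/* Look the eb up by bytenr instead of dereferencing page->private */
	eb = find_extent_buffer(fs_info, start);
	if (!eb) {
		/* The eb was already released while the read was in flight */
		return -EUCLEAN;
	}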

* Re: [PATCH v5 16/18] btrfs: introduce btrfs_subpage for data inodes
  2021-01-26  8:34 ` [PATCH v5 16/18] btrfs: introduce btrfs_subpage for data inodes Qu Wenruo
@ 2021-01-27 16:56   ` Josef Bacik
  2021-02-01 15:42     ` David Sterba
  0 siblings, 1 reply; 52+ messages in thread
From: Josef Bacik @ 2021-01-27 16:56 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: David Sterba

On 1/26/21 3:34 AM, Qu Wenruo wrote:
> To support subpage sector size, data also needs extra info to make sure
> which sectors in a page are uptodate/dirty/...
> 
> This patch makes pages of data inodes get the btrfs_subpage
> structure attached, and detaches it when the page is freed.
> 
> This patch also slightly changes the timing of when
> set_page_extent_mapped() is called, to make sure:
> 
> - We have page->mapping set
>    page->mapping->host is used to grab btrfs_fs_info, thus we can only
>    call this function after page is mapped to an inode.
> 
>    One call site attaches pages to inode manually, thus we have to modify
>    the timing of set_page_extent_mapped() a little.
> 
> - As soon as possible, before other operations
>    Since memory allocation can fail, we have to do extra error handling.
>    Calling set_page_extent_mapped() as soon as possible can simplify the
>    error handling for several call sites.
> 
> The idea is pretty much the same as iomap_page, but with more bitmaps
> for btrfs specific cases.
> 
> Currently the plan is to switch to iomap if iomap can provide sector
> aligned writeback (only write back dirty sectors, not the full
> page; data balance requires this feature).
> 
> So we will stick to btrfs specific bitmap for now.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>   fs/btrfs/compression.c      | 10 ++++++--
>   fs/btrfs/extent_io.c        | 46 +++++++++++++++++++++++++++++++++----
>   fs/btrfs/extent_io.h        |  3 ++-
>   fs/btrfs/file.c             | 24 ++++++++-----------
>   fs/btrfs/free-space-cache.c | 15 +++++++++---
>   fs/btrfs/inode.c            | 14 +++++++----
>   fs/btrfs/ioctl.c            |  8 ++++++-
>   fs/btrfs/reflink.c          |  5 +++-
>   fs/btrfs/relocation.c       | 11 +++++++--
>   9 files changed, 103 insertions(+), 33 deletions(-)
> 
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index 5ae3fa0386b7..6d203acfdeb3 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -542,13 +542,19 @@ static noinline int add_ra_bio_pages(struct inode *inode,
>   			goto next;
>   		}
>   
> -		end = last_offset + PAGE_SIZE - 1;
>   		/*
>   		 * at this point, we have a locked page in the page cache
>   		 * for these bytes in the file.  But, we have to make
>   		 * sure they map to this compressed extent on disk.
>   		 */
> -		set_page_extent_mapped(page);
> +		ret = set_page_extent_mapped(page);
> +		if (ret < 0) {
> +			unlock_page(page);
> +			put_page(page);
> +			break;
> +		}
> +
> +		end = last_offset + PAGE_SIZE - 1;
>   		lock_extent(tree, last_offset, end);
>   		read_lock(&em_tree->lock);
>   		em = lookup_extent_mapping(em_tree, last_offset,
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 139a8a77ed72..eeee3213daaa 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -3190,10 +3190,39 @@ static int attach_extent_buffer_page(struct extent_buffer *eb,
>   	return ret;
>   }
>   
> -void set_page_extent_mapped(struct page *page)
> +int set_page_extent_mapped(struct page *page)
>   {
> +	struct btrfs_fs_info *fs_info;
> +
> +	ASSERT(page->mapping);
> +
> +	if (PagePrivate(page))
> +		return 0;
> +
> +	fs_info = btrfs_sb(page->mapping->host->i_sb);
> +
> +	if (fs_info->sectorsize < PAGE_SIZE)
> +		return btrfs_attach_subpage(fs_info, page, BTRFS_SUBPAGE_DATA);
> +
> +	attach_page_private(page, (void *)EXTENT_PAGE_PRIVATE);
> +	return 0;
> +
> +}
> +
> +void clear_page_extent_mapped(struct page *page)
> +{
> +	struct btrfs_fs_info *fs_info;
> +
> +	ASSERT(page->mapping);
> +
>   	if (!PagePrivate(page))
> -		attach_page_private(page, (void *)EXTENT_PAGE_PRIVATE);
> +		return;
> +
> +	fs_info = btrfs_sb(page->mapping->host->i_sb);
> +	if (fs_info->sectorsize < PAGE_SIZE)
> +		return btrfs_detach_subpage(fs_info, page);
> +
> +	detach_page_private(page);
>   }
>   
>   static struct extent_map *
> @@ -3250,7 +3279,12 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
>   	unsigned long this_bio_flag = 0;
>   	struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
>   
> -	set_page_extent_mapped(page);
> +	ret = set_page_extent_mapped(page);
> +	if (ret < 0) {
> +		unlock_extent(tree, start, end);
> +		SetPageError(page);
> +		goto out;
> +	}
>   
>   	if (!PageUptodate(page)) {
>   		if (cleancache_get_page(page) == 0) {
> @@ -3690,7 +3724,11 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
>   		flush_dcache_page(page);
>   	}
>   
> -	set_page_extent_mapped(page);
> +	ret = set_page_extent_mapped(page);
> +	if (ret < 0) {
> +		SetPageError(page);
> +		goto done;
> +	}
>   
>   	if (!epd->extent_locked) {
>   		ret = writepage_delalloc(BTRFS_I(inode), page, wbc, start,
> diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
> index 2d8187c84812..047b3e66897f 100644
> --- a/fs/btrfs/extent_io.h
> +++ b/fs/btrfs/extent_io.h
> @@ -178,7 +178,8 @@ int btree_write_cache_pages(struct address_space *mapping,
>   void extent_readahead(struct readahead_control *rac);
>   int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
>   		  u64 start, u64 len);
> -void set_page_extent_mapped(struct page *page);
> +int set_page_extent_mapped(struct page *page);
> +void clear_page_extent_mapped(struct page *page);
>   
>   struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
>   					  u64 start, u64 owner_root, int level);
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index d81ae1f518f2..63b290210eaa 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1369,6 +1369,12 @@ static noinline int prepare_pages(struct inode *inode, struct page **pages,
>   			goto fail;
>   		}
>   
> +		err = set_page_extent_mapped(pages[i]);
> +		if (err < 0) {
> +			faili = i;
> +			goto fail;
> +		}
> +
>   		if (i == 0)
>   			err = prepare_uptodate_page(inode, pages[i], pos,
>   						    force_uptodate);
> @@ -1453,23 +1459,11 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
>   	}
>   
>   	/*
> -	 * It's possible the pages are dirty right now, but we don't want
> -	 * to clean them yet because copy_from_user may catch a page fault
> -	 * and we might have to fall back to one page at a time.  If that
> -	 * happens, we'll unlock these pages and we'd have a window where
> -	 * reclaim could sneak in and drop the once-dirty page on the floor
> -	 * without writing it.
> -	 *
> -	 * We have the pages locked and the extent range locked, so there's
> -	 * no way someone can start IO on any dirty pages in this range.
> -	 *
> -	 * We'll call btrfs_dirty_pages() later on, and that will flip around
> -	 * delalloc bits and dirty the pages as required.
> +	 * We should be called after prepare_pages() which should have
> +	 * locked all pages in the range.
>   	 */
> -	for (i = 0; i < num_pages; i++) {
> -		set_page_extent_mapped(pages[i]);
> +	for (i = 0; i < num_pages; i++)
>   		WARN_ON(!PageLocked(pages[i]));
> -	}
>   
>   	return ret;
>   }
> diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
> index fd6ddd6b8165..35e5c6ee0584 100644
> --- a/fs/btrfs/free-space-cache.c
> +++ b/fs/btrfs/free-space-cache.c
> @@ -431,11 +431,22 @@ static int io_ctl_prepare_pages(struct btrfs_io_ctl *io_ctl, bool uptodate)
>   	int i;
>   
>   	for (i = 0; i < io_ctl->num_pages; i++) {
> +		int ret;
> +
>   		page = find_or_create_page(inode->i_mapping, i, mask);
>   		if (!page) {
>   			io_ctl_drop_pages(io_ctl);
>   			return -ENOMEM;
>   		}
> +
> +		ret = set_page_extent_mapped(page);
> +		if (ret < 0) {
> +			unlock_page(page);
> +			put_page(page);
> +			io_ctl_drop_pages(io_ctl);
> +			return ret;
> +		}
> +
>   		io_ctl->pages[i] = page;
>   		if (uptodate && !PageUptodate(page)) {
>   			btrfs_readpage(NULL, page);
> @@ -455,10 +466,8 @@ static int io_ctl_prepare_pages(struct btrfs_io_ctl *io_ctl, bool uptodate)
>   		}
>   	}
>   
> -	for (i = 0; i < io_ctl->num_pages; i++) {
> +	for (i = 0; i < io_ctl->num_pages; i++)
>   		clear_page_dirty_for_io(io_ctl->pages[i]);
> -		set_page_extent_mapped(io_ctl->pages[i]);
> -	}
>   
>   	return 0;
>   }
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index d1bb3cc8499b..a18e3c950f07 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -4712,6 +4712,9 @@ int btrfs_truncate_block(struct btrfs_inode *inode, loff_t from, loff_t len,
>   		ret = -ENOMEM;
>   		goto out;
>   	}
> +	ret = set_page_extent_mapped(page);
> +	if (ret < 0)
> +		goto out_unlock;
>   
>   	if (!PageUptodate(page)) {
>   		ret = btrfs_readpage(NULL, page);
> @@ -4729,7 +4732,6 @@ int btrfs_truncate_block(struct btrfs_inode *inode, loff_t from, loff_t len,
>   	wait_on_page_writeback(page);
>   
>   	lock_extent_bits(io_tree, block_start, block_end, &cached_state);
> -	set_page_extent_mapped(page);
>   
>   	ordered = btrfs_lookup_ordered_extent(inode, block_start);
>   	if (ordered) {
> @@ -8107,7 +8109,7 @@ static int __btrfs_releasepage(struct page *page, gfp_t gfp_flags)
>   {
>   	int ret = try_release_extent_mapping(page, gfp_flags);
>   	if (ret == 1)
> -		detach_page_private(page);
> +		clear_page_extent_mapped(page);
>   	return ret;
>   }
>   
> @@ -8266,7 +8268,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
>   	}
>   
>   	ClearPageChecked(page);
> -	detach_page_private(page);
> +	clear_page_extent_mapped(page);
>   }
>   
>   /*
> @@ -8345,7 +8347,11 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
>   	wait_on_page_writeback(page);
>   
>   	lock_extent_bits(io_tree, page_start, page_end, &cached_state);
> -	set_page_extent_mapped(page);
> +	ret2 = set_page_extent_mapped(page);
> +	if (ret2 < 0) {
> +		ret = vmf_error(ret2);
> +		goto out_unlock;
> +	}

Sorry I missed this bit in my last reply, you need a

ret = vmf_error(ret2);
unlock_extent_cached(io_tree, page_start, page_end, &cached_state);
goto out_unlock;

Thanks,

Josef

^ permalink raw reply	[flat|nested] 52+ messages in thread
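
Put together, the corrected error path would presumably look roughly like
this (a sketch assembled from the comment above, not the final folded fix):

	ret2 = set_page_extent_mapped(page);
	if (ret2 < 0) {
		ret = vmf_error(ret2);
		/* Drop the extent lock taken just above before bailing out */
		unlock_extent_cached(io_tree, page_start, page_end, &cached_state);
		goto out_unlock;
	}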

* Re: [PATCH v5 17/18] btrfs: integrate page status update for data read path into begin/end_page_read()
  2021-01-26  8:34 ` [PATCH v5 17/18] btrfs: integrate page status update for data read path into begin/end_page_read() Qu Wenruo
@ 2021-01-27 17:13   ` Josef Bacik
  2021-02-01 15:47     ` David Sterba
  0 siblings, 1 reply; 52+ messages in thread
From: Josef Bacik @ 2021-01-27 17:13 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: David Sterba

On 1/26/21 3:34 AM, Qu Wenruo wrote:
> In the btrfs data page read path, the page status updates are handled in
> two different locations:
> 
>    btrfs_do_read_page()
>    {
> 	while (cur <= end) {
> 		/* No need to read from disk */
> 		if (HOLE/PREALLOC/INLINE){
> 			memset();
> 			set_extent_uptodate();
> 			continue;
> 		}
> 		/* Read from disk */
> 		ret = submit_extent_page(end_bio_extent_readpage);
>    }
> 
>    end_bio_extent_readpage()
>    {
> 	endio_readpage_uptodate_page_status();
>    }
> 
> This is fine for the sectorsize == PAGE_SIZE case, as in the above loop we
> should only hit one branch and then exit.
> 
> But for subpage, there is more work to be done in the page status update:
> - Page Unlock condition
>    Unlike the regular sectorsize == PAGE_SIZE case, we can no longer just
>    unlock a page blindly.
>    Only the last reader of the page can unlock the page.
>    This means, we can unlock the page either in the while() loop, or in
>    the endio function.
> 
> - Page uptodate condition
>    Since we have multiple sectors to read for a page, we can only mark
>    the full page uptodate if all sectors are uptodate.
> 
> To handle both subpage and regular cases, introduce a pair of functions
> to help handle the page status update:
> 
> - begin_page_read()
>    For the regular case, it does nothing.
>    For the subpage case, it updates the reader counters so that later
>    end_page_read() can know who is the last one to unlock the page.
> 
> - end_page_read()
>    This is just endio_readpage_update_page_status() renamed.
>    The original name is a little too long and too specific for endio.
> 
>    The only new trick added is the condition for page unlock.
>    Now for subpage data, we unlock the page if we're the last reader.
> 
> This not only provides the basis for subpage data read, but also hides
> the special handling of page read from the main read loop.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>   fs/btrfs/extent_io.c | 38 ++++++++++++++++++++----------
>   fs/btrfs/subpage.c   | 56 ++++++++++++++++++++++++++++++++++----------
>   fs/btrfs/subpage.h   |  8 +++++++
>   3 files changed, 78 insertions(+), 24 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index eeee3213daaa..7fc2c62d4eb9 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2839,8 +2839,17 @@ static void endio_readpage_release_extent(struct processed_extent *processed,
>   	processed->uptodate = uptodate;
>   }
>   
> -static void endio_readpage_update_page_status(struct page *page, bool uptodate,
> -					      u64 start, u32 len)
> +static void begin_data_page_read(struct btrfs_fs_info *fs_info, struct page *page)
> +{
> +	ASSERT(PageLocked(page));
> +	if (fs_info->sectorsize == PAGE_SIZE)
> +		return;
> +
> +	ASSERT(PagePrivate(page));
> +	btrfs_subpage_start_reader(fs_info, page, page_offset(page), PAGE_SIZE);
> +}
> +
> +static void end_page_read(struct page *page, bool uptodate, u64 start, u32 len)
>   {
>   	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
>   
> @@ -2856,7 +2865,12 @@ static void endio_readpage_update_page_status(struct page *page, bool uptodate,
>   
>   	if (fs_info->sectorsize == PAGE_SIZE)
>   		unlock_page(page);
> -	/* Subpage locking will be handled in later patches */
> +	else if (is_data_inode(page->mapping->host))
> +		/*
> +		 * For subpage data, unlock the page if we're the last reader.
> +		 * For subpage metadata, page lock is not utilized for read.
> +		 */
> +		btrfs_subpage_end_reader(fs_info, page, start, len);
>   }
>   
>   /*
> @@ -2993,7 +3007,7 @@ static void end_bio_extent_readpage(struct bio *bio)
>   		bio_offset += len;
>   
>   		/* Update page status and unlock */
> -		endio_readpage_update_page_status(page, uptodate, start, len);
> +		end_page_read(page, uptodate, start, len);
>   		endio_readpage_release_extent(&processed, BTRFS_I(inode),
>   					      start, end, uptodate);
>   	}
> @@ -3263,6 +3277,7 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
>   		      unsigned int read_flags, u64 *prev_em_start)
>   {
>   	struct inode *inode = page->mapping->host;
> +	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>   	u64 start = page_offset(page);
>   	const u64 end = start + PAGE_SIZE - 1;
>   	u64 cur = start;
> @@ -3306,6 +3321,7 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
>   			kunmap_atomic(userpage);
>   		}
>   	}

You have two error cases above this

         ret = set_page_extent_mapped(page);
         if (ret < 0) {
                 unlock_extent(tree, start, end);
                 SetPageError(page);
                 goto out;
         }

and

         if (!PageUptodate(page)) {
                 if (cleancache_get_page(page) == 0) {
                         BUG_ON(blocksize != PAGE_SIZE);
                         unlock_extent(tree, start, end);
                         goto out;
                 }
         }

which will now leave the page locked when it errors out.  Not to mention I'm 
pretty sure you want to use btrfs_page_set_error() instead of SetPageError() in 
that first case.

> +	begin_data_page_read(fs_info, page);
>   	while (cur <= end) {
>   		bool force_bio_submit = false;
>   		u64 disk_bytenr;
> @@ -3323,13 +3339,14 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
>   					    &cached, GFP_NOFS);
>   			unlock_extent_cached(tree, cur,
>   					     cur + iosize - 1, &cached);
> +			end_page_read(page, true, cur, iosize);
>   			break;
>   		}
>   		em = __get_extent_map(inode, page, pg_offset, cur,
>   				      end - cur + 1, em_cached);
>   		if (IS_ERR_OR_NULL(em)) {
> -			SetPageError(page);
>   			unlock_extent(tree, cur, end);
> +			end_page_read(page, false, cur, end + 1 - cur);
>   			break;
>   		}
>   		extent_offset = cur - em->start;
> @@ -3412,6 +3429,7 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
>   					    &cached, GFP_NOFS);
>   			unlock_extent_cached(tree, cur,
>   					     cur + iosize - 1, &cached);
> +			end_page_read(page, true, cur, iosize);
>   			cur = cur + iosize;
>   			pg_offset += iosize;
>   			continue;
> @@ -3421,6 +3439,7 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
>   				   EXTENT_UPTODATE, 1, NULL)) {
>   			check_page_uptodate(tree, page);
>   			unlock_extent(tree, cur, cur + iosize - 1);
> +			end_page_read(page, true, cur, iosize);
>   			cur = cur + iosize;
>   			pg_offset += iosize;
>   			continue;
> @@ -3429,8 +3448,8 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
>   		 * to date.  Error out
>   		 */
>   		if (block_start == EXTENT_MAP_INLINE) {
> -			SetPageError(page);
>   			unlock_extent(tree, cur, cur + iosize - 1);
> +			end_page_read(page, false, cur, iosize);
>   			cur = cur + iosize;
>   			pg_offset += iosize;
>   			continue;
> @@ -3447,19 +3466,14 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
>   			nr++;
>   			*bio_flags = this_bio_flag;
>   		} else {
> -			SetPageError(page);
>   			unlock_extent(tree, cur, cur + iosize - 1);
> +			end_page_read(page, false, cur, iosize);
>   			goto out;
>   		}
>   		cur = cur + iosize;
>   		pg_offset += iosize;
>   	}
>   out:
> -	if (!nr) {
> -		if (!PageError(page))
> -			SetPageUptodate(page);
> -		unlock_page(page);
> -	}

We can just delete out: here and either return on error or break from the main 
loop.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 52+ messages in thread
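
For the two early-exit paths quoted above, a fixed version would presumably
need to unlock the page (and use the subpage-aware error helper from patch 09)
before bailing out, roughly like this sketch (not the final fix):

	ret = set_page_extent_mapped(page);
	if (ret < 0) {
		unlock_extent(tree, start, end);
		btrfs_page_set_error(fs_info, page, start, PAGE_SIZE);
		unlock_page(page);
		goto out;
	}

	if (!PageUptodate(page)) {
		if (cleancache_get_page(page) == 0) {
			BUG_ON(blocksize != PAGE_SIZE);
			unlock_extent(tree, start, end);
			unlock_page(page);
			goto out;
		}
	}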

* Re: [PATCH v5 18/18] btrfs: allow RO mount of 4K sector size fs on 64K page system
  2021-01-26  8:34 ` [PATCH v5 18/18] btrfs: allow RO mount of 4K sector size fs on 64K page system Qu Wenruo
@ 2021-01-27 17:13   ` Josef Bacik
  2021-02-01 15:49   ` David Sterba
  1 sibling, 0 replies; 52+ messages in thread
From: Josef Bacik @ 2021-01-27 17:13 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: David Sterba

On 1/26/21 3:34 AM, Qu Wenruo wrote:
> This adds the basic RO mount ability for 4K sector size on 64K page
> system.
> 
> Currently we only plan to support 4K and 64K page system.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Signed-off-by: David Sterba <dsterba@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v5 00/18] btrfs: add read-only support for subpage sector size
  2021-01-27 16:17 ` [PATCH v5 00/18] btrfs: add read-only support for subpage sector size Josef Bacik
@ 2021-01-28  0:30   ` Qu Wenruo
  2021-01-28 10:34     ` David Sterba
  0 siblings, 1 reply; 52+ messages in thread
From: Qu Wenruo @ 2021-01-28  0:30 UTC (permalink / raw)
  To: Josef Bacik, Qu Wenruo, linux-btrfs



On 2021/1/28 12:17 AM, Josef Bacik wrote:
> On 1/26/21 3:33 AM, Qu Wenruo wrote:
>> Patches can be fetched from github:
>> https://github.com/adam900710/linux/tree/subpage
>> Currently the branch also contains partial RW data support (still some
>> ordered extent and data csum mismatch problems)
>>
>> Great thanks to David/Nikolay/Josef for their effort reviewing and
>> merging the preparation patches into misc-next.
>>
>> === What works ===
>> Just from the patchset:
>> - Data read
>>    Both regular and compressed data, with csum check.
>>
>> - Metadata read
>>
>> This means, with these patchset, 64K page systems can at least mount
>> btrfs with 4K sector size read-only.
>> This should provide the ability to migrate data at least.
>>
>> While on the github branch, there are already experimental RW supports,
>> there are still ordered extent related bugs for me to fix.
>> Thus only the RO part is sent for review and testing.
>>
>> === Patchset structure ===
>> Patch 01~02:    Preparation patches which don't have functional change
>> Patch 03~12:    Subpage metadata allocation and freeing
>> Patch 13~15:    Subpage metadata read path
>> Patch 16~17:    Subpage data read path
>> Patch 18:    Enable subpage RO support
>>
>> === Changelog ===
>> v1:
>> - Separate the main implementation from previous huge patchset
>>    Huge patchset doesn't make much sense.
>>
>> - Use bitmap implementation
>>    Now page::private will be a pointer to btrfs_subpage structure, which
>>    contains bitmaps for various page status.
>>
>> v2:
>> - Use page::private as btrfs_subpage for extra info
>>    This replace old extent io tree based solution, which reduces latency
>>    and don't require memory allocation for its operations.
>>
>> - Cherry-pick new preparation patches from RW development
>>    Those new preparation patches improves the readability by their own.
>>
>> v3:
>> - Make dummy extent buffer to follow the same subpage accessors
>>    Fsstress exposed several ASSERT() for dummy extent buffers.
>>    It turns out we need to make dummy extent buffer to own the same
>>    btrfs_subpage structure to make eb accessors to work properly
>>
>> - Two new small __process_pages_contig() related preparation patches
>>    One to make __process_pages_contig() to enhance the error handling
>>    path for locked_page, one to merge one macro.
>>
>> - Extent buffers refs count update
>>    Except try_release_extent_buffer(), all other eb uses will try to
>>    increase the ref count of the eb.
>>    For try_release_extent_buffer(), the eb refs check will happen inside
>>    the rcu critical section to avoid eb being freed.
>>
>> - Comment updates
>>    Addressing the comments from the mail list.
>>
>> v4:
>> - Get rid of btrfs_subpage::tree_block_bitmap
>>    This is to reduce lock complexity (no need to bother extra subpage
>>    lock for metadata, all locks are existing locks)
>>    Now eb looking up mostly depends on radix tree, with small help from
>>    btrfs_subpage::under_alloc.
>>    Now I haven't experieneced metadata related problems any more during
>>    my local fsstress tests.
>>
>> - Fix a race where metadata page dirty bit can race
>>    Fixed in the metadata RW patchset though.
>>
>> - Rebased to latest misc-next branch
>>    With 4 patches removed, as they are already in misc-next.
>>
>> v5:
>> - Use the updated version from David as base
>>    Most comment/commit message update should be kept as is.
>>
>> - A new separate patch to move UNMAPPED bit set timing
>>
>> - New comment on why we need to prealloc subpage inside a loop
>>    Mostly for further 16K page size support, where we can have
>>    eb across multiple pages.
>>
>> - Remove one patch which is too RW specific
>>    Since it introduces functional change which only makes sense for RW
>>    support, it's not a good idea to include it in RO support.
>>
>> - Error handling fixes
>>    Great thanks to Josef.
>>
>> - Refactor btrfs_subpage allocation/freeing
>>    Now we have btrfs_alloc_subpage() and btrfs_free_subpage() helpers to
>>    do all the allocation/freeing.
>>    It's pretty easy to convert to kmem_cache using above helpers.
>>    (already internally tested using kmem_cache without problem, in fact
>>     it's all the problems found in kmem_cache test leads to the new
>>     interface)
>>
>> - Use btrfs_subpage::eb_refs to replace old under_alloc
>>    This makes checking whether the page has any eb left much easier.
>>
>> Qu Wenruo (18):
>>    btrfs: merge PAGE_CLEAR_DIRTY and PAGE_SET_WRITEBACK to
>>      PAGE_START_WRITEBACK
>>    btrfs: set UNMAPPED bit early in btrfs_clone_extent_buffer() for
>>      subpage support
>>    btrfs: introduce the skeleton of btrfs_subpage structure
>>    btrfs: make attach_extent_buffer_page() handle subpage case
>>    btrfs: make grab_extent_buffer_from_page() handle subpage case
>>    btrfs: support subpage for extent buffer page release
>
> I don't have this patch in my inbox so I can't reply to it directly, but
> you include refcount.h, but then use normal atomics.  Please used the
> actual refcount_t, as it gets us all the debugging stuff that makes
> finding problems much easier.  Thanks,

My bad, my initial plan is to use refcount, but the use case has valid 0
refcount usage, thus refcount is not good here.

I'll remove the remaining include line.

Thanks,
Qu
>
> Josef

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v5 00/18] btrfs: add read-only support for subpage sector size
  2021-01-28  0:30   ` Qu Wenruo
@ 2021-01-28 10:34     ` David Sterba
  2021-01-28 10:51       ` Qu Wenruo
  0 siblings, 1 reply; 52+ messages in thread
From: David Sterba @ 2021-01-28 10:34 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Josef Bacik, Qu Wenruo, linux-btrfs

On Thu, Jan 28, 2021 at 08:30:21AM +0800, Qu Wenruo wrote:
> >>    btrfs: support subpage for extent buffer page release
> >
> > I don't have this patch in my inbox so I can't reply to it directly, but
> > you include refcount.h, but then use normal atomics.  Please used the
> > actual refcount_t, as it gets us all the debugging stuff that makes
> > finding problems much easier.  Thanks,
> 
> My bad, my initial plan is to use refcount, but the use case has valid 0
> refcount usage, thus refcount is not good here.

In case you need to shift the "0" you can use refcount_dec_not_one or
refcount_inc/dec_not_zero, but I haven't seen the code so don't know if
this applies in your case.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v5 00/18] btrfs: add read-only support for subpage sector size
  2021-01-28 10:34     ` David Sterba
@ 2021-01-28 10:51       ` Qu Wenruo
  2021-02-01 14:50         ` David Sterba
  0 siblings, 1 reply; 52+ messages in thread
From: Qu Wenruo @ 2021-01-28 10:51 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, Josef Bacik, linux-btrfs



On 2021/1/28 6:34 PM, David Sterba wrote:
> On Thu, Jan 28, 2021 at 08:30:21AM +0800, Qu Wenruo wrote:
>>>>     btrfs: support subpage for extent buffer page release
>>>
>>> I don't have this patch in my inbox so I can't reply to it directly, but
>>> you include refcount.h, but then use normal atomics.  Please used the
>>> actual refcount_t, as it gets us all the debugging stuff that makes
>>> finding problems much easier.  Thanks,
>>
>> My bad, my initial plan is to use refcount, but the use case has valid 0
>> refcount usage, thus refcount is not good here.
> 
> In case you need to shift the "0" you can use refcount_dec_not_one or
> refcount_inc/dec_not_zero, but I haven't seen the code so don't know if
> this applies in your case.
> 

In the code, what we want is inc on zero, which will cause a warning with 
refcount_t. (The initial subpage allocation has zero refs, which is then 
increased to one when an eb is attached to the page.)

But maybe I can change the timing so that we can use refcount.
The current code uses ASSERT()s to prevent underflow, so it should be 
sufficient for the current code base though.

I'll investigate this topic more in the next update.

Thanks,
Qu


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v5 00/18] btrfs: add read-only support for subpage sector size
  2021-01-28 10:51       ` Qu Wenruo
@ 2021-02-01 14:50         ` David Sterba
  0 siblings, 0 replies; 52+ messages in thread
From: David Sterba @ 2021-02-01 14:50 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, Josef Bacik, linux-btrfs

On Thu, Jan 28, 2021 at 06:51:46PM +0800, Qu Wenruo wrote:
> On 2021/1/28 6:34 PM, David Sterba wrote:
> > On Thu, Jan 28, 2021 at 08:30:21AM +0800, Qu Wenruo wrote:
> >>>>     btrfs: support subpage for extent buffer page release
> >>>
> >>> I don't have this patch in my inbox so I can't reply to it directly, but
> >>> you include refcount.h, but then use normal atomics.  Please used the
> >>> actual refcount_t, as it gets us all the debugging stuff that makes
> >>> finding problems much easier.  Thanks,
> >>
> >> My bad, my initial plan is to use refcount, but the use case has valid 0
> >> refcount usage, thus refcount is not good here.
> > 
> > In case you need to shift the "0" you can use refcount_dec_not_one or
> > refcount_inc/dec_not_zero, but I haven't seen the code so don't know if
> > this applies in your case.
> 
> In the code, what we want is inc on zero, which will cause warning on 
> refcount. (initial subpage allocation has zero ref, then increased to 
> one when one eb is attached to the page)
> 
> But maybe I can change the timing so that we can use refcount.
> Current code uses ASSERT()s to prevent underflow, so it would be 
> sufficient for current code base though.

An assert for an underflow is ok, but refcount catches inc from zero, i.e.
a potential use-after-free.

With lifted refcount it should be possible to distinguish states where
it's really freed (0, to be deallocated) and 1 which is some middle
state like initialized, valid but not yet attached. Usage will increase
the ref, once there are no users, compare to 1, and then final put is
back to 0. A similar pattern is done for extent buffers, the subpage
data probably have similar lifetime.

^ permalink raw reply	[flat|nested] 52+ messages in thread
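
Expressed as code, the lifted pattern described above might look something
like this sketch (purely illustrative, assuming eb_refs were converted to
refcount_t; not a proposed patch):

	/*
	 * Keep a +1 bias so the refcount never sits at 0 while the structure
	 * is attached, avoiding the inc-from-zero warning of refcount_t.
	 */
	refcount_set(&subpage->eb_refs, 1);	/* attached, no ebs yet */

	refcount_inc(&subpage->eb_refs);	/* an eb starts using the page */
	refcount_dec(&subpage->eb_refs);	/* that eb is released */

	if (refcount_read(&subpage->eb_refs) == 1) {
		/* Only the bias is left: no ebs remain, safe to detach */
		btrfs_detach_subpage(fs_info, page);
	}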

* Re: [PATCH v5 06/18] btrfs: support subpage for extent buffer page release
  2021-01-27 16:21   ` Josef Bacik
@ 2021-02-01 15:32     ` David Sterba
  0 siblings, 0 replies; 52+ messages in thread
From: David Sterba @ 2021-02-01 15:32 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Qu Wenruo, linux-btrfs, David Sterba

On Wed, Jan 27, 2021 at 11:21:08AM -0500, Josef Bacik wrote:
> On 1/26/21 3:33 AM, Qu Wenruo wrote:
> > --- a/fs/btrfs/subpage.h
> > +++ b/fs/btrfs/subpage.h
> > @@ -4,6 +4,7 @@
> >   #define BTRFS_SUBPAGE_H
> >   
> >   #include <linux/spinlock.h>
> > +#include <linux/refcount.h>
> 
> I made this comment elsewhere, but the patch finally showed up in my email after 
> I refreshed (???? thunderbird wtf??).  Anyway you import refcount.h here, but 
> don't actually use refcount_t.  Please use refcount_t, so we get the benefit of 
> the debugging from the helpers.  Thanks,

Switching to refcount looks a bit complicated, so for now let's use
atomics; it affects only the subpage case.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v5 16/18] btrfs: introduce btrfs_subpage for data inodes
  2021-01-27 16:56   ` Josef Bacik
@ 2021-02-01 15:42     ` David Sterba
  0 siblings, 0 replies; 52+ messages in thread
From: David Sterba @ 2021-02-01 15:42 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Qu Wenruo, linux-btrfs, David Sterba

On Wed, Jan 27, 2021 at 11:56:39AM -0500, Josef Bacik wrote:
> On 1/26/21 3:34 AM, Qu Wenruo wrote:
> > @@ -8345,7 +8347,11 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
> >   	wait_on_page_writeback(page);
> >   
> >   	lock_extent_bits(io_tree, page_start, page_end, &cached_state);
> > -	set_page_extent_mapped(page);
> > +	ret2 = set_page_extent_mapped(page);
> > +	if (ret2 < 0) {
> > +		ret = vmf_error(ret2);
> > +		goto out_unlock;
> > +	}
> 
> Sorry I missed this bit in my last reply, you need a
> 
> ret = vmf_error(ret2);
> unlock_extent_cached(io_tree, page_start, page_end, &cached_state);
> goto out_unlock;

Folded to the patch, thanks.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v5 17/18] btrfs: integrate page status update for data read path into begin/end_page_read()
  2021-01-27 17:13   ` Josef Bacik
@ 2021-02-01 15:47     ` David Sterba
  0 siblings, 0 replies; 52+ messages in thread
From: David Sterba @ 2021-02-01 15:47 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Qu Wenruo, linux-btrfs, David Sterba

On Wed, Jan 27, 2021 at 12:13:27PM -0500, Josef Bacik wrote:
> On 1/26/21 3:34 AM, Qu Wenruo wrote:
> > @@ -3263,6 +3277,7 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
> >   		      unsigned int read_flags, u64 *prev_em_start)
> >   {
> >   	struct inode *inode = page->mapping->host;
> > +	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> >   	u64 start = page_offset(page);
> >   	const u64 end = start + PAGE_SIZE - 1;
> >   	u64 cur = start;
> > @@ -3306,6 +3321,7 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
> >   			kunmap_atomic(userpage);
> >   		}
> >   	}
> 
> You have two error cases above this
> 
>          ret = set_page_extent_mapped(page);
>          if (ret < 0) {
>                  unlock_extent(tree, start, end);
>                  SetPageError(page);
>                  goto out;
>          }
> 
> and
> 
>          if (!PageUptodate(page)) {
>                  if (cleancache_get_page(page) == 0) {
>                          BUG_ON(blocksize != PAGE_SIZE);
>                          unlock_extent(tree, start, end);
>                          goto out;
>                  }
>          }
> 
> which will now leave the page locked when it errors out.  Not to mention I'm 
> pretty sure you want to use btrfs_page_set_error() instead of SetPageError() in 
> that first case.

Qu, please send a fixed version, just this patch, thanks.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v5 18/18] btrfs: allow RO mount of 4K sector size fs on 64K page system
  2021-01-26  8:34 ` [PATCH v5 18/18] btrfs: allow RO mount of 4K sector size fs on 64K page system Qu Wenruo
  2021-01-27 17:13   ` Josef Bacik
@ 2021-02-01 15:49   ` David Sterba
  1 sibling, 0 replies; 52+ messages in thread
From: David Sterba @ 2021-02-01 15:49 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, David Sterba

On Tue, Jan 26, 2021 at 04:34:02PM +0800, Qu Wenruo wrote:
> This adds the basic RO mount ability for 4K sector size on 64K page
> system.
> 
> Currently we only plan to support 4K and 64K page system.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>  fs/btrfs/disk-io.c | 24 +++++++++++++++++++++---
>  fs/btrfs/super.c   |  7 +++++++
>  2 files changed, 28 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 0b10577ad2bd..d74ee0a396ac 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2483,13 +2483,21 @@ static int validate_super(struct btrfs_fs_info *fs_info,
>  		btrfs_err(fs_info, "invalid sectorsize %llu", sectorsize);
>  		ret = -EINVAL;
>  	}
> -	/* Only PAGE SIZE is supported yet */
> -	if (sectorsize != PAGE_SIZE) {
> +
> +	/*
> +	 * For 4K page size, we only support 4K sector size.
> +	 * For 64K page size, we support RW for 64K sector size, and RO for
> +	 * 4K sector size.
> +	 */
> +	if ((SZ_4K == PAGE_SIZE && sectorsize != PAGE_SIZE) ||
> +	    (SZ_64K == PAGE_SIZE && (sectorsize != SZ_4K &&

I've switched the order here so it reads more naturally as PAGE_SIZE == SZ_...

> +				     sectorsize != SZ_64K))) {
>  		btrfs_err(fs_info,
> -			"sectorsize %llu not supported yet, only support %lu",
> +			"sectorsize %llu not supported yet for page size %lu",
>  			sectorsize, PAGE_SIZE);
>  		ret = -EINVAL;
>  	}
> +
>  	if (!is_power_of_2(nodesize) || nodesize < sectorsize ||
>  	    nodesize > BTRFS_MAX_METADATA_BLOCKSIZE) {
>  		btrfs_err(fs_info, "invalid nodesize %llu", nodesize);
> @@ -3248,6 +3256,16 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
>  		goto fail_alloc;
>  	}
>  
> +	/* For 4K sector size support, it's only read-only yet */
> +	if (PAGE_SIZE == SZ_64K && sectorsize == SZ_4K) {
> +		if (!sb_rdonly(sb) || btrfs_super_log_root(disk_super)) {
> +			btrfs_err(fs_info,
> +				"subpage sector size only support RO yet");

Similar to the other message, I've added which sectorsize and page size
don't work.

And s/RO/read-only/. This is for clarity of the messages that are read
by users, while we can use the RO/RW in comments or changelogs.

^ permalink raw reply	[flat|nested] 52+ messages in thread
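
For reference, with the comparison order switched as mentioned, the check
would presumably read:

	if ((PAGE_SIZE == SZ_4K && sectorsize != PAGE_SIZE) ||
	    (PAGE_SIZE == SZ_64K && (sectorsize != SZ_4K &&
				     sectorsize != SZ_64K))) {
		btrfs_err(fs_info,
			"sectorsize %llu not supported yet for page size %lu",
			sectorsize, PAGE_SIZE);
		ret = -EINVAL;
	}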

* Re: [PATCH v5 00/18] btrfs: add read-only support for subpage sector size
  2021-01-26  8:33 [PATCH v5 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (18 preceding siblings ...)
  2021-01-27 16:17 ` [PATCH v5 00/18] btrfs: add read-only support for subpage sector size Josef Bacik
@ 2021-02-01 15:55 ` David Sterba
  2021-02-02  9:21 ` [bug report] Unable to handle kernel paging request Anand Jain
  2021-02-03 13:20 ` [PATCH v5 00/18] btrfs: add read-only support for subpage sector size David Sterba
  21 siblings, 0 replies; 52+ messages in thread
From: David Sterba @ 2021-02-01 15:55 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Jan 26, 2021 at 04:33:44PM +0800, Qu Wenruo wrote:
> Patches can be fetched from github:
> https://github.com/adam900710/linux/tree/subpage
> Currently the branch also contains partial RW data support (still some
> ordered extent and data csum mismatch problems)
> 
> Great thanks to David/Nikolay/Josef for their effort reviewing and
> merging the preparation patches into misc-next.
> 
> === What works ===
> Just from the patchset:
> - Data read
>   Both regular and compressed data, with csum check.
> 
> - Metadata read
> 
> This means, with these patchset, 64K page systems can at least mount
> btrfs with 4K sector size read-only.
> This should provide the ability to migrate data at least.
> 
> While on the github branch, there are already experimental RW supports,
> there are still ordered extent related bugs for me to fix.
> Thus only the RO part is sent for review and testing.
> 
> === Patchset structure ===
> Patch 01~02:	Preparation patches which don't have functional change
> Patch 03~12:	Subpage metadata allocation and freeing
> Patch 13~15:	Subpage metadata read path
> Patch 16~17:	Subpage data read path
> Patch 18:	Enable subpage RO support

> v5:
> - Use the updated version from David as base
>   Most comment/commit message update should be kept as is.
> 
> - A new separate patch to move UNMAPPED bit set timing
> 
> - New comment on why we need to prealloc subpage inside a loop
>   Mostly for further 16K page size support, where we can have
>   eb across multiple pages.
> 
> - Remove one patch which is too RW specific
>   Since it introduces functional change which only makes sense for RW
>   support, it's not a good idea to include it in RO support.
> 
> - Error handling fixes
>   Great thanks to Josef.
> 
> - Refactor btrfs_subpage allocation/freeing
>   Now we have btrfs_alloc_subpage() and btrfs_free_subpage() helpers to
>   do all the allocation/freeing.
>   It's pretty easy to convert to kmem_cache using above helpers.
>   (already internally tested using kmem_cache without problem, in fact
>    it's all the problems found in kmem_cache test leads to the new
>    interface)
> 
> - Use btrfs_subpage::eb_refs to replace old under_alloc
>   This makes checking whether the page has any eb left much easier.

All look reasonable for merge; patch 17 still needs an update, which I'll
replace once you send it.

I'll move it to misc-next after fstests finish, minor updates are still
possible during this week, merge window freeze is approaching.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [bug report] Unable to handle kernel paging request
  2021-01-26  8:33 [PATCH v5 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (19 preceding siblings ...)
  2021-02-01 15:55 ` David Sterba
@ 2021-02-02  9:21 ` Anand Jain
  2021-02-02 10:23   ` Qu Wenruo
  2021-02-03 13:20 ` [PATCH v5 00/18] btrfs: add read-only support for subpage sector size David Sterba
  21 siblings, 1 reply; 52+ messages in thread
From: Anand Jain @ 2021-02-02  9:21 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs


Qu,

  fstests ran fine on an aarch64 kvm with this patch set.

  Further, I was running a few hand tests as below, and it fails
  with "Unable to handle kernel paging request".

  The test case looks something like:

  On x86_64 create btrfs on a file 11g
  copy /usr into /test-mnt stops at enospc
  set compression property on the root subvol
  run defrag with -czstd
  truncate a large file 4gb
  punch holes on it
  truncate a couple of smaller files
  unmount
  send file to an aarch64 (64k pagesize) kvm
  mount -o ro
  run sha256sum on all the files
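
  A sketch of the above in shell; all paths, sizes and file names below are
  illustrative assumptions, not the exact ones used:

    # --- on the x86_64 (4K page size) host ---
    truncate -s 11G btrfs.img
    mkfs.btrfs -f ./btrfs.img
    mount -o loop ./btrfs.img /test-mnt
    cp -a /usr /test-mnt/                      # stops when ENOSPC is hit
    btrfs property set /test-mnt compression zstd
    btrfs filesystem defragment -r -czstd /test-mnt
    truncate -s 4G /test-mnt/usr/big-file          # illustrative name
    fallocate -p -o 1G -l 512M /test-mnt/usr/big-file
    truncate -s 1M /test-mnt/usr/small-file-1      # illustrative name
    truncate -s 1M /test-mnt/usr/small-file-2
    umount /test-mnt

    # --- copy btrfs.img to the aarch64 (64k page size) kvm, then ---
    mount -o ro,loop ./btrfs.img /test-mnt
    find /test-mnt -type f -exec sha256sum {} + > /dev/null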

---------------------
[37012.027764] BTRFS warning (device loop0): csum failed root 5 ino 611 
off 228659200 csum 0x1dcefc2d expected csum 0x69412d2a mirror 1
[37012.030971] BTRFS error (device loop0): bdev /dev/loop0 errs: wr 0, 
rd 0, flush 0, corrupt 9, gen 0
[37012.036223] BTRFS warning (device loop0): csum failed root 5 ino 616 
off 228724736 csum 0x73f63661 expected csum 0xaf922a6f mirror 1
[37012.036250] BTRFS error (device loop0): bdev /dev/loop0 errs: wr 0, 
rd 0, flush 0, corrupt 10, gen 0
[37012.123917] Unable to handle kernel paging request at virtual address 
0061d1f66c080000
[37012.126104] Mem abort info:
[37012.126951]   ESR = 0x96000004
[37012.127791]   EC = 0x25: DABT (current EL), IL = 32 bits
[37012.129207]   SET = 0, FnV = 0
[37012.130043]   EA = 0, S1PTW = 0
[37012.131269] Data abort info:
[37012.132165]   ISV = 0, ISS = 0x00000004
[37012.133211]   CM = 0, WnR = 0
[37012.134014] [0061d1f66c080000] address between user and kernel 
address ranges
[37012.136050] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[37012.137567] Modules linked in: btrfs blake2b_generic xor xor_neon 
zstd_compress raid6_pq crct10dif_ce ip_tables x_tables ipv6
[37012.140742] CPU: 0 PID: 289001 Comm: kworker/u64:3 Not tainted 
5.11.0-rc5+ #10
[37012.142839] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 
02/06/2015
[37012.144787] Workqueue: btrfs-endio btrfs_work_helper [btrfs]
[37012.146474] pstate: 20000005 (nzCv daif -PAN -UAO -TCO BTYPE=--)
[37012.148175] pc : __crc32c_le+0x84/0xe8
[37012.149266] lr : chksum_digest+0x24/0x40
[37012.150420] sp : ffff80001638f8f0
[37012.151491] x29: ffff80001638f8f0 x28: ffff0000c7bb0000
[37012.152982] x27: ffff0000d1a27000 x26: ffff0002f21b56e0
[37012.154565] x25: ffff800011df3948 x24: 0000004000000000
[37012.156063] x23: ffff000000000000 x22: ffff80001638fa00
[37012.157570] x21: 0000000000000004 x20: ffff0000c7bb0050
[37012.159145] x19: ffff80001638fc88 x18: 0000000000000000
[37012.160684] x17: 0000000000000000 x16: 0000000000000000
[37012.162190] x15: 0000051d5454c764 x14: 000000000000017a
[37012.163774] x13: 0000000000000145 x12: 0000000000000001
[37012.165282] x11: 0000000000000000 x10: 00000000000009d0
[37012.166849] x9 : ffff0000ca305564 x8 : 0000000000000000
[37012.168395] x7 : 0000000000000000 x6 : ffff800011f23980
[37012.169883] x5 : 00000000006f6964 x4 : ffff8000105dd7a8
[37012.171476] x3 : ffff80001638fc88 x2 : 0000000000010000
[37012.172997] x1 : bc61d1f66c080000 x0 : 00000000ffffffff
[37012.174642] Call trace:
[37012.175427]  __crc32c_le+0x84/0xe8
[37012.176419]  crypto_shash_digest+0x34/0x58
[37012.177616]  check_compressed_csum+0xd0/0x2b0 [btrfs]
[37012.179160]  end_compressed_bio_read+0xb8/0x308 [btrfs]
[37012.180731]  bio_endio+0x12c/0x1d8
[37012.181712]  end_workqueue_fn+0x3c/0x60 [btrfs]
[37012.183161]  btrfs_work_helper+0xf4/0x5a8 [btrfs]
[37012.184570]  process_one_work+0x1ec/0x4c0
[37012.185727]  worker_thread+0x48/0x478
[37012.186823]  kthread+0x158/0x160
[37012.187768]  ret_from_fork+0x10/0x34
[37012.188791] Code: 9ac55c08 9ac65d08 1a880000 b4000122 (a8c21023)
[37012.190486] ---[ end trace 4f73e813d058b84c ]---
[37019.180684] note: kworker/u64:3[289001] exited with preempt_count 1
---------------

  Could you please take a look?

Thanks, Anand

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [bug report] Unable to handle kernel paging request
  2021-02-02  9:21 ` [bug report] Unable to handle kernel paging request Anand Jain
@ 2021-02-02 10:23   ` Qu Wenruo
  2021-02-02 11:28     ` Anand Jain
  0 siblings, 1 reply; 52+ messages in thread
From: Qu Wenruo @ 2021-02-02 10:23 UTC (permalink / raw)
  To: Anand Jain, Qu Wenruo, linux-btrfs



On 2021/2/2 5:21 PM, Anand Jain wrote:
>
> Qu,
>
>   fstests ran fine on an aarch64 kvm with this patch set.

Do you mean the subpage patchset?

With 4K sector size?

There is no way it can run fine...
A long enough fsstress run can crash the kernel with btrfs_csum_one_bio()
being unable to locate the corresponding ordered extent.
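
For reference, the kind of run meant here is a rough sketch like the one
below (device and mount paths are placeholders), and it exercises the
experimental RW branch, not the RO series under review:

  # 4K sector size filesystem, mounted read-write on a 64K page size host
  mkfs.btrfs -f -s 4k /dev/vdb
  mount /dev/vdb /mnt
  mkdir -p /mnt/stress
  # fsstress from fstests; a long enough run eventually hits the
  # btrfs_csum_one_bio() ordered extent mismatch mentioned above
  ltp/fsstress -d /mnt/stress -p 4 -n 100000
  umount /mnt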


>
>   Further, I was running few hand tests as below, and it fails
>   with - Unable to handle kernel paging.
>
>   Test case looks something like..
>
>   On x86_64 create btrfs on a file 11g
>   copy /usr into /test-mnt stops at enospc
>   set compression property on the root sunvol
>   run defrag with -czstd

I don't even consider compression a supported feature for subpage.

Are you really talking about the subpage patchset with 4K sector size,
on 64K page size AArch64?

If that's really the case, I appreciate your testing effort very much; it
means the patchset is doing far better than I expected.
But I find it hard to believe it can really pass fstests...

Thanks,
Qu

>   truncate a large file 4gb
>   punch holes on it
>   truncate couple of smaller files
>   unmount
>   send file to an aarch64 (64k pagesize) kvm
>   mount -o ro
>   run sha256sum on all the files
>
> ---------------------
> [37012.027764] BTRFS warning (device loop0): csum failed root 5 ino 611
> off 228659200 csum 0x1dcefc2d expected csum 0x69412d2a mirror 1
> [37012.030971] BTRFS error (device loop0): bdev /dev/loop0 errs: wr 0,
> rd 0, flush 0, corrupt 9, gen 0
> [37012.036223] BTRFS warning (device loop0): csum failed root 5 ino 616
> off 228724736 csum 0x73f63661 expected csum 0xaf922a6f mirror 1
> [37012.036250] BTRFS error (device loop0): bdev /dev/loop0 errs: wr 0,
> rd 0, flush 0, corrupt 10, gen 0
> [37012.123917] Unable to handle kernel paging request at virtual address
> 0061d1f66c080000
> [37012.126104] Mem abort info:
> [37012.126951]   ESR = 0x96000004
> [37012.127791]   EC = 0x25: DABT (current EL), IL = 32 bits
> [37012.129207]   SET = 0, FnV = 0
> [37012.130043]   EA = 0, S1PTW = 0
> [37012.131269] Data abort info:
> [37012.132165]   ISV = 0, ISS = 0x00000004
> [37012.133211]   CM = 0, WnR = 0
> [37012.134014] [0061d1f66c080000] address between user and kernel
> address ranges
> [37012.136050] Internal error: Oops: 96000004 [#1] PREEMPT SMP
> [37012.137567] Modules linked in: btrfs blake2b_generic xor xor_neon
> zstd_compress raid6_pq crct10dif_ce ip_tables x_tables ipv6
> [37012.140742] CPU: 0 PID: 289001 Comm: kworker/u64:3 Not tainted
> 5.11.0-rc5+ #10
> [37012.142839] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0
> 02/06/2015
> [37012.144787] Workqueue: btrfs-endio btrfs_work_helper [btrfs]
> [37012.146474] pstate: 20000005 (nzCv daif -PAN -UAO -TCO BTYPE=--)
> [37012.148175] pc : __crc32c_le+0x84/0xe8
> [37012.149266] lr : chksum_digest+0x24/0x40
> [37012.150420] sp : ffff80001638f8f0
> [37012.151491] x29: ffff80001638f8f0 x28: ffff0000c7bb0000
> [37012.152982] x27: ffff0000d1a27000 x26: ffff0002f21b56e0
> [37012.154565] x25: ffff800011df3948 x24: 0000004000000000
> [37012.156063] x23: ffff000000000000 x22: ffff80001638fa00
> [37012.157570] x21: 0000000000000004 x20: ffff0000c7bb0050
> [37012.159145] x19: ffff80001638fc88 x18: 0000000000000000
> [37012.160684] x17: 0000000000000000 x16: 0000000000000000
> [37012.162190] x15: 0000051d5454c764 x14: 000000000000017a
> [37012.163774] x13: 0000000000000145 x12: 0000000000000001
> [37012.165282] x11: 0000000000000000 x10: 00000000000009d0
> [37012.166849] x9 : ffff0000ca305564 x8 : 0000000000000000
> [37012.168395] x7 : 0000000000000000 x6 : ffff800011f23980
> [37012.169883] x5 : 00000000006f6964 x4 : ffff8000105dd7a8
> [37012.171476] x3 : ffff80001638fc88 x2 : 0000000000010000
> [37012.172997] x1 : bc61d1f66c080000 x0 : 00000000ffffffff
> [37012.174642] Call trace:
> [37012.175427]  __crc32c_le+0x84/0xe8
> [37012.176419]  crypto_shash_digest+0x34/0x58
> [37012.177616]  check_compressed_csum+0xd0/0x2b0 [btrfs]
> [37012.179160]  end_compressed_bio_read+0xb8/0x308 [btrfs]
> [37012.180731]  bio_endio+0x12c/0x1d8
> [37012.181712]  end_workqueue_fn+0x3c/0x60 [btrfs]
> [37012.183161]  btrfs_work_helper+0xf4/0x5a8 [btrfs]
> [37012.184570]  process_one_work+0x1ec/0x4c0
> [37012.185727]  worker_thread+0x48/0x478
> [37012.186823]  kthread+0x158/0x160
> [37012.187768]  ret_from_fork+0x10/0x34
> [37012.188791] Code: 9ac55c08 9ac65d08 1a880000 b4000122 (a8c21023)
> [37012.190486] ---[ end trace 4f73e813d058b84c ]---
> [37019.180684] note: kworker/u64:3[289001] exited with preempt_count 1
> ---------------
>
>   Could you please take a look?
>
> Thanks, Anand

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [bug report] Unable to handle kernel paging request
  2021-02-02 10:23   ` Qu Wenruo
@ 2021-02-02 11:28     ` Anand Jain
  2021-02-02 13:37       ` Anand Jain
  0 siblings, 1 reply; 52+ messages in thread
From: Anand Jain @ 2021-02-02 11:28 UTC (permalink / raw)
  To: Qu Wenruo, Qu Wenruo, linux-btrfs



On 2/2/2021 6:23 PM, Qu Wenruo wrote:
> 
> 
> On 2021/2/2 5:21 PM, Anand Jain wrote:
>>
>> Qu,
>>
>>   fstests ran fine on an aarch64 kvm with this patch set.
> 
> Do you mean subpage patchset?
> 
> With 4K sector size?
> No way it can run fine...

  No. fstests ran with sectorsize == pagesize == 64k. Those runs aren't
  subpage though; they were just regression checks.
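
  For context, that regression run amounts to roughly the following (device
  names and the test group below are illustrative):

    cd xfstests
    export FSTYP=btrfs
    export TEST_DEV=/dev/vdb TEST_DIR=/test
    export SCRATCH_DEV=/dev/vdc SCRATCH_MNT=/scratch
    export MKFS_OPTIONS="-s 64k"        # sector size == page size == 64K
    ./check -g auto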

> Long enough fsstress can crash the kernel with btrfs_csum_one_bio()
> unable to locate the corresponding ordered extent.
>
>>   Further, I was running few hand tests as below, and it fails
>>   with - Unable to handle kernel paging.
>>
>>   Test case looks something like..
>>
>>   On x86_64 create btrfs on a file 11g
>>   copy /usr into /test-mnt stops at enospc
>>   set compression property on the root sunvol
>>   run defrag with -czstd
> 
> I don't even consider compression a supported feature for subpage.

  It should have failed the ro mount, which it didn't. A similar test case
  without compression is fine.
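
  To make the expectation concrete (the image path is illustrative and the
  error text is made up, only the shape of the behaviour matters):

    # on the 64K page size host
    mount -o ro,loop ./subpage.img /btrfs
    # expected: the mount is rejected up front with a clear
    #           "compression not supported for this page size" style error
    # observed: the mount succeeds, and reading the compressed files later
    #           triggers csum failures and the paging-request oops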

> Are you really talking about the subpage patchset with 4K sector size,
> on 64K page size AArch64?

  Yes, the read-only mount test case as above.

Thanks, Anand


> If really so, I appreciate your effort on testing very much, it means
> the patchset is doing way better than it is.
> But I don't really believe it's even true to pass fstests....



> Thanks,
> Qu
> 
>>   truncate a large file 4gb
>>   punch holes on it
>>   truncate couple of smaller files
>>   unmount
>>   send file to an aarch64 (64k pagesize) kvm
>>   mount -o ro
>>   run sha256sum on all the files
>>
>> ---------------------
>> [37012.027764] BTRFS warning (device loop0): csum failed root 5 ino 611
>> off 228659200 csum 0x1dcefc2d expected csum 0x69412d2a mirror 1
>> [37012.030971] BTRFS error (device loop0): bdev /dev/loop0 errs: wr 0,
>> rd 0, flush 0, corrupt 9, gen 0
>> [37012.036223] BTRFS warning (device loop0): csum failed root 5 ino 616
>> off 228724736 csum 0x73f63661 expected csum 0xaf922a6f mirror 1
>> [37012.036250] BTRFS error (device loop0): bdev /dev/loop0 errs: wr 0,
>> rd 0, flush 0, corrupt 10, gen 0
>> [37012.123917] Unable to handle kernel paging request at virtual address
>> 0061d1f66c080000
>> [37012.126104] Mem abort info:
>> [37012.126951]   ESR = 0x96000004
>> [37012.127791]   EC = 0x25: DABT (current EL), IL = 32 bits
>> [37012.129207]   SET = 0, FnV = 0
>> [37012.130043]   EA = 0, S1PTW = 0
>> [37012.131269] Data abort info:
>> [37012.132165]   ISV = 0, ISS = 0x00000004
>> [37012.133211]   CM = 0, WnR = 0
>> [37012.134014] [0061d1f66c080000] address between user and kernel
>> address ranges
>> [37012.136050] Internal error: Oops: 96000004 [#1] PREEMPT SMP
>> [37012.137567] Modules linked in: btrfs blake2b_generic xor xor_neon
>> zstd_compress raid6_pq crct10dif_ce ip_tables x_tables ipv6
>> [37012.140742] CPU: 0 PID: 289001 Comm: kworker/u64:3 Not tainted
>> 5.11.0-rc5+ #10
>> [37012.142839] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0
>> 02/06/2015
>> [37012.144787] Workqueue: btrfs-endio btrfs_work_helper [btrfs]
>> [37012.146474] pstate: 20000005 (nzCv daif -PAN -UAO -TCO BTYPE=--)
>> [37012.148175] pc : __crc32c_le+0x84/0xe8
>> [37012.149266] lr : chksum_digest+0x24/0x40
>> [37012.150420] sp : ffff80001638f8f0
>> [37012.151491] x29: ffff80001638f8f0 x28: ffff0000c7bb0000
>> [37012.152982] x27: ffff0000d1a27000 x26: ffff0002f21b56e0
>> [37012.154565] x25: ffff800011df3948 x24: 0000004000000000
>> [37012.156063] x23: ffff000000000000 x22: ffff80001638fa00
>> [37012.157570] x21: 0000000000000004 x20: ffff0000c7bb0050
>> [37012.159145] x19: ffff80001638fc88 x18: 0000000000000000
>> [37012.160684] x17: 0000000000000000 x16: 0000000000000000
>> [37012.162190] x15: 0000051d5454c764 x14: 000000000000017a
>> [37012.163774] x13: 0000000000000145 x12: 0000000000000001
>> [37012.165282] x11: 0000000000000000 x10: 00000000000009d0
>> [37012.166849] x9 : ffff0000ca305564 x8 : 0000000000000000
>> [37012.168395] x7 : 0000000000000000 x6 : ffff800011f23980
>> [37012.169883] x5 : 00000000006f6964 x4 : ffff8000105dd7a8
>> [37012.171476] x3 : ffff80001638fc88 x2 : 0000000000010000
>> [37012.172997] x1 : bc61d1f66c080000 x0 : 00000000ffffffff
>> [37012.174642] Call trace:
>> [37012.175427]  __crc32c_le+0x84/0xe8
>> [37012.176419]  crypto_shash_digest+0x34/0x58
>> [37012.177616]  check_compressed_csum+0xd0/0x2b0 [btrfs]
>> [37012.179160]  end_compressed_bio_read+0xb8/0x308 [btrfs]
>> [37012.180731]  bio_endio+0x12c/0x1d8
>> [37012.181712]  end_workqueue_fn+0x3c/0x60 [btrfs]
>> [37012.183161]  btrfs_work_helper+0xf4/0x5a8 [btrfs]
>> [37012.184570]  process_one_work+0x1ec/0x4c0
>> [37012.185727]  worker_thread+0x48/0x478
>> [37012.186823]  kthread+0x158/0x160
>> [37012.187768]  ret_from_fork+0x10/0x34
>> [37012.188791] Code: 9ac55c08 9ac65d08 1a880000 b4000122 (a8c21023)
>> [37012.190486] ---[ end trace 4f73e813d058b84c ]---
>> [37019.180684] note: kworker/u64:3[289001] exited with preempt_count 1
>> ---------------
>>
>>   Could you please take a look?
>>
>> Thanks, Anand

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [bug report] Unable to handle kernel paging request
  2021-02-02 11:28     ` Anand Jain
@ 2021-02-02 13:37       ` Anand Jain
  2021-02-04  5:13         ` Qu Wenruo
  0 siblings, 1 reply; 52+ messages in thread
From: Anand Jain @ 2021-02-02 13:37 UTC (permalink / raw)
  To: Qu Wenruo, Qu Wenruo, linux-btrfs



It is much simpler to reproduce. I am using two systems with different
pagesizes to test the subpage readonly support.

On a host with pagesize = 4k.
   truncate -s 3g 3g.img
   mkfs.btrfs ./3g.img
   mount -o loop,compress=zstd ./3g.img /btrfs
   xfs_io -f -c "pwrite -S 0xab 0 128k" /btrfs/foo
   umount /btrfs

Copy the file 3g.img to another host with pagesize = 64k.
   mount -o ro,loop ./3g.img /btrfs
   sha256sum /btrfs/foo

   leads to Unable to handle kernel NULL pointer dereference
----------------
[  +0.001387] BTRFS warning (device loop0): csum hole found for disk 
bytenr range [13672448, 13676544)
[  +0.001514] BTRFS warning (device loop0): csum failed root 5 ino 257 
off 13697024 csum 0xbcd798f5 expected csum 0xf11c5ebf mirror 1
[  +0.002301] BTRFS error (device loop0): bdev /dev/loop0 errs: wr 0, rd 
0, flush 0, corrupt 1, gen 0
[  +0.001647] Unable to handle kernel NULL pointer dereference at 
virtual address 0000000000000000
[  +0.001670] Mem abort info:
[  +0.000506]   ESR = 0x96000005
[  +0.000471]   EC = 0x25: DABT (current EL), IL = 32 bits
[  +0.000783]   SET = 0, FnV = 0
[  +0.000450]   EA = 0, S1PTW = 0
[  +0.000462] Data abort info:
[  +0.000530]   ISV = 0, ISS = 0x00000005
[  +0.000755]   CM = 0, WnR = 0
[  +0.000466] user pgtable: 64k pages, 48-bit VAs, pgdp=000000010717ce00
[  +0.001027] [0000000000000000] pgd=0000000000000000, 
p4d=0000000000000000, pud=0000000000000000
[  +0.001402] Internal error: Oops: 96000005 [#1] PREEMPT SMP

Message from syslogd@aa3 at Feb  2 08:18:05 ...
  kernel:Internal error: Oops: 96000005 [#1] PREEMPT SMP
[  +0.000958] Modules linked in: btrfs blake2b_generic xor xor_neon 
zstd_compress raid6_pq crct10dif_ce ip_tables x_tables ipv6
[  +0.001779] CPU: 25 PID: 5754 Comm: kworker/u64:1 Not tainted 
5.11.0-rc5+ #10
[  +0.001122] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[  +0.001286] Workqueue: btrfs-endio btrfs_work_helper [btrfs]
[  +0.001139] pstate: 20000005 (nzCv daif -PAN -UAO -TCO BTYPE=--)
[  +0.001110] pc : __crc32c_le+0x84/0xe8
[  +0.000726] lr : chksum_digest+0x24/0x40
[  +0.000731] sp : ffff800017def8f0
[  +0.000624] x29: ffff800017def8f0 x28: ffff0000c84dca00
[  +0.000994] x27: ffff0000c44f5400 x26: ffff0000e3a008b0
[  +0.000985] x25: ffff800011df3948 x24: 0000004000000000
[  +0.001006] x23: ffff000000000000 x22: ffff800017defa00
[  +0.000993] x21: 0000000000000004 x20: ffff0000c84dca50
[  +0.000983] x19: ffff800017defc88 x18: 0000000000000010
[  +0.000995] x17: 0000000000000000 x16: ffff800009352a98
[  +0.001008] x15: 000009a9d48628c0 x14: 0000000000000209
[  +0.000999] x13: 00000000000003d1 x12: 0000000000000001
[  +0.000986] x11: 0000000000000001 x10: 00000000000009d0
[  +0.000982] x9 : ffff0000c5418064 x8 : 0000000000000000
[  +0.001008] x7 : 0000000000000000 x6 : ffff800011f23980
[  +0.001025] x5 : 00000000006f6964 x4 : ffff8000105dd7a8
[  +0.000997] x3 : ffff800017defc88 x2 : 0000000000010000
[  +0.000986] x1 : 0000000000000000 x0 : 00000000ffffffff
[  +0.001011] Call trace:
[  +0.000459]  __crc32c_le+0x84/0xe8
[  +0.000649]  crypto_shash_digest+0x34/0x58
[  +0.000766]  check_compressed_csum+0xd0/0x2b0 [btrfs]
[  +0.001011]  end_compressed_bio_read+0xb8/0x308 [btrfs]
[  +0.001060]  bio_endio+0x12c/0x1d8
[  +0.000651]  end_workqueue_fn+0x3c/0x60 [btrfs]
[  +0.000916]  btrfs_work_helper+0xf4/0x5a8 [btrfs]
[  +0.000934]  process_one_work+0x1ec/0x4c0
[  +0.000751]  worker_thread+0x48/0x478
[  +0.000701]  kthread+0x158/0x160
[  +0.000618]  ret_from_fork+0x10/0x34
[  +0.000697] Code: 9ac55c08 9ac65d08 1a880000 b4000122 (a8c21023)
[  +0.001075] ---[ end trace d4f31b4f11a947b7 ]---
[ +14.775765] note: kworker/u64:1[5754] exited with preempt_count 1
------------------------


Thanks, Anand



On 2/2/2021 7:28 PM, Anand Jain wrote:
> 
> 
> On 2/2/2021 6:23 PM, Qu Wenruo wrote:
>>
>>
>> On 2021/2/2 5:21 PM, Anand Jain wrote:
>>>
>>> Qu,
>>>
>>>   fstests ran fine on an aarch64 kvm with this patch set.
>>
>> Do you mean subpage patchset?
>>
>> With 4K sector size?
>> No way it can run fine...
> 
>   No . fstests ran with sectorsize == pagesize == 64k. These aren't
>   subpage though. I mean just regression checks.
> 
>> Long enough fsstress can crash the kernel with btrfs_csum_one_bio()
>> unable to locate the corresponding ordered extent.
>>
>>>   Further, I was running few hand tests as below, and it fails
>>>   with - Unable to handle kernel paging.
>>>
>>>   Test case looks something like..
>>>
>>>   On x86_64 create btrfs on a file 11g
>>>   copy /usr into /test-mnt stops at enospc
>>>   set compression property on the root sunvol
>>>   run defrag with -czstd
>>
>> I don't even consider compression a supported feature for subpage.
> 
>   It should fail the ro mount, which it didn't. Similar test case
>   without compression is fine.
> 
>> Are you really talking about the subpage patchset with 4K sector size,
>> on 64K page size AArch64?
> 
>   yes readonly mount test case as above.
> 
> Thanks, Anand
> 
> 
>> If really so, I appreciate your effort on testing very much, it means
>> the patchset is doing way better than it is.
>> But I don't really believe it's even true to pass fstests....
> 
> 
> 
>> Thanks,
>> Qu
>>
>>>   truncate a large file 4gb
>>>   punch holes on it
>>>   truncate couple of smaller files
>>>   unmount
>>>   send file to an aarch64 (64k pagesize) kvm
>>>   mount -o ro
>>>   run sha256sum on all the files
>>>
>>> ---------------------
>>> [37012.027764] BTRFS warning (device loop0): csum failed root 5 ino 611
>>> off 228659200 csum 0x1dcefc2d expected csum 0x69412d2a mirror 1
>>> [37012.030971] BTRFS error (device loop0): bdev /dev/loop0 errs: wr 0,
>>> rd 0, flush 0, corrupt 9, gen 0
>>> [37012.036223] BTRFS warning (device loop0): csum failed root 5 ino 616
>>> off 228724736 csum 0x73f63661 expected csum 0xaf922a6f mirror 1
>>> [37012.036250] BTRFS error (device loop0): bdev /dev/loop0 errs: wr 0,
>>> rd 0, flush 0, corrupt 10, gen 0
>>> [37012.123917] Unable to handle kernel paging request at virtual address
>>> 0061d1f66c080000
>>> [37012.126104] Mem abort info:
>>> [37012.126951]   ESR = 0x96000004
>>> [37012.127791]   EC = 0x25: DABT (current EL), IL = 32 bits
>>> [37012.129207]   SET = 0, FnV = 0
>>> [37012.130043]   EA = 0, S1PTW = 0
>>> [37012.131269] Data abort info:
>>> [37012.132165]   ISV = 0, ISS = 0x00000004
>>> [37012.133211]   CM = 0, WnR = 0
>>> [37012.134014] [0061d1f66c080000] address between user and kernel
>>> address ranges
>>> [37012.136050] Internal error: Oops: 96000004 [#1] PREEMPT SMP
>>> [37012.137567] Modules linked in: btrfs blake2b_generic xor xor_neon
>>> zstd_compress raid6_pq crct10dif_ce ip_tables x_tables ipv6
>>> [37012.140742] CPU: 0 PID: 289001 Comm: kworker/u64:3 Not tainted
>>> 5.11.0-rc5+ #10
>>> [37012.142839] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0
>>> 02/06/2015
>>> [37012.144787] Workqueue: btrfs-endio btrfs_work_helper [btrfs]
>>> [37012.146474] pstate: 20000005 (nzCv daif -PAN -UAO -TCO BTYPE=--)
>>> [37012.148175] pc : __crc32c_le+0x84/0xe8
>>> [37012.149266] lr : chksum_digest+0x24/0x40
>>> [37012.150420] sp : ffff80001638f8f0
>>> [37012.151491] x29: ffff80001638f8f0 x28: ffff0000c7bb0000
>>> [37012.152982] x27: ffff0000d1a27000 x26: ffff0002f21b56e0
>>> [37012.154565] x25: ffff800011df3948 x24: 0000004000000000
>>> [37012.156063] x23: ffff000000000000 x22: ffff80001638fa00
>>> [37012.157570] x21: 0000000000000004 x20: ffff0000c7bb0050
>>> [37012.159145] x19: ffff80001638fc88 x18: 0000000000000000
>>> [37012.160684] x17: 0000000000000000 x16: 0000000000000000
>>> [37012.162190] x15: 0000051d5454c764 x14: 000000000000017a
>>> [37012.163774] x13: 0000000000000145 x12: 0000000000000001
>>> [37012.165282] x11: 0000000000000000 x10: 00000000000009d0
>>> [37012.166849] x9 : ffff0000ca305564 x8 : 0000000000000000
>>> [37012.168395] x7 : 0000000000000000 x6 : ffff800011f23980
>>> [37012.169883] x5 : 00000000006f6964 x4 : ffff8000105dd7a8
>>> [37012.171476] x3 : ffff80001638fc88 x2 : 0000000000010000
>>> [37012.172997] x1 : bc61d1f66c080000 x0 : 00000000ffffffff
>>> [37012.174642] Call trace:
>>> [37012.175427]  __crc32c_le+0x84/0xe8
>>> [37012.176419]  crypto_shash_digest+0x34/0x58
>>> [37012.177616]  check_compressed_csum+0xd0/0x2b0 [btrfs]
>>> [37012.179160]  end_compressed_bio_read+0xb8/0x308 [btrfs]
>>> [37012.180731]  bio_endio+0x12c/0x1d8
>>> [37012.181712]  end_workqueue_fn+0x3c/0x60 [btrfs]
>>> [37012.183161]  btrfs_work_helper+0xf4/0x5a8 [btrfs]
>>> [37012.184570]  process_one_work+0x1ec/0x4c0
>>> [37012.185727]  worker_thread+0x48/0x478
>>> [37012.186823]  kthread+0x158/0x160
>>> [37012.187768]  ret_from_fork+0x10/0x34
>>> [37012.188791] Code: 9ac55c08 9ac65d08 1a880000 b4000122 (a8c21023)
>>> [37012.190486] ---[ end trace 4f73e813d058b84c ]---
>>> [37019.180684] note: kworker/u64:3[289001] exited with preempt_count 1
>>> ---------------
>>>
>>>   Could you please take a look?
>>>
>>> Thanks, Anand


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v5 00/18] btrfs: add read-only support for subpage sector size
  2021-01-26  8:33 [PATCH v5 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (20 preceding siblings ...)
  2021-02-02  9:21 ` [bug report] Unable to handle kernel paging request Anand Jain
@ 2021-02-03 13:20 ` David Sterba
  21 siblings, 0 replies; 52+ messages in thread
From: David Sterba @ 2021-02-03 13:20 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Jan 26, 2021 at 04:33:44PM +0800, Qu Wenruo wrote:
> Qu Wenruo (18):
>   btrfs: merge PAGE_CLEAR_DIRTY and PAGE_SET_WRITEBACK to
>     PAGE_START_WRITEBACK
>   btrfs: set UNMAPPED bit early in btrfs_clone_extent_buffer() for
>     subpage support
>   btrfs: introduce the skeleton of btrfs_subpage structure
>   btrfs: make attach_extent_buffer_page() handle subpage case
>   btrfs: make grab_extent_buffer_from_page() handle subpage case
>   btrfs: support subpage for extent buffer page release
>   btrfs: attach private to dummy extent buffer pages
>   btrfs: introduce helpers for subpage uptodate status
>   btrfs: introduce helpers for subpage error status
>   btrfs: support subpage in set/clear_extent_buffer_uptodate()
>   btrfs: support subpage in btrfs_clone_extent_buffer
>   btrfs: support subpage in try_release_extent_buffer()
>   btrfs: introduce read_extent_buffer_subpage()
>   btrfs: support subpage in endio_readpage_update_page_status()
>   btrfs: introduce subpage metadata validation check
>   btrfs: introduce btrfs_subpage for data inodes
>   btrfs: integrate page status update for data read path into
>     begin/end_page_read()
>   btrfs: allow RO mount of 4K sector size fs on 64K page system

This is now in misc-next, with patch 17 replaced by the version sent
recently, thanks.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [bug report] Unable to handle kernel paging request
  2021-02-02 13:37       ` Anand Jain
@ 2021-02-04  5:13         ` Qu Wenruo
  0 siblings, 0 replies; 52+ messages in thread
From: Qu Wenruo @ 2021-02-04  5:13 UTC (permalink / raw)
  To: Anand Jain, Qu Wenruo, linux-btrfs



On 2021/2/2 9:37 PM, Anand Jain wrote:
>
>
> It is much simpler to reproduce. I am using two systems with different
> pagesizes to test the subpage readonly support.
>
> On a host with pagesize = 4k.
>    truncate -s 3g 3g.img
>    mkfs.btrfs ./3g.img
>    mount -o loop,compress=zstd ./3g.img /btrfs
>    xfs_io -f -c "pwrite -S 0xab 0 128k" /btrfs/foo
>    umount /btrfs
>
> Copy the file 3g.img to another host with pagesize = 64k.
>    mount -o ro,loop ./3g.img /btrfs
>    sha256sum /btrfs/foo
>
>    leads to Unable to handle kernel NULL pointer dereference

Thanks for the report.

In my case, though, I can't reproduce the crash, only a csum data
mismatch with the "csum hole found for disk bytenr range" error message.

Anyway, it should be fixed for compressed read.

I'll investigate the case.

Thanks,
Qu
> ----------------
> [  +0.001387] BTRFS warning (device loop0): csum hole found for disk
> bytenr range [13672448, 13676544)
> [  +0.001514] BTRFS warning (device loop0): csum failed root 5 ino 257
> off 13697024 csum 0xbcd798f5 expected csum 0xf11c5ebf mirror 1
> [  +0.002301] BTRFS error (device loop0): bdev /dev/loop0 errs: wr 0, rd
> 0, flush 0, corrupt 1, gen 0
> [  +0.001647] Unable to handle kernel NULL pointer dereference at
> virtual address 0000000000000000
> [  +0.001670] Mem abort info:
> [  +0.000506]   ESR = 0x96000005
> [  +0.000471]   EC = 0x25: DABT (current EL), IL = 32 bits
> [  +0.000783]   SET = 0, FnV = 0
> [  +0.000450]   EA = 0, S1PTW = 0
> [  +0.000462] Data abort info:
> [  +0.000530]   ISV = 0, ISS = 0x00000005
> [  +0.000755]   CM = 0, WnR = 0
> [  +0.000466] user pgtable: 64k pages, 48-bit VAs, pgdp=000000010717ce00
> [  +0.001027] [0000000000000000] pgd=0000000000000000,
> p4d=0000000000000000, pud=0000000000000000
> [  +0.001402] Internal error: Oops: 96000005 [#1] PREEMPT SMP
>
> Message from syslogd@aa3 at Feb  2 08:18:05 ...
>   kernel:Internal error: Oops: 96000005 [#1] PREEMPT SMP
> [  +0.000958] Modules linked in: btrfs blake2b_generic xor xor_neon
> zstd_compress raid6_pq crct10dif_ce ip_tables x_tables ipv6
> [  +0.001779] CPU: 25 PID: 5754 Comm: kworker/u64:1 Not tainted
> 5.11.0-rc5+ #10
> [  +0.001122] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0
> 02/06/2015
> [  +0.001286] Workqueue: btrfs-endio btrfs_work_helper [btrfs]
> [  +0.001139] pstate: 20000005 (nzCv daif -PAN -UAO -TCO BTYPE=--)
> [  +0.001110] pc : __crc32c_le+0x84/0xe8
> [  +0.000726] lr : chksum_digest+0x24/0x40
> [  +0.000731] sp : ffff800017def8f0
> [  +0.000624] x29: ffff800017def8f0 x28: ffff0000c84dca00
> [  +0.000994] x27: ffff0000c44f5400 x26: ffff0000e3a008b0
> [  +0.000985] x25: ffff800011df3948 x24: 0000004000000000
> [  +0.001006] x23: ffff000000000000 x22: ffff800017defa00
> [  +0.000993] x21: 0000000000000004 x20: ffff0000c84dca50
> [  +0.000983] x19: ffff800017defc88 x18: 0000000000000010
> [  +0.000995] x17: 0000000000000000 x16: ffff800009352a98
> [  +0.001008] x15: 000009a9d48628c0 x14: 0000000000000209
> [  +0.000999] x13: 00000000000003d1 x12: 0000000000000001
> [  +0.000986] x11: 0000000000000001 x10: 00000000000009d0
> [  +0.000982] x9 : ffff0000c5418064 x8 : 0000000000000000
> [  +0.001008] x7 : 0000000000000000 x6 : ffff800011f23980
> [  +0.001025] x5 : 00000000006f6964 x4 : ffff8000105dd7a8
> [  +0.000997] x3 : ffff800017defc88 x2 : 0000000000010000
> [  +0.000986] x1 : 0000000000000000 x0 : 00000000ffffffff
> [  +0.001011] Call trace:
> [  +0.000459]  __crc32c_le+0x84/0xe8
> [  +0.000649]  crypto_shash_digest+0x34/0x58
> [  +0.000766]  check_compressed_csum+0xd0/0x2b0 [btrfs]
> [  +0.001011]  end_compressed_bio_read+0xb8/0x308 [btrfs]
> [  +0.001060]  bio_endio+0x12c/0x1d8
> [  +0.000651]  end_workqueue_fn+0x3c/0x60 [btrfs]
> [  +0.000916]  btrfs_work_helper+0xf4/0x5a8 [btrfs]
> [  +0.000934]  process_one_work+0x1ec/0x4c0
> [  +0.000751]  worker_thread+0x48/0x478
> [  +0.000701]  kthread+0x158/0x160
> [  +0.000618]  ret_from_fork+0x10/0x34
> [  +0.000697] Code: 9ac55c08 9ac65d08 1a880000 b4000122 (a8c21023)
> [  +0.001075] ---[ end trace d4f31b4f11a947b7 ]---
> [ +14.775765] note: kworker/u64:1[5754] exited with preempt_count 1
> ------------------------
>
>
> Thanks, Anand
>
>
>
> On 2/2/2021 7:28 PM, Anand Jain wrote:
>>
>>
>> On 2/2/2021 6:23 PM, Qu Wenruo wrote:
>>>
>>>
>>> On 2021/2/2 5:21 PM, Anand Jain wrote:
>>>>
>>>> Qu,
>>>>
>>>>   fstests ran fine on an aarch64 kvm with this patch set.
>>>
>>> Do you mean subpage patchset?
>>>
>>> With 4K sector size?
>>> No way it can run fine...
>>
>>   No . fstests ran with sectorsize == pagesize == 64k. These aren't
>>   subpage though. I mean just regression checks.
>>
>>> Long enough fsstress can crash the kernel with btrfs_csum_one_bio()
>>> unable to locate the corresponding ordered extent.
>>>
>>>>   Further, I was running few hand tests as below, and it fails
>>>>   with - Unable to handle kernel paging.
>>>>
>>>>   Test case looks something like..
>>>>
>>>>   On x86_64 create btrfs on a file 11g
>>>>   copy /usr into /test-mnt stops at enospc
>>>>   set compression property on the root sunvol
>>>>   run defrag with -czstd
>>>
>>> I don't even consider compression a supported feature for subpage.
>>
>>   It should fail the ro mount, which it didn't. Similar test case
>>   without compression is fine.
>>
>>> Are you really talking about the subpage patchset with 4K sector size,
>>> on 64K page size AArch64?
>>
>>   yes readonly mount test case as above.
>>
>> Thanks, Anand
>>
>>
>>> If really so, I appreciate your effort on testing very much, it means
>>> the patchset is doing way better than it is.
>>> But I don't really believe it's even true to pass fstests....
>>
>>
>>
>>> Thanks,
>>> Qu
>>>
>>>>   truncate a large file 4gb
>>>>   punch holes on it
>>>>   truncate couple of smaller files
>>>>   unmount
>>>>   send file to an aarch64 (64k pagesize) kvm
>>>>   mount -o ro
>>>>   run sha256sum on all the files
>>>>
>>>> ---------------------
>>>> [37012.027764] BTRFS warning (device loop0): csum failed root 5 ino 611
>>>> off 228659200 csum 0x1dcefc2d expected csum 0x69412d2a mirror 1
>>>> [37012.030971] BTRFS error (device loop0): bdev /dev/loop0 errs: wr 0,
>>>> rd 0, flush 0, corrupt 9, gen 0
>>>> [37012.036223] BTRFS warning (device loop0): csum failed root 5 ino 616
>>>> off 228724736 csum 0x73f63661 expected csum 0xaf922a6f mirror 1
>>>> [37012.036250] BTRFS error (device loop0): bdev /dev/loop0 errs: wr 0,
>>>> rd 0, flush 0, corrupt 10, gen 0
>>>> [37012.123917] Unable to handle kernel paging request at virtual
>>>> address
>>>> 0061d1f66c080000
>>>> [37012.126104] Mem abort info:
>>>> [37012.126951]   ESR = 0x96000004
>>>> [37012.127791]   EC = 0x25: DABT (current EL), IL = 32 bits
>>>> [37012.129207]   SET = 0, FnV = 0
>>>> [37012.130043]   EA = 0, S1PTW = 0
>>>> [37012.131269] Data abort info:
>>>> [37012.132165]   ISV = 0, ISS = 0x00000004
>>>> [37012.133211]   CM = 0, WnR = 0
>>>> [37012.134014] [0061d1f66c080000] address between user and kernel
>>>> address ranges
>>>> [37012.136050] Internal error: Oops: 96000004 [#1] PREEMPT SMP
>>>> [37012.137567] Modules linked in: btrfs blake2b_generic xor xor_neon
>>>> zstd_compress raid6_pq crct10dif_ce ip_tables x_tables ipv6
>>>> [37012.140742] CPU: 0 PID: 289001 Comm: kworker/u64:3 Not tainted
>>>> 5.11.0-rc5+ #10
>>>> [37012.142839] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0
>>>> 02/06/2015
>>>> [37012.144787] Workqueue: btrfs-endio btrfs_work_helper [btrfs]
>>>> [37012.146474] pstate: 20000005 (nzCv daif -PAN -UAO -TCO BTYPE=--)
>>>> [37012.148175] pc : __crc32c_le+0x84/0xe8
>>>> [37012.149266] lr : chksum_digest+0x24/0x40
>>>> [37012.150420] sp : ffff80001638f8f0
>>>> [37012.151491] x29: ffff80001638f8f0 x28: ffff0000c7bb0000
>>>> [37012.152982] x27: ffff0000d1a27000 x26: ffff0002f21b56e0
>>>> [37012.154565] x25: ffff800011df3948 x24: 0000004000000000
>>>> [37012.156063] x23: ffff000000000000 x22: ffff80001638fa00
>>>> [37012.157570] x21: 0000000000000004 x20: ffff0000c7bb0050
>>>> [37012.159145] x19: ffff80001638fc88 x18: 0000000000000000
>>>> [37012.160684] x17: 0000000000000000 x16: 0000000000000000
>>>> [37012.162190] x15: 0000051d5454c764 x14: 000000000000017a
>>>> [37012.163774] x13: 0000000000000145 x12: 0000000000000001
>>>> [37012.165282] x11: 0000000000000000 x10: 00000000000009d0
>>>> [37012.166849] x9 : ffff0000ca305564 x8 : 0000000000000000
>>>> [37012.168395] x7 : 0000000000000000 x6 : ffff800011f23980
>>>> [37012.169883] x5 : 00000000006f6964 x4 : ffff8000105dd7a8
>>>> [37012.171476] x3 : ffff80001638fc88 x2 : 0000000000010000
>>>> [37012.172997] x1 : bc61d1f66c080000 x0 : 00000000ffffffff
>>>> [37012.174642] Call trace:
>>>> [37012.175427]  __crc32c_le+0x84/0xe8
>>>> [37012.176419]  crypto_shash_digest+0x34/0x58
>>>> [37012.177616]  check_compressed_csum+0xd0/0x2b0 [btrfs]
>>>> [37012.179160]  end_compressed_bio_read+0xb8/0x308 [btrfs]
>>>> [37012.180731]  bio_endio+0x12c/0x1d8
>>>> [37012.181712]  end_workqueue_fn+0x3c/0x60 [btrfs]
>>>> [37012.183161]  btrfs_work_helper+0xf4/0x5a8 [btrfs]
>>>> [37012.184570]  process_one_work+0x1ec/0x4c0
>>>> [37012.185727]  worker_thread+0x48/0x478
>>>> [37012.186823]  kthread+0x158/0x160
>>>> [37012.187768]  ret_from_fork+0x10/0x34
>>>> [37012.188791] Code: 9ac55c08 9ac65d08 1a880000 b4000122 (a8c21023)
>>>> [37012.190486] ---[ end trace 4f73e813d058b84c ]---
>>>> [37019.180684] note: kworker/u64:3[289001] exited with preempt_count 1
>>>> ---------------
>>>>
>>>>   Could you please take a look?
>>>>
>>>> Thanks, Anand
>

^ permalink raw reply	[flat|nested] 52+ messages in thread

end of thread, other threads:[~2021-02-04  5:15 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-26  8:33 [PATCH v5 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
2021-01-26  8:33 ` [PATCH v5 01/18] btrfs: merge PAGE_CLEAR_DIRTY and PAGE_SET_WRITEBACK to PAGE_START_WRITEBACK Qu Wenruo
2021-01-27 15:56   ` Josef Bacik
2021-01-26  8:33 ` [PATCH v5 02/18] btrfs: set UNMAPPED bit early in btrfs_clone_extent_buffer() for subpage support Qu Wenruo
2021-01-27 15:56   ` Josef Bacik
2021-01-26  8:33 ` [PATCH v5 03/18] btrfs: introduce the skeleton of btrfs_subpage structure Qu Wenruo
2021-01-26  8:33 ` [PATCH v5 04/18] btrfs: make attach_extent_buffer_page() handle subpage case Qu Wenruo
2021-01-27 16:01   ` Josef Bacik
2021-01-26  8:33 ` [PATCH v5 05/18] btrfs: make grab_extent_buffer_from_page() " Qu Wenruo
2021-01-27 16:20   ` Josef Bacik
2021-01-26  8:33 ` [PATCH v5 06/18] btrfs: support subpage for extent buffer page release Qu Wenruo
2021-01-27 16:21   ` Josef Bacik
2021-02-01 15:32     ` David Sterba
2021-01-26  8:33 ` [PATCH v5 07/18] btrfs: attach private to dummy extent buffer pages Qu Wenruo
2021-01-27 16:21   ` Josef Bacik
2021-01-26  8:33 ` [PATCH v5 08/18] btrfs: introduce helpers for subpage uptodate status Qu Wenruo
2021-01-27 16:34   ` Josef Bacik
2021-01-26  8:33 ` [PATCH v5 09/18] btrfs: introduce helpers for subpage error status Qu Wenruo
2021-01-27 16:34   ` Josef Bacik
2021-01-26  8:33 ` [PATCH v5 10/18] btrfs: support subpage in set/clear_extent_buffer_uptodate() Qu Wenruo
2021-01-27 16:35   ` Josef Bacik
2021-01-26  8:33 ` [PATCH v5 11/18] btrfs: support subpage in btrfs_clone_extent_buffer Qu Wenruo
2021-01-27 16:35   ` Josef Bacik
2021-01-26  8:33 ` [PATCH v5 12/18] btrfs: support subpage in try_release_extent_buffer() Qu Wenruo
2021-01-27 16:37   ` Josef Bacik
2021-01-26  8:33 ` [PATCH v5 13/18] btrfs: introduce read_extent_buffer_subpage() Qu Wenruo
2021-01-27 16:39   ` Josef Bacik
2021-01-26  8:33 ` [PATCH v5 14/18] btrfs: support subpage in endio_readpage_update_page_status() Qu Wenruo
2021-01-27 16:42   ` Josef Bacik
2021-01-26  8:33 ` [PATCH v5 15/18] btrfs: introduce subpage metadata validation check Qu Wenruo
2021-01-27 16:47   ` Josef Bacik
2021-01-26  8:34 ` [PATCH v5 16/18] btrfs: introduce btrfs_subpage for data inodes Qu Wenruo
2021-01-27 16:56   ` Josef Bacik
2021-02-01 15:42     ` David Sterba
2021-01-26  8:34 ` [PATCH v5 17/18] btrfs: integrate page status update for data read path into begin/end_page_read() Qu Wenruo
2021-01-27 17:13   ` Josef Bacik
2021-02-01 15:47     ` David Sterba
2021-01-26  8:34 ` [PATCH v5 18/18] btrfs: allow RO mount of 4K sector size fs on 64K page system Qu Wenruo
2021-01-27 17:13   ` Josef Bacik
2021-02-01 15:49   ` David Sterba
2021-01-27 16:17 ` [PATCH v5 00/18] btrfs: add read-only support for subpage sector size Josef Bacik
2021-01-28  0:30   ` Qu Wenruo
2021-01-28 10:34     ` David Sterba
2021-01-28 10:51       ` Qu Wenruo
2021-02-01 14:50         ` David Sterba
2021-02-01 15:55 ` David Sterba
2021-02-02  9:21 ` [bug report] Unable to handle kernel paging request Anand Jain
2021-02-02 10:23   ` Qu Wenruo
2021-02-02 11:28     ` Anand Jain
2021-02-02 13:37       ` Anand Jain
2021-02-04  5:13         ` Qu Wenruo
2021-02-03 13:20 ` [PATCH v5 00/18] btrfs: add read-only support for subpage sector size David Sterba
