* [PATCH 00/12] btrfs: support read-write for subpage metadata
@ 2021-02-22  6:33 Qu Wenruo
  2021-02-22  6:33 ` [PATCH 01/12] btrfs: subpage: introduce helper for subpage dirty status Qu Wenruo
                   ` (13 more replies)
  0 siblings, 14 replies; 19+ messages in thread
From: Qu Wenruo @ 2021-02-22  6:33 UTC (permalink / raw)
  To: linux-btrfs

This patchset can be fetched from the following github repo, along with
the full subpage RW support:
https://github.com/adam900710/linux/tree/subpage

This patchset is for metadata read write support.

[TEST]
Since the data write path is not included in this patchset, we can't
really test it on its own, but during the lunar new year vacation I
tested the full RW patchset with "fsstress -n 10000 -p2" on my Aarch64
board.

The full RW patchset survived a full week without any crash.

Only one bug exposed during the test remains: random data checksum
mismatches, which are still under investigation.

But the metadata part should be OK for submission.

[DIFFERENCE AGAINST REGULAR SECTORSIZE]
The metadata part in fact has more new code than the data part, as it
behaves differently from the regular sector size handling:

- No more page locking
  Now metadata read/write relies on extent io tree locking rather than
  page locking.
  This allows behaviors like read locking one eb while also trying to
  read lock another eb in the same page.
  We can't rely on the page lock as now we have multiple extent buffers
  in the same page.

- Page status update
  Now we use subpage wrappers to handle page status update.

- How to submit dirty extent buffers
  Instead of just grabbing extent buffer from page::private, we need to
  iterate all dirty extent buffers in the page and submit them.

Qu Wenruo (12):
  btrfs: subpage: introduce helper for subpage dirty status
  btrfs: subpage: introduce helper for subpage writeback status
  btrfs: disk-io: allow btree_set_page_dirty() to do more sanity check
    on subpage metadata
  btrfs: disk-io: support subpage metadata csum calculation at write
    time
  btrfs: extent_io: make alloc_extent_buffer() check subpage dirty
    bitmap
  btrfs: extent_io: make the page uptodate assert check to handle
    subpage
  btrfs: extent_io: make set/clear_extent_buffer_dirty() to support
    subpage sized metadata
  btrfs: extent_io: make set_btree_ioerr() accept extent buffer and
    handle subpage metadata
  btrfs: extent_io: introduce end_bio_subpage_eb_writepage() function
  btrfs: extent_io: introduce write_one_subpage_eb() function
  btrfs: extent_io: make lock_extent_buffer_for_io() to support subpage
    metadata
  btrfs: extent_io: introduce submit_eb_subpage() to submit a subpage
    metadata page

 fs/btrfs/disk-io.c   | 143 +++++++++++----
 fs/btrfs/extent_io.c | 420 ++++++++++++++++++++++++++++++++++++-------
 fs/btrfs/subpage.c   |  72 ++++++++
 fs/btrfs/subpage.h   |  17 ++
 4 files changed, 559 insertions(+), 93 deletions(-)

-- 
2.30.0



* [PATCH 01/12] btrfs: subpage: introduce helper for subpage dirty status
  2021-02-22  6:33 [PATCH 00/12] btrfs: support read-write for subpage metadata Qu Wenruo
@ 2021-02-22  6:33 ` Qu Wenruo
  2021-02-22  6:33 ` [PATCH 02/12] btrfs: subpage: introduce helper for subpage writeback status Qu Wenruo
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Qu Wenruo @ 2021-02-22  6:33 UTC (permalink / raw)
  To: linux-btrfs

This patch introduces the following functions to handle btrfs subpage
dirty status:
- btrfs_subpage_set_dirty()
- btrfs_subpage_clear_dirty()
- btrfs_subpage_test_dirty()
  Those helpers can only be called when the range is ensured to be
  inside the page.

- btrfs_page_set_dirty()
- btrfs_page_clear_dirty()
- btrfs_page_test_dirty()
  Those helpers can handle both regular sector size and subpage without
  problem.
  Thus those would be used to replace PageDirty() related calls in
  later commits.

One point to note: just like set_page_dirty() and
clear_page_dirty_for_io(), btrfs_*page_set_dirty() and
btrfs_*page_clear_dirty() must be called with the page locked.
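
To illustrate the bitmap handling described above, here is a minimal
userspace sketch. It is not the kernel code: struct model_page, the
model_*() names and the 64K page / 4K sector geometry are assumptions
made for the example, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed geometry: 64K page, 4K sectors, so 16 bits cover one page. */
#define MODEL_PAGE_SIZE   65536u
#define MODEL_SECTORSIZE  4096u

/* Hypothetical stand-in for a page plus its subpage dirty state. */
struct model_page {
	uint64_t page_start;   /* file offset of the page */
	uint16_t dirty_bitmap; /* one bit per sector */
	bool page_dirty;       /* stands in for PageDirty() */
};

/* Build the mask for [start, start + len), which must lie inside the page. */
uint16_t model_calc_bitmap(const struct model_page *p,
			   uint64_t start, uint32_t len)
{
	assert(start >= p->page_start);
	assert(start + len <= p->page_start + MODEL_PAGE_SIZE);
	assert(len > 0 && start % MODEL_SECTORSIZE == 0 &&
	       len % MODEL_SECTORSIZE == 0);

	unsigned int first = (unsigned int)((start - p->page_start) /
					    MODEL_SECTORSIZE);
	unsigned int nbits = len / MODEL_SECTORSIZE;

	return (uint16_t)(((1u << nbits) - 1) << first);
}

/* Dirty the range and the page as a whole. */
void model_set_dirty(struct model_page *p, uint64_t start, uint32_t len)
{
	p->dirty_bitmap |= model_calc_bitmap(p, start, len);
	p->page_dirty = true;
}

/* Return true when the cleared range held the last dirty bits. */
bool model_clear_and_test_dirty(struct model_page *p,
				uint64_t start, uint32_t len)
{
	p->dirty_bitmap &= (uint16_t)~model_calc_bitmap(p, start, len);
	return p->dirty_bitmap == 0;
}

/* Clear the range; drop the page dirty flag only for the last range. */
void model_clear_dirty(struct model_page *p, uint64_t start, uint32_t len)
{
	if (model_clear_and_test_dirty(p, start, len))
		p->page_dirty = false;
}
```

With two 16K tree blocks in the page, clearing the first one leaves the
page dirty; only clearing the second one drops the page dirty flag.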

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/subpage.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/subpage.h | 15 +++++++++++++++
 2 files changed, 57 insertions(+)

diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index c69049e7daa9..16dd6fcd258d 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -220,6 +220,45 @@ void btrfs_subpage_clear_error(const struct btrfs_fs_info *fs_info,
 	spin_unlock_irqrestore(&subpage->lock, flags);
 }
 
+void btrfs_subpage_set_dirty(const struct btrfs_fs_info *fs_info,
+		struct page *page, u64 start, u32 len)
+{
+	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
+	u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len);
+	unsigned long flags;
+
+	spin_lock_irqsave(&subpage->lock, flags);
+	subpage->dirty_bitmap |= tmp;
+	spin_unlock_irqrestore(&subpage->lock, flags);
+	set_page_dirty(page);
+}
+
+bool btrfs_subpage_clear_and_test_dirty(const struct btrfs_fs_info *fs_info,
+		struct page *page, u64 start, u32 len)
+{
+	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
+	u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len);
+	unsigned long flags;
+	bool last = false;
+
+
+	spin_lock_irqsave(&subpage->lock, flags);
+	subpage->dirty_bitmap &= ~tmp;
+	if (subpage->dirty_bitmap == 0)
+		last = true;
+	spin_unlock_irqrestore(&subpage->lock, flags);
+	return last;
+}
+
+void btrfs_subpage_clear_dirty(const struct btrfs_fs_info *fs_info,
+		struct page *page, u64 start, u32 len)
+{
+	bool last;
+	last = btrfs_subpage_clear_and_test_dirty(fs_info, page, start, len);
+	if (last)
+		clear_page_dirty_for_io(page);
+}
+
 /*
  * Unlike set/clear which is dependent on each page status, for test all bits
  * are tested in the same way.
@@ -240,6 +279,7 @@ bool btrfs_subpage_test_##name(const struct btrfs_fs_info *fs_info,	\
 }
 IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(uptodate);
 IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(error);
+IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(dirty);
 
 /*
  * Note that, in selftests (extent-io-tests), we can have empty fs_info passed
@@ -276,3 +316,5 @@ bool btrfs_page_test_##name(const struct btrfs_fs_info *fs_info,	\
 IMPLEMENT_BTRFS_PAGE_OPS(uptodate, SetPageUptodate, ClearPageUptodate,
 			 PageUptodate);
 IMPLEMENT_BTRFS_PAGE_OPS(error, SetPageError, ClearPageError, PageError);
+IMPLEMENT_BTRFS_PAGE_OPS(dirty, set_page_dirty, clear_page_dirty_for_io,
+			 PageDirty);
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index b86a4881475d..adaece5ce294 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -20,6 +20,7 @@ struct btrfs_subpage {
 	spinlock_t lock;
 	u16 uptodate_bitmap;
 	u16 error_bitmap;
+	u16 dirty_bitmap;
 	union {
 		/*
 		 * Structures only used by metadata
@@ -87,5 +88,19 @@ bool btrfs_page_test_##name(const struct btrfs_fs_info *fs_info,	\
 
 DECLARE_BTRFS_SUBPAGE_OPS(uptodate);
 DECLARE_BTRFS_SUBPAGE_OPS(error);
+DECLARE_BTRFS_SUBPAGE_OPS(dirty);
+
+/*
+ * Extra clear_and_test function for subpage dirty bitmap.
+ *
+ * Return true if we're the last bits in the dirty_bitmap and clear the
+ * dirty_bitmap.
+ * Return false otherwise.
+ *
+ * NOTE: Callers should manually clear page dirty for true case, as we have
+ * extra handling for tree blocks.
+ */
+bool btrfs_subpage_clear_and_test_dirty(const struct btrfs_fs_info *fs_info,
+		struct page *page, u64 start, u32 len);
 
 #endif
-- 
2.30.0



* [PATCH 02/12] btrfs: subpage: introduce helper for subpage writeback status
  2021-02-22  6:33 [PATCH 00/12] btrfs: support read-write for subpage metadata Qu Wenruo
  2021-02-22  6:33 ` [PATCH 01/12] btrfs: subpage: introduce helper for subpage dirty status Qu Wenruo
@ 2021-02-22  6:33 ` Qu Wenruo
  2021-02-22  6:33 ` [PATCH 03/12] btrfs: disk-io: allow btree_set_page_dirty() to do more sanity check on subpage metadata Qu Wenruo
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Qu Wenruo @ 2021-02-22  6:33 UTC (permalink / raw)
  To: linux-btrfs

This patch introduces the following functions to handle btrfs subpage
writeback status:
- btrfs_subpage_set_writeback()
- btrfs_subpage_clear_writeback()
- btrfs_subpage_test_writeback()
  Those helpers can only be called when the range is ensured to be
  inside the page.

- btrfs_page_set_writeback()
- btrfs_page_clear_writeback()
- btrfs_page_test_writeback()
  Those helpers can handle both regular sector size and subpage without
  problem.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/subpage.c | 30 ++++++++++++++++++++++++++++++
 fs/btrfs/subpage.h |  2 ++
 2 files changed, 32 insertions(+)

diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index 16dd6fcd258d..9bc212d16d3c 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -259,6 +259,33 @@ void btrfs_subpage_clear_dirty(const struct btrfs_fs_info *fs_info,
 		clear_page_dirty_for_io(page);
 }
 
+void btrfs_subpage_set_writeback(const struct btrfs_fs_info *fs_info,
+		struct page *page, u64 start, u32 len)
+{
+	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
+	u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len);
+	unsigned long flags;
+
+	spin_lock_irqsave(&subpage->lock, flags);
+	subpage->writeback_bitmap |= tmp;
+	set_page_writeback(page);
+	spin_unlock_irqrestore(&subpage->lock, flags);
+}
+
+void btrfs_subpage_clear_writeback(const struct btrfs_fs_info *fs_info,
+		struct page *page, u64 start, u32 len)
+{
+	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
+	u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len);
+	unsigned long flags;
+
+	spin_lock_irqsave(&subpage->lock, flags);
+	subpage->writeback_bitmap &= ~tmp;
+	if (subpage->writeback_bitmap == 0)
+		end_page_writeback(page);
+	spin_unlock_irqrestore(&subpage->lock, flags);
+}
+
 /*
  * Unlike set/clear which is dependent on each page status, for test all bits
  * are tested in the same way.
@@ -280,6 +307,7 @@ bool btrfs_subpage_test_##name(const struct btrfs_fs_info *fs_info,	\
 IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(uptodate);
 IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(error);
 IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(dirty);
+IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(writeback);
 
 /*
  * Note that, in selftests (extent-io-tests), we can have empty fs_info passed
@@ -318,3 +346,5 @@ IMPLEMENT_BTRFS_PAGE_OPS(uptodate, SetPageUptodate, ClearPageUptodate,
 IMPLEMENT_BTRFS_PAGE_OPS(error, SetPageError, ClearPageError, PageError);
 IMPLEMENT_BTRFS_PAGE_OPS(dirty, set_page_dirty, clear_page_dirty_for_io,
 			 PageDirty);
+IMPLEMENT_BTRFS_PAGE_OPS(writeback, set_page_writeback, end_page_writeback,
+			PageWriteback);
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index adaece5ce294..fe43267e31f3 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -21,6 +21,7 @@ struct btrfs_subpage {
 	u16 uptodate_bitmap;
 	u16 error_bitmap;
 	u16 dirty_bitmap;
+	u16 writeback_bitmap;
 	union {
 		/*
 		 * Structures only used by metadata
@@ -89,6 +90,7 @@ bool btrfs_page_test_##name(const struct btrfs_fs_info *fs_info,	\
 DECLARE_BTRFS_SUBPAGE_OPS(uptodate);
 DECLARE_BTRFS_SUBPAGE_OPS(error);
 DECLARE_BTRFS_SUBPAGE_OPS(dirty);
+DECLARE_BTRFS_SUBPAGE_OPS(writeback);
 
 /*
  * Extra clear_and_test function for subpage dirty bitmap.
-- 
2.30.0



* [PATCH 03/12] btrfs: disk-io: allow btree_set_page_dirty() to do more sanity check on subpage metadata
  2021-02-22  6:33 [PATCH 00/12] btrfs: support read-write for subpage metadata Qu Wenruo
  2021-02-22  6:33 ` [PATCH 01/12] btrfs: subpage: introduce helper for subpage dirty status Qu Wenruo
  2021-02-22  6:33 ` [PATCH 02/12] btrfs: subpage: introduce helper for subpage writeback status Qu Wenruo
@ 2021-02-22  6:33 ` Qu Wenruo
  2021-02-22  7:58   ` Su Yue
  2021-02-22  6:33 ` [PATCH 04/12] btrfs: disk-io: support subpage metadata csum calculation at write time Qu Wenruo
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 19+ messages in thread
From: Qu Wenruo @ 2021-02-22  6:33 UTC (permalink / raw)
  To: linux-btrfs

For btree_set_page_dirty(), we should also check the extent buffer
sanity for subpage support.

Unlike the regular sector size case, since one page can contain multiple
extent buffers, we need to make sure there is at least one dirty extent
buffer in the page.

So this patch iterates through btrfs_subpage::dirty_bitmap to find the
extent buffers, and checks that each dirty extent buffer in the page
range has EXTENT_BUFFER_DIRTY set and a non-zero ref count.
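
The bitmap walk can be sketched in userspace as below. This is only an
illustration: collect_dirty_eb_starts() and BITMAP_BITS are made-up
names, and a 16-bit bitmap (64K page, 4K sectors) is assumed. Note that
the bit counter has to start at zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BITMAP_BITS 16u /* assumed: 64K page / 4K sectors */

/*
 * Collect the start offsets of dirty extent buffers in one page.
 * Clean sectors are skipped one bit at a time; a dirty bit marks the
 * start of an extent buffer, so the walk jumps a whole nodesize ahead.
 * Returns the number of dirty extent buffers seen.
 */
size_t collect_dirty_eb_starts(uint16_t dirty_bitmap, uint64_t page_start,
			       uint32_t sectorsize, uint32_t nodesize,
			       uint64_t *out, size_t out_cap)
{
	unsigned int bits_per_eb = nodesize / sectorsize;
	unsigned int cur_bit = 0; /* must start at the first sector */
	size_t found = 0;

	assert(sectorsize > 0 && bits_per_eb > 0);

	while (cur_bit < BITMAP_BITS) {
		if (!(dirty_bitmap & (1u << cur_bit))) {
			cur_bit++;
			continue;
		}
		/* Store the offset only while the output array has room. */
		if (found < out_cap)
			out[found] = page_start + (uint64_t)cur_bit * sectorsize;
		found++;
		cur_bit += bits_per_eb;
	}
	return found;
}
```

For a bitmap of 0x0F0F with 16K nodesize, this reports two extent
buffers, at page_start and at page_start + 32K.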

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 47 ++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 41 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index c2576c5fe62e..437e6b2163c7 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -42,6 +42,7 @@
 #include "discard.h"
 #include "space-info.h"
 #include "zoned.h"
+#include "subpage.h"
 
 #define BTRFS_SUPER_FLAG_SUPP	(BTRFS_HEADER_FLAG_WRITTEN |\
 				 BTRFS_HEADER_FLAG_RELOC |\
@@ -992,14 +993,48 @@ static void btree_invalidatepage(struct page *page, unsigned int offset,
 static int btree_set_page_dirty(struct page *page)
 {
 #ifdef DEBUG
+	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
+	struct btrfs_subpage *subpage;
 	struct extent_buffer *eb;
+	int cur_bit = 0;
+	u64 page_start = page_offset(page);
+
+	if (fs_info->sectorsize == PAGE_SIZE) {
+		BUG_ON(!PagePrivate(page));
+		eb = (struct extent_buffer *)page->private;
+		BUG_ON(!eb);
+		BUG_ON(!test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
+		BUG_ON(!atomic_read(&eb->refs));
+		btrfs_assert_tree_locked(eb);
+		return __set_page_dirty_nobuffers(page);
+	}
+	ASSERT(PagePrivate(page) && page->private);
+	subpage = (struct btrfs_subpage *)page->private;
+
+	ASSERT(subpage->dirty_bitmap);
+	while (cur_bit < BTRFS_SUBPAGE_BITMAP_SIZE) {
+		unsigned long flags;
+		u64 cur;
+		u16 tmp = (1 << cur_bit);
+
+		spin_lock_irqsave(&subpage->lock, flags);
+		if (!(tmp & subpage->dirty_bitmap)) {
+			spin_unlock_irqrestore(&subpage->lock, flags);
+			cur_bit++;
+			continue;
+		}
+		spin_unlock_irqrestore(&subpage->lock, flags);
+		cur = page_start + cur_bit * fs_info->sectorsize;
 
-	BUG_ON(!PagePrivate(page));
-	eb = (struct extent_buffer *)page->private;
-	BUG_ON(!eb);
-	BUG_ON(!test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
-	BUG_ON(!atomic_read(&eb->refs));
-	btrfs_assert_tree_locked(eb);
+		eb = find_extent_buffer(fs_info, cur);
+		ASSERT(eb);
+		ASSERT(test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
+		ASSERT(atomic_read(&eb->refs));
+		btrfs_assert_tree_locked(eb);
+		free_extent_buffer(eb);
+
+		cur_bit += (fs_info->nodesize >> fs_info->sectorsize_bits);
+	}
 #endif
 	return __set_page_dirty_nobuffers(page);
 }
-- 
2.30.0



* [PATCH 04/12] btrfs: disk-io: support subpage metadata csum calculation at write time
  2021-02-22  6:33 [PATCH 00/12] btrfs: support read-write for subpage metadata Qu Wenruo
                   ` (2 preceding siblings ...)
  2021-02-22  6:33 ` [PATCH 03/12] btrfs: disk-io: allow btree_set_page_dirty() to do more sanity check on subpage metadata Qu Wenruo
@ 2021-02-22  6:33 ` Qu Wenruo
  2021-02-22  6:33 ` [PATCH 05/12] btrfs: extent_io: make alloc_extent_buffer() check subpage dirty bitmap Qu Wenruo
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Qu Wenruo @ 2021-02-22  6:33 UTC (permalink / raw)
  To: linux-btrfs

Add a new helper, csum_dirty_subpage_buffers(), to iterate through all
dirty extent buffers in one bvec.

Also extract the code of calculating csum for one extent buffer into
csum_one_extent_buffer(), so that both the existing csum_dirty_buffer()
and the new csum_dirty_subpage_buffers() can reuse the same routine.
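
A simplified userspace model of the per-bvec iteration is sketched
below. struct model_eb, model_find_eb() and model_csum_range() are
hypothetical, the lookup is a plain array scan rather than the real
find_extent_buffer(), and the checksum itself is reduced to a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_EUCLEAN 117 /* value of EUCLEAN on Linux, hard-coded here */

/* Hypothetical, heavily reduced extent buffer. */
struct model_eb {
	uint64_t start;         /* logical address used for lookup */
	uint64_t header_bytenr; /* bytenr recorded in the block header */
	bool uptodate;
	bool csum_written;      /* stands in for the stored checksum */
};

/* Linear lookup standing in for find_extent_buffer(). */
struct model_eb *model_find_eb(struct model_eb *ebs, size_t nr,
			       uint64_t start)
{
	for (size_t i = 0; i < nr; i++)
		if (ebs[i].start == start)
			return &ebs[i];
	return NULL;
}

/* Visit [bvec_start, bvec_start + bvec_len) one tree block at a time. */
int model_csum_range(struct model_eb *ebs, size_t nr, uint64_t bvec_start,
		     uint32_t bvec_len, uint32_t nodesize)
{
	assert(nodesize > 0);

	for (uint64_t cur = bvec_start; cur < bvec_start + bvec_len;
	     cur += nodesize) {
		struct model_eb *eb = model_find_eb(ebs, nr, cur);

		/* Missing, misplaced or stale blocks abort the write. */
		if (!eb || eb->header_bytenr != cur || !eb->uptodate)
			return -MODEL_EUCLEAN;
		eb->csum_written = true;
	}
	return 0;
}
```

The point of the sketch is the stride: the range is walked in nodesize
steps, and every step must find a sane, uptodate tree block.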

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 96 ++++++++++++++++++++++++++++++++++------------
 1 file changed, 72 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 437e6b2163c7..3c00a65f6679 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -441,6 +441,74 @@ static int btree_read_extent_buffer_pages(struct extent_buffer *eb,
 	return ret;
 }
 
+static int csum_one_extent_buffer(struct extent_buffer *eb)
+{
+	struct btrfs_fs_info *fs_info = eb->fs_info;
+	u8 result[BTRFS_CSUM_SIZE];
+	int ret;
+
+	ASSERT(memcmp_extent_buffer(eb, fs_info->fs_devices->metadata_uuid,
+				    offsetof(struct btrfs_header, fsid),
+				    BTRFS_FSID_SIZE) == 0);
+	csum_tree_block(eb, result);
+
+	if (btrfs_header_level(eb))
+		ret = btrfs_check_node(eb);
+	else
+		ret = btrfs_check_leaf_full(eb);
+
+	if (ret < 0) {
+		btrfs_print_tree(eb, 0);
+		btrfs_err(fs_info,
+		"block=%llu write time tree block corruption detected",
+			  eb->start);
+		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
+		return ret;
+	}
+	write_extent_buffer(eb, result, 0, fs_info->csum_size);
+
+	return 0;
+}
+
+/* Checksum all dirty extent buffers in one bio_vec. */
+static int csum_dirty_subpage_buffers(struct btrfs_fs_info *fs_info,
+				      struct bio_vec *bvec)
+{
+	struct page *page = bvec->bv_page;
+	u64 bvec_start = page_offset(page) + bvec->bv_offset;
+	u64 cur;
+	int ret = 0;
+
+	for (cur = bvec_start; cur < bvec_start + bvec->bv_len;
+	     cur += fs_info->nodesize) {
+		struct extent_buffer *eb;
+		bool uptodate;
+
+		eb = find_extent_buffer(fs_info, cur);
+		uptodate = btrfs_subpage_test_uptodate(fs_info, page, cur,
+						       fs_info->nodesize);
+
+		/* A dirty eb shouldn't disappear from buffer_radix */
+		if (WARN_ON(!eb))
+			return -EUCLEAN;
+
+		if (WARN_ON(cur != btrfs_header_bytenr(eb))) {
+			free_extent_buffer(eb);
+			return -EUCLEAN;
+		}
+		if (WARN_ON(!uptodate)) {
+			free_extent_buffer(eb);
+			return -EUCLEAN;
+		}
+
+		ret = csum_one_extent_buffer(eb);
+		free_extent_buffer(eb);
+		if (ret < 0)
+			return ret;
+	}
+	return ret;
+}
+
 /*
  * Checksum a dirty tree block before IO.  This has extra checks to make sure
  * we only fill in the checksum field in the first page of a multi-page block.
@@ -451,9 +519,10 @@ static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct bio_vec *bvec
 	struct page *page = bvec->bv_page;
 	u64 start = page_offset(page);
 	u64 found_start;
-	u8 result[BTRFS_CSUM_SIZE];
 	struct extent_buffer *eb;
-	int ret;
+
+	if (fs_info->sectorsize < PAGE_SIZE)
+		return csum_dirty_subpage_buffers(fs_info, bvec);
 
 	eb = (struct extent_buffer *)page->private;
 	if (page != eb->pages[0])
@@ -475,28 +544,7 @@ static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct bio_vec *bvec
 	if (WARN_ON(!PageUptodate(page)))
 		return -EUCLEAN;
 
-	ASSERT(memcmp_extent_buffer(eb, fs_info->fs_devices->metadata_uuid,
-				    offsetof(struct btrfs_header, fsid),
-				    BTRFS_FSID_SIZE) == 0);
-
-	csum_tree_block(eb, result);
-
-	if (btrfs_header_level(eb))
-		ret = btrfs_check_node(eb);
-	else
-		ret = btrfs_check_leaf_full(eb);
-
-	if (ret < 0) {
-		btrfs_print_tree(eb, 0);
-		btrfs_err(fs_info,
-		"block=%llu write time tree block corruption detected",
-			  eb->start);
-		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
-		return ret;
-	}
-	write_extent_buffer(eb, result, 0, fs_info->csum_size);
-
-	return 0;
+	return csum_one_extent_buffer(eb);
 }
 
 static int check_tree_block_fsid(struct extent_buffer *eb)
-- 
2.30.0



* [PATCH 05/12] btrfs: extent_io: make alloc_extent_buffer() check subpage dirty bitmap
  2021-02-22  6:33 [PATCH 00/12] btrfs: support read-write for subpage metadata Qu Wenruo
                   ` (3 preceding siblings ...)
  2021-02-22  6:33 ` [PATCH 04/12] btrfs: disk-io: support subpage metadata csum calculation at write time Qu Wenruo
@ 2021-02-22  6:33 ` Qu Wenruo
  2021-02-22  6:33 ` [PATCH 06/12] btrfs: extent_io: make the page uptodate assert check to handle subpage Qu Wenruo
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Qu Wenruo @ 2021-02-22  6:33 UTC (permalink / raw)
  To: linux-btrfs

In alloc_extent_buffer(), we make sure that the newly allocated page is
never dirty.

This is fine for the sector size == PAGE_SIZE case, but for subpage
it's possible that another extent buffer in the page is dirty, so the
whole page is marked dirty, which would trigger a false alert.

To support subpage, call btrfs_page_test_dirty() to handle both cases.
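
The difference between the two checks can be shown with a small
userspace sketch. page_looks_dirty() and range_is_dirty() are made-up
names, a 16-bit bitmap is assumed, and the "all bits in the range are
set" semantics of the range test is my assumption about how the test
helper behaves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Page-wide view: any dirty sector makes the whole page look dirty. */
bool page_looks_dirty(uint16_t dirty_bitmap)
{
	return dirty_bitmap != 0;
}

/* Range view: only the sectors of one extent buffer are considered. */
bool range_is_dirty(uint16_t dirty_bitmap, unsigned int first_bit,
		    unsigned int nbits)
{
	assert(nbits > 0 && first_bit + nbits <= 16);

	uint16_t mask = (uint16_t)(((1u << nbits) - 1) << first_bit);

	return (dirty_bitmap & mask) == mask;
}
```

With bitmap 0x000F the page looks dirty, but a new extent buffer
covering bits 4-7 is not, so the range check stays quiet.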

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 4dfb3ead1175..6637535d264d 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5625,7 +5625,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		btrfs_page_inc_eb_refs(fs_info, p);
 		spin_unlock(&mapping->private_lock);
 
-		WARN_ON(PageDirty(p));
+		WARN_ON(btrfs_page_test_dirty(fs_info, p, eb->start, eb->len));
 		eb->pages[i] = p;
 		if (!PageUptodate(p))
 			uptodate = 0;
-- 
2.30.0



* [PATCH 06/12] btrfs: extent_io: make the page uptodate assert check to handle subpage
  2021-02-22  6:33 [PATCH 00/12] btrfs: support read-write for subpage metadata Qu Wenruo
                   ` (4 preceding siblings ...)
  2021-02-22  6:33 ` [PATCH 05/12] btrfs: extent_io: make alloc_extent_buffer() check subpage dirty bitmap Qu Wenruo
@ 2021-02-22  6:33 ` Qu Wenruo
  2021-02-22  6:33 ` [PATCH 07/12] btrfs: extent_io: make set/clear_extent_buffer_dirty() to support subpage sized metadata Qu Wenruo
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Qu Wenruo @ 2021-02-22  6:33 UTC (permalink / raw)
  To: linux-btrfs

There are quite a few assertions on page uptodate status in the extent
buffer write accessors.
They ensure the destination page is already uptodate.

This is fine for the regular sector size case, but not for the subpage
case, as for subpage we only mark the page uptodate if the page
contains no hole and all its extent buffers are uptodate.

So instead of checking PageUptodate(), for subpage case we check the
uptodate bitmap of btrfs_subpage structure.

To make the check more elegant, introduce a helper,
assert_eb_page_uptodate() to do the check for both subpage and regular
sector size cases.

The following functions are involved:
- write_extent_buffer_chunk_tree_uuid()
- write_extent_buffer_fsid()
- write_extent_buffer()
- memzero_extent_buffer()
- copy_extent_buffer()
- extent_buffer_test_bit()
- extent_buffer_bitmap_set()
- extent_buffer_bitmap_clear()
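
A small userspace sketch of why PageUptodate() is too strict here.
struct up_page, up_set() and up_range_ok() are hypothetical names, and
16 sectors per page are assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ALL_SECTORS 0xFFFFu /* assumed: 16 sectors per page */

/* Hypothetical model of one page and its subpage uptodate state. */
struct up_page {
	uint16_t uptodate_bitmap;
	bool page_uptodate; /* stands in for PageUptodate() */
};

/* Mark sectors uptodate; the page flag is set only once every sector is. */
void up_set(struct up_page *p, uint16_t mask)
{
	p->uptodate_bitmap |= mask;
	p->page_uptodate = (p->uptodate_bitmap == ALL_SECTORS);
}

/* The check a write accessor needs: is this eb's own range uptodate? */
bool up_range_ok(const struct up_page *p, uint16_t eb_mask)
{
	assert(eb_mask != 0);
	return (p->uptodate_bitmap & eb_mask) == eb_mask;
}
```

After reading a single extent buffer into an otherwise empty page, the
page flag is still clear, yet the range check for that extent buffer
passes, which is what the write accessors care about.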

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 42 ++++++++++++++++++++++++++++++++----------
 1 file changed, 32 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 6637535d264d..13ba7a012425 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -6177,12 +6177,34 @@ int memcmp_extent_buffer(const struct extent_buffer *eb, const void *ptrv,
 	return ret;
 }
 
+/*
+ * A helper to ensure that the extent buffer is uptodate.
+ *
+ * For regular sector size == PAGE_SIZE case, check if @page is uptodate.
+ * For subpage case, check if the range covered by the eb has EXTENT_UPTODATE.
+ */
+static void assert_eb_page_uptodate(const struct extent_buffer *eb,
+				    struct page *page)
+{
+	struct btrfs_fs_info *fs_info = eb->fs_info;
+
+	if (fs_info->sectorsize < PAGE_SIZE) {
+		bool uptodate;
+
+		uptodate = btrfs_subpage_test_uptodate(fs_info, page,
+						eb->start, eb->len);
+		WARN_ON(!uptodate);
+	} else {
+		WARN_ON(!PageUptodate(page));
+	}
+}
+
 void write_extent_buffer_chunk_tree_uuid(const struct extent_buffer *eb,
 		const void *srcv)
 {
 	char *kaddr;
 
-	WARN_ON(!PageUptodate(eb->pages[0]));
+	assert_eb_page_uptodate(eb, eb->pages[0]);
 	kaddr = page_address(eb->pages[0]) + get_eb_offset_in_page(eb, 0);
 	memcpy(kaddr + offsetof(struct btrfs_header, chunk_tree_uuid), srcv,
 			BTRFS_FSID_SIZE);
@@ -6192,7 +6214,7 @@ void write_extent_buffer_fsid(const struct extent_buffer *eb, const void *srcv)
 {
 	char *kaddr;
 
-	WARN_ON(!PageUptodate(eb->pages[0]));
+	assert_eb_page_uptodate(eb, eb->pages[0]);
 	kaddr = page_address(eb->pages[0]) + get_eb_offset_in_page(eb, 0);
 	memcpy(kaddr + offsetof(struct btrfs_header, fsid), srcv,
 			BTRFS_FSID_SIZE);
@@ -6217,7 +6239,7 @@ void write_extent_buffer(const struct extent_buffer *eb, const void *srcv,
 
 	while (len > 0) {
 		page = eb->pages[i];
-		WARN_ON(!PageUptodate(page));
+		assert_eb_page_uptodate(eb, page);
 
 		cur = min(len, PAGE_SIZE - offset);
 		kaddr = page_address(page);
@@ -6246,7 +6268,7 @@ void memzero_extent_buffer(const struct extent_buffer *eb, unsigned long start,
 
 	while (len > 0) {
 		page = eb->pages[i];
-		WARN_ON(!PageUptodate(page));
+		assert_eb_page_uptodate(eb, page);
 
 		cur = min(len, PAGE_SIZE - offset);
 		kaddr = page_address(page);
@@ -6304,7 +6326,7 @@ void copy_extent_buffer(const struct extent_buffer *dst,
 
 	while (len > 0) {
 		page = dst->pages[i];
-		WARN_ON(!PageUptodate(page));
+		assert_eb_page_uptodate(dst, page);
 
 		cur = min(len, (unsigned long)(PAGE_SIZE - offset));
 
@@ -6366,7 +6388,7 @@ int extent_buffer_test_bit(const struct extent_buffer *eb, unsigned long start,
 
 	eb_bitmap_offset(eb, start, nr, &i, &offset);
 	page = eb->pages[i];
-	WARN_ON(!PageUptodate(page));
+	assert_eb_page_uptodate(eb, page);
 	kaddr = page_address(page);
 	return 1U & (kaddr[offset] >> (nr & (BITS_PER_BYTE - 1)));
 }
@@ -6391,7 +6413,7 @@ void extent_buffer_bitmap_set(const struct extent_buffer *eb, unsigned long star
 
 	eb_bitmap_offset(eb, start, pos, &i, &offset);
 	page = eb->pages[i];
-	WARN_ON(!PageUptodate(page));
+	assert_eb_page_uptodate(eb, page);
 	kaddr = page_address(page);
 
 	while (len >= bits_to_set) {
@@ -6402,7 +6424,7 @@ void extent_buffer_bitmap_set(const struct extent_buffer *eb, unsigned long star
 		if (++offset >= PAGE_SIZE && len > 0) {
 			offset = 0;
 			page = eb->pages[++i];
-			WARN_ON(!PageUptodate(page));
+			assert_eb_page_uptodate(eb, page);
 			kaddr = page_address(page);
 		}
 	}
@@ -6434,7 +6456,7 @@ void extent_buffer_bitmap_clear(const struct extent_buffer *eb,
 
 	eb_bitmap_offset(eb, start, pos, &i, &offset);
 	page = eb->pages[i];
-	WARN_ON(!PageUptodate(page));
+	assert_eb_page_uptodate(eb, page);
 	kaddr = page_address(page);
 
 	while (len >= bits_to_clear) {
@@ -6445,7 +6467,7 @@ void extent_buffer_bitmap_clear(const struct extent_buffer *eb,
 		if (++offset >= PAGE_SIZE && len > 0) {
 			offset = 0;
 			page = eb->pages[++i];
-			WARN_ON(!PageUptodate(page));
+			assert_eb_page_uptodate(eb, page);
 			kaddr = page_address(page);
 		}
 	}
-- 
2.30.0



* [PATCH 07/12] btrfs: extent_io: make set/clear_extent_buffer_dirty() to support subpage sized metadata
  2021-02-22  6:33 [PATCH 00/12] btrfs: support read-write for subpage metadata Qu Wenruo
                   ` (5 preceding siblings ...)
  2021-02-22  6:33 ` [PATCH 06/12] btrfs: extent_io: make the page uptodate assert check to handle subpage Qu Wenruo
@ 2021-02-22  6:33 ` Qu Wenruo
  2021-02-22  6:33 ` [PATCH 08/12] btrfs: extent_io: make set_btree_ioerr() accept extent buffer and handle subpage metadata Qu Wenruo
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Qu Wenruo @ 2021-02-22  6:33 UTC (permalink / raw)
  To: linux-btrfs

For set_extent_buffer_dirty() to support subpage sized metadata, just
call btrfs_page_set_dirty() to handle both cases.

For clear_extent_buffer_dirty(), it needs to clear the page dirty if and
only if all extent buffers in the page range are no longer dirty.
Also do the same for page error.

This is pretty different from the existing clear_extent_buffer_dirty()
routine, so add a new helper function,
clear_subpage_extent_buffer_dirty(), to do this for subpage metadata.

Also since the main part of clearing page dirty code is still the same,
extract that into btree_clear_page_dirty() so that it can be utilized
for both cases.

But there is a special race between set_extent_buffer_dirty() and
clear_extent_buffer_dirty(), where the clear side can drop the page
dirty flag right after the set side has set it.

[POSSIBLE RACE WINDOW]
For the race window between clear_subpage_extent_buffer_dirty() and
set_extent_buffer_dirty(), due to the fact that we can't call
clear_page_dirty_for_io() under subpage spin lock, we can race like
below:

   T1 (eb1 in the same page)	|  T2 (eb2 in the same page)
 -------------------------------+------------------------------
 set_extent_buffer_dirty()	| clear_extent_buffer_dirty()
 |- was_dirty = false;		| |- clear_subpage_extent_buffer_dirty()
 |				|    |- btrfs_subpage_clear_and_test_dirty()
 |				|    |  Since eb2 is the last dirty eb
 |				|    |  we got:
 |				|    |  last == true;
 |				|    |
 |- btrfs_page_set_dirty()	|    |
 |  We set the page dirty and   |    |
 |  subpage dirty bitmap	|    |
 |				|    |- if (last)
 |				|    |  Since we no longer hold the subpage
 |				|    |  lock, @last may no longer be
 |				|    |  correct
 |				|    |- btree_clear_page_dirty()
 |				|	Now PageDirty == false, even though
 |				|       dirty_bitmap is not zero.
 |- ASSERT(PageDirty());	|
    ^^^^ CRASH

The solution here is to also lock the eb->pages[0] for subpage case of
set_extent_buffer_dirty(), to prevent racing with
clear_extent_buffer_dirty().
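
The interleaving above can be replayed deterministically in userspace.
This is only a model: struct race_page and the t1_*()/t2_*() helpers
are hypothetical, no real threads or locks are involved, and the
"locked" variant simply runs T2's two steps back to back, which is the
ordering the page lock is meant to enforce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one page shared by eb1 and eb2. */
struct race_page {
	uint16_t dirty_bitmap;
	bool page_dirty; /* stands in for PageDirty() */
};

/* Step of T2: clear eb2's bits and sample whether they were the last. */
bool t2_clear_and_test(struct race_page *p, uint16_t eb_mask)
{
	assert((p->dirty_bitmap & eb_mask) == eb_mask);
	p->dirty_bitmap &= (uint16_t)~eb_mask;
	return p->dirty_bitmap == 0;
}

/* Step of T2: act on the sampled value, which may be stale by now. */
void t2_clear_page_dirty_if(struct race_page *p, bool last)
{
	if (last)
		p->page_dirty = false;
}

/* Whole of T1: dirty eb1's bits and the page. */
void t1_set_dirty(struct race_page *p, uint16_t eb_mask)
{
	p->dirty_bitmap |= eb_mask;
	p->page_dirty = true;
}

/* Interleaving from the diagram: T1 runs between T2's two steps. */
struct race_page replay_unlocked(void)
{
	struct race_page p = { .dirty_bitmap = 0x00F0, .page_dirty = true };
	bool last = t2_clear_and_test(&p, 0x00F0);

	t1_set_dirty(&p, 0x000F);
	t2_clear_page_dirty_if(&p, last);
	return p;
}

/* With the page lock, T2's two steps cannot be split by T1. */
struct race_page replay_locked(void)
{
	struct race_page p = { .dirty_bitmap = 0x00F0, .page_dirty = true };

	t2_clear_page_dirty_if(&p, t2_clear_and_test(&p, 0x00F0));
	t1_set_dirty(&p, 0x000F);
	return p;
}
```

replay_unlocked() ends with a non-zero dirty_bitmap but a clean page,
which is the state that trips the ASSERT(); replay_locked() ends with
both the bitmap and the page flag dirty.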

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 65 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 53 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 13ba7a012425..ea0089c8aefb 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5774,28 +5774,51 @@ void free_extent_buffer_stale(struct extent_buffer *eb)
 	release_extent_buffer(eb);
 }
 
+static void btree_clear_page_dirty(struct page *page)
+{
+	ASSERT(PageDirty(page));
+	ASSERT(PageLocked(page));
+	clear_page_dirty_for_io(page);
+	xa_lock_irq(&page->mapping->i_pages);
+	if (!PageDirty(page))
+		__xa_clear_mark(&page->mapping->i_pages,
+				page_index(page), PAGECACHE_TAG_DIRTY);
+	xa_unlock_irq(&page->mapping->i_pages);
+}
+
+static void clear_subpage_extent_buffer_dirty(const struct extent_buffer *eb)
+{
+	struct btrfs_fs_info *fs_info = eb->fs_info;
+	struct page *page = eb->pages[0];
+	bool last;
+
+	/* btree_clear_page_dirty() needs page locked */
+	lock_page(page);
+	last = btrfs_subpage_clear_and_test_dirty(fs_info, page, eb->start,
+						  eb->len);
+	if (last)
+		btree_clear_page_dirty(page);
+	unlock_page(page);
+	WARN_ON(atomic_read(&eb->refs) == 0);
+}
+
 void clear_extent_buffer_dirty(const struct extent_buffer *eb)
 {
 	int i;
 	int num_pages;
 	struct page *page;
 
+	if (eb->fs_info->sectorsize < PAGE_SIZE)
+		return clear_subpage_extent_buffer_dirty(eb);
+
 	num_pages = num_extent_pages(eb);
 
 	for (i = 0; i < num_pages; i++) {
 		page = eb->pages[i];
 		if (!PageDirty(page))
 			continue;
-
 		lock_page(page);
-		WARN_ON(!PagePrivate(page));
-
-		clear_page_dirty_for_io(page);
-		xa_lock_irq(&page->mapping->i_pages);
-		if (!PageDirty(page))
-			__xa_clear_mark(&page->mapping->i_pages,
-					page_index(page), PAGECACHE_TAG_DIRTY);
-		xa_unlock_irq(&page->mapping->i_pages);
+		btree_clear_page_dirty(page);
 		ClearPageError(page);
 		unlock_page(page);
 	}
@@ -5816,10 +5839,28 @@ bool set_extent_buffer_dirty(struct extent_buffer *eb)
 	WARN_ON(atomic_read(&eb->refs) == 0);
 	WARN_ON(!test_bit(EXTENT_BUFFER_TREE_REF, &eb->bflags));
 
-	if (!was_dirty)
-		for (i = 0; i < num_pages; i++)
-			set_page_dirty(eb->pages[i]);
+	if (!was_dirty) {
+		bool subpage = eb->fs_info->sectorsize < PAGE_SIZE;
 
+		/*
+		 * For subpage case, we can have other extent buffers in the
+		 * same page, and in clear_subpage_extent_buffer_dirty() we
+		 * have to clear page dirty without the subpage lock held.
+		 * This can cause a race where our page gets its dirty flag
+		 * cleared right after we set it.
+		 *
+		 * Thankfully, clear_subpage_extent_buffer_dirty() already
+		 * locks its page for other reasons, so we can use the page
+		 * lock to prevent the above race.
+		 */
+		if (subpage)
+			lock_page(eb->pages[0]);
+		for (i = 0; i < num_pages; i++)
+			btrfs_page_set_dirty(eb->fs_info, eb->pages[i],
+					     eb->start, eb->len);
+		if (subpage)
+			unlock_page(eb->pages[0]);
+	}
 #ifdef CONFIG_BTRFS_DEBUG
 	for (i = 0; i < num_pages; i++)
 		ASSERT(PageDirty(eb->pages[i]));
-- 
2.30.0



* [PATCH 08/12] btrfs: extent_io: make set_btree_ioerr() accept extent buffer and handle subpage metadata
  2021-02-22  6:33 [PATCH 00/12] btrfs: support read-write for subpage metadata Qu Wenruo
                   ` (6 preceding siblings ...)
  2021-02-22  6:33 ` [PATCH 07/12] btrfs: extent_io: make set/clear_extent_buffer_dirty() to support subpage sized metadata Qu Wenruo
@ 2021-02-22  6:33 ` Qu Wenruo
  2021-02-22  6:33 ` [PATCH 09/12] btrfs: extent_io: introduce end_bio_subpage_eb_writepage() function Qu Wenruo
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Qu Wenruo @ 2021-02-22  6:33 UTC (permalink / raw)
  To: linux-btrfs

Currently set_btree_ioerr() only accepts a @page parameter and grabs the
extent buffer from page::private.

This works fine for the sector size == PAGE_SIZE case, but not for the
subpage case.

Add an extra parameter, @eb, so that callers can pass the extent buffer
to this function and subpage code can reuse it.

Also add subpage-specific handling to update
btrfs_subpage::error_bitmap.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index ea0089c8aefb..96ac72d3f3a0 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3972,12 +3972,11 @@ static noinline_for_stack int lock_extent_buffer_for_io(struct extent_buffer *eb
 	return ret;
 }
 
-static void set_btree_ioerr(struct page *page)
+static void set_btree_ioerr(struct page *page, struct extent_buffer *eb)
 {
-	struct extent_buffer *eb = (struct extent_buffer *)page->private;
-	struct btrfs_fs_info *fs_info;
+	struct btrfs_fs_info *fs_info = eb->fs_info;
 
-	SetPageError(page);
+	btrfs_page_set_error(fs_info, page, eb->start, eb->len);
 	if (test_and_set_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags))
 		return;
 
@@ -3985,7 +3984,6 @@ static void set_btree_ioerr(struct page *page)
 	 * If we error out, we should add back the dirty_metadata_bytes
 	 * to make it consistent.
 	 */
-	fs_info = eb->fs_info;
 	percpu_counter_add_batch(&fs_info->dirty_metadata_bytes,
 				 eb->len, fs_info->dirty_metadata_batch);
 
@@ -4029,13 +4027,13 @@ static void set_btree_ioerr(struct page *page)
 	 */
 	switch (eb->log_index) {
 	case -1:
-		set_bit(BTRFS_FS_BTREE_ERR, &eb->fs_info->flags);
+		set_bit(BTRFS_FS_BTREE_ERR, &fs_info->flags);
 		break;
 	case 0:
-		set_bit(BTRFS_FS_LOG1_ERR, &eb->fs_info->flags);
+		set_bit(BTRFS_FS_LOG1_ERR, &fs_info->flags);
 		break;
 	case 1:
-		set_bit(BTRFS_FS_LOG2_ERR, &eb->fs_info->flags);
+		set_bit(BTRFS_FS_LOG2_ERR, &fs_info->flags);
 		break;
 	default:
 		BUG(); /* unexpected, logic error */
@@ -4060,7 +4058,7 @@ static void end_bio_extent_buffer_writepage(struct bio *bio)
 		if (bio->bi_status ||
 		    test_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags)) {
 			ClearPageUptodate(page);
-			set_btree_ioerr(page);
+			set_btree_ioerr(page, eb);
 		}
 
 		end_page_writeback(page);
@@ -4116,7 +4114,7 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
 					 end_bio_extent_buffer_writepage,
 					 0, 0, 0, false);
 		if (ret) {
-			set_btree_ioerr(p);
+			set_btree_ioerr(p, eb);
 			if (PageWriteback(p))
 				end_page_writeback(p);
 			if (atomic_sub_and_test(num_pages - i, &eb->io_pages))
-- 
2.30.0



* [PATCH 09/12] btrfs: extent_io: introduce end_bio_subpage_eb_writepage() function
  2021-02-22  6:33 [PATCH 00/12] btrfs: support read-write for subpage metadata Qu Wenruo
                   ` (7 preceding siblings ...)
  2021-02-22  6:33 ` [PATCH 08/12] btrfs: extent_io: make set_btree_ioerr() accept extent buffer and handle subpage metadata Qu Wenruo
@ 2021-02-22  6:33 ` Qu Wenruo
  2021-03-01 16:33   ` David Sterba
  2021-02-22  6:33 ` [PATCH 10/12] btrfs: extent_io: introduce write_one_subpage_eb() function Qu Wenruo
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 19+ messages in thread
From: Qu Wenruo @ 2021-02-22  6:33 UTC (permalink / raw)
  To: linux-btrfs

The new function, end_bio_subpage_eb_writepage(), handles the metadata
writeback endio for the subpage case.

The major differences from end_bio_extent_buffer_writepage() are:
- How to grab the extent buffer
  Now page::private is a pointer to btrfs_subpage, so we can no longer
  grab the extent buffer directly.
  Thus we need to use the bv_offset to locate the extent buffer manually
  and iterate through the whole range.

- Use the btrfs_subpage_clear_writeback() helper
  This helper handles the subpage writeback status for us.

Since this function is executed in endio context, when grabbing extent
buffers it can't take eb->refs_lock, as that lock is not designed to be
taken in hardirq context.

So introduce a helper, find_extent_buffer_nospinlock(), for such
situations, and convert find_extent_buffer() to use that helper.
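
As a rough illustration of the lockless pattern described above (take a
reference only if the count is still non-zero, then drop it with a plain
atomic decrement), here is a minimal userspace sketch using C11 atomics.
It is not the kernel code: struct demo_obj, demo_get_unless_zero() and
demo_put() are hypothetical names, and the RCU protection around the
radix tree lookup is left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an object with a lifetime refcount. */
struct demo_obj {
	atomic_int refs;
};

/*
 * Take a reference only if the count is still non-zero.
 * Returns true on success, false if the object is already going away.
 */
bool demo_get_unless_zero(struct demo_obj *obj)
{
	int old = atomic_load(&obj->refs);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&obj->refs, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Drop a reference without taking any lock.
 * Returns true if this was the last reference.
 */
bool demo_put(struct demo_obj *obj)
{
	int old = atomic_fetch_sub(&obj->refs, 1);

	assert(old > 0);
	return old == 1;
}
```

The point of the sketch is only that neither path needs a spinlock, which
is what makes this style of lookup usable from endio context.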

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 135 +++++++++++++++++++++++++++++++++----------
 1 file changed, 106 insertions(+), 29 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 96ac72d3f3a0..d0afff0eb252 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4040,13 +4040,97 @@ static void set_btree_ioerr(struct page *page, struct extent_buffer *eb)
 	}
 }
 
+/*
+ * This is the endio specific version which won't touch any unsafe spinlock
+ * in endio context.
+ */
+static struct extent_buffer *find_extent_buffer_nospinlock(
+		struct btrfs_fs_info *fs_info, u64 start)
+{
+	struct extent_buffer *eb;
+
+	rcu_read_lock();
+	eb = radix_tree_lookup(&fs_info->buffer_radix,
+			       start >> fs_info->sectorsize_bits);
+	if (eb && atomic_inc_not_zero(&eb->refs)) {
+		rcu_read_unlock();
+		return eb;
+	}
+	rcu_read_unlock();
+	return NULL;
+}
+/*
+ * The endio function for subpage extent buffer write.
+ *
+ * Unlike end_bio_extent_buffer_writepage(), we only call end_page_writeback()
+ * after all extent buffers in the page have finished their writeback.
+ */
+static void end_bio_subpage_eb_writepage(struct btrfs_fs_info *fs_info,
+					 struct bio *bio)
+{
+	struct bio_vec *bvec;
+	struct bvec_iter_all iter_all;
+
+	ASSERT(!bio_flagged(bio, BIO_CLONED));
+	bio_for_each_segment_all(bvec, bio, iter_all) {
+		struct page *page = bvec->bv_page;
+		u64 bvec_start = page_offset(page) + bvec->bv_offset;
+		u64 bvec_end = bvec_start + bvec->bv_len - 1;
+		u64 cur_bytenr = bvec_start;
+
+		ASSERT(IS_ALIGNED(bvec->bv_len, fs_info->nodesize));
+
+		/* Iterate through all extent buffers in the range */
+		while (cur_bytenr <= bvec_end) {
+			struct extent_buffer *eb;
+			int done;
+
+			/*
+			 * Here we can't use find_extent_buffer(), as it may
+			 * try to lock eb->refs_lock, which is not safe in endio 
+			 * context.
+			 */
+			eb = find_extent_buffer_nospinlock(fs_info, cur_bytenr);
+			ASSERT(eb);
+
+			cur_bytenr = eb->start + eb->len;
+
+			ASSERT(test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags));
+			done = atomic_dec_and_test(&eb->io_pages);
+			ASSERT(done);
+
+			if (bio->bi_status ||
+			    test_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags)) {
+				ClearPageUptodate(page);
+				set_btree_ioerr(page, eb);
+			}
+
+			btrfs_subpage_clear_writeback(fs_info, page, eb->start,
+						      eb->len);
+			end_extent_buffer_writeback(eb);
+			/*
+			 * free_extent_buffer() will grab spinlock which is not
+			 * safe in endio context. Thus here we manually dec
+			 * the ref.
+			 */
+			atomic_dec(&eb->refs);
+		}
+	}
+	bio_put(bio);
+}
+
 static void end_bio_extent_buffer_writepage(struct bio *bio)
 {
+	struct btrfs_fs_info *fs_info;
 	struct bio_vec *bvec;
 	struct extent_buffer *eb;
 	int done;
 	struct bvec_iter_all iter_all;
 
+	fs_info = btrfs_sb(bio_first_page_all(bio)->mapping->host->i_sb);
+	if (fs_info->sectorsize < PAGE_SIZE)
+		return end_bio_subpage_eb_writepage(fs_info, bio);
+
 	ASSERT(!bio_flagged(bio, BIO_CLONED));
 	bio_for_each_segment_all(bvec, bio, iter_all) {
 		struct page *page = bvec->bv_page;
@@ -5427,36 +5511,29 @@ struct extent_buffer *find_extent_buffer(struct btrfs_fs_info *fs_info,
 {
 	struct extent_buffer *eb;
 
-	rcu_read_lock();
-	eb = radix_tree_lookup(&fs_info->buffer_radix,
-			       start >> fs_info->sectorsize_bits);
-	if (eb && atomic_inc_not_zero(&eb->refs)) {
-		rcu_read_unlock();
-		/*
-		 * Lock our eb's refs_lock to avoid races with
-		 * free_extent_buffer. When we get our eb it might be flagged
-		 * with EXTENT_BUFFER_STALE and another task running
-		 * free_extent_buffer might have seen that flag set,
-		 * eb->refs == 2, that the buffer isn't under IO (dirty and
-		 * writeback flags not set) and it's still in the tree (flag
-		 * EXTENT_BUFFER_TREE_REF set), therefore being in the process
-		 * of decrementing the extent buffer's reference count twice.
-		 * So here we could race and increment the eb's reference count,
-		 * clear its stale flag, mark it as dirty and drop our reference
-		 * before the other task finishes executing free_extent_buffer,
-		 * which would later result in an attempt to free an extent
-		 * buffer that is dirty.
-		 */
-		if (test_bit(EXTENT_BUFFER_STALE, &eb->bflags)) {
-			spin_lock(&eb->refs_lock);
-			spin_unlock(&eb->refs_lock);
-		}
-		mark_extent_buffer_accessed(eb, NULL);
-		return eb;
+	eb = find_extent_buffer_nospinlock(fs_info, start);
+	if (!eb)
+		return NULL;
+	/*
+	 * Lock our eb's refs_lock to avoid races with free_extent_buffer().
+	 * When we get our eb it might be flagged with EXTENT_BUFFER_STALE and
+	 * another task running free_extent_buffer() might have seen that flag
+	 * set, eb->refs == 2, that the buffer isn't under IO (dirty and
+	 * writeback flags not set) and it's still in the tree (flag
+	 * EXTENT_BUFFER_TREE_REF set), therefore being in the process
+	 * of decrementing the extent buffer's reference count twice.
+	 * So here we could race and increment the eb's reference count,
+	 * clear its stale flag, mark it as dirty and drop our reference
+	 * before the other task finishes executing free_extent_buffer,
+	 * which would later result in an attempt to free an extent
+	 * buffer that is dirty.
+	 */
+	if (test_bit(EXTENT_BUFFER_STALE, &eb->bflags)) {
+		spin_lock(&eb->refs_lock);
+		spin_unlock(&eb->refs_lock);
 	}
-	rcu_read_unlock();
-
-	return NULL;
+	mark_extent_buffer_accessed(eb, NULL);
+	return eb;
 }
 
 #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
-- 
2.30.0



* [PATCH 10/12] btrfs: extent_io: introduce write_one_subpage_eb() function
  2021-02-22  6:33 [PATCH 00/12] btrfs: support read-write for subpage metadata Qu Wenruo
                   ` (8 preceding siblings ...)
  2021-02-22  6:33 ` [PATCH 09/12] btrfs: extent_io: introduce end_bio_subpage_eb_writepage() function Qu Wenruo
@ 2021-02-22  6:33 ` Qu Wenruo
  2021-02-22  6:33 ` [PATCH 11/12] btrfs: extent_io: make lock_extent_buffer_for_io() to support subpage metadata Qu Wenruo
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Qu Wenruo @ 2021-02-22  6:33 UTC (permalink / raw)
  To: linux-btrfs

The new function, write_one_subpage_eb(), is a subroutine for subpage
metadata write that handles the extent buffer bio submission.

The main differences between write_one_subpage_eb() and write_one_eb()
are:
- No page locking
  When entering write_one_subpage_eb() the page is no longer locked.
  We only lock the page for its status update, and unlock immediately.
  Now we completely rely on extent io tree locking.

- Extra bitmap update along with page status update
  Now page dirty and writeback are controlled by the btrfs_subpage
  dirty_bitmap and writeback_bitmap.
  Both follow the rule that if any sector is dirty/under writeback, the
  full page is marked dirty/writeback.

- When to update the nr_written number
  Now we take a shortcut: if we have cleared the last dirty bit of the
  page, we update nr_written.
  This is not perfect, but should emulate the old behavior well enough.
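
To make the bitmap rule above concrete, here is a small standalone sketch
of "page is dirty if any sector bit is set" and of a clear-and-test
operation that reports when the last dirty bit goes away. It assumes a
16-bit bitmap with one bit per sector; struct demo_subpage and the demo_*
helpers are hypothetical and only mimic the idea, and the locking the
real helpers need is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_BITMAP_BITS 16	/* assumed: one bit per sector in a page */

/* Hypothetical, simplified per-page state. */
struct demo_subpage {
	uint16_t dirty_bitmap;
};

/* Build a mask covering 'nbits' bits starting at bit 'start'. */
uint16_t demo_range_mask(unsigned int start, unsigned int nbits)
{
	assert(nbits > 0 && start + nbits <= DEMO_BITMAP_BITS);
	return (uint16_t)(((1u << nbits) - 1) << start);
}

/* Mark a sector range dirty. */
void demo_set_dirty(struct demo_subpage *sp, unsigned int start,
		    unsigned int nbits)
{
	sp->dirty_bitmap |= demo_range_mask(start, nbits);
}

/* Clear a sector range; return true if no dirty bit is left in the page. */
bool demo_clear_and_test_dirty(struct demo_subpage *sp, unsigned int start,
			       unsigned int nbits)
{
	sp->dirty_bitmap &= (uint16_t)~demo_range_mask(start, nbits);
	return sp->dirty_bitmap == 0;
}

/* The whole page counts as dirty as long as any sector is dirty. */
bool demo_page_dirty(const struct demo_subpage *sp)
{
	return sp->dirty_bitmap != 0;
}
```

With 4K sectors and a 16K nodesize, each extent buffer covers 4 bits, so
clearing one of two dirty extent buffers returns false and clearing the
second returns true, which is the point where nr_written would be bumped.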

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 55 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d0afff0eb252..ea603c1f2994 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4156,6 +4156,58 @@ static void end_bio_extent_buffer_writepage(struct bio *bio)
 	bio_put(bio);
 }
 
+/*
+ * Unlike the work in write_one_eb(), we rely completely on extent locking.
+ * Page locking is only utilized minimally to keep the VM code happy.
+ *
+ * Callers should still call write_one_eb() instead of this function, as
+ * write_one_eb() does extra preparation before submitting the extent buffer.
+ */
+static int write_one_subpage_eb(struct extent_buffer *eb,
+				struct writeback_control *wbc,
+				struct extent_page_data *epd)
+{
+	struct btrfs_fs_info *fs_info = eb->fs_info;
+	struct page *page = eb->pages[0];
+	unsigned int write_flags = wbc_to_write_flags(wbc) | REQ_META;
+	bool no_dirty_ebs = false;
+	int ret;
+
+	/* clear_page_dirty_for_io() below needs the page locked */
+	lock_page(page);
+	btrfs_subpage_set_writeback(fs_info, page, eb->start, eb->len);
+
+	/* If we clear the last dirty bit, nr_written gets updated below */
+	no_dirty_ebs = btrfs_subpage_clear_and_test_dirty(fs_info, page,
+							  eb->start, eb->len);
+	if (no_dirty_ebs)
+		clear_page_dirty_for_io(page);
+
+	ret = submit_extent_page(REQ_OP_WRITE | write_flags, wbc, page,
+			eb->start, eb->len, eb->start - page_offset(page),
+			&epd->bio, end_bio_extent_buffer_writepage, 0, 0, 0,
+			false);
+	if (ret) {
+		btrfs_subpage_clear_writeback(fs_info, page, eb->start,
+					      eb->len);
+		set_btree_ioerr(page, eb);
+		unlock_page(page);
+
+		if (atomic_dec_and_test(&eb->io_pages))
+			end_extent_buffer_writeback(eb);
+		return -EIO;
+	}
+	unlock_page(page);
+	/*
+	 * Submission finishes without problem, if no range of the page is
+	 * dirty anymore, we have submitted a page.
+	 * Update the nr_written in wbc.
+	 */
+	if (no_dirty_ebs)
+		update_nr_written(wbc, 1);
+	return ret;
+}
+
 static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
 			struct writeback_control *wbc,
 			struct extent_page_data *epd)
@@ -4187,6 +4239,9 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
 		memzero_extent_buffer(eb, start, end - start);
 	}
 
+	if (eb->fs_info->sectorsize < PAGE_SIZE)
+		return write_one_subpage_eb(eb, wbc, epd);
+
 	for (i = 0; i < num_pages; i++) {
 		struct page *p = eb->pages[i];
 
-- 
2.30.0



* [PATCH 11/12] btrfs: extent_io: make lock_extent_buffer_for_io() to support subpage metadata
  2021-02-22  6:33 [PATCH 00/12] btrfs: support read-write for subpage metadata Qu Wenruo
                   ` (9 preceding siblings ...)
  2021-02-22  6:33 ` [PATCH 10/12] btrfs: extent_io: introduce write_one_subpage_eb() function Qu Wenruo
@ 2021-02-22  6:33 ` Qu Wenruo
  2021-02-22  6:33 ` [PATCH 12/12] btrfs: extent_io: introduce submit_eb_subpage() to submit a subpage metadata page Qu Wenruo
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Qu Wenruo @ 2021-02-22  6:33 UTC (permalink / raw)
  To: linux-btrfs

For subpage metadata, we don't use page locking at all, so just skip the
page locking part for subpage.

The rest of the routine can be reused.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index ea603c1f2994..edfca14a158e 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3927,7 +3927,13 @@ static noinline_for_stack int lock_extent_buffer_for_io(struct extent_buffer *eb
 
 	btrfs_tree_unlock(eb);
 
-	if (!ret)
+	/*
+	 * Either we don't need to submit any tree block, or we're submitting
+	 * subpage.
+	 * Subpage metadata doesn't use page locking at all, so we can skip
+	 * the page locking.
+	 */
+	if (!ret || fs_info->sectorsize < PAGE_SIZE)
 		return ret;
 
 	num_pages = num_extent_pages(eb);
-- 
2.30.0



* [PATCH 12/12] btrfs: extent_io: introduce submit_eb_subpage() to submit a subpage metadata page
  2021-02-22  6:33 [PATCH 00/12] btrfs: support read-write for subpage metadata Qu Wenruo
                   ` (10 preceding siblings ...)
  2021-02-22  6:33 ` [PATCH 11/12] btrfs: extent_io: make lock_extent_buffer_for_io() to support subpage metadata Qu Wenruo
@ 2021-02-22  6:33 ` Qu Wenruo
  2021-03-01 16:22 ` [PATCH 00/12] btrfs: support read-write for subpage metadata David Sterba
  2021-03-01 16:30 ` David Sterba
  13 siblings, 0 replies; 19+ messages in thread
From: Qu Wenruo @ 2021-02-22  6:33 UTC (permalink / raw)
  To: linux-btrfs

The new function, submit_eb_subpage(), will submit all the dirty extent
buffers in the page.
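
As a sketch of the iteration this implies: walk the dirty bitmap bit by
bit, and whenever a dirty bit is found, treat it as the start of a tree
block and skip ahead by the number of sectors per node. The helper below
is a hypothetical userspace stand-in (demo_collect_dirty() and the 16-bit
bitmap size are assumptions); it only computes the start offsets and does
none of the locking or eb lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_BITMAP_BITS 16	/* assumed: one bit per sector in a page */

/*
 * Collect the start offsets of dirty tree blocks in one page.
 * Returns the number of entries written to 'out'.
 */
size_t demo_collect_dirty(uint16_t dirty_bitmap, uint64_t page_start,
			  uint32_t sectorsize, uint32_t nodesize,
			  uint64_t *out, size_t out_len)
{
	unsigned int sectors_per_node;
	unsigned int bit = 0;
	size_t found = 0;

	assert(sectorsize > 0 && nodesize >= sectorsize);
	sectors_per_node = nodesize / sectorsize;

	while (bit < DEMO_BITMAP_BITS && found < out_len) {
		/* Clean sector: move on to the next bit. */
		if (!(dirty_bitmap & (1u << bit))) {
			bit++;
			continue;
		}
		/* Dirty sector: this is where a tree block starts. */
		out[found++] = page_start + (uint64_t)bit * sectorsize;
		bit += sectors_per_node;
	}
	return found;
}
```

For example, with a 64K page at offset 65536, 4K sectors, a 16K nodesize
and bitmap 0x0F0F, the walk yields two tree blocks, at 65536 and 98304.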

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 95 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 95 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index edfca14a158e..191bd47c04e0 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4283,6 +4283,98 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
 	return ret;
 }
 
+/*
+ * Submit one subpage btree page.
+ *
+ * The main differences from submit_eb_page() are:
+ * - Page locking
+ *   For subpage, we don't rely on page locking at all.
+ *
+ * - Flush write bio
+ *   We only flush bio if we may be unable to fit current extent buffers into
+ *   current bio.
+ *
+ * Return >=0 for the number of submitted extent buffers.
+ * Return <0 for fatal error.
+ */
+static int submit_eb_subpage(struct page *page,
+			     struct writeback_control *wbc,
+			     struct extent_page_data *epd)
+{
+	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
+	int submitted = 0;
+	u64 page_start = page_offset(page);
+	int bit_start = 0;
+	int nbits = BTRFS_SUBPAGE_BITMAP_SIZE;
+	int sectors_per_node = fs_info->nodesize >> fs_info->sectorsize_bits;
+	int ret;
+
+	/* Lock and write each dirty extent buffer in the range */
+	while (bit_start < nbits) {
+		struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
+		struct extent_buffer *eb;
+		unsigned long flags;
+		u64 start;
+
+		/*
+		 * Take private lock to ensure the subpage won't be detached
+		 * halfway.
+		 */
+		spin_lock(&page->mapping->private_lock);
+		if (!PagePrivate(page)) {
+			spin_unlock(&page->mapping->private_lock);
+			break;
+		}
+		spin_lock_irqsave(&subpage->lock, flags);
+		if (!((1 << bit_start) & subpage->dirty_bitmap)) {
+			spin_unlock_irqrestore(&subpage->lock, flags);
+			spin_unlock(&page->mapping->private_lock);
+			bit_start++;
+			continue;
+		}
+
+		start = page_start + bit_start * fs_info->sectorsize;
+		bit_start += sectors_per_node;
+
+		/*
+		 * Here we just want to grab the eb without touching extra
+		 * spin locks. So here we call find_extent_buffer_nospinlock().
+		 */
+		eb = find_extent_buffer_nospinlock(fs_info, start);
+		spin_unlock_irqrestore(&subpage->lock, flags);
+		spin_unlock(&page->mapping->private_lock);
+
+		/*
+		 * The eb has already reached 0 refs, thus
+		 * find_extent_buffer_nospinlock() doesn't return it. We don't
+		 * need to write back such an eb anyway.
+		 */
+		if (!eb)
+			continue;
+
+		ret = lock_extent_buffer_for_io(eb, epd);
+		if (ret == 0) {
+			free_extent_buffer(eb);
+			continue;
+		}
+		if (ret < 0) {
+			free_extent_buffer(eb);
+			goto cleanup;
+		}
+		ret = write_one_eb(eb, wbc, epd);
+		free_extent_buffer(eb);
+		if (ret < 0)
+			goto cleanup;
+		submitted++;
+	}
+	return submitted;
+
+cleanup:
+	/* We hit an error, end bio for the submitted extent buffers */
+	end_write_bio(epd, ret);
+	return ret;
+}
+
 /*
  * Submit all page(s) of one extent buffer.
  *
@@ -4315,6 +4407,9 @@ static int submit_eb_page(struct page *page, struct writeback_control *wbc,
 	if (!PagePrivate(page))
 		return 0;
 
+	if (btrfs_sb(page->mapping->host->i_sb)->sectorsize < PAGE_SIZE)
+		return submit_eb_subpage(page, wbc, epd);
+
 	spin_lock(&mapping->private_lock);
 	if (!PagePrivate(page)) {
 		spin_unlock(&mapping->private_lock);
-- 
2.30.0



* Re: [PATCH 03/12] btrfs: disk-io: allow btree_set_page_dirty() to do more sanity check on subpage metadata
  2021-02-22  6:33 ` [PATCH 03/12] btrfs: disk-io: allow btree_set_page_dirty() to do more sanity check on subpage metadata Qu Wenruo
@ 2021-02-22  7:58   ` Su Yue
  2021-03-01 16:29     ` David Sterba
  0 siblings, 1 reply; 19+ messages in thread
From: Su Yue @ 2021-02-22  7:58 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs


On Mon 22 Feb 2021 at 14:33, Qu Wenruo <wqu@suse.com> wrote:

> For btree_set_page_dirty(), we should also check the extent 
> buffer
> sanity for subpage support.
>
> Unlike the regular sector size case, since one page can contain 
> multiple
> extent buffers, we need to make sure there is at least one dirty 
> extent
> buffer in the page.
>
> So this patch will iterate through the 
> btrfs_subpage::dirty_bitmap
> to get the extent buffers, and check if any dirty extent buffer 
> in the page
> range has EXTENT_BUFFER_DIRTY and proper refs.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/disk-io.c | 47 
>  ++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 41 insertions(+), 6 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index c2576c5fe62e..437e6b2163c7 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -42,6 +42,7 @@
>  #include "discard.h"
>  #include "space-info.h"
>  #include "zoned.h"
> +#include "subpage.h"
>
>  #define BTRFS_SUPER_FLAG_SUPP	(BTRFS_HEADER_FLAG_WRITTEN |\
>  				 BTRFS_HEADER_FLAG_RELOC |\
> @@ -992,14 +993,48 @@ static void btree_invalidatepage(struct 
> page *page, unsigned int offset,
>  static int btree_set_page_dirty(struct page *page)
>  {
>  #ifdef DEBUG
> +	struct btrfs_fs_info *fs_info = 
> btrfs_sb(page->mapping->host->i_sb);
> +	struct btrfs_subpage *subpage;
>  	struct extent_buffer *eb;
> +	int cur_bit;
>
cur_bit is not initialized.

> +	u64 page_start = page_offset(page);
> +
> +	if (fs_info->sectorsize == PAGE_SIZE) {
> +		BUG_ON(!PagePrivate(page));
> +		eb = (struct extent_buffer *)page->private;
> +		BUG_ON(!eb);
> +		BUG_ON(!test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
> +		BUG_ON(!atomic_read(&eb->refs));
> +		btrfs_assert_tree_locked(eb);
> +		return __set_page_dirty_nobuffers(page);
> +	}
> +	ASSERT(PagePrivate(page) && page->private);
> +	subpage = (struct btrfs_subpage *)page->private;
> +
> +	ASSERT(subpage->dirty_bitmap);
> +	while (cur_bit < BTRFS_SUBPAGE_BITMAP_SIZE) {
> +		unsigned long flags;
> +		u64 cur;
> +		u16 tmp = (1 << cur_bit);
> +
> +		spin_lock_irqsave(&subpage->lock, flags);
> +		if (!(tmp & subpage->dirty_bitmap)) {
> +			spin_unlock_irqrestore(&subpage->lock, flags);
> +			cur_bit++;
> +			continue;
> +		}
> +		spin_unlock_irqrestore(&subpage->lock, flags);
> +		cur = page_start + cur_bit * fs_info->sectorsize;
>
> -	BUG_ON(!PagePrivate(page));
> -	eb = (struct extent_buffer *)page->private;
> -	BUG_ON(!eb);
> -	BUG_ON(!test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
> -	BUG_ON(!atomic_read(&eb->refs));
> -	btrfs_assert_tree_locked(eb);
> +		eb = find_extent_buffer(fs_info, cur);
> +		ASSERT(eb);
> +		ASSERT(test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
> +		ASSERT(atomic_read(&eb->refs));
> +		btrfs_assert_tree_locked(eb);
> +		free_extent_buffer(eb);
> +
> +		cur_bit += (fs_info->nodesize >> 
> fs_info->sectorsize_bits);
> +	}
>  #endif
>  	return __set_page_dirty_nobuffers(page);
>  }



* Re: [PATCH 00/12] btrfs: support read-write for subpage metadata
  2021-02-22  6:33 [PATCH 00/12] btrfs: support read-write for subpage metadata Qu Wenruo
                   ` (11 preceding siblings ...)
  2021-02-22  6:33 ` [PATCH 12/12] btrfs: extent_io: introduce submit_eb_subpage() to submit a subpage metadata page Qu Wenruo
@ 2021-03-01 16:22 ` David Sterba
  2021-03-01 16:30 ` David Sterba
  13 siblings, 0 replies; 19+ messages in thread
From: David Sterba @ 2021-03-01 16:22 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Mon, Feb 22, 2021 at 02:33:45PM +0800, Qu Wenruo wrote:
> This patchset can be fetched from the following github repo, along with
> the full subpage RW support:
> https://github.com/adam900710/linux/tree/subpage
> 
> This patchset is for metadata read write support.

I've skimmed the patches; they add further helpers and special cases
for subpage (but I could have missed something). From that I think it's
ok to add it to for-next for some test coverage, mainly to make sure
the subpage changes do not bleed into the regular case.

> [TEST]
> Since the data write path is not included in this patchset, we can't
> really test it, but during the lunar new year vocation, I have tested
> the full RW patchset with "fstresss -n 10000 -p2" on my Aarch64 board.
> 
> And the full RW patchset survives without any crash for a full week.
> 
> There is only one remaining bug exposed during the test, that we have
> random data checksum mismatch, which is still under investigation.
> 
> But the metadata part should be OK for submission.
> 
> [DIFFERENCE AGAINST REGULAR SECTORSIZE]
> The metadata part in fact has more new code than data part, as it has
> some different behaviors compared to the regular sector size handling:
> 
> - No more page locking
>   Now metadata read/write relies on extent io tree locking, other than
>   page locking.
>   This is to allow behaviors like read lock one eb while also try to
>   read lock another eb in the same page.
>   We can't rely on page lock as now we have multiple extent buffers in
>   the same page.
> 
> - Page status update
>   Now we use subpage wrappers to handle page status update.
> 
> - How to submit dirty extent buffers
>   Instead of just grabbing extent buffer from page::private, we need to
>   iterate all dirty extent buffers in the page and submit them.

I'm not sure if all this information is also preserved in some comments;
if not, it definitely should be.


* Re: [PATCH 03/12] btrfs: disk-io: allow btree_set_page_dirty() to do more sanity check on subpage metadata
  2021-02-22  7:58   ` Su Yue
@ 2021-03-01 16:29     ` David Sterba
  0 siblings, 0 replies; 19+ messages in thread
From: David Sterba @ 2021-03-01 16:29 UTC (permalink / raw)
  To: Su Yue; +Cc: Qu Wenruo, linux-btrfs

On Mon, Feb 22, 2021 at 03:58:00PM +0800, Su Yue wrote:
> 
> On Mon 22 Feb 2021 at 14:33, Qu Wenruo <wqu@suse.com> wrote:
> 
> > For btree_set_page_dirty(), we should also check the extent 
> > buffer
> > sanity for subpage support.
> >
> > Unlike the regular sector size case, since one page can contain 
> > multiple
> > extent buffers, we need to make sure there is at least one dirty 
> > extent
> > buffer in the page.
> >
> > So this patch will iterate through the 
> > btrfs_subpage::dirty_bitmap
> > to get the extent buffers, and check if any dirty extent buffer 
> > in the page
> > range has EXTENT_BUFFER_DIRTY and proper refs.
> >
> > Signed-off-by: Qu Wenruo <wqu@suse.com>
> > ---
> >  fs/btrfs/disk-io.c | 47 
> >  ++++++++++++++++++++++++++++++++++++++++------
> >  1 file changed, 41 insertions(+), 6 deletions(-)
> >
> > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> > index c2576c5fe62e..437e6b2163c7 100644
> > --- a/fs/btrfs/disk-io.c
> > +++ b/fs/btrfs/disk-io.c
> > @@ -42,6 +42,7 @@
> >  #include "discard.h"
> >  #include "space-info.h"
> >  #include "zoned.h"
> > +#include "subpage.h"
> >
> >  #define BTRFS_SUPER_FLAG_SUPP	(BTRFS_HEADER_FLAG_WRITTEN |\
> >  				 BTRFS_HEADER_FLAG_RELOC |\
> > @@ -992,14 +993,48 @@ static void btree_invalidatepage(struct 
> > page *page, unsigned int offset,
> >  static int btree_set_page_dirty(struct page *page)
> >  {
> >  #ifdef DEBUG
> > +	struct btrfs_fs_info *fs_info = 
> > btrfs_sb(page->mapping->host->i_sb);
> > +	struct btrfs_subpage *subpage;
> >  	struct extent_buffer *eb;
> > +	int cur_bit;
> >
> cur_bit is not initialized.

Indeed, and it's strange that gcc does not warn about that, either by
default or with W=3.


* Re: [PATCH 00/12] btrfs: support read-write for subpage metadata
  2021-02-22  6:33 [PATCH 00/12] btrfs: support read-write for subpage metadata Qu Wenruo
                   ` (12 preceding siblings ...)
  2021-03-01 16:22 ` [PATCH 00/12] btrfs: support read-write for subpage metadata David Sterba
@ 2021-03-01 16:30 ` David Sterba
  2021-03-02  0:18   ` Qu Wenruo
  13 siblings, 1 reply; 19+ messages in thread
From: David Sterba @ 2021-03-01 16:30 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Mon, Feb 22, 2021 at 02:33:45PM +0800, Qu Wenruo wrote:
> This patchset can be fetched from the following github repo, along with
> the full subpage RW support:
> https://github.com/adam900710/linux/tree/subpage
> 
> This patchset is for metadata read write support.
> 
> [TEST]
> Since the data write path is not included in this patchset, we can't
> really test it, but during the lunar new year vocation, I have tested
> the full RW patchset with "fstresss -n 10000 -p2" on my Aarch64 board.
> 
> And the full RW patchset survives without any crash for a full week.
> 
> There is only one remaining bug exposed during the test, that we have
> random data checksum mismatch, which is still under investigation.
> 
> But the metadata part should be OK for submission.
> 
> [DIFFERENCE AGAINST REGULAR SECTORSIZE]
> The metadata part in fact has more new code than data part, as it has
> some different behaviors compared to the regular sector size handling:
> 
> - No more page locking
>   Now metadata read/write relies on extent io tree locking, other than
>   page locking.
>   This is to allow behaviors like read lock one eb while also try to
>   read lock another eb in the same page.
>   We can't rely on page lock as now we have multiple extent buffers in
>   the same page.
> 
> - Page status update
>   Now we use subpage wrappers to handle page status update.
> 
> - How to submit dirty extent buffers
>   Instead of just grabbing extent buffer from page::private, we need to
>   iterate all dirty extent buffers in the page and submit them.
> 
> Qu Wenruo (12):
>   btrfs: subpage: introduce helper for subpage dirty status
>   btrfs: subpage: introduce helper for subpage writeback status
>   btrfs: disk-io: allow btree_set_page_dirty() to do more sanity check
>     on subpage metadata
>   btrfs: disk-io: support subpage metadata csum calculation at write
>     time
>   btrfs: extent_io: make alloc_extent_buffer() check subpage dirty
>     bitmap
>   btrfs: extent_io: make the page uptodate assert check to handle
>     subpage
>   btrfs: extent_io: make set/clear_extent_buffer_dirty() to support
>     subpage sized metadata
>   btrfs: extent_io: make set_btree_ioerr() accept extent buffer and
>     handle subpage metadata
>   btrfs: extent_io: introduce end_bio_subpage_eb_writepage() function
>   btrfs: extent_io: introduce write_one_subpage_eb() function
>   btrfs: extent_io: make lock_extent_buffer_for_io() to support subpage
>     metadata
>   btrfs: extent_io: introduce submit_eb_subpage() to submit a subpage
>     metadata page

Please don't use "extent_io" or "disk-io" in subjects.


* Re: [PATCH 09/12] btrfs: extent_io: introduce end_bio_subpage_eb_writepage() function
  2021-02-22  6:33 ` [PATCH 09/12] btrfs: extent_io: introduce end_bio_subpage_eb_writepage() function Qu Wenruo
@ 2021-03-01 16:33   ` David Sterba
  0 siblings, 0 replies; 19+ messages in thread
From: David Sterba @ 2021-03-01 16:33 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Mon, Feb 22, 2021 at 02:33:54PM +0800, Qu Wenruo wrote:
> +static void end_bio_subpage_eb_writepage(struct btrfs_fs_info *fs_info,
> +					 struct bio *bio)
> +{
> +	struct bio_vec *bvec;
> +	struct bvec_iter_all iter_all;
> +
> +	ASSERT(!bio_flagged(bio, BIO_CLONED));
> +	bio_for_each_segment_all(bvec, bio, iter_all) {
> +		struct page *page = bvec->bv_page;
> +		u64 bvec_start = page_offset(page) + bvec->bv_offset;
> +		u64 bvec_end = bvec_start + bvec->bv_len - 1;
> +		u64 cur_bytenr = bvec_start;
> +
> +		ASSERT(IS_ALIGNED(bvec->bv_len, fs_info->nodesize));
> +
> +		/* Iterate through all extent buffers in the range */
> +		while (cur_bytenr <= bvec_end) {
> +			struct extent_buffer *eb;
> +			int done;
> +
> +			/*
> +			 * Here we can't use find_extent_buffer(), as it may
> +			 * try to lock eb->refs_lock, which is not safe in endio 

Please make sure you don't leave whitespace damage in newly added code;
otherwise 'git am' fails to apply the patches and I need to fix it manually.

warning: 1 line adds whitespace errors.
*
* You have some suspicious patch lines:
*
* In fs/btrfs/extent_io.c
* trailing whitespace (line 4090)
fs/btrfs/extent_io.c:4090:                       * try to lock eb->refs_lock, which is not safe in endio


* Re: [PATCH 00/12] btrfs: support read-write for subpage metadata
  2021-03-01 16:30 ` David Sterba
@ 2021-03-02  0:18   ` Qu Wenruo
  0 siblings, 0 replies; 19+ messages in thread
From: Qu Wenruo @ 2021-03-02  0:18 UTC (permalink / raw)
  To: dsterba, linux-btrfs



On 2021/3/2 12:30 AM, David Sterba wrote:
> On Mon, Feb 22, 2021 at 02:33:45PM +0800, Qu Wenruo wrote:
>> This patchset can be fetched from the following github repo, along with
>> the full subpage RW support:
>> https://github.com/adam900710/linux/tree/subpage
>>
>> This patchset is for metadata read write support.
>>
>> [TEST]
>> Since the data write path is not included in this patchset, we can't
>> really test it, but during the Lunar New Year vacation, I tested
>> the full RW patchset with "fsstress -n 10000 -p2" on my AArch64 board.
>>
>> And the full RW patchset survives without any crash for a full week.
>>
>> There is only one remaining bug exposed during the test, that we have
>> random data checksum mismatch, which is still under investigation.
>>
>> But the metadata part should be OK for submission.
>>
>> [DIFFERENCE AGAINST REGULAR SECTORSIZE]
>> The metadata part in fact has more new code than data part, as it has
>> some different behaviors compared to the regular sector size handling:
>>
>> - No more page locking
>>    Now metadata read/write relies on extent io tree locking, rather than
>>    page locking.
>>    This is to allow behaviors like read-locking one eb while also trying
>>    to read-lock another eb in the same page.
>>    We can't rely on page lock as now we have multiple extent buffers in
>>    the same page.
>>
>> - Page status update
>>    Now we use subpage wrappers to handle page status update.
>>
>> - How to submit dirty extent buffers
>>    Instead of just grabbing extent buffer from page::private, we need to
>>    iterate all dirty extent buffers in the page and submit them.
>>
>> Qu Wenruo (12):
>>    btrfs: subpage: introduce helper for subpage dirty status
>>    btrfs: subpage: introduce helper for subpage writeback status
>>    btrfs: disk-io: allow btree_set_page_dirty() to do more sanity check
>>      on subpage metadata
>>    btrfs: disk-io: support subpage metadata csum calculation at write
>>      time
>>    btrfs: extent_io: make alloc_extent_buffer() check subpage dirty
>>      bitmap
>>    btrfs: extent_io: make the page uptodate assert check to handle
>>      subpage
>>    btrfs: extent_io: make set/clear_extent_buffer_dirty() to support
>>      subpage sized metadata
>>    btrfs: extent_io: make set_btree_ioerr() accept extent buffer and
>>      handle subpage metadata
>>    btrfs: extent_io: introduce end_bio_subpage_eb_writepage() function
>>    btrfs: extent_io: introduce write_one_subpage_eb() function
>>    btrfs: extent_io: make lock_extent_buffer_for_io() to support subpage
>>      metadata
>>    btrfs: extent_io: introduce submit_eb_subpage() to submit a subpage
>>      metadata page
> 
> Please don't use "extent_io" or "disk-io" in subjects.
> 
Oh, those patches predate the naming scheme change.

I should recheck them against the full checklist.

Thanks,
Qu



Thread overview: 19+ messages
2021-02-22  6:33 [PATCH 00/12] btrfs: support read-write for subpage metadata Qu Wenruo
2021-02-22  6:33 ` [PATCH 01/12] btrfs: subpage: introduce helper for subpage dirty status Qu Wenruo
2021-02-22  6:33 ` [PATCH 02/12] btrfs: subpage: introduce helper for subpage writeback status Qu Wenruo
2021-02-22  6:33 ` [PATCH 03/12] btrfs: disk-io: allow btree_set_page_dirty() to do more sanity check on subpage metadata Qu Wenruo
2021-02-22  7:58   ` Su Yue
2021-03-01 16:29     ` David Sterba
2021-02-22  6:33 ` [PATCH 04/12] btrfs: disk-io: support subpage metadata csum calculation at write time Qu Wenruo
2021-02-22  6:33 ` [PATCH 05/12] btrfs: extent_io: make alloc_extent_buffer() check subpage dirty bitmap Qu Wenruo
2021-02-22  6:33 ` [PATCH 06/12] btrfs: extent_io: make the page uptodate assert check to handle subpage Qu Wenruo
2021-02-22  6:33 ` [PATCH 07/12] btrfs: extent_io: make set/clear_extent_buffer_dirty() to support subpage sized metadata Qu Wenruo
2021-02-22  6:33 ` [PATCH 08/12] btrfs: extent_io: make set_btree_ioerr() accept extent buffer and handle subpage metadata Qu Wenruo
2021-02-22  6:33 ` [PATCH 09/12] btrfs: extent_io: introduce end_bio_subpage_eb_writepage() function Qu Wenruo
2021-03-01 16:33   ` David Sterba
2021-02-22  6:33 ` [PATCH 10/12] btrfs: extent_io: introduce write_one_subpage_eb() function Qu Wenruo
2021-02-22  6:33 ` [PATCH 11/12] btrfs: extent_io: make lock_extent_buffer_for_io() to support subpage metadata Qu Wenruo
2021-02-22  6:33 ` [PATCH 12/12] btrfs: extent_io: introduce submit_eb_subpage() to submit a subpage metadata page Qu Wenruo
2021-03-01 16:22 ` [PATCH 00/12] btrfs: support read-write for subpage metadata David Sterba
2021-03-01 16:30 ` David Sterba
2021-03-02  0:18   ` Qu Wenruo
