All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 00/15] btrfs: support read-write for subpage metadata
@ 2021-03-10  9:08 Qu Wenruo
  2021-03-10  9:08 ` [PATCH v2 01/15] btrfs: add sysfs interface for supported sectorsize Qu Wenruo
                   ` (14 more replies)
  0 siblings, 15 replies; 25+ messages in thread
From: Qu Wenruo @ 2021-03-10  9:08 UTC (permalink / raw)
  To: linux-btrfs

This patchset can be fetched from the following github repo, along with
the full subpage RW support:
https://github.com/adam900710/linux/tree/subpage

This patchset is for metadata read write support.

[FULL RW TEST]
Since the data write path is not included in this patchset, we can't
really test the patchset itself, but anyone can grab the patch from
github repo and do fstests/generic tests.

There are some known issues:
- Very very rare random ASSERT() failure for data page::private
  It looks like we can lock a data page without page::private set for
  subpage.
  This problem seems to be caused some set_page_extent_mapped() callers
  are not holding the page locked, thus leaving a small window.
  Investigating.

- Defrag related test failure
  Since current defrag is doing per-page defrag, to support subpage
  defrag, we need some change in the loop.
  Thus for now, defrag is disabled completely for subpage RW mount.

- No compression support yet
  There are at least 2 known bugs if forcing compression for subpage
  * Some hard coded PAGE_SIZE screwing up space rsv
  * Subpage ASSERT() triggered
    This is because some compression code is unlocking locked_page by
    calling extent_clear_unlock_delalloc() with locked_page == NULL.
  So for now compression is also disabled.

[DIFFERENCE AGAINST REGULAR SECTORSIZE]
The metadata part in fact has more new code than data part, as it has
some different behaviors compared to the regular sector size handling:

- No more page locking
  Now metadata read/write relies on extent io tree locking, other than
  page locking.
  This is to allow behaviors like read lock one eb while also try to
  read lock another eb in the same page.
  We can't rely on page lock as now we have multiple extent buffers in
  the same page.

- Page status update
  Now we use subpage wrappers to handle page status update.

- How to submit dirty extent buffers
  Instead of just grabbing extent buffer from page::private, we need to
  iterate all dirty extent buffers in the page and submit them.

[CHANGELOG]
v2:
- Rebased to latest misc-next
  No conflicts at all.

- Add new sysfs interface to grab supported RO/RW sectorsize
  This will allow mkfs.btrfs to detect unmountable fs better.

- Use newer naming schema for each patch
  No more "extent_io:" or "inode:" schema anymore.

- Move two pure cleanups to the series
  Patch 2~3, originally in RW part.

- Fix one uninitialized variable
  Patch 6.

Qu Wenruo (15):
  btrfs: add sysfs interface for supported sectorsize
  btrfs: use min() to replace open-code in btrfs_invalidatepage()
  btrfs: remove unnecessary variable shadowing in btrfs_invalidatepage()
  btrfs: introduce helpers for subpage dirty status
  btrfs: introduce helpers for subpage writeback status
  btrfs: allow btree_set_page_dirty() to do more sanity check on subpage
    metadata
  btrfs: support subpage metadata csum calculation at write time
  btrfs: make alloc_extent_buffer() check subpage dirty bitmap
  btrfs: make the page uptodate assert to be subpage compatible
  btrfs: make set/clear_extent_buffer_dirty() to be subpage compatible
  btrfs: make set_btree_ioerr() accept extent buffer and to be subpage
    compatible
  btrfs: introduce end_bio_subpage_eb_writepage() function
  btrfs: introduce write_one_subpage_eb() function
  btrfs: make lock_extent_buffer_for_io() to be subpage compatible
  btrfs: introduce submit_eb_subpage() to submit a subpage metadata page

 fs/btrfs/disk-io.c   | 143 +++++++++++----
 fs/btrfs/extent_io.c | 420 ++++++++++++++++++++++++++++++++++++-------
 fs/btrfs/inode.c     |  14 +-
 fs/btrfs/subpage.c   |  73 ++++++++
 fs/btrfs/subpage.h   |  17 ++
 fs/btrfs/sysfs.c     |  34 ++++
 6 files changed, 598 insertions(+), 103 deletions(-)

-- 
2.30.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 01/15] btrfs: add sysfs interface for supported sectorsize
  2021-03-10  9:08 [PATCH v2 00/15] btrfs: support read-write for subpage metadata Qu Wenruo
@ 2021-03-10  9:08 ` Qu Wenruo
  2021-03-15 11:59   ` Anand Jain
  2021-03-10  9:08 ` [PATCH v2 02/15] btrfs: use min() to replace open-code in btrfs_invalidatepage() Qu Wenruo
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 25+ messages in thread
From: Qu Wenruo @ 2021-03-10  9:08 UTC (permalink / raw)
  To: linux-btrfs

Add extra sysfs interface features/supported_ro_sectorsize and
features/supported_rw_sectorsize to indicate subpage support.

Currently for supported_rw_sectorsize all architectures only have their
PAGE_SIZE listed.

While for supported_ro_sectorsize, for systems with 64K page size, 4K
sectorsize is also supported.

This new sysfs interface would help mkfs.btrfs to do more accurate
warning.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/sysfs.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 6eb1c50fa98c..3ef419899472 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -360,11 +360,45 @@ static ssize_t supported_rescue_options_show(struct kobject *kobj,
 BTRFS_ATTR(static_feature, supported_rescue_options,
 	   supported_rescue_options_show);
 
+static ssize_t supported_ro_sectorsize_show(struct kobject *kobj,
+					    struct kobj_attribute *a,
+					    char *buf)
+{
+	ssize_t ret = 0;
+	int i = 0;
+
+	/* For 64K page size, 4K sector size is supported */
+	if (PAGE_SIZE == SZ_64K) {
+		ret += scnprintf(buf + ret, PAGE_SIZE - ret, "%u", SZ_4K);
+		i++;
+	}
+	/* Other than above subpage, only support PAGE_SIZE as sectorsize yet */
+	ret += scnprintf(buf + ret, PAGE_SIZE - ret, "%s%lu\n",
+			 (i ? " " : ""), PAGE_SIZE);
+	return ret;
+}
+BTRFS_ATTR(static_feature, supported_ro_sectorsize,
+	   supported_ro_sectorsize_show);
+
+static ssize_t supported_rw_sectorsize_show(struct kobject *kobj,
+					    struct kobj_attribute *a,
+					    char *buf)
+{
+	ssize_t ret = 0;
+
+	/* Only PAGE_SIZE as sectorsize is supported */
+	ret += scnprintf(buf + ret, PAGE_SIZE - ret, "%lu\n", PAGE_SIZE);
+	return ret;
+}
+BTRFS_ATTR(static_feature, supported_rw_sectorsize,
+	   supported_rw_sectorsize_show);
 static struct attribute *btrfs_supported_static_feature_attrs[] = {
 	BTRFS_ATTR_PTR(static_feature, rmdir_subvol),
 	BTRFS_ATTR_PTR(static_feature, supported_checksums),
 	BTRFS_ATTR_PTR(static_feature, send_stream_version),
 	BTRFS_ATTR_PTR(static_feature, supported_rescue_options),
+	BTRFS_ATTR_PTR(static_feature, supported_ro_sectorsize),
+	BTRFS_ATTR_PTR(static_feature, supported_rw_sectorsize),
 	NULL
 };
 
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 02/15] btrfs: use min() to replace open-code in btrfs_invalidatepage()
  2021-03-10  9:08 [PATCH v2 00/15] btrfs: support read-write for subpage metadata Qu Wenruo
  2021-03-10  9:08 ` [PATCH v2 01/15] btrfs: add sysfs interface for supported sectorsize Qu Wenruo
@ 2021-03-10  9:08 ` Qu Wenruo
  2021-03-15 12:03   ` Anand Jain
  2021-03-10  9:08 ` [PATCH v2 03/15] btrfs: remove unnecessary variable shadowing " Qu Wenruo
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 25+ messages in thread
From: Qu Wenruo @ 2021-03-10  9:08 UTC (permalink / raw)
  To: linux-btrfs

In btrfs_invalidatepage() we introduce a temporary variable, new_len, to
update ordered->truncated_len.

But we can use min() to replace it completely and no need for the
variable.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 52dc5f52ea58..2973cec05505 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8405,15 +8405,13 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 		 */
 		if (TestClearPagePrivate2(page)) {
 			struct btrfs_ordered_inode_tree *tree;
-			u64 new_len;
 
 			tree = &inode->ordered_tree;
 
 			spin_lock_irq(&tree->lock);
 			set_bit(BTRFS_ORDERED_TRUNCATED, &ordered->flags);
-			new_len = start - ordered->file_offset;
-			if (new_len < ordered->truncated_len)
-				ordered->truncated_len = new_len;
+			ordered->truncated_len = min(ordered->truncated_len,
+					start - ordered->file_offset);
 			spin_unlock_irq(&tree->lock);
 
 			if (btrfs_dec_test_ordered_pending(inode, &ordered,
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 03/15] btrfs: remove unnecessary variable shadowing in btrfs_invalidatepage()
  2021-03-10  9:08 [PATCH v2 00/15] btrfs: support read-write for subpage metadata Qu Wenruo
  2021-03-10  9:08 ` [PATCH v2 01/15] btrfs: add sysfs interface for supported sectorsize Qu Wenruo
  2021-03-10  9:08 ` [PATCH v2 02/15] btrfs: use min() to replace open-code in btrfs_invalidatepage() Qu Wenruo
@ 2021-03-10  9:08 ` Qu Wenruo
  2021-03-15 12:06   ` Anand Jain
  2021-03-10  9:08 ` [PATCH v2 04/15] btrfs: introduce helpers for subpage dirty status Qu Wenruo
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 25+ messages in thread
From: Qu Wenruo @ 2021-03-10  9:08 UTC (permalink / raw)
  To: linux-btrfs

In btrfs_invalidatepage() we re-declare @tree variable as
btrfs_ordered_inode_tree.

Since it's only used to do the spinlock, we can grab it from inode
directly, and remove the unnecessary declaration completely.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2973cec05505..f99554f0bd48 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8404,15 +8404,11 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 		 * for the finish_ordered_io
 		 */
 		if (TestClearPagePrivate2(page)) {
-			struct btrfs_ordered_inode_tree *tree;
-
-			tree = &inode->ordered_tree;
-
-			spin_lock_irq(&tree->lock);
+			spin_lock_irq(&inode->ordered_tree.lock);
 			set_bit(BTRFS_ORDERED_TRUNCATED, &ordered->flags);
 			ordered->truncated_len = min(ordered->truncated_len,
 					start - ordered->file_offset);
-			spin_unlock_irq(&tree->lock);
+			spin_unlock_irq(&inode->ordered_tree.lock);
 
 			if (btrfs_dec_test_ordered_pending(inode, &ordered,
 							   start,
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 04/15] btrfs: introduce helpers for subpage dirty status
  2021-03-10  9:08 [PATCH v2 00/15] btrfs: support read-write for subpage metadata Qu Wenruo
                   ` (2 preceding siblings ...)
  2021-03-10  9:08 ` [PATCH v2 03/15] btrfs: remove unnecessary variable shadowing " Qu Wenruo
@ 2021-03-10  9:08 ` Qu Wenruo
  2021-03-10  9:08 ` [PATCH v2 05/15] btrfs: introduce helpers for subpage writeback status Qu Wenruo
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: Qu Wenruo @ 2021-03-10  9:08 UTC (permalink / raw)
  To: linux-btrfs

This patch introduce the following functions to handle btrfs subpage
dirty status:
- btrfs_subpage_set_dirty()
- btrfs_subpage_clear_dirty()
- btrfs_subpage_test_dirty()
  Those helpers can only be called when the range is ensured to be
  inside the page.

- btrfs_page_set_dirty()
- btrfs_page_clear_dirty()
- btrfs_page_test_dirty()
  Those helpers can handle both regular sector size and subpage without
  problem.
  Thus those would be used to replace PageDirty() related calls in
  later commits.

There is one special point to note here, just like set_page_dirty() and
clear_page_dirty_for_io(), btrfs_*page_set_dirty() and
btrfs_*page_clear_dirty() must be called with page locked.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/subpage.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/subpage.h | 15 +++++++++++++++
 2 files changed, 58 insertions(+)

diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index c69049e7daa9..183925902031 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -220,6 +220,46 @@ void btrfs_subpage_clear_error(const struct btrfs_fs_info *fs_info,
 	spin_unlock_irqrestore(&subpage->lock, flags);
 }
 
+void btrfs_subpage_set_dirty(const struct btrfs_fs_info *fs_info,
+		struct page *page, u64 start, u32 len)
+{
+	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
+	u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len);
+	unsigned long flags;
+
+	spin_lock_irqsave(&subpage->lock, flags);
+	subpage->dirty_bitmap |= tmp;
+	spin_unlock_irqrestore(&subpage->lock, flags);
+	set_page_dirty(page);
+}
+
+bool btrfs_subpage_clear_and_test_dirty(const struct btrfs_fs_info *fs_info,
+		struct page *page, u64 start, u32 len)
+{
+	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
+	u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len);
+	unsigned long flags;
+	bool last = false;
+
+
+	spin_lock_irqsave(&subpage->lock, flags);
+	subpage->dirty_bitmap &= ~tmp;
+	if (subpage->dirty_bitmap == 0)
+		last = true;
+	spin_unlock_irqrestore(&subpage->lock, flags);
+	return last;
+}
+
+void btrfs_subpage_clear_dirty(const struct btrfs_fs_info *fs_info,
+		struct page *page, u64 start, u32 len)
+{
+	bool last;
+
+	last = btrfs_subpage_clear_and_test_dirty(fs_info, page, start, len);
+	if (last)
+		clear_page_dirty_for_io(page);
+}
+
 /*
  * Unlike set/clear which is dependent on each page status, for test all bits
  * are tested in the same way.
@@ -240,6 +280,7 @@ bool btrfs_subpage_test_##name(const struct btrfs_fs_info *fs_info,	\
 }
 IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(uptodate);
 IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(error);
+IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(dirty);
 
 /*
  * Note that, in selftests (extent-io-tests), we can have empty fs_info passed
@@ -276,3 +317,5 @@ bool btrfs_page_test_##name(const struct btrfs_fs_info *fs_info,	\
 IMPLEMENT_BTRFS_PAGE_OPS(uptodate, SetPageUptodate, ClearPageUptodate,
 			 PageUptodate);
 IMPLEMENT_BTRFS_PAGE_OPS(error, SetPageError, ClearPageError, PageError);
+IMPLEMENT_BTRFS_PAGE_OPS(dirty, set_page_dirty, clear_page_dirty_for_io,
+			 PageDirty);
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index b86a4881475d..adaece5ce294 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -20,6 +20,7 @@ struct btrfs_subpage {
 	spinlock_t lock;
 	u16 uptodate_bitmap;
 	u16 error_bitmap;
+	u16 dirty_bitmap;
 	union {
 		/*
 		 * Structures only used by metadata
@@ -87,5 +88,19 @@ bool btrfs_page_test_##name(const struct btrfs_fs_info *fs_info,	\
 
 DECLARE_BTRFS_SUBPAGE_OPS(uptodate);
 DECLARE_BTRFS_SUBPAGE_OPS(error);
+DECLARE_BTRFS_SUBPAGE_OPS(dirty);
+
+/*
+ * Extra clear_and_test function for subpage dirty bitmap.
+ *
+ * Return true if we're the last bits in the dirty_bitmap and clear the
+ * dirty_bitmap.
+ * Return false otherwise.
+ *
+ * NOTE: Callers should manually clear page dirty for true case, as we have
+ * extra handling for tree blocks.
+ */
+bool btrfs_subpage_clear_and_test_dirty(const struct btrfs_fs_info *fs_info,
+		struct page *page, u64 start, u32 len);
 
 #endif
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 05/15] btrfs: introduce helpers for subpage writeback status
  2021-03-10  9:08 [PATCH v2 00/15] btrfs: support read-write for subpage metadata Qu Wenruo
                   ` (3 preceding siblings ...)
  2021-03-10  9:08 ` [PATCH v2 04/15] btrfs: introduce helpers for subpage dirty status Qu Wenruo
@ 2021-03-10  9:08 ` Qu Wenruo
  2021-03-10  9:08 ` [PATCH v2 06/15] btrfs: allow btree_set_page_dirty() to do more sanity check on subpage metadata Qu Wenruo
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: Qu Wenruo @ 2021-03-10  9:08 UTC (permalink / raw)
  To: linux-btrfs

This patch introduces the following functions to handle btrfs subpage
writeback status:
- btrfs_subpage_set_writeback()
- btrfs_subpage_clear_writeback()
- btrfs_subpage_test_writeback()
  Those helpers can only be called when the range is ensured to be
  inside the page.

- btrfs_page_set_writeback()
- btrfs_page_clear_writeback()
- btrfs_page_test_writeback()
  Those helpers can handle both regular sector size and subpage without
  problem.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/subpage.c | 30 ++++++++++++++++++++++++++++++
 fs/btrfs/subpage.h |  2 ++
 2 files changed, 32 insertions(+)

diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index 183925902031..2a326d6385ed 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -260,6 +260,33 @@ void btrfs_subpage_clear_dirty(const struct btrfs_fs_info *fs_info,
 		clear_page_dirty_for_io(page);
 }
 
+void btrfs_subpage_set_writeback(const struct btrfs_fs_info *fs_info,
+		struct page *page, u64 start, u32 len)
+{
+	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
+	u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len);
+	unsigned long flags;
+
+	spin_lock_irqsave(&subpage->lock, flags);
+	subpage->writeback_bitmap |= tmp;
+	set_page_writeback(page);
+	spin_unlock_irqrestore(&subpage->lock, flags);
+}
+
+void btrfs_subpage_clear_writeback(const struct btrfs_fs_info *fs_info,
+		struct page *page, u64 start, u32 len)
+{
+	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
+	u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len);
+	unsigned long flags;
+
+	spin_lock_irqsave(&subpage->lock, flags);
+	subpage->writeback_bitmap &= ~tmp;
+	if (subpage->writeback_bitmap == 0)
+		end_page_writeback(page);
+	spin_unlock_irqrestore(&subpage->lock, flags);
+}
+
 /*
  * Unlike set/clear which is dependent on each page status, for test all bits
  * are tested in the same way.
@@ -281,6 +308,7 @@ bool btrfs_subpage_test_##name(const struct btrfs_fs_info *fs_info,	\
 IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(uptodate);
 IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(error);
 IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(dirty);
+IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(writeback);
 
 /*
  * Note that, in selftests (extent-io-tests), we can have empty fs_info passed
@@ -319,3 +347,5 @@ IMPLEMENT_BTRFS_PAGE_OPS(uptodate, SetPageUptodate, ClearPageUptodate,
 IMPLEMENT_BTRFS_PAGE_OPS(error, SetPageError, ClearPageError, PageError);
 IMPLEMENT_BTRFS_PAGE_OPS(dirty, set_page_dirty, clear_page_dirty_for_io,
 			 PageDirty);
+IMPLEMENT_BTRFS_PAGE_OPS(writeback, set_page_writeback, end_page_writeback,
+			PageWriteback);
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index adaece5ce294..fe43267e31f3 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -21,6 +21,7 @@ struct btrfs_subpage {
 	u16 uptodate_bitmap;
 	u16 error_bitmap;
 	u16 dirty_bitmap;
+	u16 writeback_bitmap;
 	union {
 		/*
 		 * Structures only used by metadata
@@ -89,6 +90,7 @@ bool btrfs_page_test_##name(const struct btrfs_fs_info *fs_info,	\
 DECLARE_BTRFS_SUBPAGE_OPS(uptodate);
 DECLARE_BTRFS_SUBPAGE_OPS(error);
 DECLARE_BTRFS_SUBPAGE_OPS(dirty);
+DECLARE_BTRFS_SUBPAGE_OPS(writeback);
 
 /*
  * Extra clear_and_test function for subpage dirty bitmap.
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 06/15] btrfs: allow btree_set_page_dirty() to do more sanity check on subpage metadata
  2021-03-10  9:08 [PATCH v2 00/15] btrfs: support read-write for subpage metadata Qu Wenruo
                   ` (4 preceding siblings ...)
  2021-03-10  9:08 ` [PATCH v2 05/15] btrfs: introduce helpers for subpage writeback status Qu Wenruo
@ 2021-03-10  9:08 ` Qu Wenruo
  2021-03-10  9:08 ` [PATCH v2 07/15] btrfs: support subpage metadata csum calculation at write time Qu Wenruo
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: Qu Wenruo @ 2021-03-10  9:08 UTC (permalink / raw)
  To: linux-btrfs

For btree_set_page_dirty(), we should also check the extent buffer
sanity for subpage support.

Unlike the regular sector size case, since one page can contain multiple
extent buffers, we need to make sure there is at least one dirty extent
buffer in the page.

So this patch will iterate through the btrfs_subpage::dirty_bitmap
to get the extent buffers, and check if any dirty extent buffer in the page
range has EXTENT_BUFFER_DIRTY and proper refs.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 47 ++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 41 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 41b718cfea40..d53df276923e 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -42,6 +42,7 @@
 #include "discard.h"
 #include "space-info.h"
 #include "zoned.h"
+#include "subpage.h"
 
 #define BTRFS_SUPER_FLAG_SUPP	(BTRFS_HEADER_FLAG_WRITTEN |\
 				 BTRFS_HEADER_FLAG_RELOC |\
@@ -992,14 +993,48 @@ static void btree_invalidatepage(struct page *page, unsigned int offset,
 static int btree_set_page_dirty(struct page *page)
 {
 #ifdef DEBUG
+	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
+	struct btrfs_subpage *subpage;
 	struct extent_buffer *eb;
+	int cur_bit = 0;
+	u64 page_start = page_offset(page);
+
+	if (fs_info->sectorsize == PAGE_SIZE) {
+		BUG_ON(!PagePrivate(page));
+		eb = (struct extent_buffer *)page->private;
+		BUG_ON(!eb);
+		BUG_ON(!test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
+		BUG_ON(!atomic_read(&eb->refs));
+		btrfs_assert_tree_locked(eb);
+		return __set_page_dirty_nobuffers(page);
+	}
+	ASSERT(PagePrivate(page) && page->private);
+	subpage = (struct btrfs_subpage *)page->private;
+
+	ASSERT(subpage->dirty_bitmap);
+	while (cur_bit < BTRFS_SUBPAGE_BITMAP_SIZE) {
+		unsigned long flags;
+		u64 cur;
+		u16 tmp = (1 << cur_bit);
+
+		spin_lock_irqsave(&subpage->lock, flags);
+		if (!(tmp & subpage->dirty_bitmap)) {
+			spin_unlock_irqrestore(&subpage->lock, flags);
+			cur_bit++;
+			continue;
+		}
+		spin_unlock_irqrestore(&subpage->lock, flags);
+		cur = page_start + cur_bit * fs_info->sectorsize;
 
-	BUG_ON(!PagePrivate(page));
-	eb = (struct extent_buffer *)page->private;
-	BUG_ON(!eb);
-	BUG_ON(!test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
-	BUG_ON(!atomic_read(&eb->refs));
-	btrfs_assert_tree_locked(eb);
+		eb = find_extent_buffer(fs_info, cur);
+		ASSERT(eb);
+		ASSERT(test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
+		ASSERT(atomic_read(&eb->refs));
+		btrfs_assert_tree_locked(eb);
+		free_extent_buffer(eb);
+
+		cur_bit += (fs_info->nodesize >> fs_info->sectorsize_bits);
+	}
 #endif
 	return __set_page_dirty_nobuffers(page);
 }
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 07/15] btrfs: support subpage metadata csum calculation at write time
  2021-03-10  9:08 [PATCH v2 00/15] btrfs: support read-write for subpage metadata Qu Wenruo
                   ` (5 preceding siblings ...)
  2021-03-10  9:08 ` [PATCH v2 06/15] btrfs: allow btree_set_page_dirty() to do more sanity check on subpage metadata Qu Wenruo
@ 2021-03-10  9:08 ` Qu Wenruo
  2021-03-10  9:08 ` [PATCH v2 08/15] btrfs: make alloc_extent_buffer() check subpage dirty bitmap Qu Wenruo
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: Qu Wenruo @ 2021-03-10  9:08 UTC (permalink / raw)
  To: linux-btrfs

Add a new helper, csum_dirty_subpage_buffers(), to iterate through all
dirty extent buffers in one bvec.

Also extract the code of calculating csum for one extent buffer into
csum_one_extent_buffer(), so that both the existing csum_dirty_buffer()
and the new csum_dirty_subpage_buffers() can reuse the same routine.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 96 ++++++++++++++++++++++++++++++++++------------
 1 file changed, 72 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d53df276923e..371502021a60 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -441,6 +441,74 @@ static int btree_read_extent_buffer_pages(struct extent_buffer *eb,
 	return ret;
 }
 
+static int csum_one_extent_buffer(struct extent_buffer *eb)
+{
+	struct btrfs_fs_info *fs_info = eb->fs_info;
+	u8 result[BTRFS_CSUM_SIZE];
+	int ret;
+
+	ASSERT(memcmp_extent_buffer(eb, fs_info->fs_devices->metadata_uuid,
+				    offsetof(struct btrfs_header, fsid),
+				    BTRFS_FSID_SIZE) == 0);
+	csum_tree_block(eb, result);
+
+	if (btrfs_header_level(eb))
+		ret = btrfs_check_node(eb);
+	else
+		ret = btrfs_check_leaf_full(eb);
+
+	if (ret < 0) {
+		btrfs_print_tree(eb, 0);
+		btrfs_err(fs_info,
+		"block=%llu write time tree block corruption detected",
+			  eb->start);
+		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
+		return ret;
+	}
+	write_extent_buffer(eb, result, 0, fs_info->csum_size);
+
+	return 0;
+}
+
+/* Checksum all dirty extent buffers in one bio_vec. */
+static int csum_dirty_subpage_buffers(struct btrfs_fs_info *fs_info,
+				      struct bio_vec *bvec)
+{
+	struct page *page = bvec->bv_page;
+	u64 bvec_start = page_offset(page) + bvec->bv_offset;
+	u64 cur;
+	int ret = 0;
+
+	for (cur = bvec_start; cur < bvec_start + bvec->bv_len;
+	     cur += fs_info->nodesize) {
+		struct extent_buffer *eb;
+		bool uptodate;
+
+		eb = find_extent_buffer(fs_info, cur);
+		uptodate = btrfs_subpage_test_uptodate(fs_info, page, cur,
+						       fs_info->nodesize);
+
+		/* A dirty eb shouldn't disappera from buffer_radix */
+		if (WARN_ON(!eb))
+			return -EUCLEAN;
+
+		if (WARN_ON(cur != btrfs_header_bytenr(eb))) {
+			free_extent_buffer(eb);
+			return -EUCLEAN;
+		}
+		if (WARN_ON(!uptodate)) {
+			free_extent_buffer(eb);
+			return -EUCLEAN;
+		}
+
+		ret = csum_one_extent_buffer(eb);
+		free_extent_buffer(eb);
+		if (ret < 0)
+			return ret;
+	}
+	return ret;
+}
+
 /*
  * Checksum a dirty tree block before IO.  This has extra checks to make sure
  * we only fill in the checksum field in the first page of a multi-page block.
@@ -451,9 +519,10 @@ static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct bio_vec *bvec
 	struct page *page = bvec->bv_page;
 	u64 start = page_offset(page);
 	u64 found_start;
-	u8 result[BTRFS_CSUM_SIZE];
 	struct extent_buffer *eb;
-	int ret;
+
+	if (fs_info->sectorsize < PAGE_SIZE)
+		return csum_dirty_subpage_buffers(fs_info, bvec);
 
 	eb = (struct extent_buffer *)page->private;
 	if (page != eb->pages[0])
@@ -475,28 +544,7 @@ static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct bio_vec *bvec
 	if (WARN_ON(!PageUptodate(page)))
 		return -EUCLEAN;
 
-	ASSERT(memcmp_extent_buffer(eb, fs_info->fs_devices->metadata_uuid,
-				    offsetof(struct btrfs_header, fsid),
-				    BTRFS_FSID_SIZE) == 0);
-
-	csum_tree_block(eb, result);
-
-	if (btrfs_header_level(eb))
-		ret = btrfs_check_node(eb);
-	else
-		ret = btrfs_check_leaf_full(eb);
-
-	if (ret < 0) {
-		btrfs_print_tree(eb, 0);
-		btrfs_err(fs_info,
-		"block=%llu write time tree block corruption detected",
-			  eb->start);
-		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
-		return ret;
-	}
-	write_extent_buffer(eb, result, 0, fs_info->csum_size);
-
-	return 0;
+	return csum_one_extent_buffer(eb);
 }
 
 static int check_tree_block_fsid(struct extent_buffer *eb)
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 08/15] btrfs: make alloc_extent_buffer() check subpage dirty bitmap
  2021-03-10  9:08 [PATCH v2 00/15] btrfs: support read-write for subpage metadata Qu Wenruo
                   ` (6 preceding siblings ...)
  2021-03-10  9:08 ` [PATCH v2 07/15] btrfs: support subpage metadata csum calculation at write time Qu Wenruo
@ 2021-03-10  9:08 ` Qu Wenruo
  2021-03-10  9:08 ` [PATCH v2 09/15] btrfs: make the page uptodate assert to be subpage compatible Qu Wenruo
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: Qu Wenruo @ 2021-03-10  9:08 UTC (permalink / raw)
  To: linux-btrfs

In alloc_extent_buffer(), we make sure that the newly allocated page is
never dirty.

This is fine for sector size == PAGE_SIZE case, but for subpage it's
possible that one extent buffer in the page is dirty, thus the whole
page is marked dirty, and could cause false alert.

To support subpage, call btrfs_page_test_dirty() to handle both cases.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 82cc8b9ce744..796beb5a0b6b 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5635,7 +5635,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		btrfs_page_inc_eb_refs(fs_info, p);
 		spin_unlock(&mapping->private_lock);
 
-		WARN_ON(PageDirty(p));
+		WARN_ON(btrfs_page_test_dirty(fs_info, p, eb->start, eb->len));
 		eb->pages[i] = p;
 		if (!PageUptodate(p))
 			uptodate = 0;
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 09/15] btrfs: make the page uptodate assert to be subpage compatible
  2021-03-10  9:08 [PATCH v2 00/15] btrfs: support read-write for subpage metadata Qu Wenruo
                   ` (7 preceding siblings ...)
  2021-03-10  9:08 ` [PATCH v2 08/15] btrfs: make alloc_extent_buffer() check subpage dirty bitmap Qu Wenruo
@ 2021-03-10  9:08 ` Qu Wenruo
  2021-03-10  9:08 ` [PATCH v2 10/15] btrfs: make set/clear_extent_buffer_dirty() " Qu Wenruo
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: Qu Wenruo @ 2021-03-10  9:08 UTC (permalink / raw)
  To: linux-btrfs

There are quite some assert check on page uptodate in extent buffer write
accessors.
They ensure the destination page is already uptodate.

This is fine for regular sector size case, but not for subpage case, as
for subpage we only mark the page uptodate if the page contains no hole
and all its extent buffers are uptodate.

So instead of checking PageUptodate(), for subpage case we check the
uptodate bitmap of btrfs_subpage structure.

To make the check more elegant, introduce a helper,
assert_eb_page_uptodate() to do the check for both subpage and regular
sector size cases.

The following functions are involved:
- write_extent_buffer_chunk_tree_uuid()
- write_extent_buffer_fsid()
- write_extent_buffer()
- memzero_extent_buffer()
- copy_extent_buffer()
- extent_buffer_test_bit()
- extent_buffer_bitmap_set()
- extent_buffer_bitmap_clear()

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 42 ++++++++++++++++++++++++++++++++----------
 1 file changed, 32 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 796beb5a0b6b..208e603acf0c 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -6187,12 +6187,34 @@ int memcmp_extent_buffer(const struct extent_buffer *eb, const void *ptrv,
 	return ret;
 }
 
+/*
+ * A helper to ensure that the extent buffer is uptodate.
+ *
+ * For regular sector size == PAGE_SIZE case, check if @page is uptodate.
+ * For subpage case, check if the range covered by the eb has EXTENT_UPTODATE.
+ */
+static void assert_eb_page_uptodate(const struct extent_buffer *eb,
+				    struct page *page)
+{
+	struct btrfs_fs_info *fs_info = eb->fs_info;
+
+	if (fs_info->sectorsize < PAGE_SIZE) {
+		bool uptodate;
+
+		uptodate = btrfs_subpage_test_uptodate(fs_info, page,
+						eb->start, eb->len);
+		WARN_ON(!uptodate);
+	} else {
+		WARN_ON(!PageUptodate(page));
+	}
+}
+
 void write_extent_buffer_chunk_tree_uuid(const struct extent_buffer *eb,
 		const void *srcv)
 {
 	char *kaddr;
 
-	WARN_ON(!PageUptodate(eb->pages[0]));
+	assert_eb_page_uptodate(eb, eb->pages[0]);
 	kaddr = page_address(eb->pages[0]) + get_eb_offset_in_page(eb, 0);
 	memcpy(kaddr + offsetof(struct btrfs_header, chunk_tree_uuid), srcv,
 			BTRFS_FSID_SIZE);
@@ -6202,7 +6224,7 @@ void write_extent_buffer_fsid(const struct extent_buffer *eb, const void *srcv)
 {
 	char *kaddr;
 
-	WARN_ON(!PageUptodate(eb->pages[0]));
+	assert_eb_page_uptodate(eb, eb->pages[0]);
 	kaddr = page_address(eb->pages[0]) + get_eb_offset_in_page(eb, 0);
 	memcpy(kaddr + offsetof(struct btrfs_header, fsid), srcv,
 			BTRFS_FSID_SIZE);
@@ -6227,7 +6249,7 @@ void write_extent_buffer(const struct extent_buffer *eb, const void *srcv,
 
 	while (len > 0) {
 		page = eb->pages[i];
-		WARN_ON(!PageUptodate(page));
+		assert_eb_page_uptodate(eb, page);
 
 		cur = min(len, PAGE_SIZE - offset);
 		kaddr = page_address(page);
@@ -6256,7 +6278,7 @@ void memzero_extent_buffer(const struct extent_buffer *eb, unsigned long start,
 
 	while (len > 0) {
 		page = eb->pages[i];
-		WARN_ON(!PageUptodate(page));
+		assert_eb_page_uptodate(eb, page);
 
 		cur = min(len, PAGE_SIZE - offset);
 		kaddr = page_address(page);
@@ -6314,7 +6336,7 @@ void copy_extent_buffer(const struct extent_buffer *dst,
 
 	while (len > 0) {
 		page = dst->pages[i];
-		WARN_ON(!PageUptodate(page));
+		assert_eb_page_uptodate(dst, page);
 
 		cur = min(len, (unsigned long)(PAGE_SIZE - offset));
 
@@ -6376,7 +6398,7 @@ int extent_buffer_test_bit(const struct extent_buffer *eb, unsigned long start,
 
 	eb_bitmap_offset(eb, start, nr, &i, &offset);
 	page = eb->pages[i];
-	WARN_ON(!PageUptodate(page));
+	assert_eb_page_uptodate(eb, page);
 	kaddr = page_address(page);
 	return 1U & (kaddr[offset] >> (nr & (BITS_PER_BYTE - 1)));
 }
@@ -6401,7 +6423,7 @@ void extent_buffer_bitmap_set(const struct extent_buffer *eb, unsigned long star
 
 	eb_bitmap_offset(eb, start, pos, &i, &offset);
 	page = eb->pages[i];
-	WARN_ON(!PageUptodate(page));
+	assert_eb_page_uptodate(eb, page);
 	kaddr = page_address(page);
 
 	while (len >= bits_to_set) {
@@ -6412,7 +6434,7 @@ void extent_buffer_bitmap_set(const struct extent_buffer *eb, unsigned long star
 		if (++offset >= PAGE_SIZE && len > 0) {
 			offset = 0;
 			page = eb->pages[++i];
-			WARN_ON(!PageUptodate(page));
+			assert_eb_page_uptodate(eb, page);
 			kaddr = page_address(page);
 		}
 	}
@@ -6444,7 +6466,7 @@ void extent_buffer_bitmap_clear(const struct extent_buffer *eb,
 
 	eb_bitmap_offset(eb, start, pos, &i, &offset);
 	page = eb->pages[i];
-	WARN_ON(!PageUptodate(page));
+	assert_eb_page_uptodate(eb, page);
 	kaddr = page_address(page);
 
 	while (len >= bits_to_clear) {
@@ -6455,7 +6477,7 @@ void extent_buffer_bitmap_clear(const struct extent_buffer *eb,
 		if (++offset >= PAGE_SIZE && len > 0) {
 			offset = 0;
 			page = eb->pages[++i];
-			WARN_ON(!PageUptodate(page));
+			assert_eb_page_uptodate(eb, page);
 			kaddr = page_address(page);
 		}
 	}
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 10/15] btrfs: make set/clear_extent_buffer_dirty() to be subpage compatible
  2021-03-10  9:08 [PATCH v2 00/15] btrfs: support read-write for subpage metadata Qu Wenruo
                   ` (8 preceding siblings ...)
  2021-03-10  9:08 ` [PATCH v2 09/15] btrfs: make the page uptodate assert to be subpage compatible Qu Wenruo
@ 2021-03-10  9:08 ` Qu Wenruo
  2021-03-10  9:08 ` [PATCH v2 11/15] btrfs: make set_btree_ioerr() accept extent buffer and " Qu Wenruo
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: Qu Wenruo @ 2021-03-10  9:08 UTC (permalink / raw)
  To: linux-btrfs

For set_extent_buffer_dirty() to support subpage sized metadata, just
call btrfs_page_set_dirty() to handle both cases.

For clear_extent_buffer_dirty(), it needs to clear the page dirty if and
only if all extent buffers in the page range are no longer dirty.
Also do the same for page error.

This is pretty different from the exist clear_extent_buffer_dirty()
routine, so add a new helper function,
clear_subpage_extent_buffer_dirty() to do this for subpage metadata.

Also since the main part of clearing page dirty code is still the same,
extract that into btree_clear_page_dirty() so that it can be utilized
for both cases.

But there is a special race between set_extent_buffer_dirty() and
clear_extent_buffer_dirty(), where we can clear the page dirty.

[POSSIBLE RACE WINDOW]
For the race window between clear_subpage_extent_buffer_dirty() and
set_extent_buffer_dirty(), due to the fact that we can't call
clear_page_dirty_for_io() under subpage spin lock, we can race like
below:

   T1 (eb1 in the same page)	|  T2 (eb2 in the same page)
 -------------------------------+------------------------------
 set_extent_buffer_dirty()	| clear_extent_buffer_dirty()
 |- was_dirty = false;		| |- clear_subpagE_extent_buffer_dirty()
 |				|    |- btrfs_clear_and_test_dirty()
 |				|    |  Since eb2 is the last dirty page
 |				|    |  we got:
 |				|    |  last == true;
 |				|    |
 |- btrfs_page_set_dirty()	|    |
 |  We set the page dirty and   |    |
 |  subpage dirty bitmap	|    |
 |				|    |- if (last)
 |				|    |  Since we don't have subpage lock
 |				|    |  hold, now @last is no longer
 |				|    |  correct
 |				|    |- btree_clear_page_dirty()
 |				|	Now PageDirty == false, even we
 |				|       have dirty_bitmap not zero.
 |- ASSERT(PageDirty());	|
    ^^^^ CRASH

The solution here is to also lock the eb->pages[0] for subpage case of
set_extent_buffer_dirty(), to prevent racing with
clear_extent_buffer_dirty().

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 65 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 53 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 208e603acf0c..2d16d92107bc 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5784,28 +5784,51 @@ void free_extent_buffer_stale(struct extent_buffer *eb)
 	release_extent_buffer(eb);
 }
 
+static void btree_clear_page_dirty(struct page *page)
+{
+	ASSERT(PageDirty(page));
+	ASSERT(PageLocked(page));
+	clear_page_dirty_for_io(page);
+	xa_lock_irq(&page->mapping->i_pages);
+	if (!PageDirty(page))
+		__xa_clear_mark(&page->mapping->i_pages,
+				page_index(page), PAGECACHE_TAG_DIRTY);
+	xa_unlock_irq(&page->mapping->i_pages);
+}
+
+static void clear_subpage_extent_buffer_dirty(const struct extent_buffer *eb)
+{
+	struct btrfs_fs_info *fs_info = eb->fs_info;
+	struct page *page = eb->pages[0];
+	bool last;
+
+	/* btree_clear_page_dirty() needs page locked */
+	lock_page(page);
+	last = btrfs_subpage_clear_and_test_dirty(fs_info, page, eb->start,
+						  eb->len);
+	if (last)
+		btree_clear_page_dirty(page);
+	unlock_page(page);
+	WARN_ON(atomic_read(&eb->refs) == 0);
+}
+
 void clear_extent_buffer_dirty(const struct extent_buffer *eb)
 {
 	int i;
 	int num_pages;
 	struct page *page;
 
+	if (eb->fs_info->sectorsize < PAGE_SIZE)
+		return clear_subpage_extent_buffer_dirty(eb);
+
 	num_pages = num_extent_pages(eb);
 
 	for (i = 0; i < num_pages; i++) {
 		page = eb->pages[i];
 		if (!PageDirty(page))
 			continue;
-
 		lock_page(page);
-		WARN_ON(!PagePrivate(page));
-
-		clear_page_dirty_for_io(page);
-		xa_lock_irq(&page->mapping->i_pages);
-		if (!PageDirty(page))
-			__xa_clear_mark(&page->mapping->i_pages,
-					page_index(page), PAGECACHE_TAG_DIRTY);
-		xa_unlock_irq(&page->mapping->i_pages);
+		btree_clear_page_dirty(page);
 		ClearPageError(page);
 		unlock_page(page);
 	}
@@ -5826,10 +5849,28 @@ bool set_extent_buffer_dirty(struct extent_buffer *eb)
 	WARN_ON(atomic_read(&eb->refs) == 0);
 	WARN_ON(!test_bit(EXTENT_BUFFER_TREE_REF, &eb->bflags));
 
-	if (!was_dirty)
-		for (i = 0; i < num_pages; i++)
-			set_page_dirty(eb->pages[i]);
+	if (!was_dirty) {
+		bool subpage = eb->fs_info->sectorsize < PAGE_SIZE;
 
+		/*
+		 * For subpage case, we can have other extent buffers in the
+		 * same page, and in clear_subpage_extent_buffer_dirty() we
+		 * have to clear page dirty without subapge lock hold.
+		 * This can cause race where our page gets dirty cleared after
+		 * we just set it.
+		 *
+		 * Thankfully, clear_subpage_extent_buffer_dirty() has locked
+		 * its page for other reasons, we can use page lock to
+		 * prevent above race.
+		 */
+		if (subpage)
+			lock_page(eb->pages[0]);
+		for (i = 0; i < num_pages; i++)
+			btrfs_page_set_dirty(eb->fs_info, eb->pages[i],
+					     eb->start, eb->len);
+		if (subpage)
+			unlock_page(eb->pages[0]);
+	}
 #ifdef CONFIG_BTRFS_DEBUG
 	for (i = 0; i < num_pages; i++)
 		ASSERT(PageDirty(eb->pages[i]));
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 11/15] btrfs: make set_btree_ioerr() accept extent buffer and to be subpage compatible
  2021-03-10  9:08 [PATCH v2 00/15] btrfs: support read-write for subpage metadata Qu Wenruo
                   ` (9 preceding siblings ...)
  2021-03-10  9:08 ` [PATCH v2 10/15] btrfs: make set/clear_extent_buffer_dirty() " Qu Wenruo
@ 2021-03-10  9:08 ` Qu Wenruo
  2021-03-10  9:08 ` [PATCH v2 12/15] btrfs: introduce end_bio_subpage_eb_writepage() function Qu Wenruo
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: Qu Wenruo @ 2021-03-10  9:08 UTC (permalink / raw)
  To: linux-btrfs

Current set_btree_ioerr() only accepts @page parameter and grabs extent
buffer from page::private.

This works fine for sector size == PAGE_SIZE case, but not for subpage
case.

Adds an extra parameter, @eb, for callers to pass extent buffer to this
function, so that subpage code can reuse this function.

And also add subpage special handling to update
btrfs_subpage::error_bitmap.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 2d16d92107bc..b6fbb512abfd 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3982,12 +3982,11 @@ static noinline_for_stack int lock_extent_buffer_for_io(struct extent_buffer *eb
 	return ret;
 }
 
-static void set_btree_ioerr(struct page *page)
+static void set_btree_ioerr(struct page *page, struct extent_buffer *eb)
 {
-	struct extent_buffer *eb = (struct extent_buffer *)page->private;
-	struct btrfs_fs_info *fs_info;
+	struct btrfs_fs_info *fs_info = eb->fs_info;
 
-	SetPageError(page);
+	btrfs_page_set_error(fs_info, page, eb->start, eb->len);
 	if (test_and_set_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags))
 		return;
 
@@ -3995,7 +3994,6 @@ static void set_btree_ioerr(struct page *page)
 	 * If we error out, we should add back the dirty_metadata_bytes
 	 * to make it consistent.
 	 */
-	fs_info = eb->fs_info;
 	percpu_counter_add_batch(&fs_info->dirty_metadata_bytes,
 				 eb->len, fs_info->dirty_metadata_batch);
 
@@ -4039,13 +4037,13 @@ static void set_btree_ioerr(struct page *page)
 	 */
 	switch (eb->log_index) {
 	case -1:
-		set_bit(BTRFS_FS_BTREE_ERR, &eb->fs_info->flags);
+		set_bit(BTRFS_FS_BTREE_ERR, &fs_info->flags);
 		break;
 	case 0:
-		set_bit(BTRFS_FS_LOG1_ERR, &eb->fs_info->flags);
+		set_bit(BTRFS_FS_LOG1_ERR, &fs_info->flags);
 		break;
 	case 1:
-		set_bit(BTRFS_FS_LOG2_ERR, &eb->fs_info->flags);
+		set_bit(BTRFS_FS_LOG2_ERR, &fs_info->flags);
 		break;
 	default:
 		BUG(); /* unexpected, logic error */
@@ -4070,7 +4068,7 @@ static void end_bio_extent_buffer_writepage(struct bio *bio)
 		if (bio->bi_status ||
 		    test_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags)) {
 			ClearPageUptodate(page);
-			set_btree_ioerr(page);
+			set_btree_ioerr(page, eb);
 		}
 
 		end_page_writeback(page);
@@ -4126,7 +4124,7 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
 					 end_bio_extent_buffer_writepage,
 					 0, 0, 0, false);
 		if (ret) {
-			set_btree_ioerr(p);
+			set_btree_ioerr(p, eb);
 			if (PageWriteback(p))
 				end_page_writeback(p);
 			if (atomic_sub_and_test(num_pages - i, &eb->io_pages))
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 12/15] btrfs: introduce end_bio_subpage_eb_writepage() function
  2021-03-10  9:08 [PATCH v2 00/15] btrfs: support read-write for subpage metadata Qu Wenruo
                   ` (10 preceding siblings ...)
  2021-03-10  9:08 ` [PATCH v2 11/15] btrfs: make set_btree_ioerr() accept extent buffer and " Qu Wenruo
@ 2021-03-10  9:08 ` Qu Wenruo
  2021-03-10  9:08 ` [PATCH v2 13/15] btrfs: introduce write_one_subpage_eb() function Qu Wenruo
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: Qu Wenruo @ 2021-03-10  9:08 UTC (permalink / raw)
  To: linux-btrfs

The new function, end_bio_subpage_eb_writepage(), will handle the
metadata writeback endio.

The major differences involved are:
- How to grab extent buffer
  Now page::private is a pointer to btrfs_subpage, we can no longer grab
  extent buffer directly.
  Thus we need to use the bv_offset to locate the extent buffer manually
  and iterate through the whole range.

- Use btrfs_subpage_end_writeback() caller
  This helper will handle the subpage writeback for us.

Since this function is executed under endio context, when grabbing
extent buffers it can't grab eb->refs_lock as that lock is not designed
to be grabbed under hardirq context.

So here introduce a helper, find_extent_buffer_nospinlock(), for such
situation, and convert find_extent_buffer() to use that helper.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 135 +++++++++++++++++++++++++++++++++----------
 1 file changed, 106 insertions(+), 29 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index b6fbb512abfd..74d59b292c9a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4050,13 +4050,97 @@ static void set_btree_ioerr(struct page *page, struct extent_buffer *eb)
 	}
 }
 
+/*
+ * This is the endio specific version which won't touch any unsafe spinlock
+ * in endio context.
+ */
+static struct extent_buffer *find_extent_buffer_nospinlock(
+		struct btrfs_fs_info *fs_info, u64 start)
+{
+	struct extent_buffer *eb;
+
+	rcu_read_lock();
+	eb = radix_tree_lookup(&fs_info->buffer_radix,
+			       start >> fs_info->sectorsize_bits);
+	if (eb && atomic_inc_not_zero(&eb->refs)) {
+		rcu_read_unlock();
+		return eb;
+	}
+	rcu_read_unlock();
+	return NULL;
+}
+/*
+ * The endio function for subpage extent buffer write.
+ *
+ * Unlike end_bio_extent_buffer_writepage(), we only call end_page_writeback()
+ * after all extent buffers in the page has finished their writeback.
+ */
+static void end_bio_subpage_eb_writepage(struct btrfs_fs_info *fs_info,
+					 struct bio *bio)
+{
+	struct bio_vec *bvec;
+	struct bvec_iter_all iter_all;
+
+	ASSERT(!bio_flagged(bio, BIO_CLONED));
+	bio_for_each_segment_all(bvec, bio, iter_all) {
+		struct page *page = bvec->bv_page;
+		u64 bvec_start = page_offset(page) + bvec->bv_offset;
+		u64 bvec_end = bvec_start + bvec->bv_len - 1;
+		u64 cur_bytenr = bvec_start;
+
+		ASSERT(IS_ALIGNED(bvec->bv_len, fs_info->nodesize));
+
+		/* Iterate through all extent buffers in the range */
+		while (cur_bytenr <= bvec_end) {
+			struct extent_buffer *eb;
+			int done;
+
+			/*
+			 * Here we can't use find_extent_buffer(), as it may
+			 * try to lock eb->refs_lock, which is not safe in endio
+			 * context.
+			 */
+			eb = find_extent_buffer_nospinlock(fs_info, cur_bytenr);
+			ASSERT(eb);
+
+			cur_bytenr = eb->start + eb->len;
+
+			ASSERT(test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags));
+			done = atomic_dec_and_test(&eb->io_pages);
+			ASSERT(done);
+
+			if (bio->bi_status ||
+			    test_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags)) {
+				ClearPageUptodate(page);
+				set_btree_ioerr(page, eb);
+			}
+
+			btrfs_subpage_clear_writeback(fs_info, page, eb->start,
+						      eb->len);
+			end_extent_buffer_writeback(eb);
+			/*
+			 * free_extent_buffer() will grab spinlock which is not
+			 * safe in endio context. Thus here we manually dec
+			 * the ref.
+			 */
+			atomic_dec(&eb->refs);
+		}
+	}
+	bio_put(bio);
+}
+
 static void end_bio_extent_buffer_writepage(struct bio *bio)
 {
+	struct btrfs_fs_info *fs_info;
 	struct bio_vec *bvec;
 	struct extent_buffer *eb;
 	int done;
 	struct bvec_iter_all iter_all;
 
+	fs_info = btrfs_sb(bio_first_page_all(bio)->mapping->host->i_sb);
+	if (fs_info->sectorsize < PAGE_SIZE)
+		return end_bio_subpage_eb_writepage(fs_info, bio);
+
 	ASSERT(!bio_flagged(bio, BIO_CLONED));
 	bio_for_each_segment_all(bvec, bio, iter_all) {
 		struct page *page = bvec->bv_page;
@@ -5437,36 +5521,29 @@ struct extent_buffer *find_extent_buffer(struct btrfs_fs_info *fs_info,
 {
 	struct extent_buffer *eb;
 
-	rcu_read_lock();
-	eb = radix_tree_lookup(&fs_info->buffer_radix,
-			       start >> fs_info->sectorsize_bits);
-	if (eb && atomic_inc_not_zero(&eb->refs)) {
-		rcu_read_unlock();
-		/*
-		 * Lock our eb's refs_lock to avoid races with
-		 * free_extent_buffer. When we get our eb it might be flagged
-		 * with EXTENT_BUFFER_STALE and another task running
-		 * free_extent_buffer might have seen that flag set,
-		 * eb->refs == 2, that the buffer isn't under IO (dirty and
-		 * writeback flags not set) and it's still in the tree (flag
-		 * EXTENT_BUFFER_TREE_REF set), therefore being in the process
-		 * of decrementing the extent buffer's reference count twice.
-		 * So here we could race and increment the eb's reference count,
-		 * clear its stale flag, mark it as dirty and drop our reference
-		 * before the other task finishes executing free_extent_buffer,
-		 * which would later result in an attempt to free an extent
-		 * buffer that is dirty.
-		 */
-		if (test_bit(EXTENT_BUFFER_STALE, &eb->bflags)) {
-			spin_lock(&eb->refs_lock);
-			spin_unlock(&eb->refs_lock);
-		}
-		mark_extent_buffer_accessed(eb, NULL);
-		return eb;
+	eb = find_extent_buffer_nospinlock(fs_info, start);
+	if (!eb)
+		return NULL;
+	/*
+	 * Lock our eb's refs_lock to avoid races with free_extent_buffer().
+	 * When we get our eb it might be flagged with EXTENT_BUFFER_STALE and
+	 * another task running free_extent_buffer() might have seen that flag
+	 * set, eb->refs == 2, that the buffer isn't under IO (dirty and
+	 * writeback flags not set) and it's still in the tree (flag
+	 * EXTENT_BUFFER_TREE_REF set), therefore being in the process
+	 * of decrementing the extent buffer's reference count twice.
+	 * So here we could race and increment the eb's reference count,
+	 * clear its stale flag, mark it as dirty and drop our reference
+	 * before the other task finishes executing free_extent_buffer,
+	 * which would later result in an attempt to free an extent
+	 * buffer that is dirty.
+	 */
+	if (test_bit(EXTENT_BUFFER_STALE, &eb->bflags)) {
+		spin_lock(&eb->refs_lock);
+		spin_unlock(&eb->refs_lock);
 	}
-	rcu_read_unlock();
-
-	return NULL;
+	mark_extent_buffer_accessed(eb, NULL);
+	return eb;
 }
 
 #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 13/15] btrfs: introduce write_one_subpage_eb() function
  2021-03-10  9:08 [PATCH v2 00/15] btrfs: support read-write for subpage metadata Qu Wenruo
                   ` (11 preceding siblings ...)
  2021-03-10  9:08 ` [PATCH v2 12/15] btrfs: introduce end_bio_subpage_eb_writepage() function Qu Wenruo
@ 2021-03-10  9:08 ` Qu Wenruo
  2021-03-10  9:08 ` [PATCH v2 14/15] btrfs: make lock_extent_buffer_for_io() to be subpage compatible Qu Wenruo
  2021-03-10  9:08 ` [PATCH v2 15/15] btrfs: introduce submit_eb_subpage() to submit a subpage metadata page Qu Wenruo
  14 siblings, 0 replies; 25+ messages in thread
From: Qu Wenruo @ 2021-03-10  9:08 UTC (permalink / raw)
  To: linux-btrfs

The new function, write_one_subpage_eb(), as a subroutine for subpage
metadata write, will handle the extent buffer bio submission.

The major differences between the new write_one_subpage_eb() and
write_one_eb() is:
- No page locking
  When entering write_one_subpage_eb() the page is no longer locked.
  We only lock the page for its status update, and unlock immediately.
  Now we completely rely on extent io tree locking.

- Extra bitmap update along with page status update
  Now page dirty and writeback is controlled by
  btrfs_subpage::dirty_bitmap and btrfs_subpage::writeback_bitmap.
  They both follow the schema that any sector is dirty/writeback, then
  the full page get dirty/writeback.

- When to update the nr_written number
  Now we take a short cut, if we have cleared the last dirty bit of the
  page, we update nr_written.
  This is not completely perfect, but should emulate the old behavior
  good enough.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 55 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 74d59b292c9a..74525ebf2b83 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4166,6 +4166,58 @@ static void end_bio_extent_buffer_writepage(struct bio *bio)
 	bio_put(bio);
 }
 
+/*
+ * Unlike the work in write_one_eb(), we rely completely on extent locking.
+ * Page locking is only utizlied at minimal to keep the VM code happy.
+ *
+ * Caller should still call write_one_eb() other than this function directly.
+ * As write_one_eb() has extra prepration before submitting the extent buffer.
+ */
+static int write_one_subpage_eb(struct extent_buffer *eb,
+				struct writeback_control *wbc,
+				struct extent_page_data *epd)
+{
+	struct btrfs_fs_info *fs_info = eb->fs_info;
+	struct page *page = eb->pages[0];
+	unsigned int write_flags = wbc_to_write_flags(wbc) | REQ_META;
+	bool no_dirty_ebs = false;
+	int ret;
+
+	/* clear_page_dirty_for_io() in subpage helper need page locked. */
+	lock_page(page);
+	btrfs_subpage_set_writeback(fs_info, page, eb->start, eb->len);
+
+	/* If we're the last dirty bit to update nr_written */
+	no_dirty_ebs = btrfs_subpage_clear_and_test_dirty(fs_info, page,
+							  eb->start, eb->len);
+	if (no_dirty_ebs)
+		clear_page_dirty_for_io(page);
+
+	ret = submit_extent_page(REQ_OP_WRITE | write_flags, wbc, page,
+			eb->start, eb->len, eb->start - page_offset(page),
+			&epd->bio, end_bio_extent_buffer_writepage, 0, 0, 0,
+			false);
+	if (ret) {
+		btrfs_subpage_clear_writeback(fs_info, page, eb->start,
+					      eb->len);
+		set_btree_ioerr(page, eb);
+		unlock_page(page);
+
+		if (atomic_dec_and_test(&eb->io_pages))
+			end_extent_buffer_writeback(eb);
+		return -EIO;
+	}
+	unlock_page(page);
+	/*
+	 * Submission finishes without problem, if no range of the page is
+	 * dirty anymore, we have submitted a page.
+	 * Update the nr_written in wbc.
+	 */
+	if (no_dirty_ebs)
+		update_nr_written(wbc, 1);
+	return ret;
+}
+
 static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
 			struct writeback_control *wbc,
 			struct extent_page_data *epd)
@@ -4197,6 +4249,9 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
 		memzero_extent_buffer(eb, start, end - start);
 	}
 
+	if (eb->fs_info->sectorsize < PAGE_SIZE)
+		return write_one_subpage_eb(eb, wbc, epd);
+
 	for (i = 0; i < num_pages; i++) {
 		struct page *p = eb->pages[i];
 
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 14/15] btrfs: make lock_extent_buffer_for_io() to be subpage compatible
  2021-03-10  9:08 [PATCH v2 00/15] btrfs: support read-write for subpage metadata Qu Wenruo
                   ` (12 preceding siblings ...)
  2021-03-10  9:08 ` [PATCH v2 13/15] btrfs: introduce write_one_subpage_eb() function Qu Wenruo
@ 2021-03-10  9:08 ` Qu Wenruo
  2021-03-10  9:08 ` [PATCH v2 15/15] btrfs: introduce submit_eb_subpage() to submit a subpage metadata page Qu Wenruo
  14 siblings, 0 replies; 25+ messages in thread
From: Qu Wenruo @ 2021-03-10  9:08 UTC (permalink / raw)
  To: linux-btrfs

For subpage metadata, we don't use page locking at all.
So just skip the page locking part for subpage.

All the remaining routine can be reused.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 74525ebf2b83..18730d3ab50f 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3937,7 +3937,13 @@ static noinline_for_stack int lock_extent_buffer_for_io(struct extent_buffer *eb
 
 	btrfs_tree_unlock(eb);
 
-	if (!ret)
+	/*
+	 * Either we don't need to submit any tree block, or we're submitting
+	 * subpage.
+	 * Subpage metadata doesn't use page locking at all, so we can skip
+	 * the page locking.
+	 */
+	if (!ret || fs_info->sectorsize < PAGE_SIZE)
 		return ret;
 
 	num_pages = num_extent_pages(eb);
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 15/15] btrfs: introduce submit_eb_subpage() to submit a subpage metadata page
  2021-03-10  9:08 [PATCH v2 00/15] btrfs: support read-write for subpage metadata Qu Wenruo
                   ` (13 preceding siblings ...)
  2021-03-10  9:08 ` [PATCH v2 14/15] btrfs: make lock_extent_buffer_for_io() to be subpage compatible Qu Wenruo
@ 2021-03-10  9:08 ` Qu Wenruo
  14 siblings, 0 replies; 25+ messages in thread
From: Qu Wenruo @ 2021-03-10  9:08 UTC (permalink / raw)
  To: linux-btrfs

The new function, submit_eb_subpage(), will submit all the dirty extent
buffers in the page.

The major difference between submit_eb_page() and submit_eb_subpage()
is:
- How to grab extent buffer
  Now we use find_extent_buffer_nospinlock() other than using
  page::private.

All other different handling is already done in functions like
lock_extent_buffer_for_io() and write_one_eb().

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 95 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 95 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 18730d3ab50f..7281ec72a86a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4293,6 +4293,98 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
 	return ret;
 }
 
+/*
+ * Submit one subpage btree page.
+ *
+ * The main difference between submit_eb_page() is:
+ * - Page locking
+ *   For subpage, we don't rely on page locking at all.
+ *
+ * - Flush write bio
+ *   We only flush bio if we may be unable to fit current extent buffers into
+ *   current bio.
+ *
+ * Return >=0 for the number of submitted extent buffers.
+ * Return <0 for fatal error.
+ */
+static int submit_eb_subpage(struct page *page,
+			     struct writeback_control *wbc,
+			     struct extent_page_data *epd)
+{
+	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
+	int submitted = 0;
+	u64 page_start = page_offset(page);
+	int bit_start = 0;
+	int nbits = BTRFS_SUBPAGE_BITMAP_SIZE;
+	int sectors_per_node = fs_info->nodesize >> fs_info->sectorsize_bits;
+	int ret;
+
+	/* Lock and write each dirty extent buffers in the range */
+	while (bit_start < nbits) {
+		struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
+		struct extent_buffer *eb;
+		unsigned long flags;
+		u64 start;
+
+		/*
+		 * Take private lock to ensure the subpage won't be detached
+		 * halfway.
+		 */
+		spin_lock(&page->mapping->private_lock);
+		if (!PagePrivate(page)) {
+			spin_unlock(&page->mapping->private_lock);
+			break;
+		}
+		spin_lock_irqsave(&subpage->lock, flags);
+		if (!((1 << bit_start) & subpage->dirty_bitmap)) {
+			spin_unlock_irqrestore(&subpage->lock, flags);
+			spin_unlock(&page->mapping->private_lock);
+			bit_start++;
+			continue;
+		}
+
+		start = page_start + bit_start * fs_info->sectorsize;
+		bit_start += sectors_per_node;
+
+		/*
+		 * Here we just want to grab the eb without touching extra
+		 * spin locks. So here we call find_extent_buffer_nospinlock().
+		 */
+		eb = find_extent_buffer_nospinlock(fs_info, start);
+		spin_unlock_irqrestore(&subpage->lock, flags);
+		spin_unlock(&page->mapping->private_lock);
+
+		/*
+		 * The eb has already reached 0 refs thus find_extent_buffer()
+		 * doesn't return it. We don't need to write back such eb
+		 * anyway.
+		 */
+		if (!eb)
+			continue;
+
+		ret = lock_extent_buffer_for_io(eb, epd);
+		if (ret == 0) {
+			free_extent_buffer(eb);
+			continue;
+		}
+		if (ret < 0) {
+			free_extent_buffer(eb);
+			goto cleanup;
+		}
+		ret = write_one_eb(eb, wbc, epd);
+		free_extent_buffer(eb);
+		if (ret < 0)
+			goto cleanup;
+		submitted++;
+	}
+	return submitted;
+
+cleanup:
+	/* We hit error, end bio for the submitted extent buffers */
+	end_write_bio(epd, ret);
+	return ret;
+}
+
 /*
  * Submit all page(s) of one extent buffer.
  *
@@ -4325,6 +4417,9 @@ static int submit_eb_page(struct page *page, struct writeback_control *wbc,
 	if (!PagePrivate(page))
 		return 0;
 
+	if (btrfs_sb(page->mapping->host->i_sb)->sectorsize < PAGE_SIZE)
+		return submit_eb_subpage(page, wbc, epd);
+
 	spin_lock(&mapping->private_lock);
 	if (!PagePrivate(page)) {
 		spin_unlock(&mapping->private_lock);
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 01/15] btrfs: add sysfs interface for supported sectorsize
  2021-03-10  9:08 ` [PATCH v2 01/15] btrfs: add sysfs interface for supported sectorsize Qu Wenruo
@ 2021-03-15 11:59   ` Anand Jain
  2021-03-15 12:39     ` Qu Wenruo
  0 siblings, 1 reply; 25+ messages in thread
From: Anand Jain @ 2021-03-15 11:59 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 10/03/2021 17:08, Qu Wenruo wrote:
> Add extra sysfs interface features/supported_ro_sectorsize and
> features/supported_rw_sectorsize to indicate subpage support.
> 
> Currently for supported_rw_sectorsize all architectures only have their
> PAGE_SIZE listed.
> 
> While for supported_ro_sectorsize, for systems with 64K page size, 4K
> sectorsize is also supported.
> 
> This new sysfs interface would help mkfs.btrfs to do more accurate
> warning.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---

Changes looks good. Nit below...
And maybe it is a good idea to wait for other comments before reroll.

>   fs/btrfs/sysfs.c | 34 ++++++++++++++++++++++++++++++++++
>   1 file changed, 34 insertions(+)
> 
> diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
> index 6eb1c50fa98c..3ef419899472 100644
> --- a/fs/btrfs/sysfs.c
> +++ b/fs/btrfs/sysfs.c
> @@ -360,11 +360,45 @@ static ssize_t supported_rescue_options_show(struct kobject *kobj,
>   BTRFS_ATTR(static_feature, supported_rescue_options,
>   	   supported_rescue_options_show);
>   
> +static ssize_t supported_ro_sectorsize_show(struct kobject *kobj,
> +					    struct kobj_attribute *a,
> +					    char *buf)
> +{
> +	ssize_t ret = 0;
> +	int i = 0;

  Drop variable i, as ret can be used instead of 'i'.

> +
> +	/* For 64K page size, 4K sector size is supported */
> +	if (PAGE_SIZE == SZ_64K) {
> +		ret += scnprintf(buf + ret, PAGE_SIZE - ret, "%u", SZ_4K);
> +		i++;
> +	}
> +	/* Other than above subpage, only support PAGE_SIZE as sectorsize yet */
> +	ret += scnprintf(buf + ret, PAGE_SIZE - ret, "%s%lu\n",

> +			 (i ? " " : ""), PAGE_SIZE);
                           ^ret

> +	return ret;
> +}
> +BTRFS_ATTR(static_feature, supported_ro_sectorsize,
> +	   supported_ro_sectorsize_show);
> +
> +static ssize_t supported_rw_sectorsize_show(struct kobject *kobj,
> +					    struct kobj_attribute *a,
> +					    char *buf)
> +{
> +	ssize_t ret = 0;
> +
> +	/* Only PAGE_SIZE as sectorsize is supported */
> +	ret += scnprintf(buf + ret, PAGE_SIZE - ret, "%lu\n", PAGE_SIZE);
> +	return ret;
> +}
> +BTRFS_ATTR(static_feature, supported_rw_sectorsize,
> +	   supported_rw_sectorsize_show);

  Why not merge supported_ro_sectorsize and supported_rw_sectorsize
  and show both in two lines...
  For example:
    cat supported_sectorsizes
    ro: 4096 65536
    rw: 65536



>   static struct attribute *btrfs_supported_static_feature_attrs[] = {
>   	BTRFS_ATTR_PTR(static_feature, rmdir_subvol),
>   	BTRFS_ATTR_PTR(static_feature, supported_checksums),
>   	BTRFS_ATTR_PTR(static_feature, send_stream_version),
>   	BTRFS_ATTR_PTR(static_feature, supported_rescue_options),
> +	BTRFS_ATTR_PTR(static_feature, supported_ro_sectorsize),
> +	BTRFS_ATTR_PTR(static_feature, supported_rw_sectorsize),
>   	NULL
>   };
>   
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 02/15] btrfs: use min() to replace open-code in btrfs_invalidatepage()
  2021-03-10  9:08 ` [PATCH v2 02/15] btrfs: use min() to replace open-code in btrfs_invalidatepage() Qu Wenruo
@ 2021-03-15 12:03   ` Anand Jain
  0 siblings, 0 replies; 25+ messages in thread
From: Anand Jain @ 2021-03-15 12:03 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 10/03/2021 17:08, Qu Wenruo wrote:
> In btrfs_invalidatepage() we introduce a temporary variable, new_len, to
> update ordered->truncated_len.
> 
> But we can use min() to replace it completely and no need for the
> variable.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---

   Reviewed-by: Anand Jain <anand.jain@oracle.com>
-Anand

>   fs/btrfs/inode.c | 6 ++----
>   1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 52dc5f52ea58..2973cec05505 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -8405,15 +8405,13 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
>   		 */
>   		if (TestClearPagePrivate2(page)) {
>   			struct btrfs_ordered_inode_tree *tree;
> -			u64 new_len;
>   
>   			tree = &inode->ordered_tree;
>   
>   			spin_lock_irq(&tree->lock);
>   			set_bit(BTRFS_ORDERED_TRUNCATED, &ordered->flags);
> -			new_len = start - ordered->file_offset;
> -			if (new_len < ordered->truncated_len)
> -				ordered->truncated_len = new_len;
> +			ordered->truncated_len = min(ordered->truncated_len,
> +					start - ordered->file_offset);
>   			spin_unlock_irq(&tree->lock);
>   
>   			if (btrfs_dec_test_ordered_pending(inode, &ordered,
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 03/15] btrfs: remove unnecessary variable shadowing in btrfs_invalidatepage()
  2021-03-10  9:08 ` [PATCH v2 03/15] btrfs: remove unnecessary variable shadowing " Qu Wenruo
@ 2021-03-15 12:06   ` Anand Jain
  0 siblings, 0 replies; 25+ messages in thread
From: Anand Jain @ 2021-03-15 12:06 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 10/03/2021 17:08, Qu Wenruo wrote:
> In btrfs_invalidatepage() we re-declare @tree variable as
> btrfs_ordered_inode_tree.
> 
> Since it's only used to do the spinlock, we can grab it from inode
> directly, and remove the unnecessary declaration completely.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---

  Reviewed-by: Anand Jain <anand.jain@oracle.com>
-Anand

>   fs/btrfs/inode.c | 8 ++------
>   1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 2973cec05505..f99554f0bd48 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -8404,15 +8404,11 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
>   		 * for the finish_ordered_io
>   		 */
>   		if (TestClearPagePrivate2(page)) {
> -			struct btrfs_ordered_inode_tree *tree;
> -
> -			tree = &inode->ordered_tree;
> -
> -			spin_lock_irq(&tree->lock);
> +			spin_lock_irq(&inode->ordered_tree.lock);
>   			set_bit(BTRFS_ORDERED_TRUNCATED, &ordered->flags);
>   			ordered->truncated_len = min(ordered->truncated_len,
>   					start - ordered->file_offset);
> -			spin_unlock_irq(&tree->lock);
> +			spin_unlock_irq(&inode->ordered_tree.lock);
>   
>   			if (btrfs_dec_test_ordered_pending(inode, &ordered,
>   							   start,
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 01/15] btrfs: add sysfs interface for supported sectorsize
  2021-03-15 11:59   ` Anand Jain
@ 2021-03-15 12:39     ` Qu Wenruo
  2021-03-15 18:44       ` David Sterba
  0 siblings, 1 reply; 25+ messages in thread
From: Qu Wenruo @ 2021-03-15 12:39 UTC (permalink / raw)
  To: Anand Jain, Qu Wenruo, linux-btrfs



On 2021/3/15 下午7:59, Anand Jain wrote:
> On 10/03/2021 17:08, Qu Wenruo wrote:
>> Add extra sysfs interface features/supported_ro_sectorsize and
>> features/supported_rw_sectorsize to indicate subpage support.
>>
>> Currently for supported_rw_sectorsize all architectures only have their
>> PAGE_SIZE listed.
>>
>> While for supported_ro_sectorsize, for systems with 64K page size, 4K
>> sectorsize is also supported.
>>
>> This new sysfs interface would help mkfs.btrfs to do more accurate
>> warning.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>
> Changes looks good. Nit below...
> And maybe it is a good idea to wait for other comments before reroll.
>
>>   fs/btrfs/sysfs.c | 34 ++++++++++++++++++++++++++++++++++
>>   1 file changed, 34 insertions(+)
>>
>> diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
>> index 6eb1c50fa98c..3ef419899472 100644
>> --- a/fs/btrfs/sysfs.c
>> +++ b/fs/btrfs/sysfs.c
>> @@ -360,11 +360,45 @@ static ssize_t
>> supported_rescue_options_show(struct kobject *kobj,
>>   BTRFS_ATTR(static_feature, supported_rescue_options,
>>          supported_rescue_options_show);
>> +static ssize_t supported_ro_sectorsize_show(struct kobject *kobj,
>> +                        struct kobj_attribute *a,
>> +                        char *buf)
>> +{
>> +    ssize_t ret = 0;
>> +    int i = 0;
>
>   Drop variable i, as ret can be used instead of 'i'.
>
>> +
>> +    /* For 64K page size, 4K sector size is supported */
>> +    if (PAGE_SIZE == SZ_64K) {
>> +        ret += scnprintf(buf + ret, PAGE_SIZE - ret, "%u", SZ_4K);
>> +        i++;
>> +    }
>> +    /* Other than above subpage, only support PAGE_SIZE as sectorsize
>> yet */
>> +    ret += scnprintf(buf + ret, PAGE_SIZE - ret, "%s%lu\n",
>
>> +             (i ? " " : ""), PAGE_SIZE);
>                            ^ret
>
>> +    return ret;
>> +}
>> +BTRFS_ATTR(static_feature, supported_ro_sectorsize,
>> +       supported_ro_sectorsize_show);
>> +
>> +static ssize_t supported_rw_sectorsize_show(struct kobject *kobj,
>> +                        struct kobj_attribute *a,
>> +                        char *buf)
>> +{
>> +    ssize_t ret = 0;
>> +
>> +    /* Only PAGE_SIZE as sectorsize is supported */
>> +    ret += scnprintf(buf + ret, PAGE_SIZE - ret, "%lu\n", PAGE_SIZE);
>> +    return ret;
>> +}
>> +BTRFS_ATTR(static_feature, supported_rw_sectorsize,
>> +       supported_rw_sectorsize_show);
>
>   Why not merge supported_ro_sectorsize and supported_rw_sectorsize
>   and show both in two lines...
>   For example:
>     cat supported_sectorsizes
>     ro: 4096 65536
>     rw: 65536

If merged, btrfs-progs needs to do line number check before doing string
matching.

Although I doubt the usefulness for supported_ro_sectorsize, as the
window for RO support without RW support should not be that large.
(Current RW passes most generic test cases, and the remaining failures
are very limited)

Thus I can merged them into supported_sectorsize, and only report
sectorsize we can do RW as supported.

Thanks,
Qu

>
>
>
>>   static struct attribute *btrfs_supported_static_feature_attrs[] = {
>>       BTRFS_ATTR_PTR(static_feature, rmdir_subvol),
>>       BTRFS_ATTR_PTR(static_feature, supported_checksums),
>>       BTRFS_ATTR_PTR(static_feature, send_stream_version),
>>       BTRFS_ATTR_PTR(static_feature, supported_rescue_options),
>> +    BTRFS_ATTR_PTR(static_feature, supported_ro_sectorsize),
>> +    BTRFS_ATTR_PTR(static_feature, supported_rw_sectorsize),
>>       NULL
>>   };
>>
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 01/15] btrfs: add sysfs interface for supported sectorsize
  2021-03-15 12:39     ` Qu Wenruo
@ 2021-03-15 18:44       ` David Sterba
  2021-03-16  0:05         ` Qu Wenruo
  0 siblings, 1 reply; 25+ messages in thread
From: David Sterba @ 2021-03-15 18:44 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Anand Jain, Qu Wenruo, linux-btrfs

On Mon, Mar 15, 2021 at 08:39:31PM +0800, Qu Wenruo wrote:
> 
> 
> On 2021/3/15 下午7:59, Anand Jain wrote:
> > On 10/03/2021 17:08, Qu Wenruo wrote:
> >> Add extra sysfs interface features/supported_ro_sectorsize and
> >> features/supported_rw_sectorsize to indicate subpage support.
> >>
> >> Currently for supported_rw_sectorsize all architectures only have their
> >> PAGE_SIZE listed.
> >>
> >> While for supported_ro_sectorsize, for systems with 64K page size, 4K
> >> sectorsize is also supported.
> >>
> >> This new sysfs interface would help mkfs.btrfs to do more accurate
> >> warning.
> >>
> >> Signed-off-by: Qu Wenruo <wqu@suse.com>
> >> ---
> >
> > Changes looks good. Nit below...
> > And maybe it is a good idea to wait for other comments before reroll.
> >
> >>   fs/btrfs/sysfs.c | 34 ++++++++++++++++++++++++++++++++++
> >>   1 file changed, 34 insertions(+)
> >>
> >> diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
> >> index 6eb1c50fa98c..3ef419899472 100644
> >> --- a/fs/btrfs/sysfs.c
> >> +++ b/fs/btrfs/sysfs.c
> >> @@ -360,11 +360,45 @@ static ssize_t
> >> supported_rescue_options_show(struct kobject *kobj,
> >>   BTRFS_ATTR(static_feature, supported_rescue_options,
> >>          supported_rescue_options_show);
> >> +static ssize_t supported_ro_sectorsize_show(struct kobject *kobj,
> >> +                        struct kobj_attribute *a,
> >> +                        char *buf)
> >> +{
> >> +    ssize_t ret = 0;
> >> +    int i = 0;
> >
> >   Drop variable i, as ret can be used instead of 'i'.
> >
> >> +
> >> +    /* For 64K page size, 4K sector size is supported */
> >> +    if (PAGE_SIZE == SZ_64K) {
> >> +        ret += scnprintf(buf + ret, PAGE_SIZE - ret, "%u", SZ_4K);
> >> +        i++;
> >> +    }
> >> +    /* Other than above subpage, only support PAGE_SIZE as sectorsize
> >> yet */
> >> +    ret += scnprintf(buf + ret, PAGE_SIZE - ret, "%s%lu\n",
> >
> >> +             (i ? " " : ""), PAGE_SIZE);
> >                            ^ret
> >
> >> +    return ret;
> >> +}
> >> +BTRFS_ATTR(static_feature, supported_ro_sectorsize,
> >> +       supported_ro_sectorsize_show);
> >> +
> >> +static ssize_t supported_rw_sectorsize_show(struct kobject *kobj,
> >> +                        struct kobj_attribute *a,
> >> +                        char *buf)
> >> +{
> >> +    ssize_t ret = 0;
> >> +
> >> +    /* Only PAGE_SIZE as sectorsize is supported */
> >> +    ret += scnprintf(buf + ret, PAGE_SIZE - ret, "%lu\n", PAGE_SIZE);
> >> +    return ret;
> >> +}
> >> +BTRFS_ATTR(static_feature, supported_rw_sectorsize,
> >> +       supported_rw_sectorsize_show);
> >
> >   Why not merge supported_ro_sectorsize and supported_rw_sectorsize
> >   and show both in two lines...
> >   For example:
> >     cat supported_sectorsizes
> >     ro: 4096 65536
> >     rw: 65536
> 
> If merged, btrfs-progs needs to do line number check before doing string
> matching.

The sysfs files should do one value per file.

> Although I doubt the usefulness for supported_ro_sectorsize, as the
> window for RO support without RW support should not be that large.
> (Current RW passes most generic test cases, and the remaining failures
> are very limited)
> 
> Thus I can merged them into supported_sectorsize, and only report
> sectorsize we can do RW as supported.

In that case one file with the list of supported values is a better
option. The main point is to have full RW support, until then it's
interesting only for developers and they know what to expect.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 01/15] btrfs: add sysfs interface for supported sectorsize
  2021-03-15 18:44       ` David Sterba
@ 2021-03-16  0:05         ` Qu Wenruo
  2021-03-16  0:10           ` Anand Jain
  2021-03-16 10:27           ` David Sterba
  0 siblings, 2 replies; 25+ messages in thread
From: Qu Wenruo @ 2021-03-16  0:05 UTC (permalink / raw)
  To: dsterba, Anand Jain, Qu Wenruo, linux-btrfs



On 2021/3/16 上午2:44, David Sterba wrote:
> On Mon, Mar 15, 2021 at 08:39:31PM +0800, Qu Wenruo wrote:
>>
>>
>> On 2021/3/15 下午7:59, Anand Jain wrote:
>>> On 10/03/2021 17:08, Qu Wenruo wrote:
>>>> Add extra sysfs interface features/supported_ro_sectorsize and
>>>> features/supported_rw_sectorsize to indicate subpage support.
>>>>
>>>> Currently for supported_rw_sectorsize all architectures only have their
>>>> PAGE_SIZE listed.
>>>>
>>>> While for supported_ro_sectorsize, for systems with 64K page size, 4K
>>>> sectorsize is also supported.
>>>>
>>>> This new sysfs interface would help mkfs.btrfs to do more accurate
>>>> warning.
>>>>
>>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>>> ---
>>>
>>> Changes looks good. Nit below...
>>> And maybe it is a good idea to wait for other comments before reroll.
>>>
>>>>    fs/btrfs/sysfs.c | 34 ++++++++++++++++++++++++++++++++++
>>>>    1 file changed, 34 insertions(+)
>>>>
>>>> diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
>>>> index 6eb1c50fa98c..3ef419899472 100644
>>>> --- a/fs/btrfs/sysfs.c
>>>> +++ b/fs/btrfs/sysfs.c
>>>> @@ -360,11 +360,45 @@ static ssize_t
>>>> supported_rescue_options_show(struct kobject *kobj,
>>>>    BTRFS_ATTR(static_feature, supported_rescue_options,
>>>>           supported_rescue_options_show);
>>>> +static ssize_t supported_ro_sectorsize_show(struct kobject *kobj,
>>>> +                        struct kobj_attribute *a,
>>>> +                        char *buf)
>>>> +{
>>>> +    ssize_t ret = 0;
>>>> +    int i = 0;
>>>
>>>    Drop variable i, as ret can be used instead of 'i'.
>>>
>>>> +
>>>> +    /* For 64K page size, 4K sector size is supported */
>>>> +    if (PAGE_SIZE == SZ_64K) {
>>>> +        ret += scnprintf(buf + ret, PAGE_SIZE - ret, "%u", SZ_4K);
>>>> +        i++;
>>>> +    }
>>>> +    /* Other than above subpage, only support PAGE_SIZE as sectorsize
>>>> yet */
>>>> +    ret += scnprintf(buf + ret, PAGE_SIZE - ret, "%s%lu\n",
>>>
>>>> +             (i ? " " : ""), PAGE_SIZE);
>>>                             ^ret
>>>
>>>> +    return ret;
>>>> +}
>>>> +BTRFS_ATTR(static_feature, supported_ro_sectorsize,
>>>> +       supported_ro_sectorsize_show);
>>>> +
>>>> +static ssize_t supported_rw_sectorsize_show(struct kobject *kobj,
>>>> +                        struct kobj_attribute *a,
>>>> +                        char *buf)
>>>> +{
>>>> +    ssize_t ret = 0;
>>>> +
>>>> +    /* Only PAGE_SIZE as sectorsize is supported */
>>>> +    ret += scnprintf(buf + ret, PAGE_SIZE - ret, "%lu\n", PAGE_SIZE);
>>>> +    return ret;
>>>> +}
>>>> +BTRFS_ATTR(static_feature, supported_rw_sectorsize,
>>>> +       supported_rw_sectorsize_show);
>>>
>>>    Why not merge supported_ro_sectorsize and supported_rw_sectorsize
>>>    and show both in two lines...
>>>    For example:
>>>      cat supported_sectorsizes
>>>      ro: 4096 65536
>>>      rw: 65536
>>
>> If merged, btrfs-progs needs to do line number check before doing string
>> matching.
>
> The sysfs files should do one value per file.
>
>> Although I doubt the usefulness for supported_ro_sectorsize, as the
>> window for RO support without RW support should not be that large.
>> (Current RW passes most generic test cases, and the remaining failures
>> are very limited)
>>
>> Thus I can merged them into supported_sectorsize, and only report
>> sectorsize we can do RW as supported.
>
> In that case one file with the list of supported values is a better
> option. The main point is to have full RW support, until then it's
> interesting only for developers and they know what to expect.
>

Indeed only full RW support makes sense.

BTW, any comment on the file name? If no problem I would just use
"supported_sectorsize" in next update.

Although I hope the sysfs interface can be merged separately early, so
that I can add the proper support in btrfs-progs.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 01/15] btrfs: add sysfs interface for supported sectorsize
  2021-03-16  0:05         ` Qu Wenruo
@ 2021-03-16  0:10           ` Anand Jain
  2021-03-16 10:25             ` David Sterba
  2021-03-16 10:27           ` David Sterba
  1 sibling, 1 reply; 25+ messages in thread
From: Anand Jain @ 2021-03-16  0:10 UTC (permalink / raw)
  To: Qu Wenruo, dsterba, Qu Wenruo, linux-btrfs



On 16/03/2021 08:05, Qu Wenruo wrote:
> 
> 
> On 2021/3/16 上午2:44, David Sterba wrote:
>> On Mon, Mar 15, 2021 at 08:39:31PM +0800, Qu Wenruo wrote:
>>>
>>>
>>> On 2021/3/15 下午7:59, Anand Jain wrote:
>>>> On 10/03/2021 17:08, Qu Wenruo wrote:
>>>>> Add extra sysfs interface features/supported_ro_sectorsize and
>>>>> features/supported_rw_sectorsize to indicate subpage support.
>>>>>
>>>>> Currently for supported_rw_sectorsize all architectures only have 
>>>>> their
>>>>> PAGE_SIZE listed.
>>>>>
>>>>> While for supported_ro_sectorsize, for systems with 64K page size, 4K
>>>>> sectorsize is also supported.
>>>>>
>>>>> This new sysfs interface would help mkfs.btrfs to do more accurate
>>>>> warning.
>>>>>
>>>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>>>> ---
>>>>
>>>> Changes looks good. Nit below...
>>>> And maybe it is a good idea to wait for other comments before reroll.
>>>>
>>>>>    fs/btrfs/sysfs.c | 34 ++++++++++++++++++++++++++++++++++
>>>>>    1 file changed, 34 insertions(+)
>>>>>
>>>>> diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
>>>>> index 6eb1c50fa98c..3ef419899472 100644
>>>>> --- a/fs/btrfs/sysfs.c
>>>>> +++ b/fs/btrfs/sysfs.c
>>>>> @@ -360,11 +360,45 @@ static ssize_t
>>>>> supported_rescue_options_show(struct kobject *kobj,
>>>>>    BTRFS_ATTR(static_feature, supported_rescue_options,
>>>>>           supported_rescue_options_show);
>>>>> +static ssize_t supported_ro_sectorsize_show(struct kobject *kobj,
>>>>> +                        struct kobj_attribute *a,
>>>>> +                        char *buf)
>>>>> +{
>>>>> +    ssize_t ret = 0;
>>>>> +    int i = 0;
>>>>
>>>>    Drop variable i, as ret can be used instead of 'i'.
>>>>
>>>>> +
>>>>> +    /* For 64K page size, 4K sector size is supported */
>>>>> +    if (PAGE_SIZE == SZ_64K) {
>>>>> +        ret += scnprintf(buf + ret, PAGE_SIZE - ret, "%u", SZ_4K);
>>>>> +        i++;
>>>>> +    }
>>>>> +    /* Other than above subpage, only support PAGE_SIZE as sectorsize
>>>>> yet */
>>>>> +    ret += scnprintf(buf + ret, PAGE_SIZE - ret, "%s%lu\n",
>>>>
>>>>> +             (i ? " " : ""), PAGE_SIZE);
>>>>                             ^ret
>>>>
>>>>> +    return ret;
>>>>> +}
>>>>> +BTRFS_ATTR(static_feature, supported_ro_sectorsize,
>>>>> +       supported_ro_sectorsize_show);
>>>>> +
>>>>> +static ssize_t supported_rw_sectorsize_show(struct kobject *kobj,
>>>>> +                        struct kobj_attribute *a,
>>>>> +                        char *buf)
>>>>> +{
>>>>> +    ssize_t ret = 0;
>>>>> +
>>>>> +    /* Only PAGE_SIZE as sectorsize is supported */
>>>>> +    ret += scnprintf(buf + ret, PAGE_SIZE - ret, "%lu\n", PAGE_SIZE);
>>>>> +    return ret;
>>>>> +}
>>>>> +BTRFS_ATTR(static_feature, supported_rw_sectorsize,
>>>>> +       supported_rw_sectorsize_show);
>>>>
>>>>    Why not merge supported_ro_sectorsize and supported_rw_sectorsize
>>>>    and show both in two lines...
>>>>    For example:
>>>>      cat supported_sectorsizes
>>>>      ro: 4096 65536
>>>>      rw: 65536
>>>
>>> If merged, btrfs-progs needs to do line number check before doing string
>>> matching.
>>
>> The sysfs files should do one value per file.
>>
>>> Although I doubt the usefulness for supported_ro_sectorsize, as the
>>> window for RO support without RW support should not be that large.
>>> (Current RW passes most generic test cases, and the remaining failures
>>> are very limited)
>>>
>>> Thus I can merged them into supported_sectorsize, and only report
>>> sectorsize we can do RW as supported.
>>
>> In that case one file with the list of supported values is a better
>> option. The main point is to have full RW support, until then it's
>> interesting only for developers and they know what to expect.
>>
> 
> Indeed only full RW support makes sense.
> 
  Makes sense to me.

> BTW, any comment on the file name? If no problem I would just use
> "supported_sectorsize" in next update.

  supported_sectorsizes (plural) is better IMO.

Thanks, Anand

> Although I hope the sysfs interface can be merged separately early, so
> that I can add the proper support in btrfs-progs.
> 
> Thanks,
> Qu

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 01/15] btrfs: add sysfs interface for supported sectorsize
  2021-03-16  0:10           ` Anand Jain
@ 2021-03-16 10:25             ` David Sterba
  0 siblings, 0 replies; 25+ messages in thread
From: David Sterba @ 2021-03-16 10:25 UTC (permalink / raw)
  To: Anand Jain; +Cc: Qu Wenruo, dsterba, Qu Wenruo, linux-btrfs

On Tue, Mar 16, 2021 at 08:10:13AM +0800, Anand Jain wrote:
> 
> 
> On 16/03/2021 08:05, Qu Wenruo wrote:
> > 
> > 
> > On 2021/3/16 上午2:44, David Sterba wrote:
> >> On Mon, Mar 15, 2021 at 08:39:31PM +0800, Qu Wenruo wrote:
> >>>
> >>>
> >>> On 2021/3/15 下午7:59, Anand Jain wrote:
> >>>> On 10/03/2021 17:08, Qu Wenruo wrote:
> >>>>> Add extra sysfs interface features/supported_ro_sectorsize and
> >>>>> features/supported_rw_sectorsize to indicate subpage support.
> >>>>>
> >>>>> Currently for supported_rw_sectorsize all architectures only have 
> >>>>> their
> >>>>> PAGE_SIZE listed.
> >>>>>
> >>>>> While for supported_ro_sectorsize, for systems with 64K page size, 4K
> >>>>> sectorsize is also supported.
> >>>>>
> >>>>> This new sysfs interface would help mkfs.btrfs to do more accurate
> >>>>> warning.
> >>>>>
> >>>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
> >>>>> ---
> >>>>
> >>>> Changes looks good. Nit below...
> >>>> And maybe it is a good idea to wait for other comments before reroll.
> >>>>
> >>>>>    fs/btrfs/sysfs.c | 34 ++++++++++++++++++++++++++++++++++
> >>>>>    1 file changed, 34 insertions(+)
> >>>>>
> >>>>> diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
> >>>>> index 6eb1c50fa98c..3ef419899472 100644
> >>>>> --- a/fs/btrfs/sysfs.c
> >>>>> +++ b/fs/btrfs/sysfs.c
> >>>>> @@ -360,11 +360,45 @@ static ssize_t
> >>>>> supported_rescue_options_show(struct kobject *kobj,
> >>>>>    BTRFS_ATTR(static_feature, supported_rescue_options,
> >>>>>           supported_rescue_options_show);
> >>>>> +static ssize_t supported_ro_sectorsize_show(struct kobject *kobj,
> >>>>> +                        struct kobj_attribute *a,
> >>>>> +                        char *buf)
> >>>>> +{
> >>>>> +    ssize_t ret = 0;
> >>>>> +    int i = 0;
> >>>>
> >>>>    Drop variable i, as ret can be used instead of 'i'.
> >>>>
> >>>>> +
> >>>>> +    /* For 64K page size, 4K sector size is supported */
> >>>>> +    if (PAGE_SIZE == SZ_64K) {
> >>>>> +        ret += scnprintf(buf + ret, PAGE_SIZE - ret, "%u", SZ_4K);
> >>>>> +        i++;
> >>>>> +    }
> >>>>> +    /* Other than above subpage, only support PAGE_SIZE as sectorsize
> >>>>> yet */
> >>>>> +    ret += scnprintf(buf + ret, PAGE_SIZE - ret, "%s%lu\n",
> >>>>
> >>>>> +             (i ? " " : ""), PAGE_SIZE);
> >>>>                             ^ret
> >>>>
> >>>>> +    return ret;
> >>>>> +}
> >>>>> +BTRFS_ATTR(static_feature, supported_ro_sectorsize,
> >>>>> +       supported_ro_sectorsize_show);
> >>>>> +
> >>>>> +static ssize_t supported_rw_sectorsize_show(struct kobject *kobj,
> >>>>> +                        struct kobj_attribute *a,
> >>>>> +                        char *buf)
> >>>>> +{
> >>>>> +    ssize_t ret = 0;
> >>>>> +
> >>>>> +    /* Only PAGE_SIZE as sectorsize is supported */
> >>>>> +    ret += scnprintf(buf + ret, PAGE_SIZE - ret, "%lu\n", PAGE_SIZE);
> >>>>> +    return ret;
> >>>>> +}
> >>>>> +BTRFS_ATTR(static_feature, supported_rw_sectorsize,
> >>>>> +       supported_rw_sectorsize_show);
> >>>>
> >>>>    Why not merge supported_ro_sectorsize and supported_rw_sectorsize
> >>>>    and show both in two lines...
> >>>>    For example:
> >>>>      cat supported_sectorsizes
> >>>>      ro: 4096 65536
> >>>>      rw: 65536
> >>>
> >>> If merged, btrfs-progs needs to do line number check before doing string
> >>> matching.
> >>
> >> The sysfs files should do one value per file.
> >>
> >>> Although I doubt the usefulness for supported_ro_sectorsize, as the
> >>> window for RO support without RW support should not be that large.
> >>> (Current RW passes most generic test cases, and the remaining failures
> >>> are very limited)
> >>>
> >>> Thus I can merged them into supported_sectorsize, and only report
> >>> sectorsize we can do RW as supported.
> >>
> >> In that case one file with the list of supported values is a better
> >> option. The main point is to have full RW support, until then it's
> >> interesting only for developers and they know what to expect.
> >>
> > 
> > Indeed only full RW support makes sense.
> > 
>   Makes sense to me.
> 
> > BTW, any comment on the file name? If no problem I would just use
> > "supported_sectorsize" in next update.
> 
>   supported_sectorsizes (plural) is better IMO.

Yeah pluar is consistent with what we have now, eg. supported_checksums

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 01/15] btrfs: add sysfs interface for supported sectorsize
  2021-03-16  0:05         ` Qu Wenruo
  2021-03-16  0:10           ` Anand Jain
@ 2021-03-16 10:27           ` David Sterba
  1 sibling, 0 replies; 25+ messages in thread
From: David Sterba @ 2021-03-16 10:27 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Anand Jain, Qu Wenruo, linux-btrfs

On Tue, Mar 16, 2021 at 08:05:31AM +0800, Qu Wenruo wrote:
> > In that case one file with the list of supported values is a better
> > option. The main point is to have full RW support, until then it's
> > interesting only for developers and they know what to expect.
> >
> 
> Indeed only full RW support makes sense.
> 
> BTW, any comment on the file name? If no problem I would just use
> "supported_sectorsize" in next update.
> 
> Although I hope the sysfs interface can be merged separately early, so
> that I can add the proper support in btrfs-progs.

Yeah, exporting the information via sysfs is the easy stuff so you can
postpone it as you need.

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2021-03-16 10:29 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-10  9:08 [PATCH v2 00/15] btrfs: support read-write for subpage metadata Qu Wenruo
2021-03-10  9:08 ` [PATCH v2 01/15] btrfs: add sysfs interface for supported sectorsize Qu Wenruo
2021-03-15 11:59   ` Anand Jain
2021-03-15 12:39     ` Qu Wenruo
2021-03-15 18:44       ` David Sterba
2021-03-16  0:05         ` Qu Wenruo
2021-03-16  0:10           ` Anand Jain
2021-03-16 10:25             ` David Sterba
2021-03-16 10:27           ` David Sterba
2021-03-10  9:08 ` [PATCH v2 02/15] btrfs: use min() to replace open-code in btrfs_invalidatepage() Qu Wenruo
2021-03-15 12:03   ` Anand Jain
2021-03-10  9:08 ` [PATCH v2 03/15] btrfs: remove unnecessary variable shadowing " Qu Wenruo
2021-03-15 12:06   ` Anand Jain
2021-03-10  9:08 ` [PATCH v2 04/15] btrfs: introduce helpers for subpage dirty status Qu Wenruo
2021-03-10  9:08 ` [PATCH v2 05/15] btrfs: introduce helpers for subpage writeback status Qu Wenruo
2021-03-10  9:08 ` [PATCH v2 06/15] btrfs: allow btree_set_page_dirty() to do more sanity check on subpage metadata Qu Wenruo
2021-03-10  9:08 ` [PATCH v2 07/15] btrfs: support subpage metadata csum calculation at write time Qu Wenruo
2021-03-10  9:08 ` [PATCH v2 08/15] btrfs: make alloc_extent_buffer() check subpage dirty bitmap Qu Wenruo
2021-03-10  9:08 ` [PATCH v2 09/15] btrfs: make the page uptodate assert to be subpage compatible Qu Wenruo
2021-03-10  9:08 ` [PATCH v2 10/15] btrfs: make set/clear_extent_buffer_dirty() " Qu Wenruo
2021-03-10  9:08 ` [PATCH v2 11/15] btrfs: make set_btree_ioerr() accept extent buffer and " Qu Wenruo
2021-03-10  9:08 ` [PATCH v2 12/15] btrfs: introduce end_bio_subpage_eb_writepage() function Qu Wenruo
2021-03-10  9:08 ` [PATCH v2 13/15] btrfs: introduce write_one_subpage_eb() function Qu Wenruo
2021-03-10  9:08 ` [PATCH v2 14/15] btrfs: make lock_extent_buffer_for_io() to be subpage compatible Qu Wenruo
2021-03-10  9:08 ` [PATCH v2 15/15] btrfs: introduce submit_eb_subpage() to submit a subpage metadata page Qu Wenruo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.