* [PATCH RFC 0/8] btrfs: experimental compression support for subpage
@ 2021-06-23  5:55 Qu Wenruo
  2021-06-23  5:55 ` [PATCH RFC 1/8] btrfs: don't pass compressed pages to btrfs_writepage_endio_finish_ordered() Qu Wenruo
                   ` (9 more replies)
  0 siblings, 10 replies; 14+ messages in thread
From: Qu Wenruo @ 2021-06-23  5:55 UTC (permalink / raw)
  To: linux-btrfs

This patchset is based on my previously submitted compression refactor,
which in turn is based on the subpage patchset.

It can be fetched from the following repo:
https://github.com/adam900710/linux/tree/compression

It has only been lightly tested; no stress test or full fstests run yet.
The reason for sending such an unstable patchset is to get feedback on
how to continue the development, i.e. feedback on the last patch.

The first 7 patches are mostly cleanups to make the compression path
more subpage compatible.

The RFC is for the last patch, which enables subpage compression in a
pretty controversial way.

The reason is copied from that patch:

```
This mechanism allows btrfs to continue handling the next ranges without
waiting for the time-consuming compression.

But this is a problem for the subpage case, as we could have the
following delalloc ranges for one page:

0		32K		64K
|	|///////|	|///////|
		\- A		\- B

In the above case, if we pass both ranges to cow_file_range_async(), both
range A and range B will try to unlock the full page [0, 64K).

Whichever range finishes later will then try to do further page
operations like end_page_writeback() on an already unlocked page,
triggering a VM layer BUG_ON().

Currently I don't have a perfect solution for this, only two
workarounds:

- Only allow compression for fully page-aligned ranges

  This is what I did in this patch.
  This way the compressed range can exclusively lock the first page
  (and the other pages), so it is completely safe to do whatever it
  wants.
  The problem is, we will not compress a lot of small writes.
  This is especially problematic as our target page size is 64K, not
  a small size.

- Make cow_file_range_async() behave like cow_file_range() for
  subpage

  This needs several behavior changes, all subpage specific:
  * Skip the first page of the range when finished,
    just like cow_file_range()
  * Have a way to wait for the async_cow to finish before handling the
    next delalloc range

  The biggest problem here is the performance impact.
  Although this lets us compress all sector-aligned ranges, we will
  waste time waiting for the async_cow to finish.
  This completely defeats the purpose of the "async" part.
  Not to mention there are tons of code changes needed.

Thus I chose the current way: only compress ranges which are fully
page aligned.
The cost is that we will skip a lot of small writes on 64K page size.
```
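
For reference, the guard implementing the first workaround is small; a
sketch of the check added to inode_need_compress() in the last patch:

```c
	/*
	 * Subpage only: refuse compression unless the delalloc range
	 * covers whole pages, so the compressed range owns its pages
	 * exclusively.
	 */
	if (fs_info->sectorsize < PAGE_SIZE) {
		if (!IS_ALIGNED(start, PAGE_SIZE) ||
		    !IS_ALIGNED(end + 1, PAGE_SIZE))
			return 0;	/* fall back to uncompressed COW */
	}
```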

Any feedback on this part would be pretty helpful.

Qu Wenruo (8):
  btrfs: don't pass compressed pages to
    btrfs_writepage_endio_finish_ordered()
  btrfs: make btrfs_subpage_end_and_test_writer() to handle pages not
    locked by btrfs_page_start_writer_lock()
  btrfs: make compress_file_range() to be subpage compatible
  btrfs: make btrfs_submit_compressed_write() to be subpage compatible
  btrfs: use async_chunk::async_cow to replace the confusing pending
    pointer
  btrfs: make end_compressed_bio_writeback() to be subpage compatible
  btrfs: make extent_write_locked_range() to be subpage compatible
  btrfs: only allow subpage compression if the range fully covers the
    first page

 fs/btrfs/compression.c |  8 +++++--
 fs/btrfs/extent_io.c   |  7 ++++--
 fs/btrfs/inode.c       | 49 +++++++++++++++++++++++++++---------------
 fs/btrfs/subpage.c     | 24 ++++++++++++++++++++-
 4 files changed, 66 insertions(+), 22 deletions(-)

-- 
2.32.0



* [PATCH RFC 1/8] btrfs: don't pass compressed pages to btrfs_writepage_endio_finish_ordered()
  2021-06-23  5:55 [PATCH RFC 0/8] btrfs: experimental compression support for subpage Qu Wenruo
@ 2021-06-23  5:55 ` Qu Wenruo
  2021-06-23  5:55 ` [PATCH RFC 2/8] btrfs: make btrfs_subpage_end_and_test_writer() to handle pages not locked by btrfs_page_start_writer_lock() Qu Wenruo
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Qu Wenruo @ 2021-06-23  5:55 UTC (permalink / raw)
  To: linux-btrfs

Since async_extent holds the compressed pages, it would trigger the new
ASSERT() in btrfs_mark_ordered_io_finished(), which checks that the
range is inside the page.

Since btrfs_writepage_endio_finish_ordered() can accept @page == NULL,
just pass NULL to it.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 050c88ab74b5..e102e3672475 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -974,15 +974,12 @@ static noinline void submit_compressed_extents(struct async_chunk *async_chunk)
 				    async_extent->nr_pages,
 				    async_chunk->write_flags,
 				    async_chunk->blkcg_css)) {
-			struct page *p = async_extent->pages[0];
 			const u64 start = async_extent->start;
 			const u64 end = start + async_extent->ram_size - 1;
 
-			p->mapping = inode->vfs_inode.i_mapping;
-			btrfs_writepage_endio_finish_ordered(inode, p, start,
+			btrfs_writepage_endio_finish_ordered(inode, NULL, start,
 							     end, 0);
 
-			p->mapping = NULL;
 			extent_clear_unlock_delalloc(inode, start, end, NULL, 0,
 						     PAGE_END_WRITEBACK |
 						     PAGE_SET_ERROR);
-- 
2.32.0



* [PATCH RFC 2/8] btrfs: make btrfs_subpage_end_and_test_writer() to handle pages not locked by btrfs_page_start_writer_lock()
  2021-06-23  5:55 [PATCH RFC 0/8] btrfs: experimental compression support for subpage Qu Wenruo
  2021-06-23  5:55 ` [PATCH RFC 1/8] btrfs: don't pass compressed pages to btrfs_writepage_endio_finish_ordered() Qu Wenruo
@ 2021-06-23  5:55 ` Qu Wenruo
  2021-06-23  5:55 ` [PATCH RFC 3/8] btrfs: make compress_file_range() to be subpage compatible Qu Wenruo
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Qu Wenruo @ 2021-06-23  5:55 UTC (permalink / raw)
  To: linux-btrfs

Normally, if a page is locked by lock_page() rather than
btrfs_page_start_writer_lock(), it should be passed as @locked_page to
the various extent_clear_unlock_delalloc() call sites.

But there are quite a few call sites in the compression path where we
intentionally call extent_clear_unlock_delalloc() with @locked_page ==
NULL.

These call sites abuse extent_clear_unlock_delalloc() to unlock
@locked_page.

This works fine for the regular page size, but not for subpage: a page
locked by btrfs_page_start_writer_lock() has its @writers value
properly increased.

For a page not locked by btrfs_page_start_writer_lock(), we would
underflow the value.

Thankfully, such pages have a @writers value of zero, and we can use
that to distinguish pages locked by btrfs_page_start_writer_lock() from
a @locked_page locked by delalloc.
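
For context, a simplified sketch of the caller side, assuming the real
btrfs_page_end_writer_lock() in fs/btrfs/subpage.c has roughly this
shape:

```c
void btrfs_page_end_writer_lock(const struct btrfs_fs_info *fs_info,
				struct page *page, u64 start, u32 len)
{
	/*
	 * With the change below, this returns true both when the last
	 * writer drops off and when @writers was never increased (the
	 * page was locked by lock_page() instead), so unlock_page() is
	 * correct in both cases.
	 */
	if (btrfs_subpage_end_and_test_writer(fs_info, page, start, len))
		unlock_page(page);
}
```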

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/subpage.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index a61aa33aeeee..72d5d4712933 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -245,11 +245,33 @@ bool btrfs_subpage_end_and_test_writer(const struct btrfs_fs_info *fs_info,
 {
 	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
 	const int nbits = (len >> fs_info->sectorsize_bits);
+	unsigned long flags;
+	bool ret;
 
 	btrfs_subpage_assert(fs_info, page, start, len);
 
+	/* Two atomic checks follow, so take the spinlock to keep them consistent */
+	spin_lock_irqsave(&subpage->lock, flags);
+
+	/*
+	 * In the compression path, we have extent_clear_unlock_delalloc()
+	 * call sites which intentionally pass @locked_page == NULL to unlock
+	 * the locked page.
+	 *
+	 * In that case, @locked_page should have no writer count, as it's
+	 * not locked by btrfs_page_start_writer_lock().
+	 * For such pages, just return true so that
+	 * btrfs_page_end_writer_lock() will unlock the page.
+	 */
+	if (atomic_read(&subpage->writers) == 0) {
+		ret = true;
+		spin_unlock_irqrestore(&subpage->lock, flags);
+		return ret;
+	}
 	ASSERT(atomic_read(&subpage->writers) >= nbits);
-	return atomic_sub_and_test(nbits, &subpage->writers);
+	ret = atomic_sub_and_test(nbits, &subpage->writers);
+	spin_unlock_irqrestore(&subpage->lock, flags);
+	return ret;
 }
 
 /*
-- 
2.32.0



* [PATCH RFC 3/8] btrfs: make compress_file_range() to be subpage compatible
  2021-06-23  5:55 [PATCH RFC 0/8] btrfs: experimental compression support for subpage Qu Wenruo
  2021-06-23  5:55 ` [PATCH RFC 1/8] btrfs: don't pass compressed pages to btrfs_writepage_endio_finish_ordered() Qu Wenruo
  2021-06-23  5:55 ` [PATCH RFC 2/8] btrfs: make btrfs_subpage_end_and_test_writer() to handle pages not locked by btrfs_page_start_writer_lock() Qu Wenruo
@ 2021-06-23  5:55 ` Qu Wenruo
  2021-06-23  5:55 ` [PATCH RFC 4/8] btrfs: make btrfs_submit_compressed_write() " Qu Wenruo
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Qu Wenruo @ 2021-06-23  5:55 UTC (permalink / raw)
  To: linux-btrfs

In compress_file_range(), when the compression is finished, the
function just rounds @total_in up to PAGE_SIZE.

This is fine for the regular sectorsize == PAGE_SIZE case, but not for
subpage.

Just change the ALIGN(, PAGE_SIZE) to round_up(, sectorsize) so that
both the regular and subpage sectorsize cases work correctly.
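
To illustrate the difference (a standalone userspace sketch, not kernel
code; ALIGN_UP mirrors the kernel's power-of-two rounding):

```c
#include <stdio.h>

#define ALIGN_UP(x, a)	(((x) + (a) - 1) & ~((unsigned long)(a) - 1))

int main(void)
{
	unsigned long total_in = 5000;	/* bytes actually read in */

	/* Old: ALIGN(total_in, PAGE_SIZE) with 64K pages */
	printf("page rounded:   %lu\n", ALIGN_UP(total_in, 65536)); /* 65536 */
	/* New: round_up(total_in, sectorsize) with 4K sectors */
	printf("sector rounded: %lu\n", ALIGN_UP(total_in, 4096));  /* 8192 */
	return 0;
}
```

With the page-size rounding, @total_in is wildly inflated for small
writes; rounding to the sector size keeps the "compression must free at
least one sector" comparison meaningful.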

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e102e3672475..30cb8b1fc067 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -759,7 +759,7 @@ static noinline int compress_file_range(struct async_chunk *async_chunk)
 		 * win, compare the page count read with the blocks on disk,
 		 * compression must free at least one sector size
 		 */
-		total_in = ALIGN(total_in, PAGE_SIZE);
+		total_in = round_up(total_in, fs_info->sectorsize);
 		if (total_compressed + blocksize <= total_in) {
 			compressed_extents++;
 
-- 
2.32.0



* [PATCH RFC 4/8] btrfs: make btrfs_submit_compressed_write() to be subpage compatible
  2021-06-23  5:55 [PATCH RFC 0/8] btrfs: experimental compression support for subpage Qu Wenruo
                   ` (2 preceding siblings ...)
  2021-06-23  5:55 ` [PATCH RFC 3/8] btrfs: make compress_file_range() to be subpage compatible Qu Wenruo
@ 2021-06-23  5:55 ` Qu Wenruo
  2021-06-23  5:55 ` [PATCH RFC 5/8] btrfs: use async_chunk::async_cow to replace the confusing pending pointer Qu Wenruo
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Qu Wenruo @ 2021-06-23  5:55 UTC (permalink / raw)
  To: linux-btrfs

There is a WARN_ON() checking that @start is aligned to PAGE_SIZE, not
sectorsize, which will cause false alerts for subpage.

Fix it to check against sectorsize.

Furthermore:

- Use ASSERT() to do the check
  so that in the future we may skip the check for production builds

- Also check the alignment of @len
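
As a quick illustration of what the new assertion accepts on subpage (a
standalone userspace sketch; IS_ALIGNED mirrors the kernel macro):

```c
#include <assert.h>

#define IS_ALIGNED(x, a)	(((x) & ((a) - 1)) == 0)

int main(void)
{
	unsigned long sectorsize = 4096, pagesize = 65536;

	/* A 16K start is sector aligned, so valid for subpage writes... */
	assert(IS_ALIGNED(16384UL, sectorsize));
	/* ...but the old PAGE_ALIGNED-based WARN_ON() would have fired. */
	assert(!IS_ALIGNED(16384UL, pagesize));
	return 0;
}
```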

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/compression.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index b5487c80d2b6..a49dbf166b15 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -498,7 +498,8 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 	const bool use_append = btrfs_use_zone_append(inode, disk_start);
 	const unsigned int bio_op = use_append ? REQ_OP_ZONE_APPEND : REQ_OP_WRITE;
 
-	WARN_ON(!PAGE_ALIGNED(start));
+	ASSERT(IS_ALIGNED(start, fs_info->sectorsize) &&
+	       IS_ALIGNED(len, fs_info->sectorsize));
 	cb = kmalloc(compressed_bio_size(fs_info, compressed_len), GFP_NOFS);
 	if (!cb)
 		return BLK_STS_RESOURCE;
-- 
2.32.0



* [PATCH RFC 5/8] btrfs: use async_chunk::async_cow to replace the confusing pending pointer
  2021-06-23  5:55 [PATCH RFC 0/8] btrfs: experimental compression support for subpage Qu Wenruo
                   ` (3 preceding siblings ...)
  2021-06-23  5:55 ` [PATCH RFC 4/8] btrfs: make btrfs_submit_compressed_write() " Qu Wenruo
@ 2021-06-23  5:55 ` Qu Wenruo
  2021-06-23  5:55 ` [PATCH RFC 6/8] btrfs: make end_compressed_bio_writeback() to be subpage compatible Qu Wenruo
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Qu Wenruo @ 2021-06-23  5:55 UTC (permalink / raw)
  To: linux-btrfs

The async_chunk structure uses a very strange member layout to reach
the async_cow structure that owns this async_chunk.

At initialization, it goes like this:

		async_chunk[i].pending = &ctx->num_chunks;

Then at async_cow_free() we do a super weird freeing:

	/*
	 * Since the pointer to 'pending' is at the beginning of the array of
	 * async_chunk's, freeing it ensures the whole array has been freed.
	 */
	if (atomic_dec_and_test(async_chunk->pending))
		kvfree(async_chunk->pending);

This is absolutely an abuse of kvfree().

Replace async_chunk::pending with async_chunk::async_cow, so that we can
reach the async_cow structure directly, without this strange dance.

With this change, there is no longer any requirement on the placement
of specific members.
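
A minimal userspace sketch of the new ownership layout (illustrative
only; the kernel uses atomic_t, kvmalloc() and kvfree()):

```c
#include <stdlib.h>

struct async_chunk {
	struct async_cow *async_cow;	/* back-pointer to the owner */
	/* ... other per-chunk members, position no longer matters ... */
};

struct async_cow {
	int num_chunks;			/* atomic_t in the kernel */
	struct async_chunk chunks[];	/* flexible array member */
};

/* When a chunk finishes, drop the owner once the last chunk is done. */
static void chunk_done(struct async_chunk *chunk)
{
	struct async_cow *cow = chunk->async_cow;

	if (--cow->num_chunks == 0)	/* atomic_dec_and_test() upstream */
		free(cow);		/* kvfree() upstream */
}

int main(void)
{
	int nr = 4;
	struct async_cow *cow = malloc(sizeof(*cow) +
				       nr * sizeof(struct async_chunk));

	cow->num_chunks = nr;
	for (int i = 0; i < nr; i++)
		cow->chunks[i].async_cow = cow;
	for (int i = 0; i < nr; i++)
		chunk_done(&cow->chunks[i]);
	return 0;
}
```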

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 30cb8b1fc067..497c219758e0 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -455,11 +455,10 @@ struct async_chunk {
 	struct list_head extents;
 	struct cgroup_subsys_state *blkcg_css;
 	struct btrfs_work work;
-	atomic_t *pending;
+	struct async_cow *async_cow;
 };
 
 struct async_cow {
-	/* Number of chunks in flight; must be first in the structure */
 	atomic_t num_chunks;
 	struct async_chunk chunks[];
 };
@@ -1322,18 +1321,17 @@ static noinline void async_cow_submit(struct btrfs_work *work)
 static noinline void async_cow_free(struct btrfs_work *work)
 {
 	struct async_chunk *async_chunk;
+	struct async_cow *async_cow;
 
 	async_chunk = container_of(work, struct async_chunk, work);
 	if (async_chunk->inode)
 		btrfs_add_delayed_iput(async_chunk->inode);
 	if (async_chunk->blkcg_css)
 		css_put(async_chunk->blkcg_css);
-	/*
-	 * Since the pointer to 'pending' is at the beginning of the array of
-	 * async_chunk's, freeing it ensures the whole array has been freed.
-	 */
-	if (atomic_dec_and_test(async_chunk->pending))
-		kvfree(async_chunk->pending);
+
+	async_cow = async_chunk->async_cow;
+	if (atomic_dec_and_test(&async_cow->num_chunks))
+		kvfree(async_cow);
 }
 
 static int cow_file_range_async(struct btrfs_inode *inode,
@@ -1394,7 +1392,7 @@ static int cow_file_range_async(struct btrfs_inode *inode,
 		 * lightweight reference for the callback lifetime
 		 */
 		ihold(&inode->vfs_inode);
-		async_chunk[i].pending = &ctx->num_chunks;
+		async_chunk[i].async_cow = ctx;
 		async_chunk[i].inode = &inode->vfs_inode;
 		async_chunk[i].start = start;
 		async_chunk[i].end = cur_end;
-- 
2.32.0



* [PATCH RFC 6/8] btrfs: make end_compressed_bio_writeback() to be subpage compatible
  2021-06-23  5:55 [PATCH RFC 0/8] btrfs: experimental compression support for subpage Qu Wenruo
                   ` (4 preceding siblings ...)
  2021-06-23  5:55 ` [PATCH RFC 5/8] btrfs: use async_chunk::async_cow to replace the confusing pending pointer Qu Wenruo
@ 2021-06-23  5:55 ` Qu Wenruo
  2021-06-23  5:55 ` [PATCH RFC 7/8] btrfs: make extent_write_locked_range() to be subpage compatible Qu Wenruo
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Qu Wenruo @ 2021-06-23  5:55 UTC (permalink / raw)
  To: linux-btrfs

In end_compressed_writeback() we just clear writeback on the full page.
For the subpage case, if there are two delalloc ranges in the same page,
the second range will trigger a BUG_ON(), as the page writeback has
already been cleared by the previous range.

Fix it by using btrfs_page_clamp_clear_writeback() helper.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/compression.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index a49dbf166b15..2eec9850996e 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -29,6 +29,7 @@
 #include "extent_io.h"
 #include "extent_map.h"
 #include "zoned.h"
+#include "subpage.h"
 
 static const char* const btrfs_compress_types[] = { "", "zlib", "lzo", "zstd" };
 
@@ -330,6 +331,7 @@ static void end_compressed_bio_read(struct bio *bio)
 static noinline void end_compressed_writeback(struct inode *inode,
 					      const struct compressed_bio *cb)
 {
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	unsigned long index = cb->start >> PAGE_SHIFT;
 	unsigned long end_index = (cb->start + cb->len - 1) >> PAGE_SHIFT;
 	struct page *pages[16];
@@ -352,7 +354,8 @@ static noinline void end_compressed_writeback(struct inode *inode,
 		for (i = 0; i < ret; i++) {
 			if (cb->errors)
 				SetPageError(pages[i]);
-			end_page_writeback(pages[i]);
+			btrfs_page_clamp_clear_writeback(fs_info, pages[i],
+							 cb->start, cb->len);
 			put_page(pages[i]);
 		}
 		nr_pages -= ret;
-- 
2.32.0



* [PATCH RFC 7/8] btrfs: make extent_write_locked_range() to be subpage compatible
  2021-06-23  5:55 [PATCH RFC 0/8] btrfs: experimental compression support for subpage Qu Wenruo
                   ` (5 preceding siblings ...)
  2021-06-23  5:55 ` [PATCH RFC 6/8] btrfs: make end_compressed_bio_writeback() to be subpage compatible Qu Wenruo
@ 2021-06-23  5:55 ` Qu Wenruo
  2021-07-05 12:43   ` Qu Wenruo
  2021-06-23  5:55 ` [PATCH RFC 8/8] btrfs: only allow subpage compression if the range fully covers the first page Qu Wenruo
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 14+ messages in thread
From: Qu Wenruo @ 2021-06-23  5:55 UTC (permalink / raw)
  To: linux-btrfs

Function extent_write_locked_range() gets called when an async range
falls back to regular COW.

In that case, we need to finish the ordered extents for the range.

But the function is still using hardcoded PAGE_SIZE to calculate the
range.

If it gets called with a range like this:

  0   16K  32K   48K 64K
  |   |////|/////|  |

Then the range passed to btrfs_writepage_endio_finish_ordered() will be
start == 16K, end == 80K - 1, and the page range will no longer cover
it, triggering an ASSERT().

Fix it by properly calculating the range end, considering both the page
end and the range end.
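
Plugging the diagram's numbers into the new calculation (a standalone
userspace sketch; the macros mirror the kernel helpers):

```c
#include <stdio.h>

#define PAGE_SIZE	65536UL
#define round_down(x, a)	((x) & ~((a) - 1))
#define min(a, b)	((a) < (b) ? (a) : (b))

int main(void)
{
	unsigned long start = 16 * 1024, end = 48 * 1024 - 1;
	unsigned long cur_end = min(round_down(start, PAGE_SIZE) +
				    PAGE_SIZE - 1, end);

	/* New code: clamped to both the page end and the range end. */
	printf("cur_end = %lu (48K - 1)\n", cur_end);
	/* Old code: start + PAGE_SIZE - 1 = 80K - 1, past both. */
	printf("old end = %lu (80K - 1)\n", start + PAGE_SIZE - 1);
	return 0;
}
```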

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index e244c10074c8..c5491720a346 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5070,16 +5070,19 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end,
 
 	wbc_attach_fdatawrite_inode(&wbc_writepages, inode);
 	while (start <= end) {
+		u64 cur_end = min(round_down(start, PAGE_SIZE) + PAGE_SIZE - 1,
+				  end);
+
 		page = find_get_page(mapping, start >> PAGE_SHIFT);
 		if (clear_page_dirty_for_io(page))
 			ret = __extent_writepage(page, &wbc_writepages, &epd);
 		else {
 			btrfs_writepage_endio_finish_ordered(BTRFS_I(inode),
-					page, start, start + PAGE_SIZE - 1, 1);
+					page, start, cur_end, 1);
 			unlock_page(page);
 		}
 		put_page(page);
-		start += PAGE_SIZE;
+		start = cur_end + 1;
 	}
 
 	ASSERT(ret <= 0);
-- 
2.32.0



* [PATCH RFC 8/8] btrfs: only allow subpage compression if the range fully covers the first page
  2021-06-23  5:55 [PATCH RFC 0/8] btrfs: experimental compression support for subpage Qu Wenruo
                   ` (6 preceding siblings ...)
  2021-06-23  5:55 ` [PATCH RFC 7/8] btrfs: make extent_write_locked_range() to be subpage compatible Qu Wenruo
@ 2021-06-23  5:55 ` Qu Wenruo
  2021-06-23 21:23 ` [PATCH RFC 0/8] btrfs: experimental compression support for subpage David Sterba
  2021-06-23 22:37 ` David Sterba
  9 siblings, 0 replies; 14+ messages in thread
From: Qu Wenruo @ 2021-06-23  5:55 UTC (permalink / raw)
  To: linux-btrfs

For btrfs compressed writes, we use a mechanism called async cow which,
unlike regular run_delalloc_cow() or cow_file_range(), also unlocks the
first page.

This mechanism allows btrfs to continue handling the next ranges without
waiting for the time-consuming compression.

But this is a problem for the subpage case, as we could have the
following delalloc ranges for one page:

0		32K		64K
|	|///////|	|///////|
		\- A		\- B

In the above case, if we pass both ranges to cow_file_range_async(), both
range A and range B will try to unlock the full page [0, 64K).

Whichever range finishes later will then try to do further page
operations like end_page_writeback() on an already unlocked page,
triggering a VM layer BUG_ON().

Currently I don't have a perfect solution for this, only two
workarounds:

- Only allow compression for fully page-aligned ranges

  This is what I did in this patch.
  This way the compressed range can exclusively lock the first page
  (and the other pages), so it is completely safe to do whatever it
  wants.
  The problem is, we will not compress a lot of small writes.
  This is especially problematic as our target page size is 64K, not
  a small size.

- Make cow_file_range_async() behave like cow_file_range() for
  subpage

  This needs several behavior changes, all subpage specific:
  * Skip the first page of the range when finished,
    just like cow_file_range()
  * Have a way to wait for the async_cow to finish before handling the
    next delalloc range

  The biggest problem here is the performance impact.
  Although this lets us compress all sector-aligned ranges, we will
  waste time waiting for the async_cow to finish.
  This completely defeats the purpose of the "async" part.
  Not to mention there are tons of code changes needed.

Thus I chose the current way: only compress ranges which are fully
page aligned.
The cost is that we will skip a lot of small writes on 64K page size.

With this change, btrfs subpage now supports compressed writes.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 26 +++++++++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 497c219758e0..be6d00d097d1 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -489,9 +489,6 @@ static noinline int add_async_extent(struct async_chunk *cow,
  */
 static inline bool inode_can_compress(struct btrfs_inode *inode)
 {
-	/* Subpage doesn't support compress yet */
-	if (inode->root->fs_info->sectorsize < PAGE_SIZE)
-		return false;
 	if (inode->flags & BTRFS_INODE_NODATACOW ||
 	    inode->flags & BTRFS_INODE_NODATASUM)
 		return false;
@@ -513,6 +510,29 @@ static inline int inode_need_compress(struct btrfs_inode *inode, u64 start,
 			btrfs_ino(inode));
 		return 0;
 	}
+	/*
+	 * Special check for subpage.
+	 *
+	 * Because page locking is still done at full page granularity in
+	 * btrfs_run_delalloc_range(), subpage compression can't allow two
+	 * delalloc ranges that both start inside the same page, as below:
+	 *
+	 * 0		32K		64K
+	 * |	|///////|	|///////|
+	 * 		\- A		\- B
+	 *
+	 * In the above case, both range A and range B will try to unlock the
+	 * full page [0, 64K), so whichever finishes later will find the page
+	 * unlocked already, triggering various page lock requirement BUG_ON()s.
+	 *
+	 * So here we add an artificial limit that subpage compression can only
+	 * happen if the range is fully page aligned.
+	 */
+	if (fs_info->sectorsize < PAGE_SIZE) {
+		if (!IS_ALIGNED(start, PAGE_SIZE) ||
+		    !IS_ALIGNED(end + 1, PAGE_SIZE))
+			return 0;
+	}
 	/* force compress */
 	if (btrfs_test_opt(fs_info, FORCE_COMPRESS))
 		return 1;
-- 
2.32.0



* Re: [PATCH RFC 0/8] btrfs: experimental compression support for subpage
  2021-06-23  5:55 [PATCH RFC 0/8] btrfs: experimental compression support for subpage Qu Wenruo
                   ` (7 preceding siblings ...)
  2021-06-23  5:55 ` [PATCH RFC 8/8] btrfs: only allow subpage compression if the range fully covers the first page Qu Wenruo
@ 2021-06-23 21:23 ` David Sterba
  2021-06-24  1:10   ` Qu Wenruo
  2021-06-23 22:37 ` David Sterba
  9 siblings, 1 reply; 14+ messages in thread
From: David Sterba @ 2021-06-23 21:23 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Wed, Jun 23, 2021 at 01:55:21PM +0800, Qu Wenruo wrote:
> This patchset is based on my previously submitted compression refactor,
> which in turn is based on the subpage patchset.
> 
> It can be fetched from the following repo:
> https://github.com/adam900710/linux/tree/compression
> 
> It has only been lightly tested; no stress test or full fstests run yet.
> The reason for sending such an unstable patchset is to get feedback on
> how to continue the development, i.e. feedback on the last patch.
> 
> The first 7 patches are mostly cleanups to make the compression path
> more subpage compatible.
> 
> The RFC is for the last patch, which enables subpage compression in a
> pretty controversial way.
> 
> The reason is copied from that patch:
> 
> ```
> This mechanism allows btrfs to continue handling the next ranges without
> waiting for the time-consuming compression.
> 
> But this is a problem for the subpage case, as we could have the
> following delalloc ranges for one page:
> 
> 0		32K		64K
> |	|///////|	|///////|
> 		\- A		\- B
> 
> In the above case, if we pass both ranges to cow_file_range_async(), both
> range A and range B will try to unlock the full page [0, 64K).
> 
> Whichever range finishes later will then try to do further page
> operations like end_page_writeback() on an already unlocked page,
> triggering a VM layer BUG_ON().
> 
> Currently I don't have a perfect solution for this, only two
> workarounds:
> 
> - Only allow compression for fully page-aligned ranges
> 
>   This is what I did in this patch.
>   This way the compressed range can exclusively lock the first page
>   (and the other pages), so it is completely safe to do whatever it
>   wants.
>   The problem is, we will not compress a lot of small writes.
>   This is especially problematic as our target page size is 64K, not
>   a small size.
> 
> - Make cow_file_range_async() behave like cow_file_range() for
>   subpage
> 
>   This needs several behavior changes, all subpage specific:
>   * Skip the first page of the range when finished,
>     just like cow_file_range()
>   * Have a way to wait for the async_cow to finish before handling the
>     next delalloc range
> 
>   The biggest problem here is the performance impact.
>   Although this lets us compress all sector-aligned ranges, we will
>   waste time waiting for the async_cow to finish.
>   This completely defeats the purpose of the "async" part.
>   Not to mention there are tons of code changes needed.
> 
> Thus I chose the current way: only compress ranges which are fully
> page aligned.
> The cost is that we will skip a lot of small writes on 64K page size.
> ```
> 
> Any feedback on this part would be pretty helpful.

As another step progressing towards full subpage support, I think that
limiting compression to the full 64K page is acceptable. Better than
nothing, and without the intrusive changes described above.

There still might be some odd cases even with the whole page. I'm not
sure how exactly it would work with 4K sectors on 64K pages, but there
are checks inside compression anyway that bail out to uncompressed
writes if the size is not getting smaller, so in general the fallback
to uncompressed data happens already.


* Re: [PATCH RFC 0/8] btrfs: experimental compression support for subpage
  2021-06-23  5:55 [PATCH RFC 0/8] btrfs: experimental compression support for subpage Qu Wenruo
                   ` (8 preceding siblings ...)
  2021-06-23 21:23 ` [PATCH RFC 0/8] btrfs: experimental compression support for subpage David Sterba
@ 2021-06-23 22:37 ` David Sterba
  2021-06-23 22:54   ` Qu Wenruo
  9 siblings, 1 reply; 14+ messages in thread
From: David Sterba @ 2021-06-23 22:37 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Wed, Jun 23, 2021 at 01:55:21PM +0800, Qu Wenruo wrote:
> Qu Wenruo (8):
>   btrfs: don't pass compressed pages to
>     btrfs_writepage_endio_finish_ordered()
>   btrfs: make btrfs_subpage_end_and_test_writer() to handle pages not
>     locked by btrfs_page_start_writer_lock()
>   btrfs: make compress_file_range() to be subpage compatible
>   btrfs: make btrfs_submit_compressed_write() to be subpage compatible
>   btrfs: use async_chunk::async_cow to replace the confusing pending
>     pointer
>   btrfs: make end_compressed_bio_writeback() to be subpage compatible
>   btrfs: make extent_write_locked_range() to be subpage compatible
>   btrfs: only allow subpage compression if the range fully covers the
>     first page

Some of the patches seem independent and could potentially be taken into
5.14 now, but I guess the whole subpage work could go in one big batch
in 5.15, so it won't help you much anyway. The significant change is the
last patch, and so far I think it's acceptable.


* Re: [PATCH RFC 0/8] btrfs: experimental compression support for subpage
  2021-06-23 22:37 ` David Sterba
@ 2021-06-23 22:54   ` Qu Wenruo
  0 siblings, 0 replies; 14+ messages in thread
From: Qu Wenruo @ 2021-06-23 22:54 UTC (permalink / raw)
  To: dsterba, linux-btrfs



On 2021/6/24 6:37 AM, David Sterba wrote:
> On Wed, Jun 23, 2021 at 01:55:21PM +0800, Qu Wenruo wrote:
>> Qu Wenruo (8):
>>    btrfs: don't pass compressed pages to
>>      btrfs_writepage_endio_finish_ordered()
>>    btrfs: make btrfs_subpage_end_and_test_writer() to handle pages not
>>      locked by btrfs_page_start_writer_lock()
>>    btrfs: make compress_file_range() to be subpage compatible
>>    btrfs: make btrfs_submit_compressed_write() to be subpage compatible
>>    btrfs: use async_chunk::async_cow to replace the confusing pending
>>      pointer
>>    btrfs: make end_compressed_bio_writeback() to be subpage compatible
>>    btrfs: make extent_write_locked_range() to be subpage compatible
>>    btrfs: only allow subpage compression if the range fully covers the
>>      first page
> 
> Some of the patches seem independent and potentially could be taken into
> 5.14 now but I guess the whole subpage could go in one big batch to 5.15
> so it won't help you much anyway. The significant change is the last
> patch and so far I think it's acceptable.
> 
Nonononono, please don't merge any patch from the series.

The patches are not fully stress tested, and I have already seen some
ASSERT()s triggered, so there must be something wrong in the
preparation part.

Sorry, I should have added a warning like "WIP, don't merge any patch
in the series".

This is mostly for the discussion of the last patch, not really to get
any patch merged.

Thanks,
Qu



* Re: [PATCH RFC 0/8] btrfs: experimental compression support for subpage
  2021-06-23 21:23 ` [PATCH RFC 0/8] btrfs: experimental compression support for subpage David Sterba
@ 2021-06-24  1:10   ` Qu Wenruo
  0 siblings, 0 replies; 14+ messages in thread
From: Qu Wenruo @ 2021-06-24  1:10 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs



On 2021/6/24 5:23 AM, David Sterba wrote:
> On Wed, Jun 23, 2021 at 01:55:21PM +0800, Qu Wenruo wrote:
>> This patchset is based on my previously submitted compression refactor,
>> which in turn is based on the subpage patchset.
>>
>> It can be fetched from the following repo:
>> https://github.com/adam900710/linux/tree/compression
>>
>> It has only been lightly tested; no stress test or full fstests run yet.
>> The reason for sending such an unstable patchset is to get feedback on
>> how to continue the development, i.e. feedback on the last patch.
>>
>> The first 7 patches are mostly cleanups to make the compression path
>> more subpage compatible.
>>
>> The RFC is for the last patch, which enables subpage compression in a
>> pretty controversial way.
>>
>> The reason is copied from that patch:
>>
>> ```
>> This mechanism allows btrfs to continue handling the next ranges without
>> waiting for the time-consuming compression.
>>
>> But this is a problem for the subpage case, as we could have the
>> following delalloc ranges for one page:
>>
>> 0		32K		64K
>> |	|///////|	|///////|
>> 		\- A		\- B
>>
>> In the above case, if we pass both ranges to cow_file_range_async(), both
>> range A and range B will try to unlock the full page [0, 64K).
>>
>> Whichever range finishes later will then try to do further page
>> operations like end_page_writeback() on an already unlocked page,
>> triggering a VM layer BUG_ON().
>>
>> Currently I don't have a perfect solution for this, only two
>> workarounds:
>>
>> - Only allow compression for fully page-aligned ranges
>>
>>    This is what I did in this patch.
>>    This way the compressed range can exclusively lock the first page
>>    (and the other pages), so it is completely safe to do whatever it
>>    wants.
>>    The problem is, we will not compress a lot of small writes.
>>    This is especially problematic as our target page size is 64K, not
>>    a small size.
>>
>> - Make cow_file_range_async() behave like cow_file_range() for
>>    subpage
>>
>>    This needs several behavior changes, all subpage specific:
>>    * Skip the first page of the range when finished,
>>      just like cow_file_range()
>>    * Have a way to wait for the async_cow to finish before handling the
>>      next delalloc range
>>
>>    The biggest problem here is the performance impact.
>>    Although this lets us compress all sector-aligned ranges, we will
>>    waste time waiting for the async_cow to finish.
>>    This completely defeats the purpose of the "async" part.
>>    Not to mention there are tons of code changes needed.
>>
>> Thus I chose the current way: only compress ranges which are fully
>> page aligned.
>> The cost is that we will skip a lot of small writes on 64K page size.
>> ```
>>
>> Any feedback on this part would be pretty helpful.
>
> As another step progressing towards full subpage support, I think that
> limiting compression to the full 64K page is acceptable. Better than
> nothing, and without the intrusive changes described above.
>
> There still might be some odd cases even with the whole page. I'm not
> sure how exactly it would work with 4K sectors on 64K pages, but there
> are checks inside compression anyway that bail out to uncompressed
> writes if the size is not getting smaller, so in general the fallback
> to uncompressed data happens already.
>

Well well well, I just came across a shiny new and maybe even perfect
idea to make this better.

- For async cow submission
   The main problem described above can be solved by
   btrfs_subpage::writers and a change in how we run delalloc ranges.

   Currently we find and lock a delalloc range, then run the locked
   range immediately.
   This means any submitted async cow can finish before we even reach
   the next range, so increasing subpage::writers alone makes no
   difference.

   But if we find and lock *all* delalloc ranges inside a page first,
   and increase subpage::writers according to all the locked ranges, we
   can ensure the full page won't be unlocked until all delalloc ranges
   are submitted and finished.

   This would allow sector-perfect compressed writes (a rough sketch
   follows after the second point below).

   And for existing delalloc range handlers like cow_file_range(), we
   need to keep a bitmap recording which ranges were submitted
   asynchronously, then unlock the remaining ranges manually.

- The bio submission and delalloc ranges
   The above change makes me wonder: why don't we submit the bio while
   running the delalloc range?

   When running a delalloc range, we need to find and lock all the
   pages in the range anyway.
   Why can't we do the submission at the same time, since we already
   have to do the page operations?

   I guess there is some performance-related reason, but I'm not sure.
   Any feedback on this part will be appreciated.
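
A rough sketch of the first idea (all helper names below are
hypothetical, not code from this series):

```c
	/*
	 * Hypothetical subpage flow: lock every delalloc range in the
	 * page and account it in subpage::writers *before* running any
	 * of them, so the page stays locked until the last range
	 * (async or not) finishes.
	 */
	while (find_next_delalloc_range(inode, page, &start, &end)) {
		lock_delalloc_range(inode, page, start, end);
		btrfs_subpage_inc_writers(fs_info, page, start,
					  end - start + 1);
		record_range(&ranges, start, end);
	}
	for_each_recorded_range(&ranges, start, end)
		btrfs_run_delalloc_range(inode, page, start, end, ...);
```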

Despite this better solution, I'll still push the current full-page
compression, not only as a quick stop-gap, but because all these
preparation patches will benefit the later sector-perfect compression.

Thanks,
Qu



* Re: [PATCH RFC 7/8] btrfs: make extent_write_locked_range() to be subpage compatible
  2021-06-23  5:55 ` [PATCH RFC 7/8] btrfs: make extent_write_locked_range() to be subpage compatible Qu Wenruo
@ 2021-07-05 12:43   ` Qu Wenruo
  0 siblings, 0 replies; 14+ messages in thread
From: Qu Wenruo @ 2021-07-05 12:43 UTC (permalink / raw)
  To: linux-btrfs



On 2021/6/23 1:55 PM, Qu Wenruo wrote:
> Function extent_write_locked_range() gets called when an async range
> falls back to regular COW.
> 
> In that case, we need to finish the ordered extents for the range.
> 
> But the function is still using hardcoded PAGE_SIZE to calculate the
> range.
> 
> If it gets called with a range like this:
> 
>    0   16K  32K   48K 64K
>    |   |////|/////|  |
> 
> Then the range passed to btrfs_writepage_endio_finish_ordered() will be
> start == 16K, end == 80K - 1, and the page range will no longer cover it,
> triggering an ASSERT().
> 
> Fix it by properly calculating the range end, considering both the page
> end and the range end.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>   fs/btrfs/extent_io.c | 7 +++++--
>   1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index e244c10074c8..c5491720a346 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -5070,16 +5070,19 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end,
>   
>   	wbc_attach_fdatawrite_inode(&wbc_writepages, inode);
>   	while (start <= end) {
> +		u64 cur_end = min(round_down(start, PAGE_SIZE) + PAGE_SIZE - 1,

Here it should be round_down(cur, PAGE_SIZE) rather than @start.

Fixed in my github repo, as this was causing tons of weird bugs even on
x86_64.

Thank god this patch is just an RFC...

Thanks,
Qu
> +				  end);
> +
>   		page = find_get_page(mapping, start >> PAGE_SHIFT);
>   		if (clear_page_dirty_for_io(page))
>   			ret = __extent_writepage(page, &wbc_writepages, &epd);
>   		else {
>   			btrfs_writepage_endio_finish_ordered(BTRFS_I(inode),
> -					page, start, start + PAGE_SIZE - 1, 1);
> +					page, start, cur_end, 1);
>   			unlock_page(page);
>   		}
>   		put_page(page);
> -		start += PAGE_SIZE;
> +		start = cur_end + 1;
>   	}
>   
>   	ASSERT(ret <= 0);
> 


