* [PATCH 00/27] btrfs: limited subpage compressed write support
From: Qu Wenruo @ 2021-07-13  6:14 UTC (permalink / raw)
  To: linux-btrfs

The patchset can be fetched from github:
https://github.com/adam900710/linux/tree/compression

The branch is based on the previously submitted subpage enablement
patchset.
The target merge window is v5.16 or v5.17.

=== What's working ===

A delalloc range which is fully page aligned can be compressed with
64K page size and 4K sector size (AKA subpage).

With the current patchset, it passes most of the "compress" test group,
except btrfs/106, whose golden output is bound to 4K page size, thus the
test case needs to be updated.

And as a basic requirement, 4K page size systems still pass the regular
fstests runs.

=== What's not working ===
A delalloc range which is not fully page aligned will not go through compression.

That is to say, the following inode layout will go through different write paths:

	0	32K	64K	96K	128K
	|///////////////|	|///////|
			|		\- Will not be compressed
			|
			\- Will be compressed

This obviously reduces the chance of compression.
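
To illustrate, the gate added by the last patch of the series boils down
to something like the following (a simplified sketch, not the exact code):

	/* Sketch only: skip compression unless both ends are page aligned */
	if (fs_info->sectorsize < PAGE_SIZE &&
	    (!IS_ALIGNED(start, PAGE_SIZE) ||
	     !IS_ALIGNED(end + 1, PAGE_SIZE)))
		return 0;	/* take the regular uncompressed path instead */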

But all involved patches will be the basis for later sector perfect
compression support.

The limitation is mostly introduced by two factors:

- How we handle the locked page of an async cow delalloc range
  Currently we unlock the first page unconditionally.
  Even with this patchset, we still follow that behavior.

  This means we can't have two async cow ranges sharing the same
  page.
  This could be enhanced by using subpage::writers, but the next
  problem prevents us from doing so.

- No way to ensure an async cow range does not unlock the page while
  we still have delalloc ranges in the page

  This is caused by how we run delalloc ranges in a page.
  For regular sectorsize, it's not a problem as we have at most one
  sector per page.

  But for the subpage case, we can have multiple sectors in one page.
  If we submit an async cow, it may try to unlock the page while
  we are still running the next delalloc range of the page.

  The correct way here is to find and lock all delalloc ranges inside a
  page, update subpage::writers properly, then run each delalloc
  range, so that the page won't be unlocked halfway (see the rough
  sketch below).

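A rough sketch of that idea, using purely hypothetical helper names, could
look like this:

	/*
	 * Pseudo code only, the helpers below are made up for
	 * illustration.  The point is that subpage::writers is bumped
	 * for every delalloc range in the page before any of them is
	 * run, so no async cow can unlock the page halfway through.
	 */
	ranges = find_and_lock_all_delalloc_ranges(inode, page);
	for_each_range(range, ranges)
		btrfs_subpage_inc_writers(fs_info, page, range->start,
					  range->len);
	for_each_range(range, ranges)
		btrfs_run_delalloc_range(inode, page, range->start,
					 range->end);
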
=== Patch structure ===

Patch 01~04:	Small and safe cleanups
Patch 05:	Make compressed readahead to be subpage compatible
Patch 06~14:	Optimize compressed read/write path to determine stripe
		boundary on a per-bio basis
Patch 15~16:	Extra code refactor/cleanup for compressed path

Patch 17~26:	Make compressed write path to be subpage compatible
Patch 27:	Enable limited subpage compressed write support

Patches 01~16 may be good candidates for early merge, as the real heavy
lifting starts at patch 17.

Patches 01~04 are really small and safe cleanups, which can be merged
even earlier than the subpage enablement patchset.


Qu Wenruo (27):
  btrfs: remove unused parameter @nr_pages in add_ra_bio_pages()
  btrfs: remove unnecessary parameter @delalloc_start for
    writepage_delalloc()
  btrfs: use async_chunk::async_cow to replace the confusing pending
    pointer
  btrfs: don't pass compressed pages to
    btrfs_writepage_endio_finish_ordered()
  btrfs: make add_ra_bio_pages() to be subpage compatible
  btrfs: introduce compressed_bio::pending_sectors to trace compressed
    bio more elegantly
  btrfs: add subpage checked_bitmap to make PageChecked flag to be
    subpage compatible
  btrfs: handle errors properly inside btrfs_submit_compressed_read()
  btrfs: handle errors properly inside btrfs_submit_compressed_write()
  btrfs: introduce submit_compressed_bio() for compression
  btrfs: introduce alloc_compressed_bio() for compression
  btrfs: make btrfs_submit_compressed_read() to determine stripe
    boundary at bio allocation time
  btrfs: make btrfs_submit_compressed_write() to determine stripe
    boundary at bio allocation time
  btrfs: remove unused function btrfs_bio_fits_in_stripe()
  btrfs: refactor submit_compressed_extents()
  btrfs: cleanup for extent_write_locked_range()
  btrfs: make compress_file_range() to be subpage compatible
  btrfs: make btrfs_submit_compressed_write() to be subpage compatible
  btrfs: make end_compressed_bio_writeback() to be subpage compatible
  btrfs: make extent_write_locked_range() to be subpage compatible
  btrfs: extract uncompressed async extent submission code into a new
    helper
  btrfs: rework lzo_compress_pages() to make it subpage compatible
  btrfs: teach __extent_writepage() to handle locked page differently
  btrfs: allow page to be unlocked by btrfs_page_end_writer_lock() even
    if it's locked by plain page_lock()
  btrfs: allow subpage to compress a range which only covers one page
  btrfs: don't run delalloc range which is beyond the locked_page to
    prevent deadlock for subpage compression
  btrfs: only allow subpage compression if the range is fully page
    aligned

 fs/btrfs/compression.c           | 678 ++++++++++++++++++-------------
 fs/btrfs/compression.h           |   4 +-
 fs/btrfs/ctree.h                 |   2 -
 fs/btrfs/extent_io.c             | 123 ++++--
 fs/btrfs/extent_io.h             |   3 +-
 fs/btrfs/file.c                  |  20 +-
 fs/btrfs/free-space-cache.c      |   6 +-
 fs/btrfs/inode.c                 | 455 +++++++++++----------
 fs/btrfs/lzo.c                   | 280 ++++++-------
 fs/btrfs/reflink.c               |   2 +-
 fs/btrfs/subpage.c               |  85 ++++
 fs/btrfs/subpage.h               |  10 +
 fs/btrfs/tests/extent-io-tests.c |  12 +-
 13 files changed, 996 insertions(+), 684 deletions(-)

-- 
2.32.0

* [PATCH 01/27] btrfs: remove unused parameter @nr_pages in add_ra_bio_pages()
From: Qu Wenruo @ 2021-07-13  6:14 UTC (permalink / raw)
  To: linux-btrfs

The variable @nr_pages only gets increased but is never used.
Remove it.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/compression.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 4edfcc3890a5..0d95eed4282c 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -549,7 +549,6 @@ static noinline int add_ra_bio_pages(struct inode *inode,
 	u64 isize = i_size_read(inode);
 	int ret;
 	struct page *page;
-	unsigned long nr_pages = 0;
 	struct extent_map *em;
 	struct address_space *mapping = inode->i_mapping;
 	struct extent_map_tree *em_tree;
@@ -645,7 +644,6 @@ static noinline int add_ra_bio_pages(struct inode *inode,
 				   PAGE_SIZE, 0);
 
 		if (ret == PAGE_SIZE) {
-			nr_pages++;
 			put_page(page);
 		} else {
 			unlock_extent(tree, last_offset, end);
-- 
2.32.0

* [PATCH 02/27] btrfs: remove unnecessary parameter @delalloc_start for writepage_delalloc()
From: Qu Wenruo @ 2021-07-13  6:14 UTC (permalink / raw)
  To: linux-btrfs

In function __extent_writepage() we always pass the page start to
@delalloc_start for writepage_delalloc().

Thus we don't really need the @delalloc_start parameter, as we can
extract it from @page.

So this patch removes the @delalloc_start parameter and makes
__extent_writepage() declare @page_start and @page_end as const.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 248f22267665..d11facbd64c0 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3774,10 +3774,11 @@ static void update_nr_written(struct writeback_control *wbc,
  */
 static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
 		struct page *page, struct writeback_control *wbc,
-		u64 delalloc_start, unsigned long *nr_written)
+		unsigned long *nr_written)
 {
-	u64 page_end = delalloc_start + PAGE_SIZE - 1;
+	u64 page_end = page_offset(page) + PAGE_SIZE - 1;
 	bool found;
+	u64 delalloc_start = page_offset(page);
 	u64 delalloc_to_write = 0;
 	u64 delalloc_end = 0;
 	int ret;
@@ -4058,8 +4059,8 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
 			      struct extent_page_data *epd)
 {
 	struct inode *inode = page->mapping->host;
-	u64 start = page_offset(page);
-	u64 page_end = start + PAGE_SIZE - 1;
+	const u64 page_start = page_offset(page);
+	const u64 page_end = page_start + PAGE_SIZE - 1;
 	int ret;
 	int nr = 0;
 	size_t pg_offset;
@@ -4093,7 +4094,7 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
 	}
 
 	if (!epd->extent_locked) {
-		ret = writepage_delalloc(BTRFS_I(inode), page, wbc, start,
+		ret = writepage_delalloc(BTRFS_I(inode), page, wbc,
 					 &nr_written);
 		if (ret == 1)
 			return 0;
@@ -4114,7 +4115,7 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
 	}
 	if (PageError(page)) {
 		ret = ret < 0 ? ret : -EIO;
-		end_extent_writepage(page, ret, start, page_end);
+		end_extent_writepage(page, ret, page_start, page_end);
 	}
 	unlock_page(page);
 	ASSERT(ret <= 0);
-- 
2.32.0

* [PATCH 03/27] btrfs: use async_chunk::async_cow to replace the confusing pending pointer
From: Qu Wenruo @ 2021-07-13  6:14 UTC (permalink / raw)
  To: linux-btrfs

For structure async_chunk, we use a very strange member layout to grab
the structure async_cow which owns this async_chunk.

At initialization, it goes like this:

		async_chunk[i].pending = &ctx->num_chunks;

Then at async_cow_free() we do a super weird freeing:

	/*
	 * Since the pointer to 'pending' is at the beginning of the array of
	 * async_chunk's, freeing it ensures the whole array has been freed.
	 */
	if (atomic_dec_and_test(async_chunk->pending))
		kvfree(async_chunk->pending);

This is absolutely an abuse of kvfree().

Replace async_chunk::pending with async_chunk::async_cow, so that we can
grab the async_cow structure directly, without this strange dance.

And with this change, there is no requirement for any specific member
location.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index c8f4d1f8cc82..e5703160718c 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -454,11 +454,10 @@ struct async_chunk {
 	struct list_head extents;
 	struct cgroup_subsys_state *blkcg_css;
 	struct btrfs_work work;
-	atomic_t *pending;
+	struct async_cow *async_cow;
 };
 
 struct async_cow {
-	/* Number of chunks in flight; must be first in the structure */
 	atomic_t num_chunks;
 	struct async_chunk chunks[];
 };
@@ -1324,18 +1323,17 @@ static noinline void async_cow_submit(struct btrfs_work *work)
 static noinline void async_cow_free(struct btrfs_work *work)
 {
 	struct async_chunk *async_chunk;
+	struct async_cow *async_cow;
 
 	async_chunk = container_of(work, struct async_chunk, work);
 	if (async_chunk->inode)
 		btrfs_add_delayed_iput(async_chunk->inode);
 	if (async_chunk->blkcg_css)
 		css_put(async_chunk->blkcg_css);
-	/*
-	 * Since the pointer to 'pending' is at the beginning of the array of
-	 * async_chunk's, freeing it ensures the whole array has been freed.
-	 */
-	if (atomic_dec_and_test(async_chunk->pending))
-		kvfree(async_chunk->pending);
+
+	async_cow = async_chunk->async_cow;
+	if (atomic_dec_and_test(&async_cow->num_chunks))
+		kvfree(async_cow);
 }
 
 static int cow_file_range_async(struct btrfs_inode *inode,
@@ -1396,7 +1394,7 @@ static int cow_file_range_async(struct btrfs_inode *inode,
 		 * lightweight reference for the callback lifetime
 		 */
 		ihold(&inode->vfs_inode);
-		async_chunk[i].pending = &ctx->num_chunks;
+		async_chunk[i].async_cow = ctx;
 		async_chunk[i].inode = &inode->vfs_inode;
 		async_chunk[i].start = start;
 		async_chunk[i].end = cur_end;
-- 
2.32.0

* [PATCH 04/27] btrfs: don't pass compressed pages to btrfs_writepage_endio_finish_ordered()
From: Qu Wenruo @ 2021-07-13  6:14 UTC (permalink / raw)
  To: linux-btrfs

Since async_extent holds the compressed page, it would trigger the new
ASSERT() in btrfs_mark_ordered_io_finished() which checks that the range
is inside the page.

Now that btrfs_writepage_endio_finish_ordered() can accept @page == NULL,
just pass NULL to it.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e5703160718c..b524deadb5c6 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -972,15 +972,12 @@ static noinline void submit_compressed_extents(struct async_chunk *async_chunk)
 				    async_extent->nr_pages,
 				    async_chunk->write_flags,
 				    async_chunk->blkcg_css)) {
-			struct page *p = async_extent->pages[0];
 			const u64 start = async_extent->start;
 			const u64 end = start + async_extent->ram_size - 1;
 
-			p->mapping = inode->vfs_inode.i_mapping;
-			btrfs_writepage_endio_finish_ordered(inode, p, start,
+			btrfs_writepage_endio_finish_ordered(inode, NULL, start,
 							     end, 0);
 
-			p->mapping = NULL;
 			extent_clear_unlock_delalloc(inode, start, end, NULL, 0,
 						     PAGE_END_WRITEBACK |
 						     PAGE_SET_ERROR);
-- 
2.32.0

* [PATCH 05/27] btrfs: make add_ra_bio_pages() to be subpage compatible
From: Qu Wenruo @ 2021-07-13  6:14 UTC (permalink / raw)
  To: linux-btrfs

[BUG]
If we remove the subpage limitation in add_ra_bio_pages(), then read a
compressed extent which has part of its range in the next page, like the
following inode layout:

	0	32K	64K	96K	128K
	|<--------------|-------------->|

Btrfs will trigger ASSERT() in endio function:

 assertion failed: atomic_read(&subpage->readers) >= nbits
 ------------[ cut here ]------------
 kernel BUG at fs/btrfs/ctree.h:3431!
 Internal error: Oops - BUG: 0 [#1] SMP
 Workqueue: btrfs-endio btrfs_work_helper [btrfs]
 Call trace:
  assertfail.constprop.0+0x28/0x2c [btrfs]
  btrfs_subpage_end_reader+0x148/0x14c [btrfs]
  end_page_read+0x8c/0x100 [btrfs]
  end_bio_extent_readpage+0x320/0x6b0 [btrfs]
  bio_endio+0x15c/0x1dc
  end_workqueue_fn+0x44/0x64 [btrfs]
  btrfs_work_helper+0x74/0x250 [btrfs]
  process_one_work+0x1d4/0x47c
  worker_thread+0x180/0x400
  kthread+0x11c/0x120
  ret_from_fork+0x10/0x30
 ---[ end trace c8b7b552d3bb408c ]---

[CAUSE]
When we read the page range [0, 64K), we find it's a compressed extent,
and we will try to add extra pages in add_ra_bio_pages() to avoid
reading the same compressed extent again.

But when we add such a page into the read bio, it doesn't follow the
behavior of btrfs_do_readpage() to properly set subpage::readers.

This means that for page [64K, 128K), its subpage::readers is still 0.

And when endio is executed on both pages, since page [64K, 128K) has 0
subpage::readers, it triggers the above ASSERT().

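In other words, the regular read path pairs the reader accounting roughly
like this (a simplified sketch, not the exact code), while
add_ra_bio_pages() skipped the first half entirely for the pages it added:

	/* On the read submission side (btrfs_do_readpage() path) */
	if (fs_info->sectorsize < PAGE_SIZE)
		btrfs_subpage_start_reader(fs_info, page, cur, len);

	/* In end_page_read() at endio time */
	btrfs_subpage_end_reader(fs_info, page, start, len);
	/* which contains: ASSERT(atomic_read(&subpage->readers) >= nbits) */
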
[FIX]
Function add_ra_bio_pages() is far from subpage compatible: it always
assumes PAGE_SIZE == sectorsize, thus when it skips to the next range it
always just skips PAGE_SIZE.

Make it subpage compatible by:

- Skip to the next page properly when needed
  If we find there is already a page in the page cache, we need to skip
  to the next page.
  In that case, we shouldn't just skip PAGE_SIZE bytes, but use
  @pg_index to calculate the next bytenr and continue.

- Only add the page range covered by the current extent map
  We need to calculate which range is covered by the current extent map
  and only add that part into the read bio.

- Update subpage::readers before submitting the bio

- Use a proper cursor instead of the confusing @last_offset

- Calculate the missed threshold based on sector size
  It no longer uses missed pages, as for 64K page size we have at
  most 3 pages to skip (if aligned, only 2 pages).

- Add ASSERT() to make sure our bytenr is always aligned

- Add a comment for the function
  Add a special note for the subpage case, as the function won't really
  work well for subpage.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/compression.c | 89 +++++++++++++++++++++++++++---------------
 fs/btrfs/extent_io.c   |  1 +
 2 files changed, 59 insertions(+), 31 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 0d95eed4282c..bfd31f96c088 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -539,13 +539,24 @@ static u64 bio_end_offset(struct bio *bio)
 	return page_offset(last->bv_page) + last->bv_len + last->bv_offset;
 }
 
+/*
+ * Add extra pages in the same compressed file extent so that we don't
+ * need to re-read the same extent again and again.
+ *
+ * NOTE: this won't work well for subpage, as for subpage read, we lock the
+ * full page then submit bio for each compressed/regular extents.
+ *
+ * This means, if we have several sectors in the same page pointing to the same
+ * on-disk compressed data, we will re-read the same extent many times and
+ * this function can only help for the next page.
+ */
 static noinline int add_ra_bio_pages(struct inode *inode,
 				     u64 compressed_end,
 				     struct compressed_bio *cb)
 {
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	unsigned long end_index;
-	unsigned long pg_index;
-	u64 last_offset;
+	u64 cur = bio_end_offset(cb->orig_bio);
 	u64 isize = i_size_read(inode);
 	int ret;
 	struct page *page;
@@ -553,10 +564,8 @@ static noinline int add_ra_bio_pages(struct inode *inode,
 	struct address_space *mapping = inode->i_mapping;
 	struct extent_map_tree *em_tree;
 	struct extent_io_tree *tree;
-	u64 end;
-	int misses = 0;
+	int sectors_missed = 0;
 
-	last_offset = bio_end_offset(cb->orig_bio);
 	em_tree = &BTRFS_I(inode)->extent_tree;
 	tree = &BTRFS_I(inode)->io_tree;
 
@@ -575,18 +584,29 @@ static noinline int add_ra_bio_pages(struct inode *inode,
 
 	end_index = (i_size_read(inode) - 1) >> PAGE_SHIFT;
 
-	while (last_offset < compressed_end) {
-		pg_index = last_offset >> PAGE_SHIFT;
+	while (cur < compressed_end) {
+		u64 page_end;
+		u64 pg_index = cur >> PAGE_SHIFT;
+		u32 add_size;
 
 		if (pg_index > end_index)
 			break;
 
 		page = xa_load(&mapping->i_pages, pg_index);
 		if (page && !xa_is_value(page)) {
-			misses++;
-			if (misses > 4)
+			sectors_missed += (PAGE_SIZE - offset_in_page(cur)) >>
+					  fs_info->sectorsize_bits;
+
+			/* Beyond threshold, no need to continue */
+			if (sectors_missed > 4)
 				break;
-			goto next;
+
+			/*
+			 * Jump to next page start as we already have page for
+			 * current offset.
+			 */
+			cur = (pg_index << PAGE_SHIFT) + PAGE_SIZE;
+			continue;
 		}
 
 		page = __page_cache_alloc(mapping_gfp_constraint(mapping,
@@ -596,14 +616,11 @@ static noinline int add_ra_bio_pages(struct inode *inode,
 
 		if (add_to_page_cache_lru(page, mapping, pg_index, GFP_NOFS)) {
 			put_page(page);
-			goto next;
+			/* There is already a page, skip to page end */
+			cur = (pg_index << PAGE_SHIFT) + PAGE_SIZE;
+			continue;
 		}
 
-		/*
-		 * at this point, we have a locked page in the page cache
-		 * for these bytes in the file.  But, we have to make
-		 * sure they map to this compressed extent on disk.
-		 */
 		ret = set_page_extent_mapped(page);
 		if (ret < 0) {
 			unlock_page(page);
@@ -611,18 +628,22 @@ static noinline int add_ra_bio_pages(struct inode *inode,
 			break;
 		}
 
-		end = last_offset + PAGE_SIZE - 1;
-		lock_extent(tree, last_offset, end);
+		page_end = (pg_index << PAGE_SHIFT) + PAGE_SIZE - 1;
+		lock_extent(tree, cur, page_end);
 		read_lock(&em_tree->lock);
-		em = lookup_extent_mapping(em_tree, last_offset,
-					   PAGE_SIZE);
+		em = lookup_extent_mapping(em_tree, cur, page_end + 1 - cur);
 		read_unlock(&em_tree->lock);
 
-		if (!em || last_offset < em->start ||
-		    (last_offset + PAGE_SIZE > extent_map_end(em)) ||
+		/*
+		 * At this point, we have a locked page in the page cache
+		 * for these bytes in the file.  But, we have to make
+		 * sure they map to this compressed extent on disk.
+		 */
+		if (!em || cur < em->start ||
+		    (cur + fs_info->sectorsize > extent_map_end(em)) ||
 		    (em->block_start >> 9) != cb->orig_bio->bi_iter.bi_sector) {
 			free_extent_map(em);
-			unlock_extent(tree, last_offset, end);
+			unlock_extent(tree, cur, page_end);
 			unlock_page(page);
 			put_page(page);
 			break;
@@ -640,19 +661,25 @@ static noinline int add_ra_bio_pages(struct inode *inode,
 			}
 		}
 
+		add_size = min(em->start + em->len, page_end + 1) - cur;
 		ret = bio_add_page(cb->orig_bio, page,
-				   PAGE_SIZE, 0);
-
-		if (ret == PAGE_SIZE) {
-			put_page(page);
-		} else {
-			unlock_extent(tree, last_offset, end);
+				   add_size, offset_in_page(cur));
+		if (ret != add_size) {
+			unlock_extent(tree, cur, page_end);
 			unlock_page(page);
 			put_page(page);
 			break;
 		}
-next:
-		last_offset += PAGE_SIZE;
+		/*
+		 * subpage::readers number, as at endio we will
+		 * decrease subpage::readers and unlock the page.
+		 * decrease subpage::readers and to unlock the page.
+		 */
+		if (fs_info->sectorsize < PAGE_SIZE)
+			btrfs_subpage_start_reader(fs_info, page,
+				cur, add_size);
+		put_page(page);
+		cur += add_size;
 	}
 	return 0;
 }
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d11facbd64c0..5ad15a092574 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3595,6 +3595,7 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 		bool force_bio_submit = false;
 		u64 disk_bytenr;
 
+		ASSERT(IS_ALIGNED(cur, fs_info->sectorsize));
 		if (cur >= last_byte) {
 			struct extent_state *cached = NULL;
 
-- 
2.32.0

* [PATCH 06/27] btrfs: introduce compressed_bio::pending_sectors to trace compressed bio more elegantly
From: Qu Wenruo @ 2021-07-13  6:14 UTC (permalink / raw)
  To: linux-btrfs

For btrfs_submit_compressed_read() and btrfs_submit_compressed_write(),
we have a pretty weird dance around compressed_bio::pending_bios:

  btrfs_submit_compressed_read/write()
  {
	cb = kmalloc()
	refcount_set(&cb->pending_bios, 0);
	bio = btrfs_alloc_bio();

	/* NOTE here, we haven't yet submitted any bio */
	refcount_set(&cb->pending_bios, 1);

	for (pg_index = 0; pg_index < cb->nr_pages; pg_index++) {
		if (submit) {
			/* Here we submit bio, but we always have one
			 * extra pending_bios */
			refcount_inc(&cb->pending_bios);
			ret = btrfs_map_bio();
		}
	}

	/* Submit the last bio */
	ret = btrfs_map_bio();
  }

There are two reasons why we do this:

- compressed_bio::pending_bios is a refcount
  Thus if it's reduced to 0, it can not be increased again.

- To ensure the compressed_bio is not freed by some submitted bios
  If a submitted bio finished before the next bio got submitted, the
  compressed_bio could otherwise be freed prematurely.

But the above code is sometimes confusing, and we can do better by just
introducing a new member, compressed_bio::pending_sectors.

Now we use compressed_bio::pending_sectors to indicate whether we have
any pending sectors still under IO or not yet submitted.

If pending_sectors == 0, we're definitely handling the last bio of the
compressed_bio, and it is OK to release the compressed bio.

Now the workflow looks like this:

  btrfs_submit_compressed_read/write()
  {
	cb = kmalloc()
	atomic_set(&cb->pending_bios, 0);
	refcount_set(&cb->pending_sectors,
		     compressed_len >> sectorsize_bits);
	bio = btrfs_alloc_bio();

	for (pg_index = 0; pg_index < cb->nr_pages; pg_index++) {
		if (submit) {
			atomic_inc(&cb->pending_bios);
			ret = btrfs_map_bio();
		}
	}

	/* Submit the last bio */
	atomic_inc(&cb->pending_bios);
	ret = btrfs_map_bio();
  }

For now we still need pending_bios for later error handling, but it will
be removed eventually once the errors are handled properly.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/compression.c | 78 ++++++++++++++++++++++++------------------
 fs/btrfs/compression.h |  5 ++-
 2 files changed, 49 insertions(+), 34 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index bfd31f96c088..f08522e8b4cd 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -192,6 +192,39 @@ static int check_compressed_csum(struct btrfs_inode *inode, struct bio *bio,
 	return 0;
 }
 
+/*
+ * Reduce bio and io accounting for a compressed_bio with its corresponding bio.
+ *
+ * Return true if there is no pending bio nor io.
+ * Return false otherwise.
+ */
+static bool dec_and_test_compressed_bio(struct compressed_bio *cb,
+					struct bio *bio)
+{
+	struct btrfs_fs_info *fs_info = btrfs_sb(cb->inode->i_sb);
+	unsigned int bi_size = 0;
+	bool last_io = false;
+	struct bio_vec *bvec;
+	struct bvec_iter_all iter_all;
+
+	/*
+	 * At endio time, bi_iter.bi_size doesn't represent the real bio size.
+	 * Thus here we have to iterate through all segments to grab correct
+	 * bio size.
+	 */
+	bio_for_each_segment_all(bvec, bio, iter_all)
+		bi_size += bvec->bv_len;
+
+	if (bio->bi_status)
+		cb->errors = 1;
+
+	ASSERT(bi_size && bi_size <= cb->compressed_len);
+	last_io = refcount_sub_and_test(bi_size >> fs_info->sectorsize_bits,
+					&cb->pending_sectors);
+	atomic_dec(&cb->pending_bios);
+	return last_io;
+}
+
 /* when we finish reading compressed pages from the disk, we
  * decompress them and then run the bio end_io routines on the
  * decompressed pages (in the inode address space).
@@ -211,13 +244,7 @@ static void end_compressed_bio_read(struct bio *bio)
 	unsigned int mirror = btrfs_io_bio(bio)->mirror_num;
 	int ret = 0;
 
-	if (bio->bi_status)
-		cb->errors = 1;
-
-	/* if there are more bios still pending for this compressed
-	 * extent, just exit
-	 */
-	if (!refcount_dec_and_test(&cb->pending_bios))
+	if (!dec_and_test_compressed_bio(cb, bio))
 		goto out;
 
 	/*
@@ -335,13 +362,7 @@ static void end_compressed_bio_write(struct bio *bio)
 	struct page *page;
 	unsigned int index;
 
-	if (bio->bi_status)
-		cb->errors = 1;
-
-	/* if there are more bios still pending for this compressed
-	 * extent, just exit
-	 */
-	if (!refcount_dec_and_test(&cb->pending_bios))
+	if (!dec_and_test_compressed_bio(cb, bio))
 		goto out;
 
 	/* ok, we're the last bio for this extent, step one is to
@@ -407,7 +428,9 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 	cb = kmalloc(compressed_bio_size(fs_info, compressed_len), GFP_NOFS);
 	if (!cb)
 		return BLK_STS_RESOURCE;
-	refcount_set(&cb->pending_bios, 0);
+	atomic_set(&cb->pending_bios, 0);
+	refcount_set(&cb->pending_sectors,
+		     compressed_len >> fs_info->sectorsize_bits);
 	cb->errors = 0;
 	cb->inode = &inode->vfs_inode;
 	cb->start = start;
@@ -440,7 +463,6 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 		bio->bi_opf |= REQ_CGROUP_PUNT;
 		kthread_associate_blkcg(blkcg_css);
 	}
-	refcount_set(&cb->pending_bios, 1);
 
 	/* create and submit bios for the compressed pages */
 	bytes_left = compressed_len;
@@ -468,13 +490,7 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 
 		page->mapping = NULL;
 		if (submit || len < PAGE_SIZE) {
-			/*
-			 * inc the count before we submit the bio so
-			 * we know the end IO handler won't happen before
-			 * we inc the count.  Otherwise, the cb might get
-			 * freed before we're done setting it up
-			 */
-			refcount_inc(&cb->pending_bios);
+			atomic_inc(&cb->pending_bios);
 			ret = btrfs_bio_wq_end_io(fs_info, bio,
 						  BTRFS_WQ_ENDIO_DATA);
 			BUG_ON(ret); /* -ENOMEM */
@@ -512,6 +528,7 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 		cond_resched();
 	}
 
+	atomic_inc(&cb->pending_bios);
 	ret = btrfs_bio_wq_end_io(fs_info, bio, BTRFS_WQ_ENDIO_DATA);
 	BUG_ON(ret); /* -ENOMEM */
 
@@ -733,7 +750,9 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 	if (!cb)
 		goto out;
 
-	refcount_set(&cb->pending_bios, 0);
+	atomic_set(&cb->pending_bios, 0);
+	refcount_set(&cb->pending_sectors,
+		     compressed_len >> fs_info->sectorsize_bits);
 	cb->errors = 0;
 	cb->inode = inode;
 	cb->mirror_num = mirror_num;
@@ -777,7 +796,6 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 	comp_bio->bi_opf = REQ_OP_READ;
 	comp_bio->bi_private = cb;
 	comp_bio->bi_end_io = end_compressed_bio_read;
-	refcount_set(&cb->pending_bios, 1);
 
 	for (pg_index = 0; pg_index < nr_pages; pg_index++) {
 		u32 pg_len = PAGE_SIZE;
@@ -806,18 +824,11 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 		if (submit || bio_add_page(comp_bio, page, pg_len, 0) < pg_len) {
 			unsigned int nr_sectors;
 
+			atomic_inc(&cb->pending_bios);
 			ret = btrfs_bio_wq_end_io(fs_info, comp_bio,
 						  BTRFS_WQ_ENDIO_DATA);
 			BUG_ON(ret); /* -ENOMEM */
 
-			/*
-			 * inc the count before we submit the bio so
-			 * we know the end IO handler won't happen before
-			 * we inc the count.  Otherwise, the cb might get
-			 * freed before we're done setting it up
-			 */
-			refcount_inc(&cb->pending_bios);
-
 			ret = btrfs_lookup_bio_sums(inode, comp_bio, sums);
 			BUG_ON(ret); /* -ENOMEM */
 
@@ -841,6 +852,7 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 		cur_disk_byte += pg_len;
 	}
 
+	atomic_inc(&cb->pending_bios);
 	ret = btrfs_bio_wq_end_io(fs_info, comp_bio, BTRFS_WQ_ENDIO_DATA);
 	BUG_ON(ret); /* -ENOMEM */
 
diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index 399be0b435bf..61955581e34f 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -29,7 +29,10 @@ struct btrfs_inode;
 
 struct compressed_bio {
 	/* number of bios pending for this compressed extent */
-	refcount_t pending_bios;
+	atomic_t pending_bios;
+
+	/* Number of sectors with unfinished IO (unsubmitted or unfinished) */
+	refcount_t pending_sectors;
 
 	/* Number of compressed pages in the array */
 	unsigned int nr_pages;
-- 
2.32.0

* [PATCH 07/27] btrfs: add subpage checked_bitmap to make PageChecked flag to be subpage compatible
From: Qu Wenruo @ 2021-07-13  6:14 UTC (permalink / raw)
  To: linux-btrfs

Although in btrfs we have very limited users of the PageChecked flag,
it's still a page flag that is not yet subpage compatible.

Fix it by introducing btrfs_subpage::checked_bitmap to do the conversion.

Most call sites, especially the free-space cache, COW fixup and
btrfs_invalidatepage(), all work in full page mode anyway.

The other call sites work in fully subpage mode.

Some call sites need extra modification:

- btrfs_drop_pages()
  Needs extra parameters to get the real range for which we need to
  clear the checked flag.

- btrfs_invalidatepage()
  We need to call the subpage helper before calling
  __btrfs_releasepage(), or it will trigger an ASSERT() as
  page->private will already be cleared.

- btrfs_verify_data_csum()
  In theory we don't need the io_bio->csum check anymore, but it
  won't hurt.
  Just change the comment.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/compression.c      | 11 +++++++++--
 fs/btrfs/file.c             | 20 +++++++++++++++-----
 fs/btrfs/free-space-cache.c |  6 +++++-
 fs/btrfs/inode.c            | 30 ++++++++++++++----------------
 fs/btrfs/reflink.c          |  2 +-
 fs/btrfs/subpage.c          | 31 +++++++++++++++++++++++++++++++
 fs/btrfs/subpage.h          |  8 ++++++++
 7 files changed, 83 insertions(+), 25 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index f08522e8b4cd..be81e34fbd56 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -29,6 +29,7 @@
 #include "extent_io.h"
 #include "extent_map.h"
 #include "zoned.h"
+#include "subpage.h"
 
 static const char* const btrfs_compress_types[] = { "", "zlib", "lzo", "zstd" };
 
@@ -296,8 +297,14 @@ static void end_compressed_bio_read(struct bio *bio)
 		 * checked so the end_io handlers know about it
 		 */
 		ASSERT(!bio_flagged(bio, BIO_CLONED));
-		bio_for_each_segment_all(bvec, cb->orig_bio, iter_all)
-			SetPageChecked(bvec->bv_page);
+		bio_for_each_segment_all(bvec, cb->orig_bio, iter_all) {
+			u64 bvec_start = page_offset(bvec->bv_page) +
+					 bvec->bv_offset;
+
+			btrfs_page_set_checked(btrfs_sb(cb->inode->i_sb),
+					bvec->bv_page, bvec_start,
+					bvec->bv_len);
+		}
 
 		bio_endio(cb->orig_bio);
 	}
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 0831ca08376f..ccbbf2732685 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -437,9 +437,16 @@ static noinline int btrfs_copy_from_user(loff_t pos, size_t write_bytes,
 /*
  * unlocks pages after btrfs_file_write is done with them
  */
-static void btrfs_drop_pages(struct page **pages, size_t num_pages)
+static void btrfs_drop_pages(struct btrfs_fs_info *fs_info,
+			     struct page **pages, size_t num_pages,
+			     u64 pos, u64 copied)
 {
 	size_t i;
+	u64 block_start = round_down(pos, fs_info->sectorsize);
+	u64 block_len = round_up(pos + copied, fs_info->sectorsize) -
+			block_start;
+
+	ASSERT(block_len <= U32_MAX);
 	for (i = 0; i < num_pages; i++) {
 		/* page checked is some magic around finding pages that
 		 * have been modified without going through btrfs_set_page_dirty
@@ -447,7 +454,8 @@ static void btrfs_drop_pages(struct page **pages, size_t num_pages)
 		 * accessed as prepare_pages should have marked them accessed
 		 * in prepare_pages via find_or_create_page()
 		 */
-		ClearPageChecked(pages[i]);
+		btrfs_page_clamp_clear_checked(fs_info, pages[i], block_start,
+					       block_len);
 		unlock_page(pages[i]);
 		put_page(pages[i]);
 	}
@@ -504,7 +512,8 @@ int btrfs_dirty_pages(struct btrfs_inode *inode, struct page **pages,
 		struct page *p = pages[i];
 
 		btrfs_page_clamp_set_uptodate(fs_info, p, start_pos, num_bytes);
-		ClearPageChecked(p);
+		btrfs_page_clamp_clear_checked(fs_info, p, start_pos,
+					       num_bytes);
 		btrfs_page_clamp_set_dirty(fs_info, p, start_pos, num_bytes);
 	}
 
@@ -1845,7 +1854,8 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 
 		btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes);
 		if (ret) {
-			btrfs_drop_pages(pages, num_pages);
+			btrfs_drop_pages(fs_info, pages, num_pages, pos,
+					 copied);
 			break;
 		}
 
@@ -1853,7 +1863,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		if (only_release_metadata)
 			btrfs_check_nocow_unlock(BTRFS_I(inode));
 
-		btrfs_drop_pages(pages, num_pages);
+		btrfs_drop_pages(fs_info, pages, num_pages, pos, copied);
 
 		cond_resched();
 
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 2131ae5b9ed7..f0a84fe7dc80 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -22,6 +22,7 @@
 #include "delalloc-space.h"
 #include "block-group.h"
 #include "discard.h"
+#include "subpage.h"
 
 #define BITS_PER_BITMAP		(PAGE_SIZE * 8UL)
 #define MAX_CACHE_BYTES_PER_GIG	SZ_64K
@@ -417,7 +418,10 @@ static void io_ctl_drop_pages(struct btrfs_io_ctl *io_ctl)
 
 	for (i = 0; i < io_ctl->num_pages; i++) {
 		if (io_ctl->pages[i]) {
-			ClearPageChecked(io_ctl->pages[i]);
+			btrfs_page_clear_checked(io_ctl->fs_info,
+					io_ctl->pages[i],
+					page_offset(io_ctl->pages[i]),
+					PAGE_SIZE);
 			unlock_page(io_ctl->pages[i]);
 			put_page(io_ctl->pages[i]);
 		}
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b524deadb5c6..5c29d131a574 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2757,7 +2757,8 @@ static void btrfs_writepage_fixup_worker(struct btrfs_work *work)
 		clear_page_dirty_for_io(page);
 		SetPageError(page);
 	}
-	ClearPageChecked(page);
+	btrfs_page_clear_checked(inode->root->fs_info, page,
+				 page_start, PAGE_SIZE);
 	unlock_page(page);
 	put_page(page);
 	kfree(fixup);
@@ -2812,7 +2813,7 @@ int btrfs_writepage_cow_fixup(struct page *page, u64 start, u64 end)
 	 * page->mapping outside of the page lock.
 	 */
 	ihold(inode);
-	SetPageChecked(page);
+	btrfs_page_set_checked(fs_info, page, page_offset(page), PAGE_SIZE);
 	get_page(page);
 	btrfs_init_work(&fixup->work, btrfs_writepage_fixup_worker, NULL, NULL);
 	fixup->page = page;
@@ -3257,27 +3258,23 @@ unsigned int btrfs_verify_data_csum(struct btrfs_io_bio *io_bio, u32 bio_offset,
 				    struct page *page, u64 start, u64 end)
 {
 	struct inode *inode = page->mapping->host;
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	const u32 sectorsize = root->fs_info->sectorsize;
 	u32 pg_off;
 	unsigned int result = 0;
 
-	if (PageChecked(page)) {
-		ClearPageChecked(page);
+	if (btrfs_page_test_checked(fs_info, page, start, end + 1 - start)) {
+		btrfs_page_clear_checked(fs_info, page, start, end + 1 - start);
 		return 0;
 	}
 
 	/*
-	 * For subpage case, above PageChecked is not safe as it's not subpage
-	 * compatible.
-	 * But for now only cow fixup and compressed read utilize PageChecked
-	 * flag, while in this context we can easily use io_bio->csum to
-	 * determine if we really need to do csum verification.
-	 *
-	 * So for now, just exit if io_bio->csum is NULL, as it means it's
-	 * compressed read, and its compressed data csum has already been
-	 * verified.
+	 * This only happens for NODATASUM or compressed read.
+	 * Normally this should be covered by above check for compressed read
+	 * or the next check for NODATASUM.
+	 * Just do a quicker exit here.
 	 */
 	if (io_bio->csum == NULL)
 		return 0;
@@ -5083,7 +5080,8 @@ int btrfs_truncate_block(struct btrfs_inode *inode, loff_t from, loff_t len,
 				     len);
 		flush_dcache_page(page);
 	}
-	ClearPageChecked(page);
+	btrfs_page_clear_checked(fs_info, page, block_start,
+				 block_end + 1 - block_start);
 	btrfs_page_set_dirty(fs_info, page, block_start, block_end + 1 - block_start);
 	unlock_extent_cached(io_tree, block_start, block_end, &cached_state);
 
@@ -8673,9 +8671,9 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 	 * did something wrong.
 	 */
 	ASSERT(!PageOrdered(page));
+	btrfs_page_clear_checked(fs_info, page, page_offset(page), PAGE_SIZE);
 	if (!inode_evicting)
 		__btrfs_releasepage(page, GFP_NOFS);
-	ClearPageChecked(page);
 	clear_page_extent_mapped(page);
 }
 
@@ -8819,7 +8817,7 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
 		memzero_page(page, zero_start, PAGE_SIZE - zero_start);
 		flush_dcache_page(page);
 	}
-	ClearPageChecked(page);
+	btrfs_page_clear_checked(fs_info, page, page_start, PAGE_SIZE);
 	btrfs_page_set_dirty(fs_info, page, page_start, end + 1 - page_start);
 	btrfs_page_set_uptodate(fs_info, page, page_start, end + 1 - page_start);
 
diff --git a/fs/btrfs/reflink.c b/fs/btrfs/reflink.c
index 9b0814318e72..a7de8cfdcac0 100644
--- a/fs/btrfs/reflink.c
+++ b/fs/btrfs/reflink.c
@@ -138,7 +138,7 @@ static int copy_inline_to_page(struct btrfs_inode *inode,
 	}
 
 	btrfs_page_set_uptodate(fs_info, page, file_offset, block_size);
-	ClearPageChecked(page);
+	btrfs_page_clear_checked(fs_info, page, file_offset, block_size);
 	btrfs_page_set_dirty(fs_info, page, file_offset, block_size);
 out_unlock:
 	if (page) {
diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index a61aa33aeeee..b8a420cb1683 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -468,6 +468,34 @@ void btrfs_subpage_clear_ordered(const struct btrfs_fs_info *fs_info,
 		ClearPageOrdered(page);
 	spin_unlock_irqrestore(&subpage->lock, flags);
 }
+
+void btrfs_subpage_set_checked(const struct btrfs_fs_info *fs_info,
+		struct page *page, u64 start, u32 len)
+{
+	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
+	const u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len);
+	unsigned long flags;
+
+	spin_lock_irqsave(&subpage->lock, flags);
+	subpage->checked_bitmap |= tmp;
+	if (subpage->checked_bitmap == U16_MAX)
+		SetPageChecked(page);
+	spin_unlock_irqrestore(&subpage->lock, flags);
+}
+
+void btrfs_subpage_clear_checked(const struct btrfs_fs_info *fs_info,
+		struct page *page, u64 start, u32 len)
+{
+	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
+	const u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len);
+	unsigned long flags;
+
+	spin_lock_irqsave(&subpage->lock, flags);
+	subpage->checked_bitmap &= ~tmp;
+	ClearPageChecked(page);
+	spin_unlock_irqrestore(&subpage->lock, flags);
+}
+
 /*
  * Unlike set/clear which is dependent on each page status, for test all bits
  * are tested in the same way.
@@ -491,6 +519,7 @@ IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(error);
 IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(dirty);
 IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(writeback);
 IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(ordered);
+IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(checked);
 
 /*
  * Note that, in selftests (extent-io-tests), we can have empty fs_info passed
@@ -561,6 +590,8 @@ IMPLEMENT_BTRFS_PAGE_OPS(writeback, set_page_writeback, end_page_writeback,
 			 PageWriteback);
 IMPLEMENT_BTRFS_PAGE_OPS(ordered, SetPageOrdered, ClearPageOrdered,
 			 PageOrdered);
+IMPLEMENT_BTRFS_PAGE_OPS(checked, SetPageChecked, ClearPageChecked,
+			 PageChecked);
 
 void btrfs_page_assert_not_dirty(const struct btrfs_fs_info *fs_info,
 				 struct page *page)
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index 9aa40d795ba9..6fb54b22b295 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -44,6 +44,13 @@ struct btrfs_subpage {
 
 			/* Tracke pending ordered extent in this sector */
 			u16 ordered_bitmap;
+
+			/*
+			 * If the sector is already checked, thus no need to go
+			 * through csum verification.
+			 * Used by both compression and cow fixup.
+			 */
+			u16 checked_bitmap;
 		};
 	};
 };
@@ -122,6 +129,7 @@ DECLARE_BTRFS_SUBPAGE_OPS(error);
 DECLARE_BTRFS_SUBPAGE_OPS(dirty);
 DECLARE_BTRFS_SUBPAGE_OPS(writeback);
 DECLARE_BTRFS_SUBPAGE_OPS(ordered);
+DECLARE_BTRFS_SUBPAGE_OPS(checked);
 
 bool btrfs_subpage_clear_and_test_dirty(const struct btrfs_fs_info *fs_info,
 		struct page *page, u64 start, u32 len);
-- 
2.32.0

* [PATCH 08/27] btrfs: handle errors properly inside btrfs_submit_compressed_read()
From: Qu Wenruo @ 2021-07-13  6:14 UTC (permalink / raw)
  To: linux-btrfs

There are quite a few BUG_ON()s inside btrfs_submit_compressed_read();
namely, all errors inside the for() loop rely on BUG_ON() to handle
-ENOMEM.

Handle these errors properly by:

- Wait for submitted bios to finish first
  Use the wait_var_event()/wake_up_var() APIs to wait without
  introducing extra memory overhead inside compressed_bio.
  This allows us to wait for any submitted bio to finish, while still
  keeping the compressed_bio from being freed.

- Introduce finish_compressed_bio_read() to finish the compressed_bio

- Properly end the bio and finish compressed_bio when error happens

Now in btrfs_submit_compressed_read(), even when the bio submission
fails, we can properly handle the error without triggering BUG_ON().

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/compression.c | 133 +++++++++++++++++++++++++----------------
 1 file changed, 83 insertions(+), 50 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index be81e34fbd56..51f83684dbb0 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -223,9 +223,60 @@ static bool dec_and_test_compressed_bio(struct compressed_bio *cb,
 	last_io = refcount_sub_and_test(bi_size >> fs_info->sectorsize_bits,
 					&cb->pending_sectors);
 	atomic_dec(&cb->pending_bios);
+	/*
+	 * Here we must wake up the possible error handler after all other
+	 * operations on @cb finished, or we can race with
+	 * finish_compressed_bio_*() which may free @cb.
+	 */
+	wake_up_var(cb);
+
 	return last_io;
 }
 
+static void finish_compressed_bio_read(struct compressed_bio *cb,
+				       struct bio *bio)
+{
+	unsigned int index;
+	struct page *page;
+
+	/* Release the compressed pages */
+	for (index = 0; index < cb->nr_pages; index++) {
+		page = cb->compressed_pages[index];
+		page->mapping = NULL;
+		put_page(page);
+	}
+
+	/* Do io completion on the original bio */
+	if (cb->errors) {
+		bio_io_error(cb->orig_bio);
+	} else {
+		struct bio_vec *bvec;
+		struct bvec_iter_all iter_all;
+
+		ASSERT(bio);
+		ASSERT(!bio->bi_status);
+		/*
+		 * We have verified the checksum already, set page
+		 * checked so the end_io handlers know about it
+		 */
+		ASSERT(!bio_flagged(bio, BIO_CLONED));
+		bio_for_each_segment_all(bvec, cb->orig_bio, iter_all) {
+			u64 bvec_start = page_offset(bvec->bv_page) +
+					 bvec->bv_offset;
+
+			btrfs_page_set_checked(btrfs_sb(cb->inode->i_sb),
+					bvec->bv_page, bvec_start,
+					bvec->bv_len);
+		}
+
+		bio_endio(cb->orig_bio);
+	}
+
+	/* Finally free the cb struct */
+	kfree(cb->compressed_pages);
+	kfree(cb);
+}
+
 /* when we finish reading compressed pages from the disk, we
  * decompress them and then run the bio end_io routines on the
  * decompressed pages (in the inode address space).
@@ -240,8 +291,6 @@ static void end_compressed_bio_read(struct bio *bio)
 {
 	struct compressed_bio *cb = bio->bi_private;
 	struct inode *inode;
-	struct page *page;
-	unsigned int index;
 	unsigned int mirror = btrfs_io_bio(bio)->mirror_num;
 	int ret = 0;
 
@@ -276,42 +325,7 @@ static void end_compressed_bio_read(struct bio *bio)
 csum_failed:
 	if (ret)
 		cb->errors = 1;
-
-	/* release the compressed pages */
-	index = 0;
-	for (index = 0; index < cb->nr_pages; index++) {
-		page = cb->compressed_pages[index];
-		page->mapping = NULL;
-		put_page(page);
-	}
-
-	/* do io completion on the original bio */
-	if (cb->errors) {
-		bio_io_error(cb->orig_bio);
-	} else {
-		struct bio_vec *bvec;
-		struct bvec_iter_all iter_all;
-
-		/*
-		 * we have verified the checksum already, set page
-		 * checked so the end_io handlers know about it
-		 */
-		ASSERT(!bio_flagged(bio, BIO_CLONED));
-		bio_for_each_segment_all(bvec, cb->orig_bio, iter_all) {
-			u64 bvec_start = page_offset(bvec->bv_page) +
-					 bvec->bv_offset;
-
-			btrfs_page_set_checked(btrfs_sb(cb->inode->i_sb),
-					bvec->bv_page, bvec_start,
-					bvec->bv_len);
-		}
-
-		bio_endio(cb->orig_bio);
-	}
-
-	/* finally free the cb struct */
-	kfree(cb->compressed_pages);
-	kfree(cb);
+	finish_compressed_bio_read(cb, bio);
 out:
 	bio_put(bio);
 }
@@ -834,20 +848,20 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 			atomic_inc(&cb->pending_bios);
 			ret = btrfs_bio_wq_end_io(fs_info, comp_bio,
 						  BTRFS_WQ_ENDIO_DATA);
-			BUG_ON(ret); /* -ENOMEM */
+			if (ret)
+				goto finish_cb;
 
 			ret = btrfs_lookup_bio_sums(inode, comp_bio, sums);
-			BUG_ON(ret); /* -ENOMEM */
+			if (ret)
+				goto finish_cb;
 
 			nr_sectors = DIV_ROUND_UP(comp_bio->bi_iter.bi_size,
 						  fs_info->sectorsize);
 			sums += fs_info->csum_size * nr_sectors;
 
 			ret = btrfs_map_bio(fs_info, comp_bio, mirror_num);
-			if (ret) {
-				comp_bio->bi_status = ret;
-				bio_endio(comp_bio);
-			}
+			if (ret)
+				goto finish_cb;
 
 			comp_bio = btrfs_bio_alloc(cur_disk_byte);
 			comp_bio->bi_opf = REQ_OP_READ;
@@ -861,16 +875,16 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 
 	atomic_inc(&cb->pending_bios);
 	ret = btrfs_bio_wq_end_io(fs_info, comp_bio, BTRFS_WQ_ENDIO_DATA);
-	BUG_ON(ret); /* -ENOMEM */
+	if (ret)
+		goto last_bio;
 
 	ret = btrfs_lookup_bio_sums(inode, comp_bio, sums);
-	BUG_ON(ret); /* -ENOMEM */
+	if (ret)
+		goto last_bio;
 
 	ret = btrfs_map_bio(fs_info, comp_bio, mirror_num);
-	if (ret) {
-		comp_bio->bi_status = ret;
-		bio_endio(comp_bio);
-	}
+	if (ret)
+		goto last_bio;
 
 	return 0;
 
@@ -886,6 +900,25 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 out:
 	free_extent_map(em);
 	return ret;
+last_bio:
+	comp_bio->bi_status = ret;
+	/* This is the last bio, endio functions will free @cb */
+	bio_endio(comp_bio);
+	return ret;
+finish_cb:
+	if (comp_bio) {
+		comp_bio->bi_status = ret;
+		bio_endio(comp_bio);
+	}
+	wait_var_event(cb, atomic_read(&cb->pending_bios) == 0);
+	/*
+	 * Even with previous bio ended, we should still have io not yet
+	 * submitted, thus need to finish @cb manually.
+	 */
+	ASSERT(refcount_read(&cb->pending_sectors));
+	/* Now we are the only one referring @cb, can finish it safely. */
+	finish_compressed_bio_read(cb, NULL);
+	return ret;
 }
 
 /*
-- 
2.32.0

* [PATCH 09/27] btrfs: handle errors properly inside btrfs_submit_compressed_write()
From: Qu Wenruo @ 2021-07-13  6:14 UTC (permalink / raw)
  To: linux-btrfs

Just like btrfs_submit_compressed_read(), there are quite a few BUG_ON()s
inside btrfs_submit_compressed_write() for the bio submission path.

Fix them using the same method:

- For the last bio, just end the bio
  As in that case, one of the endio functions of all these submitted
  bios will be able to free the compressed_bio.

- For a half-submitted bio, wait and finish the compressed_bio manually
  In this case, as long as all the other bios finish, we're the only one
  referring to the compressed_bio, and can manually finish it.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/compression.c | 101 ++++++++++++++++++++++++++---------------
 1 file changed, 64 insertions(+), 37 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 51f83684dbb0..0ff7975ad874 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -368,50 +368,56 @@ static noinline void end_compressed_writeback(struct inode *inode,
 	/* the inode may be gone now */
 }
 
-/*
- * do the cleanup once all the compressed pages hit the disk.
- * This will clear writeback on the file pages and free the compressed
- * pages.
- *
- * This also calls the writeback end hooks for the file pages so that
- * metadata and checksums can be updated in the file.
- */
-static void end_compressed_bio_write(struct bio *bio)
+static void finish_compressed_bio_write(struct compressed_bio *cb)
 {
-	struct compressed_bio *cb = bio->bi_private;
-	struct inode *inode;
-	struct page *page;
+	struct inode *inode = cb->inode;
 	unsigned int index;
 
-	if (!dec_and_test_compressed_bio(cb, bio))
-		goto out;
-
-	/* ok, we're the last bio for this extent, step one is to
-	 * call back into the FS and do all the end_io operations
+	/*
+	 * Ok, we're the last bio for this extent, step one is to
+	 * call back into the FS and do all the end_io operations.
 	 */
-	inode = cb->inode;
-	btrfs_record_physical_zoned(inode, cb->start, bio);
 	btrfs_writepage_endio_finish_ordered(BTRFS_I(inode), NULL,
 			cb->start, cb->start + cb->len - 1,
-			bio->bi_status == BLK_STS_OK);
+			!cb->errors);
 
 	end_compressed_writeback(inode, cb);
-	/* note, our inode could be gone now */
+	/* Note, our inode could be gone now */
 
 	/*
-	 * release the compressed pages, these came from alloc_page and
+	 * Release the compressed pages, these came from alloc_page and
 	 * are not attached to the inode at all
 	 */
-	index = 0;
 	for (index = 0; index < cb->nr_pages; index++) {
-		page = cb->compressed_pages[index];
+		struct page *page = cb->compressed_pages[index];
+
 		page->mapping = NULL;
 		put_page(page);
 	}
 
-	/* finally free the cb struct */
+	/* Finally free the cb struct */
 	kfree(cb->compressed_pages);
 	kfree(cb);
+}
+
+/*
+ * do the cleanup once all the compressed pages hit the disk.
+ * This will clear writeback on the file pages and free the compressed
+ * pages.
+ *
+ * This also calls the writeback end hooks for the file pages so that
+ * metadata and checksums can be updated in the file.
+ */
+static void end_compressed_bio_write(struct bio *bio)
+{
+	struct compressed_bio *cb = bio->bi_private;
+
+	if (!dec_and_test_compressed_bio(cb, bio))
+		goto out;
+
+	btrfs_record_physical_zoned(cb->inode, cb->start, bio);
+
+	finish_compressed_bio_write(cb);
 out:
 	bio_put(bio);
 }
@@ -514,18 +520,18 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 			atomic_inc(&cb->pending_bios);
 			ret = btrfs_bio_wq_end_io(fs_info, bio,
 						  BTRFS_WQ_ENDIO_DATA);
-			BUG_ON(ret); /* -ENOMEM */
+			if (ret)
+				goto finish_cb;
 
 			if (!skip_sum) {
 				ret = btrfs_csum_one_bio(inode, bio, start, 1);
-				BUG_ON(ret); /* -ENOMEM */
+				if (ret)
+					goto finish_cb;
 			}
 
 			ret = btrfs_map_bio(fs_info, bio, 0);
-			if (ret) {
-				bio->bi_status = ret;
-				bio_endio(bio);
-			}
+			if (ret)
+				goto finish_cb;
 
 			bio = btrfs_bio_alloc(first_byte);
 			bio->bi_opf = bio_op | write_flags;
@@ -551,23 +557,44 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 
 	atomic_inc(&cb->pending_bios);
 	ret = btrfs_bio_wq_end_io(fs_info, bio, BTRFS_WQ_ENDIO_DATA);
-	BUG_ON(ret); /* -ENOMEM */
+	if (ret)
+		goto last_bio;
 
 	if (!skip_sum) {
 		ret = btrfs_csum_one_bio(inode, bio, start, 1);
-		BUG_ON(ret); /* -ENOMEM */
+		if (ret)
+			goto last_bio;
 	}
 
 	ret = btrfs_map_bio(fs_info, bio, 0);
-	if (ret) {
-		bio->bi_status = ret;
-		bio_endio(bio);
-	}
+	if (ret)
+		goto last_bio;
 
 	if (blkcg_css)
 		kthread_associate_blkcg(NULL);
 
 	return 0;
+last_bio:
+	bio->bi_status = ret;
+	/* One of the bios' endio function will free @cb. */
+	bio_endio(bio);
+	return ret;
+
+finish_cb:
+	if (bio) {
+		bio->bi_status = ret;
+		bio_endio(bio);
+	}
+
+	wait_var_event(cb, atomic_read(&cb->pending_bios) == 0);
+	/*
+	 * Even with previous bio ended, we should still have io not yet
+	 * submitted, thus need to finish manually.
+	 */
+	ASSERT(refcount_read(&cb->pending_sectors));
+	/* Now we are the only one referring @cb, can finish it safely. */
+	finish_compressed_bio_write(cb);
+	return ret;
 }
 
 static u64 bio_end_offset(struct bio *bio)
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 10/27] btrfs: introduce submit_compressed_bio() for compression
  2021-07-13  6:14 [PATCH 00/27] btrfs: limited subpage compressed write support Qu Wenruo
                   ` (8 preceding siblings ...)
  2021-07-13  6:14 ` [PATCH 09/27] btrfs: handle errors properly inside btrfs_submit_compressed_write() Qu Wenruo
@ 2021-07-13  6:14 ` Qu Wenruo
  2021-07-13  6:15 ` [PATCH 11/27] btrfs: introduce alloc_compressed_bio() " Qu Wenruo
                   ` (17 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Qu Wenruo @ 2021-07-13  6:14 UTC (permalink / raw)
  To: linux-btrfs

The new helper, submit_compressed_bio(), will aggregate the following
work:

- Increase compressed_bio::pending_bios
- Remap the endio function
- Map and submit the bio

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/compression.c | 45 ++++++++++++++++++------------------------
 1 file changed, 19 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 0ff7975ad874..5db5b7450d95 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -422,6 +422,21 @@ static void end_compressed_bio_write(struct bio *bio)
 	bio_put(bio);
 }
 
+static blk_status_t submit_compressed_bio(struct btrfs_fs_info *fs_info,
+					  struct compressed_bio *cb,
+					  struct bio *bio, int mirror_num)
+{
+	blk_status_t ret;
+
+	ASSERT(bio->bi_iter.bi_size);
+	atomic_inc(&cb->pending_bios);
+	ret = btrfs_bio_wq_end_io(fs_info, bio, BTRFS_WQ_ENDIO_DATA);
+	if (ret)
+		return ret;
+	ret = btrfs_map_bio(fs_info, bio, mirror_num);
+	return ret;
+}
+
 /*
  * worker function to build and submit bios for previously compressed pages.
  * The corresponding pages in the inode should be marked for writeback
@@ -517,19 +532,13 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 
 		page->mapping = NULL;
 		if (submit || len < PAGE_SIZE) {
-			atomic_inc(&cb->pending_bios);
-			ret = btrfs_bio_wq_end_io(fs_info, bio,
-						  BTRFS_WQ_ENDIO_DATA);
-			if (ret)
-				goto finish_cb;
-
 			if (!skip_sum) {
 				ret = btrfs_csum_one_bio(inode, bio, start, 1);
 				if (ret)
 					goto finish_cb;
 			}
 
-			ret = btrfs_map_bio(fs_info, bio, 0);
+			ret = submit_compressed_bio(fs_info, cb, bio, 0);
 			if (ret)
 				goto finish_cb;
 
@@ -555,18 +564,13 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 		cond_resched();
 	}
 
-	atomic_inc(&cb->pending_bios);
-	ret = btrfs_bio_wq_end_io(fs_info, bio, BTRFS_WQ_ENDIO_DATA);
-	if (ret)
-		goto last_bio;
-
 	if (!skip_sum) {
 		ret = btrfs_csum_one_bio(inode, bio, start, 1);
 		if (ret)
 			goto last_bio;
 	}
 
-	ret = btrfs_map_bio(fs_info, bio, 0);
+	ret = submit_compressed_bio(fs_info, cb, bio, 0);
 	if (ret)
 		goto last_bio;
 
@@ -872,12 +876,6 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 		if (submit || bio_add_page(comp_bio, page, pg_len, 0) < pg_len) {
 			unsigned int nr_sectors;
 
-			atomic_inc(&cb->pending_bios);
-			ret = btrfs_bio_wq_end_io(fs_info, comp_bio,
-						  BTRFS_WQ_ENDIO_DATA);
-			if (ret)
-				goto finish_cb;
-
 			ret = btrfs_lookup_bio_sums(inode, comp_bio, sums);
 			if (ret)
 				goto finish_cb;
@@ -886,7 +884,7 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 						  fs_info->sectorsize);
 			sums += fs_info->csum_size * nr_sectors;
 
-			ret = btrfs_map_bio(fs_info, comp_bio, mirror_num);
+			ret = submit_compressed_bio(fs_info, cb, comp_bio, mirror_num);
 			if (ret)
 				goto finish_cb;
 
@@ -900,16 +898,11 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 		cur_disk_byte += pg_len;
 	}
 
-	atomic_inc(&cb->pending_bios);
-	ret = btrfs_bio_wq_end_io(fs_info, comp_bio, BTRFS_WQ_ENDIO_DATA);
-	if (ret)
-		goto last_bio;
-
 	ret = btrfs_lookup_bio_sums(inode, comp_bio, sums);
 	if (ret)
 		goto last_bio;
 
-	ret = btrfs_map_bio(fs_info, comp_bio, mirror_num);
+	ret = submit_compressed_bio(fs_info, cb, comp_bio, mirror_num);
 	if (ret)
 		goto last_bio;
 
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 11/27] btrfs: introduce alloc_compressed_bio() for compression
  2021-07-13  6:14 [PATCH 00/27] btrfs: limited subpage compressed write support Qu Wenruo
                   ` (9 preceding siblings ...)
  2021-07-13  6:14 ` [PATCH 10/27] btrfs: introduce submit_compressed_bio() for compression Qu Wenruo
@ 2021-07-13  6:15 ` Qu Wenruo
  2021-07-13  6:15 ` [PATCH 12/27] btrfs: make btrfs_submit_compressed_read() to determine stripe boundary at bio allocation time Qu Wenruo
                   ` (16 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Qu Wenruo @ 2021-07-13  6:15 UTC (permalink / raw)
  To: linux-btrfs

Just aggregate the bio allocation code into one helper, so that we can
replace four call sites.

There is one special note for zoned writes.

Currently btrfs_submit_compressed_write() only allocates the first bio
using ZONE_APPEND.
If we have to submit the current bio due to a stripe boundary, the newly
allocated bio will not use ZONE_APPEND.

In theory this is a bug, but since zoned mode currently only supports the
SINGLE profile, which has no stripe boundary limit, it should never be a
problem in practice.

This function will also provide a good entry point for any work that
needs to be done at bio allocation time, like determining the stripe
boundary.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/compression.c | 85 ++++++++++++++++++++++++++++--------------
 1 file changed, 57 insertions(+), 28 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 5db5b7450d95..b704290dd50d 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -437,6 +437,35 @@ static blk_status_t submit_compressed_bio(struct btrfs_fs_info *fs_info,
 	return ret;
 }
 
+/*
+ * To allocate a compressed_bio, which will be used to read/write on-disk data.
+ */
+static struct bio *alloc_compressed_bio(struct compressed_bio *cb, u64 disk_bytenr,
+					unsigned int opf, bio_end_io_t endio_func)
+{
+	struct bio *bio;
+
+	bio = btrfs_bio_alloc(disk_bytenr);
+
+	bio->bi_opf = opf;
+	bio->bi_private = cb;
+	bio->bi_end_io = endio_func;
+
+	if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
+		struct btrfs_fs_info *fs_info = btrfs_sb(cb->inode->i_sb);
+		struct btrfs_device *device;
+
+		device = btrfs_zoned_get_device(fs_info, disk_bytenr,
+						fs_info->sectorsize);
+		if (IS_ERR(device)) {
+			bio_put(bio);
+			return ERR_CAST(device);
+		}
+		bio_set_dev(bio, device->bdev);
+	}
+	return bio;
+}
+
 /*
  * worker function to build and submit bios for previously compressed pages.
  * The corresponding pages in the inode should be marked for writeback
@@ -483,22 +512,11 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 	cb->orig_bio = NULL;
 	cb->nr_pages = nr_pages;
 
-	bio = btrfs_bio_alloc(first_byte);
-	bio->bi_opf = bio_op | write_flags;
-	bio->bi_private = cb;
-	bio->bi_end_io = end_compressed_bio_write;
-
-	if (use_append) {
-		struct btrfs_device *device;
-
-		device = btrfs_zoned_get_device(fs_info, disk_start, PAGE_SIZE);
-		if (IS_ERR(device)) {
-			kfree(cb);
-			bio_put(bio);
-			return BLK_STS_NOTSUPP;
-		}
-
-		bio_set_dev(bio, device->bdev);
+	bio = alloc_compressed_bio(cb, first_byte, bio_op | write_flags,
+				   end_compressed_bio_write);
+	if (IS_ERR(bio)) {
+		kfree(cb);
+		return errno_to_blk_status(PTR_ERR(bio));
 	}
 
 	if (blkcg_css) {
@@ -542,10 +560,14 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 			if (ret)
 				goto finish_cb;
 
-			bio = btrfs_bio_alloc(first_byte);
-			bio->bi_opf = bio_op | write_flags;
-			bio->bi_private = cb;
-			bio->bi_end_io = end_compressed_bio_write;
+			bio = alloc_compressed_bio(cb, first_byte,
+					bio_op | write_flags,
+					end_compressed_bio_write);
+			if (IS_ERR(bio)) {
+				ret = errno_to_blk_status(PTR_ERR(bio));
+				bio = NULL;
+				goto finish_cb;
+			}
 			if (blkcg_css)
 				bio->bi_opf |= REQ_CGROUP_PUNT;
 			/*
@@ -844,10 +866,13 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 	/* include any pages we added in add_ra-bio_pages */
 	cb->len = bio->bi_iter.bi_size;
 
-	comp_bio = btrfs_bio_alloc(cur_disk_byte);
-	comp_bio->bi_opf = REQ_OP_READ;
-	comp_bio->bi_private = cb;
-	comp_bio->bi_end_io = end_compressed_bio_read;
+	comp_bio = alloc_compressed_bio(cb, cur_disk_byte, REQ_OP_READ,
+					end_compressed_bio_read);
+	if (IS_ERR(comp_bio)) {
+		ret = errno_to_blk_status(PTR_ERR(comp_bio));
+		comp_bio = NULL;
+		goto fail2;
+	}
 
 	for (pg_index = 0; pg_index < nr_pages; pg_index++) {
 		u32 pg_len = PAGE_SIZE;
@@ -888,10 +913,14 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 			if (ret)
 				goto finish_cb;
 
-			comp_bio = btrfs_bio_alloc(cur_disk_byte);
-			comp_bio->bi_opf = REQ_OP_READ;
-			comp_bio->bi_private = cb;
-			comp_bio->bi_end_io = end_compressed_bio_read;
+			comp_bio = alloc_compressed_bio(cb, cur_disk_byte,
+					REQ_OP_READ,
+					end_compressed_bio_read);
+			if (IS_ERR(comp_bio)) {
+				ret = errno_to_blk_status(PTR_ERR(comp_bio));
+				comp_bio = NULL;
+				goto finish_cb;
+			}
 
 			bio_add_page(comp_bio, page, pg_len, 0);
 		}
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 12/27] btrfs: make btrfs_submit_compressed_read() to determine stripe boundary at bio allocation time
  2021-07-13  6:14 [PATCH 00/27] btrfs: limited subpage compressed write support Qu Wenruo
                   ` (10 preceding siblings ...)
  2021-07-13  6:15 ` [PATCH 11/27] btrfs: introduce alloc_compressed_bio() " Qu Wenruo
@ 2021-07-13  6:15 ` Qu Wenruo
  2021-07-13  6:15 ` [PATCH 13/27] btrfs: make btrfs_submit_compressed_write() " Qu Wenruo
                   ` (15 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Qu Wenruo @ 2021-07-13  6:15 UTC (permalink / raw)
  To: linux-btrfs

Currently btrfs_submit_compressed_read() will check
btrfs_bio_fits_in_stripe() each time a new page is going to be added.

Even though compressed extents are small, we don't really need to do that
for every page.

This patch aligns the behavior with extent_io.c by determining the stripe
boundary when allocating a bio.

Unlike extent_io.c, in compression.c we don't need to bother with things
like different bio flags, thus there is no need to reuse bio_ctrl.

Here we just introduce a new local variable, next_stripe_start, and teach
alloc_compressed_bio() to calculate the stripe boundary.

Then each time we add a page range to the bio, we check whether we have
reached the boundary, and if so, submit the bio.

Also, since we can use @cur_disk_byte to determine whether we're at the
last bio, we no longer need an explicit last_bio: label for error
handling.

And since @cur_disk_byte tracks which range has already been added to a
bio, we can also use it to calculate the wait condition, removing the
need for @pending_bios.
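
To make the three size limits concrete, here is a small userspace sketch
(not the kernel code; all numbers, including the 8K stripe length, are
made up) of how a 12K compressed extent crossing one stripe boundary
would be split.  In this toy scenario each printed range corresponds to
one submitted bio:

#include <stdio.h>
#include <stdint.h>

#define PAGE_SIZE	65536ULL	/* 64K page, subpage case */

static uint64_t min3(uint64_t a, uint64_t b, uint64_t c)
{
	uint64_t m = a < b ? a : b;
	return m < c ? m : c;
}

int main(void)
{
	const uint64_t disk_bytenr = 0;			/* extent start on disk */
	const uint64_t compressed_len = 12 * 1024;	/* 12K of compressed data */
	const uint64_t stripe_len = 8 * 1024;		/* pretend stripes every 8K */
	uint64_t cur = disk_bytenr;

	while (cur < disk_bytenr + compressed_len) {
		uint64_t offset = cur - disk_bytenr;
		/* boundary of the stripe containing @cur */
		uint64_t next_stripe_start = (cur / stripe_len + 1) * stripe_len;
		/* clamp by stripe boundary, page boundary and compressed length */
		uint64_t real_size = min3(next_stripe_start - cur,
					  PAGE_SIZE - offset % PAGE_SIZE,
					  compressed_len - offset);

		printf("bio covers [%llu, %llu)\n",
		       (unsigned long long)cur,
		       (unsigned long long)(cur + real_size));
		cur += real_size;
	}
	return 0;	/* prints [0, 8192) and [8192, 12288) */
}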

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/compression.c | 164 +++++++++++++++++++++++------------------
 1 file changed, 93 insertions(+), 71 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index b704290dd50d..30e10d8aeca0 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -439,11 +439,18 @@ static blk_status_t submit_compressed_bio(struct btrfs_fs_info *fs_info,
 
 /*
  * To allocate a compressed_bio, which will be used to read/write on-disk data.
+ *
+ * @next_stripe_start:	Disk bytenr of next stripe start
  */
 static struct bio *alloc_compressed_bio(struct compressed_bio *cb, u64 disk_bytenr,
-					unsigned int opf, bio_end_io_t endio_func)
+					unsigned int opf, bio_end_io_t endio_func,
+					u64 *next_stripe_start)
 {
+	struct btrfs_fs_info *fs_info = btrfs_sb(cb->inode->i_sb);
+	struct btrfs_io_geometry geom;
+	struct extent_map *em;
 	struct bio *bio;
+	int ret;
 
 	bio = btrfs_bio_alloc(disk_bytenr);
 
@@ -451,18 +458,24 @@ static struct bio *alloc_compressed_bio(struct compressed_bio *cb, u64 disk_byte
 	bio->bi_private = cb;
 	bio->bi_end_io = endio_func;
 
-	if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
-		struct btrfs_fs_info *fs_info = btrfs_sb(cb->inode->i_sb);
-		struct btrfs_device *device;
+	em = btrfs_get_chunk_map(fs_info, disk_bytenr, fs_info->sectorsize);
+	if (IS_ERR(em)) {
+		bio_put(bio);
+		return ERR_CAST(em);
+	}
 
-		device = btrfs_zoned_get_device(fs_info, disk_bytenr,
-						fs_info->sectorsize);
-		if (IS_ERR(device)) {
-			bio_put(bio);
-			return ERR_CAST(device);
-		}
-		bio_set_dev(bio, device->bdev);
+	if (bio_op(bio) == REQ_OP_ZONE_APPEND)
+		bio_set_dev(bio, em->map_lookup->stripes[0].dev->bdev);
+
+	ret = btrfs_get_io_geometry(fs_info, em, btrfs_op(bio), disk_bytenr,
+				    &geom);
+	free_extent_map(em);
+	if (ret < 0) {
+		bio_put(bio);
+		return ERR_PTR(ret);
 	}
+	*next_stripe_start = disk_bytenr + geom.len;
+
 	return bio;
 }
 
@@ -490,6 +503,7 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 	int pg_index = 0;
 	struct page *page;
 	u64 first_byte = disk_start;
+	u64 next_stripe_start;
 	blk_status_t ret;
 	int skip_sum = inode->flags & BTRFS_INODE_NODATASUM;
 	const bool use_append = btrfs_use_zone_append(inode, disk_start);
@@ -513,7 +527,8 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 	cb->nr_pages = nr_pages;
 
 	bio = alloc_compressed_bio(cb, first_byte, bio_op | write_flags,
-				   end_compressed_bio_write);
+				   end_compressed_bio_write,
+				   &next_stripe_start);
 	if (IS_ERR(bio)) {
 		kfree(cb);
 		return errno_to_blk_status(PTR_ERR(bio));
@@ -562,7 +577,8 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 
 			bio = alloc_compressed_bio(cb, first_byte,
 					bio_op | write_flags,
-					end_compressed_bio_write);
+					end_compressed_bio_write,
+					&next_stripe_start);
 			if (IS_ERR(bio)) {
 				ret = errno_to_blk_status(PTR_ERR(bio));
 				bio = NULL;
@@ -795,9 +811,10 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 	unsigned int compressed_len;
 	unsigned int nr_pages;
 	unsigned int pg_index;
-	struct page *page;
-	struct bio *comp_bio;
-	u64 cur_disk_byte = bio->bi_iter.bi_sector << 9;
+	struct bio *comp_bio = NULL;
+	const u64 disk_bytenr = bio->bi_iter.bi_sector << SECTOR_SHIFT;
+	u64 cur_disk_byte = disk_bytenr;
+	u64 next_stripe_start;
 	u64 file_offset;
 	u64 em_len;
 	u64 em_start;
@@ -866,39 +883,62 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 	/* include any pages we added in add_ra-bio_pages */
 	cb->len = bio->bi_iter.bi_size;
 
-	comp_bio = alloc_compressed_bio(cb, cur_disk_byte, REQ_OP_READ,
-					end_compressed_bio_read);
-	if (IS_ERR(comp_bio)) {
-		ret = errno_to_blk_status(PTR_ERR(comp_bio));
-		comp_bio = NULL;
-		goto fail2;
-	}
-
-	for (pg_index = 0; pg_index < nr_pages; pg_index++) {
-		u32 pg_len = PAGE_SIZE;
-		int submit = 0;
+	while (cur_disk_byte < disk_bytenr + compressed_len) {
+		u64 offset = cur_disk_byte - disk_bytenr;
+		unsigned int index = offset >> PAGE_SHIFT;
+		unsigned int real_size;
+		unsigned int added;
+		struct page *page = cb->compressed_pages[index];
+		bool submit = false;
 
+		/* Allocate new bio if submitted or not yet allocated */
+		if (!comp_bio) {
+			comp_bio = alloc_compressed_bio(cb, cur_disk_byte,
+					REQ_OP_READ, end_compressed_bio_read,
+					&next_stripe_start);
+			if (IS_ERR(comp_bio)) {
+				ret = errno_to_blk_status(PTR_ERR(comp_bio));
+				comp_bio = NULL;
+				goto finish_cb;
+			}
+		}
 		/*
-		 * To handle subpage case, we need to make sure the bio only
-		 * covers the range we need.
-		 *
-		 * If we're at the last page, truncate the length to only cover
-		 * the remaining part.
+		 * We should never reach next_stripe_start start as we will
+		 * submit comp_bio when reach the boundary immediately.
+		 */
+		ASSERT(cur_disk_byte != next_stripe_start);
+		/*
+		 * We have various limit on the real read size:
+		 * - stripe boundary
+		 * - page boundary
+		 * - compressed length boundary
+		 */
+		real_size = min_t(u64, U32_MAX,
+				  next_stripe_start - cur_disk_byte);
+		real_size = min_t(u64, real_size,
+				  PAGE_SIZE - offset_in_page(offset));
+		real_size = min_t(u64, real_size,
+				  compressed_len - offset);
+		ASSERT(IS_ALIGNED(real_size, fs_info->sectorsize));
+
+		added = bio_add_page(comp_bio, page, real_size,
+				     offset_in_page(offset));
+		/*
+		 * Maximum compressed extent is smaller than bio size limit,
+		 * thus bio_add_page() should always success.
 		 */
-		if (pg_index == nr_pages - 1)
-			pg_len = min_t(u32, PAGE_SIZE,
-					compressed_len - pg_index * PAGE_SIZE);
+		ASSERT(added == real_size);
+		cur_disk_byte += added;
 
-		page = cb->compressed_pages[pg_index];
-		page->mapping = inode->i_mapping;
-		page->index = em_start >> PAGE_SHIFT;
+		/* Reached stripe boundary, need to submit */
+		if (cur_disk_byte == next_stripe_start)
+			submit = true;
 
-		if (comp_bio->bi_iter.bi_size)
-			submit = btrfs_bio_fits_in_stripe(page, pg_len,
-							  comp_bio, 0);
+		/* Has finished the range, need to submit */
+		if (cur_disk_byte == disk_bytenr + compressed_len)
+			submit = true;
 
-		page->mapping = NULL;
-		if (submit || bio_add_page(comp_bio, page, pg_len, 0) < pg_len) {
+		if (submit) {
 			unsigned int nr_sectors;
 
 			ret = btrfs_lookup_bio_sums(inode, comp_bio, sums);
@@ -909,32 +949,13 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 						  fs_info->sectorsize);
 			sums += fs_info->csum_size * nr_sectors;
 
-			ret = submit_compressed_bio(fs_info, cb, comp_bio, mirror_num);
+			ret = submit_compressed_bio(fs_info, cb, comp_bio,
+						    mirror_num);
 			if (ret)
 				goto finish_cb;
-
-			comp_bio = alloc_compressed_bio(cb, cur_disk_byte,
-					REQ_OP_READ,
-					end_compressed_bio_read);
-			if (IS_ERR(comp_bio)) {
-				ret = errno_to_blk_status(PTR_ERR(comp_bio));
-				comp_bio = NULL;
-				goto finish_cb;
-			}
-
-			bio_add_page(comp_bio, page, pg_len, 0);
+			comp_bio = NULL;
 		}
-		cur_disk_byte += pg_len;
 	}
-
-	ret = btrfs_lookup_bio_sums(inode, comp_bio, sums);
-	if (ret)
-		goto last_bio;
-
-	ret = submit_compressed_bio(fs_info, cb, comp_bio, mirror_num);
-	if (ret)
-		goto last_bio;
-
 	return 0;
 
 fail2:
@@ -949,17 +970,18 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 out:
 	free_extent_map(em);
 	return ret;
-last_bio:
-	comp_bio->bi_status = ret;
-	/* This is the last bio, endio functions will free @cb */
-	bio_endio(comp_bio);
-	return ret;
 finish_cb:
 	if (comp_bio) {
 		comp_bio->bi_status = ret;
 		bio_endio(comp_bio);
 	}
-	wait_var_event(cb, atomic_read(&cb->pending_bios) == 0);
+	/* All bytes of @cb is submitted, endio will free @cb */
+	if (cur_disk_byte == disk_bytenr + compressed_len)
+		return ret;
+
+	wait_var_event(cb, refcount_read(&cb->pending_sectors) ==
+			   (disk_bytenr + compressed_len - cur_disk_byte) >>
+			   fs_info->sectorsize_bits);
 	/*
 	 * Even with previous bio ended, we should still have io not yet
 	 * submitted, thus need to finish @cb manually.
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 13/27] btrfs: make btrfs_submit_compressed_write() to determine stripe boundary at bio allocation time
  2021-07-13  6:14 [PATCH 00/27] btrfs: limited subpage compressed write support Qu Wenruo
                   ` (11 preceding siblings ...)
  2021-07-13  6:15 ` [PATCH 12/27] btrfs: make btrfs_submit_compressed_read() to determine stripe boundary at bio allocation time Qu Wenruo
@ 2021-07-13  6:15 ` Qu Wenruo
  2021-07-13  6:15 ` [PATCH 14/27] btrfs: remove unused function btrfs_bio_fits_in_stripe() Qu Wenruo
                   ` (14 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Qu Wenruo @ 2021-07-13  6:15 UTC (permalink / raw)
  To: linux-btrfs

Currently btrfs_submit_compressed_write() will check
btrfs_bio_fits_in_stripe() each time a new page is going to be added.

Even though compressed extents are small, we don't really need to do that
for every page.

This patch aligns the behavior with extent_io.c by determining the stripe
boundary when allocating a bio.

Unlike extent_io.c, in compression.c we don't need to bother with things
like different bio flags, thus there is no need to reuse bio_ctrl.

Here we just introduce a new local variable, next_stripe_start, and use
the value returned from alloc_compressed_bio() to calculate the stripe
boundary.

Then each time we add a page range to the bio, we check whether we have
reached the boundary, and if so, submit the bio.

Also, since we can use @cur_disk_bytenr to determine whether we're at the
last bio, we no longer need an explicit last_bio: label for error
handling.

And since we use @cur_disk_bytenr for the wait condition, pending_bios is
no longer needed; remove it to save some memory in compressed_bio.
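
As a rough illustration of that wait condition, here is a userspace
sketch with made-up numbers (not the kernel code): once every bio that
was actually submitted has run its endio, pending_sectors equals exactly
the number of sectors that never made it into a bio, which is what the
error path waits for before finishing @cb manually.

#include <stdio.h>

int main(void)
{
	const unsigned long long sectorsize = 4096;
	const unsigned long long disk_start = 0;
	const unsigned long long compressed_len = 16 * 1024;	/* 16K extent */
	const unsigned long long cur_disk_bytenr = 8 * 1024;	/* 8K made it into bios */

	/* pending_sectors starts at the full extent size, in sectors */
	unsigned long long pending_sectors = compressed_len / sectorsize;	/* 4 */

	/* endio of the submitted bios subtracts the sectors they carried */
	pending_sectors -= cur_disk_bytenr / sectorsize;			/* 2 */

	printf("wait until pending_sectors == %llu (currently %llu)\n",
	       (disk_start + compressed_len - cur_disk_bytenr) / sectorsize,
	       pending_sectors);
	return 0;
}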

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/compression.c | 145 ++++++++++++++++++-----------------------
 fs/btrfs/compression.h |   3 -
 2 files changed, 62 insertions(+), 86 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 30e10d8aeca0..70a4679799be 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -222,7 +222,6 @@ static bool dec_and_test_compressed_bio(struct compressed_bio *cb,
 	ASSERT(bi_size && bi_size <= cb->compressed_len);
 	last_io = refcount_sub_and_test(bi_size >> fs_info->sectorsize_bits,
 					&cb->pending_sectors);
-	atomic_dec(&cb->pending_bios);
 	/*
 	 * Here we must wake up the possible error handler after all other
 	 * operations on @cb finished, or we can race with
@@ -429,7 +428,6 @@ static blk_status_t submit_compressed_bio(struct btrfs_fs_info *fs_info,
 	blk_status_t ret;
 
 	ASSERT(bio->bi_iter.bi_size);
-	atomic_inc(&cb->pending_bios);
 	ret = btrfs_bio_wq_end_io(fs_info, bio, BTRFS_WQ_ENDIO_DATA);
 	if (ret)
 		return ret;
@@ -499,10 +497,7 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 	struct bio *bio = NULL;
 	struct compressed_bio *cb;
-	unsigned long bytes_left;
-	int pg_index = 0;
-	struct page *page;
-	u64 first_byte = disk_start;
+	u64 cur_disk_bytenr = disk_start;
 	u64 next_stripe_start;
 	blk_status_t ret;
 	int skip_sum = inode->flags & BTRFS_INODE_NODATASUM;
@@ -513,7 +508,6 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 	cb = kmalloc(compressed_bio_size(fs_info, compressed_len), GFP_NOFS);
 	if (!cb)
 		return BLK_STS_RESOURCE;
-	atomic_set(&cb->pending_bios, 0);
 	refcount_set(&cb->pending_sectors,
 		     compressed_len >> fs_info->sectorsize_bits);
 	cb->errors = 0;
@@ -526,45 +520,65 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 	cb->orig_bio = NULL;
 	cb->nr_pages = nr_pages;
 
-	bio = alloc_compressed_bio(cb, first_byte, bio_op | write_flags,
-				   end_compressed_bio_write,
-				   &next_stripe_start);
-	if (IS_ERR(bio)) {
-		kfree(cb);
-		return errno_to_blk_status(PTR_ERR(bio));
-	}
-
-	if (blkcg_css) {
-		bio->bi_opf |= REQ_CGROUP_PUNT;
-		kthread_associate_blkcg(blkcg_css);
-	}
-
-	/* create and submit bios for the compressed pages */
-	bytes_left = compressed_len;
-	for (pg_index = 0; pg_index < cb->nr_pages; pg_index++) {
-		int submit = 0;
-		int len = 0;
+	while (cur_disk_bytenr < disk_start + compressed_len) {
+		u64 offset = cur_disk_bytenr - disk_start;
+		unsigned int index = offset >> PAGE_SHIFT;
+		unsigned int real_size;
+		unsigned int added;
+		struct page *page = compressed_pages[index];
+		bool submit = false;
 
-		page = compressed_pages[pg_index];
-		page->mapping = inode->vfs_inode.i_mapping;
-		if (bio->bi_iter.bi_size)
-			submit = btrfs_bio_fits_in_stripe(page, PAGE_SIZE, bio,
-							  0);
+		/* Allocate new bio if submitted or not yet allocated */
+		if (!bio) {
+			bio = alloc_compressed_bio(cb, cur_disk_bytenr,
+				bio_op | write_flags, end_compressed_bio_write,
+				&next_stripe_start);
+			if (IS_ERR(bio)) {
+				ret = errno_to_blk_status(PTR_ERR(bio));
+				bio = NULL;
+				goto finish_cb;
+			}
+		}
+		/*
+		 * We should never reach next_stripe_start start as we will
+		 * submit comp_bio when reach the boundary immediately.
+		 */
+		ASSERT(cur_disk_bytenr != next_stripe_start);
 
 		/*
-		 * Page can only be added to bio if the current bio fits in
-		 * stripe.
+		 * We have various limit on the real read size:
+		 * - stripe boundary
+		 * - page boundary
+		 * - compressed length boundary
 		 */
-		if (!submit) {
-			if (pg_index == 0 && use_append)
-				len = bio_add_zone_append_page(bio, page,
-							       PAGE_SIZE, 0);
-			else
-				len = bio_add_page(bio, page, PAGE_SIZE, 0);
-		}
+		real_size = min_t(u64, U32_MAX,
+				  next_stripe_start - cur_disk_bytenr);
+		real_size = min_t(u64, real_size,
+				  PAGE_SIZE - offset_in_page(offset));
+		real_size = min_t(u64, real_size,
+				  compressed_len - offset);
+		ASSERT(IS_ALIGNED(real_size, fs_info->sectorsize));
 
-		page->mapping = NULL;
-		if (submit || len < PAGE_SIZE) {
+		if (use_append)
+			added = bio_add_zone_append_page(bio, page, real_size,
+					offset_in_page(offset));
+		else
+			added = bio_add_page(bio, page, real_size,
+					offset_in_page(offset));
+		/* Reached zoned boundary */
+		if (added == 0)
+			submit = true;
+
+		cur_disk_bytenr += added;
+		/* Reached stripe boundary */
+		if (cur_disk_bytenr == next_stripe_start)
+			submit = true;
+
+		/* Finished the range */
+		if (cur_disk_bytenr == disk_start + compressed_len)
+			submit = true;
+
+		if (submit) {
 			if (!skip_sum) {
 				ret = btrfs_csum_one_bio(inode, bio, start, 1);
 				if (ret)
@@ -574,61 +588,27 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 			ret = submit_compressed_bio(fs_info, cb, bio, 0);
 			if (ret)
 				goto finish_cb;
-
-			bio = alloc_compressed_bio(cb, first_byte,
-					bio_op | write_flags,
-					end_compressed_bio_write,
-					&next_stripe_start);
-			if (IS_ERR(bio)) {
-				ret = errno_to_blk_status(PTR_ERR(bio));
-				bio = NULL;
-				goto finish_cb;
-			}
-			if (blkcg_css)
-				bio->bi_opf |= REQ_CGROUP_PUNT;
-			/*
-			 * Use bio_add_page() to ensure the bio has at least one
-			 * page.
-			 */
-			bio_add_page(bio, page, PAGE_SIZE, 0);
-		}
-		if (bytes_left < PAGE_SIZE) {
-			btrfs_info(fs_info,
-					"bytes left %lu compress len %u nr %u",
-			       bytes_left, cb->compressed_len, cb->nr_pages);
+			bio = NULL;
 		}
-		bytes_left -= PAGE_SIZE;
-		first_byte += PAGE_SIZE;
 		cond_resched();
 	}
-
-	if (!skip_sum) {
-		ret = btrfs_csum_one_bio(inode, bio, start, 1);
-		if (ret)
-			goto last_bio;
-	}
-
-	ret = submit_compressed_bio(fs_info, cb, bio, 0);
-	if (ret)
-		goto last_bio;
-
 	if (blkcg_css)
 		kthread_associate_blkcg(NULL);
 
 	return 0;
-last_bio:
-	bio->bi_status = ret;
-	/* One of the bios' endio function will free @cb. */
-	bio_endio(bio);
-	return ret;
 
 finish_cb:
 	if (bio) {
 		bio->bi_status = ret;
 		bio_endio(bio);
 	}
+	/* Last byte of @cb is submitted, endio will free @cb */
+	if (cur_disk_bytenr == disk_start + compressed_len)
+		return ret;
 
-	wait_var_event(cb, atomic_read(&cb->pending_bios) == 0);
+	wait_var_event(cb, refcount_read(&cb->pending_sectors) ==
+			   (disk_start + compressed_len - cur_disk_bytenr) >>
+			   fs_info->sectorsize_bits);
 	/*
 	 * Even with previous bio ended, we should still have io not yet
 	 * submitted, thus need to finish manually.
@@ -841,7 +821,6 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 	if (!cb)
 		goto out;
 
-	atomic_set(&cb->pending_bios, 0);
 	refcount_set(&cb->pending_sectors,
 		     compressed_len >> fs_info->sectorsize_bits);
 	cb->errors = 0;
diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index 61955581e34f..56eef0821e3e 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -28,9 +28,6 @@ struct btrfs_inode;
 #define	BTRFS_ZLIB_DEFAULT_LEVEL		3
 
 struct compressed_bio {
-	/* number of bios pending for this compressed extent */
-	atomic_t pending_bios;
-
 	/* Number of sectors with unfinished IO (unsubmitted or unfinished) */
 	refcount_t pending_sectors;
 
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 14/27] btrfs: remove unused function btrfs_bio_fits_in_stripe()
  2021-07-13  6:14 [PATCH 00/27] btrfs: limited subpage compressed write support Qu Wenruo
                   ` (12 preceding siblings ...)
  2021-07-13  6:15 ` [PATCH 13/27] btrfs: make btrfs_submit_compressed_write() " Qu Wenruo
@ 2021-07-13  6:15 ` Qu Wenruo
  2021-07-13  6:15 ` [PATCH 15/27] btrfs: refactor submit_compressed_extents() Qu Wenruo
                   ` (13 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Qu Wenruo @ 2021-07-13  6:15 UTC (permalink / raw)
  To: linux-btrfs

Now that the last caller in compression.c has been removed, we don't need
this function anymore.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ctree.h |  2 --
 fs/btrfs/inode.c | 42 ------------------------------------------
 2 files changed, 44 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 4a69aa604ac5..e265a7eb0d5c 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3152,8 +3152,6 @@ void btrfs_merge_delalloc_extent(struct inode *inode, struct extent_state *new,
 				 struct extent_state *other);
 void btrfs_split_delalloc_extent(struct inode *inode,
 				 struct extent_state *orig, u64 split);
-int btrfs_bio_fits_in_stripe(struct page *page, size_t size, struct bio *bio,
-			     unsigned long bio_flags);
 void btrfs_set_range_writeback(struct btrfs_inode *inode, u64 start, u64 end);
 vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf);
 int btrfs_readpage(struct file *file, struct page *page);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5c29d131a574..cb6cb99eb454 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2226,48 +2226,6 @@ void btrfs_clear_delalloc_extent(struct inode *vfs_inode,
 	}
 }
 
-/*
- * btrfs_bio_fits_in_stripe - Checks whether the size of the given bio will fit
- * in a chunk's stripe. This function ensures that bios do not span a
- * stripe/chunk
- *
- * @page - The page we are about to add to the bio
- * @size - size we want to add to the bio
- * @bio - bio we want to ensure is smaller than a stripe
- * @bio_flags - flags of the bio
- *
- * return 1 if page cannot be added to the bio
- * return 0 if page can be added to the bio
- * return error otherwise
- */
-int btrfs_bio_fits_in_stripe(struct page *page, size_t size, struct bio *bio,
-			     unsigned long bio_flags)
-{
-	struct inode *inode = page->mapping->host;
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	u64 logical = bio->bi_iter.bi_sector << 9;
-	u32 bio_len = bio->bi_iter.bi_size;
-	struct extent_map *em;
-	int ret = 0;
-	struct btrfs_io_geometry geom;
-
-	if (bio_flags & EXTENT_BIO_COMPRESSED)
-		return 0;
-
-	em = btrfs_get_chunk_map(fs_info, logical, fs_info->sectorsize);
-	if (IS_ERR(em))
-		return PTR_ERR(em);
-	ret = btrfs_get_io_geometry(fs_info, em, btrfs_op(bio), logical, &geom);
-	if (ret < 0)
-		goto out;
-
-	if (geom.len < bio_len + size)
-		ret = 1;
-out:
-	free_extent_map(em);
-	return ret;
-}
-
 /*
  * in order to insert checksums into the metadata in large chunks,
  * we wait until bio submission time.   All the pages in the bio are
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 15/27] btrfs: refactor submit_compressed_extents()
  2021-07-13  6:14 [PATCH 00/27] btrfs: limited subpage compressed write support Qu Wenruo
                   ` (13 preceding siblings ...)
  2021-07-13  6:15 ` [PATCH 14/27] btrfs: remove unused function btrfs_bio_fits_in_stripe() Qu Wenruo
@ 2021-07-13  6:15 ` Qu Wenruo
  2021-07-13  6:15 ` [PATCH 16/27] btrfs: cleanup for extent_write_locked_range() Qu Wenruo
                   ` (12 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Qu Wenruo @ 2021-07-13  6:15 UTC (permalink / raw)
  To: linux-btrfs

We have a big hunk of code inside a while() loop, with tons of strange
jumps for error handling.

It's definitely not up to today's coding standards.

Move the code into a new function, submit_one_async_extent().

Since we're here, also do the following modifications:

- Comment style change
  To follow the current scheme

- Don't fall back to non-compressed write when hitting ENOSPC
  If we can't even reserve space for the compressed write, how could we
  reserve more space for the non-compressed write?
  Thus we go down the error path directly.
  This removes the retry: label.

- Add more comments for the super long parameter lists
  Explain what each parameter is for, so we don't need to check the
  prototype.

- Move the error handling to submit_one_async_extent()
  Thus no strange code like:

  out_free:
	...
	goto again;

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 278 +++++++++++++++++++++++------------------------
 1 file changed, 137 insertions(+), 141 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index cb6cb99eb454..7931c4a5bc5d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -838,163 +838,129 @@ static void free_async_extent_pages(struct async_extent *async_extent)
 	async_extent->pages = NULL;
 }
 
-/*
- * phase two of compressed writeback.  This is the ordered portion
- * of the code, which only gets called in the order the work was
- * queued.  We walk all the async extents created by compress_file_range
- * and send them down to the disk.
- */
-static noinline void submit_compressed_extents(struct async_chunk *async_chunk)
+static int submit_one_async_extent(struct btrfs_inode *inode,
+				   struct async_chunk *async_chunk,
+				   struct async_extent *async_extent,
+				   u64 *alloc_hint)
 {
-	struct btrfs_inode *inode = BTRFS_I(async_chunk->inode);
-	struct btrfs_fs_info *fs_info = inode->root->fs_info;
-	struct async_extent *async_extent;
-	u64 alloc_hint = 0;
+	struct extent_io_tree *io_tree = &inode->io_tree;
+	struct btrfs_root *root = inode->root;
+	struct btrfs_fs_info *fs_info = root->fs_info;
 	struct btrfs_key ins;
 	struct extent_map *em;
-	struct btrfs_root *root = inode->root;
-	struct extent_io_tree *io_tree = &inode->io_tree;
 	int ret = 0;
+	u64 start = async_extent->start;
+	u64 end = async_extent->start + async_extent->ram_size - 1;
 
-again:
-	while (!list_empty(&async_chunk->extents)) {
-		async_extent = list_entry(async_chunk->extents.next,
-					  struct async_extent, list);
-		list_del(&async_extent->list);
-
-retry:
-		lock_extent(io_tree, async_extent->start,
-			    async_extent->start + async_extent->ram_size - 1);
-		/* did the compression code fall back to uncompressed IO? */
-		if (!async_extent->pages) {
-			int page_started = 0;
-			unsigned long nr_written = 0;
-
-			/* allocate blocks */
-			ret = cow_file_range(inode, async_chunk->locked_page,
-					     async_extent->start,
-					     async_extent->start +
-					     async_extent->ram_size - 1,
-					     &page_started, &nr_written, 0);
-
-			/* JDM XXX */
+	lock_extent(io_tree, start, end);
 
-			/*
-			 * if page_started, cow_file_range inserted an
-			 * inline extent and took care of all the unlocking
-			 * and IO for us.  Otherwise, we need to submit
-			 * all those pages down to the drive.
-			 */
-			if (!page_started && !ret)
-				extent_write_locked_range(&inode->vfs_inode,
-						  async_extent->start,
-						  async_extent->start +
-						  async_extent->ram_size - 1,
-						  WB_SYNC_ALL);
-			else if (ret && async_chunk->locked_page)
-				unlock_page(async_chunk->locked_page);
-			kfree(async_extent);
-			cond_resched();
-			continue;
-		}
-
-		ret = btrfs_reserve_extent(root, async_extent->ram_size,
-					   async_extent->compressed_size,
-					   async_extent->compressed_size,
-					   0, alloc_hint, &ins, 1, 1);
-		if (ret) {
-			free_async_extent_pages(async_extent);
-
-			if (ret == -ENOSPC) {
-				unlock_extent(io_tree, async_extent->start,
-					      async_extent->start +
-					      async_extent->ram_size - 1);
-
-				/*
-				 * we need to redirty the pages if we decide to
-				 * fallback to uncompressed IO, otherwise we
-				 * will not submit these pages down to lower
-				 * layers.
-				 */
-				extent_range_redirty_for_io(&inode->vfs_inode,
-						async_extent->start,
-						async_extent->start +
-						async_extent->ram_size - 1);
+	/* We have fall back to uncompressed write */
+	if (!async_extent->pages) {
+		int page_started = 0;
+		unsigned long nr_written = 0;
 
-				goto retry;
-			}
-			goto out_free;
-		}
 		/*
-		 * here we're doing allocation and writeback of the
-		 * compressed pages
+		 * Call cow_file_range() to run the delalloc range directly,
+		 * since we won't go to nocow or async path again.
 		 */
-		em = create_io_em(inode, async_extent->start,
-				  async_extent->ram_size, /* len */
-				  async_extent->start, /* orig_start */
-				  ins.objectid, /* block_start */
-				  ins.offset, /* block_len */
-				  ins.offset, /* orig_block_len */
-				  async_extent->ram_size, /* ram_bytes */
-				  async_extent->compress_type,
-				  BTRFS_ORDERED_COMPRESSED);
-		if (IS_ERR(em))
-			/* ret value is not necessary due to void function */
-			goto out_free_reserve;
-		free_extent_map(em);
-
-		ret = btrfs_add_ordered_extent_compress(inode,
-						async_extent->start,
-						ins.objectid,
-						async_extent->ram_size,
-						ins.offset,
-						async_extent->compress_type);
-		if (ret) {
-			btrfs_drop_extent_cache(inode, async_extent->start,
-						async_extent->start +
-						async_extent->ram_size - 1, 0);
-			goto out_free_reserve;
-		}
-		btrfs_dec_block_group_reservations(fs_info, ins.objectid);
-
+		ret = cow_file_range(inode, async_chunk->locked_page,
+				     start, end, &page_started, &nr_written, 0);
 		/*
-		 * clear dirty, set writeback and unlock the pages.
+		 * If @page_started, cow_file_range() inserted an
+		 * inline extent and took care of all the unlocking
+		 * and IO for us.  Otherwise, we need to submit
+		 * all those pages down to the drive.
 		 */
-		extent_clear_unlock_delalloc(inode, async_extent->start,
-				async_extent->start +
-				async_extent->ram_size - 1,
-				NULL, EXTENT_LOCKED | EXTENT_DELALLOC,
-				PAGE_UNLOCK | PAGE_START_WRITEBACK);
-		if (btrfs_submit_compressed_write(inode, async_extent->start,
-				    async_extent->ram_size,
-				    ins.objectid,
-				    ins.offset, async_extent->pages,
-				    async_extent->nr_pages,
-				    async_chunk->write_flags,
-				    async_chunk->blkcg_css)) {
-			const u64 start = async_extent->start;
-			const u64 end = start + async_extent->ram_size - 1;
-
-			btrfs_writepage_endio_finish_ordered(inode, NULL, start,
-							     end, 0);
-
-			extent_clear_unlock_delalloc(inode, start, end, NULL, 0,
-						     PAGE_END_WRITEBACK |
-						     PAGE_SET_ERROR);
-			free_async_extent_pages(async_extent);
-		}
-		alloc_hint = ins.objectid + ins.offset;
+		if (!page_started && !ret)
+			extent_write_locked_range(&inode->vfs_inode, start,
+						  end, WB_SYNC_ALL);
+		else if (ret && async_chunk->locked_page)
+			unlock_page(async_chunk->locked_page);
 		kfree(async_extent);
-		cond_resched();
+		return ret;
+	}
+
+	ret = btrfs_reserve_extent(root, async_extent->ram_size,
+				   async_extent->compressed_size,
+				   async_extent->compressed_size,
+				   0, *alloc_hint, &ins, 1, 1);
+	if (ret) {
+		free_async_extent_pages(async_extent);
+		/*
+		 * Here we used to try again by going back to non-compressed
+		 * path for ENOSPC.
+		 * But we can't reserve space even for compressed size, how
+		 * could it work for uncompressed size which requires larger
+		 * size?
+		 * So here we directly go error path.
+		 */
+		goto out_free;
+	}
+
+	/*
+	 * Here we're doing allocation and writeback of the
+	 * compressed pages
+	 */
+	em = create_io_em(inode, start,
+			  async_extent->ram_size, /* len */
+			  start, /* orig_start */
+			  ins.objectid, /* block_start */
+			  ins.offset, /* block_len */
+			  ins.offset, /* orig_block_len */
+			  async_extent->ram_size, /* ram_bytes */
+			  async_extent->compress_type,
+			  BTRFS_ORDERED_COMPRESSED);
+	if (IS_ERR(em)) {
+		ret = PTR_ERR(em);
+		goto out_free_reserve;
+	}
+	free_extent_map(em);
+
+	ret = btrfs_add_ordered_extent_compress(inode, start, /* file_offset */
+					ins.objectid,	/* disk_bytenr */
+					async_extent->ram_size, /* num_bytes */
+					ins.offset, /* disk_num_bytes */
+					async_extent->compress_type);
+	if (ret) {
+		btrfs_drop_extent_cache(inode, start, end, 0);
+		goto out_free_reserve;
 	}
-	return;
+	btrfs_dec_block_group_reservations(fs_info, ins.objectid);
+
+	/*
+	 * clear dirty, set writeback and unlock the pages.
+	 */
+	extent_clear_unlock_delalloc(inode, start, end,
+			NULL, EXTENT_LOCKED | EXTENT_DELALLOC,
+			PAGE_UNLOCK | PAGE_START_WRITEBACK);
+	if (btrfs_submit_compressed_write(inode, start, /* file_offset */
+			    async_extent->ram_size, /* num_bytes */
+			    ins.objectid, /* disk_bytenr */
+			    ins.offset, /* compressed_len */
+			    async_extent->pages, /* compressed_pages */
+			    async_extent->nr_pages,
+			    async_chunk->write_flags,
+			    async_chunk->blkcg_css)) {
+		const u64 start = async_extent->start;
+		const u64 end = start + async_extent->ram_size - 1;
+
+		btrfs_writepage_endio_finish_ordered(inode, NULL, start,
+						     end, 0);
+
+		extent_clear_unlock_delalloc(inode, start, end, NULL, 0,
+					     PAGE_END_WRITEBACK |
+					     PAGE_SET_ERROR);
+		free_async_extent_pages(async_extent);
+	}
+	*alloc_hint = ins.objectid + ins.offset;
+	kfree(async_extent);
+	return ret;
+
 out_free_reserve:
 	btrfs_dec_block_group_reservations(fs_info, ins.objectid);
 	btrfs_free_reserved_extent(fs_info, ins.objectid, ins.offset, 1);
 out_free:
-	extent_clear_unlock_delalloc(inode, async_extent->start,
-				     async_extent->start +
-				     async_extent->ram_size - 1,
+	extent_clear_unlock_delalloc(inode, start, end,
 				     NULL, EXTENT_LOCKED | EXTENT_DELALLOC |
 				     EXTENT_DELALLOC_NEW |
 				     EXTENT_DEFRAG | EXTENT_DO_ACCOUNTING,
@@ -1002,7 +968,37 @@ static noinline void submit_compressed_extents(struct async_chunk *async_chunk)
 				     PAGE_END_WRITEBACK | PAGE_SET_ERROR);
 	free_async_extent_pages(async_extent);
 	kfree(async_extent);
-	goto again;
+	return ret;
+}
+
+/*
+ * phase two of compressed writeback.  This is the ordered portion
+ * of the code, which only gets called in the order the work was
+ * queued.  We walk all the async extents created by compress_file_range
+ * and send them down to the disk.
+ */
+static noinline void submit_compressed_extents(struct async_chunk *async_chunk)
+{
+	struct btrfs_inode *inode = BTRFS_I(async_chunk->inode);
+	struct btrfs_fs_info *fs_info = inode->root->fs_info;
+	struct async_extent *async_extent;
+	u64 alloc_hint = 0;
+	int ret = 0;
+
+	while (!list_empty(&async_chunk->extents)) {
+		async_extent = list_entry(async_chunk->extents.next,
+					  struct async_extent, list);
+		list_del(&async_extent->list);
+
+		ret = submit_one_async_extent(inode, async_chunk, async_extent,
+					      &alloc_hint);
+		/* Just for developer */
+		btrfs_debug(fs_info,
+"async extent submission failed root=%lld inode=%llu start=%llu len=%llu ret=%d",
+			    inode->root->root_key.objectid,
+			    btrfs_ino(inode), async_extent->start,
+			    async_extent->ram_size, ret);
+	}
 }
 
 static u64 get_extent_allocation_hint(struct btrfs_inode *inode, u64 start,
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 16/27] btrfs: cleanup for extent_write_locked_range()
  2021-07-13  6:14 [PATCH 00/27] btrfs: limited subpage compressed write support Qu Wenruo
                   ` (14 preceding siblings ...)
  2021-07-13  6:15 ` [PATCH 15/27] btrfs: refactor submit_compressed_extents() Qu Wenruo
@ 2021-07-13  6:15 ` Qu Wenruo
  2021-07-13  6:15 ` [PATCH 17/27] btrfs: make compress_file_range() to be subpage compatible Qu Wenruo
                   ` (11 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Qu Wenruo @ 2021-07-13  6:15 UTC (permalink / raw)
  To: linux-btrfs

There are several cleanups for extent_write_locked_range(); most of them
are pure cleanups, with some preparation for future subpage support.

- Add a proper comment about which call sites are suitable
  Unlike regular synchronized extent writeback, if async cow or zoned
  cow happens, all pages in the range are still locked.

  Thus for those (only) two call sites, we need this function to put the
  page contents into bios and submit them.

- Remove the @mode parameter
  Both existing call sites pass WB_SYNC_ALL, so the @mode parameter is
  not needed.

- Better error handling
  Currently if we hit an error during the page iteration loop, we
  overwrite @ret, so only the last error gets recorded.

  Here we add the @found_error and @first_error variables to record
  whether we hit any error, and the first error we hit.
  This way the first error won't get lost.

- Don't reuse @start as the cursor
  Currently the parameter @start is reused as the cursor to iterate the
  range.  Not a big problem, but since we're here, introduce a proper
  @cur as the cursor.

- Remove an impossible branch
  Since all pages are still locked after the ordered extent is inserted,
  there is no way for a page to get its dirty bit cleared.
  Remove the branch where the page is not dirty and replace it with an
  ASSERT().

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 45 ++++++++++++++++++++++++++++----------------
 fs/btrfs/extent_io.h |  3 +--
 fs/btrfs/inode.c     |  4 ++--
 3 files changed, 32 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 5ad15a092574..583cfc89f2af 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5045,23 +5045,29 @@ int extent_write_full_page(struct page *page, struct writeback_control *wbc)
 	return ret;
 }
 
-int extent_write_locked_range(struct inode *inode, u64 start, u64 end,
-			      int mode)
+/*
+ * Submit the pages in the range to bio for call sites which delalloc range
+ * has already be ran (aka, ordered extent inserted) and all pages are still
+ * locked.
+ */
+int extent_write_locked_range(struct inode *inode, u64 start, u64 end)
 {
+	bool found_error = false;
+	int first_error = 0;
 	int ret = 0;
 	struct address_space *mapping = inode->i_mapping;
 	struct page *page;
+	u64 cur = start;
 	unsigned long nr_pages = (end - start + PAGE_SIZE) >>
 		PAGE_SHIFT;
-
 	struct extent_page_data epd = {
 		.bio_ctrl = { 0 },
 		.extent_locked = 1,
-		.sync_io = mode == WB_SYNC_ALL,
+		.sync_io = 1,
 	};
 	struct writeback_control wbc_writepages = {
-		.sync_mode	= mode,
 		.nr_to_write	= nr_pages * 2,
+		.sync_mode	= WB_SYNC_ALL,
 		.range_start	= start,
 		.range_end	= end + 1,
 		/* We're called from an async helper function */
@@ -5070,26 +5076,33 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end,
 	};
 
 	wbc_attach_fdatawrite_inode(&wbc_writepages, inode);
-	while (start <= end) {
-		page = find_get_page(mapping, start >> PAGE_SHIFT);
-		if (clear_page_dirty_for_io(page))
-			ret = __extent_writepage(page, &wbc_writepages, &epd);
-		else {
-			btrfs_writepage_endio_finish_ordered(BTRFS_I(inode),
-					page, start, start + PAGE_SIZE - 1, 1);
-			unlock_page(page);
+	while (cur <= end) {
+		page = find_get_page(mapping, cur >> PAGE_SHIFT);
+		/*
+		 * All pages in the range are locked since
+		 * btrfs_run_delalloc_range(), thus there is no way to clear
+		 * the page dirty flag.
+		 */
+		ASSERT(PageDirty(page));
+		clear_page_dirty_for_io(page);
+		ret = __extent_writepage(page, &wbc_writepages, &epd);
+		ASSERT(ret <= 0);
+		if (ret < 0) {
+			found_error = true;
+			first_error = ret;
 		}
 		put_page(page);
-		start += PAGE_SIZE;
+		cur += PAGE_SIZE;
 	}
 
-	ASSERT(ret <= 0);
-	if (ret == 0)
+	if (!found_error)
 		ret = flush_write_bio(&epd);
 	else
 		end_write_bio(&epd, ret);
 
 	wbc_detach_inode(&wbc_writepages);
+	if (found_error)
+		return first_error;
 	return ret;
 }
 
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 62027f551b44..3f7f83a91976 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -183,8 +183,7 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 		      struct btrfs_bio_ctrl *bio_ctrl,
 		      unsigned int read_flags, u64 *prev_em_start);
 int extent_write_full_page(struct page *page, struct writeback_control *wbc);
-int extent_write_locked_range(struct inode *inode, u64 start, u64 end,
-			      int mode);
+int extent_write_locked_range(struct inode *inode, u64 start, u64 end);
 int extent_writepages(struct address_space *mapping,
 		      struct writeback_control *wbc);
 int btree_write_cache_pages(struct address_space *mapping,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 7931c4a5bc5d..94449a6da16b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -873,7 +873,7 @@ static int submit_one_async_extent(struct btrfs_inode *inode,
 		 */
 		if (!page_started && !ret)
 			extent_write_locked_range(&inode->vfs_inode, start,
-						  end, WB_SYNC_ALL);
+						  end);
 		else if (ret && async_chunk->locked_page)
 			unlock_page(async_chunk->locked_page);
 		kfree(async_extent);
@@ -1460,7 +1460,7 @@ static noinline int run_delalloc_zoned(struct btrfs_inode *inode,
 
 	__set_page_dirty_nobuffers(locked_page);
 	account_page_redirty(locked_page);
-	extent_write_locked_range(&inode->vfs_inode, start, end, WB_SYNC_ALL);
+	extent_write_locked_range(&inode->vfs_inode, start, end);
 	*page_started = 1;
 
 	return 0;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 17/27] btrfs: make compress_file_range() to be subpage compatible
  2021-07-13  6:14 [PATCH 00/27] btrfs: limited subpage compressed write support Qu Wenruo
                   ` (15 preceding siblings ...)
  2021-07-13  6:15 ` [PATCH 16/27] btrfs: cleanup for extent_write_locked_range() Qu Wenruo
@ 2021-07-13  6:15 ` Qu Wenruo
  2021-07-13  6:15 ` [PATCH 18/27] btrfs: make btrfs_submit_compressed_write() " Qu Wenruo
                   ` (10 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Qu Wenruo @ 2021-07-13  6:15 UTC (permalink / raw)
  To: linux-btrfs

In compress_file_range(), when the compression is finished, the function
just rounds up @total_in to PAGE_SIZE.

This is fine for the regular sectorsize == PAGE_SIZE case, but not for
subpage.

Just change the ALIGN(, PAGE_SIZE) to round_up(, sectorsize) so that
both regular sectorsize and subpage sectorsize will be happy.
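
A quick illustration (a userspace sketch with made-up numbers): with a
64K page size and 4K sectorsize, 6K of input data used to round up to a
full 64K, which makes the "compression must free at least one sector"
check far too easy to pass; rounding up to the sectorsize gives 8K
instead.

#include <stdio.h>

#define PAGE_SIZE	65536u		/* 64K page, subpage case */
#define SECTORSIZE	4096u
/* same rounding the kernel helpers do, for power-of-two alignments */
#define ROUND_UP(x, a)	(((x) + (a) - 1) & ~((a) - 1))

int main(void)
{
	unsigned int total_in = 6 * 1024;	/* 6K of input data */

	printf("ALIGN(total_in, PAGE_SIZE)     = %u\n",
	       ROUND_UP(total_in, PAGE_SIZE));		/* 65536 */
	printf("round_up(total_in, sectorsize) = %u\n",
	       ROUND_UP(total_in, SECTORSIZE));		/* 8192 */
	return 0;
}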

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 94449a6da16b..3e6ff2b2dded 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -757,7 +757,7 @@ static noinline int compress_file_range(struct async_chunk *async_chunk)
 		 * win, compare the page count read with the blocks on disk,
 		 * compression must free at least one sector size
 		 */
-		total_in = ALIGN(total_in, PAGE_SIZE);
+		total_in = round_up(total_in, fs_info->sectorsize);
 		if (total_compressed + blocksize <= total_in) {
 			compressed_extents++;
 
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 18/27] btrfs: make btrfs_submit_compressed_write() to be subpage compatible
  2021-07-13  6:14 [PATCH 00/27] btrfs: limited subpage compressed write support Qu Wenruo
                   ` (16 preceding siblings ...)
  2021-07-13  6:15 ` [PATCH 17/27] btrfs: make compress_file_range() to be subpage compatible Qu Wenruo
@ 2021-07-13  6:15 ` Qu Wenruo
  2021-07-13  6:15 ` [PATCH 19/27] btrfs: make end_compressed_bio_writeback() to be subpage compatible Qu Wenruo
                   ` (9 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Qu Wenruo @ 2021-07-13  6:15 UTC (permalink / raw)
  To: linux-btrfs

There is a WARN_ON() checking whether @start is aligned to PAGE_SIZE
rather than sectorsize, which will cause false alerts for subpage.

Fix it to check against sectorsize.

Furthermore:

- Use ASSERT() to do the check
  So that in the future we may skip the check for production builds

- Also check the alignment of @len

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/compression.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 70a4679799be..d2d943e1f648 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -504,7 +504,8 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 	const bool use_append = btrfs_use_zone_append(inode, disk_start);
 	const unsigned int bio_op = use_append ? REQ_OP_ZONE_APPEND : REQ_OP_WRITE;
 
-	WARN_ON(!PAGE_ALIGNED(start));
+	ASSERT(IS_ALIGNED(start, fs_info->sectorsize) &&
+	       IS_ALIGNED(len, fs_info->sectorsize));
 	cb = kmalloc(compressed_bio_size(fs_info, compressed_len), GFP_NOFS);
 	if (!cb)
 		return BLK_STS_RESOURCE;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 19/27] btrfs: make end_compressed_bio_writeback() to be subpage compatible
  2021-07-13  6:14 [PATCH 00/27] btrfs: limited subpage compressed write support Qu Wenruo
                   ` (17 preceding siblings ...)
  2021-07-13  6:15 ` [PATCH 18/27] btrfs: make btrfs_submit_compressed_write() " Qu Wenruo
@ 2021-07-13  6:15 ` Qu Wenruo
  2021-07-13  6:15 ` [PATCH 20/27] btrfs: make extent_write_locked_range() to be subpage compatible Qu Wenruo
                   ` (8 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Qu Wenruo @ 2021-07-13  6:15 UTC (permalink / raw)
  To: linux-btrfs

In end_compressed_writeback() we just clear the writeback flag for the
full page.
For the subpage case, if there are two delalloc ranges in the same page,
the second range will trigger a BUG_ON() as the page writeback flag has
already been cleared by the previous range.

Fix it by using the btrfs_page_clamp_clear_writeback() helper.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/compression.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index d2d943e1f648..b4aa29965c69 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -336,6 +336,7 @@ static void end_compressed_bio_read(struct bio *bio)
 static noinline void end_compressed_writeback(struct inode *inode,
 					      const struct compressed_bio *cb)
 {
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	unsigned long index = cb->start >> PAGE_SHIFT;
 	unsigned long end_index = (cb->start + cb->len - 1) >> PAGE_SHIFT;
 	struct page *pages[16];
@@ -358,7 +359,8 @@ static noinline void end_compressed_writeback(struct inode *inode,
 		for (i = 0; i < ret; i++) {
 			if (cb->errors)
 				SetPageError(pages[i]);
-			end_page_writeback(pages[i]);
+			btrfs_page_clamp_clear_writeback(fs_info, pages[i],
+							 cb->start, cb->len);
 			put_page(pages[i]);
 		}
 		nr_pages -= ret;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 20/27] btrfs: make extent_write_locked_range() to be subpage compatible
  2021-07-13  6:14 [PATCH 00/27] btrfs: limited subpage compressed write support Qu Wenruo
                   ` (18 preceding siblings ...)
  2021-07-13  6:15 ` [PATCH 19/27] btrfs: make end_compressed_bio_writeback() to be subpage compatible Qu Wenruo
@ 2021-07-13  6:15 ` Qu Wenruo
  2021-07-13  6:15 ` [PATCH 21/27] btrfs: extract uncompressed async extent submission code into a new helper Qu Wenruo
                   ` (7 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Qu Wenruo @ 2021-07-13  6:15 UTC (permalink / raw)
  To: linux-btrfs

There are two call sites in extent_write_locked_range() that are not yet
subpage compatible:

- How @nr_pages is calculated
  For subpage we can have the following range with 64K page size:

  0   32K  64K   96K 128K
  |   |////|/////|   |

  In that case, 96K - 32K == 64K, so it looks like one page is enough,
  but the range actually spans two pages, not one.

  Fix it by doing proper round_up() and round_down() to calculate
  @nr_pages (see the worked example after this list).

  Also add some extra ASSERT()s to ensure the range passed in is already
  aligned.

- How the page end is calculated
  Currently we just use cur + PAGE_SIZE - 1 to calculate the page end.

  That can't handle the above range layout, and will trigger an ASSERT()
  in btrfs_writepage_endio_finish_ordered(), as the range is no longer
  covered by the page range.

  Fix it by taking the page end into consideration.
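
As a quick worked example for both points (a sketch only, using the 64K
page size and the [32K, 96K) range from the diagram above, i.e.
start = 32K, end = 96K - 1):

	/* Old calculation: looks like one page, but two pages are touched */
	nr_pages = (end - start + PAGE_SIZE) >> PAGE_SHIFT;
		   /* = (64K - 1 + 64K) >> 16 = 1 */

	/* New calculation */
	nr_pages = (round_up(end, PAGE_SIZE) -
		    round_down(start, PAGE_SIZE)) >> PAGE_SHIFT;
		   /* = (128K - 0) >> 16 = 2 */

	/* First iteration of the writeback loop */
	cur_end = min(round_down(cur, PAGE_SIZE) + PAGE_SIZE - 1, end);
		  /* = min(64K - 1, 96K - 1) = 64K - 1, so @cur advances to
		   * 64K and the second iteration handles the second page
		   */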

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 583cfc89f2af..f6be630cf9ef 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5058,15 +5058,14 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end)
 	struct address_space *mapping = inode->i_mapping;
 	struct page *page;
 	u64 cur = start;
-	unsigned long nr_pages = (end - start + PAGE_SIZE) >>
-		PAGE_SHIFT;
+	unsigned long nr_pages;
+	const u32 sectorsize = btrfs_sb(inode->i_sb)->sectorsize;
 	struct extent_page_data epd = {
 		.bio_ctrl = { 0 },
 		.extent_locked = 1,
 		.sync_io = 1,
 	};
 	struct writeback_control wbc_writepages = {
-		.nr_to_write	= nr_pages * 2,
 		.sync_mode	= WB_SYNC_ALL,
 		.range_start	= start,
 		.range_end	= end + 1,
@@ -5075,14 +5074,24 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end)
 		.no_cgroup_owner = 1,
 	};
 
+	ASSERT(IS_ALIGNED(start, sectorsize) &&
+	       IS_ALIGNED(end + 1, sectorsize));
+	nr_pages = (round_up(end, PAGE_SIZE) - round_down(start, PAGE_SIZE)) >>
+		   PAGE_SHIFT;
+	wbc_writepages.nr_to_write = nr_pages * 2;
+
 	wbc_attach_fdatawrite_inode(&wbc_writepages, inode);
 	while (cur <= end) {
+		u64 cur_end = min(round_down(cur, PAGE_SIZE) + PAGE_SIZE - 1,
+				  end);
+
 		page = find_get_page(mapping, cur >> PAGE_SHIFT);
 		/*
 		 * All pages in the range are locked since
 		 * btrfs_run_delalloc_range(), thus there is no way to clear
 		 * the page dirty flag.
 		 */
+		ASSERT(PageLocked(page));
 		ASSERT(PageDirty(page));
 		clear_page_dirty_for_io(page);
 		ret = __extent_writepage(page, &wbc_writepages, &epd);
@@ -5092,7 +5101,7 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end)
 			first_error = ret;
 		}
 		put_page(page);
-		cur += PAGE_SIZE;
+		cur = cur_end + 1;
 	}
 
 	if (!found_error)
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 21/27] btrfs: extract uncompressed async extent submission code into a new helper
  2021-07-13  6:14 [PATCH 00/27] btrfs: limited subpage compressed write support Qu Wenruo
                   ` (19 preceding siblings ...)
  2021-07-13  6:15 ` [PATCH 20/27] btrfs: make extent_write_locked_range() to be subpage compatible Qu Wenruo
@ 2021-07-13  6:15 ` Qu Wenruo
  2021-07-13  6:15 ` [PATCH 22/27] btrfs: rework lzo_compress_pages() to make it subpage compatible Qu Wenruo
                   ` (6 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Qu Wenruo @ 2021-07-13  6:15 UTC (permalink / raw)
  To: linux-btrfs

Introduce a new helper, submit_uncompressed_range(), for the async cow
cases where we fall back to COW.

There are some new modifications introduced in the helper:

- Proper locked_page detection
  It's possible that the async_extent range doesn't cover the locked
  page.
  In that case we shouldn't unlock the locked page.

  In the new helper, we will ensure that we only unlock the locked page
  when:

  * The locked page covers part of the async_extent range
  * The locked page has not already been unlocked by cow_file_range() or
    extent_write_locked_range()

  This also means extra comments are added focusing on the page locking
  (a short sketch follows this list).

- Add an extra comment on a rarely used parameter
  We pass @unlock_page == 0 to cow_file_range(); only two call sites do
  this, including the new helper.

  It definitely deserves a comment.
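
A minimal sketch of the locked page detection (illustrative values only,
assuming 64K page size and async_chunk->locked_page at file offset 0):

	u64 locked_page_start = page_offset(async_chunk->locked_page); /* 0 */
	u64 locked_page_end = locked_page_start + PAGE_SIZE - 1;  /* 64K - 1 */

	/* async_extent [64K, 128K): start >= locked_page_end, no overlap, so
	 * @locked_page stays NULL and submit_uncompressed_range() never
	 * touches it.
	 *
	 * async_extent [0, 64K): the ranges overlap, so @locked_page is
	 * passed down and unlocked exactly once: by cow_file_range() on the
	 * inline extent path, by the explicit unlock_page() on error, or by
	 * extent_write_locked_range() otherwise.
	 */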

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 76 +++++++++++++++++++++++++++++++++---------------
 1 file changed, 52 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3e6ff2b2dded..1432e268c13e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -838,6 +838,43 @@ static void free_async_extent_pages(struct async_extent *async_extent)
 	async_extent->pages = NULL;
 }
 
+static int submit_uncompressed_range(struct btrfs_inode *inode,
+				     struct async_extent *async_extent,
+				     struct page *locked_page)
+{
+	u64 start = async_extent->start;
+	u64 end = async_extent->start + async_extent->ram_size - 1;
+	unsigned long nr_written = 0;
+	int page_started = 0;
+	int ret;
+
+	/*
+	 * Call cow_file_range() to run the delalloc range directly,
+	 * since we won't go to nocow or async path again.
+	 *
+	 * Also we call cow_file_range() with @unlock_page == 0, so that we
+	 * can directly submit them without interruption.
+	 */
+	ret = cow_file_range(inode, locked_page, start, end, &page_started,
+			     &nr_written, 0);
+	/* Inline extent inserted, page gets unlocked and everything is done */
+	if (ret > 0) {
+		ret = 0;
+		goto out;
+	}
+	if (ret < 0) {
+		if (locked_page)
+			unlock_page(locked_page);
+		goto out;
+	}
+
+	ret = extent_write_locked_range(&inode->vfs_inode, start, end);
+	/* All pages will be unlocked, including @locked_page */
+out:
+	kfree(async_extent);
+	return ret;
+}
+
 static int submit_one_async_extent(struct btrfs_inode *inode,
 				   struct async_chunk *async_chunk,
 				   struct async_extent *async_extent,
@@ -847,38 +884,29 @@ static int submit_one_async_extent(struct btrfs_inode *inode,
 	struct btrfs_root *root = inode->root;
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	struct btrfs_key ins;
+	struct page *locked_page = NULL;
 	struct extent_map *em;
 	int ret = 0;
 	u64 start = async_extent->start;
 	u64 end = async_extent->start + async_extent->ram_size - 1;
 
+	/*
+	 * If async_chunk->locked_page is in the async_extent range, we
+	 * need to handle it.
+	 */
+	if (async_chunk->locked_page) {
+		u64 locked_page_start = page_offset(async_chunk->locked_page);
+		u64 locked_page_end = locked_page_start + PAGE_SIZE - 1;
+
+		if (!(start >= locked_page_end || end <= locked_page_start))
+			locked_page = async_chunk->locked_page;
+	}
 	lock_extent(io_tree, start, end);
 
 	/* We have fall back to uncompressed write */
-	if (!async_extent->pages) {
-		int page_started = 0;
-		unsigned long nr_written = 0;
-
-		/*
-		 * Call cow_file_range() to run the delalloc range directly,
-		 * since we won't go to nocow or async path again.
-		 */
-		ret = cow_file_range(inode, async_chunk->locked_page,
-				     start, end, &page_started, &nr_written, 0);
-		/*
-		 * If @page_started, cow_file_range() inserted an
-		 * inline extent and took care of all the unlocking
-		 * and IO for us.  Otherwise, we need to submit
-		 * all those pages down to the drive.
-		 */
-		if (!page_started && !ret)
-			extent_write_locked_range(&inode->vfs_inode, start,
-						  end);
-		else if (ret && async_chunk->locked_page)
-			unlock_page(async_chunk->locked_page);
-		kfree(async_extent);
-		return ret;
-	}
+	if (!async_extent->pages)
+		return submit_uncompressed_range(inode, async_extent,
+						 locked_page);
 
 	ret = btrfs_reserve_extent(root, async_extent->ram_size,
 				   async_extent->compressed_size,
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 22/27] btrfs: rework lzo_compress_pages() to make it subpage compatible
  2021-07-13  6:14 [PATCH 00/27] btrfs: limited subpage compressed write support Qu Wenruo
                   ` (20 preceding siblings ...)
  2021-07-13  6:15 ` [PATCH 21/27] btrfs: extract uncompressed async extent submission code into a new helper Qu Wenruo
@ 2021-07-13  6:15 ` Qu Wenruo
  2021-07-13  6:15 ` [PATCH 23/27] btrfs: teach __extent_writepage() to handle locked page differently Qu Wenruo
                   ` (5 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Qu Wenruo @ 2021-07-13  6:15 UTC (permalink / raw)
  To: linux-btrfs

There are several problems in lzo_compress_pages() preventing it from
being subpage compatible:

- No page offset is calculated when reading from inode pages
  For subpage case, we could have @start which is not aligned to
  PAGE_SIZE.

  Thus the location we read data from must take the offset within the
  page into consideration.

- The padding for the segment header is bound to PAGE_SIZE
  This means that for the subpage case we would skip several corner cases
  where, on x86 machines, padding zeros need to be added (see the sketch
  after this list).
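
For reference, a sketch of the sector based padding rule used by the new
helper below (assuming 4K sectorsize and LZO_LEN == 4):

	sector_bytes_left = round_up(cur_out, sectorsize) - cur_out;
	if (sector_bytes_left < LZO_LEN) {
		/* E.g. cur_out == 4094: only 2 bytes left in the sector, not
		 * enough for a 4 byte segment header, so write 2 padding
		 * zeros and let the next header start at offset 4096.
		 */
		memset(page_address(cur_page) + offset_in_page(cur_out), 0,
		       sector_bytes_left);
		cur_out += sector_bytes_left;
	}

So a segment header never crosses a sector boundary, and at most
LZO_LEN - 1 bytes of padding zeros are added per sector.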

The rework will:

- Update the comment to replace "page" with "sector"

- Introduce a new helper, copy_compressed_data_to_page(), to do the copy
  So that we don't need to handle page switching for both input and
  output in the same loop.

  Now in lzo_compress_pages() we only care about page switching for the
  input, while in copy_compressed_data_to_page() we only care about the
  page switching for the output.

- Only one main cursor
  For lzo_compress_pages() we use @cur_in as the main cursor.
  It will be the file offset we are currently at.

  All other helper variables will be only declared inside the loop.

  For copy_compressed_data_to_page() it's similar, we will have
  @cur_out as the main cursor, which records how many bytes are in the
  output.

- Get rid of kmap()/kunmap()
  Instead of using __GFP_HIGHMEM and having to do kmap()/kunmap(), just
  get rid of that GFP flag, so we can use page_address() and never
  bother with kmap()/kunmap() at all.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/lzo.c | 280 +++++++++++++++++++++++++------------------------
 1 file changed, 144 insertions(+), 136 deletions(-)

diff --git a/fs/btrfs/lzo.c b/fs/btrfs/lzo.c
index 3ad10bb41723..574d1d7e7414 100644
--- a/fs/btrfs/lzo.c
+++ b/fs/btrfs/lzo.c
@@ -32,19 +32,19 @@
  *     payload.
  *     One regular LZO compressed extent can have one or more segments.
  *     For inlined LZO compressed extent, only one segment is allowed.
- *     One segment represents at most one page of uncompressed data.
+ *     One segment represents at most one sector of uncompressed data.
  *
  * 2.1 Segment header
  *     Fixed size. LZO_LEN (4) bytes long, LE32.
  *     Records the total size of the segment (not including the header).
- *     Segment header never crosses page boundary, thus it's possible to
- *     have at most 3 padding zeros at the end of the page.
+ *     Segment header never crosses sector boundary, thus it's possible to
+ *     have at most 3 padding zeros at the end of the sector.
  *
  * 2.2 Data Payload
- *     Variable size. Size up limit should be lzo1x_worst_compress(PAGE_SIZE)
- *     which is 4419 for a 4KiB page.
+ *     Variable size. Size up limit should be lzo1x_worst_compress(sectorsize)
+ *     which is 4419 for a 4KiB sectorsize.
  *
- * Example:
+ * Example with 4K sectorsize:
  * Page 1:
  *          0     0x2   0x4   0x6   0x8   0xa   0xc   0xe     0x10
  * 0x0000   |  Header   | SegHdr 01 | Data payload 01 ...     |
@@ -112,163 +112,171 @@ static inline size_t read_compress_length(const char *buf)
 	return le32_to_cpu(dlen);
 }
 
+/*
+ * Will do:
+ *
+ * - Write a segment header into the destination
+ * - Copy the compressed buffer into the destination
+ * - Make sure we have enough space in the last sector to fit a segment header
+ *   If not, we will pad at most (LZO_LEN (4)) - 1 bytes of zeros.
+ *
+ * Will allocate new pages when needed.
+ */
+static int copy_compressed_data_to_page(char *compressed_data,
+					size_t compressed_size,
+					struct page **out_pages,
+					u32 *cur_out,
+					const u32 sectorsize)
+{
+	u32 sector_bytes_left;
+	u32 orig_out;
+	struct page *cur_page;
+
+	/*
+	 * We never allow a segment header crossing sector boundary, previous
+	 * run should ensure we have enough space left inside the sector.
+	 */
+	ASSERT((*cur_out / sectorsize) ==
+	       (*cur_out + LZO_LEN - 1) / sectorsize);
+
+	cur_page = out_pages[*cur_out / PAGE_SIZE];
+	/* Allocate a new page */
+	if (!cur_page) {
+		cur_page = alloc_page(GFP_NOFS);
+		if (!cur_page)
+			return -ENOMEM;
+		out_pages[*cur_out / PAGE_SIZE] = cur_page;
+	}
+
+	write_compress_length(page_address(cur_page) + offset_in_page(*cur_out),
+			      compressed_size);
+	*cur_out += LZO_LEN;
+
+	orig_out = *cur_out;
+	/* *cur_out is increased, let the main loop to grab a proper page */
+	cur_page = NULL;
+
+	/* Copy compressed data */
+	while (*cur_out - orig_out < compressed_size) {
+		u32 copy_len = min_t(u32, sectorsize - *cur_out % sectorsize,
+				     orig_out + compressed_size - *cur_out);
+
+		/* Grab a page or allocate a new one */
+		if (!cur_page) {
+			cur_page = out_pages[*cur_out / PAGE_SIZE];
+			if (!cur_page) {
+				cur_page = alloc_page(GFP_NOFS);
+				if (!cur_page)
+					return -ENOMEM;
+				out_pages[*cur_out / PAGE_SIZE] = cur_page;
+			}
+		}
+
+		memcpy(page_address(cur_page) + offset_in_page(*cur_out),
+		       compressed_data + *cur_out - orig_out, copy_len);
+
+		*cur_out += copy_len;
+
+		/* If we reached page boundary, go to next page */
+		if (IS_ALIGNED(*cur_out, PAGE_SIZE)) {
+			/* Let next iteration to grab a page */
+			cur_page = NULL;
+		}
+	}
+
+	/*
+	 * Check if we can fit the next segment header into the remaining space
+	 * of the sector.
+	 */
+	sector_bytes_left = round_up(*cur_out, sectorsize) - *cur_out;
+	if (sector_bytes_left >= LZO_LEN)
+		return 0;
+
+	/* The remaining size is not enough, pad it with zeros */
+	memset(page_address(cur_page) + offset_in_page(*cur_out), 0,
+	       sector_bytes_left);
+	*cur_out += sector_bytes_left;
+	return 0;
+}
+
 int lzo_compress_pages(struct list_head *ws, struct address_space *mapping,
 		u64 start, struct page **pages, unsigned long *out_pages,
 		unsigned long *total_in, unsigned long *total_out)
 {
 	struct workspace *workspace = list_entry(ws, struct workspace, list);
+	const u32 sectorsize = btrfs_sb(mapping->host->i_sb)->sectorsize;
+	struct page *page_in = NULL;
 	int ret = 0;
-	char *data_in;
-	char *cpage_out, *sizes_ptr;
-	int nr_pages = 0;
-	struct page *in_page = NULL;
-	struct page *out_page = NULL;
-	unsigned long bytes_left;
-	unsigned long len = *total_out;
-	unsigned long nr_dest_pages = *out_pages;
-	const unsigned long max_out = nr_dest_pages * PAGE_SIZE;
-	size_t in_len;
-	size_t out_len;
-	char *buf;
-	unsigned long tot_in = 0;
-	unsigned long tot_out = 0;
-	unsigned long pg_bytes_left;
-	unsigned long out_offset;
-	unsigned long bytes;
+	u64 cur_in = start;	/* Points to the file offset of input data */
+	u32 cur_out = 0;	/* Points to the current output byte */
+	u32 len = *total_out;
 
 	*out_pages = 0;
 	*total_out = 0;
 	*total_in = 0;
 
-	in_page = find_get_page(mapping, start >> PAGE_SHIFT);
-	data_in = page_address(in_page);
-
 	/*
-	 * store the size of all chunks of compressed data in
-	 * the first 4 bytes
+	 * Skip the header for now, we will later come back and write the total
+	 * compressed size
 	 */
-	out_page = alloc_page(GFP_NOFS);
-	if (out_page == NULL) {
-		ret = -ENOMEM;
-		goto out;
-	}
-	cpage_out = page_address(out_page);
-	out_offset = LZO_LEN;
-	tot_out = LZO_LEN;
-	pages[0] = out_page;
-	nr_pages = 1;
-	pg_bytes_left = PAGE_SIZE - LZO_LEN;
-
-	/* compress at most one page of data each time */
-	in_len = min(len, PAGE_SIZE);
-	while (tot_in < len) {
-		ret = lzo1x_1_compress(data_in, in_len, workspace->cbuf,
-				       &out_len, workspace->mem);
-		if (ret != LZO_E_OK) {
-			pr_debug("BTRFS: lzo in loop returned %d\n",
-			       ret);
+	cur_out += LZO_LEN;
+	while (cur_in < start + len) {
+		u32 sector_off = (cur_in - start) % sectorsize;
+		u32 in_len;
+		size_t out_len;
+
+		/* Get the input page first */
+		if (!page_in) {
+			page_in = find_get_page(mapping, cur_in >> PAGE_SHIFT);
+			ASSERT(page_in);
+		}
+
+		/* Compress at most one sector of data each time */
+		in_len = min_t(u32, start + len - cur_in,
+			       sectorsize - sector_off);
+		ASSERT(in_len);
+		ret = lzo1x_1_compress(page_address(page_in) +
+				       offset_in_page(cur_in), in_len,
+				       workspace->cbuf, &out_len,
+				       workspace->mem);
+		if (ret < 0) {
+			pr_debug("BTRFS: lzo in loop returned %d\n", ret);
 			ret = -EIO;
 			goto out;
 		}
 
-		/* store the size of this chunk of compressed data */
-		write_compress_length(cpage_out + out_offset, out_len);
-		tot_out += LZO_LEN;
-		out_offset += LZO_LEN;
-		pg_bytes_left -= LZO_LEN;
-
-		tot_in += in_len;
-		tot_out += out_len;
-
-		/* copy bytes from the working buffer into the pages */
-		buf = workspace->cbuf;
-		while (out_len) {
-			bytes = min_t(unsigned long, pg_bytes_left, out_len);
-
-			memcpy(cpage_out + out_offset, buf, bytes);
-
-			out_len -= bytes;
-			pg_bytes_left -= bytes;
-			buf += bytes;
-			out_offset += bytes;
-
-			/*
-			 * we need another page for writing out.
-			 *
-			 * Note if there's less than 4 bytes left, we just
-			 * skip to a new page.
-			 */
-			if ((out_len == 0 && pg_bytes_left < LZO_LEN) ||
-			    pg_bytes_left == 0) {
-				if (pg_bytes_left) {
-					memset(cpage_out + out_offset, 0,
-					       pg_bytes_left);
-					tot_out += pg_bytes_left;
-				}
-
-				/* we're done, don't allocate new page */
-				if (out_len == 0 && tot_in >= len)
-					break;
-
-				if (nr_pages == nr_dest_pages) {
-					out_page = NULL;
-					ret = -E2BIG;
-					goto out;
-				}
-
-				out_page = alloc_page(GFP_NOFS);
-				if (out_page == NULL) {
-					ret = -ENOMEM;
-					goto out;
-				}
-				cpage_out = page_address(out_page);
-				pages[nr_pages++] = out_page;
-
-				pg_bytes_left = PAGE_SIZE;
-				out_offset = 0;
-			}
-		}
+		ret = copy_compressed_data_to_page(workspace->cbuf, out_len,
+						   pages, &cur_out, sectorsize);
+		if (ret < 0)
+			goto out;
+
+		cur_in += in_len;
 
-		/* we're making it bigger, give up */
-		if (tot_in > 8192 && tot_in < tot_out) {
+		/*
+		 * Check if we're making it bigger after two sectors.
+		 * And if we're making it bigger, give up.
+		 */
+		if (cur_in - start > sectorsize * 2 &&
+		    cur_in - start < cur_out) {
 			ret = -E2BIG;
 			goto out;
 		}
 
-		/* we're all done */
-		if (tot_in >= len)
-			break;
-
-		if (tot_out > max_out)
-			break;
-
-		bytes_left = len - tot_in;
-		put_page(in_page);
-
-		start += PAGE_SIZE;
-		in_page = find_get_page(mapping, start >> PAGE_SHIFT);
-		data_in = page_address(in_page);
-		in_len = min(bytes_left, PAGE_SIZE);
-	}
-
-	if (tot_out >= tot_in) {
-		ret = -E2BIG;
-		goto out;
+		/* Check if we have reached page boundary */
+		if (IS_ALIGNED(cur_in, PAGE_SIZE)) {
+			put_page(page_in);
+			page_in = NULL;
+		}
 	}
 
-	/* store the size of all chunks of compressed data */
-	sizes_ptr = page_address(pages[0]);
-	write_compress_length(sizes_ptr, tot_out);
+	/* Store the size of all chunks of compressed data */
+	write_compress_length(page_address(pages[0]), cur_out);
 
 	ret = 0;
-	*total_out = tot_out;
-	*total_in = tot_in;
+	*total_out = cur_out;
+	*total_in = cur_in - start;
 out:
-	*out_pages = nr_pages;
-
-	if (in_page)
-		put_page(in_page);
-
+	*out_pages = DIV_ROUND_UP(cur_out, PAGE_SIZE);
 	return ret;
 }
 
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 23/27] btrfs: teach __extent_writepage() to handle locked page differently
  2021-07-13  6:14 [PATCH 00/27] btrfs: limited subpage compressed write support Qu Wenruo
                   ` (21 preceding siblings ...)
  2021-07-13  6:15 ` [PATCH 22/27] btrfs: rework lzo_compress_pages() to make it subpage compatible Qu Wenruo
@ 2021-07-13  6:15 ` Qu Wenruo
  2021-07-13  6:15 ` [PATCH 24/27] btrfs: allow page to be unlocked by btrfs_page_end_writer_lock() even if it's locked by plain page_lock() Qu Wenruo
                   ` (4 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Qu Wenruo @ 2021-07-13  6:15 UTC (permalink / raw)
  To: linux-btrfs

Pages passed to __extent_writepage() are always locked, but they may be
locked by different functions.

There are two types of locked pages for __extent_writepage():

- Page locked by plain lock_page()
  It should not have any subpage::writers count.
  Can be unlocked by unlock_page().
  This is the most common locked page for __extent_writepage() called
  inside extent_write_cache_pages() or extent_write_full_page().
  Rarer cases include the @locked_page from extent_write_locked_range().

- Page locked by lock_delalloc_pages()
  There is only one such caller, extent_write_locked_range(), and this
  applies to all its pages except @locked_page.
  In this case, we have to call the subpage helper to handle it.

So here we introduce a helper, btrfs_page_unlock_writer(), to allow
__extent_writepage() to unlock different locked pages.

And since the pages of all other callers of __extent_writepage() are
guaranteed to be locked by plain lock_page(), also add an extra check on
epd::extent_locked so that such pages are unlocked directly.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 15 ++++++++++++++-
 fs/btrfs/subpage.c   | 43 +++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/subpage.h   |  2 ++
 3 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index f6be630cf9ef..ae5689a29660 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4060,6 +4060,7 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
 			      struct extent_page_data *epd)
 {
 	struct inode *inode = page->mapping->host;
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	const u64 page_start = page_offset(page);
 	const u64 page_end = page_start + PAGE_SIZE - 1;
 	int ret;
@@ -4118,7 +4119,19 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
 		ret = ret < 0 ? ret : -EIO;
 		end_extent_writepage(page, ret, page_start, page_end);
 	}
-	unlock_page(page);
+	if (epd->extent_locked) {
+		/*
+		 * If epd->extent_locked, it's from extent_write_locked_range(),
+		 * the page can either be locked by lock_page() or
+		 * process_one_page().
+		 * Let btrfs_page_unlock_writer() handle both cases.
+		 */
+		ASSERT(wbc);
+		btrfs_page_unlock_writer(fs_info, page, wbc->range_start,
+					 wbc->range_end + 1 - wbc->range_start);
+	} else {
+		unlock_page(page);
+	}
 	ASSERT(ret <= 0);
 	return ret;
 }
diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index b8a420cb1683..3e0416e7e48b 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -608,3 +608,46 @@ void btrfs_page_assert_not_dirty(const struct btrfs_fs_info *fs_info,
 	ASSERT(PagePrivate(page) && page->private);
 	ASSERT(subpage->dirty_bitmap == 0);
 }
+
+/*
+ * Helper to handle different locked page with different page size
+ * - Page locked by plain lock_page()
+ *   It should not have any subpage::writers count.
+ *   Can be unlocked by unlock_page().
+ *   This is the most common locked page for __extent_writepage() called
+ *   inside extent_write_cache_pages() or extent_write_full_page().
+ *   Rarer cases include the @locked_page from extent_write_locked_range().
+ *
+ * - Page locked by lock_delalloc_pages()
+ *   There is only one caller, all pages except @locked_page for
+ *   extent_write_locked_range().
+ *   In this case, we have to call subpage helper to handle the case.
+ */
+void btrfs_page_unlock_writer(struct btrfs_fs_info *fs_info, struct page *page,
+			      u64 start, u32 len)
+{
+	struct btrfs_subpage *subpage;
+
+	ASSERT(PageLocked(page));
+	/* For regular page size case, we just unlock the page */
+	if (fs_info->sectorsize == PAGE_SIZE)
+		return unlock_page(page);
+
+	ASSERT(PagePrivate(page) && page->private);
+	subpage = (struct btrfs_subpage *)page->private;
+
+	/*
+	 * For subpage case, there are two types of locked page.
+	 * With or without writers number.
+	 *
+	 * Since we own the page lock, no one else could touch
+	 * subpage::writers and we are safe to do several atomic operations
+	 * without spinlock.
+	 */
+	if (atomic_read(&subpage->writers) == 0)
+		/* No writers, locked by plain lock_page() */
+		return unlock_page(page);
+
+	/* Have writers, use proper subpage helper to end it */
+	btrfs_page_end_writer_lock(fs_info, page, start, len);
+}
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index 6fb54b22b295..d725802f1f39 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -140,5 +140,7 @@ bool btrfs_subpage_clear_and_test_dirty(const struct btrfs_fs_info *fs_info,
  */
 void btrfs_page_assert_not_dirty(const struct btrfs_fs_info *fs_info,
 				 struct page *page);
+void btrfs_page_unlock_writer(struct btrfs_fs_info *fs_info, struct page *page,
+			      u64 start, u32 len);
 
 #endif
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 24/27] btrfs: allow page to be unlocked by btrfs_page_end_writer_lock() even if it's locked by plain page_lock()
  2021-07-13  6:14 [PATCH 00/27] btrfs: limited subpage compressed write support Qu Wenruo
                   ` (22 preceding siblings ...)
  2021-07-13  6:15 ` [PATCH 23/27] btrfs: teach __extent_writepage() to handle locked page differently Qu Wenruo
@ 2021-07-13  6:15 ` Qu Wenruo
  2021-07-13  6:15 ` [PATCH 25/27] btrfs: allow subpage to compress a range which only covers one page Qu Wenruo
                   ` (3 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Qu Wenruo @ 2021-07-13  6:15 UTC (permalink / raw)
  To: linux-btrfs

There are several call sites of extent_clear_unlock_delalloc() which
get @locked_page == NULL.
In that case extent_clear_unlock_delalloc() will try to call
process_one_page() to unlock every page, even if the first page is not
locked by btrfs_page_start_writer_lock().

This will trigger an ASSERT() in btrfs_subpage_end_and_test_writer() as
previously we required every page passed to
btrfs_subpage_end_and_test_writer() to be locked by
btrfs_page_start_writer_lock().

But the compression path doesn't go that way.

Thankfully it's not hard to distinguish pages locked by lock_page()
from those locked by btrfs_page_start_writer_lock().

So do the check in btrfs_subpage_end_and_test_writer(), so that it can
now handle both cases well.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/subpage.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index 3e0416e7e48b..7d0f4f1ce0bf 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -248,6 +248,17 @@ bool btrfs_subpage_end_and_test_writer(const struct btrfs_fs_info *fs_info,
 
 	btrfs_subpage_assert(fs_info, page, start, len);
 
+	/*
+	 * We have call sites passing @lock_page into
+	 * extent_clear_unlock_delalloc() for compression path.
+	 *
+	 * Such a @locked_page is locked by plain lock_page(), thus its
+	 * subpage::writers is 0.
+	 * Handle it specially.
+	 */
+	if (atomic_read(&subpage->writers) == 0)
+		return true;
+
 	ASSERT(atomic_read(&subpage->writers) >= nbits);
 	return atomic_sub_and_test(nbits, &subpage->writers);
 }
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 25/27] btrfs: allow subpage to compress a range which only covers one page
  2021-07-13  6:14 [PATCH 00/27] btrfs: limited subpage compressed write support Qu Wenruo
                   ` (23 preceding siblings ...)
  2021-07-13  6:15 ` [PATCH 24/27] btrfs: allow page to be unlocked by btrfs_page_end_writer_lock() even if it's locked by plain page_lock() Qu Wenruo
@ 2021-07-13  6:15 ` Qu Wenruo
  2021-07-13  6:15 ` [PATCH 26/27] btrfs: don't run delalloc range which is beyond the locked_page to prevent deadlock for subpage compression Qu Wenruo
                   ` (2 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Qu Wenruo @ 2021-07-13  6:15 UTC (permalink / raw)
  To: linux-btrfs

[BUG]
With experimental subpage compressed write enabled, btrfs still refuses
to compress a completely valid range:

 mkfs.btrfs -f -s 4k $dev
 mount $dev -o compress $mnt

 xfs_io -f -c "pwrite 0 64K" $mnt/file
 umount $mnt

The resulting extent is uncompressed.

[CAUSE]
It turns out we have an extra check in compress_file_range() on whether
we should compress the range.

The check looks like:

	if (nr_pages > 1 && inode_need_compress(BTRFS_I(inode), start, end)) {

This means we will only compress a range which covers more than one page.

This check no longer makes sense for subpage, as one page can contain
several sectors.

[FIX]
Don't use @nr_pages, but directly compare @total_compressed against
@blocksize to determine whether we need to do compression.
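
With the reproducer above (4K sectorsize, 64K page size, one 64K buffered
write), the difference looks like this (values for illustration only):

	nr_pages         = 1	/* the whole 64K write fits in one page */
	total_compressed = 64K
	blocksize        = 4K	/* sectorsize */

	/* Old check: nr_pages > 1                 -> false, never compressed
	 * New check: total_compressed > blocksize -> 64K > 4K, compression
	 *            is attempted as expected
	 */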

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1432e268c13e..1cab1e99e46e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -630,7 +630,8 @@ static noinline int compress_file_range(struct async_chunk *async_chunk)
 	 * inode has not been flagged as nocompress.  This flag can
 	 * change at any time if we discover bad compression ratios.
 	 */
-	if (nr_pages > 1 && inode_need_compress(BTRFS_I(inode), start, end)) {
+	if (total_compressed > blocksize &&
+	    inode_need_compress(BTRFS_I(inode), start, end)) {
 		WARN_ON(pages);
 		pages = kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS);
 		if (!pages) {
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 26/27] btrfs: don't run delalloc range which is beyond the locked_page to prevent deadlock for subpage compression
  2021-07-13  6:14 [PATCH 00/27] btrfs: limited subpage compressed write support Qu Wenruo
                   ` (24 preceding siblings ...)
  2021-07-13  6:15 ` [PATCH 25/27] btrfs: allow subpage to compress a range which only covers one page Qu Wenruo
@ 2021-07-13  6:15 ` Qu Wenruo
  2021-07-13  6:15 ` [PATCH 27/27] btrfs: only allow subpage compression if the range is fully page aligned Qu Wenruo
  2021-07-16  9:11 ` [PATCH 00/27] btrfs: limited subpage compressed write support Qu Wenruo
  27 siblings, 0 replies; 32+ messages in thread
From: Qu Wenruo @ 2021-07-13  6:15 UTC (permalink / raw)
  To: linux-btrfs

[BUG]
With experimental subpage compression enabled, a simple fsstress can
lead to a self-deadlock on page 720896:

        mkfs.btrfs -f -s 4k $dev > /dev/null
        mount $dev -o compress $mnt
        $fsstress -p 1 -n 100 -w -d $mnt -v -s 1625511156

[CAUSE]
If we have a file layout like the one below:

	0	32K	64K	96K	128K
	|//|		|///////////////|
	   4K
Then when we run the delalloc ranges for the inode, it will:

- Call find_lock_delalloc_range() with @delalloc_start = 0
  Then we get a delalloc range [0, 4K).

  This range will be CoWed.

- Call find_lock_delalloc_range() again with @delalloc_start = 4K
  Since find_lock_delalloc_range() never cares whether the range is
  still inside the page range [0, 64K), it will return the range
  [64K, 128K).

  This range meets the condition for subpage compression and will go
  through the async cow path.

  And the async cow path will return @page_started.

  But that @page_started is now for range [64K, 128K), not for range
  [0, 64K).

- writepage_delalloc() returns 1 for page [0, 64K)
  Thus page [0, 64K) will not be unlocked, nor will its page dirty
  status be cleared.

The next time we try to lock page [0, 64K) we will deadlock, as there
is no one left to release page [0, 64K).

This problem will never happen for regular page size as one page only
contains one sector.
After the first find_lock_delalloc_range() call, @delalloc_end will go
beyond @page_end no matter whether we found a delalloc range or not.

Thus this bug only happens for subpage, as we now need multiple runs to
exhaust the delalloc ranges of a page.

[FIX]
Fix the problem by ensuring that the delalloc range we run at least
starts inside @locked_page.

So that we will never get an incorrect @page_started (a short
walkthrough follows the list below).

And to prevent such a problem from happening again:

- Make find_lock_delalloc_range() return false if the found range is
  beyond the @end value passed in.

  Since @end will be utilized now, add an ASSERT() to ensure we pass a
  correct @end into find_lock_delalloc_range().

  This also means that for selftests we need to populate @end before
  calling find_lock_delalloc_range().

- New ASSERT() in find_lock_delalloc_range()
  Now we will make sure the @start/@end passed in at least covers part
  of the page.

- New ASSERT() in run_delalloc_range()
  To make sure the range at least starts inside @locked_page.

- Use @delalloc_start as the proper cursor, while @delalloc_end is
  always reset to @page_end.
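
A short walkthrough with the layout from [CAUSE] above (4K sectorsize,
64K page size, delalloc at [0, 4K) and [64K, 128K), locked page
[0, 64K)), following the new code as a sketch:

	/* Second iteration of the loop in writepage_delalloc() */
	delalloc_start = 4K;
	delalloc_end = page_end;	/* 64K - 1, always reset now */

	/* btrfs_find_delalloc_range() finds [64K, 128K), but its start (64K)
	 * is beyond orig_end (64K - 1), so find_lock_delalloc_range() now
	 * returns false and clamps the output:
	 *	*start = 64K;
	 *	*end   = min(128K - 1, 64K - 1) = 64K - 1;
	 * No range beyond the locked page is run, @page_started can no
	 * longer refer to a different page, and page [0, 64K) gets unlocked
	 * normally.
	 */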

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c             | 36 ++++++++++++++++++++++++--------
 fs/btrfs/inode.c                 |  7 +++++++
 fs/btrfs/tests/extent-io-tests.c | 12 +++++------
 3 files changed, 40 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index ae5689a29660..fff1a4d8fe25 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1974,10 +1974,18 @@ static noinline int lock_delalloc_pages(struct inode *inode,
 
 /*
  * Find and lock a contiguous range of bytes in the file marked as delalloc, no
- * more than @max_bytes.  @Start and @end are used to return the range,
+ * more than @max_bytes.
  *
- * Return: true if we find something
- *         false if nothing was in the tree
+ * @start:	The original start bytenr to search.
+ *		Will store the extent range start bytenr.
+ * @end:	The original end bytenr of the search range
+ *		Will store the extent range end bytenr.
+ *
+ * Return true if we find a delalloc range which starts inside the original
+ * range, and @start/@end will store the delalloc range start/end.
+ *
+ * Return false if we can't find any delalloc range which starts inside the
+ * original range, and @start/@end will be the non-delalloc range start/end.
  */
 EXPORT_FOR_TESTS
 noinline_for_stack bool find_lock_delalloc_range(struct inode *inode,
@@ -1985,6 +1993,8 @@ noinline_for_stack bool find_lock_delalloc_range(struct inode *inode,
 				    u64 *end)
 {
 	struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
+	const u64 orig_start = *start;
+	const u64 orig_end = *end;
 	u64 max_bytes = BTRFS_MAX_EXTENT_SIZE;
 	u64 delalloc_start;
 	u64 delalloc_end;
@@ -1993,15 +2003,23 @@ noinline_for_stack bool find_lock_delalloc_range(struct inode *inode,
 	int ret;
 	int loops = 0;
 
+	/* Caller should pass a valid @end to indicate the search range end */
+	ASSERT(orig_end > orig_start);
+
+	/* The range should at least cover part of the page */
+	ASSERT(!(orig_start >= page_offset(locked_page) + PAGE_SIZE ||
+		 orig_end <= page_offset(locked_page)));
 again:
 	/* step one, find a bunch of delalloc bytes starting at start */
 	delalloc_start = *start;
 	delalloc_end = 0;
 	found = btrfs_find_delalloc_range(tree, &delalloc_start, &delalloc_end,
 					  max_bytes, &cached_state);
-	if (!found || delalloc_end <= *start) {
+	if (!found || delalloc_end <= *start || delalloc_start > orig_end) {
 		*start = delalloc_start;
-		*end = delalloc_end;
+
+		/* @delalloc_end can be -1, never go beyond @orig_end */
+		*end = min(delalloc_end, orig_end);
 		free_extent_state(cached_state);
 		return false;
 	}
@@ -3777,16 +3795,16 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
 		struct page *page, struct writeback_control *wbc,
 		unsigned long *nr_written)
 {
-	u64 page_end = page_offset(page) + PAGE_SIZE - 1;
-	bool found;
+	const u64 page_end = page_offset(page) + PAGE_SIZE - 1;
 	u64 delalloc_start = page_offset(page);
 	u64 delalloc_to_write = 0;
-	u64 delalloc_end = 0;
 	int ret;
 	int page_started = 0;
 
+	while (delalloc_start < page_end) {
+		u64 delalloc_end = page_end;
+		bool found;
 
-	while (delalloc_end < page_end) {
 		found = find_lock_delalloc_range(&inode->vfs_inode, page,
 					       &delalloc_start,
 					       &delalloc_end);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1cab1e99e46e..4086e3364acb 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1965,6 +1965,13 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct page *locked_page
 	int ret;
 	const bool zoned = btrfs_is_zoned(inode->root->fs_info);
 
+	/*
+	 * The range must cover part of the @locked_page, or the returned
+	 * @page_started can confuse the caller.
+	 */
+	ASSERT(!(end <= page_offset(locked_page) ||
+		 start >= page_offset(locked_page) + PAGE_SIZE));
+
 	if (should_nocow(inode, start, end)) {
 		ASSERT(!zoned);
 		ret = run_delalloc_nocow(inode, locked_page, start, end,
diff --git a/fs/btrfs/tests/extent-io-tests.c b/fs/btrfs/tests/extent-io-tests.c
index 73e96d505f4f..c2e72e7a8ff0 100644
--- a/fs/btrfs/tests/extent-io-tests.c
+++ b/fs/btrfs/tests/extent-io-tests.c
@@ -112,7 +112,7 @@ static int test_find_delalloc(u32 sectorsize)
 	 */
 	set_extent_delalloc(tmp, 0, sectorsize - 1, 0, NULL);
 	start = 0;
-	end = 0;
+	end = start + PAGE_SIZE - 1;
 	found = find_lock_delalloc_range(inode, locked_page, &start,
 					 &end);
 	if (!found) {
@@ -143,7 +143,7 @@ static int test_find_delalloc(u32 sectorsize)
 	}
 	set_extent_delalloc(tmp, sectorsize, max_bytes - 1, 0, NULL);
 	start = test_start;
-	end = 0;
+	end = start + PAGE_SIZE - 1;
 	found = find_lock_delalloc_range(inode, locked_page, &start,
 					 &end);
 	if (!found) {
@@ -177,14 +177,14 @@ static int test_find_delalloc(u32 sectorsize)
 		goto out_bits;
 	}
 	start = test_start;
-	end = 0;
+	end = start + PAGE_SIZE - 1;
 	found = find_lock_delalloc_range(inode, locked_page, &start,
 					 &end);
 	if (found) {
 		test_err("found range when we shouldn't have");
 		goto out_bits;
 	}
-	if (end != (u64)-1) {
+	if (end != test_start + PAGE_SIZE - 1) {
 		test_err("did not return the proper end offset");
 		goto out_bits;
 	}
@@ -198,7 +198,7 @@ static int test_find_delalloc(u32 sectorsize)
 	 */
 	set_extent_delalloc(tmp, max_bytes, total_dirty - 1, 0, NULL);
 	start = test_start;
-	end = 0;
+	end = start + PAGE_SIZE - 1;
 	found = find_lock_delalloc_range(inode, locked_page, &start,
 					 &end);
 	if (!found) {
@@ -233,7 +233,7 @@ static int test_find_delalloc(u32 sectorsize)
 	/* We unlocked it in the previous test */
 	lock_page(locked_page);
 	start = test_start;
-	end = 0;
+	end = start + PAGE_SIZE - 1;
 	/*
 	 * Currently if we fail to find dirty pages in the delalloc range we
 	 * will adjust max_bytes down to PAGE_SIZE and then re-search.  If
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 27/27] btrfs: only allow subpage compression if the range is fully page aligned
  2021-07-13  6:14 [PATCH 00/27] btrfs: limited subpage compressed write support Qu Wenruo
                   ` (25 preceding siblings ...)
  2021-07-13  6:15 ` [PATCH 26/27] btrfs: don't run delalloc range which is beyond the locked_page to prevent deadlock for subpage compression Qu Wenruo
@ 2021-07-13  6:15 ` Qu Wenruo
  2021-07-16  9:11 ` [PATCH 00/27] btrfs: limited subpage compressed write support Qu Wenruo
  27 siblings, 0 replies; 32+ messages in thread
From: Qu Wenruo @ 2021-07-13  6:15 UTC (permalink / raw)
  To: linux-btrfs

For btrfs compressed write, we use a mechanism called async cow, which,
unlike regular run_delalloc_cow() or cow_file_range(), will also unlock
the first page.

This mechanism allows btrfs to continue handling the next ranges without
waiting for the time-consuming compression.

But this is a problem for the subpage case, as we could have the
following delalloc ranges in one page:

0		32K		64K
|	|///////|	|///////|
		\- A		\- B

In the above case, if we pass both ranges to cow_file_range_async(), both
range A and range B will try to unlock the full page [0, 64K).

And whichever range finishes later will try to do other page operations,
like end_page_writeback(), on an already unlocked page, triggering a VM
layer BUG_ON().

To make subpage compression work at least partially, here we add another
restriction for it: only allow compression if the delalloc range is
fully page aligned (a short sketch follows at the end of this message).

By that, an async extent is always ensured to unlock the first page
exclusively, just as it used to for regular sectorsize.

In theory, we only need to make sure the delalloc range fully covers its
first page, but the trailing page will stay locked anyway, blocking later
writeback until the compression finishes.

Thus here we choose to make sure the range is fully page aligned before
doing the compression.

In the future, we could optimize the situation by properly increasing the
subpage::writers count for the locked page, but that also means we need
to change how we run the delalloc ranges of a page.
(Instead of running each delalloc range we hit, we need to find and lock
all delalloc ranges covering the page, then run each of them.)
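
A minimal sketch of the new restriction (assuming 64K page size, values
only for illustration):

	if (fs_info->sectorsize < PAGE_SIZE) {
		if (!IS_ALIGNED(start, PAGE_SIZE) ||
		    !IS_ALIGNED(end + 1, PAGE_SIZE))
			return 0;	/* no compression for this range */
	}

	/* [0, 128K): start = 0 and end + 1 = 128K are both 64K aligned,
	 *            the range may still go through async cow (compression)
	 * [0, 96K):  end + 1 = 96K is not 64K aligned,
	 *            the range falls back to the regular uncompressed path
	 */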

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 44 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 37 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4086e3364acb..e8af0021af78 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -488,9 +488,6 @@ static noinline int add_async_extent(struct async_chunk *cow,
  */
 static inline bool inode_can_compress(struct btrfs_inode *inode)
 {
-	/* Subpage doesn't support compress yet */
-	if (inode->root->fs_info->sectorsize < PAGE_SIZE)
-		return false;
 	if (inode->flags & BTRFS_INODE_NODATACOW ||
 	    inode->flags & BTRFS_INODE_NODATASUM)
 		return false;
@@ -512,6 +509,38 @@ static inline int inode_need_compress(struct btrfs_inode *inode, u64 start,
 			btrfs_ino(inode));
 		return 0;
 	}
+	/*
+	 * Special check for subpage.
+	 *
+	 * We lock the full page then run each delalloc range in the page, thus
+	 * for the following case, we will hit some subpage specific corner case:
+	 *
+	 * 0		32K		64K
+	 * |	|///////|	|///////|
+	 *		\- A		\- B
+	 *
+	 * In the above case, both range A and range B will try to unlock the
+	 * full page [0, 64K), causing the one that finishes later to find the
+	 * page unlocked already, triggering various page lock BUG_ON()s.
+	 *
+	 * So here we add an artificial limit that subpage compression can only
+	 * happen if the range is fully page aligned.
+	 *
+	 * In theory we only need to ensure the first page is fully covered, but
+	 * the trailing partial page will be locked until the full compression
+	 * finishes, delaying the write of other ranges.
+	 *
+	 * TODO: Make btrfs_run_delalloc_range() to lock all delalloc range
+	 * first to prevent any submitted async extent to unlock the full page.
+	 * By this, we can ensure for subpage case that only the last async_cow
+	 * will unlock the full page.
+	 */
+	if (fs_info->sectorsize < PAGE_SIZE) {
+		if (!IS_ALIGNED(start, PAGE_SIZE) ||
+		    !IS_ALIGNED(end + 1, PAGE_SIZE))
+			return 0;
+	}
+
 	/* force compress */
 	if (btrfs_test_opt(fs_info, FORCE_COMPRESS))
 		return 1;
@@ -613,11 +642,12 @@ static noinline int compress_file_range(struct async_chunk *async_chunk)
 	total_compressed = actual_end - start;
 
 	/*
-	 * skip compression for a small file range(<=blocksize) that
-	 * isn't an inline extent, since it doesn't save disk space at all.
+	 * For subpage case, we require full page alignment for the sector
+	 * aligned range.
+	 * Thus we must also check against @actual_end, not just @end.
 	 */
-	if (total_compressed <= blocksize &&
-	   (start > 0 || end + 1 < BTRFS_I(inode)->disk_i_size))
+	if (blocksize < PAGE_SIZE &&
+	    !IS_ALIGNED(round_up(actual_end, blocksize), PAGE_SIZE))
 		goto cleanup_and_bail_uncompressed;
 
 	total_compressed = min_t(unsigned long, total_compressed,
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 03/27] btrfs: use async_chunk::async_cow to replace the confusing pending pointer
  2021-07-13  6:14 ` [PATCH 03/27] btrfs: use async_chunk::async_cow to replace the confusing pending pointer Qu Wenruo
@ 2021-07-13  7:36   ` Nikolay Borisov
  0 siblings, 0 replies; 32+ messages in thread
From: Nikolay Borisov @ 2021-07-13  7:36 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 13.07.21 г. 9:14, Qu Wenruo wrote:
> For structure async_chunk, we use a very strange member layout to grab
> structure async_cow who owns this async_chunk.
> 
> At initialization, it goes like this:
> 
> 		async_chunk[i].pending = &ctx->num_chunks;
> 
> Then at async_cow_free() we do a super weird freeing:
> 
> 	/*
> 	 * Since the pointer to 'pending' is at the beginning of the array of
> 	 * async_chunk's, freeing it ensures the whole array has been freed.
> 	 */
> 	if (atomic_dec_and_test(async_chunk->pending))
> 		kvfree(async_chunk->pending);
> 
> This is absolutely an abuse of kvfree().
> 
> Replace async_chunk::pending with async_chunk::async_cow, so that we can
> grab the async_cow structure directly, without this strange dancing.
> 
> And with this change, there is no requirement for any specific member
> location.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 07/27] btrfs: add subpage checked_bitmap to make PageChecked flag to be subpage compatible
  2021-07-13  6:14 ` [PATCH 07/27] btrfs: add subpage checked_bitmap to make PageChecked flag to be subpage compatible Qu Wenruo
@ 2021-07-16  7:54   ` Qu Wenruo
  0 siblings, 0 replies; 32+ messages in thread
From: Qu Wenruo @ 2021-07-16  7:54 UTC (permalink / raw)
  To: linux-btrfs



On 2021/7/13 下午2:14, Qu Wenruo wrote:
> Although in btrfs we have very limited user of PageChecked flag, it's
> still some page flag not yet subpage compatible.
> 
> Fix it by introducing btrfs_subpage::checked_bitmap to do the covert.
> 
> For most call sites, especially for free-space cache, COW fixup and
> btrfs_invalidatepage(), they all work in full page mode anyway.
> 
> For other call sites, they work as fully subpage mode.
> 
> Some call sites need extra modification:
> 
> - btrfs_drop_pages()
>    Needs extra parameter to get the real range we need to clear checked
>    flag.
> 
> - btrfs_invalidatepage()
>    We need to call subpage helper before calling __btrfs_releasepage(),
>    or it will trigger ASSERT() as page->private will be cleared.
> 
> - btrfs_verify_data_csum()
>    In theory we don't need the io_bio->csum check anymore, but it's
>    won't hurt.
>    Just change the comment.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>   fs/btrfs/compression.c      | 11 +++++++++--
>   fs/btrfs/file.c             | 20 +++++++++++++++-----
>   fs/btrfs/free-space-cache.c |  6 +++++-
>   fs/btrfs/inode.c            | 30 ++++++++++++++----------------
>   fs/btrfs/reflink.c          |  2 +-
>   fs/btrfs/subpage.c          | 31 +++++++++++++++++++++++++++++++
>   fs/btrfs/subpage.h          |  8 ++++++++
>   7 files changed, 83 insertions(+), 25 deletions(-)
> 
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index f08522e8b4cd..be81e34fbd56 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -29,6 +29,7 @@
>   #include "extent_io.h"
>   #include "extent_map.h"
>   #include "zoned.h"
> +#include "subpage.h"
>   
>   static const char* const btrfs_compress_types[] = { "", "zlib", "lzo", "zstd" };
>   
> @@ -296,8 +297,14 @@ static void end_compressed_bio_read(struct bio *bio)
>   		 * checked so the end_io handlers know about it
>   		 */
>   		ASSERT(!bio_flagged(bio, BIO_CLONED));
> -		bio_for_each_segment_all(bvec, cb->orig_bio, iter_all)
> -			SetPageChecked(bvec->bv_page);
> +		bio_for_each_segment_all(bvec, cb->orig_bio, iter_all) {
> +			u64 bvec_start = page_offset(bvec->bv_page) +
> +					 bvec->bv_offset;
> +
> +			btrfs_page_set_checked(btrfs_sb(cb->inode->i_sb),
> +					bvec->bv_page, bvec_start,
> +					bvec->bv_len);
> +		}
>   
>   		bio_endio(cb->orig_bio);
>   	}
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 0831ca08376f..ccbbf2732685 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -437,9 +437,16 @@ static noinline int btrfs_copy_from_user(loff_t pos, size_t write_bytes,
>   /*
>    * unlocks pages after btrfs_file_write is done with them
>    */
> -static void btrfs_drop_pages(struct page **pages, size_t num_pages)
> +static void btrfs_drop_pages(struct btrfs_fs_info *fs_info,
> +			     struct page **pages, size_t num_pages,
> +			     u64 pos, u64 copied)
>   {
>   	size_t i;
> +	u64 block_start = round_down(pos, fs_info->sectorsize);
> +	u64 block_len = round_up(pos + copied, fs_info->sectorsize) -
> +			block_start;
> +
> +	ASSERT(block_len <= U32_MAX);
>   	for (i = 0; i < num_pages; i++) {
>   		/* page checked is some magic around finding pages that
>   		 * have been modified without going through btrfs_set_page_dirty
> @@ -447,7 +454,8 @@ static void btrfs_drop_pages(struct page **pages, size_t num_pages)
>   		 * accessed as prepare_pages should have marked them accessed
>   		 * in prepare_pages via find_or_create_page()
>   		 */
> -		ClearPageChecked(pages[i]);
> +		btrfs_page_clamp_clear_checked(fs_info, pages[i], block_start,
> +					       block_len);

Currently btrfs_subpage_clamp() can't handle any page whose
page_offset() is beyond @block_start + @block_len.

This leads to rare crashes in generic/269 and generic/102.

Will also update btrfs_subpage_clamp() to handle such a case in the next
update.

Thanks,
Q

>   		unlock_page(pages[i]);
>   		put_page(pages[i]);
>   	}
> @@ -504,7 +512,8 @@ int btrfs_dirty_pages(struct btrfs_inode *inode, struct page **pages,
>   		struct page *p = pages[i];
>   
>   		btrfs_page_clamp_set_uptodate(fs_info, p, start_pos, num_bytes);
> -		ClearPageChecked(p);
> +		btrfs_page_clamp_clear_checked(fs_info, p, start_pos,
> +					       num_bytes);
>   		btrfs_page_clamp_set_dirty(fs_info, p, start_pos, num_bytes);
>   	}
>   
> @@ -1845,7 +1854,8 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>   
>   		btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes);
>   		if (ret) {
> -			btrfs_drop_pages(pages, num_pages);
> +			btrfs_drop_pages(fs_info, pages, num_pages, pos,
> +					 copied);
>   			break;
>   		}
>   
> @@ -1853,7 +1863,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>   		if (only_release_metadata)
>   			btrfs_check_nocow_unlock(BTRFS_I(inode));
>   
> -		btrfs_drop_pages(pages, num_pages);
> +		btrfs_drop_pages(fs_info, pages, num_pages, pos, copied);
>   
>   		cond_resched();
>   
> diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
> index 2131ae5b9ed7..f0a84fe7dc80 100644
> --- a/fs/btrfs/free-space-cache.c
> +++ b/fs/btrfs/free-space-cache.c
> @@ -22,6 +22,7 @@
>   #include "delalloc-space.h"
>   #include "block-group.h"
>   #include "discard.h"
> +#include "subpage.h"
>   
>   #define BITS_PER_BITMAP		(PAGE_SIZE * 8UL)
>   #define MAX_CACHE_BYTES_PER_GIG	SZ_64K
> @@ -417,7 +418,10 @@ static void io_ctl_drop_pages(struct btrfs_io_ctl *io_ctl)
>   
>   	for (i = 0; i < io_ctl->num_pages; i++) {
>   		if (io_ctl->pages[i]) {
> -			ClearPageChecked(io_ctl->pages[i]);
> +			btrfs_page_clear_checked(io_ctl->fs_info,
> +					io_ctl->pages[i],
> +					page_offset(io_ctl->pages[i]),
> +					PAGE_SIZE);
>   			unlock_page(io_ctl->pages[i]);
>   			put_page(io_ctl->pages[i]);
>   		}
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index b524deadb5c6..5c29d131a574 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -2757,7 +2757,8 @@ static void btrfs_writepage_fixup_worker(struct btrfs_work *work)
>   		clear_page_dirty_for_io(page);
>   		SetPageError(page);
>   	}
> -	ClearPageChecked(page);
> +	btrfs_page_clear_checked(inode->root->fs_info, page,
> +				 page_start, PAGE_SIZE);
>   	unlock_page(page);
>   	put_page(page);
>   	kfree(fixup);
> @@ -2812,7 +2813,7 @@ int btrfs_writepage_cow_fixup(struct page *page, u64 start, u64 end)
>   	 * page->mapping outside of the page lock.
>   	 */
>   	ihold(inode);
> -	SetPageChecked(page);
> +	btrfs_page_set_checked(fs_info, page, page_offset(page), PAGE_SIZE);
>   	get_page(page);
>   	btrfs_init_work(&fixup->work, btrfs_writepage_fixup_worker, NULL, NULL);
>   	fixup->page = page;
> @@ -3257,27 +3258,23 @@ unsigned int btrfs_verify_data_csum(struct btrfs_io_bio *io_bio, u32 bio_offset,
>   				    struct page *page, u64 start, u64 end)
>   {
>   	struct inode *inode = page->mapping->host;
> +	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>   	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
>   	struct btrfs_root *root = BTRFS_I(inode)->root;
>   	const u32 sectorsize = root->fs_info->sectorsize;
>   	u32 pg_off;
>   	unsigned int result = 0;
>   
> -	if (PageChecked(page)) {
> -		ClearPageChecked(page);
> +	if (btrfs_page_test_checked(fs_info, page, start, end + 1 - start)) {
> +		btrfs_page_clear_checked(fs_info, page, start, end + 1 - start);
>   		return 0;
>   	}
>   
>   	/*
> -	 * For subpage case, above PageChecked is not safe as it's not subpage
> -	 * compatible.
> -	 * But for now only cow fixup and compressed read utilize PageChecked
> -	 * flag, while in this context we can easily use io_bio->csum to
> -	 * determine if we really need to do csum verification.
> -	 *
> -	 * So for now, just exit if io_bio->csum is NULL, as it means it's
> -	 * compressed read, and its compressed data csum has already been
> -	 * verified.
> +	 * This only happens for NODATASUM or compressed read.
> +	 * Normally this should be covered by above check for compressed read
> +	 * or the next check for NODATASUM.
> +	 * Just do a quicker exit here.
>   	 */
>   	if (io_bio->csum == NULL)
>   		return 0;
> @@ -5083,7 +5080,8 @@ int btrfs_truncate_block(struct btrfs_inode *inode, loff_t from, loff_t len,
>   				     len);
>   		flush_dcache_page(page);
>   	}
> -	ClearPageChecked(page);
> +	btrfs_page_clear_checked(fs_info, page, block_start,
> +				 block_end + 1 - block_start);
>   	btrfs_page_set_dirty(fs_info, page, block_start, block_end + 1 - block_start);
>   	unlock_extent_cached(io_tree, block_start, block_end, &cached_state);
>   
> @@ -8673,9 +8671,9 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
>   	 * did something wrong.
>   	 */
>   	ASSERT(!PageOrdered(page));
> +	btrfs_page_clear_checked(fs_info, page, page_offset(page), PAGE_SIZE);
>   	if (!inode_evicting)
>   		__btrfs_releasepage(page, GFP_NOFS);
> -	ClearPageChecked(page);
>   	clear_page_extent_mapped(page);
>   }
>   
> @@ -8819,7 +8817,7 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
>   		memzero_page(page, zero_start, PAGE_SIZE - zero_start);
>   		flush_dcache_page(page);
>   	}
> -	ClearPageChecked(page);
> +	btrfs_page_clear_checked(fs_info, page, page_start, PAGE_SIZE);
>   	btrfs_page_set_dirty(fs_info, page, page_start, end + 1 - page_start);
>   	btrfs_page_set_uptodate(fs_info, page, page_start, end + 1 - page_start);
>   
> diff --git a/fs/btrfs/reflink.c b/fs/btrfs/reflink.c
> index 9b0814318e72..a7de8cfdcac0 100644
> --- a/fs/btrfs/reflink.c
> +++ b/fs/btrfs/reflink.c
> @@ -138,7 +138,7 @@ static int copy_inline_to_page(struct btrfs_inode *inode,
>   	}
>   
>   	btrfs_page_set_uptodate(fs_info, page, file_offset, block_size);
> -	ClearPageChecked(page);
> +	btrfs_page_clear_checked(fs_info, page, file_offset, block_size);
>   	btrfs_page_set_dirty(fs_info, page, file_offset, block_size);
>   out_unlock:
>   	if (page) {
> diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
> index a61aa33aeeee..b8a420cb1683 100644
> --- a/fs/btrfs/subpage.c
> +++ b/fs/btrfs/subpage.c
> @@ -468,6 +468,34 @@ void btrfs_subpage_clear_ordered(const struct btrfs_fs_info *fs_info,
>   		ClearPageOrdered(page);
>   	spin_unlock_irqrestore(&subpage->lock, flags);
>   }
> +
> +void btrfs_subpage_set_checked(const struct btrfs_fs_info *fs_info,
> +		struct page *page, u64 start, u32 len)
> +{
> +	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
> +	const u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len);
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&subpage->lock, flags);
> +	subpage->checked_bitmap |= tmp;
> +	if (subpage->checked_bitmap == U16_MAX)
> +		SetPageChecked(page);
> +	spin_unlock_irqrestore(&subpage->lock, flags);
> +}
> +
> +void btrfs_subpage_clear_checked(const struct btrfs_fs_info *fs_info,
> +		struct page *page, u64 start, u32 len)
> +{
> +	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
> +	const u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len);
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&subpage->lock, flags);
> +	subpage->checked_bitmap &= ~tmp;
> +	ClearPageChecked(page);
> +	spin_unlock_irqrestore(&subpage->lock, flags);
> +}
> +
>   /*
>    * Unlike set/clear which is dependent on each page status, for test all bits
>    * are tested in the same way.
> @@ -491,6 +519,7 @@ IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(error);
>   IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(dirty);
>   IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(writeback);
>   IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(ordered);
> +IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(checked);
>   
>   /*
>    * Note that, in selftests (extent-io-tests), we can have empty fs_info passed
> @@ -561,6 +590,8 @@ IMPLEMENT_BTRFS_PAGE_OPS(writeback, set_page_writeback, end_page_writeback,
>   			 PageWriteback);
>   IMPLEMENT_BTRFS_PAGE_OPS(ordered, SetPageOrdered, ClearPageOrdered,
>   			 PageOrdered);
> +IMPLEMENT_BTRFS_PAGE_OPS(checked, SetPageChecked, ClearPageChecked,
> +			 PageChecked);
>   
>   void btrfs_page_assert_not_dirty(const struct btrfs_fs_info *fs_info,
>   				 struct page *page)
> diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
> index 9aa40d795ba9..6fb54b22b295 100644
> --- a/fs/btrfs/subpage.h
> +++ b/fs/btrfs/subpage.h
> @@ -44,6 +44,13 @@ struct btrfs_subpage {
>   
>   			/* Track pending ordered extent in this sector */
>   			u16 ordered_bitmap;
> +
> +			/*
> +			 * Set if the sector has already been checked, so there
> +			 * is no need to go through csum verification.
> +			 * Used by both compression and cow fixup.
> +			 */
> +			u16 checked_bitmap;
>   		};
>   	};
>   };
> @@ -122,6 +129,7 @@ DECLARE_BTRFS_SUBPAGE_OPS(error);
>   DECLARE_BTRFS_SUBPAGE_OPS(dirty);
>   DECLARE_BTRFS_SUBPAGE_OPS(writeback);
>   DECLARE_BTRFS_SUBPAGE_OPS(ordered);
> +DECLARE_BTRFS_SUBPAGE_OPS(checked);
>   
>   bool btrfs_subpage_clear_and_test_dirty(const struct btrfs_fs_info *fs_info,
>   		struct page *page, u64 start, u32 len);
> 
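
For reference, a minimal standalone sketch (not kernel code; the helper
name, the fixed 64K page / 4K sector sizes and the userspace layout are
assumptions for illustration) of how a byte range inside a page maps to
bits of the 16-bit checked_bitmap, which is the conversion the patch
relies on btrfs_subpage_calc_bitmap() to do:

	#include <stdint.h>
	#include <stdio.h>

	#define SECTORSIZE	4096u	/* assumed 4K sectorsize */
	/* 64K page / 4K sector = 16 sectors, one bit each in a u16 */

	static uint16_t range_to_bitmap(uint64_t page_start, uint64_t start,
					uint32_t len)
	{
		/* start and len are assumed to be sector aligned */
		uint32_t first_bit = (uint32_t)((start - page_start) / SECTORSIZE);
		uint32_t nbits = len / SECTORSIZE;

		/* nbits consecutive bits, starting at first_bit */
		return (uint16_t)(((1u << nbits) - 1) << first_bit);
	}

	int main(void)
	{
		uint16_t checked_bitmap = 0;

		/* mark the 8K range at offset 32K of the page as checked */
		checked_bitmap |= range_to_bitmap(0, 32768, 8192);
		printf("checked_bitmap = 0x%04x\n", checked_bitmap); /* 0x0300 */

		/* only with all 16 bits set does the whole page count as checked */
		if (checked_bitmap == UINT16_MAX)
			printf("whole page checked\n");
		return 0;
	}

This mirrors the set/clear helpers added in subpage.c: set ORs the bits
for the range in, clear ANDs them out, and the page level PageChecked
flag is only set once the whole bitmap is full.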



* Re: [PATCH 00/27] btrfs: limited subpage compressed write support
  2021-07-13  6:14 [PATCH 00/27] btrfs: limited subpage compressed write support Qu Wenruo
                   ` (26 preceding siblings ...)
  2021-07-13  6:15 ` [PATCH 27/27] btrfs: only allow subpage compression if the range is fully page aligned Qu Wenruo
@ 2021-07-16  9:11 ` Qu Wenruo
  2021-07-20  5:00   ` Qu Wenruo
  27 siblings, 1 reply; 32+ messages in thread
From: Qu Wenruo @ 2021-07-16  9:11 UTC (permalink / raw)
  To: linux-btrfs



On 2021/7/13 2:14 PM, Qu Wenruo wrote:
> The patchset can be fetched from github:
> https://github.com/adam900710/linux/tree/compression
> 
> The branch is based on the previously submitted subpage enablement
> patchset.
> The target merge window is v5.16 or v5.17.
> 
> === What's working ===
> 
> Delalloc range which is fully page aligned can be compressed with
> 64K page size and 4K sector size (AKA, subpage).
> 
> With current patchset, it can pass most "compress" test group, except
> btrfs/106, whose golden output is bound to 4K page size, thus test case
> needs to be updated.

It turns out that btrfs/160 has a very high chance of crashing due to
ordered extent tree inconsistency.
This is only exposed when running with the "-o compress" mount option.

Will fix all the bugs exposed during a full fstests run with the
"-o compress" mount option.

Thanks,
Qu

> 
> And as a basic requirement, 4K page size systems still pass the regular
> fstests runs.
> 
> === What's not working ===
> Delalloc range not fully page aligned will not go through compression.
> 
> That's to say, the following inode will go through different write path:
> 
> 	0	32K	64K	96K	128K
> 	|///////////////|	|///////|
> 			|		\- Will not be compressed
> 			|
> 			\- Will be compressed
> 
> This will reduce the chance of compression obviously.
> 
> But all involved patches will be the basis for later sector perfect
> compression support.
> 
> The limitation is mostly introduced by two factors:
> 
> - How we handle the locked page of a async cow delalloc range
>    Currently we unlock the first page unconditionally.
>    Even with the patchset, we still follows the behavior.
> 
>    This means we can't have two async cow range shares the same
>    page.
>    This can be enhanced to use subpage::writers, but the next
>    problem will prevent us doing so.
> 
> - No way to ensure an async cow range not to unlock the page while
>    we still have delalloc range in the page
> 
>    This is caused by how we run delalloc range in a page.
>    For regular sectorsize, it's not a problem as we have at most one
>    sector for a page.
> 
>    But for subpage case, we can have multiple sectors in one page.
>    If we submit an async cow, it may try to unlock the page while
>    we are still running the next delalloc range of the page.
> 
>    The correct way here is to find and lock all delalloc range inside a
>    page, update the subpage::writers properly, then run each delalloc
>    range, so that the page won't be unlocked half way.
> 
> === Patch structure ===
> 
> Patch 01~04:	Small and safe cleanups
> Patch 05:	Make compressed readahead to be subpage compatble
> Patch 06~14:	Optimize compressed read/write path to determine stripe
> 		boundary in a per-bio base
> Patch 15~16:	Extra code refactor/cleanup for compressed path
> 
> Patch 17~26:	Make compressed write path to be subpage compatible
> Patch 27:	Enable limited subpage compressed write support
> 
> Patch 01~16 may be a good candidate for early merge, as real heavy
> lifting part starts at patch 17.
> 
> While patch 01~04 are really small and safe cleanups, which can be
> merged even earlier than subpage enablement patchset.
> 
> 
> Qu Wenruo (27):
>    btrfs: remove unused parameter @nr_pages in add_ra_bio_pages()
>    btrfs: remove unnecessary parameter @delalloc_start for
>      writepage_delalloc()
>    btrfs: use async_chunk::async_cow to replace the confusing pending
>      pointer
>    btrfs: don't pass compressed pages to
>      btrfs_writepage_endio_finish_ordered()
>    btrfs: make add_ra_bio_pages() to be subpage compatible
>    btrfs: introduce compressed_bio::pending_sectors to trace compressed
>      bio more elegantly
>    btrfs: add subpage checked_bitmap to make PageChecked flag to be
>      subpage compatible
>    btrfs: handle errors properly inside btrfs_submit_compressed_read()
>    btrfs: handle errors properly inside btrfs_submit_compressed_write()
>    btrfs: introduce submit_compressed_bio() for compression
>    btrfs: introduce alloc_compressed_bio() for compression
>    btrfs: make btrfs_submit_compressed_read() to determine stripe
>      boundary at bio allocation time
>    btrfs: make btrfs_submit_compressed_write() to determine stripe
>      boundary at bio allocation time
>    btrfs: remove unused function btrfs_bio_fits_in_stripe()
>    btrfs: refactor submit_compressed_extents()
>    btrfs: cleanup for extent_write_locked_range()
>    btrfs: make compress_file_range() to be subpage compatible
>    btrfs: make btrfs_submit_compressed_write() to be subpage compatible
>    btrfs: make end_compressed_bio_writeback() to be subpage compatble
>    btrfs: make extent_write_locked_range() to be subpage compatible
>    btrfs: extract uncompressed async extent submission code into a new
>      helper
>    btrfs: rework lzo_compress_pages() to make it subpage compatible
>    btrfs: teach __extent_writepage() to handle locked page differently
>    btrfs: allow page to be unlocked by btrfs_page_end_writer_lock() even
>      if it's locked by plain page_lock()
>    btrfs: allow subpage to compress a range which only covers one page
>    btrfs: don't run delalloc range which is beyond the locked_page to
>      prevent deadlock for subpage compression
>    btrfs: only allow subpage compression if the range is fully page
>      aligned
> 
>   fs/btrfs/compression.c           | 678 ++++++++++++++++++-------------
>   fs/btrfs/compression.h           |   4 +-
>   fs/btrfs/ctree.h                 |   2 -
>   fs/btrfs/extent_io.c             | 123 ++++--
>   fs/btrfs/extent_io.h             |   3 +-
>   fs/btrfs/file.c                  |  20 +-
>   fs/btrfs/free-space-cache.c      |   6 +-
>   fs/btrfs/inode.c                 | 455 +++++++++++----------
>   fs/btrfs/lzo.c                   | 280 ++++++-------
>   fs/btrfs/reflink.c               |   2 +-
>   fs/btrfs/subpage.c               |  85 ++++
>   fs/btrfs/subpage.h               |  10 +
>   fs/btrfs/tests/extent-io-tests.c |  12 +-
>   13 files changed, 996 insertions(+), 684 deletions(-)
> 



* Re: [PATCH 00/27] btrfs: limited subpage compressed write support
  2021-07-16  9:11 ` [PATCH 00/27] btrfs: limited subpage compressed write support Qu Wenruo
@ 2021-07-20  5:00   ` Qu Wenruo
  0 siblings, 0 replies; 32+ messages in thread
From: Qu Wenruo @ 2021-07-20  5:00 UTC (permalink / raw)
  To: linux-btrfs



On 2021/7/16 5:11 PM, Qu Wenruo wrote:
> 
> 
> On 2021/7/13 2:14 PM, Qu Wenruo wrote:
>> The patchset can be fetched from github:
>> https://github.com/adam900710/linux/tree/compression
>>
>> The branch is based on the previously submitted subpage enablement
>> patchset.
>> The target merge window is v5.16 or v5.17.
>>
>> === What's working ===
>>
>> Delalloc range which is fully page aligned can be compressed with
>> 64K page size and 4K sector size (AKA, subpage).
>>
>> With current patchset, it can pass most "compress" test group, except
>> btrfs/106, whose golden output is bound to 4K page size, thus test case
>> needs to be updated.
> 
> It turns out that btrfs/160 has a very high chance of crashing due to
> ordered extent tree inconsistency.
> This is only exposed when running with the "-o compress" mount option.

To make things even weirder, if I disable the space cache and then run
the test again, it no longer crashes.

Recent debugging also shows the problem is inside the ordered extent
cleanup, for the *space cache*.

Somehow, with subpage compression we seem to be creating a larger v1
free space cache file (that behavior itself may be a bug), thus it's
more vulnerable to an IO error hitting half way through while a large
ordered extent is submitted.

So this is a bug in the generic data writeback path, but somehow only
subpage compression makes it easier to trigger.

Thanks,
Qu

> 
> Will fix all the bugs exposed during a full fstests run with the
> "-o compress" mount option.
> 
> Thanks,
> Qu
> 



