* [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size
@ 2020-09-30  1:54 Qu Wenruo
  2020-09-30  1:54 ` [PATCH v3 01/49] btrfs: extent-io-tests: remove invalid tests Qu Wenruo
                   ` (48 more replies)
  0 siblings, 49 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:54 UTC (permalink / raw)
  To: linux-btrfs

Patches can be fetched from github:
https://github.com/adam900710/linux/tree/subpage

Currently btrfs only allows mounting a filesystem whose sectorsize ==
PAGE_SIZE.

That means a 64K page size system can only use filesystems with a 64K
sector size.
This is a big compatibility problem for btrfs.

This patchset partially addresses the problem by allowing a 64K page
size system to mount a 4K sectorsize fs in metadata read-write mode.
Data write is not yet guaranteed to work.

The main objective here is to remove the blockage in the code base and
pave the road to full RW mount support.

== What works ==

Existing support for the regular sectorsize == PAGE_SIZE case
Subpage read-only mount (with all self tests and ASSERTs)
Subpage metadata read-write (including all trees and inline extents, and csum checking)
Subpage data read (both compressed and uncompressed, with csum checking)

== What doesn't work ==

Subpage data write (neither compressed nor uncompressed works)

Thus no full fstests run yet.

== Challenge we meet ==

The main problem here is metadata, where we have several limitations:
- We always read the full page for metadata
  In the subpage case, one full page can contain several tree blocks.

- We use page::private to point to the extent buffer
  This means we can currently only support a one-page-to-one-extent-buffer
  mapping.
  For subpage size support, we need a one-page-to-multiple-extent-buffer
  mapping (see the sketch at the end of this section).

But we still have some challenges for data too.
- No EXTENT_* bits for certain page status
  The main example is Private2.
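
To illustrate the metadata limitation, here is roughly how the current
code maps a page to its extent buffer (a simplified sketch, not the
exact code; the helper name is made up):

	/* Assumes the whole page belongs to exactly one extent buffer */
	static struct extent_buffer *page_to_eb(struct page *page)
	{
		ASSERT(PagePrivate(page));
		return (struct extent_buffer *)page->private;
	}

This single pointer dereference obviously breaks once one page covers
several tree blocks.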

== Solutions ==

For the metadata part, we use the following methods to work around
the problems:

- Rely more on extent_io_tree for metadata status/locking
  For subpage metadata, page::private is never utilized; it always
  points to NULL.

  The page status is now updated according to the extent_io_tree.
  E.g. if any extent in the page range is marked EXTENT_DIRTY, the page
  covering that range will be marked DIRTY.
  The page is marked UPTODATE only when all extent ranges covered by the
  page are UPTODATE with no holes (see the sketch after this list).

  Currently utilized bits for the metadata btree are:
  * EXTENT_UPTODATE
    Similar to PageUptodate(), needs all range covered by the page to be
    UPTODATE.

  * EXTENT_DIRTY
    Similar to PageDirty(), any dirty extent will mark the page dirty.

  * EXTENT_WRITEBACK
    Similar to PageWriteback(), any extent under writeback will mark the
    page writeback.

  * EXTENT_LOCKED
    Similar to PageLocked(), however the locking sequence is different
    for subpage: we lock the page first, then lock all ebs in the page.
    For regular sector size, we lock the eb first, then lock all pages
    belonging to the eb.

- Do subpage read for metadata
  Now we do proper subpage read for both data and metadata.
  For metadata we never merge bios for adjacent tree blocks, but always
  submit one bio per tree block.
  This allows us to do proper verification for each tree block.

- Do mergeable subpage write for metadata
  We submit each extent buffer in its nodesize, but still allow the bios
  to be merged.
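
To illustrate the page status derivation described in the first point
(a minimal sketch, not the patch code; the helper name is made up):

	static void update_page_uptodate(struct extent_io_tree *tree,
					 struct page *page)
	{
		u64 start = page_offset(page);
		u64 end = start + PAGE_SIZE - 1;

		/*
		 * Mirror PageUptodate() semantics: set the page flag only
		 * when the whole range has EXTENT_UPTODATE set, with no
		 * holes (filled == 1).
		 */
		if (test_range_bit(tree, start, end, EXTENT_UPTODATE, 1,
				   NULL))
			SetPageUptodate(page);
	}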

For the data part, we just convert the existing csum/read code to do
proper subpage checks.

But now we are at a crossroad for data write support.
We can either:
- Go iomap directly
  Iomap supports subpage, but I'm not yet sure how it is implemented.
  (A per-page bitmap? Or something similar to the btrfs extent_io_tree?
   And how many callbacks from btrfs are needed?)

- Continue in the current direction
  I guess this would cause fewer problems and fewer dependencies.
  But iomap seems to be the ultimate solution, thus even if we go this
  way, we will still need to migrate to iomap one day.

Anyway, I'll still use extent_io_tree for data write, and at least make
it able to run fstests.

== Patchset structure ==

This patchset is so big that, even though I tried my best to re-order
it, some cleanups may still be in questionable order.

Anyway, the main priority is:
- Bug fix
  Obviously

- Subpage independent refactors
  Such refactors make sense whether we support subpage or not.

- Subpage related but still independent refactors
  Those refactors make more sense if we support subpage,
  but they still don't affect existing behavior.

- Subpage specific code
  Most of this code consists of subroutines for existing code, thus it
  shouldn't change the existing behavior,
  but it really only makes sense for subpage usage.

So the patchset structure is:
Patch 01~04:	Small bug fixes found during the development.
Patch 05~17:	Mostly independent refactors
Patch 18~26:	Refactors leaning more towards subpage, but still making
		sense for the regular sector size case.
Patch 27~49:	Subpage specific code

Changelog:
v2:
- Migrating to extent_io_tree based status/locking mechanism
  This gets rid of the ad-hoc subpage_eb_mapping structure and the
  extra timing for verifying the extent buffers.

  This also brings some extra cleanups for the btree inode extent io
  tree hooks, which make no sense for either subpage or regular sector
  size.

  This also completely removes the requirement for page status like
  Locked/Uptodate/Dirty. Now metadata pages only utilize the Private
  status, while the private pointer is always NULL.

- Submit proper subpage sized read for metadata
  With the help of the extent io tree, we no longer need to bother with
  full page reads. Now we submit subpage sized metadata reads and do
  subpage locking.

- Remove some unnecessary refactors
  Some refactors, like extracting detach_extent_buffer_pages(), don't
  really make the code cleaner. We can easily add a subpage specific
  branch instead.

- Address the comments from v1

v3:
- Add compressed data read fix

- Also update page status according to extent status for btree inode
  This lets us reuse more code from the existing code base.

- Add metadata write support
  Only manually tested (with a fs created under x86_64, and a script
  doing metadata-only operations under aarch64 with 64K page size).

- More cleanup/refactors during metadata write support development.

Goldwyn Rodrigues (1):
  btrfs: use iosize while reading compressed pages

Qu Wenruo (48):
  btrfs: extent-io-tests: remove invalid tests
  btrfs: extent_io: fix the comment on lock_extent_buffer_for_io().
  btrfs: extent_io: update the comment for find_first_extent_bit()
  btrfs: make btree inode io_tree has its special owner
  btrfs: disk-io: replace @fs_info and @private_data with @inode for
    btrfs_wq_submit_bio()
  btrfs: inode: sink parameter @start and @len for check_data_csum()
  btrfs: extent_io: unexport extent_invalidatepage()
  btrfs: extent_io: remove the forward declaration and rename
    __process_pages_contig
  btrfs: extent_io: rename pages_locked in process_pages_contig()
  btrfs: extent_io: make process_pages_contig() to accept bytenr
    directly
  btrfs: extent_io: only require sector size alignment for page read
  btrfs: extent_io: remove the extent_start/extent_len for
    end_bio_extent_readpage()
  btrfs: extent_io: integrate page status update into
    endio_readpage_release_extent()
  btrfs: extent_io: rename page_size to io_size in submit_extent_page()
  btrfs: extent_io: add assert_spin_locked() for
    attach_extent_buffer_page()
  btrfs: extent_io: extract the btree page submission code into its own
    helper function
  btrfs: extent_io: calculate inline extent buffer page size based on
    page size
  btrfs: extent_io: make btrfs_fs_info::buffer_radix to take sector size
    divided values
  btrfs: disk_io: grab fs_info from extent_buffer::fs_info directly for
    btrfs_mark_buffer_dirty()
  btrfs: disk-io: make csum_tree_block() handle sectorsize smaller than
    page size
  btrfs: disk-io: extract the extent buffer verification from
    btree_readpage_end_io_hook()
  btrfs: disk-io: accept bvec directly for csum_dirty_buffer()
  btrfs: inode: make btrfs_readpage_end_io_hook() follow sector size
  btrfs: introduce a helper to determine if the sectorsize is smaller
    than PAGE_SIZE
  btrfs: extent_io: allow find_first_extent_bit() to find a range with
    exact bits match
  btrfs: extent_io: don't allow tree block to cross page boundary for
    subpage support
  btrfs: extent_io: update num_extent_pages() to support subpage sized
    extent buffer
  btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors
  btrfs: disk-io: only clear EXTENT_LOCK bit for extent_invalidatepage()
  btrfs: extent-io: make type of extent_state::state to be at least 32
    bits
  btrfs: extent_io: use extent_io_tree to handle subpage extent buffer
    allocation
  btrfs: extent_io: make set/clear_extent_buffer_uptodate() to support
    subpage size
  btrfs: extent_io: make the assert test on page uptodate able to handle
    subpage
  btrfs: extent_io: implement subpage metadata read and its endio
    function
  btrfs: extent_io: implement try_release_extent_buffer() for subpage
    metadata support
  btrfs: set btree inode track_uptodate for subpage support
  btrfs: allow RO mount of 4K sector size fs on 64K page system
  btrfs: disk-io: allow btree_set_page_dirty() to do more sanity check
    on subpage metadata
  btrfs: disk-io: support subpage metadata csum calculation at write
    time
  btrfs: extent_io: prevent extent_state from being merged for btree io
    tree
  btrfs: extent_io: make set_extent_buffer_dirty() to support subpage
    sized metadata
  btrfs: extent_io: add subpage support for clear_extent_buffer_dirty()
  btrfs: extent_io: make set_btree_ioerr() accept extent buffer
  btrfs: extent_io: introduce write_one_subpage_eb() function
  btrfs: extent_io: make lock_extent_buffer_for_io() subpage compatible
  btrfs: extent_io: introduce submit_btree_subpage() to submit a page
    for subpage metadata write
  btrfs: extent_io: introduce end_bio_subpage_eb_writepage() function
  btrfs: support metadata read write for test

 fs/btrfs/block-group.c           |    2 +-
 fs/btrfs/btrfs_inode.h           |   12 +
 fs/btrfs/ctree.c                 |    5 +-
 fs/btrfs/ctree.h                 |   43 +-
 fs/btrfs/disk-io.c               |  425 +++++++--
 fs/btrfs/disk-io.h               |    8 +-
 fs/btrfs/extent-io-tree.h        |   58 +-
 fs/btrfs/extent-tree.c           |    2 +-
 fs/btrfs/extent_io.c             | 1403 ++++++++++++++++++++++--------
 fs/btrfs/extent_io.h             |   27 +-
 fs/btrfs/file.c                  |    4 +
 fs/btrfs/free-space-cache.c      |    2 +-
 fs/btrfs/inode.c                 |   61 +-
 fs/btrfs/relocation.c            |    2 +-
 fs/btrfs/struct-funcs.c          |   18 +-
 fs/btrfs/tests/extent-io-tests.c |   26 +-
 fs/btrfs/transaction.c           |    4 +-
 fs/btrfs/volumes.c               |    2 +-
 include/trace/events/btrfs.h     |    1 +
 19 files changed, 1562 insertions(+), 543 deletions(-)

-- 
2.28.0


* [PATCH v3 01/49] btrfs: extent-io-tests: remove invalid tests
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
@ 2020-09-30  1:54 ` Qu Wenruo
  2020-09-30  1:54 ` [PATCH v3 02/49] btrfs: use iosize while reading compressed pages Qu Wenruo
                   ` (47 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:54 UTC (permalink / raw)
  To: linux-btrfs

In extent-io-tests, there are two invalid tests:
- Invalid nodesize for test_eb_bitmaps()
  Instead of the sectorsize and nodesize combination passed in, we're
  always using a hand-crafted nodesize.
  Although there is an extra check for 64K page size, we can still hit
  a case where PAGE_SIZE == 32K, which gives a 128K nodesize, larger
  than the max valid nodesize.

  Thankfully most machines have either 4K or 64K page size, thus we
  haven't hit such a case yet.

- Invalid extent buffer bytenr
  For 64K page size, the only combination we're going to test is
  sectorsize = nodesize = 64K.
  In that case, we'll try to create an extent buffer with a 32K bytenr,
  which is not aligned to sectorsize and thus invalid.

Fix both problems by:
- Honor the sectorsize/nodesize combination
  Now we won't bother to hand-craft a strange length and use it as
  nodesize.

- Use sectorsize as the extent buffer start for the second run
  This tests the case where the extent buffer is aligned to sectorsize
  but not necessarily aligned to nodesize.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/tests/extent-io-tests.c | 26 +++++++++++---------------
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/tests/extent-io-tests.c b/fs/btrfs/tests/extent-io-tests.c
index df7ce874a74b..73e96d505f4f 100644
--- a/fs/btrfs/tests/extent-io-tests.c
+++ b/fs/btrfs/tests/extent-io-tests.c
@@ -379,54 +379,50 @@ static int __test_eb_bitmaps(unsigned long *bitmap, struct extent_buffer *eb,
 static int test_eb_bitmaps(u32 sectorsize, u32 nodesize)
 {
 	struct btrfs_fs_info *fs_info;
-	unsigned long len;
 	unsigned long *bitmap = NULL;
 	struct extent_buffer *eb = NULL;
 	int ret;
 
 	test_msg("running extent buffer bitmap tests");
 
-	/*
-	 * In ppc64, sectorsize can be 64K, thus 4 * 64K will be larger than
-	 * BTRFS_MAX_METADATA_BLOCKSIZE.
-	 */
-	len = (sectorsize < BTRFS_MAX_METADATA_BLOCKSIZE)
-		? sectorsize * 4 : sectorsize;
-
-	fs_info = btrfs_alloc_dummy_fs_info(len, len);
+	fs_info = btrfs_alloc_dummy_fs_info(nodesize, sectorsize);
 	if (!fs_info) {
 		test_std_err(TEST_ALLOC_FS_INFO);
 		return -ENOMEM;
 	}
 
-	bitmap = kmalloc(len, GFP_KERNEL);
+	bitmap = kmalloc(nodesize, GFP_KERNEL);
 	if (!bitmap) {
 		test_err("couldn't allocate test bitmap");
 		ret = -ENOMEM;
 		goto out;
 	}
 
-	eb = __alloc_dummy_extent_buffer(fs_info, 0, len);
+	eb = __alloc_dummy_extent_buffer(fs_info, 0, nodesize);
 	if (!eb) {
 		test_std_err(TEST_ALLOC_ROOT);
 		ret = -ENOMEM;
 		goto out;
 	}
 
-	ret = __test_eb_bitmaps(bitmap, eb, len);
+	ret = __test_eb_bitmaps(bitmap, eb, nodesize);
 	if (ret)
 		goto out;
 
-	/* Do it over again with an extent buffer which isn't page-aligned. */
 	free_extent_buffer(eb);
-	eb = __alloc_dummy_extent_buffer(fs_info, nodesize / 2, len);
+
+	/*
+	 * Test again for case where the tree block is sectorsize aligned but
+	 * not nodesize aligned.
+	 */
+	eb = __alloc_dummy_extent_buffer(fs_info, sectorsize, nodesize);
 	if (!eb) {
 		test_std_err(TEST_ALLOC_ROOT);
 		ret = -ENOMEM;
 		goto out;
 	}
 
-	ret = __test_eb_bitmaps(bitmap, eb, len);
+	ret = __test_eb_bitmaps(bitmap, eb, nodesize);
 out:
 	free_extent_buffer(eb);
 	kfree(bitmap);
-- 
2.28.0


* [PATCH v3 02/49] btrfs: use iosize while reading compressed pages
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
  2020-09-30  1:54 ` [PATCH v3 01/49] btrfs: extent-io-tests: remove invalid tests Qu Wenruo
@ 2020-09-30  1:54 ` Qu Wenruo
  2020-09-30  1:54 ` [PATCH v3 03/49] btrfs: extent_io: fix the comment on lock_extent_buffer_for_io() Qu Wenruo
                   ` (46 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:54 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.de>

While using compression, a submitted bio is mapped to a compressed bio
which performs the read from disk, decompresses, and returns the
uncompressed data to the original bio. The original bio must reflect the
uncompressed size (iosize) of the I/O to be performed, or else the page
only receives disk_io_size (the compressed on-disk length) bytes of
data. The compressed bio checks the extent map and gets the correct
length while performing the I/O from disk. For example (illustrative
numbers): for an extent whose 128K of data compresses to 16K on disk,
the original bio must advance by the 128K iosize, while the compressed
bio reads only 16K.

This came up in the subpage work, when only the compressed length of
the original bio was filled in the page. This worked correctly for
pagesize == sectorsize, because both compressed and uncompressed data
are at pagesize boundaries, and would end up filling the requested page.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/extent_io.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index a940edb1e64f..64f7f61ce718 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3162,7 +3162,6 @@ static int __do_readpage(struct page *page,
 	int nr = 0;
 	size_t pg_offset = 0;
 	size_t iosize;
-	size_t disk_io_size;
 	size_t blocksize = inode->i_sb->s_blocksize;
 	unsigned long this_bio_flag = 0;
 	struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
@@ -3228,13 +3227,10 @@ static int __do_readpage(struct page *page,
 		iosize = min(extent_map_end(em) - cur, end - cur + 1);
 		cur_end = min(extent_map_end(em) - 1, end);
 		iosize = ALIGN(iosize, blocksize);
-		if (this_bio_flag & EXTENT_BIO_COMPRESSED) {
-			disk_io_size = em->block_len;
+		if (this_bio_flag & EXTENT_BIO_COMPRESSED)
 			offset = em->block_start;
-		} else {
+		else
 			offset = em->block_start + extent_offset;
-			disk_io_size = iosize;
-		}
 		block_start = em->block_start;
 		if (test_bit(EXTENT_FLAG_PREALLOC, &em->flags))
 			block_start = EXTENT_MAP_HOLE;
@@ -3323,7 +3319,7 @@ static int __do_readpage(struct page *page,
 		}
 
 		ret = submit_extent_page(REQ_OP_READ | read_flags, NULL,
-					 page, offset, disk_io_size,
+					 page, offset, iosize,
 					 pg_offset, bio,
 					 end_bio_extent_readpage, mirror_num,
 					 *bio_flags,
-- 
2.28.0


* [PATCH v3 03/49] btrfs: extent_io: fix the comment on lock_extent_buffer_for_io().
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
  2020-09-30  1:54 ` [PATCH v3 01/49] btrfs: extent-io-tests: remove invalid tests Qu Wenruo
  2020-09-30  1:54 ` [PATCH v3 02/49] btrfs: use iosize while reading compressed pages Qu Wenruo
@ 2020-09-30  1:54 ` Qu Wenruo
  2020-09-30  1:54 ` [PATCH v3 04/49] btrfs: extent_io: update the comment for find_first_extent_bit() Qu Wenruo
                   ` (45 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:54 UTC (permalink / raw)
  To: linux-btrfs

The description of that function's return value is completely wrong.

The function only returns 0 if the extent buffer doesn't need to be
submitted.
The "ret = 1" and "ret = 0" are determined by the return value of
"test_and_clear_bit(EXTENT_BUFFER_DIRTY, &eb->bflags)".

And if we get ret == 1, it's because the extent buffer is dirty; we set
its EXTENT_BUFFER_WRITEBACK bit and continue to page locking.

While if we get ret == 0, it means the extent buffer is not dirty to
begin with, so we don't need to write it back.

The caller also follows this: in btree_write_cache_pages(), if
lock_extent_buffer_for_io() returns 0, we just skip the extent buffer
completely (see the simplified caller pattern below).

So the comment is completely wrong.

Since we're here, also change the description a little.
The write bio flushing won't be visible to the caller, thus it's not a
major feature.
In the main description, only describe the locking part to make the
point clearer.
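
For reference, the (simplified) caller pattern in
btree_write_cache_pages() is:

	ret = lock_extent_buffer_for_io(eb, &epd);
	if (!ret) {
		/* Not dirty, skip this extent buffer completely */
		free_extent_buffer(eb);
		continue;
	} else if (ret < 0) {
		/* Something went wrong, bail out */
		done = 1;
		free_extent_buffer(eb);
		break;
	}
	ret = write_one_eb(eb, wbc, &epd);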

Fixes: 2e3c25136adf ("btrfs: extent_io: add proper error handling to lock_extent_buffer_for_io()")
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 64f7f61ce718..a64d88163f3b 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3688,11 +3688,14 @@ static void end_extent_buffer_writeback(struct extent_buffer *eb)
 }
 
 /*
- * Lock eb pages and flush the bio if we can't the locks
+ * Lock extent buffer status and pages for write back.
  *
- * Return  0 if nothing went wrong
- * Return >0 is same as 0, except bio is not submitted
- * Return <0 if something went wrong, no page is locked
+ * May try to flush write bio if we can't get the lock.
+ *
+ * Return  0 if the extent buffer doesn't need to be submitted.
+ * (E.g. the extent buffer is not dirty)
+ * Return >0 if the extent buffer is submitted to bio.
+ * Return <0 if something went wrong, no page is locked.
  */
 static noinline_for_stack int lock_extent_buffer_for_io(struct extent_buffer *eb,
 			  struct extent_page_data *epd)
-- 
2.28.0


* [PATCH v3 04/49] btrfs: extent_io: update the comment for find_first_extent_bit()
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (2 preceding siblings ...)
  2020-09-30  1:54 ` [PATCH v3 03/49] btrfs: extent_io: fix the comment on lock_extent_buffer_for_io() Qu Wenruo
@ 2020-09-30  1:54 ` Qu Wenruo
  2020-09-30  1:54 ` [PATCH v3 05/49] btrfs: make btree inode io_tree has its special owner Qu Wenruo
                   ` (44 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:54 UTC (permalink / raw)
  To: linux-btrfs

The pitfall here is that if the parameter @bits has multiple bits set,
we will return the first range which has just one of the specified bits
set.

This is a little tricky if we want an exact match.

Anyway, update the comment to inform the callers.
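
A hypothetical example of the pitfall:

	/*
	 * This returns the first range with EITHER bit set, not
	 * necessarily both, so it is not an exact match.
	 */
	ret = find_first_extent_bit(tree, start, &found_start, &found_end,
				    EXTENT_DIRTY | EXTENT_WRITEBACK, NULL);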

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index a64d88163f3b..2980e8384e74 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1554,11 +1554,12 @@ find_first_extent_bit_state(struct extent_io_tree *tree,
 }
 
 /*
- * find the first offset in the io tree with 'bits' set. zero is
- * returned if we find something, and *start_ret and *end_ret are
- * set to reflect the state struct that was found.
+ * Find the first offset in the io tree with one or more @bits set.
  *
- * If nothing was found, 1 is returned. If found something, return 0.
+ * NOTE: If @bits are multiple bits, any bit of @bits will meet the match.
+ *
+ * Return 0 if we find something, and update @start_ret and @end_ret.
+ * Return 1 if we found nothing.
  */
 int find_first_extent_bit(struct extent_io_tree *tree, u64 start,
 			  u64 *start_ret, u64 *end_ret, unsigned bits,
-- 
2.28.0


* [PATCH v3 05/49] btrfs: make btree inode io_tree has its special owner
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (3 preceding siblings ...)
  2020-09-30  1:54 ` [PATCH v3 04/49] btrfs: extent_io: update the comment for find_first_extent_bit() Qu Wenruo
@ 2020-09-30  1:54 ` Qu Wenruo
  2020-09-30  1:54 ` [PATCH v3 06/49] btrfs: disk-io: replace @fs_info and @private_data with @inode for btrfs_wq_submit_bio() Qu Wenruo
                   ` (43 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:54 UTC (permalink / raw)
  To: linux-btrfs

The btree inode is pretty special compared to all other inode extent io
trees: although it has a btrfs inode, it doesn't have the track_uptodate
bit set to true, and never has ordered extents.

Since it's so special, add a new owner value for it to make debugging a
little easier.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c           | 2 +-
 fs/btrfs/extent-io-tree.h    | 1 +
 include/trace/events/btrfs.h | 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f6bba7eb1fa1..be6edbd34934 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2116,7 +2116,7 @@ static void btrfs_init_btree_inode(struct btrfs_fs_info *fs_info)
 
 	RB_CLEAR_NODE(&BTRFS_I(inode)->rb_node);
 	extent_io_tree_init(fs_info, &BTRFS_I(inode)->io_tree,
-			    IO_TREE_INODE_IO, inode);
+			    IO_TREE_BTREE_INODE_IO, inode);
 	BTRFS_I(inode)->io_tree.track_uptodate = false;
 	extent_map_tree_init(&BTRFS_I(inode)->extent_tree);
 
diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
index 219a09a2b734..960d4a24f13e 100644
--- a/fs/btrfs/extent-io-tree.h
+++ b/fs/btrfs/extent-io-tree.h
@@ -40,6 +40,7 @@ struct io_failure_record;
 enum {
 	IO_TREE_FS_PINNED_EXTENTS,
 	IO_TREE_FS_EXCLUDED_EXTENTS,
+	IO_TREE_BTREE_INODE_IO,
 	IO_TREE_INODE_IO,
 	IO_TREE_INODE_IO_FAILURE,
 	IO_TREE_RELOC_BLOCKS,
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index 863335ecb7e8..89397605e465 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -79,6 +79,7 @@ struct btrfs_space_info;
 #define IO_TREE_OWNER						    \
 	EM( IO_TREE_FS_PINNED_EXTENTS, 	  "PINNED_EXTENTS")	    \
 	EM( IO_TREE_FS_EXCLUDED_EXTENTS,  "EXCLUDED_EXTENTS")	    \
+	EM( IO_TREE_BTREE_INODE_IO,	  "BTREE_INODE_IO")	    \
 	EM( IO_TREE_INODE_IO,		  "INODE_IO")		    \
 	EM( IO_TREE_INODE_IO_FAILURE,	  "INODE_IO_FAILURE")	    \
 	EM( IO_TREE_RELOC_BLOCKS,	  "RELOC_BLOCKS")	    \
-- 
2.28.0


* [PATCH v3 06/49] btrfs: disk-io: replace @fs_info and @private_data with @inode for btrfs_wq_submit_bio()
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (4 preceding siblings ...)
  2020-09-30  1:54 ` [PATCH v3 05/49] btrfs: make btree inode io_tree has its special owner Qu Wenruo
@ 2020-09-30  1:54 ` Qu Wenruo
  2020-09-30  1:54 ` [PATCH v3 07/49] btrfs: inode: sink parameter @start and @len for check_data_csum() Qu Wenruo
                   ` (42 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:54 UTC (permalink / raw)
  To: linux-btrfs

All callers of btrfs_wq_submit_bio() pass a struct inode as
@private_data, so there is no need for @private_data to be (void *);
just replace it with "struct inode *inode".

Since we can extract fs_info from the struct inode, also remove the
@fs_info parameter.

While we're here, also replace all the (void *private_data) parameters
with (struct inode *inode).

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c   | 21 +++++++++++----------
 fs/btrfs/disk-io.h   |  8 ++++----
 fs/btrfs/extent_io.h |  2 +-
 fs/btrfs/inode.c     | 21 +++++++++------------
 4 files changed, 25 insertions(+), 27 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index be6edbd34934..b7436ab7bba9 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -110,7 +110,7 @@ static void btrfs_free_csum_hash(struct btrfs_fs_info *fs_info)
  * just before they are sent down the IO stack.
  */
 struct async_submit_bio {
-	void *private_data;
+	struct inode *inode;
 	struct bio *bio;
 	extent_submit_bio_start_t *submit_bio_start;
 	int mirror_num;
@@ -746,7 +746,7 @@ static void run_one_async_start(struct btrfs_work *work)
 	blk_status_t ret;
 
 	async = container_of(work, struct  async_submit_bio, work);
-	ret = async->submit_bio_start(async->private_data, async->bio,
+	ret = async->submit_bio_start(async->inode, async->bio,
 				      async->bio_offset);
 	if (ret)
 		async->status = ret;
@@ -767,7 +767,7 @@ static void run_one_async_done(struct btrfs_work *work)
 	blk_status_t ret;
 
 	async = container_of(work, struct  async_submit_bio, work);
-	inode = async->private_data;
+	inode = async->inode;
 
 	/* If an error occurred we just want to clean up the bio and move on */
 	if (async->status) {
@@ -797,18 +797,19 @@ static void run_one_async_free(struct btrfs_work *work)
 	kfree(async);
 }
 
-blk_status_t btrfs_wq_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
+blk_status_t btrfs_wq_submit_bio(struct inode *inode, struct bio *bio,
 				 int mirror_num, unsigned long bio_flags,
-				 u64 bio_offset, void *private_data,
+				 u64 bio_offset,
 				 extent_submit_bio_start_t *submit_bio_start)
 {
+	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
 	struct async_submit_bio *async;
 
 	async = kmalloc(sizeof(*async), GFP_NOFS);
 	if (!async)
 		return BLK_STS_RESOURCE;
 
-	async->private_data = private_data;
+	async->inode = inode;
 	async->bio = bio;
 	async->mirror_num = mirror_num;
 	async->submit_bio_start = submit_bio_start;
@@ -845,8 +846,8 @@ static blk_status_t btree_csum_one_bio(struct bio *bio)
 	return errno_to_blk_status(ret);
 }
 
-static blk_status_t btree_submit_bio_start(void *private_data, struct bio *bio,
-					     u64 bio_offset)
+static blk_status_t btree_submit_bio_start(struct inode *inode, struct bio *bio,
+					   u64 bio_offset)
 {
 	/*
 	 * when we're called for a write, we're already in the async
@@ -893,8 +894,8 @@ static blk_status_t btree_submit_bio_hook(struct inode *inode, struct bio *bio,
 		 * kthread helpers are used to submit writes so that
 		 * checksumming can happen in parallel across all CPUs
 		 */
-		ret = btrfs_wq_submit_bio(fs_info, bio, mirror_num, 0,
-					  0, inode, btree_submit_bio_start);
+		ret = btrfs_wq_submit_bio(inode, bio, mirror_num, 0,
+					  0, btree_submit_bio_start);
 	}
 
 	if (ret)
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index 00dc39d47ed3..2d564e9223e2 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -105,10 +105,10 @@ int btrfs_read_buffer(struct extent_buffer *buf, u64 parent_transid, int level,
 		      struct btrfs_key *first_key);
 blk_status_t btrfs_bio_wq_end_io(struct btrfs_fs_info *info, struct bio *bio,
 			enum btrfs_wq_endio_type metadata);
-blk_status_t btrfs_wq_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
-			int mirror_num, unsigned long bio_flags,
-			u64 bio_offset, void *private_data,
-			extent_submit_bio_start_t *submit_bio_start);
+blk_status_t btrfs_wq_submit_bio(struct inode *inode, struct bio *bio,
+				 int mirror_num, unsigned long bio_flags,
+				 u64 bio_offset,
+				 extent_submit_bio_start_t *submit_bio_start);
 blk_status_t btrfs_submit_bio_done(void *private_data, struct bio *bio,
 			  int mirror_num);
 int btrfs_init_log_root_tree(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 30794ae58498..3c9252b429e0 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -71,7 +71,7 @@ typedef blk_status_t (submit_bio_hook_t)(struct inode *inode, struct bio *bio,
 					 int mirror_num,
 					 unsigned long bio_flags);
 
-typedef blk_status_t (extent_submit_bio_start_t)(void *private_data,
+typedef blk_status_t (extent_submit_bio_start_t)(struct inode *inode,
 		struct bio *bio, u64 bio_offset);
 
 struct extent_io_ops {
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 9570458aa847..e5d558ef4c7f 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2157,11 +2157,9 @@ int btrfs_bio_fits_in_stripe(struct page *page, size_t size, struct bio *bio,
  * At IO completion time the cums attached on the ordered extent record
  * are inserted into the btree
  */
-static blk_status_t btrfs_submit_bio_start(void *private_data, struct bio *bio,
-				    u64 bio_offset)
+static blk_status_t btrfs_submit_bio_start(struct inode *inode, struct bio *bio,
+					   u64 bio_offset)
 {
-	struct inode *inode = private_data;
-
 	return btrfs_csum_one_bio(BTRFS_I(inode), bio, 0, 0);
 }
 
@@ -2221,8 +2219,8 @@ static blk_status_t btrfs_submit_bio_hook(struct inode *inode, struct bio *bio,
 		if (root->root_key.objectid == BTRFS_DATA_RELOC_TREE_OBJECTID)
 			goto mapit;
 		/* we're doing a write, do the async checksumming */
-		ret = btrfs_wq_submit_bio(fs_info, bio, mirror_num, bio_flags,
-					  0, inode, btrfs_submit_bio_start);
+		ret = btrfs_wq_submit_bio(inode, bio, mirror_num, bio_flags,
+					  0, btrfs_submit_bio_start);
 		goto out;
 	} else if (!skip_sum) {
 		ret = btrfs_csum_one_bio(BTRFS_I(inode), bio, 0, 0);
@@ -7616,11 +7614,10 @@ static void __endio_write_update_ordered(struct btrfs_inode *inode,
 	}
 }
 
-static blk_status_t btrfs_submit_bio_start_direct_io(void *private_data,
-				    struct bio *bio, u64 offset)
+static blk_status_t btrfs_submit_bio_start_direct_io(struct inode *inode,
+						     struct bio *bio,
+						     u64 offset)
 {
-	struct inode *inode = private_data;
-
 	return btrfs_csum_one_bio(BTRFS_I(inode), bio, offset, 1);
 }
 
@@ -7671,8 +7668,8 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
 		goto map;
 
 	if (write && async_submit) {
-		ret = btrfs_wq_submit_bio(fs_info, bio, 0, 0,
-					  file_offset, inode,
+		ret = btrfs_wq_submit_bio(inode, bio, 0, 0,
+					  file_offset,
 					  btrfs_submit_bio_start_direct_io);
 		goto err;
 	} else if (write) {
-- 
2.28.0


* [PATCH v3 07/49] btrfs: inode: sink parameter @start and @len for check_data_csum()
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (5 preceding siblings ...)
  2020-09-30  1:54 ` [PATCH v3 06/49] btrfs: disk-io: replace @fs_info and @private_data with @inode for btrfs_wq_submit_bio() Qu Wenruo
@ 2020-09-30  1:54 ` Qu Wenruo
  2020-09-30  1:54 ` [PATCH v3 08/49] btrfs: extent_io: unexport extent_invalidatepage() Qu Wenruo
                   ` (41 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:54 UTC (permalink / raw)
  To: linux-btrfs

For check_data_csum(), the page we're using comes directly from the
inode mapping, thus it has a valid page_offset().

We can use (page_offset() + pgoff) to replace the @start parameter
completely, while @len should always be sectorsize.

Since we're here, also add some comments, as there is quite some
confusion around words like start/offset, without explaining whether
they mean file offset or logical bytenr.

This should not affect the existing behavior, as for the current
sectorsize == PAGE_SIZE case, @pgoff should always be 0, and len is
always PAGE_SIZE (or sectorsize from the dio read path).
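
In other words, the substitution inside check_data_csum() boils down to
(illustrative only):

	u64 file_offset = page_offset(page) + pgoff;	/* replaces @start */
	u32 len = fs_info->sectorsize;			/* replaces @len */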

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e5d558ef4c7f..10ea6a92685b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2791,17 +2791,30 @@ void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start,
 	btrfs_queue_work(wq, &ordered_extent->work);
 }
 
+/*
+ * Verify the checksum of one sector of uncompressed data.
+ *
+ * @inode:	The inode.
+ * @io_bio:	The btrfs_io_bio which contains the csum.
+ * @icsum:	The csum offset (by number of sectors).
+ * @page:	The page where the data to be verified is.
+ * @pgoff:	The offset inside the page.
+ *
+ * The length of such check is always one sector size.
+ */
 static int check_data_csum(struct inode *inode, struct btrfs_io_bio *io_bio,
-			   int icsum, struct page *page, int pgoff, u64 start,
-			   size_t len)
+			   int icsum, struct page *page, int pgoff)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
 	char *kaddr;
+	u32 len = fs_info->sectorsize;
 	u16 csum_size = btrfs_super_csum_size(fs_info->super_copy);
 	u8 *csum_expected;
 	u8 csum[BTRFS_CSUM_SIZE];
 
+	ASSERT(pgoff + len <= PAGE_SIZE);
+
 	csum_expected = ((u8 *)io_bio->csum) + icsum * csum_size;
 
 	kaddr = kmap_atomic(page);
@@ -2815,8 +2828,8 @@ static int check_data_csum(struct inode *inode, struct btrfs_io_bio *io_bio,
 	kunmap_atomic(kaddr);
 	return 0;
 zeroit:
-	btrfs_print_data_csum_error(BTRFS_I(inode), start, csum, csum_expected,
-				    io_bio->mirror_num);
+	btrfs_print_data_csum_error(BTRFS_I(inode), page_offset(page) + pgoff,
+				    csum, csum_expected, io_bio->mirror_num);
 	if (io_bio->device)
 		btrfs_dev_stat_inc_and_print(io_bio->device,
 					     BTRFS_DEV_STAT_CORRUPTION_ERRS);
@@ -2855,8 +2868,7 @@ static int btrfs_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
 	}
 
 	phy_offset >>= inode->i_sb->s_blocksize_bits;
-	return check_data_csum(inode, io_bio, phy_offset, page, offset, start,
-			       (size_t)(end - start + 1));
+	return check_data_csum(inode, io_bio, phy_offset, page, offset);
 }
 
 /*
@@ -7543,8 +7555,7 @@ static blk_status_t btrfs_check_read_dio_bio(struct inode *inode,
 			ASSERT(pgoff < PAGE_SIZE);
 			if (uptodate &&
 			    (!csum || !check_data_csum(inode, io_bio, icsum,
-						       bvec.bv_page, pgoff,
-						       start, sectorsize))) {
+						       bvec.bv_page, pgoff))) {
 				clean_io_failure(fs_info, failure_tree, io_tree,
 						 start, bvec.bv_page,
 						 btrfs_ino(BTRFS_I(inode)),
-- 
2.28.0


* [PATCH v3 08/49] btrfs: extent_io: unexport extent_invalidatepage()
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (6 preceding siblings ...)
  2020-09-30  1:54 ` [PATCH v3 07/49] btrfs: inode: sink parameter @start and @len for check_data_csum() Qu Wenruo
@ 2020-09-30  1:54 ` Qu Wenruo
  2020-09-30  1:54 ` [PATCH v3 09/49] btrfs: extent_io: remove the forward declaration and rename __process_pages_contig Qu Wenruo
                   ` (40 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:54 UTC (permalink / raw)
  To: linux-btrfs

Function extent_invalidatepage() has a single caller,
btree_invalidatepage().

Just unexport this function and move it to disk-io.c.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c        | 23 +++++++++++++++++++++++
 fs/btrfs/extent-io-tree.h |  2 --
 fs/btrfs/extent_io.c      | 24 ------------------------
 3 files changed, 23 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b7436ab7bba9..c81b7e53149c 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -966,6 +966,29 @@ static int btree_releasepage(struct page *page, gfp_t gfp_flags)
 	return try_release_extent_buffer(page);
 }
 
+/*
+ * basic invalidatepage code, this waits on any locked or writeback
+ * ranges corresponding to the page, and then deletes any extent state
+ * records from the tree
+ */
+static void extent_invalidatepage(struct extent_io_tree *tree,
+				  struct page *page, unsigned long offset)
+{
+	struct extent_state *cached_state = NULL;
+	u64 start = page_offset(page);
+	u64 end = start + PAGE_SIZE - 1;
+	size_t blocksize = page->mapping->host->i_sb->s_blocksize;
+
+	start += ALIGN(offset, blocksize);
+	if (start > end)
+		return;
+
+	lock_extent_bits(tree, start, end, &cached_state);
+	wait_on_page_writeback(page);
+	clear_extent_bit(tree, start, end, EXTENT_LOCKED | EXTENT_DELALLOC |
+			 EXTENT_DO_ACCOUNTING, 1, 1, &cached_state);
+}
+
 static void btree_invalidatepage(struct page *page, unsigned int offset,
 				 unsigned int length)
 {
diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
index 960d4a24f13e..5927338c74a2 100644
--- a/fs/btrfs/extent-io-tree.h
+++ b/fs/btrfs/extent-io-tree.h
@@ -229,8 +229,6 @@ void find_first_clear_extent_bit(struct extent_io_tree *tree, u64 start,
 				 u64 *start_ret, u64 *end_ret, unsigned bits);
 int find_contiguous_extent_bit(struct extent_io_tree *tree, u64 start,
 			       u64 *start_ret, u64 *end_ret, unsigned bits);
-int extent_invalidatepage(struct extent_io_tree *tree,
-			  struct page *page, unsigned long offset);
 bool btrfs_find_delalloc_range(struct extent_io_tree *tree, u64 *start,
 			       u64 *end, u64 max_bytes,
 			       struct extent_state **cached_state);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 2980e8384e74..02c3518afa82 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4405,30 +4405,6 @@ void extent_readahead(struct readahead_control *rac)
 	}
 }
 
-/*
- * basic invalidatepage code, this waits on any locked or writeback
- * ranges corresponding to the page, and then deletes any extent state
- * records from the tree
- */
-int extent_invalidatepage(struct extent_io_tree *tree,
-			  struct page *page, unsigned long offset)
-{
-	struct extent_state *cached_state = NULL;
-	u64 start = page_offset(page);
-	u64 end = start + PAGE_SIZE - 1;
-	size_t blocksize = page->mapping->host->i_sb->s_blocksize;
-
-	start += ALIGN(offset, blocksize);
-	if (start > end)
-		return 0;
-
-	lock_extent_bits(tree, start, end, &cached_state);
-	wait_on_page_writeback(page);
-	clear_extent_bit(tree, start, end, EXTENT_LOCKED | EXTENT_DELALLOC |
-			 EXTENT_DO_ACCOUNTING, 1, 1, &cached_state);
-	return 0;
-}
-
 /*
  * a helper for releasepage, this tests for areas of the page that
  * are locked or under IO and drops the related state bits if it is safe
-- 
2.28.0


* [PATCH v3 09/49] btrfs: extent_io: remove the forward declaration and rename __process_pages_contig
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (7 preceding siblings ...)
  2020-09-30  1:54 ` [PATCH v3 08/49] btrfs: extent_io: unexport extent_invalidatepage() Qu Wenruo
@ 2020-09-30  1:54 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 10/49] btrfs: extent_io: rename pages_locked in process_pages_contig() Qu Wenruo
                   ` (39 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:54 UTC (permalink / raw)
  To: linux-btrfs

There is no need for the forward declaration of
__process_pages_contig(), so just move the function before its first
caller.

Since we are here, also remove the "__" prefix, since it has no special
meaning.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 180 +++++++++++++++++++++++--------------------
 1 file changed, 95 insertions(+), 85 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 02c3518afa82..9f46d7f17a9c 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1810,10 +1810,98 @@ bool btrfs_find_delalloc_range(struct extent_io_tree *tree, u64 *start,
 	return found;
 }
 
-static int __process_pages_contig(struct address_space *mapping,
-				  struct page *locked_page,
-				  pgoff_t start_index, pgoff_t end_index,
-				  unsigned long page_ops, pgoff_t *index_ret);
+/*
+ * A helper to update contiguous pages status according to @page_ops.
+ *
+ * @mapping:		The address space of the pages
+ * @locked_page:	The already locked page. Mostly for inline extent
+ * 			handling
+ * @start_index:	The start page index.
+ * @end_index:		The last page index.
+ * @page_ops:		The operations to be done
+ * @index_ret:		The last handled page index (for error case)
+ *
+ * Return 0 if every page is handled properly.
+ * Return <0 if something wrong happened, and update @index_ret.
+ */
+static int process_pages_contig(struct address_space *mapping,
+				struct page *locked_page,
+				pgoff_t start_index, pgoff_t end_index,
+				unsigned long page_ops, pgoff_t *index_ret)
+{
+	unsigned long nr_pages = end_index - start_index + 1;
+	unsigned long pages_locked = 0;
+	pgoff_t index = start_index;
+	struct page *pages[16];
+	unsigned ret;
+	int err = 0;
+	int i;
+
+	if (page_ops & PAGE_LOCK) {
+		ASSERT(page_ops == PAGE_LOCK);
+		ASSERT(index_ret && *index_ret == start_index);
+	}
+
+	if ((page_ops & PAGE_SET_ERROR) && nr_pages > 0)
+		mapping_set_error(mapping, -EIO);
+
+	while (nr_pages > 0) {
+		ret = find_get_pages_contig(mapping, index,
+				     min_t(unsigned long,
+				     nr_pages, ARRAY_SIZE(pages)), pages);
+		if (ret == 0) {
+			/*
+			 * Only if we're going to lock these pages,
+			 * can we find nothing at @index.
+			 */
+			ASSERT(page_ops & PAGE_LOCK);
+			err = -EAGAIN;
+			goto out;
+		}
+
+		for (i = 0; i < ret; i++) {
+			if (page_ops & PAGE_SET_PRIVATE2)
+				SetPagePrivate2(pages[i]);
+
+			if (locked_page && pages[i] == locked_page) {
+				put_page(pages[i]);
+				pages_locked++;
+				continue;
+			}
+			if (page_ops & PAGE_CLEAR_DIRTY)
+				clear_page_dirty_for_io(pages[i]);
+			if (page_ops & PAGE_SET_WRITEBACK)
+				set_page_writeback(pages[i]);
+			if (page_ops & PAGE_SET_ERROR)
+				SetPageError(pages[i]);
+			if (page_ops & PAGE_END_WRITEBACK)
+				end_page_writeback(pages[i]);
+			if (page_ops & PAGE_UNLOCK)
+				unlock_page(pages[i]);
+			if (page_ops & PAGE_LOCK) {
+				lock_page(pages[i]);
+				if (!PageDirty(pages[i]) ||
+				    pages[i]->mapping != mapping) {
+					unlock_page(pages[i]);
+					for (; i < ret; i++)
+						put_page(pages[i]);
+					err = -EAGAIN;
+					goto out;
+				}
+			}
+			put_page(pages[i]);
+			pages_locked++;
+		}
+		nr_pages -= ret;
+		index += ret;
+		cond_resched();
+	}
+out:
+	if (err && index_ret)
+		*index_ret = start_index + pages_locked - 1;
+	return err;
+}
+
 
 static noinline void __unlock_for_delalloc(struct inode *inode,
 					   struct page *locked_page,
@@ -1826,7 +1914,7 @@ static noinline void __unlock_for_delalloc(struct inode *inode,
 	if (index == locked_page->index && end_index == index)
 		return;
 
-	__process_pages_contig(inode->i_mapping, locked_page, index, end_index,
+	process_pages_contig(inode->i_mapping, locked_page, index, end_index,
 			       PAGE_UNLOCK, NULL);
 }
 
@@ -1844,7 +1932,7 @@ static noinline int lock_delalloc_pages(struct inode *inode,
 	if (index == locked_page->index && index == end_index)
 		return 0;
 
-	ret = __process_pages_contig(inode->i_mapping, locked_page, index,
+	ret = process_pages_contig(inode->i_mapping, locked_page, index,
 				     end_index, PAGE_LOCK, &index_ret);
 	if (ret == -EAGAIN)
 		__unlock_for_delalloc(inode, locked_page, delalloc_start,
@@ -1941,84 +2029,6 @@ noinline_for_stack bool find_lock_delalloc_range(struct inode *inode,
 	return found;
 }
 
-static int __process_pages_contig(struct address_space *mapping,
-				  struct page *locked_page,
-				  pgoff_t start_index, pgoff_t end_index,
-				  unsigned long page_ops, pgoff_t *index_ret)
-{
-	unsigned long nr_pages = end_index - start_index + 1;
-	unsigned long pages_locked = 0;
-	pgoff_t index = start_index;
-	struct page *pages[16];
-	unsigned ret;
-	int err = 0;
-	int i;
-
-	if (page_ops & PAGE_LOCK) {
-		ASSERT(page_ops == PAGE_LOCK);
-		ASSERT(index_ret && *index_ret == start_index);
-	}
-
-	if ((page_ops & PAGE_SET_ERROR) && nr_pages > 0)
-		mapping_set_error(mapping, -EIO);
-
-	while (nr_pages > 0) {
-		ret = find_get_pages_contig(mapping, index,
-				     min_t(unsigned long,
-				     nr_pages, ARRAY_SIZE(pages)), pages);
-		if (ret == 0) {
-			/*
-			 * Only if we're going to lock these pages,
-			 * can we find nothing at @index.
-			 */
-			ASSERT(page_ops & PAGE_LOCK);
-			err = -EAGAIN;
-			goto out;
-		}
-
-		for (i = 0; i < ret; i++) {
-			if (page_ops & PAGE_SET_PRIVATE2)
-				SetPagePrivate2(pages[i]);
-
-			if (locked_page && pages[i] == locked_page) {
-				put_page(pages[i]);
-				pages_locked++;
-				continue;
-			}
-			if (page_ops & PAGE_CLEAR_DIRTY)
-				clear_page_dirty_for_io(pages[i]);
-			if (page_ops & PAGE_SET_WRITEBACK)
-				set_page_writeback(pages[i]);
-			if (page_ops & PAGE_SET_ERROR)
-				SetPageError(pages[i]);
-			if (page_ops & PAGE_END_WRITEBACK)
-				end_page_writeback(pages[i]);
-			if (page_ops & PAGE_UNLOCK)
-				unlock_page(pages[i]);
-			if (page_ops & PAGE_LOCK) {
-				lock_page(pages[i]);
-				if (!PageDirty(pages[i]) ||
-				    pages[i]->mapping != mapping) {
-					unlock_page(pages[i]);
-					for (; i < ret; i++)
-						put_page(pages[i]);
-					err = -EAGAIN;
-					goto out;
-				}
-			}
-			put_page(pages[i]);
-			pages_locked++;
-		}
-		nr_pages -= ret;
-		index += ret;
-		cond_resched();
-	}
-out:
-	if (err && index_ret)
-		*index_ret = start_index + pages_locked - 1;
-	return err;
-}
-
 void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
 				  struct page *locked_page,
 				  unsigned clear_bits,
@@ -2026,7 +2036,7 @@ void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
 {
 	clear_extent_bit(&inode->io_tree, start, end, clear_bits, 1, 0, NULL);
 
-	__process_pages_contig(inode->vfs_inode.i_mapping, locked_page,
+	process_pages_contig(inode->vfs_inode.i_mapping, locked_page,
 			       start >> PAGE_SHIFT, end >> PAGE_SHIFT,
 			       page_ops, NULL);
 }
-- 
2.28.0


* [PATCH v3 10/49] btrfs: extent_io: rename pages_locked in process_pages_contig()
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (8 preceding siblings ...)
  2020-09-30  1:54 ` [PATCH v3 09/49] btrfs: extent_io: remove the forward declaration and rename __process_pages_contig Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 11/49] btrfs: extent_io: make process_pages_contig() to accept bytenr directly Qu Wenruo
                   ` (38 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs

Function process_pages_contig() not only handles page locking but also
other operations.

So rename the local variable pages_locked to pages_processed to reduce
confusion.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 9f46d7f17a9c..07f8117ddbb4 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1830,7 +1830,7 @@ static int process_pages_contig(struct address_space *mapping,
 				unsigned long page_ops, pgoff_t *index_ret)
 {
 	unsigned long nr_pages = end_index - start_index + 1;
-	unsigned long pages_locked = 0;
+	unsigned long pages_processed = 0;
 	pgoff_t index = start_index;
 	struct page *pages[16];
 	unsigned ret;
@@ -1865,7 +1865,7 @@ static int process_pages_contig(struct address_space *mapping,
 
 			if (locked_page && pages[i] == locked_page) {
 				put_page(pages[i]);
-				pages_locked++;
+				pages_processed++;
 				continue;
 			}
 			if (page_ops & PAGE_CLEAR_DIRTY)
@@ -1890,7 +1890,7 @@ static int process_pages_contig(struct address_space *mapping,
 				}
 			}
 			put_page(pages[i]);
-			pages_locked++;
+			pages_processed++;
 		}
 		nr_pages -= ret;
 		index += ret;
@@ -1898,7 +1898,7 @@ static int process_pages_contig(struct address_space *mapping,
 	}
 out:
 	if (err && index_ret)
-		*index_ret = start_index + pages_locked - 1;
+		*index_ret = start_index + pages_processed - 1;
 	return err;
 }
 
-- 
2.28.0


* [PATCH v3 11/49] btrfs: extent_io: make process_pages_contig() to accept bytenr directly
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (9 preceding siblings ...)
  2020-09-30  1:55 ` [PATCH v3 10/49] btrfs: extent_io: rename pages_locked in process_pages_contig() Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 12/49] btrfs: extent_io: only require sector size alignment for page read Qu Wenruo
                   ` (37 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs

Instead of page indexes, accept bytenr directly for
process_pages_contig().

This allows process_pages_contig() to accept ranges which are not
aligned to page size, while still reporting an accurate @end_ret.

Currently we still only accept page aligned values, but this provides
the basis for later subpage support.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 78 ++++++++++++++++++++++++--------------------
 1 file changed, 43 insertions(+), 35 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 07f8117ddbb4..d35eae29bc80 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1810,46 +1810,58 @@ bool btrfs_find_delalloc_range(struct extent_io_tree *tree, u64 *start,
 	return found;
 }
 
+static int calc_bytes_processed(struct page *page, u64 range_start)
+{
+	u64 page_start = page_offset(page);
+	u64 real_start = max(range_start, page_start);
+
+	return page_start + PAGE_SIZE - real_start;
+}
+
 /*
  * A helper to update contiguous pages status according to @page_ops.
  *
  * @mapping:		The address space of the pages
  * @locked_page:	The already locked page. Mostly for inline extent
  * 			handling
- * @start_index:	The start page index.
- * @end_index:		The last page index.
+ * @start:		The start file offset
+ * @end:		The end file offset (inclusive)
  * @page_ops:		The operations to be done
- * @index_ret:		The last handled page index (for error case)
+ * @end_ret:		The last handled inclusive file offset (for error case)
  *
  * Return 0 if every page is handled properly.
- * Return <0 if something wrong happened, and update @index_ret.
+ * Return <0 if something wrong happened, and update @end_ret.
  */
 static int process_pages_contig(struct address_space *mapping,
 				struct page *locked_page,
-				pgoff_t start_index, pgoff_t end_index,
-				unsigned long page_ops, pgoff_t *index_ret)
+				u64 start, u64 end,
+				unsigned long page_ops, u64 *end_ret)
 {
-	unsigned long nr_pages = end_index - start_index + 1;
-	unsigned long pages_processed = 0;
+	pgoff_t start_index = start >> PAGE_SHIFT;
+	pgoff_t end_index = end >> PAGE_SHIFT;
 	pgoff_t index = start_index;
+	u64 processed_end = start - 1;
+	unsigned long nr_pages = end_index - start_index + 1;
 	struct page *pages[16];
-	unsigned ret;
 	int err = 0;
 	int i;
 
+	ASSERT(IS_ALIGNED(start, PAGE_SIZE) && IS_ALIGNED(end + 1, PAGE_SIZE));
 	if (page_ops & PAGE_LOCK) {
 		ASSERT(page_ops == PAGE_LOCK);
-		ASSERT(index_ret && *index_ret == start_index);
+		ASSERT(end_ret && *end_ret == start - 1);
 	}
 
 	if ((page_ops & PAGE_SET_ERROR) && nr_pages > 0)
 		mapping_set_error(mapping, -EIO);
 
 	while (nr_pages > 0) {
-		ret = find_get_pages_contig(mapping, index,
+		unsigned found_pages;
+
+		found_pages = find_get_pages_contig(mapping, index,
 				     min_t(unsigned long,
 				     nr_pages, ARRAY_SIZE(pages)), pages);
-		if (ret == 0) {
+		if (found_pages == 0) {
 			/*
 			 * Only if we're going to lock these pages,
 			 * can we find nothing at @index.
@@ -1859,13 +1871,14 @@ static int process_pages_contig(struct address_space *mapping,
 			goto out;
 		}
 
-		for (i = 0; i < ret; i++) {
+		for (i = 0; i < found_pages; i++) {
 			if (page_ops & PAGE_SET_PRIVATE2)
 				SetPagePrivate2(pages[i]);
 
 			if (locked_page && pages[i] == locked_page) {
 				put_page(pages[i]);
-				pages_processed++;
+				processed_end +=
+					calc_bytes_processed(pages[i], start);
 				continue;
 			}
 			if (page_ops & PAGE_CLEAR_DIRTY)
@@ -1883,22 +1896,22 @@ static int process_pages_contig(struct address_space *mapping,
 				if (!PageDirty(pages[i]) ||
 				    pages[i]->mapping != mapping) {
 					unlock_page(pages[i]);
-					for (; i < ret; i++)
+					for (; i < found_pages; i++)
 						put_page(pages[i]);
 					err = -EAGAIN;
 					goto out;
 				}
 			}
 			put_page(pages[i]);
-			pages_processed++;
+			processed_end += calc_bytes_processed(pages[i], start);
 		}
-		nr_pages -= ret;
-		index += ret;
+		nr_pages -= found_pages;
+		index += found_pages;
 		cond_resched();
 	}
 out:
-	if (err && index_ret)
-		*index_ret = start_index + pages_processed - 1;
+	if (err && end_ret)
+		*end_ret = processed_end;
 	return err;
 }
 
@@ -1907,15 +1920,12 @@ static noinline void __unlock_for_delalloc(struct inode *inode,
 					   struct page *locked_page,
 					   u64 start, u64 end)
 {
-	unsigned long index = start >> PAGE_SHIFT;
-	unsigned long end_index = end >> PAGE_SHIFT;
-
 	ASSERT(locked_page);
-	if (index == locked_page->index && end_index == index)
+	if (end < start)
 		return;
 
-	process_pages_contig(inode->i_mapping, locked_page, index, end_index,
-			       PAGE_UNLOCK, NULL);
+	process_pages_contig(inode->i_mapping, locked_page, start, end,
+			     PAGE_UNLOCK, NULL);
 }
 
 static noinline int lock_delalloc_pages(struct inode *inode,
@@ -1923,20 +1933,19 @@ static noinline int lock_delalloc_pages(struct inode *inode,
 					u64 delalloc_start,
 					u64 delalloc_end)
 {
-	unsigned long index = delalloc_start >> PAGE_SHIFT;
-	unsigned long index_ret = index;
-	unsigned long end_index = delalloc_end >> PAGE_SHIFT;
+	u64 processed_end = delalloc_start - 1;
 	int ret;
 
 	ASSERT(locked_page);
-	if (index == locked_page->index && index == end_index)
+	if (delalloc_end < delalloc_start)
 		return 0;
 
-	ret = process_pages_contig(inode->i_mapping, locked_page, index,
-				     end_index, PAGE_LOCK, &index_ret);
+	ret = process_pages_contig(inode->i_mapping, locked_page,
+				   delalloc_start, delalloc_end, PAGE_LOCK,
+				   &processed_end);
 	if (ret == -EAGAIN)
 		__unlock_for_delalloc(inode, locked_page, delalloc_start,
-				      (u64)index_ret << PAGE_SHIFT);
+				      processed_end);
 	return ret;
 }
 
@@ -2037,8 +2046,7 @@ void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
 	clear_extent_bit(&inode->io_tree, start, end, clear_bits, 1, 0, NULL);
 
 	process_pages_contig(inode->vfs_inode.i_mapping, locked_page,
-			       start >> PAGE_SHIFT, end >> PAGE_SHIFT,
-			       page_ops, NULL);
+			     start, end, page_ops, NULL);
 }
 
 /*
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v3 12/49] btrfs: extent_io: only require sector size alignment for page read
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (10 preceding siblings ...)
  2020-09-30  1:55 ` [PATCH v3 11/49] btrfs: extent_io: make process_pages_contig() accept bytenr directly Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 13/49] btrfs: extent_io: remove the extent_start/extent_len for end_bio_extent_readpage() Qu Wenruo
                   ` (36 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs

If we're reading a partial page, btrfs will warn about it, as our
reads and writes are always done in sector size units, which currently
equal the page size.

But for the incoming subpage RO support, our data read is only aligned
to sectorsize, which can be smaller than page size.

So here we change the warning condition to check against sectorsize.
The behavior is unchanged for the regular sectorsize == PAGE_SIZE
case, and no error is reported for subpage reads.

Also, pass the proper start/end with bv_offset for check_data_csum() to
handle.
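
For example, with 4K sectorsize and a 64K page at file offset 128K, a
bvec with bv_offset = 8K and bv_len = 4K gives (a sketch of the
arithmetic above):

    start = page_offset(page) + bvec->bv_offset;    /* 128K + 8K = 136K */
    end   = start + bvec->bv_len - 1;               /* 136K + 4K - 1    */

Both 8K and 8K + 4K are 4K-aligned, so neither message is printed.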

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 34 ++++++++++++++++++----------------
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d35eae29bc80..1da7897a799e 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2838,6 +2838,7 @@ static void end_bio_extent_readpage(struct bio *bio)
 		struct page *page = bvec->bv_page;
 		struct inode *inode = page->mapping->host;
 		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+		u32 sectorsize = fs_info->sectorsize;
 		bool data_inode = btrfs_ino(BTRFS_I(inode))
 			!= BTRFS_BTREE_INODE_OBJECTID;
 
@@ -2848,24 +2849,25 @@ static void end_bio_extent_readpage(struct bio *bio)
 		tree = &BTRFS_I(inode)->io_tree;
 		failure_tree = &BTRFS_I(inode)->io_failure_tree;
 
-		/* We always issue full-page reads, but if some block
+		/*
+		 * We always issue full-sector reads, but if some block
 		 * in a page fails to read, blk_update_request() will
 		 * advance bv_offset and adjust bv_len to compensate.
-		 * Print a warning for nonzero offsets, and an error
-		 * if they don't add up to a full page.  */
-		if (bvec->bv_offset || bvec->bv_len != PAGE_SIZE) {
-			if (bvec->bv_offset + bvec->bv_len != PAGE_SIZE)
-				btrfs_err(fs_info,
-					"partial page read in btrfs with offset %u and length %u",
-					bvec->bv_offset, bvec->bv_len);
-			else
-				btrfs_info(fs_info,
-					"incomplete page read in btrfs with offset %u and length %u",
-					bvec->bv_offset, bvec->bv_len);
-		}
-
-		start = page_offset(page);
-		end = start + bvec->bv_offset + bvec->bv_len - 1;
+		 * Print a warning for unaligned offsets, and an error
+		 * if they don't add up to a full sector.
+		 */
+		if (!IS_ALIGNED(bvec->bv_offset, sectorsize))
+			btrfs_err(fs_info,
+		"partial page read in btrfs with offset %u and length %u",
+				  bvec->bv_offset, bvec->bv_len);
+		else if (!IS_ALIGNED(bvec->bv_offset + bvec->bv_len,
+				     sectorsize))
+			btrfs_info(fs_info,
+		"incomplete page read in btrfs with offset %u and length %u",
+				   bvec->bv_offset, bvec->bv_len);
+
+		start = page_offset(page) + bvec->bv_offset;
+		end = start + bvec->bv_len - 1;
 		len = bvec->bv_len;
 
 		mirror = io_bio->mirror_num;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v3 13/49] btrfs: extent_io: remove the extent_start/extent_len for end_bio_extent_readpage()
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (11 preceding siblings ...)
  2020-09-30  1:55 ` [PATCH v3 12/49] btrfs: extent_io: only require sector size alignment for page read Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 14/49] btrfs: extent_io: integrate page status update into endio_readpage_release_extent() Qu Wenruo
                   ` (35 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs

In end_bio_extent_readpage() we had a strange dance around
extent_start/extent_len.

The truth is, no matter what we do with those two variables, the end
result is the same: clear the EXTENT_LOCKED bit and, if needed, set
the EXTENT_UPTODATE bit in the io_tree.

This doesn't need the complex dance; we can do it easily by just
calling endio_readpage_release_extent() for each bvec.

This greatly streamlines the code.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 30 ++----------------------------
 1 file changed, 2 insertions(+), 28 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 1da7897a799e..395fa52ed2f9 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2795,11 +2795,10 @@ static void end_bio_extent_writepage(struct bio *bio)
 }
 
 static void
-endio_readpage_release_extent(struct extent_io_tree *tree, u64 start, u64 len,
+endio_readpage_release_extent(struct extent_io_tree *tree, u64 start, u64 end,
 			      int uptodate)
 {
 	struct extent_state *cached = NULL;
-	u64 end = start + len - 1;
 
 	if (uptodate && tree->track_uptodate)
 		set_extent_uptodate(tree, start, end, &cached, GFP_ATOMIC);
@@ -2827,8 +2826,6 @@ static void end_bio_extent_readpage(struct bio *bio)
 	u64 start;
 	u64 end;
 	u64 len;
-	u64 extent_start = 0;
-	u64 extent_len = 0;
 	int mirror;
 	int ret;
 	struct bvec_iter_all iter_all;
@@ -2936,32 +2933,9 @@ static void end_bio_extent_readpage(struct bio *bio)
 		unlock_page(page);
 		offset += len;
 
-		if (unlikely(!uptodate)) {
-			if (extent_len) {
-				endio_readpage_release_extent(tree,
-							      extent_start,
-							      extent_len, 1);
-				extent_start = 0;
-				extent_len = 0;
-			}
-			endio_readpage_release_extent(tree, start,
-						      end - start + 1, 0);
-		} else if (!extent_len) {
-			extent_start = start;
-			extent_len = end + 1 - start;
-		} else if (extent_start + extent_len == start) {
-			extent_len += end + 1 - start;
-		} else {
-			endio_readpage_release_extent(tree, extent_start,
-						      extent_len, uptodate);
-			extent_start = start;
-			extent_len = end + 1 - start;
-		}
+		endio_readpage_release_extent(tree, start, end, uptodate);
 	}
 
-	if (extent_len)
-		endio_readpage_release_extent(tree, extent_start, extent_len,
-					      uptodate);
 	btrfs_io_bio_free_csum(io_bio);
 	bio_put(bio);
 }
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v3 14/49] btrfs: extent_io: integrate page status update into endio_readpage_release_extent()
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (12 preceding siblings ...)
  2020-09-30  1:55 ` [PATCH v3 13/49] btrfs: extent_io: remove the extent_start/extent_len for end_bio_extent_readpage() Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 15/49] btrfs: extent_io: rename page_size to io_size in submit_extent_page() Qu Wenruo
                   ` (34 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs

In end_bio_extent_readpage(), we set page uptodate or error according to
the bio status.
However, that assumes all submitted reads are page sized.

To support case like subpage read, we should only set the whole page
uptodate if all data in the page has been read from disk.

This patch will integrate the page status update into
endio_readpage_release_extent() for end_bio_extent_readpage().

Now in endio_readpage_release_extent() we will set the page uptodate if
either:
- start/end covers the full page
  This is the existing behavior already.

- all the page range is already uptodate
  This adds the support for subpage read.

And for the error path, we always clear the page uptodate and set the
page error.
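
For example, with 4K sectorsize, a 64K page and an io_tree that has
track_uptodate set, finishing a read of only the first 16K of the page
boils down to (sketch):

    set_extent_uptodate(tree, page_start, page_start + SZ_16K - 1,
                        &cached, GFP_NOFS);
    /* Remaining 48K not uptodate yet, so the page flag stays clear */
    check_page_uptodate(tree, page);

Only once the rest of the page range is also marked EXTENT_UPTODATE
will check_page_uptodate() set PageUptodate.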

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 39 +++++++++++++++++++++++++++++----------
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 395fa52ed2f9..af86289f465e 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2795,13 +2795,36 @@ static void end_bio_extent_writepage(struct bio *bio)
 }
 
 static void
-endio_readpage_release_extent(struct extent_io_tree *tree, u64 start, u64 end,
-			      int uptodate)
+endio_readpage_release_extent(struct extent_io_tree *tree, struct page *page,
+			      u64 start, u64 end, int uptodate)
 {
 	struct extent_state *cached = NULL;
 
-	if (uptodate && tree->track_uptodate)
-		set_extent_uptodate(tree, start, end, &cached, GFP_ATOMIC);
+	if (uptodate) {
+		u64 page_start = page_offset(page);
+		u64 page_end = page_offset(page) + PAGE_SIZE - 1;
+
+		if (tree->track_uptodate) {
+			/*
+			 * The tree has EXTENT_UPTODATE bit tracking, update
+			 * extent io tree, and use it to update the page if
+			 * needed.
+			 */
+			set_extent_uptodate(tree, start, end, &cached,
+					    GFP_NOFS);
+			check_page_uptodate(tree, page);
+		} else if (start <= page_start && end >= page_end) {
+			/* We have covered the full page, set it uptodate */
+			SetPageUptodate(page);
+		}
+	} else {
+		if (tree->track_uptodate)
+			clear_extent_uptodate(tree, start, end, &cached);
+
+		/* Any error in the page range would invalidate the uptodate bit */
+		ClearPageUptodate(page);
+		SetPageError(page);
+	}
 	unlock_extent_cached_atomic(tree, start, end, &cached);
 }
 
@@ -2925,15 +2948,11 @@ static void end_bio_extent_readpage(struct bio *bio)
 			off = offset_in_page(i_size);
 			if (page->index == end_index && off)
 				zero_user_segment(page, off, PAGE_SIZE);
-			SetPageUptodate(page);
-		} else {
-			ClearPageUptodate(page);
-			SetPageError(page);
 		}
-		unlock_page(page);
 		offset += len;
 
-		endio_readpage_release_extent(tree, start, end, uptodate);
+		endio_readpage_release_extent(tree, page, start, end, uptodate);
+		unlock_page(page);
 	}
 
 	btrfs_io_bio_free_csum(io_bio);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v3 15/49] btrfs: extent_io: rename page_size to io_size in submit_extent_page()
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (13 preceding siblings ...)
  2020-09-30  1:55 ` [PATCH v3 14/49] btrfs: extent_io: integrate page status update into endio_readpage_release_extent() Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 16/49] btrfs: extent_io: add assert_spin_locked() for attach_extent_buffer_page() Qu Wenruo
                   ` (33 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs

The variable @page_size in submit_extent_page() is not necessarily
bound to the page size.

It can already be smaller than PAGE_SIZE, so rename it to io_size to
reduce confusion. This is especially important for the later subpage
support.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index af86289f465e..2edbac6c089e 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3051,7 +3051,7 @@ static int submit_extent_page(unsigned int opf,
 {
 	int ret = 0;
 	struct bio *bio;
-	size_t page_size = min_t(size_t, size, PAGE_SIZE);
+	size_t io_size = min_t(size_t, size, PAGE_SIZE);
 	sector_t sector = offset >> 9;
 	struct extent_io_tree *tree = &BTRFS_I(page->mapping->host)->io_tree;
 
@@ -3068,12 +3068,12 @@ static int submit_extent_page(unsigned int opf,
 			contig = bio_end_sector(bio) == sector;
 
 		ASSERT(tree->ops);
-		if (btrfs_bio_fits_in_stripe(page, page_size, bio, bio_flags))
+		if (btrfs_bio_fits_in_stripe(page, io_size, bio, bio_flags))
 			can_merge = false;
 
 		if (prev_bio_flags != bio_flags || !contig || !can_merge ||
 		    force_bio_submit ||
-		    bio_add_page(bio, page, page_size, pg_offset) < page_size) {
+		    bio_add_page(bio, page, io_size, pg_offset) < io_size) {
 			ret = submit_one_bio(bio, mirror_num, prev_bio_flags);
 			if (ret < 0) {
 				*bio_ret = NULL;
@@ -3082,13 +3082,13 @@ static int submit_extent_page(unsigned int opf,
 			bio = NULL;
 		} else {
 			if (wbc)
-				wbc_account_cgroup_owner(wbc, page, page_size);
+				wbc_account_cgroup_owner(wbc, page, io_size);
 			return 0;
 		}
 	}
 
 	bio = btrfs_bio_alloc(offset);
-	bio_add_page(bio, page, page_size, pg_offset);
+	bio_add_page(bio, page, io_size, pg_offset);
 	bio->bi_end_io = end_io_func;
 	bio->bi_private = tree;
 	bio->bi_write_hint = page->mapping->host->i_write_hint;
@@ -3099,7 +3099,7 @@ static int submit_extent_page(unsigned int opf,
 		bdev = BTRFS_I(page->mapping->host)->root->fs_info->fs_devices->latest_bdev;
 		bio_set_dev(bio, bdev);
 		wbc_init_bio(wbc, bio);
-		wbc_account_cgroup_owner(wbc, page, page_size);
+		wbc_account_cgroup_owner(wbc, page, io_size);
 	}
 
 	*bio_ret = bio;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v3 16/49] btrfs: extent_io: add assert_spin_locked() for attach_extent_buffer_page()
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (14 preceding siblings ...)
  2020-09-30  1:55 ` [PATCH v3 15/49] btrfs: extent_io: rename page_size to io_size in submit_extent_page() Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 17/49] btrfs: extent_io: extract the btree page submission code into its own helper function Qu Wenruo
                   ` (32 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

When calling attach_extent_buffer_page(), we're either attaching
anonymous pages, called from btrfs_clone_extent_buffer(), or attaching
btree_inode pages, called from alloc_extent_buffer().

For the latter case, we should hold page->mapping->private_lock to
avoid racing on page->private modifications.

Add assert_spin_locked() if we're calling from alloc_extent_buffer().
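
In other words, callers attaching pages mapped to the btree inode are
expected to follow a pattern like this sketch:

    spin_lock(&page->mapping->private_lock);
    attach_extent_buffer_page(eb, page);
    spin_unlock(&page->mapping->private_lock);

while btrfs_clone_extent_buffer() may keep calling it without the
lock, since its pages have no mapping to race on.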

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/extent_io.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 2edbac6c089e..e282eb63ad1b 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3110,6 +3110,15 @@ static int submit_extent_page(unsigned int opf,
 static void attach_extent_buffer_page(struct extent_buffer *eb,
 				      struct page *page)
 {
+	/*
+	 * If the page is mapped to btree inode, we should hold the private
+	 * lock to prevent race.
+	 * For cloned or dummy extent buffers, their pages are not mapped and
+	 * will not race with any other ebs.
+	 */
+	if (page->mapping)
+		assert_spin_locked(&page->mapping->private_lock);
+
 	if (!PagePrivate(page))
 		attach_page_private(page, eb);
 	else
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v3 17/49] btrfs: extent_io: extract the btree page submission code into its own helper function
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (15 preceding siblings ...)
  2020-09-30  1:55 ` [PATCH v3 16/49] btrfs: extent_io: add assert_spin_locked() for attach_extent_buffer_page() Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 18/49] btrfs: extent_io: calculate inline extent buffer page size based on page size Qu Wenruo
                   ` (31 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs

In btree_write_cache_pages() we have a btree page submission routine
buried deeply into a nested loop.

This patch will extract that part of code into a helper function,
submit_btree_page(), to do the same work.

Also, since submit_btree_page() can now return >0 for successful extent
buffer submission, remove the "ASSERT(ret <= 0);" line.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 116 +++++++++++++++++++++++++------------------
 1 file changed, 69 insertions(+), 47 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index e282eb63ad1b..6b925094608c 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3988,10 +3988,75 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
 	return ret;
 }
 
+/*
+ * A helper to submit a btree page.
+ *
+ * This function does not always submit the page, as we only submit the full
+ * extent buffer in a batch.
+ *
+ * @page:	The btree page
+ * @prev_eb:	Previous extent buffer, to determine if we need to submit
+ * 		this page.
+ *
+ * Return >0 if we have submitted the extent buffer successfully.
+ * Return 0 if we don't need to do anything for the page.
+ * Return <0 for fatal error.
+ */
+static int submit_btree_page(struct page *page, struct writeback_control *wbc,
+			     struct extent_page_data *epd,
+			     struct extent_buffer **prev_eb)
+{
+	struct address_space *mapping = page->mapping;
+	struct extent_buffer *eb;
+	int ret;
+
+	if (!PagePrivate(page))
+		return 0;
+
+	spin_lock(&mapping->private_lock);
+	if (!PagePrivate(page)) {
+		spin_unlock(&mapping->private_lock);
+		return 0;
+	}
+
+	eb = (struct extent_buffer *)page->private;
+
+	/*
+	 * Shouldn't happen and normally this would be a BUG_ON but no sense
+	 * in crashing the user's box for something we can survive anyway.
+	 */
+	if (WARN_ON(!eb)) {
+		spin_unlock(&mapping->private_lock);
+		return 0;
+	}
+
+	if (eb == *prev_eb) {
+		spin_unlock(&mapping->private_lock);
+		return 0;
+	}
+	ret = atomic_inc_not_zero(&eb->refs);
+	spin_unlock(&mapping->private_lock);
+	if (!ret)
+		return 0;
+
+	*prev_eb = eb;
+
+	ret = lock_extent_buffer_for_io(eb, epd);
+	if (ret <= 0) {
+		free_extent_buffer(eb);
+		return ret;
+	}
+	ret = write_one_eb(eb, wbc, epd);
+	free_extent_buffer(eb);
+	if (ret < 0)
+		return ret;
+	return 1;
+}
+
 int btree_write_cache_pages(struct address_space *mapping,
 				   struct writeback_control *wbc)
 {
-	struct extent_buffer *eb, *prev_eb = NULL;
+	struct extent_buffer *prev_eb = NULL;
 	struct extent_page_data epd = {
 		.bio = NULL,
 		.extent_locked = 0,
@@ -4037,55 +4102,13 @@ int btree_write_cache_pages(struct address_space *mapping,
 		for (i = 0; i < nr_pages; i++) {
 			struct page *page = pvec.pages[i];
 
-			if (!PagePrivate(page))
-				continue;
-
-			spin_lock(&mapping->private_lock);
-			if (!PagePrivate(page)) {
-				spin_unlock(&mapping->private_lock);
-				continue;
-			}
-
-			eb = (struct extent_buffer *)page->private;
-
-			/*
-			 * Shouldn't happen and normally this would be a BUG_ON
-			 * but no sense in crashing the users box for something
-			 * we can survive anyway.
-			 */
-			if (WARN_ON(!eb)) {
-				spin_unlock(&mapping->private_lock);
-				continue;
-			}
-
-			if (eb == prev_eb) {
-				spin_unlock(&mapping->private_lock);
-				continue;
-			}
-
-			ret = atomic_inc_not_zero(&eb->refs);
-			spin_unlock(&mapping->private_lock);
-			if (!ret)
-				continue;
-
-			prev_eb = eb;
-			ret = lock_extent_buffer_for_io(eb, &epd);
-			if (!ret) {
-				free_extent_buffer(eb);
+			ret = submit_btree_page(page, wbc, &epd, &prev_eb);
+			if (ret == 0)
 				continue;
-			} else if (ret < 0) {
-				done = 1;
-				free_extent_buffer(eb);
-				break;
-			}
-
-			ret = write_one_eb(eb, wbc, &epd);
-			if (ret) {
+			if (ret < 0) {
 				done = 1;
-				free_extent_buffer(eb);
 				break;
 			}
-			free_extent_buffer(eb);
 
 			/*
 			 * the filesystem may choose to bump up nr_to_write.
@@ -4106,7 +4129,6 @@ int btree_write_cache_pages(struct address_space *mapping,
 		index = 0;
 		goto retry;
 	}
-	ASSERT(ret <= 0);
 	if (ret < 0) {
 		end_write_bio(&epd, ret);
 		return ret;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v3 18/49] btrfs: extent_io: calculate inline extent buffer page size based on page size
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (16 preceding siblings ...)
  2020-09-30  1:55 ` [PATCH v3 17/49] btrfs: extent_io: extract the btree page submission code into its own helper function Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 19/49] btrfs: extent_io: make btrfs_fs_info::buffer_radix take sector size divided values Qu Wenruo
                   ` (30 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs

Btrfs only supports 64K as the maximum node size, thus for a 4K page
system, we would have at most 16 pages for one extent buffer.

For a system using 64K page size, we would have just a single page.

While we always use 16 pages for extent_buffer::pages[], this means for
systems using 64K pages, we are wasting memory for the 15 pages which
will never be utilized.

So this patch will change how the extent_buffer::pages[] array size
is calculated: it is now derived from BTRFS_MAX_METADATA_BLOCKSIZE and
PAGE_SIZE.

For systems using 4K page size, it will stay 16 pages.
For systems using 64K page size, it will be just 1 page.
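
The resulting array sizes for the two supported page sizes:

    /* 4K pages:  INLINE_EXTENT_BUFFER_PAGES = SZ_64K / SZ_4K  = 16 */
    /* 64K pages: INLINE_EXTENT_BUFFER_PAGES = SZ_64K / SZ_64K =  1 */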

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 6 +++---
 fs/btrfs/extent_io.h | 8 +++++---
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 6b925094608c..8662b27e42d6 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5024,9 +5024,9 @@ __alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start,
 	/*
 	 * Sanity checks, currently the maximum is 64k covered by 16x 4k pages
 	 */
-	BUILD_BUG_ON(BTRFS_MAX_METADATA_BLOCKSIZE
-		> MAX_INLINE_EXTENT_BUFFER_SIZE);
-	BUG_ON(len > MAX_INLINE_EXTENT_BUFFER_SIZE);
+	BUILD_BUG_ON(BTRFS_MAX_METADATA_BLOCKSIZE >
+		     INLINE_EXTENT_BUFFER_PAGES * PAGE_SIZE);
+	BUG_ON(len > BTRFS_MAX_METADATA_BLOCKSIZE);
 
 #ifdef CONFIG_BTRFS_DEBUG
 	eb->spinning_writers = 0;
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 3c9252b429e0..e588b3100ede 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -85,9 +85,11 @@ struct extent_io_ops {
 				    int mirror);
 };
 
-
-#define INLINE_EXTENT_BUFFER_PAGES 16
-#define MAX_INLINE_EXTENT_BUFFER_SIZE (INLINE_EXTENT_BUFFER_PAGES * PAGE_SIZE)
+/*
+ * SZ_64K is BTRFS_MAX_METADATA_BLOCKSIZE, open coded here to avoid a
+ * circular include of "ctree.h".
+ */
+#define INLINE_EXTENT_BUFFER_PAGES (SZ_64K / PAGE_SIZE)
 struct extent_buffer {
 	u64 start;
 	unsigned long len;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v3 19/49] btrfs: extent_io: make btrfs_fs_info::buffer_radix take sector size divided values
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (17 preceding siblings ...)
  2020-09-30  1:55 ` [PATCH v3 18/49] btrfs: extent_io: calculate inline extent buffer page size based on page size Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 20/49] btrfs: disk_io: grab fs_info from extent_buffer::fs_info directly for btrfs_mark_buffer_dirty() Qu Wenruo
                   ` (29 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

For subpage sector size support, one page can contain multiple tree
blocks, thus we can no longer use (eb->start >> PAGE_SHIFT), or we
could easily get an extent buffer that doesn't belong to the bytenr.

This patch uses (extent_buffer::start / sectorsize) as the radix tree
index, so that we get the correct extent buffer for subpage support,
while keeping the behavior unchanged for regular sector size.
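
For example, on a 64K page system (PAGE_SHIFT == 16) with 4K
sectorsize, two tree blocks at bytenr 16K and 32K share one page but
now get distinct radix tree slots:

    /* Old index: bytenr >> PAGE_SHIFT                       */
    /*   16K >> 16 == 0,  32K >> 16 == 0   -> collision      */
    /* New index: bytenr / sectorsize                        */
    /*   16K / 4K == 4,   32K / 4K == 8    -> distinct slots */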

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/extent_io.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 8662b27e42d6..5d982441bf6e 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5162,7 +5162,7 @@ struct extent_buffer *find_extent_buffer(struct btrfs_fs_info *fs_info,
 
 	rcu_read_lock();
 	eb = radix_tree_lookup(&fs_info->buffer_radix,
-			       start >> PAGE_SHIFT);
+			       start / fs_info->sectorsize);
 	if (eb && atomic_inc_not_zero(&eb->refs)) {
 		rcu_read_unlock();
 		/*
@@ -5214,7 +5214,7 @@ struct extent_buffer *alloc_test_extent_buffer(struct btrfs_fs_info *fs_info,
 	}
 	spin_lock(&fs_info->buffer_lock);
 	ret = radix_tree_insert(&fs_info->buffer_radix,
-				start >> PAGE_SHIFT, eb);
+				start / fs_info->sectorsize, eb);
 	spin_unlock(&fs_info->buffer_lock);
 	radix_tree_preload_end();
 	if (ret == -EEXIST) {
@@ -5322,7 +5322,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 
 	spin_lock(&fs_info->buffer_lock);
 	ret = radix_tree_insert(&fs_info->buffer_radix,
-				start >> PAGE_SHIFT, eb);
+				start / fs_info->sectorsize, eb);
 	spin_unlock(&fs_info->buffer_lock);
 	radix_tree_preload_end();
 	if (ret == -EEXIST) {
@@ -5378,7 +5378,7 @@ static int release_extent_buffer(struct extent_buffer *eb)
 
 			spin_lock(&fs_info->buffer_lock);
 			radix_tree_delete(&fs_info->buffer_radix,
-					  eb->start >> PAGE_SHIFT);
+					  eb->start / fs_info->sectorsize);
 			spin_unlock(&fs_info->buffer_lock);
 		} else {
 			spin_unlock(&eb->refs_lock);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v3 20/49] btrfs: disk_io: grab fs_info from extent_buffer::fs_info directly for btrfs_mark_buffer_dirty()
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (18 preceding siblings ...)
  2020-09-30  1:55 ` [PATCH v3 19/49] btrfs: extent_io: make btrfs_fs_info::buffer_radix take sector size divided values Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 21/49] btrfs: disk-io: make csum_tree_block() handle sectorsize smaller than page size Qu Wenruo
                   ` (28 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs

Since commit f28491e0a6c4 ("Btrfs: move the extent buffer radix tree into
the fs_info"), fs_info can be grabbed from extent_buffer directly.

So use that extent_buffer::fs_info directly in btrfs_mark_buffer_dirty()
to make things a little easier.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index c81b7e53149c..58928076d08d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4190,8 +4190,7 @@ int btrfs_buffer_uptodate(struct extent_buffer *buf, u64 parent_transid,
 
 void btrfs_mark_buffer_dirty(struct extent_buffer *buf)
 {
-	struct btrfs_fs_info *fs_info;
-	struct btrfs_root *root;
+	struct btrfs_fs_info *fs_info = buf->fs_info;
 	u64 transid = btrfs_header_generation(buf);
 	int was_dirty;
 
@@ -4204,8 +4203,6 @@ void btrfs_mark_buffer_dirty(struct extent_buffer *buf)
 	if (unlikely(test_bit(EXTENT_BUFFER_UNMAPPED, &buf->bflags)))
 		return;
 #endif
-	root = BTRFS_I(buf->pages[0]->mapping->host)->root;
-	fs_info = root->fs_info;
 	btrfs_assert_tree_locked(buf);
 	if (transid != fs_info->generation)
 		WARN(1, KERN_CRIT "btrfs transid mismatch buffer %llu, found %llu running %llu\n",
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v3 21/49] btrfs: disk-io: make csum_tree_block() handle sectorsize smaller than page size
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (19 preceding siblings ...)
  2020-09-30  1:55 ` [PATCH v3 20/49] btrfs: disk_io: grab fs_info from extent_buffer::fs_info directly for btrfs_mark_buffer_dirty() Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 22/49] btrfs: disk-io: extract the extent buffer verification from btree_readpage_end_io_hook() Qu Wenruo
                   ` (27 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues, Nikolay Borisov

For subpage size support, we only need to handle the first page.

To make the code work for both cases, we modify the following behaviors:

- num_pages calculation
  Instead of "nodesize >> PAGE_SHIFT", we go
  "DIV_ROUND_UP(nodesize, PAGE_SIZE)". This ensures we get at least one
  page for subpage size support, while still getting the same result
  for regular page size.

- The length for the first run
  Instead of PAGE_SIZE - BTRFS_CSUM_SIZE, we go min(PAGE_SIZE, nodesize)
  - BTRFS_CSUM_SIZE.
  This allows us to handle both cases well.

- The start location of the first run
  Instead of always using BTRFS_CSUM_SIZE as the csum start position,
  add offset_in_page(eb->start) to get the proper offset for both
  cases.
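
Plugging in both cases, assuming a 16K nodesize:

    /* 4K pages:  num_pages = DIV_ROUND_UP(16K, 4K)  = 4,          */
    /*            first run = min(4K, 16K) - BTRFS_CSUM_SIZE       */
    /*                      = 4K - 32 bytes, then 3 full pages     */
    /* 64K pages: num_pages = DIV_ROUND_UP(16K, 64K) = 1,          */
    /*            only run  = min(64K, 16K) - BTRFS_CSUM_SIZE      */
    /*                      = 16K - 32 bytes, read from the page   */
    /*                        at offset_in_page(buf->start)        */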

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/disk-io.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 58928076d08d..55bb4f2def3c 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -257,16 +257,16 @@ struct extent_map *btree_get_extent(struct btrfs_inode *inode,
 static void csum_tree_block(struct extent_buffer *buf, u8 *result)
 {
 	struct btrfs_fs_info *fs_info = buf->fs_info;
-	const int num_pages = fs_info->nodesize >> PAGE_SHIFT;
+	const int num_pages = DIV_ROUND_UP(fs_info->nodesize, PAGE_SIZE);
 	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
 	char *kaddr;
 	int i;
 
 	shash->tfm = fs_info->csum_shash;
 	crypto_shash_init(shash);
-	kaddr = page_address(buf->pages[0]);
+	kaddr = page_address(buf->pages[0]) + offset_in_page(buf->start);
 	crypto_shash_update(shash, kaddr + BTRFS_CSUM_SIZE,
-			    PAGE_SIZE - BTRFS_CSUM_SIZE);
+		min_t(u32, PAGE_SIZE, fs_info->nodesize) - BTRFS_CSUM_SIZE);
 
 	for (i = 1; i < num_pages; i++) {
 		kaddr = page_address(buf->pages[i]);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v3 22/49] btrfs: disk-io: extract the extent buffer verification from btree_readpage_end_io_hook()
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (20 preceding siblings ...)
  2020-09-30  1:55 ` [PATCH v3 21/49] btrfs: disk-io: make csum_tree_block() handle sectorsize smaller than page size Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 23/49] btrfs: disk-io: accept bvec directly for csum_dirty_buffer() Qu Wenruo
                   ` (26 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs

Currently btree_readpage_end_io_hook() only needs to handle one extent
buffer, as one page only maps to one extent buffer.

But for the incoming subpage support, one page can be mapped to
multiple extent buffers, thus we can no longer use the current code.

This refactor allows us to call btrfs_check_extent_buffer() on all
involved extent buffers from btree_readpage_end_io_hook() and other
locations.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 78 ++++++++++++++++++++++++++--------------------
 1 file changed, 44 insertions(+), 34 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 55bb4f2def3c..ee2a6d480a7d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -574,60 +574,37 @@ static int check_tree_block_fsid(struct extent_buffer *eb)
 	return ret;
 }
 
-static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
-				      u64 phy_offset, struct page *page,
-				      u64 start, u64 end, int mirror)
+/* Do basic extent buffer check at read time */
+static int btrfs_check_extent_buffer(struct extent_buffer *eb)
 {
-	u64 found_start;
-	int found_level;
-	struct extent_buffer *eb;
-	struct btrfs_fs_info *fs_info;
+	struct btrfs_fs_info *fs_info = eb->fs_info;
 	u16 csum_size;
-	int ret = 0;
+	u64 found_start;
+	u8 found_level;
 	u8 result[BTRFS_CSUM_SIZE];
-	int reads_done;
-
-	if (!page->private)
-		goto out;
+	int ret = 0;
 
-	eb = (struct extent_buffer *)page->private;
-	fs_info = eb->fs_info;
 	csum_size = btrfs_super_csum_size(fs_info->super_copy);
 
-	/* the pending IO might have been the only thing that kept this buffer
-	 * in memory.  Make sure we have a ref for all this other checks
-	 */
-	atomic_inc(&eb->refs);
-
-	reads_done = atomic_dec_and_test(&eb->io_pages);
-	if (!reads_done)
-		goto err;
-
-	eb->read_mirror = mirror;
-	if (test_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags)) {
-		ret = -EIO;
-		goto err;
-	}
-
 	found_start = btrfs_header_bytenr(eb);
 	if (found_start != eb->start) {
 		btrfs_err_rl(fs_info, "bad tree block start, want %llu have %llu",
 			     eb->start, found_start);
 		ret = -EIO;
-		goto err;
+		goto out;
 	}
 	if (check_tree_block_fsid(eb)) {
 		btrfs_err_rl(fs_info, "bad fsid on block %llu",
 			     eb->start);
 		ret = -EIO;
-		goto err;
+		goto out;
 	}
 	found_level = btrfs_header_level(eb);
 	if (found_level >= BTRFS_MAX_LEVEL) {
 		btrfs_err(fs_info, "bad tree block level %d on %llu",
 			  (int)btrfs_header_level(eb), eb->start);
 		ret = -EIO;
-		goto err;
+		goto out;
 	}
 
 	btrfs_set_buffer_lockdep_class(btrfs_header_owner(eb),
@@ -647,7 +624,7 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
 			      fs_info->sb->s_id, eb->start,
 			      val, found, btrfs_header_level(eb));
 		ret = -EUCLEAN;
-		goto err;
+		goto out;
 	}
 
 	/*
@@ -669,6 +646,40 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
 		btrfs_err(fs_info,
 			  "block=%llu read time tree block corruption detected",
 			  eb->start);
+out:
+	return ret;
+}
+
+static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
+				      u64 phy_offset, struct page *page,
+				      u64 start, u64 end, int mirror)
+{
+	struct extent_buffer *eb;
+	int ret = 0;
+	bool reads_done;
+
+	/* Metadata pages that go through IO should all have private set */
+	ASSERT(PagePrivate(page) && page->private);
+	eb = (struct extent_buffer *)page->private;
+
+	/*
+	 * The pending IO might have been the only thing that kept this buffer
+	 * in memory.  Make sure we have a ref for all these other checks
+	 */
+	atomic_inc(&eb->refs);
+
+	reads_done = atomic_dec_and_test(&eb->io_pages);
+	if (!reads_done)
+		goto err;
+
+	eb->read_mirror = mirror;
+	if (test_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags)) {
+		ret = -EIO;
+		goto err;
+	}
+
+	ret = btrfs_check_extent_buffer(eb);
+
 err:
 	if (reads_done &&
 	    test_and_clear_bit(EXTENT_BUFFER_READAHEAD, &eb->bflags))
@@ -684,7 +695,6 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
 		clear_extent_buffer_uptodate(eb);
 	}
 	free_extent_buffer(eb);
-out:
 	return ret;
 }
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v3 23/49] btrfs: disk-io: accept bvec directly for csum_dirty_buffer()
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (21 preceding siblings ...)
  2020-09-30  1:55 ` [PATCH v3 22/49] btrfs: disk-io: extract the extent buffer verification from btree_readpage_end_io_hook() Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 24/49] btrfs: inode: make btrfs_readpage_end_io_hook() follow sector size Qu Wenruo
                   ` (25 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs

Currently csum_dirty_buffer() uses the page to grab the extent buffer,
but that only works for the regular sector size == PAGE_SIZE case.

For subpage we need page + page_offset to grab extent buffer.

This patch will change csum_dirty_buffer() to accept bvec directly so
that we can extract both page and page_offset for later subpage support.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index ee2a6d480a7d..b34a3f312e0c 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -495,13 +495,14 @@ static int btree_read_extent_buffer_pages(struct extent_buffer *eb,
  * we only fill in the checksum field in the first page of a multi-page block
  */
 
-static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct page *page)
+static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct bio_vec *bvec)
 {
+	struct extent_buffer *eb;
+	struct page *page = bvec->bv_page;
 	u64 start = page_offset(page);
 	u64 found_start;
 	u8 result[BTRFS_CSUM_SIZE];
 	u16 csum_size = btrfs_super_csum_size(fs_info->super_copy);
-	struct extent_buffer *eb;
 	int ret;
 
 	eb = (struct extent_buffer *)page->private;
@@ -848,7 +849,7 @@ static blk_status_t btree_csum_one_bio(struct bio *bio)
 	ASSERT(!bio_flagged(bio, BIO_CLONED));
 	bio_for_each_segment_all(bvec, bio, iter_all) {
 		root = BTRFS_I(bvec->bv_page->mapping->host)->root;
-		ret = csum_dirty_buffer(root->fs_info, bvec->bv_page);
+		ret = csum_dirty_buffer(root->fs_info, bvec);
 		if (ret)
 			break;
 	}
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v3 24/49] btrfs: inode: make btrfs_readpage_end_io_hook() follow sector size
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (22 preceding siblings ...)
  2020-09-30  1:55 ` [PATCH v3 23/49] btrfs: disk-io: accept bvec directly for csum_dirty_buffer() Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 25/49] btrfs: introduce a helper to determine if the sectorsize is smaller than PAGE_SIZE Qu Wenruo
                   ` (24 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

Currently btrfs_readpage_end_io_hook() just passes the whole page to
check_data_csum(), which is fine since we only support sectorsize ==
PAGE_SIZE.

To support subpage, we need to properly honor per-sector checksum
verification, just like what we do in the dio read path.

This patch will do the csum verification in a for loop, starting at
pg_off == start - page_offset(page) and incrementing by sectorsize on
each iteration.

For the sectorsize == PAGE_SIZE case, pg_off will always be 0, and we
finish after just one iteration.

For subpage, we do the proper loop.
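
For example, a read covering file range [64K, 80K) of a 64K page with
4K sectorsize runs four iterations (a sketch of the loop progression):

    /* offset = start - page_offset(page) = 0                  */
    /* pg_off = 0, 4K, 8K, 12K; phy_offset advances by 1 each  */
    /* any failing sector sets found_err, and we return -EIO   */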

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 10ea6a92685b..2ee6ff186be4 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2849,9 +2849,12 @@ static int btrfs_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
 				      u64 start, u64 end, int mirror)
 {
 	size_t offset = start - page_offset(page);
+	size_t pg_off;
 	struct inode *inode = page->mapping->host;
 	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
 	struct btrfs_root *root = BTRFS_I(inode)->root;
+	u32 sectorsize = root->fs_info->sectorsize;
+	bool found_err = false;
 
 	if (PageChecked(page)) {
 		ClearPageChecked(page);
@@ -2868,7 +2871,17 @@ static int btrfs_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
 	}
 
 	phy_offset >>= inode->i_sb->s_blocksize_bits;
-	return check_data_csum(inode, io_bio, phy_offset, page, offset);
+	for (pg_off = offset; pg_off < end - page_offset(page);
+	     pg_off += sectorsize, phy_offset++) {
+		int ret;
+
+		ret = check_data_csum(inode, io_bio, phy_offset, page, pg_off);
+		if (ret < 0)
+			found_err = true;
+	}
+	if (found_err)
+		return -EIO;
+	return 0;
 }
 
 /*
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v3 25/49] btrfs: introduce a helper to determine if the sectorsize is smaller than PAGE_SIZE
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (23 preceding siblings ...)
  2020-09-30  1:55 ` [PATCH v3 24/49] btrfs: inode: make btrfs_readpage_end_io_hook() follow sector size Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 26/49] btrfs: extent_io: allow find_first_extent_bit() to find a range with exact bits match Qu Wenruo
                   ` (23 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs

Just to save us several letters for the incoming patches.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ctree.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 9a72896bed2e..e3501dad88e2 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3532,6 +3532,11 @@ static inline int btrfs_defrag_cancelled(struct btrfs_fs_info *fs_info)
 	return signal_pending(current);
 }
 
+static inline bool btrfs_is_subpage(struct btrfs_fs_info *fs_info)
+{
+	return (fs_info->sectorsize < PAGE_SIZE);
+}
+
 #define in_range(b, first, len) ((b) >= (first) && (b) < (first) + (len))
 
 /* Sanity test specific functions */
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v3 26/49] btrfs: extent_io: allow find_first_extent_bit() to find a range with exact bits match
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (24 preceding siblings ...)
  2020-09-30  1:55 ` [PATCH v3 25/49] btrfs: introduce a helper to determine if the sectorsize is smaller than PAGE_SIZE Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 27/49] btrfs: extent_io: don't allow tree block to cross page boundary for subpage support Qu Wenruo
                   ` (22 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs

Currently if we pass multiple @bits to find_first_extent_bit(), it
will return the first range with one or more bits matching @bits.

This is fine for the current code, since most callers just do their
own extra checks, and all existing callers only call it with 1 or 2
bits.

But for the incoming subpage support, we want the ability to return a
range with an exact match, so that the caller can skip some extra
checks.

So this patch will add a new bool parameter, @exact_match, to
find_first_extent_bit() and its callees.
Currently all callers just pass 'false' to the new parameter, thus no
functional change is introduced.
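
To illustrate the difference, with a state that only has EXTENT_DIRTY
set:

    unsigned bits = EXTENT_DIRTY | EXTENT_UPTODATE;

    match_extent_state(state, bits, false); /* true: any bit matches     */
    match_extent_state(state, bits, true);  /* false: both bits required */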

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/block-group.c      |  2 +-
 fs/btrfs/disk-io.c          |  4 ++--
 fs/btrfs/extent-io-tree.h   |  2 +-
 fs/btrfs/extent-tree.c      |  2 +-
 fs/btrfs/extent_io.c        | 42 +++++++++++++++++++++++++------------
 fs/btrfs/free-space-cache.c |  2 +-
 fs/btrfs/relocation.c       |  2 +-
 fs/btrfs/transaction.c      |  4 ++--
 fs/btrfs/volumes.c          |  2 +-
 9 files changed, 39 insertions(+), 23 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index ea8aaf36647e..7e6ab6b765f6 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -461,7 +461,7 @@ u64 add_new_free_space(struct btrfs_block_group *block_group, u64 start, u64 end
 		ret = find_first_extent_bit(&info->excluded_extents, start,
 					    &extent_start, &extent_end,
 					    EXTENT_DIRTY | EXTENT_UPTODATE,
-					    NULL);
+					    false, NULL);
 		if (ret)
 			break;
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b34a3f312e0c..1ca121ca28aa 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4516,7 +4516,7 @@ static int btrfs_destroy_marked_extents(struct btrfs_fs_info *fs_info,
 
 	while (1) {
 		ret = find_first_extent_bit(dirty_pages, start, &start, &end,
-					    mark, NULL);
+					    mark, false, NULL);
 		if (ret)
 			break;
 
@@ -4556,7 +4556,7 @@ static int btrfs_destroy_pinned_extent(struct btrfs_fs_info *fs_info,
 		 */
 		mutex_lock(&fs_info->unused_bg_unpin_mutex);
 		ret = find_first_extent_bit(unpin, 0, &start, &end,
-					    EXTENT_DIRTY, &cached_state);
+					    EXTENT_DIRTY, false, &cached_state);
 		if (ret) {
 			mutex_unlock(&fs_info->unused_bg_unpin_mutex);
 			break;
diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
index 5927338c74a2..4d0dbb562a81 100644
--- a/fs/btrfs/extent-io-tree.h
+++ b/fs/btrfs/extent-io-tree.h
@@ -224,7 +224,7 @@ static inline int set_extent_uptodate(struct extent_io_tree *tree, u64 start,
 
 int find_first_extent_bit(struct extent_io_tree *tree, u64 start,
 			  u64 *start_ret, u64 *end_ret, unsigned bits,
-			  struct extent_state **cached_state);
+			  bool exact_match, struct extent_state **cached_state);
 void find_first_clear_extent_bit(struct extent_io_tree *tree, u64 start,
 				 u64 *start_ret, u64 *end_ret, unsigned bits);
 int find_contiguous_extent_bit(struct extent_io_tree *tree, u64 start,
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index e9eedc053fc5..406329dabb48 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2880,7 +2880,7 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans)
 
 		mutex_lock(&fs_info->unused_bg_unpin_mutex);
 		ret = find_first_extent_bit(unpin, 0, &start, &end,
-					    EXTENT_DIRTY, &cached_state);
+					    EXTENT_DIRTY, false, &cached_state);
 		if (ret) {
 			mutex_unlock(&fs_info->unused_bg_unpin_mutex);
 			break;
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 5d982441bf6e..50cd5efc79ab 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1521,13 +1521,27 @@ void extent_range_redirty_for_io(struct inode *inode, u64 start, u64 end)
 	}
 }
 
-/* find the first state struct with 'bits' set after 'start', and
- * return it.  tree->lock must be held.  NULL will returned if
- * nothing was found after 'start'
+static bool match_extent_state(struct extent_state *state, unsigned bits,
+			       bool exact_match)
+{
+	if (exact_match)
+		return ((state->state & bits) == bits);
+	return (state->state & bits);
+}
+
+/*
+ * Find the first state struct with @bits set after @start.
+ *
+ * NOTE: tree->lock must be held.
+ *
+ * @exact_match:	Do we need to have all @bits set, or just any of
+ * 			the @bits.
+ *
+ * Return NULL if we can't find a match.
  */
 static struct extent_state *
 find_first_extent_bit_state(struct extent_io_tree *tree,
-			    u64 start, unsigned bits)
+			    u64 start, unsigned bits, bool exact_match)
 {
 	struct rb_node *node;
 	struct extent_state *state;
@@ -1542,7 +1556,8 @@ find_first_extent_bit_state(struct extent_io_tree *tree,
 
 	while (1) {
 		state = rb_entry(node, struct extent_state, rb_node);
-		if (state->end >= start && (state->state & bits))
+		if (state->end >= start &&
+		    match_extent_state(state, bits, exact_match))
 			return state;
 
 		node = rb_next(node);
@@ -1563,7 +1578,7 @@ find_first_extent_bit_state(struct extent_io_tree *tree,
  */
 int find_first_extent_bit(struct extent_io_tree *tree, u64 start,
 			  u64 *start_ret, u64 *end_ret, unsigned bits,
-			  struct extent_state **cached_state)
+			  bool exact_match, struct extent_state **cached_state)
 {
 	struct extent_state *state;
 	int ret = 1;
@@ -1573,7 +1588,8 @@ int find_first_extent_bit(struct extent_io_tree *tree, u64 start,
 		state = *cached_state;
 		if (state->end == start - 1 && extent_state_in_tree(state)) {
 			while ((state = next_state(state)) != NULL) {
-				if (state->state & bits)
+				if (match_extent_state(state, bits,
+				    exact_match))
 					goto got_it;
 			}
 			free_extent_state(*cached_state);
@@ -1584,7 +1600,7 @@ int find_first_extent_bit(struct extent_io_tree *tree, u64 start,
 		*cached_state = NULL;
 	}
 
-	state = find_first_extent_bit_state(tree, start, bits);
+	state = find_first_extent_bit_state(tree, start, bits, exact_match);
 got_it:
 	if (state) {
 		cache_state_if_flags(state, cached_state, 0);
@@ -1619,7 +1635,7 @@ int find_contiguous_extent_bit(struct extent_io_tree *tree, u64 start,
 	int ret = 1;
 
 	spin_lock(&tree->lock);
-	state = find_first_extent_bit_state(tree, start, bits);
+	state = find_first_extent_bit_state(tree, start, bits, false);
 	if (state) {
 		*start_ret = state->start;
 		*end_ret = state->end;
@@ -2413,9 +2429,8 @@ int clean_io_failure(struct btrfs_fs_info *fs_info,
 		goto out;
 
 	spin_lock(&io_tree->lock);
-	state = find_first_extent_bit_state(io_tree,
-					    failrec->start,
-					    EXTENT_LOCKED);
+	state = find_first_extent_bit_state(io_tree, failrec->start,
+					    EXTENT_LOCKED, false);
 	spin_unlock(&io_tree->lock);
 
 	if (state && state->start <= failrec->start &&
@@ -2451,7 +2466,8 @@ void btrfs_free_io_failure_record(struct btrfs_inode *inode, u64 start, u64 end)
 		return;
 
 	spin_lock(&failure_tree->lock);
-	state = find_first_extent_bit_state(failure_tree, start, EXTENT_DIRTY);
+	state = find_first_extent_bit_state(failure_tree, start, EXTENT_DIRTY,
+					    false);
 	while (state) {
 		if (state->start > end)
 			break;
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index dc82fd0c80cb..1533df86536b 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -1093,7 +1093,7 @@ static noinline_for_stack int write_pinned_extent_entries(
 	while (start < block_group->start + block_group->length) {
 		ret = find_first_extent_bit(unpin, start,
 					    &extent_start, &extent_end,
-					    EXTENT_DIRTY, NULL);
+					    EXTENT_DIRTY, false, NULL);
 		if (ret)
 			return 0;
 
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 4ba1ab9cc76d..77a7e35a500c 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3153,7 +3153,7 @@ int find_next_extent(struct reloc_control *rc, struct btrfs_path *path,
 
 		ret = find_first_extent_bit(&rc->processed_blocks,
 					    key.objectid, &start, &end,
-					    EXTENT_DIRTY, NULL);
+					    EXTENT_DIRTY, false, NULL);
 
 		if (ret == 0 && start <= key.objectid) {
 			btrfs_release_path(path);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 20c6ac1a5de7..5b3444641ea5 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -974,7 +974,7 @@ int btrfs_write_marked_extents(struct btrfs_fs_info *fs_info,
 
 	atomic_inc(&BTRFS_I(fs_info->btree_inode)->sync_writers);
 	while (!find_first_extent_bit(dirty_pages, start, &start, &end,
-				      mark, &cached_state)) {
+				      mark, false, &cached_state)) {
 		bool wait_writeback = false;
 
 		err = convert_extent_bit(dirty_pages, start, end,
@@ -1029,7 +1029,7 @@ static int __btrfs_wait_marked_extents(struct btrfs_fs_info *fs_info,
 	u64 end;
 
 	while (!find_first_extent_bit(dirty_pages, start, &start, &end,
-				      EXTENT_NEED_WAIT, &cached_state)) {
+				      EXTENT_NEED_WAIT, false, &cached_state)) {
 		/*
 		 * Ignore -ENOMEM errors returned by clear_extent_bit().
 		 * When committing the transaction, we'll remove any entries
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 214856c4ccb1..c54329e92ced 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1382,7 +1382,7 @@ static bool contains_pending_extent(struct btrfs_device *device, u64 *start,
 
 	if (!find_first_extent_bit(&device->alloc_state, *start,
 				   &physical_start, &physical_end,
-				   CHUNK_ALLOCATED, NULL)) {
+				   CHUNK_ALLOCATED, false, NULL)) {
 
 		if (in_range(physical_start, *start, len) ||
 		    in_range(*start, physical_start,
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v3 27/49] btrfs: extent_io: don't allow tree block to cross page boundary for subpage support
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (25 preceding siblings ...)
  2020-09-30  1:55 ` [PATCH v3 26/49] btrfs: extent_io: allow find_first_extent_bit() to find a range with exact bits match Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 28/49] btrfs: extent_io: update num_extent_pages() to support subpage sized extent buffer Qu Wenruo
                   ` (21 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs

As a preparation for subpage sector size support (allowing filesystems
with sector size smaller than page size to be mounted), if the sector
size is smaller than the page size we don't allow a tree block to be
read if it crosses the 64K(*) boundary.

The 64K is selected because:
- We are only going to support 64K page size for subpage for now
- 64K is also the max node size btrfs supports

This ensures that tree blocks are always contained in one page for a
system with 64K page size, which greatly simplifies the handling.

Otherwise we would need to do complex multi-page handling for tree
blocks.

Currently the only way to create such tree blocks crossing the 64K
boundary is by btrfs-convert, which will get fixed soon and doesn't
have widespread usage.
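
For example, with 64K pages, a (converted) 16K tree block at bytenr
60K is rejected:

    round_down(60K, 64K)           ==  0
    round_down(60K + 16K - 1, 64K) == 64K   /* differs -> -EINVAL */

while any tree block fully contained in one 64K page passes the check.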

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 50cd5efc79ab..28188509a206 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5268,6 +5268,13 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		btrfs_err(fs_info, "bad tree block start %llu", start);
 		return ERR_PTR(-EINVAL);
 	}
+	if (btrfs_is_subpage(fs_info) && round_down(start, PAGE_SIZE) !=
+	    round_down(start + len - 1, PAGE_SIZE)) {
+		btrfs_err(fs_info,
+		"tree block crosses page boundary, start %llu nodesize %lu",
+			  start, len);
+		return ERR_PTR(-EINVAL);
+	}
 
 	eb = find_extent_buffer(fs_info, start);
 	if (eb)
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v3 28/49] btrfs: extent_io: update num_extent_pages() to support subpage sized extent buffer
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (26 preceding siblings ...)
  2020-09-30  1:55 ` [PATCH v3 27/49] btrfs: extent_io: don't allow tree block to cross page boundary for subpage support Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 29/49] btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors Qu Wenruo
                   ` (20 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs

For subpage sized extent buffers, we have ensured that no extent
buffer will cross a page boundary, thus we only need one page for any
extent buffer.

This patch will update the function num_extent_pages() to handle such
a case.
Now num_extent_pages() will return 1 for subpage sized extent buffers.
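
With a 16K nodesize, for example:

    /* 4K pages:  round_up(16K, 4K)  >> 12 == 4 */
    /* 64K pages: round_up(16K, 64K) >> 16 == 1 */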

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.h | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index e588b3100ede..552afc1c0bbc 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -229,8 +229,15 @@ void wait_on_extent_buffer_writeback(struct extent_buffer *eb);
 
 static inline int num_extent_pages(const struct extent_buffer *eb)
 {
-	return (round_up(eb->start + eb->len, PAGE_SIZE) >> PAGE_SHIFT) -
-	       (eb->start >> PAGE_SHIFT);
+	/*
+	 * For the sectorsize == PAGE_SIZE case, eb->len is always aligned
+	 * to PAGE_SIZE, so this is just eb->len >> PAGE_SHIFT.
+	 *
+	 * For the sectorsize < PAGE_SIZE case, we only want to support 64K
+	 * PAGE_SIZE, and have ensured no tree block crosses a page
+	 * boundary. So in that case we always get one page.
+	 */
+	return (round_up(eb->len, PAGE_SIZE) >> PAGE_SHIFT);
 }
 
 static inline int extent_buffer_uptodate(const struct extent_buffer *eb)
-- 
2.28.0



* [PATCH v3 29/49] btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (27 preceding siblings ...)
  2020-09-30  1:55 ` [PATCH v3 28/49] btrfs: extent_io: update num_extent_pages() to support subpage sized extent buffer Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 30/49] btrfs: disk-io: only clear EXTENT_LOCK bit for extent_invalidatepage() Qu Wenruo
                   ` (19 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

To support sectorsize < PAGE_SIZE case, we need to take extra care for
extent buffer accessors.

Since sectorsize is smaller than PAGE_SIZE, one page can contain
multiple tree blocks, so we must use eb->start to determine the real
offset to read/write in the extent buffer accessors.

This patch introduces two helpers to do this:
- get_eb_page_index()
  This is to calculate the index to access extent_buffer::pages.
  It's just a simple wrapper around "start >> PAGE_SHIFT".

  For sectorsize == PAGE_SIZE case, nothing is changed.
  For the sectorsize < PAGE_SIZE case, the index is always 0, and the
  existing page shift also works fine.

- get_eb_page_offset()
  This is to calculate the offset to access extent_buffer::pages.
  This needs to take extent_buffer::start into consideration.

  For sectorsize == PAGE_SIZE case, extent_buffer::start is always
  aligned to PAGE_SIZE, thus adding extent_buffer::start to
  offset_in_page() won't change the result.
  For sectorsize < PAGE_SIZE case, adding extent_buffer::start gives
  us the correct offset to access.
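
A concrete example (illustration only; the 64K page size, 16K nodesize
and the bytenr are assumed values): for an extent buffer at
eb->start == 208K, i.e. at offset 16K inside its 64K page, accessing
eb offset 8K gives:

	get_eb_page_index(8K)      == 8K >> 16 == 0   (always page 0)
	get_eb_page_offset(eb, 8K) == offset_in_page(208K + 8K)
	                           == 24K             (16K + 8K in-page)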

This patch will touch the following parts to cover all extent buffer
accessors:

- BTRFS_SETGET_HEADER_FUNCS()
- read_extent_buffer()
- read_extent_buffer_to_user()
- memcmp_extent_buffer()
- write_extent_buffer_chunk_tree_uuid()
- write_extent_buffer_fsid()
- write_extent_buffer()
- memzero_extent_buffer()
- copy_extent_buffer_full()
- copy_extent_buffer()
- memcpy_extent_buffer()
- memmove_extent_buffer()
- btrfs_get_token_##bits()
- btrfs_get_##bits()
- btrfs_set_token_##bits()
- btrfs_set_##bits()
- generic_bin_search()

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ctree.c        |  5 ++--
 fs/btrfs/ctree.h        | 38 ++++++++++++++++++++++--
 fs/btrfs/extent_io.c    | 66 ++++++++++++++++++++++++-----------------
 fs/btrfs/struct-funcs.c | 18 ++++++-----
 4 files changed, 88 insertions(+), 39 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index cd392da69b81..0f6944a3a836 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1712,10 +1712,11 @@ static noinline int generic_bin_search(struct extent_buffer *eb,
 		oip = offset_in_page(offset);
 
 		if (oip + key_size <= PAGE_SIZE) {
-			const unsigned long idx = offset >> PAGE_SHIFT;
+			const unsigned long idx = get_eb_page_index(offset);
 			char *kaddr = page_address(eb->pages[idx]);
 
-			tmp = (struct btrfs_disk_key *)(kaddr + oip);
+			tmp = (struct btrfs_disk_key *)(kaddr +
+					get_eb_page_offset(eb, offset));
 		} else {
 			read_extent_buffer(eb, &unaligned, offset, key_size);
 			tmp = &unaligned;
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index e3501dad88e2..0c3ea3599dc7 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1448,14 +1448,15 @@ static inline void btrfs_set_token_##name(struct btrfs_map_token *token,\
 #define BTRFS_SETGET_HEADER_FUNCS(name, type, member, bits)		\
 static inline u##bits btrfs_##name(const struct extent_buffer *eb)	\
 {									\
-	const type *p = page_address(eb->pages[0]);			\
+	const type *p = page_address(eb->pages[0]) +			\
+			offset_in_page(eb->start);			\
 	u##bits res = le##bits##_to_cpu(p->member);			\
 	return res;							\
 }									\
 static inline void btrfs_set_##name(const struct extent_buffer *eb,	\
 				    u##bits val)			\
 {									\
-	type *p = page_address(eb->pages[0]);				\
+	type *p = page_address(eb->pages[0]) + offset_in_page(eb->start); \
 	p->member = cpu_to_le##bits(val);				\
 }
 
@@ -3241,6 +3242,39 @@ static inline void assertfail(const char *expr, const char* file, int line) { }
 #define ASSERT(expr)	(void)(expr)
 #endif
 
+/*
+ * Get the correct offset inside the page of extent buffer.
+ *
+ * Will handle both sectorsize == PAGE_SIZE and sectorsize < PAGE_SIZE cases.
+ *
+ * @eb:			The target extent buffer
+ * @offset_in_eb:	The offset inside the extent buffer
+ */
+static inline size_t get_eb_page_offset(const struct extent_buffer *eb,
+					unsigned long offset_in_eb)
+{
+	/*
+	 * For sectorsize == PAGE_SIZE case, eb->start will always be aligned
+	 * to PAGE_SIZE, thus adding it won't cause any difference.
+	 *
+	 * For sectorsize < PAGE_SIZE, we must only access the data that
+	 * belongs to the eb, thus we have to take eb->start into account.
+	 */
+	return offset_in_page(offset_in_eb + eb->start);
+}
+
+static inline unsigned long get_eb_page_index(unsigned long offset_in_eb)
+{
+	/*
+	 * For sectorsize == PAGE_SIZE case, plain >> PAGE_SHIFT is enough.
+	 *
+	 * For the sectorsize < PAGE_SIZE case, we only support 64K PAGE_SIZE,
+	 * and have ensured all tree blocks are contained in one page, thus
+	 * we always get index == 0.
+	 */
+	return offset_in_eb >> PAGE_SHIFT;
+}
+
 /*
  * Use that for functions that are conditionally exported for sanity tests but
  * otherwise static
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 28188509a206..e42a17039bf6 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5673,7 +5673,7 @@ void read_extent_buffer(const struct extent_buffer *eb, void *dstv,
 	struct page *page;
 	char *kaddr;
 	char *dst = (char *)dstv;
-	unsigned long i = start >> PAGE_SHIFT;
+	unsigned long i = get_eb_page_index(start);
 
 	if (start + len > eb->len) {
 		WARN(1, KERN_ERR "btrfs bad mapping eb start %llu len %lu, wanted %lu %lu\n",
@@ -5682,7 +5682,7 @@ void read_extent_buffer(const struct extent_buffer *eb, void *dstv,
 		return;
 	}
 
-	offset = offset_in_page(start);
+	offset = get_eb_page_offset(eb, start);
 
 	while (len > 0) {
 		page = eb->pages[i];
@@ -5707,13 +5707,13 @@ int read_extent_buffer_to_user_nofault(const struct extent_buffer *eb,
 	struct page *page;
 	char *kaddr;
 	char __user *dst = (char __user *)dstv;
-	unsigned long i = start >> PAGE_SHIFT;
+	unsigned long i = get_eb_page_index(start);
 	int ret = 0;
 
 	WARN_ON(start > eb->len);
 	WARN_ON(start + len > eb->start + eb->len);
 
-	offset = offset_in_page(start);
+	offset = get_eb_page_offset(eb, start);
 
 	while (len > 0) {
 		page = eb->pages[i];
@@ -5742,13 +5742,13 @@ int memcmp_extent_buffer(const struct extent_buffer *eb, const void *ptrv,
 	struct page *page;
 	char *kaddr;
 	char *ptr = (char *)ptrv;
-	unsigned long i = start >> PAGE_SHIFT;
+	unsigned long i = get_eb_page_index(start);
 	int ret = 0;
 
 	WARN_ON(start > eb->len);
 	WARN_ON(start + len > eb->start + eb->len);
 
-	offset = offset_in_page(start);
+	offset = get_eb_page_offset(eb, start);
 
 	while (len > 0) {
 		page = eb->pages[i];
@@ -5774,7 +5774,7 @@ void write_extent_buffer_chunk_tree_uuid(const struct extent_buffer *eb,
 	char *kaddr;
 
 	WARN_ON(!PageUptodate(eb->pages[0]));
-	kaddr = page_address(eb->pages[0]);
+	kaddr = page_address(eb->pages[0]) + get_eb_page_offset(eb, 0);
 	memcpy(kaddr + offsetof(struct btrfs_header, chunk_tree_uuid), srcv,
 			BTRFS_FSID_SIZE);
 }
@@ -5784,7 +5784,7 @@ void write_extent_buffer_fsid(const struct extent_buffer *eb, const void *srcv)
 	char *kaddr;
 
 	WARN_ON(!PageUptodate(eb->pages[0]));
-	kaddr = page_address(eb->pages[0]);
+	kaddr = page_address(eb->pages[0]) + get_eb_page_offset(eb, 0);
 	memcpy(kaddr + offsetof(struct btrfs_header, fsid), srcv,
 			BTRFS_FSID_SIZE);
 }
@@ -5797,12 +5797,12 @@ void write_extent_buffer(const struct extent_buffer *eb, const void *srcv,
 	struct page *page;
 	char *kaddr;
 	char *src = (char *)srcv;
-	unsigned long i = start >> PAGE_SHIFT;
+	unsigned long i = get_eb_page_index(start);
 
 	WARN_ON(start > eb->len);
 	WARN_ON(start + len > eb->start + eb->len);
 
-	offset = offset_in_page(start);
+	offset = get_eb_page_offset(eb, start);
 
 	while (len > 0) {
 		page = eb->pages[i];
@@ -5826,12 +5826,12 @@ void memzero_extent_buffer(const struct extent_buffer *eb, unsigned long start,
 	size_t offset;
 	struct page *page;
 	char *kaddr;
-	unsigned long i = start >> PAGE_SHIFT;
+	unsigned long i = get_eb_page_index(start);
 
 	WARN_ON(start > eb->len);
 	WARN_ON(start + len > eb->start + eb->len);
 
-	offset = offset_in_page(start);
+	offset = get_eb_page_offset(eb, start);
 
 	while (len > 0) {
 		page = eb->pages[i];
@@ -5855,10 +5855,22 @@ void copy_extent_buffer_full(const struct extent_buffer *dst,
 
 	ASSERT(dst->len == src->len);
 
-	num_pages = num_extent_pages(dst);
-	for (i = 0; i < num_pages; i++)
-		copy_page(page_address(dst->pages[i]),
-				page_address(src->pages[i]));
+	if (dst->fs_info->sectorsize == PAGE_SIZE) {
+		num_pages = num_extent_pages(dst);
+		for (i = 0; i < num_pages; i++)
+			copy_page(page_address(dst->pages[i]),
+				  page_address(src->pages[i]));
+	} else {
+		unsigned long src_index = get_eb_page_index(0);
+		unsigned long dst_index = get_eb_page_index(0);
+		size_t src_offset = get_eb_page_offset(src, 0);
+		size_t dst_offset = get_eb_page_offset(dst, 0);
+
+		ASSERT(src_index == 0 && dst_index == 0);
+		memcpy(page_address(dst->pages[dst_index]) + dst_offset,
+		       page_address(src->pages[src_index]) + src_offset,
+		       src->len);
+	}
 }
 
 void copy_extent_buffer(const struct extent_buffer *dst,
@@ -5871,11 +5883,11 @@ void copy_extent_buffer(const struct extent_buffer *dst,
 	size_t offset;
 	struct page *page;
 	char *kaddr;
-	unsigned long i = dst_offset >> PAGE_SHIFT;
+	unsigned long i = get_eb_page_index(dst_offset);
 
 	WARN_ON(src->len != dst_len);
 
-	offset = offset_in_page(dst_offset);
+	offset = get_eb_page_offset(dst, dst_offset);
 
 	while (len > 0) {
 		page = dst->pages[i];
@@ -5919,7 +5931,7 @@ static inline void eb_bitmap_offset(const struct extent_buffer *eb,
 	 * the bitmap item in the extent buffer + the offset of the byte in the
 	 * bitmap item.
 	 */
-	offset = start + byte_offset;
+	offset = start + offset_in_page(eb->start) + byte_offset;
 
 	*page_index = offset >> PAGE_SHIFT;
 	*page_offset = offset_in_page(offset);
@@ -6083,11 +6095,11 @@ void memcpy_extent_buffer(const struct extent_buffer *dst,
 	}
 
 	while (len > 0) {
-		dst_off_in_page = offset_in_page(dst_offset);
-		src_off_in_page = offset_in_page(src_offset);
+		dst_off_in_page = get_eb_page_offset(dst, dst_offset);
+		src_off_in_page = get_eb_page_offset(dst, src_offset);
 
-		dst_i = dst_offset >> PAGE_SHIFT;
-		src_i = src_offset >> PAGE_SHIFT;
+		dst_i = get_eb_page_index(dst_offset);
+		src_i = get_eb_page_index(src_offset);
 
 		cur = min(len, (unsigned long)(PAGE_SIZE -
 					       src_off_in_page));
@@ -6133,11 +6145,11 @@ void memmove_extent_buffer(const struct extent_buffer *dst,
 		return;
 	}
 	while (len > 0) {
-		dst_i = dst_end >> PAGE_SHIFT;
-		src_i = src_end >> PAGE_SHIFT;
+		dst_i = get_eb_page_index(dst_end);
+		src_i = get_eb_page_index(src_end);
 
-		dst_off_in_page = offset_in_page(dst_end);
-		src_off_in_page = offset_in_page(src_end);
+		dst_off_in_page = get_eb_page_offset(dst, dst_end);
+		src_off_in_page = get_eb_page_offset(dst, src_end);
 
 		cur = min_t(unsigned long, len, src_off_in_page + 1);
 		cur = min(cur, dst_off_in_page + 1);
diff --git a/fs/btrfs/struct-funcs.c b/fs/btrfs/struct-funcs.c
index 079b059818e9..769901c2b3c9 100644
--- a/fs/btrfs/struct-funcs.c
+++ b/fs/btrfs/struct-funcs.c
@@ -67,8 +67,9 @@ u##bits btrfs_get_token_##bits(struct btrfs_map_token *token,		\
 			       const void *ptr, unsigned long off)	\
 {									\
 	const unsigned long member_offset = (unsigned long)ptr + off;	\
-	const unsigned long idx = member_offset >> PAGE_SHIFT;		\
-	const unsigned long oip = offset_in_page(member_offset);	\
+	const unsigned long idx = get_eb_page_index(member_offset);	\
+	const unsigned long oip = get_eb_page_offset(token->eb, 	\
+						     member_offset);	\
 	const int size = sizeof(u##bits);				\
 	u8 lebytes[sizeof(u##bits)];					\
 	const int part = PAGE_SIZE - oip;				\
@@ -95,8 +96,8 @@ u##bits btrfs_get_##bits(const struct extent_buffer *eb,		\
 			 const void *ptr, unsigned long off)		\
 {									\
 	const unsigned long member_offset = (unsigned long)ptr + off;	\
-	const unsigned long oip = offset_in_page(member_offset);	\
-	const unsigned long idx = member_offset >> PAGE_SHIFT;		\
+	const unsigned long oip = get_eb_page_offset(eb, member_offset);\
+	const unsigned long idx = get_eb_page_index(member_offset);	\
 	char *kaddr = page_address(eb->pages[idx]);			\
 	const int size = sizeof(u##bits);				\
 	const int part = PAGE_SIZE - oip;				\
@@ -116,8 +117,9 @@ void btrfs_set_token_##bits(struct btrfs_map_token *token,		\
 			    u##bits val)				\
 {									\
 	const unsigned long member_offset = (unsigned long)ptr + off;	\
-	const unsigned long idx = member_offset >> PAGE_SHIFT;		\
-	const unsigned long oip = offset_in_page(member_offset);	\
+	const unsigned long idx = get_eb_page_index(member_offset);	\
+	const unsigned long oip = get_eb_page_offset(token->eb,		\
+						     member_offset);	\
 	const int size = sizeof(u##bits);				\
 	u8 lebytes[sizeof(u##bits)];					\
 	const int part = PAGE_SIZE - oip;				\
@@ -146,8 +148,8 @@ void btrfs_set_##bits(const struct extent_buffer *eb, void *ptr,	\
 		      unsigned long off, u##bits val)			\
 {									\
 	const unsigned long member_offset = (unsigned long)ptr + off;	\
-	const unsigned long oip = offset_in_page(member_offset);	\
-	const unsigned long idx = member_offset >> PAGE_SHIFT;		\
+	const unsigned long oip = get_eb_page_offset(eb, member_offset);\
+	const unsigned long idx = get_eb_page_index(member_offset);	\
 	char *kaddr = page_address(eb->pages[idx]);			\
 	const int size = sizeof(u##bits);				\
 	const int part = PAGE_SIZE - oip;				\
-- 
2.28.0



* [PATCH v3 30/49] btrfs: disk-io: only clear EXTENT_LOCK bit for extent_invalidatepage()
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (28 preceding siblings ...)
  2020-09-30  1:55 ` [PATCH v3 29/49] btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 31/49] btrfs: extent-io: make type of extent_state::state to be at least 32 bits Qu Wenruo
                   ` (18 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs

extent_invalidatepage() calls clear_extent_bit() with delete == 1,
which tries to clear all existing bits.

This is currently fine, since the btree io tree only utilizes the
EXTENT_LOCKED bit.
But this could be a problem for later subpage support, which will
utilize extra io tree bits to represent extra info.

This patch just converts that clear_extent_bit() call to
unlock_extent_cached().

Since the btree io tree only utilizes the EXTENT_LOCKED bit, this
doesn't change the behavior, but it provides a much cleaner basis for
the incoming subpage support.
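
For reference, unlock_extent_cached() is (at the time of this series)
a thin wrapper that clears only EXTENT_LOCKED, roughly:

	/* Sketch of the existing helper in extent-io-tree.h */
	static inline int unlock_extent_cached(struct extent_io_tree *tree,
					       u64 start, u64 end,
					       struct extent_state **cached)
	{
		/*
		 * delete == 0, but clearing the last bit of an extent
		 * state frees that state anyway.
		 */
		return __clear_extent_bit(tree, start, end, EXTENT_LOCKED,
					  1, 0, cached, GFP_NOFS, NULL);
	}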

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 1ca121ca28aa..10bdb0a8a92f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -996,8 +996,13 @@ static void extent_invalidatepage(struct extent_io_tree *tree,
 
 	lock_extent_bits(tree, start, end, &cached_state);
 	wait_on_page_writeback(page);
-	clear_extent_bit(tree, start, end, EXTENT_LOCKED | EXTENT_DELALLOC |
-			 EXTENT_DO_ACCOUNTING, 1, 1, &cached_state);
+
+	/*
+	 * Currently for btree io tree, only EXTENT_LOCKED is utilized,
+	 * so here we only need to unlock the extent range to free any
+	 * existing extent state.
+	 */
+	unlock_extent_cached(tree, start, end, &cached_state);
 }
 
 static void btree_invalidatepage(struct page *page, unsigned int offset,
-- 
2.28.0



* [PATCH v3 31/49] btrfs: extent-io: make type of extent_state::state to be at least 32 bits
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (29 preceding siblings ...)
  2020-09-30  1:55 ` [PATCH v3 30/49] btrfs: disk-io: only clear EXTENT_LOCK bit for extent_invalidatepage() Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 32/49] btrfs: extent_io: use extent_io_tree to handle subpage extent buffer allocation Qu Wenruo
                   ` (17 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs

Currently we use 'unsigned' for extent_state::state, which is only
guaranteed to be at least 16 bits.

But for incoming subpage support, we are going to introduce more bits to
at least match the following page bits:
- PageUptodate
- PagePrivate2

Thus we will go beyond 16 bits.

To support this, make extent_state::state at least 32 bits, and to be
more explicit, use "u32" so the maximum number of supported bits is clear.

This doesn't increase the memory usage for x86_64, but may affect other
architectures.
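
To see why 16 bits no longer suffice (illustration only; the last
define below is hypothetical and not part of this series):

	#define EXTENT_DELALLOC_NEW	(1U << 14)
	#define EXTENT_HAS_TREE_BLOCK	(1U << 15) /* last bit fitting in 16 */
	/* Any further subpage bit, e.g. a per-range Private2 equivalent,
	 * would need bit 16 or above: */
	#define EXTENT_SUBPAGE_EXAMPLE	(1U << 16) /* hypothetical */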

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent-io-tree.h | 36 +++++++++++++++-------------
 fs/btrfs/extent_io.c      | 49 +++++++++++++++++++--------------------
 fs/btrfs/extent_io.h      |  2 +-
 3 files changed, 45 insertions(+), 42 deletions(-)

diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
index 4d0dbb562a81..108b386118fe 100644
--- a/fs/btrfs/extent-io-tree.h
+++ b/fs/btrfs/extent-io-tree.h
@@ -22,6 +22,10 @@ struct io_failure_record;
 #define EXTENT_QGROUP_RESERVED	(1U << 12)
 #define EXTENT_CLEAR_DATA_RESV	(1U << 13)
 #define EXTENT_DELALLOC_NEW	(1U << 14)
+
+/* For subpage btree io tree, to indicate there is an extent buffer */
+#define EXTENT_HAS_TREE_BLOCK	(1U << 15)
+
 #define EXTENT_DO_ACCOUNTING    (EXTENT_CLEAR_META_RESV | \
 				 EXTENT_CLEAR_DATA_RESV)
 #define EXTENT_CTLBITS		(EXTENT_DO_ACCOUNTING)
@@ -73,7 +77,7 @@ struct extent_state {
 	/* ADD NEW ELEMENTS AFTER THIS */
 	wait_queue_head_t wq;
 	refcount_t refs;
-	unsigned state;
+	u32 state;
 
 	struct io_failure_record *failrec;
 
@@ -105,19 +109,19 @@ void __cold extent_io_exit(void);
 
 u64 count_range_bits(struct extent_io_tree *tree,
 		     u64 *start, u64 search_end,
-		     u64 max_bytes, unsigned bits, int contig);
+		     u64 max_bytes, u32 bits, int contig);
 
 void free_extent_state(struct extent_state *state);
 int test_range_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		   unsigned bits, int filled,
+		   u32 bits, int filled,
 		   struct extent_state *cached_state);
 int clear_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
-		unsigned bits, struct extent_changeset *changeset);
+			     u32 bits, struct extent_changeset *changeset);
 int clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		     unsigned bits, int wake, int delete,
+		     u32 bits, int wake, int delete,
 		     struct extent_state **cached);
 int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		     unsigned bits, int wake, int delete,
+		     u32 bits, int wake, int delete,
 		     struct extent_state **cached, gfp_t mask,
 		     struct extent_changeset *changeset);
 
@@ -141,7 +145,7 @@ static inline int unlock_extent_cached_atomic(struct extent_io_tree *tree,
 }
 
 static inline int clear_extent_bits(struct extent_io_tree *tree, u64 start,
-		u64 end, unsigned bits)
+				    u64 end, u32 bits)
 {
 	int wake = 0;
 
@@ -152,15 +156,15 @@ static inline int clear_extent_bits(struct extent_io_tree *tree, u64 start,
 }
 
 int set_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
-			   unsigned bits, struct extent_changeset *changeset);
+			   u32 bits, struct extent_changeset *changeset);
 int set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		   unsigned bits, u64 *failed_start,
+		   u32 bits, u64 *failed_start,
 		   struct extent_state **cached_state, gfp_t mask);
 int set_extent_bits_nowait(struct extent_io_tree *tree, u64 start, u64 end,
-			   unsigned bits);
+			   u32 bits);
 
 static inline int set_extent_bits(struct extent_io_tree *tree, u64 start,
-		u64 end, unsigned bits)
+		u64 end, u32 bits)
 {
 	return set_extent_bit(tree, start, end, bits, NULL, NULL, GFP_NOFS);
 }
@@ -188,11 +192,11 @@ static inline int clear_extent_dirty(struct extent_io_tree *tree, u64 start,
 }
 
 int convert_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		       unsigned bits, unsigned clear_bits,
+		       u32 bits, u32 clear_bits,
 		       struct extent_state **cached_state);
 
 static inline int set_extent_delalloc(struct extent_io_tree *tree, u64 start,
-				      u64 end, unsigned int extra_bits,
+				      u64 end, u32 extra_bits,
 				      struct extent_state **cached_state)
 {
 	return set_extent_bit(tree, start, end,
@@ -223,12 +227,12 @@ static inline int set_extent_uptodate(struct extent_io_tree *tree, u64 start,
 }
 
 int find_first_extent_bit(struct extent_io_tree *tree, u64 start,
-			  u64 *start_ret, u64 *end_ret, unsigned bits,
+			  u64 *start_ret, u64 *end_ret, u32 bits,
 			  bool exact_match, struct extent_state **cached_state);
 void find_first_clear_extent_bit(struct extent_io_tree *tree, u64 start,
-				 u64 *start_ret, u64 *end_ret, unsigned bits);
+				 u64 *start_ret, u64 *end_ret, u32 bits);
 int find_contiguous_extent_bit(struct extent_io_tree *tree, u64 start,
-			       u64 *start_ret, u64 *end_ret, unsigned bits);
+			       u64 *start_ret, u64 *end_ret, u32 bits);
 bool btrfs_find_delalloc_range(struct extent_io_tree *tree, u64 *start,
 			       u64 *end, u64 max_bytes,
 			       struct extent_state **cached_state);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index e42a17039bf6..0c4ce0b1f4ce 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -142,7 +142,7 @@ struct extent_page_data {
 	unsigned int sync_io:1;
 };
 
-static int add_extent_changeset(struct extent_state *state, unsigned bits,
+static int add_extent_changeset(struct extent_state *state, u32 bits,
 				 struct extent_changeset *changeset,
 				 int set)
 {
@@ -530,7 +530,7 @@ static void merge_state(struct extent_io_tree *tree,
 }
 
 static void set_state_bits(struct extent_io_tree *tree,
-			   struct extent_state *state, unsigned *bits,
+			   struct extent_state *state, u32 *bits,
 			   struct extent_changeset *changeset);
 
 /*
@@ -547,7 +547,7 @@ static int insert_state(struct extent_io_tree *tree,
 			struct extent_state *state, u64 start, u64 end,
 			struct rb_node ***p,
 			struct rb_node **parent,
-			unsigned *bits, struct extent_changeset *changeset)
+			u32 *bits, struct extent_changeset *changeset)
 {
 	struct rb_node *node;
 
@@ -628,11 +628,11 @@ static struct extent_state *next_state(struct extent_state *state)
  */
 static struct extent_state *clear_state_bit(struct extent_io_tree *tree,
 					    struct extent_state *state,
-					    unsigned *bits, int wake,
+					    u32 *bits, int wake,
 					    struct extent_changeset *changeset)
 {
 	struct extent_state *next;
-	unsigned bits_to_clear = *bits & ~EXTENT_CTLBITS;
+	u32 bits_to_clear = *bits & ~EXTENT_CTLBITS;
 	int ret;
 
 	if ((bits_to_clear & EXTENT_DIRTY) && (state->state & EXTENT_DIRTY)) {
@@ -695,7 +695,7 @@ static void extent_io_tree_panic(struct extent_io_tree *tree, int err)
  * This takes the tree lock, and returns 0 on success and < 0 on error.
  */
 int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-			      unsigned bits, int wake, int delete,
+			      u32 bits, int wake, int delete,
 			      struct extent_state **cached_state,
 			      gfp_t mask, struct extent_changeset *changeset)
 {
@@ -868,7 +868,7 @@ static void wait_on_state(struct extent_io_tree *tree,
  * The tree lock is taken by this function
  */
 static void wait_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-			    unsigned long bits)
+			    u32 bits)
 {
 	struct extent_state *state;
 	struct rb_node *node;
@@ -915,9 +915,9 @@ static void wait_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 
 static void set_state_bits(struct extent_io_tree *tree,
 			   struct extent_state *state,
-			   unsigned *bits, struct extent_changeset *changeset)
+			   u32 *bits, struct extent_changeset *changeset)
 {
-	unsigned bits_to_set = *bits & ~EXTENT_CTLBITS;
+	u32 bits_to_set = *bits & ~EXTENT_CTLBITS;
 	int ret;
 
 	if (tree->private_data && is_data_inode(tree->private_data))
@@ -964,7 +964,7 @@ static void cache_state(struct extent_state *state,
 
 static int __must_check
 __set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		 unsigned bits, unsigned exclusive_bits,
+		 u32 bits, u32 exclusive_bits,
 		 u64 *failed_start, struct extent_state **cached_state,
 		 gfp_t mask, struct extent_changeset *changeset)
 {
@@ -1180,7 +1180,7 @@ __set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 }
 
 int set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		   unsigned bits, u64 * failed_start,
+		   u32 bits, u64 * failed_start,
 		   struct extent_state **cached_state, gfp_t mask)
 {
 	return __set_extent_bit(tree, start, end, bits, 0, failed_start,
@@ -1207,7 +1207,7 @@ int set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
  * All allocations are done with GFP_NOFS.
  */
 int convert_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		       unsigned bits, unsigned clear_bits,
+		       u32 bits, u32 clear_bits,
 		       struct extent_state **cached_state)
 {
 	struct extent_state *state;
@@ -1408,7 +1408,7 @@ int convert_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 
 /* wrappers around set/clear extent bit */
 int set_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
-			   unsigned bits, struct extent_changeset *changeset)
+			   u32 bits, struct extent_changeset *changeset)
 {
 	/*
 	 * We don't support EXTENT_LOCKED yet, as current changeset will
@@ -1423,14 +1423,14 @@ int set_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
 }
 
 int set_extent_bits_nowait(struct extent_io_tree *tree, u64 start, u64 end,
-			   unsigned bits)
+			   u32 bits)
 {
 	return __set_extent_bit(tree, start, end, bits, 0, NULL, NULL,
 				GFP_NOWAIT, NULL);
 }
 
 int clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		     unsigned bits, int wake, int delete,
+		     u32 bits, int wake, int delete,
 		     struct extent_state **cached)
 {
 	return __clear_extent_bit(tree, start, end, bits, wake, delete,
@@ -1438,7 +1438,7 @@ int clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 }
 
 int clear_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
-		unsigned bits, struct extent_changeset *changeset)
+		u32 bits, struct extent_changeset *changeset)
 {
 	/*
 	 * Don't support EXTENT_LOCKED case, same reason as
@@ -1521,7 +1521,7 @@ void extent_range_redirty_for_io(struct inode *inode, u64 start, u64 end)
 	}
 }
 
-static bool match_extent_state(struct extent_state *state, unsigned bits,
+static bool match_extent_state(struct extent_state *state, u32 bits,
 			       bool exact_match)
 {
 	if (exact_match)
@@ -1541,7 +1541,7 @@ static bool match_extent_state(struct extent_state *state, unsigned bits,
  */
 static struct extent_state *
 find_first_extent_bit_state(struct extent_io_tree *tree,
-			    u64 start, unsigned bits, bool exact_match)
+			    u64 start, u32 bits, bool exact_match)
 {
 	struct rb_node *node;
 	struct extent_state *state;
@@ -1577,7 +1577,7 @@ find_first_extent_bit_state(struct extent_io_tree *tree,
  * Return 1 if we found nothing.
  */
 int find_first_extent_bit(struct extent_io_tree *tree, u64 start,
-			  u64 *start_ret, u64 *end_ret, unsigned bits,
+			  u64 *start_ret, u64 *end_ret, u32 bits,
 			  bool exact_match, struct extent_state **cached_state)
 {
 	struct extent_state *state;
@@ -1629,7 +1629,7 @@ int find_first_extent_bit(struct extent_io_tree *tree, u64 start,
  * returned will be the full contiguous area with the bits set.
  */
 int find_contiguous_extent_bit(struct extent_io_tree *tree, u64 start,
-			       u64 *start_ret, u64 *end_ret, unsigned bits)
+			       u64 *start_ret, u64 *end_ret, u32 bits)
 {
 	struct extent_state *state;
 	int ret = 1;
@@ -1666,7 +1666,7 @@ int find_contiguous_extent_bit(struct extent_io_tree *tree, u64 start,
  * trim @end_ret to the appropriate size.
  */
 void find_first_clear_extent_bit(struct extent_io_tree *tree, u64 start,
-				 u64 *start_ret, u64 *end_ret, unsigned bits)
+				 u64 *start_ret, u64 *end_ret, u32 bits)
 {
 	struct extent_state *state;
 	struct rb_node *node, *prev = NULL, *next;
@@ -2056,8 +2056,7 @@ noinline_for_stack bool find_lock_delalloc_range(struct inode *inode,
 
 void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
 				  struct page *locked_page,
-				  unsigned clear_bits,
-				  unsigned long page_ops)
+				  u32 clear_bits, unsigned long page_ops)
 {
 	clear_extent_bit(&inode->io_tree, start, end, clear_bits, 1, 0, NULL);
 
@@ -2072,7 +2071,7 @@ void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
  */
 u64 count_range_bits(struct extent_io_tree *tree,
 		     u64 *start, u64 search_end, u64 max_bytes,
-		     unsigned bits, int contig)
+		     u32 bits, int contig)
 {
 	struct rb_node *node;
 	struct extent_state *state;
@@ -2192,7 +2191,7 @@ struct io_failure_record *get_state_failrec(struct extent_io_tree *tree, u64 sta
  * range is found set.
  */
 int test_range_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		   unsigned bits, int filled, struct extent_state *cached)
+		   u32 bits, int filled, struct extent_state *cached)
 {
 	struct extent_state *state = NULL;
 	struct rb_node *node;
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 552afc1c0bbc..602d6568c8ea 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -288,7 +288,7 @@ void extent_range_clear_dirty_for_io(struct inode *inode, u64 start, u64 end);
 void extent_range_redirty_for_io(struct inode *inode, u64 start, u64 end);
 void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
 				  struct page *locked_page,
-				  unsigned bits_to_clear,
+				  u32 bits_to_clear,
 				  unsigned long page_ops);
 struct bio *btrfs_bio_alloc(u64 first_byte);
 struct bio *btrfs_io_bio_alloc(unsigned int nr_iovecs);
-- 
2.28.0



* [PATCH v3 32/49] btrfs: extent_io: use extent_io_tree to handle subpage extent buffer allocation
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (30 preceding siblings ...)
  2020-09-30  1:55 ` [PATCH v3 31/49] btrfs: extent-io: make type of extent_state::state to be at least 32 bits Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 33/49] btrfs: extent_io: make set/clear_extent_buffer_uptodate() to support subpage size Qu Wenruo
                   ` (16 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs

Currently btrfs uses page::private as an indicator of which extent
buffer owns the page. This method won't work for subpage support, as
one page can contain several tree blocks (up to 16 for 4K node size and
64K page size).

Instead, here we utilize the btree extent io tree to handle them.
For the btree io tree, we introduce a new bit, EXTENT_HAS_TREE_BLOCK,
to indicate that we have an in-tree extent buffer for the range.

This affects the following functions:
- alloc_extent_buffer()
  Now for subpage we never use page->private to grab an existing eb.
  Instead, we rely on an extra safety net in alloc_extent_buffer() to
  detect two callers racing for the same eb.

- btrfs_release_extent_buffer_pages()
  Now for subpage, we clear the EXTENT_HAS_TREE_BLOCK bit first, then
  check if the remaining range in the page has EXTENT_HAS_TREE_BLOCK bit.
  If not, then clear the private bit for the page.

- attach_extent_buffer_page()
  Now we set the EXTENT_HAS_TREE_BLOCK bit for the new extent buffer
  being attached, and set the page Private flag, with page::private
  kept as NULL.
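
For scale (illustration only; the 4K nodesize and 64K page size are
the example values from above), one page can back up to
64K / 4K = 16 independent tree blocks, so a single page::private
pointer cannot name them all. The io tree records one bit range per
in-tree extent buffer instead:

	64K btree page:  [0 ................................... 64K)
	io tree:         [0,4K)  [8K,12K)  [36K,40K)  ...
	                 (one EXTENT_HAS_TREE_BLOCK range per eb)
	page->private:   NULL (only the PagePrivate flag is set)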

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/btrfs_inode.h    | 12 ++++++
 fs/btrfs/extent-io-tree.h |  2 +-
 fs/btrfs/extent_io.c      | 80 ++++++++++++++++++++++++++++++++++++++-
 3 files changed, 91 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index c47b6c6fea9f..cff818e0c406 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -217,6 +217,18 @@ static inline struct btrfs_inode *BTRFS_I(const struct inode *inode)
 	return container_of(inode, struct btrfs_inode, vfs_inode);
 }
 
+static inline struct btrfs_fs_info *page_to_fs_info(struct page *page)
+{
+	ASSERT(page->mapping);
+	return BTRFS_I(page->mapping->host)->root->fs_info;
+}
+
+static inline struct extent_io_tree
+*info_to_btree_io_tree(struct btrfs_fs_info *fs_info)
+{
+	return &BTRFS_I(fs_info->btree_inode)->io_tree;
+}
+
 static inline unsigned long btrfs_inode_hash(u64 objectid,
 					     const struct btrfs_root *root)
 {
diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
index 108b386118fe..c4e73c84ba34 100644
--- a/fs/btrfs/extent-io-tree.h
+++ b/fs/btrfs/extent-io-tree.h
@@ -23,7 +23,7 @@ struct io_failure_record;
 #define EXTENT_CLEAR_DATA_RESV	(1U << 13)
 #define EXTENT_DELALLOC_NEW	(1U << 14)
 
-/* For subpage btree io tree, to indicate there is an extent buffer */
+/* For subpage btree io tree, indicates there is an in-tree extent buffer */
 #define EXTENT_HAS_TREE_BLOCK	(1U << 15)
 
 #define EXTENT_DO_ACCOUNTING    (EXTENT_CLEAR_META_RESV | \
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 0c4ce0b1f4ce..4dbc0b79c4ce 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3134,6 +3134,18 @@ static void attach_extent_buffer_page(struct extent_buffer *eb,
 	if (page->mapping)
 		assert_spin_locked(&page->mapping->private_lock);
 
+	if (btrfs_is_subpage(eb->fs_info) && page->mapping) {
+		struct extent_io_tree *io_tree =
+			info_to_btree_io_tree(eb->fs_info);
+
+		if (!PagePrivate(page))
+			attach_page_private(page, NULL);
+
+		set_extent_bit(io_tree, eb->start, eb->start + eb->len - 1,
+				EXTENT_HAS_TREE_BLOCK, NULL, NULL, GFP_ATOMIC);
+		return;
+	}
+
 	if (!PagePrivate(page))
 		attach_page_private(page, eb);
 	else
@@ -4955,6 +4967,36 @@ int extent_buffer_under_io(const struct extent_buffer *eb)
 		test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
 }
 
+static void detach_extent_buffer_subpage(struct extent_buffer *eb)
+{
+	struct btrfs_fs_info *fs_info = eb->fs_info;
+	struct extent_io_tree *io_tree = info_to_btree_io_tree(fs_info);
+	struct page *page = eb->pages[0];
+	bool mapped = !test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags);
+	int ret;
+
+	if (!page)
+		return;
+
+	if (mapped)
+		spin_lock(&page->mapping->private_lock);
+
+	__clear_extent_bit(io_tree, eb->start, eb->start + eb->len - 1,
+			   EXTENT_HAS_TREE_BLOCK, 0, 0, NULL, GFP_ATOMIC, NULL);
+
+	/* Test if we still have other extent buffer in the page range */
+	ret = test_range_bit(io_tree, round_down(eb->start, PAGE_SIZE),
+			     round_down(eb->start, PAGE_SIZE) + PAGE_SIZE - 1,
+			     EXTENT_HAS_TREE_BLOCK, 0, NULL);
+	if (!ret)
+		detach_page_private(eb->pages[0]);
+	if (mapped)
+		spin_unlock(&page->mapping->private_lock);
+
+	/* One for when we allocated the page */
+	put_page(page);
+}
+
 /*
  * Release all pages attached to the extent buffer.
  */
@@ -4966,6 +5008,9 @@ static void btrfs_release_extent_buffer_pages(struct extent_buffer *eb)
 
 	BUG_ON(extent_buffer_under_io(eb));
 
+	if (btrfs_is_subpage(eb->fs_info) && mapped)
+		return detach_extent_buffer_subpage(eb);
+
 	num_pages = num_extent_pages(eb);
 	for (i = 0; i < num_pages; i++) {
 		struct page *page = eb->pages[i];
@@ -5260,6 +5305,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 	struct extent_buffer *exists = NULL;
 	struct page *p;
 	struct address_space *mapping = fs_info->btree_inode->i_mapping;
+	bool subpage = btrfs_is_subpage(fs_info);
 	int uptodate = 1;
 	int ret;
 
@@ -5292,7 +5338,12 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		}
 
 		spin_lock(&mapping->private_lock);
-		if (PagePrivate(p)) {
+		/*
+		 * Subpage support doesn't use page::private at all, so we
+		 * completely rely on the radix insert lock to prevent two
+		 * ebs being allocated for the same bytenr.
+		 */
+		if (PagePrivate(p) && !subpage) {
 			/*
 			 * We could have already allocated an eb for this page
 			 * and attached one so lets see if we can get a ref on
@@ -5333,8 +5384,21 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		 * we could crash.
 		 */
 	}
-	if (uptodate)
+	if (uptodate) {
 		set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
+	} else if (subpage) {
+		/*
+		 * For subpage, we must check extent_io_tree to get if the eb
+		 * is really uptodate, as the page uptodate is only set if the
+		 * whole page is uptodate.
+		 * We can still have uptodate range in the page.
+		 */
+		struct extent_io_tree *io_tree = info_to_btree_io_tree(fs_info);
+
+		if (test_range_bit(io_tree, eb->start, eb->start + eb->len - 1,
+				   EXTENT_UPTODATE, 1, NULL))
+			set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
+	}
 again:
 	ret = radix_tree_preload(GFP_NOFS);
 	if (ret) {
@@ -5373,6 +5437,18 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		if (eb->pages[i])
 			unlock_page(eb->pages[i]);
 	}
+	/*
+	 * For the subpage case, btrfs_release_extent_buffer() will clear the
+	 * EXTENT_HAS_TREE_BLOCK bit if there is a page.
+	 *
+	 * Since we're here because we hit a race with another caller, who
+	 * succeeded in inserting the eb, we shouldn't clear that
+	 * EXTENT_HAS_TREE_BLOCK bit. So here we clean up the page manually.
+	 */
+	if (subpage) {
+		put_page(eb->pages[0]);
+		eb->pages[0] = NULL;
+	}
 
 	btrfs_release_extent_buffer(eb);
 	return exists;
-- 
2.28.0



* [PATCH v3 33/49] btrfs: extent_io: make set/clear_extent_buffer_uptodate() to support subpage size
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (31 preceding siblings ...)
  2020-09-30  1:55 ` [PATCH v3 32/49] btrfs: extent_io: use extent_io_tree to handle subpage extent buffer allocation Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 34/49] btrfs: extent_io: make the assert test on page uptodate able to handle subpage Qu Wenruo
                   ` (15 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs

For those two functions, supporting subpage sizes just needs the
following work:
- set/clear the EXTENT_UPTODATE bits for io_tree
- set page Uptodate if the full range of the page is uptodate
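
For example (illustration only; sizes are assumed), with four 16K tree
blocks backed by one 64K page, the page flag flips only on the last
one:

	set_extent_buffer_uptodate(eb0); /* [0,16K)   uptodate, page not */
	set_extent_buffer_uptodate(eb1); /* [16K,32K) uptodate, page not */
	set_extent_buffer_uptodate(eb2); /* [32K,48K) uptodate, page not */
	set_extent_buffer_uptodate(eb3); /* whole page covered ->
					    SetPageUptodate()           */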

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 4dbc0b79c4ce..c9bbb91c6155 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5602,10 +5602,18 @@ bool set_extent_buffer_dirty(struct extent_buffer *eb)
 void clear_extent_buffer_uptodate(struct extent_buffer *eb)
 {
 	int i;
-	struct page *page;
+	struct page *page = eb->pages[0];
 	int num_pages;
 
 	clear_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
+
+	if (btrfs_is_subpage(eb->fs_info) && page->mapping) {
+		struct extent_io_tree *io_tree =
+			info_to_btree_io_tree(eb->fs_info);
+
+		clear_extent_uptodate(io_tree, eb->start,
+				      eb->start + eb->len - 1, NULL);
+	}
 	num_pages = num_extent_pages(eb);
 	for (i = 0; i < num_pages; i++) {
 		page = eb->pages[i];
@@ -5617,10 +5625,26 @@ void clear_extent_buffer_uptodate(struct extent_buffer *eb)
 void set_extent_buffer_uptodate(struct extent_buffer *eb)
 {
 	int i;
-	struct page *page;
+	struct page *page = eb->pages[0];
 	int num_pages;
 
 	set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
+
+	if (btrfs_is_subpage(eb->fs_info) && page->mapping) {
+		struct extent_state *cached = NULL;
+		struct extent_io_tree *io_tree =
+			info_to_btree_io_tree(eb->fs_info);
+		u64 page_start = page_offset(page);
+		u64 page_end = page_offset(page) + PAGE_SIZE - 1;
+
+		set_extent_uptodate(io_tree, eb->start, eb->start + eb->len - 1,
+				    &cached, GFP_NOFS);
+		if (test_range_bit(io_tree, page_start, page_end,
+				   EXTENT_UPTODATE, 1, cached))
+			SetPageUptodate(page);
+		free_extent_state(cached);
+		return;
+	}
 	num_pages = num_extent_pages(eb);
 	for (i = 0; i < num_pages; i++) {
 		page = eb->pages[i];
-- 
2.28.0



* [PATCH v3 34/49] btrfs: extent_io: make the assert test on page uptodate able to handle subpage
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (32 preceding siblings ...)
  2020-09-30  1:55 ` [PATCH v3 33/49] btrfs: extent_io: make set/clear_extent_buffer_uptodate() to support subpage size Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 35/49] btrfs: extent_io: implement subpage metadata read and its endio function Qu Wenruo
                   ` (14 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs

There are quite a few assert tests on page uptodate in the extent
buffer write accessors.
They ensure the destination page is already uptodate.

This is fine for the regular sector size case, but not for the subpage
case, as for subpage we only mark the page uptodate if the page
contains no holes and all its extent buffers are uptodate.

So instead of checking PageUptodate(), for the subpage case we check
the EXTENT_UPTODATE bit for the range covered by the extent buffer.

To make the check more elegant, introduce a helper,
assert_eb_range_uptodate(), to do the check for both subpage and regular
sector size cases.

The following functions are involved:
- write_extent_buffer_chunk_tree_uuid()
- write_extent_buffer_fsid()
- write_extent_buffer()
- memzero_extent_buffer()
- copy_extent_buffer()
- extent_buffer_test_bit()
- extent_buffer_bitmap_set()
- extent_buffer_bitmap_clear()

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 44 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 34 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c9bbb91c6155..210ae3349108 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5867,12 +5867,36 @@ int memcmp_extent_buffer(const struct extent_buffer *eb, const void *ptrv,
 	return ret;
 }
 
+/*
+ * A helper to ensure that the extent buffer is uptodate.
+ *
+ * For regular sector size == PAGE_SIZE case, check if @page is uptodate.
+ * For subpage case, check if the range covered by the eb has EXTENT_UPTODATE.
+ */
+static void assert_eb_range_uptodate(const struct extent_buffer *eb,
+				     struct page *page)
+{
+	struct btrfs_fs_info *fs_info = eb->fs_info;
+
+	if (btrfs_is_subpage(fs_info) && page->mapping) {
+		struct extent_io_tree *io_tree = info_to_btree_io_tree(fs_info);
+
+		/* For subpage and mapped eb, check the EXTENT_UPTODATE bit. */
+		WARN_ON(!test_range_bit(io_tree, eb->start,
+				eb->start + eb->len - 1, EXTENT_UPTODATE, 1,
+				NULL));
+	} else {
+		/* For regular eb or dummy eb, check the page status directly */
+		WARN_ON(!PageUptodate(page));
+	}
+}
+
 void write_extent_buffer_chunk_tree_uuid(const struct extent_buffer *eb,
 		const void *srcv)
 {
 	char *kaddr;
 
-	WARN_ON(!PageUptodate(eb->pages[0]));
+	assert_eb_range_uptodate(eb, eb->pages[0]);
 	kaddr = page_address(eb->pages[0]) + get_eb_page_offset(eb, 0);
 	memcpy(kaddr + offsetof(struct btrfs_header, chunk_tree_uuid), srcv,
 			BTRFS_FSID_SIZE);
@@ -5882,7 +5906,7 @@ void write_extent_buffer_fsid(const struct extent_buffer *eb, const void *srcv)
 {
 	char *kaddr;
 
-	WARN_ON(!PageUptodate(eb->pages[0]));
+	assert_eb_range_uptodate(eb, eb->pages[0]);
 	kaddr = page_address(eb->pages[0]) + get_eb_page_offset(eb, 0);
 	memcpy(kaddr + offsetof(struct btrfs_header, fsid), srcv,
 			BTRFS_FSID_SIZE);
@@ -5905,7 +5929,7 @@ void write_extent_buffer(const struct extent_buffer *eb, const void *srcv,
 
 	while (len > 0) {
 		page = eb->pages[i];
-		WARN_ON(!PageUptodate(page));
+		assert_eb_range_uptodate(eb, page);
 
 		cur = min(len, PAGE_SIZE - offset);
 		kaddr = page_address(page);
@@ -5934,7 +5958,7 @@ void memzero_extent_buffer(const struct extent_buffer *eb, unsigned long start,
 
 	while (len > 0) {
 		page = eb->pages[i];
-		WARN_ON(!PageUptodate(page));
+		assert_eb_range_uptodate(eb, page);
 
 		cur = min(len, PAGE_SIZE - offset);
 		kaddr = page_address(page);
@@ -5990,7 +6014,7 @@ void copy_extent_buffer(const struct extent_buffer *dst,
 
 	while (len > 0) {
 		page = dst->pages[i];
-		WARN_ON(!PageUptodate(page));
+		assert_eb_range_uptodate(dst, page);
 
 		cur = min(len, (unsigned long)(PAGE_SIZE - offset));
 
@@ -6052,7 +6076,7 @@ int extent_buffer_test_bit(const struct extent_buffer *eb, unsigned long start,
 
 	eb_bitmap_offset(eb, start, nr, &i, &offset);
 	page = eb->pages[i];
-	WARN_ON(!PageUptodate(page));
+	assert_eb_range_uptodate(eb, page);
 	kaddr = page_address(page);
 	return 1U & (kaddr[offset] >> (nr & (BITS_PER_BYTE - 1)));
 }
@@ -6077,7 +6101,7 @@ void extent_buffer_bitmap_set(const struct extent_buffer *eb, unsigned long star
 
 	eb_bitmap_offset(eb, start, pos, &i, &offset);
 	page = eb->pages[i];
-	WARN_ON(!PageUptodate(page));
+	assert_eb_range_uptodate(eb, page);
 	kaddr = page_address(page);
 
 	while (len >= bits_to_set) {
@@ -6088,7 +6112,7 @@ void extent_buffer_bitmap_set(const struct extent_buffer *eb, unsigned long star
 		if (++offset >= PAGE_SIZE && len > 0) {
 			offset = 0;
 			page = eb->pages[++i];
-			WARN_ON(!PageUptodate(page));
+			assert_eb_range_uptodate(eb, page);
 			kaddr = page_address(page);
 		}
 	}
@@ -6120,7 +6144,7 @@ void extent_buffer_bitmap_clear(const struct extent_buffer *eb,
 
 	eb_bitmap_offset(eb, start, pos, &i, &offset);
 	page = eb->pages[i];
-	WARN_ON(!PageUptodate(page));
+	assert_eb_range_uptodate(eb, page);
 	kaddr = page_address(page);
 
 	while (len >= bits_to_clear) {
@@ -6131,7 +6155,7 @@ void extent_buffer_bitmap_clear(const struct extent_buffer *eb,
 		if (++offset >= PAGE_SIZE && len > 0) {
 			offset = 0;
 			page = eb->pages[++i];
-			WARN_ON(!PageUptodate(page));
+			assert_eb_range_uptodate(eb, page);
 			kaddr = page_address(page);
 		}
 	}
-- 
2.28.0



* [PATCH v3 35/49] btrfs: extent_io: implement subpage metadata read and its endio function
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (33 preceding siblings ...)
  2020-09-30  1:55 ` [PATCH v3 34/49] btrfs: extent_io: make the assert test on page uptodate able to handle subpage Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 36/49] btrfs: extent_io: implement try_release_extent_buffer() for subpage metadata support Qu Wenruo
                   ` (13 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs

For subpage metadata reads, since we're relying completely on the io
tree rather than on page bits, the read submission and endio functions
differ from the regular page size case.

For submission part:
- Do extent locking/waiting
  In addition to page locking, we do extra extent io tree locking,
  which provides more accurate range locking.

  Since we're still utilizing page locking, we will see higher latency
  when reading tree blocks in the same page
  (reads of extent buffers in the same page are forced to be sequential).

- Submit extent page directly
  To simplify the process, as a metadata read is always contained in
  one page.

For endio part:
- Do extent locking/waiting
  The same as submission part.

This behavior has a small problem: extent locking/waiting may need to
allocate memory, thus it can fail.

Currently we're relying on the BUG_ON()s in various set_extent_bits()
calls. But once we start handling errors from those calls, this
approach will make it more complex to pass all the ENOMEM errors
upwards.
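
The resulting per-tree-block locking sequence (a sketch of the flow
described above, not a normative list):

	1) lock_page(page)
	2) lock_extent(io_tree, eb->start, eb->start + eb->len - 1)
	3) submit_extent_page() - one bio per tree block, never merged
	4) endio: verify the tree block, set/clear uptodate, unlock the
	   extent range and the page
	5) waiters: wait_on_page_locked() plus
	   wait_extent_bit(io_tree, ..., EXTENT_LOCKED)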

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c   | 81 ++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/extent_io.c | 88 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 169 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 10bdb0a8a92f..89021e552da0 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -651,6 +651,84 @@ static int btrfs_check_extent_buffer(struct extent_buffer *eb)
 	return ret;
 }
 
+static int btree_read_subpage_endio_hook(struct page *page, u64 start, u64 end,
+					 int mirror)
+{
+	struct btrfs_fs_info *fs_info = page_to_fs_info(page);
+	struct extent_buffer *eb;
+	int reads_done;
+	int ret = 0;
+
+	if (!IS_ALIGNED(start, fs_info->sectorsize) ||
+	    !IS_ALIGNED(end - start + 1, fs_info->sectorsize) ||
+	    !IS_ALIGNED(end - start + 1, fs_info->nodesize)) {
+		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
+		btrfs_err(fs_info, "invalid tree read bytenr");
+		return -EUCLEAN;
+	}
+
+	/*
+	 * We don't allow bio merge for subpage metadata read, so we should
+	 * only get one eb for each endio hook.
+	 */
+	ASSERT(end == start + fs_info->nodesize - 1);
+	ASSERT(PagePrivate(page));
+
+	rcu_read_lock();
+	eb = radix_tree_lookup(&fs_info->buffer_radix,
+			       start / fs_info->sectorsize);
+	rcu_read_unlock();
+
+	/*
+	 * When we are reading one tree block, eb must have been
+	 * inserted into the radix tree. If not, something is wrong.
+	 */
+	if (!eb) {
+		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
+		btrfs_err(fs_info,
+			"can't find extent buffer for bytenr %llu",
+			start);
+		return -EUCLEAN;
+	}
+	/*
+	 * The pending IO might have been the only thing that kept
+	 * this buffer in memory.  Make sure we have a ref for all
+	 * these other checks.
+	 */
+	atomic_inc(&eb->refs);
+
+	reads_done = atomic_dec_and_test(&eb->io_pages);
+	/* Subpage read must finish in page read */
+	ASSERT(reads_done);
+
+	eb->read_mirror = mirror;
+	if (test_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags)) {
+		ret = -EIO;
+		goto err;
+	}
+	ret = btrfs_check_extent_buffer(eb);
+	if (ret < 0)
+		goto err;
+
+	if (test_and_clear_bit(EXTENT_BUFFER_READAHEAD, &eb->bflags))
+		btree_readahead_hook(eb, ret);
+
+	set_extent_buffer_uptodate(eb);
+
+	free_extent_buffer(eb);
+	return ret;
+err:
+	/*
+	 * Our io error hook is going to decrement the io pages
+	 * again, so we have to make sure it has something to
+	 * decrement.
+	 */
+	atomic_inc(&eb->io_pages);
+	clear_extent_buffer_uptodate(eb);
+	free_extent_buffer(eb);
+	return ret;
+}
+
 static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
 				      u64 phy_offset, struct page *page,
 				      u64 start, u64 end, int mirror)
@@ -659,6 +737,9 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
 	int ret = 0;
 	bool reads_done;
 
+	if (btrfs_is_subpage(page_to_fs_info(page)))
+		return btree_read_subpage_endio_hook(page, start, end, mirror);
+
 	/* Metadata pages that goes through IO should all have private set */
 	ASSERT(PagePrivate(page) && page->private);
 	eb = (struct extent_buffer *)page->private;
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 210ae3349108..1423f69bc210 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3082,6 +3082,15 @@ static int submit_extent_page(unsigned int opf,
 		else
 			contig = bio_end_sector(bio) == sector;
 
+		/*
+		 * For subpage metadata read, never merge request, so that
+		 * we get endio hook called on each metadata read.
+		 */
+		if (btrfs_is_subpage(page_to_fs_info(page)) &&
+		    tree->owner == IO_TREE_BTREE_INODE_IO &&
+		    (opf & REQ_OP_READ))
+			ASSERT(force_bio_submit);
+
 		ASSERT(tree->ops);
 		if (btrfs_bio_fits_in_stripe(page, io_size, bio, bio_flags))
 			can_merge = false;
@@ -5652,6 +5661,82 @@ void set_extent_buffer_uptodate(struct extent_buffer *eb)
 	}
 }
 
+static int read_extent_buffer_subpage(struct extent_buffer *eb, int wait,
+				      int mirror_num)
+{
+	struct btrfs_fs_info *fs_info = eb->fs_info;
+	struct extent_io_tree *io_tree = info_to_btree_io_tree(fs_info);
+	struct page *page = eb->pages[0];
+	struct bio *bio = NULL;
+	int ret = 0;
+
+	ASSERT(!test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags));
+
+	/* Lock page first then lock extent range */
+	if (wait == WAIT_NONE) {
+		if (!trylock_page(page))
+			return 0;
+	} else {
+		lock_page(page);
+	}
+
+	if (wait == WAIT_NONE) {
+		ret = try_lock_extent(io_tree, eb->start,
+				      eb->start + eb->len - 1);
+		if (ret <= 0) {
+			unlock_page(page);
+			return ret;
+		}
+	} else {
+		ret = lock_extent(io_tree, eb->start, eb->start + eb->len - 1);
+		if (ret < 0) {
+			unlock_page(page);
+			return ret;
+		}
+	}
+
+	ret = 0;
+	if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags) ||
+	    PageUptodate(page) ||
+	    test_range_bit(io_tree, eb->start, eb->start + eb->len - 1,
+			   EXTENT_UPTODATE, 1, NULL)) {
+		set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
+		unlock_page(page);
+		unlock_extent(io_tree, eb->start, eb->start + eb->len - 1);
+		return ret;
+	}
+	atomic_set(&eb->io_pages, 1);
+
+	ret = submit_extent_page(REQ_OP_READ | REQ_META, NULL, page, eb->start,
+				 eb->len, eb->start - page_offset(page), &bio,
+				 end_bio_extent_readpage, mirror_num, 0, 0,
+				 true);
+	if (ret) {
+		/*
+		 * In the endio function, if we hit something wrong we will
+		 * increase io_pages, so here we need to decrease it on the
+		 * error path.
+		 */
+		atomic_dec(&eb->io_pages);
+	}
+	if (bio) {
+		int tmp;
+
+		tmp = submit_one_bio(bio, mirror_num, 0);
+		if (tmp < 0)
+			return tmp;
+	}
+	if (ret || wait != WAIT_COMPLETE)
+		return ret;
+
+	wait_on_page_locked(page);
+	wait_extent_bit(io_tree, eb->start, eb->start + eb->len - 1,
+			EXTENT_LOCKED);
+	if (!test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags))
+		ret = -EIO;
+	return ret;
+}
+
 int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num)
 {
 	int i;
@@ -5668,6 +5753,9 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num)
 	if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags))
 		return 0;
 
+	if (btrfs_is_subpage(eb->fs_info))
+		return read_extent_buffer_subpage(eb, wait, mirror_num);
+
 	num_pages = num_extent_pages(eb);
 	for (i = 0; i < num_pages; i++) {
 		page = eb->pages[i];
-- 
2.28.0



* [PATCH v3 36/49] btrfs: extent_io: implement try_release_extent_buffer() for subpage metadata support
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (34 preceding siblings ...)
  2020-09-30  1:55 ` [PATCH v3 35/49] btrfs: extent_io: implement subpage metadata read and its endio function Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 37/49] btrfs: set btree inode track_uptodate for subpage support Qu Wenruo
                   ` (12 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs

For try_release_extent_buffer(), we just iterate through all the ranges
with EXTENT_HAS_TREE_BLOCK set, and try freeing each extent buffer.

Also introduce a helper, find_first_subpage_eb(), to locate the first eb
in the range.
This helper will also be utilized for later subpage patches.
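
The iteration idiom built on top of this helper looks roughly like the
following sketch (not a literal excerpt; the real callers appear in the
diffs of this and later patches):

	u64 cur = start;

	while (cur <= end) {
		struct extent_buffer *eb;

		/* Returns > 0 when no eb with the wanted bits is left */
		if (find_first_subpage_eb(fs_info, &eb, cur, end, extra_bits))
			break;

		/* ... operate on @eb, remembering no extra ref is held ... */
		cur = eb->start + eb->len;
	}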

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c   |  6 ++++
 fs/btrfs/extent_io.c | 83 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 89 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 89021e552da0..efbe12e4f952 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1047,6 +1047,12 @@ static int btree_writepages(struct address_space *mapping,
 
 static int btree_readpage(struct file *file, struct page *page)
 {
+	/*
+	 * For subpage, we don't support the VFS calling btree_readpage()
+	 * directly.
+	 */
+	if (btrfs_is_subpage(page_to_fs_info(page)))
+		return -ENOTTY;
 	return extent_read_full_page(page, btree_get_extent, 0);
 }
 
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 1423f69bc210..6aa25681aea4 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2743,6 +2743,48 @@ blk_status_t btrfs_submit_read_repair(struct inode *inode,
 	return status;
 }
 
+/*
+ * A helper to locate a subpage extent buffer.
+ *
+ * NOTE: the returned extent buffer won't have its ref count increased.
+ *
+ * @extra_bits:		Extra bits to match.
+ * 			The returned eb range will match all extra_bits.
+ *
+ * Return 0 if we found one extent buffer and record it in @eb_ret.
+ * Return 1 if there is no extent buffer in the range.
+ */
+static int find_first_subpage_eb(struct btrfs_fs_info *fs_info,
+				 struct extent_buffer **eb_ret, u64 start,
+				 u64 end, u32 extra_bits)
+{
+	struct extent_io_tree *io_tree = info_to_btree_io_tree(fs_info);
+	u64 found_start;
+	u64 found_end;
+	int ret;
+
+	ASSERT(btrfs_is_subpage(fs_info) && eb_ret);
+
+	ret = find_first_extent_bit(io_tree, start, &found_start, &found_end,
+			EXTENT_HAS_TREE_BLOCK | extra_bits, true, NULL);
+	if (ret > 0 || found_start > end)
+		return 1;
+
+	/* found_start can be smaller than start */
+	start = max(start, found_start);
+
+	/*
+	 * Here we can't call find_extent_buffer() which will increase
+	 * eb->refs.
+	 */
+	rcu_read_lock();
+	*eb_ret = radix_tree_lookup(&fs_info->buffer_radix,
+				    start / fs_info->sectorsize);
+	rcu_read_unlock();
+	ASSERT(*eb_ret);
+	return 0;
+}
+
 /* lots and lots of room for performance fixes in the end_bio funcs */
 
 void end_extent_writepage(struct page *page, int err, u64 start, u64 end)
@@ -6374,10 +6416,51 @@ void memmove_extent_buffer(const struct extent_buffer *dst,
 	}
 }
 
+static int try_release_subpage_eb(struct page *page)
+{
+	struct btrfs_fs_info *fs_info = page_to_fs_info(page);
+	struct extent_io_tree *io_tree = info_to_btree_io_tree(fs_info);
+	u64 cur = page_offset(page);
+	u64 end = page_offset(page) + PAGE_SIZE - 1;
+	int ret;
+
+	while (cur <= end) {
+		struct extent_buffer *eb;
+
+		ret = find_first_subpage_eb(fs_info, &eb, cur, end, 0);
+		if (ret > 0)
+			break;
+
+		cur = eb->start + eb->len;
+
+		spin_lock(&eb->refs_lock);
+		if (atomic_read(&eb->refs) != 1 || extent_buffer_under_io(eb) ||
+		    !test_and_clear_bit(EXTENT_BUFFER_TREE_REF, &eb->bflags)) {
+			spin_unlock(&eb->refs_lock);
+			continue;
+		}
+		/*
+		 * Here we don't care about the return value, as we will always
+		 * check the EXTENT_HAS_TREE_BLOCK bit at the end.
+		 */
+		release_extent_buffer(eb);
+	}
+
+	/* Finally check if there is any EXTENT_HAS_TREE_BLOCK bit remaining */
+	if (test_range_bit(io_tree, page_offset(page), end,
+			   EXTENT_HAS_TREE_BLOCK, 0, NULL))
+		ret = 0;
+	else
+		ret = 1;
+	return ret;
+}
+
 int try_release_extent_buffer(struct page *page)
 {
 	struct extent_buffer *eb;
 
+	if (btrfs_is_subpage(page_to_fs_info(page)))
+		return try_release_subpage_eb(page);
 	/*
 	 * We need to make sure nobody is attaching this page to an eb right
 	 * now.
-- 
2.28.0



* [PATCH v3 37/49] btrfs: set btree inode track_uptodate for subpage support
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (35 preceding siblings ...)
  2020-09-30  1:55 ` [PATCH v3 36/49] btrfs: extent_io: implement try_release_extent_buffer() for subpage metadata support Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 38/49] btrfs: allow RO mount of 4K sector size fs on 64K page system Qu Wenruo
                   ` (11 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs

Let the btree io tree track the EXTENT_UPTODATE bit, so that for subpage
metadata IO, we don't need to bother tracking the UPTODATE status
manually through bio submission/endio functions.

Currently only the subpage metadata path will clean up the extra bits
utilized (EXTENT_HAS_TREE_BLOCK, EXTENT_UPTODATE, EXTENT_LOCKED), while
the regular page size case will only clean up EXTENT_LOCKED.

This still allows the regular page size case to avoid the extra delay of
extent io tree operations, but allows the subpage case to stay sector
size aligned.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index efbe12e4f952..97c44f518a49 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2244,7 +2244,14 @@ static void btrfs_init_btree_inode(struct btrfs_fs_info *fs_info)
 	RB_CLEAR_NODE(&BTRFS_I(inode)->rb_node);
 	extent_io_tree_init(fs_info, &BTRFS_I(inode)->io_tree,
 			    IO_TREE_BTREE_INODE_IO, inode);
-	BTRFS_I(inode)->io_tree.track_uptodate = false;
+	/*
+	 * For subpage size support, btree inode tracks EXTENT_UPTODATE for
+	 * its IO.
+	 */
+	if (btrfs_is_subpage(fs_info))
+		BTRFS_I(inode)->io_tree.track_uptodate = true;
+	else
+		BTRFS_I(inode)->io_tree.track_uptodate = false;
 	extent_map_tree_init(&BTRFS_I(inode)->extent_tree);
 
 	BTRFS_I(inode)->io_tree.ops = &btree_extent_io_ops;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v3 38/49] btrfs: allow RO mount of 4K sector size fs on 64K page system
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (36 preceding siblings ...)
  2020-09-30  1:55 ` [PATCH v3 37/49] btrfs: set btree inode track_uptodate for subpage support Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 39/49] btrfs: disk-io: allow btree_set_page_dirty() to do more sanity check on subpage metadata Qu Wenruo
                   ` (10 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs

This adds the basic RO mount ability for 4K sector size on 64K page
systems.

Currently we only plan to support 4K and 64K page sizes.
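
The resulting support matrix (matching the validate_super() and
open_ctree() checks below):

  page size | sector size | mount support
  ----------+-------------+----------------
  4K        | 4K          | RW
  64K       | 64K         | RW
  64K       | 4K          | RO (this patch)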

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 24 +++++++++++++++++++++---
 fs/btrfs/super.c   |  7 +++++++
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 97c44f518a49..e0dc7b92411e 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2565,13 +2565,21 @@ static int validate_super(struct btrfs_fs_info *fs_info,
 		btrfs_err(fs_info, "invalid sectorsize %llu", sectorsize);
 		ret = -EINVAL;
 	}
-	/* Only PAGE SIZE is supported yet */
-	if (sectorsize != PAGE_SIZE) {
+
+	/*
+	 * For 4K page size, we only support 4K sector size.
+	 * For 64K page size, we support RW for 64K sector size, and RO for
+	 * 4K sector size.
+	 */
+	if ((PAGE_SIZE == SZ_4K && sectorsize != PAGE_SIZE) ||
+	    (PAGE_SIZE == SZ_64K && (sectorsize != SZ_4K &&
+				     sectorsize != SZ_64K))) {
 		btrfs_err(fs_info,
-			"sectorsize %llu not supported yet, only support %lu",
+			"sectorsize %llu not supported yet for page size %lu",
 			sectorsize, PAGE_SIZE);
 		ret = -EINVAL;
 	}
+
 	if (!is_power_of_2(nodesize) || nodesize < sectorsize ||
 	    nodesize > BTRFS_MAX_METADATA_BLOCKSIZE) {
 		btrfs_err(fs_info, "invalid nodesize %llu", nodesize);
@@ -3219,6 +3227,16 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 		goto fail_alloc;
 	}
 
+	/* For now, 4K sector size support is read-only */
+	if (PAGE_SIZE == SZ_64K && sectorsize == SZ_4K) {
+		if (!sb_rdonly(sb) || btrfs_super_log_root(disk_super)) {
+			btrfs_err(fs_info,
+				"subpage sector size only supports RO for now");
+			err = -EINVAL;
+			goto fail_alloc;
+		}
+	}
+
 	ret = btrfs_init_workqueues(fs_info, fs_devices);
 	if (ret) {
 		err = ret;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 25967ecaaf0a..743a2fadf4ee 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1922,6 +1922,13 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 			ret = -EINVAL;
 			goto restore;
 		}
+		if (btrfs_is_subpage(fs_info)) {
+			btrfs_warn(fs_info,
+	"read-write mount is not yet allowed for sector size %u page size %lu",
+				   fs_info->sectorsize, PAGE_SIZE);
+			ret = -EINVAL;
+			goto restore;
+		}
 
 		ret = btrfs_cleanup_fs_roots(fs_info);
 		if (ret)
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v3 39/49] btrfs: disk-io: allow btree_set_page_dirty() to do more sanity check on subpage metadata
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (37 preceding siblings ...)
  2020-09-30  1:55 ` [PATCH v3 38/49] btrfs: allow RO mount of 4K sector size fs on 64K page system Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 40/49] btrfs: disk-io: support subpage metadata csum calculation at write time Qu Wenruo
                   ` (9 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs

For btree_set_page_dirty(), we should also check the extent buffer
sanity for subpage support.

Unlike the regular sector size case, one page can contain multiple
extent buffers, and page::private no longer contains the pointer to an
extent buffer.

So this patch will iterate through the extent_io_tree to find any range
with the EXTENT_HAS_TREE_BLOCK bit set, and check if any extent buffer
in the page range has EXTENT_BUFFER_DIRTY set and proper refs.

Also, since we need to find subpage extent buffers outside of
extent_io.c, export find_first_subpage_eb() as
btrfs_find_first_subpage_eb().

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c   | 36 ++++++++++++++++++++++++++++++------
 fs/btrfs/extent_io.c |  8 ++++----
 fs/btrfs/extent_io.h |  4 ++++
 3 files changed, 38 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index e0dc7b92411e..d31999978821 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1110,14 +1110,38 @@ static void btree_invalidatepage(struct page *page, unsigned int offset,
 static int btree_set_page_dirty(struct page *page)
 {
 #ifdef DEBUG
+	struct btrfs_fs_info *fs_info = page_to_fs_info(page);
 	struct extent_buffer *eb;
 
-	BUG_ON(!PagePrivate(page));
-	eb = (struct extent_buffer *)page->private;
-	BUG_ON(!eb);
-	BUG_ON(!test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
-	BUG_ON(!atomic_read(&eb->refs));
-	btrfs_assert_tree_locked(eb);
+	if (fs_info->sectorsize == PAGE_SIZE) {
+		BUG_ON(!PagePrivate(page));
+		eb = (struct extent_buffer *)page->private;
+		BUG_ON(!eb);
+		BUG_ON(!test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
+		BUG_ON(!atomic_read(&eb->refs));
+		btrfs_assert_tree_locked(eb);
+	} else {
+		u64 page_start = page_offset(page);
+		u64 page_end = page_start + PAGE_SIZE - 1;
+		u64 cur = page_start;
+		bool found_dirty_eb = false;
+		int ret;
+
+		ASSERT(btrfs_is_subpage(fs_info));
+		while (cur <= page_end) {
+			ret = btrfs_find_first_subpage_eb(fs_info, &eb, cur,
+							  page_end, 0);
+			if (ret > 0)
+				break;
+			cur = eb->start + eb->len;
+			if (test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags)) {
+				found_dirty_eb = true;
+				ASSERT(atomic_read(&eb->refs));
+				btrfs_assert_tree_locked(eb);
+			}
+		}
+		BUG_ON(!found_dirty_eb);
+	}
 #endif
 	return __set_page_dirty_nobuffers(page);
 }
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 6aa25681aea4..5750a3b92777 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2754,9 +2754,9 @@ blk_status_t btrfs_submit_read_repair(struct inode *inode,
  * Return 0 if we found one extent buffer and record it in @eb_ret.
  * Return 1 if there is no extent buffer in the range.
  */
-static int find_first_subpage_eb(struct btrfs_fs_info *fs_info,
-				 struct extent_buffer **eb_ret, u64 start,
-				 u64 end, u32 extra_bits)
+int btrfs_find_first_subpage_eb(struct btrfs_fs_info *fs_info,
+				struct extent_buffer **eb_ret, u64 start,
+				u64 end, u32 extra_bits)
 {
 	struct extent_io_tree *io_tree = info_to_btree_io_tree(fs_info);
 	u64 found_start;
@@ -6427,7 +6427,7 @@ static int try_release_subpage_eb(struct page *page)
 	while (cur <= end) {
 		struct extent_buffer *eb;
 
-		ret = find_first_subpage_eb(fs_info, &eb, cur, end, 0);
+		ret = btrfs_find_first_subpage_eb(fs_info, &eb, cur, end, 0);
 		if (ret > 0)
 			break;
 
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 602d6568c8ea..f527b6fa258d 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -298,6 +298,10 @@ struct bio *btrfs_bio_clone_partial(struct bio *orig, int offset, int size);
 struct btrfs_fs_info;
 struct btrfs_inode;
 
+int btrfs_find_first_subpage_eb(struct btrfs_fs_info *fs_info,
+				struct extent_buffer **eb_ret, u64 start,
+				u64 end, unsigned int extra_bits);
+
 int repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
 		      u64 length, u64 logical, struct page *page,
 		      unsigned int pg_offset, int mirror_num);
-- 
2.28.0



* [PATCH v3 40/49] btrfs: disk-io: support subpage metadata csum calculation at write time
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (38 preceding siblings ...)
  2020-09-30  1:55 ` [PATCH v3 39/49] btrfs: disk-io: allow btree_set_page_dirty() to do more sanity check on subpage metadata Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 41/49] btrfs: extent_io: prevent extent_state from being merged for btree io tree Qu Wenruo
                   ` (8 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs

Add a new helper, csum_dirty_subpage_buffers(), to iterate through all
possible extent buffers in one bvec.

Also extract the code to calculate csum for one extent buffer into
csum_one_extent_buffer(), so that both the existing csum_dirty_buffer()
and the new csum_dirty_subpage_buffers() can reuse the same routine.
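
After this patch the call flow is roughly (a sketch of the diff below):

  csum_dirty_buffer(fs_info, bvec)
  |- subpage: csum_dirty_subpage_buffers(), once per eb in the bvec
  |             `- csum_one_extent_buffer()
  `- regular: eb taken from page->private
                `- csum_one_extent_buffer()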

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 103 ++++++++++++++++++++++++++++++++++-----------
 1 file changed, 79 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d31999978821..9aa68e2344e1 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -490,35 +490,13 @@ static int btree_read_extent_buffer_pages(struct extent_buffer *eb,
 	return ret;
 }
 
-/*
- * checksum a dirty tree block before IO.  This has extra checks to make sure
- * we only fill in the checksum field in the first page of a multi-page block
- */
-
-static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct bio_vec *bvec)
+static int csum_one_extent_buffer(struct extent_buffer *eb)
 {
-	struct extent_buffer *eb;
-	struct page *page = bvec->bv_page;
-	u64 start = page_offset(page);
-	u64 found_start;
+	struct btrfs_fs_info *fs_info = eb->fs_info;
 	u8 result[BTRFS_CSUM_SIZE];
 	u16 csum_size = btrfs_super_csum_size(fs_info->super_copy);
 	int ret;
 
-	eb = (struct extent_buffer *)page->private;
-	if (page != eb->pages[0])
-		return 0;
-
-	found_start = btrfs_header_bytenr(eb);
-	/*
-	 * Please do not consolidate these warnings into a single if.
-	 * It is useful to know what went wrong.
-	 */
-	if (WARN_ON(found_start != start))
-		return -EUCLEAN;
-	if (WARN_ON(!PageUptodate(page)))
-		return -EUCLEAN;
-
 	ASSERT(memcmp_extent_buffer(eb, fs_info->fs_devices->metadata_uuid,
 				    offsetof(struct btrfs_header, fsid),
 				    BTRFS_FSID_SIZE) == 0);
@@ -543,6 +521,83 @@ static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct bio_vec *bvec
 	return 0;
 }
 
+/*
+ * Do all the csum calculation and extra sanity checks on all extent
+ * buffers in the bvec.
+ */
+static int csum_dirty_subpage_buffers(struct btrfs_fs_info *fs_info,
+				      struct bio_vec *bvec)
+{
+	struct page *page = bvec->bv_page;
+	u64 page_start = page_offset(page);
+	u64 start = page_start + bvec->bv_offset;
+	u64 end = start + bvec->bv_len - 1;
+	u64 cur = start;
+	int ret = 0;
+
+	while (cur <= end) {
+		struct extent_io_tree *io_tree = info_to_btree_io_tree(fs_info);
+		struct extent_buffer *eb;
+
+		ret = btrfs_find_first_subpage_eb(fs_info, &eb, cur, end, 0);
+		if (ret > 0) {
+			ret = 0;
+			break;
+		}
+
+		/*
+		 * Here we can't use PageUptodate() to check the status, as
+		 * a page is uptodate only when all its extent buffers are
+		 * uptodate and there are no holes between them.
+		 * So here we use the EXTENT_UPTODATE bit to make sure this
+		 * extent buffer is uptodate.
+		 */
+		if (WARN_ON(test_range_bit(io_tree, eb->start,
+				eb->start + eb->len - 1, EXTENT_UPTODATE, 1,
+				NULL) == 0))
+			return -EUCLEAN;
+		if (WARN_ON(cur != btrfs_header_bytenr(eb)))
+			return -EUCLEAN;
+
+		ret = csum_one_extent_buffer(eb);
+		if (ret < 0)
+			return ret;
+		cur = eb->start + eb->len;
+	}
+	return ret;
+}
+
+/*
+ * checksum a dirty tree block before IO.  This has extra checks to make sure
+ * we only fill in the checksum field in the first page of a multi-page block
+ */
+static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct bio_vec *bvec)
+{
+	struct extent_buffer *eb;
+	struct page *page = bvec->bv_page;
+	u64 start = page_offset(page) + bvec->bv_offset;
+	u64 found_start;
+
+	if (btrfs_is_subpage(fs_info))
+		return csum_dirty_subpage_buffers(fs_info, bvec);
+
+	eb = (struct extent_buffer *)page->private;
+	if (page != eb->pages[0])
+		return 0;
+
+	found_start = btrfs_header_bytenr(eb);
+	/*
+	 * Please do not consolidate these warnings into a single if.
+	 * It is useful to know what went wrong.
+	 */
+	if (WARN_ON(found_start != start))
+		return -EUCLEAN;
+	if (WARN_ON(!PageUptodate(page)))
+		return -EUCLEAN;
+
+	return csum_one_extent_buffer(eb);
+}
+
 static int check_tree_block_fsid(struct extent_buffer *eb)
 {
 	struct btrfs_fs_info *fs_info = eb->fs_info;
-- 
2.28.0



* [PATCH v3 41/49] btrfs: extent_io: prevent extent_state from being merged for btree io tree
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (39 preceding siblings ...)
  2020-09-30  1:55 ` [PATCH v3 40/49] btrfs: disk-io: support subpage metadata csum calculation at write time Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 42/49] btrfs: extent_io: make set_extent_buffer_dirty() to support subpage sized metadata Qu Wenruo
                   ` (7 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs

For incoming subpage metadata rw support, prevent extent_states from
being merged for the btree io tree.

The main cause is set_extent_buffer_dirty().

In the following call chain, we could fall into the situation where we
have to call set_extent_dirty() in atomic context:

alloc_reserved_tree_block()
|- path->leave_spinning = 1;
|- btrfs_insert_empty_item()
   |- btrfs_search_slot()
   |  Now the path has all its tree block spinning locked
   |- setup_items_for_insert();
   |- btrfs_unlock_up_safe(path, 1);
   |  Now path->nodes[0] still spin locked
   |- btrfs_mark_buffer_dirty(leaf);
      |- set_extent_buffer_dirty()

Since set_extent_buffer_dirty() is in fact a pretty common call, just
falling back to the GFP_ATOMIC allocation used in __set_extent_bit() may
exhaust the pool sooner than we expect.

So this patch goes another direction, by not merging extent_states for
the subpage btree io tree.

For the subpage btree io tree, all in-tree extent buffers have the
EXTENT_HAS_TREE_BLOCK bit set during their lifespan. As long as
extent_states are not merged, each extent buffer has its own
extent_state, so set/clear_extent_bit() can reuse the existing
extent_state without allocating new memory.

The cost is obvious: around 150 bytes per subpage extent buffer. But
considering that a subpage extent buffer saves 15 page pointers, which
is 120 bytes, the net cost is just 30 bytes per subpage extent buffer,
which should be acceptable.
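
(The saving assumes 64-bit pointers: 15 pointers * 8 bytes = 120 bytes,
hence 150 - 120 = 30 bytes of net overhead per extent buffer.)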

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c        | 14 ++++++++++++--
 fs/btrfs/extent-io-tree.h | 14 ++++++++++++++
 fs/btrfs/extent_io.c      | 19 ++++++++++++++-----
 3 files changed, 40 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9aa68e2344e1..e466c30b52c8 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2326,11 +2326,21 @@ static void btrfs_init_btree_inode(struct btrfs_fs_info *fs_info)
 	/*
 	 * For subpage size support, btree inode tracks EXTENT_UPTODATE for
 	 * its IO.
+	 *
+	 * And never merge extent_states, so that set/clear operations never
+	 * need to allocate memory, except for the initial EXTENT_HAS_TREE_BLOCK
+	 * bit. This adds an extra ~150 bytes for each extent buffer.
+	 *
+	 * TODO: Josef's rwsem rework on the tree lock would kill the
+	 * leave_spinning case, and then we can revert this behavior.
 	 */
-	if (btrfs_is_subpage(fs_info))
+	if (btrfs_is_subpage(fs_info)) {
 		BTRFS_I(inode)->io_tree.track_uptodate = true;
-	else
+		BTRFS_I(inode)->io_tree.never_merge = true;
+	} else {
 		BTRFS_I(inode)->io_tree.track_uptodate = false;
+		BTRFS_I(inode)->io_tree.never_merge = false;
+	}
 	extent_map_tree_init(&BTRFS_I(inode)->extent_tree);
 
 	BTRFS_I(inode)->io_tree.ops = &btree_extent_io_ops;
diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
index c4e73c84ba34..5c0a66146f05 100644
--- a/fs/btrfs/extent-io-tree.h
+++ b/fs/btrfs/extent-io-tree.h
@@ -62,6 +62,20 @@ struct extent_io_tree {
 	u64 dirty_bytes;
 	bool track_uptodate;
 
+	/*
+	 * Never merge extent_states.
+	 *
+	 * This allows any set/clear function to be executed in atomic context
+	 * without allocating extra memory.
+	 * The cost is extra memory usage.
+	 *
+	 * Should only be used for the subpage btree io tree, which mostly adds
+	 * per extent buffer memory usage.
+	 *
+	 * Default: false.
+	 */
+	bool never_merge;
+
 	/* Who owns this io tree, should be one of IO_TREE_* */
 	u8 owner;
 
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 5750a3b92777..d9a05979396d 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -285,6 +285,7 @@ void extent_io_tree_init(struct btrfs_fs_info *fs_info,
 	spin_lock_init(&tree->lock);
 	tree->private_data = private_data;
 	tree->owner = owner;
+	tree->never_merge = false;
 	if (owner == IO_TREE_INODE_FILE_EXTENT)
 		lockdep_set_class(&tree->lock, &file_extent_tree_class);
 }
@@ -480,11 +481,18 @@ static inline struct rb_node *tree_search(struct extent_io_tree *tree,
 }
 
 /*
- * utility function to look for merge candidates inside a given range.
+ * Utility function to look for merge candidates inside a given range.
  * Any extents with matching state are merged together into a single
- * extent in the tree.  Extents with EXTENT_IO in their state field
- * are not merged because the end_io handlers need to be able to do
- * operations on them without sleeping (or doing allocations/splits).
+ * extent in the tree.
+ *
+ * Except the following cases:
+ * - extent_state with EXTENT_LOCKED or EXTENT_BOUNDARY bit set
+ *   Those extents are not merged because end_io handlers need to be able
+ *   to do operations on them without sleeping (or doing allocations/splits)
+ *
+ * - extent_io_tree with never_merge bit set
+ *   Same reason as above, but extra call sites may hold a spinlock/rwlock,
+ *   and we don't want to abuse GFP_ATOMIC.
  *
  * This should be called with the tree lock held.
  */
@@ -494,7 +502,8 @@ static void merge_state(struct extent_io_tree *tree,
 	struct extent_state *other;
 	struct rb_node *other_node;
 
-	if (state->state & (EXTENT_LOCKED | EXTENT_BOUNDARY))
+	if (state->state & (EXTENT_LOCKED | EXTENT_BOUNDARY) ||
+	    tree->never_merge)
 		return;
 
 	other_node = rb_prev(&state->rb_node);
-- 
2.28.0



* [PATCH v3 42/49] btrfs: extent_io: make set_extent_buffer_dirty() to support subpage sized metadata
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (40 preceding siblings ...)
  2020-09-30  1:55 ` [PATCH v3 41/49] btrfs: extent_io: prevent extent_state from being merged for btree io tree Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 43/49] btrfs: extent_io: add subpage support for clear_extent_buffer_dirty() Qu Wenruo
                   ` (6 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs

For set_extent_buffer_dirty() to support subpage sized metadata, we only
need to additionally call set_extent_dirty().

As any dirty extent buffer in the page would make the whole page dirty,
we can reuse the existing routine without problem; we just need to add
the above set_extent_dirty() call.

Now that a page is dirty if any extent buffer in it is dirty, the
WARN_ON() in alloc_extent_buffer() can be triggered falsely, so also
convert the WARN_ON(PageDirty()) check into assert_eb_range_not_dirty()
to support the subpage case.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 35 ++++++++++++++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d9a05979396d..ae7ab7364115 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5354,6 +5354,22 @@ struct extent_buffer *alloc_test_extent_buffer(struct btrfs_fs_info *fs_info,
 }
 #endif
 
+static void assert_eb_range_not_dirty(struct extent_buffer *eb,
+				      struct page *page)
+{
+	struct btrfs_fs_info *fs_info = eb->fs_info;
+
+	if (btrfs_is_subpage(fs_info) && page->mapping) {
+		struct extent_io_tree *io_tree = info_to_btree_io_tree(fs_info);
+
+		WARN_ON(test_range_bit(io_tree, eb->start,
+				eb->start + eb->len - 1, EXTENT_DIRTY, 0,
+				NULL));
+	} else {
+		WARN_ON(PageDirty(page));
+	}
+}
+
 struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 					  u64 start)
 {
@@ -5426,12 +5442,13 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 			 * drop the ref the old guy had.
 			 */
 			ClearPagePrivate(p);
+			assert_eb_range_not_dirty(eb, p);
 			WARN_ON(PageDirty(p));
 			put_page(p);
 		}
 		attach_extent_buffer_page(eb, p);
 		spin_unlock(&mapping->private_lock);
-		WARN_ON(PageDirty(p));
+		assert_eb_range_not_dirty(eb, p);
 		eb->pages[i] = p;
 		if (!PageUptodate(p))
 			uptodate = 0;
@@ -5651,6 +5668,22 @@ bool set_extent_buffer_dirty(struct extent_buffer *eb)
 		for (i = 0; i < num_pages; i++)
 			set_page_dirty(eb->pages[i]);
 
+	/*
+	 * For subpage size, also set the sector aligned EXTENT_DIRTY range for
+	 * the btree io tree.
+	 */
+	if (btrfs_is_subpage(eb->fs_info)) {
+		struct extent_io_tree *io_tree =
+			info_to_btree_io_tree(eb->fs_info);
+
+		/*
+		 * set_extent_buffer_dirty() can be called with
+		 * path->leave_spinning == 1; in that case we can't sleep.
+		 */
+		set_extent_dirty(io_tree, eb->start, eb->start + eb->len - 1,
+				 GFP_ATOMIC);
+	}
+
 #ifdef CONFIG_BTRFS_DEBUG
 	for (i = 0; i < num_pages; i++)
 		ASSERT(PageDirty(eb->pages[i]));
-- 
2.28.0



* [PATCH v3 43/49] btrfs: extent_io: add subpage support for clear_extent_buffer_dirty()
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (41 preceding siblings ...)
  2020-09-30  1:55 ` [PATCH v3 42/49] btrfs: extent_io: make set_extent_buffer_dirty() to support subpage sized metadata Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 44/49] btrfs: extent_io: make set_btree_ioerr() accept extent buffer Qu Wenruo
                   ` (5 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs

To support subpage metadata, clear_extent_buffer_dirty() needs to clear
the page dirty bit if and only if all extent buffers in the page range
are no longer dirty.

This is pretty different from the existing clear_extent_buffer_dirty()
routine, so add a new helper function,
clear_subpage_extent_buffer_dirty(), to do this for subpage metadata.

Also, since the main part of the page dirty clearing code is still the
same, extract it into btree_clear_page_dirty() so that it can be
utilized for both cases.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 47 +++++++++++++++++++++++++++++++++-----------
 1 file changed, 35 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index ae7ab7364115..07dec345f662 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5622,30 +5622,53 @@ void free_extent_buffer_stale(struct extent_buffer *eb)
 	release_extent_buffer(eb);
 }
 
+static void btree_clear_page_dirty(struct page *page)
+{
+	ASSERT(PageDirty(page));
+
+	lock_page(page);
+	clear_page_dirty_for_io(page);
+	xa_lock_irq(&page->mapping->i_pages);
+	if (!PageDirty(page))
+		__xa_clear_mark(&page->mapping->i_pages,
+				page_index(page), PAGECACHE_TAG_DIRTY);
+	xa_unlock_irq(&page->mapping->i_pages);
+	ClearPageError(page);
+	unlock_page(page);
+}
+
+static void clear_subpage_extent_buffer_dirty(const struct extent_buffer *eb)
+{
+	struct btrfs_fs_info *fs_info = eb->fs_info;
+	struct extent_io_tree *io_tree = info_to_btree_io_tree(fs_info);
+	struct page *page = eb->pages[0];
+	u64 page_start = page_offset(page);
+	u64 page_end = page_start + PAGE_SIZE - 1;
+	int ret;
+
+	clear_extent_dirty(io_tree, eb->start, eb->start + eb->len - 1, NULL);
+	ret = test_range_bit(io_tree, page_start, page_end, EXTENT_DIRTY, 0, NULL);
+	/* All extent buffers in the page range are cleared now */
+	if (ret == 0 && PageDirty(page))
+		btree_clear_page_dirty(page);
+	WARN_ON(atomic_read(&eb->refs) == 0);
+}
+
 void clear_extent_buffer_dirty(const struct extent_buffer *eb)
 {
 	int i;
 	int num_pages;
 	struct page *page;
 
+	if (btrfs_is_subpage(eb->fs_info))
+		return clear_subpage_extent_buffer_dirty(eb);
 	num_pages = num_extent_pages(eb);
 
 	for (i = 0; i < num_pages; i++) {
 		page = eb->pages[i];
 		if (!PageDirty(page))
 			continue;
-
-		lock_page(page);
-		WARN_ON(!PagePrivate(page));
-
-		clear_page_dirty_for_io(page);
-		xa_lock_irq(&page->mapping->i_pages);
-		if (!PageDirty(page))
-			__xa_clear_mark(&page->mapping->i_pages,
-					page_index(page), PAGECACHE_TAG_DIRTY);
-		xa_unlock_irq(&page->mapping->i_pages);
-		ClearPageError(page);
-		unlock_page(page);
+		btree_clear_page_dirty(page);
 	}
 	WARN_ON(atomic_read(&eb->refs) == 0);
 }
-- 
2.28.0



* [PATCH v3 44/49] btrfs: extent_io: make set_btree_ioerr() accept extent buffer
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (42 preceding siblings ...)
  2020-09-30  1:55 ` [PATCH v3 43/49] btrfs: extent_io: add subpage support for clear_extent_buffer_dirty() Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 45/49] btrfs: extent_io: introduce write_one_subpage_eb() function Qu Wenruo
                   ` (4 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs

Currently set_btree_ioerr() only accepts a @page parameter and grabs the
extent buffer from page::private.

This works fine for the sector size == PAGE_SIZE case, but not for the
subpage case.

Add an extra parameter, @eb, for callers to pass the extent buffer to
this function, so that the subpage code can reuse it.

Also, since we are here, grab fs_info from eb->fs_info once and use it
directly instead of dereferencing eb->fs_info each time.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 07dec345f662..f80ba4c13fe6 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3907,10 +3907,9 @@ static noinline_for_stack int lock_extent_buffer_for_io(struct extent_buffer *eb
 	return ret;
 }
 
-static void set_btree_ioerr(struct page *page)
+static void set_btree_ioerr(struct page *page, struct extent_buffer *eb)
 {
-	struct extent_buffer *eb = (struct extent_buffer *)page->private;
-	struct btrfs_fs_info *fs_info;
+	struct btrfs_fs_info *fs_info = eb->fs_info;
 
 	SetPageError(page);
 	if (test_and_set_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags))
@@ -3920,7 +3919,6 @@ static void set_btree_ioerr(struct page *page)
 	 * If we error out, we should add back the dirty_metadata_bytes
 	 * to make it consistent.
 	 */
-	fs_info = eb->fs_info;
 	percpu_counter_add_batch(&fs_info->dirty_metadata_bytes,
 				 eb->len, fs_info->dirty_metadata_batch);
 
@@ -3964,13 +3962,13 @@ static void set_btree_ioerr(struct page *page)
 	 */
 	switch (eb->log_index) {
 	case -1:
-		set_bit(BTRFS_FS_BTREE_ERR, &eb->fs_info->flags);
+		set_bit(BTRFS_FS_BTREE_ERR, &fs_info->flags);
 		break;
 	case 0:
-		set_bit(BTRFS_FS_LOG1_ERR, &eb->fs_info->flags);
+		set_bit(BTRFS_FS_LOG1_ERR, &fs_info->flags);
 		break;
 	case 1:
-		set_bit(BTRFS_FS_LOG2_ERR, &eb->fs_info->flags);
+		set_bit(BTRFS_FS_LOG2_ERR, &fs_info->flags);
 		break;
 	default:
 		BUG(); /* unexpected, logic error */
@@ -3995,7 +3993,7 @@ static void end_bio_extent_buffer_writepage(struct bio *bio)
 		if (bio->bi_status ||
 		    test_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags)) {
 			ClearPageUptodate(page);
-			set_btree_ioerr(page);
+			set_btree_ioerr(page, eb);
 		}
 
 		end_page_writeback(page);
@@ -4051,7 +4049,7 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
 					 end_bio_extent_buffer_writepage,
 					 0, 0, 0, false);
 		if (ret) {
-			set_btree_ioerr(p);
+			set_btree_ioerr(p, eb);
 			if (PageWriteback(p))
 				end_page_writeback(p);
 			if (atomic_sub_and_test(num_pages - i, &eb->io_pages))
-- 
2.28.0



* [PATCH v3 45/49] btrfs: extent_io: introduce write_one_subpage_eb() function
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (43 preceding siblings ...)
  2020-09-30  1:55 ` [PATCH v3 44/49] btrfs: extent_io: make set_btree_ioerr() accept extent buffer Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 46/49] btrfs: extent_io: make lock_extent_buffer_for_io() subpage compatible Qu Wenruo
                   ` (3 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs

The new function, write_one_subpage_eb(), as a subroutine for subpage
metadata write, will handle the extent buffer bio submission.

The main difference between the new write_one_subpage_eb() and
write_one_eb() is:
- Page unlock
  write_one_subpage_eb() will not unlock the page; it's the caller's
  responsibility to lock the page, submit all extent buffers in the
  page, then unlock the page.

- Extra EXTENT_* bits along with page status update
  A new EXTENT_WRITEBACK bit is introduced to track extent buffer
  writeback.

  The page dirty bit will only be cleared if all dirty extent buffers
  in the page range have been cleaned.
  The page writeback bit will be set anyway, and cleared in the
  error path if no other extent buffers are under writeback.
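
Together with the following patches in the series, the per-eb state flow
is roughly:

  set_extent_buffer_dirty():  set EXTENT_DIRTY on the eb range
  write_one_subpage_eb():     convert EXTENT_DIRTY to EXTENT_WRITEBACK,
                              clear page dirty only when no EXTENT_DIRTY
                              is left in the page
  write endio (later patch):  clear EXTENT_WRITEBACK, end page writeback
                              only when no EXTENT_WRITEBACK is left in
                              the page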

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent-io-tree.h |  3 ++
 fs/btrfs/extent_io.c      | 75 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 78 insertions(+)

diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
index 5c0a66146f05..12673bd50378 100644
--- a/fs/btrfs/extent-io-tree.h
+++ b/fs/btrfs/extent-io-tree.h
@@ -26,6 +26,9 @@ struct io_failure_record;
 /* For subpage btree io tree, indicates there is an in-tree extent buffer */
 #define EXTENT_HAS_TREE_BLOCK	(1U << 15)
 
+/* For subpage btree io tree, indicates the range is under writeback */
+#define EXTENT_WRITEBACK	(1U << 16)
+
 #define EXTENT_DO_ACCOUNTING    (EXTENT_CLEAR_META_RESV | \
 				 EXTENT_CLEAR_DATA_RESV)
 #define EXTENT_CTLBITS		(EXTENT_DO_ACCOUNTING)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index f80ba4c13fe6..736bc33a0e64 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3124,6 +3124,7 @@ static int submit_extent_page(unsigned int opf,
 	ASSERT(bio_ret);
 
 	if (*bio_ret) {
+		bool force_merge = false;
 		bool contig;
 		bool can_merge = true;
 
@@ -3149,6 +3150,7 @@ static int submit_extent_page(unsigned int opf,
 		if (prev_bio_flags != bio_flags || !contig || !can_merge ||
 		    force_bio_submit ||
 		    bio_add_page(bio, page, io_size, pg_offset) < io_size) {
+			ASSERT(!force_merge);
 			ret = submit_one_bio(bio, mirror_num, prev_bio_flags);
 			if (ret < 0) {
 				*bio_ret = NULL;
@@ -4007,6 +4009,76 @@ static void end_bio_extent_buffer_writepage(struct bio *bio)
 	bio_put(bio);
 }
 
+/*
+ * Unlike write_one_eb(), we won't unlock the page even if we succeeded
+ * in submitting the extent buffer.
+ * It's the caller's responsibility to unlock the page after all extent
+ * buffers in the page have been submitted.
+ * Callers should still call write_one_eb() rather than this function
+ * directly, as write_one_eb() does extra preparation first.
+ */
+static int write_one_subpage_eb(struct extent_buffer *eb,
+				      struct writeback_control *wbc,
+				      struct extent_page_data *epd)
+{
+	struct btrfs_fs_info *fs_info = eb->fs_info;
+	struct extent_state *cached = NULL;
+	struct extent_io_tree *io_tree = info_to_btree_io_tree(fs_info);
+	struct page *page = eb->pages[0];
+	u64 page_start = page_offset(page);
+	u64 page_end = page_start + PAGE_SIZE - 1;
+	unsigned int write_flags = wbc_to_write_flags(wbc) | REQ_META;
+	bool no_dirty_ebs = false;
+	int ret;
+
+	ASSERT(PageLocked(page));
+
+	/* Convert the EXTENT_DIRTY to EXTENT_WRITEBACK for this eb */
+	ret = convert_extent_bit(io_tree, eb->start, eb->start + eb->len - 1,
+				 EXTENT_WRITEBACK, EXTENT_DIRTY, &cached);
+	if (ret < 0)
+		return ret;
+	/*
+	 * Only clear page dirty if there is no dirty extent buffer in the
+	 * page range
+	 */
+	if (!test_range_bit(io_tree, page_start, page_end, EXTENT_DIRTY, 0,
+			    cached)) {
+		clear_page_dirty_for_io(page);
+		no_dirty_ebs = true;
+	}
+	/* Any extent buffer writeback will mark the full page writeback */
+	set_page_writeback(page);
+
+	ret = submit_extent_page(REQ_OP_WRITE | write_flags, wbc, page,
+			eb->start, eb->len, eb->start - page_offset(page),
+			&epd->bio, end_bio_extent_buffer_writepage, 0, 0, 0,
+			false);
+	if (ret) {
+		clear_extent_bit(io_tree, eb->start, eb->start + eb->len - 1,
+				 EXTENT_WRITEBACK, 0, 0, &cached);
+		set_btree_ioerr(page, eb);
+		if (PageWriteback(page) &&
+		    !test_range_bit(io_tree, page_start, page_end,
+				    EXTENT_WRITEBACK, 0, cached))
+			end_page_writeback(page);
+
+		if (atomic_dec_and_test(&eb->io_pages))
+			end_extent_buffer_writeback(eb);
+		free_extent_state(cached);
+		return -EIO;
+	}
+	free_extent_state(cached);
+	/*
+	 * Submission finished without problem; if no eb is dirty anymore, we
+	 * have submitted a full page.
+	 * Update the nr_written in wbc.
+	 */
+	if (no_dirty_ebs)
+		update_nr_written(wbc, 1);
+	return ret;
+}
+
 static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
 			struct writeback_control *wbc,
 			struct extent_page_data *epd)
@@ -4038,6 +4110,9 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
 		memzero_extent_buffer(eb, start, end - start);
 	}
 
+	if (btrfs_is_subpage(eb->fs_info))
+		return write_one_subpage_eb(eb, wbc, epd);
+
 	for (i = 0; i < num_pages; i++) {
 		struct page *p = eb->pages[i];
 
-- 
2.28.0



* [PATCH v3 46/49] btrfs: extent_io: make lock_extent_buffer_for_io() subpage compatible
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (44 preceding siblings ...)
  2020-09-30  1:55 ` [PATCH v3 45/49] btrfs: extent_io: introduce write_one_subpage_eb() function Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 47/49] btrfs: extent_io: introduce submit_btree_subpage() to submit a page for subpage metadata write Qu Wenruo
                   ` (2 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs

To support subpage metadata locking, the following aspects are modified:
- Locking sequence
  For regular sectorsize, we lock the extent buffer first, then lock
  each page.
  For subpage sectorsize, we can't do that anymore; the caller must lock
  the whole page first, then lock each extent buffer in the page.

- Extent io tree locking
  For subpage metadata, we also lock the range in the btree io tree.
  This allows the endio function to get an unmerged extent_state, so
  that in the endio function we don't need to allocate memory in atomic
  context.
  This also follows the behavior of the metadata read path.
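
Side by side, the two locking orders are:

  regular sectorsize: lock eb -> lock each page of the eb
  subpage sectorsize: lock page (by the caller) -> lock_extent() on the
                      eb range -> lock the eb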

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 47 +++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 42 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 736bc33a0e64..be8c863f7806 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3803,6 +3803,9 @@ static void end_extent_buffer_writeback(struct extent_buffer *eb)
  * Lock extent buffer status and pages for write back.
  *
  * May try to flush write bio if we can't get the lock.
+ * For a subpage extent buffer, the caller is responsible for locking the
+ * page; we won't flush the write bio, as that could cause extent buffers
+ * in the same page to be submitted to different bios.
  *
  * Return  0 if the extent buffer doesn't need to be submitted.
  * (E.g. the extent buffer is not dirty)
@@ -3813,26 +3816,47 @@ static noinline_for_stack int lock_extent_buffer_for_io(struct extent_buffer *eb
 			  struct extent_page_data *epd)
 {
 	struct btrfs_fs_info *fs_info = eb->fs_info;
+	struct extent_io_tree *io_tree = info_to_btree_io_tree(fs_info);
 	int i, num_pages, failed_page_nr;
+	bool extent_locked = false;
 	int flush = 0;
 	int ret = 0;
 
+	if (btrfs_is_subpage(fs_info)) {
+		/*
+		 * For subpage extent buffer write, the caller is responsible
+		 * for locking the page first.
+		 */
+		ASSERT(PageLocked(eb->pages[0]));
+
+		/*
+		 * Also lock the range so that endio can always get unmerged
+		 * extent_state.
+		 */
+		ret = lock_extent(io_tree, eb->start, eb->start + eb->len - 1);
+		if (ret < 0)
+			goto out;
+		extent_locked = true;
+	}
+
 	if (!btrfs_try_tree_write_lock(eb)) {
 		ret = flush_write_bio(epd);
 		if (ret < 0)
-			return ret;
+			goto out;
 		flush = 1;
 		btrfs_tree_lock(eb);
 	}
 
 	if (test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags)) {
 		btrfs_tree_unlock(eb);
-		if (!epd->sync_io)
-			return 0;
+		if (!epd->sync_io) {
+			ret = 0;
+			goto out;
+		}
 		if (!flush) {
 			ret = flush_write_bio(epd);
 			if (ret < 0)
-				return ret;
+				goto out;
 			flush = 1;
 		}
 		while (1) {
@@ -3860,11 +3884,19 @@ static noinline_for_stack int lock_extent_buffer_for_io(struct extent_buffer *eb
 		ret = 1;
 	} else {
 		spin_unlock(&eb->refs_lock);
+		if (extent_locked)
+			unlock_extent(io_tree, eb->start,
+				      eb->start + eb->len - 1);
 	}
 
 	btrfs_tree_unlock(eb);
 
-	if (!ret)
+	/*
+	 * Either the extent buffer does not need to be submitted, or we're
+	 * submitting a subpage extent buffer.
+	 * Either way, we don't need to lock the page(s).
+	 */
+	if (!ret || btrfs_is_subpage(fs_info))
 		return ret;
 
 	num_pages = num_extent_pages(eb);
@@ -3906,6 +3938,11 @@ static noinline_for_stack int lock_extent_buffer_for_io(struct extent_buffer *eb
 				 fs_info->dirty_metadata_batch);
 	btrfs_clear_header_flag(eb, BTRFS_HEADER_FLAG_WRITTEN);
 	btrfs_tree_unlock(eb);
+	/* Subpage should never reach this routine */
+	ASSERT(!btrfs_is_subpage(fs_info));
+out:
+	if (extent_locked)
+		unlock_extent(io_tree, eb->start, eb->start + eb->len - 1);
 	return ret;
 }
 
-- 
2.28.0



* [PATCH v3 47/49] btrfs: extent_io: introduce submit_btree_subpage() to submit a page for subpage metadata write
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (45 preceding siblings ...)
  2020-09-30  1:55 ` [PATCH v3 46/49] btrfs: extent_io: make lock_extent_buffer_for_io() subpage compatible Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 48/49] btrfs: extent_io: introduce end_bio_subpage_eb_writepage() function Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 49/49] btrfs: support metadata read write for test Qu Wenruo
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs

The new function, submit_btree_subpage(), will submit all the dirty extent
buffers in the page.

The major differences from submit_btree_page() are:
- Page locking sequence
  Now we lock the page first, then lock the extent buffers, thus we
  don't need to unlock the page just after writing one extent buffer.
  The page gets unlocked after we have submitted all extent buffers.

- Bio submission
  Since one extent buffer is ensured to be contained within one page, we
  call submit_extent_page() directly.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 69 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 69 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index be8c863f7806..bd79b3531a75 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4185,6 +4185,72 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
 	return ret;
 }
 
+/*
+ * A helper to submit one subpage btree page.
+ *
+ * The main differences from submit_btree_page() are:
+ * - Page locking sequence
+ *   The page is locked first, then the extent buffers are locked
+ *
+ * - Flush write bio
+ *   We only flush the bio if we may be unable to fit the current extent
+ *   buffers into the current bio.
+ *
+ * Return >=0 for the number of submitted extent buffers.
+ * Return <0 for fatal error.
+ */
+static int submit_btree_subpage(struct page *page,
+				struct writeback_control *wbc,
+				struct extent_page_data *epd)
+{
+	struct btrfs_fs_info *fs_info = page_to_fs_info(page);
+	int submitted = 0;
+	u64 page_start = page_offset(page);
+	u64 page_end = page_start + PAGE_SIZE - 1;
+	u64 cur = page_start;
+	int ret;
+
+	/* Lock the page first */
+	lock_page(page);
+
+	/* Then lock and write each extent buffer in the range */
+	while (cur <= page_end) {
+		struct extent_buffer *eb;
+
+		ret = btrfs_find_first_subpage_eb(fs_info, &eb, cur, page_end,
+						  EXTENT_DIRTY);
+		if (ret > 0)
+			break;
+		ret = atomic_inc_not_zero(&eb->refs);
+		if (!ret)
+			continue;
+
+		cur = eb->start + eb->len;
+		ret = lock_extent_buffer_for_io(eb, epd);
+		if (ret == 0) {
+			free_extent_buffer(eb);
+			continue;
+		}
+		if (ret < 0) {
+			free_extent_buffer(eb);
+			goto cleanup;
+		}
+		ret = write_one_eb(eb, wbc, epd);
+		free_extent_buffer(eb);
+		if (ret < 0)
+			goto cleanup;
+		submitted++;
+	}
+	unlock_page(page);
+	return submitted;
+
+cleanup:
+	unlock_page(page);
+	/* We hit an error, end the bio for the submitted extent buffers */
+	end_write_bio(epd, ret);
+	return ret;
+}
+
 /*
  * A helper to submit a btree page.
  *
@@ -4210,6 +4276,9 @@ static int submit_btree_page(struct page *page, struct writeback_control *wbc,
 	if (!PagePrivate(page))
 		return 0;
 
+	if (btrfs_is_subpage(page_to_fs_info(page)))
+		return submit_btree_subpage(page, wbc, epd);
+
 	spin_lock(&mapping->private_lock);
 	if (!PagePrivate(page)) {
 		spin_unlock(&mapping->private_lock);
-- 
2.28.0



* [PATCH v3 48/49] btrfs: extent_io: introduce end_bio_subpage_eb_writepage() function
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (46 preceding siblings ...)
  2020-09-30  1:55 ` [PATCH v3 47/49] btrfs: extent_io: introduce submit_btree_subpage() to submit a page for subpage metadata write Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  2020-09-30  1:55 ` [PATCH v3 49/49] btrfs: support metadata read write for test Qu Wenruo
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs

The new function, end_bio_subpage_eb_writepage(), will handle the
metadata writeback endio.

The major differences involved are:
- Page Writeback clear
  We will only clear the page writeback bit after all extent buffers in
  the same page have finished their writeback.
  This means we need to check the EXTENT_WRITEBACK bit for the page
  range.

- Clear EXTENT_WRITEBACK bit for btree inode
  This is the new bit for the btree inode io tree. It emulates the same
  page status, but at sector size aligned granularity.
  The new bit is remapped from EXTENT_DEFRAG; as defrag is impossible
  for the btree inode, it should be pretty safe to reuse.

Also, since the new endio function needs quite some extent io tree
operations, change btree_submit_bio_hook() to queue the endio work into
the metadata endio workqueue.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c   | 21 ++++++++++++-
 fs/btrfs/extent_io.c | 70 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index e466c30b52c8..2ac980f739dc 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -961,6 +961,7 @@ blk_status_t btrfs_wq_submit_bio(struct inode *inode, struct bio *bio,
 	async->mirror_num = mirror_num;
 	async->submit_bio_start = submit_bio_start;
 
+
 	btrfs_init_work(&async->work, run_one_async_start, run_one_async_done,
 			run_one_async_free);
 
@@ -1031,7 +1032,25 @@ static blk_status_t btree_submit_bio_hook(struct inode *inode, struct bio *bio,
 		if (ret)
 			goto out_w_error;
 		ret = btrfs_map_bio(fs_info, bio, mirror_num);
-	} else if (!async) {
+		if (ret < 0)
+			goto out_w_error;
+		return ret;
+	}
+
+	/*
+	 * For subpage metadata write, the endio involves several
+	 * extent_io_tree operations, which are not suitable for the
+	 * endio context.
+	 * Thus we need to queue them into the endio workqueue.
+	 */
+	if (btrfs_is_subpage(fs_info)) {
+		ret = btrfs_bio_wq_end_io(fs_info, bio,
+					  BTRFS_WQ_ENDIO_METADATA);
+		if (ret)
+			goto out_w_error;
+	}
+
+	if (!async) {
 		ret = btree_csum_one_bio(bio);
 		if (ret)
 			goto out_w_error;
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index bd79b3531a75..fc882daf6899 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4014,6 +4014,73 @@ static void set_btree_ioerr(struct page *page, struct extent_buffer *eb)
 	}
 }
 
+/*
+ * The endio function for subpage extent buffer write.
+ *
+ * Unlike end_bio_extent_buffer_writepage(), we only call end_page_writeback()
+ * after all extent buffers in the page have finished their writeback.
+ */
+static void end_bio_subpage_eb_writepage(struct bio *bio)
+{
+	struct bio_vec *bvec;
+	struct bvec_iter_all iter_all;
+
+	ASSERT(!bio_flagged(bio, BIO_CLONED));
+	bio_for_each_segment_all(bvec, bio, iter_all) {
+		struct page *page = bvec->bv_page;
+		struct btrfs_fs_info *fs_info = page_to_fs_info(page);
+		struct extent_buffer *eb;
+		u64 page_start = page_offset(page);
+		u64 page_end = page_start + PAGE_SIZE - 1;
+		u64 bvec_start = page_offset(page) + bvec->bv_offset;
+		u64 bvec_end = bvec_start + bvec->bv_len - 1;
+		u64 cur_bytenr = bvec_start;
+
+		ASSERT(IS_ALIGNED(bvec->bv_len, fs_info->nodesize));
+
+		/* Iterate through all extent buffers in the range */
+		while (cur_bytenr <= bvec_end) {
+			struct extent_state *cached = NULL;
+			struct extent_io_tree *io_tree =
+				info_to_btree_io_tree(fs_info);
+			int done;
+			int ret;
+
+			ret = btrfs_find_first_subpage_eb(fs_info, &eb,
+					cur_bytenr, bvec_end, 0);
+			if (ret > 0)
+				break;
+
+			cur_bytenr = eb->start + eb->len;
+
+			ASSERT(test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags));
+			done = atomic_dec_and_test(&eb->io_pages);
+			ASSERT(done);
+
+			if (bio->bi_status ||
+			    test_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags)) {
+				ClearPageUptodate(page);
+				set_btree_ioerr(page, eb);
+			}
+
+			clear_extent_bit(io_tree, eb->start,
+					eb->start + eb->len - 1,
+					EXTENT_WRITEBACK | EXTENT_LOCKED, 1, 0,
+					&cached);
+			/*
+			 * Only end the page writeback if there is no extent
+			 * buffer under writeback in the page anymore
+			 */
+			if (!test_range_bit(io_tree, page_start, page_end,
+					   EXTENT_WRITEBACK, 0, cached))
+				end_page_writeback(page);
+			free_extent_state(cached);
+			end_extent_buffer_writeback(eb);
+		}
+	}
+	bio_put(bio);
+}
+
 static void end_bio_extent_buffer_writepage(struct bio *bio)
 {
 	struct bio_vec *bvec;
@@ -4021,6 +4088,9 @@ static void end_bio_extent_buffer_writepage(struct bio *bio)
 	int done;
 	struct bvec_iter_all iter_all;
 
+	if (btrfs_is_subpage(page_to_fs_info(bio_first_page_all(bio))))
+		return end_bio_subpage_eb_writepage(bio);
+
 	ASSERT(!bio_flagged(bio, BIO_CLONED));
 	bio_for_each_segment_all(bvec, bio, iter_all) {
 		struct page *page = bvec->bv_page;
-- 
2.28.0



* [PATCH v3 49/49] btrfs: support metadata read write for test
  2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
                   ` (47 preceding siblings ...)
  2020-09-30  1:55 ` [PATCH v3 48/49] btrfs: extent_io: introduce end_bio_subpage_eb_writepage() function Qu Wenruo
@ 2020-09-30  1:55 ` Qu Wenruo
  48 siblings, 0 replies; 50+ messages in thread
From: Qu Wenruo @ 2020-09-30  1:55 UTC (permalink / raw)
  To: linux-btrfs
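
Remove the read-only enforcement from open_ctree() and btrfs_remount(),
and reject data writes with -EOPNOTSUPP in btrfs_file_write_iter(), so
that the subpage metadata read-write path can be exercised for testing.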

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 10 ----------
 fs/btrfs/file.c    |  4 ++++
 fs/btrfs/super.c   |  7 -------
 3 files changed, 4 insertions(+), 17 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 2ac980f739dc..8b5f65e6c5fa 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3335,16 +3335,6 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 		goto fail_alloc;
 	}
 
-	/* For 4K sector size support, it's only read-only yet */
-	if (PAGE_SIZE == SZ_64K && sectorsize == SZ_4K) {
-		if (!sb_rdonly(sb) || btrfs_super_log_root(disk_super)) {
-			btrfs_err(fs_info,
-				"subpage sector size only support RO yet");
-			err = -EINVAL;
-			goto fail_alloc;
-		}
-	}
-
 	ret = btrfs_init_workqueues(fs_info, fs_devices);
 	if (ret) {
 		err = ret;
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 4507c3d09399..0785e16ba243 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1937,6 +1937,10 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 	loff_t oldsize;
 	int clean_page = 0;
 
+	/* Don't support data write yet */
+	if (btrfs_is_subpage(fs_info))
+		return -EOPNOTSUPP;
+
 	if (!(iocb->ki_flags & IOCB_DIRECT) &&
 	    (iocb->ki_flags & IOCB_NOWAIT))
 		return -EOPNOTSUPP;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 743a2fadf4ee..25967ecaaf0a 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1922,13 +1922,6 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 			ret = -EINVAL;
 			goto restore;
 		}
-		if (btrfs_is_subpage(fs_info)) {
-			btrfs_warn(fs_info,
-	"read-write mount is not yet allowed for sector size %u page size %lu",
-				   fs_info->sectorsize, PAGE_SIZE);
-			ret = -EINVAL;
-			goto restore;
-		}
 
 		ret = btrfs_cleanup_fs_roots(fs_info);
 		if (ret)
-- 
2.28.0
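The btrfs_is_subpage() checks used in this patch and in patch 48/49 come
from patch 25/49 ("introduce a helper to determine if the sectorsize is
smaller than PAGE_SIZE"). Judging by that subject alone, the helper amounts
to a comparison along these lines (a sketch, not the verbatim definition
from the series):

/*
 * Sketch based only on the subject of patch 25/49; the actual
 * definition in the series may differ in details.
 */
static inline bool btrfs_is_subpage(const struct btrfs_fs_info *fs_info)
{
	return fs_info->sectorsize < PAGE_SIZE;
}

Centralizing the comparison keeps subpage-only guards, such as the
-EOPNOTSUPP check in btrfs_file_write_iter() above, down to a single
readable condition.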



Thread overview: 50+ messages
2020-09-30  1:54 [PATCH v3 00/49] btrfs: add partial rw support for subpage sector size Qu Wenruo
2020-09-30  1:54 ` [PATCH v3 01/49] btrfs: extent-io-tests: remove invalid tests Qu Wenruo
2020-09-30  1:54 ` [PATCH v3 02/49] btrfs: use iosize while reading compressed pages Qu Wenruo
2020-09-30  1:54 ` [PATCH v3 03/49] btrfs: extent_io: fix the comment on lock_extent_buffer_for_io() Qu Wenruo
2020-09-30  1:54 ` [PATCH v3 04/49] btrfs: extent_io: update the comment for find_first_extent_bit() Qu Wenruo
2020-09-30  1:54 ` [PATCH v3 05/49] btrfs: make btree inode io_tree has its special owner Qu Wenruo
2020-09-30  1:54 ` [PATCH v3 06/49] btrfs: disk-io: replace @fs_info and @private_data with @inode for btrfs_wq_submit_bio() Qu Wenruo
2020-09-30  1:54 ` [PATCH v3 07/49] btrfs: inode: sink parameter @start and @len for check_data_csum() Qu Wenruo
2020-09-30  1:54 ` [PATCH v3 08/49] btrfs: extent_io: unexport extent_invalidatepage() Qu Wenruo
2020-09-30  1:54 ` [PATCH v3 09/49] btrfs: extent_io: remove the forward declaration and rename __process_pages_contig Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 10/49] btrfs: extent_io: rename pages_locked in process_pages_contig() Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 11/49] btrfs: extent_io: make process_pages_contig() to accept bytenr directly Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 12/49] btrfs: extent_io: only require sector size alignment for page read Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 13/49] btrfs: extent_io: remove the extent_start/extent_len for end_bio_extent_readpage() Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 14/49] btrfs: extent_io: integrate page status update into endio_readpage_release_extent() Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 15/49] btrfs: extent_io: rename page_size to io_size in submit_extent_page() Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 16/49] btrfs: extent_io: add assert_spin_locked() for attach_extent_buffer_page() Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 17/49] btrfs: extent_io: extract the btree page submission code into its own helper function Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 18/49] btrfs: extent_io: calculate inline extent buffer page size based on page size Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 19/49] btrfs: extent_io: make btrfs_fs_info::buffer_radix to take sector size divided values Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 20/49] btrfs: disk_io: grab fs_info from extent_buffer::fs_info directly for btrfs_mark_buffer_dirty() Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 21/49] btrfs: disk-io: make csum_tree_block() handle sectorsize smaller than page size Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 22/49] btrfs: disk-io: extract the extent buffer verification from btree_readpage_end_io_hook() Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 23/49] btrfs: disk-io: accept bvec directly for csum_dirty_buffer() Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 24/49] btrfs: inode: make btrfs_readpage_end_io_hook() follow sector size Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 25/49] btrfs: introduce a helper to determine if the sectorsize is smaller than PAGE_SIZE Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 26/49] btrfs: extent_io: allow find_first_extent_bit() to find a range with exact bits match Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 27/49] btrfs: extent_io: don't allow tree block to cross page boundary for subpage support Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 28/49] btrfs: extent_io: update num_extent_pages() to support subpage sized extent buffer Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 29/49] btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 30/49] btrfs: disk-io: only clear EXTENT_LOCK bit for extent_invalidatepage() Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 31/49] btrfs: extent-io: make type of extent_state::state to be at least 32 bits Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 32/49] btrfs: extent_io: use extent_io_tree to handle subpage extent buffer allocation Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 33/49] btrfs: extent_io: make set/clear_extent_buffer_uptodate() to support subpage size Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 34/49] btrfs: extent_io: make the assert test on page uptodate able to handle subpage Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 35/49] btrfs: extent_io: implement subpage metadata read and its endio function Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 36/49] btrfs: extent_io: implement try_release_extent_buffer() for subpage metadata support Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 37/49] btrfs: set btree inode track_uptodate for subpage support Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 38/49] btrfs: allow RO mount of 4K sector size fs on 64K page system Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 39/49] btrfs: disk-io: allow btree_set_page_dirty() to do more sanity check on subpage metadata Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 40/49] btrfs: disk-io: support subpage metadata csum calculation at write time Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 41/49] btrfs: extent_io: prevent extent_state from being merged for btree io tree Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 42/49] btrfs: extent_io: make set_extent_buffer_dirty() to support subpage sized metadata Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 43/49] btrfs: extent_io: add subpage support for clear_extent_buffer_dirty() Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 44/49] btrfs: extent_io: make set_btree_ioerr() accept extent buffer Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 45/49] btrfs: extent_io: introduce write_one_subpage_eb() function Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 46/49] btrfs: extent_io: make lock_extent_buffer_for_io() subpage compatible Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 47/49] btrfs: extent_io: introduce submit_btree_subpage() to submit a page for subpage metadata write Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 48/49] btrfs: extent_io: introduce end_bio_subpage_eb_writepage() function Qu Wenruo
2020-09-30  1:55 ` [PATCH v3 49/49] btrfs: support metadata read write for test Qu Wenruo
