linux-btrfs.vger.kernel.org archive mirror
* [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size
@ 2020-10-21  6:24 Qu Wenruo
  2020-10-21  6:24 ` [PATCH v4 01/68] btrfs: extent-io-tests: remove invalid tests Qu Wenruo
                   ` (69 more replies)
  0 siblings, 70 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:24 UTC (permalink / raw)
  To: linux-btrfs

Patches can be fetched from github:
https://github.com/adam900710/linux/tree/subpage_data_fullpage_write

=== Overview ===
To allow 64K page size systems to mount 4K sector size btrfs and do
regular read-write on it.

=== What works ===
- Subpage data read
  Both uncompressed and compressed data

- Subpage metadata read/write
  So far single thread "fsstress" loops haven't crashed the system with
  all debug options enabled.
  (Currently running with "-n 2048" in a 1024 run loop.)

  This also means we can do subpage sized pure metadata operations like
  reflink (e.g. we can reflink at 4K sector granularity without
  problems).

- Full page data write
  Only uncompressed data has been tested so far.
  This means all data writes happen at page size granularity, including:
  * buffered write
  * dio write
  * hole punch for unaligned range
  So even if just one 4K sector is dirtied, we will write back the
  full 64K page as a data extent.

=== What doesn't work ===
- Balance
- Scrub
  Both fail with csum check failures. They may be quick to solve, but
  the current development status and patchset size are enough for a
  milestone.

- Dev replace
  Unable to submit subpage data writes.


=== Challenges and solutions ===
- Metadata
  * One 64K page can contain several tree blocks
    Instead of full page read/write/lock, we use extent io tree to do
    sector aligned read/write/lock, and avoid full page lock if
    possible.

  * Metadata can cross 64K page boundary
    This only happens for certain converted filesystems. Considering
    how rarely this is used, just reject such tree blocks for now and
    fix btrfs-convert later.
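
The rejection check described above can be sketched as follows (a
hypothetical illustration, not the actual patch; the helper name and
the page size constant are made up):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_64K ((uint64_t)64 * 1024)

/*
 * Reject a tree block whose byte range crosses a 64K page boundary,
 * since the subpage code requires an extent buffer to fit inside one
 * page. Such layouts can only come from certain converted filesystems.
 */
static bool eb_crosses_page(uint64_t start, uint32_t nodesize)
{
	return (start % PAGE_SIZE_64K) + nodesize > PAGE_SIZE_64K;
}
```

E.g. a 16K tree block at 60K crosses the boundary and is rejected,
while one at 48K ends exactly at 64K and is fine.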

  Overall, metadata is not that complex, as it has a very limited
  interface.

- Data
  * Data has more page status bits and uses ordered extents
  * Data subpage write can be handled by iomap
    Instead of using the extent io tree for each page status bit, go
    with full page writeback for now, so that no time is wasted
    implementing something which is designed to be replaced.

- Testing
  * No way to test under x86_64
    Currently I'm using an RK3399 board with an NVMe drive, planning to
    move to a Xavier AGX board.
    But we plan to add 2K sector size support as a pure testing sector
    size for x86_64 (while keeping 4K as the minimal node size) to test
    the subpage routines and make my life a little easier.
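
The testing idea above hinges on the subpage predicate: any sector size
strictly smaller than the page size exercises the subpage routines. A
sketch (names are illustrative; the series introduces a real helper for
determining whether the sectorsize is smaller than PAGE_SIZE):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * A filesystem needs the subpage routines whenever its sector size is
 * smaller than the running system's page size. A 2K sectorsize on a 4K
 * page x86_64 system would hit the same code paths as a 4K sectorsize
 * on a 64K page aarch64 system.
 */
static bool is_subpage(uint32_t sectorsize, uint32_t page_size)
{
	return sectorsize < page_size;
}
```

This is why a 2K testing sector size would let subpage development
happen on ordinary 4K page x86_64 machines.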

=== TODO ===
- More testing 
  Obviously

- Balance and scrub support
- Limited data subpage write
  Mostly for balance and replace, as a workaround.

- Iomap support for true subpage data writeback

=== Patchset structure ===
Patch 01~03:	Small bug fixes
Patch 04~22:	Generic cleanup and refactors, which make sense without
		subpage support
Patch 23~27:	Subpage specific cleanup and refactors.
Patch 28~42:	Enablement for subpage RO mount
Patch 43~52:	Enablement for subpage metadata write
Patch 53~68:	Enablement for subpage data write (although still in
		page size)

=== Changelog ===
v2:
- Migrate to an extent_io_tree based status/locking mechanism
  This gets rid of the ad-hoc subpage_eb_mapping structure and the extra
  verification timing for extent buffers.

  This also brings some extra cleanups for the btree inode extent io
  tree hooks which make no sense for either subpage or regular sector
  size.

  This also completely removes the need for page status bits like
  Locked/Uptodate/Dirty. Now metadata pages only utilize the Private
  status, while the private pointer is always NULL.

- Submit proper subpage sized reads for metadata
  With the help of the extent io tree, we no longer need to bother with
  full page reads. Now we submit subpage sized metadata reads and do
  subpage locking.

- Remove some unnecessary refactors
  Some refactors, like extracting detach_extent_buffer_pages(), don't
  really make the code cleaner. We can easily add a subpage specific
  branch instead.

- Address the comments from v1

v3:
- Add compressed data read fix

- Also update page status according to extent status for the btree inode
  This lets us reuse more of the existing code base.

- Add metadata write support
  Only manually tested (with a fs created under x86_64, and a script to
  do metadata-only operations under aarch64 with 64K page size).

- More cleanup/refactors during metadata write support development.

v4:
- Add more refactors
  The most obvious one is the refactor of __set/__clear_extent_bit()
  to make the less common options less visible, and to allow adding
  more options more easily.

- Add full data page write support

- More bug fixes for existing patches
  Mostly bugs found during the fsstress tests.

- Reduce page locking to the minimum for metadata
  I hit a possible ABBA deadlock, where extent io tree locking and page
  locking interleave.
  To resolve it without adding more requirements on the page locking
  order, subpage metadata now relies only on extent io tree locking.
  Page locking is reserved for unavoidable cases, like calling
  clear_page_dirty_for_io().

Goldwyn Rodrigues (1):
  btrfs: use iosize while reading compressed pages

Qu Wenruo (67):
  btrfs: extent-io-tests: remove invalid tests
  btrfs: extent_io: fix the comment on lock_extent_buffer_for_io().
  btrfs: extent_io: update the comment for find_first_extent_bit()
  btrfs: extent_io: sink the @failed_start parameter for
    set_extent_bit()
  btrfs: make btree inode io_tree has its special owner
  btrfs: disk-io: replace @fs_info and @private_data with @inode for
    btrfs_wq_submit_bio()
  btrfs: inode: sink parameter @start and @len for check_data_csum()
  btrfs: extent_io: unexport extent_invalidatepage()
  btrfs: extent_io: remove the forward declaration and rename
    __process_pages_contig
  btrfs: extent_io: rename pages_locked in process_pages_contig()
  btrfs: extent_io: only require sector size alignment for page read
  btrfs: extent_io: remove the extent_start/extent_len for
    end_bio_extent_readpage()
  btrfs: extent_io: integrate page status update into
    endio_readpage_release_extent()
  btrfs: extent_io: rename page_size to io_size in submit_extent_page()
  btrfs: extent_io: add assert_spin_locked() for
    attach_extent_buffer_page()
  btrfs: extent_io: extract the btree page submission code into its own
    helper function
  btrfs: extent_io: calculate inline extent buffer page size based on
    page size
  btrfs: extent_io: make btrfs_fs_info::buffer_radix to take sector size
    devided values
  btrfs: extent_io: sink less common parameters for __set_extent_bit()
  btrfs: extent_io: sink less common parameters for __clear_extent_bit()
  btrfs: disk_io: grab fs_info from extent_buffer::fs_info directly for
    btrfs_mark_buffer_dirty()
  btrfs: disk-io: make csum_tree_block() handle sectorsize smaller than
    page size
  btrfs: disk-io: extract the extent buffer verification from
    btree_readpage_end_io_hook()
  btrfs: disk-io: accept bvec directly for csum_dirty_buffer()
  btrfs: inode: make btrfs_readpage_end_io_hook() follow sector size
  btrfs: introduce a helper to determine if the sectorsize is smaller
    than PAGE_SIZE
  btrfs: extent_io: allow find_first_extent_bit() to find a range with
    exact bits match
  btrfs: extent_io: don't allow tree block to cross page boundary for
    subpage support
  btrfs: extent_io: update num_extent_pages() to support subpage sized
    extent buffer
  btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors
  btrfs: disk-io: only clear EXTENT_LOCK bit for extent_invalidatepage()
  btrfs: extent-io: make type of extent_state::state to be at least 32
    bits
  btrfs: extent_io: use extent_io_tree to handle subpage extent buffer
    allocation
  btrfs: extent_io: make set/clear_extent_buffer_uptodate() to support
    subpage size
  btrfs: extent_io: make the assert test on page uptodate able to handle
    subpage
  btrfs: extent_io: implement subpage metadata read and its endio
    function
  btrfs: extent_io: implement try_release_extent_buffer() for subpage
    metadata support
  btrfs: extent_io: extra the core of test_range_bit() into
    test_range_bit_nolock()
  btrfs: extent_io: introduce EXTENT_READ_SUBMITTED to handle subpage
    data read
  btrfs: set btree inode track_uptodate for subpage support
  btrfs: allow RO mount of 4K sector size fs on 64K page system
  btrfs: disk-io: allow btree_set_page_dirty() to do more sanity check
    on subpage metadata
  btrfs: disk-io: support subpage metadata csum calculation at write
    time
  btrfs: extent_io: prevent extent_state from being merged for btree io
    tree
  btrfs: extent_io: make set_extent_buffer_dirty() to support subpage
    sized metadata
  btrfs: extent_io: add subpage support for clear_extent_buffer_dirty()
  btrfs: extent_io: make set_btree_ioerr() accept extent buffer
  btrfs: extent_io: introduce write_one_subpage_eb() function
  btrfs: extent_io: make lock_extent_buffer_for_io() subpage compatible
  btrfs: extent_io: introduce submit_btree_subpage() to submit a page
    for subpage metadata write
  btrfs: extent_io: introduce end_bio_subpage_eb_writepage() function
  btrfs: inode: make can_nocow_extent() check only return 1 if the range
    is no smaller than PAGE_SIZE
  btrfs: file: calculate reserve space based on PAGE_SIZE for buffered
    write
  btrfs: file: make hole punching page aligned for subpage
  btrfs: file: make btrfs_dirty_pages() follow page size to mark extent
    io tree
  btrfs: file: make btrfs_file_write_iter() to be page aligned
  btrfs: output extra info for space info update underflow
  btrfs: delalloc-space: make data space reservation to be page aligned
  btrfs: scrub: allow scrub to work with subpage sectorsize
  btrfs: inode: make btrfs_truncate_block() to do page alignment
  btrfs: file: make hole punch and zero range to be page aligned
  btrfs: file: make btrfs_fallocate() to use PAGE_SIZE as blocksize
  btrfs: inode: always mark the full page range delalloc for
    btrfs_page_mkwrite()
  btrfs: inode: require page alignement for direct io
  btrfs: inode: only do NOCOW write for page aligned extent
  btrfs: reflink: do full page writeback for reflink prepare
  btrfs: support subpage read write for test

 fs/btrfs/block-group.c           |    2 +-
 fs/btrfs/btrfs_inode.h           |   12 +
 fs/btrfs/ctree.c                 |    5 +-
 fs/btrfs/ctree.h                 |   43 +-
 fs/btrfs/delalloc-space.c        |   19 +-
 fs/btrfs/disk-io.c               |  425 ++++++--
 fs/btrfs/disk-io.h               |    8 +-
 fs/btrfs/extent-io-tree.h        |  145 ++-
 fs/btrfs/extent-tree.c           |    2 +-
 fs/btrfs/extent_io.c             | 1576 ++++++++++++++++++++++--------
 fs/btrfs/extent_io.h             |   27 +-
 fs/btrfs/extent_map.c            |    2 +-
 fs/btrfs/file.c                  |  140 ++-
 fs/btrfs/free-space-cache.c      |    2 +-
 fs/btrfs/inode.c                 |  117 ++-
 fs/btrfs/reflink.c               |   36 +-
 fs/btrfs/relocation.c            |    2 +-
 fs/btrfs/scrub.c                 |    8 -
 fs/btrfs/space-info.h            |    4 +-
 fs/btrfs/struct-funcs.c          |   18 +-
 fs/btrfs/tests/extent-io-tests.c |   26 +-
 fs/btrfs/transaction.c           |    4 +-
 fs/btrfs/volumes.c               |    2 +-
 include/trace/events/btrfs.h     |    1 +
 24 files changed, 1927 insertions(+), 699 deletions(-)

-- 
2.28.0


^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH v4 01/68] btrfs: extent-io-tests: remove invalid tests
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
@ 2020-10-21  6:24 ` Qu Wenruo
  2020-10-26 23:26   ` David Sterba
  2020-10-21  6:24 ` [PATCH v4 02/68] btrfs: use iosize while reading compressed pages Qu Wenruo
                   ` (68 subsequent siblings)
  69 siblings, 1 reply; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:24 UTC (permalink / raw)
  To: linux-btrfs

In extent-io-tests, there are two invalid tests:
- Invalid nodesize for test_eb_bitmaps()
  Instead of the sectorsize and nodesize combination passed in, we're
  always using a hand-crafted nodesize.
  Although it has some extra checks for 64K page size, we can still hit
  a case where PAGE_SIZE == 32K, which would give a 128K nodesize,
  larger than the maximum valid node size.

  Thankfully most machines have either 4K or 64K page size, thus we
  haven't hit such a case yet.

- Invalid extent buffer bytenr
  For 64K page size, the only combination we're going to test is
  sectorsize = nodesize = 64K.
  In that case, we'll try to create an extent buffer with 32K bytenr,
  which is not aligned to sectorsize thus invalid.

This patch fixes both problems by:
- Honoring the sectorsize/nodesize combination
  Now we won't bother to hand-craft a strange length and use it as
  nodesize.

- Using sectorsize as the extent buffer start for the second run
  This tests the case where the extent buffer is aligned to sectorsize
  but not necessarily to nodesize.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/tests/extent-io-tests.c | 26 +++++++++++---------------
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/tests/extent-io-tests.c b/fs/btrfs/tests/extent-io-tests.c
index df7ce874a74b..73e96d505f4f 100644
--- a/fs/btrfs/tests/extent-io-tests.c
+++ b/fs/btrfs/tests/extent-io-tests.c
@@ -379,54 +379,50 @@ static int __test_eb_bitmaps(unsigned long *bitmap, struct extent_buffer *eb,
 static int test_eb_bitmaps(u32 sectorsize, u32 nodesize)
 {
 	struct btrfs_fs_info *fs_info;
-	unsigned long len;
 	unsigned long *bitmap = NULL;
 	struct extent_buffer *eb = NULL;
 	int ret;
 
 	test_msg("running extent buffer bitmap tests");
 
-	/*
-	 * In ppc64, sectorsize can be 64K, thus 4 * 64K will be larger than
-	 * BTRFS_MAX_METADATA_BLOCKSIZE.
-	 */
-	len = (sectorsize < BTRFS_MAX_METADATA_BLOCKSIZE)
-		? sectorsize * 4 : sectorsize;
-
-	fs_info = btrfs_alloc_dummy_fs_info(len, len);
+	fs_info = btrfs_alloc_dummy_fs_info(nodesize, sectorsize);
 	if (!fs_info) {
 		test_std_err(TEST_ALLOC_FS_INFO);
 		return -ENOMEM;
 	}
 
-	bitmap = kmalloc(len, GFP_KERNEL);
+	bitmap = kmalloc(nodesize, GFP_KERNEL);
 	if (!bitmap) {
 		test_err("couldn't allocate test bitmap");
 		ret = -ENOMEM;
 		goto out;
 	}
 
-	eb = __alloc_dummy_extent_buffer(fs_info, 0, len);
+	eb = __alloc_dummy_extent_buffer(fs_info, 0, nodesize);
 	if (!eb) {
 		test_std_err(TEST_ALLOC_ROOT);
 		ret = -ENOMEM;
 		goto out;
 	}
 
-	ret = __test_eb_bitmaps(bitmap, eb, len);
+	ret = __test_eb_bitmaps(bitmap, eb, nodesize);
 	if (ret)
 		goto out;
 
-	/* Do it over again with an extent buffer which isn't page-aligned. */
 	free_extent_buffer(eb);
-	eb = __alloc_dummy_extent_buffer(fs_info, nodesize / 2, len);
+
+	/*
+	 * Test again for case where the tree block is sectorsize aligned but
+	 * not nodesize aligned.
+	 */
+	eb = __alloc_dummy_extent_buffer(fs_info, sectorsize, nodesize);
 	if (!eb) {
 		test_std_err(TEST_ALLOC_ROOT);
 		ret = -ENOMEM;
 		goto out;
 	}
 
-	ret = __test_eb_bitmaps(bitmap, eb, len);
+	ret = __test_eb_bitmaps(bitmap, eb, nodesize);
 out:
 	free_extent_buffer(eb);
 	kfree(bitmap);
-- 
2.28.0



* [PATCH v4 02/68] btrfs: use iosize while reading compressed pages
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
  2020-10-21  6:24 ` [PATCH v4 01/68] btrfs: extent-io-tests: remove invalid tests Qu Wenruo
@ 2020-10-21  6:24 ` Qu Wenruo
  2020-10-21  6:24 ` [PATCH v4 03/68] btrfs: extent_io: fix the comment on lock_extent_buffer_for_io() Qu Wenruo
                   ` (67 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:24 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues, Goldwyn Rodrigues

From: Goldwyn Rodrigues <rgoldwyn@suse.de>

When using compression, a submitted bio is mapped to a compressed bio
which performs the read from disk, decompresses, and returns
uncompressed data to the original bio. The original bio must reflect
the uncompressed size (iosize) of the I/O to be performed, or else the
page just gets the compressed length of data (disk_io_size). The
compressed bio checks the extent map and gets the correct length while
performing the I/O from disk.

This came up in the subpage work, where only the compressed length of
the original bio was filled into the page. It worked correctly for
pagesize == sectorsize because both compressed and uncompressed data
are at pagesize boundaries, and would end up filling the requested
page.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/btrfs/extent_io.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index a940edb1e64f..64f7f61ce718 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3162,7 +3162,6 @@ static int __do_readpage(struct page *page,
 	int nr = 0;
 	size_t pg_offset = 0;
 	size_t iosize;
-	size_t disk_io_size;
 	size_t blocksize = inode->i_sb->s_blocksize;
 	unsigned long this_bio_flag = 0;
 	struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
@@ -3228,13 +3227,10 @@ static int __do_readpage(struct page *page,
 		iosize = min(extent_map_end(em) - cur, end - cur + 1);
 		cur_end = min(extent_map_end(em) - 1, end);
 		iosize = ALIGN(iosize, blocksize);
-		if (this_bio_flag & EXTENT_BIO_COMPRESSED) {
-			disk_io_size = em->block_len;
+		if (this_bio_flag & EXTENT_BIO_COMPRESSED)
 			offset = em->block_start;
-		} else {
+		else
 			offset = em->block_start + extent_offset;
-			disk_io_size = iosize;
-		}
 		block_start = em->block_start;
 		if (test_bit(EXTENT_FLAG_PREALLOC, &em->flags))
 			block_start = EXTENT_MAP_HOLE;
@@ -3323,7 +3319,7 @@ static int __do_readpage(struct page *page,
 		}
 
 		ret = submit_extent_page(REQ_OP_READ | read_flags, NULL,
-					 page, offset, disk_io_size,
+					 page, offset, iosize,
 					 pg_offset, bio,
 					 end_bio_extent_readpage, mirror_num,
 					 *bio_flags,
-- 
2.28.0



* [PATCH v4 03/68] btrfs: extent_io: fix the comment on lock_extent_buffer_for_io().
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
  2020-10-21  6:24 ` [PATCH v4 01/68] btrfs: extent-io-tests: remove invalid tests Qu Wenruo
  2020-10-21  6:24 ` [PATCH v4 02/68] btrfs: use iosize while reading compressed pages Qu Wenruo
@ 2020-10-21  6:24 ` Qu Wenruo
  2020-10-21  6:24 ` [PATCH v4 04/68] btrfs: extent_io: update the comment for find_first_extent_bit() Qu Wenruo
                   ` (66 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:24 UTC (permalink / raw)
  To: linux-btrfs

The return value description of that function is completely wrong.

The function only returns 0 if the extent buffer doesn't need to be
submitted.
The "ret = 1" and "ret = 0" cases are determined by the return value of
"test_and_clear_bit(EXTENT_BUFFER_DIRTY, &eb->bflags)".

If we get ret == 1, it's because the extent buffer is dirty; we set its
status to EXTENT_BUFFER_WRITEBACK and continue to page locking.

While if we get ret == 0, it means the extent buffer is not dirty from
the beginning, so we don't need to write it back.

The caller also follows this: in btree_write_cache_pages(), if
lock_extent_buffer_for_io() returns 0, we just skip the extent buffer
completely.

So the comment is completely wrong.

Since we're here, also change the description a little.
The write bio flushing won't be visible to the caller, thus it's not a
major feature.
In the main description, only describe the locking part to make the
point clearer.

Fixes: 2e3c25136adf ("btrfs: extent_io: add proper error handling to lock_extent_buffer_for_io()")
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 64f7f61ce718..a64d88163f3b 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3688,11 +3688,14 @@ static void end_extent_buffer_writeback(struct extent_buffer *eb)
 }
 
 /*
- * Lock eb pages and flush the bio if we can't the locks
+ * Lock extent buffer status and pages for write back.
  *
- * Return  0 if nothing went wrong
- * Return >0 is same as 0, except bio is not submitted
- * Return <0 if something went wrong, no page is locked
+ * May try to flush write bio if we can't get the lock.
+ *
+ * Return  0 if the extent buffer doesn't need to be submitted.
+ * (E.g. the extent buffer is not dirty)
+ * Return >0 is the extent buffer is submitted to bio.
+ * Return <0 if something went wrong, no page is locked.
  */
 static noinline_for_stack int lock_extent_buffer_for_io(struct extent_buffer *eb,
 			  struct extent_page_data *epd)
-- 
2.28.0



* [PATCH v4 04/68] btrfs: extent_io: update the comment for find_first_extent_bit()
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (2 preceding siblings ...)
  2020-10-21  6:24 ` [PATCH v4 03/68] btrfs: extent_io: fix the comment on lock_extent_buffer_for_io() Qu Wenruo
@ 2020-10-21  6:24 ` Qu Wenruo
  2020-10-21  6:24 ` [PATCH v4 05/68] btrfs: extent_io: sink the @failed_start parameter for set_extent_bit() Qu Wenruo
                   ` (65 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:24 UTC (permalink / raw)
  To: linux-btrfs

The pitfall here is that if the parameter @bits has multiple bits set,
we will return the first range which has just one of the specified
bits set.

This is a little tricky if we want an exact match.

Anyway, update the comment to inform the callers.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index a64d88163f3b..2980e8384e74 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1554,11 +1554,12 @@ find_first_extent_bit_state(struct extent_io_tree *tree,
 }
 
 /*
- * find the first offset in the io tree with 'bits' set. zero is
- * returned if we find something, and *start_ret and *end_ret are
- * set to reflect the state struct that was found.
+ * Find the first offset in the io tree with one or more @bits set.
  *
- * If nothing was found, 1 is returned. If found something, return 0.
+ * NOTE: If @bits are multiple bits, any bit of @bits will meet the match.
+ *
+ * Return 0 if we find something, and update @start_ret and @end_ret.
+ * Return 1 if we found nothing.
  */
 int find_first_extent_bit(struct extent_io_tree *tree, u64 start,
 			  u64 *start_ret, u64 *end_ret, unsigned bits,
-- 
2.28.0



* [PATCH v4 05/68] btrfs: extent_io: sink the @failed_start parameter for set_extent_bit()
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (3 preceding siblings ...)
  2020-10-21  6:24 ` [PATCH v4 04/68] btrfs: extent_io: update the comment for find_first_extent_bit() Qu Wenruo
@ 2020-10-21  6:24 ` Qu Wenruo
  2020-10-21  6:24 ` [PATCH v4 06/68] btrfs: make btree inode io_tree has its special owner Qu Wenruo
                   ` (64 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:24 UTC (permalink / raw)
  To: linux-btrfs

The @failed_start parameter is only paired with @exclusive_bits, and
those parameters are only used for the EXTENT_LOCKED bit, which has
its own wrapper, lock_extent_bits().

Thus for regular set_extent_bit() calls, @failed_start makes no sense;
just sink the parameter.

Also, since @failed_start and @exclusive_bits are used in pairs, add an
extra assert to make that more obvious.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent-io-tree.h | 18 ++++++++----------
 fs/btrfs/extent_io.c      | 12 ++++++++----
 fs/btrfs/file.c           |  6 +++---
 fs/btrfs/inode.c          |  3 +--
 4 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
index 219a09a2b734..9a60d8426796 100644
--- a/fs/btrfs/extent-io-tree.h
+++ b/fs/btrfs/extent-io-tree.h
@@ -153,15 +153,15 @@ static inline int clear_extent_bits(struct extent_io_tree *tree, u64 start,
 int set_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
 			   unsigned bits, struct extent_changeset *changeset);
 int set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		   unsigned bits, u64 *failed_start,
-		   struct extent_state **cached_state, gfp_t mask);
+		   unsigned bits, struct extent_state **cached_state,
+		   gfp_t mask);
 int set_extent_bits_nowait(struct extent_io_tree *tree, u64 start, u64 end,
 			   unsigned bits);
 
 static inline int set_extent_bits(struct extent_io_tree *tree, u64 start,
 		u64 end, unsigned bits)
 {
-	return set_extent_bit(tree, start, end, bits, NULL, NULL, GFP_NOFS);
+	return set_extent_bit(tree, start, end, bits, NULL, GFP_NOFS);
 }
 
 static inline int clear_extent_uptodate(struct extent_io_tree *tree, u64 start,
@@ -174,8 +174,7 @@ static inline int clear_extent_uptodate(struct extent_io_tree *tree, u64 start,
 static inline int set_extent_dirty(struct extent_io_tree *tree, u64 start,
 		u64 end, gfp_t mask)
 {
-	return set_extent_bit(tree, start, end, EXTENT_DIRTY, NULL,
-			      NULL, mask);
+	return set_extent_bit(tree, start, end, EXTENT_DIRTY, NULL, mask);
 }
 
 static inline int clear_extent_dirty(struct extent_io_tree *tree, u64 start,
@@ -196,7 +195,7 @@ static inline int set_extent_delalloc(struct extent_io_tree *tree, u64 start,
 {
 	return set_extent_bit(tree, start, end,
 			      EXTENT_DELALLOC | EXTENT_UPTODATE | extra_bits,
-			      NULL, cached_state, GFP_NOFS);
+			      cached_state, GFP_NOFS);
 }
 
 static inline int set_extent_defrag(struct extent_io_tree *tree, u64 start,
@@ -204,20 +203,19 @@ static inline int set_extent_defrag(struct extent_io_tree *tree, u64 start,
 {
 	return set_extent_bit(tree, start, end,
 			      EXTENT_DELALLOC | EXTENT_UPTODATE | EXTENT_DEFRAG,
-			      NULL, cached_state, GFP_NOFS);
+			      cached_state, GFP_NOFS);
 }
 
 static inline int set_extent_new(struct extent_io_tree *tree, u64 start,
 		u64 end)
 {
-	return set_extent_bit(tree, start, end, EXTENT_NEW, NULL, NULL,
-			GFP_NOFS);
+	return set_extent_bit(tree, start, end, EXTENT_NEW, NULL, GFP_NOFS);
 }
 
 static inline int set_extent_uptodate(struct extent_io_tree *tree, u64 start,
 		u64 end, struct extent_state **cached_state, gfp_t mask)
 {
-	return set_extent_bit(tree, start, end, EXTENT_UPTODATE, NULL,
+	return set_extent_bit(tree, start, end, EXTENT_UPTODATE,
 			      cached_state, mask);
 }
 
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 2980e8384e74..ca219c42ddc6 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -980,6 +980,10 @@ __set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 	btrfs_debug_check_extent_io_range(tree, start, end);
 	trace_btrfs_set_extent_bit(tree, start, end - start + 1, bits);
 
+	if (exclusive_bits)
+		ASSERT(failed_start);
+	else
+		ASSERT(!failed_start);
 again:
 	if (!prealloc && gfpflags_allow_blocking(mask)) {
 		/*
@@ -1180,11 +1184,11 @@ __set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 }
 
 int set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		   unsigned bits, u64 * failed_start,
-		   struct extent_state **cached_state, gfp_t mask)
+		   unsigned bits, struct extent_state **cached_state,
+		   gfp_t mask)
 {
-	return __set_extent_bit(tree, start, end, bits, 0, failed_start,
-				cached_state, mask, NULL);
+	return __set_extent_bit(tree, start, end, bits, 0, NULL, cached_state,
+			        mask, NULL);
 }
 
 
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 4507c3d09399..d3766d2bb8d6 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -481,8 +481,8 @@ static int btrfs_find_new_delalloc_bytes(struct btrfs_inode *inode,
 
 		ret = set_extent_bit(&inode->io_tree, search_start,
 				     search_start + em_len - 1,
-				     EXTENT_DELALLOC_NEW,
-				     NULL, cached_state, GFP_NOFS);
+				     EXTENT_DELALLOC_NEW, cached_state,
+				     GFP_NOFS);
 next:
 		search_start = extent_map_end(em);
 		free_extent_map(em);
@@ -1830,7 +1830,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 
 			set_extent_bit(&BTRFS_I(inode)->io_tree, lockstart,
 				       lockend, EXTENT_NORESERVE, NULL,
-				       NULL, GFP_NOFS);
+				       GFP_NOFS);
 		}
 
 		btrfs_drop_pages(pages, num_pages);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 9570458aa847..1d2fe21489ca 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4619,8 +4619,7 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
 
 	if (only_release_metadata)
 		set_extent_bit(&BTRFS_I(inode)->io_tree, block_start,
-				block_end, EXTENT_NORESERVE, NULL, NULL,
-				GFP_NOFS);
+				block_end, EXTENT_NORESERVE, NULL, GFP_NOFS);
 
 out_unlock:
 	if (ret) {
-- 
2.28.0



* [PATCH v4 06/68] btrfs: make btree inode io_tree has its special owner
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (4 preceding siblings ...)
  2020-10-21  6:24 ` [PATCH v4 05/68] btrfs: extent_io: sink the @failed_start parameter for set_extent_bit() Qu Wenruo
@ 2020-10-21  6:24 ` Qu Wenruo
  2020-10-21  6:24 ` [PATCH v4 07/68] btrfs: disk-io: replace @fs_info and @private_data with @inode for btrfs_wq_submit_bio() Qu Wenruo
                   ` (63 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:24 UTC (permalink / raw)
  To: linux-btrfs

The btree inode is pretty special compared to all other inode extent io
trees; although it has a btrfs inode, it doesn't have the
track_uptodate bit set to true, and never has ordered extents.

Since it's so special, add a new owner value for it to make debugging a
little easier.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c           | 2 +-
 fs/btrfs/extent-io-tree.h    | 1 +
 include/trace/events/btrfs.h | 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f6bba7eb1fa1..be6edbd34934 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2116,7 +2116,7 @@ static void btrfs_init_btree_inode(struct btrfs_fs_info *fs_info)
 
 	RB_CLEAR_NODE(&BTRFS_I(inode)->rb_node);
 	extent_io_tree_init(fs_info, &BTRFS_I(inode)->io_tree,
-			    IO_TREE_INODE_IO, inode);
+			    IO_TREE_BTREE_INODE_IO, inode);
 	BTRFS_I(inode)->io_tree.track_uptodate = false;
 	extent_map_tree_init(&BTRFS_I(inode)->extent_tree);
 
diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
index 9a60d8426796..92caa1190ca8 100644
--- a/fs/btrfs/extent-io-tree.h
+++ b/fs/btrfs/extent-io-tree.h
@@ -40,6 +40,7 @@ struct io_failure_record;
 enum {
 	IO_TREE_FS_PINNED_EXTENTS,
 	IO_TREE_FS_EXCLUDED_EXTENTS,
+	IO_TREE_BTREE_INODE_IO,
 	IO_TREE_INODE_IO,
 	IO_TREE_INODE_IO_FAILURE,
 	IO_TREE_RELOC_BLOCKS,
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index 863335ecb7e8..89397605e465 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -79,6 +79,7 @@ struct btrfs_space_info;
 #define IO_TREE_OWNER						    \
 	EM( IO_TREE_FS_PINNED_EXTENTS, 	  "PINNED_EXTENTS")	    \
 	EM( IO_TREE_FS_EXCLUDED_EXTENTS,  "EXCLUDED_EXTENTS")	    \
+	EM( IO_TREE_BTREE_INODE_IO,	  "BTRFS_INODE_IO")	    \
 	EM( IO_TREE_INODE_IO,		  "INODE_IO")		    \
 	EM( IO_TREE_INODE_IO_FAILURE,	  "INODE_IO_FAILURE")	    \
 	EM( IO_TREE_RELOC_BLOCKS,	  "RELOC_BLOCKS")	    \
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 07/68] btrfs: disk-io: replace @fs_info and @private_data with @inode for btrfs_wq_submit_bio()
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (5 preceding siblings ...)
  2020-10-21  6:24 ` [PATCH v4 06/68] btrfs: make btree inode io_tree has its special owner Qu Wenruo
@ 2020-10-21  6:24 ` Qu Wenruo
  2020-10-21 22:00   ` Goldwyn Rodrigues
  2020-10-21  6:24 ` [PATCH v4 08/68] btrfs: inode: sink parameter @start and @len for check_data_csum() Qu Wenruo
                   ` (62 subsequent siblings)
  69 siblings, 1 reply; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:24 UTC (permalink / raw)
  To: linux-btrfs

All callers of btrfs_wq_submit_bio() pass a struct inode as
@private_data, so there is no need for @private_data to be (void *);
just replace it with "struct inode *inode".

Since we can extract fs_info from the struct inode, also remove the
@fs_info parameter.

While at it, also replace all (void *private_data) parameters with
(struct inode *inode).

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c   | 21 +++++++++++----------
 fs/btrfs/disk-io.h   |  8 ++++----
 fs/btrfs/extent_io.h |  2 +-
 fs/btrfs/inode.c     | 21 +++++++++------------
 4 files changed, 25 insertions(+), 27 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index be6edbd34934..b7436ab7bba9 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -110,7 +110,7 @@ static void btrfs_free_csum_hash(struct btrfs_fs_info *fs_info)
  * just before they are sent down the IO stack.
  */
 struct async_submit_bio {
-	void *private_data;
+	struct inode *inode;
 	struct bio *bio;
 	extent_submit_bio_start_t *submit_bio_start;
 	int mirror_num;
@@ -746,7 +746,7 @@ static void run_one_async_start(struct btrfs_work *work)
 	blk_status_t ret;
 
 	async = container_of(work, struct  async_submit_bio, work);
-	ret = async->submit_bio_start(async->private_data, async->bio,
+	ret = async->submit_bio_start(async->inode, async->bio,
 				      async->bio_offset);
 	if (ret)
 		async->status = ret;
@@ -767,7 +767,7 @@ static void run_one_async_done(struct btrfs_work *work)
 	blk_status_t ret;
 
 	async = container_of(work, struct  async_submit_bio, work);
-	inode = async->private_data;
+	inode = async->inode;
 
 	/* If an error occurred we just want to clean up the bio and move on */
 	if (async->status) {
@@ -797,18 +797,19 @@ static void run_one_async_free(struct btrfs_work *work)
 	kfree(async);
 }
 
-blk_status_t btrfs_wq_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
+blk_status_t btrfs_wq_submit_bio(struct inode *inode, struct bio *bio,
 				 int mirror_num, unsigned long bio_flags,
-				 u64 bio_offset, void *private_data,
+				 u64 bio_offset,
 				 extent_submit_bio_start_t *submit_bio_start)
 {
+	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
 	struct async_submit_bio *async;
 
 	async = kmalloc(sizeof(*async), GFP_NOFS);
 	if (!async)
 		return BLK_STS_RESOURCE;
 
-	async->private_data = private_data;
+	async->inode = inode;
 	async->bio = bio;
 	async->mirror_num = mirror_num;
 	async->submit_bio_start = submit_bio_start;
@@ -845,8 +846,8 @@ static blk_status_t btree_csum_one_bio(struct bio *bio)
 	return errno_to_blk_status(ret);
 }
 
-static blk_status_t btree_submit_bio_start(void *private_data, struct bio *bio,
-					     u64 bio_offset)
+static blk_status_t btree_submit_bio_start(struct inode *inode, struct bio *bio,
+					   u64 bio_offset)
 {
 	/*
 	 * when we're called for a write, we're already in the async
@@ -893,8 +894,8 @@ static blk_status_t btree_submit_bio_hook(struct inode *inode, struct bio *bio,
 		 * kthread helpers are used to submit writes so that
 		 * checksumming can happen in parallel across all CPUs
 		 */
-		ret = btrfs_wq_submit_bio(fs_info, bio, mirror_num, 0,
-					  0, inode, btree_submit_bio_start);
+		ret = btrfs_wq_submit_bio(inode, bio, mirror_num, 0,
+					  0, btree_submit_bio_start);
 	}
 
 	if (ret)
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index 00dc39d47ed3..2d564e9223e2 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -105,10 +105,10 @@ int btrfs_read_buffer(struct extent_buffer *buf, u64 parent_transid, int level,
 		      struct btrfs_key *first_key);
 blk_status_t btrfs_bio_wq_end_io(struct btrfs_fs_info *info, struct bio *bio,
 			enum btrfs_wq_endio_type metadata);
-blk_status_t btrfs_wq_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
-			int mirror_num, unsigned long bio_flags,
-			u64 bio_offset, void *private_data,
-			extent_submit_bio_start_t *submit_bio_start);
+blk_status_t btrfs_wq_submit_bio(struct inode *inode, struct bio *bio,
+				 int mirror_num, unsigned long bio_flags,
+				 u64 bio_offset,
+				 extent_submit_bio_start_t *submit_bio_start);
 blk_status_t btrfs_submit_bio_done(void *private_data, struct bio *bio,
 			  int mirror_num);
 int btrfs_init_log_root_tree(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 30794ae58498..3c9252b429e0 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -71,7 +71,7 @@ typedef blk_status_t (submit_bio_hook_t)(struct inode *inode, struct bio *bio,
 					 int mirror_num,
 					 unsigned long bio_flags);
 
-typedef blk_status_t (extent_submit_bio_start_t)(void *private_data,
+typedef blk_status_t (extent_submit_bio_start_t)(struct inode *inode,
 		struct bio *bio, u64 bio_offset);
 
 struct extent_io_ops {
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1d2fe21489ca..2a56d3b8eff4 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2157,11 +2157,9 @@ int btrfs_bio_fits_in_stripe(struct page *page, size_t size, struct bio *bio,
  * At IO completion time the cums attached on the ordered extent record
  * are inserted into the btree
  */
-static blk_status_t btrfs_submit_bio_start(void *private_data, struct bio *bio,
-				    u64 bio_offset)
+static blk_status_t btrfs_submit_bio_start(struct inode *inode, struct bio *bio,
+					   u64 bio_offset)
 {
-	struct inode *inode = private_data;
-
 	return btrfs_csum_one_bio(BTRFS_I(inode), bio, 0, 0);
 }
 
@@ -2221,8 +2219,8 @@ static blk_status_t btrfs_submit_bio_hook(struct inode *inode, struct bio *bio,
 		if (root->root_key.objectid == BTRFS_DATA_RELOC_TREE_OBJECTID)
 			goto mapit;
 		/* we're doing a write, do the async checksumming */
-		ret = btrfs_wq_submit_bio(fs_info, bio, mirror_num, bio_flags,
-					  0, inode, btrfs_submit_bio_start);
+		ret = btrfs_wq_submit_bio(inode, bio, mirror_num, bio_flags,
+					  0, btrfs_submit_bio_start);
 		goto out;
 	} else if (!skip_sum) {
 		ret = btrfs_csum_one_bio(BTRFS_I(inode), bio, 0, 0);
@@ -7615,11 +7613,10 @@ static void __endio_write_update_ordered(struct btrfs_inode *inode,
 	}
 }
 
-static blk_status_t btrfs_submit_bio_start_direct_io(void *private_data,
-				    struct bio *bio, u64 offset)
+static blk_status_t btrfs_submit_bio_start_direct_io(struct inode *inode,
+						     struct bio *bio,
+						     u64 offset)
 {
-	struct inode *inode = private_data;
-
 	return btrfs_csum_one_bio(BTRFS_I(inode), bio, offset, 1);
 }
 
@@ -7670,8 +7667,8 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
 		goto map;
 
 	if (write && async_submit) {
-		ret = btrfs_wq_submit_bio(fs_info, bio, 0, 0,
-					  file_offset, inode,
+		ret = btrfs_wq_submit_bio(inode, bio, 0, 0,
+					  file_offset,
 					  btrfs_submit_bio_start_direct_io);
 		goto err;
 	} else if (write) {
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 08/68] btrfs: inode: sink parameter @start and @len for check_data_csum()
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (6 preceding siblings ...)
  2020-10-21  6:24 ` [PATCH v4 07/68] btrfs: disk-io: replace @fs_info and @private_data with @inode for btrfs_wq_submit_bio() Qu Wenruo
@ 2020-10-21  6:24 ` Qu Wenruo
  2020-10-21 22:11   ` Goldwyn Rodrigues
  2020-10-27  0:13   ` David Sterba
  2020-10-21  6:24 ` [PATCH v4 09/68] btrfs: extent_io: unexport extent_invalidatepage() Qu Wenruo
                   ` (61 subsequent siblings)
  69 siblings, 2 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:24 UTC (permalink / raw)
  To: linux-btrfs

For check_data_csum(), the page we're using comes directly from the
inode mapping, thus it has a valid page_offset().

We can use (page_offset() + pgoff) to replace the @start parameter
completely, while @len should always be sectorsize.

While at it, also add some comments, as there is quite some confusion
around words like start/offset without explaining whether they mean
a file offset or a logical bytenr.

This should not affect the existing behavior, as for the current
sectorsize == PAGE_SIZE case, @pgoff is always 0 and @len is always
PAGE_SIZE (or sectorsize from the dio read path).

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2a56d3b8eff4..24fbf2c46e56 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2791,17 +2791,30 @@ void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start,
 	btrfs_queue_work(wq, &ordered_extent->work);
 }
 
+/*
+ * Verify the checksum of one sector of uncompressed data.
+ *
+ * @inode:	The inode.
+ * @io_bio:	The btrfs_io_bio which contains the csum.
+ * @icsum:	The csum offset (by number of sectors).
+ * @page:	The page where the data to be verified is.
+ * @pgoff:	The offset inside the page.
+ *
+ * The length of such check is always one sector size.
+ */
 static int check_data_csum(struct inode *inode, struct btrfs_io_bio *io_bio,
-			   int icsum, struct page *page, int pgoff, u64 start,
-			   size_t len)
+			   int icsum, struct page *page, int pgoff)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
 	char *kaddr;
+	u32 len = fs_info->sectorsize;
 	u16 csum_size = btrfs_super_csum_size(fs_info->super_copy);
 	u8 *csum_expected;
 	u8 csum[BTRFS_CSUM_SIZE];
 
+	ASSERT(pgoff + len <= PAGE_SIZE);
+
 	csum_expected = ((u8 *)io_bio->csum) + icsum * csum_size;
 
 	kaddr = kmap_atomic(page);
@@ -2815,8 +2828,8 @@ static int check_data_csum(struct inode *inode, struct btrfs_io_bio *io_bio,
 	kunmap_atomic(kaddr);
 	return 0;
 zeroit:
-	btrfs_print_data_csum_error(BTRFS_I(inode), start, csum, csum_expected,
-				    io_bio->mirror_num);
+	btrfs_print_data_csum_error(BTRFS_I(inode), page_offset(page) + pgoff,
+				    csum, csum_expected, io_bio->mirror_num);
 	if (io_bio->device)
 		btrfs_dev_stat_inc_and_print(io_bio->device,
 					     BTRFS_DEV_STAT_CORRUPTION_ERRS);
@@ -2855,8 +2868,7 @@ static int btrfs_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
 	}
 
 	phy_offset >>= inode->i_sb->s_blocksize_bits;
-	return check_data_csum(inode, io_bio, phy_offset, page, offset, start,
-			       (size_t)(end - start + 1));
+	return check_data_csum(inode, io_bio, phy_offset, page, offset);
 }
 
 /*
@@ -7542,8 +7554,7 @@ static blk_status_t btrfs_check_read_dio_bio(struct inode *inode,
 			ASSERT(pgoff < PAGE_SIZE);
 			if (uptodate &&
 			    (!csum || !check_data_csum(inode, io_bio, icsum,
-						       bvec.bv_page, pgoff,
-						       start, sectorsize))) {
+						       bvec.bv_page, pgoff))) {
 				clean_io_failure(fs_info, failure_tree, io_tree,
 						 start, bvec.bv_page,
 						 btrfs_ino(BTRFS_I(inode)),
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 09/68] btrfs: extent_io: unexport extent_invalidatepage()
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (7 preceding siblings ...)
  2020-10-21  6:24 ` [PATCH v4 08/68] btrfs: inode: sink parameter @start and @len for check_data_csum() Qu Wenruo
@ 2020-10-21  6:24 ` Qu Wenruo
  2020-10-27  0:24   ` David Sterba
  2020-10-21  6:24 ` [PATCH v4 10/68] btrfs: extent_io: remove the forward declaration and rename __process_pages_contig Qu Wenruo
                   ` (60 subsequent siblings)
  69 siblings, 1 reply; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:24 UTC (permalink / raw)
  To: linux-btrfs

Function extent_invalidatepage() has a single caller,
btree_invalidatepage().

Just unexport this function and move it to disk-io.c.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c        | 23 +++++++++++++++++++++++
 fs/btrfs/extent-io-tree.h |  2 --
 fs/btrfs/extent_io.c      | 24 ------------------------
 3 files changed, 23 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b7436ab7bba9..c81b7e53149c 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -966,6 +966,29 @@ static int btree_releasepage(struct page *page, gfp_t gfp_flags)
 	return try_release_extent_buffer(page);
 }
 
+/*
+ * basic invalidatepage code, this waits on any locked or writeback
+ * ranges corresponding to the page, and then deletes any extent state
+ * records from the tree
+ */
+static void extent_invalidatepage(struct extent_io_tree *tree,
+				  struct page *page, unsigned long offset)
+{
+	struct extent_state *cached_state = NULL;
+	u64 start = page_offset(page);
+	u64 end = start + PAGE_SIZE - 1;
+	size_t blocksize = page->mapping->host->i_sb->s_blocksize;
+
+	start += ALIGN(offset, blocksize);
+	if (start > end)
+		return;
+
+	lock_extent_bits(tree, start, end, &cached_state);
+	wait_on_page_writeback(page);
+	clear_extent_bit(tree, start, end, EXTENT_LOCKED | EXTENT_DELALLOC |
+			 EXTENT_DO_ACCOUNTING, 1, 1, &cached_state);
+}
+
 static void btree_invalidatepage(struct page *page, unsigned int offset,
 				 unsigned int length)
 {
diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
index 92caa1190ca8..3aaf83376797 100644
--- a/fs/btrfs/extent-io-tree.h
+++ b/fs/btrfs/extent-io-tree.h
@@ -227,8 +227,6 @@ void find_first_clear_extent_bit(struct extent_io_tree *tree, u64 start,
 				 u64 *start_ret, u64 *end_ret, unsigned bits);
 int find_contiguous_extent_bit(struct extent_io_tree *tree, u64 start,
 			       u64 *start_ret, u64 *end_ret, unsigned bits);
-int extent_invalidatepage(struct extent_io_tree *tree,
-			  struct page *page, unsigned long offset);
 bool btrfs_find_delalloc_range(struct extent_io_tree *tree, u64 *start,
 			       u64 *end, u64 max_bytes,
 			       struct extent_state **cached_state);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index ca219c42ddc6..3f95c67f0c92 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4409,30 +4409,6 @@ void extent_readahead(struct readahead_control *rac)
 	}
 }
 
-/*
- * basic invalidatepage code, this waits on any locked or writeback
- * ranges corresponding to the page, and then deletes any extent state
- * records from the tree
- */
-int extent_invalidatepage(struct extent_io_tree *tree,
-			  struct page *page, unsigned long offset)
-{
-	struct extent_state *cached_state = NULL;
-	u64 start = page_offset(page);
-	u64 end = start + PAGE_SIZE - 1;
-	size_t blocksize = page->mapping->host->i_sb->s_blocksize;
-
-	start += ALIGN(offset, blocksize);
-	if (start > end)
-		return 0;
-
-	lock_extent_bits(tree, start, end, &cached_state);
-	wait_on_page_writeback(page);
-	clear_extent_bit(tree, start, end, EXTENT_LOCKED | EXTENT_DELALLOC |
-			 EXTENT_DO_ACCOUNTING, 1, 1, &cached_state);
-	return 0;
-}
-
 /*
  * a helper for releasepage, this tests for areas of the page that
  * are locked or under IO and drops the related state bits if it is safe
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 10/68] btrfs: extent_io: remove the forward declaration and rename __process_pages_contig
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (8 preceding siblings ...)
  2020-10-21  6:24 ` [PATCH v4 09/68] btrfs: extent_io: unexport extent_invalidatepage() Qu Wenruo
@ 2020-10-21  6:24 ` Qu Wenruo
  2020-10-27  0:28   ` David Sterba
  2020-10-21  6:24 ` [PATCH v4 11/68] btrfs: extent_io: rename pages_locked in process_pages_contig() Qu Wenruo
                   ` (59 subsequent siblings)
  69 siblings, 1 reply; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:24 UTC (permalink / raw)
  To: linux-btrfs

There is no need for the forward declaration of
__process_pages_contig(); just move the function before its first
caller.

While at it, also remove the "__" prefix, since it carries no special
meaning.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 180 +++++++++++++++++++++++--------------------
 1 file changed, 95 insertions(+), 85 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 3f95c67f0c92..d5e03977c9c8 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1814,10 +1814,98 @@ bool btrfs_find_delalloc_range(struct extent_io_tree *tree, u64 *start,
 	return found;
 }
 
-static int __process_pages_contig(struct address_space *mapping,
-				  struct page *locked_page,
-				  pgoff_t start_index, pgoff_t end_index,
-				  unsigned long page_ops, pgoff_t *index_ret);
+/*
+ * A helper to update contiguous pages status according to @page_ops.
+ *
+ * @mapping:		The address space of the pages
+ * @locked_page:	The already locked page. Mostly for inline extent
+ * 			handling
+ * @start_index:	The start page index.
+ * @end_index:		The last page index.
+ * @page_ops:		The operations to be done
+ * @index_ret:		The last handled page index (for error case)
+ *
+ * Return 0 if every page is handled properly.
+ * Return <0 if something wrong happened, and update @index_ret.
+ */
+static int process_pages_contig(struct address_space *mapping,
+				struct page *locked_page,
+				pgoff_t start_index, pgoff_t end_index,
+				unsigned long page_ops, pgoff_t *index_ret)
+{
+	unsigned long nr_pages = end_index - start_index + 1;
+	unsigned long pages_locked = 0;
+	pgoff_t index = start_index;
+	struct page *pages[16];
+	unsigned ret;
+	int err = 0;
+	int i;
+
+	if (page_ops & PAGE_LOCK) {
+		ASSERT(page_ops == PAGE_LOCK);
+		ASSERT(index_ret && *index_ret == start_index);
+	}
+
+	if ((page_ops & PAGE_SET_ERROR) && nr_pages > 0)
+		mapping_set_error(mapping, -EIO);
+
+	while (nr_pages > 0) {
+		ret = find_get_pages_contig(mapping, index,
+				     min_t(unsigned long,
+				     nr_pages, ARRAY_SIZE(pages)), pages);
+		if (ret == 0) {
+			/*
+			 * Only if we're going to lock these pages,
+			 * can we find nothing at @index.
+			 */
+			ASSERT(page_ops & PAGE_LOCK);
+			err = -EAGAIN;
+			goto out;
+		}
+
+		for (i = 0; i < ret; i++) {
+			if (page_ops & PAGE_SET_PRIVATE2)
+				SetPagePrivate2(pages[i]);
+
+			if (locked_page && pages[i] == locked_page) {
+				put_page(pages[i]);
+				pages_locked++;
+				continue;
+			}
+			if (page_ops & PAGE_CLEAR_DIRTY)
+				clear_page_dirty_for_io(pages[i]);
+			if (page_ops & PAGE_SET_WRITEBACK)
+				set_page_writeback(pages[i]);
+			if (page_ops & PAGE_SET_ERROR)
+				SetPageError(pages[i]);
+			if (page_ops & PAGE_END_WRITEBACK)
+				end_page_writeback(pages[i]);
+			if (page_ops & PAGE_UNLOCK)
+				unlock_page(pages[i]);
+			if (page_ops & PAGE_LOCK) {
+				lock_page(pages[i]);
+				if (!PageDirty(pages[i]) ||
+				    pages[i]->mapping != mapping) {
+					unlock_page(pages[i]);
+					for (; i < ret; i++)
+						put_page(pages[i]);
+					err = -EAGAIN;
+					goto out;
+				}
+			}
+			put_page(pages[i]);
+			pages_locked++;
+		}
+		nr_pages -= ret;
+		index += ret;
+		cond_resched();
+	}
+out:
+	if (err && index_ret)
+		*index_ret = start_index + pages_locked - 1;
+	return err;
+}
+
 
 static noinline void __unlock_for_delalloc(struct inode *inode,
 					   struct page *locked_page,
@@ -1830,7 +1918,7 @@ static noinline void __unlock_for_delalloc(struct inode *inode,
 	if (index == locked_page->index && end_index == index)
 		return;
 
-	__process_pages_contig(inode->i_mapping, locked_page, index, end_index,
+	process_pages_contig(inode->i_mapping, locked_page, index, end_index,
 			       PAGE_UNLOCK, NULL);
 }
 
@@ -1848,7 +1936,7 @@ static noinline int lock_delalloc_pages(struct inode *inode,
 	if (index == locked_page->index && index == end_index)
 		return 0;
 
-	ret = __process_pages_contig(inode->i_mapping, locked_page, index,
+	ret = process_pages_contig(inode->i_mapping, locked_page, index,
 				     end_index, PAGE_LOCK, &index_ret);
 	if (ret == -EAGAIN)
 		__unlock_for_delalloc(inode, locked_page, delalloc_start,
@@ -1945,84 +2033,6 @@ noinline_for_stack bool find_lock_delalloc_range(struct inode *inode,
 	return found;
 }
 
-static int __process_pages_contig(struct address_space *mapping,
-				  struct page *locked_page,
-				  pgoff_t start_index, pgoff_t end_index,
-				  unsigned long page_ops, pgoff_t *index_ret)
-{
-	unsigned long nr_pages = end_index - start_index + 1;
-	unsigned long pages_locked = 0;
-	pgoff_t index = start_index;
-	struct page *pages[16];
-	unsigned ret;
-	int err = 0;
-	int i;
-
-	if (page_ops & PAGE_LOCK) {
-		ASSERT(page_ops == PAGE_LOCK);
-		ASSERT(index_ret && *index_ret == start_index);
-	}
-
-	if ((page_ops & PAGE_SET_ERROR) && nr_pages > 0)
-		mapping_set_error(mapping, -EIO);
-
-	while (nr_pages > 0) {
-		ret = find_get_pages_contig(mapping, index,
-				     min_t(unsigned long,
-				     nr_pages, ARRAY_SIZE(pages)), pages);
-		if (ret == 0) {
-			/*
-			 * Only if we're going to lock these pages,
-			 * can we find nothing at @index.
-			 */
-			ASSERT(page_ops & PAGE_LOCK);
-			err = -EAGAIN;
-			goto out;
-		}
-
-		for (i = 0; i < ret; i++) {
-			if (page_ops & PAGE_SET_PRIVATE2)
-				SetPagePrivate2(pages[i]);
-
-			if (locked_page && pages[i] == locked_page) {
-				put_page(pages[i]);
-				pages_locked++;
-				continue;
-			}
-			if (page_ops & PAGE_CLEAR_DIRTY)
-				clear_page_dirty_for_io(pages[i]);
-			if (page_ops & PAGE_SET_WRITEBACK)
-				set_page_writeback(pages[i]);
-			if (page_ops & PAGE_SET_ERROR)
-				SetPageError(pages[i]);
-			if (page_ops & PAGE_END_WRITEBACK)
-				end_page_writeback(pages[i]);
-			if (page_ops & PAGE_UNLOCK)
-				unlock_page(pages[i]);
-			if (page_ops & PAGE_LOCK) {
-				lock_page(pages[i]);
-				if (!PageDirty(pages[i]) ||
-				    pages[i]->mapping != mapping) {
-					unlock_page(pages[i]);
-					for (; i < ret; i++)
-						put_page(pages[i]);
-					err = -EAGAIN;
-					goto out;
-				}
-			}
-			put_page(pages[i]);
-			pages_locked++;
-		}
-		nr_pages -= ret;
-		index += ret;
-		cond_resched();
-	}
-out:
-	if (err && index_ret)
-		*index_ret = start_index + pages_locked - 1;
-	return err;
-}
-
 void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
 				  struct page *locked_page,
 				  unsigned clear_bits,
@@ -2030,7 +2040,7 @@ void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
 {
 	clear_extent_bit(&inode->io_tree, start, end, clear_bits, 1, 0, NULL);
 
-	__process_pages_contig(inode->vfs_inode.i_mapping, locked_page,
+	process_pages_contig(inode->vfs_inode.i_mapping, locked_page,
 			       start >> PAGE_SHIFT, end >> PAGE_SHIFT,
 			       page_ops, NULL);
 }
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 11/68] btrfs: extent_io: rename pages_locked in process_pages_contig()
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (9 preceding siblings ...)
  2020-10-21  6:24 ` [PATCH v4 10/68] btrfs: extent_io: remove the forward declaration and rename __process_pages_contig Qu Wenruo
@ 2020-10-21  6:24 ` Qu Wenruo
  2020-10-21  6:24 ` [PATCH v4 12/68] btrfs: extent_io: only require sector size alignment for page read Qu Wenruo
                   ` (58 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:24 UTC (permalink / raw)
  To: linux-btrfs

Function process_pages_contig() handles not only page locking but also
other operations.

So rename the local variable pages_locked to pages_processed to reduce
confusion.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d5e03977c9c8..f20b8e886724 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1834,7 +1834,7 @@ static int process_pages_contig(struct address_space *mapping,
 				unsigned long page_ops, pgoff_t *index_ret)
 {
 	unsigned long nr_pages = end_index - start_index + 1;
-	unsigned long pages_locked = 0;
+	unsigned long pages_processed = 0;
 	pgoff_t index = start_index;
 	struct page *pages[16];
 	unsigned ret;
@@ -1869,7 +1869,7 @@ static int process_pages_contig(struct address_space *mapping,
 
 			if (locked_page && pages[i] == locked_page) {
 				put_page(pages[i]);
-				pages_locked++;
+				pages_processed++;
 				continue;
 			}
 			if (page_ops & PAGE_CLEAR_DIRTY)
@@ -1894,7 +1894,7 @@ static int process_pages_contig(struct address_space *mapping,
 				}
 			}
 			put_page(pages[i]);
-			pages_locked++;
+			pages_processed++;
 		}
 		nr_pages -= ret;
 		index += ret;
@@ -1902,7 +1902,7 @@ static int process_pages_contig(struct address_space *mapping,
 	}
 out:
 	if (err && index_ret)
-		*index_ret = start_index + pages_locked - 1;
+		*index_ret = start_index + pages_processed - 1;
 	return err;
 }
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 12/68] btrfs: extent_io: only require sector size alignment for page read
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (10 preceding siblings ...)
  2020-10-21  6:24 ` [PATCH v4 11/68] btrfs: extent_io: rename pages_locked in process_pages_contig() Qu Wenruo
@ 2020-10-21  6:24 ` Qu Wenruo
  2020-10-21  6:24 ` [PATCH v4 13/68] btrfs: extent_io: remove the extent_start/extent_len for end_bio_extent_readpage() Qu Wenruo
                   ` (57 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:24 UTC (permalink / raw)
  To: linux-btrfs

If we're reading a partial page, btrfs will warn about it, as our
reads/writes are always done in sector size, which currently equals the
page size.

But for the incoming subpage RO support, our data reads are only
aligned to sectorsize, which can be smaller than the page size.

Thus change the warning condition to check against sectorsize. The
behavior is unchanged for the regular sectorsize == PAGE_SIZE case, and
no error is reported for subpage reads.

Also, pass the proper start/end, derived from bv_offset, for
check_data_csum() to handle.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 34 ++++++++++++++++++----------------
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index f20b8e886724..ce5b23169e47 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2834,6 +2834,7 @@ static void end_bio_extent_readpage(struct bio *bio)
 		struct page *page = bvec->bv_page;
 		struct inode *inode = page->mapping->host;
 		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+		u32 sectorsize = fs_info->sectorsize;
 		bool data_inode = btrfs_ino(BTRFS_I(inode))
 			!= BTRFS_BTREE_INODE_OBJECTID;
 
@@ -2844,24 +2845,25 @@ static void end_bio_extent_readpage(struct bio *bio)
 		tree = &BTRFS_I(inode)->io_tree;
 		failure_tree = &BTRFS_I(inode)->io_failure_tree;
 
-		/* We always issue full-page reads, but if some block
+		/*
+		 * We always issue full-sector reads, but if some block
 		 * in a page fails to read, blk_update_request() will
 		 * advance bv_offset and adjust bv_len to compensate.
-		 * Print a warning for nonzero offsets, and an error
-		 * if they don't add up to a full page.  */
-		if (bvec->bv_offset || bvec->bv_len != PAGE_SIZE) {
-			if (bvec->bv_offset + bvec->bv_len != PAGE_SIZE)
-				btrfs_err(fs_info,
-					"partial page read in btrfs with offset %u and length %u",
-					bvec->bv_offset, bvec->bv_len);
-			else
-				btrfs_info(fs_info,
-					"incomplete page read in btrfs with offset %u and length %u",
-					bvec->bv_offset, bvec->bv_len);
-		}
-
-		start = page_offset(page);
-		end = start + bvec->bv_offset + bvec->bv_len - 1;
+		 * Print a warning for unaligned offsets, and an error
+		 * if they don't add up to a full sector.
+		 */
+		if (!IS_ALIGNED(bvec->bv_offset, sectorsize))
+			btrfs_err(fs_info,
+		"partial page read in btrfs with offset %u and length %u",
+				  bvec->bv_offset, bvec->bv_len);
+		else if (!IS_ALIGNED(bvec->bv_offset + bvec->bv_len,
+				     sectorsize))
+			btrfs_info(fs_info,
+		"incomplete page read in btrfs with offset %u and length %u",
+				   bvec->bv_offset, bvec->bv_len);
+
+		start = page_offset(page) + bvec->bv_offset;
+		end = start + bvec->bv_len - 1;
 		len = bvec->bv_len;
 
 		mirror = io_bio->mirror_num;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 13/68] btrfs: extent_io: remove the extent_start/extent_len for end_bio_extent_readpage()
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (11 preceding siblings ...)
  2020-10-21  6:24 ` [PATCH v4 12/68] btrfs: extent_io: only require sector size alignment for page read Qu Wenruo
@ 2020-10-21  6:24 ` Qu Wenruo
  2020-10-27 10:29   ` David Sterba
  2020-10-21  6:25 ` [PATCH v4 14/68] btrfs: extent_io: integrate page status update into endio_readpage_release_extent() Qu Wenruo
                   ` (56 subsequent siblings)
  69 siblings, 1 reply; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:24 UTC (permalink / raw)
  To: linux-btrfs

In end_bio_extent_readpage() we had a strange dance around
extent_start/extent_len.

The truth is, no matter what we do with those two variables, the end
result is the same: clear the EXTENT_LOCKED bit and, if needed, set the
EXTENT_UPTODATE bit in the io_tree.

This doesn't need the complex dance; we can do it easily by just
calling endio_readpage_release_extent() for each bvec.

This greatly streamlines the code.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 30 ++----------------------------
 1 file changed, 2 insertions(+), 28 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index ce5b23169e47..3819bf7505e3 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2791,11 +2791,10 @@ static void end_bio_extent_writepage(struct bio *bio)
 }
 
 static void
-endio_readpage_release_extent(struct extent_io_tree *tree, u64 start, u64 len,
+endio_readpage_release_extent(struct extent_io_tree *tree, u64 start, u64 end,
 			      int uptodate)
 {
 	struct extent_state *cached = NULL;
-	u64 end = start + len - 1;
 
 	if (uptodate && tree->track_uptodate)
 		set_extent_uptodate(tree, start, end, &cached, GFP_ATOMIC);
@@ -2823,8 +2822,6 @@ static void end_bio_extent_readpage(struct bio *bio)
 	u64 start;
 	u64 end;
 	u64 len;
-	u64 extent_start = 0;
-	u64 extent_len = 0;
 	int mirror;
 	int ret;
 	struct bvec_iter_all iter_all;
@@ -2932,32 +2929,9 @@ static void end_bio_extent_readpage(struct bio *bio)
 		unlock_page(page);
 		offset += len;
 
-		if (unlikely(!uptodate)) {
-			if (extent_len) {
-				endio_readpage_release_extent(tree,
-							      extent_start,
-							      extent_len, 1);
-				extent_start = 0;
-				extent_len = 0;
-			}
-			endio_readpage_release_extent(tree, start,
-						      end - start + 1, 0);
-		} else if (!extent_len) {
-			extent_start = start;
-			extent_len = end + 1 - start;
-		} else if (extent_start + extent_len == start) {
-			extent_len += end + 1 - start;
-		} else {
-			endio_readpage_release_extent(tree, extent_start,
-						      extent_len, uptodate);
-			extent_start = start;
-			extent_len = end + 1 - start;
-		}
+		endio_readpage_release_extent(tree, start, end, uptodate);
 	}
 
-	if (extent_len)
-		endio_readpage_release_extent(tree, extent_start, extent_len,
-					      uptodate);
 	btrfs_io_bio_free_csum(io_bio);
 	bio_put(bio);
 }
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 14/68] btrfs: extent_io: integrate page status update into endio_readpage_release_extent()
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (12 preceding siblings ...)
  2020-10-21  6:24 ` [PATCH v4 13/68] btrfs: extent_io: remove the extent_start/extent_len for end_bio_extent_readpage() Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 15/68] btrfs: extent_io: rename page_size to io_size in submit_extent_page() Qu Wenruo
                   ` (55 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

In end_bio_extent_readpage(), we set the page uptodate or error
according to the bio status.
However that assumes all submitted reads are page sized.

To support cases like subpage read, we should only set the whole page
uptodate if all data in the page has been read from disk.

This patch will integrate the page status update into
endio_readpage_release_extent() for end_bio_extent_readpage().

Now in endio_readpage_release_extent() we will set the page uptodate if
either:
- start/end covers the full page
  This is the existing behavior already.

- all the page range is already uptodate
  This adds the support for subpage read.

And for the error path, we always clear the page uptodate and set the
page error.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 39 +++++++++++++++++++++++++++++----------
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 3819bf7505e3..ec0f1fb01a0f 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2791,13 +2791,36 @@ static void end_bio_extent_writepage(struct bio *bio)
 }
 
 static void
-endio_readpage_release_extent(struct extent_io_tree *tree, u64 start, u64 end,
-			      int uptodate)
+endio_readpage_release_extent(struct extent_io_tree *tree, struct page *page,
+			      u64 start, u64 end, int uptodate)
 {
 	struct extent_state *cached = NULL;
 
-	if (uptodate && tree->track_uptodate)
-		set_extent_uptodate(tree, start, end, &cached, GFP_ATOMIC);
+	if (uptodate) {
+		u64 page_start = page_offset(page);
+		u64 page_end = page_offset(page) + PAGE_SIZE - 1;
+
+		if (tree->track_uptodate) {
+			/*
+			 * The tree has EXTENT_UPTODATE bit tracking, update
+			 * extent io tree, and use it to update the page if
+			 * needed.
+			 */
+			set_extent_uptodate(tree, start, end, &cached,
+					    GFP_NOFS);
+			check_page_uptodate(tree, page);
+		} else if (start <= page_start && end >= page_end) {
+			/* We have covered the full page, set it uptodate */
+			SetPageUptodate(page);
+		}
+	} else {
+		if (tree->track_uptodate)
+			clear_extent_uptodate(tree, start, end, &cached);
+
+		/* Any error in the page range would invalidate the uptodate bit */
+		ClearPageUptodate(page);
+		SetPageError(page);
+	}
 	unlock_extent_cached_atomic(tree, start, end, &cached);
 }
 
@@ -2921,15 +2944,11 @@ static void end_bio_extent_readpage(struct bio *bio)
 			off = offset_in_page(i_size);
 			if (page->index == end_index && off)
 				zero_user_segment(page, off, PAGE_SIZE);
-			SetPageUptodate(page);
-		} else {
-			ClearPageUptodate(page);
-			SetPageError(page);
 		}
-		unlock_page(page);
 		offset += len;
 
-		endio_readpage_release_extent(tree, start, end, uptodate);
+		endio_readpage_release_extent(tree, page, start, end, uptodate);
+		unlock_page(page);
 	}
 
 	btrfs_io_bio_free_csum(io_bio);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 15/68] btrfs: extent_io: rename page_size to io_size in submit_extent_page()
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (13 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 14/68] btrfs: extent_io: integrate page status update into endio_readpage_release_extent() Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 16/68] btrfs: extent_io: add assert_spin_locked() for attach_extent_buffer_page() Qu Wenruo
                   ` (54 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

The variable @page_size of submit_extent_page() is not bound to the
page size.

It can already be smaller than PAGE_SIZE, so rename it to io_size to
reduce confusion; this is especially important for later subpage
support.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index ec0f1fb01a0f..5842d3522865 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3047,7 +3047,7 @@ static int submit_extent_page(unsigned int opf,
 {
 	int ret = 0;
 	struct bio *bio;
-	size_t page_size = min_t(size_t, size, PAGE_SIZE);
+	size_t io_size = min_t(size_t, size, PAGE_SIZE);
 	sector_t sector = offset >> 9;
 	struct extent_io_tree *tree = &BTRFS_I(page->mapping->host)->io_tree;
 
@@ -3064,12 +3064,12 @@ static int submit_extent_page(unsigned int opf,
 			contig = bio_end_sector(bio) == sector;
 
 		ASSERT(tree->ops);
-		if (btrfs_bio_fits_in_stripe(page, page_size, bio, bio_flags))
+		if (btrfs_bio_fits_in_stripe(page, io_size, bio, bio_flags))
 			can_merge = false;
 
 		if (prev_bio_flags != bio_flags || !contig || !can_merge ||
 		    force_bio_submit ||
-		    bio_add_page(bio, page, page_size, pg_offset) < page_size) {
+		    bio_add_page(bio, page, io_size, pg_offset) < io_size) {
 			ret = submit_one_bio(bio, mirror_num, prev_bio_flags);
 			if (ret < 0) {
 				*bio_ret = NULL;
@@ -3078,13 +3078,13 @@ static int submit_extent_page(unsigned int opf,
 			bio = NULL;
 		} else {
 			if (wbc)
-				wbc_account_cgroup_owner(wbc, page, page_size);
+				wbc_account_cgroup_owner(wbc, page, io_size);
 			return 0;
 		}
 	}
 
 	bio = btrfs_bio_alloc(offset);
-	bio_add_page(bio, page, page_size, pg_offset);
+	bio_add_page(bio, page, io_size, pg_offset);
 	bio->bi_end_io = end_io_func;
 	bio->bi_private = tree;
 	bio->bi_write_hint = page->mapping->host->i_write_hint;
@@ -3095,7 +3095,7 @@ static int submit_extent_page(unsigned int opf,
 		bdev = BTRFS_I(page->mapping->host)->root->fs_info->fs_devices->latest_bdev;
 		bio_set_dev(bio, bdev);
 		wbc_init_bio(wbc, bio);
-		wbc_account_cgroup_owner(wbc, page, page_size);
+		wbc_account_cgroup_owner(wbc, page, io_size);
 	}
 
 	*bio_ret = bio;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 16/68] btrfs: extent_io: add assert_spin_locked() for attach_extent_buffer_page()
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (14 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 15/68] btrfs: extent_io: rename page_size to io_size in submit_extent_page() Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-27 10:43   ` David Sterba
  2020-10-21  6:25 ` [PATCH v4 17/68] btrfs: extent_io: extract the btree page submission code into its own helper function Qu Wenruo
                   ` (53 subsequent siblings)
  69 siblings, 1 reply; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

When calling attach_extent_buffer_page(), we're either attaching
anonymous pages, called from btrfs_clone_extent_buffer(), or attaching
btree_inode pages, called from alloc_extent_buffer().

For the latter case, we should have page->mapping->private_lock held to
avoid racing on page->private modifications.

Add assert_spin_locked() for the case where we're called from
alloc_extent_buffer().

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/extent_io.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 5842d3522865..8bf38948bd37 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3106,6 +3106,15 @@ static int submit_extent_page(unsigned int opf,
 static void attach_extent_buffer_page(struct extent_buffer *eb,
 				      struct page *page)
 {
+	/*
+	 * If the page is mapped to btree inode, we should hold the private
+	 * lock to prevent race.
+	 * For cloned or dummy extent buffers, their pages are not mapped and
+	 * will not race with any other ebs.
+	 */
+	if (page->mapping)
+		assert_spin_locked(&page->mapping->private_lock);
+
 	if (!PagePrivate(page))
 		attach_page_private(page, eb);
 	else
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 17/68] btrfs: extent_io: extract the btree page submission code into its own helper function
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (15 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 16/68] btrfs: extent_io: add assert_spin_locked() for attach_extent_buffer_page() Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 18/68] btrfs: extent_io: calculate inline extent buffer page size based on page size Qu Wenruo
                   ` (52 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

In btree_write_cache_pages() we have the btree page submission routine
buried deep inside a nested loop.

This patch extracts that part of the code into a helper function,
submit_btree_page(), to do the same work.

Also, since submit_btree_page() can now return >0 for successful extent
buffer submission, remove the "ASSERT(ret <= 0);" line.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 116 +++++++++++++++++++++++++------------------
 1 file changed, 69 insertions(+), 47 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 8bf38948bd37..0d5d0581af06 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3984,10 +3984,75 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
 	return ret;
 }
 
+/*
+ * A helper to submit a btree page.
+ *
+ * This function is not always submitting the page, as we only submit the full
+ * extent buffer in a batch.
+ *
+ * @page:	The btree page
+ * @prev_eb:	Previous extent buffer, to determine if we need to submit
+ * 		this page.
+ *
+ * Return >0 if we have submitted the extent buffer successfully.
+ * Return 0 if we don't need to do anything for the page.
+ * Return <0 for fatal error.
+ */
+static int submit_btree_page(struct page *page, struct writeback_control *wbc,
+			     struct extent_page_data *epd,
+			     struct extent_buffer **prev_eb)
+{
+	struct address_space *mapping = page->mapping;
+	struct extent_buffer *eb;
+	int ret;
+
+	if (!PagePrivate(page))
+		return 0;
+
+	spin_lock(&mapping->private_lock);
+	if (!PagePrivate(page)) {
+		spin_unlock(&mapping->private_lock);
+		return 0;
+	}
+
+	eb = (struct extent_buffer *)page->private;
+
+	/*
+	 * Shouldn't happen and normally this would be a BUG_ON but no sense
+	 * in crashing the users box for something we can survive anyway.
+	 */
+	if (WARN_ON(!eb)) {
+		spin_unlock(&mapping->private_lock);
+		return 0;
+	}
+
+	if (eb == *prev_eb) {
+		spin_unlock(&mapping->private_lock);
+		return 0;
+	}
+	ret = atomic_inc_not_zero(&eb->refs);
+	spin_unlock(&mapping->private_lock);
+	if (!ret)
+		return 0;
+
+	*prev_eb = eb;
+
+	ret = lock_extent_buffer_for_io(eb, epd);
+	if (ret <= 0) {
+		free_extent_buffer(eb);
+		return ret;
+	}
+	ret = write_one_eb(eb, wbc, epd);
+	free_extent_buffer(eb);
+	if (ret < 0)
+		return ret;
+	return 1;
+}
+
 int btree_write_cache_pages(struct address_space *mapping,
 				   struct writeback_control *wbc)
 {
-	struct extent_buffer *eb, *prev_eb = NULL;
+	struct extent_buffer *prev_eb = NULL;
 	struct extent_page_data epd = {
 		.bio = NULL,
 		.extent_locked = 0,
@@ -4033,55 +4098,13 @@ int btree_write_cache_pages(struct address_space *mapping,
 		for (i = 0; i < nr_pages; i++) {
 			struct page *page = pvec.pages[i];
 
-			if (!PagePrivate(page))
-				continue;
-
-			spin_lock(&mapping->private_lock);
-			if (!PagePrivate(page)) {
-				spin_unlock(&mapping->private_lock);
-				continue;
-			}
-
-			eb = (struct extent_buffer *)page->private;
-
-			/*
-			 * Shouldn't happen and normally this would be a BUG_ON
-			 * but no sense in crashing the users box for something
-			 * we can survive anyway.
-			 */
-			if (WARN_ON(!eb)) {
-				spin_unlock(&mapping->private_lock);
-				continue;
-			}
-
-			if (eb == prev_eb) {
-				spin_unlock(&mapping->private_lock);
-				continue;
-			}
-
-			ret = atomic_inc_not_zero(&eb->refs);
-			spin_unlock(&mapping->private_lock);
-			if (!ret)
-				continue;
-
-			prev_eb = eb;
-			ret = lock_extent_buffer_for_io(eb, &epd);
-			if (!ret) {
-				free_extent_buffer(eb);
+			ret = submit_btree_page(page, wbc, &epd, &prev_eb);
+			if (ret == 0)
 				continue;
-			} else if (ret < 0) {
-				done = 1;
-				free_extent_buffer(eb);
-				break;
-			}
-
-			ret = write_one_eb(eb, wbc, &epd);
-			if (ret) {
+			if (ret < 0) {
 				done = 1;
-				free_extent_buffer(eb);
 				break;
 			}
-			free_extent_buffer(eb);
 
 			/*
 			 * the filesystem may choose to bump up nr_to_write.
@@ -4102,7 +4125,6 @@ int btree_write_cache_pages(struct address_space *mapping,
 		index = 0;
 		goto retry;
 	}
-	ASSERT(ret <= 0);
 	if (ret < 0) {
 		end_write_bio(&epd, ret);
 		return ret;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 18/68] btrfs: extent_io: calculate inline extent buffer page size based on page size
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (16 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 17/68] btrfs: extent_io: extract the btree page submission code into its own helper function Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-27 11:16   ` David Sterba
  2020-10-21  6:25 ` [PATCH v4 19/68] btrfs: extent_io: make btrfs_fs_info::buffer_radix take sector size divided values Qu Wenruo
                   ` (51 subsequent siblings)
  69 siblings, 1 reply; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

Btrfs only supports 64K as the maximum node size, thus on a 4K page
system we have at most 16 pages for one extent buffer.

For a system using a 64K page size, we really have just one single
page.

Since we always use 16 slots for extent_buffer::pages[], this means on
systems using 64K pages we're wasting memory on the 15 slots that will
never be utilized.

So this patch changes how the extent_buffer::pages[] array size is
calculated: it is now derived from BTRFS_MAX_METADATA_BLOCKSIZE and
PAGE_SIZE.

For systems using a 4K page size, it stays at 16 pages.
For systems using a 64K page size, it becomes just 1 page.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 6 +++---
 fs/btrfs/extent_io.h | 8 +++++---
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 0d5d0581af06..6e33fa1645c3 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5020,9 +5020,9 @@ __alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start,
 	/*
 	 * Sanity checks, currently the maximum is 64k covered by 16x 4k pages
 	 */
-	BUILD_BUG_ON(BTRFS_MAX_METADATA_BLOCKSIZE
-		> MAX_INLINE_EXTENT_BUFFER_SIZE);
-	BUG_ON(len > MAX_INLINE_EXTENT_BUFFER_SIZE);
+	BUILD_BUG_ON(BTRFS_MAX_METADATA_BLOCKSIZE >
+		     INLINE_EXTENT_BUFFER_PAGES * PAGE_SIZE);
+	BUG_ON(len > BTRFS_MAX_METADATA_BLOCKSIZE);
 
 #ifdef CONFIG_BTRFS_DEBUG
 	eb->spinning_writers = 0;
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 3c9252b429e0..e588b3100ede 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -85,9 +85,11 @@ struct extent_io_ops {
 				    int mirror);
 };
 
-
-#define INLINE_EXTENT_BUFFER_PAGES 16
-#define MAX_INLINE_EXTENT_BUFFER_SIZE (INLINE_EXTENT_BUFFER_PAGES * PAGE_SIZE)
+/*
+ * The SZ_64K is BTRFS_MAX_METADATA_BLOCKSIZE, open-coded here just to
+ * avoid a circular inclusion of "ctree.h".
+ */
+#define INLINE_EXTENT_BUFFER_PAGES (SZ_64K / PAGE_SIZE)
 struct extent_buffer {
 	u64 start;
 	unsigned long len;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 19/68] btrfs: extent_io: make btrfs_fs_info::buffer_radix take sector size divided values
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (17 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 18/68] btrfs: extent_io: calculate inline extent buffer page size based on page size Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 20/68] btrfs: extent_io: sink less common parameters for __set_extent_bit() Qu Wenruo
                   ` (50 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

For subpage sector size support, one page can contain multiple tree
blocks, thus we can no longer use (eb->start >> PAGE_SHIFT), or we
could easily get an extent buffer that doesn't belong to the bytenr.

This patch uses (extent_buffer::start / sectorsize) as the index for the
radix tree, so that we can get the correct extent buffer for subpage
support, while keeping the behavior unchanged for regular sector sizes.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/extent_io.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 6e33fa1645c3..4ac315d8753f 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5158,7 +5158,7 @@ struct extent_buffer *find_extent_buffer(struct btrfs_fs_info *fs_info,
 
 	rcu_read_lock();
 	eb = radix_tree_lookup(&fs_info->buffer_radix,
-			       start >> PAGE_SHIFT);
+			       start / fs_info->sectorsize);
 	if (eb && atomic_inc_not_zero(&eb->refs)) {
 		rcu_read_unlock();
 		/*
@@ -5210,7 +5210,7 @@ struct extent_buffer *alloc_test_extent_buffer(struct btrfs_fs_info *fs_info,
 	}
 	spin_lock(&fs_info->buffer_lock);
 	ret = radix_tree_insert(&fs_info->buffer_radix,
-				start >> PAGE_SHIFT, eb);
+				start / fs_info->sectorsize, eb);
 	spin_unlock(&fs_info->buffer_lock);
 	radix_tree_preload_end();
 	if (ret == -EEXIST) {
@@ -5318,7 +5318,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 
 	spin_lock(&fs_info->buffer_lock);
 	ret = radix_tree_insert(&fs_info->buffer_radix,
-				start >> PAGE_SHIFT, eb);
+				start / fs_info->sectorsize, eb);
 	spin_unlock(&fs_info->buffer_lock);
 	radix_tree_preload_end();
 	if (ret == -EEXIST) {
@@ -5374,7 +5374,7 @@ static int release_extent_buffer(struct extent_buffer *eb)
 
 			spin_lock(&fs_info->buffer_lock);
 			radix_tree_delete(&fs_info->buffer_radix,
-					  eb->start >> PAGE_SHIFT);
+					  eb->start / fs_info->sectorsize);
 			spin_unlock(&fs_info->buffer_lock);
 		} else {
 			spin_unlock(&eb->refs_lock);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 20/68] btrfs: extent_io: sink less common parameters for __set_extent_bit()
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (18 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 19/68] btrfs: extent_io: make btrfs_fs_info::buffer_radix take sector size divided values Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 21/68] btrfs: extent_io: sink less common parameters for __clear_extent_bit() Qu Wenruo
                   ` (49 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

For __set_extent_bit(), these parameters are less common for most
callers:
- exclusive_bits
- failed_start
  Paired together for EXTENT_LOCKED usage.

- extent_changeset
  For qgroup usage.

As a common design principle, less common parameters should have
default values, and only callers that really need them should set them
to non-default values.

Sink those parameters into a new structure, extent_io_extra_options.
This way most callers won't need to bother with the less used
parameters, and later expansion becomes easier.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent-io-tree.h | 22 ++++++++++++++
 fs/btrfs/extent_io.c      | 61 ++++++++++++++++++++++++---------------
 2 files changed, 59 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
index 3aaf83376797..dfbb65ac9c8c 100644
--- a/fs/btrfs/extent-io-tree.h
+++ b/fs/btrfs/extent-io-tree.h
@@ -82,6 +82,28 @@ struct extent_state {
 #endif
 };
 
+/*
+ * Extra options for extent io tree operations.
+ *
+ * All of these options are initialized to 0/false/NULL by default,
+ * and most callers should utilize the wrappers other than the extra options.
+ */
+struct extent_io_extra_options {
+	/*
+	 * For __set_extent_bit(), to return -EEXIST when hitting an extent
+	 * with @excl_bits set, and update @excl_failed_start.
+	 * Utilized by EXTENT_LOCKED wrappers.
+	 */
+	u32 excl_bits;
+	u64 excl_failed_start;
+
+	/*
+	 * For __set/__clear_extent_bit() to record how many bytes are modified.
+	 * For qgroup related functions.
+	 */
+	struct extent_changeset *changeset;
+};
+
 int __init extent_state_cache_init(void);
 void __cold extent_state_cache_exit(void);
 
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 4ac315d8753f..5f899b27962b 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -29,6 +29,7 @@ static struct kmem_cache *extent_state_cache;
 static struct kmem_cache *extent_buffer_cache;
 static struct bio_set btrfs_bioset;
 
+static struct extent_io_extra_options default_opts = { 0 };
 static inline bool extent_state_in_tree(const struct extent_state *state)
 {
 	return !RB_EMPTY_NODE(&state->rb_node);
@@ -952,10 +953,10 @@ static void cache_state(struct extent_state *state,
 }
 
 /*
- * set some bits on a range in the tree.  This may require allocations or
+ * Set some bits on a range in the tree.  This may require allocations or
  * sleeping, so the gfp mask is used to indicate what is allowed.
  *
- * If any of the exclusive bits are set, this will fail with -EEXIST if some
+ * If *any* of the exclusive bits are set, this will fail with -EEXIST if some
  * part of the range already has the desired bits set.  The start of the
  * existing range is returned in failed_start in this case.
  *
@@ -964,26 +965,30 @@ static void cache_state(struct extent_state *state,
 
 static int __must_check
 __set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		 unsigned bits, unsigned exclusive_bits,
-		 u64 *failed_start, struct extent_state **cached_state,
-		 gfp_t mask, struct extent_changeset *changeset)
+		 unsigned bits, struct extent_state **cached_state,
+		 gfp_t mask, struct extent_io_extra_options *extra_opts)
 {
 	struct extent_state *state;
 	struct extent_state *prealloc = NULL;
 	struct rb_node *node;
 	struct rb_node **p;
 	struct rb_node *parent;
+	struct extent_changeset *changeset;
 	int err = 0;
+	u32 exclusive_bits;
+	u64 *failed_start;
 	u64 last_start;
 	u64 last_end;
 
 	btrfs_debug_check_extent_io_range(tree, start, end);
 	trace_btrfs_set_extent_bit(tree, start, end - start + 1, bits);
 
-	if (exclusive_bits)
-		ASSERT(failed_start);
-	else
-		ASSERT(!failed_start);
+	if (!extra_opts)
+		extra_opts = &default_opts;
+	exclusive_bits = extra_opts->excl_bits;
+	failed_start = &extra_opts->excl_failed_start;
+	changeset = extra_opts->changeset;
+
 again:
 	if (!prealloc && gfpflags_allow_blocking(mask)) {
 		/*
@@ -1187,7 +1192,7 @@ int set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 		   unsigned bits, struct extent_state **cached_state,
 		   gfp_t mask)
 {
-	return __set_extent_bit(tree, start, end, bits, 0, NULL, cached_state,
+	return __set_extent_bit(tree, start, end, bits, cached_state,
 			        mask, NULL);
 }
 
@@ -1414,6 +1419,10 @@ int convert_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 int set_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
 			   unsigned bits, struct extent_changeset *changeset)
 {
+	struct extent_io_extra_options extra_opts = {
+		.changeset = changeset,
+	};
+
 	/*
 	 * We don't support EXTENT_LOCKED yet, as current changeset will
 	 * record any bits changed, so for EXTENT_LOCKED case, it will
@@ -1422,15 +1431,14 @@ int set_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
 	 */
 	BUG_ON(bits & EXTENT_LOCKED);
 
-	return __set_extent_bit(tree, start, end, bits, 0, NULL, NULL, GFP_NOFS,
-				changeset);
+	return __set_extent_bit(tree, start, end, bits, NULL, GFP_NOFS,
+				&extra_opts);
 }
 
 int set_extent_bits_nowait(struct extent_io_tree *tree, u64 start, u64 end,
 			   unsigned bits)
 {
-	return __set_extent_bit(tree, start, end, bits, 0, NULL, NULL,
-				GFP_NOWAIT, NULL);
+	return __set_extent_bit(tree, start, end, bits, NULL, GFP_NOWAIT, NULL);
 }
 
 int clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
@@ -1461,16 +1469,18 @@ int clear_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
 int lock_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
 		     struct extent_state **cached_state)
 {
+	struct extent_io_extra_options extra_opts = {
+		.excl_bits = EXTENT_LOCKED,
+	};
 	int err;
-	u64 failed_start;
 
 	while (1) {
 		err = __set_extent_bit(tree, start, end, EXTENT_LOCKED,
-				       EXTENT_LOCKED, &failed_start,
-				       cached_state, GFP_NOFS, NULL);
+				       cached_state, GFP_NOFS, &extra_opts);
 		if (err == -EEXIST) {
-			wait_extent_bit(tree, failed_start, end, EXTENT_LOCKED);
-			start = failed_start;
+			wait_extent_bit(tree, extra_opts.excl_failed_start, end,
+					EXTENT_LOCKED);
+			start = extra_opts.excl_failed_start;
 		} else
 			break;
 		WARN_ON(start > end);
@@ -1480,14 +1490,17 @@ int lock_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
 
 int try_lock_extent(struct extent_io_tree *tree, u64 start, u64 end)
 {
+	struct extent_io_extra_options extra_opts = {
+		.excl_bits = EXTENT_LOCKED,
+	};
 	int err;
-	u64 failed_start;
 
-	err = __set_extent_bit(tree, start, end, EXTENT_LOCKED, EXTENT_LOCKED,
-			       &failed_start, NULL, GFP_NOFS, NULL);
+	err = __set_extent_bit(tree, start, end, EXTENT_LOCKED,
+			       NULL, GFP_NOFS, &extra_opts);
 	if (err == -EEXIST) {
-		if (failed_start > start)
-			clear_extent_bit(tree, start, failed_start - 1,
+		if (extra_opts.excl_failed_start > start)
+			clear_extent_bit(tree, start,
+					 extra_opts.excl_failed_start - 1,
 					 EXTENT_LOCKED, 1, 0, NULL);
 		return 0;
 	}
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 21/68] btrfs: extent_io: sink less common parameters for __clear_extent_bit()
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (19 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 20/68] btrfs: extent_io: sink less common parameters for __set_extent_bit() Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 22/68] btrfs: disk_io: grab fs_info from extent_buffer::fs_info directly for btrfs_mark_buffer_dirty() Qu Wenruo
                   ` (48 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

The following parameters are less commonly used for
__clear_extent_bit():
- wake
  To wake up any waiters

- delete
  For cleanup cases, to remove the extent state regardless of its state

- changeset
  Only utilized by qgroup

Sink them into the extent_io_extra_options structure.

For most callers, which don't care about these options, we simply sink
some parameters without any impact.
For callers which do care about them, we slightly increase the stack
usage, as extent_io_extra_options has extra members used only by
__set_extent_bit().

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent-io-tree.h | 30 +++++++++++++++++++-------
 fs/btrfs/extent_io.c      | 45 ++++++++++++++++++++++++++++-----------
 fs/btrfs/extent_map.c     |  2 +-
 3 files changed, 56 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
index dfbb65ac9c8c..2893573eb556 100644
--- a/fs/btrfs/extent-io-tree.h
+++ b/fs/btrfs/extent-io-tree.h
@@ -102,6 +102,15 @@ struct extent_io_extra_options {
 	 * For qgroup related functions.
 	 */
 	struct extent_changeset *changeset;
+
+	/*
+	 * For __clear_extent_bit().
+	 * @wake:	Wake up the waiters. Mostly for EXTENT_LOCKED case
+	 * @delete:	Delete the extent regardless of its state. Mostly for
+	 * 		cleanup.
+	 */
+	bool wake;
+	bool delete;
 };
 
 int __init extent_state_cache_init(void);
@@ -139,9 +148,8 @@ int clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 		     unsigned bits, int wake, int delete,
 		     struct extent_state **cached);
 int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		     unsigned bits, int wake, int delete,
-		     struct extent_state **cached, gfp_t mask,
-		     struct extent_changeset *changeset);
+		       unsigned bits, struct extent_state **cached_state,
+		       gfp_t mask, struct extent_io_extra_options *extra_opts);
 
 static inline int unlock_extent(struct extent_io_tree *tree, u64 start, u64 end)
 {
@@ -151,15 +159,21 @@ static inline int unlock_extent(struct extent_io_tree *tree, u64 start, u64 end)
 static inline int unlock_extent_cached(struct extent_io_tree *tree, u64 start,
 		u64 end, struct extent_state **cached)
 {
-	return __clear_extent_bit(tree, start, end, EXTENT_LOCKED, 1, 0, cached,
-				GFP_NOFS, NULL);
+	struct extent_io_extra_options extra_opts = {
+		.wake = true,
+	};
+	return __clear_extent_bit(tree, start, end, EXTENT_LOCKED, cached,
+				GFP_NOFS, &extra_opts);
 }
 
 static inline int unlock_extent_cached_atomic(struct extent_io_tree *tree,
 		u64 start, u64 end, struct extent_state **cached)
 {
-	return __clear_extent_bit(tree, start, end, EXTENT_LOCKED, 1, 0, cached,
-				GFP_ATOMIC, NULL);
+	struct extent_io_extra_options extra_opts = {
+		.wake = true,
+	};
+	return __clear_extent_bit(tree, start, end, EXTENT_LOCKED, cached,
+				GFP_ATOMIC, &extra_opts);
 }
 
 static inline int clear_extent_bits(struct extent_io_tree *tree, u64 start,
@@ -190,7 +204,7 @@ static inline int set_extent_bits(struct extent_io_tree *tree, u64 start,
 static inline int clear_extent_uptodate(struct extent_io_tree *tree, u64 start,
 		u64 end, struct extent_state **cached_state)
 {
-	return __clear_extent_bit(tree, start, end, EXTENT_UPTODATE, 0, 0,
+	return __clear_extent_bit(tree, start, end, EXTENT_UPTODATE,
 				cached_state, GFP_NOFS, NULL);
 }
 
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 5f899b27962b..98b114becd52 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -688,26 +688,38 @@ static void extent_io_tree_panic(struct extent_io_tree *tree, int err)
  * or inserting elements in the tree, so the gfp mask is used to
  * indicate which allocations or sleeping are allowed.
  *
- * pass 'wake' == 1 to kick any sleepers, and 'delete' == 1 to remove
- * the given range from the tree regardless of state (ie for truncate).
+ * extra_opts::wake:		To kick any sleepers.
+ * extra_opts::delete:		To remove the given range regardless of state
+ *				(ie for truncate)
+ * extra_opts::changeset:	To record how many bytes are modified and
+ *				which ranges are modified (for qgroup)
  *
- * the range [start, end] is inclusive.
+ * The range [start, end] is inclusive.
  *
- * This takes the tree lock, and returns 0 on success and < 0 on error.
+ * Returns 0 on success
+ * No error can be returned yet; allocation failure (ENOMEM) is handled by BUG_ON().
  */
 int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-			      unsigned bits, int wake, int delete,
-			      struct extent_state **cached_state,
-			      gfp_t mask, struct extent_changeset *changeset)
+		       unsigned bits, struct extent_state **cached_state,
+		       gfp_t mask, struct extent_io_extra_options *extra_opts)
 {
+	struct extent_changeset *changeset;
 	struct extent_state *state;
 	struct extent_state *cached;
 	struct extent_state *prealloc = NULL;
 	struct rb_node *node;
+	bool wake;
+	bool delete;
 	u64 last_end;
 	int err;
 	int clear = 0;
 
+	if (!extra_opts)
+		extra_opts = &default_opts;
+	changeset = extra_opts->changeset;
+	wake = extra_opts->wake;
+	delete = extra_opts->delete;
+
 	btrfs_debug_check_extent_io_range(tree, start, end);
 	trace_btrfs_clear_extent_bit(tree, start, end - start + 1, bits);
 
@@ -1445,21 +1457,30 @@ int clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 		     unsigned bits, int wake, int delete,
 		     struct extent_state **cached)
 {
-	return __clear_extent_bit(tree, start, end, bits, wake, delete,
-				  cached, GFP_NOFS, NULL);
+	struct extent_io_extra_options extra_opts = {
+		.wake = wake,
+		.delete = delete,
+	};
+
+	return __clear_extent_bit(tree, start, end, bits,
+				  cached, GFP_NOFS, &extra_opts);
 }
 
 int clear_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
 		unsigned bits, struct extent_changeset *changeset)
 {
+	struct extent_io_extra_options extra_opts = {
+		.changeset = changeset,
+	};
+
 	/*
 	 * Don't support EXTENT_LOCKED case, same reason as
 	 * set_record_extent_bits().
 	 */
 	BUG_ON(bits & EXTENT_LOCKED);
 
-	return __clear_extent_bit(tree, start, end, bits, 0, 0, NULL, GFP_NOFS,
-				  changeset);
+	return __clear_extent_bit(tree, start, end, bits, NULL, GFP_NOFS,
+				  &extra_opts);
 }
 
 /*
@@ -4479,7 +4500,7 @@ static int try_release_extent_state(struct extent_io_tree *tree,
 		 */
 		ret = __clear_extent_bit(tree, start, end,
 				 ~(EXTENT_LOCKED | EXTENT_NODATASUM),
-				 0, 0, NULL, mask, NULL);
+				 NULL, mask, NULL);
 
 		/* if clear_extent_bit failed for enomem reasons,
 		 * we can't allow the release to continue.
diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index bd6229fb2b6f..95651ddbb3a7 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -380,7 +380,7 @@ static void extent_map_device_clear_bits(struct extent_map *em, unsigned bits)
 
 		__clear_extent_bit(&device->alloc_state, stripe->physical,
 				   stripe->physical + stripe_size - 1, bits,
-				   0, 0, NULL, GFP_NOWAIT, NULL);
+				   NULL, GFP_NOWAIT, NULL);
 	}
 }
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 22/68] btrfs: disk_io: grab fs_info from extent_buffer::fs_info directly for btrfs_mark_buffer_dirty()
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (20 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 21/68] btrfs: extent_io: sink less common parameters for __clear_extent_bit() Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-27 15:43   ` Goldwyn Rodrigues
  2020-10-21  6:25 ` [PATCH v4 23/68] btrfs: disk-io: make csum_tree_block() handle sectorsize smaller than page size Qu Wenruo
                   ` (47 subsequent siblings)
  69 siblings, 1 reply; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

Since commit f28491e0a6c4 ("Btrfs: move the extent buffer radix tree into
the fs_info"), fs_info can be grabbed from extent_buffer directly.

So use extent_buffer::fs_info directly in btrfs_mark_buffer_dirty() to
simplify things a little.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index c81b7e53149c..58928076d08d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4190,8 +4190,7 @@ int btrfs_buffer_uptodate(struct extent_buffer *buf, u64 parent_transid,
 
 void btrfs_mark_buffer_dirty(struct extent_buffer *buf)
 {
-	struct btrfs_fs_info *fs_info;
-	struct btrfs_root *root;
+	struct btrfs_fs_info *fs_info = buf->fs_info;
 	u64 transid = btrfs_header_generation(buf);
 	int was_dirty;
 
@@ -4204,8 +4203,6 @@ void btrfs_mark_buffer_dirty(struct extent_buffer *buf)
 	if (unlikely(test_bit(EXTENT_BUFFER_UNMAPPED, &buf->bflags)))
 		return;
 #endif
-	root = BTRFS_I(buf->pages[0]->mapping->host)->root;
-	fs_info = root->fs_info;
 	btrfs_assert_tree_locked(buf);
 	if (transid != fs_info->generation)
 		WARN(1, KERN_CRIT "btrfs transid mismatch buffer %llu, found %llu running %llu\n",
-- 
2.28.0



* [PATCH v4 23/68] btrfs: disk-io: make csum_tree_block() handle sectorsize smaller than page size
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (21 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 22/68] btrfs: disk_io: grab fs_info from extent_buffer::fs_info directly for btrfs_mark_buffer_dirty() Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 24/68] btrfs: disk-io: extract the extent buffer verification from btree_readpage_end_io_hook() Qu Wenruo
                   ` (46 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues, Nikolay Borisov

For subpage size support, we only need to handle the first page.

To make the code work for both cases, we modify the following behaviors:

- num_pages calculation
  Instead of "nodesize >> PAGE_SHIFT", use
  "DIV_ROUND_UP(nodesize, PAGE_SIZE)". This ensures we get at least one
  page for subpage support, while still getting the same result for the
  regular page size case.

- The length for the first run
  Instead of PAGE_SIZE - BTRFS_CSUM_SIZE, use
  min(PAGE_SIZE, nodesize) - BTRFS_CSUM_SIZE.
  This handles both cases well.

- The start location of the first run
  Instead of always using BTRFS_CSUM_SIZE as the csum start position,
  add offset_in_page(eb->start) to get the proper offset for both cases.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/disk-io.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 58928076d08d..55bb4f2def3c 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -257,16 +257,16 @@ struct extent_map *btree_get_extent(struct btrfs_inode *inode,
 static void csum_tree_block(struct extent_buffer *buf, u8 *result)
 {
 	struct btrfs_fs_info *fs_info = buf->fs_info;
-	const int num_pages = fs_info->nodesize >> PAGE_SHIFT;
+	const int num_pages = DIV_ROUND_UP(fs_info->nodesize, PAGE_SIZE);
 	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
 	char *kaddr;
 	int i;
 
 	shash->tfm = fs_info->csum_shash;
 	crypto_shash_init(shash);
-	kaddr = page_address(buf->pages[0]);
+	kaddr = page_address(buf->pages[0]) + offset_in_page(buf->start);
 	crypto_shash_update(shash, kaddr + BTRFS_CSUM_SIZE,
-			    PAGE_SIZE - BTRFS_CSUM_SIZE);
+		min_t(u32, PAGE_SIZE, fs_info->nodesize) - BTRFS_CSUM_SIZE);
 
 	for (i = 1; i < num_pages; i++) {
 		kaddr = page_address(buf->pages[i]);
-- 
2.28.0



* [PATCH v4 24/68] btrfs: disk-io: extract the extent buffer verification from btree_readpage_end_io_hook()
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (22 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 23/68] btrfs: disk-io: make csum_tree_block() handle sectorsize smaller than page size Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 25/68] btrfs: disk-io: accept bvec directly for csum_dirty_buffer() Qu Wenruo
                   ` (45 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

Currently btree_readpage_end_io_hook() only needs to handle one extent
buffer, as one page maps to only one extent buffer.

But for the incoming subpage support, one page can be mapped to multiple
extent buffers, thus we can no longer use the current code.

This refactor allows us to call btrfs_check_extent_buffer() on all
involved extent buffers from btree_readpage_end_io_hook() and other
locations.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 78 ++++++++++++++++++++++++++--------------------
 1 file changed, 44 insertions(+), 34 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 55bb4f2def3c..ee2a6d480a7d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -574,60 +574,37 @@ static int check_tree_block_fsid(struct extent_buffer *eb)
 	return ret;
 }
 
-static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
-				      u64 phy_offset, struct page *page,
-				      u64 start, u64 end, int mirror)
+/* Do basic extent buffer check at read time */
+static int btrfs_check_extent_buffer(struct extent_buffer *eb)
 {
-	u64 found_start;
-	int found_level;
-	struct extent_buffer *eb;
-	struct btrfs_fs_info *fs_info;
+	struct btrfs_fs_info *fs_info = eb->fs_info;
 	u16 csum_size;
-	int ret = 0;
+	u64 found_start;
+	u8 found_level;
 	u8 result[BTRFS_CSUM_SIZE];
-	int reads_done;
-
-	if (!page->private)
-		goto out;
+	int ret = 0;
 
-	eb = (struct extent_buffer *)page->private;
-	fs_info = eb->fs_info;
 	csum_size = btrfs_super_csum_size(fs_info->super_copy);
 
-	/* the pending IO might have been the only thing that kept this buffer
-	 * in memory.  Make sure we have a ref for all this other checks
-	 */
-	atomic_inc(&eb->refs);
-
-	reads_done = atomic_dec_and_test(&eb->io_pages);
-	if (!reads_done)
-		goto err;
-
-	eb->read_mirror = mirror;
-	if (test_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags)) {
-		ret = -EIO;
-		goto err;
-	}
-
 	found_start = btrfs_header_bytenr(eb);
 	if (found_start != eb->start) {
 		btrfs_err_rl(fs_info, "bad tree block start, want %llu have %llu",
 			     eb->start, found_start);
 		ret = -EIO;
-		goto err;
+		goto out;
 	}
 	if (check_tree_block_fsid(eb)) {
 		btrfs_err_rl(fs_info, "bad fsid on block %llu",
 			     eb->start);
 		ret = -EIO;
-		goto err;
+		goto out;
 	}
 	found_level = btrfs_header_level(eb);
 	if (found_level >= BTRFS_MAX_LEVEL) {
 		btrfs_err(fs_info, "bad tree block level %d on %llu",
 			  (int)btrfs_header_level(eb), eb->start);
 		ret = -EIO;
-		goto err;
+		goto out;
 	}
 
 	btrfs_set_buffer_lockdep_class(btrfs_header_owner(eb),
@@ -647,7 +624,7 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
 			      fs_info->sb->s_id, eb->start,
 			      val, found, btrfs_header_level(eb));
 		ret = -EUCLEAN;
-		goto err;
+		goto out;
 	}
 
 	/*
@@ -669,6 +646,40 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
 		btrfs_err(fs_info,
 			  "block=%llu read time tree block corruption detected",
 			  eb->start);
+out:
+	return ret;
+}
+
+static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
+				      u64 phy_offset, struct page *page,
+				      u64 start, u64 end, int mirror)
+{
+	struct extent_buffer *eb;
+	int ret = 0;
+	bool reads_done;
+
+	/* Metadata pages that go through IO should all have private set */
+	ASSERT(PagePrivate(page) && page->private);
+	eb = (struct extent_buffer *)page->private;
+
+	/*
+	 * The pending IO might have been the only thing that kept this buffer
+	 * in memory.  Make sure we have a ref for all this other checks
+	 */
+	atomic_inc(&eb->refs);
+
+	reads_done = atomic_dec_and_test(&eb->io_pages);
+	if (!reads_done)
+		goto err;
+
+	eb->read_mirror = mirror;
+	if (test_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags)) {
+		ret = -EIO;
+		goto err;
+	}
+
+	ret = btrfs_check_extent_buffer(eb);
+
 err:
 	if (reads_done &&
 	    test_and_clear_bit(EXTENT_BUFFER_READAHEAD, &eb->bflags))
@@ -684,7 +695,6 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
 		clear_extent_buffer_uptodate(eb);
 	}
 	free_extent_buffer(eb);
-out:
 	return ret;
 }
 
-- 
2.28.0



* [PATCH v4 25/68] btrfs: disk-io: accept bvec directly for csum_dirty_buffer()
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (23 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 24/68] btrfs: disk-io: extract the extent buffer verification from btree_readpage_end_io_hook() Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 26/68] btrfs: inode: make btrfs_readpage_end_io_hook() follow sector size Qu Wenruo
                   ` (44 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

Currently csum_dirty_buffer() uses the page to grab the extent buffer,
but that only works for the regular sectorsize == PAGE_SIZE case.

For subpage we need the page plus the offset within it to grab the
extent buffer.

This patch changes csum_dirty_buffer() to accept a bvec directly, so
that we can extract both the page and the page offset for later subpage
support.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index ee2a6d480a7d..b34a3f312e0c 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -495,13 +495,14 @@ static int btree_read_extent_buffer_pages(struct extent_buffer *eb,
  * we only fill in the checksum field in the first page of a multi-page block
  */
 
-static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct page *page)
+static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct bio_vec *bvec)
 {
+	struct extent_buffer *eb;
+	struct page *page = bvec->bv_page;
 	u64 start = page_offset(page);
 	u64 found_start;
 	u8 result[BTRFS_CSUM_SIZE];
 	u16 csum_size = btrfs_super_csum_size(fs_info->super_copy);
-	struct extent_buffer *eb;
 	int ret;
 
 	eb = (struct extent_buffer *)page->private;
@@ -848,7 +849,7 @@ static blk_status_t btree_csum_one_bio(struct bio *bio)
 	ASSERT(!bio_flagged(bio, BIO_CLONED));
 	bio_for_each_segment_all(bvec, bio, iter_all) {
 		root = BTRFS_I(bvec->bv_page->mapping->host)->root;
-		ret = csum_dirty_buffer(root->fs_info, bvec->bv_page);
+		ret = csum_dirty_buffer(root->fs_info, bvec);
 		if (ret)
 			break;
 	}
-- 
2.28.0



* [PATCH v4 26/68] btrfs: inode: make btrfs_readpage_end_io_hook() follow sector size
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (24 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 25/68] btrfs: disk-io: accept bvec directly for csum_dirty_buffer() Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 27/68] btrfs: introduce a helper to determine if the sectorsize is smaller than PAGE_SIZE Qu Wenruo
                   ` (43 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

Currently btrfs_readpage_end_io_hook() just passes the whole page to
check_data_csum(), which is fine since we only support sectorsize ==
PAGE_SIZE.

To support subpage, we need to properly honor per-sector checksum
verification, just like what we did in the dio read path.

This patch does the csum verification in a for loop, starting at
pg_off == start - page_offset(page) and advancing by sectorsize on each
iteration.

For the sectorsize == PAGE_SIZE case, pg_off will always be 0 and the
loop finishes after a single iteration.

For subpage, the loop covers each sector in the range.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 24fbf2c46e56..f22ee5d3c105 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2849,9 +2849,12 @@ static int btrfs_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
 				      u64 start, u64 end, int mirror)
 {
 	size_t offset = start - page_offset(page);
+	size_t pg_off;
 	struct inode *inode = page->mapping->host;
 	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
 	struct btrfs_root *root = BTRFS_I(inode)->root;
+	u32 sectorsize = root->fs_info->sectorsize;
+	bool found_err = false;
 
 	if (PageChecked(page)) {
 		ClearPageChecked(page);
@@ -2868,7 +2871,17 @@ static int btrfs_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
 	}
 
 	phy_offset >>= inode->i_sb->s_blocksize_bits;
-	return check_data_csum(inode, io_bio, phy_offset, page, offset);
+	for (pg_off = offset; pg_off < end - page_offset(page);
+	     pg_off += sectorsize, phy_offset++) {
+		int ret;
+
+		ret = check_data_csum(inode, io_bio, phy_offset, page, pg_off);
+		if (ret < 0)
+			found_err = true;
+	}
+	if (found_err)
+		return -EIO;
+	return 0;
 }
 
 /*
-- 
2.28.0



* [PATCH v4 27/68] btrfs: introduce a helper to determine if the sectorsize is smaller than PAGE_SIZE
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (25 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 26/68] btrfs: inode: make btrfs_readpage_end_io_hook() follow sector size Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 28/68] btrfs: extent_io: allow find_first_extent_bit() to find a range with exact bits match Qu Wenruo
                   ` (42 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

Just to save some typing in the incoming patches.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ctree.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 9a72896bed2e..e3501dad88e2 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3532,6 +3532,11 @@ static inline int btrfs_defrag_cancelled(struct btrfs_fs_info *fs_info)
 	return signal_pending(current);
 }
 
+static inline bool btrfs_is_subpage(struct btrfs_fs_info *fs_info)
+{
+	return (fs_info->sectorsize < PAGE_SIZE);
+}
+
 #define in_range(b, first, len) ((b) >= (first) && (b) < (first) + (len))
 
 /* Sanity test specific functions */
-- 
2.28.0



* [PATCH v4 28/68] btrfs: extent_io: allow find_first_extent_bit() to find a range with exact bits match
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (26 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 27/68] btrfs: introduce a helper to determine if the sectorsize is smaller than PAGE_SIZE Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 29/68] btrfs: extent_io: don't allow tree block to cross page boundary for subpage support Qu Wenruo
                   ` (41 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

Currently if we pass multiple @bits to find_first_extent_bit(), it will
return the first range with one or more bits matching @bits.

This is fine for the current code, since most callers just do their own
extra checks, and all existing callers only call it with 1 or 2 bits.

But for the incoming subpage support, we want the ability to return a
range with an exact match, so that the caller can skip some extra
checks.

So this patch adds a new bool parameter, @exact_match, to
find_first_extent_bit() and its callees.
Currently all callers just pass 'false' to the new parameter, thus no
functional change is introduced.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/block-group.c      |  2 +-
 fs/btrfs/disk-io.c          |  4 ++--
 fs/btrfs/extent-io-tree.h   |  2 +-
 fs/btrfs/extent-tree.c      |  2 +-
 fs/btrfs/extent_io.c        | 42 +++++++++++++++++++++++++------------
 fs/btrfs/free-space-cache.c |  2 +-
 fs/btrfs/relocation.c       |  2 +-
 fs/btrfs/transaction.c      |  4 ++--
 fs/btrfs/volumes.c          |  2 +-
 9 files changed, 39 insertions(+), 23 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index ea8aaf36647e..7e6ab6b765f6 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -461,7 +461,7 @@ u64 add_new_free_space(struct btrfs_block_group *block_group, u64 start, u64 end
 		ret = find_first_extent_bit(&info->excluded_extents, start,
 					    &extent_start, &extent_end,
 					    EXTENT_DIRTY | EXTENT_UPTODATE,
-					    NULL);
+					    false, NULL);
 		if (ret)
 			break;
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b34a3f312e0c..1ca121ca28aa 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4516,7 +4516,7 @@ static int btrfs_destroy_marked_extents(struct btrfs_fs_info *fs_info,
 
 	while (1) {
 		ret = find_first_extent_bit(dirty_pages, start, &start, &end,
-					    mark, NULL);
+					    mark, false, NULL);
 		if (ret)
 			break;
 
@@ -4556,7 +4556,7 @@ static int btrfs_destroy_pinned_extent(struct btrfs_fs_info *fs_info,
 		 */
 		mutex_lock(&fs_info->unused_bg_unpin_mutex);
 		ret = find_first_extent_bit(unpin, 0, &start, &end,
-					    EXTENT_DIRTY, &cached_state);
+					    EXTENT_DIRTY, false, &cached_state);
 		if (ret) {
 			mutex_unlock(&fs_info->unused_bg_unpin_mutex);
 			break;
diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
index 2893573eb556..48fdaf5f3a19 100644
--- a/fs/btrfs/extent-io-tree.h
+++ b/fs/btrfs/extent-io-tree.h
@@ -258,7 +258,7 @@ static inline int set_extent_uptodate(struct extent_io_tree *tree, u64 start,
 
 int find_first_extent_bit(struct extent_io_tree *tree, u64 start,
 			  u64 *start_ret, u64 *end_ret, unsigned bits,
-			  struct extent_state **cached_state);
+			  bool exact_match, struct extent_state **cached_state);
 void find_first_clear_extent_bit(struct extent_io_tree *tree, u64 start,
 				 u64 *start_ret, u64 *end_ret, unsigned bits);
 int find_contiguous_extent_bit(struct extent_io_tree *tree, u64 start,
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index e9eedc053fc5..406329dabb48 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2880,7 +2880,7 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans)
 
 		mutex_lock(&fs_info->unused_bg_unpin_mutex);
 		ret = find_first_extent_bit(unpin, 0, &start, &end,
-					    EXTENT_DIRTY, &cached_state);
+					    EXTENT_DIRTY, false, &cached_state);
 		if (ret) {
 			mutex_unlock(&fs_info->unused_bg_unpin_mutex);
 			break;
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 98b114becd52..37c721294ffe 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1559,13 +1559,27 @@ void extent_range_redirty_for_io(struct inode *inode, u64 start, u64 end)
 	}
 }
 
-/* find the first state struct with 'bits' set after 'start', and
- * return it.  tree->lock must be held.  NULL will returned if
- * nothing was found after 'start'
+static bool match_extent_state(struct extent_state *state, unsigned bits,
+			       bool exact_match)
+{
+	if (exact_match)
+		return ((state->state & bits) == bits);
+	return (state->state & bits);
+}
+
+/*
+ * Find the first state struct with @bits set after @start.
+ *
+ * NOTE: tree->lock must be held.
+ *
+ * @exact_match:	Do we need to have all @bits set, or just any of
+ * 			the @bits.
+ *
+ * Return NULL if we can't find a match.
  */
 static struct extent_state *
 find_first_extent_bit_state(struct extent_io_tree *tree,
-			    u64 start, unsigned bits)
+			    u64 start, unsigned bits, bool exact_match)
 {
 	struct rb_node *node;
 	struct extent_state *state;
@@ -1580,7 +1594,8 @@ find_first_extent_bit_state(struct extent_io_tree *tree,
 
 	while (1) {
 		state = rb_entry(node, struct extent_state, rb_node);
-		if (state->end >= start && (state->state & bits))
+		if (state->end >= start &&
+		    match_extent_state(state, bits, exact_match))
 			return state;
 
 		node = rb_next(node);
@@ -1601,7 +1616,7 @@ find_first_extent_bit_state(struct extent_io_tree *tree,
  */
 int find_first_extent_bit(struct extent_io_tree *tree, u64 start,
 			  u64 *start_ret, u64 *end_ret, unsigned bits,
-			  struct extent_state **cached_state)
+			  bool exact_match, struct extent_state **cached_state)
 {
 	struct extent_state *state;
 	int ret = 1;
@@ -1611,7 +1626,8 @@ int find_first_extent_bit(struct extent_io_tree *tree, u64 start,
 		state = *cached_state;
 		if (state->end == start - 1 && extent_state_in_tree(state)) {
 			while ((state = next_state(state)) != NULL) {
-				if (state->state & bits)
+				if (match_extent_state(state, bits,
+				    exact_match))
 					goto got_it;
 			}
 			free_extent_state(*cached_state);
@@ -1622,7 +1638,7 @@ int find_first_extent_bit(struct extent_io_tree *tree, u64 start,
 		*cached_state = NULL;
 	}
 
-	state = find_first_extent_bit_state(tree, start, bits);
+	state = find_first_extent_bit_state(tree, start, bits, exact_match);
 got_it:
 	if (state) {
 		cache_state_if_flags(state, cached_state, 0);
@@ -1657,7 +1673,7 @@ int find_contiguous_extent_bit(struct extent_io_tree *tree, u64 start,
 	int ret = 1;
 
 	spin_lock(&tree->lock);
-	state = find_first_extent_bit_state(tree, start, bits);
+	state = find_first_extent_bit_state(tree, start, bits, false);
 	if (state) {
 		*start_ret = state->start;
 		*end_ret = state->end;
@@ -2443,9 +2459,8 @@ int clean_io_failure(struct btrfs_fs_info *fs_info,
 		goto out;
 
 	spin_lock(&io_tree->lock);
-	state = find_first_extent_bit_state(io_tree,
-					    failrec->start,
-					    EXTENT_LOCKED);
+	state = find_first_extent_bit_state(io_tree, failrec->start,
+					    EXTENT_LOCKED, false);
 	spin_unlock(&io_tree->lock);
 
 	if (state && state->start <= failrec->start &&
@@ -2481,7 +2496,8 @@ void btrfs_free_io_failure_record(struct btrfs_inode *inode, u64 start, u64 end)
 		return;
 
 	spin_lock(&failure_tree->lock);
-	state = find_first_extent_bit_state(failure_tree, start, EXTENT_DIRTY);
+	state = find_first_extent_bit_state(failure_tree, start, EXTENT_DIRTY,
+					    false);
 	while (state) {
 		if (state->start > end)
 			break;
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index dc82fd0c80cb..1533df86536b 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -1093,7 +1093,7 @@ static noinline_for_stack int write_pinned_extent_entries(
 	while (start < block_group->start + block_group->length) {
 		ret = find_first_extent_bit(unpin, start,
 					    &extent_start, &extent_end,
-					    EXTENT_DIRTY, NULL);
+					    EXTENT_DIRTY, false, NULL);
 		if (ret)
 			return 0;
 
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 4ba1ab9cc76d..77a7e35a500c 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3153,7 +3153,7 @@ int find_next_extent(struct reloc_control *rc, struct btrfs_path *path,
 
 		ret = find_first_extent_bit(&rc->processed_blocks,
 					    key.objectid, &start, &end,
-					    EXTENT_DIRTY, NULL);
+					    EXTENT_DIRTY, false, NULL);
 
 		if (ret == 0 && start <= key.objectid) {
 			btrfs_release_path(path);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 20c6ac1a5de7..5b3444641ea5 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -974,7 +974,7 @@ int btrfs_write_marked_extents(struct btrfs_fs_info *fs_info,
 
 	atomic_inc(&BTRFS_I(fs_info->btree_inode)->sync_writers);
 	while (!find_first_extent_bit(dirty_pages, start, &start, &end,
-				      mark, &cached_state)) {
+				      mark, false, &cached_state)) {
 		bool wait_writeback = false;
 
 		err = convert_extent_bit(dirty_pages, start, end,
@@ -1029,7 +1029,7 @@ static int __btrfs_wait_marked_extents(struct btrfs_fs_info *fs_info,
 	u64 end;
 
 	while (!find_first_extent_bit(dirty_pages, start, &start, &end,
-				      EXTENT_NEED_WAIT, &cached_state)) {
+				      EXTENT_NEED_WAIT, false, &cached_state)) {
 		/*
 		 * Ignore -ENOMEM errors returned by clear_extent_bit().
 		 * When committing the transaction, we'll remove any entries
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 214856c4ccb1..c54329e92ced 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1382,7 +1382,7 @@ static bool contains_pending_extent(struct btrfs_device *device, u64 *start,
 
 	if (!find_first_extent_bit(&device->alloc_state, *start,
 				   &physical_start, &physical_end,
-				   CHUNK_ALLOCATED, NULL)) {
+				   CHUNK_ALLOCATED, false, NULL)) {
 
 		if (in_range(physical_start, *start, len) ||
 		    in_range(*start, physical_start,
-- 
2.28.0



* [PATCH v4 29/68] btrfs: extent_io: don't allow tree block to cross page boundary for subpage support
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (27 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 28/68] btrfs: extent_io: allow find_first_extent_bit() to find a range with exact bits match Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 30/68] btrfs: extent_io: update num_extent_pages() to support subpage sized extent buffer Qu Wenruo
                   ` (40 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

As a preparation for subpage sector size support (allowing filesystems
with sector size smaller than page size to be mounted), don't allow a
tree block to be read if it crosses a 64K boundary when the sector size
is smaller than the page size.

The 64K is selected because:
- We are only going to support 64K page size for subpage for now
- 64K is also the max node size btrfs supports

This ensures that tree blocks are always contained in one page on a
system with 64K page size, which greatly simplifies the handling;
otherwise we would need complex multi-page handling for tree blocks.

Currently the only way to create tree blocks crossing a 64K boundary is
by btrfs-convert, which will be fixed soon and is not widely used.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 37c721294ffe..6f41371290e2 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5298,6 +5298,13 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		btrfs_err(fs_info, "bad tree block start %llu", start);
 		return ERR_PTR(-EINVAL);
 	}
+	if (btrfs_is_subpage(fs_info) && round_down(start, PAGE_SIZE) !=
+	    round_down(start + len - 1, PAGE_SIZE)) {
+		btrfs_err(fs_info,
+		"tree block crosses page boundary, start %llu nodesize %lu",
+			  start, len);
+		return ERR_PTR(-EINVAL);
+	}
 
 	eb = find_extent_buffer(fs_info, start);
 	if (eb)
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 30/68] btrfs: extent_io: update num_extent_pages() to support subpage sized extent buffer
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (28 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 29/68] btrfs: extent_io: don't allow tree block to cross page boundary for subpage support Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 31/68] btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors Qu Wenruo
                   ` (39 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

For subpage sized extent buffers, we have ensured that no extent buffer
crosses a page boundary, thus we only need one page for any extent
buffer.

This patch updates num_extent_pages() to handle this case.
Now num_extent_pages() returns 1 for subpage sized extent buffers.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.h | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index e588b3100ede..552afc1c0bbc 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -229,8 +229,15 @@ void wait_on_extent_buffer_writeback(struct extent_buffer *eb);
 
 static inline int num_extent_pages(const struct extent_buffer *eb)
 {
-	return (round_up(eb->start + eb->len, PAGE_SIZE) >> PAGE_SHIFT) -
-	       (eb->start >> PAGE_SHIFT);
+	/*
+	 * For sectorsize == PAGE_SIZE case, since eb->len is always aligned
+	 * to sectorsize, it's just eb->len >> PAGE_SHIFT.
+	 *
+	 * For sectorsize < PAGE_SIZE case, we only want to support 64K
+	 * PAGE_SIZE, and have ensured all tree blocks won't cross page
+	 * boundaries. So in that case we always get 1 page.
+	 */
+	return (round_up(eb->len, PAGE_SIZE) >> PAGE_SHIFT);
 }
 
 static inline int extent_buffer_uptodate(const struct extent_buffer *eb)
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 31/68] btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (29 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 30/68] btrfs: extent_io: update num_extent_pages() to support subpage sized extent buffer Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 32/68] btrfs: disk-io: only clear EXTENT_LOCK bit for extent_invalidatepage() Qu Wenruo
                   ` (38 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

To support sectorsize < PAGE_SIZE case, we need to take extra care for
extent buffer accessors.

Since the sector size is smaller than PAGE_SIZE, one page can contain
multiple tree blocks, so we must take eb->start into account to
determine the real offset to read/write in the extent buffer accessors.

This patch introduces two helpers for this:
- get_eb_page_index()
  This is to calculate the index to access extent_buffer::pages.
  It's just a simple wrapper around "start >> PAGE_SHIFT".

  For sectorsize == PAGE_SIZE case, nothing is changed.
  For sectorsize < PAGE_SIZE case, we always get 0 as the index, since
  all tree blocks are contained in one page, so the plain page shift
  still works fine.

- get_eb_page_offset()
  This is to calculate the offset within the page when accessing
  extent_buffer::pages.
  This needs to take extent_buffer::start into consideration.

  For sectorsize == PAGE_SIZE case, extent_buffer::start is always
  aligned to PAGE_SIZE, thus adding extent_buffer::start before the
  offset_in_page() calculation won't change the result.
  For sectorsize < PAGE_SIZE case, adding extent_buffer::start gives
  us the correct offset to access.

This patch will touch the following parts to cover all extent buffer
accessors:

- BTRFS_SETGET_HEADER_FUNCS()
- read_extent_buffer()
- read_extent_buffer_to_user()
- memcmp_extent_buffer()
- write_extent_buffer_chunk_tree_uuid()
- write_extent_buffer_fsid()
- write_extent_buffer()
- memzero_extent_buffer()
- copy_extent_buffer_full()
- copy_extent_buffer()
- memcpy_extent_buffer()
- memmove_extent_buffer()
- btrfs_get_token_##bits()
- btrfs_get_##bits()
- btrfs_set_token_##bits()
- btrfs_set_##bits()
- generic_bin_search()

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ctree.c        |  5 ++--
 fs/btrfs/ctree.h        | 38 ++++++++++++++++++++++--
 fs/btrfs/extent_io.c    | 66 ++++++++++++++++++++++++-----------------
 fs/btrfs/struct-funcs.c | 18 ++++++-----
 4 files changed, 88 insertions(+), 39 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index cd392da69b81..0f6944a3a836 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1712,10 +1712,11 @@ static noinline int generic_bin_search(struct extent_buffer *eb,
 		oip = offset_in_page(offset);
 
 		if (oip + key_size <= PAGE_SIZE) {
-			const unsigned long idx = offset >> PAGE_SHIFT;
+			const unsigned long idx = get_eb_page_index(offset);
 			char *kaddr = page_address(eb->pages[idx]);
 
-			tmp = (struct btrfs_disk_key *)(kaddr + oip);
+			tmp = (struct btrfs_disk_key *)(kaddr +
+					get_eb_page_offset(eb, offset));
 		} else {
 			read_extent_buffer(eb, &unaligned, offset, key_size);
 			tmp = &unaligned;
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index e3501dad88e2..0c3ea3599dc7 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1448,14 +1448,15 @@ static inline void btrfs_set_token_##name(struct btrfs_map_token *token,\
 #define BTRFS_SETGET_HEADER_FUNCS(name, type, member, bits)		\
 static inline u##bits btrfs_##name(const struct extent_buffer *eb)	\
 {									\
-	const type *p = page_address(eb->pages[0]);			\
+	const type *p = page_address(eb->pages[0]) +			\
+			offset_in_page(eb->start);			\
 	u##bits res = le##bits##_to_cpu(p->member);			\
 	return res;							\
 }									\
 static inline void btrfs_set_##name(const struct extent_buffer *eb,	\
 				    u##bits val)			\
 {									\
-	type *p = page_address(eb->pages[0]);				\
+	type *p = page_address(eb->pages[0]) + offset_in_page(eb->start); \
 	p->member = cpu_to_le##bits(val);				\
 }
 
@@ -3241,6 +3242,39 @@ static inline void assertfail(const char *expr, const char* file, int line) { }
 #define ASSERT(expr)	(void)(expr)
 #endif
 
+/*
+ * Get the correct offset inside the page of extent buffer.
+ *
+ * Will handle both sectorsize == PAGE_SIZE and sectorsize < PAGE_SIZE cases.
+ *
+ * @eb:		The target extent buffer
+ * @offset_in_eb:	The offset inside the extent buffer
+ */
+static inline size_t get_eb_page_offset(const struct extent_buffer *eb,
+					unsigned long offset_in_eb)
+{
+	/*
+	 * For sectorsize == PAGE_SIZE case, eb->start will always be aligned
+	 * to PAGE_SIZE, thus adding it won't cause any difference.
+	 *
+	 * For sectorsize < PAGE_SIZE, we must only read the data that belongs
+	 * to the eb, thus we have to take eb->start into consideration.
+	 */
+	return offset_in_page(offset_in_eb + eb->start);
+}
+
+static inline unsigned long get_eb_page_index(unsigned long offset_in_eb)
+{
+	/*
+	 * For sectorsize == PAGE_SIZE case, plain >> PAGE_SHIFT is enough.
+	 *
+	 * For sectorsize < PAGE_SIZE case, we only support 64K PAGE_SIZE,
+	 * and have ensured all tree blocks are contained in one page, thus
+	 * we always get index == 0.
+	 */
+	return offset_in_eb >> PAGE_SHIFT;
+}
+
 /*
  * Use that for functions that are conditionally exported for sanity tests but
  * otherwise static
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 6f41371290e2..ea248e2689c9 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5703,7 +5703,7 @@ void read_extent_buffer(const struct extent_buffer *eb, void *dstv,
 	struct page *page;
 	char *kaddr;
 	char *dst = (char *)dstv;
-	unsigned long i = start >> PAGE_SHIFT;
+	unsigned long i = get_eb_page_index(start);
 
 	if (start + len > eb->len) {
 		WARN(1, KERN_ERR "btrfs bad mapping eb start %llu len %lu, wanted %lu %lu\n",
@@ -5712,7 +5712,7 @@ void read_extent_buffer(const struct extent_buffer *eb, void *dstv,
 		return;
 	}
 
-	offset = offset_in_page(start);
+	offset = get_eb_page_offset(eb, start);
 
 	while (len > 0) {
 		page = eb->pages[i];
@@ -5737,13 +5737,13 @@ int read_extent_buffer_to_user_nofault(const struct extent_buffer *eb,
 	struct page *page;
 	char *kaddr;
 	char __user *dst = (char __user *)dstv;
-	unsigned long i = start >> PAGE_SHIFT;
+	unsigned long i = get_eb_page_index(start);
 	int ret = 0;
 
 	WARN_ON(start > eb->len);
 	WARN_ON(start + len > eb->start + eb->len);
 
-	offset = offset_in_page(start);
+	offset = get_eb_page_offset(eb, start);
 
 	while (len > 0) {
 		page = eb->pages[i];
@@ -5772,13 +5772,13 @@ int memcmp_extent_buffer(const struct extent_buffer *eb, const void *ptrv,
 	struct page *page;
 	char *kaddr;
 	char *ptr = (char *)ptrv;
-	unsigned long i = start >> PAGE_SHIFT;
+	unsigned long i = get_eb_page_index(start);
 	int ret = 0;
 
 	WARN_ON(start > eb->len);
 	WARN_ON(start + len > eb->start + eb->len);
 
-	offset = offset_in_page(start);
+	offset = get_eb_page_offset(eb, start);
 
 	while (len > 0) {
 		page = eb->pages[i];
@@ -5804,7 +5804,7 @@ void write_extent_buffer_chunk_tree_uuid(const struct extent_buffer *eb,
 	char *kaddr;
 
 	WARN_ON(!PageUptodate(eb->pages[0]));
-	kaddr = page_address(eb->pages[0]);
+	kaddr = page_address(eb->pages[0]) + get_eb_page_offset(eb, 0);
 	memcpy(kaddr + offsetof(struct btrfs_header, chunk_tree_uuid), srcv,
 			BTRFS_FSID_SIZE);
 }
@@ -5814,7 +5814,7 @@ void write_extent_buffer_fsid(const struct extent_buffer *eb, const void *srcv)
 	char *kaddr;
 
 	WARN_ON(!PageUptodate(eb->pages[0]));
-	kaddr = page_address(eb->pages[0]);
+	kaddr = page_address(eb->pages[0]) + get_eb_page_offset(eb, 0);
 	memcpy(kaddr + offsetof(struct btrfs_header, fsid), srcv,
 			BTRFS_FSID_SIZE);
 }
@@ -5827,12 +5827,12 @@ void write_extent_buffer(const struct extent_buffer *eb, const void *srcv,
 	struct page *page;
 	char *kaddr;
 	char *src = (char *)srcv;
-	unsigned long i = start >> PAGE_SHIFT;
+	unsigned long i = get_eb_page_index(start);
 
 	WARN_ON(start > eb->len);
 	WARN_ON(start + len > eb->start + eb->len);
 
-	offset = offset_in_page(start);
+	offset = get_eb_page_offset(eb, start);
 
 	while (len > 0) {
 		page = eb->pages[i];
@@ -5856,12 +5856,12 @@ void memzero_extent_buffer(const struct extent_buffer *eb, unsigned long start,
 	size_t offset;
 	struct page *page;
 	char *kaddr;
-	unsigned long i = start >> PAGE_SHIFT;
+	unsigned long i = get_eb_page_index(start);
 
 	WARN_ON(start > eb->len);
 	WARN_ON(start + len > eb->start + eb->len);
 
-	offset = offset_in_page(start);
+	offset = get_eb_page_offset(eb, start);
 
 	while (len > 0) {
 		page = eb->pages[i];
@@ -5885,10 +5885,22 @@ void copy_extent_buffer_full(const struct extent_buffer *dst,
 
 	ASSERT(dst->len == src->len);
 
-	num_pages = num_extent_pages(dst);
-	for (i = 0; i < num_pages; i++)
-		copy_page(page_address(dst->pages[i]),
-				page_address(src->pages[i]));
+	if (dst->fs_info->sectorsize == PAGE_SIZE) {
+		num_pages = num_extent_pages(dst);
+		for (i = 0; i < num_pages; i++)
+			copy_page(page_address(dst->pages[i]),
+				  page_address(src->pages[i]));
+	} else {
+		unsigned long src_index = get_eb_page_index(0);
+		unsigned long dst_index = get_eb_page_index(0);
+		size_t src_offset = get_eb_page_offset(src, 0);
+		size_t dst_offset = get_eb_page_offset(dst, 0);
+
+		ASSERT(src_index == 0 && dst_index == 0);
+		memcpy(page_address(dst->pages[dst_index]) + dst_offset,
+		       page_address(src->pages[src_index]) + src_offset,
+		       src->len);
+	}
 }
 
 void copy_extent_buffer(const struct extent_buffer *dst,
@@ -5901,11 +5913,11 @@ void copy_extent_buffer(const struct extent_buffer *dst,
 	size_t offset;
 	struct page *page;
 	char *kaddr;
-	unsigned long i = dst_offset >> PAGE_SHIFT;
+	unsigned long i = get_eb_page_index(dst_offset);
 
 	WARN_ON(src->len != dst_len);
 
-	offset = offset_in_page(dst_offset);
+	offset = get_eb_page_offset(dst, dst_offset);
 
 	while (len > 0) {
 		page = dst->pages[i];
@@ -5949,7 +5961,7 @@ static inline void eb_bitmap_offset(const struct extent_buffer *eb,
 	 * the bitmap item in the extent buffer + the offset of the byte in the
 	 * bitmap item.
 	 */
-	offset = start + byte_offset;
+	offset = start + offset_in_page(eb->start) + byte_offset;
 
 	*page_index = offset >> PAGE_SHIFT;
 	*page_offset = offset_in_page(offset);
@@ -6113,11 +6125,11 @@ void memcpy_extent_buffer(const struct extent_buffer *dst,
 	}
 
 	while (len > 0) {
-		dst_off_in_page = offset_in_page(dst_offset);
-		src_off_in_page = offset_in_page(src_offset);
+		dst_off_in_page = get_eb_page_offset(dst, dst_offset);
+		src_off_in_page = get_eb_page_offset(dst, src_offset);
 
-		dst_i = dst_offset >> PAGE_SHIFT;
-		src_i = src_offset >> PAGE_SHIFT;
+		dst_i = get_eb_page_index(dst_offset);
+		src_i = get_eb_page_index(src_offset);
 
 		cur = min(len, (unsigned long)(PAGE_SIZE -
 					       src_off_in_page));
@@ -6163,11 +6175,11 @@ void memmove_extent_buffer(const struct extent_buffer *dst,
 		return;
 	}
 	while (len > 0) {
-		dst_i = dst_end >> PAGE_SHIFT;
-		src_i = src_end >> PAGE_SHIFT;
+		dst_i = get_eb_page_index(dst_end);
+		src_i = get_eb_page_index(src_end);
 
-		dst_off_in_page = offset_in_page(dst_end);
-		src_off_in_page = offset_in_page(src_end);
+		dst_off_in_page = get_eb_page_offset(dst, dst_end);
+		src_off_in_page = get_eb_page_offset(dst, src_end);
 
 		cur = min_t(unsigned long, len, src_off_in_page + 1);
 		cur = min(cur, dst_off_in_page + 1);
diff --git a/fs/btrfs/struct-funcs.c b/fs/btrfs/struct-funcs.c
index 079b059818e9..769901c2b3c9 100644
--- a/fs/btrfs/struct-funcs.c
+++ b/fs/btrfs/struct-funcs.c
@@ -67,8 +67,9 @@ u##bits btrfs_get_token_##bits(struct btrfs_map_token *token,		\
 			       const void *ptr, unsigned long off)	\
 {									\
 	const unsigned long member_offset = (unsigned long)ptr + off;	\
-	const unsigned long idx = member_offset >> PAGE_SHIFT;		\
-	const unsigned long oip = offset_in_page(member_offset);	\
+	const unsigned long idx = get_eb_page_index(member_offset);	\
+	const unsigned long oip = get_eb_page_offset(token->eb, 	\
+						     member_offset);	\
 	const int size = sizeof(u##bits);				\
 	u8 lebytes[sizeof(u##bits)];					\
 	const int part = PAGE_SIZE - oip;				\
@@ -95,8 +96,8 @@ u##bits btrfs_get_##bits(const struct extent_buffer *eb,		\
 			 const void *ptr, unsigned long off)		\
 {									\
 	const unsigned long member_offset = (unsigned long)ptr + off;	\
-	const unsigned long oip = offset_in_page(member_offset);	\
-	const unsigned long idx = member_offset >> PAGE_SHIFT;		\
+	const unsigned long oip = get_eb_page_offset(eb, member_offset);\
+	const unsigned long idx = get_eb_page_index(member_offset);	\
 	char *kaddr = page_address(eb->pages[idx]);			\
 	const int size = sizeof(u##bits);				\
 	const int part = PAGE_SIZE - oip;				\
@@ -116,8 +117,9 @@ void btrfs_set_token_##bits(struct btrfs_map_token *token,		\
 			    u##bits val)				\
 {									\
 	const unsigned long member_offset = (unsigned long)ptr + off;	\
-	const unsigned long idx = member_offset >> PAGE_SHIFT;		\
-	const unsigned long oip = offset_in_page(member_offset);	\
+	const unsigned long idx = get_eb_page_index(member_offset);	\
+	const unsigned long oip = get_eb_page_offset(token->eb,		\
+						     member_offset);	\
 	const int size = sizeof(u##bits);				\
 	u8 lebytes[sizeof(u##bits)];					\
 	const int part = PAGE_SIZE - oip;				\
@@ -146,8 +148,8 @@ void btrfs_set_##bits(const struct extent_buffer *eb, void *ptr,	\
 		      unsigned long off, u##bits val)			\
 {									\
 	const unsigned long member_offset = (unsigned long)ptr + off;	\
-	const unsigned long oip = offset_in_page(member_offset);	\
-	const unsigned long idx = member_offset >> PAGE_SHIFT;		\
+	const unsigned long oip = get_eb_page_offset(eb, member_offset);\
+	const unsigned long idx = get_eb_page_index(member_offset);	\
 	char *kaddr = page_address(eb->pages[idx]);			\
 	const int size = sizeof(u##bits);				\
 	const int part = PAGE_SIZE - oip;				\
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 32/68] btrfs: disk-io: only clear EXTENT_LOCK bit for extent_invalidatepage()
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (30 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 31/68] btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 33/68] btrfs: extent-io: make type of extent_state::state to be at least 32 bits Qu Wenruo
                   ` (37 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

extent_invalidatepage() calls clear_extent_bit() with delete == 1,
which tries to clear all existing bits in the range.

This is currently fine, since the btree io tree only utilizes the
EXTENT_LOCKED bit.
But it could become a problem for later subpage support, which will
utilize extra io tree bits to represent extra info.

This patch converts that clear_extent_bit() call to
unlock_extent_cached().

Since the btree io tree only utilizes the EXTENT_LOCKED bit, this
doesn't change the behavior, but provides a much cleaner basis for the
incoming subpage support.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 1ca121ca28aa..10bdb0a8a92f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -996,8 +996,13 @@ static void extent_invalidatepage(struct extent_io_tree *tree,
 
 	lock_extent_bits(tree, start, end, &cached_state);
 	wait_on_page_writeback(page);
-	clear_extent_bit(tree, start, end, EXTENT_LOCKED | EXTENT_DELALLOC |
-			 EXTENT_DO_ACCOUNTING, 1, 1, &cached_state);
+
+	/*
+	 * Currently for btree io tree, only EXTENT_LOCKED is utilized,
+	 * so here we only need to unlock the extent range to free any
+	 * existing extent state.
+	 */
+	unlock_extent_cached(tree, start, end, &cached_state);
 }
 
 static void btree_invalidatepage(struct page *page, unsigned int offset,
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 33/68] btrfs: extent-io: make type of extent_state::state to be at least 32 bits
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (31 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 32/68] btrfs: disk-io: only clear EXTENT_LOCK bit for extent_invalidatepage() Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 34/68] btrfs: extent_io: use extent_io_tree to handle subpage extent buffer allocation Qu Wenruo
                   ` (36 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

Currently we use 'unsigned' for extent_state::state, which is only
guaranteed to be at least 16 bits.

But for the incoming subpage support, we are going to introduce more
bits to at least match the following page bits:
- PageUptodate
- PagePrivate2

Thus we will go beyond 16 bits.

To support this, make extent_state::state at least 32 bits wide, and to
be more explicit, use "u32" to be clear about the maximum number of
supported bits.

This doesn't increase the memory usage on x86_64, but may affect other
architectures.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent-io-tree.h | 37 ++++++++++++++++-------------
 fs/btrfs/extent_io.c      | 50 +++++++++++++++++++--------------------
 fs/btrfs/extent_io.h      |  2 +-
 3 files changed, 45 insertions(+), 44 deletions(-)

diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
index 48fdaf5f3a19..176e0e8e1f7c 100644
--- a/fs/btrfs/extent-io-tree.h
+++ b/fs/btrfs/extent-io-tree.h
@@ -22,6 +22,10 @@ struct io_failure_record;
 #define EXTENT_QGROUP_RESERVED	(1U << 12)
 #define EXTENT_CLEAR_DATA_RESV	(1U << 13)
 #define EXTENT_DELALLOC_NEW	(1U << 14)
+
+/* For subpage btree io tree, to indicate there is an extent buffer */
+#define EXTENT_HAS_TREE_BLOCK	(1U << 15)
+
 #define EXTENT_DO_ACCOUNTING    (EXTENT_CLEAR_META_RESV | \
 				 EXTENT_CLEAR_DATA_RESV)
 #define EXTENT_CTLBITS		(EXTENT_DO_ACCOUNTING)
@@ -73,7 +77,7 @@ struct extent_state {
 	/* ADD NEW ELEMENTS AFTER THIS */
 	wait_queue_head_t wq;
 	refcount_t refs;
-	unsigned state;
+	u32 state;
 
 	struct io_failure_record *failrec;
 
@@ -136,19 +140,19 @@ void __cold extent_io_exit(void);
 
 u64 count_range_bits(struct extent_io_tree *tree,
 		     u64 *start, u64 search_end,
-		     u64 max_bytes, unsigned bits, int contig);
+		     u64 max_bytes, u32 bits, int contig);
 
 void free_extent_state(struct extent_state *state);
 int test_range_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		   unsigned bits, int filled,
+		   u32 bits, int filled,
 		   struct extent_state *cached_state);
 int clear_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
-		unsigned bits, struct extent_changeset *changeset);
+			     u32 bits, struct extent_changeset *changeset);
 int clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		     unsigned bits, int wake, int delete,
+		     u32 bits, int wake, int delete,
 		     struct extent_state **cached);
 int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		       unsigned bits, struct extent_state **cached_state,
+		       u32 bits, struct extent_state **cached_state,
 		       gfp_t mask, struct extent_io_extra_options *extra_opts);
 
 static inline int unlock_extent(struct extent_io_tree *tree, u64 start, u64 end)
@@ -177,7 +181,7 @@ static inline int unlock_extent_cached_atomic(struct extent_io_tree *tree,
 }
 
 static inline int clear_extent_bits(struct extent_io_tree *tree, u64 start,
-		u64 end, unsigned bits)
+				    u64 end, u32 bits)
 {
 	int wake = 0;
 
@@ -188,15 +192,14 @@ static inline int clear_extent_bits(struct extent_io_tree *tree, u64 start,
 }
 
 int set_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
-			   unsigned bits, struct extent_changeset *changeset);
+			   u32 bits, struct extent_changeset *changeset);
 int set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		   unsigned bits, struct extent_state **cached_state,
-		   gfp_t mask);
+		   u32 bits, struct extent_state **cached_state, gfp_t mask);
 int set_extent_bits_nowait(struct extent_io_tree *tree, u64 start, u64 end,
-			   unsigned bits);
+			   u32 bits);
 
 static inline int set_extent_bits(struct extent_io_tree *tree, u64 start,
-		u64 end, unsigned bits)
+		u64 end, u32 bits)
 {
 	return set_extent_bit(tree, start, end, bits, NULL, GFP_NOFS);
 }
@@ -223,11 +226,11 @@ static inline int clear_extent_dirty(struct extent_io_tree *tree, u64 start,
 }
 
 int convert_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		       unsigned bits, unsigned clear_bits,
+		       u32 bits, u32 clear_bits,
 		       struct extent_state **cached_state);
 
 static inline int set_extent_delalloc(struct extent_io_tree *tree, u64 start,
-				      u64 end, unsigned int extra_bits,
+				      u64 end, u32 extra_bits,
 				      struct extent_state **cached_state)
 {
 	return set_extent_bit(tree, start, end,
@@ -257,12 +260,12 @@ static inline int set_extent_uptodate(struct extent_io_tree *tree, u64 start,
 }
 
 int find_first_extent_bit(struct extent_io_tree *tree, u64 start,
-			  u64 *start_ret, u64 *end_ret, unsigned bits,
+			  u64 *start_ret, u64 *end_ret, u32 bits,
 			  bool exact_match, struct extent_state **cached_state);
 void find_first_clear_extent_bit(struct extent_io_tree *tree, u64 start,
-				 u64 *start_ret, u64 *end_ret, unsigned bits);
+				 u64 *start_ret, u64 *end_ret, u32 bits);
 int find_contiguous_extent_bit(struct extent_io_tree *tree, u64 start,
-			       u64 *start_ret, u64 *end_ret, unsigned bits);
+			       u64 *start_ret, u64 *end_ret, u32 bits);
 bool btrfs_find_delalloc_range(struct extent_io_tree *tree, u64 *start,
 			       u64 *end, u64 max_bytes,
 			       struct extent_state **cached_state);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index ea248e2689c9..a7e4d3c65162 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -143,7 +143,7 @@ struct extent_page_data {
 	unsigned int sync_io:1;
 };
 
-static int add_extent_changeset(struct extent_state *state, unsigned bits,
+static int add_extent_changeset(struct extent_state *state, u32 bits,
 				 struct extent_changeset *changeset,
 				 int set)
 {
@@ -531,7 +531,7 @@ static void merge_state(struct extent_io_tree *tree,
 }
 
 static void set_state_bits(struct extent_io_tree *tree,
-			   struct extent_state *state, unsigned *bits,
+			   struct extent_state *state, u32 *bits,
 			   struct extent_changeset *changeset);
 
 /*
@@ -548,7 +548,7 @@ static int insert_state(struct extent_io_tree *tree,
 			struct extent_state *state, u64 start, u64 end,
 			struct rb_node ***p,
 			struct rb_node **parent,
-			unsigned *bits, struct extent_changeset *changeset)
+			u32 *bits, struct extent_changeset *changeset)
 {
 	struct rb_node *node;
 
@@ -629,11 +629,11 @@ static struct extent_state *next_state(struct extent_state *state)
  */
 static struct extent_state *clear_state_bit(struct extent_io_tree *tree,
 					    struct extent_state *state,
-					    unsigned *bits, int wake,
+					    u32 *bits, int wake,
 					    struct extent_changeset *changeset)
 {
 	struct extent_state *next;
-	unsigned bits_to_clear = *bits & ~EXTENT_CTLBITS;
+	u32 bits_to_clear = *bits & ~EXTENT_CTLBITS;
 	int ret;
 
 	if ((bits_to_clear & EXTENT_DIRTY) && (state->state & EXTENT_DIRTY)) {
@@ -700,7 +700,7 @@ static void extent_io_tree_panic(struct extent_io_tree *tree, int err)
  * No error can be returned yet, the ENOMEM for memory is handled by BUG_ON().
  */
 int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		       unsigned bits, struct extent_state **cached_state,
+		       u32 bits, struct extent_state **cached_state,
 		       gfp_t mask, struct extent_io_extra_options *extra_opts)
 {
 	struct extent_changeset *changeset;
@@ -881,7 +881,7 @@ static void wait_on_state(struct extent_io_tree *tree,
  * The tree lock is taken by this function
  */
 static void wait_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-			    unsigned long bits)
+			    u32 bits)
 {
 	struct extent_state *state;
 	struct rb_node *node;
@@ -928,9 +928,9 @@ static void wait_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 
 static void set_state_bits(struct extent_io_tree *tree,
 			   struct extent_state *state,
-			   unsigned *bits, struct extent_changeset *changeset)
+			   u32 *bits, struct extent_changeset *changeset)
 {
-	unsigned bits_to_set = *bits & ~EXTENT_CTLBITS;
+	u32 bits_to_set = *bits & ~EXTENT_CTLBITS;
 	int ret;
 
 	if (tree->private_data && is_data_inode(tree->private_data))
@@ -977,7 +977,7 @@ static void cache_state(struct extent_state *state,
 
 static int __must_check
 __set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		 unsigned bits, struct extent_state **cached_state,
+		 u32 bits, struct extent_state **cached_state,
 		 gfp_t mask, struct extent_io_extra_options *extra_opts)
 {
 	struct extent_state *state;
@@ -1201,8 +1201,7 @@ __set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 }
 
 int set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		   unsigned bits, struct extent_state **cached_state,
-		   gfp_t mask)
+		   u32 bits, struct extent_state **cached_state, gfp_t mask)
 {
 	return __set_extent_bit(tree, start, end, bits, cached_state,
 			        mask, NULL);
@@ -1228,7 +1227,7 @@ int set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
  * All allocations are done with GFP_NOFS.
  */
 int convert_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		       unsigned bits, unsigned clear_bits,
+		       u32 bits, u32 clear_bits,
 		       struct extent_state **cached_state)
 {
 	struct extent_state *state;
@@ -1429,7 +1428,7 @@ int convert_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 
 /* wrappers around set/clear extent bit */
 int set_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
-			   unsigned bits, struct extent_changeset *changeset)
+			   u32 bits, struct extent_changeset *changeset)
 {
 	struct extent_io_extra_options extra_opts = {
 		.changeset = changeset,
@@ -1448,13 +1447,13 @@ int set_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
 }
 
 int set_extent_bits_nowait(struct extent_io_tree *tree, u64 start, u64 end,
-			   unsigned bits)
+			   u32 bits)
 {
 	return __set_extent_bit(tree, start, end, bits, NULL, GFP_NOWAIT, NULL);
 }
 
 int clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		     unsigned bits, int wake, int delete,
+		     u32 bits, int wake, int delete,
 		     struct extent_state **cached)
 {
 	struct extent_io_extra_options extra_opts = {
@@ -1467,7 +1466,7 @@ int clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 }
 
 int clear_record_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
-		unsigned bits, struct extent_changeset *changeset)
+		u32 bits, struct extent_changeset *changeset)
 {
 	struct extent_io_extra_options extra_opts = {
 		.changeset = changeset,
@@ -1559,7 +1558,7 @@ void extent_range_redirty_for_io(struct inode *inode, u64 start, u64 end)
 	}
 }
 
-static bool match_extent_state(struct extent_state *state, unsigned bits,
+static bool match_extent_state(struct extent_state *state, u32 bits,
 			       bool exact_match)
 {
 	if (exact_match)
@@ -1579,7 +1578,7 @@ static bool match_extent_state(struct extent_state *state, unsigned bits,
  */
 static struct extent_state *
 find_first_extent_bit_state(struct extent_io_tree *tree,
-			    u64 start, unsigned bits, bool exact_match)
+			    u64 start, u32 bits, bool exact_match)
 {
 	struct rb_node *node;
 	struct extent_state *state;
@@ -1615,7 +1614,7 @@ find_first_extent_bit_state(struct extent_io_tree *tree,
  * Return 1 if we found nothing.
  */
 int find_first_extent_bit(struct extent_io_tree *tree, u64 start,
-			  u64 *start_ret, u64 *end_ret, unsigned bits,
+			  u64 *start_ret, u64 *end_ret, u32 bits,
 			  bool exact_match, struct extent_state **cached_state)
 {
 	struct extent_state *state;
@@ -1667,7 +1666,7 @@ int find_first_extent_bit(struct extent_io_tree *tree, u64 start,
  * returned will be the full contiguous area with the bits set.
  */
 int find_contiguous_extent_bit(struct extent_io_tree *tree, u64 start,
-			       u64 *start_ret, u64 *end_ret, unsigned bits)
+			       u64 *start_ret, u64 *end_ret, u32 bits)
 {
 	struct extent_state *state;
 	int ret = 1;
@@ -1704,7 +1703,7 @@ int find_contiguous_extent_bit(struct extent_io_tree *tree, u64 start,
  * trim @end_ret to the appropriate size.
  */
 void find_first_clear_extent_bit(struct extent_io_tree *tree, u64 start,
-				 u64 *start_ret, u64 *end_ret, unsigned bits)
+				 u64 *start_ret, u64 *end_ret, u32 bits)
 {
 	struct extent_state *state;
 	struct rb_node *node, *prev = NULL, *next;
@@ -2085,8 +2084,7 @@ noinline_for_stack bool find_lock_delalloc_range(struct inode *inode,
 
 void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
 				  struct page *locked_page,
-				  unsigned clear_bits,
-				  unsigned long page_ops)
+				  u32 clear_bits, unsigned long page_ops)
 {
 	clear_extent_bit(&inode->io_tree, start, end, clear_bits, 1, 0, NULL);
 
@@ -2102,7 +2100,7 @@ void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
  */
 u64 count_range_bits(struct extent_io_tree *tree,
 		     u64 *start, u64 search_end, u64 max_bytes,
-		     unsigned bits, int contig)
+		     u32 bits, int contig)
 {
 	struct rb_node *node;
 	struct extent_state *state;
@@ -2222,7 +2220,7 @@ struct io_failure_record *get_state_failrec(struct extent_io_tree *tree, u64 sta
  * range is found set.
  */
 int test_range_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		   unsigned bits, int filled, struct extent_state *cached)
+		   u32 bits, int filled, struct extent_state *cached)
 {
 	struct extent_state *state = NULL;
 	struct rb_node *node;
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 552afc1c0bbc..602d6568c8ea 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -288,7 +288,7 @@ void extent_range_clear_dirty_for_io(struct inode *inode, u64 start, u64 end);
 void extent_range_redirty_for_io(struct inode *inode, u64 start, u64 end);
 void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
 				  struct page *locked_page,
-				  unsigned bits_to_clear,
+				  u32 bits_to_clear,
 				  unsigned long page_ops);
 struct bio *btrfs_bio_alloc(u64 first_byte);
 struct bio *btrfs_io_bio_alloc(unsigned int nr_iovecs);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 34/68] btrfs: extent_io: use extent_io_tree to handle subpage extent buffer allocation
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (32 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 33/68] btrfs: extent-io: make type of extent_state::state to be at least 32 bits Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 35/68] btrfs: extent_io: make set/clear_extent_buffer_uptodate() to support subpage size Qu Wenruo
                   ` (35 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

Currently btrfs uses page::private as an indicator of which extent
buffer owns the page. This method won't work for subpage support, as one
page can contain several tree blocks (up to 16 for 4K node size and 64K
page size).

Instead, utilize the btree extent io tree to handle them.
For the btree io tree, introduce a new bit, EXTENT_HAS_TREE_BLOCK, to
indicate that we have an in-tree extent buffer for the range.

This affects the following functions:
- alloc_extent_buffer()
  Now for subpage we never use page->private to grab an existing eb.
  Instead, we rely on an extra safety net in alloc_extent_buffer() to
  detect two callers racing on the same eb.

- btrfs_release_extent_buffer_pages()
  Now for subpage, we clear the EXTENT_HAS_TREE_BLOCK bit first, then
  check if the remaining range in the page still has the
  EXTENT_HAS_TREE_BLOCK bit set.
  If not, then clear the private bit for the page.

- attach_extent_buffer_page()
  Now we set the EXTENT_HAS_TREE_BLOCK bit for the extent buffer being
  attached, and mark the page private with NULL as page::private.
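
The bookkeeping above can be sketched in a few lines of userspace C (a
toy model only, not the kernel code: struct toy_subpage, toy_attach_eb()
and toy_detach_eb() are invented names). One bit per nodesize slot
stands in for EXTENT_HAS_TREE_BLOCK, and the page's private state can
only be dropped once no bit remains set:

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE (64 * 1024)	/* 64K page, as in the cover letter */
#define TOY_NODESIZE  (4 * 1024)	/* 4K tree block */

/* One bit per tree-block slot in the page (up to 16 slots here). */
struct toy_subpage {
	uint32_t has_tree_block;
};

/* Rough analogue of attach_extent_buffer_page(): mark the eb's slot. */
static void toy_attach_eb(struct toy_subpage *sp, uint64_t start)
{
	sp->has_tree_block |= 1U << ((start % TOY_PAGE_SIZE) / TOY_NODESIZE);
}

/*
 * Rough analogue of detach_extent_buffer_subpage(): clear the eb's
 * slot, then report whether the page no longer hosts any tree block
 * (if so, its private state could be dropped).
 */
static int toy_detach_eb(struct toy_subpage *sp, uint64_t start)
{
	sp->has_tree_block &= ~(1U << ((start % TOY_PAGE_SIZE) / TOY_NODESIZE));
	return sp->has_tree_block == 0;
}
```

With two ebs attached in the same page, detaching one must not release
the page; detaching the last one may.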

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/btrfs_inode.h    | 12 ++++++
 fs/btrfs/extent-io-tree.h |  2 +-
 fs/btrfs/extent_io.c      | 80 ++++++++++++++++++++++++++++++++++++++-
 3 files changed, 91 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index c47b6c6fea9f..cff818e0c406 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -217,6 +217,18 @@ static inline struct btrfs_inode *BTRFS_I(const struct inode *inode)
 	return container_of(inode, struct btrfs_inode, vfs_inode);
 }
 
+static inline struct btrfs_fs_info *page_to_fs_info(struct page *page)
+{
+	ASSERT(page->mapping);
+	return BTRFS_I(page->mapping->host)->root->fs_info;
+}
+
+static inline struct extent_io_tree
+*info_to_btree_io_tree(struct btrfs_fs_info *fs_info)
+{
+	return &BTRFS_I(fs_info->btree_inode)->io_tree;
+}
+
 static inline unsigned long btrfs_inode_hash(u64 objectid,
 					     const struct btrfs_root *root)
 {
diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
index 176e0e8e1f7c..bdafac1bd15f 100644
--- a/fs/btrfs/extent-io-tree.h
+++ b/fs/btrfs/extent-io-tree.h
@@ -23,7 +23,7 @@ struct io_failure_record;
 #define EXTENT_CLEAR_DATA_RESV	(1U << 13)
 #define EXTENT_DELALLOC_NEW	(1U << 14)
 
-/* For subpage btree io tree, to indicate there is an extent buffer */
+/* For subpage btree io tree, indicates there is an in-tree extent buffer */
 #define EXTENT_HAS_TREE_BLOCK	(1U << 15)
 
 #define EXTENT_DO_ACCOUNTING    (EXTENT_CLEAR_META_RESV | \
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index a7e4d3c65162..d899a75db977 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3163,6 +3163,18 @@ static void attach_extent_buffer_page(struct extent_buffer *eb,
 	if (page->mapping)
 		assert_spin_locked(&page->mapping->private_lock);
 
+	if (btrfs_is_subpage(eb->fs_info) && page->mapping) {
+		struct extent_io_tree *io_tree =
+			info_to_btree_io_tree(eb->fs_info);
+
+		if (!PagePrivate(page))
+			attach_page_private(page, NULL);
+
+		set_extent_bit(io_tree, eb->start, eb->start + eb->len - 1,
+				EXTENT_HAS_TREE_BLOCK, NULL, GFP_ATOMIC);
+		return;
+	}
+
 	if (!PagePrivate(page))
 		attach_page_private(page, eb);
 	else
@@ -4984,6 +4996,36 @@ int extent_buffer_under_io(const struct extent_buffer *eb)
 		test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
 }
 
+static void detach_extent_buffer_subpage(struct extent_buffer *eb)
+{
+	struct btrfs_fs_info *fs_info = eb->fs_info;
+	struct extent_io_tree *io_tree = info_to_btree_io_tree(fs_info);
+	struct page *page = eb->pages[0];
+	bool mapped = !test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags);
+	int ret;
+
+	if (!page)
+		return;
+
+	if (mapped)
+		spin_lock(&page->mapping->private_lock);
+
+	__clear_extent_bit(io_tree, eb->start, eb->start + eb->len - 1,
+			   EXTENT_HAS_TREE_BLOCK, NULL, GFP_ATOMIC, NULL);
+
+	/* Test if we still have other extent buffers in the page range */
+	ret = test_range_bit(io_tree, round_down(eb->start, PAGE_SIZE),
+			     round_down(eb->start, PAGE_SIZE) + PAGE_SIZE - 1,
+			     EXTENT_HAS_TREE_BLOCK, 0, NULL);
+	if (!ret)
+		detach_page_private(eb->pages[0]);
+	if (mapped)
+		spin_unlock(&page->mapping->private_lock);
+
+	/* One for when we allocated the page */
+	put_page(page);
+}
+
 /*
  * Release all pages attached to the extent buffer.
  */
@@ -4995,6 +5037,9 @@ static void btrfs_release_extent_buffer_pages(struct extent_buffer *eb)
 
 	BUG_ON(extent_buffer_under_io(eb));
 
+	if (btrfs_is_subpage(eb->fs_info) && mapped)
+		return detach_extent_buffer_subpage(eb);
+
 	num_pages = num_extent_pages(eb);
 	for (i = 0; i < num_pages; i++) {
 		struct page *page = eb->pages[i];
@@ -5289,6 +5334,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 	struct extent_buffer *exists = NULL;
 	struct page *p;
 	struct address_space *mapping = fs_info->btree_inode->i_mapping;
+	bool subpage = btrfs_is_subpage(fs_info);
 	int uptodate = 1;
 	int ret;
 
@@ -5321,7 +5367,12 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		}
 
 		spin_lock(&mapping->private_lock);
-		if (PagePrivate(p)) {
+		/*
+		 * Subpage support doesn't use page::private at all, so we
+		 * completely rely on the radix insert lock to prevent two
+		 * ebs being allocated for the same bytenr.
+		 */
+		if (PagePrivate(p) && !subpage) {
 			/*
 			 * We could have already allocated an eb for this page
 			 * and attached one so lets see if we can get a ref on
@@ -5362,8 +5413,21 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		 * we could crash.
 		 */
 	}
-	if (uptodate)
+	if (uptodate) {
 		set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
+	} else if (subpage) {
+		/*
+		 * For subpage, we must check the extent_io_tree to see if the
+		 * eb is really uptodate, as page uptodate is only set if the
+		 * whole page is uptodate.
+		 * We can still have uptodate ranges in the page.
+		 */
+		struct extent_io_tree *io_tree = info_to_btree_io_tree(fs_info);
+
+		if (test_range_bit(io_tree, eb->start, eb->start + eb->len - 1,
+				   EXTENT_UPTODATE, 1, NULL))
+			set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
+	}
 again:
 	ret = radix_tree_preload(GFP_NOFS);
 	if (ret) {
@@ -5402,6 +5466,18 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		if (eb->pages[i])
 			unlock_page(eb->pages[i]);
 	}
+	/*
+	 * For subpage case, btrfs_release_extent_buffer() will clear the
+	 * EXTENT_HAS_TREE_BLOCK bit if there is a page.
+	 *
+	 * Since we're here because we hit a race with another caller, who
+	 * succeeded in inserting the eb, we shouldn't clear that
+	 * EXTENT_HAS_TREE_BLOCK bit. So here we cleanup the page manually.
+	 */
+	if (subpage) {
+		put_page(eb->pages[0]);
+		eb->pages[i] = NULL;
+	}
 
 	btrfs_release_extent_buffer(eb);
 	return exists;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 35/68] btrfs: extent_io: make set/clear_extent_buffer_uptodate() to support subpage size
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (33 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 34/68] btrfs: extent_io: use extent_io_tree to handle subpage extent buffer allocation Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 36/68] btrfs: extent_io: make the assert test on page uptodate able to handle subpage Qu Wenruo
                   ` (34 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

For those two functions, supporting subpage size just needs the
following work:
- set/clear the EXTENT_UPTODATE bits for io_tree
- set page Uptodate if the full range of the page is uptodate
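
The sector-granular uptodate tracking can be illustrated with a small
userspace sketch (a toy model with invented names, not the kernel code):
one bit per 4K sector stands in for the EXTENT_UPTODATE range, and the
page flag is only set once every sector bit is set, mirroring what
set_extent_buffer_uptodate() does for subpage:

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE	(64 * 1024)
#define TOY_SECTORSIZE	(4 * 1024)
#define TOY_SECTORS	(TOY_PAGE_SIZE / TOY_SECTORSIZE)	/* 16 */

/* One uptodate bit per sector, standing in for EXTENT_UPTODATE. */
struct toy_page {
	uint16_t uptodate_bits;
	int page_uptodate;	/* the PageUptodate() flag */
};

/*
 * Rough analogue of the subpage set_extent_buffer_uptodate() path:
 * mark the range's sectors, then set the page flag only if the whole
 * page range is now covered.
 */
static void toy_set_uptodate(struct toy_page *p, uint64_t start,
			     uint64_t len)
{
	for (uint64_t off = start; off < start + len; off += TOY_SECTORSIZE)
		p->uptodate_bits |= 1U << (off / TOY_SECTORSIZE);
	if (p->uptodate_bits == (1U << TOY_SECTORS) - 1)
		p->page_uptodate = 1;
}
```

Marking a single 4K eb uptodate leaves the 64K page flag clear; only
covering the full page sets it.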

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d899a75db977..1e959e6e8ce8 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5631,10 +5631,18 @@ bool set_extent_buffer_dirty(struct extent_buffer *eb)
 void clear_extent_buffer_uptodate(struct extent_buffer *eb)
 {
 	int i;
-	struct page *page;
+	struct page *page = eb->pages[0];
 	int num_pages;
 
 	clear_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
+
+	if (btrfs_is_subpage(eb->fs_info) && page->mapping) {
+		struct extent_io_tree *io_tree =
+			info_to_btree_io_tree(eb->fs_info);
+
+		clear_extent_uptodate(io_tree, eb->start,
+				      eb->start + eb->len - 1, NULL);
+	}
 	num_pages = num_extent_pages(eb);
 	for (i = 0; i < num_pages; i++) {
 		page = eb->pages[i];
@@ -5646,10 +5654,26 @@ void clear_extent_buffer_uptodate(struct extent_buffer *eb)
 void set_extent_buffer_uptodate(struct extent_buffer *eb)
 {
 	int i;
-	struct page *page;
+	struct page *page = eb->pages[0];
 	int num_pages;
 
 	set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
+
+	if (btrfs_is_subpage(eb->fs_info) && page->mapping) {
+		struct extent_state *cached = NULL;
+		struct extent_io_tree *io_tree =
+			info_to_btree_io_tree(eb->fs_info);
+		u64 page_start = page_offset(page);
+		u64 page_end = page_offset(page) + PAGE_SIZE - 1;
+
+		set_extent_uptodate(io_tree, eb->start, eb->start + eb->len - 1,
+				    &cached, GFP_NOFS);
+		if (test_range_bit(io_tree, page_start, page_end,
+				   EXTENT_UPTODATE, 1, cached))
+			SetPageUptodate(page);
+		free_extent_state(cached);
+		return;
+	}
 	num_pages = num_extent_pages(eb);
 	for (i = 0; i < num_pages; i++) {
 		page = eb->pages[i];
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 36/68] btrfs: extent_io: make the assert test on page uptodate able to handle subpage
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (34 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 35/68] btrfs: extent_io: make set/clear_extent_buffer_uptodate() to support subpage size Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 37/68] btrfs: extent_io: implement subpage metadata read and its endio function Qu Wenruo
                   ` (33 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

There are quite a few assertions on page uptodate in the extent buffer
write accessors.
They ensure the destination page is already uptodate.

This is fine for the regular sector size case, but not for the subpage
case, as for subpage we only mark the page uptodate if the page contains
no holes and all its extent buffers are uptodate.

So instead of checking PageUptodate(), for the subpage case we check the
EXTENT_UPTODATE bit for the range covered by the extent buffer.

To make the check more elegant, introduce a helper,
assert_eb_range_uptodate(), to do the check for both the subpage and
regular sector size cases.

The following functions are involved:
- write_extent_buffer_chunk_tree_uuid()
- write_extent_buffer_fsid()
- write_extent_buffer()
- memzero_extent_buffer()
- copy_extent_buffer()
- extent_buffer_test_bit()
- extent_buffer_bitmap_set()
- extent_buffer_bitmap_clear()
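
The two-mode check can be sketched in userspace C (a toy model with
invented names; the real helper walks the btree io tree): for subpage,
consult per-sector range bits; otherwise fall back to the page flag:

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-ins for PageUptodate() and the EXTENT_UPTODATE range. */
struct toy_ctx {
	int subpage;		/* sectorsize < PAGE_SIZE */
	int page_uptodate;	/* PageUptodate() */
	uint16_t range_bits;	/* one uptodate bit per 4K sector */
};

/*
 * Rough analogue of assert_eb_range_uptodate(): pick the right source
 * of truth per mode, returning 1 if the eb range is uptodate.
 */
static int toy_eb_range_uptodate(const struct toy_ctx *c,
				 unsigned int first_sector,
				 unsigned int nr_sectors)
{
	if (c->subpage) {
		uint16_t mask = ((1U << nr_sectors) - 1) << first_sector;

		return (c->range_bits & mask) == mask;
	}
	return c->page_uptodate;
}
```

A subpage eb whose sectors are all uptodate passes even while the page
flag is still clear, and vice versa an eb over a hole fails.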

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 44 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 34 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 1e959e6e8ce8..dcc7d4602cea 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5896,12 +5896,36 @@ int memcmp_extent_buffer(const struct extent_buffer *eb, const void *ptrv,
 	return ret;
 }
 
+/*
+ * A helper to ensure that the extent buffer is uptodate.
+ *
+ * For regular sector size == PAGE_SIZE case, check if @page is uptodate.
+ * For subpage case, check if the range covered by the eb has EXTENT_UPTODATE.
+ */
+static void assert_eb_range_uptodate(const struct extent_buffer *eb,
+				     struct page *page)
+{
+	struct btrfs_fs_info *fs_info = eb->fs_info;
+
+	if (btrfs_is_subpage(fs_info) && page->mapping) {
+		struct extent_io_tree *io_tree = info_to_btree_io_tree(fs_info);
+
+		/* For subpage and mapped eb, check the EXTENT_UPTODATE bit. */
+		WARN_ON(!test_range_bit(io_tree, eb->start,
+				eb->start + eb->len - 1, EXTENT_UPTODATE, 1,
+				NULL));
+	} else {
+		/* For regular eb or dummy eb, check the page status directly */
+		WARN_ON(!PageUptodate(page));
+	}
+}
+
 void write_extent_buffer_chunk_tree_uuid(const struct extent_buffer *eb,
 		const void *srcv)
 {
 	char *kaddr;
 
-	WARN_ON(!PageUptodate(eb->pages[0]));
+	assert_eb_range_uptodate(eb, eb->pages[0]);
 	kaddr = page_address(eb->pages[0]) + get_eb_page_offset(eb, 0);
 	memcpy(kaddr + offsetof(struct btrfs_header, chunk_tree_uuid), srcv,
 			BTRFS_FSID_SIZE);
@@ -5911,7 +5935,7 @@ void write_extent_buffer_fsid(const struct extent_buffer *eb, const void *srcv)
 {
 	char *kaddr;
 
-	WARN_ON(!PageUptodate(eb->pages[0]));
+	assert_eb_range_uptodate(eb, eb->pages[0]);
 	kaddr = page_address(eb->pages[0]) + get_eb_page_offset(eb, 0);
 	memcpy(kaddr + offsetof(struct btrfs_header, fsid), srcv,
 			BTRFS_FSID_SIZE);
@@ -5934,7 +5958,7 @@ void write_extent_buffer(const struct extent_buffer *eb, const void *srcv,
 
 	while (len > 0) {
 		page = eb->pages[i];
-		WARN_ON(!PageUptodate(page));
+		assert_eb_range_uptodate(eb, page);
 
 		cur = min(len, PAGE_SIZE - offset);
 		kaddr = page_address(page);
@@ -5963,7 +5987,7 @@ void memzero_extent_buffer(const struct extent_buffer *eb, unsigned long start,
 
 	while (len > 0) {
 		page = eb->pages[i];
-		WARN_ON(!PageUptodate(page));
+		assert_eb_range_uptodate(eb, page);
 
 		cur = min(len, PAGE_SIZE - offset);
 		kaddr = page_address(page);
@@ -6019,7 +6043,7 @@ void copy_extent_buffer(const struct extent_buffer *dst,
 
 	while (len > 0) {
 		page = dst->pages[i];
-		WARN_ON(!PageUptodate(page));
+		assert_eb_range_uptodate(dst, page);
 
 		cur = min(len, (unsigned long)(PAGE_SIZE - offset));
 
@@ -6081,7 +6105,7 @@ int extent_buffer_test_bit(const struct extent_buffer *eb, unsigned long start,
 
 	eb_bitmap_offset(eb, start, nr, &i, &offset);
 	page = eb->pages[i];
-	WARN_ON(!PageUptodate(page));
+	assert_eb_range_uptodate(eb, page);
 	kaddr = page_address(page);
 	return 1U & (kaddr[offset] >> (nr & (BITS_PER_BYTE - 1)));
 }
@@ -6106,7 +6130,7 @@ void extent_buffer_bitmap_set(const struct extent_buffer *eb, unsigned long star
 
 	eb_bitmap_offset(eb, start, pos, &i, &offset);
 	page = eb->pages[i];
-	WARN_ON(!PageUptodate(page));
+	assert_eb_range_uptodate(eb, page);
 	kaddr = page_address(page);
 
 	while (len >= bits_to_set) {
@@ -6117,7 +6141,7 @@ void extent_buffer_bitmap_set(const struct extent_buffer *eb, unsigned long star
 		if (++offset >= PAGE_SIZE && len > 0) {
 			offset = 0;
 			page = eb->pages[++i];
-			WARN_ON(!PageUptodate(page));
+			assert_eb_range_uptodate(eb, page);
 			kaddr = page_address(page);
 		}
 	}
@@ -6149,7 +6173,7 @@ void extent_buffer_bitmap_clear(const struct extent_buffer *eb,
 
 	eb_bitmap_offset(eb, start, pos, &i, &offset);
 	page = eb->pages[i];
-	WARN_ON(!PageUptodate(page));
+	assert_eb_range_uptodate(eb, page);
 	kaddr = page_address(page);
 
 	while (len >= bits_to_clear) {
@@ -6160,7 +6184,7 @@ void extent_buffer_bitmap_clear(const struct extent_buffer *eb,
 		if (++offset >= PAGE_SIZE && len > 0) {
 			offset = 0;
 			page = eb->pages[++i];
-			WARN_ON(!PageUptodate(page));
+			assert_eb_range_uptodate(eb, page);
 			kaddr = page_address(page);
 		}
 	}
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 37/68] btrfs: extent_io: implement subpage metadata read and its endio function
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (35 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 36/68] btrfs: extent_io: make the assert test on page uptodate able to handle subpage Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 38/68] btrfs: extent_io: implement try_release_extent_buffer() for subpage metadata support Qu Wenruo
                   ` (32 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

For subpage metadata read, since we're completely relying on the io tree
rather than page bits, the read submission and endio functions are
different from the regular page size ones.

For the submission part:
- Do extent locking/waiting
  Instead of page locking, we do extent io tree locking, which provides
  subpage granularity locking.

  And since we're no longer relying on full page locking, in theory we
  can submit parallel metadata reads even if they are in the same page.

- Submit the extent page directly
  To simplify the process, as all metadata reads are always contained in
  one page.

For the endio part:
- Do extent locking
  The same as the submission part: instead of page locking, only rely on
  extent io tree locking.

This behavior has a small problem: extent locking/waiting may need to
allocate memory, thus they can all fail.

Currently we're relying on the BUG_ON() in various set_extent_bits()
calls. But when we start handling the errors from them, this approach
would make it more complex to pass all the ENOMEM errors upwards.
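
The sanity checks the subpage endio hook performs can be restated as a
small userspace sketch (illustrative only; -1 stands in for -EUCLEAN,
and the function name is invented):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Rough analogue of the validation at the top of
 * btree_read_subpage_endio_hook(): the range must be sector aligned,
 * and since bio merging is disabled for subpage metadata reads, each
 * endio must cover exactly one tree block.
 */
static int toy_check_tree_read(uint64_t start, uint64_t end,
			       uint32_t sectorsize, uint32_t nodesize)
{
	uint64_t len = end - start + 1;

	if (start % sectorsize || len % sectorsize || len % nodesize)
		return -1;	/* -EUCLEAN in the real hook */
	if (len != nodesize)
		return -1;	/* must be a single tree block */
	return 0;
}
```

An unaligned start or a range spanning two tree blocks is rejected;
one aligned nodesize range passes.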

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c   | 81 ++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/extent_io.c | 74 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 155 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 10bdb0a8a92f..89021e552da0 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -651,6 +651,84 @@ static int btrfs_check_extent_buffer(struct extent_buffer *eb)
 	return ret;
 }
 
+static int btree_read_subpage_endio_hook(struct page *page, u64 start, u64 end,
+					 int mirror)
+{
+	struct btrfs_fs_info *fs_info = page_to_fs_info(page);
+	struct extent_buffer *eb;
+	int reads_done;
+	int ret = 0;
+
+	if (!IS_ALIGNED(start, fs_info->sectorsize) ||
+	    !IS_ALIGNED(end - start + 1, fs_info->sectorsize) ||
+	    !IS_ALIGNED(end - start + 1, fs_info->nodesize)) {
+		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
+		btrfs_err(fs_info, "invalid tree read bytenr");
+		return -EUCLEAN;
+	}
+
+	/*
+	 * We don't allow bio merge for subpage metadata read, so we should
+	 * only get one eb for each endio hook.
+	 */
+	ASSERT(end == start + fs_info->nodesize - 1);
+	ASSERT(PagePrivate(page));
+
+	rcu_read_lock();
+	eb = radix_tree_lookup(&fs_info->buffer_radix,
+			       start / fs_info->sectorsize);
+	rcu_read_unlock();
+
+	/*
+	 * When we are reading one tree block, eb must have been
+	 * inserted into the radix tree. If not something is wrong.
+	 */
+	if (!eb) {
+		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
+		btrfs_err(fs_info,
+			"can't find extent buffer for bytenr %llu",
+			start);
+		return -EUCLEAN;
+	}
+	/*
+	 * The pending IO might have been the only thing that kept
+	 * this buffer in memory.  Make sure we have a ref for all
+	 * this other checks
+	 */
+	atomic_inc(&eb->refs);
+
+	reads_done = atomic_dec_and_test(&eb->io_pages);
+	/* Subpage read must finish in page read */
+	ASSERT(reads_done);
+
+	eb->read_mirror = mirror;
+	if (test_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags)) {
+		ret = -EIO;
+		goto err;
+	}
+	ret = btrfs_check_extent_buffer(eb);
+	if (ret < 0)
+		goto err;
+
+	if (test_and_clear_bit(EXTENT_BUFFER_READAHEAD, &eb->bflags))
+		btree_readahead_hook(eb, ret);
+
+	set_extent_buffer_uptodate(eb);
+
+	free_extent_buffer(eb);
+	return ret;
+err:
+	/*
+	 * our io error hook is going to dec the io pages
+	 * again, we have to make sure it has something to
+	 * decrement
+	 */
+	atomic_inc(&eb->io_pages);
+	clear_extent_buffer_uptodate(eb);
+	free_extent_buffer(eb);
+	return ret;
+}
+
 static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
 				      u64 phy_offset, struct page *page,
 				      u64 start, u64 end, int mirror)
@@ -659,6 +737,9 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
 	int ret = 0;
 	bool reads_done;
 
+	if (btrfs_is_subpage(page_to_fs_info(page)))
+		return btree_read_subpage_endio_hook(page, start, end, mirror);
+
 	/* Metadata pages that goes through IO should all have private set */
 	ASSERT(PagePrivate(page) && page->private);
 	eb = (struct extent_buffer *)page->private;
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index dcc7d4602cea..2f9609d35f0c 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3111,6 +3111,15 @@ static int submit_extent_page(unsigned int opf,
 		else
 			contig = bio_end_sector(bio) == sector;
 
+		/*
+		 * For subpage metadata reads, never merge requests, so that
+		 * the endio hook gets called on each metadata read.
+		 */
+		if (btrfs_is_subpage(page_to_fs_info(page)) &&
+		    tree->owner == IO_TREE_BTREE_INODE_IO &&
+		    (opf & REQ_OP_READ))
+			ASSERT(force_bio_submit);
+
 		ASSERT(tree->ops);
 		if (btrfs_bio_fits_in_stripe(page, io_size, bio, bio_flags))
 			can_merge = false;
@@ -5681,6 +5690,68 @@ void set_extent_buffer_uptodate(struct extent_buffer *eb)
 	}
 }
 
+static int read_extent_buffer_subpage(struct extent_buffer *eb, int wait,
+				      int mirror_num)
+{
+	struct btrfs_fs_info *fs_info = eb->fs_info;
+	struct extent_io_tree *io_tree = info_to_btree_io_tree(fs_info);
+	struct page *page = eb->pages[0];
+	struct bio *bio = NULL;
+	int ret = 0;
+
+	ASSERT(!test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags));
+
+	if (wait == WAIT_NONE) {
+		ret = try_lock_extent(io_tree, eb->start,
+				      eb->start + eb->len - 1);
+		if (ret <= 0)
+			return ret;
+	} else {
+		ret = lock_extent(io_tree, eb->start, eb->start + eb->len - 1);
+		if (ret < 0)
+			return ret;
+	}
+
+	ret = 0;
+	if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags) ||
+	    PageUptodate(page) ||
+	    test_range_bit(io_tree, eb->start, eb->start + eb->len - 1,
+			   EXTENT_UPTODATE, 1, NULL)) {
+		set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
+		unlock_extent(io_tree, eb->start, eb->start + eb->len - 1);
+		return ret;
+	}
+	atomic_set(&eb->io_pages, 1);
+
+	ret = submit_extent_page(REQ_OP_READ | REQ_META, NULL, page, eb->start,
+				 eb->len, eb->start - page_offset(page), &bio,
+				 end_bio_extent_readpage, mirror_num, 0, 0,
+				 true);
+	if (ret) {
+		/*
+		 * In the endio function, if we hit something wrong we will
+		 * increase the io_pages, so here we need to decrease it for
+		 * the error path.
+		 */
+		atomic_dec(&eb->io_pages);
+	}
+	if (bio) {
+		int tmp;
+
+		tmp = submit_one_bio(bio, mirror_num, 0);
+		if (tmp < 0)
+			return tmp;
+	}
+	if (ret || wait != WAIT_COMPLETE)
+		return ret;
+
+	wait_extent_bit(io_tree, eb->start, eb->start + eb->len - 1,
+			EXTENT_LOCKED);
+	if (!test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags))
+		ret = -EIO;
+	return ret;
+}
+
 int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num)
 {
 	int i;
@@ -5697,6 +5768,9 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num)
 	if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags))
 		return 0;
 
+	if (btrfs_is_subpage(eb->fs_info))
+		return read_extent_buffer_subpage(eb, wait, mirror_num);
+
 	num_pages = num_extent_pages(eb);
 	for (i = 0; i < num_pages; i++) {
 		page = eb->pages[i];
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 38/68] btrfs: extent_io: implement try_release_extent_buffer() for subpage metadata support
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (36 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 37/68] btrfs: extent_io: implement subpage metadata read and its endio function Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 39/68] btrfs: extent_io: extract the core of test_range_bit() into test_range_bit_nolock() Qu Wenruo
                   ` (31 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

For try_release_extent_buffer(), we just iterate through all the ranges
with the EXTENT_HAS_TREE_BLOCK bit set, and try freeing each extent
buffer.

Also introduce a helper, find_first_subpage_eb(), to locate the first eb
in a range.
This helper will also be utilized in later subpage patches.
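
The release loop can be modeled in userspace C (a toy sketch with
invented helpers; the real code walks the btree io tree via
find_first_subpage_eb() and release_extent_buffer()): walk the page's
tree-block slots, release the ones that are releasable, and allow the
page to drop its private state only when no slot remains occupied:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Rough analogue of try_release_subpage_eb(): @has_tree_block has one
 * bit per tree-block slot, @releasable[i] says slot i's eb has refs ==
 * 1 and is not under IO. Returns 1 when the page can be released.
 */
static int toy_try_release_page(uint32_t *has_tree_block,
				const int *releasable, int nr_slots)
{
	for (int i = 0; i < nr_slots; i++) {
		if (!(*has_tree_block & (1U << i)))
			continue;	/* no eb at this slot */
		if (releasable[i])
			*has_tree_block &= ~(1U << i);
	}
	/* Page private state can go only if no eb is left. */
	return *has_tree_block == 0;
}
```

A page holding one still-referenced eb cannot be released even after
the other ebs are freed.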

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c   |  6 ++++
 fs/btrfs/extent_io.c | 83 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 89 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 89021e552da0..efbe12e4f952 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1047,6 +1047,12 @@ static int btree_writepages(struct address_space *mapping,
 
 static int btree_readpage(struct file *file, struct page *page)
 {
+	/*
+	 * For subpage, we don't support VFS calling btree_readpages()
+	 * directly.
+	 */
+	if (btrfs_is_subpage(page_to_fs_info(page)))
+		return -ENOTTY;
 	return extent_read_full_page(page, btree_get_extent, 0);
 }
 
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 2f9609d35f0c..6a34b33be1fc 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2772,6 +2772,48 @@ blk_status_t btrfs_submit_read_repair(struct inode *inode,
 	return status;
 }
 
+/*
+ * A helper to locate a subpage extent buffer.
+ *
+ * NOTE: the returned extent buffer won't have its ref increased.
+ *
+ * @extra_bits:		Extra bits to match.
+ * 			The returned eb range will match all extra_bits.
+ *
+ * Return 0 if we found one extent buffer and record it in @eb_ret.
+ * Return 1 if there is no extent buffer in the range.
+ */
+static int find_first_subpage_eb(struct btrfs_fs_info *fs_info,
+				 struct extent_buffer **eb_ret, u64 start,
+				 u64 end, u32 extra_bits)
+{
+	struct extent_io_tree *io_tree = info_to_btree_io_tree(fs_info);
+	u64 found_start;
+	u64 found_end;
+	int ret;
+
+	ASSERT(btrfs_is_subpage(fs_info) && eb_ret);
+
+	ret = find_first_extent_bit(io_tree, start, &found_start, &found_end,
+			EXTENT_HAS_TREE_BLOCK | extra_bits, true, NULL);
+	if (ret > 0 || found_start > end)
+		return 1;
+
+	/* found_start can be smaller than start */
+	start = max(start, found_start);
+
+	/*
+	 * Here we can't call find_extent_buffer() which will increase
+	 * eb->refs.
+	 */
+	rcu_read_lock();
+	*eb_ret = radix_tree_lookup(&fs_info->buffer_radix,
+				    start / fs_info->sectorsize);
+	rcu_read_unlock();
+	ASSERT(*eb_ret);
+	return 0;
+}
+
 /* lots and lots of room for performance fixes in the end_bio funcs */
 
 void end_extent_writepage(struct page *page, int err, u64 start, u64 end)
@@ -6389,10 +6431,51 @@ void memmove_extent_buffer(const struct extent_buffer *dst,
 	}
 }
 
+static int try_release_subpage_eb(struct page *page)
+{
+	struct btrfs_fs_info *fs_info = page_to_fs_info(page);
+	struct extent_io_tree *io_tree = info_to_btree_io_tree(fs_info);
+	u64 cur = page_offset(page);
+	u64 end = page_offset(page) + PAGE_SIZE - 1;
+	int ret;
+
+	while (cur <= end) {
+		struct extent_buffer *eb;
+
+		ret = find_first_subpage_eb(fs_info, &eb, cur, end, 0);
+		if (ret > 0)
+			break;
+
+		cur = eb->start + eb->len;
+
+		spin_lock(&eb->refs_lock);
+		if (atomic_read(&eb->refs) != 1 || extent_buffer_under_io(eb) ||
+		    !test_and_clear_bit(EXTENT_BUFFER_TREE_REF, &eb->bflags)) {
+			spin_unlock(&eb->refs_lock);
+			continue;
+		}
+		/*
+		 * Here we don't care about the return value; we will always check
+		 * the EXTENT_HAS_TREE_BLOCK bit at the end.
+		 */
+		release_extent_buffer(eb);
+	}
+
+	/* Finally check if there is any EXTENT_HAS_TREE_BLOCK bit remaining */
+	if (test_range_bit(io_tree, page_offset(page), end,
+			   EXTENT_HAS_TREE_BLOCK, 0, NULL))
+		ret = 0;
+	else
+		ret = 1;
+	return ret;
+}
+
 int try_release_extent_buffer(struct page *page)
 {
 	struct extent_buffer *eb;
 
+	if (btrfs_is_subpage(page_to_fs_info(page)))
+		return try_release_subpage_eb(page);
 	/*
 	 * We need to make sure nobody is attaching this page to an eb right
 	 * now.
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 39/68] btrfs: extent_io: extract the core of test_range_bit() into test_range_bit_nolock()
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (37 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 38/68] btrfs: extent_io: implement try_release_extent_buffer() for subpage metadata support Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 40/68] btrfs: extent_io: introduce EXTENT_READ_SUBMITTED to handle subpage data read Qu Wenruo
                   ` (30 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

This allows later functions to utilize test_range_bit_nolock(), with the
caller handling the lock.
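
The locked/_nolock split is a common pattern; a minimal userspace
sketch (invented names, using a pthread mutex where the kernel code
uses the tree->lock spinlock) looks like this:

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t toy_tree_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int toy_state;

/* Core logic; the caller must already hold toy_tree_lock. */
static int toy_test_bit_nolock(unsigned int bits)
{
	return (toy_state & bits) != 0;
}

/* Public wrapper keeps the original locking behavior. */
static int toy_test_bit(unsigned int bits)
{
	int ret;

	pthread_mutex_lock(&toy_tree_lock);
	ret = toy_test_bit_nolock(bits);
	pthread_mutex_unlock(&toy_tree_lock);
	return ret;
}
```

Callers that already hold the lock (as later patches will) use the
_nolock variant directly; everyone else keeps calling the wrapper.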

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 32 ++++++++++++++++++++++----------
 1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 6a34b33be1fc..37593b599522 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2213,20 +2213,16 @@ struct io_failure_record *get_state_failrec(struct extent_io_tree *tree, u64 sta
 	return failrec;
 }
 
-/*
- * searches a range in the state tree for a given mask.
- * If 'filled' == 1, this returns 1 only if every extent in the tree
- * has the bits set.  Otherwise, 1 is returned if any bit in the
- * range is found set.
- */
-int test_range_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		   u32 bits, int filled, struct extent_state *cached)
+static int test_range_bit_nolock(struct extent_io_tree *tree, u64 start,
+				 u64 end, u32 bits, int filled,
+				 struct extent_state *cached)
 {
 	struct extent_state *state = NULL;
 	struct rb_node *node;
 	int bitset = 0;
 
-	spin_lock(&tree->lock);
+	assert_spin_locked(&tree->lock);
+
 	if (cached && extent_state_in_tree(cached) && cached->start <= start &&
 	    cached->end > start)
 		node = &cached->rb_node;
@@ -2265,10 +2261,26 @@ int test_range_bit(struct extent_io_tree *tree, u64 start, u64 end,
 			break;
 		}
 	}
-	spin_unlock(&tree->lock);
 	return bitset;
 }
 
+/*
+ * searches a range in the state tree for a given mask.
+ * If 'filled' == 1, this returns 1 only if every extent in the tree
+ * has the bits set.  Otherwise, 1 is returned if any bit in the
+ * range is found set.
+ */
+int test_range_bit(struct extent_io_tree *tree, u64 start, u64 end,
+		   u32 bits, int filled, struct extent_state *cached)
+{
+	int ret;
+
+	spin_lock(&tree->lock);
+	ret = test_range_bit_nolock(tree, start, end, bits, filled, cached);
+	spin_unlock(&tree->lock);
+	return ret;
+}
+
 /*
  * helper function to set a given page up to date if all the
  * extents in the tree for that page are up to date
-- 
2.28.0



* [PATCH v4 40/68] btrfs: extent_io: introduce EXTENT_READ_SUBMITTED to handle subpage data read
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (38 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 39/68] btrfs: extent_io: extract the core of test_range_bit() into test_range_bit_nolock() Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 41/68] btrfs: set btree inode track_uptodate for subpage support Qu Wenruo
                   ` (29 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

In end_bio_extent_readpage(), we unlock the page for each segment.
This is fine for the regular sectorsize == PAGE_SIZE case.

But for the subpage case, we may have several bio segments for the same
page, and unlocking the page unconditionally could easily screw up the
page locking.

To address the problem:
- Introduce a new bit, EXTENT_READ_SUBMITTED
  Now for subpage data read, each submitted read bio will have its range
  with EXTENT_READ_SUBMITTED set.

- Set EXTENT_READ_SUBMITTED in __do_readpage()
  Set the bit for the whole page range.

- Clear and test if we're the last owner of EXTENT_READ_SUBMITTED in
  end_bio_extent_readpage() and __do_readpage()
  This ensures that no matter who finishes filling the page, the last
  owner will unlock the page.

  This is quite different from the regular sectorsize case, where one
  page gets unlocked either in __do_readpage() or in
  end_bio_extent_readpage().
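
The last-owner unlock rule can be modeled with a per-sector bitmap (a minimal user-space sketch; the 16-sector page geometry and all names here are illustrative — the patch itself tracks this with EXTENT_READ_SUBMITTED bits in the io tree rather than a bitmap):

```c
#include <assert.h>
#include <stdbool.h>

/* Model: a 64K page holds 16 4K sectors. Each submitted read marks its
 * sectors; each completion clears them; whoever clears the last marked
 * sector "unlocks" the page. */
#define SECTORS_PER_PAGE 16

struct subpage_read {
	unsigned int submitted;	/* one bit per sector with a read in flight */
	bool page_locked;
};

/* Analogue of setting EXTENT_READ_SUBMITTED at submission time. */
static void submit_read(struct subpage_read *p, int first, int nr)
{
	for (int i = first; i < first + nr; i++)
		p->submitted |= 1u << i;
}

/* Analogue of finish_and_unlock_read_page(): clear our range, and only
 * unlock the page if no other submitted range remains on it. */
static void finish_read(struct subpage_read *p, int first, int nr)
{
	for (int i = first; i < first + nr; i++)
		p->submitted &= ~(1u << i);
	if (p->submitted == 0)
		p->page_locked = false;	/* last owner unlocks the page */
}
```

Whichever completion path (endio or the inline paths in __do_readpage()) clears the final bit performs the unlock, so the unlock happens exactly once regardless of completion order.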

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent-io-tree.h |  22 ++++++++
 fs/btrfs/extent_io.c      | 115 +++++++++++++++++++++++++++++++++++---
 2 files changed, 129 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
index bdafac1bd15f..d3b21c732634 100644
--- a/fs/btrfs/extent-io-tree.h
+++ b/fs/btrfs/extent-io-tree.h
@@ -26,6 +26,15 @@ struct io_failure_record;
 /* For subpage btree io tree, indicates there is an in-tree extent buffer */
 #define EXTENT_HAS_TREE_BLOCK	(1U << 15)
 
+/*
+ * For subpage data io tree, indicates there is a read bio submitted.
+ * The last one to clear the bit in the page will be responsible for
+ * unlocking the containing page.
+ *
+ * TODO: Remove this if we use iomap for data read.
+ */
+#define EXTENT_READ_SUBMITTED	(1U << 16)
+
 #define EXTENT_DO_ACCOUNTING    (EXTENT_CLEAR_META_RESV | \
 				 EXTENT_CLEAR_DATA_RESV)
 #define EXTENT_CTLBITS		(EXTENT_DO_ACCOUNTING)
@@ -115,6 +124,19 @@ struct extent_io_extra_options {
 	 */
 	bool wake;
 	bool delete;
+
+	/*
+	 * For __clear_extent_bit(), to skip the spin lock and rely on caller
+	 * for the lock.
+	 * This allows the caller to do test-and-clear in a spinlock.
+	 */
+	bool skip_lock;
+
+	/*
+	 * For __clear_extent_bit(), paired with skip_lock, to provide the
+	 * preallocated extent_state.
+	 */
+	struct extent_state *prealloc;
 };
 
 int __init extent_state_cache_init(void);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 37593b599522..5254a4ce2598 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -710,6 +710,7 @@ int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 	struct rb_node *node;
 	bool wake;
 	bool delete;
+	bool skip_lock;
 	u64 last_end;
 	int err;
 	int clear = 0;
@@ -719,8 +720,13 @@ int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 	changeset = extra_opts->changeset;
 	wake = extra_opts->wake;
 	delete = extra_opts->delete;
+	skip_lock = extra_opts->skip_lock;
 
-	btrfs_debug_check_extent_io_range(tree, start, end);
+	if (skip_lock)
+		ASSERT(!gfpflags_allow_blocking(mask));
+
+	if (!skip_lock)
+		btrfs_debug_check_extent_io_range(tree, start, end);
 	trace_btrfs_clear_extent_bit(tree, start, end - start + 1, bits);
 
 	if (bits & EXTENT_DELALLOC)
@@ -742,8 +748,11 @@ int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 		 */
 		prealloc = alloc_extent_state(mask);
 	}
+	if (!prealloc && skip_lock)
+		prealloc = extra_opts->prealloc;
 
-	spin_lock(&tree->lock);
+	if (!skip_lock)
+		spin_lock(&tree->lock);
 	if (cached_state) {
 		cached = *cached_state;
 
@@ -848,15 +857,20 @@ int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 search_again:
 	if (start > end)
 		goto out;
-	spin_unlock(&tree->lock);
-	if (gfpflags_allow_blocking(mask))
-		cond_resched();
+	if (!skip_lock) {
+		spin_unlock(&tree->lock);
+		if (gfpflags_allow_blocking(mask))
+			cond_resched();
+	}
 	goto again;
 
 out:
-	spin_unlock(&tree->lock);
+	if (!skip_lock)
+		spin_unlock(&tree->lock);
 	if (prealloc)
 		free_extent_state(prealloc);
+	if (skip_lock)
+		extra_opts->prealloc = NULL;
 
 	return 0;
 
@@ -2926,6 +2940,70 @@ endio_readpage_release_extent(struct extent_io_tree *tree, struct page *page,
 	unlock_extent_cached_atomic(tree, start, end, &cached);
 }
 
+/*
+ * Finish the read and unlock the page if needed.
+ *
+ * For regular sectorsize == PAGE_SIZE case, just unlock the page.
+ * For subpage case, clear the EXTENT_READ_SUBMITTED bit, then if and
+ * only if we're the last EXTENT_READ_SUBMITTED of the page.
+ */
+static void finish_and_unlock_read_page(struct btrfs_fs_info *fs_info,
+		struct extent_io_tree *tree, u64 start, u64 end,
+		struct page *page, bool in_endio_context)
+{
+	struct extent_io_extra_options extra_opts = {
+		.skip_lock = true,
+	};
+	u64 page_start = round_down(start, PAGE_SIZE);
+	u64 page_end = page_start + PAGE_SIZE - 1;
+	bool metadata = (tree->owner == IO_TREE_BTREE_INODE_IO);
+	bool has_bit = true;
+	bool last_owner = false;
+
+	/*
+	 * For subpage metadata, we don't lock page for read/write at all,
+	 * just exit.
+	 */
+	if (btrfs_is_subpage(fs_info) && metadata)
+		return;
+
+	/* For regular sector size, we need to unlock the full page for endio */
+	if (!btrfs_is_subpage(fs_info)) {
+		/*
+		 * This function can be called in __do_readpage(), in that case we
+		 * shouldn't unlock the page.
+		 */
+		if (in_endio_context)
+			unlock_page(page);
+		return;
+	}
+
+	/*
+	 * The remaining case is subpage data read, which we need to update
+	 * EXTENT_READ_SUBMITTED and unlock the page for the last reader.
+	 */
+	ASSERT(end <= page_end);
+
+	/* Will be freed in __clear_extent_bit() */
+	extra_opts.prealloc = alloc_extent_state(GFP_NOFS);
+
+	spin_lock(&tree->lock);
+	/* Check if we have the bit first */
+	if (IS_ENABLED(CONFIG_BTRFS_DEBUG)) {
+		has_bit = test_range_bit_nolock(tree, start, end,
+				EXTENT_READ_SUBMITTED, 1, NULL);
+		WARN_ON(!has_bit);
+	}
+
+	__clear_extent_bit(tree, start, end, EXTENT_READ_SUBMITTED, NULL,
+			   GFP_ATOMIC, &extra_opts);
+	last_owner = !test_range_bit_nolock(tree, page_start, page_end,
+					    EXTENT_READ_SUBMITTED, 0, NULL);
+	spin_unlock(&tree->lock);
+	if (has_bit && last_owner)
+		unlock_page(page);
+}
+
 /*
  * after a readpage IO is done, we need to:
  * clear the uptodate bits on error
@@ -3050,7 +3128,7 @@ static void end_bio_extent_readpage(struct bio *bio)
 		offset += len;
 
 		endio_readpage_release_extent(tree, page, start, end, uptodate);
-		unlock_page(page);
+		finish_and_unlock_read_page(fs_info, tree, start, end, page, true);
 	}
 
 	btrfs_io_bio_free_csum(io_bio);
@@ -3277,6 +3355,7 @@ __get_extent_map(struct inode *inode, struct page *page, size_t pg_offset,
 	}
 	return em;
 }
+
 /*
  * basic readpage implementation.  Locked extent state structs are inserted
  * into the tree that are removed when the IO is done (by the end_io
@@ -3292,6 +3371,7 @@ static int __do_readpage(struct page *page,
 			 u64 *prev_em_start)
 {
 	struct inode *inode = page->mapping->host;
+	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
 	u64 start = page_offset(page);
 	const u64 end = start + PAGE_SIZE - 1;
 	u64 cur = start;
@@ -3330,6 +3410,9 @@ static int __do_readpage(struct page *page,
 			kunmap_atomic(userpage);
 		}
 	}
+
+	if (btrfs_is_subpage(fs_info))
+		set_extent_bits(tree, start, end, EXTENT_READ_SUBMITTED);
 	while (cur <= end) {
 		bool force_bio_submit = false;
 		u64 offset;
@@ -3347,6 +3430,8 @@ static int __do_readpage(struct page *page,
 					    &cached, GFP_NOFS);
 			unlock_extent_cached(tree, cur,
 					     cur + iosize - 1, &cached);
+			finish_and_unlock_read_page(fs_info, tree, cur,
+						cur + iosize - 1, page, false);
 			break;
 		}
 		em = __get_extent_map(inode, page, pg_offset, cur,
@@ -3354,6 +3439,8 @@ static int __do_readpage(struct page *page,
 		if (IS_ERR_OR_NULL(em)) {
 			SetPageError(page);
 			unlock_extent(tree, cur, end);
+			finish_and_unlock_read_page(fs_info, tree, cur,
+						cur + iosize - 1, page, false);
 			break;
 		}
 		extent_offset = cur - em->start;
@@ -3436,6 +3523,8 @@ static int __do_readpage(struct page *page,
 					    &cached, GFP_NOFS);
 			unlock_extent_cached(tree, cur,
 					     cur + iosize - 1, &cached);
+			finish_and_unlock_read_page(fs_info, tree, cur,
+						cur + iosize - 1, page, false);
 			cur = cur + iosize;
 			pg_offset += iosize;
 			continue;
@@ -3445,6 +3534,8 @@ static int __do_readpage(struct page *page,
 				   EXTENT_UPTODATE, 1, NULL)) {
 			check_page_uptodate(tree, page);
 			unlock_extent(tree, cur, cur + iosize - 1);
+			finish_and_unlock_read_page(fs_info, tree, cur,
+						cur + iosize - 1, page, false);
 			cur = cur + iosize;
 			pg_offset += iosize;
 			continue;
@@ -3455,6 +3546,8 @@ static int __do_readpage(struct page *page,
 		if (block_start == EXTENT_MAP_INLINE) {
 			SetPageError(page);
 			unlock_extent(tree, cur, cur + iosize - 1);
+			finish_and_unlock_read_page(fs_info, tree, cur,
+						cur + iosize - 1, page, false);
 			cur = cur + iosize;
 			pg_offset += iosize;
 			continue;
@@ -3482,7 +3575,13 @@ static int __do_readpage(struct page *page,
 	if (!nr) {
 		if (!PageError(page))
 			SetPageUptodate(page);
-		unlock_page(page);
+		/*
+		 * Subpage case will unlock the page in
+		 * finish_and_unlock_read_page() according to the
+		 * EXTENT_READ_SUBMITTED status.
+		 */
+		if (!btrfs_is_subpage(fs_info))
+			unlock_page(page);
 	}
 	return ret;
 }
-- 
2.28.0



* [PATCH v4 41/68] btrfs: set btree inode track_uptodate for subpage support
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (39 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 40/68] btrfs: extent_io: introduce EXTENT_READ_SUBMITTED to handle subpage data read Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 42/68] btrfs: allow RO mount of 4K sector size fs on 64K page system Qu Wenruo
                   ` (28 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

Make the btree io tree track the EXTENT_UPTODATE bit, so that for
subpage metadata IO we don't need to bother tracking the UPTODATE
status manually through the bio submission/endio functions.

Currently only the subpage metadata case will clean up the extra bits
utilized (EXTENT_HAS_TREE_BLOCK, EXTENT_UPTODATE, EXTENT_LOCKED), while
the regular page size case will only clean up EXTENT_LOCKED.

This still allows the regular page size case to avoid the extra delay in
extent io tree operations, but allows subpage case to be sector size
aligned.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index efbe12e4f952..97c44f518a49 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2244,7 +2244,14 @@ static void btrfs_init_btree_inode(struct btrfs_fs_info *fs_info)
 	RB_CLEAR_NODE(&BTRFS_I(inode)->rb_node);
 	extent_io_tree_init(fs_info, &BTRFS_I(inode)->io_tree,
 			    IO_TREE_BTREE_INODE_IO, inode);
-	BTRFS_I(inode)->io_tree.track_uptodate = false;
+	/*
+	 * For subpage size support, btree inode tracks EXTENT_UPTODATE for
+	 * its IO.
+	 */
+	if (btrfs_is_subpage(fs_info))
+		BTRFS_I(inode)->io_tree.track_uptodate = true;
+	else
+		BTRFS_I(inode)->io_tree.track_uptodate = false;
 	extent_map_tree_init(&BTRFS_I(inode)->extent_tree);
 
 	BTRFS_I(inode)->io_tree.ops = &btree_extent_io_ops;
-- 
2.28.0



* [PATCH v4 42/68] btrfs: allow RO mount of 4K sector size fs on 64K page system
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (40 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 41/68] btrfs: set btree inode track_uptodate for subpage support Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-29 20:11   ` David Sterba
  2020-10-29 23:34   ` Michał Mirosław
  2020-10-21  6:25 ` [PATCH v4 43/68] btrfs: disk-io: allow btree_set_page_dirty() to do more sanity check on subpage metadata Qu Wenruo
                   ` (27 subsequent siblings)
  69 siblings, 2 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

This adds the basic RO mount ability for 4K sector size on 64K page
systems.

Currently we only plan to support 4K and 64K page sizes.
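
The (page size, sector size) policy enforced by the validate_super() and open_ctree() hunks below can be sketched as a small predicate (an illustrative user-space model, not the kernel code):

```c
#include <assert.h>

#define SZ_4K	4096u
#define SZ_64K	65536u

enum support { UNSUPPORTED, RW_OK, RO_ONLY };

/* Mirrors the policy in the patch: 4K pages support only 4K sectors;
 * 64K pages support 64K sectors read-write, and 4K sectors (subpage)
 * read-only for now. */
static enum support sector_support(unsigned int page_size,
				   unsigned int sectorsize)
{
	if (page_size == SZ_4K)
		return sectorsize == SZ_4K ? RW_OK : UNSUPPORTED;
	if (page_size == SZ_64K) {
		if (sectorsize == SZ_64K)
			return RW_OK;
		if (sectorsize == SZ_4K)
			return RO_ONLY;	/* subpage: RO mount only, yet */
	}
	return UNSUPPORTED;
}
```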

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 24 +++++++++++++++++++++---
 fs/btrfs/super.c   |  7 +++++++
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 97c44f518a49..e0dc7b92411e 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2565,13 +2565,21 @@ static int validate_super(struct btrfs_fs_info *fs_info,
 		btrfs_err(fs_info, "invalid sectorsize %llu", sectorsize);
 		ret = -EINVAL;
 	}
-	/* Only PAGE SIZE is supported yet */
-	if (sectorsize != PAGE_SIZE) {
+
+	/*
+	 * For 4K page size, we only support 4K sector size.
+	 * For 64K page size, we support RW for 64K sector size, and RO for
+	 * 4K sector size.
+	 */
+	if ((PAGE_SIZE == SZ_4K && sectorsize != PAGE_SIZE) ||
+	    (PAGE_SIZE == SZ_64K && (sectorsize != SZ_4K &&
+				     sectorsize != SZ_64K))) {
 		btrfs_err(fs_info,
-			"sectorsize %llu not supported yet, only support %lu",
+			"sectorsize %llu not supported yet for page size %lu",
 			sectorsize, PAGE_SIZE);
 		ret = -EINVAL;
 	}
+
 	if (!is_power_of_2(nodesize) || nodesize < sectorsize ||
 	    nodesize > BTRFS_MAX_METADATA_BLOCKSIZE) {
 		btrfs_err(fs_info, "invalid nodesize %llu", nodesize);
@@ -3219,6 +3227,16 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 		goto fail_alloc;
 	}
 
+	/* For 4K sector size support, it's only read-only yet */
+	if (PAGE_SIZE == SZ_64K && sectorsize == SZ_4K) {
+		if (!sb_rdonly(sb) || btrfs_super_log_root(disk_super)) {
+			btrfs_err(fs_info,
+				"subpage sector size only support RO yet");
+			err = -EINVAL;
+			goto fail_alloc;
+		}
+	}
+
 	ret = btrfs_init_workqueues(fs_info, fs_devices);
 	if (ret) {
 		err = ret;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 25967ecaaf0a..743a2fadf4ee 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1922,6 +1922,13 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 			ret = -EINVAL;
 			goto restore;
 		}
+		if (btrfs_is_subpage(fs_info)) {
+			btrfs_warn(fs_info,
+	"read-write mount is not yet allowed for sector size %u page size %lu",
+				   fs_info->sectorsize, PAGE_SIZE);
+			ret = -EINVAL;
+			goto restore;
+		}
 
 		ret = btrfs_cleanup_fs_roots(fs_info);
 		if (ret)
-- 
2.28.0



* [PATCH v4 43/68] btrfs: disk-io: allow btree_set_page_dirty() to do more sanity check on subpage metadata
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (41 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 42/68] btrfs: allow RO mount of 4K sector size fs on 64K page system Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 44/68] btrfs: disk-io: support subpage metadata csum calculation at write time Qu Wenruo
                   ` (26 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

For btree_set_page_dirty(), we should also check the extent buffer
sanity for subpage support.

Unlike the regular sector size case, one page can contain multiple
extent buffers, and page::private no longer contains a pointer to the
extent buffer.

So this patch iterates through the extent_io_tree to find any
EXTENT_HAS_TREE_BLOCK bit, and checks whether any extent buffer in the
page range has EXTENT_BUFFER_DIRTY and proper refs.

Also, since we need to find subpage extent outside of extent_io.c,
export find_first_subpage_eb() as btrfs_find_first_subpage_eb().
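
The debug walk described above can be sketched in miniature (hypothetical types and helpers standing in for btrfs_find_first_subpage_eb() and the eb dirty flag; the array here plays the role of the io tree):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy tree block: start/len in bytes within the address space. */
struct mini_eb {
	unsigned long start;
	unsigned long len;
	bool dirty;
};

/* Find the first block intersecting [cur, end], assuming the array is
 * sorted by start; NULL mirrors the "ret > 0" (not found) case of
 * btrfs_find_first_subpage_eb(). */
static const struct mini_eb *find_first_eb(const struct mini_eb *ebs,
					   int nr, unsigned long cur,
					   unsigned long end)
{
	for (int i = 0; i < nr; i++)
		if (ebs[i].start + ebs[i].len > cur && ebs[i].start <= end)
			return &ebs[i];
	return NULL;
}

/* Walk the page range like the subpage branch of btree_set_page_dirty():
 * the page may only be marked dirty when at least one contained tree
 * block is dirty. */
static bool page_has_dirty_eb(const struct mini_eb *ebs, int nr,
			      unsigned long page_start,
			      unsigned long page_end)
{
	unsigned long cur = page_start;

	while (cur <= page_end) {
		const struct mini_eb *eb = find_first_eb(ebs, nr, cur,
							 page_end);
		if (!eb)
			break;
		if (eb->dirty)
			return true;
		cur = eb->start + eb->len;	/* skip past this block */
	}
	return false;
}
```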

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c   | 36 ++++++++++++++++++++++++++++++------
 fs/btrfs/extent_io.c |  8 ++++----
 fs/btrfs/extent_io.h |  4 ++++
 3 files changed, 38 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index e0dc7b92411e..d31999978821 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1110,14 +1110,38 @@ static void btree_invalidatepage(struct page *page, unsigned int offset,
 static int btree_set_page_dirty(struct page *page)
 {
 #ifdef DEBUG
+	struct btrfs_fs_info *fs_info = page_to_fs_info(page);
 	struct extent_buffer *eb;
 
-	BUG_ON(!PagePrivate(page));
-	eb = (struct extent_buffer *)page->private;
-	BUG_ON(!eb);
-	BUG_ON(!test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
-	BUG_ON(!atomic_read(&eb->refs));
-	btrfs_assert_tree_locked(eb);
+	if (fs_info->sectorsize == PAGE_SIZE) {
+		BUG_ON(!PagePrivate(page));
+		eb = (struct extent_buffer *)page->private;
+		BUG_ON(!eb);
+		BUG_ON(!test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
+		BUG_ON(!atomic_read(&eb->refs));
+		btrfs_assert_tree_locked(eb);
+	} else {
+		u64 page_start = page_offset(page);
+		u64 page_end = page_start + PAGE_SIZE - 1;
+		u64 cur = page_start;
+		bool found_dirty_eb = false;
+		int ret;
+
+		ASSERT(btrfs_is_subpage(fs_info));
+		while (cur <= page_end) {
+			ret = btrfs_find_first_subpage_eb(fs_info, &eb, cur,
+							  page_end, 0);
+			if (ret > 0)
+				break;
+			cur = eb->start + eb->len;
+			if (test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags)) {
+				found_dirty_eb = true;
+				ASSERT(atomic_read(&eb->refs));
+				btrfs_assert_tree_locked(eb);
+			}
+		}
+		BUG_ON(!found_dirty_eb);
+	}
 #endif
 	return __set_page_dirty_nobuffers(page);
 }
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 5254a4ce2598..278154d405ea 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2809,9 +2809,9 @@ blk_status_t btrfs_submit_read_repair(struct inode *inode,
  * Return 0 if we found one extent buffer and record it in @eb_ret.
  * Return 1 if there is no extent buffer in the range.
  */
-static int find_first_subpage_eb(struct btrfs_fs_info *fs_info,
-				 struct extent_buffer **eb_ret, u64 start,
-				 u64 end, u32 extra_bits)
+int btrfs_find_first_subpage_eb(struct btrfs_fs_info *fs_info,
+				struct extent_buffer **eb_ret, u64 start,
+				u64 end, u32 extra_bits)
 {
 	struct extent_io_tree *io_tree = info_to_btree_io_tree(fs_info);
 	u64 found_start;
@@ -6553,7 +6553,7 @@ static int try_release_subpage_eb(struct page *page)
 	while (cur <= end) {
 		struct extent_buffer *eb;
 
-		ret = find_first_subpage_eb(fs_info, &eb, cur, end, 0);
+		ret = btrfs_find_first_subpage_eb(fs_info, &eb, cur, end, 0);
 		if (ret > 0)
 			break;
 
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 602d6568c8ea..f527b6fa258d 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -298,6 +298,10 @@ struct bio *btrfs_bio_clone_partial(struct bio *orig, int offset, int size);
 struct btrfs_fs_info;
 struct btrfs_inode;
 
+int btrfs_find_first_subpage_eb(struct btrfs_fs_info *fs_info,
+				struct extent_buffer **eb_ret, u64 start,
+				u64 end, unsigned int extra_bits);
+
 int repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
 		      u64 length, u64 logical, struct page *page,
 		      unsigned int pg_offset, int mirror_num);
-- 
2.28.0



* [PATCH v4 44/68] btrfs: disk-io: support subpage metadata csum calculation at write time
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (42 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 43/68] btrfs: disk-io: allow btree_set_page_dirty() to do more sanity check on subpage metadata Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 45/68] btrfs: extent_io: prevent extent_state from being merged for btree io tree Qu Wenruo
                   ` (25 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

Add a new helper, csum_dirty_subpage_buffers(), to iterate through all
possible extent buffers in one bvec.

Also extract the code to calculate csum for one extent buffer into
csum_one_extent_buffer(), so that both the existing csum_dirty_buffer()
and the new csum_dirty_subpage_buffers() can reuse the same routine.
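
The refactor (a shared per-buffer helper plus a subpage loop over one bvec range) can be sketched outside the kernel. The additive checksum and all struct/function names below are stand-ins, not the crc32c/xxhash csums btrfs actually uses:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy tree block with an in-memory payload and a csum slot. */
struct fake_eb {
	uint64_t start;
	uint32_t len;
	const uint8_t *data;
	uint32_t csum;		/* filled in by csum_one() */
};

/* Analogue of csum_one_extent_buffer(): csum a single tree block.
 * (Trivial additive checksum for illustration only.) */
static void csum_one(struct fake_eb *eb)
{
	uint32_t sum = 0;

	for (uint32_t i = 0; i < eb->len; i++)
		sum += eb->data[i];
	eb->csum = sum;
}

/* Analogue of csum_dirty_subpage_buffers(): walk [start, end], csum
 * each block found, and advance past it. Returns blocks csummed. */
static int csum_range(struct fake_eb *ebs, int nr, uint64_t start,
		      uint64_t end)
{
	uint64_t cur = start;
	int count = 0;

	while (cur <= end) {
		struct fake_eb *eb = NULL;

		for (int i = 0; i < nr; i++) {
			if (ebs[i].start + ebs[i].len > cur &&
			    ebs[i].start <= end) {
				eb = &ebs[i];
				break;
			}
		}
		if (!eb)
			break;
		csum_one(eb);
		count++;
		cur = eb->start + eb->len;
	}
	return count;
}
```

Factoring the per-buffer work into one helper is what lets the regular (one eb per page) path and the subpage (many ebs per bvec) path share the same csum routine.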

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 103 ++++++++++++++++++++++++++++++++++-----------
 1 file changed, 79 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d31999978821..9aa68e2344e1 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -490,35 +490,13 @@ static int btree_read_extent_buffer_pages(struct extent_buffer *eb,
 	return ret;
 }
 
-/*
- * checksum a dirty tree block before IO.  This has extra checks to make sure
- * we only fill in the checksum field in the first page of a multi-page block
- */
-
-static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct bio_vec *bvec)
+static int csum_one_extent_buffer(struct extent_buffer *eb)
 {
-	struct extent_buffer *eb;
-	struct page *page = bvec->bv_page;
-	u64 start = page_offset(page);
-	u64 found_start;
+	struct btrfs_fs_info *fs_info = eb->fs_info;
 	u8 result[BTRFS_CSUM_SIZE];
 	u16 csum_size = btrfs_super_csum_size(fs_info->super_copy);
 	int ret;
 
-	eb = (struct extent_buffer *)page->private;
-	if (page != eb->pages[0])
-		return 0;
-
-	found_start = btrfs_header_bytenr(eb);
-	/*
-	 * Please do not consolidate these warnings into a single if.
-	 * It is useful to know what went wrong.
-	 */
-	if (WARN_ON(found_start != start))
-		return -EUCLEAN;
-	if (WARN_ON(!PageUptodate(page)))
-		return -EUCLEAN;
-
 	ASSERT(memcmp_extent_buffer(eb, fs_info->fs_devices->metadata_uuid,
 				    offsetof(struct btrfs_header, fsid),
 				    BTRFS_FSID_SIZE) == 0);
@@ -543,6 +521,83 @@ static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct bio_vec *bvec
 	return 0;
 }
 
+/*
+ * Do all the csum calculation and extra sanity checks on all extent
+ * buffers in the bvec.
+ */
+static int csum_dirty_subpage_buffers(struct btrfs_fs_info *fs_info,
+				      struct bio_vec *bvec)
+{
+	struct page *page = bvec->bv_page;
+	u64 page_start = page_offset(page);
+	u64 start = page_start + bvec->bv_offset;
+	u64 end = start + bvec->bv_len - 1;
+	u64 cur = start;
+	int ret = 0;
+
+	while (cur <= end) {
+		struct extent_io_tree *io_tree = info_to_btree_io_tree(fs_info);
+		struct extent_buffer *eb;
+
+		ret = btrfs_find_first_subpage_eb(fs_info, &eb, cur, end, 0);
+		if (ret > 0) {
+			ret = 0;
+			break;
+		}
+
+		/*
+		 * Here we can't use PageUptodate() to check the status,
+		 * as one page is uptodate only when all its extent buffers
+		 * are uptodate, with no holes between them.
+		 * So here we use the EXTENT_UPTODATE bit to make sure the
+		 * extent buffer is uptodate.
+		 */
+		if (WARN_ON(test_range_bit(io_tree, eb->start,
+				eb->start + eb->len - 1, EXTENT_UPTODATE, 1,
+				NULL) == 0))
+			return -EUCLEAN;
+		if (WARN_ON(cur != btrfs_header_bytenr(eb)))
+			return -EUCLEAN;
+
+		ret = csum_one_extent_buffer(eb);
+		if (ret < 0)
+			return ret;
+		cur = eb->start + eb->len;
+	}
+	return ret;
+}
+
+/*
+ * checksum a dirty tree block before IO.  This has extra checks to make sure
+ * we only fill in the checksum field in the first page of a multi-page block
+ */
+static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct bio_vec *bvec)
+{
+	struct extent_buffer *eb;
+	struct page *page = bvec->bv_page;
+	u64 start = page_offset(page) + bvec->bv_offset;
+	u64 found_start;
+
+	if (btrfs_is_subpage(fs_info))
+		return csum_dirty_subpage_buffers(fs_info, bvec);
+
+	eb = (struct extent_buffer *)page->private;
+	if (page != eb->pages[0])
+		return 0;
+
+	found_start = btrfs_header_bytenr(eb);
+	/*
+	 * Please do not consolidate these warnings into a single if.
+	 * It is useful to know what went wrong.
+	 */
+	if (WARN_ON(found_start != start))
+		return -EUCLEAN;
+	if (WARN_ON(!PageUptodate(page)))
+		return -EUCLEAN;
+
+	return csum_one_extent_buffer(eb);
+}
+
 static int check_tree_block_fsid(struct extent_buffer *eb)
 {
 	struct btrfs_fs_info *fs_info = eb->fs_info;
-- 
2.28.0



* [PATCH v4 45/68] btrfs: extent_io: prevent extent_state from being merged for btree io tree
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (43 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 44/68] btrfs: disk-io: support subpage metadata csum calculation at write time Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 46/68] btrfs: extent_io: make set_extent_buffer_dirty() to support subpage sized metadata Qu Wenruo
                   ` (24 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

For incoming subpage metadata rw support, prevent extent_state from
being merged for the btree io tree.

The main cause is set_extent_buffer_dirty().

In the following call chain, we could fall into the situation where we
have to call set_extent_dirty() with atomic context:

alloc_reserved_tree_block()
|- path->leave_spinning = 1;
|- btrfs_insert_empty_item()
   |- btrfs_search_slot()
   |  Now the path has all its tree block spinning locked
   |- setup_items_for_insert();
   |- btrfs_unlock_up_safe(path, 1);
   |  Now path->nodes[0] still spin locked
   |- btrfs_mark_buffer_dirty(leaf);
      |- set_extent_buffer_dirty()

Since set_extent_buffer_dirty() is in fact a pretty common call, just
falling back to the GFP_ATOMIC allocation used in __set_extent_bit()
may exhaust the pool sooner than we expect.

So this patch goes another direction, by not merging all extent_state
for subpage btree io tree.

For the subpage btree io tree, all in-tree extent buffers have the
EXTENT_HAS_TREE_BLOCK bit set during their lifespan. As long as
extent_states are not merged, each extent buffer has its own
extent_state, so set/clear_extent_bit() can reuse the existing
extent_state without allocating new memory.

The cost is obvious: around 150 bytes per subpage extent buffer.
But considering that each subpage extent buffer saves 15 page pointers
(120 bytes), the net cost is just around 30 bytes per subpage extent
buffer, which should be acceptable.
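
The merge gating can be sketched as a predicate (a toy model; the bit values and type names below are illustrative, mirroring the merge_state() change in the diff):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values, standing in for the real extent bits. */
#define EXTENT_LOCKED	(1u << 0)
#define EXTENT_BOUNDARY	(1u << 1)
#define EXTENT_UPTODATE	(1u << 2)

struct mini_state {
	unsigned long start;
	unsigned long end;
	unsigned int bits;
};

struct mini_tree {
	bool never_merge;	/* set for the subpage btree io tree */
};

/* States merge only when adjacent and identical, and neither the
 * LOCKED/BOUNDARY bits nor the tree-wide never_merge flag forbid it. */
static bool can_merge(const struct mini_tree *tree,
		      const struct mini_state *prev,
		      const struct mini_state *next)
{
	if (tree->never_merge)
		return false;
	if ((prev->bits | next->bits) & (EXTENT_LOCKED | EXTENT_BOUNDARY))
		return false;
	return prev->end + 1 == next->start && prev->bits == next->bits;
}
```

With never_merge set, every extent buffer keeps its own extent_state for its whole lifespan, which is exactly what lets the atomic-context set/clear calls avoid allocation.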

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c        | 14 ++++++++++++--
 fs/btrfs/extent-io-tree.h | 14 ++++++++++++++
 fs/btrfs/extent_io.c      | 19 ++++++++++++++-----
 3 files changed, 40 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9aa68e2344e1..e466c30b52c8 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2326,11 +2326,21 @@ static void btrfs_init_btree_inode(struct btrfs_fs_info *fs_info)
 	/*
 	 * For subpage size support, btree inode tracks EXTENT_UPTODATE for
 	 * its IO.
+	 *
+	 * And never merge extent states to make all set/clear operation never
+	 * to allocate memory, except the initial EXTENT_HAS_TREE_BLOCK bit.
+	 * This adds extra ~150 bytes for each extent buffer.
+	 *
+	 * TODO: Josef's rwsem rework on tree lock would kill the leave_spinning
+	 * case, and then we can revert this behavior.
 	 */
-	if (btrfs_is_subpage(fs_info))
+	if (btrfs_is_subpage(fs_info)) {
 		BTRFS_I(inode)->io_tree.track_uptodate = true;
-	else
+		BTRFS_I(inode)->io_tree.never_merge = true;
+	} else {
 		BTRFS_I(inode)->io_tree.track_uptodate = false;
+		BTRFS_I(inode)->io_tree.never_merge = false;
+	}
 	extent_map_tree_init(&BTRFS_I(inode)->extent_tree);
 
 	BTRFS_I(inode)->io_tree.ops = &btree_extent_io_ops;
diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
index d3b21c732634..bb95c6b9ad82 100644
--- a/fs/btrfs/extent-io-tree.h
+++ b/fs/btrfs/extent-io-tree.h
@@ -71,6 +71,20 @@ struct extent_io_tree {
 	u64 dirty_bytes;
 	bool track_uptodate;
 
+	/*
+	 * Never to merge extent_state.
+	 *
+	 * This allows any set/clear function to be executed in atomic context
+	 * without allocating extra memory.
+	 * The cost is extra memory usage.
+	 *
+	 * Should only be used for subpage btree io tree, which mostly adds per
+	 * extent buffer memory usage.
+	 *
+	 * Default: false.
+	 */
+	bool never_merge;
+
 	/* Who owns this io tree, should be one of IO_TREE_* */
 	u8 owner;
 
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 278154d405ea..f67d88586d05 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -286,6 +286,7 @@ void extent_io_tree_init(struct btrfs_fs_info *fs_info,
 	spin_lock_init(&tree->lock);
 	tree->private_data = private_data;
 	tree->owner = owner;
+	tree->never_merge = false;
 	if (owner == IO_TREE_INODE_FILE_EXTENT)
 		lockdep_set_class(&tree->lock, &file_extent_tree_class);
 }
@@ -481,11 +482,18 @@ static inline struct rb_node *tree_search(struct extent_io_tree *tree,
 }
 
 /*
- * utility function to look for merge candidates inside a given range.
+ * Utility function to look for merge candidates inside a given range.
  * Any extents with matching state are merged together into a single
- * extent in the tree.  Extents with EXTENT_IO in their state field
- * are not merged because the end_io handlers need to be able to do
- * operations on them without sleeping (or doing allocations/splits).
+ * extent in the tree.
+ *
+ * Except in the following cases:
+ * - extent_state with the EXTENT_LOCKED or EXTENT_BOUNDARY bit set
+ *   Those extents are not merged because end_io handlers need to be able
+ *   to do operations on them without sleeping (or doing allocations/splits)
+ *
+ * - extent_io_tree with never_merge set
+ *   Same reason as above, but some extra call sites may hold a
+ *   spinlock/rwlock, and we don't want to abuse GFP_ATOMIC.
  *
  * This should be called with the tree lock held.
  */
@@ -495,7 +503,8 @@ static void merge_state(struct extent_io_tree *tree,
 	struct extent_state *other;
 	struct rb_node *other_node;
 
-	if (state->state & (EXTENT_LOCKED | EXTENT_BOUNDARY))
+	if (state->state & (EXTENT_LOCKED | EXTENT_BOUNDARY) ||
+	    tree->never_merge)
 		return;
 
 	other_node = rb_prev(&state->rb_node);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 46/68] btrfs: extent_io: make set_extent_buffer_dirty() to support subpage sized metadata
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (44 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 45/68] btrfs: extent_io: prevent extent_state from being merged for btree io tree Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 47/68] btrfs: extent_io: add subpage support for clear_extent_buffer_dirty() Qu Wenruo
                   ` (23 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

For set_extent_buffer_dirty() to support subpage sized metadata, we only
need to call set_extent_dirty().

As any dirty extent buffer in the page would make the whole page dirty,
we can reuse the existing routine without problem; we just need to add
the above set_extent_dirty() call in set_extent_buffer_dirty().

Now that a page is dirty if any extent buffer in it is dirty, the
WARN_ON() in alloc_extent_buffer() can be falsely triggered. Move the
WARN_ON(PageDirty()) check into a new helper,
assert_eb_range_not_dirty(), to support the subpage case.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 37 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index f67d88586d05..2cb9abdb0d60 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5494,6 +5494,22 @@ struct extent_buffer *alloc_test_extent_buffer(struct btrfs_fs_info *fs_info,
 }
 #endif
 
+static void assert_eb_range_not_dirty(struct extent_buffer *eb,
+				      struct page *page)
+{
+	struct btrfs_fs_info *fs_info = eb->fs_info;
+
+	if (btrfs_is_subpage(fs_info) && page->mapping) {
+		struct extent_io_tree *io_tree = info_to_btree_io_tree(fs_info);
+
+		WARN_ON(test_range_bit(io_tree, eb->start,
+				eb->start + eb->len - 1, EXTENT_DIRTY, 0,
+				NULL));
+	} else {
+		WARN_ON(PageDirty(page));
+	}
+}
+
 struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 					  u64 start)
 {
@@ -5566,12 +5582,13 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 			 * drop the ref the old guy had.
 			 */
 			ClearPagePrivate(p);
+			assert_eb_range_not_dirty(eb, p);
 			WARN_ON(PageDirty(p));
 			put_page(p);
 		}
 		attach_extent_buffer_page(eb, p);
 		spin_unlock(&mapping->private_lock);
-		WARN_ON(PageDirty(p));
+		assert_eb_range_not_dirty(eb, p);
 		eb->pages[i] = p;
 		if (!PageUptodate(p))
 			uptodate = 0;
@@ -5791,6 +5808,24 @@ bool set_extent_buffer_dirty(struct extent_buffer *eb)
 		for (i = 0; i < num_pages; i++)
 			set_page_dirty(eb->pages[i]);
 
+	/*
+	 * For subpage size, also set the sector aligned EXTENT_DIRTY range for
+	 * btree io tree
+	 */
+	if (btrfs_is_subpage(eb->fs_info)) {
+		struct extent_io_tree *io_tree =
+			info_to_btree_io_tree(eb->fs_info);
+
+		/*
+		 * set_extent_buffer_dirty() can be called with
+		 * path->leave_spinning == 1, in that case we can't sleep.
+		 */
+		set_extent_dirty(io_tree, eb->start, eb->start + eb->len - 1,
+				 GFP_ATOMIC);
+		set_page_dirty(eb->pages[0]);
+		return was_dirty;
+	}
+
 #ifdef CONFIG_BTRFS_DEBUG
 	for (i = 0; i < num_pages; i++)
 		ASSERT(PageDirty(eb->pages[i]));
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 47/68] btrfs: extent_io: add subpage support for clear_extent_buffer_dirty()
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (45 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 46/68] btrfs: extent_io: make set_extent_buffer_dirty() to support subpage sized metadata Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 48/68] btrfs: extent_io: make set_btree_ioerr() accept extent buffer Qu Wenruo
                   ` (22 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

To support subpage metadata, clear_extent_buffer_dirty() needs to clear
the page dirty if and only if all extent buffers in the page range are
no longer dirty.

This is pretty different from the existing clear_extent_buffer_dirty()
routine, so add a new helper function,
clear_subpage_extent_buffer_dirty(), to do this for subpage metadata.

Also, since the main part of the page-dirty clearing code is still the
same, extract it into btree_clear_page_dirty() so that it can be used
by both cases.
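The rule described above ("clear the page dirty bit only when no extent buffer in the page range is still dirty") can be modeled in user-space C. This is only an illustrative sketch with a per-page bitmap and hypothetical names; the actual patch tracks the state with EXTENT_DIRTY bits in the btree io tree:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model: a 64K page holding 4K extent buffers, one bit per eb slot */
struct page_model {
	uint16_t dirty_ebs;	/* bit n set => eb at slot n is dirty */
	bool page_dirty;	/* mirrors PageDirty() */
};

/* Any dirty eb in the page makes the whole page dirty */
static void set_eb_dirty(struct page_model *p, unsigned int slot)
{
	p->dirty_ebs |= 1U << slot;
	p->page_dirty = true;
}

/*
 * Mirror of clear_subpage_extent_buffer_dirty(): clear this eb's range,
 * then clear the page dirty bit only if no other eb is still dirty.
 */
static void clear_eb_dirty(struct page_model *p, unsigned int slot)
{
	p->dirty_ebs &= ~(1U << slot);
	if (p->dirty_ebs == 0)
		p->page_dirty = false;
}
```

The page stays dirty as long as any single extent buffer in it is dirty, which is exactly why the kernel code has to test the whole page range before calling btree_clear_page_dirty().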

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 47 +++++++++++++++++++++++++++++++++-----------
 1 file changed, 35 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 2cb9abdb0d60..76123d0f416a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5762,30 +5762,53 @@ void free_extent_buffer_stale(struct extent_buffer *eb)
 	release_extent_buffer(eb);
 }
 
+static void btree_clear_page_dirty(struct page *page)
+{
+	ASSERT(PageDirty(page));
+
+	lock_page(page);
+	clear_page_dirty_for_io(page);
+	xa_lock_irq(&page->mapping->i_pages);
+	if (!PageDirty(page))
+		__xa_clear_mark(&page->mapping->i_pages,
+				page_index(page), PAGECACHE_TAG_DIRTY);
+	xa_unlock_irq(&page->mapping->i_pages);
+	ClearPageError(page);
+	unlock_page(page);
+}
+
+static void clear_subpage_extent_buffer_dirty(const struct extent_buffer *eb)
+{
+	struct btrfs_fs_info *fs_info = eb->fs_info;
+	struct extent_io_tree *io_tree = info_to_btree_io_tree(fs_info);
+	struct page *page = eb->pages[0];
+	u64 page_start = page_offset(page);
+	u64 page_end = page_start + PAGE_SIZE - 1;
+	int ret;
+
+	clear_extent_dirty(io_tree, eb->start, eb->start + eb->len - 1, NULL);
+	ret = test_range_bit(io_tree, page_start, page_end, EXTENT_DIRTY, 0, NULL);
+	/* All extent buffers in the page range are cleared now */
+	if (ret == 0 && PageDirty(page))
+		btree_clear_page_dirty(page);
+	WARN_ON(atomic_read(&eb->refs) == 0);
+}
+
 void clear_extent_buffer_dirty(const struct extent_buffer *eb)
 {
 	int i;
 	int num_pages;
 	struct page *page;
 
+	if (btrfs_is_subpage(eb->fs_info))
+		return clear_subpage_extent_buffer_dirty(eb);
 	num_pages = num_extent_pages(eb);
 
 	for (i = 0; i < num_pages; i++) {
 		page = eb->pages[i];
 		if (!PageDirty(page))
 			continue;
-
-		lock_page(page);
-		WARN_ON(!PagePrivate(page));
-
-		clear_page_dirty_for_io(page);
-		xa_lock_irq(&page->mapping->i_pages);
-		if (!PageDirty(page))
-			__xa_clear_mark(&page->mapping->i_pages,
-					page_index(page), PAGECACHE_TAG_DIRTY);
-		xa_unlock_irq(&page->mapping->i_pages);
-		ClearPageError(page);
-		unlock_page(page);
+		btree_clear_page_dirty(page);
 	}
 	WARN_ON(atomic_read(&eb->refs) == 0);
 }
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 48/68] btrfs: extent_io: make set_btree_ioerr() accept extent buffer
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (46 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 47/68] btrfs: extent_io: add subpage support for clear_extent_buffer_dirty() Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 49/68] btrfs: extent_io: introduce write_one_subpage_eb() function Qu Wenruo
                   ` (21 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

Currently set_btree_ioerr() only accepts a @page parameter and grabs
the extent buffer from page::private.

This works fine for sector size == PAGE_SIZE case, but not for subpage
case.

Add an extra parameter, @eb, for callers to pass the extent buffer to
this function, so that subpage code can reuse it.

Also since we are here, access fs_info->flags through a cached fs_info
pointer instead of dereferencing eb->fs_info each time.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 76123d0f416a..1e182dfbb499 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4047,10 +4047,9 @@ static noinline_for_stack int lock_extent_buffer_for_io(struct extent_buffer *eb
 	return ret;
 }
 
-static void set_btree_ioerr(struct page *page)
+static void set_btree_ioerr(struct page *page, struct extent_buffer *eb)
 {
-	struct extent_buffer *eb = (struct extent_buffer *)page->private;
-	struct btrfs_fs_info *fs_info;
+	struct btrfs_fs_info *fs_info = eb->fs_info;
 
 	SetPageError(page);
 	if (test_and_set_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags))
@@ -4060,7 +4059,6 @@ static void set_btree_ioerr(struct page *page)
 	 * If we error out, we should add back the dirty_metadata_bytes
 	 * to make it consistent.
 	 */
-	fs_info = eb->fs_info;
 	percpu_counter_add_batch(&fs_info->dirty_metadata_bytes,
 				 eb->len, fs_info->dirty_metadata_batch);
 
@@ -4104,13 +4102,13 @@ static void set_btree_ioerr(struct page *page)
 	 */
 	switch (eb->log_index) {
 	case -1:
-		set_bit(BTRFS_FS_BTREE_ERR, &eb->fs_info->flags);
+		set_bit(BTRFS_FS_BTREE_ERR, &fs_info->flags);
 		break;
 	case 0:
-		set_bit(BTRFS_FS_LOG1_ERR, &eb->fs_info->flags);
+		set_bit(BTRFS_FS_LOG1_ERR, &fs_info->flags);
 		break;
 	case 1:
-		set_bit(BTRFS_FS_LOG2_ERR, &eb->fs_info->flags);
+		set_bit(BTRFS_FS_LOG2_ERR, &fs_info->flags);
 		break;
 	default:
 		BUG(); /* unexpected, logic error */
@@ -4135,7 +4133,7 @@ static void end_bio_extent_buffer_writepage(struct bio *bio)
 		if (bio->bi_status ||
 		    test_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags)) {
 			ClearPageUptodate(page);
-			set_btree_ioerr(page);
+			set_btree_ioerr(page, eb);
 		}
 
 		end_page_writeback(page);
@@ -4191,7 +4189,7 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
 					 end_bio_extent_buffer_writepage,
 					 0, 0, 0, false);
 		if (ret) {
-			set_btree_ioerr(p);
+			set_btree_ioerr(p, eb);
 			if (PageWriteback(p))
 				end_page_writeback(p);
 			if (atomic_sub_and_test(num_pages - i, &eb->io_pages))
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 49/68] btrfs: extent_io: introduce write_one_subpage_eb() function
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (47 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 48/68] btrfs: extent_io: make set_btree_ioerr() accept extent buffer Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 50/68] btrfs: extent_io: make lock_extent_buffer_for_io() subpage compatible Qu Wenruo
                   ` (20 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

The new function, write_one_subpage_eb(), as a subroutine for subpage
metadata write, will handle the extent buffer bio submission.

The main difference between the new write_one_subpage_eb() and
write_one_eb() is:
- No page locking
  When entering write_one_subpage_eb() the page is no longer locked.
  We only lock the page for its status update, and unlock it
  immediately. Now we completely rely on extent io tree locking.

- Extra EXTENT_* bits along with page status update
  A new EXTENT_WRITEBACK bit is introduced to track extent buffer
  writeback.

  The page dirty bit will only be cleared if all dirty extent buffers
  in the page range have been cleaned.
  The page writeback bit will be set anyway, and cleared in the error
  path if no other extent buffers are under writeback.
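The dirty-to-writeback transition described above, together with the matching endio behavior added later in the series, can be sketched as a user-space C model. All names and the per-slot bitmap are assumptions for illustration; the real code uses EXTENT_DIRTY/EXTENT_WRITEBACK bits in the btree io tree:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct page_model {
	uint16_t dirty;		/* per-eb model of EXTENT_DIRTY */
	uint16_t writeback;	/* per-eb model of EXTENT_WRITEBACK */
	bool page_dirty;	/* mirrors PageDirty() */
	bool page_writeback;	/* mirrors PageWriteback() */
};

/* Mirror of write_one_subpage_eb(): convert DIRTY -> WRITEBACK */
static void submit_eb(struct page_model *p, unsigned int slot)
{
	p->dirty &= ~(1U << slot);
	p->writeback |= 1U << slot;
	/* Page dirty is cleared only when no dirty eb remains */
	if (p->dirty == 0)
		p->page_dirty = false;
	/* Any eb under writeback marks the full page writeback */
	p->page_writeback = true;
}

/* Mirror of the endio: end page writeback only when no eb remains */
static void finish_eb(struct page_model *p, unsigned int slot)
{
	p->writeback &= ~(1U << slot);
	if (p->writeback == 0)
		p->page_writeback = false;
}
```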

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent-io-tree.h |  3 ++
 fs/btrfs/extent_io.c      | 79 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 82 insertions(+)

diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
index bb95c6b9ad82..1658854efd70 100644
--- a/fs/btrfs/extent-io-tree.h
+++ b/fs/btrfs/extent-io-tree.h
@@ -35,6 +35,9 @@ struct io_failure_record;
  */
 #define EXTENT_READ_SUBMITTED	(1U << 16)
 
+/* For subpage btree io tree, indicates the range is under writeback */
+#define EXTENT_WRITEBACK	(1U << 17)
+
 #define EXTENT_DO_ACCOUNTING    (EXTENT_CLEAR_META_RESV | \
 				 EXTENT_CLEAR_DATA_RESV)
 #define EXTENT_CTLBITS		(EXTENT_DO_ACCOUNTING)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 1e182dfbb499..a1e039848539 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3243,6 +3243,7 @@ static int submit_extent_page(unsigned int opf,
 	ASSERT(bio_ret);
 
 	if (*bio_ret) {
+		bool force_merge = false;
 		bool contig;
 		bool can_merge = true;
 
@@ -3268,6 +3269,7 @@ static int submit_extent_page(unsigned int opf,
 		if (prev_bio_flags != bio_flags || !contig || !can_merge ||
 		    force_bio_submit ||
 		    bio_add_page(bio, page, io_size, pg_offset) < io_size) {
+			ASSERT(!force_merge);
 			ret = submit_one_bio(bio, mirror_num, prev_bio_flags);
 			if (ret < 0) {
 				*bio_ret = NULL;
@@ -4147,6 +4149,80 @@ static void end_bio_extent_buffer_writepage(struct bio *bio)
 	bio_put(bio);
 }
 
+/*
+ * Unlike write_one_eb(), we won't unlock the page even if we succeeded
+ * submitting the extent buffer. It's the caller's responsibility to
+ * unlock the page after all extent buffers in it are submitted.
+ *
+ * Callers should still call write_one_eb() rather than this function
+ * directly, as write_one_eb() does extra preparation before submission.
+ */
+static int write_one_subpage_eb(struct extent_buffer *eb,
+				      struct writeback_control *wbc,
+				      struct extent_page_data *epd)
+{
+	struct btrfs_fs_info *fs_info = eb->fs_info;
+	struct extent_state *cached = NULL;
+	struct extent_io_tree *io_tree = info_to_btree_io_tree(fs_info);
+	struct page *page = eb->pages[0];
+	u64 page_start = page_offset(page);
+	u64 page_end = page_start + PAGE_SIZE - 1;
+	unsigned int write_flags = wbc_to_write_flags(wbc) | REQ_META;
+	bool no_dirty_ebs = false;
+	int ret;
+
+	/* Convert the EXTENT_DIRTY to EXTENT_WRITEBACK for this eb */
+	ret = convert_extent_bit(io_tree, eb->start, eb->start + eb->len - 1,
+				 EXTENT_WRITEBACK, EXTENT_DIRTY, &cached);
+	if (ret < 0)
+		return ret;
+	/*
+	 * Only clear page dirty if there is no dirty extent buffer in the
+	 * page range
+	 *
+	 * Also since clear_page_dirty_for_io() needs page locked, here we lock
+	 * the page just to shut up the MM code.
+	 */
+	lock_page(page);
+	if (!test_range_bit(io_tree, page_start, page_end, EXTENT_DIRTY, 0,
+			    cached)) {
+		clear_page_dirty_for_io(page);
+		no_dirty_ebs = true;
+	}
+	/* Any extent buffer writeback will mark the full page writeback */
+	set_page_writeback(page);
+
+	ret = submit_extent_page(REQ_OP_WRITE | write_flags, wbc, page,
+			eb->start, eb->len, eb->start - page_offset(page),
+			&epd->bio, end_bio_extent_buffer_writepage, 0, 0, 0,
+			false);
+	if (ret) {
+		clear_extent_bit(io_tree, eb->start, eb->start + eb->len - 1,
+				 EXTENT_WRITEBACK, 0, 0, &cached);
+		set_btree_ioerr(page, eb);
+		if (PageWriteback(page) &&
+		    !test_range_bit(io_tree, page_start, page_end,
+				    EXTENT_WRITEBACK, 0, cached))
+			end_page_writeback(page);
+		unlock_page(page);
+
+		if (atomic_dec_and_test(&eb->io_pages))
+			end_extent_buffer_writeback(eb);
+		free_extent_state(cached);
+		return -EIO;
+	}
+	unlock_page(page);
+	free_extent_state(cached);
+	/*
+	 * Submission finished without problem.
+	 * If no eb is dirty anymore, we have submitted a full page, so
+	 * update nr_written in wbc.
+	 */
+	if (no_dirty_ebs)
+		update_nr_written(wbc, 1);
+	return ret;
+}
+
 static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
 			struct writeback_control *wbc,
 			struct extent_page_data *epd)
@@ -4178,6 +4254,9 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
 		memzero_extent_buffer(eb, start, end - start);
 	}
 
+	if (btrfs_is_subpage(eb->fs_info))
+		return write_one_subpage_eb(eb, wbc, epd);
+
 	for (i = 0; i < num_pages; i++) {
 		struct page *p = eb->pages[i];
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 50/68] btrfs: extent_io: make lock_extent_buffer_for_io() subpage compatible
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (48 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 49/68] btrfs: extent_io: introduce write_one_subpage_eb() function Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 51/68] btrfs: extent_io: introduce submit_btree_subpage() to submit a page for subpage metadata write Qu Wenruo
                   ` (19 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

To support subpage metadata locking, the following aspects are modified:
- Locking sequence
  For regular sectorsize, we lock the extent buffer first, then lock
  each page.
  For subpage sectorsize, we only lock the extent buffer, but do not
  lock the page, as one page can contain multiple extent buffers.

- Extent io tree locking
  For subpage metadata, we also lock the range in the btree io tree.
  This allows the endio function to get unmerged extent_state, so that
  in the endio function we don't need to allocate memory in atomic
  context.
  This also follows the behavior in the metadata read path.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 44 ++++++++++++++++++++++++++++++++++++++------
 1 file changed, 38 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index a1e039848539..d07972f94c40 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3943,6 +3943,9 @@ static void end_extent_buffer_writeback(struct extent_buffer *eb)
  * Lock extent buffer status and pages for write back.
  *
  * May try to flush write bio if we can't get the lock.
+ * For subpage extent buffers, the caller is responsible for locking the
+ * page. We won't flush the write bio, as that could cause extent buffers
+ * in the same page to be submitted to different bios.
  *
  * Return  0 if the extent buffer doesn't need to be submitted.
  * (E.g. the extent buffer is not dirty)
@@ -3953,26 +3956,41 @@ static noinline_for_stack int lock_extent_buffer_for_io(struct extent_buffer *eb
 			  struct extent_page_data *epd)
 {
 	struct btrfs_fs_info *fs_info = eb->fs_info;
+	struct extent_io_tree *io_tree = info_to_btree_io_tree(fs_info);
 	int i, num_pages, failed_page_nr;
+	bool extent_locked = false;
 	int flush = 0;
 	int ret = 0;
 
+	if (btrfs_is_subpage(fs_info)) {
+		/*
+		 * Also lock the range so that endio can always get unmerged
+		 * extent_state.
+		 */
+		ret = lock_extent(io_tree, eb->start, eb->start + eb->len - 1);
+		if (ret < 0)
+			goto out;
+		extent_locked = true;
+	}
+
 	if (!btrfs_try_tree_write_lock(eb)) {
 		ret = flush_write_bio(epd);
 		if (ret < 0)
-			return ret;
+			goto out;
 		flush = 1;
 		btrfs_tree_lock(eb);
 	}
 
 	if (test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags)) {
 		btrfs_tree_unlock(eb);
-		if (!epd->sync_io)
-			return 0;
+		if (!epd->sync_io) {
+			ret = 0;
+			goto out;
+		}
 		if (!flush) {
 			ret = flush_write_bio(epd);
 			if (ret < 0)
-				return ret;
+				goto out;
 			flush = 1;
 		}
 		while (1) {
@@ -3998,13 +4016,22 @@ static noinline_for_stack int lock_extent_buffer_for_io(struct extent_buffer *eb
 					 -eb->len,
 					 fs_info->dirty_metadata_batch);
 		ret = 1;
+		btrfs_tree_unlock(eb);
 	} else {
 		spin_unlock(&eb->refs_lock);
+		btrfs_tree_unlock(eb);
+		if (extent_locked)
+			unlock_extent(io_tree, eb->start,
+				      eb->start + eb->len - 1);
 	}
 
-	btrfs_tree_unlock(eb);
 
-	if (!ret)
+	/*
+	 * Either the extent buffer does not need to be submitted, or we're
+	 * submitting a subpage extent buffer.
+	 * Either way we don't need to lock the page(s).
+	 */
+	if (!ret || btrfs_is_subpage(fs_info))
 		return ret;
 
 	num_pages = num_extent_pages(eb);
@@ -4046,6 +4073,11 @@ static noinline_for_stack int lock_extent_buffer_for_io(struct extent_buffer *eb
 				 fs_info->dirty_metadata_batch);
 	btrfs_clear_header_flag(eb, BTRFS_HEADER_FLAG_WRITTEN);
 	btrfs_tree_unlock(eb);
+	/* Subpage should never reach this routine */
+	ASSERT(!btrfs_is_subpage(fs_info));
+out:
+	if (extent_locked)
+		unlock_extent(io_tree, eb->start, eb->start + eb->len - 1);
 	return ret;
 }
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 51/68] btrfs: extent_io: introduce submit_btree_subpage() to submit a page for subpage metadata write
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (49 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 50/68] btrfs: extent_io: make lock_extent_buffer_for_io() subpage compatible Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 52/68] btrfs: extent_io: introduce end_bio_subpage_eb_writepage() function Qu Wenruo
                   ` (18 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

The new function, submit_btree_subpage(), will submit all the dirty extent
buffers in the page.

The major differences from submit_btree_page() are:
- Page locking sequence
  Now we lock the page first, then lock extent buffers, thus we don't
  need to unlock the page just after writing one extent buffer.
  The page gets unlocked after we have submitted all extent buffers.

- Bio submission
  Since one extent buffer is ensured to be contained into one page, we
  call submit_extent_page() directly.
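The iteration this patch implements (walk the page, find each dirty extent buffer, submit it, advance past it) can be sketched in user-space C. The sizes and function names below are assumptions for illustration only; the real code looks up extent buffers via btrfs_find_first_subpage_eb() against the btree io tree:

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 65536u	/* 64K page (assumed) */
#define MODEL_NODESIZE  16384u	/* 16K nodes, so 4 ebs per page (assumed) */
#define EBS_PER_PAGE (MODEL_PAGE_SIZE / MODEL_NODESIZE)

/*
 * Mirror of the submit_btree_subpage() loop: scan the page range,
 * record every dirty eb slot in order, and return how many were
 * submitted.
 */
static int submit_page_model(const int dirty[EBS_PER_PAGE],
			     int submitted_slots[EBS_PER_PAGE])
{
	unsigned int cur = 0;
	int submitted = 0;

	while (cur < MODEL_PAGE_SIZE) {
		unsigned int slot = cur / MODEL_NODESIZE;

		cur += MODEL_NODESIZE;	/* advance past this eb */
		if (!dirty[slot])
			continue;	/* clean eb, nothing to submit */
		submitted_slots[submitted++] = slot;
	}
	return submitted;
}
```

The key design point mirrored here is that the cursor advances by the extent buffer length, never by the page size, so multiple extent buffers in one page are all visited.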

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 64 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d07972f94c40..3a2bb2656067 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4324,6 +4324,67 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
 	return ret;
 }
 
+/*
+ * A helper to submit one subpage btree page.
+ *
+ * The main differences from submit_btree_page() are:
+ * - Page locking sequence
+ *   Pages are locked first, then extent buffers are locked
+ *
+ * - Flush write bio
+ *   We only flush bio if we may be unable to fit current extent buffers into
+ *   current bio.
+ *
+ * Return >=0 for the number of submitted extent buffers.
+ * Return <0 for fatal error.
+ */
+static int submit_btree_subpage(struct page *page,
+				struct writeback_control *wbc,
+				struct extent_page_data *epd)
+{
+	struct btrfs_fs_info *fs_info = page_to_fs_info(page);
+	int submitted = 0;
+	u64 page_start = page_offset(page);
+	u64 page_end = page_start + PAGE_SIZE - 1;
+	u64 cur = page_start;
+	int ret;
+
+	/* Lock and write each extent buffers in the range */
+	while (cur <= page_end) {
+		struct extent_buffer *eb;
+
+		ret = btrfs_find_first_subpage_eb(fs_info, &eb, cur, page_end,
+						  EXTENT_DIRTY);
+		if (ret > 0)
+			break;
+		ret = atomic_inc_not_zero(&eb->refs);
+		if (!ret)
+			continue;
+
+		cur = eb->start + eb->len;
+		ret = lock_extent_buffer_for_io(eb, epd);
+		if (ret == 0) {
+			free_extent_buffer(eb);
+			continue;
+		}
+		if (ret < 0) {
+			free_extent_buffer(eb);
+			goto cleanup;
+		}
+		ret = write_one_eb(eb, wbc, epd);
+		free_extent_buffer(eb);
+		if (ret < 0)
+			goto cleanup;
+		submitted++;
+	}
+	return submitted;
+
+cleanup:
+	/* We hit error, end bio for the submitted extent buffers */
+	end_write_bio(epd, ret);
+	return ret;
+}
+
 /*
  * A helper to submit a btree page.
  *
@@ -4349,6 +4410,9 @@ static int submit_btree_page(struct page *page, struct writeback_control *wbc,
 	if (!PagePrivate(page))
 		return 0;
 
+	if (btrfs_is_subpage(page_to_fs_info(page)))
+		return submit_btree_subpage(page, wbc, epd);
+
 	spin_lock(&mapping->private_lock);
 	if (!PagePrivate(page)) {
 		spin_unlock(&mapping->private_lock);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 52/68] btrfs: extent_io: introduce end_bio_subpage_eb_writepage() function
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (50 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 51/68] btrfs: extent_io: introduce submit_btree_subpage() to submit a page for subpage metadata write Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 53/68] btrfs: inode: make can_nocow_extent() check only return 1 if the range is no smaller than PAGE_SIZE Qu Wenruo
                   ` (17 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

The new function, end_bio_subpage_eb_writepage(), will handle the
metadata writeback endio.

The major differences involved are:
- Page writeback clearing
  We will only clear the page writeback bit after all extent buffers in
  the same page have finished their writeback.
  This means we need to check the EXTENT_WRITEBACK bit for the page
  range.

- Clear the EXTENT_WRITEBACK bit for the btree inode
  This is the new bit for the btree inode io tree. It emulates the same
  page status, but in sector size aligned ranges.
  The new bit is remapped from EXTENT_DEFRAG; as defrag is impossible
  for the btree inode, it should be pretty safe to reuse.

Also since the new endio function needs quite some extent io tree
operations, change btree_submit_bio_hook() to queue the endio work into
metadata endio workqueue.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c   | 21 ++++++++++++-
 fs/btrfs/extent_io.c | 73 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 93 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index e466c30b52c8..2ac980f739dc 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -961,6 +961,7 @@ blk_status_t btrfs_wq_submit_bio(struct inode *inode, struct bio *bio,
 	async->mirror_num = mirror_num;
 	async->submit_bio_start = submit_bio_start;
 
+
 	btrfs_init_work(&async->work, run_one_async_start, run_one_async_done,
 			run_one_async_free);
 
@@ -1031,7 +1032,25 @@ static blk_status_t btree_submit_bio_hook(struct inode *inode, struct bio *bio,
 		if (ret)
 			goto out_w_error;
 		ret = btrfs_map_bio(fs_info, bio, mirror_num);
-	} else if (!async) {
+		if (ret < 0)
+			goto out_w_error;
+		return ret;
+	}
+
+	/*
+	 * For subpage metadata writes, the endio involves several
+	 * extent_io_tree operations, which are not suitable for endio
+	 * context.
+	 * Thus we need to queue them into an endio workqueue.
+	 */
+	if (btrfs_is_subpage(fs_info)) {
+		ret = btrfs_bio_wq_end_io(fs_info, bio,
+					  BTRFS_WQ_ENDIO_METADATA);
+		if (ret)
+			goto out_w_error;
+	}
+
+	if (!async) {
 		ret = btree_csum_one_bio(bio);
 		if (ret)
 			goto out_w_error;
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 3a2bb2656067..2a66bfae3414 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4149,6 +4149,76 @@ static void set_btree_ioerr(struct page *page, struct extent_buffer *eb)
 	}
 }
 
+/*
+ * The endio function for subpage extent buffer write.
+ *
+ * Unlike end_bio_extent_buffer_writepage(), we only call end_page_writeback()
+ * after all extent buffers in the page have finished their writeback.
+ */
+static void end_bio_subpage_eb_writepage(struct bio *bio)
+{
+	struct bio_vec *bvec;
+	struct bvec_iter_all iter_all;
+
+	ASSERT(!bio_flagged(bio, BIO_CLONED));
+	bio_for_each_segment_all(bvec, bio, iter_all) {
+		struct page *page = bvec->bv_page;
+		struct btrfs_fs_info *fs_info = page_to_fs_info(page);
+		struct extent_buffer *eb;
+		u64 page_start = page_offset(page);
+		u64 page_end = page_start + PAGE_SIZE - 1;
+		u64 bvec_start = page_offset(page) + bvec->bv_offset;
+		u64 bvec_end = bvec_start + bvec->bv_len - 1;
+		u64 cur_bytenr = bvec_start;
+
+		ASSERT(IS_ALIGNED(bvec->bv_len, fs_info->nodesize));
+
+		/* Iterate through all extent buffers in the range */
+		while (cur_bytenr <= bvec_end) {
+			struct extent_state *cached = NULL;
+			struct extent_io_tree *io_tree =
+				info_to_btree_io_tree(fs_info);
+			int done;
+			int ret;
+
+			ret = btrfs_find_first_subpage_eb(fs_info, &eb,
+					cur_bytenr, bvec_end, 0);
+			if (ret > 0)
+				break;
+
+			cur_bytenr = eb->start + eb->len;
+
+			ASSERT(test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags));
+			done = atomic_dec_and_test(&eb->io_pages);
+			ASSERT(done);
+
+			if (bio->bi_status ||
+			    test_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags)) {
+				ClearPageUptodate(page);
+				set_btree_ioerr(page, eb);
+			}
+
+			clear_extent_bit(io_tree, eb->start,
+					eb->start + eb->len - 1,
+					EXTENT_WRITEBACK | EXTENT_LOCKED, 1, 0,
+					&cached);
+			lock_page(page);
+			/*
+			 * Only end the page writeback if there is no extent
+			 * buffer under writeback in the page anymore
+			 */
+			if (!test_range_bit(io_tree, page_start, page_end,
+					    EXTENT_WRITEBACK, 0, cached) &&
+			    PageWriteback(page))
+				end_page_writeback(page);
+			unlock_page(page);
+			free_extent_state(cached);
+			end_extent_buffer_writeback(eb);
+		}
+	}
+	bio_put(bio);
+}
+
 static void end_bio_extent_buffer_writepage(struct bio *bio)
 {
 	struct bio_vec *bvec;
@@ -4156,6 +4226,9 @@ static void end_bio_extent_buffer_writepage(struct bio *bio)
 	int done;
 	struct bvec_iter_all iter_all;
 
+	if (btrfs_is_subpage(page_to_fs_info(bio_first_page_all(bio))))
+		return end_bio_subpage_eb_writepage(bio);
+
 	ASSERT(!bio_flagged(bio, BIO_CLONED));
 	bio_for_each_segment_all(bvec, bio, iter_all) {
 		struct page *page = bvec->bv_page;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 53/68] btrfs: inode: make can_nocow_extent() check only return 1 if the range is no smaller than PAGE_SIZE
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (51 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 52/68] btrfs: extent_io: introduce end_bio_subpage_eb_writepage() function Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 54/68] btrfs: file: calculate reserve space based on PAGE_SIZE for buffered write Qu Wenruo
                   ` (16 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

For subpage, we can still get a sector aligned extent map, which could
lead to the following case:

0	16K	32K	48K	64K
|///////|			|
    |		\- Hole
    \- NODATACOW extent

If we want to dirty the page range [0, 64K) for a new write and need to
check the nocow status, can_nocow_extent() would return 1 with length
16K.

But current subpage data write support can only write a full page,
while the range [16K, 64K) is a hole where writes must be COWed.

To solve the problem, make can_nocow_extent() do an extra check on the
returned length.
If the result is smaller than one page, we return 0.

This behavior change won't affect regular sector size support since in
that case num_bytes should already be page aligned.

Also modify the callers to always pass a page aligned offset for
subpage support.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/file.c  |  7 +++----
 fs/btrfs/inode.c | 15 +++++++++++++++
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index d3766d2bb8d6..a2009127ef96 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1535,8 +1535,8 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
 static int check_can_nocow(struct btrfs_inode *inode, loff_t pos,
 			   size_t *write_bytes, bool nowait)
 {
-	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 	struct btrfs_root *root = inode->root;
+	u32 blocksize = PAGE_SIZE;
 	u64 lockstart, lockend;
 	u64 num_bytes;
 	int ret;
@@ -1547,9 +1547,8 @@ static int check_can_nocow(struct btrfs_inode *inode, loff_t pos,
 	if (!nowait && !btrfs_drew_try_write_lock(&root->snapshot_lock))
 		return -EAGAIN;
 
-	lockstart = round_down(pos, fs_info->sectorsize);
-	lockend = round_up(pos + *write_bytes,
-			   fs_info->sectorsize) - 1;
+	lockstart = round_down(pos, blocksize);
+	lockend = round_up(pos + *write_bytes, blocksize) - 1;
 	num_bytes = lockend - lockstart + 1;
 
 	if (nowait) {
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index f22ee5d3c105..8551815c4d65 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7006,6 +7006,11 @@ noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
 	int found_type;
 	bool nocow = (BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW);
 
+	/*
+	 * We only do full page writes even for subpage, thus the offset
+	 * should always be page aligned.
+	 */
+	ASSERT(IS_ALIGNED(offset, PAGE_SIZE));
 	path = btrfs_alloc_path();
 	if (!path)
 		return -ENOMEM;
@@ -7121,6 +7126,16 @@ noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
 	disk_bytenr += offset - key.offset;
 	if (csum_exist_in_range(fs_info, disk_bytenr, num_bytes))
 		goto out;
+
+	/*
+	 * A nocow range smaller than one page makes no sense for subpage,
+	 * as we can only submit full page writes for now.
+	 */
+	if (num_bytes < PAGE_SIZE) {
+		ret = 0;
+		goto out;
+	}
+
 	/*
 	 * all of the above have passed, it is safe to overwrite this extent
 	 * without cow
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 54/68] btrfs: file: calculate reserve space based on PAGE_SIZE for buffered write
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (52 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 53/68] btrfs: inode: make can_nocow_extent() check only return 1 if the range is no smaller than PAGE_SIZE Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 55/68] btrfs: file: make hole punching page aligned for subpage Qu Wenruo
                   ` (15 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

In theory btrfs_buffered_write() should reserve space using the sector
size. But for now, base all reservations on PAGE_SIZE, which makes
later subpage support always submit full page writes.

This causes more data space usage, but greatly simplifies the subpage
data write support.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/file.c | 38 +++++++++++---------------------------
 1 file changed, 11 insertions(+), 27 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index a2009127ef96..564784a5c0c0 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1650,7 +1650,6 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 	while (iov_iter_count(i) > 0) {
 		struct extent_state *cached_state = NULL;
 		size_t offset = offset_in_page(pos);
-		size_t sector_offset;
 		size_t write_bytes = min(iov_iter_count(i),
 					 nrptrs * (size_t)PAGE_SIZE -
 					 offset);
@@ -1659,7 +1658,6 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		size_t reserve_bytes;
 		size_t dirty_pages;
 		size_t copied;
-		size_t dirty_sectors;
 		size_t num_sectors;
 		int extents_locked;
 
@@ -1675,9 +1673,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		}
 
 		only_release_metadata = false;
-		sector_offset = pos & (fs_info->sectorsize - 1);
-		reserve_bytes = round_up(write_bytes + sector_offset,
-				fs_info->sectorsize);
+		reserve_bytes = round_up(write_bytes + offset, PAGE_SIZE);
 
 		extent_changeset_release(data_reserved);
 		ret = btrfs_check_data_free_space(BTRFS_I(inode),
@@ -1697,9 +1693,8 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 				 */
 				num_pages = DIV_ROUND_UP(write_bytes + offset,
 							 PAGE_SIZE);
-				reserve_bytes = round_up(write_bytes +
-							 sector_offset,
-							 fs_info->sectorsize);
+				reserve_bytes = round_up(write_bytes + offset,
+							 PAGE_SIZE);
 			} else {
 				break;
 			}
@@ -1750,9 +1745,6 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		copied = btrfs_copy_from_user(pos, write_bytes, pages, i);
 
 		num_sectors = BTRFS_BYTES_TO_BLKS(fs_info, reserve_bytes);
-		dirty_sectors = round_up(copied + sector_offset,
-					fs_info->sectorsize);
-		dirty_sectors = BTRFS_BYTES_TO_BLKS(fs_info, dirty_sectors);
 
 		/*
 		 * if we have trouble faulting in the pages, fall
@@ -1763,35 +1755,29 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 
 		if (copied == 0) {
 			force_page_uptodate = true;
-			dirty_sectors = 0;
 			dirty_pages = 0;
 		} else {
 			force_page_uptodate = false;
-			dirty_pages = DIV_ROUND_UP(copied + offset,
-						   PAGE_SIZE);
+			dirty_pages = DIV_ROUND_UP(copied + offset, PAGE_SIZE);
 		}
 
-		if (num_sectors > dirty_sectors) {
+		if (num_pages > dirty_pages) {
 			/* release everything except the sectors we dirtied */
-			release_bytes -= dirty_sectors <<
-						fs_info->sb->s_blocksize_bits;
+			release_bytes -= dirty_pages << PAGE_SHIFT;
 			if (only_release_metadata) {
 				btrfs_delalloc_release_metadata(BTRFS_I(inode),
 							release_bytes, true);
 			} else {
 				u64 __pos;
 
-				__pos = round_down(pos,
-						   fs_info->sectorsize) +
+				__pos = round_down(pos, PAGE_SIZE) +
 					(dirty_pages << PAGE_SHIFT);
 				btrfs_delalloc_release_space(BTRFS_I(inode),
 						data_reserved, __pos,
 						release_bytes, true);
 			}
 		}
-
-		release_bytes = round_up(copied + sector_offset,
-					fs_info->sectorsize);
+		release_bytes = round_up(copied + offset, PAGE_SIZE);
 
 		if (copied > 0)
 			ret = btrfs_dirty_pages(BTRFS_I(inode), pages,
@@ -1822,10 +1808,8 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 			btrfs_check_nocow_unlock(BTRFS_I(inode));
 
 		if (only_release_metadata && copied > 0) {
-			lockstart = round_down(pos,
-					       fs_info->sectorsize);
-			lockend = round_up(pos + copied,
-					   fs_info->sectorsize) - 1;
+			lockstart = round_down(pos, PAGE_SIZE);
+			lockend = round_up(pos + copied, PAGE_SIZE) - 1;
 
 			set_extent_bit(&BTRFS_I(inode)->io_tree, lockstart,
 				       lockend, EXTENT_NORESERVE, NULL,
@@ -1852,7 +1836,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		} else {
 			btrfs_delalloc_release_space(BTRFS_I(inode),
 					data_reserved,
-					round_down(pos, fs_info->sectorsize),
+					round_down(pos, PAGE_SIZE),
 					release_bytes, true);
 		}
 	}
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 55/68] btrfs: file: make hole punching page aligned for subpage
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (53 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 54/68] btrfs: file: calculate reserve space based on PAGE_SIZE for buffered write Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 56/68] btrfs: file: make btrfs_dirty_pages() follow page size to mark extent io tree Qu Wenruo
                   ` (14 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

Since current subpage data write only supports full page writes, make
hole punching follow the page size instead of the sector size.

There is also an optimization branch which skips any existing holes,
but since we can still have subpage holes in the hole punching range,
the optimization needs to be disabled in the subpage case.

Update the related comment for subpage support, explaining why we
don't want that optimization.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/file.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 564784a5c0c0..cb8f2b04ccd8 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2802,6 +2802,7 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 	u64 tail_start;
 	u64 tail_len;
 	u64 orig_start = offset;
+	u32 blocksize = PAGE_SIZE;
 	int ret = 0;
 	bool same_block;
 	u64 ino_size;
@@ -2813,7 +2814,7 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 		return ret;
 
 	inode_lock(inode);
-	ino_size = round_up(inode->i_size, fs_info->sectorsize);
+	ino_size = round_up(inode->i_size, blocksize);
 	ret = find_first_non_hole(inode, &offset, &len);
 	if (ret < 0)
 		goto out_only_mutex;
@@ -2823,11 +2824,10 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 		goto out_only_mutex;
 	}
 
-	lockstart = round_up(offset, btrfs_inode_sectorsize(inode));
-	lockend = round_down(offset + len,
-			     btrfs_inode_sectorsize(inode)) - 1;
-	same_block = (BTRFS_BYTES_TO_BLKS(fs_info, offset))
-		== (BTRFS_BYTES_TO_BLKS(fs_info, offset + len - 1));
+	lockstart = round_up(offset, blocksize);
+	lockend = round_down(offset + len, blocksize) - 1;
+	same_block = round_down(offset, blocksize) ==
+		     round_down(offset + len - 1, blocksize);
 	/*
 	 * We needn't truncate any block which is beyond the end of the file
 	 * because we are sure there is no data there.
@@ -2836,7 +2836,7 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 	 * Only do this if we are in the same block and we aren't doing the
 	 * entire block.
 	 */
-	if (same_block && len < fs_info->sectorsize) {
+	if (same_block && len < blocksize) {
 		if (offset < ino_size) {
 			truncated_block = true;
 			ret = btrfs_truncate_block(inode, offset, len, 0);
@@ -2856,10 +2856,13 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 		}
 	}
 
-	/* Check the aligned pages after the first unaligned page,
-	 * if offset != orig_start, which means the first unaligned page
-	 * including several following pages are already in holes,
-	 * the extra check can be skipped */
+	/*
+	 * Optimization to check if we can skip already existing holes.
+	 *
+	 * If offset != orig_start, the first unaligned page and several
+	 * following pages are already holes, thus we can skip the
+	 * check.
+	 */
 	if (offset == orig_start) {
 		/* after truncate page, check hole again */
 		len = offset + len - lockstart;
@@ -2871,7 +2874,8 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 			ret = 0;
 			goto out_only_mutex;
 		}
-		lockstart = offset;
+		lockstart = max_t(u64, lockstart,
+				  round_down(offset, blocksize));
 	}
 
 	/* Check the tail unaligned part is in a hole */
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 56/68] btrfs: file: make btrfs_dirty_pages() follow page size to mark extent io tree
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (54 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 55/68] btrfs: file: make hole punching page aligned for subpage Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 57/68] btrfs: file: make btrfs_file_write_iter() to be page aligned Qu Wenruo
                   ` (13 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

Currently btrfs_dirty_pages() follows the sector size when marking the
extent io tree, but since we don't yet support subpage data writeback,
this could cause extra problems for subpage support.

Change it to do page alignment.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/file.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index cb8f2b04ccd8..30b22303ad2c 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -504,9 +504,9 @@ int btrfs_dirty_pages(struct btrfs_inode *inode, struct page **pages,
 		      size_t num_pages, loff_t pos, size_t write_bytes,
 		      struct extent_state **cached)
 {
-	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 	int err = 0;
 	int i;
+	u32 blocksize = PAGE_SIZE;
 	u64 num_bytes;
 	u64 start_pos;
 	u64 end_of_last_block;
@@ -514,9 +514,8 @@ int btrfs_dirty_pages(struct btrfs_inode *inode, struct page **pages,
 	loff_t isize = i_size_read(&inode->vfs_inode);
 	unsigned int extra_bits = 0;
 
-	start_pos = pos & ~((u64) fs_info->sectorsize - 1);
-	num_bytes = round_up(write_bytes + pos - start_pos,
-			     fs_info->sectorsize);
+	start_pos = round_down(pos, blocksize);
+	num_bytes = round_up(write_bytes + pos - start_pos, blocksize);
 
 	end_of_last_block = start_pos + num_bytes - 1;
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 57/68] btrfs: file: make btrfs_file_write_iter() to be page aligned
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (55 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 56/68] btrfs: file: make btrfs_dirty_pages() follow page size to mark extent io tree Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 58/68] btrfs: output extra info for space info update underflow Qu Wenruo
                   ` (12 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

This is mostly for subpage write support: as we can't submit subpage
sized writes yet, we have to submit full page writes.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/file.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 30b22303ad2c..8f44bde1d04e 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1909,6 +1909,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 	struct inode *inode = file_inode(file);
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct btrfs_root *root = BTRFS_I(inode)->root;
+	u32 blocksize = PAGE_SIZE;
 	u64 start_pos;
 	u64 end_pos;
 	ssize_t num_written = 0;
@@ -1988,18 +1989,17 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 	 */
 	update_time_for_write(inode);
 
-	start_pos = round_down(pos, fs_info->sectorsize);
+	start_pos = round_down(pos, blocksize);
 	oldsize = i_size_read(inode);
 	if (start_pos > oldsize) {
 		/* Expand hole size to cover write data, preventing empty gap */
-		end_pos = round_up(pos + count,
-				   fs_info->sectorsize);
+		end_pos = round_up(pos + count, blocksize);
 		err = btrfs_cont_expand(inode, oldsize, end_pos);
 		if (err) {
 			inode_unlock(inode);
 			goto out;
 		}
-		if (start_pos > round_up(oldsize, fs_info->sectorsize))
+		if (start_pos > round_up(oldsize, blocksize))
 			clean_page = 1;
 	}
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 58/68] btrfs: output extra info for space info update underflow
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (56 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 57/68] btrfs: file: make btrfs_file_write_iter() to be page aligned Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 59/68] btrfs: delalloc-space: make data space reservation to be page aligned Qu Wenruo
                   ` (11 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/space-info.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
index c3c64019950a..f7c3fc3a8541 100644
--- a/fs/btrfs/space-info.h
+++ b/fs/btrfs/space-info.h
@@ -106,6 +106,8 @@ btrfs_space_info_update_##name(struct btrfs_fs_info *fs_info,		\
 				      sinfo->flags, abs_bytes,		\
 				      bytes > 0);			\
 	if (bytes < 0 && sinfo->name < -bytes) {			\
+		btrfs_warn(fs_info, "bytes_%s have %llu diff %lld",	\
+			trace_name, sinfo->name, bytes);		\
 		WARN_ON(1);						\
 		sinfo->name = 0;					\
 		return;							\
@@ -113,7 +115,7 @@ btrfs_space_info_update_##name(struct btrfs_fs_info *fs_info,		\
 	sinfo->name += bytes;						\
 }
 
-DECLARE_SPACE_INFO_UPDATE(bytes_may_use, "space_info");
+DECLARE_SPACE_INFO_UPDATE(bytes_may_use, "may_use");
 DECLARE_SPACE_INFO_UPDATE(bytes_pinned, "pinned");
 
 int btrfs_init_space_info(struct btrfs_fs_info *fs_info);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 59/68] btrfs: delalloc-space: make data space reservation to be page aligned
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (57 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 58/68] btrfs: output extra info for space info update underflow Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 60/68] btrfs: scrub: allow scrub to work with subpage sectorsize Qu Wenruo
                   ` (10 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

This is for initial subpage data write support. We don't yet support
true subpage data writes, only full page data writeback.

Thus change the data reserve and release code to be page aligned.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/delalloc-space.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/delalloc-space.c b/fs/btrfs/delalloc-space.c
index 0e354e9e57d0..1f2b324485f5 100644
--- a/fs/btrfs/delalloc-space.c
+++ b/fs/btrfs/delalloc-space.c
@@ -116,13 +116,14 @@ int btrfs_alloc_data_chunk_ondemand(struct btrfs_inode *inode, u64 bytes)
 	struct btrfs_root *root = inode->root;
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	struct btrfs_space_info *data_sinfo = fs_info->data_sinfo;
+	u32 blocksize = PAGE_SIZE;
 	u64 used;
 	int ret = 0;
 	int need_commit = 2;
 	int have_pinned_space;
 
-	/* Make sure bytes are sectorsize aligned */
-	bytes = ALIGN(bytes, fs_info->sectorsize);
+	/* Make sure bytes are aligned */
+	bytes = round_up(bytes, blocksize);
 
 	if (btrfs_is_free_space_inode(inode)) {
 		need_commit = 0;
@@ -241,12 +242,12 @@ int btrfs_check_data_free_space(struct btrfs_inode *inode,
 			struct extent_changeset **reserved, u64 start, u64 len)
 {
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
+	u32 blocksize = PAGE_SIZE;
 	int ret;
 
 	/* align the range */
-	len = round_up(start + len, fs_info->sectorsize) -
-	      round_down(start, fs_info->sectorsize);
-	start = round_down(start, fs_info->sectorsize);
+	len = round_up(start + len, blocksize) - round_down(start, blocksize);
+	start = round_down(start, blocksize);
 
 	ret = btrfs_alloc_data_chunk_ondemand(inode, len);
 	if (ret < 0)
@@ -293,11 +294,11 @@ void btrfs_free_reserved_data_space(struct btrfs_inode *inode,
 			struct extent_changeset *reserved, u64 start, u64 len)
 {
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
+	u32 blocksize = PAGE_SIZE;
 
-	/* Make sure the range is aligned to sectorsize */
-	len = round_up(start + len, fs_info->sectorsize) -
-	      round_down(start, fs_info->sectorsize);
-	start = round_down(start, fs_info->sectorsize);
+	/* Make sure the range is aligned */
+	len = round_up(start + len, blocksize) - round_down(start, blocksize);
+	start = round_down(start, blocksize);
 
 	btrfs_free_reserved_data_space_noquota(fs_info, len);
 	btrfs_qgroup_free_data(inode, reserved, start, len);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 60/68] btrfs: scrub: allow scrub to work with subpage sectorsize
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (58 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 59/68] btrfs: delalloc-space: make data space reservation to be page aligned Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 61/68] btrfs: inode: make btrfs_truncate_block() to do page alignment Qu Wenruo
                   ` (9 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 354ab9985a34..806523515d2f 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3821,14 +3821,6 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 		return -EINVAL;
 	}
 
-	if (fs_info->sectorsize != PAGE_SIZE) {
-		/* not supported for data w/o checksums */
-		btrfs_err_rl(fs_info,
-			   "scrub: size assumption sectorsize != PAGE_SIZE (%d != %lu) fails",
-		       fs_info->sectorsize, PAGE_SIZE);
-		return -EINVAL;
-	}
-
 	if (fs_info->nodesize >
 	    PAGE_SIZE * SCRUB_MAX_PAGES_PER_BLOCK ||
 	    fs_info->sectorsize > PAGE_SIZE * SCRUB_MAX_PAGES_PER_BLOCK) {
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 61/68] btrfs: inode: make btrfs_truncate_block() to do page alignment
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (59 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 60/68] btrfs: scrub: allow scrub to work with subpage sectorsize Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 62/68] btrfs: file: make hole punch and zero range to be page aligned Qu Wenruo
                   ` (8 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

This is mostly for subpage writeback: since we can still only submit
full page writes, we can't truncate a single subpage sector.

Thus truncate the whole page rather than each sector.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8551815c4d65..f3bc894611e0 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4529,7 +4529,6 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
 			int front)
 {
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct address_space *mapping = inode->i_mapping;
 	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
 	struct btrfs_ordered_extent *ordered;
@@ -4537,7 +4536,7 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
 	struct extent_changeset *data_reserved = NULL;
 	char *kaddr;
 	bool only_release_metadata = false;
-	u32 blocksize = fs_info->sectorsize;
+	u32 blocksize = PAGE_SIZE;
 	pgoff_t index = from >> PAGE_SHIFT;
 	unsigned offset = from & (blocksize - 1);
 	struct page *page;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 62/68] btrfs: file: make hole punch and zero range to be page aligned
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (60 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 61/68] btrfs: inode: make btrfs_truncate_block() to do page alignment Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 63/68] btrfs: file: make btrfs_fallocate() to use PAGE_SIZE as blocksize Qu Wenruo
                   ` (7 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

This works around the fact that we can't yet submit subpage write bios.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/file.c | 42 +++++++++++++++++++++---------------------
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 8f44bde1d04e..6e342c466fdf 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2455,6 +2455,8 @@ static int btrfs_punch_hole_lock_range(struct inode *inode,
 				       const u64 lockend,
 				       struct extent_state **cached_state)
 {
+	ASSERT(IS_ALIGNED(lockstart, PAGE_SIZE) &&
+	       IS_ALIGNED(lockend + 1, PAGE_SIZE));
 	while (1) {
 		struct btrfs_ordered_extent *ordered;
 		int ret;
@@ -3033,12 +3035,12 @@ enum {
 static int btrfs_zero_range_check_range_boundary(struct inode *inode,
 						 u64 offset)
 {
-	const u64 sectorsize = btrfs_inode_sectorsize(inode);
+	const u32 blocksize = PAGE_SIZE;
 	struct extent_map *em;
 	int ret;
 
-	offset = round_down(offset, sectorsize);
-	em = btrfs_get_extent(BTRFS_I(inode), NULL, 0, offset, sectorsize);
+	offset = round_down(offset, blocksize);
+	em = btrfs_get_extent(BTRFS_I(inode), NULL, 0, offset, blocksize);
 	if (IS_ERR(em))
 		return PTR_ERR(em);
 
@@ -3058,14 +3060,13 @@ static int btrfs_zero_range(struct inode *inode,
 			    loff_t len,
 			    const int mode)
 {
-	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
 	struct extent_map *em;
 	struct extent_changeset *data_reserved = NULL;
 	int ret;
+	const u32 blocksize = PAGE_SIZE;
 	u64 alloc_hint = 0;
-	const u64 sectorsize = btrfs_inode_sectorsize(inode);
-	u64 alloc_start = round_down(offset, sectorsize);
-	u64 alloc_end = round_up(offset + len, sectorsize);
+	u64 alloc_start = round_down(offset, blocksize);
+	u64 alloc_end = round_up(offset + len, blocksize);
 	u64 bytes_to_reserve = 0;
 	bool space_reserved = false;
 
@@ -3105,18 +3106,17 @@ static int btrfs_zero_range(struct inode *inode,
 		 * Part of the range is already a prealloc extent, so operate
 		 * only on the remaining part of the range.
 		 */
-		alloc_start = em_end;
-		ASSERT(IS_ALIGNED(alloc_start, sectorsize));
+		alloc_start = round_down(em_end, blocksize);
 		len = offset + len - alloc_start;
 		offset = alloc_start;
 		alloc_hint = em->block_start + em->len;
 	}
 	free_extent_map(em);
 
-	if (BTRFS_BYTES_TO_BLKS(fs_info, offset) ==
-	    BTRFS_BYTES_TO_BLKS(fs_info, offset + len - 1)) {
+	if (round_down(offset, blocksize) ==
+	    round_down(offset + len - 1, blocksize)) {
 		em = btrfs_get_extent(BTRFS_I(inode), NULL, 0, alloc_start,
-				      sectorsize);
+				      blocksize);
 		if (IS_ERR(em)) {
 			ret = PTR_ERR(em);
 			goto out;
@@ -3128,7 +3128,7 @@ static int btrfs_zero_range(struct inode *inode,
 							   mode);
 			goto out;
 		}
-		if (len < sectorsize && em->block_start != EXTENT_MAP_HOLE) {
+		if (len < blocksize && em->block_start != EXTENT_MAP_HOLE) {
 			free_extent_map(em);
 			ret = btrfs_truncate_block(inode, offset, len, 0);
 			if (!ret)
@@ -3138,13 +3138,13 @@ static int btrfs_zero_range(struct inode *inode,
 			return ret;
 		}
 		free_extent_map(em);
-		alloc_start = round_down(offset, sectorsize);
-		alloc_end = alloc_start + sectorsize;
+		alloc_start = round_down(offset, blocksize);
+		alloc_end = alloc_start + blocksize;
 		goto reserve_space;
 	}
 
-	alloc_start = round_up(offset, sectorsize);
-	alloc_end = round_down(offset + len, sectorsize);
+	alloc_start = round_up(offset, blocksize);
+	alloc_end = round_down(offset + len, blocksize);
 
 	/*
 	 * For unaligned ranges, check the pages at the boundaries, they might
@@ -3152,12 +3152,12 @@ static int btrfs_zero_range(struct inode *inode,
 	 * they might map to a hole, in which case we need our allocation range
 	 * to cover them.
 	 */
-	if (!IS_ALIGNED(offset, sectorsize)) {
+	if (!IS_ALIGNED(offset, blocksize)) {
 		ret = btrfs_zero_range_check_range_boundary(inode, offset);
 		if (ret < 0)
 			goto out;
 		if (ret == RANGE_BOUNDARY_HOLE) {
-			alloc_start = round_down(offset, sectorsize);
+			alloc_start = round_down(offset, blocksize);
 			ret = 0;
 		} else if (ret == RANGE_BOUNDARY_WRITTEN_EXTENT) {
 			ret = btrfs_truncate_block(inode, offset, 0, 0);
@@ -3168,13 +3168,13 @@ static int btrfs_zero_range(struct inode *inode,
 		}
 	}
 
-	if (!IS_ALIGNED(offset + len, sectorsize)) {
+	if (!IS_ALIGNED(offset + len, blocksize)) {
 		ret = btrfs_zero_range_check_range_boundary(inode,
 							    offset + len);
 		if (ret < 0)
 			goto out;
 		if (ret == RANGE_BOUNDARY_HOLE) {
-			alloc_end = round_up(offset + len, sectorsize);
+			alloc_end = round_up(offset + len, blocksize);
 			ret = 0;
 		} else if (ret == RANGE_BOUNDARY_WRITTEN_EXTENT) {
 			ret = btrfs_truncate_block(inode, offset + len, 0, 1);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 63/68] btrfs: file: make btrfs_fallocate() to use PAGE_SIZE as blocksize
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (61 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 62/68] btrfs: file: make hole punch and zero range to be page aligned Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 64/68] btrfs: inode: always mark the full page range delalloc for btrfs_page_mkwrite() Qu Wenruo
                   ` (6 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

In theory we could still allow the subpage sector size to be used in
this case, but since btrfs_truncate_block() now operates in page
units, btrfs_fallocate() should also honor PAGE_SIZE as its blocksize.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/file.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 6e342c466fdf..f7122f71b791 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -3244,7 +3244,7 @@ static long btrfs_fallocate(struct file *file, int mode,
 	u64 locked_end;
 	u64 actual_end = 0;
 	struct extent_map *em;
-	int blocksize = btrfs_inode_sectorsize(inode);
+	int blocksize = PAGE_SIZE;
 	int ret;
 
 	alloc_start = round_down(offset, blocksize);
@@ -3401,7 +3401,7 @@ static long btrfs_fallocate(struct file *file, int mode,
 		if (!ret)
 			ret = btrfs_prealloc_file_range(inode, mode,
 					range->start,
-					range->len, i_blocksize(inode),
+					range->len, blocksize,
 					offset + len, &alloc_hint);
 		else
 			btrfs_free_reserved_data_space(BTRFS_I(inode),
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 64/68] btrfs: inode: always mark the full page range delalloc for btrfs_page_mkwrite()
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (62 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 63/68] btrfs: file: make btrfs_fallocate() to use PAGE_SIZE as blocksize Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 65/68] btrfs: inode: require page alignment for direct io Qu Wenruo
                   ` (5 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

So that we won't get subpage-sized EXTENT_DELALLOC ranges, which could
easily screw up the page-aligned write space reservation needed for
subpage support.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index f3bc894611e0..0da6c91db0bc 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8323,8 +8323,7 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
 	}
 
 	if (page->index == ((size - 1) >> PAGE_SHIFT)) {
-		reserved_space = round_up(size - page_start,
-					  fs_info->sectorsize);
+		reserved_space = round_up(size - page_start, PAGE_SIZE);
 		if (reserved_space < PAGE_SIZE) {
 			end = page_start + reserved_space - 1;
 			btrfs_delalloc_release_space(BTRFS_I(inode),
-- 
2.28.0



* [PATCH v4 65/68] btrfs: inode: require page alignment for direct io
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (63 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 64/68] btrfs: inode: always mark the full page range delalloc for btrfs_page_mkwrite() Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 66/68] btrfs: inode: only do NOCOW write for page aligned extent Qu Wenruo
                   ` (4 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

For the incoming subpage support we can still only submit full page
writes, thus the alignment requirement for direct IO should remain the
page size, not the sector size.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 0da6c91db0bc..625950258c87 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7894,7 +7894,7 @@ static ssize_t check_direct_IO(struct btrfs_fs_info *fs_info,
 {
 	int seg;
 	int i;
-	unsigned int blocksize_mask = fs_info->sectorsize - 1;
+	unsigned int blocksize_mask = PAGE_SIZE - 1;
 	ssize_t retval = -EINVAL;
 
 	if (offset & blocksize_mask)
-- 
2.28.0



* [PATCH v4 66/68] btrfs: inode: only do NOCOW write for page aligned extent
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (64 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 65/68] btrfs: inode: require page alignment for direct io Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 67/68] btrfs: reflink: do full page writeback for reflink prepare Qu Wenruo
                   ` (3 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

Another workaround for the inability to submit real subpage-sized write bios.

For NOCOW, if a range ends at a sector boundary but not at a page
boundary, we can't submit a subpage NOCOW write bio. To work around
this, we skip any extent that is not page aligned and fall back to COW.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 625950258c87..c3d32f4858d5 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1451,6 +1451,12 @@ static int fallback_to_cow(struct btrfs_inode *inode, struct page *locked_page,
  *
  * If no cow copies or snapshots exist, we write directly to the existing
  * blocks on disk
+ * for the full page. Otherwise we fall back to COW, as we don't yet
+ * support subpage writes.
+ *
+ * For the subpage case, since we can't submit subpage data writes yet,
+ * NOCOW has a stricter condition: the extent must cover the full page.
+ * Otherwise we fall back to COW for the full page.
  */
 static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 				       struct page *locked_page,
@@ -1592,6 +1598,20 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 			    btrfs_file_extent_encryption(leaf, fi) ||
 			    btrfs_file_extent_other_encoding(leaf, fi))
 				goto out_check;
+			/*
+			 * If the file offset, extent offset or extent end
+			 * is not page aligned, we skip it and fall back to
+			 * COW. This is mostly overkill, but to make subpage
+			 * NOCOW writes easier, we only allow writes into
+			 * page aligned extents.
+			 *
+			 * TODO: Remove this once full subpage writes are
+			 * supported.
+			 */
+			if (!IS_ALIGNED(found_key.offset, PAGE_SIZE) ||
+			    !IS_ALIGNED(extent_end, PAGE_SIZE) ||
+			    !IS_ALIGNED(extent_offset, PAGE_SIZE))
+				goto out_check;
 			/*
 			 * If extent is created before the last volume's snapshot
 			 * this implies the extent is shared, hence we can't do
@@ -1676,8 +1696,8 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 		 */
 		if (!nocow) {
 			if (cow_start == (u64)-1)
-				cow_start = cur_offset;
-			cur_offset = extent_end;
+				cow_start = round_down(cur_offset, PAGE_SIZE);
+			cur_offset = round_up(extent_end, PAGE_SIZE);
 			if (cur_offset > end)
 				break;
 			path->slots[0]++;
@@ -1692,6 +1712,7 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 		 * NOCOW, following one which needs to be COW'ed
 		 */
 		if (cow_start != (u64)-1) {
+			ASSERT(IS_ALIGNED(cow_start, PAGE_SIZE));
 			ret = fallback_to_cow(inode, locked_page,
 					      cow_start, found_key.offset - 1,
 					      page_started, nr_written);
@@ -1700,6 +1721,9 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 			cow_start = (u64)-1;
 		}
 
+		ASSERT(IS_ALIGNED(cur_offset, PAGE_SIZE) &&
+		       IS_ALIGNED(num_bytes, PAGE_SIZE) &&
+		       IS_ALIGNED(found_key.offset, PAGE_SIZE));
 		if (extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
 			u64 orig_start = found_key.offset - extent_offset;
 			struct extent_map *em;
@@ -1774,7 +1798,7 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 		cow_start = cur_offset;
 
 	if (cow_start != (u64)-1) {
-		cur_offset = end;
+		cur_offset = round_up(end, PAGE_SIZE) - 1;
 		ret = fallback_to_cow(inode, locked_page, cow_start, end,
 				      page_started, nr_written);
 		if (ret)
-- 
2.28.0



* [PATCH v4 67/68] btrfs: reflink: do full page writeback for reflink prepare
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (65 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 66/68] btrfs: inode: only do NOCOW write for page aligned extent Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21  6:25 ` [PATCH v4 68/68] btrfs: support subpage read write for test Qu Wenruo
                   ` (2 subsequent siblings)
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

Since we don't support subpage writeback yet, let
btrfs_remap_file_range_prep() do full-page writeback.

This only affects subpage support, as for the regular case sectorsize
already equals PAGE_SIZE.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/reflink.c | 36 ++++++++++++++++++++++++++----------
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/reflink.c b/fs/btrfs/reflink.c
index 5cd02514cf4d..e8023c1dcb5d 100644
--- a/fs/btrfs/reflink.c
+++ b/fs/btrfs/reflink.c
@@ -700,9 +700,15 @@ static int btrfs_remap_file_range_prep(struct file *file_in, loff_t pos_in,
 {
 	struct inode *inode_in = file_inode(file_in);
 	struct inode *inode_out = file_inode(file_out);
-	u64 bs = BTRFS_I(inode_out)->root->fs_info->sb->s_blocksize;
+	/*
+	 * We don't support subpage writes yet, thus for data writeback we
+	 * must use PAGE_SIZE here. But for reflink itself we still support
+	 * proper sector alignment.
+	 */
+	u32 wb_bs = PAGE_SIZE;
 	bool same_inode = inode_out == inode_in;
-	u64 wb_len;
+	u64 in_wb_len;
+	u64 out_wb_len;
 	int ret;
 
 	if (!(remap_flags & REMAP_FILE_DEDUP)) {
@@ -735,11 +741,21 @@ static int btrfs_remap_file_range_prep(struct file *file_in, loff_t pos_in,
 	 *    waits for the writeback to complete, i.e. for IO to be done, and
 	 *    not for the ordered extents to complete. We need to wait for them
 	 *    to complete so that new file extent items are in the fs tree.
+	 *
+	 * Also, for the subpage case, the same length can cover a different
+	 * number of pages at different offsets, thus we have to calculate
+	 * the wb_len for each file separately.
 	 */
-	if (*len == 0 && !(remap_flags & REMAP_FILE_DEDUP))
-		wb_len = ALIGN(inode_in->i_size, bs) - ALIGN_DOWN(pos_in, bs);
-	else
-		wb_len = ALIGN(*len, bs);
+	if (*len == 0 && !(remap_flags & REMAP_FILE_DEDUP)) {
+		in_wb_len = round_up(inode_in->i_size, wb_bs) -
+			    round_down(pos_in, wb_bs);
+		out_wb_len = in_wb_len;
+	} else {
+		in_wb_len = round_up(pos_in + *len, wb_bs) -
+			    round_down(pos_in, wb_bs);
+		out_wb_len = round_up(pos_out + *len, wb_bs) -
+			     round_down(pos_out, wb_bs);
+	}
 
 	/*
 	 * Since we don't lock ranges, wait for ongoing lockless dio writes (as
@@ -771,12 +787,12 @@ static int btrfs_remap_file_range_prep(struct file *file_in, loff_t pos_in,
 	if (ret < 0)
 		return ret;
 
-	ret = btrfs_wait_ordered_range(inode_in, ALIGN_DOWN(pos_in, bs),
-				       wb_len);
+	ret = btrfs_wait_ordered_range(inode_in, round_down(pos_in, wb_bs),
+				       in_wb_len);
 	if (ret < 0)
 		return ret;
-	ret = btrfs_wait_ordered_range(inode_out, ALIGN_DOWN(pos_out, bs),
-				       wb_len);
+	ret = btrfs_wait_ordered_range(inode_out, round_down(pos_out, wb_bs),
+				       out_wb_len);
 	if (ret < 0)
 		return ret;
 
-- 
2.28.0



* [PATCH v4 68/68] btrfs: support subpage read write for test
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (66 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 67/68] btrfs: reflink: do full page writeback for reflink prepare Qu Wenruo
@ 2020-10-21  6:25 ` Qu Wenruo
  2020-10-21 11:22 ` [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size David Sterba
  2020-11-02 14:56 ` David Sterba
  69 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21  6:25 UTC (permalink / raw)
  To: linux-btrfs

Now that subpage metadata read/write and full page data write work,
remove the checks that limited subpage (sectorsize < PAGE_SIZE) mounts
to read-only, so that the read-write paths can be tested.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 10 ----------
 fs/btrfs/super.c   |  7 -------
 2 files changed, 17 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 2ac980f739dc..8b5f65e6c5fa 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3335,16 +3335,6 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 		goto fail_alloc;
 	}
 
-	/* For 4K sector size support, it's only read-only yet */
-	if (PAGE_SIZE == SZ_64K && sectorsize == SZ_4K) {
-		if (!sb_rdonly(sb) || btrfs_super_log_root(disk_super)) {
-			btrfs_err(fs_info,
-				"subpage sector size only support RO yet");
-			err = -EINVAL;
-			goto fail_alloc;
-		}
-	}
-
 	ret = btrfs_init_workqueues(fs_info, fs_devices);
 	if (ret) {
 		err = ret;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 743a2fadf4ee..25967ecaaf0a 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1922,13 +1922,6 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 			ret = -EINVAL;
 			goto restore;
 		}
-		if (btrfs_is_subpage(fs_info)) {
-			btrfs_warn(fs_info,
-	"read-write mount is not yet allowed for sector size %u page size %lu",
-				   fs_info->sectorsize, PAGE_SIZE);
-			ret = -EINVAL;
-			goto restore;
-		}
 
 		ret = btrfs_cleanup_fs_roots(fs_info);
 		if (ret)
-- 
2.28.0



* Re: [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (67 preceding siblings ...)
  2020-10-21  6:25 ` [PATCH v4 68/68] btrfs: support subpage read write for test Qu Wenruo
@ 2020-10-21 11:22 ` David Sterba
  2020-10-21 11:50   ` Qu Wenruo
  2020-11-02 14:56 ` David Sterba
  69 siblings, 1 reply; 97+ messages in thread
From: David Sterba @ 2020-10-21 11:22 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Wed, Oct 21, 2020 at 02:24:46PM +0800, Qu Wenruo wrote:
> === Patchset structure ===
> Patch 01~03:	Small bug fixes
> Patch 04~22:	Generic cleanup and refactors, which make sense without
> 		subpage support
> Patch 23~27:	Subpage specific cleanup and refactors.
> Patch 28~42:	Enablement for subpage RO mount
> Patch 43~52:	Enablement for subpage metadata write
> Patch 53~68:	Enablement for subpage data write (although still in
> 		page size)

That's a sane grouping to merge from the top, though some updates could
still be required. There are some pending patchsets for next and I don't
have an estimate for conflicts regarding the cleanups you have in this
patchset, so we'll see. Everything up to 27 should be mergeable in this
dev cycle.


* Re: [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size
  2020-10-21 11:22 ` [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size David Sterba
@ 2020-10-21 11:50   ` Qu Wenruo
  0 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-21 11:50 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs





On 2020/10/21 下午7:22, David Sterba wrote:
> On Wed, Oct 21, 2020 at 02:24:46PM +0800, Qu Wenruo wrote:
>> === Patchset structure ===
>> Patch 01~03:	Small bug fixes
>> Patch 04~22:	Generic cleanup and refactors, which make sense without
>> 		subpage support
>> Patch 23~27:	Subpage specific cleanup and refactors.
>> Patch 28~42:	Enablement for subpage RO mount
>> Patch 43~52:	Enablement for subpage metadata write
>> Patch 53~68:	Enablement for subpage data write (although still in
>> 		page size)
> 
> That's a sane grouping to merge it from the top, though it still could
> be some updates required. There are some pending patchsets for next and
> I don't have an estimate for conflicts regarding the cleanups you have
> in this patchset so we'll see.  All up to 27 should be mergeable in this
> dev cycle.
> 

That's great. If the conflicts are not manageable, feel free to ask me
to do the rebase.

The main conflict I can foresee is with the metadata readpage refactor
from Nik, but my current structure already calls submit_extent_page()
directly in a similar way, so I guess it shouldn't be too disruptive.

Thanks,
Qu




* Re: [PATCH v4 07/68] btrfs: disk-io: replace @fs_info and @private_data with @inode for btrfs_wq_submit_bio()
  2020-10-21  6:24 ` [PATCH v4 07/68] btrfs: disk-io: replace @fs_info and @private_data with @inode for btrfs_wq_submit_bio() Qu Wenruo
@ 2020-10-21 22:00   ` Goldwyn Rodrigues
  0 siblings, 0 replies; 97+ messages in thread
From: Goldwyn Rodrigues @ 2020-10-21 22:00 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

Just some commit message rephrasing:

On 14:24 21/10, Qu Wenruo wrote:
> All callers for btrfs_wq_submit_bio() passes struct inode as
              of                        pass                of

> @private_data, so there is no need for @private_data to be (void *),
> just replace it with "struct inode *inode".
> 
> While we can extra fs_info from struct inode, also remove the @fs_info
> parameter.

Since we can extract fs_info

> 
> Since we're here, also replace all the (void *private_data) into (struct
> inode *inode).
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

-- 
Goldwyn


* Re: [PATCH v4 08/68] btrfs: inode: sink parameter @start and @len for check_data_csum()
  2020-10-21  6:24 ` [PATCH v4 08/68] btrfs: inode: sink parameter @start and @len for check_data_csum() Qu Wenruo
@ 2020-10-21 22:11   ` Goldwyn Rodrigues
  2020-10-27  0:13   ` David Sterba
  1 sibling, 0 replies; 97+ messages in thread
From: Goldwyn Rodrigues @ 2020-10-21 22:11 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

In $SUBJECT, I would prefer s/sink/remove/

On 14:24 21/10, Qu Wenruo wrote:
> For check_data_csum(), the page we're using is directly from inode
> mapping, thus it has valid page_offset().
> 
> We can use (page_offset() + pg_off) to replace @start parameter
> completely, while the @len should always be sectorsize.
> 
> Since we're here, also add some comment, as there are quite some
> confusion in words like start/offset, without explaining whether it's
> file_offset or logical bytenr.
> 
> This should not affect the existing behavior, as for current sectorsize
> == PAGE_SIZE case, @pgoff should always be 0, and len is always
> PAGE_SIZE (or sectorsize from the dio read path).
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

> ---
>  fs/btrfs/inode.c | 27 +++++++++++++++++++--------
>  1 file changed, 19 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 2a56d3b8eff4..24fbf2c46e56 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -2791,17 +2791,30 @@ void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start,
>  	btrfs_queue_work(wq, &ordered_extent->work);
>  }
>  
> +/*
> + * Verify the checksum of one sector of uncompressed data.
> + *
> + * @inode:	The inode.
> + * @io_bio:	The btrfs_io_bio which contains the csum.
> + * @icsum:	The csum offset (by number of sectors).
> + * @page:	The page where the data to be verified is.
> + * @pgoff:	The offset inside the page.
> + *
> + * The length of such check is always one sector size.
> + */
>  static int check_data_csum(struct inode *inode, struct btrfs_io_bio *io_bio,
> -			   int icsum, struct page *page, int pgoff, u64 start,
> -			   size_t len)
> +			   int icsum, struct page *page, int pgoff)
>  {
>  	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>  	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
>  	char *kaddr;
> +	u32 len = fs_info->sectorsize;
>  	u16 csum_size = btrfs_super_csum_size(fs_info->super_copy);
>  	u8 *csum_expected;
>  	u8 csum[BTRFS_CSUM_SIZE];
>  
> +	ASSERT(pgoff + len <= PAGE_SIZE);
> +
>  	csum_expected = ((u8 *)io_bio->csum) + icsum * csum_size;
>  
>  	kaddr = kmap_atomic(page);
> @@ -2815,8 +2828,8 @@ static int check_data_csum(struct inode *inode, struct btrfs_io_bio *io_bio,
>  	kunmap_atomic(kaddr);
>  	return 0;
>  zeroit:
> -	btrfs_print_data_csum_error(BTRFS_I(inode), start, csum, csum_expected,
> -				    io_bio->mirror_num);
> +	btrfs_print_data_csum_error(BTRFS_I(inode), page_offset(page) + pgoff,
> +				    csum, csum_expected, io_bio->mirror_num);
>  	if (io_bio->device)
>  		btrfs_dev_stat_inc_and_print(io_bio->device,
>  					     BTRFS_DEV_STAT_CORRUPTION_ERRS);
> @@ -2855,8 +2868,7 @@ static int btrfs_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
>  	}
>  
>  	phy_offset >>= inode->i_sb->s_blocksize_bits;
> -	return check_data_csum(inode, io_bio, phy_offset, page, offset, start,
> -			       (size_t)(end - start + 1));
> +	return check_data_csum(inode, io_bio, phy_offset, page, offset);
>  }
>  
>  /*
> @@ -7542,8 +7554,7 @@ static blk_status_t btrfs_check_read_dio_bio(struct inode *inode,
>  			ASSERT(pgoff < PAGE_SIZE);
>  			if (uptodate &&
>  			    (!csum || !check_data_csum(inode, io_bio, icsum,
> -						       bvec.bv_page, pgoff,
> -						       start, sectorsize))) {
> +						       bvec.bv_page, pgoff))) {
>  				clean_io_failure(fs_info, failure_tree, io_tree,
>  						 start, bvec.bv_page,
>  						 btrfs_ino(BTRFS_I(inode)),
> -- 
> 2.28.0
> 

-- 
Goldwyn


* Re: [PATCH v4 01/68] btrfs: extent-io-tests: remove invalid tests
  2020-10-21  6:24 ` [PATCH v4 01/68] btrfs: extent-io-tests: remove invalid tests Qu Wenruo
@ 2020-10-26 23:26   ` David Sterba
  2020-10-27  0:44     ` Qu Wenruo
  0 siblings, 1 reply; 97+ messages in thread
From: David Sterba @ 2020-10-26 23:26 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Wed, Oct 21, 2020 at 02:24:47PM +0800, Qu Wenruo wrote:
> In extent-io-test, there are two invalid tests:
> - Invalid nodesize for test_eb_bitmaps()
>   Instead of the sectorsize and nodesize combination passed in, we're
>   always using hand-crafted nodesize.
>   Although it has some extra check for 64K page size, we can still hit
>   a case where PAGE_SIZE == 32K, then we got 128K nodesize which is
>   larger than max valid node size.
> 
>   Thankfully most machines are either 4K or 64K page size, thus we
>   haven't yet hit such case.
> 
> - Invalid extent buffer bytenr
>   For 64K page size, the only combination we're going to test is
>   sectorsize = nodesize = 64K.
>   In that case, we'll try to create an extent buffer with 32K bytenr,
>   which is not aligned to sectorsize thus invalid.
> 
> This patch will fix both problems by:
> - Honor the sectorsize/nodesize combination
>   Now we won't bother to hand-craft a strange length and use it as
>   nodesize.
> 
> - Use sectorsize as the 2nd run extent buffer start
>   This would test the case where extent buffer is aligned to sectorsize
>   but not always aligned to nodesize.

The code has evolved since it was added in 0f3312295d3ce1d823 ("Btrfs:
add extent buffer bitmap sanity tests") and "page * 4" is intentional,
to provide a buffer in which the shifted bitmap is tested. The logic has
not changed; only the ppc64 case was added.

And I remember that tweaking this code tended to break on a real machine
so there are a few things that bother me:

- the test does something and I'm not sure it's invalid (I think it's
  not)
- test on a real 64k page machine is needed
- you reduce the scope of the test to fewer combinations

If there are combinations that would make things hard for the subpage
case, it would be better to add them as exceptions; otherwise the main
use case is the 4K page, and the current code allows more combinations
to be tested.


* Re: [PATCH v4 08/68] btrfs: inode: sink parameter @start and @len for check_data_csum()
  2020-10-21  6:24 ` [PATCH v4 08/68] btrfs: inode: sink parameter @start and @len for check_data_csum() Qu Wenruo
  2020-10-21 22:11   ` Goldwyn Rodrigues
@ 2020-10-27  0:13   ` David Sterba
  2020-10-27  0:50     ` Qu Wenruo
  1 sibling, 1 reply; 97+ messages in thread
From: David Sterba @ 2020-10-27  0:13 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Wed, Oct 21, 2020 at 02:24:54PM +0800, Qu Wenruo wrote:
> For check_data_csum(), the page we're using is directly from inode
> mapping, thus it has valid page_offset().
> 
> We can use (page_offset() + pg_off) to replace @start parameter
> completely, while the @len should always be sectorsize.
> 
> Since we're here, also add some comment, as there are quite some
> confusion in words like start/offset, without explaining whether it's
> file_offset or logical bytenr.
> 
> This should not affect the existing behavior, as for current sectorsize
> == PAGE_SIZE case, @pgoff should always be 0, and len is always
> PAGE_SIZE (or sectorsize from the dio read path).
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/inode.c | 27 +++++++++++++++++++--------
>  1 file changed, 19 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 2a56d3b8eff4..24fbf2c46e56 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -2791,17 +2791,30 @@ void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start,
>  	btrfs_queue_work(wq, &ordered_extent->work);
>  }
>  
> +/*
> + * Verify the checksum of one sector of uncompressed data.
> + *
> + * @inode:	The inode.
> + * @io_bio:	The btrfs_io_bio which contains the csum.
> + * @icsum:	The csum offset (by number of sectors).

This is not true; it's the index into the checksum array, where the size
of each element is fs_info::csum_size. The offset can be calculated from
it, but it's not what is passed as the argument.


* Re: [PATCH v4 09/68] btrfs: extent_io: unexport extent_invalidatepage()
  2020-10-21  6:24 ` [PATCH v4 09/68] btrfs: extent_io: unexport extent_invalidatepage() Qu Wenruo
@ 2020-10-27  0:24   ` David Sterba
  0 siblings, 0 replies; 97+ messages in thread
From: David Sterba @ 2020-10-27  0:24 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Wed, Oct 21, 2020 at 02:24:55PM +0800, Qu Wenruo wrote:
> Function extent_invalidatepage() has a single caller,
> btree_invalidatepage().

That is so, but the function is also part of the extent io tree API, so
it's in the right file with the other functions.


* Re: [PATCH v4 10/68] btrfs: extent_io: remove the forward declaration and rename __process_pages_contig
  2020-10-21  6:24 ` [PATCH v4 10/68] btrfs: extent_io: remove the forward declaration and rename __process_pages_contig Qu Wenruo
@ 2020-10-27  0:28   ` David Sterba
  2020-10-27  0:50     ` Qu Wenruo
  0 siblings, 1 reply; 97+ messages in thread
From: David Sterba @ 2020-10-27  0:28 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Wed, Oct 21, 2020 at 02:24:56PM +0800, Qu Wenruo wrote:
> There is no need to do forward declaration for __process_pages_contig(),
> so move it before it get first called.

But without a better reason than removing the prototype, we don't want
to move the code.

> Since we are here, also remove the "__" prefix since there is no special
> meaning for it.

Renaming and adding the comment is fine on its own, but it does not
justify moving the chunk of code.


* Re: [PATCH v4 01/68] btrfs: extent-io-tests: remove invalid tests
  2020-10-26 23:26   ` David Sterba
@ 2020-10-27  0:44     ` Qu Wenruo
  2020-11-03  6:07       ` Qu Wenruo
  0 siblings, 1 reply; 97+ messages in thread
From: Qu Wenruo @ 2020-10-27  0:44 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs





On 2020/10/27 上午7:26, David Sterba wrote:
> On Wed, Oct 21, 2020 at 02:24:47PM +0800, Qu Wenruo wrote:
>> In extent-io-test, there are two invalid tests:
>> - Invalid nodesize for test_eb_bitmaps()
>>   Instead of the sectorsize and nodesize combination passed in, we're
>>   always using hand-crafted nodesize.
>>   Although it has some extra check for 64K page size, we can still hit
>>   a case where PAGE_SIZE == 32K, then we got 128K nodesize which is
>>   larger than max valid node size.
>>
>>   Thankfully most machines are either 4K or 64K page size, thus we
>>   haven't yet hit such case.
>>
>> - Invalid extent buffer bytenr
>>   For 64K page size, the only combination we're going to test is
>>   sectorsize = nodesize = 64K.
>>   In that case, we'll try to create an extent buffer with 32K bytenr,
>>   which is not aligned to sectorsize thus invalid.
>>
>> This patch will fix both problems by:
>> - Honor the sectorsize/nodesize combination
>>   Now we won't bother to hand-craft a strange length and use it as
>>   nodesize.
>>
>> - Use sectorsize as the 2nd run extent buffer start
>>   This would test the case where extent buffer is aligned to sectorsize
>>   but not always aligned to nodesize.
> 
> The code has evolved since it was added in 0f3312295d3ce1d823 ("Btrfs:
> add extent buffer bitmap sanity tests") and "page * 4" is intentional to
> provide buffer where the shifted bitmap is tested. The logic has not
> changed, only the ppc64 case was added.
> 
> And I remember that tweaking this code tended to break on a real machine
> so there are a few things that bother me:
> 
> - the test does something and I'm not sure it's invalid (I think it's
>   not)

The sector is the minimal unit that every tree block and data extent
must follow (the only exception is the superblock). Thus an extent
buffer that starts at half the sector size is definitely invalid.

> - test on a real 64k page machine is needed

It gets tested every time I insert the btrfs kernel module on my RK3399
board with 64K page size.

> - you reduce the scope of the test to fewer combinations

Well, removing invalid cases would definitely lead to fewer combinations
anyway.

> 
> If there are combinations that would make it hard for the subpage then
> it would be better to add it as an exception but otherwise the main
> usecase is for 4K page and this allows more combinations to test.
> 
No, there isn't anything specific to subpage here.

It's just that the things related to "sector" are broken in these test
cases.

Thanks,
Qu




* Re: [PATCH v4 08/68] btrfs: inode: sink parameter @start and @len for check_data_csum()
  2020-10-27  0:13   ` David Sterba
@ 2020-10-27  0:50     ` Qu Wenruo
  2020-10-27 23:17       ` David Sterba
  0 siblings, 1 reply; 97+ messages in thread
From: Qu Wenruo @ 2020-10-27  0:50 UTC (permalink / raw)
  To: dsterba, linux-btrfs





On 2020/10/27 上午8:13, David Sterba wrote:
> On Wed, Oct 21, 2020 at 02:24:54PM +0800, Qu Wenruo wrote:
>> For check_data_csum(), the page we're using is directly from inode
>> mapping, thus it has valid page_offset().
>>
>> We can use (page_offset() + pg_off) to replace @start parameter
>> completely, while the @len should always be sectorsize.
>>
>> Since we're here, also add some comment, as there are quite some
>> confusion in words like start/offset, without explaining whether it's
>> file_offset or logical bytenr.
>>
>> This should not affect the existing behavior, as for current sectorsize
>> == PAGE_SIZE case, @pgoff should always be 0, and len is always
>> PAGE_SIZE (or sectorsize from the dio read path).
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>  fs/btrfs/inode.c | 27 +++++++++++++++++++--------
>>  1 file changed, 19 insertions(+), 8 deletions(-)
>>
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index 2a56d3b8eff4..24fbf2c46e56 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -2791,17 +2791,30 @@ void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start,
>>  	btrfs_queue_work(wq, &ordered_extent->work);
>>  }
>>  
>> +/*
>> + * Verify the checksum of one sector of uncompressed data.
>> + *
>> + * @inode:	The inode.
>> + * @io_bio:	The btrfs_io_bio which contains the csum.
>> + * @icsum:	The csum offset (by number of sectors).
> 
> This is not true, it's the index to the checksum array, where size of
> the element is fs_info::csum_size. The offset can be calculated but it's
> not the thing that's passed as argument.

Isn't the offset by sectors the same?

If it's 1, it means we need to skip one csum, which is csum_size bytes.

Or is this again my bad wording?

Thanks,
Qu




* Re: [PATCH v4 10/68] btrfs: extent_io: remove the forward declaration and rename __process_pages_contig
  2020-10-27  0:28   ` David Sterba
@ 2020-10-27  0:50     ` Qu Wenruo
  2020-10-27 23:25       ` David Sterba
  0 siblings, 1 reply; 97+ messages in thread
From: Qu Wenruo @ 2020-10-27  0:50 UTC (permalink / raw)
  To: dsterba, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 822 bytes --]



On 2020/10/27 8:28 AM, David Sterba wrote:
> On Wed, Oct 21, 2020 at 02:24:56PM +0800, Qu Wenruo wrote:
>> There is no need to do forward declaration for __process_pages_contig(),
>> so move it before it get first called.
> 
> But without other good reason than prototype removal we don't want to
> move the code.
> 
>> Since we are here, also remove the "__" prefix since there is no special
>> meaning for it.
> 
> Renaming and adding the comment is fine on itself but does not justify
> moving the chunk of code.
> 
I thought the forward declaration should be something we clean up during
development.

But it looks like it's no longer the case or my memory is just blurry.

Anyway, I can definitely keep the forward declaration and just keep the
renaming and new comments.

Thanks,
Qu




* Re: [PATCH v4 13/68] btrfs: extent_io: remove the extent_start/extent_len for end_bio_extent_readpage()
  2020-10-21  6:24 ` [PATCH v4 13/68] btrfs: extent_io: remove the extent_start/extent_len for end_bio_extent_readpage() Qu Wenruo
@ 2020-10-27 10:29   ` David Sterba
  2020-10-27 12:15     ` Qu Wenruo
  0 siblings, 1 reply; 97+ messages in thread
From: David Sterba @ 2020-10-27 10:29 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Wed, Oct 21, 2020 at 02:24:59PM +0800, Qu Wenruo wrote:
> In end_bio_extent_readpage() we had a strange dance around
> extent_start/extent_len.
> 
> The truth is, no matter what we're doing using those two variable, the
> end result is just the same, clear the EXTENT_LOCKED bit and if needed
> set the EXTENT_UPTODATE bit for the io_tree.
> 
> This doesn't need the complex dance, we can do it pretty easily by just
> calling endio_readpage_release_extent() for each bvec.
> 
> This greatly streamlines the code.

Yes it does, the old code is a series of conditions and new code is just
one call but it's hard to see why this is correct. Can you please write
some guidance, what are the invariants or how does the logic simplify?
What you write above is a summary but for review I'd need something to
follow so I don't have to spend too much time reading just this patch.
Thanks.


* Re: [PATCH v4 16/68] btrfs: extent_io: add assert_spin_locked() for attach_extent_buffer_page()
  2020-10-21  6:25 ` [PATCH v4 16/68] btrfs: extent_io: add assert_spin_locked() for attach_extent_buffer_page() Qu Wenruo
@ 2020-10-27 10:43   ` David Sterba
  0 siblings, 0 replies; 97+ messages in thread
From: David Sterba @ 2020-10-27 10:43 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Nikolay Borisov

On Wed, Oct 21, 2020 at 02:25:02PM +0800, Qu Wenruo wrote:
> When calling attach_extent_buffer_page(), either we're attaching
> anonymous pages, called from btrfs_clone_extent_buffer().
> 
> Or we're attaching btree_inode pages, called from alloc_extent_buffer().
> 
> For the later case, we should have page->mapping->private_lock hold to
> avoid race modifying page->private.
> 
> Add assert_spin_locked() if we're calling from alloc_extent_buffer().
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
> ---
>  fs/btrfs/extent_io.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 5842d3522865..8bf38948bd37 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -3106,6 +3106,15 @@ static int submit_extent_page(unsigned int opf,
>  static void attach_extent_buffer_page(struct extent_buffer *eb,
>  				      struct page *page)
>  {
> +	/*
> +	 * If the page is mapped to btree inode, we should hold the private
> +	 * lock to prevent race.
> +	 * For cloned or dummy extent buffers, their pages are not mapped and
> +	 * will not race with any other ebs.
> +	 */
> +	if (page->mapping)
> +		assert_spin_locked(&page->mapping->private_lock);

assert_spin_locked per documentation checks if the spinlock is locked
on any cpu, but from the comments above you want to assert that it's
held by the caller. So for that you need lockdep_assert_held; I don't
think we'd ever want assert_spin_locked in our code.

> +
>  	if (!PagePrivate(page))
>  		attach_page_private(page, eb);
>  	else
> -- 
> 2.28.0


* Re: [PATCH v4 18/68] btrfs: extent_io: calculate inline extent buffer page size based on page size
  2020-10-21  6:25 ` [PATCH v4 18/68] btrfs: extent_io: calculate inline extent buffer page size based on page size Qu Wenruo
@ 2020-10-27 11:16   ` David Sterba
  2020-10-27 11:20     ` David Sterba
  0 siblings, 1 reply; 97+ messages in thread
From: David Sterba @ 2020-10-27 11:16 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Wed, Oct 21, 2020 at 02:25:04PM +0800, Qu Wenruo wrote:
> -#define INLINE_EXTENT_BUFFER_PAGES 16
> -#define MAX_INLINE_EXTENT_BUFFER_SIZE (INLINE_EXTENT_BUFFER_PAGES * PAGE_SIZE)
> +/*
> + * The SZ_64K is BTRFS_MAX_METADATA_BLOCKSIZE, here just to avoid circle
> + * including "ctree.h".

This should be moved to features.h instead of the duplicate definition.

> + */
> +#define INLINE_EXTENT_BUFFER_PAGES (SZ_64K / PAGE_SIZE)



* Re: [PATCH v4 18/68] btrfs: extent_io: calculate inline extent buffer page size based on page size
  2020-10-27 11:16   ` David Sterba
@ 2020-10-27 11:20     ` David Sterba
  0 siblings, 0 replies; 97+ messages in thread
From: David Sterba @ 2020-10-27 11:20 UTC (permalink / raw)
  To: David Sterba; +Cc: Qu Wenruo, linux-btrfs

On Tue, Oct 27, 2020 at 12:16:32PM +0100, David Sterba wrote:
> On Wed, Oct 21, 2020 at 02:25:04PM +0800, Qu Wenruo wrote:
> > -#define INLINE_EXTENT_BUFFER_PAGES 16
> > -#define MAX_INLINE_EXTENT_BUFFER_SIZE (INLINE_EXTENT_BUFFER_PAGES * PAGE_SIZE)
> > +/*
> > + * The SZ_64K is BTRFS_MAX_METADATA_BLOCKSIZE, here just to avoid circle
> > + * including "ctree.h".
> 
> This should be moved to features.h instead of the duplicate definition.

So features.h was some leftover in my tree; we don't have that file yet,
so this will need a cleanup first. I'll keep the patch in the series so
we can test it, but I hope it won't be in the final version.


* Re: [PATCH v4 13/68] btrfs: extent_io: remove the extent_start/extent_len for end_bio_extent_readpage()
  2020-10-27 10:29   ` David Sterba
@ 2020-10-27 12:15     ` Qu Wenruo
  2020-10-27 23:31       ` David Sterba
  0 siblings, 1 reply; 97+ messages in thread
From: Qu Wenruo @ 2020-10-27 12:15 UTC (permalink / raw)
  To: dsterba, linux-btrfs





On 2020/10/27 6:29 PM, David Sterba wrote:
> On Wed, Oct 21, 2020 at 02:24:59PM +0800, Qu Wenruo wrote:
>> In end_bio_extent_readpage() we had a strange dance around
>> extent_start/extent_len.
>>
>> The truth is, no matter what we're doing using those two variable, the
>> end result is just the same, clear the EXTENT_LOCKED bit and if needed
>> set the EXTENT_UPTODATE bit for the io_tree.
>>
>> This doesn't need the complex dance, we can do it pretty easily by just
>> calling endio_readpage_release_extent() for each bvec.
>>
>> This greatly streamlines the code.
> 
> Yes it does, the old code is a series of conditions and new code is just
> one call but it's hard to see why this is correct. Can you please write
> some guidance, what are the invariants or how does the logic simplify?
> What you write above is a summary but for review I'd need something to
> follow so I don't have to spend too much time reading just this patch.
> Thanks.
> 
Sorry, I should add more explanation on that, would add that in next update.

Thanks,
Qu




* Re: [PATCH v4 22/68] btrfs: disk_io: grab fs_info from extent_buffer::fs_info directly for btrfs_mark_buffer_dirty()
  2020-10-21  6:25 ` [PATCH v4 22/68] btrfs: disk_io: grab fs_info from extent_buffer::fs_info directly for btrfs_mark_buffer_dirty() Qu Wenruo
@ 2020-10-27 15:43   ` Goldwyn Rodrigues
  0 siblings, 0 replies; 97+ messages in thread
From: Goldwyn Rodrigues @ 2020-10-27 15:43 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On 14:25 21/10, Qu Wenruo wrote:
> Since commit f28491e0a6c4 ("Btrfs: move the extent buffer radix tree into
> the fs_info"), fs_info can be grabbed from extent_buffer directly.
> 
> So use that extent_buffer::fs_info directly in btrfs_mark_buffer_dirty()
> to make things a little easier.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

-- 
Goldwyn


* Re: [PATCH v4 08/68] btrfs: inode: sink parameter @start and @len for check_data_csum()
  2020-10-27  0:50     ` Qu Wenruo
@ 2020-10-27 23:17       ` David Sterba
  2020-10-28  0:57         ` Qu Wenruo
  0 siblings, 1 reply; 97+ messages in thread
From: David Sterba @ 2020-10-27 23:17 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, linux-btrfs

On Tue, Oct 27, 2020 at 08:50:15AM +0800, Qu Wenruo wrote:
> On 2020/10/27 8:13 AM, David Sterba wrote:
> > On Wed, Oct 21, 2020 at 02:24:54PM +0800, Qu Wenruo wrote:
> >> For check_data_csum(), the page we're using is directly from inode
> >> mapping, thus it has valid page_offset().
> >>
> >> We can use (page_offset() + pg_off) to replace @start parameter
> >> completely, while the @len should always be sectorsize.
> >>
> >> Since we're here, also add some comment, as there are quite some
> >> confusion in words like start/offset, without explaining whether it's
> >> file_offset or logical bytenr.
> >>
> >> This should not affect the existing behavior, as for current sectorsize
> >> == PAGE_SIZE case, @pgoff should always be 0, and len is always
> >> PAGE_SIZE (or sectorsize from the dio read path).
> >>
> >> Signed-off-by: Qu Wenruo <wqu@suse.com>
> >> ---
> >>  fs/btrfs/inode.c | 27 +++++++++++++++++++--------
> >>  1 file changed, 19 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> >> index 2a56d3b8eff4..24fbf2c46e56 100644
> >> --- a/fs/btrfs/inode.c
> >> +++ b/fs/btrfs/inode.c
> >> @@ -2791,17 +2791,30 @@ void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start,
> >>  	btrfs_queue_work(wq, &ordered_extent->work);
> >>  }
> >>  
> >> +/*
> >> + * Verify the checksum of one sector of uncompressed data.
> >> + *
> >> + * @inode:	The inode.
> >> + * @io_bio:	The btrfs_io_bio which contains the csum.
> >> + * @icsum:	The csum offset (by number of sectors).
> > 
> > This is not true, it's the index to the checksum array, where size of
> > the element is fs_info::csum_size. The offset can be calculated but it's
> > not the thing that's passed as argument.

> Isn't the offset by sectors the same?

Offset by sectors reads as something expressed in sector-sized units.
> 
> If it's 1, it means we need to skip 1 csum which is in csum_size.

Yes, so you see the difference sector vs csum_size. I understand what
you meant by that but reading the comment without going to the code can
confuse somebody.


* Re: [PATCH v4 10/68] btrfs: extent_io: remove the forward declaration and rename __process_pages_contig
  2020-10-27  0:50     ` Qu Wenruo
@ 2020-10-27 23:25       ` David Sterba
  0 siblings, 0 replies; 97+ messages in thread
From: David Sterba @ 2020-10-27 23:25 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, linux-btrfs

On Tue, Oct 27, 2020 at 08:50:15AM +0800, Qu Wenruo wrote:
> 
> 
> On 2020/10/27 8:28 AM, David Sterba wrote:
> > On Wed, Oct 21, 2020 at 02:24:56PM +0800, Qu Wenruo wrote:
> >> There is no need to do forward declaration for __process_pages_contig(),
> >> so move it before it get first called.
> > 
> > But without other good reason than prototype removal we don't want to
> > move the code.
> > 
> >> Since we are here, also remove the "__" prefix since there is no special
> >> meaning for it.
> > 
> > Renaming and adding the comment is fine on itself but does not justify
> > moving the chunk of code.
> > 
> I thought the forward declaration should be something we clean up during
> development.

Eventually yes, but commits that only move code pollute the git history,
so there needs to be some other reason, like splitting or refactoring.
Keeping the prototypes is not that bad: if one pops up during grep it's
quickly skipped, but when one is looking into why some code changed it's
very annoying to land in some "move code" patch. I'm trying to keep such
changes to a minimum, but there are cases where we want that, so it's not
a strict 'no, never', rather a case-by-case decision.
> 
> But it looks like it's no longer the case or my memory is just blurry.

The list of things to keep in mind is getting long
https://btrfs.wiki.kernel.org/index.php/Development_notes#Coding_style_preferences


* Re: [PATCH v4 13/68] btrfs: extent_io: remove the extent_start/extent_len for end_bio_extent_readpage()
  2020-10-27 12:15     ` Qu Wenruo
@ 2020-10-27 23:31       ` David Sterba
  0 siblings, 0 replies; 97+ messages in thread
From: David Sterba @ 2020-10-27 23:31 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, linux-btrfs

On Tue, Oct 27, 2020 at 08:15:58PM +0800, Qu Wenruo wrote:
> On 2020/10/27 6:29 PM, David Sterba wrote:
> > On Wed, Oct 21, 2020 at 02:24:59PM +0800, Qu Wenruo wrote:
> >> In end_bio_extent_readpage() we had a strange dance around
> >> extent_start/extent_len.
> >>
> >> The truth is, no matter what we're doing using those two variable, the
> >> end result is just the same, clear the EXTENT_LOCKED bit and if needed
> >> set the EXTENT_UPTODATE bit for the io_tree.
> >>
> >> This doesn't need the complex dance, we can do it pretty easily by just
> >> calling endio_readpage_release_extent() for each bvec.
> >>
> >> This greatly streamlines the code.
> > 
> > Yes it does, the old code is a series of conditions and new code is just
> > one call but it's hard to see why this is correct. Can you please write
> > some guidance, what are the invariants or how does the logic simplify?
> > What you write above is a summary but for review I'd need something to
> > follow so I don't have to spend too much time reading just this patch.
> > Thanks.
> > 
> Sorry, I should add more explanation on that, would add that in next update.

Most of the patches 1-20 are ok so I've picked them to a branch and will
move them to misc-next once the comments are answered. I've fixed what I
thought does not need a resend, but this patch is probably something
that you should resend and I'll try to understand.

What I have right now is in my github repo in branch
ext/qu/subpage-1-prep, so you can have a look. You don't need to resend
the whole series; if you have updates to any of the patches, reply to
them and I'll fold them in.


* Re: [PATCH v4 08/68] btrfs: inode: sink parameter @start and @len for check_data_csum()
  2020-10-27 23:17       ` David Sterba
@ 2020-10-28  0:57         ` Qu Wenruo
  2020-10-29 19:38           ` David Sterba
  0 siblings, 1 reply; 97+ messages in thread
From: Qu Wenruo @ 2020-10-28  0:57 UTC (permalink / raw)
  To: dsterba, linux-btrfs





On 2020/10/28 7:17 AM, David Sterba wrote:
> On Tue, Oct 27, 2020 at 08:50:15AM +0800, Qu Wenruo wrote:
>> On 2020/10/27 8:13 AM, David Sterba wrote:
>>> On Wed, Oct 21, 2020 at 02:24:54PM +0800, Qu Wenruo wrote:
>>>> For check_data_csum(), the page we're using is directly from inode
>>>> mapping, thus it has valid page_offset().
>>>>
>>>> We can use (page_offset() + pg_off) to replace @start parameter
>>>> completely, while the @len should always be sectorsize.
>>>>
>>>> Since we're here, also add some comment, as there are quite some
>>>> confusion in words like start/offset, without explaining whether it's
>>>> file_offset or logical bytenr.
>>>>
>>>> This should not affect the existing behavior, as for current sectorsize
>>>> == PAGE_SIZE case, @pgoff should always be 0, and len is always
>>>> PAGE_SIZE (or sectorsize from the dio read path).
>>>>
>>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>>> ---
>>>>  fs/btrfs/inode.c | 27 +++++++++++++++++++--------
>>>>  1 file changed, 19 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>>>> index 2a56d3b8eff4..24fbf2c46e56 100644
>>>> --- a/fs/btrfs/inode.c
>>>> +++ b/fs/btrfs/inode.c
>>>> @@ -2791,17 +2791,30 @@ void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start,
>>>>  	btrfs_queue_work(wq, &ordered_extent->work);
>>>>  }
>>>>  
>>>> +/*
>>>> + * Verify the checksum of one sector of uncompressed data.
>>>> + *
>>>> + * @inode:	The inode.
>>>> + * @io_bio:	The btrfs_io_bio which contains the csum.
>>>> + * @icsum:	The csum offset (by number of sectors).
>>>
>>> This is not true, it's the index to the checksum array, where size of
>>> the element is fs_info::csum_size. The offset can be calculated but it's
>>> not the thing that's passed as argument.
> 
>> Isn't the offset by sectors the same?
> 
> Offset by sectors reads as something expressed in sector-sized units.
>>
>> If it's 1, it means we need to skip 1 csum which is in csum_size.
> 
> Yes, so you see the difference sector vs csum_size. I understand what
> you meant by that but reading the comment without going to the code can
> confuse somebody.
> 

Any better naming alternative for that?

Or maybe I can refactor the function by passing the current
file_offset into it, and let check_data_csum() calculate the csum
offset by itself?

Thanks,
Qu




* Re: [PATCH v4 08/68] btrfs: inode: sink parameter @start and @len for check_data_csum()
  2020-10-28  0:57         ` Qu Wenruo
@ 2020-10-29 19:38           ` David Sterba
  0 siblings, 0 replies; 97+ messages in thread
From: David Sterba @ 2020-10-29 19:38 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, linux-btrfs

On Wed, Oct 28, 2020 at 08:57:07AM +0800, Qu Wenruo wrote:
> On 2020/10/28 7:17 AM, David Sterba wrote:
> > On Tue, Oct 27, 2020 at 08:50:15AM +0800, Qu Wenruo wrote:
>> On 2020/10/27 8:13 AM, David Sterba wrote:
> >>> On Wed, Oct 21, 2020 at 02:24:54PM +0800, Qu Wenruo wrote:
> >>>> +/*
> >>>> + * Verify the checksum of one sector of uncompressed data.
> >>>> + *
> >>>> + * @inode:	The inode.
> >>>> + * @io_bio:	The btrfs_io_bio which contains the csum.
> >>>> + * @icsum:	The csum offset (by number of sectors).
> >>>
> >>> This is not true, it's the index to the checksum array, where size of
> >>> the element is fs_info::csum_size. The offset can be calculated but it's
> >>> not the thing that's passed as argument.
> > 
> >> Isn't the offset by sectors the same?
> > 
> > Offset by sectors reads as something expressed in sector-sized units.
> >>
> >> If it's 1, it means we need to skip 1 csum which is in csum_size.
> > 
> > Yes, so you see the difference sector vs csum_size. I understand what
> > you meant by that but reading the comment without going to the code can
> > confuse somebody.
> 
> Any better naming alternative for that?
> 
> Or maybe I can refactor the function by passing the current
> file_offset into it, and let check_data_csum() calculate the csum
> offset by itself?

It was only the parameter description that was a bit confusing, no need
to change anything else here.


* Re: [PATCH v4 42/68] btrfs: allow RO mount of 4K sector size fs on 64K page system
  2020-10-21  6:25 ` [PATCH v4 42/68] btrfs: allow RO mount of 4K sector size fs on 64K page system Qu Wenruo
@ 2020-10-29 20:11   ` David Sterba
  2020-10-29 23:34   ` Michał Mirosław
  1 sibling, 0 replies; 97+ messages in thread
From: David Sterba @ 2020-10-29 20:11 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Wed, Oct 21, 2020 at 02:25:28PM +0800, Qu Wenruo wrote:
> This adds the basic RO mount ability for 4K sector size on 64K page
> system.
> 
> Currently we only plan to support 4K and 64K page system.

This should cover the most common page sizes, though there's still SPARC
with 8K pages and there were people reporting some problems in the past.
There are more arches with other page sizes but I don't think they're
actively used.


* Re: [PATCH v4 42/68] btrfs: allow RO mount of 4K sector size fs on 64K page system
  2020-10-21  6:25 ` [PATCH v4 42/68] btrfs: allow RO mount of 4K sector size fs on 64K page system Qu Wenruo
  2020-10-29 20:11   ` David Sterba
@ 2020-10-29 23:34   ` Michał Mirosław
  2020-10-29 23:56     ` Qu Wenruo
  1 sibling, 1 reply; 97+ messages in thread
From: Michał Mirosław @ 2020-10-29 23:34 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Wed, Oct 21, 2020 at 02:25:28PM +0800, Qu Wenruo wrote:
> This adds the basic RO mount ability for 4K sector size on 64K page
> system.
> 
> Currently we only plan to support 4K and 64K page system.
[...]

Why this restriction? I briefly looked at this patch and some of the
previous and it looks like the code doesn't really care about anything
more than order(PAGE_SIZE) >= order(sectorsize).

Best Regards,
Michał Mirosław


* Re: [PATCH v4 42/68] btrfs: allow RO mount of 4K sector size fs on 64K page system
  2020-10-29 23:34   ` Michał Mirosław
@ 2020-10-29 23:56     ` Qu Wenruo
  0 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-10-29 23:56 UTC (permalink / raw)
  To: Michał Mirosław, Qu Wenruo; +Cc: linux-btrfs





On 2020/10/30 7:34 AM, Michał Mirosław wrote:
> On Wed, Oct 21, 2020 at 02:25:28PM +0800, Qu Wenruo wrote:
>> This adds the basic RO mount ability for 4K sector size on 64K page
>> system.
>>
>> Currently we only plan to support 4K and 64K page system.
> [...]
> 
> Why this restriction? I briefly looked at this patch and some of the
> previous and it looks like the code doesn't really care about anything
> more than order(PAGE_SIZE) >= order(sectorsize).

The restriction comes from the metadata operations.

Currently, for the subpage case, we expect one page to contain all the
subpage tree blocks.

For page sizes like 32K, 16K or 8K, a single tree block may need more
than one page. That needs extra work to handle both an unaligned tree
block start (already done for the current subpage support) and the
multi-page case (so far only handled for sectorsize == PAGE_SIZE, not
for subpage).

Another problem is testing.

Currently the only (cheap) way to test 64K page size is using an ARM64 SBC.
(My current setup is an RK3399 board, since it's cheap and has full x4
lane NVMe support.)

For 8K or 32K, I can't find a widely available device to test with.

Anyway, my current focus is to add balance support and remove all small
bugs exposed so far.
We may support 16K in the future, but only after we have finished
current 64K subpage support.

Thanks,
Qu

> 
> Best Regards,
> Michał Mirosław
> 




* Re: [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size
  2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
                   ` (68 preceding siblings ...)
  2020-10-21 11:22 ` [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size David Sterba
@ 2020-11-02 14:56 ` David Sterba
  2020-11-03  0:06   ` Qu Wenruo
  69 siblings, 1 reply; 97+ messages in thread
From: David Sterba @ 2020-11-02 14:56 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Wed, Oct 21, 2020 at 02:24:46PM +0800, Qu Wenruo wrote:
> Patches can be fetched from github:
> https://github.com/adam900710/linux/tree/subpage_data_fullpage_write
> 
> Qu Wenruo (67):

So far I've merged

      btrfs: extent_io: fix the comment on lock_extent_buffer_for_io()
      btrfs: extent_io: update the comment for find_first_extent_bit()
      btrfs: extent_io: sink the failed_start parameter to set_extent_bit()
      btrfs: disk-io: replace fs_info and private_data with inode for btrfs_wq_submit_bio()
      btrfs: inode: sink parameter start and len to check_data_csum()
      btrfs: extent_io: rename pages_locked in process_pages_contig()
      btrfs: extent_io: only require sector size alignment for page read
      btrfs: extent_io: rename page_size to io_size in submit_extent_page()

to misc-next.  This is from the first 20, the easy and safe changes.
There are a few more that need more explanation or another look.


* Re: [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size
  2020-11-02 14:56 ` David Sterba
@ 2020-11-03  0:06   ` Qu Wenruo
  0 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-11-03  0:06 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs





On 2020/11/2 10:56 PM, David Sterba wrote:
> On Wed, Oct 21, 2020 at 02:24:46PM +0800, Qu Wenruo wrote:
>> Patches can be fetched from github:
>> https://github.com/adam900710/linux/tree/subpage_data_fullpage_write
>>
>> Qu Wenruo (67):
> 
> So far I've merged
> 
>       btrfs: extent_io: fix the comment on lock_extent_buffer_for_io()
>       btrfs: extent_io: update the comment for find_first_extent_bit()
>       btrfs: extent_io: sink the failed_start parameter to set_extent_bit()
>       btrfs: disk-io: replace fs_info and private_data with inode for btrfs_wq_submit_bio()
>       btrfs: inode: sink parameter start and len to check_data_csum()
>       btrfs: extent_io: rename pages_locked in process_pages_contig()
>       btrfs: extent_io: only require sector size alignment for page read
>       btrfs: extent_io: rename page_size to io_size in submit_extent_page()
> 
> to misc-next.  This is from the first 20, the easy and safe changes.
> There are a few more that need more explanation or another look.
> 
That's great.

BTW, for the next update, I should rebase all the patches onto current
misc-next, right?
Especially to take advantage of things like sectorsize_bits.

BTW, for the next round of patches, should I send everything in one huge
batch, or just the safe refactors (with the comments addressed)?

Thanks,
Qu




* Re: [PATCH v4 01/68] btrfs: extent-io-tests: remove invalid tests
  2020-10-27  0:44     ` Qu Wenruo
@ 2020-11-03  6:07       ` Qu Wenruo
  0 siblings, 0 replies; 97+ messages in thread
From: Qu Wenruo @ 2020-11-03  6:07 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs





On 2020/10/27 8:44 AM, Qu Wenruo wrote:
> 
> 
>> On 2020/10/27 7:26 AM, David Sterba wrote:
>> On Wed, Oct 21, 2020 at 02:24:47PM +0800, Qu Wenruo wrote:
>>> In extent-io-test, there are two invalid tests:
>>> - Invalid nodesize for test_eb_bitmaps()
>>>   Instead of the sectorsize and nodesize combination passed in, we're
>>>   always using hand-crafted nodesize.
>>>   Although it has some extra check for 64K page size, we can still hit
>>>   a case where PAGE_SIZE == 32K, then we got 128K nodesize which is
>>>   larger than max valid node size.
>>>
>>>   Thankfully most machines are either 4K or 64K page size, thus we
>>>   haven't yet hit such case.
>>>
>>> - Invalid extent buffer bytenr
>>>   For 64K page size, the only combination we're going to test is
>>>   sectorsize = nodesize = 64K.
>>>   In that case, we'll try to create an extent buffer with 32K bytenr,
>>>   which is not aligned to sectorsize thus invalid.
>>>
>>> This patch will fix both problems by:
>>> - Honor the sectorsize/nodesize combination
>>>   Now we won't bother to hand-craft a strange length and use it as
>>>   nodesize.
>>>
>>> - Use sectorsize as the 2nd run extent buffer start
>>>   This would test the case where extent buffer is aligned to sectorsize
>>>   but not always aligned to nodesize.
>>
>> The code has evolved since it was added in 0f3312295d3ce1d823 ("Btrfs:
>> add extent buffer bitmap sanity tests") and "page * 4" is intentional to
>> provide buffer where the shifted bitmap is tested. The logic has not
>> changed, only the ppc64 case was added.
>>
>> And I remember that tweaking this code tended to break on a real machine
>> so there are a few things that bother me:
>>
>> - the test does something and I'm not sure it's invalid (I think it's
>>   not)
> 
> Sector is the minimal unit that every tree block/data should follow (the
> only exception is superblock).
> Thus a sector starts in half of the sector size is definitely invalid.
> 
>> - test on a real 64k page machine is needed
> 
> Every time I inserted the btrfs kernel for my RK3399 board with 64K page
> size it's tested already.
> 
>> - you reduce the scope of the test to fewer combinations
> 
> Well, removing invalid cases would definitely lead to fewer combinations
> anyway.
> 
>>
>> If there are combinations that would make it hard for the subpage then
>> it would be better to add it as an exception but otherwise the main
>> usecase is for 4K page and this allows more combinations to test.
>>
> No, there isn't anything special related to subpage.
> 
> Just the things related to "sector" are broken in this test cases.

Since all the later subpage refactoring will build on this patch, I just
want to reiterate here:

Any extent buffer whose bytenr is not sector aligned is invalid.

This applies not only to subpage, but also to regular sector sizes.

E.g. for 4K sector size, 4K node size, an eb starting at bytenr 1M + 2K
is definitely corrupted.

This patch is essential, especially since the later patch "btrfs:
extent_io: calculate inline extent buffer page size based on page size"
will shrink extent_buffer::pages[] to the bare minimum.

Anyway, I will add an extra comment to explain the importance of this patch.

Thanks,
Qu

> 
> Thanks,
> Qu
> 




end of thread, other threads:[~2020-11-03  6:07 UTC | newest]

Thread overview: 97+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-21  6:24 [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size Qu Wenruo
2020-10-21  6:24 ` [PATCH v4 01/68] btrfs: extent-io-tests: remove invalid tests Qu Wenruo
2020-10-26 23:26   ` David Sterba
2020-10-27  0:44     ` Qu Wenruo
2020-11-03  6:07       ` Qu Wenruo
2020-10-21  6:24 ` [PATCH v4 02/68] btrfs: use iosize while reading compressed pages Qu Wenruo
2020-10-21  6:24 ` [PATCH v4 03/68] btrfs: extent_io: fix the comment on lock_extent_buffer_for_io() Qu Wenruo
2020-10-21  6:24 ` [PATCH v4 04/68] btrfs: extent_io: update the comment for find_first_extent_bit() Qu Wenruo
2020-10-21  6:24 ` [PATCH v4 05/68] btrfs: extent_io: sink the @failed_start parameter for set_extent_bit() Qu Wenruo
2020-10-21  6:24 ` [PATCH v4 06/68] btrfs: make btree inode io_tree has its special owner Qu Wenruo
2020-10-21  6:24 ` [PATCH v4 07/68] btrfs: disk-io: replace @fs_info and @private_data with @inode for btrfs_wq_submit_bio() Qu Wenruo
2020-10-21 22:00   ` Goldwyn Rodrigues
2020-10-21  6:24 ` [PATCH v4 08/68] btrfs: inode: sink parameter @start and @len for check_data_csum() Qu Wenruo
2020-10-21 22:11   ` Goldwyn Rodrigues
2020-10-27  0:13   ` David Sterba
2020-10-27  0:50     ` Qu Wenruo
2020-10-27 23:17       ` David Sterba
2020-10-28  0:57         ` Qu Wenruo
2020-10-29 19:38           ` David Sterba
2020-10-21  6:24 ` [PATCH v4 09/68] btrfs: extent_io: unexport extent_invalidatepage() Qu Wenruo
2020-10-27  0:24   ` David Sterba
2020-10-21  6:24 ` [PATCH v4 10/68] btrfs: extent_io: remove the forward declaration and rename __process_pages_contig Qu Wenruo
2020-10-27  0:28   ` David Sterba
2020-10-27  0:50     ` Qu Wenruo
2020-10-27 23:25       ` David Sterba
2020-10-21  6:24 ` [PATCH v4 11/68] btrfs: extent_io: rename pages_locked in process_pages_contig() Qu Wenruo
2020-10-21  6:24 ` [PATCH v4 12/68] btrfs: extent_io: only require sector size alignment for page read Qu Wenruo
2020-10-21  6:24 ` [PATCH v4 13/68] btrfs: extent_io: remove the extent_start/extent_len for end_bio_extent_readpage() Qu Wenruo
2020-10-27 10:29   ` David Sterba
2020-10-27 12:15     ` Qu Wenruo
2020-10-27 23:31       ` David Sterba
2020-10-21  6:25 ` [PATCH v4 14/68] btrfs: extent_io: integrate page status update into endio_readpage_release_extent() Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 15/68] btrfs: extent_io: rename page_size to io_size in submit_extent_page() Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 16/68] btrfs: extent_io: add assert_spin_locked() for attach_extent_buffer_page() Qu Wenruo
2020-10-27 10:43   ` David Sterba
2020-10-21  6:25 ` [PATCH v4 17/68] btrfs: extent_io: extract the btree page submission code into its own helper function Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 18/68] btrfs: extent_io: calculate inline extent buffer page size based on page size Qu Wenruo
2020-10-27 11:16   ` David Sterba
2020-10-27 11:20     ` David Sterba
2020-10-21  6:25 ` [PATCH v4 19/68] btrfs: extent_io: make btrfs_fs_info::buffer_radix to take sector size devided values Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 20/68] btrfs: extent_io: sink less common parameters for __set_extent_bit() Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 21/68] btrfs: extent_io: sink less common parameters for __clear_extent_bit() Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 22/68] btrfs: disk_io: grab fs_info from extent_buffer::fs_info directly for btrfs_mark_buffer_dirty() Qu Wenruo
2020-10-27 15:43   ` Goldwyn Rodrigues
2020-10-21  6:25 ` [PATCH v4 23/68] btrfs: disk-io: make csum_tree_block() handle sectorsize smaller than page size Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 24/68] btrfs: disk-io: extract the extent buffer verification from btree_readpage_end_io_hook() Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 25/68] btrfs: disk-io: accept bvec directly for csum_dirty_buffer() Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 26/68] btrfs: inode: make btrfs_readpage_end_io_hook() follow sector size Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 27/68] btrfs: introduce a helper to determine if the sectorsize is smaller than PAGE_SIZE Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 28/68] btrfs: extent_io: allow find_first_extent_bit() to find a range with exact bits match Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 29/68] btrfs: extent_io: don't allow tree block to cross page boundary for subpage support Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 30/68] btrfs: extent_io: update num_extent_pages() to support subpage sized extent buffer Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 31/68] btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 32/68] btrfs: disk-io: only clear EXTENT_LOCK bit for extent_invalidatepage() Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 33/68] btrfs: extent-io: make type of extent_state::state to be at least 32 bits Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 34/68] btrfs: extent_io: use extent_io_tree to handle subpage extent buffer allocation Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 35/68] btrfs: extent_io: make set/clear_extent_buffer_uptodate() to support subpage size Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 36/68] btrfs: extent_io: make the assert test on page uptodate able to handle subpage Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 37/68] btrfs: extent_io: implement subpage metadata read and its endio function Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 38/68] btrfs: extent_io: implement try_release_extent_buffer() for subpage metadata support Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 39/68] btrfs: extent_io: extra the core of test_range_bit() into test_range_bit_nolock() Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 40/68] btrfs: extent_io: introduce EXTENT_READ_SUBMITTED to handle subpage data read Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 41/68] btrfs: set btree inode track_uptodate for subpage support Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 42/68] btrfs: allow RO mount of 4K sector size fs on 64K page system Qu Wenruo
2020-10-29 20:11   ` David Sterba
2020-10-29 23:34   ` Michał Mirosław
2020-10-29 23:56     ` Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 43/68] btrfs: disk-io: allow btree_set_page_dirty() to do more sanity check on subpage metadata Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 44/68] btrfs: disk-io: support subpage metadata csum calculation at write time Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 45/68] btrfs: extent_io: prevent extent_state from being merged for btree io tree Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 46/68] btrfs: extent_io: make set_extent_buffer_dirty() to support subpage sized metadata Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 47/68] btrfs: extent_io: add subpage support for clear_extent_buffer_dirty() Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 48/68] btrfs: extent_io: make set_btree_ioerr() accept extent buffer Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 49/68] btrfs: extent_io: introduce write_one_subpage_eb() function Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 50/68] btrfs: extent_io: make lock_extent_buffer_for_io() subpage compatible Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 51/68] btrfs: extent_io: introduce submit_btree_subpage() to submit a page for subpage metadata write Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 52/68] btrfs: extent_io: introduce end_bio_subpage_eb_writepage() function Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 53/68] btrfs: inode: make can_nocow_extent() check only return 1 if the range is no smaller than PAGE_SIZE Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 54/68] btrfs: file: calculate reserve space based on PAGE_SIZE for buffered write Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 55/68] btrfs: file: make hole punching page aligned for subpage Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 56/68] btrfs: file: make btrfs_dirty_pages() follow page size to mark extent io tree Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 57/68] btrfs: file: make btrfs_file_write_iter() to be page aligned Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 58/68] btrfs: output extra info for space info update underflow Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 59/68] btrfs: delalloc-space: make data space reservation to be page aligned Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 60/68] btrfs: scrub: allow scrub to work with subpage sectorsize Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 61/68] btrfs: inode: make btrfs_truncate_block() to do page alignment Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 62/68] btrfs: file: make hole punch and zero range to be page aligned Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 63/68] btrfs: file: make btrfs_fallocate() to use PAGE_SIZE as blocksize Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 64/68] btrfs: inode: always mark the full page range delalloc for btrfs_page_mkwrite() Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 65/68] btrfs: inode: require page alignement for direct io Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 66/68] btrfs: inode: only do NOCOW write for page aligned extent Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 67/68] btrfs: reflink: do full page writeback for reflink prepare Qu Wenruo
2020-10-21  6:25 ` [PATCH v4 68/68] btrfs: support subpage read write for test Qu Wenruo
2020-10-21 11:22 ` [PATCH v4 00/68] btrfs: add basic rw support for subpage sector size David Sterba
2020-10-21 11:50   ` Qu Wenruo
2020-11-02 14:56 ` David Sterba
2020-11-03  0:06   ` Qu Wenruo
