All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 00/22] btrfs: add read-only support for subpage sector size
@ 2021-01-06  1:01 Qu Wenruo
  2021-01-06  1:01 ` [PATCH v3 01/22] btrfs: extent_io: rename @offset parameter to @disk_bytenr for submit_extent_page() Qu Wenruo
                   ` (22 more replies)
  0 siblings, 23 replies; 37+ messages in thread
From: Qu Wenruo @ 2021-01-06  1:01 UTC (permalink / raw)
  To: linux-btrfs

Patches can be fetched from github:
https://github.com/adam900710/linux/tree/subpage
Currently the branch also contains partial RW data support (still some
out-of-sync subpage data page status).

Great thanks to David for his effort reviewing and merging the
preparation patches into misc-next.
Now all previously submitted preparation patches are already in
misc-next.

=== What works ===

Just from the patchset:
- Data read
  Both regular and compressed data, with csum check.

- Metadata read

This means, with these patchset, 64K page systems can at least mount
btrfs with 4K sector size.

In the subpage branch
- Metadata read write and balance
  Not yet full tested due to data write still has bugs need to be
  solved.
  But considering that metadata operations from previous iteration
  is mostly untouched, metadata read write should be pretty stable.

- Baisc data balance
  This is new.

- Data read write
  Only uncompressed data writes. Fsstress can survive.
  But still very rare data csum error, which only happens in bookend data
  extents (part of the data extent which is not referred by any file).
  This looks like something related to reflink/invalidate page and cow
  fixup.
  Still invetigating.

=== Needs feedback ===
The following design needs extra comments:

- u16 bitmap
  As David mentioned, using u16 as bit map is not the fastest way.
  That's also why current bitmap code requires unsigned long (u32) as
  minimal unit.
  But using bitmap directly would double the memory usage.
  Thus the best way is to pack two u16 bitmap into one u32 bitmap, but
  that still needs extra investigation to find better practice.

  Anyway the skeleton should be pretty simple to expand.

- Separate handling for subpage metadata
  Currently the metadata read and (later write path) handles subpage
  metadata differently. Mostly due to the page locking must be skipped
  for subpage metadata.
  I tried several times to use as many common code as possible, but
  every time I ended up reverting back to current code.

  Thankfully, for data handling we will use the same common code.

- Incompatible subpage strcuture against iomap_page
  In btrfs we need extra bits than iomap_page.
  This is due to we need sector perfect write for data balance.
  E.g. if only one 4K sector is dirty in a 64K page, we should only
  write that dirty 4K back to disk, not the full 64K page.

  As data balance requires the new data extents to have exactly the
  same size as the original ones.
  This means, unless iomap_page get extra bits like what we're doing in
  btrfs for dirty, we can't merge the btrfs_subpage with iomap_page.

=== Patchset structure ===
Patch 01~03:	Existing preparation patches.
		Mostly readability related patches found during RW
		development
Patch 04~05:	New preparation patches.
		Mostly related to __process_pages_contig().
Patch 06~10:	Subpage handling for extent buffer allocation and
		freeing
Patch 11~22:	Subpage handling for extent buffer read path

=== Changelog ===
v1:
- Separate the main implementation from previous huge patchset
  Huge patchset doesn't make much sense.

- Use bitmap implementation
  Now page::private will be a pointer to btrfs_subpage structure, which
  contains bitmaps for various page status.

v2:
- Use page::private as btrfs_subpage for extra info
  This replace old extent io tree based solution, which reduces latency
  and don't require memory allocation for its operations.

- Cherry-pick new preparation patches from RW development
  Those new preparation patches improves the readability by their own.

v3:
- Make dummy extent buffer to follow the same subpage accessors
  Fsstress exposed several ASSERT() for dummy extent buffers.
  It turns out we need to make dummy extent buffer to own the same
  btrfs_subpage structure to make eb accessors to work properly

- Two new small __process_pages_contig() related preparation patches
  One to make __process_pages_contig() to enhance the error handling
  path for locked_page, one to merge one macro.

- Extent buffers refs count update
  Except try_release_extent_buffer(), all other eb uses will try to
  increase the ref count of the eb.
  For try_release_extent_buffer(), the eb refs check will happen inside
  the rcu critical section to avoid eb being freed.

- Comment updates
  Addressing the comments from the mail list.

Qu Wenruo (22):
  btrfs: extent_io: rename @offset parameter to @disk_bytenr for
    submit_extent_page()
  btrfs: extent_io: refactor __extent_writepage_io() to improve
    readability
  btrfs: file: update comment for btrfs_dirty_pages()
  btrfs: extent_io: update locked page dirty/writeback/error bits in
    __process_pages_contig()
  btrfs: extent_io: merge PAGE_CLEAR_DIRTY and PAGE_SET_WRITEBACK into
    PAGE_START_WRITEBACK
  btrfs: extent_io: introduce a helper to grab an existing extent buffer
    from a page
  btrfs: extent_io: introduce the skeleton of btrfs_subpage structure
  btrfs: extent_io: make attach_extent_buffer_page() to handle subpage
    case
  btrfs: extent_io: make grab_extent_buffer_from_page() to handle
    subpage case
  btrfs: extent_io: support subpage for extent buffer page release
  btrfs: extent_io: attach private to dummy extent buffer pages
  btrfs: subpage: introduce helper for subpage uptodate status
  btrfs: subpage: introduce helper for subpage error status
  btrfs: extent_io: make set/clear_extent_buffer_uptodate() to support
    subpage size
  btrfs: extent_io: make btrfs_clone_extent_buffer() to be subpage
    compatible
  btrfs: extent_io: implement try_release_extent_buffer() for subpage
    metadata support
  btrfs: extent_io: introduce read_extent_buffer_subpage()
  btrfs: extent_io: make endio_readpage_update_page_status() to handle
    subpage case
  btrfs: disk-io: introduce subpage metadata validation check
  btrfs: introduce btrfs_subpage for data inodes
  btrfs: integrate page status update for read path into
    begin/end_page_read()
  btrfs: allow RO mount of 4K sector size fs on 64K page system

 fs/btrfs/Makefile           |   3 +-
 fs/btrfs/compression.c      |  10 +-
 fs/btrfs/disk-io.c          |  82 +++++-
 fs/btrfs/extent_io.c        | 534 ++++++++++++++++++++++++++++--------
 fs/btrfs/extent_io.h        |  15 +-
 fs/btrfs/file.c             |  35 +--
 fs/btrfs/free-space-cache.c |  15 +-
 fs/btrfs/inode.c            |  40 ++-
 fs/btrfs/ioctl.c            |   5 +-
 fs/btrfs/reflink.c          |   5 +-
 fs/btrfs/relocation.c       |  12 +-
 fs/btrfs/subpage.c          |  39 +++
 fs/btrfs/subpage.h          | 256 +++++++++++++++++
 fs/btrfs/super.c            |   7 +
 14 files changed, 876 insertions(+), 182 deletions(-)
 create mode 100644 fs/btrfs/subpage.c
 create mode 100644 fs/btrfs/subpage.h

-- 
2.29.2


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v3 01/22] btrfs: extent_io: rename @offset parameter to @disk_bytenr for submit_extent_page()
  2021-01-06  1:01 [PATCH v3 00/22] btrfs: add read-only support for subpage sector size Qu Wenruo
@ 2021-01-06  1:01 ` Qu Wenruo
  2021-01-06  1:01 ` [PATCH v3 02/22] btrfs: extent_io: refactor __extent_writepage_io() to improve readability Qu Wenruo
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 37+ messages in thread
From: Qu Wenruo @ 2021-01-06  1:01 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Josef Bacik

The parameter @offset can't be more confusing.
In fact that parameter is the disk bytenr for metadata/data.

Rename it to @disk_bytenr and update the comment to reduce confusion.

Since we're here, also rename all @offset passed into
submit_extent_page() to @disk_bytenr.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c9cee458e001..99e04fc731a3 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3062,10 +3062,10 @@ struct bio *btrfs_bio_clone_partial(struct bio *orig, int offset, int size)
  * @opf:	bio REQ_OP_* and REQ_* flags as one value
  * @wbc:	optional writeback control for io accounting
  * @page:	page to add to the bio
+ * @disk_bytenr:the logical bytenr where the write will be
+ * @size:	portion of page that we want to write
  * @pg_offset:	offset of the new bio or to check whether we are adding
  *              a contiguous page to the previous one
- * @size:	portion of page that we want to write
- * @offset:	starting offset in the page
  * @bio_ret:	must be valid pointer, newly allocated bio will be stored there
  * @end_io_func:     end_io callback for new bio
  * @mirror_num:	     desired mirror to read/write
@@ -3074,7 +3074,7 @@ struct bio *btrfs_bio_clone_partial(struct bio *orig, int offset, int size)
  */
 static int submit_extent_page(unsigned int opf,
 			      struct writeback_control *wbc,
-			      struct page *page, u64 offset,
+			      struct page *page, u64 disk_bytenr,
 			      size_t size, unsigned long pg_offset,
 			      struct bio **bio_ret,
 			      bio_end_io_t end_io_func,
@@ -3086,7 +3086,7 @@ static int submit_extent_page(unsigned int opf,
 	int ret = 0;
 	struct bio *bio;
 	size_t io_size = min_t(size_t, size, PAGE_SIZE);
-	sector_t sector = offset >> 9;
+	sector_t sector = disk_bytenr >> 9;
 	struct extent_io_tree *tree = &BTRFS_I(page->mapping->host)->io_tree;
 
 	ASSERT(bio_ret);
@@ -3120,7 +3120,7 @@ static int submit_extent_page(unsigned int opf,
 		}
 	}
 
-	bio = btrfs_bio_alloc(offset);
+	bio = btrfs_bio_alloc(disk_bytenr);
 	bio_add_page(bio, page, io_size, pg_offset);
 	bio->bi_end_io = end_io_func;
 	bio->bi_private = tree;
@@ -3242,7 +3242,7 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 	}
 	while (cur <= end) {
 		bool force_bio_submit = false;
-		u64 offset;
+		u64 disk_bytenr;
 
 		if (cur >= last_byte) {
 			char *userpage;
@@ -3280,9 +3280,9 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 		cur_end = min(extent_map_end(em) - 1, end);
 		iosize = ALIGN(iosize, blocksize);
 		if (this_bio_flag & EXTENT_BIO_COMPRESSED)
-			offset = em->block_start;
+			disk_bytenr = em->block_start;
 		else
-			offset = em->block_start + extent_offset;
+			disk_bytenr = em->block_start + extent_offset;
 		block_start = em->block_start;
 		if (test_bit(EXTENT_FLAG_PREALLOC, &em->flags))
 			block_start = EXTENT_MAP_HOLE;
@@ -3371,7 +3371,7 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 		}
 
 		ret = submit_extent_page(REQ_OP_READ | read_flags, NULL,
-					 page, offset, iosize,
+					 page, disk_bytenr, iosize,
 					 pg_offset, bio,
 					 end_bio_extent_readpage, 0,
 					 *bio_flags,
@@ -3548,8 +3548,8 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 	blocksize = inode->vfs_inode.i_sb->s_blocksize;
 
 	while (cur <= end) {
+		u64 disk_bytenr;
 		u64 em_end;
-		u64 offset;
 
 		if (cur >= i_size) {
 			btrfs_writepage_endio_finish_ordered(page, cur,
@@ -3569,7 +3569,7 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 		BUG_ON(end < cur);
 		iosize = min(em_end - cur, end - cur + 1);
 		iosize = ALIGN(iosize, blocksize);
-		offset = em->block_start + extent_offset;
+		disk_bytenr = em->block_start + extent_offset;
 		block_start = em->block_start;
 		compressed = test_bit(EXTENT_FLAG_COMPRESSED, &em->flags);
 		free_extent_map(em);
@@ -3599,7 +3599,7 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 		}
 
 		ret = submit_extent_page(REQ_OP_WRITE | write_flags, wbc,
-					 page, offset, iosize, pg_offset,
+					 page, disk_bytenr, iosize, pg_offset,
 					 &epd->bio,
 					 end_bio_extent_writepage,
 					 0, 0, 0, false);
@@ -3923,7 +3923,7 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
 			struct writeback_control *wbc,
 			struct extent_page_data *epd)
 {
-	u64 offset = eb->start;
+	u64 disk_bytenr = eb->start;
 	u32 nritems;
 	int i, num_pages;
 	unsigned long start, end;
@@ -3956,7 +3956,7 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
 		clear_page_dirty_for_io(p);
 		set_page_writeback(p);
 		ret = submit_extent_page(REQ_OP_WRITE | write_flags, wbc,
-					 p, offset, PAGE_SIZE, 0,
+					 p, disk_bytenr, PAGE_SIZE, 0,
 					 &epd->bio,
 					 end_bio_extent_buffer_writepage,
 					 0, 0, 0, false);
@@ -3969,7 +3969,7 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
 			ret = -EIO;
 			break;
 		}
-		offset += PAGE_SIZE;
+		disk_bytenr += PAGE_SIZE;
 		update_nr_written(wbc, 1);
 		unlock_page(p);
 	}
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 02/22] btrfs: extent_io: refactor __extent_writepage_io() to improve readability
  2021-01-06  1:01 [PATCH v3 00/22] btrfs: add read-only support for subpage sector size Qu Wenruo
  2021-01-06  1:01 ` [PATCH v3 01/22] btrfs: extent_io: rename @offset parameter to @disk_bytenr for submit_extent_page() Qu Wenruo
@ 2021-01-06  1:01 ` Qu Wenruo
  2021-01-06  1:01 ` [PATCH v3 03/22] btrfs: file: update comment for btrfs_dirty_pages() Qu Wenruo
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 37+ messages in thread
From: Qu Wenruo @ 2021-01-06  1:01 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Josef Bacik

The refactor involves the following modifications:
- iosize alignment
  In fact we don't really need to manually do alignment at all.
  All extent maps should already be aligned, thus basic ASSERT() check
  would be enough.

- redundant variables
  We have extra variable like blocksize/pg_offset/end.
  They are all unnecessary.

  @blocksize can be replaced by sectorsize size directly, and it's only
  used to verify the em start/size is aligned.

  @pg_offset can be easily calculated using @cur and page_offset(page).

  @end is just assigned from @page_end and never modified, use "start +
   PAGE_SIZE - 1" directly and remove @page_end.

- remove some BUG_ON()s
  The BUG_ON()s are for extent map, which we have tree-checker to check
  on-disk extent data item and runtime check.
  ASSERT() should be enough.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 38 +++++++++++++++++---------------------
 1 file changed, 17 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 99e04fc731a3..6f156ce501a1 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3513,23 +3513,20 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 				 unsigned long nr_written,
 				 int *nr_ret)
 {
+	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 	struct extent_io_tree *tree = &inode->io_tree;
 	u64 start = page_offset(page);
-	u64 page_end = start + PAGE_SIZE - 1;
-	u64 end;
+	u64 end = start + PAGE_SIZE - 1;
 	u64 cur = start;
 	u64 extent_offset;
 	u64 block_start;
-	u64 iosize;
 	struct extent_map *em;
-	size_t pg_offset = 0;
-	size_t blocksize;
 	int ret = 0;
 	int nr = 0;
 	const unsigned int write_flags = wbc_to_write_flags(wbc);
 	bool compressed;
 
-	ret = btrfs_writepage_cow_fixup(page, start, page_end);
+	ret = btrfs_writepage_cow_fixup(page, start, end);
 	if (ret) {
 		/* Fixup worker will requeue */
 		redirty_page_for_writepage(wbc, page);
@@ -3544,16 +3541,13 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 	 */
 	update_nr_written(wbc, nr_written + 1);
 
-	end = page_end;
-	blocksize = inode->vfs_inode.i_sb->s_blocksize;
-
 	while (cur <= end) {
 		u64 disk_bytenr;
 		u64 em_end;
+		u32 iosize;
 
 		if (cur >= i_size) {
-			btrfs_writepage_endio_finish_ordered(page, cur,
-							     page_end, 1);
+			btrfs_writepage_endio_finish_ordered(page, cur, end, 1);
 			break;
 		}
 		em = btrfs_get_extent(inode, NULL, 0, cur, end - cur + 1);
@@ -3565,16 +3559,20 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 
 		extent_offset = cur - em->start;
 		em_end = extent_map_end(em);
-		BUG_ON(em_end <= cur);
-		BUG_ON(end < cur);
-		iosize = min(em_end - cur, end - cur + 1);
-		iosize = ALIGN(iosize, blocksize);
-		disk_bytenr = em->block_start + extent_offset;
+		ASSERT(cur <= em_end);
+		ASSERT(cur < end);
+		ASSERT(IS_ALIGNED(em->start, fs_info->sectorsize));
+		ASSERT(IS_ALIGNED(em->len, fs_info->sectorsize));
 		block_start = em->block_start;
 		compressed = test_bit(EXTENT_FLAG_COMPRESSED, &em->flags);
+		disk_bytenr = em->block_start + extent_offset;
+
+		/* Note that em_end from extent_map_end() is exclusive */
+		iosize = min(em_end, end + 1) - cur;
 		free_extent_map(em);
 		em = NULL;
 
+
 		/*
 		 * compressed and inline extents are written through other
 		 * paths in the FS
@@ -3587,7 +3585,6 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 				btrfs_writepage_endio_finish_ordered(page, cur,
 							cur + iosize - 1, 1);
 			cur += iosize;
-			pg_offset += iosize;
 			continue;
 		}
 
@@ -3599,8 +3596,8 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 		}
 
 		ret = submit_extent_page(REQ_OP_WRITE | write_flags, wbc,
-					 page, disk_bytenr, iosize, pg_offset,
-					 &epd->bio,
+					 page, disk_bytenr, iosize,
+					 cur - page_offset(page), &epd->bio,
 					 end_bio_extent_writepage,
 					 0, 0, 0, false);
 		if (ret) {
@@ -3609,8 +3606,7 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 				end_page_writeback(page);
 		}
 
-		cur = cur + iosize;
-		pg_offset += iosize;
+		cur += iosize;
 		nr++;
 	}
 	*nr_ret = nr;
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 03/22] btrfs: file: update comment for btrfs_dirty_pages()
  2021-01-06  1:01 [PATCH v3 00/22] btrfs: add read-only support for subpage sector size Qu Wenruo
  2021-01-06  1:01 ` [PATCH v3 01/22] btrfs: extent_io: rename @offset parameter to @disk_bytenr for submit_extent_page() Qu Wenruo
  2021-01-06  1:01 ` [PATCH v3 02/22] btrfs: extent_io: refactor __extent_writepage_io() to improve readability Qu Wenruo
@ 2021-01-06  1:01 ` Qu Wenruo
  2021-01-06  1:01 ` [PATCH v3 04/22] btrfs: extent_io: update locked page dirty/writeback/error bits in __process_pages_contig() Qu Wenruo
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 37+ messages in thread
From: Qu Wenruo @ 2021-01-06  1:01 UTC (permalink / raw)
  To: linux-btrfs

The original comment is from the initial merge, which has several
problems:
- No holes check any more
- No inline decision is made

Update the out-of-date comment with more correct one.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/file.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index e65223e3510d..1602975ddb88 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -453,12 +453,11 @@ static void btrfs_drop_pages(struct page **pages, size_t num_pages)
 }
 
 /*
- * after copy_from_user, pages need to be dirtied and we need to make
- * sure holes are created between the current EOF and the start of
- * any next extents (if required).
- *
- * this also makes the decision about creating an inline extent vs
- * doing real data extents, marking pages dirty and delalloc as required.
+ * After btrfs_copy_from_user(), update the following things for delalloc:
+ * - Mark newly dirtied pages as DELALLOC in the io tree.
+ *   Used to advise which range is to be written back.
+ * - Marks modified pages as Uptodate/Dirty and not needing cowfixup
+ * - Update inode size for past EOF write.
  */
 int btrfs_dirty_pages(struct btrfs_inode *inode, struct page **pages,
 		      size_t num_pages, loff_t pos, size_t write_bytes,
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 04/22] btrfs: extent_io: update locked page dirty/writeback/error bits in __process_pages_contig()
  2021-01-06  1:01 [PATCH v3 00/22] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (2 preceding siblings ...)
  2021-01-06  1:01 ` [PATCH v3 03/22] btrfs: file: update comment for btrfs_dirty_pages() Qu Wenruo
@ 2021-01-06  1:01 ` Qu Wenruo
  2021-01-06  1:01 ` [PATCH v3 05/22] btrfs: extent_io: merge PAGE_CLEAR_DIRTY and PAGE_SET_WRITEBACK into PAGE_START_WRITEBACK Qu Wenruo
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 37+ messages in thread
From: Qu Wenruo @ 2021-01-06  1:01 UTC (permalink / raw)
  To: linux-btrfs

When __process_pages_contig() get called for
extent_clear_unlock_delalloc(), if we hit the locked page, only Private2
bit is updated, but dirty/writeback/error bits are all skiped.

There are several call sites call extent_clear_unlock_delalloc() with
@locked_page and PAGE_CLEAR_DIRTY/PAGE_SET_WRITEBACK/PAGE_END_WRITEBACK

- cow_file_range()
- run_delalloc_nocow()
- cow_file_range_async()
  All for their error handling branches.

For those call sites, since we skip the locked page for
dirty/error/writeback bit update, the locked page will still have its
dirty bit remaining.

Thankfully, since all those call sites can only be hit with various
serious errors, it's pretty hard to hit and shouldn't affect regular
btrfs operations.

But still, we shouldn't leave the locked_page with its
dirty/error/writeback bits untouched.

Fix this by only skipping lock/unlock page operations for locked_page.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 6f156ce501a1..098c68583ebb 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1970,11 +1970,6 @@ static int __process_pages_contig(struct address_space *mapping,
 			if (page_ops & PAGE_SET_PRIVATE2)
 				SetPagePrivate2(pages[i]);
 
-			if (locked_page && pages[i] == locked_page) {
-				put_page(pages[i]);
-				pages_processed++;
-				continue;
-			}
 			if (page_ops & PAGE_CLEAR_DIRTY)
 				clear_page_dirty_for_io(pages[i]);
 			if (page_ops & PAGE_SET_WRITEBACK)
@@ -1983,6 +1978,11 @@ static int __process_pages_contig(struct address_space *mapping,
 				SetPageError(pages[i]);
 			if (page_ops & PAGE_END_WRITEBACK)
 				end_page_writeback(pages[i]);
+			if (locked_page && pages[i] == locked_page) {
+				put_page(pages[i]);
+				pages_processed++;
+				continue;
+			}
 			if (page_ops & PAGE_UNLOCK)
 				unlock_page(pages[i]);
 			if (page_ops & PAGE_LOCK) {
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 05/22] btrfs: extent_io: merge PAGE_CLEAR_DIRTY and PAGE_SET_WRITEBACK into PAGE_START_WRITEBACK
  2021-01-06  1:01 [PATCH v3 00/22] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (3 preceding siblings ...)
  2021-01-06  1:01 ` [PATCH v3 04/22] btrfs: extent_io: update locked page dirty/writeback/error bits in __process_pages_contig() Qu Wenruo
@ 2021-01-06  1:01 ` Qu Wenruo
  2021-01-06  1:01 ` [PATCH v3 06/22] btrfs: extent_io: introduce a helper to grab an existing extent buffer from a page Qu Wenruo
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 37+ messages in thread
From: Qu Wenruo @ 2021-01-06  1:01 UTC (permalink / raw)
  To: linux-btrfs

PAGE_CLEAR_DIRTY and PAGE_SET_WRITEBACK are two macros used in
__process_pages_contig(), to inform the function to clear page dirty and
then set page writeback.

However page write back and dirty are two conflict status (at least for
sector size == PAGE_SIZE case), this means those two macros are always
called together.

This means we can merge PAGE_CLEAR_DIRTY and PAGE_SET_WRITEBACK, into
one macro, PAGE_START_WRITEBACK.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c |  4 ++--
 fs/btrfs/extent_io.h | 12 ++++++------
 fs/btrfs/inode.c     | 28 ++++++++++------------------
 3 files changed, 18 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 098c68583ebb..a2a9c2148e91 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1970,10 +1970,10 @@ static int __process_pages_contig(struct address_space *mapping,
 			if (page_ops & PAGE_SET_PRIVATE2)
 				SetPagePrivate2(pages[i]);
 
-			if (page_ops & PAGE_CLEAR_DIRTY)
+			if (page_ops & PAGE_START_WRITEBACK) {
 				clear_page_dirty_for_io(pages[i]);
-			if (page_ops & PAGE_SET_WRITEBACK)
 				set_page_writeback(pages[i]);
+			}
 			if (page_ops & PAGE_SET_ERROR)
 				SetPageError(pages[i]);
 			if (page_ops & PAGE_END_WRITEBACK)
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 19221095c635..bedf761a0300 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -35,12 +35,12 @@ enum {
 
 /* these are flags for __process_pages_contig */
 #define PAGE_UNLOCK		(1 << 0)
-#define PAGE_CLEAR_DIRTY	(1 << 1)
-#define PAGE_SET_WRITEBACK	(1 << 2)
-#define PAGE_END_WRITEBACK	(1 << 3)
-#define PAGE_SET_PRIVATE2	(1 << 4)
-#define PAGE_SET_ERROR		(1 << 5)
-#define PAGE_LOCK		(1 << 6)
+/* This one will clera page dirty and then set paeg writeback */
+#define PAGE_START_WRITEBACK	(1 << 1)
+#define PAGE_END_WRITEBACK	(1 << 2)
+#define PAGE_SET_PRIVATE2	(1 << 3)
+#define PAGE_SET_ERROR		(1 << 4)
+#define PAGE_LOCK		(1 << 5)
 
 /*
  * page->private values.  Every page that is controlled by the extent
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 9c2800fa80c6..ecc1f1f60b48 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -692,8 +692,7 @@ static noinline int compress_file_range(struct async_chunk *async_chunk)
 						     NULL,
 						     clear_flags,
 						     PAGE_UNLOCK |
-						     PAGE_CLEAR_DIRTY |
-						     PAGE_SET_WRITEBACK |
+						     PAGE_START_WRITEBACK |
 						     page_error_op |
 						     PAGE_END_WRITEBACK);
 
@@ -934,8 +933,7 @@ static noinline void submit_compressed_extents(struct async_chunk *async_chunk)
 				async_extent->start +
 				async_extent->ram_size - 1,
 				NULL, EXTENT_LOCKED | EXTENT_DELALLOC,
-				PAGE_UNLOCK | PAGE_CLEAR_DIRTY |
-				PAGE_SET_WRITEBACK);
+				PAGE_UNLOCK | PAGE_START_WRITEBACK);
 		if (btrfs_submit_compressed_write(inode, async_extent->start,
 				    async_extent->ram_size,
 				    ins.objectid,
@@ -971,9 +969,8 @@ static noinline void submit_compressed_extents(struct async_chunk *async_chunk)
 				     NULL, EXTENT_LOCKED | EXTENT_DELALLOC |
 				     EXTENT_DELALLOC_NEW |
 				     EXTENT_DEFRAG | EXTENT_DO_ACCOUNTING,
-				     PAGE_UNLOCK | PAGE_CLEAR_DIRTY |
-				     PAGE_SET_WRITEBACK | PAGE_END_WRITEBACK |
-				     PAGE_SET_ERROR);
+				     PAGE_UNLOCK | PAGE_START_WRITEBACK |
+				     PAGE_END_WRITEBACK | PAGE_SET_ERROR);
 	free_async_extent_pages(async_extent);
 	kfree(async_extent);
 	goto again;
@@ -1071,8 +1068,7 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
 				     EXTENT_LOCKED | EXTENT_DELALLOC |
 				     EXTENT_DELALLOC_NEW | EXTENT_DEFRAG |
 				     EXTENT_DO_ACCOUNTING, PAGE_UNLOCK |
-				     PAGE_CLEAR_DIRTY | PAGE_SET_WRITEBACK |
-				     PAGE_END_WRITEBACK);
+				     PAGE_START_WRITEBACK | PAGE_END_WRITEBACK);
 			*nr_written = *nr_written +
 			     (end - start + PAGE_SIZE) / PAGE_SIZE;
 			*page_started = 1;
@@ -1194,8 +1190,7 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
 out_unlock:
 	clear_bits = EXTENT_LOCKED | EXTENT_DELALLOC | EXTENT_DELALLOC_NEW |
 		EXTENT_DEFRAG | EXTENT_CLEAR_META_RESV;
-	page_ops = PAGE_UNLOCK | PAGE_CLEAR_DIRTY | PAGE_SET_WRITEBACK |
-		PAGE_END_WRITEBACK;
+	page_ops = PAGE_UNLOCK | PAGE_START_WRITEBACK | PAGE_END_WRITEBACK;
 	/*
 	 * If we reserved an extent for our delalloc range (or a subrange) and
 	 * failed to create the respective ordered extent, then it means that
@@ -1320,9 +1315,8 @@ static int cow_file_range_async(struct btrfs_inode *inode,
 		unsigned clear_bits = EXTENT_LOCKED | EXTENT_DELALLOC |
 			EXTENT_DELALLOC_NEW | EXTENT_DEFRAG |
 			EXTENT_DO_ACCOUNTING;
-		unsigned long page_ops = PAGE_UNLOCK | PAGE_CLEAR_DIRTY |
-			PAGE_SET_WRITEBACK | PAGE_END_WRITEBACK |
-			PAGE_SET_ERROR;
+		unsigned long page_ops = PAGE_UNLOCK | PAGE_START_WRITEBACK |
+			PAGE_END_WRITEBACK | PAGE_SET_ERROR;
 
 		extent_clear_unlock_delalloc(inode, start, end, locked_page,
 					     clear_bits, page_ops);
@@ -1519,8 +1513,7 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 					     EXTENT_LOCKED | EXTENT_DELALLOC |
 					     EXTENT_DO_ACCOUNTING |
 					     EXTENT_DEFRAG, PAGE_UNLOCK |
-					     PAGE_CLEAR_DIRTY |
-					     PAGE_SET_WRITEBACK |
+					     PAGE_START_WRITEBACK |
 					     PAGE_END_WRITEBACK);
 		return -ENOMEM;
 	}
@@ -1842,8 +1835,7 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
 					     locked_page, EXTENT_LOCKED |
 					     EXTENT_DELALLOC | EXTENT_DEFRAG |
 					     EXTENT_DO_ACCOUNTING, PAGE_UNLOCK |
-					     PAGE_CLEAR_DIRTY |
-					     PAGE_SET_WRITEBACK |
+					     PAGE_START_WRITEBACK |
 					     PAGE_END_WRITEBACK);
 	btrfs_free_path(path);
 	return ret;
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 06/22] btrfs: extent_io: introduce a helper to grab an existing extent buffer from a page
  2021-01-06  1:01 [PATCH v3 00/22] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (4 preceding siblings ...)
  2021-01-06  1:01 ` [PATCH v3 05/22] btrfs: extent_io: merge PAGE_CLEAR_DIRTY and PAGE_SET_WRITEBACK into PAGE_START_WRITEBACK Qu Wenruo
@ 2021-01-06  1:01 ` Qu Wenruo
  2021-01-12 15:08   ` David Sterba
  2021-01-06  1:01 ` [PATCH v3 07/22] btrfs: extent_io: introduce the skeleton of btrfs_subpage structure Qu Wenruo
                   ` (16 subsequent siblings)
  22 siblings, 1 reply; 37+ messages in thread
From: Qu Wenruo @ 2021-01-06  1:01 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Johannes Thumshirn

This patch will extract the code to grab an extent buffer from a page
into a helper, grab_extent_buffer_from_page().

This reduces one indent level, and provides the work place for later
expansion for subapge support.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 51 +++++++++++++++++++++++++++-----------------
 1 file changed, 31 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index a2a9c2148e91..d60f1837f8fb 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5248,6 +5248,30 @@ struct extent_buffer *alloc_test_extent_buffer(struct btrfs_fs_info *fs_info,
 }
 #endif
 
+static struct extent_buffer *grab_extent_buffer(struct page *page)
+{
+	struct extent_buffer *exists;
+
+	/* Page not yet attached to an extent buffer */
+	if (!PagePrivate(page))
+		return NULL;
+
+	/*
+	 * We could have already allocated an eb for this page
+	 * and attached one so lets see if we can get a ref on
+	 * the existing eb, and if we can we know it's good and
+	 * we can just return that one, else we know we can just
+	 * overwrite page->private.
+	 */
+	exists = (struct extent_buffer *)page->private;
+	if (atomic_inc_not_zero(&exists->refs))
+		return exists;
+
+	WARN_ON(PageDirty(page));
+	detach_page_private(page);
+	return NULL;
+}
+
 struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 					  u64 start, u64 owner_root, int level)
 {
@@ -5293,26 +5317,13 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		}
 
 		spin_lock(&mapping->private_lock);
-		if (PagePrivate(p)) {
-			/*
-			 * We could have already allocated an eb for this page
-			 * and attached one so lets see if we can get a ref on
-			 * the existing eb, and if we can we know it's good and
-			 * we can just return that one, else we know we can just
-			 * overwrite page->private.
-			 */
-			exists = (struct extent_buffer *)p->private;
-			if (atomic_inc_not_zero(&exists->refs)) {
-				spin_unlock(&mapping->private_lock);
-				unlock_page(p);
-				put_page(p);
-				mark_extent_buffer_accessed(exists, p);
-				goto free_eb;
-			}
-			exists = NULL;
-
-			WARN_ON(PageDirty(p));
-			detach_page_private(p);
+		exists = grab_extent_buffer(p);
+		if (exists) {
+			spin_unlock(&mapping->private_lock);
+			unlock_page(p);
+			put_page(p);
+			mark_extent_buffer_accessed(exists, p);
+			goto free_eb;
 		}
 		attach_extent_buffer_page(eb, p);
 		spin_unlock(&mapping->private_lock);
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 07/22] btrfs: extent_io: introduce the skeleton of btrfs_subpage structure
  2021-01-06  1:01 [PATCH v3 00/22] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (5 preceding siblings ...)
  2021-01-06  1:01 ` [PATCH v3 06/22] btrfs: extent_io: introduce a helper to grab an existing extent buffer from a page Qu Wenruo
@ 2021-01-06  1:01 ` Qu Wenruo
  2021-01-06  1:01 ` [PATCH v3 08/22] btrfs: extent_io: make attach_extent_buffer_page() to handle subpage case Qu Wenruo
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 37+ messages in thread
From: Qu Wenruo @ 2021-01-06  1:01 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Josef Bacik

For btrfs subpage support, we need a structure to record extra info for
the status of each sectors of a page.

This patch will introduce the skeleton structure for future btrfs
subpage support.
All subpage related code would go to subpage.[ch] to avoid populating
the existing code base.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/Makefile  |  3 ++-
 fs/btrfs/subpage.c | 39 +++++++++++++++++++++++++++++++++++++++
 fs/btrfs/subpage.h | 31 +++++++++++++++++++++++++++++++
 3 files changed, 72 insertions(+), 1 deletion(-)
 create mode 100644 fs/btrfs/subpage.c
 create mode 100644 fs/btrfs/subpage.h

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 9f1b1a88e317..942562e11456 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -11,7 +11,8 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
 	   compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \
 	   reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \
 	   uuid-tree.o props.o free-space-tree.o tree-checker.o space-info.o \
-	   block-rsv.o delalloc-space.o block-group.o discard.o reflink.o
+	   block-rsv.o delalloc-space.o block-group.o discard.o reflink.o \
+	   subpage.o
 
 btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
 btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
new file mode 100644
index 000000000000..c6ab32db3995
--- /dev/null
+++ b/fs/btrfs/subpage.c
@@ -0,0 +1,39 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "subpage.h"
+
+int btrfs_attach_subpage(struct btrfs_fs_info *fs_info, struct page *page)
+{
+	struct btrfs_subpage *subpage;
+
+	/*
+	 * We have cases like a dummy extent buffer page, which is not
+	 * mappped and doesn't need to be locked.
+	 */
+	if (page->mapping)
+		ASSERT(PageLocked(page));
+	/* Either not subpage, or the page already has private attached */
+	if (fs_info->sectorsize == PAGE_SIZE || PagePrivate(page))
+		return 0;
+
+	subpage = kzalloc(sizeof(*subpage), GFP_NOFS);
+	if (!subpage)
+		return -ENOMEM;
+
+	spin_lock_init(&subpage->lock);
+	attach_page_private(page, subpage);
+	return 0;
+}
+
+void btrfs_detach_subpage(struct btrfs_fs_info *fs_info, struct page *page)
+{
+	struct btrfs_subpage *subpage;
+
+	/* Either not subpage, or already detached */
+	if (fs_info->sectorsize == PAGE_SIZE || !PagePrivate(page))
+		return;
+
+	subpage = (struct btrfs_subpage *)detach_page_private(page);
+	ASSERT(subpage);
+	kfree(subpage);
+}
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
new file mode 100644
index 000000000000..96f3b226913e
--- /dev/null
+++ b/fs/btrfs/subpage.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef BTRFS_SUBPAGE_H
+#define BTRFS_SUBPAGE_H
+
+#include <linux/spinlock.h>
+#include "ctree.h"
+
+/*
+ * Since the maximum page size btrfs is going to support is 64K while the
+ * minimum sectorsize is 4K, this means a u16 bitmap is enough.
+ *
+ * The regular bitmap requires 32 bits as minimal bitmap size, so we can't use
+ * existing bitmap_* helpers here.
+ */
+#define BTRFS_SUBPAGE_BITMAP_SIZE	16
+
+/*
+ * Structure to trace status of each sector inside a page.
+ *
+ * Will be attached to page::private for both data and metadata inodes.
+ */
+struct btrfs_subpage {
+	/* Common members for both data and metadata pages */
+	spinlock_t lock;
+};
+
+int btrfs_attach_subpage(struct btrfs_fs_info *fs_info, struct page *page);
+void btrfs_detach_subpage(struct btrfs_fs_info *fs_info, struct page *page);
+
+#endif /* BTRFS_SUBPAGE_H */
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 08/22] btrfs: extent_io: make attach_extent_buffer_page() to handle subpage case
  2021-01-06  1:01 [PATCH v3 00/22] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (6 preceding siblings ...)
  2021-01-06  1:01 ` [PATCH v3 07/22] btrfs: extent_io: introduce the skeleton of btrfs_subpage structure Qu Wenruo
@ 2021-01-06  1:01 ` Qu Wenruo
  2021-01-06  6:54   ` Qu Wenruo
  2021-01-07  1:35   ` Qu Wenruo
  2021-01-06  1:01 ` [PATCH v3 09/22] btrfs: extent_io: make grab_extent_buffer_from_page() " Qu Wenruo
                   ` (14 subsequent siblings)
  22 siblings, 2 replies; 37+ messages in thread
From: Qu Wenruo @ 2021-01-06  1:01 UTC (permalink / raw)
  To: linux-btrfs

For subpage case, we need to allocate new memory for each metadata page.

So we need to:
- Allow attach_extent_buffer_page() to return int
  To indicate allocation failure

- Prealloc page->private for alloc_extent_buffer()
  We don't want to call memory allocation with spinlock hold, so
  do preallocation before we acquire the spin lock.

- Handle subpage and regular case differently in
  attach_extent_buffer_page()
  For regular case, just do the usual thing.
  For subpage case, allocate new memory and update the tree_block
  bitmap.

  The bitmap update will be handled by new subpage specific helper,
  btrfs_subpage_set_tree_block().

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 74 ++++++++++++++++++++++++++++++++++----------
 fs/btrfs/subpage.h   | 50 ++++++++++++++++++++++++++++++
 2 files changed, 108 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d60f1837f8fb..2eeff925450f 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -24,6 +24,7 @@
 #include "rcu-string.h"
 #include "backref.h"
 #include "disk-io.h"
+#include "subpage.h"
 
 static struct kmem_cache *extent_state_cache;
 static struct kmem_cache *extent_buffer_cache;
@@ -3140,22 +3141,41 @@ static int submit_extent_page(unsigned int opf,
 	return ret;
 }
 
-static void attach_extent_buffer_page(struct extent_buffer *eb,
+static int attach_extent_buffer_page(struct extent_buffer *eb,
 				      struct page *page)
 {
-	/*
-	 * If the page is mapped to btree inode, we should hold the private
-	 * lock to prevent race.
-	 * For cloned or dummy extent buffers, their pages are not mapped and
-	 * will not race with any other ebs.
-	 */
-	if (page->mapping)
-		lockdep_assert_held(&page->mapping->private_lock);
+	struct btrfs_fs_info *fs_info = eb->fs_info;
+	int ret;
 
-	if (!PagePrivate(page))
-		attach_page_private(page, eb);
-	else
-		WARN_ON(page->private != (unsigned long)eb);
+	if (fs_info->sectorsize == PAGE_SIZE) {
+		/*
+		 * If the page is mapped to btree inode, we should hold the
+		 * private lock to prevent race.
+		 * For cloned or dummy extent buffers, their pages are not
+		 * mapped and will not race with any other ebs.
+		 */
+		if (page->mapping)
+			lockdep_assert_held(&page->mapping->private_lock);
+
+		if (!PagePrivate(page))
+			attach_page_private(page, eb);
+		else
+			WARN_ON(page->private != (unsigned long)eb);
+		return 0;
+	}
+
+	/* Already mapped, just update the existing range */
+	if (PagePrivate(page))
+		goto update_bitmap;
+
+	/* Do new allocation to attach subpage */
+	ret = btrfs_attach_subpage(fs_info, page);
+	if (ret < 0)
+		return ret;
+
+update_bitmap:
+	btrfs_subpage_set_tree_block(fs_info, page, eb->start, eb->len);
+	return 0;
 }
 
 void set_page_extent_mapped(struct page *page)
@@ -5063,21 +5083,29 @@ struct extent_buffer *btrfs_clone_extent_buffer(const struct extent_buffer *src)
 	if (new == NULL)
 		return NULL;
 
+	set_bit(EXTENT_BUFFER_UPTODATE, &new->bflags);
+	set_bit(EXTENT_BUFFER_UNMAPPED, &new->bflags);
+
 	for (i = 0; i < num_pages; i++) {
+		int ret;
+
 		p = alloc_page(GFP_NOFS);
 		if (!p) {
 			btrfs_release_extent_buffer(new);
 			return NULL;
 		}
-		attach_extent_buffer_page(new, p);
+		ret = attach_extent_buffer_page(new, p);
+		if (ret < 0) {
+			put_page(p);
+			btrfs_release_extent_buffer(new);
+			return NULL;
+		}
 		WARN_ON(PageDirty(p));
 		SetPageUptodate(p);
 		new->pages[i] = p;
 		copy_page(page_address(p), page_address(src->pages[i]));
 	}
 
-	set_bit(EXTENT_BUFFER_UPTODATE, &new->bflags);
-	set_bit(EXTENT_BUFFER_UNMAPPED, &new->bflags);
 
 	return new;
 }
@@ -5316,6 +5344,18 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 			goto free_eb;
 		}
 
+		/*
+		 * Preallocate page->private for subpage case, so that
+		 * we won't allocate memory with private_lock hold.
+		 */
+		ret = btrfs_attach_subpage(fs_info, p);
+		if (ret < 0) {
+			unlock_page(p);
+			put_page(p);
+			exists = ERR_PTR(-ENOMEM);
+			goto free_eb;
+		}
+
 		spin_lock(&mapping->private_lock);
 		exists = grab_extent_buffer(p);
 		if (exists) {
@@ -5325,8 +5365,10 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 			mark_extent_buffer_accessed(exists, p);
 			goto free_eb;
 		}
+		/* Should not fail, as we have attached the subpage already */
 		attach_extent_buffer_page(eb, p);
 		spin_unlock(&mapping->private_lock);
+
 		WARN_ON(PageDirty(p));
 		eb->pages[i] = p;
 		if (!PageUptodate(p))
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index 96f3b226913e..e49d4a7329e1 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -23,9 +23,59 @@
 struct btrfs_subpage {
 	/* Common members for both data and metadata pages */
 	spinlock_t lock;
+	union {
+		/* Structures only used by metadata */
+		struct {
+			u16 tree_block_bitmap;
+		};
+		/* structures only used by data */
+	};
 };
 
 int btrfs_attach_subpage(struct btrfs_fs_info *fs_info, struct page *page);
 void btrfs_detach_subpage(struct btrfs_fs_info *fs_info, struct page *page);
 
+/*
+ * Convert the [start, start + len) range into a u16 bitmap
+ *
+ * E.g. if start == page_offset() + 16K, len = 16K, we get 0x00f0.
+ */
+static inline u16 btrfs_subpage_calc_bitmap(struct btrfs_fs_info *fs_info,
+			struct page *page, u64 start, u32 len)
+{
+	int bit_start = offset_in_page(start) >> fs_info->sectorsize_bits;
+	int nbits = len >> fs_info->sectorsize_bits;
+
+	/* Basic checks */
+	ASSERT(PagePrivate(page) && page->private);
+	ASSERT(IS_ALIGNED(start, fs_info->sectorsize) &&
+	       IS_ALIGNED(len, fs_info->sectorsize));
+
+	/*
+	 * The range check only works for mapped page, we can
+	 * still have unampped page like dummy extent buffer pages.
+	 */
+	if (page->mapping)
+		ASSERT(page_offset(page) <= start &&
+			start + len <= page_offset(page) + PAGE_SIZE);
+	/*
+	 * Here nbits can be 16, thus can go beyond u16 range. Here we make the
+	 * first left shift to be calculated in unsigned long (u32), then
+	 * truncate the result to u16.
+	 */
+	return (u16)(((1UL << nbits) - 1) << bit_start);
+}
+
+static inline void btrfs_subpage_set_tree_block(struct btrfs_fs_info *fs_info,
+			struct page *page, u64 start, u32 len)
+{
+	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
+	unsigned long flags;
+	u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len);
+
+	spin_lock_irqsave(&subpage->lock, flags);
+	subpage->tree_block_bitmap |= tmp;
+	spin_unlock_irqrestore(&subpage->lock, flags);
+}
+
 #endif /* BTRFS_SUBPAGE_H */
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 09/22] btrfs: extent_io: make grab_extent_buffer_from_page() to handle subpage case
  2021-01-06  1:01 [PATCH v3 00/22] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (7 preceding siblings ...)
  2021-01-06  1:01 ` [PATCH v3 08/22] btrfs: extent_io: make attach_extent_buffer_page() to handle subpage case Qu Wenruo
@ 2021-01-06  1:01 ` Qu Wenruo
  2021-01-06  1:01 ` [PATCH v3 10/22] btrfs: extent_io: support subpage for extent buffer page release Qu Wenruo
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 37+ messages in thread
From: Qu Wenruo @ 2021-01-06  1:01 UTC (permalink / raw)
  To: linux-btrfs

For subpage case, grab_extent_buffer() can't really get an extent buffer
just from btrfs_subpage.

Although we have btrfs_subpage::tree_block_bitmap, which can be used to
grab the bytenr of an existing extent buffer, and can then go radix tree
search to grab that existing eb.

However we are still doing radix tree insert check in
alloc_extent_buffer(), thus we don't really need to do the extra hassle,
just let alloc_extent_buffer() to handle existing eb in radix tree.

So for grab_extent_buffer(), just always return NULL for subpage case.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 2eeff925450f..9b6b8f6b6da3 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5276,10 +5276,19 @@ struct extent_buffer *alloc_test_extent_buffer(struct btrfs_fs_info *fs_info,
 }
 #endif
 
-static struct extent_buffer *grab_extent_buffer(struct page *page)
+static struct extent_buffer *grab_extent_buffer(
+		struct btrfs_fs_info *fs_info, struct page *page)
 {
 	struct extent_buffer *exists;
 
+	/*
+	 * For subpage case, we completely rely on radix tree to ensure we
+	 * don't try to insert two eb for the same bytenr.
+	 * So here we alwasy return NULL and just continue.
+	 */
+	if (fs_info->sectorsize < PAGE_SIZE)
+		return NULL;
+
 	/* Page not yet attached to an extent buffer */
 	if (!PagePrivate(page))
 		return NULL;
@@ -5357,7 +5366,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		}
 
 		spin_lock(&mapping->private_lock);
-		exists = grab_extent_buffer(p);
+		exists = grab_extent_buffer(fs_info, p);
 		if (exists) {
 			spin_unlock(&mapping->private_lock);
 			unlock_page(p);
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 10/22] btrfs: extent_io: support subpage for extent buffer page release
  2021-01-06  1:01 [PATCH v3 00/22] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (8 preceding siblings ...)
  2021-01-06  1:01 ` [PATCH v3 09/22] btrfs: extent_io: make grab_extent_buffer_from_page() " Qu Wenruo
@ 2021-01-06  1:01 ` Qu Wenruo
  2021-01-06  1:01 ` [PATCH v3 11/22] btrfs: extent_io: attach private to dummy extent buffer pages Qu Wenruo
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 37+ messages in thread
From: Qu Wenruo @ 2021-01-06  1:01 UTC (permalink / raw)
  To: linux-btrfs

In btrfs_release_extent_buffer_pages(), we need to add extra handling
for subpage.

To do so, introduce a new helper, detach_extent_buffer_page(), to do
different handling for regular and subpage cases.

For subpage case, the new trick is to clear the range of current extent
buffer, and detach page private if and only if we're the last tree block
of the page.
This part is handled by the subpage helper,
btrfs_subpage_clear_and_test_tree_block().

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 57 +++++++++++++++++++++++++++++++-------------
 fs/btrfs/subpage.h   | 24 +++++++++++++++++++
 2 files changed, 64 insertions(+), 17 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 9b6b8f6b6da3..3158461ebf4c 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4991,25 +4991,13 @@ int extent_buffer_under_io(const struct extent_buffer *eb)
 		test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
 }
 
-/*
- * Release all pages attached to the extent buffer.
- */
-static void btrfs_release_extent_buffer_pages(struct extent_buffer *eb)
+static void detach_extent_buffer_page(struct extent_buffer *eb,
+				      struct page *page)
 {
-	int i;
-	int num_pages;
-	int mapped = !test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags);
-
-	BUG_ON(extent_buffer_under_io(eb));
-
-	num_pages = num_extent_pages(eb);
-	for (i = 0; i < num_pages; i++) {
-		struct page *page = eb->pages[i];
+	struct btrfs_fs_info *fs_info = eb->fs_info;
+	bool last = false;
 
-		if (!page)
-			continue;
-		if (mapped)
-			spin_lock(&page->mapping->private_lock);
+	if (fs_info->sectorsize == PAGE_SIZE) {
 		/*
 		 * We do this since we'll remove the pages after we've
 		 * removed the eb from the radix tree, so we could race
@@ -5028,6 +5016,41 @@ static void btrfs_release_extent_buffer_pages(struct extent_buffer *eb)
 			 */
 			detach_page_private(page);
 		}
+		return;
+	}
+
+	ASSERT(PagePrivate(page));
+	/*
+	 * For subpage case, clear the range in tree_block_bitmap,
+	 * and if we're the last one, detach private completely.
+	 */
+	last = btrfs_subpage_clear_and_test_tree_block(fs_info, page,
+					eb->start, eb->len);
+	if (last)
+		btrfs_detach_subpage(fs_info, page);
+}
+
+/*
+ * Release all pages attached to the extent buffer.
+ */
+static void btrfs_release_extent_buffer_pages(struct extent_buffer *eb)
+{
+	int i;
+	int num_pages;
+	int mapped = !test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags);
+
+	ASSERT(!extent_buffer_under_io(eb));
+
+	num_pages = num_extent_pages(eb);
+	for (i = 0; i < num_pages; i++) {
+		struct page *page = eb->pages[i];
+
+		if (!page)
+			continue;
+		if (mapped)
+			spin_lock(&page->mapping->private_lock);
+
+		detach_extent_buffer_page(eb, page);
 
 		if (mapped)
 			spin_unlock(&page->mapping->private_lock);
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index e49d4a7329e1..c1556b3b68f3 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -78,4 +78,28 @@ static inline void btrfs_subpage_set_tree_block(struct btrfs_fs_info *fs_info,
 	spin_unlock_irqrestore(&subpage->lock, flags);
 }
 
+/*
+ * Clear the bits in tree_block_bitmap and return if we're the last bit set
+ * int tree_block_bitmap.
+ *
+ * Return true if we're the last bits in the tree_block_bitmap.
+ * Return false otherwise.
+ */
+static inline bool btrfs_subpage_clear_and_test_tree_block(
+			struct btrfs_fs_info *fs_info, struct page *page,
+			u64 start, u32 len)
+{
+	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
+	u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len);
+	unsigned long flags;
+	bool last = false;
+
+	spin_lock_irqsave(&subpage->lock, flags);
+	subpage->tree_block_bitmap &= ~tmp;
+	if (subpage->tree_block_bitmap == 0)
+		last = true;
+	spin_unlock_irqrestore(&subpage->lock, flags);
+	return last;
+}
+
 #endif /* BTRFS_SUBPAGE_H */
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 11/22] btrfs: extent_io: attach private to dummy extent buffer pages
  2021-01-06  1:01 [PATCH v3 00/22] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (9 preceding siblings ...)
  2021-01-06  1:01 ` [PATCH v3 10/22] btrfs: extent_io: support subpage for extent buffer page release Qu Wenruo
@ 2021-01-06  1:01 ` Qu Wenruo
  2021-01-06  1:01 ` [PATCH v3 12/22] btrfs: subpage: introduce helper for subpage uptodate status Qu Wenruo
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 37+ messages in thread
From: Qu Wenruo @ 2021-01-06  1:01 UTC (permalink / raw)
  To: linux-btrfs

Even for regular btrfs, there are locations where we allocate dummy
extent buffers for temporary usage.

Like tree_mod_log_rewind() and get_old_root().

Those dummy extent buffers will be handled by the same eb accessors, and
if they don't have page::private subpage eb accessors can fail.

To address such problems, make __alloc_dummy_extent_buffer() to attach
page private for dummy extent buffers too.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 3158461ebf4c..3bd4b99897d6 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5146,9 +5146,14 @@ struct extent_buffer *__alloc_dummy_extent_buffer(struct btrfs_fs_info *fs_info,
 
 	num_pages = num_extent_pages(eb);
 	for (i = 0; i < num_pages; i++) {
+		int ret;
+
 		eb->pages[i] = alloc_page(GFP_NOFS);
 		if (!eb->pages[i])
 			goto err;
+		ret = attach_extent_buffer_page(eb, eb->pages[i]);
+		if (ret < 0)
+			goto err;
 	}
 	set_extent_buffer_uptodate(eb);
 	btrfs_set_header_nritems(eb, 0);
@@ -5156,8 +5161,10 @@ struct extent_buffer *__alloc_dummy_extent_buffer(struct btrfs_fs_info *fs_info,
 
 	return eb;
 err:
-	for (; i > 0; i--)
+	for (; i > 0; i--) {
+		detach_extent_buffer_page(eb, eb->pages[i - 1]);
 		__free_page(eb->pages[i - 1]);
+	}
 	__free_extent_buffer(eb);
 	return NULL;
 }
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 12/22] btrfs: subpage: introduce helper for subpage uptodate status
  2021-01-06  1:01 [PATCH v3 00/22] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (10 preceding siblings ...)
  2021-01-06  1:01 ` [PATCH v3 11/22] btrfs: extent_io: attach private to dummy extent buffer pages Qu Wenruo
@ 2021-01-06  1:01 ` Qu Wenruo
  2021-01-06  1:01 ` [PATCH v3 13/22] btrfs: subpage: introduce helper for subpage error status Qu Wenruo
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 37+ messages in thread
From: Qu Wenruo @ 2021-01-06  1:01 UTC (permalink / raw)
  To: linux-btrfs

This patch introduce the following functions to handle btrfs subpage
uptodate status:
- btrfs_subpage_set_uptodate()
- btrfs_subpage_clear_uptodate()
- btrfs_subpage_test_uptodate()
  Those helpers can only be called when the range is ensured to be
  inside the page.

- btrfs_page_set_uptodate()
- btrfs_page_clear_uptodate()
- btrfs_page_test_uptodate()
  Those helpers can handle both regular sector size and subpage without
  problem.
  Although caller should still ensure that the range is inside the page.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/subpage.h | 84 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 84 insertions(+)

diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index c1556b3b68f3..2b2d25cc6cba 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -23,6 +23,7 @@
 struct btrfs_subpage {
 	/* Common members for both data and metadata pages */
 	spinlock_t lock;
+	u16 uptodate_bitmap;
 	union {
 		/* Structures only used by metadata */
 		struct {
@@ -102,4 +103,87 @@ static inline bool btrfs_subpage_clear_and_test_tree_block(
 	return last;
 }
 
+static inline void btrfs_subpage_set_uptodate(struct btrfs_fs_info *fs_info,
+			struct page *page, u64 start, u32 len)
+{
+	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
+	u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len);
+	unsigned long flags;
+
+	spin_lock_irqsave(&subpage->lock, flags);
+	subpage->uptodate_bitmap |= tmp;
+	if (subpage->uptodate_bitmap == U16_MAX)
+		SetPageUptodate(page);
+	spin_unlock_irqrestore(&subpage->lock, flags);
+}
+
+static inline void btrfs_subpage_clear_uptodate(struct btrfs_fs_info *fs_info,
+			struct page *page, u64 start, u32 len)
+{
+	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
+	u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len);
+	unsigned long flags;
+
+	spin_lock_irqsave(&subpage->lock, flags);
+	subpage->uptodate_bitmap &= ~tmp;
+	ClearPageUptodate(page);
+	spin_unlock_irqrestore(&subpage->lock, flags);
+}
+
+/*
+ * Unlike set/clear which is dependent on each page status, for test all bits
+ * are tested in the same way.
+ */
+#define DECLARE_BTRFS_SUBPAGE_TEST_OP(name)				\
+static inline bool btrfs_subpage_test_##name(struct btrfs_fs_info *fs_info, \
+			struct page *page, u64 start, u32 len)		\
+{									\
+	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private; \
+	u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len); \
+	unsigned long flags;						\
+	bool ret;							\
+									\
+	spin_lock_irqsave(&subpage->lock, flags);			\
+	ret = ((subpage->name##_bitmap & tmp) == tmp);			\
+	spin_unlock_irqrestore(&subpage->lock, flags);			\
+	return ret;							\
+}
+DECLARE_BTRFS_SUBPAGE_TEST_OP(uptodate);
+
+/*
+ * Note that, in selftest, especially extent-io-tests, we can have empty
+ * fs_info passed in.
+ * Thankfully in selftest, we only test sectorsize == PAGE_SIZE cases so far,
+ * thus we can fall back to regular sectorsize branch.
+ */
+#define DECLARE_BTRFS_PAGE_OPS(name, set_page_func, clear_page_func,	\
+			       test_page_func)				\
+static inline void btrfs_page_set_##name(struct btrfs_fs_info *fs_info,	\
+			struct page *page, u64 start, u32 len)		\
+{									\
+	if (unlikely(!fs_info) || fs_info->sectorsize == PAGE_SIZE) {	\
+		set_page_func(page);					\
+		return;							\
+	}								\
+	btrfs_subpage_set_##name(fs_info, page, start, len);		\
+}									\
+static inline void btrfs_page_clear_##name(struct btrfs_fs_info *fs_info, \
+			struct page *page, u64 start, u32 len)		\
+{									\
+	if (unlikely(!fs_info) || fs_info->sectorsize == PAGE_SIZE) {	\
+		clear_page_func(page);					\
+		return;							\
+	}								\
+	btrfs_subpage_clear_##name(fs_info, page, start, len);		\
+}									\
+static inline bool btrfs_page_test_##name(struct btrfs_fs_info *fs_info, \
+			struct page *page, u64 start, u32 len)		\
+{									\
+	if (unlikely(!fs_info) || fs_info->sectorsize == PAGE_SIZE)	\
+		return test_page_func(page);				\
+	return btrfs_subpage_test_##name(fs_info, page, start, len);	\
+}
+DECLARE_BTRFS_PAGE_OPS(uptodate, SetPageUptodate, ClearPageUptodate,
+			PageUptodate);
+
 #endif /* BTRFS_SUBPAGE_H */
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 13/22] btrfs: subpage: introduce helper for subpage error status
  2021-01-06  1:01 [PATCH v3 00/22] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (11 preceding siblings ...)
  2021-01-06  1:01 ` [PATCH v3 12/22] btrfs: subpage: introduce helper for subpage uptodate status Qu Wenruo
@ 2021-01-06  1:01 ` Qu Wenruo
  2021-01-06  1:01 ` [PATCH v3 14/22] btrfs: extent_io: make set/clear_extent_buffer_uptodate() to support subpage size Qu Wenruo
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 37+ messages in thread
From: Qu Wenruo @ 2021-01-06  1:01 UTC (permalink / raw)
  To: linux-btrfs

This patch introduce the following functions to handle btrfs subpage
error status:
- btrfs_subpage_set_error()
- btrfs_subpage_clear_error()
- btrfs_subpage_test_error()
  Those helpers can only be called when the range is ensured to be
  inside the page.

- btrfs_page_set_error()
- btrfs_page_clear_error()
- btrfs_page_test_error()
  Those helpers can handle both regular sector size and subpage without
  problem.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/subpage.h | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index 2b2d25cc6cba..25f6d0cd1a1e 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -24,6 +24,7 @@ struct btrfs_subpage {
 	/* Common members for both data and metadata pages */
 	spinlock_t lock;
 	u16 uptodate_bitmap;
+	u16 error_bitmap;
 	union {
 		/* Structures only used by metadata */
 		struct {
@@ -130,6 +131,35 @@ static inline void btrfs_subpage_clear_uptodate(struct btrfs_fs_info *fs_info,
 	spin_unlock_irqrestore(&subpage->lock, flags);
 }
 
+static inline void btrfs_subpage_set_error(struct btrfs_fs_info *fs_info,
+					   struct page *page, u64 start,
+					   u32 len)
+{
+	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
+	u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len);
+	unsigned long flags;
+
+	spin_lock_irqsave(&subpage->lock, flags);
+	subpage->error_bitmap |= tmp;
+	SetPageError(page);
+	spin_unlock_irqrestore(&subpage->lock, flags);
+}
+
+static inline void btrfs_subpage_clear_error(struct btrfs_fs_info *fs_info,
+					   struct page *page, u64 start,
+					   u32 len)
+{
+	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
+	u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len);
+	unsigned long flags;
+
+	spin_lock_irqsave(&subpage->lock, flags);
+	subpage->error_bitmap &= ~tmp;
+	if (subpage->error_bitmap == 0)
+		ClearPageError(page);
+	spin_unlock_irqrestore(&subpage->lock, flags);
+}
+
 /*
  * Unlike set/clear which is dependent on each page status, for test all bits
  * are tested in the same way.
@@ -149,6 +179,7 @@ static inline bool btrfs_subpage_test_##name(struct btrfs_fs_info *fs_info, \
 	return ret;							\
 }
 DECLARE_BTRFS_SUBPAGE_TEST_OP(uptodate);
+DECLARE_BTRFS_SUBPAGE_TEST_OP(error);
 
 /*
  * Note that, in selftest, especially extent-io-tests, we can have empty
@@ -185,5 +216,6 @@ static inline bool btrfs_page_test_##name(struct btrfs_fs_info *fs_info, \
 }
 DECLARE_BTRFS_PAGE_OPS(uptodate, SetPageUptodate, ClearPageUptodate,
 			PageUptodate);
+DECLARE_BTRFS_PAGE_OPS(error, SetPageError, ClearPageError, PageError);
 
 #endif /* BTRFS_SUBPAGE_H */
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 14/22] btrfs: extent_io: make set/clear_extent_buffer_uptodate() to support subpage size
  2021-01-06  1:01 [PATCH v3 00/22] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (12 preceding siblings ...)
  2021-01-06  1:01 ` [PATCH v3 13/22] btrfs: subpage: introduce helper for subpage error status Qu Wenruo
@ 2021-01-06  1:01 ` Qu Wenruo
  2021-01-06  1:01 ` [PATCH v3 15/22] btrfs: extent_io: make btrfs_clone_extent_buffer() to be subpage compatible Qu Wenruo
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 37+ messages in thread
From: Qu Wenruo @ 2021-01-06  1:01 UTC (permalink / raw)
  To: linux-btrfs

For those functions, to support subpage size they just need to call
btrfs_page_set/clear_uptodate() wrappers.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 3bd4b99897d6..66e70b263343 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5613,30 +5613,33 @@ bool set_extent_buffer_dirty(struct extent_buffer *eb)
 
 void clear_extent_buffer_uptodate(struct extent_buffer *eb)
 {
-	int i;
+	struct btrfs_fs_info *fs_info = eb->fs_info;
 	struct page *page;
 	int num_pages;
+	int i;
 
 	clear_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
 	num_pages = num_extent_pages(eb);
 	for (i = 0; i < num_pages; i++) {
 		page = eb->pages[i];
 		if (page)
-			ClearPageUptodate(page);
+			btrfs_page_clear_uptodate(fs_info, page,
+						  eb->start, eb->len);
 	}
 }
 
 void set_extent_buffer_uptodate(struct extent_buffer *eb)
 {
-	int i;
+	struct btrfs_fs_info *fs_info = eb->fs_info;
 	struct page *page;
 	int num_pages;
+	int i;
 
 	set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
 	num_pages = num_extent_pages(eb);
 	for (i = 0; i < num_pages; i++) {
 		page = eb->pages[i];
-		SetPageUptodate(page);
+		btrfs_page_set_uptodate(fs_info, page, eb->start, eb->len);
 	}
 }
 
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 15/22] btrfs: extent_io: make btrfs_clone_extent_buffer() to be subpage compatible
  2021-01-06  1:01 [PATCH v3 00/22] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (13 preceding siblings ...)
  2021-01-06  1:01 ` [PATCH v3 14/22] btrfs: extent_io: make set/clear_extent_buffer_uptodate() to support subpage size Qu Wenruo
@ 2021-01-06  1:01 ` Qu Wenruo
  2021-01-06  1:01 ` [PATCH v3 16/22] btrfs: extent_io: implement try_release_extent_buffer() for subpage metadata support Qu Wenruo
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 37+ messages in thread
From: Qu Wenruo @ 2021-01-06  1:01 UTC (permalink / raw)
  To: linux-btrfs

For btrfs_clone_extent_buffer(), it's mostly the same code of
__alloc_dummy_extent_buffer(), except it has extra page copy.

So to make it subpage compatible, we only need to:
- Call set_extent_buffer_uptodate() instead of SetPageUptodate()
  This will set correct uptodate bit for subpage and regular sector size
  cases.

Since we're calling set_extent_buffer_uptodate() which will also set
EXTENT_BUFFER_UPTODATE bit, we don't need to manually set that bit
either.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 66e70b263343..194cb8b63216 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5106,7 +5106,6 @@ struct extent_buffer *btrfs_clone_extent_buffer(const struct extent_buffer *src)
 	if (new == NULL)
 		return NULL;
 
-	set_bit(EXTENT_BUFFER_UPTODATE, &new->bflags);
 	set_bit(EXTENT_BUFFER_UNMAPPED, &new->bflags);
 
 	for (i = 0; i < num_pages; i++) {
@@ -5124,11 +5123,10 @@ struct extent_buffer *btrfs_clone_extent_buffer(const struct extent_buffer *src)
 			return NULL;
 		}
 		WARN_ON(PageDirty(p));
-		SetPageUptodate(p);
 		new->pages[i] = p;
 		copy_page(page_address(p), page_address(src->pages[i]));
 	}
-
+	set_extent_buffer_uptodate(new);
 
 	return new;
 }
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 16/22] btrfs: extent_io: implement try_release_extent_buffer() for subpage metadata support
  2021-01-06  1:01 [PATCH v3 00/22] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (14 preceding siblings ...)
  2021-01-06  1:01 ` [PATCH v3 15/22] btrfs: extent_io: make btrfs_clone_extent_buffer() to be subpage compatible Qu Wenruo
@ 2021-01-06  1:01 ` Qu Wenruo
  2021-01-06  8:24   ` Qu Wenruo
  2021-01-06  1:01 ` [PATCH v3 17/22] btrfs: extent_io: introduce read_extent_buffer_subpage() Qu Wenruo
                   ` (6 subsequent siblings)
  22 siblings, 1 reply; 37+ messages in thread
From: Qu Wenruo @ 2021-01-06  1:01 UTC (permalink / raw)
  To: linux-btrfs

Unlike the original try_release_extent_buffer,
try_release_subpage_extent_buffer() will iterate through
btrfs_subpage::tree_block_bitmap, and try to release each extent buffer.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 76 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 76 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 194cb8b63216..792264f5c3c2 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -6258,10 +6258,86 @@ void memmove_extent_buffer(const struct extent_buffer *dst,
 	}
 }
 
+static int try_release_subpage_extent_buffer(struct page *page)
+{
+	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
+	u64 page_start = page_offset(page);
+	int bitmap_size = BTRFS_SUBPAGE_BITMAP_SIZE;
+	int bit_start = 0;
+	int ret;
+
+	while (bit_start < bitmap_size) {
+		struct btrfs_subpage *subpage;
+		struct extent_buffer *eb;
+		unsigned long flags;
+		u16 tmp = 1 << bit_start;
+		u64 start;
+
+		/*
+		 * Make sure the page still has private, as previous iteration
+		 * can detach page private.
+		 */
+		spin_lock(&page->mapping->private_lock);
+		if (!PagePrivate(page)) {
+			spin_unlock(&page->mapping->private_lock);
+			break;
+		}
+
+		subpage = (struct btrfs_subpage *)page->private;
+
+		spin_lock_irqsave(&subpage->lock, flags);
+		spin_unlock(&page->mapping->private_lock);
+
+		if (!(tmp & subpage->tree_block_bitmap))  {
+			spin_unlock_irqrestore(&subpage->lock, flags);
+			bit_start++;
+			continue;
+		}
+
+		start = bit_start * fs_info->sectorsize + page_start;
+		bit_start += fs_info->nodesize >> fs_info->sectorsize_bits;
+		/*
+		 * Here we can't call find_extent_buffer() which will increase
+		 * eb->refs.
+		 */
+		rcu_read_lock();
+		eb = radix_tree_lookup(&fs_info->buffer_radix,
+				start >> fs_info->sectorsize_bits);
+		ASSERT(eb);
+		spin_lock(&eb->refs_lock);
+		if (atomic_read(&eb->refs) != 1 || extent_buffer_under_io(eb) ||
+		    !test_and_clear_bit(EXTENT_BUFFER_TREE_REF, &eb->bflags)) {
+			spin_unlock(&eb->refs_lock);
+			rcu_read_unlock();
+			continue;
+		}
+		rcu_read_unlock();
+		spin_unlock_irqrestore(&subpage->lock, flags);
+		/*
+		 * Here we don't care the return value, we will always check
+		 * the page private at the end.
+		 * And release_extent_buffer() will release the refs_lock.
+		 */
+		release_extent_buffer(eb);
+	}
+	/* Finally to check if we have cleared page private */
+	spin_lock(&page->mapping->private_lock);
+	if (!PagePrivate(page))
+		ret = 1;
+	else
+		ret = 0;
+	spin_unlock(&page->mapping->private_lock);
+	return ret;
+
+}
+
 int try_release_extent_buffer(struct page *page)
 {
 	struct extent_buffer *eb;
 
+	if (btrfs_sb(page->mapping->host->i_sb)->sectorsize < PAGE_SIZE)
+		return try_release_subpage_extent_buffer(page);
+
 	/*
 	 * We need to make sure nobody is attaching this page to an eb right
 	 * now.
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 17/22] btrfs: extent_io: introduce read_extent_buffer_subpage()
  2021-01-06  1:01 [PATCH v3 00/22] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (15 preceding siblings ...)
  2021-01-06  1:01 ` [PATCH v3 16/22] btrfs: extent_io: implement try_release_extent_buffer() for subpage metadata support Qu Wenruo
@ 2021-01-06  1:01 ` Qu Wenruo
  2021-01-06  1:01 ` [PATCH v3 18/22] btrfs: extent_io: make endio_readpage_update_page_status() to handle subpage case Qu Wenruo
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 37+ messages in thread
From: Qu Wenruo @ 2021-01-06  1:01 UTC (permalink / raw)
  To: linux-btrfs

Introduce a new helper, read_extent_buffer_subpage(), to do the subpage
extent buffer read.

The difference between regular and subpage routines are:
- No page locking
  Here we completely rely on extent locking.
  Page locking can reduce the concurrency greatly, as if we lock one
  page to read one extent buffer, all the other extent buffers in the
  same page will have to wait.

- Extent uptodate condition
  Despite the existing PageUptodate() and EXTENT_BUFFER_UPTODATE check,
  We also need to check btrfs_subpage::uptodate_bitmap.

- No page loop
  Just one page, no need to loop, this greately simplified the subpage
  routine.

This patch only implemented the bio submit part, no endio support yet.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c   |  1 +
 fs/btrfs/extent_io.c | 70 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 71 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 5473bed6a7e8..7440663e8d13 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -602,6 +602,7 @@ int btrfs_validate_metadata_buffer(struct btrfs_io_bio *io_bio,
 	ASSERT(page->private);
 	eb = (struct extent_buffer *)page->private;
 
+
 	/*
 	 * The pending IO might have been the only thing that kept this buffer
 	 * in memory.  Make sure we have a ref for all this other checks
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 792264f5c3c2..b828314bf43a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5641,6 +5641,73 @@ void set_extent_buffer_uptodate(struct extent_buffer *eb)
 	}
 }
 
+static int read_extent_buffer_subpage(struct extent_buffer *eb, int wait,
+				      int mirror_num)
+{
+	struct btrfs_fs_info *fs_info = eb->fs_info;
+	struct extent_io_tree *io_tree;
+	struct page *page = eb->pages[0];
+	struct bio *bio = NULL;
+	int ret = 0;
+
+	ASSERT(!test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags));
+	ASSERT(PagePrivate(page));
+	io_tree = &BTRFS_I(fs_info->btree_inode)->io_tree;
+
+	if (wait == WAIT_NONE) {
+		ret = try_lock_extent(io_tree, eb->start,
+				      eb->start + eb->len - 1);
+		if (ret <= 0)
+			return ret;
+	} else {
+		ret = lock_extent(io_tree, eb->start, eb->start + eb->len - 1);
+		if (ret < 0)
+			return ret;
+	}
+
+	ret = 0;
+	if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags) ||
+	    PageUptodate(page) ||
+	    btrfs_subpage_test_uptodate(fs_info, page, eb->start, eb->len)) {
+		set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
+		unlock_extent(io_tree, eb->start, eb->start + eb->len - 1);
+		return ret;
+	}
+
+	clear_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags);
+	eb->read_mirror = 0;
+	atomic_set(&eb->io_pages, 1);
+	check_buffer_tree_ref(eb);
+
+	ret = submit_extent_page(REQ_OP_READ | REQ_META, NULL, page, eb->start,
+				 eb->len, eb->start - page_offset(page), &bio,
+				 end_bio_extent_readpage, mirror_num, 0, 0,
+				 true);
+	if (ret) {
+		/*
+		 * In the endio function, if we hit something wrong we will
+		 * increase the io_pages, so here we need to decrease it for error
+		 * path.
+		 */
+		atomic_dec(&eb->io_pages);
+	}
+	if (bio) {
+		int tmp;
+
+		tmp = submit_one_bio(bio, mirror_num, 0);
+		if (tmp < 0)
+			return tmp;
+	}
+	if (ret || wait != WAIT_COMPLETE)
+		return ret;
+
+	wait_extent_bit(io_tree, eb->start, eb->start + eb->len - 1,
+			EXTENT_LOCKED);
+	if (!test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags))
+		ret = -EIO;
+	return ret;
+}
+
 int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num)
 {
 	int i;
@@ -5657,6 +5724,9 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num)
 	if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags))
 		return 0;
 
+	if (eb->fs_info->sectorsize < PAGE_SIZE)
+		return read_extent_buffer_subpage(eb, wait, mirror_num);
+
 	num_pages = num_extent_pages(eb);
 	for (i = 0; i < num_pages; i++) {
 		page = eb->pages[i];
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 18/22] btrfs: extent_io: make endio_readpage_update_page_status() to handle subpage case
  2021-01-06  1:01 [PATCH v3 00/22] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (16 preceding siblings ...)
  2021-01-06  1:01 ` [PATCH v3 17/22] btrfs: extent_io: introduce read_extent_buffer_subpage() Qu Wenruo
@ 2021-01-06  1:01 ` Qu Wenruo
  2021-01-06  1:01 ` [PATCH v3 19/22] btrfs: disk-io: introduce subpage metadata validation check Qu Wenruo
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 37+ messages in thread
From: Qu Wenruo @ 2021-01-06  1:01 UTC (permalink / raw)
  To: linux-btrfs

To handle subpage status update, add the following new tricks:
- Use btrfs_page_*() helpers to update page status
  Now we can handle both cases well.

- No page unlock for subpage metadata
  Since subpage metadata doesn't utilize page locking at all, skip it.
  For subpage data locking, it's handled in later commits.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index b828314bf43a..2902484ab9f9 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2839,15 +2839,24 @@ static void endio_readpage_release_extent(struct processed_extent *processed,
 	processed->uptodate = uptodate;
 }
 
-static void endio_readpage_update_page_status(struct page *page, bool uptodate)
+static void endio_readpage_update_page_status(struct page *page, bool uptodate,
+					      u64 start, u32 len)
 {
+	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
+
+	ASSERT(page_offset(page) <= start &&
+		start + len <= page_offset(page) + PAGE_SIZE);
+
 	if (uptodate) {
-		SetPageUptodate(page);
+		btrfs_page_set_uptodate(fs_info, page, start, len);
 	} else {
-		ClearPageUptodate(page);
-		SetPageError(page);
+		btrfs_page_clear_uptodate(fs_info, page, start, len);
+		btrfs_page_set_error(fs_info, page, start, len);
 	}
-	unlock_page(page);
+
+	if (fs_info->sectorsize == PAGE_SIZE)
+		unlock_page(page);
+	/* Subpage locking will be handled in later patches */
 }
 
 /*
@@ -2984,7 +2993,7 @@ static void end_bio_extent_readpage(struct bio *bio)
 		bio_offset += len;
 
 		/* Update page status and unlock */
-		endio_readpage_update_page_status(page, uptodate);
+		endio_readpage_update_page_status(page, uptodate, start, len);
 		endio_readpage_release_extent(&processed, BTRFS_I(inode),
 					      start, end, uptodate);
 	}
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 19/22] btrfs: disk-io: introduce subpage metadata validation check
  2021-01-06  1:01 [PATCH v3 00/22] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (17 preceding siblings ...)
  2021-01-06  1:01 ` [PATCH v3 18/22] btrfs: extent_io: make endio_readpage_update_page_status() to handle subpage case Qu Wenruo
@ 2021-01-06  1:01 ` Qu Wenruo
  2021-01-06  1:01 ` [PATCH v3 20/22] btrfs: introduce btrfs_subpage for data inodes Qu Wenruo
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 37+ messages in thread
From: Qu Wenruo @ 2021-01-06  1:01 UTC (permalink / raw)
  To: linux-btrfs

For subpage metadata validation check, there are some difference:

- Read must finish in one bvec
  Since we're just reading one subpage range in one page, it should
  never be split into two bios nor two bvecs.

- How to grab the existing eb
  Instead of grabbing eb using page->private, we have to go search radix
  tree as we don't have any direct pointer at hand.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 7440663e8d13..73733250d12c 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -591,6 +591,59 @@ static int validate_extent_buffer(struct extent_buffer *eb)
 	return ret;
 }
 
+static int validate_subpage_buffer(struct page *page, u64 start, u64 end,
+				   int mirror)
+{
+	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
+	struct extent_buffer *eb;
+	int reads_done;
+	int ret = 0;
+
+	/*
+	 * We don't allow bio merge for subpage metadata read, so we should
+	 * only get one eb for each endio hook.
+	 */
+	ASSERT(end == start + fs_info->nodesize - 1);
+	ASSERT(PagePrivate(page));
+
+	eb = find_extent_buffer(fs_info, start);
+	/*
+	 * When we are reading one tree block, eb must have been
+	 * inserted into the radix tree. If not something is wrong.
+	 */
+	ASSERT(eb);
+
+	reads_done = atomic_dec_and_test(&eb->io_pages);
+	/* Subpage read must finish in page read */
+	ASSERT(reads_done);
+
+	eb->read_mirror = mirror;
+	if (test_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags)) {
+		ret = -EIO;
+		goto err;
+	}
+	ret = validate_extent_buffer(eb);
+	if (ret < 0)
+		goto err;
+
+	if (test_and_clear_bit(EXTENT_BUFFER_READAHEAD, &eb->bflags))
+		btree_readahead_hook(eb, ret);
+
+	set_extent_buffer_uptodate(eb);
+
+	free_extent_buffer(eb);
+	return ret;
+err:
+	/*
+	 * end_bio_extent_readpage decrements io_pages in case of error,
+	 * make sure it has something to decrement.
+	 */
+	atomic_inc(&eb->io_pages);
+	clear_extent_buffer_uptodate(eb);
+	free_extent_buffer(eb);
+	return ret;
+}
+
 int btrfs_validate_metadata_buffer(struct btrfs_io_bio *io_bio,
 				   struct page *page, u64 start, u64 end,
 				   int mirror)
@@ -600,6 +653,10 @@ int btrfs_validate_metadata_buffer(struct btrfs_io_bio *io_bio,
 	int reads_done;
 
 	ASSERT(page->private);
+
+	if (btrfs_sb(page->mapping->host->i_sb)->sectorsize < PAGE_SIZE)
+		return validate_subpage_buffer(page, start, end, mirror);
+
 	eb = (struct extent_buffer *)page->private;
 
 
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 20/22] btrfs: introduce btrfs_subpage for data inodes
  2021-01-06  1:01 [PATCH v3 00/22] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (18 preceding siblings ...)
  2021-01-06  1:01 ` [PATCH v3 19/22] btrfs: disk-io: introduce subpage metadata validation check Qu Wenruo
@ 2021-01-06  1:01 ` Qu Wenruo
  2021-01-06  5:04     ` kernel test robot
  2021-01-09  9:53     ` kernel test robot
  2021-01-06  1:02 ` [PATCH v3 21/22] btrfs: integrate page status update for read path into begin/end_page_read() Qu Wenruo
                   ` (2 subsequent siblings)
  22 siblings, 2 replies; 37+ messages in thread
From: Qu Wenruo @ 2021-01-06  1:01 UTC (permalink / raw)
  To: linux-btrfs

To support subpage sector size, data also need extra info to make sure
which sectors in a page are uptodate/dirty/...

This patch will make pages for data inodes to get btrfs_subpage
structure attached, and detached when the page is freed.

This patch also slightly changes the timing when
set_page_extent_mapped() to make sure:

- We have page->mapping set
  page->mapping->host is used to grab btrfs_fs_info, thus we can only
  call this function after page is mapped to an inode.

  One call site attaches pages to inode manually, thus we have to modify
  the timing of set_page_extent_mapped() a little.

- As soon as possible, before other operations
  Since memory allocation can fail, we have to do extra error handling.
  Calling set_page_extent_mapped() as soon as possible can simply the
  error handling for several call sites.

The idea is pretty much the same as iomap_page, but with more bitmaps
for btrfs specific cases.

Currently the plan is to switch iomap if iomap can provide sector
aligned write back (only write back dirty sectors, but not the full
page, data balance require this feature).

So we will stick to btrfs specific bitmap for now.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/compression.c      | 10 ++++++--
 fs/btrfs/extent_io.c        | 46 +++++++++++++++++++++++++++++++++----
 fs/btrfs/extent_io.h        |  3 ++-
 fs/btrfs/file.c             | 24 ++++++++-----------
 fs/btrfs/free-space-cache.c | 15 +++++++++---
 fs/btrfs/inode.c            | 12 ++++++----
 fs/btrfs/ioctl.c            |  5 +++-
 fs/btrfs/reflink.c          |  5 +++-
 fs/btrfs/relocation.c       | 12 ++++++++--
 9 files changed, 99 insertions(+), 33 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 5ae3fa0386b7..6d203acfdeb3 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -542,13 +542,19 @@ static noinline int add_ra_bio_pages(struct inode *inode,
 			goto next;
 		}
 
-		end = last_offset + PAGE_SIZE - 1;
 		/*
 		 * at this point, we have a locked page in the page cache
 		 * for these bytes in the file.  But, we have to make
 		 * sure they map to this compressed extent on disk.
 		 */
-		set_page_extent_mapped(page);
+		ret = set_page_extent_mapped(page);
+		if (ret < 0) {
+			unlock_page(page);
+			put_page(page);
+			break;
+		}
+
+		end = last_offset + PAGE_SIZE - 1;
 		lock_extent(tree, last_offset, end);
 		read_lock(&em_tree->lock);
 		em = lookup_extent_mapping(em_tree, last_offset,
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 2902484ab9f9..335a0aa3a6ec 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3187,10 +3187,39 @@ static int attach_extent_buffer_page(struct extent_buffer *eb,
 	return 0;
 }
 
-void set_page_extent_mapped(struct page *page)
+int __must_check set_page_extent_mapped(struct page *page)
 {
+	struct btrfs_fs_info *fs_info;
+
+	ASSERT(page->mapping);
+
+	if (PagePrivate(page))
+		return 0;
+
+	fs_info = btrfs_sb(page->mapping->host->i_sb);
+
+	if (fs_info->sectorsize < PAGE_SIZE)
+		return btrfs_attach_subpage(fs_info, page);
+
+	attach_page_private(page, (void *)EXTENT_PAGE_PRIVATE);
+	return 0;
+
+}
+
+void clear_page_extent_mapped(struct page *page)
+{
+	struct btrfs_fs_info *fs_info;
+
+	ASSERT(page->mapping);
+
 	if (!PagePrivate(page))
-		attach_page_private(page, (void *)EXTENT_PAGE_PRIVATE);
+		return;
+
+	fs_info = btrfs_sb(page->mapping->host->i_sb);
+	if (fs_info->sectorsize < PAGE_SIZE)
+		return btrfs_detach_subpage(fs_info, page);
+
+	detach_page_private(page);
 }
 
 static struct extent_map *
@@ -3247,7 +3276,12 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 	unsigned long this_bio_flag = 0;
 	struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
 
-	set_page_extent_mapped(page);
+	ret = set_page_extent_mapped(page);
+	if (ret < 0) {
+		unlock_extent(tree, start, end);
+		SetPageError(page);
+		goto out;
+	}
 
 	if (!PageUptodate(page)) {
 		if (cleancache_get_page(page) == 0) {
@@ -3688,7 +3722,11 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
 		flush_dcache_page(page);
 	}
 
-	set_page_extent_mapped(page);
+	ret = set_page_extent_mapped(page);
+	if (ret < 0) {
+		SetPageError(page);
+		goto done;
+	}
 
 	if (!epd->extent_locked) {
 		ret = writepage_delalloc(BTRFS_I(inode), page, wbc, start,
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index bedf761a0300..357a3380cd42 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -178,7 +178,8 @@ int btree_write_cache_pages(struct address_space *mapping,
 void extent_readahead(struct readahead_control *rac);
 int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
 		  u64 start, u64 len);
-void set_page_extent_mapped(struct page *page);
+int __must_check set_page_extent_mapped(struct page *page);
+void clear_page_extent_mapped(struct page *page);
 
 struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 					  u64 start, u64 owner_root, int level);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 1602975ddb88..a6f627f92c64 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1369,6 +1369,12 @@ static noinline int prepare_pages(struct inode *inode, struct page **pages,
 			goto fail;
 		}
 
+		err = set_page_extent_mapped(pages[i]);
+		if (err < 0) {
+			faili = i;
+			goto fail;
+		}
+
 		if (i == 0)
 			err = prepare_uptodate_page(inode, pages[i], pos,
 						    force_uptodate);
@@ -1453,23 +1459,11 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
 	}
 
 	/*
-	 * It's possible the pages are dirty right now, but we don't want
-	 * to clean them yet because copy_from_user may catch a page fault
-	 * and we might have to fall back to one page at a time.  If that
-	 * happens, we'll unlock these pages and we'd have a window where
-	 * reclaim could sneak in and drop the once-dirty page on the floor
-	 * without writing it.
-	 *
-	 * We have the pages locked and the extent range locked, so there's
-	 * no way someone can start IO on any dirty pages in this range.
-	 *
-	 * We'll call btrfs_dirty_pages() later on, and that will flip around
-	 * delalloc bits and dirty the pages as required.
+	 * We should be called after prepare_pages() which should have
+	 * locked all pages in the range.
 	 */
-	for (i = 0; i < num_pages; i++) {
-		set_page_extent_mapped(pages[i]);
+	for (i = 0; i < num_pages; i++)
 		WARN_ON(!PageLocked(pages[i]));
-	}
 
 	return ret;
 }
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index fd6ddd6b8165..379bef967e1d 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -431,11 +431,22 @@ static int io_ctl_prepare_pages(struct btrfs_io_ctl *io_ctl, bool uptodate)
 	int i;
 
 	for (i = 0; i < io_ctl->num_pages; i++) {
+		int ret;
+
 		page = find_or_create_page(inode->i_mapping, i, mask);
 		if (!page) {
 			io_ctl_drop_pages(io_ctl);
 			return -ENOMEM;
 		}
+
+		ret = set_page_extent_mapped(page);
+		if (ret < 0) {
+			unlock_page(page);
+			put_page(page);
+			io_ctl_drop_pages(io_ctl);
+			return -ENOMEM;
+		}
+
 		io_ctl->pages[i] = page;
 		if (uptodate && !PageUptodate(page)) {
 			btrfs_readpage(NULL, page);
@@ -455,10 +466,8 @@ static int io_ctl_prepare_pages(struct btrfs_io_ctl *io_ctl, bool uptodate)
 		}
 	}
 
-	for (i = 0; i < io_ctl->num_pages; i++) {
+	for (i = 0; i < io_ctl->num_pages; i++)
 		clear_page_dirty_for_io(io_ctl->pages[i]);
-		set_page_extent_mapped(io_ctl->pages[i]);
-	}
 
 	return 0;
 }
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ecc1f1f60b48..0cf3a0b7e98c 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4712,6 +4712,9 @@ int btrfs_truncate_block(struct btrfs_inode *inode, loff_t from, loff_t len,
 		ret = -ENOMEM;
 		goto out;
 	}
+	ret = set_page_extent_mapped(page);
+	if (ret < 0)
+		goto out_unlock;
 
 	if (!PageUptodate(page)) {
 		ret = btrfs_readpage(NULL, page);
@@ -4729,7 +4732,6 @@ int btrfs_truncate_block(struct btrfs_inode *inode, loff_t from, loff_t len,
 	wait_on_page_writeback(page);
 
 	lock_extent_bits(io_tree, block_start, block_end, &cached_state);
-	set_page_extent_mapped(page);
 
 	ordered = btrfs_lookup_ordered_extent(inode, block_start);
 	if (ordered) {
@@ -8109,7 +8111,7 @@ static int __btrfs_releasepage(struct page *page, gfp_t gfp_flags)
 {
 	int ret = try_release_extent_mapping(page, gfp_flags);
 	if (ret == 1)
-		detach_page_private(page);
+		clear_page_extent_mapped(page);
 	return ret;
 }
 
@@ -8268,7 +8270,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 	}
 
 	ClearPageChecked(page);
-	detach_page_private(page);
+	clear_page_extent_mapped(page);
 }
 
 /*
@@ -8347,7 +8349,9 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
 	wait_on_page_writeback(page);
 
 	lock_extent_bits(io_tree, page_start, page_end, &cached_state);
-	set_page_extent_mapped(page);
+	ret = set_page_extent_mapped(page);
+	if (ret < 0)
+		goto out_unlock;
 
 	/*
 	 * we can't set the delalloc bits if there are pending ordered
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 5b9b0a390f0e..5a93530bca46 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1314,6 +1314,10 @@ static int cluster_pages_for_defrag(struct inode *inode,
 		if (!page)
 			break;
 
+		ret = set_page_extent_mapped(page);
+		if (ret < 0)
+			break;
+
 		page_start = page_offset(page);
 		page_end = page_start + PAGE_SIZE - 1;
 		while (1) {
@@ -1435,7 +1439,6 @@ static int cluster_pages_for_defrag(struct inode *inode,
 	for (i = 0; i < i_done; i++) {
 		clear_page_dirty_for_io(pages[i]);
 		ClearPageChecked(pages[i]);
-		set_page_extent_mapped(pages[i]);
 		set_page_dirty(pages[i]);
 		unlock_page(pages[i]);
 		put_page(pages[i]);
diff --git a/fs/btrfs/reflink.c b/fs/btrfs/reflink.c
index b03e7891394e..b24396cf2f99 100644
--- a/fs/btrfs/reflink.c
+++ b/fs/btrfs/reflink.c
@@ -81,7 +81,10 @@ static int copy_inline_to_page(struct btrfs_inode *inode,
 		goto out_unlock;
 	}
 
-	set_page_extent_mapped(page);
+	ret = set_page_extent_mapped(page);
+	if (ret < 0)
+		goto out_unlock;
+
 	clear_extent_bit(&inode->io_tree, file_offset, range_end,
 			 EXTENT_DELALLOC | EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG,
 			 0, 0, NULL);
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 8e51b39cbfbb..d917fdef0cbf 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2679,6 +2679,16 @@ static int relocate_file_extent_cluster(struct inode *inode,
 				goto out;
 			}
 		}
+		ret = set_page_extent_mapped(page);
+		if (ret < 0) {
+			btrfs_delalloc_release_metadata(BTRFS_I(inode),
+						PAGE_SIZE, true);
+			btrfs_delalloc_release_extents(BTRFS_I(inode),
+						PAGE_SIZE);
+			unlock_page(page);
+			put_page(page);
+			goto out;
+		}
 
 		if (PageReadahead(page)) {
 			page_cache_async_readahead(inode->i_mapping,
@@ -2706,8 +2716,6 @@ static int relocate_file_extent_cluster(struct inode *inode,
 
 		lock_extent(&BTRFS_I(inode)->io_tree, page_start, page_end);
 
-		set_page_extent_mapped(page);
-
 		if (nr < cluster->nr &&
 		    page_start + offset == cluster->boundary[nr]) {
 			set_extent_bits(&BTRFS_I(inode)->io_tree,
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 21/22] btrfs: integrate page status update for read path into begin/end_page_read()
  2021-01-06  1:01 [PATCH v3 00/22] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (19 preceding siblings ...)
  2021-01-06  1:01 ` [PATCH v3 20/22] btrfs: introduce btrfs_subpage for data inodes Qu Wenruo
@ 2021-01-06  1:02 ` Qu Wenruo
  2021-01-06  1:02 ` [PATCH v3 22/22] btrfs: allow RO mount of 4K sector size fs on 64K page system Qu Wenruo
  2021-01-12 15:14 ` [PATCH v3 00/22] btrfs: add read-only support for subpage sector size David Sterba
  22 siblings, 0 replies; 37+ messages in thread
From: Qu Wenruo @ 2021-01-06  1:02 UTC (permalink / raw)
  To: linux-btrfs

In btrfs data page read path, the page status update are handled in two
different locations:

  btrfs_do_read_page()
  {
	while (cur <= end) {
		/* No need to read from disk */
		if (HOLE/PREALLOC/INLINE){
			memset();
			set_extent_uptodate();
			continue;
		}
		/* Read from disk */
		ret = submit_extent_page(end_bio_extent_readpage);
  }

  end_bio_extent_readpage()
  {
	endio_readpage_uptodate_page_status();
  }

This is fine for sectorsize == PAGE_SIZE case, as for above loop we
should only hit one branch and then exit.

But for subpage, there are more works to be done in page status update:
- Page Unlock condition
  Unlike regular page size == sectorsize case, we can no longer just
  unlock a page.
  Only the last reader of the page can unlock the page.
  This means, we can unlock the page either in the while() loop, or in
  the endio function.

- Page uptodate condition
  Since we have multiple sectors to read for a page, we can only mark
  the full page uptodate if all sectors are uptodate.

To handle both subpage and regular cases, introduce a pair of functions
to help handling page status update:

- being_page_read()
  For regular case, it does nothing.
  For subpage case, it update the reader counters so that later
  end_page_read() can know who is the last one to unlock the page.

- end_page_read()
  This is just endio_readpage_uptodate_page_status() renamed.
  The original name is a little too long and too specific for endio.

  The only new trick added is the condition for page unlock.
  Now for subage data, we unlock the page if we're the last reader.

This does not only provide the basis for subpage data read, but also
hide the special handling of page read from the main read loop.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 38 +++++++++++++++++++----------
 fs/btrfs/subpage.h   | 57 +++++++++++++++++++++++++++++++++++---------
 2 files changed, 72 insertions(+), 23 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 335a0aa3a6ec..dfbf319fc01d 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2839,8 +2839,17 @@ static void endio_readpage_release_extent(struct processed_extent *processed,
 	processed->uptodate = uptodate;
 }
 
-static void endio_readpage_update_page_status(struct page *page, bool uptodate,
-					      u64 start, u32 len)
+static void begin_data_page_read(struct btrfs_fs_info *fs_info, struct page *page)
+{
+	ASSERT(PageLocked(page));
+	if (fs_info->sectorsize == PAGE_SIZE)
+		return;
+
+	ASSERT(PagePrivate(page));
+	btrfs_subpage_start_reader(fs_info, page, page_offset(page), PAGE_SIZE);
+}
+
+static void end_page_read(struct page *page, bool uptodate, u64 start, u32 len)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
 
@@ -2856,7 +2865,12 @@ static void endio_readpage_update_page_status(struct page *page, bool uptodate,
 
 	if (fs_info->sectorsize == PAGE_SIZE)
 		unlock_page(page);
-	/* Subpage locking will be handled in later patches */
+	else if (is_data_inode(page->mapping->host))
+		/*
+		 * For subpage data, unlock the page if we're the last reader.
+		 * For subpage metadata, page lock is not utilized for read.
+		 */
+		btrfs_subpage_end_reader(fs_info, page, start, len);
 }
 
 /*
@@ -2993,7 +3007,7 @@ static void end_bio_extent_readpage(struct bio *bio)
 		bio_offset += len;
 
 		/* Update page status and unlock */
-		endio_readpage_update_page_status(page, uptodate, start, len);
+		end_page_read(page, uptodate, start, len);
 		endio_readpage_release_extent(&processed, BTRFS_I(inode),
 					      start, end, uptodate);
 	}
@@ -3260,6 +3274,7 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 		      unsigned int read_flags, u64 *prev_em_start)
 {
 	struct inode *inode = page->mapping->host;
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	u64 start = page_offset(page);
 	const u64 end = start + PAGE_SIZE - 1;
 	u64 cur = start;
@@ -3303,6 +3318,7 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 			kunmap_atomic(userpage);
 		}
 	}
+	begin_data_page_read(fs_info, page);
 	while (cur <= end) {
 		bool force_bio_submit = false;
 		u64 disk_bytenr;
@@ -3320,13 +3336,14 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 					    &cached, GFP_NOFS);
 			unlock_extent_cached(tree, cur,
 					     cur + iosize - 1, &cached);
+			end_page_read(page, true, cur, iosize);
 			break;
 		}
 		em = __get_extent_map(inode, page, pg_offset, cur,
 				      end - cur + 1, em_cached);
 		if (IS_ERR_OR_NULL(em)) {
-			SetPageError(page);
 			unlock_extent(tree, cur, end);
+			end_page_read(page, false, cur, end + 1 - cur);
 			break;
 		}
 		extent_offset = cur - em->start;
@@ -3409,6 +3426,7 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 					    &cached, GFP_NOFS);
 			unlock_extent_cached(tree, cur,
 					     cur + iosize - 1, &cached);
+			end_page_read(page, true, cur, iosize);
 			cur = cur + iosize;
 			pg_offset += iosize;
 			continue;
@@ -3418,6 +3436,7 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 				   EXTENT_UPTODATE, 1, NULL)) {
 			check_page_uptodate(tree, page);
 			unlock_extent(tree, cur, cur + iosize - 1);
+			end_page_read(page, true, cur, iosize);
 			cur = cur + iosize;
 			pg_offset += iosize;
 			continue;
@@ -3426,8 +3445,8 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 		 * to date.  Error out
 		 */
 		if (block_start == EXTENT_MAP_INLINE) {
-			SetPageError(page);
 			unlock_extent(tree, cur, cur + iosize - 1);
+			end_page_read(page, false, cur, iosize);
 			cur = cur + iosize;
 			pg_offset += iosize;
 			continue;
@@ -3444,19 +3463,14 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 			nr++;
 			*bio_flags = this_bio_flag;
 		} else {
-			SetPageError(page);
 			unlock_extent(tree, cur, cur + iosize - 1);
+			end_page_read(page, false, cur, iosize);
 			goto out;
 		}
 		cur = cur + iosize;
 		pg_offset += iosize;
 	}
 out:
-	if (!nr) {
-		if (!PageError(page))
-			SetPageUptodate(page);
-		unlock_page(page);
-	}
 	return ret;
 }
 
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index 25f6d0cd1a1e..5c4b5d2fd308 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -31,28 +31,22 @@ struct btrfs_subpage {
 			u16 tree_block_bitmap;
 		};
 		/* structures only used by data */
+		struct {
+			atomic_t readers;
+		};
 	};
 };
 
 int btrfs_attach_subpage(struct btrfs_fs_info *fs_info, struct page *page);
 void btrfs_detach_subpage(struct btrfs_fs_info *fs_info, struct page *page);
 
-/*
- * Convert the [start, start + len) range into a u16 bitmap
- *
- * E.g. if start == page_offset() + 16K, len = 16K, we get 0x00f0.
- */
-static inline u16 btrfs_subpage_calc_bitmap(struct btrfs_fs_info *fs_info,
-			struct page *page, u64 start, u32 len)
+static inline void btrfs_subpage_assert(struct btrfs_fs_info *fs_info,
+					struct page *page, u64 start, u32 len)
 {
-	int bit_start = offset_in_page(start) >> fs_info->sectorsize_bits;
-	int nbits = len >> fs_info->sectorsize_bits;
-
 	/* Basic checks */
 	ASSERT(PagePrivate(page) && page->private);
 	ASSERT(IS_ALIGNED(start, fs_info->sectorsize) &&
 	       IS_ALIGNED(len, fs_info->sectorsize));
-
 	/*
 	 * The range check only works for mapped page, we can
 	 * still have unampped page like dummy extent buffer pages.
@@ -60,6 +54,21 @@ static inline u16 btrfs_subpage_calc_bitmap(struct btrfs_fs_info *fs_info,
 	if (page->mapping)
 		ASSERT(page_offset(page) <= start &&
 			start + len <= page_offset(page) + PAGE_SIZE);
+}
+
+/*
+ * Convert the [start, start + len) range into a u16 bitmap
+ *
+ * E.g. if start == page_offset() + 16K, len = 16K, we get 0x00f0.
+ */
+static inline u16 btrfs_subpage_calc_bitmap(struct btrfs_fs_info *fs_info,
+			struct page *page, u64 start, u32 len)
+{
+	int bit_start = offset_in_page(start) >> fs_info->sectorsize_bits;
+	int nbits = len >> fs_info->sectorsize_bits;
+
+	btrfs_subpage_assert(fs_info, page, start, len);
+
 	/*
 	 * Here nbits can be 16, thus can go beyond u16 range. Here we make the
 	 * first left shift to be calculated in unsigned long (u32), then
@@ -68,6 +77,32 @@ static inline u16 btrfs_subpage_calc_bitmap(struct btrfs_fs_info *fs_info,
 	return (u16)(((1UL << nbits) - 1) << bit_start);
 }
 
+static inline void btrfs_subpage_start_reader(struct btrfs_fs_info *fs_info,
+					      struct page *page, u64 start,
+					      u32 len)
+{
+	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
+	int nbits = len >> fs_info->sectorsize_bits;
+	int ret;
+
+	btrfs_subpage_assert(fs_info, page, start, len);
+
+	ret = atomic_add_return(nbits, &subpage->readers);
+	ASSERT(ret == nbits);
+}
+
+static inline void btrfs_subpage_end_reader(struct btrfs_fs_info *fs_info,
+			struct page *page, u64 start, u32 len)
+{
+	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
+	int nbits = len >> fs_info->sectorsize_bits;
+
+	btrfs_subpage_assert(fs_info, page, start, len);
+	ASSERT(atomic_read(&subpage->readers) >= nbits);
+	if (atomic_sub_and_test(nbits, &subpage->readers))
+		unlock_page(page);
+}
+
 static inline void btrfs_subpage_set_tree_block(struct btrfs_fs_info *fs_info,
 			struct page *page, u64 start, u32 len)
 {
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 22/22] btrfs: allow RO mount of 4K sector size fs on 64K page system
  2021-01-06  1:01 [PATCH v3 00/22] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (20 preceding siblings ...)
  2021-01-06  1:02 ` [PATCH v3 21/22] btrfs: integrate page status update for read path into begin/end_page_read() Qu Wenruo
@ 2021-01-06  1:02 ` Qu Wenruo
  2021-01-12 15:14 ` [PATCH v3 00/22] btrfs: add read-only support for subpage sector size David Sterba
  22 siblings, 0 replies; 37+ messages in thread
From: Qu Wenruo @ 2021-01-06  1:02 UTC (permalink / raw)
  To: linux-btrfs

This adds the basic RO mount ability for 4K sector size on 64K page
system.

Currently we only plan to support 4K and 64K page system.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 24 +++++++++++++++++++++---
 fs/btrfs/super.c   |  7 +++++++
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 73733250d12c..553835688560 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2484,13 +2484,21 @@ static int validate_super(struct btrfs_fs_info *fs_info,
 		btrfs_err(fs_info, "invalid sectorsize %llu", sectorsize);
 		ret = -EINVAL;
 	}
-	/* Only PAGE SIZE is supported yet */
-	if (sectorsize != PAGE_SIZE) {
+
+	/*
+	 * For 4K page size, we only support 4K sector size.
+	 * For 64K page size, we support RW for 64K sector size, and RO for
+	 * 4K sector size.
+	 */
+	if ((SZ_4K == PAGE_SIZE && sectorsize != PAGE_SIZE) ||
+	    (SZ_64K == PAGE_SIZE && (sectorsize != SZ_4K &&
+				     sectorsize != SZ_64K))) {
 		btrfs_err(fs_info,
-			"sectorsize %llu not supported yet, only support %lu",
+			"sectorsize %llu not supported yet for page size %lu",
 			sectorsize, PAGE_SIZE);
 		ret = -EINVAL;
 	}
+
 	if (!is_power_of_2(nodesize) || nodesize < sectorsize ||
 	    nodesize > BTRFS_MAX_METADATA_BLOCKSIZE) {
 		btrfs_err(fs_info, "invalid nodesize %llu", nodesize);
@@ -3249,6 +3257,16 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 		goto fail_alloc;
 	}
 
+	/* For 4K sector size support, it's only read-only yet */
+	if (PAGE_SIZE == SZ_64K && sectorsize == SZ_4K) {
+		if (!sb_rdonly(sb) || btrfs_super_log_root(disk_super)) {
+			btrfs_err(fs_info,
+				"subpage sector size only support RO yet");
+			err = -EINVAL;
+			goto fail_alloc;
+		}
+	}
+
 	ret = btrfs_init_workqueues(fs_info, fs_devices);
 	if (ret) {
 		err = ret;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 12d7d3be7cd4..5bbc23597a93 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2028,6 +2028,13 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 			ret = -EINVAL;
 			goto restore;
 		}
+		if (fs_info->sectorsize < PAGE_SIZE) {
+			btrfs_warn(fs_info,
+	"read-write mount is not yet allowed for sector size %u page size %lu",
+				   fs_info->sectorsize, PAGE_SIZE);
+			ret = -EINVAL;
+			goto restore;
+		}
 
 		/*
 		 * NOTE: when remounting with a change that does writes, don't
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 20/22] btrfs: introduce btrfs_subpage for data inodes
  2021-01-06  1:01 ` [PATCH v3 20/22] btrfs: introduce btrfs_subpage for data inodes Qu Wenruo
@ 2021-01-06  5:04     ` kernel test robot
  2021-01-09  9:53     ` kernel test robot
  1 sibling, 0 replies; 37+ messages in thread
From: kernel test robot @ 2021-01-06  5:04 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 8803 bytes --]

Hi Qu,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on kdave/for-next]
[also build test WARNING on v5.11-rc2 next-20210104]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Qu-Wenruo/btrfs-add-read-only-support-for-subpage-sector-size/20210106-090847
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
config: m68k-randconfig-s032-20210106 (attached as .config)
compiler: m68k-linux-gcc (GCC) 9.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.3-208-g46a52ca4-dirty
        # https://github.com/0day-ci/linux/commit/2c54bbf363f66a7c4d489fa0b7967ce5fc960afb
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Qu-Wenruo/btrfs-add-read-only-support-for-subpage-sector-size/20210106-090847
        git checkout 2c54bbf363f66a7c4d489fa0b7967ce5fc960afb
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=m68k 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


"sparse warnings: (new ones prefixed by >>)"
>> fs/btrfs/inode.c:8352:13: sparse: sparse: incorrect type in assignment (different base types) @@     expected restricted vm_fault_t [assigned] [usertype] ret @@     got int @@
   fs/btrfs/inode.c:8352:13: sparse:     expected restricted vm_fault_t [assigned] [usertype] ret
   fs/btrfs/inode.c:8352:13: sparse:     got int
>> fs/btrfs/inode.c:8353:13: sparse: sparse: restricted vm_fault_t degrades to integer
   fs/btrfs/inode.c: note: in included file (through include/linux/mmzone.h, include/linux/gfp.h, include/linux/slab.h, ...):
   include/linux/spinlock.h:394:9: sparse: sparse: context imbalance in 'run_delayed_iput_locked' - unexpected unlock

vim +8352 fs/btrfs/inode.c

  8275	
  8276	/*
  8277	 * btrfs_page_mkwrite() is not allowed to change the file size as it gets
  8278	 * called from a page fault handler when a page is first dirtied. Hence we must
  8279	 * be careful to check for EOF conditions here. We set the page up correctly
  8280	 * for a written page which means we get ENOSPC checking when writing into
  8281	 * holes and correct delalloc and unwritten extent mapping on filesystems that
  8282	 * support these features.
  8283	 *
  8284	 * We are not allowed to take the i_mutex here so we have to play games to
  8285	 * protect against truncate races as the page could now be beyond EOF.  Because
  8286	 * truncate_setsize() writes the inode size before removing pages, once we have
  8287	 * the page lock we can determine safely if the page is beyond EOF. If it is not
  8288	 * beyond EOF, then the page is guaranteed safe against truncation until we
  8289	 * unlock the page.
  8290	 */
  8291	vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
  8292	{
  8293		struct page *page = vmf->page;
  8294		struct inode *inode = file_inode(vmf->vma->vm_file);
  8295		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
  8296		struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
  8297		struct btrfs_ordered_extent *ordered;
  8298		struct extent_state *cached_state = NULL;
  8299		struct extent_changeset *data_reserved = NULL;
  8300		char *kaddr;
  8301		unsigned long zero_start;
  8302		loff_t size;
  8303		vm_fault_t ret;
  8304		int ret2;
  8305		int reserved = 0;
  8306		u64 reserved_space;
  8307		u64 page_start;
  8308		u64 page_end;
  8309		u64 end;
  8310	
  8311		reserved_space = PAGE_SIZE;
  8312	
  8313		sb_start_pagefault(inode->i_sb);
  8314		page_start = page_offset(page);
  8315		page_end = page_start + PAGE_SIZE - 1;
  8316		end = page_end;
  8317	
  8318		/*
  8319		 * Reserving delalloc space after obtaining the page lock can lead to
  8320		 * deadlock. For example, if a dirty page is locked by this function
  8321		 * and the call to btrfs_delalloc_reserve_space() ends up triggering
  8322		 * dirty page write out, then the btrfs_writepage() function could
  8323		 * end up waiting indefinitely to get a lock on the page currently
  8324		 * being processed by btrfs_page_mkwrite() function.
  8325		 */
  8326		ret2 = btrfs_delalloc_reserve_space(BTRFS_I(inode), &data_reserved,
  8327						    page_start, reserved_space);
  8328		if (!ret2) {
  8329			ret2 = file_update_time(vmf->vma->vm_file);
  8330			reserved = 1;
  8331		}
  8332		if (ret2) {
  8333			ret = vmf_error(ret2);
  8334			if (reserved)
  8335				goto out;
  8336			goto out_noreserve;
  8337		}
  8338	
  8339		ret = VM_FAULT_NOPAGE; /* make the VM retry the fault */
  8340	again:
  8341		lock_page(page);
  8342		size = i_size_read(inode);
  8343	
  8344		if ((page->mapping != inode->i_mapping) ||
  8345		    (page_start >= size)) {
  8346			/* page got truncated out from underneath us */
  8347			goto out_unlock;
  8348		}
  8349		wait_on_page_writeback(page);
  8350	
  8351		lock_extent_bits(io_tree, page_start, page_end, &cached_state);
> 8352		ret = set_page_extent_mapped(page);
> 8353		if (ret < 0)
  8354			goto out_unlock;
  8355	
  8356		/*
  8357		 * we can't set the delalloc bits if there are pending ordered
  8358		 * extents.  Drop our locks and wait for them to finish
  8359		 */
  8360		ordered = btrfs_lookup_ordered_range(BTRFS_I(inode), page_start,
  8361				PAGE_SIZE);
  8362		if (ordered) {
  8363			unlock_extent_cached(io_tree, page_start, page_end,
  8364					     &cached_state);
  8365			unlock_page(page);
  8366			btrfs_start_ordered_extent(ordered, 1);
  8367			btrfs_put_ordered_extent(ordered);
  8368			goto again;
  8369		}
  8370	
  8371		if (page->index == ((size - 1) >> PAGE_SHIFT)) {
  8372			reserved_space = round_up(size - page_start,
  8373						  fs_info->sectorsize);
  8374			if (reserved_space < PAGE_SIZE) {
  8375				end = page_start + reserved_space - 1;
  8376				btrfs_delalloc_release_space(BTRFS_I(inode),
  8377						data_reserved, page_start,
  8378						PAGE_SIZE - reserved_space, true);
  8379			}
  8380		}
  8381	
  8382		/*
  8383		 * page_mkwrite gets called when the page is firstly dirtied after it's
  8384		 * faulted in, but write(2) could also dirty a page and set delalloc
  8385		 * bits, thus in this case for space account reason, we still need to
  8386		 * clear any delalloc bits within this page range since we have to
  8387		 * reserve data&meta space before lock_page() (see above comments).
  8388		 */
  8389		clear_extent_bit(&BTRFS_I(inode)->io_tree, page_start, end,
  8390				  EXTENT_DELALLOC | EXTENT_DO_ACCOUNTING |
  8391				  EXTENT_DEFRAG, 0, 0, &cached_state);
  8392	
  8393		ret2 = btrfs_set_extent_delalloc(BTRFS_I(inode), page_start, end, 0,
  8394						&cached_state);
  8395		if (ret2) {
  8396			unlock_extent_cached(io_tree, page_start, page_end,
  8397					     &cached_state);
  8398			ret = VM_FAULT_SIGBUS;
  8399			goto out_unlock;
  8400		}
  8401	
  8402		/* page is wholly or partially inside EOF */
  8403		if (page_start + PAGE_SIZE > size)
  8404			zero_start = offset_in_page(size);
  8405		else
  8406			zero_start = PAGE_SIZE;
  8407	
  8408		if (zero_start != PAGE_SIZE) {
  8409			kaddr = kmap(page);
  8410			memset(kaddr + zero_start, 0, PAGE_SIZE - zero_start);
  8411			flush_dcache_page(page);
  8412			kunmap(page);
  8413		}
  8414		ClearPageChecked(page);
  8415		set_page_dirty(page);
  8416		SetPageUptodate(page);
  8417	
  8418		BTRFS_I(inode)->last_trans = fs_info->generation;
  8419		BTRFS_I(inode)->last_sub_trans = BTRFS_I(inode)->root->log_transid;
  8420		BTRFS_I(inode)->last_log_commit = BTRFS_I(inode)->root->last_log_commit;
  8421	
  8422		unlock_extent_cached(io_tree, page_start, page_end, &cached_state);
  8423	
  8424		btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE);
  8425		sb_end_pagefault(inode->i_sb);
  8426		extent_changeset_free(data_reserved);
  8427		return VM_FAULT_LOCKED;
  8428	
  8429	out_unlock:
  8430		unlock_page(page);
  8431	out:
  8432		btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE);
  8433		btrfs_delalloc_release_space(BTRFS_I(inode), data_reserved, page_start,
  8434					     reserved_space, (ret != 0));
  8435	out_noreserve:
  8436		sb_end_pagefault(inode->i_sb);
  8437		extent_changeset_free(data_reserved);
  8438		return ret;
  8439	}
  8440	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 33611 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 20/22] btrfs: introduce btrfs_subpage for data inodes
@ 2021-01-06  5:04     ` kernel test robot
  0 siblings, 0 replies; 37+ messages in thread
From: kernel test robot @ 2021-01-06  5:04 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 9015 bytes --]

Hi Qu,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on kdave/for-next]
[also build test WARNING on v5.11-rc2 next-20210104]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Qu-Wenruo/btrfs-add-read-only-support-for-subpage-sector-size/20210106-090847
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
config: m68k-randconfig-s032-20210106 (attached as .config)
compiler: m68k-linux-gcc (GCC) 9.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.3-208-g46a52ca4-dirty
        # https://github.com/0day-ci/linux/commit/2c54bbf363f66a7c4d489fa0b7967ce5fc960afb
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Qu-Wenruo/btrfs-add-read-only-support-for-subpage-sector-size/20210106-090847
        git checkout 2c54bbf363f66a7c4d489fa0b7967ce5fc960afb
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=m68k 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


"sparse warnings: (new ones prefixed by >>)"
>> fs/btrfs/inode.c:8352:13: sparse: sparse: incorrect type in assignment (different base types) @@     expected restricted vm_fault_t [assigned] [usertype] ret @@     got int @@
   fs/btrfs/inode.c:8352:13: sparse:     expected restricted vm_fault_t [assigned] [usertype] ret
   fs/btrfs/inode.c:8352:13: sparse:     got int
>> fs/btrfs/inode.c:8353:13: sparse: sparse: restricted vm_fault_t degrades to integer
   fs/btrfs/inode.c: note: in included file (through include/linux/mmzone.h, include/linux/gfp.h, include/linux/slab.h, ...):
   include/linux/spinlock.h:394:9: sparse: sparse: context imbalance in 'run_delayed_iput_locked' - unexpected unlock

vim +8352 fs/btrfs/inode.c

  8275	
  8276	/*
  8277	 * btrfs_page_mkwrite() is not allowed to change the file size as it gets
  8278	 * called from a page fault handler when a page is first dirtied. Hence we must
  8279	 * be careful to check for EOF conditions here. We set the page up correctly
  8280	 * for a written page which means we get ENOSPC checking when writing into
  8281	 * holes and correct delalloc and unwritten extent mapping on filesystems that
  8282	 * support these features.
  8283	 *
  8284	 * We are not allowed to take the i_mutex here so we have to play games to
  8285	 * protect against truncate races as the page could now be beyond EOF.  Because
  8286	 * truncate_setsize() writes the inode size before removing pages, once we have
  8287	 * the page lock we can determine safely if the page is beyond EOF. If it is not
  8288	 * beyond EOF, then the page is guaranteed safe against truncation until we
  8289	 * unlock the page.
  8290	 */
  8291	vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
  8292	{
  8293		struct page *page = vmf->page;
  8294		struct inode *inode = file_inode(vmf->vma->vm_file);
  8295		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
  8296		struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
  8297		struct btrfs_ordered_extent *ordered;
  8298		struct extent_state *cached_state = NULL;
  8299		struct extent_changeset *data_reserved = NULL;
  8300		char *kaddr;
  8301		unsigned long zero_start;
  8302		loff_t size;
  8303		vm_fault_t ret;
  8304		int ret2;
  8305		int reserved = 0;
  8306		u64 reserved_space;
  8307		u64 page_start;
  8308		u64 page_end;
  8309		u64 end;
  8310	
  8311		reserved_space = PAGE_SIZE;
  8312	
  8313		sb_start_pagefault(inode->i_sb);
  8314		page_start = page_offset(page);
  8315		page_end = page_start + PAGE_SIZE - 1;
  8316		end = page_end;
  8317	
  8318		/*
  8319		 * Reserving delalloc space after obtaining the page lock can lead to
  8320		 * deadlock. For example, if a dirty page is locked by this function
  8321		 * and the call to btrfs_delalloc_reserve_space() ends up triggering
  8322		 * dirty page write out, then the btrfs_writepage() function could
  8323		 * end up waiting indefinitely to get a lock on the page currently
  8324		 * being processed by btrfs_page_mkwrite() function.
  8325		 */
  8326		ret2 = btrfs_delalloc_reserve_space(BTRFS_I(inode), &data_reserved,
  8327						    page_start, reserved_space);
  8328		if (!ret2) {
  8329			ret2 = file_update_time(vmf->vma->vm_file);
  8330			reserved = 1;
  8331		}
  8332		if (ret2) {
  8333			ret = vmf_error(ret2);
  8334			if (reserved)
  8335				goto out;
  8336			goto out_noreserve;
  8337		}
  8338	
  8339		ret = VM_FAULT_NOPAGE; /* make the VM retry the fault */
  8340	again:
  8341		lock_page(page);
  8342		size = i_size_read(inode);
  8343	
  8344		if ((page->mapping != inode->i_mapping) ||
  8345		    (page_start >= size)) {
  8346			/* page got truncated out from underneath us */
  8347			goto out_unlock;
  8348		}
  8349		wait_on_page_writeback(page);
  8350	
  8351		lock_extent_bits(io_tree, page_start, page_end, &cached_state);
> 8352		ret = set_page_extent_mapped(page);
> 8353		if (ret < 0)
  8354			goto out_unlock;
  8355	
  8356		/*
  8357		 * we can't set the delalloc bits if there are pending ordered
  8358		 * extents.  Drop our locks and wait for them to finish
  8359		 */
  8360		ordered = btrfs_lookup_ordered_range(BTRFS_I(inode), page_start,
  8361				PAGE_SIZE);
  8362		if (ordered) {
  8363			unlock_extent_cached(io_tree, page_start, page_end,
  8364					     &cached_state);
  8365			unlock_page(page);
  8366			btrfs_start_ordered_extent(ordered, 1);
  8367			btrfs_put_ordered_extent(ordered);
  8368			goto again;
  8369		}
  8370	
  8371		if (page->index == ((size - 1) >> PAGE_SHIFT)) {
  8372			reserved_space = round_up(size - page_start,
  8373						  fs_info->sectorsize);
  8374			if (reserved_space < PAGE_SIZE) {
  8375				end = page_start + reserved_space - 1;
  8376				btrfs_delalloc_release_space(BTRFS_I(inode),
  8377						data_reserved, page_start,
  8378						PAGE_SIZE - reserved_space, true);
  8379			}
  8380		}
  8381	
  8382		/*
  8383		 * page_mkwrite gets called when the page is firstly dirtied after it's
  8384		 * faulted in, but write(2) could also dirty a page and set delalloc
  8385		 * bits, thus in this case for space account reason, we still need to
  8386		 * clear any delalloc bits within this page range since we have to
  8387		 * reserve data&meta space before lock_page() (see above comments).
  8388		 */
  8389		clear_extent_bit(&BTRFS_I(inode)->io_tree, page_start, end,
  8390				  EXTENT_DELALLOC | EXTENT_DO_ACCOUNTING |
  8391				  EXTENT_DEFRAG, 0, 0, &cached_state);
  8392	
  8393		ret2 = btrfs_set_extent_delalloc(BTRFS_I(inode), page_start, end, 0,
  8394						&cached_state);
  8395		if (ret2) {
  8396			unlock_extent_cached(io_tree, page_start, page_end,
  8397					     &cached_state);
  8398			ret = VM_FAULT_SIGBUS;
  8399			goto out_unlock;
  8400		}
  8401	
  8402		/* page is wholly or partially inside EOF */
  8403		if (page_start + PAGE_SIZE > size)
  8404			zero_start = offset_in_page(size);
  8405		else
  8406			zero_start = PAGE_SIZE;
  8407	
  8408		if (zero_start != PAGE_SIZE) {
  8409			kaddr = kmap(page);
  8410			memset(kaddr + zero_start, 0, PAGE_SIZE - zero_start);
  8411			flush_dcache_page(page);
  8412			kunmap(page);
  8413		}
  8414		ClearPageChecked(page);
  8415		set_page_dirty(page);
  8416		SetPageUptodate(page);
  8417	
  8418		BTRFS_I(inode)->last_trans = fs_info->generation;
  8419		BTRFS_I(inode)->last_sub_trans = BTRFS_I(inode)->root->log_transid;
  8420		BTRFS_I(inode)->last_log_commit = BTRFS_I(inode)->root->last_log_commit;
  8421	
  8422		unlock_extent_cached(io_tree, page_start, page_end, &cached_state);
  8423	
  8424		btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE);
  8425		sb_end_pagefault(inode->i_sb);
  8426		extent_changeset_free(data_reserved);
  8427		return VM_FAULT_LOCKED;
  8428	
  8429	out_unlock:
  8430		unlock_page(page);
  8431	out:
  8432		btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE);
  8433		btrfs_delalloc_release_space(BTRFS_I(inode), data_reserved, page_start,
  8434					     reserved_space, (ret != 0));
  8435	out_noreserve:
  8436		sb_end_pagefault(inode->i_sb);
  8437		extent_changeset_free(data_reserved);
  8438		return ret;
  8439	}
  8440	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 33611 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 20/22] btrfs: introduce btrfs_subpage for data inodes
  2021-01-06  5:04     ` kernel test robot
  (?)
@ 2021-01-06  5:32     ` Qu Wenruo
  2021-01-06  6:48         ` Rong Chen
  -1 siblings, 1 reply; 37+ messages in thread
From: Qu Wenruo @ 2021-01-06  5:32 UTC (permalink / raw)
  To: kernel test robot, Qu Wenruo, linux-btrfs; +Cc: kbuild-all



On 2021/1/6 下午1:04, kernel test robot wrote:
> Hi Qu,
>
> Thank you for the patch! Perhaps something to improve:
>
> [auto build test WARNING on kdave/for-next]
> [also build test WARNING on v5.11-rc2 next-20210104]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
>
> url:    https://github.com/0day-ci/linux/commits/Qu-Wenruo/btrfs-add-read-only-support-for-subpage-sector-size/20210106-090847
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
> config: m68k-randconfig-s032-20210106 (attached as .config)
> compiler: m68k-linux-gcc (GCC) 9.3.0
> reproduce:
>          wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>          chmod +x ~/bin/make.cross
>          # apt-get install sparse
>          # sparse version: v0.6.3-208-g46a52ca4-dirty
>          # https://github.com/0day-ci/linux/commit/2c54bbf363f66a7c4d489fa0b7967ce5fc960afb
>          git remote add linux-review https://github.com/0day-ci/linux
>          git fetch --no-tags linux-review Qu-Wenruo/btrfs-add-read-only-support-for-subpage-sector-size/20210106-090847
>          git checkout 2c54bbf363f66a7c4d489fa0b7967ce5fc960afb
>          # save the attached .config to linux build tree
>          COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=m68k
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
>
>
> "sparse warnings: (new ones prefixed by >>)"
>>> fs/btrfs/inode.c:8352:13: sparse: sparse: incorrect type in assignment (different base types) @@     expected restricted vm_fault_t [assigned] [usertype] ret @@     got int @@
>     fs/btrfs/inode.c:8352:13: sparse:     expected restricted vm_fault_t [assigned] [usertype] ret
>     fs/btrfs/inode.c:8352:13: sparse:     got int
>>> fs/btrfs/inode.c:8353:13: sparse: sparse: restricted vm_fault_t degrades to integer

Why I always forgot this...

Now it get properly fixed in github. Although the submitted patch is
still using @ret other than @ret2.

Is there anyway to let LKP to run on specific branch so that I can avoid
similar problems.

Thanks,
Qu
>     fs/btrfs/inode.c: note: in included file (through include/linux/mmzone.h, include/linux/gfp.h, include/linux/slab.h, ...):
>     include/linux/spinlock.h:394:9: sparse: sparse: context imbalance in 'run_delayed_iput_locked' - unexpected unlock
>
> vim +8352 fs/btrfs/inode.c
>
>    8275
>    8276	/*
>    8277	 * btrfs_page_mkwrite() is not allowed to change the file size as it gets
>    8278	 * called from a page fault handler when a page is first dirtied. Hence we must
>    8279	 * be careful to check for EOF conditions here. We set the page up correctly
>    8280	 * for a written page which means we get ENOSPC checking when writing into
>    8281	 * holes and correct delalloc and unwritten extent mapping on filesystems that
>    8282	 * support these features.
>    8283	 *
>    8284	 * We are not allowed to take the i_mutex here so we have to play games to
>    8285	 * protect against truncate races as the page could now be beyond EOF.  Because
>    8286	 * truncate_setsize() writes the inode size before removing pages, once we have
>    8287	 * the page lock we can determine safely if the page is beyond EOF. If it is not
>    8288	 * beyond EOF, then the page is guaranteed safe against truncation until we
>    8289	 * unlock the page.
>    8290	 */
>    8291	vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
>    8292	{
>    8293		struct page *page = vmf->page;
>    8294		struct inode *inode = file_inode(vmf->vma->vm_file);
>    8295		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>    8296		struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
>    8297		struct btrfs_ordered_extent *ordered;
>    8298		struct extent_state *cached_state = NULL;
>    8299		struct extent_changeset *data_reserved = NULL;
>    8300		char *kaddr;
>    8301		unsigned long zero_start;
>    8302		loff_t size;
>    8303		vm_fault_t ret;
>    8304		int ret2;
>    8305		int reserved = 0;
>    8306		u64 reserved_space;
>    8307		u64 page_start;
>    8308		u64 page_end;
>    8309		u64 end;
>    8310
>    8311		reserved_space = PAGE_SIZE;
>    8312
>    8313		sb_start_pagefault(inode->i_sb);
>    8314		page_start = page_offset(page);
>    8315		page_end = page_start + PAGE_SIZE - 1;
>    8316		end = page_end;
>    8317
>    8318		/*
>    8319		 * Reserving delalloc space after obtaining the page lock can lead to
>    8320		 * deadlock. For example, if a dirty page is locked by this function
>    8321		 * and the call to btrfs_delalloc_reserve_space() ends up triggering
>    8322		 * dirty page write out, then the btrfs_writepage() function could
>    8323		 * end up waiting indefinitely to get a lock on the page currently
>    8324		 * being processed by btrfs_page_mkwrite() function.
>    8325		 */
>    8326		ret2 = btrfs_delalloc_reserve_space(BTRFS_I(inode), &data_reserved,
>    8327						    page_start, reserved_space);
>    8328		if (!ret2) {
>    8329			ret2 = file_update_time(vmf->vma->vm_file);
>    8330			reserved = 1;
>    8331		}
>    8332		if (ret2) {
>    8333			ret = vmf_error(ret2);
>    8334			if (reserved)
>    8335				goto out;
>    8336			goto out_noreserve;
>    8337		}
>    8338
>    8339		ret = VM_FAULT_NOPAGE; /* make the VM retry the fault */
>    8340	again:
>    8341		lock_page(page);
>    8342		size = i_size_read(inode);
>    8343
>    8344		if ((page->mapping != inode->i_mapping) ||
>    8345		    (page_start >= size)) {
>    8346			/* page got truncated out from underneath us */
>    8347			goto out_unlock;
>    8348		}
>    8349		wait_on_page_writeback(page);
>    8350
>    8351		lock_extent_bits(io_tree, page_start, page_end, &cached_state);
>> 8352		ret = set_page_extent_mapped(page);
>> 8353		if (ret < 0)
>    8354			goto out_unlock;
>    8355
>    8356		/*
>    8357		 * we can't set the delalloc bits if there are pending ordered
>    8358		 * extents.  Drop our locks and wait for them to finish
>    8359		 */
>    8360		ordered = btrfs_lookup_ordered_range(BTRFS_I(inode), page_start,
>    8361				PAGE_SIZE);
>    8362		if (ordered) {
>    8363			unlock_extent_cached(io_tree, page_start, page_end,
>    8364					     &cached_state);
>    8365			unlock_page(page);
>    8366			btrfs_start_ordered_extent(ordered, 1);
>    8367			btrfs_put_ordered_extent(ordered);
>    8368			goto again;
>    8369		}
>    8370
>    8371		if (page->index == ((size - 1) >> PAGE_SHIFT)) {
>    8372			reserved_space = round_up(size - page_start,
>    8373						  fs_info->sectorsize);
>    8374			if (reserved_space < PAGE_SIZE) {
>    8375				end = page_start + reserved_space - 1;
>    8376				btrfs_delalloc_release_space(BTRFS_I(inode),
>    8377						data_reserved, page_start,
>    8378						PAGE_SIZE - reserved_space, true);
>    8379			}
>    8380		}
>    8381
>    8382		/*
>    8383		 * page_mkwrite gets called when the page is firstly dirtied after it's
>    8384		 * faulted in, but write(2) could also dirty a page and set delalloc
>    8385		 * bits, thus in this case for space account reason, we still need to
>    8386		 * clear any delalloc bits within this page range since we have to
>    8387		 * reserve data&meta space before lock_page() (see above comments).
>    8388		 */
>    8389		clear_extent_bit(&BTRFS_I(inode)->io_tree, page_start, end,
>    8390				  EXTENT_DELALLOC | EXTENT_DO_ACCOUNTING |
>    8391				  EXTENT_DEFRAG, 0, 0, &cached_state);
>    8392
>    8393		ret2 = btrfs_set_extent_delalloc(BTRFS_I(inode), page_start, end, 0,
>    8394						&cached_state);
>    8395		if (ret2) {
>    8396			unlock_extent_cached(io_tree, page_start, page_end,
>    8397					     &cached_state);
>    8398			ret = VM_FAULT_SIGBUS;
>    8399			goto out_unlock;
>    8400		}
>    8401
>    8402		/* page is wholly or partially inside EOF */
>    8403		if (page_start + PAGE_SIZE > size)
>    8404			zero_start = offset_in_page(size);
>    8405		else
>    8406			zero_start = PAGE_SIZE;
>    8407
>    8408		if (zero_start != PAGE_SIZE) {
>    8409			kaddr = kmap(page);
>    8410			memset(kaddr + zero_start, 0, PAGE_SIZE - zero_start);
>    8411			flush_dcache_page(page);
>    8412			kunmap(page);
>    8413		}
>    8414		ClearPageChecked(page);
>    8415		set_page_dirty(page);
>    8416		SetPageUptodate(page);
>    8417
>    8418		BTRFS_I(inode)->last_trans = fs_info->generation;
>    8419		BTRFS_I(inode)->last_sub_trans = BTRFS_I(inode)->root->log_transid;
>    8420		BTRFS_I(inode)->last_log_commit = BTRFS_I(inode)->root->last_log_commit;
>    8421
>    8422		unlock_extent_cached(io_tree, page_start, page_end, &cached_state);
>    8423
>    8424		btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE);
>    8425		sb_end_pagefault(inode->i_sb);
>    8426		extent_changeset_free(data_reserved);
>    8427		return VM_FAULT_LOCKED;
>    8428
>    8429	out_unlock:
>    8430		unlock_page(page);
>    8431	out:
>    8432		btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE);
>    8433		btrfs_delalloc_release_space(BTRFS_I(inode), data_reserved, page_start,
>    8434					     reserved_space, (ret != 0));
>    8435	out_noreserve:
>    8436		sb_end_pagefault(inode->i_sb);
>    8437		extent_changeset_free(data_reserved);
>    8438		return ret;
>    8439	}
>    8440
>
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 20/22] btrfs: introduce btrfs_subpage for data inodes
  2021-01-06  5:32     ` Qu Wenruo
@ 2021-01-06  6:48         ` Rong Chen
  0 siblings, 0 replies; 37+ messages in thread
From: Rong Chen @ 2021-01-06  6:48 UTC (permalink / raw)
  To: Qu Wenruo, kernel test robot, Qu Wenruo, linux-btrfs; +Cc: kbuild-all



On 1/6/21 1:32 PM, Qu Wenruo wrote:
>
>
> On 2021/1/6 下午1:04, kernel test robot wrote:
>> Hi Qu,
>>
>> Thank you for the patch! Perhaps something to improve:
>>
>> [auto build test WARNING on kdave/for-next]
>> [also build test WARNING on v5.11-rc2 next-20210104]
>> [If your patch is applied to the wrong git tree, kindly drop us a note.
>> And when submitting patch, we suggest to use '--base' as documented in
>> https://git-scm.com/docs/git-format-patch]
>>
>> url: 
>> https://github.com/0day-ci/linux/commits/Qu-Wenruo/btrfs-add-read-only-support-for-subpage-sector-size/20210106-090847
>> base: https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git 
>> for-next
>> config: m68k-randconfig-s032-20210106 (attached as .config)
>> compiler: m68k-linux-gcc (GCC) 9.3.0
>> reproduce:
>>          wget 
>> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross 
>> -O ~/bin/make.cross
>>          chmod +x ~/bin/make.cross
>>          # apt-get install sparse
>>          # sparse version: v0.6.3-208-g46a52ca4-dirty
>>          # 
>> https://github.com/0day-ci/linux/commit/2c54bbf363f66a7c4d489fa0b7967ce5fc960afb
>>          git remote add linux-review https://github.com/0day-ci/linux
>>          git fetch --no-tags linux-review 
>> Qu-Wenruo/btrfs-add-read-only-support-for-subpage-sector-size/20210106-090847
>>          git checkout 2c54bbf363f66a7c4d489fa0b7967ce5fc960afb
>>          # save the attached .config to linux build tree
>>          COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 
>> make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=m68k
>>
>> If you fix the issue, kindly add following tag as appropriate
>> Reported-by: kernel test robot <lkp@intel.com>
>>
>>
>> "sparse warnings: (new ones prefixed by >>)"
>>>> fs/btrfs/inode.c:8352:13: sparse: sparse: incorrect type in 
>>>> assignment (different base types) @@     expected restricted 
>>>> vm_fault_t [assigned] [usertype] ret @@     got int @@
>>     fs/btrfs/inode.c:8352:13: sparse:     expected restricted 
>> vm_fault_t [assigned] [usertype] ret
>>     fs/btrfs/inode.c:8352:13: sparse:     got int
>>>> fs/btrfs/inode.c:8353:13: sparse: sparse: restricted vm_fault_t 
>>>> degrades to integer
>
> Why I always forgot this...
>
> Now it get properly fixed in github. Although the submitted patch is
> still using @ret other than @ret2.
>
> Is there anyway to let LKP to run on specific branch so that I can avoid
> similar problems.

Hi Qu,

LKP can test private branches if you can provide the git tree url, 
please see
https://github.com/intel/lkp-tests/wiki/LKP-FAQ#how-to-request-testing-a-new-kernel-tree-on-lkp

Best Regards,
Rong Chen

>
> Thanks,
> Qu
>>     fs/btrfs/inode.c: note: in included file (through 
>> include/linux/mmzone.h, include/linux/gfp.h, include/linux/slab.h, ...):
>>     include/linux/spinlock.h:394:9: sparse: sparse: context imbalance 
>> in 'run_delayed_iput_locked' - unexpected unlock
>>
>> vim +8352 fs/btrfs/inode.c
>>
>>    8275
>>    8276    /*
>>    8277     * btrfs_page_mkwrite() is not allowed to change the file 
>> size as it gets
>>    8278     * called from a page fault handler when a page is first 
>> dirtied. Hence we must
>>    8279     * be careful to check for EOF conditions here. We set the 
>> page up correctly
>>    8280     * for a written page which means we get ENOSPC checking 
>> when writing into
>>    8281     * holes and correct delalloc and unwritten extent mapping 
>> on filesystems that
>>    8282     * support these features.
>>    8283     *
>>    8284     * We are not allowed to take the i_mutex here so we have 
>> to play games to
>>    8285     * protect against truncate races as the page could now be 
>> beyond EOF.  Because
>>    8286     * truncate_setsize() writes the inode size before 
>> removing pages, once we have
>>    8287     * the page lock we can determine safely if the page is 
>> beyond EOF. If it is not
>>    8288     * beyond EOF, then the page is guaranteed safe against 
>> truncation until we
>>    8289     * unlock the page.
>>    8290     */
>>    8291    vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
>>    8292    {
>>    8293        struct page *page = vmf->page;
>>    8294        struct inode *inode = file_inode(vmf->vma->vm_file);
>>    8295        struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>>    8296        struct extent_io_tree *io_tree = 
>> &BTRFS_I(inode)->io_tree;
>>    8297        struct btrfs_ordered_extent *ordered;
>>    8298        struct extent_state *cached_state = NULL;
>>    8299        struct extent_changeset *data_reserved = NULL;
>>    8300        char *kaddr;
>>    8301        unsigned long zero_start;
>>    8302        loff_t size;
>>    8303        vm_fault_t ret;
>>    8304        int ret2;
>>    8305        int reserved = 0;
>>    8306        u64 reserved_space;
>>    8307        u64 page_start;
>>    8308        u64 page_end;
>>    8309        u64 end;
>>    8310
>>    8311        reserved_space = PAGE_SIZE;
>>    8312
>>    8313        sb_start_pagefault(inode->i_sb);
>>    8314        page_start = page_offset(page);
>>    8315        page_end = page_start + PAGE_SIZE - 1;
>>    8316        end = page_end;
>>    8317
>>    8318        /*
>>    8319         * Reserving delalloc space after obtaining the page 
>> lock can lead to
>>    8320         * deadlock. For example, if a dirty page is locked by 
>> this function
>>    8321         * and the call to btrfs_delalloc_reserve_space() ends 
>> up triggering
>>    8322         * dirty page write out, then the btrfs_writepage() 
>> function could
>>    8323         * end up waiting indefinitely to get a lock on the 
>> page currently
>>    8324         * being processed by btrfs_page_mkwrite() function.
>>    8325         */
>>    8326        ret2 = btrfs_delalloc_reserve_space(BTRFS_I(inode), 
>> &data_reserved,
>>    8327                            page_start, reserved_space);
>>    8328        if (!ret2) {
>>    8329            ret2 = file_update_time(vmf->vma->vm_file);
>>    8330            reserved = 1;
>>    8331        }
>>    8332        if (ret2) {
>>    8333            ret = vmf_error(ret2);
>>    8334            if (reserved)
>>    8335                goto out;
>>    8336            goto out_noreserve;
>>    8337        }
>>    8338
>>    8339        ret = VM_FAULT_NOPAGE; /* make the VM retry the fault */
>>    8340    again:
>>    8341        lock_page(page);
>>    8342        size = i_size_read(inode);
>>    8343
>>    8344        if ((page->mapping != inode->i_mapping) ||
>>    8345            (page_start >= size)) {
>>    8346            /* page got truncated out from underneath us */
>>    8347            goto out_unlock;
>>    8348        }
>>    8349        wait_on_page_writeback(page);
>>    8350
>>    8351        lock_extent_bits(io_tree, page_start, page_end, 
>> &cached_state);
>>> 8352        ret = set_page_extent_mapped(page);
>>> 8353        if (ret < 0)
>>    8354            goto out_unlock;
>>    8355
>>    8356        /*
>>    8357         * we can't set the delalloc bits if there are pending 
>> ordered
>>    8358         * extents.  Drop our locks and wait for them to finish
>>    8359         */
>>    8360        ordered = btrfs_lookup_ordered_range(BTRFS_I(inode), 
>> page_start,
>>    8361                PAGE_SIZE);
>>    8362        if (ordered) {
>>    8363            unlock_extent_cached(io_tree, page_start, page_end,
>>    8364                         &cached_state);
>>    8365            unlock_page(page);
>>    8366            btrfs_start_ordered_extent(ordered, 1);
>>    8367            btrfs_put_ordered_extent(ordered);
>>    8368            goto again;
>>    8369        }
>>    8370
>>    8371        if (page->index == ((size - 1) >> PAGE_SHIFT)) {
>>    8372            reserved_space = round_up(size - page_start,
>>    8373                          fs_info->sectorsize);
>>    8374            if (reserved_space < PAGE_SIZE) {
>>    8375                end = page_start + reserved_space - 1;
>>    8376 btrfs_delalloc_release_space(BTRFS_I(inode),
>>    8377                        data_reserved, page_start,
>>    8378                        PAGE_SIZE - reserved_space, true);
>>    8379            }
>>    8380        }
>>    8381
>>    8382        /*
>>    8383         * page_mkwrite gets called when the page is firstly 
>> dirtied after it's
>>    8384         * faulted in, but write(2) could also dirty a page 
>> and set delalloc
>>    8385         * bits, thus in this case for space account reason, 
>> we still need to
>>    8386         * clear any delalloc bits within this page range 
>> since we have to
>>    8387         * reserve data&meta space before lock_page() (see 
>> above comments).
>>    8388         */
>>    8389        clear_extent_bit(&BTRFS_I(inode)->io_tree, page_start, 
>> end,
>>    8390                  EXTENT_DELALLOC | EXTENT_DO_ACCOUNTING |
>>    8391                  EXTENT_DEFRAG, 0, 0, &cached_state);
>>    8392
>>    8393        ret2 = btrfs_set_extent_delalloc(BTRFS_I(inode), 
>> page_start, end, 0,
>>    8394                        &cached_state);
>>    8395        if (ret2) {
>>    8396            unlock_extent_cached(io_tree, page_start, page_end,
>>    8397                         &cached_state);
>>    8398            ret = VM_FAULT_SIGBUS;
>>    8399            goto out_unlock;
>>    8400        }
>>    8401
>>    8402        /* page is wholly or partially inside EOF */
>>    8403        if (page_start + PAGE_SIZE > size)
>>    8404            zero_start = offset_in_page(size);
>>    8405        else
>>    8406            zero_start = PAGE_SIZE;
>>    8407
>>    8408        if (zero_start != PAGE_SIZE) {
>>    8409            kaddr = kmap(page);
>>    8410            memset(kaddr + zero_start, 0, PAGE_SIZE - 
>> zero_start);
>>    8411            flush_dcache_page(page);
>>    8412            kunmap(page);
>>    8413        }
>>    8414        ClearPageChecked(page);
>>    8415        set_page_dirty(page);
>>    8416        SetPageUptodate(page);
>>    8417
>>    8418        BTRFS_I(inode)->last_trans = fs_info->generation;
>>    8419        BTRFS_I(inode)->last_sub_trans = 
>> BTRFS_I(inode)->root->log_transid;
>>    8420        BTRFS_I(inode)->last_log_commit = 
>> BTRFS_I(inode)->root->last_log_commit;
>>    8421
>>    8422        unlock_extent_cached(io_tree, page_start, page_end, 
>> &cached_state);
>>    8423
>>    8424        btrfs_delalloc_release_extents(BTRFS_I(inode), 
>> PAGE_SIZE);
>>    8425        sb_end_pagefault(inode->i_sb);
>>    8426        extent_changeset_free(data_reserved);
>>    8427        return VM_FAULT_LOCKED;
>>    8428
>>    8429    out_unlock:
>>    8430        unlock_page(page);
>>    8431    out:
>>    8432        btrfs_delalloc_release_extents(BTRFS_I(inode), 
>> PAGE_SIZE);
>>    8433        btrfs_delalloc_release_space(BTRFS_I(inode), 
>> data_reserved, page_start,
>>    8434                         reserved_space, (ret != 0));
>>    8435    out_noreserve:
>>    8436        sb_end_pagefault(inode->i_sb);
>>    8437        extent_changeset_free(data_reserved);
>>    8438        return ret;
>>    8439    }
>>    8440
>>
>> ---
>> 0-DAY CI Kernel Test Service, Intel Corporation
>> https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
>>
>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 20/22] btrfs: introduce btrfs_subpage for data inodes
@ 2021-01-06  6:48         ` Rong Chen
  0 siblings, 0 replies; 37+ messages in thread
From: Rong Chen @ 2021-01-06  6:48 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 13319 bytes --]



On 1/6/21 1:32 PM, Qu Wenruo wrote:
>
>
> On 2021/1/6 下午1:04, kernel test robot wrote:
>> Hi Qu,
>>
>> Thank you for the patch! Perhaps something to improve:
>>
>> [auto build test WARNING on kdave/for-next]
>> [also build test WARNING on v5.11-rc2 next-20210104]
>> [If your patch is applied to the wrong git tree, kindly drop us a note.
>> And when submitting patch, we suggest to use '--base' as documented in
>> https://git-scm.com/docs/git-format-patch]
>>
>> url: 
>> https://github.com/0day-ci/linux/commits/Qu-Wenruo/btrfs-add-read-only-support-for-subpage-sector-size/20210106-090847
>> base: https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git 
>> for-next
>> config: m68k-randconfig-s032-20210106 (attached as .config)
>> compiler: m68k-linux-gcc (GCC) 9.3.0
>> reproduce:
>>          wget 
>> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross 
>> -O ~/bin/make.cross
>>          chmod +x ~/bin/make.cross
>>          # apt-get install sparse
>>          # sparse version: v0.6.3-208-g46a52ca4-dirty
>>          # 
>> https://github.com/0day-ci/linux/commit/2c54bbf363f66a7c4d489fa0b7967ce5fc960afb
>>          git remote add linux-review https://github.com/0day-ci/linux
>>          git fetch --no-tags linux-review 
>> Qu-Wenruo/btrfs-add-read-only-support-for-subpage-sector-size/20210106-090847
>>          git checkout 2c54bbf363f66a7c4d489fa0b7967ce5fc960afb
>>          # save the attached .config to linux build tree
>>          COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 
>> make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=m68k
>>
>> If you fix the issue, kindly add following tag as appropriate
>> Reported-by: kernel test robot <lkp@intel.com>
>>
>>
>> "sparse warnings: (new ones prefixed by >>)"
>>>> fs/btrfs/inode.c:8352:13: sparse: sparse: incorrect type in 
>>>> assignment (different base types) @@     expected restricted 
>>>> vm_fault_t [assigned] [usertype] ret @@     got int @@
>>     fs/btrfs/inode.c:8352:13: sparse:     expected restricted 
>> vm_fault_t [assigned] [usertype] ret
>>     fs/btrfs/inode.c:8352:13: sparse:     got int
>>>> fs/btrfs/inode.c:8353:13: sparse: sparse: restricted vm_fault_t 
>>>> degrades to integer
>
> Why I always forgot this...
>
> Now it get properly fixed in github. Although the submitted patch is
> still using @ret other than @ret2.
>
> Is there anyway to let LKP to run on specific branch so that I can avoid
> similar problems.

Hi Qu,

LKP can test private branches if you can provide the git tree url, 
please see
https://github.com/intel/lkp-tests/wiki/LKP-FAQ#how-to-request-testing-a-new-kernel-tree-on-lkp

Best Regards,
Rong Chen

>
> Thanks,
> Qu
>>     fs/btrfs/inode.c: note: in included file (through 
>> include/linux/mmzone.h, include/linux/gfp.h, include/linux/slab.h, ...):
>>     include/linux/spinlock.h:394:9: sparse: sparse: context imbalance 
>> in 'run_delayed_iput_locked' - unexpected unlock
>>
>> vim +8352 fs/btrfs/inode.c
>>
>>    8275
>>    8276    /*
>>    8277     * btrfs_page_mkwrite() is not allowed to change the file 
>> size as it gets
>>    8278     * called from a page fault handler when a page is first 
>> dirtied. Hence we must
>>    8279     * be careful to check for EOF conditions here. We set the 
>> page up correctly
>>    8280     * for a written page which means we get ENOSPC checking 
>> when writing into
>>    8281     * holes and correct delalloc and unwritten extent mapping 
>> on filesystems that
>>    8282     * support these features.
>>    8283     *
>>    8284     * We are not allowed to take the i_mutex here so we have 
>> to play games to
>>    8285     * protect against truncate races as the page could now be 
>> beyond EOF.  Because
>>    8286     * truncate_setsize() writes the inode size before 
>> removing pages, once we have
>>    8287     * the page lock we can determine safely if the page is 
>> beyond EOF. If it is not
>>    8288     * beyond EOF, then the page is guaranteed safe against 
>> truncation until we
>>    8289     * unlock the page.
>>    8290     */
>>    8291    vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
>>    8292    {
>>    8293        struct page *page = vmf->page;
>>    8294        struct inode *inode = file_inode(vmf->vma->vm_file);
>>    8295        struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>>    8296        struct extent_io_tree *io_tree = 
>> &BTRFS_I(inode)->io_tree;
>>    8297        struct btrfs_ordered_extent *ordered;
>>    8298        struct extent_state *cached_state = NULL;
>>    8299        struct extent_changeset *data_reserved = NULL;
>>    8300        char *kaddr;
>>    8301        unsigned long zero_start;
>>    8302        loff_t size;
>>    8303        vm_fault_t ret;
>>    8304        int ret2;
>>    8305        int reserved = 0;
>>    8306        u64 reserved_space;
>>    8307        u64 page_start;
>>    8308        u64 page_end;
>>    8309        u64 end;
>>    8310
>>    8311        reserved_space = PAGE_SIZE;
>>    8312
>>    8313        sb_start_pagefault(inode->i_sb);
>>    8314        page_start = page_offset(page);
>>    8315        page_end = page_start + PAGE_SIZE - 1;
>>    8316        end = page_end;
>>    8317
>>    8318        /*
>>    8319         * Reserving delalloc space after obtaining the page 
>> lock can lead to
>>    8320         * deadlock. For example, if a dirty page is locked by 
>> this function
>>    8321         * and the call to btrfs_delalloc_reserve_space() ends 
>> up triggering
>>    8322         * dirty page write out, then the btrfs_writepage() 
>> function could
>>    8323         * end up waiting indefinitely to get a lock on the 
>> page currently
>>    8324         * being processed by btrfs_page_mkwrite() function.
>>    8325         */
>>    8326        ret2 = btrfs_delalloc_reserve_space(BTRFS_I(inode), 
>> &data_reserved,
>>    8327                            page_start, reserved_space);
>>    8328        if (!ret2) {
>>    8329            ret2 = file_update_time(vmf->vma->vm_file);
>>    8330            reserved = 1;
>>    8331        }
>>    8332        if (ret2) {
>>    8333            ret = vmf_error(ret2);
>>    8334            if (reserved)
>>    8335                goto out;
>>    8336            goto out_noreserve;
>>    8337        }
>>    8338
>>    8339        ret = VM_FAULT_NOPAGE; /* make the VM retry the fault */
>>    8340    again:
>>    8341        lock_page(page);
>>    8342        size = i_size_read(inode);
>>    8343
>>    8344        if ((page->mapping != inode->i_mapping) ||
>>    8345            (page_start >= size)) {
>>    8346            /* page got truncated out from underneath us */
>>    8347            goto out_unlock;
>>    8348        }
>>    8349        wait_on_page_writeback(page);
>>    8350
>>    8351        lock_extent_bits(io_tree, page_start, page_end, 
>> &cached_state);
>>> 8352        ret = set_page_extent_mapped(page);
>>> 8353        if (ret < 0)
>>    8354            goto out_unlock;
>>    8355
>>    8356        /*
>>    8357         * we can't set the delalloc bits if there are pending 
>> ordered
>>    8358         * extents.  Drop our locks and wait for them to finish
>>    8359         */
>>    8360        ordered = btrfs_lookup_ordered_range(BTRFS_I(inode), 
>> page_start,
>>    8361                PAGE_SIZE);
>>    8362        if (ordered) {
>>    8363            unlock_extent_cached(io_tree, page_start, page_end,
>>    8364                         &cached_state);
>>    8365            unlock_page(page);
>>    8366            btrfs_start_ordered_extent(ordered, 1);
>>    8367            btrfs_put_ordered_extent(ordered);
>>    8368            goto again;
>>    8369        }
>>    8370
>>    8371        if (page->index == ((size - 1) >> PAGE_SHIFT)) {
>>    8372            reserved_space = round_up(size - page_start,
>>    8373                          fs_info->sectorsize);
>>    8374            if (reserved_space < PAGE_SIZE) {
>>    8375                end = page_start + reserved_space - 1;
>>    8376 btrfs_delalloc_release_space(BTRFS_I(inode),
>>    8377                        data_reserved, page_start,
>>    8378                        PAGE_SIZE - reserved_space, true);
>>    8379            }
>>    8380        }
>>    8381
>>    8382        /*
>>    8383         * page_mkwrite gets called when the page is firstly 
>> dirtied after it's
>>    8384         * faulted in, but write(2) could also dirty a page 
>> and set delalloc
>>    8385         * bits, thus in this case for space account reason, 
>> we still need to
>>    8386         * clear any delalloc bits within this page range 
>> since we have to
>>    8387         * reserve data&meta space before lock_page() (see 
>> above comments).
>>    8388         */
>>    8389        clear_extent_bit(&BTRFS_I(inode)->io_tree, page_start, 
>> end,
>>    8390                  EXTENT_DELALLOC | EXTENT_DO_ACCOUNTING |
>>    8391                  EXTENT_DEFRAG, 0, 0, &cached_state);
>>    8392
>>    8393        ret2 = btrfs_set_extent_delalloc(BTRFS_I(inode), 
>> page_start, end, 0,
>>    8394                        &cached_state);
>>    8395        if (ret2) {
>>    8396            unlock_extent_cached(io_tree, page_start, page_end,
>>    8397                         &cached_state);
>>    8398            ret = VM_FAULT_SIGBUS;
>>    8399            goto out_unlock;
>>    8400        }
>>    8401
>>    8402        /* page is wholly or partially inside EOF */
>>    8403        if (page_start + PAGE_SIZE > size)
>>    8404            zero_start = offset_in_page(size);
>>    8405        else
>>    8406            zero_start = PAGE_SIZE;
>>    8407
>>    8408        if (zero_start != PAGE_SIZE) {
>>    8409            kaddr = kmap(page);
>>    8410            memset(kaddr + zero_start, 0, PAGE_SIZE - 
>> zero_start);
>>    8411            flush_dcache_page(page);
>>    8412            kunmap(page);
>>    8413        }
>>    8414        ClearPageChecked(page);
>>    8415        set_page_dirty(page);
>>    8416        SetPageUptodate(page);
>>    8417
>>    8418        BTRFS_I(inode)->last_trans = fs_info->generation;
>>    8419        BTRFS_I(inode)->last_sub_trans = 
>> BTRFS_I(inode)->root->log_transid;
>>    8420        BTRFS_I(inode)->last_log_commit = 
>> BTRFS_I(inode)->root->last_log_commit;
>>    8421
>>    8422        unlock_extent_cached(io_tree, page_start, page_end, 
>> &cached_state);
>>    8423
>>    8424        btrfs_delalloc_release_extents(BTRFS_I(inode), 
>> PAGE_SIZE);
>>    8425        sb_end_pagefault(inode->i_sb);
>>    8426        extent_changeset_free(data_reserved);
>>    8427        return VM_FAULT_LOCKED;
>>    8428
>>    8429    out_unlock:
>>    8430        unlock_page(page);
>>    8431    out:
>>    8432        btrfs_delalloc_release_extents(BTRFS_I(inode), 
>> PAGE_SIZE);
>>    8433        btrfs_delalloc_release_space(BTRFS_I(inode), 
>> data_reserved, page_start,
>>    8434                         reserved_space, (ret != 0));
>>    8435    out_noreserve:
>>    8436        sb_end_pagefault(inode->i_sb);
>>    8437        extent_changeset_free(data_reserved);
>>    8438        return ret;
>>    8439    }
>>    8440
>>
>> ---
>> 0-DAY CI Kernel Test Service, Intel Corporation
>> https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org
>>
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 08/22] btrfs: extent_io: make attach_extent_buffer_page() to handle subpage case
  2021-01-06  1:01 ` [PATCH v3 08/22] btrfs: extent_io: make attach_extent_buffer_page() to handle subpage case Qu Wenruo
@ 2021-01-06  6:54   ` Qu Wenruo
  2021-01-07  1:35   ` Qu Wenruo
  1 sibling, 0 replies; 37+ messages in thread
From: Qu Wenruo @ 2021-01-06  6:54 UTC (permalink / raw)
  To: linux-btrfs



On 2021/1/6 上午9:01, Qu Wenruo wrote:
> For subpage case, we need to allocate new memory for each metadata page.
> 
> So we need to:
> - Allow attach_extent_buffer_page() to return int
>    To indicate allocation failure
> 
> - Prealloc page->private for alloc_extent_buffer()
>    We don't want to call memory allocation with spinlock hold, so
>    do preallocation before we acquire the spin lock.
> 
> - Handle subpage and regular case differently in
>    attach_extent_buffer_page()
>    For regular case, just do the usual thing.
>    For subpage case, allocate new memory and update the tree_block
>    bitmap.
> 
>    The bitmap update will be handled by new subpage specific helper,
>    btrfs_subpage_set_tree_block().
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>   fs/btrfs/extent_io.c | 74 ++++++++++++++++++++++++++++++++++----------
>   fs/btrfs/subpage.h   | 50 ++++++++++++++++++++++++++++++
>   2 files changed, 108 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index d60f1837f8fb..2eeff925450f 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -24,6 +24,7 @@
>   #include "rcu-string.h"
>   #include "backref.h"
>   #include "disk-io.h"
> +#include "subpage.h"
>   
>   static struct kmem_cache *extent_state_cache;
>   static struct kmem_cache *extent_buffer_cache;
> @@ -3140,22 +3141,41 @@ static int submit_extent_page(unsigned int opf,
>   	return ret;
>   }
>   
> -static void attach_extent_buffer_page(struct extent_buffer *eb,
> +static int attach_extent_buffer_page(struct extent_buffer *eb,
>   				      struct page *page)
>   {
> -	/*
> -	 * If the page is mapped to btree inode, we should hold the private
> -	 * lock to prevent race.
> -	 * For cloned or dummy extent buffers, their pages are not mapped and
> -	 * will not race with any other ebs.
> -	 */
> -	if (page->mapping)
> -		lockdep_assert_held(&page->mapping->private_lock);
> +	struct btrfs_fs_info *fs_info = eb->fs_info;
> +	int ret;
>   
> -	if (!PagePrivate(page))
> -		attach_page_private(page, eb);
> -	else
> -		WARN_ON(page->private != (unsigned long)eb);
> +	if (fs_info->sectorsize == PAGE_SIZE) {
> +		/*
> +		 * If the page is mapped to btree inode, we should hold the
> +		 * private lock to prevent race.
> +		 * For cloned or dummy extent buffers, their pages are not
> +		 * mapped and will not race with any other ebs.
> +		 */
> +		if (page->mapping)
> +			lockdep_assert_held(&page->mapping->private_lock);
> +
> +		if (!PagePrivate(page))
> +			attach_page_private(page, eb);
> +		else
> +			WARN_ON(page->private != (unsigned long)eb);
> +		return 0;
> +	}
> +
> +	/* Already mapped, just update the existing range */
> +	if (PagePrivate(page))
> +		goto update_bitmap;
> +
> +	/* Do new allocation to attach subpage */
> +	ret = btrfs_attach_subpage(fs_info, page);
> +	if (ret < 0)
> +		return ret;
> +
> +update_bitmap:
> +	btrfs_subpage_set_tree_block(fs_info, page, eb->start, eb->len);
> +	return 0;
>   }
>   
>   void set_page_extent_mapped(struct page *page)
> @@ -5063,21 +5083,29 @@ struct extent_buffer *btrfs_clone_extent_buffer(const struct extent_buffer *src)
>   	if (new == NULL)
>   		return NULL;
>   
> +	set_bit(EXTENT_BUFFER_UPTODATE, &new->bflags);
> +	set_bit(EXTENT_BUFFER_UNMAPPED, &new->bflags);
> +
>   	for (i = 0; i < num_pages; i++) {
> +		int ret;
> +
>   		p = alloc_page(GFP_NOFS);
>   		if (!p) {
>   			btrfs_release_extent_buffer(new);
>   			return NULL;
>   		}
> -		attach_extent_buffer_page(new, p);
> +		ret = attach_extent_buffer_page(new, p);
> +		if (ret < 0) {
> +			put_page(p);
> +			btrfs_release_extent_buffer(new);
> +			return NULL;
> +		}
>   		WARN_ON(PageDirty(p));
>   		SetPageUptodate(p);
>   		new->pages[i] = p;
>   		copy_page(page_address(p), page_address(src->pages[i]));
>   	}
>   
> -	set_bit(EXTENT_BUFFER_UPTODATE, &new->bflags);
> -	set_bit(EXTENT_BUFFER_UNMAPPED, &new->bflags);
>   
>   	return new;
>   }
> @@ -5316,6 +5344,18 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
>   			goto free_eb;
>   		}
>   
> +		/*
> +		 * Preallocate page->private for subpage case, so that
> +		 * we won't allocate memory with private_lock hold.
> +		 */
> +		ret = btrfs_attach_subpage(fs_info, p);

Although we try to preallocate the subpage structure here, it's not 
reliable.

I have hit a case just minutes before, where we still try to allocate 
memory inside that private_lock spinlock, and causes sleep inside atomic 
warning.

The problem is, we can have a race where the page has one existing eb, 
and it's being freed.

At this point, we still have page::private.

But before we acquire that private_lock, the eb get freed and since it's 
the only eb of that page (our eb hasn't yet being added to the page), it 
detach page private.

> +		if (ret < 0) {
> +			unlock_page(p);
> +			put_page(p);
> +			exists = ERR_PTR(-ENOMEM);
> +			goto free_eb;
> +		}
> +
>   		spin_lock(&mapping->private_lock);
>   		exists = grab_extent_buffer(p);
>   		if (exists) {
> @@ -5325,8 +5365,10 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
>   			mark_extent_buffer_accessed(exists, p);
>   			goto free_eb;
>   		}
> +		/* Should not fail, as we have attached the subpage already */
>   		attach_extent_buffer_page(eb, p);

So here we can not rely on any result before we acquire private_lock.

Thus I guess we have to pre-allocate the memory manually and pass the 
pointer in.

Why I didn't hit the bug before sending the patches...

Thanks,
Qu

>   		spin_unlock(&mapping->private_lock);
> +
>   		WARN_ON(PageDirty(p));
>   		eb->pages[i] = p;
>   		if (!PageUptodate(p))
> diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
> index 96f3b226913e..e49d4a7329e1 100644
> --- a/fs/btrfs/subpage.h
> +++ b/fs/btrfs/subpage.h
> @@ -23,9 +23,59 @@
>   struct btrfs_subpage {
>   	/* Common members for both data and metadata pages */
>   	spinlock_t lock;
> +	union {
> +		/* Structures only used by metadata */
> +		struct {
> +			u16 tree_block_bitmap;
> +		};
> +		/* structures only used by data */
> +	};
>   };
>   
>   int btrfs_attach_subpage(struct btrfs_fs_info *fs_info, struct page *page);
>   void btrfs_detach_subpage(struct btrfs_fs_info *fs_info, struct page *page);
>   
> +/*
> + * Convert the [start, start + len) range into a u16 bitmap
> + *
> + * E.g. if start == page_offset() + 16K, len = 16K, we get 0x00f0.
> + */
> +static inline u16 btrfs_subpage_calc_bitmap(struct btrfs_fs_info *fs_info,
> +			struct page *page, u64 start, u32 len)
> +{
> +	int bit_start = offset_in_page(start) >> fs_info->sectorsize_bits;
> +	int nbits = len >> fs_info->sectorsize_bits;
> +
> +	/* Basic checks */
> +	ASSERT(PagePrivate(page) && page->private);
> +	ASSERT(IS_ALIGNED(start, fs_info->sectorsize) &&
> +	       IS_ALIGNED(len, fs_info->sectorsize));
> +
> +	/*
> +	 * The range check only works for mapped page, we can
> +	 * still have unampped page like dummy extent buffer pages.
> +	 */
> +	if (page->mapping)
> +		ASSERT(page_offset(page) <= start &&
> +			start + len <= page_offset(page) + PAGE_SIZE);
> +	/*
> +	 * Here nbits can be 16, thus can go beyond u16 range. Here we make the
> +	 * first left shift to be calculated in unsigned long (u32), then
> +	 * truncate the result to u16.
> +	 */
> +	return (u16)(((1UL << nbits) - 1) << bit_start);
> +}
> +
> +static inline void btrfs_subpage_set_tree_block(struct btrfs_fs_info *fs_info,
> +			struct page *page, u64 start, u32 len)
> +{
> +	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
> +	unsigned long flags;
> +	u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len);
> +
> +	spin_lock_irqsave(&subpage->lock, flags);
> +	subpage->tree_block_bitmap |= tmp;
> +	spin_unlock_irqrestore(&subpage->lock, flags);
> +}
> +
>   #endif /* BTRFS_SUBPAGE_H */
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 16/22] btrfs: extent_io: implement try_release_extent_buffer() for subpage metadata support
  2021-01-06  1:01 ` [PATCH v3 16/22] btrfs: extent_io: implement try_release_extent_buffer() for subpage metadata support Qu Wenruo
@ 2021-01-06  8:24   ` Qu Wenruo
  2021-01-06  8:43     ` Qu Wenruo
  0 siblings, 1 reply; 37+ messages in thread
From: Qu Wenruo @ 2021-01-06  8:24 UTC (permalink / raw)
  To: linux-btrfs



On 2021/1/6 上午9:01, Qu Wenruo wrote:
> Unlike the original try_release_extent_buffer,
> try_release_subpage_extent_buffer() will iterate through
> btrfs_subpage::tree_block_bitmap, and try to release each extent buffer.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>   fs/btrfs/extent_io.c | 76 ++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 76 insertions(+)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 194cb8b63216..792264f5c3c2 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -6258,10 +6258,86 @@ void memmove_extent_buffer(const struct extent_buffer *dst,
>   	}
>   }
>   
> +static int try_release_subpage_extent_buffer(struct page *page)
> +{
> +	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
> +	u64 page_start = page_offset(page);
> +	int bitmap_size = BTRFS_SUBPAGE_BITMAP_SIZE;
> +	int bit_start = 0;
> +	int ret;
> +
> +	while (bit_start < bitmap_size) {
> +		struct btrfs_subpage *subpage;
> +		struct extent_buffer *eb;
> +		unsigned long flags;
> +		u16 tmp = 1 << bit_start;
> +		u64 start;
> +
> +		/*
> +		 * Make sure the page still has private, as previous iteration
> +		 * can detach page private.
> +		 */
> +		spin_lock(&page->mapping->private_lock);
> +		if (!PagePrivate(page)) {
> +			spin_unlock(&page->mapping->private_lock);
> +			break;
> +		}
> +
> +		subpage = (struct btrfs_subpage *)page->private;
> +
> +		spin_lock_irqsave(&subpage->lock, flags);
> +		spin_unlock(&page->mapping->private_lock);
> +
> +		if (!(tmp & subpage->tree_block_bitmap))  {
> +			spin_unlock_irqrestore(&subpage->lock, flags);
> +			bit_start++;
> +			continue;
> +		}
> +
> +		start = bit_start * fs_info->sectorsize + page_start;
> +		bit_start += fs_info->nodesize >> fs_info->sectorsize_bits;
> +		/*
> +		 * Here we can't call find_extent_buffer() which will increase
> +		 * eb->refs.
> +		 */
> +		rcu_read_lock();
> +		eb = radix_tree_lookup(&fs_info->buffer_radix,
> +				start >> fs_info->sectorsize_bits);
> +		ASSERT(eb);

Another ASSERT() hit here. Surprised that I have never hit such case before.

Since in releasse_extent_buffer(), radix tree is removed first, then 
subpage tree_block_bitmap update, we could have cases where subpage 
tree_block_bitmap is set but no eb in radix tree.

In that case, we can safely go next bit and re-check.

The function return value is only bounded to if the page has private bit 
or not, so here we can safely continue.

Nik is right on this, we need better eb refs handling refactor, I'll 
investigate more time to make the eb refs handling better.

Thanks,
Qu
> +		spin_lock(&eb->refs_lock);
> +		if (atomic_read(&eb->refs) != 1 || extent_buffer_under_io(eb) ||
> +		    !test_and_clear_bit(EXTENT_BUFFER_TREE_REF, &eb->bflags)) {
> +			spin_unlock(&eb->refs_lock);
> +			rcu_read_unlock();
> +			continue;
> +		}
> +		rcu_read_unlock();
> +		spin_unlock_irqrestore(&subpage->lock, flags);
> +		/*
> +		 * Here we don't care the return value, we will always check
> +		 * the page private at the end.
> +		 * And release_extent_buffer() will release the refs_lock.
> +		 */
> +		release_extent_buffer(eb);
> +	}
> +	/* Finally to check if we have cleared page private */
> +	spin_lock(&page->mapping->private_lock);
> +	if (!PagePrivate(page))
> +		ret = 1;
> +	else
> +		ret = 0;
> +	spin_unlock(&page->mapping->private_lock);
> +	return ret;
> +
> +}
> +
>   int try_release_extent_buffer(struct page *page)
>   {
>   	struct extent_buffer *eb;
>   
> +	if (btrfs_sb(page->mapping->host->i_sb)->sectorsize < PAGE_SIZE)
> +		return try_release_subpage_extent_buffer(page);
> +
>   	/*
>   	 * We need to make sure nobody is attaching this page to an eb right
>   	 * now.
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 16/22] btrfs: extent_io: implement try_release_extent_buffer() for subpage metadata support
  2021-01-06  8:24   ` Qu Wenruo
@ 2021-01-06  8:43     ` Qu Wenruo
  0 siblings, 0 replies; 37+ messages in thread
From: Qu Wenruo @ 2021-01-06  8:43 UTC (permalink / raw)
  To: linux-btrfs



On 2021/1/6 下午4:24, Qu Wenruo wrote:
> 
> 
> On 2021/1/6 上午9:01, Qu Wenruo wrote:
>> Unlike the original try_release_extent_buffer,
>> try_release_subpage_extent_buffer() will iterate through
>> btrfs_subpage::tree_block_bitmap, and try to release each extent buffer.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   fs/btrfs/extent_io.c | 76 ++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 76 insertions(+)
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index 194cb8b63216..792264f5c3c2 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -6258,10 +6258,86 @@ void memmove_extent_buffer(const struct 
>> extent_buffer *dst,
>>       }
>>   }
>> +static int try_release_subpage_extent_buffer(struct page *page)
>> +{
>> +    struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
>> +    u64 page_start = page_offset(page);
>> +    int bitmap_size = BTRFS_SUBPAGE_BITMAP_SIZE;
>> +    int bit_start = 0;
>> +    int ret;
>> +
>> +    while (bit_start < bitmap_size) {
>> +        struct btrfs_subpage *subpage;
>> +        struct extent_buffer *eb;
>> +        unsigned long flags;
>> +        u16 tmp = 1 << bit_start;
>> +        u64 start;
>> +
>> +        /*
>> +         * Make sure the page still has private, as previous iteration
>> +         * can detach page private.
>> +         */
>> +        spin_lock(&page->mapping->private_lock);
>> +        if (!PagePrivate(page)) {
>> +            spin_unlock(&page->mapping->private_lock);
>> +            break;
>> +        }
>> +
>> +        subpage = (struct btrfs_subpage *)page->private;
>> +
>> +        spin_lock_irqsave(&subpage->lock, flags);
>> +        spin_unlock(&page->mapping->private_lock);
>> +
>> +        if (!(tmp & subpage->tree_block_bitmap))  {
>> +            spin_unlock_irqrestore(&subpage->lock, flags);
>> +            bit_start++;
>> +            continue;
>> +        }
>> +
>> +        start = bit_start * fs_info->sectorsize + page_start;
>> +        bit_start += fs_info->nodesize >> fs_info->sectorsize_bits;
>> +        /*
>> +         * Here we can't call find_extent_buffer() which will increase
>> +         * eb->refs.
>> +         */
>> +        rcu_read_lock();
>> +        eb = radix_tree_lookup(&fs_info->buffer_radix,
>> +                start >> fs_info->sectorsize_bits);
>> +        ASSERT(eb);
> 
> Another ASSERT() hit here. Surprised that I have never hit such case 
> before.
> 
> Since in releasse_extent_buffer(), radix tree is removed first, then 
> subpage tree_block_bitmap update, we could have cases where subpage 
> tree_block_bitmap is set but no eb in radix tree.
> 
> In that case, we can safely go next bit and re-check.
> 
> The function return value is only bounded to if the page has private bit 
> or not, so here we can safely continue.
> 
> Nik is right on this, we need better eb refs handling refactor, I'll 
> investigate more time to make the eb refs handling better.

The root problem here is, we have too many things to be synchronized for 
extent buffer.

We have eb::refs, eb::bflags (IN_TREE), buffer_radix, and the new 
subpage::tree_block_bitmap, they all need to be in sync with each other.

I'm wondering if we could find a good and clear enough way to handle 
extent buffers.

IMHO, we need to sacrifice some metadata performance (which is already 
poor enough), or there is really no better way to solve the mess...

Thanks,
Qu

> 
> Thanks,
> Qu
>> +        spin_lock(&eb->refs_lock);
>> +        if (atomic_read(&eb->refs) != 1 || extent_buffer_under_io(eb) ||
>> +            !test_and_clear_bit(EXTENT_BUFFER_TREE_REF, &eb->bflags)) {
>> +            spin_unlock(&eb->refs_lock);
>> +            rcu_read_unlock();
>> +            continue;
>> +        }
>> +        rcu_read_unlock();
>> +        spin_unlock_irqrestore(&subpage->lock, flags);
>> +        /*
>> +         * Here we don't care the return value, we will always check
>> +         * the page private at the end.
>> +         * And release_extent_buffer() will release the refs_lock.
>> +         */
>> +        release_extent_buffer(eb);
>> +    }
>> +    /* Finally to check if we have cleared page private */
>> +    spin_lock(&page->mapping->private_lock);
>> +    if (!PagePrivate(page))
>> +        ret = 1;
>> +    else
>> +        ret = 0;
>> +    spin_unlock(&page->mapping->private_lock);
>> +    return ret;
>> +
>> +}
>> +
>>   int try_release_extent_buffer(struct page *page)
>>   {
>>       struct extent_buffer *eb;
>> +    if (btrfs_sb(page->mapping->host->i_sb)->sectorsize < PAGE_SIZE)
>> +        return try_release_subpage_extent_buffer(page);
>> +
>>       /*
>>        * We need to make sure nobody is attaching this page to an eb 
>> right
>>        * now.
>>
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 08/22] btrfs: extent_io: make attach_extent_buffer_page() to handle subpage case
  2021-01-06  1:01 ` [PATCH v3 08/22] btrfs: extent_io: make attach_extent_buffer_page() to handle subpage case Qu Wenruo
  2021-01-06  6:54   ` Qu Wenruo
@ 2021-01-07  1:35   ` Qu Wenruo
  1 sibling, 0 replies; 37+ messages in thread
From: Qu Wenruo @ 2021-01-07  1:35 UTC (permalink / raw)
  To: linux-btrfs



On 2021/1/6 上午9:01, Qu Wenruo wrote:
> For subpage case, we need to allocate new memory for each metadata page.
> 
> So we need to:
> - Allow attach_extent_buffer_page() to return int
>    To indicate allocation failure
> 
> - Prealloc page->private for alloc_extent_buffer()
>    We don't want to call memory allocation with spinlock hold, so
>    do preallocation before we acquire the spin lock.
> 
> - Handle subpage and regular case differently in
>    attach_extent_buffer_page()
>    For regular case, just do the usual thing.
>    For subpage case, allocate new memory and update the tree_block
>    bitmap.
> 
>    The bitmap update will be handled by new subpage specific helper,
>    btrfs_subpage_set_tree_block().
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>   fs/btrfs/extent_io.c | 74 ++++++++++++++++++++++++++++++++++----------
>   fs/btrfs/subpage.h   | 50 ++++++++++++++++++++++++++++++
>   2 files changed, 108 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index d60f1837f8fb..2eeff925450f 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -24,6 +24,7 @@
>   #include "rcu-string.h"
>   #include "backref.h"
>   #include "disk-io.h"
> +#include "subpage.h"
>   
>   static struct kmem_cache *extent_state_cache;
>   static struct kmem_cache *extent_buffer_cache;
> @@ -3140,22 +3141,41 @@ static int submit_extent_page(unsigned int opf,
>   	return ret;
>   }
>   
> -static void attach_extent_buffer_page(struct extent_buffer *eb,
> +static int attach_extent_buffer_page(struct extent_buffer *eb,
>   				      struct page *page)
>   {
> -	/*
> -	 * If the page is mapped to btree inode, we should hold the private
> -	 * lock to prevent race.
> -	 * For cloned or dummy extent buffers, their pages are not mapped and
> -	 * will not race with any other ebs.
> -	 */
> -	if (page->mapping)
> -		lockdep_assert_held(&page->mapping->private_lock);
> +	struct btrfs_fs_info *fs_info = eb->fs_info;
> +	int ret;
>   
> -	if (!PagePrivate(page))
> -		attach_page_private(page, eb);
> -	else
> -		WARN_ON(page->private != (unsigned long)eb);
> +	if (fs_info->sectorsize == PAGE_SIZE) {
> +		/*
> +		 * If the page is mapped to btree inode, we should hold the
> +		 * private lock to prevent race.
> +		 * For cloned or dummy extent buffers, their pages are not
> +		 * mapped and will not race with any other ebs.
> +		 */
> +		if (page->mapping)
> +			lockdep_assert_held(&page->mapping->private_lock);
> +
> +		if (!PagePrivate(page))
> +			attach_page_private(page, eb);
> +		else
> +			WARN_ON(page->private != (unsigned long)eb);
> +		return 0;
> +	}
> +
> +	/* Already mapped, just update the existing range */
> +	if (PagePrivate(page))
> +		goto update_bitmap;
> +
> +	/* Do new allocation to attach subpage */
> +	ret = btrfs_attach_subpage(fs_info, page);
> +	if (ret < 0)
> +		return ret;
> +
> +update_bitmap:
> +	btrfs_subpage_set_tree_block(fs_info, page, eb->start, eb->len);
> +	return 0;
>   }
>   
>   void set_page_extent_mapped(struct page *page)
> @@ -5063,21 +5083,29 @@ struct extent_buffer *btrfs_clone_extent_buffer(const struct extent_buffer *src)
>   	if (new == NULL)
>   		return NULL;
>   
> +	set_bit(EXTENT_BUFFER_UPTODATE, &new->bflags);
> +	set_bit(EXTENT_BUFFER_UNMAPPED, &new->bflags);
> +
>   	for (i = 0; i < num_pages; i++) {
> +		int ret;
> +
>   		p = alloc_page(GFP_NOFS);
>   		if (!p) {
>   			btrfs_release_extent_buffer(new);
>   			return NULL;
>   		}
> -		attach_extent_buffer_page(new, p);
> +		ret = attach_extent_buffer_page(new, p);
> +		if (ret < 0) {
> +			put_page(p);
> +			btrfs_release_extent_buffer(new);
> +			return NULL;
> +		}
>   		WARN_ON(PageDirty(p));
>   		SetPageUptodate(p);
>   		new->pages[i] = p;
>   		copy_page(page_address(p), page_address(src->pages[i]));
>   	}
>   
> -	set_bit(EXTENT_BUFFER_UPTODATE, &new->bflags);
> -	set_bit(EXTENT_BUFFER_UNMAPPED, &new->bflags);
>   
>   	return new;
>   }
> @@ -5316,6 +5344,18 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
>   			goto free_eb;
>   		}
>   
> +		/*
> +		 * Preallocate page->private for subpage case, so that
> +		 * we won't allocate memory with private_lock hold.
> +		 */
> +		ret = btrfs_attach_subpage(fs_info, p);
> +		if (ret < 0) {
> +			unlock_page(p);
> +			put_page(p);
> +			exists = ERR_PTR(-ENOMEM);
> +			goto free_eb;
> +		}
> +
>   		spin_lock(&mapping->private_lock);
>   		exists = grab_extent_buffer(p);
>   		if (exists) {
> @@ -5325,8 +5365,10 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
>   			mark_extent_buffer_accessed(exists, p);
>   			goto free_eb;
>   		}
> +		/* Should not fail, as we have attached the subpage already */
>   		attach_extent_buffer_page(eb, p);
>   		spin_unlock(&mapping->private_lock);
> +
>   		WARN_ON(PageDirty(p));
>   		eb->pages[i] = p;
>   		if (!PageUptodate(p))
> diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
> index 96f3b226913e..e49d4a7329e1 100644
> --- a/fs/btrfs/subpage.h
> +++ b/fs/btrfs/subpage.h
> @@ -23,9 +23,59 @@
>   struct btrfs_subpage {
>   	/* Common members for both data and metadata pages */
>   	spinlock_t lock;
> +	union {
> +		/* Structures only used by metadata */
> +		struct {
> +			u16 tree_block_bitmap;
> +		};

After more investigation, the idea of using subpage structure to record 
tree blocks bitmap may not be a good idea.

The main problem is the synchronization for the bit and extent buffer, 
which have different lock schemas.
I'll try to use radix tree only to do the check, other than relying on 
some extra bit.

Thanks,
Qu

> +		/* structures only used by data */
> +	};
>   };
>   
>   int btrfs_attach_subpage(struct btrfs_fs_info *fs_info, struct page *page);
>   void btrfs_detach_subpage(struct btrfs_fs_info *fs_info, struct page *page);
>   
> +/*
> + * Convert the [start, start + len) range into a u16 bitmap
> + *
> + * E.g. if start == page_offset() + 16K, len = 16K, we get 0x00f0.
> + */
> +static inline u16 btrfs_subpage_calc_bitmap(struct btrfs_fs_info *fs_info,
> +			struct page *page, u64 start, u32 len)
> +{
> +	int bit_start = offset_in_page(start) >> fs_info->sectorsize_bits;
> +	int nbits = len >> fs_info->sectorsize_bits;
> +
> +	/* Basic checks */
> +	ASSERT(PagePrivate(page) && page->private);
> +	ASSERT(IS_ALIGNED(start, fs_info->sectorsize) &&
> +	       IS_ALIGNED(len, fs_info->sectorsize));
> +
> +	/*
> +	 * The range check only works for mapped page, we can
> +	 * still have unampped page like dummy extent buffer pages.
> +	 */
> +	if (page->mapping)
> +		ASSERT(page_offset(page) <= start &&
> +			start + len <= page_offset(page) + PAGE_SIZE);
> +	/*
> +	 * Here nbits can be 16, thus can go beyond u16 range. Here we make the
> +	 * first left shift to be calculated in unsigned long (u32), then
> +	 * truncate the result to u16.
> +	 */
> +	return (u16)(((1UL << nbits) - 1) << bit_start);
> +}
> +
> +static inline void btrfs_subpage_set_tree_block(struct btrfs_fs_info *fs_info,
> +			struct page *page, u64 start, u32 len)
> +{
> +	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
> +	unsigned long flags;
> +	u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len);
> +
> +	spin_lock_irqsave(&subpage->lock, flags);
> +	subpage->tree_block_bitmap |= tmp;
> +	spin_unlock_irqrestore(&subpage->lock, flags);
> +}
> +
>   #endif /* BTRFS_SUBPAGE_H */
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 20/22] btrfs: introduce btrfs_subpage for data inodes
  2021-01-06  1:01 ` [PATCH v3 20/22] btrfs: introduce btrfs_subpage for data inodes Qu Wenruo
@ 2021-01-09  9:53     ` kernel test robot
  2021-01-09  9:53     ` kernel test robot
  1 sibling, 0 replies; 37+ messages in thread
From: kernel test robot @ 2021-01-09  9:53 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 7480 bytes --]

Hi Qu,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on kdave/for-next]
[also build test WARNING on v5.11-rc2 next-20210108]
[cannot apply to btrfs/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Qu-Wenruo/btrfs-add-read-only-support-for-subpage-sector-size/20210106-090847
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
config: i386-randconfig-m021-20210108 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

smatch warnings:
fs/btrfs/inode.c:8353 btrfs_page_mkwrite() warn: unsigned 'ret' is never less than zero.

vim +/ret +8353 fs/btrfs/inode.c

  8275	
  8276	/*
  8277	 * btrfs_page_mkwrite() is not allowed to change the file size as it gets
  8278	 * called from a page fault handler when a page is first dirtied. Hence we must
  8279	 * be careful to check for EOF conditions here. We set the page up correctly
  8280	 * for a written page which means we get ENOSPC checking when writing into
  8281	 * holes and correct delalloc and unwritten extent mapping on filesystems that
  8282	 * support these features.
  8283	 *
  8284	 * We are not allowed to take the i_mutex here so we have to play games to
  8285	 * protect against truncate races as the page could now be beyond EOF.  Because
  8286	 * truncate_setsize() writes the inode size before removing pages, once we have
  8287	 * the page lock we can determine safely if the page is beyond EOF. If it is not
  8288	 * beyond EOF, then the page is guaranteed safe against truncation until we
  8289	 * unlock the page.
  8290	 */
  8291	vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
  8292	{
  8293		struct page *page = vmf->page;
  8294		struct inode *inode = file_inode(vmf->vma->vm_file);
  8295		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
  8296		struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
  8297		struct btrfs_ordered_extent *ordered;
  8298		struct extent_state *cached_state = NULL;
  8299		struct extent_changeset *data_reserved = NULL;
  8300		char *kaddr;
  8301		unsigned long zero_start;
  8302		loff_t size;
  8303		vm_fault_t ret;
  8304		int ret2;
  8305		int reserved = 0;
  8306		u64 reserved_space;
  8307		u64 page_start;
  8308		u64 page_end;
  8309		u64 end;
  8310	
  8311		reserved_space = PAGE_SIZE;
  8312	
  8313		sb_start_pagefault(inode->i_sb);
  8314		page_start = page_offset(page);
  8315		page_end = page_start + PAGE_SIZE - 1;
  8316		end = page_end;
  8317	
  8318		/*
  8319		 * Reserving delalloc space after obtaining the page lock can lead to
  8320		 * deadlock. For example, if a dirty page is locked by this function
  8321		 * and the call to btrfs_delalloc_reserve_space() ends up triggering
  8322		 * dirty page write out, then the btrfs_writepage() function could
  8323		 * end up waiting indefinitely to get a lock on the page currently
  8324		 * being processed by btrfs_page_mkwrite() function.
  8325		 */
  8326		ret2 = btrfs_delalloc_reserve_space(BTRFS_I(inode), &data_reserved,
  8327						    page_start, reserved_space);
  8328		if (!ret2) {
  8329			ret2 = file_update_time(vmf->vma->vm_file);
  8330			reserved = 1;
  8331		}
  8332		if (ret2) {
  8333			ret = vmf_error(ret2);
  8334			if (reserved)
  8335				goto out;
  8336			goto out_noreserve;
  8337		}
  8338	
  8339		ret = VM_FAULT_NOPAGE; /* make the VM retry the fault */
  8340	again:
  8341		lock_page(page);
  8342		size = i_size_read(inode);
  8343	
  8344		if ((page->mapping != inode->i_mapping) ||
  8345		    (page_start >= size)) {
  8346			/* page got truncated out from underneath us */
  8347			goto out_unlock;
  8348		}
  8349		wait_on_page_writeback(page);
  8350	
  8351		lock_extent_bits(io_tree, page_start, page_end, &cached_state);
  8352		ret = set_page_extent_mapped(page);
> 8353		if (ret < 0)
  8354			goto out_unlock;
  8355	
  8356		/*
  8357		 * we can't set the delalloc bits if there are pending ordered
  8358		 * extents.  Drop our locks and wait for them to finish
  8359		 */
  8360		ordered = btrfs_lookup_ordered_range(BTRFS_I(inode), page_start,
  8361				PAGE_SIZE);
  8362		if (ordered) {
  8363			unlock_extent_cached(io_tree, page_start, page_end,
  8364					     &cached_state);
  8365			unlock_page(page);
  8366			btrfs_start_ordered_extent(ordered, 1);
  8367			btrfs_put_ordered_extent(ordered);
  8368			goto again;
  8369		}
  8370	
  8371		if (page->index == ((size - 1) >> PAGE_SHIFT)) {
  8372			reserved_space = round_up(size - page_start,
  8373						  fs_info->sectorsize);
  8374			if (reserved_space < PAGE_SIZE) {
  8375				end = page_start + reserved_space - 1;
  8376				btrfs_delalloc_release_space(BTRFS_I(inode),
  8377						data_reserved, page_start,
  8378						PAGE_SIZE - reserved_space, true);
  8379			}
  8380		}
  8381	
  8382		/*
  8383		 * page_mkwrite gets called when the page is firstly dirtied after it's
  8384		 * faulted in, but write(2) could also dirty a page and set delalloc
  8385		 * bits, thus in this case for space account reason, we still need to
  8386		 * clear any delalloc bits within this page range since we have to
  8387		 * reserve data&meta space before lock_page() (see above comments).
  8388		 */
  8389		clear_extent_bit(&BTRFS_I(inode)->io_tree, page_start, end,
  8390				  EXTENT_DELALLOC | EXTENT_DO_ACCOUNTING |
  8391				  EXTENT_DEFRAG, 0, 0, &cached_state);
  8392	
  8393		ret2 = btrfs_set_extent_delalloc(BTRFS_I(inode), page_start, end, 0,
  8394						&cached_state);
  8395		if (ret2) {
  8396			unlock_extent_cached(io_tree, page_start, page_end,
  8397					     &cached_state);
  8398			ret = VM_FAULT_SIGBUS;
  8399			goto out_unlock;
  8400		}
  8401	
  8402		/* page is wholly or partially inside EOF */
  8403		if (page_start + PAGE_SIZE > size)
  8404			zero_start = offset_in_page(size);
  8405		else
  8406			zero_start = PAGE_SIZE;
  8407	
  8408		if (zero_start != PAGE_SIZE) {
  8409			kaddr = kmap(page);
  8410			memset(kaddr + zero_start, 0, PAGE_SIZE - zero_start);
  8411			flush_dcache_page(page);
  8412			kunmap(page);
  8413		}
  8414		ClearPageChecked(page);
  8415		set_page_dirty(page);
  8416		SetPageUptodate(page);
  8417	
  8418		BTRFS_I(inode)->last_trans = fs_info->generation;
  8419		BTRFS_I(inode)->last_sub_trans = BTRFS_I(inode)->root->log_transid;
  8420		BTRFS_I(inode)->last_log_commit = BTRFS_I(inode)->root->last_log_commit;
  8421	
  8422		unlock_extent_cached(io_tree, page_start, page_end, &cached_state);
  8423	
  8424		btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE);
  8425		sb_end_pagefault(inode->i_sb);
  8426		extent_changeset_free(data_reserved);
  8427		return VM_FAULT_LOCKED;
  8428	
  8429	out_unlock:
  8430		unlock_page(page);
  8431	out:
  8432		btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE);
  8433		btrfs_delalloc_release_space(BTRFS_I(inode), data_reserved, page_start,
  8434					     reserved_space, (ret != 0));
  8435	out_noreserve:
  8436		sb_end_pagefault(inode->i_sb);
  8437		extent_changeset_free(data_reserved);
  8438		return ret;
  8439	}
  8440	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 29921 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 20/22] btrfs: introduce btrfs_subpage for data inodes
@ 2021-01-09  9:53     ` kernel test robot
  0 siblings, 0 replies; 37+ messages in thread
From: kernel test robot @ 2021-01-09  9:53 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 7676 bytes --]

Hi Qu,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on kdave/for-next]
[also build test WARNING on v5.11-rc2 next-20210108]
[cannot apply to btrfs/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Qu-Wenruo/btrfs-add-read-only-support-for-subpage-sector-size/20210106-090847
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
config: i386-randconfig-m021-20210108 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

smatch warnings:
fs/btrfs/inode.c:8353 btrfs_page_mkwrite() warn: unsigned 'ret' is never less than zero.

vim +/ret +8353 fs/btrfs/inode.c

  8275	
  8276	/*
  8277	 * btrfs_page_mkwrite() is not allowed to change the file size as it gets
  8278	 * called from a page fault handler when a page is first dirtied. Hence we must
  8279	 * be careful to check for EOF conditions here. We set the page up correctly
  8280	 * for a written page which means we get ENOSPC checking when writing into
  8281	 * holes and correct delalloc and unwritten extent mapping on filesystems that
  8282	 * support these features.
  8283	 *
  8284	 * We are not allowed to take the i_mutex here so we have to play games to
  8285	 * protect against truncate races as the page could now be beyond EOF.  Because
  8286	 * truncate_setsize() writes the inode size before removing pages, once we have
  8287	 * the page lock we can determine safely if the page is beyond EOF. If it is not
  8288	 * beyond EOF, then the page is guaranteed safe against truncation until we
  8289	 * unlock the page.
  8290	 */
  8291	vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
  8292	{
  8293		struct page *page = vmf->page;
  8294		struct inode *inode = file_inode(vmf->vma->vm_file);
  8295		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
  8296		struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
  8297		struct btrfs_ordered_extent *ordered;
  8298		struct extent_state *cached_state = NULL;
  8299		struct extent_changeset *data_reserved = NULL;
  8300		char *kaddr;
  8301		unsigned long zero_start;
  8302		loff_t size;
  8303		vm_fault_t ret;
  8304		int ret2;
  8305		int reserved = 0;
  8306		u64 reserved_space;
  8307		u64 page_start;
  8308		u64 page_end;
  8309		u64 end;
  8310	
  8311		reserved_space = PAGE_SIZE;
  8312	
  8313		sb_start_pagefault(inode->i_sb);
  8314		page_start = page_offset(page);
  8315		page_end = page_start + PAGE_SIZE - 1;
  8316		end = page_end;
  8317	
  8318		/*
  8319		 * Reserving delalloc space after obtaining the page lock can lead to
  8320		 * deadlock. For example, if a dirty page is locked by this function
  8321		 * and the call to btrfs_delalloc_reserve_space() ends up triggering
  8322		 * dirty page write out, then the btrfs_writepage() function could
  8323		 * end up waiting indefinitely to get a lock on the page currently
  8324		 * being processed by btrfs_page_mkwrite() function.
  8325		 */
  8326		ret2 = btrfs_delalloc_reserve_space(BTRFS_I(inode), &data_reserved,
  8327						    page_start, reserved_space);
  8328		if (!ret2) {
  8329			ret2 = file_update_time(vmf->vma->vm_file);
  8330			reserved = 1;
  8331		}
  8332		if (ret2) {
  8333			ret = vmf_error(ret2);
  8334			if (reserved)
  8335				goto out;
  8336			goto out_noreserve;
  8337		}
  8338	
  8339		ret = VM_FAULT_NOPAGE; /* make the VM retry the fault */
  8340	again:
  8341		lock_page(page);
  8342		size = i_size_read(inode);
  8343	
  8344		if ((page->mapping != inode->i_mapping) ||
  8345		    (page_start >= size)) {
  8346			/* page got truncated out from underneath us */
  8347			goto out_unlock;
  8348		}
  8349		wait_on_page_writeback(page);
  8350	
  8351		lock_extent_bits(io_tree, page_start, page_end, &cached_state);
  8352		ret = set_page_extent_mapped(page);
> 8353		if (ret < 0)
  8354			goto out_unlock;
  8355	
  8356		/*
  8357		 * we can't set the delalloc bits if there are pending ordered
  8358		 * extents.  Drop our locks and wait for them to finish
  8359		 */
  8360		ordered = btrfs_lookup_ordered_range(BTRFS_I(inode), page_start,
  8361				PAGE_SIZE);
  8362		if (ordered) {
  8363			unlock_extent_cached(io_tree, page_start, page_end,
  8364					     &cached_state);
  8365			unlock_page(page);
  8366			btrfs_start_ordered_extent(ordered, 1);
  8367			btrfs_put_ordered_extent(ordered);
  8368			goto again;
  8369		}
  8370	
  8371		if (page->index == ((size - 1) >> PAGE_SHIFT)) {
  8372			reserved_space = round_up(size - page_start,
  8373						  fs_info->sectorsize);
  8374			if (reserved_space < PAGE_SIZE) {
  8375				end = page_start + reserved_space - 1;
  8376				btrfs_delalloc_release_space(BTRFS_I(inode),
  8377						data_reserved, page_start,
  8378						PAGE_SIZE - reserved_space, true);
  8379			}
  8380		}
  8381	
  8382		/*
  8383		 * page_mkwrite gets called when the page is firstly dirtied after it's
  8384		 * faulted in, but write(2) could also dirty a page and set delalloc
  8385		 * bits, thus in this case for space account reason, we still need to
  8386		 * clear any delalloc bits within this page range since we have to
  8387		 * reserve data&meta space before lock_page() (see above comments).
  8388		 */
  8389		clear_extent_bit(&BTRFS_I(inode)->io_tree, page_start, end,
  8390				  EXTENT_DELALLOC | EXTENT_DO_ACCOUNTING |
  8391				  EXTENT_DEFRAG, 0, 0, &cached_state);
  8392	
  8393		ret2 = btrfs_set_extent_delalloc(BTRFS_I(inode), page_start, end, 0,
  8394						&cached_state);
  8395		if (ret2) {
  8396			unlock_extent_cached(io_tree, page_start, page_end,
  8397					     &cached_state);
  8398			ret = VM_FAULT_SIGBUS;
  8399			goto out_unlock;
  8400		}
  8401	
  8402		/* page is wholly or partially inside EOF */
  8403		if (page_start + PAGE_SIZE > size)
  8404			zero_start = offset_in_page(size);
  8405		else
  8406			zero_start = PAGE_SIZE;
  8407	
  8408		if (zero_start != PAGE_SIZE) {
  8409			kaddr = kmap(page);
  8410			memset(kaddr + zero_start, 0, PAGE_SIZE - zero_start);
  8411			flush_dcache_page(page);
  8412			kunmap(page);
  8413		}
  8414		ClearPageChecked(page);
  8415		set_page_dirty(page);
  8416		SetPageUptodate(page);
  8417	
  8418		BTRFS_I(inode)->last_trans = fs_info->generation;
  8419		BTRFS_I(inode)->last_sub_trans = BTRFS_I(inode)->root->log_transid;
  8420		BTRFS_I(inode)->last_log_commit = BTRFS_I(inode)->root->last_log_commit;
  8421	
  8422		unlock_extent_cached(io_tree, page_start, page_end, &cached_state);
  8423	
  8424		btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE);
  8425		sb_end_pagefault(inode->i_sb);
  8426		extent_changeset_free(data_reserved);
  8427		return VM_FAULT_LOCKED;
  8428	
  8429	out_unlock:
  8430		unlock_page(page);
  8431	out:
  8432		btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE);
  8433		btrfs_delalloc_release_space(BTRFS_I(inode), data_reserved, page_start,
  8434					     reserved_space, (ret != 0));
  8435	out_noreserve:
  8436		sb_end_pagefault(inode->i_sb);
  8437		extent_changeset_free(data_reserved);
  8438		return ret;
  8439	}
  8440	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 29921 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 06/22] btrfs: extent_io: introduce a helper to grab an existing extent buffer from a page
  2021-01-06  1:01 ` [PATCH v3 06/22] btrfs: extent_io: introduce a helper to grab an existing extent buffer from a page Qu Wenruo
@ 2021-01-12 15:08   ` David Sterba
  0 siblings, 0 replies; 37+ messages in thread
From: David Sterba @ 2021-01-12 15:08 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Johannes Thumshirn

On Wed, Jan 06, 2021 at 09:01:45AM +0800, Qu Wenruo wrote:
> +static struct extent_buffer *grab_extent_buffer(struct page *page)
> +{
> +	struct extent_buffer *exists;
> +
> +	/* Page not yet attached to an extent buffer */
> +	if (!PagePrivate(page))
> +		return NULL;
> +
> +	/*
> +	 * We could have already allocated an eb for this page
> +	 * and attached one so lets see if we can get a ref on
> +	 * the existing eb, and if we can we know it's good and
> +	 * we can just return that one, else we know we can just
> +	 * overwrite page->private.
> +	 */

Please reformat comments that get moved to full 80 columns line. Fixed.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 00/22] btrfs: add read-only support for subpage sector size
  2021-01-06  1:01 [PATCH v3 00/22] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (21 preceding siblings ...)
  2021-01-06  1:02 ` [PATCH v3 22/22] btrfs: allow RO mount of 4K sector size fs on 64K page system Qu Wenruo
@ 2021-01-12 15:14 ` David Sterba
  2021-01-13  5:06   ` Qu Wenruo
  22 siblings, 1 reply; 37+ messages in thread
From: David Sterba @ 2021-01-12 15:14 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Wed, Jan 06, 2021 at 09:01:39AM +0800, Qu Wenruo wrote:
> Patches can be fetched from github:
> https://github.com/adam900710/linux/tree/subpage
> Currently the branch also contains partial RW data support (still some
> out-of-sync subpage data page status).
> 
> Great thanks to David for his effort reviewing and merging the
> preparation patches into misc-next.
> Now all previously submitted preparation patches are already in
> misc-next.

I merged 1, 2, 3, 6 to misc-next as they're obvious and independent.
The rest is up for review, I haven't looked closely on the open
questions.

> Qu Wenruo (22):
>   btrfs: extent_io: rename @offset parameter to @disk_bytenr for
>     submit_extent_page()

Please drop "extent_io:" from any future patch subjects, they get too
long already and we haven't been using this prefix consistently anyway.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 00/22] btrfs: add read-only support for subpage sector size
  2021-01-12 15:14 ` [PATCH v3 00/22] btrfs: add read-only support for subpage sector size David Sterba
@ 2021-01-13  5:06   ` Qu Wenruo
  0 siblings, 0 replies; 37+ messages in thread
From: Qu Wenruo @ 2021-01-13  5:06 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs



On 2021/1/12 下午11:14, David Sterba wrote:
> On Wed, Jan 06, 2021 at 09:01:39AM +0800, Qu Wenruo wrote:
>> Patches can be fetched from github:
>> https://github.com/adam900710/linux/tree/subpage
>> Currently the branch also contains partial RW data support (still some
>> out-of-sync subpage data page status).
>>
>> Great thanks to David for his effort reviewing and merging the
>> preparation patches into misc-next.
>> Now all previously submitted preparation patches are already in
>> misc-next.
>
> I merged 1, 2, 3, 6 to misc-next as they're obvious and independent.
> The rest is up for review, I haven't looked closely on the open
> questions.

I just fetched misc-next from github/kernel, and none of the branches
have the patches.

Maybe I should be more patient?

>
>> Qu Wenruo (22):
>>    btrfs: extent_io: rename @offset parameter to @disk_bytenr for
>>      submit_extent_page()
>
> Please drop "extent_io:" from any future patch subjects, they get too
> long already and we haven't been using this prefix consistently anyway.
>
Got it.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2021-01-13  5:08 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-06  1:01 [PATCH v3 00/22] btrfs: add read-only support for subpage sector size Qu Wenruo
2021-01-06  1:01 ` [PATCH v3 01/22] btrfs: extent_io: rename @offset parameter to @disk_bytenr for submit_extent_page() Qu Wenruo
2021-01-06  1:01 ` [PATCH v3 02/22] btrfs: extent_io: refactor __extent_writepage_io() to improve readability Qu Wenruo
2021-01-06  1:01 ` [PATCH v3 03/22] btrfs: file: update comment for btrfs_dirty_pages() Qu Wenruo
2021-01-06  1:01 ` [PATCH v3 04/22] btrfs: extent_io: update locked page dirty/writeback/error bits in __process_pages_contig() Qu Wenruo
2021-01-06  1:01 ` [PATCH v3 05/22] btrfs: extent_io: merge PAGE_CLEAR_DIRTY and PAGE_SET_WRITEBACK into PAGE_START_WRITEBACK Qu Wenruo
2021-01-06  1:01 ` [PATCH v3 06/22] btrfs: extent_io: introduce a helper to grab an existing extent buffer from a page Qu Wenruo
2021-01-12 15:08   ` David Sterba
2021-01-06  1:01 ` [PATCH v3 07/22] btrfs: extent_io: introduce the skeleton of btrfs_subpage structure Qu Wenruo
2021-01-06  1:01 ` [PATCH v3 08/22] btrfs: extent_io: make attach_extent_buffer_page() to handle subpage case Qu Wenruo
2021-01-06  6:54   ` Qu Wenruo
2021-01-07  1:35   ` Qu Wenruo
2021-01-06  1:01 ` [PATCH v3 09/22] btrfs: extent_io: make grab_extent_buffer_from_page() " Qu Wenruo
2021-01-06  1:01 ` [PATCH v3 10/22] btrfs: extent_io: support subpage for extent buffer page release Qu Wenruo
2021-01-06  1:01 ` [PATCH v3 11/22] btrfs: extent_io: attach private to dummy extent buffer pages Qu Wenruo
2021-01-06  1:01 ` [PATCH v3 12/22] btrfs: subpage: introduce helper for subpage uptodate status Qu Wenruo
2021-01-06  1:01 ` [PATCH v3 13/22] btrfs: subpage: introduce helper for subpage error status Qu Wenruo
2021-01-06  1:01 ` [PATCH v3 14/22] btrfs: extent_io: make set/clear_extent_buffer_uptodate() to support subpage size Qu Wenruo
2021-01-06  1:01 ` [PATCH v3 15/22] btrfs: extent_io: make btrfs_clone_extent_buffer() to be subpage compatible Qu Wenruo
2021-01-06  1:01 ` [PATCH v3 16/22] btrfs: extent_io: implement try_release_extent_buffer() for subpage metadata support Qu Wenruo
2021-01-06  8:24   ` Qu Wenruo
2021-01-06  8:43     ` Qu Wenruo
2021-01-06  1:01 ` [PATCH v3 17/22] btrfs: extent_io: introduce read_extent_buffer_subpage() Qu Wenruo
2021-01-06  1:01 ` [PATCH v3 18/22] btrfs: extent_io: make endio_readpage_update_page_status() to handle subpage case Qu Wenruo
2021-01-06  1:01 ` [PATCH v3 19/22] btrfs: disk-io: introduce subpage metadata validation check Qu Wenruo
2021-01-06  1:01 ` [PATCH v3 20/22] btrfs: introduce btrfs_subpage for data inodes Qu Wenruo
2021-01-06  5:04   ` kernel test robot
2021-01-06  5:04     ` kernel test robot
2021-01-06  5:32     ` Qu Wenruo
2021-01-06  6:48       ` Rong Chen
2021-01-06  6:48         ` Rong Chen
2021-01-09  9:53   ` kernel test robot
2021-01-09  9:53     ` kernel test robot
2021-01-06  1:02 ` [PATCH v3 21/22] btrfs: integrate page status update for read path into begin/end_page_read() Qu Wenruo
2021-01-06  1:02 ` [PATCH v3 22/22] btrfs: allow RO mount of 4K sector size fs on 64K page system Qu Wenruo
2021-01-12 15:14 ` [PATCH v3 00/22] btrfs: add read-only support for subpage sector size David Sterba
2021-01-13  5:06   ` Qu Wenruo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.