* [PATCH v2 00/18] btrfs: add read-only support for subpage sector size
@ 2020-12-10  6:38 Qu Wenruo
  2020-12-10  6:38 ` [PATCH v2 01/18] btrfs: extent_io: rename @offset parameter to @disk_bytenr for submit_extent_page() Qu Wenruo
                   ` (17 more replies)
  0 siblings, 18 replies; 71+ messages in thread
From: Qu Wenruo @ 2020-12-10  6:38 UTC (permalink / raw)
  To: linux-btrfs

Patches can be fetched from github:
https://github.com/adam900710/linux/tree/subpage
Currently the branch also contains partial RW data support (there are
still some out-of-sync subpage data page status issues).

Great thanks to David for his effort in reviewing and merging the
preparation patches into misc-next.
All previously submitted preparation patches are now in misc-next.

=== What works ===

Just from the patchset:
- Data read
  Both regular and compressed data, with csum check.

- Metadata read

This means that, with this patchset, 64K page systems can at least
mount a btrfs filesystem with 4K sector size.

In the subpage branch
- Metadata read write
  Not yet fully tested, since data write still has bugs that need to
  be solved.
  But considering that the metadata operations from the previous
  iteration are mostly untouched, metadata read write should be pretty
  stable.

- Data read write
  WIP. There are fsstress runs which lead to out-of-sync subpage dirty
  status and cause some ordered extents to never finish.
  Still fixing it.

=== Needs feedback ===
The following design needs extra comments:

- u16 bitmap
  As David mentioned, using a u16 as the bitmap is not the fastest way;
  that's also why the existing bitmap code requires unsigned long (at
  least 32 bits) as its minimal unit.
  But using the generic bitmap directly would double the memory usage.
  Thus the best way may be to pack two u16 bitmaps into one u32, but
  that still needs extra investigation to find the best practice.

  Anyway, the skeleton should be pretty simple to expand.
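To make the bitmap conversion concrete, here is a standalone userspace
sketch of the idea (constants and the helper name mirror the patchset's
subpage code, but this is illustrative code, not the kernel
implementation):

```c
#include <assert.h>
#include <stdint.h>

#define SUBPAGE_PAGE_SIZE   (64 * 1024)  /* 64K page, the target system */
#define SUBPAGE_SECTORSIZE  (4 * 1024)   /* 4K btrfs sector size */
#define SUBPAGE_SECTOR_BITS 12

/*
 * Convert the range [start, start + len) inside a page into a u16
 * bitmap, one bit per 4K sector of a 64K page.
 */
static uint16_t subpage_calc_bitmap(uint64_t page_start, uint64_t start,
				    uint32_t len)
{
	int bit_start = (start - page_start) >> SUBPAGE_SECTOR_BITS;
	int nbits = len >> SUBPAGE_SECTOR_BITS;

	assert(start % SUBPAGE_SECTORSIZE == 0 &&
	       len % SUBPAGE_SECTORSIZE == 0);
	assert(page_start <= start &&
	       start + len <= page_start + SUBPAGE_PAGE_SIZE);
	/*
	 * nbits can be 16, so (1 << nbits) must be computed in a type
	 * wider than u16 before truncating the result.
	 */
	return (uint16_t)(((1UL << nbits) - 1) << bit_start);
}
```

For a range starting 16K into the page with length 16K this yields
0x00f0, and a full 64K range yields 0xffff; packing two such u16 maps
into one u32 would simply place the second map in the high 16 bits.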

- Separate handling for subpage metadata
  Currently the metadata read path (and later the write path) handles
  subpage metadata differently, mostly because page locking must be
  skipped for subpage metadata.
  I tried several times to share as much common code as possible, but
  every time I ended up reverting back to the current code.

  Thankfully, for data handling we will use the same common code.

=== Patchset structure ===
Patch 01~03:	New preparation patches.
		Mostly readability related patches found during RW
		development
Patch 04~08:	Subpage handling for extent buffer allocation and
		freeing
Patch 09~18:	Subpage handling for extent buffer read path

=== Changelog ===
v1:
- Separate the main implementation from the previous huge patchset
  A single huge patchset doesn't make much sense.

- Use bitmap implementation
  Now page::private will be a pointer to btrfs_subpage structure, which
  contains bitmaps for various page status.

v2:
- Use page::private as btrfs_subpage for extra info
  This replaces the old extent io tree based solution, reducing latency
  and avoiding memory allocation for its operations.

- Cherry-pick new preparation patches from RW development
  Those new preparation patches improve readability on their own.

Qu Wenruo (18):
  btrfs: extent_io: rename @offset parameter to @disk_bytenr for
    submit_extent_page()
  btrfs: extent_io: refactor __extent_writepage_io() to improve
    readability
  btrfs: file: update comment for btrfs_dirty_pages()
  btrfs: extent_io: introduce a helper to grab an existing extent buffer
    from a page
  btrfs: extent_io: introduce the skeleton of btrfs_subpage structure
  btrfs: extent_io: make attach_extent_buffer_page() to handle subpage
    case
  btrfs: extent_io: make grab_extent_buffer_from_page() to handle
    subpage case
  btrfs: extent_io: support subpage for extent buffer page release
  btrfs: subpage: introduce helper for subpage uptodate status
  btrfs: subpage: introduce helper for subpage error status
  btrfs: extent_io: make set/clear_extent_buffer_uptodate() to support
    subpage size
  btrfs: extent_io: implement try_release_extent_buffer() for subpage
    metadata support
  btrfs: extent_io: introduce read_extent_buffer_subpage()
  btrfs: extent_io: make endio_readpage_update_page_status() to handle
    subpage case
  btrfs: disk-io: introduce subpage metadata validation check
  btrfs: introduce btrfs_subpage for data inodes
  btrfs: integrate page status update for read path into
    begin/end_page_read()
  btrfs: allow RO mount of 4K sector size fs on 64K page system

 fs/btrfs/Makefile           |   3 +-
 fs/btrfs/compression.c      |  10 +-
 fs/btrfs/disk-io.c          | 107 +++++++-
 fs/btrfs/extent_io.c        | 507 ++++++++++++++++++++++++++++--------
 fs/btrfs/extent_io.h        |   3 +-
 fs/btrfs/file.c             |  25 +-
 fs/btrfs/free-space-cache.c |  15 +-
 fs/btrfs/inode.c            |  12 +-
 fs/btrfs/ioctl.c            |   5 +-
 fs/btrfs/reflink.c          |   5 +-
 fs/btrfs/relocation.c       |  12 +-
 fs/btrfs/subpage.c          |  34 +++
 fs/btrfs/subpage.h          | 264 +++++++++++++++++++
 fs/btrfs/super.c            |   7 +
 14 files changed, 876 insertions(+), 133 deletions(-)
 create mode 100644 fs/btrfs/subpage.c
 create mode 100644 fs/btrfs/subpage.h

-- 
2.29.2


^ permalink raw reply	[flat|nested] 71+ messages in thread

* [PATCH v2 01/18] btrfs: extent_io: rename @offset parameter to @disk_bytenr for submit_extent_page()
  2020-12-10  6:38 [PATCH v2 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
@ 2020-12-10  6:38 ` Qu Wenruo
  2020-12-17 15:44   ` Josef Bacik
  2020-12-10  6:38 ` [PATCH v2 02/18] btrfs: extent_io: refactor __extent_writepage_io() to improve readability Qu Wenruo
                   ` (16 subsequent siblings)
  17 siblings, 1 reply; 71+ messages in thread
From: Qu Wenruo @ 2020-12-10  6:38 UTC (permalink / raw)
  To: linux-btrfs

The parameter @offset can't be more confusing.
In fact that parameter is the disk bytenr for metadata/data.

Rename it to @disk_bytenr and update the comment to reduce confusion.

While at it, also rename all @offset variables passed into
submit_extent_page() to @disk_bytenr.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 6e3b72e63e42..2650e8720394 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3064,10 +3064,10 @@ struct bio *btrfs_bio_clone_partial(struct bio *orig, int offset, int size)
  * @opf:	bio REQ_OP_* and REQ_* flags as one value
  * @wbc:	optional writeback control for io accounting
  * @page:	page to add to the bio
+ * @disk_bytenr:the logical bytenr where the write will be
+ * @size:	portion of page that we want to write
  * @pg_offset:	offset of the new bio or to check whether we are adding
  *              a contiguous page to the previous one
- * @size:	portion of page that we want to write
- * @offset:	starting offset in the page
  * @bio_ret:	must be valid pointer, newly allocated bio will be stored there
  * @end_io_func:     end_io callback for new bio
  * @mirror_num:	     desired mirror to read/write
@@ -3076,7 +3076,7 @@ struct bio *btrfs_bio_clone_partial(struct bio *orig, int offset, int size)
  */
 static int submit_extent_page(unsigned int opf,
 			      struct writeback_control *wbc,
-			      struct page *page, u64 offset,
+			      struct page *page, u64 disk_bytenr,
 			      size_t size, unsigned long pg_offset,
 			      struct bio **bio_ret,
 			      bio_end_io_t end_io_func,
@@ -3088,7 +3088,7 @@ static int submit_extent_page(unsigned int opf,
 	int ret = 0;
 	struct bio *bio;
 	size_t io_size = min_t(size_t, size, PAGE_SIZE);
-	sector_t sector = offset >> 9;
+	sector_t sector = disk_bytenr >> 9;
 	struct extent_io_tree *tree = &BTRFS_I(page->mapping->host)->io_tree;
 
 	ASSERT(bio_ret);
@@ -3122,7 +3122,7 @@ static int submit_extent_page(unsigned int opf,
 		}
 	}
 
-	bio = btrfs_bio_alloc(offset);
+	bio = btrfs_bio_alloc(disk_bytenr);
 	bio_add_page(bio, page, io_size, pg_offset);
 	bio->bi_end_io = end_io_func;
 	bio->bi_private = tree;
@@ -3244,7 +3244,7 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 	}
 	while (cur <= end) {
 		bool force_bio_submit = false;
-		u64 offset;
+		u64 disk_bytenr;
 
 		if (cur >= last_byte) {
 			char *userpage;
@@ -3282,9 +3282,9 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 		cur_end = min(extent_map_end(em) - 1, end);
 		iosize = ALIGN(iosize, blocksize);
 		if (this_bio_flag & EXTENT_BIO_COMPRESSED)
-			offset = em->block_start;
+			disk_bytenr = em->block_start;
 		else
-			offset = em->block_start + extent_offset;
+			disk_bytenr = em->block_start + extent_offset;
 		block_start = em->block_start;
 		if (test_bit(EXTENT_FLAG_PREALLOC, &em->flags))
 			block_start = EXTENT_MAP_HOLE;
@@ -3373,7 +3373,7 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 		}
 
 		ret = submit_extent_page(REQ_OP_READ | read_flags, NULL,
-					 page, offset, iosize,
+					 page, disk_bytenr, iosize,
 					 pg_offset, bio,
 					 end_bio_extent_readpage, 0,
 					 *bio_flags,
@@ -3550,8 +3550,8 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 	blocksize = inode->vfs_inode.i_sb->s_blocksize;
 
 	while (cur <= end) {
+		u64 disk_bytenr;
 		u64 em_end;
-		u64 offset;
 
 		if (cur >= i_size) {
 			btrfs_writepage_endio_finish_ordered(page, cur,
@@ -3571,7 +3571,7 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 		BUG_ON(end < cur);
 		iosize = min(em_end - cur, end - cur + 1);
 		iosize = ALIGN(iosize, blocksize);
-		offset = em->block_start + extent_offset;
+		disk_bytenr = em->block_start + extent_offset;
 		block_start = em->block_start;
 		compressed = test_bit(EXTENT_FLAG_COMPRESSED, &em->flags);
 		free_extent_map(em);
@@ -3601,7 +3601,7 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 		}
 
 		ret = submit_extent_page(REQ_OP_WRITE | write_flags, wbc,
-					 page, offset, iosize, pg_offset,
+					 page, disk_bytenr, iosize, pg_offset,
 					 &epd->bio,
 					 end_bio_extent_writepage,
 					 0, 0, 0, false);
@@ -3925,7 +3925,7 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
 			struct writeback_control *wbc,
 			struct extent_page_data *epd)
 {
-	u64 offset = eb->start;
+	u64 disk_bytenr = eb->start;
 	u32 nritems;
 	int i, num_pages;
 	unsigned long start, end;
@@ -3958,7 +3958,7 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
 		clear_page_dirty_for_io(p);
 		set_page_writeback(p);
 		ret = submit_extent_page(REQ_OP_WRITE | write_flags, wbc,
-					 p, offset, PAGE_SIZE, 0,
+					 p, disk_bytenr, PAGE_SIZE, 0,
 					 &epd->bio,
 					 end_bio_extent_buffer_writepage,
 					 0, 0, 0, false);
@@ -3971,7 +3971,7 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
 			ret = -EIO;
 			break;
 		}
-		offset += PAGE_SIZE;
+		disk_bytenr += PAGE_SIZE;
 		update_nr_written(wbc, 1);
 		unlock_page(p);
 	}
-- 
2.29.2



* [PATCH v2 02/18] btrfs: extent_io: refactor __extent_writepage_io() to improve readability
  2020-12-10  6:38 [PATCH v2 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
  2020-12-10  6:38 ` [PATCH v2 01/18] btrfs: extent_io: rename @offset parameter to @disk_bytenr for submit_extent_page() Qu Wenruo
@ 2020-12-10  6:38 ` Qu Wenruo
  2020-12-10 12:12   ` Nikolay Borisov
  2020-12-17 15:43   ` Josef Bacik
  2020-12-10  6:38 ` [PATCH v2 03/18] btrfs: file: update comment for btrfs_dirty_pages() Qu Wenruo
                   ` (15 subsequent siblings)
  17 siblings, 2 replies; 71+ messages in thread
From: Qu Wenruo @ 2020-12-10  6:38 UTC (permalink / raw)
  To: linux-btrfs

The refactor involves the following modifications:
- iosize alignment
  In fact we don't really need to do the alignment manually at all.
  All extent maps should already be aligned, so a basic ASSERT() check
  is enough.

- redundant variables
  We have extra variables like blocksize/pg_offset/end.
  They are all unnecessary.

  @blocksize can be replaced by sectorsize directly, and it's only
  used to verify that the em start/size is aligned.

  @pg_offset can be easily calculated using @cur and page_offset(page).

  @end is just assigned to @page_end and never modified, use @page_end
  to replace it.

- remove some BUG_ON()s
  The BUG_ON()s are for the extent map, which is already covered by the
  tree-checker verification of on-disk extent data items and by runtime
  checks.
  ASSERT() should be enough.
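As an illustration of the iosize change, here is a minimal userspace
sketch of the new calculation (hypothetical values; em_end is exclusive
as returned by extent_map_end(), page_end is inclusive):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of the refactored iosize calculation: the IO size is bounded
 * by the end of the extent map (exclusive) and the end of the page
 * (inclusive), with no manual ALIGN() needed since the extent map is
 * asserted to be sector aligned.
 */
static uint32_t calc_iosize(uint64_t cur, uint64_t em_end, uint64_t page_end)
{
	/* Note that em_end from extent_map_end() is exclusive */
	uint64_t range_end = (em_end < page_end + 1) ? em_end : page_end + 1;

	assert(cur < range_end);
	return (uint32_t)(range_end - cur);
}
```

With a 64K page starting at 128K, an extent map ending 16K into the
page gives an iosize of 16K, while an extent map extending past the
page caps the iosize at the remaining page bytes.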

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 37 +++++++++++++++++--------------------
 1 file changed, 17 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 2650e8720394..612fe60b367e 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3515,17 +3515,14 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 				 unsigned long nr_written,
 				 int *nr_ret)
 {
+	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 	struct extent_io_tree *tree = &inode->io_tree;
 	u64 start = page_offset(page);
 	u64 page_end = start + PAGE_SIZE - 1;
-	u64 end;
 	u64 cur = start;
 	u64 extent_offset;
 	u64 block_start;
-	u64 iosize;
 	struct extent_map *em;
-	size_t pg_offset = 0;
-	size_t blocksize;
 	int ret = 0;
 	int nr = 0;
 	const unsigned int write_flags = wbc_to_write_flags(wbc);
@@ -3546,19 +3543,17 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 	 */
 	update_nr_written(wbc, nr_written + 1);
 
-	end = page_end;
-	blocksize = inode->vfs_inode.i_sb->s_blocksize;
-
-	while (cur <= end) {
+	while (cur <= page_end) {
 		u64 disk_bytenr;
 		u64 em_end;
+		u32 iosize;
 
 		if (cur >= i_size) {
 			btrfs_writepage_endio_finish_ordered(page, cur,
 							     page_end, 1);
 			break;
 		}
-		em = btrfs_get_extent(inode, NULL, 0, cur, end - cur + 1);
+		em = btrfs_get_extent(inode, NULL, 0, cur, page_end - cur + 1);
 		if (IS_ERR_OR_NULL(em)) {
 			SetPageError(page);
 			ret = PTR_ERR_OR_ZERO(em);
@@ -3567,16 +3562,20 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 
 		extent_offset = cur - em->start;
 		em_end = extent_map_end(em);
-		BUG_ON(em_end <= cur);
-		BUG_ON(end < cur);
-		iosize = min(em_end - cur, end - cur + 1);
-		iosize = ALIGN(iosize, blocksize);
-		disk_bytenr = em->block_start + extent_offset;
+		ASSERT(cur <= em_end);
+		ASSERT(cur < page_end);
+		ASSERT(IS_ALIGNED(em->start, fs_info->sectorsize));
+		ASSERT(IS_ALIGNED(em->len, fs_info->sectorsize));
 		block_start = em->block_start;
 		compressed = test_bit(EXTENT_FLAG_COMPRESSED, &em->flags);
+		disk_bytenr = em->block_start + extent_offset;
+
+		/* Note that em_end from extent_map_end() is exclusive */
+		iosize = min(em_end, page_end + 1) - cur;
 		free_extent_map(em);
 		em = NULL;
 
+
 		/*
 		 * compressed and inline extents are written through other
 		 * paths in the FS
@@ -3589,7 +3588,6 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 				btrfs_writepage_endio_finish_ordered(page, cur,
 							cur + iosize - 1, 1);
 			cur += iosize;
-			pg_offset += iosize;
 			continue;
 		}
 
@@ -3597,12 +3595,12 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 		if (!PageWriteback(page)) {
 			btrfs_err(inode->root->fs_info,
 				   "page %lu not writeback, cur %llu end %llu",
-			       page->index, cur, end);
+			       page->index, cur, page_end);
 		}
 
 		ret = submit_extent_page(REQ_OP_WRITE | write_flags, wbc,
-					 page, disk_bytenr, iosize, pg_offset,
-					 &epd->bio,
+					 page, disk_bytenr, iosize,
+					 cur - page_offset(page), &epd->bio,
 					 end_bio_extent_writepage,
 					 0, 0, 0, false);
 		if (ret) {
@@ -3611,8 +3609,7 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 				end_page_writeback(page);
 		}
 
-		cur = cur + iosize;
-		pg_offset += iosize;
+		cur += iosize;
 		nr++;
 	}
 	*nr_ret = nr;
-- 
2.29.2



* [PATCH v2 03/18] btrfs: file: update comment for btrfs_dirty_pages()
  2020-12-10  6:38 [PATCH v2 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
  2020-12-10  6:38 ` [PATCH v2 01/18] btrfs: extent_io: rename @offset parameter to @disk_bytenr for submit_extent_page() Qu Wenruo
  2020-12-10  6:38 ` [PATCH v2 02/18] btrfs: extent_io: refactor __extent_writepage_io() to improve readability Qu Wenruo
@ 2020-12-10  6:38 ` Qu Wenruo
  2020-12-10 12:16   ` Nikolay Borisov
  2020-12-10  6:38 ` [PATCH v2 04/18] btrfs: extent_io: introduce a helper to grab an existing extent buffer from a page Qu Wenruo
                   ` (14 subsequent siblings)
  17 siblings, 1 reply; 71+ messages in thread
From: Qu Wenruo @ 2020-12-10  6:38 UTC (permalink / raw)
  To: linux-btrfs

The original comment is from the initial merge and has several
problems:
- There is no hole check any more
- No inline extent decision is made here

Update the out-of-date comment with a more accurate one.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/file.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 0e41459b8de6..a29b50208eee 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -453,12 +453,15 @@ static void btrfs_drop_pages(struct page **pages, size_t num_pages)
 }
 
 /*
- * after copy_from_user, pages need to be dirtied and we need to make
- * sure holes are created between the current EOF and the start of
- * any next extents (if required).
- *
- * this also makes the decision about creating an inline extent vs
- * doing real data extents, marking pages dirty and delalloc as required.
+ * After btrfs_copy_from_user(), update the following things for delalloc:
+ * - DELALLOC extent io tree bits
+ *   Later btrfs_run_delalloc_range() relies on this bit to determine the
+ *   writeback range.
+ * - Page status
+ *   Including basic status like Dirty and Uptodate, and btrfs specific bit
+ *   like Checked (for cow fixup)
+ * - Inode size update
+ *   If needed
  */
 int btrfs_dirty_pages(struct btrfs_inode *inode, struct page **pages,
 		      size_t num_pages, loff_t pos, size_t write_bytes,
-- 
2.29.2



* [PATCH v2 04/18] btrfs: extent_io: introduce a helper to grab an existing extent buffer from a page
  2020-12-10  6:38 [PATCH v2 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (2 preceding siblings ...)
  2020-12-10  6:38 ` [PATCH v2 03/18] btrfs: file: update comment for btrfs_dirty_pages() Qu Wenruo
@ 2020-12-10  6:38 ` Qu Wenruo
  2020-12-10 13:51   ` Nikolay Borisov
  2020-12-17 15:50   ` Josef Bacik
  2020-12-10  6:38 ` [PATCH v2 05/18] btrfs: extent_io: introduce the skeleton of btrfs_subpage structure Qu Wenruo
                   ` (13 subsequent siblings)
  17 siblings, 2 replies; 71+ messages in thread
From: Qu Wenruo @ 2020-12-10  6:38 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Johannes Thumshirn

This patch will extract the code to grab an extent buffer from a page
into a helper, grab_extent_buffer_from_page().

This reduces one indent level, and provides a place for later
expansion for subpage support.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 52 +++++++++++++++++++++++++++-----------------
 1 file changed, 32 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 612fe60b367e..6350c2687c7e 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5251,6 +5251,32 @@ struct extent_buffer *alloc_test_extent_buffer(struct btrfs_fs_info *fs_info,
 }
 #endif
 
+static struct extent_buffer *grab_extent_buffer_from_page(struct page *page)
+{
+	struct extent_buffer *exists;
+
+	/* Page not yet attached to an extent buffer */
+	if (!PagePrivate(page))
+		return NULL;
+
+	/*
+	 * We could have already allocated an eb for this page
+	 * and attached one so lets see if we can get a ref on
+	 * the existing eb, and if we can we know it's good and
+	 * we can just return that one, else we know we can just
+	 * overwrite page->private.
+	 */
+	exists = (struct extent_buffer *)page->private;
+	if (atomic_inc_not_zero(&exists->refs)) {
+		mark_extent_buffer_accessed(exists, page);
+		return exists;
+	}
+
+	WARN_ON(PageDirty(page));
+	detach_page_private(page);
+	return NULL;
+}
+
 struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 					  u64 start, u64 owner_root, int level)
 {
@@ -5296,26 +5322,12 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		}
 
 		spin_lock(&mapping->private_lock);
-		if (PagePrivate(p)) {
-			/*
-			 * We could have already allocated an eb for this page
-			 * and attached one so lets see if we can get a ref on
-			 * the existing eb, and if we can we know it's good and
-			 * we can just return that one, else we know we can just
-			 * overwrite page->private.
-			 */
-			exists = (struct extent_buffer *)p->private;
-			if (atomic_inc_not_zero(&exists->refs)) {
-				spin_unlock(&mapping->private_lock);
-				unlock_page(p);
-				put_page(p);
-				mark_extent_buffer_accessed(exists, p);
-				goto free_eb;
-			}
-			exists = NULL;
-
-			WARN_ON(PageDirty(p));
-			detach_page_private(p);
+		exists = grab_extent_buffer_from_page(p);
+		if (exists) {
+			spin_unlock(&mapping->private_lock);
+			unlock_page(p);
+			put_page(p);
+			goto free_eb;
 		}
 		attach_extent_buffer_page(eb, p);
 		spin_unlock(&mapping->private_lock);
-- 
2.29.2



* [PATCH v2 05/18] btrfs: extent_io: introduce the skeleton of btrfs_subpage structure
  2020-12-10  6:38 [PATCH v2 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (3 preceding siblings ...)
  2020-12-10  6:38 ` [PATCH v2 04/18] btrfs: extent_io: introduce a helper to grab an existing extent buffer from a page Qu Wenruo
@ 2020-12-10  6:38 ` Qu Wenruo
  2020-12-17 15:52   ` Josef Bacik
  2020-12-10  6:38 ` [PATCH v2 06/18] btrfs: extent_io: make attach_extent_buffer_page() to handle subpage case Qu Wenruo
                   ` (12 subsequent siblings)
  17 siblings, 1 reply; 71+ messages in thread
From: Qu Wenruo @ 2020-12-10  6:38 UTC (permalink / raw)
  To: linux-btrfs

For btrfs subpage support, we need a structure to record extra info
about the status of each sector of a page.

This patch will introduce the skeleton structure for future btrfs
subpage support.
All subpage related code will go into subpage.[ch] to avoid polluting
the existing code base.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/Makefile  |  3 ++-
 fs/btrfs/subpage.c | 34 ++++++++++++++++++++++++++++++++++
 fs/btrfs/subpage.h | 31 +++++++++++++++++++++++++++++++
 3 files changed, 67 insertions(+), 1 deletion(-)
 create mode 100644 fs/btrfs/subpage.c
 create mode 100644 fs/btrfs/subpage.h

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 9f1b1a88e317..942562e11456 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -11,7 +11,8 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
 	   compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \
 	   reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \
 	   uuid-tree.o props.o free-space-tree.o tree-checker.o space-info.o \
-	   block-rsv.o delalloc-space.o block-group.o discard.o reflink.o
+	   block-rsv.o delalloc-space.o block-group.o discard.o reflink.o \
+	   subpage.o
 
 btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
 btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
new file mode 100644
index 000000000000..9ca9f9ca61a9
--- /dev/null
+++ b/fs/btrfs/subpage.c
@@ -0,0 +1,34 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "subpage.h"
+
+int btrfs_attach_subpage(struct btrfs_fs_info *fs_info, struct page *page)
+{
+	struct btrfs_subpage *subpage;
+
+	ASSERT(PageLocked(page));
+	/* Either not subpage, or the page already has private attached */
+	if (fs_info->sectorsize == PAGE_SIZE || PagePrivate(page))
+		return 0;
+
+	subpage = kzalloc(sizeof(*subpage), GFP_NOFS);
+	if (!subpage)
+		return -ENOMEM;
+
+	spin_lock_init(&subpage->lock);
+	attach_page_private(page, subpage);
+	return 0;
+}
+
+void btrfs_detach_subpage(struct btrfs_fs_info *fs_info, struct page *page)
+{
+	struct btrfs_subpage *subpage;
+
+	/* Either not subpage, or already detached */
+	if (fs_info->sectorsize == PAGE_SIZE || !PagePrivate(page))
+		return;
+
+	subpage = (struct btrfs_subpage *)detach_page_private(page);
+	ASSERT(subpage);
+	kfree(subpage);
+}
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
new file mode 100644
index 000000000000..96f3b226913e
--- /dev/null
+++ b/fs/btrfs/subpage.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef BTRFS_SUBPAGE_H
+#define BTRFS_SUBPAGE_H
+
+#include <linux/spinlock.h>
+#include "ctree.h"
+
+/*
+ * Since the maximum page size btrfs is going to support is 64K while the
+ * minimum sectorsize is 4K, this means a u16 bitmap is enough.
+ *
+ * The regular bitmap requires 32 bits as minimal bitmap size, so we can't use
+ * existing bitmap_* helpers here.
+ */
+#define BTRFS_SUBPAGE_BITMAP_SIZE	16
+
+/*
+ * Structure to trace status of each sector inside a page.
+ *
+ * Will be attached to page::private for both data and metadata inodes.
+ */
+struct btrfs_subpage {
+	/* Common members for both data and metadata pages */
+	spinlock_t lock;
+};
+
+int btrfs_attach_subpage(struct btrfs_fs_info *fs_info, struct page *page);
+void btrfs_detach_subpage(struct btrfs_fs_info *fs_info, struct page *page);
+
+#endif /* BTRFS_SUBPAGE_H */
-- 
2.29.2



* [PATCH v2 06/18] btrfs: extent_io: make attach_extent_buffer_page() to handle subpage case
  2020-12-10  6:38 [PATCH v2 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (4 preceding siblings ...)
  2020-12-10  6:38 ` [PATCH v2 05/18] btrfs: extent_io: introduce the skeleton of btrfs_subpage structure Qu Wenruo
@ 2020-12-10  6:38 ` Qu Wenruo
  2020-12-10 15:30   ` Nikolay Borisov
                     ` (2 more replies)
  2020-12-10  6:38 ` [PATCH v2 07/18] btrfs: extent_io: make grab_extent_buffer_from_page() " Qu Wenruo
                   ` (11 subsequent siblings)
  17 siblings, 3 replies; 71+ messages in thread
From: Qu Wenruo @ 2020-12-10  6:38 UTC (permalink / raw)
  To: linux-btrfs

For the subpage case, we need to allocate new memory for each metadata page.

So we need to:
- Allow attach_extent_buffer_page() to return int
  To indicate allocation failure

- Prealloc page->private for alloc_extent_buffer()
  We don't want to do memory allocation with a spinlock held, so do
  the preallocation before we acquire the spinlock.

- Handle the subpage and regular cases differently in
  attach_extent_buffer_page()
  For the regular case, just do the usual thing.
  For the subpage case, allocate new memory and update the tree_block
  bitmap.

  The bitmap update will be handled by a new subpage specific helper,
  btrfs_subpage_set_tree_block().

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 69 +++++++++++++++++++++++++++++++++++---------
 fs/btrfs/subpage.h   | 44 ++++++++++++++++++++++++++++
 2 files changed, 99 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 6350c2687c7e..51dd7ec3c2b3 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -24,6 +24,7 @@
 #include "rcu-string.h"
 #include "backref.h"
 #include "disk-io.h"
+#include "subpage.h"
 
 static struct kmem_cache *extent_state_cache;
 static struct kmem_cache *extent_buffer_cache;
@@ -3142,22 +3143,41 @@ static int submit_extent_page(unsigned int opf,
 	return ret;
 }
 
-static void attach_extent_buffer_page(struct extent_buffer *eb,
+static int attach_extent_buffer_page(struct extent_buffer *eb,
 				      struct page *page)
 {
-	/*
-	 * If the page is mapped to btree inode, we should hold the private
-	 * lock to prevent race.
-	 * For cloned or dummy extent buffers, their pages are not mapped and
-	 * will not race with any other ebs.
-	 */
-	if (page->mapping)
-		lockdep_assert_held(&page->mapping->private_lock);
+	struct btrfs_fs_info *fs_info = eb->fs_info;
+	int ret;
 
-	if (!PagePrivate(page))
-		attach_page_private(page, eb);
-	else
-		WARN_ON(page->private != (unsigned long)eb);
+	if (fs_info->sectorsize == PAGE_SIZE) {
+		/*
+		 * If the page is mapped to btree inode, we should hold the
+		 * private lock to prevent race.
+		 * For cloned or dummy extent buffers, their pages are not
+		 * mapped and will not race with any other ebs.
+		 */
+		if (page->mapping)
+			lockdep_assert_held(&page->mapping->private_lock);
+
+		if (!PagePrivate(page))
+			attach_page_private(page, eb);
+		else
+			WARN_ON(page->private != (unsigned long)eb);
+		return 0;
+	}
+
+	/* Already mapped, just update the existing range */
+	if (PagePrivate(page))
+		goto update_bitmap;
+
+	/* Do new allocation to attach subpage */
+	ret = btrfs_attach_subpage(fs_info, page);
+	if (ret < 0)
+		return ret;
+
+update_bitmap:
+	btrfs_subpage_set_tree_block(fs_info, page, eb->start, eb->len);
+	return 0;
 }
 
 void set_page_extent_mapped(struct page *page)
@@ -5067,12 +5087,19 @@ struct extent_buffer *btrfs_clone_extent_buffer(const struct extent_buffer *src)
 		return NULL;
 
 	for (i = 0; i < num_pages; i++) {
+		int ret;
+
 		p = alloc_page(GFP_NOFS);
 		if (!p) {
 			btrfs_release_extent_buffer(new);
 			return NULL;
 		}
-		attach_extent_buffer_page(new, p);
+		ret = attach_extent_buffer_page(new, p);
+		if (ret < 0) {
+			put_page(p);
+			btrfs_release_extent_buffer(new);
+			return NULL;
+		}
 		WARN_ON(PageDirty(p));
 		SetPageUptodate(p);
 		new->pages[i] = p;
@@ -5321,6 +5348,18 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 			goto free_eb;
 		}
 
+		/*
+		 * Preallocate page->private for subpage case, so that
+		 * we won't allocate memory with private_lock hold.
+		 */
+		ret = btrfs_attach_subpage(fs_info, p);
+		if (ret < 0) {
+			unlock_page(p);
+			put_page(p);
+			exists = ERR_PTR(-ENOMEM);
+			goto free_eb;
+		}
+
 		spin_lock(&mapping->private_lock);
 		exists = grab_extent_buffer_from_page(p);
 		if (exists) {
@@ -5329,8 +5368,10 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 			put_page(p);
 			goto free_eb;
 		}
+		/* Should not fail, as we have attached the subpage already */
 		attach_extent_buffer_page(eb, p);
 		spin_unlock(&mapping->private_lock);
+
 		WARN_ON(PageDirty(p));
 		eb->pages[i] = p;
 		if (!PageUptodate(p))
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index 96f3b226913e..c2ce603e7848 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -23,9 +23,53 @@
 struct btrfs_subpage {
 	/* Common members for both data and metadata pages */
 	spinlock_t lock;
+	union {
+		/* Structures only used by metadata */
+		struct {
+			u16 tree_block_bitmap;
+		};
+		/* structures only used by data */
+	};
 };
 
 int btrfs_attach_subpage(struct btrfs_fs_info *fs_info, struct page *page);
 void btrfs_detach_subpage(struct btrfs_fs_info *fs_info, struct page *page);
 
+/*
+ * Convert the [start, start + len) range into a u16 bitmap
+ *
+ * E.g. if start == page_offset() + 16K, len = 16K, we get 0x00f0.
+ */
+static inline u16 btrfs_subpage_calc_bitmap(struct btrfs_fs_info *fs_info,
+			struct page *page, u64 start, u32 len)
+{
+	int bit_start = (start - page_offset(page)) >> fs_info->sectorsize_bits;
+	int nbits = len >> fs_info->sectorsize_bits;
+
+	/* Basic checks */
+	ASSERT(PagePrivate(page) && page->private);
+	ASSERT(IS_ALIGNED(start, fs_info->sectorsize) &&
+	       IS_ALIGNED(len, fs_info->sectorsize));
+	ASSERT(page_offset(page) <= start &&
+	       start + len <= page_offset(page) + PAGE_SIZE);
+	/*
+	 * Here nbits can be 16, thus (1 << nbits) can go beyond the u16
+	 * range. So do the left shift in unsigned long first, then
+	 * truncate the result to u16.
+	 */
+	return (u16)(((1UL << nbits) - 1) << bit_start);
+}
+
+static inline void btrfs_subpage_set_tree_block(struct btrfs_fs_info *fs_info,
+			struct page *page, u64 start, u32 len)
+{
+	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
+	unsigned long flags;
+	u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len);
+
+	spin_lock_irqsave(&subpage->lock, flags);
+	subpage->tree_block_bitmap |= tmp;
+	spin_unlock_irqrestore(&subpage->lock, flags);
+}
+
 #endif /* BTRFS_SUBPAGE_H */
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 71+ messages in thread
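The patch above packs per-sector state into a u16, one bit per 4K sector of a 64K page. The overflow note in btrfs_subpage_calc_bitmap() can be illustrated with a stand-alone model of the same math (names are illustrative, not btrfs API; 4K sectors and a 64K page are assumed):

```c
#include <assert.h>
#include <stdint.h>

/* Model of btrfs_subpage_calc_bitmap(), assuming 4K sectors
 * (sectorsize_bits == 12) and a 64K page beginning at @page_start. */
#define MODEL_SECTORSIZE_BITS 12

static uint16_t model_calc_bitmap(uint64_t page_start, uint64_t start,
				  uint32_t len)
{
	int bit_start = (int)((start - page_start) >> MODEL_SECTORSIZE_BITS);
	int nbits = (int)(len >> MODEL_SECTORSIZE_BITS);

	/* nbits can be 16 for a full 64K page, so (1 << nbits) would
	 * overflow a 16-bit type; shift in unsigned long, then truncate. */
	return (uint16_t)(((1UL << nbits) - 1) << bit_start);
}
```

A range starting 16K into the page with length 16K maps to bits 4-7, i.e. 0x00f0, matching the example in the patch's comment; a full 64K range yields 0xffff only because the shift is done in a wider type first.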

* [PATCH v2 07/18] btrfs: extent_io: make grab_extent_buffer_from_page() to handle subpage case
  2020-12-10  6:38 [PATCH v2 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (5 preceding siblings ...)
  2020-12-10  6:38 ` [PATCH v2 06/18] btrfs: extent_io: make attach_extent_buffer_page() to handle subpage case Qu Wenruo
@ 2020-12-10  6:38 ` Qu Wenruo
  2020-12-10 15:39   ` Nikolay Borisov
  2020-12-17 16:02   ` Josef Bacik
  2020-12-10  6:38 ` [PATCH v2 08/18] btrfs: extent_io: support subpage for extent buffer page release Qu Wenruo
                   ` (10 subsequent siblings)
  17 siblings, 2 replies; 71+ messages in thread
From: Qu Wenruo @ 2020-12-10  6:38 UTC (permalink / raw)
  To: linux-btrfs

For the subpage case, grab_extent_buffer_from_page() can't really get an
extent buffer just from btrfs_subpage.

We do have btrfs_subpage::tree_block_bitmap, which could be used to
derive the bytenr of an existing extent buffer and then look that eb up
in the radix tree.

However alloc_extent_buffer() already checks for an existing eb when
inserting into the radix tree, so there is no need for the extra hassle;
just let alloc_extent_buffer() handle the existing eb.

So for the subpage case, grab_extent_buffer_from_page() just always
returns NULL.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 51dd7ec3c2b3..b99bd0402130 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5278,10 +5278,19 @@ struct extent_buffer *alloc_test_extent_buffer(struct btrfs_fs_info *fs_info,
 }
 #endif
 
-static struct extent_buffer *grab_extent_buffer_from_page(struct page *page)
+static struct extent_buffer *grab_extent_buffer_from_page(
+		struct btrfs_fs_info *fs_info, struct page *page)
 {
 	struct extent_buffer *exists;
 
+	/*
+	 * For the subpage case, we completely rely on the radix tree to
+	 * ensure we don't try to insert two ebs for the same bytenr.
+	 * So here we always return NULL and just continue.
+	 */
+	if (fs_info->sectorsize < PAGE_SIZE)
+		return NULL;
+
 	/* Page not yet attached to an extent buffer */
 	if (!PagePrivate(page))
 		return NULL;
@@ -5361,7 +5370,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		}
 
 		spin_lock(&mapping->private_lock);
-		exists = grab_extent_buffer_from_page(p);
+		exists = grab_extent_buffer_from_page(fs_info, p);
 		if (exists) {
 			spin_unlock(&mapping->private_lock);
 			unlock_page(p);
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH v2 08/18] btrfs: extent_io: support subpage for extent buffer page release
  2020-12-10  6:38 [PATCH v2 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (6 preceding siblings ...)
  2020-12-10  6:38 ` [PATCH v2 07/18] btrfs: extent_io: make grab_extent_buffer_from_page() " Qu Wenruo
@ 2020-12-10  6:38 ` Qu Wenruo
  2020-12-10 16:13   ` Nikolay Borisov
  2020-12-10  6:38 ` [PATCH v2 09/18] btrfs: subpage: introduce helper for subpage uptodate status Qu Wenruo
                   ` (9 subsequent siblings)
  17 siblings, 1 reply; 71+ messages in thread
From: Qu Wenruo @ 2020-12-10  6:38 UTC (permalink / raw)
  To: linux-btrfs

In btrfs_release_extent_buffer_pages(), we need to add extra handling
for subpage.

To do so, introduce a new helper, detach_extent_buffer_page(), to do
different handling for regular and subpage cases.

For the subpage case, the new trick is to clear the range of the current
extent buffer, and detach the page private if and only if we're the last
tree block of the page.
This part is handled by the subpage helper,
btrfs_subpage_clear_and_test_tree_block().

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 59 +++++++++++++++++++++++++++++++-------------
 fs/btrfs/subpage.h   | 24 ++++++++++++++++++
 2 files changed, 66 insertions(+), 17 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index b99bd0402130..ee81a2a1baa2 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4994,25 +4994,12 @@ int extent_buffer_under_io(const struct extent_buffer *eb)
 		test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
 }
 
-/*
- * Release all pages attached to the extent buffer.
- */
-static void btrfs_release_extent_buffer_pages(struct extent_buffer *eb)
+static void detach_extent_buffer_page(struct extent_buffer *eb,
+				      struct page *page)
 {
-	int i;
-	int num_pages;
-	int mapped = !test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags);
-
-	BUG_ON(extent_buffer_under_io(eb));
-
-	num_pages = num_extent_pages(eb);
-	for (i = 0; i < num_pages; i++) {
-		struct page *page = eb->pages[i];
+	struct btrfs_fs_info *fs_info = eb->fs_info;
 
-		if (!page)
-			continue;
-		if (mapped)
-			spin_lock(&page->mapping->private_lock);
+	if (fs_info->sectorsize == PAGE_SIZE) {
 		/*
 		 * We do this since we'll remove the pages after we've
 		 * removed the eb from the radix tree, so we could race
@@ -5031,6 +5018,44 @@ static void btrfs_release_extent_buffer_pages(struct extent_buffer *eb)
 			 */
 			detach_page_private(page);
 		}
+		return;
+	}
+
+	/*
+	 * For the subpage case, clear the range in tree_block_bitmap,
+	 * and if we're the last one, detach the private completely.
+	 */
+	if (PagePrivate(page)) {
+		bool last = false;
+
+		last = btrfs_subpage_clear_and_test_tree_block(fs_info, page,
+						eb->start, eb->len);
+		if (last)
+			btrfs_detach_subpage(fs_info, page);
+	}
+}
+
+/*
+ * Release all pages attached to the extent buffer.
+ */
+static void btrfs_release_extent_buffer_pages(struct extent_buffer *eb)
+{
+	int i;
+	int num_pages;
+	int mapped = !test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags);
+
+	ASSERT(!extent_buffer_under_io(eb));
+
+	num_pages = num_extent_pages(eb);
+	for (i = 0; i < num_pages; i++) {
+		struct page *page = eb->pages[i];
+
+		if (!page)
+			continue;
+		if (mapped)
+			spin_lock(&page->mapping->private_lock);
+
+		detach_extent_buffer_page(eb, page);
 
 		if (mapped)
 			spin_unlock(&page->mapping->private_lock);
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index c2ce603e7848..87b4e028ae18 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -72,4 +72,28 @@ static inline void btrfs_subpage_set_tree_block(struct btrfs_fs_info *fs_info,
 	spin_unlock_irqrestore(&subpage->lock, flags);
 }
 
+/*
+ * Clear the bits in tree_block_bitmap and check whether the cleared bits
+ * were the last ones set in tree_block_bitmap.
+ *
+ * Return true if the cleared bits were the last ones set.
+ * Return false otherwise.
+ */
+static inline bool btrfs_subpage_clear_and_test_tree_block(
+			struct btrfs_fs_info *fs_info, struct page *page,
+			u64 start, u32 len)
+{
+	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
+	u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len);
+	unsigned long flags;
+	bool last = false;
+
+	spin_lock_irqsave(&subpage->lock, flags);
+	subpage->tree_block_bitmap &= ~tmp;
+	if (subpage->tree_block_bitmap == 0)
+		last = true;
+	spin_unlock_irqrestore(&subpage->lock, flags);
+	return last;
+}
+
 #endif /* BTRFS_SUBPAGE_H */
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 71+ messages in thread
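The clear-and-test semantics of btrfs_subpage_clear_and_test_tree_block() can be modeled outside the kernel as two pure helpers (illustrative names, spinlock omitted): clearing a range of bits, and asking whether that clear emptied the bitmap, which is the "last tree block in the page" condition that triggers btrfs_detach_subpage().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of the clear-and-test pattern; locking omitted. */
static uint16_t model_clear_range(uint16_t bitmap, uint16_t range)
{
	return bitmap & (uint16_t)~range;
}

/* True iff clearing @range leaves no tree block bits set, i.e. the
 * caller held the last tree block of the page. */
static bool model_is_last(uint16_t bitmap, uint16_t range)
{
	return model_clear_range(bitmap, range) == 0;
}
```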

* [PATCH v2 09/18] btrfs: subpage: introduce helper for subpage uptodate status
  2020-12-10  6:38 [PATCH v2 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (7 preceding siblings ...)
  2020-12-10  6:38 ` [PATCH v2 08/18] btrfs: extent_io: support subpage for extent buffer page release Qu Wenruo
@ 2020-12-10  6:38 ` Qu Wenruo
  2020-12-11 10:10   ` Nikolay Borisov
  2020-12-10  6:38 ` [PATCH v2 10/18] btrfs: subpage: introduce helper for subpage error status Qu Wenruo
                   ` (8 subsequent siblings)
  17 siblings, 1 reply; 71+ messages in thread
From: Qu Wenruo @ 2020-12-10  6:38 UTC (permalink / raw)
  To: linux-btrfs

This patch introduces the following functions to handle btrfs subpage
uptodate status:
- btrfs_subpage_set_uptodate()
- btrfs_subpage_clear_uptodate()
- btrfs_subpage_test_uptodate()
  These helpers can only be called when the range is guaranteed to be
  inside the page.

- btrfs_page_set_uptodate()
- btrfs_page_clear_uptodate()
- btrfs_page_test_uptodate()
  These helpers handle both the regular sector size and subpage
  cases.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/subpage.h | 98 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 98 insertions(+)

diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index 87b4e028ae18..b3cf9171ec98 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -23,6 +23,7 @@
 struct btrfs_subpage {
 	/* Common members for both data and metadata pages */
 	spinlock_t lock;
+	u16 uptodate_bitmap;
 	union {
 		/* Structures only used by metadata */
 		struct {
@@ -35,6 +36,17 @@ struct btrfs_subpage {
 int btrfs_attach_subpage(struct btrfs_fs_info *fs_info, struct page *page);
 void btrfs_detach_subpage(struct btrfs_fs_info *fs_info, struct page *page);
 
+static inline void btrfs_subpage_clamp_range(struct page *page,
+					     u64 *start, u32 *len)
+{
+	u64 orig_start = *start;
+	u32 orig_len = *len;
+
+	*start = max_t(u64, page_offset(page), orig_start);
+	*len = min_t(u64, page_offset(page) + PAGE_SIZE,
+		     orig_start + orig_len) - *start;
+}
+
 /*
  * Convert the [start, start + len) range into a u16 bitmap
  *
@@ -96,4 +108,90 @@ static inline bool btrfs_subpage_clear_and_test_tree_block(
 	return last;
 }
 
+static inline void btrfs_subpage_set_uptodate(struct btrfs_fs_info *fs_info,
+			struct page *page, u64 start, u32 len)
+{
+	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
+	u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len);
+	unsigned long flags;
+
+	spin_lock_irqsave(&subpage->lock, flags);
+	subpage->uptodate_bitmap |= tmp;
+	if (subpage->uptodate_bitmap == (u16)-1)
+		SetPageUptodate(page);
+	spin_unlock_irqrestore(&subpage->lock, flags);
+}
+
+static inline void btrfs_subpage_clear_uptodate(struct btrfs_fs_info *fs_info,
+			struct page *page, u64 start, u32 len)
+{
+	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
+	u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len);
+	unsigned long flags;
+
+	spin_lock_irqsave(&subpage->lock, flags);
+	subpage->uptodate_bitmap &= ~tmp;
+	ClearPageUptodate(page);
+	spin_unlock_irqrestore(&subpage->lock, flags);
+}
+
+/*
+ * Unlike set/clear which is dependent on each page status, for test all bits
+ * are tested in the same way.
+ */
+#define DECLARE_BTRFS_SUBPAGE_TEST_OP(name)				\
+static inline bool btrfs_subpage_test_##name(struct btrfs_fs_info *fs_info, \
+			struct page *page, u64 start, u32 len)		\
+{									\
+	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private; \
+	u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len); \
+	unsigned long flags;						\
+	bool ret;							\
+									\
+	spin_lock_irqsave(&subpage->lock, flags);			\
+	ret = ((subpage->name##_bitmap & tmp) == tmp);			\
+	spin_unlock_irqrestore(&subpage->lock, flags);			\
+	return ret;							\
+}
+DECLARE_BTRFS_SUBPAGE_TEST_OP(uptodate);
+
+/*
+ * Note that in selftests, especially extent-io-tests, we can have a NULL
+ * fs_info passed in.
+ * Thankfully the selftests only cover sectorsize == PAGE_SIZE cases so
+ * far, thus we can fall back to the regular sectorsize branch.
+ */
+#define DECLARE_BTRFS_PAGE_OPS(name, set_page_func, clear_page_func,	\
+			       test_page_func)				\
+static inline void btrfs_page_set_##name(struct btrfs_fs_info *fs_info,	\
+			struct page *page, u64 start, u32 len)		\
+{									\
+	if (unlikely(!fs_info) || fs_info->sectorsize == PAGE_SIZE) {	\
+		set_page_func(page);					\
+		return;							\
+	}								\
+	btrfs_subpage_clamp_range(page, &start, &len);			\
+	btrfs_subpage_set_##name(fs_info, page, start, len);		\
+}									\
+static inline void btrfs_page_clear_##name(struct btrfs_fs_info *fs_info, \
+			struct page *page, u64 start, u32 len)		\
+{									\
+	if (unlikely(!fs_info) || fs_info->sectorsize == PAGE_SIZE) {	\
+		clear_page_func(page);					\
+		return;							\
+	}								\
+	btrfs_subpage_clamp_range(page, &start, &len);			\
+	btrfs_subpage_clear_##name(fs_info, page, start, len);		\
+}									\
+static inline bool btrfs_page_test_##name(struct btrfs_fs_info *fs_info, \
+			struct page *page, u64 start, u32 len)		\
+{									\
+	if (unlikely(!fs_info) || fs_info->sectorsize == PAGE_SIZE)	\
+		return test_page_func(page);				\
+	btrfs_subpage_clamp_range(page, &start, &len);			\
+	return btrfs_subpage_test_##name(fs_info, page, start, len);	\
+}
+DECLARE_BTRFS_PAGE_OPS(uptodate, SetPageUptodate, ClearPageUptodate,
+			PageUptodate);
+
 #endif /* BTRFS_SUBPAGE_H */
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 71+ messages in thread
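The btrfs_page_*() wrappers first clamp the caller's range to the page before delegating to the subpage helpers. A stand-alone sketch of btrfs_subpage_clamp_range() (rewritten as pure functions so it is testable; the kernel helper mutates its arguments in place; 64K pages assumed):

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE (64 * 1024UL)

/* Clamp [start, start + len) to the page beginning at @page_start. */
static uint64_t model_clamped_start(uint64_t page_start, uint64_t start)
{
	return start > page_start ? start : page_start;
}

static uint32_t model_clamped_len(uint64_t page_start, uint64_t start,
				  uint32_t len)
{
	uint64_t page_end = page_start + MODEL_PAGE_SIZE;
	uint64_t range_end = start + len;
	uint64_t end = range_end < page_end ? range_end : page_end;

	return (uint32_t)(end - model_clamped_start(page_start, start));
}
```

E.g. a range starting 4K before a page and extending 8K into it clamps to the first 4K of that page.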

* [PATCH v2 10/18] btrfs: subpage: introduce helper for subpage error status
  2020-12-10  6:38 [PATCH v2 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (8 preceding siblings ...)
  2020-12-10  6:38 ` [PATCH v2 09/18] btrfs: subpage: introduce helper for subpage uptodate status Qu Wenruo
@ 2020-12-10  6:38 ` Qu Wenruo
  2020-12-10  6:38 ` [PATCH v2 11/18] btrfs: extent_io: make set/clear_extent_buffer_uptodate() to support subpage size Qu Wenruo
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 71+ messages in thread
From: Qu Wenruo @ 2020-12-10  6:38 UTC (permalink / raw)
  To: linux-btrfs

This patch introduces the following functions to handle btrfs subpage
error status:
- btrfs_subpage_set_error()
- btrfs_subpage_clear_error()
- btrfs_subpage_test_error()
  These helpers can only be called when the range is guaranteed to be
  inside the page.

- btrfs_page_set_error()
- btrfs_page_clear_error()
- btrfs_page_test_error()
  These helpers handle both the regular sector size and subpage
  cases.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/subpage.h | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index b3cf9171ec98..8592234d773e 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -24,6 +24,7 @@ struct btrfs_subpage {
 	/* Common members for both data and metadata pages */
 	spinlock_t lock;
 	u16 uptodate_bitmap;
+	u16 error_bitmap;
 	union {
 		/* Structures only used by metadata */
 		struct {
@@ -135,6 +136,35 @@ static inline void btrfs_subpage_clear_uptodate(struct btrfs_fs_info *fs_info,
 	spin_unlock_irqrestore(&subpage->lock, flags);
 }
 
+static inline void btrfs_subpage_set_error(struct btrfs_fs_info *fs_info,
+					   struct page *page, u64 start,
+					   u32 len)
+{
+	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
+	u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len);
+	unsigned long flags;
+
+	spin_lock_irqsave(&subpage->lock, flags);
+	subpage->error_bitmap |= tmp;
+	SetPageError(page);
+	spin_unlock_irqrestore(&subpage->lock, flags);
+}
+
+static inline void btrfs_subpage_clear_error(struct btrfs_fs_info *fs_info,
+					   struct page *page, u64 start,
+					   u32 len)
+{
+	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
+	u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len);
+	unsigned long flags;
+
+	spin_lock_irqsave(&subpage->lock, flags);
+	subpage->error_bitmap &= ~tmp;
+	if (subpage->error_bitmap == 0)
+		ClearPageError(page);
+	spin_unlock_irqrestore(&subpage->lock, flags);
+}
+
 /*
  * Unlike set/clear which is dependent on each page status, for test all bits
  * are tested in the same way.
@@ -154,6 +184,7 @@ static inline bool btrfs_subpage_test_##name(struct btrfs_fs_info *fs_info, \
 	return ret;							\
 }
 DECLARE_BTRFS_SUBPAGE_TEST_OP(uptodate);
+DECLARE_BTRFS_SUBPAGE_TEST_OP(error);
 
 /*
  * Note that, in selftest, especially extent-io-tests, we can have empty
@@ -193,5 +224,6 @@ static inline bool btrfs_page_test_##name(struct btrfs_fs_info *fs_info, \
 }
 DECLARE_BTRFS_PAGE_OPS(uptodate, SetPageUptodate, ClearPageUptodate,
 			PageUptodate);
+DECLARE_BTRFS_PAGE_OPS(error, SetPageError, ClearPageError, PageError);
 
 #endif /* BTRFS_SUBPAGE_H */
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 71+ messages in thread
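Note the asymmetry between the two bitmaps: the uptodate helper from the previous patch sets PageUptodate only once every sector is uptodate (bitmap == 0xffff), while the error helper above sets PageError as soon as any sector errors and clears it only when no error bit remains. A small model of that rule (illustrative, not btrfs API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Page-flag rules derived from the subpage helpers: uptodate is an
 * all-sectors condition, error is an any-sector condition. */
static bool model_page_uptodate(uint16_t uptodate_bitmap)
{
	return uptodate_bitmap == (uint16_t)-1;
}

static bool model_page_error(uint16_t error_bitmap)
{
	return error_bitmap != 0;
}
```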

* [PATCH v2 11/18] btrfs: extent_io: make set/clear_extent_buffer_uptodate() to support subpage size
  2020-12-10  6:38 [PATCH v2 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (9 preceding siblings ...)
  2020-12-10  6:38 ` [PATCH v2 10/18] btrfs: subpage: introduce helper for subpage error status Qu Wenruo
@ 2020-12-10  6:38 ` Qu Wenruo
  2020-12-10  6:38 ` [PATCH v2 12/18] btrfs: extent_io: implement try_release_extent_buffer() for subpage metadata support Qu Wenruo
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 71+ messages in thread
From: Qu Wenruo @ 2020-12-10  6:38 UTC (permalink / raw)
  To: linux-btrfs

To support subpage sector size, these functions just need to call the
btrfs_page_set/clear_uptodate() wrappers.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index ee81a2a1baa2..141e414b1ab9 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5611,30 +5611,33 @@ bool set_extent_buffer_dirty(struct extent_buffer *eb)
 
 void clear_extent_buffer_uptodate(struct extent_buffer *eb)
 {
-	int i;
+	struct btrfs_fs_info *fs_info = eb->fs_info;
 	struct page *page;
 	int num_pages;
+	int i;
 
 	clear_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
 	num_pages = num_extent_pages(eb);
 	for (i = 0; i < num_pages; i++) {
 		page = eb->pages[i];
 		if (page)
-			ClearPageUptodate(page);
+			btrfs_page_clear_uptodate(fs_info, page,
+						  eb->start, eb->len);
 	}
 }
 
 void set_extent_buffer_uptodate(struct extent_buffer *eb)
 {
-	int i;
+	struct btrfs_fs_info *fs_info = eb->fs_info;
 	struct page *page;
 	int num_pages;
+	int i;
 
 	set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
 	num_pages = num_extent_pages(eb);
 	for (i = 0; i < num_pages; i++) {
 		page = eb->pages[i];
-		SetPageUptodate(page);
+		btrfs_page_set_uptodate(fs_info, page, eb->start, eb->len);
 	}
 }
 
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH v2 12/18] btrfs: extent_io: implement try_release_extent_buffer() for subpage metadata support
  2020-12-10  6:38 [PATCH v2 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (10 preceding siblings ...)
  2020-12-10  6:38 ` [PATCH v2 11/18] btrfs: extent_io: make set/clear_extent_buffer_uptodate() to support subpage size Qu Wenruo
@ 2020-12-10  6:38 ` Qu Wenruo
  2020-12-11 12:00   ` Nikolay Borisov
  2020-12-10  6:39 ` [PATCH v2 13/18] btrfs: extent_io: introduce read_extent_buffer_subpage() Qu Wenruo
                   ` (5 subsequent siblings)
  17 siblings, 1 reply; 71+ messages in thread
From: Qu Wenruo @ 2020-12-10  6:38 UTC (permalink / raw)
  To: linux-btrfs

Unlike the original try_release_extent_buffer,
try_release_subpage_extent_buffer() will iterate through
btrfs_subpage::tree_block_bitmap, and try to release each extent buffer.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 73 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 141e414b1ab9..4d55803302e9 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -6258,10 +6258,83 @@ void memmove_extent_buffer(const struct extent_buffer *dst,
 	}
 }
 
+static int try_release_subpage_extent_buffer(struct page *page)
+{
+	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
+	u64 page_start = page_offset(page);
+	int bitmap_size = BTRFS_SUBPAGE_BITMAP_SIZE;
+	int bit_start = 0;
+	int ret;
+
+	while (bit_start < bitmap_size) {
+		struct btrfs_subpage *subpage;
+		struct extent_buffer *eb;
+		unsigned long flags;
+		u16 tmp = 1 << bit_start;
+		u64 start;
+
+		/*
+		 * Make sure the page still has private attached, as a
+		 * previous iteration may have detached it.
+		 */
+		spin_lock(&page->mapping->private_lock);
+		if (!PagePrivate(page)) {
+			spin_unlock(&page->mapping->private_lock);
+			break;
+		}
+		subpage = (struct btrfs_subpage *)page->private;
+		spin_unlock(&page->mapping->private_lock);
+
+		spin_lock_irqsave(&subpage->lock, flags);
+		if (!(tmp & subpage->tree_block_bitmap))  {
+			spin_unlock_irqrestore(&subpage->lock, flags);
+			bit_start++;
+			continue;
+		}
+		spin_unlock_irqrestore(&subpage->lock, flags);
+
+		start = bit_start * fs_info->sectorsize + page_start;
+		bit_start += fs_info->nodesize >> fs_info->sectorsize_bits;
+		/*
+		 * Here we can't call find_extent_buffer() which will increase
+		 * eb->refs.
+		 */
+		rcu_read_lock();
+		eb = radix_tree_lookup(&fs_info->buffer_radix,
+				start >> fs_info->sectorsize_bits);
+		rcu_read_unlock();
+		ASSERT(eb);
+		spin_lock(&eb->refs_lock);
+		if (atomic_read(&eb->refs) != 1 || extent_buffer_under_io(eb) ||
+		    !test_and_clear_bit(EXTENT_BUFFER_TREE_REF, &eb->bflags)) {
+			spin_unlock(&eb->refs_lock);
+			continue;
+		}
+		/*
+		 * We don't care about the return value here, as we always
+		 * check the page private at the end.
+		 * And release_extent_buffer() will release the refs_lock.
+		 */
+		release_extent_buffer(eb);
+	}
+	/* Finally check whether we have cleared the page private */
+	spin_lock(&page->mapping->private_lock);
+	if (!PagePrivate(page))
+		ret = 1;
+	else
+		ret = 0;
+	spin_unlock(&page->mapping->private_lock);
+	return ret;
+
+}
+
 int try_release_extent_buffer(struct page *page)
 {
 	struct extent_buffer *eb;
 
+	if (btrfs_sb(page->mapping->host->i_sb)->sectorsize < PAGE_SIZE)
+		return try_release_subpage_extent_buffer(page);
+
 	/*
 	 * We need to make sure nobody is attaching this page to an eb right
 	 * now.
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 71+ messages in thread
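The release loop in try_release_subpage_extent_buffer() advances bit_start by a whole tree block (nodesize >> sectorsize_bits sectors) whenever it finds a set bit, so each extent buffer is visited once rather than once per sector. That stepping can be sketched as follows, assuming 4K sectors and a 16K nodesize (hypothetical helper, not kernel code):

```c
#include <assert.h>
#include <stdint.h>

/* Count the tree blocks recorded in a subpage bitmap by stepping one
 * nodesize worth of sectors (16K / 4K == 4 bits) per hit, mirroring the
 * bit_start advance in try_release_subpage_extent_buffer(). */
static int model_count_tree_blocks(uint16_t bitmap)
{
	const int sectors_per_node = 4;	/* nodesize >> sectorsize_bits */
	int bit_start = 0;
	int count = 0;

	while (bit_start < 16) {
		if (!(bitmap & (1U << bit_start))) {
			bit_start++;
			continue;
		}
		count++;
		bit_start += sectors_per_node;
	}
	return count;
}
```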

* [PATCH v2 13/18] btrfs: extent_io: introduce read_extent_buffer_subpage()
  2020-12-10  6:38 [PATCH v2 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (11 preceding siblings ...)
  2020-12-10  6:38 ` [PATCH v2 12/18] btrfs: extent_io: implement try_release_extent_buffer() for subpage metadata support Qu Wenruo
@ 2020-12-10  6:39 ` Qu Wenruo
  2020-12-10  6:39 ` [PATCH v2 14/18] btrfs: extent_io: make endio_readpage_update_page_status() to handle subpage case Qu Wenruo
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 71+ messages in thread
From: Qu Wenruo @ 2020-12-10  6:39 UTC (permalink / raw)
  To: linux-btrfs

Introduce a new helper, read_extent_buffer_subpage(), to do the subpage
extent buffer read.

The differences between the regular and subpage routines are:
- No page locking
  Here we completely rely on extent locking.
  Page locking can reduce the concurrency greatly, as if we lock one
  page to read one extent buffer, all the other extent buffers in the
  same page will have to wait.

- Extent uptodate condition
  Besides the existing PageUptodate() and EXTENT_BUFFER_UPTODATE checks,
  we also need to check btrfs_subpage::uptodate_bitmap.

- No page loop
  Just one page, no need to loop; this greatly simplifies the subpage
  routine.

This patch only implements the bio submission part; there is no endio
support yet.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c   |  1 +
 fs/btrfs/extent_io.c | 70 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 71 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 765deefda92b..b6c03a8b0c72 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -602,6 +602,7 @@ int btrfs_validate_metadata_buffer(struct btrfs_io_bio *io_bio,
 	ASSERT(page->private);
 	eb = (struct extent_buffer *)page->private;
 
+
 	/*
 	 * The pending IO might have been the only thing that kept this buffer
 	 * in memory.  Make sure we have a ref for all this other checks
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 4d55803302e9..1ec9de2aa910 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5641,6 +5641,73 @@ void set_extent_buffer_uptodate(struct extent_buffer *eb)
 	}
 }
 
+static int read_extent_buffer_subpage(struct extent_buffer *eb, int wait,
+				      int mirror_num)
+{
+	struct btrfs_fs_info *fs_info = eb->fs_info;
+	struct extent_io_tree *io_tree;
+	struct page *page = eb->pages[0];
+	struct bio *bio = NULL;
+	int ret = 0;
+
+	ASSERT(!test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags));
+	ASSERT(PagePrivate(page));
+	io_tree = &BTRFS_I(fs_info->btree_inode)->io_tree;
+
+	if (wait == WAIT_NONE) {
+		ret = try_lock_extent(io_tree, eb->start,
+				      eb->start + eb->len - 1);
+		if (ret <= 0)
+			return ret;
+	} else {
+		ret = lock_extent(io_tree, eb->start, eb->start + eb->len - 1);
+		if (ret < 0)
+			return ret;
+	}
+
+	ret = 0;
+	if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags) ||
+	    PageUptodate(page) ||
+	    btrfs_subpage_test_uptodate(fs_info, page, eb->start, eb->len)) {
+		set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
+		unlock_extent(io_tree, eb->start, eb->start + eb->len - 1);
+		return ret;
+	}
+
+	clear_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags);
+	eb->read_mirror = 0;
+	atomic_set(&eb->io_pages, 1);
+	check_buffer_tree_ref(eb);
+
+	ret = submit_extent_page(REQ_OP_READ | REQ_META, NULL, page, eb->start,
+				 eb->len, eb->start - page_offset(page), &bio,
+				 end_bio_extent_readpage, mirror_num, 0, 0,
+				 true);
+	if (ret) {
+		/*
+		 * In the endio function, if we hit something wrong we will
+		 * increase the io_pages, so here we need to decrease it for
+		 * the error path.
+		 */
+		atomic_dec(&eb->io_pages);
+	}
+	if (bio) {
+		int tmp;
+
+		tmp = submit_one_bio(bio, mirror_num, 0);
+		if (tmp < 0)
+			return tmp;
+	}
+	if (ret || wait != WAIT_COMPLETE)
+		return ret;
+
+	wait_extent_bit(io_tree, eb->start, eb->start + eb->len - 1,
+			EXTENT_LOCKED);
+	if (!test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags))
+		ret = -EIO;
+	return ret;
+}
+
 int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num)
 {
 	int i;
@@ -5657,6 +5724,9 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num)
 	if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags))
 		return 0;
 
+	if (eb->fs_info->sectorsize < PAGE_SIZE)
+		return read_extent_buffer_subpage(eb, wait, mirror_num);
+
 	num_pages = num_extent_pages(eb);
 	for (i = 0; i < num_pages; i++) {
 		page = eb->pages[i];
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH v2 14/18] btrfs: extent_io: make endio_readpage_update_page_status() to handle subpage case
  2020-12-10  6:38 [PATCH v2 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (12 preceding siblings ...)
  2020-12-10  6:39 ` [PATCH v2 13/18] btrfs: extent_io: introduce read_extent_buffer_subpage() Qu Wenruo
@ 2020-12-10  6:39 ` Qu Wenruo
  2020-12-14  9:57   ` Nikolay Borisov
  2020-12-10  6:39 ` [PATCH v2 15/18] btrfs: disk-io: introduce subpage metadata validation check Qu Wenruo
                   ` (3 subsequent siblings)
  17 siblings, 1 reply; 71+ messages in thread
From: Qu Wenruo @ 2020-12-10  6:39 UTC (permalink / raw)
  To: linux-btrfs

To handle subpage status update, add the following new tricks:
- Use btrfs_page_*() helpers to update page status
  Now we can handle both cases well.

- No page unlock for subpage metadata
  Since subpage metadata doesn't utilize page locking at all, skip it.
  Subpage data locking is handled in later commits.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 1ec9de2aa910..64a19c1884fc 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2841,15 +2841,26 @@ static void endio_readpage_release_extent(struct processed_extent *processed,
 	processed->uptodate = uptodate;
 }
 
-static void endio_readpage_update_page_status(struct page *page, bool uptodate)
+static void endio_readpage_update_page_status(struct page *page, bool uptodate,
+					      u64 start, u64 end)
 {
+	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
+	u32 len;
+
+	ASSERT(page_offset(page) <= start &&
+		end <= page_offset(page) + PAGE_SIZE - 1);
+	len = end + 1 - start;
+
 	if (uptodate) {
-		SetPageUptodate(page);
+		btrfs_page_set_uptodate(fs_info, page, start, len);
 	} else {
-		ClearPageUptodate(page);
-		SetPageError(page);
+		btrfs_page_clear_uptodate(fs_info, page, start, len);
+		btrfs_page_set_error(fs_info, page, start, len);
 	}
-	unlock_page(page);
+
+	if (fs_info->sectorsize == PAGE_SIZE)
+		unlock_page(page);
+	/* Subpage locking will be handled in later patches */
 }
 
 /*
@@ -2986,7 +2997,7 @@ static void end_bio_extent_readpage(struct bio *bio)
 		bio_offset += len;
 
 		/* Update page status and unlock */
-		endio_readpage_update_page_status(page, uptodate);
+		endio_readpage_update_page_status(page, uptodate, start, end);
 		endio_readpage_release_extent(&processed, BTRFS_I(inode),
 					      start, end, uptodate);
 	}
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH v2 15/18] btrfs: disk-io: introduce subpage metadata validation check
  2020-12-10  6:38 [PATCH v2 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (13 preceding siblings ...)
  2020-12-10  6:39 ` [PATCH v2 14/18] btrfs: extent_io: make endio_readpage_update_page_status() to handle subpage case Qu Wenruo
@ 2020-12-10  6:39 ` Qu Wenruo
  2020-12-10 13:24     ` kernel test robot
                     ` (2 more replies)
  2020-12-10  6:39 ` [PATCH v2 16/18] btrfs: introduce btrfs_subpage for data inodes Qu Wenruo
                   ` (2 subsequent siblings)
  17 siblings, 3 replies; 71+ messages in thread
From: Qu Wenruo @ 2020-12-10  6:39 UTC (permalink / raw)
  To: linux-btrfs

For the subpage metadata validation check, there are some differences:
- Read must finish in one bvec
  Since we're just reading one subpage range in one page, it should
  never be split across two bios or two bvecs.

- How to grab the existing eb
  Instead of grabbing the eb using page->private, we have to search the
  radix tree, as we don't have a direct pointer at hand.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 82 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 82 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b6c03a8b0c72..adda76895058 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -591,6 +591,84 @@ static int validate_extent_buffer(struct extent_buffer *eb)
 	return ret;
 }
 
+static int validate_subpage_buffer(struct page *page, u64 start, u64 end,
+				   int mirror)
+{
+	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
+	struct extent_buffer *eb;
+	int reads_done;
+	int ret = 0;
+
+	if (!IS_ALIGNED(start, fs_info->sectorsize) ||
+	    !IS_ALIGNED(end - start + 1, fs_info->sectorsize) ||
+	    !IS_ALIGNED(end - start + 1, fs_info->nodesize)) {
+		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
+		btrfs_err(fs_info, "invalid tree read bytenr");
+		return -EUCLEAN;
+	}
+
+	/*
+	 * We don't allow bio merge for subpage metadata read, so we should
+	 * only get one eb for each endio hook.
+	 */
+	ASSERT(end == start + fs_info->nodesize - 1);
+	ASSERT(PagePrivate(page));
+
+	rcu_read_lock();
+	eb = radix_tree_lookup(&fs_info->buffer_radix,
+			       start / fs_info->sectorsize);
+	rcu_read_unlock();
+
+	/*
+	 * When we are reading one tree block, eb must have been
+	 * inserted into the radix tree. If not something is wrong.
+	 */
+	if (!eb) {
+		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
+		btrfs_err(fs_info,
+			"can't find extent buffer for bytenr %llu",
+			start);
+		return -EUCLEAN;
+	}
+	/*
+	 * The pending IO might have been the only thing that kept
+	 * this buffer in memory.  Make sure we have a ref for all
+	 * this other checks
+	 */
+	atomic_inc(&eb->refs);
+
+	reads_done = atomic_dec_and_test(&eb->io_pages);
+	/* Subpage read must finish in page read */
+	ASSERT(reads_done);
+
+	eb->read_mirror = mirror;
+	if (test_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags)) {
+		ret = -EIO;
+		goto err;
+	}
+	ret = validate_extent_buffer(eb);
+	if (ret < 0)
+		goto err;
+
+	if (test_and_clear_bit(EXTENT_BUFFER_READAHEAD, &eb->bflags))
+		btree_readahead_hook(eb, ret);
+
+	set_extent_buffer_uptodate(eb);
+
+	free_extent_buffer(eb);
+	return ret;
+err:
+	/*
+	 * our io error hook is going to dec the io pages
+	 * again, we have to make sure it has something to
+	 * decrement
+	 */
+	atomic_inc(&eb->io_pages);
+	clear_extent_buffer_uptodate(eb);
+	free_extent_buffer(eb);
+	return ret;
+}
+
 int btrfs_validate_metadata_buffer(struct btrfs_io_bio *io_bio,
 				   struct page *page, u64 start, u64 end,
 				   int mirror)
@@ -600,6 +678,10 @@ int btrfs_validate_metadata_buffer(struct btrfs_io_bio *io_bio,
 	int reads_done;
 
 	ASSERT(page->private);
+
+	if (btrfs_sb(page->mapping->host->i_sb)->sectorsize < PAGE_SIZE)
+		return validate_subpage_buffer(page, start, end, mirror);
+
 	eb = (struct extent_buffer *)page->private;
 
 
-- 
2.29.2



* [PATCH v2 16/18] btrfs: introduce btrfs_subpage for data inodes
  2020-12-10  6:38 [PATCH v2 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (14 preceding siblings ...)
  2020-12-10  6:39 ` [PATCH v2 15/18] btrfs: disk-io: introduce subpage metadata validation check Qu Wenruo
@ 2020-12-10  6:39 ` Qu Wenruo
  2020-12-10  9:44     ` kernel test robot
                     ` (2 more replies)
  2020-12-10  6:39 ` [PATCH v2 17/18] btrfs: integrate page status update for read path into begin/end_page_read() Qu Wenruo
  2020-12-10  6:39 ` [PATCH v2 18/18] btrfs: allow RO mount of 4K sector size fs on 64K page system Qu Wenruo
  17 siblings, 3 replies; 71+ messages in thread
From: Qu Wenruo @ 2020-12-10  6:39 UTC (permalink / raw)
  To: linux-btrfs

To support subpage sector size, data also needs extra info to track
which sectors in a page are uptodate/dirty/...

This patch makes pages for data inodes get a btrfs_subpage structure
attached, which is detached when the page is freed.

This patch also slightly changes the timing of set_page_extent_mapped()
to make sure:
- We have page->mapping set
  page->mapping->host is used to grab btrfs_fs_info, thus we can only
  call this function after the page is mapped to an inode.

  One call site attaches pages to an inode manually, thus we have to
  modify the timing of set_page_extent_mapped() a little.

- As soon as possible, before other operations
  Since memory allocation can fail, we have to do extra error handling.
  Calling set_page_extent_mapped() as soon as possible simplifies the
  error handling for several call sites.

The idea is pretty much the same as iomap_page, but with more bitmaps
for btrfs specific cases.

Currently the plan is to switch to iomap if iomap can provide
sector-aligned writeback (write back only dirty sectors, not the full
page; data balance requires this feature).

So we will stick to the btrfs-specific bitmap for now.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/compression.c      | 10 ++++++--
 fs/btrfs/extent_io.c        | 47 +++++++++++++++++++++++++++++++++----
 fs/btrfs/extent_io.h        |  3 ++-
 fs/btrfs/file.c             | 10 +++++---
 fs/btrfs/free-space-cache.c | 15 +++++++++---
 fs/btrfs/inode.c            | 12 ++++++----
 fs/btrfs/ioctl.c            |  5 +++-
 fs/btrfs/reflink.c          |  5 +++-
 fs/btrfs/relocation.c       | 12 ++++++++--
 9 files changed, 98 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 5ae3fa0386b7..6d203acfdeb3 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -542,13 +542,19 @@ static noinline int add_ra_bio_pages(struct inode *inode,
 			goto next;
 		}
 
-		end = last_offset + PAGE_SIZE - 1;
 		/*
 		 * at this point, we have a locked page in the page cache
 		 * for these bytes in the file.  But, we have to make
 		 * sure they map to this compressed extent on disk.
 		 */
-		set_page_extent_mapped(page);
+		ret = set_page_extent_mapped(page);
+		if (ret < 0) {
+			unlock_page(page);
+			put_page(page);
+			break;
+		}
+
+		end = last_offset + PAGE_SIZE - 1;
 		lock_extent(tree, last_offset, end);
 		read_lock(&em_tree->lock);
 		em = lookup_extent_mapping(em_tree, last_offset,
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 64a19c1884fc..4e4ed9c453ae 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3191,10 +3191,40 @@ static int attach_extent_buffer_page(struct extent_buffer *eb,
 	return 0;
 }
 
-void set_page_extent_mapped(struct page *page)
+int __must_check set_page_extent_mapped(struct page *page)
 {
-	if (!PagePrivate(page))
+	struct btrfs_fs_info *fs_info;
+
+	ASSERT(page->mapping);
+
+	if (PagePrivate(page))
+		return 0;
+
+	fs_info = btrfs_sb(page->mapping->host->i_sb);
+	if (fs_info->sectorsize == PAGE_SIZE) {
 		attach_page_private(page, (void *)EXTENT_PAGE_PRIVATE);
+		return 0;
+	}
+
+	return btrfs_attach_subpage(fs_info, page);
+}
+
+void clear_page_extent_mapped(struct page *page)
+{
+	struct btrfs_fs_info *fs_info;
+
+	ASSERT(page->mapping);
+
+	if (!PagePrivate(page))
+		return;
+
+	fs_info = btrfs_sb(page->mapping->host->i_sb);
+	if (fs_info->sectorsize == PAGE_SIZE) {
+		detach_page_private(page);
+		return;
+	}
+
+	btrfs_detach_subpage(fs_info, page);
 }
 
 static struct extent_map *
@@ -3251,7 +3281,12 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 	unsigned long this_bio_flag = 0;
 	struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
 
-	set_page_extent_mapped(page);
+	ret = set_page_extent_mapped(page);
+	if (ret < 0) {
+		unlock_extent(tree, start, end);
+		SetPageError(page);
+		goto out;
+	}
 
 	if (!PageUptodate(page)) {
 		if (cleancache_get_page(page) == 0) {
@@ -3693,7 +3728,11 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc,
 		flush_dcache_page(page);
 	}
 
-	set_page_extent_mapped(page);
+	ret = set_page_extent_mapped(page);
+	if (ret < 0) {
+		SetPageError(page);
+		goto done;
+	}
 
 	if (!epd->extent_locked) {
 		ret = writepage_delalloc(BTRFS_I(inode), page, wbc, start,
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 19221095c635..349d044c1254 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -178,7 +178,8 @@ int btree_write_cache_pages(struct address_space *mapping,
 void extent_readahead(struct readahead_control *rac);
 int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
 		  u64 start, u64 len);
-void set_page_extent_mapped(struct page *page);
+int __must_check set_page_extent_mapped(struct page *page);
+void clear_page_extent_mapped(struct page *page);
 
 struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 					  u64 start, u64 owner_root, int level);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index a29b50208eee..9b878616b489 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1373,6 +1373,12 @@ static noinline int prepare_pages(struct inode *inode, struct page **pages,
 			goto fail;
 		}
 
+		err = set_page_extent_mapped(pages[i]);
+		if (err < 0) {
+			faili = i;
+			goto fail;
+		}
+
 		if (i == 0)
 			err = prepare_uptodate_page(inode, pages[i], pos,
 						    force_uptodate);
@@ -1470,10 +1476,8 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
 	 * We'll call btrfs_dirty_pages() later on, and that will flip around
 	 * delalloc bits and dirty the pages as required.
 	 */
-	for (i = 0; i < num_pages; i++) {
-		set_page_extent_mapped(pages[i]);
+	for (i = 0; i < num_pages; i++)
 		WARN_ON(!PageLocked(pages[i]));
-	}
 
 	return ret;
 }
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 71d0d14bc18b..c347b415060a 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -431,11 +431,22 @@ static int io_ctl_prepare_pages(struct btrfs_io_ctl *io_ctl, bool uptodate)
 	int i;
 
 	for (i = 0; i < io_ctl->num_pages; i++) {
+		int ret;
+
 		page = find_or_create_page(inode->i_mapping, i, mask);
 		if (!page) {
 			io_ctl_drop_pages(io_ctl);
 			return -ENOMEM;
 		}
+
+		ret = set_page_extent_mapped(page);
+		if (ret < 0) {
+			unlock_page(page);
+			put_page(page);
+			io_ctl_drop_pages(io_ctl);
+			return -ENOMEM;
+		}
+
 		io_ctl->pages[i] = page;
 		if (uptodate && !PageUptodate(page)) {
 			btrfs_readpage(NULL, page);
@@ -455,10 +466,8 @@ static int io_ctl_prepare_pages(struct btrfs_io_ctl *io_ctl, bool uptodate)
 		}
 	}
 
-	for (i = 0; i < io_ctl->num_pages; i++) {
+	for (i = 0; i < io_ctl->num_pages; i++)
 		clear_page_dirty_for_io(io_ctl->pages[i]);
-		set_page_extent_mapped(io_ctl->pages[i]);
-	}
 
 	return 0;
 }
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 070716650df8..5b64715df92e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4720,6 +4720,9 @@ int btrfs_truncate_block(struct btrfs_inode *inode, loff_t from, loff_t len,
 		ret = -ENOMEM;
 		goto out;
 	}
+	ret = set_page_extent_mapped(page);
+	if (ret < 0)
+		goto out_unlock;
 
 	if (!PageUptodate(page)) {
 		ret = btrfs_readpage(NULL, page);
@@ -4737,7 +4740,6 @@ int btrfs_truncate_block(struct btrfs_inode *inode, loff_t from, loff_t len,
 	wait_on_page_writeback(page);
 
 	lock_extent_bits(io_tree, block_start, block_end, &cached_state);
-	set_page_extent_mapped(page);
 
 	ordered = btrfs_lookup_ordered_extent(inode, block_start);
 	if (ordered) {
@@ -8117,7 +8119,7 @@ static int __btrfs_releasepage(struct page *page, gfp_t gfp_flags)
 {
 	int ret = try_release_extent_mapping(page, gfp_flags);
 	if (ret == 1)
-		detach_page_private(page);
+		clear_page_extent_mapped(page);
 	return ret;
 }
 
@@ -8276,7 +8278,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 	}
 
 	ClearPageChecked(page);
-	detach_page_private(page);
+	clear_page_extent_mapped(page);
 }
 
 /*
@@ -8355,7 +8357,9 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
 	wait_on_page_writeback(page);
 
 	lock_extent_bits(io_tree, page_start, page_end, &cached_state);
-	set_page_extent_mapped(page);
+	ret = set_page_extent_mapped(page);
+	if (ret < 0)
+		goto out_unlock;
 
 	/*
 	 * we can't set the delalloc bits if there are pending ordered
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index dde49a791f3e..1d58ffb9212f 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1319,6 +1319,10 @@ static int cluster_pages_for_defrag(struct inode *inode,
 		if (!page)
 			break;
 
+		ret = set_page_extent_mapped(page);
+		if (ret < 0)
+			break;
+
 		page_start = page_offset(page);
 		page_end = page_start + PAGE_SIZE - 1;
 		while (1) {
@@ -1440,7 +1444,6 @@ static int cluster_pages_for_defrag(struct inode *inode,
 	for (i = 0; i < i_done; i++) {
 		clear_page_dirty_for_io(pages[i]);
 		ClearPageChecked(pages[i]);
-		set_page_extent_mapped(pages[i]);
 		set_page_dirty(pages[i]);
 		unlock_page(pages[i]);
 		put_page(pages[i]);
diff --git a/fs/btrfs/reflink.c b/fs/btrfs/reflink.c
index b03e7891394e..b24396cf2f99 100644
--- a/fs/btrfs/reflink.c
+++ b/fs/btrfs/reflink.c
@@ -81,7 +81,10 @@ static int copy_inline_to_page(struct btrfs_inode *inode,
 		goto out_unlock;
 	}
 
-	set_page_extent_mapped(page);
+	ret = set_page_extent_mapped(page);
+	if (ret < 0)
+		goto out_unlock;
+
 	clear_extent_bit(&inode->io_tree, file_offset, range_end,
 			 EXTENT_DELALLOC | EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG,
 			 0, 0, NULL);
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 19b7db8b2117..41ee0f376af3 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2679,6 +2679,16 @@ static int relocate_file_extent_cluster(struct inode *inode,
 				goto out;
 			}
 		}
+		ret = set_page_extent_mapped(page);
+		if (ret < 0) {
+			btrfs_delalloc_release_metadata(BTRFS_I(inode),
+						PAGE_SIZE, true);
+			btrfs_delalloc_release_extents(BTRFS_I(inode),
+						PAGE_SIZE);
+			unlock_page(page);
+			put_page(page);
+			goto out;
+		}
 
 		if (PageReadahead(page)) {
 			page_cache_async_readahead(inode->i_mapping,
@@ -2706,8 +2716,6 @@ static int relocate_file_extent_cluster(struct inode *inode,
 
 		lock_extent(&BTRFS_I(inode)->io_tree, page_start, page_end);
 
-		set_page_extent_mapped(page);
-
 		if (nr < cluster->nr &&
 		    page_start + offset == cluster->boundary[nr]) {
 			set_extent_bits(&BTRFS_I(inode)->io_tree,
-- 
2.29.2



* [PATCH v2 17/18] btrfs: integrate page status update for read path into begin/end_page_read()
  2020-12-10  6:38 [PATCH v2 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (15 preceding siblings ...)
  2020-12-10  6:39 ` [PATCH v2 16/18] btrfs: introduce btrfs_subpage for data inodes Qu Wenruo
@ 2020-12-10  6:39 ` Qu Wenruo
  2020-12-14 13:59   ` Nikolay Borisov
  2020-12-10  6:39 ` [PATCH v2 18/18] btrfs: allow RO mount of 4K sector size fs on 64K page system Qu Wenruo
  17 siblings, 1 reply; 71+ messages in thread
From: Qu Wenruo @ 2020-12-10  6:39 UTC (permalink / raw)
  To: linux-btrfs

In the btrfs data page read path, the page status updates are handled
in two different locations:

  btrfs_do_read_page()
  {
	while (cur <= end) {
		/* No need to read from disk */
		if (HOLE/PREALLOC/INLINE){
			memset();
			set_extent_uptodate();
			continue;
		}
		/* Read from disk */
		ret = submit_extent_page(end_bio_extent_readpage);
  }

  end_bio_extent_readpage()
  {
	endio_readpage_update_page_status();
  }

This is fine for the sectorsize == PAGE_SIZE case, as in the above loop
we should only hit one branch and then exit.

But for subpage, there is more work to be done in the page status update:
- Page Unlock condition
  Unlike the regular sectorsize == PAGE_SIZE case, we can no longer
  just unlock a page.
  Only the last reader of the page can unlock it.
  This means we can unlock the page either in the while() loop or in
  the endio function.

- Page uptodate condition
  Since we have multiple sectors to read for a page, we can only mark
  the full page uptodate if all sectors are uptodate.

To handle both subpage and regular cases, introduce a pair of functions
to help handle page status updates:

- begin_data_page_read()
  For the regular case, it does nothing.
  For the subpage case, it updates the reader counters so that a later
  end_page_read() can know who is the last one to unlock the page.

- end_page_read()
  This is just endio_readpage_update_page_status() renamed.
  The original name is a little too long and too specific for endio.

  The only new trick added is the condition for page unlock.
  Now for subpage data, we unlock the page if we're the last reader.

This not only provides the basis for subpage data read, but also hides
the special handling of page read from the main read loop.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 39 +++++++++++++++++++++++++-----------
 fs/btrfs/subpage.h   | 47 ++++++++++++++++++++++++++++++++++++++------
 2 files changed, 68 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 4e4ed9c453ae..56174e7f0ae8 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2841,8 +2841,18 @@ static void endio_readpage_release_extent(struct processed_extent *processed,
 	processed->uptodate = uptodate;
 }
 
-static void endio_readpage_update_page_status(struct page *page, bool uptodate,
-					      u64 start, u64 end)
+static void begin_data_page_read(struct btrfs_fs_info *fs_info, struct page *page)
+{
+	ASSERT(PageLocked(page));
+	if (fs_info->sectorsize == PAGE_SIZE)
+		return;
+
+	ASSERT(PagePrivate(page) && page->private);
+	ASSERT(page->mapping->host != fs_info->btree_inode);
+	btrfs_subpage_start_reader(fs_info, page, page_offset(page), PAGE_SIZE);
+}
+
+static void end_page_read(struct page *page, bool uptodate, u64 start, u64 end)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
 	u32 len;
@@ -2860,7 +2870,12 @@ static void endio_readpage_update_page_status(struct page *page, bool uptodate,
 
 	if (fs_info->sectorsize == PAGE_SIZE)
 		unlock_page(page);
-	/* Subpage locking will be handled in later patches */
+	else if (page->mapping->host != fs_info->btree_inode)
+		/*
+		 * For subpage data, unlock the page if we're the last reader.
+		 * For subpage metadata, page lock is not utilized for read.
+		 */
+		btrfs_subpage_end_reader(fs_info, page, start, len);
 }
 
 /*
@@ -2997,7 +3012,7 @@ static void end_bio_extent_readpage(struct bio *bio)
 		bio_offset += len;
 
 		/* Update page status and unlock */
-		endio_readpage_update_page_status(page, uptodate, start, end);
+		end_page_read(page, uptodate, start, end);
 		endio_readpage_release_extent(&processed, BTRFS_I(inode),
 					      start, end, uptodate);
 	}
@@ -3265,6 +3280,7 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 		      unsigned int read_flags, u64 *prev_em_start)
 {
 	struct inode *inode = page->mapping->host;
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	u64 start = page_offset(page);
 	const u64 end = start + PAGE_SIZE - 1;
 	u64 cur = start;
@@ -3308,6 +3324,7 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 			kunmap_atomic(userpage);
 		}
 	}
+	begin_data_page_read(fs_info, page);
 	while (cur <= end) {
 		bool force_bio_submit = false;
 		u64 disk_bytenr;
@@ -3325,13 +3342,14 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 					    &cached, GFP_NOFS);
 			unlock_extent_cached(tree, cur,
 					     cur + iosize - 1, &cached);
+			end_page_read(page, true, cur, cur + iosize - 1);
 			break;
 		}
 		em = __get_extent_map(inode, page, pg_offset, cur,
 				      end - cur + 1, em_cached);
 		if (IS_ERR_OR_NULL(em)) {
-			SetPageError(page);
 			unlock_extent(tree, cur, end);
+			end_page_read(page, false, cur, end);
 			break;
 		}
 		extent_offset = cur - em->start;
@@ -3414,6 +3432,7 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 					    &cached, GFP_NOFS);
 			unlock_extent_cached(tree, cur,
 					     cur + iosize - 1, &cached);
+			end_page_read(page, true, cur, cur + iosize - 1);
 			cur = cur + iosize;
 			pg_offset += iosize;
 			continue;
@@ -3423,6 +3442,7 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 				   EXTENT_UPTODATE, 1, NULL)) {
 			check_page_uptodate(tree, page);
 			unlock_extent(tree, cur, cur + iosize - 1);
+			end_page_read(page, true, cur, cur + iosize - 1);
 			cur = cur + iosize;
 			pg_offset += iosize;
 			continue;
@@ -3431,8 +3451,8 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 		 * to date.  Error out
 		 */
 		if (block_start == EXTENT_MAP_INLINE) {
-			SetPageError(page);
 			unlock_extent(tree, cur, cur + iosize - 1);
+			end_page_read(page, false, cur, cur + iosize - 1);
 			cur = cur + iosize;
 			pg_offset += iosize;
 			continue;
@@ -3449,19 +3469,14 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 			nr++;
 			*bio_flags = this_bio_flag;
 		} else {
-			SetPageError(page);
 			unlock_extent(tree, cur, cur + iosize - 1);
+			end_page_read(page, false, cur, cur + iosize - 1);
 			goto out;
 		}
 		cur = cur + iosize;
 		pg_offset += iosize;
 	}
 out:
-	if (!nr) {
-		if (!PageError(page))
-			SetPageUptodate(page);
-		unlock_page(page);
-	}
 	return ret;
 }
 
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index 8592234d773e..6c801ef00d2d 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -31,6 +31,9 @@ struct btrfs_subpage {
 			u16 tree_block_bitmap;
 		};
 		/* structures only used by data */
+		struct {
+			atomic_t readers;
+		};
 	};
 };
 
@@ -48,6 +51,17 @@ static inline void btrfs_subpage_clamp_range(struct page *page,
 		     orig_start + orig_len) - *start;
 }
 
+static inline void btrfs_subpage_assert(struct btrfs_fs_info *fs_info,
+					struct page *page, u64 start, u32 len)
+{
+	/* Basic checks */
+	ASSERT(PagePrivate(page) && page->private);
+	ASSERT(IS_ALIGNED(start, fs_info->sectorsize) &&
+	       IS_ALIGNED(len, fs_info->sectorsize));
+	ASSERT(page_offset(page) <= start &&
+	       start + len <= page_offset(page) + PAGE_SIZE);
+}
+
 /*
  * Convert the [start, start + len) range into a u16 bitmap
  *
@@ -59,12 +73,8 @@ static inline u16 btrfs_subpage_calc_bitmap(struct btrfs_fs_info *fs_info,
 	int bit_start = (start - page_offset(page)) >> fs_info->sectorsize_bits;
 	int nbits = len >> fs_info->sectorsize_bits;
 
-	/* Basic checks */
-	ASSERT(PagePrivate(page) && page->private);
-	ASSERT(IS_ALIGNED(start, fs_info->sectorsize) &&
-	       IS_ALIGNED(len, fs_info->sectorsize));
-	ASSERT(page_offset(page) <= start &&
-	       start + len <= page_offset(page) + PAGE_SIZE);
+	btrfs_subpage_assert(fs_info, page, start, len);
+
 	/*
 	 * Here nbits can be 16, thus can go beyond u16 range. Here we make the
 	 * first left shift to be calculated in unsigned long (u32), then
@@ -73,6 +83,31 @@ static inline u16 btrfs_subpage_calc_bitmap(struct btrfs_fs_info *fs_info,
 	return (u16)(((1UL << nbits) - 1) << bit_start);
 }
 
+static inline void btrfs_subpage_start_reader(struct btrfs_fs_info *fs_info,
+					      struct page *page, u64 start,
+					      u32 len)
+{
+	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
+	int nbits = len >> fs_info->sectorsize_bits;
+
+	btrfs_subpage_assert(fs_info, page, start, len);
+
+	ASSERT(atomic_read(&subpage->readers) == 0);
+	atomic_set(&subpage->readers, nbits);
+}
+
+static inline void btrfs_subpage_end_reader(struct btrfs_fs_info *fs_info,
+			struct page *page, u64 start, u32 len)
+{
+	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
+	int nbits = len >> fs_info->sectorsize_bits;
+
+	btrfs_subpage_assert(fs_info, page, start, len);
+	ASSERT(atomic_read(&subpage->readers) >= nbits);
+	if (atomic_sub_and_test(nbits, &subpage->readers))
+		unlock_page(page);
+}
+
 static inline void btrfs_subpage_set_tree_block(struct btrfs_fs_info *fs_info,
 			struct page *page, u64 start, u32 len)
 {
-- 
2.29.2



* [PATCH v2 18/18] btrfs: allow RO mount of 4K sector size fs on 64K page system
  2020-12-10  6:38 [PATCH v2 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
                   ` (16 preceding siblings ...)
  2020-12-10  6:39 ` [PATCH v2 17/18] btrfs: integrate page status update for read path into begin/end_page_read() Qu Wenruo
@ 2020-12-10  6:39 ` Qu Wenruo
  17 siblings, 0 replies; 71+ messages in thread
From: Qu Wenruo @ 2020-12-10  6:39 UTC (permalink / raw)
  To: linux-btrfs

This adds basic RO mount support for 4K sector size on 64K page
systems.

Currently we only plan to support 4K and 64K page sizes.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 24 +++++++++++++++++++++---
 fs/btrfs/super.c   |  7 +++++++
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index adda76895058..8ab6308ff852 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2510,13 +2510,21 @@ static int validate_super(struct btrfs_fs_info *fs_info,
 		btrfs_err(fs_info, "invalid sectorsize %llu", sectorsize);
 		ret = -EINVAL;
 	}
-	/* Only PAGE SIZE is supported yet */
-	if (sectorsize != PAGE_SIZE) {
+
+	/*
+	 * For 4K page size, we only support 4K sector size.
+	 * For 64K page size, we support RW for 64K sector size, and RO for
+	 * 4K sector size.
+	 */
+	if ((SZ_4K == PAGE_SIZE && sectorsize != PAGE_SIZE) ||
+	    (SZ_64K == PAGE_SIZE && (sectorsize != SZ_4K &&
+				     sectorsize != SZ_64K))) {
 		btrfs_err(fs_info,
-			"sectorsize %llu not supported yet, only support %lu",
+			"sectorsize %llu not supported yet for page size %lu",
 			sectorsize, PAGE_SIZE);
 		ret = -EINVAL;
 	}
+
 	if (!is_power_of_2(nodesize) || nodesize < sectorsize ||
 	    nodesize > BTRFS_MAX_METADATA_BLOCKSIZE) {
 		btrfs_err(fs_info, "invalid nodesize %llu", nodesize);
@@ -3272,6 +3280,16 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 		goto fail_alloc;
 	}
 
+	/* For 4K sector size support, it's only read-only yet */
+	if (PAGE_SIZE == SZ_64K && sectorsize == SZ_4K) {
+		if (!sb_rdonly(sb) || btrfs_super_log_root(disk_super)) {
+			btrfs_err(fs_info,
+				"subpage sector size only support RO yet");
+			err = -EINVAL;
+			goto fail_alloc;
+		}
+	}
+
 	ret = btrfs_init_workqueues(fs_info, fs_devices);
 	if (ret) {
 		err = ret;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 022f20810089..a8068c389d60 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1996,6 +1996,13 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 			ret = -EINVAL;
 			goto restore;
 		}
+		if (fs_info->sectorsize < PAGE_SIZE) {
+			btrfs_warn(fs_info,
+	"read-write mount is not yet allowed for sector size %u page size %lu",
+				   fs_info->sectorsize, PAGE_SIZE);
+			ret = -EINVAL;
+			goto restore;
+		}
 
 		/*
 		 * NOTE: when remounting with a change that does writes, don't
-- 
2.29.2



* Re: [PATCH v2 16/18] btrfs: introduce btrfs_subpage for data inodes
  2020-12-10  6:39 ` [PATCH v2 16/18] btrfs: introduce btrfs_subpage for data inodes Qu Wenruo
@ 2020-12-10  9:44     ` kernel test robot
  2020-12-11  0:43     ` kernel test robot
  2020-12-14 12:46   ` Nikolay Borisov
  2 siblings, 0 replies; 71+ messages in thread
From: kernel test robot @ 2020-12-10  9:44 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 8390 bytes --]

Hi Qu,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on kdave/for-next]
[also build test WARNING on next-20201209]
[cannot apply to v5.10-rc7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Qu-Wenruo/btrfs-add-read-only-support-for-subpage-sector-size/20201210-144442
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
config: x86_64-randconfig-s021-20201210 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.3-179-ga00755aa-dirty
        # https://github.com/0day-ci/linux/commit/3852ff477c118432fb205a3422aa538dc8ac3a5f
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Qu-Wenruo/btrfs-add-read-only-support-for-subpage-sector-size/20201210-144442
        git checkout 3852ff477c118432fb205a3422aa538dc8ac3a5f
        # save the attached .config to linux build tree
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


"sparse warnings: (new ones prefixed by >>)"
>> fs/btrfs/inode.c:8360:13: sparse: sparse: incorrect type in assignment (different base types) @@     expected restricted vm_fault_t [assigned] [usertype] ret @@     got int @@
   fs/btrfs/inode.c:8360:13: sparse:     expected restricted vm_fault_t [assigned] [usertype] ret
   fs/btrfs/inode.c:8360:13: sparse:     got int
>> fs/btrfs/inode.c:8361:13: sparse: sparse: restricted vm_fault_t degrades to integer

vim +8360 fs/btrfs/inode.c

  8283	
  8284	/*
  8285	 * btrfs_page_mkwrite() is not allowed to change the file size as it gets
  8286	 * called from a page fault handler when a page is first dirtied. Hence we must
  8287	 * be careful to check for EOF conditions here. We set the page up correctly
  8288	 * for a written page which means we get ENOSPC checking when writing into
  8289	 * holes and correct delalloc and unwritten extent mapping on filesystems that
  8290	 * support these features.
  8291	 *
  8292	 * We are not allowed to take the i_mutex here so we have to play games to
  8293	 * protect against truncate races as the page could now be beyond EOF.  Because
  8294	 * truncate_setsize() writes the inode size before removing pages, once we have
  8295	 * the page lock we can determine safely if the page is beyond EOF. If it is not
  8296	 * beyond EOF, then the page is guaranteed safe against truncation until we
  8297	 * unlock the page.
  8298	 */
  8299	vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
  8300	{
  8301		struct page *page = vmf->page;
  8302		struct inode *inode = file_inode(vmf->vma->vm_file);
  8303		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
  8304		struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
  8305		struct btrfs_ordered_extent *ordered;
  8306		struct extent_state *cached_state = NULL;
  8307		struct extent_changeset *data_reserved = NULL;
  8308		char *kaddr;
  8309		unsigned long zero_start;
  8310		loff_t size;
  8311		vm_fault_t ret;
  8312		int ret2;
  8313		int reserved = 0;
  8314		u64 reserved_space;
  8315		u64 page_start;
  8316		u64 page_end;
  8317		u64 end;
  8318	
  8319		reserved_space = PAGE_SIZE;
  8320	
  8321		sb_start_pagefault(inode->i_sb);
  8322		page_start = page_offset(page);
  8323		page_end = page_start + PAGE_SIZE - 1;
  8324		end = page_end;
  8325	
  8326		/*
  8327		 * Reserving delalloc space after obtaining the page lock can lead to
  8328		 * deadlock. For example, if a dirty page is locked by this function
  8329		 * and the call to btrfs_delalloc_reserve_space() ends up triggering
  8330		 * dirty page write out, then the btrfs_writepage() function could
  8331		 * end up waiting indefinitely to get a lock on the page currently
  8332		 * being processed by btrfs_page_mkwrite() function.
  8333		 */
  8334		ret2 = btrfs_delalloc_reserve_space(BTRFS_I(inode), &data_reserved,
  8335						    page_start, reserved_space);
  8336		if (!ret2) {
  8337			ret2 = file_update_time(vmf->vma->vm_file);
  8338			reserved = 1;
  8339		}
  8340		if (ret2) {
  8341			ret = vmf_error(ret2);
  8342			if (reserved)
  8343				goto out;
  8344			goto out_noreserve;
  8345		}
  8346	
  8347		ret = VM_FAULT_NOPAGE; /* make the VM retry the fault */
  8348	again:
  8349		lock_page(page);
  8350		size = i_size_read(inode);
  8351	
  8352		if ((page->mapping != inode->i_mapping) ||
  8353		    (page_start >= size)) {
  8354			/* page got truncated out from underneath us */
  8355			goto out_unlock;
  8356		}
  8357		wait_on_page_writeback(page);
  8358	
  8359		lock_extent_bits(io_tree, page_start, page_end, &cached_state);
> 8360		ret = set_page_extent_mapped(page);
> 8361		if (ret < 0)
  8362			goto out_unlock;
  8363	
  8364		/*
  8365		 * we can't set the delalloc bits if there are pending ordered
  8366		 * extents.  Drop our locks and wait for them to finish
  8367		 */
  8368		ordered = btrfs_lookup_ordered_range(BTRFS_I(inode), page_start,
  8369				PAGE_SIZE);
  8370		if (ordered) {
  8371			unlock_extent_cached(io_tree, page_start, page_end,
  8372					     &cached_state);
  8373			unlock_page(page);
  8374			btrfs_start_ordered_extent(ordered, 1);
  8375			btrfs_put_ordered_extent(ordered);
  8376			goto again;
  8377		}
  8378	
  8379		if (page->index == ((size - 1) >> PAGE_SHIFT)) {
  8380			reserved_space = round_up(size - page_start,
  8381						  fs_info->sectorsize);
  8382			if (reserved_space < PAGE_SIZE) {
  8383				end = page_start + reserved_space - 1;
  8384				btrfs_delalloc_release_space(BTRFS_I(inode),
  8385						data_reserved, page_start,
  8386						PAGE_SIZE - reserved_space, true);
  8387			}
  8388		}
  8389	
  8390		/*
  8391		 * page_mkwrite gets called when the page is firstly dirtied after it's
  8392		 * faulted in, but write(2) could also dirty a page and set delalloc
  8393		 * bits, thus in this case for space account reason, we still need to
  8394		 * clear any delalloc bits within this page range since we have to
  8395		 * reserve data&meta space before lock_page() (see above comments).
  8396		 */
  8397		clear_extent_bit(&BTRFS_I(inode)->io_tree, page_start, end,
  8398				  EXTENT_DELALLOC | EXTENT_DO_ACCOUNTING |
  8399				  EXTENT_DEFRAG, 0, 0, &cached_state);
  8400	
  8401		ret2 = btrfs_set_extent_delalloc(BTRFS_I(inode), page_start, end, 0,
  8402						&cached_state);
  8403		if (ret2) {
  8404			unlock_extent_cached(io_tree, page_start, page_end,
  8405					     &cached_state);
  8406			ret = VM_FAULT_SIGBUS;
  8407			goto out_unlock;
  8408		}
  8409	
  8410		/* page is wholly or partially inside EOF */
  8411		if (page_start + PAGE_SIZE > size)
  8412			zero_start = offset_in_page(size);
  8413		else
  8414			zero_start = PAGE_SIZE;
  8415	
  8416		if (zero_start != PAGE_SIZE) {
  8417			kaddr = kmap(page);
  8418			memset(kaddr + zero_start, 0, PAGE_SIZE - zero_start);
  8419			flush_dcache_page(page);
  8420			kunmap(page);
  8421		}
  8422		ClearPageChecked(page);
  8423		set_page_dirty(page);
  8424		SetPageUptodate(page);
  8425	
  8426		BTRFS_I(inode)->last_trans = fs_info->generation;
  8427		BTRFS_I(inode)->last_sub_trans = BTRFS_I(inode)->root->log_transid;
  8428		BTRFS_I(inode)->last_log_commit = BTRFS_I(inode)->root->last_log_commit;
  8429	
  8430		unlock_extent_cached(io_tree, page_start, page_end, &cached_state);
  8431	
  8432		btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE);
  8433		sb_end_pagefault(inode->i_sb);
  8434		extent_changeset_free(data_reserved);
  8435		return VM_FAULT_LOCKED;
  8436	
  8437	out_unlock:
  8438		unlock_page(page);
  8439	out:
  8440		btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE);
  8441		btrfs_delalloc_release_space(BTRFS_I(inode), data_reserved, page_start,
  8442					     reserved_space, (ret != 0));
  8443	out_noreserve:
  8444		sb_end_pagefault(inode->i_sb);
  8445		extent_changeset_free(data_reserved);
  8446		return ret;
  8447	}
  8448	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 37952 bytes --]

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 02/18] btrfs: extent_io: refactor __extent_writepage_io() to improve readability
  2020-12-10  6:38 ` [PATCH v2 02/18] btrfs: extent_io: refactor __extent_writepage_io() to improve readability Qu Wenruo
@ 2020-12-10 12:12   ` Nikolay Borisov
  2020-12-10 12:53     ` Qu Wenruo
  2020-12-17 15:43   ` Josef Bacik
  1 sibling, 1 reply; 71+ messages in thread
From: Nikolay Borisov @ 2020-12-10 12:12 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 10.12.20 г. 8:38 ч., Qu Wenruo wrote:
> The refactor involves the following modifications:
> - iosize alignment
>   In fact we don't really need to manually do alignment at all.
>   All extent maps should already be aligned, thus basic ASSERT() check
>   would be enough.
> 
> - redundant variables
>   We have extra variables like blocksize/pg_offset/end.
>   They are all unnecessary.
> 
>   @blocksize can be replaced by sectorsize directly, and it's only
>   used to verify the em start/size is aligned.
> 
>   @pg_offset can be easily calculated using @cur and page_offset(page).
> 
>   @end is just assigned to @page_end and never modified, use @page_end
>   to replace it.
> 
> - remove some BUG_ON()s
>   The BUG_ON()s are for extent maps, which are already covered by the
>   tree-checker for on-disk extent data items and by runtime checks.
>   ASSERT() should be enough.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent_io.c | 37 +++++++++++++++++--------------------
>  1 file changed, 17 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 2650e8720394..612fe60b367e 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -3515,17 +3515,14 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
>  				 unsigned long nr_written,
>  				 int *nr_ret)
>  {
> +	struct btrfs_fs_info *fs_info = inode->root->fs_info;
>  	struct extent_io_tree *tree = &inode->io_tree;
>  	u64 start = page_offset(page);
>  	u64 page_end = start + PAGE_SIZE - 1;

nit: page_end should be renamed to end because start now points to the
logical byte offset, i.e. having "page" in the name is misleading.

<snip>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 03/18] btrfs: file: update comment for btrfs_dirty_pages()
  2020-12-10  6:38 ` [PATCH v2 03/18] btrfs: file: update comment for btrfs_dirty_pages() Qu Wenruo
@ 2020-12-10 12:16   ` Nikolay Borisov
  0 siblings, 0 replies; 71+ messages in thread
From: Nikolay Borisov @ 2020-12-10 12:16 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 10.12.20 г. 8:38 ч., Qu Wenruo wrote:
> The original comment is from the initial merge, which has several
> problems:
> - No holes check any more
> - No inline decision is made
> 
> Update the out-of-date comment with a more correct one.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/file.c | 15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 0e41459b8de6..a29b50208eee 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -453,12 +453,15 @@ static void btrfs_drop_pages(struct page **pages, size_t num_pages)
>  }
>  
>  /*
> - * after copy_from_user, pages need to be dirtied and we need to make
> - * sure holes are created between the current EOF and the start of
> - * any next extents (if required).
> - *
> - * this also makes the decision about creating an inline extent vs
> - * doing real data extents, marking pages dirty and delalloc as required.
> + * After btrfs_copy_from_user(), update the following things for delalloc:
> + * - DELALLOC extent io tree bits
> + *   Later btrfs_run_delalloc_range() relies on this bit to determine the
> + *   writeback range.

IMO the following seems more coherent and concise:

- Mark newly dirtied pages as DELALLOC in the io tree. Used to advise
which range is to be written back.

> + * - Page status
> + *   Including basic status like Dirty and Uptodate, and btrfs specific bit
> + *   like Checked (for cow fixup)

- Marks modified pages as Uptodate/Dirty and not needing cowfixup

> + * - Inode size update
> + *   If needed

- Update inode size for past EOF write.

>   */
>  int btrfs_dirty_pages(struct btrfs_inode *inode, struct page **pages,
>  		      size_t num_pages, loff_t pos, size_t write_bytes,
> 

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 02/18] btrfs: extent_io: refactor __extent_writepage_io() to improve readability
  2020-12-10 12:12   ` Nikolay Borisov
@ 2020-12-10 12:53     ` Qu Wenruo
  2020-12-10 12:58       ` Nikolay Borisov
  0 siblings, 1 reply; 71+ messages in thread
From: Qu Wenruo @ 2020-12-10 12:53 UTC (permalink / raw)
  To: Nikolay Borisov, Qu Wenruo, linux-btrfs



On 2020/12/10 下午8:12, Nikolay Borisov wrote:
>
>
> On 10.12.20 г. 8:38 ч., Qu Wenruo wrote:
>> The refactor involves the following modifications:
>> - iosize alignment
>>   In fact we don't really need to manually do alignment at all.
>>   All extent maps should already be aligned, thus basic ASSERT() check
>>   would be enough.
>>
>> - redundant variables
>>   We have extra variables like blocksize/pg_offset/end.
>>   They are all unnecessary.
>>
>>   @blocksize can be replaced by sectorsize directly, and it's only
>>   used to verify the em start/size is aligned.
>>
>>   @pg_offset can be easily calculated using @cur and page_offset(page).
>>
>>   @end is just assigned to @page_end and never modified, use @page_end
>>   to replace it.
>>
>> - remove some BUG_ON()s
>>   The BUG_ON()s are for extent maps, which are already covered by the
>>   tree-checker for on-disk extent data items and by runtime checks.
>>   ASSERT() should be enough.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>  fs/btrfs/extent_io.c | 37 +++++++++++++++++--------------------
>>  1 file changed, 17 insertions(+), 20 deletions(-)
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index 2650e8720394..612fe60b367e 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -3515,17 +3515,14 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
>>  				 unsigned long nr_written,
>>  				 int *nr_ret)
>>  {
>> +	struct btrfs_fs_info *fs_info = inode->root->fs_info;
>>  	struct extent_io_tree *tree = &inode->io_tree;
>>  	u64 start = page_offset(page);
>>  	u64 page_end = start + PAGE_SIZE - 1;
>
> nit: page_end should be renamed to end because start now points to the
> logical byte offset, i.e. having "page" in the name is misleading.

But page_offset() along with page_end is still a logical bytenr, thus I
don't see much confusion here...

Thanks,
Qu
>
> <snip>
>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 02/18] btrfs: extent_io: refactor __extent_writepage_io() to improve readability
  2020-12-10 12:53     ` Qu Wenruo
@ 2020-12-10 12:58       ` Nikolay Borisov
  0 siblings, 0 replies; 71+ messages in thread
From: Nikolay Borisov @ 2020-12-10 12:58 UTC (permalink / raw)
  To: Qu Wenruo, Qu Wenruo, linux-btrfs



On 10.12.20 г. 14:53 ч., Qu Wenruo wrote:
> 
> 
> On 2020/12/10 下午8:12, Nikolay Borisov wrote:
>>
>>
>> On 10.12.20 г. 8:38 ч., Qu Wenruo wrote:
>>> The refactor involves the following modifications:
>>> - iosize alignment
>>>   In fact we don't really need to manually do alignment at all.
>>>   All extent maps should already be aligned, thus basic ASSERT() check
>>>   would be enough.
>>>
>>> - redundant variables
>>>   We have extra variables like blocksize/pg_offset/end.
>>>   They are all unnecessary.
>>>
>>>   @blocksize can be replaced by sectorsize directly, and it's only
>>>   used to verify the em start/size is aligned.
>>>
>>>   @pg_offset can be easily calculated using @cur and page_offset(page).
>>>
>>>   @end is just assigned to @page_end and never modified, use @page_end
>>>   to replace it.
>>>
>>> - remove some BUG_ON()s
>>>   The BUG_ON()s are for extent maps, which are already covered by the
>>>   tree-checker for on-disk extent data items and by runtime checks.
>>>   ASSERT() should be enough.
>>>
>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>> ---
>>>  fs/btrfs/extent_io.c | 37 +++++++++++++++++--------------------
>>>  1 file changed, 17 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>>> index 2650e8720394..612fe60b367e 100644
>>> --- a/fs/btrfs/extent_io.c
>>> +++ b/fs/btrfs/extent_io.c
>>> @@ -3515,17 +3515,14 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
>>>  				 unsigned long nr_written,
>>>  				 int *nr_ret)
>>>  {
>>> +	struct btrfs_fs_info *fs_info = inode->root->fs_info;
>>>  	struct extent_io_tree *tree = &inode->io_tree;
>>>  	u64 start = page_offset(page);
>>>  	u64 page_end = start + PAGE_SIZE - 1;
>>
>> nit: page_end should be renamed to end because start now points to the
>> logical byte offset, i.e. having "page" in the name is misleading.
> 
> But page_offset() along with page_end is still a logical bytenr, thus I
> don't see much confusion here...

Exactly, page_offset() converts the page index to a logical bytenr, and at
that point we no longer care about the physical page but about the logical
range, which is PAGE_SIZE. 'page_end' is really a logical offset which
spans a PAGE_SIZE region.
> 
> Thanks,
> Qu
>>
>> <snip>
>>
> 

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 15/18] btrfs: disk-io: introduce subpage metadata validation check
  2020-12-10  6:39 ` [PATCH v2 15/18] btrfs: disk-io: introduce subpage metadata validation check Qu Wenruo
@ 2020-12-10 13:24     ` kernel test robot
  2020-12-10 13:39     ` kernel test robot
  2020-12-14 10:21   ` Nikolay Borisov
  2 siblings, 0 replies; 71+ messages in thread
From: kernel test robot @ 2020-12-10 13:24 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 1828 bytes --]

Hi Qu,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on kdave/for-next]
[also build test ERROR on next-20201209]
[cannot apply to btrfs/next v5.10-rc7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Qu-Wenruo/btrfs-add-read-only-support-for-subpage-sector-size/20201210-144442
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
config: nds32-randconfig-r004-20201209 (attached as .config)
compiler: nds32le-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/e01cdf51d0d32647697616c0dd08f2cc3220bde4
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Qu-Wenruo/btrfs-add-read-only-support-for-subpage-sector-size/20201210-144442
        git checkout e01cdf51d0d32647697616c0dd08f2cc3220bde4
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=nds32 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   nds32le-linux-ld: fs/btrfs/disk-io.o: in function `btrfs_validate_metadata_buffer':
   disk-io.c:(.text+0x4200): undefined reference to `__udivdi3'
>> nds32le-linux-ld: disk-io.c:(.text+0x4204): undefined reference to `__udivdi3'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 22008 bytes --]

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 15/18] btrfs: disk-io: introduce subpage metadata validation check
  2020-12-10  6:39 ` [PATCH v2 15/18] btrfs: disk-io: introduce subpage metadata validation check Qu Wenruo
@ 2020-12-10 13:39     ` kernel test robot
  2020-12-10 13:39     ` kernel test robot
  2020-12-14 10:21   ` Nikolay Borisov
  2 siblings, 0 replies; 71+ messages in thread
From: kernel test robot @ 2020-12-10 13:39 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 4077 bytes --]

Hi Qu,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on kdave/for-next]
[also build test ERROR on next-20201210]
[cannot apply to v5.10-rc7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Qu-Wenruo/btrfs-add-read-only-support-for-subpage-sector-size/20201210-144442
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
config: i386-randconfig-a013-20201209 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce (this is a W=1 build):
        # https://github.com/0day-ci/linux/commit/e01cdf51d0d32647697616c0dd08f2cc3220bde4
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Qu-Wenruo/btrfs-add-read-only-support-for-subpage-sector-size/20201210-144442
        git checkout e01cdf51d0d32647697616c0dd08f2cc3220bde4
        # save the attached .config to linux build tree
        make W=1 ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   ld: fs/btrfs/disk-io.o: in function `validate_subpage_buffer':
>> fs/btrfs/disk-io.c:619: undefined reference to `__udivdi3'

vim +619 fs/btrfs/disk-io.c

   593	
   594	static int validate_subpage_buffer(struct page *page, u64 start, u64 end,
   595					   int mirror)
   596	{
   597		struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
   598		struct extent_buffer *eb;
   599		int reads_done;
   600		int ret = 0;
   601	
   602		if (!IS_ALIGNED(start, fs_info->sectorsize) ||
   603		    !IS_ALIGNED(end - start + 1, fs_info->sectorsize) ||
   604		    !IS_ALIGNED(end - start + 1, fs_info->nodesize)) {
   605			WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
   606			btrfs_err(fs_info, "invalid tree read bytenr");
   607			return -EUCLEAN;
   608		}
   609	
   610		/*
   611		 * We don't allow bio merge for subpage metadata read, so we should
   612		 * only get one eb for each endio hook.
   613		 */
   614		ASSERT(end == start + fs_info->nodesize - 1);
   615		ASSERT(PagePrivate(page));
   616	
   617		rcu_read_lock();
   618		eb = radix_tree_lookup(&fs_info->buffer_radix,
 > 619				       start / fs_info->sectorsize);
   620		rcu_read_unlock();
   621	
   622		/*
   623		 * When we are reading one tree block, eb must have been
   624		 * inserted into the radix tree. If not something is wrong.
   625		 */
   626		if (!eb) {
   627			WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
   628			btrfs_err(fs_info,
   629				"can't find extent buffer for bytenr %llu",
   630				start);
   631			return -EUCLEAN;
   632		}
   633		/*
   634		 * The pending IO might have been the only thing that kept
   635		 * this buffer in memory.  Make sure we have a ref for all
   636		 * this other checks
   637		 */
   638		atomic_inc(&eb->refs);
   639	
   640		reads_done = atomic_dec_and_test(&eb->io_pages);
   641		/* Subpage read must finish in page read */
   642		ASSERT(reads_done);
   643	
   644		eb->read_mirror = mirror;
   645		if (test_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags)) {
   646			ret = -EIO;
   647			goto err;
   648		}
   649		ret = validate_extent_buffer(eb);
   650		if (ret < 0)
   651			goto err;
   652	
   653		if (test_and_clear_bit(EXTENT_BUFFER_READAHEAD, &eb->bflags))
   654			btree_readahead_hook(eb, ret);
   655	
   656		set_extent_buffer_uptodate(eb);
   657	
   658		free_extent_buffer(eb);
   659		return ret;
   660	err:
   661		/*
   662		 * our io error hook is going to dec the io pages
   663		 * again, we have to make sure it has something to
   664		 * decrement
   665		 */
   666		atomic_inc(&eb->io_pages);
   667		clear_extent_buffer_uptodate(eb);
   668		free_extent_buffer(eb);
   669		return ret;
   670	}
   671	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 36747 bytes --]

^ permalink raw reply	[flat|nested] 71+ messages in thread

   607			return -EUCLEAN;
   608		}
   609	
   610		/*
   611		 * We don't allow bio merge for subpage metadata read, so we should
   612		 * only get one eb for each endio hook.
   613		 */
   614		ASSERT(end == start + fs_info->nodesize - 1);
   615		ASSERT(PagePrivate(page));
   616	
   617		rcu_read_lock();
   618		eb = radix_tree_lookup(&fs_info->buffer_radix,
 > 619				       start / fs_info->sectorsize);
   620		rcu_read_unlock();
   621	
   622		/*
   623		 * When we are reading one tree block, eb must have been
   624		 * inserted into the radix tree. If not something is wrong.
   625		 */
   626		if (!eb) {
   627			WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
   628			btrfs_err(fs_info,
   629				"can't find extent buffer for bytenr %llu",
   630				start);
   631			return -EUCLEAN;
   632		}
   633		/*
   634		 * The pending IO might have been the only thing that kept
   635		 * this buffer in memory.  Make sure we have a ref for all
   636		 * the other checks
   637		 */
   638		atomic_inc(&eb->refs);
   639	
   640		reads_done = atomic_dec_and_test(&eb->io_pages);
   641		/* Subpage read must finish in page read */
   642		ASSERT(reads_done);
   643	
   644		eb->read_mirror = mirror;
   645		if (test_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags)) {
   646			ret = -EIO;
   647			goto err;
   648		}
   649		ret = validate_extent_buffer(eb);
   650		if (ret < 0)
   651			goto err;
   652	
   653		if (test_and_clear_bit(EXTENT_BUFFER_READAHEAD, &eb->bflags))
   654			btree_readahead_hook(eb, ret);
   655	
   656		set_extent_buffer_uptodate(eb);
   657	
   658		free_extent_buffer(eb);
   659		return ret;
   660	err:
   661		/*
   662		 * our io error hook is going to dec the io pages
   663		 * again, we have to make sure it has something to
   664		 * decrement
   665		 */
   666		atomic_inc(&eb->io_pages);
   667		clear_extent_buffer_uptodate(eb);
   668		free_extent_buffer(eb);
   669		return ret;
   670	}
   671	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 36747 bytes --]

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 04/18] btrfs: extent_io: introduce a helper to grab an existing extent buffer from a page
  2020-12-10  6:38 ` [PATCH v2 04/18] btrfs: extent_io: introduce a helper to grab an existing extent buffer from a page Qu Wenruo
@ 2020-12-10 13:51   ` Nikolay Borisov
  2020-12-17 15:50   ` Josef Bacik
  1 sibling, 0 replies; 71+ messages in thread
From: Nikolay Borisov @ 2020-12-10 13:51 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: Johannes Thumshirn



On 10.12.20 г. 8:38 ч., Qu Wenruo wrote:
> This patch will extract the code to grab an extent buffer from a page
> into a helper, grab_extent_buffer_from_page().
> 
> This reduces one indent level, and provides the work place for later
> expansion for subpage support.
> 
> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent_io.c | 52 +++++++++++++++++++++++++++-----------------
>  1 file changed, 32 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 612fe60b367e..6350c2687c7e 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -5251,6 +5251,32 @@ struct extent_buffer *alloc_test_extent_buffer(struct btrfs_fs_info *fs_info,
>  }
>  #endif
>  
> +static struct extent_buffer *grab_extent_buffer_from_page(struct page *page)

nit: Make the name just grab_extent_buffer/get_extent_buffer; given you
pass in a page as an input parameter, "from_page" is obvious.

> +{
> +	struct extent_buffer *exists;
> +
> +	/* Page not yet attached to an extent buffer */
> +	if (!PagePrivate(page))
> +		return NULL;
> +
> +	/*
> +	 * We could have already allocated an eb for this page
> +	 * and attached one so lets see if we can get a ref on
> +	 * the existing eb, and if we can we know it's good and
> +	 * we can just return that one, else we know we can just
> +	 * overwrite page->private.
> +	 */
> +	exists = (struct extent_buffer *)page->private;
> +	if (atomic_inc_not_zero(&exists->refs)) {
> +		mark_extent_buffer_accessed(exists, page);
> +		return exists;
> +	}

nit: This patch slightly changes the timing of
mark_extent_buffer_accessed(), as it's now called under
mapping->private_lock with the respective page locked. Looking at
mark_extent_buffer_accessed(), it iterates the pages and calls
mark_page_accessed() on them, as well as calling check_buffer_tree_ref(),
which does some atomic ops. While it might not be a big hit, I'd expect
some minimal performance regression.

> +
> +	WARN_ON(PageDirty(page));
> +	detach_page_private(page);
> +	return NULL;
> +}
> +
>  struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
>  					  u64 start, u64 owner_root, int level)
>  {
> @@ -5296,26 +5322,12 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
>  		}
>  
>  		spin_lock(&mapping->private_lock);
> -		if (PagePrivate(p)) {
> -			/*
> -			 * We could have already allocated an eb for this page
> -			 * and attached one so lets see if we can get a ref on
> -			 * the existing eb, and if we can we know it's good and
> -			 * we can just return that one, else we know we can just
> -			 * overwrite page->private.
> -			 */
> -			exists = (struct extent_buffer *)p->private;
> -			if (atomic_inc_not_zero(&exists->refs)) {
> -				spin_unlock(&mapping->private_lock);
> -				unlock_page(p);
> -				put_page(p);
> -				mark_extent_buffer_accessed(exists, p);
> -				goto free_eb;
> -			}
> -			exists = NULL;
> -
> -			WARN_ON(PageDirty(p));
> -			detach_page_private(p);
> +		exists = grab_extent_buffer_from_page(p);
> +		if (exists) {
> +			spin_unlock(&mapping->private_lock);
> +			unlock_page(p);
> +			put_page(p);
> +			goto free_eb;
>  		}
>  		attach_extent_buffer_page(eb, p);
>  		spin_unlock(&mapping->private_lock);
> 

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 06/18] btrfs: extent_io: make attach_extent_buffer_page() to handle subpage case
  2020-12-10  6:38 ` [PATCH v2 06/18] btrfs: extent_io: make attach_extent_buffer_page() to handle subpage case Qu Wenruo
@ 2020-12-10 15:30   ` Nikolay Borisov
  2020-12-17  6:48     ` Qu Wenruo
  2020-12-10 16:09   ` Nikolay Borisov
  2020-12-17 16:00   ` Josef Bacik
  2 siblings, 1 reply; 71+ messages in thread
From: Nikolay Borisov @ 2020-12-10 15:30 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 10.12.20 г. 8:38 ч., Qu Wenruo wrote:
> For subpage case, we need to allocate new memory for each metadata page.
> 
> So we need to:
> - Allow attach_extent_buffer_page() to return int
>   To indicate allocation failure
> 
> - Prealloc page->private for alloc_extent_buffer()
>   We don't want to call memory allocation with a spinlock held, so
>   do the preallocation before we acquire the spin lock.
> 
> - Handle subpage and regular case differently in
>   attach_extent_buffer_page()
>   For regular case, just do the usual thing.
>   For subpage case, allocate new memory and update the tree_block
>   bitmap.
> 
>   The bitmap update will be handled by new subpage specific helper,
>   btrfs_subpage_set_tree_block().
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent_io.c | 69 +++++++++++++++++++++++++++++++++++---------
>  fs/btrfs/subpage.h   | 44 ++++++++++++++++++++++++++++
>  2 files changed, 99 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 6350c2687c7e..51dd7ec3c2b3 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -24,6 +24,7 @@
>  #include "rcu-string.h"
>  #include "backref.h"
>  #include "disk-io.h"
> +#include "subpage.h"
>  
>  static struct kmem_cache *extent_state_cache;
>  static struct kmem_cache *extent_buffer_cache;
> @@ -3142,22 +3143,41 @@ static int submit_extent_page(unsigned int opf,
>  	return ret;
>  }
>  
> -static void attach_extent_buffer_page(struct extent_buffer *eb,
> +static int attach_extent_buffer_page(struct extent_buffer *eb,
>  				      struct page *page)
>  {
> -	/*
> -	 * If the page is mapped to btree inode, we should hold the private
> -	 * lock to prevent race.
> -	 * For cloned or dummy extent buffers, their pages are not mapped and
> -	 * will not race with any other ebs.
> -	 */
> -	if (page->mapping)
> -		lockdep_assert_held(&page->mapping->private_lock);
> +	struct btrfs_fs_info *fs_info = eb->fs_info;
> +	int ret;
>  
> -	if (!PagePrivate(page))
> -		attach_page_private(page, eb);
> -	else
> -		WARN_ON(page->private != (unsigned long)eb);
> +	if (fs_info->sectorsize == PAGE_SIZE) {
> +		/*
> +		 * If the page is mapped to btree inode, we should hold the
> +		 * private lock to prevent race.
> +		 * For cloned or dummy extent buffers, their pages are not
> +		 * mapped and will not race with any other ebs.
> +		 */
> +		if (page->mapping)
> +			lockdep_assert_held(&page->mapping->private_lock);
> +
> +		if (!PagePrivate(page))
> +			attach_page_private(page, eb);
> +		else
> +			WARN_ON(page->private != (unsigned long)eb);
> +		return 0;
> +	}
> +
> +	/* Already mapped, just update the existing range */
> +	if (PagePrivate(page))
> +		goto update_bitmap;

How can this check ever be false, given btrfs_attach_subpage() is called
unconditionally in alloc_extent_buffer() precisely so that you can avoid
allocating memory with the private lock held, yet in this function you
check whether memory hasn't been allocated and proceed to do it? Also,
that memory allocation is done with GFP_NOFS under a spinlock; that's
not atomic, i.e. IO can still be kicked, which means you can go to sleep
while holding a spinlock, not cool.

> +
> +	/* Do new allocation to attach subpage */
> +	ret = btrfs_attach_subpage(fs_info, page);
> +	if (ret < 0)
> +		return ret;
> +
> +update_bitmap:
> +	btrfs_subpage_set_tree_block(fs_info, page, eb->start, eb->len);
> +	return 0;

Those are really 2 functions, demarcated by the if. Given that
attach_extent_buffer_page() is called in only 2 places, can't you
open-code the if (fs_info->sectorsize == PAGE_SIZE) check in the
callers and define 2 functions: one for the subpage blocksize case and
one for the old code?

>  }
>  

<snip>

> diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
> index 96f3b226913e..c2ce603e7848 100644
> --- a/fs/btrfs/subpage.h
> +++ b/fs/btrfs/subpage.h
> @@ -23,9 +23,53 @@
>  struct btrfs_subpage {
>  	/* Common members for both data and metadata pages */
>  	spinlock_t lock;
> +	union {
> +		/* Structures only used by metadata */
> +		struct {
> +			u16 tree_block_bitmap;
> +		};
> +		/* structures only used by data */
> +	};
>  };
>  
>  int btrfs_attach_subpage(struct btrfs_fs_info *fs_info, struct page *page);
>  void btrfs_detach_subpage(struct btrfs_fs_info *fs_info, struct page *page);
>  
> +/*
> + * Convert the [start, start + len) range into a u16 bitmap
> + *
> + * E.g. if start == page_offset() + 16K, len = 16K, we get 0x00f0.
> + */
> +static inline u16 btrfs_subpage_calc_bitmap(struct btrfs_fs_info *fs_info,
> +			struct page *page, u64 start, u32 len)
> +{
> +	int bit_start = (start - page_offset(page)) >> fs_info->sectorsize_bits;
> +	int nbits = len >> fs_info->sectorsize_bits;
> +
> +	/* Basic checks */
> +	ASSERT(PagePrivate(page) && page->private);
> +	ASSERT(IS_ALIGNED(start, fs_info->sectorsize) &&
> +	       IS_ALIGNED(len, fs_info->sectorsize));

Separate the alignment checks so that if one fails it's evident which one failed.

> +	ASSERT(page_offset(page) <= start &&
> +	       start + len <= page_offset(page) + PAGE_SIZE);

ditto. Also, instead of checking 'page_offset(page) <= start' you can
simply check 'bit_start >= 0', as that's what you ultimately care about.

> +	/*
> +	 * Here nbits can be 16, thus can go beyond u16 range. Here we make the
> +	 * first left shift to be calculated in unsigned long (u32), then
> +	 * truncate the result to u16.
> +	 */
> +	return (u16)(((1UL << nbits) - 1) << bit_start);
> +}
> +
> +static inline void btrfs_subpage_set_tree_block(struct btrfs_fs_info *fs_info,
> +			struct page *page, u64 start, u32 len)
> +{
> +	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
> +	unsigned long flags;
> +	u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len);
> +
> +	spin_lock_irqsave(&subpage->lock, flags);
> +	subpage->tree_block_bitmap |= tmp;
> +	spin_unlock_irqrestore(&subpage->lock, flags);
> +}
> +
>  #endif /* BTRFS_SUBPAGE_H */
> 


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 07/18] btrfs: extent_io: make grab_extent_buffer_from_page() to handle subpage case
  2020-12-10  6:38 ` [PATCH v2 07/18] btrfs: extent_io: make grab_extent_buffer_from_page() " Qu Wenruo
@ 2020-12-10 15:39   ` Nikolay Borisov
  2020-12-17  6:55     ` Qu Wenruo
  2020-12-17 16:02   ` Josef Bacik
  1 sibling, 1 reply; 71+ messages in thread
From: Nikolay Borisov @ 2020-12-10 15:39 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 10.12.20 г. 8:38 ч., Qu Wenruo wrote:
> For subpage case, grab_extent_buffer_from_page() can't really get an
> extent buffer just from btrfs_subpage.
> 
> Although we have btrfs_subpage::tree_block_bitmap, which could be used
> to grab the bytenr of an existing extent buffer, and we could then do a
> radix tree search to grab that existing eb.
> 
> However we are still doing the radix tree insert check in
> alloc_extent_buffer(), thus we don't really need the extra hassle;
> just let alloc_extent_buffer() handle the existing eb in the radix tree.
> 
> So for grab_extent_buffer_from_page(), just always return NULL for
> subpage case.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent_io.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 51dd7ec3c2b3..b99bd0402130 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -5278,10 +5278,19 @@ struct extent_buffer *alloc_test_extent_buffer(struct btrfs_fs_info *fs_info,
>  }
>  #endif
>  
> -static struct extent_buffer *grab_extent_buffer_from_page(struct page *page)
> +static struct extent_buffer *grab_extent_buffer_from_page(
> +		struct btrfs_fs_info *fs_info, struct page *page)
>  {
>  	struct extent_buffer *exists;
>  
> +	/*
> +	 * For subpage case, we completely rely on radix tree to ensure we
> +	 * don't try to insert two ebs for the same bytenr.
> +	 * So here we always return NULL and just continue.
> +	 */
> +	if (fs_info->sectorsize < PAGE_SIZE)
> +		return NULL;
> +

Instead of hiding this in the function, just open-code it in the only caller. It would look like: 

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index b99bd0402130..440dab207944 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5370,8 +5370,9 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
                }
 
                spin_lock(&mapping->private_lock);
-               exists = grab_extent_buffer_from_page(fs_info, p);
-               if (exists) {
+               if (fs_info->sectorsize == PAGE_SIZE &&
+                   (exists = grab_extent_buffer_from_page(fs_info, p)))
+               {
                        spin_unlock(&mapping->private_lock);
                        unlock_page(p);
                        put_page(p);


Admittedly that exists = ... in the if condition is a bit of an
anti-pattern, but given it's used in only 1 place and makes the flow of
code more linear, I'd say it's a win. But I would like to hear David's opinion.

>  	/* Page not yet attached to an extent buffer */
>  	if (!PagePrivate(page))
>  		return NULL;
> @@ -5361,7 +5370,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
>  		}
>  
>  		spin_lock(&mapping->private_lock);
> -		exists = grab_extent_buffer_from_page(p);
> +		exists = grab_extent_buffer_from_page(fs_info, p);
>  		if (exists) {
>  			spin_unlock(&mapping->private_lock);
>  			unlock_page(p);
> 

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 06/18] btrfs: extent_io: make attach_extent_buffer_page() to handle subpage case
  2020-12-10  6:38 ` [PATCH v2 06/18] btrfs: extent_io: make attach_extent_buffer_page() to handle subpage case Qu Wenruo
  2020-12-10 15:30   ` Nikolay Borisov
@ 2020-12-10 16:09   ` Nikolay Borisov
  2020-12-17 16:00   ` Josef Bacik
  2 siblings, 0 replies; 71+ messages in thread
From: Nikolay Borisov @ 2020-12-10 16:09 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 10.12.20 г. 8:38 ч., Qu Wenruo wrote:
> For subpage case, we need to allocate new memory for each metadata page.
> 
> So we need to:
> - Allow attach_extent_buffer_page() to return int
>   To indicate allocation failure
> 
> - Prealloc page->private for alloc_extent_buffer()
>   We don't want to call memory allocation with a spinlock held, so
>   do the preallocation before we acquire the spin lock.
> 
> - Handle subpage and regular case differently in
>   attach_extent_buffer_page()
>   For regular case, just do the usual thing.
>   For subpage case, allocate new memory and update the tree_block
>   bitmap.
> 
>   The bitmap update will be handled by new subpage specific helper,
>   btrfs_subpage_set_tree_block().
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent_io.c | 69 +++++++++++++++++++++++++++++++++++---------
>  fs/btrfs/subpage.h   | 44 ++++++++++++++++++++++++++++
>  2 files changed, 99 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 6350c2687c7e..51dd7ec3c2b3 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -24,6 +24,7 @@
>  #include "rcu-string.h"
>  #include "backref.h"
>  #include "disk-io.h"
> +#include "subpage.h"
>  

<snip>

>  void set_page_extent_mapped(struct page *page)
> @@ -5067,12 +5087,19 @@ struct extent_buffer *btrfs_clone_extent_buffer(const struct extent_buffer *src)
>  		return NULL;
>  
>  	for (i = 0; i < num_pages; i++) {
> +		int ret;
> +
>  		p = alloc_page(GFP_NOFS);
>  		if (!p) {
>  			btrfs_release_extent_buffer(new);
>  			return NULL;
>  		}
> -		attach_extent_buffer_page(new, p);
> +		ret = attach_extent_buffer_page(new, p);
> +		if (ret < 0) {
> +			put_page(p);
> +			btrfs_release_extent_buffer(new);
> +			return NULL;
> +		}

In this function you need to move the
'set_bit(EXTENT_BUFFER_UNMAPPED, &new->bflags);' line before entering
the loop, otherwise when btrfs_release_extent_buffer() is called it will
try to erroneously acquire the mapping lock, since EXTENT_BUFFER_UNMAPPED
wouldn't have been set.

<snip>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 08/18] btrfs: extent_io: support subpage for extent buffer page release
  2020-12-10  6:38 ` [PATCH v2 08/18] btrfs: extent_io: support subpage for extent buffer page release Qu Wenruo
@ 2020-12-10 16:13   ` Nikolay Borisov
  0 siblings, 0 replies; 71+ messages in thread
From: Nikolay Borisov @ 2020-12-10 16:13 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 10.12.20 г. 8:38 ч., Qu Wenruo wrote:
> In btrfs_release_extent_buffer_pages(), we need to add extra handling
> for subpage.
> 
> To do so, introduce a new helper, detach_extent_buffer_page(), to do
> different handling for regular and subpage cases.
> 
> For subpage case, the new trick is to clear the range of current extent
> buffer, and detach page private if and only if we're the last tree block
> of the page.
> This part is handled by the subpage helper,
> btrfs_subpage_clear_and_test_tree_block().
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent_io.c | 59 +++++++++++++++++++++++++++++++-------------
>  fs/btrfs/subpage.h   | 24 ++++++++++++++++++
>  2 files changed, 66 insertions(+), 17 deletions(-)

<snip>

> @@ -5031,6 +5018,44 @@ static void btrfs_release_extent_buffer_pages(struct extent_buffer *eb)
>  			 */
>  			detach_page_private(page);
>  		}
> +		return;
> +	}
> +
> +	/*
> +	 * For subpage case, clear the range in tree_block_bitmap,
> +	 * and if we're the last one, detach private completely.
> +	 */
> +	if (PagePrivate(page)) {

Under what condition can you have a subpage fs and call
detach_extent_buffer_page() on a page that doesn't have the PagePrivate
flag set? I think that's impossible, i.e. that check should really be an
assert.

> +		bool last = false;
> +
> +		last = btrfs_subpage_clear_and_test_tree_block(fs_info, page,
> +						eb->start, eb->len);
> +		if (last)
> +			btrfs_detach_subpage(fs_info, page);
> +	}
> +}
> +


<snip>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 16/18] btrfs: introduce btrfs_subpage for data inodes
  2020-12-10  6:39 ` [PATCH v2 16/18] btrfs: introduce btrfs_subpage for data inodes Qu Wenruo
@ 2020-12-11  0:43     ` kernel test robot
  2020-12-11  0:43     ` kernel test robot
  2020-12-14 12:46   ` Nikolay Borisov
  2 siblings, 0 replies; 71+ messages in thread
From: kernel test robot @ 2020-12-11  0:43 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 7566 bytes --]

Hi Qu,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on kdave/for-next]
[also build test WARNING on next-20201210]
[cannot apply to v5.10-rc7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Qu-Wenruo/btrfs-add-read-only-support-for-subpage-sector-size/20201210-144442
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
config: i386-randconfig-m021-20201209 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

New smatch warnings:
fs/btrfs/inode.c:8361 btrfs_page_mkwrite() warn: unsigned 'ret' is never less than zero.

Old smatch warnings:
include/linux/fs.h:862 i_size_write() warn: statement has no effect 31

vim +/ret +8361 fs/btrfs/inode.c

  8283	
  8284	/*
  8285	 * btrfs_page_mkwrite() is not allowed to change the file size as it gets
  8286	 * called from a page fault handler when a page is first dirtied. Hence we must
  8287	 * be careful to check for EOF conditions here. We set the page up correctly
  8288	 * for a written page which means we get ENOSPC checking when writing into
  8289	 * holes and correct delalloc and unwritten extent mapping on filesystems that
  8290	 * support these features.
  8291	 *
  8292	 * We are not allowed to take the i_mutex here so we have to play games to
  8293	 * protect against truncate races as the page could now be beyond EOF.  Because
  8294	 * truncate_setsize() writes the inode size before removing pages, once we have
  8295	 * the page lock we can determine safely if the page is beyond EOF. If it is not
  8296	 * beyond EOF, then the page is guaranteed safe against truncation until we
  8297	 * unlock the page.
  8298	 */
  8299	vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
  8300	{
  8301		struct page *page = vmf->page;
  8302		struct inode *inode = file_inode(vmf->vma->vm_file);
  8303		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
  8304		struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
  8305		struct btrfs_ordered_extent *ordered;
  8306		struct extent_state *cached_state = NULL;
  8307		struct extent_changeset *data_reserved = NULL;
  8308		char *kaddr;
  8309		unsigned long zero_start;
  8310		loff_t size;
  8311		vm_fault_t ret;
  8312		int ret2;
  8313		int reserved = 0;
  8314		u64 reserved_space;
  8315		u64 page_start;
  8316		u64 page_end;
  8317		u64 end;
  8318	
  8319		reserved_space = PAGE_SIZE;
  8320	
  8321		sb_start_pagefault(inode->i_sb);
  8322		page_start = page_offset(page);
  8323		page_end = page_start + PAGE_SIZE - 1;
  8324		end = page_end;
  8325	
  8326		/*
  8327		 * Reserving delalloc space after obtaining the page lock can lead to
  8328		 * deadlock. For example, if a dirty page is locked by this function
  8329		 * and the call to btrfs_delalloc_reserve_space() ends up triggering
  8330		 * dirty page write out, then the btrfs_writepage() function could
  8331		 * end up waiting indefinitely to get a lock on the page currently
  8332		 * being processed by btrfs_page_mkwrite() function.
  8333		 */
  8334		ret2 = btrfs_delalloc_reserve_space(BTRFS_I(inode), &data_reserved,
  8335						    page_start, reserved_space);
  8336		if (!ret2) {
  8337			ret2 = file_update_time(vmf->vma->vm_file);
  8338			reserved = 1;
  8339		}
  8340		if (ret2) {
  8341			ret = vmf_error(ret2);
  8342			if (reserved)
  8343				goto out;
  8344			goto out_noreserve;
  8345		}
  8346	
  8347		ret = VM_FAULT_NOPAGE; /* make the VM retry the fault */
  8348	again:
  8349		lock_page(page);
  8350		size = i_size_read(inode);
  8351	
  8352		if ((page->mapping != inode->i_mapping) ||
  8353		    (page_start >= size)) {
  8354			/* page got truncated out from underneath us */
  8355			goto out_unlock;
  8356		}
  8357		wait_on_page_writeback(page);
  8358	
  8359		lock_extent_bits(io_tree, page_start, page_end, &cached_state);
  8360		ret = set_page_extent_mapped(page);
> 8361		if (ret < 0)
  8362			goto out_unlock;
  8363	
  8364		/*
  8365		 * we can't set the delalloc bits if there are pending ordered
  8366		 * extents.  Drop our locks and wait for them to finish
  8367		 */
  8368		ordered = btrfs_lookup_ordered_range(BTRFS_I(inode), page_start,
  8369				PAGE_SIZE);
  8370		if (ordered) {
  8371			unlock_extent_cached(io_tree, page_start, page_end,
  8372					     &cached_state);
  8373			unlock_page(page);
  8374			btrfs_start_ordered_extent(ordered, 1);
  8375			btrfs_put_ordered_extent(ordered);
  8376			goto again;
  8377		}
  8378	
  8379		if (page->index == ((size - 1) >> PAGE_SHIFT)) {
  8380			reserved_space = round_up(size - page_start,
  8381						  fs_info->sectorsize);
  8382			if (reserved_space < PAGE_SIZE) {
  8383				end = page_start + reserved_space - 1;
  8384				btrfs_delalloc_release_space(BTRFS_I(inode),
  8385						data_reserved, page_start,
  8386						PAGE_SIZE - reserved_space, true);
  8387			}
  8388		}
  8389	
  8390		/*
  8391		 * page_mkwrite gets called when the page is firstly dirtied after it's
  8392		 * faulted in, but write(2) could also dirty a page and set delalloc
  8393		 * bits, thus in this case for space account reason, we still need to
  8394		 * clear any delalloc bits within this page range since we have to
  8395		 * reserve data&meta space before lock_page() (see above comments).
  8396		 */
  8397		clear_extent_bit(&BTRFS_I(inode)->io_tree, page_start, end,
  8398				  EXTENT_DELALLOC | EXTENT_DO_ACCOUNTING |
  8399				  EXTENT_DEFRAG, 0, 0, &cached_state);
  8400	
  8401		ret2 = btrfs_set_extent_delalloc(BTRFS_I(inode), page_start, end, 0,
  8402						&cached_state);
  8403		if (ret2) {
  8404			unlock_extent_cached(io_tree, page_start, page_end,
  8405					     &cached_state);
  8406			ret = VM_FAULT_SIGBUS;
  8407			goto out_unlock;
  8408		}
  8409	
  8410		/* page is wholly or partially inside EOF */
  8411		if (page_start + PAGE_SIZE > size)
  8412			zero_start = offset_in_page(size);
  8413		else
  8414			zero_start = PAGE_SIZE;
  8415	
  8416		if (zero_start != PAGE_SIZE) {
  8417			kaddr = kmap(page);
  8418			memset(kaddr + zero_start, 0, PAGE_SIZE - zero_start);
  8419			flush_dcache_page(page);
  8420			kunmap(page);
  8421		}
  8422		ClearPageChecked(page);
  8423		set_page_dirty(page);
  8424		SetPageUptodate(page);
  8425	
  8426		BTRFS_I(inode)->last_trans = fs_info->generation;
  8427		BTRFS_I(inode)->last_sub_trans = BTRFS_I(inode)->root->log_transid;
  8428		BTRFS_I(inode)->last_log_commit = BTRFS_I(inode)->root->last_log_commit;
  8429	
  8430		unlock_extent_cached(io_tree, page_start, page_end, &cached_state);
  8431	
  8432		btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE);
  8433		sb_end_pagefault(inode->i_sb);
  8434		extent_changeset_free(data_reserved);
  8435		return VM_FAULT_LOCKED;
  8436	
  8437	out_unlock:
  8438		unlock_page(page);
  8439	out:
  8440		btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE);
  8441		btrfs_delalloc_release_space(BTRFS_I(inode), data_reserved, page_start,
  8442					     reserved_space, (ret != 0));
  8443	out_noreserve:
  8444		sb_end_pagefault(inode->i_sb);
  8445		extent_changeset_free(data_reserved);
  8446		return ret;
  8447	}
  8448	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 36210 bytes --]

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 16/18] btrfs: introduce btrfs_subpage for data inodes
@ 2020-12-11  0:43     ` kernel test robot
  0 siblings, 0 replies; 71+ messages in thread
From: kernel test robot @ 2020-12-11  0:43 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 7765 bytes --]

Hi Qu,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on kdave/for-next]
[also build test WARNING on next-20201210]
[cannot apply to v5.10-rc7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Qu-Wenruo/btrfs-add-read-only-support-for-subpage-sector-size/20201210-144442
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
config: i386-randconfig-m021-20201209 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

New smatch warnings:
fs/btrfs/inode.c:8361 btrfs_page_mkwrite() warn: unsigned 'ret' is never less than zero.

Old smatch warnings:
include/linux/fs.h:862 i_size_write() warn: statement has no effect 31

vim +/ret +8361 fs/btrfs/inode.c

  8283	
  8284	/*
  8285	 * btrfs_page_mkwrite() is not allowed to change the file size as it gets
  8286	 * called from a page fault handler when a page is first dirtied. Hence we must
  8287	 * be careful to check for EOF conditions here. We set the page up correctly
  8288	 * for a written page which means we get ENOSPC checking when writing into
  8289	 * holes and correct delalloc and unwritten extent mapping on filesystems that
  8290	 * support these features.
  8291	 *
  8292	 * We are not allowed to take the i_mutex here so we have to play games to
  8293	 * protect against truncate races as the page could now be beyond EOF.  Because
  8294	 * truncate_setsize() writes the inode size before removing pages, once we have
  8295	 * the page lock we can determine safely if the page is beyond EOF. If it is not
  8296	 * beyond EOF, then the page is guaranteed safe against truncation until we
  8297	 * unlock the page.
  8298	 */
  8299	vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
  8300	{
  8301		struct page *page = vmf->page;
  8302		struct inode *inode = file_inode(vmf->vma->vm_file);
  8303		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
  8304		struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
  8305		struct btrfs_ordered_extent *ordered;
  8306		struct extent_state *cached_state = NULL;
  8307		struct extent_changeset *data_reserved = NULL;
  8308		char *kaddr;
  8309		unsigned long zero_start;
  8310		loff_t size;
  8311		vm_fault_t ret;
  8312		int ret2;
  8313		int reserved = 0;
  8314		u64 reserved_space;
  8315		u64 page_start;
  8316		u64 page_end;
  8317		u64 end;
  8318	
  8319		reserved_space = PAGE_SIZE;
  8320	
  8321		sb_start_pagefault(inode->i_sb);
  8322		page_start = page_offset(page);
  8323		page_end = page_start + PAGE_SIZE - 1;
  8324		end = page_end;
  8325	
  8326		/*
  8327		 * Reserving delalloc space after obtaining the page lock can lead to
  8328		 * deadlock. For example, if a dirty page is locked by this function
  8329		 * and the call to btrfs_delalloc_reserve_space() ends up triggering
  8330		 * dirty page write out, then the btrfs_writepage() function could
  8331		 * end up waiting indefinitely to get a lock on the page currently
  8332		 * being processed by btrfs_page_mkwrite() function.
  8333		 */
  8334		ret2 = btrfs_delalloc_reserve_space(BTRFS_I(inode), &data_reserved,
  8335						    page_start, reserved_space);
  8336		if (!ret2) {
  8337			ret2 = file_update_time(vmf->vma->vm_file);
  8338			reserved = 1;
  8339		}
  8340		if (ret2) {
  8341			ret = vmf_error(ret2);
  8342			if (reserved)
  8343				goto out;
  8344			goto out_noreserve;
  8345		}
  8346	
  8347		ret = VM_FAULT_NOPAGE; /* make the VM retry the fault */
  8348	again:
  8349		lock_page(page);
  8350		size = i_size_read(inode);
  8351	
  8352		if ((page->mapping != inode->i_mapping) ||
  8353		    (page_start >= size)) {
  8354			/* page got truncated out from underneath us */
  8355			goto out_unlock;
  8356		}
  8357		wait_on_page_writeback(page);
  8358	
  8359		lock_extent_bits(io_tree, page_start, page_end, &cached_state);
  8360		ret = set_page_extent_mapped(page);
> 8361		if (ret < 0)
  8362			goto out_unlock;
  8363	
  8364		/*
  8365		 * we can't set the delalloc bits if there are pending ordered
  8366		 * extents.  Drop our locks and wait for them to finish
  8367		 */
  8368		ordered = btrfs_lookup_ordered_range(BTRFS_I(inode), page_start,
  8369				PAGE_SIZE);
  8370		if (ordered) {
  8371			unlock_extent_cached(io_tree, page_start, page_end,
  8372					     &cached_state);
  8373			unlock_page(page);
  8374			btrfs_start_ordered_extent(ordered, 1);
  8375			btrfs_put_ordered_extent(ordered);
  8376			goto again;
  8377		}
  8378	
  8379		if (page->index == ((size - 1) >> PAGE_SHIFT)) {
  8380			reserved_space = round_up(size - page_start,
  8381						  fs_info->sectorsize);
  8382			if (reserved_space < PAGE_SIZE) {
  8383				end = page_start + reserved_space - 1;
  8384				btrfs_delalloc_release_space(BTRFS_I(inode),
  8385						data_reserved, page_start,
  8386						PAGE_SIZE - reserved_space, true);
  8387			}
  8388		}
  8389	
  8390		/*
  8391		 * page_mkwrite gets called when the page is firstly dirtied after it's
  8392		 * faulted in, but write(2) could also dirty a page and set delalloc
  8393		 * bits, thus in this case for space account reason, we still need to
  8394		 * clear any delalloc bits within this page range since we have to
  8395		 * reserve data&meta space before lock_page() (see above comments).
  8396		 */
  8397		clear_extent_bit(&BTRFS_I(inode)->io_tree, page_start, end,
  8398				  EXTENT_DELALLOC | EXTENT_DO_ACCOUNTING |
  8399				  EXTENT_DEFRAG, 0, 0, &cached_state);
  8400	
  8401		ret2 = btrfs_set_extent_delalloc(BTRFS_I(inode), page_start, end, 0,
  8402						&cached_state);
  8403		if (ret2) {
  8404			unlock_extent_cached(io_tree, page_start, page_end,
  8405					     &cached_state);
  8406			ret = VM_FAULT_SIGBUS;
  8407			goto out_unlock;
  8408		}
  8409	
  8410		/* page is wholly or partially inside EOF */
  8411		if (page_start + PAGE_SIZE > size)
  8412			zero_start = offset_in_page(size);
  8413		else
  8414			zero_start = PAGE_SIZE;
  8415	
  8416		if (zero_start != PAGE_SIZE) {
  8417			kaddr = kmap(page);
  8418			memset(kaddr + zero_start, 0, PAGE_SIZE - zero_start);
  8419			flush_dcache_page(page);
  8420			kunmap(page);
  8421		}
  8422		ClearPageChecked(page);
  8423		set_page_dirty(page);
  8424		SetPageUptodate(page);
  8425	
  8426		BTRFS_I(inode)->last_trans = fs_info->generation;
  8427		BTRFS_I(inode)->last_sub_trans = BTRFS_I(inode)->root->log_transid;
  8428		BTRFS_I(inode)->last_log_commit = BTRFS_I(inode)->root->last_log_commit;
  8429	
  8430		unlock_extent_cached(io_tree, page_start, page_end, &cached_state);
  8431	
  8432		btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE);
  8433		sb_end_pagefault(inode->i_sb);
  8434		extent_changeset_free(data_reserved);
  8435		return VM_FAULT_LOCKED;
  8436	
  8437	out_unlock:
  8438		unlock_page(page);
  8439	out:
  8440		btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE);
  8441		btrfs_delalloc_release_space(BTRFS_I(inode), data_reserved, page_start,
  8442					     reserved_space, (ret != 0));
  8443	out_noreserve:
  8444		sb_end_pagefault(inode->i_sb);
  8445		extent_changeset_free(data_reserved);
  8446		return ret;
  8447	}
  8448	
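The smatch warning above ("unsigned 'ret' is never less than zero") comes from storing the int return value of set_page_extent_mapped() in the vm_fault_t 'ret', which is an unsigned type, so the `if (ret < 0)` at line 8361 can never fire. A minimal stand-alone sketch of the pitfall (the typedef and stub are illustrative, not the kernel's definitions):

```c
#include <assert.h>

/* Illustration only: vm_fault_t is an unsigned type in the kernel, so
 * storing a negative errno in it and testing "ret < 0" can never be
 * true -- which is what smatch flags at inode.c:8361. */
typedef unsigned int vm_fault_t;

static int set_page_extent_mapped_stub(void)
{
	return -12;	/* pretend the call failed with -ENOMEM */
}

/* Buggy shape: the negative return is lost in the unsigned variable. */
static int buggy_check(void)
{
	vm_fault_t ret = set_page_extent_mapped_stub();

	return ret < 0;	/* always 0: an unsigned value is never negative */
}

/* Fixed shape: keep the errno in a signed int, convert afterwards. */
static int fixed_check(void)
{
	int ret2 = set_page_extent_mapped_stub();

	return ret2 < 0;	/* 1: the error is detected */
}
```

The natural fix in btrfs_page_mkwrite() would be to use the existing signed `ret2` for the set_page_extent_mapped() return and only convert to a vm_fault_t on the error path.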

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 36210 bytes --]

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 09/18] btrfs: subpage: introduce helper for subpage uptodate status
  2020-12-10  6:38 ` [PATCH v2 09/18] btrfs: subpage: introduce helper for subpage uptodate status Qu Wenruo
@ 2020-12-11 10:10   ` Nikolay Borisov
  2020-12-11 10:48     ` Qu Wenruo
  0 siblings, 1 reply; 71+ messages in thread
From: Nikolay Borisov @ 2020-12-11 10:10 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 10.12.20 г. 8:38 ч., Qu Wenruo wrote:
> This patch introduce the following functions to handle btrfs subpage
> uptodate status:
> - btrfs_subpage_set_uptodate()
> - btrfs_subpage_clear_uptodate()
> - btrfs_subpage_test_uptodate()
>   Those helpers can only be called when the range is ensured to be
>   inside the page.
> 
> - btrfs_page_set_uptodate()
> - btrfs_page_clear_uptodate()
> - btrfs_page_test_uptodate()
>   Those helpers can handle both regular sector size and subpage without
>   problem.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/subpage.h | 98 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 98 insertions(+)
> 
> diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
> index 87b4e028ae18..b3cf9171ec98 100644
> --- a/fs/btrfs/subpage.h
> +++ b/fs/btrfs/subpage.h
> @@ -23,6 +23,7 @@
>  struct btrfs_subpage {
>  	/* Common members for both data and metadata pages */
>  	spinlock_t lock;
> +	u16 uptodate_bitmap;
>  	union {
>  		/* Structures only used by metadata */
>  		struct {
> @@ -35,6 +36,17 @@ struct btrfs_subpage {
>  int btrfs_attach_subpage(struct btrfs_fs_info *fs_info, struct page *page);
>  void btrfs_detach_subpage(struct btrfs_fs_info *fs_info, struct page *page);
>  
> +static inline void btrfs_subpage_clamp_range(struct page *page,
> +					     u64 *start, u32 *len)
> +{
> +	u64 orig_start = *start;
> +	u32 orig_len = *len;
> +
> +	*start = max_t(u64, page_offset(page), orig_start);
> +	*len = min_t(u64, page_offset(page) + PAGE_SIZE,
> +		     orig_start + orig_len) - *start;
> +}

This handles EB's which span pages, right? If so - a comment is in order
since there is no design document specifying whether eb can or cannot
span multiple pages.
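For reference, the clamping the helper performs can be modelled stand-alone like this (a sketch assuming a 64K page; the function name and types mirror the patch, but this is not the kernel code):

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 65536ULL	/* assume a 64K page system */

/* Stand-alone model of btrfs_subpage_clamp_range(): shrink the range
 * [*start, *start + *len) so that it lies entirely within the page
 * starting at page_offset. */
static void clamp_range(uint64_t page_offset, uint64_t *start, uint32_t *len)
{
	uint64_t orig_start = *start;
	uint64_t orig_end = *start + *len;
	uint64_t page_end = page_offset + MODEL_PAGE_SIZE;

	/* Clamp the start up to the page start ... */
	*start = orig_start > page_offset ? orig_start : page_offset;
	/* ... and the end down to the page end. */
	*len = (uint32_t)((orig_end < page_end ? orig_end : page_end) - *start);
}
```

E.g. a 16K range that begins 4K before the page boundary gets trimmed to the 12K that actually lies inside the page.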

> +
>  /*
>   * Convert the [start, start + len) range into a u16 bitmap
>   *
> @@ -96,4 +108,90 @@ static inline bool btrfs_subpage_clear_and_test_tree_block(
>  	return last;
>  }
>  
> +static inline void btrfs_subpage_set_uptodate(struct btrfs_fs_info *fs_info,
> +			struct page *page, u64 start, u32 len)
> +{
> +	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
> +	u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len);
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&subpage->lock, flags);
> +	subpage->uptodate_bitmap |= tmp;
> +	if (subpage->uptodate_bitmap == (u16)-1)

just use U16_MAX instead of (u16)-1.

> +		SetPageUptodate(page);
> +	spin_unlock_irqrestore(&subpage->lock, flags);
> +}
> +
> +static inline void btrfs_subpage_clear_uptodate(struct btrfs_fs_info *fs_info,
> +			struct page *page, u64 start, u32 len)
> +{
> +	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
> +	u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len);
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&subpage->lock, flags);
> +	subpage->tree_block_bitmap &= ~tmp;

I guess you meant to clear uptodate_bitmap and not tree_block_bitmap?

> +	ClearPageUptodate(page);
> +	spin_unlock_irqrestore(&subpage->lock, flags);
> +}
> +

<snip>

> +DECLARE_BTRFS_SUBPAGE_TEST_OP(uptodate);
> +
> +/*
> + * Note that, in selftest, especially extent-io-tests, we can have empty
> + * fs_info passed in.
> + * Thanfully in selftest, we only test sectorsize == PAGE_SIZE cases so far

nit: s/Thanfully/Thankfully

> + * thus we can fall back to regular sectorsize branch.
> + */

<snip>

> 

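The uptodate set/clear pair under discussion can be modelled stand-alone as follows (a sketch with the spinlock and the real struct btrfs_subpage omitted; UINT16_MAX plays the role of the kernel's U16_MAX suggested in the review, and the struct/field names here are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Minimal model of subpage uptodate tracking: one bit per 4K sector of
 * a 64K page, promoted to a full-page uptodate flag only when all 16
 * sectors are covered. */
struct subpage_model {
	uint16_t uptodate_bitmap;
	int page_uptodate;
};

static void subpage_set_uptodate(struct subpage_model *sp, uint16_t bits)
{
	sp->uptodate_bitmap |= bits;
	if (sp->uptodate_bitmap == UINT16_MAX)	/* every sector uptodate */
		sp->page_uptodate = 1;
}

static void subpage_clear_uptodate(struct subpage_model *sp, uint16_t bits)
{
	/* Clear uptodate_bitmap, not tree_block_bitmap -- the
	 * copy/paste slip caught in the review above. */
	sp->uptodate_bitmap &= ~bits;
	sp->page_uptodate = 0;
}
```

The point of the design is that the page flag stays a pure summary: it is only set once the per-sector bitmap is full, and any clear drops it again.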
^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 09/18] btrfs: subpage: introduce helper for subpage uptodate status
  2020-12-11 10:10   ` Nikolay Borisov
@ 2020-12-11 10:48     ` Qu Wenruo
  2020-12-11 11:41       ` Nikolay Borisov
  0 siblings, 1 reply; 71+ messages in thread
From: Qu Wenruo @ 2020-12-11 10:48 UTC (permalink / raw)
  To: Nikolay Borisov, Qu Wenruo, linux-btrfs



On 2020/12/11 下午6:10, Nikolay Borisov wrote:
>
>
> On 10.12.20 г. 8:38 ч., Qu Wenruo wrote:
>> This patch introduce the following functions to handle btrfs subpage
>> uptodate status:
>> - btrfs_subpage_set_uptodate()
>> - btrfs_subpage_clear_uptodate()
>> - btrfs_subpage_test_uptodate()
>>   Those helpers can only be called when the range is ensured to be
>>   inside the page.
>>
>> - btrfs_page_set_uptodate()
>> - btrfs_page_clear_uptodate()
>> - btrfs_page_test_uptodate()
>>   Those helpers can handle both regular sector size and subpage without
>>   problem.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>  fs/btrfs/subpage.h | 98 ++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 98 insertions(+)
>>
>> diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
>> index 87b4e028ae18..b3cf9171ec98 100644
>> --- a/fs/btrfs/subpage.h
>> +++ b/fs/btrfs/subpage.h
>> @@ -23,6 +23,7 @@
>>  struct btrfs_subpage {
>>  	/* Common members for both data and metadata pages */
>>  	spinlock_t lock;
>> +	u16 uptodate_bitmap;
>>  	union {
>>  		/* Structures only used by metadata */
>>  		struct {
>> @@ -35,6 +36,17 @@ struct btrfs_subpage {
>>  int btrfs_attach_subpage(struct btrfs_fs_info *fs_info, struct page *page);
>>  void btrfs_detach_subpage(struct btrfs_fs_info *fs_info, struct page *page);
>>
>> +static inline void btrfs_subpage_clamp_range(struct page *page,
>> +					     u64 *start, u32 *len)
>> +{
>> +	u64 orig_start = *start;
>> +	u32 orig_len = *len;
>> +
>> +	*start = max_t(u64, page_offset(page), orig_start);
>> +	*len = min_t(u64, page_offset(page) + PAGE_SIZE,
>> +		     orig_start + orig_len) - *start;
>> +}
>
> This handles EB's which span pages, right? If so - a comment is in order
> since there is no design document specifying whether eb can or cannot
> span multiple pages.

Didn't I already state that in the subpage eb accessors patch?

No subpage eb can cross a page boundary.

>
>> +
>>  /*
>>   * Convert the [start, start + len) range into a u16 bitmap
>>   *
>> @@ -96,4 +108,90 @@ static inline bool btrfs_subpage_clear_and_test_tree_block(
>>  	return last;
>>  }
>>
>> +static inline void btrfs_subpage_set_uptodate(struct btrfs_fs_info *fs_info,
>> +			struct page *page, u64 start, u32 len)
>> +{
>> +	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
>> +	u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len);
>> +	unsigned long flags;
>> +
>> +	spin_lock_irqsave(&subpage->lock, flags);
>> +	subpage->uptodate_bitmap |= tmp;
>> +	if (subpage->uptodate_bitmap == (u16)-1)
>
> just use U16_MAX instead of (u16)-1.
>
>> +		SetPageUptodate(page);
>> +	spin_unlock_irqrestore(&subpage->lock, flags);
>> +}
>> +
>> +static inline void btrfs_subpage_clear_uptodate(struct btrfs_fs_info *fs_info,
>> +			struct page *page, u64 start, u32 len)
>> +{
>> +	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
>> +	u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len);
>> +	unsigned long flags;
>> +
>> +	spin_lock_irqsave(&subpage->lock, flags);
>> +	subpage->tree_block_bitmap &= ~tmp;
>
> I guess you meant to clear uptodate_bitmap and not tree_block_bitmap?

Oh my...

Thanks for catching this,
Qu
>
>> +	ClearPageUptodate(page);
>> +	spin_unlock_irqrestore(&subpage->lock, flags);
>> +}
>> +
>
> <snip>
>
>> +DECLARE_BTRFS_SUBPAGE_TEST_OP(uptodate);
>> +
>> +/*
>> + * Note that, in selftest, especially extent-io-tests, we can have empty
>> + * fs_info passed in.
>> + * Thanfully in selftest, we only test sectorsize == PAGE_SIZE cases so far
>
> nit: s/Thanfully/Thankfully
>
>> + * thus we can fall back to regular sectorsize branch.
>> + */
>
> <snip>
>
>>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 09/18] btrfs: subpage: introduce helper for subpage uptodate status
  2020-12-11 10:48     ` Qu Wenruo
@ 2020-12-11 11:41       ` Nikolay Borisov
  2020-12-11 11:56         ` Qu Wenruo
  0 siblings, 1 reply; 71+ messages in thread
From: Nikolay Borisov @ 2020-12-11 11:41 UTC (permalink / raw)
  To: Qu Wenruo, Qu Wenruo, linux-btrfs



On 11.12.20 г. 12:48 ч., Qu Wenruo wrote:
> 
> 
> On 2020/12/11 下午6:10, Nikolay Borisov wrote:
>>
>>
>> On 10.12.20 г. 8:38 ч., Qu Wenruo wrote:
>>> This patch introduce the following functions to handle btrfs subpage
>>> uptodate status:
>>> - btrfs_subpage_set_uptodate()
>>> - btrfs_subpage_clear_uptodate()
>>> - btrfs_subpage_test_uptodate()
>>>   Those helpers can only be called when the range is ensured to be
>>>   inside the page.
>>>
>>> - btrfs_page_set_uptodate()
>>> - btrfs_page_clear_uptodate()
>>> - btrfs_page_test_uptodate()
>>>   Those helpers can handle both regular sector size and subpage without
>>>   problem.
>>>
>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>> ---
>>>  fs/btrfs/subpage.h | 98 ++++++++++++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 98 insertions(+)
>>>
>>> diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
>>> index 87b4e028ae18..b3cf9171ec98 100644
>>> --- a/fs/btrfs/subpage.h
>>> +++ b/fs/btrfs/subpage.h
>>> @@ -23,6 +23,7 @@
>>>  struct btrfs_subpage {
>>>  	/* Common members for both data and metadata pages */
>>>  	spinlock_t lock;
>>> +	u16 uptodate_bitmap;
>>>  	union {
>>>  		/* Structures only used by metadata */
>>>  		struct {
>>> @@ -35,6 +36,17 @@ struct btrfs_subpage {
>>>  int btrfs_attach_subpage(struct btrfs_fs_info *fs_info, struct page *page);
>>>  void btrfs_detach_subpage(struct btrfs_fs_info *fs_info, struct page *page);
>>>
>>> +static inline void btrfs_subpage_clamp_range(struct page *page,
>>> +					     u64 *start, u32 *len)
>>> +{
>>> +	u64 orig_start = *start;
>>> +	u32 orig_len = *len;
>>> +
>>> +	*start = max_t(u64, page_offset(page), orig_start);
>>> +	*len = min_t(u64, page_offset(page) + PAGE_SIZE,
>>> +		     orig_start + orig_len) - *start;
>>> +}
>>
>> This handles EB's which span pages, right? If so - a comment is in order
>> since there is no design document specifying whether eb can or cannot
>> span multiple pages.
> 
> Didn't I already state that in the subpage eb accessors patch?
> 
> No subpage eb can cross a page boundary.
> 

As just discussed during the whiteboard session, this function is really
dead code for ebs, because they are guaranteed not to span pages. Even
for RW support it seems btrfs_dirty_pages is the only place that changes
page flags without having clamped the range first, i.e. there is only
one exception. In light of this I think it would be better to replace
this function with ASSERTs and handle the only exception at the call site.


<snip>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 09/18] btrfs: subpage: introduce helper for subpage uptodate status
  2020-12-11 11:41       ` Nikolay Borisov
@ 2020-12-11 11:56         ` Qu Wenruo
  0 siblings, 0 replies; 71+ messages in thread
From: Qu Wenruo @ 2020-12-11 11:56 UTC (permalink / raw)
  To: Nikolay Borisov, Qu Wenruo, linux-btrfs



On 2020/12/11 下午7:41, Nikolay Borisov wrote:
>
>
> On 11.12.20 г. 12:48 ч., Qu Wenruo wrote:
>>
>>
>> On 2020/12/11 下午6:10, Nikolay Borisov wrote:
>>>
>>>
>>> On 10.12.20 г. 8:38 ч., Qu Wenruo wrote:
>>>> This patch introduce the following functions to handle btrfs subpage
>>>> uptodate status:
>>>> - btrfs_subpage_set_uptodate()
>>>> - btrfs_subpage_clear_uptodate()
>>>> - btrfs_subpage_test_uptodate()
>>>>   Those helpers can only be called when the range is ensured to be
>>>>   inside the page.
>>>>
>>>> - btrfs_page_set_uptodate()
>>>> - btrfs_page_clear_uptodate()
>>>> - btrfs_page_test_uptodate()
>>>>   Those helpers can handle both regular sector size and subpage without
>>>>   problem.
>>>>
>>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>>> ---
>>>>  fs/btrfs/subpage.h | 98 ++++++++++++++++++++++++++++++++++++++++++++++
>>>>  1 file changed, 98 insertions(+)
>>>>
>>>> diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
>>>> index 87b4e028ae18..b3cf9171ec98 100644
>>>> --- a/fs/btrfs/subpage.h
>>>> +++ b/fs/btrfs/subpage.h
>>>> @@ -23,6 +23,7 @@
>>>>  struct btrfs_subpage {
>>>>  	/* Common members for both data and metadata pages */
>>>>  	spinlock_t lock;
>>>> +	u16 uptodate_bitmap;
>>>>  	union {
>>>>  		/* Structures only used by metadata */
>>>>  		struct {
>>>> @@ -35,6 +36,17 @@ struct btrfs_subpage {
>>>>  int btrfs_attach_subpage(struct btrfs_fs_info *fs_info, struct page *page);
>>>>  void btrfs_detach_subpage(struct btrfs_fs_info *fs_info, struct page *page);
>>>>
>>>> +static inline void btrfs_subpage_clamp_range(struct page *page,
>>>> +					     u64 *start, u32 *len)
>>>> +{
>>>> +	u64 orig_start = *start;
>>>> +	u32 orig_len = *len;
>>>> +
>>>> +	*start = max_t(u64, page_offset(page), orig_start);
>>>> +	*len = min_t(u64, page_offset(page) + PAGE_SIZE,
>>>> +		     orig_start + orig_len) - *start;
>>>> +}
>>>
>>> This handles EB's which span pages, right? If so - a comment is in order
>>> since there is no design document specifying whether eb can or cannot
>>> span multiple pages.
>>
>> Didn't I already state that in the subpage eb accessors patch?
>>
>> No subpage eb can cross a page boundary.
>>
>
> As just discussed during the whiteboard session, this function is really
> dead code for ebs, because they are guaranteed not to span pages. Even
> for RW support it seems btrfs_dirty_pages is the only place that changes
> page flags without having clamped the range first, i.e. there is only
> one exception. In light of this I think it would be better to replace
> this function with ASSERTs and handle the only exception at the call site.

You're completely right.

I'll definitely change these in the next update.

Thanks,
Qu
>
>
> <snip>
>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 12/18] btrfs: extent_io: implement try_release_extent_buffer() for subpage metadata support
  2020-12-10  6:38 ` [PATCH v2 12/18] btrfs: extent_io: implement try_release_extent_buffer() for subpage metadata support Qu Wenruo
@ 2020-12-11 12:00   ` Nikolay Borisov
  2020-12-11 12:11     ` Qu Wenruo
  0 siblings, 1 reply; 71+ messages in thread
From: Nikolay Borisov @ 2020-12-11 12:00 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 10.12.20 г. 8:38 ч., Qu Wenruo wrote:
> Unlike the original try_release_extent_buffer,
> try_release_subpage_extent_buffer() will iterate through
> btrfs_subpage::tree_block_bitmap, and try to release each extent buffer.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent_io.c | 73 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 73 insertions(+)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 141e414b1ab9..4d55803302e9 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -6258,10 +6258,83 @@ void memmove_extent_buffer(const struct extent_buffer *dst,
>  	}
>  }
>  
> +static int try_release_subpage_extent_buffer(struct page *page)
> +{
> +	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
> +	u64 page_start = page_offset(page);
> +	int bitmap_size = BTRFS_SUBPAGE_BITMAP_SIZE;

Remove this variable and directly use BTRFS_SUBPAGE_BITMAP_SIZE as a
terminating condition

> +	int bit_start = 0;
> +	int ret;
> +
> +	while (bit_start < bitmap_size) {

You really want to iterate for a fixed number of items so switch that to
a for loop.

> +		struct btrfs_subpage *subpage;
> +		struct extent_buffer *eb;
> +		unsigned long flags;
> +		u16 tmp = 1 << bit_start;
> +		u64 start;
> +
> +		/*
> +		 * Make sure the page still has private, as previous run can
> +		 * detach the private
> +		 */

But if previous run has run it would have disposed of this eb and you
won't find this page at all, no ?

> +		spin_lock(&page->mapping->private_lock);
> +		if (!PagePrivate(page)) {
> +			spin_unlock(&page->mapping->private_lock);
> +			break;
> +		}
> +		subpage = (struct btrfs_subpage *)page->private;
> +		spin_unlock(&page->mapping->private_lock);
> +
> +		spin_lock_irqsave(&subpage->lock, flags);
> +		if (!(tmp & subpage->tree_block_bitmap))  {
> +			spin_unlock_irqrestore(&subpage->lock, flags);
> +			bit_start++;
> +			continue;
> +		}
> +		spin_unlock_irqrestore(&subpage->lock, flags);
> +
> +		start = bit_start * fs_info->sectorsize + page_start;
> +		bit_start += fs_info->nodesize >> fs_info->sectorsize_bits;

By doing this you are really saying "skip all blocks pertaining to this
eb". In order for this to be correct it would imply that bit_start
should _always_ be 0, 4, 8, 12 - am I correct? But what happens if
if (!(tmp & subpage->tree_block_bitmap)) has executed and bit_start is
now 1? Then you'd make start = page_start + 4k, skip the next 4 blocks
(16k), but that would be wrong, no?

Essentially the page would look like:

|0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15

So you want to release the EBs that span 0-3, 4-7, 8-11, 12-15, but
what if bit_start becomes 1 and you add 4 to it? That offsets all
further calculations by 1, i.e. you are going into the next eb.


> +		/*
> +		 * Here we can't call find_extent_buffer() which will increase
> +		 * eb->refs.
> +		 */
> +		rcu_read_lock();
> +		eb = radix_tree_lookup(&fs_info->buffer_radix,
> +				start >> fs_info->sectorsize_bits);
> +		rcu_read_unlock();
> +		ASSERT(eb);
> +		spin_lock(&eb->refs_lock);
> +		if (atomic_read(&eb->refs) != 1 || extent_buffer_under_io(eb) ||
> +		    !test_and_clear_bit(EXTENT_BUFFER_TREE_REF, &eb->bflags)) {
> +			spin_unlock(&eb->refs_lock);
> +			continue;
> +		}
> +		/*
> +		 * Here we don't care the return value, we will always check
> +		 * the page private at the end.
> +		 * And release_extent_buffer() will release the refs_lock.
> +		 */
> +		release_extent_buffer(eb);
> +	}
> +	/* Finally to check if we have cleared page private */
> +	spin_lock(&page->mapping->private_lock);
> +	if (!PagePrivate(page))
> +		ret = 1;
> +	else
> +		ret = 0;
> +	spin_unlock(&page->mapping->private_lock);
> +	return ret;
> +
> +}
> +
>  int try_release_extent_buffer(struct page *page)
>  {
>  	struct extent_buffer *eb;
>  
> +	if (btrfs_sb(page->mapping->host->i_sb)->sectorsize < PAGE_SIZE)
> +		return try_release_subpage_extent_buffer(page);
> +
>  	/*
>  	 * We need to make sure nobody is attaching this page to an eb right
>  	 * now.
> 

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 12/18] btrfs: extent_io: implement try_release_extent_buffer() for subpage metadata support
  2020-12-11 12:00   ` Nikolay Borisov
@ 2020-12-11 12:11     ` Qu Wenruo
  2020-12-11 16:57       ` Nikolay Borisov
  0 siblings, 1 reply; 71+ messages in thread
From: Qu Wenruo @ 2020-12-11 12:11 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs



On 2020/12/11 下午8:00, Nikolay Borisov wrote:
> 
> 
> On 10.12.20 г. 8:38 ч., Qu Wenruo wrote:
>> Unlike the original try_release_extent_buffer,
>> try_release_subpage_extent_buffer() will iterate through
>> btrfs_subpage::tree_block_bitmap, and try to release each extent buffer.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>  fs/btrfs/extent_io.c | 73 ++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 73 insertions(+)
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index 141e414b1ab9..4d55803302e9 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -6258,10 +6258,83 @@ void memmove_extent_buffer(const struct extent_buffer *dst,
>>  	}
>>  }
>>  
>> +static int try_release_subpage_extent_buffer(struct page *page)
>> +{
>> +	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
>> +	u64 page_start = page_offset(page);
>> +	int bitmap_size = BTRFS_SUBPAGE_BITMAP_SIZE;
> 
> Remove this variable and directly use BTRFS_SUBPAGE_BITMAP_SIZE as a
> terminating condition
> 
>> +	int bit_start = 0;
>> +	int ret;
>> +
>> +	while (bit_start < bitmap_size) {
> 
> You really want to iterate for a fixed number of items so switch that to
> a for loop.

The problem here is, the step is not always fixed.

If it finds one bit set, it will skip (nodesize >> sectorsize_bits) bits.

But if not, it will skip just to the next bit.

Thus I'm not sure a for loop is really a good choice here, given the
variable step size.

> 
>> +		struct btrfs_subpage *subpage;
>> +		struct extent_buffer *eb;
>> +		unsigned long flags;
>> +		u16 tmp = 1 << bit_start;
>> +		u64 start;
>> +
>> +		/*
>> +		 * Make sure the page still has private, as previous run can
>> +		 * detach the private
>> +		 */
> 
> But if previous run has run it would have disposed of this eb and you
> won't find this page at all, no ?

By "previous run" I mean a previous iteration of the same loop.

E.g. the page has 4 bits set, just one eb (16K nodesize).

On the first iteration, it releases the only eb of the page and clears
page private.
On the second iteration, since private is cleared, we need to break out.

> 
>> +		spin_lock(&page->mapping->private_lock);
>> +		if (!PagePrivate(page)) {
>> +			spin_unlock(&page->mapping->private_lock);
>> +			break;
>> +		}
>> +		subpage = (struct btrfs_subpage *)page->private;
>> +		spin_unlock(&page->mapping->private_lock);
>> +
>> +		spin_lock_irqsave(&subpage->lock, flags);
>> +		if (!(tmp & subpage->tree_block_bitmap))  {
>> +			spin_unlock_irqrestore(&subpage->lock, flags);
>> +			bit_start++;
>> +			continue;
>> +		}
>> +		spin_unlock_irqrestore(&subpage->lock, flags);
>> +
>> +		start = bit_start * fs_info->sectorsize + page_start;
>> +		bit_start += fs_info->nodesize >> fs_info->sectorsize_bits;
> 
> By doing this you are really saying "skip all blocks pertaining to this
> eb". In order for this to be correct it would imply that bit_start
> should _always_ be 0,4,8,12 - am I correct? 

Nope. As long as no eb crosses the page boundary, it won't cause a problem.
So in theory we support a case like an eb spanning sectors 1~5.

> But what happens if
> if (!(tmp & subpage->tree_block_bitmap))  has executed and bit_start is
> now 1, then you'd make start = page_start + 4k , skip next 4(16k) blocks
> but that would be wrong, no ?

For the (!(tmp & subpage->tree_block_bitmap)) branch, isn't bit_start
just increased by one?
Exactly as I said, we will check the next sector, until we hit the
first set bit.

And only when we hit a set bit do we increase bit_start by nodesize /
sectorsize.

> 
> Essentially the page would look like:
> 
> |0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15
> 
> So you want to release the EBs that span 0-3, 4-7, 8-11, 12-15, but
> what if bit_start becomes 1 and you add 4 to it? That offsets all
> further calculations by 1, i.e. you are going into the next eb.

Nope, 4 is only added when we hit a set bit.
If we hit a zero bit, we jump to the next bit, not by nodesize >>
sectorsize.

That's exactly the reason I'm not using a for() loop here, due to the
difference in step size.
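The variable-step walk I'm describing can be sketched stand-alone like this (assuming 4K sectors and 16K nodes, with a hypothetical helper name; actually releasing the eb is reduced to counting it):

```c
#include <assert.h>
#include <stdint.h>

#define SUBPAGE_BITMAP_SIZE 16	/* 64K page / 4K sector */
#define SECTORS_PER_NODE    4	/* assume 16K nodesize, 4K sectorsize */

/* Model of the walk in try_release_subpage_extent_buffer(): advance by
 * one bit while scanning for a set bit, but by a whole eb's worth of
 * bits once one is found and the eb has been handled. Returns how many
 * ebs were found in the page. */
static int count_ebs(uint16_t tree_block_bitmap)
{
	int bit_start = 0;
	int found = 0;

	while (bit_start < SUBPAGE_BITMAP_SIZE) {
		if (!(tree_block_bitmap & (1 << bit_start))) {
			bit_start++;	/* empty sector: step one bit */
			continue;
		}
		found++;		/* an eb starts here: step past it */
		bit_start += SECTORS_PER_NODE;
	}
	return found;
}
```

Note the unaligned case works too: an eb occupying sectors 1-4 (bitmap 0x001E) is found once, exactly because the scan steps a single bit at a time until it hits the first set bit.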

Thanks,
Qu
> 
> 
>> +		/*
>> +		 * Here we can't call find_extent_buffer() which will increase
>> +		 * eb->refs.
>> +		 */
>> +		rcu_read_lock();
>> +		eb = radix_tree_lookup(&fs_info->buffer_radix,
>> +				start >> fs_info->sectorsize_bits);
>> +		rcu_read_unlock();
>> +		ASSERT(eb);
>> +		spin_lock(&eb->refs_lock);
>> +		if (atomic_read(&eb->refs) != 1 || extent_buffer_under_io(eb) ||
>> +		    !test_and_clear_bit(EXTENT_BUFFER_TREE_REF, &eb->bflags)) {
>> +			spin_unlock(&eb->refs_lock);
>> +			continue;
>> +		}
>> +		/*
>> +		 * Here we don't care the return value, we will always check
>> +		 * the page private at the end.
>> +		 * And release_extent_buffer() will release the refs_lock.
>> +		 */
>> +		release_extent_buffer(eb);
>> +	}
>> +	/* Finally to check if we have cleared page private */
>> +	spin_lock(&page->mapping->private_lock);
>> +	if (!PagePrivate(page))
>> +		ret = 1;
>> +	else
>> +		ret = 0;
>> +	spin_unlock(&page->mapping->private_lock);
>> +	return ret;
>> +
>> +}
>> +
>>  int try_release_extent_buffer(struct page *page)
>>  {
>>  	struct extent_buffer *eb;
>>  
>> +	if (btrfs_sb(page->mapping->host->i_sb)->sectorsize < PAGE_SIZE)
>> +		return try_release_subpage_extent_buffer(page);
>> +
>>  	/*
>>  	 * We need to make sure nobody is attaching this page to an eb right
>>  	 * now.
>>
> 


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 12/18] btrfs: extent_io: implement try_release_extent_buffer() for subpage metadata support
  2020-12-11 12:11     ` Qu Wenruo
@ 2020-12-11 16:57       ` Nikolay Borisov
  2020-12-12  1:28         ` Qu Wenruo
  2020-12-12  5:44         ` Qu Wenruo
  0 siblings, 2 replies; 71+ messages in thread
From: Nikolay Borisov @ 2020-12-11 16:57 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 11.12.20 г. 14:11 ч., Qu Wenruo wrote:
> 
> 
> On 2020/12/11 下午8:00, Nikolay Borisov wrote:
>>
>>
>> On 10.12.20 г. 8:38 ч., Qu Wenruo wrote:
>>> Unlike the original try_release_extent_buffer,
>>> try_release_subpage_extent_buffer() will iterate through
>>> btrfs_subpage::tree_block_bitmap, and try to release each extent buffer.
>>>
>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>> ---
>>>  fs/btrfs/extent_io.c | 73 ++++++++++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 73 insertions(+)
>>>
>>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>>> index 141e414b1ab9..4d55803302e9 100644
>>> --- a/fs/btrfs/extent_io.c
>>> +++ b/fs/btrfs/extent_io.c
>>> @@ -6258,10 +6258,83 @@ void memmove_extent_buffer(const struct extent_buffer *dst,
>>>  	}
>>>  }
>>>  
>>> +static int try_release_subpage_extent_buffer(struct page *page)
>>> +{
>>> +	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
>>> +	u64 page_start = page_offset(page);
>>> +	int bitmap_size = BTRFS_SUBPAGE_BITMAP_SIZE;
>>
>> Remove this variable and directly use BTRFS_SUBPAGE_BITMAP_SIZE as a
>> terminating condition
>>
>>> +	int bit_start = 0;
>>> +	int ret;
>>> +
>>> +	while (bit_start < bitmap_size) {
>>
>> You really want to iterate for a fixed number of items so switch that to
>> a for loop.
> 
> The problem here is, the step is not always fixed.
> 
> If it finds one bit set, it will skip (nodesize >> sectorsize_bits) bits.
> 
> But if not found, it will skip just to the next bit.
> 
> Thus I'm not sure a for loop is really a good choice here, given the
> varying step.
> 
>>
>>> +		struct btrfs_subpage *subpage;
>>> +		struct extent_buffer *eb;
>>> +		unsigned long flags;
>>> +		u16 tmp = 1 << bit_start;
>>> +		u64 start;
>>> +
>>> +		/*
>>> +		 * Make sure the page still has private, as previous run can
>>> +		 * detach the private
>>> +		 */
>>
>> But if previous run has run it would have disposed of this eb and you
>> won't find this page at all, no ?
> 
> By "previous run" I mean a previous iteration of the same loop.
> 
> E.g. the page has 4 bits set, just one eb (16K nodesize).

Isn't it guaranteed that, when you iterate the ebs in a page, meeting an
empty bit means the whole extent buffer is gone? Hence instead of doing
bit_start++ you ought to increment by the nodesize as well.

For example, assume a page contains 4 EBs:

0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15
x|x|x|x|0|0|0|0|x|x| x|x |0 | 0|0 |0

So the first bit is set, and you proceed to call release_extent_buffer on
it, which clears the first 4 bits in tree_block_bitmap; since you've
incremented by nodesize, the next iteration begins at index 4. You detect
it's unset (0), hence you increment by 1, and you repeat this for the
next 3 bits; then you free the whole of the next eb. I argue that you
also need to increment by nodesize in the case of a bit which is not
set, because you cannot really see a partially freed eb, i.e. you cannot
see the following state:

0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15
x|x|x|x|x|0|0|0|x|x| x|x |0 | 0|0 |0

Am I missing something?




> 
> In the first run, it releases the only eb of the page and clears page
> private.
> In the second run, since private is cleared, we need to break out.
> 
>>
>>> +		spin_lock(&page->mapping->private_lock);
>>> +		if (!PagePrivate(page)) {
>>> +			spin_unlock(&page->mapping->private_lock);
>>> +			break;
>>> +		}

Aren't we guaranteed that a page has private if this function is called?

>>> +		subpage = (struct btrfs_subpage *)page->private;
>>> +		spin_unlock(&page->mapping->private_lock);
>>> +
>>> +		spin_lock_irqsave(&subpage->lock, flags);
>>> +		if (!(tmp & subpage->tree_block_bitmap))  {
>>> +			spin_unlock_irqrestore(&subpage->lock, flags);
>>> +			bit_start++;
>>> +			continue;
>>> +		}
>>> +		spin_unlock_irqrestore(&subpage->lock, flags);
>>> +
>>> +		start = bit_start * fs_info->sectorsize + page_start;
>>> +		bit_start += fs_info->nodesize >> fs_info->sectorsize_bits;

<snip>

> Thanks,
> Qu
>>
>>
>>> +		/*
>>> +		 * Here we can't call find_extent_buffer() which will increase
>>> +		 * eb->refs.
>>> +		 */
>>> +		rcu_read_lock();
>>> +		eb = radix_tree_lookup(&fs_info->buffer_radix,
>>> +				start >> fs_info->sectorsize_bits);
>>> +		rcu_read_unlock();

Your usage of radix_tree_lookup + rcu lock is wrong. RCU guarantees that
an EB you get won't be freed while the RCU section is active; however,
you take a reference to the EB without incrementing the ref count WHILE
holding the RCU critical section. Consult find_extent_buffer for the
correct usage pattern.

Frankly, the locking in this function is insane. First mapping->private_lock
is acquired to check if PagePrivate is set, and then page->private is
dereferenced, but that access is not protected at all. Then subpage->lock
is taken to check the tree_block_bitmap, then the lock is dropped. At that
point no locks are held, so this page could possibly be referenced by
someone else? Then the buggy lookup is used to get the eb, then you
lock refs_lock and call release_extent_buffer...

>>> +		ASSERT(eb);

Doing this outside of the RCU read-side critical section _without_
incrementing the ref count is buggy!

>>> +		spin_lock(&eb->refs_lock);
>>> +		if (atomic_read(&eb->refs) != 1 || extent_buffer_under_io(eb) ||
>>> +		    !test_and_clear_bit(EXTENT_BUFFER_TREE_REF, &eb->bflags)) {
>>> +			spin_unlock(&eb->refs_lock);
>>> +			continue;
>>> +		}
>>> +		/*


<snip>


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 12/18] btrfs: extent_io: implement try_release_extent_buffer() for subpage metadata support
  2020-12-11 16:57       ` Nikolay Borisov
@ 2020-12-12  1:28         ` Qu Wenruo
  2020-12-12  9:26           ` Nikolay Borisov
  2020-12-12  5:44         ` Qu Wenruo
  1 sibling, 1 reply; 71+ messages in thread
From: Qu Wenruo @ 2020-12-12  1:28 UTC (permalink / raw)
  To: Nikolay Borisov, Qu Wenruo, linux-btrfs



On 2020/12/12 上午12:57, Nikolay Borisov wrote:
>
>
> On 11.12.20 г. 14:11 ч., Qu Wenruo wrote:
>>
>>
>> On 2020/12/11 下午8:00, Nikolay Borisov wrote:
>>>
>>>
>>> On 10.12.20 г. 8:38 ч., Qu Wenruo wrote:
>>>> Unlike the original try_release_extent_buffer,
>>>> try_release_subpage_extent_buffer() will iterate through
>>>> btrfs_subpage::tree_block_bitmap, and try to release each extent buffer.
>>>>
>>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>>> <snip>
>>>> +		/*
>>>> +		 * Make sure the page still has private, as previous run can
>>>> +		 * detach the private
>>>> +		 */
>>>
>>> But if previous run has run it would have disposed of this eb and you
>>> won't find this page at all, no ?
>>
>> For the "previous run" I mean, previous iteration in the same loop.
>>
>> E.g. the page has 4 bits set, just one eb (16K nodesize).
>
> Isn't it guaranteed that if you iterate the eb's in a page if you meet
> an empty block then the whole extent buffer is gone, hence instead of
> doing bit_start++ you ought to also increment by the size of nodesize.
>
> For example, assume a page contains 4 EBs:
>
> 0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15
> x|x|x|x|0|0|0|0|x|x| x|x |0 | 0|0 |0
>
> So first bit is set, so you proceed to call release_extent_buffer on it,
> which clears the first 4 bits in tree_block_bitmap, in this case you've
> incremented by nodesize so next iteration begins at index 4. You detect
> it's unset (0) hence you increment it byte 1 and you repeat this for the
> next 3 bits, then you free the whole of the next eb. I argue that you
> also need to increment by nodesize in the case of a bit which is not
> set, because you cannot really see partially freed eb i.e you cannot see
> the following state:
>
> 0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15
> x|x|x|x|x|0|0|0|x|x| x|x |0 | 0|0 |0
>
> Am I missing something?

It's not for a partly freed eb, but for a nodesize-unaligned eb.

E.g. if we have an eb that starts at sector 1 of a page, your
nodesize-based iteration would go crazy.
We have ensured that no subpage eb can cross the page boundary, but there
is no such requirement for nodesize alignment.

Thus I use the extra-safe way for the empty bit.

Thanks,
Qu
> <snip>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 12/18] btrfs: extent_io: implement try_release_extent_buffer() for subpage metadata support
  2020-12-11 16:57       ` Nikolay Borisov
  2020-12-12  1:28         ` Qu Wenruo
@ 2020-12-12  5:44         ` Qu Wenruo
  2020-12-12 10:30           ` Nikolay Borisov
  1 sibling, 1 reply; 71+ messages in thread
From: Qu Wenruo @ 2020-12-12  5:44 UTC (permalink / raw)
  To: Nikolay Borisov, Qu Wenruo, linux-btrfs



On 2020/12/12 上午12:57, Nikolay Borisov wrote:
>
>
> On 11.12.20 г. 14:11 ч., Qu Wenruo wrote:
>>
>>
>> On 2020/12/11 下午8:00, Nikolay Borisov wrote:
>>>
>>>
>>> On 10.12.20 г. 8:38 ч., Qu Wenruo wrote:
>>>> Unlike the original try_release_extent_buffer,
>>>> try_release_subpage_extent_buffer() will iterate through
>>>> btrfs_subpage::tree_block_bitmap, and try to release each extent buffer.
>>>>
>>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>>> <snip>
>>>> +		/*
>>>> +		 * Here we can't call find_extent_buffer() which will increase
>>>> +		 * eb->refs.
>>>> +		 */
>>>> +		rcu_read_lock();
>>>> +		eb = radix_tree_lookup(&fs_info->buffer_radix,
>>>> +				start >> fs_info->sectorsize_bits);
>>>> +		rcu_read_unlock();
>
> Your usage of radix_tree_lookup + rcu lock is wrong. rcu guarantees that
> an EB you get won't be freed while the rcu section is active, however
> you get a reference to the EB and you do not increment the ref count
> WHILE holding the RCU critical section, consult find_extent_buffer
> what's the correct usage pattern.

Nope, you just fell into the same trap I fell into before.

Here, if the eb has no other referencer, its refs is just 1 (because it's
still in the tree).

If you go and increase the refs, the eb becomes referenced again and
release_extent_buffer() won't free it at all.

The result is that no eb would ever be freed.

>
> Frankly the locking in this function is insane, first mapping->private
> lock is acquired to check if Page_private is set and then page->private
> is referenced but that is not signalled at all.

Because we just want the page::private pointer.

We won't touch page::private until we're really going to detach/attach.
And detach/attach will also modify subpage::tree_block_bitmap, which is
protected by subpage::lock.

So just grabbing the subpage pointer here is completely fine.

> Then subpage->lock is
> taken to check the tree_block_bitmap, then the lock is dropped. At that
> point no locks are held so this page could possibly be referenced by
> someone else?

Does it matter? We have the info we need (the eb bytenr), and that's all.

Other metadata operations may touch the page, but that won't cause
anything wrong.

> Then the buggy locking is used to get the eb, then you
> lock refs_lock and call release_extent_buffer...

Nope, the eb access is not buggy.
Increasing the refs is what would be buggy.

>
>>>> +		ASSERT(eb);
>
> Doing this outside of the rcu read side critical section _without_
> incrementing the ref count is buggy!

Try increasing the refs when we're going to clean up an eb; that's what is really buggy.

Thanks,
Qu

>
>>>> +		spin_lock(&eb->refs_lock);
>>>> +		if (atomic_read(&eb->refs) != 1 || extent_buffer_under_io(eb) ||
>>>> +		    !test_and_clear_bit(EXTENT_BUFFER_TREE_REF, &eb->bflags)) {
>>>> +			spin_unlock(&eb->refs_lock);
>>>> +			continue;
>>>> +		}
>>>> +		/*
>
>
> <snip>
>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 12/18] btrfs: extent_io: implement try_release_extent_buffer() for subpage metadata support
  2020-12-12  1:28         ` Qu Wenruo
@ 2020-12-12  9:26           ` Nikolay Borisov
  2020-12-12 10:26             ` Qu Wenruo
  0 siblings, 1 reply; 71+ messages in thread
From: Nikolay Borisov @ 2020-12-12  9:26 UTC (permalink / raw)
  To: Qu Wenruo, Qu Wenruo, linux-btrfs



On 12.12.20 г. 3:28 ч., Qu Wenruo wrote:
> 
> 
> On 2020/12/12 上午12:57, Nikolay Borisov wrote:
>>
>>
>> On 11.12.20 г. 14:11 ч., Qu Wenruo wrote:
>>>
>>>
>>> On 2020/12/11 下午8:00, Nikolay Borisov wrote:
>>>>
>>>>
>>>> On 10.12.20 г. 8:38 ч., Qu Wenruo wrote:
>>>>> Unlike the original try_release_extent_buffer,
>>>>> try_release_subpage_extent_buffer() will iterate through
>>>>> btrfs_subpage::tree_block_bitmap, and try to release each extent
>>>>> buffer.
>>>>>
>>>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>>>> <snip>
>>
>> Isn't it guaranteed that if you iterate the eb's in a page if you meet
>> an empty block then the whole extent buffer is gone, hence instead of
>> doing bit_start++ you ought to also increment by the size of nodesize.
>>
>> For example, assume a page contains 4 EBs:
>>
>> 0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15
>> x|x|x|x|0|0|0|0|x|x| x|x |0 | 0|0 |0
>>
>> So first bit is set, so you proceed to call release_extent_buffer on it,
>> which clears the first 4 bits in tree_block_bitmap, in this case you've
>> incremented by nodesize so next iteration begins at index 4. You detect
>> it's unset (0) hence you increment it byte 1 and you repeat this for the
>> next 3 bits, then you free the whole of the next eb. I argue that you
>> also need to increment by nodesize in the case of a bit which is not
>> set, because you cannot really see partially freed eb i.e you cannot see
>> the following state:
>>
>> 0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15
>> x|x|x|x|x|0|0|0|x|x| x|x |0 | 0|0 |0
>>
>> Am I missing something?
> 
> It's not for partly freed eb, but nodesize unaligned eb.
> 
> E.g. if we have a eb starts at sector 1 of a page, your nodesize based
> iteration would go crazy.
> Although we have ensured no subpage eb can cross page boundary, but it's
> not the same requirement for nodesize alignment.
> 
> Thus I uses the extra safe way for the empty bit.

Which of course cannot happen, because the allocator ensures that
returned addresses are always aligned to fs_info::stripesize, which in
turn is always equal to sectorsize... So you add extra complexity for no
apparent reason, making code which is already subtle even more subtle.

<snip>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 12/18] btrfs: extent_io: implement try_release_extent_buffer() for subpage metadata support
  2020-12-12  9:26           ` Nikolay Borisov
@ 2020-12-12 10:26             ` Qu Wenruo
  0 siblings, 0 replies; 71+ messages in thread
From: Qu Wenruo @ 2020-12-12 10:26 UTC (permalink / raw)
  To: Nikolay Borisov, Qu Wenruo, linux-btrfs



On 2020/12/12 下午5:26, Nikolay Borisov wrote:
>
>
> On 12.12.20 г. 3:28 ч., Qu Wenruo wrote:
>>
>>
>> On 2020/12/12 上午12:57, Nikolay Borisov wrote:
>>>
>>>
>>> On 11.12.20 г. 14:11 ч., Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2020/12/11 下午8:00, Nikolay Borisov wrote:
>>>>>
>>>>>
>>>>> On 10.12.20 г. 8:38 ч., Qu Wenruo wrote:
>>>>>> Unlike the original try_release_extent_buffer,
>>>>>> try_release_subpage_extent_buffer() will iterate through
>>>>>> btrfs_subpage::tree_block_bitmap, and try to release each extent
>>>>>> buffer.
>>>>>>
>>>>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>>>>> <snip>
>>>
>>> Isn't it guaranteed that if you iterate the eb's in a page if you meet
>>> an empty block then the whole extent buffer is gone, hence instead of
>>> doing bit_start++ you ought to also increment by the size of nodesize.
>>>
>>> For example, assume a page contains 4 EBs:
>>>
>>> 0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15
>>> x|x|x|x|0|0|0|0|x|x| x|x |0 | 0|0 |0
>>>
>>> So first bit is set, so you proceed to call release_extent_buffer on it,
>>> which clears the first 4 bits in tree_block_bitmap, in this case you've
>>> incremented by nodesize so next iteration begins at index 4. You detect
>>> it's unset (0) hence you increment it byte 1 and you repeat this for the
>>> next 3 bits, then you free the whole of the next eb. I argue that you
>>> also need to increment by nodesize in the case of a bit which is not
>>> set, because you cannot really see partially freed eb i.e you cannot see
>>> the following state:
>>>
>>> 0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15
>>> x|x|x|x|x|0|0|0|x|x| x|x |0 | 0|0 |0
>>>
>>> Am I missing something?
>>
>> It's not for partly freed eb, but nodesize unaligned eb.
>>
>> E.g. if we have a eb starts at sector 1 of a page, your nodesize based
>> iteration would go crazy.
>> Although we have ensured no subpage eb can cross page boundary, but it's
>> not the same requirement for nodesize alignment.
>>
>> Thus I uses the extra safe way for the empty bit.
>
> Which of course cannot happen because the allocator ensures that
> returned addresses are always aligned to fs_info::stripeize which in
> turn is always equal to sectorsize...

Nope again.
Think again: sectorsize is only 4K, while nodesize is 16K.

So it's valid (though not really good) to have an eb bytenr which is
aligned to 4K but not aligned to 16K.


> So you add extra complexity for no
> apparent reason making code which is already subtle to be even more subtle.
>
> <snip>
>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 12/18] btrfs: extent_io: implement try_release_extent_buffer() for subpage metadata support
  2020-12-12  5:44         ` Qu Wenruo
@ 2020-12-12 10:30           ` Nikolay Borisov
  2020-12-12 10:31             ` Qu Wenruo
  0 siblings, 1 reply; 71+ messages in thread
From: Nikolay Borisov @ 2020-12-12 10:30 UTC (permalink / raw)
  To: Qu Wenruo, Qu Wenruo, linux-btrfs



On 12.12.20 г. 7:44 ч., Qu Wenruo wrote:
> 
> 
> On 2020/12/12 上午12:57, Nikolay Borisov wrote:
>>
>>
>> On 11.12.20 г. 14:11 ч., Qu Wenruo wrote:
>>>
>>>
>>> On 2020/12/11 下午8:00, Nikolay Borisov wrote:
>>>>
>>>>

<snip>

>>>>
>>>>> +        /*
>>>>> +         * Here we can't call find_extent_buffer() which will
>>>>> increase
>>>>> +         * eb->refs.
>>>>> +         */
>>>>> +        rcu_read_lock();
>>>>> +        eb = radix_tree_lookup(&fs_info->buffer_radix,
>>>>> +                start >> fs_info->sectorsize_bits);
>>>>> +        rcu_read_unlock();
>>
>> Your usage of radix_tree_lookup + rcu lock is wrong. rcu guarantees that
>> an EB you get won't be freed while the rcu section is active, however
>> you get a reference to the EB and you do not increment the ref count
>> WHILE holding the RCU critical section, consult find_extent_buffer
>> what's the correct usage pattern.
> 
> Nope, you just fall into the trap what I fell before.
> 
> Here if the eb has no other referencer, its refs is just 1 (because it's
> still in the tree).
> 
> If you go increase the refs, the eb becomes referenced again and
> release_extent_buffer() won't free it at all.
> 
> Causing no eb to be freed whatever.

After the rcu_read_unlock you hold a pointer to the eb without having
incremented the eb's refs and without holding the eb's refs_lock. At this
point nothing prevents the eb from disappearing from underneath you. The
correct way would be to increment the eb's refs and check whether refs > 2
(1 for the buffer radix tree, 1 for you); then you acquire the refs_lock,
drop your extra ref to leave it at 1, and call release_extent_buffer.



^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 12/18] btrfs: extent_io: implement try_release_extent_buffer() for subpage metadata support
  2020-12-12 10:30           ` Nikolay Borisov
@ 2020-12-12 10:31             ` Qu Wenruo
  0 siblings, 0 replies; 71+ messages in thread
From: Qu Wenruo @ 2020-12-12 10:31 UTC (permalink / raw)
  To: Nikolay Borisov, Qu Wenruo, linux-btrfs



On 2020/12/12 下午6:30, Nikolay Borisov wrote:
> 
> 
> On 12.12.20 г. 7:44 ч., Qu Wenruo wrote:
>>
>>
>> On 2020/12/12 上午12:57, Nikolay Borisov wrote:
>>>
>>>
>>> On 11.12.20 г. 14:11 ч., Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2020/12/11 下午8:00, Nikolay Borisov wrote:
>>>>>
>>>>>
> 
> <snip>
> 
>>>>>
>>>>>> +        /*
>>>>>> +         * Here we can't call find_extent_buffer() which will
>>>>>> increase
>>>>>> +         * eb->refs.
>>>>>> +         */
>>>>>> +        rcu_read_lock();
>>>>>> +        eb = radix_tree_lookup(&fs_info->buffer_radix,
>>>>>> +                start >> fs_info->sectorsize_bits);
>>>>>> +        rcu_read_unlock();
>>>
>>> Your usage of radix_tree_lookup + rcu lock is wrong. rcu guarantees that
>>> an EB you get won't be freed while the rcu section is active, however
>>> you get a reference to the EB and you do not increment the ref count
>>> WHILE holding the RCU critical section, consult find_extent_buffer
>>> what's the correct usage pattern.
>>
>> Nope, you just fall into the trap what I fell before.
>>
>> Here if the eb has no other referencer, its refs is just 1 (because it's
>> still in the tree).
>>
>> If you go increase the refs, the eb becomes referenced again and
>> release_extent_buffer() won't free it at all.
>>
>> Causing no eb to be freed whatever.
> 
> After the rcu_read_unlock you hold a reference to eb, without having
> incremented the eb's refs, without having locked eb's refs_lock.

Haven't you checked the original try_release_extent_buffer()?

>   At this
> point nothing prevents the eb from disappearing from underneath you. The
> correct way would be to increment the eb's ref and check if ref is > 2
> (1 for the buffer radix tree, 1 for you), then you acquire the refs_lock
> and drop your current ref leaving it to 1 and call release_extent_buffer.
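The refcounting dance described here can be sketched in plain C as a single-threaded userspace illustration (invented names, NOT the kernel code; the real code additionally takes refs_lock so the check and the drop happen atomically):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for an extent_buffer whose refs field is 1 while only the
 * buffer radix tree references it. */
struct fake_eb {
	atomic_int refs;
};

/* Mirrors the atomic_inc_not_zero() step: take a temporary ref only
 * if the object has not already dropped to zero (being freed). */
static bool eb_try_get(struct fake_eb *eb)
{
	int cur = atomic_load(&eb->refs);

	while (cur > 0) {
		if (atomic_compare_exchange_weak(&eb->refs, &cur, cur + 1))
			return true;
	}
	return false;
}

/* The release check outlined above: with our temporary ref held,
 * refs == 2 means "radix tree + us", i.e. no other user, so it is safe
 * to drop back to 1 and let the real release_extent_buffer() do its
 * work.  Returns true when release is possible. */
static bool eb_try_release(struct fake_eb *eb)
{
	if (!eb_try_get(eb))
		return false;
	if (atomic_load(&eb->refs) > 2) {
		/* Somebody else still uses the eb: back off. */
		atomic_fetch_sub(&eb->refs, 1);
		return false;
	}
	atomic_fetch_sub(&eb->refs, 1);	/* back to 1: tree-only */
	return true;
}
```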
> 
> 


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 14/18] btrfs: extent_io: make endio_readpage_update_page_status() to handle subpage case
  2020-12-10  6:39 ` [PATCH v2 14/18] btrfs: extent_io: make endio_readpage_update_page_status() to handle subpage case Qu Wenruo
@ 2020-12-14  9:57   ` Nikolay Borisov
  2020-12-14 10:46     ` Qu Wenruo
  0 siblings, 1 reply; 71+ messages in thread
From: Nikolay Borisov @ 2020-12-14  9:57 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 10.12.20 at 8:39, Qu Wenruo wrote:
> To handle subpage status update, add the following new tricks:
> - Use btrfs_page_*() helpers to update page status
>   Now we can handle both cases well.
> 
> - No page unlock for subpage metadata
>   Since subpage metadata doesn't utilize page locking at all, skip it.
>   For subpage data locking, it's handled in later commits.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent_io.c | 23 +++++++++++++++++------
>  1 file changed, 17 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 1ec9de2aa910..64a19c1884fc 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2841,15 +2841,26 @@ static void endio_readpage_release_extent(struct processed_extent *processed,
>  	processed->uptodate = uptodate;
>  }
>  
> -static void endio_readpage_update_page_status(struct page *page, bool uptodate)
> +static void endio_readpage_update_page_status(struct page *page, bool uptodate,
> +					      u64 start, u64 end)
>  {
> +	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
> +	u32 len;
> +
> +	ASSERT(page_offset(page) <= start &&
> +		end <= page_offset(page) + PAGE_SIZE - 1);

'start' in this case is
'start = page_offset(page) + bvec->bv_offset;' from
end_bio_extent_readpage, so it can't possibly be less than page_offset;
instead it will at least be equal to page_offset when bvec->bv_offset is
0. However, can we really guarantee this?


You are using 'end' only for the assert, and given that you already
have the 'len' parameter calculated in the caller, I'd rather have this
function take start/len. That would save you from recalculating the len,
and for someone looking at the code it would be apparent that it's the
length of the currently processed bvec. I looked through the rest of the
series and you never use 'end', just 'len'.

<snip>


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 15/18] btrfs: disk-io: introduce subpage metadata validation check
  2020-12-10  6:39 ` [PATCH v2 15/18] btrfs: disk-io: introduce subpage metadata validation check Qu Wenruo
  2020-12-10 13:24     ` kernel test robot
  2020-12-10 13:39     ` kernel test robot
@ 2020-12-14 10:21   ` Nikolay Borisov
  2020-12-14 10:50     ` Qu Wenruo
  2 siblings, 1 reply; 71+ messages in thread
From: Nikolay Borisov @ 2020-12-14 10:21 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 10.12.20 at 8:39, Qu Wenruo wrote:
> For subpage metadata validation check, there are some differences:
> - Read must finish in one bvec
>   Since we're just reading one subpage range in one page, it should
>   never be split into two bios or two bvecs.
> 
> - How to grab the existing eb
>   Instead of grabbing the eb via page->private, we have to search the
>   radix tree, as we don't have any direct pointer at hand.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/disk-io.c | 82 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 82 insertions(+)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index b6c03a8b0c72..adda76895058 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -591,6 +591,84 @@ static int validate_extent_buffer(struct extent_buffer *eb)
>  	return ret;
>  }
>  
> +static int validate_subpage_buffer(struct page *page, u64 start, u64 end,
> +				   int mirror)
> +{
> +	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
> +	struct extent_buffer *eb;
> +	int reads_done;
> +	int ret = 0;
> +
> +	if (!IS_ALIGNED(start, fs_info->sectorsize) ||

That's guaranteed by the allocator.

> +	    !IS_ALIGNED(end - start + 1, fs_info->sectorsize) ||
That's guaranteed by the fact that nodesize is a multiple of sectorsize.

> +	    !IS_ALIGNED(end - start + 1, fs_info->nodesize)) {

And that's also guaranteed, since the size of an eb is always nodesize.
Also aren't those checks already performed by the tree-checker during
write? Just remove this as it adds noise.

> +		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
> +		btrfs_err(fs_info, "invalid tree read bytenr");
> +		return -EUCLEAN;
> +	}
> +
> +	/*
> +	 * We don't allow bio merge for subpage metadata read, so we should
> +	 * only get one eb for each endio hook.
> +	 */
> +	ASSERT(end == start + fs_info->nodesize - 1);
> +	ASSERT(PagePrivate(page));
> +
> +	rcu_read_lock();
> +	eb = radix_tree_lookup(&fs_info->buffer_radix,
> +			       start / fs_info->sectorsize);

This division op likely produces the kernel test robot's warning. It
could be written to use >> fs_info->sectorsize_bits. Furthermore, this
usage of radix tree + RCU without acquiring the refs is unsafe, as per
my explanation of an essentially identical issue in patch 12 and our
offline chat about it.
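The suggested fix can be illustrated with a tiny C example (hypothetical constant; with a 4K sector size, sectorsize_bits is 12, and the shift avoids the 64-bit division that would pull in a libgcc helper on 32-bit builds, the likely source of the build robot warning):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for fs_info->sectorsize_bits with a 4K sector. */
#define SECTORSIZE_BITS 12u

static uint64_t radix_index(uint64_t bytenr)
{
	/* Same result as bytenr / 4096, but as a cheap shift that is
	 * safe for u64 operands on 32-bit builds. */
	return bytenr >> SECTORSIZE_BITS;
}
```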

> +	rcu_read_unlock();
> +
> +	/*
> +	 * When we are reading one tree block, eb must have been
> +	 * inserted into the radix tree. If not something is wrong.
> +	 */
> +	if (!eb) {
> +		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
> +		btrfs_err(fs_info,
> +			"can't find extent buffer for bytenr %llu",
> +			start);
> +		return -EUCLEAN;
> +	}

That's impossible to execute and such a failure will result in a crash
so just remove this code.

> +	/*
> +	 * The pending IO might have been the only thing that kept
> +	 * this buffer in memory.  Make sure we have a ref for all
> +	 * this other checks
> +	 */
> +	atomic_inc(&eb->refs);
> +
> +	reads_done = atomic_dec_and_test(&eb->io_pages);
> +	/* Subpage read must finish in page read */
> +	ASSERT(reads_done);

Just ASSERT(atomic_dec_and_test(&eb->io_pages)). Again, for subpage I
think that's a bit much since it only has 1 page so it's guaranteed that
it will always be true.
> +
> +	eb->read_mirror = mirror;
> +	if (test_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags)) {
> +		ret = -EIO;
> +		goto err;
> +	}
> +	ret = validate_extent_buffer(eb);
> +	if (ret < 0)
> +		goto err;
> +
> +	if (test_and_clear_bit(EXTENT_BUFFER_READAHEAD, &eb->bflags))
> +		btree_readahead_hook(eb, ret);
> +
> +	set_extent_buffer_uptodate(eb);
> +
> +	free_extent_buffer(eb);
> +	return ret;
> +err:
> +	/*
> +	 * our io error hook is going to dec the io pages
> +	 * again, we have to make sure it has something to
> +	 * decrement
> +	 */

That comment is slightly ambiguous - it's not the io error hook that
does the decrement but end_bio_extent_readpage. Just rewrite the comment
to:

"end_bio_extent_readpage decrements io_pages in case of error, make sure
it has ...."

> +	atomic_inc(&eb->io_pages);
> +	clear_extent_buffer_uptodate(eb);
> +	free_extent_buffer(eb);
> +	return ret;
> +}
> +
>  int btrfs_validate_metadata_buffer(struct btrfs_io_bio *io_bio,
>  				   struct page *page, u64 start, u64 end,
>  				   int mirror)
> @@ -600,6 +678,10 @@ int btrfs_validate_metadata_buffer(struct btrfs_io_bio *io_bio,
>  	int reads_done;
>  
>  	ASSERT(page->private);
> +
> +	if (btrfs_sb(page->mapping->host->i_sb)->sectorsize < PAGE_SIZE)
> +		return validate_subpage_buffer(page, start, end, mirror);

nit: validate_metadata_buffer is called in only one place, so I'm
wondering whether it would be more readable if this check were lifted to
its sole caller, so that when reading end_bio_extent_readpage it's
apparent what's going on. Though it's apparent that the nesting in the
caller would get somewhat unwieldy, so I won't press hard for this.
> +
>  	eb = (struct extent_buffer *)page->private;
>  
>  
> 

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 14/18] btrfs: extent_io: make endio_readpage_update_page_status() to handle subpage case
  2020-12-14  9:57   ` Nikolay Borisov
@ 2020-12-14 10:46     ` Qu Wenruo
  0 siblings, 0 replies; 71+ messages in thread
From: Qu Wenruo @ 2020-12-14 10:46 UTC (permalink / raw)
  To: Nikolay Borisov, Qu Wenruo, linux-btrfs



On 2020/12/14 5:57 PM, Nikolay Borisov wrote:
>
>
> On 10.12.20 at 8:39, Qu Wenruo wrote:
>> To handle subpage status update, add the following new tricks:
>> - Use btrfs_page_*() helpers to update page status
>>    Now we can handle both cases well.
>>
>> - No page unlock for subpage metadata
>>    Since subpage metadata doesn't utilize page locking at all, skip it.
>>    For subpage data locking, it's handled in later commits.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   fs/btrfs/extent_io.c | 23 +++++++++++++++++------
>>   1 file changed, 17 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index 1ec9de2aa910..64a19c1884fc 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -2841,15 +2841,26 @@ static void endio_readpage_release_extent(struct processed_extent *processed,
>>   	processed->uptodate = uptodate;
>>   }
>>
>> -static void endio_readpage_update_page_status(struct page *page, bool uptodate)
>> +static void endio_readpage_update_page_status(struct page *page, bool uptodate,
>> +					      u64 start, u64 end)
>>   {
>> +	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
>> +	u32 len;
>> +
>> +	ASSERT(page_offset(page) <= start &&
>> +		end <= page_offset(page) + PAGE_SIZE - 1);
>
> 'start' in this case is
> 'start = page_offset(page) + bvec->bv_offset;' from
> end_bio_extent_readpage, so it can't possibly be less than page_offset;
> instead it will at least be equal to page_offset when bvec->bv_offset is
> 0. However, can we really guarantee this?

I believe we can.

But as you may have already found, I'm sometimes over-cautious, thus I
tend to use ASSERT() as a way to indicate the prerequisites.

>
>
> You are using 'end' only for the assert, and given that you already
> have the 'len' parameter calculated in the caller, I'd rather have this
> function take start/len. That would save you from recalculating the len,
> and for someone looking at the code it would be apparent that it's the
> length of the currently processed bvec. I looked through the rest of the
> series and you never use 'end', just 'len'.

Right, using len would be better; I'll stick to the start/len scheme for
new functions in the series.

Thanks,
Qu

>
> <snip>
>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 15/18] btrfs: disk-io: introduce subpage metadata validation check
  2020-12-14 10:21   ` Nikolay Borisov
@ 2020-12-14 10:50     ` Qu Wenruo
  2020-12-14 11:17       ` Nikolay Borisov
  0 siblings, 1 reply; 71+ messages in thread
From: Qu Wenruo @ 2020-12-14 10:50 UTC (permalink / raw)
  To: Nikolay Borisov, Qu Wenruo, linux-btrfs



On 2020/12/14 6:21 PM, Nikolay Borisov wrote:
>
>
>> On 10.12.20 at 8:39, Qu Wenruo wrote:
>> For subpage metadata validation check, there are some differences:
>> - Read must finish in one bvec
>>    Since we're just reading one subpage range in one page, it should
>>    never be split into two bios or two bvecs.
>>
>> - How to grab the existing eb
>>    Instead of grabbing eb using page->private, we have to go search radix
>>    tree as we don't have any direct pointer at hand.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   fs/btrfs/disk-io.c | 82 ++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 82 insertions(+)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index b6c03a8b0c72..adda76895058 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -591,6 +591,84 @@ static int validate_extent_buffer(struct extent_buffer *eb)
>>   	return ret;
>>   }
>>
>> +static int validate_subpage_buffer(struct page *page, u64 start, u64 end,
>> +				   int mirror)
>> +{
>> +	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
>> +	struct extent_buffer *eb;
>> +	int reads_done;
>> +	int ret = 0;
>> +
>> +	if (!IS_ALIGNED(start, fs_info->sectorsize) ||
>
> That's guaranteed by the allocator.
>
>> +	    !IS_ALIGNED(end - start + 1, fs_info->sectorsize) ||
> That's guaranteed by the fact that  nodesize is a multiple of sectorsize.
>
>> +	    !IS_ALIGNED(end - start + 1, fs_info->nodesize)) {
>
> And that's also guaranteed that the size of an eb is always a nodesize.
> Also aren't those checks already performed by the tree-checker during
> write? Just remove this as it adds noise.
>
>> +		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
>> +		btrfs_err(fs_info, "invalid tree read bytenr");
>> +		return -EUCLEAN;
>> +	}
>> +
>> +	/*
>> +	 * We don't allow bio merge for subpage metadata read, so we should
>> +	 * only get one eb for each endio hook.
>> +	 */
>> +	ASSERT(end == start + fs_info->nodesize - 1);
>> +	ASSERT(PagePrivate(page));
>> +
>> +	rcu_read_lock();
>> +	eb = radix_tree_lookup(&fs_info->buffer_radix,
>> +			       start / fs_info->sectorsize);
>
> This division op likely produces the kernel test robot's warning. It
> could be written to use >> fs_info->sectorsize_bits. Furthermore, this
> usage of radix tree + RCU without acquiring the refs is unsafe, as per
> my explanation of an essentially identical issue in patch 12 and our
> offline chat about it.

Another relic I forgot in the long update history, nice find.

>
>> +	rcu_read_unlock();
>> +
>> +	/*
>> +	 * When we are reading one tree block, eb must have been
>> +	 * inserted into the radix tree. If not something is wrong.
>> +	 */
>> +	if (!eb) {
>> +		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
>> +		btrfs_err(fs_info,
>> +			"can't find extent buffer for bytenr %llu",
>> +			start);
>> +		return -EUCLEAN;
>> +	}
>
> That's impossible to execute and such a failure will result in a crash
> so just remove this code.
>
>> +	/*
>> +	 * The pending IO might have been the only thing that kept
>> +	 * this buffer in memory.  Make sure we have a ref for all
>> +	 * this other checks
>> +	 */
>> +	atomic_inc(&eb->refs);
>> +
>> +	reads_done = atomic_dec_and_test(&eb->io_pages);
>> +	/* Subpage read must finish in page read */
>> +	ASSERT(reads_done);
>
> Just ASSERT(atomic_dec_and_test(&eb->io_pages)). Again, for subpage I
> think that's a bit much since it only has 1 page so it's guaranteed that
> it will always be true.

IIRC ASSERT() won't evaluate whatever is in it for non-debug builds.
Thus ASSERT(atomic_*) would cause a non-debug kernel not to decrement
io_pages and hang the system.

Exactly the pitfall I'm thinking of.

Thanks,
Qu

>> +
>> +	eb->read_mirror = mirror;
>> +	if (test_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags)) {
>> +		ret = -EIO;
>> +		goto err;
>> +	}
>> +	ret = validate_extent_buffer(eb);
>> +	if (ret < 0)
>> +		goto err;
>> +
>> +	if (test_and_clear_bit(EXTENT_BUFFER_READAHEAD, &eb->bflags))
>> +		btree_readahead_hook(eb, ret);
>> +
>> +	set_extent_buffer_uptodate(eb);
>> +
>> +	free_extent_buffer(eb);
>> +	return ret;
>> +err:
>> +	/*
>> +	 * our io error hook is going to dec the io pages
>> +	 * again, we have to make sure it has something to
>> +	 * decrement
>> +	 */
>
> That comment is slightly ambiguous - it's not the io error hook that
> does the decrement but end_bio_extent_readpage. Just rewrite the comment
> to :
>
> "end_bio_extent_readpage decrements io_pages in case of error, make sure
> it has ...."
>
>> +	atomic_inc(&eb->io_pages);
>> +	clear_extent_buffer_uptodate(eb);
>> +	free_extent_buffer(eb);
>> +	return ret;
>> +}
>> +
>>   int btrfs_validate_metadata_buffer(struct btrfs_io_bio *io_bio,
>>   				   struct page *page, u64 start, u64 end,
>>   				   int mirror)
>> @@ -600,6 +678,10 @@ int btrfs_validate_metadata_buffer(struct btrfs_io_bio *io_bio,
>>   	int reads_done;
>>
>>   	ASSERT(page->private);
>> +
>> +	if (btrfs_sb(page->mapping->host->i_sb)->sectorsize < PAGE_SIZE)
>> +		return validate_subpage_buffer(page, start, end, mirror);
>
> nit: validate_metadata_buffer is called in only once place so I'm
> wondering won't it make it more readable if this check is lifted to its
> sole caller so that when reading end_bio_extent_readpage it's apparent
> what's going on. Though it's apparent that the nesting in the caller
> will get somewhat unwieldy so won't be pressing hard for this.
>> +
>>   	eb = (struct extent_buffer *)page->private;
>>
>>
>>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 15/18] btrfs: disk-io: introduce subpage metadata validation check
  2020-12-14 10:50     ` Qu Wenruo
@ 2020-12-14 11:17       ` Nikolay Borisov
  2020-12-14 11:32         ` Qu Wenruo
  0 siblings, 1 reply; 71+ messages in thread
From: Nikolay Borisov @ 2020-12-14 11:17 UTC (permalink / raw)
  To: Qu Wenruo, Qu Wenruo, linux-btrfs



On 14.12.20 at 12:50, Qu Wenruo wrote:
> 
> IIRC ASSERT() won't evaluate whatever is in it for non-debug builds.
> Thus ASSERT(atomic_*) would cause a non-debug kernel not to decrement
> io_pages and hang the system.

Nope: 

#ifdef CONFIG_BTRFS_ASSERT
__cold __noreturn
static inline void assertfail(const char *expr, const char *file, int line)
{
	pr_err("assertion failed: %s, in %s:%d\n", expr, file, line);
	BUG();
}

#define ASSERT(expr)						\
	(likely(expr) ? (void)0 : assertfail(#expr, __FILE__, __LINE__))

#else
static inline void assertfail(const char *expr, const char* file, int line) { }
#define ASSERT(expr)	(void)(expr)	/* <-- expression is evaluated */
#endif
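The difference being pointed out can be shown with a standalone userspace demo (simplified macros modeled on the snippet above; `side_effect()` is an invented stand-in for something like atomic_dec_and_test()):

```c
static int calls;

/* Invented stand-in for an expression with a side effect, such as
 * atomic_dec_and_test(&eb->io_pages). */
static int side_effect(void)
{
	return ++calls;
}

/* Non-debug btrfs flavour: the expression IS evaluated, only the
 * result is discarded. */
#define BTRFS_ASSERT(expr)	(void)(expr)

/* assert(3)-with-NDEBUG flavour: generates no code at all, so the
 * side effect silently disappears. */
#define NDEBUG_ASSERT(expr)	((void)0)
```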

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 15/18] btrfs: disk-io: introduce subpage metadata validation check
  2020-12-14 11:17       ` Nikolay Borisov
@ 2020-12-14 11:32         ` Qu Wenruo
  2020-12-14 12:40           ` Nikolay Borisov
  0 siblings, 1 reply; 71+ messages in thread
From: Qu Wenruo @ 2020-12-14 11:32 UTC (permalink / raw)
  To: Nikolay Borisov, Qu Wenruo, linux-btrfs



On 2020/12/14 7:17 PM, Nikolay Borisov wrote:
>
>
> On 14.12.20 at 12:50, Qu Wenruo wrote:
>>
>> IIRC ASSERT() won't evaluate whatever is in it for non-debug builds.
>> Thus ASSERT(atomic_*) would cause a non-debug kernel not to decrement
>> io_pages and hang the system.
>
> Nope:
>
>    #ifdef CONFIG_BTRFS_ASSERT
>    __cold __noreturn
>    static inline void assertfail(const char *expr, const char *file, int line)
>    {
>            pr_err("assertion failed: %s, in %s:%d\n", expr, file, line);
>            BUG();
>    }
>
>    #define ASSERT(expr)                                            \
>            (likely(expr) ? (void)0 : assertfail(#expr, __FILE__, __LINE__))
>
>    #else
>    static inline void assertfail(const char *expr, const char* file, int line) { }
>    #define ASSERT(expr)    (void)(expr)               <-- expression is evaluated
>    #endif
>
Wow, that's too tricky, and maybe that's the reason why Josef is
complaining that the ASSERT()s slow down the system.

In fact, from the assert(3) man page, we're doing things differently
than user space at least:

   If the macro NDEBUG is defined at the moment <assert.h> was last
   included, the macro assert() generates no code, and hence does nothing
   at all.

So I'm confused, what's the proper way to do ASSERT()?

Thanks,
Qu

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 15/18] btrfs: disk-io: introduce subpage metadata validation check
  2020-12-14 11:32         ` Qu Wenruo
@ 2020-12-14 12:40           ` Nikolay Borisov
  0 siblings, 0 replies; 71+ messages in thread
From: Nikolay Borisov @ 2020-12-14 12:40 UTC (permalink / raw)
  To: Qu Wenruo, Qu Wenruo, linux-btrfs; +Cc: David Sterba



On 14.12.20 at 13:32, Qu Wenruo wrote:
> 
> 
>> On 2020/12/14 7:17 PM, Nikolay Borisov wrote:
>>
>>
>> On 14.12.20 at 12:50, Qu Wenruo wrote:
>>>
>>> IIRC ASSERT() won't evaluate whatever is in it for non-debug builds.
>>> Thus ASSERT(atomic_*) would cause a non-debug kernel not to decrement
>>> io_pages and hang the system.
>>
>> Nope:
>>
>>    #ifdef CONFIG_BTRFS_ASSERT
>>    __cold __noreturn
>>    static inline void assertfail(const char *expr, const char *file, int line)
>>    {
>>            pr_err("assertion failed: %s, in %s:%d\n", expr, file, line);
>>            BUG();
>>    }
>>
>>    #define ASSERT(expr)                                            \
>>            (likely(expr) ? (void)0 : assertfail(#expr, __FILE__, __LINE__))
>>
>>    #else
>>    static inline void assertfail(const char *expr, const char* file, int line) { }
>>    #define ASSERT(expr)    (void)(expr)               <-- expression is evaluated
>>    #endif
>>
> Wow, that's too tricky and maybe that's the reason why Josef is
> complaining about the ASSERT()s slows down the system.
> 
> In fact, from the assert(3) man page, we're doing things differently
> than user space at least:
> 
>   If the macro NDEBUG is defined at the moment <assert.h> was last
>   included, the macro assert() generates no code, and hence does nothing
>   at all.
> 
> So I'm confused, what's the proper way to do ASSERT()?

Well, as it stands now, what I suggested would work. OTOH this really
raises the question of why we leave the code around (well, the compiler
should really eliminate those redundant checks). Hm, David?



> 
> Thanks,
> Qu
> 

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 16/18] btrfs: introduce btrfs_subpage for data inodes
  2020-12-10  6:39 ` [PATCH v2 16/18] btrfs: introduce btrfs_subpage for data inodes Qu Wenruo
  2020-12-10  9:44     ` kernel test robot
  2020-12-11  0:43     ` kernel test robot
@ 2020-12-14 12:46   ` Nikolay Borisov
  2 siblings, 0 replies; 71+ messages in thread
From: Nikolay Borisov @ 2020-12-14 12:46 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 10.12.20 at 8:39, Qu Wenruo wrote:
> To support subpage sector size, data also needs extra info to make sure
> which sectors in a page are uptodate/dirty/...
> 
> This patch makes pages for data inodes get a btrfs_subpage
> structure attached, which is detached when the page is freed.
> 
> This patch also slightly changes the timing of when
> set_page_extent_mapped() is called, to make sure:
> - We have page->mapping set
>   page->mapping->host is used to grab btrfs_fs_info, thus we can only
>   call this function after page is mapped to an inode.
> 
>   One call site attaches pages to inode manually, thus we have to modify
>   the timing of set_page_extent_mapped() a little.
> 
> - As soon as possible, before other operations
>   Since memory allocation can fail, we have to do extra error handling.
>   Calling set_page_extent_mapped() as soon as possible can simplify the
>   error handling for several call sites.
> 
> The idea is pretty much the same as iomap_page, but with more bitmaps
> for btrfs specific cases.
> 
> Currently the plan is to switch to iomap if iomap can provide
> sector-aligned writeback (only writing back dirty sectors, not the
> full page; data balance requires this feature).
> 
> So we will stick to btrfs specific bitmap for now.
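The calling convention this commit message describes (attach first, tolerate repeat calls, propagate allocation failure) can be sketched like this, as a userspace illustration with invented types rather than the btrfs code:

```c
#include <errno.h>
#include <stdlib.h>

/* Invented stand-in for struct page; ->private mimics page->private. */
struct fake_page {
	void *private;
};

/* Sketch of the reworked set_page_extent_mapped(): idempotent, and
 * returns 0 or a negative errno because the subpage metadata
 * allocation can fail.  Callers therefore invoke it first and handle
 * the error before doing anything else with the page. */
static int set_page_extent_mapped_sketch(struct fake_page *page)
{
	if (page->private)
		return 0;			/* already attached */
	page->private = calloc(1, 16);		/* btrfs_subpage alloc */
	return page->private ? 0 : -ENOMEM;
}
```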
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/compression.c      | 10 ++++++--
>  fs/btrfs/extent_io.c        | 47 +++++++++++++++++++++++++++++++++----
>  fs/btrfs/extent_io.h        |  3 ++-
>  fs/btrfs/file.c             | 10 +++++---
>  fs/btrfs/free-space-cache.c | 15 +++++++++---
>  fs/btrfs/inode.c            | 12 ++++++----
>  fs/btrfs/ioctl.c            |  5 +++-
>  fs/btrfs/reflink.c          |  5 +++-
>  fs/btrfs/relocation.c       | 12 ++++++++--
>  9 files changed, 98 insertions(+), 21 deletions(-)
> 
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index 5ae3fa0386b7..6d203acfdeb3 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -542,13 +542,19 @@ static noinline int add_ra_bio_pages(struct inode *inode,
>  			goto next;
>  		}
>  
> -		end = last_offset + PAGE_SIZE - 1;
>  		/*
>  		 * at this point, we have a locked page in the page cache
>  		 * for these bytes in the file.  But, we have to make
>  		 * sure they map to this compressed extent on disk.
>  		 */
> -		set_page_extent_mapped(page);
> +		ret = set_page_extent_mapped(page);
> +		if (ret < 0) {
> +			unlock_page(page);
> +			put_page(page);
> +			break;
> +		}
> +
> +		end = last_offset + PAGE_SIZE - 1;
>  		lock_extent(tree, last_offset, end);
>  		read_lock(&em_tree->lock);
>  		em = lookup_extent_mapping(em_tree, last_offset,
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 64a19c1884fc..4e4ed9c453ae 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -3191,10 +3191,40 @@ static int attach_extent_buffer_page(struct extent_buffer *eb,
>  	return 0;
>  }
>  
> -void set_page_extent_mapped(struct page *page)
> +int __must_check set_page_extent_mapped(struct page *page)
>  {
> -	if (!PagePrivate(page))
> +	struct btrfs_fs_info *fs_info;
> +
> +	ASSERT(page->mapping);
> +
> +	if (PagePrivate(page))
> +		return 0;
> +
> +	fs_info = btrfs_sb(page->mapping->host->i_sb);
> +	if (fs_info->sectorsize == PAGE_SIZE) {
>  		attach_page_private(page, (void *)EXTENT_PAGE_PRIVATE);
> +		return 0;
> +	}
> +
> +	return btrfs_attach_subpage(fs_info, page);

In all previous patches, < PAGE_SIZE is the special case; in this
function it's reversed. For the sake of consistency, change that so
btrfs_attach_subpage is executed inside the conditional.

> +}
> +
> +void clear_page_extent_mapped(struct page *page)
> +{
> +	struct btrfs_fs_info *fs_info;
> +
> +	ASSERT(page->mapping);
> +
> +	if (!PagePrivate(page))
> +		return;
> +
> +	fs_info = btrfs_sb(page->mapping->host->i_sb);
> +	if (fs_info->sectorsize == PAGE_SIZE) {
> +		detach_page_private(page);
> +		return;
> +	}
> +
> +	btrfs_detach_subpage(fs_info, page);

DITTO

>  }
>  
>  static struct extent_map *

<snip>

> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index a29b50208eee..9b878616b489 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1373,6 +1373,12 @@ static noinline int prepare_pages(struct inode *inode, struct page **pages,
>  			goto fail;
>  		}
>  
> +		err = set_page_extent_mapped(pages[i]);
> +		if (err < 0) {
> +			faili = i;
> +			goto fail;
> +		}
> +
>  		if (i == 0)
>  			err = prepare_uptodate_page(inode, pages[i], pos,
>  						    force_uptodate);
> @@ -1470,10 +1476,8 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
>  	 * We'll call btrfs_dirty_pages() later on, and that will flip around
>  	 * delalloc bits and dirty the pages as required.
>  	 */
> -	for (i = 0; i < num_pages; i++) {
> -		set_page_extent_mapped(pages[i]);
> +	for (i = 0; i < num_pages; i++)
>  		WARN_ON(!PageLocked(pages[i]));
> -	}
The comment above this needs to be removed/rewritten I guess?
Essentially, set_page_extent_mapped has moved to prepare_pages.

>  
>  	return ret;
>  }

<snip>

> @@ -8355,7 +8357,9 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
>  	wait_on_page_writeback(page);
>  
>  	lock_extent_bits(io_tree, page_start, page_end, &cached_state);
> -	set_page_extent_mapped(page);
> +	ret = set_page_extent_mapped(page);
> +	if (ret < 0)
> +		goto out_unlock;

You should use ret2, ret in this function is used for the retval of
vmf_error().

>  
>  	/*
>  	 * we can't set the delalloc bits if there are pending ordered

<snip>


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 17/18] btrfs: integrate page status update for read path into begin/end_page_read()
  2020-12-10  6:39 ` [PATCH v2 17/18] btrfs: integrate page status update for read path into begin/end_page_read() Qu Wenruo
@ 2020-12-14 13:59   ` Nikolay Borisov
  0 siblings, 0 replies; 71+ messages in thread
From: Nikolay Borisov @ 2020-12-14 13:59 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 10.12.20 at 8:39, Qu Wenruo wrote:
> In the btrfs data page read path, page status updates are handled in two
> different locations:
> 
>   btrfs_do_read_page()
>   {
> 	while (cur <= end) {
> 		/* No need to read from disk */
> 		if (HOLE/PREALLOC/INLINE){
> 			memset();
> 			set_extent_uptodate();
> 			continue;
> 		}
> 		/* Read from disk */
> 		ret = submit_extent_page(end_bio_extent_readpage);
>   }
> 
>   end_bio_extent_readpage()
>   {
> 	endio_readpage_uptodate_page_status();
>   }
> 
> This is fine for sectorsize == PAGE_SIZE case, as for above loop we
> should only hit one branch and then exit.
> 
> But for subpage, there is more work to be done in the page status update:
> - Page Unlock condition
>   Unlike regular page size == sectorsize case, we can no longer just
>   unlock a page.
>   Only the last reader of the page can unlock the page.
>   This means, we can unlock the page either in the while() loop, or in
>   the endio function.
> 
> - Page uptodate condition
>   Since we have multiple sectors to read for a page, we can only mark
>   the full page uptodate if all sectors are uptodate.
> 
> To handle both subpage and regular cases, introduce a pair of functions
> to help handling page status update:
> 
> - begin_page_read()
>   For regular case, it does nothing.
>   For subpage case, it updates the reader counters so that later
>   end_page_read() can know who is the last one to unlock the page.
> 
> - end_page_read()
>   This is just endio_readpage_uptodate_page_status() renamed.
>   The original name is a little too long and too specific for endio.
> 
>   The only new trick added is the condition for page unlock.
>   Now for subpage data, we unlock the page if we're the last reader.
> 
> This not only provides the basis for subpage data read, but also
> hides the special handling of page read from the main read loop.
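The unlock rule this describes (only the last reader unlocks the page) can be sketched as a userspace illustration with invented names, not the kernel implementation:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Invented stand-in for the btrfs_subpage reader accounting. */
struct fake_subpage {
	atomic_int readers;
	bool locked;
};

/* begin_page_read(): one reader per sector submitted for read. */
static void begin_read(struct fake_subpage *sp, int nr_sectors)
{
	atomic_fetch_add(&sp->readers, nr_sectors);
}

/* end_page_read(): the reader that brings the count back to zero is
 * the one responsible for unlocking. */
static void end_read(struct fake_subpage *sp, int nr_sectors)
{
	if (atomic_fetch_sub(&sp->readers, nr_sectors) == nr_sectors)
		sp->locked = false;	/* unlock_page() in the real code */
}
```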
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent_io.c | 39 +++++++++++++++++++++++++-----------
>  fs/btrfs/subpage.h   | 47 ++++++++++++++++++++++++++++++++++++++------
>  2 files changed, 68 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 4e4ed9c453ae..56174e7f0ae8 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2841,8 +2841,18 @@ static void endio_readpage_release_extent(struct processed_extent *processed,
>  	processed->uptodate = uptodate;
>  }
>  
> -static void endio_readpage_update_page_status(struct page *page, bool uptodate,
> -					      u64 start, u64 end)
> +static void begin_data_page_read(struct btrfs_fs_info *fs_info, struct page *page)
> +{
> +	ASSERT(PageLocked(page));
> +	if (fs_info->sectorsize == PAGE_SIZE)
> +		return;
> +
> +	ASSERT(PagePrivate(page) && page->private);
The 2nd part of the assert condition is redundant; page->private should
only be set via the respective generic helper, which is never called
with NULL as the 2nd argument.

> +	ASSERT(page->mapping->host != fs_info->btree_inode);
That function is only called by btrfs_do_readpage, which is used only for
data reads, so do we really need this? I understand you want to be
extra careful but I think this is going over the top.

> +	btrfs_subpage_start_reader(fs_info, page, page_offset(page), PAGE_SIZE);
> +}
> +
> +static void end_page_read(struct page *page, bool uptodate, u64 start, u64 end)
>  {
>  	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
>  	u32 len;
> @@ -2860,7 +2870,12 @@ static void endio_readpage_update_page_status(struct page *page, bool uptodate,
>  
>  	if (fs_info->sectorsize == PAGE_SIZE)
>  		unlock_page(page);
> -	/* Subpage locking will be handled in later patches */
> +	else if (page->mapping->host != fs_info->btree_inode)

Use is_data_inode() helper

> +		/*
> +		 * For subpage data, unlock the page if we're the last reader.
> +		 * For subpage metadata, page lock is not utilized for read.
> +		 */
> +		btrfs_subpage_end_reader(fs_info, page, start, len);
>  }
>  
>  /*

<snip>
> diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
> index 8592234d773e..6c801ef00d2d 100644
> --- a/fs/btrfs/subpage.h
> +++ b/fs/btrfs/subpage.h
> @@ -31,6 +31,9 @@ struct btrfs_subpage {
>  			u16 tree_block_bitmap;
>  		};
>  		/* structures only used by data */
> +		struct {
> +			atomic_t readers;
> +		};
>  	};
>  };
>  
> @@ -48,6 +51,17 @@ static inline void btrfs_subpage_clamp_range(struct page *page,
>  		     orig_start + orig_len) - *start;
>  }
>  
> +static inline void btrfs_subpage_assert(struct btrfs_fs_info *fs_info,
> +					struct page *page, u64 start, u32 len)
> +{
> +	/* Basic checks */
> +	ASSERT(PagePrivate(page) && page->private);
> +	ASSERT(IS_ALIGNED(start, fs_info->sectorsize) &&
> +	       IS_ALIGNED(len, fs_info->sectorsize));
> +	ASSERT(page_offset(page) <= start &&
> +	       start + len <= page_offset(page) + PAGE_SIZE);
> +}
> +
>  /*
>   * Convert the [start, start + len) range into a u16 bitmap
>   *
> @@ -59,12 +73,8 @@ static inline u16 btrfs_subpage_calc_bitmap(struct btrfs_fs_info *fs_info,
>  	int bit_start = (start - page_offset(page)) >> fs_info->sectorsize_bits;
>  	int nbits = len >> fs_info->sectorsize_bits;
>  
> -	/* Basic checks */
> -	ASSERT(PagePrivate(page) && page->private);
> -	ASSERT(IS_ALIGNED(start, fs_info->sectorsize) &&
> -	       IS_ALIGNED(len, fs_info->sectorsize));
> -	ASSERT(page_offset(page) <= start &&
> -	       start + len <= page_offset(page) + PAGE_SIZE);
> +	btrfs_subpage_assert(fs_info, page, start, len);
> +
>  	/*
>  	 * Here nbits can be 16, thus can go beyond u16 range. Here we make the
>  	 * first left shift to be calculated in unsigned long (u32), then
> @@ -73,6 +83,31 @@ static inline u16 btrfs_subpage_calc_bitmap(struct btrfs_fs_info *fs_info,
>  	return (u16)(((1UL << nbits) - 1) << bit_start);
>  }
>  
> +static inline void btrfs_subpage_start_reader(struct btrfs_fs_info *fs_info,
> +					      struct page *page, u64 start,
> +					      u32 len)
> +{
> +	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
> +	int nbits = len >> fs_info->sectorsize_bits;
> +
> +	btrfs_subpage_assert(fs_info, page, start, len);
> +
> +	ASSERT(atomic_read(&subpage->readers) == 0);
> +	atomic_set(&subpage->readers, nbits);

To make this more explicit implement it via atomic_add_unless and assert
on the return value.

> +}
> +
> +static inline void btrfs_subpage_end_reader(struct btrfs_fs_info *fs_info,
> +			struct page *page, u64 start, u32 len)
> +{
> +	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
> +	int nbits = len >> fs_info->sectorsize_bits;
> +
> +	btrfs_subpage_assert(fs_info, page, start, len);
> +	ASSERT(atomic_read(&subpage->readers) >= nbits);
> +	if (atomic_sub_and_test(nbits, &subpage->readers))
> +		unlock_page(page);
> +}
> +
>  static inline void btrfs_subpage_set_tree_block(struct btrfs_fs_info *fs_info,
>  			struct page *page, u64 start, u32 len)
>  {
> 

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 06/18] btrfs: extent_io: make attach_extent_buffer_page() to handle subpage case
  2020-12-10 15:30   ` Nikolay Borisov
@ 2020-12-17  6:48     ` Qu Wenruo
  0 siblings, 0 replies; 71+ messages in thread
From: Qu Wenruo @ 2020-12-17  6:48 UTC (permalink / raw)
  To: Nikolay Borisov, Qu Wenruo, linux-btrfs, David Sterba



On 2020/12/10 11:30 PM, Nikolay Borisov wrote:
>
>
> On 10.12.20 at 8:38, Qu Wenruo wrote:
>> For subpage case, we need to allocate new memory for each metadata page.
>>
>> So we need to:
>> - Allow attach_extent_buffer_page() to return int
>>    To indicate allocation failure
>>
>> - Prealloc page->private for alloc_extent_buffer()
>>    We don't want to do memory allocation with a spinlock held, so
>>    do the preallocation before we acquire the spinlock.
>>
>> - Handle subpage and regular case differently in
>>    attach_extent_buffer_page()
>>    For regular case, just do the usual thing.
>>    For subpage case, allocate new memory and update the tree_block
>>    bitmap.
>>
>>    The bitmap update will be handled by new subpage specific helper,
>>    btrfs_subpage_set_tree_block().
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   fs/btrfs/extent_io.c | 69 +++++++++++++++++++++++++++++++++++---------
>>   fs/btrfs/subpage.h   | 44 ++++++++++++++++++++++++++++
>>   2 files changed, 99 insertions(+), 14 deletions(-)
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index 6350c2687c7e..51dd7ec3c2b3 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -24,6 +24,7 @@
>>   #include "rcu-string.h"
>>   #include "backref.h"
>>   #include "disk-io.h"
>> +#include "subpage.h"
>>
>>   static struct kmem_cache *extent_state_cache;
>>   static struct kmem_cache *extent_buffer_cache;
>> @@ -3142,22 +3143,41 @@ static int submit_extent_page(unsigned int opf,
>>   	return ret;
>>   }
>>
>> -static void attach_extent_buffer_page(struct extent_buffer *eb,
>> +static int attach_extent_buffer_page(struct extent_buffer *eb,
>>   				      struct page *page)
>>   {
>> -	/*
>> -	 * If the page is mapped to btree inode, we should hold the private
>> -	 * lock to prevent race.
>> -	 * For cloned or dummy extent buffers, their pages are not mapped and
>> -	 * will not race with any other ebs.
>> -	 */
>> -	if (page->mapping)
>> -		lockdep_assert_held(&page->mapping->private_lock);
>> +	struct btrfs_fs_info *fs_info = eb->fs_info;
>> +	int ret;
>>
>> -	if (!PagePrivate(page))
>> -		attach_page_private(page, eb);
>> -	else
>> -		WARN_ON(page->private != (unsigned long)eb);
>> +	if (fs_info->sectorsize == PAGE_SIZE) {
>> +		/*
>> +		 * If the page is mapped to btree inode, we should hold the
>> +		 * private lock to prevent race.
>> +		 * For cloned or dummy extent buffers, their pages are not
>> +		 * mapped and will not race with any other ebs.
>> +		 */
>> +		if (page->mapping)
>> +			lockdep_assert_held(&page->mapping->private_lock);
>> +
>> +		if (!PagePrivate(page))
>> +			attach_page_private(page, eb);
>> +		else
>> +			WARN_ON(page->private != (unsigned long)eb);
>> +		return 0;
>> +	}
>> +
>> +	/* Already mapped, just update the existing range */
>> +	if (PagePrivate(page))
>> +		goto update_bitmap;
>
> How can this check ever be false, given that btrfs_attach_subpage is
> called unconditionally in alloc_extent_buffer so that you can avoid
> allocating memory with the private lock held, yet in this function you
> check whether memory hasn't been allocated and proceed to do it? Also,
> that memory allocation is done with GFP_NOFS under a spinlock; that's not
> atomic, i.e. IO can still be kicked, which means you can go to sleep
> while holding a spinlock. Not cool.

There are two callers of attach_extent_buffer_page(), one in
alloc_extent_buffer(), which we pre-allocate page::private before
calling attach_extent_buffer_page().

And the pre-allocation happens out of the spinlock.
Thus there is no memory allocation at all for that call site.

The other caller is in btrfs_clone_extent_buffer(), which needs proper
memory allocation.

>
>> +
>> +	/* Do new allocation to attach subpage */
>> +	ret = btrfs_attach_subpage(fs_info, page);
>> +	if (ret < 0)
>> +		return ret;
>> +
>> +update_bitmap:
>> +	btrfs_subpage_set_tree_block(fs_info, page, eb->start, eb->len);
>> +	return 0;
>
> Those are really 2 functions, demarcated by the if. Given that
> attach_extent_buffer is called in only 2 places, can't you opencode the
> if (fs_info->sectorize) check in the callers and define 2 functions:
>
> 1 for subpage blocksize and the other one for the old code?

I tried; it looks much worse than the current code, especially since we
need to add one more level of indentation in btrfs_clone_extent_buffer().

>
>>   }
>>
>
> <snip>
>
>> diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
>> index 96f3b226913e..c2ce603e7848 100644
>> --- a/fs/btrfs/subpage.h
>> +++ b/fs/btrfs/subpage.h
>> @@ -23,9 +23,53 @@
>>   struct btrfs_subpage {
>>   	/* Common members for both data and metadata pages */
>>   	spinlock_t lock;
>> +	union {
>> +		/* Structures only used by metadata */
>> +		struct {
>> +			u16 tree_block_bitmap;
>> +		};
>> +		/* structures only used by data */
>> +	};
>>   };
>>
>>   int btrfs_attach_subpage(struct btrfs_fs_info *fs_info, struct page *page);
>>   void btrfs_detach_subpage(struct btrfs_fs_info *fs_info, struct page *page);
>>
>> +/*
>> + * Convert the [start, start + len) range into a u16 bitmap
>> + *
>> + * E.g. if start == page_offset() + 16K, len = 16K, we get 0x00f0.
>> + */
>> +static inline u16 btrfs_subpage_calc_bitmap(struct btrfs_fs_info *fs_info,
>> +			struct page *page, u64 start, u32 len)
>> +{
>> +	int bit_start = (start - page_offset(page)) >> fs_info->sectorsize_bits;
>> +	int nbits = len >> fs_info->sectorsize_bits;
>> +
>> +	/* Basic checks */
>> +	ASSERT(PagePrivate(page) && page->private);
>> +	ASSERT(IS_ALIGNED(start, fs_info->sectorsize) &&
>> +	       IS_ALIGNED(len, fs_info->sectorsize));
>
> Separate the alignment checks so that if one fails it's evident which one failed.

I guess we tend to forget when ASSERT() should be used.
It's for something which shouldn't fail.

It's not used as a less-terrible BUG_ON(), but really to indicate what's
expected, thus I don't really expect it to be triggered, nor would it
matter if it's two lines or one line.

what's your idea on this David?

>
>> +	ASSERT(page_offset(page) <= start &&
>> +	       start + len <= page_offset(page) + PAGE_SIZE);
>
> ditto. Also instead of checking 'page_offset(page) <= start' you can
> simply check 'bit_start >= 0' as that's what you ultimately care about.

Despite the ASSERT() usage, the start + len and page_offset() checks are
much easier to grasp without needing to refer to bit_start.

Thanks,
Qu

>
>> +	/*
>> +	 * Here nbits can be 16, thus can go beyond u16 range. Here we make the
>> +	 * first left shift to be calculated in unsigned long (u32), then
>> +	 * truncate the result to u16.
>> +	 */
>> +	return (u16)(((1UL << nbits) - 1) << bit_start);
>> +}
>> +
>> +static inline void btrfs_subpage_set_tree_block(struct btrfs_fs_info *fs_info,
>> +			struct page *page, u64 start, u32 len)
>> +{
>> +	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
>> +	unsigned long flags;
>> +	u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len);
>> +
>> +	spin_lock_irqsave(&subpage->lock, flags);
>> +	subpage->tree_block_bitmap |= tmp;
>> +	spin_unlock_irqrestore(&subpage->lock, flags);
>> +}
>> +
>>   #endif /* BTRFS_SUBPAGE_H */
>>
>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 07/18] btrfs: extent_io: make grab_extent_buffer_from_page() to handle subpage case
  2020-12-10 15:39   ` Nikolay Borisov
@ 2020-12-17  6:55     ` Qu Wenruo
  0 siblings, 0 replies; 71+ messages in thread
From: Qu Wenruo @ 2020-12-17  6:55 UTC (permalink / raw)
  To: Nikolay Borisov, Qu Wenruo, linux-btrfs



On 2020/12/10 11:39 PM, Nikolay Borisov wrote:
>
>
> On 10.12.20 at 8:38, Qu Wenruo wrote:
>> For subpage case, grab_extent_buffer_from_page() can't really get an
>> extent buffer just from btrfs_subpage.
>>
>> Although we have btrfs_subpage::tree_block_bitmap, which can be used to
>> grab the bytenr of an existing extent buffer, and can then go radix tree
>> search to grab that existing eb.
>>
>> However we are still doing radix tree insert check in
>> alloc_extent_buffer(), thus we don't really need to do the extra hassle,
>> just let alloc_extent_buffer() handle the existing eb in the radix tree.
>>
>> So for grab_extent_buffer_from_page(), just always return NULL for
>> subpage case.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   fs/btrfs/extent_io.c | 13 +++++++++++--
>>   1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index 51dd7ec3c2b3..b99bd0402130 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -5278,10 +5278,19 @@ struct extent_buffer *alloc_test_extent_buffer(struct btrfs_fs_info *fs_info,
>>   }
>>   #endif
>>
>> -static struct extent_buffer *grab_extent_buffer_from_page(struct page *page)
>> +static struct extent_buffer *grab_extent_buffer_from_page(
>> +		struct btrfs_fs_info *fs_info, struct page *page)
>>   {
>>   	struct extent_buffer *exists;
>>
>> +	/*
>> +	 * For subpage case, we completely rely on radix tree to ensure we
>> +	 * don't try to insert two eb for the same bytenr.
>> +	 * So here we always return NULL and just continue.
>> +	 */
>> +	if (fs_info->sectorsize < PAGE_SIZE)
>> +		return NULL;
>> +
>
> Instead of hiding this in the function, just open-code it in the only caller. It would look like:
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index b99bd0402130..440dab207944 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -5370,8 +5370,9 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
>                  }
>
>                  spin_lock(&mapping->private_lock);
> -               exists = grab_extent_buffer_from_page(fs_info, p);
> -               if (exists) {
> +               if (fs_info->sectorsize == PAGE_SIZE &&
> +                   (exists = grab_extent_buffer_from_page(fs_info, p)))
> +               {
>                          spin_unlock(&mapping->private_lock);
>                          unlock_page(p);
>                          put_page(p);
>
>
> Admittedly that exist = ... in the if condition is a bit of an anti-pattern but given it's used in only 1 place
> and makes the flow of code more linear I'd say it's a win. But would like to hear David's opinion.

Personally speaking, the (exists = ...) inside the condition really looks
ugly and is hard to grasp.

And since grab_extent_buffer_from_page() is only called once, the
generated code shouldn't be that much different anyway as the compiler
would mostly just inline it.

So I still prefer the current code, not to mention it also provides
extra space for the comment.

Thanks,
Qu
>
>>   	/* Page not yet attached to an extent buffer */
>>   	if (!PagePrivate(page))
>>   		return NULL;
>> @@ -5361,7 +5370,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
>>   		}
>>
>>   		spin_lock(&mapping->private_lock);
>> -		exists = grab_extent_buffer_from_page(p);
>> +		exists = grab_extent_buffer_from_page(fs_info, p);
>>   		if (exists) {
>>   			spin_unlock(&mapping->private_lock);
>>   			unlock_page(p);
>>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 02/18] btrfs: extent_io: refactor __extent_writepage_io() to improve readability
  2020-12-10  6:38 ` [PATCH v2 02/18] btrfs: extent_io: refactor __extent_writepage_io() to improve readability Qu Wenruo
  2020-12-10 12:12   ` Nikolay Borisov
@ 2020-12-17 15:43   ` Josef Bacik
  1 sibling, 0 replies; 71+ messages in thread
From: Josef Bacik @ 2020-12-17 15:43 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 12/10/20 1:38 AM, Qu Wenruo wrote:
> The refactor involves the following modifications:
> - iosize alignment
>    In fact we don't really need to manually do alignment at all.
>    All extent maps should already be aligned, thus basic ASSERT() check
>    would be enough.
> 
> - redundant variables
>    We have extra variable like blocksize/pg_offset/end.
>    They are all unnecessary.
> 
>    @blocksize can be replaced by sectorsize directly, and it's only
>    used to verify the em start/size is aligned.
> 
>    @pg_offset can be easily calculated using @cur and page_offset(page).
> 
>    @end is just assigned to @page_end and never modified, use @page_end
>    to replace it.
> 
> - remove some BUG_ON()s
>    The BUG_ON()s are for extent map, which we have tree-checker to check
>    on-disk extent data item and runtime check.
>    ASSERT() should be enough.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>   fs/btrfs/extent_io.c | 37 +++++++++++++++++--------------------
>   1 file changed, 17 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 2650e8720394..612fe60b367e 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -3515,17 +3515,14 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
>   				 unsigned long nr_written,
>   				 int *nr_ret)
>   {
> +	struct btrfs_fs_info *fs_info = inode->root->fs_info;
>   	struct extent_io_tree *tree = &inode->io_tree;
>   	u64 start = page_offset(page);
>   	u64 page_end = start + PAGE_SIZE - 1;
> -	u64 end;
>   	u64 cur = start;
>   	u64 extent_offset;
>   	u64 block_start;
> -	u64 iosize;
>   	struct extent_map *em;
> -	size_t pg_offset = 0;
> -	size_t blocksize;
>   	int ret = 0;
>   	int nr = 0;
>   	const unsigned int write_flags = wbc_to_write_flags(wbc);
> @@ -3546,19 +3543,17 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
>   	 */
>   	update_nr_written(wbc, nr_written + 1);
>   
> -	end = page_end;
> -	blocksize = inode->vfs_inode.i_sb->s_blocksize;
> -
> -	while (cur <= end) {
> +	while (cur <= page_end) {
>   		u64 disk_bytenr;
>   		u64 em_end;
> +		u32 iosize;
>   
>   		if (cur >= i_size) {
>   			btrfs_writepage_endio_finish_ordered(page, cur,
>   							     page_end, 1);
>   			break;
>   		}
> -		em = btrfs_get_extent(inode, NULL, 0, cur, end - cur + 1);
> +		em = btrfs_get_extent(inode, NULL, 0, cur, page_end - cur + 1);
>   		if (IS_ERR_OR_NULL(em)) {
>   			SetPageError(page);
>   			ret = PTR_ERR_OR_ZERO(em);
> @@ -3567,16 +3562,20 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
>   
>   		extent_offset = cur - em->start;
>   		em_end = extent_map_end(em);
> -		BUG_ON(em_end <= cur);
> -		BUG_ON(end < cur);
> -		iosize = min(em_end - cur, end - cur + 1);
> -		iosize = ALIGN(iosize, blocksize);
> -		disk_bytenr = em->block_start + extent_offset;
> +		ASSERT(cur <= em_end);
> +		ASSERT(cur < page_end);
> +		ASSERT(IS_ALIGNED(em->start, fs_info->sectorsize));
> +		ASSERT(IS_ALIGNED(em->len, fs_info->sectorsize));
>   		block_start = em->block_start;
>   		compressed = test_bit(EXTENT_FLAG_COMPRESSED, &em->flags);
> +		disk_bytenr = em->block_start + extent_offset;
> +
> +		/* Note that em_end from extent_map_end() is exclusive */
> +		iosize = min(em_end, page_end + 1) - cur;
>   		free_extent_map(em);
>   		em = NULL;
>   
> +

Random extra whitespace.  Once you fix that you can add

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 01/18] btrfs: extent_io: rename @offset parameter to @disk_bytenr for submit_extent_page()
  2020-12-10  6:38 ` [PATCH v2 01/18] btrfs: extent_io: rename @offset parameter to @disk_bytenr for submit_extent_page() Qu Wenruo
@ 2020-12-17 15:44   ` Josef Bacik
  0 siblings, 0 replies; 71+ messages in thread
From: Josef Bacik @ 2020-12-17 15:44 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 12/10/20 1:38 AM, Qu Wenruo wrote:
> The parameter @offset can't be more confusing.
> In fact that parameter is the disk bytenr for metadata/data.
> 
> Rename it to @disk_bytenr and update the comment to reduce confusion.
> 
> Since we're here, also rename all @offset passed into
> submit_extent_page() to @disk_bytenr.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 04/18] btrfs: extent_io: introduce a helper to grab an existing extent buffer from a page
  2020-12-10  6:38 ` [PATCH v2 04/18] btrfs: extent_io: introduce a helper to grab an existing extent buffer from a page Qu Wenruo
  2020-12-10 13:51   ` Nikolay Borisov
@ 2020-12-17 15:50   ` Josef Bacik
  1 sibling, 0 replies; 71+ messages in thread
From: Josef Bacik @ 2020-12-17 15:50 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: Johannes Thumshirn

On 12/10/20 1:38 AM, Qu Wenruo wrote:
> This patch will extract the code to grab an extent buffer from a page
> into a helper, grab_extent_buffer_from_page().
> 
> This reduces one indent level, and provides a place for later
> expansion for subpage support.
> 
> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>   fs/btrfs/extent_io.c | 52 +++++++++++++++++++++++++++-----------------
>   1 file changed, 32 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 612fe60b367e..6350c2687c7e 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -5251,6 +5251,32 @@ struct extent_buffer *alloc_test_extent_buffer(struct btrfs_fs_info *fs_info,
>   }
>   #endif
>   
> +static struct extent_buffer *grab_extent_buffer_from_page(struct page *page)
> +{
> +	struct extent_buffer *exists;
> +
> +	/* Page not yet attached to an extent buffer */
> +	if (!PagePrivate(page))
> +		return NULL;
> +
> +	/*
> +	 * We could have already allocated an eb for this page
> +	 * and attached one so lets see if we can get a ref on
> +	 * the existing eb, and if we can we know it's good and
> +	 * we can just return that one, else we know we can just
> +	 * overwrite page->private.
> +	 */
> +	exists = (struct extent_buffer *)page->private;
> +	if (atomic_inc_not_zero(&exists->refs)) {
> +		mark_extent_buffer_accessed(exists, page);
> +		return exists;
> +	}
> +
> +	WARN_ON(PageDirty(page));
> +	detach_page_private(page);
> +	return NULL;
> +}
> +
>   struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
>   					  u64 start, u64 owner_root, int level)
>   {
> @@ -5296,26 +5322,12 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
>   		}
>   
>   		spin_lock(&mapping->private_lock);
> -		if (PagePrivate(p)) {
> -			/*
> -			 * We could have already allocated an eb for this page
> -			 * and attached one so lets see if we can get a ref on
> -			 * the existing eb, and if we can we know it's good and
> -			 * we can just return that one, else we know we can just
> -			 * overwrite page->private.
> -			 */
> -			exists = (struct extent_buffer *)p->private;
> -			if (atomic_inc_not_zero(&exists->refs)) {
> -				spin_unlock(&mapping->private_lock);
> -				unlock_page(p);
> -				put_page(p);
> -				mark_extent_buffer_accessed(exists, p);
> -				goto free_eb;
> -			}
> -			exists = NULL;
> -
> -			WARN_ON(PageDirty(p));
> -			detach_page_private(p);
> +		exists = grab_extent_buffer_from_page(p);
> +		if (exists) {
> +			spin_unlock(&mapping->private_lock);
> +			unlock_page(p);
> +			put_page(p);

Put the mark_extent_buffer_accessed() here.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 05/18] btrfs: extent_io: introduce the skeleton of btrfs_subpage structure
  2020-12-10  6:38 ` [PATCH v2 05/18] btrfs: extent_io: introduce the skeleton of btrfs_subpage structure Qu Wenruo
@ 2020-12-17 15:52   ` Josef Bacik
  0 siblings, 0 replies; 71+ messages in thread
From: Josef Bacik @ 2020-12-17 15:52 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 12/10/20 1:38 AM, Qu Wenruo wrote:
> For btrfs subpage support, we need a structure to record extra info for
> the status of each sectors of a page.
> 
> This patch will introduce the skeleton structure for future btrfs
> subpage support.
> All subpage related code would go to subpage.[ch] to avoid populating
> the existing code base.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 06/18] btrfs: extent_io: make attach_extent_buffer_page() to handle subpage case
  2020-12-10  6:38 ` [PATCH v2 06/18] btrfs: extent_io: make attach_extent_buffer_page() to handle subpage case Qu Wenruo
  2020-12-10 15:30   ` Nikolay Borisov
  2020-12-10 16:09   ` Nikolay Borisov
@ 2020-12-17 16:00   ` Josef Bacik
  2020-12-18  0:44     ` Qu Wenruo
  2 siblings, 1 reply; 71+ messages in thread
From: Josef Bacik @ 2020-12-17 16:00 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 12/10/20 1:38 AM, Qu Wenruo wrote:
> For subpage case, we need to allocate new memory for each metadata page.
> 
> So we need to:
> - Allow attach_extent_buffer_page() to return int
>    To indicate allocation failure
> 
> - Prealloc page->private for alloc_extent_buffer()
>    We don't want to do memory allocation with a spinlock held, so
>    do the preallocation before we acquire the spinlock.
> 
> - Handle subpage and regular case differently in
>    attach_extent_buffer_page()
>    For regular case, just do the usual thing.
>    For subpage case, allocate new memory and update the tree_block
>    bitmap.
> 
>    The bitmap update will be handled by new subpage specific helper,
>    btrfs_subpage_set_tree_block().
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>   fs/btrfs/extent_io.c | 69 +++++++++++++++++++++++++++++++++++---------
>   fs/btrfs/subpage.h   | 44 ++++++++++++++++++++++++++++
>   2 files changed, 99 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 6350c2687c7e..51dd7ec3c2b3 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -24,6 +24,7 @@
>   #include "rcu-string.h"
>   #include "backref.h"
>   #include "disk-io.h"
> +#include "subpage.h"
>   
>   static struct kmem_cache *extent_state_cache;
>   static struct kmem_cache *extent_buffer_cache;
> @@ -3142,22 +3143,41 @@ static int submit_extent_page(unsigned int opf,
>   	return ret;
>   }
>   
> -static void attach_extent_buffer_page(struct extent_buffer *eb,
> +static int attach_extent_buffer_page(struct extent_buffer *eb,
>   				      struct page *page)
>   {
> -	/*
> -	 * If the page is mapped to btree inode, we should hold the private
> -	 * lock to prevent race.
> -	 * For cloned or dummy extent buffers, their pages are not mapped and
> -	 * will not race with any other ebs.
> -	 */
> -	if (page->mapping)
> -		lockdep_assert_held(&page->mapping->private_lock);
> +	struct btrfs_fs_info *fs_info = eb->fs_info;
> +	int ret;
>   
> -	if (!PagePrivate(page))
> -		attach_page_private(page, eb);
> -	else
> -		WARN_ON(page->private != (unsigned long)eb);
> +	if (fs_info->sectorsize == PAGE_SIZE) {
> +		/*
> +		 * If the page is mapped to btree inode, we should hold the
> +		 * private lock to prevent race.
> +		 * For cloned or dummy extent buffers, their pages are not
> +		 * mapped and will not race with any other ebs.
> +		 */
> +		if (page->mapping)
> +			lockdep_assert_held(&page->mapping->private_lock);
> +
> +		if (!PagePrivate(page))
> +			attach_page_private(page, eb);
> +		else
> +			WARN_ON(page->private != (unsigned long)eb);
> +		return 0;
> +	}
> +
> +	/* Already mapped, just update the existing range */
> +	if (PagePrivate(page))
> +		goto update_bitmap;
> +
> +	/* Do new allocation to attach subpage */
> +	ret = btrfs_attach_subpage(fs_info, page);
> +	if (ret < 0)
> +		return ret;
> +
> +update_bitmap:
> +	btrfs_subpage_set_tree_block(fs_info, page, eb->start, eb->len);
> +	return 0;
>   }
>   
>   void set_page_extent_mapped(struct page *page)
> @@ -5067,12 +5087,19 @@ struct extent_buffer *btrfs_clone_extent_buffer(const struct extent_buffer *src)
>   		return NULL;
>   
>   	for (i = 0; i < num_pages; i++) {
> +		int ret;
> +
>   		p = alloc_page(GFP_NOFS);
>   		if (!p) {
>   			btrfs_release_extent_buffer(new);
>   			return NULL;
>   		}
> -		attach_extent_buffer_page(new, p);
> +		ret = attach_extent_buffer_page(new, p);
> +		if (ret < 0) {
> +			put_page(p);
> +			btrfs_release_extent_buffer(new);
> +			return NULL;
> +		}
>   		WARN_ON(PageDirty(p));
>   		SetPageUptodate(p);
>   		new->pages[i] = p;
> @@ -5321,6 +5348,18 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
>   			goto free_eb;
>   		}
>   
> +		/*
> +		 * Preallocate page->private for subpage case, so that
> +		 * we won't allocate memory with private_lock held.
> +		 */
> +		ret = btrfs_attach_subpage(fs_info, p);
> +		if (ret < 0) {
> +			unlock_page(p);
> +			put_page(p);
> +			exists = ERR_PTR(-ENOMEM);
> +			goto free_eb;
> +		}
> +

This is broken: if we race with another thread adding an extent buffer for this 
same range, we'll overwrite the page private with the new thing, losing any of 
the work that was done previously.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 07/18] btrfs: extent_io: make grab_extent_buffer_from_page() to handle subpage case
  2020-12-10  6:38 ` [PATCH v2 07/18] btrfs: extent_io: make grab_extent_buffer_from_page() " Qu Wenruo
  2020-12-10 15:39   ` Nikolay Borisov
@ 2020-12-17 16:02   ` Josef Bacik
  2020-12-18  0:49     ` Qu Wenruo
  1 sibling, 1 reply; 71+ messages in thread
From: Josef Bacik @ 2020-12-17 16:02 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 12/10/20 1:38 AM, Qu Wenruo wrote:
> For subpage case, grab_extent_buffer_from_page() can't really get an
> extent buffer just from btrfs_subpage.
> 
> Although we have btrfs_subpage::tree_block_bitmap, which can be used to
> grab the bytenr of an existing extent buffer, and can then do a radix tree
> search to grab that existing eb.
> 
> However we are still doing radix tree insert check in
> alloc_extent_buffer(), thus we don't really need to do the extra hassle,
> just let alloc_extent_buffer() handle the existing eb in the radix tree.
> 
> So for grab_extent_buffer_from_page(), just always return NULL for
> subpage case.

This is fundamentally flawed.  The extent buffer radix tree lookup is done 
_after_ the pages are initialized.  This is why there's that complicated dance of 
checking for existing extent buffers attached to a page, because we can race 
at the initialization stage and attach an EB to a page before it's in the radix 
tree.  What you'll end up doing here is overwriting your existing subpage stuff 
anytime there's a race, and it'll end very badly.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 06/18] btrfs: extent_io: make attach_extent_buffer_page() to handle subpage case
  2020-12-17 16:00   ` Josef Bacik
@ 2020-12-18  0:44     ` Qu Wenruo
  2020-12-18 15:41       ` Josef Bacik
  0 siblings, 1 reply; 71+ messages in thread
From: Qu Wenruo @ 2020-12-18  0:44 UTC (permalink / raw)
  To: Josef Bacik, Qu Wenruo, linux-btrfs



On 2020/12/18 12:00 AM, Josef Bacik wrote:
> On 12/10/20 1:38 AM, Qu Wenruo wrote:
>> For subpage case, we need to allocate new memory for each metadata page.
>>
>> So we need to:
>> - Allow attach_extent_buffer_page() to return int
>>    To indicate allocation failure
>>
>> - Prealloc page->private for alloc_extent_buffer()
>>    We don't want to do memory allocation with a spinlock held, so
>>    do the preallocation before we acquire the spinlock.
>>
>> - Handle subpage and regular case differently in
>>    attach_extent_buffer_page()
>>    For regular case, just do the usual thing.
>>    For subpage case, allocate new memory and update the tree_block
>>    bitmap.
>>
>>    The bitmap update will be handled by new subpage specific helper,
>>    btrfs_subpage_set_tree_block().
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   fs/btrfs/extent_io.c | 69 +++++++++++++++++++++++++++++++++++---------
>>   fs/btrfs/subpage.h   | 44 ++++++++++++++++++++++++++++
>>   2 files changed, 99 insertions(+), 14 deletions(-)
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index 6350c2687c7e..51dd7ec3c2b3 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -24,6 +24,7 @@
>>   #include "rcu-string.h"
>>   #include "backref.h"
>>   #include "disk-io.h"
>> +#include "subpage.h"
>>   static struct kmem_cache *extent_state_cache;
>>   static struct kmem_cache *extent_buffer_cache;
>> @@ -3142,22 +3143,41 @@ static int submit_extent_page(unsigned int opf,
>>       return ret;
>>   }
>> -static void attach_extent_buffer_page(struct extent_buffer *eb,
>> +static int attach_extent_buffer_page(struct extent_buffer *eb,
>>                         struct page *page)
>>   {
>> -    /*
>> -     * If the page is mapped to btree inode, we should hold the private
>> -     * lock to prevent race.
>> -     * For cloned or dummy extent buffers, their pages are not mapped 
>> and
>> -     * will not race with any other ebs.
>> -     */
>> -    if (page->mapping)
>> -        lockdep_assert_held(&page->mapping->private_lock);
>> +    struct btrfs_fs_info *fs_info = eb->fs_info;
>> +    int ret;
>> -    if (!PagePrivate(page))
>> -        attach_page_private(page, eb);
>> -    else
>> -        WARN_ON(page->private != (unsigned long)eb);
>> +    if (fs_info->sectorsize == PAGE_SIZE) {
>> +        /*
>> +         * If the page is mapped to btree inode, we should hold the
>> +         * private lock to prevent race.
>> +         * For cloned or dummy extent buffers, their pages are not
>> +         * mapped and will not race with any other ebs.
>> +         */
>> +        if (page->mapping)
>> +            lockdep_assert_held(&page->mapping->private_lock);
>> +
>> +        if (!PagePrivate(page))
>> +            attach_page_private(page, eb);
>> +        else
>> +            WARN_ON(page->private != (unsigned long)eb);
>> +        return 0;
>> +    }
>> +
>> +    /* Already mapped, just update the existing range */
>> +    if (PagePrivate(page))
>> +        goto update_bitmap;
>> +
>> +    /* Do new allocation to attach subpage */
>> +    ret = btrfs_attach_subpage(fs_info, page);
>> +    if (ret < 0)
>> +        return ret;
>> +
>> +update_bitmap:
>> +    btrfs_subpage_set_tree_block(fs_info, page, eb->start, eb->len);
>> +    return 0;
>>   }
>>   void set_page_extent_mapped(struct page *page)
>> @@ -5067,12 +5087,19 @@ struct extent_buffer 
>> *btrfs_clone_extent_buffer(const struct extent_buffer *src)
>>           return NULL;
>>       for (i = 0; i < num_pages; i++) {
>> +        int ret;
>> +
>>           p = alloc_page(GFP_NOFS);
>>           if (!p) {
>>               btrfs_release_extent_buffer(new);
>>               return NULL;
>>           }
>> -        attach_extent_buffer_page(new, p);
>> +        ret = attach_extent_buffer_page(new, p);
>> +        if (ret < 0) {
>> +            put_page(p);
>> +            btrfs_release_extent_buffer(new);
>> +            return NULL;
>> +        }
>>           WARN_ON(PageDirty(p));
>>           SetPageUptodate(p);
>>           new->pages[i] = p;
>> @@ -5321,6 +5348,18 @@ struct extent_buffer 
>> *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
>>               goto free_eb;
>>           }
>> +        /*
>> +         * Preallocate page->private for subpage case, so that
>> +         * we won't allocate memory with private_lock hold.
>> +         */
>> +        ret = btrfs_attach_subpage(fs_info, p);
>> +        if (ret < 0) {
>> +            unlock_page(p);
>> +            put_page(p);
>> +            exists = ERR_PTR(-ENOMEM);
>> +            goto free_eb;
>> +        }
>> +
> 
> This is broken, if we race with another thread adding an extent buffer 
> for this same range we'll overwrite the page private with the new thing, 
> losing any of the work that was done previously.  Thanks,

Firstly, the page is locked, so only one task at a time can grab the page.

Secondly, btrfs_attach_subpage() would just exit if it detects the page 
is already private.

So there shouldn't be a race.

Thanks,
Qu
> 
> Josef

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 07/18] btrfs: extent_io: make grab_extent_buffer_from_page() to handle subpage case
  2020-12-17 16:02   ` Josef Bacik
@ 2020-12-18  0:49     ` Qu Wenruo
  0 siblings, 0 replies; 71+ messages in thread
From: Qu Wenruo @ 2020-12-18  0:49 UTC (permalink / raw)
  To: Josef Bacik, Qu Wenruo, linux-btrfs



On 2020/12/18 12:02 AM, Josef Bacik wrote:
> On 12/10/20 1:38 AM, Qu Wenruo wrote:
>> For subpage case, grab_extent_buffer_from_page() can't really get an
>> extent buffer just from btrfs_subpage.
>>
>> Although we have btrfs_subpage::tree_block_bitmap, which can be used to
>> grab the bytenr of an existing extent buffer, and can then go radix tree
>> search to grab that existing eb.
>>
>> However we are still doing radix tree insert check in
>> alloc_extent_buffer(), thus we don't really need to do the extra hassle,
>> just let alloc_extent_buffer() to handle existing eb in radix tree.
>>
>> So for grab_extent_buffer_from_page(), just always return NULL for
>> subpage case.
>
> This is fundamentally flawed.  The extent buffer radix tree look up is
> done _after_ the pages are init'ed.  This is why there's that
> complicated dance of checking for existing extent buffers attached to to
> a page, because we can race at the initialization stage and attach an EB
> to a page before it's in the radix tree.  What you'll end up doing here
> is overwriting your existing subpage stuff anytime there's a race, and
> it'll end very badly.  Thanks,

We have the page lock preventing two ebs from grabbing the same page.

And btrfs_attach_subpage() won't overwrite the existing page::private,
thus it's safe.

Thanks,
Qu
>
> Josef

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 06/18] btrfs: extent_io: make attach_extent_buffer_page() to handle subpage case
  2020-12-18  0:44     ` Qu Wenruo
@ 2020-12-18 15:41       ` Josef Bacik
  2020-12-19  0:24         ` Qu Wenruo
  0 siblings, 1 reply; 71+ messages in thread
From: Josef Bacik @ 2020-12-18 15:41 UTC (permalink / raw)
  To: Qu Wenruo, Qu Wenruo, linux-btrfs

On 12/17/20 7:44 PM, Qu Wenruo wrote:
> 
> 
> On 2020/12/18 12:00 AM, Josef Bacik wrote:
>> On 12/10/20 1:38 AM, Qu Wenruo wrote:
>>> For subpage case, we need to allocate new memory for each metadata page.
>>>
>>> So we need to:
>>> - Allow attach_extent_buffer_page() to return int
>>>    To indicate allocation failure
>>>
>>> - Prealloc page->private for alloc_extent_buffer()
>>>    We don't want to call memory allocation with spinlock hold, so
>>>    do preallocation before we acquire the spin lock.
>>>
>>> - Handle subpage and regular case differently in
>>>    attach_extent_buffer_page()
>>>    For regular case, just do the usual thing.
>>>    For subpage case, allocate new memory and update the tree_block
>>>    bitmap.
>>>
>>>    The bitmap update will be handled by new subpage specific helper,
>>>    btrfs_subpage_set_tree_block().
>>>
>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>> ---
>>>   fs/btrfs/extent_io.c | 69 +++++++++++++++++++++++++++++++++++---------
>>>   fs/btrfs/subpage.h   | 44 ++++++++++++++++++++++++++++
>>>   2 files changed, 99 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>>> index 6350c2687c7e..51dd7ec3c2b3 100644
>>> --- a/fs/btrfs/extent_io.c
>>> +++ b/fs/btrfs/extent_io.c
>>> @@ -24,6 +24,7 @@
>>>   #include "rcu-string.h"
>>>   #include "backref.h"
>>>   #include "disk-io.h"
>>> +#include "subpage.h"
>>>   static struct kmem_cache *extent_state_cache;
>>>   static struct kmem_cache *extent_buffer_cache;
>>> @@ -3142,22 +3143,41 @@ static int submit_extent_page(unsigned int opf,
>>>       return ret;
>>>   }
>>> -static void attach_extent_buffer_page(struct extent_buffer *eb,
>>> +static int attach_extent_buffer_page(struct extent_buffer *eb,
>>>                         struct page *page)
>>>   {
>>> -    /*
>>> -     * If the page is mapped to btree inode, we should hold the private
>>> -     * lock to prevent race.
>>> -     * For cloned or dummy extent buffers, their pages are not mapped and
>>> -     * will not race with any other ebs.
>>> -     */
>>> -    if (page->mapping)
>>> -        lockdep_assert_held(&page->mapping->private_lock);
>>> +    struct btrfs_fs_info *fs_info = eb->fs_info;
>>> +    int ret;
>>> -    if (!PagePrivate(page))
>>> -        attach_page_private(page, eb);
>>> -    else
>>> -        WARN_ON(page->private != (unsigned long)eb);
>>> +    if (fs_info->sectorsize == PAGE_SIZE) {
>>> +        /*
>>> +         * If the page is mapped to btree inode, we should hold the
>>> +         * private lock to prevent race.
>>> +         * For cloned or dummy extent buffers, their pages are not
>>> +         * mapped and will not race with any other ebs.
>>> +         */
>>> +        if (page->mapping)
>>> +            lockdep_assert_held(&page->mapping->private_lock);
>>> +
>>> +        if (!PagePrivate(page))
>>> +            attach_page_private(page, eb);
>>> +        else
>>> +            WARN_ON(page->private != (unsigned long)eb);
>>> +        return 0;
>>> +    }
>>> +
>>> +    /* Already mapped, just update the existing range */
>>> +    if (PagePrivate(page))
>>> +        goto update_bitmap;
>>> +
>>> +    /* Do new allocation to attach subpage */
>>> +    ret = btrfs_attach_subpage(fs_info, page);
>>> +    if (ret < 0)
>>> +        return ret;
>>> +
>>> +update_bitmap:
>>> +    btrfs_subpage_set_tree_block(fs_info, page, eb->start, eb->len);
>>> +    return 0;
>>>   }
>>>   void set_page_extent_mapped(struct page *page)
>>> @@ -5067,12 +5087,19 @@ struct extent_buffer *btrfs_clone_extent_buffer(const 
>>> struct extent_buffer *src)
>>>           return NULL;
>>>       for (i = 0; i < num_pages; i++) {
>>> +        int ret;
>>> +
>>>           p = alloc_page(GFP_NOFS);
>>>           if (!p) {
>>>               btrfs_release_extent_buffer(new);
>>>               return NULL;
>>>           }
>>> -        attach_extent_buffer_page(new, p);
>>> +        ret = attach_extent_buffer_page(new, p);
>>> +        if (ret < 0) {
>>> +            put_page(p);
>>> +            btrfs_release_extent_buffer(new);
>>> +            return NULL;
>>> +        }
>>>           WARN_ON(PageDirty(p));
>>>           SetPageUptodate(p);
>>>           new->pages[i] = p;
>>> @@ -5321,6 +5348,18 @@ struct extent_buffer *alloc_extent_buffer(struct 
>>> btrfs_fs_info *fs_info,
>>>               goto free_eb;
>>>           }
>>> +        /*
>>> +         * Preallocate page->private for subpage case, so that
>>> +         * we won't allocate memory with private_lock hold.
>>> +         */
>>> +        ret = btrfs_attach_subpage(fs_info, p);
>>> +        if (ret < 0) {
>>> +            unlock_page(p);
>>> +            put_page(p);
>>> +            exists = ERR_PTR(-ENOMEM);
>>> +            goto free_eb;
>>> +        }
>>> +
>>
>> This is broken, if we race with another thread adding an extent buffer for 
>> this same range we'll overwrite the page private with the new thing, losing 
>> any of the work that was done previously.  Thanks,
> 
> Firstly the page is locked, so there should be only one to grab the page.
> 
> Secondly, btrfs_attach_subpage() would just exit if it detects the page is 
> already private.
> 
> So there shouldn't be a race.
> 
Task1						Task2
alloc_extent_buffer(4096)			alloc_extent_buffer(4096)
   find_extent_buffer, nothing			  find_extent_buffer, nothing
     find_or_create_page(1)
						    find_or_create_page(1)
						      waits on page lock
       btrfs_attach_subpage()
   radix_tree_insert()
   unlock pages
						      exit find_or_create_page()
						    btrfs_attach_subpage(), BAD

There's definitely a race; again, this is why the code does the check to see if 
there's a private already attached to the page.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 06/18] btrfs: extent_io: make attach_extent_buffer_page() to handle subpage case
  2020-12-18 15:41       ` Josef Bacik
@ 2020-12-19  0:24         ` Qu Wenruo
  2020-12-21 10:15           ` Qu Wenruo
  0 siblings, 1 reply; 71+ messages in thread
From: Qu Wenruo @ 2020-12-19  0:24 UTC (permalink / raw)
  To: Josef Bacik, Qu Wenruo, linux-btrfs



On 2020/12/18 11:41 PM, Josef Bacik wrote:
> On 12/17/20 7:44 PM, Qu Wenruo wrote:
>>
>>
>> On 2020/12/18 12:00 AM, Josef Bacik wrote:
>>> On 12/10/20 1:38 AM, Qu Wenruo wrote:
>>>> For subpage case, we need to allocate new memory for each metadata 
>>>> page.
>>>>
>>>> So we need to:
>>>> - Allow attach_extent_buffer_page() to return int
>>>>    To indicate allocation failure
>>>>
>>>> - Prealloc page->private for alloc_extent_buffer()
>>>>    We don't want to call memory allocation with spinlock hold, so
>>>>    do preallocation before we acquire the spin lock.
>>>>
>>>> - Handle subpage and regular case differently in
>>>>    attach_extent_buffer_page()
>>>>    For regular case, just do the usual thing.
>>>>    For subpage case, allocate new memory and update the tree_block
>>>>    bitmap.
>>>>
>>>>    The bitmap update will be handled by new subpage specific helper,
>>>>    btrfs_subpage_set_tree_block().
>>>>
>>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>>> ---
>>>>   fs/btrfs/extent_io.c | 69 
>>>> +++++++++++++++++++++++++++++++++++---------
>>>>   fs/btrfs/subpage.h   | 44 ++++++++++++++++++++++++++++
>>>>   2 files changed, 99 insertions(+), 14 deletions(-)
>>>>
>>>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>>>> index 6350c2687c7e..51dd7ec3c2b3 100644
>>>> --- a/fs/btrfs/extent_io.c
>>>> +++ b/fs/btrfs/extent_io.c
>>>> @@ -24,6 +24,7 @@
>>>>   #include "rcu-string.h"
>>>>   #include "backref.h"
>>>>   #include "disk-io.h"
>>>> +#include "subpage.h"
>>>>   static struct kmem_cache *extent_state_cache;
>>>>   static struct kmem_cache *extent_buffer_cache;
>>>> @@ -3142,22 +3143,41 @@ static int submit_extent_page(unsigned int opf,
>>>>       return ret;
>>>>   }
>>>> -static void attach_extent_buffer_page(struct extent_buffer *eb,
>>>> +static int attach_extent_buffer_page(struct extent_buffer *eb,
>>>>                         struct page *page)
>>>>   {
>>>> -    /*
>>>> -     * If the page is mapped to btree inode, we should hold the 
>>>> private
>>>> -     * lock to prevent race.
>>>> -     * For cloned or dummy extent buffers, their pages are not 
>>>> mapped and
>>>> -     * will not race with any other ebs.
>>>> -     */
>>>> -    if (page->mapping)
>>>> -        lockdep_assert_held(&page->mapping->private_lock);
>>>> +    struct btrfs_fs_info *fs_info = eb->fs_info;
>>>> +    int ret;
>>>> -    if (!PagePrivate(page))
>>>> -        attach_page_private(page, eb);
>>>> -    else
>>>> -        WARN_ON(page->private != (unsigned long)eb);
>>>> +    if (fs_info->sectorsize == PAGE_SIZE) {
>>>> +        /*
>>>> +         * If the page is mapped to btree inode, we should hold the
>>>> +         * private lock to prevent race.
>>>> +         * For cloned or dummy extent buffers, their pages are not
>>>> +         * mapped and will not race with any other ebs.
>>>> +         */
>>>> +        if (page->mapping)
>>>> +            lockdep_assert_held(&page->mapping->private_lock);
>>>> +
>>>> +        if (!PagePrivate(page))
>>>> +            attach_page_private(page, eb);
>>>> +        else
>>>> +            WARN_ON(page->private != (unsigned long)eb);
>>>> +        return 0;
>>>> +    }
>>>> +
>>>> +    /* Already mapped, just update the existing range */
>>>> +    if (PagePrivate(page))
>>>> +        goto update_bitmap;
>>>> +
>>>> +    /* Do new allocation to attach subpage */
>>>> +    ret = btrfs_attach_subpage(fs_info, page);
>>>> +    if (ret < 0)
>>>> +        return ret;
>>>> +
>>>> +update_bitmap:
>>>> +    btrfs_subpage_set_tree_block(fs_info, page, eb->start, eb->len);
>>>> +    return 0;
>>>>   }
>>>>   void set_page_extent_mapped(struct page *page)
>>>> @@ -5067,12 +5087,19 @@ struct extent_buffer 
>>>> *btrfs_clone_extent_buffer(const struct extent_buffer *src)
>>>>           return NULL;
>>>>       for (i = 0; i < num_pages; i++) {
>>>> +        int ret;
>>>> +
>>>>           p = alloc_page(GFP_NOFS);
>>>>           if (!p) {
>>>>               btrfs_release_extent_buffer(new);
>>>>               return NULL;
>>>>           }
>>>> -        attach_extent_buffer_page(new, p);
>>>> +        ret = attach_extent_buffer_page(new, p);
>>>> +        if (ret < 0) {
>>>> +            put_page(p);
>>>> +            btrfs_release_extent_buffer(new);
>>>> +            return NULL;
>>>> +        }
>>>>           WARN_ON(PageDirty(p));
>>>>           SetPageUptodate(p);
>>>>           new->pages[i] = p;
>>>> @@ -5321,6 +5348,18 @@ struct extent_buffer 
>>>> *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
>>>>               goto free_eb;
>>>>           }
>>>> +        /*
>>>> +         * Preallocate page->private for subpage case, so that
>>>> +         * we won't allocate memory with private_lock hold.
>>>> +         */
>>>> +        ret = btrfs_attach_subpage(fs_info, p);
>>>> +        if (ret < 0) {
>>>> +            unlock_page(p);
>>>> +            put_page(p);
>>>> +            exists = ERR_PTR(-ENOMEM);
>>>> +            goto free_eb;
>>>> +        }
>>>> +
>>>
>>> This is broken, if we race with another thread adding an extent 
>>> buffer for this same range we'll overwrite the page private with the 
>>> new thing, losing any of the work that was done previously.  Thanks,
>>
>> Firstly the page is locked, so there should be only one to grab the page.
>>
>> Secondly, btrfs_attach_subpage() would just exit if it detects the 
>> page is already private.
>>
>> So there shouldn't be a race.
>>
> Task1                                        Task2
> alloc_extent_buffer(4096)                    alloc_extent_buffer(4096)
>    find_extent_buffer, nothing                 find_extent_buffer, nothing
>      find_or_create_page(1)
>                                                find_or_create_page(1)
>                                                  waits on page lock
>        btrfs_attach_subpage()
>    radix_tree_insert()
>    unlock pages
>                                                exit find_or_create_page()
>                                              btrfs_attach_subpage(), BAD
> 
> there's definitely a race, again this is why the code does the check to 
> see if there's a private attached to the EB already.  Thanks,

btrfs_attach_subpage() is already doing the private check.

Thanks,
Qu

> 
> Josef

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH v2 06/18] btrfs: extent_io: make attach_extent_buffer_page() to handle subpage case
  2020-12-19  0:24         ` Qu Wenruo
@ 2020-12-21 10:15           ` Qu Wenruo
  0 siblings, 0 replies; 71+ messages in thread
From: Qu Wenruo @ 2020-12-21 10:15 UTC (permalink / raw)
  To: Josef Bacik, Qu Wenruo, linux-btrfs



On 2020/12/19 8:24 AM, Qu Wenruo wrote:
> 
> 
> On 2020/12/18 11:41 PM, Josef Bacik wrote:
>> On 12/17/20 7:44 PM, Qu Wenruo wrote:
>>>
>>>
>>> On 2020/12/18 12:00 AM, Josef Bacik wrote:
>>>> On 12/10/20 1:38 AM, Qu Wenruo wrote:
>>>>> For subpage case, we need to allocate new memory for each metadata 
>>>>> page.
>>>>>
>>>>> So we need to:
>>>>> - Allow attach_extent_buffer_page() to return int
>>>>>    To indicate allocation failure
>>>>>
>>>>> - Prealloc page->private for alloc_extent_buffer()
>>>>>    We don't want to call memory allocation with spinlock hold, so
>>>>>    do preallocation before we acquire the spin lock.
>>>>>
>>>>> - Handle subpage and regular case differently in
>>>>>    attach_extent_buffer_page()
>>>>>    For regular case, just do the usual thing.
>>>>>    For subpage case, allocate new memory and update the tree_block
>>>>>    bitmap.
>>>>>
>>>>>    The bitmap update will be handled by new subpage specific helper,
>>>>>    btrfs_subpage_set_tree_block().
>>>>>
>>>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>>>> ---
>>>>>   fs/btrfs/extent_io.c | 69 
>>>>> +++++++++++++++++++++++++++++++++++---------
>>>>>   fs/btrfs/subpage.h   | 44 ++++++++++++++++++++++++++++
>>>>>   2 files changed, 99 insertions(+), 14 deletions(-)
>>>>>
>>>>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>>>>> index 6350c2687c7e..51dd7ec3c2b3 100644
>>>>> --- a/fs/btrfs/extent_io.c
>>>>> +++ b/fs/btrfs/extent_io.c
>>>>> @@ -24,6 +24,7 @@
>>>>>   #include "rcu-string.h"
>>>>>   #include "backref.h"
>>>>>   #include "disk-io.h"
>>>>> +#include "subpage.h"
>>>>>   static struct kmem_cache *extent_state_cache;
>>>>>   static struct kmem_cache *extent_buffer_cache;
>>>>> @@ -3142,22 +3143,41 @@ static int submit_extent_page(unsigned int 
>>>>> opf,
>>>>>       return ret;
>>>>>   }
>>>>> -static void attach_extent_buffer_page(struct extent_buffer *eb,
>>>>> +static int attach_extent_buffer_page(struct extent_buffer *eb,
>>>>>                         struct page *page)
>>>>>   {
>>>>> -    /*
>>>>> -     * If the page is mapped to btree inode, we should hold the 
>>>>> private
>>>>> -     * lock to prevent race.
>>>>> -     * For cloned or dummy extent buffers, their pages are not 
>>>>> mapped and
>>>>> -     * will not race with any other ebs.
>>>>> -     */
>>>>> -    if (page->mapping)
>>>>> -        lockdep_assert_held(&page->mapping->private_lock);
>>>>> +    struct btrfs_fs_info *fs_info = eb->fs_info;
>>>>> +    int ret;
>>>>> -    if (!PagePrivate(page))
>>>>> -        attach_page_private(page, eb);
>>>>> -    else
>>>>> -        WARN_ON(page->private != (unsigned long)eb);
>>>>> +    if (fs_info->sectorsize == PAGE_SIZE) {
>>>>> +        /*
>>>>> +         * If the page is mapped to btree inode, we should hold the
>>>>> +         * private lock to prevent race.
>>>>> +         * For cloned or dummy extent buffers, their pages are not
>>>>> +         * mapped and will not race with any other ebs.
>>>>> +         */
>>>>> +        if (page->mapping)
>>>>> +            lockdep_assert_held(&page->mapping->private_lock);
>>>>> +
>>>>> +        if (!PagePrivate(page))
>>>>> +            attach_page_private(page, eb);
>>>>> +        else
>>>>> +            WARN_ON(page->private != (unsigned long)eb);
>>>>> +        return 0;
>>>>> +    }
>>>>> +
>>>>> +    /* Already mapped, just update the existing range */
>>>>> +    if (PagePrivate(page))
>>>>> +        goto update_bitmap;
>>>>> +
>>>>> +    /* Do new allocation to attach subpage */
>>>>> +    ret = btrfs_attach_subpage(fs_info, page);
>>>>> +    if (ret < 0)
>>>>> +        return ret;
>>>>> +
>>>>> +update_bitmap:
>>>>> +    btrfs_subpage_set_tree_block(fs_info, page, eb->start, eb->len);
>>>>> +    return 0;
>>>>>   }
>>>>>   void set_page_extent_mapped(struct page *page)
>>>>> @@ -5067,12 +5087,19 @@ struct extent_buffer 
>>>>> *btrfs_clone_extent_buffer(const struct extent_buffer *src)
>>>>>           return NULL;
>>>>>       for (i = 0; i < num_pages; i++) {
>>>>> +        int ret;
>>>>> +
>>>>>           p = alloc_page(GFP_NOFS);
>>>>>           if (!p) {
>>>>>               btrfs_release_extent_buffer(new);
>>>>>               return NULL;
>>>>>           }
>>>>> -        attach_extent_buffer_page(new, p);
>>>>> +        ret = attach_extent_buffer_page(new, p);
>>>>> +        if (ret < 0) {
>>>>> +            put_page(p);
>>>>> +            btrfs_release_extent_buffer(new);
>>>>> +            return NULL;
>>>>> +        }
>>>>>           WARN_ON(PageDirty(p));
>>>>>           SetPageUptodate(p);
>>>>>           new->pages[i] = p;
>>>>> @@ -5321,6 +5348,18 @@ struct extent_buffer 
>>>>> *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
>>>>>               goto free_eb;
>>>>>           }
>>>>> +        /*
>>>>> +         * Preallocate page->private for subpage case, so that
>>>>> +         * we won't allocate memory with private_lock hold.
>>>>> +         */
>>>>> +        ret = btrfs_attach_subpage(fs_info, p);
>>>>> +        if (ret < 0) {
>>>>> +            unlock_page(p);
>>>>> +            put_page(p);
>>>>> +            exists = ERR_PTR(-ENOMEM);
>>>>> +            goto free_eb;
>>>>> +        }
>>>>> +
>>>>
>>>> This is broken, if we race with another thread adding an extent 
>>>> buffer for this same range we'll overwrite the page private with the 
>>>> new thing, losing any of the work that was done previously.  Thanks,
>>>
>>> Firstly the page is locked, so there should be only one to grab the 
>>> page.
>>>
>>> Secondly, btrfs_attach_subpage() would just exit if it detects the 
>>> page is already private.
>>>
>>> So there shouldn't be a race.
>>>
>> Task1                                        Task2
>> alloc_extent_buffer(4096)                    alloc_extent_buffer(4096)
>>    find_extent_buffer, nothing                 find_extent_buffer, nothing
>>      find_or_create_page(1)
>>                                                find_or_create_page(1)
>>                                                  waits on page lock
>>        btrfs_attach_subpage()
>>    radix_tree_insert()
>>    unlock pages
>>                                                exit find_or_create_page()
>>                                              btrfs_attach_subpage(), BAD

To be more clear, in the above case btrfs_attach_subpage() would find the 
page is already private, and thus exit without doing anything (no extra 
attaching nor bitmap update).

Thus no btrfs_subpage info is overwritten.

>>
>> there's definitely a race, again this is why the code does the check 
>> to see if there's a private attached to the EB already.  Thanks,

That's exactly what btrfs_attach_subpage() is doing.

Anyway, all this hassle is needed just to avoid memory allocation inside 
the spinlock.

Personally speaking, I don't see any better solution than pre-allocating 
right now.

Thanks,
Qu

> 
> btrfs_attach_subpage() is already doing the private check.
> 
> Thanks,
> Qu
> 
>>
>> Josef

^ permalink raw reply	[flat|nested] 71+ messages in thread

end of thread, other threads:[~2020-12-21 10:17 UTC | newest]

Thread overview: 71+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-10  6:38 [PATCH v2 00/18] btrfs: add read-only support for subpage sector size Qu Wenruo
2020-12-10  6:38 ` [PATCH v2 01/18] btrfs: extent_io: rename @offset parameter to @disk_bytenr for submit_extent_page() Qu Wenruo
2020-12-17 15:44   ` Josef Bacik
2020-12-10  6:38 ` [PATCH v2 02/18] btrfs: extent_io: refactor __extent_writepage_io() to improve readability Qu Wenruo
2020-12-10 12:12   ` Nikolay Borisov
2020-12-10 12:53     ` Qu Wenruo
2020-12-10 12:58       ` Nikolay Borisov
2020-12-17 15:43   ` Josef Bacik
2020-12-10  6:38 ` [PATCH v2 03/18] btrfs: file: update comment for btrfs_dirty_pages() Qu Wenruo
2020-12-10 12:16   ` Nikolay Borisov
2020-12-10  6:38 ` [PATCH v2 04/18] btrfs: extent_io: introduce a helper to grab an existing extent buffer from a page Qu Wenruo
2020-12-10 13:51   ` Nikolay Borisov
2020-12-17 15:50   ` Josef Bacik
2020-12-10  6:38 ` [PATCH v2 05/18] btrfs: extent_io: introduce the skeleton of btrfs_subpage structure Qu Wenruo
2020-12-17 15:52   ` Josef Bacik
2020-12-10  6:38 ` [PATCH v2 06/18] btrfs: extent_io: make attach_extent_buffer_page() to handle subpage case Qu Wenruo
2020-12-10 15:30   ` Nikolay Borisov
2020-12-17  6:48     ` Qu Wenruo
2020-12-10 16:09   ` Nikolay Borisov
2020-12-17 16:00   ` Josef Bacik
2020-12-18  0:44     ` Qu Wenruo
2020-12-18 15:41       ` Josef Bacik
2020-12-19  0:24         ` Qu Wenruo
2020-12-21 10:15           ` Qu Wenruo
2020-12-10  6:38 ` [PATCH v2 07/18] btrfs: extent_io: make grab_extent_buffer_from_page() " Qu Wenruo
2020-12-10 15:39   ` Nikolay Borisov
2020-12-17  6:55     ` Qu Wenruo
2020-12-17 16:02   ` Josef Bacik
2020-12-18  0:49     ` Qu Wenruo
2020-12-10  6:38 ` [PATCH v2 08/18] btrfs: extent_io: support subpage for extent buffer page release Qu Wenruo
2020-12-10 16:13   ` Nikolay Borisov
2020-12-10  6:38 ` [PATCH v2 09/18] btrfs: subpage: introduce helper for subpage uptodate status Qu Wenruo
2020-12-11 10:10   ` Nikolay Borisov
2020-12-11 10:48     ` Qu Wenruo
2020-12-11 11:41       ` Nikolay Borisov
2020-12-11 11:56         ` Qu Wenruo
2020-12-10  6:38 ` [PATCH v2 10/18] btrfs: subpage: introduce helper for subpage error status Qu Wenruo
2020-12-10  6:38 ` [PATCH v2 11/18] btrfs: extent_io: make set/clear_extent_buffer_uptodate() to support subpage size Qu Wenruo
2020-12-10  6:38 ` [PATCH v2 12/18] btrfs: extent_io: implement try_release_extent_buffer() for subpage metadata support Qu Wenruo
2020-12-11 12:00   ` Nikolay Borisov
2020-12-11 12:11     ` Qu Wenruo
2020-12-11 16:57       ` Nikolay Borisov
2020-12-12  1:28         ` Qu Wenruo
2020-12-12  9:26           ` Nikolay Borisov
2020-12-12 10:26             ` Qu Wenruo
2020-12-12  5:44         ` Qu Wenruo
2020-12-12 10:30           ` Nikolay Borisov
2020-12-12 10:31             ` Qu Wenruo
2020-12-10  6:39 ` [PATCH v2 13/18] btrfs: extent_io: introduce read_extent_buffer_subpage() Qu Wenruo
2020-12-10  6:39 ` [PATCH v2 14/18] btrfs: extent_io: make endio_readpage_update_page_status() to handle subpage case Qu Wenruo
2020-12-14  9:57   ` Nikolay Borisov
2020-12-14 10:46     ` Qu Wenruo
2020-12-10  6:39 ` [PATCH v2 15/18] btrfs: disk-io: introduce subpage metadata validation check Qu Wenruo
2020-12-10 13:24   ` kernel test robot
2020-12-10 13:24     ` kernel test robot
2020-12-10 13:39   ` kernel test robot
2020-12-10 13:39     ` kernel test robot
2020-12-14 10:21   ` Nikolay Borisov
2020-12-14 10:50     ` Qu Wenruo
2020-12-14 11:17       ` Nikolay Borisov
2020-12-14 11:32         ` Qu Wenruo
2020-12-14 12:40           ` Nikolay Borisov
2020-12-10  6:39 ` [PATCH v2 16/18] btrfs: introduce btrfs_subpage for data inodes Qu Wenruo
2020-12-10  9:44   ` kernel test robot
2020-12-10  9:44     ` kernel test robot
2020-12-11  0:43   ` kernel test robot
2020-12-11  0:43     ` kernel test robot
2020-12-14 12:46   ` Nikolay Borisov
2020-12-10  6:39 ` [PATCH v2 17/18] btrfs: integrate page status update for read path into begin/end_page_read() Qu Wenruo
2020-12-14 13:59   ` Nikolay Borisov
2020-12-10  6:39 ` [PATCH v2 18/18] btrfs: allow RO mount of 4K sector size fs on 64K page system Qu Wenruo
