linux-btrfs.vger.kernel.org archive mirror
* [PATCH v3 00/15] btrfs: preparation patches for subpage support
@ 2020-12-02  6:47 Qu Wenruo
  2020-12-02  6:47 ` [PATCH v3 01/15] btrfs: rename bio_offset of extent_submit_bio_start_t to opt_file_offset Qu Wenruo
                   ` (15 more replies)
  0 siblings, 16 replies; 19+ messages in thread
From: Qu Wenruo @ 2020-12-02  6:47 UTC (permalink / raw)
  To: linux-btrfs

This is the rebased preparation branch for all patches not yet merged into
misc-next.

It can be fetched from github (with experimental sector aligned data write
support)
https://github.com/adam900710/linux/tree/subpage

This patchset includes all the unmerged preparation patches for subpage
support.

The patchset is sent without the main core of subpage support, as I
have proven that bombarding reviewers with a huge patchset won't really
make them happy, but only the author (for a very short time).

Thanks to the hard work from David, there are only 15 patches left
unmerged (with 2 new small patches to address the u32/u64 width
problem).

Patch 01~02:	bio_offset related fixes, making bio_offset u32.
Patch 03:	Make data checksum verification follow sector size.
Patch 04:	Refactor metadata (btree) page submission for later
		metadata write support.
Patch 05~08:	Metadata related refactors.
Patch 09~10:	Data related refactors.
Patch 11~15:	Scrub related refactors and cleanups.

For the scrub patches, there was a discussion with David about whether
we should use sector size as the unit for metadata scrub.

His idea is that sector size should be the minimal unit for DATA, not
metadata. This implies there is an undefined "minimal unit" of access.

But my argument is that sector size is the minimal unit for all btrfs
access; current btrfs has an undefined "data size", and that "data
size" must be equal to sectorsize in the current implementation.

Thus for the "data size" < nodesize case, we should first add support
for "data size" > sectorsize.

So I kept the scrub patch untouched, since IMHO sector size is still
the minimal unit of access, and iterating using sectorsize is
completely sane.
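
To illustrate, a standalone sketch (plain C, not kernel code; the
sizes below are assumed example values) of iterating a metadata block
in sector units:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	const uint32_t sectorsize = 4096;	/* assumed example value */
	const uint32_t nodesize = 16384;	/* assumed example value */
	const uint64_t bytenr = 1048576;	/* assumed tree block start */
	uint32_t off;

	/* One verification step per sector, even for a metadata block. */
	for (off = 0; off < nodesize; off += sectorsize)
		printf("verify sector at %llu\n",
		       (unsigned long long)(bytenr + off));
	return 0;
}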

Changelog:
v1:
- Separate prep patches from the huge subpage patchset

- Rebased to misc-next

- Expand the commit message of patch "btrfs: extent_io: remove the
  extent_start/extent_len for end_bio_extent_readpage()"
  with a runtime example to explain why we are doing the same thing.

- Fix the assert_spin_lock() usage
  What we really want is lockdep_assert_held()

- Re-iterate the reason why some extent io tests are invalid
  This is especially important since later patches will reduce
  extent_buffer::pages[] to the bare minimum, killing the ability to
  handle certain invalid extent buffers.

- Use sectorsize_bits for division
  During the conversion, we should only use sectorsize_bits for
  division; this avoids the hassle of 64-bit division on 32-bit
  systems (see the sketch after this list).
  But we should not use sectorsize_bits blindly, as a bit shift is
  not as straightforward to read as multiplication/division.

- Address the comments for the btrfs_lookup_bio_sums() cleanup patches
  From naming to macro usage, addressing those comments should further
  improve readability.
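
A minimal standalone sketch of the division point above (names and
values are assumed, not taken from the patches):

#include <stdint.h>

/*
 * sectorsize is always a power of two, so both helpers compute the
 * same value.  The shift form avoids the 64-bit division helper that
 * a plain u64 '/' would require on 32-bit systems, but it reads less
 * naturally than '/', hence it should not be used blindly.
 */
static inline uint64_t nr_sectors_shift(uint64_t len, uint32_t sectorsize_bits)
{
	return len >> sectorsize_bits;
}

static inline uint64_t nr_sectors_div(uint64_t len, uint32_t sectorsize)
{
	return len / sectorsize;
}

int main(void)
{
	return nr_sectors_shift(65536, 12) == nr_sectors_div(65536, 4096) ? 0 : 1;
}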

v2:
- Remove new extent_io tree features
  Now we won't utilize the extent io tree for subpage support, thus
  the new features along with some aggressive refactoring are no
  longer needed.

- Reduce extent_io tree operations to reduce endio latency
  Although the extent_io tree can do a lot of things like tracking
  page status, it has obvious overhead, namely the btree search.
  So keep the original behavior by only calling extent_io operations
  on a whole big extent, to reduce latency.

v3:
- Rebased to latest misc-next
  Now only 15 patches to submit.

- Add two new patches to address u32 and u64 problems
  The root problem is that the on-disk format is abusing u64 for its
  length fields.
  We have to draw a line for where we should convert to u32.
  Currently for bio_offset and extent_len, we can safely use u32.
  Just to be extra safe, more ASSERT() calls were added for this.

- Put BTRFS_MAX_METADATA_BLOCKSIZE into uapi
  To avoid circular inclusion of "ctree.h"

- Add more changelog for the patch enabling subpage scrub


Qu Wenruo (15):
  btrfs: rename bio_offset of extent_submit_bio_start_t to
    opt_file_offset
  btrfs: pass bio_offset to check_data_csum() directly
  btrfs: inode: make btrfs_verify_data_csum() follow sector size
  btrfs: extent_io: extract the btree page submission code into its own
    helper function
  btrfs: extent_io: calculate inline extent buffer page size based on
    page size
  btrfs: extent_io: don't allow tree block to cross page boundary for
    subpage support
  btrfs: extent_io: update num_extent_pages() to support subpage sized
    extent buffer
  btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors
  btrfs: file-item: remove the btrfs_find_ordered_sum() call in
    btrfs_lookup_bio_sums()
  btrfs: file-item: refactor btrfs_lookup_bio_sums() to handle
    out-of-order bvecs
  btrfs: scrub: reduce the width for extent_len/stripe_len from 64 bits
    to 32 bits
  btrfs: scrub: always allocate one full page for one sector for RAID56
  btrfs: scrub: support subpage tree block scrub
  btrfs: scrub: support subpage data scrub
  btrfs: scrub: allow scrub to work with subpage sectorsize

 fs/btrfs/compression.c          |   5 +-
 fs/btrfs/ctree.c                |   3 +-
 fs/btrfs/ctree.h                |  48 ++++--
 fs/btrfs/disk-io.c              |  17 +-
 fs/btrfs/disk-io.h              |   2 +-
 fs/btrfs/extent_io.c            | 232 +++++++++++++++++-----------
 fs/btrfs/extent_io.h            |  18 ++-
 fs/btrfs/file-item.c            | 266 +++++++++++++++++++++-----------
 fs/btrfs/inode.c                |  61 +++++---
 fs/btrfs/ordered-data.c         |  44 ------
 fs/btrfs/ordered-data.h         |   2 -
 fs/btrfs/scrub.c                | 102 +++++++-----
 fs/btrfs/struct-funcs.c         |  18 ++-
 include/uapi/linux/btrfs_tree.h |   4 +
 14 files changed, 496 insertions(+), 326 deletions(-)

-- 
2.29.2



* [PATCH v3 01/15] btrfs: rename bio_offset of extent_submit_bio_start_t to opt_file_offset
  2020-12-02  6:47 [PATCH v3 00/15] btrfs: preparation patches for subpage support Qu Wenruo
@ 2020-12-02  6:47 ` Qu Wenruo
  2020-12-02  8:12   ` Christoph Hellwig
  2020-12-02  6:47 ` [PATCH v3 02/15] btrfs: pass bio_offset to check_data_csum() directly Qu Wenruo
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 19+ messages in thread
From: Qu Wenruo @ 2020-12-02  6:47 UTC (permalink / raw)
  To: linux-btrfs

The parameter bio_offset of extent_submit_bio_start_t is very confusing.

If it were really a bio_offset (an offset into the bio), then it
should be u32.

But in fact it's only utilized by the dio write path, and that member
is used as a file offset, which must be u64.

Rename it to opt_file_offset since the only user uses it as a file
offset, and add a comment about who is using it.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c   | 17 ++++++++---------
 fs/btrfs/disk-io.h   |  2 +-
 fs/btrfs/extent_io.h |  2 +-
 fs/btrfs/inode.c     | 10 +++++-----
 4 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 46dd9e0b077e..504636803bc4 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -113,11 +113,9 @@ struct async_submit_bio {
 	struct bio *bio;
 	extent_submit_bio_start_t *submit_bio_start;
 	int mirror_num;
-	/*
-	 * bio_offset is optional, can be used if the pages in the bio
-	 * can't tell us where in the file the bio should go
-	 */
-	u64 bio_offset;
+
+	/* Optional parameter for submit_bio_start, used by direct io */
+	u64 opt_file_offset;
 	struct btrfs_work work;
 	blk_status_t status;
 };
@@ -697,7 +695,8 @@ static void run_one_async_start(struct btrfs_work *work)
 	blk_status_t ret;
 
 	async = container_of(work, struct  async_submit_bio, work);
-	ret = async->submit_bio_start(async->inode, async->bio, async->bio_offset);
+	ret = async->submit_bio_start(async->inode, async->bio,
+				      async->opt_file_offset);
 	if (ret)
 		async->status = ret;
 }
@@ -749,7 +748,7 @@ static void run_one_async_free(struct btrfs_work *work)
 
 blk_status_t btrfs_wq_submit_bio(struct inode *inode, struct bio *bio,
 				 int mirror_num, unsigned long bio_flags,
-				 u64 bio_offset,
+				 u64 opt_file_offset,
 				 extent_submit_bio_start_t *submit_bio_start)
 {
 	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
@@ -767,7 +766,7 @@ blk_status_t btrfs_wq_submit_bio(struct inode *inode, struct bio *bio,
 	btrfs_init_work(&async->work, run_one_async_start, run_one_async_done,
 			run_one_async_free);
 
-	async->bio_offset = bio_offset;
+	async->opt_file_offset = opt_file_offset;
 
 	async->status = 0;
 
@@ -797,7 +796,7 @@ static blk_status_t btree_csum_one_bio(struct bio *bio)
 }
 
 static blk_status_t btree_submit_bio_start(struct inode *inode, struct bio *bio,
-					   u64 bio_offset)
+					   u64 opt_file_offset)
 {
 	/*
 	 * when we're called for a write, we're already in the async
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index 7b3ecad88d7e..9c2d6469bf25 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -116,7 +116,7 @@ blk_status_t btrfs_bio_wq_end_io(struct btrfs_fs_info *info, struct bio *bio,
 			enum btrfs_wq_endio_type metadata);
 blk_status_t btrfs_wq_submit_bio(struct inode *inode, struct bio *bio,
 				 int mirror_num, unsigned long bio_flags,
-				 u64 bio_offset,
+				 u64 opt_file_offset,
 				 extent_submit_bio_start_t *submit_bio_start);
 blk_status_t btrfs_submit_bio_done(void *private_data, struct bio *bio,
 			  int mirror_num);
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 66762c3cdf81..d3c7ad02db24 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -72,7 +72,7 @@ typedef blk_status_t (submit_bio_hook_t)(struct inode *inode, struct bio *bio,
 					 unsigned long bio_flags);
 
 typedef blk_status_t (extent_submit_bio_start_t)(struct inode *inode,
-		struct bio *bio, u64 bio_offset);
+		struct bio *bio, u64 opt_file_offset);
 
 #define INLINE_EXTENT_BUFFER_PAGES 16
 #define MAX_INLINE_EXTENT_BUFFER_SIZE (INLINE_EXTENT_BUFFER_PAGES * PAGE_SIZE)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 0ce42d52d53e..cf27729e41c8 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2212,7 +2212,7 @@ int btrfs_bio_fits_in_stripe(struct page *page, size_t size, struct bio *bio,
  * are inserted into the btree
  */
 static blk_status_t btrfs_submit_bio_start(struct inode *inode, struct bio *bio,
-					   u64 bio_offset)
+					   u64 opt_file_offset)
 {
 	return btrfs_csum_one_bio(BTRFS_I(inode), bio, 0, 0);
 }
@@ -7795,9 +7795,10 @@ static void __endio_write_update_ordered(struct btrfs_inode *inode,
 }
 
 static blk_status_t btrfs_submit_bio_start_direct_io(struct inode *inode,
-						     struct bio *bio, u64 offset)
+						     struct bio *bio,
+						     u64 opt_file_offset)
 {
-	return btrfs_csum_one_bio(BTRFS_I(inode), bio, offset, 1);
+	return btrfs_csum_one_bio(BTRFS_I(inode), bio, opt_file_offset, 1);
 }
 
 static void btrfs_end_dio_bio(struct bio *bio)
@@ -7846,8 +7847,7 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
 		goto map;
 
 	if (write && async_submit) {
-		ret = btrfs_wq_submit_bio(inode, bio, 0, 0,
-					  file_offset,
+		ret = btrfs_wq_submit_bio(inode, bio, 0, 0, file_offset,
 					  btrfs_submit_bio_start_direct_io);
 		goto err;
 	} else if (write) {
-- 
2.29.2



* [PATCH v3 02/15] btrfs: pass bio_offset to check_data_csum() directly
  2020-12-02  6:47 [PATCH v3 00/15] btrfs: preparation patches for subpage support Qu Wenruo
  2020-12-02  6:47 ` [PATCH v3 01/15] btrfs: rename bio_offset of extent_submit_bio_start_t to opt_file_offset Qu Wenruo
@ 2020-12-02  6:47 ` Qu Wenruo
  2020-12-02  6:47 ` [PATCH v3 03/15] btrfs: inode: make btrfs_verify_data_csum() follow sector size Qu Wenruo
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 19+ messages in thread
From: Qu Wenruo @ 2020-12-02  6:47 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Johannes Thumshirn, Nikolay Borisov, Josef Bacik

The parameter @icsum of check_data_csum() is a little hard to
understand. So is @phy_offset of btrfs_verify_data_csum().

Both parameters are pre-calculated values for the csum lookup.

Instead of some calculated value, just pass @bio_offset and let the
final and only user, check_data_csum(), calculate whatever it needs.

While we are here, also make the bio_offset parameter and some related
variables u32 (unsigned int).
Since bio size is limited by bi_size, which is an unsigned int, and
there are extra size limit checks during various bio operations, we
are ensured that bio_offset won't overflow u32.

Thus for all involved functions, not only rename the parameter from
@phy_offset to @bio_offset, but also reduce its width to u32, so we
won't have suspicious "u32 = u64 >> sector_bits;" lines anymore.
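
A standalone sketch of the overflow argument (the 0xfffff000 limit is
an assumed worst case: the largest sector-aligned value an unsigned
int bi_size can hold):

#include <assert.h>
#include <stdint.h>

int main(void)
{
	const uint32_t bi_size = 0xfffff000u;	/* max sector-aligned bio size */
	const uint32_t sectorsize = 4096;
	uint32_t bio_offset = 0;

	while (bio_offset < bi_size) {
		/* the same overflow guard the patch adds as ASSERT() */
		assert(bio_offset + sectorsize > bio_offset);
		bio_offset += sectorsize;
	}
	return 0;
}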

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ctree.h     |  2 +-
 fs/btrfs/extent_io.c | 32 ++++++++++++++++++++------------
 fs/btrfs/extent_io.h |  2 +-
 fs/btrfs/inode.c     | 32 ++++++++++++++++++++------------
 4 files changed, 42 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 2744e13e8eb9..112c9a2ae47b 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3052,7 +3052,7 @@ u64 btrfs_file_extent_end(const struct btrfs_path *path);
 /* inode.c */
 blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio,
 				   int mirror_num, unsigned long bio_flags);
-int btrfs_verify_data_csum(struct btrfs_io_bio *io_bio, u64 phy_offset,
+int btrfs_verify_data_csum(struct btrfs_io_bio *io_bio, u32 bio_offset,
 			   struct page *page, u64 start, u64 end, int mirror);
 struct extent_map *btrfs_get_extent_fiemap(struct btrfs_inode *inode,
 					   u64 start, u64 len);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 569d50ccf78a..a28b61510265 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2634,7 +2634,7 @@ static bool btrfs_io_needs_validation(struct inode *inode, struct bio *bio)
 }
 
 blk_status_t btrfs_submit_read_repair(struct inode *inode,
-				      struct bio *failed_bio, u64 phy_offset,
+				      struct bio *failed_bio, u32 bio_offset,
 				      struct page *page, unsigned int pgoff,
 				      u64 start, u64 end, int failed_mirror,
 				      submit_bio_hook_t *submit_bio_hook)
@@ -2644,7 +2644,7 @@ blk_status_t btrfs_submit_read_repair(struct inode *inode,
 	struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
 	struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree;
 	struct btrfs_io_bio *failed_io_bio = btrfs_io_bio(failed_bio);
-	const int icsum = phy_offset >> fs_info->sectorsize_bits;
+	const int icsum = bio_offset >> fs_info->sectorsize_bits;
 	bool need_validation;
 	struct bio *repair_bio;
 	struct btrfs_io_bio *repair_io_bio;
@@ -2869,10 +2869,11 @@ static void end_bio_extent_readpage(struct bio *bio)
 	struct btrfs_io_bio *io_bio = btrfs_io_bio(bio);
 	struct extent_io_tree *tree, *failure_tree;
 	struct processed_extent processed = { 0 };
-	u64 offset = 0;
-	u64 start;
-	u64 end;
-	u64 len;
+	/*
+	 * The offset to the beginning of a bio, since one bio can never be
+	 * larger than UINT_MAX, u32 here is enough.
+	 */
+	u32 bio_offset = 0;
 	int mirror;
 	int ret;
 	struct bvec_iter_all iter_all;
@@ -2882,7 +2883,10 @@ static void end_bio_extent_readpage(struct bio *bio)
 		struct page *page = bvec->bv_page;
 		struct inode *inode = page->mapping->host;
 		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-		u32 sectorsize = fs_info->sectorsize;
+		const u32 sectorsize = fs_info->sectorsize;
+		u64 start;
+		u64 end;
+		u32 len;
 
 		btrfs_debug(fs_info,
 			"end_bio_extent_readpage: bi_sector=%llu, err=%d, mirror=%u",
@@ -2915,8 +2919,9 @@ static void end_bio_extent_readpage(struct bio *bio)
 		mirror = io_bio->mirror_num;
 		if (likely(uptodate)) {
 			if (is_data_inode(inode))
-				ret = btrfs_verify_data_csum(io_bio, offset, page,
-							     start, end, mirror);
+				ret = btrfs_verify_data_csum(io_bio,
+						bio_offset, page, start, end,
+						mirror);
 			else
 				ret = btrfs_validate_metadata_buffer(io_bio,
 					page, start, end, mirror);
@@ -2944,12 +2949,14 @@ static void end_bio_extent_readpage(struct bio *bio)
 			 * If it can't handle the error it will return -EIO and
 			 * we remain responsible for that page.
 			 */
-			if (!btrfs_submit_read_repair(inode, bio, offset, page,
+			if (!btrfs_submit_read_repair(inode, bio, bio_offset,
+						page,
 						start - page_offset(page),
 						start, end, mirror,
 						btrfs_submit_data_bio)) {
 				uptodate = !bio->bi_status;
-				offset += len;
+				ASSERT(bio_offset + len > bio_offset);
+				bio_offset += len;
 				continue;
 			}
 		} else {
@@ -2974,7 +2981,8 @@ static void end_bio_extent_readpage(struct bio *bio)
 			if (page->index == end_index && off)
 				zero_user_segment(page, off, PAGE_SIZE);
 		}
-		offset += len;
+		ASSERT(bio_offset + len > bio_offset);
+		bio_offset += len;
 
 		/* Update page status and unlock */
 		endio_readpage_update_page_status(page, uptodate);
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index d3c7ad02db24..db95468801c7 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -291,7 +291,7 @@ struct io_failure_record {
 
 
 blk_status_t btrfs_submit_read_repair(struct inode *inode,
-				      struct bio *failed_bio, u64 phy_offset,
+				      struct bio *failed_bio, u32 bio_offset,
 				      struct page *page, unsigned int pgoff,
 				      u64 start, u64 end, int failed_mirror,
 				      submit_bio_hook_t *submit_bio_hook);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index cf27729e41c8..5c051b0e58a5 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2942,28 +2942,30 @@ void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start,
 
 /*
  * check_data_csum - verify checksum of one sector of uncompressed data
- * @inode:	the inode
+ * @inode:	inode
  * @io_bio:	btrfs_io_bio which contains the csum
- * @icsum:	checksum index in the io_bio->csum array, size of csum_size
+ * @bio_offset:	offset to the beginning of the bio (in bytes)
  * @page:	page where is the data to be verified
  * @pgoff:	offset inside the page
  *
  * The length of such check is always one sector size.
  */
 static int check_data_csum(struct inode *inode, struct btrfs_io_bio *io_bio,
-			   int icsum, struct page *page, int pgoff)
+			   u32 bio_offset, struct page *page, int pgoff)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
 	char *kaddr;
 	u32 len = fs_info->sectorsize;
 	const u32 csum_size = fs_info->csum_size;
+	unsigned int offset_sectors;
 	u8 *csum_expected;
 	u8 csum[BTRFS_CSUM_SIZE];
 
 	ASSERT(pgoff + len <= PAGE_SIZE);
 
-	csum_expected = ((u8 *)io_bio->csum) + icsum * csum_size;
+	offset_sectors = bio_offset >> fs_info->sectorsize_bits;
+	csum_expected = ((u8 *)io_bio->csum) + offset_sectors * csum_size;
 
 	kaddr = kmap_atomic(page);
 	shash->tfm = fs_info->csum_shash;
@@ -2988,11 +2990,16 @@ static int check_data_csum(struct inode *inode, struct btrfs_io_bio *io_bio,
 }
 
 /*
- * when reads are done, we need to check csums to verify the data is correct
+ * When reads are done, we need to check csums to verify the data is correct.
  * if there's a match, we allow the bio to finish.  If not, the code in
  * extent_io.c will try to find good copies for us.
+ *
+ * @bio_offset:	offset to the beginning of the bio (in bytes)
+ * @start:	file offset of the range start
+ * @end:	file offset of the range end (inclusive)
+ * @mirror:	mirror number
  */
-int btrfs_verify_data_csum(struct btrfs_io_bio *io_bio, u64 phy_offset,
+int btrfs_verify_data_csum(struct btrfs_io_bio *io_bio, u32 bio_offset,
 			   struct page *page, u64 start, u64 end, int mirror)
 {
 	size_t offset = start - page_offset(page);
@@ -3017,8 +3024,7 @@ int btrfs_verify_data_csum(struct btrfs_io_bio *io_bio, u64 phy_offset,
 		return 0;
 	}
 
-	phy_offset >>= root->fs_info->sectorsize_bits;
-	return check_data_csum(inode, io_bio, phy_offset, page, offset);
+	return check_data_csum(inode, io_bio, bio_offset, page, offset);
 }
 
 /*
@@ -7712,7 +7718,7 @@ static blk_status_t btrfs_check_read_dio_bio(struct inode *inode,
 	struct bio_vec bvec;
 	struct bvec_iter iter;
 	u64 start = io_bio->logical;
-	int icsum = 0;
+	u32 bio_offset = 0;
 	blk_status_t err = BLK_STS_OK;
 
 	__bio_for_each_segment(bvec, &io_bio->bio, iter, io_bio->iter) {
@@ -7723,8 +7729,8 @@ static blk_status_t btrfs_check_read_dio_bio(struct inode *inode,
 		for (i = 0; i < nr_sectors; i++) {
 			ASSERT(pgoff < PAGE_SIZE);
 			if (uptodate &&
-			    (!csum || !check_data_csum(inode, io_bio, icsum,
-						       bvec.bv_page, pgoff))) {
+			    (!csum || !check_data_csum(inode, io_bio,
+					bio_offset, bvec.bv_page, pgoff))) {
 				clean_io_failure(fs_info, failure_tree, io_tree,
 						 start, bvec.bv_page,
 						 btrfs_ino(BTRFS_I(inode)),
@@ -7732,6 +7738,7 @@ static blk_status_t btrfs_check_read_dio_bio(struct inode *inode,
 			} else {
 				blk_status_t status;
 
+				ASSERT((start - io_bio->logical) < UINT_MAX);
 				status = btrfs_submit_read_repair(inode,
 							&io_bio->bio,
 							start - io_bio->logical,
@@ -7744,7 +7751,8 @@ static blk_status_t btrfs_check_read_dio_bio(struct inode *inode,
 					err = status;
 			}
 			start += sectorsize;
-			icsum++;
+			ASSERT(bio_offset + sectorsize > bio_offset);
+			bio_offset += sectorsize;
 			pgoff += sectorsize;
 		}
 	}
-- 
2.29.2



* [PATCH v3 03/15] btrfs: inode: make btrfs_verify_data_csum() follow sector size
  2020-12-02  6:47 [PATCH v3 00/15] btrfs: preparation patches for subpage support Qu Wenruo
  2020-12-02  6:47 ` [PATCH v3 01/15] btrfs: rename bio_offset of extent_submit_bio_start_t to opt_file_offset Qu Wenruo
  2020-12-02  6:47 ` [PATCH v3 02/15] btrfs: pass bio_offset to check_data_csum() directly Qu Wenruo
@ 2020-12-02  6:47 ` Qu Wenruo
  2020-12-02  6:48 ` [PATCH v3 04/15] btrfs: extent_io: extract the btree page submission code into its own helper function Qu Wenruo
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 19+ messages in thread
From: Qu Wenruo @ 2020-12-02  6:47 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Josef Bacik, Goldwyn Rodrigues

Currently btrfs_verify_data_csum() just passes the whole page to
check_data_csum(), which is fine since we only support sectorsize ==
PAGE_SIZE.

To support subpage, we need to properly honor per-sector checksum
verification, just like what we did in the dio read path.

This patch will do the csum verification in a for loop, starting with
pg_off == start - page_offset(page) and incrementing by sectorsize for
each iteration.

For the sectorsize == PAGE_SIZE case, pg_off will always be 0, and we
will finish after just one iteration.

For the subpage case, the loop iterates over each sector, and if we
find any error we return an error.
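
A worked standalone example of the loop bounds (64K page and 4K sector
sizes are assumed; since @end is inclusive, "pg_off <
offset_in_page(end)" still covers the last sector):

#include <stdio.h>

#define EXAMPLE_PAGE_SIZE 65536UL

/* stand-in for the kernel's offset_in_page() */
static unsigned long offset_in_page(unsigned long x)
{
	return x % EXAMPLE_PAGE_SIZE;
}

int main(void)
{
	const unsigned long sectorsize = 4096;
	/* a 3-sector range inside one page, end inclusive */
	const unsigned long start = 0x1000, end = 0x3fff;
	unsigned long pg_off;

	for (pg_off = offset_in_page(start);
	     pg_off < offset_in_page(end); pg_off += sectorsize)
		printf("check sector at page offset 0x%lx\n", pg_off);
	return 0;	/* prints 0x1000, 0x2000 and 0x3000 */
}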

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5c051b0e58a5..255ea28982ff 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2951,7 +2951,7 @@ void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start,
  * The length of such check is always one sector size.
  */
 static int check_data_csum(struct inode *inode, struct btrfs_io_bio *io_bio,
-			   u32 bio_offset, struct page *page, int pgoff)
+			   u32 bio_offset, struct page *page, u32 pgoff)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
@@ -3002,10 +3002,11 @@ static int check_data_csum(struct inode *inode, struct btrfs_io_bio *io_bio,
 int btrfs_verify_data_csum(struct btrfs_io_bio *io_bio, u32 bio_offset,
 			   struct page *page, u64 start, u64 end, int mirror)
 {
-	size_t offset = start - page_offset(page);
 	struct inode *inode = page->mapping->host;
 	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
 	struct btrfs_root *root = BTRFS_I(inode)->root;
+	const u32 sectorsize = root->fs_info->sectorsize;
+	u32 pg_off;
 
 	if (PageChecked(page)) {
 		ClearPageChecked(page);
@@ -3024,7 +3025,18 @@ int btrfs_verify_data_csum(struct btrfs_io_bio *io_bio, u32 bio_offset,
 		return 0;
 	}
 
-	return check_data_csum(inode, io_bio, bio_offset, page, offset);
+	ASSERT(page_offset(page) <= start &&
+	       end <= page_offset(page) + PAGE_SIZE - 1);
+	for (pg_off = offset_in_page(start);
+	     pg_off < offset_in_page(end);
+	     pg_off += sectorsize, bio_offset += sectorsize) {
+		int ret;
+
+		ret = check_data_csum(inode, io_bio, bio_offset, page, pg_off);
+		if (ret < 0)
+			return -EIO;
+	}
+	return 0;
 }
 
 /*
-- 
2.29.2



* [PATCH v3 04/15] btrfs: extent_io: extract the btree page submission code into its own helper function
  2020-12-02  6:47 [PATCH v3 00/15] btrfs: preparation patches for subpage support Qu Wenruo
                   ` (2 preceding siblings ...)
  2020-12-02  6:47 ` [PATCH v3 03/15] btrfs: inode: make btrfs_verify_data_csum() follow sector size Qu Wenruo
@ 2020-12-02  6:48 ` Qu Wenruo
  2020-12-02  6:48 ` [PATCH v3 05/15] btrfs: extent_io: calculate inline extent buffer page size based on page size Qu Wenruo
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 19+ messages in thread
From: Qu Wenruo @ 2020-12-02  6:48 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Josef Bacik, David Sterba

In btree_write_cache_pages() we have a btree page submission routine
buried deep inside a nested loop.

This patch extracts that part of the code into a helper function,
submit_eb_page(), which does the same work.

Also, since submit_eb_page() can now return >0 for a successful extent
buffer submission, remove the "ASSERT(ret <= 0);" line.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/extent_io.c | 122 ++++++++++++++++++++++++++-----------------
 1 file changed, 75 insertions(+), 47 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index a28b61510265..4b72c824064c 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3987,10 +3987,81 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
 	return ret;
 }
 
+/*
+ * Submit all page(s) of one extent buffer.
+ *
+ * @page:	The page of one extent buffer
+ * @eb_context:	To determine if we need to submit this page. If current page
+ *		belongs to this eb, we don't need to submit.
+ *
+ * The caller should pass each page in their bytenr order, and here we use
+ * @eb_context to determine if we have submitted pages of one extent buffer.
+ *
+ * If have submitted, we just skip until we hit a new page that doesn't belong
+ * to current @eb_context.
+ *
+ * If not yet submitted, we submit all the page(s) of the extent buffer.
+ *
+ * Return >0 if we have submitted the extent buffer successfully.
+ * Return 0 if we don't need to submit the page, as it's already submitted by
+ * previous call.
+ * Return <0 for fatal error.
+ */
+static int submit_eb_page(struct page *page, struct writeback_control *wbc,
+			  struct extent_page_data *epd,
+			  struct extent_buffer **eb_context)
+{
+	struct address_space *mapping = page->mapping;
+	struct extent_buffer *eb;
+	int ret;
+
+	if (!PagePrivate(page))
+		return 0;
+
+	spin_lock(&mapping->private_lock);
+	if (!PagePrivate(page)) {
+		spin_unlock(&mapping->private_lock);
+		return 0;
+	}
+
+	eb = (struct extent_buffer *)page->private;
+
+	/*
+	 * Shouldn't happen and normally this would be a BUG_ON but no sense
+	 * in crashing the users box for something we can survive anyway.
+	 */
+	if (WARN_ON(!eb)) {
+		spin_unlock(&mapping->private_lock);
+		return 0;
+	}
+
+	if (eb == *eb_context) {
+		spin_unlock(&mapping->private_lock);
+		return 0;
+	}
+	ret = atomic_inc_not_zero(&eb->refs);
+	spin_unlock(&mapping->private_lock);
+	if (!ret)
+		return 0;
+
+	*eb_context = eb;
+
+	ret = lock_extent_buffer_for_io(eb, epd);
+	if (ret <= 0) {
+		free_extent_buffer(eb);
+		return ret;
+	}
+	ret = write_one_eb(eb, wbc, epd);
+	free_extent_buffer(eb);
+	if (ret < 0)
+		return ret;
+	return 1;
+}
+
 int btree_write_cache_pages(struct address_space *mapping,
 				   struct writeback_control *wbc)
 {
-	struct extent_buffer *eb, *prev_eb = NULL;
+	struct extent_buffer *eb_context = NULL;
 	struct extent_page_data epd = {
 		.bio = NULL,
 		.extent_locked = 0,
@@ -4036,55 +4107,13 @@ int btree_write_cache_pages(struct address_space *mapping,
 		for (i = 0; i < nr_pages; i++) {
 			struct page *page = pvec.pages[i];
 
-			if (!PagePrivate(page))
-				continue;
-
-			spin_lock(&mapping->private_lock);
-			if (!PagePrivate(page)) {
-				spin_unlock(&mapping->private_lock);
-				continue;
-			}
-
-			eb = (struct extent_buffer *)page->private;
-
-			/*
-			 * Shouldn't happen and normally this would be a BUG_ON
-			 * but no sense in crashing the users box for something
-			 * we can survive anyway.
-			 */
-			if (WARN_ON(!eb)) {
-				spin_unlock(&mapping->private_lock);
-				continue;
-			}
-
-			if (eb == prev_eb) {
-				spin_unlock(&mapping->private_lock);
+			ret = submit_eb_page(page, wbc, &epd, &eb_context);
+			if (ret == 0)
 				continue;
-			}
-
-			ret = atomic_inc_not_zero(&eb->refs);
-			spin_unlock(&mapping->private_lock);
-			if (!ret)
-				continue;
-
-			prev_eb = eb;
-			ret = lock_extent_buffer_for_io(eb, &epd);
-			if (!ret) {
-				free_extent_buffer(eb);
-				continue;
-			} else if (ret < 0) {
-				done = 1;
-				free_extent_buffer(eb);
-				break;
-			}
-
-			ret = write_one_eb(eb, wbc, &epd);
-			if (ret) {
+			if (ret < 0) {
 				done = 1;
-				free_extent_buffer(eb);
 				break;
 			}
-			free_extent_buffer(eb);
 
 			/*
 			 * the filesystem may choose to bump up nr_to_write.
@@ -4105,7 +4134,6 @@ int btree_write_cache_pages(struct address_space *mapping,
 		index = 0;
 		goto retry;
 	}
-	ASSERT(ret <= 0);
 	if (ret < 0) {
 		end_write_bio(&epd, ret);
 		return ret;
-- 
2.29.2



* [PATCH v3 05/15] btrfs: extent_io: calculate inline extent buffer page size based on page size
  2020-12-02  6:47 [PATCH v3 00/15] btrfs: preparation patches for subpage support Qu Wenruo
                   ` (3 preceding siblings ...)
  2020-12-02  6:48 ` [PATCH v3 04/15] btrfs: extent_io: extract the btree page submission code into its own helper function Qu Wenruo
@ 2020-12-02  6:48 ` Qu Wenruo
  2020-12-02  6:48 ` [PATCH v3 06/15] btrfs: extent_io: don't allow tree block to cross page boundary for subpage support Qu Wenruo
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 19+ messages in thread
From: Qu Wenruo @ 2020-12-02  6:48 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Johannes Thumshirn, Nikolay Borisov, David Sterba

Btrfs only supports 64K as the max node size, thus for a 4K page
system, we would have at most 16 pages for one extent buffer.

For a system using 64K page size, we would really have just one single
page.

While we always use 16 slots for extent_buffer::pages[], this means
for systems using 64K pages, we are wasting memory on the 15 slots
which will never be utilized.

So this patch changes how the extent_buffer::pages[] array size is
calculated; it is now derived from BTRFS_MAX_METADATA_BLOCKSIZE and
PAGE_SIZE.

For systems using 4K page size, it will stay 16 pages.
For systems using 64K page size, it will be just 1 page.

Since we're here, also move the definition of
BTRFS_MAX_METADATA_BLOCKSIZE to btrfs_tree.h, to avoid circular
inclusion of ctree.h.
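
A quick standalone check of the new compile-time ratio (the page sizes
below are assumed examples):

#include <stdio.h>

#define BTRFS_MAX_METADATA_BLOCKSIZE (64 * 1024UL)

int main(void)
{
	const unsigned long page_sizes[] = { 4096, 16384, 65536 };
	int i;

	/* INLINE_EXTENT_BUFFER_PAGES = BTRFS_MAX_METADATA_BLOCKSIZE / PAGE_SIZE */
	for (i = 0; i < 3; i++)
		printf("PAGE_SIZE %5lu -> %lu pages\n", page_sizes[i],
		       BTRFS_MAX_METADATA_BLOCKSIZE / page_sizes[i]);
	return 0;	/* 16, 4 and 1 pages respectively */
}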

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/ctree.h                | 6 ------
 fs/btrfs/extent_io.c            | 7 +------
 fs/btrfs/extent_io.h            | 4 ++--
 include/uapi/linux/btrfs_tree.h | 4 ++++
 4 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 112c9a2ae47b..c5ef29078954 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -67,12 +67,6 @@ struct btrfs_ref;
 
 #define BTRFS_OLDEST_GENERATION	0ULL
 
-/*
- * the max metadata block size.  This limit is somewhat artificial,
- * but the memmove costs go through the roof for larger blocks.
- */
-#define BTRFS_MAX_METADATA_BLOCKSIZE 65536
-
 /*
  * we can actually store much bigger names, but lets not confuse the rest
  * of linux
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 4b72c824064c..2bab66b42395 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5053,12 +5053,7 @@ __alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start,
 	atomic_set(&eb->refs, 1);
 	atomic_set(&eb->io_pages, 0);
 
-	/*
-	 * Sanity checks, currently the maximum is 64k covered by 16x 4k pages
-	 */
-	BUILD_BUG_ON(BTRFS_MAX_METADATA_BLOCKSIZE
-		> MAX_INLINE_EXTENT_BUFFER_SIZE);
-	BUG_ON(len > MAX_INLINE_EXTENT_BUFFER_SIZE);
+	ASSERT(len <= BTRFS_MAX_METADATA_BLOCKSIZE);
 
 	return eb;
 }
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index db95468801c7..02b4786478b6 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -6,6 +6,7 @@
 #include <linux/rbtree.h>
 #include <linux/refcount.h>
 #include <linux/fiemap.h>
+#include <linux/btrfs_tree.h>
 #include "ulist.h"
 
 /*
@@ -74,8 +75,7 @@ typedef blk_status_t (submit_bio_hook_t)(struct inode *inode, struct bio *bio,
 typedef blk_status_t (extent_submit_bio_start_t)(struct inode *inode,
 		struct bio *bio, u64 opt_file_offset);
 
-#define INLINE_EXTENT_BUFFER_PAGES 16
-#define MAX_INLINE_EXTENT_BUFFER_SIZE (INLINE_EXTENT_BUFFER_PAGES * PAGE_SIZE)
+#define INLINE_EXTENT_BUFFER_PAGES (BTRFS_MAX_METADATA_BLOCKSIZE / PAGE_SIZE)
 struct extent_buffer {
 	u64 start;
 	unsigned long len;
diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
index 6b885982ece6..44e22f935b8c 100644
--- a/include/uapi/linux/btrfs_tree.h
+++ b/include/uapi/linux/btrfs_tree.h
@@ -4,6 +4,7 @@
 
 #include <linux/btrfs.h>
 #include <linux/types.h>
+#include <linux/sizes.h>
 #ifdef __KERNEL__
 #include <linux/stddef.h>
 #else
@@ -990,4 +991,7 @@ struct btrfs_qgroup_limit_item {
 	__le64 rsv_excl;
 } __attribute__ ((__packed__));
 
+/* Maximum metadata block size (nodesize) */
+#define BTRFS_MAX_METADATA_BLOCKSIZE	SZ_64K
+
 #endif /* _BTRFS_CTREE_H_ */
-- 
2.29.2



* [PATCH v3 06/15] btrfs: extent_io: don't allow tree block to cross page boundary for subpage support
  2020-12-02  6:47 [PATCH v3 00/15] btrfs: preparation patches for subpage support Qu Wenruo
                   ` (4 preceding siblings ...)
  2020-12-02  6:48 ` [PATCH v3 05/15] btrfs: extent_io: calculate inline extent buffer page size based on page size Qu Wenruo
@ 2020-12-02  6:48 ` Qu Wenruo
  2020-12-02  6:48 ` [PATCH v3 07/15] btrfs: extent_io: update num_extent_pages() to support subpage sized extent buffer Qu Wenruo
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 19+ messages in thread
From: Qu Wenruo @ 2020-12-02  6:48 UTC (permalink / raw)
  To: linux-btrfs

As a preparation for subpage sector size support (allowing filesystems
with sector size smaller than page size to be mounted): if the sector
size is smaller than the page size, we don't allow a tree block to be
read if it crosses a 64K(*) boundary.

The 64K is selected because:
- We are only going to support 64K page size for subpage for now
- 64K is also the max node size btrfs supports

This ensures that tree blocks are always contained in one page for a
system with 64K page size, which can greatly simplify the handling.

Otherwise we would need complex multi-page handling for tree blocks.

Currently there is no way to create such tree blocks.
The kernel has avoided allocating such tree blocks even with 4K page
size, as they can break RAID56 stripe scrubbing.

btrfs-progs has had its chunk allocator fixed for convert since 2016,
and has extra checks to enforce the same behavior as the kernel.

Just add such graceful checks for filesystems created by ancient btrfs
tools.
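
A standalone sketch of the boundary check being added (64K page size
and 16K nodesize are assumed):

#include <stdint.h>
#include <stdio.h>

#define EXAMPLE_PAGE_SIZE 65536UL

/* mirrors "offset_in_page(start) + len > PAGE_SIZE" from the patch */
static int crosses_page(uint64_t start, uint32_t len)
{
	return (start % EXAMPLE_PAGE_SIZE) + len > EXAMPLE_PAGE_SIZE;
}

int main(void)
{
	printf("%d\n", crosses_page(0x30000, 16384)); /* 0: fits             */
	printf("%d\n", crosses_page(0x3c000, 16384)); /* 0: ends at boundary */
	printf("%d\n", crosses_page(0x3e000, 16384)); /* 1: rejected         */
	return 0;
}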

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 2bab66b42395..8cbd6d43b154 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5272,6 +5272,13 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		btrfs_err(fs_info, "bad tree block start %llu", start);
 		return ERR_PTR(-EINVAL);
 	}
+	if (fs_info->sectorsize < PAGE_SIZE &&
+	    offset_in_page(start) + len > PAGE_SIZE) {
+		btrfs_err(fs_info,
+		"tree block crosses page boundary, start %llu nodesize %lu",
+			  start, len);
+		return ERR_PTR(-EINVAL);
+	}
 
 	eb = find_extent_buffer(fs_info, start);
 	if (eb)
-- 
2.29.2



* [PATCH v3 07/15] btrfs: extent_io: update num_extent_pages() to support subpage sized extent buffer
  2020-12-02  6:47 [PATCH v3 00/15] btrfs: preparation patches for subpage support Qu Wenruo
                   ` (5 preceding siblings ...)
  2020-12-02  6:48 ` [PATCH v3 06/15] btrfs: extent_io: don't allow tree block to cross page boundary for subpage support Qu Wenruo
@ 2020-12-02  6:48 ` Qu Wenruo
  2020-12-02  6:48 ` [PATCH v3 08/15] btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors Qu Wenruo
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 19+ messages in thread
From: Qu Wenruo @ 2020-12-02  6:48 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

For subpage sized extent buffers, we have ensured that no extent
buffer will cross a page boundary, thus we only need one page for any
extent buffer.

This patch updates the function num_extent_pages() to handle such a
case; num_extent_pages() will now return 1 for a subpage sized extent
buffer.
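
A standalone sketch of the new calculation (it relies on the GNU C
"a ?: b" shorthand, as the patch does; 64K page size is assumed):

#include <stdio.h>

#define EXAMPLE_PAGE_SHIFT 16	/* 64K pages */

static int num_extent_pages(unsigned long len)
{
	/* len >> PAGE_SHIFT, but at least one page */
	return len >> EXAMPLE_PAGE_SHIFT ?: 1;
}

int main(void)
{
	printf("%d\n", num_extent_pages(16384));	/* subpage eb -> 1 */
	printf("%d\n", num_extent_pages(65536));	/* full page  -> 1 */
	return 0;
}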

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.h | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 02b4786478b6..811f44d82c7c 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -203,8 +203,14 @@ void btrfs_readahead_node_child(struct extent_buffer *node, int slot);
 
 static inline int num_extent_pages(const struct extent_buffer *eb)
 {
-	return (round_up(eb->start + eb->len, PAGE_SIZE) >> PAGE_SHIFT) -
-	       (eb->start >> PAGE_SHIFT);
+	/*
+	 * For sectorsize == PAGE_SIZE case, since nodesize is always aligned to
+	 * sectorsize, it's just eb->len >> PAGE_SHIFT.
+	 *
+	 * For sectorsize < PAGE_SIZE case, we could have nodesize < PAGE_SIZE,
+	 * thus have to ensure we get at least one page.
+	 */
+	return eb->len >> PAGE_SHIFT ?: 1;
 }
 
 static inline int extent_buffer_uptodate(const struct extent_buffer *eb)
-- 
2.29.2



* [PATCH v3 08/15] btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors
  2020-12-02  6:47 [PATCH v3 00/15] btrfs: preparation patches for subpage support Qu Wenruo
                   ` (6 preceding siblings ...)
  2020-12-02  6:48 ` [PATCH v3 07/15] btrfs: extent_io: update num_extent_pages() to support subpage sized extent buffer Qu Wenruo
@ 2020-12-02  6:48 ` Qu Wenruo
  2020-12-02  6:48 ` [PATCH v3 09/15] btrfs: file-item: remove the btrfs_find_ordered_sum() call in btrfs_lookup_bio_sums() Qu Wenruo
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 19+ messages in thread
From: Qu Wenruo @ 2020-12-02  6:48 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Goldwyn Rodrigues

To support the sectorsize < PAGE_SIZE case, we need to take extra care
with extent buffer accessors.

Since sectorsize is smaller than PAGE_SIZE, one page can contain
multiple tree blocks; we must use eb->start to determine the real
offset to read/write for extent buffer accessors.

This patch introduces two helpers to do this:
- get_eb_page_index()
  This is to calculate the index to access extent_buffer::pages.
  It's just a simple wrapper around "start >> PAGE_SHIFT".

  For the sectorsize == PAGE_SIZE case, nothing is changed.
  For the sectorsize < PAGE_SIZE case, we always get index 0, and the
  existing page shift also works fine.

- get_eb_offset_in_page()
  This is to calculate the offset to access extent_buffer::pages.
  It needs to take extent_buffer::start into consideration.

  For the sectorsize == PAGE_SIZE case, extent_buffer::start is always
  aligned to PAGE_SIZE, thus adding extent_buffer::start to
  offset_in_page() won't change the result.
  For the sectorsize < PAGE_SIZE case, adding extent_buffer::start
  gives us the correct offset to access (see the worked example below).
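
A worked standalone example of the two helpers (64K page size assumed;
the eb below starts 0x8000 into its page):

#include <stdint.h>
#include <stdio.h>

#define EXAMPLE_PAGE_SIZE  65536UL
#define EXAMPLE_PAGE_SHIFT 16

int main(void)
{
	const uint64_t eb_start = 0x128000;	/* 16K-aligned, mid-page */
	const unsigned long offset = 0x100;	/* offset inside the eb */

	/* get_eb_page_index(): a subpage eb always fits in page 0 */
	printf("index  = %lu\n", offset >> EXAMPLE_PAGE_SHIFT);
	/* get_eb_offset_in_page(): shifted by where the eb sits in its page */
	printf("offset = 0x%lx\n",
	       (unsigned long)((offset + eb_start) % EXAMPLE_PAGE_SIZE));
	return 0;	/* index = 0, offset = 0x8100 */
}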

This patch will touch the following parts to cover all extent buffer
accessors:

- BTRFS_SETGET_HEADER_FUNCS()
- read_extent_buffer()
- read_extent_buffer_to_user()
- memcmp_extent_buffer()
- write_extent_buffer_chunk_tree_uuid()
- write_extent_buffer_fsid()
- write_extent_buffer()
- memzero_extent_buffer()
- copy_extent_buffer_full()
- copy_extent_buffer()
- memcpy_extent_buffer()
- memmove_extent_buffer()
- btrfs_get_token_##bits()
- btrfs_get_##bits()
- btrfs_set_token_##bits()
- btrfs_set_##bits()
- generic_bin_search()

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ctree.c        |  3 +-
 fs/btrfs/ctree.h        | 38 ++++++++++++++++++++++--
 fs/btrfs/extent_io.c    | 64 ++++++++++++++++++++++++-----------------
 fs/btrfs/struct-funcs.c | 18 ++++++------
 4 files changed, 85 insertions(+), 38 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index e5a0941c4bde..07810891e204 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1683,9 +1683,10 @@ static noinline int generic_bin_search(struct extent_buffer *eb,
 		oip = offset_in_page(offset);
 
 		if (oip + key_size <= PAGE_SIZE) {
-			const unsigned long idx = offset >> PAGE_SHIFT;
+			const unsigned long idx = get_eb_page_index(offset);
 			char *kaddr = page_address(eb->pages[idx]);
 
+			oip = get_eb_offset_in_page(eb, offset);
 			tmp = (struct btrfs_disk_key *)(kaddr + oip);
 		} else {
 			read_extent_buffer(eb, &unaligned, offset, key_size);
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index c5ef29078954..c9eb6d881064 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1552,13 +1552,14 @@ static inline void btrfs_set_token_##name(struct btrfs_map_token *token,\
 #define BTRFS_SETGET_HEADER_FUNCS(name, type, member, bits)		\
 static inline u##bits btrfs_##name(const struct extent_buffer *eb)	\
 {									\
-	const type *p = page_address(eb->pages[0]);			\
+	const type *p = page_address(eb->pages[0]) +			\
+			offset_in_page(eb->start);			\
 	return get_unaligned_le##bits(&p->member);			\
 }									\
 static inline void btrfs_set_##name(const struct extent_buffer *eb,	\
 				    u##bits val)			\
 {									\
-	type *p = page_address(eb->pages[0]);				\
+	type *p = page_address(eb->pages[0]) + offset_in_page(eb->start); \
 	put_unaligned_le##bits(val, &p->member);			\
 }
 
@@ -3366,6 +3367,39 @@ static inline void assertfail(const char *expr, const char* file, int line) { }
 #define ASSERT(expr)	(void)(expr)
 #endif
 
+/*
+ * Get the correct offset inside the page of extent buffer.
+ *
+ * Will handle both sectorsize == PAGE_SIZE and sectorsize < PAGE_SIZE cases.
+ *
+ * @eb:		The target extent buffer
+ * @start:	The offset inside the extent buffer
+ */
+static inline size_t get_eb_offset_in_page(const struct extent_buffer *eb,
+					   unsigned long offset)
+{
+	/*
+	 * For sectorsize == PAGE_SIZE case, eb->start will always be aligned
+	 * to PAGE_SIZE, thus adding it won't cause any difference.
+	 *
+	 * For sectorsize < PAGE_SIZE, we must only read the data belongs to
+	 * the eb, thus we have to take the eb->start into consideration.
+	 */
+	return offset_in_page(offset + eb->start);
+}
+
+static inline unsigned long get_eb_page_index(unsigned long offset)
+{
+	/*
+	 * For sectorsize == PAGE_SIZE case, plain >> PAGE_SHIFT is enough.
+	 *
+	 * For sectorsize < PAGE_SIZE case, we only support 64K PAGE_SIZE,
+	 * and has ensured all tree blocks are contained in one page, thus
+	 * we always get index == 0.
+	 */
+	return offset >> PAGE_SHIFT;
+}
+
 /*
  * Use that for functions that are conditionally exported for sanity tests but
  * otherwise static
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 8cbd6d43b154..8719c51bb4c5 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5702,12 +5702,12 @@ void read_extent_buffer(const struct extent_buffer *eb, void *dstv,
 	struct page *page;
 	char *kaddr;
 	char *dst = (char *)dstv;
-	unsigned long i = start >> PAGE_SHIFT;
+	unsigned long i = get_eb_page_index(start);
 
 	if (check_eb_range(eb, start, len))
 		return;
 
-	offset = offset_in_page(start);
+	offset = get_eb_offset_in_page(eb, start);
 
 	while (len > 0) {
 		page = eb->pages[i];
@@ -5732,13 +5732,13 @@ int read_extent_buffer_to_user_nofault(const struct extent_buffer *eb,
 	struct page *page;
 	char *kaddr;
 	char __user *dst = (char __user *)dstv;
-	unsigned long i = start >> PAGE_SHIFT;
+	unsigned long i = get_eb_page_index(start);
 	int ret = 0;
 
 	WARN_ON(start > eb->len);
 	WARN_ON(start + len > eb->start + eb->len);
 
-	offset = offset_in_page(start);
+	offset = get_eb_offset_in_page(eb, start);
 
 	while (len > 0) {
 		page = eb->pages[i];
@@ -5767,13 +5767,13 @@ int memcmp_extent_buffer(const struct extent_buffer *eb, const void *ptrv,
 	struct page *page;
 	char *kaddr;
 	char *ptr = (char *)ptrv;
-	unsigned long i = start >> PAGE_SHIFT;
+	unsigned long i = get_eb_page_index(start);
 	int ret = 0;
 
 	if (check_eb_range(eb, start, len))
 		return -EINVAL;
 
-	offset = offset_in_page(start);
+	offset = get_eb_offset_in_page(eb, start);
 
 	while (len > 0) {
 		page = eb->pages[i];
@@ -5799,7 +5799,7 @@ void write_extent_buffer_chunk_tree_uuid(const struct extent_buffer *eb,
 	char *kaddr;
 
 	WARN_ON(!PageUptodate(eb->pages[0]));
-	kaddr = page_address(eb->pages[0]);
+	kaddr = page_address(eb->pages[0]) + get_eb_offset_in_page(eb, 0);
 	memcpy(kaddr + offsetof(struct btrfs_header, chunk_tree_uuid), srcv,
 			BTRFS_FSID_SIZE);
 }
@@ -5809,7 +5809,7 @@ void write_extent_buffer_fsid(const struct extent_buffer *eb, const void *srcv)
 	char *kaddr;
 
 	WARN_ON(!PageUptodate(eb->pages[0]));
-	kaddr = page_address(eb->pages[0]);
+	kaddr = page_address(eb->pages[0]) + get_eb_offset_in_page(eb, 0);
 	memcpy(kaddr + offsetof(struct btrfs_header, fsid), srcv,
 			BTRFS_FSID_SIZE);
 }
@@ -5822,12 +5822,12 @@ void write_extent_buffer(const struct extent_buffer *eb, const void *srcv,
 	struct page *page;
 	char *kaddr;
 	char *src = (char *)srcv;
-	unsigned long i = start >> PAGE_SHIFT;
+	unsigned long i = get_eb_page_index(start);
 
 	if (check_eb_range(eb, start, len))
 		return;
 
-	offset = offset_in_page(start);
+	offset = get_eb_offset_in_page(eb, start);
 
 	while (len > 0) {
 		page = eb->pages[i];
@@ -5851,12 +5851,12 @@ void memzero_extent_buffer(const struct extent_buffer *eb, unsigned long start,
 	size_t offset;
 	struct page *page;
 	char *kaddr;
-	unsigned long i = start >> PAGE_SHIFT;
+	unsigned long i = get_eb_page_index(start);
 
 	if (check_eb_range(eb, start, len))
 		return;
 
-	offset = offset_in_page(start);
+	offset = get_eb_offset_in_page(eb, start);
 
 	while (len > 0) {
 		page = eb->pages[i];
@@ -5880,10 +5880,20 @@ void copy_extent_buffer_full(const struct extent_buffer *dst,
 
 	ASSERT(dst->len == src->len);
 
-	num_pages = num_extent_pages(dst);
-	for (i = 0; i < num_pages; i++)
-		copy_page(page_address(dst->pages[i]),
-				page_address(src->pages[i]));
+	if (dst->fs_info->sectorsize == PAGE_SIZE) {
+		num_pages = num_extent_pages(dst);
+		for (i = 0; i < num_pages; i++)
+			copy_page(page_address(dst->pages[i]),
+				  page_address(src->pages[i]));
+	} else {
+		size_t src_offset = get_eb_offset_in_page(src, 0);
+		size_t dst_offset = get_eb_offset_in_page(dst, 0);
+
+		ASSERT(src->fs_info->sectorsize < PAGE_SIZE);
+		memcpy(page_address(dst->pages[0]) + dst_offset,
+		       page_address(src->pages[0]) + src_offset,
+		       src->len);
+	}
 }
 
 void copy_extent_buffer(const struct extent_buffer *dst,
@@ -5896,7 +5906,7 @@ void copy_extent_buffer(const struct extent_buffer *dst,
 	size_t offset;
 	struct page *page;
 	char *kaddr;
-	unsigned long i = dst_offset >> PAGE_SHIFT;
+	unsigned long i = get_eb_page_index(dst_offset);
 
 	if (check_eb_range(dst, dst_offset, len) ||
 	    check_eb_range(src, src_offset, len))
@@ -5904,7 +5914,7 @@ void copy_extent_buffer(const struct extent_buffer *dst,
 
 	WARN_ON(src->len != dst_len);
 
-	offset = offset_in_page(dst_offset);
+	offset = get_eb_offset_in_page(dst, dst_offset);
 
 	while (len > 0) {
 		page = dst->pages[i];
@@ -5948,7 +5958,7 @@ static inline void eb_bitmap_offset(const struct extent_buffer *eb,
 	 * the bitmap item in the extent buffer + the offset of the byte in the
 	 * bitmap item.
 	 */
-	offset = start + byte_offset;
+	offset = start + offset_in_page(eb->start) + byte_offset;
 
 	*page_index = offset >> PAGE_SHIFT;
 	*page_offset = offset_in_page(offset);
@@ -6102,11 +6112,11 @@ void memcpy_extent_buffer(const struct extent_buffer *dst,
 		return;
 
 	while (len > 0) {
-		dst_off_in_page = offset_in_page(dst_offset);
-		src_off_in_page = offset_in_page(src_offset);
+		dst_off_in_page = get_eb_offset_in_page(dst, dst_offset);
+		src_off_in_page = get_eb_offset_in_page(dst, src_offset);
 
-		dst_i = dst_offset >> PAGE_SHIFT;
-		src_i = src_offset >> PAGE_SHIFT;
+		dst_i = get_eb_page_index(dst_offset);
+		src_i = get_eb_page_index(src_offset);
 
 		cur = min(len, (unsigned long)(PAGE_SIZE -
 					       src_off_in_page));
@@ -6142,11 +6152,11 @@ void memmove_extent_buffer(const struct extent_buffer *dst,
 		return;
 	}
 	while (len > 0) {
-		dst_i = dst_end >> PAGE_SHIFT;
-		src_i = src_end >> PAGE_SHIFT;
+		dst_i = get_eb_page_index(dst_end);
+		src_i = get_eb_page_index(src_end);
 
-		dst_off_in_page = offset_in_page(dst_end);
-		src_off_in_page = offset_in_page(src_end);
+		dst_off_in_page = get_eb_offset_in_page(dst, dst_end);
+		src_off_in_page = get_eb_offset_in_page(dst, src_end);
 
 		cur = min_t(unsigned long, len, src_off_in_page + 1);
 		cur = min(cur, dst_off_in_page + 1);
diff --git a/fs/btrfs/struct-funcs.c b/fs/btrfs/struct-funcs.c
index c46be27be700..fc282eec7291 100644
--- a/fs/btrfs/struct-funcs.c
+++ b/fs/btrfs/struct-funcs.c
@@ -57,8 +57,9 @@ u##bits btrfs_get_token_##bits(struct btrfs_map_token *token,		\
 			       const void *ptr, unsigned long off)	\
 {									\
 	const unsigned long member_offset = (unsigned long)ptr + off;	\
-	const unsigned long idx = member_offset >> PAGE_SHIFT;		\
-	const unsigned long oip = offset_in_page(member_offset);	\
+	const unsigned long idx = get_eb_page_index(member_offset);	\
+	const unsigned long oip = get_eb_offset_in_page(token->eb,	\
+						     member_offset);	\
 	const int size = sizeof(u##bits);				\
 	u8 lebytes[sizeof(u##bits)];					\
 	const int part = PAGE_SIZE - oip;				\
@@ -85,8 +86,8 @@ u##bits btrfs_get_##bits(const struct extent_buffer *eb,		\
 			 const void *ptr, unsigned long off)		\
 {									\
 	const unsigned long member_offset = (unsigned long)ptr + off;	\
-	const unsigned long oip = offset_in_page(member_offset);	\
-	const unsigned long idx = member_offset >> PAGE_SHIFT;		\
+	const unsigned long oip = get_eb_offset_in_page(eb, member_offset); \
+	const unsigned long idx = get_eb_page_index(member_offset);	\
 	char *kaddr = page_address(eb->pages[idx]);			\
 	const int size = sizeof(u##bits);				\
 	const int part = PAGE_SIZE - oip;				\
@@ -106,8 +107,9 @@ void btrfs_set_token_##bits(struct btrfs_map_token *token,		\
 			    u##bits val)				\
 {									\
 	const unsigned long member_offset = (unsigned long)ptr + off;	\
-	const unsigned long idx = member_offset >> PAGE_SHIFT;		\
-	const unsigned long oip = offset_in_page(member_offset);	\
+	const unsigned long idx = get_eb_page_index(member_offset);	\
+	const unsigned long oip = get_eb_offset_in_page(token->eb,	\
+						     member_offset);	\
 	const int size = sizeof(u##bits);				\
 	u8 lebytes[sizeof(u##bits)];					\
 	const int part = PAGE_SIZE - oip;				\
@@ -136,8 +138,8 @@ void btrfs_set_##bits(const struct extent_buffer *eb, void *ptr,	\
 		      unsigned long off, u##bits val)			\
 {									\
 	const unsigned long member_offset = (unsigned long)ptr + off;	\
-	const unsigned long oip = offset_in_page(member_offset);	\
-	const unsigned long idx = member_offset >> PAGE_SHIFT;		\
+	const unsigned long oip = get_eb_offset_in_page(eb, member_offset); \
+	const unsigned long idx = get_eb_page_index(member_offset);	\
 	char *kaddr = page_address(eb->pages[idx]);			\
 	const int size = sizeof(u##bits);				\
 	const int part = PAGE_SIZE - oip;				\
-- 
2.29.2



* [PATCH v3 09/15] btrfs: file-item: remove the btrfs_find_ordered_sum() call in btrfs_lookup_bio_sums()
  2020-12-02  6:47 [PATCH v3 00/15] btrfs: preparation patches for subpage support Qu Wenruo
                   ` (7 preceding siblings ...)
  2020-12-02  6:48 ` [PATCH v3 08/15] btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors Qu Wenruo
@ 2020-12-02  6:48 ` Qu Wenruo
  2020-12-02  6:48 ` [PATCH v3 10/15] btrfs: file-item: refactor btrfs_lookup_bio_sums() to handle out-of-order bvecs Qu Wenruo
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 19+ messages in thread
From: Qu Wenruo @ 2020-12-02  6:48 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

The function btrfs_lookup_bio_sums() is only called for read bios,
while btrfs_find_ordered_sum() searches ordered extent sums, which
exist only in the write path.

This means to read a page we either:
- Submit a read bio if the page is not uptodate
  In that case we only need to search the csum tree for csums.

- Find the page already uptodate
  It can be marked uptodate from a previous read, or from being marked
  dirty, as we always mark a page uptodate when dirtying it.
  In that case, we don't need to submit a read bio at all, thus there
  is no need to search for any csum.

So this patch removes the btrfs_find_ordered_sum() call in
btrfs_lookup_bio_sums().
And since btrfs_lookup_bio_sums() was the only caller of
btrfs_find_ordered_sum(), also remove the implementation.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/file-item.c    | 16 ++++++++++-----
 fs/btrfs/ordered-data.c | 44 -----------------------------------------
 fs/btrfs/ordered-data.h |  2 --
 3 files changed, 11 insertions(+), 51 deletions(-)

diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 8fa98d55fcfd..3df13d0446b9 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -239,7 +239,8 @@ int btrfs_lookup_file_extent(struct btrfs_trans_handle *trans,
 }
 
 /**
- * btrfs_lookup_bio_sums - Look up checksums for a bio.
+ * btrfs_lookup_bio_sums - Look up checksums for a read bio.
+ *
  * @inode: inode that the bio is for.
  * @bio: bio to look up.
  * @offset: Unless (u64)-1, look up checksums for this offset in the file.
@@ -274,6 +275,15 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio,
 	if (!fs_info->csum_root || (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM))
 		return BLK_STS_OK;
 
+	/*
+	 * This function is only called for read bio.
+	 *
+	 * This means several things:
+	 * - All of our csums should only be in csum tree
+	 *   No ordered extents csums. As ordered extents are only for write
+	 *   path.
+	 */
+	ASSERT(bio_op(bio) == REQ_OP_READ);
 	path = btrfs_alloc_path();
 	if (!path)
 		return BLK_STS_RESOURCE;
@@ -324,10 +334,6 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio,
 
 		if (page_offsets)
 			offset = page_offset(bvec.bv_page) + bvec.bv_offset;
-		count = btrfs_find_ordered_sum(BTRFS_I(inode), offset,
-					       disk_bytenr, csum, nblocks);
-		if (count)
-			goto found;
 
 		if (!item || disk_bytenr < item_start_offset ||
 		    disk_bytenr >= item_last_offset) {
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 0d61f9fefc02..79d366a36223 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -854,50 +854,6 @@ btrfs_lookup_first_ordered_extent(struct btrfs_inode *inode, u64 file_offset)
 	return entry;
 }
 
-/*
- * search the ordered extents for one corresponding to 'offset' and
- * try to find a checksum.  This is used because we allow pages to
- * be reclaimed before their checksum is actually put into the btree
- */
-int btrfs_find_ordered_sum(struct btrfs_inode *inode, u64 offset,
-			   u64 disk_bytenr, u8 *sum, int len)
-{
-	struct btrfs_fs_info *fs_info = inode->root->fs_info;
-	struct btrfs_ordered_sum *ordered_sum;
-	struct btrfs_ordered_extent *ordered;
-	struct btrfs_ordered_inode_tree *tree = &inode->ordered_tree;
-	unsigned long num_sectors;
-	unsigned long i;
-	const u32 csum_size = fs_info->csum_size;
-	int index = 0;
-
-	ordered = btrfs_lookup_ordered_extent(inode, offset);
-	if (!ordered)
-		return 0;
-
-	spin_lock_irq(&tree->lock);
-	list_for_each_entry_reverse(ordered_sum, &ordered->list, list) {
-		if (disk_bytenr >= ordered_sum->bytenr &&
-		    disk_bytenr < ordered_sum->bytenr + ordered_sum->len) {
-			i = (disk_bytenr - ordered_sum->bytenr) >>
-			    fs_info->sectorsize_bits;
-			num_sectors = ordered_sum->len >> fs_info->sectorsize_bits;
-			num_sectors = min_t(int, len - index, num_sectors - i);
-			memcpy(sum + index, ordered_sum->sums + i * csum_size,
-			       num_sectors * csum_size);
-
-			index += (int)num_sectors * csum_size;
-			if (index == len)
-				goto out;
-			disk_bytenr += num_sectors * fs_info->sectorsize;
-		}
-	}
-out:
-	spin_unlock_irq(&tree->lock);
-	btrfs_put_ordered_extent(ordered);
-	return index;
-}
-
 /*
  * btrfs_flush_ordered_range - Lock the passed range and ensures all pending
  * ordered extents in it are run to completion.
diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
index 367269effd6a..0bfa82b58e23 100644
--- a/fs/btrfs/ordered-data.h
+++ b/fs/btrfs/ordered-data.h
@@ -183,8 +183,6 @@ struct btrfs_ordered_extent *btrfs_lookup_ordered_range(
 		u64 len);
 void btrfs_get_ordered_extents_for_logging(struct btrfs_inode *inode,
 					   struct list_head *list);
-int btrfs_find_ordered_sum(struct btrfs_inode *inode, u64 offset,
-			   u64 disk_bytenr, u8 *sum, int len);
 u64 btrfs_wait_ordered_extents(struct btrfs_root *root, u64 nr,
 			       const u64 range_start, const u64 range_len);
 void btrfs_wait_ordered_roots(struct btrfs_fs_info *fs_info, u64 nr,
-- 
2.29.2



* [PATCH v3 10/15] btrfs: file-item: refactor btrfs_lookup_bio_sums() to handle out-of-order bvecs
  2020-12-02  6:47 [PATCH v3 00/15] btrfs: preparation patches for subpage support Qu Wenruo
                   ` (8 preceding siblings ...)
  2020-12-02  6:48 ` [PATCH v3 09/15] btrfs: file-item: remove the btrfs_find_ordered_sum() call in btrfs_lookup_bio_sums() Qu Wenruo
@ 2020-12-02  6:48 ` Qu Wenruo
  2020-12-02  6:48 ` [PATCH v3 11/15] btrfs: scrub: reduce the width for extent_len/stripe_len from 64 bits to 32 bits Qu Wenruo
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 19+ messages in thread
From: Qu Wenruo @ 2020-12-02  6:48 UTC (permalink / raw)
  To: linux-btrfs

Refactor btrfs_lookup_bio_sums() by:
- Remove the @file_offset parameter
  There are two factors making the @file_offset parameter useless:

  * For csum lookup in the csum tree, the file offset makes no sense
    We only need disk_bytenr, which is unrelated to file_offset.

  * page_offset (file offset) of each bvec is not contiguous
    Pages can be added to the same bio as long as their on-disk bytenr
    is contiguous, meaning we could have pages at different file offsets
    in the same bio.

  Thus passing file_offset makes no sense any more.
  The only user of file_offset is the data reloc inode, and we will use
  a new function, search_file_offset_in_bio(), to handle it.

- Extract the csum tree lookup into search_csum_tree()
  The new function handles the csum search in the csum tree.
  The return convention is the same as btrfs_find_ordered_sum():
  the number of sectors found that have a checksum.

- Change how we do the main loop
  The only info needed from the bio is:
  * the on-disk bytenr
  * the length

  After extracting the above info, we can do the search without the bio
  at all, which makes the main loop much simpler:

	for (cur_disk_bytenr = orig_disk_bytenr;
	     cur_disk_bytenr < orig_disk_bytenr + orig_len;
	     cur_disk_bytenr += count * sectorsize) {

		/* Lookup csum tree */
		count = search_csum_tree(fs_info, path, cur_disk_bytenr,
					 search_len, csum_dst);
		if (!count) {
			/* Csum hole handling */
		}
	}

- Use a single variable as the core to calculate all other offsets
  Instead of many variables of different types, we use only one core
  variable, cur_disk_bytenr, which represents the current disk bytenr.

  All involved values can be calculated from that core variable, and
  all of them are only visible inside the inner loop, as shown below.
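
  For illustration, every derived value comes straight from
  cur_disk_bytenr (these lines match the code in the diff below):

	sector_offset = (cur_disk_bytenr - orig_disk_bytenr) >>
			fs_info->sectorsize_bits;
	csum_dst = csum + sector_offset * csum_size;
	search_len = orig_disk_bytenr + orig_len - cur_disk_bytenr;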

All of the above refactoring makes btrfs_lookup_bio_sums() much more
robust than it used to be, especially with regard to the file offset
lookup. Now the file_offset lookup is only needed for the data reloc
inode; otherwise we don't need to bother with file_offset at all.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/compression.c |   5 +-
 fs/btrfs/ctree.h       |   2 +-
 fs/btrfs/file-item.c   | 252 +++++++++++++++++++++++++++--------------
 fs/btrfs/inode.c       |   5 +-
 4 files changed, 173 insertions(+), 91 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 12d50f1cdc58..5ae3fa0386b7 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -719,8 +719,7 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 			 */
 			refcount_inc(&cb->pending_bios);
 
-			ret = btrfs_lookup_bio_sums(inode, comp_bio, (u64)-1,
-						    sums);
+			ret = btrfs_lookup_bio_sums(inode, comp_bio, sums);
 			BUG_ON(ret); /* -ENOMEM */
 
 			nr_sectors = DIV_ROUND_UP(comp_bio->bi_iter.bi_size,
@@ -746,7 +745,7 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 	ret = btrfs_bio_wq_end_io(fs_info, comp_bio, BTRFS_WQ_ENDIO_DATA);
 	BUG_ON(ret); /* -ENOMEM */
 
-	ret = btrfs_lookup_bio_sums(inode, comp_bio, (u64)-1, sums);
+	ret = btrfs_lookup_bio_sums(inode, comp_bio, sums);
 	BUG_ON(ret); /* -ENOMEM */
 
 	ret = btrfs_map_bio(fs_info, comp_bio, mirror_num);
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index c9eb6d881064..d31627449acd 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3014,7 +3014,7 @@ struct btrfs_dio_private;
 int btrfs_del_csums(struct btrfs_trans_handle *trans,
 		    struct btrfs_root *root, u64 bytenr, u64 len);
 blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio,
-				   u64 offset, u8 *dst);
+				   u8 *dst);
 int btrfs_insert_file_extent(struct btrfs_trans_handle *trans,
 			     struct btrfs_root *root,
 			     u64 objectid, u64 pos,
diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 3df13d0446b9..0eaa78800861 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -238,13 +238,118 @@ int btrfs_lookup_file_extent(struct btrfs_trans_handle *trans,
 	return ret;
 }
 
+/*
+ * Find csums for logical bytenr range
+ * [disk_bytenr, disk_bytenr + len) and restore the result to @dst.
+ *
+ * Return >0 for the number of sectors we found.
+ * Return 0 for the range [disk_bytenr, disk_bytenr + sectorsize) has no csum
+ * for it. Caller may want to try next sector until one range is hit.
+ * Return <0 for fatal error.
+ */
+static int search_csum_tree(struct btrfs_fs_info *fs_info,
+			    struct btrfs_path *path, u64 disk_bytenr,
+			    u64 len, u8 *dst)
+{
+	struct btrfs_csum_item *item = NULL;
+	struct btrfs_key key;
+	const u32 sectorsize = fs_info->sectorsize;
+	const u32 csum_size = fs_info->csum_size;
+	u32 itemsize;
+	int ret;
+	u64 csum_start;
+	u64 csum_len;
+
+	ASSERT(IS_ALIGNED(disk_bytenr, sectorsize) &&
+	       IS_ALIGNED(len, sectorsize));
+
+	/* Check if the current csum item covers disk_bytenr */
+	if (path->nodes[0]) {
+		item = btrfs_item_ptr(path->nodes[0], path->slots[0],
+				      struct btrfs_csum_item);
+		btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
+		itemsize = btrfs_item_size_nr(path->nodes[0], path->slots[0]);
+
+		csum_start = key.offset;
+		csum_len = (itemsize / csum_size) * sectorsize;
+
+		if (in_range(disk_bytenr, csum_start, csum_len))
+			goto found;
+	}
+
+	/* Current item doesn't contain the desired range, re-search */
+	btrfs_release_path(path);
+	item = btrfs_lookup_csum(NULL, fs_info->csum_root, path,
+				 disk_bytenr, 0);
+	if (IS_ERR(item)) {
+		ret = PTR_ERR(item);
+		goto out;
+	}
+	btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
+	itemsize = btrfs_item_size_nr(path->nodes[0], path->slots[0]);
+
+	csum_start = key.offset;
+	csum_len = (itemsize / csum_size) * sectorsize;
+	ASSERT(in_range(disk_bytenr, csum_start, csum_len));
+
+found:
+	ret = (min(csum_start + csum_len, disk_bytenr + len) -
+		   disk_bytenr) >> fs_info->sectorsize_bits;
+	read_extent_buffer(path->nodes[0], dst, (unsigned long)item,
+			ret * csum_size);
+out:
+	if (ret == -ENOENT)
+		ret = 0;
+	return ret;
+}
+
+/*
+ * A helper to locate the file_offset of @cur_disk_bytenr of a @bio.
+ *
+ * Bio of btrfs represents read range of
+ * [bi_sector << 9, bi_sector << 9 + bi_size).
+ * Knowing this, we can iterate through each bvec to locate the page belong to
+ * @cur_disk_bytenr and get the file offset.
+ *
+ * @inode is used to determine the bvec page really belongs to @inode.
+ *
+ * Return 0 if we can't find the file offset;
+ * Return >0 if we find the file offset and restore it to @file_offset_ret
+ */
+static int search_file_offset_in_bio(struct bio *bio, struct inode *inode,
+				     u64 disk_bytenr, u64 *file_offset_ret)
+{
+	struct bvec_iter iter;
+	struct bio_vec bvec;
+	u64 cur = bio->bi_iter.bi_sector << 9;
+	int ret = 0;
+
+	bio_for_each_segment(bvec, bio, iter) {
+		struct page *page = bvec.bv_page;
+
+		if (cur > disk_bytenr)
+			break;
+		if (cur + bvec.bv_len <= disk_bytenr) {
+			cur += bvec.bv_len;
+			continue;
+		}
+		ASSERT(in_range(disk_bytenr, cur, bvec.bv_len));
+		if (page->mapping && page->mapping->host &&
+		    page->mapping->host == inode) {
+			ret = 1;
+			*file_offset_ret = page_offset(page) + bvec.bv_offset
+				+ disk_bytenr - cur;
+			break;
+		}
+	}
+	return ret;
+}
+
 /**
- * btrfs_lookup_bio_sums - Look up checksums for a read bio.
+ * Lookup the csum for the read bio in csum tree.
  *
  * @inode: inode that the bio is for.
  * @bio: bio to look up.
- * @offset: Unless (u64)-1, look up checksums for this offset in the file.
- *          If (u64)-1, use the page offsets from the bio instead.
  * @dst: Buffer of size nblocks * btrfs_super_csum_size() used to return
  *       checksum (nblocks = bio->bi_iter.bi_size / fs_info->sectorsize). If
  *       NULL, the checksum buffer is allocated and returned in
@@ -253,24 +358,19 @@ int btrfs_lookup_file_extent(struct btrfs_trans_handle *trans,
  * Return: BLK_STS_RESOURCE if allocating memory fails, BLK_STS_OK otherwise.
  */
 blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio,
-				   u64 offset, u8 *dst)
+				   u8 *dst)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	struct bio_vec bvec;
-	struct bvec_iter iter;
-	struct btrfs_csum_item *item = NULL;
 	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
 	struct btrfs_path *path;
-	const bool page_offsets = (offset == (u64)-1);
+	const u32 sectorsize = fs_info->sectorsize;
+	const u32 csum_size = fs_info->csum_size;
+	u32 orig_len = bio->bi_iter.bi_size;
+	u64 orig_disk_bytenr = bio->bi_iter.bi_sector << 9;
+	u64 cur_disk_bytenr;
 	u8 *csum;
-	u64 item_start_offset = 0;
-	u64 item_last_offset = 0;
-	u64 disk_bytenr;
-	u64 page_bytes_left;
-	u32 diff;
-	int nblocks;
+	unsigned int nblocks = orig_len >> fs_info->sectorsize_bits;
 	int count = 0;
-	const u32 csum_size = fs_info->csum_size;
 
 	if (!fs_info->csum_root || (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM))
 		return BLK_STS_OK;
@@ -282,13 +382,16 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio,
 	 * - All of our csums should only be in csum tree
 	 *   No ordered extents csums. As ordered extents are only for write
 	 *   path.
+	 * - No need to bother any other info from bvec
+	 *   Since we're looking up csums, the only important info is the
+	 *   disk_bytenr and the length, which can all be extracted from
+	 *   bi_iter directly.
 	 */
 	ASSERT(bio_op(bio) == REQ_OP_READ);
 	path = btrfs_alloc_path();
 	if (!path)
 		return BLK_STS_RESOURCE;
 
-	nblocks = bio->bi_iter.bi_size >> fs_info->sectorsize_bits;
 	if (!dst) {
 		struct btrfs_io_bio *btrfs_bio = btrfs_io_bio(bio);
 
@@ -325,81 +428,62 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio,
 		path->skip_locking = 1;
 	}
 
-	disk_bytenr = bio->bi_iter.bi_sector << 9;
+	for (cur_disk_bytenr = orig_disk_bytenr;
+	     cur_disk_bytenr < orig_disk_bytenr + orig_len;
+	     cur_disk_bytenr += (count * sectorsize)) {
+		u64 search_len = orig_disk_bytenr + orig_len - cur_disk_bytenr;
+		unsigned int sector_offset;
+		u8 *csum_dst;
 
-	bio_for_each_segment(bvec, bio, iter) {
-		page_bytes_left = bvec.bv_len;
-		if (count)
-			goto next;
-
-		if (page_offsets)
-			offset = page_offset(bvec.bv_page) + bvec.bv_offset;
+		/*
+		 * Although both cur_disk_bytenr and orig_disk_bytenr is u64,
+		 * we're calculating the offset to the bio start.
+		 *
+		 * Bio size is limited to UINT_MAX, thus unsigned int is
+		 * large enough to contain the raw result, not to mention
+		 * the right shifted result.
+		 */
+		ASSERT(cur_disk_bytenr - orig_disk_bytenr < UINT_MAX);
+		sector_offset = (cur_disk_bytenr - orig_disk_bytenr) >>
+				 fs_info->sectorsize_bits;
+		csum_dst = csum + sector_offset * csum_size;
+
+		count = search_csum_tree(fs_info, path, cur_disk_bytenr,
+					 search_len, csum_dst);
+		if (count <= 0) {
+			/*
+			 * Either we hit a critical error or we didn't find
+			 * the csum.
+			 * Either way, we put zero into the csums dst, and just
+			 * skip to next sector for a better luck.
+			 */
+			memset(csum_dst, 0, csum_size);
+			count = 1;
 
-		if (!item || disk_bytenr < item_start_offset ||
-		    disk_bytenr >= item_last_offset) {
-			struct btrfs_key found_key;
-			u32 item_size;
-
-			if (item)
-				btrfs_release_path(path);
-			item = btrfs_lookup_csum(NULL, fs_info->csum_root,
-						 path, disk_bytenr, 0);
-			if (IS_ERR(item)) {
-				count = 1;
-				memset(csum, 0, csum_size);
-				if (BTRFS_I(inode)->root->root_key.objectid ==
-				    BTRFS_DATA_RELOC_TREE_OBJECTID) {
-					set_extent_bits(io_tree, offset,
-						offset + fs_info->sectorsize - 1,
+			/*
+			 * For data reloc inode, we need to mark the
+			 * range NODATASUM so that balance won't report
+			 * false csum error.
+			 */
+			if (BTRFS_I(inode)->root->root_key.objectid ==
+			    BTRFS_DATA_RELOC_TREE_OBJECTID) {
+				u64 file_offset;
+				int ret;
+
+				ret = search_file_offset_in_bio(bio, inode,
+						cur_disk_bytenr, &file_offset);
+				if (ret)
+					set_extent_bits(io_tree, file_offset,
+						file_offset + sectorsize - 1,
 						EXTENT_NODATASUM);
-				} else {
-					btrfs_info_rl(fs_info,
-						   "no csum found for inode %llu start %llu",
-					       btrfs_ino(BTRFS_I(inode)), offset);
-				}
-				item = NULL;
-				btrfs_release_path(path);
-				goto found;
+			} else {
+				btrfs_warn_rl(fs_info,
+			"csum hole found for disk bytenr range [%llu, %llu)",
+				cur_disk_bytenr, cur_disk_bytenr + sectorsize);
 			}
-			btrfs_item_key_to_cpu(path->nodes[0], &found_key,
-					      path->slots[0]);
-
-			item_start_offset = found_key.offset;
-			item_size = btrfs_item_size_nr(path->nodes[0],
-						       path->slots[0]);
-			item_last_offset = item_start_offset +
-				(item_size / csum_size) *
-				fs_info->sectorsize;
-			item = btrfs_item_ptr(path->nodes[0], path->slots[0],
-					      struct btrfs_csum_item);
-		}
-		/*
-		 * this byte range must be able to fit inside
-		 * a single leaf so it will also fit inside a u32
-		 */
-		diff = disk_bytenr - item_start_offset;
-		diff = diff >> fs_info->sectorsize_bits;
-		diff = diff * csum_size;
-		count = min_t(int, nblocks, (item_last_offset - disk_bytenr) >>
-					    fs_info->sectorsize_bits);
-		read_extent_buffer(path->nodes[0], csum,
-				   ((unsigned long)item) + diff,
-				   csum_size * count);
-found:
-		csum += count * csum_size;
-		nblocks -= count;
-next:
-		while (count > 0) {
-			count--;
-			disk_bytenr += fs_info->sectorsize;
-			offset += fs_info->sectorsize;
-			page_bytes_left -= fs_info->sectorsize;
-			if (!page_bytes_left)
-				break; /* move to next bio */
 		}
 	}
 
-	WARN_ON_ONCE(count);
 	btrfs_free_path(path);
 	return BLK_STS_OK;
 }
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 255ea28982ff..8fb4b60a0091 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2268,7 +2268,7 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio,
 			 * need to csum or not, which is why we ignore skip_sum
 			 * here.
 			 */
-			ret = btrfs_lookup_bio_sums(inode, bio, (u64)-1, NULL);
+			ret = btrfs_lookup_bio_sums(inode, bio, NULL);
 			if (ret)
 				goto out;
 		}
@@ -7964,8 +7964,7 @@ static blk_qc_t btrfs_submit_direct(struct inode *inode, struct iomap *iomap,
 		 *
 		 * If we have csums disabled this will do nothing.
 		 */
-		status = btrfs_lookup_bio_sums(inode, dio_bio, file_offset,
-					       dip->csums);
+		status = btrfs_lookup_bio_sums(inode, dio_bio, dip->csums);
 		if (status != BLK_STS_OK)
 			goto out_err;
 	}
-- 
2.29.2



* [PATCH v3 11/15] btrfs: scrub: reduce the width for extent_len/stripe_len from 64 bits to 32 bits
  2020-12-02  6:47 [PATCH v3 00/15] btrfs: preparation patches for subpage support Qu Wenruo
                   ` (9 preceding siblings ...)
  2020-12-02  6:48 ` [PATCH v3 10/15] btrfs: file-item: refactor btrfs_lookup_bio_sums() to handle out-of-order bvecs Qu Wenruo
@ 2020-12-02  6:48 ` Qu Wenruo
  2020-12-02  6:48 ` [PATCH v3 12/15] btrfs: scrub: always allocate one full page for one sector for RAID56 Qu Wenruo
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 19+ messages in thread
From: Qu Wenruo @ 2020-12-02  6:48 UTC (permalink / raw)
  To: linux-btrfs

The btrfs on-disk format chooses to use u64 for almost everything, but
there are a lot of restrictions that mean we can't use more than u32
for things like extent length (the maximum length is 128MiB for
non-hole extents) or stripe length (due to the device number limit).

This means that if we don't have extra handling to convert u64 to u32,
we will always have some questionable operations like
"u32 = u64 >> sectorsize_bits" in the code.

This patch addresses the problem by reducing the width of the
following members/parameters:

- scrub_parity::stripe_len
- @len of scrub_pages()
- @extent_len of scrub_remap_extent()
- @len of scrub_parity_mark_sectors_error()
- @len of scrub_parity_mark_sectors_data()
- @len of scrub_extent()
- @len of scrub_pages_for_parity()
- @len of scrub_extent_for_parity()

Members extracted from on-disk structures, like map->stripe_len, are
kept as is, since changing them would require an on-disk format change.

There will be cases like "u32 = u64 - u64" or "u32 = u64"; for such
call sites, an extra ASSERT() is added to be extra safe in debug
builds.
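
The narrowing pattern looks like this (a minimal sketch; the concrete
call sites are in the diff below):

	u64 bytes;	/* length taken from an on-disk extent item */
	u32 extent_len;

	ASSERT(bytes <= U32_MAX);
	extent_len = bytes;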

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c | 54 +++++++++++++++++++++++++++---------------------
 1 file changed, 31 insertions(+), 23 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 78759bc9c980..8026606f7510 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -130,7 +130,7 @@ struct scrub_parity {
 
 	int			nsectors;
 
-	u64			stripe_len;
+	u32			stripe_len;
 
 	refcount_t		refs;
 
@@ -233,7 +233,7 @@ static void scrub_parity_get(struct scrub_parity *sparity);
 static void scrub_parity_put(struct scrub_parity *sparity);
 static int scrub_add_page_to_rd_bio(struct scrub_ctx *sctx,
 				    struct scrub_page *spage);
-static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
+static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u32 len,
 		       u64 physical, struct btrfs_device *dev, u64 flags,
 		       u64 gen, int mirror_num, u8 *csum,
 		       u64 physical_for_dev_replace);
@@ -241,7 +241,7 @@ static void scrub_bio_end_io(struct bio *bio);
 static void scrub_bio_end_io_worker(struct btrfs_work *work);
 static void scrub_block_complete(struct scrub_block *sblock);
 static void scrub_remap_extent(struct btrfs_fs_info *fs_info,
-			       u64 extent_logical, u64 extent_len,
+			       u64 extent_logical, u32 extent_len,
 			       u64 *extent_physical,
 			       struct btrfs_device **extent_dev,
 			       int *extent_mirror_num);
@@ -2147,7 +2147,7 @@ static void scrub_missing_raid56_pages(struct scrub_block *sblock)
 	spin_unlock(&sctx->stat_lock);
 }
 
-static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
+static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u32 len,
 		       u64 physical, struct btrfs_device *dev, u64 flags,
 		       u64 gen, int mirror_num, u8 *csum,
 		       u64 physical_for_dev_replace)
@@ -2171,7 +2171,7 @@ static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
 
 	for (index = 0; len > 0; index++) {
 		struct scrub_page *spage;
-		u64 l = min_t(u64, len, PAGE_SIZE);
+		u32 l = min_t(u32, len, PAGE_SIZE);
 
 		spage = kzalloc(sizeof(*spage), GFP_KERNEL);
 		if (!spage) {
@@ -2292,10 +2292,9 @@ static void scrub_bio_end_io_worker(struct btrfs_work *work)
 
 static inline void __scrub_mark_bitmap(struct scrub_parity *sparity,
 				       unsigned long *bitmap,
-				       u64 start, u64 len)
+				       u64 start, u32 len)
 {
 	u64 offset;
-	u64 nsectors64;
 	u32 nsectors;
 	u32 sectorsize_bits = sparity->sctx->fs_info->sectorsize_bits;
 
@@ -2307,10 +2306,7 @@ static inline void __scrub_mark_bitmap(struct scrub_parity *sparity,
 	start -= sparity->logic_start;
 	start = div64_u64_rem(start, sparity->stripe_len, &offset);
 	offset = offset >> sectorsize_bits;
-	nsectors64 = len >> sectorsize_bits;
-
-	ASSERT(nsectors64 < UINT_MAX);
-	nsectors = (u32)nsectors64;
+	nsectors = len >> sectorsize_bits;
 
 	if (offset + nsectors <= sparity->nsectors) {
 		bitmap_set(bitmap, offset, nsectors);
@@ -2322,13 +2318,13 @@ static inline void __scrub_mark_bitmap(struct scrub_parity *sparity,
 }
 
 static inline void scrub_parity_mark_sectors_error(struct scrub_parity *sparity,
-						   u64 start, u64 len)
+						   u64 start, u32 len)
 {
 	__scrub_mark_bitmap(sparity, sparity->ebitmap, start, len);
 }
 
 static inline void scrub_parity_mark_sectors_data(struct scrub_parity *sparity,
-						  u64 start, u64 len)
+						  u64 start, u32 len)
 {
 	__scrub_mark_bitmap(sparity, sparity->dbitmap, start, len);
 }
@@ -2356,6 +2352,7 @@ static void scrub_block_complete(struct scrub_block *sblock)
 		u64 end = sblock->pagev[sblock->page_count - 1]->logical +
 			  PAGE_SIZE;
 
+		ASSERT(end - start <= U32_MAX);
 		scrub_parity_mark_sectors_error(sblock->sparity,
 						start, end - start);
 	}
@@ -2425,7 +2422,7 @@ static int scrub_find_csum(struct scrub_ctx *sctx, u64 logical, u8 *csum)
 
 /* scrub extent tries to collect up to 64 kB for each bio */
 static int scrub_extent(struct scrub_ctx *sctx, struct map_lookup *map,
-			u64 logical, u64 len,
+			u64 logical, u32 len,
 			u64 physical, struct btrfs_device *dev, u64 flags,
 			u64 gen, int mirror_num, u64 physical_for_dev_replace)
 {
@@ -2457,7 +2454,7 @@ static int scrub_extent(struct scrub_ctx *sctx, struct map_lookup *map,
 	}
 
 	while (len) {
-		u64 l = min_t(u64, len, blocksize);
+		u32 l = min(len, blocksize);
 		int have_csum = 0;
 
 		if (flags & BTRFS_EXTENT_FLAG_DATA) {
@@ -2480,7 +2477,7 @@ static int scrub_extent(struct scrub_ctx *sctx, struct map_lookup *map,
 }
 
 static int scrub_pages_for_parity(struct scrub_parity *sparity,
-				  u64 logical, u64 len,
+				  u64 logical, u32 len,
 				  u64 physical, struct btrfs_device *dev,
 				  u64 flags, u64 gen, int mirror_num, u8 *csum)
 {
@@ -2506,7 +2503,7 @@ static int scrub_pages_for_parity(struct scrub_parity *sparity,
 
 	for (index = 0; len > 0; index++) {
 		struct scrub_page *spage;
-		u64 l = min_t(u64, len, PAGE_SIZE);
+		u32 l = min_t(u32, len, PAGE_SIZE);
 
 		spage = kzalloc(sizeof(*spage), GFP_KERNEL);
 		if (!spage) {
@@ -2564,7 +2561,7 @@ static int scrub_pages_for_parity(struct scrub_parity *sparity,
 }
 
 static int scrub_extent_for_parity(struct scrub_parity *sparity,
-				   u64 logical, u64 len,
+				   u64 logical, u32 len,
 				   u64 physical, struct btrfs_device *dev,
 				   u64 flags, u64 gen, int mirror_num)
 {
@@ -2588,7 +2585,7 @@ static int scrub_extent_for_parity(struct scrub_parity *sparity,
 	}
 
 	while (len) {
-		u64 l = min_t(u64, len, blocksize);
+		u32 l = min(len, blocksize);
 		int have_csum = 0;
 
 		if (flags & BTRFS_EXTENT_FLAG_DATA) {
@@ -2792,7 +2789,8 @@ static noinline_for_stack int scrub_raid56_parity(struct scrub_ctx *sctx,
 	u64 generation;
 	u64 extent_logical;
 	u64 extent_physical;
-	u64 extent_len;
+	/* Check the comment in scrub_stripe() for why u32 is enough here */
+	u32 extent_len;
 	u64 mapped_length;
 	struct btrfs_device *extent_dev;
 	struct scrub_parity *sparity;
@@ -2801,6 +2799,7 @@ static noinline_for_stack int scrub_raid56_parity(struct scrub_ctx *sctx,
 	int extent_mirror_num;
 	int stop_loop = 0;
 
+	ASSERT(map->stripe_len <= U32_MAX);
 	nsectors = map->stripe_len >> fs_info->sectorsize_bits;
 	bitmap_len = scrub_calc_parity_bitmap_len(nsectors);
 	sparity = kzalloc(sizeof(struct scrub_parity) + 2 * bitmap_len,
@@ -2812,6 +2811,7 @@ static noinline_for_stack int scrub_raid56_parity(struct scrub_ctx *sctx,
 		return -ENOMEM;
 	}
 
+	ASSERT(map->stripe_len <= U32_MAX);
 	sparity->stripe_len = map->stripe_len;
 	sparity->nsectors = nsectors;
 	sparity->sctx = sctx;
@@ -2906,6 +2906,7 @@ static noinline_for_stack int scrub_raid56_parity(struct scrub_ctx *sctx,
 			}
 again:
 			extent_logical = key.objectid;
+			ASSERT(bytes <= U32_MAX);
 			extent_len = bytes;
 
 			if (extent_logical < logic_start) {
@@ -2984,9 +2985,11 @@ static noinline_for_stack int scrub_raid56_parity(struct scrub_ctx *sctx,
 		logic_start += map->stripe_len;
 	}
 out:
-	if (ret < 0)
+	if (ret < 0) {
+		ASSERT(logic_end - logic_start <= U32_MAX);
 		scrub_parity_mark_sectors_error(sparity, logic_start,
 						logic_end - logic_start);
+	}
 	scrub_parity_put(sparity);
 	scrub_submit(sctx);
 	mutex_lock(&sctx->wr_lock);
@@ -3028,7 +3031,11 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 	u64 offset;
 	u64 extent_logical;
 	u64 extent_physical;
-	u64 extent_len;
+	/*
+	 * Unlike chunk length, extent length should never go beyond
+	 * BTRFS_MAX_EXTENT_SIZE, thus u32 is enough here.
+	 */
+	u32 extent_len;
 	u64 stripe_logical;
 	u64 stripe_end;
 	struct btrfs_device *extent_dev;
@@ -3277,6 +3284,7 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 
 again:
 			extent_logical = key.objectid;
+			ASSERT(bytes <= U32_MAX);
 			extent_len = bytes;
 
 			/*
@@ -4074,7 +4082,7 @@ int btrfs_scrub_progress(struct btrfs_fs_info *fs_info, u64 devid,
 }
 
 static void scrub_remap_extent(struct btrfs_fs_info *fs_info,
-			       u64 extent_logical, u64 extent_len,
+			       u64 extent_logical, u32 extent_len,
 			       u64 *extent_physical,
 			       struct btrfs_device **extent_dev,
 			       int *extent_mirror_num)
-- 
2.29.2



* [PATCH v3 12/15] btrfs: scrub: always allocate one full page for one sector for RAID56
  2020-12-02  6:47 [PATCH v3 00/15] btrfs: preparation patches for subpage support Qu Wenruo
                   ` (10 preceding siblings ...)
  2020-12-02  6:48 ` [PATCH v3 11/15] btrfs: scrub: reduce the width for extent_len/stripe_len from 64 bits to 32 bits Qu Wenruo
@ 2020-12-02  6:48 ` Qu Wenruo
  2020-12-02  6:48 ` [PATCH v3 13/15] btrfs: scrub: support subpage tree block scrub Qu Wenruo
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 19+ messages in thread
From: Qu Wenruo @ 2020-12-02  6:48 UTC (permalink / raw)
  To: linux-btrfs

For scrub_pages() and scrub_pages_for_parity(), we currently allocate
one scrub_page structure for one page.

This is fine if we only read/write one sector at a time.
But for cases like scrubbing RAID56, we need to read/write the full
stripe, which is 64K in size.

For subpage sector size, we would submit the read in just one page,
which is normally a good thing, but the RAID56 endio function expects
to see one sector, not the full stripe.
This could lead to wrong parity checksums for RAID56 on subpage.

To make the existing code work well for the subpage case, here we take
a shortcut by always allocating a full page for one sector.

This should provide the basis to make RAID56 work for the subpage case.

The cost is pretty obvious: for one RAID56 stripe, we now always need
16 pages. For the subpage situation (64K page size, 4K sector size),
this means we need a full megabyte (16 sectors x 64K pages) to scrub
just one 64K RAID56 stripe.

And for data scrub, each 4K sector will also need one 64K page.

This is mostly just a workaround; the proper fix is a much larger
project: replacing scrub_page with scrub_block, and allowing
scrub_block to handle multiple pages, csums, and a csum_bitmap, to
avoid allocating one page for each sector.
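
In sketch form, the allocation loop now steps by sector while still
backing each sector with a full page (matching the diff below):

	for (index = 0; len > 0; index++) {
		/* spage allocation elided; each sector still gets a
		 * full page, even when sectorsize < PAGE_SIZE. */
		spage->page = alloc_page(GFP_KERNEL);

		/* Step by sector, not by page. */
		len -= sectorsize;
		logical += sectorsize;
		physical += sectorsize;
	}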

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 8026606f7510..efc6f5f2b8a4 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2153,6 +2153,7 @@ static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u32 len,
 		       u64 physical_for_dev_replace)
 {
 	struct scrub_block *sblock;
+	const u32 sectorsize = sctx->fs_info->sectorsize;
 	int index;
 
 	sblock = kzalloc(sizeof(*sblock), GFP_KERNEL);
@@ -2171,7 +2172,12 @@ static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u32 len,
 
 	for (index = 0; len > 0; index++) {
 		struct scrub_page *spage;
-		u32 l = min_t(u32, len, PAGE_SIZE);
+		/*
+		 * Here we will allocate one page for one sector to scrub.
+		 * This is fine if PAGE_SIZE == sectorsize, but will cost
+		 * more memory for PAGE_SIZE > sectorsize case.
+		 */
+		u32 l = min(sectorsize, len);
 
 		spage = kzalloc(sizeof(*spage), GFP_KERNEL);
 		if (!spage) {
@@ -2483,8 +2489,11 @@ static int scrub_pages_for_parity(struct scrub_parity *sparity,
 {
 	struct scrub_ctx *sctx = sparity->sctx;
 	struct scrub_block *sblock;
+	u32 sectorsize = sctx->fs_info->sectorsize;
 	int index;
 
+	ASSERT(IS_ALIGNED(len, sectorsize));
+
 	sblock = kzalloc(sizeof(*sblock), GFP_KERNEL);
 	if (!sblock) {
 		spin_lock(&sctx->stat_lock);
@@ -2503,7 +2512,6 @@ static int scrub_pages_for_parity(struct scrub_parity *sparity,
 
 	for (index = 0; len > 0; index++) {
 		struct scrub_page *spage;
-		u32 l = min_t(u32, len, PAGE_SIZE);
 
 		spage = kzalloc(sizeof(*spage), GFP_KERNEL);
 		if (!spage) {
@@ -2538,9 +2546,12 @@ static int scrub_pages_for_parity(struct scrub_parity *sparity,
 		spage->page = alloc_page(GFP_KERNEL);
 		if (!spage->page)
 			goto leave_nomem;
-		len -= l;
-		logical += l;
-		physical += l;
+
+
+		/* Iterate over the stripe range in sectorsize steps */
+		len -= sectorsize;
+		logical += sectorsize;
+		physical += sectorsize;
 	}
 
 	WARN_ON(sblock->page_count == 0);
-- 
2.29.2



* [PATCH v3 13/15] btrfs: scrub: support subpage tree block scrub
  2020-12-02  6:47 [PATCH v3 00/15] btrfs: preparation patches for subpage support Qu Wenruo
                   ` (11 preceding siblings ...)
  2020-12-02  6:48 ` [PATCH v3 12/15] btrfs: scrub: always allocate one full page for one sector for RAID56 Qu Wenruo
@ 2020-12-02  6:48 ` Qu Wenruo
  2020-12-02  6:48 ` [PATCH v3 14/15] btrfs: scrub: support subpage data scrub Qu Wenruo
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 19+ messages in thread
From: Qu Wenruo @ 2020-12-02  6:48 UTC (permalink / raw)
  To: linux-btrfs

To support subpage tree block scrub, scrub_checksum_tree_block() only
needs to learn two new tricks:

- Follow the sector size
  Now that scrub_page only represents one sector, we need to follow it
  properly.

- Run the checksum on all sectors
  Since scrub_page only represents one sector, we need to run the hash
  over all sectors, no longer just (nodesize >> PAGE_SHIFT) pages.
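
In sketch form, the checksum loop now walks all sectors of the tree
block (matching the diff below):

	const int num_sectors = fs_info->nodesize >> fs_info->sectorsize_bits;

	crypto_shash_update(shash, kaddr + BTRFS_CSUM_SIZE,
			    sectorsize - BTRFS_CSUM_SIZE);
	for (i = 1; i < num_sectors; i++) {
		kaddr = page_address(sblock->pagev[i]->page);
		crypto_shash_update(shash, kaddr, sectorsize);
	}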

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index efc6f5f2b8a4..a4d30106bacb 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -1808,15 +1808,20 @@ static int scrub_checksum_tree_block(struct scrub_block *sblock)
 	struct scrub_ctx *sctx = sblock->sctx;
 	struct btrfs_header *h;
 	struct btrfs_fs_info *fs_info = sctx->fs_info;
+	const u32 sectorsize = sctx->fs_info->sectorsize;
 	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
 	u8 calculated_csum[BTRFS_CSUM_SIZE];
 	u8 on_disk_csum[BTRFS_CSUM_SIZE];
-	const int num_pages = sctx->fs_info->nodesize >> PAGE_SHIFT;
+	const int num_sectors = fs_info->nodesize >> fs_info->sectorsize_bits;
 	int i;
 	struct scrub_page *spage;
 	char *kaddr;
 
 	BUG_ON(sblock->page_count < 1);
+
+	/* Each pagev[] is in fact just one sector, not a full page */
+	ASSERT(sblock->page_count == num_sectors);
+
 	spage = sblock->pagev[0];
 	kaddr = page_address(spage->page);
 	h = (struct btrfs_header *)kaddr;
@@ -1845,11 +1850,11 @@ static int scrub_checksum_tree_block(struct scrub_block *sblock)
 	shash->tfm = fs_info->csum_shash;
 	crypto_shash_init(shash);
 	crypto_shash_update(shash, kaddr + BTRFS_CSUM_SIZE,
-			    PAGE_SIZE - BTRFS_CSUM_SIZE);
+			    sectorsize - BTRFS_CSUM_SIZE);
 
-	for (i = 1; i < num_pages; i++) {
+	for (i = 1; i < num_sectors; i++) {
 		kaddr = page_address(sblock->pagev[i]->page);
-		crypto_shash_update(shash, kaddr, PAGE_SIZE);
+		crypto_shash_update(shash, kaddr, sectorsize);
 	}
 
 	crypto_shash_final(shash, calculated_csum);
-- 
2.29.2



* [PATCH v3 14/15] btrfs: scrub: support subpage data scrub
  2020-12-02  6:47 [PATCH v3 00/15] btrfs: preparation patches for subpage support Qu Wenruo
                   ` (12 preceding siblings ...)
  2020-12-02  6:48 ` [PATCH v3 13/15] btrfs: scrub: support subpage tree block scrub Qu Wenruo
@ 2020-12-02  6:48 ` Qu Wenruo
  2020-12-02  6:48 ` [PATCH v3 15/15] btrfs: scrub: allow scrub to work with subpage sectorsize Qu Wenruo
  2020-12-04 15:18 ` [PATCH v3 00/15] btrfs: preparation patches for subpage support David Sterba
  15 siblings, 0 replies; 19+ messages in thread
From: Qu Wenruo @ 2020-12-02  6:48 UTC (permalink / raw)
  To: linux-btrfs

Btrfs scrub is in fact much more flexible than the buffered data write
path, as we can read unaligned subpage data into page offset 0.

This ability makes subpage support much easier: we just need to check
each scrub_page::page_len and ensure we only calculate the hash for
[0, page_len) of a page, and call it a day for subpage scrub support.

There is a small thing to notice: for the subpage case, we still scrub
sector by sector.
This means we will submit a read bio for each sector to scrub,
resulting in the same number of read bios as on 4K page systems (for
example, scrubbing 64K of data still takes 16 read bios with a 4K
sector size).

This behavior can be considered a good thing, if we want everything to
behave the same as on 4K page systems.
But it also means we're wasting the ability to submit larger bios
using the 64K page size.
This is another problem to consider in the future.
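
Concretely, the digest is now bounded by the sector size rather than
the page size (this matches the diff below):

	crypto_shash_digest(shash, kaddr, fs_info->sectorsize, csum);
	if (memcmp(csum, spage->csum, fs_info->csum_size))
		sblock->checksum_error = 1;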

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index a4d30106bacb..8a43e8cb10a6 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -1795,11 +1795,15 @@ static int scrub_checksum_data(struct scrub_block *sblock)
 
 	shash->tfm = fs_info->csum_shash;
 	crypto_shash_init(shash);
-	crypto_shash_digest(shash, kaddr, PAGE_SIZE, csum);
 
-	if (memcmp(csum, spage->csum, sctx->fs_info->csum_size))
-		sblock->checksum_error = 1;
+	/*
+	 * In scrub_pages() and scrub_pages_for_parity() we ensure
+	 * each spage only contains just one sector of data.
+	 */
+	crypto_shash_digest(shash, kaddr, fs_info->sectorsize, csum);
 
+	if (memcmp(csum, spage->csum, fs_info->csum_size))
+		sblock->checksum_error = 1;
 	return sblock->checksum_error;
 }
 
-- 
2.29.2



* [PATCH v3 15/15] btrfs: scrub: allow scrub to work with subpage sectorsize
  2020-12-02  6:47 [PATCH v3 00/15] btrfs: preparation patches for subpage support Qu Wenruo
                   ` (13 preceding siblings ...)
  2020-12-02  6:48 ` [PATCH v3 14/15] btrfs: scrub: support subpage data scrub Qu Wenruo
@ 2020-12-02  6:48 ` Qu Wenruo
  2020-12-04 15:18 ` [PATCH v3 00/15] btrfs: preparation patches for subpage support David Sterba
  15 siblings, 0 replies; 19+ messages in thread
From: Qu Wenruo @ 2020-12-02  6:48 UTC (permalink / raw)
  To: linux-btrfs

Since btrfs scrub uses its own infrastructure to submit reads and
writes, scrub is independent of all other routines.

This brings one very neat feature: it allows us to read 4K of data
into offset 0 of a 64K page. The same applies to the writeback
routine.

This makes scrub on subpage sector sizes much easier to implement, and
thanks to the previous commits, which changed the implementation to
always scrub based on sector size, scrub can now handle subpage
filesystems without any problem.

This patch just removes the restriction on (sectorsize != PAGE_SIZE)
to make scrub finally work on subpage filesystems.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 8a43e8cb10a6..e0ac0009303d 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3880,14 +3880,6 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 		return -EINVAL;
 	}
 
-	if (fs_info->sectorsize != PAGE_SIZE) {
-		/* not supported for data w/o checksums */
-		btrfs_err_rl(fs_info,
-			   "scrub: size assumption sectorsize != PAGE_SIZE (%d != %lu) fails",
-		       fs_info->sectorsize, PAGE_SIZE);
-		return -EINVAL;
-	}
-
 	if (fs_info->nodesize >
 	    PAGE_SIZE * SCRUB_MAX_PAGES_PER_BLOCK ||
 	    fs_info->sectorsize > PAGE_SIZE * SCRUB_MAX_PAGES_PER_BLOCK) {
-- 
2.29.2



* Re: [PATCH v3 01/15] btrfs: rename bio_offset of extent_submit_bio_start_t to opt_file_offset
  2020-12-02  6:47 ` [PATCH v3 01/15] btrfs: rename bio_offset of extent_submit_bio_start_t to opt_file_offset Qu Wenruo
@ 2020-12-02  8:12   ` Christoph Hellwig
  2020-12-03 18:45     ` David Sterba
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2020-12-02  8:12 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Wed, Dec 02, 2020 at 02:47:57PM +0800, Qu Wenruo wrote:
> The parameter bio_offset of extent_submit_bio_start_t is very confusing.
> 
> If it's really bio_offset (offset to bio), then it should be u32.
> 
> But in fact, it's only utilized by dio read, and that member is used as
> file offset, which must be u64.
> 
> Rename it to opt_file_offset since the only user uses it as file offset,
> and add comment for who is using it.

I think dio_file_offset might be a better name.


* Re: [PATCH v3 01/15] btrfs: rename bio_offset of extent_submit_bio_start_t to opt_file_offset
  2020-12-02  8:12   ` Christoph Hellwig
@ 2020-12-03 18:45     ` David Sterba
  0 siblings, 0 replies; 19+ messages in thread
From: David Sterba @ 2020-12-03 18:45 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Qu Wenruo, linux-btrfs

On Wed, Dec 02, 2020 at 08:12:36AM +0000, Christoph Hellwig wrote:
> On Wed, Dec 02, 2020 at 02:47:57PM +0800, Qu Wenruo wrote:
> > The parameter bio_offset of extent_submit_bio_start_t is very confusing.
> > 
> > If it's really bio_offset (offset to bio), then it should be u32.
> > 
> > But in fact, it's only utilized by dio read, and that member is used as
> > file offset, which must be u64.
> > 
> > Rename it to opt_file_offset since the only user uses it as file offset,
> > and add comment for who is using it.
> 
> I think dio_file_offset might be a better name.

Sounds good, I'll change it. Thanks.


* Re: [PATCH v3 00/15] btrfs: preparation patches for subpage support
  2020-12-02  6:47 [PATCH v3 00/15] btrfs: preparation patches for subpage support Qu Wenruo
                   ` (14 preceding siblings ...)
  2020-12-02  6:48 ` [PATCH v3 15/15] btrfs: scrub: allow scrub to work with subpage sectorsize Qu Wenruo
@ 2020-12-04 15:18 ` David Sterba
  15 siblings, 0 replies; 19+ messages in thread
From: David Sterba @ 2020-12-04 15:18 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Wed, Dec 02, 2020 at 02:47:56PM +0800, Qu Wenruo wrote:
> This is the rebased preparation branch for all patches not yet merged into
> misc-next.
> 
> It can be fetched from github (with experimental sector aligned data write
> support)
> https://github.com/adam900710/linux/tree/subpage
> 
> This patchset includes all the unmerged preparation patches for subpage
> support.
> 
> The patchset is sent without the main core for subpage support, as
> myself has proven that, big patchset bombarding won't really make
> reviewers happy, but only make the author happy (for a very short time).
> 
> Thanks for the hard work from David, there are only 15 patches unmerged.
> (With 2 new small patches to address u32 u64 problem)
> 
> Patch 01~02:	bio_offset related fixes. Make bio_offset to be u32.
> Patch 03:	Refactor metadata submission for later metadata write
> 		support.
> Patch 04~08:	Metadata related refactor.
> Patch 09~10:	Data related refactor
> Patch 11~15:	Scrub related refactor and cleanup
> 
> For the scrub patch, there was a discussion with David, about whether we
> should use sector size as the unit for metadata scrub.
> 
> His idea is, sector size should be the minimal unit for DATA, not
> metadata. This indicates there is a undefined "minimal unit" of access.
> 
> But my argument is, sector size is the minimal unit for all btrfs
> access, current btrfs has an undefined "data size", and that "data size"
> must equal to sectorsize for current btrfs implementation.
> 
> Thus for "data size" < nodesize case, we should first add support for
> "data size" > sectorsize first.
> 
> Thus I kept the scrub patch untouched, since IMHO sector size is still
> the minimal unit to access, thus iterating using sectorsize is
> completely sane.
> 
> Changelog:
> v1:
> - Separate prep patches from the huge subpage patchset
> 
> - Rebased to misc-next
> 
> - Add more commit message for patch "btrfs: extent_io: remove the
>   extent_start/extent_len for end_bio_extent_readpage()"
>   With one runtime example to explain why we are doing the same thing.
> 
> - Fix the assert_spin_lock() usage
>   What we really want is lockdep_assert_held()
> 
> - Re-iterate the reason why some extent io tests are invalid
>   This is especially important since later patches will reduce
>   extent_buffer::pages[] to bare minimal, killing the ability to
>   handle certain invalid extent buffers.
> 
> - Use sectorsize_bits for division
>   During the convert, we should only use sectorsize_bits for division,
>   this solves the hassle on 32bit system to do division.
>   But we should not use sectorsize_bits no brain, as bit shift is not
>   straight forward as multiple/division.
> 
> - Address the comments for btrfs_lookup_bio_sums() cleanup patchset
>   From naming to macro usages, all of those comments should further
>   improve the readability.
> 
> v2:
> - Remove new extent_io tree features
>   Now we won't utilize extent io tree for subpage support, thus new
>   features along with some aggressive refactor is no longer needed.
> 
> - Reduce extent_io tree operations to reduce endio time latency
>   Although extent_io tree can do a lot of things like page status, but
>   it has obvious overhead, namingly search btree.
>   So keep the original behavior by only calling extent_io operation in a
>   big extent, to reduce latency
> 
> v3:
> - Rebased to latest misc-next
>   Now only 15 patches to submit.
> 
> - Add two new patches to address u32 and u64 problems
>   The root problem is the on-disk format is abusing u64 for its length.
>   We have to draw a line between where we should convert to u32.
>   Currently for bio_offset and extent_len, we can safely use u32.
>   Just to be extra safe, added more ASSERT() for this.
> 
> - Put BTRFS_MAX_METADATA_BLOCKSIZE into uapi
>   To avoid circle including "ctree.h"
> 
> - Add more changelog for the patch enabling subpage scrub
> 
> 
> Qu Wenruo (15):
>   btrfs: rename bio_offset of extent_submit_bio_start_t to
>     opt_file_offset
>   btrfs: pass bio_offset to check_data_csum() directly
>   btrfs: inode: make btrfs_verify_data_csum() follow sector size
>   btrfs: extent_io: extract the btree page submission code into its own
>     helper function
>   btrfs: extent_io: calculate inline extent buffer page size based on
>     page size
>   btrfs: extent_io: don't allow tree block to cross page boundary for
>     subpage support
>   btrfs: extent_io: update num_extent_pages() to support subpage sized
>     extent buffer
>   btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors
>   btrfs: file-item: remove the btrfs_find_ordered_sum() call in
>     btrfs_lookup_bio_sums()
>   btrfs: file-item: refactor btrfs_lookup_bio_sums() to handle
>     out-of-order bvecs
>   btrfs: scrub: reduce the width for extent_len/stripe_len from 64 bits
>     to 32 bits
>   btrfs: scrub: always allocate one full page for one sector for RAID56
>   btrfs: scrub: support subpage tree block scrub
>   btrfs: scrub: support subpage data scrub
>   btrfs: scrub: allow scrub to work with subpage sectorsize

With a few minor fixups it's in misc-next, thanks.


