* [PATCH v2 00/17] btrfs: add subpage support for RAID56
@ 2022-04-12  9:32 Qu Wenruo
  2022-04-12  9:32 ` [PATCH v2 01/17] btrfs: reduce width for stripe_len from u64 to u32 Qu Wenruo
                   ` (17 more replies)
  0 siblings, 18 replies; 34+ messages in thread
From: Qu Wenruo @ 2022-04-12  9:32 UTC (permalink / raw)
  To: linux-btrfs

The branch can be fetched from GitHub; it is based on the latest misc-next
branch (with the bio and memory allocation refactors):
https://github.com/adam900710/linux/tree/subpage_raid56

[CHANGELOG]
v2:
- Rebased to latest misc-next
  There were several conflicts caused by the bio interface changes and
  the page allocation update.

- A new patch to reduce the width of @stripe_len to u32
  Currently @stripe_len is fixed to 64K, and even if we choose to
  enlarge the value in the future, I see no reason to go beyond 4G for
  the stripe length.

  Thus change it to u32 to avoid some u64-divided-by-u32 situations.

  This will reduce memory usage for map_lookup (which has a lifespan as
  long as the mounted fs) and btrfs_io_geometry (which only has a very
  short lifespan, mostly bounded to bio).

  Furthermore, add some extra alignment checks and use right bit shifts
  to replace the involved divisions, to avoid possible problems on 32bit
  systems.
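
  To illustrate the shift-for-division idea, here is a minimal userspace
  sketch (constants and helper names are illustrative, not the kernel
  code): since the stripe length is a power of two that fits in u32, the
  u64-by-u32 division becomes a shift and a mask:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: the current fixed 64K stripe length. */
#define STRIPE_LEN	(64U * 1024)
#define STRIPE_SHIFT	16

/* Which stripe a chunk-relative offset falls in: a shift, not a u64 division. */
static uint64_t stripe_nr_of(uint64_t offset)
{
	return offset >> STRIPE_SHIFT;
}

/* Offset inside that stripe: the remainder always fits in u32. */
static uint32_t stripe_offset_of(uint64_t offset)
{
	return (uint32_t)(offset & (STRIPE_LEN - 1));
}
```

  On 32-bit systems this avoids calling the expensive 64-bit division
  helpers entirely.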

- Pack sector_ptr::pgoff and sector_ptr::uptodate into one u32
  This will reduce memory usage and avoid unaligned memory access.

  Please note that, even with the members packed, we still have 4 bytes
  of padding (it's u64 + u32, thus not perfectly aligned).
  Without the packed attribute, it costs more memory anyway.
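
  A userspace sketch of the layout trade-off described above (sizes
  assume a typical LP64 target such as x86_64; `void *` stands in for
  `struct page *`, and the GCC/Clang packed attribute is used for the
  comparison):

```c
#include <assert.h>

/* Bitfield version used by the series: pgoff and uptodate share one u32. */
struct sector_ptr_ex {
	void *page;
	unsigned int pgoff:24;
	unsigned int uptodate:8;
};

/* The same members with the packed attribute (avoided as too exotic). */
struct sector_ptr_packed_ex {
	void *page;
	unsigned int pgoff:24;
	unsigned int uptodate:8;
} __attribute__((packed));
```

  sizeof(struct sector_ptr_ex) is 16: the 8-byte pointer alignment
  leaves 4 bytes of tail padding after the u32. Only the packed variant
  shrinks to 12 bytes.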

- Call kunmap_local() using the address with pgoff included
  kunmap_local() can handle such an address without problem, so there is
  no need to bother with an extra search just to strip the pgoff.

- Use "= { 0 }" for structure initialization

- Keep comment updates to a minimum
  If a comment line is not really touched otherwise, don't touch it just
  to fix some bad style.

[DESIGN]
To make the RAID56 code support subpage, we need to make every sector of
a RAID56 full stripe (including P/Q) addressable.

Previously we used page pointers directly for things like stripe_pages:

A full stripe is 64K * 3: two data stripes plus one P stripe (RAID5)

stripe_pages:   | 0 | 1 | 2 |  .. | 46 | 47 |

Each of those 48 pointers points to a page we allocated.


The new subpage support will introduce a sector layer, based on the old
stripe_pages[] array:

The same 64K * 3 RAID5 layout, but with 64K page size and 4K sector size:

stripe_sectors: | 0 | 1 | .. |15 |16 |  ...  |31 |32 | ..    |47 |
stripe_pages:   |      Page 0    |     Page 1    |    Page 2     |

Each sector_ptr in stripe_sectors[] will include:

- One page pointer
  Points back to the page inside stripe_pages[].

- One pgoff member
  To indicate the offset inside the page

- One uptodate member
  To indicate whether the sector is uptodate, replacing the old
  PageUptodate flag.
  Introducing the btrfs_subpage structure for stripe_pages[] would be
  overkill, as we only care about the Uptodate flag.

The same applies to the bio_sectors[] array, which is going to
completely replace the old bio_pages[] array.
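
The mapping in the diagram above is pure arithmetic; a userspace sketch
(names are illustrative) with a 64K page and 4K sector size:

```c
#include <assert.h>

#define PAGE_SIZE_EX	(64U * 1024)	/* 64K pages, as in the example */
#define SECTORSIZE_EX	(4U * 1024)	/* 4K sectors */

/* Which stripe_pages[] entry holds sector nr of the full stripe. */
static unsigned int sector_page_index(unsigned int sector_nr)
{
	return sector_nr * SECTORSIZE_EX / PAGE_SIZE_EX;
}

/* And the byte offset of that sector inside its page. */
static unsigned int sector_pgoff(unsigned int sector_nr)
{
	return sector_nr * SECTORSIZE_EX % PAGE_SIZE_EX;
}
```

With these helpers, sector 15 lands in page 0 at offset 60K and sector
16 starts page 1, exactly as drawn above.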

[SIDE EFFECT]
Despite the obvious new ability for subpage cases to utilize btrfs
RAID56 profiles, it will increase the memory usage of the real
btrfs_raid_bio structure.

We allocate extra memory based on the stripe size and number of stripes,
and update the pointer arrays to utilize the extra memory.

For comparison, we use a pretty standard setup: 3-disk RAID5, 4K page
size on x86_64:

 Before: 1176
 After:  2320 (+97.3%)

The reasons for such a big bump are:

- Extra size for sector_ptr
  Instead of just a page pointer, each entry is now twice the size of a
  pointer (a page pointer plus 2 * unsigned int).

  This means that although we replace bio_pages[] with bio_sectors[], we
  are still enlarging the size.

- A completely new array for stripe_sectors[]
  And the array itself is also twice the size of the old stripe_pages[].

- Extra padding for sector_ptr
  Since we don't have packed attribute anymore, the real size of
  a sector_ptr is 16 bytes, not 12 bytes.

There are some attempts to reduce the size of btrfs_raid_bio itself, but
the big impact still comes from the new sector_ptr arrays.

Without exotic macros or the packed attribute, I don't have any better
idea for reducing the real size of btrfs_raid_bio.
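
The extra memory all comes from one enlarged allocation: btrfs_raid_bio
and its arrays live in a single buffer whose tail is carved up member by
member (the kernel code does this with a CONSUME_ALLOC macro). A toy
userspace sketch of the pattern, with hypothetical names:

```c
#include <assert.h>
#include <stdlib.h>

/* A toy rbio with two trailing arrays carved out of one allocation. */
struct rbio_ex {
	unsigned int nr_pages;
	unsigned int nr_sectors;
	void **stripe_pages;		/* points into the same buffer */
	unsigned int *sector_pgoff;	/* ditto */
};

static struct rbio_ex *alloc_rbio_ex(unsigned int num_pages,
				     unsigned int num_sectors)
{
	struct rbio_ex *rbio;
	char *p;

	rbio = calloc(1, sizeof(*rbio) +
			 sizeof(*rbio->stripe_pages) * num_pages +
			 sizeof(*rbio->sector_pgoff) * num_sectors);
	if (!rbio)
		return NULL;

	/* Hand out consecutive slices of the tail. */
	p = (char *)(rbio + 1);
	rbio->stripe_pages = (void **)p;
	p += sizeof(*rbio->stripe_pages) * num_pages;
	rbio->sector_pgoff = (unsigned int *)p;

	rbio->nr_pages = num_pages;
	rbio->nr_sectors = num_sectors;
	return rbio;
}
```

Every new per-sector array added by the series grows this one
allocation, which is where the size bump above comes from.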

[TESTS]
Due to a recent error path exposed by generic/475, only the btrfs test
groups are run; otherwise the run will always hang at generic/475 due to
unrelated bugs.

Both x86_64 and aarch64 (64K page size) pass the full btrfs test group
without new regressions.

[PATCHSET LAYOUT]
The patchset layout takes several things into consideration:

- Every patch can be tested independently on x86_64
  No more piles of unused helpers followed by one big switch; every
  change can be verified on x86_64.

- More temporary sanity checks than the previous code
  For example, when rbio_add_io_page() is converted to be subpage
  compatible, extra ASSERT()s are added to ensure no subpage range
  can ever be added.

  Such temporary checks are removed in the last enablement patch.
  This is to make testing on x86_64 more comprehensive.

- Mostly small changes in each patch
  The only exception is the conversion of rbio_add_io_page(), but most
  of the change in that patch comes from variable renaming.
  The overall number of lines changed in each patch should still be
  small enough for review.

Qu Wenruo (17):
  btrfs: reduce width for stripe_len from u64 to u32
  btrfs: open-code rbio_nr_pages()
  btrfs: make btrfs_raid_bio more compact
  btrfs: introduce new cached members for btrfs_raid_bio
  btrfs: introduce btrfs_raid_bio::stripe_sectors
  btrfs: introduce btrfs_raid_bio::bio_sectors
  btrfs: make rbio_add_io_page() subpage compatible
  btrfs: make finish_parity_scrub() subpage compatible
  btrfs: make __raid_recover_endio_io() subpage compatibable
  btrfs: make finish_rmw() subpage compatible
  btrfs: open-code rbio_stripe_page_index()
  btrfs: make raid56_add_scrub_pages() subpage compatible
  btrfs: remove btrfs_raid_bio::bio_pages array
  btrfs: make set_bio_pages_uptodate() subpage compatible
  btrfs: make steal_rbio() subpage compatible
  btrfs: make alloc_rbio_essential_pages() subpage compatible
  btrfs: enable subpage support for RAID56

 fs/btrfs/disk-io.c |   8 -
 fs/btrfs/raid56.c  | 749 +++++++++++++++++++++++++++------------------
 fs/btrfs/raid56.h  |   8 +-
 fs/btrfs/scrub.c   |   6 +-
 fs/btrfs/volumes.c |  27 +-
 fs/btrfs/volumes.h |   8 +-
 6 files changed, 479 insertions(+), 327 deletions(-)

-- 
2.35.1



* [PATCH v2 01/17] btrfs: reduce width for stripe_len from u64 to u32
  2022-04-12  9:32 [PATCH v2 00/17] btrfs: add subpage support for RAID56 Qu Wenruo
@ 2022-04-12  9:32 ` Qu Wenruo
  2022-04-12  9:32 ` [PATCH v2 02/17] btrfs: open-code rbio_nr_pages() Qu Wenruo
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Qu Wenruo @ 2022-04-12  9:32 UTC (permalink / raw)
  To: linux-btrfs

Currently btrfs uses a fixed stripe length (64K), thus u32 is wide
enough for the usage.

Furthermore, even if we choose to enlarge the stripe length in the
future, I don't believe we would want stripes as large as 4G or larger.

So this patch will reduce the width of all involved in-memory structures
and parameters. This involves:

- RAID56 related function argument lists
  This allows us to do direct division related to stripe_len, although
  we will use bit shifts to replace the divisions anyway.

- btrfs_io_geometry structure
  This involves one change to simplify the calculation of both
  @stripe_nr and @stripe_offset, using div64_u64_rem(), and adds an
  extra sanity check to make sure @stripe_offset always fits in u32.

  This saves 8 bytes for the structure.

- map_lookup structure
  This converts @stripe_len to u32, which saves 8 bytes in total (4
  bytes from the member itself, plus a removed 4-byte hole).
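
The div64_u64_rem() change can be sketched in userspace as follows
(div64_u64_rem_ex() and geometry_ex() are illustrative stand-ins, not
the kernel code; the kernel helper likewise yields quotient and
remainder in one call):

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's div64_u64_rem(). */
static uint64_t div64_u64_rem_ex(uint64_t dividend, uint64_t divisor,
				 uint64_t *remainder)
{
	*remainder = dividend % divisor;
	return dividend / divisor;
}

/* The simplified geometry calculation: one call yields both values. */
static uint64_t geometry_ex(uint64_t offset, uint32_t stripe_len,
			    uint32_t *stripe_offset)
{
	uint64_t rem;
	uint64_t stripe_nr = div64_u64_rem_ex(offset, stripe_len, &rem);

	/* rem < stripe_len <= U32_MAX, so the narrowing cast is safe. */
	assert(rem < stripe_len);
	*stripe_offset = (uint32_t)rem;
	return stripe_nr;
}
```

This replaces the old multiply-and-subtract sequence and makes the
"stripe math has gone wrong" branch unnecessary.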

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/raid56.c  | 15 ++++++++-------
 fs/btrfs/raid56.h  |  6 +++---
 fs/btrfs/volumes.c | 20 +++++++-------------
 fs/btrfs/volumes.h |  8 ++++----
 4 files changed, 22 insertions(+), 27 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 79438cdd604e..daffa3076325 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -949,9 +949,10 @@ static struct page *page_in_rbio(struct btrfs_raid_bio *rbio,
  * number of pages we need for the entire stripe across all the
  * drives
  */
-static unsigned long rbio_nr_pages(unsigned long stripe_len, int nr_stripes)
+static unsigned long rbio_nr_pages(u32 stripe_len, int nr_stripes)
 {
-	return DIV_ROUND_UP(stripe_len, PAGE_SIZE) * nr_stripes;
+	ASSERT(IS_ALIGNED(stripe_len, PAGE_SIZE));
+	return (stripe_len >> PAGE_SHIFT) * nr_stripes;
 }
 
 /*
@@ -960,13 +961,13 @@ static unsigned long rbio_nr_pages(unsigned long stripe_len, int nr_stripes)
  */
 static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
 					 struct btrfs_io_context *bioc,
-					 u64 stripe_len)
+					 u32 stripe_len)
 {
 	struct btrfs_raid_bio *rbio;
 	int nr_data = 0;
 	int real_stripes = bioc->num_stripes - bioc->num_tgtdevs;
 	int num_pages = rbio_nr_pages(stripe_len, real_stripes);
-	int stripe_npages = DIV_ROUND_UP(stripe_len, PAGE_SIZE);
+	int stripe_npages = stripe_len >> PAGE_SHIFT;
 	void *p;
 
 	rbio = kzalloc(sizeof(*rbio) +
@@ -1692,7 +1693,7 @@ static void btrfs_raid_unplug(struct blk_plug_cb *cb, bool from_schedule)
  * our main entry point for writes from the rest of the FS.
  */
 int raid56_parity_write(struct bio *bio, struct btrfs_io_context *bioc,
-			u64 stripe_len)
+			u32 stripe_len)
 {
 	struct btrfs_fs_info *fs_info = bioc->fs_info;
 	struct btrfs_raid_bio *rbio;
@@ -2089,7 +2090,7 @@ static int __raid56_parity_recover(struct btrfs_raid_bio *rbio)
  * of the drive.
  */
 int raid56_parity_recover(struct bio *bio, struct btrfs_io_context *bioc,
-			  u64 stripe_len, int mirror_num, int generic_io)
+			  u32 stripe_len, int mirror_num, int generic_io)
 {
 	struct btrfs_fs_info *fs_info = bioc->fs_info;
 	struct btrfs_raid_bio *rbio;
@@ -2195,7 +2196,7 @@ static void read_rebuild_work(struct btrfs_work *work)
 
 struct btrfs_raid_bio *raid56_parity_alloc_scrub_rbio(struct bio *bio,
 				struct btrfs_io_context *bioc,
-				u64 stripe_len, struct btrfs_device *scrub_dev,
+				u32 stripe_len, struct btrfs_device *scrub_dev,
 				unsigned long *dbitmap, int stripe_nsectors)
 {
 	struct btrfs_fs_info *fs_info = bioc->fs_info;
diff --git a/fs/btrfs/raid56.h b/fs/btrfs/raid56.h
index 72c00fc284b5..fb35ae157b02 100644
--- a/fs/btrfs/raid56.h
+++ b/fs/btrfs/raid56.h
@@ -31,15 +31,15 @@ struct btrfs_raid_bio;
 struct btrfs_device;
 
 int raid56_parity_recover(struct bio *bio, struct btrfs_io_context *bioc,
-			  u64 stripe_len, int mirror_num, int generic_io);
+			  u32 stripe_len, int mirror_num, int generic_io);
 int raid56_parity_write(struct bio *bio, struct btrfs_io_context *bioc,
-			u64 stripe_len);
+			u32 stripe_len);
 
 void raid56_add_scrub_pages(struct btrfs_raid_bio *rbio, struct page *page,
 			    u64 logical);
 
 struct btrfs_raid_bio *raid56_parity_alloc_scrub_rbio(struct bio *bio,
-				struct btrfs_io_context *bioc, u64 stripe_len,
+				struct btrfs_io_context *bioc, u32 stripe_len,
 				struct btrfs_device *scrub_dev,
 				unsigned long *dbitmap, int stripe_nsectors);
 void raid56_parity_submit_scrub_rbio(struct btrfs_raid_bio *rbio);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 2368a2ffbee7..93072a090fdd 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6313,7 +6313,7 @@ int btrfs_get_io_geometry(struct btrfs_fs_info *fs_info, struct extent_map *em,
 	u64 offset;
 	u64 stripe_offset;
 	u64 stripe_nr;
-	u64 stripe_len;
+	u32 stripe_len;
 	u64 raid56_full_stripe_start = (u64)-1;
 	int data_stripes;
 
@@ -6324,19 +6324,13 @@ int btrfs_get_io_geometry(struct btrfs_fs_info *fs_info, struct extent_map *em,
 	offset = logical - em->start;
 	/* Len of a stripe in a chunk */
 	stripe_len = map->stripe_len;
-	/* Stripe where this block falls in */
-	stripe_nr = div64_u64(offset, stripe_len);
-	/* Offset of stripe in the chunk */
-	stripe_offset = stripe_nr * stripe_len;
-	if (offset < stripe_offset) {
-		btrfs_crit(fs_info,
-"stripe math has gone wrong, stripe_offset=%llu offset=%llu start=%llu logical=%llu stripe_len=%llu",
-			stripe_offset, offset, em->start, logical, stripe_len);
-		return -EINVAL;
-	}
+	/*
+	 * Stripe_nr is where this block falls in
+	 * stripe_offset is the offset of this block in its stripe.
+	 */
+	stripe_nr = div64_u64_rem(offset, stripe_len, &stripe_offset);
+	ASSERT(stripe_offset < U32_MAX);
 
-	/* stripe_offset is the offset of this block in its stripe */
-	stripe_offset = offset - stripe_offset;
 	data_stripes = nr_data_stripes(map);
 
 	/* Only stripe based profiles needs to check against stripe length. */
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index bd297f23d19e..0332a0a25483 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -23,11 +23,11 @@ struct btrfs_io_geometry {
 	/* offset of logical address in chunk */
 	u64 offset;
 	/* length of single IO stripe */
-	u64 stripe_len;
+	u32 stripe_len;
+	/* offset of address in stripe */
+	u32 stripe_offset;
 	/* number of stripe where address falls */
 	u64 stripe_nr;
-	/* offset of address in stripe */
-	u64 stripe_offset;
 	/* offset of raid56 stripe into the chunk */
 	u64 raid56_stripe_offset;
 };
@@ -427,7 +427,7 @@ struct map_lookup {
 	u64 type;
 	int io_align;
 	int io_width;
-	u64 stripe_len;
+	u32 stripe_len;
 	int num_stripes;
 	int sub_stripes;
 	int verified_stripes; /* For mount time dev extent verification */
-- 
2.35.1



* [PATCH v2 02/17] btrfs: open-code rbio_nr_pages()
  2022-04-12  9:32 [PATCH v2 00/17] btrfs: add subpage support for RAID56 Qu Wenruo
  2022-04-12  9:32 ` [PATCH v2 01/17] btrfs: reduce width for stripe_len from u64 to u32 Qu Wenruo
@ 2022-04-12  9:32 ` Qu Wenruo
  2022-04-12  9:32 ` [PATCH v2 03/17] btrfs: make btrfs_raid_bio more compact Qu Wenruo
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Qu Wenruo @ 2022-04-12  9:32 UTC (permalink / raw)
  To: linux-btrfs

The function rbio_nr_pages() is only called once inside alloc_rbio(),
so there is no reason to keep it as a dedicated helper.

Furthermore, the return type doesn't match: the function returns
"unsigned long", which may not be necessary, while the only caller uses
"int".

Since we're cleaning up here, also fix the type to "const unsigned int"
for all involved local variables.

This change will slightly modify the timing of the ASSERT() on
@stripe_len though.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/raid56.c | 18 +++++-------------
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index daffa3076325..a16297b04db6 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -945,16 +945,6 @@ static struct page *page_in_rbio(struct btrfs_raid_bio *rbio,
 	return rbio->stripe_pages[chunk_page];
 }
 
-/*
- * number of pages we need for the entire stripe across all the
- * drives
- */
-static unsigned long rbio_nr_pages(u32 stripe_len, int nr_stripes)
-{
-	ASSERT(IS_ALIGNED(stripe_len, PAGE_SIZE));
-	return (stripe_len >> PAGE_SHIFT) * nr_stripes;
-}
-
 /*
  * allocation and initial setup for the btrfs_raid_bio.  Not
  * this does not allocate any pages for rbio->pages.
@@ -963,13 +953,15 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
 					 struct btrfs_io_context *bioc,
 					 u32 stripe_len)
 {
+	const unsigned int real_stripes = bioc->num_stripes - bioc->num_tgtdevs;
+	const unsigned int stripe_npages = stripe_len >> PAGE_SHIFT;
+	const unsigned int num_pages = stripe_npages * real_stripes;
 	struct btrfs_raid_bio *rbio;
 	int nr_data = 0;
-	int real_stripes = bioc->num_stripes - bioc->num_tgtdevs;
-	int num_pages = rbio_nr_pages(stripe_len, real_stripes);
-	int stripe_npages = stripe_len >> PAGE_SHIFT;
 	void *p;
 
+	ASSERT(IS_ALIGNED(stripe_len, PAGE_SIZE));
+
 	rbio = kzalloc(sizeof(*rbio) +
 		       sizeof(*rbio->stripe_pages) * num_pages +
 		       sizeof(*rbio->bio_pages) * num_pages +
-- 
2.35.1



* [PATCH v2 03/17] btrfs: make btrfs_raid_bio more compact
  2022-04-12  9:32 [PATCH v2 00/17] btrfs: add subpage support for RAID56 Qu Wenruo
  2022-04-12  9:32 ` [PATCH v2 01/17] btrfs: reduce width for stripe_len from u64 to u32 Qu Wenruo
  2022-04-12  9:32 ` [PATCH v2 02/17] btrfs: open-code rbio_nr_pages() Qu Wenruo
@ 2022-04-12  9:32 ` Qu Wenruo
  2022-04-12  9:32 ` [PATCH v2 04/17] btrfs: introduce new cached members for btrfs_raid_bio Qu Wenruo
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Qu Wenruo @ 2022-04-12  9:32 UTC (permalink / raw)
  To: linux-btrfs

There are a lot of members in btrfs_raid_bio using a much larger type
than necessary, like nr_pages, which represents the total number of
pages in a full stripe.

Instead of int (which is at least 32 bits), u16 is already enough (the
max stripe length would be 256MiB, already beyond the current raid56
device number limit).

So this patch will reduce the width of the following members:

- stripe_len to u32
- nr_pages to u16
- nr_data to u8
- real_stripes to u8
- scrubp to u8
- faila/b to s8
  As -1 is used to indicate no corruption

This will slightly reduce the size of btrfs_raid_bio from 272 bytes to
256 bytes, a 16 byte reduction.

But please note that, when using btrfs_raid_bio, we allocate extra space
for it to cover the various pointer arrays, so the reduced memory is not
really a big saving overall.

Since we're modifying the comments here anyway, update the existing
comments to the current code standard.
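
A userspace sketch of the width reduction, covering only the narrowed
members (illustrative struct names; the real struct has many more
fields, so the absolute sizes differ, and sizes below assume typical C
ABI alignment):

```c
#include <assert.h>
#include <stdint.h>

/* Before: every counter is an int. */
struct widths_before_ex {
	int stripe_len;
	int nr_data;
	int real_stripes;
	int stripe_npages;
	int faila;
	int failb;
	int scrubp;
	int nr_pages;
};

/* After: each member only as wide as its value range requires. */
struct widths_after_ex {
	uint32_t stripe_len;
	uint16_t nr_pages;
	uint8_t nr_data;
	uint8_t real_stripes;
	uint8_t stripe_npages;
	int8_t faila;		/* signed: -1 means no corruption */
	int8_t failb;
	uint8_t scrubp;
};
```

Ordering the members from widest to narrowest also avoids introducing
new padding holes.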

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/raid56.c | 42 ++++++++++++++++++++++--------------------
 1 file changed, 22 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index a16297b04db6..8cf0f368729a 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -101,15 +101,6 @@ struct btrfs_raid_bio {
 	 */
 	unsigned long flags;
 
-	/* size of each individual stripe on disk */
-	int stripe_len;
-
-	/* number of data stripes (no p/q) */
-	int nr_data;
-
-	int real_stripes;
-
-	int stripe_npages;
 	/*
 	 * set if we're doing a parity rebuild
 	 * for a read from higher up, which is handled
@@ -118,21 +109,32 @@ struct btrfs_raid_bio {
 	 */
 	enum btrfs_rbio_ops operation;
 
-	/* first bad stripe */
-	int faila;
+	/* Size of each individual stripe on disk */
+	u32 stripe_len;
 
-	/* second bad stripe (for raid6 use) */
-	int failb;
+	/* How many pages there are for the full stripe including P/Q */
+	u16 nr_pages;
 
-	int scrubp;
-	/*
-	 * number of pages needed to represent the full
-	 * stripe
-	 */
-	int nr_pages;
+	/* Number of data stripes (no p/q) */
+	u8 nr_data;
+
+	/* Number of all stripes (including P/Q) */
+	u8 real_stripes;
+
+	/* How many pages there are for each stripe */
+	u8 stripe_npages;
+
+	/* First bad stripe, -1 means no corruption */
+	s8 faila;
+
+	/* Second bad stripe (for raid6 use) */
+	s8 failb;
+
+	/* Stripe number that we're scrubbing  */
+	u8 scrubp;
 
 	/*
-	 * size of all the bios in the bio_list.  This
+	 * Size of all the bios in the bio_list.  This
 	 * helps us decide if the rbio maps to a full
 	 * stripe or not
 	 */
-- 
2.35.1



* [PATCH v2 04/17] btrfs: introduce new cached members for btrfs_raid_bio
  2022-04-12  9:32 [PATCH v2 00/17] btrfs: add subpage support for RAID56 Qu Wenruo
                   ` (2 preceding siblings ...)
  2022-04-12  9:32 ` [PATCH v2 03/17] btrfs: make btrfs_raid_bio more compact Qu Wenruo
@ 2022-04-12  9:32 ` Qu Wenruo
  2022-04-15  5:31   ` Christoph Hellwig
  2022-04-12  9:32 ` [PATCH v2 05/17] btrfs: introduce btrfs_raid_bio::stripe_sectors Qu Wenruo
                   ` (13 subsequent siblings)
  17 siblings, 1 reply; 34+ messages in thread
From: Qu Wenruo @ 2022-04-12  9:32 UTC (permalink / raw)
  To: linux-btrfs

The new members are all related to the number of sectors, but the
existing page-based members are kept as is:

- nr_sectors
  Total number of sectors in the full stripe, including P/Q.

- stripe_nsectors
  The number of sectors in a single stripe.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/raid56.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 8cf0f368729a..92d464d4456a 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -115,6 +115,9 @@ struct btrfs_raid_bio {
 	/* How many pages there are for the full stripe including P/Q */
 	u16 nr_pages;
 
+	/* How many sectors there are for the full stripe including P/Q */
+	u16 nr_sectors;
+
 	/* Number of data stripes (no p/q) */
 	u8 nr_data;
 
@@ -124,6 +127,9 @@ struct btrfs_raid_bio {
 	/* How many pages there are for each stripe */
 	u8 stripe_npages;
 
+	/* How many sectors there are for each stripe */
+	u8 stripe_nsectors;
+
 	/* First bad stripe, -1 means no corruption */
 	s8 faila;
 
@@ -172,7 +178,7 @@ struct btrfs_raid_bio {
 	/* allocated with real_stripes-many pointers for finish_*() calls */
 	void **finish_pointers;
 
-	/* allocated with stripe_npages-many bits for finish_*() calls */
+	/* Allocated with stripe_nsectors-many bits for finish_*() calls */
 	unsigned long *finish_pbitmap;
 };
 
@@ -958,19 +964,24 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
 	const unsigned int real_stripes = bioc->num_stripes - bioc->num_tgtdevs;
 	const unsigned int stripe_npages = stripe_len >> PAGE_SHIFT;
 	const unsigned int num_pages = stripe_npages * real_stripes;
+	const unsigned int stripe_nsectors = stripe_len >>
+					     fs_info->sectorsize_bits;
+	const unsigned int num_sectors = stripe_nsectors * real_stripes;
 	struct btrfs_raid_bio *rbio;
 	int nr_data = 0;
 	void *p;
 
 	ASSERT(IS_ALIGNED(stripe_len, PAGE_SIZE));
+	/* PAGE_SIZE must also be aligned to sectorsize for subpage support */
+	ASSERT(IS_ALIGNED(PAGE_SIZE, fs_info->sectorsize));
 
 	rbio = kzalloc(sizeof(*rbio) +
 		       sizeof(*rbio->stripe_pages) * num_pages +
 		       sizeof(*rbio->bio_pages) * num_pages +
 		       sizeof(*rbio->finish_pointers) * real_stripes +
-		       sizeof(*rbio->dbitmap) * BITS_TO_LONGS(stripe_npages) +
+		       sizeof(*rbio->dbitmap) * BITS_TO_LONGS(stripe_nsectors) +
 		       sizeof(*rbio->finish_pbitmap) *
-				BITS_TO_LONGS(stripe_npages),
+				BITS_TO_LONGS(stripe_nsectors),
 		       GFP_NOFS);
 	if (!rbio)
 		return ERR_PTR(-ENOMEM);
@@ -983,8 +994,10 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
 	rbio->bioc = bioc;
 	rbio->stripe_len = stripe_len;
 	rbio->nr_pages = num_pages;
+	rbio->nr_sectors = num_sectors;
 	rbio->real_stripes = real_stripes;
 	rbio->stripe_npages = stripe_npages;
+	rbio->stripe_nsectors = stripe_nsectors;
 	rbio->faila = -1;
 	rbio->failb = -1;
 	refcount_set(&rbio->refs, 1);
@@ -1003,8 +1016,8 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
 	CONSUME_ALLOC(rbio->stripe_pages, num_pages);
 	CONSUME_ALLOC(rbio->bio_pages, num_pages);
 	CONSUME_ALLOC(rbio->finish_pointers, real_stripes);
-	CONSUME_ALLOC(rbio->dbitmap, BITS_TO_LONGS(stripe_npages));
-	CONSUME_ALLOC(rbio->finish_pbitmap, BITS_TO_LONGS(stripe_npages));
+	CONSUME_ALLOC(rbio->dbitmap, BITS_TO_LONGS(stripe_nsectors));
+	CONSUME_ALLOC(rbio->finish_pbitmap, BITS_TO_LONGS(stripe_nsectors));
 #undef  CONSUME_ALLOC
 
 	if (bioc->map_type & BTRFS_BLOCK_GROUP_RAID5)
-- 
2.35.1



* [PATCH v2 05/17] btrfs: introduce btrfs_raid_bio::stripe_sectors
  2022-04-12  9:32 [PATCH v2 00/17] btrfs: add subpage support for RAID56 Qu Wenruo
                   ` (3 preceding siblings ...)
  2022-04-12  9:32 ` [PATCH v2 04/17] btrfs: introduce new cached members for btrfs_raid_bio Qu Wenruo
@ 2022-04-12  9:32 ` Qu Wenruo
  2022-04-12  9:32 ` [PATCH v2 06/17] btrfs: introduce btrfs_raid_bio::bio_sectors Qu Wenruo
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Qu Wenruo @ 2022-04-12  9:32 UTC (permalink / raw)
  To: linux-btrfs

The new member is an array of sector_ptr structures, which will
represent all sectors inside a full stripe (including P/Q).

They cooperate with btrfs_raid_bio::stripe_pages:

stripe_pages:   | Page 0, range [0, 64K)   | Page 1 ...
stripe_sectors: |  |  | ...             |  |
                  |  |                    \- sector 15, page 0, pgoff=60K
                  |  \- sector 1, page 0, pgoff=4K
                  \---- sector 0, page 0, pgoff=0

With such structure, we can represent subpage sectors without using
extra pages.

Here we introduce a new helper, index_stripe_sectors(), to update
stripe_sectors[] to point to the correct page and pgoff.

So every time an rbio::stripe_pages[] pointer gets updated, the new
helper should be called.

The following functions have to call the new helper:

- steal_rbio()
- alloc_rbio_pages()
- alloc_rbio_parity_pages()
- alloc_rbio_essential_pages()
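
A userspace model of what the new helper does (illustrative names, with
16K pages and 4K sectors so that two pages are enough to show the
wrap-around):

```c
#include <assert.h>

#define PAGE_SZ		16384u	/* 16K pages for the example */
#define SECTORSIZE	4096u	/* 4K sectors */

struct sector_ptr_ex {
	void *page;
	unsigned int pgoff;
};

/* Re-point every sector at its page after stripe_pages[] changed. */
static void index_stripe_sectors_ex(void **stripe_pages,
				    struct sector_ptr_ex *stripe_sectors,
				    unsigned int nr_sectors)
{
	unsigned int offset = 0;
	unsigned int i;

	for (i = 0; i < nr_sectors; i++, offset += SECTORSIZE) {
		stripe_sectors[i].page = stripe_pages[offset / PAGE_SZ];
		stripe_sectors[i].pgoff = offset % PAGE_SZ;
	}
}
```

Because the sector array only stores page pointer plus offset, re-running
the loop after any page replacement is enough to keep it consistent.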

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/raid56.c | 65 +++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 60 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 92d464d4456a..0c7b6a06d5ce 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -52,6 +52,16 @@ struct btrfs_stripe_hash_table {
 	struct btrfs_stripe_hash table[];
 };
 
+/*
+ * A bvec like structure to present a sector inside a page.
+ *
+ * Unlike bvec we don't need bv_len, as it's fixed to sectorsize.
+ */
+struct sector_ptr {
+	struct page *page;
+	unsigned int pgoff;
+};
+
 enum btrfs_rbio_ops {
 	BTRFS_RBIO_WRITE,
 	BTRFS_RBIO_READ_REBUILD,
@@ -171,8 +181,12 @@ struct btrfs_raid_bio {
 	struct page **bio_pages;
 
 	/*
-	 * bitmap to record which horizontal stripe has data
+	 * For subpage support, we need to map each sector to above
+	 * stripe_pages.
 	 */
+	struct sector_ptr *stripe_sectors;
+
+	/* Bitmap to record which horizontal stripe has data */
 	unsigned long *dbitmap;
 
 	/* allocated with real_stripes-many pointers for finish_*() calls */
@@ -291,6 +305,26 @@ static int rbio_bucket(struct btrfs_raid_bio *rbio)
 	return hash_64(num >> 16, BTRFS_STRIPE_HASH_TABLE_BITS);
 }
 
+/*
+ * Update the stripe_sectors[] array to use correct page and pgoff
+ *
+ * Should be called every time any page pointer in stripe_pages[] gets modified.
+ */
+static void index_stripe_sectors(struct btrfs_raid_bio *rbio)
+{
+	const u32 sectorsize = rbio->bioc->fs_info->sectorsize;
+	u32 offset;
+	int i;
+
+	for (i = 0, offset = 0; i < rbio->nr_sectors; i++, offset += sectorsize) {
+		int page_index = offset >> PAGE_SHIFT;
+
+		ASSERT(page_index < rbio->nr_pages);
+		rbio->stripe_sectors[i].page = rbio->stripe_pages[page_index];
+		rbio->stripe_sectors[i].pgoff = offset_in_page(offset);
+	}
+}
+
 /*
  * stealing an rbio means taking all the uptodate pages from the stripe
  * array in the source rbio and putting them into the destination rbio
@@ -317,6 +351,8 @@ static void steal_rbio(struct btrfs_raid_bio *src, struct btrfs_raid_bio *dest)
 		dest->stripe_pages[i] = s;
 		src->stripe_pages[i] = NULL;
 	}
+	index_stripe_sectors(dest);
+	index_stripe_sectors(src);
 }
 
 /*
@@ -978,6 +1014,7 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
 	rbio = kzalloc(sizeof(*rbio) +
 		       sizeof(*rbio->stripe_pages) * num_pages +
 		       sizeof(*rbio->bio_pages) * num_pages +
+		       sizeof(*rbio->stripe_sectors) * num_sectors +
 		       sizeof(*rbio->finish_pointers) * real_stripes +
 		       sizeof(*rbio->dbitmap) * BITS_TO_LONGS(stripe_nsectors) +
 		       sizeof(*rbio->finish_pbitmap) *
@@ -1015,6 +1052,7 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
 	} while (0)
 	CONSUME_ALLOC(rbio->stripe_pages, num_pages);
 	CONSUME_ALLOC(rbio->bio_pages, num_pages);
+	CONSUME_ALLOC(rbio->stripe_sectors, num_sectors);
 	CONSUME_ALLOC(rbio->finish_pointers, real_stripes);
 	CONSUME_ALLOC(rbio->dbitmap, BITS_TO_LONGS(stripe_nsectors));
 	CONSUME_ALLOC(rbio->finish_pbitmap, BITS_TO_LONGS(stripe_nsectors));
@@ -1031,19 +1069,35 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
 	return rbio;
 }
 
-/* allocate pages for all the stripes in the bio, including parity */
+/*
+ * Allocate pages for all the stripes in the bio including parity, and map all
+ * sectors to their corresponding pages
+ */
 static int alloc_rbio_pages(struct btrfs_raid_bio *rbio)
 {
-	return btrfs_alloc_page_array(rbio->nr_pages, rbio->stripe_pages);
+	int ret;
+
+	ret = btrfs_alloc_page_array(rbio->nr_pages, rbio->stripe_pages);
+	if (ret < 0)
+		return ret;
+	/* Mapping all sectors */
+	index_stripe_sectors(rbio);
+	return 0;
 }
 
 /* only allocate pages for p/q stripes */
 static int alloc_rbio_parity_pages(struct btrfs_raid_bio *rbio)
 {
 	int data_pages = rbio_stripe_page_index(rbio, rbio->nr_data, 0);
+	int ret;
 
-	return btrfs_alloc_page_array(rbio->nr_pages - data_pages,
-				      rbio->stripe_pages + data_pages);
+	ret = btrfs_alloc_page_array(rbio->nr_pages - data_pages,
+				     rbio->stripe_pages + data_pages);
+	if (ret < 0)
+		return ret;
+
+	index_stripe_sectors(rbio);
+	return 0;
 }
 
 /*
@@ -2286,6 +2340,7 @@ static int alloc_rbio_essential_pages(struct btrfs_raid_bio *rbio)
 			rbio->stripe_pages[index] = page;
 		}
 	}
+	index_stripe_sectors(rbio);
 	return 0;
 }
 
-- 
2.35.1



* [PATCH v2 06/17] btrfs: introduce btrfs_raid_bio::bio_sectors
  2022-04-12  9:32 [PATCH v2 00/17] btrfs: add subpage support for RAID56 Qu Wenruo
                   ` (4 preceding siblings ...)
  2022-04-12  9:32 ` [PATCH v2 05/17] btrfs: introduce btrfs_raid_bio::stripe_sectors Qu Wenruo
@ 2022-04-12  9:32 ` Qu Wenruo
  2022-04-12  9:32 ` [PATCH v2 07/17] btrfs: make rbio_add_io_page() subpage compatible Qu Wenruo
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Qu Wenruo @ 2022-04-12  9:32 UTC (permalink / raw)
  To: linux-btrfs

This new member is going to fully replace bio_pages in the future, but
for now let them co-exist until the full switch is done.

Currently cache_rbio_pages() and index_rbio_pages() will also populate
the new array.

And cache_rbio_pages() needs to record which sectors are uptodate, so we
also need to introduce the sector_ptr::uptodate bit.

To avoid extra memory usage, we let the new @uptodate bit share bits
with @pgoff.
Now pgoff only has 24 bits, which is already more than enough, as even
for a 256K page size we only need 18 bits.
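
The offset-to-index arithmetic in index_one_bio() can be modelled in
userspace like this (a sketch with illustrative names, not the kernel
code; 512-byte block layer sectors assumed):

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT_EX 9	/* block layer sectors are 512 bytes */

/*
 * Which bio_sectors[] slot a bio position maps to, given the logical
 * start of the full stripe (raid_map[0] in the kernel code).
 */
static unsigned int bio_sector_index_ex(uint64_t bi_sector,
					uint64_t full_stripe_start,
					uint32_t sectorsize)
{
	uint64_t offset = (bi_sector << SECTOR_SHIFT_EX) - full_stripe_start;

	return (unsigned int)(offset / sectorsize);
}
```

For example, with a full stripe starting at logical 1MiB and a 4K
sectorsize, a bio positioned 8K into the stripe maps to slot 2.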

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/raid56.c | 51 +++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 49 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 0c7b6a06d5ce..f88dd11a208d 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -59,7 +59,8 @@ struct btrfs_stripe_hash_table {
  */
 struct sector_ptr {
 	struct page *page;
-	unsigned int pgoff;
+	unsigned int pgoff:24;
+	unsigned int uptodate:8;
 };
 
 enum btrfs_rbio_ops {
@@ -174,6 +175,9 @@ struct btrfs_raid_bio {
 	 */
 	struct page **stripe_pages;
 
+	/* Pointers to the sectors in the bio_list, for faster lookup */
+	struct sector_ptr *bio_sectors;
+
 	/*
 	 * pointers to the pages in the bio_list.  Stored
 	 * here for faster lookup
@@ -284,6 +288,20 @@ static void cache_rbio_pages(struct btrfs_raid_bio *rbio)
 		copy_highpage(rbio->stripe_pages[i], rbio->bio_pages[i]);
 		SetPageUptodate(rbio->stripe_pages[i]);
 	}
+
+	for (i = 0; i < rbio->nr_sectors; i++) {
+		/* Some range not covered by bio (partial write), skip it */
+		if (!rbio->bio_sectors[i].page)
+			continue;
+
+		ASSERT(rbio->stripe_sectors[i].page);
+		memcpy_page(rbio->stripe_sectors[i].page,
+			    rbio->stripe_sectors[i].pgoff,
+			    rbio->bio_sectors[i].page,
+			    rbio->bio_sectors[i].pgoff,
+			    rbio->bioc->fs_info->sectorsize);
+		rbio->stripe_sectors[i].uptodate = 1;
+	}
 	set_bit(RBIO_CACHE_READY_BIT, &rbio->flags);
 }
 
@@ -1013,7 +1031,7 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
 
 	rbio = kzalloc(sizeof(*rbio) +
 		       sizeof(*rbio->stripe_pages) * num_pages +
-		       sizeof(*rbio->bio_pages) * num_pages +
+		       sizeof(*rbio->bio_sectors) * num_sectors +
 		       sizeof(*rbio->stripe_sectors) * num_sectors +
 		       sizeof(*rbio->finish_pointers) * real_stripes +
 		       sizeof(*rbio->dbitmap) * BITS_TO_LONGS(stripe_nsectors) +
@@ -1052,6 +1070,7 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
 	} while (0)
 	CONSUME_ALLOC(rbio->stripe_pages, num_pages);
 	CONSUME_ALLOC(rbio->bio_pages, num_pages);
+	CONSUME_ALLOC(rbio->bio_sectors, num_sectors);
 	CONSUME_ALLOC(rbio->stripe_sectors, num_sectors);
 	CONSUME_ALLOC(rbio->finish_pointers, real_stripes);
 	CONSUME_ALLOC(rbio->dbitmap, BITS_TO_LONGS(stripe_nsectors));
@@ -1171,6 +1190,31 @@ static void validate_rbio_for_rmw(struct btrfs_raid_bio *rbio)
 	}
 }
 
+static void index_one_bio(struct btrfs_raid_bio *rbio, struct bio *bio)
+{
+	const u32 sectorsize = rbio->bioc->fs_info->sectorsize;
+	struct bio_vec bvec;
+	struct bvec_iter iter;
+	u32 offset = (bio->bi_iter.bi_sector << SECTOR_SHIFT) -
+		     rbio->bioc->raid_map[0];
+
+	if (bio_flagged(bio, BIO_CLONED))
+		bio->bi_iter = btrfs_bio(bio)->iter;
+	bio_for_each_segment(bvec, bio, iter) {
+		u32 bvec_offset;
+
+		for (bvec_offset = 0; bvec_offset < bvec.bv_len;
+		     bvec_offset += sectorsize, offset += sectorsize) {
+			int index = offset / sectorsize;
+			struct sector_ptr *sector = &rbio->bio_sectors[index];
+
+			sector->page = bvec.bv_page;
+			sector->pgoff = bvec.bv_offset + bvec_offset;
+			ASSERT(sector->pgoff < PAGE_SIZE);
+		}
+	}
+}
+
 /*
  * helper function to walk our bio list and populate the bio_pages array with
  * the result.  This seems expensive, but it is faster than constantly
@@ -1201,6 +1245,9 @@ static void index_rbio_pages(struct btrfs_raid_bio *rbio)
 			i++;
 		}
 	}
+	bio_list_for_each(bio, &rbio->bio_list)
+		index_one_bio(rbio, bio);
+
 	spin_unlock_irq(&rbio->bio_list_lock);
 }
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 07/17] btrfs: make rbio_add_io_page() subpage compatible
  2022-04-12  9:32 [PATCH v2 00/17] btrfs: add subpage support for RAID56 Qu Wenruo
                   ` (5 preceding siblings ...)
  2022-04-12  9:32 ` [PATCH v2 06/17] btrfs: introduce btrfs_raid_bio::bio_sectors Qu Wenruo
@ 2022-04-12  9:32 ` Qu Wenruo
  2022-04-13 19:14   ` David Sterba
  2022-04-12  9:32 ` [PATCH v2 08/17] btrfs: make finish_parity_scrub() " Qu Wenruo
                   ` (10 subsequent siblings)
  17 siblings, 1 reply; 34+ messages in thread
From: Qu Wenruo @ 2022-04-12  9:32 UTC (permalink / raw)
  To: linux-btrfs

This involves:

- Rename rbio_add_io_page() to rbio_add_io_sector()
  Since we are still relying on PAGE_SIZE == sectorsize, add a new
  ASSERT() inside rbio_add_io_sector() to make sure all pgoff values are 0.

- Introduce rbio_stripe_sector() helper
  The equivalent of rbio_stripe_page().

  This new helper has extra ASSERT()s to validate the stripe and sector
  number.

- Introduce sector_in_rbio() helper
  The equivalent of page_in_rbio().

- Rename @pagenr variables to @sectornr

- Use rbio::stripe_nsectors when iterating the bitmap

Please note that this only changes the interface; the bios still use
full pages for IO.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/raid56.c | 247 ++++++++++++++++++++++++++++++----------------
 1 file changed, 161 insertions(+), 86 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index f88dd11a208d..9c3e0201d032 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -662,6 +662,25 @@ static int rbio_can_merge(struct btrfs_raid_bio *last,
 	return 1;
 }
 
+static unsigned int rbio_stripe_sector_index(const struct btrfs_raid_bio *rbio,
+					     unsigned int stripe_nr,
+					     unsigned int sector_nr)
+{
+	ASSERT(stripe_nr < rbio->real_stripes);
+	ASSERT(sector_nr < rbio->stripe_nsectors);
+
+	return stripe_nr * rbio->stripe_nsectors + sector_nr;
+}
+
+/* Return a sector from rbio->stripe_sectors, not from the bio list */
+static struct sector_ptr *rbio_stripe_sector(const struct btrfs_raid_bio *rbio,
+					     unsigned int stripe_nr,
+					     unsigned int sector_nr)
+{
+	return &rbio->stripe_sectors[rbio_stripe_sector_index(rbio, stripe_nr,
+							      sector_nr)];
+}
+
 static int rbio_stripe_page_index(struct btrfs_raid_bio *rbio, int stripe,
 				  int index)
 {
@@ -973,6 +992,44 @@ static void raid_write_end_io(struct bio *bio)
 	rbio_orig_end_io(rbio, err);
 }
 
+/*
+ * To get a sector pointer specified by its @stripe_nr and @sector_nr
+ *
+ * @stripe_nr:		Stripe number, valid range [0, real_stripe)
+ * @sector_nr:		Sector number inside the stripe,
+ *			valid range [0, stripe_nsectors)
+ * @bio_list_only:	Whether to use sectors inside the bio list only.
+ *
+ * The read/modify/write code wants to reuse the original bio page as much
+ * as possible, and only use stripe_sectors as fallback.
+ */
+static struct sector_ptr *sector_in_rbio(struct btrfs_raid_bio *rbio,
+					 int stripe_nr, int sector_nr,
+					 bool bio_list_only)
+{
+	struct sector_ptr *sector = NULL;
+	int index;
+
+	ASSERT(stripe_nr >= 0 && stripe_nr < rbio->real_stripes);
+	ASSERT(sector_nr >= 0 && sector_nr < rbio->stripe_nsectors);
+
+	index = stripe_nr * rbio->stripe_nsectors + sector_nr;
+	ASSERT(index >= 0 && index < rbio->nr_sectors);
+
+	spin_lock_irq(&rbio->bio_list_lock);
+	sector = &rbio->bio_sectors[index];
+	if (sector->page || bio_list_only) {
+		/* Don't return sector without a valid page pointer */
+		if (!sector->page)
+			sector = NULL;
+		spin_unlock_irq(&rbio->bio_list_lock);
+		return sector;
+	}
+	spin_unlock_irq(&rbio->bio_list_lock);
+
+	return &rbio->stripe_sectors[index];
+}
+
 /*
  * the read/modify/write code wants to use the original bio for
  * any pages it included, and then use the rbio for everything
@@ -1120,26 +1177,39 @@ static int alloc_rbio_parity_pages(struct btrfs_raid_bio *rbio)
 }
 
 /*
- * add a single page from a specific stripe into our list of bios for IO
- * this will try to merge into existing bios if possible, and returns
- * zero if all went well.
+ * Add a single sector @sector into our list of bios for IO.
+ *
+ * Return 0 if everything went well.
+ * Return <0 for error.
  */
-static int rbio_add_io_page(struct btrfs_raid_bio *rbio,
-			    struct bio_list *bio_list,
-			    struct page *page,
-			    int stripe_nr,
-			    unsigned long page_index,
-			    unsigned long bio_max_len,
-			    unsigned int opf)
+static int rbio_add_io_sector(struct btrfs_raid_bio *rbio,
+			      struct bio_list *bio_list,
+			      struct sector_ptr *sector,
+			      unsigned int stripe_nr,
+			      unsigned int sector_nr,
+			      unsigned long bio_max_len, unsigned int opf)
 {
+	const u32 sectorsize = rbio->bioc->fs_info->sectorsize;
 	struct bio *last = bio_list->tail;
 	int ret;
 	struct bio *bio;
 	struct btrfs_io_stripe *stripe;
 	u64 disk_start;
 
+	/*
+	 * NOTE: here stripe_nr has taken device replace into consideration,
+	 * thus it can be larger than rbio->real_stripe.
+	 * So here we check against bioc->num_stripes, not rbio->real_stripes.
+	 */
+	ASSERT(stripe_nr >= 0 && stripe_nr < rbio->bioc->num_stripes);
+	ASSERT(sector_nr >= 0 && sector_nr < rbio->stripe_nsectors);
+	ASSERT(sector->page);
+
+	/* We don't yet support subpage, thus pgoff should always be 0 */
+	ASSERT(sector->pgoff == 0);
+
 	stripe = &rbio->bioc->stripes[stripe_nr];
-	disk_start = stripe->physical + (page_index << PAGE_SHIFT);
+	disk_start = stripe->physical + sector_nr * sectorsize;
 
 	/* if the device is missing, just fail this stripe */
 	if (!stripe->dev->bdev)
@@ -1156,8 +1226,9 @@ static int rbio_add_io_page(struct btrfs_raid_bio *rbio,
 		 */
 		if (last_end == disk_start && !last->bi_status &&
 		    last->bi_bdev == stripe->dev->bdev) {
-			ret = bio_add_page(last, page, PAGE_SIZE, 0);
-			if (ret == PAGE_SIZE)
+			ret = bio_add_page(last, sector->page, sectorsize,
+					   sector->pgoff);
+			if (ret == sectorsize)
 				return 0;
 		}
 	}
@@ -1168,7 +1239,7 @@ static int rbio_add_io_page(struct btrfs_raid_bio *rbio,
 	bio->bi_iter.bi_sector = disk_start >> 9;
 	bio->bi_private = rbio;
 
-	bio_add_page(bio, page, PAGE_SIZE, 0);
+	bio_add_page(bio, sector->page, sectorsize, sector->pgoff);
 	bio_list_add(bio_list, bio);
 	return 0;
 }
@@ -1265,7 +1336,7 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
 	void **pointers = rbio->finish_pointers;
 	int nr_data = rbio->nr_data;
 	int stripe;
-	int pagenr;
+	int sectornr;
 	bool has_qstripe;
 	struct bio_list bio_list;
 	struct bio *bio;
@@ -1309,16 +1380,16 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
 	else
 		clear_bit(RBIO_CACHE_READY_BIT, &rbio->flags);
 
-	for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
+	for (sectornr = 0; sectornr < rbio->stripe_nsectors; sectornr++) {
 		struct page *p;
 		/* first collect one page from each data stripe */
 		for (stripe = 0; stripe < nr_data; stripe++) {
-			p = page_in_rbio(rbio, stripe, pagenr, 0);
+			p = page_in_rbio(rbio, stripe, sectornr, 0);
 			pointers[stripe] = kmap_local_page(p);
 		}
 
 		/* then add the parity stripe */
-		p = rbio_pstripe_page(rbio, pagenr);
+		p = rbio_pstripe_page(rbio, sectornr);
 		SetPageUptodate(p);
 		pointers[stripe++] = kmap_local_page(p);
 
@@ -1328,7 +1399,7 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
 			 * raid6, add the qstripe and call the
 			 * library function to fill in our p/q
 			 */
-			p = rbio_qstripe_page(rbio, pagenr);
+			p = rbio_qstripe_page(rbio, sectornr);
 			SetPageUptodate(p);
 			pointers[stripe++] = kmap_local_page(p);
 
@@ -1349,19 +1420,19 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
 	 * everything else.
 	 */
 	for (stripe = 0; stripe < rbio->real_stripes; stripe++) {
-		for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
-			struct page *page;
+		for (sectornr = 0; sectornr < rbio->stripe_nsectors; sectornr++) {
+			struct sector_ptr *sector;
+
 			if (stripe < rbio->nr_data) {
-				page = page_in_rbio(rbio, stripe, pagenr, 1);
-				if (!page)
+				sector = sector_in_rbio(rbio, stripe, sectornr, 1);
+				if (!sector)
 					continue;
 			} else {
-			       page = rbio_stripe_page(rbio, stripe, pagenr);
+				sector = rbio_stripe_sector(rbio, stripe, sectornr);
 			}
 
-			ret = rbio_add_io_page(rbio, &bio_list,
-				       page, stripe, pagenr, rbio->stripe_len,
-				       REQ_OP_WRITE);
+			ret = rbio_add_io_sector(rbio, &bio_list, sector, stripe,
+						 sectornr, rbio->stripe_len, REQ_OP_WRITE);
 			if (ret)
 				goto cleanup;
 		}
@@ -1374,20 +1445,21 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
 		if (!bioc->tgtdev_map[stripe])
 			continue;
 
-		for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
-			struct page *page;
+		for (sectornr = 0; sectornr < rbio->stripe_nsectors; sectornr++) {
+			struct sector_ptr *sector;
+
 			if (stripe < rbio->nr_data) {
-				page = page_in_rbio(rbio, stripe, pagenr, 1);
-				if (!page)
+				sector = sector_in_rbio(rbio, stripe, sectornr, 1);
+				if (!sector)
 					continue;
 			} else {
-			       page = rbio_stripe_page(rbio, stripe, pagenr);
+				sector = rbio_stripe_sector(rbio, stripe, sectornr);
 			}
 
-			ret = rbio_add_io_page(rbio, &bio_list, page,
-					       rbio->bioc->tgtdev_map[stripe],
-					       pagenr, rbio->stripe_len,
-					       REQ_OP_WRITE);
+			ret = rbio_add_io_sector(rbio, &bio_list, sector,
+						 rbio->bioc->tgtdev_map[stripe],
+						 sectornr, rbio->stripe_len,
+						 REQ_OP_WRITE);
 			if (ret)
 				goto cleanup;
 		}
@@ -1563,7 +1635,7 @@ static int raid56_rmw_stripe(struct btrfs_raid_bio *rbio)
 	int bios_to_read = 0;
 	struct bio_list bio_list;
 	int ret;
-	int pagenr;
+	int sectornr;
 	int stripe;
 	struct bio *bio;
 
@@ -1581,28 +1653,29 @@ static int raid56_rmw_stripe(struct btrfs_raid_bio *rbio)
 	 * stripe
 	 */
 	for (stripe = 0; stripe < rbio->nr_data; stripe++) {
-		for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
-			struct page *page;
+		for (sectornr = 0; sectornr < rbio->stripe_nsectors; sectornr++) {
+			struct sector_ptr *sector;
+
 			/*
-			 * we want to find all the pages missing from
+			 * We want to find all the sectors missing from
 			 * the rbio and read them from the disk.  If
-			 * page_in_rbio finds a page in the bio list
+			 * sector_in_rbio() finds a page in the bio list
 			 * we don't need to read it off the stripe.
 			 */
-			page = page_in_rbio(rbio, stripe, pagenr, 1);
-			if (page)
+			sector = sector_in_rbio(rbio, stripe, sectornr, 1);
+			if (sector)
 				continue;
 
-			page = rbio_stripe_page(rbio, stripe, pagenr);
+			sector = rbio_stripe_sector(rbio, stripe, sectornr);
 			/*
-			 * the bio cache may have handed us an uptodate
+			 * The bio cache may have handed us an uptodate
 			 * page.  If so, be happy and use it
 			 */
-			if (PageUptodate(page))
+			if (sector->uptodate)
 				continue;
 
-			ret = rbio_add_io_page(rbio, &bio_list, page,
-				       stripe, pagenr, rbio->stripe_len,
+			ret = rbio_add_io_sector(rbio, &bio_list, sector,
+				       stripe, sectornr, rbio->stripe_len,
 				       REQ_OP_READ);
 			if (ret)
 				goto cleanup;
@@ -2107,7 +2180,7 @@ static int __raid56_parity_recover(struct btrfs_raid_bio *rbio)
 	int bios_to_read = 0;
 	struct bio_list bio_list;
 	int ret;
-	int pagenr;
+	int sectornr;
 	int stripe;
 	struct bio *bio;
 
@@ -2130,21 +2203,20 @@ static int __raid56_parity_recover(struct btrfs_raid_bio *rbio)
 			continue;
 		}
 
-		for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
-			struct page *p;
+		for (sectornr = 0; sectornr < rbio->stripe_nsectors; sectornr++) {
+			struct sector_ptr *sector;
 
 			/*
 			 * the rmw code may have already read this
 			 * page in
 			 */
-			p = rbio_stripe_page(rbio, stripe, pagenr);
-			if (PageUptodate(p))
+			sector = rbio_stripe_sector(rbio, stripe, sectornr);
+			if (sector->uptodate)
 				continue;
 
-			ret = rbio_add_io_page(rbio, &bio_list,
-				       rbio_stripe_page(rbio, stripe, pagenr),
-				       stripe, pagenr, rbio->stripe_len,
-				       REQ_OP_READ);
+			ret = rbio_add_io_sector(rbio, &bio_list, sector,
+						 stripe, sectornr,
+						 rbio->stripe_len, REQ_OP_READ);
 			if (ret < 0)
 				goto cleanup;
 		}
@@ -2399,7 +2471,7 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
 	unsigned long *pbitmap = rbio->finish_pbitmap;
 	int nr_data = rbio->nr_data;
 	int stripe;
-	int pagenr;
+	int sectornr;
 	bool has_qstripe;
 	struct page *p_page = NULL;
 	struct page *q_page = NULL;
@@ -2419,7 +2491,7 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
 
 	if (bioc->num_tgtdevs && bioc->tgtdev_map[rbio->scrubp]) {
 		is_replace = 1;
-		bitmap_copy(pbitmap, rbio->dbitmap, rbio->stripe_npages);
+		bitmap_copy(pbitmap, rbio->dbitmap, rbio->stripe_nsectors);
 	}
 
 	/*
@@ -2453,12 +2525,12 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
 	/* Map the parity stripe just once */
 	pointers[nr_data] = kmap_local_page(p_page);
 
-	for_each_set_bit(pagenr, rbio->dbitmap, rbio->stripe_npages) {
+	for_each_set_bit(sectornr, rbio->dbitmap, rbio->stripe_nsectors) {
 		struct page *p;
 		void *parity;
 		/* first collect one page from each data stripe */
 		for (stripe = 0; stripe < nr_data; stripe++) {
-			p = page_in_rbio(rbio, stripe, pagenr, 0);
+			p = page_in_rbio(rbio, stripe, sectornr, 0);
 			pointers[stripe] = kmap_local_page(p);
 		}
 
@@ -2473,13 +2545,13 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
 		}
 
 		/* Check scrubbing parity and repair it */
-		p = rbio_stripe_page(rbio, rbio->scrubp, pagenr);
+		p = rbio_stripe_page(rbio, rbio->scrubp, sectornr);
 		parity = kmap_local_page(p);
 		if (memcmp(parity, pointers[rbio->scrubp], PAGE_SIZE))
 			copy_page(parity, pointers[rbio->scrubp]);
 		else
 			/* Parity is right, needn't writeback */
-			bitmap_clear(rbio->dbitmap, pagenr, 1);
+			bitmap_clear(rbio->dbitmap, sectornr, 1);
 		kunmap_local(parity);
 
 		for (stripe = nr_data - 1; stripe >= 0; stripe--)
@@ -2499,12 +2571,13 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
 	 * higher layers (the bio_list in our rbio) and our p/q.  Ignore
 	 * everything else.
 	 */
-	for_each_set_bit(pagenr, rbio->dbitmap, rbio->stripe_npages) {
-		struct page *page;
+	for_each_set_bit(sectornr, rbio->dbitmap, rbio->stripe_nsectors) {
+		struct sector_ptr *sector;
 
-		page = rbio_stripe_page(rbio, rbio->scrubp, pagenr);
-		ret = rbio_add_io_page(rbio, &bio_list, page, rbio->scrubp,
-				       pagenr, rbio->stripe_len, REQ_OP_WRITE);
+		sector = rbio_stripe_sector(rbio, rbio->scrubp, sectornr);
+		ret = rbio_add_io_sector(rbio, &bio_list, sector, rbio->scrubp,
+					 sectornr, rbio->stripe_len,
+					 REQ_OP_WRITE);
 		if (ret)
 			goto cleanup;
 	}
@@ -2512,13 +2585,13 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
 	if (!is_replace)
 		goto submit_write;
 
-	for_each_set_bit(pagenr, pbitmap, rbio->stripe_npages) {
-		struct page *page;
+	for_each_set_bit(sectornr, pbitmap, rbio->stripe_nsectors) {
+		struct sector_ptr *sector;
 
-		page = rbio_stripe_page(rbio, rbio->scrubp, pagenr);
-		ret = rbio_add_io_page(rbio, &bio_list, page,
+		sector = rbio_stripe_sector(rbio, rbio->scrubp, sectornr);
+		ret = rbio_add_io_sector(rbio, &bio_list, sector,
 				       bioc->tgtdev_map[rbio->scrubp],
-				       pagenr, rbio->stripe_len, REQ_OP_WRITE);
+				       sectornr, rbio->stripe_len, REQ_OP_WRITE);
 		if (ret)
 			goto cleanup;
 	}
@@ -2650,7 +2723,7 @@ static void raid56_parity_scrub_stripe(struct btrfs_raid_bio *rbio)
 	int bios_to_read = 0;
 	struct bio_list bio_list;
 	int ret;
-	int pagenr;
+	int sectornr;
 	int stripe;
 	struct bio *bio;
 
@@ -2666,28 +2739,30 @@ static void raid56_parity_scrub_stripe(struct btrfs_raid_bio *rbio)
 	 * stripe
 	 */
 	for (stripe = 0; stripe < rbio->real_stripes; stripe++) {
-		for_each_set_bit(pagenr, rbio->dbitmap, rbio->stripe_npages) {
-			struct page *page;
+		for_each_set_bit(sectornr, rbio->dbitmap,
+				 rbio->stripe_nsectors) {
+			struct sector_ptr *sector;
 			/*
-			 * we want to find all the pages missing from
+			 * We want to find all the sectors missing from
 			 * the rbio and read them from the disk.  If
-			 * page_in_rbio finds a page in the bio list
+			 * sector_in_rbio() finds a sector in the bio list
 			 * we don't need to read it off the stripe.
 			 */
-			page = page_in_rbio(rbio, stripe, pagenr, 1);
-			if (page)
+			sector = sector_in_rbio(rbio, stripe, sectornr, 1);
+			if (sector)
 				continue;
 
-			page = rbio_stripe_page(rbio, stripe, pagenr);
+			sector = rbio_stripe_sector(rbio, stripe, sectornr);
 			/*
-			 * the bio cache may have handed us an uptodate
-			 * page.  If so, be happy and use it
+			 * The bio cache may have handed us an uptodate
+			 * sector.  If so, be happy and use it
 			 */
-			if (PageUptodate(page))
+			if (sector->uptodate)
 				continue;
 
-			ret = rbio_add_io_page(rbio, &bio_list, page, stripe,
-					       pagenr, rbio->stripe_len, REQ_OP_READ);
+			ret = rbio_add_io_sector(rbio, &bio_list, sector,
+						 stripe, sectornr,
+						 rbio->stripe_len, REQ_OP_READ);
 			if (ret)
 				goto cleanup;
 		}
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 08/17] btrfs: make finish_parity_scrub() subpage compatible
  2022-04-12  9:32 [PATCH v2 00/17] btrfs: add subpage support for RAID56 Qu Wenruo
                   ` (6 preceding siblings ...)
  2022-04-12  9:32 ` [PATCH v2 07/17] btrfs: make rbio_add_io_page() subpage compatible Qu Wenruo
@ 2022-04-12  9:32 ` Qu Wenruo
  2022-04-12  9:32 ` [PATCH v2 09/17] btrfs: make __raid_recover_endio_io() subpage compatible Qu Wenruo
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Qu Wenruo @ 2022-04-12  9:32 UTC (permalink / raw)
  To: linux-btrfs

The core is to convert direct page usage into sector_ptr usage, and
to replace copy_page() with memcpy() at sectorsize granularity.

For pointer usage, we need to convert it to kmap_local_page() +
sector->pgoff.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/raid56.c | 56 +++++++++++++++++++++++++++--------------------
 1 file changed, 32 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 9c3e0201d032..6b2d75a09896 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -2467,14 +2467,15 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
 					 int need_check)
 {
 	struct btrfs_io_context *bioc = rbio->bioc;
+	const u32 sectorsize = bioc->fs_info->sectorsize;
 	void **pointers = rbio->finish_pointers;
 	unsigned long *pbitmap = rbio->finish_pbitmap;
 	int nr_data = rbio->nr_data;
 	int stripe;
 	int sectornr;
 	bool has_qstripe;
-	struct page *p_page = NULL;
-	struct page *q_page = NULL;
+	struct sector_ptr p_sector = { 0 };
+	struct sector_ptr q_sector = { 0 };
 	struct bio_list bio_list;
 	struct bio *bio;
 	int is_replace = 0;
@@ -2504,51 +2505,56 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
 	if (!need_check)
 		goto writeback;
 
-	p_page = alloc_page(GFP_NOFS);
-	if (!p_page)
+	p_sector.page = alloc_page(GFP_NOFS);
+	if (!p_sector.page)
 		goto cleanup;
-	SetPageUptodate(p_page);
+	p_sector.pgoff = 0;
+	p_sector.uptodate = 1;
 
 	if (has_qstripe) {
 		/* RAID6, allocate and map temp space for the Q stripe */
-		q_page = alloc_page(GFP_NOFS);
-		if (!q_page) {
-			__free_page(p_page);
+		q_sector.page = alloc_page(GFP_NOFS);
+		if (!q_sector.page) {
+			__free_page(p_sector.page);
+			p_sector.page = NULL;
 			goto cleanup;
 		}
-		SetPageUptodate(q_page);
-		pointers[rbio->real_stripes - 1] = kmap_local_page(q_page);
+		q_sector.pgoff = 0;
+		q_sector.uptodate = 1;
+		pointers[rbio->real_stripes - 1] = kmap_local_page(q_sector.page);
 	}
 
 	atomic_set(&rbio->error, 0);
 
 	/* Map the parity stripe just once */
-	pointers[nr_data] = kmap_local_page(p_page);
+	pointers[nr_data] = kmap_local_page(p_sector.page);
 
 	for_each_set_bit(sectornr, rbio->dbitmap, rbio->stripe_nsectors) {
-		struct page *p;
+		struct sector_ptr *sector;
 		void *parity;
+
 		/* first collect one page from each data stripe */
 		for (stripe = 0; stripe < nr_data; stripe++) {
-			p = page_in_rbio(rbio, stripe, sectornr, 0);
-			pointers[stripe] = kmap_local_page(p);
+			sector = sector_in_rbio(rbio, stripe, sectornr, 0);
+			pointers[stripe] = kmap_local_page(sector->page) +
+					   sector->pgoff;
 		}
 
 		if (has_qstripe) {
 			/* RAID6, call the library function to fill in our P/Q */
-			raid6_call.gen_syndrome(rbio->real_stripes, PAGE_SIZE,
+			raid6_call.gen_syndrome(rbio->real_stripes, sectorsize,
 						pointers);
 		} else {
 			/* raid5 */
-			copy_page(pointers[nr_data], pointers[0]);
-			run_xor(pointers + 1, nr_data - 1, PAGE_SIZE);
+			memcpy(pointers[nr_data], pointers[0], sectorsize);
+			run_xor(pointers + 1, nr_data - 1, sectorsize);
 		}
 
 		/* Check scrubbing parity and repair it */
-		p = rbio_stripe_page(rbio, rbio->scrubp, sectornr);
-		parity = kmap_local_page(p);
-		if (memcmp(parity, pointers[rbio->scrubp], PAGE_SIZE))
-			copy_page(parity, pointers[rbio->scrubp]);
+		sector = rbio_stripe_sector(rbio, rbio->scrubp, sectornr);
+		parity = kmap_local_page(sector->page) + sector->pgoff;
+		if (memcmp(parity, pointers[rbio->scrubp], sectorsize))
+			memcpy(parity, pointers[rbio->scrubp], sectorsize);
 		else
 			/* Parity is right, needn't writeback */
 			bitmap_clear(rbio->dbitmap, sectornr, 1);
@@ -2559,10 +2565,12 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
 	}
 
 	kunmap_local(pointers[nr_data]);
-	__free_page(p_page);
-	if (q_page) {
+	__free_page(p_sector.page);
+	p_sector.page = NULL;
+	if (q_sector.page) {
 		kunmap_local(pointers[rbio->real_stripes - 1]);
-		__free_page(q_page);
+		__free_page(q_sector.page);
+		q_sector.page = NULL;
 	}
 
 writeback:
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 09/17] btrfs: make __raid_recover_endio_io() subpage compatible
  2022-04-12  9:32 [PATCH v2 00/17] btrfs: add subpage support for RAID56 Qu Wenruo
                   ` (7 preceding siblings ...)
  2022-04-12  9:32 ` [PATCH v2 08/17] btrfs: make finish_parity_scrub() " Qu Wenruo
@ 2022-04-12  9:32 ` Qu Wenruo
  2022-04-12  9:33 ` [PATCH v2 10/17] btrfs: make finish_rmw() subpage compatible Qu Wenruo
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Qu Wenruo @ 2022-04-12  9:32 UTC (permalink / raw)
  To: linux-btrfs

This involves:

- Use sector_ptr interface to grab the pointers

- Add sector->pgoff to pointers[]

- Rebuild data using sectorsize instead of PAGE_SIZE

- Use memcpy() to replace copy_page()

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/raid56.c | 52 ++++++++++++++++++++++++++---------------------
 1 file changed, 29 insertions(+), 23 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 6b2d75a09896..075b996040ba 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1929,14 +1929,18 @@ int raid56_parity_write(struct bio *bio, struct btrfs_io_context *bioc,
  */
 static void __raid_recover_end_io(struct btrfs_raid_bio *rbio)
 {
-	int pagenr, stripe;
+	const u32 sectorsize = rbio->bioc->fs_info->sectorsize;
+	int sectornr, stripe;
 	void **pointers;
 	void **unmap_array;
 	int faila = -1, failb = -1;
-	struct page *page;
 	blk_status_t err;
 	int i;
 
+	/*
+	 * This array stores the pointer for each sector, thus it has the extra
+	 * pgoff value added from each sector
+	 */
 	pointers = kcalloc(rbio->real_stripes, sizeof(void *), GFP_NOFS);
 	if (!pointers) {
 		err = BLK_STS_RESOURCE;
@@ -1965,43 +1969,44 @@ static void __raid_recover_end_io(struct btrfs_raid_bio *rbio)
 
 	index_rbio_pages(rbio);
 
-	for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
+	for (sectornr = 0; sectornr < rbio->stripe_nsectors; sectornr++) {
+		struct sector_ptr *sector;
+
 		/*
 		 * Now we just use bitmap to mark the horizontal stripes in
 		 * which we have data when doing parity scrub.
 		 */
 		if (rbio->operation == BTRFS_RBIO_PARITY_SCRUB &&
-		    !test_bit(pagenr, rbio->dbitmap))
+		    !test_bit(sectornr, rbio->dbitmap))
 			continue;
 
 		/*
-		 * Setup our array of pointers with pages from each stripe
+		 * Setup our array of pointers with sectors from each stripe
 		 *
 		 * NOTE: store a duplicate array of pointers to preserve the
 		 * pointer order
 		 */
 		for (stripe = 0; stripe < rbio->real_stripes; stripe++) {
 			/*
-			 * if we're rebuilding a read, we have to use
+			 * If we're rebuilding a read, we have to use
 			 * pages from the bio list
 			 */
 			if ((rbio->operation == BTRFS_RBIO_READ_REBUILD ||
 			     rbio->operation == BTRFS_RBIO_REBUILD_MISSING) &&
 			    (stripe == faila || stripe == failb)) {
-				page = page_in_rbio(rbio, stripe, pagenr, 0);
+				sector = sector_in_rbio(rbio, stripe, sectornr, 0);
 			} else {
-				page = rbio_stripe_page(rbio, stripe, pagenr);
+				sector = rbio_stripe_sector(rbio, stripe, sectornr);
 			}
-			pointers[stripe] = kmap_local_page(page);
+			ASSERT(sector->page);
+			pointers[stripe] = kmap_local_page(sector->page) +
+					   sector->pgoff;
 			unmap_array[stripe] = pointers[stripe];
 		}
 
-		/* all raid6 handling here */
+		/* All raid6 handling here */
 		if (rbio->bioc->map_type & BTRFS_BLOCK_GROUP_RAID6) {
-			/*
-			 * single failure, rebuild from parity raid5
-			 * style
-			 */
+			/* Single failure, rebuild from parity raid5 style */
 			if (failb < 0) {
 				if (faila == rbio->nr_data) {
 					/*
@@ -2044,10 +2049,10 @@ static void __raid_recover_end_io(struct btrfs_raid_bio *rbio)
 
 			if (rbio->bioc->raid_map[failb] == RAID5_P_STRIPE) {
 				raid6_datap_recov(rbio->real_stripes,
-						  PAGE_SIZE, faila, pointers);
+						  sectorsize, faila, pointers);
 			} else {
 				raid6_2data_recov(rbio->real_stripes,
-						  PAGE_SIZE, faila, failb,
+						  sectorsize, faila, failb,
 						  pointers);
 			}
 		} else {
@@ -2057,7 +2062,8 @@ static void __raid_recover_end_io(struct btrfs_raid_bio *rbio)
 			BUG_ON(failb != -1);
 pstripe:
 			/* Copy parity block into failed block to start with */
-			copy_page(pointers[faila], pointers[rbio->nr_data]);
+			memcpy(pointers[faila], pointers[rbio->nr_data],
+			       sectorsize);
 
 			/* rearrange the pointer array */
 			p = pointers[faila];
@@ -2066,7 +2072,7 @@ static void __raid_recover_end_io(struct btrfs_raid_bio *rbio)
 			pointers[rbio->nr_data - 1] = p;
 
 			/* xor in the rest */
-			run_xor(pointers, rbio->nr_data - 1, PAGE_SIZE);
+			run_xor(pointers, rbio->nr_data - 1, sectorsize);
 		}
 		/* if we're doing this rebuild as part of an rmw, go through
 		 * and set all of our private rbio pages in the
@@ -2075,14 +2081,14 @@ static void __raid_recover_end_io(struct btrfs_raid_bio *rbio)
 		 * other endio functions will fiddle the uptodate bits
 		 */
 		if (rbio->operation == BTRFS_RBIO_WRITE) {
-			for (i = 0;  i < rbio->stripe_npages; i++) {
+			for (i = 0;  i < rbio->stripe_nsectors; i++) {
 				if (faila != -1) {
-					page = rbio_stripe_page(rbio, faila, i);
-					SetPageUptodate(page);
+					sector = rbio_stripe_sector(rbio, faila, i);
+					sector->uptodate = 1;
 				}
 				if (failb != -1) {
-					page = rbio_stripe_page(rbio, failb, i);
-					SetPageUptodate(page);
+					sector = rbio_stripe_sector(rbio, failb, i);
+					sector->uptodate = 1;
 				}
 			}
 		}
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 10/17] btrfs: make finish_rmw() subpage compatible
  2022-04-12  9:32 [PATCH v2 00/17] btrfs: add subpage support for RAID56 Qu Wenruo
                   ` (8 preceding siblings ...)
  2022-04-12  9:32 ` [PATCH v2 09/17] btrfs: make __raid_recover_endio_io() subpage compatible Qu Wenruo
@ 2022-04-12  9:33 ` Qu Wenruo
  2022-04-12  9:33 ` [PATCH v2 11/17] btrfs: open-code rbio_stripe_page_index() Qu Wenruo
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Qu Wenruo @ 2022-04-12  9:33 UTC (permalink / raw)
  To: linux-btrfs

With this function converted to the subpage-compatible sector
interfaces, the following helper functions can be removed:

- rbio_stripe_page()
- rbio_pstripe_page()
- rbio_qstripe_page()
- page_in_rbio()

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/raid56.c | 108 ++++++++++++++--------------------------------
 1 file changed, 32 insertions(+), 76 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 075b996040ba..928359840e8e 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -681,39 +681,26 @@ static struct sector_ptr *rbio_stripe_sector(const struct btrfs_raid_bio *rbio,
 							      sector_nr)];
 }
 
-static int rbio_stripe_page_index(struct btrfs_raid_bio *rbio, int stripe,
-				  int index)
+/* Helper to grab a sector inside P stripe */
+static struct sector_ptr *rbio_pstripe_sector(const struct btrfs_raid_bio *rbio,
+					      unsigned int sector_nr)
 {
-	return stripe * rbio->stripe_npages + index;
+	return rbio_stripe_sector(rbio, rbio->nr_data, sector_nr);
 }
 
-/*
- * these are just the pages from the rbio array, not from anything
- * the FS sent down to us
- */
-static struct page *rbio_stripe_page(struct btrfs_raid_bio *rbio, int stripe,
-				     int index)
+/* Helper to grab a sector inside Q stripe, return NULL if not RAID6 */
+static struct sector_ptr *rbio_qstripe_sector(const struct btrfs_raid_bio *rbio,
+					      unsigned int sector_nr)
 {
-	return rbio->stripe_pages[rbio_stripe_page_index(rbio, stripe, index)];
-}
-
-/*
- * helper to index into the pstripe
- */
-static struct page *rbio_pstripe_page(struct btrfs_raid_bio *rbio, int index)
-{
-	return rbio_stripe_page(rbio, rbio->nr_data, index);
+	if (rbio->nr_data + 1 == rbio->real_stripes)
+		return NULL;
+	return rbio_stripe_sector(rbio, rbio->nr_data + 1, sector_nr);
 }
 
-/*
- * helper to index into the qstripe, returns null
- * if there is no qstripe
- */
-static struct page *rbio_qstripe_page(struct btrfs_raid_bio *rbio, int index)
+static int rbio_stripe_page_index(struct btrfs_raid_bio *rbio, int stripe,
+				  int index)
 {
-	if (rbio->nr_data + 1 == rbio->real_stripes)
-		return NULL;
-	return rbio_stripe_page(rbio, rbio->nr_data + 1, index);
+	return stripe * rbio->stripe_npages + index;
 }
 
 /*
@@ -1030,40 +1017,6 @@ static struct sector_ptr *sector_in_rbio(struct btrfs_raid_bio *rbio,
 	return &rbio->stripe_sectors[index];
 }
 
-/*
- * the read/modify/write code wants to use the original bio for
- * any pages it included, and then use the rbio for everything
- * else.  This function decides if a given index (stripe number)
- * and page number in that stripe fall inside the original bio
- * or the rbio.
- *
- * if you set bio_list_only, you'll get a NULL back for any ranges
- * that are outside the bio_list
- *
- * This doesn't take any refs on anything, you get a bare page pointer
- * and the caller must bump refs as required.
- *
- * You must call index_rbio_pages once before you can trust
- * the answers from this function.
- */
-static struct page *page_in_rbio(struct btrfs_raid_bio *rbio,
-				 int index, int pagenr, int bio_list_only)
-{
-	int chunk_page;
-	struct page *p = NULL;
-
-	chunk_page = index * (rbio->stripe_len >> PAGE_SHIFT) + pagenr;
-
-	spin_lock_irq(&rbio->bio_list_lock);
-	p = rbio->bio_pages[chunk_page];
-	spin_unlock_irq(&rbio->bio_list_lock);
-
-	if (p || bio_list_only)
-		return p;
-
-	return rbio->stripe_pages[chunk_page];
-}
-
 /*
  * allocation and initial setup for the btrfs_raid_bio.  Not
  * this does not allocate any pages for rbio->pages.
@@ -1333,6 +1286,7 @@ static void index_rbio_pages(struct btrfs_raid_bio *rbio)
 static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
 {
 	struct btrfs_io_context *bioc = rbio->bioc;
+	const u32 sectorsize = bioc->fs_info->sectorsize;
 	void **pointers = rbio->finish_pointers;
 	int nr_data = rbio->nr_data;
 	int stripe;
@@ -1381,34 +1335,36 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
 		clear_bit(RBIO_CACHE_READY_BIT, &rbio->flags);
 
 	for (sectornr = 0; sectornr < rbio->stripe_nsectors; sectornr++) {
-		struct page *p;
-		/* first collect one page from each data stripe */
+		struct sector_ptr *sector;
+
+		/* First collect one sector from each data stripe */
 		for (stripe = 0; stripe < nr_data; stripe++) {
-			p = page_in_rbio(rbio, stripe, sectornr, 0);
-			pointers[stripe] = kmap_local_page(p);
+			sector = sector_in_rbio(rbio, stripe, sectornr, 0);
+			pointers[stripe] = kmap_local_page(sector->page) +
+					   sector->pgoff;
 		}
 
-		/* then add the parity stripe */
-		p = rbio_pstripe_page(rbio, sectornr);
-		SetPageUptodate(p);
-		pointers[stripe++] = kmap_local_page(p);
+		/* Then add the parity stripe */
+		sector = rbio_pstripe_sector(rbio, sectornr);
+		sector->uptodate = 1;
+		pointers[stripe++] = kmap_local_page(sector->page) + sector->pgoff;
 
 		if (has_qstripe) {
-
 			/*
-			 * raid6, add the qstripe and call the
+			 * RAID6, add the qstripe and call the
 			 * library function to fill in our p/q
 			 */
-			p = rbio_qstripe_page(rbio, sectornr);
-			SetPageUptodate(p);
-			pointers[stripe++] = kmap_local_page(p);
+			sector = rbio_qstripe_sector(rbio, sectornr);
+			sector->uptodate = 1;
+			pointers[stripe++] = kmap_local_page(sector->page) +
+					     sector->pgoff;
 
-			raid6_call.gen_syndrome(rbio->real_stripes, PAGE_SIZE,
+			raid6_call.gen_syndrome(rbio->real_stripes, sectorsize,
 						pointers);
 		} else {
 			/* raid5 */
-			copy_page(pointers[nr_data], pointers[0]);
-			run_xor(pointers + 1, nr_data - 1, PAGE_SIZE);
+			memcpy(pointers[nr_data], pointers[0], sectorsize);
+			run_xor(pointers + 1, nr_data - 1, sectorsize);
 		}
 		for (stripe = stripe - 1; stripe >= 0; stripe--)
 			kunmap_local(pointers[stripe]);
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 11/17] btrfs: open-code rbio_stripe_page_index()
  2022-04-12  9:32 [PATCH v2 00/17] btrfs: add subpage support for RAID56 Qu Wenruo
                   ` (9 preceding siblings ...)
  2022-04-12  9:33 ` [PATCH v2 10/17] btrfs: make finish_rmw() subpage compatible Qu Wenruo
@ 2022-04-12  9:33 ` Qu Wenruo
  2022-04-12  9:33 ` [PATCH v2 12/17] btrfs: make raid56_add_scrub_pages() subpage compatible Qu Wenruo
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Qu Wenruo @ 2022-04-12  9:33 UTC (permalink / raw)
  To: linux-btrfs

There is only one caller of that helper now, and we're definitely fine
to open-code it.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/raid56.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 928359840e8e..5d383f894eb9 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -697,12 +697,6 @@ static struct sector_ptr *rbio_qstripe_sector(const struct btrfs_raid_bio *rbio,
 	return rbio_stripe_sector(rbio, rbio->nr_data + 1, sector_nr);
 }
 
-static int rbio_stripe_page_index(struct btrfs_raid_bio *rbio, int stripe,
-				  int index)
-{
-	return stripe * rbio->stripe_npages + index;
-}
-
 /*
  * The first stripe in the table for a logical address
  * has the lock.  rbios are added in one of three ways:
@@ -1117,7 +1111,7 @@ static int alloc_rbio_pages(struct btrfs_raid_bio *rbio)
 /* only allocate pages for p/q stripes */
 static int alloc_rbio_parity_pages(struct btrfs_raid_bio *rbio)
 {
-	int data_pages = rbio_stripe_page_index(rbio, rbio->nr_data, 0);
+	int data_pages = rbio->nr_data * rbio->stripe_npages;
 	int ret;
 
 	ret = btrfs_alloc_page_array(rbio->nr_pages - data_pages,
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 12/17] btrfs: make raid56_add_scrub_pages() subpage compatible
  2022-04-12  9:32 [PATCH v2 00/17] btrfs: add subpage support for RAID56 Qu Wenruo
                   ` (10 preceding siblings ...)
  2022-04-12  9:33 ` [PATCH v2 11/17] btrfs: open-code rbio_stripe_page_index() Qu Wenruo
@ 2022-04-12  9:33 ` Qu Wenruo
  2022-04-12  9:33 ` [PATCH v2 13/17] btrfs: remove btrfs_raid_bio::bio_pages array Qu Wenruo
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Qu Wenruo @ 2022-04-12  9:33 UTC (permalink / raw)
  To: linux-btrfs

This requires one extra parameter, @pgoff, for the function.

In the current code base, scrub still uses one page per sector, thus the
new parameter will always be 0.

The extra subpage scrub optimization code is needed to take full
advantage of it.
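
The index math behind the change can be sketched in plain C; the function
below is a userspace stand-in for the computation in
raid56_add_scrub_pages(), not the kernel code itself:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Map a logical address to an index into the rbio's bio_sectors[] array.
 * The old code divided by the page size (stripe_offset >> PAGE_SHIFT);
 * dividing by sectorsize generalizes the same math to subpage setups
 * (e.g. 4K sectors on a 64K-page machine).
 */
static size_t scrub_sector_index(uint64_t logical, uint64_t raid_map0,
				 uint32_t sectorsize)
{
	uint64_t stripe_offset = logical - raid_map0;

	return (size_t)(stripe_offset / sectorsize);
}
```

With sectorsize == PAGE_SIZE the result matches the old page index, which
is why the caller can keep passing pgoff == 0 for now.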

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/raid56.c | 10 ++++++----
 fs/btrfs/raid56.h |  2 +-
 fs/btrfs/scrub.c  |  6 +++++-
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 5d383f894eb9..5f89ff3963c2 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -2379,17 +2379,19 @@ struct btrfs_raid_bio *raid56_parity_alloc_scrub_rbio(struct bio *bio,
 
 /* Used for both parity scrub and missing. */
 void raid56_add_scrub_pages(struct btrfs_raid_bio *rbio, struct page *page,
-			    u64 logical)
+			    unsigned int pgoff, u64 logical)
 {
+	const u32 sectorsize = rbio->bioc->fs_info->sectorsize;
 	int stripe_offset;
 	int index;
 
 	ASSERT(logical >= rbio->bioc->raid_map[0]);
-	ASSERT(logical + PAGE_SIZE <= rbio->bioc->raid_map[0] +
+	ASSERT(logical + sectorsize <= rbio->bioc->raid_map[0] +
 				rbio->stripe_len * rbio->nr_data);
 	stripe_offset = (int)(logical - rbio->bioc->raid_map[0]);
-	index = stripe_offset >> PAGE_SHIFT;
-	rbio->bio_pages[index] = page;
+	index = stripe_offset / sectorsize;
+	rbio->bio_sectors[index].page = page;
+	rbio->bio_sectors[index].pgoff = pgoff;
 }
 
 /*
diff --git a/fs/btrfs/raid56.h b/fs/btrfs/raid56.h
index fb35ae157b02..b5bc0feb3401 100644
--- a/fs/btrfs/raid56.h
+++ b/fs/btrfs/raid56.h
@@ -36,7 +36,7 @@ int raid56_parity_write(struct bio *bio, struct btrfs_io_context *bioc,
 			u32 stripe_len);
 
 void raid56_add_scrub_pages(struct btrfs_raid_bio *rbio, struct page *page,
-			    u64 logical);
+			    unsigned int pgoff, u64 logical);
 
 struct btrfs_raid_bio *raid56_parity_alloc_scrub_rbio(struct bio *bio,
 				struct btrfs_io_context *bioc, u32 stripe_len,
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 13ba458c080c..ccc2cb869cca 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2205,7 +2205,11 @@ static void scrub_missing_raid56_pages(struct scrub_block *sblock)
 	for (i = 0; i < sblock->sector_count; i++) {
 		struct scrub_sector *sector = sblock->sectors[i];
 
-		raid56_add_scrub_pages(rbio, sector->page, sector->logical);
+		/*
+		 * For now, our scrub is still one page per-sector, so pgoff
+		 * is always 0.
+		 */
+		raid56_add_scrub_pages(rbio, sector->page, 0, sector->logical);
 	}
 
 	btrfs_init_work(&sblock->work, scrub_missing_raid56_worker, NULL, NULL);
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 13/17] btrfs: remove btrfs_raid_bio::bio_pages array
  2022-04-12  9:32 [PATCH v2 00/17] btrfs: add subpage support for RAID56 Qu Wenruo
                   ` (11 preceding siblings ...)
  2022-04-12  9:33 ` [PATCH v2 12/17] btrfs: make raid56_add_scrub_pages() subpage compatible Qu Wenruo
@ 2022-04-12  9:33 ` Qu Wenruo
  2022-04-12  9:33 ` [PATCH v2 14/17] btrfs: make set_bio_pages_uptodate() subpage compatible Qu Wenruo
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Qu Wenruo @ 2022-04-12  9:33 UTC (permalink / raw)
  To: linux-btrfs

The functionality is completely replaced by the new bio_sectors member,
so now it's time to remove the old member.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/raid56.c | 38 +++-----------------------------------
 1 file changed, 3 insertions(+), 35 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 5f89ff3963c2..af8ba1aff682 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -178,12 +178,6 @@ struct btrfs_raid_bio {
 	/* Pointers to the sectors in the bio_list, for faster lookup */
 	struct sector_ptr *bio_sectors;
 
-	/*
-	 * pointers to the pages in the bio_list.  Stored
-	 * here for faster lookup
-	 */
-	struct page **bio_pages;
-
 	/*
 	 * For subpage support, we need to map each sector to above
 	 * stripe_pages.
@@ -265,7 +259,7 @@ int btrfs_alloc_stripe_hash_table(struct btrfs_fs_info *info)
 
 /*
  * caching an rbio means to copy anything from the
- * bio_pages array into the stripe_pages array.  We
+ * bio_sectors array into the stripe_pages array.  We
  * use the page uptodate bit in the stripe cache array
  * to indicate if it has valid data
  *
@@ -281,14 +275,6 @@ static void cache_rbio_pages(struct btrfs_raid_bio *rbio)
 	if (ret)
 		return;
 
-	for (i = 0; i < rbio->nr_pages; i++) {
-		if (!rbio->bio_pages[i])
-			continue;
-
-		copy_highpage(rbio->stripe_pages[i], rbio->bio_pages[i]);
-		SetPageUptodate(rbio->stripe_pages[i]);
-	}
-
 	for (i = 0; i < rbio->nr_sectors; i++) {
 		/* Some range not covered by bio (partial write), skip it */
 		if (!rbio->bio_sectors[i].page)
@@ -1064,7 +1050,7 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
 	atomic_set(&rbio->stripes_pending, 0);
 
 	/*
-	 * the stripe_pages, bio_pages, etc arrays point to the extra
+	 * The stripe_pages, bio_sectors, etc arrays point to the extra
 	 * memory we allocated past the end of the rbio
 	 */
 	p = rbio + 1;
@@ -1073,7 +1059,6 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
 		p = (unsigned char *)p + sizeof(*(ptr)) * (count);	\
 	} while (0)
 	CONSUME_ALLOC(rbio->stripe_pages, num_pages);
-	CONSUME_ALLOC(rbio->bio_pages, num_pages);
 	CONSUME_ALLOC(rbio->bio_sectors, num_sectors);
 	CONSUME_ALLOC(rbio->stripe_sectors, num_sectors);
 	CONSUME_ALLOC(rbio->finish_pointers, real_stripes);
@@ -1234,7 +1219,7 @@ static void index_one_bio(struct btrfs_raid_bio *rbio, struct bio *bio)
 }
 
 /*
- * helper function to walk our bio list and populate the bio_pages array with
+ * Helper function to walk our bio list and populate the bio_sectors array with
  * the result.  This seems expensive, but it is faster than constantly
  * searching through the bio list as we setup the IO in finish_rmw or stripe
  * reconstruction.
@@ -1244,25 +1229,8 @@ static void index_one_bio(struct btrfs_raid_bio *rbio, struct bio *bio)
 static void index_rbio_pages(struct btrfs_raid_bio *rbio)
 {
 	struct bio *bio;
-	u64 start;
-	unsigned long stripe_offset;
-	unsigned long page_index;
 
 	spin_lock_irq(&rbio->bio_list_lock);
-	bio_list_for_each(bio, &rbio->bio_list) {
-		struct bio_vec bvec;
-		struct bvec_iter iter;
-		int i = 0;
-
-		start = bio->bi_iter.bi_sector << 9;
-		stripe_offset = start - rbio->bioc->raid_map[0];
-		page_index = stripe_offset >> PAGE_SHIFT;
-
-		bio_for_each_segment(bvec, bio, iter) {
-			rbio->bio_pages[page_index + i] = bvec.bv_page;
-			i++;
-		}
-	}
 	bio_list_for_each(bio, &rbio->bio_list)
 		index_one_bio(rbio, bio);
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 14/17] btrfs: make set_bio_pages_uptodate() subpage compatible
  2022-04-12  9:32 [PATCH v2 00/17] btrfs: add subpage support for RAID56 Qu Wenruo
                   ` (12 preceding siblings ...)
  2022-04-12  9:33 ` [PATCH v2 13/17] btrfs: remove btrfs_raid_bio::bio_pages array Qu Wenruo
@ 2022-04-12  9:33 ` Qu Wenruo
  2022-04-12  9:33 ` [PATCH v2 15/17] btrfs: make steal_rbio() " Qu Wenruo
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Qu Wenruo @ 2022-04-12  9:33 UTC (permalink / raw)
  To: linux-btrfs

Unlike the previous code, we can no longer set PageUptodate directly for
stripe pages.

Instead we have to iterate through all the sectors and set the
SECTOR_UPTODATE flag there.

Thus this patch introduces a new helper, find_stripe_sector(), to do the
work.
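
A minimal userspace model of the lookup the new helper performs (the
struct fields are simplified from the kernel's sector_ptr):

```c
#include <assert.h>
#include <stddef.h>

struct sector_ptr {
	const void *page;	/* backing page of the sector */
	unsigned int pgoff;	/* byte offset inside that page */
	unsigned int uptodate;
};

/*
 * Linear scan over the rbio's stripe sectors to find the one covering
 * (page, pgoff).  Returns NULL if no sector matches, which the caller
 * treats as a bug (ASSERT in the kernel version).
 */
static struct sector_ptr *find_sector(struct sector_ptr *sectors, int nr,
				      const void *page, unsigned int pgoff)
{
	for (int i = 0; i < nr; i++) {
		if (sectors[i].page == page && sectors[i].pgoff == pgoff)
			return &sectors[i];
	}
	return NULL;
}
```

The linear scan is acceptable because nr_sectors is small (one stripe's
worth of sectors per member disk).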

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/raid56.c | 45 +++++++++++++++++++++++++++++++++++++++------
 1 file changed, 39 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index af8ba1aff682..8de5294de7a5 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1491,19 +1491,52 @@ static int fail_bio_stripe(struct btrfs_raid_bio *rbio,
 	return fail_rbio_index(rbio, failed);
 }
 
+/*
+ * For subpage case, we can no longer set page Uptodate directly for
+ * stripe_pages[], thus we need to locate the sector.
+ */
+static struct sector_ptr *find_stripe_sector(struct btrfs_raid_bio *rbio,
+					     struct page *page,
+					     unsigned int pgoff)
+{
+	int i;
+
+	for (i = 0; i < rbio->nr_sectors; i++) {
+		struct sector_ptr *sector = &rbio->stripe_sectors[i];
+
+		if (sector->page == page && sector->pgoff == pgoff)
+			return sector;
+	}
+	return NULL;
+}
+
 /*
  * this sets each page in the bio uptodate.  It should only be used on private
  * rbio pages, nothing that comes in from the higher layers
  */
-static void set_bio_pages_uptodate(struct bio *bio)
+static void set_bio_pages_uptodate(struct btrfs_raid_bio *rbio,
+				   struct bio *bio)
 {
+	const u32 sectorsize = rbio->bioc->fs_info->sectorsize;
 	struct bio_vec *bvec;
 	struct bvec_iter_all iter_all;
 
 	ASSERT(!bio_flagged(bio, BIO_CLONED));
 
-	bio_for_each_segment_all(bvec, bio, iter_all)
-		SetPageUptodate(bvec->bv_page);
+	bio_for_each_segment_all(bvec, bio, iter_all) {
+		struct sector_ptr *sector;
+		int pgoff;
+
+		for (pgoff = bvec->bv_offset;
+		     pgoff - bvec->bv_offset < bvec->bv_len;
+		     pgoff += sectorsize) {
+			sector = find_stripe_sector(rbio, bvec->bv_page, pgoff);
+			ASSERT(sector);
+			if (sector)
+				sector->uptodate = 1;
+		}
+
+	}
 }
 
 /*
@@ -1521,7 +1554,7 @@ static void raid_rmw_end_io(struct bio *bio)
 	if (bio->bi_status)
 		fail_bio_stripe(rbio, bio);
 	else
-		set_bio_pages_uptodate(bio);
+		set_bio_pages_uptodate(rbio, bio);
 
 	bio_put(bio);
 
@@ -2079,7 +2112,7 @@ static void raid_recover_end_io(struct bio *bio)
 	if (bio->bi_status)
 		fail_bio_stripe(rbio, bio);
 	else
-		set_bio_pages_uptodate(bio);
+		set_bio_pages_uptodate(rbio, bio);
 	bio_put(bio);
 
 	if (!atomic_dec_and_test(&rbio->stripes_pending))
@@ -2637,7 +2670,7 @@ static void raid56_parity_scrub_end_io(struct bio *bio)
 	if (bio->bi_status)
 		fail_bio_stripe(rbio, bio);
 	else
-		set_bio_pages_uptodate(bio);
+		set_bio_pages_uptodate(rbio, bio);
 
 	bio_put(bio);
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 15/17] btrfs: make steal_rbio() subpage compatible
  2022-04-12  9:32 [PATCH v2 00/17] btrfs: add subpage support for RAID56 Qu Wenruo
                   ` (13 preceding siblings ...)
  2022-04-12  9:33 ` [PATCH v2 14/17] btrfs: make set_bio_pages_uptodate() subpage compatible Qu Wenruo
@ 2022-04-12  9:33 ` Qu Wenruo
  2022-04-12  9:33 ` [PATCH v2 16/17] btrfs: make alloc_rbio_essential_pages() " Qu Wenruo
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Qu Wenruo @ 2022-04-12  9:33 UTC (permalink / raw)
  To: linux-btrfs

Function steal_rbio() takes all the uptodate pages from the source rbio
and puts them into the destination rbio.

With the new stripe_sectors[] array, we also need to do the extra work:

- Check sector::uptodate to make sure the full page is uptodate
  For subpage cases we no longer use the PageUptodate flag to indicate
  whether a page is uptodate.

  Instead we need to check all the sectors belonging to the page to be
  sure whether the full page is uptodate.

  So here we introduce a new helper, full_page_sectors_uptodate(), to do
  the check.

- Update rbio::stripe_sectors[] to use the new page pointer
  We only need to change the page pointer; no need to change anything
  else.
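
The full-page check can be sketched in userspace like this (the page size
and sector layout are parameters here; the kernel version reads them from
the rbio):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct sector_ptr {
	unsigned int uptodate;
};

/*
 * A page is only fully uptodate if every sector inside it is.  With
 * sectorsize == page_size this degenerates to the old single
 * PageUptodate-style check.
 */
static bool page_sectors_uptodate(const struct sector_ptr *sectors,
				  unsigned int page_nr,
				  uint32_t page_size, uint32_t sectorsize)
{
	const uint32_t per_page = page_size / sectorsize;

	for (uint32_t i = per_page * page_nr;
	     i < per_page * (page_nr + 1); i++) {
		if (!sectors[i].uptodate)
			return false;
	}
	return true;
}
```

One partially written page must not be stolen, hence the all-or-nothing
semantics.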

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/raid56.c | 27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 8de5294de7a5..d0f9bee0c10d 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -309,6 +309,25 @@ static int rbio_bucket(struct btrfs_raid_bio *rbio)
 	return hash_64(num >> 16, BTRFS_STRIPE_HASH_TABLE_BITS);
 }
 
+static bool full_page_sectors_uptodate(struct btrfs_raid_bio *rbio,
+				       unsigned int page_nr)
+{
+	const u32 sectorsize = rbio->bioc->fs_info->sectorsize;
+	const u32 sectors_per_page = PAGE_SIZE / sectorsize;
+	bool uptodate = true;
+	int i;
+
+	ASSERT(page_nr < rbio->nr_pages);
+
+	for (i = sectors_per_page * page_nr;
+	     i < sectors_per_page * page_nr + sectors_per_page; i++) {
+		if (!rbio->stripe_sectors[i].uptodate) {
+			uptodate = false;
+			break;
+		}
+	}
+	return uptodate;
+}
 /*
  * Update the stripe_sectors[] array to use correct page and pgoff
  *
@@ -330,8 +349,11 @@ static void index_stripe_sectors(struct btrfs_raid_bio *rbio)
 }
 
 /*
- * stealing an rbio means taking all the uptodate pages from the stripe
+ * Stealing an rbio means taking all the uptodate pages from the stripe
  * array in the source rbio and putting them into the destination rbio
+ *
+ * This will also update the involved stripe_sectors[] which are referring to
+ * the old pages.
  */
 static void steal_rbio(struct btrfs_raid_bio *src, struct btrfs_raid_bio *dest)
 {
@@ -344,9 +366,8 @@ static void steal_rbio(struct btrfs_raid_bio *src, struct btrfs_raid_bio *dest)
 
 	for (i = 0; i < dest->nr_pages; i++) {
 		s = src->stripe_pages[i];
-		if (!s || !PageUptodate(s)) {
+		if (!s || !full_page_sectors_uptodate(src, i))
 			continue;
-		}
 
 		d = dest->stripe_pages[i];
 		if (d)
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 16/17] btrfs: make alloc_rbio_essential_pages() subpage compatible
  2022-04-12  9:32 [PATCH v2 00/17] btrfs: add subpage support for RAID56 Qu Wenruo
                   ` (14 preceding siblings ...)
  2022-04-12  9:33 ` [PATCH v2 15/17] btrfs: make steal_rbio() " Qu Wenruo
@ 2022-04-12  9:33 ` Qu Wenruo
  2022-04-12  9:33 ` [PATCH v2 17/17] btrfs: enable subpage support for RAID56 Qu Wenruo
  2022-04-12 17:42 ` [PATCH v2 00/17] btrfs: add " David Sterba
  17 siblings, 0 replies; 34+ messages in thread
From: Qu Wenruo @ 2022-04-12  9:33 UTC (permalink / raw)
  To: linux-btrfs

The only non-compatible part is the bitmap iteration: the bitmap size is
now extended to rbio::stripe_nsectors, not the old
rbio::stripe_npages.

Since we're here, also slightly improve the function by:

- Rename @i to @stripe
- Rename @bit to @sectornr
- Move @page and @index into the inner loop
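
The reworked loop maps a (stripe, sector) pair to an index into
stripe_pages[]; a userspace model of that math follows (the constants in
the checks are illustrative, not taken from the kernel):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Map (stripe, sectornr) to an index into stripe_pages[].  With
 * sectorsize == PAGE_SIZE this reduces to the old
 * "stripe * stripe_npages + bit" formula; with smaller sectors, several
 * consecutive sectors share one page and thus map to the same index.
 */
static size_t sector_to_page_index(unsigned int stripe,
				   unsigned int stripe_nsectors,
				   unsigned int sectornr,
				   uint32_t sectorsize,
				   unsigned int page_shift)
{
	return ((size_t)(stripe * stripe_nsectors + sectornr) * sectorsize)
		>> page_shift;
}
```

This is why the allocation loop can simply skip an index whose page was
already allocated by an earlier sector of the same page.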

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/raid56.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index d0f9bee0c10d..ef3d3b67098b 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -2422,14 +2422,16 @@ void raid56_add_scrub_pages(struct btrfs_raid_bio *rbio, struct page *page,
  */
 static int alloc_rbio_essential_pages(struct btrfs_raid_bio *rbio)
 {
-	int i;
-	int bit;
-	int index;
-	struct page *page;
+	const u32 sectorsize = rbio->bioc->fs_info->sectorsize;
+	int stripe;
+	int sectornr;
+
+	for_each_set_bit(sectornr, rbio->dbitmap, rbio->stripe_nsectors) {
+		for (stripe = 0; stripe < rbio->real_stripes; stripe++) {
+			struct page *page;
+			int index = (stripe * rbio->stripe_nsectors + sectornr) *
+				    sectorsize >> PAGE_SHIFT;
 
-	for_each_set_bit(bit, rbio->dbitmap, rbio->stripe_npages) {
-		for (i = 0; i < rbio->real_stripes; i++) {
-			index = i * rbio->stripe_npages + bit;
 			if (rbio->stripe_pages[index])
 				continue;
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 17/17] btrfs: enable subpage support for RAID56
  2022-04-12  9:32 [PATCH v2 00/17] btrfs: add subpage support for RAID56 Qu Wenruo
                   ` (15 preceding siblings ...)
  2022-04-12  9:33 ` [PATCH v2 16/17] btrfs: make alloc_rbio_essential_pages() " Qu Wenruo
@ 2022-04-12  9:33 ` Qu Wenruo
  2022-04-12 17:42 ` [PATCH v2 00/17] btrfs: add " David Sterba
  17 siblings, 0 replies; 34+ messages in thread
From: Qu Wenruo @ 2022-04-12  9:33 UTC (permalink / raw)
  To: linux-btrfs

Now that the btrfs RAID56 infrastructure has migrated to the sector_ptr
interface, it should be safe to enable subpage support for btrfs RAID56.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 8 --------
 fs/btrfs/raid56.c  | 6 ------
 fs/btrfs/volumes.c | 7 -------
 3 files changed, 21 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 2bc867d35308..e816943ddb7a 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3682,14 +3682,6 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 		btrfs_warn(fs_info,
 		"read-write for sector size %u with page size %lu is experimental",
 			   sectorsize, PAGE_SIZE);
-		if (btrfs_super_incompat_flags(fs_info->super_copy) &
-			BTRFS_FEATURE_INCOMPAT_RAID56) {
-			btrfs_err(fs_info,
-		"RAID56 is not yet supported for sector size %u with page size %lu",
-				sectorsize, PAGE_SIZE);
-			err = -EINVAL;
-			goto fail_alloc;
-		}
 		subpage_info = kzalloc(sizeof(*subpage_info), GFP_KERNEL);
 		if (!subpage_info)
 			goto fail_alloc;
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index ef3d3b67098b..97a92576a163 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1158,9 +1158,6 @@ static int rbio_add_io_sector(struct btrfs_raid_bio *rbio,
 	ASSERT(sector_nr >= 0 && sector_nr < rbio->stripe_nsectors);
 	ASSERT(sector->page);
 
-	/* We don't yet support subpage, thus pgoff should always be 0 */
-	ASSERT(sector->pgoff == 0);
-
 	stripe = &rbio->bioc->stripes[stripe_nr];
 	disk_start = stripe->physical + sector_nr * sectorsize;
 
@@ -2385,9 +2382,6 @@ struct btrfs_raid_bio *raid56_parity_alloc_scrub_rbio(struct bio *bio,
 	}
 	ASSERT(i < rbio->real_stripes);
 
-	/* Now we just support the sectorsize equals to page size */
-	ASSERT(fs_info->sectorsize == PAGE_SIZE);
-	ASSERT(rbio->stripe_npages == stripe_nsectors);
 	bitmap_copy(rbio->dbitmap, dbitmap, stripe_nsectors);
 
 	/*
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 93072a090fdd..1b75cde5a267 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4063,13 +4063,6 @@ static inline int validate_convert_profile(struct btrfs_fs_info *fs_info,
 	if (!(bargs->flags & BTRFS_BALANCE_ARGS_CONVERT))
 		return true;
 
-	if (fs_info->sectorsize < PAGE_SIZE &&
-		bargs->target & BTRFS_BLOCK_GROUP_RAID56_MASK) {
-		btrfs_err(fs_info,
-		"RAID56 is not yet supported for sectorsize %u with page size %lu",
-			  fs_info->sectorsize, PAGE_SIZE);
-		return false;
-	}
 	/* Profile is valid and does not have bits outside of the allowed set */
 	if (alloc_profile_is_valid(bargs->target, 1) &&
 	    (bargs->target & ~allowed) == 0)
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 00/17] btrfs: add subpage support for RAID56
  2022-04-12  9:32 [PATCH v2 00/17] btrfs: add subpage support for RAID56 Qu Wenruo
                   ` (16 preceding siblings ...)
  2022-04-12  9:33 ` [PATCH v2 17/17] btrfs: enable subpage support for RAID56 Qu Wenruo
@ 2022-04-12 17:42 ` David Sterba
  2022-04-13 14:46   ` David Sterba
  17 siblings, 1 reply; 34+ messages in thread
From: David Sterba @ 2022-04-12 17:42 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Apr 12, 2022 at 05:32:50PM +0800, Qu Wenruo wrote:
> The branch can be fetched from github, based on latest misc-next branch
> (with bio and memory allocation refactors):
> https://github.com/adam900710/linux/tree/subpage_raid56
> 
> [CHANGELOG]
> v2:
> - Rebased to latest misc-next
>   There are several conflicts caused by bio interface change and page
>   allocation update.
> 
> - A new patch to reduce the width of @stripe_len to u32
>   Currently @stripe_len is fixed to 64K, and even in the future we
>   choose to enlarge the value, I see no reason to go beyond 4G for
>   stripe length.
> 
>   Thus change it u32 to avoid some u64-divided-by-u32 situations.
> 
>   This will reduce memory usage for map_lookup (which has a lifespan as
>   long as the mounted fs) and btrfs_io_geometry (which only has a very
>   short lifespan, mostly bounded to bio).
> 
>   Furthermore, add some extra alignment check and use right bit shift
>   to replace involved division to avoid possible problems on 32bit
>   systems.
> 
> - Pack sector_ptr::pgoff and sector_ptr::uptodate into one u32
>   This will reduce memory usage and reduce unaligned memory access
> 
>   Please note that, even with it packed, we still have a 4 bytes padding
>   (it's u64 + u32, thus not perfectly aligned).
>   Without packed attribute, it will cost more memory usage anyway.
> 
> - Call kunmap_local() using address with pgoff
>   As it can handle it without problem, no need to bother extra search
>   just for pgoff.
> 
> - Use "= { 0 }" for structure initialization
> 
> - Reduce comment updates to minimal
>   If one comment line is not really touched, then don't touch it just to
>   fix some bad styles.

v2 updated in for-next, I had a local branch with more changes so I
transferred the changes manually.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 00/17] btrfs: add subpage support for RAID56
  2022-04-12 17:42 ` [PATCH v2 00/17] btrfs: add " David Sterba
@ 2022-04-13 14:46   ` David Sterba
  0 siblings, 0 replies; 34+ messages in thread
From: David Sterba @ 2022-04-13 14:46 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs

On Tue, Apr 12, 2022 at 07:42:25PM +0200, David Sterba wrote:
> On Tue, Apr 12, 2022 at 05:32:50PM +0800, Qu Wenruo wrote:
> > The branch can be fetched from github, based on latest misc-next branch
> > (with bio and memory allocation refactors):
> > https://github.com/adam900710/linux/tree/subpage_raid56
> > 
> > [CHANGELOG]
> > v2:
> > - Rebased to latest misc-next
> >   There are several conflicts caused by bio interface change and page
> >   allocation update.
> > 
> > - A new patch to reduce the width of @stripe_len to u32
> >   Currently @stripe_len is fixed to 64K, and even in the future we
> >   choose to enlarge the value, I see no reason to go beyond 4G for
> >   stripe length.
> > 
> >   Thus change it u32 to avoid some u64-divided-by-u32 situations.
> > 
> >   This will reduce memory usage for map_lookup (which has a lifespan as
> >   long as the mounted fs) and btrfs_io_geometry (which only has a very
> >   short lifespan, mostly bounded to bio).
> > 
> >   Furthermore, add some extra alignment check and use right bit shift
> >   to replace involved division to avoid possible problems on 32bit
> >   systems.
> > 
> > - Pack sector_ptr::pgoff and sector_ptr::uptodate into one u32
> >   This will reduce memory usage and reduce unaligned memory access
> > 
> >   Please note that, even with it packed, we still have a 4 bytes padding
> >   (it's u64 + u32, thus not perfectly aligned).
> >   Without packed attribute, it will cost more memory usage anyway.
> > 
> > - Call kunmap_local() using address with pgoff
> >   As it can handle it without problem, no need to bother extra search
> >   just for pgoff.
> > 
> > - Use "= { 0 }" for structure initialization
> > 
> > - Reduce comment updates to minimal
> >   If one comment line is not really touched, then don't touch it just to
> >   fix some bad styles.
> 
> v2 updated in for-next, I had a local branch with more changes so I
> transferred the changes manually.

Now moved to misc-next as it's an isolated functionality and should not
affect anything else. Thanks.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 07/17] btrfs: make rbio_add_io_page() subpage compatible
  2022-04-12  9:32 ` [PATCH v2 07/17] btrfs: make rbio_add_io_page() subpage compatible Qu Wenruo
@ 2022-04-13 19:14   ` David Sterba
  2022-04-13 23:28     ` Qu Wenruo
  0 siblings, 1 reply; 34+ messages in thread
From: David Sterba @ 2022-04-13 19:14 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

There's an assertion failure reported by btrfs/023

On Tue, Apr 12, 2022 at 05:32:57PM +0800, Qu Wenruo wrote:
> +static int rbio_add_io_sector(struct btrfs_raid_bio *rbio,
> +			      struct bio_list *bio_list,
> +			      struct sector_ptr *sector,
> +			      unsigned int stripe_nr,
> +			      unsigned int sector_nr,
> +			      unsigned long bio_max_len, unsigned int opf)
>  {
> +	const u32 sectorsize = rbio->bioc->fs_info->sectorsize;
>  	struct bio *last = bio_list->tail;
>  	int ret;
>  	struct bio *bio;
>  	struct btrfs_io_stripe *stripe;
>  	u64 disk_start;
>  
> +	/*
> +	 * NOTE: here stripe_nr has taken device replace into consideration,
> +	 * thus it can be larger than rbio->real_stripe.
> +	 * So here we check against bioc->num_stripes, not rbio->real_stripes.
> +	 */
> +	ASSERT(stripe_nr >= 0 && stripe_nr < rbio->bioc->num_stripes);
> +	ASSERT(sector_nr >= 0 && sector_nr < rbio->stripe_nsectors);
> +	ASSERT(sector->page);

This one ^^^

[ 2280.705765] assertion failed: sector->page, in fs/btrfs/raid56.c:1145
[ 2280.707844] ------------[ cut here ]------------
[ 2280.709401] kernel BUG at fs/btrfs/ctree.h:3614!
[ 2280.711084] invalid opcode: 0000 [#1] PREEMPT SMP
[ 2280.712822] CPU: 3 PID: 4084 Comm: kworker/u8:2 Not tainted 5.18.0-rc2-default+ #1697
[ 2280.715648] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
[ 2280.719656] Workqueue: btrfs-rmw btrfs_work_helper [btrfs]
[ 2280.721775] RIP: 0010:assertfail.constprop.0+0x18/0x1a [btrfs]
[ 2280.729575] RSP: 0018:ffffad6d071afda0 EFLAGS: 00010246
[ 2280.730844] RAX: 0000000000000039 RBX: 0000000000000000 RCX: 0000000000000000
[ 2280.732449] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 00000000ffffffff
[ 2280.733992] RBP: ffff8e51d5465000 R08: 0000000000000003 R09: 0000000000000002
[ 2280.735535] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000003
[ 2280.737093] R13: ffff8e51d5465000 R14: ffff8e51d5465d78 R15: 0000000000001000
[ 2280.738613] FS:  0000000000000000(0000) GS:ffff8e523dc00000(0000) knlGS:0000000000000000
[ 2280.740392] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2280.741794] CR2: 000055f48fe51ab0 CR3: 000000001f805001 CR4: 0000000000170ea0
[ 2280.743532] Call Trace:
[ 2280.744136]  <TASK>
[ 2280.744701]  rbio_add_io_sector.cold+0x11/0x33 [btrfs]
[ 2280.745846]  ? _raw_spin_unlock_irq+0x2f/0x50
[ 2280.746782]  raid56_rmw_stripe.isra.0+0x153/0x320 [btrfs]
[ 2280.747965]  btrfs_work_helper+0xd6/0x1d0 [btrfs]
[ 2280.749018]  process_one_work+0x264/0x5f0
[ 2280.749806]  worker_thread+0x52/0x3b0
[ 2280.750523]  ? process_one_work+0x5f0/0x5f0
[ 2280.751395]  kthread+0xea/0x110
[ 2280.752097]  ? kthread_complete_and_exit+0x20/0x20
[ 2280.753112]  ret_from_fork+0x1f/0x30


> +
> +	/* We don't yet support subpage, thus pgoff should always be 0 */
> +	ASSERT(sector->pgoff == 0);
> +
>  	stripe = &rbio->bioc->stripes[stripe_nr];
> -	disk_start = stripe->physical + (page_index << PAGE_SHIFT);
> +	disk_start = stripe->physical + sector_nr * sectorsize;
>  
>  	/* if the device is missing, just fail this stripe */
>  	if (!stripe->dev->bdev)
> @@ -1156,8 +1226,9 @@ static int rbio_add_io_page(struct btrfs_raid_bio *rbio,
>  		 */
>  		if (last_end == disk_start && !last->bi_status &&
>  		    last->bi_bdev == stripe->dev->bdev) {
> -			ret = bio_add_page(last, page, PAGE_SIZE, 0);
> -			if (ret == PAGE_SIZE)
> +			ret = bio_add_page(last, sector->page, sectorsize,
> +					   sector->pgoff);
> +			if (ret == sectorsize)
>  				return 0;
>  		}
>  	}
> @@ -1168,7 +1239,7 @@ static int rbio_add_io_page(struct btrfs_raid_bio *rbio,
>  	bio->bi_iter.bi_sector = disk_start >> 9;
>  	bio->bi_private = rbio;
>  
> -	bio_add_page(bio, page, PAGE_SIZE, 0);
> +	bio_add_page(bio, sector->page, sectorsize, sector->pgoff);
>  	bio_list_add(bio_list, bio);
>  	return 0;
>  }
> @@ -1265,7 +1336,7 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
>  	void **pointers = rbio->finish_pointers;
>  	int nr_data = rbio->nr_data;
>  	int stripe;
> -	int pagenr;
> +	int sectornr;
>  	bool has_qstripe;
>  	struct bio_list bio_list;
>  	struct bio *bio;
> @@ -1309,16 +1380,16 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
>  	else
>  		clear_bit(RBIO_CACHE_READY_BIT, &rbio->flags);
>  
> -	for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
> +	for (sectornr = 0; sectornr < rbio->stripe_nsectors; sectornr++) {
>  		struct page *p;
>  		/* first collect one page from each data stripe */
>  		for (stripe = 0; stripe < nr_data; stripe++) {
> -			p = page_in_rbio(rbio, stripe, pagenr, 0);
> +			p = page_in_rbio(rbio, stripe, sectornr, 0);
>  			pointers[stripe] = kmap_local_page(p);
>  		}
>  
>  		/* then add the parity stripe */
> -		p = rbio_pstripe_page(rbio, pagenr);
> +		p = rbio_pstripe_page(rbio, sectornr);
>  		SetPageUptodate(p);
>  		pointers[stripe++] = kmap_local_page(p);
>  
> @@ -1328,7 +1399,7 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
>  			 * raid6, add the qstripe and call the
>  			 * library function to fill in our p/q
>  			 */
> -			p = rbio_qstripe_page(rbio, pagenr);
> +			p = rbio_qstripe_page(rbio, sectornr);
>  			SetPageUptodate(p);
>  			pointers[stripe++] = kmap_local_page(p);
>  
> @@ -1349,19 +1420,19 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
>  	 * everything else.
>  	 */
>  	for (stripe = 0; stripe < rbio->real_stripes; stripe++) {
> -		for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
> -			struct page *page;
> +		for (sectornr = 0; sectornr < rbio->stripe_nsectors; sectornr++) {
> +			struct sector_ptr *sector;
> +
>  			if (stripe < rbio->nr_data) {
> -				page = page_in_rbio(rbio, stripe, pagenr, 1);
> -				if (!page)
> +				sector = sector_in_rbio(rbio, stripe, sectornr, 1);
> +				if (!sector)
>  					continue;
>  			} else {
> -			       page = rbio_stripe_page(rbio, stripe, pagenr);
> +				sector = rbio_stripe_sector(rbio, stripe, sectornr);
>  			}
>  
> -			ret = rbio_add_io_page(rbio, &bio_list,
> -				       page, stripe, pagenr, rbio->stripe_len,
> -				       REQ_OP_WRITE);
> +			ret = rbio_add_io_sector(rbio, &bio_list, sector, stripe,
> +						 sectornr, rbio->stripe_len, REQ_OP_WRITE);
>  			if (ret)
>  				goto cleanup;
>  		}
> @@ -1374,20 +1445,21 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
>  		if (!bioc->tgtdev_map[stripe])
>  			continue;
>  
> -		for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
> -			struct page *page;
> +		for (sectornr = 0; sectornr < rbio->stripe_nsectors; sectornr++) {
> +			struct sector_ptr *sector;
> +
>  			if (stripe < rbio->nr_data) {
> -				page = page_in_rbio(rbio, stripe, pagenr, 1);
> -				if (!page)
> +				sector = sector_in_rbio(rbio, stripe, sectornr, 1);
> +				if (!sector)
>  					continue;
>  			} else {
> -			       page = rbio_stripe_page(rbio, stripe, pagenr);
> +				sector = rbio_stripe_sector(rbio, stripe, sectornr);
>  			}
>  
> -			ret = rbio_add_io_page(rbio, &bio_list, page,
> -					       rbio->bioc->tgtdev_map[stripe],
> -					       pagenr, rbio->stripe_len,
> -					       REQ_OP_WRITE);
> +			ret = rbio_add_io_sector(rbio, &bio_list, sector,
> +						 rbio->bioc->tgtdev_map[stripe],
> +						 sectornr, rbio->stripe_len,
> +						 REQ_OP_WRITE);
>  			if (ret)
>  				goto cleanup;
>  		}
> @@ -1563,7 +1635,7 @@ static int raid56_rmw_stripe(struct btrfs_raid_bio *rbio)
>  	int bios_to_read = 0;
>  	struct bio_list bio_list;
>  	int ret;
> -	int pagenr;
> +	int sectornr;
>  	int stripe;
>  	struct bio *bio;
>  
> @@ -1581,28 +1653,29 @@ static int raid56_rmw_stripe(struct btrfs_raid_bio *rbio)
>  	 * stripe
>  	 */
>  	for (stripe = 0; stripe < rbio->nr_data; stripe++) {
> -		for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
> -			struct page *page;
> +		for (sectornr = 0; sectornr < rbio->stripe_nsectors; sectornr++) {
> +			struct sector_ptr *sector;
> +
>  			/*
> -			 * we want to find all the pages missing from
> +			 * We want to find all the sectors missing from
>  			 * the rbio and read them from the disk.  If
> -			 * page_in_rbio finds a page in the bio list
> +			 * sector_in_rbio() finds a page in the bio list
>  			 * we don't need to read it off the stripe.
>  			 */
> -			page = page_in_rbio(rbio, stripe, pagenr, 1);
> -			if (page)
> +			sector = sector_in_rbio(rbio, stripe, sectornr, 1);
> +			if (sector)
>  				continue;
>  
> -			page = rbio_stripe_page(rbio, stripe, pagenr);
> +			sector = rbio_stripe_sector(rbio, stripe, sectornr);
>  			/*
> -			 * the bio cache may have handed us an uptodate
> +			 * The bio cache may have handed us an uptodate
>  			 * page.  If so, be happy and use it
>  			 */
> -			if (PageUptodate(page))
> +			if (sector->uptodate)
>  				continue;
>  
> -			ret = rbio_add_io_page(rbio, &bio_list, page,
> -				       stripe, pagenr, rbio->stripe_len,
> +			ret = rbio_add_io_sector(rbio, &bio_list, sector,
> +				       stripe, sectornr, rbio->stripe_len,
>  				       REQ_OP_READ);
>  			if (ret)
>  				goto cleanup;
> @@ -2107,7 +2180,7 @@ static int __raid56_parity_recover(struct btrfs_raid_bio *rbio)
>  	int bios_to_read = 0;
>  	struct bio_list bio_list;
>  	int ret;
> -	int pagenr;
> +	int sectornr;
>  	int stripe;
>  	struct bio *bio;
>  
> @@ -2130,21 +2203,20 @@ static int __raid56_parity_recover(struct btrfs_raid_bio *rbio)
>  			continue;
>  		}
>  
> -		for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
> -			struct page *p;
> +		for (sectornr = 0; sectornr < rbio->stripe_nsectors; sectornr++) {
> +			struct sector_ptr *sector;
>  
>  			/*
>  			 * the rmw code may have already read this
>  			 * page in
>  			 */
> -			p = rbio_stripe_page(rbio, stripe, pagenr);
> -			if (PageUptodate(p))
> +			sector = rbio_stripe_sector(rbio, stripe, sectornr);
> +			if (sector->uptodate)
>  				continue;
>  
> -			ret = rbio_add_io_page(rbio, &bio_list,
> -				       rbio_stripe_page(rbio, stripe, pagenr),
> -				       stripe, pagenr, rbio->stripe_len,
> -				       REQ_OP_READ);
> +			ret = rbio_add_io_sector(rbio, &bio_list, sector,
> +						 stripe, sectornr,
> +						 rbio->stripe_len, REQ_OP_READ);
>  			if (ret < 0)
>  				goto cleanup;
>  		}
> @@ -2399,7 +2471,7 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
>  	unsigned long *pbitmap = rbio->finish_pbitmap;
>  	int nr_data = rbio->nr_data;
>  	int stripe;
> -	int pagenr;
> +	int sectornr;
>  	bool has_qstripe;
>  	struct page *p_page = NULL;
>  	struct page *q_page = NULL;
> @@ -2419,7 +2491,7 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
>  
>  	if (bioc->num_tgtdevs && bioc->tgtdev_map[rbio->scrubp]) {
>  		is_replace = 1;
> -		bitmap_copy(pbitmap, rbio->dbitmap, rbio->stripe_npages);
> +		bitmap_copy(pbitmap, rbio->dbitmap, rbio->stripe_nsectors);
>  	}
>  
>  	/*
> @@ -2453,12 +2525,12 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
>  	/* Map the parity stripe just once */
>  	pointers[nr_data] = kmap_local_page(p_page);
>  
> -	for_each_set_bit(pagenr, rbio->dbitmap, rbio->stripe_npages) {
> +	for_each_set_bit(sectornr, rbio->dbitmap, rbio->stripe_nsectors) {
>  		struct page *p;
>  		void *parity;
>  		/* first collect one page from each data stripe */
>  		for (stripe = 0; stripe < nr_data; stripe++) {
> -			p = page_in_rbio(rbio, stripe, pagenr, 0);
> +			p = page_in_rbio(rbio, stripe, sectornr, 0);
>  			pointers[stripe] = kmap_local_page(p);
>  		}
>  
> @@ -2473,13 +2545,13 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
>  		}
>  
>  		/* Check scrubbing parity and repair it */
> -		p = rbio_stripe_page(rbio, rbio->scrubp, pagenr);
> +		p = rbio_stripe_page(rbio, rbio->scrubp, sectornr);
>  		parity = kmap_local_page(p);
>  		if (memcmp(parity, pointers[rbio->scrubp], PAGE_SIZE))
>  			copy_page(parity, pointers[rbio->scrubp]);
>  		else
>  			/* Parity is right, needn't writeback */
> -			bitmap_clear(rbio->dbitmap, pagenr, 1);
> +			bitmap_clear(rbio->dbitmap, sectornr, 1);
>  		kunmap_local(parity);
>  
>  		for (stripe = nr_data - 1; stripe >= 0; stripe--)
> @@ -2499,12 +2571,13 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
>  	 * higher layers (the bio_list in our rbio) and our p/q.  Ignore
>  	 * everything else.
>  	 */
> -	for_each_set_bit(pagenr, rbio->dbitmap, rbio->stripe_npages) {
> -		struct page *page;
> +	for_each_set_bit(sectornr, rbio->dbitmap, rbio->stripe_nsectors) {
> +		struct sector_ptr *sector;
>  
> -		page = rbio_stripe_page(rbio, rbio->scrubp, pagenr);
> -		ret = rbio_add_io_page(rbio, &bio_list, page, rbio->scrubp,
> -				       pagenr, rbio->stripe_len, REQ_OP_WRITE);
> +		sector = rbio_stripe_sector(rbio, rbio->scrubp, sectornr);
> +		ret = rbio_add_io_sector(rbio, &bio_list, sector, rbio->scrubp,
> +					 sectornr, rbio->stripe_len,
> +					 REQ_OP_WRITE);
>  		if (ret)
>  			goto cleanup;
>  	}
> @@ -2512,13 +2585,13 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
>  	if (!is_replace)
>  		goto submit_write;
>  
> -	for_each_set_bit(pagenr, pbitmap, rbio->stripe_npages) {
> -		struct page *page;
> +	for_each_set_bit(sectornr, pbitmap, rbio->stripe_nsectors) {
> +		struct sector_ptr *sector;
>  
> -		page = rbio_stripe_page(rbio, rbio->scrubp, pagenr);
> -		ret = rbio_add_io_page(rbio, &bio_list, page,
> +		sector = rbio_stripe_sector(rbio, rbio->scrubp, sectornr);
> +		ret = rbio_add_io_sector(rbio, &bio_list, sector,
>  				       bioc->tgtdev_map[rbio->scrubp],
> -				       pagenr, rbio->stripe_len, REQ_OP_WRITE);
> +				       sectornr, rbio->stripe_len, REQ_OP_WRITE);
>  		if (ret)
>  			goto cleanup;
>  	}
> @@ -2650,7 +2723,7 @@ static void raid56_parity_scrub_stripe(struct btrfs_raid_bio *rbio)
>  	int bios_to_read = 0;
>  	struct bio_list bio_list;
>  	int ret;
> -	int pagenr;
> +	int sectornr;
>  	int stripe;
>  	struct bio *bio;
>  
> @@ -2666,28 +2739,30 @@ static void raid56_parity_scrub_stripe(struct btrfs_raid_bio *rbio)
>  	 * stripe
>  	 */
>  	for (stripe = 0; stripe < rbio->real_stripes; stripe++) {
> -		for_each_set_bit(pagenr, rbio->dbitmap, rbio->stripe_npages) {
> -			struct page *page;
> +		for_each_set_bit(sectornr, rbio->dbitmap,
> +				 rbio->stripe_nsectors) {
> +			struct sector_ptr *sector;
>  			/*
> -			 * we want to find all the pages missing from
> +			 * We want to find all the sectors missing from
>  			 * the rbio and read them from the disk.  If
> -			 * page_in_rbio finds a page in the bio list
> +			 * sector_in_rbio() finds a sector in the bio list
>  			 * we don't need to read it off the stripe.
>  			 */
> -			page = page_in_rbio(rbio, stripe, pagenr, 1);
> -			if (page)
> +			sector = sector_in_rbio(rbio, stripe, sectornr, 1);
> +			if (sector)
>  				continue;
>  
> -			page = rbio_stripe_page(rbio, stripe, pagenr);
> +			sector = rbio_stripe_sector(rbio, stripe, sectornr);
>  			/*
> -			 * the bio cache may have handed us an uptodate
> -			 * page.  If so, be happy and use it
> +			 * The bio cache may have handed us an uptodate
> +			 * sector.  If so, be happy and use it
>  			 */
> -			if (PageUptodate(page))
> +			if (sector->uptodate)
>  				continue;
>  
> -			ret = rbio_add_io_page(rbio, &bio_list, page, stripe,
> -					       pagenr, rbio->stripe_len, REQ_OP_READ);
> +			ret = rbio_add_io_sector(rbio, &bio_list, sector,
> +						 stripe, sectornr,
> +						 rbio->stripe_len, REQ_OP_READ);
>  			if (ret)
>  				goto cleanup;
>  		}
> -- 
> 2.35.1

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 07/17] btrfs: make rbio_add_io_page() subpage compatible
  2022-04-13 19:14   ` David Sterba
@ 2022-04-13 23:28     ` Qu Wenruo
  2022-04-14  0:43       ` Qu Wenruo
  0 siblings, 1 reply; 34+ messages in thread
From: Qu Wenruo @ 2022-04-13 23:28 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs



On 2022/4/14 03:14, David Sterba wrote:
> There's an assertion failure reported by btrfs/023
>
> On Tue, Apr 12, 2022 at 05:32:57PM +0800, Qu Wenruo wrote:
>> +static int rbio_add_io_sector(struct btrfs_raid_bio *rbio,
>> +			      struct bio_list *bio_list,
>> +			      struct sector_ptr *sector,
>> +			      unsigned int stripe_nr,
>> +			      unsigned int sector_nr,
>> +			      unsigned long bio_max_len, unsigned int opf)
>>   {
>> +	const u32 sectorsize = rbio->bioc->fs_info->sectorsize;
>>   	struct bio *last = bio_list->tail;
>>   	int ret;
>>   	struct bio *bio;
>>   	struct btrfs_io_stripe *stripe;
>>   	u64 disk_start;
>>
>> +	/*
>> +	 * NOTE: here stripe_nr has taken device replace into consideration,
>> +	 * thus it can be larger than rbio->real_stripe.
>> +	 * So here we check against bioc->num_stripes, not rbio->real_stripes.
>> +	 */
>> +	ASSERT(stripe_nr >= 0 && stripe_nr < rbio->bioc->num_stripes);
>> +	ASSERT(sector_nr >= 0 && sector_nr < rbio->stripe_nsectors);
>> +	ASSERT(sector->page);
>
> This one ^^^

Failed to reproduce here, both x86_64 and aarch64 survived over 64 loops
of btrfs/023.

Although that's with my github branch. Let me check the current branch
to see what's wrong.

Thanks,
Qu

>
> [ 2280.705765] assertion failed: sector->page, in fs/btrfs/raid56.c:1145
> [ 2280.707844] ------------[ cut here ]------------
> [ 2280.709401] kernel BUG at fs/btrfs/ctree.h:3614!
> [ 2280.711084] invalid opcode: 0000 [#1] PREEMPT SMP
> [ 2280.712822] CPU: 3 PID: 4084 Comm: kworker/u8:2 Not tainted 5.18.0-rc2-default+ #1697
> [ 2280.715648] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
> [ 2280.719656] Workqueue: btrfs-rmw btrfs_work_helper [btrfs]
> [ 2280.721775] RIP: 0010:assertfail.constprop.0+0x18/0x1a [btrfs]
> [ 2280.729575] RSP: 0018:ffffad6d071afda0 EFLAGS: 00010246
> [ 2280.730844] RAX: 0000000000000039 RBX: 0000000000000000 RCX: 0000000000000000
> [ 2280.732449] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 00000000ffffffff
> [ 2280.733992] RBP: ffff8e51d5465000 R08: 0000000000000003 R09: 0000000000000002
> [ 2280.735535] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000003
> [ 2280.737093] R13: ffff8e51d5465000 R14: ffff8e51d5465d78 R15: 0000000000001000
> [ 2280.738613] FS:  0000000000000000(0000) GS:ffff8e523dc00000(0000) knlGS:0000000000000000
> [ 2280.740392] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2280.741794] CR2: 000055f48fe51ab0 CR3: 000000001f805001 CR4: 0000000000170ea0
> [ 2280.743532] Call Trace:
> [ 2280.744136]  <TASK>
> [ 2280.744701]  rbio_add_io_sector.cold+0x11/0x33 [btrfs]
> [ 2280.745846]  ? _raw_spin_unlock_irq+0x2f/0x50
> [ 2280.746782]  raid56_rmw_stripe.isra.0+0x153/0x320 [btrfs]
> [ 2280.747965]  btrfs_work_helper+0xd6/0x1d0 [btrfs]
> [ 2280.749018]  process_one_work+0x264/0x5f0
> [ 2280.749806]  worker_thread+0x52/0x3b0
> [ 2280.750523]  ? process_one_work+0x5f0/0x5f0
> [ 2280.751395]  kthread+0xea/0x110
> [ 2280.752097]  ? kthread_complete_and_exit+0x20/0x20
> [ 2280.753112]  ret_from_fork+0x1f/0x30
>
>
>> +
>> +	/* We don't yet support subpage, thus pgoff should always be 0 */
>> +	ASSERT(sector->pgoff == 0);
>> +
>>   	stripe = &rbio->bioc->stripes[stripe_nr];
>> -	disk_start = stripe->physical + (page_index << PAGE_SHIFT);
>> +	disk_start = stripe->physical + sector_nr * sectorsize;
>>
>>   	/* if the device is missing, just fail this stripe */
>>   	if (!stripe->dev->bdev)
>> @@ -1156,8 +1226,9 @@ static int rbio_add_io_page(struct btrfs_raid_bio *rbio,
>>   		 */
>>   		if (last_end == disk_start && !last->bi_status &&
>>   		    last->bi_bdev == stripe->dev->bdev) {
>> -			ret = bio_add_page(last, page, PAGE_SIZE, 0);
>> -			if (ret == PAGE_SIZE)
>> +			ret = bio_add_page(last, sector->page, sectorsize,
>> +					   sector->pgoff);
>> +			if (ret == sectorsize)
>>   				return 0;
>>   		}
>>   	}
>> @@ -1168,7 +1239,7 @@ static int rbio_add_io_page(struct btrfs_raid_bio *rbio,
>>   	bio->bi_iter.bi_sector = disk_start >> 9;
>>   	bio->bi_private = rbio;
>>
>> -	bio_add_page(bio, page, PAGE_SIZE, 0);
>> +	bio_add_page(bio, sector->page, sectorsize, sector->pgoff);
>>   	bio_list_add(bio_list, bio);
>>   	return 0;
>>   }
>> @@ -1265,7 +1336,7 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
>>   	void **pointers = rbio->finish_pointers;
>>   	int nr_data = rbio->nr_data;
>>   	int stripe;
>> -	int pagenr;
>> +	int sectornr;
>>   	bool has_qstripe;
>>   	struct bio_list bio_list;
>>   	struct bio *bio;
>> @@ -1309,16 +1380,16 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
>>   	else
>>   		clear_bit(RBIO_CACHE_READY_BIT, &rbio->flags);
>>
>> -	for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
>> +	for (sectornr = 0; sectornr < rbio->stripe_nsectors; sectornr++) {
>>   		struct page *p;
>>   		/* first collect one page from each data stripe */
>>   		for (stripe = 0; stripe < nr_data; stripe++) {
>> -			p = page_in_rbio(rbio, stripe, pagenr, 0);
>> +			p = page_in_rbio(rbio, stripe, sectornr, 0);
>>   			pointers[stripe] = kmap_local_page(p);
>>   		}
>>
>>   		/* then add the parity stripe */
>> -		p = rbio_pstripe_page(rbio, pagenr);
>> +		p = rbio_pstripe_page(rbio, sectornr);
>>   		SetPageUptodate(p);
>>   		pointers[stripe++] = kmap_local_page(p);
>>
>> @@ -1328,7 +1399,7 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
>>   			 * raid6, add the qstripe and call the
>>   			 * library function to fill in our p/q
>>   			 */
>> -			p = rbio_qstripe_page(rbio, pagenr);
>> +			p = rbio_qstripe_page(rbio, sectornr);
>>   			SetPageUptodate(p);
>>   			pointers[stripe++] = kmap_local_page(p);
>>
>> @@ -1349,19 +1420,19 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
>>   	 * everything else.
>>   	 */
>>   	for (stripe = 0; stripe < rbio->real_stripes; stripe++) {
>> -		for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
>> -			struct page *page;
>> +		for (sectornr = 0; sectornr < rbio->stripe_nsectors; sectornr++) {
>> +			struct sector_ptr *sector;
>> +
>>   			if (stripe < rbio->nr_data) {
>> -				page = page_in_rbio(rbio, stripe, pagenr, 1);
>> -				if (!page)
>> +				sector = sector_in_rbio(rbio, stripe, sectornr, 1);
>> +				if (!sector)
>>   					continue;
>>   			} else {
>> -			       page = rbio_stripe_page(rbio, stripe, pagenr);
>> +				sector = rbio_stripe_sector(rbio, stripe, sectornr);
>>   			}
>>
>> -			ret = rbio_add_io_page(rbio, &bio_list,
>> -				       page, stripe, pagenr, rbio->stripe_len,
>> -				       REQ_OP_WRITE);
>> +			ret = rbio_add_io_sector(rbio, &bio_list, sector, stripe,
>> +						 sectornr, rbio->stripe_len, REQ_OP_WRITE);
>>   			if (ret)
>>   				goto cleanup;
>>   		}
>> @@ -1374,20 +1445,21 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
>>   		if (!bioc->tgtdev_map[stripe])
>>   			continue;
>>
>> -		for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
>> -			struct page *page;
>> +		for (sectornr = 0; sectornr < rbio->stripe_nsectors; sectornr++) {
>> +			struct sector_ptr *sector;
>> +
>>   			if (stripe < rbio->nr_data) {
>> -				page = page_in_rbio(rbio, stripe, pagenr, 1);
>> -				if (!page)
>> +				sector = sector_in_rbio(rbio, stripe, sectornr, 1);
>> +				if (!sector)
>>   					continue;
>>   			} else {
>> -			       page = rbio_stripe_page(rbio, stripe, pagenr);
>> +				sector = rbio_stripe_sector(rbio, stripe, sectornr);
>>   			}
>>
>> -			ret = rbio_add_io_page(rbio, &bio_list, page,
>> -					       rbio->bioc->tgtdev_map[stripe],
>> -					       pagenr, rbio->stripe_len,
>> -					       REQ_OP_WRITE);
>> +			ret = rbio_add_io_sector(rbio, &bio_list, sector,
>> +						 rbio->bioc->tgtdev_map[stripe],
>> +						 sectornr, rbio->stripe_len,
>> +						 REQ_OP_WRITE);
>>   			if (ret)
>>   				goto cleanup;
>>   		}
>> @@ -1563,7 +1635,7 @@ static int raid56_rmw_stripe(struct btrfs_raid_bio *rbio)
>>   	int bios_to_read = 0;
>>   	struct bio_list bio_list;
>>   	int ret;
>> -	int pagenr;
>> +	int sectornr;
>>   	int stripe;
>>   	struct bio *bio;
>>
>> @@ -1581,28 +1653,29 @@ static int raid56_rmw_stripe(struct btrfs_raid_bio *rbio)
>>   	 * stripe
>>   	 */
>>   	for (stripe = 0; stripe < rbio->nr_data; stripe++) {
>> -		for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
>> -			struct page *page;
>> +		for (sectornr = 0; sectornr < rbio->stripe_nsectors; sectornr++) {
>> +			struct sector_ptr *sector;
>> +
>>   			/*
>> -			 * we want to find all the pages missing from
>> +			 * We want to find all the sectors missing from
>>   			 * the rbio and read them from the disk.  If
>> -			 * page_in_rbio finds a page in the bio list
>> +			 * sector_in_rbio() finds a page in the bio list
>>   			 * we don't need to read it off the stripe.
>>   			 */
>> -			page = page_in_rbio(rbio, stripe, pagenr, 1);
>> -			if (page)
>> +			sector = sector_in_rbio(rbio, stripe, sectornr, 1);
>> +			if (sector)
>>   				continue;
>>
>> -			page = rbio_stripe_page(rbio, stripe, pagenr);
>> +			sector = rbio_stripe_sector(rbio, stripe, sectornr);
>>   			/*
>> -			 * the bio cache may have handed us an uptodate
>> +			 * The bio cache may have handed us an uptodate
>>   			 * page.  If so, be happy and use it
>>   			 */
>> -			if (PageUptodate(page))
>> +			if (sector->uptodate)
>>   				continue;
>>
>> -			ret = rbio_add_io_page(rbio, &bio_list, page,
>> -				       stripe, pagenr, rbio->stripe_len,
>> +			ret = rbio_add_io_sector(rbio, &bio_list, sector,
>> +				       stripe, sectornr, rbio->stripe_len,
>>   				       REQ_OP_READ);
>>   			if (ret)
>>   				goto cleanup;
>> @@ -2107,7 +2180,7 @@ static int __raid56_parity_recover(struct btrfs_raid_bio *rbio)
>>   	int bios_to_read = 0;
>>   	struct bio_list bio_list;
>>   	int ret;
>> -	int pagenr;
>> +	int sectornr;
>>   	int stripe;
>>   	struct bio *bio;
>>
>> @@ -2130,21 +2203,20 @@ static int __raid56_parity_recover(struct btrfs_raid_bio *rbio)
>>   			continue;
>>   		}
>>
>> -		for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
>> -			struct page *p;
>> +		for (sectornr = 0; sectornr < rbio->stripe_nsectors; sectornr++) {
>> +			struct sector_ptr *sector;
>>
>>   			/*
>>   			 * the rmw code may have already read this
>>   			 * page in
>>   			 */
>> -			p = rbio_stripe_page(rbio, stripe, pagenr);
>> -			if (PageUptodate(p))
>> +			sector = rbio_stripe_sector(rbio, stripe, sectornr);
>> +			if (sector->uptodate)
>>   				continue;
>>
>> -			ret = rbio_add_io_page(rbio, &bio_list,
>> -				       rbio_stripe_page(rbio, stripe, pagenr),
>> -				       stripe, pagenr, rbio->stripe_len,
>> -				       REQ_OP_READ);
>> +			ret = rbio_add_io_sector(rbio, &bio_list, sector,
>> +						 stripe, sectornr,
>> +						 rbio->stripe_len, REQ_OP_READ);
>>   			if (ret < 0)
>>   				goto cleanup;
>>   		}
>> @@ -2399,7 +2471,7 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
>>   	unsigned long *pbitmap = rbio->finish_pbitmap;
>>   	int nr_data = rbio->nr_data;
>>   	int stripe;
>> -	int pagenr;
>> +	int sectornr;
>>   	bool has_qstripe;
>>   	struct page *p_page = NULL;
>>   	struct page *q_page = NULL;
>> @@ -2419,7 +2491,7 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
>>
>>   	if (bioc->num_tgtdevs && bioc->tgtdev_map[rbio->scrubp]) {
>>   		is_replace = 1;
>> -		bitmap_copy(pbitmap, rbio->dbitmap, rbio->stripe_npages);
>> +		bitmap_copy(pbitmap, rbio->dbitmap, rbio->stripe_nsectors);
>>   	}
>>
>>   	/*
>> @@ -2453,12 +2525,12 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
>>   	/* Map the parity stripe just once */
>>   	pointers[nr_data] = kmap_local_page(p_page);
>>
>> -	for_each_set_bit(pagenr, rbio->dbitmap, rbio->stripe_npages) {
>> +	for_each_set_bit(sectornr, rbio->dbitmap, rbio->stripe_nsectors) {
>>   		struct page *p;
>>   		void *parity;
>>   		/* first collect one page from each data stripe */
>>   		for (stripe = 0; stripe < nr_data; stripe++) {
>> -			p = page_in_rbio(rbio, stripe, pagenr, 0);
>> +			p = page_in_rbio(rbio, stripe, sectornr, 0);
>>   			pointers[stripe] = kmap_local_page(p);
>>   		}
>>
>> @@ -2473,13 +2545,13 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
>>   		}
>>
>>   		/* Check scrubbing parity and repair it */
>> -		p = rbio_stripe_page(rbio, rbio->scrubp, pagenr);
>> +		p = rbio_stripe_page(rbio, rbio->scrubp, sectornr);
>>   		parity = kmap_local_page(p);
>>   		if (memcmp(parity, pointers[rbio->scrubp], PAGE_SIZE))
>>   			copy_page(parity, pointers[rbio->scrubp]);
>>   		else
>>   			/* Parity is right, needn't writeback */
>> -			bitmap_clear(rbio->dbitmap, pagenr, 1);
>> +			bitmap_clear(rbio->dbitmap, sectornr, 1);
>>   		kunmap_local(parity);
>>
>>   		for (stripe = nr_data - 1; stripe >= 0; stripe--)
>> @@ -2499,12 +2571,13 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
>>   	 * higher layers (the bio_list in our rbio) and our p/q.  Ignore
>>   	 * everything else.
>>   	 */
>> -	for_each_set_bit(pagenr, rbio->dbitmap, rbio->stripe_npages) {
>> -		struct page *page;
>> +	for_each_set_bit(sectornr, rbio->dbitmap, rbio->stripe_nsectors) {
>> +		struct sector_ptr *sector;
>>
>> -		page = rbio_stripe_page(rbio, rbio->scrubp, pagenr);
>> -		ret = rbio_add_io_page(rbio, &bio_list, page, rbio->scrubp,
>> -				       pagenr, rbio->stripe_len, REQ_OP_WRITE);
>> +		sector = rbio_stripe_sector(rbio, rbio->scrubp, sectornr);
>> +		ret = rbio_add_io_sector(rbio, &bio_list, sector, rbio->scrubp,
>> +					 sectornr, rbio->stripe_len,
>> +					 REQ_OP_WRITE);
>>   		if (ret)
>>   			goto cleanup;
>>   	}
>> @@ -2512,13 +2585,13 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
>>   	if (!is_replace)
>>   		goto submit_write;
>>
>> -	for_each_set_bit(pagenr, pbitmap, rbio->stripe_npages) {
>> -		struct page *page;
>> +	for_each_set_bit(sectornr, pbitmap, rbio->stripe_nsectors) {
>> +		struct sector_ptr *sector;
>>
>> -		page = rbio_stripe_page(rbio, rbio->scrubp, pagenr);
>> -		ret = rbio_add_io_page(rbio, &bio_list, page,
>> +		sector = rbio_stripe_sector(rbio, rbio->scrubp, sectornr);
>> +		ret = rbio_add_io_sector(rbio, &bio_list, sector,
>>   				       bioc->tgtdev_map[rbio->scrubp],
>> -				       pagenr, rbio->stripe_len, REQ_OP_WRITE);
>> +				       sectornr, rbio->stripe_len, REQ_OP_WRITE);
>>   		if (ret)
>>   			goto cleanup;
>>   	}
>> @@ -2650,7 +2723,7 @@ static void raid56_parity_scrub_stripe(struct btrfs_raid_bio *rbio)
>>   	int bios_to_read = 0;
>>   	struct bio_list bio_list;
>>   	int ret;
>> -	int pagenr;
>> +	int sectornr;
>>   	int stripe;
>>   	struct bio *bio;
>>
>> @@ -2666,28 +2739,30 @@ static void raid56_parity_scrub_stripe(struct btrfs_raid_bio *rbio)
>>   	 * stripe
>>   	 */
>>   	for (stripe = 0; stripe < rbio->real_stripes; stripe++) {
>> -		for_each_set_bit(pagenr, rbio->dbitmap, rbio->stripe_npages) {
>> -			struct page *page;
>> +		for_each_set_bit(sectornr, rbio->dbitmap,
>> +				 rbio->stripe_nsectors) {
>> +			struct sector_ptr *sector;
>>   			/*
>> -			 * we want to find all the pages missing from
>> +			 * We want to find all the sectors missing from
>>   			 * the rbio and read them from the disk.  If
>> -			 * page_in_rbio finds a page in the bio list
>> +			 * sector_in_rbio() finds a sector in the bio list
>>   			 * we don't need to read it off the stripe.
>>   			 */
>> -			page = page_in_rbio(rbio, stripe, pagenr, 1);
>> -			if (page)
>> +			sector = sector_in_rbio(rbio, stripe, sectornr, 1);
>> +			if (sector)
>>   				continue;
>>
>> -			page = rbio_stripe_page(rbio, stripe, pagenr);
>> +			sector = rbio_stripe_sector(rbio, stripe, sectornr);
>>   			/*
>> -			 * the bio cache may have handed us an uptodate
>> -			 * page.  If so, be happy and use it
>> +			 * The bio cache may have handed us an uptodate
>> +			 * sector.  If so, be happy and use it
>>   			 */
>> -			if (PageUptodate(page))
>> +			if (sector->uptodate)
>>   				continue;
>>
>> -			ret = rbio_add_io_page(rbio, &bio_list, page, stripe,
>> -					       pagenr, rbio->stripe_len, REQ_OP_READ);
>> +			ret = rbio_add_io_sector(rbio, &bio_list, sector,
>> +						 stripe, sectornr,
>> +						 rbio->stripe_len, REQ_OP_READ);
>>   			if (ret)
>>   				goto cleanup;
>>   		}
>> --
>> 2.35.1

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 07/17] btrfs: make rbio_add_io_page() subpage compatible
  2022-04-13 23:28     ` Qu Wenruo
@ 2022-04-14  0:43       ` Qu Wenruo
  2022-04-14 10:59         ` Qu Wenruo
  2022-04-14 15:43         ` David Sterba
  0 siblings, 2 replies; 34+ messages in thread
From: Qu Wenruo @ 2022-04-14  0:43 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs



On 2022/4/14 07:28, Qu Wenruo wrote:
> 
> 
> On 2022/4/14 03:14, David Sterba wrote:
>> There's an assertion failure reported by btrfs/023
>>
>> On Tue, Apr 12, 2022 at 05:32:57PM +0800, Qu Wenruo wrote:
>>> +static int rbio_add_io_sector(struct btrfs_raid_bio *rbio,
>>> +                  struct bio_list *bio_list,
>>> +                  struct sector_ptr *sector,
>>> +                  unsigned int stripe_nr,
>>> +                  unsigned int sector_nr,
>>> +                  unsigned long bio_max_len, unsigned int opf)
>>>   {
>>> +    const u32 sectorsize = rbio->bioc->fs_info->sectorsize;
>>>       struct bio *last = bio_list->tail;
>>>       int ret;
>>>       struct bio *bio;
>>>       struct btrfs_io_stripe *stripe;
>>>       u64 disk_start;
>>>
>>> +    /*
>>> +     * NOTE: here stripe_nr has taken device replace into 
>>> consideration,
>>> +     * thus it can be larger than rbio->real_stripe.
>>> +     * So here we check against bioc->num_stripes, not 
>>> rbio->real_stripes.
>>> +     */
>>> +    ASSERT(stripe_nr >= 0 && stripe_nr < rbio->bioc->num_stripes);
>>> +    ASSERT(sector_nr >= 0 && sector_nr < rbio->stripe_nsectors);
>>> +    ASSERT(sector->page);
>>
>> This one ^^^
> 
> Failed to reproduce here, both x86_64 and aarch64 survived over 64 loops
> of btrfs/023.
> 
> Although that's with my github branch. Let me check the current branch
> to see what's wrong.

It's a rebase error.

At least one patch has an incorrect diff snippet.

For the patch "btrfs: raid56: introduce btrfs_raid_bio::stripe_sectors":

For alloc_rbio(), in my branch and in the v2 patch submitted to the mailing list:

@@ -978,6 +1014,7 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
        rbio = kzalloc(sizeof(*rbio) +
                       sizeof(*rbio->stripe_pages) * num_pages +
                       sizeof(*rbio->bio_pages) * num_pages +
+                      sizeof(*rbio->stripe_sectors) * num_sectors +
                       sizeof(*rbio->finish_pointers) * real_stripes +
                       sizeof(*rbio->dbitmap) * BITS_TO_LONGS(stripe_nsectors) +
                       sizeof(*rbio->finish_pbitmap) *
@@ -1015,6 +1052,7 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
        } while (0)
        CONSUME_ALLOC(rbio->stripe_pages, num_pages);
        CONSUME_ALLOC(rbio->bio_pages, num_pages);
+       CONSUME_ALLOC(rbio->stripe_sectors, num_sectors);
        CONSUME_ALLOC(rbio->finish_pointers, real_stripes);
        CONSUME_ALLOC(rbio->dbitmap, BITS_TO_LONGS(stripe_nsectors));
        CONSUME_ALLOC(rbio->finish_pbitmap, BITS_TO_LONGS(stripe_nsectors));

It has the correct extra space for stripe_sectors.

But in the for-next branch, it looks like this:
@@ -1014,6 +1050,7 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
        } while (0)
        CONSUME_ALLOC(rbio->stripe_pages, num_pages);
        CONSUME_ALLOC(rbio->bio_pages, num_pages);
+       CONSUME_ALLOC(rbio->stripe_sectors, num_sectors);
        CONSUME_ALLOC(rbio->finish_pointers, real_stripes);
        CONSUME_ALLOC(rbio->dbitmap, BITS_TO_LONGS(stripe_nsectors));
        CONSUME_ALLOC(rbio->finish_pbitmap, BITS_TO_LONGS(stripe_nsectors));

Without the extra space allocated for stripe_sectors, no wonder it crashes.

Would you mind providing a branch for me to rebase my patchset on?

Thanks,
Qu

> 
> Thanks,
> Qu
> 
>>
>> [ 2280.705765] assertion failed: sector->page, in fs/btrfs/raid56.c:1145
>> [ 2280.707844] ------------[ cut here ]------------
>> [ 2280.709401] kernel BUG at fs/btrfs/ctree.h:3614!
>> [ 2280.711084] invalid opcode: 0000 [#1] PREEMPT SMP
>> [ 2280.712822] CPU: 3 PID: 4084 Comm: kworker/u8:2 Not tainted 5.18.0-rc2-default+ #1697
>> [ 2280.715648] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
>> [ 2280.719656] Workqueue: btrfs-rmw btrfs_work_helper [btrfs]
>> [ 2280.721775] RIP: 0010:assertfail.constprop.0+0x18/0x1a [btrfs]
>> [ 2280.729575] RSP: 0018:ffffad6d071afda0 EFLAGS: 00010246
>> [ 2280.730844] RAX: 0000000000000039 RBX: 0000000000000000 RCX: 0000000000000000
>> [ 2280.732449] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 00000000ffffffff
>> [ 2280.733992] RBP: ffff8e51d5465000 R08: 0000000000000003 R09: 0000000000000002
>> [ 2280.735535] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000003
>> [ 2280.737093] R13: ffff8e51d5465000 R14: ffff8e51d5465d78 R15: 0000000000001000
>> [ 2280.738613] FS:  0000000000000000(0000) GS:ffff8e523dc00000(0000) knlGS:0000000000000000
>> [ 2280.740392] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 2280.741794] CR2: 000055f48fe51ab0 CR3: 000000001f805001 CR4: 0000000000170ea0
>> [ 2280.743532] Call Trace:
>> [ 2280.744136]  <TASK>
>> [ 2280.744701]  rbio_add_io_sector.cold+0x11/0x33 [btrfs]
>> [ 2280.745846]  ? _raw_spin_unlock_irq+0x2f/0x50
>> [ 2280.746782]  raid56_rmw_stripe.isra.0+0x153/0x320 [btrfs]
>> [ 2280.747965]  btrfs_work_helper+0xd6/0x1d0 [btrfs]
>> [ 2280.749018]  process_one_work+0x264/0x5f0
>> [ 2280.749806]  worker_thread+0x52/0x3b0
>> [ 2280.750523]  ? process_one_work+0x5f0/0x5f0
>> [ 2280.751395]  kthread+0xea/0x110
>> [ 2280.752097]  ? kthread_complete_and_exit+0x20/0x20
>> [ 2280.753112]  ret_from_fork+0x1f/0x30
>>
>>
>>> +
>>> +    /* We don't yet support subpage, thus pgoff should always be 0 */
>>> +    ASSERT(sector->pgoff == 0);
>>> +
>>>       stripe = &rbio->bioc->stripes[stripe_nr];
>>> -    disk_start = stripe->physical + (page_index << PAGE_SHIFT);
>>> +    disk_start = stripe->physical + sector_nr * sectorsize;
>>>
>>>       /* if the device is missing, just fail this stripe */
>>>       if (!stripe->dev->bdev)
>>> @@ -1156,8 +1226,9 @@ static int rbio_add_io_page(struct 
>>> btrfs_raid_bio *rbio,
>>>            */
>>>           if (last_end == disk_start && !last->bi_status &&
>>>               last->bi_bdev == stripe->dev->bdev) {
>>> -            ret = bio_add_page(last, page, PAGE_SIZE, 0);
>>> -            if (ret == PAGE_SIZE)
>>> +            ret = bio_add_page(last, sector->page, sectorsize,
>>> +                       sector->pgoff);
>>> +            if (ret == sectorsize)
>>>                   return 0;
>>>           }
>>>       }
>>> @@ -1168,7 +1239,7 @@ static int rbio_add_io_page(struct 
>>> btrfs_raid_bio *rbio,
>>>       bio->bi_iter.bi_sector = disk_start >> 9;
>>>       bio->bi_private = rbio;
>>>
>>> -    bio_add_page(bio, page, PAGE_SIZE, 0);
>>> +    bio_add_page(bio, sector->page, sectorsize, sector->pgoff);
>>>       bio_list_add(bio_list, bio);
>>>       return 0;
>>>   }
>>> @@ -1265,7 +1336,7 @@ static noinline void finish_rmw(struct 
>>> btrfs_raid_bio *rbio)
>>>       void **pointers = rbio->finish_pointers;
>>>       int nr_data = rbio->nr_data;
>>>       int stripe;
>>> -    int pagenr;
>>> +    int sectornr;
>>>       bool has_qstripe;
>>>       struct bio_list bio_list;
>>>       struct bio *bio;
>>> @@ -1309,16 +1380,16 @@ static noinline void finish_rmw(struct 
>>> btrfs_raid_bio *rbio)
>>>       else
>>>           clear_bit(RBIO_CACHE_READY_BIT, &rbio->flags);
>>>
>>> -    for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
>>> +    for (sectornr = 0; sectornr < rbio->stripe_nsectors; sectornr++) {
>>>           struct page *p;
>>>           /* first collect one page from each data stripe */
>>>           for (stripe = 0; stripe < nr_data; stripe++) {
>>> -            p = page_in_rbio(rbio, stripe, pagenr, 0);
>>> +            p = page_in_rbio(rbio, stripe, sectornr, 0);
>>>               pointers[stripe] = kmap_local_page(p);
>>>           }
>>>
>>>           /* then add the parity stripe */
>>> -        p = rbio_pstripe_page(rbio, pagenr);
>>> +        p = rbio_pstripe_page(rbio, sectornr);
>>>           SetPageUptodate(p);
>>>           pointers[stripe++] = kmap_local_page(p);
>>>
>>> @@ -1328,7 +1399,7 @@ static noinline void finish_rmw(struct 
>>> btrfs_raid_bio *rbio)
>>>                * raid6, add the qstripe and call the
>>>                * library function to fill in our p/q
>>>                */
>>> -            p = rbio_qstripe_page(rbio, pagenr);
>>> +            p = rbio_qstripe_page(rbio, sectornr);
>>>               SetPageUptodate(p);
>>>               pointers[stripe++] = kmap_local_page(p);
>>>
>>> @@ -1349,19 +1420,19 @@ static noinline void finish_rmw(struct 
>>> btrfs_raid_bio *rbio)
>>>        * everything else.
>>>        */
>>>       for (stripe = 0; stripe < rbio->real_stripes; stripe++) {
>>> -        for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
>>> -            struct page *page;
>>> +        for (sectornr = 0; sectornr < rbio->stripe_nsectors; 
>>> sectornr++) {
>>> +            struct sector_ptr *sector;
>>> +
>>>               if (stripe < rbio->nr_data) {
>>> -                page = page_in_rbio(rbio, stripe, pagenr, 1);
>>> -                if (!page)
>>> +                sector = sector_in_rbio(rbio, stripe, sectornr, 1);
>>> +                if (!sector)
>>>                       continue;
>>>               } else {
>>> -                   page = rbio_stripe_page(rbio, stripe, pagenr);
>>> +                sector = rbio_stripe_sector(rbio, stripe, sectornr);
>>>               }
>>>
>>> -            ret = rbio_add_io_page(rbio, &bio_list,
>>> -                       page, stripe, pagenr, rbio->stripe_len,
>>> -                       REQ_OP_WRITE);
>>> +            ret = rbio_add_io_sector(rbio, &bio_list, sector, stripe,
>>> +                         sectornr, rbio->stripe_len, REQ_OP_WRITE);
>>>               if (ret)
>>>                   goto cleanup;
>>>           }
>>> @@ -1374,20 +1445,21 @@ static noinline void finish_rmw(struct 
>>> btrfs_raid_bio *rbio)
>>>           if (!bioc->tgtdev_map[stripe])
>>>               continue;
>>>
>>> -        for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
>>> -            struct page *page;
>>> +        for (sectornr = 0; sectornr < rbio->stripe_nsectors; 
>>> sectornr++) {
>>> +            struct sector_ptr *sector;
>>> +
>>>               if (stripe < rbio->nr_data) {
>>> -                page = page_in_rbio(rbio, stripe, pagenr, 1);
>>> -                if (!page)
>>> +                sector = sector_in_rbio(rbio, stripe, sectornr, 1);
>>> +                if (!sector)
>>>                       continue;
>>>               } else {
>>> -                   page = rbio_stripe_page(rbio, stripe, pagenr);
>>> +                sector = rbio_stripe_sector(rbio, stripe, sectornr);
>>>               }
>>>
>>> -            ret = rbio_add_io_page(rbio, &bio_list, page,
>>> -                           rbio->bioc->tgtdev_map[stripe],
>>> -                           pagenr, rbio->stripe_len,
>>> -                           REQ_OP_WRITE);
>>> +            ret = rbio_add_io_sector(rbio, &bio_list, sector,
>>> +                         rbio->bioc->tgtdev_map[stripe],
>>> +                         sectornr, rbio->stripe_len,
>>> +                         REQ_OP_WRITE);
>>>               if (ret)
>>>                   goto cleanup;
>>>           }
>>> @@ -1563,7 +1635,7 @@ static int raid56_rmw_stripe(struct 
>>> btrfs_raid_bio *rbio)
>>>       int bios_to_read = 0;
>>>       struct bio_list bio_list;
>>>       int ret;
>>> -    int pagenr;
>>> +    int sectornr;
>>>       int stripe;
>>>       struct bio *bio;
>>>
>>> @@ -1581,28 +1653,29 @@ static int raid56_rmw_stripe(struct 
>>> btrfs_raid_bio *rbio)
>>>        * stripe
>>>        */
>>>       for (stripe = 0; stripe < rbio->nr_data; stripe++) {
>>> -        for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
>>> -            struct page *page;
>>> +        for (sectornr = 0; sectornr < rbio->stripe_nsectors; 
>>> sectornr++) {
>>> +            struct sector_ptr *sector;
>>> +
>>>               /*
>>> -             * we want to find all the pages missing from
>>> +             * We want to find all the sectors missing from
>>>                * the rbio and read them from the disk.  If
>>> -             * page_in_rbio finds a page in the bio list
>>> +             * sector_in_rbio() finds a page in the bio list
>>>                * we don't need to read it off the stripe.
>>>                */
>>> -            page = page_in_rbio(rbio, stripe, pagenr, 1);
>>> -            if (page)
>>> +            sector = sector_in_rbio(rbio, stripe, sectornr, 1);
>>> +            if (sector)
>>>                   continue;
>>>
>>> -            page = rbio_stripe_page(rbio, stripe, pagenr);
>>> +            sector = rbio_stripe_sector(rbio, stripe, sectornr);
>>>               /*
>>> -             * the bio cache may have handed us an uptodate
>>> +             * The bio cache may have handed us an uptodate
>>>                * page.  If so, be happy and use it
>>>                */
>>> -            if (PageUptodate(page))
>>> +            if (sector->uptodate)
>>>                   continue;
>>>
>>> -            ret = rbio_add_io_page(rbio, &bio_list, page,
>>> -                       stripe, pagenr, rbio->stripe_len,
>>> +            ret = rbio_add_io_sector(rbio, &bio_list, sector,
>>> +                       stripe, sectornr, rbio->stripe_len,
>>>                          REQ_OP_READ);
>>>               if (ret)
>>>                   goto cleanup;
>>> @@ -2107,7 +2180,7 @@ static int __raid56_parity_recover(struct 
>>> btrfs_raid_bio *rbio)
>>>       int bios_to_read = 0;
>>>       struct bio_list bio_list;
>>>       int ret;
>>> -    int pagenr;
>>> +    int sectornr;
>>>       int stripe;
>>>       struct bio *bio;
>>>
>>> @@ -2130,21 +2203,20 @@ static int __raid56_parity_recover(struct 
>>> btrfs_raid_bio *rbio)
>>>               continue;
>>>           }
>>>
>>> -        for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
>>> -            struct page *p;
>>> +        for (sectornr = 0; sectornr < rbio->stripe_nsectors; 
>>> sectornr++) {
>>> +            struct sector_ptr *sector;
>>>
>>>               /*
>>>                * the rmw code may have already read this
>>>                * page in
>>>                */
>>> -            p = rbio_stripe_page(rbio, stripe, pagenr);
>>> -            if (PageUptodate(p))
>>> +            sector = rbio_stripe_sector(rbio, stripe, sectornr);
>>> +            if (sector->uptodate)
>>>                   continue;
>>>
>>> -            ret = rbio_add_io_page(rbio, &bio_list,
>>> -                       rbio_stripe_page(rbio, stripe, pagenr),
>>> -                       stripe, pagenr, rbio->stripe_len,
>>> -                       REQ_OP_READ);
>>> +            ret = rbio_add_io_sector(rbio, &bio_list, sector,
>>> +                         stripe, sectornr,
>>> +                         rbio->stripe_len, REQ_OP_READ);
>>>               if (ret < 0)
>>>                   goto cleanup;
>>>           }
>>> @@ -2399,7 +2471,7 @@ static noinline void finish_parity_scrub(struct 
>>> btrfs_raid_bio *rbio,
>>>       unsigned long *pbitmap = rbio->finish_pbitmap;
>>>       int nr_data = rbio->nr_data;
>>>       int stripe;
>>> -    int pagenr;
>>> +    int sectornr;
>>>       bool has_qstripe;
>>>       struct page *p_page = NULL;
>>>       struct page *q_page = NULL;
>>> @@ -2419,7 +2491,7 @@ static noinline void finish_parity_scrub(struct 
>>> btrfs_raid_bio *rbio,
>>>
>>>       if (bioc->num_tgtdevs && bioc->tgtdev_map[rbio->scrubp]) {
>>>           is_replace = 1;
>>> -        bitmap_copy(pbitmap, rbio->dbitmap, rbio->stripe_npages);
>>> +        bitmap_copy(pbitmap, rbio->dbitmap, rbio->stripe_nsectors);
>>>       }
>>>
>>>       /*
>>> @@ -2453,12 +2525,12 @@ static noinline void 
>>> finish_parity_scrub(struct btrfs_raid_bio *rbio,
>>>       /* Map the parity stripe just once */
>>>       pointers[nr_data] = kmap_local_page(p_page);
>>>
>>> -    for_each_set_bit(pagenr, rbio->dbitmap, rbio->stripe_npages) {
>>> +    for_each_set_bit(sectornr, rbio->dbitmap, rbio->stripe_nsectors) {
>>>           struct page *p;
>>>           void *parity;
>>>           /* first collect one page from each data stripe */
>>>           for (stripe = 0; stripe < nr_data; stripe++) {
>>> -            p = page_in_rbio(rbio, stripe, pagenr, 0);
>>> +            p = page_in_rbio(rbio, stripe, sectornr, 0);
>>>               pointers[stripe] = kmap_local_page(p);
>>>           }
>>>
>>> @@ -2473,13 +2545,13 @@ static noinline void 
>>> finish_parity_scrub(struct btrfs_raid_bio *rbio,
>>>           }
>>>
>>>           /* Check scrubbing parity and repair it */
>>> -        p = rbio_stripe_page(rbio, rbio->scrubp, pagenr);
>>> +        p = rbio_stripe_page(rbio, rbio->scrubp, sectornr);
>>>           parity = kmap_local_page(p);
>>>           if (memcmp(parity, pointers[rbio->scrubp], PAGE_SIZE))
>>>               copy_page(parity, pointers[rbio->scrubp]);
>>>           else
>>>               /* Parity is right, needn't writeback */
>>> -            bitmap_clear(rbio->dbitmap, pagenr, 1);
>>> +            bitmap_clear(rbio->dbitmap, sectornr, 1);
>>>           kunmap_local(parity);
>>>
>>>           for (stripe = nr_data - 1; stripe >= 0; stripe--)
>>> @@ -2499,12 +2571,13 @@ static noinline void 
>>> finish_parity_scrub(struct btrfs_raid_bio *rbio,
>>>        * higher layers (the bio_list in our rbio) and our p/q.  Ignore
>>>        * everything else.
>>>        */
>>> -    for_each_set_bit(pagenr, rbio->dbitmap, rbio->stripe_npages) {
>>> -        struct page *page;
>>> +    for_each_set_bit(sectornr, rbio->dbitmap, rbio->stripe_nsectors) {
>>> +        struct sector_ptr *sector;
>>>
>>> -        page = rbio_stripe_page(rbio, rbio->scrubp, pagenr);
>>> -        ret = rbio_add_io_page(rbio, &bio_list, page, rbio->scrubp,
>>> -                       pagenr, rbio->stripe_len, REQ_OP_WRITE);
>>> +        sector = rbio_stripe_sector(rbio, rbio->scrubp, sectornr);
>>> +        ret = rbio_add_io_sector(rbio, &bio_list, sector, rbio->scrubp,
>>> +                     sectornr, rbio->stripe_len,
>>> +                     REQ_OP_WRITE);
>>>           if (ret)
>>>               goto cleanup;
>>>       }
>>> @@ -2512,13 +2585,13 @@ static noinline void 
>>> finish_parity_scrub(struct btrfs_raid_bio *rbio,
>>>       if (!is_replace)
>>>           goto submit_write;
>>>
>>> -    for_each_set_bit(pagenr, pbitmap, rbio->stripe_npages) {
>>> -        struct page *page;
>>> +    for_each_set_bit(sectornr, pbitmap, rbio->stripe_nsectors) {
>>> +        struct sector_ptr *sector;
>>>
>>> -        page = rbio_stripe_page(rbio, rbio->scrubp, pagenr);
>>> -        ret = rbio_add_io_page(rbio, &bio_list, page,
>>> +        sector = rbio_stripe_sector(rbio, rbio->scrubp, sectornr);
>>> +        ret = rbio_add_io_sector(rbio, &bio_list, sector,
>>>                          bioc->tgtdev_map[rbio->scrubp],
>>> -                       pagenr, rbio->stripe_len, REQ_OP_WRITE);
>>> +                       sectornr, rbio->stripe_len, REQ_OP_WRITE);
>>>           if (ret)
>>>               goto cleanup;
>>>       }
>>> @@ -2650,7 +2723,7 @@ static void raid56_parity_scrub_stripe(struct 
>>> btrfs_raid_bio *rbio)
>>>       int bios_to_read = 0;
>>>       struct bio_list bio_list;
>>>       int ret;
>>> -    int pagenr;
>>> +    int sectornr;
>>>       int stripe;
>>>       struct bio *bio;
>>>
>>> @@ -2666,28 +2739,30 @@ static void raid56_parity_scrub_stripe(struct 
>>> btrfs_raid_bio *rbio)
>>>        * stripe
>>>        */
>>>       for (stripe = 0; stripe < rbio->real_stripes; stripe++) {
>>> -        for_each_set_bit(pagenr, rbio->dbitmap, rbio->stripe_npages) {
>>> -            struct page *page;
>>> +        for_each_set_bit(sectornr, rbio->dbitmap,
>>> +                 rbio->stripe_nsectors) {
>>> +            struct sector_ptr *sector;
>>>               /*
>>> -             * we want to find all the pages missing from
>>> +             * We want to find all the sectors missing from
>>>                * the rbio and read them from the disk.  If
>>> -             * page_in_rbio finds a page in the bio list
>>> +             * sector_in_rbio() finds a sector in the bio list
>>>                * we don't need to read it off the stripe.
>>>                */
>>> -            page = page_in_rbio(rbio, stripe, pagenr, 1);
>>> -            if (page)
>>> +            sector = sector_in_rbio(rbio, stripe, sectornr, 1);
>>> +            if (sector)
>>>                   continue;
>>>
>>> -            page = rbio_stripe_page(rbio, stripe, pagenr);
>>> +            sector = rbio_stripe_sector(rbio, stripe, sectornr);
>>>               /*
>>> -             * the bio cache may have handed us an uptodate
>>> -             * page.  If so, be happy and use it
>>> +             * The bio cache may have handed us an uptodate
>>> +             * sector.  If so, be happy and use it
>>>                */
>>> -            if (PageUptodate(page))
>>> +            if (sector->uptodate)
>>>                   continue;
>>>
>>> -            ret = rbio_add_io_page(rbio, &bio_list, page, stripe,
>>> -                           pagenr, rbio->stripe_len, REQ_OP_READ);
>>> +            ret = rbio_add_io_sector(rbio, &bio_list, sector,
>>> +                         stripe, sectornr,
>>> +                         rbio->stripe_len, REQ_OP_READ);
>>>               if (ret)
>>>                   goto cleanup;
>>>           }
>>> -- 
>>> 2.35.1


* Re: [PATCH v2 07/17] btrfs: make rbio_add_io_page() subpage compatible
  2022-04-14  0:43       ` Qu Wenruo
@ 2022-04-14 10:59         ` Qu Wenruo
  2022-04-14 15:51           ` David Sterba
  2022-04-14 15:43         ` David Sterba
  1 sibling, 1 reply; 34+ messages in thread
From: Qu Wenruo @ 2022-04-14 10:59 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs



On 2022/4/14 08:43, Qu Wenruo wrote:
> 
> 
> On 2022/4/14 07:28, Qu Wenruo wrote:
>>
>>
>> On 2022/4/14 03:14, David Sterba wrote:
>>> There's an assertion failure reported by btrfs/023
>>>
>>> On Tue, Apr 12, 2022 at 05:32:57PM +0800, Qu Wenruo wrote:
>>>> +static int rbio_add_io_sector(struct btrfs_raid_bio *rbio,
>>>> +                  struct bio_list *bio_list,
>>>> +                  struct sector_ptr *sector,
>>>> +                  unsigned int stripe_nr,
>>>> +                  unsigned int sector_nr,
>>>> +                  unsigned long bio_max_len, unsigned int opf)
>>>>   {
>>>> +    const u32 sectorsize = rbio->bioc->fs_info->sectorsize;
>>>>       struct bio *last = bio_list->tail;
>>>>       int ret;
>>>>       struct bio *bio;
>>>>       struct btrfs_io_stripe *stripe;
>>>>       u64 disk_start;
>>>>
>>>> +    /*
>>>> +     * NOTE: here stripe_nr has taken device replace into 
>>>> consideration,
>>>> +     * thus it can be larger than rbio->real_stripe.
>>>> +     * So here we check against bioc->num_stripes, not 
>>>> rbio->real_stripes.
>>>> +     */
>>>> +    ASSERT(stripe_nr >= 0 && stripe_nr < rbio->bioc->num_stripes);
>>>> +    ASSERT(sector_nr >= 0 && sector_nr < rbio->stripe_nsectors);
>>>> +    ASSERT(sector->page);
>>>
>>> This one ^^^
>>
>> Failed to reproduce here, both x86_64 and aarch64 survived over 64 loops
>> of btrfs/023.
>>
>> Although that's with my github branch. Let me check the current branch
>> to see what's wrong.
> 
> It's a rebase error.
> 
> At least one patch has incorrect diff snippet.
> 
> For the patch "btrfs: raid56: introduce btrfs_raid_bio::stripe_sectors":
> 
> For alloc_rbio(), in my branch and patch (v2) submitted to the mailing 
> list:
> 
> @@ -978,6 +1014,7 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
>          rbio = kzalloc(sizeof(*rbio) +
>                         sizeof(*rbio->stripe_pages) * num_pages +
>                         sizeof(*rbio->bio_pages) * num_pages +
> +                      sizeof(*rbio->stripe_sectors) * num_sectors +

The more I check this part of the existing code, the worse it looks to me.

We're really dancing on the edge; it's just a matter of time before we get cut.

Shouldn't we go the regular way instead: kzalloc() only the structure itself, then allocate the needed pages/bitmaps separately?

Thanks,
Qu
>                         sizeof(*rbio->finish_pointers) * real_stripes +
>                         sizeof(*rbio->dbitmap) * BITS_TO_LONGS(stripe_nsectors) +
>                         sizeof(*rbio->finish_pbitmap) *
> @@ -1015,6 +1052,7 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
>          } while (0)
>          CONSUME_ALLOC(rbio->stripe_pages, num_pages);
>          CONSUME_ALLOC(rbio->bio_pages, num_pages);
> +       CONSUME_ALLOC(rbio->stripe_sectors, num_sectors);
>          CONSUME_ALLOC(rbio->finish_pointers, real_stripes);
>          CONSUME_ALLOC(rbio->dbitmap, BITS_TO_LONGS(stripe_nsectors));
>          CONSUME_ALLOC(rbio->finish_pbitmap, BITS_TO_LONGS(stripe_nsectors));
> 
> It has the correct extra space for stripe_sectors.
> 
> But in the for-next branch, it looks like this:
> @@ -1014,6 +1050,7 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
>          } while (0)
>          CONSUME_ALLOC(rbio->stripe_pages, num_pages);
>          CONSUME_ALLOC(rbio->bio_pages, num_pages);
> +       CONSUME_ALLOC(rbio->stripe_sectors, num_sectors);
>          CONSUME_ALLOC(rbio->finish_pointers, real_stripes);
>          CONSUME_ALLOC(rbio->dbitmap, BITS_TO_LONGS(stripe_nsectors));
>          CONSUME_ALLOC(rbio->finish_pbitmap, BITS_TO_LONGS(stripe_nsectors));
> 
> Without the extra space allocated for stripe_sectors, no wonder it crashes.
> 
> Would you mind providing a branch for me to rebase my patchset on?
> 
> Thanks,
> Qu
> 
>>
>> Thanks,
>> Qu
>>
>>>
>>> [ 2280.705765] assertion failed: sector->page, in fs/btrfs/raid56.c:1145
>>> [ 2280.707844] ------------[ cut here ]------------
>>> [ 2280.709401] kernel BUG at fs/btrfs/ctree.h:3614!
>>> [ 2280.711084] invalid opcode: 0000 [#1] PREEMPT SMP
>>> [ 2280.712822] CPU: 3 PID: 4084 Comm: kworker/u8:2 Not tainted 5.18.0-rc2-default+ #1697
>>> [ 2280.715648] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
>>> [ 2280.719656] Workqueue: btrfs-rmw btrfs_work_helper [btrfs]
>>> [ 2280.721775] RIP: 0010:assertfail.constprop.0+0x18/0x1a [btrfs]
>>> [ 2280.729575] RSP: 0018:ffffad6d071afda0 EFLAGS: 00010246
>>> [ 2280.730844] RAX: 0000000000000039 RBX: 0000000000000000 RCX: 0000000000000000
>>> [ 2280.732449] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 00000000ffffffff
>>> [ 2280.733992] RBP: ffff8e51d5465000 R08: 0000000000000003 R09: 0000000000000002
>>> [ 2280.735535] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000003
>>> [ 2280.737093] R13: ffff8e51d5465000 R14: ffff8e51d5465d78 R15: 0000000000001000
>>> [ 2280.738613] FS:  0000000000000000(0000) GS:ffff8e523dc00000(0000) knlGS:0000000000000000
>>> [ 2280.740392] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 2280.741794] CR2: 000055f48fe51ab0 CR3: 000000001f805001 CR4: 0000000000170ea0
>>> [ 2280.743532] Call Trace:
>>> [ 2280.744136]  <TASK>
>>> [ 2280.744701]  rbio_add_io_sector.cold+0x11/0x33 [btrfs]
>>> [ 2280.745846]  ? _raw_spin_unlock_irq+0x2f/0x50
>>> [ 2280.746782]  raid56_rmw_stripe.isra.0+0x153/0x320 [btrfs]
>>> [ 2280.747965]  btrfs_work_helper+0xd6/0x1d0 [btrfs]
>>> [ 2280.749018]  process_one_work+0x264/0x5f0
>>> [ 2280.749806]  worker_thread+0x52/0x3b0
>>> [ 2280.750523]  ? process_one_work+0x5f0/0x5f0
>>> [ 2280.751395]  kthread+0xea/0x110
>>> [ 2280.752097]  ? kthread_complete_and_exit+0x20/0x20
>>> [ 2280.753112]  ret_from_fork+0x1f/0x30
>>>
>>>
>>>> +
>>>> +    /* We don't yet support subpage, thus pgoff should always be 0 */
>>>> +    ASSERT(sector->pgoff == 0);
>>>> +
>>>>       stripe = &rbio->bioc->stripes[stripe_nr];
>>>> -    disk_start = stripe->physical + (page_index << PAGE_SHIFT);
>>>> +    disk_start = stripe->physical + sector_nr * sectorsize;
>>>>
>>>>       /* if the device is missing, just fail this stripe */
>>>>       if (!stripe->dev->bdev)
>>>> @@ -1156,8 +1226,9 @@ static int rbio_add_io_page(struct 
>>>> btrfs_raid_bio *rbio,
>>>>            */
>>>>           if (last_end == disk_start && !last->bi_status &&
>>>>               last->bi_bdev == stripe->dev->bdev) {
>>>> -            ret = bio_add_page(last, page, PAGE_SIZE, 0);
>>>> -            if (ret == PAGE_SIZE)
>>>> +            ret = bio_add_page(last, sector->page, sectorsize,
>>>> +                       sector->pgoff);
>>>> +            if (ret == sectorsize)
>>>>                   return 0;
>>>>           }
>>>>       }
>>>> @@ -1168,7 +1239,7 @@ static int rbio_add_io_page(struct 
>>>> btrfs_raid_bio *rbio,
>>>>       bio->bi_iter.bi_sector = disk_start >> 9;
>>>>       bio->bi_private = rbio;
>>>>
>>>> -    bio_add_page(bio, page, PAGE_SIZE, 0);
>>>> +    bio_add_page(bio, sector->page, sectorsize, sector->pgoff);
>>>>       bio_list_add(bio_list, bio);
>>>>       return 0;
>>>>   }
>>>> @@ -1265,7 +1336,7 @@ static noinline void finish_rmw(struct 
>>>> btrfs_raid_bio *rbio)
>>>>       void **pointers = rbio->finish_pointers;
>>>>       int nr_data = rbio->nr_data;
>>>>       int stripe;
>>>> -    int pagenr;
>>>> +    int sectornr;
>>>>       bool has_qstripe;
>>>>       struct bio_list bio_list;
>>>>       struct bio *bio;
>>>> @@ -1309,16 +1380,16 @@ static noinline void finish_rmw(struct 
>>>> btrfs_raid_bio *rbio)
>>>>       else
>>>>           clear_bit(RBIO_CACHE_READY_BIT, &rbio->flags);
>>>>
>>>> -    for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
>>>> +    for (sectornr = 0; sectornr < rbio->stripe_nsectors; sectornr++) {
>>>>           struct page *p;
>>>>           /* first collect one page from each data stripe */
>>>>           for (stripe = 0; stripe < nr_data; stripe++) {
>>>> -            p = page_in_rbio(rbio, stripe, pagenr, 0);
>>>> +            p = page_in_rbio(rbio, stripe, sectornr, 0);
>>>>               pointers[stripe] = kmap_local_page(p);
>>>>           }
>>>>
>>>>           /* then add the parity stripe */
>>>> -        p = rbio_pstripe_page(rbio, pagenr);
>>>> +        p = rbio_pstripe_page(rbio, sectornr);
>>>>           SetPageUptodate(p);
>>>>           pointers[stripe++] = kmap_local_page(p);
>>>>
>>>> @@ -1328,7 +1399,7 @@ static noinline void finish_rmw(struct 
>>>> btrfs_raid_bio *rbio)
>>>>                * raid6, add the qstripe and call the
>>>>                * library function to fill in our p/q
>>>>                */
>>>> -            p = rbio_qstripe_page(rbio, pagenr);
>>>> +            p = rbio_qstripe_page(rbio, sectornr);
>>>>               SetPageUptodate(p);
>>>>               pointers[stripe++] = kmap_local_page(p);
>>>>
>>>> @@ -1349,19 +1420,19 @@ static noinline void finish_rmw(struct 
>>>> btrfs_raid_bio *rbio)
>>>>        * everything else.
>>>>        */
>>>>       for (stripe = 0; stripe < rbio->real_stripes; stripe++) {
>>>> -        for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
>>>> -            struct page *page;
>>>> +        for (sectornr = 0; sectornr < rbio->stripe_nsectors; 
>>>> sectornr++) {
>>>> +            struct sector_ptr *sector;
>>>> +
>>>>               if (stripe < rbio->nr_data) {
>>>> -                page = page_in_rbio(rbio, stripe, pagenr, 1);
>>>> -                if (!page)
>>>> +                sector = sector_in_rbio(rbio, stripe, sectornr, 1);
>>>> +                if (!sector)
>>>>                       continue;
>>>>               } else {
>>>> -                   page = rbio_stripe_page(rbio, stripe, pagenr);
>>>> +                sector = rbio_stripe_sector(rbio, stripe, sectornr);
>>>>               }
>>>>
>>>> -            ret = rbio_add_io_page(rbio, &bio_list,
>>>> -                       page, stripe, pagenr, rbio->stripe_len,
>>>> -                       REQ_OP_WRITE);
>>>> +            ret = rbio_add_io_sector(rbio, &bio_list, sector, stripe,
>>>> +                         sectornr, rbio->stripe_len, REQ_OP_WRITE);
>>>>               if (ret)
>>>>                   goto cleanup;
>>>>           }
>>>> @@ -1374,20 +1445,21 @@ static noinline void finish_rmw(struct 
>>>> btrfs_raid_bio *rbio)
>>>>           if (!bioc->tgtdev_map[stripe])
>>>>               continue;
>>>>
>>>> -        for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
>>>> -            struct page *page;
>>>> +        for (sectornr = 0; sectornr < rbio->stripe_nsectors; 
>>>> sectornr++) {
>>>> +            struct sector_ptr *sector;
>>>> +
>>>>               if (stripe < rbio->nr_data) {
>>>> -                page = page_in_rbio(rbio, stripe, pagenr, 1);
>>>> -                if (!page)
>>>> +                sector = sector_in_rbio(rbio, stripe, sectornr, 1);
>>>> +                if (!sector)
>>>>                       continue;
>>>>               } else {
>>>> -                   page = rbio_stripe_page(rbio, stripe, pagenr);
>>>> +                sector = rbio_stripe_sector(rbio, stripe, sectornr);
>>>>               }
>>>>
>>>> -            ret = rbio_add_io_page(rbio, &bio_list, page,
>>>> -                           rbio->bioc->tgtdev_map[stripe],
>>>> -                           pagenr, rbio->stripe_len,
>>>> -                           REQ_OP_WRITE);
>>>> +            ret = rbio_add_io_sector(rbio, &bio_list, sector,
>>>> +                         rbio->bioc->tgtdev_map[stripe],
>>>> +                         sectornr, rbio->stripe_len,
>>>> +                         REQ_OP_WRITE);
>>>>               if (ret)
>>>>                   goto cleanup;
>>>>           }
>>>> @@ -1563,7 +1635,7 @@ static int raid56_rmw_stripe(struct btrfs_raid_bio *rbio)
>>>>       int bios_to_read = 0;
>>>>       struct bio_list bio_list;
>>>>       int ret;
>>>> -    int pagenr;
>>>> +    int sectornr;
>>>>       int stripe;
>>>>       struct bio *bio;
>>>>
>>>> @@ -1581,28 +1653,29 @@ static int raid56_rmw_stripe(struct btrfs_raid_bio *rbio)
>>>>        * stripe
>>>>        */
>>>>       for (stripe = 0; stripe < rbio->nr_data; stripe++) {
>>>> -        for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
>>>> -            struct page *page;
>>>> +        for (sectornr = 0; sectornr < rbio->stripe_nsectors; sectornr++) {
>>>> +            struct sector_ptr *sector;
>>>> +
>>>>               /*
>>>> -             * we want to find all the pages missing from
>>>> +             * We want to find all the sectors missing from
>>>>                * the rbio and read them from the disk.  If
>>>> -             * page_in_rbio finds a page in the bio list
>>>> +             * sector_in_rbio() finds a page in the bio list
>>>>                * we don't need to read it off the stripe.
>>>>                */
>>>> -            page = page_in_rbio(rbio, stripe, pagenr, 1);
>>>> -            if (page)
>>>> +            sector = sector_in_rbio(rbio, stripe, sectornr, 1);
>>>> +            if (sector)
>>>>                   continue;
>>>>
>>>> -            page = rbio_stripe_page(rbio, stripe, pagenr);
>>>> +            sector = rbio_stripe_sector(rbio, stripe, sectornr);
>>>>               /*
>>>> -             * the bio cache may have handed us an uptodate
>>>> +             * The bio cache may have handed us an uptodate
>>>>                * page.  If so, be happy and use it
>>>>                */
>>>> -            if (PageUptodate(page))
>>>> +            if (sector->uptodate)
>>>>                   continue;
>>>>
>>>> -            ret = rbio_add_io_page(rbio, &bio_list, page,
>>>> -                       stripe, pagenr, rbio->stripe_len,
>>>> +            ret = rbio_add_io_sector(rbio, &bio_list, sector,
>>>> +                       stripe, sectornr, rbio->stripe_len,
>>>>                          REQ_OP_READ);
>>>>               if (ret)
>>>>                   goto cleanup;
>>>> @@ -2107,7 +2180,7 @@ static int __raid56_parity_recover(struct btrfs_raid_bio *rbio)
>>>>       int bios_to_read = 0;
>>>>       struct bio_list bio_list;
>>>>       int ret;
>>>> -    int pagenr;
>>>> +    int sectornr;
>>>>       int stripe;
>>>>       struct bio *bio;
>>>>
>>>> @@ -2130,21 +2203,20 @@ static int __raid56_parity_recover(struct btrfs_raid_bio *rbio)
>>>>               continue;
>>>>           }
>>>>
>>>> -        for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
>>>> -            struct page *p;
>>>> +        for (sectornr = 0; sectornr < rbio->stripe_nsectors; sectornr++) {
>>>> +            struct sector_ptr *sector;
>>>>
>>>>               /*
>>>>                * the rmw code may have already read this
>>>>                * page in
>>>>                */
>>>> -            p = rbio_stripe_page(rbio, stripe, pagenr);
>>>> -            if (PageUptodate(p))
>>>> +            sector = rbio_stripe_sector(rbio, stripe, sectornr);
>>>> +            if (sector->uptodate)
>>>>                   continue;
>>>>
>>>> -            ret = rbio_add_io_page(rbio, &bio_list,
>>>> -                       rbio_stripe_page(rbio, stripe, pagenr),
>>>> -                       stripe, pagenr, rbio->stripe_len,
>>>> -                       REQ_OP_READ);
>>>> +            ret = rbio_add_io_sector(rbio, &bio_list, sector,
>>>> +                         stripe, sectornr,
>>>> +                         rbio->stripe_len, REQ_OP_READ);
>>>>               if (ret < 0)
>>>>                   goto cleanup;
>>>>           }
>>>> @@ -2399,7 +2471,7 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
>>>>       unsigned long *pbitmap = rbio->finish_pbitmap;
>>>>       int nr_data = rbio->nr_data;
>>>>       int stripe;
>>>> -    int pagenr;
>>>> +    int sectornr;
>>>>       bool has_qstripe;
>>>>       struct page *p_page = NULL;
>>>>       struct page *q_page = NULL;
>>>> @@ -2419,7 +2491,7 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
>>>>
>>>>       if (bioc->num_tgtdevs && bioc->tgtdev_map[rbio->scrubp]) {
>>>>           is_replace = 1;
>>>> -        bitmap_copy(pbitmap, rbio->dbitmap, rbio->stripe_npages);
>>>> +        bitmap_copy(pbitmap, rbio->dbitmap, rbio->stripe_nsectors);
>>>>       }
>>>>
>>>>       /*
>>>> @@ -2453,12 +2525,12 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
>>>>       /* Map the parity stripe just once */
>>>>       pointers[nr_data] = kmap_local_page(p_page);
>>>>
>>>> -    for_each_set_bit(pagenr, rbio->dbitmap, rbio->stripe_npages) {
>>>> +    for_each_set_bit(sectornr, rbio->dbitmap, rbio->stripe_nsectors) {
>>>>           struct page *p;
>>>>           void *parity;
>>>>           /* first collect one page from each data stripe */
>>>>           for (stripe = 0; stripe < nr_data; stripe++) {
>>>> -            p = page_in_rbio(rbio, stripe, pagenr, 0);
>>>> +            p = page_in_rbio(rbio, stripe, sectornr, 0);
>>>>               pointers[stripe] = kmap_local_page(p);
>>>>           }
>>>>
>>>> @@ -2473,13 +2545,13 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
>>>>           }
>>>>
>>>>           /* Check scrubbing parity and repair it */
>>>> -        p = rbio_stripe_page(rbio, rbio->scrubp, pagenr);
>>>> +        p = rbio_stripe_page(rbio, rbio->scrubp, sectornr);
>>>>           parity = kmap_local_page(p);
>>>>           if (memcmp(parity, pointers[rbio->scrubp], PAGE_SIZE))
>>>>               copy_page(parity, pointers[rbio->scrubp]);
>>>>           else
>>>>               /* Parity is right, needn't writeback */
>>>> -            bitmap_clear(rbio->dbitmap, pagenr, 1);
>>>> +            bitmap_clear(rbio->dbitmap, sectornr, 1);
>>>>           kunmap_local(parity);
>>>>
>>>>           for (stripe = nr_data - 1; stripe >= 0; stripe--)
>>>> @@ -2499,12 +2571,13 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
>>>>        * higher layers (the bio_list in our rbio) and our p/q.  Ignore
>>>>        * everything else.
>>>>        */
>>>> -    for_each_set_bit(pagenr, rbio->dbitmap, rbio->stripe_npages) {
>>>> -        struct page *page;
>>>> +    for_each_set_bit(sectornr, rbio->dbitmap, rbio->stripe_nsectors) {
>>>> +        struct sector_ptr *sector;
>>>>
>>>> -        page = rbio_stripe_page(rbio, rbio->scrubp, pagenr);
>>>> -        ret = rbio_add_io_page(rbio, &bio_list, page, rbio->scrubp,
>>>> -                       pagenr, rbio->stripe_len, REQ_OP_WRITE);
>>>> +        sector = rbio_stripe_sector(rbio, rbio->scrubp, sectornr);
>>>> +        ret = rbio_add_io_sector(rbio, &bio_list, sector, 
>>>> rbio->scrubp,
>>>> +                     sectornr, rbio->stripe_len,
>>>> +                     REQ_OP_WRITE);
>>>>           if (ret)
>>>>               goto cleanup;
>>>>       }
>>>> @@ -2512,13 +2585,13 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
>>>>       if (!is_replace)
>>>>           goto submit_write;
>>>>
>>>> -    for_each_set_bit(pagenr, pbitmap, rbio->stripe_npages) {
>>>> -        struct page *page;
>>>> +    for_each_set_bit(sectornr, pbitmap, rbio->stripe_nsectors) {
>>>> +        struct sector_ptr *sector;
>>>>
>>>> -        page = rbio_stripe_page(rbio, rbio->scrubp, pagenr);
>>>> -        ret = rbio_add_io_page(rbio, &bio_list, page,
>>>> +        sector = rbio_stripe_sector(rbio, rbio->scrubp, sectornr);
>>>> +        ret = rbio_add_io_sector(rbio, &bio_list, sector,
>>>>                          bioc->tgtdev_map[rbio->scrubp],
>>>> -                       pagenr, rbio->stripe_len, REQ_OP_WRITE);
>>>> +                       sectornr, rbio->stripe_len, REQ_OP_WRITE);
>>>>           if (ret)
>>>>               goto cleanup;
>>>>       }
>>>> @@ -2650,7 +2723,7 @@ static void raid56_parity_scrub_stripe(struct btrfs_raid_bio *rbio)
>>>>       int bios_to_read = 0;
>>>>       struct bio_list bio_list;
>>>>       int ret;
>>>> -    int pagenr;
>>>> +    int sectornr;
>>>>       int stripe;
>>>>       struct bio *bio;
>>>>
>>>> @@ -2666,28 +2739,30 @@ static void raid56_parity_scrub_stripe(struct btrfs_raid_bio *rbio)
>>>>        * stripe
>>>>        */
>>>>       for (stripe = 0; stripe < rbio->real_stripes; stripe++) {
>>>> -        for_each_set_bit(pagenr, rbio->dbitmap, rbio->stripe_npages) {
>>>> -            struct page *page;
>>>> +        for_each_set_bit(sectornr, rbio->dbitmap,
>>>> +                 rbio->stripe_nsectors) {
>>>> +            struct sector_ptr *sector;
>>>>               /*
>>>> -             * we want to find all the pages missing from
>>>> +             * We want to find all the sectors missing from
>>>>                * the rbio and read them from the disk.  If
>>>> -             * page_in_rbio finds a page in the bio list
>>>> +             * sector_in_rbio() finds a sector in the bio list
>>>>                * we don't need to read it off the stripe.
>>>>                */
>>>> -            page = page_in_rbio(rbio, stripe, pagenr, 1);
>>>> -            if (page)
>>>> +            sector = sector_in_rbio(rbio, stripe, sectornr, 1);
>>>> +            if (sector)
>>>>                   continue;
>>>>
>>>> -            page = rbio_stripe_page(rbio, stripe, pagenr);
>>>> +            sector = rbio_stripe_sector(rbio, stripe, sectornr);
>>>>               /*
>>>> -             * the bio cache may have handed us an uptodate
>>>> -             * page.  If so, be happy and use it
>>>> +             * The bio cache may have handed us an uptodate
>>>> +             * sector.  If so, be happy and use it
>>>>                */
>>>> -            if (PageUptodate(page))
>>>> +            if (sector->uptodate)
>>>>                   continue;
>>>>
>>>> -            ret = rbio_add_io_page(rbio, &bio_list, page, stripe,
>>>> -                           pagenr, rbio->stripe_len, REQ_OP_READ);
>>>> +            ret = rbio_add_io_sector(rbio, &bio_list, sector,
>>>> +                         stripe, sectornr,
>>>> +                         rbio->stripe_len, REQ_OP_READ);
>>>>               if (ret)
>>>>                   goto cleanup;
>>>>           }
>>>> -- 
>>>> 2.35.1

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 07/17] btrfs: make rbio_add_io_page() subpage compatible
  2022-04-14  0:43       ` Qu Wenruo
  2022-04-14 10:59         ` Qu Wenruo
@ 2022-04-14 15:43         ` David Sterba
  2022-04-14 17:51           ` David Sterba
  1 sibling, 1 reply; 34+ messages in thread
From: David Sterba @ 2022-04-14 15:43 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Thu, Apr 14, 2022 at 08:43:45AM +0800, Qu Wenruo wrote:
> > Failed to reproduce here, both x86_64 and aarch64 survived over 64 loops
> > of btrfs/023.
> > 
> > Although that's with my github branch. Let me check the current branch
> > to see what's wrong.
> 
> It's a rebase error.
> 
> At least one patch has incorrect diff snippet.

Ok, my bad, I did not do a final diff check, I'll update the patchset.

> Mind providing a branch for me to rebase my patchset on?

It is in for-next, I guess you found it already.


* Re: [PATCH v2 07/17] btrfs: make rbio_add_io_page() subpage compatible
  2022-04-14 10:59         ` Qu Wenruo
@ 2022-04-14 15:51           ` David Sterba
  2022-04-14 22:48             ` Qu Wenruo
  0 siblings, 1 reply; 34+ messages in thread
From: David Sterba @ 2022-04-14 15:51 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Thu, Apr 14, 2022 at 06:59:58PM +0800, Qu Wenruo wrote:
> >> to see what's wrong.
> > 
> > It's a rebase error.
> > 
> > At least one patch has incorrect diff snippet.
> > 
> > For the patch "btrfs: raid56: introduce btrfs_raid_bio::stripe_sectors":
> > 
> > For alloc_rbio(), in my branch and patch (v2) submitted to the mailing 
> > list:
> > 
> > @@ -978,6 +1014,7 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
> >          rbio = kzalloc(sizeof(*rbio) +
> >                         sizeof(*rbio->stripe_pages) * num_pages +
> >                         sizeof(*rbio->bio_pages) * num_pages +
> > +                      sizeof(*rbio->stripe_sectors) * num_sectors +
> 
> The more I check this part of the existing code, the worse it looks to me.
> 
> > We're really dancing on the edge; it's just a matter of time before we get cut.
> 
> Shouldn't we go the regular way, kzalloc the structure only, then alloc 
> needed pages/bitmaps?

Yeah I agree that this has become unwieldy with the definitions and
calculations. One thing to consider is the real memory footprint and the
overhead of one kmalloc vs several kmalloc of smaller memory chunks.
Both have pros and cons but if there is no real effect then we may have
better code with the separate allocations.
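The trade-off being weighed here can be sketched in userspace C. This is a hypothetical, simplified model (the names `rbio_like`, `alloc_one_chunk`, and `alloc_separate` are made up, not the actual btrfs code): option A mirrors the current single kzalloc with trailing arrays, option B the proposed struct-only allocation plus separate array allocations.

```c
/* Simplified sketch of the two allocation strategies discussed above. */
#include <assert.h>
#include <stdlib.h>

struct rbio_like {
	size_t num_pages;
	void **stripe_pages;	/* into the same chunk (A) or its own chunk (B) */
	void **bio_pages;
};

/*
 * Option A: one allocation, struct followed by all variable-length
 * arrays.  Compact, but the size arithmetic in the allocation call must
 * be kept in sync with the pointer setup below it.
 */
static struct rbio_like *alloc_one_chunk(size_t num_pages)
{
	struct rbio_like *r;

	r = calloc(1, sizeof(*r) + sizeof(void *) * num_pages * 2);
	if (!r)
		return NULL;
	r->num_pages = num_pages;
	r->stripe_pages = (void **)(r + 1);	/* arrays start right after the struct */
	r->bio_pages = r->stripe_pages + num_pages;
	return r;
}

/*
 * Option B: allocate the struct, then each array separately.  More
 * allocations, but no fragile offset arithmetic to keep in sync.
 */
static struct rbio_like *alloc_separate(size_t num_pages)
{
	struct rbio_like *r = calloc(1, sizeof(*r));

	if (!r)
		return NULL;
	r->num_pages = num_pages;
	r->stripe_pages = calloc(num_pages, sizeof(void *));
	r->bio_pages = calloc(num_pages, sizeof(void *));
	if (!r->stripe_pages || !r->bio_pages) {
		free(r->stripe_pages);
		free(r->bio_pages);
		free(r);
		return NULL;
	}
	return r;
}
```

In option A every pointer is derived from the one base address, which is why a missed term in the size calculation (as in the rebase mishap above) silently corrupts adjacent arrays; option B fails loudly instead.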


* Re: [PATCH v2 07/17] btrfs: make rbio_add_io_page() subpage compatible
  2022-04-14 15:43         ` David Sterba
@ 2022-04-14 17:51           ` David Sterba
  2022-04-14 22:28             ` Qu Wenruo
  0 siblings, 1 reply; 34+ messages in thread
From: David Sterba @ 2022-04-14 17:51 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, Qu Wenruo, linux-btrfs

On Thu, Apr 14, 2022 at 05:43:42PM +0200, David Sterba wrote:
> On Thu, Apr 14, 2022 at 08:43:45AM +0800, Qu Wenruo wrote:
> > > Failed to reproduce here, both x86_64 and aarch64 survived over 64 loops
> > > of btrfs/023.
> > > 
> > > Although that's with my github branch. Let me check the current branch
> > > to see what's wrong.
> > 
> > It's a rebase error.
> > 
> > At least one patch has incorrect diff snippet.
> 
> Ok, my bad, I did not do a final diff check, I'll update the patchset.

I should have done that, there are way more code changes that I missed
when going through the patch difference manually. I'll try to port it
again but uh.


* Re: [PATCH v2 07/17] btrfs: make rbio_add_io_page() subpage compatible
  2022-04-14 17:51           ` David Sterba
@ 2022-04-14 22:28             ` Qu Wenruo
  2022-04-21 16:24               ` David Sterba
  0 siblings, 1 reply; 34+ messages in thread
From: Qu Wenruo @ 2022-04-14 22:28 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs



On 2022/4/15 01:51, David Sterba wrote:
> On Thu, Apr 14, 2022 at 05:43:42PM +0200, David Sterba wrote:
>> On Thu, Apr 14, 2022 at 08:43:45AM +0800, Qu Wenruo wrote:
>>>> Failed to reproduce here, both x86_64 and aarch64 survived over 64 loops
>>>> of btrfs/023.
>>>>
>>>> Although that's with my github branch. Let me check the current branch
>>>> to see what's wrong.
>>>
>>> It's a rebase error.
>>>
>>> At least one patch has incorrect diff snippet.
>>
>> Ok, my bad, I did not do a final diff check, I'll update the patchset.
>
> I should have done that, there are way more code changes that I missed
> when going through the patch difference manually. I'll try to port it
> again but uh.

Mind if I do the rebase?

I'd just rebase all the changes onto the last commit right before the branch.

Doing the manual diff check would also give me a deeper impression of
what I should do in the first place to reduce your workload.

Thanks,
Qu


* Re: [PATCH v2 07/17] btrfs: make rbio_add_io_page() subpage compatible
  2022-04-14 15:51           ` David Sterba
@ 2022-04-14 22:48             ` Qu Wenruo
  2022-04-21 15:44               ` David Sterba
  0 siblings, 1 reply; 34+ messages in thread
From: Qu Wenruo @ 2022-04-14 22:48 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs



On 2022/4/14 23:51, David Sterba wrote:
> On Thu, Apr 14, 2022 at 06:59:58PM +0800, Qu Wenruo wrote:
>>>> to see what's wrong.
>>>
>>> It's a rebase error.
>>>
>>> At least one patch has incorrect diff snippet.
>>>
>>> For the patch "btrfs: raid56: introduce btrfs_raid_bio::stripe_sectors":
>>>
>>> For alloc_rbio(), in my branch and patch (v2) submitted to the mailing
>>> list:
>>>
>>> @@ -978,6 +1014,7 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
>>>           rbio = kzalloc(sizeof(*rbio) +
>>>                          sizeof(*rbio->stripe_pages) * num_pages +
>>>                          sizeof(*rbio->bio_pages) * num_pages +
>>> +                      sizeof(*rbio->stripe_sectors) * num_sectors +
>>
>> The more I check this part of the existing code, the worse it looks to me.
>>
>> We're really dancing on the edge; it's just a matter of time before we get cut.
>>
>> Shouldn't we go the regular way, kzalloc the structure only, then alloc
>> needed pages/bitmaps?
>
> Yeah I agree that this has become unwieldy with the definitions and
> calculations. One thing to consider is the real memory footprint and the
> overhead of one kmalloc vs several kmalloc of smaller memory chunks.

One thing to mention: although our single kmalloc looks like an
optimization, I guess the benefit is already small or none, especially
after the subpage patchset.

Just check 3-disk RAID5: one btrfs_raid_bio already takes over 2 KiB:

  After:  2320 (+97.8%) (From the cover letter)

In this case, I doubt smaller kmallocs would cause anything worse, as
kmalloc() will give us a 4K page/range anyway.

> Both have pros and cons but if there is no real effect then we may have
> better code with the separate allocations.

There are some ways to optimize, but they may be a little aggressive.

The most aggressive one would be, replace "struct page **stripe_pages"
to "struct page *stripe_page", and use alloc_pages_exact() to allocate a
range of physically adjacent pages.

In fact I'm even considering this for extent buffer pages, but it would
cause way more unexpected ENOMEM when memory is fragmented, thus too
aggressive.

Thanks,
Qu


* Re: [PATCH v2 04/17] btrfs: introduce new cached members for btrfs_raid_bio
  2022-04-12  9:32 ` [PATCH v2 04/17] btrfs: introduce new cached members for btrfs_raid_bio Qu Wenruo
@ 2022-04-15  5:31   ` Christoph Hellwig
  2022-04-15  5:34     ` Qu Wenruo
  0 siblings, 1 reply; 34+ messages in thread
From: Christoph Hellwig @ 2022-04-15  5:31 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

What is cached about these members?


* Re: [PATCH v2 04/17] btrfs: introduce new cached members for btrfs_raid_bio
  2022-04-15  5:31   ` Christoph Hellwig
@ 2022-04-15  5:34     ` Qu Wenruo
  2022-04-15  5:45       ` Christoph Hellwig
  0 siblings, 1 reply; 34+ messages in thread
From: Qu Wenruo @ 2022-04-15  5:34 UTC (permalink / raw)
  To: Christoph Hellwig, Qu Wenruo; +Cc: linux-btrfs



On 2022/4/15 13:31, Christoph Hellwig wrote:
> What is cached about these members?

It's just like btrfs_fs_info::sectorsize; it gives us a quick way to
use the value without calculating it again and again.
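As an illustration only (the struct and field names below are made up for the example, not the real btrfs members), the pattern is: derive the geometry values once when the structure is set up, then read the stored copies everywhere else.

```c
/* Hypothetical sketch of the "cached member" pattern discussed above. */
#include <assert.h>
#include <stdint.h>

struct raid_geom {
	uint32_t stripe_len;		/* fixed to 64K in current btrfs */
	uint32_t sectorsize;
	uint32_t stripe_nsectors;	/* cached: stripe_len / sectorsize */
};

static void raid_geom_init(struct raid_geom *g, uint32_t stripe_len,
			   uint32_t sectorsize)
{
	g->stripe_len = stripe_len;
	g->sectorsize = sectorsize;
	/* Computed once here; users read g->stripe_nsectors directly. */
	g->stripe_nsectors = stripe_len / sectorsize;
}
```

With 64K stripes this is 16 sectors per stripe on a 4K sectorsize system and 1 on a 64K one, so caching avoids repeating the same division in every loop over the stripe.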

BTW, are you suggesting we calculate everything when needed?

I'm fine either way.

Thanks,
Qu


* Re: [PATCH v2 04/17] btrfs: introduce new cached members for btrfs_raid_bio
  2022-04-15  5:34     ` Qu Wenruo
@ 2022-04-15  5:45       ` Christoph Hellwig
  0 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2022-04-15  5:45 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Christoph Hellwig, Qu Wenruo, linux-btrfs

On Fri, Apr 15, 2022 at 01:34:19PM +0800, Qu Wenruo wrote:
> 
> 
> On 2022/4/15 13:31, Christoph Hellwig wrote:
> > What is cached about these members?
> 
> It's just like btrfs_fs_info::sectorsize; it gives us a quick way to
> use the value without calculating it again and again.
> 
> BTW, are you suggesting we calculate everything when needed?
> 
> I'm fine either way.

No.  I was just confused by the term cached.


* Re: [PATCH v2 07/17] btrfs: make rbio_add_io_page() subpage compatible
  2022-04-14 22:48             ` Qu Wenruo
@ 2022-04-21 15:44               ` David Sterba
  0 siblings, 0 replies; 34+ messages in thread
From: David Sterba @ 2022-04-21 15:44 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Fri, Apr 15, 2022 at 06:48:07AM +0800, Qu Wenruo wrote:
> 
> 
> On 2022/4/14 23:51, David Sterba wrote:
> > On Thu, Apr 14, 2022 at 06:59:58PM +0800, Qu Wenruo wrote:
> >>>> to see what's wrong.
> >>>
> >>> It's a rebase error.
> >>>
> >>> At least one patch has incorrect diff snippet.
> >>>
> >>> For the patch "btrfs: raid56: introduce btrfs_raid_bio::stripe_sectors":
> >>>
> >>> For alloc_rbio(), in my branch and patch (v2) submitted to the mailing
> >>> list:
> >>>
> >>> @@ -978,6 +1014,7 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
> >>>           rbio = kzalloc(sizeof(*rbio) +
> >>>                          sizeof(*rbio->stripe_pages) * num_pages +
> >>>                          sizeof(*rbio->bio_pages) * num_pages +
> >>> +                      sizeof(*rbio->stripe_sectors) * num_sectors +
> >>
> >> The more I check this part of the existing code, the worse it looks to me.
> >>
> >> We're really dancing on the edge; it's just a matter of time before we get cut.
> >>
> >> Shouldn't we go the regular way, kzalloc the structure only, then alloc
> >> needed pages/bitmaps?
> >
> > Yeah I agree that this has become unwieldy with the definitions and
> > calculations. One thing to consider is the real memory footprint and the
> > overhead of one kmalloc vs several kmalloc of smaller memory chunks.
> 
> One thing to mention: although our single kmalloc looks like an
> optimization, I guess the benefit is already small or none, especially
> after the subpage patchset.
> 
> Just check 3-disk RAID5: one btrfs_raid_bio already takes over 2 KiB:
> 
>   After:  2320 (+97.8%) (From the cover letter)
> 
> In this case, I doubt smaller kmallocs would cause anything worse, as
> kmalloc() will give us a 4K page/range anyway.

Small allocations fall into the kmalloc-<size> slabs.

> > Both have pros and cons but if there is no real effect then we may have
> > better code with the separate allocations.
> 
> There are some ways to optimize, but they may be a little aggressive.
> 
> The most aggressive one would be, replace "struct page **stripe_pages"
> to "struct page *stripe_page", and use alloc_pages_exact() to allocate a
> range of physically adjacent pages.
> 
> In fact I'm even considering this for extent buffer pages, but it would
> cause way more unexpected ENOMEM when memory is fragmented, thus too
> aggressive.

Yeah, we must assume the memory can get fragmented, so allocating
individual pages must stay as a fallback. As extent buffers represent
the metadata, we can't afford to return such errors when there's
enough memory but not in the shape we'd like.

I've thought about using the contiguous allocation too, as it makes the
memcpy in the eb helpers much easier. What could be done is to have
both: optimistically allocate a contiguous range and fall back to what
we have now, tracking the status in the eb flags. This means there
would be a fast path if we're lucky and get the contiguous range, and
this could speed some things up.
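That optimistic-with-fallback scheme can be modeled in a few lines of userspace C. This is a rough sketch under stated assumptions (the names `page_set` and `alloc_pages_set` are invented, `malloc` stands in for the page allocator, and the `pretend_fragmented` flag simulates fragmentation); in the kernel it would be something like alloc_pages_exact() falling back to per-page alloc_page().

```c
/* Hypothetical model: try one contiguous range, fall back to single pages. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SIZE 4096
#define MAX_PAGES 16

struct page_set {
	void *pages[MAX_PAGES];
	size_t nr;
	bool contiguous;	/* fast path taken? (the "eb flag" in the idea above) */
};

static bool alloc_pages_set(struct page_set *ps, size_t nr,
			    bool pretend_fragmented)
{
	void *range;
	size_t i;

	ps->nr = nr;
	/* Fast path: one physically contiguous range covering all pages. */
	range = pretend_fragmented ? NULL : malloc(nr * PAGE_SIZE);
	if (range) {
		for (i = 0; i < nr; i++)
			ps->pages[i] = (char *)range + i * PAGE_SIZE;
		ps->contiguous = true;
		return true;
	}
	/* Fallback: individual pages, as the code does today. */
	for (i = 0; i < nr; i++) {
		ps->pages[i] = malloc(PAGE_SIZE);
		if (!ps->pages[i])
			return false;
	}
	ps->contiguous = false;
	return true;
}
```

Callers that want the fast memcpy check `contiguous` and treat `pages[0]` as one flat buffer; otherwise they keep the current page-by-page helpers.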


* Re: [PATCH v2 07/17] btrfs: make rbio_add_io_page() subpage compatible
  2022-04-14 22:28             ` Qu Wenruo
@ 2022-04-21 16:24               ` David Sterba
  0 siblings, 0 replies; 34+ messages in thread
From: David Sterba @ 2022-04-21 16:24 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Fri, Apr 15, 2022 at 06:28:35AM +0800, Qu Wenruo wrote:
> > I should have done that, there are way more code changes that I missed
> > when going through the patch difference manually. I'll try to port it
> > again but uh.
> 
> Mind me to do the rebase?
> 
> Just base all the changes to the last commit just before the branch.
> 
> As doing manually diff check would also give me a deeper impression on
> what I should do in the first place to reduce your workload.

I don't push all the topic branches but they appear in for-next so you
can extract it from there, otherwise I can always push it on request so
we can sort it out.

Anyway, the branch is now merged to misc-next, and regarding the
problems I think there were a few factors that struck at the same time.
Checking the diff of my changes against your branch is a safety check
that I should have done. We want branches for testing, but patch
revisions also happen, so they can diverge. That we all stick to the
same coding style and formatting preferences is a noble goal, but
practically not 100% achievable; there's always something left.


end of thread, other threads:[~2022-04-21 16:31 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-12  9:32 [PATCH v2 00/17] btrfs: add subpage support for RAID56 Qu Wenruo
2022-04-12  9:32 ` [PATCH v2 01/17] btrfs: reduce width for stripe_len from u64 to u32 Qu Wenruo
2022-04-12  9:32 ` [PATCH v2 02/17] btrfs: open-code rbio_nr_pages() Qu Wenruo
2022-04-12  9:32 ` [PATCH v2 03/17] btrfs: make btrfs_raid_bio more compact Qu Wenruo
2022-04-12  9:32 ` [PATCH v2 04/17] btrfs: introduce new cached members for btrfs_raid_bio Qu Wenruo
2022-04-15  5:31   ` Christoph Hellwig
2022-04-15  5:34     ` Qu Wenruo
2022-04-15  5:45       ` Christoph Hellwig
2022-04-12  9:32 ` [PATCH v2 05/17] btrfs: introduce btrfs_raid_bio::stripe_sectors Qu Wenruo
2022-04-12  9:32 ` [PATCH v2 06/17] btrfs: introduce btrfs_raid_bio::bio_sectors Qu Wenruo
2022-04-12  9:32 ` [PATCH v2 07/17] btrfs: make rbio_add_io_page() subpage compatible Qu Wenruo
2022-04-13 19:14   ` David Sterba
2022-04-13 23:28     ` Qu Wenruo
2022-04-14  0:43       ` Qu Wenruo
2022-04-14 10:59         ` Qu Wenruo
2022-04-14 15:51           ` David Sterba
2022-04-14 22:48             ` Qu Wenruo
2022-04-21 15:44               ` David Sterba
2022-04-14 15:43         ` David Sterba
2022-04-14 17:51           ` David Sterba
2022-04-14 22:28             ` Qu Wenruo
2022-04-21 16:24               ` David Sterba
2022-04-12  9:32 ` [PATCH v2 08/17] btrfs: make finish_parity_scrub() " Qu Wenruo
2022-04-12  9:32 ` [PATCH v2 09/17] btrfs: make __raid_recover_endio_io() subpage compatibable Qu Wenruo
2022-04-12  9:33 ` [PATCH v2 10/17] btrfs: make finish_rmw() subpage compatible Qu Wenruo
2022-04-12  9:33 ` [PATCH v2 11/17] btrfs: open-code rbio_stripe_page_index() Qu Wenruo
2022-04-12  9:33 ` [PATCH v2 12/17] btrfs: make raid56_add_scrub_pages() subpage compatible Qu Wenruo
2022-04-12  9:33 ` [PATCH v2 13/17] btrfs: remove btrfs_raid_bio::bio_pages array Qu Wenruo
2022-04-12  9:33 ` [PATCH v2 14/17] btrfs: make set_bio_pages_uptodate() subpage compatible Qu Wenruo
2022-04-12  9:33 ` [PATCH v2 15/17] btrfs: make steal_rbio() " Qu Wenruo
2022-04-12  9:33 ` [PATCH v2 16/17] btrfs: make alloc_rbio_essential_pages() " Qu Wenruo
2022-04-12  9:33 ` [PATCH v2 17/17] btrfs: enable subpage support for RAID56 Qu Wenruo
2022-04-12 17:42 ` [PATCH v2 00/17] btrfs: add " David Sterba
2022-04-13 14:46   ` David Sterba
