* [PATCH 00/16] btrfs: add subpage support for RAID56
@ 2022-04-01 11:23 Qu Wenruo
  2022-04-01 11:23 ` [PATCH 01/16] btrfs: open-code rbio_nr_pages() Qu Wenruo
                   ` (16 more replies)
  0 siblings, 17 replies; 33+ messages in thread
From: Qu Wenruo @ 2022-04-01 11:23 UTC (permalink / raw)
  To: linux-btrfs

The branch can be fetched from github, based on a slightly older
misc-next branch:
https://github.com/adam900710/linux/tree/subpage_raid56

[DESIGN]
To make the RAID56 code support subpage, we need to make every sector
of a RAID56 full stripe (including P/Q) addressable.

Previously we used page pointers directly for things like stripe_pages:

A full stripe is 64K * 3: two data stripes plus one P stripe (RAID5)

stripe_pages:   | 0 | 1 | 2 |  .. | 46 | 47 |

All of those 48 pointers point to pages we allocated.


The new subpage support will introduce a sector layer, based on the old
stripe_pages[] array:

The same 64K * 3, RAID5 layout, but 64K page size and 4K sector size:

stripe_sectors: | 0 | 1 | .. |15 |16 |  ...  |31 |32 | ..    |47 |
stripe_pages:   |      Page 0    |     Page 1    |    Page 2     |

Each sector_ptr of stripe_sectors[] will include:

- One page pointer
  Points back to the page inside stripe_pages[].

- One pgoff member
  To indicate the offset of the sector inside the page.

- One uptodate member
  To indicate whether the sector is uptodate, replacing the old
  PageUptodate flag.
  Introducing the full btrfs_subpage structure into stripe_pages[]
  would be overkill, as we only care about the Uptodate flag.

The same applies to bio_sectors[] array, which is going to completely
replace the old bio_pages[] array.
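
For reference, here is a minimal sketch of what such a sector
structure looks like (mirroring what patches 4 and 5 below introduce;
the real definition is packed):

  /* A bvec-like structure to present a sector inside a page */
  struct sector_ptr {
          struct page *page;        /* Points into stripe_pages[] */
          unsigned int pgoff;       /* Offset of the sector inside the page */
          unsigned int uptodate:1;  /* Replaces the old PageUptodate flag */
  };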

[SIDE EFFECT]
Despite the obvious new ability for subpage systems to utilize btrfs
RAID56 profiles, it will cause higher memory usage for the real
btrfs_raid_bio structure.

We allocate extra memory based on the stripe size and number of stripes,
and update the pointer arrays to utilize the extra memory.

For comparison, we use a pretty standard setup: a 3-disk raid5 with 4K
page size on x86_64:

 Before: 1176
 After:  2032 (+72.8%)

The reasons for such a big bump are:

- Extra size for sector_ptr.
  Instead of just a page pointer, each entry is now twice the size of
  a pointer (a page pointer + 2 * unsigned int).

  This means that although we replace bio_pages[] with bio_sectors[],
  we are still enlarging the size.

- A completely new array for stripe_sectors[]
  The array itself is also twice the size of the old stripe_pages[].
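
As a rough illustration of where the bytes go (same 3-disk RAID5 with
64K stripe length, 4K page and sector size as above; exact totals
depend on the packed layout of sector_ptr):

  nr_pages = nr_sectors = 3 * 64K / 4K = 48

  Before: bio_pages[] + stripe_pages[]   -> 2 * 48 page pointers
  After:  stripe_pages[] stays, while bio_sectors[] and
          stripe_sectors[] add 2 * 48 sector_ptr entries, each
          roughly twice the size of a bare pointer.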

There have been some attempts to reduce the size of btrfs_raid_bio
itself, but the big impact still comes from the new sector_ptr arrays.

I have tried my best to reduce the size by compacting the sector_ptr
structure.
Without exotic macros, I don't have any better ideas for reducing the
real size of btrfs_raid_bio.

[TESTS]
Full fstests are run on both x86_64 and aarch64.
No new regressions found.
(In fact several regressions were found during development; all fixed.)

[PATCHSET LAYOUT]
The patchset layout takes several things into consideration:

- Every patch can be tested independently on x86_64
  No more tons of unused helpers followed by one big switch.
  Every change can be verified on x86_64.

- More temporary sanity checks than the previous code
  For example, when rbio_add_io_page() is converted to be subpage
  compatible, an extra ASSERT() is added to ensure no subpage range
  can ever be added.

  Such temporary checks are removed in the last enablement patch.
  This is to make testing on x86_64 more comprehensive.

- Mostly small changes in each patch
  The only exception is the conversion of rbio_add_io_page().
  But most of the changes in that patch come from variable renaming.
  The overall lines changed in each patch should still be small enough
  for review.

Qu Wenruo (16):
  btrfs: open-code rbio_nr_pages()
  btrfs: make btrfs_raid_bio more compact
  btrfs: introduce new cached members for btrfs_raid_bio
  btrfs: introduce btrfs_raid_bio::stripe_sectors
  btrfs: introduce btrfs_raid_bio::bio_sectors
  btrfs: make rbio_add_io_page() subpage compatible
  btrfs: make finish_parity_scrub() subpage compatible
  btrfs: make __raid_recover_endio_io() subpage compatible
  btrfs: make finish_rmw() subpage compatible
  btrfs: open-code rbio_stripe_page_index()
  btrfs: make raid56_add_scrub_pages() subpage compatible
  btrfs: remove btrfs_raid_bio::bio_pages array
  btrfs: make set_bio_pages_uptodate() subpage compatible
  btrfs: make steal_rbio() subpage compatible
  btrfs: make alloc_rbio_essential_pages() subpage compatible
  btrfs: enable subpage support for RAID56

 fs/btrfs/disk-io.c |   8 -
 fs/btrfs/raid56.c  | 748 ++++++++++++++++++++++++++++-----------------
 fs/btrfs/raid56.h  |   2 +-
 fs/btrfs/scrub.c   |   6 +-
 fs/btrfs/volumes.c |   7 -
 5 files changed, 467 insertions(+), 304 deletions(-)

-- 
2.35.1



* [PATCH 01/16] btrfs: open-code rbio_nr_pages()
  2022-04-01 11:23 [PATCH 00/16] btrfs: add subpage support for RAID56 Qu Wenruo
@ 2022-04-01 11:23 ` Qu Wenruo
  2022-04-01 11:23 ` [PATCH 02/16] btrfs: make btrfs_raid_bio more compact Qu Wenruo
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 33+ messages in thread
From: Qu Wenruo @ 2022-04-01 11:23 UTC (permalink / raw)
  To: linux-btrfs

The function rbio_nr_pages() is only called once inside alloc_rbio(),
so there is no reason to keep it as a dedicated helper.

Furthermore, the return type doesn't match: the function returns
"unsigned long", which is wider than necessary, while the only caller
only uses "int".

Since we're cleaning up here, also fix the type to "const unsigned
int" for all involved local variables.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/raid56.c | 16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 0e239a4c3b26..ae585ed6a023 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -945,15 +945,6 @@ static struct page *page_in_rbio(struct btrfs_raid_bio *rbio,
 	return rbio->stripe_pages[chunk_page];
 }
 
-/*
- * number of pages we need for the entire stripe across all the
- * drives
- */
-static unsigned long rbio_nr_pages(unsigned long stripe_len, int nr_stripes)
-{
-	return DIV_ROUND_UP(stripe_len, PAGE_SIZE) * nr_stripes;
-}
-
 /*
  * allocation and initial setup for the btrfs_raid_bio.  Not
  * this does not allocate any pages for rbio->pages.
@@ -962,11 +953,12 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
 					 struct btrfs_io_context *bioc,
 					 u64 stripe_len)
 {
+	const unsigned int real_stripes = bioc->num_stripes - bioc->num_tgtdevs;
+	const unsigned int num_pages = DIV_ROUND_UP(stripe_len, PAGE_SIZE) *
+				       real_stripes;
+	const unsigned int stripe_npages = DIV_ROUND_UP(stripe_len, PAGE_SIZE);
 	struct btrfs_raid_bio *rbio;
 	int nr_data = 0;
-	int real_stripes = bioc->num_stripes - bioc->num_tgtdevs;
-	int num_pages = rbio_nr_pages(stripe_len, real_stripes);
-	int stripe_npages = DIV_ROUND_UP(stripe_len, PAGE_SIZE);
 	void *p;
 
 	rbio = kzalloc(sizeof(*rbio) +
-- 
2.35.1



* [PATCH 02/16] btrfs: make btrfs_raid_bio more compact
  2022-04-01 11:23 [PATCH 00/16] btrfs: add subpage support for RAID56 Qu Wenruo
  2022-04-01 11:23 ` [PATCH 01/16] btrfs: open-code rbio_nr_pages() Qu Wenruo
@ 2022-04-01 11:23 ` Qu Wenruo
  2022-04-01 11:23 ` [PATCH 03/16] btrfs: introduce new cached members for btrfs_raid_bio Qu Wenruo
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 33+ messages in thread
From: Qu Wenruo @ 2022-04-01 11:23 UTC (permalink / raw)
  To: linux-btrfs

There are a lot of members using much larger types than necessary in
btrfs_raid_bio.

Like nr_pages, which represents the total number of pages in a full
stripe.

Instead of int (which is at least 32 bits), u16 is already enough
(max stripe length will be 256MiB, already beyond the current raid56
device number limit).
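
As a quick sanity check of the u16 bound (illustrative arithmetic, 4K
page size): a u16 nr_pages allows up to 65535 pages, which at 4KiB
each is roughly 256MiB for the whole full stripe.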

So this patch will reduce the width of the following members:

- stripe_len to u32
- nr_pages to u16
- nr_data to u8
- real_stripes to u8
- scrubp to u8
- faila/b to s8
  As -1 is used to indicate no corruption

This will slightly reduce the size of btrfs_raid_bio from 272 bytes to
256 bytes, saving 16 bytes.

But please note that, when using btrfs_raid_bio, we allocate extra
space for it to cover the various pointer arrays, so the reduced
memory is not really a big saving overall.

As we're modifying the comments here anyway, update the existing
comments to the current code standard.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/raid56.c | 42 ++++++++++++++++++++++--------------------
 1 file changed, 22 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index ae585ed6a023..2553e1bb8bbf 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -101,15 +101,6 @@ struct btrfs_raid_bio {
 	 */
 	unsigned long flags;
 
-	/* size of each individual stripe on disk */
-	int stripe_len;
-
-	/* number of data stripes (no p/q) */
-	int nr_data;
-
-	int real_stripes;
-
-	int stripe_npages;
 	/*
 	 * set if we're doing a parity rebuild
 	 * for a read from higher up, which is handled
@@ -118,21 +109,32 @@ struct btrfs_raid_bio {
 	 */
 	enum btrfs_rbio_ops operation;
 
-	/* first bad stripe */
-	int faila;
+	/* Size of each individual stripe on disk */
+	u32 stripe_len;
 
-	/* second bad stripe (for raid6 use) */
-	int failb;
+	/* How many pages there are for the full stripe including P/Q */
+	u16 nr_pages;
 
-	int scrubp;
-	/*
-	 * number of pages needed to represent the full
-	 * stripe
-	 */
-	int nr_pages;
+	/* Number of data stripes (no p/q) */
+	u8 nr_data;
+
+	/* Number of all stripes (including P/Q) */
+	u8 real_stripes;
+
+	/* How many pages there are for each stripe */
+	u8 stripe_npages;
+
+	/* First bad stripe, -1 means no corruption */
+	s8 faila;
+
+	/* Second bad stripe (for raid6 use) */
+	s8 failb;
+
+	/* Stripe number that we're scrubbing  */
+	u8 scrubp;
 
 	/*
-	 * size of all the bios in the bio_list.  This
+	 * Size of all the bios in the bio_list.  This
 	 * helps us decide if the rbio maps to a full
 	 * stripe or not
 	 */
-- 
2.35.1



* [PATCH 03/16] btrfs: introduce new cached members for btrfs_raid_bio
  2022-04-01 11:23 [PATCH 00/16] btrfs: add subpage support for RAID56 Qu Wenruo
  2022-04-01 11:23 ` [PATCH 01/16] btrfs: open-code rbio_nr_pages() Qu Wenruo
  2022-04-01 11:23 ` [PATCH 02/16] btrfs: make btrfs_raid_bio more compact Qu Wenruo
@ 2022-04-01 11:23 ` Qu Wenruo
  2022-04-11 10:29   ` Geert Uytterhoeven
  2022-04-01 11:23 ` [PATCH 04/16] btrfs: introduce btrfs_raid_bio::stripe_sectors Qu Wenruo
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 33+ messages in thread
From: Qu Wenruo @ 2022-04-01 11:23 UTC (permalink / raw)
  To: linux-btrfs

The new members are all related to the number of sectors, while the
existing number-of-pages members are kept as is:

- nr_sectors
  Total sectors of the full stripe including P/Q.

- stripe_nsectors
  The number of sectors in a single stripe.
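
For example (illustrative numbers for a 3-disk RAID5 with 64K stripe
length, 64K page size and 4K sector size, as in the cover letter):

  stripe_npages   = 64K / 64K      = 1
  stripe_nsectors = 64K / 4K       = 16
  nr_pages        = 1 * 3 stripes  = 3
  nr_sectors      = 16 * 3 stripes = 48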

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/raid56.c | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 2553e1bb8bbf..1bad7d3a0331 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -115,6 +115,9 @@ struct btrfs_raid_bio {
 	/* How many pages there are for the full stripe including P/Q */
 	u16 nr_pages;
 
+	/* How many sectors there are for the full stripe including P/Q */
+	u16 nr_sectors;
+
 	/* Number of data stripes (no p/q) */
 	u8 nr_data;
 
@@ -124,6 +127,9 @@ struct btrfs_raid_bio {
 	/* How many pages there are for each stripe */
 	u8 stripe_npages;
 
+	/* How many sectors there are for each stripe */
+	u8 stripe_nsectors;
+
 	/* First bad stripe, -1 means no corruption */
 	s8 faila;
 
@@ -172,7 +178,7 @@ struct btrfs_raid_bio {
 	/* allocated with real_stripes-many pointers for finish_*() calls */
 	void **finish_pointers;
 
-	/* allocated with stripe_npages-many bits for finish_*() calls */
+	/* allocated with stripe_nsectors-many bits for finish_*() calls */
 	unsigned long *finish_pbitmap;
 };
 
@@ -958,18 +964,25 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
 	const unsigned int real_stripes = bioc->num_stripes - bioc->num_tgtdevs;
 	const unsigned int num_pages = DIV_ROUND_UP(stripe_len, PAGE_SIZE) *
 				       real_stripes;
+	const unsigned int num_sectors = DIV_ROUND_UP(stripe_len * real_stripes,
+						      fs_info->sectorsize);
 	const unsigned int stripe_npages = DIV_ROUND_UP(stripe_len, PAGE_SIZE);
+	const unsigned int stripe_nsectors = DIV_ROUND_UP(stripe_len,
+							  fs_info->sectorsize);
 	struct btrfs_raid_bio *rbio;
 	int nr_data = 0;
 	void *p;
 
+	/* PAGE_SIZE must be aligned to sectorsize for subpage support */
+	ASSERT(IS_ALIGNED(PAGE_SIZE, fs_info->sectorsize));
+
 	rbio = kzalloc(sizeof(*rbio) +
 		       sizeof(*rbio->stripe_pages) * num_pages +
 		       sizeof(*rbio->bio_pages) * num_pages +
 		       sizeof(*rbio->finish_pointers) * real_stripes +
-		       sizeof(*rbio->dbitmap) * BITS_TO_LONGS(stripe_npages) +
+		       sizeof(*rbio->dbitmap) * BITS_TO_LONGS(stripe_nsectors) +
 		       sizeof(*rbio->finish_pbitmap) *
-				BITS_TO_LONGS(stripe_npages),
+				BITS_TO_LONGS(stripe_nsectors),
 		       GFP_NOFS);
 	if (!rbio)
 		return ERR_PTR(-ENOMEM);
@@ -982,8 +995,10 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
 	rbio->bioc = bioc;
 	rbio->stripe_len = stripe_len;
 	rbio->nr_pages = num_pages;
+	rbio->nr_sectors = num_sectors;
 	rbio->real_stripes = real_stripes;
 	rbio->stripe_npages = stripe_npages;
+	rbio->stripe_nsectors = stripe_nsectors;
 	rbio->faila = -1;
 	rbio->failb = -1;
 	refcount_set(&rbio->refs, 1);
@@ -1002,8 +1017,8 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
 	CONSUME_ALLOC(rbio->stripe_pages, num_pages);
 	CONSUME_ALLOC(rbio->bio_pages, num_pages);
 	CONSUME_ALLOC(rbio->finish_pointers, real_stripes);
-	CONSUME_ALLOC(rbio->dbitmap, BITS_TO_LONGS(stripe_npages));
-	CONSUME_ALLOC(rbio->finish_pbitmap, BITS_TO_LONGS(stripe_npages));
+	CONSUME_ALLOC(rbio->dbitmap, BITS_TO_LONGS(stripe_nsectors));
+	CONSUME_ALLOC(rbio->finish_pbitmap, BITS_TO_LONGS(stripe_nsectors));
 #undef  CONSUME_ALLOC
 
 	if (bioc->map_type & BTRFS_BLOCK_GROUP_RAID5)
-- 
2.35.1



* [PATCH 04/16] btrfs: introduce btrfs_raid_bio::stripe_sectors
  2022-04-01 11:23 [PATCH 00/16] btrfs: add subpage support for RAID56 Qu Wenruo
                   ` (2 preceding siblings ...)
  2022-04-01 11:23 ` [PATCH 03/16] btrfs: introduce new cached members for btrfs_raid_bio Qu Wenruo
@ 2022-04-01 11:23 ` Qu Wenruo
  2022-04-08 16:40   ` David Sterba
  2022-04-01 11:23 ` [PATCH 05/16] btrfs: introduce btrfs_raid_bio::bio_sectors Qu Wenruo
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 33+ messages in thread
From: Qu Wenruo @ 2022-04-01 11:23 UTC (permalink / raw)
  To: linux-btrfs

The new member is an array of sector_ptr structures; its entries will
represent all sectors inside a full stripe (including P/Q).

They co-operate with btrfs_raid_bio::stripe_pages:

stripe_pages:   | Page 0, range [0, 64K)   | Page 1 ...
stripe_sectors: |  |  | ...             |  |
                  |  |                    \- sector 15, page 0, pgoff=60K
                  |  \- sector 1, page 0, pgoff=4K
                  \---- sector 0, page 0, pgoff=0

With such structure, we can represent subpage sectors without using
extra pages.

Here we introduce a new helper, index_stripe_sectors(), to update
stripe_sectors[] to point to the correct page and pgoff (a worked
example of the mapping follows the function list below).

So every time any rbio::stripe_pages[] pointer gets updated, the new
helper should be called.

The following functions have to call the new helper:

- steal_rbio()
- alloc_rbio_pages()
- alloc_rbio_parity_pages()
- alloc_rbio_essential_pages()
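
As a worked instance of the helper's mapping (64K page size, 4K sector
size; numbers are for illustration only): for sector i the helper
computes offset = i * sectorsize, so sector 17 gets

  page_index = (17 * 4K) >> PAGE_SHIFT = 1
  pgoff      = offset_in_page(68K)     = 4K

i.e. stripe_sectors[17] points into stripe_pages[1] at a 4K offset.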

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/raid56.c | 62 +++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 54 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 1bad7d3a0331..8cfe00db79c9 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -52,6 +52,16 @@ struct btrfs_stripe_hash_table {
 	struct btrfs_stripe_hash table[];
 };
 
+/*
+ * A bvec like structure to present a sector inside a page.
+ *
+ * Unlike bvec we don't need bvlen, as it's fixed to sectorsize.
+ */
+struct sector_ptr {
+	struct page *page;
+	unsigned int pgoff;
+} __attribute__ ((__packed__));
+
 enum btrfs_rbio_ops {
 	BTRFS_RBIO_WRITE,
 	BTRFS_RBIO_READ_REBUILD,
@@ -154,25 +164,27 @@ struct btrfs_raid_bio {
 
 	atomic_t error;
 	/*
-	 * these are two arrays of pointers.  We allocate the
+	 * These are two arrays of pointers.  We allocate the
 	 * rbio big enough to hold them both and setup their
 	 * locations when the rbio is allocated
 	 */
 
-	/* pointers to pages that we allocated for
+	/*
+	 * Pointers to pages that we allocated for
 	 * reading/writing stripes directly from the disk (including P/Q)
 	 */
 	struct page **stripe_pages;
 
-	/*
-	 * pointers to the pages in the bio_list.  Stored
-	 * here for faster lookup
-	 */
+	/* Pointers to the pages in the bio_list, for faster lookup */
 	struct page **bio_pages;
 
 	/*
-	 * bitmap to record which horizontal stripe has data
+	 * For subpage support, we need to map each sector to above
+	 * stripe_pages.
 	 */
+	struct sector_ptr *stripe_sectors;
+
+	/* Bitmap to record which horizontal stripe has data */
 	unsigned long *dbitmap;
 
 	/* allocated with real_stripes-many pointers for finish_*() calls */
@@ -291,6 +303,26 @@ static int rbio_bucket(struct btrfs_raid_bio *rbio)
 	return hash_64(num >> 16, BTRFS_STRIPE_HASH_TABLE_BITS);
 }
 
+/*
+ * Update the stripe_sectors[] array to use correct page and pgoff
+ *
+ * Should be called every time any page pointer in stripe_pages[] is modified.
+ */
+static void index_stripe_sectors(struct btrfs_raid_bio *rbio)
+{
+	const u32 sectorsize = rbio->bioc->fs_info->sectorsize;
+	u32 offset;
+	int i;
+
+	for (i = 0, offset = 0; i < rbio->nr_sectors; i++, offset += sectorsize) {
+		int page_index = offset >> PAGE_SHIFT;
+
+		ASSERT(page_index < rbio->nr_pages);
+		rbio->stripe_sectors[i].page = rbio->stripe_pages[page_index];
+		rbio->stripe_sectors[i].pgoff = offset_in_page(offset);
+	}
+}
+
 /*
  * stealing an rbio means taking all the uptodate pages from the stripe
  * array in the source rbio and putting them into the destination rbio
@@ -317,6 +349,8 @@ static void steal_rbio(struct btrfs_raid_bio *src, struct btrfs_raid_bio *dest)
 		dest->stripe_pages[i] = s;
 		src->stripe_pages[i] = NULL;
 	}
+	index_stripe_sectors(dest);
+	index_stripe_sectors(src);
 }
 
 /*
@@ -979,6 +1013,7 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
 	rbio = kzalloc(sizeof(*rbio) +
 		       sizeof(*rbio->stripe_pages) * num_pages +
 		       sizeof(*rbio->bio_pages) * num_pages +
+		       sizeof(*rbio->stripe_sectors) * num_sectors +
 		       sizeof(*rbio->finish_pointers) * real_stripes +
 		       sizeof(*rbio->dbitmap) * BITS_TO_LONGS(stripe_nsectors) +
 		       sizeof(*rbio->finish_pbitmap) *
@@ -1016,6 +1051,7 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
 	} while (0)
 	CONSUME_ALLOC(rbio->stripe_pages, num_pages);
 	CONSUME_ALLOC(rbio->bio_pages, num_pages);
+	CONSUME_ALLOC(rbio->stripe_sectors, num_sectors);
 	CONSUME_ALLOC(rbio->finish_pointers, real_stripes);
 	CONSUME_ALLOC(rbio->dbitmap, BITS_TO_LONGS(stripe_nsectors));
 	CONSUME_ALLOC(rbio->finish_pbitmap, BITS_TO_LONGS(stripe_nsectors));
@@ -1032,12 +1068,16 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
 	return rbio;
 }
 
-/* allocate pages for all the stripes in the bio, including parity */
+/*
+ * Allocate pages for all the stripes in the bio including parity, and map all
+ * sectors to their corresponding pages
+ */
 static int alloc_rbio_pages(struct btrfs_raid_bio *rbio)
 {
 	int i;
 	struct page *page;
 
+	/* Allocate all pages */
 	for (i = 0; i < rbio->nr_pages; i++) {
 		if (rbio->stripe_pages[i])
 			continue;
@@ -1046,6 +1086,9 @@ static int alloc_rbio_pages(struct btrfs_raid_bio *rbio)
 			return -ENOMEM;
 		rbio->stripe_pages[i] = page;
 	}
+
+	/* Mapping all sectors */
+	index_stripe_sectors(rbio);
 	return 0;
 }
 
@@ -1065,6 +1108,8 @@ static int alloc_rbio_parity_pages(struct btrfs_raid_bio *rbio)
 			return -ENOMEM;
 		rbio->stripe_pages[i] = page;
 	}
+
+	index_stripe_sectors(rbio);
 	return 0;
 }
 
@@ -2313,6 +2358,7 @@ static int alloc_rbio_essential_pages(struct btrfs_raid_bio *rbio)
 			rbio->stripe_pages[index] = page;
 		}
 	}
+	index_stripe_sectors(rbio);
 	return 0;
 }
 
-- 
2.35.1



* [PATCH 05/16] btrfs: introduce btrfs_raid_bio::bio_sectors
  2022-04-01 11:23 [PATCH 00/16] btrfs: add subpage support for RAID56 Qu Wenruo
                   ` (3 preceding siblings ...)
  2022-04-01 11:23 ` [PATCH 04/16] btrfs: introduce btrfs_raid_bio::stripe_sectors Qu Wenruo
@ 2022-04-01 11:23 ` Qu Wenruo
  2022-04-08 16:46   ` David Sterba
  2022-04-01 11:23 ` [PATCH 06/16] btrfs: make rbio_add_io_page() subpage compatible Qu Wenruo
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 33+ messages in thread
From: Qu Wenruo @ 2022-04-01 11:23 UTC (permalink / raw)
  To: linux-btrfs

This new member is going to fully replace bio_pages in the future, but
for now let's keep both coexisting until the full switch is done.

Currently cache_rbio_pages() and index_rbio_pages() will also populate
the new array.
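
For intuition on the indexing done by the new index_one_bio() helper
below (illustrative numbers): a bio starting 8K into the full stripe,
with a 4K sector size, has its first sector placed at

  index = (bio logical start - bioc->raid_map[0]) / sectorsize
        = 8K / 4K = 2

so it populates bio_sectors[2] with that bvec's page and pgoff.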

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/raid56.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 53 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 8cfe00db79c9..f23fd282d298 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -60,6 +60,7 @@ struct btrfs_stripe_hash_table {
 struct sector_ptr {
 	struct page *page;
 	unsigned int pgoff;
+	unsigned int uptodate:1;
 } __attribute__ ((__packed__));
 
 enum btrfs_rbio_ops {
@@ -175,6 +176,9 @@ struct btrfs_raid_bio {
 	 */
 	struct page **stripe_pages;
 
+	/* Pointers to the sectors in the bio_list, for faster lookup */
+	struct sector_ptr *bio_sectors;
+
 	/* Pointers to the pages in the bio_list, for faster lookup */
 	struct page **bio_pages;
 
@@ -282,6 +286,24 @@ static void cache_rbio_pages(struct btrfs_raid_bio *rbio)
 		copy_highpage(rbio->stripe_pages[i], rbio->bio_pages[i]);
 		SetPageUptodate(rbio->stripe_pages[i]);
 	}
+
+	/*
+	 * TODO: This work is duplicated with above loop, should remove above
+	 * loop when the switch is fully done.
+	 */
+	for (i = 0; i < rbio->nr_sectors; i++) {
+		/* Some range not covered by bio (partial write), skip it */
+		if (!rbio->bio_sectors[i].page)
+			continue;
+
+		ASSERT(rbio->stripe_sectors[i].page);
+		memcpy_page(rbio->stripe_sectors[i].page,
+			    rbio->stripe_sectors[i].pgoff,
+			    rbio->bio_sectors[i].page,
+			    rbio->bio_sectors[i].pgoff,
+			    rbio->bioc->fs_info->sectorsize);
+		rbio->stripe_sectors[i].uptodate = 1;
+	}
 	set_bit(RBIO_CACHE_READY_BIT, &rbio->flags);
 }
 
@@ -1012,7 +1034,7 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
 
 	rbio = kzalloc(sizeof(*rbio) +
 		       sizeof(*rbio->stripe_pages) * num_pages +
-		       sizeof(*rbio->bio_pages) * num_pages +
+		       sizeof(*rbio->bio_sectors) * num_sectors +
 		       sizeof(*rbio->stripe_sectors) * num_sectors +
 		       sizeof(*rbio->finish_pointers) * real_stripes +
 		       sizeof(*rbio->dbitmap) * BITS_TO_LONGS(stripe_nsectors) +
@@ -1051,6 +1073,7 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
 	} while (0)
 	CONSUME_ALLOC(rbio->stripe_pages, num_pages);
 	CONSUME_ALLOC(rbio->bio_pages, num_pages);
+	CONSUME_ALLOC(rbio->bio_sectors, num_sectors);
 	CONSUME_ALLOC(rbio->stripe_sectors, num_sectors);
 	CONSUME_ALLOC(rbio->finish_pointers, real_stripes);
 	CONSUME_ALLOC(rbio->dbitmap, BITS_TO_LONGS(stripe_nsectors));
@@ -1184,6 +1207,31 @@ static void validate_rbio_for_rmw(struct btrfs_raid_bio *rbio)
 	}
 }
 
+static void index_one_bio(struct btrfs_raid_bio *rbio, struct bio *bio)
+{
+	const u32 sectorsize = rbio->bioc->fs_info->sectorsize;
+	struct bio_vec bvec;
+	struct bvec_iter iter;
+	u32 offset = (bio->bi_iter.bi_sector << SECTOR_SHIFT) -
+		     rbio->bioc->raid_map[0];
+
+	if (bio_flagged(bio, BIO_CLONED))
+		bio->bi_iter = btrfs_bio(bio)->iter;
+	bio_for_each_segment(bvec, bio, iter) {
+		u32 bvec_offset;
+
+		for (bvec_offset = 0; bvec_offset < bvec.bv_len;
+		     bvec_offset += sectorsize, offset += sectorsize) {
+			int index = offset / sectorsize;
+			struct sector_ptr *sector = &rbio->bio_sectors[index];
+
+			sector->page = bvec.bv_page;
+			sector->pgoff = bvec.bv_offset + bvec_offset;
+			ASSERT(sector->pgoff < PAGE_SIZE);
+		}
+	}
+}
+
 /*
  * helper function to walk our bio list and populate the bio_pages array with
  * the result.  This seems expensive, but it is faster than constantly
@@ -1217,6 +1265,10 @@ static void index_rbio_pages(struct btrfs_raid_bio *rbio)
 			i++;
 		}
 	}
+	/* TODO: This loop will replace above loop when the full switch is done */
+	bio_list_for_each(bio, &rbio->bio_list)
+		index_one_bio(rbio, bio);
+
 	spin_unlock_irq(&rbio->bio_list_lock);
 }
 
-- 
2.35.1



* [PATCH 06/16] btrfs: make rbio_add_io_page() subpage compatible
  2022-04-01 11:23 [PATCH 00/16] btrfs: add subpage support for RAID56 Qu Wenruo
                   ` (4 preceding siblings ...)
  2022-04-01 11:23 ` [PATCH 05/16] btrfs: introduce btrfs_raid_bio::bio_sectors Qu Wenruo
@ 2022-04-01 11:23 ` Qu Wenruo
  2022-04-01 11:23 ` [PATCH 07/16] btrfs: make finish_parity_scrub() " Qu Wenruo
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 33+ messages in thread
From: Qu Wenruo @ 2022-04-01 11:23 UTC (permalink / raw)
  To: linux-btrfs

This involves:

- Rename rbio_add_io_page() to rbio_add_io_sector()
  Although we still rely on PAGE_SIZE == sectorsize, add a new
  ASSERT() inside rbio_add_io_sector() to make sure every pgoff is 0.

- Introduce rbio_stripe_sector() helper
  The equivalent of rbio_stripe_page().

  This new helper has extra ASSERT()s to validate the stripe and sector
  number.

- Introduce sector_in_rbio() helper
  The equivalent of page_in_rbio().

- Rename @pagenr variables to @sectornr

- Use rbio::stripe_nsectors when iterating the bitmap

Please note that this only changes the interface; the bios are still
using full pages for IO.
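
For example (illustrative numbers): with stripe_nsectors = 16, the
sector at stripe_nr = 2, sector_nr = 5 lives at

  rbio_stripe_sector_index() = 2 * 16 + 5 = 37

and rbio_add_io_sector() computes its disk byte offset as
stripe->physical + 5 * sectorsize.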

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/raid56.c | 236 ++++++++++++++++++++++++++++++----------------
 1 file changed, 155 insertions(+), 81 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index f23fd282d298..07d0b26024dd 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -664,6 +664,25 @@ static int rbio_can_merge(struct btrfs_raid_bio *last,
 	return 1;
 }
 
+static unsigned int rbio_stripe_sector_index(const struct btrfs_raid_bio *rbio,
+					     unsigned int stripe_nr,
+					     unsigned int sector_nr)
+{
+	ASSERT(stripe_nr < rbio->real_stripes);
+	ASSERT(sector_nr < rbio->stripe_nsectors);
+
+	return stripe_nr * rbio->stripe_nsectors + sector_nr;
+}
+
+/* Return a sector from rbio->stripe_sectors, not from the bio list */
+static struct sector_ptr *rbio_stripe_sector(const struct btrfs_raid_bio *rbio,
+					     unsigned int stripe_nr,
+					     unsigned int sector_nr)
+{
+	return &rbio->stripe_sectors[rbio_stripe_sector_index(rbio, stripe_nr,
+							      sector_nr)];
+}
+
 static int rbio_stripe_page_index(struct btrfs_raid_bio *rbio, int stripe,
 				  int index)
 {
@@ -975,6 +994,44 @@ static void raid_write_end_io(struct bio *bio)
 	rbio_orig_end_io(rbio, err);
 }
 
+/*
+ * To get a sector pointer specified by its @stripe_nr and @sector_nr
+ *
+ * @stripe_nr:		Stripe number, valid range [0, real_stripe)
+ * @sector_nr:		Sector number inside the stripe,
+ *			valid range [0, stripe_nsectors)
+ * @bio_list_only:	Whether to use sectors inside the bio list only.
+ *
+ * The read/modify/write code wants to reuse the original bio page as much
+ * as possible, and only use stripe_sectors as fallback.
+ */
+static struct sector_ptr *sector_in_rbio(struct btrfs_raid_bio *rbio,
+					 int stripe_nr, int sector_nr,
+					 bool bio_list_only)
+{
+	struct sector_ptr *sector = NULL;
+	int index;
+
+	ASSERT(stripe_nr >= 0 && stripe_nr < rbio->real_stripes);
+	ASSERT(sector_nr >= 0 && sector_nr < rbio->stripe_nsectors);
+
+	index = stripe_nr * rbio->stripe_nsectors + sector_nr;
+	ASSERT(index >= 0 && index < rbio->nr_sectors);
+
+	spin_lock_irq(&rbio->bio_list_lock);
+	sector = &rbio->bio_sectors[index];
+	if (sector->page || bio_list_only) {
+		/* Don't return sector without a valid page pointer */
+		if (!sector->page)
+			sector = NULL;
+		spin_unlock_irq(&rbio->bio_list_lock);
+		return sector;
+	}
+	spin_unlock_irq(&rbio->bio_list_lock);
+
+	return &rbio->stripe_sectors[index];
+}
+
 /*
  * the read/modify/write code wants to use the original bio for
  * any pages it included, and then use the rbio for everything
@@ -1137,25 +1194,39 @@ static int alloc_rbio_parity_pages(struct btrfs_raid_bio *rbio)
 }
 
 /*
- * add a single page from a specific stripe into our list of bios for IO
- * this will try to merge into existing bios if possible, and returns
- * zero if all went well.
+ * Add a single sector @sector into our list of bios for IO.
+ *
+ * Return 0 if everything went well.
+ * Return <0 for error.
  */
-static int rbio_add_io_page(struct btrfs_raid_bio *rbio,
-			    struct bio_list *bio_list,
-			    struct page *page,
-			    int stripe_nr,
-			    unsigned long page_index,
-			    unsigned long bio_max_len)
+static int rbio_add_io_sector(struct btrfs_raid_bio *rbio,
+			      struct bio_list *bio_list,
+			      struct sector_ptr *sector,
+			      unsigned int stripe_nr,
+			      unsigned int sector_nr,
+			      unsigned long bio_max_len)
 {
+	const u32 sectorsize = rbio->bioc->fs_info->sectorsize;
 	struct bio *last = bio_list->tail;
 	int ret;
 	struct bio *bio;
 	struct btrfs_io_stripe *stripe;
 	u64 disk_start;
 
+	/*
+	 * NOTE: here stripe_nr has taken device replace into consideration,
+	 * thus it can be larger than rbio->real_stripes.
+	 * So here we check against bioc->num_stripes, not rbio->real_stripes.
+	 */
+	ASSERT(stripe_nr >= 0 && stripe_nr < rbio->bioc->num_stripes);
+	ASSERT(sector_nr >= 0 && sector_nr < rbio->stripe_nsectors);
+	ASSERT(sector->page);
+
+	/* TODO: We don't yet support subpage, thus pgoff should always be 0 */
+	ASSERT(sector->pgoff == 0);
+
 	stripe = &rbio->bioc->stripes[stripe_nr];
-	disk_start = stripe->physical + (page_index << PAGE_SHIFT);
+	disk_start = stripe->physical + sector_nr * sectorsize;
 
 	/* if the device is missing, just fail this stripe */
 	if (!stripe->dev->bdev)
@@ -1172,8 +1243,9 @@ static int rbio_add_io_page(struct btrfs_raid_bio *rbio,
 		 */
 		if (last_end == disk_start && !last->bi_status &&
 		    last->bi_bdev == stripe->dev->bdev) {
-			ret = bio_add_page(last, page, PAGE_SIZE, 0);
-			if (ret == PAGE_SIZE)
+			ret = bio_add_page(last, sector->page, sectorsize,
+					   sector->pgoff);
+			if (ret == sectorsize)
 				return 0;
 		}
 	}
@@ -1185,7 +1257,7 @@ static int rbio_add_io_page(struct btrfs_raid_bio *rbio,
 	bio_set_dev(bio, stripe->dev->bdev);
 	bio->bi_iter.bi_sector = disk_start >> 9;
 
-	bio_add_page(bio, page, PAGE_SIZE, 0);
+	bio_add_page(bio, sector->page, sectorsize, sector->pgoff);
 	bio_list_add(bio_list, bio);
 	return 0;
 }
@@ -1286,7 +1358,7 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
 	void **pointers = rbio->finish_pointers;
 	int nr_data = rbio->nr_data;
 	int stripe;
-	int pagenr;
+	int sectornr;
 	bool has_qstripe;
 	struct bio_list bio_list;
 	struct bio *bio;
@@ -1330,16 +1402,16 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
 	else
 		clear_bit(RBIO_CACHE_READY_BIT, &rbio->flags);
 
-	for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
+	for (sectornr = 0; sectornr < rbio->stripe_nsectors; sectornr++) {
 		struct page *p;
 		/* first collect one page from each data stripe */
 		for (stripe = 0; stripe < nr_data; stripe++) {
-			p = page_in_rbio(rbio, stripe, pagenr, 0);
+			p = page_in_rbio(rbio, stripe, sectornr, 0);
 			pointers[stripe] = kmap_local_page(p);
 		}
 
 		/* then add the parity stripe */
-		p = rbio_pstripe_page(rbio, pagenr);
+		p = rbio_pstripe_page(rbio, sectornr);
 		SetPageUptodate(p);
 		pointers[stripe++] = kmap_local_page(p);
 
@@ -1349,7 +1421,7 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
 			 * raid6, add the qstripe and call the
 			 * library function to fill in our p/q
 			 */
-			p = rbio_qstripe_page(rbio, pagenr);
+			p = rbio_qstripe_page(rbio, sectornr);
 			SetPageUptodate(p);
 			pointers[stripe++] = kmap_local_page(p);
 
@@ -1370,18 +1442,19 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
 	 * everything else.
 	 */
 	for (stripe = 0; stripe < rbio->real_stripes; stripe++) {
-		for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
-			struct page *page;
+		for (sectornr = 0; sectornr < rbio->stripe_nsectors; sectornr++) {
+			struct sector_ptr *sector;
+
 			if (stripe < rbio->nr_data) {
-				page = page_in_rbio(rbio, stripe, pagenr, 1);
-				if (!page)
+				sector = sector_in_rbio(rbio, stripe, sectornr, 1);
+				if (!sector)
 					continue;
 			} else {
-			       page = rbio_stripe_page(rbio, stripe, pagenr);
+				sector = rbio_stripe_sector(rbio, stripe, sectornr);
 			}
 
-			ret = rbio_add_io_page(rbio, &bio_list,
-				       page, stripe, pagenr, rbio->stripe_len);
+			ret = rbio_add_io_sector(rbio, &bio_list, sector, stripe,
+						 sectornr, rbio->stripe_len);
 			if (ret)
 				goto cleanup;
 		}
@@ -1394,19 +1467,20 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
 		if (!bioc->tgtdev_map[stripe])
 			continue;
 
-		for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
-			struct page *page;
+		for (sectornr = 0; sectornr < rbio->stripe_nsectors; sectornr++) {
+			struct sector_ptr *sector;
+
 			if (stripe < rbio->nr_data) {
-				page = page_in_rbio(rbio, stripe, pagenr, 1);
-				if (!page)
+				sector = sector_in_rbio(rbio, stripe, sectornr, 1);
+				if (!sector)
 					continue;
 			} else {
-			       page = rbio_stripe_page(rbio, stripe, pagenr);
+				sector = rbio_stripe_sector(rbio, stripe, sectornr);
 			}
 
-			ret = rbio_add_io_page(rbio, &bio_list, page,
+			ret = rbio_add_io_sector(rbio, &bio_list, sector,
 					       rbio->bioc->tgtdev_map[stripe],
-					       pagenr, rbio->stripe_len);
+					       sectornr, rbio->stripe_len);
 			if (ret)
 				goto cleanup;
 		}
@@ -1584,7 +1658,7 @@ static int raid56_rmw_stripe(struct btrfs_raid_bio *rbio)
 	int bios_to_read = 0;
 	struct bio_list bio_list;
 	int ret;
-	int pagenr;
+	int sectornr;
 	int stripe;
 	struct bio *bio;
 
@@ -1602,28 +1676,29 @@ static int raid56_rmw_stripe(struct btrfs_raid_bio *rbio)
 	 * stripe
 	 */
 	for (stripe = 0; stripe < rbio->nr_data; stripe++) {
-		for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
-			struct page *page;
+		for (sectornr = 0; sectornr < rbio->stripe_nsectors; sectornr++) {
+			struct sector_ptr *sector;
+
 			/*
-			 * we want to find all the pages missing from
+			 * We want to find all the sectors missing from
 			 * the rbio and read them from the disk.  If
-			 * page_in_rbio finds a page in the bio list
+			 * sector_in_rbio() finds a page in the bio list
 			 * we don't need to read it off the stripe.
 			 */
-			page = page_in_rbio(rbio, stripe, pagenr, 1);
-			if (page)
+			sector = sector_in_rbio(rbio, stripe, sectornr, 1);
+			if (sector)
 				continue;
 
-			page = rbio_stripe_page(rbio, stripe, pagenr);
+			sector = rbio_stripe_sector(rbio, stripe, sectornr);
 			/*
-			 * the bio cache may have handed us an uptodate
+			 * The bio cache may have handed us an uptodate
 			 * page.  If so, be happy and use it
 			 */
-			if (PageUptodate(page))
+			if (sector->uptodate)
 				continue;
 
-			ret = rbio_add_io_page(rbio, &bio_list, page,
-				       stripe, pagenr, rbio->stripe_len);
+			ret = rbio_add_io_sector(rbio, &bio_list, sector,
+				       stripe, sectornr, rbio->stripe_len);
 			if (ret)
 				goto cleanup;
 		}
@@ -2129,7 +2204,7 @@ static int __raid56_parity_recover(struct btrfs_raid_bio *rbio)
 	int bios_to_read = 0;
 	struct bio_list bio_list;
 	int ret;
-	int pagenr;
+	int sectornr;
 	int stripe;
 	struct bio *bio;
 
@@ -2152,20 +2227,19 @@ static int __raid56_parity_recover(struct btrfs_raid_bio *rbio)
 			continue;
 		}
 
-		for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
-			struct page *p;
+		for (sectornr = 0; sectornr < rbio->stripe_nsectors; sectornr++) {
+			struct sector_ptr *sector;
 
 			/*
 			 * the rmw code may have already read this
 			 * page in
 			 */
-			p = rbio_stripe_page(rbio, stripe, pagenr);
-			if (PageUptodate(p))
+			sector = rbio_stripe_sector(rbio, stripe, sectornr);
+			if (sector->uptodate)
 				continue;
 
-			ret = rbio_add_io_page(rbio, &bio_list,
-				       rbio_stripe_page(rbio, stripe, pagenr),
-				       stripe, pagenr, rbio->stripe_len);
+			ret = rbio_add_io_sector(rbio, &bio_list, sector,
+						 stripe, sectornr, rbio->stripe_len);
 			if (ret < 0)
 				goto cleanup;
 		}
@@ -2422,7 +2496,7 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
 	unsigned long *pbitmap = rbio->finish_pbitmap;
 	int nr_data = rbio->nr_data;
 	int stripe;
-	int pagenr;
+	int sectornr;
 	bool has_qstripe;
 	struct page *p_page = NULL;
 	struct page *q_page = NULL;
@@ -2442,7 +2516,7 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
 
 	if (bioc->num_tgtdevs && bioc->tgtdev_map[rbio->scrubp]) {
 		is_replace = 1;
-		bitmap_copy(pbitmap, rbio->dbitmap, rbio->stripe_npages);
+		bitmap_copy(pbitmap, rbio->dbitmap, rbio->stripe_nsectors);
 	}
 
 	/*
@@ -2476,12 +2550,12 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
 	/* Map the parity stripe just once */
 	pointers[nr_data] = kmap_local_page(p_page);
 
-	for_each_set_bit(pagenr, rbio->dbitmap, rbio->stripe_npages) {
+	for_each_set_bit(sectornr, rbio->dbitmap, rbio->stripe_nsectors) {
 		struct page *p;
 		void *parity;
 		/* first collect one page from each data stripe */
 		for (stripe = 0; stripe < nr_data; stripe++) {
-			p = page_in_rbio(rbio, stripe, pagenr, 0);
+			p = page_in_rbio(rbio, stripe, sectornr, 0);
 			pointers[stripe] = kmap_local_page(p);
 		}
 
@@ -2496,13 +2570,13 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
 		}
 
 		/* Check scrubbing parity and repair it */
-		p = rbio_stripe_page(rbio, rbio->scrubp, pagenr);
+		p = rbio_stripe_page(rbio, rbio->scrubp, sectornr);
 		parity = kmap_local_page(p);
 		if (memcmp(parity, pointers[rbio->scrubp], PAGE_SIZE))
 			copy_page(parity, pointers[rbio->scrubp]);
 		else
 			/* Parity is right, needn't writeback */
-			bitmap_clear(rbio->dbitmap, pagenr, 1);
+			bitmap_clear(rbio->dbitmap, sectornr, 1);
 		kunmap_local(parity);
 
 		for (stripe = nr_data - 1; stripe >= 0; stripe--)
@@ -2522,12 +2596,12 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
 	 * higher layers (the bio_list in our rbio) and our p/q.  Ignore
 	 * everything else.
 	 */
-	for_each_set_bit(pagenr, rbio->dbitmap, rbio->stripe_npages) {
-		struct page *page;
+	for_each_set_bit(sectornr, rbio->dbitmap, rbio->stripe_nsectors) {
+		struct sector_ptr *sector;
 
-		page = rbio_stripe_page(rbio, rbio->scrubp, pagenr);
-		ret = rbio_add_io_page(rbio, &bio_list,
-			       page, rbio->scrubp, pagenr, rbio->stripe_len);
+		sector = rbio_stripe_sector(rbio, rbio->scrubp, sectornr);
+		ret = rbio_add_io_sector(rbio, &bio_list, sector, rbio->scrubp,
+					 sectornr, rbio->stripe_len);
 		if (ret)
 			goto cleanup;
 	}
@@ -2535,13 +2609,13 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
 	if (!is_replace)
 		goto submit_write;
 
-	for_each_set_bit(pagenr, pbitmap, rbio->stripe_npages) {
-		struct page *page;
+	for_each_set_bit(sectornr, pbitmap, rbio->stripe_nsectors) {
+		struct sector_ptr *sector;
 
-		page = rbio_stripe_page(rbio, rbio->scrubp, pagenr);
-		ret = rbio_add_io_page(rbio, &bio_list, page,
+		sector = rbio_stripe_sector(rbio, rbio->scrubp, sectornr);
+		ret = rbio_add_io_sector(rbio, &bio_list, sector,
 				       bioc->tgtdev_map[rbio->scrubp],
-				       pagenr, rbio->stripe_len);
+				       sectornr, rbio->stripe_len);
 		if (ret)
 			goto cleanup;
 	}
@@ -2675,7 +2749,7 @@ static void raid56_parity_scrub_stripe(struct btrfs_raid_bio *rbio)
 	int bios_to_read = 0;
 	struct bio_list bio_list;
 	int ret;
-	int pagenr;
+	int sectornr;
 	int stripe;
 	struct bio *bio;
 
@@ -2691,28 +2765,28 @@ static void raid56_parity_scrub_stripe(struct btrfs_raid_bio *rbio)
 	 * stripe
 	 */
 	for (stripe = 0; stripe < rbio->real_stripes; stripe++) {
-		for_each_set_bit(pagenr, rbio->dbitmap, rbio->stripe_npages) {
-			struct page *page;
+		for_each_set_bit(sectornr, rbio->dbitmap, rbio->stripe_nsectors) {
+			struct sector_ptr *sector;
 			/*
-			 * we want to find all the pages missing from
+			 * We want to find all the sectors missing from
 			 * the rbio and read them from the disk.  If
-			 * page_in_rbio finds a page in the bio list
+			 * sector_in_rbio() finds a sector in the bio list
 			 * we don't need to read it off the stripe.
 			 */
-			page = page_in_rbio(rbio, stripe, pagenr, 1);
-			if (page)
+			sector = sector_in_rbio(rbio, stripe, sectornr, 1);
+			if (sector)
 				continue;
 
-			page = rbio_stripe_page(rbio, stripe, pagenr);
+			sector = rbio_stripe_sector(rbio, stripe, sectornr);
 			/*
-			 * the bio cache may have handed us an uptodate
-			 * page.  If so, be happy and use it
+			 * The bio cache may have handed us an uptodate
+			 * sector.  If so, be happy and use it
 			 */
-			if (PageUptodate(page))
+			if (sector->uptodate)
 				continue;
 
-			ret = rbio_add_io_page(rbio, &bio_list, page,
-				       stripe, pagenr, rbio->stripe_len);
+			ret = rbio_add_io_sector(rbio, &bio_list, sector,
+				       stripe, sectornr, rbio->stripe_len);
 			if (ret)
 				goto cleanup;
 		}
-- 
2.35.1



* [PATCH 07/16] btrfs: make finish_parity_scrub() subpage compatible
  2022-04-01 11:23 [PATCH 00/16] btrfs: add subpage support for RAID56 Qu Wenruo
                   ` (5 preceding siblings ...)
  2022-04-01 11:23 ` [PATCH 06/16] btrfs: make rbio_add_io_page() subpage compatible Qu Wenruo
@ 2022-04-01 11:23 ` Qu Wenruo
  2022-04-08 17:04   ` David Sterba
  2022-04-01 11:23 ` [PATCH 08/16] btrfs: make __raid_recover_endio_io() subpage compatible Qu Wenruo
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 33+ messages in thread
From: Qu Wenruo @ 2022-04-01 11:23 UTC (permalink / raw)
  To: linux-btrfs

The core is to convert direct page usage into sector_ptr usage, and
to use memcpy() to replace copy_page().

For the pointer usage, we need to convert it to kmap_local_page() +
sector->pgoff.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/raid56.c | 56 +++++++++++++++++++++++++++--------------------
 1 file changed, 32 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 07d0b26024dd..bd01e64ea4fc 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -2492,14 +2492,15 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
 					 int need_check)
 {
 	struct btrfs_io_context *bioc = rbio->bioc;
+	const u32 sectorsize = bioc->fs_info->sectorsize;
 	void **pointers = rbio->finish_pointers;
 	unsigned long *pbitmap = rbio->finish_pbitmap;
 	int nr_data = rbio->nr_data;
 	int stripe;
 	int sectornr;
 	bool has_qstripe;
-	struct page *p_page = NULL;
-	struct page *q_page = NULL;
+	struct sector_ptr p_sector = {};
+	struct sector_ptr q_sector = {};
 	struct bio_list bio_list;
 	struct bio *bio;
 	int is_replace = 0;
@@ -2529,51 +2530,56 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
 	if (!need_check)
 		goto writeback;
 
-	p_page = alloc_page(GFP_NOFS);
-	if (!p_page)
+	p_sector.page = alloc_page(GFP_NOFS);
+	if (!p_sector.page)
 		goto cleanup;
-	SetPageUptodate(p_page);
+	p_sector.pgoff = 0;
+	p_sector.uptodate = 1;
 
 	if (has_qstripe) {
 		/* RAID6, allocate and map temp space for the Q stripe */
-		q_page = alloc_page(GFP_NOFS);
-		if (!q_page) {
-			__free_page(p_page);
+		q_sector.page = alloc_page(GFP_NOFS);
+		if (!q_sector.page) {
+			__free_page(p_sector.page);
+			p_sector.page = NULL;
 			goto cleanup;
 		}
-		SetPageUptodate(q_page);
-		pointers[rbio->real_stripes - 1] = kmap_local_page(q_page);
+		q_sector.pgoff = 0;
+		q_sector.uptodate = 1;
+		pointers[rbio->real_stripes - 1] = kmap_local_page(q_sector.page);
 	}
 
 	atomic_set(&rbio->error, 0);
 
 	/* Map the parity stripe just once */
-	pointers[nr_data] = kmap_local_page(p_page);
+	pointers[nr_data] = kmap_local_page(p_sector.page);
 
 	for_each_set_bit(sectornr, rbio->dbitmap, rbio->stripe_nsectors) {
-		struct page *p;
+		struct sector_ptr *sector;
 		void *parity;
+
 		/* first collect one page from each data stripe */
 		for (stripe = 0; stripe < nr_data; stripe++) {
-			p = page_in_rbio(rbio, stripe, sectornr, 0);
-			pointers[stripe] = kmap_local_page(p);
+			sector = sector_in_rbio(rbio, stripe, sectornr, 0);
+			pointers[stripe] = kmap_local_page(sector->page) +
+					   sector->pgoff;
 		}
 
 		if (has_qstripe) {
 			/* RAID6, call the library function to fill in our P/Q */
-			raid6_call.gen_syndrome(rbio->real_stripes, PAGE_SIZE,
+			raid6_call.gen_syndrome(rbio->real_stripes, sectorsize,
 						pointers);
 		} else {
 			/* raid5 */
-			copy_page(pointers[nr_data], pointers[0]);
-			run_xor(pointers + 1, nr_data - 1, PAGE_SIZE);
+			memcpy(pointers[nr_data], pointers[0], sectorsize);
+			run_xor(pointers + 1, nr_data - 1, sectorsize);
 		}
 
 		/* Check scrubbing parity and repair it */
-		p = rbio_stripe_page(rbio, rbio->scrubp, sectornr);
-		parity = kmap_local_page(p);
-		if (memcmp(parity, pointers[rbio->scrubp], PAGE_SIZE))
-			copy_page(parity, pointers[rbio->scrubp]);
+		sector = rbio_stripe_sector(rbio, rbio->scrubp, sectornr);
+		parity = kmap_local_page(sector->page) + sector->pgoff;
+		if (memcmp(parity, pointers[rbio->scrubp], sectorsize))
+			memcpy(parity, pointers[rbio->scrubp], sectorsize);
 		else
 			/* Parity is right, needn't writeback */
 			bitmap_clear(rbio->dbitmap, sectornr, 1);
@@ -2584,10 +2590,12 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
 	}
 
 	kunmap_local(pointers[nr_data]);
-	__free_page(p_page);
-	if (q_page) {
+	__free_page(p_sector.page);
+	p_sector.page = NULL;
+	if (q_sector.page) {
 		kunmap_local(pointers[rbio->real_stripes - 1]);
-		__free_page(q_page);
+		__free_page(q_sector.page);
+		q_sector.page = NULL;
 	}
 
 writeback:
-- 
2.35.1



* [PATCH 08/16] btrfs: make __raid_recover_endio_io() subpage compatible
  2022-04-01 11:23 [PATCH 00/16] btrfs: add subpage support for RAID56 Qu Wenruo
                   ` (6 preceding siblings ...)
  2022-04-01 11:23 ` [PATCH 07/16] btrfs: make finish_parity_scrub() " Qu Wenruo
@ 2022-04-01 11:23 ` Qu Wenruo
  2022-04-01 11:23 ` [PATCH 09/16] btrfs: make finish_rmw() subpage compatible Qu Wenruo
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 33+ messages in thread
From: Qu Wenruo @ 2022-04-01 11:23 UTC (permalink / raw)
  To: linux-btrfs

One trick involved is that, since our pointers[] array has to have
sector::pgoff added, we can no longer use those addresses to unmap the
pages.

Thankfully we have another array for kunmap_local(), so the two arrays
no longer contain the same values, even before the rotation.
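
In short, the mapping pattern (as in the hunk below) becomes:

  unmap_array[stripe] = kmap_local_page(sector->page);
  pointers[stripe]    = unmap_array[stripe] + sector->pgoff;
  ...
  /* Later, even after pointers[] has been rotated: */
  kunmap_local(unmap_array[stripe]);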

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/raid56.c | 57 ++++++++++++++++++++++++++---------------------
 1 file changed, 32 insertions(+), 25 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index bd01e64ea4fc..1fbeaf2909b4 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1947,20 +1947,24 @@ int raid56_parity_write(struct bio *bio, struct btrfs_io_context *bioc,
 }
 
 /*
- * all parity reconstruction happens here.  We've read in everything
+ * All parity reconstruction happens here.  We've read in everything
  * we can find from the drives and this does the heavy lifting of
  * sorting the good from the bad.
  */
 static void __raid_recover_end_io(struct btrfs_raid_bio *rbio)
 {
-	int pagenr, stripe;
+	const u32 sectorsize = rbio->bioc->fs_info->sectorsize;
+	int sectornr, stripe;
 	void **pointers;
 	void **unmap_array;
 	int faila = -1, failb = -1;
-	struct page *page;
 	blk_status_t err;
 	int i;
 
+	/*
+	 * This array stores the pointer for each sector, thus it has the extra
+	 * pgoff value added from each sector
+	 */
 	pointers = kcalloc(rbio->real_stripes, sizeof(void *), GFP_NOFS);
 	if (!pointers) {
 		err = BLK_STS_RESOURCE;
@@ -1970,6 +1974,8 @@ static void __raid_recover_end_io(struct btrfs_raid_bio *rbio)
 	/*
 	 * Store copy of pointers that does not get reordered during
 	 * reconstruction so that kunmap_local works.
+	 * This array doesn't have pgoff added so we can safely call kunmap_local
+	 * on it.
 	 */
 	unmap_array = kcalloc(rbio->real_stripes, sizeof(void *), GFP_NOFS);
 	if (!unmap_array) {
@@ -1989,43 +1995,43 @@ static void __raid_recover_end_io(struct btrfs_raid_bio *rbio)
 
 	index_rbio_pages(rbio);
 
-	for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
+	for (sectornr = 0; sectornr < rbio->stripe_nsectors; sectornr++) {
+		struct sector_ptr *sector;
+
 		/*
 		 * Now we just use bitmap to mark the horizontal stripes in
 		 * which we have data when doing parity scrub.
 		 */
 		if (rbio->operation == BTRFS_RBIO_PARITY_SCRUB &&
-		    !test_bit(pagenr, rbio->dbitmap))
+		    !test_bit(sectornr, rbio->dbitmap))
 			continue;
 
 		/*
-		 * Setup our array of pointers with pages from each stripe
+		 * Setup our array of pointers with sectors from each stripe
 		 *
 		 * NOTE: store a duplicate array of pointers to preserve the
 		 * pointer order
 		 */
 		for (stripe = 0; stripe < rbio->real_stripes; stripe++) {
 			/*
-			 * if we're rebuilding a read, we have to use
+			 * If we're rebuilding a read, we have to use
 			 * pages from the bio list
 			 */
 			if ((rbio->operation == BTRFS_RBIO_READ_REBUILD ||
 			     rbio->operation == BTRFS_RBIO_REBUILD_MISSING) &&
 			    (stripe == faila || stripe == failb)) {
-				page = page_in_rbio(rbio, stripe, pagenr, 0);
+				sector = sector_in_rbio(rbio, stripe, sectornr, 0);
 			} else {
-				page = rbio_stripe_page(rbio, stripe, pagenr);
+				sector = rbio_stripe_sector(rbio, stripe, sectornr);
 			}
-			pointers[stripe] = kmap_local_page(page);
-			unmap_array[stripe] = pointers[stripe];
+			ASSERT(sector->page);
+			unmap_array[stripe] = kmap_local_page(sector->page);
+			pointers[stripe] = unmap_array[stripe] + sector->pgoff;
 		}
 
-		/* all raid6 handling here */
+		/* All raid6 handling here */
 		if (rbio->bioc->map_type & BTRFS_BLOCK_GROUP_RAID6) {
-			/*
-			 * single failure, rebuild from parity raid5
-			 * style
-			 */
+			/* Single failure, rebuild from parity raid5 style */
 			if (failb < 0) {
 				if (faila == rbio->nr_data) {
 					/*
@@ -2068,10 +2074,10 @@ static void __raid_recover_end_io(struct btrfs_raid_bio *rbio)
 
 			if (rbio->bioc->raid_map[failb] == RAID5_P_STRIPE) {
 				raid6_datap_recov(rbio->real_stripes,
-						  PAGE_SIZE, faila, pointers);
+						  sectorsize, faila, pointers);
 			} else {
 				raid6_2data_recov(rbio->real_stripes,
-						  PAGE_SIZE, faila, failb,
+						  sectorsize, faila, failb,
 						  pointers);
 			}
 		} else {
@@ -2081,7 +2087,8 @@ static void __raid_recover_end_io(struct btrfs_raid_bio *rbio)
 			BUG_ON(failb != -1);
 pstripe:
 			/* Copy parity block into failed block to start with */
-			copy_page(pointers[faila], pointers[rbio->nr_data]);
+			memcpy(pointers[faila], pointers[rbio->nr_data],
+			       sectorsize);
 
 			/* rearrange the pointer array */
 			p = pointers[faila];
@@ -2090,7 +2097,7 @@ static void __raid_recover_end_io(struct btrfs_raid_bio *rbio)
 			pointers[rbio->nr_data - 1] = p;
 
 			/* xor in the rest */
-			run_xor(pointers, rbio->nr_data - 1, PAGE_SIZE);
+			run_xor(pointers, rbio->nr_data - 1, sectorsize);
 		}
 		/* if we're doing this rebuild as part of an rmw, go through
 		 * and set all of our private rbio pages in the
@@ -2099,14 +2106,14 @@ static void __raid_recover_end_io(struct btrfs_raid_bio *rbio)
 		 * other endio functions will fiddle the uptodate bits
 		 */
 		if (rbio->operation == BTRFS_RBIO_WRITE) {
-			for (i = 0;  i < rbio->stripe_npages; i++) {
+			for (i = 0;  i < rbio->stripe_nsectors; i++) {
 				if (faila != -1) {
-					page = rbio_stripe_page(rbio, faila, i);
-					SetPageUptodate(page);
+					sector = rbio_stripe_sector(rbio, faila, i);
+					sector->uptodate = 1;
 				}
 				if (failb != -1) {
-					page = rbio_stripe_page(rbio, failb, i);
-					SetPageUptodate(page);
+					sector = rbio_stripe_sector(rbio, failb, i);
+					sector->uptodate = 1;
 				}
 			}
 		}
-- 
2.35.1



* [PATCH 09/16] btrfs: make finish_rmw() subpage compatible
  2022-04-01 11:23 [PATCH 00/16] btrfs: add subpage support for RAID56 Qu Wenruo
                   ` (7 preceding siblings ...)
  2022-04-01 11:23 ` [PATCH 08/16] btrfs: make __raid_recover_endio_io() subpage compatible Qu Wenruo
@ 2022-04-01 11:23 ` Qu Wenruo
  2022-04-11 17:00   ` Christoph Hellwig
  2022-04-01 11:23 ` [PATCH 10/16] btrfs: open-code rbio_stripe_page_index() Qu Wenruo
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 33+ messages in thread
From: Qu Wenruo @ 2022-04-01 11:23 UTC (permalink / raw)
  To: linux-btrfs

The trick involved is how we call kunmap_local(): now the pointers[]
array only stores addresses with @pgoff added, so they can not be used
directly for kunmap_local().

For the cleanup, we have to re-grab all the sectors, subtract the
@pgoff from pointers[], and then call kunmap_local().
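
Conceptually the cleanup loop becomes something like the following
(a sketch of the scheme described above, not the exact diff):

  for (stripe = 0; stripe < rbio->real_stripes; stripe++) {
          struct sector_ptr *sector;

          /* Re-grab the same sector that was mapped for this stripe */
          if (stripe < rbio->nr_data)
                  sector = sector_in_rbio(rbio, stripe, sectornr, 0);
          else
                  sector = rbio_stripe_sector(rbio, stripe, sectornr);

          /* Subtract pgoff to get back the kmap_local_page() address */
          kunmap_local(pointers[stripe] - sector->pgoff);
  }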

With this function converted to subpage compatible sector interfaces,
the following helper functions can be removed:

- rbio_stripe_page()
- rbio_pstripe_page()
- rbio_qstripe_page()
- page_in_rbio()

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/raid56.c | 127 ++++++++++++++++++----------------------------
 1 file changed, 49 insertions(+), 78 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 1fbeaf2909b4..61df2b3636d2 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -683,39 +683,26 @@ static struct sector_ptr *rbio_stripe_sector(const struct btrfs_raid_bio *rbio,
 							      sector_nr)];
 }
 
-static int rbio_stripe_page_index(struct btrfs_raid_bio *rbio, int stripe,
-				  int index)
-{
-	return stripe * rbio->stripe_npages + index;
-}
-
-/*
- * these are just the pages from the rbio array, not from anything
- * the FS sent down to us
- */
-static struct page *rbio_stripe_page(struct btrfs_raid_bio *rbio, int stripe,
-				     int index)
+/* Helper to grab a sector inside P stripe */
+static struct sector_ptr *rbio_pstripe_sector(const struct btrfs_raid_bio *rbio,
+					      unsigned int sector_nr)
 {
-	return rbio->stripe_pages[rbio_stripe_page_index(rbio, stripe, index)];
+	return rbio_stripe_sector(rbio, rbio->nr_data, sector_nr);
 }
 
-/*
- * helper to index into the pstripe
- */
-static struct page *rbio_pstripe_page(struct btrfs_raid_bio *rbio, int index)
+/* Helper to grab a sector inside Q stripe, return NULL if not RAID6 */
+static struct sector_ptr *rbio_qstripe_sector(const struct btrfs_raid_bio *rbio,
+					      unsigned int sector_nr)
 {
-	return rbio_stripe_page(rbio, rbio->nr_data, index);
+	if (rbio->nr_data + 1 == rbio->real_stripes)
+		return NULL;
+	return rbio_stripe_sector(rbio, rbio->nr_data + 1, sector_nr);
 }
 
-/*
- * helper to index into the qstripe, returns null
- * if there is no qstripe
- */
-static struct page *rbio_qstripe_page(struct btrfs_raid_bio *rbio, int index)
+static int rbio_stripe_page_index(struct btrfs_raid_bio *rbio, int stripe,
+				  int index)
 {
-	if (rbio->nr_data + 1 == rbio->real_stripes)
-		return NULL;
-	return rbio_stripe_page(rbio, rbio->nr_data + 1, index);
+	return stripe * rbio->stripe_npages + index;
 }
 
 /*
@@ -1032,40 +1019,6 @@ static struct sector_ptr *sector_in_rbio(struct btrfs_raid_bio *rbio,
 	return &rbio->stripe_sectors[index];
 }
 
-/*
- * the read/modify/write code wants to use the original bio for
- * any pages it included, and then use the rbio for everything
- * else.  This function decides if a given index (stripe number)
- * and page number in that stripe fall inside the original bio
- * or the rbio.
- *
- * if you set bio_list_only, you'll get a NULL back for any ranges
- * that are outside the bio_list
- *
- * This doesn't take any refs on anything, you get a bare page pointer
- * and the caller must bump refs as required.
- *
- * You must call index_rbio_pages once before you can trust
- * the answers from this function.
- */
-static struct page *page_in_rbio(struct btrfs_raid_bio *rbio,
-				 int index, int pagenr, int bio_list_only)
-{
-	int chunk_page;
-	struct page *p = NULL;
-
-	chunk_page = index * (rbio->stripe_len >> PAGE_SHIFT) + pagenr;
-
-	spin_lock_irq(&rbio->bio_list_lock);
-	p = rbio->bio_pages[chunk_page];
-	spin_unlock_irq(&rbio->bio_list_lock);
-
-	if (p || bio_list_only)
-		return p;
-
-	return rbio->stripe_pages[chunk_page];
-}
-
 /*
  * allocation and initial setup for the btrfs_raid_bio.  Note
  * this does not allocate any pages for rbio->pages.
@@ -1355,6 +1308,7 @@ static void index_rbio_pages(struct btrfs_raid_bio *rbio)
 static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
 {
 	struct btrfs_io_context *bioc = rbio->bioc;
+	const u32 sectorsize = bioc->fs_info->sectorsize;
 	void **pointers = rbio->finish_pointers;
 	int nr_data = rbio->nr_data;
 	int stripe;
@@ -1403,37 +1357,54 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
 		clear_bit(RBIO_CACHE_READY_BIT, &rbio->flags);
 
 	for (sectornr = 0; sectornr < rbio->stripe_nsectors; sectornr++) {
-		struct page *p;
-		/* first collect one page from each data stripe */
+		struct sector_ptr *sector;
+
+		/* First collect one sector from each data stripe */
 		for (stripe = 0; stripe < nr_data; stripe++) {
-			p = page_in_rbio(rbio, stripe, sectornr, 0);
-			pointers[stripe] = kmap_local_page(p);
+			sector = sector_in_rbio(rbio, stripe, sectornr, 0);
+			pointers[stripe] = kmap_local_page(sector->page) +
+					   sector->pgoff;
 		}
 
-		/* then add the parity stripe */
-		p = rbio_pstripe_page(rbio, sectornr);
-		SetPageUptodate(p);
-		pointers[stripe++] = kmap_local_page(p);
+		/* Then add the parity stripe */
+		sector = rbio_pstripe_sector(rbio, sectornr);
+		sector->uptodate = 1;
+		pointers[stripe++] = kmap_local_page(sector->page) + sector->pgoff;
 
 		if (has_qstripe) {
-
 			/*
-			 * raid6, add the qstripe and call the
+			 * RAID6, add the qstripe and call the
 			 * library function to fill in our p/q
 			 */
-			p = rbio_qstripe_page(rbio, sectornr);
-			SetPageUptodate(p);
-			pointers[stripe++] = kmap_local_page(p);
+			sector = rbio_qstripe_sector(rbio, sectornr);
+			sector->uptodate = 1;
+			pointers[stripe++] = kmap_local_page(sector->page) +
+					     sector->pgoff;
 
-			raid6_call.gen_syndrome(rbio->real_stripes, PAGE_SIZE,
+			raid6_call.gen_syndrome(rbio->real_stripes, sectorsize,
 						pointers);
 		} else {
 			/* raid5 */
-			copy_page(pointers[nr_data], pointers[0]);
-			run_xor(pointers + 1, nr_data - 1, PAGE_SIZE);
+			memcpy(pointers[nr_data], pointers[0], sectorsize);
+			run_xor(pointers + 1, nr_data - 1, sectorsize);
+		}
+
+		/*
+		 * Unmap the pointers, need to re-grab those sectors to get
+		 * pgoff and calculate the original mapping address.
+		 * So we need to re-do the data and P/Q iteration.
+		 */
+		for (stripe = 0; stripe < nr_data; stripe++) {
+			sector = sector_in_rbio(rbio, stripe, sectornr, 0);
+			kunmap_local(pointers[stripe] - sector->pgoff);
+		}
+		sector = rbio_pstripe_sector(rbio, sectornr);
+		kunmap_local(pointers[stripe] - sector->pgoff);
+		stripe++;
+		if (has_qstripe) {
+			sector = rbio_qstripe_sector(rbio, sectornr);
+			kunmap_local(pointers[stripe] - sector->pgoff);
 		}
-		for (stripe = stripe - 1; stripe >= 0; stripe--)
-			kunmap_local(pointers[stripe]);
 	}
 
 	/*
-- 
2.35.1



* [PATCH 10/16] btrfs: open-code rbio_stripe_page_index()
  2022-04-01 11:23 [PATCH 00/16] btrfs: add subpage support for RAID56 Qu Wenruo
                   ` (8 preceding siblings ...)
  2022-04-01 11:23 ` [PATCH 09/16] btrfs: make finish_rmw() subpage compatible Qu Wenruo
@ 2022-04-01 11:23 ` Qu Wenruo
  2022-04-08 17:28   ` David Sterba
  2022-04-01 11:23 ` [PATCH 11/16] btrfs: make raid56_add_scrub_pages() subpage compatible Qu Wenruo
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 33+ messages in thread
From: Qu Wenruo @ 2022-04-01 11:23 UTC (permalink / raw)
  To: linux-btrfs

There is only one caller of that helper now, and we're definitely fine
to open-code it.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/raid56.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 61df2b3636d2..998c30867554 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -699,12 +699,6 @@ static struct sector_ptr *rbio_qstripe_sector(const struct btrfs_raid_bio *rbio,
 	return rbio_stripe_sector(rbio, rbio->nr_data + 1, sector_nr);
 }
 
-static int rbio_stripe_page_index(struct btrfs_raid_bio *rbio, int stripe,
-				  int index)
-{
-	return stripe * rbio->stripe_npages + index;
-}
-
 /*
  * The first stripe in the table for a logical address
  * has the lock.  rbios are added in one of three ways:
@@ -1131,9 +1125,7 @@ static int alloc_rbio_parity_pages(struct btrfs_raid_bio *rbio)
 	int i;
 	struct page *page;
 
-	i = rbio_stripe_page_index(rbio, rbio->nr_data, 0);
-
-	for (; i < rbio->nr_pages; i++) {
+	for (i = rbio->nr_data * rbio->stripe_npages; i < rbio->nr_pages; i++) {
 		if (rbio->stripe_pages[i])
 			continue;
 		page = alloc_page(GFP_NOFS);
-- 
2.35.1



* [PATCH 11/16] btrfs: make raid56_add_scrub_pages() subpage compatible
  2022-04-01 11:23 [PATCH 00/16] btrfs: add subpage support for RAID56 Qu Wenruo
                   ` (9 preceding siblings ...)
  2022-04-01 11:23 ` [PATCH 10/16] btrfs: open-code rbio_stripe_page_index() Qu Wenruo
@ 2022-04-01 11:23 ` Qu Wenruo
  2022-04-01 11:23 ` [PATCH 12/16] btrfs: remove btrfs_raid_bio::bio_pages array Qu Wenruo
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 33+ messages in thread
From: Qu Wenruo @ 2022-04-01 11:23 UTC (permalink / raw)
  To: linux-btrfs

This requires one extra parameter, @pgoff, for the function.

In the current code base scrub still works with one page per sector,
thus the new parameter will always be 0.

Taking full advantage of it will need the upcoming subpage scrub
optimization code.
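
A sketch of what a future subpage-aware caller could look like
(hypothetical: it assumes scrub_sector gains a pgoff member, which this
series does not add; the current callers always pass 0):

	raid56_add_scrub_pages(rbio, sector->page, sector->pgoff,
			       sector->logical);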

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/raid56.c | 10 ++++++----
 fs/btrfs/raid56.h |  2 +-
 fs/btrfs/scrub.c  |  6 +++++-
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 998c30867554..74e246f4758e 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -2418,17 +2418,19 @@ struct btrfs_raid_bio *raid56_parity_alloc_scrub_rbio(struct bio *bio,
 
 /* Used for both parity scrub and missing. */
 void raid56_add_scrub_pages(struct btrfs_raid_bio *rbio, struct page *page,
-			    u64 logical)
+			    unsigned int pgoff, u64 logical)
 {
+	const u32 sectorsize = rbio->bioc->fs_info->sectorsize;
 	int stripe_offset;
 	int index;
 
 	ASSERT(logical >= rbio->bioc->raid_map[0]);
-	ASSERT(logical + PAGE_SIZE <= rbio->bioc->raid_map[0] +
+	ASSERT(logical + sectorsize <= rbio->bioc->raid_map[0] +
 				rbio->stripe_len * rbio->nr_data);
 	stripe_offset = (int)(logical - rbio->bioc->raid_map[0]);
-	index = stripe_offset >> PAGE_SHIFT;
-	rbio->bio_pages[index] = page;
+	index = stripe_offset / sectorsize;
+	rbio->bio_sectors[index].page = page;
+	rbio->bio_sectors[index].pgoff = pgoff;
 }
 
 /*
diff --git a/fs/btrfs/raid56.h b/fs/btrfs/raid56.h
index 72c00fc284b5..c52cf690bd9e 100644
--- a/fs/btrfs/raid56.h
+++ b/fs/btrfs/raid56.h
@@ -36,7 +36,7 @@ int raid56_parity_write(struct bio *bio, struct btrfs_io_context *bioc,
 			u64 stripe_len);
 
 void raid56_add_scrub_pages(struct btrfs_raid_bio *rbio, struct page *page,
-			    u64 logical);
+			    unsigned int pgoff, u64 logical);
 
 struct btrfs_raid_bio *raid56_parity_alloc_scrub_rbio(struct bio *bio,
 				struct btrfs_io_context *bioc, u64 stripe_len,
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 3f073d80755a..66ca71a4e570 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2222,7 +2222,11 @@ static void scrub_missing_raid56_pages(struct scrub_block *sblock)
 	for (i = 0; i < sblock->sector_count; i++) {
 		struct scrub_sector *sector = sblock->sectors[i];
 
-		raid56_add_scrub_pages(rbio, sector->page, sector->logical);
+		/*
+		 * For now, our scrub is still one page per-sector, so pgoff
+		 * is always 0.
+		 */
+		raid56_add_scrub_pages(rbio, sector->page, 0, sector->logical);
 	}
 
 	btrfs_init_work(&sblock->work, scrub_missing_raid56_worker, NULL, NULL);
-- 
2.35.1



* [PATCH 12/16] btrfs: remove btrfs_raid_bio::bio_pages array
  2022-04-01 11:23 [PATCH 00/16] btrfs: add subpage support for RAID56 Qu Wenruo
                   ` (10 preceding siblings ...)
  2022-04-01 11:23 ` [PATCH 11/16] btrfs: make raid56_add_scrub_pages() subpage compatible Qu Wenruo
@ 2022-04-01 11:23 ` Qu Wenruo
  2022-04-01 11:23 ` [PATCH 13/16] btrfs: make set_bio_pages_uptodate() subpage compatible Qu Wenruo
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 33+ messages in thread
From: Qu Wenruo @ 2022-04-01 11:23 UTC (permalink / raw)
  To: linux-btrfs

The functionality is now completely replaced by the new bio_sectors
member, so it's time to remove the old member.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/raid56.c | 43 +++----------------------------------------
 1 file changed, 3 insertions(+), 40 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 74e246f4758e..b45f63b40131 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -179,9 +179,6 @@ struct btrfs_raid_bio {
 	/* Pointers to the sectors in the bio_list, for faster lookup */
 	struct sector_ptr *bio_sectors;
 
-	/* Pointers to the pages in the bio_list, for faster lookup */
-	struct page **bio_pages;
-
 	/*
 	 * For subpage support, we need to map each sector to above
 	 * stripe_pages.
@@ -263,7 +260,7 @@ int btrfs_alloc_stripe_hash_table(struct btrfs_fs_info *info)
 
 /*
  * caching an rbio means to copy anything from the
- * bio_pages array into the stripe_pages array.  We
+ * bio_sectors array into the stripe_pages array.  We
  * use the page uptodate bit in the stripe cache array
  * to indicate if it has valid data
  *
@@ -279,18 +276,6 @@ static void cache_rbio_pages(struct btrfs_raid_bio *rbio)
 	if (ret)
 		return;
 
-	for (i = 0; i < rbio->nr_pages; i++) {
-		if (!rbio->bio_pages[i])
-			continue;
-
-		copy_highpage(rbio->stripe_pages[i], rbio->bio_pages[i]);
-		SetPageUptodate(rbio->stripe_pages[i]);
-	}
-
-	/*
-	 * TODO: This work is duplicated with above loop, should remove above
-	 * loop when the switch is fully done.
-	 */
 	for (i = 0; i < rbio->nr_sectors; i++) {
 		/* Some range not covered by bio (partial write), skip it */
 		if (!rbio->bio_sectors[i].page)
@@ -1067,7 +1052,7 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
 	atomic_set(&rbio->stripes_pending, 0);
 
 	/*
-	 * the stripe_pages, bio_pages, etc arrays point to the extra
+	 * the stripe_pages, bio_sectors, etc arrays point to the extra
 	 * memory we allocated past the end of the rbio
 	 */
 	p = rbio + 1;
@@ -1076,7 +1061,6 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
 		p = (unsigned char *)p + sizeof(*(ptr)) * (count);	\
 	} while (0)
 	CONSUME_ALLOC(rbio->stripe_pages, num_pages);
-	CONSUME_ALLOC(rbio->bio_pages, num_pages);
 	CONSUME_ALLOC(rbio->bio_sectors, num_sectors);
 	CONSUME_ALLOC(rbio->stripe_sectors, num_sectors);
 	CONSUME_ALLOC(rbio->finish_pointers, real_stripes);
@@ -1250,7 +1234,7 @@ static void index_one_bio(struct btrfs_raid_bio *rbio, struct bio *bio)
 }
 
 /*
- * helper function to walk our bio list and populate the bio_pages array with
+ * Helper function to walk our bio list and populate the bio_sectors array with
  * the result.  This seems expensive, but it is faster than constantly
  * searching through the bio list as we setup the IO in finish_rmw or stripe
  * reconstruction.
@@ -1260,29 +1244,8 @@ static void index_one_bio(struct btrfs_raid_bio *rbio, struct bio *bio)
 static void index_rbio_pages(struct btrfs_raid_bio *rbio)
 {
 	struct bio *bio;
-	u64 start;
-	unsigned long stripe_offset;
-	unsigned long page_index;
 
 	spin_lock_irq(&rbio->bio_list_lock);
-	bio_list_for_each(bio, &rbio->bio_list) {
-		struct bio_vec bvec;
-		struct bvec_iter iter;
-		int i = 0;
-
-		start = bio->bi_iter.bi_sector << 9;
-		stripe_offset = start - rbio->bioc->raid_map[0];
-		page_index = stripe_offset >> PAGE_SHIFT;
-
-		if (bio_flagged(bio, BIO_CLONED))
-			bio->bi_iter = btrfs_bio(bio)->iter;
-
-		bio_for_each_segment(bvec, bio, iter) {
-			rbio->bio_pages[page_index + i] = bvec.bv_page;
-			i++;
-		}
-	}
-	/* TODO: This loop will replace above loop when the full switch is done */
 	bio_list_for_each(bio, &rbio->bio_list)
 		index_one_bio(rbio, bio);
 
-- 
2.35.1



* [PATCH 13/16] btrfs: make set_bio_pages_uptodate() subpage compatible
  2022-04-01 11:23 [PATCH 00/16] btrfs: add subpage support for RAID56 Qu Wenruo
                   ` (11 preceding siblings ...)
  2022-04-01 11:23 ` [PATCH 12/16] btrfs: remove btrfs_raid_bio::bio_pages array Qu Wenruo
@ 2022-04-01 11:23 ` Qu Wenruo
  2022-04-01 11:23 ` [PATCH 14/16] btrfs: make steal_rbio() " Qu Wenruo
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 33+ messages in thread
From: Qu Wenruo @ 2022-04-01 11:23 UTC (permalink / raw)
  To: linux-btrfs

Unlike the previous code, we can no longer set PageUptodate directly for
stripe pages.

Instead we have to iterate through all the involved sectors and set the
uptodate flag there.

Thus this patch introduces a new helper, find_stripe_sector(), to locate
the sector for a given page and offset.
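
For example, assuming a 64K page size and 4K sector size, one full-page
bvec covers 16 sectors; each of them needs its own find_stripe_sector()
lookup and its own uptodate flag set.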

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/raid56.c | 45 +++++++++++++++++++++++++++++++++++++++------
 1 file changed, 39 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index b45f63b40131..b57be9387740 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1522,19 +1522,52 @@ static int fail_bio_stripe(struct btrfs_raid_bio *rbio,
 	return fail_rbio_index(rbio, failed);
 }
 
+/*
+ * For subpage case, we can no longer set page Uptodate directly for
+ * stripe_pages[], thus we need to locate the sector.
+ */
+static struct sector_ptr *find_stripe_sector(struct btrfs_raid_bio *rbio,
+					     struct page *page,
+					     unsigned int pgoff)
+{
+	int i;
+
+	for (i = 0; i < rbio->nr_sectors; i++) {
+		struct sector_ptr *sector = &rbio->stripe_sectors[i];
+
+		if (sector->page == page && sector->pgoff == pgoff)
+			return sector;
+	}
+	return NULL;
+}
+
 /*
  * this sets each page in the bio uptodate.  It should only be used on private
  * rbio pages, nothing that comes in from the higher layers
  */
-static void set_bio_pages_uptodate(struct bio *bio)
+static void set_bio_pages_uptodate(struct btrfs_raid_bio *rbio,
+				   struct bio *bio)
 {
+	const u32 sectorsize = rbio->bioc->fs_info->sectorsize;
 	struct bio_vec *bvec;
 	struct bvec_iter_all iter_all;
 
 	ASSERT(!bio_flagged(bio, BIO_CLONED));
 
-	bio_for_each_segment_all(bvec, bio, iter_all)
-		SetPageUptodate(bvec->bv_page);
+	bio_for_each_segment_all(bvec, bio, iter_all) {
+		struct sector_ptr *sector;
+		int pgoff;
+
+		for (pgoff = bvec->bv_offset;
+		     pgoff - bvec->bv_offset < bvec->bv_len;
+		     pgoff += sectorsize) {
+			sector = find_stripe_sector(rbio, bvec->bv_page, pgoff);
+			ASSERT(sector);
+			if (sector)
+				sector->uptodate = 1;
+		}
+
+	}
 }
 
 /*
@@ -1552,7 +1585,7 @@ static void raid_rmw_end_io(struct bio *bio)
 	if (bio->bi_status)
 		fail_bio_stripe(rbio, bio);
 	else
-		set_bio_pages_uptodate(bio);
+		set_bio_pages_uptodate(rbio, bio);
 
 	bio_put(bio);
 
@@ -2112,7 +2145,7 @@ static void raid_recover_end_io(struct bio *bio)
 	if (bio->bi_status)
 		fail_bio_stripe(rbio, bio);
 	else
-		set_bio_pages_uptodate(bio);
+		set_bio_pages_uptodate(rbio, bio);
 	bio_put(bio);
 
 	if (!atomic_dec_and_test(&rbio->stripes_pending))
@@ -2672,7 +2705,7 @@ static void raid56_parity_scrub_end_io(struct bio *bio)
 	if (bio->bi_status)
 		fail_bio_stripe(rbio, bio);
 	else
-		set_bio_pages_uptodate(bio);
+		set_bio_pages_uptodate(rbio, bio);
 
 	bio_put(bio);
 
-- 
2.35.1



* [PATCH 14/16] btrfs: make steal_rbio() subpage compatible
  2022-04-01 11:23 [PATCH 00/16] btrfs: add subpage support for RAID56 Qu Wenruo
                   ` (12 preceding siblings ...)
  2022-04-01 11:23 ` [PATCH 13/16] btrfs: make set_bio_pages_uptodate() subpage compatible Qu Wenruo
@ 2022-04-01 11:23 ` Qu Wenruo
  2022-04-01 11:23 ` [PATCH 15/16] btrfs: make alloc_rbio_essential_pages() " Qu Wenruo
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 33+ messages in thread
From: Qu Wenruo @ 2022-04-01 11:23 UTC (permalink / raw)
  To: linux-btrfs

Function steal_rbio() takes all the uptodate pages from the source rbio
and puts them into the destination rbio.

With the new stripe_sectors[] array, we also need to do two extra things:

- Check sector::uptodate to make sure the full page is uptodate
  For subpage cases we no longer use the PageUptodate flag to indicate
  whether a page is uptodate.

  Instead we need to check all the sectors belonging to the page to be
  sure whether the full page is uptodate.

  So here we introduce a new helper, full_page_sectors_uptodate(), to do
  the check.

- Update rbio::stripe_sectors[] to use the new page pointer
  We only need to change the page pointer; nothing else needs to change.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/raid56.c | 27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index b57be9387740..01fe5414e013 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -310,6 +310,25 @@ static int rbio_bucket(struct btrfs_raid_bio *rbio)
 	return hash_64(num >> 16, BTRFS_STRIPE_HASH_TABLE_BITS);
 }
 
+static bool full_page_sectors_uptodate(struct btrfs_raid_bio *rbio,
+				       unsigned int page_nr)
+{
+	const u32 sectorsize = rbio->bioc->fs_info->sectorsize;
+	const u32 sectors_per_page = PAGE_SIZE / sectorsize;
+	bool uptodate = true;
+	int i;
+
+	ASSERT(page_nr < rbio->nr_pages);
+
+	for (i = sectors_per_page * page_nr;
+	     i < sectors_per_page * page_nr + sectors_per_page; i++) {
+		if (!rbio->stripe_sectors[i].uptodate) {
+			uptodate = false;
+			break;
+		}
+	}
+	return uptodate;
+}
 /*
  * Update the stripe_sectors[] array to use correct page and pgoff
  *
@@ -331,8 +350,11 @@ static void index_stripe_sectors(struct btrfs_raid_bio *rbio)
 }
 
 /*
- * stealing an rbio means taking all the uptodate pages from the stripe
+ * Stealing an rbio means taking all the uptodate pages from the stripe
  * array in the source rbio and putting them into the destination rbio
+ *
+ * This will also update the involved stripe_sectors[] which are referring to
+ * the old pages.
  */
 static void steal_rbio(struct btrfs_raid_bio *src, struct btrfs_raid_bio *dest)
 {
@@ -345,9 +367,8 @@ static void steal_rbio(struct btrfs_raid_bio *src, struct btrfs_raid_bio *dest)
 
 	for (i = 0; i < dest->nr_pages; i++) {
 		s = src->stripe_pages[i];
-		if (!s || !PageUptodate(s)) {
+		if (!s || !full_page_sectors_uptodate(src, i))
 			continue;
-		}
 
 		d = dest->stripe_pages[i];
 		if (d)
-- 
2.35.1



* [PATCH 15/16] btrfs: make alloc_rbio_essential_pages() subpage compatible
  2022-04-01 11:23 [PATCH 00/16] btrfs: add subpage support for RAID56 Qu Wenruo
                   ` (13 preceding siblings ...)
  2022-04-01 11:23 ` [PATCH 14/16] btrfs: make steal_rbio() " Qu Wenruo
@ 2022-04-01 11:23 ` Qu Wenruo
  2022-04-01 11:23 ` [PATCH 16/16] btrfs: enable subpage support for RAID56 Qu Wenruo
  2022-04-08 18:16 ` [PATCH 00/16] btrfs: add " David Sterba
  16 siblings, 0 replies; 33+ messages in thread
From: Qu Wenruo @ 2022-04-01 11:23 UTC (permalink / raw)
  To: linux-btrfs

The only incompatible part is the bitmap iteration: the bitmap now
covers rbio::stripe_nsectors entries, not the old rbio::stripe_npages.
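
As a worked example of the page index math in the converted loop
(assumed numbers: 64K page size, 4K sector size, 64K stripe length, so
stripe_nsectors is 16 and each stripe occupies a single page): stripe 1,
sectornr 3 gives index = (1 * 16 + 3) * 4096 >> 16 = 1, i.e. the page
backing stripe 1.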

Since we're here, also slightly improve the function by:

- Rename @i to @stripe
- Rename @bit to @sectornr
- Move @page and @index into the inner loop

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/raid56.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 01fe5414e013..7bfe8e8d8325 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -2456,14 +2456,16 @@ void raid56_add_scrub_pages(struct btrfs_raid_bio *rbio, struct page *page,
  */
 static int alloc_rbio_essential_pages(struct btrfs_raid_bio *rbio)
 {
-	int i;
-	int bit;
-	int index;
-	struct page *page;
+	const u32 sectorsize = rbio->bioc->fs_info->sectorsize;
+	int stripe;
+	int sectornr;
+
+	for_each_set_bit(sectornr, rbio->dbitmap, rbio->stripe_nsectors) {
+		for (stripe = 0; stripe < rbio->real_stripes; stripe++) {
+			struct page *page;
+			int index = (stripe * rbio->stripe_nsectors + sectornr) *
+				    sectorsize >> PAGE_SHIFT;
 
-	for_each_set_bit(bit, rbio->dbitmap, rbio->stripe_npages) {
-		for (i = 0; i < rbio->real_stripes; i++) {
-			index = i * rbio->stripe_npages + bit;
 			if (rbio->stripe_pages[index])
 				continue;
 
-- 
2.35.1



* [PATCH 16/16] btrfs: enable subpage support for RAID56
  2022-04-01 11:23 [PATCH 00/16] btrfs: add subpage support for RAID56 Qu Wenruo
                   ` (14 preceding siblings ...)
  2022-04-01 11:23 ` [PATCH 15/16] btrfs: make alloc_rbio_essential_pages() " Qu Wenruo
@ 2022-04-01 11:23 ` Qu Wenruo
  2022-04-08 18:16 ` [PATCH 00/16] btrfs: add " David Sterba
  16 siblings, 0 replies; 33+ messages in thread
From: Qu Wenruo @ 2022-04-01 11:23 UTC (permalink / raw)
  To: linux-btrfs

Now that the btrfs RAID56 infrastructure has migrated to the sector_ptr
interface, it should be safe to enable subpage support for btrfs RAID56.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 8 --------
 fs/btrfs/raid56.c  | 6 ------
 fs/btrfs/volumes.c | 7 -------
 3 files changed, 21 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d456f426924c..2c66ea3485e8 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3678,14 +3678,6 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 		btrfs_warn(fs_info,
 		"read-write for sector size %u with page size %lu is experimental",
 			   sectorsize, PAGE_SIZE);
-		if (btrfs_super_incompat_flags(fs_info->super_copy) &
-			BTRFS_FEATURE_INCOMPAT_RAID56) {
-			btrfs_err(fs_info,
-		"RAID56 is not yet supported for sector size %u with page size %lu",
-				sectorsize, PAGE_SIZE);
-			err = -EINVAL;
-			goto fail_alloc;
-		}
 		subpage_info = kzalloc(sizeof(*subpage_info), GFP_KERNEL);
 		if (!subpage_info)
 			goto fail_alloc;
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 7bfe8e8d8325..3dff0e009404 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1172,9 +1172,6 @@ static int rbio_add_io_sector(struct btrfs_raid_bio *rbio,
 	ASSERT(sector_nr >= 0 && sector_nr < rbio->stripe_nsectors);
 	ASSERT(sector->page);
 
-	/* TODO: We don't yet support subpage, thus pgoff should always be 0 */
-	ASSERT(sector->pgoff == 0);
-
 	stripe = &rbio->bioc->stripes[stripe_nr];
 	disk_start = stripe->physical + sector_nr * sectorsize;
 
@@ -2419,9 +2416,6 @@ struct btrfs_raid_bio *raid56_parity_alloc_scrub_rbio(struct bio *bio,
 	}
 	ASSERT(i < rbio->real_stripes);
 
-	/* Now we just support the sectorsize equals to page size */
-	ASSERT(fs_info->sectorsize == PAGE_SIZE);
-	ASSERT(rbio->stripe_npages == stripe_nsectors);
 	bitmap_copy(rbio->dbitmap, dbitmap, stripe_nsectors);
 
 	/*
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 3fd17e87815a..82c7e4e59c29 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4063,13 +4063,6 @@ static inline int validate_convert_profile(struct btrfs_fs_info *fs_info,
 	if (!(bargs->flags & BTRFS_BALANCE_ARGS_CONVERT))
 		return true;
 
-	if (fs_info->sectorsize < PAGE_SIZE &&
-		bargs->target & BTRFS_BLOCK_GROUP_RAID56_MASK) {
-		btrfs_err(fs_info,
-		"RAID56 is not yet supported for sectorsize %u with page size %lu",
-			  fs_info->sectorsize, PAGE_SIZE);
-		return false;
-	}
 	/* Profile is valid and does not have bits outside of the allowed set */
 	if (alloc_profile_is_valid(bargs->target, 1) &&
 	    (bargs->target & ~allowed) == 0)
-- 
2.35.1



* Re: [PATCH 04/16] btrfs: introduce btrfs_raid_bio::stripe_sectors
  2022-04-01 11:23 ` [PATCH 04/16] btrfs: introduce btrfs_raid_bio::stripe_sectors Qu Wenruo
@ 2022-04-08 16:40   ` David Sterba
  2022-04-08 22:55     ` Qu Wenruo
  0 siblings, 1 reply; 33+ messages in thread
From: David Sterba @ 2022-04-08 16:40 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Apr 01, 2022 at 07:23:19PM +0800, Qu Wenruo wrote:
> The new member is an array of sector_ptr pointers, they will represent
> all sectors inside a full stripe (including P/Q).
> 
> They co-operate with btrfs_raid_bio::stripe_pages:
> 
> stripe_pages:   | Page 0, range [0, 64K)   | Page 1 ...
> stripe_sectors: |  |  | ...             |  |
>                   |  |                    \- sector 15, page 0, pgoff=60K
>                   |  \- sector 1, page 0, pgoff=4K
>                   \---- sector 0, page 0, pfoff=0
> 
> With such structure, we can represent subpage sectors without using
> extra pages.
> 
> Here we introduce a new helper, index_stripe_sectors(), to update
> stripe_sectors[] to point to correct page and pgoff.
> 
> So every time rbio::stripe_pages[] pointer gets updated, the new helper
> should be called.
> 
> The following functions have to call the new helper:
> 
> - steal_rbio()
> - alloc_rbio_pages()
> - alloc_rbio_parity_pages()
> - alloc_rbio_essential_pages()
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/raid56.c | 62 +++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 54 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
> index 1bad7d3a0331..8cfe00db79c9 100644
> --- a/fs/btrfs/raid56.c
> +++ b/fs/btrfs/raid56.c
> @@ -52,6 +52,16 @@ struct btrfs_stripe_hash_table {
>  	struct btrfs_stripe_hash table[];
>  };
>  
> +/*
> + * A bvec like structure to present a sector inside a page.
> + *
> + * Unlike bvec we don't need bvlen, as it's fixed to sectorsize.
> + */
> +struct sector_ptr {
> +	struct page *page;
> +	unsigned int pgoff;
> +} __attribute__ ((__packed__));

Packed does not make much sense here: it's an in-memory structure so
there are no real savings, and besides, packed structures may force
unaligned access.

> +
>  enum btrfs_rbio_ops {
>  	BTRFS_RBIO_WRITE,
>  	BTRFS_RBIO_READ_REBUILD,
> @@ -154,25 +164,27 @@ struct btrfs_raid_bio {
>  
>  	atomic_t error;
>  	/*
> -	 * these are two arrays of pointers.  We allocate the
> +	 * These are two arrays of pointers.  We allocate the

Please don't fix comments that you don't move or change; this is an
unrelated change.

>  	 * rbio big enough to hold them both and setup their
>  	 * locations when the rbio is allocated
>  	 */
>  
> -	/* pointers to pages that we allocated for
> +	/*
> +	 * Pointers to pages that we allocated for

Same

>  	 * reading/writing stripes directly from the disk (including P/Q)
>  	 */
>  	struct page **stripe_pages;
>  
> -	/*
> -	 * pointers to the pages in the bio_list.  Stored
> -	 * here for faster lookup
> -	 */
> +	/* Pointers to the pages in the bio_list, for faster lookup */
>  	struct page **bio_pages;
>  
>  	/*
> -	 * bitmap to record which horizontal stripe has data
> +	 * For subpage support, we need to map each sector to above
> +	 * stripe_pages.
>  	 */
> +	struct sector_ptr *stripe_sectors;
> +
> +	/* Bitmap to record which horizontal stripe has data */

So this one got moved and it's fine to update.

>  	unsigned long *dbitmap;
>  
>  	/* allocated with real_stripes-many pointers for finish_*() calls */


* Re: [PATCH 05/16] btrfs: introduce btrfs_raid_bio::bio_sectors
  2022-04-01 11:23 ` [PATCH 05/16] btrfs: introduce btrfs_raid_bio::bio_sectors Qu Wenruo
@ 2022-04-08 16:46   ` David Sterba
  2022-04-08 22:58     ` Qu Wenruo
  0 siblings, 1 reply; 33+ messages in thread
From: David Sterba @ 2022-04-08 16:46 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Apr 01, 2022 at 07:23:20PM +0800, Qu Wenruo wrote:
> This new member is going to fully replace bio_pages in the future, but
> for now let's keep them co-exist, until the full switch is done.
> 
> Currently cache_rbio_pages() and index_rbio_pages() will also populate
> the new array.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/raid56.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 53 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
> index 8cfe00db79c9..f23fd282d298 100644
> --- a/fs/btrfs/raid56.c
> +++ b/fs/btrfs/raid56.c
> @@ -60,6 +60,7 @@ struct btrfs_stripe_hash_table {
>  struct sector_ptr {
>  	struct page *page;
>  	unsigned int pgoff;
> +	unsigned int uptodate:1;
>  } __attribute__ ((__packed__));

Even with packed this does not help, as a full int is allocated for the
single bit and it requires bit operations to set/clear. Up to 4 bools
fit into one int and using them is better.
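
For reference, the bool-based layout suggested here would be (my sketch
of the suggestion, not code from the series):

	struct sector_ptr {
		struct page *page;
		unsigned int pgoff;
		bool uptodate;	/* one byte, no bit operations needed */
	};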

>  
>  enum btrfs_rbio_ops {
> @@ -175,6 +176,9 @@ struct btrfs_raid_bio {
>  	 */
>  	struct page **stripe_pages;
>  
> +	/* Pointers to the sectors in the bio_list, for faster lookup */
> +	struct sector_ptr *bio_sectors;
> +
>  	/* Pointers to the pages in the bio_list, for faster lookup */
>  	struct page **bio_pages;
>  
> @@ -282,6 +286,24 @@ static void cache_rbio_pages(struct btrfs_raid_bio *rbio)
>  		copy_highpage(rbio->stripe_pages[i], rbio->bio_pages[i]);
>  		SetPageUptodate(rbio->stripe_pages[i]);
>  	}
> +
> +	/*
> +	 * TODO: This work is duplicated with above loop, should remove above

I think I told you several times: please don't write TODO notes. A plain
comment explaining that the loop is duplicated is understandable by
itself.

> +	 * loop when the switch is fully done.
> +	 */
> +	for (i = 0; i < rbio->nr_sectors; i++) {
> +		/* Some range not covered by bio (partial write), skip it */
> +		if (!rbio->bio_sectors[i].page)
> +			continue;
> +
> +		ASSERT(rbio->stripe_sectors[i].page);
> +		memcpy_page(rbio->stripe_sectors[i].page,
> +			    rbio->stripe_sectors[i].pgoff,
> +			    rbio->bio_sectors[i].page,
> +			    rbio->bio_sectors[i].pgoff,
> +			    rbio->bioc->fs_info->sectorsize);
> +		rbio->stripe_sectors[i].uptodate = 1;
> +	}
>  	set_bit(RBIO_CACHE_READY_BIT, &rbio->flags);
>  }
>  


* Re: [PATCH 07/16] btrfs: make finish_parity_scrub() subpage compatible
  2022-04-01 11:23 ` [PATCH 07/16] btrfs: make finish_parity_scrub() " Qu Wenruo
@ 2022-04-08 17:04   ` David Sterba
  2022-04-08 22:59     ` Qu Wenruo
  0 siblings, 1 reply; 33+ messages in thread
From: David Sterba @ 2022-04-08 17:04 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Apr 01, 2022 at 07:23:22PM +0800, Qu Wenruo wrote:
> The core is to convert direct page usage into sector_ptr usage, and
> use memcpy() to replace copy_page().
> 
> For pointers usage, we need to convert it to kmap_local_page() +
> sector->pgoff.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/raid56.c | 56 +++++++++++++++++++++++++++--------------------
>  1 file changed, 32 insertions(+), 24 deletions(-)
> 
> diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
> index 07d0b26024dd..bd01e64ea4fc 100644
> --- a/fs/btrfs/raid56.c
> +++ b/fs/btrfs/raid56.c
> @@ -2492,14 +2492,15 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
>  					 int need_check)
>  {
>  	struct btrfs_io_context *bioc = rbio->bioc;
> +	const u32 sectorsize = bioc->fs_info->sectorsize;
>  	void **pointers = rbio->finish_pointers;
>  	unsigned long *pbitmap = rbio->finish_pbitmap;
>  	int nr_data = rbio->nr_data;
>  	int stripe;
>  	int sectornr;
>  	bool has_qstripe;
> -	struct page *p_page = NULL;
> -	struct page *q_page = NULL;
> +	struct sector_ptr p_sector = {};
> +	struct sector_ptr q_sector = {};

This should be { 0 }


* Re: [PATCH 10/16] btrfs: open-code rbio_stripe_page_index()
  2022-04-01 11:23 ` [PATCH 10/16] btrfs: open-code rbio_stripe_page_index() Qu Wenruo
@ 2022-04-08 17:28   ` David Sterba
  0 siblings, 0 replies; 33+ messages in thread
From: David Sterba @ 2022-04-08 17:28 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Apr 01, 2022 at 07:23:25PM +0800, Qu Wenruo wrote:
> There is only one caller for that helper now, and we're definitely fine
> to open-code it.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/raid56.c | 10 +---------
>  1 file changed, 1 insertion(+), 9 deletions(-)
> 
> diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
> index 61df2b3636d2..998c30867554 100644
> --- a/fs/btrfs/raid56.c
> +++ b/fs/btrfs/raid56.c
> @@ -699,12 +699,6 @@ static struct sector_ptr *rbio_qstripe_sector(const struct btrfs_raid_bio *rbio,
>  	return rbio_stripe_sector(rbio, rbio->nr_data + 1, sector_nr);
>  }
>  
> -static int rbio_stripe_page_index(struct btrfs_raid_bio *rbio, int stripe,
> -				  int index)
> -{
> -	return stripe * rbio->stripe_npages + index;
> -}
> -
>  /*
>   * The first stripe in the table for a logical address
>   * has the lock.  rbios are added in one of three ways:
> @@ -1131,9 +1125,7 @@ static int alloc_rbio_parity_pages(struct btrfs_raid_bio *rbio)
>  	int i;
>  	struct page *page;
>  
> -	i = rbio_stripe_page_index(rbio, rbio->nr_data, 0);
> -
> -	for (; i < rbio->nr_pages; i++) {
> +	for (i = rbio->nr_data * rbio->stripe_npages; i < rbio->nr_pages; i++) {

The context has changed in misc-next so I applied it manually, there was
still just one caller so it was straightforward.

>  		if (rbio->stripe_pages[i])
>  			continue;
>  		page = alloc_page(GFP_NOFS);
> -- 
> 2.35.1


* Re: [PATCH 00/16] btrfs: add subpage support for RAID56
  2022-04-01 11:23 [PATCH 00/16] btrfs: add subpage support for RAID56 Qu Wenruo
                   ` (15 preceding siblings ...)
  2022-04-01 11:23 ` [PATCH 16/16] btrfs: enable subpage support for RAID56 Qu Wenruo
@ 2022-04-08 18:16 ` David Sterba
  2022-04-12  7:15   ` Qu Wenruo
  16 siblings, 1 reply; 33+ messages in thread
From: David Sterba @ 2022-04-08 18:16 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Apr 01, 2022 at 07:23:15PM +0800, Qu Wenruo wrote:
> The branch can be fetched from github, based on a slightly older
> misc-next branch:
> https://github.com/adam900710/linux/tree/subpage_raid56
> 
> [DESIGN]
> To make RAID56 code to support subpage, we need to make every sector of
> a RAID56 full stripe (including P/Q) to be addressable.
> 
> Previously we use page pointer directly for things like stripe_pages:
> 
> Full stripe is 64K * 3, 2 data stripes, one P stripe (RAID5)
> 
> stripe_pages:   | 0 | 1 | 2 |  .. | 46 | 47 |
> 
> Those 48 pages all points to a page we allocated.
> 
> 
> The new subpage support will introduce a sector layer, based on the old
> stripe_pages[] array:
> 
> The same 64K * 3, RAID5 layout, but 64K page size and 4K sector size:
> 
> stripe_sectors: | 0 | 1 | .. |15 |16 |  ...  |31 |32 | ..    |47 |
> stripe_pages:   |      Page 0    |     Page 1    |    Page 2     |
> 
> Each stripe_ptr of stripe_sectors[] will include:
> 
> - One page pointer
>   Points back the page inside stripe_pages[].
> 
> - One pgoff member
>   To indicate the offset inside the page
> 
> - One uptodate member
>   To indicate if the sector is uptodate, replacing the old PageUptodate
>   flag.
>   As introducing btrfs_subpage structure to stripe_pages[] looks a
>   little overkilled, as we only care Uptodate flag.
> 
> The same applies to bio_sectors[] array, which is going to completely
> replace the old bio_pages[] array.
> 
> [SIDE EFFECT]
> Despite the obvious new ability for subpage to utilize btrfs RAID56
> profiles, it will cause more memory usage for real btrfs_raid_bio
> structure.
> 
> We allocate extra memory based on the stripe size and number of stripes,
> and update the pointer arrays to utilize the extra memory.
> 
> To compare, we use a pretty standard setup, 3 disks raid5, 4K page size
> on x86_64:
> 
>  Before: 1176
>  After:  2032 (+72.8%)
> 
> The reason for such a big bump is:
> 
> - Extra size for sector_ptr.
>   Instead of just a page pointer, now it's twice the size of a pointer
>   (a page pointer + 2 * unsigned int)
> 
>   This means although we replace bio_pages[] with bio_sectors[], we are
>   still enlarging the size.
> 
> - A completely new array for stripe_sectors[]
>   And the array itself is also twice the size of the old stripe_pages[].
> 
> There is some attempt to reduce the size of btrfs_raid_bio itself, but
> the big impact still comes from the new sector_ptr arrays.
> 
> I have tried my best to reduce the size, by compacting the sector_ptr
> structure.
> Without exotic macros, I don't have any better ideas on reducing the
> real size of btrfs_raid_bio.
> 
> [TESTS]
> Full fstests are run on both x86_64 and aarch64.
> No new regressions found.
> (In fact several regressions found during development, all fixed).
> 
> [PATCHSET LAYOUT]
> The patchset layout puts several things into consideration:
> 
> - Every patch can be tested independently on x86_64
>   No more tons of unused helpers then a big switch.
>   Every change can be verified on x86_64.
> 
> - More temporary sanity checks than previous code
>   For example, when rbio_add_io_page() is converted to be subpage
>   compatible, extra ASSERT() is added to ensure no subpage range
>   can even be added.
> 
>   Such temporary checks are removed in the last enablement patch.
>   This is to make testing on x86_64 more comprehensive.
> 
> - Mostly small change in each patch
>   The only exception is the conversion for rbio_add_io_page().
>   But the most change in that patch comes from variable renaming.
>   The overall line changed in each patch should still be small enough
>   for review.
> 
> Qu Wenruo (16):
>   btrfs: open-code rbio_nr_pages()
>   btrfs: make btrfs_raid_bio more compact
>   btrfs: introduce new cached members for btrfs_raid_bio
>   btrfs: introduce btrfs_raid_bio::stripe_sectors
>   btrfs: introduce btrfs_raid_bio::bio_sectors
>   btrfs: make rbio_add_io_page() subpage compatible
>   btrfs: make finish_parity_scrub() subpage compatible
>   btrfs: make __raid_recover_endio_io() subpage compatibable
>   btrfs: make finish_rmw() subpage compatible
>   btrfs: open-code rbio_stripe_page_index()
>   btrfs: make raid56_add_scrub_pages() subpage compatible
>   btrfs: remove btrfs_raid_bio::bio_pages array
>   btrfs: make set_bio_pages_uptodate() subpage compatible
>   btrfs: make steal_rbio() subpage compatible
>   btrfs: make alloc_rbio_essential_pages() subpage compatible
>   btrfs: enable subpage support for RAID56

This patchset is relatively independent so I picked it now; it's in a
topic branch rebased on misc-next, i.e. on top of the bio cleanups from
Christoph. I'll move it to misc-next after some quick testing.


* Re: [PATCH 04/16] btrfs: introduce btrfs_raid_bio::stripe_sectors
  2022-04-08 16:40   ` David Sterba
@ 2022-04-08 22:55     ` Qu Wenruo
  0 siblings, 0 replies; 33+ messages in thread
From: Qu Wenruo @ 2022-04-08 22:55 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs



On 2022/4/9 00:40, David Sterba wrote:
> On Fri, Apr 01, 2022 at 07:23:19PM +0800, Qu Wenruo wrote:
>> The new member is an array of sector_ptr pointers, they will represent
>> all sectors inside a full stripe (including P/Q).
>>
>> They co-operate with btrfs_raid_bio::stripe_pages:
>>
>> stripe_pages:   | Page 0, range [0, 64K)   | Page 1 ...
>> stripe_sectors: |  |  | ...             |  |
>>                    |  |                    \- sector 15, page 0, pgoff=60K
>>                    |  \- sector 1, page 0, pgoff=4K
>>                    \---- sector 0, page 0, pfoff=0
>>
>> With such structure, we can represent subpage sectors without using
>> extra pages.
>>
>> Here we introduce a new helper, index_stripe_sectors(), to update
>> stripe_sectors[] to point to correct page and pgoff.
>>
>> So every time rbio::stripe_pages[] pointer gets updated, the new helper
>> should be called.
>>
>> The following functions have to call the new helper:
>>
>> - steal_rbio()
>> - alloc_rbio_pages()
>> - alloc_rbio_parity_pages()
>> - alloc_rbio_essential_pages()
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   fs/btrfs/raid56.c | 62 +++++++++++++++++++++++++++++++++++++++++------
>>   1 file changed, 54 insertions(+), 8 deletions(-)
>>
>> diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
>> index 1bad7d3a0331..8cfe00db79c9 100644
>> --- a/fs/btrfs/raid56.c
>> +++ b/fs/btrfs/raid56.c
>> @@ -52,6 +52,16 @@ struct btrfs_stripe_hash_table {
>>   	struct btrfs_stripe_hash table[];
>>   };
>>
>> +/*
>> + * A bvec like structure to present a sector inside a page.
>> + *
>> + * Unlike bvec we don't need bvlen, as it's fixed to sectorsize.
>> + */
>> +struct sector_ptr {
>> +	struct page *page;
>> +	unsigned int pgoff;
>> +} __attribute__ ((__packed__));
>
> Packed does not make much sense here, it's an in-memory structure and
> there are no savings, besides that packed structures may force unaligned
> access.

Not exactly.

The packed attribute is for the 1-bit flag added later; without packed
it would take another few hundred bytes even for the very basic 3-disk
RAID5 setup.
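
Rough numbers, assuming 4K pages and 64K stripes: a 3-disk RAID5 full
stripe has 48 sectors, tracked in both the stripe_sectors[] and
bio_sectors[] arrays, i.e. 96 sector_ptr entries; padding each entry
from 13 packed bytes up to 16 costs 96 * 3 = 288 extra bytes.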

Thanks,
Qu
>
>> +
>>   enum btrfs_rbio_ops {
>>   	BTRFS_RBIO_WRITE,
>>   	BTRFS_RBIO_READ_REBUILD,
>> @@ -154,25 +164,27 @@ struct btrfs_raid_bio {
>>
>>   	atomic_t error;
>>   	/*
>> -	 * these are two arrays of pointers.  We allocate the
>> +	 * These are two arrays of pointers.  We allocate the
>
> Please don't fix comments that you don't move or change, this is
> unrelated change.
>
>>   	 * rbio big enough to hold them both and setup their
>>   	 * locations when the rbio is allocated
>>   	 */
>>
>> -	/* pointers to pages that we allocated for
>> +	/*
>> +	 * Pointers to pages that we allocated for
>
> Same
>
>>   	 * reading/writing stripes directly from the disk (including P/Q)
>>   	 */
>>   	struct page **stripe_pages;
>>
>> -	/*
>> -	 * pointers to the pages in the bio_list.  Stored
>> -	 * here for faster lookup
>> -	 */
>> +	/* Pointers to the pages in the bio_list, for faster lookup */
>>   	struct page **bio_pages;
>>
>>   	/*
>> -	 * bitmap to record which horizontal stripe has data
>> +	 * For subpage support, we need to map each sector to above
>> +	 * stripe_pages.
>>   	 */
>> +	struct sector_ptr *stripe_sectors;
>> +
>> +	/* Bitmap to record which horizontal stripe has data */
>
> So this one got moved and it's fine to update.
>
>>   	unsigned long *dbitmap;
>>
>>   	/* allocated with real_stripes-many pointers for finish_*() calls */


* Re: [PATCH 05/16] btrfs: introduce btrfs_raid_bio::bio_sectors
  2022-04-08 16:46   ` David Sterba
@ 2022-04-08 22:58     ` Qu Wenruo
  2022-04-11 14:35       ` David Sterba
  0 siblings, 1 reply; 33+ messages in thread
From: Qu Wenruo @ 2022-04-08 22:58 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs



On 2022/4/9 00:46, David Sterba wrote:
> On Fri, Apr 01, 2022 at 07:23:20PM +0800, Qu Wenruo wrote:
>> This new member is going to fully replace bio_pages in the future, but
>> for now let's keep them co-exist, until the full switch is done.
>>
>> Currently cache_rbio_pages() and index_rbio_pages() will also populate
>> the new array.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   fs/btrfs/raid56.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 53 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
>> index 8cfe00db79c9..f23fd282d298 100644
>> --- a/fs/btrfs/raid56.c
>> +++ b/fs/btrfs/raid56.c
>> @@ -60,6 +60,7 @@ struct btrfs_stripe_hash_table {
>>   struct sector_ptr {
>>   	struct page *page;
>>   	unsigned int pgoff;
>> +	unsigned int uptodate:1;
>>   } __attribute__ ((__packed__));
>
> Even with packed this does not help as the full in is allocated for the
> single bit and it requires bit operations to set/clear. Up to 4 bools
> fit into one int and using them is better.

I guess I can limit pgoff to at most 31 bits, so the uptodate flag can
share the same unsigned int with pgoff.
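
I.e. something like this (a quick sketch of that idea):

	struct sector_ptr {
		struct page *page;
		unsigned int pgoff:31;
		unsigned int uptodate:1;
	};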

>
>>
>>   enum btrfs_rbio_ops {
>> @@ -175,6 +176,9 @@ struct btrfs_raid_bio {
>>   	 */
>>   	struct page **stripe_pages;
>>
>> +	/* Pointers to the sectors in the bio_list, for faster lookup */
>> +	struct sector_ptr *bio_sectors;
>> +
>>   	/* Pointers to the pages in the bio_list, for faster lookup */
>>   	struct page **bio_pages;
>>
>> @@ -282,6 +286,24 @@ static void cache_rbio_pages(struct btrfs_raid_bio *rbio)
>>   		copy_highpage(rbio->stripe_pages[i], rbio->bio_pages[i]);
>>   		SetPageUptodate(rbio->stripe_pages[i]);
>>   	}
>> +
>> +	/*
>> +	 * TODO: This work is duplicated with above loop, should remove above
>
> I think I told you several times, please don't write TODO notes. A plain
> comment explaining that the loop is duplicated is understandable by
> itself.

Even if it's going to be deleted in later patches?

TODO provides a pretty good highlight in most editors, thus it's much
easier to notice and less likely to be forgotten.

Thanks,
Qu

>
>> +	 * loop when the switch is fully done.
>> +	 */
>> +	for (i = 0; i < rbio->nr_sectors; i++) {
>> +		/* Some range not covered by bio (partial write), skip it */
>> +		if (!rbio->bio_sectors[i].page)
>> +			continue;
>> +
>> +		ASSERT(rbio->stripe_sectors[i].page);
>> +		memcpy_page(rbio->stripe_sectors[i].page,
>> +			    rbio->stripe_sectors[i].pgoff,
>> +			    rbio->bio_sectors[i].page,
>> +			    rbio->bio_sectors[i].pgoff,
>> +			    rbio->bioc->fs_info->sectorsize);
>> +		rbio->stripe_sectors[i].uptodate = 1;
>> +	}
>>   	set_bit(RBIO_CACHE_READY_BIT, &rbio->flags);
>>   }
>>


* Re: [PATCH 07/16] btrfs: make finish_parity_scrub() subpage compatible
  2022-04-08 17:04   ` David Sterba
@ 2022-04-08 22:59     ` Qu Wenruo
  2022-04-11 14:36       ` David Sterba
  0 siblings, 1 reply; 33+ messages in thread
From: Qu Wenruo @ 2022-04-08 22:59 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs



On 2022/4/9 01:04, David Sterba wrote:
> On Fri, Apr 01, 2022 at 07:23:22PM +0800, Qu Wenruo wrote:
>> The core is to convert direct page usage into sector_ptr usage, and
>> use memcpy() to replace copy_page().
>>
>> For pointers usage, we need to convert it to kmap_local_page() +
>> sector->pgoff.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   fs/btrfs/raid56.c | 56 +++++++++++++++++++++++++++--------------------
>>   1 file changed, 32 insertions(+), 24 deletions(-)
>>
>> diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
>> index 07d0b26024dd..bd01e64ea4fc 100644
>> --- a/fs/btrfs/raid56.c
>> +++ b/fs/btrfs/raid56.c
>> @@ -2492,14 +2492,15 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
>>   					 int need_check)
>>   {
>>   	struct btrfs_io_context *bioc = rbio->bioc;
>> +	const u32 sectorsize = bioc->fs_info->sectorsize;
>>   	void **pointers = rbio->finish_pointers;
>>   	unsigned long *pbitmap = rbio->finish_pbitmap;
>>   	int nr_data = rbio->nr_data;
>>   	int stripe;
>>   	int sectornr;
>>   	bool has_qstripe;
>> -	struct page *p_page = NULL;
>> -	struct page *q_page = NULL;
>> +	struct sector_ptr p_sector = {};
>> +	struct sector_ptr q_sector = {};
>
> This should be { 0 }

That's not what you said before.

You said that { 0 } conversion is not going to be merged and {} is fine.

Thanks,
Qu


* Re: [PATCH 03/16] btrfs: introduce new cached members for btrfs_raid_bio
  2022-04-01 11:23 ` [PATCH 03/16] btrfs: introduce new cached members for btrfs_raid_bio Qu Wenruo
@ 2022-04-11 10:29   ` Geert Uytterhoeven
  2022-04-11 11:09     ` Qu Wenruo
  0 siblings, 1 reply; 33+ messages in thread
From: Geert Uytterhoeven @ 2022-04-11 10:29 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, linux-kernel

 	Hi Qu,

On Fri, 1 Apr 2022, Qu Wenruo wrote:
> The new members are all related to number of sectors, but the existing
> number of pages members are kept as is:
>
> - nr_sectors
>  Total sectors of the full stripe including P/Q.
>
> - stripe_nsectors
>  The sectors of a single stripe.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Thanks for your patch, which is now commit f1e779cdb7f165f0
("btrfs: raid56: introduce new cached members for btrfs_raid_bio") in
next-20220411.

> --- a/fs/btrfs/raid56.c
> +++ b/fs/btrfs/raid56.c
> @@ -958,18 +964,25 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
> 	const unsigned int real_stripes = bioc->num_stripes - bioc->num_tgtdevs;
> 	const unsigned int num_pages = DIV_ROUND_UP(stripe_len, PAGE_SIZE) *
> 				       real_stripes;
> +	const unsigned int num_sectors = DIV_ROUND_UP(stripe_len * real_stripes,
> +						      fs_info->sectorsize);

noreply@ellerman.id.au reports for m68k builds:

     ERROR: modpost: "__udivdi3" [fs/btrfs/btrfs.ko] undefined!

As this is a 64-by-32 division, you should not use a plain division,
but e.g. a shift by fs_info->sectorsize_bits.

> 	const unsigned int stripe_npages = DIV_ROUND_UP(stripe_len, PAGE_SIZE);
> +	const unsigned int stripe_nsectors = DIV_ROUND_UP(stripe_len,
> +							  fs_info->sectorsize);

Likewise.
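
A minimal sketch of the shift-based form (it assumes stripe_len stays a
multiple of the sector size, which holds for the 64K stripes used here):

	const unsigned int num_sectors = (stripe_len * real_stripes) >>
					 fs_info->sectorsize_bits;
	const unsigned int stripe_nsectors =
					stripe_len >> fs_info->sectorsize_bits;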

Gr{oetje,eeting}s,

 						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
 							    -- Linus Torvalds


* Re: [PATCH 03/16] btrfs: introduce new cached members for btrfs_raid_bio
  2022-04-11 10:29   ` Geert Uytterhoeven
@ 2022-04-11 11:09     ` Qu Wenruo
  0 siblings, 0 replies; 33+ messages in thread
From: Qu Wenruo @ 2022-04-11 11:09 UTC (permalink / raw)
  To: Geert Uytterhoeven, Qu Wenruo; +Cc: linux-btrfs, linux-kernel



On 2022/4/11 18:29, Geert Uytterhoeven wrote:
>      Hi Qu,
>
> On Fri, 1 Apr 2022, Qu Wenruo wrote:
>> The new members are all related to number of sectors, but the existing
>> number of pages members are kept as is:
>>
>> - nr_sectors
>>  Total sectors of the full stripe including P/Q.
>>
>> - stripe_nsectors
>>  The sectors of a single stripe.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>
> Thanks for your patch, which is now commit f1e779cdb7f165f0
> ("btrfs: raid56: introduce new cached members for btrfs_raid_bio") in
> next-20220411.
>
>> --- a/fs/btrfs/raid56.c
>> +++ b/fs/btrfs/raid56.c
>> @@ -958,18 +964,25 @@ static struct btrfs_raid_bio *alloc_rbio(struct
>> btrfs_fs_info *fs_info,
>>     const unsigned int real_stripes = bioc->num_stripes -
>> bioc->num_tgtdevs;
>>     const unsigned int num_pages = DIV_ROUND_UP(stripe_len, PAGE_SIZE) *
>>                        real_stripes;
>> +    const unsigned int num_sectors = DIV_ROUND_UP(stripe_len *
>> real_stripes,
>> +                              fs_info->sectorsize);
>
> noreply@ellerman.id.au reports for m68k builds:
>
>      ERROR: modpost: "__udivdi3" [fs/btrfs/btrfs.ko] undefined!
>
> As this is a 64-by-32 division, you should not use a plain division,
> but e.g. a shift by fs_info->sectorsize_bits.
>
>>     const unsigned int stripe_npages = DIV_ROUND_UP(stripe_len,
>> PAGE_SIZE);
>> +    const unsigned int stripe_nsectors = DIV_ROUND_UP(stripe_len,
>> +                              fs_info->sectorsize);

All my bad. In fact, when I crafted this exact line, the
64-divided-by-32 problem was already on my mind.

But my initial assumption that stripe_len should be u32 prevented me
from doing a proper check.

In fact, stripe_len should never be u64. u32 is definitely enough, and
we should avoid abusing u64 for those members.

I'll update the patchset, with one new patch to address the abuse of u64
specifically.

Thank you very much for pointing this out,
Qu

>
> Likewise.
>
> Gr{oetje,eeting}s,
>
>                          Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 --
> geert@linux-m68k.org
>
> In personal conversations with technical people, I call myself a hacker.
> But
> when I'm talking to journalists I just say "programmer" or something
> like that.
>                                  -- Linus Torvalds


* Re: [PATCH 05/16] btrfs: introduce btrfs_raid_bio::bio_sectors
  2022-04-08 22:58     ` Qu Wenruo
@ 2022-04-11 14:35       ` David Sterba
  0 siblings, 0 replies; 33+ messages in thread
From: David Sterba @ 2022-04-11 14:35 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Sat, Apr 09, 2022 at 06:58:56AM +0800, Qu Wenruo wrote:
> On 2022/4/9 00:46, David Sterba wrote:
> > On Fri, Apr 01, 2022 at 07:23:20PM +0800, Qu Wenruo wrote:
> >> This new member is going to fully replace bio_pages in the future, but
> >> for now let's keep them co-exist, until the full switch is done.
> >>
> >> Currently cache_rbio_pages() and index_rbio_pages() will also populate
> >> the new array.
> >>
> >> Signed-off-by: Qu Wenruo <wqu@suse.com>
> >> ---
> >>   fs/btrfs/raid56.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++-
> >>   1 file changed, 53 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
> >> index 8cfe00db79c9..f23fd282d298 100644
> >> --- a/fs/btrfs/raid56.c
> >> +++ b/fs/btrfs/raid56.c
> >> @@ -60,6 +60,7 @@ struct btrfs_stripe_hash_table {
> >>   struct sector_ptr {
> >>   	struct page *page;
> >>   	unsigned int pgoff;
> >> +	unsigned int uptodate:1;
> >>   } __attribute__ ((__packed__));
> >
> > Even with packed this does not help as the full in is allocated for the
> > single bit and it requires bit operations to set/clear. Up to 4 bools
> > fit into one int and using them is better.
> 
> I guess I can put pgoff to take at most 31 bits, then can share uptodate
> flag with pgoff.

If we go that way, splitting the bit fields on a byte boundary generates
better code; anything non-aligned requires masking and or-ing.
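
A sketch of a byte-aligned split (my sketch, not code from the series;
24 bits of pgoff is plenty to address any offset within a page):

	struct sector_ptr {
		struct page *page;
		unsigned int pgoff:24;		/* byte aligned, cheap access */
		unsigned int uptodate:8;	/* flag gets its own byte */
	};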
> 
> >
> >>
> >>   enum btrfs_rbio_ops {
> >> @@ -175,6 +176,9 @@ struct btrfs_raid_bio {
> >>   	 */
> >>   	struct page **stripe_pages;
> >>
> >> +	/* Pointers to the sectors in the bio_list, for faster lookup */
> >> +	struct sector_ptr *bio_sectors;
> >> +
> >>   	/* Pointers to the pages in the bio_list, for faster lookup */
> >>   	struct page **bio_pages;
> >>
> >> @@ -282,6 +286,24 @@ static void cache_rbio_pages(struct btrfs_raid_bio *rbio)
> >>   		copy_highpage(rbio->stripe_pages[i], rbio->bio_pages[i]);
> >>   		SetPageUptodate(rbio->stripe_pages[i]);
> >>   	}
> >> +
> >> +	/*
> >> +	 * TODO: This work is duplicated with above loop, should remove above
> >
> > I think I told you several times, please don't write TODO notes. A plain
> > comment explaining that the loop is duplicated is understandable by
> > itself.
> 
> Even it's going to be deleted in later patches?

Yes.

> TODO provides a pretty good highlight in most editors, thus it's way
> more easier to be exposed without being forgotten.

Being forgotten by you while writing the patches is not interesting for
anybody else, and I'm really opposed to using source code as a todo list
and note-taking storage. We have too many such notes in the btrfs code,
lots of them stale for years. If it's in the same patchset there's
little risk you'd forget about it; otherwise take notes on paper, or
maybe your editor can bookmark lines.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 07/16] btrfs: make finish_parity_scrub() subpage compatible
  2022-04-08 22:59     ` Qu Wenruo
@ 2022-04-11 14:36       ` David Sterba
  2022-04-11 14:46         ` David Sterba
  0 siblings, 1 reply; 33+ messages in thread
From: David Sterba @ 2022-04-11 14:36 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Sat, Apr 09, 2022 at 06:59:37AM +0800, Qu Wenruo wrote:
> 
> 
> On 2022/4/9 01:04, David Sterba wrote:
> > On Fri, Apr 01, 2022 at 07:23:22PM +0800, Qu Wenruo wrote:
> >> The core is to convert direct page usage into sector_ptr usage, and
> >> use memcpy() to replace copy_page().
> >>
> >> For pointer usage, we need to convert it to kmap_local_page() +
> >> sector->pgoff.
> >>
> >> Signed-off-by: Qu Wenruo <wqu@suse.com>
> >> ---
> >>   fs/btrfs/raid56.c | 56 +++++++++++++++++++++++++++--------------------
> >>   1 file changed, 32 insertions(+), 24 deletions(-)
> >>
> >> diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
> >> index 07d0b26024dd..bd01e64ea4fc 100644
> >> --- a/fs/btrfs/raid56.c
> >> +++ b/fs/btrfs/raid56.c
> >> @@ -2492,14 +2492,15 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
> >>   					 int need_check)
> >>   {
> >>   	struct btrfs_io_context *bioc = rbio->bioc;
> >> +	const u32 sectorsize = bioc->fs_info->sectorsize;
> >>   	void **pointers = rbio->finish_pointers;
> >>   	unsigned long *pbitmap = rbio->finish_pbitmap;
> >>   	int nr_data = rbio->nr_data;
> >>   	int stripe;
> >>   	int sectornr;
> >>   	bool has_qstripe;
> >> -	struct page *p_page = NULL;
> >> -	struct page *q_page = NULL;
> >> +	struct sector_ptr p_sector = {};
> >> +	struct sector_ptr q_sector = {};
> >
> > This should be { 0 }
> 
> That's not what you said before.
> 
> You said that the { 0 } conversion is not going to be merged and that
> {} is fine.

Let me check, I could have misremembered that.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 07/16] btrfs: make finish_parity_scrub() subpage compatible
  2022-04-11 14:36       ` David Sterba
@ 2022-04-11 14:46         ` David Sterba
  0 siblings, 0 replies; 33+ messages in thread
From: David Sterba @ 2022-04-11 14:46 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, Qu Wenruo, linux-btrfs

On Mon, Apr 11, 2022 at 04:36:20PM +0200, David Sterba wrote:
> On Sat, Apr 09, 2022 at 06:59:37AM +0800, Qu Wenruo wrote:
> > 
> > 
> > On 2022/4/9 01:04, David Sterba wrote:
> > > On Fri, Apr 01, 2022 at 07:23:22PM +0800, Qu Wenruo wrote:
> > >> The core is to convert direct page usage into sector_ptr usage, and
> > >> use memcpy() to replace copy_page().
> > >>
> > >> For pointer usage, we need to convert it to kmap_local_page() +
> > >> sector->pgoff.
> > >>
> > >> Signed-off-by: Qu Wenruo <wqu@suse.com>
> > >> ---
> > >>   fs/btrfs/raid56.c | 56 +++++++++++++++++++++++++++--------------------
> > >>   1 file changed, 32 insertions(+), 24 deletions(-)
> > >>
> > >> diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
> > >> index 07d0b26024dd..bd01e64ea4fc 100644
> > >> --- a/fs/btrfs/raid56.c
> > >> +++ b/fs/btrfs/raid56.c
> > >> @@ -2492,14 +2492,15 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
> > >>   					 int need_check)
> > >>   {
> > >>   	struct btrfs_io_context *bioc = rbio->bioc;
> > >> +	const u32 sectorsize = bioc->fs_info->sectorsize;
> > >>   	void **pointers = rbio->finish_pointers;
> > >>   	unsigned long *pbitmap = rbio->finish_pbitmap;
> > >>   	int nr_data = rbio->nr_data;
> > >>   	int stripe;
> > >>   	int sectornr;
> > >>   	bool has_qstripe;
> > >> -	struct page *p_page = NULL;
> > >> -	struct page *q_page = NULL;
> > >> +	struct sector_ptr p_sector = {};
> > >> +	struct sector_ptr q_sector = {};
> > >
> > > This should be { 0 }
> > 
> > That's not what you said before.
> > 
> > You said that the { 0 } conversion is not going to be merged and that
> > {} is fine.
> 
> Let me check, I could have misremembered that.

Nope, { 0 } it is:

https://lore.kernel.org/all/9292fc4a-9c6e-8e28-8203-f70118e9ee20@gmx.com/
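
(A hedged side note: in C both initializers below zero the entire
struct, so the difference is style and dialect rather than behavior;
"{}" was a GNU extension until C23, while "{ 0 }" is standard C.)

    struct sector_ptr p_sector = { 0 };     /* standard C zero-init */
    struct sector_ptr q_sector = {};        /* GNU extension pre-C23 */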

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 09/16] btrfs: make finish_rmw() subpage compatible
  2022-04-01 11:23 ` [PATCH 09/16] btrfs: make finish_rmw() subpage compatible Qu Wenruo
@ 2022-04-11 17:00   ` Christoph Hellwig
  2022-04-11 22:24     ` Qu Wenruo
  0 siblings, 1 reply; 33+ messages in thread
From: Christoph Hellwig @ 2022-04-11 17:00 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Apr 01, 2022 at 07:23:24PM +0800, Qu Wenruo wrote:
> The trick involved is how we call kunmap_local(), as now the pointers[]
> only store the address with @pgoff added, thus they cannot be directly
> used for kunmap_local().

Yes, they can. kunmap_local() only needs a pointer somewhere into the
mapped page, not the return value from kmap_local_page():

 * @__addr can be any address within the mapped page.  Commonly it is the
 * address return from kmap_local_page(), but it can also include offsets.

> For the cleanup, we have to re-grab all the sectors and subtract the
> @pgoff from pointers[], then call kunmap_local().

No need for that.
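
(A minimal sketch of what this permits, using the thread's sector_ptr
members; kunmap_local() takes any address within the mapped page:)

    /* Map once and keep only the offset-adjusted pointer ... */
    void *ptr = kmap_local_page(sector->page) + sector->pgoff;

    /* ... work through ptr, then unmap with that same pointer; there
     * is no need to subtract sector->pgoff first. */
    kunmap_local(ptr);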

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 09/16] btrfs: make finish_rmw() subpage compatible
  2022-04-11 17:00   ` Christoph Hellwig
@ 2022-04-11 22:24     ` Qu Wenruo
  0 siblings, 0 replies; 33+ messages in thread
From: Qu Wenruo @ 2022-04-11 22:24 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-btrfs



On 2022/4/12 01:00, Christoph Hellwig wrote:
> On Fri, Apr 01, 2022 at 07:23:24PM +0800, Qu Wenruo wrote:
>> The trick involved is how we call kunmap_local(), as now the pointers[]
>> only store the address with @pgoff added, thus they cannot be directly
>> used for kunmap_local().
> 
> Yes, they can. kunmap_local() only needs a pointer somewhere into the
> mapped page, not the return value from kmap_local_page():
> 
>   * @__addr can be any address within the mapped page.  Commonly it is the
>   * address return from kmap_local_page(), but it can also include offsets.
> 
>> For the cleanup, we have to re-grab all the sectors and subtract the
>> @pgoff from pointers[], then call kunmap_local().
> 
> No need for that.
> 

Awesome!!

Finally we can get rid of the complexity of weird pointer tracing.

Thanks,
Qu


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 00/16] btrfs: add subpage support for RAID56
  2022-04-08 18:16 ` [PATCH 00/16] btrfs: add " David Sterba
@ 2022-04-12  7:15   ` Qu Wenruo
  0 siblings, 0 replies; 33+ messages in thread
From: Qu Wenruo @ 2022-04-12  7:15 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs



On 2022/4/9 02:16, David Sterba wrote:
> On Fri, Apr 01, 2022 at 07:23:15PM +0800, Qu Wenruo wrote:
>> The branch can be fetched from github, based on a slightly older
>> misc-next branch:
>> https://github.com/adam900710/linux/tree/subpage_raid56
>>
>> [DESIGN]
>> To make RAID56 code to support subpage, we need to make every sector of
>> a RAID56 full stripe (including P/Q) to be addressable.
>>
>> Previously we use page pointer directly for things like stripe_pages:
>>
>> Full stripe is 64K * 3, 2 data stripes, one P stripe (RAID5)
>>
>> stripe_pages:   | 0 | 1 | 2 |  .. | 46 | 47 |
>>
>> Those 48 pages all points to a page we allocated.
>>
>>
>> The new subpage support will introduce a sector layer, based on the old
>> stripe_pages[] array:
>>
>> The same 64K * 3, RAID5 layout, but 64K page size and 4K sector size:
>>
>> stripe_sectors: | 0 | 1 | .. |15 |16 |  ...  |31 |32 | ..    |47 |
>> stripe_pages:   |      Page 0    |     Page 1    |    Page 2     |
>>
>> Each stripe_ptr of stripe_sectors[] will include:
>>
>> - One page pointer
>>    Points back the page inside stripe_pages[].
>>
>> - One pgoff member
>>    To indicate the offset inside the page
>>
>> - One uptodate member
>>    To indicate if the sector is uptodate, replacing the old PageUptodate
>>    flag.
>>    As introducing btrfs_subpage structure to stripe_pages[] looks a
>>    little overkilled, as we only care Uptodate flag.
>>
>> The same applies to bio_sectors[] array, which is going to completely
>> replace the old bio_pages[] array.
>>
>> [SIDE EFFECT]
>> Despite the obvious new ability for subpage to utilize btrfs RAID56
>> profiles, it will cause more memory usage for real btrfs_raid_bio
>> structure.
>>
>> We allocate extra memory based on the stripe size and number of stripes,
>> and update the pointer arrays to utilize the extra memory.
>>
>> To compare, we use a pretty standard setup, 3 disks raid5, 4K page size
>> on x86_64:
>>
>>   Before: 1176
>>   After:  2032 (+72.8%)
>>
>> The reason for such a big bump is:
>>
>> - Extra size for sector_ptr.
>>    Instead of just a page pointer, now it's twice the size of a pointer
>>    (a page pointer + 2 * unsigned int)
>>
>>    This means although we replace bio_pages[] with bio_sectors[], we are
>>    still enlarging the size.
>>
>> - A completely new array for stripe_sectors[]
>>    And the array itself is also twice the size of the old stripe_pages[].
>>
>> There is some attempt to reduce the size of btrfs_raid_bio itself, but
>> the big impact still comes from the new sector_ptr arrays.
>>
>> I have tried my best to reduce the size, by compacting the sector_ptr
>> structure.
>> Without exotic macros, I don't have any better ideas on reducing the
>> real size of btrfs_raid_bio.
>>
>> [TESTS]
>> Full fstests are run on both x86_64 and aarch64.
>> No new regressions found.
>> (In fact several regressions found during development, all fixed).
>>
>> [PATCHSET LAYOUT]
>> The patchset layout puts several things into consideration:
>>
>> - Every patch can be tested independently on x86_64
>>    No more tons of unused helpers then a big switch.
>>    Every change can be verified on x86_64.
>>
>> - More temporary sanity checks than previous code
>>    For example, when rbio_add_io_page() is converted to be subpage
>>    compatible, extra ASSERT() is added to ensure no subpage range
>>    can even be added.
>>
>>    Such temporary checks are removed in the last enablement patch.
>>    This is to make testing on x86_64 more comprehensive.
>>
>> - Mostly small change in each patch
>>    The only exception is the conversion for rbio_add_io_page().
>>    But the most change in that patch comes from variable renaming.
>>    The overall line changed in each patch should still be small enough
>>    for review.
>>
>> Qu Wenruo (16):
>>    btrfs: open-code rbio_nr_pages()
>>    btrfs: make btrfs_raid_bio more compact
>>    btrfs: introduce new cached members for btrfs_raid_bio
>>    btrfs: introduce btrfs_raid_bio::stripe_sectors
>>    btrfs: introduce btrfs_raid_bio::bio_sectors
>>    btrfs: make rbio_add_io_page() subpage compatible
>>    btrfs: make finish_parity_scrub() subpage compatible
>>    btrfs: make __raid_recover_endio_io() subpage compatibable
>>    btrfs: make finish_rmw() subpage compatible
>>    btrfs: open-code rbio_stripe_page_index()
>>    btrfs: make raid56_add_scrub_pages() subpage compatible
>>    btrfs: remove btrfs_raid_bio::bio_pages array
>>    btrfs: make set_bio_pages_uptodate() subpage compatible
>>    btrfs: make steal_rbio() subpage compatible
>>    btrfs: make alloc_rbio_essential_pages() subpage compatible
>>    btrfs: enable subpage support for RAID56
>
> This patchset is relatively independent, so I picked it now; it's in a
> topic branch, rebased on misc-next, i.e. on top of the bio cleanups from
> Christoph. I'll move it to misc-next after some quick testing.

Thanks a lot.

Although as of my last check (one hour ago), I didn't see a branch with
a similar name.

And I'm adding just one new patch to address the u64 division problem
(it shrinks the width of stripe_len and uses right shifts to replace
the various divisions).
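
(A hedged sketch of that conversion; the shift constant's name here is
made up for illustration:)

    /* Before: a u64 member forces div_u64()-style helpers everywhere. */
    stripe_nr = div_u64(offset, stripe_len);

    /* After: u32 width, and since the stripe length is a power of two
     * (64K), divisions become shifts and modulos become masks. */
    #define STRIPE_LEN_SHIFT 16                 /* 64K stripe length */
    const u32 stripe_len = 1U << STRIPE_LEN_SHIFT;

    stripe_nr = offset >> STRIPE_LEN_SHIFT;
    offset_in_stripe = offset & (stripe_len - 1);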

Since I can't find the branch, your modifications may not be preserved...

Thanks,
Qu

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2022-04-12 10:01 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-01 11:23 [PATCH 00/16] btrfs: add subpage support for RAID56 Qu Wenruo
2022-04-01 11:23 ` [PATCH 01/16] btrfs: open-code rbio_nr_pages() Qu Wenruo
2022-04-01 11:23 ` [PATCH 02/16] btrfs: make btrfs_raid_bio more compact Qu Wenruo
2022-04-01 11:23 ` [PATCH 03/16] btrfs: introduce new cached members for btrfs_raid_bio Qu Wenruo
2022-04-11 10:29   ` Geert Uytterhoeven
2022-04-11 11:09     ` Qu Wenruo
2022-04-01 11:23 ` [PATCH 04/16] btrfs: introduce btrfs_raid_bio::stripe_sectors Qu Wenruo
2022-04-08 16:40   ` David Sterba
2022-04-08 22:55     ` Qu Wenruo
2022-04-01 11:23 ` [PATCH 05/16] btrfs: introduce btrfs_raid_bio::bio_sectors Qu Wenruo
2022-04-08 16:46   ` David Sterba
2022-04-08 22:58     ` Qu Wenruo
2022-04-11 14:35       ` David Sterba
2022-04-01 11:23 ` [PATCH 06/16] btrfs: make rbio_add_io_page() subpage compatible Qu Wenruo
2022-04-01 11:23 ` [PATCH 07/16] btrfs: make finish_parity_scrub() " Qu Wenruo
2022-04-08 17:04   ` David Sterba
2022-04-08 22:59     ` Qu Wenruo
2022-04-11 14:36       ` David Sterba
2022-04-11 14:46         ` David Sterba
2022-04-01 11:23 ` [PATCH 08/16] btrfs: make __raid_recover_endio_io() subpage compatibable Qu Wenruo
2022-04-01 11:23 ` [PATCH 09/16] btrfs: make finish_rmw() subpage compatible Qu Wenruo
2022-04-11 17:00   ` Christoph Hellwig
2022-04-11 22:24     ` Qu Wenruo
2022-04-01 11:23 ` [PATCH 10/16] btrfs: open-code rbio_stripe_page_index() Qu Wenruo
2022-04-08 17:28   ` David Sterba
2022-04-01 11:23 ` [PATCH 11/16] btrfs: make raid56_add_scrub_pages() subpage compatible Qu Wenruo
2022-04-01 11:23 ` [PATCH 12/16] btrfs: remove btrfs_raid_bio::bio_pages array Qu Wenruo
2022-04-01 11:23 ` [PATCH 13/16] btrfs: make set_bio_pages_uptodate() subpage compatible Qu Wenruo
2022-04-01 11:23 ` [PATCH 14/16] btrfs: make steal_rbio() " Qu Wenruo
2022-04-01 11:23 ` [PATCH 15/16] btrfs: make alloc_rbio_essential_pages() " Qu Wenruo
2022-04-01 11:23 ` [PATCH 16/16] btrfs: enable subpage support for RAID56 Qu Wenruo
2022-04-08 18:16 ` [PATCH 00/16] btrfs: add " David Sterba
2022-04-12  7:15   ` Qu Wenruo
