* [PATCH 0/3] btrfs: raid56: reduce unnecessary parity writes
@ 2022-05-27  7:28 Qu Wenruo
  2022-05-27  7:28 ` [PATCH 1/3] btrfs: use integrated bitmaps for btrfs_raid_bio::dbitmap and finish_pbitmap Qu Wenruo
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Qu Wenruo @ 2022-05-27  7:28 UTC (permalink / raw)
  To: linux-btrfs

If we have only 8K partial write at the beginning of a full RAID56
stripe, we will write the following contents:

                    0  8K           32K             64K
Disk 1	(data):     |XX|            |               |
Disk 2  (data):     |               |               |
Disk 3  (parity):   |XXXXXXXXXXXXXXX|XXXXXXXXXXXXXXX|

|X| means the sector will be written back to disk.

This is due to the fact that we don't really check if the vertical
stripe has any data (aka, range from higher level bio) for parity
stripes.

The patchset will convert it to the following write pattern:

                    0  8K           32K             64K
Disk 1	(data):     |XX|            |               |
Disk 2  (data):     |               |               |
Disk 3  (parity):   |XX|            |               |

This is done by fully utilizing btrfs_raid_bio::dbitmap, which was
previously only used by the scrub path.

Now write path (either partial or full write) will also populate
btrfs_raid_bio::dbitmap, and then only assemble sectors marked in
dbitmap for submission.

The first two patches just convert the previous bitmap pointers into
integrated bitmaps inside btrfs_raid_bio and scrub_parity.

This saves 8 bytes for each structure.

The final patch does the most important work, introducing a new
helper, rbio_add_bio(), to mark btrfs_raid_bio::dbitmap.
Then finish_rmw() only adds the sectors whose bits are set in dbitmap,
which changes the write pattern.
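As a rough userspace sketch of the idea (constants and names are assumed
here, this is not the kernel code), populating dbitmap for a partial
write looks like:

```c
#include <assert.h>
#include <stdint.h>

#define SECTORSIZE      4096u                       /* assumed 4KiB sectors */
#define STRIPE_LEN      65536u                      /* fixed 64KiB stripe */
#define STRIPE_NSECTORS (STRIPE_LEN / SECTORSIZE)   /* 16, fits in one long */

/*
 * Set the dbitmap bits covered by a write of @len bytes starting at
 * @offset bytes into the full stripe's data range.
 */
static unsigned long mark_dbitmap(uint64_t offset, uint32_t len)
{
	unsigned long dbitmap = 0;
	uint64_t cur;

	for (cur = offset; cur < offset + len; cur += SECTORSIZE)
		dbitmap |= 1UL << ((cur / SECTORSIZE) % STRIPE_NSECTORS);
	return dbitmap;
}
```

With this model, an 8K write at offset 0 sets only bits 0 and 1, so the
write path would submit just those two parity sectors instead of all 16.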


Qu Wenruo (3):
  btrfs: use integrated bitmaps for btrfs_raid_bio::dbitmap and
    finish_pbitmap
  btrfs: use integrated bitmaps for scrub_parity::dbitmap and ebitmap
  btrfs: only write the sectors in the vertical stripe which has data
    stripes

 fs/btrfs/raid56.c | 89 ++++++++++++++++++++++++++++++++++-------------
 fs/btrfs/scrub.c  | 32 +++++++----------
 2 files changed, 78 insertions(+), 43 deletions(-)

-- 
2.36.1



* [PATCH 1/3] btrfs: use integrated bitmaps for btrfs_raid_bio::dbitmap and finish_pbitmap
  2022-05-27  7:28 [PATCH 0/3] btrfs: raid56: reduce unnecessary parity writes Qu Wenruo
@ 2022-05-27  7:28 ` Qu Wenruo
  2022-05-27 14:09   ` David Sterba
  2022-05-27  7:28 ` [PATCH 2/3] btrfs: use integrated bitmaps for scrub_parity::dbitmap and ebitmap Qu Wenruo
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Qu Wenruo @ 2022-05-27  7:28 UTC (permalink / raw)
  To: linux-btrfs

Previously we used "unsigned long *" for those two bitmaps.

But since we only support a fixed stripe length (64KiB, already checked
in the tree-checker), "unsigned long *" is really a waste of memory, when
we can just use "unsigned long".

This saves us 8 bytes in total for btrfs_raid_bio.

To be extra safe, add an ASSERT() making sure the calculated
@stripe_nsectors never exceeds BITS_PER_LONG.
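A quick sanity check of that bound (assuming 4KiB as the smallest
supported sectorsize; a sketch, not kernel code) shows why a single
unsigned long is enough:

```c
#include <assert.h>
#include <limits.h>

#define BTRFS_STRIPE_LEN (64u * 1024)   /* fixed, enforced by tree-checker */

/* stripe_nsectors is largest when sectorsize is smallest. */
static unsigned int calc_stripe_nsectors(unsigned int sectorsize)
{
	return BTRFS_STRIPE_LEN / sectorsize;
}
```

Even the worst case (4KiB sectors) gives 16 bits, well under the 64 bits
of an unsigned long on 64-bit platforms.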

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/raid56.c | 42 ++++++++++++++++++++++--------------------
 1 file changed, 22 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index a5b623ee6fac..bc1e510b0c12 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -164,6 +164,13 @@ struct btrfs_raid_bio {
 	atomic_t stripes_pending;
 
 	atomic_t error;
+
+	/* Bitmap to record which horizontal stripe has data */
+	unsigned long dbitmap;
+
+	/* Allocated with stripe_nsectors-many bits for finish_*() calls */
+	unsigned long finish_pbitmap;
+
 	/*
 	 * these are two arrays of pointers.  We allocate the
 	 * rbio big enough to hold them both and setup their
@@ -184,14 +191,8 @@ struct btrfs_raid_bio {
 	 */
 	struct sector_ptr *stripe_sectors;
 
-	/* Bitmap to record which horizontal stripe has data */
-	unsigned long *dbitmap;
-
 	/* allocated with real_stripes-many pointers for finish_*() calls */
 	void **finish_pointers;
-
-	/* Allocated with stripe_nsectors-many bits for finish_*() calls */
-	unsigned long *finish_pbitmap;
 };
 
 static int __raid56_parity_recover(struct btrfs_raid_bio *rbio);
@@ -1038,14 +1039,17 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
 	ASSERT(IS_ALIGNED(stripe_len, PAGE_SIZE));
 	/* PAGE_SIZE must also be aligned to sectorsize for subpage support */
 	ASSERT(IS_ALIGNED(PAGE_SIZE, fs_info->sectorsize));
+	/*
+	 * Our current stripe len should be fixed to 64k thus stripe_nsectors
+	 * (at most 16) should be no larger than BITS_PER_LONG.
+	 */
+	ASSERT(stripe_nsectors <= BITS_PER_LONG);
 
 	rbio = kzalloc(sizeof(*rbio) +
 		       sizeof(*rbio->stripe_pages) * num_pages +
 		       sizeof(*rbio->bio_sectors) * num_sectors +
 		       sizeof(*rbio->stripe_sectors) * num_sectors +
-		       sizeof(*rbio->finish_pointers) * real_stripes +
-		       sizeof(*rbio->dbitmap) * BITS_TO_LONGS(stripe_nsectors) +
-		       sizeof(*rbio->finish_pbitmap) * BITS_TO_LONGS(stripe_nsectors),
+		       sizeof(*rbio->finish_pointers) * real_stripes,
 		       GFP_NOFS);
 	if (!rbio)
 		return ERR_PTR(-ENOMEM);
@@ -1081,8 +1085,6 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
 	CONSUME_ALLOC(rbio->bio_sectors, num_sectors);
 	CONSUME_ALLOC(rbio->stripe_sectors, num_sectors);
 	CONSUME_ALLOC(rbio->finish_pointers, real_stripes);
-	CONSUME_ALLOC(rbio->dbitmap, BITS_TO_LONGS(stripe_nsectors));
-	CONSUME_ALLOC(rbio->finish_pbitmap, BITS_TO_LONGS(stripe_nsectors));
 #undef  CONSUME_ALLOC
 
 	if (bioc->map_type & BTRFS_BLOCK_GROUP_RAID5)
@@ -1939,7 +1941,7 @@ static void __raid_recover_end_io(struct btrfs_raid_bio *rbio)
 		 * which we have data when doing parity scrub.
 		 */
 		if (rbio->operation == BTRFS_RBIO_PARITY_SCRUB &&
-		    !test_bit(sectornr, rbio->dbitmap))
+		    !test_bit(sectornr, &rbio->dbitmap))
 			continue;
 
 		/*
@@ -2374,7 +2376,7 @@ struct btrfs_raid_bio *raid56_parity_alloc_scrub_rbio(struct bio *bio,
 	}
 	ASSERT(i < rbio->real_stripes);
 
-	bitmap_copy(rbio->dbitmap, dbitmap, stripe_nsectors);
+	bitmap_copy(&rbio->dbitmap, dbitmap, stripe_nsectors);
 
 	/*
 	 * We have already increased bio_counter when getting bioc, record it
@@ -2412,7 +2414,7 @@ static int alloc_rbio_essential_pages(struct btrfs_raid_bio *rbio)
 	int stripe;
 	int sectornr;
 
-	for_each_set_bit(sectornr, rbio->dbitmap, rbio->stripe_nsectors) {
+	for_each_set_bit(sectornr, &rbio->dbitmap, rbio->stripe_nsectors) {
 		for (stripe = 0; stripe < rbio->real_stripes; stripe++) {
 			struct page *page;
 			int index = (stripe * rbio->stripe_nsectors + sectornr) *
@@ -2437,7 +2439,7 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
 	struct btrfs_io_context *bioc = rbio->bioc;
 	const u32 sectorsize = bioc->fs_info->sectorsize;
 	void **pointers = rbio->finish_pointers;
-	unsigned long *pbitmap = rbio->finish_pbitmap;
+	unsigned long *pbitmap = &rbio->finish_pbitmap;
 	int nr_data = rbio->nr_data;
 	int stripe;
 	int sectornr;
@@ -2460,7 +2462,7 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
 
 	if (bioc->num_tgtdevs && bioc->tgtdev_map[rbio->scrubp]) {
 		is_replace = 1;
-		bitmap_copy(pbitmap, rbio->dbitmap, rbio->stripe_nsectors);
+		bitmap_copy(pbitmap, &rbio->dbitmap, rbio->stripe_nsectors);
 	}
 
 	/*
@@ -2497,7 +2499,7 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
 	/* Map the parity stripe just once */
 	pointers[nr_data] = kmap_local_page(p_sector.page);
 
-	for_each_set_bit(sectornr, rbio->dbitmap, rbio->stripe_nsectors) {
+	for_each_set_bit(sectornr, &rbio->dbitmap, rbio->stripe_nsectors) {
 		struct sector_ptr *sector;
 		void *parity;
 
@@ -2525,7 +2527,7 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
 			memcpy(parity, pointers[rbio->scrubp], sectorsize);
 		else
 			/* Parity is right, needn't writeback */
-			bitmap_clear(rbio->dbitmap, sectornr, 1);
+			bitmap_clear(&rbio->dbitmap, sectornr, 1);
 		kunmap_local(parity);
 
 		for (stripe = nr_data - 1; stripe >= 0; stripe--)
@@ -2547,7 +2549,7 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
 	 * higher layers (the bio_list in our rbio) and our p/q.  Ignore
 	 * everything else.
 	 */
-	for_each_set_bit(sectornr, rbio->dbitmap, rbio->stripe_nsectors) {
+	for_each_set_bit(sectornr, &rbio->dbitmap, rbio->stripe_nsectors) {
 		struct sector_ptr *sector;
 
 		sector = rbio_stripe_sector(rbio, rbio->scrubp, sectornr);
@@ -2714,7 +2716,7 @@ static void raid56_parity_scrub_stripe(struct btrfs_raid_bio *rbio)
 	 * stripe
 	 */
 	for (stripe = 0; stripe < rbio->real_stripes; stripe++) {
-		for_each_set_bit(sectornr , rbio->dbitmap, rbio->stripe_nsectors) {
+		for_each_set_bit(sectornr, &rbio->dbitmap, rbio->stripe_nsectors) {
 			struct sector_ptr *sector;
 			/*
 			 * We want to find all the sectors missing from the
-- 
2.36.1



* [PATCH 2/3] btrfs: use integrated bitmaps for scrub_parity::dbitmap and ebitmap
  2022-05-27  7:28 [PATCH 0/3] btrfs: raid56: reduce unnecessary parity writes Qu Wenruo
  2022-05-27  7:28 ` [PATCH 1/3] btrfs: use integrated bitmaps for btrfs_raid_bio::dbitmap and finish_pbitmap Qu Wenruo
@ 2022-05-27  7:28 ` Qu Wenruo
  2022-05-27  7:28 ` [PATCH 3/3] btrfs: only write the sectors in the vertical stripe which has data stripes Qu Wenruo
  2022-05-27 14:08 ` [PATCH 0/3] btrfs: raid56: reduce unnecessary parity writes David Sterba
  3 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2022-05-27  7:28 UTC (permalink / raw)
  To: linux-btrfs

Previously we used "unsigned long *" for those two bitmaps.

But since we only support a fixed stripe length (64KiB, already checked
in the tree-checker), "unsigned long *" is really a waste of memory, when
we can just use "unsigned long".

This saves us 8 bytes in total for scrub_parity.

To be extra safe, add an ASSERT() making sure the calculated @nsectors
never exceeds BITS_PER_LONG.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c | 32 +++++++++++++-------------------
 1 file changed, 13 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index e7b0323e6efd..46cac7c7f292 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -135,15 +135,13 @@ struct scrub_parity {
 	struct work_struct	work;
 
 	/* Mark the parity blocks which have data */
-	unsigned long		*dbitmap;
+	unsigned long		dbitmap;
 
 	/*
 	 * Mark the parity blocks which have data, but errors happen when
 	 * read data or check data
 	 */
-	unsigned long		*ebitmap;
-
-	unsigned long		bitmap[];
+	unsigned long		ebitmap;
 };
 
 struct scrub_ctx {
@@ -2406,13 +2404,13 @@ static inline void __scrub_mark_bitmap(struct scrub_parity *sparity,
 static inline void scrub_parity_mark_sectors_error(struct scrub_parity *sparity,
 						   u64 start, u32 len)
 {
-	__scrub_mark_bitmap(sparity, sparity->ebitmap, start, len);
+	__scrub_mark_bitmap(sparity, &sparity->ebitmap, start, len);
 }
 
 static inline void scrub_parity_mark_sectors_data(struct scrub_parity *sparity,
 						  u64 start, u32 len)
 {
-	__scrub_mark_bitmap(sparity, sparity->dbitmap, start, len);
+	__scrub_mark_bitmap(sparity, &sparity->dbitmap, start, len);
 }
 
 static void scrub_block_complete(struct scrub_block *sblock)
@@ -2763,7 +2761,7 @@ static void scrub_free_parity(struct scrub_parity *sparity)
 	struct scrub_sector *curr, *next;
 	int nbits;
 
-	nbits = bitmap_weight(sparity->ebitmap, sparity->nsectors);
+	nbits = bitmap_weight(&sparity->ebitmap, sparity->nsectors);
 	if (nbits) {
 		spin_lock(&sctx->stat_lock);
 		sctx->stat.read_errors += nbits;
@@ -2795,8 +2793,8 @@ static void scrub_parity_bio_endio(struct bio *bio)
 	struct btrfs_fs_info *fs_info = sparity->sctx->fs_info;
 
 	if (bio->bi_status)
-		bitmap_or(sparity->ebitmap, sparity->ebitmap, sparity->dbitmap,
-			  sparity->nsectors);
+		bitmap_or(&sparity->ebitmap, &sparity->ebitmap,
+			  &sparity->dbitmap, sparity->nsectors);
 
 	bio_put(bio);
 
@@ -2814,8 +2812,8 @@ static void scrub_parity_check_and_repair(struct scrub_parity *sparity)
 	u64 length;
 	int ret;
 
-	if (!bitmap_andnot(sparity->dbitmap, sparity->dbitmap, sparity->ebitmap,
-			   sparity->nsectors))
+	if (!bitmap_andnot(&sparity->dbitmap, &sparity->dbitmap,
+			   &sparity->ebitmap, sparity->nsectors))
 		goto out;
 
 	length = sparity->logic_end - sparity->logic_start;
@@ -2833,7 +2831,7 @@ static void scrub_parity_check_and_repair(struct scrub_parity *sparity)
 
 	rbio = raid56_parity_alloc_scrub_rbio(bio, bioc, length,
 					      sparity->scrub_dev,
-					      sparity->dbitmap,
+					      &sparity->dbitmap,
 					      sparity->nsectors);
 	if (!rbio)
 		goto rbio_out;
@@ -2847,7 +2845,7 @@ static void scrub_parity_check_and_repair(struct scrub_parity *sparity)
 bioc_out:
 	btrfs_bio_counter_dec(fs_info);
 	btrfs_put_bioc(bioc);
-	bitmap_or(sparity->ebitmap, sparity->ebitmap, sparity->dbitmap,
+	bitmap_or(&sparity->ebitmap, &sparity->ebitmap, &sparity->dbitmap,
 		  sparity->nsectors);
 	spin_lock(&sctx->stat_lock);
 	sctx->stat.malloc_errors++;
@@ -3131,7 +3129,6 @@ static noinline_for_stack int scrub_raid56_parity(struct scrub_ctx *sctx,
 	int ret;
 	struct scrub_parity *sparity;
 	int nsectors;
-	int bitmap_len;
 
 	path = btrfs_alloc_path();
 	if (!path) {
@@ -3145,9 +3142,8 @@ static noinline_for_stack int scrub_raid56_parity(struct scrub_ctx *sctx,
 
 	ASSERT(map->stripe_len <= U32_MAX);
 	nsectors = map->stripe_len >> fs_info->sectorsize_bits;
-	bitmap_len = scrub_calc_parity_bitmap_len(nsectors);
-	sparity = kzalloc(sizeof(struct scrub_parity) + 2 * bitmap_len,
-			  GFP_NOFS);
+	ASSERT(nsectors <= BITS_PER_LONG);
+	sparity = kzalloc(sizeof(struct scrub_parity), GFP_NOFS);
 	if (!sparity) {
 		spin_lock(&sctx->stat_lock);
 		sctx->stat.malloc_errors++;
@@ -3165,8 +3161,6 @@ static noinline_for_stack int scrub_raid56_parity(struct scrub_ctx *sctx,
 	sparity->logic_end = logic_end;
 	refcount_set(&sparity->refs, 1);
 	INIT_LIST_HEAD(&sparity->sectors_list);
-	sparity->dbitmap = sparity->bitmap;
-	sparity->ebitmap = (void *)sparity->bitmap + bitmap_len;
 
 	ret = 0;
 	for (cur_logical = logic_start; cur_logical < logic_end;
-- 
2.36.1



* [PATCH 3/3] btrfs: only write the sectors in the vertical stripe which has data stripes
  2022-05-27  7:28 [PATCH 0/3] btrfs: raid56: reduce unnecessary parity writes Qu Wenruo
  2022-05-27  7:28 ` [PATCH 1/3] btrfs: use integrated bitmaps for btrfs_raid_bio::dbitmap and finish_pbitmap Qu Wenruo
  2022-05-27  7:28 ` [PATCH 2/3] btrfs: use integrated bitmaps for scrub_parity::dbitmap and ebitmap Qu Wenruo
@ 2022-05-27  7:28 ` Qu Wenruo
  2022-05-31 10:42   ` Qu Wenruo
  2022-05-27 14:08 ` [PATCH 0/3] btrfs: raid56: reduce unnecessary parity writes David Sterba
  3 siblings, 1 reply; 8+ messages in thread
From: Qu Wenruo @ 2022-05-27  7:28 UTC (permalink / raw)
  To: linux-btrfs

If we have only 8K partial write at the beginning of a full RAID56
stripe, we will write the following contents:

                    0  8K           32K             64K
Disk 1	(data):     |XX|            |               |
Disk 2  (data):     |               |               |
Disk 3  (parity):   |XXXXXXXXXXXXXXX|XXXXXXXXXXXXXXX|

|X| means the sector will be written back to disk.

Note that although we won't write any sectors to disk 2, we will still
write the full 64KiB of parity to disk.

This behavior is fine for now, but not for the future (especially for
RAID56J, as we waste quite some space to journal the unused parity
stripes).

So here we will also utilize btrfs_raid_bio::dbitmap: any time we
queue a higher level bio into an rbio, we will update rbio::dbitmap to
indicate which vertical stripes need to be written back.
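The bit index can be modeled in userspace like this (4KiB sectors
assumed; this mirrors the arithmetic in rbio_add_bio() but is only a
sketch):

```c
#include <assert.h>
#include <stdint.h>

#define SECTORSIZE_BITS 12u   /* assumed 4KiB sectors */
#define STRIPE_NSECTORS 16u   /* 64KiB stripe / 4KiB sectors */

/*
 * Fold a logical address onto its vertical stripe: take the sector
 * offset inside the full stripe, modulo the sectors per stripe, so
 * every data stripe covering the same vertical stripe maps to the
 * same bit.
 */
static int dbitmap_bit(uint64_t cur_logical, uint64_t full_stripe_start)
{
	return (int)(((uint32_t)(cur_logical - full_stripe_start) >>
		      SECTORSIZE_BITS) % STRIPE_NSECTORS);
}
```

Sector 0 of the first data stripe and sector 0 of the second data
stripe (64KiB later) both map to bit 0, i.e. the same vertical stripe.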

And at finish_rmw(), we also check dbitmap to see if we need to write
any sector in the vertical stripe.

So after the patch, the above example will only lead to the following
writeback pattern:

                    0  8K           32K             64K
Disk 1	(data):     |XX|            |               |
Disk 2  (data):     |               |               |
Disk 3  (parity):   |XX|            |               |

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/raid56.c | 47 +++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 43 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index bc1e510b0c12..88640f7e1622 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -392,6 +392,9 @@ static void merge_rbio(struct btrfs_raid_bio *dest,
 {
 	bio_list_merge(&dest->bio_list, &victim->bio_list);
 	dest->bio_list_bytes += victim->bio_list_bytes;
+	/* Also inherit the bitmaps from @victim. */
+	bitmap_or(&dest->dbitmap, &victim->dbitmap, &dest->dbitmap,
+		  dest->stripe_nsectors);
 	dest->generic_bio_cnt += victim->generic_bio_cnt;
 	bio_list_init(&victim->bio_list);
 }
@@ -1284,6 +1287,9 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
 	else
 		BUG();
 
+	/* We should have at least one data sector. */
+	ASSERT(bitmap_weight(&rbio->dbitmap, rbio->stripe_nsectors));
+
 	/* at this point we either have a full stripe,
 	 * or we've read the full stripe from the drive.
 	 * recalculate the parity and write the new results.
@@ -1358,6 +1364,10 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
 		for (sectornr = 0; sectornr < rbio->stripe_nsectors; sectornr++) {
 			struct sector_ptr *sector;
 
+			/* This vertical stripe has no data, skip it. */
+			if (!test_bit(sectornr, &rbio->dbitmap))
+				continue;
+
 			if (stripe < rbio->nr_data) {
 				sector = sector_in_rbio(rbio, stripe, sectornr, 1);
 				if (!sector)
@@ -1384,6 +1394,10 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
 		for (sectornr = 0; sectornr < rbio->stripe_nsectors; sectornr++) {
 			struct sector_ptr *sector;
 
+			/* This vertical stripe has no data, skip it. */
+			if (!test_bit(sectornr, &rbio->dbitmap))
+				continue;
+
 			if (stripe < rbio->nr_data) {
 				sector = sector_in_rbio(rbio, stripe, sectornr, 1);
 				if (!sector)
@@ -1835,6 +1849,33 @@ static void btrfs_raid_unplug(struct blk_plug_cb *cb, bool from_schedule)
 	run_plug(plug);
 }
 
+/* Add the original bio into rbio->bio_list, and update rbio::dbitmap. */
+static void rbio_add_bio(struct btrfs_raid_bio *rbio, struct bio *orig_bio)
+{
+	const struct btrfs_fs_info *fs_info = rbio->bioc->fs_info;
+	const u64 orig_logical = orig_bio->bi_iter.bi_sector << SECTOR_SHIFT;
+	const u64 full_stripe_start = rbio->bioc->raid_map[0];
+	const u32 orig_len = orig_bio->bi_iter.bi_size;
+	const u32 sectorsize = fs_info->sectorsize;
+	u64 cur_logical;
+
+	ASSERT(orig_logical >= full_stripe_start &&
+	       orig_logical + orig_len <= full_stripe_start +
+	       rbio->nr_data * rbio->stripe_len);
+
+	bio_list_add(&rbio->bio_list, orig_bio);
+	rbio->bio_list_bytes += orig_bio->bi_iter.bi_size;
+
+	/* Update the dbitmap. */
+	for (cur_logical = orig_logical; cur_logical < orig_logical + orig_len;
+	     cur_logical += sectorsize) {
+		int bit = ((u32)(cur_logical - full_stripe_start) >>
+			   fs_info->sectorsize_bits) % rbio->stripe_nsectors;
+
+		set_bit(bit, &rbio->dbitmap);
+	}
+}
+
 /*
  * our main entry point for writes from the rest of the FS.
  */
@@ -1851,9 +1892,8 @@ int raid56_parity_write(struct bio *bio, struct btrfs_io_context *bioc, u32 stri
 		btrfs_put_bioc(bioc);
 		return PTR_ERR(rbio);
 	}
-	bio_list_add(&rbio->bio_list, bio);
-	rbio->bio_list_bytes = bio->bi_iter.bi_size;
 	rbio->operation = BTRFS_RBIO_WRITE;
+	rbio_add_bio(rbio, bio);
 
 	btrfs_bio_counter_inc_noblocked(fs_info);
 	rbio->generic_bio_cnt = 1;
@@ -2258,8 +2298,7 @@ int raid56_parity_recover(struct bio *bio, struct btrfs_io_context *bioc,
 	}
 
 	rbio->operation = BTRFS_RBIO_READ_REBUILD;
-	bio_list_add(&rbio->bio_list, bio);
-	rbio->bio_list_bytes = bio->bi_iter.bi_size;
+	rbio_add_bio(rbio, bio);
 
 	rbio->faila = find_logical_bio_stripe(rbio, bio);
 	if (rbio->faila == -1) {
-- 
2.36.1



* Re: [PATCH 0/3] btrfs: raid56: reduce unnecessary parity writes
  2022-05-27  7:28 [PATCH 0/3] btrfs: raid56: reduce unnecessary parity writes Qu Wenruo
                   ` (2 preceding siblings ...)
  2022-05-27  7:28 ` [PATCH 3/3] btrfs: only write the sectors in the vertical stripe which has data stripes Qu Wenruo
@ 2022-05-27 14:08 ` David Sterba
  3 siblings, 0 replies; 8+ messages in thread
From: David Sterba @ 2022-05-27 14:08 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, May 27, 2022 at 03:28:16PM +0800, Qu Wenruo wrote:
> If we have only 8K partial write at the beginning of a full RAID56
> stripe, we will write the following contents:
> 
>                     0  8K           32K             64K
> Disk 1	(data):     |XX|            |               |
> Disk 2  (data):     |               |               |
> Disk 3  (parity):   |XXXXXXXXXXXXXXX|XXXXXXXXXXXXXXX|
> 
> |X| means the sector will be written back to disk.
> 
> This is due to the fact that we don't really check if the vertical
> stripe has any data (aka, range from higher level bio) for parity
> stripes.
> 
> The patchset will convert it to the following write pattern:
> 
>                     0  8K           32K             64K
> Disk 1	(data):     |XX|            |               |
> Disk 2  (data):     |               |               |
> Disk 3  (parity):   |XX|            |               |
> 
> This is done by fully utilizing btrfs_raid_bio::dbitmap, which was
> previously only used by the scrub path.
> 
> Now write path (either partial or full write) will also populate
> btrfs_raid_bio::dbitmap, and then only assemble sectors marked in
> dbitmap for submission.
> 
> The first two patches just convert the previous bitmap pointers into
> integrated bitmaps inside btrfs_raid_bio and scrub_parity.
> 
> This saves 8 bytes for each structure.
> 
> The final patch does the most important work, introducing a new
> helper, rbio_add_bio(), to mark btrfs_raid_bio::dbitmap.
> Then finish_rmw() only adds the sectors whose bits are set in dbitmap,
> which changes the write pattern.
> 
> 
> Qu Wenruo (3):
>   btrfs: use integrated bitmaps for btrfs_raid_bio::dbitmap and
>     finish_pbitmap
>   btrfs: use integrated bitmaps for scrub_parity::dbitmap and ebitmap
>   btrfs: only write the sectors in the vertical stripe which has data
>     stripes

Added to a for-next/topic branch, thanks.


* Re: [PATCH 1/3] btrfs: use integrated bitmaps for btrfs_raid_bio::dbitmap and finish_pbitmap
  2022-05-27  7:28 ` [PATCH 1/3] btrfs: use integrated bitmaps for btrfs_raid_bio::dbitmap and finish_pbitmap Qu Wenruo
@ 2022-05-27 14:09   ` David Sterba
  2022-05-27 23:27     ` Qu Wenruo
  0 siblings, 1 reply; 8+ messages in thread
From: David Sterba @ 2022-05-27 14:09 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, May 27, 2022 at 03:28:17PM +0800, Qu Wenruo wrote:
> Previously we used "unsigned long *" for those two bitmaps.
> 
> But since we only support fixed stripe length (64KiB, already checked in
> tree-checker), "unsigned long *" is really a waste of memory, while we
> can just use "unsigned long".
> 
> This saves us 8 bytes in total for btrfs_raid_bio.

Nice, also the indirection of pointers and kmalloc, for 8 bytes it's an
overkill. If we ever implement bigger stripe size then it would have to
be reverted but the asserts make sure we'll notice.


* Re: [PATCH 1/3] btrfs: use integrated bitmaps for btrfs_raid_bio::dbitmap and finish_pbitmap
  2022-05-27 14:09   ` David Sterba
@ 2022-05-27 23:27     ` Qu Wenruo
  0 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2022-05-27 23:27 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs



On 2022/5/27 22:09, David Sterba wrote:
> On Fri, May 27, 2022 at 03:28:17PM +0800, Qu Wenruo wrote:
>> Previously we used "unsigned long *" for those two bitmaps.
>>
>> But since we only support fixed stripe length (64KiB, already checked in
>> tree-checker), "unsigned long *" is really a waste of memory, while we
>> can just use "unsigned long".
>>
>> This saves us 8 bytes in total for btrfs_raid_bio.
>
> Nice, the indirection of pointers and kmalloc is also overkill for 8
> bytes. If we ever implement a bigger stripe size then it would have to
> be reverted, but the asserts make sure we'll notice.

In fact, even if we enlarge the STRIPE_LEN to 128K, the bitmap is still
large enough to contain them.

Furthermore, in that enlarged case, the tree-checker will be the first
to warn us.

Thanks,
Qu


* Re: [PATCH 3/3] btrfs: only write the sectors in the vertical stripe which has data stripes
  2022-05-27  7:28 ` [PATCH 3/3] btrfs: only write the sectors in the vertical stripe which has data stripes Qu Wenruo
@ 2022-05-31 10:42   ` Qu Wenruo
  0 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2022-05-31 10:42 UTC (permalink / raw)
  To: linux-btrfs



On 2022/5/27 15:28, Qu Wenruo wrote:
> If we have only 8K partial write at the beginning of a full RAID56
> stripe, we will write the following contents:
> 
>                      0  8K           32K             64K
> Disk 1	(data):     |XX|            |               |
> Disk 2  (data):     |               |               |
> Disk 3  (parity):   |XXXXXXXXXXXXXXX|XXXXXXXXXXXXXXX|
> 
> |X| means the sector will be written back to disk.
> 
> Note that although we won't write any sectors to disk 2, we will still
> write the full 64KiB of parity to disk.
> 
> This behavior is fine for now, but not for the future (especially for
> RAID56J, as we waste quite some space to journal the unused parity
> stripes).
> 
> So here we will also utilize btrfs_raid_bio::dbitmap: any time we
> queue a higher level bio into an rbio, we will update rbio::dbitmap to
> indicate which vertical stripes need to be written back.
> 
> And at finish_rmw(), we also check dbitmap to see if we need to write
> any sector in the vertical stripe.
> 
> So after the patch, the above example will only lead to the following
> writeback pattern:
> 
>                      0  8K           32K             64K
> Disk 1	(data):     |XX|            |               |
> Disk 2  (data):     |               |               |
> Disk 3  (parity):   |XX|            |               |

With the latest debug patch, it turns out that this patch only works
for the first several writes into a full stripe.

Since we don't reset dbitmap after the RMW finishes, it will slowly
degrade into the old behavior.

Will update the patchset to address this soon.

Thanks,
Qu
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>   fs/btrfs/raid56.c | 47 +++++++++++++++++++++++++++++++++++++++++++----
>   1 file changed, 43 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
> index bc1e510b0c12..88640f7e1622 100644
> --- a/fs/btrfs/raid56.c
> +++ b/fs/btrfs/raid56.c
> @@ -392,6 +392,9 @@ static void merge_rbio(struct btrfs_raid_bio *dest,
>   {
>   	bio_list_merge(&dest->bio_list, &victim->bio_list);
>   	dest->bio_list_bytes += victim->bio_list_bytes;
> +	/* Also inherit the bitmaps from @victim. */
> +	bitmap_or(&dest->dbitmap, &victim->dbitmap, &dest->dbitmap,
> +		  dest->stripe_nsectors);
>   	dest->generic_bio_cnt += victim->generic_bio_cnt;
>   	bio_list_init(&victim->bio_list);
>   }
> @@ -1284,6 +1287,9 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
>   	else
>   		BUG();
>   
> +	/* We should have at least one data sector. */
> +	ASSERT(bitmap_weight(&rbio->dbitmap, rbio->stripe_nsectors));
> +
>   	/* at this point we either have a full stripe,
>   	 * or we've read the full stripe from the drive.
>   	 * recalculate the parity and write the new results.
> @@ -1358,6 +1364,10 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
>   		for (sectornr = 0; sectornr < rbio->stripe_nsectors; sectornr++) {
>   			struct sector_ptr *sector;
>   
> +			/* This vertical stripe has no data, skip it. */
> +			if (!test_bit(sectornr, &rbio->dbitmap))
> +				continue;
> +
>   			if (stripe < rbio->nr_data) {
>   				sector = sector_in_rbio(rbio, stripe, sectornr, 1);
>   				if (!sector)
> @@ -1384,6 +1394,10 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
>   		for (sectornr = 0; sectornr < rbio->stripe_nsectors; sectornr++) {
>   			struct sector_ptr *sector;
>   
> +			/* This vertical stripe has no data, skip it. */
> +			if (!test_bit(sectornr, &rbio->dbitmap))
> +				continue;
> +
>   			if (stripe < rbio->nr_data) {
>   				sector = sector_in_rbio(rbio, stripe, sectornr, 1);
>   				if (!sector)
> @@ -1835,6 +1849,33 @@ static void btrfs_raid_unplug(struct blk_plug_cb *cb, bool from_schedule)
>   	run_plug(plug);
>   }
>   
> +/* Add the original bio into rbio->bio_list, and update rbio::dbitmap. */
> +static void rbio_add_bio(struct btrfs_raid_bio *rbio, struct bio *orig_bio)
> +{
> +	const struct btrfs_fs_info *fs_info = rbio->bioc->fs_info;
> +	const u64 orig_logical = orig_bio->bi_iter.bi_sector << SECTOR_SHIFT;
> +	const u64 full_stripe_start = rbio->bioc->raid_map[0];
> +	const u32 orig_len = orig_bio->bi_iter.bi_size;
> +	const u32 sectorsize = fs_info->sectorsize;
> +	u64 cur_logical;
> +
> +	ASSERT(orig_logical >= full_stripe_start &&
> +	       orig_logical + orig_len <= full_stripe_start +
> +	       rbio->nr_data * rbio->stripe_len);
> +
> +	bio_list_add(&rbio->bio_list, orig_bio);
> +	rbio->bio_list_bytes += orig_bio->bi_iter.bi_size;
> +
> +	/* Update the dbitmap. */
> +	for (cur_logical = orig_logical; cur_logical < orig_logical + orig_len;
> +	     cur_logical += sectorsize) {
> +		int bit = ((u32)(cur_logical - full_stripe_start) >>
> +			   fs_info->sectorsize_bits) % rbio->stripe_nsectors;
> +
> +		set_bit(bit, &rbio->dbitmap);
> +	}
> +}
> +
>   /*
>    * our main entry point for writes from the rest of the FS.
>    */
> @@ -1851,9 +1892,8 @@ int raid56_parity_write(struct bio *bio, struct btrfs_io_context *bioc, u32 stri
>   		btrfs_put_bioc(bioc);
>   		return PTR_ERR(rbio);
>   	}
> -	bio_list_add(&rbio->bio_list, bio);
> -	rbio->bio_list_bytes = bio->bi_iter.bi_size;
>   	rbio->operation = BTRFS_RBIO_WRITE;
> +	rbio_add_bio(rbio, bio);
>   
>   	btrfs_bio_counter_inc_noblocked(fs_info);
>   	rbio->generic_bio_cnt = 1;
> @@ -2258,8 +2298,7 @@ int raid56_parity_recover(struct bio *bio, struct btrfs_io_context *bioc,
>   	}
>   
>   	rbio->operation = BTRFS_RBIO_READ_REBUILD;
> -	bio_list_add(&rbio->bio_list, bio);
> -	rbio->bio_list_bytes = bio->bi_iter.bi_size;
> +	rbio_add_bio(rbio, bio);
>   
>   	rbio->faila = find_logical_bio_stripe(rbio, bio);
>   	if (rbio->faila == -1) {



