linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, David Sterba <dsterba@suse.com>,
	Qu Wenruo <wqu@suse.com>
Subject: [PATCH 5.15 09/14] btrfs: only write the sectors in the vertical stripe which has data stripes
Date: Fri, 19 Aug 2022 17:40:25 +0200	[thread overview]
Message-ID: <20220819153711.977759356@linuxfoundation.org> (raw)
In-Reply-To: <20220819153711.658766010@linuxfoundation.org>

From: Qu Wenruo <wqu@suse.com>

commit bd8f7e627703ca5707833d623efcd43f104c7b3f upstream.

If we have only 8K partial write at the beginning of a full RAID56
stripe, we will write the following contents:

                    0  8K           32K             64K
Disk 1	(data):     |XX|            |               |
Disk 2  (data):     |               |               |
Disk 3  (parity):   |XXXXXXXXXXXXXXX|XXXXXXXXXXXXXXX|

|X| means the sector will be written back to disk.

Note that, although we won't write any sectors from disk 2, but we will
write the full 64KiB of parity to disk.

This behavior is fine for now, but not for the future (especially for
RAID56J, as we waste quite some space to journal the unused parity
stripes).

So here we will also utilize the btrfs_raid_bio::dbitmap, anytime we
queue a higher level bio into an rbio, we will update rbio::dbitmap to
indicate which vertical stripes we need to writeback.

And at finish_rmw(), we also check dbitmap to see if we need to write
any sector in the vertical stripe.

So after the patch, above example will only lead to the following
writeback pattern:

                    0  8K           32K             64K
Disk 1	(data):     |XX|            |               |
Disk 2  (data):     |               |               |
Disk 3  (parity):   |XX|            |               |

Acked-by: David Sterba <dsterba@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/btrfs/raid56.c |   55 ++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 51 insertions(+), 4 deletions(-)

--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -324,6 +324,9 @@ static void merge_rbio(struct btrfs_raid
 {
 	bio_list_merge(&dest->bio_list, &victim->bio_list);
 	dest->bio_list_bytes += victim->bio_list_bytes;
+	/* Also inherit the bitmaps from @victim. */
+	bitmap_or(dest->dbitmap, victim->dbitmap, dest->dbitmap,
+		  dest->stripe_npages);
 	dest->generic_bio_cnt += victim->generic_bio_cnt;
 	bio_list_init(&victim->bio_list);
 }
@@ -865,6 +868,12 @@ static void rbio_orig_end_io(struct btrf
 
 	if (rbio->generic_bio_cnt)
 		btrfs_bio_counter_sub(rbio->fs_info, rbio->generic_bio_cnt);
+	/*
+	 * Clear the data bitmap, as the rbio may be cached for later usage.
+	 * do this before before unlock_stripe() so there will be no new bio
+	 * for this bio.
+	 */
+	bitmap_clear(rbio->dbitmap, 0, rbio->stripe_npages);
 
 	/*
 	 * At this moment, rbio->bio_list is empty, however since rbio does not
@@ -1197,6 +1206,9 @@ static noinline void finish_rmw(struct b
 	else
 		BUG();
 
+	/* We should have at least one data sector. */
+	ASSERT(bitmap_weight(rbio->dbitmap, rbio->stripe_npages));
+
 	/* at this point we either have a full stripe,
 	 * or we've read the full stripe from the drive.
 	 * recalculate the parity and write the new results.
@@ -1268,6 +1280,11 @@ static noinline void finish_rmw(struct b
 	for (stripe = 0; stripe < rbio->real_stripes; stripe++) {
 		for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
 			struct page *page;
+
+			/* This vertical stripe has no data, skip it. */
+			if (!test_bit(pagenr, rbio->dbitmap))
+				continue;
+
 			if (stripe < rbio->nr_data) {
 				page = page_in_rbio(rbio, stripe, pagenr, 1);
 				if (!page)
@@ -1292,6 +1309,11 @@ static noinline void finish_rmw(struct b
 
 		for (pagenr = 0; pagenr < rbio->stripe_npages; pagenr++) {
 			struct page *page;
+
+			/* This vertical stripe has no data, skip it. */
+			if (!test_bit(pagenr, rbio->dbitmap))
+				continue;
+
 			if (stripe < rbio->nr_data) {
 				page = page_in_rbio(rbio, stripe, pagenr, 1);
 				if (!page)
@@ -1715,6 +1737,33 @@ static void btrfs_raid_unplug(struct blk
 	run_plug(plug);
 }
 
+/* Add the original bio into rbio->bio_list, and update rbio::dbitmap. */
+static void rbio_add_bio(struct btrfs_raid_bio *rbio, struct bio *orig_bio)
+{
+	const struct btrfs_fs_info *fs_info = rbio->fs_info;
+	const u64 orig_logical = orig_bio->bi_iter.bi_sector << SECTOR_SHIFT;
+	const u64 full_stripe_start = rbio->bioc->raid_map[0];
+	const u32 orig_len = orig_bio->bi_iter.bi_size;
+	const u32 sectorsize = fs_info->sectorsize;
+	u64 cur_logical;
+
+	ASSERT(orig_logical >= full_stripe_start &&
+	       orig_logical + orig_len <= full_stripe_start +
+	       rbio->nr_data * rbio->stripe_len);
+
+	bio_list_add(&rbio->bio_list, orig_bio);
+	rbio->bio_list_bytes += orig_bio->bi_iter.bi_size;
+
+	/* Update the dbitmap. */
+	for (cur_logical = orig_logical; cur_logical < orig_logical + orig_len;
+	     cur_logical += sectorsize) {
+		int bit = ((u32)(cur_logical - full_stripe_start) >>
+			   fs_info->sectorsize_bits) % rbio->stripe_npages;
+
+		set_bit(bit, rbio->dbitmap);
+	}
+}
+
 /*
  * our main entry point for writes from the rest of the FS.
  */
@@ -1731,9 +1780,8 @@ int raid56_parity_write(struct btrfs_fs_
 		btrfs_put_bioc(bioc);
 		return PTR_ERR(rbio);
 	}
-	bio_list_add(&rbio->bio_list, bio);
-	rbio->bio_list_bytes = bio->bi_iter.bi_size;
 	rbio->operation = BTRFS_RBIO_WRITE;
+	rbio_add_bio(rbio, bio);
 
 	btrfs_bio_counter_inc_noblocked(fs_info);
 	rbio->generic_bio_cnt = 1;
@@ -2135,8 +2183,7 @@ int raid56_parity_recover(struct btrfs_f
 	}
 
 	rbio->operation = BTRFS_RBIO_READ_REBUILD;
-	bio_list_add(&rbio->bio_list, bio);
-	rbio->bio_list_bytes = bio->bi_iter.bi_size;
+	rbio_add_bio(rbio, bio);
 
 	rbio->faila = find_logical_bio_stripe(rbio, bio);
 	if (rbio->faila == -1) {



  parent reply	other threads:[~2022-08-19 15:43 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-19 15:40 [PATCH 5.15 00/14] 5.15.62-rc1 review Greg Kroah-Hartman
2022-08-19 15:40 ` [PATCH 5.15 01/14] io_uring: use original request task for inflight tracking Greg Kroah-Hartman
2022-08-19 15:40 ` [PATCH 5.15 02/14] tee: add overflow check in register_shm_helper() Greg Kroah-Hartman
2022-08-19 15:40 ` [PATCH 5.15 03/14] net_sched: cls_route: disallow handle of 0 Greg Kroah-Hartman
2022-08-19 15:40 ` [PATCH 5.15 04/14] ksmbd: prevent out of bound read for SMB2_WRITE Greg Kroah-Hartman
2022-08-19 15:40 ` [PATCH 5.15 05/14] ksmbd: fix heap-based overflow in set_ntacl_dacl() Greg Kroah-Hartman
2022-08-19 15:40 ` [PATCH 5.15 06/14] Revert "x86/ftrace: Use alternative RET encoding" Greg Kroah-Hartman
2022-08-19 15:40 ` [PATCH 5.15 07/14] x86/ibt,ftrace: Make function-graph play nice Greg Kroah-Hartman
2022-08-19 15:40 ` [PATCH 5.15 08/14] x86/ftrace: Use alternative RET encoding Greg Kroah-Hartman
2022-08-19 15:40 ` Greg Kroah-Hartman [this message]
2022-08-19 15:40 ` [PATCH 5.15 10/14] btrfs: raid56: dont trust any cached sector in __raid56_parity_recover() Greg Kroah-Hartman
2022-08-19 15:40 ` [PATCH 5.15 11/14] kexec_file: drop weak attribute from functions Greg Kroah-Hartman
2022-08-19 15:40 ` [PATCH 5.15 12/14] kexec: clean up arch_kexec_kernel_verify_sig Greg Kroah-Hartman
2022-08-19 15:40 ` [PATCH 5.15 13/14] kexec, KEYS: make the code in bzImage64_verify_sig generic Greg Kroah-Hartman
2022-08-19 15:40 ` [PATCH 5.15 14/14] arm64: kexec_file: use more system keyrings to verify kernel image signature Greg Kroah-Hartman
2022-08-20  0:42 ` [PATCH 5.15 00/14] 5.15.62-rc1 review Shuah Khan
2022-08-20  8:27 ` Naresh Kamboju
2022-08-20 18:17   ` Greg Kroah-Hartman
2022-08-20 10:05 ` Bagas Sanjaya
2022-08-20 10:42 ` Sudip Mukherjee (Codethink)
2022-08-20 22:23 ` Ron Economos
2022-08-21  0:56 ` Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220819153711.977759356@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=dsterba@suse.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).