linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Liu Bo <bo.li.liu@oracle.com>,
	David Sterba <dsterba@suse.com>,
	Sasha Levin <alexander.levin@microsoft.com>
Subject: [PATCH 4.9 13/39] Btrfs: make raid6 rebuild retry more
Date: Sun, 24 Jun 2018 23:24:00 +0800	[thread overview]
Message-ID: <20180624152353.535229696@linuxfoundation.org> (raw)
In-Reply-To: <20180624152352.038950449@linuxfoundation.org>

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Liu Bo <bo.li.liu@oracle.com>

[ Upstream commit 8810f7517a3bc4ca2d41d022446d3f5fd6b77c09 ]

There is a scenario that can end up with rebuild process failing to
return good content, i.e.
suppose that all disks can be read without problems and if the content
that was read out doesn't match its checksum, currently for raid6
btrfs at most retries twice,

- the 1st retry is to rebuild with all other stripes, it'll eventually
  be a raid5 xor rebuild,
- if the 1st fails, the 2nd retry will deliberately fail parity p so
  that it will do raid6 style rebuild,

however, the chances are that another non-parity stripe content also
has something corrupted, so that the above retries are not able to
return correct content, and users will think of this as data loss.
More seriouly, if the loss happens on some important internal btree
roots, it could refuse to mount.

This extends btrfs to do more retries and each retry fails only one
stripe.  Since raid6 can tolerate 2 disk failures, if there is one
more failure besides the failure on which we're recovering, this can
always work.

The worst case is to retry as many times as the number of raid6 disks,
but given the fact that such a scenario is really rare in practice,
it's still acceptable.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/btrfs/raid56.c  |   18 ++++++++++++++----
 fs/btrfs/volumes.c |    9 ++++++++-
 2 files changed, 22 insertions(+), 5 deletions(-)

--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -2161,11 +2161,21 @@ int raid56_parity_recover(struct btrfs_r
 	}
 
 	/*
-	 * reconstruct from the q stripe if they are
-	 * asking for mirror 3
+	 * Loop retry:
+	 * for 'mirror == 2', reconstruct from all other stripes.
+	 * for 'mirror_num > 2', select a stripe to fail on every retry.
 	 */
-	if (mirror_num == 3)
-		rbio->failb = rbio->real_stripes - 2;
+	if (mirror_num > 2) {
+		/*
+		 * 'mirror == 3' is to fail the p stripe and
+		 * reconstruct from the q stripe.  'mirror > 3' is to
+		 * fail a data stripe and reconstruct from p+q stripe.
+		 */
+		rbio->failb = rbio->real_stripes - (mirror_num - 1);
+		ASSERT(rbio->failb > 0);
+		if (rbio->failb <= rbio->faila)
+			rbio->failb--;
+	}
 
 	ret = lock_stripe_add(rbio);
 
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5186,7 +5186,14 @@ int btrfs_num_copies(struct btrfs_fs_inf
 	else if (map->type & BTRFS_BLOCK_GROUP_RAID5)
 		ret = 2;
 	else if (map->type & BTRFS_BLOCK_GROUP_RAID6)
-		ret = 3;
+		/*
+		 * There could be two corrupted data stripes, we need
+		 * to loop retry in order to rebuild the correct data.
+		 *
+		 * Fail a stripe at a time on every retry except the
+		 * stripe under reconstruction.
+		 */
+		ret = map->num_stripes;
 	else
 		ret = 1;
 	free_extent_map(em);



  parent reply	other threads:[~2018-06-24 15:49 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-24 15:23 [PATCH 4.9 00/39] 4.9.110-stable review Greg Kroah-Hartman
2018-06-24 15:23 ` [PATCH 4.9 01/39] objtool: update .gitignore file Greg Kroah-Hartman
2018-06-24 15:23 ` [PATCH 4.9 03/39] netfilter: ebtables: handle string from userspace with care Greg Kroah-Hartman
2018-06-24 15:23 ` [PATCH 4.9 04/39] ipvs: fix buffer overflow with sync daemon and service Greg Kroah-Hartman
2018-06-24 15:23 ` [PATCH 4.9 05/39] iwlwifi: pcie: compare with number of IRQs requested for, not number of CPUs Greg Kroah-Hartman
2018-06-24 15:23 ` [PATCH 4.9 06/39] atm: zatm: fix memcmp casting Greg Kroah-Hartman
2018-06-24 15:23 ` [PATCH 4.9 09/39] net/sonic: Use dma_mapping_error() Greg Kroah-Hartman
2018-06-24 15:23 ` [PATCH 4.9 11/39] Revert "Btrfs: fix scrub to repair raid6 corruption" Greg Kroah-Hartman
2018-06-24 15:23 ` [PATCH 4.9 12/39] tcp: do not overshoot window_clamp in tcp_rcv_space_adjust() Greg Kroah-Hartman
2018-06-24 15:24 ` Greg Kroah-Hartman [this message]
2018-06-24 15:24 ` [PATCH 4.9 15/39] bonding: re-evaluate force_primary when the primary slave name changes Greg Kroah-Hartman
2018-06-24 15:24 ` [PATCH 4.9 16/39] ipv6: allow PMTU exceptions to local routes Greg Kroah-Hartman
2018-06-24 15:24 ` [PATCH 4.9 17/39] net/sched: act_simple: fix parsing of TCA_DEF_DATA Greg Kroah-Hartman
2018-06-24 15:24 ` [PATCH 4.9 18/39] tcp: verify the checksum of the first data segment in a new connection Greg Kroah-Hartman
2018-06-24 15:24 ` [PATCH 4.9 20/39] ext4: fix hole length detection in ext4_ind_map_blocks() Greg Kroah-Hartman
2018-06-24 15:24 ` [PATCH 4.9 21/39] ext4: update mtime in ext4_punch_hole even if no blocks are released Greg Kroah-Hartman
2018-06-24 15:24 ` [PATCH 4.9 22/39] ext4: fix fencepost error in check for inode count overflow during resize Greg Kroah-Hartman
2018-06-24 15:24 ` [PATCH 4.9 23/39] driver core: Dont ignore class_dir_create_and_add() failure Greg Kroah-Hartman
2018-06-24 15:24 ` [PATCH 4.9 24/39] Btrfs: fix clone vs chattr NODATASUM race Greg Kroah-Hartman
2018-06-24 15:24 ` [PATCH 4.9 25/39] Btrfs: fix memory and mount leak in btrfs_ioctl_rm_dev_v2() Greg Kroah-Hartman
2018-06-24 15:24 ` [PATCH 4.9 26/39] btrfs: scrub: Dont use inode pages for device replace Greg Kroah-Hartman
2018-06-24 15:24 ` [PATCH 4.9 27/39] ALSA: hda - Handle kzalloc() failure in snd_hda_attach_pcm_stream() Greg Kroah-Hartman
2018-06-24 15:24 ` [PATCH 4.9 28/39] ALSA: hda: add dock and led support for HP EliteBook 830 G5 Greg Kroah-Hartman
2018-06-24 15:24 ` [PATCH 4.9 29/39] ALSA: hda: add dock and led support for HP ProBook 640 G4 Greg Kroah-Hartman
2018-06-24 15:24 ` [PATCH 4.9 30/39] smb3: on reconnect set PreviousSessionId field Greg Kroah-Hartman
2018-06-24 15:24 ` [PATCH 4.9 31/39] cpufreq: Fix new policy initialization during limits updates via sysfs Greg Kroah-Hartman
2018-06-24 15:24 ` [PATCH 4.9 32/39] libata: zpodd: make arrays cdb static, reduces object code size Greg Kroah-Hartman
2018-06-24 15:24 ` [PATCH 4.9 33/39] libata: zpodd: small read overflow in eject_tray() Greg Kroah-Hartman
2018-06-24 15:24 ` [PATCH 4.9 34/39] libata: Drop SanDisk SD7UB3Q*G1001 NOLPM quirk Greg Kroah-Hartman
2018-06-24 15:24 ` [PATCH 4.9 35/39] w1: mxc_w1: Enable clock before calling clk_get_rate() on it Greg Kroah-Hartman
2018-06-24 15:24 ` [PATCH 4.9 36/39] orangefs: set i_size on new symlink Greg Kroah-Hartman
2018-06-24 15:24 ` [PATCH 4.9 37/39] HID: intel_ish-hid: ipc: register more pm callbacks to support hibernation Greg Kroah-Hartman
2018-06-24 15:24 ` [PATCH 4.9 38/39] vhost: fix info leak due to uninitialized memory Greg Kroah-Hartman
2018-06-24 15:24 ` [PATCH 4.9 39/39] fs/binfmt_misc.c: do not allow offset overflow Greg Kroah-Hartman
2018-06-24 17:44 ` [PATCH 4.9 00/39] 4.9.110-stable review Nathan Chancellor
2018-06-25  0:55   ` Greg Kroah-Hartman
2018-06-25  5:06 ` Naresh Kamboju
2018-06-25  6:43   ` Greg Kroah-Hartman
2018-06-25 17:18 ` Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180624152353.535229696@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=alexander.levin@microsoft.com \
    --cc=bo.li.liu@oracle.com \
    --cc=dsterba@suse.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).