From: NeilBrown <neilb@suse.de>
To: Jes Sorensen <Jes.Sorensen@redhat.com>
Cc: "Manibalan P" <pmanibalan@amiindia.co.in>,
	"Pasi Kärkkäinen" <pasik@iki.fi>,
	linux-raid <linux-raid@vger.kernel.org>
Subject: Re: md_raid5 using 100% CPU and hang with status resync=PENDING, if a drive is removed during initialization
Date: Wed, 18 Feb 2015 11:27:41 +1100
Message-ID: <20150218112741.08495514@notabene.brown>
In-Reply-To: <wrfj61b0gk6l.fsf@redhat.com>

On Tue, 17 Feb 2015 19:03:30 -0500 Jes Sorensen <Jes.Sorensen@redhat.com>
wrote:

> Jes Sorensen <Jes.Sorensen@redhat.com> writes:
> > Jes Sorensen <Jes.Sorensen@redhat.com> writes:
> >> NeilBrown <neilb@suse.de> writes:
> >>> On Mon, 2 Feb 2015 07:10:14 +0000 Manibalan P <pmanibalan@amiindia.co.in>
> >>> wrote:
> >>>
> >>>> Dear All,
> >>>> 	Any updates on this issue.
> >>>
> >>> Probably the same as:
> >>>
> >>>   http://marc.info/?l=linux-raid&m=142283560704091&w=2
> >>
> >> Hi Neil,
> >>
> >> I ran some tests on this one against the latest Linus' tree as of today
> >> (1fa185ebcbcefdc5229c783450c9f0439a69f0c1) which I believe includes all
> >> your pending 3.20 patches.
> >>
> >> I am able to reproduce Manibalan's hangs on a system with 4 SSDs if I
> >> run fio on top of a device while it is resyncing and I fail one of the
> >> devices.
> >
> > Since Manibalan mentioned this issue wasn't present in earlier kernels,
> > I started trying to track down what change caused it.
> >
> > So far I have been able to reproduce the hang as far back as 3.10.
> 
> After a lot of bisecting I finally traced the issue back to this commit:
> 
> a7854487cd7128a30a7f4f5259de9f67d5efb95f is the first bad commit
> commit a7854487cd7128a30a7f4f5259de9f67d5efb95f
> Author: Alexander Lyakas <alex.bolshoy@gmail.com>
> Date:   Thu Oct 11 13:50:12 2012 +1100
> 
>     md: When RAID5 is dirty, force reconstruct-write instead of read-modify-write.
>     
>     Signed-off-by: Alex Lyakas <alex@zadarastorage.com>
>     Suggested-by: Yair Hershko <yair@zadarastorage.com>
>     Signed-off-by: NeilBrown <neilb@suse.de>
> 
> If I revert that one I cannot reproduce the hang, applying it reproduces
> the hang consistently.

Thanks for all the research!

That is consistent with what you already reported.
You noted that it doesn't affect RAID6, and RAID6 doesn't have an RMW cycle.

Also, one of the early emails from Manibalan contained:

handling stripe 273480328, state=0x2041 cnt=1, pd_idx=5, qd_idx=-1
, check:0, reconstruct:0
check 5: state 0x10 read           (null) write           (null) written           (null)
check 4: state 0x11 read           (null) write           (null) written           (null)
check 3: state 0x0 read           (null) write           (null) written           (null)
check 2: state 0x11 read           (null) write           (null) written           (null)
check 1: state 0x11 read           (null) write           (null) written           (null)
check 0: state 0x18 read           (null) write ffff8808029b6b00 written           (null)
locked=0 uptodate=3 to_read=0 to_write=1 failed=1 failed_num=3,-1
force RCW max_degraded=1, recovery_cp=7036944 sh->sector=273480328
for sector 273480328, rmw=2 rcw=1

So it is forcing RCW, even though a single block update is usually handled
with RMW.

In this stripe, the parity disk is '5' and disk 3 has failed.
That means to perform an RCW, we need to read the parity block in order
to reconstruct the content of the failed disk.  And if we were to do that,
we may as well just do an RMW.

So I think the correct fix would be to only force RCW when the array
is not degraded.

So something like this:

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index aa76865b804b..fa8f8b94bfa8 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3170,7 +3170,8 @@ static void handle_stripe_dirtying(struct r5conf *conf,
 	 * generate correct data from the parity.
 	 */
 	if (conf->max_degraded == 2 ||
-	    (recovery_cp < MaxSector && sh->sector >= recovery_cp)) {
+	    (recovery_cp < MaxSector && sh->sector >= recovery_cp &&
+	     s->failed == 0)) {
 		/* Calculate the real rcw later - for now make it
 		 * look like rcw is cheaper
 		 */


I don't think reverting the whole patch is necessary: that would discard
functionality which is still useful while the array is not degraded.

Can you test this patch please?

Thanks!

NeilBrown
