From: NeilBrown <neilb@suse.com>
To: Michael Wang <yun.wang@profitbricks.com>,
	linux-raid@vger.kernel.org,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Cc: Shaohua Li <shli@kernel.org>, Jinpu Wang <jinpu.wang@profitbricks.com>
Subject: Re: [RFC PATCH] raid1: reset 'bi_next' before reuse the bio
Date: Wed, 05 Apr 2017 08:17:52 +1000
Message-ID: <87shlnizqn.fsf@notabene.neil.brown.name>
In-Reply-To: <dcd33b53-5c6f-3ebc-8c07-04f0c0372796@profitbricks.com>

[-- Attachment #1: Type: text/plain, Size: 2653 bytes --]

On Tue, Apr 04 2017, Michael Wang wrote:

> During the testing we found the sync read bio can go through
> path:
>
> md_do_sync()
>   sync_request()
>     generic_make_request()
>       blk_queue_bio()
>         blk_attempt_plug_merge()
>           bio->bi_next CHAINED HERE
>
> ...
>
> raid1d()
>   sync_request_write()
>     fix_sync_read_error()
>       if FailFast && Faulty
>         bio->bi_end_io = end_sync_write
>     generic_make_request()
>       BUG_ON(bio->bi_next)
>
> This need to meet the conditions:
>   * bio once merged
>   * read disk have FailFast enabled
>   * read disk is Faulty
>
> And since the block layer won't reset the 'bi_next' after bio
> is done inside request, we hit the BUG like that.
>
> This patch simply reset the bi_next before we reuse it.
>
> Signed-off-by: Michael Wang <yun.wang@profitbricks.com>
> ---
>  drivers/md/raid1.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 7d67235..0554110 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1986,11 +1986,13 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
>  		/* Don't try recovering from here - just fail it
>  		 * ... unless it is the last working device of course */
>  		md_error(mddev, rdev);
> -		if (test_bit(Faulty, &rdev->flags))
> +		if (test_bit(Faulty, &rdev->flags)) {
>  			/* Don't try to read from here, but make sure
>  			 * put_buf does it's thing
>  			 */
>  			bio->bi_end_io = end_sync_write;
> +			bio->bi_next = NULL;
> +		}
>  	}
>
>  	while(sectors) {

Ah - I see what is happening now. I was looking at the vanilla 4.4
code, which doesn't have the failfast changes.

I don't think your patch is correct though. We really shouldn't be
re-using that bio, and setting bi_next to NULL just hides the bug. It
doesn't fix it.

As the rdev is now Faulty, it doesn't make sense for
sync_request_write() to submit a write request to it.

Can you confirm that this works please.

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index d2d8b8a5bd56..219f1e1f1d1d 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2180,6 +2180,8 @@ static void sync_request_write(struct mddev *mddev, struct r1bio *r1_bio)
 		    (i == r1_bio->read_disk ||
 		     !test_bit(MD_RECOVERY_SYNC, &mddev->recovery))))
 			continue;
+		if (test_bit(Faulty, &conf->mirrors[i].rdev->flags))
+			continue;

 		bio_set_op_attrs(wbio, REQ_OP_WRITE, 0);
 		if (test_bit(FailFast, &conf->mirrors[i].rdev->flags))

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]
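The difference between the two proposals above can be modelled in user space. What follows is only a rough sketch under stated assumptions, not kernel code: `struct fake_bio`, `struct fake_rdev`, `fake_submit()` and `fake_sync_write()` are hypothetical names invented for illustration, and `fake_submit()` returns false where the real submit path would trip `BUG_ON(bio->bi_next)`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bio; only the field under discussion. */
struct fake_bio {
	struct fake_bio *bi_next;	/* left non-NULL after a plug merge */
};

/* Hypothetical stand-in for one mirror: a Faulty flag plus its sync bio. */
struct fake_rdev {
	bool faulty;			/* models test_bit(Faulty, &rdev->flags) */
	struct fake_bio bio;
};

/* Models the submit-time check: false means the BUG_ON would have fired. */
static bool fake_submit(const struct fake_bio *bio)
{
	return bio->bi_next == NULL;
}

/*
 * Models the per-mirror write loop. With skip_faulty set, Faulty mirrors
 * are skipped before submission, so a stale bio on a failed device is
 * never reused. Returns the number of bios submitted, or -1 if a submit
 * would have hit the BUG.
 */
static int fake_sync_write(const struct fake_rdev *mirrors, size_t n,
			   bool skip_faulty)
{
	int submitted = 0;

	for (size_t i = 0; i < n; i++) {
		if (skip_faulty && mirrors[i].faulty)
			continue;
		if (!fake_submit(&mirrors[i].bio))
			return -1;
		submitted++;
	}
	return submitted;
}
```

With one Faulty mirror whose bio still has `bi_next` set and one healthy mirror, calling `fake_sync_write()` without the skip reports the would-be BUG, while calling it with the skip submits only the healthy mirror's bio. The stale pointer is left untouched rather than cleared, which is the point of skipping the device instead of resetting `bi_next`.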
Thread overview: 6+ messages

2017-04-04 13:50 [RFC PATCH] raid1: reset 'bi_next' before reuse the bio Michael Wang
2017-04-04 22:17 ` NeilBrown [this message]
2017-04-04 22:17 ` NeilBrown
2017-04-05  7:40 ` Michael Wang
2017-04-06  2:03 ` NeilBrown
2017-04-06  2:03 ` NeilBrown