From: NeilBrown <neilb@suse.com>
To: Michael Wang <yun.wang@profitbricks.com>,
	linux-raid@vger.kernel.org,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Cc: Shaohua Li <shli@kernel.org>, Jinpu Wang <jinpu.wang@profitbricks.com>
Subject: Re: [RFC PATCH] raid1: reset 'bi_next' before reuse the bio
Date: Wed, 05 Apr 2017 08:17:52 +1000
Message-ID: <87shlnizqn.fsf@notabene.neil.brown.name>
In-Reply-To: <dcd33b53-5c6f-3ebc-8c07-04f0c0372796@profitbricks.com>

On Tue, Apr 04 2017, Michael Wang wrote:

> During testing we found that the sync read bio can go through
> the following path:
>
>   md_do_sync()
>     sync_request()
>       generic_make_request()
>         blk_queue_bio()
>           blk_attempt_plug_merge()
>             bio->bi_next CHAINED HERE
>
>   ...
>
>   raid1d()
>     sync_request_write()
>       fix_sync_read_error()
>         if FailFast && Faulty
>           bio->bi_end_io = end_sync_write
>       generic_make_request()
>         BUG_ON(bio->bi_next)
>
> This requires all of the following conditions:
>   * the bio was merged at some point
>   * the read disk has FailFast enabled
>   * the read disk is Faulty
>
> Since the block layer does not reset 'bi_next' after the bio
> completes inside a request, we hit the BUG_ON above.
>
> This patch simply resets bi_next before we reuse the bio.
>
> Signed-off-by: Michael Wang <yun.wang@profitbricks.com>
> ---
>  drivers/md/raid1.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 7d67235..0554110 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1986,11 +1986,13 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
>  		/* Don't try recovering from here - just fail it
>  		 * ... unless it is the last working device of course */
>  		md_error(mddev, rdev);
> -		if (test_bit(Faulty, &rdev->flags))
> +		if (test_bit(Faulty, &rdev->flags)) {
>  			/* Don't try to read from here, but make sure
>  			 * put_buf does it's thing
>  			 */
>  			bio->bi_end_io = end_sync_write;
> +			bio->bi_next = NULL;
> +		}
>  	}
>  
>  	while(sectors) {


Ah - I see what is happening now.  I was looking at the vanilla 4.4
code, which doesn't have the failfast changes.

I don't think your patch is correct though.  We really shouldn't be
re-using that bio, and setting bi_next to NULL just hides the bug.  It
doesn't fix it.
As the rdev is now Faulty, it doesn't make sense for
sync_request_write() to submit a write request to it.
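
For readers following along, the failure in the quoted report can be modelled in userspace. This is only a sketch under stated assumptions: toy_bio, toy_merge(), toy_submit_ok() and toy_demo() are hypothetical names invented for illustration, not kernel APIs, and the model keeps nothing but the bi_next link. It shows that a bio which was chained by a merge and never unlinked trips the bi_next check when it is submitted again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct bio; only the link field matters here. */
struct toy_bio {
	struct toy_bio *bi_next;
};

/* Models a back merge: the request's tail bio is linked to the new bio. */
static void toy_merge(struct toy_bio *tail, struct toy_bio *incoming)
{
	tail->bi_next = incoming;
}

/* Models the BUG_ON(bio->bi_next) check; false is where the kernel would BUG. */
static bool toy_submit_ok(const struct toy_bio *bio)
{
	return bio->bi_next == NULL;
}

/* Walks through the reported sequence; the asserts document each step. */
static void toy_demo(void)
{
	struct toy_bio sync_read = { NULL }, neighbour = { NULL };

	assert(toy_submit_ok(&sync_read));   /* first submission is fine */
	toy_merge(&sync_read, &neighbour);   /* plug merge chains bi_next */
	/* In this model, completion leaves the link in place. */
	assert(!toy_submit_ok(&sync_read));  /* reuse would hit the check */
}
```

Clearing bi_next before the second submission would make the check pass in this model too, which is the sense in which it only hides the problem: the bio would still be sent to a device that is already Faulty.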

Can you confirm that this works, please?

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index d2d8b8a5bd56..219f1e1f1d1d 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2180,6 +2180,8 @@ static void sync_request_write(struct mddev *mddev, struct r1bio *r1_bio)
 		     (i == r1_bio->read_disk ||
 		      !test_bit(MD_RECOVERY_SYNC, &mddev->recovery))))
 			continue;
+		if (test_bit(Faulty, &conf->mirrors[i].rdev->flags))
+			continue;
 
 		bio_set_op_attrs(wbio, REQ_OP_WRITE, 0);
 		if (test_bit(FailFast, &conf->mirrors[i].rdev->flags))
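
The effect of the added check can be sketched outside the kernel as well. The code below is a simplified model, not the raid1 code: toy_mirror, toy_count_sync_writes() and toy_skip_demo() are hypothetical names, and the two booleans stand in for the Faulty bit in rdev->flags and for a bio whose bi_end_io is end_sync_write. It counts how many mirror legs the write loop would submit to once Faulty legs are skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for one mirror leg; the real state lives in rdev->flags. */
struct toy_mirror {
	bool faulty;     /* models test_bit(Faulty, &rdev->flags) */
	bool has_write;  /* models bi_end_io == end_sync_write */
};

/* Counts the writes the loop would submit, skipping Faulty legs. */
static size_t toy_count_sync_writes(const struct toy_mirror *m, size_t n)
{
	size_t submitted = 0;

	for (size_t i = 0; i < n; i++) {
		if (!m[i].has_write)
			continue;
		if (m[i].faulty)  /* models the check the diff above adds */
			continue;
		submitted++;
	}
	return submitted;
}

/* A failed read leg marked for write is skipped; the healthy leg is not. */
static void toy_skip_demo(void)
{
	const struct toy_mirror legs[2] = {
		{ .faulty = true,  .has_write = true },
		{ .faulty = false, .has_write = true },
	};

	assert(toy_count_sync_writes(legs, 2) == 1);
}
```

In this model the leg that failed its read never reaches submission, so the question of whether its bio still carries a stale bi_next does not arise.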


Thanks,
NeilBrown
