linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] block: only update parent bi_status when bio fail
@ 2021-03-31 11:53 Yufen Yu
  2021-04-01  0:54 ` Ming Lei
  2021-04-01  1:18 ` Jens Axboe
  0 siblings, 2 replies; 3+ messages in thread
From: Yufen Yu @ 2021-03-31 11:53 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, hch, ming.lei, bvanassche, kbusch

For multiple split bios, if one of the bio is fail, the whole
should return error to application. But we found there is a race
between bio_integrity_verify_fn and bio complete, which return
io success to application after one of the bio fail. The race as
following:

split bio(READ)          kworker

nvme_complete_rq
blk_update_request //split error=0
  bio_endio
    bio_integrity_endio
      queue_work(kintegrityd_wq, &bip->bip_work);

                         bio_integrity_verify_fn
                         bio_endio //split bio
                          __bio_chain_endio
                             if (!parent->bi_status)

                               <interrupt entry>
                               nvme_irq
                                 blk_update_request //parent error=7
                                 req_bio_endio
                                    bio->bi_status = 7 //parent bio
                               <interrupt exit>

                               parent->bi_status = 0
                        parent->bi_end_io() // return bi_status=0

The bio has been split as two: split and parent. When split
bio completed, it depends on kworker to do endio, while
bio_integrity_verify_fn have been interrupted by parent bio
complete irq handler. Then, parent bio->bi_status which have
been set in irq handler will overwrite by kworker.

In fact, even without the above race, we also need to conside
the concurrency beteen mulitple split bio complete and update
the same parent bi_status. Normally, multiple split bios will
be issued to the same hctx and complete from the same irq
vector. But if we have updated queue map between multiple split
bios, these bios may complete on different hw queue and different
irq vector. Then the concurrency update parent bi_status may
cause the final status error.

Suggested-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
---
 block/bio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/bio.c b/block/bio.c
index 26b7f721cda8..f49713ff8ad3 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -277,7 +277,7 @@ static struct bio *__bio_chain_endio(struct bio *bio)
 {
 	struct bio *parent = bio->bi_private;
 
-	if (!parent->bi_status)
+	if (bio->bi_status && !parent->bi_status)
 		parent->bi_status = bio->bi_status;
 	bio_put(bio);
 	return parent;
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] block: only update parent bi_status when bio fail
  2021-03-31 11:53 [PATCH] block: only update parent bi_status when bio fail Yufen Yu
@ 2021-04-01  0:54 ` Ming Lei
  2021-04-01  1:18 ` Jens Axboe
  1 sibling, 0 replies; 3+ messages in thread
From: Ming Lei @ 2021-04-01  0:54 UTC (permalink / raw)
  To: Yufen Yu; +Cc: axboe, linux-block, hch, bvanassche, kbusch

On Wed, Mar 31, 2021 at 07:53:59AM -0400, Yufen Yu wrote:
> For multiple split bios, if one of the bio is fail, the whole
> should return error to application. But we found there is a race
> between bio_integrity_verify_fn and bio complete, which return
> io success to application after one of the bio fail. The race as
> following:
> 
> split bio(READ)          kworker
> 
> nvme_complete_rq
> blk_update_request //split error=0
>   bio_endio
>     bio_integrity_endio
>       queue_work(kintegrityd_wq, &bip->bip_work);
> 
>                          bio_integrity_verify_fn
>                          bio_endio //split bio
>                           __bio_chain_endio
>                              if (!parent->bi_status)
> 
>                                <interrupt entry>
>                                nvme_irq
>                                  blk_update_request //parent error=7
>                                  req_bio_endio
>                                     bio->bi_status = 7 //parent bio
>                                <interrupt exit>
> 
>                                parent->bi_status = 0
>                         parent->bi_end_io() // return bi_status=0
> 
> The bio has been split as two: split and parent. When split
> bio completed, it depends on kworker to do endio, while
> bio_integrity_verify_fn have been interrupted by parent bio
> complete irq handler. Then, parent bio->bi_status which have
> been set in irq handler will overwrite by kworker.
> 
> In fact, even without the above race, we also need to conside
> the concurrency beteen mulitple split bio complete and update
> the same parent bi_status. Normally, multiple split bios will
> be issued to the same hctx and complete from the same irq
> vector. But if we have updated queue map between multiple split
> bios, these bios may complete on different hw queue and different
> irq vector. Then the concurrency update parent bi_status may
> cause the final status error.
> 
> Suggested-by: Keith Busch <kbusch@kernel.org>
> Signed-off-by: Yufen Yu <yuyufen@huawei.com>
> ---
>  block/bio.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index 26b7f721cda8..f49713ff8ad3 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -277,7 +277,7 @@ static struct bio *__bio_chain_endio(struct bio *bio)
>  {
>  	struct bio *parent = bio->bi_private;
>  
> -	if (!parent->bi_status)
> +	if (bio->bi_status && !parent->bi_status)
>  		parent->bi_status = bio->bi_status;
>  	bio_put(bio);
>  	return parent;
> -- 
> 2.25.4
> 

Looks fine:

Reviewed-by: Ming Lei <ming.lei@redhat.com>

-- 
Ming


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] block: only update parent bi_status when bio fail
  2021-03-31 11:53 [PATCH] block: only update parent bi_status when bio fail Yufen Yu
  2021-04-01  0:54 ` Ming Lei
@ 2021-04-01  1:18 ` Jens Axboe
  1 sibling, 0 replies; 3+ messages in thread
From: Jens Axboe @ 2021-04-01  1:18 UTC (permalink / raw)
  To: Yufen Yu; +Cc: linux-block, hch, ming.lei, bvanassche, kbusch

On 3/31/21 5:53 AM, Yufen Yu wrote:
> For multiple split bios, if one of the bio is fail, the whole
> should return error to application. But we found there is a race
> between bio_integrity_verify_fn and bio complete, which return
> io success to application after one of the bio fail. The race as
> following:
> 
> split bio(READ)          kworker
> 
> nvme_complete_rq
> blk_update_request //split error=0
>   bio_endio
>     bio_integrity_endio
>       queue_work(kintegrityd_wq, &bip->bip_work);
> 
>                          bio_integrity_verify_fn
>                          bio_endio //split bio
>                           __bio_chain_endio
>                              if (!parent->bi_status)
> 
>                                <interrupt entry>
>                                nvme_irq
>                                  blk_update_request //parent error=7
>                                  req_bio_endio
>                                     bio->bi_status = 7 //parent bio
>                                <interrupt exit>
> 
>                                parent->bi_status = 0
>                         parent->bi_end_io() // return bi_status=0
> 
> The bio has been split as two: split and parent. When split
> bio completed, it depends on kworker to do endio, while
> bio_integrity_verify_fn have been interrupted by parent bio
> complete irq handler. Then, parent bio->bi_status which have
> been set in irq handler will overwrite by kworker.
> 
> In fact, even without the above race, we also need to conside
> the concurrency beteen mulitple split bio complete and update
> the same parent bi_status. Normally, multiple split bios will
> be issued to the same hctx and complete from the same irq
> vector. But if we have updated queue map between multiple split
> bios, these bios may complete on different hw queue and different
> irq vector. Then the concurrency update parent bi_status may
> cause the final status error.

Applied, thanks.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-04-01  1:19 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-31 11:53 [PATCH] block: only update parent bi_status when bio fail Yufen Yu
2021-04-01  0:54 ` Ming Lei
2021-04-01  1:18 ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).