All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] Don't allow REQ_NOWAIT for bio splitting
@ 2023-01-04 16:09 Jens Axboe
  2023-01-04 16:09 ` [PATCH 1/2] block: handle bio_split_to_limits() NULL return Jens Axboe
  2023-01-04 16:09 ` [PATCH 2/2] block: don't allow splitting of a REQ_NOWAIT bio Jens Axboe
  0 siblings, 2 replies; 10+ messages in thread
From: Jens Axboe @ 2023-01-04 16:09 UTC (permalink / raw)
  To: linux-block; +Cc: mikelley

Hi,

If we end up needing to split a bio that is marked with REQ_NOWAIT, it's
entirely possible that any part of the split bio will hit a failure
trying to get submitted. If this happens, then the entire request ends
up being errored with BLK_STS_AGAIN, even if any parts of this bio has
been read or written to the device/file.

Rather than end up in this odd state, disallow splitting with REQ_NOWAIT.
If we end the entire request upfront with EAGAIN, then we don't end up
with a partially written IO that still returns EAGAIN.

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/2] block: handle bio_split_to_limits() NULL return
  2023-01-04 16:09 [PATCH 0/2] Don't allow REQ_NOWAIT for bio splitting Jens Axboe
@ 2023-01-04 16:09 ` Jens Axboe
  2023-01-10  9:20   ` Christoph Hellwig
  2023-01-04 16:09 ` [PATCH 2/2] block: don't allow splitting of a REQ_NOWAIT bio Jens Axboe
  1 sibling, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2023-01-04 16:09 UTC (permalink / raw)
  To: linux-block; +Cc: mikelley, Jens Axboe

This can't happen right now, but in preparation for allowing
bio_split_to_limits() returning NULL if it ended the bio, check for it
in all the callers.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-merge.c             | 4 +++-
 block/blk-mq.c                | 5 ++++-
 drivers/block/drbd/drbd_req.c | 2 ++
 drivers/block/ps3vram.c       | 2 ++
 drivers/md/dm.c               | 2 ++
 drivers/md/md.c               | 2 ++
 drivers/nvme/host/multipath.c | 2 ++
 drivers/s390/block/dcssblk.c  | 2 ++
 8 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 35a8f75cc45d..071c5f8cf0cf 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -358,11 +358,13 @@ struct bio *__bio_split_to_limits(struct bio *bio,
 	default:
 		split = bio_split_rw(bio, lim, nr_segs, bs,
 				get_max_io_size(bio, lim) << SECTOR_SHIFT);
+		if (IS_ERR(split))
+			return NULL;
 		break;
 	}
 
 	if (split) {
-		/* there isn't chance to merge the splitted bio */
+		/* there isn't chance to merge the split bio */
 		split->bi_opf |= REQ_NOMERGE;
 
 		blkcg_bio_issue_init(split);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index c5cf0dbca1db..2c49b4151da1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2951,8 +2951,11 @@ void blk_mq_submit_bio(struct bio *bio)
 	blk_status_t ret;
 
 	bio = blk_queue_bounce(bio, q);
-	if (bio_may_exceed_limits(bio, &q->limits))
+	if (bio_may_exceed_limits(bio, &q->limits)) {
 		bio = __bio_split_to_limits(bio, &q->limits, &nr_segs);
+		if (!bio)
+			return;
+	}
 
 	if (!bio_integrity_prep(bio))
 		return;
diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index eb14ec8ec04c..e36216d50753 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -1607,6 +1607,8 @@ void drbd_submit_bio(struct bio *bio)
 	struct drbd_device *device = bio->bi_bdev->bd_disk->private_data;
 
 	bio = bio_split_to_limits(bio);
+	if (!bio)
+		return;
 
 	/*
 	 * what we "blindly" assume:
diff --git a/drivers/block/ps3vram.c b/drivers/block/ps3vram.c
index c76e0148eada..574e470b220b 100644
--- a/drivers/block/ps3vram.c
+++ b/drivers/block/ps3vram.c
@@ -587,6 +587,8 @@ static void ps3vram_submit_bio(struct bio *bio)
 	dev_dbg(&dev->core, "%s\n", __func__);
 
 	bio = bio_split_to_limits(bio);
+	if (!bio)
+		return;
 
 	spin_lock_irq(&priv->lock);
 	busy = !bio_list_empty(&priv->list);
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index e1ea3a7bd9d9..b424a6ee27ba 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1742,6 +1742,8 @@ static void dm_split_and_process_bio(struct mapped_device *md,
 		 * otherwise associated queue_limits won't be imposed.
 		 */
 		bio = bio_split_to_limits(bio);
+		if (!bio)
+			return;
 	}
 
 	init_clone_info(&ci, md, map, bio, is_abnormal);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 775f1dde190a..8af639296b3c 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -455,6 +455,8 @@ static void md_submit_bio(struct bio *bio)
 	}
 
 	bio = bio_split_to_limits(bio);
+	if (!bio)
+		return;
 
 	if (mddev->ro == MD_RDONLY && unlikely(rw == WRITE)) {
 		if (bio_sectors(bio) != 0)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index c03093b6813c..fc39d01e7b63 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -376,6 +376,8 @@ static void nvme_ns_head_submit_bio(struct bio *bio)
 	 * pool from the original queue to allocate the bvecs from.
 	 */
 	bio = bio_split_to_limits(bio);
+	if (!bio)
+		return;
 
 	srcu_idx = srcu_read_lock(&head->srcu);
 	ns = nvme_find_path(head);
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index b392b9f5482e..c0f85ffb2b62 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -865,6 +865,8 @@ dcssblk_submit_bio(struct bio *bio)
 	unsigned long bytes_done;
 
 	bio = bio_split_to_limits(bio);
+	if (!bio)
+		return;
 
 	bytes_done = 0;
 	dev_info = bio->bi_bdev->bd_disk->private_data;
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/2] block: don't allow splitting of a REQ_NOWAIT bio
  2023-01-04 16:09 [PATCH 0/2] Don't allow REQ_NOWAIT for bio splitting Jens Axboe
  2023-01-04 16:09 ` [PATCH 1/2] block: handle bio_split_to_limits() NULL return Jens Axboe
@ 2023-01-04 16:09 ` Jens Axboe
  2023-01-04 19:13   ` Keith Busch
  2023-01-10  9:21   ` Christoph Hellwig
  1 sibling, 2 replies; 10+ messages in thread
From: Jens Axboe @ 2023-01-04 16:09 UTC (permalink / raw)
  To: linux-block; +Cc: mikelley, Jens Axboe, stable

If we split a bio marked with REQ_NOWAIT, then we can trigger spurious
EAGAIN if constituent parts of that split bio end up failing request
allocations. Parts will complete just fine, but just a single failure
in one of the chained bios will yield an EAGAIN final result for the
parent bio.

Return EAGAIN early if we end up needing to split such a bio, which
allows for saner recovery handling.

Cc: stable@vger.kernel.org # 5.15+
Link: https://github.com/axboe/liburing/issues/766
Reported-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-merge.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 071c5f8cf0cf..b7c193d67185 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -309,6 +309,16 @@ static struct bio *bio_split_rw(struct bio *bio, const struct queue_limits *lim,
 	*segs = nsegs;
 	return NULL;
 split:
+	/*
+	 * We can't sanely support splitting for a REQ_NOWAIT bio. End it
+	 * with EAGAIN if splitting is required and return an error pointer.
+	 */
+	if (bio->bi_opf & REQ_NOWAIT) {
+		bio->bi_status = BLK_STS_AGAIN;
+		bio_endio(bio);
+		return ERR_PTR(-EAGAIN);
+	}
+
 	*segs = nsegs;
 
 	/*
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] block: don't allow splitting of a REQ_NOWAIT bio
  2023-01-04 16:09 ` [PATCH 2/2] block: don't allow splitting of a REQ_NOWAIT bio Jens Axboe
@ 2023-01-04 19:13   ` Keith Busch
  2023-01-04 20:24     ` Jens Axboe
  2023-01-10  9:21   ` Christoph Hellwig
  1 sibling, 1 reply; 10+ messages in thread
From: Keith Busch @ 2023-01-04 19:13 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, mikelley, stable

On Wed, Jan 04, 2023 at 09:09:38AM -0700, Jens Axboe wrote:
> If we split a bio marked with REQ_NOWAIT, then we can trigger spurious
> EAGAIN if constituent parts of that split bio end up failing request
> allocations. Parts will complete just fine, but just a single failure
> in one of the chained bios will yield an EAGAIN final result for the
> parent bio.
> 
> Return EAGAIN early if we end up needing to split such a bio, which
> allows for saner recovery handling.

We're losing some performance here for large-ish single depth IO with
nvme. We can get a little back by forcing to use the async worker
earlier in the dispatch instead of getting all the way to the bio
splitting, but the overhead to check for the condition (which is
arbitrary decision anyway since we don't know the queue limits at
io_uring prep time) mostly negates the gain.

It's probably fine, though, since you can still hit peak b/w with just a
little higher qdepth. This patch is a simple way to handle the problem,
so looks good to me.

Reviewed-by: Keith Busch <kbusch@kernel.org>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] block: don't allow splitting of a REQ_NOWAIT bio
  2023-01-04 19:13   ` Keith Busch
@ 2023-01-04 20:24     ` Jens Axboe
  0 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2023-01-04 20:24 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-block, mikelley, stable

On 1/4/23 12:13?PM, Keith Busch wrote:
> On Wed, Jan 04, 2023 at 09:09:38AM -0700, Jens Axboe wrote:
>> If we split a bio marked with REQ_NOWAIT, then we can trigger spurious
>> EAGAIN if constituent parts of that split bio end up failing request
>> allocations. Parts will complete just fine, but just a single failure
>> in one of the chained bios will yield an EAGAIN final result for the
>> parent bio.
>>
>> Return EAGAIN early if we end up needing to split such a bio, which
>> allows for saner recovery handling.
> 
> We're losing some performance here for large-ish single depth IO with
> nvme. We can get a little back by forcing to use the async worker
> earlier in the dispatch instead of getting all the way to the bio
> splitting, but the overhead to check for the condition (which is
> arbitrary decision anyway since we don't know the queue limits at
> io_uring prep time) mostly negates the gain.

Yes, it's not perfect - for perfection, we'd need to be able to either
arbitrarily retry parts of the split bio if we can't get a tag. Or
reserve tags for this request when doing the splits. Either one of those
would require extensive surgery to achieve. In my testing, the cost is
low enough that I think we can live with this for now.

> It's probably fine, though, since you can still hit peak b/w with just a
> little higher qdepth. This patch is a simple way to handle the problem,
> so looks good to me.
> 
> Reviewed-by: Keith Busch <kbusch@kernel.org>

Thanks!

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] block: handle bio_split_to_limits() NULL return
  2023-01-04 16:09 ` [PATCH 1/2] block: handle bio_split_to_limits() NULL return Jens Axboe
@ 2023-01-10  9:20   ` Christoph Hellwig
  2023-01-17  6:01     ` Christoph Hellwig
  2023-01-18  2:22     ` Jens Axboe
  0 siblings, 2 replies; 10+ messages in thread
From: Christoph Hellwig @ 2023-01-10  9:20 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, mikelley

On Wed, Jan 04, 2023 at 09:09:37AM -0700, Jens Axboe wrote:
> This can't happen right now, but in preparation for allowing
> bio_split_to_limits() returning NULL if it ended the bio, check for it
> in all the callers.
> 
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  block/blk-merge.c             | 4 +++-
>  block/blk-mq.c                | 5 ++++-
>  drivers/block/drbd/drbd_req.c | 2 ++
>  drivers/block/ps3vram.c       | 2 ++
>  drivers/md/dm.c               | 2 ++
>  drivers/md/md.c               | 2 ++
>  drivers/nvme/host/multipath.c | 2 ++
>  drivers/s390/block/dcssblk.c  | 2 ++
>  8 files changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 35a8f75cc45d..071c5f8cf0cf 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -358,11 +358,13 @@ struct bio *__bio_split_to_limits(struct bio *bio,
>  	default:
>  		split = bio_split_rw(bio, lim, nr_segs, bs,
>  				get_max_io_size(bio, lim) << SECTOR_SHIFT);
> +		if (IS_ERR(split))
> +			return NULL;

Can we decide on either passing an ERR_PTR or NULL and do it through
the whole stack? 

> diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
> index eb14ec8ec04c..e36216d50753 100644
> --- a/drivers/block/drbd/drbd_req.c
> +++ b/drivers/block/drbd/drbd_req.c
> @@ -1607,6 +1607,8 @@ void drbd_submit_bio(struct bio *bio)
>  	struct drbd_device *device = bio->bi_bdev->bd_disk->private_data;
>  
>  	bio = bio_split_to_limits(bio);
> +	if (!bio)
> +		return;

So for the callers in drivers, do we need thee checks for drivers
that don't even support REQ_NOWAIT? 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] block: don't allow splitting of a REQ_NOWAIT bio
  2023-01-04 16:09 ` [PATCH 2/2] block: don't allow splitting of a REQ_NOWAIT bio Jens Axboe
  2023-01-04 19:13   ` Keith Busch
@ 2023-01-10  9:21   ` Christoph Hellwig
  2023-01-17  6:01     ` Christoph Hellwig
  1 sibling, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2023-01-10  9:21 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, mikelley, stable

On Wed, Jan 04, 2023 at 09:09:38AM -0700, Jens Axboe wrote:
>  split:
> +	/*
> +	 * We can't sanely support splitting for a REQ_NOWAIT bio. End it
> +	 * with EAGAIN if splitting is required and return an error pointer.
> +	 */
> +	if (bio->bi_opf & REQ_NOWAIT) {
> +		bio->bi_status = BLK_STS_AGAIN;
> +		bio_endio(bio);
> +		return ERR_PTR(-EAGAIN);
> +	}

Hmm.  Just completing the bio here seems a little dangerous in terms
of ownership.  What speaks against letting the caller do it?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] block: handle bio_split_to_limits() NULL return
  2023-01-10  9:20   ` Christoph Hellwig
@ 2023-01-17  6:01     ` Christoph Hellwig
  2023-01-18  2:22     ` Jens Axboe
  1 sibling, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2023-01-17  6:01 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, mikelley

Per the question in the other thread: these are my comments to it.

On Tue, Jan 10, 2023 at 01:20:26AM -0800, Christoph Hellwig wrote:
> >  		split = bio_split_rw(bio, lim, nr_segs, bs,
> >  				get_max_io_size(bio, lim) << SECTOR_SHIFT);
> > +		if (IS_ERR(split))
> > +			return NULL;
> 
> Can we decide on either passing an ERR_PTR or NULL and do it through
> the whole stack? 
> 
> > diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
> > index eb14ec8ec04c..e36216d50753 100644
> > --- a/drivers/block/drbd/drbd_req.c
> > +++ b/drivers/block/drbd/drbd_req.c
> > @@ -1607,6 +1607,8 @@ void drbd_submit_bio(struct bio *bio)
> >  	struct drbd_device *device = bio->bi_bdev->bd_disk->private_data;
> >  
> >  	bio = bio_split_to_limits(bio);
> > +	if (!bio)
> > +		return;
> 
> So for the callers in drivers, do we need thee checks for drivers
> that don't even support REQ_NOWAIT? 
---end quoted text---

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] block: don't allow splitting of a REQ_NOWAIT bio
  2023-01-10  9:21   ` Christoph Hellwig
@ 2023-01-17  6:01     ` Christoph Hellwig
  0 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2023-01-17  6:01 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, mikelley, stable

And this is the other comment.

On Tue, Jan 10, 2023 at 01:21:24AM -0800, Christoph Hellwig wrote:
> On Wed, Jan 04, 2023 at 09:09:38AM -0700, Jens Axboe wrote:
> >  split:
> > +	/*
> > +	 * We can't sanely support splitting for a REQ_NOWAIT bio. End it
> > +	 * with EAGAIN if splitting is required and return an error pointer.
> > +	 */
> > +	if (bio->bi_opf & REQ_NOWAIT) {
> > +		bio->bi_status = BLK_STS_AGAIN;
> > +		bio_endio(bio);
> > +		return ERR_PTR(-EAGAIN);
> > +	}
> 
> Hmm.  Just completing the bio here seems a little dangerous in terms
> of ownership.  What speaks against letting the caller do it?
---end quoted text---

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] block: handle bio_split_to_limits() NULL return
  2023-01-10  9:20   ` Christoph Hellwig
  2023-01-17  6:01     ` Christoph Hellwig
@ 2023-01-18  2:22     ` Jens Axboe
  1 sibling, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2023-01-18  2:22 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block, mikelley

On 1/10/23 2:20 AM, Christoph Hellwig wrote:
> On Wed, Jan 04, 2023 at 09:09:37AM -0700, Jens Axboe wrote:
>> This can't happen right now, but in preparation for allowing
>> bio_split_to_limits() returning NULL if it ended the bio, check for it
>> in all the callers.
>>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>> ---
>>  block/blk-merge.c             | 4 +++-
>>  block/blk-mq.c                | 5 ++++-
>>  drivers/block/drbd/drbd_req.c | 2 ++
>>  drivers/block/ps3vram.c       | 2 ++
>>  drivers/md/dm.c               | 2 ++
>>  drivers/md/md.c               | 2 ++
>>  drivers/nvme/host/multipath.c | 2 ++
>>  drivers/s390/block/dcssblk.c  | 2 ++
>>  8 files changed, 19 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>> index 35a8f75cc45d..071c5f8cf0cf 100644
>> --- a/block/blk-merge.c
>> +++ b/block/blk-merge.c
>> @@ -358,11 +358,13 @@ struct bio *__bio_split_to_limits(struct bio *bio,
>>  	default:
>>  		split = bio_split_rw(bio, lim, nr_segs, bs,
>>  				get_max_io_size(bio, lim) << SECTOR_SHIFT);
>> +		if (IS_ERR(split))
>> +			return NULL;
> 
> Can we decide on either passing an ERR_PTR or NULL and do it through
> the whole stack? 

We need the error return here as we could just return NULL and it not
be an error, but for further down the stack I feel like returning an
error that you can't do anything with (as it's already been dealt with)
would be confusing. That's why I did it that way.

>> diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
>> index eb14ec8ec04c..e36216d50753 100644
>> --- a/drivers/block/drbd/drbd_req.c
>> +++ b/drivers/block/drbd/drbd_req.c
>> @@ -1607,6 +1607,8 @@ void drbd_submit_bio(struct bio *bio)
>>  	struct drbd_device *device = bio->bi_bdev->bd_disk->private_data;
>>  
>>  	bio = bio_split_to_limits(bio);
>> +	if (!bio)
>> +		return;
> 
> So for the callers in drivers, do we need thee checks for drivers
> that don't even support REQ_NOWAIT? 

Good point, probably not, we should be erroring these out before they
reach the driver.

Doesn't hurt though, it would not necessarily be obvious that you'd
now need to start checking bio_split_to_limits() return values when
you set NOWAIT on the queue.

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-01-18  2:22 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-04 16:09 [PATCH 0/2] Don't allow REQ_NOWAIT for bio splitting Jens Axboe
2023-01-04 16:09 ` [PATCH 1/2] block: handle bio_split_to_limits() NULL return Jens Axboe
2023-01-10  9:20   ` Christoph Hellwig
2023-01-17  6:01     ` Christoph Hellwig
2023-01-18  2:22     ` Jens Axboe
2023-01-04 16:09 ` [PATCH 2/2] block: don't allow splitting of a REQ_NOWAIT bio Jens Axboe
2023-01-04 19:13   ` Keith Busch
2023-01-04 20:24     ` Jens Axboe
2023-01-10  9:21   ` Christoph Hellwig
2023-01-17  6:01     ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.