linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] block: avoid blk_bio_segment_split for small I/O operations
@ 2019-11-04 18:05 Christoph Hellwig
  2019-11-04 18:24 ` Jens Axboe
  2019-11-04 19:30 ` Keith Busch
  0 siblings, 2 replies; 10+ messages in thread
From: Christoph Hellwig @ 2019-11-04 18:05 UTC (permalink / raw)
  To: axboe, ming.lei, linux-block

__blk_queue_split() adds significant overhead for small I/O operations.
Add a shortcut to avoid it for cases where we know we never need to
split.

Based on a patch from Ming Lei.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-merge.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 48e6725b32ee..06eb38357b41 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -293,7 +293,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 void __blk_queue_split(struct request_queue *q, struct bio **bio,
 		unsigned int *nr_segs)
 {
-	struct bio *split;
+	struct bio *split = NULL;
 
 	switch (bio_op(*bio)) {
 	case REQ_OP_DISCARD:
@@ -309,6 +309,19 @@ void __blk_queue_split(struct request_queue *q, struct bio **bio,
 				nr_segs);
 		break;
 	default:
+		/*
+		 * All drivers must accept single-segments bios that are <=
+		 * PAGE_SIZE.  This is a quick and dirty check that relies on
+		 * the fact that bi_io_vec[0] is always valid if a bio has data.
+		 * The check might lead to occasional false negatives when bios
+		 * are cloned, but compared to the performance impact of cloned
+		 * bios themselves the loop below doesn't matter anyway.
+		 */
+		if ((*bio)->bi_vcnt == 1 &&
+		    (*bio)->bi_io_vec[0].bv_len <= PAGE_SIZE) {
+			*nr_segs = 1;
+			break;
+		}
 		split = blk_bio_segment_split(q, *bio, &q->bio_split, nr_segs);
 		break;
 	}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] block: avoid blk_bio_segment_split for small I/O operations
  2019-11-04 18:05 [PATCH] block: avoid blk_bio_segment_split for small I/O operations Christoph Hellwig
@ 2019-11-04 18:24 ` Jens Axboe
  2019-11-04 19:30 ` Keith Busch
  1 sibling, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2019-11-04 18:24 UTC (permalink / raw)
  To: Christoph Hellwig, ming.lei, linux-block

On 11/4/19 11:05 AM, Christoph Hellwig wrote:
> __blk_queue_split() adds significant overhead for small I/O operations.
> Add a shortcut to avoid it for cases where we know we never need to
> split.
> 
> Based on a patch from Ming Lei.

Applied, thanks.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] block: avoid blk_bio_segment_split for small I/O operations
  2019-11-04 18:05 [PATCH] block: avoid blk_bio_segment_split for small I/O operations Christoph Hellwig
  2019-11-04 18:24 ` Jens Axboe
@ 2019-11-04 19:30 ` Keith Busch
  2019-11-04 20:13   ` Jens Axboe
  1 sibling, 1 reply; 10+ messages in thread
From: Keith Busch @ 2019-11-04 19:30 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: axboe, ming.lei, linux-block

On Mon, Nov 04, 2019 at 10:05:43AM -0800, Christoph Hellwig wrote:
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 48e6725b32ee..06eb38357b41 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -293,7 +293,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
>  void __blk_queue_split(struct request_queue *q, struct bio **bio,
>  		unsigned int *nr_segs)
>  {
> -	struct bio *split;
> +	struct bio *split = NULL;
>  
>  	switch (bio_op(*bio)) {
>  	case REQ_OP_DISCARD:
> @@ -309,6 +309,19 @@ void __blk_queue_split(struct request_queue *q, struct bio **bio,
>  				nr_segs);
>  		break;
>  	default:
> +		/*
> +		 * All drivers must accept single-segments bios that are <=
> +		 * PAGE_SIZE.  This is a quick and dirty check that relies on
> +		 * the fact that bi_io_vec[0] is always valid if a bio has data.
> +		 * The check might lead to occasional false negatives when bios
> +		 * are cloned, but compared to the performance impact of cloned
> +		 * bios themselves the loop below doesn't matter anyway.
> +		 */
> +		if ((*bio)->bi_vcnt == 1 &&
> +		    (*bio)->bi_io_vec[0].bv_len <= PAGE_SIZE) {
> +			*nr_segs = 1;
> +			break;
> +		}
>  		split = blk_bio_segment_split(q, *bio, &q->bio_split, nr_segs);
>  		break;
>  	}

If the device advertises a chunk boundary and this small IO happens to
cross it, skipping the split is going to harm performance.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] block: avoid blk_bio_segment_split for small I/O operations
  2019-11-04 19:30 ` Keith Busch
@ 2019-11-04 20:13   ` Jens Axboe
  2019-11-04 22:50     ` Keith Busch
  0 siblings, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2019-11-04 20:13 UTC (permalink / raw)
  To: Keith Busch, Christoph Hellwig; +Cc: ming.lei, linux-block

On 11/4/19 12:30 PM, Keith Busch wrote:
> On Mon, Nov 04, 2019 at 10:05:43AM -0800, Christoph Hellwig wrote:
>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>> index 48e6725b32ee..06eb38357b41 100644
>> --- a/block/blk-merge.c
>> +++ b/block/blk-merge.c
>> @@ -293,7 +293,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
>>   void __blk_queue_split(struct request_queue *q, struct bio **bio,
>>   		unsigned int *nr_segs)
>>   {
>> -	struct bio *split;
>> +	struct bio *split = NULL;
>>   
>>   	switch (bio_op(*bio)) {
>>   	case REQ_OP_DISCARD:
>> @@ -309,6 +309,19 @@ void __blk_queue_split(struct request_queue *q, struct bio **bio,
>>   				nr_segs);
>>   		break;
>>   	default:
>> +		/*
>> +		 * All drivers must accept single-segments bios that are <=
>> +		 * PAGE_SIZE.  This is a quick and dirty check that relies on
>> +		 * the fact that bi_io_vec[0] is always valid if a bio has data.
>> +		 * The check might lead to occasional false negatives when bios
>> +		 * are cloned, but compared to the performance impact of cloned
>> +		 * bios themselves the loop below doesn't matter anyway.
>> +		 */
>> +		if ((*bio)->bi_vcnt == 1 &&
>> +		    (*bio)->bi_io_vec[0].bv_len <= PAGE_SIZE) {
>> +			*nr_segs = 1;
>> +			break;
>> +		}
>>   		split = blk_bio_segment_split(q, *bio, &q->bio_split, nr_segs);
>>   		break;
>>   	}
> 
> If the device advertises a chunk boundary and this small IO happens to
> cross it, skipping the split is going to harm performance.

Does anyone do that, that isn't the first gen intel weirdness? Honest question,
but always seemed to me that this spec addition was driven entirely by that
one device.

And if they do, do they align on non-4k?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] block: avoid blk_bio_segment_split for small I/O operations
  2019-11-04 20:13   ` Jens Axboe
@ 2019-11-04 22:50     ` Keith Busch
  2019-11-04 22:55       ` Jens Axboe
  2019-11-04 22:58       ` Bart Van Assche
  0 siblings, 2 replies; 10+ messages in thread
From: Keith Busch @ 2019-11-04 22:50 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Christoph Hellwig, ming.lei, linux-block

On Mon, Nov 04, 2019 at 01:13:53PM -0700, Jens Axboe wrote:
> > If the device advertises a chunk boundary and this small IO happens to
> > cross it, skipping the split is going to harm performance.
> 
> Does anyone do that, that isn't the first gen intel weirdness? Honest question,
> but always seemed to me that this spec addition was driven entirely by that
> one device.

There are at least 3 generations of Intel DC P-series that use this,
maybe more. I'm not sure if any other available vendor devices report
this feature, though.
 
> And if they do, do they align on non-4k?

All existing ones I'm aware of are 128k, so 4k aligned, but if the LBA
format is 512B, you could start a 4k IO at a 126k offset to straddle the
boundary. Hm, maybe we don't care about the split penalty in that case
since unaligned access is already going to be slower for other reasons ...

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] block: avoid blk_bio_segment_split for small I/O operations
  2019-11-04 22:50     ` Keith Busch
@ 2019-11-04 22:55       ` Jens Axboe
  2019-11-04 22:57         ` Christoph Hellwig
  2019-11-04 22:58       ` Bart Van Assche
  1 sibling, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2019-11-04 22:55 UTC (permalink / raw)
  To: Keith Busch; +Cc: Christoph Hellwig, ming.lei, linux-block

On 11/4/19 3:50 PM, Keith Busch wrote:
> On Mon, Nov 04, 2019 at 01:13:53PM -0700, Jens Axboe wrote:
>>> If the device advertises a chunk boundary and this small IO happens to
>>> cross it, skipping the split is going to harm performance.
>>
>> Does anyone do that, that isn't the first gen intel weirdness? Honest question,
>> but always seemed to me that this spec addition was driven entirely by that
>> one device.
> 
> There are at least 3 generations of Intel DC P-series that use this,
> maybe more. I'm not sure if any other available vendor devices report
> this feature, though.

Gotcha

>> And if they do, do they align on non-4k?
> 
> All existing ones I'm aware of are 128k, so 4k aligned, but if the LBA
> format is 512B, you could start a 4k IO at a 126k offset to straddle the
> boundary. Hm, maybe we don't care about the split penalty in that case
> since unaligned access is already going to be slower for other reasons ...

Yeah, not sure that's a huge concern for that particular case. If you
think it's a real world issue, it should be possible to set aside a
queue bit for this and always have them hit the slower split path.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] block: avoid blk_bio_segment_split for small I/O operations
  2019-11-04 22:55       ` Jens Axboe
@ 2019-11-04 22:57         ` Christoph Hellwig
  2019-11-04 22:59           ` Jens Axboe
  0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2019-11-04 22:57 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Keith Busch, Christoph Hellwig, ming.lei, linux-block

On Mon, Nov 04, 2019 at 03:55:41PM -0700, Jens Axboe wrote:
> > All existing ones I'm aware of are 128k, so 4k aligned, but if the LBA
> > format is 512B, you could start a 4k IO at a 126k offset to straddle the
> > boundary. Hm, maybe we don't care about the split penalty in that case
> > since unaligned access is already going to be slower for other reasons ...
> 
> Yeah, not sure that's a huge concern for that particular case. If you
> think it's a real world issue, it should be possible to set aside a
> queue bit for this and always have them hit the slower split path.

Well, we use that field for zoned devices also, in which case it is an
issue.  I think I'll need to send a patch to skip the fast path if
we have chunk_sectors set.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] block: avoid blk_bio_segment_split for small I/O operations
  2019-11-04 22:50     ` Keith Busch
  2019-11-04 22:55       ` Jens Axboe
@ 2019-11-04 22:58       ` Bart Van Assche
  2019-11-04 23:04         ` Keith Busch
  1 sibling, 1 reply; 10+ messages in thread
From: Bart Van Assche @ 2019-11-04 22:58 UTC (permalink / raw)
  To: Keith Busch, Jens Axboe; +Cc: Christoph Hellwig, ming.lei, linux-block

On 11/4/19 2:50 PM, Keith Busch wrote:
> On Mon, Nov 04, 2019 at 01:13:53PM -0700, Jens Axboe wrote:
>>> If the device advertises a chunk boundary and this small IO happens to
>>> cross it, skipping the split is going to harm performance.
>>
>> Does anyone do that, that isn't the first gen intel weirdness? Honest question,
>> but always seemed to me that this spec addition was driven entirely by that
>> one device.
> 
> There are at least 3 generations of Intel DC P-series that use this,
> maybe more. I'm not sure if any other available vendor devices report
> this feature, though.
>   
>> And if they do, do they align on non-4k?
> 
> All existing ones I'm aware of are 128k, so 4k aligned, but if the LBA
> format is 512B, you could start a 4k IO at a 126k offset to straddle the
> boundary. Hm, maybe we don't care about the split penalty in that case
> since unaligned access is already going to be slower for other reasons ...

Aren't NVMe devices expected to set the NOIOB parameter to avoid that 
NVMe commands straddle boundaries that incur a performance penalty? From 
the NVMe spec: "Namespace Optimal IO Boundary (NOIOB): This field 
indicates the optimal IO boundary for this namespace. This field is 
specified in logical blocks. The host should construct read and write 
commands that do not cross the IO boundary to achieve optimal 
performance. A value of 0h indicates that no optimal IO boundary is 
reported."

Thanks,

Bart.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] block: avoid blk_bio_segment_split for small I/O operations
  2019-11-04 22:57         ` Christoph Hellwig
@ 2019-11-04 22:59           ` Jens Axboe
  0 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2019-11-04 22:59 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Keith Busch, ming.lei, linux-block

On 11/4/19 3:57 PM, Christoph Hellwig wrote:
> On Mon, Nov 04, 2019 at 03:55:41PM -0700, Jens Axboe wrote:
>>> All existing ones I'm aware of are 128k, so 4k aligned, but if the LBA
>>> format is 512B, you could start a 4k IO at a 126k offset to straddle the
>>> boundary. Hm, maybe we don't care about the split penalty in that case
>>> since unaligned access is already going to be slower for other reasons ...
>>
>> Yeah, not sure that's a huge concern for that particular case. If you
>> think it's a real world issue, it should be possible to set aside a
>> queue bit for this and always have them hit the slower split path.
> 
> Well, we use that field for zoned devices also, in which case it is an
> issue.  I think I'll need to send a patch to skip the fast path if
> we have chunk_sectors set.

Yes, let's do that, makes sense.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] block: avoid blk_bio_segment_split for small I/O operations
  2019-11-04 22:58       ` Bart Van Assche
@ 2019-11-04 23:04         ` Keith Busch
  0 siblings, 0 replies; 10+ messages in thread
From: Keith Busch @ 2019-11-04 23:04 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Jens Axboe, Christoph Hellwig, ming.lei, linux-block

On Mon, Nov 04, 2019 at 02:58:41PM -0800, Bart Van Assche wrote:
> On 11/4/19 2:50 PM, Keith Busch wrote:
> > On Mon, Nov 04, 2019 at 01:13:53PM -0700, Jens Axboe wrote:
> > > > If the device advertises a chunk boundary and this small IO happens to
> > > > cross it, skipping the split is going to harm performance.
> > > 
> > > Does anyone do that, that isn't the first gen intel weirdness? Honest question,
> > > but always seemed to me that this spec addition was driven entirely by that
> > > one device.
> > 
> > There are at least 3 generations of Intel DC P-series that use this,
> > maybe more. I'm not sure if any other available vendor devices report
> > this feature, though.
> > > And if they do, do they align on non-4k?
> > 
> > All existing ones I'm aware of are 128k, so 4k aligned, but if the LBA
> > format is 512B, you could start a 4k IO at a 126k offset to straddle the
> > boundary. Hm, maybe we don't care about the split penalty in that case
> > since unaligned access is already going to be slower for other reasons ...
> 
> Aren't NVMe devices expected to set the NOIOB parameter to avoid that NVMe
> commands straddle boundaries that incur a performance penalty? From the NVMe
> spec: "Namespace Optimal IO Boundary (NOIOB): This field indicates the
> optimal IO boundary for this namespace. This field is specified in logical
> blocks. The host should construct read and write commands that do not cross
> the IO boundary to achieve optimal performance. A value of 0h indicates that
> no optimal IO boundary is reported."

Yes, for nvme, noiob is the feature we're talking about.

I was initially just thinking about performance, but there's other
cases Christoph mentioned where the host split is necessary.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-11-04 23:04 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-04 18:05 [PATCH] block: avoid blk_bio_segment_split for small I/O operations Christoph Hellwig
2019-11-04 18:24 ` Jens Axboe
2019-11-04 19:30 ` Keith Busch
2019-11-04 20:13   ` Jens Axboe
2019-11-04 22:50     ` Keith Busch
2019-11-04 22:55       ` Jens Axboe
2019-11-04 22:57         ` Christoph Hellwig
2019-11-04 22:59           ` Jens Axboe
2019-11-04 22:58       ` Bart Van Assche
2019-11-04 23:04         ` Keith Busch

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).