* [PATCH] wbt: fix incorrect throttling due to flush latency
From: Mikulas Patocka @ 2018-02-05 19:11 UTC
  To: Jens Axboe; +Cc: linux-block, dm-devel

I have a workload where one process submits many asynchronous write bios
(without waiting for them) and another process submits synchronous flush
bios. Under this workload, writeback throttling scales down to a single
outstanding bio, and this incorrect throttling causes performance
degradation (all write bios sleep in __wbt_wait and cannot be sent in
parallel).
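
For reference, a minimal userspace sketch of this workload (hypothetical;
the paths and sizes below are made up) is one process doing large buffered
writes while a second loops on fdatasync, which sends flush bios to the
same device:

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	static char buf[1 << 20];

	memset(buf, 0, sizeof(buf));

	if (fork() == 0) {
		/* flush bios: rewrite a small file and fdatasync it */
		int fd = open("/mnt/test/flush", O_RDWR | O_CREAT, 0600);

		for (;;) {
			pwrite(fd, buf, 4096, 0);
			fdatasync(fd);
		}
	}

	/* async write bios: buffered writes that are never waited on */
	int fd = open("/mnt/test/big", O_WRONLY | O_CREAT, 0600);

	for (;;)
		write(fd, buf, sizeof(buf));
}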

The reason for this throttling is that wbt_data_dir counts flush requests
in the read bucket. The flush requests (which take quite a long time)
repeatedly trigger this condition:
	if (stat[READ].min > rwb->min_lat_nsec)
and that condition causes a scale-down to one outstanding request, even
though there are no read bios at all.
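
For context, the window evaluation in blk-wbt.c looks roughly like this
(simplified sketch, not the verbatim kernel code; the real function also
handles sample counts and unknown states):

/*
 * Because flushes are accounted in stat[READ], their completion times
 * inflate the "read" minimum, so this fires with no reads in flight.
 */
static int latency_exceeded(struct rq_wb *rwb, struct blk_rq_stat *stat)
{
	/* a "read" completing above the target latency forces a scale-down */
	if (stat[READ].nr_samples && stat[READ].min > rwb->min_lat_nsec)
		return LAT_EXCEEDED;

	return LAT_OK;
}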

A similar problem could also show up with REQ_OP_ZONE_REPORT and
REQ_OP_ZONE_RESET - they are also counted in the read bucket.

This patch fixes wbt_data_dir so that only REQ_OP_READ requests are
counted in the read bucket. On SATA, it improves write+flush throughput
from 130 MB/s to 350 MB/s.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org	# v4.12+

---
 block/blk-wbt.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Index: linux-stable/block/blk-wbt.c
===================================================================
--- linux-stable.orig/block/blk-wbt.c	2018-02-05 12:30:09.606063908 -0500
+++ linux-stable/block/blk-wbt.c	2018-02-05 13:50:02.784213501 -0500
@@ -697,7 +697,11 @@ u64 wbt_default_latency_nsec(struct requ
 
 static int wbt_data_dir(const struct request *rq)
 {
-	return rq_data_dir(rq);
+	/*
+	 * Flushes must be counted in the write bucket, so that high flush
+	 * latencies don't cause scale down.
+	 */
+	return req_op(rq) == REQ_OP_READ ? READ : WRITE;
 }
 
 int wbt_init(struct request_queue *q)


* Re: [PATCH] wbt: fix incorrect throttling due to flush latency
From: Jens Axboe @ 2018-02-05 19:22 UTC
  To: Mikulas Patocka; +Cc: linux-block, dm-devel

On 2/5/18 12:11 PM, Mikulas Patocka wrote:
> I have a workload where one process submits many asynchronous write bios
> (without waiting for them) and another process submits synchronous flush
> bios. Under this workload, writeback throttling scales down to a single
> outstanding bio, and this incorrect throttling causes performance
> degradation (all write bios sleep in __wbt_wait and cannot be sent in
> parallel).
> 
> The reason for this throttling is that wbt_data_dir counts flush requests
> in the read bucket. The flush requests (which take quite a long time)
> repeatedly trigger this condition:
> 	if (stat[READ].min > rwb->min_lat_nsec)
> and that condition causes a scale-down to one outstanding request, even
> though there are no read bios at all.
> 
> A similar problem could also show up with REQ_OP_ZONE_REPORT and
> REQ_OP_ZONE_RESET - they are also counted in the read bucket.
> 
> This patch fixes wbt_data_dir so that only REQ_OP_READ requests are
> counted in the read bucket. On SATA, it improves write+flush throughput
> from 130 MB/s to 350 MB/s.

Good catch. But I do wonder if we should account them at all. It
might make sense to account flushes as writes, but we really
should not account anything that isn't regular read/write IO 
related.

Something like the below? Potentially including REQ_OP_FLUSH in the
write latencies as well.


diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index ae8de9780085..55671e5f3b6e 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -697,7 +697,13 @@ u64 wbt_default_latency_nsec(struct request_queue *q)
 
 static int wbt_data_dir(const struct request *rq)
 {
-	return rq_data_dir(rq);
+	if (req_op(rq) == REQ_OP_READ)
+		return READ;
+	else if (req_op(rq) == REQ_OP_WRITE)
+		return WRITE;
+
+	/* don't account */
+	return -1;
 }
 
 int wbt_init(struct request_queue *q)

-- 
Jens Axboe


* Re: [PATCH] wbt: fix incorrect throttling due to flush latency
From: Jens Axboe @ 2018-02-05 19:35 UTC
  To: Mikulas Patocka; +Cc: linux-block, dm-devel

On 2/5/18 12:22 PM, Jens Axboe wrote:
> On 2/5/18 12:11 PM, Mikulas Patocka wrote:
>> [...]
> 
> Good catch. But I do wonder if we should account them at all. It
> might make sense to account flushes as writes, but we really
> should not account anything that isn't regular read/write IO 
> related.
> 
> Something like the below? Potentially including REQ_OP_FLUSH in the
> write latencies as well.

Thinking about it, flushes should just be counted as writes, and the
rest disregarded. À la below.


diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index ae8de9780085..f92fc84b5e2c 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -697,7 +697,15 @@ u64 wbt_default_latency_nsec(struct request_queue *q)
 
 static int wbt_data_dir(const struct request *rq)
 {
-	return rq_data_dir(rq);
+	const int op = req_op(rq);
+
+	if (op == REQ_OP_READ)
+		return READ;
+	else if (op == REQ_OP_WRITE || op == REQ_OP_FLUSH)
+		return WRITE;
+
+	/* don't account */
+	return -1;
 }
 
 int wbt_init(struct request_queue *q)

-- 
Jens Axboe


* Re: [PATCH] wbt: fix incorrect throttling due to flush latency
From: Mikulas Patocka @ 2018-02-05 20:00 UTC
  To: Jens Axboe; +Cc: linux-block, dm-devel

On Mon, 5 Feb 2018, Jens Axboe wrote:

> On 2/5/18 12:22 PM, Jens Axboe wrote:
> > [...]
> 
> Thinking about it, flushes should just be counted as writes, and the
> rest disregarded. À la below.
> 
> 
> diff --git a/block/blk-wbt.c b/block/blk-wbt.c
> index ae8de9780085..f92fc84b5e2c 100644
> --- a/block/blk-wbt.c
> +++ b/block/blk-wbt.c
> @@ -697,7 +697,15 @@ u64 wbt_default_latency_nsec(struct request_queue *q)
>  
>  static int wbt_data_dir(const struct request *rq)
>  {
> -	return rq_data_dir(rq);
> +	const int op = req_op(rq);
> +
> +	if (op == REQ_OP_READ)
> +		return READ;
> +	else if (op == REQ_OP_WRITE || op == REQ_OP_FLUSH)
> +		return WRITE;
> +
> +	/* don't account */
> +	return -1;
>  }
>  
>  int wbt_init(struct request_queue *q)
> 
> -- 
> Jens Axboe

Yes, this works.

BTW, write throttling still causes some performance degradation (350 MB/s
vs. 500 MB/s without throttling). Should there be some logic that disables
write throttling when no sync requests are being received?

Something like: if there is a sync read, enable write throttling; if there
were no sync reads in the last 5 seconds, disable write throttling?
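
Roughly (a hypothetical sketch only - neither the field nor the helper
below exists in blk-wbt.c, the names are made up):

static bool wbt_writes_should_throttle(struct rq_wb *rwb)
{
	/* throttle writes only if a sync read was seen recently */
	return ktime_get_ns() - rwb->last_sync_read_ns < 5 * NSEC_PER_SEC;
}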

Mikulas


* Re: [PATCH] wbt: fix incorrect throttling due to flush latency
From: Jens Axboe @ 2018-02-05 20:16 UTC
  To: Mikulas Patocka; +Cc: linux-block, dm-devel

On 2/5/18 1:00 PM, Mikulas Patocka wrote:
> On Mon, 5 Feb 2018, Jens Axboe wrote:
>> [...]
> 
> Yes, this works.

OK good, I'll queue that up then.

> BTW, write throttling still causes some performance degradation (350 MB/s
> vs. 500 MB/s without throttling). Should there be some logic that disables
> write throttling when no sync requests are being received?
> 
> Something like: if there is a sync read, enable write throttling; if there
> were no sync reads in the last 5 seconds, disable write throttling?

blk-wbt maintains a baseline that isn't too deep into writes; otherwise you
run into problems exactly when a sync or read request does come in after 5
seconds, or whatever heuristic you choose. So that's very much by design.
It does allow the scale step to go negative if no reads have been seen, but
only so far, and it bounces back immediately once one is seen.
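
The depth scaling looks roughly like this (simplified sketch of the logic
in blk-wbt.c's calc_wb_limits(), not verbatim):

	/*
	 * scale_step > 0 shrinks the write depth after latency spikes;
	 * scale_step < 0 grows it while only writes are seen, but the
	 * depth is capped and snaps back as soon as a read completes.
	 */
	if (rwb->scale_step > 0)
		depth = 1 + ((depth - 1) >> min(31, rwb->scale_step));
	else if (rwb->scale_step < 0) {
		unsigned int maxd = 3 * rwb->queue_depth / 4;

		depth = 1 + ((depth - 1) << -rwb->scale_step);
		if (depth > maxd)
			depth = maxd;
	}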

-- 
Jens Axboe

