* [PATCH 0/3] bfq: Two fixes and a cleanup for sequential readers
From: Jan Kara @ 2020-06-05 14:16 UTC (permalink / raw)
  To: linux-block; +Cc: Paolo Valente, Jan Kara

Hello,

this patch series contains one tiny cleanup and two relatively simple fixes
for BFQ I've uncovered when analyzing behavior of four parallel sequential
readers on one machine. The fio jobfile is like:

[global]
direct=0
ioengine=sync
invalidate=1
size=16g
rw=read
bs=4k

[reader]
numjobs=4
directory=/mnt

The first patch fixes a problem due to which the four bfq queues were getting
merged without a reason. The third patch fixes a problem where we were unfairly
raising bfq queue think time (leading to clearing of short_ttime and subsequent
reduction in throughput).

What do people think about these?

								Honza


* [PATCH 1/3] bfq: Avoid false bfq queue merging
From: Jan Kara @ 2020-06-05 14:16 UTC (permalink / raw)
  To: linux-block; +Cc: Paolo Valente, Jan Kara, stable

bfq_setup_cooperator() uses bfqd->in_serv_last_pos to detect whether it
makes sense to merge the current bfq queue with the in-service queue.
However, if the in-service queue is freshly scheduled and didn't dispatch
any requests yet, bfqd->in_serv_last_pos is stale and contains a value
from the previously scheduled bfq queue, which can thus result in a bogus
decision that the two queues should be merged. This bug can be observed
for example with the following fio jobfile:

[global]
direct=0
ioengine=sync
invalidate=1
size=1g
rw=read

[reader]
numjobs=4
directory=/mnt

where the 4 processes will end up in the one shared bfq queue although
they do IO to physically very distant files (for some reason I was able to
observe this only with slice_idle=1ms setting).

Fix the problem by invalidating bfqd->in_serv_last_pos when switching
in-service queue.

Fixes: 058fdecc6de7 ("block, bfq: fix in-service-queue check for queue merging")
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/bfq-iosched.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 3d411716d7ee..50017275915f 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2937,6 +2937,7 @@ static void __bfq_set_in_service_queue(struct bfq_data *bfqd,
 	}
 
 	bfqd->in_service_queue = bfqq;
+	bfqd->in_serv_last_pos = 0;
 }
 
 /*
-- 
2.16.4
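
A rough, self-contained illustration of the closeness test the commit message
above refers to may help; the helper name, structure layout and threshold
below are assumptions for the sketch, not verbatim bfq code. The point is only
to show how a stale in_serv_last_pos can make an unrelated queue look like a
merge candidate, and what resetting the field on an in-service queue switch
changes.

#include <stdio.h>

typedef unsigned long long sector_t;

#define CLOSE_THR ((sector_t)(8 * 1024))	/* assumed merge distance, in sectors */

struct bfqd_sketch {
	sector_t in_serv_last_pos;	/* last position served by the in-service queue */
};

/* Merge-candidate test: is this request close to where service last was? */
static int rq_close_to_in_service(const struct bfqd_sketch *bfqd, sector_t pos)
{
	sector_t last = bfqd->in_serv_last_pos;
	sector_t dist = pos > last ? pos - last : last - pos;

	return dist <= CLOSE_THR;
}

int main(void)
{
	/* Position left behind by the previously scheduled queue. */
	struct bfqd_sketch bfqd = { .in_serv_last_pos = 10000000ULL };

	/*
	 * A freshly scheduled in-service queue has dispatched nothing yet, so
	 * in_serv_last_pos is stale; a different reader whose IO happens to
	 * sit near the old position wrongly looks mergeable (prints 1).
	 */
	printf("stale position: merge candidate? %d\n",
	       rq_close_to_in_service(&bfqd, 10000500ULL));

	/* The fix: invalidate the position when the in-service queue changes. */
	bfqd.in_serv_last_pos = 0;
	printf("reset position: merge candidate? %d\n",
	       rq_close_to_in_service(&bfqd, 10000500ULL));
	return 0;
}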



* [PATCH 2/3] bfq: Use 'ttime' local variable
From: Jan Kara @ 2020-06-05 14:16 UTC (permalink / raw)
  To: linux-block; +Cc: Paolo Valente, Jan Kara

Use local variable 'ttime' instead of dereferencing bfqq.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/bfq-iosched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 50017275915f..c66c3eaa9e26 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -5196,7 +5196,7 @@ static void bfq_update_io_thinktime(struct bfq_data *bfqd,
 
 	elapsed = min_t(u64, elapsed, 2ULL * bfqd->bfq_slice_idle);
 
-	ttime->ttime_samples = (7*bfqq->ttime.ttime_samples + 256) / 8;
+	ttime->ttime_samples = (7*ttime->ttime_samples + 256) / 8;
 	ttime->ttime_total = div_u64(7*ttime->ttime_total + 256*elapsed,  8);
 	ttime->ttime_mean = div64_ul(ttime->ttime_total + 128,
 				     ttime->ttime_samples);
-- 
2.16.4



* [PATCH 3/3] bfq: Use only idle IO periods for think time calculations
From: Jan Kara @ 2020-06-05 14:16 UTC (permalink / raw)
  To: linux-block; +Cc: Paolo Valente, Jan Kara

Currently, whenever a bfq queue has a request queued, we add now -
last_completion_time to the think time statistics. This is however
misleading in case the process is able to submit several requests in
parallel, because e.g. if the queue has a request completed at time T0 and
then queues new requests at times T1, T2, then we will add T1-T0 and
T2-T0 to the think time statistics, which just doesn't make any sense (the
queue's think time is penalized by the queue being able to submit more
IO). So add to the think time statistics only time intervals when the queue
had no IO pending.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/bfq-iosched.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index c66c3eaa9e26..4b1c9c5f57b6 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -5192,8 +5192,16 @@ static void bfq_update_io_thinktime(struct bfq_data *bfqd,
 				    struct bfq_queue *bfqq)
 {
 	struct bfq_ttime *ttime = &bfqq->ttime;
-	u64 elapsed = ktime_get_ns() - bfqq->ttime.last_end_request;
-
+	u64 elapsed;
+	
+	/*
+	 * We are really interested in how long it takes for the queue to
+	 * become busy when there is no outstanding IO for this queue. So
+	 * ignore cases when the bfq queue has already IO queued.
+	 */
+	if (bfqq->dispatched || bfq_bfqq_busy(bfqq))
+		return;
+	elapsed = ktime_get_ns() - bfqq->ttime.last_end_request;
 	elapsed = min_t(u64, elapsed, 2ULL * bfqd->bfq_slice_idle);
 
 	ttime->ttime_samples = (7*ttime->ttime_samples + 256) / 8;
-- 
2.16.4
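
As background for why a few bad samples matter, the statistics that
bfq_update_io_thinktime() maintains (the full update is visible in patch 2/3
above) form a fixed-point weighted average in which every new sample carries
weight 1/8. The snippet below is a user-space paraphrase with purely
illustrative numbers, assuming slice_idle=2ms so that over-long samples are
clamped to 4ms; it is not kernel code.

#include <stdint.h>
#include <stdio.h>

/* User-space paraphrase of bfq's per-queue think-time bookkeeping. */
struct ttime_sketch {
	uint64_t ttime_samples;	/* effective sample count, scaled by 256 */
	uint64_t ttime_total;	/* weighted sample sum, scaled by 256 */
	uint64_t ttime_mean;	/* current think-time estimate, in ns */
};

/* Mirrors the update above: old state decays by 7/8, new sample adds 1/8. */
static void ttime_add_sample(struct ttime_sketch *t, uint64_t elapsed_ns)
{
	t->ttime_samples = (7 * t->ttime_samples + 256) / 8;
	t->ttime_total = (7 * t->ttime_total + 256 * elapsed_ns) / 8;
	t->ttime_mean = (t->ttime_total + 128) / t->ttime_samples;
}

int main(void)
{
	struct ttime_sketch t = { 0, 0, 0 };
	int i;

	/* A reader that really submits again ~0.1ms after its IO completes. */
	for (i = 0; i < 100; i++)
		ttime_add_sample(&t, 100 * 1000);
	printf("mean, genuine samples only: %llu ns\n",
	       (unsigned long long)t.ttime_mean);

	/*
	 * A handful of samples taken while IO was still pending; the kernel
	 * clamps each sample to 2 * slice_idle, i.e. 4ms here.  They drag the
	 * mean well above slice_idle, which is how short_ttime gets cleared.
	 */
	for (i = 0; i < 10; i++)
		ttime_add_sample(&t, 4 * 1000 * 1000);
	printf("mean after 10 bogus samples: %llu ns\n",
	       (unsigned long long)t.ttime_mean);
	return 0;
}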



* Re: [PATCH 1/3] bfq: Avoid false bfq queue merging
From: Paolo Valente @ 2020-06-11  7:13 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-block, stable



> Il giorno 5 giu 2020, alle ore 16:16, Jan Kara <jack@suse.cz> ha scritto:
> 
> bfq_setup_cooperator() uses bfqd->in_serv_last_pos to detect whether it
> makes sense to merge current bfq queue with the in-service queue.
> However if the in-service queue is freshly scheduled and didn't dispatch
> any requests yet, bfqd->in_serv_last_pos is stale and contains value
> from the previously scheduled bfq queue which can thus result in a bogus
> decision that the two queues should be merged.

Good catch! 

> This bug can be observed
> for example with the following fio jobfile:
> 
> [global]
> direct=0
> ioengine=sync
> invalidate=1
> size=1g
> rw=read
> 
> [reader]
> numjobs=4
> directory=/mnt
> 
> where the 4 processes will end up in the one shared bfq queue although
> they do IO to physically very distant files (for some reason I was able to
> observe this only with slice_idle=1ms setting).
> 
> Fix the problem by invalidating bfqd->in_serv_last_pos when switching
> in-service queue.
> 

Apart from the nonexistent problem that even 0 is a valid LBA :)

Acked-by: Paolo Valente <paolo.valente@linaro.org>


* Re: [PATCH 2/3] bfq: Use 'ttime' local variable
From: Paolo Valente @ 2020-06-11  7:14 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-block



> Il giorno 5 giu 2020, alle ore 16:16, Jan Kara <jack@suse.cz> ha scritto:
> 
> Use local variable 'ttime' instead of dereferencing bfqq.
> 

Acked-by: Paolo Valente <paolo.valente@linaro.org>

Thanks!

> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
> block/bfq-iosched.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index 50017275915f..c66c3eaa9e26 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -5196,7 +5196,7 @@ static void bfq_update_io_thinktime(struct bfq_data *bfqd,
> 
> 	elapsed = min_t(u64, elapsed, 2ULL * bfqd->bfq_slice_idle);
> 
> -	ttime->ttime_samples = (7*bfqq->ttime.ttime_samples + 256) / 8;
> +	ttime->ttime_samples = (7*ttime->ttime_samples + 256) / 8;
> 	ttime->ttime_total = div_u64(7*ttime->ttime_total + 256*elapsed,  8);
> 	ttime->ttime_mean = div64_ul(ttime->ttime_total + 128,
> 				     ttime->ttime_samples);
> -- 
> 2.16.4
> 



* Re: [PATCH 1/3] bfq: Avoid false bfq queue merging
From: Jan Kara @ 2020-06-11  8:31 UTC (permalink / raw)
  To: Paolo Valente; +Cc: Jan Kara, linux-block, stable

On Thu 11-06-20 09:13:07, Paolo Valente wrote:
> 
> 
> > Il giorno 5 giu 2020, alle ore 16:16, Jan Kara <jack@suse.cz> ha scritto:
> > 
> > bfq_setup_cooperator() uses bfqd->in_serv_last_pos to detect whether it
> > makes sense to merge current bfq queue with the in-service queue.
> > However if the in-service queue is freshly scheduled and didn't dispatch
> > any requests yet, bfqd->in_serv_last_pos is stale and contains value
> > from the previously scheduled bfq queue which can thus result in a bogus
> > decision that the two queues should be merged.
> 
> Good catch! 
> 
> > This bug can be observed
> > for example with the following fio jobfile:
> > 
> > [global]
> > direct=0
> > ioengine=sync
> > invalidate=1
> > size=1g
> > rw=read
> > 
> > [reader]
> > numjobs=4
> > directory=/mnt
> > 
> > where the 4 processes will end up in the one shared bfq queue although
> > they do IO to physically very distant files (for some reason I was able to
> > observe this only with slice_idle=1ms setting).
> > 
> > Fix the problem by invalidating bfqd->in_serv_last_pos when switching
> > in-service queue.
> > 
> 
> Apart from the nonexistent problem that even 0 is a valid LBA :)

Yes, I was also thinking about that and decided 0 is "good enough" :). But
I can just as well switch to (sector_t)-1 if you think it would be better.

> Acked-by: Paolo Valente <paolo.valente@linaro.org>

Thanks!

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH 3/3] bfq: Use only idle IO periods for think time calculations
From: Paolo Valente @ 2020-06-11 14:11 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-block



> Il giorno 5 giu 2020, alle ore 16:16, Jan Kara <jack@suse.cz> ha scritto:
> 
> Currently whenever bfq queue has a request queued we add now -
> last_completion_time to the think time statistics. This is however
> misleading in case the process is able to submit several requests in
> parallel because e.g. if the queue has request completed at time T0 and
> then queues new requests at times T1, T2, then we will add T1-T0 and
> T2-T0 to think time statistics which just doesn't make any sense (the
> queue's think time is penalized by the queue being able to submit more
> IO).

I've thought about this issue several times.  My concern is that, by
updating the think time only when strictly meaningful, we reduce the
number of samples.  In some cases, we may reduce it to a very low
value.  For this reason, so far I have desisted from changing the
current scheme.  IOW, I opted for dirtier statistics to avoid the risk
of too scarce statistics.  Any feedback is very welcome.

Thanks,
Paolo

> So add to think time statistics only time intervals when the queue
> had no IO pending.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
> block/bfq-iosched.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index c66c3eaa9e26..4b1c9c5f57b6 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -5192,8 +5192,16 @@ static void bfq_update_io_thinktime(struct bfq_data *bfqd,
> 				    struct bfq_queue *bfqq)
> {
> 	struct bfq_ttime *ttime = &bfqq->ttime;
> -	u64 elapsed = ktime_get_ns() - bfqq->ttime.last_end_request;
> -
> +	u64 elapsed;
> +	
> +	/*
> +	 * We are really interested in how long it takes for the queue to
> +	 * become busy when there is no outstanding IO for this queue. So
> +	 * ignore cases when the bfq queue has already IO queued.
> +	 */
> +	if (bfqq->dispatched || bfq_bfqq_busy(bfqq))
> +		return;
> +	elapsed = ktime_get_ns() - bfqq->ttime.last_end_request;
> 	elapsed = min_t(u64, elapsed, 2ULL * bfqd->bfq_slice_idle);
> 
> 	ttime->ttime_samples = (7*ttime->ttime_samples + 256) / 8;
> -- 
> 2.16.4
> 



* Re: [PATCH 1/3] bfq: Avoid false bfq queue merging
From: Paolo Valente @ 2020-06-11 14:12 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-block, stable



> Il giorno 11 giu 2020, alle ore 10:31, Jan Kara <jack@suse.cz> ha scritto:
> 
> On Thu 11-06-20 09:13:07, Paolo Valente wrote:
>> 
>> 
>>> Il giorno 5 giu 2020, alle ore 16:16, Jan Kara <jack@suse.cz> ha scritto:
>>> 
>>> bfq_setup_cooperator() uses bfqd->in_serv_last_pos to detect whether it
>>> makes sense to merge current bfq queue with the in-service queue.
>>> However if the in-service queue is freshly scheduled and didn't dispatch
>>> any requests yet, bfqd->in_serv_last_pos is stale and contains value
>>> from the previously scheduled bfq queue which can thus result in a bogus
>>> decision that the two queues should be merged.
>> 
>> Good catch! 
>> 
>>> This bug can be observed
>>> for example with the following fio jobfile:
>>> 
>>> [global]
>>> direct=0
>>> ioengine=sync
>>> invalidate=1
>>> size=1g
>>> rw=read
>>> 
>>> [reader]
>>> numjobs=4
>>> directory=/mnt
>>> 
>>> where the 4 processes will end up in the one shared bfq queue although
>>> they do IO to physically very distant files (for some reason I was able to
>>> observe this only with slice_idle=1ms setting).
>>> 
>>> Fix the problem by invalidating bfqd->in_serv_last_pos when switching
>>> in-service queue.
>>> 
>> 
>> Apart from the nonexistent problem that even 0 is a valid LBA :)
> 
> Yes, I was also thinking about that and decided 0 is "good enough" :). But
> I just as well just switch to (sector_t)-1 if you think it would be better.
> 

0 is ok :)

Thanks,
Paolo

>> Acked-by: Paolo Valente <paolo.valente@linaro.org>
> 
> Thanks!
> 
> 								Honza
> 
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR



* Re: [PATCH 3/3] bfq: Use only idle IO periods for think time calculations
From: Jan Kara @ 2020-06-11 14:39 UTC (permalink / raw)
  To: Paolo Valente; +Cc: Jan Kara, linux-block

On Thu 11-06-20 16:11:10, Paolo Valente wrote:
> 
> 
> > Il giorno 5 giu 2020, alle ore 16:16, Jan Kara <jack@suse.cz> ha scritto:
> > 
> > Currently whenever bfq queue has a request queued we add now -
> > last_completion_time to the think time statistics. This is however
> > misleading in case the process is able to submit several requests in
> > parallel because e.g. if the queue has request completed at time T0 and
> > then queues new requests at times T1, T2, then we will add T1-T0 and
> > T2-T0 to think time statistics which just doesn't make any sense (the
> > queue's think time is penalized by the queue being able to submit more
> > IO).
> 
> I've thought about this issue several times.  My concern is that, by
> updating the think time only when strictly meaningful, we reduce the
> number of samples.  In some cases, we may reduce it to a very low
> value.  For this reason, so far I have desisted from changing the
> current scheme.  IOW, I opted for dirtier statistics to avoid the risk
> of too scarce statistics.  Any feedback is very welcome.

I understand the concern. But:

a) I don't think adding these samples to statistics helps in any way (you
cannot improve the prediction power of the statistics by including in it
some samples that are not directly related to the thing you try to
predict). And think time is used to predict the answer to the question: If
bfq queue becomes idle, how long will it take for new request to arrive? So
second and further requests are simply irrelevant.

b) From my testing with 4 fio sequential readers (the workload mentioned in
the previous patch) this patch caused a noticeable change in computed think
time and that allowed fio processes to be reliably marked as having short
think time (without this patch they were oscillating between short ttime and
not short ttime) and consequently achieving better throughput because we
were idling for new requests from these bfq queues. Note that this was with
a somewhat aggressive slice_idle setting (2ms). Having slice_idle >= 4ms was
enough to hide the ttime computation issue, but still this demonstrates that
these bogus samples noticeably raise the computed average.
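
(For readers following along: the mean discussed here feeds a simple
comparison against slice_idle when bfq decides whether a queue "has short
think time" and therefore deserves idling. The sketch below paraphrases that
decision with assumed names and an assumed minimum-sample threshold; it is not
the actual kernel function.)

#include <stdint.h>
#include <stdio.h>

/* Assumed: the mean is only trusted once enough samples have accumulated. */
static int sample_valid(uint64_t samples)
{
	return samples > 80;
}

/*
 * Paraphrase of the short-think-time decision: with too few samples be
 * optimistic, otherwise compare the computed mean against slice_idle.
 */
static int has_short_ttime(uint64_t ttime_mean_ns, uint64_t samples,
			   uint64_t slice_idle_ns)
{
	if (!sample_valid(samples))
		return 1;
	return ttime_mean_ns <= slice_idle_ns;
}

int main(void)
{
	uint64_t slice_idle = 2 * 1000 * 1000;	/* 2ms, the setting used above */

	/* An honest sub-millisecond mean keeps idling enabled for the queue. */
	printf("0.1ms mean -> short ttime: %d\n",
	       has_short_ttime(100 * 1000, 256, slice_idle));
	/* A mean inflated by bogus samples flips the queue to "not short". */
	printf("3ms mean   -> short ttime: %d\n",
	       has_short_ttime(3 * 1000 * 1000, 256, slice_idle));
	return 0;
}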

								Honza

> > So add to think time statistics only time intervals when the queue
> > had no IO pending.
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> > block/bfq-iosched.c | 12 ++++++++++--
> > 1 file changed, 10 insertions(+), 2 deletions(-)
> > 
> > diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> > index c66c3eaa9e26..4b1c9c5f57b6 100644
> > --- a/block/bfq-iosched.c
> > +++ b/block/bfq-iosched.c
> > @@ -5192,8 +5192,16 @@ static void bfq_update_io_thinktime(struct bfq_data *bfqd,
> > 				    struct bfq_queue *bfqq)
> > {
> > 	struct bfq_ttime *ttime = &bfqq->ttime;
> > -	u64 elapsed = ktime_get_ns() - bfqq->ttime.last_end_request;
> > -
> > +	u64 elapsed;
> > +	
> > +	/*
> > +	 * We are really interested in how long it takes for the queue to
> > +	 * become busy when there is no outstanding IO for this queue. So
> > +	 * ignore cases when the bfq queue has already IO queued.
> > +	 */
> > +	if (bfqq->dispatched || bfq_bfqq_busy(bfqq))
> > +		return;
> > +	elapsed = ktime_get_ns() - bfqq->ttime.last_end_request;
> > 	elapsed = min_t(u64, elapsed, 2ULL * bfqd->bfq_slice_idle);
> > 
> > 	ttime->ttime_samples = (7*ttime->ttime_samples + 256) / 8;
> > -- 
> > 2.16.4
> > 
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH 3/3] bfq: Use only idle IO periods for think time calculations
From: Paolo Valente @ 2020-07-22  9:13 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-block



> Il giorno 11 giu 2020, alle ore 16:39, Jan Kara <jack@suse.cz> ha scritto:
> 
> On Thu 11-06-20 16:11:10, Paolo Valente wrote:
>> 
>> 
>>> Il giorno 5 giu 2020, alle ore 16:16, Jan Kara <jack@suse.cz> ha scritto:
>>> 
>>> Currently whenever bfq queue has a request queued we add now -
>>> last_completion_time to the think time statistics. This is however
>>> misleading in case the process is able to submit several requests in
>>> parallel because e.g. if the queue has request completed at time T0 and
>>> then queues new requests at times T1, T2, then we will add T1-T0 and
>>> T2-T0 to think time statistics which just doesn't make any sense (the
>>> queue's think time is penalized by the queue being able to submit more
>>> IO).
>> 
>> I've thought about this issue several times.  My concern is that, by
>> updating the think time only when strictly meaningful, we reduce the
>> number of samples.  In some cases, we may reduce it to a very low
>> value.  For this reason, so far I have desisted from changing the
>> current scheme.  IOW, I opted for dirtier statistics to avoid the risk
>> of too scarce statistics.  Any feedback is very welcome.
> 
> I understand the concern.

Hi,
sorry for the sidereal delay.

> But:
> 
> a) I don't think adding these samples to statistics helps in any way (you
> cannot improve the prediction power of the statistics by including in it
> some samples that are not directly related to the thing you try to
> predict). And think time is used to predict the answer to the question: If
> bfq queue becomes idle, how long will it take for new request to arrive? So
> second and further requests are simply irrelevant.
> 

Yes, you are super right in theory.

Unfortunately this may not mean that your patch will do only good, for
the concerns in my previous email. 

So, here is a proposal to move forward:
1) I test your patch on my typical set of
   latency/guaranteed-bandwidth/total-throughput benchmarks
2) You test your patch on a significant set of benchmarks in mmtests

What do you think?

Thanks,
Paolo

> b) From my testing with 4 fio sequential readers (the workload mentioned in
> the previous patch) this patch caused a noticeable change in computed think
> time and that allowed fio processes to be reliably marked as having short
> think time (without this patch they were oscillating between short ttime and
> not short ttime) and consequently achieving better throughput because we
> were idling for new requests from these bfq queues. Note that this was with
> somewhat aggressive slice_idle setting (2ms). Having slice_idle >= 4ms was
> enough to hide the ttime computation issue but still this demonstrates that
> these bogus samples noticeably raise computed average.
> 
> 								Honza
> 
>>> So add to think time statistics only time intervals when the queue
>>> had no IO pending.
>>> 
>>> Signed-off-by: Jan Kara <jack@suse.cz>
>>> ---
>>> block/bfq-iosched.c | 12 ++++++++++--
>>> 1 file changed, 10 insertions(+), 2 deletions(-)
>>> 
>>> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
>>> index c66c3eaa9e26..4b1c9c5f57b6 100644
>>> --- a/block/bfq-iosched.c
>>> +++ b/block/bfq-iosched.c
>>> @@ -5192,8 +5192,16 @@ static void bfq_update_io_thinktime(struct bfq_data *bfqd,
>>> 				    struct bfq_queue *bfqq)
>>> {
>>> 	struct bfq_ttime *ttime = &bfqq->ttime;
>>> -	u64 elapsed = ktime_get_ns() - bfqq->ttime.last_end_request;
>>> -
>>> +	u64 elapsed;
>>> +	
>>> +	/*
>>> +	 * We are really interested in how long it takes for the queue to
>>> +	 * become busy when there is no outstanding IO for this queue. So
>>> +	 * ignore cases when the bfq queue has already IO queued.
>>> +	 */
>>> +	if (bfqq->dispatched || bfq_bfqq_busy(bfqq))
>>> +		return;
>>> +	elapsed = ktime_get_ns() - bfqq->ttime.last_end_request;
>>> 	elapsed = min_t(u64, elapsed, 2ULL * bfqd->bfq_slice_idle);
>>> 
>>> 	ttime->ttime_samples = (7*ttime->ttime_samples + 256) / 8;
>>> -- 
>>> 2.16.4
>>> 
>> 
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR



* Re: [PATCH 3/3] bfq: Use only idle IO periods for think time calculations
From: Jan Kara @ 2020-07-27  7:35 UTC (permalink / raw)
  To: Paolo Valente; +Cc: Jan Kara, linux-block

On Wed 22-07-20 11:13:28, Paolo Valente wrote:
> > a) I don't think adding these samples to statistics helps in any way (you
> > cannot improve the prediction power of the statistics by including in it
> > some samples that are not directly related to the thing you try to
> > predict). And think time is used to predict the answer to the question: If
> > bfq queue becomes idle, how long will it take for new request to arrive? So
> > second and further requests are simply irrelevant.
> > 
> 
> Yes, you are super right in theory.
> 
> Unfortunately this may not mean that your patch will do only good, for
> the concerns in my previous email. 
> 
> So, here is a proposal to move forward:
> 1) I test your patch on my typical set of
>    latency/guaranteed-bandwidth/total-throughput benchmarks
> 2) You test your patch on a significant set of benchmarks in mmtests
> 
> What do you think?

Sure, I will queue runs for the patches with mmtests :).

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH 3/3] bfq: Use only idle IO periods for think time calculations
From: Jan Kara @ 2020-08-26 13:54 UTC (permalink / raw)
  To: Paolo Valente; +Cc: Jan Kara, linux-block

On Mon 27-07-20 09:35:15, Jan Kara wrote:
> On Wed 22-07-20 11:13:28, Paolo Valente wrote:
> > > a) I don't think adding these samples to statistics helps in any way (you
> > > cannot improve the prediction power of the statistics by including in it
> > > some samples that are not directly related to the thing you try to
> > > predict). And think time is used to predict the answer to the question: If
> > > bfq queue becomes idle, how long will it take for new request to arrive? So
> > > second and further requests are simply irrelevant.
> > > 
> > 
> > Yes, you are super right in theory.
> > 
> > Unfortunately this may not mean that your patch will do only good, for
> > the concerns in my previous email. 
> > 
> > So, here is a proposal to move forward:
> > 1) I test your patch on my typical set of
> >    latency/guaranteed-bandwidth/total-throughput benchmarks
> > 2) You test your patch on a significant set of benchmarks in mmtests
> > 
> > What do you think?
> 
> Sure, I will queue runs for the patches with mmtests :).

Sorry it took so long but I've hit a couple of technical snags when running
these tests (plus I was on vacation). So I've run the tests on 4 machines.
2 with rotational disks with NCQ, 2 with SATA SSD. Results are mostly
neutral, details are below.

For dbench, it seems to be generally neutral but the patches do fix
occasional weird outliers which are IMO caused exactly by bugs in the
heuristics I'm fixing. Things like (see the outlier for 4 clients
with vanilla kernel):

		vanilla			bfq-waker-fixes
Amean 	1 	32.57	( 0.00%)	32.10	( 1.46%)
Amean 	2 	34.73	( 0.00%)	34.68	( 0.15%)
Amean 	4 	199.74	( 0.00%)	45.76	( 77.09%)
Amean 	8 	65.41	( 0.00%)	65.47	( -0.10%)
Amean 	16	95.46	( 0.00%)	96.61	( -1.21%)
Amean 	32	148.07	( 0.00%)	147.66	( 0.27%)
Amean	64	291.17	( 0.00%)	289.44	( 0.59%)

For pgbench and bonnie, patches are neutral for all the machines.

For reaim disk workload, patches are mostly neutral, just on one machine
with SSD they seem to improve XFS results and worsen ext4 results. But
results look rather noisy on that machine so it may be just noise...

For parallel dd(1) processes reading from multiple files, results are also
neutral on all machines.

For parallel dd(1) processes reading from a common file, results are also
neutral except for one machine with SSD on XFS (ext4 was fine) where there
seems to be consistent regression for 4 and more processes:

		vanilla			bfq-waker-fixes
Amean 	1 	393.30	( 0.00%)	391.02	( 0.58%)
Amean 	4 	443.88	( 0.00%)	517.16	( -16.51%)
Amean 	7 	599.60	( 0.00%)	748.68	( -24.86%)
Amean 	12	1134.26	( 0.00%)	1255.62	( -10.70%)
Amean 	21	1940.50	( 0.00%)	2206.29	( -13.70%)
Amean 	30	2381.08	( 0.00%)	2735.69	( -14.89%)
Amean 	48	2754.36	( 0.00%)	3258.93	( -18.32%)

I'll try to reproduce this regression and check what's happening...

So what do you think, are you fine with merging my patches now?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH 3/3] bfq: Use only idle IO periods for think time calculations
From: Jan Kara @ 2020-09-02 15:17 UTC (permalink / raw)
  To: Paolo Valente; +Cc: Jan Kara, linux-block

On Wed 26-08-20 15:54:19, Jan Kara wrote:
> On Mon 27-07-20 09:35:15, Jan Kara wrote:
> > On Wed 22-07-20 11:13:28, Paolo Valente wrote:
> > > > a) I don't think adding these samples to statistics helps in any way (you
> > > > cannot improve the prediction power of the statistics by including in it
> > > > some samples that are not directly related to the thing you try to
> > > > predict). And think time is used to predict the answer to the question: If
> > > > bfq queue becomes idle, how long will it take for new request to arrive? So
> > > > second and further requests are simply irrelevant.
> > > > 
> > > 
> > > Yes, you are super right in theory.
> > > 
> > > Unfortunately this may not mean that your patch will do only good, for
> > > the concerns in my previous email. 
> > > 
> > > So, here is a proposal to move forward:
> > > 1) I test your patch on my typical set of
> > >    latency/guaranteed-bandwidth/total-throughput benchmarks
> > > 2) You test your patch on a significant set of benchmarks in mmtests
> > > 
> > > What do you think?
> > 
> > Sure, I will queue runs for the patches with mmtests :).
> 
> Sorry it took so long but I've hit a couple of technical snags when running
> these tests (plus I was on vacation). So I've run the tests on 4 machines.
> 2 with rotational disks with NCQ, 2 with SATA SSD. Results are mostly
> neutral, details are below.
> 
> For dbench, it seems to be generally neutral but the patches do fix
> occasional weird outlier which are IMO caused exactly by bugs in the
> heuristics I'm fixing. Things like (see the outlier for 4 clients
> with vanilla kernel):
> 
> 		vanilla			bfq-waker-fixes
> Amean 	1 	32.57	( 0.00%)	32.10	( 1.46%)
> Amean 	2 	34.73	( 0.00%)	34.68	( 0.15%)
> Amean 	4 	199.74	( 0.00%)	45.76	( 77.09%)
> Amean 	8 	65.41	( 0.00%)	65.47	( -0.10%)
> Amean 	16	95.46	( 0.00%)	96.61	( -1.21%)
> Amean 	32	148.07	( 0.00%)	147.66	( 0.27%)
> Amean	64	291.17	( 0.00%)	289.44	( 0.59%)
> 
> For pgbench and bonnie, patches are neutral for all the machines.
> 
> For reaim disk workload, patches are mostly neutral, just on one machine
> with SSD they seem to improve XFS results and worsen ext4 results. But
> results look rather noisy on that machine so it may be just a noise...
> 
> For parallel dd(1) processes reading from multiple files, results are also
> neutral all machines.
> 
> For parallel dd(1) processes reading from a common file, results are also
> neutral except for one machine with SSD on XFS (ext4 was fine) where there
> seems to be consistent regression for 4 and more processes:
> 
> 		vanilla			bfq-waker-fixes
> Amean 	1 	393.30	( 0.00%)	391.02	( 0.58%)
> Amean 	4 	443.88	( 0.00%)	517.16	( -16.51%)
> Amean 	7 	599.60	( 0.00%)	748.68	( -24.86%)
> Amean 	12	1134.26	( 0.00%)	1255.62	( -10.70%)
> Amean 	21	1940.50	( 0.00%)	2206.29	( -13.70%)
> Amean 	30	2381.08	( 0.00%)	2735.69	( -14.89%)
> Amean 	48	2754.36	( 0.00%)	3258.93	( -18.32%)
> 
> I'll try to reproduce this regression and check what's happening...
> 
> So what do you think, are you fine with merging my patches now?

Paolo, any results from running your tests for these patches? I'd like to
get these mostly obvious things merged so that we can move on...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH 1/3] bfq: Avoid false bfq queue merging
From: Paolo Valente @ 2021-01-10  9:21 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-block, stable



> Il giorno 11 giu 2020, alle ore 16:12, Paolo Valente <paolo.valente@linaro.org> ha scritto:
> 
> 
> 
>> Il giorno 11 giu 2020, alle ore 10:31, Jan Kara <jack@suse.cz> ha scritto:
>> 
>> On Thu 11-06-20 09:13:07, Paolo Valente wrote:
>>> 
>>> 
>>>> Il giorno 5 giu 2020, alle ore 16:16, Jan Kara <jack@suse.cz> ha scritto:
>>>> 
>>>> bfq_setup_cooperator() uses bfqd->in_serv_last_pos to detect whether it
>>>> makes sense to merge current bfq queue with the in-service queue.
>>>> However if the in-service queue is freshly scheduled and didn't dispatch
>>>> any requests yet, bfqd->in_serv_last_pos is stale and contains value
>>>> from the previously scheduled bfq queue which can thus result in a bogus
>>>> decision that the two queues should be merged.
>>> 
>>> Good catch! 
>>> 
>>>> This bug can be observed
>>>> for example with the following fio jobfile:
>>>> 
>>>> [global]
>>>> direct=0
>>>> ioengine=sync
>>>> invalidate=1
>>>> size=1g
>>>> rw=read
>>>> 
>>>> [reader]
>>>> numjobs=4
>>>> directory=/mnt
>>>> 
>>>> where the 4 processes will end up in the one shared bfq queue although
>>>> they do IO to physically very distant files (for some reason I was able to
>>>> observe this only with slice_idle=1ms setting).
>>>> 
>>>> Fix the problem by invalidating bfqd->in_serv_last_pos when switching
>>>> in-service queue.
>>>> 
>>> 
>>> Apart from the nonexistent problem that even 0 is a valid LBA :)
>> 
>> Yes, I was also thinking about that and decided 0 is "good enough" :). But
>> I just as well just switch to (sector_t)-1 if you think it would be better.
>> 
> 
> 0 is ok :)
> 

Hi Jan,
I've finally tested this patch of yours. No regression.

Once again:
Acked-by: Paolo Valente <paolo.valente@linaro.org>

Thanks,
Paolo

> Thanks,
> Paolo
> 
>>> Acked-by: Paolo Valente <paolo.valente@linaro.org>
>> 
>> Thanks!
>> 
>> 								Honza
>> 
>> -- 
>> Jan Kara <jack@suse.com>
>> SUSE Labs, CR



* Re: [PATCH 3/3] bfq: Use only idle IO periods for think time calculations
From: Paolo Valente @ 2021-01-10  9:23 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-block



> Il giorno 2 set 2020, alle ore 17:17, Jan Kara <jack@suse.cz> ha scritto:
> 
> On Wed 26-08-20 15:54:19, Jan Kara wrote:
>> On Mon 27-07-20 09:35:15, Jan Kara wrote:
>>> On Wed 22-07-20 11:13:28, Paolo Valente wrote:
>>>>> a) I don't think adding these samples to statistics helps in any way (you
>>>>> cannot improve the prediction power of the statistics by including in it
>>>>> some samples that are not directly related to the thing you try to
>>>>> predict). And think time is used to predict the answer to the question: If
>>>>> bfq queue becomes idle, how long will it take for new request to arrive? So
>>>>> second and further requests are simply irrelevant.
>>>>> 
>>>> 
>>>> Yes, you are super right in theory.
>>>> 
>>>> Unfortunately this may not mean that your patch will do only good, for
>>>> the concerns in my previous email. 
>>>> 
>>>> So, here is a proposal to move forward:
>>>> 1) I test your patch on my typical set of
>>>>   latency/guaranteed-bandwidth/total-throughput benchmarks
>>>> 2) You test your patch on a significant set of benchmarks in mmtests
>>>> 
>>>> What do you think?
>>> 
>>> Sure, I will queue runs for the patches with mmtests :).
>> 
>> Sorry it took so long but I've hit a couple of technical snags when running
>> these tests (plus I was on vacation). So I've run the tests on 4 machines.
>> 2 with rotational disks with NCQ, 2 with SATA SSD. Results are mostly
>> neutral, details are below.
>> 
>> For dbench, it seems to be generally neutral but the patches do fix
>> occasional weird outlier which are IMO caused exactly by bugs in the
>> heuristics I'm fixing. Things like (see the outlier for 4 clients
>> with vanilla kernel):
>> 
>> 		vanilla			bfq-waker-fixes
>> Amean 	1 	32.57	( 0.00%)	32.10	( 1.46%)
>> Amean 	2 	34.73	( 0.00%)	34.68	( 0.15%)
>> Amean 	4 	199.74	( 0.00%)	45.76	( 77.09%)
>> Amean 	8 	65.41	( 0.00%)	65.47	( -0.10%)
>> Amean 	16	95.46	( 0.00%)	96.61	( -1.21%)
>> Amean 	32	148.07	( 0.00%)	147.66	( 0.27%)
>> Amean	64	291.17	( 0.00%)	289.44	( 0.59%)
>> 
>> For pgbench and bonnie, patches are neutral for all the machines.
>> 
>> For reaim disk workload, patches are mostly neutral, just on one machine
>> with SSD they seem to improve XFS results and worsen ext4 results. But
>> results look rather noisy on that machine so it may be just a noise...
>> 
>> For parallel dd(1) processes reading from multiple files, results are also
>> neutral all machines.
>> 
>> For parallel dd(1) processes reading from a common file, results are also
>> neutral except for one machine with SSD on XFS (ext4 was fine) where there
>> seems to be consistent regression for 4 and more processes:
>> 
>> 		vanilla			bfq-waker-fixes
>> Amean 	1 	393.30	( 0.00%)	391.02	( 0.58%)
>> Amean 	4 	443.88	( 0.00%)	517.16	( -16.51%)
>> Amean 	7 	599.60	( 0.00%)	748.68	( -24.86%)
>> Amean 	12	1134.26	( 0.00%)	1255.62	( -10.70%)
>> Amean 	21	1940.50	( 0.00%)	2206.29	( -13.70%)
>> Amean 	30	2381.08	( 0.00%)	2735.69	( -14.89%)
>> Amean 	48	2754.36	( 0.00%)	3258.93	( -18.32%)
>> 
>> I'll try to reproduce this regression and check what's happening...
>> 
>> So what do you think, are you fine with merging my patches now?
> 
> Paolo, any results from running your tests for these patches? I'd like to
> get these mostly obvious things merged so that we can move on...
> 

Hi,
sorry again for my delay. Tested this too, at last. No regression. So gladly

Acked-by: Paolo Valente <paolo.valente@linaro.org>

And thank you very much for your contributions and patience,
Paolo

> 								Honza
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR



* Re: [PATCH 3/3] bfq: Use only idle IO periods for think time calculations
From: Jan Kara @ 2021-01-13  9:56 UTC (permalink / raw)
  To: Paolo Valente; +Cc: Jan Kara, linux-block

On Sun 10-01-21 10:23:47, Paolo Valente wrote:
> 
> 
> > Il giorno 2 set 2020, alle ore 17:17, Jan Kara <jack@suse.cz> ha scritto:
> > 
> > On Wed 26-08-20 15:54:19, Jan Kara wrote:
> >> On Mon 27-07-20 09:35:15, Jan Kara wrote:
> >>> On Wed 22-07-20 11:13:28, Paolo Valente wrote:
> >>>>> a) I don't think adding these samples to statistics helps in any way (you
> >>>>> cannot improve the prediction power of the statistics by including in it
> >>>>> some samples that are not directly related to the thing you try to
> >>>>> predict). And think time is used to predict the answer to the question: If
> >>>>> bfq queue becomes idle, how long will it take for new request to arrive? So
> >>>>> second and further requests are simply irrelevant.
> >>>>> 
> >>>> 
> >>>> Yes, you are super right in theory.
> >>>> 
> >>>> Unfortunately this may not mean that your patch will do only good, for
> >>>> the concerns in my previous email. 
> >>>> 
> >>>> So, here is a proposal to move forward:
> >>>> 1) I test your patch on my typical set of
> >>>>   latency/guaranteed-bandwidth/total-throughput benchmarks
> >>>> 2) You test your patch on a significant set of benchmarks in mmtests
> >>>> 
> >>>> What do you think?
> >>> 
> >>> Sure, I will queue runs for the patches with mmtests :).
> >> 
> >> Sorry it took so long but I've hit a couple of technical snags when running
> >> these tests (plus I was on vacation). So I've run the tests on 4 machines.
> >> 2 with rotational disks with NCQ, 2 with SATA SSD. Results are mostly
> >> neutral, details are below.
> >> 
> >> For dbench, it seems to be generally neutral but the patches do fix
> >> occasional weird outlier which are IMO caused exactly by bugs in the
> >> heuristics I'm fixing. Things like (see the outlier for 4 clients
> >> with vanilla kernel):
> >> 
> >> 		vanilla			bfq-waker-fixes
> >> Amean 	1 	32.57	( 0.00%)	32.10	( 1.46%)
> >> Amean 	2 	34.73	( 0.00%)	34.68	( 0.15%)
> >> Amean 	4 	199.74	( 0.00%)	45.76	( 77.09%)
> >> Amean 	8 	65.41	( 0.00%)	65.47	( -0.10%)
> >> Amean 	16	95.46	( 0.00%)	96.61	( -1.21%)
> >> Amean 	32	148.07	( 0.00%)	147.66	( 0.27%)
> >> Amean	64	291.17	( 0.00%)	289.44	( 0.59%)
> >> 
> >> For pgbench and bonnie, patches are neutral for all the machines.
> >> 
> >> For reaim disk workload, patches are mostly neutral, just on one machine
> >> with SSD they seem to improve XFS results and worsen ext4 results. But
> >> results look rather noisy on that machine so it may be just a noise...
> >> 
> >> For parallel dd(1) processes reading from multiple files, results are also
> >> neutral all machines.
> >> 
> >> For parallel dd(1) processes reading from a common file, results are also
> >> neutral except for one machine with SSD on XFS (ext4 was fine) where there
> >> seems to be consistent regression for 4 and more processes:
> >> 
> >> 		vanilla			bfq-waker-fixes
> >> Amean 	1 	393.30	( 0.00%)	391.02	( 0.58%)
> >> Amean 	4 	443.88	( 0.00%)	517.16	( -16.51%)
> >> Amean 	7 	599.60	( 0.00%)	748.68	( -24.86%)
> >> Amean 	12	1134.26	( 0.00%)	1255.62	( -10.70%)
> >> Amean 	21	1940.50	( 0.00%)	2206.29	( -13.70%)
> >> Amean 	30	2381.08	( 0.00%)	2735.69	( -14.89%)
> >> Amean 	48	2754.36	( 0.00%)	3258.93	( -18.32%)
> >> 
> >> I'll try to reproduce this regression and check what's happening...
> >> 
> >> So what do you think, are you fine with merging my patches now?
> > 
> > Paolo, any results from running your tests for these patches? I'd like to
> > get these mostly obvious things merged so that we can move on...
> > 
> 
> Hi,
> sorry again for my delay. Tested this too, at last. No regression. So gladly
> 
> Acked-by: Paolo Valente <paolo.valente@linaro.org>
> 
> And thank you very much for your contributions and patience,

Thanks. I'll add you ack and send all the patches to Jens.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* [PATCH 0/3 v2] bfq: Two fixes and a cleanup for sequential readers
From: Jan Kara @ 2021-01-13 10:09 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Paolo Valente, Jan Kara

Hello,

this patch series contains one tiny cleanup and two relatively simple fixes
for BFQ I've uncovered when analyzing behavior of four parallel sequential
readers on one machine. The fio jobfile is like:

[global]
direct=0
ioengine=sync
invalidate=1
size=16g
rw=read
bs=4k

[reader]
numjobs=4
directory=/mnt

The first patch fixes a problem due to which the four bfq queues were getting
merged without a reason. The third patch fixes a problem where we were unfairly
raising bfq queue think time (leading to clearing of short_ttime and subsequent
reduction in throughput).

Jens, since Paolo has acked all the patches, can you please merge them?

								Honza


* [PATCH 1/3] bfq: Avoid false bfq queue merging
From: Jan Kara @ 2021-01-13 10:09 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Paolo Valente, Jan Kara, stable

bfq_setup_cooperator() uses bfqd->in_serv_last_pos to detect whether it
makes sense to merge the current bfq queue with the in-service queue.
However, if the in-service queue is freshly scheduled and didn't dispatch
any requests yet, bfqd->in_serv_last_pos is stale and contains a value
from the previously scheduled bfq queue, which can thus result in a bogus
decision that the two queues should be merged. This bug can be observed
for example with the following fio jobfile:

[global]
direct=0
ioengine=sync
invalidate=1
size=1g
rw=read

[reader]
numjobs=4
directory=/mnt

where the 4 processes will end up in the one shared bfq queue although
they do IO to physically very distant files (for some reason I was able to
observe this only with slice_idle=1ms setting).

Fix the problem by invalidating bfqd->in_serv_last_pos when switching
in-service queue.

Fixes: 058fdecc6de7 ("block, bfq: fix in-service-queue check for queue merging")
CC: stable@vger.kernel.org
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/bfq-iosched.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 9e4eb0fc1c16..dc5f12d0d482 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2937,6 +2937,7 @@ static void __bfq_set_in_service_queue(struct bfq_data *bfqd,
 	}
 
 	bfqd->in_service_queue = bfqq;
+	bfqd->in_serv_last_pos = 0;
 }
 
 /*
-- 
2.26.2



* [PATCH 2/3] bfq: Use 'ttime' local variable
From: Jan Kara @ 2021-01-13 10:09 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Paolo Valente, Jan Kara

Use local variable 'ttime' instead of dereferencing bfqq.

Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/bfq-iosched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index dc5f12d0d482..d0b744eae1a3 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -5199,7 +5199,7 @@ static void bfq_update_io_thinktime(struct bfq_data *bfqd,
 
 	elapsed = min_t(u64, elapsed, 2ULL * bfqd->bfq_slice_idle);
 
-	ttime->ttime_samples = (7*bfqq->ttime.ttime_samples + 256) / 8;
+	ttime->ttime_samples = (7*ttime->ttime_samples + 256) / 8;
 	ttime->ttime_total = div_u64(7*ttime->ttime_total + 256*elapsed,  8);
 	ttime->ttime_mean = div64_ul(ttime->ttime_total + 128,
 				     ttime->ttime_samples);
-- 
2.26.2



* [PATCH 3/3] bfq: Use only idle IO periods for think time calculations
From: Jan Kara @ 2021-01-13 10:09 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Paolo Valente, Jan Kara

Currently, whenever a bfq queue has a request queued, we add now -
last_completion_time to the think time statistics. This is however
misleading in case the process is able to submit several requests in
parallel, because e.g. if the queue has a request completed at time T0 and
then queues new requests at times T1, T2, then we will add T1-T0 and
T2-T0 to the think time statistics, which just doesn't make any sense (the
queue's think time is penalized by the queue being able to submit more
IO). So add to the think time statistics only time intervals when the queue
had no IO pending.

Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/bfq-iosched.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index d0b744eae1a3..53e6e76b98af 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -5195,8 +5195,16 @@ static void bfq_update_io_thinktime(struct bfq_data *bfqd,
 				    struct bfq_queue *bfqq)
 {
 	struct bfq_ttime *ttime = &bfqq->ttime;
-	u64 elapsed = ktime_get_ns() - bfqq->ttime.last_end_request;
+	u64 elapsed;
 
+	/*
+	 * We are really interested in how long it takes for the queue to
+	 * become busy when there is no outstanding IO for this queue. So
+	 * ignore cases when the bfq queue has already IO queued.
+	 */
+	if (bfqq->dispatched || bfq_bfqq_busy(bfqq))
+		return;
+	elapsed = ktime_get_ns() - bfqq->ttime.last_end_request;
 	elapsed = min_t(u64, elapsed, 2ULL * bfqd->bfq_slice_idle);
 
 	ttime->ttime_samples = (7*ttime->ttime_samples + 256) / 8;
-- 
2.26.2



* Re: [PATCH 0/3 v2] bfq: Two fixes and a cleanup for sequential readers
From: Jan Kara @ 2021-01-27 15:27 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Paolo Valente, Jan Kara

On Wed 13-01-21 11:09:24, Jan Kara wrote:
> Hello,
> 
> this patch series contains one tiny cleanup and two relatively simple fixes
> for BFQ I've uncovered when analyzing behavior of four parallel sequential
> readers on one machine. The fio jobfile is like:
> 
> [global]
> direct=0
> ioengine=sync
> invalidate=1
> size=16g
> rw=read
> bs=4k
> 
> [reader]
> numjobs=4
> directory=/mnt
> 
> The first patch fixes a problem due to which the four bfq queues were getting
> merged without a reason. The third patch fixes a problem where we were unfairly
> raising bfq queue think time (leading to clearing of short_ttime and subsequent
> reduction in throughput).
> 
> Jens, since Paolo has acked all the patches, can you please merge them?

Jens, ping? Can you please pick up these fixes? Thanks!

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH 0/3 v2] bfq: Two fixes and a cleanup for sequential readers
From: Jens Axboe @ 2021-01-27 16:16 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-block, Paolo Valente

On 1/27/21 8:27 AM, Jan Kara wrote:
> On Wed 13-01-21 11:09:24, Jan Kara wrote:
>> Hello,
>>
>> this patch series contains one tiny cleanup and two relatively simple fixes
>> for BFQ I've uncovered when analyzing behavior of four parallel sequential
>> readers on one machine. The fio jobfile is like:
>>
>> [global]
>> direct=0
>> ioengine=sync
>> invalidate=1
>> size=16g
>> rw=read
>> bs=4k
>>
>> [reader]
>> numjobs=4
>> directory=/mnt
>>
>> The first patch fixes a problem due to which the four bfq queues were getting
>> merged without a reason. The third patch fixes a problem where we were unfairly
>> raising bfq queue think time (leading to clearing of short_ttime and subsequent
>> reduction in throughput).
>>
>> Jens, since Paolo has acked all the patches, can you please merge them?
> 
> Jens, ping? Can you please pick up these fixes? Thanks!

Yep, I've queued them up, thanks.

-- 
Jens Axboe




Thread overview: 23+ messages
2020-06-05 14:16 [PATCH 0/3] bfq: Two fixes and a cleanup for sequential readers Jan Kara
2021-01-13 10:09 ` [PATCH 0/3 v2] " Jan Kara
2020-06-05 14:16 ` [PATCH 1/3] bfq: Avoid false bfq queue merging Jan Kara
2020-06-11  7:13   ` Paolo Valente
2020-06-11  8:31     ` Jan Kara
2020-06-11 14:12       ` Paolo Valente
2021-01-10  9:21         ` Paolo Valente
2020-06-05 14:16 ` [PATCH 2/3] bfq: Use 'ttime' local variable Jan Kara
2020-06-11  7:14   ` Paolo Valente
2020-06-05 14:16 ` [PATCH 3/3] bfq: Use only idle IO periods for think time calculations Jan Kara
2020-06-11 14:11   ` Paolo Valente
2020-06-11 14:39     ` Jan Kara
2020-07-22  9:13       ` Paolo Valente
2020-07-27  7:35         ` Jan Kara
2020-08-26 13:54           ` Jan Kara
2020-09-02 15:17             ` Jan Kara
2021-01-10  9:23               ` Paolo Valente
2021-01-13  9:56                 ` Jan Kara
2021-01-13 10:09 ` [PATCH 1/3] bfq: Avoid false bfq queue merging Jan Kara
2021-01-13 10:09 ` [PATCH 2/3] bfq: Use 'ttime' local variable Jan Kara
2021-01-13 10:09 ` [PATCH 3/3] bfq: Use only idle IO periods for think time calculations Jan Kara
2021-01-27 15:27 ` [PATCH 0/3 v2] bfq: Two fixes and a cleanup for sequential readers Jan Kara
2021-01-27 16:16   ` Jens Axboe
