[PATCH BUGFIX/IMPROVEMENTS 0/4] bfq: fix regressions on slow virtual machines

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH BUGFIX/IMPROVEMENTS 0/4] bfq: fix regressions on slow virtual machines
@ 2018-05-31 14:45 Paolo Valente
  2018-05-31 14:45 ` [PATCH BUGFIX/IMPROVEMENTS 1/4] block, bfq: add description of weight-raising heuristics Paolo Valente
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Paolo Valente @ 2018-05-31 14:45 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-kernel, ulf.hansson, broonie, linus.walleij,
	bfq-iosched, oleksandr, sapienza.dav, 177992, Paolo Valente

Hi,
we have bumped into some BFQ latency regressions on slow virtual
machines. The patches in this series fix all these regressions, apart
from the first patch, which provides an indirect contribution, by
improving the description of the mechanisms affected by these
regressions.

Thanks,
Paolo and Davide


Davide Sapienza (2):
  block, bfq: increase weight-raising duration for interactive apps
  block, bfq: prevent soft_rt_next_start from being stuck at infinity

Paolo Valente (2):
  block, bfq: add description of weight-raising heuristics
  block, bfq: remove slow-system class

 block/bfq-iosched.c | 252 ++++++++++++++++++++++------------------------------
 block/bfq-iosched.h |  14 +--
 2 files changed, 109 insertions(+), 157 deletions(-)

--
2.16.1

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH BUGFIX/IMPROVEMENTS 1/4] block, bfq: add description of weight-raising heuristics
  2018-05-31 14:45 [PATCH BUGFIX/IMPROVEMENTS 0/4] bfq: fix regressions on slow virtual machines Paolo Valente
@ 2018-05-31 14:45 ` Paolo Valente
  2018-05-31 14:45 ` [PATCH BUGFIX/IMPROVEMENTS 2/4] block, bfq: remove slow-system class Paolo Valente
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Paolo Valente @ 2018-05-31 14:45 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-kernel, ulf.hansson, broonie, linus.walleij,
	bfq-iosched, oleksandr, sapienza.dav, 177992, Paolo Valente

A description of how weight raising works is missing in BFQ
sources. In addition, the code for handling weight raising is
scattered across a few functions. This makes it rather hard to
understand the mechanism and its rationale. This commits adds such a
description at the beginning of the main source file.

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
---
 block/bfq-iosched.c | 80 +++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 56 insertions(+), 24 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index f71a5846b629..f3703e7431aa 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -49,9 +49,39 @@
  *
  * In particular, to provide these low-latency guarantees, BFQ
  * explicitly privileges the I/O of two classes of time-sensitive
- * applications: interactive and soft real-time. This feature enables
- * BFQ to provide applications in these classes with a very low
- * latency. Finally, BFQ also features additional heuristics for
+ * applications: interactive and soft real-time. In more detail, BFQ
+ * behaves this way if the low_latency parameter is set (default
+ * configuration). This feature enables BFQ to provide applications in
+ * these classes with a very low latency.
+ *
+ * To implement this feature, BFQ constantly tries to detect whether
+ * the I/O requests in a bfq_queue come from an interactive or a soft
+ * real-time application. For brevity, in these cases, the queue is
+ * said to be interactive or soft real-time. In both cases, BFQ
+ * privileges the service of the queue, over that of non-interactive
+ * and non-soft-real-time queues. This privileging is performed,
+ * mainly, by raising the weight of the queue. So, for brevity, we
+ * call just weight-raising periods the time periods during which a
+ * queue is privileged, because deemed interactive or soft real-time.
+ *
+ * The detection of soft real-time queues/applications is described in
+ * detail in the comments on the function
+ * bfq_bfqq_softrt_next_start. On the other hand, the detection of an
+ * interactive queue works as follows: a queue is deemed interactive
+ * if it is constantly non empty only for a limited time interval,
+ * after which it does become empty. The queue may be deemed
+ * interactive again (for a limited time), if it restarts being
+ * constantly non empty, provided that this happens only after the
+ * queue has remained empty for a given minimum idle time.
+ *
+ * By default, BFQ computes automatically the above maximum time
+ * interval, i.e., the time interval after which a constantly
+ * non-empty queue stops being deemed interactive. Since a queue is
+ * weight-raised while it is deemed interactive, this maximum time
+ * interval happens to coincide with the (maximum) duration of the
+ * weight-raising for interactive queues.
+ *
+ * Finally, BFQ also features additional heuristics for
  * preserving both a low latency and a high throughput on NCQ-capable,
  * rotational or flash-based devices, and to get the job done quickly
  * for applications consisting in many I/O-bound processes.
@@ -61,14 +91,14 @@
  * all low-latency heuristics for that device, by setting low_latency
  * to 0.
  *
- * BFQ is described in [1], where also a reference to the initial, more
- * theoretical paper on BFQ can be found. The interested reader can find
- * in the latter paper full details on the main algorithm, as well as
- * formulas of the guarantees and formal proofs of all the properties.
- * With respect to the version of BFQ presented in these papers, this
- * implementation adds a few more heuristics, such as the one that
- * guarantees a low latency to soft real-time applications, and a
- * hierarchical extension based on H-WF2Q+.
+ * BFQ is described in [1], where also a reference to the initial,
+ * more theoretical paper on BFQ can be found. The interested reader
+ * can find in the latter paper full details on the main algorithm, as
+ * well as formulas of the guarantees and formal proofs of all the
+ * properties.  With respect to the version of BFQ presented in these
+ * papers, this implementation adds a few more heuristics, such as the
+ * ones that guarantee a low latency to interactive and soft real-time
+ * applications, and a hierarchical extension based on H-WF2Q+.
  *
  * B-WF2Q+ is based on WF2Q+, which is described in [2], together with
  * H-WF2Q+, while the augmented tree used here to implement B-WF2Q+
@@ -218,21 +248,23 @@ static struct kmem_cache *bfq_pool;
 #define BFQ_RATE_SHIFT		16
 
 /*
- * By default, BFQ computes the duration of the weight raising for
- * interactive applications automatically, using the following formula:
- * duration = (R / r) * T, where r is the peak rate of the device, and
- * R and T are two reference parameters.
- * In particular, R is the peak rate of the reference device (see
- * below), and T is a reference time: given the systems that are
- * likely to be installed on the reference device according to its
- * speed class, T is about the maximum time needed, under BFQ and
+ * When configured for computing the duration of the weight-raising
+ * for interactive queues automatically (see the comments at the
+ * beginning of this file), BFQ does it using the following formula:
+ * duration = (R / r) * T,
+ * where r is the peak rate of the device, and R
+ * and T are two reference parameters.	In particular,
+ * R is the peak rate of the reference device (see below), and
+ * T is a reference time: given the systems that are likely
+ * to be installed on the reference device according to its speed
+ * class, T is about the maximum time needed, under BFQ and
  * while reading two files in parallel, to load typical large
  * applications on these systems (see the comments on
- * max_service_from_wr below, for more details on how T is obtained).
- * In practice, the slower/faster the device at hand is, the more/less
- * it takes to load applications with respect to the reference device.
- * Accordingly, the longer/shorter BFQ grants weight raising to
- * interactive applications.
+ * max_service_from_wr below, for more details on how T is
+ * obtained).  In practice, the slower/faster the device at hand is,
+ * the more/less it takes to load applications with respect to the
+ * reference device.  Accordingly, the longer/shorter BFQ grants
+ * weight raising to interactive applications.
  *
  * BFQ uses four different reference pairs (R, T), depending on:
  * . whether the device is rotational or non-rotational;
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH BUGFIX/IMPROVEMENTS 2/4] block, bfq: remove slow-system class
  2018-05-31 14:45 [PATCH BUGFIX/IMPROVEMENTS 0/4] bfq: fix regressions on slow virtual machines Paolo Valente
  2018-05-31 14:45 ` [PATCH BUGFIX/IMPROVEMENTS 1/4] block, bfq: add description of weight-raising heuristics Paolo Valente
@ 2018-05-31 14:45 ` Paolo Valente
  2018-05-31 14:45 ` [PATCH BUGFIX/IMPROVEMENTS 3/4] block, bfq: increase weight-raising duration for interactive apps Paolo Valente
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Paolo Valente @ 2018-05-31 14:45 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-kernel, ulf.hansson, broonie, linus.walleij,
	bfq-iosched, oleksandr, sapienza.dav, 177992, Paolo Valente

BFQ computes the duration of weight raising for interactive
applications automatically, using some reference parameters. In
particular, BFQ uses the best durations (see comments in the code for
how these durations have been assessed) for two classes of systems:
slow and fast ones. Examples of slow systems are old phones or systems
using micro HDDs. Fast systems are all the remaining ones. Using these
parameters, BFQ computes the actual duration of the weight raising,
for the system at hand, as a function of the relative speed of the
system w.r.t. the speed of a reference system, belonging to the same
class of systems as the system at hand.

This slow vs fast differentiation proved to be useful in the past, but
happens to have little meaning with current hardware. Even worse, it
does cause problems in virtual systems, where the speed of the system
can vary frequently, and so widely to just confuse the class-detection
mechanism, and, as we have verified experimentally, to cause BFQ to
compute non-sensical weight-raising durations.

This commit addresses this issue by removing the slow class and the
class-detection mechanism.

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
---
 block/bfq-iosched.c | 137 ++++++++++++++++------------------------------------
 block/bfq-iosched.h |  14 ++----
 2 files changed, 46 insertions(+), 105 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index f3703e7431aa..262c929e24ee 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -251,55 +251,43 @@ static struct kmem_cache *bfq_pool;
  * When configured for computing the duration of the weight-raising
  * for interactive queues automatically (see the comments at the
  * beginning of this file), BFQ does it using the following formula:
- * duration = (R / r) * T,
- * where r is the peak rate of the device, and R
- * and T are two reference parameters.	In particular,
- * R is the peak rate of the reference device (see below), and
- * T is a reference time: given the systems that are likely
- * to be installed on the reference device according to its speed
- * class, T is about the maximum time needed, under BFQ and
- * while reading two files in parallel, to load typical large
- * applications on these systems (see the comments on
- * max_service_from_wr below, for more details on how T is
- * obtained).  In practice, the slower/faster the device at hand is,
- * the more/less it takes to load applications with respect to the
+ * duration = (ref_rate / r) * ref_wr_duration,
+ * where r is the peak rate of the device, and ref_rate and
+ * ref_wr_duration are two reference parameters.  In particular,
+ * ref_rate is the peak rate of the reference storage device (see
+ * below), and ref_wr_duration is about the maximum time needed, with
+ * BFQ and while reading two files in parallel, to load typical large
+ * applications on the reference device (see the comments on
+ * max_service_from_wr below, for more details on how ref_wr_duration
+ * is obtained).  In practice, the slower/faster the device at hand
+ * is, the more/less it takes to load applications with respect to the
  * reference device.  Accordingly, the longer/shorter BFQ grants
  * weight raising to interactive applications.
  *
- * BFQ uses four different reference pairs (R, T), depending on:
- * . whether the device is rotational or non-rotational;
- * . whether the device is slow, such as old or portable HDDs, as well as
- *   SD cards, or fast, such as newer HDDs and SSDs.
+ * BFQ uses two different reference pairs (ref_rate, ref_wr_duration),
+ * depending on whether the device is rotational or non-rotational.
  *
- * The device's speed class is dynamically (re)detected in
- * bfq_update_peak_rate() every time the estimated peak rate is updated.
+ * In the following definitions, ref_rate[0] and ref_wr_duration[0]
+ * are the reference values for a rotational device, whereas
+ * ref_rate[1] and ref_wr_duration[1] are the reference values for a
+ * non-rotational device. The reference rates are not the actual peak
+ * rates of the devices used as a reference, but slightly lower
+ * values. The reason for using slightly lower values is that the
+ * peak-rate estimator tends to yield slightly lower values than the
+ * actual peak rate (it can yield the actual peak rate only if there
+ * is only one process doing I/O, and the process does sequential
+ * I/O).
  *
- * In the following definitions, R_slow[0]/R_fast[0] and
- * T_slow[0]/T_fast[0] are the reference values for a slow/fast
- * rotational device, whereas R_slow[1]/R_fast[1] and
- * T_slow[1]/T_fast[1] are the reference values for a slow/fast
- * non-rotational device. Finally, device_speed_thresh are the
- * thresholds used to switch between speed classes. The reference
- * rates are not the actual peak rates of the devices used as a
- * reference, but slightly lower values. The reason for using these
- * slightly lower values is that the peak-rate estimator tends to
- * yield slightly lower values than the actual peak rate (it can yield
- * the actual peak rate only if there is only one process doing I/O,
- * and the process does sequential I/O).
- *
- * Both the reference peak rates and the thresholds are measured in
- * sectors/usec, left-shifted by BFQ_RATE_SHIFT.
+ * The reference peak rates are measured in sectors/usec, left-shifted
+ * by BFQ_RATE_SHIFT.
  */
-static int R_slow[2] = {1000, 10700};
-static int R_fast[2] = {14000, 33000};
+static int ref_rate[2] = {14000, 33000};
 /*
- * To improve readability, a conversion function is used to initialize the
- * following arrays, which entails that they can be initialized only in a
- * function.
+ * To improve readability, a conversion function is used to initialize
+ * the following array, which entails that the array can be
+ * initialized only in a function.
  */
-static int T_slow[2];
-static int T_fast[2];
-static int device_speed_thresh[2];
+static int ref_wr_duration[2];
 
 /*
  * BFQ uses the above-detailed, time-based weight-raising mechanism to
@@ -938,7 +926,7 @@ static unsigned int bfq_wr_duration(struct bfq_data *bfqd)
 	if (bfqd->bfq_wr_max_time > 0)
 		return bfqd->bfq_wr_max_time;
 
-	dur = bfqd->RT_prod;
+	dur = bfqd->rate_dur_prod;
 	do_div(dur, bfqd->peak_rate);
 
 	/*
@@ -2543,37 +2531,15 @@ static unsigned long bfq_calc_max_budget(struct bfq_data *bfqd)
 /*
  * Update parameters related to throughput and responsiveness, as a
  * function of the estimated peak rate. See comments on
- * bfq_calc_max_budget(), and on T_slow and T_fast arrays.
+ * bfq_calc_max_budget(), and on the ref_wr_duration array.
  */
 static void update_thr_responsiveness_params(struct bfq_data *bfqd)
 {
-	int dev_type = blk_queue_nonrot(bfqd->queue);
-
-	if (bfqd->bfq_user_max_budget == 0)
+	if (bfqd->bfq_user_max_budget == 0) {
 		bfqd->bfq_max_budget =
 			bfq_calc_max_budget(bfqd);
-
-	if (bfqd->device_speed == BFQ_BFQD_FAST &&
-	    bfqd->peak_rate < device_speed_thresh[dev_type]) {
-		bfqd->device_speed = BFQ_BFQD_SLOW;
-		bfqd->RT_prod = R_slow[dev_type] *
-			T_slow[dev_type];
-	} else if (bfqd->device_speed == BFQ_BFQD_SLOW &&
-		   bfqd->peak_rate > device_speed_thresh[dev_type]) {
-		bfqd->device_speed = BFQ_BFQD_FAST;
-		bfqd->RT_prod = R_fast[dev_type] *
-			T_fast[dev_type];
+		bfq_log(bfqd, "new max_budget = %d", bfqd->bfq_max_budget);
 	}
-
-	bfq_log(bfqd,
-"dev_type %s dev_speed_class = %s (%llu sects/sec), thresh %llu setcs/sec",
-		dev_type == 0 ? "ROT" : "NONROT",
-		bfqd->device_speed == BFQ_BFQD_FAST ? "FAST" : "SLOW",
-		bfqd->device_speed == BFQ_BFQD_FAST ?
-		(USEC_PER_SEC*(u64)R_fast[dev_type])>>BFQ_RATE_SHIFT :
-		(USEC_PER_SEC*(u64)R_slow[dev_type])>>BFQ_RATE_SHIFT,
-		(USEC_PER_SEC*(u64)device_speed_thresh[dev_type])>>
-		BFQ_RATE_SHIFT);
 }
 
 static void bfq_reset_rate_computation(struct bfq_data *bfqd,
@@ -5279,14 +5245,12 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
 	bfqd->wr_busy_queues = 0;
 
 	/*
-	 * Begin by assuming, optimistically, that the device is a
-	 * high-speed one, and that its peak rate is equal to 2/3 of
-	 * the highest reference rate.
+	 * Begin by assuming, optimistically, that the device peak
+	 * rate is equal to 2/3 of the highest reference rate.
 	 */
-	bfqd->RT_prod = R_fast[blk_queue_nonrot(bfqd->queue)] *
-			T_fast[blk_queue_nonrot(bfqd->queue)];
-	bfqd->peak_rate = R_fast[blk_queue_nonrot(bfqd->queue)] * 2 / 3;
-	bfqd->device_speed = BFQ_BFQD_FAST;
+	bfqd->rate_dur_prod = ref_rate[blk_queue_nonrot(bfqd->queue)] *
+		ref_wr_duration[blk_queue_nonrot(bfqd->queue)];
+	bfqd->peak_rate = ref_rate[blk_queue_nonrot(bfqd->queue)] * 2 / 3;
 
 	spin_lock_init(&bfqd->lock);
 
@@ -5593,8 +5557,8 @@ static int __init bfq_init(void)
 	/*
 	 * Times to load large popular applications for the typical
 	 * systems installed on the reference devices (see the
-	 * comments before the definitions of the next two
-	 * arrays). Actually, we use slightly slower values, as the
+	 * comments before the definition of the next
+	 * array). Actually, we use slightly lower values, as the
 	 * estimated peak rate tends to be smaller than the actual
 	 * peak rate.  The reason for this last fact is that estimates
 	 * are computed over much shorter time intervals than the long
@@ -5603,25 +5567,8 @@ static int __init bfq_init(void)
 	 * scheduler cannot rely on a peak-rate-evaluation workload to
 	 * be run for a long time.
 	 */
-	T_slow[0] = msecs_to_jiffies(3500); /* actually 4 sec */
-	T_slow[1] = msecs_to_jiffies(6000); /* actually 6.5 sec */
-	T_fast[0] = msecs_to_jiffies(7000); /* actually 8 sec */
-	T_fast[1] = msecs_to_jiffies(2500); /* actually 3 sec */
-
-	/*
-	 * Thresholds that determine the switch between speed classes
-	 * (see the comments before the definition of the array
-	 * device_speed_thresh). These thresholds are biased towards
-	 * transitions to the fast class. This is safer than the
-	 * opposite bias. In fact, a wrong transition to the slow
-	 * class results in short weight-raising periods, because the
-	 * speed of the device then tends to be higher that the
-	 * reference peak rate. On the opposite end, a wrong
-	 * transition to the fast class tends to increase
-	 * weight-raising periods, because of the opposite reason.
-	 */
-	device_speed_thresh[0] = (4 * R_slow[0]) / 3;
-	device_speed_thresh[1] = (4 * R_slow[1]) / 3;
+	ref_wr_duration[0] = msecs_to_jiffies(7000); /* actually 8 sec */
+	ref_wr_duration[1] = msecs_to_jiffies(2500); /* actually 3 sec */
 
 	ret = elv_register(&iosched_bfq_mq);
 	if (ret)
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index ae2f3dadec44..b5678cc8cfa1 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -399,11 +399,6 @@ struct bfq_io_cq {
 	struct bfq_ttime saved_ttime;
 };
 
-enum bfq_device_speed {
-	BFQ_BFQD_FAST,
-	BFQ_BFQD_SLOW,
-};
-
 /**
  * struct bfq_data - per-device data structure.
  *
@@ -611,12 +606,11 @@ struct bfq_data {
 	/* Max service-rate for a soft real-time queue, in sectors/sec */
 	unsigned int bfq_wr_max_softrt_rate;
 	/*
-	 * Cached value of the product R*T, used for computing the
-	 * maximum duration of weight raising automatically.
+	 * Cached value of the product ref_rate*ref_wr_duration, used
+	 * for computing the maximum duration of weight raising
+	 * automatically.
 	 */
-	u64 RT_prod;
-	/* device-speed class for the low-latency heuristic */
-	enum bfq_device_speed device_speed;
+	u64 rate_dur_prod;
 
 	/* fallback dummy bfqq for extreme OOM conditions */
 	struct bfq_queue oom_bfqq;
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH BUGFIX/IMPROVEMENTS 3/4] block, bfq: increase weight-raising duration for interactive apps
  2018-05-31 14:45 [PATCH BUGFIX/IMPROVEMENTS 0/4] bfq: fix regressions on slow virtual machines Paolo Valente
  2018-05-31 14:45 ` [PATCH BUGFIX/IMPROVEMENTS 1/4] block, bfq: add description of weight-raising heuristics Paolo Valente
  2018-05-31 14:45 ` [PATCH BUGFIX/IMPROVEMENTS 2/4] block, bfq: remove slow-system class Paolo Valente
@ 2018-05-31 14:45 ` Paolo Valente
  2018-05-31 14:45 ` [PATCH BUGFIX/IMPROVEMENTS 4/4] block, bfq: prevent soft_rt_next_start from being stuck at infinity Paolo Valente
  2018-05-31 14:56 ` [PATCH BUGFIX/IMPROVEMENTS 0/4] bfq: fix regressions on slow virtual machines Jens Axboe
  4 siblings, 0 replies; 6+ messages in thread
From: Paolo Valente @ 2018-05-31 14:45 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-kernel, ulf.hansson, broonie, linus.walleij,
	bfq-iosched, oleksandr, sapienza.dav, 177992, Paolo Valente

From: Davide Sapienza <sapienza.dav@gmail.com>

The maximum possible duration of the weight-raising period for
interactive applications is limited to 13 seconds, as this is the time
needed to load the largest application that we considered when tuning
weight raising. Unfortunately, in such an evaluation, we did not
consider the case of very slow virtual machines.

For example, on a QEMU/KVM virtual machine
- running in a slow PC;
- with a virtual disk stacked on a slow low-end 5400rpm HDD;
- serving a heavy I/O workload, such as the sequential reading of
several files;
mplayer takes 23 seconds to start, if constantly weight-raised.

To address this issue, this commit conservatively sets the upper limit
for weight-raising duration to 25 seconds.

Signed-off-by: Davide Sapienza <sapienza.dav@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
---
 block/bfq-iosched.c | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 262c929e24ee..31ce089d54eb 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -930,22 +930,26 @@ static unsigned int bfq_wr_duration(struct bfq_data *bfqd)
 	do_div(dur, bfqd->peak_rate);
 
 	/*
-	 * Limit duration between 3 and 13 seconds. Tests show that
-	 * higher values than 13 seconds often yield the opposite of
-	 * the desired result, i.e., worsen responsiveness by letting
-	 * non-interactive and non-soft-real-time applications
-	 * preserve weight raising for a too long time interval.
+	 * Limit duration between 3 and 25 seconds. The upper limit
+	 * has been conservatively set after the following worst case:
+	 * on a QEMU/KVM virtual machine
+	 * - running in a slow PC
+	 * - with a virtual disk stacked on a slow low-end 5400rpm HDD
+	 * - serving a heavy I/O workload, such as the sequential reading
+	 *   of several files
+	 * mplayer took 23 seconds to start, if constantly weight-raised.
+	 *
+	 * As for higher values than that accomodating the above bad
+	 * scenario, tests show that higher values would often yield
+	 * the opposite of the desired result, i.e., would worsen
+	 * responsiveness by allowing non-interactive applications to
+	 * preserve weight raising for too long.
 	 *
 	 * On the other end, lower values than 3 seconds make it
 	 * difficult for most interactive tasks to complete their jobs
 	 * before weight-raising finishes.
 	 */
-	if (dur > msecs_to_jiffies(13000))
-		dur = msecs_to_jiffies(13000);
-	else if (dur < msecs_to_jiffies(3000))
-		dur = msecs_to_jiffies(3000);
-
-	return dur;
+	return clamp_val(dur, msecs_to_jiffies(3000), msecs_to_jiffies(25000));
 }
 
 /* switch back from soft real-time to interactive weight raising */
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH BUGFIX/IMPROVEMENTS 4/4] block, bfq: prevent soft_rt_next_start from being stuck at infinity
  2018-05-31 14:45 [PATCH BUGFIX/IMPROVEMENTS 0/4] bfq: fix regressions on slow virtual machines Paolo Valente
                   ` (2 preceding siblings ...)
  2018-05-31 14:45 ` [PATCH BUGFIX/IMPROVEMENTS 3/4] block, bfq: increase weight-raising duration for interactive apps Paolo Valente
@ 2018-05-31 14:45 ` Paolo Valente
  2018-05-31 14:56 ` [PATCH BUGFIX/IMPROVEMENTS 0/4] bfq: fix regressions on slow virtual machines Jens Axboe
  4 siblings, 0 replies; 6+ messages in thread
From: Paolo Valente @ 2018-05-31 14:45 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-kernel, ulf.hansson, broonie, linus.walleij,
	bfq-iosched, oleksandr, sapienza.dav, 177992, Paolo Valente

From: Davide Sapienza <sapienza.dav@gmail.com>

BFQ can deem a bfq_queue as soft real-time only if the queue
- periodically becomes completely idle, i.e., empty and with
  no still-outstanding I/O request;
- after becoming idle, gets new I/O only after a special reference
  time soft_rt_next_start.

In this respect, after commit "block, bfq: consider also past I/O in
soft real-time detection", the value of soft_rt_next_start can never
decrease. This causes a problem with the following special updating
case for soft_rt_next_start: to prevent queues that are not completely
idle to be wrongly detected as soft real-time (when they become
non-empty again), soft_rt_next_start is temporarily set to infinity
for empty queues with still outstanding I/O requests. But, if such an
update is actually performed, then, because of the above commit,
soft_rt_next_start will be stuck at infinity forever, and the queue
will have no more chance to be considered soft real-time.

On slow systems, this problem does cause actual soft real-time
applications to be occasionally not detected as such.

This commit addresses this issue by eliminating the pushing of
soft_rt_next_start to infinity, and by changing the way non-empty
queues are prevented from being wrongly detected as soft
real-time. Simply, a queue that becomes non-empty again can now be
detected as soft real-time only if it has no outstanding I/O request.

Signed-off-by: Davide Sapienza <sapienza.dav@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
---
 block/bfq-iosched.c | 29 ++---------------------------
 1 file changed, 2 insertions(+), 27 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 31ce089d54eb..37cadc643c56 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -1416,15 +1416,6 @@ static bool bfq_bfqq_update_budg_for_activation(struct bfq_data *bfqd,
 	return wr_or_deserves_wr;
 }
 
-/*
- * Return the farthest future time instant according to jiffies
- * macros.
- */
-static unsigned long bfq_greatest_from_now(void)
-{
-	return jiffies + MAX_JIFFY_OFFSET;
-}
-
 /*
  * Return the farthest past time instant according to jiffies
  * macros.
@@ -1569,7 +1560,8 @@ static void bfq_bfqq_handle_idle_busy_switch(struct bfq_data *bfqd,
 	in_burst = bfq_bfqq_in_large_burst(bfqq);
 	soft_rt = bfqd->bfq_wr_max_softrt_rate > 0 &&
 		!in_burst &&
-		time_is_before_jiffies(bfqq->soft_rt_next_start);
+		time_is_before_jiffies(bfqq->soft_rt_next_start) &&
+		bfqq->dispatched == 0;
 	*interactive = !in_burst && idle_for_long_time;
 	wr_or_deserves_wr = bfqd->low_latency &&
 		(bfqq->wr_coeff > 1 ||
@@ -3272,23 +3264,6 @@ void bfq_bfqq_expire(struct bfq_data *bfqd,
 			bfqq->soft_rt_next_start =
 				bfq_bfqq_softrt_next_start(bfqd, bfqq);
 		else {
-			/*
-			 * The application is still waiting for the
-			 * completion of one or more requests:
-			 * prevent it from possibly being incorrectly
-			 * deemed as soft real-time by setting its
-			 * soft_rt_next_start to infinity. In fact,
-			 * without this assignment, the application
-			 * would be incorrectly deemed as soft
-			 * real-time if:
-			 * 1) it issued a new request before the
-			 *    completion of all its in-flight
-			 *    requests, and
-			 * 2) at that time, its soft_rt_next_start
-			 *    happened to be in the past.
-			 */
-			bfqq->soft_rt_next_start =
-				bfq_greatest_from_now();
 			/*
 			 * Schedule an update of soft_rt_next_start to when
 			 * the task may be discovered to be isochronous.
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH BUGFIX/IMPROVEMENTS 0/4] bfq: fix regressions on slow virtual machines
  2018-05-31 14:45 [PATCH BUGFIX/IMPROVEMENTS 0/4] bfq: fix regressions on slow virtual machines Paolo Valente
                   ` (3 preceding siblings ...)
  2018-05-31 14:45 ` [PATCH BUGFIX/IMPROVEMENTS 4/4] block, bfq: prevent soft_rt_next_start from being stuck at infinity Paolo Valente
@ 2018-05-31 14:56 ` Jens Axboe
  4 siblings, 0 replies; 6+ messages in thread
From: Jens Axboe @ 2018-05-31 14:56 UTC (permalink / raw)
  To: Paolo Valente
  Cc: linux-block, linux-kernel, ulf.hansson, broonie, linus.walleij,
	bfq-iosched, oleksandr, sapienza.dav, 177992

On 5/31/18 8:45 AM, Paolo Valente wrote:
> Hi,
> we have bumped into some BFQ latency regressions on slow virtual
> machines. The patches in this series fix all these regressions, apart
> from the first patch, which provides an indirect contribution, by
> improving the description of the mechanisms affected by these
> regressions.

Pulled for 4.18, thanks Paolo.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-05-31 14:56 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-31 14:45 [PATCH BUGFIX/IMPROVEMENTS 0/4] bfq: fix regressions on slow virtual machines Paolo Valente
2018-05-31 14:45 ` [PATCH BUGFIX/IMPROVEMENTS 1/4] block, bfq: add description of weight-raising heuristics Paolo Valente
2018-05-31 14:45 ` [PATCH BUGFIX/IMPROVEMENTS 2/4] block, bfq: remove slow-system class Paolo Valente
2018-05-31 14:45 ` [PATCH BUGFIX/IMPROVEMENTS 3/4] block, bfq: increase weight-raising duration for interactive apps Paolo Valente
2018-05-31 14:45 ` [PATCH BUGFIX/IMPROVEMENTS 4/4] block, bfq: prevent soft_rt_next_start from being stuck at infinity Paolo Valente
2018-05-31 14:56 ` [PATCH BUGFIX/IMPROVEMENTS 0/4] bfq: fix regressions on slow virtual machines Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).