linux-kernel.vger.kernel.org archive mirror
* [PATCH BUGFIX IMPROVEMENT 00/14] batch of patches for next linux release
@ 2019-01-29 11:06 Paolo Valente
  2019-01-29 11:06 ` [PATCH BUGFIX IMPROVEMENT 01/14] block, bfq: do not consider interactive queues in srt filtering Paolo Valente
                   ` (14 more replies)
  0 siblings, 15 replies; 16+ messages in thread
From: Paolo Valente @ 2019-01-29 11:06 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-kernel, ulf.hansson, linus.walleij, broonie,
	bfq-iosched, oleksandr, mancha, Paolo Valente

Hi,
this batch of patches provides fixes and improvements for throughput
and latency. Every patch has been under test for at least one month,
some patches for much longer.

Thanks,
Paolo

Paolo Valente (14):
  block, bfq: do not consider interactive queues in srt filtering
  block, bfq: avoid selecting a queue w/o budget
  block, bfq: make sure queue budgets are not below service received
  block, bfq: remove case of redirected bic from insert_request
  block, bfq: consider also ioprio classes in symmetry detection
  block, bfq: split function bfq_better_to_idle
  block, bfq: do not plug I/O of in-service queue when harmful
  block, bfq: unconditionally plug I/O in asymmetric scenarios
  block, bfq: fix sequential rq detection in rate estimation
  block, bfq: fix queue removal from weights tree
  block, bfq: reduce threshold for detecting command queueing
  block, bfq: port commit "cfq-iosched: improve hw_tag detection"
  block, bfq: do not overcharge writes in asymmetric scenarios
  block, bfq: fix in-service-queue check for queue merging

 block/bfq-iosched.c | 705 ++++++++++++++++++++++++--------------------
 block/bfq-iosched.h |  11 +-
 block/bfq-wf2q.c    |  18 +-
 3 files changed, 400 insertions(+), 334 deletions(-)

--
2.20.1


* [PATCH BUGFIX IMPROVEMENT 01/14] block, bfq: do not consider interactive queues in srt filtering
  2019-01-29 11:06 [PATCH BUGFIX IMPROVEMENT 00/14] batch of patches for next linux release Paolo Valente
@ 2019-01-29 11:06 ` Paolo Valente
  2019-01-29 11:06 ` [PATCH BUGFIX IMPROVEMENT 02/14] block, bfq: avoid selecting a queue w/o budget Paolo Valente
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Paolo Valente @ 2019-01-29 11:06 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-kernel, ulf.hansson, linus.walleij, broonie,
	bfq-iosched, oleksandr, mancha, Paolo Valente

The speed at which a bfq_queue receives I/O is one of the parameters
by which bfq decides whether the queue is soft real-time (i.e.,
whether the queue contains the I/O of a soft real-time
application). In particular, when a bfq_queue remains without
outstanding I/O requests, bfq computes the minimum time instant, named
soft_rt_next_start, at which the next request of the queue may arrive
for the queue to be deemed soft real-time.

Unfortunately, this filtering may cause problems for a queue in
interactive weight raising. In fact, such a queue may be conveying the
I/O needed to load a soft real-time application. The latter will
actually exhibit a soft real-time I/O pattern after it finally starts
doing its job. But, if soft_rt_next_start is updated for an
interactive bfq_queue, and the queue has received a lot of service
before remaining with no outstanding request (likely to happen on a
fast device), then soft_rt_next_start is assigned so high a value
that, for a very long time, the queue cannot possibly be considered
soft real-time.
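
To get a feel for the numbers involved, here is a rough sketch of how
such a minimum instant grows with the service received (simplified and
with illustrative names, not the actual bfq code):

	/*
	 * The earliest instant at which the next request may arrive
	 * for the queue to look soft real-time grows linearly with the
	 * service received while backlogged, because a soft real-time
	 * application, by definition, consumes at most a bounded
	 * amount of service per unit of time (max_softrt_rate).
	 */
	static unsigned long softrt_next_start_sketch(unsigned long now_jiffies,
						      unsigned long service_sectors,
						      unsigned long max_softrt_rate)
	{
		return now_jiffies + HZ * service_sectors / max_softrt_rate;
	}

For an interactive queue on a fast device, service_sectors may be huge,
so the instant computed above may lie far in the future, and the queue
would fail the soft real-time filter for all that time.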

This commit removes the updating of soft_rt_next_start for bfq_queues
in interactive weight raising.

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
---
 block/bfq-iosched.c | 39 +++++++++++++++++++++++++++++----------
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index cd307767a134..c7a4a15c7c19 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -3274,16 +3274,32 @@ void bfq_bfqq_expire(struct bfq_data *bfqd,
 		 * requests, then the request pattern is isochronous
 		 * (see the comments on the function
 		 * bfq_bfqq_softrt_next_start()). Thus we can compute
-		 * soft_rt_next_start. If, instead, the queue still
-		 * has outstanding requests, then we have to wait for
-		 * the completion of all the outstanding requests to
-		 * discover whether the request pattern is actually
-		 * isochronous.
+		 * soft_rt_next_start. And we do it, unless bfqq is in
+		 * interactive weight raising. We do not do it in the
+		 * latter subcase, for the following reason. bfqq may
+		 * be conveying the I/O needed to load a soft
+		 * real-time application. Such an application will
+		 * actually exhibit a soft real-time I/O pattern after
+		 * it finally starts doing its job. But, if
+		 * soft_rt_next_start is computed here for an
+		 * interactive bfqq, and bfqq had received a lot of
+		 * service before remaining with no outstanding
+		 * request (likely to happen on a fast device), then
+		 * soft_rt_next_start would be assigned such a high
+		 * value that, for a very long time, bfqq would be
+		 * prevented from being possibly considered as soft
+		 * real time.
+		 *
+		 * If, instead, the queue still has outstanding
+		 * requests, then we have to wait for the completion
+		 * of all the outstanding requests to discover whether
+		 * the request pattern is actually isochronous.
 		 */
-		if (bfqq->dispatched == 0)
+		if (bfqq->dispatched == 0 &&
+		    bfqq->wr_coeff != bfqd->bfq_wr_coeff)
 			bfqq->soft_rt_next_start =
 				bfq_bfqq_softrt_next_start(bfqd, bfqq);
-		else {
+		else if (bfqq->dispatched > 0) {
 			/*
 			 * Schedule an update of soft_rt_next_start to when
 			 * the task may be discovered to be isochronous.
@@ -4834,11 +4850,14 @@ static void bfq_completed_request(struct bfq_queue *bfqq, struct bfq_data *bfqd)
 	 * isochronous, and both requisites for this condition to hold
 	 * are now satisfied, then compute soft_rt_next_start (see the
 	 * comments on the function bfq_bfqq_softrt_next_start()). We
-	 * schedule this delayed check when bfqq expires, if it still
-	 * has in-flight requests.
+	 * do not compute soft_rt_next_start if bfqq is in interactive
+	 * weight raising (see the comments in bfq_bfqq_expire() for
+	 * an explanation). We schedule this delayed update when bfqq
+	 * expires, if it still has in-flight requests.
 	 */
 	if (bfq_bfqq_softrt_update(bfqq) && bfqq->dispatched == 0 &&
-	    RB_EMPTY_ROOT(&bfqq->sort_list))
+	    RB_EMPTY_ROOT(&bfqq->sort_list) &&
+	    bfqq->wr_coeff != bfqd->bfq_wr_coeff)
 		bfqq->soft_rt_next_start =
 			bfq_bfqq_softrt_next_start(bfqd, bfqq);
 
-- 
2.20.1



* [PATCH BUGFIX IMPROVEMENT 02/14] block, bfq: avoid selecting a queue w/o budget
  2019-01-29 11:06 [PATCH BUGFIX IMPROVEMENT 00/14] batch of patches for next linux release Paolo Valente
  2019-01-29 11:06 ` [PATCH BUGFIX IMPROVEMENT 01/14] block, bfq: do not consider interactive queues in srt filtering Paolo Valente
@ 2019-01-29 11:06 ` Paolo Valente
  2019-01-29 11:06 ` [PATCH BUGFIX IMPROVEMENT 03/14] block, bfq: make sure queue budgets are not below service received Paolo Valente
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Paolo Valente @ 2019-01-29 11:06 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-kernel, ulf.hansson, linus.walleij, broonie,
	bfq-iosched, oleksandr, mancha, Paolo Valente

To boost throughput on devices with internal queueing and in scenarios
where device idling is not strictly needed, bfq immediately starts
serving a new bfq_queue if the in-service bfq_queue remains without
pending I/O, even if new I/O may arrive soon for the latter
queue. Then, if such I/O actually arrives soon, bfq preempts the new
in-service bfq_queue so as to give the previous queue a chance to go
on being served (in case the previous queue should actually be the one
to be served, according to its timestamps).

However, the in-service bfq_queue, say Q, may also be without further
budget when it remains without pending I/O. Since bfq changes budgets
dynamically to fit the needs of bfq_queues, this happens more often
than one may expect. If this happens, then there is no point in trying
to go on serving Q when new I/O arrives for it soon: Q would be
expired immediately after being selected for service. This would only
cause useless overhead. This commit avoids such a useless selection.
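
For reference, the budget left to a queue is, in essence, the budget
assigned for the current service slot minus the service already charged
to the queue (simplified sketch, not the exact bfq helper):

	/* remaining budget: what the queue may still consume before expiring */
	static int budget_left_sketch(int budget, int service)
	{
		return budget - service;
	}

The check added below simply requires this difference to be positive
before bfq tries to go on serving bfqq with its current budget.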

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
---
 block/bfq-iosched.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index c7a4a15c7c19..9ea2c4f42501 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -1380,7 +1380,15 @@ static bool bfq_bfqq_update_budg_for_activation(struct bfq_data *bfqd,
 {
 	struct bfq_entity *entity = &bfqq->entity;
 
-	if (bfq_bfqq_non_blocking_wait_rq(bfqq) && arrived_in_time) {
+	/*
+	 * In the next compound condition, we check also whether there
+	 * is some budget left, because otherwise there is no point in
+	 * trying to go on serving bfqq with this same budget: bfqq
+	 * would be expired immediately after being selected for
+	 * service. This would only cause useless overhead.
+	 */
+	if (bfq_bfqq_non_blocking_wait_rq(bfqq) && arrived_in_time &&
+	    bfq_bfqq_budget_left(bfqq) > 0) {
 		/*
 		 * We do not clear the flag non_blocking_wait_rq here, as
 		 * the latter is used in bfq_activate_bfqq to signal
-- 
2.20.1



* [PATCH BUGFIX IMPROVEMENT 03/14] block, bfq: make sure queue budgets are not below service received
  2019-01-29 11:06 [PATCH BUGFIX IMPROVEMENT 00/14] batch of patches for next linux release Paolo Valente
  2019-01-29 11:06 ` [PATCH BUGFIX IMPROVEMENT 01/14] block, bfq: do not consider interactive queues in srt filtering Paolo Valente
  2019-01-29 11:06 ` [PATCH BUGFIX IMPROVEMENT 02/14] block, bfq: avoid selecting a queue w/o budget Paolo Valente
@ 2019-01-29 11:06 ` Paolo Valente
  2019-01-29 11:06 ` [PATCH BUGFIX IMPROVEMENT 04/14] block, bfq: remove case of redirected bic from insert_request Paolo Valente
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Paolo Valente @ 2019-01-29 11:06 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-kernel, ulf.hansson, linus.walleij, broonie,
	bfq-iosched, oleksandr, mancha, Paolo Valente

With some unlucky sequences of events, the function
bfq_updated_next_req updates the current budget of a bfq_queue to a
lower value than the service received by the queue using such a
budget. Unfortunately, if this happens, then the return value of the
function bfq_bfqq_budget_left becomes inconsistent.  This commit
solves this problem by lower-bounding the budget computed in
bfq_updated_next_req to the service currently charged to the queue.
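
As a rough numeric illustration (values are made up): suppose
bfqq->max_budget is 8192 sectors, the next request would be charged
4096 sectors, but 12288 sectors of service have already been charged
to the queue with its current budget. Without the lower bound, the new
budget would be 8192 and the remaining budget (budget minus service)
would go negative, which is exactly the inconsistency mentioned above.
With the lower bound, i.e., roughly

	/* never let the budget drop below the service already charged (shortened names) */
	new_budget = max(max(bfqq->max_budget, serv_to_charge), entity->service);

the new budget becomes 12288 and the remaining budget is consistently 0.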

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
---
 block/bfq-iosched.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 9ea2c4f42501..b0e8006475be 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -907,8 +907,10 @@ static void bfq_updated_next_req(struct bfq_data *bfqd,
 		 */
 		return;
 
-	new_budget = max_t(unsigned long, bfqq->max_budget,
-			   bfq_serv_to_charge(next_rq, bfqq));
+	new_budget = max_t(unsigned long,
+			   max_t(unsigned long, bfqq->max_budget,
+				 bfq_serv_to_charge(next_rq, bfqq)),
+			   entity->service);
 	if (entity->budget != new_budget) {
 		entity->budget = new_budget;
 		bfq_log_bfqq(bfqd, bfqq, "updated next rq: new budget %lu",
-- 
2.20.1



* [PATCH BUGFIX IMPROVEMENT 04/14] block, bfq: remove case of redirected bic from insert_request
  2019-01-29 11:06 [PATCH BUGFIX IMPROVEMENT 00/14] batch of patches for next linux release Paolo Valente
                   ` (2 preceding siblings ...)
  2019-01-29 11:06 ` [PATCH BUGFIX IMPROVEMENT 03/14] block, bfq: make sure queue budgets are not below service received Paolo Valente
@ 2019-01-29 11:06 ` Paolo Valente
  2019-01-29 11:06 ` [PATCH BUGFIX IMPROVEMENT 05/14] block, bfq: consider also ioprio classes in symmetry detection Paolo Valente
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Paolo Valente @ 2019-01-29 11:06 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-kernel, ulf.hansson, linus.walleij, broonie,
	bfq-iosched, oleksandr, mancha, Paolo Valente

Before commit 18e5a57d7987 ("block, bfq: postpone rq preparation to
insert or merge"), the destination queue for a request was chosen by a
different hook than the one that then inserted the request. So,
between the execution of the two hooks, the bic of the process
generating the request could happen to be redirected to a different
bfq_queue. As a consequence, the destination bfq_queue stored in the
request could be wrong. Such an event no longer needs to be handled.

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
---
 block/bfq-iosched.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index b0e8006475be..a9275ed57726 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -4633,8 +4633,6 @@ static bool __bfq_insert_request(struct bfq_data *bfqd, struct request *rq)
 	bool waiting, idle_timer_disabled = false;
 
 	if (new_bfqq) {
-		if (bic_to_bfqq(RQ_BIC(rq), 1) != bfqq)
-			new_bfqq = bic_to_bfqq(RQ_BIC(rq), 1);
 		/*
 		 * Release the request's reference to the old bfqq
 		 * and make sure one is taken to the shared queue.
-- 
2.20.1



* [PATCH BUGFIX IMPROVEMENT 05/14] block, bfq: consider also ioprio classes in symmetry detection
  2019-01-29 11:06 [PATCH BUGFIX IMPROVEMENT 00/14] batch of patches for next linux release Paolo Valente
                   ` (3 preceding siblings ...)
  2019-01-29 11:06 ` [PATCH BUGFIX IMPROVEMENT 04/14] block, bfq: remove case of redirected bic from insert_request Paolo Valente
@ 2019-01-29 11:06 ` Paolo Valente
  2019-01-29 11:06 ` [PATCH BUGFIX IMPROVEMENT 06/14] block, bfq: split function bfq_better_to_idle Paolo Valente
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Paolo Valente @ 2019-01-29 11:06 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-kernel, ulf.hansson, linus.walleij, broonie,
	bfq-iosched, oleksandr, mancha, Paolo Valente

In asymmetric scenarios, i.e., when some bfq_queue or bfq_group needs
to be guaranteed a different bandwidth than other bfq_queues or
bfq_groups, these service guarantees can be provided only by plugging
I/O dispatch, completely or partially, when the queue in service
remains temporarily empty. A case where asymmetry is particularly
strong is when some active bfq_queues belong to a higher-priority
class than some other active bfq_queues. Unfortunately, this important
case is not considered at all in the code for detecting asymmetric
scenarios. This commit adds the missing logic.
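
The essence of the added detection, as a self-contained sketch
(simplified with respect to the hunks below; the per-class array is
indexed by I/O-priority class):

	/*
	 * busy[0], busy[1] and busy[2] count the busy queues of the RT,
	 * BE and IDLE classes, respectively. If more than one class has
	 * busy queues, the scenario is asymmetric regardless of queue
	 * weights, and idling must be preserved to enforce guarantees.
	 */
	static bool multiple_classes_busy_sketch(const unsigned int busy[3])
	{
		return (busy[0] && busy[1]) || (busy[0] && busy[2]) ||
			(busy[1] && busy[2]);
	}

For example, one busy RT queue plus one busy BE queue, even with equal
weights, now makes the scenario asymmetric, so dispatch plugging is not
disabled for the higher-priority queue.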

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
---
 block/bfq-iosched.c | 86 ++++++++++++++++++++++++---------------------
 block/bfq-iosched.h |  8 +++--
 block/bfq-wf2q.c    | 12 +++++--
 3 files changed, 59 insertions(+), 47 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index a9275ed57726..6bfbfa65610b 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -623,26 +623,6 @@ void bfq_pos_tree_add_move(struct bfq_data *bfqd, struct bfq_queue *bfqq)
 		bfqq->pos_root = NULL;
 }
 
-/*
- * Tell whether there are active queues with different weights or
- * active groups.
- */
-static bool bfq_varied_queue_weights_or_active_groups(struct bfq_data *bfqd)
-{
-	/*
-	 * For queue weights to differ, queue_weights_tree must contain
-	 * at least two nodes.
-	 */
-	return (!RB_EMPTY_ROOT(&bfqd->queue_weights_tree) &&
-		(bfqd->queue_weights_tree.rb_node->rb_left ||
-		 bfqd->queue_weights_tree.rb_node->rb_right)
-#ifdef CONFIG_BFQ_GROUP_IOSCHED
-	       ) ||
-		(bfqd->num_groups_with_pending_reqs > 0
-#endif
-	       );
-}
-
 /*
  * The following function returns true if every queue must receive the
  * same share of the throughput (this condition is used when deciding
@@ -651,25 +631,48 @@ static bool bfq_varied_queue_weights_or_active_groups(struct bfq_data *bfqd)
  *
  * Such a scenario occurs when:
  * 1) all active queues have the same weight,
- * 2) all active groups at the same level in the groups tree have the same
- *    weight,
+ * 2) all active queues belong to the same I/O-priority class,
  * 3) all active groups at the same level in the groups tree have the same
+ *    weight,
+ * 4) all active groups at the same level in the groups tree have the same
  *    number of children.
  *
  * Unfortunately, keeping the necessary state for evaluating exactly
  * the last two symmetry sub-conditions above would be quite complex
- * and time consuming.  Therefore this function evaluates, instead,
- * only the following stronger two sub-conditions, for which it is
+ * and time consuming. Therefore this function evaluates, instead,
+ * only the following stronger three sub-conditions, for which it is
  * much easier to maintain the needed state:
  * 1) all active queues have the same weight,
- * 2) there are no active groups.
+ * 2) all active queues belong to the same I/O-priority class,
+ * 3) there are no active groups.
  * In particular, the last condition is always true if hierarchical
  * support or the cgroups interface are not enabled, thus no state
  * needs to be maintained in this case.
  */
 static bool bfq_symmetric_scenario(struct bfq_data *bfqd)
 {
-	return !bfq_varied_queue_weights_or_active_groups(bfqd);
+	/*
+	 * For queue weights to differ, queue_weights_tree must contain
+	 * at least two nodes.
+	 */
+	bool varied_queue_weights = !RB_EMPTY_ROOT(&bfqd->queue_weights_tree) &&
+		(bfqd->queue_weights_tree.rb_node->rb_left ||
+		 bfqd->queue_weights_tree.rb_node->rb_right);
+
+	bool multiple_classes_busy =
+		(bfqd->busy_queues[0] && bfqd->busy_queues[1]) ||
+		(bfqd->busy_queues[0] && bfqd->busy_queues[2]) ||
+		(bfqd->busy_queues[1] && bfqd->busy_queues[2]);
+
+	/*
+	 * For queue weights to differ, queue_weights_tree must contain
+	 * at least two nodes.
+	 */
+	return !(varied_queue_weights || multiple_classes_busy
+#ifdef CONFIG_BFQ_GROUP_IOSCHED
+	       || bfqd->num_groups_with_pending_reqs > 0
+#endif
+		);
 }
 
 /*
@@ -728,15 +731,14 @@ void bfq_weights_tree_add(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 	/*
 	 * In the unlucky event of an allocation failure, we just
 	 * exit. This will cause the weight of queue to not be
-	 * considered in bfq_varied_queue_weights_or_active_groups,
-	 * which, in its turn, causes the scenario to be deemed
-	 * wrongly symmetric in case bfqq's weight would have been
-	 * the only weight making the scenario asymmetric.  On the
-	 * bright side, no unbalance will however occur when bfqq
-	 * becomes inactive again (the invocation of this function
-	 * is triggered by an activation of queue).  In fact,
-	 * bfq_weights_tree_remove does nothing if
-	 * !bfqq->weight_counter.
+	 * considered in bfq_symmetric_scenario, which, in its turn,
+	 * causes the scenario to be deemed wrongly symmetric in case
+	 * bfqq's weight would have been the only weight making the
+	 * scenario asymmetric.  On the bright side, no unbalance will
+	 * however occur when bfqq becomes inactive again (the
+	 * invocation of this function is triggered by an activation
+	 * of queue).  In fact, bfq_weights_tree_remove does nothing
+	 * if !bfqq->weight_counter.
 	 */
 	if (unlikely(!bfqq->weight_counter))
 		return;
@@ -2227,7 +2229,7 @@ bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 		return NULL;
 
 	/* If there is only one backlogged queue, don't search. */
-	if (bfqd->busy_queues == 1)
+	if (bfq_tot_busy_queues(bfqd) == 1)
 		return NULL;
 
 	in_service_bfqq = bfqd->in_service_queue;
@@ -3681,7 +3683,8 @@ static bool bfq_better_to_idle(struct bfq_queue *bfqq)
 	 * the requests already queued in the device have been served.
 	 */
 	asymmetric_scenario = (bfqq->wr_coeff > 1 &&
-			       bfqd->wr_busy_queues < bfqd->busy_queues) ||
+			       bfqd->wr_busy_queues <
+			       bfq_tot_busy_queues(bfqd)) ||
 		!bfq_symmetric_scenario(bfqd);
 
 	/*
@@ -3960,7 +3963,7 @@ static struct request *bfq_dispatch_rq_from_bfqq(struct bfq_data *bfqd,
 	 * belongs to CLASS_IDLE and other queues are waiting for
 	 * service.
 	 */
-	if (!(bfqd->busy_queues > 1 && bfq_class_idle(bfqq)))
+	if (!(bfq_tot_busy_queues(bfqd) > 1 && bfq_class_idle(bfqq)))
 		goto return_rq;
 
 	bfq_bfqq_expire(bfqd, bfqq, false, BFQQE_BUDGET_EXHAUSTED);
@@ -3978,7 +3981,7 @@ static bool bfq_has_work(struct blk_mq_hw_ctx *hctx)
 	 * most a call to dispatch for nothing
 	 */
 	return !list_empty_careful(&bfqd->dispatch) ||
-		bfqd->busy_queues > 0;
+		bfq_tot_busy_queues(bfqd) > 0;
 }
 
 static struct request *__bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
@@ -4032,9 +4035,10 @@ static struct request *__bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
 		goto start_rq;
 	}
 
-	bfq_log(bfqd, "dispatch requests: %d busy queues", bfqd->busy_queues);
+	bfq_log(bfqd, "dispatch requests: %d busy queues",
+		bfq_tot_busy_queues(bfqd));
 
-	if (bfqd->busy_queues == 0)
+	if (bfq_tot_busy_queues(bfqd) == 0)
 		goto exit;
 
 	/*
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index 0b02bf302de0..30be669be465 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -501,10 +501,11 @@ struct bfq_data {
 	unsigned int num_groups_with_pending_reqs;
 
 	/*
-	 * Number of bfq_queues containing requests (including the
-	 * queue in service, even if it is idling).
+	 * Per-class (RT, BE, IDLE) number of bfq_queues containing
+	 * requests (including the queue in service, even if it is
+	 * idling).
 	 */
-	int busy_queues;
+	unsigned int busy_queues[3];
 	/* number of weight-raised busy @bfq_queues */
 	int wr_busy_queues;
 	/* number of queued requests */
@@ -974,6 +975,7 @@ extern struct blkcg_policy blkcg_policy_bfq;
 
 struct bfq_group *bfq_bfqq_to_bfqg(struct bfq_queue *bfqq);
 struct bfq_queue *bfq_entity_to_bfqq(struct bfq_entity *entity);
+unsigned int bfq_tot_busy_queues(struct bfq_data *bfqd);
 struct bfq_service_tree *bfq_entity_service_tree(struct bfq_entity *entity);
 struct bfq_entity *bfq_entity_of(struct rb_node *node);
 unsigned short bfq_ioprio_to_weight(int ioprio);
diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c
index 72adbbe975d5..ce37d709a34f 100644
--- a/block/bfq-wf2q.c
+++ b/block/bfq-wf2q.c
@@ -44,6 +44,12 @@ static unsigned int bfq_class_idx(struct bfq_entity *entity)
 		BFQ_DEFAULT_GRP_CLASS - 1;
 }
 
+unsigned int bfq_tot_busy_queues(struct bfq_data *bfqd)
+{
+	return bfqd->busy_queues[0] + bfqd->busy_queues[1] +
+		bfqd->busy_queues[2];
+}
+
 static struct bfq_entity *bfq_lookup_next_entity(struct bfq_sched_data *sd,
 						 bool expiration);
 
@@ -1513,7 +1519,7 @@ struct bfq_queue *bfq_get_next_queue(struct bfq_data *bfqd)
 	struct bfq_sched_data *sd;
 	struct bfq_queue *bfqq;
 
-	if (bfqd->busy_queues == 0)
+	if (bfq_tot_busy_queues(bfqd) == 0)
 		return NULL;
 
 	/*
@@ -1665,7 +1671,7 @@ void bfq_del_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 
 	bfq_clear_bfqq_busy(bfqq);
 
-	bfqd->busy_queues--;
+	bfqd->busy_queues[bfqq->ioprio_class - 1]--;
 
 	if (!bfqq->dispatched)
 		bfq_weights_tree_remove(bfqd, bfqq);
@@ -1688,7 +1694,7 @@ void bfq_add_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq)
 	bfq_activate_bfqq(bfqd, bfqq);
 
 	bfq_mark_bfqq_busy(bfqq);
-	bfqd->busy_queues++;
+	bfqd->busy_queues[bfqq->ioprio_class - 1]++;
 
 	if (!bfqq->dispatched)
 		if (bfqq->wr_coeff == 1)
-- 
2.20.1



* [PATCH BUGFIX IMPROVEMENT 06/14] block, bfq: split function bfq_better_to_idle
  2019-01-29 11:06 [PATCH BUGFIX IMPROVEMENT 00/14] batch of patches for next linux release Paolo Valente
                   ` (4 preceding siblings ...)
  2019-01-29 11:06 ` [PATCH BUGFIX IMPROVEMENT 05/14] block, bfq: consider also ioprio classes in symmetry detection Paolo Valente
@ 2019-01-29 11:06 ` Paolo Valente
  2019-01-29 11:06 ` [PATCH BUGFIX IMPROVEMENT 07/14] block, bfq: do not plug I/O of in-service queue when harmful Paolo Valente
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Paolo Valente @ 2019-01-29 11:06 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-kernel, ulf.hansson, linus.walleij, broonie,
	bfq-iosched, oleksandr, mancha, Paolo Valente

This is a preparatory commit for commits that need to check only one
of the two main reasons for idling. This change should also improve
the quality of the code a little bit, by splitting a function that
contains very long, non-trivial, and loosely related comments.

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
---
 block/bfq-iosched.c | 155 +++++++++++++++++++++++---------------------
 1 file changed, 82 insertions(+), 73 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 6bfbfa65610b..2756f4b1432b 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -3404,53 +3404,13 @@ static bool bfq_may_expire_for_budg_timeout(struct bfq_queue *bfqq)
 		bfq_bfqq_budget_timeout(bfqq);
 }
 
-/*
- * For a queue that becomes empty, device idling is allowed only if
- * this function returns true for the queue. As a consequence, since
- * device idling plays a critical role in both throughput boosting and
- * service guarantees, the return value of this function plays a
- * critical role in both these aspects as well.
- *
- * In a nutshell, this function returns true only if idling is
- * beneficial for throughput or, even if detrimental for throughput,
- * idling is however necessary to preserve service guarantees (low
- * latency, desired throughput distribution, ...). In particular, on
- * NCQ-capable devices, this function tries to return false, so as to
- * help keep the drives' internal queues full, whenever this helps the
- * device boost the throughput without causing any service-guarantee
- * issue.
- *
- * In more detail, the return value of this function is obtained by,
- * first, computing a number of boolean variables that take into
- * account throughput and service-guarantee issues, and, then,
- * combining these variables in a logical expression. Most of the
- * issues taken into account are not trivial. We discuss these issues
- * individually while introducing the variables.
- */
-static bool bfq_better_to_idle(struct bfq_queue *bfqq)
+static bool idling_boosts_thr_without_issues(struct bfq_data *bfqd,
+					     struct bfq_queue *bfqq)
 {
-	struct bfq_data *bfqd = bfqq->bfqd;
 	bool rot_without_queueing =
 		!blk_queue_nonrot(bfqd->queue) && !bfqd->hw_tag,
 		bfqq_sequential_and_IO_bound,
-		idling_boosts_thr, idling_boosts_thr_without_issues,
-		idling_needed_for_service_guarantees,
-		asymmetric_scenario;
-
-	if (bfqd->strict_guarantees)
-		return true;
-
-	/*
-	 * Idling is performed only if slice_idle > 0. In addition, we
-	 * do not idle if
-	 * (a) bfqq is async
-	 * (b) bfqq is in the idle io prio class: in this case we do
-	 * not idle because we want to minimize the bandwidth that
-	 * queues in this class can steal to higher-priority queues
-	 */
-	if (bfqd->bfq_slice_idle == 0 || !bfq_bfqq_sync(bfqq) ||
-	    bfq_class_idle(bfqq))
-		return false;
+		idling_boosts_thr;
 
 	bfqq_sequential_and_IO_bound = !BFQQ_SEEKY(bfqq) &&
 		bfq_bfqq_IO_bound(bfqq) && bfq_bfqq_has_short_ttime(bfqq);
@@ -3482,8 +3442,7 @@ static bool bfq_better_to_idle(struct bfq_queue *bfqq)
 		 bfqq_sequential_and_IO_bound);
 
 	/*
-	 * The value of the next variable,
-	 * idling_boosts_thr_without_issues, is equal to that of
+	 * The return value of this function is equal to that of
 	 * idling_boosts_thr, unless a special case holds. In this
 	 * special case, described below, idling may cause problems to
 	 * weight-raised queues.
@@ -3500,32 +3459,35 @@ static bool bfq_better_to_idle(struct bfq_queue *bfqq)
 	 * which enqueue several requests in advance, and further
 	 * reorder internally-queued requests.
 	 *
-	 * For this reason, we force to false the value of
-	 * idling_boosts_thr_without_issues if there are weight-raised
-	 * busy queues. In this case, and if bfqq is not weight-raised,
-	 * this guarantees that the device is not idled for bfqq (if,
-	 * instead, bfqq is weight-raised, then idling will be
-	 * guaranteed by another variable, see below). Combined with
-	 * the timestamping rules of BFQ (see [1] for details), this
-	 * behavior causes bfqq, and hence any sync non-weight-raised
-	 * queue, to get a lower number of requests served, and thus
-	 * to ask for a lower number of requests from the request
-	 * pool, before the busy weight-raised queues get served
-	 * again. This often mitigates starvation problems in the
-	 * presence of heavy write workloads and NCQ, thereby
-	 * guaranteeing a higher application and system responsiveness
-	 * in these hostile scenarios.
+	 * For this reason, we force to false the return value if
+	 * there are weight-raised busy queues. In this case, and if
+	 * bfqq is not weight-raised, this guarantees that the device
+	 * is not idled for bfqq (if, instead, bfqq is weight-raised,
+	 * then idling will be guaranteed by another variable, see
+	 * below). Combined with the timestamping rules of BFQ (see
+	 * [1] for details), this behavior causes bfqq, and hence any
+	 * sync non-weight-raised queue, to get a lower number of
+	 * requests served, and thus to ask for a lower number of
+	 * requests from the request pool, before the busy
+	 * weight-raised queues get served again. This often mitigates
+	 * starvation problems in the presence of heavy write
+	 * workloads and NCQ, thereby guaranteeing a higher
+	 * application and system responsiveness in these hostile
+	 * scenarios.
 	 */
-	idling_boosts_thr_without_issues = idling_boosts_thr &&
+	return idling_boosts_thr &&
 		bfqd->wr_busy_queues == 0;
+}
 
+static bool idling_needed_for_service_guarantees(struct bfq_data *bfqd,
+						 struct bfq_queue *bfqq)
+{
 	/*
-	 * There is then a case where idling must be performed not
-	 * for throughput concerns, but to preserve service
-	 * guarantees.
+	 * There is a case where idling must be performed not for
+	 * throughput concerns, but to preserve service guarantees.
 	 *
 	 * To introduce this case, we can note that allowing the drive
-	 * to enqueue more than one request at a time, and hence
+	 * to enqueue more than one request at a time, and thereby
 	 * delegating de facto final scheduling decisions to the
 	 * drive's internal scheduler, entails loss of control on the
 	 * actual request service order. In particular, the critical
@@ -3682,9 +3644,9 @@ static bool bfq_better_to_idle(struct bfq_queue *bfqq)
 	 * to let requests be served in the desired order until all
 	 * the requests already queued in the device have been served.
 	 */
-	asymmetric_scenario = (bfqq->wr_coeff > 1 &&
-			       bfqd->wr_busy_queues <
-			       bfq_tot_busy_queues(bfqd)) ||
+	bool asymmetric_scenario = (bfqq->wr_coeff > 1 &&
+				    bfqd->wr_busy_queues <
+				    bfq_tot_busy_queues(bfqd)) ||
 		!bfq_symmetric_scenario(bfqd);
 
 	/*
@@ -3701,17 +3663,64 @@ static bool bfq_better_to_idle(struct bfq_queue *bfqq)
 	 * now establish when idling is actually needed to preserve
 	 * service guarantees.
 	 */
-	idling_needed_for_service_guarantees =
-		asymmetric_scenario && !bfq_bfqq_in_large_burst(bfqq);
+	return asymmetric_scenario && !bfq_bfqq_in_large_burst(bfqq);
+}
+
+/*
+ * For a queue that becomes empty, device idling is allowed only if
+ * this function returns true for that queue. As a consequence, since
+ * device idling plays a critical role for both throughput boosting
+ * and service guarantees, the return value of this function plays a
+ * critical role as well.
+ *
+ * In a nutshell, this function returns true only if idling is
+ * beneficial for throughput or, even if detrimental for throughput,
+ * idling is however necessary to preserve service guarantees (low
+ * latency, desired throughput distribution, ...). In particular, on
+ * NCQ-capable devices, this function tries to return false, so as to
+ * help keep the drives' internal queues full, whenever this helps the
+ * device boost the throughput without causing any service-guarantee
+ * issue.
+ *
+ * Most of the issues taken into account to get the return value of
+ * this function are not trivial. We discuss these issues in the two
+ * functions providing the main pieces of information needed by this
+ * function.
+ */
+static bool bfq_better_to_idle(struct bfq_queue *bfqq)
+{
+	struct bfq_data *bfqd = bfqq->bfqd;
+	bool idling_boosts_thr_with_no_issue, idling_needed_for_service_guar;
+
+	if (unlikely(bfqd->strict_guarantees))
+		return true;
+
+	/*
+	 * Idling is performed only if slice_idle > 0. In addition, we
+	 * do not idle if
+	 * (a) bfqq is async
+	 * (b) bfqq is in the idle io prio class: in this case we do
+	 * not idle because we want to minimize the bandwidth that
+	 * queues in this class can steal to higher-priority queues
+	 */
+	if (bfqd->bfq_slice_idle == 0 || !bfq_bfqq_sync(bfqq) ||
+	   bfq_class_idle(bfqq))
+		return false;
+
+	idling_boosts_thr_with_no_issue =
+		idling_boosts_thr_without_issues(bfqd, bfqq);
+
+	idling_needed_for_service_guar =
+		idling_needed_for_service_guarantees(bfqd, bfqq);
 
 	/*
-	 * We have now all the components we need to compute the
+	 * We have now the two components we need to compute the
 	 * return value of the function, which is true only if idling
 	 * either boosts the throughput (without issues), or is
 	 * necessary to preserve service guarantees.
 	 */
-	return idling_boosts_thr_without_issues ||
-		idling_needed_for_service_guarantees;
+	return idling_boosts_thr_with_no_issue ||
+		idling_needed_for_service_guar;
 }
 
 /*
-- 
2.20.1



* [PATCH BUGFIX IMPROVEMENT 07/14] block, bfq: do not plug I/O of in-service queue when harmful
  2019-01-29 11:06 [PATCH BUGFIX IMPROVEMENT 00/14] batch of patches for next linux release Paolo Valente
                   ` (5 preceding siblings ...)
  2019-01-29 11:06 ` [PATCH BUGFIX IMPROVEMENT 06/14] block, bfq: split function bfq_better_to_idle Paolo Valente
@ 2019-01-29 11:06 ` Paolo Valente
  2019-01-29 11:06 ` [PATCH BUGFIX IMPROVEMENT 08/14] block, bfq: unconditionally plug I/O in asymmetric scenarios Paolo Valente
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Paolo Valente @ 2019-01-29 11:06 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-kernel, ulf.hansson, linus.walleij, broonie,
	bfq-iosched, oleksandr, mancha, Paolo Valente

If the in-service bfq_queue is sync and remains temporarily idle, then
I/O dispatching (from other queues) may be plugged. This may be done for
two reasons: either to boost throughput, or to preserve the bandwidth
share of the in-service queue. In the first case, if the I/O of the
in-service queue, when it finally arrives, consists only of one small
I/O request, then it makes sense to plug even the I/O of the
in-service queue. In fact, serving such a small request immediately is
likely to lower throughput instead of boosting it, whereas waiting a
little bit is likely to let that request grow, thanks to request
merging, and become more profitable in terms of throughput (this is
likely to happen exactly because the I/O of the queue has been
detected to boost throughput).

On the opposite end, if I/O dispatching is being plugged only to
preserve the bandwidth of the in-service queue, then it would be
better not to plug also the I/O of the in-service queue, because such
a plugging is likely to cause only loss of bandwidth for the queue.

Unfortunately, no distinction is made between the two cases, and the
I/O of the in-service queue is always plugged whenever just a small I/O
request arrives. This commit draws the missing distinction and avoids
plugging in the case where it would be harmful.
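
In essence, the condition for keeping the device plugged upon the
arrival of a single small request becomes (simplified paraphrase of the
hunk below):

	/*
	 * Keep waiting, with the device plugged, only if the request is
	 * small, idling is being performed to boost throughput, and the
	 * queue is not about to be expired anyway.
	 */
	if (small_req && idling_boosts_thr_without_issues(bfqd, bfqq) &&
	    !budget_timeout)
		return;	/* let the request grow through merging */

If, instead, idling is only preserving the bandwidth share of the
in-service queue, the small request is dispatched immediately, so the
queue no longer loses bandwidth because of its own plugging.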

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
---
 block/bfq-iosched.c | 31 +++++++++++++++++--------------
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 2756f4b1432b..a6fe60114ade 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -4599,28 +4599,31 @@ static void bfq_rq_enqueued(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 		bool budget_timeout = bfq_bfqq_budget_timeout(bfqq);
 
 		/*
-		 * There is just this request queued: if the request
-		 * is small and the queue is not to be expired, then
-		 * just exit.
+		 * There is just this request queued: if
+		 * - the request is small, and
+		 * - we are idling to boost throughput, and
+		 * - the queue is not to be expired,
+		 * then just exit.
 		 *
 		 * In this way, if the device is being idled to wait
 		 * for a new request from the in-service queue, we
 		 * avoid unplugging the device and committing the
-		 * device to serve just a small request. On the
-		 * contrary, we wait for the block layer to decide
-		 * when to unplug the device: hopefully, new requests
-		 * will be merged to this one quickly, then the device
-		 * will be unplugged and larger requests will be
-		 * dispatched.
+		 * device to serve just a small request. In contrast
+		 * we wait for the block layer to decide when to
+		 * unplug the device: hopefully, new requests will be
+		 * merged to this one quickly, then the device will be
+		 * unplugged and larger requests will be dispatched.
 		 */
-		if (small_req && !budget_timeout)
+		if (small_req && idling_boosts_thr_without_issues(bfqd, bfqq) &&
+		    !budget_timeout)
 			return;
 
 		/*
-		 * A large enough request arrived, or the queue is to
-		 * be expired: in both cases disk idling is to be
-		 * stopped, so clear wait_request flag and reset
-		 * timer.
+		 * A large enough request arrived, or idling is being
+		 * performed to preserve service guarantees, or
+		 * finally the queue is to be expired: in all these
+		 * cases disk idling is to be stopped, so clear
+		 * wait_request flag and reset timer.
 		 */
 		bfq_clear_bfqq_wait_request(bfqq);
 		hrtimer_try_to_cancel(&bfqd->idle_slice_timer);
-- 
2.20.1



* [PATCH BUGFIX IMPROVEMENT 08/14] block, bfq: unconditionally plug I/O in asymmetric scenarios
  2019-01-29 11:06 [PATCH BUGFIX IMPROVEMENT 00/14] batch of patches for next linux release Paolo Valente
                   ` (6 preceding siblings ...)
  2019-01-29 11:06 ` [PATCH BUGFIX IMPROVEMENT 07/14] block, bfq: do not plug I/O of in-service queue when harmful Paolo Valente
@ 2019-01-29 11:06 ` Paolo Valente
  2019-01-29 11:06 ` [PATCH BUGFIX IMPROVEMENT 09/14] block, bfq: fix sequential rq detection in rate estimation Paolo Valente
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Paolo Valente @ 2019-01-29 11:06 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-kernel, ulf.hansson, linus.walleij, broonie,
	bfq-iosched, oleksandr, mancha, Paolo Valente

bfq detects the creation of multiple bfq_queues shortly after each
other, namely a burst of queue creations in the terminology used in
the code. If the burst is large, then no queue in the burst is granted
- either I/O-dispatch plugging when the queue remains temporarily
  idle while in service;
- or weight raising, because it causes even longer plugging.

In fact, such plugging tends to lower throughput, while these bursts
are typically due to applications or services that spawn multiple
processes, to reach a common goal as soon as possible. Examples are a
"git grep" or the booting of a system.

Unfortunately, disabling plugging may cause a loss of service
guarantees in asymmetric scenarios, i.e., if queue weights are
differentiated or if more than one group is active.

This commit addresses this issue by no longer disabling I/O-dispatch
plugging for queues in large bursts.

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
---
 block/bfq-iosched.c | 346 +++++++++++++++++++++-----------------------
 1 file changed, 165 insertions(+), 181 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index a6fe60114ade..c1bb5e5fcdc4 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -3479,191 +3479,175 @@ static bool idling_boosts_thr_without_issues(struct bfq_data *bfqd,
 		bfqd->wr_busy_queues == 0;
 }
 
+/*
+ * There is a case where idling must be performed not for
+ * throughput concerns, but to preserve service guarantees.
+ *
+ * To introduce this case, we can note that allowing the drive
+ * to enqueue more than one request at a time, and hence
+ * delegating de facto final scheduling decisions to the
+ * drive's internal scheduler, entails loss of control on the
+ * actual request service order. In particular, the critical
+ * situation is when requests from different processes happen
+ * to be present, at the same time, in the internal queue(s)
+ * of the drive. In such a situation, the drive, by deciding
+ * the service order of the internally-queued requests, does
+ * determine also the actual throughput distribution among
+ * these processes. But the drive typically has no notion or
+ * concern about per-process throughput distribution, and
+ * makes its decisions only on a per-request basis. Therefore,
+ * the service distribution enforced by the drive's internal
+ * scheduler is likely to coincide with the desired
+ * device-throughput distribution only in a completely
+ * symmetric scenario where:
+ * (i)  each of these processes must get the same throughput as
+ *      the others;
+ * (ii) the I/O of each process has the same properties, in
+ *      terms of locality (sequential or random), direction
+ *      (reads or writes), request sizes, greediness
+ *      (from I/O-bound to sporadic), and so on.
+ * In fact, in such a scenario, the drive tends to treat
+ * the requests of each of these processes in about the same
+ * way as the requests of the others, and thus to provide
+ * each of these processes with about the same throughput
+ * (which is exactly the desired throughput distribution). In
+ * contrast, in any asymmetric scenario, device idling is
+ * certainly needed to guarantee that bfqq receives its
+ * assigned fraction of the device throughput (see [1] for
+ * details).
+ * The problem is that idling may significantly reduce
+ * throughput with certain combinations of types of I/O and
+ * devices. An important example is sync random I/O, on flash
+ * storage with command queueing. So, unless bfqq falls in the
+ * above cases where idling also boosts throughput, it would
+ * be important to check conditions (i) and (ii) accurately,
+ * so as to avoid idling when not strictly needed for service
+ * guarantees.
+ *
+ * Unfortunately, it is extremely difficult to thoroughly
+ * check condition (ii). And, in case there are active groups,
+ * it becomes very difficult to check condition (i) too. In
+ * fact, if there are active groups, then, for condition (i)
+ * to become false, it is enough that an active group contains
+ * more active processes or sub-groups than some other active
+ * group. More precisely, for condition (i) to hold because of
+ * such a group, it is not even necessary that the group is
+ * (still) active: it is sufficient that, even if the group
+ * has become inactive, some of its descendant processes still
+ * have some request already dispatched but still waiting for
+ * completion. In fact, requests have still to be guaranteed
+ * their share of the throughput even after being
+ * dispatched. In this respect, it is easy to show that, if a
+ * group frequently becomes inactive while still having
+ * in-flight requests, and if, when this happens, the group is
+ * not considered in the calculation of whether the scenario
+ * is asymmetric, then the group may fail to be guaranteed its
+ * fair share of the throughput (basically because idling may
+ * not be performed for the descendant processes of the group,
+ * but it had to be).  We address this issue with the
+ * following bi-modal behavior, implemented in the function
+ * bfq_symmetric_scenario().
+ *
+ * If there are groups with requests waiting for completion
+ * (as commented above, some of these groups may even be
+ * already inactive), then the scenario is tagged as
+ * asymmetric, conservatively, without checking any of the
+ * conditions (i) and (ii). So the device is idled for bfqq.
+ * This behavior matches also the fact that groups are created
+ * exactly if controlling I/O is a primary concern (to
+ * preserve bandwidth and latency guarantees).
+ *
+ * On the opposite end, if there are no groups with requests
+ * waiting for completion, then only condition (i) is actually
+ * controlled, i.e., provided that condition (i) holds, idling
+ * is not performed, regardless of whether condition (ii)
+ * holds. In other words, only if condition (i) does not hold,
+ * then idling is allowed, and the device tends to be
+ * prevented from queueing many requests, possibly of several
+ * processes. Since there are no groups with requests waiting
+ * for completion, then, to control condition (i) it is enough
+ * to check just whether all the queues with requests waiting
+ * for completion also have the same weight.
+ *
+ * Not checking condition (ii) evidently exposes bfqq to the
+ * risk of getting less throughput than its fair share.
+ * However, for queues with the same weight, a further
+ * mechanism, preemption, mitigates or even eliminates this
+ * problem. And it does so without consequences on overall
+ * throughput. This mechanism and its benefits are explained
+ * in the next three paragraphs.
+ *
+ * Even if a queue, say Q, is expired when it remains idle, Q
+ * can still preempt the new in-service queue if the next
+ * request of Q arrives soon (see the comments on
+ * bfq_bfqq_update_budg_for_activation). If all queues and
+ * groups have the same weight, this form of preemption,
+ * combined with the hole-recovery heuristic described in the
+ * comments on function bfq_bfqq_update_budg_for_activation,
+ * are enough to preserve a correct bandwidth distribution in
+ * the mid term, even without idling. In fact, even if not
+ * idling allows the internal queues of the device to contain
+ * many requests, and thus to reorder requests, we can rather
+ * safely assume that the internal scheduler still preserves a
+ * minimum of mid-term fairness.
+ *
+ * More precisely, this preemption-based, idleless approach
+ * provides fairness in terms of IOPS, and not sectors per
+ * second. This can be seen with a simple example. Suppose
+ * that there are two queues with the same weight, but that
+ * the first queue receives requests of 8 sectors, while the
+ * second queue receives requests of 1024 sectors. In
+ * addition, suppose that each of the two queues contains at
+ * most one request at a time, which implies that each queue
+ * always remains idle after it is served. Finally, after
+ * remaining idle, each queue receives very quickly a new
+ * request. It follows that the two queues are served
+ * alternatively, preempting each other if needed. This
+ * implies that, although both queues have the same weight,
+ * the queue with large requests receives a service that is
+ * 1024/8 times as high as the service received by the other
+ * queue.
+ *
+ * The motivation for using preemption instead of idling (for
+ * queues with the same weight) is that, by not idling,
+ * service guarantees are preserved (completely or at least in
+ * part) without minimally sacrificing throughput. And, if
+ * there is no active group, then the primary expectation for
+ * this device is probably a high throughput.
+ *
+ * We are now left only with explaining the additional
+ * compound condition that is checked below for deciding
+ * whether the scenario is asymmetric. To explain this
+ * compound condition, we need to add that the function
+ * bfq_symmetric_scenario checks the weights of only
+ * non-weight-raised queues, for efficiency reasons (see
+ * comments on bfq_weights_tree_add()). Then the fact that
+ * bfqq is weight-raised is checked explicitly here. More
+ * precisely, the compound condition below takes into account
+ * also the fact that, even if bfqq is being weight-raised,
+ * the scenario is still symmetric if all queues with requests
+ * waiting for completion happen to be
+ * weight-raised. Actually, we should be even more precise
+ * here, and differentiate between interactive weight raising
+ * and soft real-time weight raising.
+ *
+ * As a side note, it is worth considering that the above
+ * device-idling countermeasures may however fail in the
+ * following unlucky scenario: if idling is (correctly)
+ * disabled in a time period during which all symmetry
+ * sub-conditions hold, and hence the device is allowed to
+ * enqueue many requests, but at some later point in time some
+ * sub-condition stops to hold, then it may become impossible
+ * to let requests be served in the desired order until all
+ * the requests already queued in the device have been served.
+ */
 static bool idling_needed_for_service_guarantees(struct bfq_data *bfqd,
 						 struct bfq_queue *bfqq)
 {
-	/*
-	 * There is a case where idling must be performed not for
-	 * throughput concerns, but to preserve service guarantees.
-	 *
-	 * To introduce this case, we can note that allowing the drive
-	 * to enqueue more than one request at a time, and thereby
-	 * delegating de facto final scheduling decisions to the
-	 * drive's internal scheduler, entails loss of control on the
-	 * actual request service order. In particular, the critical
-	 * situation is when requests from different processes happen
-	 * to be present, at the same time, in the internal queue(s)
-	 * of the drive. In such a situation, the drive, by deciding
-	 * the service order of the internally-queued requests, does
-	 * determine also the actual throughput distribution among
-	 * these processes. But the drive typically has no notion or
-	 * concern about per-process throughput distribution, and
-	 * makes its decisions only on a per-request basis. Therefore,
-	 * the service distribution enforced by the drive's internal
-	 * scheduler is likely to coincide with the desired
-	 * device-throughput distribution only in a completely
-	 * symmetric scenario where:
-	 * (i)  each of these processes must get the same throughput as
-	 *      the others;
-	 * (ii) the I/O of each process has the same properties, in
-	 *      terms of locality (sequential or random), direction
-	 *      (reads or writes), request sizes, greediness
-	 *      (from I/O-bound to sporadic), and so on.
-	 * In fact, in such a scenario, the drive tends to treat
-	 * the requests of each of these processes in about the same
-	 * way as the requests of the others, and thus to provide
-	 * each of these processes with about the same throughput
-	 * (which is exactly the desired throughput distribution). In
-	 * contrast, in any asymmetric scenario, device idling is
-	 * certainly needed to guarantee that bfqq receives its
-	 * assigned fraction of the device throughput (see [1] for
-	 * details).
-	 * The problem is that idling may significantly reduce
-	 * throughput with certain combinations of types of I/O and
-	 * devices. An important example is sync random I/O, on flash
-	 * storage with command queueing. So, unless bfqq falls in the
-	 * above cases where idling also boosts throughput, it would
-	 * be important to check conditions (i) and (ii) accurately,
-	 * so as to avoid idling when not strictly needed for service
-	 * guarantees.
-	 *
-	 * Unfortunately, it is extremely difficult to thoroughly
-	 * check condition (ii). And, in case there are active groups,
-	 * it becomes very difficult to check condition (i) too. In
-	 * fact, if there are active groups, then, for condition (i)
-	 * to become false, it is enough that an active group contains
-	 * more active processes or sub-groups than some other active
-	 * group. More precisely, for condition (i) to hold because of
-	 * such a group, it is not even necessary that the group is
-	 * (still) active: it is sufficient that, even if the group
-	 * has become inactive, some of its descendant processes still
-	 * have some request already dispatched but still waiting for
-	 * completion. In fact, requests have still to be guaranteed
-	 * their share of the throughput even after being
-	 * dispatched. In this respect, it is easy to show that, if a
-	 * group frequently becomes inactive while still having
-	 * in-flight requests, and if, when this happens, the group is
-	 * not considered in the calculation of whether the scenario
-	 * is asymmetric, then the group may fail to be guaranteed its
-	 * fair share of the throughput (basically because idling may
-	 * not be performed for the descendant processes of the group,
-	 * but it had to be).  We address this issue with the
-	 * following bi-modal behavior, implemented in the function
-	 * bfq_symmetric_scenario().
-	 *
-	 * If there are groups with requests waiting for completion
-	 * (as commented above, some of these groups may even be
-	 * already inactive), then the scenario is tagged as
-	 * asymmetric, conservatively, without checking any of the
-	 * conditions (i) and (ii). So the device is idled for bfqq.
-	 * This behavior matches also the fact that groups are created
-	 * exactly if controlling I/O is a primary concern (to
-	 * preserve bandwidth and latency guarantees).
-	 *
-	 * On the opposite end, if there are no groups with requests
-	 * waiting for completion, then only condition (i) is actually
-	 * controlled, i.e., provided that condition (i) holds, idling
-	 * is not performed, regardless of whether condition (ii)
-	 * holds. In other words, only if condition (i) does not hold,
-	 * then idling is allowed, and the device tends to be
-	 * prevented from queueing many requests, possibly of several
-	 * processes. Since there are no groups with requests waiting
-	 * for completion, then, to control condition (i) it is enough
-	 * to check just whether all the queues with requests waiting
-	 * for completion also have the same weight.
-	 *
-	 * Not checking condition (ii) evidently exposes bfqq to the
-	 * risk of getting less throughput than its fair share.
-	 * However, for queues with the same weight, a further
-	 * mechanism, preemption, mitigates or even eliminates this
-	 * problem. And it does so without consequences on overall
-	 * throughput. This mechanism and its benefits are explained
-	 * in the next three paragraphs.
-	 *
-	 * Even if a queue, say Q, is expired when it remains idle, Q
-	 * can still preempt the new in-service queue if the next
-	 * request of Q arrives soon (see the comments on
-	 * bfq_bfqq_update_budg_for_activation). If all queues and
-	 * groups have the same weight, this form of preemption,
-	 * combined with the hole-recovery heuristic described in the
-	 * comments on function bfq_bfqq_update_budg_for_activation,
-	 * are enough to preserve a correct bandwidth distribution in
-	 * the mid term, even without idling. In fact, even if not
-	 * idling allows the internal queues of the device to contain
-	 * many requests, and thus to reorder requests, we can rather
-	 * safely assume that the internal scheduler still preserves a
-	 * minimum of mid-term fairness.
-	 *
-	 * More precisely, this preemption-based, idleless approach
-	 * provides fairness in terms of IOPS, and not sectors per
-	 * second. This can be seen with a simple example. Suppose
-	 * that there are two queues with the same weight, but that
-	 * the first queue receives requests of 8 sectors, while the
-	 * second queue receives requests of 1024 sectors. In
-	 * addition, suppose that each of the two queues contains at
-	 * most one request at a time, which implies that each queue
-	 * always remains idle after it is served. Finally, after
-	 * remaining idle, each queue receives very quickly a new
-	 * request. It follows that the two queues are served
-	 * alternatively, preempting each other if needed. This
-	 * implies that, although both queues have the same weight,
-	 * the queue with large requests receives a service that is
-	 * 1024/8 times as high as the service received by the other
-	 * queue.
-	 *
-	 * The motivation for using preemption instead of idling (for
-	 * queues with the same weight) is that, by not idling,
-	 * service guarantees are preserved (completely or at least in
-	 * part) without minimally sacrificing throughput. And, if
-	 * there is no active group, then the primary expectation for
-	 * this device is probably a high throughput.
-	 *
-	 * We are now left only with explaining the additional
-	 * compound condition that is checked below for deciding
-	 * whether the scenario is asymmetric. To explain this
-	 * compound condition, we need to add that the function
-	 * bfq_symmetric_scenario checks the weights of only
-	 * non-weight-raised queues, for efficiency reasons (see
-	 * comments on bfq_weights_tree_add()). Then the fact that
-	 * bfqq is weight-raised is checked explicitly here. More
-	 * precisely, the compound condition below takes into account
-	 * also the fact that, even if bfqq is being weight-raised,
-	 * the scenario is still symmetric if all queues with requests
-	 * waiting for completion happen to be
-	 * weight-raised. Actually, we should be even more precise
-	 * here, and differentiate between interactive weight raising
-	 * and soft real-time weight raising.
-	 *
-	 * As a side note, it is worth considering that the above
-	 * device-idling countermeasures may however fail in the
-	 * following unlucky scenario: if idling is (correctly)
-	 * disabled in a time period during which all symmetry
-	 * sub-conditions hold, and hence the device is allowed to
-	 * enqueue many requests, but at some later point in time some
-	 * sub-condition stops to hold, then it may become impossible
-	 * to let requests be served in the desired order until all
-	 * the requests already queued in the device have been served.
-	 */
-	bool asymmetric_scenario = (bfqq->wr_coeff > 1 &&
-				    bfqd->wr_busy_queues <
-				    bfq_tot_busy_queues(bfqd)) ||
+	return (bfqq->wr_coeff > 1 &&
+		bfqd->wr_busy_queues <
+		bfq_tot_busy_queues(bfqd)) ||
 		!bfq_symmetric_scenario(bfqd);
-
-	/*
-	 * Finally, there is a case where maximizing throughput is the
-	 * best choice even if it may cause unfairness toward
-	 * bfqq. Such a case is when bfqq became active in a burst of
-	 * queue activations. Queues that became active during a large
-	 * burst benefit only from throughput, as discussed in the
-	 * comments on bfq_handle_burst. Thus, if bfqq became active
-	 * in a burst and not idling the device maximizes throughput,
-	 * then the device must no be idled, because not idling the
-	 * device provides bfqq and all other queues in the burst with
-	 * maximum benefit. Combining this and the above case, we can
-	 * now establish when idling is actually needed to preserve
-	 * service guarantees.
-	 */
-	return asymmetric_scenario && !bfq_bfqq_in_large_burst(bfqq);
 }
 
 /*
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH BUGFIX IMPROVEMENT 09/14] block, bfq: fix sequential rq detection in rate estimation
  2019-01-29 11:06 [PATCH BUGFIX IMPROVEMENT 00/14] batch of patches for next linux release Paolo Valente
                   ` (7 preceding siblings ...)
  2019-01-29 11:06 ` [PATCH BUGFIX IMPROVEMENT 08/14] block, bfq: unconditionally plug I/O in asymmetric scenarios Paolo Valente
@ 2019-01-29 11:06 ` Paolo Valente
  2019-01-29 11:06 ` [PATCH BUGFIX IMPROVEMENT 10/14] block, bfq: fix queue removal from weights tree Paolo Valente
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Paolo Valente @ 2019-01-29 11:06 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-kernel, ulf.hansson, linus.walleij, broonie,
	bfq-iosched, oleksandr, mancha, Paolo Valente

In bfq_update_peak_rate, to check whether an I/O request rq is
sequential, only the seek distance of rq w.r.t. the last request
dispatched is considered. This is not sufficient for non-rotational
storage, where the size of rq is at least as relevant. This commit
adds the missing check.
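
As a rough, standalone sketch of the resulting check (not the kernel
code itself: the helper name, the plain C types and the main() driver
are invented for illustration, while the two threshold values mirror
BFQQ_SEEK_THR and BFQQ_SECT_THR_NONROT defined in the patch below):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Thresholds as in the patch: 8*100 sectors of seek distance,
 * 2*32 sectors of request size on non-rotational devices. */
#define SEEK_THR	(8 * 100)
#define SECT_THR_NONROT	(2 * 32)

/* A request is classified as seeky only if it is far from the last
 * position AND, on non-rotational storage, also small. */
static bool rq_is_seeky(uint64_t last_pos, uint64_t rq_pos,
			uint32_t rq_sectors, bool nonrot)
{
	uint64_t sdist = rq_pos > last_pos ? rq_pos - last_pos
					   : last_pos - rq_pos;

	return sdist > SEEK_THR &&
	       (!nonrot || rq_sectors < SECT_THR_NONROT);
}

int main(void)
{
	/* A distant but large request on non-rotational storage is no
	 * longer flagged as seeky, so it still counts as a sequential
	 * sample in the rate estimation. */
	printf("%d\n", rq_is_seeky(0, 100000, 256, true));	/* 0 */
	printf("%d\n", rq_is_seeky(0, 100000, 8, true));	/* 1 */
	return 0;
}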

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
---
 block/bfq-iosched.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index c1bb5e5fcdc4..12228af16198 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -235,6 +235,11 @@ static struct kmem_cache *bfq_pool;
 
 #define BFQQ_SEEK_THR		(sector_t)(8 * 100)
 #define BFQQ_SECT_THR_NONROT	(sector_t)(2 * 32)
+#define BFQ_RQ_SEEKY(bfqd, last_pos, rq) \
+	(get_sdist(last_pos, rq) >			\
+	 BFQQ_SEEK_THR &&				\
+	 (!blk_queue_nonrot(bfqd->queue) ||		\
+	  blk_rq_sectors(rq) < BFQQ_SECT_THR_NONROT))
 #define BFQQ_CLOSE_THR		(sector_t)(8 * 1024)
 #define BFQQ_SEEKY(bfqq)	(hweight32(bfqq->seek_history) > 19)
 
@@ -2754,7 +2759,7 @@ static void bfq_update_peak_rate(struct bfq_data *bfqd, struct request *rq)
 
 	if ((bfqd->rq_in_driver > 0 ||
 		now_ns - bfqd->last_completion < BFQ_MIN_TT)
-	     && get_sdist(bfqd->last_position, rq) < BFQQ_SEEK_THR)
+	    && !BFQ_RQ_SEEKY(bfqd, bfqd->last_position, rq))
 		bfqd->sequential_samples++;
 
 	bfqd->tot_sectors_dispatched += blk_rq_sectors(rq);
@@ -4511,10 +4516,7 @@ bfq_update_io_seektime(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 		       struct request *rq)
 {
 	bfqq->seek_history <<= 1;
-	bfqq->seek_history |=
-		get_sdist(bfqq->last_request_pos, rq) > BFQQ_SEEK_THR &&
-		(!blk_queue_nonrot(bfqd->queue) ||
-		 blk_rq_sectors(rq) < BFQQ_SECT_THR_NONROT);
+	bfqq->seek_history |= BFQ_RQ_SEEKY(bfqd, bfqq->last_request_pos, rq);
 }
 
 static void bfq_update_has_short_ttime(struct bfq_data *bfqd,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH BUGFIX IMPROVEMENT 10/14] block, bfq: fix queue removal from weights tree
  2019-01-29 11:06 [PATCH BUGFIX IMPROVEMENT 00/14] batch of patches for next linux release Paolo Valente
                   ` (8 preceding siblings ...)
  2019-01-29 11:06 ` [PATCH BUGFIX IMPROVEMENT 09/14] block, bfq: fix sequential rq detection in rate estimation Paolo Valente
@ 2019-01-29 11:06 ` Paolo Valente
  2019-01-29 11:06 ` [PATCH BUGFIX IMPROVEMENT 11/14] block, bfq: reduce threshold for detecting command queueing Paolo Valente
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Paolo Valente @ 2019-01-29 11:06 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-kernel, ulf.hansson, linus.walleij, broonie,
	bfq-iosched, oleksandr, mancha, Paolo Valente

bfq maintains an ordered list, through a red-black tree, of unique
weights of active bfq_queues. This list is used to detect whether
there are active queues with differentiated weights. The weight of a
queue is removed from the list when both the following two conditions
become true:
(1) the bfq_queue is flagged as inactive;
(2) the queue has no in-flight requests any longer.

Unfortunately, in the rare cases where condition (2) becomes true
before condition (1), the removal fails, because the function to
remove the weight of the queue (bfq_weights_tree_remove) is rightly
invoked in the path that deactivates the bfq_queue, but mistakenly
invoked *before* the function that actually performs the deactivation
(bfq_deactivate_bfqq).

This commit moves the invocation of bfq_weights_tree_remove for
condition (1) to after bfq_deactivate_bfqq. As a consequence of this
move, it is necessary to take an extra reference on the queue when its
weight is added to the tree, because the queue might otherwise be
freed before bfq_weights_tree_remove is invoked. This commit adds this
reference and makes all related modifications.
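
A minimal userspace model of the lifetime issue being addressed (all
names here are invented for illustration; only the ordering and the
extra reference correspond to what the patch does):

#include <stdio.h>
#include <stdlib.h>

/* Toy queue with a reference counter, standing in for a bfq_queue. */
struct toy_queue {
	int ref;
	int in_weights_tree;
};

static void put_queue(struct toy_queue *q)
{
	if (--q->ref == 0) {
		printf("queue freed\n");
		free(q);
	}
}

static void weights_tree_add(struct toy_queue *q)
{
	q->in_weights_tree = 1;
	q->ref++;		/* the extra reference taken by the patch */
}

static void weights_tree_remove(struct toy_queue *q)
{
	q->in_weights_tree = 0;
	put_queue(q);		/* may free q: do not touch it afterwards */
}

int main(void)
{
	struct toy_queue *q = calloc(1, sizeof(*q));

	if (!q)
		return 1;

	q->ref = 1;		/* reference held elsewhere in the scheduler */
	weights_tree_add(q);

	put_queue(q);		/* the other reference is dropped first ... */
	weights_tree_remove(q);	/* ... yet q stays valid until this point */
	return 0;
}

Without the extra reference taken in weights_tree_add(), the first
put_queue() would already free the queue, and the later removal would
be a use-after-free.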

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
---
 block/bfq-iosched.c | 17 +++++++++++++----
 block/bfq-wf2q.c    |  6 +++---
 2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 12228af16198..bf585ad29bb5 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -754,6 +754,7 @@ void bfq_weights_tree_add(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 
 inc_counter:
 	bfqq->weight_counter->num_active++;
+	bfqq->ref++;
 }
 
 /*
@@ -778,6 +779,7 @@ void __bfq_weights_tree_remove(struct bfq_data *bfqd,
 
 reset_entity_pointer:
 	bfqq->weight_counter = NULL;
+	bfq_put_queue(bfqq);
 }
 
 /*
@@ -789,9 +791,6 @@ void bfq_weights_tree_remove(struct bfq_data *bfqd,
 {
 	struct bfq_entity *entity = bfqq->entity.parent;
 
-	__bfq_weights_tree_remove(bfqd, bfqq,
-				  &bfqd->queue_weights_tree);
-
 	for_each_entity(entity) {
 		struct bfq_sched_data *sd = entity->my_sched_data;
 
@@ -825,6 +824,15 @@ void bfq_weights_tree_remove(struct bfq_data *bfqd,
 			bfqd->num_groups_with_pending_reqs--;
 		}
 	}
+
+	/*
+	 * Next function is invoked last, because it causes bfqq to be
+	 * freed if the following holds: bfqq is not in service and
+	 * has no dispatched request. DO NOT use bfqq after the next
+	 * function invocation.
+	 */
+	__bfq_weights_tree_remove(bfqd, bfqq,
+				  &bfqd->queue_weights_tree);
 }
 
 /*
@@ -1020,7 +1028,8 @@ bfq_bfqq_resume_state(struct bfq_queue *bfqq, struct bfq_data *bfqd,
 
 static int bfqq_process_refs(struct bfq_queue *bfqq)
 {
-	return bfqq->ref - bfqq->allocated - bfqq->entity.on_st;
+	return bfqq->ref - bfqq->allocated - bfqq->entity.on_st -
+		(bfqq->weight_counter != NULL);
 }
 
 /* Empty burst list and add just bfqq (see comments on bfq_handle_burst) */
diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c
index ce37d709a34f..63311d1ff1ed 100644
--- a/block/bfq-wf2q.c
+++ b/block/bfq-wf2q.c
@@ -1673,15 +1673,15 @@ void bfq_del_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 
 	bfqd->busy_queues[bfqq->ioprio_class - 1]--;
 
-	if (!bfqq->dispatched)
-		bfq_weights_tree_remove(bfqd, bfqq);
-
 	if (bfqq->wr_coeff > 1)
 		bfqd->wr_busy_queues--;
 
 	bfqg_stats_update_dequeue(bfqq_group(bfqq));
 
 	bfq_deactivate_bfqq(bfqd, bfqq, true, expiration);
+
+	if (!bfqq->dispatched)
+		bfq_weights_tree_remove(bfqd, bfqq);
 }
 
 /*
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH BUGFIX IMPROVEMENT 11/14] block, bfq: reduce threshold for detecting command queueing
  2019-01-29 11:06 [PATCH BUGFIX IMPROVEMENT 00/14] batch of patches for next linux release Paolo Valente
                   ` (9 preceding siblings ...)
  2019-01-29 11:06 ` [PATCH BUGFIX IMPROVEMENT 10/14] block, bfq: fix queue removal from weights tree Paolo Valente
@ 2019-01-29 11:06 ` Paolo Valente
  2019-01-29 11:06 ` [PATCH BUGFIX IMPROVEMENT 12/14] block, bfq: port commit "cfq-iosched: improve hw_tag detection" Paolo Valente
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Paolo Valente @ 2019-01-29 11:06 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-kernel, ulf.hansson, linus.walleij, broonie,
	bfq-iosched, oleksandr, mancha, Paolo Valente

bfq borrowed from cfq a simple heuristic for detecting whether the
drive performs command queueing: check whether the average number of
in-flight requests is above a given threshold. Unfortunately this
heuristic fails to detect queueing (on drives that do queue commands)
if the processes doing I/O are few and issue I/O with a low depth.

To reduce false negatives, this commit lowers the threshold.
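
Assuming the detection ultimately compares the maximum observed
in-flight depth against this threshold, a couple of processes issuing
I/O at depth 2 each drive the device at depth 4: this is detected with
a threshold of 3, while it was missed with 4. A toy sketch, with
invented names:

#include <stdbool.h>
#include <stdio.h>

#define HW_QUEUE_THRESHOLD 3	/* was 4 before this patch */

/* Illustrative helper, not the kernel function: queueing is deemed
 * detected when the maximum observed depth exceeds the threshold. */
static bool detects_queueing(int max_rq_in_driver)
{
	return max_rq_in_driver > HW_QUEUE_THRESHOLD;
}

int main(void)
{
	printf("%d\n", detects_queueing(4));	/* 1 now, 0 with the old threshold */
	return 0;
}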

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
---
 block/bfq-iosched.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index bf585ad29bb5..48b579032d14 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -230,7 +230,7 @@ static struct kmem_cache *bfq_pool;
 #define BFQ_MIN_TT		(2 * NSEC_PER_MSEC)
 
 /* hw_tag detection: parallel requests threshold and min samples needed. */
-#define BFQ_HW_QUEUE_THRESHOLD	4
+#define BFQ_HW_QUEUE_THRESHOLD	3
 #define BFQ_HW_QUEUE_SAMPLES	32
 
 #define BFQQ_SEEK_THR		(sector_t)(8 * 100)
@@ -4798,7 +4798,7 @@ static void bfq_update_hw_tag(struct bfq_data *bfqd)
 	 * sum is not exact, as it's not taking into account deactivated
 	 * requests.
 	 */
-	if (bfqd->rq_in_driver + bfqd->queued < BFQ_HW_QUEUE_THRESHOLD)
+	if (bfqd->rq_in_driver + bfqd->queued <= BFQ_HW_QUEUE_THRESHOLD)
 		return;
 
 	if (bfqd->hw_tag_samples++ < BFQ_HW_QUEUE_SAMPLES)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH BUGFIX IMPROVEMENT 12/14] block, bfq: port commit "cfq-iosched: improve hw_tag detection"
  2019-01-29 11:06 [PATCH BUGFIX IMPROVEMENT 00/14] batch of patches for next linux release Paolo Valente
                   ` (10 preceding siblings ...)
  2019-01-29 11:06 ` [PATCH BUGFIX IMPROVEMENT 11/14] block, bfq: reduce threshold for detecting command queueing Paolo Valente
@ 2019-01-29 11:06 ` Paolo Valente
  2019-01-29 11:06 ` [PATCH BUGFIX IMPROVEMENT 13/14] block, bfq: do not overcharge writes in asymmetric scenarios Paolo Valente
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Paolo Valente @ 2019-01-29 11:06 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-kernel, ulf.hansson, linus.walleij, broonie,
	bfq-iosched, oleksandr, mancha, Paolo Valente

The original commit is
commit 1a1238a7dd48 ("cfq-iosched: improve hw_tag detection")
and has the following commit message.

If active queue hasn't enough requests and idle window opens, cfq will not
dispatch sufficient requests to hardware. In such situation, current code
will zero hw_tag. But this is because cfq doesn't dispatch enough requests
instead of hardware queue doesn't work. Don't zero hw_tag in such case.

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
---
 block/bfq-iosched.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 48b579032d14..2ab53d93ba12 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -4786,6 +4786,8 @@ static void bfq_insert_requests(struct blk_mq_hw_ctx *hctx,
 
 static void bfq_update_hw_tag(struct bfq_data *bfqd)
 {
+	struct bfq_queue *bfqq = bfqd->in_service_queue;
+
 	bfqd->max_rq_in_driver = max_t(int, bfqd->max_rq_in_driver,
 				       bfqd->rq_in_driver);
 
@@ -4801,6 +4803,17 @@ static void bfq_update_hw_tag(struct bfq_data *bfqd)
 	if (bfqd->rq_in_driver + bfqd->queued <= BFQ_HW_QUEUE_THRESHOLD)
 		return;
 
+	/*
+	 * If active queue hasn't enough requests and can idle, bfq might not
+	 * dispatch sufficient requests to hardware. Don't zero hw_tag in this
+	 * case
+	 */
+	if (bfqq && bfq_bfqq_has_short_ttime(bfqq) &&
+	    bfqq->dispatched + bfqq->queued[0] + bfqq->queued[1] <
+	    BFQ_HW_QUEUE_THRESHOLD &&
+	    bfqd->rq_in_driver < BFQ_HW_QUEUE_THRESHOLD)
+		return;
+
 	if (bfqd->hw_tag_samples++ < BFQ_HW_QUEUE_SAMPLES)
 		return;
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH BUGFIX IMPROVEMENT 13/14] block, bfq: do not overcharge writes in asymmetric scenarios
  2019-01-29 11:06 [PATCH BUGFIX IMPROVEMENT 00/14] batch of patches for next linux release Paolo Valente
                   ` (11 preceding siblings ...)
  2019-01-29 11:06 ` [PATCH BUGFIX IMPROVEMENT 12/14] block, bfq: port commit "cfq-iosched: improve hw_tag detection" Paolo Valente
@ 2019-01-29 11:06 ` Paolo Valente
  2019-01-29 11:06 ` [PATCH BUGFIX IMPROVEMENT 14/14] block, bfq: fix in-service-queue check for queue merging Paolo Valente
  2019-01-31 19:50 ` [PATCH BUGFIX IMPROVEMENT 00/14] batch of patches for next linux release Jens Axboe
  14 siblings, 0 replies; 16+ messages in thread
From: Paolo Valente @ 2019-01-29 11:06 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-kernel, ulf.hansson, linus.walleij, broonie,
	bfq-iosched, oleksandr, mancha, Paolo Valente

Writes tend to starve reads. bfq counteracts this problem by
overcharging writes, i.e., by charging them an inflated service
w.r.t. the actual service (number of sectors written) they receive.

Yet this overcharging is useless, and actually causes unfairness in
the opposite direction, when bfq happens to be enforcing strong I/O
control. bfq enforces such control when the scenario is asymmetric,
i.e., when some bfq_queue or group of bfq_queues is to be granted a
different bandwidth than some other bfq_queue or group of
bfq_queues. So, in such a scenario, this commit disables write
overcharging.
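
A back-of-the-envelope sketch of the charging rule after the change
(the helper and the charge-factor value used here are illustrative;
the kernel uses its own bfq_async_charge_factor):

#include <stdio.h>

/* Toy model of the charging rule after the patch. */
static unsigned long serv_to_charge(unsigned long sectors, int is_sync,
				    int wr_coeff, int symmetric)
{
	const unsigned long async_charge_factor = 3;	/* assumed value */

	if (is_sync || wr_coeff > 1 || !symmetric)
		return sectors;

	return sectors * async_charge_factor;
}

int main(void)
{
	/* An 8-sector async write is overcharged only when the
	 * scenario is symmetric. */
	printf("symmetric:  %lu\n", serv_to_charge(8, 0, 1, 1));	/* 24 */
	printf("asymmetric: %lu\n", serv_to_charge(8, 0, 1, 0));	/* 8 */
	return 0;
}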

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
---
 block/bfq-iosched.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 2ab53d93ba12..06268449d2ca 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -888,7 +888,8 @@ static struct request *bfq_find_next_rq(struct bfq_data *bfqd,
 static unsigned long bfq_serv_to_charge(struct request *rq,
 					struct bfq_queue *bfqq)
 {
-	if (bfq_bfqq_sync(bfqq) || bfqq->wr_coeff > 1)
+	if (bfq_bfqq_sync(bfqq) || bfqq->wr_coeff > 1 ||
+	    !bfq_symmetric_scenario(bfqq->bfqd))
 		return blk_rq_sectors(rq);
 
 	return blk_rq_sectors(rq) * bfq_async_charge_factor;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH BUGFIX IMPROVEMENT 14/14] block, bfq: fix in-service-queue check for queue merging
  2019-01-29 11:06 [PATCH BUGFIX IMPROVEMENT 00/14] batch of patches for next linux release Paolo Valente
                   ` (12 preceding siblings ...)
  2019-01-29 11:06 ` [PATCH BUGFIX IMPROVEMENT 13/14] block, bfq: do not overcharge writes in asymmetric scenarios Paolo Valente
@ 2019-01-29 11:06 ` Paolo Valente
  2019-01-31 19:50 ` [PATCH BUGFIX IMPROVEMENT 00/14] batch of patches for next linux release Jens Axboe
  14 siblings, 0 replies; 16+ messages in thread
From: Paolo Valente @ 2019-01-29 11:06 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-kernel, ulf.hansson, linus.walleij, broonie,
	bfq-iosched, oleksandr, mancha, Paolo Valente

When a new I/O request arrives for a bfq_queue, say Q, bfq checks
whether that request is close to
(a) the head request of some other queue waiting to be served, or
(b) the last request dispatched for the in-service queue (in case Q
itself is not the in-service queue)

If a queue, say Q2, is found for which the above condition holds, then
bfq merges Q and Q2, in the hope of getting more sequential I/O from
the resulting merged queue, and thus possibly higher throughput.

Case (b) is checked by comparing the new request for Q with the last
request dispatched, assuming that the latter necessarily belonged to
the in-service queue. Unfortunately, this assumption is no longer
always correct, since commit d0edc2473be9 ("block, bfq: inject
other-queue I/O into seeky idle queues on NCQ flash").

When the assumption does not hold, queues that must not be merged may
be merged, causing unexpected loss of control on per-queue service
guarantees.

This commit solves this problem by adding an extra field, which stores
the position of the last request dispatched for the in-service queue,
and by using this new field to correctly check case (b).
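
The idea can be sketched in isolation as follows (names and the
closeness threshold are illustrative; only the separation between the
global last position and the in-service queue's last position reflects
the patch):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define CLOSE_THR 1024	/* illustrative closeness threshold, in sectors */

struct sched_state {
	uint64_t last_position;		/* last request dispatched, any queue */
	uint64_t in_serv_last_pos;	/* last request of the in-service queue */
};

static void on_dispatch(struct sched_state *s, uint64_t pos,
			bool from_in_service_queue)
{
	s->last_position = pos;
	if (from_in_service_queue)
		s->in_serv_last_pos = pos;
}

static bool close_to_in_service(const struct sched_state *s, uint64_t pos)
{
	uint64_t d = pos > s->in_serv_last_pos ? pos - s->in_serv_last_pos
					       : s->in_serv_last_pos - pos;

	return d <= CLOSE_THR;
}

int main(void)
{
	struct sched_state s = { 0, 0 };

	on_dispatch(&s, 1000, true);	/* in-service queue */
	on_dispatch(&s, 900000, false);	/* injected queue */

	/* A new request near sector 1200 is still recognized as close
	 * to the in-service queue, despite the injected dispatch in
	 * between. */
	printf("%d\n", close_to_in_service(&s, 1200));	/* 1 */
	return 0;
}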

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
---
 block/bfq-iosched.c | 5 ++++-
 block/bfq-iosched.h | 3 +++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 06268449d2ca..4c592496a16a 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2251,7 +2251,8 @@ bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 
 	if (in_service_bfqq && in_service_bfqq != bfqq &&
 	    likely(in_service_bfqq != &bfqd->oom_bfqq) &&
-	    bfq_rq_close_to_sector(io_struct, request, bfqd->last_position) &&
+	    bfq_rq_close_to_sector(io_struct, request,
+				   bfqd->in_serv_last_pos) &&
 	    bfqq->entity.parent == in_service_bfqq->entity.parent &&
 	    bfq_may_be_close_cooperator(bfqq, in_service_bfqq)) {
 		new_bfqq = bfq_setup_merge(bfqq, in_service_bfqq);
@@ -2791,6 +2792,8 @@ static void bfq_update_peak_rate(struct bfq_data *bfqd, struct request *rq)
 	bfq_update_rate_reset(bfqd, rq);
 update_last_values:
 	bfqd->last_position = blk_rq_pos(rq) + blk_rq_sectors(rq);
+	if (RQ_BFQQ(rq) == bfqd->in_service_queue)
+		bfqd->in_serv_last_pos = bfqd->last_position;
 	bfqd->last_dispatch = now_ns;
 }
 
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index 30be669be465..062e1c4787f4 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -538,6 +538,9 @@ struct bfq_data {
 	/* on-disk position of the last served request */
 	sector_t last_position;
 
+	/* position of the last served request for the in-service queue */
+	sector_t in_serv_last_pos;
+
 	/* time of last request completion (ns) */
 	u64 last_completion;
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH BUGFIX IMPROVEMENT 00/14] batch of patches for next linux release
  2019-01-29 11:06 [PATCH BUGFIX IMPROVEMENT 00/14] batch of patches for next linux release Paolo Valente
                   ` (13 preceding siblings ...)
  2019-01-29 11:06 ` [PATCH BUGFIX IMPROVEMENT 14/14] block, bfq: fix in-service-queue check for queue merging Paolo Valente
@ 2019-01-31 19:50 ` Jens Axboe
  14 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2019-01-31 19:50 UTC (permalink / raw)
  To: Paolo Valente
  Cc: linux-block, linux-kernel, ulf.hansson, linus.walleij, broonie,
	bfq-iosched, oleksandr, mancha

On 1/29/19 4:06 AM, Paolo Valente wrote:
> Hi,
> this batch of patches provides fixes and improvements for throughput
> and latency. Every patch has been under test for at least one month,
> some patches for much longer.
> 
> Thanks,
> Paolo
> 
> Paolo Valente (14):
>   block, bfq: do not consider interactive queues in srt filtering
>   block, bfq: avoid selecting a queue w/o budget
>   block, bfq: make sure queue budgets are not below service received
>   block, bfq: remove case of redirected bic from insert_request
>   block, bfq: consider also ioprio classes in symmetry detection
>   block, bfq: split function bfq_better_to_idle
>   block, bfq: do not plug I/O of in-service queue when harmful
>   block, bfq: unconditionally plug I/O in asymmetric scenarios
>   block, bfq: fix sequential rq detection in rate estimation
>   block, bfq: fix queue removal from weights tree
>   block, bfq: reduce threshold for detecting command queueing
>   block, bfq: port commit "cfq-iosched: improve hw_tag detection"
>   block, bfq: do not overcharge writes in asymmetric scenarios
>   block, bfq: fix in-service-queue check for queue merging
> 
>  block/bfq-iosched.c | 705 ++++++++++++++++++++++++--------------------
>  block/bfq-iosched.h |  11 +-
>  block/bfq-wf2q.c    |  18 +-
>  3 files changed, 400 insertions(+), 334 deletions(-)

Applied, thanks.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 16+ messages in thread

