linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH BUGFIX/IMPROVEMENT 0/3] bfq: one fix and two important improvements for throughput
@ 2018-09-14 14:23 Paolo Valente
  2018-09-14 14:23 ` [PATCH BUGFIX/IMPROVEMENT 1/3] block, bfq: correctly charge and reset entity service in all cases Paolo Valente
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Paolo Valente @ 2018-09-14 14:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-kernel, ulf.hansson, linus.walleij, broonie,
	bfq-iosched, oleksandr, Paolo Valente

Hi Jens,
the second and third patch in this series provide two important
improvements in bfq's ability to boost throughput with random I/O. The
benefits of the second patch concern I/O control, and are described in
detail in this LWN article [1] (and briefly in the commit message
itself). The benefits of the other patch should be
straightforward. Finally, the first patch fixes an I/O-control bug,
found while making the second patch.

These patches modify somehow core operations of bfq, so, before
submitting them, I have tested them a lot, and have had them tested by
other people too. In particular, during these months, they have been
tested in systems ranging from PCs to development platforms.

Thanks,
Paolo

[1] https://lwn.net/Articles/763603/

Paolo Valente (3):
  block, bfq: correctly charge and reset entity service in all cases
  block, bfq: inject other-queue I/O into seeky idle queues on NCQ flash
  blok, bfq: do not plug I/O if all queues are weight-raised

 block/bfq-iosched.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++------
 block/bfq-iosched.h | 26 ++++++++++++++++++
 block/bfq-wf2q.c    | 13 ++++++---
 3 files changed, 106 insertions(+), 11 deletions(-)

--
2.16.1

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH BUGFIX/IMPROVEMENT 1/3] block, bfq: correctly charge and reset entity service in all cases
  2018-09-14 14:23 [PATCH BUGFIX/IMPROVEMENT 0/3] bfq: one fix and two important improvements for throughput Paolo Valente
@ 2018-09-14 14:23 ` Paolo Valente
  2018-09-14 14:23 ` [PATCH BUGFIX/IMPROVEMENT 2/3] block, bfq: inject other-queue I/O into seeky idle queues on NCQ flash Paolo Valente
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Paolo Valente @ 2018-09-14 14:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-kernel, ulf.hansson, linus.walleij, broonie,
	bfq-iosched, oleksandr, Paolo Valente

BFQ schedules entities (which represent either per-process queues or
groups of queues) as a function of their timestamps. In particular, as
a function of their (virtual) finish times. The finish time of an
entity is computed as a function of the budget assigned to the entity,
assuming, tentatively, that the entity, once in service, will receive
an amount of service equal to its budget. Then, when the entity is
expired because it finishes to be served, this finish time is updated
as a function of the actual service received by the entity. This
allows the entity to be correctly charged with only the service
received, and then to be correctly re-scheduled.

Yet an entity may receive service also while not being the entity in
service (in the scheduling environment of its parent entity), for
several reasons. If the entity remains with no backlog while receiving
this 'unofficial' service, then it is expired. Also on such an
expiration, the finish time of the entity should be updated to account
for only the service actually received by the entity. Unfortunately,
such an update is not performed for an entity expiring without being
the entity in service.

In a similar vein, the service counter of the entity in service is
reset when the entity is expired, to be ready to be used for next
service cycle. This reset too should be performed also in case an
entity is expired because it remains empty after receiving service
while not being the entity in service. But in this case the reset is
not performed.

This commit performs the above update of the finish time and reset of
the service received, also for an entity expiring while not being the
entity in service.

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
---
 block/bfq-wf2q.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c
index ae52bff43ce4..ff7c2d470bb8 100644
--- a/block/bfq-wf2q.c
+++ b/block/bfq-wf2q.c
@@ -1181,10 +1181,17 @@ bool __bfq_deactivate_entity(struct bfq_entity *entity, bool ins_into_idle_tree)
 	st = bfq_entity_service_tree(entity);
 	is_in_service = entity == sd->in_service_entity;
 
-	if (is_in_service) {
-		bfq_calc_finish(entity, entity->service);
+	bfq_calc_finish(entity, entity->service);
+
+	if (is_in_service)
 		sd->in_service_entity = NULL;
-	}
+	else
+		/*
+		 * Non in-service entity: nobody will take care of
+		 * resetting its service counter on expiration. Do it
+		 * now.
+		 */
+		entity->service = 0;
 
 	if (entity->tree == &st->active)
 		bfq_active_extract(st, entity);
-- 
2.16.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH BUGFIX/IMPROVEMENT 2/3] block, bfq: inject other-queue I/O into seeky idle queues on NCQ flash
  2018-09-14 14:23 [PATCH BUGFIX/IMPROVEMENT 0/3] bfq: one fix and two important improvements for throughput Paolo Valente
  2018-09-14 14:23 ` [PATCH BUGFIX/IMPROVEMENT 1/3] block, bfq: correctly charge and reset entity service in all cases Paolo Valente
@ 2018-09-14 14:23 ` Paolo Valente
  2018-09-14 14:23 ` [PATCH BUGFIX/IMPROVEMENT 3/3] blok, bfq: do not plug I/O if all queues are weight-raised Paolo Valente
  2018-09-14 19:06 ` [PATCH BUGFIX/IMPROVEMENT 0/3] bfq: one fix and two important improvements for throughput Jens Axboe
  3 siblings, 0 replies; 5+ messages in thread
From: Paolo Valente @ 2018-09-14 14:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-kernel, ulf.hansson, linus.walleij, broonie,
	bfq-iosched, oleksandr, Paolo Valente

The Achilles' heel of BFQ is its failing to reach a high throughput
with sync random I/O on flash storage with internal queueing, in case
the processes doing I/O have differentiated weights.

The cause of this failure is as follows. If at least two processes do
sync I/O, and have a different weight from each other, then BFQ plugs
I/O dispatching every time one of these processes, while it is being
served, remains temporarily without pending I/O requests. This
plugging is necessary to guarantee that every process enjoys a
bandwidth proportional to its weight; but it empties the internal
queue(s) of the drive. And this kills throughput with random I/O. So,
if some processes have differentiated weights and do both sync and
random I/O, the end result is a throughput collapse.

This commit tries to counter this problem by injecting the service of
other processes, in a controlled way, while the process in service
happens to have no I/O. This injection is performed only if the medium
is non rotational and performs internal queueing, and the process in
service does random I/O (service injection might be beneficial for
sequential I/O too, we'll work on that).

As an example of the benefits of this commit, on a PLEXTOR PX-256M5S
SSD, and with five processes having differentiated weights and doing
sync random 4KB I/O, this commit makes the throughput with bfq grow by
400%, from 25 to 100MB/s. This higher throughput is 10MB/s lower than
that reached with none. As some less random I/O is added to the mix,
the throughput becomes equal to or higher than that with none.

This commit is a very first attempt to recover throughput without
losing control, and certainly has many limitations. One is, e.g., that
the processes whose service is injected are not chosen so as to
distribute the extra bandwidth they receive in accordance to their
weights. Thus there might be loss of weighted fairness in some
cases. Anyway, this loss concerns extra service, which would not have
been received at all without this commit. Other limitations and issues
will probably show up with usage.

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
---
 block/bfq-iosched.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++-----
 block/bfq-iosched.h | 26 ++++++++++++++++++++
 2 files changed, 88 insertions(+), 6 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 653100fb719e..d94838bcc135 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -3182,6 +3182,13 @@ static unsigned long bfq_bfqq_softrt_next_start(struct bfq_data *bfqd,
 		    jiffies + nsecs_to_jiffies(bfqq->bfqd->bfq_slice_idle) + 4);
 }
 
+static bool bfq_bfqq_injectable(struct bfq_queue *bfqq)
+{
+	return BFQQ_SEEKY(bfqq) && bfqq->wr_coeff == 1 &&
+		blk_queue_nonrot(bfqq->bfqd->queue) &&
+		bfqq->bfqd->hw_tag;
+}
+
 /**
  * bfq_bfqq_expire - expire a queue.
  * @bfqd: device owning the queue.
@@ -3291,6 +3298,8 @@ void bfq_bfqq_expire(struct bfq_data *bfqd,
 	if (ref == 1) /* bfqq is gone, no more actions on it */
 		return;
 
+	bfqq->injected_service = 0;
+
 	/* mark bfqq as waiting a request only if a bic still points to it */
 	if (!bfq_bfqq_busy(bfqq) &&
 	    reason != BFQQE_BUDGET_TIMEOUT &&
@@ -3629,6 +3638,30 @@ static bool bfq_bfqq_must_idle(struct bfq_queue *bfqq)
 	return RB_EMPTY_ROOT(&bfqq->sort_list) && bfq_better_to_idle(bfqq);
 }
 
+static struct bfq_queue *bfq_choose_bfqq_for_injection(struct bfq_data *bfqd)
+{
+	struct bfq_queue *bfqq;
+
+	/*
+	 * A linear search; but, with a high probability, very few
+	 * steps are needed to find a candidate queue, i.e., a queue
+	 * with enough budget left for its next request. In fact:
+	 * - BFQ dynamically updates the budget of every queue so as
+	 *   to accommodate the expected backlog of the queue;
+	 * - if a queue gets all its requests dispatched as injected
+	 *   service, then the queue is removed from the active list
+	 *   (and re-added only if it gets new requests, but with
+	 *   enough budget for its new backlog).
+	 */
+	list_for_each_entry(bfqq, &bfqd->active_list, bfqq_list)
+		if (!RB_EMPTY_ROOT(&bfqq->sort_list) &&
+		    bfq_serv_to_charge(bfqq->next_rq, bfqq) <=
+		    bfq_bfqq_budget_left(bfqq))
+			return bfqq;
+
+	return NULL;
+}
+
 /*
  * Select a queue for service.  If we have a current queue in service,
  * check whether to continue servicing it, or retrieve and set a new one.
@@ -3710,10 +3743,19 @@ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
 	 * No requests pending. However, if the in-service queue is idling
 	 * for a new request, or has requests waiting for a completion and
 	 * may idle after their completion, then keep it anyway.
+	 *
+	 * Yet, to boost throughput, inject service from other queues if
+	 * possible.
 	 */
 	if (bfq_bfqq_wait_request(bfqq) ||
 	    (bfqq->dispatched != 0 && bfq_better_to_idle(bfqq))) {
-		bfqq = NULL;
+		if (bfq_bfqq_injectable(bfqq) &&
+		    bfqq->injected_service * bfqq->inject_coeff <
+		    bfqq->entity.service * 10)
+			bfqq = bfq_choose_bfqq_for_injection(bfqd);
+		else
+			bfqq = NULL;
+
 		goto keep_queue;
 	}
 
@@ -3803,6 +3845,14 @@ static struct request *bfq_dispatch_rq_from_bfqq(struct bfq_data *bfqd,
 
 	bfq_dispatch_remove(bfqd->queue, rq);
 
+	if (bfqq != bfqd->in_service_queue) {
+		if (likely(bfqd->in_service_queue))
+			bfqd->in_service_queue->injected_service +=
+				bfq_serv_to_charge(rq, bfqq);
+
+		goto return_rq;
+	}
+
 	/*
 	 * If weight raising has to terminate for bfqq, then next
 	 * function causes an immediate update of bfqq's weight,
@@ -3821,13 +3871,12 @@ static struct request *bfq_dispatch_rq_from_bfqq(struct bfq_data *bfqd,
 	 * belongs to CLASS_IDLE and other queues are waiting for
 	 * service.
 	 */
-	if (bfqd->busy_queues > 1 && bfq_class_idle(bfqq))
-		goto expire;
-
-	return rq;
+	if (!(bfqd->busy_queues > 1 && bfq_class_idle(bfqq)))
+		goto return_rq;
 
-expire:
 	bfq_bfqq_expire(bfqd, bfqq, false, BFQQE_BUDGET_EXHAUSTED);
+
+return_rq:
 	return rq;
 }
 
@@ -4232,6 +4281,13 @@ static void bfq_init_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 			bfq_mark_bfqq_has_short_ttime(bfqq);
 		bfq_mark_bfqq_sync(bfqq);
 		bfq_mark_bfqq_just_created(bfqq);
+		/*
+		 * Aggressively inject a lot of service: up to 90%.
+		 * This coefficient remains constant during bfqq life,
+		 * but this behavior might be changed, after enough
+		 * testing and tuning.
+		 */
+		bfqq->inject_coeff = 1;
 	} else
 		bfq_clear_bfqq_sync(bfqq);
 
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index a8a2e5aca4d4..37d627afdc2e 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -351,6 +351,32 @@ struct bfq_queue {
 	unsigned long split_time; /* time of last split */
 
 	unsigned long first_IO_time; /* time of first I/O for this queue */
+
+	/* max service rate measured so far */
+	u32 max_service_rate;
+	/*
+	 * Ratio between the service received by bfqq while it is in
+	 * service, and the cumulative service (of requests of other
+	 * queues) that may be injected while bfqq is empty but still
+	 * in service. To increase precision, the coefficient is
+	 * measured in tenths of unit. Here are some example of (1)
+	 * ratios, (2) resulting percentages of service injected
+	 * w.r.t. to the total service dispatched while bfqq is in
+	 * service, and (3) corresponding values of the coefficient:
+	 * 1 (50%) -> 10
+	 * 2 (33%) -> 20
+	 * 10 (9%) -> 100
+	 * 9.9 (9%) -> 99
+	 * 1.5 (40%) -> 15
+	 * 0.5 (66%) -> 5
+	 * 0.1 (90%) -> 1
+	 *
+	 * So, if the coefficient is lower than 10, then
+	 * injected service is more than bfqq service.
+	 */
+	unsigned int inject_coeff;
+	/* amount of service injected in current service slot */
+	unsigned int injected_service;
 };
 
 /**
-- 
2.16.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH BUGFIX/IMPROVEMENT 3/3] blok, bfq: do not plug I/O if all queues are weight-raised
  2018-09-14 14:23 [PATCH BUGFIX/IMPROVEMENT 0/3] bfq: one fix and two important improvements for throughput Paolo Valente
  2018-09-14 14:23 ` [PATCH BUGFIX/IMPROVEMENT 1/3] block, bfq: correctly charge and reset entity service in all cases Paolo Valente
  2018-09-14 14:23 ` [PATCH BUGFIX/IMPROVEMENT 2/3] block, bfq: inject other-queue I/O into seeky idle queues on NCQ flash Paolo Valente
@ 2018-09-14 14:23 ` Paolo Valente
  2018-09-14 19:06 ` [PATCH BUGFIX/IMPROVEMENT 0/3] bfq: one fix and two important improvements for throughput Jens Axboe
  3 siblings, 0 replies; 5+ messages in thread
From: Paolo Valente @ 2018-09-14 14:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-kernel, ulf.hansson, linus.walleij, broonie,
	bfq-iosched, oleksandr, Paolo Valente

To reduce latency for interactive and soft real-time applications, bfq
privileges the bfq_queues containing the I/O of these
applications. These privileged queues, referred-to as weight-raised
queues, get a much higher share of the device throughput
w.r.t. non-privileged queues. To preserve this higher share, the I/O
of any non-weight-raised queue must be plugged whenever a sync
weight-raised queue, while being served, remains temporarily empty. To
attain this goal, bfq simply plugs any I/O (from any queue), if a sync
weight-raised queue remains empty while in service.

Unfortunately, this plugging typically lowers throughput with random
I/O, on devices with internal queueing (because it reduces the filling
level of the internal queues of the device).

This commit addresses this issue by restricting the cases where
plugging is performed: if a sync weight-raised queue remains empty
while in service, then I/O plugging is performed only if some of the
active bfq_queues are *not* weight-raised (which is actually the only
circumstance where plugging is needed to preserve the higher share of
the throughput of weight-raised queues). This restriction proved able
to boost throughput in really many use cases needing only maximum
throughput.

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
---
 block/bfq-iosched.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index d94838bcc135..c0b1db3afb81 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -3580,7 +3580,12 @@ static bool bfq_better_to_idle(struct bfq_queue *bfqq)
 	 * whether bfqq is being weight-raised, because
 	 * bfq_symmetric_scenario() does not take into account also
 	 * weight-raised queues (see comments on
-	 * bfq_weights_tree_add()).
+	 * bfq_weights_tree_add()). In particular, if bfqq is being
+	 * weight-raised, it is important to idle only if there are
+	 * other, non-weight-raised queues that may steal throughput
+	 * to bfqq. Actually, we should be even more precise, and
+	 * differentiate between interactive weight raising and
+	 * soft real-time weight raising.
 	 *
 	 * As a side note, it is worth considering that the above
 	 * device-idling countermeasures may however fail in the
@@ -3592,7 +3597,8 @@ static bool bfq_better_to_idle(struct bfq_queue *bfqq)
 	 * to let requests be served in the desired order until all
 	 * the requests already queued in the device have been served.
 	 */
-	asymmetric_scenario = bfqq->wr_coeff > 1 ||
+	asymmetric_scenario = (bfqq->wr_coeff > 1 &&
+			       bfqd->wr_busy_queues < bfqd->busy_queues) ||
 		!bfq_symmetric_scenario(bfqd);
 
 	/*
-- 
2.16.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH BUGFIX/IMPROVEMENT 0/3] bfq: one fix and two important improvements for throughput
  2018-09-14 14:23 [PATCH BUGFIX/IMPROVEMENT 0/3] bfq: one fix and two important improvements for throughput Paolo Valente
                   ` (2 preceding siblings ...)
  2018-09-14 14:23 ` [PATCH BUGFIX/IMPROVEMENT 3/3] blok, bfq: do not plug I/O if all queues are weight-raised Paolo Valente
@ 2018-09-14 19:06 ` Jens Axboe
  3 siblings, 0 replies; 5+ messages in thread
From: Jens Axboe @ 2018-09-14 19:06 UTC (permalink / raw)
  To: Paolo Valente
  Cc: linux-block, linux-kernel, ulf.hansson, linus.walleij, broonie,
	bfq-iosched, oleksandr

On 9/14/18 8:23 AM, Paolo Valente wrote:
> Hi Jens,
> the second and third patch in this series provide two important
> improvements in bfq's ability to boost throughput with random I/O. The
> benefits of the second patch concern I/O control, and are described in
> detail in this LWN article [1] (and briefly in the commit message
> itself). The benefits of the other patch should be
> straightforward. Finally, the first patch fixes an I/O-control bug,
> found while making the second patch.
> 
> These patches modify somehow core operations of bfq, so, before
> submitting them, I have tested them a lot, and have had them tested by
> other people too. In particular, during these months, they have been
> tested in systems ranging from PCs to development platforms.

Applied for 4.20, thanks Paolo.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2018-09-14 19:06 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-14 14:23 [PATCH BUGFIX/IMPROVEMENT 0/3] bfq: one fix and two important improvements for throughput Paolo Valente
2018-09-14 14:23 ` [PATCH BUGFIX/IMPROVEMENT 1/3] block, bfq: correctly charge and reset entity service in all cases Paolo Valente
2018-09-14 14:23 ` [PATCH BUGFIX/IMPROVEMENT 2/3] block, bfq: inject other-queue I/O into seeky idle queues on NCQ flash Paolo Valente
2018-09-14 14:23 ` [PATCH BUGFIX/IMPROVEMENT 3/3] blok, bfq: do not plug I/O if all queues are weight-raised Paolo Valente
2018-09-14 19:06 ` [PATCH BUGFIX/IMPROVEMENT 0/3] bfq: one fix and two important improvements for throughput Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).