linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Paolo Valente <paolo.valente@linaro.org>
To: Jens Axboe <axboe@kernel.dk>, Tejun Heo <tj@kernel.org>
Cc: Fabio Checconi <fchecconi@gmail.com>,
	Arianna Avanzini <avanzini.arianna@gmail.com>,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	ulf.hansson@linaro.org, linus.walleij@linaro.org,
	broonie@kernel.org, Paolo Valente <paolo.valente@linaro.org>
Subject: [PATCH RFC 13/22] block, bfq: add more fairness to boost throughput and reduce latency
Date: Mon,  1 Feb 2016 23:12:49 +0100	[thread overview]
Message-ID: <1454364778-25179-14-git-send-email-paolo.valente@linaro.org> (raw)
In-Reply-To: <1454364778-25179-1-git-send-email-paolo.valente@linaro.org>

We have found four sources of throughput loss and higher
latencies. First, write requests tend to starve read requests,
basically because, on one side, writes are slower than reads, whereas,
on the other side, storage devices confuse schedulers by deceptively
signaling the completion of write requests immediately after receiving
them. This patch addresses this issue by just throttling writes. In
particular, after a write request is dispatched for a queue, the
budget of the queue is decremented by the number of sectors to write,
multiplied by an (over)charge coefficient.

The current value of this coefficient, as well as the values of the
constants used in the following other changes, is the result of
our tuning with different devices.

The second source of problems is that some applications generate
really few and small, yet very far, random requests at the beginning
of a new I/O-bound phase. This causes the average seek distance,
computed using a low-pass filter, to remain high for a non-negligible
amount of time, even if then the application issues only sequential
requests. Hence, for a while, the queue associated with the
application is unavoidably detected as seeky (i.e., containing random
requests). The device-idling timeout is then set to a very low value
for the queue. This often caused a loss of throughput on rotational
devices, as well as an increased latency. In contrast, this patch
allows the device-idling timeout for a seeky queue to be set to a very
low value only if the associated process has either already consumed
at least a minimum fraction (1/8) of the maximum budget B_max, or
already proved to generate random requests systematically. In
particular, in the latter case the queue is flagged as "constantly
seeky".

Finally, the following additional BFQ mechanism causes throughput loss
and increased latencies in two further situations. When the in-service
queue is expired, BFQ also controls whether the queue has been "too
slow", i.e., has consumed its last-assigned budget at such a low rate
that it would have been impossible to consume all of it within the
maximum time slice T_max (Subsec. 3.5 in [1]). In this case, the queue
is always (over)charged the whole budget, to reduce its utilization of
the device, exactly as it happens with seeky queues. The description
of both the two situations in which this behavior causes problems and
the solutions provided by this patch follows.

1. If too little time has elapsed since a process started doing
sequential I/O, then the positive effect on the throughput of its
sequential accesses may not have yet prevailed on the throughput loss
caused by the fact that a random access had to be performed to get to
the first sector requested by the process. For this reason, if a slow
queue is expired after receiving very little service (at most 1/8 of
the maximum budget), now it is not charged a full budget.

2. Because of ZBR, a queue may be deemed as slow when its associated
process is performing I/O on the slowest zones of a disk. However,
unless the process is truly too slow, not reducing the disk
utilization of the queue is more profitable in terms of disk
throughput than the opposite. For this reason now a queue is never
charged the whole budget if it has consumed at least a significant
part of it (2/3).

[1] P. Valente and M. Andreolini, "Improving Application
    Responsiveness with the BFQ Disk I/O Scheduler", Proceedings of
    the 5th Annual International Systems and Storage Conference
    (SYSTOR '12), June 2012.
    Slightly extended version:
    http://algogroup.unimore.it/people/paolo/disk_sched/bfq-v1-suite-
							results.pdf

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
---
 block/bfq.h         |  5 +++++
 block/cfq-iosched.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 51 insertions(+), 5 deletions(-)

diff --git a/block/bfq.h b/block/bfq.h
index 22088da..c8f4f29 100644
--- a/block/bfq.h
+++ b/block/bfq.h
@@ -384,6 +384,10 @@ enum bfqq_state_flags {
 					 * having consumed at most 2/10 of
 					 * its budget
 					 */
+	BFQ_BFQQ_FLAG_constantly_seeky,	/*
+					 * bfqq has proved to be slow and
+					 * seeky until budget timeout
+					 */
 };
 
 #define BFQ_BFQQ_FNS(name)						\
@@ -408,6 +412,7 @@ BFQ_BFQQ_FNS(idle_window);
 BFQ_BFQQ_FNS(sync);
 BFQ_BFQQ_FNS(budget_new);
 BFQ_BFQQ_FNS(IO_bound);
+BFQ_BFQQ_FNS(constantly_seeky);
 #undef BFQ_BFQQ_FNS
 
 /* Logging facilities. */
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 9792cd7..25ee367 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -93,6 +93,13 @@ static const int bfq_stats_min_budgets = 194;
 static const int bfq_default_max_budget = 16 * 1024;
 static const int bfq_max_budget_async_rq = 4;
 
+/*
+ * Async to sync throughput distribution is controlled as follows:
+ * when an async request is served, the entity is charged the number
+ * of sectors of the request, multiplied by the factor below
+ */
+static const int bfq_async_charge_factor = 10;
+
 /* Default timeout values, in jiffies, approximating CFQ defaults. */
 static const int bfq_timeout_sync = HZ / 8;
 static int bfq_timeout_async = HZ / 25;
@@ -2440,10 +2447,12 @@ static struct request *bfq_find_next_rq(struct bfq_data *bfqd,
 	return bfq_choose_req(bfqd, next, prev, blk_rq_pos(last));
 }
 
+/* see the definition of bfq_async_charge_factor for details */
 static unsigned long bfq_serv_to_charge(struct request *rq,
 					struct bfq_queue *bfqq)
 {
-	return blk_rq_sectors(rq);
+	return blk_rq_sectors(rq) *
+		(1 + ((!bfq_bfqq_sync(bfqq)) * bfq_async_charge_factor));
 }
 
 /**
@@ -2779,13 +2788,22 @@ static void bfq_arm_slice_timer(struct bfq_data *bfqd)
 	 * We don't want to idle for seeks, but we do want to allow
 	 * fair distribution of slice time for a process doing back-to-back
 	 * seeks. So allow a little bit of time for him to submit a new rq.
+	 *
+	 * To prevent processes with (partly) seeky workloads from
+	 * being too ill-treated, grant them a small fraction of the
+	 * assigned budget before reducing the waiting time to
+	 * BFQ_MIN_TT. This happened to help reduce latency.
 	 */
 	sl = bfqd->bfq_slice_idle;
 	/*
-	 * Grant only minimum idle time if the queue has been seeky
-	 * for long enough.
+	 * Grant only minimum idle time if the queue either has been
+	 * seeky for long enough or has already proved to be
+	 * constantly seeky.
 	 */
-	if (bfq_sample_valid(bfqq->seek_samples) && BFQQ_SEEKY(bfqq))
+	if (bfq_sample_valid(bfqq->seek_samples) &&
+	    ((BFQQ_SEEKY(bfqq) && bfqq->entity.service >
+				  bfq_max_budget(bfqq->bfqd) / 8) ||
+	      bfq_bfqq_constantly_seeky(bfqq)))
 		sl = min(sl, msecs_to_jiffies(BFQ_MIN_TT));
 
 	bfqd->last_idling_start = ktime_get();
@@ -3109,6 +3127,16 @@ static bool bfq_update_peak_rate(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 	}
 
 	/*
+	 * If the process has been served for a too short time
+	 * interval to let its possible sequential accesses prevail on
+	 * the initial seek time needed to move the disk head on the
+	 * first sector it requested, then give the process a chance
+	 * and for the moment return false.
+	 */
+	if (bfqq->entity.budget <= bfq_max_budget(bfqd) / 8)
+		return false;
+
+	/*
 	 * A process is considered ``slow'' (i.e., seeky, so that we
 	 * cannot treat it fairly in the service domain, as it would
 	 * slow down too much the other processes) if, when a slice
@@ -3175,10 +3203,21 @@ static void bfq_bfqq_expire(struct bfq_data *bfqd,
 	/*
 	 * As above explained, 'punish' slow (i.e., seeky), timed-out
 	 * and async queues, to favor sequential sync workloads.
+	 *
+	 * Processes doing I/O in the slower disk zones will tend to be
+	 * slow(er) even if not seeky. Hence, since the estimated peak
+	 * rate is actually an average over the disk surface, these
+	 * processes may timeout just for bad luck. To avoid punishing
+	 * them we do not charge a full budget to a process that
+	 * succeeded in consuming at least 2/3 of its budget.
 	 */
-	if (slow || reason == BFQ_BFQQ_BUDGET_TIMEOUT)
+	if (slow || (reason == BFQ_BFQQ_BUDGET_TIMEOUT &&
+		     bfq_bfqq_budget_left(bfqq) >=  bfqq->entity.budget / 3))
 		bfq_bfqq_charge_full_budget(bfqq);
 
+	if (BFQQ_SEEKY(bfqq) && reason == BFQ_BFQQ_BUDGET_TIMEOUT)
+		bfq_mark_bfqq_constantly_seeky(bfqq);
+
 	if (reason == BFQ_BFQQ_TOO_IDLE &&
 	    bfqq->entity.service <= 2 * bfqq->entity.budget / 10)
 		bfq_clear_bfqq_IO_bound(bfqq);
@@ -3914,6 +3953,8 @@ static void bfq_rq_enqueued(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 
 	bfq_update_io_thinktime(bfqd, bic);
 	bfq_update_io_seektime(bfqd, bfqq, rq);
+	if (!BFQQ_SEEKY(bfqq))
+		bfq_clear_bfqq_constantly_seeky(bfqq);
 	if (bfqq->entity.service > bfq_max_budget(bfqd) / 8 ||
 	    !BFQQ_SEEKY(bfqq))
 		bfq_update_idle_window(bfqd, bfqq, bic);
-- 
1.9.1

  parent reply	other threads:[~2016-02-01 22:51 UTC|newest]

Thread overview: 103+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-01 22:12 [PATCH RFC 00/22] Replace the CFQ I/O Scheduler with BFQ Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 01/22] block, cfq: remove queue merging for close cooperators Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 02/22] block, cfq: remove close-based preemption Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 03/22] block, cfq: remove deep seek queues logic Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 04/22] block, cfq: remove SSD-related logic Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 05/22] block, cfq: get rid of hierarchical support Paolo Valente
2016-02-10 23:04   ` Tejun Heo
2016-02-01 22:12 ` [PATCH RFC 06/22] block, cfq: get rid of queue preemption Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 07/22] block, cfq: get rid of workload type Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 08/22] block, cfq: get rid of latency tunables Paolo Valente
2016-02-10 23:05   ` Tejun Heo
2016-02-01 22:12 ` [PATCH RFC 09/22] block, cfq: replace CFQ with the BFQ-v0 I/O scheduler Paolo Valente
2016-02-11 22:22   ` Tejun Heo
2016-02-12  0:35     ` Mark Brown
2016-02-17 15:57       ` Tejun Heo
2016-02-17 16:02         ` Mark Brown
2016-02-17 17:04           ` Tejun Heo
2016-02-17 18:13             ` Jonathan Corbet
2016-02-17 19:45               ` Tejun Heo
2016-02-17 19:56                 ` Jonathan Corbet
2016-02-17 20:14                   ` Tejun Heo
2016-02-17  9:02     ` Paolo Valente
2016-02-17 17:02       ` Tejun Heo
2016-02-20 10:23         ` Paolo Valente
2016-02-20 11:02           ` Paolo Valente
2016-03-01 18:46           ` Tejun Heo
2016-03-04 17:29             ` Linus Walleij
2016-03-04 17:39               ` Christoph Hellwig
2016-03-04 18:10                 ` Austin S. Hemmelgarn
2016-03-11 11:16                   ` Christoph Hellwig
2016-03-11 13:38                     ` Austin S. Hemmelgarn
2016-03-05 12:18                 ` Linus Walleij
2016-03-11 11:17                   ` Christoph Hellwig
2016-03-11 11:24                     ` Nikolay Borisov
2016-03-11 11:49                       ` Christoph Hellwig
2016-03-11 14:53                     ` Linus Walleij
2016-03-09  6:55                 ` Paolo Valente
2016-04-13 19:54                 ` Tejun Heo
2016-04-14  5:03                   ` Mark Brown
2016-03-09  6:34             ` Paolo Valente
2016-04-13 20:41               ` Tejun Heo
2016-04-14 10:23                 ` Paolo Valente
2016-04-14 16:29                   ` Tejun Heo
2016-04-15 14:20                     ` Paolo Valente
2016-04-15 15:08                       ` Tejun Heo
2016-04-15 16:17                         ` Paolo Valente
2016-04-15 19:29                           ` Tejun Heo
2016-04-15 22:08                             ` Paolo Valente
2016-04-15 22:45                               ` Tejun Heo
2016-04-16  6:03                                 ` Paolo Valente
2016-04-15 14:49                     ` Linus Walleij
2016-02-01 22:12 ` [PATCH RFC 10/22] block, bfq: add full hierarchical scheduling and cgroups support Paolo Valente
2016-02-11 22:28   ` Tejun Heo
2016-02-17  9:07     ` Paolo Valente
2016-02-17 17:14       ` Tejun Heo
2016-02-17 17:45         ` Tejun Heo
2016-04-20  9:32     ` Paolo
2016-04-22 18:13       ` Tejun Heo
2016-04-22 18:19         ` Paolo Valente
2016-04-22 18:41           ` Tejun Heo
2016-04-22 19:05             ` Paolo Valente
2016-04-22 19:32               ` Tejun Heo
2016-04-23  7:07                 ` Paolo Valente
2016-04-25 19:24                   ` Tejun Heo
2016-04-25 20:30                     ` Paolo
2016-05-06 20:20                       ` Paolo Valente
2016-05-12 13:11                         ` Paolo
2016-07-27 16:13                         ` [PATCH RFC V8 00/22] Replace the CFQ I/O Scheduler with BFQ Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 01/22] block, cfq: remove queue merging for close cooperators Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 02/22] block, cfq: remove close-based preemption Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 03/22] block, cfq: remove deep seek queues logic Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 04/22] block, cfq: remove SSD-related logic Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 05/22] block, cfq: get rid of hierarchical support Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 06/22] block, cfq: get rid of queue preemption Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 07/22] block, cfq: get rid of workload type Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 08/22] block, cfq: get rid of latency tunables Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 09/22] block, cfq: replace CFQ with the BFQ-v0 I/O scheduler Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 10/22] block, bfq: add full hierarchical scheduling and cgroups support Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 11/22] block, bfq: improve throughput boosting Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 12/22] block, bfq: modify the peak-rate estimator Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 13/22] block, bfq: add more fairness with writes and slow processes Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 14/22] block, bfq: improve responsiveness Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 15/22] block, bfq: reduce I/O latency for soft real-time applications Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 16/22] block, bfq: preserve a low latency also with NCQ-capable drives Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 17/22] block, bfq: reduce latency during request-pool saturation Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 18/22] block, bfq: add Early Queue Merge (EQM) Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 19/22] block, bfq: reduce idling only in symmetric scenarios Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 20/22] block, bfq: boost the throughput on NCQ-capable flash-based devices Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 21/22] block, bfq: boost the throughput with random I/O on NCQ-capable HDDs Paolo Valente
2016-07-27 16:13                           ` [PATCH RFC V8 22/22] block, bfq: handle bursts of queue activations Paolo Valente
2016-07-28 16:50                           ` [PATCH RFC V8 00/22] Replace the CFQ I/O Scheduler with BFQ Paolo
2016-02-01 22:12 ` [PATCH RFC 11/22] block, bfq: improve throughput boosting Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 12/22] block, bfq: modify the peak-rate estimator Paolo Valente
2016-02-01 22:12 ` Paolo Valente [this message]
2016-02-01 22:12 ` [PATCH RFC 14/22] block, bfq: improve responsiveness Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 15/22] block, bfq: reduce I/O latency for soft real-time applications Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 16/22] block, bfq: preserve a low latency also with NCQ-capable drives Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 17/22] block, bfq: reduce latency during request-pool saturation Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 18/22] block, bfq: add Early Queue Merge (EQM) Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 19/22] block, bfq: reduce idling only in symmetric scenarios Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 20/22] block, bfq: boost the throughput on NCQ-capable flash-based devices Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 21/22] block, bfq: boost the throughput with random I/O on NCQ-capable HDDs Paolo Valente
2016-02-01 22:12 ` [PATCH RFC 22/22] block, bfq: handle bursts of queue activations Paolo Valente

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1454364778-25179-14-git-send-email-paolo.valente@linaro.org \
    --to=paolo.valente@linaro.org \
    --cc=avanzini.arianna@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=broonie@kernel.org \
    --cc=fchecconi@gmail.com \
    --cc=linus.walleij@linaro.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tj@kernel.org \
    --cc=ulf.hansson@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).