linux-kernel.vger.kernel.org archive mirror
* [PATCH IMPROVEMENT/BUGFIX 0/4] block, bfq: increase sustainable IOPS and fix a bug
@ 2017-11-13  6:34 Paolo Valente
  2017-11-13  6:34 ` [PATCH BUGFIX/IMPROVEMENT 1/4] doc, block, bfq: update max IOPS sustainable with BFQ Paolo Valente
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Paolo Valente @ 2017-11-13  6:34 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-kernel, ulf.hansson, broonie, linus.walleij,
	lee.tibbert, oleksandr, lucmiccio, bfq-iosched, Paolo Valente

Hi,
these patches address the following issue, raised and
discussed in [1].

BFQ provides a proportional share policy for the blkio controller.  In
this respect, BFQ updates the I/O accounting related to its policy,
i.e., the statistics contained in the special files blkio.bfq.* in
blkio groups (these files are the bfq counterpart of the blkio.*
statistic files updated by CFQ).  To update these statistics, BFQ
invokes some blkg_*stats_* functions.  We have found that these
functions account for a considerable percentage, about 40%, of the
total execution time of BFQ.

This patch series contains two patches to address this issue, namely
the patches whose main aspects were anticipated and discussed in [1].

The first of these two patches is patch 3/4 in this series: it enables
BFQ to execute the above blkg_*stats_* functions, where possible, in
parallel with the rest of the code of the scheduler.  With this
improvement, the maximum request-processing rate sustainable with BFQ
grows by 25%-30%, depending on the CPU.  For instance, it grows from
250 to 310 KIOPS on an Intel i7-4850HQ. These results, and the others
reported in this letter, have been obtained and can be reproduced very
easily with the script [2].
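
To give a concrete idea of the change, here is a minimal, purely
illustrative sketch of the pattern introduced by patch 3/4 (the real
hunks are of course in the patch itself):

spin_lock_irq(&bfqd->lock);
rq = __bfq_dispatch_request(hctx);   /* scheduling decision */
bfqq = rq ? RQ_BFQQ(rq) : NULL;      /* remember what to account */
spin_unlock_irq(&bfqd->lock);

if (bfqq) {
	/*
	 * The blkg_*stats_* helpers run here, outside the scheduler
	 * lock, so they can execute in parallel with other bfq code;
	 * the request_queue lock keeps the group alive meanwhile.
	 */
	spin_lock_irq(hctx->queue->queue_lock);
	bfqg_stats_update_io_remove(bfqq_group(bfqq), rq->cmd_flags);
	spin_unlock_irq(hctx->queue->queue_lock);
}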

Unfortunately, even after the above improvement, blkg_*stats_*
functions still cause a noticeable loss of sustainable throughput.  To
give an idea, on an Intel i7-4850HQ, if the update of blkio.bfq.*
statistics is not performed at all, then the sustainable throughput
grows from 310 to 400 KIOPS.  This issue has already been discussed in
[1] as well. In brief, we agreed to make a further commit, which
introduces the possibility of disabling/re-enabling, at boot or at
module-loading time, the updating of all blkio statistics for
proportional-share policies, i.e., of both those updated by BFQ and
those updated by CFQ.
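
Purely to illustrate what such a switch might look like (this is just
a sketch, not the actual commit; the parameter name and mechanism
below are hypothetical), the toggle could be a boot/module parameter
checked at the top of the stats-update helpers:

/* hypothetical sketch only; name and mechanism are not final */
static bool bfq_update_blkio_stats = true;
module_param(bfq_update_blkio_stats, bool, 0444);
MODULE_PARM_DESC(bfq_update_blkio_stats,
		 "update blkio.bfq.* statistics (costly)");

void bfqg_stats_update_io_remove(struct bfq_group *bfqg, unsigned int op)
{
	if (!bfq_update_blkio_stats)
		return; /* skip the costly accounting */
	blkg_rwstat_add(&bfqg->stats.queued, op, -1);
}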

We are already working on that commit, but finalizing it will take
some time.  Fortunately, following a suggestion of Tejun in the same
thread [1], it is already possible to drastically increase BFQ
performance when no blkio-debugging information is needed. Tejun's
suggestion is to move most blkio.bfq.*
statistics behind an already existing config option,
CONFIG_DEBUG_BLK_CGROUP.  Patch 4/4 in this series does that.  Thanks
to this change, if CONFIG_DEBUG_BLK_CGROUP is not set, then bfq
attains a further boost in sustainable throughput, which ranges from
+30% to +45%, depending on the CPU (some figures in the
documentation).
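
For reference, the mechanism used in patch 4/4 is simply to compile
the debug-only stat helpers conditionally, and to provide empty stubs
otherwise, along these lines (abridged from the patch):

#if defined(CONFIG_BFQ_GROUP_IOSCHED) && defined(CONFIG_DEBUG_BLK_CGROUP)
void bfqg_stats_update_io_remove(struct bfq_group *bfqg, unsigned int op)
{
	blkg_rwstat_add(&bfqg->stats.queued, op, -1);
}
#else
/* empty stub: no accounting overhead when debugging is not enabled */
void bfqg_stats_update_io_remove(struct bfq_group *bfqg, unsigned int op) { }
#endif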

The above two patches are preceded by two preliminary patches.  The
first updates the conservative range of IOPS (sustainable with BFQ)
that was previously reported in the documentation.  The patch replaces
this piece of information with the actual, much higher limits that we
have measured while working on the above two commits.  The second
preliminary patch fixes a functional bug, related to the update of the
above statistics.

We waited for one week of testing from bfq users before submitting
these patches. We hope we are still in time to have these
improvements and fixes considered for 4.15.

NOTE.

Two definitions of empty functions in patch 4/4 trigger the following
checkpatch error: "open brace '{' following function definitions go on
the next line". Unfortunately, following this recommendation seems to
worsen the code in our case: in addition to making these two
definitions slightly harder to read, it would break symmetry with
respect to all other definitions of empty functions, both those
already present in the base code, and those added by the patch
itself. In particular, in all those other definitions, the empty body
of the functions is on the same line as the prototypes of the
functions. Oddly, the latter definitions do not cause the same error
report.

Thanks,
Paolo

[1] https://www.spinics.net/lists/linux-block/msg18943.html
[2] https://github.com/Algodev-github/IOSpeed

Luca Miccio (2):
  block, bfq: add missing invocations of bfqg_stats_update_io_add/remove
  block, bfq: move debug blkio stats behind CONFIG_DEBUG_BLK_CGROUP

Paolo Valente (2):
  doc, block, bfq: update max IOPS sustainable with BFQ
  block, bfq: update blkio stats outside the scheduler lock

 Documentation/block/bfq-iosched.txt |  43 +++++++++--
 block/bfq-cgroup.c                  | 148 ++++++++++++++++++++----------------
 block/bfq-iosched.c                 | 117 ++++++++++++++++++++++++++--
 block/bfq-iosched.h                 |   4 +-
 block/bfq-wf2q.c                    |   1 -
 5 files changed, 233 insertions(+), 80 deletions(-)

--
2.10.0


* [PATCH BUGFIX/IMPROVEMENT 1/4] doc, block, bfq: update max IOPS sustainable with BFQ
  2017-11-13  6:34 [PATCH IMPROVEMENT/BUGFIX 0/4] block, bfq: increase sustainable IOPS and fix a bug Paolo Valente
@ 2017-11-13  6:34 ` Paolo Valente
  2017-11-13  6:34 ` [PATCH BUGFIX/IMPROVEMENT 2/4] block, bfq: add missing invocations of bfqg_stats_update_io_add/remove Paolo Valente
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Paolo Valente @ 2017-11-13  6:34 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-kernel, ulf.hansson, broonie, linus.walleij,
	lee.tibbert, oleksandr, lucmiccio, bfq-iosched, Paolo Valente

We have investigated the performance of BFQ more deeply, in terms of
the number of IOPS that the CPU can process when BFQ is used as the
I/O scheduler. In more detail, using the script [1], we have measured
the number of IOPS reached on top of a null block device configured
with zero latency, as a function of the workload (sequential read,
sequential write, random read, random write) and of the system (we
considered desktops, laptops and embedded systems).

Based on the resulting figures, with this commit we update the
current, conservative IOPS range reported in the BFQ documentation. In
particular, the documentation now reports, for each of three different
systems, the lowest number of IOPS obtained for that system with the
above test (namely, the value obtained with the workload leading to
the lowest IOPS).

[1] https://github.com/Algodev-github/IOSpeed

Reviewed-by: Lee Tibbert <lee.tibbert@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Luca Miccio <lucmiccio@gmail.com>
---
 Documentation/block/bfq-iosched.txt | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/Documentation/block/bfq-iosched.txt b/Documentation/block/bfq-iosched.txt
index 3d6951d..7a93615 100644
--- a/Documentation/block/bfq-iosched.txt
+++ b/Documentation/block/bfq-iosched.txt
@@ -20,12 +20,17 @@ for that device, by setting low_latency to 0. See Section 3 for
 details on how to configure BFQ for the desired tradeoff between
 latency and throughput, or on how to maximize throughput.
 
-On average CPUs, the current version of BFQ can handle devices
-performing at most ~30K IOPS; at most ~50 KIOPS on faster CPUs. As a
-reference, 30-50 KIOPS correspond to very high bandwidths with
-sequential I/O (e.g., 8-12 GB/s if I/O requests are 256 KB large), and
-to 120-200 MB/s with 4KB random I/O. BFQ is currently being tested on
-multi-queue devices too.
+BFQ has a non-null overhead, which limits the maximum IOPS that the
+CPU can process for a device scheduled with BFQ. To give an idea of
+the limits on slow or average CPUs, here are BFQ limits for three
+different CPUs, on, respectively, an average laptop, an old desktop,
+and a cheap embedded system, in case full hierarchical support is
+enabled (i.e., CONFIG_BFQ_GROUP_IOSCHED is set):
+- Intel i7-4850HQ: 250 KIOPS
+- AMD A8-3850: 170 KIOPS
+- ARM CortexTM-A53 Octa-core: 45 KIOPS
+
+BFQ works for multi-queue devices too.
 
 The table of contents follow. Impatients can just jump to Section 3.
 
-- 
2.10.0


* [PATCH BUGFIX/IMPROVEMENT 2/4] block, bfq: add missing invocations of bfqg_stats_update_io_add/remove
  2017-11-13  6:34 [PATCH IMPROVEMENT/BUGFIX 0/4] block, bfq: increase sustainable IOPS and fix a bug Paolo Valente
  2017-11-13  6:34 ` [PATCH BUGFIX/IMPROVEMENT 1/4] doc, block, bfq: update max IOPS sustainable with BFQ Paolo Valente
@ 2017-11-13  6:34 ` Paolo Valente
  2017-11-13  6:34 ` [PATCH BUGFIX/IMPROVEMENT 3/4] block, bfq: update blkio stats outside the scheduler lock Paolo Valente
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Paolo Valente @ 2017-11-13  6:34 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-kernel, ulf.hansson, broonie, linus.walleij,
	lee.tibbert, oleksandr, lucmiccio, bfq-iosched, Paolo Valente

From: Luca Miccio <lucmiccio@gmail.com>

bfqg_stats_update_io_add and bfqg_stats_update_io_remove are to be
invoked, respectively, when an I/O request enters and when an I/O
request exits the scheduler. Unfortunately, bfq does not fully comply
with this scheme, because it does not invoke these functions for
requests that are inserted into or extracted from its priority
dispatch list. This commit fixes this mistake.

Tested-by: Lee Tibbert <lee.tibbert@gmail.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Luca Miccio <lucmiccio@gmail.com>
---
 block/bfq-iosched.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 889a854..91703eb 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -1359,7 +1359,6 @@ static void bfq_bfqq_handle_idle_busy_switch(struct bfq_data *bfqd,
 			bfqq->ttime.last_end_request +
 			bfqd->bfq_slice_idle * 3;
 
-	bfqg_stats_update_io_add(bfqq_group(RQ_BFQQ(rq)), bfqq, rq->cmd_flags);
 
 	/*
 	 * bfqq deserves to be weight-raised if:
@@ -1633,7 +1632,6 @@ static void bfq_remove_request(struct request_queue *q,
 	if (rq->cmd_flags & REQ_META)
 		bfqq->meta_pending--;
 
-	bfqg_stats_update_io_remove(bfqq_group(bfqq), rq->cmd_flags);
 }
 
 static bool bfq_bio_merge(struct blk_mq_hw_ctx *hctx, struct bio *bio)
@@ -1746,6 +1744,7 @@ static void bfq_requests_merged(struct request_queue *q, struct request *rq,
 		bfqq->next_rq = rq;
 
 	bfq_remove_request(q, next);
+	bfqg_stats_update_io_remove(bfqq_group(bfqq), next->cmd_flags);
 
 	spin_unlock_irq(&bfqq->bfqd->lock);
 end:
@@ -3700,6 +3699,9 @@ static struct request *bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
 	spin_lock_irq(&bfqd->lock);
 
 	rq = __bfq_dispatch_request(hctx);
+	if (rq && RQ_BFQQ(rq))
+		bfqg_stats_update_io_remove(bfqq_group(RQ_BFQQ(rq)),
+					    rq->cmd_flags);
 	spin_unlock_irq(&bfqd->lock);
 
 	return rq;
@@ -4224,6 +4226,7 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 {
 	struct request_queue *q = hctx->queue;
 	struct bfq_data *bfqd = q->elevator->elevator_data;
+	struct bfq_queue *bfqq = RQ_BFQQ(rq);
 
 	spin_lock_irq(&bfqd->lock);
 	if (blk_mq_sched_try_insert_merge(q, rq)) {
@@ -4243,6 +4246,12 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 			list_add_tail(&rq->queuelist, &bfqd->dispatch);
 	} else {
 		__bfq_insert_request(bfqd, rq);
+		/*
+		 * Update bfqq, because, if a queue merge has occurred
+		 * in __bfq_insert_request, then rq has been
+		 * redirected into a new queue.
+		 */
+		bfqq = RQ_BFQQ(rq);
 
 		if (rq_mergeable(rq)) {
 			elv_rqhash_add(q, rq);
@@ -4251,6 +4260,9 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 		}
 	}
 
+	if (bfqq)
+		bfqg_stats_update_io_add(bfqq_group(bfqq), bfqq, rq->cmd_flags);
+
 	spin_unlock_irq(&bfqd->lock);
 }
 
@@ -4428,8 +4440,11 @@ static void bfq_finish_request(struct request *rq)
 		 * lock is held.
 		 */
 
-		if (!RB_EMPTY_NODE(&rq->rb_node))
+		if (!RB_EMPTY_NODE(&rq->rb_node)) {
 			bfq_remove_request(rq->q, rq);
+			bfqg_stats_update_io_remove(bfqq_group(bfqq),
+						    rq->cmd_flags);
+		}
 		bfq_put_rq_priv_body(bfqq);
 	}
 
-- 
2.10.0


* [PATCH BUGFIX/IMPROVEMENT 3/4] block, bfq: update blkio stats outside the scheduler lock
  2017-11-13  6:34 [PATCH IMPROVEMENT/BUGFIX 0/4] block, bfq: increase sustainable IOPS and fix a bug Paolo Valente
  2017-11-13  6:34 ` [PATCH BUGFIX/IMPROVEMENT 1/4] doc, block, bfq: update max IOPS sustainable with BFQ Paolo Valente
  2017-11-13  6:34 ` [PATCH BUGFIX/IMPROVEMENT 2/4] block, bfq: add missing invocations of bfqg_stats_update_io_add/remove Paolo Valente
@ 2017-11-13  6:34 ` Paolo Valente
  2017-11-13  6:34 ` [PATCH BUGFIX/IMPROVEMENT 4/4] block, bfq: move debug blkio stats behind CONFIG_DEBUG_BLK_CGROUP Paolo Valente
  2017-11-13 17:53 ` [PATCH IMPROVEMENT/BUGFIX 0/4] block, bfq: increase sustainable IOPS and fix a bug Jens Axboe
  4 siblings, 0 replies; 6+ messages in thread
From: Paolo Valente @ 2017-11-13  6:34 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-kernel, ulf.hansson, broonie, linus.walleij,
	lee.tibbert, oleksandr, lucmiccio, bfq-iosched, Paolo Valente

bfq invokes various blkg_*stats_* functions to update the statistics
contained in the special files blkio.bfq.* in the blkio controller
groups, i.e., the I/O accounting related to the proportional-share
policy provided by bfq. The execution of these functions takes a
considerable percentage, about 40%, of the total per-request execution
time of bfq (i.e., of the sum of the execution time of all the bfq
functions that have to be executed to process an I/O request from its
creation to its destruction).  This reduces the request-processing
rate sustainable by bfq noticeably, even on a multicore CPU. In fact,
the bfq functions that invoke blkg_*stats_* functions cannot be
executed in parallel with the rest of the code of bfq, because both
are executed under the same per-device scheduler lock.

To reduce this slowdown, this commit moves, wherever possible, the
invocation of these functions (more precisely, of the bfq functions
that invoke blkg_*stats_* functions) outside the critical sections
protected by the scheduler lock.

With this change, and with all blkio.bfq.* statistics enabled, the
throughput grows, e.g., from 250 to 310 KIOPS (+25%) on an Intel
i7-4850HQ, in the case of 8 threads doing random I/O in parallel on
null_blk, with the latter configured with 0 latency. We obtained the
same or higher throughput boosts, up to +30%, with other processors
(some figures are reported in the documentation). For our tests, we
used the script [1], with which our results can be easily reproduced.

NOTE. This commit still protects the invocation of blkg_*stats_*
functions with the request_queue lock, because the group these
functions are invoked on may otherwise disappear before or while these
functions are executed.  Fortunately, tests without even this lock
show, by difference, that the serialization caused by this lock has
little impact (at most a ~5% reduction in throughput).

[1] https://github.com/Algodev-github/IOSpeed

Tested-by: Lee Tibbert <lee.tibbert@gmail.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Luca Miccio <lucmiccio@gmail.com>
---
 Documentation/block/bfq-iosched.txt |   6 +-
 block/bfq-iosched.c                 | 110 ++++++++++++++++++++++++++++++++----
 block/bfq-wf2q.c                    |   1 -
 3 files changed, 102 insertions(+), 15 deletions(-)

diff --git a/Documentation/block/bfq-iosched.txt b/Documentation/block/bfq-iosched.txt
index 7a93615..7fad6c0 100644
--- a/Documentation/block/bfq-iosched.txt
+++ b/Documentation/block/bfq-iosched.txt
@@ -26,9 +26,9 @@ the limits on slow or average CPUs, here are BFQ limits for three
 different CPUs, on, respectively, an average laptop, an old desktop,
 and a cheap embedded system, in case full hierarchical support is
 enabled (i.e., CONFIG_BFQ_GROUP_IOSCHED is set):
-- Intel i7-4850HQ: 250 KIOPS
-- AMD A8-3850: 170 KIOPS
-- ARM CortexTM-A53 Octa-core: 45 KIOPS
+- Intel i7-4850HQ: 310 KIOPS
+- AMD A8-3850: 200 KIOPS
+- ARM CortexTM-A53 Octa-core: 56 KIOPS
 
 BFQ works for multi-queue devices too.
 
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 91703eb..69e05f8 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2228,7 +2228,6 @@ static void __bfq_set_in_service_queue(struct bfq_data *bfqd,
 				       struct bfq_queue *bfqq)
 {
 	if (bfqq) {
-		bfqg_stats_update_avg_queue_size(bfqq_group(bfqq));
 		bfq_clear_bfqq_fifo_expire(bfqq);
 
 		bfqd->budgets_assigned = (bfqd->budgets_assigned * 7 + 256) / 8;
@@ -3469,7 +3468,6 @@ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
 				 */
 				bfq_clear_bfqq_wait_request(bfqq);
 				hrtimer_try_to_cancel(&bfqd->idle_slice_timer);
-				bfqg_stats_update_idle_time(bfqq_group(bfqq));
 			}
 			goto keep_queue;
 		}
@@ -3695,15 +3693,67 @@ static struct request *bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
 {
 	struct bfq_data *bfqd = hctx->queue->elevator->elevator_data;
 	struct request *rq;
+#ifdef CONFIG_BFQ_GROUP_IOSCHED
+	struct bfq_queue *in_serv_queue, *bfqq;
+	bool waiting_rq, idle_timer_disabled;
+#endif
 
 	spin_lock_irq(&bfqd->lock);
 
+#ifdef CONFIG_BFQ_GROUP_IOSCHED
+	in_serv_queue = bfqd->in_service_queue;
+	waiting_rq = in_serv_queue && bfq_bfqq_wait_request(in_serv_queue);
+
+	rq = __bfq_dispatch_request(hctx);
+
+	idle_timer_disabled =
+		waiting_rq && !bfq_bfqq_wait_request(in_serv_queue);
+
+#else
 	rq = __bfq_dispatch_request(hctx);
-	if (rq && RQ_BFQQ(rq))
-		bfqg_stats_update_io_remove(bfqq_group(RQ_BFQQ(rq)),
-					    rq->cmd_flags);
+#endif
 	spin_unlock_irq(&bfqd->lock);
 
+#ifdef CONFIG_BFQ_GROUP_IOSCHED
+	bfqq = rq ? RQ_BFQQ(rq) : NULL;
+	if (!idle_timer_disabled && !bfqq)
+		return rq;
+
+	/*
+	 * rq and bfqq are guaranteed to exist until this function
+	 * ends, for the following reasons. First, rq can be
+	 * dispatched to the device, and then can be completed and
+	 * freed, only after this function ends. Second, rq cannot be
+	 * merged (and thus freed because of a merge) any longer,
+	 * because it has already started. Thus rq cannot be freed
+	 * before this function ends, and, since rq has a reference to
+	 * bfqq, the same guarantee holds for bfqq too.
+	 *
+	 * In addition, the following queue lock guarantees that
+	 * bfqq_group(bfqq) exists as well.
+	 */
+	spin_lock_irq(hctx->queue->queue_lock);
+	if (idle_timer_disabled)
+		/*
+		 * Since the idle timer has been disabled,
+		 * in_serv_queue contained some request when
+		 * __bfq_dispatch_request was invoked above, which
+		 * implies that rq was picked exactly from
+		 * in_serv_queue. Thus in_serv_queue == bfqq, and is
+		 * therefore guaranteed to exist because of the above
+		 * arguments.
+		 */
+		bfqg_stats_update_idle_time(bfqq_group(in_serv_queue));
+	if (bfqq) {
+		struct bfq_group *bfqg = bfqq_group(bfqq);
+
+		bfqg_stats_update_avg_queue_size(bfqg);
+		bfqg_stats_set_start_empty_time(bfqg);
+		bfqg_stats_update_io_remove(bfqg, rq->cmd_flags);
+	}
+	spin_unlock_irq(hctx->queue->queue_lock);
+#endif
+
 	return rq;
 }
 
@@ -4161,7 +4211,6 @@ static void bfq_rq_enqueued(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 		 */
 		bfq_clear_bfqq_wait_request(bfqq);
 		hrtimer_try_to_cancel(&bfqd->idle_slice_timer);
-		bfqg_stats_update_idle_time(bfqq_group(bfqq));
 
 		/*
 		 * The queue is not empty, because a new request just
@@ -4176,10 +4225,12 @@ static void bfq_rq_enqueued(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 	}
 }
 
-static void __bfq_insert_request(struct bfq_data *bfqd, struct request *rq)
+/* returns true if it causes the idle timer to be disabled */
+static bool __bfq_insert_request(struct bfq_data *bfqd, struct request *rq)
 {
 	struct bfq_queue *bfqq = RQ_BFQQ(rq),
 		*new_bfqq = bfq_setup_cooperator(bfqd, bfqq, rq, true);
+	bool waiting, idle_timer_disabled = false;
 
 	if (new_bfqq) {
 		if (bic_to_bfqq(RQ_BIC(rq), 1) != bfqq)
@@ -4213,12 +4264,16 @@ static void __bfq_insert_request(struct bfq_data *bfqd, struct request *rq)
 		bfqq = new_bfqq;
 	}
 
+	waiting = bfqq && bfq_bfqq_wait_request(bfqq);
 	bfq_add_request(rq);
+	idle_timer_disabled = waiting && !bfq_bfqq_wait_request(bfqq);
 
 	rq->fifo_time = ktime_get_ns() + bfqd->bfq_fifo_expire[rq_is_sync(rq)];
 	list_add_tail(&rq->queuelist, &bfqq->fifo);
 
 	bfq_rq_enqueued(bfqd, bfqq, rq);
+
+	return idle_timer_disabled;
 }
 
 static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
@@ -4226,7 +4281,11 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 {
 	struct request_queue *q = hctx->queue;
 	struct bfq_data *bfqd = q->elevator->elevator_data;
+#ifdef CONFIG_BFQ_GROUP_IOSCHED
 	struct bfq_queue *bfqq = RQ_BFQQ(rq);
+	bool idle_timer_disabled = false;
+	unsigned int cmd_flags;
+#endif
 
 	spin_lock_irq(&bfqd->lock);
 	if (blk_mq_sched_try_insert_merge(q, rq)) {
@@ -4245,13 +4304,17 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 		else
 			list_add_tail(&rq->queuelist, &bfqd->dispatch);
 	} else {
-		__bfq_insert_request(bfqd, rq);
+#ifdef CONFIG_BFQ_GROUP_IOSCHED
+		idle_timer_disabled = __bfq_insert_request(bfqd, rq);
 		/*
 		 * Update bfqq, because, if a queue merge has occurred
 		 * in __bfq_insert_request, then rq has been
 		 * redirected into a new queue.
 		 */
 		bfqq = RQ_BFQQ(rq);
+#else
+		__bfq_insert_request(bfqd, rq);
+#endif
 
 		if (rq_mergeable(rq)) {
 			elv_rqhash_add(q, rq);
@@ -4260,10 +4323,35 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 		}
 	}
 
-	if (bfqq)
-		bfqg_stats_update_io_add(bfqq_group(bfqq), bfqq, rq->cmd_flags);
-
+#ifdef CONFIG_BFQ_GROUP_IOSCHED
+	/*
+	 * Cache cmd_flags before releasing scheduler lock, because rq
+	 * may disappear afterwards (for example, because of a request
+	 * merge).
+	 */
+	cmd_flags = rq->cmd_flags;
+#endif
 	spin_unlock_irq(&bfqd->lock);
+
+#ifdef CONFIG_BFQ_GROUP_IOSCHED
+	if (!bfqq)
+		return;
+	/*
+	 * bfqq still exists, because it can disappear only after
+	 * either it is merged with another queue, or the process it
+	 * is associated with exits. But both actions must be taken by
+	 * the same process currently executing this flow of
+	 * instruction.
+	 *
+	 * In addition, the following queue lock guarantees that
+	 * bfqq_group(bfqq) exists as well.
+	 */
+	spin_lock_irq(q->queue_lock);
+	bfqg_stats_update_io_add(bfqq_group(bfqq), bfqq, cmd_flags);
+	if (idle_timer_disabled)
+		bfqg_stats_update_idle_time(bfqq_group(bfqq));
+	spin_unlock_irq(q->queue_lock);
+#endif
 }
 
 static void bfq_insert_requests(struct blk_mq_hw_ctx *hctx,
diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c
index 414ba68..e495d3f 100644
--- a/block/bfq-wf2q.c
+++ b/block/bfq-wf2q.c
@@ -843,7 +843,6 @@ void bfq_bfqq_served(struct bfq_queue *bfqq, int served)
 		st->vtime += bfq_delta(served, st->wsum);
 		bfq_forget_idle(st);
 	}
-	bfqg_stats_set_start_empty_time(bfqq_group(bfqq));
 	bfq_log_bfqq(bfqq->bfqd, bfqq, "bfqq_served %d secs", served);
 }
 
-- 
2.10.0


* [PATCH BUGFIX/IMPROVEMENT 4/4] block, bfq: move debug blkio stats behind CONFIG_DEBUG_BLK_CGROUP
  2017-11-13  6:34 [PATCH IMPROVEMENT/BUGFIX 0/4] block, bfq: increase sustainable IOPS and fix a bug Paolo Valente
                   ` (2 preceding siblings ...)
  2017-11-13  6:34 ` [PATCH BUGFIX/IMPROVEMENT 3/4] block, bfq: update blkio stats outside the scheduler lock Paolo Valente
@ 2017-11-13  6:34 ` Paolo Valente
  2017-11-13 17:53 ` [PATCH IMPROVEMENT/BUGFIX 0/4] block, bfq: increase sustainable IOPS and fix a bug Jens Axboe
  4 siblings, 0 replies; 6+ messages in thread
From: Paolo Valente @ 2017-11-13  6:34 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-kernel, ulf.hansson, broonie, linus.walleij,
	lee.tibbert, oleksandr, lucmiccio, bfq-iosched, Paolo Valente

From: Luca Miccio <lucmiccio@gmail.com>

BFQ currently creates, and updates, its own instance of the whole
set of blkio statistics that cfq creates. Yet, from the comments
of Tejun Heo in [1], it turned out that most of these statistics
are meant/useful only for debugging. This commit makes BFQ create
the latter, debugging statistics only if the option
CONFIG_DEBUG_BLK_CGROUP is set.

By doing so, this commit also enables BFQ to enjoy a high performance
boost. The reason is that, if CONFIG_DEBUG_BLK_CGROUP is not set, then
BFQ has to update far fewer statistics, and, in particular, not the
heaviest to update.  To give an idea of the benefits, if
CONFIG_DEBUG_BLK_CGROUP is not set, then, on an Intel i7-4850HQ, and
with 8 threads doing random I/O in parallel on null_blk (configured
with 0 latency), the throughput of BFQ grows from 310 to 400 KIOPS
(+30%). We have measured similar or even much higher boosts with other
CPUs: e.g., +45% with an ARM CortexTM-A53 Octa-core. Our results have
been obtained and can be reproduced very easily with the script in [1].

[1] https://www.spinics.net/lists/linux-block/msg18943.html

Suggested-by: Tejun Heo <tj@kernel.org>
Suggested-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Lee Tibbert <lee.tibbert@gmail.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Luca Miccio <lucmiccio@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
---
 Documentation/block/bfq-iosched.txt |  38 +++++++--
 block/bfq-cgroup.c                  | 148 ++++++++++++++++++++----------------
 block/bfq-iosched.c                 |  14 ++--
 block/bfq-iosched.h                 |   4 +-
 4 files changed, 125 insertions(+), 79 deletions(-)

diff --git a/Documentation/block/bfq-iosched.txt b/Documentation/block/bfq-iosched.txt
index 7fad6c0..8d8d8f0 100644
--- a/Documentation/block/bfq-iosched.txt
+++ b/Documentation/block/bfq-iosched.txt
@@ -20,12 +20,22 @@ for that device, by setting low_latency to 0. See Section 3 for
 details on how to configure BFQ for the desired tradeoff between
 latency and throughput, or on how to maximize throughput.
 
-BFQ has a non-null overhead, which limits the maximum IOPS that the
-CPU can process for a device scheduled with BFQ. To give an idea of
-the limits on slow or average CPUs, here are BFQ limits for three
-different CPUs, on, respectively, an average laptop, an old desktop,
-and a cheap embedded system, in case full hierarchical support is
-enabled (i.e., CONFIG_BFQ_GROUP_IOSCHED is set):
+BFQ has a non-null overhead, which limits the maximum IOPS that a CPU
+can process for a device scheduled with BFQ. To give an idea of the
+limits on slow or average CPUs, here are, first, the limits of BFQ for
+three different CPUs, on, respectively, an average laptop, an old
+desktop, and a cheap embedded system, in case full hierarchical
+support is enabled (i.e., CONFIG_BFQ_GROUP_IOSCHED is set), but
+CONFIG_DEBUG_BLK_CGROUP is not set (Section 4-2):
+- Intel i7-4850HQ: 400 KIOPS
+- AMD A8-3850: 250 KIOPS
+- ARM CortexTM-A53 Octa-core: 80 KIOPS
+
+If CONFIG_DEBUG_BLK_CGROUP is set (and of course full hierarchical
+support is enabled), then the sustainable throughput with BFQ
+decreases, because all blkio.bfq* statistics are created and updated
+(Section 4-2). For BFQ, this leads to the following maximum
+sustainable throughputs, on the same systems as above:
 - Intel i7-4850HQ: 310 KIOPS
 - AMD A8-3850: 200 KIOPS
 - ARM CortexTM-A53 Octa-core: 56 KIOPS
@@ -505,6 +515,22 @@ BFQ-specific files is "blkio.bfq." or "io.bfq." For example, the group
 parameter to set the weight of a group with BFQ is blkio.bfq.weight
 or io.bfq.weight.
 
+As for cgroups-v1 (blkio controller), the exact set of stat files
+created, and kept up-to-date by bfq, depends on whether
+CONFIG_DEBUG_BLK_CGROUP is set. If it is set, then bfq creates all
+the stat files documented in
+Documentation/cgroup-v1/blkio-controller.txt. If, instead,
+CONFIG_DEBUG_BLK_CGROUP is not set, then bfq creates only the files
+blkio.bfq.io_service_bytes
+blkio.bfq.io_service_bytes_recursive
+blkio.bfq.io_serviced
+blkio.bfq.io_serviced_recursive
+
+The value of CONFIG_DEBUG_BLK_CGROUP greatly influences the maximum
+throughput sustainable with bfq, because updating the blkio.bfq.*
+stats is rather costly, especially for some of the stats enabled by
+CONFIG_DEBUG_BLK_CGROUP.
+
 Parameters to set
 -----------------
 
diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index ceefb9a..da1525e 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -24,7 +24,7 @@
 
 #include "bfq-iosched.h"
 
-#ifdef CONFIG_BFQ_GROUP_IOSCHED
+#if defined(CONFIG_BFQ_GROUP_IOSCHED) &&  defined(CONFIG_DEBUG_BLK_CGROUP)
 
 /* bfqg stats flags */
 enum bfqg_stats_flags {
@@ -152,6 +152,57 @@ void bfqg_stats_update_avg_queue_size(struct bfq_group *bfqg)
 	bfqg_stats_update_group_wait_time(stats);
 }
 
+void bfqg_stats_update_io_add(struct bfq_group *bfqg, struct bfq_queue *bfqq,
+			      unsigned int op)
+{
+	blkg_rwstat_add(&bfqg->stats.queued, op, 1);
+	bfqg_stats_end_empty_time(&bfqg->stats);
+	if (!(bfqq == ((struct bfq_data *)bfqg->bfqd)->in_service_queue))
+		bfqg_stats_set_start_group_wait_time(bfqg, bfqq_group(bfqq));
+}
+
+void bfqg_stats_update_io_remove(struct bfq_group *bfqg, unsigned int op)
+{
+	blkg_rwstat_add(&bfqg->stats.queued, op, -1);
+}
+
+void bfqg_stats_update_io_merged(struct bfq_group *bfqg, unsigned int op)
+{
+	blkg_rwstat_add(&bfqg->stats.merged, op, 1);
+}
+
+void bfqg_stats_update_completion(struct bfq_group *bfqg, uint64_t start_time,
+				  uint64_t io_start_time, unsigned int op)
+{
+	struct bfqg_stats *stats = &bfqg->stats;
+	unsigned long long now = sched_clock();
+
+	if (time_after64(now, io_start_time))
+		blkg_rwstat_add(&stats->service_time, op,
+				now - io_start_time);
+	if (time_after64(io_start_time, start_time))
+		blkg_rwstat_add(&stats->wait_time, op,
+				io_start_time - start_time);
+}
+
+#else /* CONFIG_BFQ_GROUP_IOSCHED && CONFIG_DEBUG_BLK_CGROUP */
+
+void bfqg_stats_update_io_add(struct bfq_group *bfqg, struct bfq_queue *bfqq,
+			      unsigned int op) { }
+void bfqg_stats_update_io_remove(struct bfq_group *bfqg, unsigned int op) { }
+void bfqg_stats_update_io_merged(struct bfq_group *bfqg, unsigned int op) { }
+void bfqg_stats_update_completion(struct bfq_group *bfqg, uint64_t start_time,
+				  uint64_t io_start_time, unsigned int op) { }
+void bfqg_stats_update_dequeue(struct bfq_group *bfqg) { }
+void bfqg_stats_set_start_empty_time(struct bfq_group *bfqg) { }
+void bfqg_stats_update_idle_time(struct bfq_group *bfqg) { }
+void bfqg_stats_set_start_idle_time(struct bfq_group *bfqg) { }
+void bfqg_stats_update_avg_queue_size(struct bfq_group *bfqg) { }
+
+#endif /* CONFIG_BFQ_GROUP_IOSCHED && CONFIG_DEBUG_BLK_CGROUP */
+
+#ifdef CONFIG_BFQ_GROUP_IOSCHED
+
 /*
  * blk-cgroup policy-related handlers
  * The following functions help in converting between blk-cgroup
@@ -229,42 +280,10 @@ void bfqg_and_blkg_put(struct bfq_group *bfqg)
 	blkg_put(bfqg_to_blkg(bfqg));
 }
 
-void bfqg_stats_update_io_add(struct bfq_group *bfqg, struct bfq_queue *bfqq,
-			      unsigned int op)
-{
-	blkg_rwstat_add(&bfqg->stats.queued, op, 1);
-	bfqg_stats_end_empty_time(&bfqg->stats);
-	if (!(bfqq == ((struct bfq_data *)bfqg->bfqd)->in_service_queue))
-		bfqg_stats_set_start_group_wait_time(bfqg, bfqq_group(bfqq));
-}
-
-void bfqg_stats_update_io_remove(struct bfq_group *bfqg, unsigned int op)
-{
-	blkg_rwstat_add(&bfqg->stats.queued, op, -1);
-}
-
-void bfqg_stats_update_io_merged(struct bfq_group *bfqg, unsigned int op)
-{
-	blkg_rwstat_add(&bfqg->stats.merged, op, 1);
-}
-
-void bfqg_stats_update_completion(struct bfq_group *bfqg, uint64_t start_time,
-				  uint64_t io_start_time, unsigned int op)
-{
-	struct bfqg_stats *stats = &bfqg->stats;
-	unsigned long long now = sched_clock();
-
-	if (time_after64(now, io_start_time))
-		blkg_rwstat_add(&stats->service_time, op,
-				now - io_start_time);
-	if (time_after64(io_start_time, start_time))
-		blkg_rwstat_add(&stats->wait_time, op,
-				io_start_time - start_time);
-}
-
 /* @stats = 0 */
 static void bfqg_stats_reset(struct bfqg_stats *stats)
 {
+#ifdef CONFIG_DEBUG_BLK_CGROUP
 	/* queued stats shouldn't be cleared */
 	blkg_rwstat_reset(&stats->merged);
 	blkg_rwstat_reset(&stats->service_time);
@@ -276,6 +295,7 @@ static void bfqg_stats_reset(struct bfqg_stats *stats)
 	blkg_stat_reset(&stats->group_wait_time);
 	blkg_stat_reset(&stats->idle_time);
 	blkg_stat_reset(&stats->empty_time);
+#endif
 }
 
 /* @to += @from */
@@ -284,6 +304,7 @@ static void bfqg_stats_add_aux(struct bfqg_stats *to, struct bfqg_stats *from)
 	if (!to || !from)
 		return;
 
+#ifdef CONFIG_DEBUG_BLK_CGROUP
 	/* queued stats shouldn't be cleared */
 	blkg_rwstat_add_aux(&to->merged, &from->merged);
 	blkg_rwstat_add_aux(&to->service_time, &from->service_time);
@@ -296,6 +317,7 @@ static void bfqg_stats_add_aux(struct bfqg_stats *to, struct bfqg_stats *from)
 	blkg_stat_add_aux(&to->group_wait_time, &from->group_wait_time);
 	blkg_stat_add_aux(&to->idle_time, &from->idle_time);
 	blkg_stat_add_aux(&to->empty_time, &from->empty_time);
+#endif
 }
 
 /*
@@ -342,6 +364,7 @@ void bfq_init_entity(struct bfq_entity *entity, struct bfq_group *bfqg)
 
 static void bfqg_stats_exit(struct bfqg_stats *stats)
 {
+#ifdef CONFIG_DEBUG_BLK_CGROUP
 	blkg_rwstat_exit(&stats->merged);
 	blkg_rwstat_exit(&stats->service_time);
 	blkg_rwstat_exit(&stats->wait_time);
@@ -353,10 +376,12 @@ static void bfqg_stats_exit(struct bfqg_stats *stats)
 	blkg_stat_exit(&stats->group_wait_time);
 	blkg_stat_exit(&stats->idle_time);
 	blkg_stat_exit(&stats->empty_time);
+#endif
 }
 
 static int bfqg_stats_init(struct bfqg_stats *stats, gfp_t gfp)
 {
+#ifdef CONFIG_DEBUG_BLK_CGROUP
 	if (blkg_rwstat_init(&stats->merged, gfp) ||
 	    blkg_rwstat_init(&stats->service_time, gfp) ||
 	    blkg_rwstat_init(&stats->wait_time, gfp) ||
@@ -371,6 +396,7 @@ static int bfqg_stats_init(struct bfqg_stats *stats, gfp_t gfp)
 		bfqg_stats_exit(stats);
 		return -ENOMEM;
 	}
+#endif
 
 	return 0;
 }
@@ -887,6 +913,7 @@ static ssize_t bfq_io_set_weight(struct kernfs_open_file *of,
 	return bfq_io_set_weight_legacy(of_css(of), NULL, weight);
 }
 
+#ifdef CONFIG_DEBUG_BLK_CGROUP
 static int bfqg_print_stat(struct seq_file *sf, void *v)
 {
 	blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)), blkg_prfill_stat,
@@ -991,6 +1018,7 @@ static int bfqg_print_avg_queue_size(struct seq_file *sf, void *v)
 			  0, false);
 	return 0;
 }
+#endif /* CONFIG_DEBUG_BLK_CGROUP */
 
 struct bfq_group *bfq_create_group_hierarchy(struct bfq_data *bfqd, int node)
 {
@@ -1029,15 +1057,6 @@ struct cftype bfq_blkcg_legacy_files[] = {
 
 	/* statistics, covers only the tasks in the bfqg */
 	{
-		.name = "bfq.time",
-		.private = offsetof(struct bfq_group, stats.time),
-		.seq_show = bfqg_print_stat,
-	},
-	{
-		.name = "bfq.sectors",
-		.seq_show = bfqg_print_stat_sectors,
-	},
-	{
 		.name = "bfq.io_service_bytes",
 		.private = (unsigned long)&blkcg_policy_bfq,
 		.seq_show = blkg_print_stat_bytes,
@@ -1047,6 +1066,16 @@ struct cftype bfq_blkcg_legacy_files[] = {
 		.private = (unsigned long)&blkcg_policy_bfq,
 		.seq_show = blkg_print_stat_ios,
 	},
+#ifdef CONFIG_DEBUG_BLK_CGROUP
+	{
+		.name = "bfq.time",
+		.private = offsetof(struct bfq_group, stats.time),
+		.seq_show = bfqg_print_stat,
+	},
+	{
+		.name = "bfq.sectors",
+		.seq_show = bfqg_print_stat_sectors,
+	},
 	{
 		.name = "bfq.io_service_time",
 		.private = offsetof(struct bfq_group, stats.service_time),
@@ -1067,18 +1096,10 @@ struct cftype bfq_blkcg_legacy_files[] = {
 		.private = offsetof(struct bfq_group, stats.queued),
 		.seq_show = bfqg_print_rwstat,
 	},
+#endif /* CONFIG_DEBUG_BLK_CGROUP */
 
 	/* the same statictics which cover the bfqg and its descendants */
 	{
-		.name = "bfq.time_recursive",
-		.private = offsetof(struct bfq_group, stats.time),
-		.seq_show = bfqg_print_stat_recursive,
-	},
-	{
-		.name = "bfq.sectors_recursive",
-		.seq_show = bfqg_print_stat_sectors_recursive,
-	},
-	{
 		.name = "bfq.io_service_bytes_recursive",
 		.private = (unsigned long)&blkcg_policy_bfq,
 		.seq_show = blkg_print_stat_bytes_recursive,
@@ -1088,6 +1109,16 @@ struct cftype bfq_blkcg_legacy_files[] = {
 		.private = (unsigned long)&blkcg_policy_bfq,
 		.seq_show = blkg_print_stat_ios_recursive,
 	},
+#ifdef CONFIG_DEBUG_BLK_CGROUP
+	{
+		.name = "bfq.time_recursive",
+		.private = offsetof(struct bfq_group, stats.time),
+		.seq_show = bfqg_print_stat_recursive,
+	},
+	{
+		.name = "bfq.sectors_recursive",
+		.seq_show = bfqg_print_stat_sectors_recursive,
+	},
 	{
 		.name = "bfq.io_service_time_recursive",
 		.private = offsetof(struct bfq_group, stats.service_time),
@@ -1132,6 +1163,7 @@ struct cftype bfq_blkcg_legacy_files[] = {
 		.private = offsetof(struct bfq_group, stats.dequeue),
 		.seq_show = bfqg_print_stat,
 	},
+#endif	/* CONFIG_DEBUG_BLK_CGROUP */
 	{ }	/* terminate */
 };
 
@@ -1147,18 +1179,6 @@ struct cftype bfq_blkg_files[] = {
 
 #else	/* CONFIG_BFQ_GROUP_IOSCHED */
 
-void bfqg_stats_update_io_add(struct bfq_group *bfqg, struct bfq_queue *bfqq,
-			      unsigned int op) { }
-void bfqg_stats_update_io_remove(struct bfq_group *bfqg, unsigned int op) { }
-void bfqg_stats_update_io_merged(struct bfq_group *bfqg, unsigned int op) { }
-void bfqg_stats_update_completion(struct bfq_group *bfqg, uint64_t start_time,
-				  uint64_t io_start_time, unsigned int op) { }
-void bfqg_stats_update_dequeue(struct bfq_group *bfqg) { }
-void bfqg_stats_set_start_empty_time(struct bfq_group *bfqg) { }
-void bfqg_stats_update_idle_time(struct bfq_group *bfqg) { }
-void bfqg_stats_set_start_idle_time(struct bfq_group *bfqg) { }
-void bfqg_stats_update_avg_queue_size(struct bfq_group *bfqg) { }
-
 void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 		   struct bfq_group *bfqg) {}
 
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 69e05f8..bcb6d21 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -3693,14 +3693,14 @@ static struct request *bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
 {
 	struct bfq_data *bfqd = hctx->queue->elevator->elevator_data;
 	struct request *rq;
-#ifdef CONFIG_BFQ_GROUP_IOSCHED
+#if defined(CONFIG_BFQ_GROUP_IOSCHED) && defined(CONFIG_DEBUG_BLK_CGROUP)
 	struct bfq_queue *in_serv_queue, *bfqq;
 	bool waiting_rq, idle_timer_disabled;
 #endif
 
 	spin_lock_irq(&bfqd->lock);
 
-#ifdef CONFIG_BFQ_GROUP_IOSCHED
+#if defined(CONFIG_BFQ_GROUP_IOSCHED) && defined(CONFIG_DEBUG_BLK_CGROUP)
 	in_serv_queue = bfqd->in_service_queue;
 	waiting_rq = in_serv_queue && bfq_bfqq_wait_request(in_serv_queue);
 
@@ -3714,7 +3714,7 @@ static struct request *bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
 #endif
 	spin_unlock_irq(&bfqd->lock);
 
-#ifdef CONFIG_BFQ_GROUP_IOSCHED
+#if defined(CONFIG_BFQ_GROUP_IOSCHED) && defined(CONFIG_DEBUG_BLK_CGROUP)
 	bfqq = rq ? RQ_BFQQ(rq) : NULL;
 	if (!idle_timer_disabled && !bfqq)
 		return rq;
@@ -4281,7 +4281,7 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 {
 	struct request_queue *q = hctx->queue;
 	struct bfq_data *bfqd = q->elevator->elevator_data;
-#ifdef CONFIG_BFQ_GROUP_IOSCHED
+#if defined(CONFIG_BFQ_GROUP_IOSCHED) && defined(CONFIG_DEBUG_BLK_CGROUP)
 	struct bfq_queue *bfqq = RQ_BFQQ(rq);
 	bool idle_timer_disabled = false;
 	unsigned int cmd_flags;
@@ -4304,7 +4304,7 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 		else
 			list_add_tail(&rq->queuelist, &bfqd->dispatch);
 	} else {
-#ifdef CONFIG_BFQ_GROUP_IOSCHED
+#if defined(CONFIG_BFQ_GROUP_IOSCHED) && defined(CONFIG_DEBUG_BLK_CGROUP)
 		idle_timer_disabled = __bfq_insert_request(bfqd, rq);
 		/*
 		 * Update bfqq, because, if a queue merge has occurred
@@ -4323,7 +4323,7 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 		}
 	}
 
-#ifdef CONFIG_BFQ_GROUP_IOSCHED
+#if defined(CONFIG_BFQ_GROUP_IOSCHED) && defined(CONFIG_DEBUG_BLK_CGROUP)
 	/*
 	 * Cache cmd_flags before releasing scheduler lock, because rq
 	 * may disappear afterwards (for example, because of a request
@@ -4333,7 +4333,7 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 #endif
 	spin_unlock_irq(&bfqd->lock);
 
-#ifdef CONFIG_BFQ_GROUP_IOSCHED
+#if defined(CONFIG_BFQ_GROUP_IOSCHED) && defined(CONFIG_DEBUG_BLK_CGROUP)
 	if (!bfqq)
 		return;
 	/*
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index ac0809c..91c4390 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -689,7 +689,7 @@ enum bfqq_expiration {
 };
 
 struct bfqg_stats {
-#ifdef CONFIG_BFQ_GROUP_IOSCHED
+#if defined(CONFIG_BFQ_GROUP_IOSCHED) && defined(CONFIG_DEBUG_BLK_CGROUP)
 	/* number of ios merged */
 	struct blkg_rwstat		merged;
 	/* total time spent on device in ns, may not be accurate w/ queueing */
@@ -717,7 +717,7 @@ struct bfqg_stats {
 	uint64_t			start_idle_time;
 	uint64_t			start_empty_time;
 	uint16_t			flags;
-#endif	/* CONFIG_BFQ_GROUP_IOSCHED */
+#endif	/* CONFIG_BFQ_GROUP_IOSCHED && CONFIG_DEBUG_BLK_CGROUP */
 };
 
 #ifdef CONFIG_BFQ_GROUP_IOSCHED
-- 
2.10.0


* Re: [PATCH IMPROVEMENT/BUGFIX 0/4] block, bfq: increase sustainable IOPS and fix a bug
  2017-11-13  6:34 [PATCH IMPROVEMENT/BUGFIX 0/4] block, bfq: increase sustainable IOPS and fix a bug Paolo Valente
                   ` (3 preceding siblings ...)
  2017-11-13  6:34 ` [PATCH BUGFIX/IMPROVEMENT 4/4] block, bfq: move debug blkio stats behind CONFIG_DEBUG_BLK_CGROUP Paolo Valente
@ 2017-11-13 17:53 ` Jens Axboe
  4 siblings, 0 replies; 6+ messages in thread
From: Jens Axboe @ 2017-11-13 17:53 UTC (permalink / raw)
  To: Paolo Valente
  Cc: linux-block, linux-kernel, ulf.hansson, broonie, linus.walleij,
	lee.tibbert, oleksandr, lucmiccio, bfq-iosched

On 11/12/2017 11:34 PM, Paolo Valente wrote:
> [...]
> 
> We waited for one week of testing from bfq users before submitting
> these patches. We hope we are still in time for having these
> improvements and fixes considered for 4.15.

Usually I'd say it's too late, but I knew this was coming. I'll get
this queued up.

-- 
Jens Axboe
