* [RFC PATCH 0/4] blk-mq: multiqueue I/O scheduler
@ 2017-03-17 22:03 Omar Sandoval
  2017-03-17 22:03 ` [RFC PATCH 1/4] sbitmap: add sbitmap_get_shallow() operation Omar Sandoval
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: Omar Sandoval @ 2017-03-17 22:03 UTC (permalink / raw)
  To: linux-block; +Cc: kernel-team

From: Omar Sandoval <osandov@fb.com>

This patch series implements an I/O scheduler for multiqueue devices
combining several techniques: the scalable bitmap library, the new
blk-stats API, and queue depth throttling similar to blk-wbt.

These patches are on top of my earlier blk-stats series [1]. They also
need a fix in Jens' for-linus branch in order to work properly [2].

Patches 1 and 2 implement a new sbitmap operation and patch 3 exports a
required function. Patch 4 implements the new scheduler, named Kyber.

The commit message in patch 4 describes Kyber in detail. The scheduler
employs some heuristics that I've experimented with, but that's probably
the area that needs the most work. I'll be at LSF/MM next week, and
there's currently a lightning talk on the schedule to discuss
improvements and extensions.

A quick test case that demonstrates the scheduler in action:

---
[global]
filename=/dev/nvme0n1
direct=1
runtime=10s
time_based
group_reporting

[writers]
numjobs=40
ioengine=libaio
iodepth=1024
rw=randwrite

[reader]
new_group
rate_iops=1000
ioengine=sync
rw=randread
---

With Kyber set to a target latency of 1 ms, the reader sees latencies of 8 ms
on average. Kyber brings this down to just over 1 ms.

Thanks!

1: http://marc.info/?l=linux-block&m=148952547205774&w=2
2: http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=efd4b81abbe1ac753717f2f10cd3dab8bed6c103

Omar Sandoval (4):
  sbitmap: add sbitmap_get_shallow() operation
  blk-mq: add shallow depth option for blk_mq_get_tag()
  blk-mq: export blk_mq_finish_request()
  blk-mq: introduce Kyber multiqueue I/O scheduler

 block/Kconfig.iosched   |   8 +
 block/Makefile          |   1 +
 block/blk-mq-tag.c      |   5 +-
 block/blk-mq.c          |   1 +
 block/blk-mq.h          |   1 +
 block/elevator.c        |   9 +-
 block/kyber-iosched.c   | 586 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/sbitmap.h |  55 +++++
 lib/sbitmap.c           |  75 ++++++-
 9 files changed, 729 insertions(+), 12 deletions(-)
 create mode 100644 block/kyber-iosched.c

-- 
2.12.0


* [RFC PATCH 1/4] sbitmap: add sbitmap_get_shallow() operation
  2017-03-17 22:03 [RFC PATCH 0/4] blk-mq: multiqueue I/O scheduler Omar Sandoval
@ 2017-03-17 22:03 ` Omar Sandoval
  2017-03-17 22:03 ` [RFC PATCH 2/4] blk-mq: add shallow depth option for blk_mq_get_tag() Omar Sandoval
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Omar Sandoval @ 2017-03-17 22:03 UTC (permalink / raw)
  To: linux-block; +Cc: kernel-team

From: Omar Sandoval <osandov@fb.com>

This operation supports the use case of limiting the number of bits that
can be allocated for a given operation. Rather than setting aside some
bits at the end of the bitmap, we can set aside bits in each word of the
bitmap. This means we can keep the allocation hints spread out and
support sbitmap_resize() nicely at the cost of lower granularity for the
allowed depth.
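
As a rough usage sketch (the helper below is hypothetical and not part
of this patch), a low-priority user could be restricted to half of each
word while a high-priority user keeps the full depth:

#include <linux/sbitmap.h>

/* Hypothetical helper: 'high_prio' is chosen by the caller. */
static int example_get_bit(struct sbitmap *sb, bool high_prio)
{
	/* Each word holds 1 << sb->shift bits; allow half of each word. */
	unsigned long shallow_depth = 1UL << (sb->shift - 1);

	if (high_prio)
		return sbitmap_get(sb, 0, false);
	return sbitmap_get_shallow(sb, 0, shallow_depth);
}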

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 include/linux/sbitmap.h | 55 ++++++++++++++++++++++++++++++++++++
 lib/sbitmap.c           | 75 ++++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 123 insertions(+), 7 deletions(-)

diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h
index d4e0a204c118..a1904aadbc45 100644
--- a/include/linux/sbitmap.h
+++ b/include/linux/sbitmap.h
@@ -176,6 +176,25 @@ void sbitmap_resize(struct sbitmap *sb, unsigned int depth);
 int sbitmap_get(struct sbitmap *sb, unsigned int alloc_hint, bool round_robin);
 
 /**
+ * sbitmap_get_shallow() - Try to allocate a free bit from a &struct sbitmap,
+ * limiting the depth used from each word.
+ * @sb: Bitmap to allocate from.
+ * @alloc_hint: Hint for where to start searching for a free bit.
+ * @shallow_depth: The maximum number of bits to allocate from a single word.
+ *
+ * This rather specific operation allows for having multiple users with
+ * different allocation limits. E.g., there can be a high-priority class that
+ * uses sbitmap_get() and a low-priority class that uses sbitmap_get_shallow()
+ * with a @shallow_depth of (1 << (@sb->shift - 1)). Then, the low-priority
+ * class can only allocate half of the total bits in the bitmap, preventing it
+ * from starving out the high-priority class.
+ *
+ * Return: Non-negative allocated bit number if successful, -1 otherwise.
+ */
+int sbitmap_get_shallow(struct sbitmap *sb, unsigned int alloc_hint,
+			unsigned long shallow_depth);
+
+/**
  * sbitmap_any_bit_set() - Check for a set bit in a &struct sbitmap.
  * @sb: Bitmap to check.
  *
@@ -326,6 +345,19 @@ void sbitmap_queue_resize(struct sbitmap_queue *sbq, unsigned int depth);
 int __sbitmap_queue_get(struct sbitmap_queue *sbq);
 
 /**
+ * __sbitmap_queue_get_shallow() - Try to allocate a free bit from a &struct
+ * sbitmap_queue, limiting the depth used from each word, with preemption
+ * already disabled.
+ * @sbq: Bitmap queue to allocate from.
+ * @shallow_depth: The maximum number of bits to allocate from a single word.
+ * See sbitmap_get_shallow().
+ *
+ * Return: Non-negative allocated bit number if successful, -1 otherwise.
+ */
+int __sbitmap_queue_get_shallow(struct sbitmap_queue *sbq,
+				unsigned int shallow_depth);
+
+/**
  * sbitmap_queue_get() - Try to allocate a free bit from a &struct
  * sbitmap_queue.
  * @sbq: Bitmap queue to allocate from.
@@ -346,6 +378,29 @@ static inline int sbitmap_queue_get(struct sbitmap_queue *sbq,
 }
 
 /**
+ * sbitmap_queue_get_shallow() - Try to allocate a free bit from a &struct
+ * sbitmap_queue, limiting the depth used from each word.
+ * @sbq: Bitmap queue to allocate from.
+ * @cpu: Output parameter; will contain the CPU we ran on (e.g., to be passed to
+ *       sbitmap_queue_clear()).
+ * @shallow_depth: The maximum number of bits to allocate from a single word.
+ * See sbitmap_get_shallow().
+ *
+ * Return: Non-negative allocated bit number if successful, -1 otherwise.
+ */
+static inline int sbitmap_queue_get_shallow(struct sbitmap_queue *sbq,
+					    unsigned int *cpu,
+					    unsigned int shallow_depth)
+{
+	int nr;
+
+	*cpu = get_cpu();
+	nr = __sbitmap_queue_get_shallow(sbq, shallow_depth);
+	put_cpu();
+	return nr;
+}
+
+/**
  * sbitmap_queue_clear() - Free an allocated bit and wake up waiters on a
  * &struct sbitmap_queue.
  * @sbq: Bitmap to free from.
diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index 60e800e0b5a0..80aa8d5463fa 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -79,15 +79,15 @@ void sbitmap_resize(struct sbitmap *sb, unsigned int depth)
 }
 EXPORT_SYMBOL_GPL(sbitmap_resize);
 
-static int __sbitmap_get_word(struct sbitmap_word *word, unsigned int hint,
-			      bool wrap)
+static int __sbitmap_get_word(unsigned long *word, unsigned long depth,
+			      unsigned int hint, bool wrap)
 {
 	unsigned int orig_hint = hint;
 	int nr;
 
 	while (1) {
-		nr = find_next_zero_bit(&word->word, word->depth, hint);
-		if (unlikely(nr >= word->depth)) {
+		nr = find_next_zero_bit(word, depth, hint);
+		if (unlikely(nr >= depth)) {
 			/*
 			 * We started with an offset, and we didn't reset the
 			 * offset to 0 in a failure case, so start from 0 to
@@ -100,11 +100,11 @@ static int __sbitmap_get_word(struct sbitmap_word *word, unsigned int hint,
 			return -1;
 		}
 
-		if (!test_and_set_bit(nr, &word->word))
+		if (!test_and_set_bit(nr, word))
 			break;
 
 		hint = nr + 1;
-		if (hint >= word->depth - 1)
+		if (hint >= depth - 1)
 			hint = 0;
 	}
 
@@ -119,7 +119,8 @@ int sbitmap_get(struct sbitmap *sb, unsigned int alloc_hint, bool round_robin)
 	index = SB_NR_TO_INDEX(sb, alloc_hint);
 
 	for (i = 0; i < sb->map_nr; i++) {
-		nr = __sbitmap_get_word(&sb->map[index],
+		nr = __sbitmap_get_word(&sb->map[index].word,
+					sb->map[index].depth,
 					SB_NR_TO_BIT(sb, alloc_hint),
 					!round_robin);
 		if (nr != -1) {
@@ -141,6 +142,37 @@ int sbitmap_get(struct sbitmap *sb, unsigned int alloc_hint, bool round_robin)
 }
 EXPORT_SYMBOL_GPL(sbitmap_get);
 
+int sbitmap_get_shallow(struct sbitmap *sb, unsigned int alloc_hint,
+			unsigned long shallow_depth)
+{
+	unsigned int i, index;
+	int nr = -1;
+
+	index = SB_NR_TO_INDEX(sb, alloc_hint);
+
+	for (i = 0; i < sb->map_nr; i++) {
+		nr = __sbitmap_get_word(&sb->map[index].word,
+					min(sb->map[index].depth, shallow_depth),
+					SB_NR_TO_BIT(sb, alloc_hint), true);
+		if (nr != -1) {
+			nr += index << sb->shift;
+			break;
+		}
+
+		/* Jump to next index. */
+		index++;
+		alloc_hint = index << sb->shift;
+
+		if (index >= sb->map_nr) {
+			index = 0;
+			alloc_hint = 0;
+		}
+	}
+
+	return nr;
+}
+EXPORT_SYMBOL_GPL(sbitmap_get_shallow);
+
 bool sbitmap_any_bit_set(const struct sbitmap *sb)
 {
 	unsigned int i;
@@ -342,6 +374,35 @@ int __sbitmap_queue_get(struct sbitmap_queue *sbq)
 }
 EXPORT_SYMBOL_GPL(__sbitmap_queue_get);
 
+int __sbitmap_queue_get_shallow(struct sbitmap_queue *sbq,
+				unsigned int shallow_depth)
+{
+	unsigned int hint, depth;
+	int nr;
+
+	hint = this_cpu_read(*sbq->alloc_hint);
+	depth = READ_ONCE(sbq->sb.depth);
+	if (unlikely(hint >= depth)) {
+		hint = depth ? prandom_u32() % depth : 0;
+		this_cpu_write(*sbq->alloc_hint, hint);
+	}
+	nr = sbitmap_get_shallow(&sbq->sb, hint, shallow_depth);
+
+	if (nr == -1) {
+		/* If the map is full, a hint won't do us much good. */
+		this_cpu_write(*sbq->alloc_hint, 0);
+	} else if (nr == hint || unlikely(sbq->round_robin)) {
+		/* Only update the hint if we used it. */
+		hint = nr + 1;
+		if (hint >= depth - 1)
+			hint = 0;
+		this_cpu_write(*sbq->alloc_hint, hint);
+	}
+
+	return nr;
+}
+EXPORT_SYMBOL_GPL(__sbitmap_queue_get_shallow);
+
 static struct sbq_wait_state *sbq_wake_ptr(struct sbitmap_queue *sbq)
 {
 	int i, wake_index;
-- 
2.12.0


* [RFC PATCH 2/4] blk-mq: add shallow depth option for blk_mq_get_tag()
  2017-03-17 22:03 [RFC PATCH 0/4] blk-mq: multiqueue I/O scheduler Omar Sandoval
  2017-03-17 22:03 ` [RFC PATCH 1/4] sbitmap: add sbitmap_get_shallow() operation Omar Sandoval
@ 2017-03-17 22:03 ` Omar Sandoval
  2017-03-17 22:03 ` [RFC PATCH 3/4] blk-mq: export blk_mq_finish_request() Omar Sandoval
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Omar Sandoval @ 2017-03-17 22:03 UTC (permalink / raw)
  To: linux-block; +Cc: kernel-team

From: Omar Sandoval <osandov@fb.com>

Wire up the sbitmap_get_shallow() operation to the tag code so that a
caller can limit the number of tags available to it.
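
As an illustration (the function below is hypothetical, not part of this
patch), a scheduler's get_request hook could cap asynchronous
allocations before grabbing a tag; patch 4 does exactly this in
kyber_get_request():

/* Hypothetical hook: the depth of 64 is an arbitrary example value. */
static struct request *example_get_request(struct request_queue *q,
					   unsigned int op,
					   struct blk_mq_alloc_data *data)
{
	if (!op_is_sync(op))
		data->shallow_depth = 64;

	return __blk_mq_alloc_request(data, op);
}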

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 block/blk-mq-tag.c | 5 ++++-
 block/blk-mq.h     | 1 +
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index e48bc2c72615..1f25d466ebdc 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -96,7 +96,10 @@ static int __blk_mq_get_tag(struct blk_mq_alloc_data *data,
 	if (!(data->flags & BLK_MQ_REQ_INTERNAL) &&
 	    !hctx_may_queue(data->hctx, bt))
 		return -1;
-	return __sbitmap_queue_get(bt);
+	if (data->shallow_depth)
+		return __sbitmap_queue_get_shallow(bt, data->shallow_depth);
+	else
+		return __sbitmap_queue_get(bt);
 }
 
 unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 8d49c06fc520..77ec66369f21 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -141,6 +141,7 @@ struct blk_mq_alloc_data {
 	/* input parameter */
 	struct request_queue *q;
 	unsigned int flags;
+	unsigned int shallow_depth;
 
 	/* input & output parameter */
 	struct blk_mq_ctx *ctx;
-- 
2.12.0


* [RFC PATCH 3/4] blk-mq: export blk_mq_finish_request()
  2017-03-17 22:03 [RFC PATCH 0/4] blk-mq: multiqueue I/O scheduler Omar Sandoval
  2017-03-17 22:03 ` [RFC PATCH 1/4] sbitmap: add sbitmap_get_shallow() operation Omar Sandoval
  2017-03-17 22:03 ` [RFC PATCH 2/4] blk-mq: add shallow depth option for blk_mq_get_tag() Omar Sandoval
@ 2017-03-17 22:03 ` Omar Sandoval
  2017-03-17 22:03 ` [RFC PATCH 4/4] blk-mq: introduce Kyber multiqueue I/O scheduler Omar Sandoval
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Omar Sandoval @ 2017-03-17 22:03 UTC (permalink / raw)
  To: linux-block; +Cc: kernel-team

From: Omar Sandoval <osandov@fb.com>

This is required for schedulers that define their own put_request().
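
For instance (a hypothetical hook mirroring what patch 4 does in
kyber_put_request()), a scheduler's put_request must eventually hand the
request back to blk-mq:

static void example_put_request(struct request *rq)
{
	/* Release any scheduler-private state attached to rq here, then: */
	blk_mq_finish_request(rq);
}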

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 block/blk-mq.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 559f9b0f24a1..94973103e809 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -370,6 +370,7 @@ void blk_mq_finish_request(struct request *rq)
 {
 	blk_mq_finish_hctx_request(blk_mq_map_queue(rq->q, rq->mq_ctx->cpu), rq);
 }
+EXPORT_SYMBOL_GPL(blk_mq_finish_request);
 
 void blk_mq_free_request(struct request *rq)
 {
-- 
2.12.0


* [RFC PATCH 4/4] blk-mq: introduce Kyber multiqueue I/O scheduler
  2017-03-17 22:03 [RFC PATCH 0/4] blk-mq: multiqueue I/O scheduler Omar Sandoval
                   ` (2 preceding siblings ...)
  2017-03-17 22:03 ` [RFC PATCH 3/4] blk-mq: export blk_mq_finish_request() Omar Sandoval
@ 2017-03-17 22:03 ` Omar Sandoval
  2017-03-18  0:55 ` [RFC PATCH 0/4] blk-mq: " Omar Sandoval
  2017-03-18 11:14 ` [Lsf] " Hannes Reinecke
  5 siblings, 0 replies; 10+ messages in thread
From: Omar Sandoval @ 2017-03-17 22:03 UTC (permalink / raw)
  To: linux-block; +Cc: kernel-team

From: Omar Sandoval <osandov@fb.com>

The Kyber I/O scheduler is an I/O scheduler for fast devices designed to
scale to multiple queues. Users configure a single knob, the target read
latency, and the scheduler tunes itself to achieve that latency goal.

The implementation is based on "tokens", built on top of the scalable
bitmap library. Tokens serve as a mechanism for limiting requests. There
are two tiers of tokens: queueing tokens and dispatch tokens.

A queueing token is required to allocate a request. In fact, these
tokens are actually the blk-mq internal scheduler tags, but the
scheduler manages the allocation directly in order to implement its
policy.

Dispatch tokens are device-wide and split up into two scheduling
domains: reads vs. writes. Each hardware queue dispatches batches
round-robin between the scheduling domains as long as tokens are
available for that domain.

These tokens can be used as the mechanism to enable various policies.
The policy Kyber uses is inspired by active queue management techniques
for network routing, similar to blk-wbt. The scheduler monitors read
latencies and scales the number of available write dispatch tokens
accordingly. Queueing tokens are used to prevent starvation of
synchronous requests by asynchronous requests.
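
As a simplified sketch of that feedback loop (the thresholds below are
illustrative only; the real heuristic lives in kyber_stats_fn() in this
patch, and KYBER_WRITE_DEPTH is the constant defined there):

/* Hypothetical helper: scale the write depth from the observed read latency. */
static unsigned int example_scale_write_depth(unsigned int depth,
					      u64 latency, u64 target)
{
	if (latency >= 2 * target)
		depth /= 2;			/* reads far behind: back off hard */
	else if (latency > target)
		depth -= max(depth / 8, 1U);	/* slightly behind: trim */
	else if (latency <= target / 2)
		depth += 1;			/* well under target: give writes room */

	return clamp_t(unsigned int, depth, 1, KYBER_WRITE_DEPTH);
}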

Various extensions are possible, including better heuristics and ionice
support.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 block/Kconfig.iosched |   8 +
 block/Makefile        |   1 +
 block/elevator.c      |   9 +-
 block/kyber-iosched.c | 586 ++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 600 insertions(+), 4 deletions(-)
 create mode 100644 block/kyber-iosched.c

diff --git a/block/Kconfig.iosched b/block/Kconfig.iosched
index 58fc8684788d..ba6c9be67fa4 100644
--- a/block/Kconfig.iosched
+++ b/block/Kconfig.iosched
@@ -69,6 +69,14 @@ config MQ_IOSCHED_DEADLINE
 	---help---
 	  MQ version of the deadline IO scheduler.
 
+config MQ_IOSCHED_KYBER
+	tristate "Kyber I/O scheduler"
+	default y
+	---help---
+	  The Kyber I/O scheduler is a low-overhead scheduler suitable for
+	  multiqueue and other fast devices. Given a target latency, it will
+	  self-tune queue depths to achieve that goal.
+
 endmenu
 
 endif
diff --git a/block/Makefile b/block/Makefile
index 081bb680789b..6146d2eaaeaa 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -20,6 +20,7 @@ obj-$(CONFIG_IOSCHED_NOOP)	+= noop-iosched.o
 obj-$(CONFIG_IOSCHED_DEADLINE)	+= deadline-iosched.o
 obj-$(CONFIG_IOSCHED_CFQ)	+= cfq-iosched.o
 obj-$(CONFIG_MQ_IOSCHED_DEADLINE)	+= mq-deadline.o
+obj-$(CONFIG_MQ_IOSCHED_KYBER)	+= kyber-iosched.o
 
 obj-$(CONFIG_BLOCK_COMPAT)	+= compat_ioctl.o
 obj-$(CONFIG_BLK_CMDLINE_PARSER)	+= cmdline-parser.o
diff --git a/block/elevator.c b/block/elevator.c
index 01139f549b5b..44a6e42ffc1a 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -221,14 +221,15 @@ int elevator_init(struct request_queue *q, char *name)
 
 	if (!e) {
 		/*
-		 * For blk-mq devices, we default to using mq-deadline,
-		 * if available, for single queue devices. If deadline
-		 * isn't available OR we have multiple queues, default
-		 * to "none".
+		 * For blk-mq, we default to using mq-deadline for single-queue
+		 * devices and kyber for multi-queue devices. We fall back to
+		 * "none" if the preferred scheduler isn't available.
 		 */
 		if (q->mq_ops) {
 			if (q->nr_hw_queues == 1)
 				e = elevator_get("mq-deadline", false);
+			else
+				e = elevator_get("kyber", false);
 			if (!e)
 				return 0;
 		} else
diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
new file mode 100644
index 000000000000..e29cea785408
--- /dev/null
+++ b/block/kyber-iosched.c
@@ -0,0 +1,586 @@
+/*
+ * The Kyber I/O scheduler. Controls latency by throttling queue depths using
+ * scalable techniques.
+ *
+ * Copyright (C) 2017 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <https://www.gnu.org/licenses/>.
+ */
+
+#include <linux/kernel.h>
+#include <linux/blkdev.h>
+#include <linux/blk-mq.h>
+#include <linux/elevator.h>
+#include <linux/module.h>
+#include <linux/sbitmap.h>
+
+#include "blk.h"
+#include "blk-mq.h"
+#include "blk-mq-sched.h"
+#include "blk-mq-tag.h"
+#include "blk-stat.h"
+
+/* Scheduling domains. */
+enum {
+	KYBER_READ,
+	KYBER_WRITE,
+	KYBER_NUM_DOMAINS,
+};
+
+enum {
+	KYBER_MIN_DEPTH = 256,
+
+	/*
+	 * Initial device-wide depths for each scheduling domain.
+	 *
+	 * Even for fast devices with lots of tags like NVMe, you can saturate
+	 * the device with only a fraction of the maximum possible queue depth.
+	 * So, we cap these to a reasonable value.
+	 */
+	KYBER_READ_DEPTH = 256,
+	KYBER_WRITE_DEPTH = KYBER_READ_DEPTH / 4,
+
+	/*
+	 * Scheduling domain batch sizes. We favor reads over writes.
+	 */
+	KYBER_READ_BATCH = 16,
+	KYBER_WRITE_BATCH = 8,
+
+	/*
+	 * In order to prevent starvation of synchronous requests by a flood of
+	 * asynchronous requests, we reserve 25% of requests for synchronous
+	 * operations.
+	 */
+	KYBER_ASYNC_PERCENT = 75,
+};
+
+struct kyber_queue_data {
+	struct request_queue *q;
+
+	struct blk_stat_callback *cb;
+
+	/*
+	 * The device is divided into multiple scheduling domains based on the
+	 * request type. Each domain has a fixed number of in-flight requests of
+	 * that type device-wide, limited by these tokens.
+	 */
+	struct sbitmap_queue domain_tokens[KYBER_NUM_DOMAINS];
+
+	/*
+	 * The maximum depth that the domain tokens can be resized to.
+	 */
+	unsigned int max_domain_tokens[KYBER_NUM_DOMAINS];
+
+	/* Batch size for each scheduling domain. */
+	unsigned int domain_batch[KYBER_NUM_DOMAINS];
+
+	/*
+	 * Async request percentage, converted to per-word depth for
+	 * sbitmap_get_shallow().
+	 */
+	unsigned int async_depth;
+
+	/* Target read latency in nanoseconds. */
+	u64 read_lat_nsec;
+};
+
+struct kyber_hctx_data {
+	spinlock_t lock;
+	struct list_head rqs[KYBER_NUM_DOMAINS];
+	int cur_domain;
+	unsigned int batching;
+};
+
+/*
+ * Heuristics for limiting queue depths based on latency. Similar to AQM
+ * techniques for network routing.
+ */
+static void kyber_stats_fn(struct blk_stat_callback *cb,
+			   struct blk_stats *stats)
+{
+	struct kyber_queue_data *kqd = cb->data;
+	unsigned int orig_write_depth, write_depth;
+	u64 latency, target;
+
+	orig_write_depth = write_depth =
+		READ_ONCE(kqd->domain_tokens[KYBER_WRITE].sb.depth);
+
+	if (!stats->read.nr_samples) {
+		write_depth += 1;
+		goto resize;
+	}
+
+	latency = stats->read.mean;
+	target = kqd->read_lat_nsec;
+
+	if (latency >= 4 * target)
+		write_depth /= 2;
+	else if (latency >= 2 * target)
+		write_depth -= max(write_depth / 4, 1U);
+	else if (latency > target)
+		write_depth -= max(write_depth / 8, 1U);
+	else if (latency <= target / 2)
+		write_depth += 2;
+	else if (latency <= 3 * target / 4)
+		write_depth += 1;
+
+resize:
+	write_depth = clamp_t(unsigned int, write_depth, 1, KYBER_WRITE_DEPTH);
+	if (write_depth != orig_write_depth)
+		sbitmap_queue_resize(&kqd->domain_tokens[KYBER_WRITE], write_depth);
+
+	/* Continue monitoring latencies as long as we are throttling. */
+	if (write_depth < KYBER_WRITE_DEPTH && !timer_pending(&kqd->cb->timer))
+		blk_stat_arm_callback(kqd->cb, jiffies + msecs_to_jiffies(100));
+}
+
+/*
+ * Check if this request met our latency goal. If not, quickly gather some
+ * statistics and start throttling.
+ */
+static void kyber_check_latency(struct kyber_queue_data *kqd,
+				struct request *rq)
+{
+	u64 now, latency;
+	unsigned long expires;
+
+	if (req_op(rq) != REQ_OP_READ)
+		return;
+
+	/* If we are already managing the write depth, don't check again. */
+	if (kqd->domain_tokens[KYBER_WRITE].sb.depth < KYBER_WRITE_DEPTH)
+		return;
+
+	now = __blk_stat_time(ktime_to_ns(ktime_get()));
+	if (now < blk_stat_time(&rq->issue_stat))
+		return;
+
+	latency = now - blk_stat_time(&rq->issue_stat);
+
+	if (latency <= kqd->read_lat_nsec)
+		return;
+
+	if (!timer_pending(&kqd->cb->timer)) {
+		expires = jiffies + msecs_to_jiffies(10);
+		blk_stat_arm_callback(kqd->cb, expires);
+	}
+}
+
+static unsigned int kyber_sched_tags_shift(struct kyber_queue_data *kqd)
+{
+	/*
+	 * All of the hardware queues have the same depth, so we can just grab
+	 * the shift of the first one.
+	 */
+	return kqd->q->queue_hw_ctx[0]->sched_tags->bitmap_tags.sb.shift;
+}
+
+static struct kyber_queue_data *kyber_queue_data_alloc(struct request_queue *q)
+{
+	struct kyber_queue_data *kqd;
+	unsigned int max_tokens;
+	unsigned int shift;
+	int ret = -ENOMEM;
+	int i;
+
+	kqd = kmalloc_node(sizeof(*kqd), GFP_KERNEL, q->node);
+	if (!kqd)
+		goto err;
+	kqd->q = q;
+
+	kqd->cb = blk_stat_alloc_callback(kyber_stats_fn, kqd);
+	if (!kqd->cb)
+		goto err_kqd;
+
+	/*
+	 * The maximum number of tokens for any scheduling domain is at least
+	 * the queue depth of a single hardware queue. If the hardware doesn't
+	 * have many tags, still provide a reasonable number.
+	 */
+	max_tokens = max_t(unsigned int, q->tag_set->queue_depth,
+			   KYBER_MIN_DEPTH);
+	for (i = 0; i < KYBER_NUM_DOMAINS; i++) {
+		kqd->max_domain_tokens[i] = max_tokens;
+		ret = sbitmap_queue_init_node(&kqd->domain_tokens[i],
+					      max_tokens, -1, false, GFP_KERNEL,
+					      q->node);
+		if (ret) {
+			while (--i >= 0)
+				sbitmap_queue_free(&kqd->domain_tokens[i]);
+			goto err_cb;
+		}
+	}
+
+	sbitmap_queue_resize(&kqd->domain_tokens[KYBER_READ], KYBER_READ_DEPTH);
+	sbitmap_queue_resize(&kqd->domain_tokens[KYBER_WRITE], KYBER_WRITE_DEPTH);
+
+	kqd->domain_batch[KYBER_READ] = KYBER_READ_BATCH;
+	kqd->domain_batch[KYBER_WRITE] = KYBER_WRITE_BATCH;
+
+	shift = kyber_sched_tags_shift(kqd);
+	kqd->async_depth = (1U << shift) * KYBER_ASYNC_PERCENT / 100U;
+
+	kqd->read_lat_nsec = 2000000ULL;
+
+	return kqd;
+
+err_cb:
+	blk_stat_free_callback(kqd->cb);
+err_kqd:
+	kfree(kqd);
+err:
+	return ERR_PTR(ret);
+}
+
+static void kyber_queue_data_free(struct kyber_queue_data *kqd)
+{
+	int i;
+
+	if (!kqd)
+		return;
+
+	for (i = 0; i < KYBER_NUM_DOMAINS; i++)
+		sbitmap_queue_free(&kqd->domain_tokens[i]);
+	blk_stat_free_callback(kqd->cb);
+	kfree(kqd);
+}
+
+static int kyber_hctx_data_init(struct blk_mq_hw_ctx *hctx)
+{
+	struct kyber_hctx_data *khd = hctx->sched_data;
+	int i;
+
+	spin_lock_init(&khd->lock);
+
+	for (i = 0; i < KYBER_NUM_DOMAINS; i++)
+		INIT_LIST_HEAD(&khd->rqs[i]);
+
+	khd->cur_domain = 0;
+	khd->batching = 0;
+
+	return 0;
+}
+
+static int kyber_init_sched(struct request_queue *q, struct elevator_type *e)
+{
+	struct kyber_queue_data *kqd;
+	struct elevator_queue *eq;
+	int ret;
+
+	eq = elevator_alloc(q, e);
+	if (!eq)
+		return -ENOMEM;
+
+	kqd = kyber_queue_data_alloc(q);
+	if (IS_ERR(kqd)) {
+		ret = PTR_ERR(kqd);
+		goto err_kobj;
+	}
+
+	ret = blk_mq_sched_init_hctx_data(q, sizeof(struct kyber_hctx_data),
+					  kyber_hctx_data_init, NULL);
+	if (ret)
+		goto err_kqd;
+
+	eq->elevator_data = kqd;
+	q->elevator = eq;
+
+	blk_stat_add_callback(q, kqd->cb);
+
+	return 0;
+
+err_kqd:
+	kyber_queue_data_free(kqd);
+err_kobj:
+	kobject_put(&eq->kobj);
+	return ret;
+}
+
+static void kyber_exit_sched(struct elevator_queue *e)
+{
+	struct kyber_queue_data *kqd = e->elevator_data;
+	struct request_queue *q = kqd->q;
+
+	blk_stat_remove_callback(q, kqd->cb);
+	blk_mq_sched_free_hctx_data(q, NULL);
+	kyber_queue_data_free(e->elevator_data);
+}
+
+static int op_to_sched_domain(int op)
+{
+	if (op_is_write(op))
+		return KYBER_WRITE;
+	else
+		return KYBER_READ;
+}
+
+static int kyber_get_domain_token(struct kyber_queue_data *kqd,
+				  int sched_domain)
+{
+	struct sbitmap_queue *domain_tokens;
+
+	domain_tokens = &kqd->domain_tokens[sched_domain];
+	return __sbitmap_queue_get(domain_tokens);
+}
+
+static int rq_get_domain_token(struct request *rq)
+{
+	return (long)rq->elv.priv[0];
+}
+
+static void rq_set_domain_token(struct request *rq, int token)
+{
+	rq->elv.priv[0] = (void *)(long)token;
+}
+
+static void rq_clear_domain_token(struct kyber_queue_data *kqd,
+				  struct request *rq)
+{
+	int sched_domain, nr;
+
+	nr = rq_get_domain_token(rq);
+	if (nr != -1) {
+		sched_domain = op_to_sched_domain(req_op(rq));
+		sbitmap_queue_clear(&kqd->domain_tokens[sched_domain], nr,
+				    rq->mq_ctx->cpu);
+	}
+}
+
+static struct request *kyber_get_request(struct request_queue *q,
+					 unsigned int op,
+					 struct blk_mq_alloc_data *data)
+{
+	struct kyber_queue_data *kqd = q->elevator->elevator_data;
+	struct request *rq;
+
+	/*
+	 * We use the scheduler tags as per-hardware queue queueing tokens.
+	 * Async requests can be limited at this stage.
+	 */
+	if (!op_is_sync(op))
+		data->shallow_depth = READ_ONCE(kqd->async_depth);
+
+	rq = __blk_mq_alloc_request(data, op);
+	if (rq)
+		rq_set_domain_token(rq, -1);
+	return rq;
+}
+
+static void kyber_put_request(struct request *rq)
+{
+	struct request_queue *q = rq->q;
+	struct kyber_queue_data *kqd = q->elevator->elevator_data;
+
+	kyber_check_latency(kqd, rq);
+	rq_clear_domain_token(kqd, rq);
+	blk_mq_finish_request(rq);
+}
+
+static void kyber_flush_busy_ctxs(struct blk_mq_hw_ctx *hctx,
+				  struct kyber_hctx_data *khd)
+{
+	LIST_HEAD(rq_list);
+	struct request *rq, *next;
+
+	blk_mq_flush_busy_ctxs(hctx, &rq_list);
+	list_for_each_entry_safe(rq, next, &rq_list, queuelist) {
+		int sched_domain;
+
+		sched_domain = op_to_sched_domain(req_op(rq));
+		list_move_tail(&rq->queuelist, &khd->rqs[sched_domain]);
+	}
+}
+
+static struct request *
+kyber_dispatch_cur_domain(struct blk_mq_hw_ctx *hctx,
+			  struct kyber_queue_data *kqd,
+			  struct kyber_hctx_data *khd,
+			  bool *flushed, bool *no_tokens)
+{
+	struct list_head *rqs;
+	struct request *rq;
+	int nr;
+
+	rqs = &khd->rqs[khd->cur_domain];
+	rq = list_first_entry_or_null(rqs, struct request, queuelist);
+
+	/*
+	 * If there wasn't already a pending request and we haven't flushed the
+	 * software queues yet, flush the software queues and check again.
+	 */
+	if (!rq && !*flushed) {
+		kyber_flush_busy_ctxs(hctx, khd);
+		*flushed = true;
+		rq = list_first_entry_or_null(rqs, struct request, queuelist);
+	}
+
+	if (rq) {
+		nr = kyber_get_domain_token(kqd, khd->cur_domain);
+		if (nr == -1) {
+			*no_tokens = true;
+		} else {
+			khd->batching++;
+			rq_set_domain_token(rq, nr);
+			list_del_init(&rq->queuelist);
+			return rq;
+		}
+	}
+
+	/* There were either no pending requests or no tokens. */
+	return NULL;
+}
+
+/*
+ * Returns a request on success, NULL if there were no requests to dispatch, and
+ * ERR_PTR(-EBUSY) if there were requests to dispatch but no domain tokens for
+ * them.
+ */
+static struct request *__kyber_dispatch_request(struct kyber_queue_data *kqd,
+						struct kyber_hctx_data *khd,
+						struct blk_mq_hw_ctx *hctx)
+{
+	bool flushed = false, no_tokens = false;
+	struct request *rq;
+	int i;
+
+	/*
+	 * First, if we are still entitled to batch, try to dispatch a request
+	 * from the batch.
+	 */
+	if (khd->batching < READ_ONCE(kqd->domain_batch[khd->cur_domain])) {
+		rq = kyber_dispatch_cur_domain(hctx, kqd, khd, &flushed,
+					       &no_tokens);
+		if (rq)
+			return rq;
+	}
+
+	/*
+	 * Either,
+	 * 1. We were no longer entitled to a batch.
+	 * 2. The domain we were batching didn't have any requests.
+	 * 3. The domain we were batching was out of tokens.
+	 *
+	 * Start another batch. Note that this wraps back around to the original
+	 * domain if no other domains have requests or tokens.
+	 */
+	khd->batching = 0;
+	for (i = 0; i < KYBER_NUM_DOMAINS; i++) {
+		if (++khd->cur_domain >= KYBER_NUM_DOMAINS)
+			khd->cur_domain = 0;
+
+		rq = kyber_dispatch_cur_domain(hctx, kqd, khd, &flushed,
+					       &no_tokens);
+		if (rq)
+			return rq;
+	}
+
+	return no_tokens ? ERR_PTR(-EBUSY) : NULL;
+}
+
+static struct request *kyber_dispatch_request(struct blk_mq_hw_ctx *hctx)
+{
+	struct kyber_queue_data *kqd = hctx->queue->elevator->elevator_data;
+	struct kyber_hctx_data *khd = hctx->sched_data;
+	struct request *rq;
+
+	spin_lock(&khd->lock);
+
+	rq = __kyber_dispatch_request(kqd, khd, hctx);
+	if (IS_ERR(rq)) {
+		/*
+		 * We failed to get a domain token. Mark the queue as needing a
+		 * restart and try again in case a token was freed before we set
+		 * the restart bit.
+		 */
+		blk_mq_sched_mark_restart_queue(hctx);
+		rq = __kyber_dispatch_request(kqd, khd, hctx);
+		if (IS_ERR(rq))
+			rq = NULL;
+	}
+
+	spin_unlock(&khd->lock);
+
+	return rq;
+}
+
+static bool kyber_has_work(struct blk_mq_hw_ctx *hctx)
+{
+	struct kyber_hctx_data *khd = hctx->sched_data;
+	int i;
+
+	for (i = 0; i < KYBER_NUM_DOMAINS; i++) {
+		if (!list_empty_careful(&khd->rqs[i]))
+			return true;
+	}
+	return false;
+}
+
+static ssize_t kyber_read_lat_show(struct elevator_queue *e, char *page)
+{
+	struct kyber_queue_data *kqd = e->elevator_data;
+
+	return sprintf(page, "%llu\n", kqd->read_lat_nsec);
+}
+
+static ssize_t kyber_read_lat_store(struct elevator_queue *e, const char *page,
+				    size_t count)
+{
+	struct kyber_queue_data *kqd = e->elevator_data;
+	unsigned long long nsec;
+	int ret;
+
+	ret = kstrtoull(page, 10, &nsec);
+	if (ret)
+		return ret;
+
+	WRITE_ONCE(kqd->read_lat_nsec, nsec);
+
+	return count;
+}
+
+static struct elv_fs_entry kyber_sched_attrs[] = {
+	__ATTR(read_lat_nsec, 0644, kyber_read_lat_show, kyber_read_lat_store),
+	__ATTR_NULL
+};
+
+static struct elevator_type kyber_sched = {
+	.ops.mq = {
+		.init_sched = kyber_init_sched,
+		.exit_sched = kyber_exit_sched,
+		.get_request = kyber_get_request,
+		.put_request = kyber_put_request,
+		.dispatch_request = kyber_dispatch_request,
+		.has_work = kyber_has_work,
+	},
+	.uses_mq = true,
+	.elevator_attrs = kyber_sched_attrs,
+	.elevator_name = "kyber",
+	.elevator_owner = THIS_MODULE,
+};
+
+static int __init kyber_init(void)
+{
+	return elv_register(&kyber_sched);
+}
+
+static void __exit kyber_exit(void)
+{
+	elv_unregister(&kyber_sched);
+}
+
+module_init(kyber_init);
+module_exit(kyber_exit);
+
+MODULE_AUTHOR("Omar Sandoval");
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Kyber I/O scheduler");
-- 
2.12.0


* Re: [RFC PATCH 0/4] blk-mq: multiqueue I/O scheduler
  2017-03-17 22:03 [RFC PATCH 0/4] blk-mq: multiqueue I/O scheduler Omar Sandoval
                   ` (3 preceding siblings ...)
  2017-03-17 22:03 ` [RFC PATCH 4/4] blk-mq: introduce Kyber multiqueue I/O scheduler Omar Sandoval
@ 2017-03-18  0:55 ` Omar Sandoval
  2017-03-18 11:14 ` [Lsf] " Hannes Reinecke
  5 siblings, 0 replies; 10+ messages in thread
From: Omar Sandoval @ 2017-03-18  0:55 UTC (permalink / raw)
  To: linux-block; +Cc: kernel-team

On Fri, Mar 17, 2017 at 03:03:29PM -0700, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> With Kyber set to a target latency of 1 ms, the reader sees latencies of 8 ms
> on average. Kyber brings this down to just over 1 ms.

Of course, this is supposed to say, without Kyber, the reader sees
latencies of 8 ms. With Kyber set to a target latency of 1 ms, the
actual average latency is just over 1 ms.


* Re: [Lsf] [RFC PATCH 0/4] blk-mq: multiqueue I/O scheduler
  2017-03-17 22:03 [RFC PATCH 0/4] blk-mq: multiqueue I/O scheduler Omar Sandoval
                   ` (4 preceding siblings ...)
  2017-03-18  0:55 ` [RFC PATCH 0/4] blk-mq: " Omar Sandoval
@ 2017-03-18 11:14 ` Hannes Reinecke
  2017-03-18 11:15   ` Paolo Valente
  2017-03-18 15:30   ` Bart Van Assche
  5 siblings, 2 replies; 10+ messages in thread
From: Hannes Reinecke @ 2017-03-18 11:14 UTC (permalink / raw)
  To: Omar Sandoval, linux-block; +Cc: lsf, kernel-team

On 03/17/2017 11:03 PM, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
>
> This patch series implements an I/O scheduler for multiqueue devices
> combining several techniques: the scalable bitmap library, the new
> blk-stats API, and queue depth throttling similar to blk-wbt.
>
> These patches are on top of my earlier blk-stats series [1]. They also
> need a fix in Jens' for-linus branch in order to work properly [2].
>
> Patches 1 and 2 implement a new sbitmap operation and patch 3 exports a
> required function. Patch 4 implements the new scheduler, named Kyber.
>
> The commit message in patch 4 describes Kyber in detail. The scheduler
> employs some heuristics that I've experimented with, but that's probably
> the area that needs the most work. I'll be at LSF/MM next week, and
> there's currently a lightning talk on the schedule to discuss
> improvements and extensions.
>
[ .. ]

A silent round of applause for you.
Very much appreciated.

And actually, I have been looking at properly supporting host-wide tag
maps, so the concept of 'shallow' queue depth might be the way to go here.

And as I have some other questions, too (assigning a new tag on
requeue?) it might be an idea to spread this out into a formal session
at LSF. If there are some slots left, that is.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


* Re: [Lsf] [RFC PATCH 0/4] blk-mq: multiqueue I/O scheduler
  2017-03-18 11:14 ` [Lsf] " Hannes Reinecke
@ 2017-03-18 11:15   ` Paolo Valente
  2017-03-18 20:18     ` Martin K. Petersen
  2017-03-18 15:30   ` Bart Van Assche
  1 sibling, 1 reply; 10+ messages in thread
From: Paolo Valente @ 2017-03-18 11:15 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: linux-block, lsf, Omar Sandoval, kernel-team


> On 18 Mar 2017, at 07:14, Hannes Reinecke <hare@suse.de> wrote:
>
> On 03/17/2017 11:03 PM, Omar Sandoval wrote:
>> From: Omar Sandoval <osandov@fb.com>
>>
>> This patch series implements an I/O scheduler for multiqueue devices
>> combining several techniques: the scalable bitmap library, the new
>> blk-stats API, and queue depth throttling similar to blk-wbt.
>>
>> These patches are on top of my earlier blk-stats series [1]. They also
>> need a fix in Jens' for-linus branch in order to work properly [2].
>>
>> Patches 1 and 2 implement a new sbitmap operation and patch 3 exports a
>> required function. Patch 4 implements the new scheduler, named Kyber.
>>
>> The commit message in patch 4 describes Kyber in detail. The scheduler
>> employs some heuristics that I've experimented with, but that's probably
>> the area that needs the most work. I'll be at LSF/MM next week, and
>> there's currently a lightning talk on the schedule to discuss
>> improvements and extensions.
>>
> [ .. ]
>
> A silent round of applause for you.
> Very much appreciated.
>
> And actually, I have been looking at properly supporting host-wide tag
> maps, so the concept of 'shallow' queue depth might be the way to go here.
>
> And as I have some other questions, too (assigning a new tag on
> requeue?) it might be an idea to spread this out into a formal session
> at LSF. If there are some slots left, that is.
>

I would be very interested in such a discussion too.

Thanks,
Paolo

> Cheers,
>
> Hannes
> -- 
> Dr. Hannes Reinecke		      zSeries & Storage
> hare@suse.de			      +49 911 74053 688
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


* Re: [Lsf] [RFC PATCH 0/4] blk-mq: multiqueue I/O scheduler
  2017-03-18 11:14 ` [Lsf] " Hannes Reinecke
  2017-03-18 11:15   ` Paolo Valente
@ 2017-03-18 15:30   ` Bart Van Assche
  1 sibling, 0 replies; 10+ messages in thread
From: Bart Van Assche @ 2017-03-18 15:30 UTC (permalink / raw)
  To: osandov, hare, linux-block; +Cc: lsf, kernel-team

On Sat, 2017-03-18 at 12:14 +0100, Hannes Reinecke wrote:
> On 03/17/2017 11:03 PM, Omar Sandoval wrote:
> > From: Omar Sandoval <osandov@fb.com>
> >
> > This patch series implements an I/O scheduler for multiqueue devices
> > combining several techniques: the scalable bitmap library, the new
> > blk-stats API, and queue depth throttling similar to blk-wbt.
> >
> > These patches are on top of my earlier blk-stats series [1]. They also
> > need a fix in Jens' for-linus branch in order to work properly [2].
> >
> > Patches 1 and 2 implement a new sbitmap operation and patch 3 exports a
> > required function. Patch 4 implements the new scheduler, named Kyber.
> >
> > The commit message in patch 4 describes Kyber in detail. The scheduler
> > employs some heuristics that I've experimented with, but that's probably
> > the area that needs the most work. I'll be at LSF/MM next week, and
> > there's currently a lightning talk on the schedule to discuss
> > improvements and extensions.
> >
>
> [ .. ]
>
> A silent round of applause for you.
> Very much appreciated.
>
> And actually, I have been looking at properly supporting host-wide tag
> maps, so the concept of 'shallow' queue depth might be the way to go here.
>
> And as I have some other questions, too (assigning a new tag on
> requeue?) it might be an idea to spread this out into a formal session
> at LSF. If there are some slots left, that is.

If no slot can be found, how about organizing an evening session about this
new scheduler?

Bart.


* Re: [Lsf] [RFC PATCH 0/4] blk-mq: multiqueue I/O scheduler
  2017-03-18 11:15   ` Paolo Valente
@ 2017-03-18 20:18     ` Martin K. Petersen
  0 siblings, 0 replies; 10+ messages in thread
From: Martin K. Petersen @ 2017-03-18 20:18 UTC (permalink / raw)
  To: Paolo Valente
  Cc: linux-block, lsf, Omar Sandoval, kernel-team, Hannes Reinecke

Paolo Valente <paolo.valente@linaro.org> writes:


>> And actually, I have been looking at properly supporting host-wide tag
>> maps, so the concept of 'shallow' queue depth might be the way to go here.
>> 
>> And as I have some other questions, too (assigning a new tag on
>> requeue?) it might be an idea to spread this out into a formal session
>> at LSF. If there are some slots left, that is.
>> 
>
> I would be very interested in such a discussion too.

OK. Added.

-- 
Martin K. Petersen	Oracle Linux Engineering
