* [PATCHSET v3 0/9] blk-mq-sched and sbitmap shallow depth
@ 2018-05-10 16:24 Jens Axboe
  2018-05-10 16:24 ` [PATCH 1/9] blk-mq: don't call into depth limiting for reserved tags Jens Axboe
                   ` (8 more replies)
  0 siblings, 9 replies; 20+ messages in thread
From: Jens Axboe @ 2018-05-10 16:24 UTC (permalink / raw)
  To: linux-block; +Cc: osandov, efault, paolo.valente

Found another issue in sbitmap around our wake batch accounting.
See the last patch for details.

This series passes my testing. The sbitmap change was improved
by Omar, so I swapped in his patch instead.

You can also find this series here:

http://git.kernel.dk/cgit/linux-block/log/?h=for-4.18/iosched

 block/bfq-iosched.c     |  113 +++++++++++++++++++++++++-----------------------
 block/bfq-iosched.h     |    6 --
 block/blk-mq.c          |    6 +-
 block/kyber-iosched.c   |    3 +
 include/linux/sbitmap.h |   29 ++++++++++++
 lib/sbitmap.c           |   69 ++++++++++++++++++++++-------
 6 files changed, 148 insertions(+), 78 deletions(-)

-- 
Jens Axboe


* [PATCH 1/9] blk-mq: don't call into depth limiting for reserved tags
  2018-05-10 16:24 [PATCHSET v3 0/9] blk-mq-sched and sbitmap shallow depth Jens Axboe
@ 2018-05-10 16:24 ` Jens Axboe
  2018-05-10 16:59   ` Omar Sandoval
  2018-05-10 16:24 ` [PATCH 2/9] bfq-iosched: don't worry about reserved tags in limit_depth Jens Axboe
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2018-05-10 16:24 UTC (permalink / raw)
  To: linux-block; +Cc: osandov, efault, paolo.valente, Jens Axboe

Depth limiting isn't useful for reserved tags, as they are internal
and/or error handling recovery commands.

Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-mq.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4e9d83594cca..64630caaf27e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -360,9 +360,11 @@ static struct request *blk_mq_get_request(struct request_queue *q,
 
 		/*
 		 * Flush requests are special and go directly to the
-		 * dispatch list.
+		 * dispatch list. Don't include reserved tags in the
+		 * limiting, as it isn't useful.
 		 */
-		if (!op_is_flush(op) && e->type->ops.mq.limit_depth)
+		if (!op_is_flush(op) && e->type->ops.mq.limit_depth &&
+		    !(data->flags & BLK_MQ_REQ_RESERVED))
 			e->type->ops.mq.limit_depth(op, data);
 	}
 
-- 
2.7.4


* [PATCH 2/9] bfq-iosched: don't worry about reserved tags in limit_depth
  2018-05-10 16:24 [PATCHSET v3 0/9] blk-mq-sched and sbitmap shallow depth Jens Axboe
  2018-05-10 16:24 ` [PATCH 1/9] blk-mq: don't call into depth limiting for reserved tags Jens Axboe
@ 2018-05-10 16:24 ` Jens Axboe
  2018-05-10 16:59   ` Omar Sandoval
  2018-05-10 16:24 ` [PATCH 3/9] bfq: calculate shallow depths at init time Jens Axboe
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2018-05-10 16:24 UTC (permalink / raw)
  To: linux-block; +Cc: osandov, efault, paolo.valente, Jens Axboe

Reserved tags are used for error handling, so we don't need to
care about them for regular IO. The core won't call us for these
anyway.

Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/bfq-iosched.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index eefd8a4bc936..db38e88a5670 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -542,14 +542,7 @@ static void bfq_limit_depth(unsigned int op, struct blk_mq_alloc_data *data)
 	if (op_is_sync(op) && !op_is_write(op))
 		return;
 
-	if (data->flags & BLK_MQ_REQ_RESERVED) {
-		if (unlikely(!tags->nr_reserved_tags)) {
-			WARN_ON_ONCE(1);
-			return;
-		}
-		bt = &tags->breserved_tags;
-	} else
-		bt = &tags->bitmap_tags;
+	bt = &tags->bitmap_tags;
 
 	if (unlikely(bfqd->sb_shift != bt->sb.shift))
 		bfq_update_depths(bfqd, bt);
-- 
2.7.4


* [PATCH 3/9] bfq: calculate shallow depths at init time
  2018-05-10 16:24 [PATCHSET v3 0/9] blk-mq-sched and sbitmap shallow depth Jens Axboe
  2018-05-10 16:24 ` [PATCH 1/9] blk-mq: don't call into depth limiting for reserved tags Jens Axboe
  2018-05-10 16:24 ` [PATCH 2/9] bfq-iosched: don't worry about reserved tags in limit_depth Jens Axboe
@ 2018-05-10 16:24 ` Jens Axboe
  2018-05-10 17:00   ` Omar Sandoval
  2018-05-10 16:24 ` [PATCH 4/9] bfq-iosched: remove unused variable Jens Axboe
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2018-05-10 16:24 UTC (permalink / raw)
  To: linux-block; +Cc: osandov, efault, paolo.valente, Jens Axboe

It doesn't change, so don't put it in the per-IO hot path.
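
For a concrete feel for the depths that bfq_update_depths() sets up
below (purely illustrative numbers, assuming a sbitmap word shift of
6, i.e. 64 bits per word):

	word_depths[0][0] = max((1U << 6) >> 1, 1U);		/* 32 ->  50% */
	word_depths[0][1] = max(((1U << 6) * 3) >> 2, 1U);	/* 48 ->  75% */
	word_depths[1][0] = max(((1U << 6) * 3) >> 4, 1U);	/* 12 -> ~18% */
	word_depths[1][1] = max(((1U << 6) * 6) >> 4, 1U);	/* 24 -> ~37% */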

Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/bfq-iosched.c | 97 +++++++++++++++++++++++++++--------------------------
 1 file changed, 50 insertions(+), 47 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index db38e88a5670..0cd8aa80c32d 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -487,46 +487,6 @@ static struct request *bfq_choose_req(struct bfq_data *bfqd,
 }
 
 /*
- * See the comments on bfq_limit_depth for the purpose of
- * the depths set in the function.
- */
-static void bfq_update_depths(struct bfq_data *bfqd, struct sbitmap_queue *bt)
-{
-	bfqd->sb_shift = bt->sb.shift;
-
-	/*
-	 * In-word depths if no bfq_queue is being weight-raised:
-	 * leaving 25% of tags only for sync reads.
-	 *
-	 * In next formulas, right-shift the value
-	 * (1U<<bfqd->sb_shift), instead of computing directly
-	 * (1U<<(bfqd->sb_shift - something)), to be robust against
-	 * any possible value of bfqd->sb_shift, without having to
-	 * limit 'something'.
-	 */
-	/* no more than 50% of tags for async I/O */
-	bfqd->word_depths[0][0] = max((1U<<bfqd->sb_shift)>>1, 1U);
-	/*
-	 * no more than 75% of tags for sync writes (25% extra tags
-	 * w.r.t. async I/O, to prevent async I/O from starving sync
-	 * writes)
-	 */
-	bfqd->word_depths[0][1] = max(((1U<<bfqd->sb_shift) * 3)>>2, 1U);
-
-	/*
-	 * In-word depths in case some bfq_queue is being weight-
-	 * raised: leaving ~63% of tags for sync reads. This is the
-	 * highest percentage for which, in our tests, application
-	 * start-up times didn't suffer from any regression due to tag
-	 * shortage.
-	 */
-	/* no more than ~18% of tags for async I/O */
-	bfqd->word_depths[1][0] = max(((1U<<bfqd->sb_shift) * 3)>>4, 1U);
-	/* no more than ~37% of tags for sync writes (~20% extra tags) */
-	bfqd->word_depths[1][1] = max(((1U<<bfqd->sb_shift) * 6)>>4, 1U);
-}
-
-/*
  * Async I/O can easily starve sync I/O (both sync reads and sync
  * writes), by consuming all tags. Similarly, storms of sync writes,
  * such as those that sync(2) may trigger, can starve sync reads.
@@ -535,18 +495,11 @@ static void bfq_update_depths(struct bfq_data *bfqd, struct sbitmap_queue *bt)
  */
 static void bfq_limit_depth(unsigned int op, struct blk_mq_alloc_data *data)
 {
-	struct blk_mq_tags *tags = blk_mq_tags_from_data(data);
 	struct bfq_data *bfqd = data->q->elevator->elevator_data;
-	struct sbitmap_queue *bt;
 
 	if (op_is_sync(op) && !op_is_write(op))
 		return;
 
-	bt = &tags->bitmap_tags;
-
-	if (unlikely(bfqd->sb_shift != bt->sb.shift))
-		bfq_update_depths(bfqd, bt);
-
 	data->shallow_depth =
 		bfqd->word_depths[!!bfqd->wr_busy_queues][op_is_sync(op)];
 
@@ -5126,6 +5079,55 @@ void bfq_put_async_queues(struct bfq_data *bfqd, struct bfq_group *bfqg)
 	__bfq_put_async_bfqq(bfqd, &bfqg->async_idle_bfqq);
 }
 
+/*
+ * See the comments on bfq_limit_depth for the purpose of
+ * the depths set in the function.
+ */
+static void bfq_update_depths(struct bfq_data *bfqd, struct sbitmap_queue *bt)
+{
+	bfqd->sb_shift = bt->sb.shift;
+
+	/*
+	 * In-word depths if no bfq_queue is being weight-raised:
+	 * leaving 25% of tags only for sync reads.
+	 *
+	 * In next formulas, right-shift the value
+	 * (1U<<bfqd->sb_shift), instead of computing directly
+	 * (1U<<(bfqd->sb_shift - something)), to be robust against
+	 * any possible value of bfqd->sb_shift, without having to
+	 * limit 'something'.
+	 */
+	/* no more than 50% of tags for async I/O */
+	bfqd->word_depths[0][0] = max((1U<<bfqd->sb_shift)>>1, 1U);
+	/*
+	 * no more than 75% of tags for sync writes (25% extra tags
+	 * w.r.t. async I/O, to prevent async I/O from starving sync
+	 * writes)
+	 */
+	bfqd->word_depths[0][1] = max(((1U<<bfqd->sb_shift) * 3)>>2, 1U);
+
+	/*
+	 * In-word depths in case some bfq_queue is being weight-
+	 * raised: leaving ~63% of tags for sync reads. This is the
+	 * highest percentage for which, in our tests, application
+	 * start-up times didn't suffer from any regression due to tag
+	 * shortage.
+	 */
+	/* no more than ~18% of tags for async I/O */
+	bfqd->word_depths[1][0] = max(((1U<<bfqd->sb_shift) * 3)>>4, 1U);
+	/* no more than ~37% of tags for sync writes (~20% extra tags) */
+	bfqd->word_depths[1][1] = max(((1U<<bfqd->sb_shift) * 6)>>4, 1U);
+}
+
+static int bfq_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int index)
+{
+	struct bfq_data *bfqd = hctx->queue->elevator->elevator_data;
+	struct blk_mq_tags *tags = hctx->sched_tags;
+
+	bfq_update_depths(bfqd, &tags->bitmap_tags);
+	return 0;
+}
+
 static void bfq_exit_queue(struct elevator_queue *e)
 {
 	struct bfq_data *bfqd = e->elevator_data;
@@ -5547,6 +5549,7 @@ static struct elevator_type iosched_bfq_mq = {
 		.requests_merged	= bfq_requests_merged,
 		.request_merged		= bfq_request_merged,
 		.has_work		= bfq_has_work,
+		.init_hctx		= bfq_init_hctx,
 		.init_sched		= bfq_init_queue,
 		.exit_sched		= bfq_exit_queue,
 	},
-- 
2.7.4


* [PATCH 4/9] bfq-iosched: remove unused variable
  2018-05-10 16:24 [PATCHSET v3 0/9] blk-mq-sched and sbitmap shallow depth Jens Axboe
                   ` (2 preceding siblings ...)
  2018-05-10 16:24 ` [PATCH 3/9] bfq: calculate shallow depths at init time Jens Axboe
@ 2018-05-10 16:24 ` Jens Axboe
  2018-05-10 17:00   ` Omar Sandoval
  2018-05-10 16:24 ` [PATCH 5/9] sbitmap: fix missed wakeups caused by sbitmap_queue_get_shallow() Jens Axboe
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2018-05-10 16:24 UTC (permalink / raw)
  To: linux-block; +Cc: osandov, efault, paolo.valente, Jens Axboe

bfqd->sb_shift was intended as a cache for the sbitmap queue
shift, but we don't need it, as the shift never changes. Kill it
with fire.

Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/bfq-iosched.c | 16 +++++++---------
 block/bfq-iosched.h |  6 ------
 2 files changed, 7 insertions(+), 15 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 0cd8aa80c32d..10294124d597 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -5085,26 +5085,24 @@ void bfq_put_async_queues(struct bfq_data *bfqd, struct bfq_group *bfqg)
  */
 static void bfq_update_depths(struct bfq_data *bfqd, struct sbitmap_queue *bt)
 {
-	bfqd->sb_shift = bt->sb.shift;
-
 	/*
 	 * In-word depths if no bfq_queue is being weight-raised:
 	 * leaving 25% of tags only for sync reads.
 	 *
 	 * In next formulas, right-shift the value
-	 * (1U<<bfqd->sb_shift), instead of computing directly
-	 * (1U<<(bfqd->sb_shift - something)), to be robust against
-	 * any possible value of bfqd->sb_shift, without having to
+	 * (1U<<bt->sb.shift), instead of computing directly
+	 * (1U<<(bt->sb.shift - something)), to be robust against
+	 * any possible value of bt->sb.shift, without having to
 	 * limit 'something'.
 	 */
 	/* no more than 50% of tags for async I/O */
-	bfqd->word_depths[0][0] = max((1U<<bfqd->sb_shift)>>1, 1U);
+	bfqd->word_depths[0][0] = max((1U << bt->sb.shift) >> 1, 1U);
 	/*
 	 * no more than 75% of tags for sync writes (25% extra tags
 	 * w.r.t. async I/O, to prevent async I/O from starving sync
 	 * writes)
 	 */
-	bfqd->word_depths[0][1] = max(((1U<<bfqd->sb_shift) * 3)>>2, 1U);
+	bfqd->word_depths[0][1] = max(((1U << bt->sb.shift) * 3) >> 2, 1U);
 
 	/*
 	 * In-word depths in case some bfq_queue is being weight-
@@ -5114,9 +5112,9 @@ static void bfq_update_depths(struct bfq_data *bfqd, struct sbitmap_queue *bt)
 	 * shortage.
 	 */
 	/* no more than ~18% of tags for async I/O */
-	bfqd->word_depths[1][0] = max(((1U<<bfqd->sb_shift) * 3)>>4, 1U);
+	bfqd->word_depths[1][0] = max(((1U << bt->sb.shift) * 3) >> 4, 1U);
 	/* no more than ~37% of tags for sync writes (~20% extra tags) */
-	bfqd->word_depths[1][1] = max(((1U<<bfqd->sb_shift) * 6)>>4, 1U);
+	bfqd->word_depths[1][1] = max(((1U << bt->sb.shift) * 6) >> 4, 1U);
 }
 
 static int bfq_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int index)
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index 8ec7ff92cd6f..faac509cb35e 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -636,12 +636,6 @@ struct bfq_data {
 	struct bfq_queue *bio_bfqq;
 
 	/*
-	 * Cached sbitmap shift, used to compute depth limits in
-	 * bfq_update_depths.
-	 */
-	unsigned int sb_shift;
-
-	/*
 	 * Depth limits used in bfq_limit_depth (see comments on the
 	 * function)
 	 */
-- 
2.7.4


* [PATCH 5/9] sbitmap: fix missed wakeups caused by sbitmap_queue_get_shallow()
  2018-05-10 16:24 [PATCHSET v3 0/9] blk-mq-sched and sbitmap shallow depth Jens Axboe
                   ` (3 preceding siblings ...)
  2018-05-10 16:24 ` [PATCH 4/9] bfq-iosched: remove unused variable Jens Axboe
@ 2018-05-10 16:24 ` Jens Axboe
  2018-05-10 17:01   ` Omar Sandoval
  2018-05-10 16:24 ` [PATCH 6/9] sbitmap: warn if using smaller shallow depth than was setup Jens Axboe
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2018-05-10 16:24 UTC (permalink / raw)
  To: linux-block; +Cc: osandov, efault, paolo.valente, Jens Axboe

From: Omar Sandoval <osandov@fb.com>

The sbitmap queue wake batch is calculated such that once allocations
start blocking, all of the bits which are already allocated must be
enough to fulfill the batch counters of all of the waitqueues. However,
the shallow allocation depth can break this invariant, since we block
before our full depth is being utilized. Add
sbitmap_queue_min_shallow_depth(), which saves the minimum shallow depth
the sbq will use, and update sbq_calc_wake_batch() to take it into
account.
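
A rough worked example of the new calculation (numbers picked for
illustration, and assuming SBQ_WAIT_QUEUES == SBQ_WAKE_BATCH == 8):
depth = 256, shift = 6 (64 bits per word), min_shallow_depth = 12.

	shallow_depth = min(1U << 6, 12)                     = 12
	depth         = (256 >> 6) * 12 + min(256 & 63, 12)  = 48
	wake_batch    = clamp(48 / 8, 1, 8)                  = 6

Previously wake_batch would have stayed at 8, but with at most 48
shallow bits ever in flight, 8 wait queues each expecting a batch of
8 (64 completions) could never all be satisfied.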

Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 include/linux/sbitmap.h | 29 +++++++++++++++++++++++++++++
 lib/sbitmap.c           | 49 ++++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 69 insertions(+), 9 deletions(-)

diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h
index 841585f6e5f2..0c4a9c242dd7 100644
--- a/include/linux/sbitmap.h
+++ b/include/linux/sbitmap.h
@@ -127,6 +127,12 @@ struct sbitmap_queue {
 	 * @round_robin: Allocate bits in strict round-robin order.
 	 */
 	bool round_robin;
+
+	/**
+	 * @min_shallow_depth: The minimum shallow depth which may be passed to
+	 * sbitmap_queue_get_shallow() or __sbitmap_queue_get_shallow().
+	 */
+	unsigned int min_shallow_depth;
 };
 
 /**
@@ -390,6 +396,9 @@ int __sbitmap_queue_get(struct sbitmap_queue *sbq);
  * @shallow_depth: The maximum number of bits to allocate from a single word.
  * See sbitmap_get_shallow().
  *
+ * If you call this, make sure to call sbitmap_queue_min_shallow_depth() after
+ * initializing @sbq.
+ *
  * Return: Non-negative allocated bit number if successful, -1 otherwise.
  */
 int __sbitmap_queue_get_shallow(struct sbitmap_queue *sbq,
@@ -424,6 +433,9 @@ static inline int sbitmap_queue_get(struct sbitmap_queue *sbq,
  * @shallow_depth: The maximum number of bits to allocate from a single word.
  * See sbitmap_get_shallow().
  *
+ * If you call this, make sure to call sbitmap_queue_min_shallow_depth() after
+ * initializing @sbq.
+ *
  * Return: Non-negative allocated bit number if successful, -1 otherwise.
  */
 static inline int sbitmap_queue_get_shallow(struct sbitmap_queue *sbq,
@@ -439,6 +451,23 @@ static inline int sbitmap_queue_get_shallow(struct sbitmap_queue *sbq,
 }
 
 /**
+ * sbitmap_queue_min_shallow_depth() - Inform a &struct sbitmap_queue of the
+ * minimum shallow depth that will be used.
+ * @sbq: Bitmap queue in question.
+ * @min_shallow_depth: The minimum shallow depth that will be passed to
+ * sbitmap_queue_get_shallow() or __sbitmap_queue_get_shallow().
+ *
+ * sbitmap_queue_clear() batches wakeups as an optimization. The batch size
+ * depends on the depth of the bitmap. Since the shallow allocation functions
+ * effectively operate with a different depth, the shallow depth must be taken
+ * into account when calculating the batch size. This function must be called
+ * with the minimum shallow depth that will be used. Failure to do so can result
+ * in missed wakeups.
+ */
+void sbitmap_queue_min_shallow_depth(struct sbitmap_queue *sbq,
+				     unsigned int min_shallow_depth);
+
+/**
  * sbitmap_queue_clear() - Free an allocated bit and wake up waiters on a
  * &struct sbitmap_queue.
  * @sbq: Bitmap to free from.
diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index e6a9c06ec70c..d21473b42465 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -270,18 +270,33 @@ void sbitmap_bitmap_show(struct sbitmap *sb, struct seq_file *m)
 }
 EXPORT_SYMBOL_GPL(sbitmap_bitmap_show);
 
-static unsigned int sbq_calc_wake_batch(unsigned int depth)
+static unsigned int sbq_calc_wake_batch(struct sbitmap_queue *sbq,
+					unsigned int depth)
 {
 	unsigned int wake_batch;
+	unsigned int shallow_depth;
 
 	/*
 	 * For each batch, we wake up one queue. We need to make sure that our
-	 * batch size is small enough that the full depth of the bitmap is
-	 * enough to wake up all of the queues.
+	 * batch size is small enough that the full depth of the bitmap,
+	 * potentially limited by a shallow depth, is enough to wake up all of
+	 * the queues.
+	 *
+	 * Each full word of the bitmap has bits_per_word bits, and there might
+	 * be a partial word. There are depth / bits_per_word full words and
+	 * depth % bits_per_word bits left over. In bitwise arithmetic:
+	 *
+	 * bits_per_word = 1 << shift
+	 * depth / bits_per_word = depth >> shift
+	 * depth % bits_per_word = depth & ((1 << shift) - 1)
+	 *
+	 * Each word can be limited to sbq->min_shallow_depth bits.
 	 */
-	wake_batch = SBQ_WAKE_BATCH;
-	if (wake_batch > depth / SBQ_WAIT_QUEUES)
-		wake_batch = max(1U, depth / SBQ_WAIT_QUEUES);
+	shallow_depth = min(1U << sbq->sb.shift, sbq->min_shallow_depth);
+	depth = ((depth >> sbq->sb.shift) * shallow_depth +
+		 min(depth & ((1U << sbq->sb.shift) - 1), shallow_depth));
+	wake_batch = clamp_t(unsigned int, depth / SBQ_WAIT_QUEUES, 1,
+			     SBQ_WAKE_BATCH);
 
 	return wake_batch;
 }
@@ -307,7 +322,8 @@ int sbitmap_queue_init_node(struct sbitmap_queue *sbq, unsigned int depth,
 			*per_cpu_ptr(sbq->alloc_hint, i) = prandom_u32() % depth;
 	}
 
-	sbq->wake_batch = sbq_calc_wake_batch(depth);
+	sbq->min_shallow_depth = UINT_MAX;
+	sbq->wake_batch = sbq_calc_wake_batch(sbq, depth);
 	atomic_set(&sbq->wake_index, 0);
 
 	sbq->ws = kzalloc_node(SBQ_WAIT_QUEUES * sizeof(*sbq->ws), flags, node);
@@ -327,9 +343,10 @@ int sbitmap_queue_init_node(struct sbitmap_queue *sbq, unsigned int depth,
 }
 EXPORT_SYMBOL_GPL(sbitmap_queue_init_node);
 
-void sbitmap_queue_resize(struct sbitmap_queue *sbq, unsigned int depth)
+static void sbitmap_queue_update_wake_batch(struct sbitmap_queue *sbq,
+					    unsigned int depth)
 {
-	unsigned int wake_batch = sbq_calc_wake_batch(depth);
+	unsigned int wake_batch = sbq_calc_wake_batch(sbq, depth);
 	int i;
 
 	if (sbq->wake_batch != wake_batch) {
@@ -342,6 +359,11 @@ void sbitmap_queue_resize(struct sbitmap_queue *sbq, unsigned int depth)
 		for (i = 0; i < SBQ_WAIT_QUEUES; i++)
 			atomic_set(&sbq->ws[i].wait_cnt, 1);
 	}
+}
+
+void sbitmap_queue_resize(struct sbitmap_queue *sbq, unsigned int depth)
+{
+	sbitmap_queue_update_wake_batch(sbq, depth);
 	sbitmap_resize(&sbq->sb, depth);
 }
 EXPORT_SYMBOL_GPL(sbitmap_queue_resize);
@@ -403,6 +425,14 @@ int __sbitmap_queue_get_shallow(struct sbitmap_queue *sbq,
 }
 EXPORT_SYMBOL_GPL(__sbitmap_queue_get_shallow);
 
+void sbitmap_queue_min_shallow_depth(struct sbitmap_queue *sbq,
+				     unsigned int min_shallow_depth)
+{
+	sbq->min_shallow_depth = min_shallow_depth;
+	sbitmap_queue_update_wake_batch(sbq, sbq->sb.depth);
+}
+EXPORT_SYMBOL_GPL(sbitmap_queue_min_shallow_depth);
+
 static struct sbq_wait_state *sbq_wake_ptr(struct sbitmap_queue *sbq)
 {
 	int i, wake_index;
@@ -528,5 +558,6 @@ void sbitmap_queue_show(struct sbitmap_queue *sbq, struct seq_file *m)
 	seq_puts(m, "}\n");
 
 	seq_printf(m, "round_robin=%d\n", sbq->round_robin);
+	seq_printf(m, "min_shallow_depth=%u\n", sbq->min_shallow_depth);
 }
 EXPORT_SYMBOL_GPL(sbitmap_queue_show);
-- 
2.7.4


* [PATCH 6/9] sbitmap: warn if using smaller shallow depth than was setup
  2018-05-10 16:24 [PATCHSET v3 0/9] blk-mq-sched and sbitmap shallow depth Jens Axboe
                   ` (4 preceding siblings ...)
  2018-05-10 16:24 ` [PATCH 5/9] sbitmap: fix missed wakeups caused by sbitmap_queue_get_shallow() Jens Axboe
@ 2018-05-10 16:24 ` Jens Axboe
  2018-05-10 17:02   ` Omar Sandoval
  2018-05-10 16:24 ` [PATCH 7/9] bfq-iosched: update shallow depth to smallest one used Jens Axboe
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2018-05-10 16:24 UTC (permalink / raw)
  To: linux-block; +Cc: osandov, efault, paolo.valente, Jens Axboe

From: Omar Sandoval <osandov@fb.com>

Make sure the user passed the right value to
sbitmap_queue_min_shallow_depth().
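
A hypothetical misuse that the warning would catch (sketch only, not
code from this series):

	sbitmap_queue_min_shallow_depth(sbq, 16);
	...
	/* later asks for a smaller shallow depth than advertised */
	__sbitmap_queue_get_shallow(sbq, 8);	/* WARN_ON_ONCE() fires */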

Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 lib/sbitmap.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index d21473b42465..8f0950fbaa5c 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -402,6 +402,8 @@ int __sbitmap_queue_get_shallow(struct sbitmap_queue *sbq,
 	unsigned int hint, depth;
 	int nr;
 
+	WARN_ON_ONCE(shallow_depth < sbq->min_shallow_depth);
+
 	hint = this_cpu_read(*sbq->alloc_hint);
 	depth = READ_ONCE(sbq->sb.depth);
 	if (unlikely(hint >= depth)) {
-- 
2.7.4


* [PATCH 7/9] bfq-iosched: update shallow depth to smallest one used
  2018-05-10 16:24 [PATCHSET v3 0/9] blk-mq-sched and sbitmap shallow depth Jens Axboe
                   ` (5 preceding siblings ...)
  2018-05-10 16:24 ` [PATCH 6/9] sbitmap: warn if using smaller shallow depth than was setup Jens Axboe
@ 2018-05-10 16:24 ` Jens Axboe
  2018-05-10 17:03   ` Omar Sandoval
  2018-05-10 16:24 ` [PATCH 8/9] kyber-iosched: update shallow depth when setting up hardware queue Jens Axboe
  2018-05-10 16:24 ` [PATCH 9/9] sbitmap: fix race in wait batch accounting Jens Axboe
  8 siblings, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2018-05-10 16:24 UTC (permalink / raw)
  To: linux-block; +Cc: osandov, efault, paolo.valente, Jens Axboe

If our shallow depth is smaller than the wake batching of sbitmap,
we can introduce hangs. Ensure that sbitmap knows how low we'll go.
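
Continuing the illustrative shift-of-6 numbers from patch 3, the four
word depths are {32, 48, 12, 24}, so:

	min_shallow = min(32, 48, 12, 24) = 12

and 12 is what bfq_init_hctx() now hands to
sbitmap_queue_min_shallow_depth(), which in turn feeds the wake batch
calculation from patch 5.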

Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/bfq-iosched.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 10294124d597..b622e73a326a 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -5081,10 +5081,13 @@ void bfq_put_async_queues(struct bfq_data *bfqd, struct bfq_group *bfqg)
 
 /*
  * See the comments on bfq_limit_depth for the purpose of
- * the depths set in the function.
+ * the depths set in the function. Return minimum shallow depth we'll use.
  */
-static void bfq_update_depths(struct bfq_data *bfqd, struct sbitmap_queue *bt)
+static unsigned int bfq_update_depths(struct bfq_data *bfqd,
+				      struct sbitmap_queue *bt)
 {
+	unsigned int i, j, min_shallow = UINT_MAX;
+
 	/*
 	 * In-word depths if no bfq_queue is being weight-raised:
 	 * leaving 25% of tags only for sync reads.
@@ -5115,14 +5118,22 @@ static void bfq_update_depths(struct bfq_data *bfqd, struct sbitmap_queue *bt)
 	bfqd->word_depths[1][0] = max(((1U << bt->sb.shift) * 3) >> 4, 1U);
 	/* no more than ~37% of tags for sync writes (~20% extra tags) */
 	bfqd->word_depths[1][1] = max(((1U << bt->sb.shift) * 6) >> 4, 1U);
+
+	for (i = 0; i < 2; i++)
+		for (j = 0; j < 2; j++)
+			min_shallow = min(min_shallow, bfqd->word_depths[i][j]);
+
+	return min_shallow;
 }
 
 static int bfq_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int index)
 {
 	struct bfq_data *bfqd = hctx->queue->elevator->elevator_data;
 	struct blk_mq_tags *tags = hctx->sched_tags;
+	unsigned int min_shallow;
 
-	bfq_update_depths(bfqd, &tags->bitmap_tags);
+	min_shallow = bfq_update_depths(bfqd, &tags->bitmap_tags);
+	sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, min_shallow);
 	return 0;
 }
 
-- 
2.7.4


* [PATCH 8/9] kyber-iosched: update shallow depth when setting up hardware queue
  2018-05-10 16:24 [PATCHSET v3 0/9] blk-mq-sched and sbitmap shallow depth Jens Axboe
                   ` (6 preceding siblings ...)
  2018-05-10 16:24 ` [PATCH 7/9] bfq-iosched: update shallow depth to smallest one used Jens Axboe
@ 2018-05-10 16:24 ` Jens Axboe
  2018-05-10 17:03   ` Omar Sandoval
  2018-05-10 16:24 ` [PATCH 9/9] sbitmap: fix race in wait batch accounting Jens Axboe
  8 siblings, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2018-05-10 16:24 UTC (permalink / raw)
  To: linux-block; +Cc: osandov, efault, paolo.valente, Jens Axboe

We don't expect the async depth to be smaller than the wake batch
count for sbitmap, but just in case, inform sbitmap of what shallow
depth kyber may use.

Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/kyber-iosched.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index 564967fafe5f..5b33dc394cc7 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -378,6 +378,7 @@ static void kyber_exit_sched(struct elevator_queue *e)
 
 static int kyber_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
 {
+	struct kyber_queue_data *kqd = hctx->queue->elevator->elevator_data;
 	struct kyber_hctx_data *khd;
 	int i;
 
@@ -400,6 +401,8 @@ static int kyber_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
 	khd->batching = 0;
 
 	hctx->sched_data = khd;
+	sbitmap_queue_min_shallow_depth(&hctx->sched_tags->bitmap_tags,
+					kqd->async_depth);
 
 	return 0;
 }
-- 
2.7.4


* [PATCH 9/9] sbitmap: fix race in wait batch accounting
  2018-05-10 16:24 [PATCHSET v3 0/9] blk-mq-sched and sbitmap shallow depth Jens Axboe
                   ` (7 preceding siblings ...)
  2018-05-10 16:24 ` [PATCH 8/9] kyber-iosched: update shallow depth when setting up hardware queue Jens Axboe
@ 2018-05-10 16:24 ` Jens Axboe
  8 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2018-05-10 16:24 UTC (permalink / raw)
  To: linux-block; +Cc: osandov, efault, paolo.valente, Jens Axboe

If we have multiple callers of sbq_wake_up(), we can end up in a
situation where the wait_cnt will continually go more and more
negative. Consider the case where our wake batch is 1, hence
wait_cnt will start out as 1.

wait_cnt == 1

CPU0				CPU1
atomic_dec_return(), cnt == 0
				atomic_dec_return(), cnt == -1
				cmpxchg(-1, 0) (succeeds)
				[wait_cnt now 0]
cmpxchg(0, 1) (fails)

This ends up with wait_cnt being 0, so we'll wake up immediately
next time. Going through the same sequence as above again, we'll
end up with wait_cnt at -1.

For the case where we have a larger wake batch, the only
difference is that the starting point will be higher. We'll
still end up with continually smaller batch wakeups, which
defeats the purpose of the rolling wakeups.

Always reset the wait_cnt to the batch value. Then it doesn't
matter who wins the race. But ensure that whoever wins the race
is also the one that increments the ws index.
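
For illustration, the same interleaving with this change applied
resolves cleanly (wake_batch == 1):

wait_cnt == 1

CPU0				CPU1
atomic_dec_return(), cnt == 0
				atomic_dec_return(), cnt == -1
				cmpxchg(-1, wake_batch) (succeeds)
				[wait_cnt now 1, wake_index bumped]
cmpxchg(0, wake_batch) (fails)
[no second wake_index bump, wait_cnt stays at the batch value]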

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 lib/sbitmap.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index 8f0950fbaa5c..ffc36e695f50 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -478,22 +478,26 @@ static void sbq_wake_up(struct sbitmap_queue *sbq)
 
 	wait_cnt = atomic_dec_return(&ws->wait_cnt);
 	if (wait_cnt <= 0) {
+		int ret;
+
 		wake_batch = READ_ONCE(sbq->wake_batch);
+
 		/*
 		 * Pairs with the memory barrier in sbitmap_queue_resize() to
 		 * ensure that we see the batch size update before the wait
 		 * count is reset.
 		 */
 		smp_mb__before_atomic();
+
 		/*
-		 * If there are concurrent callers to sbq_wake_up(), the last
-		 * one to decrement the wait count below zero will bump it back
-		 * up. If there is a concurrent resize, the count reset will
-		 * either cause the cmpxchg to fail or overwrite after the
-		 * cmpxchg.
+		 * For concurrent callers (either resize, or just multiple
+		 * concurrent wakeups of waiters), the one that wins the
+		 * cmpxchg() race increments the waiter index.
 		 */
-		atomic_cmpxchg(&ws->wait_cnt, wait_cnt, wait_cnt + wake_batch);
-		sbq_index_atomic_inc(&sbq->wake_index);
+		ret = atomic_cmpxchg(&ws->wait_cnt, wait_cnt, wake_batch);
+		if (ret == wait_cnt)
+			sbq_index_atomic_inc(&sbq->wake_index);
+
 		wake_up_nr(&ws->wait, wake_batch);
 	}
 }
-- 
2.7.4


* Re: [PATCH 1/9] blk-mq: don't call into depth limiting for reserved tags
  2018-05-10 16:24 ` [PATCH 1/9] blk-mq: don't call into depth limiting for reserved tags Jens Axboe
@ 2018-05-10 16:59   ` Omar Sandoval
  0 siblings, 0 replies; 20+ messages in thread
From: Omar Sandoval @ 2018-05-10 16:59 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, osandov, efault, paolo.valente

On Thu, May 10, 2018 at 10:24:19AM -0600, Jens Axboe wrote:
> Depth limiting isn't useful for reserved tags, as they are internal
> and/or error handling recovery commands.
> 
> Acked-by: Paolo Valente <paolo.valente@linaro.org>

Reviewed-by: Omar Sandoval <osandov@fb.com>

> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  block/blk-mq.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 4e9d83594cca..64630caaf27e 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -360,9 +360,11 @@ static struct request *blk_mq_get_request(struct request_queue *q,
>  
>  		/*
>  		 * Flush requests are special and go directly to the
> -		 * dispatch list.
> +		 * dispatch list. Don't include reserved tags in the
> +		 * limiting, as it isn't useful.
>  		 */
> -		if (!op_is_flush(op) && e->type->ops.mq.limit_depth)
> +		if (!op_is_flush(op) && e->type->ops.mq.limit_depth &&
> +		    !(data->flags & BLK_MQ_REQ_RESERVED))
>  			e->type->ops.mq.limit_depth(op, data);
>  	}
>  
> -- 
> 2.7.4
> 


* Re: [PATCH 2/9] bfq-iosched: don't worry about reserved tags in limit_depth
  2018-05-10 16:24 ` [PATCH 2/9] bfq-iosched: don't worry about reserved tags in limit_depth Jens Axboe
@ 2018-05-10 16:59   ` Omar Sandoval
  0 siblings, 0 replies; 20+ messages in thread
From: Omar Sandoval @ 2018-05-10 16:59 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, osandov, efault, paolo.valente

On Thu, May 10, 2018 at 10:24:20AM -0600, Jens Axboe wrote:
> Reserved tags are used for error handling, so we don't need to
> care about them for regular IO. The core won't call us for these
> anyway.
> 
> Acked-by: Paolo Valente <paolo.valente@linaro.org>

Reviewed-by: Omar Sandoval <osandov@fb.com>

> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  block/bfq-iosched.c | 9 +--------
>  1 file changed, 1 insertion(+), 8 deletions(-)
> 
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index eefd8a4bc936..db38e88a5670 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -542,14 +542,7 @@ static void bfq_limit_depth(unsigned int op, struct blk_mq_alloc_data *data)
>  	if (op_is_sync(op) && !op_is_write(op))
>  		return;
>  
> -	if (data->flags & BLK_MQ_REQ_RESERVED) {
> -		if (unlikely(!tags->nr_reserved_tags)) {
> -			WARN_ON_ONCE(1);
> -			return;
> -		}
> -		bt = &tags->breserved_tags;
> -	} else
> -		bt = &tags->bitmap_tags;
> +	bt = &tags->bitmap_tags;
>  
>  	if (unlikely(bfqd->sb_shift != bt->sb.shift))
>  		bfq_update_depths(bfqd, bt);
> -- 
> 2.7.4
> 


* Re: [PATCH 3/9] bfq: calculate shallow depths at init time
  2018-05-10 16:24 ` [PATCH 3/9] bfq: calculate shallow depths at init time Jens Axboe
@ 2018-05-10 17:00   ` Omar Sandoval
  0 siblings, 0 replies; 20+ messages in thread
From: Omar Sandoval @ 2018-05-10 17:00 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, osandov, efault, paolo.valente

On Thu, May 10, 2018 at 10:24:21AM -0600, Jens Axboe wrote:
> It doesn't change, so don't put it in the per-IO hot path.
> 
> Acked-by: Paolo Valente <paolo.valente@linaro.org>

Reviewed-by: Omar Sandoval <osandov@fb.com>

> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  block/bfq-iosched.c | 97 +++++++++++++++++++++++++++--------------------------
>  1 file changed, 50 insertions(+), 47 deletions(-)
> 
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index db38e88a5670..0cd8aa80c32d 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -487,46 +487,6 @@ static struct request *bfq_choose_req(struct bfq_data *bfqd,
>  }
>  
>  /*
> - * See the comments on bfq_limit_depth for the purpose of
> - * the depths set in the function.
> - */
> -static void bfq_update_depths(struct bfq_data *bfqd, struct sbitmap_queue *bt)
> -{
> -	bfqd->sb_shift = bt->sb.shift;
> -
> -	/*
> -	 * In-word depths if no bfq_queue is being weight-raised:
> -	 * leaving 25% of tags only for sync reads.
> -	 *
> -	 * In next formulas, right-shift the value
> -	 * (1U<<bfqd->sb_shift), instead of computing directly
> -	 * (1U<<(bfqd->sb_shift - something)), to be robust against
> -	 * any possible value of bfqd->sb_shift, without having to
> -	 * limit 'something'.
> -	 */
> -	/* no more than 50% of tags for async I/O */
> -	bfqd->word_depths[0][0] = max((1U<<bfqd->sb_shift)>>1, 1U);
> -	/*
> -	 * no more than 75% of tags for sync writes (25% extra tags
> -	 * w.r.t. async I/O, to prevent async I/O from starving sync
> -	 * writes)
> -	 */
> -	bfqd->word_depths[0][1] = max(((1U<<bfqd->sb_shift) * 3)>>2, 1U);
> -
> -	/*
> -	 * In-word depths in case some bfq_queue is being weight-
> -	 * raised: leaving ~63% of tags for sync reads. This is the
> -	 * highest percentage for which, in our tests, application
> -	 * start-up times didn't suffer from any regression due to tag
> -	 * shortage.
> -	 */
> -	/* no more than ~18% of tags for async I/O */
> -	bfqd->word_depths[1][0] = max(((1U<<bfqd->sb_shift) * 3)>>4, 1U);
> -	/* no more than ~37% of tags for sync writes (~20% extra tags) */
> -	bfqd->word_depths[1][1] = max(((1U<<bfqd->sb_shift) * 6)>>4, 1U);
> -}
> -
> -/*
>   * Async I/O can easily starve sync I/O (both sync reads and sync
>   * writes), by consuming all tags. Similarly, storms of sync writes,
>   * such as those that sync(2) may trigger, can starve sync reads.
> @@ -535,18 +495,11 @@ static void bfq_update_depths(struct bfq_data *bfqd, struct sbitmap_queue *bt)
>   */
>  static void bfq_limit_depth(unsigned int op, struct blk_mq_alloc_data *data)
>  {
> -	struct blk_mq_tags *tags = blk_mq_tags_from_data(data);
>  	struct bfq_data *bfqd = data->q->elevator->elevator_data;
> -	struct sbitmap_queue *bt;
>  
>  	if (op_is_sync(op) && !op_is_write(op))
>  		return;
>  
> -	bt = &tags->bitmap_tags;
> -
> -	if (unlikely(bfqd->sb_shift != bt->sb.shift))
> -		bfq_update_depths(bfqd, bt);
> -
>  	data->shallow_depth =
>  		bfqd->word_depths[!!bfqd->wr_busy_queues][op_is_sync(op)];
>  
> @@ -5126,6 +5079,55 @@ void bfq_put_async_queues(struct bfq_data *bfqd, struct bfq_group *bfqg)
>  	__bfq_put_async_bfqq(bfqd, &bfqg->async_idle_bfqq);
>  }
>  
> +/*
> + * See the comments on bfq_limit_depth for the purpose of
> + * the depths set in the function.
> + */
> +static void bfq_update_depths(struct bfq_data *bfqd, struct sbitmap_queue *bt)
> +{
> +	bfqd->sb_shift = bt->sb.shift;
> +
> +	/*
> +	 * In-word depths if no bfq_queue is being weight-raised:
> +	 * leaving 25% of tags only for sync reads.
> +	 *
> +	 * In next formulas, right-shift the value
> +	 * (1U<<bfqd->sb_shift), instead of computing directly
> +	 * (1U<<(bfqd->sb_shift - something)), to be robust against
> +	 * any possible value of bfqd->sb_shift, without having to
> +	 * limit 'something'.
> +	 */
> +	/* no more than 50% of tags for async I/O */
> +	bfqd->word_depths[0][0] = max((1U<<bfqd->sb_shift)>>1, 1U);
> +	/*
> +	 * no more than 75% of tags for sync writes (25% extra tags
> +	 * w.r.t. async I/O, to prevent async I/O from starving sync
> +	 * writes)
> +	 */
> +	bfqd->word_depths[0][1] = max(((1U<<bfqd->sb_shift) * 3)>>2, 1U);
> +
> +	/*
> +	 * In-word depths in case some bfq_queue is being weight-
> +	 * raised: leaving ~63% of tags for sync reads. This is the
> +	 * highest percentage for which, in our tests, application
> +	 * start-up times didn't suffer from any regression due to tag
> +	 * shortage.
> +	 */
> +	/* no more than ~18% of tags for async I/O */
> +	bfqd->word_depths[1][0] = max(((1U<<bfqd->sb_shift) * 3)>>4, 1U);
> +	/* no more than ~37% of tags for sync writes (~20% extra tags) */
> +	bfqd->word_depths[1][1] = max(((1U<<bfqd->sb_shift) * 6)>>4, 1U);
> +}
> +
> +static int bfq_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int index)
> +{
> +	struct bfq_data *bfqd = hctx->queue->elevator->elevator_data;
> +	struct blk_mq_tags *tags = hctx->sched_tags;
> +
> +	bfq_update_depths(bfqd, &tags->bitmap_tags);
> +	return 0;
> +}
> +
>  static void bfq_exit_queue(struct elevator_queue *e)
>  {
>  	struct bfq_data *bfqd = e->elevator_data;
> @@ -5547,6 +5549,7 @@ static struct elevator_type iosched_bfq_mq = {
>  		.requests_merged	= bfq_requests_merged,
>  		.request_merged		= bfq_request_merged,
>  		.has_work		= bfq_has_work,
> +		.init_hctx		= bfq_init_hctx,
>  		.init_sched		= bfq_init_queue,
>  		.exit_sched		= bfq_exit_queue,
>  	},
> -- 
> 2.7.4
> 


* Re: [PATCH 4/9] bfq-iosched: remove unused variable
  2018-05-10 16:24 ` [PATCH 4/9] bfq-iosched: remove unused variable Jens Axboe
@ 2018-05-10 17:00   ` Omar Sandoval
  0 siblings, 0 replies; 20+ messages in thread
From: Omar Sandoval @ 2018-05-10 17:00 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, osandov, efault, paolo.valente

On Thu, May 10, 2018 at 10:24:22AM -0600, Jens Axboe wrote:
> bfqd->sb_shift was intended as a cache for the sbitmap queue
> shift, but we don't need it, as the shift never changes. Kill it
> with fire.
> 
> Acked-by: Paolo Valente <paolo.valente@linaro.org>

Reviewed-by: Omar Sandoval <osandov@fb.com>

> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  block/bfq-iosched.c | 16 +++++++---------
>  block/bfq-iosched.h |  6 ------
>  2 files changed, 7 insertions(+), 15 deletions(-)
> 
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index 0cd8aa80c32d..10294124d597 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -5085,26 +5085,24 @@ void bfq_put_async_queues(struct bfq_data *bfqd, struct bfq_group *bfqg)
>   */
>  static void bfq_update_depths(struct bfq_data *bfqd, struct sbitmap_queue *bt)
>  {
> -	bfqd->sb_shift = bt->sb.shift;
> -
>  	/*
>  	 * In-word depths if no bfq_queue is being weight-raised:
>  	 * leaving 25% of tags only for sync reads.
>  	 *
>  	 * In next formulas, right-shift the value
> -	 * (1U<<bfqd->sb_shift), instead of computing directly
> -	 * (1U<<(bfqd->sb_shift - something)), to be robust against
> -	 * any possible value of bfqd->sb_shift, without having to
> +	 * (1U<<bt->sb.shift), instead of computing directly
> +	 * (1U<<(bt->sb.shift - something)), to be robust against
> +	 * any possible value of bt->sb.shift, without having to
>  	 * limit 'something'.
>  	 */
>  	/* no more than 50% of tags for async I/O */
> -	bfqd->word_depths[0][0] = max((1U<<bfqd->sb_shift)>>1, 1U);
> +	bfqd->word_depths[0][0] = max((1U << bt->sb.shift) >> 1, 1U);
>  	/*
>  	 * no more than 75% of tags for sync writes (25% extra tags
>  	 * w.r.t. async I/O, to prevent async I/O from starving sync
>  	 * writes)
>  	 */
> -	bfqd->word_depths[0][1] = max(((1U<<bfqd->sb_shift) * 3)>>2, 1U);
> +	bfqd->word_depths[0][1] = max(((1U << bt->sb.shift) * 3) >> 2, 1U);
>  
>  	/*
>  	 * In-word depths in case some bfq_queue is being weight-
> @@ -5114,9 +5112,9 @@ static void bfq_update_depths(struct bfq_data *bfqd, struct sbitmap_queue *bt)
>  	 * shortage.
>  	 */
>  	/* no more than ~18% of tags for async I/O */
> -	bfqd->word_depths[1][0] = max(((1U<<bfqd->sb_shift) * 3)>>4, 1U);
> +	bfqd->word_depths[1][0] = max(((1U << bt->sb.shift) * 3) >> 4, 1U);
>  	/* no more than ~37% of tags for sync writes (~20% extra tags) */
> -	bfqd->word_depths[1][1] = max(((1U<<bfqd->sb_shift) * 6)>>4, 1U);
> +	bfqd->word_depths[1][1] = max(((1U << bt->sb.shift) * 6) >> 4, 1U);
>  }
>  
>  static int bfq_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int index)
> diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
> index 8ec7ff92cd6f..faac509cb35e 100644
> --- a/block/bfq-iosched.h
> +++ b/block/bfq-iosched.h
> @@ -636,12 +636,6 @@ struct bfq_data {
>  	struct bfq_queue *bio_bfqq;
>  
>  	/*
> -	 * Cached sbitmap shift, used to compute depth limits in
> -	 * bfq_update_depths.
> -	 */
> -	unsigned int sb_shift;
> -
> -	/*
>  	 * Depth limits used in bfq_limit_depth (see comments on the
>  	 * function)
>  	 */
> -- 
> 2.7.4
> 


* Re: [PATCH 5/9] sbitmap: fix missed wakeups caused by sbitmap_queue_get_shallow()
  2018-05-10 16:24 ` [PATCH 5/9] sbitmap: fix missed wakeups caused by sbitmap_queue_get_shallow() Jens Axboe
@ 2018-05-10 17:01   ` Omar Sandoval
  2018-05-10 17:09     ` Jens Axboe
  0 siblings, 1 reply; 20+ messages in thread
From: Omar Sandoval @ 2018-05-10 17:01 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, osandov, efault, paolo.valente

On Thu, May 10, 2018 at 10:24:23AM -0600, Jens Axboe wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> The sbitmap queue wake batch is calculated such that once allocations
> start blocking, all of the bits which are already allocated must be
> enough to fulfill the batch counters of all of the waitqueues. However,
> the shallow allocation depth can break this invariant, since we block
> before our full depth is being utilized. Add
> sbitmap_queue_min_shallow_depth(), which saves the minimum shallow depth
> the sbq will use, and update sbq_calc_wake_batch() to take it into
> account.
> 
> Acked-by: Paolo Valente <paolo.valente@linaro.org>
> Signed-off-by: Omar Sandoval <osandov@fb.com>

Reviewed -- haha, wait, thanks for picking this one up :)

> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  include/linux/sbitmap.h | 29 +++++++++++++++++++++++++++++
>  lib/sbitmap.c           | 49 ++++++++++++++++++++++++++++++++++++++++---------
>  2 files changed, 69 insertions(+), 9 deletions(-)


* Re: [PATCH 6/9] sbitmap: warn if using smaller shallow depth than was setup
  2018-05-10 16:24 ` [PATCH 6/9] sbitmap: warn if using smaller shallow depth than was setup Jens Axboe
@ 2018-05-10 17:02   ` Omar Sandoval
  2018-05-10 17:09     ` Jens Axboe
  0 siblings, 1 reply; 20+ messages in thread
From: Omar Sandoval @ 2018-05-10 17:02 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, osandov, efault, paolo.valente

On Thu, May 10, 2018 at 10:24:24AM -0600, Jens Axboe wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> Make sure the user passed the right value to
> sbitmap_queue_min_shallow_depth().

An unlucky bisect that lands between this change and the BFQ/Kyber
changes is going to trigger this warning. We should have it after the
BFQ/Kyber changes.

> Acked-by: Paolo Valente <paolo.valente@linaro.org>
> Signed-off-by: Omar Sandoval <osandov@fb.com>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  lib/sbitmap.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/lib/sbitmap.c b/lib/sbitmap.c
> index d21473b42465..8f0950fbaa5c 100644
> --- a/lib/sbitmap.c
> +++ b/lib/sbitmap.c
> @@ -402,6 +402,8 @@ int __sbitmap_queue_get_shallow(struct sbitmap_queue *sbq,
>  	unsigned int hint, depth;
>  	int nr;
>  
> +	WARN_ON_ONCE(shallow_depth < sbq->min_shallow_depth);
> +
>  	hint = this_cpu_read(*sbq->alloc_hint);
>  	depth = READ_ONCE(sbq->sb.depth);
>  	if (unlikely(hint >= depth)) {
> -- 
> 2.7.4
> 


* Re: [PATCH 7/9] bfq-iosched: update shallow depth to smallest one used
  2018-05-10 16:24 ` [PATCH 7/9] bfq-iosched: update shallow depth to smallest one used Jens Axboe
@ 2018-05-10 17:03   ` Omar Sandoval
  0 siblings, 0 replies; 20+ messages in thread
From: Omar Sandoval @ 2018-05-10 17:03 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, osandov, efault, paolo.valente

On Thu, May 10, 2018 at 10:24:25AM -0600, Jens Axboe wrote:
> If our shallow depth is smaller than the wake batching of sbitmap,
> we can introduce hangs. Ensure that sbitmap knows how low we'll go.
> 
> Acked-by: Paolo Valente <paolo.valente@linaro.org>

Reviewed-by: Omar Sandoval <osandov@fb.com>

> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  block/bfq-iosched.c | 17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index 10294124d597..b622e73a326a 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -5081,10 +5081,13 @@ void bfq_put_async_queues(struct bfq_data *bfqd, struct bfq_group *bfqg)
>  
>  /*
>   * See the comments on bfq_limit_depth for the purpose of
> - * the depths set in the function.
> + * the depths set in the function. Return minimum shallow depth we'll use.
>   */
> -static void bfq_update_depths(struct bfq_data *bfqd, struct sbitmap_queue *bt)
> +static unsigned int bfq_update_depths(struct bfq_data *bfqd,
> +				      struct sbitmap_queue *bt)
>  {
> +	unsigned int i, j, min_shallow = UINT_MAX;
> +
>  	/*
>  	 * In-word depths if no bfq_queue is being weight-raised:
>  	 * leaving 25% of tags only for sync reads.
> @@ -5115,14 +5118,22 @@ static void bfq_update_depths(struct bfq_data *bfqd, struct sbitmap_queue *bt)
>  	bfqd->word_depths[1][0] = max(((1U << bt->sb.shift) * 3) >> 4, 1U);
>  	/* no more than ~37% of tags for sync writes (~20% extra tags) */
>  	bfqd->word_depths[1][1] = max(((1U << bt->sb.shift) * 6) >> 4, 1U);
> +
> +	for (i = 0; i < 2; i++)
> +		for (j = 0; j < 2; j++)
> +			min_shallow = min(min_shallow, bfqd->word_depths[i][j]);
> +
> +	return min_shallow;
>  }
>  
>  static int bfq_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int index)
>  {
>  	struct bfq_data *bfqd = hctx->queue->elevator->elevator_data;
>  	struct blk_mq_tags *tags = hctx->sched_tags;
> +	unsigned int min_shallow;
>  
> -	bfq_update_depths(bfqd, &tags->bitmap_tags);
> +	min_shallow = bfq_update_depths(bfqd, &tags->bitmap_tags);
> +	sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, min_shallow);
>  	return 0;
>  }
>  
> -- 
> 2.7.4
> 


* Re: [PATCH 8/9] kyber-iosched: update shallow depth when setting up hardware queue
  2018-05-10 16:24 ` [PATCH 8/9] kyber-iosched: update shallow depth when setting up hardware queue Jens Axboe
@ 2018-05-10 17:03   ` Omar Sandoval
  0 siblings, 0 replies; 20+ messages in thread
From: Omar Sandoval @ 2018-05-10 17:03 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, osandov, efault, paolo.valente

On Thu, May 10, 2018 at 10:24:26AM -0600, Jens Axboe wrote:
> We don't expect the async depth to be smaller than the wake batch
> count for sbitmap, but just in case, inform sbitmap of what shallow
> depth kyber may use.
> 
> Acked-by: Paolo Valente <paolo.valente@linaro.org>

Reviewed-by: Omar Sandoval <osandov@fb.com>

> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  block/kyber-iosched.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
> index 564967fafe5f..5b33dc394cc7 100644
> --- a/block/kyber-iosched.c
> +++ b/block/kyber-iosched.c
> @@ -378,6 +378,7 @@ static void kyber_exit_sched(struct elevator_queue *e)
>  
>  static int kyber_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
>  {
> +	struct kyber_queue_data *kqd = hctx->queue->elevator->elevator_data;
>  	struct kyber_hctx_data *khd;
>  	int i;
>  
> @@ -400,6 +401,8 @@ static int kyber_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
>  	khd->batching = 0;
>  
>  	hctx->sched_data = khd;
> +	sbitmap_queue_min_shallow_depth(&hctx->sched_tags->bitmap_tags,
> +					kqd->async_depth);
>  
>  	return 0;
>  }
> -- 
> 2.7.4
> 


* Re: [PATCH 6/9] sbitmap: warn if using smaller shallow depth than was setup
  2018-05-10 17:02   ` Omar Sandoval
@ 2018-05-10 17:09     ` Jens Axboe
  0 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2018-05-10 17:09 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-block, osandov, efault, paolo.valente

On 5/10/18 11:02 AM, Omar Sandoval wrote:
> On Thu, May 10, 2018 at 10:24:24AM -0600, Jens Axboe wrote:
>> From: Omar Sandoval <osandov@fb.com>
>>
>> Make sure the user passed the right value to
>> sbitmap_queue_min_shallow_depth().
> 
> An unlucky bisect that lands between this change and the BFQ/Kyber
> changes is going to trigger this warning. We should have it after the
> BFQ/Kyber changes.

Good point, I'll move it after the kyber/bfq change.

-- 
Jens Axboe


* Re: [PATCH 5/9] sbitmap: fix missed wakeups caused by sbitmap_queue_get_shallow()
  2018-05-10 17:01   ` Omar Sandoval
@ 2018-05-10 17:09     ` Jens Axboe
  0 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2018-05-10 17:09 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-block, osandov, efault, paolo.valente

On 5/10/18 11:01 AM, Omar Sandoval wrote:
> On Thu, May 10, 2018 at 10:24:23AM -0600, Jens Axboe wrote:
>> From: Omar Sandoval <osandov@fb.com>
>>
>> The sbitmap queue wake batch is calculated such that once allocations
>> start blocking, all of the bits which are already allocated must be
>> enough to fulfill the batch counters of all of the waitqueues. However,
>> the shallow allocation depth can break this invariant, since we block
>> before our full depth is being utilized. Add
>> sbitmap_queue_min_shallow_depth(), which saves the minimum shallow depth
>> the sbq will use, and update sbq_calc_wake_batch() to take it into
>> account.
>>
>> Acked-by: Paolo Valente <paolo.valente@linaro.org>
>> Signed-off-by: Omar Sandoval <osandov@fb.com>
> 
> Reviewed -- haha, wait, thanks for picking this one up :)

Just checking if you're awake!

-- 
Jens Axboe

