* [PATCH 0/7] blk-mq: fix queue quiescing
@ 2017-05-25 4:21 Ming Lei
2017-05-25 4:21 ` [PATCH 1/7] blk-mq: introduce blk_mq_unquiesce_queue Ming Lei
` (7 more replies)
0 siblings, 8 replies; 15+ messages in thread
From: Ming Lei @ 2017-05-25 4:21 UTC (permalink / raw)
To: Jens Axboe, linux-block, Christoph Hellwig; +Cc: Bart Van Assche
Hi,
One big problem of blk_mq_quiesce_queue() is that it
can't prevent .queue_rq() in direct issue path from
being run even though hw queues are stopped by
blk_mq_quiesce_queue().
It is observed that request double-free/use-after-free
can be triggered easily when canceling NVMe requests via
blk_mq_tagset_busy_iter(...nvme_cancel_request) in nvme_dev_disable().
The reason is that blk_mq_quiesce_queue() can't prevent
direct issue from being run.
This patchset implements queue quiescing via percpu_ref
and fixes the above issue.
Actually we have to quiesce queue for canceling dispatched
requests via blk_mq_tagset_busy_iter(), otherwise use-after-free
can be made easily. This way of canceling dispatched requests
has been used in several drivers, only NVMe uses blk_mq_quiesce_queue()
to avoid the issue, and others need to be fixed too.
Also this patchset cleans up the code much, and the fat 'srcu_struct'
can be removed from 'struct blk_mq_hw_ctx'.
Thanks,
Ming
Ming Lei (7):
blk-mq: introduce blk_mq_unquiesce_queue
block: introduce flag of QUEUE_FLAG_QUIESCED
dm: use the introduced blk_mq_unquiesce_queue()
nvme: use the introduced blk_mq_unquiesce_queue()
scsi: use the introduced blk_mq_unquiesce_queue()
blk-mq: quiesce queue via percpu_ref
blk-mq: update comments on blk_mq_quiesce_queue()
block/blk-mq.c | 131 +++++++++++++++++++++++++++++++----------------
drivers/md/dm-rq.c | 2 +-
drivers/nvme/host/core.c | 2 +-
drivers/scsi/scsi_lib.c | 5 +-
include/linux/blk-mq.h | 2 -
include/linux/blkdev.h | 8 +++
6 files changed, 102 insertions(+), 48 deletions(-)
--
2.9.4
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/7] blk-mq: introduce blk_mq_unquiesce_queue
2017-05-25 4:21 [PATCH 0/7] blk-mq: fix queue quiescing Ming Lei
@ 2017-05-25 4:21 ` Ming Lei
2017-05-25 4:21 ` [PATCH 2/7] block: introduce flag of QUEUE_FLAG_QUIESCED Ming Lei
` (6 subsequent siblings)
7 siblings, 0 replies; 15+ messages in thread
From: Ming Lei @ 2017-05-25 4:21 UTC (permalink / raw)
To: Jens Axboe, linux-block, Christoph Hellwig; +Cc: Bart Van Assche
blk_mq_unquiesce_queue() is introduced to for rewriting
the current quiescing mechanism, which will be done
in the following patches.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-mq.c | 13 +++++++++++++
include/linux/blkdev.h | 1 +
2 files changed, 14 insertions(+)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index f2224ffd225d..a26fee3fb389 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -181,6 +181,19 @@ void blk_mq_quiesce_queue(struct request_queue *q)
}
EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
+/**
+ * blk_mq_unquiesce_queue() - pair of blk_mq_quiesce_queue()
+ * @q: request queue.
+ *
+ * This function recovers queue into the state before quiescing
+ * done by blk_mq_quiesce_queue
+ */
+void blk_mq_unquiesce_queue(struct request_queue *q)
+{
+ blk_mq_start_stopped_hw_queues(q, true);
+}
+EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);
+
void blk_mq_wake_waiters(struct request_queue *q)
{
struct blk_mq_hw_ctx *hctx;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index ab92c4ea138b..41291be82ac4 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -964,6 +964,7 @@ extern void __blk_run_queue_uncond(struct request_queue *q);
extern void blk_run_queue(struct request_queue *);
extern void blk_run_queue_async(struct request_queue *q);
extern void blk_mq_quiesce_queue(struct request_queue *q);
+extern void blk_mq_unquiesce_queue(struct request_queue *q);
extern int blk_rq_map_user(struct request_queue *, struct request *,
struct rq_map_data *, void __user *, unsigned long,
gfp_t);
--
2.9.4
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 2/7] block: introduce flag of QUEUE_FLAG_QUIESCED
2017-05-25 4:21 [PATCH 0/7] blk-mq: fix queue quiescing Ming Lei
2017-05-25 4:21 ` [PATCH 1/7] blk-mq: introduce blk_mq_unquiesce_queue Ming Lei
@ 2017-05-25 4:21 ` Ming Lei
2017-05-25 4:21 ` [PATCH 3/7] dm: use the introduced blk_mq_unquiesce_queue() Ming Lei
` (5 subsequent siblings)
7 siblings, 0 replies; 15+ messages in thread
From: Ming Lei @ 2017-05-25 4:21 UTC (permalink / raw)
To: Jens Axboe, linux-block, Christoph Hellwig; +Cc: Bart Van Assche
This flag is introduced for reimplementing the quiescing code.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
include/linux/blkdev.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 41291be82ac4..60967797f4f6 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -618,6 +618,7 @@ struct request_queue {
#define QUEUE_FLAG_STATS 27 /* track rq completion times */
#define QUEUE_FLAG_POLL_STATS 28 /* collecting stats for hybrid polling */
#define QUEUE_FLAG_REGISTERED 29 /* queue has been registered to a disk */
+#define QUEUE_FLAG_QUIESCED 30 /* queue has been quiesced */
#define QUEUE_FLAG_DEFAULT ((1 << QUEUE_FLAG_IO_STAT) | \
(1 << QUEUE_FLAG_STACKABLE) | \
@@ -712,6 +713,7 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
#define blk_noretry_request(rq) \
((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
REQ_FAILFAST_DRIVER))
+#define blk_queue_quiesced(q) test_bit(QUEUE_FLAG_QUIESCED, &(q)->queue_flags)
static inline bool blk_account_rq(struct request *rq)
{
--
2.9.4
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 3/7] dm: use the introduced blk_mq_unquiesce_queue()
2017-05-25 4:21 [PATCH 0/7] blk-mq: fix queue quiescing Ming Lei
2017-05-25 4:21 ` [PATCH 1/7] blk-mq: introduce blk_mq_unquiesce_queue Ming Lei
2017-05-25 4:21 ` [PATCH 2/7] block: introduce flag of QUEUE_FLAG_QUIESCED Ming Lei
@ 2017-05-25 4:21 ` Ming Lei
2017-05-25 4:21 ` [PATCH 4/7] nvme: " Ming Lei
` (4 subsequent siblings)
7 siblings, 0 replies; 15+ messages in thread
From: Ming Lei @ 2017-05-25 4:21 UTC (permalink / raw)
To: Jens Axboe, linux-block, Christoph Hellwig; +Cc: Bart Van Assche
blk_mq_unquiesce_queue() is used for unquiescing the queue.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
drivers/md/dm-rq.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 2af27026aa2e..673fcf075077 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -71,7 +71,7 @@ static void dm_old_start_queue(struct request_queue *q)
static void dm_mq_start_queue(struct request_queue *q)
{
- blk_mq_start_stopped_hw_queues(q, true);
+ blk_mq_unquiesce_queue(q);
blk_mq_kick_requeue_list(q);
}
--
2.9.4
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 4/7] nvme: use the introduced blk_mq_unquiesce_queue()
2017-05-25 4:21 [PATCH 0/7] blk-mq: fix queue quiescing Ming Lei
` (2 preceding siblings ...)
2017-05-25 4:21 ` [PATCH 3/7] dm: use the introduced blk_mq_unquiesce_queue() Ming Lei
@ 2017-05-25 4:21 ` Ming Lei
2017-05-25 4:21 ` [PATCH 5/7] scsi: " Ming Lei
` (3 subsequent siblings)
7 siblings, 0 replies; 15+ messages in thread
From: Ming Lei @ 2017-05-25 4:21 UTC (permalink / raw)
To: Jens Axboe, linux-block, Christoph Hellwig; +Cc: Bart Van Assche
blk_mq_unquiesce_queue() is used for unquiescing the queue.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
drivers/nvme/host/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 04e115834702..231d36028afc 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2514,7 +2514,7 @@ void nvme_start_queues(struct nvme_ctrl *ctrl)
mutex_lock(&ctrl->namespaces_mutex);
list_for_each_entry(ns, &ctrl->namespaces, list) {
- blk_mq_start_stopped_hw_queues(ns->queue, true);
+ blk_mq_unquiesce_queue(ns->queue);
blk_mq_kick_requeue_list(ns->queue);
}
mutex_unlock(&ctrl->namespaces_mutex);
--
2.9.4
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 5/7] scsi: use the introduced blk_mq_unquiesce_queue()
2017-05-25 4:21 [PATCH 0/7] blk-mq: fix queue quiescing Ming Lei
` (3 preceding siblings ...)
2017-05-25 4:21 ` [PATCH 4/7] nvme: " Ming Lei
@ 2017-05-25 4:21 ` Ming Lei
2017-05-25 4:21 ` [PATCH 6/7] blk-mq: quiesce queue via percpu_ref Ming Lei
` (2 subsequent siblings)
7 siblings, 0 replies; 15+ messages in thread
From: Ming Lei @ 2017-05-25 4:21 UTC (permalink / raw)
To: Jens Axboe, linux-block, Christoph Hellwig; +Cc: Bart Van Assche
blk_mq_unquiesce_queue() is used for unquiescing the queue.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
drivers/scsi/scsi_lib.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 814a4bd8405d..72b11f75719c 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -3030,7 +3030,10 @@ scsi_internal_device_unblock(struct scsi_device *sdev,
return -EINVAL;
if (q->mq_ops) {
- blk_mq_start_stopped_hw_queues(q, false);
+ if (blk_queue_quiesced(q))
+ blk_mq_unquiesce_queue(q);
+ else
+ blk_mq_start_stopped_hw_queues(q, false);
} else {
spin_lock_irqsave(q->queue_lock, flags);
blk_start_queue(q);
--
2.9.4
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 6/7] blk-mq: quiesce queue via percpu_ref
2017-05-25 4:21 [PATCH 0/7] blk-mq: fix queue quiescing Ming Lei
` (4 preceding siblings ...)
2017-05-25 4:21 ` [PATCH 5/7] scsi: " Ming Lei
@ 2017-05-25 4:21 ` Ming Lei
2017-05-25 4:21 ` [PATCH 7/7] blk-mq: update comments on blk_mq_quiesce_queue() Ming Lei
2017-05-25 5:24 ` [PATCH 0/7] blk-mq: fix queue quiescing Bart Van Assche
7 siblings, 0 replies; 15+ messages in thread
From: Ming Lei @ 2017-05-25 4:21 UTC (permalink / raw)
To: Jens Axboe, linux-block, Christoph Hellwig; +Cc: Bart Van Assche
One big problem of blk_mq_quiesce_queue() is that it
can't prevent .queue_rq() in direct issue path from
being run even though hw queues are stopped.
It is observed that request double-free/use-after-free
can be triggered easily when canceling NVMe requests via
blk_mq_tagset_busy_iter(...nvme_cancel_request) in nvme_dev_disable().
The reason is that blk_mq_quiesce_queue() doesn't prevent
dispatch from being run.
Actually other drivers(mtip32xx, nbd) need to quiesce
queue first before canceling dispatched requests via
blk_mq_tagset_busy_iter(), otherwise use-after-free
can be caused.
This patch implements queue quiescing via percpu_ref,
and fixes the above issue, also has the following benefits:
- rcu & srcu are used for non-blocking and blocking respectively,
code becomes much clean after we unify both via percpu_ref
- the fat 'srcu_struct' can be removed from 'blk_mq_hw_ctx'
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-mq.c | 112 +++++++++++++++++++++++++++++++------------------
include/linux/blk-mq.h | 2 -
include/linux/blkdev.h | 5 +++
3 files changed, 77 insertions(+), 42 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index a26fee3fb389..6316efb42e04 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -164,20 +164,23 @@ EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
*/
void blk_mq_quiesce_queue(struct request_queue *q)
{
- struct blk_mq_hw_ctx *hctx;
- unsigned int i;
- bool rcu = false;
+ bool old;
__blk_mq_stop_hw_queues(q, true);
- queue_for_each_hw_ctx(q, hctx, i) {
- if (hctx->flags & BLK_MQ_F_BLOCKING)
- synchronize_srcu(&hctx->queue_rq_srcu);
- else
- rcu = true;
- }
- if (rcu)
- synchronize_rcu();
+ mutex_lock(&q->quiesce_mutex);
+
+ spin_lock_irq(q->queue_lock);
+ old = queue_flag_test_and_set(QUEUE_FLAG_QUIESCED, q);
+ spin_unlock_irq(q->queue_lock);
+
+ if (old)
+ goto exit;
+
+ percpu_ref_kill(&q->q_dispatch_counter);
+ wait_event(q->quiesce_wq, percpu_ref_is_zero(&q->q_dispatch_counter));
+ exit:
+ mutex_unlock(&q->quiesce_mutex);
}
EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
@@ -190,6 +193,21 @@ EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
*/
void blk_mq_unquiesce_queue(struct request_queue *q)
{
+ bool old;
+
+ mutex_lock(&q->quiesce_mutex);
+
+ spin_lock_irq(q->queue_lock);
+ old = queue_flag_test_and_clear(QUEUE_FLAG_QUIESCED, q);
+ spin_unlock_irq(q->queue_lock);
+
+ if (!old)
+ goto exit;
+
+ percpu_ref_reinit(&q->q_dispatch_counter);
+ exit:
+ mutex_unlock(&q->quiesce_mutex);
+
blk_mq_start_stopped_hw_queues(q, true);
}
EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);
@@ -209,6 +227,9 @@ void blk_mq_wake_waiters(struct request_queue *q)
* the queue are notified as well.
*/
wake_up_all(&q->mq_freeze_wq);
+
+ /* Forcibly unquiesce the queue to avoid having stuck requests */
+ blk_mq_unquiesce_queue(q);
}
bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx)
@@ -1099,24 +1120,28 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list)
return (queued + errors) != 0;
}
-static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
+static inline bool blk_mq_dispatch_start(struct request_queue *q)
+{
+ return percpu_ref_tryget_live(&q->q_dispatch_counter);
+}
+
+static inline void blk_mq_dispatch_end(struct request_queue *q)
{
- int srcu_idx;
+ percpu_ref_put(&q->q_dispatch_counter);
+}
+static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
+{
WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
cpu_online(hctx->next_cpu));
- if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
- rcu_read_lock();
- blk_mq_sched_dispatch_requests(hctx);
- rcu_read_unlock();
- } else {
- might_sleep();
+ if (!blk_mq_dispatch_start(hctx->queue))
+ return;
- srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu);
- blk_mq_sched_dispatch_requests(hctx);
- srcu_read_unlock(&hctx->queue_rq_srcu, srcu_idx);
- }
+ might_sleep_if(hctx->flags & BLK_MQ_F_BLOCKING);
+ blk_mq_sched_dispatch_requests(hctx);
+
+ blk_mq_dispatch_end(hctx->queue);
}
/*
@@ -1519,18 +1544,14 @@ static void __blk_mq_try_issue_directly(struct request *rq, blk_qc_t *cookie,
static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
struct request *rq, blk_qc_t *cookie)
{
- if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
- rcu_read_lock();
- __blk_mq_try_issue_directly(rq, cookie, false);
- rcu_read_unlock();
- } else {
- unsigned int srcu_idx;
-
- might_sleep();
+ bool blocking = hctx->flags & BLK_MQ_F_BLOCKING;
- srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu);
- __blk_mq_try_issue_directly(rq, cookie, true);
- srcu_read_unlock(&hctx->queue_rq_srcu, srcu_idx);
+ if (blk_mq_dispatch_start(hctx->queue)) {
+ might_sleep_if(blocking);
+ __blk_mq_try_issue_directly(rq, cookie, blocking);
+ blk_mq_dispatch_end(hctx->queue);
+ } else {
+ blk_mq_sched_insert_request(rq, false, false, false, blocking);
}
}
@@ -1869,9 +1890,6 @@ static void blk_mq_exit_hctx(struct request_queue *q,
if (set->ops->exit_hctx)
set->ops->exit_hctx(hctx, hctx_idx);
- if (hctx->flags & BLK_MQ_F_BLOCKING)
- cleanup_srcu_struct(&hctx->queue_rq_srcu);
-
blk_mq_remove_cpuhp(hctx);
blk_free_flush_queue(hctx->fq);
sbitmap_free(&hctx->ctx_map);
@@ -1942,9 +1960,6 @@ static int blk_mq_init_hctx(struct request_queue *q,
node))
goto free_fq;
- if (hctx->flags & BLK_MQ_F_BLOCKING)
- init_srcu_struct(&hctx->queue_rq_srcu);
-
blk_mq_debugfs_register_hctx(q, hctx);
return 0;
@@ -2272,12 +2287,27 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
blk_mq_sysfs_register(q);
}
+static void blk_queue_dispatch_counter_release(struct percpu_ref *ref)
+{
+ struct request_queue *q =
+ container_of(ref, struct request_queue, q_dispatch_counter);
+
+ wake_up_all(&q->quiesce_wq);
+}
+
struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
struct request_queue *q)
{
/* mark the queue as mq asap */
q->mq_ops = set->ops;
+ init_waitqueue_head(&q->quiesce_wq);
+ mutex_init(&q->quiesce_mutex);
+ if (percpu_ref_init(&q->q_dispatch_counter,
+ blk_queue_dispatch_counter_release,
+ 0, GFP_KERNEL))
+ goto err_ref;
+
q->poll_cb = blk_stat_alloc_callback(blk_mq_poll_stats_fn,
blk_mq_poll_stats_bkt,
BLK_MQ_POLL_STATS_BKTS, q);
@@ -2360,6 +2390,8 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
err_percpu:
free_percpu(q->queue_ctx);
err_exit:
+ percpu_ref_exit(&q->q_dispatch_counter);
+err_ref:
q->mq_ops = NULL;
return ERR_PTR(-ENOMEM);
}
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index fcd641032f8d..6da015e6d38a 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -39,8 +39,6 @@ struct blk_mq_hw_ctx {
struct blk_mq_tags *tags;
struct blk_mq_tags *sched_tags;
- struct srcu_struct queue_rq_srcu;
-
unsigned long queued;
unsigned long run;
#define BLK_MQ_MAX_DISPATCH_ORDER 7
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 60967797f4f6..a85841e9113d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -586,6 +586,11 @@ struct request_queue {
size_t cmd_size;
void *rq_alloc_data;
+
+ /* for blk_mq_quiesce_queue() */
+ wait_queue_head_t quiesce_wq;
+ struct percpu_ref q_dispatch_counter;
+ struct mutex quiesce_mutex;
};
#define QUEUE_FLAG_QUEUED 1 /* uses generic tag queueing */
--
2.9.4
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 7/7] blk-mq: update comments on blk_mq_quiesce_queue()
2017-05-25 4:21 [PATCH 0/7] blk-mq: fix queue quiescing Ming Lei
` (5 preceding siblings ...)
2017-05-25 4:21 ` [PATCH 6/7] blk-mq: quiesce queue via percpu_ref Ming Lei
@ 2017-05-25 4:21 ` Ming Lei
2017-05-25 5:24 ` [PATCH 0/7] blk-mq: fix queue quiescing Bart Van Assche
7 siblings, 0 replies; 15+ messages in thread
From: Ming Lei @ 2017-05-25 4:21 UTC (permalink / raw)
To: Jens Axboe, linux-block, Christoph Hellwig; +Cc: Bart Van Assche
Actually what we want to get from blk_mq_quiesce_queue()
isn't only to wait for completion of all ongooing .queue_rq().
In the typical context of canceling requests, we need to
make sure that the following is done in the dispatch path
before starting to cancel requests:
- failed dispatched request is freeed
- busy dispatched request is requeued, and the STARTED
flag is cleared
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-mq.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6316efb42e04..a8f185f7e133 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -155,12 +155,12 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
/**
- * blk_mq_quiesce_queue() - wait until all ongoing queue_rq calls have finished
+ * blk_mq_quiesce_queue() - wait until all ongoing dispatching have finished
* @q: request queue.
*
* Note: this function does not prevent that the struct request end_io()
- * callback function is invoked. Additionally, it is not prevented that
- * new queue_rq() calls occur unless the queue has been stopped first.
+ * callback function is invoked. Once this function is returned, we make
+ * sure no dispatching can happen.
*/
void blk_mq_quiesce_queue(struct request_queue *q)
{
--
2.9.4
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 0/7] blk-mq: fix queue quiescing
2017-05-25 4:21 [PATCH 0/7] blk-mq: fix queue quiescing Ming Lei
` (6 preceding siblings ...)
2017-05-25 4:21 ` [PATCH 7/7] blk-mq: update comments on blk_mq_quiesce_queue() Ming Lei
@ 2017-05-25 5:24 ` Bart Van Assche
2017-05-25 9:09 ` Ming Lei
7 siblings, 1 reply; 15+ messages in thread
From: Bart Van Assche @ 2017-05-25 5:24 UTC (permalink / raw)
To: hch, linux-block, axboe, ming.lei
T24gVGh1LCAyMDE3LTA1LTI1IGF0IDEyOjIxICswODAwLCBNaW5nIExlaSB3cm90ZToNCj4gT25l
IGJpZyBwcm9ibGVtIG9mIGJsa19tcV9xdWllc2NlX3F1ZXVlKCkgaXMgdGhhdCBpdA0KPiBjYW4n
dCBwcmV2ZW50IC5xdWV1ZV9ycSgpIGluIGRpcmVjdCBpc3N1ZSBwYXRoIGZyb20NCj4gYmVpbmcg
cnVuIGV2ZW4gdGhvdWdoIGh3IHF1ZXVlcyBhcmUgc3RvcHBlZCBieQ0KPiBibGtfbXFfcXVpZXNj
ZV9xdWV1ZSgpLg0KDQpUaGF0J3Mgd3JvbmcuIEFsbCB3aGF0J3MgbmVlZGVkIHRvIHByZXZlbnQg
dGhhdA0KX19ibGtfbXFfdHJ5X2lzc3VlX2RpcmVjdGx5KCkgY2FsbHMgLnF1ZXVlX3JxKCkgZm9y
IGEgc3RvcHBlZA0KcXVldWUgaXMgdG8gY2hlY2sgaW4gX19ibGtfbXFfdHJ5X2lzc3VlX2RpcmVj
dGx5KCkgd2hldGhlciB0aGUNCnJlbGV2YW50IGhhcmR3YXJlIHF1ZXVlIGhhcyBiZWVuIHN0b3Bw
ZWQuIFRoYXQgYXBwcm9hY2ggaGFzIHRoZQ0KZm9sbG93aW5nIHR3byBhZHZhbnRhZ2VzIG92ZXIg
dGhlIGFwcHJvYWNoIG9mIHlvdXIgcGF0Y2ggc2VyaWVzOg0KLSBMb3dlciBjb2RlIGNvbXBsZXhp
dHkgYW5kIGhlbmNlIGVhc2llciB0byByZXZpZXcgYW5kIHRvIG1haW50YWluLg0KLSBGYXN0ZXIg
Zm9yIHF1ZXVlcyBmb3Igd2hpY2ggQkxLX01RX0ZfQkxPQ0tJTkcgaGFzIG5vdCBiZWVuIHNldC4N
CiAgRm9yIG5vbi1wcmVlbXB0aWJsZSBrZXJuZWxzIHJjdV9yZWFkX2xvY2soKSBhbmQgcmN1X3Jl
YWRfbG9jaygpDQogIGFyZSBvcHRpbWl6ZWQgb3V0LiBUaGF0IGlzIG5vdCBwb3NzaWJsZSBmb3IN
CiAgcGVyY3B1X3JlZl90cnlnZXRfbGl2ZSgpIC8gcGVyY3B1X3JlZl9wdXQoKS4NCg0KQmFydC4=
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/7] blk-mq: fix queue quiescing
2017-05-25 5:24 ` [PATCH 0/7] blk-mq: fix queue quiescing Bart Van Assche
@ 2017-05-25 9:09 ` Ming Lei
2017-05-25 17:24 ` Bart Van Assche
0 siblings, 1 reply; 15+ messages in thread
From: Ming Lei @ 2017-05-25 9:09 UTC (permalink / raw)
To: Bart Van Assche; +Cc: hch, linux-block, axboe
On Thu, May 25, 2017 at 05:24:54AM +0000, Bart Van Assche wrote:
> On Thu, 2017-05-25 at 12:21 +0800, Ming Lei wrote:
> > One big problem of blk_mq_quiesce_queue() is that it
> > can't prevent .queue_rq() in direct issue path from
> > being run even though hw queues are stopped by
> > blk_mq_quiesce_queue().
>
> That's wrong. All what's needed to prevent that
> __blk_mq_try_issue_directly() calls .queue_rq() for a stopped
> queue is to check in __blk_mq_try_issue_directly() whether the
That should work, and I just didn't want to introduce a check
on global variable in fast path, but just figured out that
it is inevitable for percpu_ref too.
> relevant hardware queue has been stopped. That approach has the
I don't think it is a good idea to abuse stopped state for checking
queue quiesce, actually stopping queue has made lots of trouble,
we should avoid it in blk_mq_quiesce_queue() in the future, even
most of them in driver side.
So I will introduce a queue quiesce flag like in patch 2 for this
purpose.
> following two advantages over the approach of your patch series:
> - Lower code complexity and hence easier to review and to maintain.
> - Faster for queues for which BLK_MQ_F_BLOCKING has not been set.
> For non-preemptible kernels rcu_read_lock() and rcu_read_lock()
> are optimized out. That is not possible for
> percpu_ref_tryget_live() / percpu_ref_put().
That won't be issue for perpcu access at all.
Another big issue is that 'srcu_struct' is very big, which shouldn't
be embedded into hctx, since we only have one real user of
BLK_MQ_F_BLOCKING.
So I will fix that too.
Thanks,
Ming
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/7] blk-mq: fix queue quiescing
2017-05-25 9:09 ` Ming Lei
@ 2017-05-25 17:24 ` Bart Van Assche
2017-05-25 17:31 ` Jens Axboe
2017-05-25 17:42 ` Jens Axboe
0 siblings, 2 replies; 15+ messages in thread
From: Bart Van Assche @ 2017-05-25 17:24 UTC (permalink / raw)
To: ming.lei; +Cc: hch, linux-block, axboe
On Thu, 2017-05-25 at 17:09 +0800, Ming Lei wrote:
> Another big issue is that 'srcu_struct' is very big, which shouldn't
> be embedded into hctx, since we only have one real user of
> BLK_MQ_F_BLOCKING.
>=20
> So I will fix that too.
Hello Ming,
Is something like the (untested) patch below perhaps what you had in mind?
Subject: [PATCH] blk-mq: Reduce blk_mq_hw_ctx size
Since the srcu structure is rather large (184 bytes on an x86-64
system), only allocate it if needed.
Reported-by: Ming Lei <ming.lei@redhat.com>
---
=A0block/blk-mq.c=A0=A0=A0=A0=A0=A0=A0=A0=A0| 13 ++++++++++++-
=A0include/linux/blk-mq.h |=A0=A05 +++--
=A02 files changed, 15 insertions(+), 3 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 1e330de4e3c5..15b7d4077638 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2233,6 +2233,17 @@ struct request_queue *blk_mq_init_queue(struct blk_m=
q_tag_set *set)
=A0}
=A0EXPORT_SYMBOL(blk_mq_init_queue);
=A0
+static int blk_mq_hw_ctx_size(struct request_queue *q)
+{
+ BUILD_BUG_ON(ALIGN(offsetof(struct blk_mq_hw_ctx, queue_rq_srcu) +
+ sizeof(((struct blk_mq_hw_ctx *)NULL)->queue_rq_srcu), 64) !=3D
+ sizeof(struct blk_mq_hw_ctx));
+
+ return q->tag_set->flags & BLK_MQ_F_BLOCKING ?
+ sizeof(struct blk_mq_hw_ctx) :
+ offsetof(struct blk_mq_hw_ctx, queue_rq_srcu);
+}
+
=A0static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
=A0 struct request_queue *q)
=A0{
@@ -2247,7 +2258,7 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_=
set *set,
=A0 continue;
=A0
=A0 node =3D blk_mq_hw_queue_to_node(q->mq_map, i);
- hctxs[i] =3D kzalloc_node(sizeof(struct blk_mq_hw_ctx),
+ hctxs[i] =3D kzalloc_node(blk_mq_hw_ctx_size(q),
=A0 GFP_KERNEL, node);
=A0 if (!hctxs[i])
=A0 break;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index c0d59330b5e0..8467e1f83524 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -39,8 +39,6 @@ struct blk_mq_hw_ctx {
=A0 struct blk_mq_tags *tags;
=A0 struct blk_mq_tags *sched_tags;
=A0
- struct srcu_struct queue_rq_srcu;
-
=A0 unsigned long queued;
=A0 unsigned long run;
=A0#define BLK_MQ_MAX_DISPATCH_ORDER 7
@@ -62,6 +60,9 @@ struct blk_mq_hw_ctx {
=A0 struct dentry *debugfs_dir;
=A0 struct dentry *sched_debugfs_dir;
=A0#endif
+
+ /* Must be the last member - see also blk_mq_hw_ctx_size(). */
+ struct srcu_struct queue_rq_srcu;
=A0};
=A0
=A0struct blk_mq_tag_set {
--=A0
2.12.2
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 0/7] blk-mq: fix queue quiescing
2017-05-25 17:24 ` Bart Van Assche
@ 2017-05-25 17:31 ` Jens Axboe
2017-05-25 17:42 ` Jens Axboe
1 sibling, 0 replies; 15+ messages in thread
From: Jens Axboe @ 2017-05-25 17:31 UTC (permalink / raw)
To: Bart Van Assche, ming.lei; +Cc: hch, linux-block
On 05/25/2017 11:24 AM, Bart Van Assche wrote:
> On Thu, 2017-05-25 at 17:09 +0800, Ming Lei wrote:
>> Another big issue is that 'srcu_struct' is very big, which shouldn't
>> be embedded into hctx, since we only have one real user of
>> BLK_MQ_F_BLOCKING.
>>
>> So I will fix that too.
>
> Hello Ming,
>
> Is something like the (untested) patch below perhaps what you had in mind?
>
> Subject: [PATCH] blk-mq: Reduce blk_mq_hw_ctx size
>
> Since the srcu structure is rather large (184 bytes on an x86-64
> system), only allocate it if needed.
On my normal laptop setup, it's actually 408 bytes (!!).
--
Jens Axboe
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/7] blk-mq: fix queue quiescing
2017-05-25 17:24 ` Bart Van Assche
2017-05-25 17:31 ` Jens Axboe
@ 2017-05-25 17:42 ` Jens Axboe
2017-05-25 17:59 ` Bart Van Assche
2017-05-26 0:44 ` Ming Lei
1 sibling, 2 replies; 15+ messages in thread
From: Jens Axboe @ 2017-05-25 17:42 UTC (permalink / raw)
To: Bart Van Assche, ming.lei; +Cc: hch, linux-block
On 05/25/2017 11:24 AM, Bart Van Assche wrote:
> On Thu, 2017-05-25 at 17:09 +0800, Ming Lei wrote:
>> Another big issue is that 'srcu_struct' is very big, which shouldn't
>> be embedded into hctx, since we only have one real user of
>> BLK_MQ_F_BLOCKING.
>>
>> So I will fix that too.
>
> Hello Ming,
>
> Is something like the (untested) patch below perhaps what you had in mind?
>
> Subject: [PATCH] blk-mq: Reduce blk_mq_hw_ctx size
>
> Since the srcu structure is rather large (184 bytes on an x86-64
> system), only allocate it if needed.
>
> Reported-by: Ming Lei <ming.lei@redhat.com>
> ---
> block/blk-mq.c | 13 ++++++++++++-
> include/linux/blk-mq.h | 5 +++--
> 2 files changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 1e330de4e3c5..15b7d4077638 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2233,6 +2233,17 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
> }
> EXPORT_SYMBOL(blk_mq_init_queue);
>
> +static int blk_mq_hw_ctx_size(struct request_queue *q)
> +{
> + BUILD_BUG_ON(ALIGN(offsetof(struct blk_mq_hw_ctx, queue_rq_srcu) +
> + sizeof(((struct blk_mq_hw_ctx *)NULL)->queue_rq_srcu), 64) !=
> + sizeof(struct blk_mq_hw_ctx));
> +
> + return q->tag_set->flags & BLK_MQ_F_BLOCKING ?
> + sizeof(struct blk_mq_hw_ctx) :
> + offsetof(struct blk_mq_hw_ctx, queue_rq_srcu);
> +}
> +
> static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
> struct request_queue *q)
> {
> @@ -2247,7 +2258,7 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
> continue;
>
> node = blk_mq_hw_queue_to_node(q->mq_map, i);
> - hctxs[i] = kzalloc_node(sizeof(struct blk_mq_hw_ctx),
> + hctxs[i] = kzalloc_node(blk_mq_hw_ctx_size(q),
> GFP_KERNEL, node);
> if (!hctxs[i])
> break;
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index c0d59330b5e0..8467e1f83524 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -39,8 +39,6 @@ struct blk_mq_hw_ctx {
> struct blk_mq_tags *tags;
> struct blk_mq_tags *sched_tags;
>
> - struct srcu_struct queue_rq_srcu;
> -
> unsigned long queued;
> unsigned long run;
> #define BLK_MQ_MAX_DISPATCH_ORDER 7
> @@ -62,6 +60,9 @@ struct blk_mq_hw_ctx {
> struct dentry *debugfs_dir;
> struct dentry *sched_debugfs_dir;
> #endif
> +
> + /* Must be the last member - see also blk_mq_hw_ctx_size(). */
> + struct srcu_struct queue_rq_srcu;
> };
Why not make it
/* Must be the last member - see also blk_mq_hw_ctx_size(). */
struct srcu_struct queue_rq_srcu[0];
and fixup blk_mq_hw_ctx_size()
static int blk_mq_hw_ctx_size(struct request_queue *q)
{
int size = sizeof(struct blk_mq_hw_ctx);
if (q->tag_set->flags & BLK_MQ_F_BLOCKING)
size += sizeof(struct srcu_struct);
return size;
}
I think that'd be cleaner. Keep the end-of-struct checking, just to be
on the safe side.
Neither one is super pretty though, and still doesn't fix the fact that
the srcu_struct is _half_ the blk_mq_hw_ctx in total.
--
Jens Axboe
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/7] blk-mq: fix queue quiescing
2017-05-25 17:42 ` Jens Axboe
@ 2017-05-25 17:59 ` Bart Van Assche
2017-05-26 0:44 ` Ming Lei
1 sibling, 0 replies; 15+ messages in thread
From: Bart Van Assche @ 2017-05-25 17:59 UTC (permalink / raw)
To: axboe, ming.lei; +Cc: hch, linux-block
On Thu, 2017-05-25 at 11:42 -0600, Jens Axboe wrote:
> Why not make it
>=20
> /* Must be the last member - see also blk_mq_hw_ctx_size(). */
> struct srcu_struct queue_rq_srcu[0];
>=20
> and fixup blk_mq_hw_ctx_size()
>=20
> static int blk_mq_hw_ctx_size(struct request_queue *q)
> {
> int size =3D sizeof(struct blk_mq_hw_ctx);
>=20
> if (q->tag_set->flags & BLK_MQ_F_BLOCKING)
> size +=3D sizeof(struct srcu_struct);
>=20
> return size;
> }
>=20
> I think that'd be cleaner. Keep the end-of-struct checking, just to be
> on the safe side.
>=20
> Neither one is super pretty though, and still doesn't fix the fact that
> the srcu_struct is _half_ the blk_mq_hw_ctx in total.
Hello Jens,
Making these changes seems like a good idea to me. I will make these change=
s
and post a patch.
Bart.=
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/7] blk-mq: fix queue quiescing
2017-05-25 17:42 ` Jens Axboe
2017-05-25 17:59 ` Bart Van Assche
@ 2017-05-26 0:44 ` Ming Lei
1 sibling, 0 replies; 15+ messages in thread
From: Ming Lei @ 2017-05-26 0:44 UTC (permalink / raw)
To: Jens Axboe; +Cc: Bart Van Assche, hch, linux-block
On Thu, May 25, 2017 at 11:42:59AM -0600, Jens Axboe wrote:
> On 05/25/2017 11:24 AM, Bart Van Assche wrote:
> > On Thu, 2017-05-25 at 17:09 +0800, Ming Lei wrote:
> >> Another big issue is that 'srcu_struct' is very big, which shouldn't
> >> be embedded into hctx, since we only have one real user of
> >> BLK_MQ_F_BLOCKING.
> >>
> >> So I will fix that too.
> >
> > Hello Ming,
> >
> > Is something like the (untested) patch below perhaps what you had in mind?
> >
> > Subject: [PATCH] blk-mq: Reduce blk_mq_hw_ctx size
> >
> > Since the srcu structure is rather large (184 bytes on an x86-64
> > system), only allocate it if needed.
> >
> > Reported-by: Ming Lei <ming.lei@redhat.com>
> > ---
> > block/blk-mq.c | 13 ++++++++++++-
> > include/linux/blk-mq.h | 5 +++--
> > 2 files changed, 15 insertions(+), 3 deletions(-)
> >
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index 1e330de4e3c5..15b7d4077638 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -2233,6 +2233,17 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
> > }
> > EXPORT_SYMBOL(blk_mq_init_queue);
> >
> > +static int blk_mq_hw_ctx_size(struct request_queue *q)
> > +{
> > + BUILD_BUG_ON(ALIGN(offsetof(struct blk_mq_hw_ctx, queue_rq_srcu) +
> > + sizeof(((struct blk_mq_hw_ctx *)NULL)->queue_rq_srcu), 64) !=
> > + sizeof(struct blk_mq_hw_ctx));
> > +
> > + return q->tag_set->flags & BLK_MQ_F_BLOCKING ?
> > + sizeof(struct blk_mq_hw_ctx) :
> > + offsetof(struct blk_mq_hw_ctx, queue_rq_srcu);
> > +}
> > +
> > static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
> > struct request_queue *q)
> > {
> > @@ -2247,7 +2258,7 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
> > continue;
> >
> > node = blk_mq_hw_queue_to_node(q->mq_map, i);
> > - hctxs[i] = kzalloc_node(sizeof(struct blk_mq_hw_ctx),
> > + hctxs[i] = kzalloc_node(blk_mq_hw_ctx_size(q),
> > GFP_KERNEL, node);
> > if (!hctxs[i])
> > break;
> > diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> > index c0d59330b5e0..8467e1f83524 100644
> > --- a/include/linux/blk-mq.h
> > +++ b/include/linux/blk-mq.h
> > @@ -39,8 +39,6 @@ struct blk_mq_hw_ctx {
> > struct blk_mq_tags *tags;
> > struct blk_mq_tags *sched_tags;
> >
> > - struct srcu_struct queue_rq_srcu;
> > -
> > unsigned long queued;
> > unsigned long run;
> > #define BLK_MQ_MAX_DISPATCH_ORDER 7
> > @@ -62,6 +60,9 @@ struct blk_mq_hw_ctx {
> > struct dentry *debugfs_dir;
> > struct dentry *sched_debugfs_dir;
> > #endif
> > +
> > + /* Must be the last member - see also blk_mq_hw_ctx_size(). */
> > + struct srcu_struct queue_rq_srcu;
> > };
>
> Why not make it
>
> /* Must be the last member - see also blk_mq_hw_ctx_size(). */
> struct srcu_struct queue_rq_srcu[0];
Yeah, actually that was what I did yesterday in my local tree, will
posted out today with the blk_mq_quiesce_queue() fix.
Thanks,
Ming
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2017-05-26 0:44 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-25 4:21 [PATCH 0/7] blk-mq: fix queue quiescing Ming Lei
2017-05-25 4:21 ` [PATCH 1/7] blk-mq: introduce blk_mq_unquiesce_queue Ming Lei
2017-05-25 4:21 ` [PATCH 2/7] block: introduce flag of QUEUE_FLAG_QUIESCED Ming Lei
2017-05-25 4:21 ` [PATCH 3/7] dm: use the introduced blk_mq_unquiesce_queue() Ming Lei
2017-05-25 4:21 ` [PATCH 4/7] nvme: " Ming Lei
2017-05-25 4:21 ` [PATCH 5/7] scsi: " Ming Lei
2017-05-25 4:21 ` [PATCH 6/7] blk-mq: quiesce queue via percpu_ref Ming Lei
2017-05-25 4:21 ` [PATCH 7/7] blk-mq: update comments on blk_mq_quiesce_queue() Ming Lei
2017-05-25 5:24 ` [PATCH 0/7] blk-mq: fix queue quiescing Bart Van Assche
2017-05-25 9:09 ` Ming Lei
2017-05-25 17:24 ` Bart Van Assche
2017-05-25 17:31 ` Jens Axboe
2017-05-25 17:42 ` Jens Axboe
2017-05-25 17:59 ` Bart Van Assche
2017-05-26 0:44 ` Ming Lei
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.