* [PATCH V2 0/2] blk-mq: implement queue quiesce via percpu_ref for BLK_MQ_F_BLOCKING
@ 2020-08-25 14:17 Ming Lei
  2020-08-25 14:17 ` [PATCH V2 1/2] blk-mq: serialize queue quiesce and unquiesce by mutex Ming Lei
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Ming Lei @ 2020-08-25 14:17 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, Lai Jiangshan, Paul E . McKenney,
	Josh Triplett, Sagi Grimberg, Bart Van Assche,
	Johannes Thumshirn, Chao Leng, Christoph Hellwig

Hi Jens,

The 1st patch adds .mq_quiesce_lock for serializing quiesce/unquiesce,
and prepares for replacing SRCU with percpu_ref.

The 2nd patch replaces SRCU with percpu_ref.

V2:
	- add .mq_quiesce_lock
	- add comment on patch 2 wrt. handling hctx_lock() failure
	- trivial patch style change


Ming Lei (2):
  blk-mq: serialize queue quiesce and unquiesce by mutex
  blk-mq: implement queue quiesce via percpu_ref for BLK_MQ_F_BLOCKING

 block/blk-core.c       |   2 +
 block/blk-mq-sysfs.c   |   2 -
 block/blk-mq.c         | 125 +++++++++++++++++++++++------------------
 block/blk-sysfs.c      |   6 +-
 include/linux/blk-mq.h |   7 ---
 include/linux/blkdev.h |   6 ++
 6 files changed, 82 insertions(+), 66 deletions(-)

Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
Cc: Chao Leng <lengchao@huawei.com>
Cc: Christoph Hellwig <hch@lst.de>
-- 
2.25.2


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH V2 1/2] blk-mq: serialize queue quiesce and unquiesce by mutex
  2020-08-25 14:17 [PATCH V2 0/2] blk-mq: implement queue quiesce via percpu_ref for BLK_MQ_F_BLOCKING Ming Lei
@ 2020-08-25 14:17 ` Ming Lei
  2020-08-26  7:51   ` Chao Leng
  2020-08-25 14:17 ` [PATCH V2 2/2] blk-mq: implement queue quiesce via percpu_ref for BLK_MQ_F_BLOCKING Ming Lei
  2020-09-02  3:11 ` [PATCH V2 0/2] " Ming Lei
  2 siblings, 1 reply; 15+ messages in thread
From: Ming Lei @ 2020-08-25 14:17 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, Sagi Grimberg, Bart Van Assche,
	Johannes Thumshirn, Chao Leng

Add .mq_quiesce_lock to the request queue, so that queue quiesce and
unquiesce can be serialized. Meanwhile, we can avoid an unnecessary
synchronize_rcu() in case the queue has been quiesced already.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
Cc: Chao Leng <lengchao@huawei.com>
---
 block/blk-core.c       |  2 ++
 block/blk-mq.c         | 11 +++++++++++
 include/linux/blkdev.h |  2 ++
 3 files changed, 15 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index d9d632639bd1..ffc57df70064 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -561,6 +561,8 @@ struct request_queue *blk_alloc_queue(int node_id)
 	init_waitqueue_head(&q->mq_freeze_wq);
 	mutex_init(&q->mq_freeze_lock);
 
+	mutex_init(&q->mq_quiesce_lock);
+
 	/*
 	 * Init percpu_ref in atomic mode so that it's faster to shutdown.
 	 * See blk_register_queue() for details.
diff --git a/block/blk-mq.c b/block/blk-mq.c
index b3d2785eefe9..817e016ef886 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -224,6 +224,11 @@ void blk_mq_quiesce_queue(struct request_queue *q)
 	unsigned int i;
 	bool rcu = false;
 
+	mutex_lock(&q->mq_quiesce_lock);
+
+	if (blk_queue_quiesced(q))
+		goto exit;
+
 	blk_mq_quiesce_queue_nowait(q);
 
 	queue_for_each_hw_ctx(q, hctx, i) {
@@ -234,6 +239,8 @@ void blk_mq_quiesce_queue(struct request_queue *q)
 	}
 	if (rcu)
 		synchronize_rcu();
+ exit:
+	mutex_unlock(&q->mq_quiesce_lock);
 }
 EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
 
@@ -246,10 +253,14 @@ EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
  */
 void blk_mq_unquiesce_queue(struct request_queue *q)
 {
+	mutex_lock(&q->mq_quiesce_lock);
+
 	blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
 
 	/* dispatch requests which are inserted during quiescing */
 	blk_mq_run_hw_queues(q, true);
+
+	mutex_unlock(&q->mq_quiesce_lock);
 }
 EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d8dba550ecac..5ed03066b33e 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -569,6 +569,8 @@ struct request_queue {
 	 */
 	struct mutex		mq_freeze_lock;
 
+	struct mutex		mq_quiesce_lock;
+
 	struct blk_mq_tag_set	*tag_set;
 	struct list_head	tag_set_list;
 	struct bio_set		bio_split;
-- 
2.25.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH V2 2/2] blk-mq: implement queue quiesce via percpu_ref for BLK_MQ_F_BLOCKING
  2020-08-25 14:17 [PATCH V2 0/2] blk-mq: implement queue quiesce via percpu_ref for BLK_MQ_F_BLOCKING Ming Lei
  2020-08-25 14:17 ` [PATCH V2 1/2] blk-mq: serialize queue quiesce and unquiesce by mutex Ming Lei
@ 2020-08-25 14:17 ` Ming Lei
  2020-09-02  3:11 ` [PATCH V2 0/2] " Ming Lei
  2 siblings, 0 replies; 15+ messages in thread
From: Ming Lei @ 2020-08-25 14:17 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Ming Lei, Lai Jiangshan, Paul E . McKenney,
	Josh Triplett, Sagi Grimberg, Bart Van Assche,
	Johannes Thumshirn, Chao Leng, Christoph Hellwig

In case of BLK_MQ_F_BLOCKING, blk-mq uses SRCU to mark the read critical
section during request dispatch, and request queue quiesce is built on top
of SRCU. What we want is to add as little cost as possible to the fast path.

With percpu_ref, the implementation is cleaner, simpler and sufficient for
queue quiesce. The main requirement is to make sure all read sections
observe QUEUE_FLAG_QUIESCED once blk_mq_quiesce_queue() returns.

It also becomes much easier to add an interface for async queue quiesce.
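
A rough, untested sketch of what such an async interface could look like on
top of this patch (the helper names blk_mq_quiesce_queue_async() and
blk_mq_wait_quiesce_done() are made up here, and only cover the
BLK_MQ_F_BLOCKING case):

  /* start quiesce without waiting; dispatch may still be draining */
  void blk_mq_quiesce_queue_async(struct request_queue *q)
  {
          blk_mq_quiesce_queue_nowait(q);
          percpu_ref_kill(&q->dispatch_counter);
  }

  /* wait until all in-flight dispatches have drained */
  void blk_mq_wait_quiesce_done(struct request_queue *q)
  {
          wait_event(q->mq_quiesce_wq,
                     percpu_ref_is_zero(&q->dispatch_counter));
  }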

Meanwhile, memory footprint can be reduced with the per-request-queue
percpu_ref.

From an implementation viewpoint, percpu_ref is not seen to be slower than
SRCU in the fast path, and tree SRCU (the default option in most
distributions) could even be slower, since a memory barrier is required in
both lock and unlock, while rcu_read_lock()/rcu_read_unlock() should be
much cheaper than smp_mb().

1) percpu_ref just holds rcu_read_lock, then runs a check and an
   increment/decrement on the percpu variable:

   rcu_read_lock()
   if (__ref_is_percpu(ref, &percpu_count))
	this_cpu_inc(*percpu_count);
   rcu_read_unlock()

2) srcu tree:
        idx = READ_ONCE(ssp->srcu_idx) & 0x1;
        this_cpu_inc(ssp->sda->srcu_lock_count[idx]);
        smp_mb(); /* B */  /* Avoid leaking the critical section. */

Also, in my tests on null_blk (blocking mode), percpu_ref is not observed
to perform worse than SRCU; see the following test:

1) test steps:

rmmod null_blk > /dev/null 2>&1
modprobe null_blk nr_devices=1 submit_queues=1 blocking=1
fio --bs=4k --size=512G  --rw=randread --norandommap --direct=1 --ioengine=libaio \
	--iodepth=64 --runtime=60 --group_reporting=1  --name=nullb0 \
	--filename=/dev/nullb0 --numjobs=32

test machine: HP DL380, 16 cpu cores, 2 threads per core, dual
sockets/numa, Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz

2) test result:
- srcu quiesce: 6063K IOPS
- percpu-ref quiesce: 6113K IOPS

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
Cc: Chao Leng <lengchao@huawei.com>
Cc: Christoph Hellwig <hch@lst.de>
---
 block/blk-mq-sysfs.c   |   2 -
 block/blk-mq.c         | 116 +++++++++++++++++++++--------------------
 block/blk-sysfs.c      |   6 ++-
 include/linux/blk-mq.h |   7 ---
 include/linux/blkdev.h |   4 ++
 5 files changed, 68 insertions(+), 67 deletions(-)

diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
index 062229395a50..799db7937105 100644
--- a/block/blk-mq-sysfs.c
+++ b/block/blk-mq-sysfs.c
@@ -38,8 +38,6 @@ static void blk_mq_hw_sysfs_release(struct kobject *kobj)
 
 	cancel_delayed_work_sync(&hctx->run_work);
 
-	if (hctx->flags & BLK_MQ_F_BLOCKING)
-		cleanup_srcu_struct(hctx->srcu);
 	blk_free_flush_queue(hctx->fq);
 	sbitmap_free(&hctx->ctx_map);
 	free_cpumask_var(hctx->cpumask);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 817e016ef886..ef6c6fa8dab0 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -220,10 +220,6 @@ EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_nowait);
  */
 void blk_mq_quiesce_queue(struct request_queue *q)
 {
-	struct blk_mq_hw_ctx *hctx;
-	unsigned int i;
-	bool rcu = false;
-
 	mutex_lock(&q->mq_quiesce_lock);
 
 	if (blk_queue_quiesced(q))
@@ -231,13 +227,11 @@ void blk_mq_quiesce_queue(struct request_queue *q)
 
 	blk_mq_quiesce_queue_nowait(q);
 
-	queue_for_each_hw_ctx(q, hctx, i) {
-		if (hctx->flags & BLK_MQ_F_BLOCKING)
-			synchronize_srcu(hctx->srcu);
-		else
-			rcu = true;
-	}
-	if (rcu)
+	if (q->tag_set->flags & BLK_MQ_F_BLOCKING) {
+		percpu_ref_kill(&q->dispatch_counter);
+		wait_event(q->mq_quiesce_wq,
+				percpu_ref_is_zero(&q->dispatch_counter));
+	} else
 		synchronize_rcu();
  exit:
 	mutex_unlock(&q->mq_quiesce_lock);
@@ -257,6 +251,9 @@ void blk_mq_unquiesce_queue(struct request_queue *q)
 
 	blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
 
+	if (q->tag_set->flags & BLK_MQ_F_BLOCKING)
+		percpu_ref_resurrect(&q->dispatch_counter);
+
 	/* dispatch requests which are inserted during quiescing */
 	blk_mq_run_hw_queues(q, true);
 
@@ -710,24 +707,21 @@ void blk_mq_complete_request(struct request *rq)
 }
 EXPORT_SYMBOL(blk_mq_complete_request);
 
-static void hctx_unlock(struct blk_mq_hw_ctx *hctx, int srcu_idx)
-	__releases(hctx->srcu)
+static void hctx_unlock(struct blk_mq_hw_ctx *hctx)
 {
-	if (!(hctx->flags & BLK_MQ_F_BLOCKING))
-		rcu_read_unlock();
+	if (hctx->flags & BLK_MQ_F_BLOCKING)
+		percpu_ref_put(&hctx->queue->dispatch_counter);
 	else
-		srcu_read_unlock(hctx->srcu, srcu_idx);
+		rcu_read_unlock();
 }
 
-static void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx)
-	__acquires(hctx->srcu)
+/* Returning false means that queue is being quiesced */
+static inline bool hctx_lock(struct blk_mq_hw_ctx *hctx)
 {
-	if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
-		/* shut up gcc false positive */
-		*srcu_idx = 0;
-		rcu_read_lock();
-	} else
-		*srcu_idx = srcu_read_lock(hctx->srcu);
+	if (hctx->flags & BLK_MQ_F_BLOCKING)
+		return percpu_ref_tryget_live(&hctx->queue->dispatch_counter);
+	rcu_read_lock();
+	return true;
 }
 
 /**
@@ -1506,8 +1500,6 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
  */
 static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 {
-	int srcu_idx;
-
 	/*
 	 * We should be running this queue from one of the CPUs that
 	 * are mapped to it.
@@ -1541,9 +1533,10 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 
 	might_sleep_if(hctx->flags & BLK_MQ_F_BLOCKING);
 
-	hctx_lock(hctx, &srcu_idx);
-	blk_mq_sched_dispatch_requests(hctx);
-	hctx_unlock(hctx, srcu_idx);
+	if (hctx_lock(hctx)) {
+		blk_mq_sched_dispatch_requests(hctx);
+		hctx_unlock(hctx);
+	}
 }
 
 static inline int blk_mq_first_mapped_cpu(struct blk_mq_hw_ctx *hctx)
@@ -1655,7 +1648,6 @@ EXPORT_SYMBOL(blk_mq_delay_run_hw_queue);
  */
 void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
 {
-	int srcu_idx;
 	bool need_run;
 
 	/*
@@ -1666,10 +1658,12 @@ void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
 	 * And queue will be rerun in blk_mq_unquiesce_queue() if it is
 	 * quiesced.
 	 */
-	hctx_lock(hctx, &srcu_idx);
+	if (!hctx_lock(hctx))
+		return;
+
 	need_run = !blk_queue_quiesced(hctx->queue) &&
 		blk_mq_hctx_has_pending(hctx);
-	hctx_unlock(hctx, srcu_idx);
+	hctx_unlock(hctx);
 
 	if (need_run)
 		__blk_mq_delay_run_hw_queue(hctx, async, 0);
@@ -2009,7 +2003,7 @@ static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 	bool run_queue = true;
 
 	/*
-	 * RCU or SRCU read lock is needed before checking quiesced flag.
+	 * hctx_lock() is needed before checking quiesced flag.
 	 *
 	 * When queue is stopped or quiesced, ignore 'bypass_insert' from
 	 * blk_mq_request_issue_directly(), and return BLK_STS_OK to caller,
@@ -2057,11 +2051,14 @@ static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 		struct request *rq, blk_qc_t *cookie)
 {
 	blk_status_t ret;
-	int srcu_idx;
 
 	might_sleep_if(hctx->flags & BLK_MQ_F_BLOCKING);
 
-	hctx_lock(hctx, &srcu_idx);
+	/* Insert request to queue in case of being quiesced */
+	if (!hctx_lock(hctx)) {
+		blk_mq_sched_insert_request(rq, false, false, false);
+		return;
+	}
 
 	ret = __blk_mq_try_issue_directly(hctx, rq, cookie, false, true);
 	if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE)
@@ -2069,19 +2066,22 @@ static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 	else if (ret != BLK_STS_OK)
 		blk_mq_end_request(rq, ret);
 
-	hctx_unlock(hctx, srcu_idx);
+	hctx_unlock(hctx);
 }
 
 blk_status_t blk_mq_request_issue_directly(struct request *rq, bool last)
 {
 	blk_status_t ret;
-	int srcu_idx;
 	blk_qc_t unused_cookie;
 	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
 
-	hctx_lock(hctx, &srcu_idx);
+	/* Insert request to queue in case of being quiesced */
+	if (!hctx_lock(hctx)) {
+		blk_mq_sched_insert_request(rq, false, false, false);
+		return BLK_STS_OK;
+	}
 	ret = __blk_mq_try_issue_directly(hctx, rq, &unused_cookie, true, last);
-	hctx_unlock(hctx, srcu_idx);
+	hctx_unlock(hctx);
 
 	return ret;
 }
@@ -2612,20 +2612,6 @@ static void blk_mq_exit_hw_queues(struct request_queue *q,
 	}
 }
 
-static int blk_mq_hw_ctx_size(struct blk_mq_tag_set *tag_set)
-{
-	int hw_ctx_size = sizeof(struct blk_mq_hw_ctx);
-
-	BUILD_BUG_ON(ALIGN(offsetof(struct blk_mq_hw_ctx, srcu),
-			   __alignof__(struct blk_mq_hw_ctx)) !=
-		     sizeof(struct blk_mq_hw_ctx));
-
-	if (tag_set->flags & BLK_MQ_F_BLOCKING)
-		hw_ctx_size += sizeof(struct srcu_struct);
-
-	return hw_ctx_size;
-}
-
 static int blk_mq_init_hctx(struct request_queue *q,
 		struct blk_mq_tag_set *set,
 		struct blk_mq_hw_ctx *hctx, unsigned hctx_idx)
@@ -2663,7 +2649,7 @@ blk_mq_alloc_hctx(struct request_queue *q, struct blk_mq_tag_set *set,
 	struct blk_mq_hw_ctx *hctx;
 	gfp_t gfp = GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY;
 
-	hctx = kzalloc_node(blk_mq_hw_ctx_size(set), gfp, node);
+	hctx = kzalloc_node(sizeof(struct blk_mq_hw_ctx), gfp, node);
 	if (!hctx)
 		goto fail_alloc_hctx;
 
@@ -2705,8 +2691,6 @@ blk_mq_alloc_hctx(struct request_queue *q, struct blk_mq_tag_set *set,
 	if (!hctx->fq)
 		goto free_bitmap;
 
-	if (hctx->flags & BLK_MQ_F_BLOCKING)
-		init_srcu_struct(hctx->srcu);
 	blk_mq_hctx_kobj_init(hctx);
 
 	return hctx;
@@ -3183,6 +3167,13 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
 	mutex_unlock(&q->sysfs_lock);
 }
 
+static void blk_mq_dispatch_counter_release(struct percpu_ref *ref)
+{
+	struct request_queue *q = container_of(ref, struct request_queue,
+				dispatch_counter);
+	wake_up_all(&q->mq_quiesce_wq);
+}
+
 struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 						  struct request_queue *q,
 						  bool elevator_init)
@@ -3199,6 +3190,14 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 	if (blk_mq_alloc_ctxs(q))
 		goto err_poll;
 
+	if (set->flags & BLK_MQ_F_BLOCKING) {
+		init_waitqueue_head(&q->mq_quiesce_wq);
+		if (percpu_ref_init(&q->dispatch_counter,
+					blk_mq_dispatch_counter_release,
+					PERCPU_REF_ALLOW_REINIT, GFP_KERNEL))
+			goto err_hctxs;
+	}
+
 	/* init q->mq_kobj and sw queues' kobjects */
 	blk_mq_sysfs_init(q);
 
@@ -3207,7 +3206,7 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 
 	blk_mq_realloc_hw_ctxs(set, q);
 	if (!q->nr_hw_queues)
-		goto err_hctxs;
+		goto err_dispatch_counter;
 
 	INIT_WORK(&q->timeout_work, blk_mq_timeout_work);
 	blk_queue_rq_timeout(q, set->timeout ? set->timeout : 30 * HZ);
@@ -3241,6 +3240,9 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 
 	return q;
 
+err_dispatch_counter:
+	if (set->flags & BLK_MQ_F_BLOCKING)
+		percpu_ref_exit(&q->dispatch_counter);
 err_hctxs:
 	kfree(q->queue_hw_ctx);
 	q->nr_hw_queues = 0;
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 7dda709f3ccb..56b6c045e30c 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -941,9 +941,13 @@ static void blk_release_queue(struct kobject *kobj)
 
 	blk_queue_free_zone_bitmaps(q);
 
-	if (queue_is_mq(q))
+	if (queue_is_mq(q)) {
 		blk_mq_release(q);
 
+		if (q->tag_set->flags & BLK_MQ_F_BLOCKING)
+			percpu_ref_exit(&q->dispatch_counter);
+	}
+
 	blk_trace_shutdown(q);
 	mutex_lock(&q->debugfs_mutex);
 	debugfs_remove_recursive(q->debugfs_dir);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 9d2d5ad367a4..ea3461298de5 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -169,13 +169,6 @@ struct blk_mq_hw_ctx {
 	 * q->unused_hctx_list.
 	 */
 	struct list_head	hctx_list;
-
-	/**
-	 * @srcu: Sleepable RCU. Use as lock when type of the hardware queue is
-	 * blocking (BLK_MQ_F_BLOCKING). Must be the last member - see also
-	 * blk_mq_hw_ctx_size().
-	 */
-	struct srcu_struct	srcu[];
 };
 
 /**
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 5ed03066b33e..1ab17bcbc334 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -571,6 +571,10 @@ struct request_queue {
 
 	struct mutex		mq_quiesce_lock;
 
+	/* only used for BLK_MQ_F_BLOCKING */
+	struct percpu_ref	dispatch_counter;
+	wait_queue_head_t	mq_quiesce_wq;
+
 	struct blk_mq_tag_set	*tag_set;
 	struct list_head	tag_set_list;
 	struct bio_set		bio_split;
-- 
2.25.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH V2 1/2] blk-mq: serialize queue quiesce and unquiesce by mutex
  2020-08-25 14:17 ` [PATCH V2 1/2] blk-mq: serialize queue quiesce and unquiesce by mutex Ming Lei
@ 2020-08-26  7:51   ` Chao Leng
  2020-08-26  8:54     ` Ming Lei
  0 siblings, 1 reply; 15+ messages in thread
From: Chao Leng @ 2020-08-26  7:51 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe
  Cc: linux-block, Sagi Grimberg, Bart Van Assche, Johannes Thumshirn

It doesn't matter. Reentrant quiesce/unquiesce of a queue is not safe and
must be avoided by some other mechanism; otherwise exceptions may occur.
Introducing mq_quiesce_lock looks like it saves possible synchronization
waits, but that case should not happen. If it really does happen, we need
to fix it.

On 2020/8/25 22:17, Ming Lei wrote:
> Add .mq_quiesce_mutext to request queue, so that queue quiesce and
> unquiesce can be serialized. Meantime we can avoid unnecessary
> synchronize_rcu() in case that queue has been quiesced already.
> 
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> Cc: Sagi Grimberg <sagi@grimberg.me>
> Cc: Bart Van Assche <bvanassche@acm.org>
> Cc: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
> Cc: Chao Leng <lengchao@huawei.com>
> ---
>   block/blk-core.c       |  2 ++
>   block/blk-mq.c         | 11 +++++++++++
>   include/linux/blkdev.h |  2 ++
>   3 files changed, 15 insertions(+)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index d9d632639bd1..ffc57df70064 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -561,6 +561,8 @@ struct request_queue *blk_alloc_queue(int node_id)
>   	init_waitqueue_head(&q->mq_freeze_wq);
>   	mutex_init(&q->mq_freeze_lock);
>   
> +	mutex_init(&q->mq_quiesce_lock);
> +
>   	/*
>   	 * Init percpu_ref in atomic mode so that it's faster to shutdown.
>   	 * See blk_register_queue() for details.
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index b3d2785eefe9..817e016ef886 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -224,6 +224,11 @@ void blk_mq_quiesce_queue(struct request_queue *q)
>   	unsigned int i;
>   	bool rcu = false;
>   
> +	mutex_lock(&q->mq_quiesce_lock);
> +
> +	if (blk_queue_quiesced(q))
> +		goto exit;
> +
>   	blk_mq_quiesce_queue_nowait(q);
>   
>   	queue_for_each_hw_ctx(q, hctx, i) {
> @@ -234,6 +239,8 @@ void blk_mq_quiesce_queue(struct request_queue *q)
>   	}
>   	if (rcu)
>   		synchronize_rcu();
> + exit:
> +	mutex_unlock(&q->mq_quiesce_lock);
>   }
>   EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
>   
> @@ -246,10 +253,14 @@ EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
>    */
>   void blk_mq_unquiesce_queue(struct request_queue *q)
>   {
> +	mutex_lock(&q->mq_quiesce_lock);
> +
>   	blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
>   
>   	/* dispatch requests which are inserted during quiescing */
>   	blk_mq_run_hw_queues(q, true);
> +
> +	mutex_unlock(&q->mq_quiesce_lock);
>   }
>   EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);
>   
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index d8dba550ecac..5ed03066b33e 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -569,6 +569,8 @@ struct request_queue {
>   	 */
>   	struct mutex		mq_freeze_lock;
>   
> +	struct mutex		mq_quiesce_lock;
> +
>   	struct blk_mq_tag_set	*tag_set;
>   	struct list_head	tag_set_list;
>   	struct bio_set		bio_split;
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH V2 1/2] blk-mq: serialize queue quiesce and unquiesce by mutex
  2020-08-26  7:51   ` Chao Leng
@ 2020-08-26  8:54     ` Ming Lei
  2020-08-26 15:36       ` Keith Busch
  0 siblings, 1 reply; 15+ messages in thread
From: Ming Lei @ 2020-08-26  8:54 UTC (permalink / raw)
  To: Chao Leng
  Cc: Jens Axboe, linux-block, Sagi Grimberg, Bart Van Assche,
	Johannes Thumshirn

On Wed, Aug 26, 2020 at 03:51:25PM +0800, Chao Leng wrote:
> It doesn't matter. Because the reentry of quiesce&unquiesce queue is not
> safe, must be avoided by other mechanism. otherwise, exceptions may
> occur. Introduce mq_quiesce_lock looks saving possible synchronization
> waits, but it should not happen. If really happen, we need fix it.

Sagi mentioned there may be nested queue quiesce, so I added .mq_quiesce_lock
to make this usage easy to support, and meanwhile avoid the percpu_ref warning
in such usage.

Anyway, I don't see any problem with adding .mq_quiesce_lock, so I'd suggest
we move on with this approach.


Thanks, 
Ming


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH V2 1/2] blk-mq: serialize queue quiesce and unquiesce by mutex
  2020-08-26  8:54     ` Ming Lei
@ 2020-08-26 15:36       ` Keith Busch
  2020-08-26 16:23         ` Sagi Grimberg
  2020-08-27  2:38         ` Ming Lei
  0 siblings, 2 replies; 15+ messages in thread
From: Keith Busch @ 2020-08-26 15:36 UTC (permalink / raw)
  To: Ming Lei
  Cc: Chao Leng, Jens Axboe, linux-block, Sagi Grimberg,
	Bart Van Assche, Johannes Thumshirn

On Wed, Aug 26, 2020 at 04:54:22PM +0800, Ming Lei wrote:
> On Wed, Aug 26, 2020 at 03:51:25PM +0800, Chao Leng wrote:
> > It doesn't matter. Because the reentry of quiesce&unquiesce queue is not
> > safe, must be avoided by other mechanism. otherwise, exceptions may
> > occur. Introduce mq_quiesce_lock looks saving possible synchronization
> > waits, but it should not happen. If really happen, we need fix it.
> 
> Sagi mentioned there may be nested queue quiesce, so I add .mq_quiesce_lock 
> to make this usage easy to support, meantime avoid percpu_ref warning
> in such usage.
> 
> Anyway, not see any problem with adding .mq_quiesce_lock, so I'd suggest to
> move on with this way.

I'm not sure there really are any nested queue quiesce paths, but if
there are, wouldn't we need to track the "depth" like how a queue freeze
works?
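
Roughly something like this untested sketch, where 'mq_quiesce_depth' is a
hypothetical counter mirroring how q->mq_freeze_depth is used:

  void blk_mq_quiesce_queue(struct request_queue *q)
  {
          mutex_lock(&q->mq_quiesce_lock);
          /* only the first quiesce actually waits for dispatch to drain */
          if (!q->mq_quiesce_depth++) {
                  blk_mq_quiesce_queue_nowait(q);
                  /* ... synchronize with in-flight dispatch here ... */
          }
          mutex_unlock(&q->mq_quiesce_lock);
  }

  void blk_mq_unquiesce_queue(struct request_queue *q)
  {
          mutex_lock(&q->mq_quiesce_lock);
          WARN_ON_ONCE(q->mq_quiesce_depth <= 0);
          /* only the last unquiesce restarts dispatch */
          if (!--q->mq_quiesce_depth) {
                  blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
                  blk_mq_run_hw_queues(q, true);
          }
          mutex_unlock(&q->mq_quiesce_lock);
  }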

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH V2 1/2] blk-mq: serialize queue quiesce and unquiesce by mutex
  2020-08-26 15:36       ` Keith Busch
@ 2020-08-26 16:23         ` Sagi Grimberg
  2020-08-27  2:38         ` Ming Lei
  1 sibling, 0 replies; 15+ messages in thread
From: Sagi Grimberg @ 2020-08-26 16:23 UTC (permalink / raw)
  To: Keith Busch, Ming Lei
  Cc: Chao Leng, Jens Axboe, linux-block, Bart Van Assche, Johannes Thumshirn


>>> It doesn't matter. Because the reentry of quiesce&unquiesce queue is not
>>> safe, must be avoided by other mechanism. otherwise, exceptions may
>>> occur. Introduce mq_quiesce_lock looks saving possible synchronization
>>> waits, but it should not happen. If really happen, we need fix it.
>>
>> Sagi mentioned there may be nested queue quiesce, so I add .mq_quiesce_lock
>> to make this usage easy to support, meantime avoid percpu_ref warning
>> in such usage.
>>
>> Anyway, not see any problem with adding .mq_quiesce_lock, so I'd suggest to
>> move on with this way.
> 
> I'm not sure there really are any nested queue quiesce paths, but if
> there are, wouldn't we need to track the "depth" like how a queue freeze
> works?

We might need it when async quiesce is implemented on top of this, because
then quiesce will only "start", and a different context will wait for the
quiesced event (where we may need to add a wakeup for waiters).

This is why I want to see this together with the async quiesce piece.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH V2 1/2] blk-mq: serialize queue quiesce and unquiesce by mutex
  2020-08-26 15:36       ` Keith Busch
  2020-08-26 16:23         ` Sagi Grimberg
@ 2020-08-27  2:38         ` Ming Lei
  1 sibling, 0 replies; 15+ messages in thread
From: Ming Lei @ 2020-08-27  2:38 UTC (permalink / raw)
  To: Keith Busch
  Cc: Chao Leng, Jens Axboe, linux-block, Sagi Grimberg,
	Bart Van Assche, Johannes Thumshirn

On Wed, Aug 26, 2020 at 08:36:33AM -0700, Keith Busch wrote:
> On Wed, Aug 26, 2020 at 04:54:22PM +0800, Ming Lei wrote:
> > On Wed, Aug 26, 2020 at 03:51:25PM +0800, Chao Leng wrote:
> > > It doesn't matter. Because the reentry of quiesce&unquiesce queue is not
> > > safe, must be avoided by other mechanism. otherwise, exceptions may
> > > occur. Introduce mq_quiesce_lock looks saving possible synchronization
> > > waits, but it should not happen. If really happen, we need fix it.
> > 
> > Sagi mentioned there may be nested queue quiesce, so I add .mq_quiesce_lock 
> > to make this usage easy to support, meantime avoid percpu_ref warning
> > in such usage.
> > 
> > Anyway, not see any problem with adding .mq_quiesce_lock, so I'd suggest to
> > move on with this way.
> 
> I'm not sure there really are any nested queue quiesce paths, but if
> there are, wouldn't we need to track the "depth" like how a queue freeze
> works?

Both an atomic 'depth' and .mq_quiesce_lock can work for nested queue
quiesce, since with the mutex we can avoid unnecessary queue quiesce.
percpu_ref_kill() / percpu_ref_kill_and_confirm() warn if the percpu_ref
has already been killed, which is why I think Sagi's suggestion is good.

But 'depth' can easily cause trouble, such as unbalanced quiesce/unquiesce;
there is no such issue with the mutex, at least since we don't require the
two to be strictly paired so far.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH V2 0/2] blk-mq: implement queue quiesce via percpu_ref for BLK_MQ_F_BLOCKING
  2020-08-25 14:17 [PATCH V2 0/2] blk-mq: implement queue quiesce via percpu_ref for BLK_MQ_F_BLOCKING Ming Lei
  2020-08-25 14:17 ` [PATCH V2 1/2] blk-mq: serialize queue quiesce and unquiesce by mutex Ming Lei
  2020-08-25 14:17 ` [PATCH V2 2/2] blk-mq: implement queue quiesce via percpu_ref for BLK_MQ_F_BLOCKING Ming Lei
@ 2020-09-02  3:11 ` Ming Lei
  2020-09-02 17:52   ` Jens Axboe
  2 siblings, 1 reply; 15+ messages in thread
From: Ming Lei @ 2020-09-02  3:11 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Lai Jiangshan, Paul E . McKenney, Josh Triplett,
	Sagi Grimberg, Bart Van Assche, Johannes Thumshirn, Chao Leng,
	Christoph Hellwig

On Tue, Aug 25, 2020 at 10:17:32PM +0800, Ming Lei wrote:
> Hi Jens,
> 
> The 1st patch add .mq_quiesce_mutex for serializing quiesce/unquiesce,
> and prepares for replacing srcu with percpu_ref.
> 
> The 2nd patch replaces srcu with percpu_ref.
> 
> V2:
> 	- add .mq_quiesce_lock
> 	- add comment on patch 2 wrt. handling hctx_lock() failure
> 	- trivial patch style change
> 
> 
> Ming Lei (2):
>   blk-mq: serialize queue quiesce and unquiesce by mutex
>   blk-mq: implement queue quiesce via percpu_ref for BLK_MQ_F_BLOCKING
> 
>  block/blk-core.c       |   2 +
>  block/blk-mq-sysfs.c   |   2 -
>  block/blk-mq.c         | 125 +++++++++++++++++++++++------------------
>  block/blk-sysfs.c      |   6 +-
>  include/linux/blk-mq.h |   7 ---
>  include/linux/blkdev.h |   6 ++
>  6 files changed, 82 insertions(+), 66 deletions(-)
> 
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Sagi Grimberg <sagi@grimberg.me>
> Cc: Bart Van Assche <bvanassche@acm.org>
> Cc: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
> Cc: Chao Leng <lengchao@huawei.com>
> Cc: Christoph Hellwig <hch@lst.de>

Hello Guys,

Are there any objections to the two patches? If not, I'd suggest we move
on.

Thanks,
Ming


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH V2 0/2] blk-mq: implement queue quiesce via percpu_ref for BLK_MQ_F_BLOCKING
  2020-09-02  3:11 ` [PATCH V2 0/2] " Ming Lei
@ 2020-09-02 17:52   ` Jens Axboe
  2020-09-02 18:20     ` Sagi Grimberg
  2020-09-03  0:35     ` Ming Lei
  0 siblings, 2 replies; 15+ messages in thread
From: Jens Axboe @ 2020-09-02 17:52 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-block, Lai Jiangshan, Paul E . McKenney, Josh Triplett,
	Sagi Grimberg, Bart Van Assche, Johannes Thumshirn, Chao Leng,
	Christoph Hellwig

On 9/1/20 9:11 PM, Ming Lei wrote:
> On Tue, Aug 25, 2020 at 10:17:32PM +0800, Ming Lei wrote:
>> Hi Jens,
>>
>> The 1st patch add .mq_quiesce_mutex for serializing quiesce/unquiesce,
>> and prepares for replacing srcu with percpu_ref.
>>
>> The 2nd patch replaces srcu with percpu_ref.
>>
>> V2:
>> 	- add .mq_quiesce_lock
>> 	- add comment on patch 2 wrt. handling hctx_lock() failure
>> 	- trivial patch style change
>>
>>
>> Ming Lei (2):
>>   blk-mq: serialize queue quiesce and unquiesce by mutex
>>   blk-mq: implement queue quiesce via percpu_ref for BLK_MQ_F_BLOCKING
>>
>>  block/blk-core.c       |   2 +
>>  block/blk-mq-sysfs.c   |   2 -
>>  block/blk-mq.c         | 125 +++++++++++++++++++++++------------------
>>  block/blk-sysfs.c      |   6 +-
>>  include/linux/blk-mq.h |   7 ---
>>  include/linux/blkdev.h |   6 ++
>>  6 files changed, 82 insertions(+), 66 deletions(-)
>>
>> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
>> Cc: Paul E. McKenney <paulmck@kernel.org>
>> Cc: Josh Triplett <josh@joshtriplett.org>
>> Cc: Sagi Grimberg <sagi@grimberg.me>
>> Cc: Bart Van Assche <bvanassche@acm.org>
>> Cc: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
>> Cc: Chao Leng <lengchao@huawei.com>
>> Cc: Christoph Hellwig <hch@lst.de>
> 
> Hello Guys,
> 
> Is there any objections on the two patches? If not, I'd suggest to move> on.

Seems like the nested case is one that should either be handled, or at
least detected.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH V2 0/2] blk-mq: implement queue quiesce via percpu_ref for BLK_MQ_F_BLOCKING
  2020-09-02 17:52   ` Jens Axboe
@ 2020-09-02 18:20     ` Sagi Grimberg
  2020-09-03  0:41       ` Ming Lei
  2020-09-03  0:35     ` Ming Lei
  1 sibling, 1 reply; 15+ messages in thread
From: Sagi Grimberg @ 2020-09-02 18:20 UTC (permalink / raw)
  To: Jens Axboe, Ming Lei
  Cc: linux-block, Lai Jiangshan, Paul E . McKenney, Josh Triplett,
	Bart Van Assche, Johannes Thumshirn, Chao Leng,
	Christoph Hellwig


>>> Hi Jens,
>>>
>>> The 1st patch add .mq_quiesce_mutex for serializing quiesce/unquiesce,
>>> and prepares for replacing srcu with percpu_ref.
>>>
>>> The 2nd patch replaces srcu with percpu_ref.
>>>
>>> V2:
>>> 	- add .mq_quiesce_lock
>>> 	- add comment on patch 2 wrt. handling hctx_lock() failure
>>> 	- trivial patch style change
>>>
>>>
>>> Ming Lei (2):
>>>    blk-mq: serialize queue quiesce and unquiesce by mutex
>>>    blk-mq: implement queue quiesce via percpu_ref for BLK_MQ_F_BLOCKING

I thought we agreed to have a little more consolidation for blocking and
!blocking paths (move fallbacks to common paths).

>>>
>>>   block/blk-core.c       |   2 +
>>>   block/blk-mq-sysfs.c   |   2 -
>>>   block/blk-mq.c         | 125 +++++++++++++++++++++++------------------
>>>   block/blk-sysfs.c      |   6 +-
>>>   include/linux/blk-mq.h |   7 ---
>>>   include/linux/blkdev.h |   6 ++
>>>   6 files changed, 82 insertions(+), 66 deletions(-)
>>>
>>> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
>>> Cc: Paul E. McKenney <paulmck@kernel.org>
>>> Cc: Josh Triplett <josh@joshtriplett.org>
>>> Cc: Sagi Grimberg <sagi@grimberg.me>
>>> Cc: Bart Van Assche <bvanassche@acm.org>
>>> Cc: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
>>> Cc: Chao Leng <lengchao@huawei.com>
>>> Cc: Christoph Hellwig <hch@lst.de>
>>
>> Hello Guys,
>>
>> Is there any objections on the two patches? If not, I'd suggest to move> on.
> 
> Seems like the nested case is one that should either be handled, or at
> least detected.

Personally, I'd like to see the async quiesce piece here as well, since
that is the reason this change was proposed. I don't see a strong urgency
to move forward with it before that, especially as this could potentially
affect various non-trivial reset flows.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH V2 0/2] blk-mq: implement queue quiesce via percpu_ref for BLK_MQ_F_BLOCKING
  2020-09-02 17:52   ` Jens Axboe
  2020-09-02 18:20     ` Sagi Grimberg
@ 2020-09-03  0:35     ` Ming Lei
  2020-09-03 12:37       ` Keith Busch
  1 sibling, 1 reply; 15+ messages in thread
From: Ming Lei @ 2020-09-03  0:35 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Lai Jiangshan, Paul E . McKenney, Josh Triplett,
	Sagi Grimberg, Bart Van Assche, Johannes Thumshirn, Chao Leng,
	Christoph Hellwig

On Wed, Sep 02, 2020 at 11:52:59AM -0600, Jens Axboe wrote:
> On 9/1/20 9:11 PM, Ming Lei wrote:
> > On Tue, Aug 25, 2020 at 10:17:32PM +0800, Ming Lei wrote:
> >> Hi Jens,
> >>
> >> The 1st patch add .mq_quiesce_mutex for serializing quiesce/unquiesce,
> >> and prepares for replacing srcu with percpu_ref.
> >>
> >> The 2nd patch replaces srcu with percpu_ref.
> >>
> >> V2:
> >> 	- add .mq_quiesce_lock
> >> 	- add comment on patch 2 wrt. handling hctx_lock() failure
> >> 	- trivial patch style change
> >>
> >>
> >> Ming Lei (2):
> >>   blk-mq: serialize queue quiesce and unquiesce by mutex
> >>   blk-mq: implement queue quiesce via percpu_ref for BLK_MQ_F_BLOCKING
> >>
> >>  block/blk-core.c       |   2 +
> >>  block/blk-mq-sysfs.c   |   2 -
> >>  block/blk-mq.c         | 125 +++++++++++++++++++++++------------------
> >>  block/blk-sysfs.c      |   6 +-
> >>  include/linux/blk-mq.h |   7 ---
> >>  include/linux/blkdev.h |   6 ++
> >>  6 files changed, 82 insertions(+), 66 deletions(-)
> >>
> >> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> >> Cc: Paul E. McKenney <paulmck@kernel.org>
> >> Cc: Josh Triplett <josh@joshtriplett.org>
> >> Cc: Sagi Grimberg <sagi@grimberg.me>
> >> Cc: Bart Van Assche <bvanassche@acm.org>
> >> Cc: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
> >> Cc: Chao Leng <lengchao@huawei.com>
> >> Cc: Christoph Hellwig <hch@lst.de>
> > 
> > Hello Guys,
> > 
> > Is there any objections on the two patches? If not, I'd suggest to move> on.
> 
> Seems like the nested case is one that should either be handled, or at
> least detected.

Yeah, the 1st patch adds a mutex for handling the nested case correctly
and efficiently.

Thanks,
Ming


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH V2 0/2] blk-mq: implement queue quiesce via percpu_ref for BLK_MQ_F_BLOCKING
  2020-09-02 18:20     ` Sagi Grimberg
@ 2020-09-03  0:41       ` Ming Lei
  0 siblings, 0 replies; 15+ messages in thread
From: Ming Lei @ 2020-09-03  0:41 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Jens Axboe, linux-block, Lai Jiangshan, Paul E . McKenney,
	Josh Triplett, Bart Van Assche, Johannes Thumshirn, Chao Leng,
	Christoph Hellwig

On Wed, Sep 02, 2020 at 11:20:36AM -0700, Sagi Grimberg wrote:
> 
> > > > Hi Jens,
> > > > 
> > > > The 1st patch add .mq_quiesce_mutex for serializing quiesce/unquiesce,
> > > > and prepares for replacing srcu with percpu_ref.
> > > > 
> > > > The 2nd patch replaces srcu with percpu_ref.
> > > > 
> > > > V2:
> > > > 	- add .mq_quiesce_lock
> > > > 	- add comment on patch 2 wrt. handling hctx_lock() failure
> > > > 	- trivial patch style change
> > > > 
> > > > 
> > > > Ming Lei (2):
> > > >    blk-mq: serialize queue quiesce and unquiesce by mutex
> > > >    blk-mq: implement queue quiesce via percpu_ref for BLK_MQ_F_BLOCKING
> 
> I thought we agreed to have a little more consolidation for blocking and
> !blocking paths (move fallbacks to common paths).

Could you describe the consolidation one more time for the two paths?

BTW, the code will become a little messy if we move the queue quiesce
handling out of __blk_mq_try_issue_directly(), because there are two
conditions that trigger inserting the request into the scheduler queue:

1) hctx_lock() failure

2) blk_queue_quiesced() or blk_mq_hctx_stopped()

The former doesn't need to unlock the hctx, but the latter does; that is
why I didn't make that change, if that is the consolidation you
mentioned.

> 
> > > > 
> > > >   block/blk-core.c       |   2 +
> > > >   block/blk-mq-sysfs.c   |   2 -
> > > >   block/blk-mq.c         | 125 +++++++++++++++++++++++------------------
> > > >   block/blk-sysfs.c      |   6 +-
> > > >   include/linux/blk-mq.h |   7 ---
> > > >   include/linux/blkdev.h |   6 ++
> > > >   6 files changed, 82 insertions(+), 66 deletions(-)
> > > > 
> > > > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > > > Cc: Paul E. McKenney <paulmck@kernel.org>
> > > > Cc: Josh Triplett <josh@joshtriplett.org>
> > > > Cc: Sagi Grimberg <sagi@grimberg.me>
> > > > Cc: Bart Van Assche <bvanassche@acm.org>
> > > > Cc: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
> > > > Cc: Chao Leng <lengchao@huawei.com>
> > > > Cc: Christoph Hellwig <hch@lst.de>
> > > 
> > > Hello Guys,
> > > 
> > > Is there any objections on the two patches? If not, I'd suggest to move> on.
> > 
> > Seems like the nested case is one that should either be handled, or at
> > least detected.
> 
> Personally, I'd like to see the async quiesce piece as well here, which
> is the reason why this change was proposed. Don't see a strong urgency
> to move forward with it before that, especially as this could
> potentially affect various non-trivial reset flows.

OK, it shouldn't be hard to add the interface; I will do that in the next
version.


Thanks, 
Ming


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH V2 0/2] blk-mq: implement queue quiesce via percpu_ref for BLK_MQ_F_BLOCKING
  2020-09-03  0:35     ` Ming Lei
@ 2020-09-03 12:37       ` Keith Busch
  2020-09-03 13:10         ` Ming Lei
  0 siblings, 1 reply; 15+ messages in thread
From: Keith Busch @ 2020-09-03 12:37 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Lai Jiangshan, Paul E . McKenney,
	Josh Triplett, Sagi Grimberg, Bart Van Assche,
	Johannes Thumshirn, Chao Leng, Christoph Hellwig

On Thu, Sep 03, 2020 at 08:35:45AM +0800, Ming Lei wrote:
> On Wed, Sep 02, 2020 at 11:52:59AM -0600, Jens Axboe wrote:
> > On 9/1/20 9:11 PM, Ming Lei wrote:
> > > On Tue, Aug 25, 2020 at 10:17:32PM +0800, Ming Lei wrote:
> > >> Hi Jens,
> > >>
> > >> The 1st patch add .mq_quiesce_mutex for serializing quiesce/unquiesce,
> > >> and prepares for replacing srcu with percpu_ref.
> > >>
> > >> The 2nd patch replaces srcu with percpu_ref.
> > >>
> > >> V2:
> > >> 	- add .mq_quiesce_lock
> > >> 	- add comment on patch 2 wrt. handling hctx_lock() failure
> > >> 	- trivial patch style change
> > >>
> > >>
> > >> Ming Lei (2):
> > >>   blk-mq: serialize queue quiesce and unquiesce by mutex
> > >>   blk-mq: implement queue quiesce via percpu_ref for BLK_MQ_F_BLOCKING
> > >>
> > >>  block/blk-core.c       |   2 +
> > >>  block/blk-mq-sysfs.c   |   2 -
> > >>  block/blk-mq.c         | 125 +++++++++++++++++++++++------------------
> > >>  block/blk-sysfs.c      |   6 +-
> > >>  include/linux/blk-mq.h |   7 ---
> > >>  include/linux/blkdev.h |   6 ++
> > >>  6 files changed, 82 insertions(+), 66 deletions(-)
> > >>
> > >> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > >> Cc: Paul E. McKenney <paulmck@kernel.org>
> > >> Cc: Josh Triplett <josh@joshtriplett.org>
> > >> Cc: Sagi Grimberg <sagi@grimberg.me>
> > >> Cc: Bart Van Assche <bvanassche@acm.org>
> > >> Cc: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
> > >> Cc: Chao Leng <lengchao@huawei.com>
> > >> Cc: Christoph Hellwig <hch@lst.de>
> > > 
> > > Hello Guys,
> > > 
> > > Is there any objections on the two patches? If not, I'd suggest to move> on.
> > 
> > Seems like the nested case is one that should either be handled, or at
> > least detected.
> 
> Yeah, the 1st patch adds mutex for handling nested case correctly and efficiently.

That doesn't really do anything about handling nested quiesce/unquiesce.
It just prevents two threads from doing it at the same time, but anyone
can still undo the other's expected queue state. The following on top of
your series will at least detect the condition:

---
diff --git a/block/blk-mq.c b/block/blk-mq.c
index ef6c6fa8dab0..52b53f2bb567 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -249,6 +249,7 @@ void blk_mq_unquiesce_queue(struct request_queue *q)
 {
 	mutex_lock(&q->mq_quiesce_lock);
 
+	WARN_ON(!blk_queue_quiesced(q));
 	blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
 
 	if (q->tag_set->flags & BLK_MQ_F_BLOCKING)
--

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH V2 0/2] blk-mq: implement queue quiesce via percpu_ref for BLK_MQ_F_BLOCKING
  2020-09-03 12:37       ` Keith Busch
@ 2020-09-03 13:10         ` Ming Lei
  0 siblings, 0 replies; 15+ messages in thread
From: Ming Lei @ 2020-09-03 13:10 UTC (permalink / raw)
  To: Keith Busch
  Cc: Jens Axboe, linux-block, Lai Jiangshan, Paul E . McKenney,
	Josh Triplett, Sagi Grimberg, Bart Van Assche,
	Johannes Thumshirn, Chao Leng, Christoph Hellwig

On Thu, Sep 03, 2020 at 05:37:02AM -0700, Keith Busch wrote:
> On Thu, Sep 03, 2020 at 08:35:45AM +0800, Ming Lei wrote:
> > On Wed, Sep 02, 2020 at 11:52:59AM -0600, Jens Axboe wrote:
> > > On 9/1/20 9:11 PM, Ming Lei wrote:
> > > > On Tue, Aug 25, 2020 at 10:17:32PM +0800, Ming Lei wrote:
> > > >> Hi Jens,
> > > >>
> > > >> The 1st patch add .mq_quiesce_mutex for serializing quiesce/unquiesce,
> > > >> and prepares for replacing srcu with percpu_ref.
> > > >>
> > > >> The 2nd patch replaces srcu with percpu_ref.
> > > >>
> > > >> V2:
> > > >> 	- add .mq_quiesce_lock
> > > >> 	- add comment on patch 2 wrt. handling hctx_lock() failure
> > > >> 	- trivial patch style change
> > > >>
> > > >>
> > > >> Ming Lei (2):
> > > >>   blk-mq: serialize queue quiesce and unquiesce by mutex
> > > >>   blk-mq: implement queue quiesce via percpu_ref for BLK_MQ_F_BLOCKING
> > > >>
> > > >>  block/blk-core.c       |   2 +
> > > >>  block/blk-mq-sysfs.c   |   2 -
> > > >>  block/blk-mq.c         | 125 +++++++++++++++++++++++------------------
> > > >>  block/blk-sysfs.c      |   6 +-
> > > >>  include/linux/blk-mq.h |   7 ---
> > > >>  include/linux/blkdev.h |   6 ++
> > > >>  6 files changed, 82 insertions(+), 66 deletions(-)
> > > >>
> > > >> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > > >> Cc: Paul E. McKenney <paulmck@kernel.org>
> > > >> Cc: Josh Triplett <josh@joshtriplett.org>
> > > >> Cc: Sagi Grimberg <sagi@grimberg.me>
> > > >> Cc: Bart Van Assche <bvanassche@acm.org>
> > > >> Cc: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
> > > >> Cc: Chao Leng <lengchao@huawei.com>
> > > >> Cc: Christoph Hellwig <hch@lst.de>
> > > > 
> > > > Hello Guys,
> > > > 
> > > > Is there any objections on the two patches? If not, I'd suggest to move> on.
> > > 
> > > Seems like the nested case is one that should either be handled, or at
> > > least detected.
> > 
> > Yeah, the 1st patch adds mutex for handling nested case correctly and efficiently.
> 
> That doesn't really do anything about handling nested quiesce/unquiesce.

The mutex is required for avoiding the warning in percpu_ref_kill_and_confirm()
if two queue quiesces are nested, and I will comment on this motivation in the
next version.

> It just prevents two threads from doing it at the same time, but anyone
> can still undo the other's expected queue state. The following on top of

Right, the patch itself changes nothing wrt. this point, and it breaks nothing
either.

> your series will at least detect the condition:
> 
> ---
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index ef6c6fa8dab0..52b53f2bb567 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -249,6 +249,7 @@ void blk_mq_unquiesce_queue(struct request_queue *q)
>  {
>  	mutex_lock(&q->mq_quiesce_lock);
>  
> +	WARN_ON(!blk_queue_quiesced(q));

We can't do that as-is, because queue unquiesce may be called
unconditionally on an un-quiesced queue, for example from
nvme_dev_remove_admin(), nvme_set_queue_dying(), ...
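
If we really want a check there, the following untested sketch (using
test_and_clear_bit() directly, just for illustration) would tolerate such
callers while still pairing percpu_ref_resurrect() with an actual quiesce:

  void blk_mq_unquiesce_queue(struct request_queue *q)
  {
          mutex_lock(&q->mq_quiesce_lock);

          /* nothing to do if the queue wasn't quiesced */
          if (test_and_clear_bit(QUEUE_FLAG_QUIESCED, &q->queue_flags)) {
                  if (q->tag_set->flags & BLK_MQ_F_BLOCKING)
                          percpu_ref_resurrect(&q->dispatch_counter);

                  /* dispatch requests which are inserted during quiescing */
                  blk_mq_run_hw_queues(q, true);
          }

          mutex_unlock(&q->mq_quiesce_lock);
  }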


Thanks,
Ming


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread

Thread overview: 15+ messages
2020-08-25 14:17 [PATCH V2 0/2] blk-mq: implement queue quiesce via percpu_ref for BLK_MQ_F_BLOCKING Ming Lei
2020-08-25 14:17 ` [PATCH V2 1/2] blk-mq: serialize queue quiesce and unquiesce by mutex Ming Lei
2020-08-26  7:51   ` Chao Leng
2020-08-26  8:54     ` Ming Lei
2020-08-26 15:36       ` Keith Busch
2020-08-26 16:23         ` Sagi Grimberg
2020-08-27  2:38         ` Ming Lei
2020-08-25 14:17 ` [PATCH V2 2/2] blk-mq: implement queue quiesce via percpu_ref for BLK_MQ_F_BLOCKING Ming Lei
2020-09-02  3:11 ` [PATCH V2 0/2] " Ming Lei
2020-09-02 17:52   ` Jens Axboe
2020-09-02 18:20     ` Sagi Grimberg
2020-09-03  0:41       ` Ming Lei
2020-09-03  0:35     ` Ming Lei
2020-09-03 12:37       ` Keith Busch
2020-09-03 13:10         ` Ming Lei
