linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 0/3] blk-mq: rework queue freeze and preempt-only
@ 2018-09-20 10:18 Jianchao Wang
  2018-09-20 10:18 ` [PATCH 1/3] percpu_ref: add a new helper interface __percpu_ref_get_many Jianchao Wang
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Jianchao Wang @ 2018-09-20 10:18 UTC (permalink / raw)
  To: axboe, tj, kent.overstreet, ming.lei, bart.vanassche
  Cc: linux-block, linux-kernel

Hi all

The current queue freeze, which depends on percpu_ref_kill/reinit, has a
limitation: we have to drain the requests before unfreezing the queue.

Let's rework the queue freeze feature as follows:
1. introduce __percpu_ref_get_many.
   It is the same as the original percpu_ref_get_many, but requires callers to
   provide a sched rcu critical section. We will put the __percpu_ref_get_many
   and our own condition checking under rcu_read_lock_sched. With this new
   helper interface, we could save an extra rcu_read_lock_sched.
2. rework the blk_queue_enter as:
   
   rcu_read_lock_sched()
   if condition check true
     __percpu_ref_get_many(&q->q_usage_counter, 1)
   else
     goto wait
   rcu_read_unlock_sched()
3. use percpu_ref_switch_to_atomic/percpu to switch mode directly.

Then we could unfreeze the queue w/o draining requests.
In addition, preempt-only mode code could be simplified.
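
For reference, the freeze/unfreeze side then pairs with the above roughly as
below (condensed from patch 2; the mq_freeze_depth accounting and hw queue
runs are omitted):

   freeze:
	set_bit(BLK_QUEUE_GATE_FROZEN, &q->queue_gate);
	percpu_ref_put(&q->q_usage_counter);
	percpu_ref_switch_to_atomic(&q->q_usage_counter, NULL);

   unfreeze:
	clear_bit(BLK_QUEUE_GATE_FROZEN, &q->queue_gate);
	percpu_ref_get(&q->q_usage_counter);
	percpu_ref_switch_to_percpu(&q->q_usage_counter);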

Jianchao Wang (3):
  percpu_ref: add a new helper interface __percpu_ref_get_many
  blk-core: rework the queue freeze
  block, scsi: rework the preempt only mode

 block/blk-core.c                | 58 ++++++++++++++++++++++++++--------------------------------
 block/blk-mq-debugfs.c          |  1 -
 block/blk-mq.c                  |  8 ++++++--
 block/blk.h                     |  5 +++++
 drivers/scsi/scsi_lib.c         | 12 ++++--------
 include/linux/blkdev.h          |  5 ++---
 include/linux/percpu-refcount.h | 23 +++++++++++++++++------
 7 files changed, 60 insertions(+), 52 deletions(-)

Thanks
Jianchao 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/3] percpu_ref: add a new helper interface __percpu_ref_get_many
  2018-09-20 10:18 [RFC PATCH 0/3] blk-mq: rework queue freeze and preempt-only Jianchao Wang
@ 2018-09-20 10:18 ` Jianchao Wang
  2018-09-20 20:53   ` Tejun Heo
  2018-09-20 10:18 ` [PATCH 2/3] blk-core: rework the queue freeze Jianchao Wang
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Jianchao Wang @ 2018-09-20 10:18 UTC (permalink / raw)
  To: axboe, tj, kent.overstreet, ming.lei, bart.vanassche
  Cc: linux-block, linux-kernel

This __percpu_ref_get_many is almost the same as percpu_ref_get_many,
except that the caller needs to provide a sched rcu critical section
for it. We want to do some other condition checking under the sched
rcu lock. With this interface, one extra rcu_read_lock/unlock_sched
pair could be saved.

Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
---
 include/linux/percpu-refcount.h | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
index 009cdf3..b86e03b 100644
--- a/include/linux/percpu-refcount.h
+++ b/include/linux/percpu-refcount.h
@@ -169,21 +169,32 @@ static inline bool __ref_is_percpu(struct percpu_ref *ref,
  * @ref: percpu_ref to get
  * @nr: number of references to get
  *
- * Analogous to atomic_long_add().
- *
- * This function is safe to call as long as @ref is between init and exit.
+ * This function is the same as percpu_ref_get_many() except that the caller
+ * needs to provide a sched rcu critical section for it.
  */
-static inline void percpu_ref_get_many(struct percpu_ref *ref, unsigned long nr)
+static inline void __percpu_ref_get_many(struct percpu_ref *ref, unsigned long nr)
 {
 	unsigned long __percpu *percpu_count;
 
-	rcu_read_lock_sched();
-
 	if (__ref_is_percpu(ref, &percpu_count))
 		this_cpu_add(*percpu_count, nr);
 	else
 		atomic_long_add(nr, &ref->count);
+}
 
+/**
+ * percpu_ref_get_many - increment a percpu refcount
+ * @ref: percpu_ref to get
+ * @nr: number of references to get
+ *
+ * Analogous to atomic_long_add().
+ *
+ * This function is safe to call as long as @ref is between init and exit.
+ */
+static inline void percpu_ref_get_many(struct percpu_ref *ref, unsigned long nr)
+{
+	rcu_read_lock_sched();
+	__percpu_ref_get_many(ref, nr);
 	rcu_read_unlock_sched();
 }
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/3] blk-core: rework the queue freeze
  2018-09-20 10:18 [RFC PATCH 0/3] blk-mq: rework queue freeze and preempt-only Jianchao Wang
  2018-09-20 10:18 ` [PATCH 1/3] percpu_ref: add a new helper interface __percpu_ref_get_many Jianchao Wang
@ 2018-09-20 10:18 ` Jianchao Wang
  2018-09-20 10:18 ` [PATCH 3/3] block, scsi: rework the preempt only mode Jianchao Wang
  2018-09-20 17:06 ` [RFC PATCH 0/3] blk-mq: rework queue freeze and preempt-only Bart Van Assche
  3 siblings, 0 replies; 10+ messages in thread
From: Jianchao Wang @ 2018-09-20 10:18 UTC (permalink / raw)
  To: axboe, tj, kent.overstreet, ming.lei, bart.vanassche
  Cc: linux-block, linux-kernel

The previous queue freeze depends on percpu_ref_kill/reinit. Its
limitation is that we have to drain the q_usage_counter when
unfreezing the queue.

To improve this, we implement our own condition checking, namely
queue_gate, instead of depending on the __PERCPU_REF_DEAD flag. Then
we put both the queue_gate check and __percpu_ref_get_many under the
sched rcu lock. At the same time, we switch the percpu ref mode
between atomic and percpu with percpu_ref_switch_to_atomic/percpu.

After this, introduce the BLK_QUEUE_GATE_FROZEN bit in queue_gate to
implement the queue freeze feature. Then we could unfreeze the queue
at any time without draining the queue.

In addition, this approach makes it convenient to implement other
condition checks, such as the preempt-only mode.

Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
---
 block/blk-core.c        | 28 +++++++++++++++++-----------
 block/blk-mq.c          |  8 ++++++--
 block/blk.h             |  4 ++++
 drivers/scsi/scsi_lib.c |  2 +-
 include/linux/blkdev.h  |  2 ++
 5 files changed, 30 insertions(+), 14 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index dee56c2..f8b8fe2 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -910,6 +910,18 @@ struct request_queue *blk_alloc_queue(gfp_t gfp_mask)
 }
 EXPORT_SYMBOL(blk_alloc_queue);
 
+static inline bool blk_queue_gate_allow(struct request_queue *q,
+		blk_mq_req_flags_t flags)
+{
+	if (likely(!q->queue_gate))
+		return true;
+
+	if (test_bit(BLK_QUEUE_GATE_FROZEN, &q->queue_gate))
+		return false;
+
+	return true;
+}
+
 /**
  * blk_queue_enter() - try to increase q->q_usage_counter
  * @q: request queue pointer
@@ -922,8 +934,9 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 	while (true) {
 		bool success = false;
 
-		rcu_read_lock();
-		if (percpu_ref_tryget_live(&q->q_usage_counter)) {
+		rcu_read_lock_sched();
+		if (blk_queue_gate_allow(q, flags)) {
+			__percpu_ref_get_many(&q->q_usage_counter, 1);
 			/*
 			 * The code that sets the PREEMPT_ONLY flag is
 			 * responsible for ensuring that that flag is globally
@@ -935,7 +948,7 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 				percpu_ref_put(&q->q_usage_counter);
 			}
 		}
-		rcu_read_unlock();
+		rcu_read_unlock_sched();
 
 		if (success)
 			return 0;
@@ -943,17 +956,10 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 		if (flags & BLK_MQ_REQ_NOWAIT)
 			return -EBUSY;
 
-		/*
-		 * read pair of barrier in blk_freeze_queue_start(),
-		 * we need to order reading __PERCPU_REF_DEAD flag of
-		 * .q_usage_counter and reading .mq_freeze_depth or
-		 * queue dying flag, otherwise the following wait may
-		 * never return if the two reads are reordered.
-		 */
 		smp_rmb();
 
 		wait_event(q->mq_freeze_wq,
-			   (atomic_read(&q->mq_freeze_depth) == 0 &&
+			   (blk_queue_gate_allow(q, flags) &&
 			    (preempt || !blk_queue_preempt_only(q))) ||
 			   blk_queue_dying(q));
 		if (blk_queue_dying(q))
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 85a1c1a..fc90ad3 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -140,7 +140,9 @@ void blk_freeze_queue_start(struct request_queue *q)
 
 	freeze_depth = atomic_inc_return(&q->mq_freeze_depth);
 	if (freeze_depth == 1) {
-		percpu_ref_kill(&q->q_usage_counter);
+		set_bit(BLK_QUEUE_GATE_FROZEN, &q->queue_gate);
+		percpu_ref_put(&q->q_usage_counter);
+		percpu_ref_switch_to_atomic(&q->q_usage_counter, NULL);
 		if (q->mq_ops)
 			blk_mq_run_hw_queues(q, false);
 	}
@@ -198,7 +200,9 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
 	freeze_depth = atomic_dec_return(&q->mq_freeze_depth);
 	WARN_ON_ONCE(freeze_depth < 0);
 	if (!freeze_depth) {
-		percpu_ref_reinit(&q->q_usage_counter);
+		clear_bit(BLK_QUEUE_GATE_FROZEN, &q->queue_gate);
+		percpu_ref_get(&q->q_usage_counter);
+		percpu_ref_switch_to_percpu(&q->q_usage_counter);
 		wake_up_all(&q->mq_freeze_wq);
 	}
 }
diff --git a/block/blk.h b/block/blk.h
index 9db4e38..19d2c00 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -19,6 +19,10 @@
 extern struct dentry *blk_debugfs_root;
 #endif
 
+enum blk_queue_gate_flag_t {
+	BLK_QUEUE_GATE_FROZEN,
+};
+
 struct blk_flush_queue {
 	unsigned int		flush_queue_delayed:1;
 	unsigned int		flush_pending_idx:1;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 0adfb3b..1980648 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -3066,7 +3066,7 @@ scsi_device_quiesce(struct scsi_device *sdev)
 	 * unfreeze even if the queue was already frozen before this function
 	 * was called. See also https://lwn.net/Articles/573497/.
 	 */
-	synchronize_rcu();
+	synchronize_sched();
 	blk_mq_unfreeze_queue(q);
 
 	mutex_lock(&sdev->state_mutex);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d6869e0..9f3f0d7 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -647,6 +647,8 @@ struct request_queue {
 	struct rcu_head		rcu_head;
 	wait_queue_head_t	mq_freeze_wq;
 	struct percpu_ref	q_usage_counter;
+	unsigned long		queue_gate;
+
 	struct list_head	all_q_node;
 
 	struct blk_mq_tag_set	*tag_set;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/3] block, scsi: rework the preempt only mode
  2018-09-20 10:18 [RFC PATCH 0/3] blk-mq: rework queue freeze and preempt-only Jianchao Wang
  2018-09-20 10:18 ` [PATCH 1/3] percpu_ref: add a new helper interface __percpu_ref_get_many Jianchao Wang
  2018-09-20 10:18 ` [PATCH 2/3] blk-core: rework the queue freeze Jianchao Wang
@ 2018-09-20 10:18 ` Jianchao Wang
  2018-09-20 17:06 ` [RFC PATCH 0/3] blk-mq: rework queue freeze and preempt-only Bart Van Assche
  3 siblings, 0 replies; 10+ messages in thread
From: Jianchao Wang @ 2018-09-20 10:18 UTC (permalink / raw)
  To: axboe, tj, kent.overstreet, ming.lei, bart.vanassche
  Cc: linux-block, linux-kernel

Migrate the preempt-only mode from queue_flags to queue_gate. Because
the queue_gate check and __percpu_ref_get_many are under the same
sched rcu critical section, we don't need an extra synchronize_sched
after blk_mq_freeze_queue any more.

Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
---
 block/blk-core.c        | 32 ++++++++++----------------------
 block/blk-mq-debugfs.c  |  1 -
 block/blk.h             |  1 +
 drivers/scsi/scsi_lib.c | 12 ++++--------
 include/linux/blkdev.h  |  3 ---
 5 files changed, 15 insertions(+), 34 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index f8b8fe2..c8d642a 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -429,13 +429,13 @@ EXPORT_SYMBOL(blk_sync_queue);
  */
 int blk_set_preempt_only(struct request_queue *q)
 {
-	return blk_queue_flag_test_and_set(QUEUE_FLAG_PREEMPT_ONLY, q);
+	return test_and_set_bit(BLK_QUEUE_GATE_PREEMPT_ONLY, &q->queue_gate);
 }
 EXPORT_SYMBOL_GPL(blk_set_preempt_only);
 
 void blk_clear_preempt_only(struct request_queue *q)
 {
-	blk_queue_flag_clear(QUEUE_FLAG_PREEMPT_ONLY, q);
+	clear_bit(BLK_QUEUE_GATE_PREEMPT_ONLY, &q->queue_gate);
 	wake_up_all(&q->mq_freeze_wq);
 }
 EXPORT_SYMBOL_GPL(blk_clear_preempt_only);
@@ -919,6 +919,10 @@ static inline bool blk_queue_gate_allow(struct request_queue *q,
 	if (test_bit(BLK_QUEUE_GATE_FROZEN, &q->queue_gate))
 		return false;
 
+	if (test_bit(BLK_QUEUE_GATE_PREEMPT_ONLY, &q->queue_gate) &&
+			!(flags & BLK_MQ_REQ_PREEMPT))
+		return false;
+
 	return true;
 }
 
@@ -929,39 +933,23 @@ static inline bool blk_queue_gate_allow(struct request_queue *q,
  */
 int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 {
-	const bool preempt = flags & BLK_MQ_REQ_PREEMPT;
-
 	while (true) {
-		bool success = false;
-
 		rcu_read_lock_sched();
 		if (blk_queue_gate_allow(q, flags)) {
 			__percpu_ref_get_many(&q->q_usage_counter, 1);
-			/*
-			 * The code that sets the PREEMPT_ONLY flag is
-			 * responsible for ensuring that that flag is globally
-			 * visible before the queue is unfrozen.
-			 */
-			if (preempt || !blk_queue_preempt_only(q)) {
-				success = true;
-			} else {
-				percpu_ref_put(&q->q_usage_counter);
-			}
+			rcu_read_unlock_sched();
+			return 0;
 		}
 		rcu_read_unlock_sched();
 
-		if (success)
-			return 0;
-
 		if (flags & BLK_MQ_REQ_NOWAIT)
 			return -EBUSY;
 
 		smp_rmb();
 
 		wait_event(q->mq_freeze_wq,
-			   (blk_queue_gate_allow(q, flags) &&
-			    (preempt || !blk_queue_preempt_only(q))) ||
-			   blk_queue_dying(q));
+				blk_queue_gate_allow(q, flags) ||
+				blk_queue_dying(q));
 		if (blk_queue_dying(q))
 			return -ENODEV;
 	}
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index cb1e6cf..4174951 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -132,7 +132,6 @@ static const char *const blk_queue_flag_name[] = {
 	QUEUE_FLAG_NAME(REGISTERED),
 	QUEUE_FLAG_NAME(SCSI_PASSTHROUGH),
 	QUEUE_FLAG_NAME(QUIESCED),
-	QUEUE_FLAG_NAME(PREEMPT_ONLY),
 };
 #undef QUEUE_FLAG_NAME
 
diff --git a/block/blk.h b/block/blk.h
index 19d2c00..e38060b 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -21,6 +21,7 @@ extern struct dentry *blk_debugfs_root;
 
 enum blk_queue_gate_flag_t {
 	BLK_QUEUE_GATE_FROZEN,
+	BLK_QUEUE_GATE_PREEMPT_ONLY,
 };
 
 struct blk_flush_queue {
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 1980648..3c9c33a 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -3058,17 +3058,13 @@ scsi_device_quiesce(struct scsi_device *sdev)
 	WARN_ON_ONCE(sdev->quiesced_by && sdev->quiesced_by != current);
 
 	blk_set_preempt_only(q);
-
 	blk_mq_freeze_queue(q);
+	blk_mq_unfreeze_queue(q);
 	/*
-	 * Ensure that the effect of blk_set_preempt_only() will be visible
-	 * for percpu_ref_tryget() callers that occur after the queue
-	 * unfreeze even if the queue was already frozen before this function
-	 * was called. See also https://lwn.net/Articles/573497/.
+	 * When we reach here:
+	 *  - PREEMPT_ONLY gate flag is globally visible
+	 *  - no non-preempt request in queue
 	 */
-	synchronize_sched();
-	blk_mq_unfreeze_queue(q);
-
 	mutex_lock(&sdev->state_mutex);
 	err = scsi_device_set_state(sdev, SDEV_QUIESCE);
 	if (err == 0)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 9f3f0d7..caa73b8 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -700,7 +700,6 @@ struct request_queue {
 #define QUEUE_FLAG_REGISTERED  26	/* queue has been registered to a disk */
 #define QUEUE_FLAG_SCSI_PASSTHROUGH 27	/* queue supports SCSI commands */
 #define QUEUE_FLAG_QUIESCED    28	/* queue has been quiesced */
-#define QUEUE_FLAG_PREEMPT_ONLY	29	/* only process REQ_PREEMPT requests */
 
 #define QUEUE_FLAG_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_SAME_COMP)	|	\
@@ -738,8 +737,6 @@ bool blk_queue_flag_test_and_clear(unsigned int flag, struct request_queue *q);
 	((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
 			     REQ_FAILFAST_DRIVER))
 #define blk_queue_quiesced(q)	test_bit(QUEUE_FLAG_QUIESCED, &(q)->queue_flags)
-#define blk_queue_preempt_only(q)				\
-	test_bit(QUEUE_FLAG_PREEMPT_ONLY, &(q)->queue_flags)
 #define blk_queue_fua(q)	test_bit(QUEUE_FLAG_FUA, &(q)->queue_flags)
 
 extern int blk_set_preempt_only(struct request_queue *q);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 0/3] blk-mq: rework queue freeze and preempt-only
  2018-09-20 10:18 [RFC PATCH 0/3] blk-mq: rework queue freeze and preempt-only Jianchao Wang
                   ` (2 preceding siblings ...)
  2018-09-20 10:18 ` [PATCH 3/3] block, scsi: rework the preempt only mode Jianchao Wang
@ 2018-09-20 17:06 ` Bart Van Assche
  2018-09-21  1:51   ` jianchao.wang
  3 siblings, 1 reply; 10+ messages in thread
From: Bart Van Assche @ 2018-09-20 17:06 UTC (permalink / raw)
  To: Jianchao Wang, axboe, tj, kent.overstreet, ming.lei, bart.vanassche
  Cc: linux-block, linux-kernel

On Thu, 2018-09-20 at 18:18 +0800, Jianchao Wang wrote:
> The current queue freeze depending on percpu_ref_kill/reinit has a limit that
> we have to drain the requests before unfreezing the queue.
> 
> Let's rework the queue freeze feature as following:
> 1. introduce __percpu_ref_get_many.
>    It is same with original percpu_ref_get_many, but just need callers to provide
>    sched rcu critical section. We will put the __percpu_ref_get_many and our own
>    condition checking under rcu_read_lock_sched. With this new helper interface,
>    we could save an extra rcu_read_lock_sched.
> 2. rework the blk_queue_enter as:
>    
>    rcu_read_lock_sched()
>    if condition check true
>      __percpu_ref_get_many(&q->q_usage_counter, 1)
>    else
>      goto wait
>    rcu_read_unlock_sched()
> 3. use percpu_ref_switch_to_atomic/percpu to switch mode directly.
> 
> Then we could unfreeze the queue w/o draining requests.
> In addition, preempt-only mode code could be simplified.

Hello Jianchao,
Thanks for having taken a look. However, the approach of this patch series
may be more complicated than necessary. Had you considered something like
the patch below?

Thanks,

Bart.


diff --git a/block/blk-mq.c b/block/blk-mq.c
index 20fdd78b75c7..d384ab700afd 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -199,7 +199,7 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
 	freeze_depth = atomic_dec_return(&q->mq_freeze_depth);
 	WARN_ON_ONCE(freeze_depth < 0);
 	if (!freeze_depth) {
-		percpu_ref_reinit(&q->q_usage_counter);
+		percpu_ref_resurrect(&q->q_usage_counter);
 		wake_up_all(&q->mq_freeze_wq);
 	}
 }
diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
index 009cdf3d65b6..b297cd1cd4f1 100644
--- a/include/linux/percpu-refcount.h
+++ b/include/linux/percpu-refcount.h
@@ -108,6 +108,7 @@ void percpu_ref_switch_to_atomic_sync(struct percpu_ref *ref);
 void percpu_ref_switch_to_percpu(struct percpu_ref *ref);
 void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
 				 percpu_ref_func_t *confirm_kill);
+void percpu_ref_resurrect(struct percpu_ref *ref);
 void percpu_ref_reinit(struct percpu_ref *ref);
 
 /**
diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
index 9f96fa7bc000..054e7650cd35 100644
--- a/lib/percpu-refcount.c
+++ b/lib/percpu-refcount.c
@@ -356,11 +356,31 @@ EXPORT_SYMBOL_GPL(percpu_ref_kill_and_confirm);
  */
 void percpu_ref_reinit(struct percpu_ref *ref)
 {
+	WARN_ON_ONCE(!percpu_ref_is_zero(ref));
+
+	percpu_ref_resurrect(ref);
+}
+EXPORT_SYMBOL_GPL(percpu_ref_reinit);
+
+/**
+ * percpu_ref_resurrect - modify a percpu refcount from dead to live
+ * @ref: percpu_ref to resurrect
+ *
+ * Modify @ref so that it's in the same state as before percpu_ref_kill() was
+ * called. @ref must have been dead but not exited.
+ *
+ * Note that percpu_ref_tryget[_live]() are safe to perform on @ref while
+ * this function is in progress.
+ */
+void percpu_ref_resurrect(struct percpu_ref *ref)
+{
+	unsigned long __percpu *percpu_count;
 	unsigned long flags;
 
 	spin_lock_irqsave(&percpu_ref_switch_lock, flags);
 
-	WARN_ON_ONCE(!percpu_ref_is_zero(ref));
+	WARN_ON_ONCE(!(ref->percpu_count_ptr & __PERCPU_REF_DEAD));
+	WARN_ON_ONCE(__ref_is_percpu(ref, &percpu_count));
 
 	ref->percpu_count_ptr &= ~__PERCPU_REF_DEAD;
 	percpu_ref_get(ref);
@@ -368,4 +388,4 @@ void percpu_ref_reinit(struct percpu_ref *ref)
 
 	spin_unlock_irqrestore(&percpu_ref_switch_lock, flags);
 }
-EXPORT_SYMBOL_GPL(percpu_ref_reinit);
+EXPORT_SYMBOL_GPL(percpu_ref_resurrect);

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] percpu_ref: add a new helper interface __percpu_ref_get_many
  2018-09-20 10:18 ` [PATCH 1/3] percpu_ref: add a new helper interface __percpu_ref_get_many Jianchao Wang
@ 2018-09-20 20:53   ` Tejun Heo
  2018-09-21  1:45     ` jianchao.wang
  0 siblings, 1 reply; 10+ messages in thread
From: Tejun Heo @ 2018-09-20 20:53 UTC (permalink / raw)
  To: Jianchao Wang
  Cc: axboe, kent.overstreet, ming.lei, bart.vanassche, linux-block,
	linux-kernel

Hello,

On Thu, Sep 20, 2018 at 06:18:21PM +0800, Jianchao Wang wrote:
> -static inline void percpu_ref_get_many(struct percpu_ref *ref, unsigned long nr)
> +static inline void __percpu_ref_get_many(struct percpu_ref *ref, unsigned long nr)
>  {
>  	unsigned long __percpu *percpu_count;
>  
> -	rcu_read_lock_sched();

So, if we're gonna do this (please read below tho), please add the
matching assertion

>  	if (__ref_is_percpu(ref, &percpu_count))
>  		this_cpu_add(*percpu_count, nr);
>  	else
>  		atomic_long_add(nr, &ref->count);
> +}
>  
> +/**
> + * percpu_ref_get_many - increment a percpu refcount
> + * @ref: percpu_ref to get
> + * @nr: number of references to get
> + *
> + * Analogous to atomic_long_add().
> + *
> + * This function is safe to call as long as @ref is between init and exit.
> + */
> +static inline void percpu_ref_get_many(struct percpu_ref *ref, unsigned long nr)
> +{
> +	rcu_read_lock_sched();
> +	__percpu_ref_get_many(ref, nr);
>  	rcu_read_unlock_sched();
>  }

And add the matching variant for get/put with and without _many.

Ming, so, if we make locking explicit like above, I think it should be
fine to share the locking.  However, please note that percpu_ref and
blk_mq are using different types of RCU, at least for now, and I'm not
really sure that unifying that and taking out one rcu read lock/unlock
is a meaningful optimization.

Let's please first do something straight-forward.  If somebody can
show that this actually impacts performance, we can optimize it but
right now all these seem premature to me.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] percpu_ref: add a new helper interface __percpu_ref_get_many
  2018-09-20 20:53   ` Tejun Heo
@ 2018-09-21  1:45     ` jianchao.wang
  0 siblings, 0 replies; 10+ messages in thread
From: jianchao.wang @ 2018-09-21  1:45 UTC (permalink / raw)
  To: Tejun Heo
  Cc: axboe, kent.overstreet, ming.lei, bart.vanassche, linux-block,
	linux-kernel

Hi Tejun

Thanks for your kind response.

On 09/21/2018 04:53 AM, Tejun Heo wrote:
> Hello,
> 
> On Thu, Sep 20, 2018 at 06:18:21PM +0800, Jianchao Wang wrote:
>> -static inline void percpu_ref_get_many(struct percpu_ref *ref, unsigned long nr)
>> +static inline void __percpu_ref_get_many(struct percpu_ref *ref, unsigned long nr)
>>  {
>>  	unsigned long __percpu *percpu_count;
>>  
>> -	rcu_read_lock_sched();
> 
> So, if we're gonna do this (please read below tho), please add the
> matching assertion

Yes, I will.
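Something like the following sketch, I think (just to confirm what you
mean; the lockdep message text is illustrative):

	static inline void __percpu_ref_get_many(struct percpu_ref *ref,
						 unsigned long nr)
	{
		unsigned long __percpu *percpu_count;

		/* the caller must provide the sched rcu critical section */
		RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held(),
				 "__percpu_ref_get_many() needs rcu_read_lock_sched()");

		if (__ref_is_percpu(ref, &percpu_count))
			this_cpu_add(*percpu_count, nr);
		else
			atomic_long_add(nr, &ref->count);
	}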

> 
>>  	if (__ref_is_percpu(ref, &percpu_count))
>>  		this_cpu_add(*percpu_count, nr);
>>  	else
>>  		atomic_long_add(nr, &ref->count);
>> +}
>>  
>> +/**
>> + * percpu_ref_get_many - increment a percpu refcount
>> + * @ref: percpu_ref to get
>> + * @nr: number of references to get
>> + *
>> + * Analogous to atomic_long_add().
>> + *
>> + * This function is safe to call as long as @ref is between init and exit.
>> + */
>> +static inline void percpu_ref_get_many(struct percpu_ref *ref, unsigned long nr)
>> +{
>> +	rcu_read_lock_sched();
>> +	__percpu_ref_get_many(ref, nr);
>>  	rcu_read_unlock_sched();
>>  }
> 
> And add the matching variant for get/put with and without _many.

Yes.
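E.g. a put side counterpart along these lines (sketch only; the existing
percpu_ref_put_many would then become a wrapper that takes
rcu_read_lock_sched() around it, same as on the get side):

	static inline void __percpu_ref_put_many(struct percpu_ref *ref,
						 unsigned long nr)
	{
		unsigned long __percpu *percpu_count;

		RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held(),
				 "__percpu_ref_put_many() needs rcu_read_lock_sched()");

		if (__ref_is_percpu(ref, &percpu_count))
			this_cpu_sub(*percpu_count, nr);
		else if (unlikely(atomic_long_sub_and_test(nr, &ref->count)))
			ref->release(ref);
	}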

> 
> Ming, so, if we make locking explicit like above, I think it should be
> fine to share the locking.  However, please note that percpu_ref and
> blk_mq are using different types of RCU, at least for now, and I'm not
> really sure that unifying that and taking out one rcu read lock/unlock
> is a meaningful optimization.
> 

Essentially, I want to rework the current queue freeze, which depends on
percpu_ref_kill/reinit. We implement our own condition checking instead of
using the __PERCPU_REF_DEAD check in percpu_ref_tryget_live, and use
percpu_ref_switch_to_atomic/percpu to switch the percpu ref mode between
percpu and atomic directly. Then it will be more convenient to implement:
 - unfreeze the queue without draining requests.
 - check whether q->q_usage_counter is zero
 - add other gate conditions into blk_queue_enter, such as preempt-only

So I want to put the condition checking and percpu_ref_get_many into the
same sched rcu critical section.
 rcu_read_lock_sched()
   if condition check true
     percpu_ref_get_many(&q->q_usage_counter, 1)
   else
     goto wait
   rcu_read_unlock_sched()

Then we could freeze the queue like:
  set FROZEN flag on q
  percpu_ref_put(1)
  percpu_ref_switch_to_atomic
Otherwise, we may need a synchronize_rcu.
It is not for performance.

Is there any reason why blk_queue_enter could only use the regular rcu lock
instead of the sched rcu lock?


> Let's please first do something straight-forward.  If somebody can
> show that this actually impacts performance, we can optimize it but
> right now all these seem premature to me.

Adding this interface is just for saving an extra rcu_read_lock/unlock_sched pair.
If it doesn't make any sense, it is OK for me to discard it. :)

Thanks
Jianchao

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 0/3] blk-mq: rework queue freeze and preempt-only
  2018-09-20 17:06 ` [RFC PATCH 0/3] blk-mq: rework queue freeze and preempt-only Bart Van Assche
@ 2018-09-21  1:51   ` jianchao.wang
  2018-09-21  2:13     ` Bart Van Assche
  0 siblings, 1 reply; 10+ messages in thread
From: jianchao.wang @ 2018-09-21  1:51 UTC (permalink / raw)
  To: Bart Van Assche, axboe, tj, kent.overstreet, ming.lei, bart.vanassche
  Cc: linux-block, linux-kernel

Hi Bart

On 09/21/2018 01:06 AM, Bart Van Assche wrote:
> On Thu, 2018-09-20 at 18:18 +0800, Jianchao Wang wrote:
>> The current queue freeze depending on percpu_ref_kill/reinit has a limit that
>> we have to drain the requests before unfreezing the queue.
>>
>> Let's rework the queue freeze feature as following:
>> 1. introduce __percpu_ref_get_many.
>>    It is same with original percpu_ref_get_many, but just need callers to provide
>>    sched rcu critical section. We will put the __percpu_ref_get_many and our own
>>    condition checking under rcu_read_lock_sched. With this new helper interface,
>>    we could save an extra rcu_read_lock_sched.
>> 2. rework the blk_queue_enter as:
>>    
>>    rcu_read_lock_sched()
>>    if condition check true
>>      __percpu_ref_get_many(&q->q_usage_counter, 1)
>>    else
>>      goto wait
>>    rcu_read_unlock_sched()
>> 3. use percpu_ref_switch_to_atomic/percpu to switch mode directly.
>>
>> Then we could unfreeze the queue w/o draining requests.
>> In addition, preempt-only mode code could be simplified.
> Hello Jianchao,
> Thanks for having taken a look. However, the approach of this patch series
> may be more complicated than necessary. Had you considered something like
> the patch below?

This patchset is just an attempt to make our life easier w/o changing anything in the percpu-ref code. :)
In fact, the 1st patch, which introduces a new non-functional-changing helper interface in
the percpu-ref code, is not necessary.

This patchset just implements our own condition checking in blk_queue_enter instead of depending on
the __PERCPU_REF_DEAD check in percpu_ref_tryget_live.

Then we could do more things here, like:
 - unfreeze the queue without draining requests.
 - check whether q->q_usage_counter is zero
 - add other gate conditions into blk_queue_enter, such as preempt-only, even any others.

So why not try it?

Thanks
Jianchao

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 0/3] blk-mq: rework queue freeze and preempt-only
  2018-09-21  1:51   ` jianchao.wang
@ 2018-09-21  2:13     ` Bart Van Assche
  2018-09-21  2:24       ` jianchao.wang
  0 siblings, 1 reply; 10+ messages in thread
From: Bart Van Assche @ 2018-09-21  2:13 UTC (permalink / raw)
  To: jianchao.wang, axboe, tj, kent.overstreet, ming.lei, bart.vanassche
  Cc: linux-block, linux-kernel

On 9/20/18 6:51 PM, jianchao.wang wrote:
> Hi Bart
> 
> On 09/21/2018 01:06 AM, Bart Van Assche wrote:
>> On Thu, 2018-09-20 at 18:18 +0800, Jianchao Wang wrote:
>>> The current queue freeze depending on percpu_ref_kill/reinit has a limit that
>>> we have to drain the requests before unfreezing the queue.
>>>
>>> Let's rework the queue freeze feature as following:
>>> 1. introduce __percpu_ref_get_many.
>>>     It is same with original percpu_ref_get_many, but just need callers to provide
>>>     sched rcu critical section. We will put the __percpu_ref_get_many and our own
>>>     condition checking under rcu_read_lock_sched. With this new helper interface,
>>>     we could save an extra rcu_read_lock_sched.
>>> 2. rework the blk_queue_enter as:
>>>     
>>>     rcu_read_lock_sched()
>>>     if condition check true
>>>       __percpu_ref_get_many(&q->q_usage_counter, 1)
>>>     else
>>>       goto wait
>>>     rcu_read_unlock_sched()
>>> 3. use percpu_ref_switch_to_atomic/percpu to switch mode directly.
>>>
>>> Then we could unfreeze the queue w/o draining requests.
>>> In addition, preempt-only mode code could be simplified.
>> Hello Jianchao,
>> Thanks for having taken a look. However, the approach of this patch series
>> may be more complicated than necessary. Had you considered something like
>> the patch below?
> 
> This patch is just a trying that let our life easier w/o changing anything in percpu-ref code. :)
> In fact, the 1st patch which introduces a new non-functional-changing helper interface in
> percpu-ref code is not necessary.
> 
> This patchset just implement our own condition checking in blk_queue_enter instead of depending on
> __PERCPU_REF_DEAD checking in percpu_ref_tryget_live.
> 
> Then we could do more thing here, like:
>   - unfreeze the queue without draining requests.
>   - check whether q->q_usage_counter is zero
>   - add other gate conditions into blk_queue_enter, such as preempt-only, even any others.
> 
> So why not try it ?

Hello Jianchao,

Some time ago Tejun wrote that he wants to keep the dependency of the 
percpu_ref implementation on rcu-sched inside the percpu_ref 
implementation because he would like to have the freedom to switch to 
regular RCU. Your patch series makes the dependency of percpu_ref 
counters on rcu-sched visible outside the percpu_ref implementation. I 
think that's a disadvantage of your approach. Tejun, please correct me 
if I misunderstood you.

Bart.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 0/3] blk-mq: rework queue freeze and preempt-only
  2018-09-21  2:13     ` Bart Van Assche
@ 2018-09-21  2:24       ` jianchao.wang
  0 siblings, 0 replies; 10+ messages in thread
From: jianchao.wang @ 2018-09-21  2:24 UTC (permalink / raw)
  To: Bart Van Assche, axboe, tj, kent.overstreet, ming.lei, bart.vanassche
  Cc: linux-block, linux-kernel


Hi Bart

On 09/21/2018 10:13 AM, Bart Van Assche wrote:
> On 9/20/18 6:51 PM, jianchao.wang wrote:
>> Hi Bart
>>
>> On 09/21/2018 01:06 AM, Bart Van Assche wrote:
>>> On Thu, 2018-09-20 at 18:18 +0800, Jianchao Wang wrote:
>>>> The current queue freeze depending on percpu_ref_kill/reinit has a limit that
>>>> we have to drain the requests before unfreezing the queue.
>>>>
>>>> Let's rework the queue freeze feature as following:
>>>> 1. introduce __percpu_ref_get_many.
>>>>     It is same with original percpu_ref_get_many, but just need callers to provide
>>>>     sched rcu critical section. We will put the __percpu_ref_get_many and our own
>>>>     condition checking under rcu_read_lock_sched. With this new helper interface,
>>>>     we could save an extra rcu_read_lock_sched.
>>>> 2. rework the blk_queue_enter as:
>>>>         rcu_read_lock_sched()
>>>>     if condition check true
>>>>       __percpu_ref_get_many(&q->q_usage_counter, 1)
>>>>     else
>>>>       goto wait
>>>>     rcu_read_unlock_sched()
>>>> 3. use percpu_ref_switch_to_atomic/percpu to switch mode directly.
>>>>
>>>> Then we could unfreeze the queue w/o draining requests.
>>>> In addition, preempt-only mode code could be simplified.
>>> Hello Jianchao,
>>> Thanks for having taken a look. However, the approach of this patch series
>>> may be more complicated than necessary. Had you considered something like
>>> the patch below?
>>
>> This patch is just a trying that let our life easier w/o changing anything in percpu-ref code. :)
>> In fact, the 1st patch which introduces a new non-functional-changing helper interface in
>> percpu-ref code is not necessary.
>>
>> This patchset just implement our own condition checking in blk_queue_enter instead of depending on
>> __PERCPU_REF_DEAD checking in percpu_ref_tryget_live.
>>
>> Then we could do more thing here, like:
>>   - unfreeze the queue without draining requests.
>>   - check whether q->q_usage_counter is zero
>>   - add other gate conditions into blk_queue_enter, such as preempt-only, even any others.
>>
>> So why not try it ?
> 
> Hello Jianchao,
> 
> Some time ago Tejun wrote that he wants to keep the dependency of the percpu_ref implementation on rcu-sched inside the percpu_ref implementation because he would like to have the freedom to switch to regular RCU. Your patch series makes the dependency of percpu_ref counters on rcu-sched visible outside the percpu_ref implementation. I think that's a disadvantage of your approach. Tejun, please correct me if I misunderstood you.

It would be easy to fix this by adding some other interfaces, like:

static inline void percpu_ref_lock(void)
{
	rcu_read_lock_sched();
}
static inline void percpu_ref_unlock(void)
{
	rcu_read_unlock_sched();
}
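
blk_queue_enter() would then look roughly like (sketch, wait path omitted):

	percpu_ref_lock();
	if (blk_queue_gate_allow(q, flags))
		__percpu_ref_get_many(&q->q_usage_counter, 1);
	percpu_ref_unlock();

so the choice of RCU flavour stays inside the percpu-ref code.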

But it is up to Tejun whether he agrees to do it like this.
 
Thanks
Jianchao
 

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-09-21  2:24 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-20 10:18 [RFC PATCH 0/3] blk-mq: rework queue freeze and preempt-only Jianchao Wang
2018-09-20 10:18 ` [PATCH 1/3] percpu_ref: add a new helper interface __percpu_ref_get_many Jianchao Wang
2018-09-20 20:53   ` Tejun Heo
2018-09-21  1:45     ` jianchao.wang
2018-09-20 10:18 ` [PATCH 2/3] blk-core: rework the queue freeze Jianchao Wang
2018-09-20 10:18 ` [PATCH 3/3] block, scsi: rework the preempt only mode Jianchao Wang
2018-09-20 17:06 ` [RFC PATCH 0/3] blk-mq: rework queue freeze and preempt-only Bart Van Assche
2018-09-21  1:51   ` jianchao.wang
2018-09-21  2:13     ` Bart Van Assche
2018-09-21  2:24       ` jianchao.wang
