* blk-mq vs cpu hotplug performance (due to percpu_ref_put performance)
@ 2014-10-28 19:35 ` Christian Borntraeger
  0 siblings, 0 replies; 20+ messages in thread
From: Christian Borntraeger @ 2014-10-28 19:35 UTC
  To: Tejun Heo
  Cc: Kent Overstreet, Jens Axboe, Christoph Hellwig,
	linux-kernel@vger.kernel.org (Linux Kernel Mailing List),
	linux-s390

Tejun,

When going from 3.17 to 3.18-rc2, CPU hotplug became horribly slow on some KVM guests on s390.

I was able to bisect this to

commit 9eca80461a45177e456219a9cd944c27675d6512
("Revert "blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe")

This seems to be due to all the RCU grace periods on percpu_ref_put during the CPU hotplug notifiers.

This is barely noticeable on small guests (say, 1 virtio disk), but on guests with 20 disks a hotplug takes 2 or 3 seconds instead of around 0.1.
There are three things that make this especially noticeable on s390:
- s390 uses HZ=100, which makes waiting for grace periods slower
- s390 does not yet implement context tracking, which would speed up RCU
- s390 systems usually have a larger number of disks (e.g. 20 7GB disks instead of one 140GB disk)
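
(A rough back-of-the-envelope check: with HZ=100 a single RCU grace
period easily costs tens of milliseconds, and paying one serialized
grace period per disk means 20 x ~100ms, which lands right in the
2-3 sec range I am seeing.)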

Any idea how to improve the situation? I think we could accept an expedited variant on CPU hotplug, since stop_machine_run will cause hiccups anyway, but there are probably other callers.


Christian

PS: on the plus side, this makes CPU hotplug races less likely...


* Re: blk-mq vs cpu hotplug performance (due to percpu_ref_put performance)
  2014-10-28 19:35 ` Christian Borntraeger
@ 2014-10-28 20:00   ` Tejun Heo
  -1 siblings, 0 replies; 20+ messages in thread
From: Tejun Heo @ 2014-10-28 20:00 UTC
  To: Christian Borntraeger
  Cc: Kent Overstreet, Jens Axboe, Christoph Hellwig,
	linux-kernel@vger.kernel.org (Linux Kernel Mailing List),
	linux-s390

Hello,

On Tue, Oct 28, 2014 at 08:35:39PM +0100, Christian Borntraeger wrote:
> When going from 3.17 to 3.18-rc2, CPU hotplug became horribly slow on some KVM guests on s390.
> 
> I was able to bisect this to
> 
> commit 9eca80461a45177e456219a9cd944c27675d6512
> ("Revert "blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe")

That removes the earlier kludge to avoid the RCU delay, so RCU
latencies are expected to show up right after it; however, the
following patches implement a proper fix for the problem, and the
latencies shouldn't be visible afterwards.

So, 17497acbdce9 ("blk-mq, percpu_ref: start q->mq_usage_counter in
atomic mode") should remove the latencies again.  It doesn't?

Thanks.

-- 
tejun


* Re: blk-mq vs cpu hotplug performance (due to percpu_ref_put performance)
  2014-10-28 20:00   ` Tejun Heo
@ 2014-10-28 20:20     ` Christian Borntraeger
  -1 siblings, 0 replies; 20+ messages in thread
From: Christian Borntraeger @ 2014-10-28 20:20 UTC
  To: Tejun Heo
  Cc: Kent Overstreet, Jens Axboe, Christoph Hellwig,
	linux-kernel@vger.kernel.org (Linux Kernel Mailing List),
	linux-s390

On 28.10.2014 21:00, Tejun Heo wrote:
> Hello,
> 
> On Tue, Oct 28, 2014 at 08:35:39PM +0100, Christian Borntraeger wrote:
>> When going from 3.17 to 3.18-rc2, CPU hotplug became horribly slow on some KVM guests on s390.
>>
>> I was able to bisect this to
>>
>> commit 9eca80461a45177e456219a9cd944c27675d6512
>> ("Revert "blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe")
> 
> That removes the earlier kludge to avoid the RCU delay, so RCU
> latencies are expected to show up right after it; however, the
> following patches implement a proper fix for the problem, and the
> latencies shouldn't be visible afterwards.
> 
> So, 17497acbdce9 ("blk-mq, percpu_ref: start q->mq_usage_counter in
> atomic mode") should remove the latencies again.  It doesn't?

I have not verified this, but I guess what happens is:
hotplug
-> notify
-> blk_mq_queue_reinit_notify
-> blk_mq_queue_reinit
-> blk_mq_freeze_queue
-> percpu_ref_kill
-> percpu_ref_kill_and_confirm
-> __percpu_ref_switch_to_atomic
-> call_rcu_sched  

for every request queue.
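
i.e. blk_mq_queue_reinit_notify() walks all_q_list and hits the path
above once per queue - conceptually (untested sketch, not the literal
code):

	list_for_each_entry(q, &all_q_list, all_q_node)
		blk_mq_queue_reinit(q);	/* freezes the queue internally */
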
Christian



* Re: blk-mq vs cpu hotplug performance (due to percpu_ref_put performance)
  2014-10-28 20:20     ` Christian Borntraeger
@ 2014-10-28 20:22       ` Tejun Heo
  -1 siblings, 0 replies; 20+ messages in thread
From: Tejun Heo @ 2014-10-28 20:22 UTC
  To: Christian Borntraeger
  Cc: Kent Overstreet, Jens Axboe, Christoph Hellwig,
	linux-kernel@vger.kernel.org (Linux Kernel Mailing List),
	linux-s390

On Tue, Oct 28, 2014 at 09:20:55PM +0100, Christian Borntraeger wrote:
> I have not verified this, but I guess what happens is:
> hotplug
> -> notify
> -> blk_mq_queue_reinit_notify
> -> blk_mq_queue_reinit
> -> blk_mq_freeze_queue
> -> percpu_ref_kill
> -> percpu_ref_kill_and_confirm
> -> __percpu_ref_switch_to_atomic
> -> call_rcu_sched  

But call_rcu_sched() wouldn't show up as latency.  It's an async call
unlike synchronize_*().
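
I.e., simplified (callback name from memory):

	/* asynchronous - queues a callback and returns immediately */
	call_rcu_sched(&ref->rcu, percpu_ref_switch_to_atomic_rcu);

	/* synchronous - blocks the caller for a full grace period */
	synchronize_sched();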

Thanks.

-- 
tejun


* Re: blk-mq vs cpu hotplug performance (due to percpu_ref_put performance)
  2014-10-28 20:22       ` Tejun Heo
@ 2014-10-28 20:26         ` Tejun Heo
  -1 siblings, 0 replies; 20+ messages in thread
From: Tejun Heo @ 2014-10-28 20:26 UTC
  To: Christian Borntraeger
  Cc: Kent Overstreet, Jens Axboe, Christoph Hellwig,
	linux-kernel@vger.kernel.org (Linux Kernel Mailing List),
	linux-s390

On Tue, Oct 28, 2014 at 04:22:55PM -0400, Tejun Heo wrote:
> On Tue, Oct 28, 2014 at 09:20:55PM +0100, Christian Borntraeger wrote:
> > I have not verified this, but I guess what happens is:
> > hotplug
> > -> notify
> > -> blk_mq_queue_reinit_notify
> > -> blk_mq_queue_reinit
> > -> blk_mq_freeze_queue
> > -> percpu_ref_kill
> > -> percpu_ref_kill_and_confirm
> > -> __percpu_ref_switch_to_atomic
> > -> call_rcu_sched  
> 
> But call_rcu_sched() wouldn't show up as latency.  It's an async call
> unlike synchronize_*().

I got confused; percpu_ref does wait for the async grace period,
making it synchronous.  I see what you mean.  This isn't init being
slow but the freezing itself.  Hmmm... so are you seeing multiple
queues doing that back-to-back?  If so, the right thing to do would
be to make the freezing take place in parallel.  I'll look into it.

Thanks.

-- 
tejun


* Re: blk-mq vs cpu hotplug performance (due to percpu_ref_put performance)
  2014-10-28 20:22       ` Tejun Heo
@ 2014-10-28 20:29         ` Christian Borntraeger
  -1 siblings, 0 replies; 20+ messages in thread
From: Christian Borntraeger @ 2014-10-28 20:29 UTC
  To: Tejun Heo
  Cc: Kent Overstreet, Jens Axboe, Christoph Hellwig,
	linux-kernel@vger.kernel.org (Linux Kernel Mailing List),
	linux-s390

On 28.10.2014 21:22, Tejun Heo wrote:
> On Tue, Oct 28, 2014 at 09:20:55PM +0100, Christian Borntraeger wrote:
>> I have not verified this, but I guess what happens is:
>> hotplug
>> -> notify
>> -> blk_mq_queue_reinit_notify
>> -> blk_mq_queue_reinit
>> -> blk_mq_freeze_queue
>> -> percpu_ref_kill
>> -> percpu_ref_kill_and_confirm
>> -> __percpu_ref_switch_to_atomic
>> -> call_rcu_sched  
> 
> But call_rcu_sched() wouldn't show up as latency.  It's an async call
> unlike synchronize_*().

Right, but

blk_mq_freeze_queue

also contains

wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->mq_usage_counter));

Isn't that wait_event woken up at the end of the call_rcu_sched?
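
i.e. something like this, as far as I can tell (my reading of the
code, not verified):

rcu grace period elapses
-> percpu_ref_switch_to_atomic_rcu	(the call_rcu_sched callback)
-> percpu_ref_put	(drops the ref held across the switch)
-> blk_mq_usage_counter_release	(ref reached zero)
-> wake_up_all(&q->mq_freeze_wq)	(wait_event returns)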



* Re: blk-mq vs cpu hotplug performance (due to percpu_ref_put performance)
  2014-10-28 20:29         ` Christian Borntraeger
@ 2014-10-28 20:30           ` Tejun Heo
  -1 siblings, 0 replies; 20+ messages in thread
From: Tejun Heo @ 2014-10-28 20:30 UTC
  To: Christian Borntraeger
  Cc: Kent Overstreet, Jens Axboe, Christoph Hellwig,
	linux-kernel@vger.kernel.org (Linux Kernel Mailing List),
	linux-s390

On Tue, Oct 28, 2014 at 09:29:16PM +0100, Christian Borntraeger wrote:
> On 28.10.2014 21:22, Tejun Heo wrote:
> > On Tue, Oct 28, 2014 at 09:20:55PM +0100, Christian Borntraeger wrote:
> >> I have not verified this, but I guess what happens is:
> >> hotplug
> >> -> notify
> >> -> blk_mq_queue_reinit_notify
> >> -> blk_mq_queue_reinit
> >> -> blk_mq_freeze_queue
> >> -> percpu_ref_kill
> >> -> percpu_ref_kill_and_confirm
> >> -> __percpu_ref_switch_to_atomic
> >> -> call_rcu_sched  
> > 
> > But call_rcu_sched() wouldn't show up as latency.  It's an async call
> > unlike synchronize_*().
> 
> Right, but
> 
> blk_mq_freeze_queue
> 
> also contains
> 
> wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->mq_usage_counter));
> 
> Isn't that wait_event woken up at the end of the call_rcu_sched?

Yeah, yeah, I was confused.  We just need to initiate the killing for
all mqs at once and then wait for the completions.  Shouldn't be too
difficult to fix.  Will get to it soon.
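
With the freeze path split into a start half and a wait half, the
notifier could do something like (untested):

	list_for_each_entry(q, &all_q_list, all_q_node)
		blk_mq_freeze_queue_start(q);	/* kick off all kills */
	list_for_each_entry(q, &all_q_list, all_q_node)
		blk_mq_freeze_queue_wait(q);	/* grace periods overlap */

so all queues share one grace-period window instead of paying for one
each, back-to-back.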

Thanks.

-- 
tejun


* [PATCH block/for-linus] blk-mq: make mq_queue_reinit_notify() freeze queues in parallel
  2014-10-28 20:30           ` Tejun Heo
@ 2014-11-04 18:52             ` Tejun Heo
  -1 siblings, 0 replies; 20+ messages in thread
From: Tejun Heo @ 2014-11-04 18:52 UTC
  To: Jens Axboe, Christian Borntraeger
  Cc: Kent Overstreet, Christoph Hellwig,
	linux-kernel@vger.kernel.org (Linux Kernel Mailing List),
	linux-s390

q->mq_usage_counter is a percpu_ref which is killed and drained when
the queue is frozen.  On a CPU hotplug event, blk_mq_queue_reinit()
which involves freezing the queue is invoked on all existing queues.
Because percpu_ref killing and draining involve an RCU grace period,
doing the above on one queue after another may take a long time if
there are many queues on the system.

This patch splits out initiation of freezing and waiting for its
completion, and updates blk_mq_queue_reinit_notify() so that the
queues are frozen in parallel instead of one after another.  Note that
freezing and unfreezing are moved from blk_mq_queue_reinit() to
blk_mq_queue_reinit_notify().

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
Christian, can you please verify that this resolves the latency issue
that you're seeing?  Jens, can you please route this patch once
Christian confirms it?

Thanks!

 block/blk-mq.c |   41 +++++++++++++++++++++++++++++++++--------
 1 file changed, 33 insertions(+), 8 deletions(-)

--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -107,11 +107,7 @@ static void blk_mq_usage_counter_release
 	wake_up_all(&q->mq_freeze_wq);
 }
 
-/*
- * Guarantee no request is in use, so we can change any data structure of
- * the queue afterward.
- */
-void blk_mq_freeze_queue(struct request_queue *q)
+static void blk_mq_freeze_queue_start(struct request_queue *q)
 {
 	bool freeze;
 
@@ -123,9 +119,23 @@ void blk_mq_freeze_queue(struct request_
 		percpu_ref_kill(&q->mq_usage_counter);
 		blk_mq_run_queues(q, false);
 	}
+}
+
+static void blk_mq_freeze_queue_wait(struct request_queue *q)
+{
 	wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->mq_usage_counter));
 }
 
+/*
+ * Guarantee no request is in use, so we can change any data structure of
+ * the queue afterward.
+ */
+void blk_mq_freeze_queue(struct request_queue *q)
+{
+	blk_mq_freeze_queue_start(q);
+	blk_mq_freeze_queue_wait(q);
+}
+
 static void blk_mq_unfreeze_queue(struct request_queue *q)
 {
 	bool wake;
@@ -1921,7 +1931,7 @@ void blk_mq_free_queue(struct request_qu
 /* Basically redo blk_mq_init_queue with queue frozen */
 static void blk_mq_queue_reinit(struct request_queue *q)
 {
-	blk_mq_freeze_queue(q);
+	WARN_ON_ONCE(!q->mq_freeze_depth);
 
 	blk_mq_sysfs_unregister(q);
 
@@ -1936,8 +1946,6 @@ static void blk_mq_queue_reinit(struct r
 	blk_mq_map_swqueue(q);
 
 	blk_mq_sysfs_register(q);
-
-	blk_mq_unfreeze_queue(q);
 }
 
 static int blk_mq_queue_reinit_notify(struct notifier_block *nb,
@@ -1956,8 +1964,25 @@ static int blk_mq_queue_reinit_notify(st
 		return NOTIFY_OK;
 
 	mutex_lock(&all_q_mutex);
+
+	/*
+	 * We need to freeze and reinit all existing queues.  Freezing
+	 * involves synchronous wait for an RCU grace period and doing it
+	 * one by one may take a long time.  Start freezing all queues in
+	 * one swoop and then wait for the completions so that freezing can
+	 * take place in parallel.
+	 */
+	list_for_each_entry(q, &all_q_list, all_q_node)
+		blk_mq_freeze_queue_start(q);
+	list_for_each_entry(q, &all_q_list, all_q_node)
+		blk_mq_freeze_queue_wait(q);
+
 	list_for_each_entry(q, &all_q_list, all_q_node)
 		blk_mq_queue_reinit(q);
+
+	list_for_each_entry(q, &all_q_list, all_q_node)
+		blk_mq_unfreeze_queue(q);
+
 	mutex_unlock(&all_q_mutex);
 	return NOTIFY_OK;
 }


* Re: [PATCH block/for-linus] blk-mq: make mq_queue_reinit_notify() freeze queues in parallel
  2014-11-04 18:52             ` Tejun Heo
@ 2014-11-04 19:46               ` Christian Borntraeger
  -1 siblings, 0 replies; 20+ messages in thread
From: Christian Borntraeger @ 2014-11-04 19:46 UTC
  To: Tejun Heo, Jens Axboe
  Cc: Kent Overstreet, Christoph Hellwig,
	linux-kernel@vger.kernel.org (Linux Kernel Mailing List),
	linux-s390

On 04.11.2014 19:52, Tejun Heo wrote:
> q->mq_usage_counter is a percpu_ref which is killed and drained when
> the queue is frozen.  On a CPU hotplug event, blk_mq_queue_reinit()
> which involves freezing the queue is invoked on all existing queues.
> Because percpu_ref killing and draining involve an RCU grace period,
> doing the above on one queue after another may take a long time if
> there are many queues on the system.
> 
> This patch splits out initiation of freezing and waiting for its
> completion, and updates blk_mq_queue_reinit_notify() so that the
> queues are frozen in parallel instead of one after another.  Note that
> freezing and unfreezing are moved from blk_mq_queue_reinit() to
> blk_mq_queue_reinit_notify().
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>

Thanks.

> ---
> Christian, can you please verify that this resolves the latency issue
> that you're seeing?  Jens, can you please route this patch once
> Christian confirms it?
> 
> Thanks!
> 
>  block/blk-mq.c |   41 +++++++++++++++++++++++++++++++++--------
>  1 file changed, 33 insertions(+), 8 deletions(-)
> 
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -107,11 +107,7 @@ static void blk_mq_usage_counter_release
>  	wake_up_all(&q->mq_freeze_wq);
>  }
> 
> -/*
> - * Guarantee no request is in use, so we can change any data structure of
> - * the queue afterward.
> - */
> -void blk_mq_freeze_queue(struct request_queue *q)
> +static void blk_mq_freeze_queue_start(struct request_queue *q)
>  {
>  	bool freeze;
> 
> @@ -123,9 +119,23 @@ void blk_mq_freeze_queue(struct request_
>  		percpu_ref_kill(&q->mq_usage_counter);
>  		blk_mq_run_queues(q, false);
>  	}
> +}
> +
> +static void blk_mq_freeze_queue_wait(struct request_queue *q)
> +{
>  	wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->mq_usage_counter));
>  }
> 
> +/*
> + * Guarantee no request is in use, so we can change any data structure of
> + * the queue afterward.
> + */
> +void blk_mq_freeze_queue(struct request_queue *q)
> +{
> +	blk_mq_freeze_queue_start(q);
> +	blk_mq_freeze_queue_wait(q);
> +}
> +
>  static void blk_mq_unfreeze_queue(struct request_queue *q)
>  {
>  	bool wake;
> @@ -1921,7 +1931,7 @@ void blk_mq_free_queue(struct request_qu
>  /* Basically redo blk_mq_init_queue with queue frozen */
>  static void blk_mq_queue_reinit(struct request_queue *q)
>  {
> -	blk_mq_freeze_queue(q);
> +	WARN_ON_ONCE(!q->mq_freeze_depth);
> 
>  	blk_mq_sysfs_unregister(q);
> 
> @@ -1936,8 +1946,6 @@ static void blk_mq_queue_reinit(struct r
>  	blk_mq_map_swqueue(q);
> 
>  	blk_mq_sysfs_register(q);
> -
> -	blk_mq_unfreeze_queue(q);
>  }
> 
>  static int blk_mq_queue_reinit_notify(struct notifier_block *nb,
> @@ -1956,8 +1964,25 @@ static int blk_mq_queue_reinit_notify(st
>  		return NOTIFY_OK;
> 
>  	mutex_lock(&all_q_mutex);
> +
> +	/*
> +	 * We need to freeze and reinit all existing queues.  Freezing
> +	 * involves synchronous wait for an RCU grace period and doing it
> +	 * one by one may take a long time.  Start freezing all queues in
> +	 * one swoop and then wait for the completions so that freezing can
> +	 * take place in parallel.
> +	 */
> +	list_for_each_entry(q, &all_q_list, all_q_node)
> +		blk_mq_freeze_queue_start(q);
> +	list_for_each_entry(q, &all_q_list, all_q_node)
> +		blk_mq_freeze_queue_wait(q);
> +
>  	list_for_each_entry(q, &all_q_list, all_q_node)
>  		blk_mq_queue_reinit(q);
> +
> +	list_for_each_entry(q, &all_q_list, all_q_node)
> +		blk_mq_unfreeze_queue(q);
> +
>  	mutex_unlock(&all_q_mutex);
>  	return NOTIFY_OK;
>  }
> 



* Re: [PATCH block/for-linus] blk-mq: make mq_queue_reinit_notify() freeze queues in parallel
  2014-11-04 18:52             ` Tejun Heo
@ 2014-11-04 21:48               ` Jens Axboe
  -1 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2014-11-04 21:48 UTC
  To: Tejun Heo, Christian Borntraeger
  Cc: Kent Overstreet, Christoph Hellwig,
	linux-kernel@vger.kernel.org (Linux Kernel Mailing List),
	linux-s390

On 2014-11-04 11:52, Tejun Heo wrote:
> q->mq_usage_counter is a percpu_ref which is killed and drained when
> the queue is frozen.  On a CPU hotplug event, blk_mq_queue_reinit()
> which involves freezing the queue is invoked on all existing queues.
> Because percpu_ref killing and draining involve an RCU grace period,
> doing the above on one queue after another may take a long time if
> there are many queues on the system.
>
> This patch splits out initiation of freezing and waiting for its
> completion, and updates blk_mq_queue_reinit_notify() so that the
> queues are frozen in parallel instead of one after another.  Note that
> freezing and unfreezing are moved from blk_mq_queue_reinit() to
> blk_mq_queue_reinit_notify().
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
> Christian, can you please verify that this resolves the latency issue
> that you're seeing?  Jens, can you please route this patch once
> Christian confirms it?

Will queue up for 3.18.

-- 
Jens Axboe



end of thread  [~2014-11-04 21:48 UTC]

Thread overview: 10 messages
2014-10-28 19:35 blk-mq vs cpu hotplug performance (due to percpu_ref_put performance) Christian Borntraeger
2014-10-28 20:00 ` Tejun Heo
2014-10-28 20:20   ` Christian Borntraeger
2014-10-28 20:22     ` Tejun Heo
2014-10-28 20:26       ` Tejun Heo
2014-10-28 20:29       ` Christian Borntraeger
2014-10-28 20:30         ` Tejun Heo
2014-11-04 18:52           ` [PATCH block/for-linus] blk-mq: make mq_queue_reinit_notify() freeze queues in parallel Tejun Heo
2014-11-04 19:46             ` Christian Borntraeger
2014-11-04 21:48             ` Jens Axboe
