All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] block: reduce kblockd_mod_delayed_work_on() CPU consumption
@ 2021-12-14 14:53 Jens Axboe
  2021-12-14 15:04 ` Christoph Hellwig
  0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2021-12-14 14:53 UTC (permalink / raw)
  To: linux-block; +Cc: Dexuan Cui, Ming Lei

Dexuan reports that he's seeing spikes of very heavy CPU utilization when
running 24 disks and using the 'none' scheduler. This happens off the
flush path, because SCSI requires the queue to be restarted async, and
hence we're hammering on mod_delayed_work_on() to ensure that the work
item gets run appropriately.

What we care about here is that the queue is run, and we don't need to
repeatedly re-arm the timer associated with the delayed work item. If we
check if the work item is pending upfront, then we don't really need to do
anything else. This is safe as theh work pending bit is cleared before a
work item is started.

The only potential caveat here is if we have callers with wildly different
timeouts specified. That's generally not the case, so don't think we need
to care for that case.

Reported-by: Dexuan Cui <decui@microsoft.com>
Link: https://lore.kernel.org/linux-block/BYAPR21MB1270C598ED214C0490F47400BF719@BYAPR21MB1270.namprd21.prod.outlook.com/
Signed-off-by: Jens Axboe <axboe@kernel.dk>

---

diff --git a/block/blk-core.c b/block/blk-core.c
index 1378d084c770..4584fe709c15 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1484,7 +1484,16 @@ EXPORT_SYMBOL(kblockd_schedule_work);
 int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork,
 				unsigned long delay)
 {
-	return mod_delayed_work_on(cpu, kblockd_workqueue, dwork, delay);
+	/*
+	 * Avoid hammering on work addition, if the work item is already
+	 * pending. This is safe the work pending state is cleared before
+	 * the work item is started, so if we see it set, then we know that
+	 * whatever was previously queued on the block side will get run by
+	 * an existing pending work item.
+	 */
+	if (!work_pending(&dwork->work))
+		return mod_delayed_work_on(cpu, kblockd_workqueue, dwork, delay);
+	return true;
 }
 EXPORT_SYMBOL(kblockd_mod_delayed_work_on);
 
-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] block: reduce kblockd_mod_delayed_work_on() CPU consumption
  2021-12-14 14:53 [PATCH] block: reduce kblockd_mod_delayed_work_on() CPU consumption Jens Axboe
@ 2021-12-14 15:04 ` Christoph Hellwig
  2021-12-14 15:59   ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2021-12-14 15:04 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Dexuan Cui, Ming Lei, linux-scsi, Tejun Heo, Lai Jiangshan

On Tue, Dec 14, 2021 at 07:53:46AM -0700, Jens Axboe wrote:
> Dexuan reports that he's seeing spikes of very heavy CPU utilization when
> running 24 disks and using the 'none' scheduler. This happens off the
> flush path, because SCSI requires the queue to be restarted async, and
> hence we're hammering on mod_delayed_work_on() to ensure that the work
> item gets run appropriately.
> 
> What we care about here is that the queue is run, and we don't need to
> repeatedly re-arm the timer associated with the delayed work item. If we
> check if the work item is pending upfront, then we don't really need to do
> anything else. This is safe as theh work pending bit is cleared before a
> work item is started.
> 
> The only potential caveat here is if we have callers with wildly different
> timeouts specified. That's generally not the case, so don't think we need
> to care for that case.

So why not do a non-delayed queue_work for that case?  Might be good
to get the scsi and workqueue maintaines involved to understand the
issue a bit better first.

> 
> Reported-by: Dexuan Cui <decui@microsoft.com>
> Link: https://lore.kernel.org/linux-block/BYAPR21MB1270C598ED214C0490F47400BF719@BYAPR21MB1270.namprd21.prod.outlook.com/
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> 
> ---
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 1378d084c770..4584fe709c15 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1484,7 +1484,16 @@ EXPORT_SYMBOL(kblockd_schedule_work);
>  int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork,
>  				unsigned long delay)
>  {
> -	return mod_delayed_work_on(cpu, kblockd_workqueue, dwork, delay);
> +	/*
> +	 * Avoid hammering on work addition, if the work item is already
> +	 * pending. This is safe the work pending state is cleared before
> +	 * the work item is started, so if we see it set, then we know that
> +	 * whatever was previously queued on the block side will get run by
> +	 * an existing pending work item.
> +	 */
> +	if (!work_pending(&dwork->work))
> +		return mod_delayed_work_on(cpu, kblockd_workqueue, dwork, delay);
> +	return true;
>  }
>  EXPORT_SYMBOL(kblockd_mod_delayed_work_on);
>  
> -- 
> Jens Axboe
> 
---end quoted text---

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] block: reduce kblockd_mod_delayed_work_on() CPU consumption
  2021-12-14 15:04 ` Christoph Hellwig
@ 2021-12-14 15:59   ` Jens Axboe
  2021-12-14 20:42     ` Dexuan Cui
  2021-12-15 17:40     ` Bart Van Assche
  0 siblings, 2 replies; 6+ messages in thread
From: Jens Axboe @ 2021-12-14 15:59 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-block, Dexuan Cui, Ming Lei, linux-scsi, Tejun Heo, Lai Jiangshan

On 12/14/21 8:04 AM, Christoph Hellwig wrote:
> On Tue, Dec 14, 2021 at 07:53:46AM -0700, Jens Axboe wrote:
>> Dexuan reports that he's seeing spikes of very heavy CPU utilization when
>> running 24 disks and using the 'none' scheduler. This happens off the
>> flush path, because SCSI requires the queue to be restarted async, and
>> hence we're hammering on mod_delayed_work_on() to ensure that the work
>> item gets run appropriately.
>>
>> What we care about here is that the queue is run, and we don't need to
>> repeatedly re-arm the timer associated with the delayed work item. If we
>> check if the work item is pending upfront, then we don't really need to do
>> anything else. This is safe as theh work pending bit is cleared before a
>> work item is started.
>>
>> The only potential caveat here is if we have callers with wildly different
>> timeouts specified. That's generally not the case, so don't think we need
>> to care for that case.
> 
> So why not do a non-delayed queue_work for that case?  Might be good
> to get the scsi and workqueue maintaines involved to understand the
> issue a bit better first.

We can probably get by with doing just that, and just ignore if a delayed
work timer is already running.

Dexuan, can you try this one?

diff --git a/block/blk-core.c b/block/blk-core.c
index 1378d084c770..c1833f95cb97 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1484,6 +1484,8 @@ EXPORT_SYMBOL(kblockd_schedule_work);
 int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork,
 				unsigned long delay)
 {
+	if (!delay)
+		return queue_work_on(cpu, kblockd_workqueue, &dwork->work);
 	return mod_delayed_work_on(cpu, kblockd_workqueue, dwork, delay);
 }
 EXPORT_SYMBOL(kblockd_mod_delayed_work_on);

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* RE: [PATCH] block: reduce kblockd_mod_delayed_work_on() CPU consumption
  2021-12-14 15:59   ` Jens Axboe
@ 2021-12-14 20:42     ` Dexuan Cui
  2021-12-15 17:40     ` Bart Van Assche
  1 sibling, 0 replies; 6+ messages in thread
From: Dexuan Cui @ 2021-12-14 20:42 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig
  Cc: linux-block, Ming Lei, linux-scsi, Tejun Heo, Lai Jiangshan

> From: Jens Axboe <axboe@kernel.dk>
> Sent: Tuesday, December 14, 2021 8:00 AM
>  ...
> We can probably get by with doing just that, and just ignore if a delayed
> work timer is already running.
> 
> Dexuan, can you try this one?
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 1378d084c770..c1833f95cb97 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1484,6 +1484,8 @@ EXPORT_SYMBOL(kblockd_schedule_work);
>  int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork,
>  				unsigned long delay)
>  {
> +	if (!delay)
> +		return queue_work_on(cpu, kblockd_workqueue, &dwork->work);
>  	return mod_delayed_work_on(cpu, kblockd_workqueue, dwork, delay);
>  }
>  EXPORT_SYMBOL(kblockd_mod_delayed_work_on);
> 
> --
> Jens Axboe

According to my test, this patch works as efficiently as the v1.  Thanks Jens!

Thanks,
-- Dexuan

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] block: reduce kblockd_mod_delayed_work_on() CPU consumption
  2021-12-14 15:59   ` Jens Axboe
  2021-12-14 20:42     ` Dexuan Cui
@ 2021-12-15 17:40     ` Bart Van Assche
  2021-12-16  7:22       ` Ming Lei
  1 sibling, 1 reply; 6+ messages in thread
From: Bart Van Assche @ 2021-12-15 17:40 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig
  Cc: linux-block, Dexuan Cui, Ming Lei, linux-scsi, Tejun Heo, Lai Jiangshan

On 12/14/21 7:59 AM, Jens Axboe wrote:
> On 12/14/21 8:04 AM, Christoph Hellwig wrote:
>> So why not do a non-delayed queue_work for that case?  Might be good
>> to get the scsi and workqueue maintaines involved to understand the
>> issue a bit better first.
> 
> We can probably get by with doing just that, and just ignore if a delayed
> work timer is already running.
> 
> Dexuan, can you try this one?
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 1378d084c770..c1833f95cb97 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1484,6 +1484,8 @@ EXPORT_SYMBOL(kblockd_schedule_work);
>   int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork,
>   				unsigned long delay)
>   {
> +	if (!delay)
> +		return queue_work_on(cpu, kblockd_workqueue, &dwork->work);
>   	return mod_delayed_work_on(cpu, kblockd_workqueue, dwork, delay);
>   }
>   EXPORT_SYMBOL(kblockd_mod_delayed_work_on);

As Christoph already mentioned, it would be great to receive feedback from the
workqueue maintainer about this patch since I'm not aware of other kernel code
that queues delayed_work in a similar way.
Regarding the feedback from the view of the SCSI subsystem: I'd like to see the
block layer core track whether or not a queue needs to be run such that the
scsi_run_queue_async() call can be removed from scsi_end_request(). No such call
was present in the original conversion of the SCSI core from the legacy block
layer to blk-mq. See also commit d285203cf647 ("scsi: add support for a blk-mq
based I/O path.").

Bart.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] block: reduce kblockd_mod_delayed_work_on() CPU consumption
  2021-12-15 17:40     ` Bart Van Assche
@ 2021-12-16  7:22       ` Ming Lei
  0 siblings, 0 replies; 6+ messages in thread
From: Ming Lei @ 2021-12-16  7:22 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Christoph Hellwig, linux-block, Dexuan Cui,
	linux-scsi, Tejun Heo, Lai Jiangshan

On Wed, Dec 15, 2021 at 09:40:38AM -0800, Bart Van Assche wrote:
> On 12/14/21 7:59 AM, Jens Axboe wrote:
> > On 12/14/21 8:04 AM, Christoph Hellwig wrote:
> > > So why not do a non-delayed queue_work for that case?  Might be good
> > > to get the scsi and workqueue maintaines involved to understand the
> > > issue a bit better first.
> > 
> > We can probably get by with doing just that, and just ignore if a delayed
> > work timer is already running.
> > 
> > Dexuan, can you try this one?
> > 
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index 1378d084c770..c1833f95cb97 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -1484,6 +1484,8 @@ EXPORT_SYMBOL(kblockd_schedule_work);
> >   int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork,
> >   				unsigned long delay)
> >   {
> > +	if (!delay)
> > +		return queue_work_on(cpu, kblockd_workqueue, &dwork->work);
> >   	return mod_delayed_work_on(cpu, kblockd_workqueue, dwork, delay);
> >   }
> >   EXPORT_SYMBOL(kblockd_mod_delayed_work_on);
> 
> As Christoph already mentioned, it would be great to receive feedback from the
> workqueue maintainer about this patch since I'm not aware of other kernel code
> that queues delayed_work in a similar way.
> Regarding the feedback from the view of the SCSI subsystem: I'd like to see the
> block layer core track whether or not a queue needs to be run such that the
> scsi_run_queue_async() call can be removed from scsi_end_request(). No such call

scsi_run_queue_async() is just for handling restart from running out of
scsi's device queue limit, which shouldn't be hot now, and it is for
handling scsi's own queue limit.

> was present in the original conversion of the SCSI core from the legacy block
> layer to blk-mq. See also commit d285203cf647 ("scsi: add support for a blk-mq
> based I/O path.").

That isn't true, see scsi_next_command()->scsi_run_queue().


Thanks,
Ming


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-12-16  7:22 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-14 14:53 [PATCH] block: reduce kblockd_mod_delayed_work_on() CPU consumption Jens Axboe
2021-12-14 15:04 ` Christoph Hellwig
2021-12-14 15:59   ` Jens Axboe
2021-12-14 20:42     ` Dexuan Cui
2021-12-15 17:40     ` Bart Van Assche
2021-12-16  7:22       ` Ming Lei

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.