linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH] workqueue: Unbind workers before sending them to exit()
@ 2022-07-19 16:57 Valentin Schneider
  2022-07-20 17:54 ` Marcelo Tosatti
  2022-07-20 18:03 ` Tejun Heo
  0 siblings, 2 replies; 14+ messages in thread
From: Valentin Schneider @ 2022-07-19 16:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Tejun Heo, Lai Jiangshan, Peter Zijlstra, Frederic Weisbecker,
	Juri Lelli, Phil Auld, Marcelo Tosatti

It has been reported that isolated CPUs can suffer from interference due to
per-CPU kworkers waking up just to die.

A surge of workqueue activity (sleeping workfn's exacerbate this) during
initial setup can cause extra per-CPU kworkers to be spawned. Then, a
latency-sensitive task can be running merrily on an isolated CPU only to be
interrupted sometime later by a kworker marked for death (cf.
IDLE_WORKER_TIMEOUT, 5 minutes after last kworker activity).

Affine kworkers to the wq_unbound_cpumask (which doesn't contain isolated
CPUs, cf. HK_TYPE_WQ) before waking them up after marking them with
WORKER_DIE.

This follows the logic of CPU hot-unplug, which has been packaged into
helpers for the occasion.

Signed-off-by: Valentin Schneider <vschneid@redhat.com>
---
 kernel/workqueue.c | 35 ++++++++++++++++++++++++++---------
 1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 1ea50f6be843..0f1a25ea4924 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1972,6 +1972,18 @@ static struct worker *create_worker(struct worker_pool *pool)
 	return NULL;
 }
 
+static void unbind_worker(struct worker *worker)
+{
+	kthread_set_per_cpu(worker->task, -1);
+	WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, wq_unbound_cpumask) < 0);
+}
+
+static void rebind_worker(struct worker *worker, struct worker_pool *pool)
+{
+	kthread_set_per_cpu(worker->task, pool->cpu);
+	WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask) < 0);
+}
+
 /**
  * destroy_worker - destroy a workqueue worker
  * @worker: worker to be destroyed
@@ -1999,6 +2011,16 @@ static void destroy_worker(struct worker *worker)
 
 	list_del_init(&worker->entry);
 	worker->flags |= WORKER_DIE;
+
+	/*
+	 * We're sending that thread off to die, so any CPU would do. This is
+	 * especially relevant for pcpu kworkers affined to an isolated CPU:
+	 * we'd rather not interrupt an isolated CPU just for a kworker to
+	 * do_exit().
+	 */
+	if (!(worker->flags & WORKER_UNBOUND))
+		unbind_worker(worker);
+
 	wake_up_process(worker->task);
 }
 
@@ -4999,10 +5021,8 @@ static void unbind_workers(int cpu)
 
 		raw_spin_unlock_irq(&pool->lock);
 
-		for_each_pool_worker(worker, pool) {
-			kthread_set_per_cpu(worker->task, -1);
-			WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, wq_unbound_cpumask) < 0);
-		}
+		for_each_pool_worker(worker, pool)
+			unbind_worker(worker);
 
 		mutex_unlock(&wq_pool_attach_mutex);
 	}
@@ -5027,11 +5047,8 @@ static void rebind_workers(struct worker_pool *pool)
 	 * of all workers first and then clear UNBOUND.  As we're called
 	 * from CPU_ONLINE, the following shouldn't fail.
 	 */
-	for_each_pool_worker(worker, pool) {
-		kthread_set_per_cpu(worker->task, pool->cpu);
-		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
-						  pool->attrs->cpumask) < 0);
-	}
+	for_each_pool_worker(worker, pool)
+		rebind_worker(worker, pool);
 
 	raw_spin_lock_irq(&pool->lock);
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH] workqueue: Unbind workers before sending them to exit()
  2022-07-19 16:57 [RFC PATCH] workqueue: Unbind workers before sending them to exit() Valentin Schneider
@ 2022-07-20 17:54 ` Marcelo Tosatti
  2022-07-20 18:03 ` Tejun Heo
  1 sibling, 0 replies; 14+ messages in thread
From: Marcelo Tosatti @ 2022-07-20 17:54 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, Tejun Heo, Lai Jiangshan, Peter Zijlstra,
	Frederic Weisbecker, Juri Lelli, Phil Auld

On Tue, Jul 19, 2022 at 05:57:43PM +0100, Valentin Schneider wrote:
> It has been reported that isolated CPUs can suffer from interference due to
> per-CPU kworkers waking up just to die.
> 
> A surge of workqueue activity (sleeping workfn's exacerbate this) during
> initial setup can cause extra per-CPU kworkers to be spawned. Then, a
> latency-sensitive task can be running merrily on an isolated CPU only to be
> interrupted sometime later by a kworker marked for death (cf.
> IDLE_WORKER_TIMEOUT, 5 minutes after last kworker activity).
> 
> Affine kworkers to the wq_unbound_cpumask (which doesn't contain isolated
> CPUs, cf. HK_TYPE_WQ) before waking them up after marking them with
> WORKER_DIE.
> 
> This follows the logic of CPU hot-unplug, which has been packaged into
> helpers for the occasion.
> 
> Signed-off-by: Valentin Schneider <vschneid@redhat.com>
> ---
>  kernel/workqueue.c | 35 ++++++++++++++++++++++++++---------
>  1 file changed, 26 insertions(+), 9 deletions(-)
> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 1ea50f6be843..0f1a25ea4924 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1972,6 +1972,18 @@ static struct worker *create_worker(struct worker_pool *pool)
>  	return NULL;
>  }
>  
> +static void unbind_worker(struct worker *worker)
> +{
> +	kthread_set_per_cpu(worker->task, -1);
> +	WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, wq_unbound_cpumask) < 0);
> +}
> +
> +static void rebind_worker(struct worker *worker, struct worker_pool *pool)
> +{
> +	kthread_set_per_cpu(worker->task, pool->cpu);
> +	WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask) < 0);
> +}
> +
>  /**
>   * destroy_worker - destroy a workqueue worker
>   * @worker: worker to be destroyed
> @@ -1999,6 +2011,16 @@ static void destroy_worker(struct worker *worker)
>  
>  	list_del_init(&worker->entry);
>  	worker->flags |= WORKER_DIE;
> +
> +	/*
> +	 * We're sending that thread off to die, so any CPU would do. This is
> +	 * especially relevant for pcpu kworkers affined to an isolated CPU:
> +	 * we'd rather not interrupt an isolated CPU just for a kworker to
> +	 * do_exit().
> +	 */
> +	if (!(worker->flags & WORKER_UNBOUND))
> +		unbind_worker(worker);
> +
>  	wake_up_process(worker->task);
>  }
>  
> @@ -4999,10 +5021,8 @@ static void unbind_workers(int cpu)
>  
>  		raw_spin_unlock_irq(&pool->lock);
>  
> -		for_each_pool_worker(worker, pool) {
> -			kthread_set_per_cpu(worker->task, -1);
> -			WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, wq_unbound_cpumask) < 0);
> -		}
> +		for_each_pool_worker(worker, pool)
> +			unbind_worker(worker);
>  
>  		mutex_unlock(&wq_pool_attach_mutex);
>  	}
> @@ -5027,11 +5047,8 @@ static void rebind_workers(struct worker_pool *pool)
>  	 * of all workers first and then clear UNBOUND.  As we're called
>  	 * from CPU_ONLINE, the following shouldn't fail.
>  	 */
> -	for_each_pool_worker(worker, pool) {
> -		kthread_set_per_cpu(worker->task, pool->cpu);
> -		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
> -						  pool->attrs->cpumask) < 0);
> -	}
> +	for_each_pool_worker(worker, pool)
> +		rebind_worker(worker, pool);
>  
>  	raw_spin_lock_irq(&pool->lock);
>  
> -- 
> 2.31.1
> 
> 

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>



* Re: [RFC PATCH] workqueue: Unbind workers before sending them to exit()
  2022-07-19 16:57 [RFC PATCH] workqueue: Unbind workers before sending them to exit() Valentin Schneider
  2022-07-20 17:54 ` Marcelo Tosatti
@ 2022-07-20 18:03 ` Tejun Heo
  2022-07-21  3:35   ` Lai Jiangshan
  1 sibling, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2022-07-20 18:03 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, Lai Jiangshan, Peter Zijlstra, Frederic Weisbecker,
	Juri Lelli, Phil Auld, Marcelo Tosatti

On Tue, Jul 19, 2022 at 05:57:43PM +0100, Valentin Schneider wrote:
> It has been reported that isolated CPUs can suffer from interference due to
> per-CPU kworkers waking up just to die.
> 
> A surge of workqueue activity (sleeping workfn's exacerbate this) during
> initial setup can cause extra per-CPU kworkers to be spawned. Then, a
> latency-sensitive task can be running merrily on an isolated CPU only to be
> interrupted sometime later by a kworker marked for death (cf.
> IDLE_WORKER_TIMEOUT, 5 minutes after last kworker activity).
> 
> Affine kworkers to the wq_unbound_cpumask (which doesn't contain isolated
> CPUs, cf. HK_TYPE_WQ) before waking them up after marking them with
> WORKER_DIE.
> 
> This follows the logic of CPU hot-unplug, which has been packaged into
> helpers for the occasion.

Idea-wise, seems fine to me, but we have some other issues around twiddling
cpu affinities right now, so let's wait a bit till Lai chimes in.

Thanks.

-- 
tejun


* Re: [RFC PATCH] workqueue: Unbind workers before sending them to exit()
  2022-07-20 18:03 ` Tejun Heo
@ 2022-07-21  3:35   ` Lai Jiangshan
  2022-07-21 13:53     ` Valentin Schneider
  0 siblings, 1 reply; 14+ messages in thread
From: Lai Jiangshan @ 2022-07-21  3:35 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Valentin Schneider, LKML, Peter Zijlstra, Frederic Weisbecker,
	Juri Lelli, Phil Auld, Marcelo Tosatti

> +static void unbind_worker(struct worker *worker)
> +{
> +       kthread_set_per_cpu(worker->task, -1);
> +       WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, wq_unbound_cpumask) < 0);
> +}
> +
> +static void rebind_worker(struct worker *worker, struct worker_pool *pool)
> +{
> +       kthread_set_per_cpu(worker->task, pool->cpu);
> +       WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask) < 0);
> +}
> +
>  /**
>   * destroy_worker - destroy a workqueue worker
>   * @worker: worker to be destroyed
> @@ -1999,6 +2011,16 @@ static void destroy_worker(struct worker *worker)
>
>         list_del_init(&worker->entry);
>         worker->flags |= WORKER_DIE;
> +
> +       /*
> +        * We're sending that thread off to die, so any CPU would do. This is
> +        * especially relevant for pcpu kworkers affined to an isolated CPU:
> +        * we'd rather not interrupt an isolated CPU just for a kworker to
> +        * do_exit().
> +        */
> +       if (!(worker->flags & WORKER_UNBOUND))
> +               unbind_worker(worker);
> +
>         wake_up_process(worker->task);
>  }

destroy_worker() is called with raw_spin_lock_irq(pool->lock), so
it cannot call the sleepable set_cpus_allowed_ptr().

From __set_cpus_allowed_ptr:
> * NOTE: the caller must have a valid reference to the task, the
> * task must not exit() & deallocate itself prematurely. The
> * call is not atomic; no spinlocks may be held.

I think it needs something like task_set_cpumask_possible() which is
documented as being usable in (raw) spinlocks and set the task's cpumask
to cpu_possible_mask and let the later ttwu help migrate it to a
proper non-isolated CPU or let it keep running.
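
As a userspace sketch of the split Lai is proposing (all names hypothetical — task_set_cpumask_possible() does not exist in mainline): under the non-sleepable pool lock only a cheap mask update happens, and actual CPU placement is deferred to the later wakeup, which runs without the lock and can steer clear of isolated CPUs. This is a Python threading analogy of the idea, not kernel code:

```python
import threading

ALL_CPUS = {0, 1, 2, 3}
ISOLATED = {2, 3}  # CPUs to keep quiet (hypothetical isolcpus setup)

class Worker:
    def __init__(self, cpu):
        self.cpu = cpu            # CPU the kworker is currently bound to
        self.allowed = {cpu}      # per-CPU kworker: affined to one CPU

pool_lock = threading.Lock()      # stands in for the raw pool->lock

def destroy_worker(worker):
    # Under the "spinlock" only a cheap, non-blocking update is legal:
    # widen the allowed mask to all possible CPUs (the
    # task_set_cpumask_possible() idea); no migration happens here.
    with pool_lock:
        worker.allowed = set(ALL_CPUS)

def wake_up_process(worker):
    # The later wakeup path (ttwu) runs without the pool lock held and
    # is free to place the dying task on any allowed, non-isolated CPU.
    worker.cpu = min(worker.allowed - ISOLATED)

w = Worker(cpu=3)                 # kworker pinned to an isolated CPU
destroy_worker(w)
wake_up_process(w)
```

In the real kernel the second half would be the scheduler's job, not workqueue code; the sketch only shows which step may run under the lock and which must wait for wakeup.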


On Thu, Jul 21, 2022 at 2:03 AM Tejun Heo <tj@kernel.org> wrote:
>
> On Tue, Jul 19, 2022 at 05:57:43PM +0100, Valentin Schneider wrote:
> > It has been reported that isolated CPUs can suffer from interference due to
> > per-CPU kworkers waking up just to die.
> >
> > A surge of workqueue activity (sleeping workfn's exacerbate this) during
> > initial setup can cause extra per-CPU kworkers to be spawned. Then, a
> > latency-sensitive task can be running merrily on an isolated CPU only to be
> > interrupted sometime later by a kworker marked for death (cf.
> > IDLE_WORKER_TIMEOUT, 5 minutes after last kworker activity).
> >
> > Affine kworkers to the wq_unbound_cpumask (which doesn't contain isolated
> > CPUs, cf. HK_TYPE_WQ) before waking them up after marking them with
> > WORKER_DIE.
> >
> > This follows the logic of CPU hot-unplug, which has been packaged into
> > helpers for the occasion.
>
> Idea-wise, seems fine to me, but we have some other issues around twiddling
> cpu affinities right now, so let's wait a bit till Lai chimes in.
>

I think there are some imperfections related to cpu affinities
which need to be fixed too.

Thanks
Lai


* Re: [RFC PATCH] workqueue: Unbind workers before sending them to exit()
  2022-07-21  3:35   ` Lai Jiangshan
@ 2022-07-21 13:53     ` Valentin Schneider
  2022-07-23  5:16       ` Tejun Heo
  0 siblings, 1 reply; 14+ messages in thread
From: Valentin Schneider @ 2022-07-21 13:53 UTC (permalink / raw)
  To: Lai Jiangshan, Tejun Heo
  Cc: LKML, Peter Zijlstra, Frederic Weisbecker, Juri Lelli, Phil Auld,
	Marcelo Tosatti

On 21/07/22 11:35, Lai Jiangshan wrote:
>> @@ -1999,6 +2011,16 @@ static void destroy_worker(struct worker *worker)
>>
>>         list_del_init(&worker->entry);
>>         worker->flags |= WORKER_DIE;
>> +
>> +       /*
>> +        * We're sending that thread off to die, so any CPU would do. This is
>> +        * especially relevant for pcpu kworkers affined to an isolated CPU:
>> +        * we'd rather not interrupt an isolated CPU just for a kworker to
>> +        * do_exit().
>> +        */
>> +       if (!(worker->flags & WORKER_UNBOUND))
>> +               unbind_worker(worker);
>> +
>>         wake_up_process(worker->task);
>>  }
>
> destroy_worker() is called with raw_spin_lock_irq(pool->lock), so
> it cannot call the sleepable set_cpus_allowed_ptr().
>
> From __set_cpus_allowed_ptr:
>> * NOTE: the caller must have a valid reference to the task, the
>> * task must not exit() & deallocate itself prematurely. The
>> * call is not atomic; no spinlocks may be held.
>

Heh, I somehow forgot that this is blocking. Now in this particular case I
think pcpu kworkers are "safe" - they shouldn't be running when
destroy_worker() is invoked on them (though AFAICT that is not a "hard"
guarantee), and it doesn't make any sense for them to use
migrate_disable(). Still, yeah, not ideal.

> I think it needs something like task_set_cpumask_possible() which is
> documented as being usable in (raw) spinlocks and set the task's cpumask
> to cpu_possible_mask and let the later ttwu help migrate it to a
> proper non-isolated CPU or let it keep running.
>

I'll see what I can come up with, thanks for the suggestion.



* Re: [RFC PATCH] workqueue: Unbind workers before sending them to exit()
  2022-07-21 13:53     ` Valentin Schneider
@ 2022-07-23  5:16       ` Tejun Heo
  2022-07-25 10:21         ` Valentin Schneider
  0 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2022-07-23  5:16 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Lai Jiangshan, LKML, Peter Zijlstra, Frederic Weisbecker,
	Juri Lelli, Phil Auld, Marcelo Tosatti

On Thu, Jul 21, 2022 at 02:53:43PM +0100, Valentin Schneider wrote:
> > I think it needs something like task_set_cpumask_possible() which is
> > documented as being usable in (raw) spinlocks and set the task's cpumask
> > to cpu_possible_mask and let the later ttwu help migrate it to a
> > proper non-isolated CPU or let it keep running.
> 
> I'll see what I can come up with, thanks for the suggestion.

Alternatively, we can just kill all the idle kworkers on isolated cpus at
the end of the booting process.

Thanks.

-- 
tejun


* Re: [RFC PATCH] workqueue: Unbind workers before sending them to exit()
  2022-07-23  5:16       ` Tejun Heo
@ 2022-07-25 10:21         ` Valentin Schneider
  2022-07-26 17:30           ` Tejun Heo
  0 siblings, 1 reply; 14+ messages in thread
From: Valentin Schneider @ 2022-07-25 10:21 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Lai Jiangshan, LKML, Peter Zijlstra, Frederic Weisbecker,
	Juri Lelli, Phil Auld, Marcelo Tosatti

On 22/07/22 19:16, Tejun Heo wrote:
> On Thu, Jul 21, 2022 at 02:53:43PM +0100, Valentin Schneider wrote:
>> > I think it needs something like task_set_cpumask_possible() which is
>> > documented as being usable in (raw) spinlocks and set the task's cpumask
>> > to cpu_possible_mask and let the later ttwu help migrate it to a
>> > proper non-isolated CPU or let it keep running.
>> 
>> I'll see what I can come up with, thanks for the suggestion.
>
> Alternatively, we can just kill all the idle kworkers on isolated cpus at
> the end of the booting process.
>

Hm so my choice of words in the changelog wasn't great - "initial setup"
can be kernel init, but *also* setup of whatever workload is being deployed
onto the system.

So you can be having "normal" background activity (I've seen some IRQs end
up with schedule_work() on isolated CPUs, they're not moved away at boot
time but rather shortly before launching the latency-sensitive app), some
preliminary stats collection / setup to make sure the CPU will be quiet
(e.g. refresh_vm_stats()), and *then* the application starts with
fresh-but-no-longer-required extra pcpu kworkers assigned to its CPU.



* Re: [RFC PATCH] workqueue: Unbind workers before sending them to exit()
  2022-07-25 10:21         ` Valentin Schneider
@ 2022-07-26 17:30           ` Tejun Heo
  2022-07-26 20:36             ` Valentin Schneider
  0 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2022-07-26 17:30 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Lai Jiangshan, LKML, Peter Zijlstra, Frederic Weisbecker,
	Juri Lelli, Phil Auld, Marcelo Tosatti

Hello,

On Mon, Jul 25, 2022 at 11:21:37AM +0100, Valentin Schneider wrote:
> On 22/07/22 19:16, Tejun Heo wrote:
> > On Thu, Jul 21, 2022 at 02:53:43PM +0100, Valentin Schneider wrote:
> >> > I think it needs something like task_set_cpumask_possible() which is
> >> > documented as being usable in (raw) spinlocks and set the task's cpumask
> >> > to cpu_possible_mask and let the later ttwu help migrate it to a
> >> > proper non-isolated CPU or let it keep running.
> >> 
> >> I'll see what I can come up with, thanks for the suggestion.
> >
> > Alternatively, we can just kill all the idle kworkers on isolated cpus at
> > the end of the booting process.
> 
> Hm so my choice of words in the changelog wasn't great - "initial setup"
> can be kernel init, but *also* setup of whatever workload is being deployed
> onto the system.
> 
> So you can be having "normal" background activity (I've seen some IRQs end
> up with schedule_work() on isolated CPUs, they're not moved away at boot
> time but rather shortly before launching the latency-sensitive app), some
> preliminary stats collection / setup to make sure the CPU will be quiet
> (e.g. refresh_vm_stats()), and *then* the application starts with
> fresh-but-no-longer-required extra pcpu kworkers assigned to its CPU.

Ah, I see. I guess we'll need to figure out how to unbind the workers then.

Thanks.

-- 
tejun


* Re: [RFC PATCH] workqueue: Unbind workers before sending them to exit()
  2022-07-26 17:30           ` Tejun Heo
@ 2022-07-26 20:36             ` Valentin Schneider
  2022-07-26 22:59               ` Tejun Heo
  2022-07-27  5:38               ` Lai Jiangshan
  0 siblings, 2 replies; 14+ messages in thread
From: Valentin Schneider @ 2022-07-26 20:36 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Lai Jiangshan, LKML, Peter Zijlstra, Frederic Weisbecker,
	Juri Lelli, Phil Auld, Marcelo Tosatti

On 26/07/22 07:30, Tejun Heo wrote:
> Hello,
>
> On Mon, Jul 25, 2022 at 11:21:37AM +0100, Valentin Schneider wrote:
>> Hm so my choice of words in the changelog wasn't great - "initial setup"
>> can be kernel init, but *also* setup of whatever workload is being deployed
>> onto the system.
>>
>> So you can be having "normal" background activity (I've seen some IRQs end
>> up with schedule_work() on isolated CPUs, they're not moved away at boot
>> time but rather shortly before launching the latency-sensitive app), some
>> preliminary stats collection / setup to make sure the CPU will be quiet
>> (e.g. refresh_vm_stats()), and *then* the application starts with
>> fresh-but-no-longer-required extra pcpu kworkers assigned to its CPU.
>
> Ah, I see. I guess we'll need to figure out how to unbind the workers then.
>

I've been playing with different ways to unbind & wake the workers in a
sleepable context, but so far I haven't been happy with any of my
experiments.

What hasn't changed much between my attempts is transferring to-be-destroyed
kworkers from their pool->idle_list to a reaper_list which is walked by
*something* that does unbind+wakeup. AFAIA as long as the kworker is off
the pool->idle_list we can play with it (i.e. unbind+wake) off the
pool->lock.

It's the *something* that's annoying to get right, I don't want it to be
overly complicated given most users are probably not impacted by what I'm
trying to fix, but I'm getting the feeling it should still be a per-pool
kthread. I toyed with a single reaper kthread but a central synchronization
for all the pools feels like a stupid overhead.
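
A minimal userspace analogy of that reaper scheme (structure and names are illustrative, not the eventual kernel implementation): dying workers move from the idle list to a reaper list under the pool lock, and the sleepable unbind-plus-wake happens in the reaper's own context with the lock dropped:

```python
import threading

class Pool:
    def __init__(self):
        self.lock = threading.Lock()   # stands in for pool->lock
        self.idle_list = []
        self.reaper_list = []
        self.reaper_wakeup = threading.Event()

def mark_for_death(pool, worker):
    # Under pool->lock: only move the worker off idle_list; no sleeping.
    with pool.lock:
        pool.idle_list.remove(worker)
        pool.reaper_list.append(worker)
    pool.reaper_wakeup.set()

def reap(pool):
    # Runs in a sleepable context. Once a worker is off idle_list nobody
    # else touches it, so unbind + wake can happen without pool->lock.
    pool.reaper_wakeup.wait()
    with pool.lock:
        dying, pool.reaper_list = pool.reaper_list, []
    for w in dying:
        w["unbound"] = True   # the sleepable set_cpus_allowed_ptr() step
        w["woken"] = True     # then wake the worker so it can exit

pool = Pool()
worker = {"unbound": False, "woken": False}
pool.idle_list.append(worker)
t = threading.Thread(target=reap, args=(pool,))
t.start()
mark_for_death(pool, worker)
t.join()
```

The open question in the thread is what the `reap()` context should be — a per-pool kthread, a single reaper, or a work item — not the list handover itself.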

If any of that sounds ludicrous please shout, otherwise I'm going to keep
tinkering :)

> Thanks.
>
> --
> tejun



* Re: [RFC PATCH] workqueue: Unbind workers before sending them to exit()
  2022-07-26 20:36             ` Valentin Schneider
@ 2022-07-26 22:59               ` Tejun Heo
  2022-07-27  5:38               ` Lai Jiangshan
  1 sibling, 0 replies; 14+ messages in thread
From: Tejun Heo @ 2022-07-26 22:59 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Lai Jiangshan, LKML, Peter Zijlstra, Frederic Weisbecker,
	Juri Lelli, Phil Auld, Marcelo Tosatti

Hello,

On Tue, Jul 26, 2022 at 09:36:06PM +0100, Valentin Schneider wrote:
> It's the *something* that's annoying to get right, I don't want it to be
> overly complicated given most users are probably not impacted by what I'm
> trying to fix, but I'm getting the feeling it should still be a per-pool
> kthread. I toyed with a single reaper kthread but a central synchronization
> for all the pools feels like a stupid overhead.

That sounds like quite a bit of complexity.

> If any of that sounds ludicrous please shout, otherwise I'm going to keep
> tinkering :)

I mean, whatever works works but let's please keep it as minimal as
possible. Why does it need dedicated kthreads in the first place? Wouldn't
scheduling an unbound work item work just as well?

Thanks.

-- 
tejun


* Re: [RFC PATCH] workqueue: Unbind workers before sending them to exit()
  2022-07-26 20:36             ` Valentin Schneider
  2022-07-26 22:59               ` Tejun Heo
@ 2022-07-27  5:38               ` Lai Jiangshan
  2022-07-27  6:30                 ` Lai Jiangshan
  1 sibling, 1 reply; 14+ messages in thread
From: Lai Jiangshan @ 2022-07-27  5:38 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Tejun Heo, LKML, Peter Zijlstra, Frederic Weisbecker, Juri Lelli,
	Phil Auld, Marcelo Tosatti

On Wed, Jul 27, 2022 at 4:36 AM Valentin Schneider <vschneid@redhat.com> wrote:
>
> On 26/07/22 07:30, Tejun Heo wrote:
> > Hello,
> >
> > On Mon, Jul 25, 2022 at 11:21:37AM +0100, Valentin Schneider wrote:
> >> Hm so my choice of words in the changelog wasn't great - "initial setup"
> >> can be kernel init, but *also* setup of whatever workload is being deployed
> >> onto the system.
> >>
> >> So you can be having "normal" background activity (I've seen some IRQs end
> >> up with schedule_work() on isolated CPUs, they're not moved away at boot
> >> time but rather shortly before launching the latency-sensitive app), some
> >> preliminary stats collection / setup to make sure the CPU will be quiet
> >> (e.g. refresh_vm_stats()), and *then* the application starts with
> >> fresh-but-no-longer-required extra pcpu kworkers assigned to its CPU.
> >
> > Ah, I see. I guess we'll need to figure out how to unbind the workers then.
> >
>
> I've been playing with different ways to unbind & wake the workers in a
> sleepable context, but so far I haven't been happy with any of my
> experiments.


I'm writing code to handle the problems of cpu affinity and of
prematurely waking up newly created workers.

This work of unbinding the dying worker is also on the list.
I haven't figured out a good solution.

I was planning to add set_cpus_allowed_ptr_off_rq(), which would set
the task's cpumask only if the task is sleeping, and return -EBUSY
otherwise. It would be guaranteed and documented as usable in atomic
context, and recommended for use on dying tasks only.

I can't really be sure it would be implemented the way I expect,
since it touches scheduler code.

I'd better back off.

>
> What hasn't changed much between my attempts is transferring to-be-destroyed
> kworkers from their pool->idle_list to a reaper_list which is walked by
> *something* that does unbind+wakeup. AFAIA as long as the kworker is off
> the pool->idle_list we can play with it (i.e. unbind+wake) off the
> pool->lock.
>
> It's the *something* that's annoying to get right, I don't want it to be
> overly complicated given most users are probably not impacted by what I'm
> trying to fix, but I'm getting the feeling it should still be a per-pool
> kthread. I toyed with a single reaper kthread but a central synchronization
> for all the pools feels like a stupid overhead.

I think fixing it in the workqueue.c is complicated.

Nevertheless, I will also try to fix it inside workqueue only to see
what will come up.

>
> If any of that sounds ludicrous please shout, otherwise I'm going to keep
> tinkering :)
>
> > Thanks.
> >
> > --
> > tejun
>


* Re: [RFC PATCH] workqueue: Unbind workers before sending them to exit()
  2022-07-27  5:38               ` Lai Jiangshan
@ 2022-07-27  6:30                 ` Lai Jiangshan
  2022-07-27  8:55                   ` Lai Jiangshan
  0 siblings, 1 reply; 14+ messages in thread
From: Lai Jiangshan @ 2022-07-27  6:30 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Tejun Heo, LKML, Peter Zijlstra, Frederic Weisbecker, Juri Lelli,
	Phil Auld, Marcelo Tosatti

On Wed, Jul 27, 2022 at 1:38 PM Lai Jiangshan <jiangshanlai@gmail.com> wrote:
>
> On Wed, Jul 27, 2022 at 4:36 AM Valentin Schneider <vschneid@redhat.com> wrote:
> >
> > On 26/07/22 07:30, Tejun Heo wrote:
> > > Hello,
> > >
> > > On Mon, Jul 25, 2022 at 11:21:37AM +0100, Valentin Schneider wrote:
> > >> Hm so my choice of words in the changelog wasn't great - "initial setup"
> > >> can be kernel init, but *also* setup of whatever workload is being deployed
> > >> onto the system.
> > >>
> > >> So you can be having "normal" background activity (I've seen some IRQs end
> > >> up with schedule_work() on isolated CPUs, they're not moved away at boot
> > >> time but rather shortly before launching the latency-sensitive app), some
> > >> preliminary stats collection / setup to make sure the CPU will be quiet
> > >> (e.g. refresh_vm_stats()), and *then* the application starts with
> > >> fresh-but-no-longer-required extra pcpu kworkers assigned to its CPU.
> > >
> > > Ah, I see. I guess we'll need to figure out how to unbind the workers then.
> > >
> >
> > I've been playing with different ways to unbind & wake the workers in a
> > sleepable context, but so far I haven't been happy with any of my
> > experiments.
>
>
> I'm writing code to handle the problems of cpu affinity and of
> prematurely waking up newly created workers.
>
> This work of unbinding the dying worker is also on the list.
> I haven't figured out a good solution.
>
> I was planning to add set_cpus_allowed_ptr_off_rq(), which would set
> the task's cpumask only if the task is sleeping, and return -EBUSY
> otherwise. It would be guaranteed and documented as usable in atomic
> context, and recommended for use on dying tasks only.
>
> I can't really be sure it would be implemented the way I expect,
> since it touches scheduler code.
>
> I'd better back off.
>
> >
> > What hasn't changed much between my attempts is transferring to-be-destroyed
> > kworkers from their pool->idle_list to a reaper_list which is walked by
> > *something* that does unbind+wakeup. AFAIA as long as the kworker is off
> > the pool->idle_list we can play with it (i.e. unbind+wake) off the
> > pool->lock.
> >
> > It's the *something* that's annoying to get right, I don't want it to be
> > overly complicated given most users are probably not impacted by what I'm
> > trying to fix, but I'm getting the feeling it should still be a per-pool
> > kthread. I toyed with a single reaper kthread but a central synchronization
> > for all the pools feels like a stupid overhead.
>
> I think fixing it in the workqueue.c is complicated.
>
> Nevertheless, I will also try to fix it inside workqueue only to see
> what will come up.

I'm going to kind of revert 3347fc9f36e7 ("workqueue: destroy worker
directly in the idle timeout handler"), so that we can have a sleepable
destroy_worker().

>
> >
> > If any of that sounds ludicrous please shout, otherwise I'm going to keep
> > tinkering :)
> >
> > > Thanks.
> > >
> > > --
> > > tejun
> >


* Re: [RFC PATCH] workqueue: Unbind workers before sending them to exit()
  2022-07-27  6:30                 ` Lai Jiangshan
@ 2022-07-27  8:55                   ` Lai Jiangshan
  2022-07-27  9:22                     ` Valentin Schneider
  0 siblings, 1 reply; 14+ messages in thread
From: Lai Jiangshan @ 2022-07-27  8:55 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Tejun Heo, LKML, Peter Zijlstra, Frederic Weisbecker, Juri Lelli,
	Phil Auld, Marcelo Tosatti

On Wed, Jul 27, 2022 at 2:30 PM Lai Jiangshan <jiangshanlai@gmail.com> wrote:

> >
> > >
> > > What hasn't changed much between my attempts is transferring to-be-destroyed
> > > kworkers from their pool->idle_list to a reaper_list which is walked by
> > > *something* that does unbind+wakeup. AFAIA as long as the kworker is off
> > > the pool->idle_list we can play with it (i.e. unbind+wake) off the
> > > pool->lock.
> > >
> > > It's the *something* that's annoying to get right, I don't want it to be
> > > overly complicated given most users are probably not impacted by what I'm
> > > trying to fix, but I'm getting the feeling it should still be a per-pool
> > > kthread. I toyed with a single reaper kthread but a central synchronization
> > > for all the pools feels like a stupid overhead.
> >
> > I think fixing it in the workqueue.c is complicated.
> >
> > Nevertheless, I will also try to fix it inside workqueue only to see
> > what will come up.
>
> I'm going to kind of revert 3347fc9f36e7 ("workqueue: destroy worker
> directly in the idle timeout handler"), so that we can have a sleepable
> destroy_worker().
>

It is not a good idea.  The woken up manager might still be in
the isolated CPU.

On Wed, Jul 27, 2022 at 6:59 AM Tejun Heo <tj@kernel.org> wrote:
>
> I mean, whatever works works but let's please keep it as minimal as
> possible. Why does it need dedicated kthreads in the first place? Wouldn't
> scheduling an unbound work item work just as well?
>

Scheduling an unbound work item will work well.


* Re: [RFC PATCH] workqueue: Unbind workers before sending them to exit()
  2022-07-27  8:55                   ` Lai Jiangshan
@ 2022-07-27  9:22                     ` Valentin Schneider
  0 siblings, 0 replies; 14+ messages in thread
From: Valentin Schneider @ 2022-07-27  9:22 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Tejun Heo, LKML, Peter Zijlstra, Frederic Weisbecker, Juri Lelli,
	Phil Auld, Marcelo Tosatti

On 27/07/22 16:55, Lai Jiangshan wrote:
> On Wed, Jul 27, 2022 at 2:30 PM Lai Jiangshan <jiangshanlai@gmail.com> wrote:
>
>> >
>> > >
>> > > What hasn't changed much between my attempts is transferring to-be-destroyed
>> > > kworkers from their pool->idle_list to a reaper_list which is walked by
>> > > *something* that does unbind+wakeup. AFAIA as long as the kworker is off
>> > > the pool->idle_list we can play with it (i.e. unbind+wake) off the
>> > > pool->lock.
>> > >
>> > > It's the *something* that's annoying to get right, I don't want it to be
>> > > overly complicated given most users are probably not impacted by what I'm
>> > > trying to fix, but I'm getting the feeling it should still be a per-pool
>> > > kthread. I toyed with a single reaper kthread but a central synchronization
>> > > for all the pools feels like a stupid overhead.
>> >
>> > I think fixing it in the workqueue.c is complicated.
>> >
>> > Nevertheless, I will also try to fix it inside workqueue only to see
>> > what will come up.
>>
>> I'm going to kind of revert 3347fc9f36e7 ("workqueue: destroy worker
>> directly in the idle timeout handler"), so that we can have a sleepable
>> destroy_worker().
>>
>
It is not a good idea.  The woken-up manager might still be on
the isolated CPU.
>
> On Wed, Jul 27, 2022 at 6:59 AM Tejun Heo <tj@kernel.org> wrote:
>>
>> I mean, whatever works works but let's please keep it as minimal as
>> possible. Why does it need dedicated kthreads in the first place? Wouldn't
>> scheduling an unbound work item work just as well?
>>
>
> Scheduling an unbound work item will work well.

I did play a bit with that yesterday (pretty much replacing the
pool->idle_timer with a delayed_work) but locking discouraged me - it's
quite easy to end up with a self-deadlock.

Now, I've slept over it and have a fresh cup of coffee, and it's been the
least intrusive-looking change I've tried, so let me give that a shot
again.
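
The shape Valentin describes — the idle timer only queueing the cull, with the sleepable destruction running as an unbound work item, per Tejun's suggestion — can be sketched as a userspace analogy (all names illustrative; the locking pitfalls he mentions are glossed over here):

```python
import threading, queue

unbound_wq = queue.Queue()   # stands in for an unbound workqueue

def unbound_worker():
    # An "unbound" kworker: executes work items in a sleepable context,
    # never pinned to an isolated CPU.
    while True:
        work = unbound_wq.get()
        if work is None:
            break
        work()

reaped = []

def idle_cull_fn():
    # The work item body: free to sleep, so it could do the blocking
    # unbind (set_cpus_allowed_ptr()) before waking the dying worker.
    reaped.append("kworker/3:1")   # hypothetical victim

def idle_timer_fn():
    # Timer callbacks run in atomic context: do not destroy workers
    # here, just queue the cull onto the unbound workqueue.
    unbound_wq.put(idle_cull_fn)

t = threading.Thread(target=unbound_worker)
t.start()
timer = threading.Timer(0.01, idle_timer_fn)  # stands in for pool->idle_timer
timer.start()
timer.join()                 # timer has fired and queued the work item
unbound_wq.put(None)         # shut the worker down for the demo
t.join()
```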


