* [RFC PATCH 0/2] workqueue: Fix hotplug races
@ 2021-11-30  0:06 Frederic Weisbecker
  2021-11-30  0:06 ` [RFC PATCH 1/2] workqueue: Fix unbind_workers() VS wq_worker_running() race Frederic Weisbecker
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Frederic Weisbecker @ 2021-11-30  0:06 UTC (permalink / raw)
  To: Tejun Heo, Lai Jiangshan; +Cc: LKML, Frederic Weisbecker, Paul E . McKenney

While chasing rcutorture splats, I couldn't trace the issue down to its
exact cause because the problem is tracing-unfriendly. Looking at the
code, though, these two potential races are plausible culprits, and with
the two patches applied I haven't been able to reproduce the rcutorture
issues (yet...).

Frederic Weisbecker (2):
  workqueue: Fix unbind_workers() VS wq_worker_running() race
  workqueue: Fix unbind_workers() VS wq_worker_sleeping() race

 kernel/workqueue.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [RFC PATCH 1/2] workqueue: Fix unbind_workers() VS wq_worker_running() race
  2021-11-30  0:06 [RFC PATCH 0/2] workqueue: Fix hotplug races Frederic Weisbecker
@ 2021-11-30  0:06 ` Frederic Weisbecker
  2021-11-30  0:33   ` Lai Jiangshan
  2021-11-30  0:06 ` [RFC PATCH 2/2] workqueue: Fix unbind_workers() VS wq_worker_sleeping() race Frederic Weisbecker
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Frederic Weisbecker @ 2021-11-30  0:06 UTC (permalink / raw)
  To: Tejun Heo, Lai Jiangshan; +Cc: LKML, Frederic Weisbecker, Paul E . McKenney

At CPU-hotplug time, unbind_workers() may preempt a worker while it is
waking up. In that case the following scenario can happen:

        unbind_workers()                        wq_worker_running()
        ----------------                        -------------------
                                                if (!(worker->flags & WORKER_NOT_RUNNING))
                                                    //PREEMPTED by unbind_workers
        worker->flags |= WORKER_UNBOUND;
        [...]
        atomic_set(&pool->nr_running, 0);
        //resume to worker
                                                atomic_inc(&worker->pool->nr_running);

Once unbind_workers() has reset pool->nr_running, the value is expected
to remain 0 until the pool is eventually rebound, should cpu_up() later
be called on the target CPU. But the race above leaves pool->nr_running
at 1, triggering the following warning when the worker goes idle:

	WARNING: CPU: 3 PID: 34 at kernel/workqueue.c:1823 worker_enter_idle+0x95/0xc0
	Modules linked in:
	CPU: 3 PID: 34 Comm: kworker/3:0 Not tainted 5.16.0-rc1+ #34
	Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
	Workqueue:  0x0 (rcu_par_gp)
	RIP: 0010:worker_enter_idle+0x95/0xc0
	Code: 04 85 f8 ff ff ff 39 c1 7f 09 48 8b 43 50 48 85 c0 74 1b 83 e2 04 75 99 8b 43 34 39 43 30 75 91 8b 83 00 03 00 00 85 c0 74 87 <0f> 0b 5b c3 48 8b 35 70 f1 37 01 48 8d 7b 48 48 81 c6 e0 93  0
	RSP: 0000:ffff9b7680277ed0 EFLAGS: 00010086
	RAX: 00000000ffffffff RBX: ffff93465eae9c00 RCX: 0000000000000000
	RDX: 0000000000000000 RSI: ffff9346418a0000 RDI: ffff934641057140
	RBP: ffff934641057170 R08: 0000000000000001 R09: ffff9346418a0080
	R10: ffff9b768027fdf0 R11: 0000000000002400 R12: ffff93465eae9c20
	R13: ffff93465eae9c20 R14: ffff93465eae9c70 R15: ffff934641057140
	FS:  0000000000000000(0000) GS:ffff93465eac0000(0000) knlGS:0000000000000000
	CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	CR2: 0000000000000000 CR3: 000000001cc0c000 CR4: 00000000000006e0
	DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
	DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
	Call Trace:
	  <TASK>
	  worker_thread+0x89/0x3d0
	  ? process_one_work+0x400/0x400
	  kthread+0x162/0x190
	  ? set_kthread_struct+0x40/0x40
	  ret_from_fork+0x22/0x30
	  </TASK>

Also, because of this incorrect "nr_running == 1", further queued work
may end up not being served, since no worker is woken up at work insert
time. This can cause rcutorture writer stalls, for example.

Fix this by disabling preemption in the right place in
wq_worker_running().

It's worth noting that if the worker migrates and runs concurrently with
unbind_workers(), it is guaranteed to see the WORKER_UNBOUND flag update
due to set_cpus_allowed_ptr() acquiring/releasing rq->lock.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/workqueue.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 332361cf215f..5094573e8b45 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -868,8 +868,17 @@ void wq_worker_running(struct task_struct *task)
 
 	if (!worker->sleeping)
 		return;
+
+	/*
+	 * If preempted by unbind_workers() between the WORKER_NOT_RUNNING check
+	 * and the nr_running increment below, we may ruin the nr_running reset
+	 * and leave with an unexpected pool->nr_running == 1 on the newly unbound
+	 * pool. Protect against such race.
+	 */
+	preempt_disable();
 	if (!(worker->flags & WORKER_NOT_RUNNING))
 		atomic_inc(&worker->pool->nr_running);
+	preempt_enable();
 	worker->sleeping = 0;
 }
 
-- 
2.25.1



* [RFC PATCH 2/2] workqueue: Fix unbind_workers() VS wq_worker_sleeping() race
  2021-11-30  0:06 [RFC PATCH 0/2] workqueue: Fix hotplug races Frederic Weisbecker
  2021-11-30  0:06 ` [RFC PATCH 1/2] workqueue: Fix unbind_workers() VS wq_worker_running() race Frederic Weisbecker
@ 2021-11-30  0:06 ` Frederic Weisbecker
  2021-11-30  0:57   ` Lai Jiangshan
  2021-11-30 16:36 ` [RFC PATCH 0/2] workqueue: Fix hotplug races Tejun Heo
  2021-11-30 18:01 ` Paul E. McKenney
  3 siblings, 1 reply; 8+ messages in thread
From: Frederic Weisbecker @ 2021-11-30  0:06 UTC (permalink / raw)
  To: Tejun Heo, Lai Jiangshan; +Cc: LKML, Frederic Weisbecker, Paul E . McKenney

At CPU-hotplug time, unbind_workers() may preempt a worker while it is
going to sleep. In that case the following scenario can happen:

    unbind_workers()                        wq_worker_sleeping()
    ----------------                        --------------------
                                            if (worker->flags & WORKER_NOT_RUNNING)
                                                return;
                                            //PREEMPTED by unbind_workers
    worker->flags |= WORKER_UNBOUND;
    [...]
    atomic_set(&pool->nr_running, 0);
    //resume to worker
                                            atomic_dec_and_test(&pool->nr_running);

Once unbind_workers() has reset pool->nr_running, the value is expected
to remain 0 until the pool is eventually rebound, should cpu_up() later
be called on the target CPU. But the race above leaves pool->nr_running
at -1, triggering the following warning when the worker goes idle:

        WARNING: CPU: 3 PID: 34 at kernel/workqueue.c:1823 worker_enter_idle+0x95/0xc0
        Modules linked in:
        CPU: 3 PID: 34 Comm: kworker/3:0 Not tainted 5.16.0-rc1+ #34
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
        Workqueue:  0x0 (rcu_par_gp)
        RIP: 0010:worker_enter_idle+0x95/0xc0
        Code: 04 85 f8 ff ff ff 39 c1 7f 09 48 8b 43 50 48 85 c0 74 1b 83 e2 04 75 99 8b 43 34 39 43 30 75 91 8b 83 00 03 00 00 85 c0 74 87 <0f> 0b 5b c3 48 8b 35 70 f1 37 01 48 8d 7b 48 48 81 c6 e0 93  0
        RSP: 0000:ffff9b7680277ed0 EFLAGS: 00010086
        RAX: 00000000ffffffff RBX: ffff93465eae9c00 RCX: 0000000000000000
        RDX: 0000000000000000 RSI: ffff9346418a0000 RDI: ffff934641057140
        RBP: ffff934641057170 R08: 0000000000000001 R09: ffff9346418a0080
        R10: ffff9b768027fdf0 R11: 0000000000002400 R12: ffff93465eae9c20
        R13: ffff93465eae9c20 R14: ffff93465eae9c70 R15: ffff934641057140
        FS:  0000000000000000(0000) GS:ffff93465eac0000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000000 CR3: 000000001cc0c000 CR4: 00000000000006e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
          <TASK>
          worker_thread+0x89/0x3d0
          ? process_one_work+0x400/0x400
          kthread+0x162/0x190
          ? set_kthread_struct+0x40/0x40
          ret_from_fork+0x22/0x30
          </TASK>

Also, because of this incorrect "nr_running == -1", all sorts of hazards
can follow, starting with queued works being ignored because no worker
is woken up at insert_work() time.

Fix this by rechecking the worker flags while pool->lock is held.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/workqueue.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 5094573e8b45..5557d19ea81c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -912,6 +912,16 @@ void wq_worker_sleeping(struct task_struct *task)
 	worker->sleeping = 1;
 	raw_spin_lock_irq(&pool->lock);
 
+	/*
+	 * Recheck in case unbind_workers() preempted us. We don't
+	 * want to decrement nr_running after the worker is unbound
+	 * and nr_running has been reset.
+	 */
+	if (worker->flags & WORKER_NOT_RUNNING) {
+		raw_spin_unlock_irq(&pool->lock);
+		return;
+	}
+
 	/*
 	 * The counterpart of the following dec_and_test, implied mb,
 	 * worklist not empty test sequence is in insert_work().
-- 
2.25.1



* Re: [RFC PATCH 1/2] workqueue: Fix unbind_workers() VS wq_worker_running() race
  2021-11-30  0:06 ` [RFC PATCH 1/2] workqueue: Fix unbind_workers() VS wq_worker_running() race Frederic Weisbecker
@ 2021-11-30  0:33   ` Lai Jiangshan
  0 siblings, 0 replies; 8+ messages in thread
From: Lai Jiangshan @ 2021-11-30  0:33 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Tejun Heo, LKML, Paul E . McKenney

On Tue, Nov 30, 2021 at 8:06 AM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> At CPU-hotplug time, unbind_workers() may preempt a worker while it is
> waking up. In that case the following scenario can happen:
>
> [...]
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

Fixes: 6d25be5782e4 ("sched/core, workqueues: Distangle worker accounting from rq lock")

Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>

> Cc: Paul E. McKenney <paulmck@kernel.org>
> [...]


* Re: [RFC PATCH 2/2] workqueue: Fix unbind_workers() VS wq_worker_sleeping() race
  2021-11-30  0:06 ` [RFC PATCH 2/2] workqueue: Fix unbind_workers() VS wq_worker_sleeping() race Frederic Weisbecker
@ 2021-11-30  0:57   ` Lai Jiangshan
  0 siblings, 0 replies; 8+ messages in thread
From: Lai Jiangshan @ 2021-11-30  0:57 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Tejun Heo, LKML, Paul E . McKenney

On Tue, Nov 30, 2021 at 8:06 AM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> At CPU-hotplug time, unbind_workers() may preempt a worker while it is
> going to sleep. In that case the following scenario can happen:
>
> [...]
>
> Fix this with checking again the worker flags while pool->lock is locked.
>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> Cc: Paul E. McKenney <paulmck@kernel.org>


Fixes: b945efcdd07d ("sched: Remove pointless preemption disable in sched_submit_work()")
Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>

It was my fault for not reviewing b945efcdd07d carefully enough.

> [...]


* Re: [RFC PATCH 0/2] workqueue: Fix hotplug races
  2021-11-30  0:06 [RFC PATCH 0/2] workqueue: Fix hotplug races Frederic Weisbecker
  2021-11-30  0:06 ` [RFC PATCH 1/2] workqueue: Fix unbind_workers() VS wq_worker_running() race Frederic Weisbecker
  2021-11-30  0:06 ` [RFC PATCH 2/2] workqueue: Fix unbind_workers() VS wq_worker_sleeping() race Frederic Weisbecker
@ 2021-11-30 16:36 ` Tejun Heo
  2021-12-01 14:48   ` Frederic Weisbecker
  2021-11-30 18:01 ` Paul E. McKenney
  3 siblings, 1 reply; 8+ messages in thread
From: Tejun Heo @ 2021-11-30 16:36 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Lai Jiangshan, LKML, Paul E . McKenney

On Tue, Nov 30, 2021 at 01:06:10AM +0100, Frederic Weisbecker wrote:
> While triggering rcutorture splats, I couldn't trace down to the exact
> cause due to the issue being tracing-unfriendly. But looking at the code,
> those two potential races look like plausible causes and also after
> these two patches, I couldn't reproduce the rcutorture issues again (yet...).
> 
> Frederic Weisbecker (2):
>   workqueue: Fix unbind_workers() VS wq_worker_running() race
>   workqueue: Fix unbind_workers() VS wq_worker_sleeping() race

Both patches look good to me. Lai, thanks for finding the offending commits.
Frederic, can you please repost them w/ scheduler folks and tglx cc'd?

Thanks.

-- 
tejun


* Re: [RFC PATCH 0/2] workqueue: Fix hotplug races
  2021-11-30  0:06 [RFC PATCH 0/2] workqueue: Fix hotplug races Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2021-11-30 16:36 ` [RFC PATCH 0/2] workqueue: Fix hotplug races Tejun Heo
@ 2021-11-30 18:01 ` Paul E. McKenney
  3 siblings, 0 replies; 8+ messages in thread
From: Paul E. McKenney @ 2021-11-30 18:01 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Tejun Heo, Lai Jiangshan, LKML

On Tue, Nov 30, 2021 at 01:06:10AM +0100, Frederic Weisbecker wrote:
> While triggering rcutorture splats, I couldn't trace down to the exact
> cause due to the issue being tracing-unfriendly. But looking at the code,
> those two potential races look like plausible causes and also after
> these two patches, I couldn't reproduce the rcutorture issues again (yet...).
> 
> Frederic Weisbecker (2):
>   workqueue: Fix unbind_workers() VS wq_worker_running() race
>   workqueue: Fix unbind_workers() VS wq_worker_sleeping() race
> 
>  kernel/workqueue.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)

This series passed 1500 hours of TREE01 with rcupdate.rcu_expedited=1
and also with Frederic's fix for expedited RCU grace periods, so:

Tested-by: Paul E. McKenney <paulmck@kernel.org>

							Thanx, Paul


* Re: [RFC PATCH 0/2] workqueue: Fix hotplug races
  2021-11-30 16:36 ` [RFC PATCH 0/2] workqueue: Fix hotplug races Tejun Heo
@ 2021-12-01 14:48   ` Frederic Weisbecker
  0 siblings, 0 replies; 8+ messages in thread
From: Frederic Weisbecker @ 2021-12-01 14:48 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Lai Jiangshan, LKML, Paul E . McKenney

On Tue, Nov 30, 2021 at 06:36:23AM -1000, Tejun Heo wrote:
> On Tue, Nov 30, 2021 at 01:06:10AM +0100, Frederic Weisbecker wrote:
> > [...]
> 
> Both patches look good to me. Lai, thanks for finding the offending commits.
> Frederic, can you please repost them w/ scheduler folks and tglx cc'd?

Sure, preparing that.

Thanks.



end of thread, newest: 2021-12-01 14:49 UTC

Thread overview: 8+ messages
2021-11-30  0:06 [RFC PATCH 0/2] workqueue: Fix hotplug races Frederic Weisbecker
2021-11-30  0:06 ` [RFC PATCH 1/2] workqueue: Fix unbind_workers() VS wq_worker_running() race Frederic Weisbecker
2021-11-30  0:33   ` Lai Jiangshan
2021-11-30  0:06 ` [RFC PATCH 2/2] workqueue: Fix unbind_workers() VS wq_worker_sleeping() race Frederic Weisbecker
2021-11-30  0:57   ` Lai Jiangshan
2021-11-30 16:36 ` [RFC PATCH 0/2] workqueue: Fix hotplug races Tejun Heo
2021-12-01 14:48   ` Frederic Weisbecker
2021-11-30 18:01 ` Paul E. McKenney
