linux-kernel.vger.kernel.org archive mirror
* [PATCH] stop_machine: Disable preemption after queueing stopper threads
@ 2018-07-17 19:35 Isaac J. Manjarres
  2018-07-24  1:13 ` isaacm
  2018-07-25 14:21 ` [tip:sched/core] stop_machine: Disable preemption after queueing stopper threads tip-bot for Isaac J. Manjarres
  0 siblings, 2 replies; 15+ messages in thread
From: Isaac J. Manjarres @ 2018-07-17 19:35 UTC (permalink / raw)
  To: peterz, matt, mingo, tglx, bigeasy
  Cc: Isaac J. Manjarres, linux-kernel, psodagud, gregkh, pkondeti, stable

This commit:

9fb8d5dc4b64 ("stop_machine, Disable preemption when
waking two stopper threads")

does not fully address the race condition that can occur
as follows:

On one CPU, call it CPU 3, thread 1 invokes
cpu_stop_queue_two_works(2, 3,...), and the execution is such
that thread 1 queues the works for migration/2 and migration/3,
and is preempted after releasing the locks for migration/2 and
migration/3, but before waking the threads.

Then, on CPU 2, a kworker, call it thread 2, is running,
and it invokes cpu_stop_queue_two_works(1, 2,...), such that
thread 2 queues the works for migration/1 and migration/2.
Meanwhile, on CPU 3, thread 1 resumes execution and wakes
migration/2 and migration/3. This means that after thread 2
releases the locks for migration/1 and migration/2, but
before it wakes those threads, it can be preempted by
migration/2.

If thread 2 is preempted by migration/2, then migration/2 will
execute the first work item successfully, since migration/3
was woken up by CPU 3. But when it goes to execute the second
work item, it disables preemption and calls multi_cpu_stop(),
and thus CPU 2 will wait forever for migration/1, which should
have been woken up by thread 2. However, migration/1 cannot be
woken up by thread 2: thread 2 is a kworker, so it is affine to
CPU 2, but CPU 2 is running migration/2 with preemption
disabled, so thread 2 will never run.

Disable preemption after queueing works for stopper threads
to ensure that the operation of queueing the works and waking
the stopper threads is atomic.

Fixes: 9fb8d5dc4b64 ("stop_machine, Disable preemption when waking two stopper threads")
Co-Developed-by: Prasad Sodagudi <psodagud@codeaurora.org>
Co-Developed-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Cc: stable@vger.kernel.org
---
 kernel/stop_machine.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 1ff523d..e190d1e 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -260,6 +260,15 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
 	err = 0;
 	__cpu_stop_queue_work(stopper1, work1, &wakeq);
 	__cpu_stop_queue_work(stopper2, work2, &wakeq);
+	/*
+	 * The waking up of stopper threads has to happen
+	 * in the same scheduling context as the queueing.
+	 * Otherwise, there is a possibility of one of the
+	 * above stoppers being woken up by another CPU,
+	 * and preempting us. This will cause us to not
+	 * wake up the other stopper forever.
+	 */
+	preempt_disable();
 unlock:
 	raw_spin_unlock(&stopper2->lock);
 	raw_spin_unlock_irq(&stopper1->lock);
@@ -271,7 +280,6 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
 	}
 
 	if (!err) {
-		preempt_disable();
 		wake_up_q(&wakeq);
 		preempt_enable();
 	}
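
To make the interleaving above easier to follow, here is a condensed
timeline of the race described in the changelog (an illustrative
sketch only, not part of the patch; T1 and T2 are the changelog's
thread 1 and thread 2):

/*
 *  T1 (CPU 3)                           T2 (kworker pinned to CPU 2)
 *  -----------------------------------  -----------------------------------
 *  cpu_stop_queue_two_works(2, 3, ...)
 *    queue works for migration/2, /3
 *    drop both stopper locks
 *    <preempted>
 *                                       cpu_stop_queue_two_works(1, 2, ...)
 *                                         queue works for migration/1, /2
 *                                         drop both stopper locks
 *  wake_up_q(): migration/2, /3 run
 *                                       <preempted by migration/2 before
 *                                        its own wake_up_q() runs>
 *                                       migration/2 finishes the first work
 *                                       (partner migration/3 is awake), then
 *                                       enters multi_cpu_stop() for the
 *                                       second work with preemption disabled
 *                                       and spins waiting for migration/1,
 *                                       which only T2 can wake: deadlock.
 */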
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



* Re: [PATCH] stop_machine: Disable preemption after queueing stopper threads
  2018-07-17 19:35 [PATCH] stop_machine: Disable preemption after queueing stopper threads Isaac J. Manjarres
@ 2018-07-24  1:13 ` isaacm
  2018-07-24  6:23   ` Sebastian Andrzej Siewior
  2018-07-25 14:21 ` [tip:sched/core] stop_machine: Disable preemption after queueing stopper threads tip-bot for Isaac J. Manjarres
  1 sibling, 1 reply; 15+ messages in thread
From: isaacm @ 2018-07-24  1:13 UTC (permalink / raw)
  To: peterz, matt, mingo, tglx, bigeasy
  Cc: linux-kernel, psodagud, gregkh, pkondeti, stable

Hi all,

Are there any comments about this patch?

Thanks,
Isaac Manjarres
On 2018-07-17 12:35, Isaac J. Manjarres wrote:
> This commit:
> 
> 9fb8d5dc4b64 ("stop_machine, Disable preemption when
> waking two stopper threads")
> 
> does not fully address the race condition that can occur
> as follows:
> 
> On one CPU, call it CPU 3, thread 1 invokes
> cpu_stop_queue_two_works(2, 3,...), and the execution is such
> that thread 1 queues the works for migration/2 and migration/3,
> and is preempted after releasing the locks for migration/2 and
> migration/3, but before waking the threads.
> 
> Then, on CPU 2, a kworker, call it thread 2, is running,
> and it invokes cpu_stop_queue_two_works(1, 2,...), such that
> thread 2 queues the works for migration/1 and migration/2.
> Meanwhile, on CPU 3, thread 1 resumes execution and wakes
> migration/2 and migration/3. This means that after thread 2
> releases the locks for migration/1 and migration/2, but
> before it wakes those threads, it can be preempted by
> migration/2.
> 
> If thread 2 is preempted by migration/2, then migration/2 will
> execute the first work item successfully, since migration/3
> was woken up by CPU 3. But when it goes to execute the second
> work item, it disables preemption and calls multi_cpu_stop(),
> and thus CPU 2 will wait forever for migration/1, which should
> have been woken up by thread 2. However, migration/1 cannot be
> woken up by thread 2: thread 2 is a kworker, so it is affine to
> CPU 2, but CPU 2 is running migration/2 with preemption
> disabled, so thread 2 will never run.
> 
> Disable preemption after queueing works for stopper threads
> to ensure that the operation of queueing the works and waking
> the stopper threads is atomic.
> 
> Fixes: 9fb8d5dc4b64 ("stop_machine, Disable preemption when waking two stopper threads")
> Co-Developed-by: Prasad Sodagudi <psodagud@codeaurora.org>
> Co-Developed-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
> Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
> Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
> Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
> Cc: stable@vger.kernel.org
> ---
>  kernel/stop_machine.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index 1ff523d..e190d1e 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -260,6 +260,15 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
>  	err = 0;
>  	__cpu_stop_queue_work(stopper1, work1, &wakeq);
>  	__cpu_stop_queue_work(stopper2, work2, &wakeq);
> +	/*
> +	 * The waking up of stopper threads has to happen
> +	 * in the same scheduling context as the queueing.
> +	 * Otherwise, there is a possibility of one of the
> +	 * above stoppers being woken up by another CPU,
> +	 * and preempting us. This will cause us to not
> +	 * wake up the other stopper forever.
> +	 */
> +	preempt_disable();
>  unlock:
>  	raw_spin_unlock(&stopper2->lock);
>  	raw_spin_unlock_irq(&stopper1->lock);
> @@ -271,7 +280,6 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
>  	}
> 
>  	if (!err) {
> -		preempt_disable();
>  		wake_up_q(&wakeq);
>  		preempt_enable();
>  	}


* Re: [PATCH] stop_machine: Disable preemption after queueing stopper threads
  2018-07-24  1:13 ` isaacm
@ 2018-07-24  6:23   ` Sebastian Andrzej Siewior
  2018-07-25  4:15     ` isaacm
  2018-07-30 10:20     ` Thomas Gleixner
  0 siblings, 2 replies; 15+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-07-24  6:23 UTC (permalink / raw)
  To: isaacm
  Cc: peterz, matt, mingo, tglx, linux-kernel, psodagud, gregkh,
	pkondeti, stable

On 2018-07-23 18:13:48 [-0700], isaacm@codeaurora.org wrote:
> Hi all,
Hi,

> Are there any comments about this patch?

I haven't looked in detail at this, but your new preempt_disable() makes
things unbalanced for the err != 0 case.

> Thanks,
> Isaac Manjarres

Sebastian


* Re: [PATCH] stop_machine: Disable preemption after queueing stopper threads
  2018-07-24  6:23   ` Sebastian Andrzej Siewior
@ 2018-07-25  4:15     ` isaacm
  2018-07-30 10:20     ` Thomas Gleixner
  1 sibling, 0 replies; 15+ messages in thread
From: isaacm @ 2018-07-25  4:15 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: peterz, matt, mingo, tglx, linux-kernel, psodagud, gregkh,
	pkondeti, stable

Hi Sebastian,

Thanks for the response.

"I haven't looked in detail at this, but your new preempt_disable() makes
things unbalanced for the err != 0 case."

This cannot happen. The only possible return values of this function
are -ENOENT or 0.

In the case where we return -ENOENT, we'll go
straight to "unlock," which releases the two locks being held, but
doesn't disable preemption, and since err != 0, we won't call
preempt_enable().

In the case where we return 0, the works were queued successfully
and preemption was disabled; after releasing the locks, we fall into
the if branch and enable preemption, which is correct.

In either case, there is no imbalance between the
preempt_disable()/preempt_enable() calls.
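
Put differently, the control flow with this patch applied looks like
the sketch below (locking trimmed; both_enabled() is a made-up
stand-in for the two ->enabled checks):

	err = -ENOENT;
	if (!both_enabled())			/* hypothetical helper */
		goto unlock;			/* preemption never disabled */

	err = -EDEADLK;
	if (unlikely(stop_cpus_in_progress))
		goto unlock;			/* likewise untouched */

	err = 0;
	__cpu_stop_queue_work(stopper1, work1, &wakeq);
	__cpu_stop_queue_work(stopper2, work2, &wakeq);
	preempt_disable();			/* reached only when err == 0 */
unlock:
	/* ... drop both stopper locks ... */

	if (!err) {
		wake_up_q(&wakeq);
		preempt_enable();		/* pairs 1:1 with the disable above */
	}

So each preempt_disable() is matched by exactly one preempt_enable().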

Thanks,
Isaac Manjarres

On 2018-07-23 23:23, Sebastian Andrzej Siewior wrote:
> On 2018-07-23 18:13:48 [-0700], isaacm@codeaurora.org wrote:
>> Hi all,
> Hi,
> 
>> Are there any comments about this patch?
> 
> I haven't looked in detail at this, but your new preempt_disable() makes
> things unbalanced for the err != 0 case.
> 
>> Thanks,
>> Isaac Manjarres
> 
> Sebastian


* [tip:sched/core] stop_machine: Disable preemption after queueing stopper threads
  2018-07-17 19:35 [PATCH] stop_machine: Disable preemption after queueing stopper threads Isaac J. Manjarres
  2018-07-24  1:13 ` isaacm
@ 2018-07-25 14:21 ` tip-bot for Isaac J. Manjarres
  1 sibling, 0 replies; 15+ messages in thread
From: tip-bot for Isaac J. Manjarres @ 2018-07-25 14:21 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: psodagud, linux-kernel, tglx, hpa, mingo, isaacm, torvalds,
	pkondeti, peterz

Commit-ID:  2610e88946632afb78aa58e61f11368ac4c0af7b
Gitweb:     https://git.kernel.org/tip/2610e88946632afb78aa58e61f11368ac4c0af7b
Author:     Isaac J. Manjarres <isaacm@codeaurora.org>
AuthorDate: Tue, 17 Jul 2018 12:35:29 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 25 Jul 2018 11:25:08 +0200

stop_machine: Disable preemption after queueing stopper threads

This commit:

  9fb8d5dc4b64 ("stop_machine, Disable preemption when waking two stopper threads")

does not fully address the race condition that can occur
as follows:

On one CPU, call it CPU 3, thread 1 invokes
cpu_stop_queue_two_works(2, 3,...), and the execution is such
that thread 1 queues the works for migration/2 and migration/3,
and is preempted after releasing the locks for migration/2 and
migration/3, but before waking the threads.

Then, on CPU 2, a kworker, call it thread 2, is running,
and it invokes cpu_stop_queue_two_works(1, 2,...), such that
thread 2 queues the works for migration/1 and migration/2.
Meanwhile, on CPU 3, thread 1 resumes execution and wakes
migration/2 and migration/3. This means that after thread 2
releases the locks for migration/1 and migration/2, but
before it wakes those threads, it can be preempted by
migration/2.

If thread 2 is preempted by migration/2, then migration/2 will
execute the first work item successfully, since migration/3
was woken up by CPU 3. But when it goes to execute the second
work item, it disables preemption and calls multi_cpu_stop(),
and thus CPU 2 will wait forever for migration/1, which should
have been woken up by thread 2. However, migration/1 cannot be
woken up by thread 2: thread 2 is a kworker, so it is affine to
CPU 2, but CPU 2 is running migration/2 with preemption
disabled, so thread 2 will never run.

Disable preemption after queueing works for stopper threads
to ensure that the operation of queueing the works and waking
the stopper threads is atomic.

Co-Developed-by: Prasad Sodagudi <psodagud@codeaurora.org>
Co-Developed-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bigeasy@linutronix.de
Cc: gregkh@linuxfoundation.org
Cc: matt@codeblueprint.co.uk
Fixes: 9fb8d5dc4b64 ("stop_machine, Disable preemption when waking two stopper threads")
Link: http://lkml.kernel.org/r/1531856129-9871-1-git-send-email-isaacm@codeaurora.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/stop_machine.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 1ff523dae6e2..e190d1ef3a23 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -260,6 +260,15 @@ retry:
 	err = 0;
 	__cpu_stop_queue_work(stopper1, work1, &wakeq);
 	__cpu_stop_queue_work(stopper2, work2, &wakeq);
+	/*
+	 * The waking up of stopper threads has to happen
+	 * in the same scheduling context as the queueing.
+	 * Otherwise, there is a possibility of one of the
+	 * above stoppers being woken up by another CPU,
+	 * and preempting us. This will cause us to not
+	 * wake up the other stopper forever.
+	 */
+	preempt_disable();
 unlock:
 	raw_spin_unlock(&stopper2->lock);
 	raw_spin_unlock_irq(&stopper1->lock);
@@ -271,7 +280,6 @@ unlock:
 	}
 
 	if (!err) {
-		preempt_disable();
 		wake_up_q(&wakeq);
 		preempt_enable();
 	}


* Re: [PATCH] stop_machine: Disable preemption after queueing stopper threads
  2018-07-24  6:23   ` Sebastian Andrzej Siewior
  2018-07-25  4:15     ` isaacm
@ 2018-07-30 10:20     ` Thomas Gleixner
  2018-07-30 11:21       ` Peter Zijlstra
  1 sibling, 1 reply; 15+ messages in thread
From: Thomas Gleixner @ 2018-07-30 10:20 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: isaacm, peterz, matt, mingo, linux-kernel, psodagud, gregkh,
	pkondeti, stable

On Tue, 24 Jul 2018, Sebastian Andrzej Siewior wrote:
> On 2018-07-23 18:13:48 [-0700], isaacm@codeaurora.org wrote:
> > Hi all,
> Hi,
> 
> > Are there any comments about this patch?
> 
> I haven't looked in detail at this, but your new preempt_disable() makes
> things unbalanced for the err != 0 case.

It doesn't, but that code is really an unreadable pile of ...


* Re: [PATCH] stop_machine: Disable preemption after queueing stopper threads
  2018-07-30 10:20     ` Thomas Gleixner
@ 2018-07-30 11:21       ` Peter Zijlstra
  2018-07-30 12:41         ` Thomas Gleixner
                           ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Peter Zijlstra @ 2018-07-30 11:21 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Sebastian Andrzej Siewior, isaacm, matt, mingo, linux-kernel,
	psodagud, gregkh, pkondeti, stable

On Mon, Jul 30, 2018 at 12:20:57PM +0200, Thomas Gleixner wrote:
> On Tue, 24 Jul 2018, Sebastian Andrzej Siewior wrote:
> > On 2018-07-23 18:13:48 [-0700], isaacm@codeaurora.org wrote:
> > > Hi all,
> > Hi,
> > 
> > > Are there any comments about this patch?
> > 
> > I haven't looked in detail at this, but your new preempt_disable() makes
> > things unbalanced for the err != 0 case.
> 
> It doesn't, but that code is really an unreadable pile of ...

---
Subject: stop_machine: Reflow cpu_stop_queue_two_works()

The code flow in cpu_stop_queue_two_works() is a little arcane; fix
this by lifting the preempt_disable() to the top to create more natural
nesting wrt the spinlocks and make the wake_up_q() and preempt_enable()
unconditional at the end.

Furthermore, enable preemption in the -EDEADLK case, such that we
spin-wait with preemption enabled.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/stop_machine.c | 41 +++++++++++++++++++++++------------------
 1 file changed, 23 insertions(+), 18 deletions(-)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index e190d1ef3a23..34b6652e8677 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -236,13 +236,24 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
 	struct cpu_stopper *stopper2 = per_cpu_ptr(&cpu_stopper, cpu2);
 	DEFINE_WAKE_Q(wakeq);
 	int err;
+
 retry:
+	/*
+	 * The waking up of stopper threads has to happen in the same
+	 * scheduling context as the queueing.  Otherwise, there is a
+	 * possibility of one of the above stoppers being woken up by another
+	 * CPU, and preempting us. This will cause us to not wake up the other
+	 * stopper forever.
+	 */
+	preempt_disable();
 	raw_spin_lock_irq(&stopper1->lock);
 	raw_spin_lock_nested(&stopper2->lock, SINGLE_DEPTH_NESTING);
 
-	err = -ENOENT;
-	if (!stopper1->enabled || !stopper2->enabled)
+	if (!stopper1->enabled || !stopper2->enabled) {
+		err = -ENOENT;
 		goto unlock;
+	}
+
 	/*
 	 * Ensure that if we race with __stop_cpus() the stoppers won't get
 	 * queued up in reverse order leading to system deadlock.
@@ -253,36 +264,30 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
 	 * It can be falsely true but it is safe to spin until it is cleared,
 	 * queue_stop_cpus_work() does everything under preempt_disable().
 	 */
-	err = -EDEADLK;
-	if (unlikely(stop_cpus_in_progress))
-			goto unlock;
+	if (unlikely(stop_cpus_in_progress)) {
+		err = -EDEADLK;
+		goto unlock;
+	}
 
 	err = 0;
 	__cpu_stop_queue_work(stopper1, work1, &wakeq);
 	__cpu_stop_queue_work(stopper2, work2, &wakeq);
-	/*
-	 * The waking up of stopper threads has to happen
-	 * in the same scheduling context as the queueing.
-	 * Otherwise, there is a possibility of one of the
-	 * above stoppers being woken up by another CPU,
-	 * and preempting us. This will cause us to not
-	 * wake up the other stopper forever.
-	 */
-	preempt_disable();
+
 unlock:
 	raw_spin_unlock(&stopper2->lock);
 	raw_spin_unlock_irq(&stopper1->lock);
 
 	if (unlikely(err == -EDEADLK)) {
+		preempt_enable();
+
 		while (stop_cpus_in_progress)
 			cpu_relax();
+
 		goto retry;
 	}
 
-	if (!err) {
-		wake_up_q(&wakeq);
-		preempt_enable();
-	}
+	wake_up_q(&wakeq);
+	preempt_enable();
 
 	return err;
 }
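
One property worth spelling out about the now-unconditional tail (a
sketch of the reasoning, assuming the standard wake-queue semantics,
not part of the patch itself):

	/*
	 * DEFINE_WAKE_Q() initializes head->first to WAKE_Q_TAIL (empty),
	 * and wake_up_q() just walks entries until it sees WAKE_Q_TAIL,
	 * so on the -ENOENT path, where nothing was queued, it wakes
	 * nothing.  The unconditional preempt_enable() then balances the
	 * preempt_disable() taken at the retry: label on every path that
	 * falls through to the end.
	 */

The -EDEADLK path never reaches the tail; it re-enables preemption
itself and loops back to retry.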


* Re: [PATCH] stop_machine: Disable preemption after queueing stopper threads
  2018-07-30 11:21       ` Peter Zijlstra
@ 2018-07-30 12:41         ` Thomas Gleixner
  2018-07-30 17:12           ` Sodagudi Prasad
  2018-08-02 12:06         ` [tip:sched/core] stop_machine: Reflow cpu_stop_queue_two_works() tip-bot for Peter Zijlstra
  2018-08-02 13:27         ` tip-bot for Peter Zijlstra
  2 siblings, 1 reply; 15+ messages in thread
From: Thomas Gleixner @ 2018-07-30 12:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sebastian Andrzej Siewior, isaacm, matt, mingo, linux-kernel,
	psodagud, gregkh, pkondeti, stable

On Mon, 30 Jul 2018, Peter Zijlstra wrote:

> On Mon, Jul 30, 2018 at 12:20:57PM +0200, Thomas Gleixner wrote:
> > On Tue, 24 Jul 2018, Sebastian Andrzej Siewior wrote:
> > > On 2018-07-23 18:13:48 [-0700], isaacm@codeaurora.org wrote:
> > > > Hi all,
> > > Hi,
> > > 
> > > > Are there any comments about this patch?
> > > 
> > > I haven't looked in detail at this, but your new preempt_disable() makes
> > > things unbalanced for the err != 0 case.
> > 
> > It doesn't, but that code is really an unreadable pile of ...
> 
> ---
> Subject: stop_machine: Reflow cpu_stop_queue_two_works()
> 
> The code flow in cpu_stop_queue_two_works() is a little arcane; fix
> this by lifting the preempt_disable() to the top to create more natural
> nesting wrt the spinlocks and make the wake_up_q() and preempt_enable()
> unconditional at the end.
> 
> Furthermore, enable preemption in the -EDEADLK case, such that we
> spin-wait with preemption enabled.
> 
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Thanks for cleaning that up!

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>


* Re: [PATCH] stop_machine: Disable preemption after queueing stopper threads
  2018-07-30 12:41         ` Thomas Gleixner
@ 2018-07-30 17:12           ` Sodagudi Prasad
  2018-07-30 17:16             ` Thomas Gleixner
  2018-07-30 21:07             ` Peter Zijlstra
  0 siblings, 2 replies; 15+ messages in thread
From: Sodagudi Prasad @ 2018-07-30 17:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Sebastian Andrzej Siewior, isaacm, matt, mingo,
	linux-kernel, gregkh, pkondeti, stable

On 2018-07-30 05:41, Thomas Gleixner wrote:
> On Mon, 30 Jul 2018, Peter Zijlstra wrote:
> 
>> On Mon, Jul 30, 2018 at 12:20:57PM +0200, Thomas Gleixner wrote:
>> > On Tue, 24 Jul 2018, Sebastian Andrzej Siewior wrote:
>> > > On 2018-07-23 18:13:48 [-0700], isaacm@codeaurora.org wrote:
>> > > > Hi all,
>> > > Hi,
>> > >
>> > > > Are there any comments about this patch?
>> > >
>> > > I haven't looked in detail at this, but your new preempt_disable() makes
>> > > things unbalanced for the err != 0 case.
>> >
>> > It doesn't, but that code is really an unreadable pile of ...
>> 
>> ---
>> Subject: stop_machine: Reflow cpu_stop_queue_two_works()
>> 
>> The code flow in cpu_stop_queue_two_works() is a little arcane; fix
>> this by lifting the preempt_disable() to the top to create more
>> natural nesting wrt the spinlocks and make the wake_up_q() and
>> preempt_enable() unconditional at the end.
>> 
>> Furthermore, enable preemption in the -EDEADLK case, such that we
>> spin-wait with preemption enabled.
>> 
>> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
>> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> 
Hi Peter/Thomas,

How about including the below change as well?  Currently, there is no
way to identify whether thread migrations have completed or not.  When
we observed this issue, the symptom was a workqueue lockup. It is
better to have some timeout here and induce the BUG_ON.

There is no way to identify whether the migration threads are stuck or not.

--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -290,6 +290,7 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *
        struct cpu_stop_done done;
        struct cpu_stop_work work1, work2;
        struct multi_stop_data msdata;
+       int ret;

        msdata = (struct multi_stop_data){
                .fn = fn,
@@ -312,7 +313,10 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *
        if (cpu_stop_queue_two_works(cpu1, &work1, cpu2, &work2))
                return -ENOENT;

-       wait_for_completion(&done.completion);
+       ret = wait_for_completion_timeout(&done.completion, msecs_to_jiffies(1000));
+       if (!ret)
+               BUG_ON(1);
+


> Thanks for cleaning that up!
> 
> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


* Re: [PATCH] stop_machine: Disable preemption after queueing stopper threads
  2018-07-30 17:12           ` Sodagudi Prasad
@ 2018-07-30 17:16             ` Thomas Gleixner
  2018-07-30 21:07             ` Peter Zijlstra
  1 sibling, 0 replies; 15+ messages in thread
From: Thomas Gleixner @ 2018-07-30 17:16 UTC (permalink / raw)
  To: Sodagudi Prasad
  Cc: Peter Zijlstra, Sebastian Andrzej Siewior, isaacm, matt, mingo,
	linux-kernel, gregkh, pkondeti, stable

On Mon, 30 Jul 2018, Sodagudi Prasad wrote:
> How about including the below change as well?  Currently, there is no way to

That would be a completely separate change.

> identify whether thread migrations have completed or not.  When we observed
> this issue, the symptom was a workqueue lockup. It is better to have some
> timeout here and induce the BUG_ON.

BUG_ON() is wrong. Why kill the thing if there is at least a theoretical
chance that stuff can continue half baked, so you can get more info out of
it? The backtrace is pretty much uninteresting.

Thanks,

	tglx


* Re: [PATCH] stop_machine: Disable preemption after queueing stopper threads
  2018-07-30 17:12           ` Sodagudi Prasad
  2018-07-30 17:16             ` Thomas Gleixner
@ 2018-07-30 21:07             ` Peter Zijlstra
  2018-08-01  8:07               ` Sodagudi Prasad
  1 sibling, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2018-07-30 21:07 UTC (permalink / raw)
  To: Sodagudi Prasad
  Cc: Thomas Gleixner, Sebastian Andrzej Siewior, isaacm, matt, mingo,
	linux-kernel, gregkh, pkondeti, stable

On Mon, Jul 30, 2018 at 10:12:43AM -0700, Sodagudi Prasad wrote:
> How about including the below change as well?  Currently, there is no way to
> identify whether thread migrations have completed or not.  When we observed
> this issue, the symptom was a workqueue lockup. It is better to have some
> timeout here and induce the BUG_ON.

You'd trigger the soft-lockup or hung-task detector, I think. And if not,
we ought to look at making it trigger at least one of those.

> There is no way to identify whether the migration threads are stuck or not.

Should be pretty obvious from the splat generated by the above, no?
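
For what it's worth, a sketch of how those detectors would map onto
this particular hang (assumptions about default configs, not something
verified on the reported system):

/*
 * - CPU 2 spins in multi_cpu_stop() with preemption disabled, so its
 *   per-CPU watchdog thread cannot run: the soft-lockup detector
 *   (CONFIG_SOFTLOCKUP_DETECTOR) should fire after about
 *   2 * watchdog_thresh seconds (20s with defaults).
 * - Any task that subsequently blocks in wait_for_completion() on a
 *   stuck stopper sits in TASK_UNINTERRUPTIBLE, so the hung-task
 *   detector (CONFIG_DETECT_HUNG_TASK) should flag it after
 *   kernel.hung_task_timeout_secs (120s by default).
 */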

> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -290,6 +290,7 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *
>         struct cpu_stop_done done;
>         struct cpu_stop_work work1, work2;
>         struct multi_stop_data msdata;
> +       int ret;
> 
>         msdata = (struct multi_stop_data){
>                 .fn = fn,
> @@ -312,7 +313,10 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *
>         if (cpu_stop_queue_two_works(cpu1, &work1, cpu2, &work2))
>                 return -ENOENT;
> 
> -       wait_for_completion(&done.completion);
> +       ret = wait_for_completion_timeout(&done.completion, msecs_to_jiffies(1000));
> +       if (!ret)
> +               BUG_ON(1);
> +

That's a random timeout which, if you spuriously trigger it, will take
down your machine. That seems like a cure worse than the disease.
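
If one still wanted an early, non-fatal hint in stop_two_cpus(), a
gentler variant might look like the sketch below (hypothetical, not a
posted patch; note that work1/work2 live on the caller's stack, so the
wait must not be abandoned either way):

	/* Warn once if the stoppers look stuck, but keep waiting. */
	if (!wait_for_completion_timeout(&done.completion,
					 msecs_to_jiffies(1000))) {
		WARN(1, "stop_two_cpus(%u, %u): stopper threads did not run\n",
		     cpu1, cpu2);
		wait_for_completion(&done.completion);
	}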


* Re: [PATCH] stop_machine: Disable preemption after queueing stopper threads
  2018-07-30 21:07             ` Peter Zijlstra
@ 2018-08-01  8:07               ` Sodagudi Prasad
  2018-08-06  8:37                 ` Pavan Kondeti
  0 siblings, 1 reply; 15+ messages in thread
From: Sodagudi Prasad @ 2018-08-01  8:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Sebastian Andrzej Siewior, isaacm, matt, mingo,
	linux-kernel, gregkh, pkondeti, stable

On 2018-07-30 14:07, Peter Zijlstra wrote:
> On Mon, Jul 30, 2018 at 10:12:43AM -0700, Sodagudi Prasad wrote:
>> How about including the below change as well?  Currently, there is no
>> way to identify whether thread migrations have completed or not.  When
>> we observed this issue, the symptom was a workqueue lockup. It is
>> better to have some timeout here and induce the BUG_ON.
> 
> You'd trigger the soft-lockup or hung-task detector, I think. And if
> not, we ought to look at making it trigger at least one of those.
> 
>> There is no way to identify whether the migration threads are stuck or not.
> 
> Should be pretty obvious from the splat generated by the above, no?
Hi Peter and Thomas,

Thanks for your support.
I have another question on this flow and the retry mechanism used in
the cpu_stop_queue_two_works() function, based on the global variable
stop_cpus_in_progress.

This variable is used in various paths, such as task migration,
setting task affinity, and CPU hotplug.

For example, in the CPU hotplug path, the stop_cpus_in_progress
variable is set to true without any check:
takedown_cpu()
--stop_machine_cpuslocked()
---stop_cpus()
----__stop_cpus()
-----queue_stop_cpus_work()
setting stop_cpus_in_progress to true directly.

But the stop_cpus_in_progress variable is used for retry only in the
task migration path.

I am thinking that the stop_cpus_in_progress variable can lead to race
conditions where CPU hotplug and task migration happen simultaneously.
Please correct me if my understanding is wrong.
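
For reference, the queueing that the hotplug path lands in looks
roughly like this in kernel/stop_machine.c of this era (condensed from
my reading of the source; details may differ between trees):

static bool queue_stop_cpus_work(const struct cpumask *cpumask,
				 cpu_stop_fn_t fn, void *arg,
				 struct cpu_stop_done *done)
{
	struct cpu_stop_work *work;
	unsigned int cpu;
	bool queued = false;

	/*
	 * The flag is indeed set unconditionally, but only with
	 * preemption disabled and before any work is queued;
	 * cpu_stop_queue_two_works() spins until it clears again.
	 */
	preempt_disable();
	stop_cpus_in_progress = true;
	for_each_cpu(cpu, cpumask) {
		work = &per_cpu(cpu_stopper.stop_work, cpu);
		work->fn = fn;
		work->arg = arg;
		work->done = done;
		if (cpu_stop_queue_work(cpu, work))
			queued = true;
	}
	stop_cpus_in_progress = false;
	preempt_enable();

	return queued;
}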

-Thanks, Prasad

> 
>> --- a/kernel/stop_machine.c
>> +++ b/kernel/stop_machine.c
>> @@ -290,6 +290,7 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *
>>         struct cpu_stop_done done;
>>         struct cpu_stop_work work1, work2;
>>         struct multi_stop_data msdata;
>> +       int ret;
>> 
>>         msdata = (struct multi_stop_data){
>>                 .fn = fn,
>> @@ -312,7 +313,10 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *
>>         if (cpu_stop_queue_two_works(cpu1, &work1, cpu2, &work2))
>>                 return -ENOENT;
>> 
>> -       wait_for_completion(&done.completion);
>> +       ret = wait_for_completion_timeout(&done.completion, msecs_to_jiffies(1000));
>> +       if (!ret)
>> +               BUG_ON(1);
>> +
> 
> That's a random timeout, which if you spuriously trigger it, will take
> down your machine. That seems like a cure worse than the disease.

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


* [tip:sched/core] stop_machine: Reflow cpu_stop_queue_two_works()
  2018-07-30 11:21       ` Peter Zijlstra
  2018-07-30 12:41         ` Thomas Gleixner
@ 2018-08-02 12:06         ` tip-bot for Peter Zijlstra
  2018-08-02 13:27         ` tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 15+ messages in thread
From: tip-bot for Peter Zijlstra @ 2018-08-02 12:06 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: tglx, linux-kernel, peterz, bigeasy, mingo, hpa

Commit-ID:  2171ce2d470d6e389ebbef3edd22c7643918a02f
Gitweb:     https://git.kernel.org/tip/2171ce2d470d6e389ebbef3edd22c7643918a02f
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 30 Jul 2018 13:21:40 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 2 Aug 2018 14:02:53 +0200

stop_machine: Reflow cpu_stop_queue_two_works()

The code flow in cpu_stop_queue_two_works() is a little arcane; fix this by
lifting the preempt_disable() to the top to create more natural nesting wrt
the spinlocks and make the wake_up_q() and preempt_enable() unconditional
at the end.

Furthermore, enable preemption in the -EDEADLK case, such that we spin-wait
with preemption enabled.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: isaacm@codeaurora.org
Cc: matt@codeblueprint.co.uk
Cc: psodagud@codeaurora.org
Cc: gregkh@linuxfoundation.org
Cc: pkondeti@codeaurora.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180730112140.GH2494@hirez.programming.kicks-ass.net
---
 kernel/stop_machine.c | 41 +++++++++++++++++++++++------------------
 1 file changed, 23 insertions(+), 18 deletions(-)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index e190d1ef3a23..34b6652e8677 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -236,13 +236,24 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
 	struct cpu_stopper *stopper2 = per_cpu_ptr(&cpu_stopper, cpu2);
 	DEFINE_WAKE_Q(wakeq);
 	int err;
+
 retry:
+	/*
+	 * The waking up of stopper threads has to happen in the same
+	 * scheduling context as the queueing.  Otherwise, there is a
+	 * possibility of one of the above stoppers being woken up by another
+	 * CPU, and preempting us. This will cause us to not wake up the other
+	 * stopper forever.
+	 */
+	preempt_disable();
 	raw_spin_lock_irq(&stopper1->lock);
 	raw_spin_lock_nested(&stopper2->lock, SINGLE_DEPTH_NESTING);
 
-	err = -ENOENT;
-	if (!stopper1->enabled || !stopper2->enabled)
+	if (!stopper1->enabled || !stopper2->enabled) {
+		err = -ENOENT;
 		goto unlock;
+	}
+
 	/*
 	 * Ensure that if we race with __stop_cpus() the stoppers won't get
 	 * queued up in reverse order leading to system deadlock.
@@ -253,36 +264,30 @@ retry:
 	 * It can be falsely true but it is safe to spin until it is cleared,
 	 * queue_stop_cpus_work() does everything under preempt_disable().
 	 */
-	err = -EDEADLK;
-	if (unlikely(stop_cpus_in_progress))
-			goto unlock;
+	if (unlikely(stop_cpus_in_progress)) {
+		err = -EDEADLK;
+		goto unlock;
+	}
 
 	err = 0;
 	__cpu_stop_queue_work(stopper1, work1, &wakeq);
 	__cpu_stop_queue_work(stopper2, work2, &wakeq);
-	/*
-	 * The waking up of stopper threads has to happen
-	 * in the same scheduling context as the queueing.
-	 * Otherwise, there is a possibility of one of the
-	 * above stoppers being woken up by another CPU,
-	 * and preempting us. This will cause us to not
-	 * wake up the other stopper forever.
-	 */
-	preempt_disable();
+
 unlock:
 	raw_spin_unlock(&stopper2->lock);
 	raw_spin_unlock_irq(&stopper1->lock);
 
 	if (unlikely(err == -EDEADLK)) {
+		preempt_enable();
+
 		while (stop_cpus_in_progress)
 			cpu_relax();
+
 		goto retry;
 	}
 
-	if (!err) {
-		wake_up_q(&wakeq);
-		preempt_enable();
-	}
+	wake_up_q(&wakeq);
+	preempt_enable();
 
 	return err;
 }


* [tip:sched/core] stop_machine: Reflow cpu_stop_queue_two_works()
  2018-07-30 11:21       ` Peter Zijlstra
  2018-07-30 12:41         ` Thomas Gleixner
  2018-08-02 12:06         ` [tip:sched/core] stop_machine: Reflow cpu_stop_queue_two_works() tip-bot for Peter Zijlstra
@ 2018-08-02 13:27         ` tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 15+ messages in thread
From: tip-bot for Peter Zijlstra @ 2018-08-02 13:27 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, tglx, peterz, bigeasy, hpa, mingo

Commit-ID:  b80a2bfce85e1051056d98d04ecb2d0b55cbbc1c
Gitweb:     https://git.kernel.org/tip/b80a2bfce85e1051056d98d04ecb2d0b55cbbc1c
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 30 Jul 2018 13:21:40 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 2 Aug 2018 15:25:20 +0200

stop_machine: Reflow cpu_stop_queue_two_works()

The code flow in cpu_stop_queue_two_works() is a little arcane; fix this by
lifting the preempt_disable() to the top to create more natural nesting wrt
the spinlocks and make the wake_up_q() and preempt_enable() unconditional
at the end.

Furthermore, enable preemption in the -EDEADLK case, such that we spin-wait
with preemption enabled.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: isaacm@codeaurora.org
Cc: matt@codeblueprint.co.uk
Cc: psodagud@codeaurora.org
Cc: gregkh@linuxfoundation.org
Cc: pkondeti@codeaurora.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180730112140.GH2494@hirez.programming.kicks-ass.net
---
 kernel/stop_machine.c | 41 +++++++++++++++++++++++------------------
 1 file changed, 23 insertions(+), 18 deletions(-)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index e190d1ef3a23..34b6652e8677 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -236,13 +236,24 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
 	struct cpu_stopper *stopper2 = per_cpu_ptr(&cpu_stopper, cpu2);
 	DEFINE_WAKE_Q(wakeq);
 	int err;
+
 retry:
+	/*
+	 * The waking up of stopper threads has to happen in the same
+	 * scheduling context as the queueing.  Otherwise, there is a
+	 * possibility of one of the above stoppers being woken up by another
+	 * CPU, and preempting us. This will cause us to not wake up the other
+	 * stopper forever.
+	 */
+	preempt_disable();
 	raw_spin_lock_irq(&stopper1->lock);
 	raw_spin_lock_nested(&stopper2->lock, SINGLE_DEPTH_NESTING);
 
-	err = -ENOENT;
-	if (!stopper1->enabled || !stopper2->enabled)
+	if (!stopper1->enabled || !stopper2->enabled) {
+		err = -ENOENT;
 		goto unlock;
+	}
+
 	/*
 	 * Ensure that if we race with __stop_cpus() the stoppers won't get
 	 * queued up in reverse order leading to system deadlock.
@@ -253,36 +264,30 @@ retry:
 	 * It can be falsely true but it is safe to spin until it is cleared,
 	 * queue_stop_cpus_work() does everything under preempt_disable().
 	 */
-	err = -EDEADLK;
-	if (unlikely(stop_cpus_in_progress))
-			goto unlock;
+	if (unlikely(stop_cpus_in_progress)) {
+		err = -EDEADLK;
+		goto unlock;
+	}
 
 	err = 0;
 	__cpu_stop_queue_work(stopper1, work1, &wakeq);
 	__cpu_stop_queue_work(stopper2, work2, &wakeq);
-	/*
-	 * The waking up of stopper threads has to happen
-	 * in the same scheduling context as the queueing.
-	 * Otherwise, there is a possibility of one of the
-	 * above stoppers being woken up by another CPU,
-	 * and preempting us. This will cause us to not
-	 * wake up the other stopper forever.
-	 */
-	preempt_disable();
+
 unlock:
 	raw_spin_unlock(&stopper2->lock);
 	raw_spin_unlock_irq(&stopper1->lock);
 
 	if (unlikely(err == -EDEADLK)) {
+		preempt_enable();
+
 		while (stop_cpus_in_progress)
 			cpu_relax();
+
 		goto retry;
 	}
 
-	if (!err) {
-		wake_up_q(&wakeq);
-		preempt_enable();
-	}
+	wake_up_q(&wakeq);
+	preempt_enable();
 
 	return err;
 }


* Re: [PATCH] stop_machine: Disable preemption after queueing stopper threads
  2018-08-01  8:07               ` Sodagudi Prasad
@ 2018-08-06  8:37                 ` Pavan Kondeti
  0 siblings, 0 replies; 15+ messages in thread
From: Pavan Kondeti @ 2018-08-06  8:37 UTC (permalink / raw)
  To: Sodagudi Prasad
  Cc: Peter Zijlstra, Thomas Gleixner, Sebastian Andrzej Siewior,
	isaacm, matt, mingo, linux-kernel, gregkh, stable

Hi Prasad,

On Wed, Aug 01, 2018 at 01:07:03AM -0700, Sodagudi Prasad wrote:
> On 2018-07-30 14:07, Peter Zijlstra wrote:
> >On Mon, Jul 30, 2018 at 10:12:43AM -0700, Sodagudi Prasad wrote:
> >> How about including the below change as well?  Currently, there is no
> >> way to identify whether thread migrations have completed or not.  When
> >> we observed this issue, the symptom was a workqueue lockup. It is
> >> better to have some timeout here and induce the BUG_ON.
> >
> > You'd trigger the soft-lockup or hung-task detector, I think. And if
> > not, we ought to look at making it trigger at least one of those.
> >
> >> There is no way to identify whether the migration threads are stuck or not.
> >
> > Should be pretty obvious from the splat generated by the above, no?
> Hi Peter and Thomas,
> 
> Thanks for your support.
> I have another question on this flow and the retry mechanism used in
> the cpu_stop_queue_two_works() function, based on the global variable
> stop_cpus_in_progress.
> 
> This variable is used in various paths, such as task migration,
> setting task affinity, and CPU hotplug.
> 
> For example, in the CPU hotplug path, the stop_cpus_in_progress
> variable is set to true without any check:
> takedown_cpu()
> --stop_machine_cpuslocked()
> ---stop_cpus()
> ----__stop_cpus()
> -----queue_stop_cpus_work()
> setting stop_cpus_in_progress to true directly.
> 
> But the stop_cpus_in_progress variable is used for retry only in the
> task migration path.
> 
> I am thinking that the stop_cpus_in_progress variable can lead to race
> conditions where CPU hotplug and task migration happen simultaneously.
> Please correct me if my understanding is wrong.
> 

The stop_cpus_in_progress variable is there to guard against out-of-order
queueing. The stopper locks do not protect against this when
cpu_stop_queue_two_works() and stop_cpus() are executing in parallel.

The stop_one_cpu{,_nowait}() functions are called to handle affinity changes
and load balancing. Since they queue work on only one CPU, the
stop_cpus_in_progress protection is not needed there.
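
To spell out the out-of-order case that flag guards against (a sketch;
each per-CPU stopper queue is FIFO, so the interleaving of queueing
decides which multi_cpu_stop() instance a stopper enters first):

/*
 *   stop_cpus() [A]                   stop_two_cpus(1, 2) [B]
 *   queue A's work on migration/1
 *                                     queue B's work on migration/2
 *                                     queue B's work on migration/1
 *   queue A's work on migration/2
 *
 * migration/1 runs A's multi_cpu_stop() first and spins until
 * migration/2 enters A; migration/2 runs B's multi_cpu_stop() first
 * and spins until migration/1 enters B.  Neither advances: system
 * deadlock.  Holding stop_cpus_in_progress across the whole of
 * queue_stop_cpus_work() makes cpu_stop_queue_two_works() back off
 * (-EDEADLK) and retry instead.
 */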

Thanks,
Pavan
-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.


