* [PATCH] kthread: Atomically set completion and perform dequeue in __kthread_parkme
@ 2017-06-26 22:18 Vikram Mulukutla
  2017-06-26 23:03 ` Vikram Mulukutla
  2017-07-04 16:07 ` Peter Zijlstra
  0 siblings, 2 replies; 7+ messages in thread
From: Vikram Mulukutla @ 2017-06-26 22:18 UTC (permalink / raw)
  To: rusty, tj, tglx, akpm; +Cc: linux-kernel, stable, Vikram Mulukutla

kthread_park waits for the target kthread to park itself with
__kthread_parkme using a completion variable. __kthread_parkme - which is
invoked by the target kthread - sets the completion variable before
calling schedule() to voluntarily get itself off of the runqueue.
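
For reference, the pre-patch loop looks like this (reconstructed from the
diff context below), with the problematic window marked:

	static void __kthread_parkme(struct kthread *self)
	{
		__set_current_state(TASK_PARKED);
		while (test_bit(KTHREAD_SHOULD_PARK, &self->flags)) {
			if (!test_and_set_bit(KTHREAD_IS_PARKED, &self->flags))
				complete(&self->parked);
			/*
			 * Window: if 'self' is preempted here, the parked
			 * completion is already set but 'self' is still on
			 * the runqueue.
			 */
			schedule();
			__set_current_state(TASK_PARKED);
		}
		clear_bit(KTHREAD_IS_PARKED, &self->flags);
	}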

This causes an interesting race in the hotplug path. takedown_cpu()
invoked for CPU_X attempts to park the cpuhp/X hotplug kthread before
running the stopper thread on CPU_X. kthread_park doesn't guarantee that
cpuhp/X is off of X's runqueue, only that the thread has executed
__kthread_parkme and set the completion. cpuhp/X may have been preempted
out before calling schedule() to voluntarily sleep. takedown_cpu proceeds
to run the stopper thread on CPU_X which promptly migrates off the
still-on-rq cpuhp/X thread to another cpu CPU_Y, setting its affinity
mask to something other than CPU_X alone.
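
Schematically:

	cpuhp/X (on CPU_X)                 takedown_cpu() (for CPU_X)
	------------------                 --------------------------
	__kthread_parkme()
	  complete(&self->parked)
	  <preempted before schedule(),    kthread_park() sees the
	   still on CPU_X's runqueue>      completion and returns
	                                   stopper runs on CPU_X and migrates
	                                   the still-on-rq cpuhp/X to CPU_Y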

This is OK - cpuhp/X may finally get itself off of CPU_Y's runqueue at
some later point. But if that doesn't happen (for example, if there's
an RT thread on CPU_Y), the kthread_unpark in a subsequent cpu_up call
for CPU_X will race with the still-on-rq condition. Even then we're
functionally OK, because there is a wait_task_inactive() in the
kthread_unpark() path, BUT the following happens:

[   12.472745] BUG: scheduling while atomic: swapper/7/0/0x00000002
[   12.472749] Modules linked in:
[   12.472756] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 4.9.32-perf+ #680
[   12.472758] Hardware name: XXXXX
[   12.472760] Call trace:
[   12.472773] [<ffffff8eb4e87928>] dump_backtrace+0x0/0x198
[   12.472777] [<ffffff8eb4e87ad4>] show_stack+0x14/0x1c
[   12.472781] [<ffffff8eb516c998>] dump_stack+0x8c/0xac
[   12.472786] [<ffffff8eb4ecea28>] __schedule_bug+0x54/0x70
[   12.472792] [<ffffff8eb5bbf478>] __schedule+0x6b4/0x928
[   12.472794] [<ffffff8eb5bbf728>] schedule+0x3c/0xa0
[   12.472797] [<ffffff8eb5bc2950>] schedule_hrtimeout_range_clock+0x80/0xec
[   12.472799] [<ffffff8eb5bc29ec>] schedule_hrtimeout+0x18/0x20
[   12.472803] [<ffffff8eb4ed3b30>] wait_task_inactive+0x1a0/0x1a4
[   12.472806] [<ffffff8eb4ec1b88>] __kthread_bind_mask+0x20/0x7c
[   12.472809] [<ffffff8eb4ec1c0c>] __kthread_bind+0x28/0x30
[   12.472811] [<ffffff8eb4ec1c88>] __kthread_unpark+0x5c/0x60
[   12.472814] [<ffffff8eb4ec1cb0>] kthread_unpark+0x24/0x2c
[   12.472818] [<ffffff8eb4ea4a7c>] cpuhp_online_idle+0x50/0x90
[   12.472822] [<ffffff8eb4ef2940>] cpu_startup_entry+0x3c/0x1d4
[   12.472824] [<ffffff8eb4e8dae4>] secondary_start_kernel+0x164/0x1b4

Since kthread_unpark is invoked from a preemption-disabled context (the
idle loop of the upcoming CPU), wait_task_inactive's attempt to invoke
schedule() is invalid, causing the splat. Note that kthread_bind_mask is
correctly attempting to re-set the affinity mask, since cpuhp is a
per-cpu smpboot thread.
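
The offending call path, condensed from the splat above:

	idle task of the upcoming CPU (preemption disabled)
	  cpuhp_online_idle()
	    kthread_unpark()
	      __kthread_bind()          /* re-bind cpuhp/X to CPU_X */
	        wait_task_inactive()    /* cpuhp/X still on CPU_Y's rq, */
	          schedule_hrtimeout()  /* so it sleeps while polling ->
	                                   scheduling while atomic */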

Instead of adding an expensive wait_task_inactive inside kthread_park()
or trying to muck with the hotplug code, let's just ensure that setting
the completion and the subsequent schedule() happen atomically inside
__kthread_parkme. This focuses the fix on the hotplug requirement alone,
and removes the unnecessary migration of cpuhp/X.

Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
---
 kernel/kthread.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 26db528..7ad3354 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -171,9 +171,20 @@ static void __kthread_parkme(struct kthread *self)
 {
 	__set_current_state(TASK_PARKED);
 	while (test_bit(KTHREAD_SHOULD_PARK, &self->flags)) {
+		/*
+		 * Why the preempt_disable?
+		 * Hotplug needs to ensure that 'self' is off of the runqueue
+		 * as well, before scheduling the stopper thread that will
+		 * migrate tasks off of the runqueue that 'self' was running on.
+		 * This avoids unnecessary migration work and also ensures that
+		 * kthread_unpark in the cpu_up path doesn't race with
+		 * __kthread_parkme.
+		 */
+		preempt_disable();
 		if (!test_and_set_bit(KTHREAD_IS_PARKED, &self->flags))
 			complete(&self->parked);
-		schedule();
+		schedule_preempt_disabled();
+		preempt_enable();
 		__set_current_state(TASK_PARKED);
 	}
 	clear_bit(KTHREAD_IS_PARKED, &self->flags);
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


* Re: [PATCH] kthread: Atomically set completion and perform dequeue in __kthread_parkme
  2017-06-26 22:18 [PATCH] kthread: Atomically set completion and perform dequeue in __kthread_parkme Vikram Mulukutla
@ 2017-06-26 23:03 ` Vikram Mulukutla
  2017-06-28 14:05   ` Vikram Mulukutla
  2017-07-04 19:49   ` Thomas Gleixner
  2017-07-04 16:07 ` Peter Zijlstra
  1 sibling, 2 replies; 7+ messages in thread
From: Vikram Mulukutla @ 2017-06-26 23:03 UTC (permalink / raw)
  To: rusty, tj, tglx, akpm; +Cc: linux-kernel, stable


Correcting Thomas Gleixner's email address: s/linuxtronix/linutronix

On 6/26/2017 3:18 PM, Vikram Mulukutla wrote:
> kthread_park waits for the target kthread to park itself with
> __kthread_parkme using a completion variable. __kthread_parkme - which is
> invoked by the target kthread - sets the completion variable before
> calling schedule() to voluntarily get itself off of the runqueue.

<snip>


* Re: [PATCH] kthread: Atomically set completion and perform dequeue in __kthread_parkme
  2017-06-26 23:03 ` Vikram Mulukutla
@ 2017-06-28 14:05   ` Vikram Mulukutla
  2017-07-04 19:49   ` Thomas Gleixner
  1 sibling, 0 replies; 7+ messages in thread
From: Vikram Mulukutla @ 2017-06-28 14:05 UTC (permalink / raw)
  To: rusty, tj, tglx, akpm; +Cc: linux-kernel, stable



Ping. This is happening on x86 across suspend/resume too.
https://bugs.freedesktop.org/show_bug.cgi?id=100113

On 2017-06-26 16:03, Vikram Mulukutla wrote:
> correcting Thomas Gleixner's email address. s/linuxtronix/linutronix
> 
> On 6/26/2017 3:18 PM, Vikram Mulukutla wrote:
>> kthread_park waits for the target kthread to park itself with
>> __kthread_parkme using a completion variable. __kthread_parkme - which 
>> is
>> invoked by the target kthread - sets the completion variable before
>> calling schedule() to voluntarily get itself off of the runqueue.

<snip>

Thanks,
Vikram

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


* Re: [PATCH] kthread: Atomically set completion and perform dequeue in __kthread_parkme
  2017-06-26 22:18 [PATCH] kthread: Atomically set completion and perform dequeue in __kthread_parkme Vikram Mulukutla
  2017-06-26 23:03 ` Vikram Mulukutla
@ 2017-07-04 16:07 ` Peter Zijlstra
  2017-07-05 17:21   ` Vikram Mulukutla
  1 sibling, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2017-07-04 16:07 UTC (permalink / raw)
  To: Vikram Mulukutla; +Cc: rusty, tj, Thomas Gleixner, akpm, linux-kernel, stable

On Mon, Jun 26, 2017 at 03:18:03PM -0700, Vikram Mulukutla wrote:
>  kernel/kthread.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index 26db528..7ad3354 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -171,9 +171,20 @@ static void __kthread_parkme(struct kthread *self)
>  {
>  	__set_current_state(TASK_PARKED);
>  	while (test_bit(KTHREAD_SHOULD_PARK, &self->flags)) {
> +		/*
> +		 * Why the preempt_disable?
> +		 * Hotplug needs to ensure that 'self' is off of the runqueue
> +		 * as well, before scheduling the stopper thread that will
> +		 * migrate tasks off of the runqueue that 'self' was running on.
> +		 * This avoids unnecessary migration work and also ensures that
> +		 * kthread_unpark in the cpu_up path doesn't race with
> +		 * __kthread_parkme.
> +		 */
> +		preempt_disable();
>  		if (!test_and_set_bit(KTHREAD_IS_PARKED, &self->flags))
>  			complete(&self->parked);
> +		schedule_preempt_disabled();

This is broken. schedule_preempt_disabled() doesn't guarantee no
preemptions, just makes it less likely.
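
For reference, schedule_preempt_disabled() (from kernel/sched/core.c;
reproduced here from memory, so treat it as a sketch):

	void __sched schedule_preempt_disabled(void)
	{
		sched_preempt_enable_no_resched();
		schedule();
		preempt_disable();
	}

Once the preempt count is dropped, an interrupt arriving before the
context switch inside schedule() completes can still preempt the task,
so the complete() and the dequeue do not happen atomically.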

> +		preempt_enable();
>  		__set_current_state(TASK_PARKED);
>  	}
>  	clear_bit(KTHREAD_IS_PARKED, &self->flags);


* Re: [PATCH] kthread: Atomically set completion and perform dequeue in __kthread_parkme
  2017-06-26 23:03 ` Vikram Mulukutla
  2017-06-28 14:05   ` Vikram Mulukutla
@ 2017-07-04 19:49   ` Thomas Gleixner
  2017-07-05 17:23     ` Vikram Mulukutla
  1 sibling, 1 reply; 7+ messages in thread
From: Thomas Gleixner @ 2017-07-04 19:49 UTC (permalink / raw)
  To: Vikram Mulukutla
  Cc: Rusty Russell, Tejun Heo, Andrew Morton, LKML, Peter Zijlstra,
	Sebastian Sewior

On Mon, 26 Jun 2017, Vikram Mulukutla wrote:
> On 6/26/2017 3:18 PM, Vikram Mulukutla wrote:
> > kthread_park waits for the target kthread to park itself with
> > __kthread_parkme using a completion variable. __kthread_parkme - which is
> > invoked by the target kthread - sets the completion variable before
> > calling schedule() to voluntarily get itself off of the runqueue.
> > 
> > This causes an interesting race in the hotplug path. takedown_cpu()
> > invoked for CPU_X attempts to park the cpuhp/X hotplug kthread before
> > running the stopper thread on CPU_X. kthread_park doesn't guarantee that
> > cpuhp/X is off of X's runqueue, only that the thread has executed
> > __kthread_parkme and set the completion. cpuhp/X may have been preempted
> > out before calling schedule() to voluntarily sleep. takedown_cpu proceeds
> > to run the stopper thread on CPU_X which promptly migrates off the
> > still-on-rq cpuhp/X thread to another cpu CPU_Y, setting its affinity
> > mask to something other than CPU_X alone.
> > 
> > This is OK - cpuhp/X may finally get itself off of CPU_Y's runqueue at
> > some later point. But if that doesn't happen (for example, if there's
> > an RT thread on CPU_Y), the kthread_unpark in a subsequent cpu_up call
> > for CPU_X will race with the still-on-rq condition. Even then we're
> > functionally OK, because there is a wait_task_inactive() in the
> > kthread_unpark() path, BUT the following happens:
> > 
> > [   12.472745] BUG: scheduling while atomic: swapper/7/0/0x00000002

That's not the worst problem. We could simply enable preemption there, but
the real issue is that this is the idle task of the upcoming CPU, which is
not supposed to schedule in the first place.

So no, your 'fix' is just papering over the underlying issue.

And yes, the moron who did not think about wait_task_inactive() being
called via kthread_unpark() -> kthread_bind() is me.

I'm testing a proper fix for it right now. Will post later.

Thanks,

	tglx


* Re: [PATCH] kthread: Atomically set completion and perform dequeue in __kthread_parkme
  2017-07-04 16:07 ` Peter Zijlstra
@ 2017-07-05 17:21   ` Vikram Mulukutla
  0 siblings, 0 replies; 7+ messages in thread
From: Vikram Mulukutla @ 2017-07-05 17:21 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: rusty, tj, Thomas Gleixner, akpm, linux-kernel, stable

On 7/4/2017 9:07 AM, Peter Zijlstra wrote:
> On Mon, Jun 26, 2017 at 03:18:03PM -0700, Vikram Mulukutla wrote:
>>   kernel/kthread.c | 13 ++++++++++++-
>>   1 file changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/kthread.c b/kernel/kthread.c
>> index 26db528..7ad3354 100644
>> --- a/kernel/kthread.c
>> +++ b/kernel/kthread.c
>> @@ -171,9 +171,20 @@ static void __kthread_parkme(struct kthread *self)
>>   {
>>   	__set_current_state(TASK_PARKED);
>>   	while (test_bit(KTHREAD_SHOULD_PARK, &self->flags)) {
>> +		/*
>> +		 * Why the preempt_disable?
>> +		 * Hotplug needs to ensure that 'self' is off of the runqueue
>> +		 * as well, before scheduling the stopper thread that will
>> +		 * migrate tasks off of the runqueue that 'self' was running on.
>> +		 * This avoids unnecessary migration work and also ensures that
>> +		 * kthread_unpark in the cpu_up path doesn't race with
>> +		 * __kthread_parkme.
>> +		 */
>> +		preempt_disable();
>>   		if (!test_and_set_bit(KTHREAD_IS_PARKED, &self->flags))
>>   			complete(&self->parked);
>> +		schedule_preempt_disabled();
> 
> This is broken. schedule_preempt_disabled() doesn't guarantee no
> preemptions, just makes it less likely.

Right, the API just informs the scheduler that the calling thread
wishes to have preemption disabled when the API returns. I thought
it was going to guarantee no preemption until the thread is actually
off of the runqueue, but I see the window where an interrupt might
preempt. Doh.

Separate from this hotplug problem, would it be entirely moronic to have
the API disable and re-enable local interrupts across that short window? I
suppose there's no one that needs this sort of thing, so... no?

> 
>> +		preempt_enable();
>>   		__set_current_state(TASK_PARKED);
>>   	}
>>   	clear_bit(KTHREAD_IS_PARKED, &self->flags);

Thanks,
Vikram


* Re: [PATCH] kthread: Atomically set completion and perform dequeue in __kthread_parkme
  2017-07-04 19:49   ` Thomas Gleixner
@ 2017-07-05 17:23     ` Vikram Mulukutla
  0 siblings, 0 replies; 7+ messages in thread
From: Vikram Mulukutla @ 2017-07-05 17:23 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Rusty Russell, Tejun Heo, Andrew Morton, LKML, Peter Zijlstra,
	Sebastian Sewior

On 7/4/2017 12:49 PM, Thomas Gleixner wrote:
> On Mon, 26 Jun 2017, Vikram Mulukutla wrote:
>> On 6/26/2017 3:18 PM, Vikram Mulukutla wrote:
>>> kthread_park waits for the target kthread to park itself with
>>> __kthread_parkme using a completion variable. __kthread_parkme - which is
>>> invoked by the target kthread - sets the completion variable before
>>> calling schedule() to voluntarily get itself off of the runqueue.
>>>
>>> This causes an interesting race in the hotplug path. takedown_cpu()
>>> invoked for CPU_X attempts to park the cpuhp/X hotplug kthread before
>>> running the stopper thread on CPU_X. kthread_park doesn't guarantee that
>>> cpuhp/X is off of X's runqueue, only that the thread has executed
>>> __kthread_parkme and set the completion. cpuhp/X may have been preempted
>>> out before calling schedule() to voluntarily sleep. takedown_cpu proceeds
>>> to run the stopper thread on CPU_X which promptly migrates off the
>>> still-on-rq cpuhp/X thread to another cpu CPU_Y, setting its affinity
>>> mask to something other than CPU_X alone.
>>>
>>> This is OK - cpuhp/X may finally get itself off of CPU_Y's runqueue at
>>> some later point. But if that doesn't happen (for example, if there's
>>> an RT thread on CPU_Y), the kthread_unpark in a subsequent cpu_up call
>>> for CPU_X will race with the still-on-rq condition. Even then we're
>>> functionally OK, because there is a wait_task_inactive() in the
>>> kthread_unpark() path, BUT the following happens:
>>>
>>> [   12.472745] BUG: scheduling while atomic: swapper/7/0/0x00000002
> 
> Thats not the worst problem. We could simply enable preemption there, but
> the real issue is that this is the idle task of the upcoming CPU which is
> not supposed to schedule in the first place.
> 
> So no, your 'fix' is just papering over the underlying issue.
> 
> And yes, the moron who did not think about wait_task_inactive() being
> called via kthread_unpark() -> kthread_bind() is me.
> 
> I'm testing a proper fix for it right now. Will post later.

Thanks. It did seem totally wrong to have any sort of scheduling in the
idle thread, as the subsequent warnings indicate, but I didn't feel
confident enough to mess around with the hotplug state machine.
> 
> Thanks,
> 
> 	tglx
> 

Thanks,
Vikram
