* workqueue: WARN at kernel/workqueue.c:2176
@ 2014-05-12 18:58 Sasha Levin
  2014-05-12 20:01 ` Tejun Heo
  0 siblings, 1 reply; 50+ messages in thread
From: Sasha Levin @ 2014-05-12 18:58 UTC (permalink / raw)
  To: LKML; +Cc: Tejun Heo, Dave Jones

Hi all,

While fuzzing with trinity inside a KVM tools guest running the latest -next
kernel I've stumbled on the following spew:

[ 1297.886670] WARNING: CPU: 0 PID: 190 at kernel/workqueue.c:2176 process_one_work+0xb5/0x6f0()
[ 1297.889216] Modules linked in:
[ 1297.890306] CPU: 0 PID: 190 Comm: kworker/3:0 Not tainted 3.15.0-rc5-next-20140512-sasha-00019-ga20bc00-dirty #456
[ 1297.893258]  0000000000000009 ffff88010c5d7ce8 ffffffffb153e1ec 0000000000000002
[ 1297.893258]  0000000000000000 ffff88010c5d7d28 ffffffffae15fd6c ffff88010cdd6c98
[ 1297.893258]  ffff8806285d4000 ffffffffb3cd09e0 ffff88010cdde000 0000000000000000
[ 1297.893258] Call Trace:
[ 1297.893258] dump_stack (lib/dump_stack.c:52)
[ 1297.893258] warn_slowpath_common (kernel/panic.c:430)
[ 1297.893258] warn_slowpath_null (kernel/panic.c:465)
[ 1297.893258] process_one_work (kernel/workqueue.c:2174 (discriminator 38))
[ 1297.893258] worker_thread (kernel/workqueue.c:2354)
[ 1297.893258] ? rescuer_thread (kernel/workqueue.c:2303)
[ 1297.893258] kthread (kernel/kthread.c:210)
[ 1297.893258] ? kthread_create_on_node (kernel/kthread.c:176)
[ 1297.893258] ret_from_fork (arch/x86/kernel/entry_64.S:553)
[ 1297.893258] ? kthread_create_on_node (kernel/kthread.c:176)


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: workqueue: WARN at kernel/workqueue.c:2176
  2014-05-12 18:58 workqueue: WARN at kernel/workqueue.c:2176 Sasha Levin
@ 2014-05-12 20:01 ` Tejun Heo
  2014-05-13  2:19   ` Lai Jiangshan
  0 siblings, 1 reply; 50+ messages in thread
From: Tejun Heo @ 2014-05-12 20:01 UTC (permalink / raw)
  To: Sasha Levin
  Cc: LKML, Dave Jones, Jason J. Herne, Peter Zijlstra, Lai Jiangshan,
	Ingo Molnar

On Mon, May 12, 2014 at 02:58:55PM -0400, Sasha Levin wrote:
> Hi all,
> 
> While fuzzing with trinity inside a KVM tools guest running the latest -next
> kernel I've stumbled on the following spew:
> 
> [ 1297.886670] WARNING: CPU: 0 PID: 190 at kernel/workqueue.c:2176 process_one_work+0xb5/0x6f0()
> [ 1297.889216] Modules linked in:
> [ 1297.890306] CPU: 0 PID: 190 Comm: kworker/3:0 Not tainted 3.15.0-rc5-next-20140512-sasha-00019-ga20bc00-dirty #456
> [ 1297.893258]  0000000000000009 ffff88010c5d7ce8 ffffffffb153e1ec 0000000000000002
> [ 1297.893258]  0000000000000000 ffff88010c5d7d28 ffffffffae15fd6c ffff88010cdd6c98
> [ 1297.893258]  ffff8806285d4000 ffffffffb3cd09e0 ffff88010cdde000 0000000000000000
> [ 1297.893258] Call Trace:
> [ 1297.893258] dump_stack (lib/dump_stack.c:52)
> [ 1297.893258] warn_slowpath_common (kernel/panic.c:430)
> [ 1297.893258] warn_slowpath_null (kernel/panic.c:465)
> [ 1297.893258] process_one_work (kernel/workqueue.c:2174 (discriminator 38))
> [ 1297.893258] worker_thread (kernel/workqueue.c:2354)
> [ 1297.893258] kthread (kernel/kthread.c:210)
> [ 1297.893258] ret_from_fork (arch/x86/kernel/entry_64.S:553)

Hmm, this is "percpu worker on the wrong CPU while the current
workqueue state indicates it should be on the CPU it's bound to"
warning.  We had a similar and more reproducible report a couple
months back.

  http://lkml.kernel.org/g/52F4F01C.1070800@linux.vnet.ibm.com

We added some debug code back then and it looked like the worker was
setting the right cpus_allowed mask and the CPU was up, but it was still
ending up on the wrong CPU.  Peter was looking into it and, ooh, I
missed his last message and it fell through the cracks.  We probably
should follow up on that thread.
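For reference, the check that fires here is the CPU-binding sanity check near the top of process_one_work(); roughly (paraphrased from kernel/workqueue.c of that era — the exact line number shifts between versions):

```c
/*
 * A per-cpu kworker must be running on the CPU its pool is bound to,
 * unless the pool has been disassociated by CPU hotplug
 * (POOL_DISASSOCIATED).  Seeing a bound worker on the wrong CPU means
 * the scheduler migrated it while the workqueue code believed the
 * binding was intact.
 */
WARN_ON_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
	     raw_smp_processor_id() != pool->cpu);
```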

Thanks.

-- 
tejun


* Re: workqueue: WARN at kernel/workqueue.c:2176
  2014-05-13  2:19   ` Lai Jiangshan
@ 2014-05-13  2:17     ` Sasha Levin
  2014-05-14 16:52       ` Jason J. Herne
  0 siblings, 1 reply; 50+ messages in thread
From: Sasha Levin @ 2014-05-13  2:17 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Tejun Heo, LKML, Dave Jones, Jason J. Herne, Peter Zijlstra, Ingo Molnar

On 05/12/2014 10:19 PM, Lai Jiangshan wrote:
> On 05/13/2014 04:01 AM, Tejun Heo wrote:
>> On Mon, May 12, 2014 at 02:58:55PM -0400, Sasha Levin wrote:
>>> Hi all,
>>>
>>> While fuzzing with trinity inside a KVM tools guest running the latest -next
>>> kernel I've stumbled on the following spew:
>>>
>>> [ 1297.886670] WARNING: CPU: 0 PID: 190 at kernel/workqueue.c:2176 process_one_work+0xb5/0x6f0()
>>> [ 1297.889216] Modules linked in:
>>> [ 1297.890306] CPU: 0 PID: 190 Comm: kworker/3:0 Not tainted 3.15.0-rc5-next-20140512-sasha-00019-ga20bc00-dirty #456
>>> [ 1297.893258]  0000000000000009 ffff88010c5d7ce8 ffffffffb153e1ec 0000000000000002
>>> [ 1297.893258]  0000000000000000 ffff88010c5d7d28 ffffffffae15fd6c ffff88010cdd6c98
>>> [ 1297.893258]  ffff8806285d4000 ffffffffb3cd09e0 ffff88010cdde000 0000000000000000
>>> [ 1297.893258] Call Trace:
>>> [ 1297.893258] dump_stack (lib/dump_stack.c:52)
>>> [ 1297.893258] warn_slowpath_common (kernel/panic.c:430)
>>> [ 1297.893258] warn_slowpath_null (kernel/panic.c:465)
>>> [ 1297.893258] process_one_work (kernel/workqueue.c:2174 (discriminator 38))
>>> [ 1297.893258] worker_thread (kernel/workqueue.c:2354)
>>> [ 1297.893258] kthread (kernel/kthread.c:210)
>>> [ 1297.893258] ret_from_fork (arch/x86/kernel/entry_64.S:553)
> Hi,
> 
> I have been trying to address this bug,
> but I can't reproduce it. Is your testing arch x86?
> If yes, could you find out how to reproduce the bug?

Yup, it's a 64bit KVM guest.

I don't have an easy way to reproduce it as I only saw the bug once, but
it happened when I started stressing CPU hotplug paths by adding and removing
CPUs often. Maybe it has something to do with that?


Thanks,
Sasha



* Re: workqueue: WARN at kernel/workqueue.c:2176
  2014-05-12 20:01 ` Tejun Heo
@ 2014-05-13  2:19   ` Lai Jiangshan
  2014-05-13  2:17     ` Sasha Levin
  0 siblings, 1 reply; 50+ messages in thread
From: Lai Jiangshan @ 2014-05-13  2:19 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Tejun Heo, LKML, Dave Jones, Jason J. Herne, Peter Zijlstra, Ingo Molnar

On 05/13/2014 04:01 AM, Tejun Heo wrote:
> On Mon, May 12, 2014 at 02:58:55PM -0400, Sasha Levin wrote:
>> Hi all,
>>
>> While fuzzing with trinity inside a KVM tools guest running the latest -next
>> kernel I've stumbled on the following spew:
>>
>> [ 1297.886670] WARNING: CPU: 0 PID: 190 at kernel/workqueue.c:2176 process_one_work+0xb5/0x6f0()
>> [ 1297.889216] Modules linked in:
>> [ 1297.890306] CPU: 0 PID: 190 Comm: kworker/3:0 Not tainted 3.15.0-rc5-next-20140512-sasha-00019-ga20bc00-dirty #456
>> [ 1297.893258]  0000000000000009 ffff88010c5d7ce8 ffffffffb153e1ec 0000000000000002
>> [ 1297.893258]  0000000000000000 ffff88010c5d7d28 ffffffffae15fd6c ffff88010cdd6c98
>> [ 1297.893258]  ffff8806285d4000 ffffffffb3cd09e0 ffff88010cdde000 0000000000000000
>> [ 1297.893258] Call Trace:
>> [ 1297.893258] dump_stack (lib/dump_stack.c:52)
>> [ 1297.893258] warn_slowpath_common (kernel/panic.c:430)
>> [ 1297.893258] warn_slowpath_null (kernel/panic.c:465)
>> [ 1297.893258] process_one_work (kernel/workqueue.c:2174 (discriminator 38))
>> [ 1297.893258] worker_thread (kernel/workqueue.c:2354)
>> [ 1297.893258] kthread (kernel/kthread.c:210)
>> [ 1297.893258] ret_from_fork (arch/x86/kernel/entry_64.S:553)

Hi,

I have been trying to address this bug,
but I can't reproduce it. Is your testing arch x86?
If yes, could you find out how to reproduce the bug?

Thanks,
Lai

> 
> Hmm, this is "percpu worker on the wrong CPU while the current
> workqueue state indicates it should be on the CPU it's bound to"
> warning.  We had a similar and more reproducible report a couple
> months back.
> 
>   http://lkml.kernel.org/g/52F4F01C.1070800@linux.vnet.ibm.com
> 
> We added some debug code back then and it looked like the worker was
> setting the right cpus_allowed mask and the CPU was up, but it was still
> ending up on the wrong CPU.  Peter was looking into it and, ooh, I
> missed his last message and it fell through the cracks.  We probably
> should follow up on that thread.
> 
> Thanks.
> 



* Re: workqueue: WARN at kernel/workqueue.c:2176
  2014-05-13  2:17     ` Sasha Levin
@ 2014-05-14 16:52       ` Jason J. Herne
  2014-05-16  3:50         ` Lai Jiangshan
  0 siblings, 1 reply; 50+ messages in thread
From: Jason J. Herne @ 2014-05-14 16:52 UTC (permalink / raw)
  To: Sasha Levin, Lai Jiangshan
  Cc: Tejun Heo, LKML, Dave Jones, Peter Zijlstra, Ingo Molnar

On 05/12/2014 10:17 PM, Sasha Levin wrote:
> I don't have an easy way to reproduce it as I only saw the bug once, but
> it happened when I started stressing CPU hotplug paths by adding and removing
> CPUs often. Maybe it has something to do with that?

As per the original report 
(http://article.gmane.org/gmane.linux.kernel/1643027)
I am able to reproduce the problem.

The workload is (on S390 architecture):
2 processes onlining random cpus in a tight loop by using 'echo 1 >
/sys/bus/cpu.../online'
2 processes offlining random cpus in a tight loop by using 'echo 0 >
/sys/bus/cpu.../online'
Otherwise, fairly idle system. load average: 5.82, 6.27, 6.27

The machine has 10 processors.
The warning message sometimes hits within a few minutes of starting the
workload. Other times it takes several hours.
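A sketch of that workload as a script (the sysfs path and the 10-CPU range are assumptions taken from the description above; CPU 0 is skipped since it is often not hot-removable; DRY_RUN=1, the default, only prints the commands instead of touching sysfs):

```shell
#!/bin/bash
# Hotplug stress sketch: toggle random CPUs online/offline in a loop.
# DRY_RUN=1 (default) prints the commands; set DRY_RUN=0 and run as
# root to actually exercise the hotplug paths.
DRY_RUN=${DRY_RUN:-1}
NCPUS=10          # the reported machine has 10 processors
ITERS=${ITERS:-5} # bounded here; the report uses infinite tight loops

toggle() {  # toggle <cpu> <0|1>
    local path="/sys/devices/system/cpu/cpu$1/online"
    if [ "$DRY_RUN" = 1 ]; then
        echo "echo $2 > $path"
    else
        echo "$2" > "$path" 2>/dev/null
    fi
}

hotplug_loop() {  # hotplug_loop <0|1>: repeatedly toggle a random CPU
    local i
    for ((i = 0; i < ITERS; i++)); do
        toggle $((1 + RANDOM % (NCPUS - 1))) "$1"
    done
}

# The report runs 2 onlining and 2 offlining processes in tight loops;
# one of each is shown here.
hotplug_loop 1 &
hotplug_loop 0 &
wait
```

The randomized interleaving of online/offline writes is what widens the window for a worker to be bound while its CPU is mid-transition.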


-- Jason J. Herne (jjherne@linux.vnet.ibm.com)



* Re: workqueue: WARN at kernel/workqueue.c:2176
  2014-05-14 16:52       ` Jason J. Herne
@ 2014-05-16  3:50         ` Lai Jiangshan
  2014-05-16  9:35           ` Peter Zijlstra
                             ` (3 more replies)
  0 siblings, 4 replies; 50+ messages in thread
From: Lai Jiangshan @ 2014-05-16  3:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: jjherne, Sasha Levin, Tejun Heo, LKML, Dave Jones, Ingo Molnar,
	Thomas Gleixner, Steven Rostedt

On 05/15/2014 12:52 AM, Jason J. Herne wrote:
> On 05/12/2014 10:17 PM, Sasha Levin wrote:
>> I don't have an easy way to reproduce it as I only saw the bug once, but
>> it happened when I started stressing CPU hotplug paths by adding and removing
>> CPUs often. Maybe it has something to do with that?
> 
> As per the original report (http://article.gmane.org/gmane.linux.kernel/1643027)
> I am able to reproduce the problem.
> 
> The workload is (on S390 architecture):
> 2 processes onlining random cpus in a tight loop by using 'echo 1 >
> /sys/bus/cpu.../online'
> 2 processes offlining random cpus in a tight loop by using 'echo 0 >
> /sys/bus/cpu.../online'
> Otherwise, fairly idle system. load average: 5.82, 6.27, 6.27
> 
> The machine has 10 processors.
> The warning message sometimes hits within a few minutes of starting the
> workload. Other times it takes several hours.
> 
> 
> -- Jason J. Herne (jjherne@linux.vnet.ibm.com)
> 
> 


Hi, Peter and other scheduler Gurus:

When I was trying to test wq-VS-hotplug, I always hit a problem in the scheduler
with the following WARNING:

[   74.765519] WARNING: CPU: 1 PID: 13 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x2d/0x4b()
[   74.765520] Modules linked in: wq_hotplug(O) fuse cpufreq_ondemand ipv6 kvm_intel kvm uinput snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi e1000e snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer ptp iTCO_wdt iTCO_vendor_support lpc_ich snd mfd_core pps_core soundcore acpi_cpufreq i2c_i801 microcode wmi radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core
[   74.765545] CPU: 1 PID: 13 Comm: migration/1 Tainted: G           O  3.15.0-rc3+ #153
[   74.765546] Hardware name: LENOVO ThinkCentre M8200T/  , BIOS 5JKT51AUS 11/02/2010
[   74.765547]  000000000000007c ffff880236199c88 ffffffff814d7d2c 0000000000000000
[   74.765550]  0000000000000000 ffff880236199cc8 ffffffff8103add4 ffff880236199cb8
[   74.765552]  ffffffff81023e1b ffff8802361861c0 0000000000000001 ffff88023fd92b40
[   74.765555] Call Trace:
[   74.765559]  [<ffffffff814d7d2c>] dump_stack+0x51/0x75
[   74.765562]  [<ffffffff8103add4>] warn_slowpath_common+0x81/0x9b
[   74.765564]  [<ffffffff81023e1b>] ? native_smp_send_reschedule+0x2d/0x4b
[   74.765566]  [<ffffffff8103ae08>] warn_slowpath_null+0x1a/0x1c
[   74.765568]  [<ffffffff81023e1b>] native_smp_send_reschedule+0x2d/0x4b
[   74.765571]  [<ffffffff8105c2ea>] smp_send_reschedule+0xa/0xc
[   74.765574]  [<ffffffff8105fe46>] resched_task+0x5e/0x62
[   74.765576]  [<ffffffff81060238>] check_preempt_curr+0x43/0x77
[   74.765578]  [<ffffffff81060680>] __migrate_task+0xda/0x100
[   74.765580]  [<ffffffff810606a6>] ? __migrate_task+0x100/0x100
[   74.765582]  [<ffffffff810606c3>] migration_cpu_stop+0x1d/0x22
[   74.765585]  [<ffffffff810a33c6>] cpu_stopper_thread+0x84/0x116
[   74.765587]  [<ffffffff814d8642>] ? __schedule+0x559/0x581
[   74.765590]  [<ffffffff814dae3c>] ? _raw_spin_lock_irqsave+0x12/0x3c
[   74.765592]  [<ffffffff8105bd75>] ? __smpboot_create_thread+0x109/0x109
[   74.765594]  [<ffffffff8105bf46>] smpboot_thread_fn+0x1d1/0x1d6
[   74.765598]  [<ffffffff81056665>] kthread+0xad/0xb5
[   74.765600]  [<ffffffff810565b8>] ? kthread_freezable_should_stop+0x41/0x41
[   74.765603]  [<ffffffff814e0e2c>] ret_from_fork+0x7c/0xb0
[   74.765605]  [<ffffffff810565b8>] ? kthread_freezable_should_stop+0x41/0x41
[   74.765607] ---[ end trace 662efb362b4e8ed0 ]---

After debugging, I found that the hotplugged-in CPU is active but !online in this case.
The problem was introduced by 5fbd036b.
Some code assumes that any CPU in cpu_active_mask is also online, but 5fbd036b breaks
this assumption, so the code relying on it should be changed too.


Hi, Jason J. Herne and Sasha Levin

Thank you for testing wq-VS-hotplug.

The following patch is just a workaround. After it is applied, the above WARNING
is gone, but I can't hit the wq problem that you found.

You can use the following workaround patch to test wq-VS-hotplug again, or just
wait for the scheduler guys to give us a proper patch.
(An interesting thing: 5fbd036b also touches the s390 arch.)

Thanks,
Lai
---
diff --git a/kernel/cpu.c b/kernel/cpu.c
index a9e710e..253a129 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -726,9 +726,10 @@ void set_cpu_present(unsigned int cpu, bool present)
 
 void set_cpu_online(unsigned int cpu, bool online)
 {
-	if (online)
+	if (online) {
 		cpumask_set_cpu(cpu, to_cpumask(cpu_online_bits));
-	else
+		cpumask_set_cpu(cpu, to_cpumask(cpu_active_bits));
+	} else
 		cpumask_clear_cpu(cpu, to_cpumask(cpu_online_bits));
 }
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 268a45e..c1a712d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5043,7 +5043,6 @@ static int sched_cpu_active(struct notifier_block *nfb,
 				      unsigned long action, void *hcpu)
 {
 	switch (action & ~CPU_TASKS_FROZEN) {
-	case CPU_STARTING:
 	case CPU_DOWN_FAILED:
 		set_cpu_active((long)hcpu, true);
 		return NOTIFY_OK;


* Re: workqueue: WARN at kernel/workqueue.c:2176
  2014-05-16  3:50         ` Lai Jiangshan
@ 2014-05-16  9:35           ` Peter Zijlstra
  2014-05-16  9:56             ` Lai Jiangshan
  2014-05-16 10:15             ` Peter Zijlstra
  2014-05-16 11:57           ` Peter Zijlstra
                             ` (2 subsequent siblings)
  3 siblings, 2 replies; 50+ messages in thread
From: Peter Zijlstra @ 2014-05-16  9:35 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: jjherne, Sasha Levin, Tejun Heo, LKML, Dave Jones, Ingo Molnar,
	Thomas Gleixner, Steven Rostedt


On Fri, May 16, 2014 at 11:50:42AM +0800, Lai Jiangshan wrote:
> After debugging, I found that the hotplugged-in CPU is active but !online in this case.
> The problem was introduced by 5fbd036b.
> Some code assumes that any CPU in cpu_active_mask is also online, but 5fbd036b breaks
> this assumption, so the code relying on it should be changed too.

Good find, and yes it does that.

> The following patch is just a workaround. After it is applied, the above WARNING
> is gone, but I can't hit the wq problem that you found.

Seeing how the entirety of hotplug is basically duct tape and twigs, the
below isn't that bad.

> ---
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index a9e710e..253a129 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -726,9 +726,10 @@ void set_cpu_present(unsigned int cpu, bool present)
>  
>  void set_cpu_online(unsigned int cpu, bool online)
>  {
> -	if (online)
> +	if (online) {
>  		cpumask_set_cpu(cpu, to_cpumask(cpu_online_bits));
> -	else
> +		cpumask_set_cpu(cpu, to_cpumask(cpu_active_bits));
> +	} else
>  		cpumask_clear_cpu(cpu, to_cpumask(cpu_online_bits));
>  }
>  
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 268a45e..c1a712d 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5043,7 +5043,6 @@ static int sched_cpu_active(struct notifier_block *nfb,
>  				      unsigned long action, void *hcpu)
>  {
>  	switch (action & ~CPU_TASKS_FROZEN) {
> -	case CPU_STARTING:
>  	case CPU_DOWN_FAILED:
>  		set_cpu_active((long)hcpu, true);
>  		return NOTIFY_OK;



* Re: workqueue: WARN at kernel/workqueue.c:2176
  2014-05-16  9:35           ` Peter Zijlstra
@ 2014-05-16  9:56             ` Lai Jiangshan
  2014-05-16 10:29               ` Peter Zijlstra
  2014-05-16 10:15             ` Peter Zijlstra
  1 sibling, 1 reply; 50+ messages in thread
From: Lai Jiangshan @ 2014-05-16  9:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: jjherne, Sasha Levin, Tejun Heo, LKML, Dave Jones, Ingo Molnar,
	Thomas Gleixner, Steven Rostedt

On 05/16/2014 05:35 PM, Peter Zijlstra wrote:
> On Fri, May 16, 2014 at 11:50:42AM +0800, Lai Jiangshan wrote:
>> After debugging, I found that the hotplugged-in CPU is active but !online in this case.
>> The problem was introduced by 5fbd036b.
>> Some code assumes that any CPU in cpu_active_mask is also online, but 5fbd036b breaks
>> this assumption, so the code relying on it should be changed too.
> 
> Good find, and yes it does that.
> 
>> The following patch is just a workaround. After it is applied, the above WARNING
>> is gone, but I can't hit the wq problem that you found.
> 
> Seeing how the entirety of hotplug is basically duct tape and twigs, the
> below isn't that bad.

I think we need to find a more graceful solution...

> 
>> ---
>> diff --git a/kernel/cpu.c b/kernel/cpu.c
>> index a9e710e..253a129 100644
>> --- a/kernel/cpu.c
>> +++ b/kernel/cpu.c
>> @@ -726,9 +726,10 @@ void set_cpu_present(unsigned int cpu, bool present)
>>  
>>  void set_cpu_online(unsigned int cpu, bool online)
>>  {
>> -	if (online)
>> +	if (online) {
>>  		cpumask_set_cpu(cpu, to_cpumask(cpu_online_bits));
>> -	else
>> +		cpumask_set_cpu(cpu, to_cpumask(cpu_active_bits));
>> +	} else
>>  		cpumask_clear_cpu(cpu, to_cpumask(cpu_online_bits));
>>  }
>>  
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 268a45e..c1a712d 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -5043,7 +5043,6 @@ static int sched_cpu_active(struct notifier_block *nfb,
>>  				      unsigned long action, void *hcpu)
>>  {
>>  	switch (action & ~CPU_TASKS_FROZEN) {
>> -	case CPU_STARTING:
>>  	case CPU_DOWN_FAILED:
>>  		set_cpu_active((long)hcpu, true);
>>  		return NOTIFY_OK;



* Re: workqueue: WARN at kernel/workqueue.c:2176
  2014-05-16  9:35           ` Peter Zijlstra
  2014-05-16  9:56             ` Lai Jiangshan
@ 2014-05-16 10:15             ` Peter Zijlstra
  2014-05-16 10:16               ` Peter Zijlstra
  1 sibling, 1 reply; 50+ messages in thread
From: Peter Zijlstra @ 2014-05-16 10:15 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: jjherne, Sasha Levin, Tejun Heo, LKML, Dave Jones, Ingo Molnar,
	Thomas Gleixner, Steven Rostedt


On Fri, May 16, 2014 at 11:35:30AM +0200, Peter Zijlstra wrote:
> On Fri, May 16, 2014 at 11:50:42AM +0800, Lai Jiangshan wrote:
> > After debugging, I found that the hotplugged-in CPU is active but !online in this case.
> > The problem was introduced by 5fbd036b.
> > Some code assumes that any CPU in cpu_active_mask is also online, but 5fbd036b breaks
> > this assumption, so the code relying on it should be changed too.
> 
> Good find, and yes it does that.
> 
> > The following patch is just a workaround. After it is applied, the above WARNING
> > is gone, but I can't hit the wq problem that you found.
> 
> Seeing how the entirety of hotplug is basically duct tape and twigs, the
> below isn't that bad.


I turned that into the patch below; are you okay with it?

---
Subject: sched: Fix hotplug vs set_cpus_allowed_ptr()
From: Lai Jiangshan <laijs@cn.fujitsu.com>
Date: Fri, 16 May 2014 11:50:42 +0800

Lai found that:

  WARNING: CPU: 1 PID: 13 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x2d/0x4b()
  ...
  migration_cpu_stop+0x1d/0x22

was caused by set_cpus_allowed_ptr() assuming that cpu_active_mask is
always a sub-set of cpu_online_mask.

This isn't true since 5fbd036b552f ("sched: Cleanup cpu_active
madness").

So set active and online at the same time to avoid this particular
problem.

Fixes: 5fbd036b552f ("sched: Cleanup cpu_active madness")
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/53758B12.8060609@cn.fujitsu.com
---
 kernel/cpu.c        |    6 ++++--
 kernel/sched/core.c |    1 -
 2 files changed, 4 insertions(+), 3 deletions(-)

--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -726,10 +726,12 @@ void set_cpu_present(unsigned int cpu, b
 
 void set_cpu_online(unsigned int cpu, bool online)
 {
-	if (online)
+	if (online) {
 		cpumask_set_cpu(cpu, to_cpumask(cpu_online_bits));
-	else
+		cpumask_set_cpu(cpu, to_cpumask(cpu_active_bits));
+	} else {
 		cpumask_clear_cpu(cpu, to_cpumask(cpu_online_bits));
+	}
 }
 
 void set_cpu_active(unsigned int cpu, bool active)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5126,7 +5126,6 @@ static int sched_cpu_active(struct notif
 				      unsigned long action, void *hcpu)
 {
 	switch (action & ~CPU_TASKS_FROZEN) {
-	case CPU_STARTING:
 	case CPU_DOWN_FAILED:
 		set_cpu_active((long)hcpu, true);
 		return NOTIFY_OK;



* Re: workqueue: WARN at kernel/workqueue.c:2176
  2014-05-16 10:15             ` Peter Zijlstra
@ 2014-05-16 10:16               ` Peter Zijlstra
  2014-05-16 10:39                 ` Peter Zijlstra
  0 siblings, 1 reply; 50+ messages in thread
From: Peter Zijlstra @ 2014-05-16 10:16 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: jjherne, Sasha Levin, Tejun Heo, LKML, Dave Jones, Ingo Molnar,
	Thomas Gleixner, Steven Rostedt


On Fri, May 16, 2014 at 12:15:05PM +0200, Peter Zijlstra wrote:
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5126,7 +5126,6 @@ static int sched_cpu_active(struct notif
>  				      unsigned long action, void *hcpu)
>  {
>  	switch (action & ~CPU_TASKS_FROZEN) {
> -	case CPU_STARTING:
>  	case CPU_DOWN_FAILED:
>  		set_cpu_active((long)hcpu, true);
>  		return NOTIFY_OK;

In fact, I dropped this part as it conflicts with another patch I have,
and setting it again isn't a big problem.

  http://lkml.kernel.org/r/1399574859-11714-1-git-send-email-minyard@acm.org




* Re: workqueue: WARN at kernel/workqueue.c:2176
  2014-05-16  9:56             ` Lai Jiangshan
@ 2014-05-16 10:29               ` Peter Zijlstra
  0 siblings, 0 replies; 50+ messages in thread
From: Peter Zijlstra @ 2014-05-16 10:29 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: jjherne, Sasha Levin, Tejun Heo, LKML, Dave Jones, Ingo Molnar,
	Thomas Gleixner, Steven Rostedt


On Fri, May 16, 2014 at 05:56:10PM +0800, Lai Jiangshan wrote:
> On 05/16/2014 05:35 PM, Peter Zijlstra wrote:
> > On Fri, May 16, 2014 at 11:50:42AM +0800, Lai Jiangshan wrote:
> >> After debugging, I found that the hotplugged-in CPU is active but !online in this case.
> >> The problem was introduced by 5fbd036b.
> >> Some code assumes that any CPU in cpu_active_mask is also online, but 5fbd036b breaks
> >> this assumption, so the code relying on it should be changed too.
> > 
> > Good find, and yes it does that.
> > 
> >> The following patch is just a workaround. After it is applied, the above WARNING
> >> is gone, but I can't hit the wq problem that you found.
> > 
> > Seeing how the entirety of hotplug is basically duct tape and twigs, the
> > below isn't that bad.
> 
> I think we need to find a more graceful solution...

A complete rewrite of the hotplug mess; until that time this is fine.





* Re: workqueue: WARN at kernel/workqueue.c:2176
  2014-05-16 10:16               ` Peter Zijlstra
@ 2014-05-16 10:39                 ` Peter Zijlstra
  0 siblings, 0 replies; 50+ messages in thread
From: Peter Zijlstra @ 2014-05-16 10:39 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: jjherne, Sasha Levin, Tejun Heo, LKML, Dave Jones, Ingo Molnar,
	Thomas Gleixner, Steven Rostedt


On Fri, May 16, 2014 at 12:16:43PM +0200, Peter Zijlstra wrote:
> On Fri, May 16, 2014 at 12:15:05PM +0200, Peter Zijlstra wrote:
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -5126,7 +5126,6 @@ static int sched_cpu_active(struct notif
> >  				      unsigned long action, void *hcpu)
> >  {
> >  	switch (action & ~CPU_TASKS_FROZEN) {
> > -	case CPU_STARTING:
> >  	case CPU_DOWN_FAILED:
> >  		set_cpu_active((long)hcpu, true);
> >  		return NOTIFY_OK;
> 
> In fact, I dropped this part as it conflicts with another patch I have,
> and setting it again isn't a big problem.

Gurgl, I'm an idiot, this doesn't re-set it, it sets it earlier. Let me
go fix the conflict properly then..



* Re: workqueue: WARN at kernel/workqueue.c:2176
  2014-05-16  3:50         ` Lai Jiangshan
  2014-05-16  9:35           ` Peter Zijlstra
@ 2014-05-16 11:57           ` Peter Zijlstra
  2014-05-16 12:08             ` Tejun Heo
  2014-05-16 16:18             ` Lai Jiangshan
  2014-05-19 13:07           ` [tip:sched/core] sched: Fix hotplug vs set_cpus_allowed_ptr() tip-bot for Lai Jiangshan
  2014-05-22 12:26           ` [tip:sched/core] sched: Fix hotplug vs. set_cpus_allowed_ptr() tip-bot for Lai Jiangshan
  3 siblings, 2 replies; 50+ messages in thread
From: Peter Zijlstra @ 2014-05-16 11:57 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: jjherne, Sasha Levin, Tejun Heo, LKML, Dave Jones, Ingo Molnar,
	Thomas Gleixner, Steven Rostedt


On Fri, May 16, 2014 at 11:50:42AM +0800, Lai Jiangshan wrote:
> Hi, Peter and other scheduler Gurus:
> 
> When I was trying to test wq-VS-hotplug, I always hit a problem in the scheduler
> with the following WARNING:
> 
> [   74.765519] WARNING: CPU: 1 PID: 13 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x2d/0x4b()
> [   74.765520] Modules linked in: wq_hotplug(O) fuse cpufreq_ondemand ipv6 kvm_intel kvm uinput snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi e1000e snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer ptp iTCO_wdt iTCO_vendor_support lpc_ich snd mfd_core pps_core soundcore acpi_cpufreq i2c_i801 microcode wmi radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core
> [   74.765545] CPU: 1 PID: 13 Comm: migration/1 Tainted: G           O  3.15.0-rc3+ #153
> [   74.765546] Hardware name: LENOVO ThinkCentre M8200T/  , BIOS 5JKT51AUS 11/02/2010
> [   74.765547]  000000000000007c ffff880236199c88 ffffffff814d7d2c 0000000000000000
> [   74.765550]  0000000000000000 ffff880236199cc8 ffffffff8103add4 ffff880236199cb8
> [   74.765552]  ffffffff81023e1b ffff8802361861c0 0000000000000001 ffff88023fd92b40
> [   74.765555] Call Trace:
> [   74.765559]  [<ffffffff814d7d2c>] dump_stack+0x51/0x75
> [   74.765562]  [<ffffffff8103add4>] warn_slowpath_common+0x81/0x9b
> [   74.765564]  [<ffffffff81023e1b>] ? native_smp_send_reschedule+0x2d/0x4b
> [   74.765566]  [<ffffffff8103ae08>] warn_slowpath_null+0x1a/0x1c
> [   74.765568]  [<ffffffff81023e1b>] native_smp_send_reschedule+0x2d/0x4b
> [   74.765571]  [<ffffffff8105c2ea>] smp_send_reschedule+0xa/0xc
> [   74.765574]  [<ffffffff8105fe46>] resched_task+0x5e/0x62
> [   74.765576]  [<ffffffff81060238>] check_preempt_curr+0x43/0x77
> [   74.765578]  [<ffffffff81060680>] __migrate_task+0xda/0x100
> [   74.765580]  [<ffffffff810606a6>] ? __migrate_task+0x100/0x100
> [   74.765582]  [<ffffffff810606c3>] migration_cpu_stop+0x1d/0x22
> [   74.765585]  [<ffffffff810a33c6>] cpu_stopper_thread+0x84/0x116
> [   74.765587]  [<ffffffff814d8642>] ? __schedule+0x559/0x581
> [   74.765590]  [<ffffffff814dae3c>] ? _raw_spin_lock_irqsave+0x12/0x3c
> [   74.765592]  [<ffffffff8105bd75>] ? __smpboot_create_thread+0x109/0x109
> [   74.765594]  [<ffffffff8105bf46>] smpboot_thread_fn+0x1d1/0x1d6
> [   74.765598]  [<ffffffff81056665>] kthread+0xad/0xb5
> [   74.765600]  [<ffffffff810565b8>] ? kthread_freezable_should_stop+0x41/0x41
> [   74.765603]  [<ffffffff814e0e2c>] ret_from_fork+0x7c/0xb0
> [   74.765605]  [<ffffffff810565b8>] ? kthread_freezable_should_stop+0x41/0x41
> [   74.765607] ---[ end trace 662efb362b4e8ed0 ]---
> 
> After debugging, I found that the hotplugged-in CPU is active but !online in this case.
> The problem was introduced by 5fbd036b.
> Some code assumes that any CPU in cpu_active_mask is also online, but 5fbd036b breaks
> this assumption, so the code relying on it should be changed too.
> 

This of course leaves the question of how the workqueue code manages to
call set_cpus_allowed_ptr() on a CPU _before_ it's online.

That too sounds fishy: with the proposed patch the
set_cpus_allowed_ptr() will 'gracefully' fail, but calling it in the
first place is of course dubious too.



* Re: workqueue: WARN at kernel/workqueue.c:2176
  2014-05-16 11:57           ` Peter Zijlstra
@ 2014-05-16 12:08             ` Tejun Heo
  2014-05-16 12:14               ` Thomas Gleixner
  2014-05-16 16:18             ` Lai Jiangshan
  1 sibling, 1 reply; 50+ messages in thread
From: Tejun Heo @ 2014-05-16 12:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Lai Jiangshan, jjherne, Sasha Levin, LKML, Dave Jones,
	Ingo Molnar, Thomas Gleixner, Steven Rostedt

Hello, Peter.

On Fri, May 16, 2014 at 01:57:37PM +0200, Peter Zijlstra wrote:
> This of course leaves the question of how the workqueue code manages to
> call set_cpus_allowed_ptr() on a CPU _before_ it's online.
> 
> That too sounds fishy: with the proposed patch the
> set_cpus_allowed_ptr() will 'gracefully' fail, but calling it in the
> first place is of course dubious too.

Right after being created, a workqueue worker invokes
set_cpus_allowed_ptr() with the target cpumask, without checking whether
the CPU[s] are online, and the call is allowed to fail.  The guarantee
is that the worker is already registered by that point, and if a
CPU comes online after the registration, the CPU_ONLINE notification will
update the cpumask accordingly, so either way the worker is guaranteed
to end up on the right cpumask.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: workqueue: WARN at at kernel/workqueue.c:2176
  2014-05-16 12:08             ` Tejun Heo
@ 2014-05-16 12:14               ` Thomas Gleixner
  2014-05-16 12:16                 ` Tejun Heo
  0 siblings, 1 reply; 50+ messages in thread
From: Thomas Gleixner @ 2014-05-16 12:14 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Peter Zijlstra, Lai Jiangshan, jjherne, Sasha Levin, LKML,
	Dave Jones, Ingo Molnar, Steven Rostedt

On Fri, 16 May 2014, Tejun Heo wrote:
> On Fri, May 16, 2014 at 01:57:37PM +0200, Peter Zijlstra wrote:
> > This of course leaves the question of how the workqueue code manages to
> > call set_cpus_allowed_ptr() on a cpu _before_ it's online.
> > 
> > That too sounds fishy.. with the proposed patch the
> > set_cpus_allowed_ptr() will 'gracefully' fail, but calling it in the
> > first place is of course dubious too.
> 
> Right after being created, a workqueue worker invokes
> set_cpus_allowed_ptr() to the target cpumask without checking whether
> the cpu[s] are online or not and it's allowed to fail.  The guarantee
> there is that the worker is already registered by that point and if a
> CPU comes online after the registration, CPU_ONLINE notification will
> update the cpumask accordingly, so either way the worker is guaranteed
> to be on the right cpumask.

That's what the kthread_create_on_cpu/kthread_park/kthread_unpark
infrastructure is for.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: workqueue: WARN at at kernel/workqueue.c:2176
  2014-05-16 12:14               ` Thomas Gleixner
@ 2014-05-16 12:16                 ` Tejun Heo
  0 siblings, 0 replies; 50+ messages in thread
From: Tejun Heo @ 2014-05-16 12:16 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Lai Jiangshan, jjherne, Sasha Levin, LKML,
	Dave Jones, Ingo Molnar, Steven Rostedt

On Fri, May 16, 2014 at 02:14:05PM +0200, Thomas Gleixner wrote:
> On Fri, 16 May 2014, Tejun Heo wrote:
> > On Fri, May 16, 2014 at 01:57:37PM +0200, Peter Zijlstra wrote:
> > > This of course leaves the question of how the workqueue code manages to
> > > call set_cpus_allowed_ptr() on a cpu _before_ it's online.
> > > 
> > > That too sounds fishy.. with the proposed patch the
> > > set_cpus_allowed_ptr() will 'gracefully' fail, but calling it in the
> > > first place is of course dubious too.
> > 
> > Right after being created, a workqueue worker invokes
> > set_cpus_allowed_ptr() to the target cpumask without checking whether
> > the cpu[s] are online or not and it's allowed to fail.  The guarantee
> > there is that the worker is already registered by that point and if a
> > CPU comes online after the registration, CPU_ONLINE notification will
> > update the cpumask accordingly, so either way the worker is guaranteed
> > to be on the right cpumask.
> 
> That's what the kthread_create_on_cpu/kthread_park/kthread_unpark
> infrastructure is for.

They aren't necessarily on one CPU.  Unbound workqueues can have
arbitrary cpumasks associated with them.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: workqueue: WARN at at kernel/workqueue.c:2176
  2014-05-16 11:57           ` Peter Zijlstra
  2014-05-16 12:08             ` Tejun Heo
@ 2014-05-16 16:18             ` Lai Jiangshan
  2014-05-16 16:29               ` Peter Zijlstra
  1 sibling, 1 reply; 50+ messages in thread
From: Lai Jiangshan @ 2014-05-16 16:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: jjherne, Sasha Levin, Tejun Heo, LKML, Dave Jones, Ingo Molnar,
	Thomas Gleixner, Steven Rostedt

On Fri, May 16, 2014 at 7:57 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, May 16, 2014 at 11:50:42AM +0800, Lai Jiangshan wrote:
>> Hi, Peter and other scheduler Gurus:
>>
>> When I was trying to test wq-VS-hotplug, I always hit a problem in scheduler
>> with the following WARNING:
>>
>> [   74.765519] WARNING: CPU: 1 PID: 13 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x2d/0x4b()
>> [   74.765520] Modules linked in: wq_hotplug(O) fuse cpufreq_ondemand ipv6 kvm_intel kvm uinput snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi e1000e snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer ptp iTCO_wdt iTCO_vendor_support lpc_ich snd mfd_core pps_core soundcore acpi_cpufreq i2c_i801 microcode wmi radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core
>> [   74.765545] CPU: 1 PID: 13 Comm: migration/1 Tainted: G           O  3.15.0-rc3+ #153
>> [   74.765546] Hardware name: LENOVO ThinkCentre M8200T/  , BIOS 5JKT51AUS 11/02/2010
>> [   74.765547]  000000000000007c ffff880236199c88 ffffffff814d7d2c 0000000000000000
>> [   74.765550]  0000000000000000 ffff880236199cc8 ffffffff8103add4 ffff880236199cb8
>> [   74.765552]  ffffffff81023e1b ffff8802361861c0 0000000000000001 ffff88023fd92b40
>> [   74.765555] Call Trace:
>> [   74.765559]  [<ffffffff814d7d2c>] dump_stack+0x51/0x75
>> [   74.765562]  [<ffffffff8103add4>] warn_slowpath_common+0x81/0x9b
>> [   74.765564]  [<ffffffff81023e1b>] ? native_smp_send_reschedule+0x2d/0x4b
>> [   74.765566]  [<ffffffff8103ae08>] warn_slowpath_null+0x1a/0x1c
>> [   74.765568]  [<ffffffff81023e1b>] native_smp_send_reschedule+0x2d/0x4b
>> [   74.765571]  [<ffffffff8105c2ea>] smp_send_reschedule+0xa/0xc
>> [   74.765574]  [<ffffffff8105fe46>] resched_task+0x5e/0x62
>> [   74.765576]  [<ffffffff81060238>] check_preempt_curr+0x43/0x77
>> [   74.765578]  [<ffffffff81060680>] __migrate_task+0xda/0x100
>> [   74.765580]  [<ffffffff810606a6>] ? __migrate_task+0x100/0x100
>> [   74.765582]  [<ffffffff810606c3>] migration_cpu_stop+0x1d/0x22
>> [   74.765585]  [<ffffffff810a33c6>] cpu_stopper_thread+0x84/0x116
>> [   74.765587]  [<ffffffff814d8642>] ? __schedule+0x559/0x581
>> [   74.765590]  [<ffffffff814dae3c>] ? _raw_spin_lock_irqsave+0x12/0x3c
>> [   74.765592]  [<ffffffff8105bd75>] ? __smpboot_create_thread+0x109/0x109
>> [   74.765594]  [<ffffffff8105bf46>] smpboot_thread_fn+0x1d1/0x1d6
>> [   74.765598]  [<ffffffff81056665>] kthread+0xad/0xb5
>> [   74.765600]  [<ffffffff810565b8>] ? kthread_freezable_should_stop+0x41/0x41
>> [   74.765603]  [<ffffffff814e0e2c>] ret_from_fork+0x7c/0xb0
>> [   74.765605]  [<ffffffff810565b8>] ? kthread_freezable_should_stop+0x41/0x41
>> [   74.765607] ---[ end trace 662efb362b4e8ed0 ]---
>>
>> After debugging, I found the hotplug-in CPU is active but !online in this case.
>> The problem was introduced by commit 5fbd036b.
>> Some code assumes that any CPU in cpu_active_mask is also online, but 5fbd036b breaks
>> this assumption, so the code relying on it needs to change as well.
>>
>
> This of course leaves the question of how the workqueue code manages to
> call set_cpus_allowed_ptr() on a cpu _before_ it's online.

create_worker().

>
> That too sounds fishy.. with the proposed patch the
> set_cpus_allowed_ptr() will 'gracefully' fail, but calling it in the
> first place is of course dubious too.

Not every set_cpus_allowed_ptr() call in the kernel is protected by
get_online_cpus(); the unprotected callers can also hit this problem
when they race with hotplug.

So the scheduler side (set_cpus_allowed_ptr()/cpu_active_mask) should be
the first place to fix.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: workqueue: WARN at at kernel/workqueue.c:2176
  2014-05-16 16:18             ` Lai Jiangshan
@ 2014-05-16 16:29               ` Peter Zijlstra
  2014-05-27 14:18                 ` Jason J. Herne
  0 siblings, 1 reply; 50+ messages in thread
From: Peter Zijlstra @ 2014-05-16 16:29 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: jjherne, Sasha Levin, Tejun Heo, LKML, Dave Jones, Ingo Molnar,
	Thomas Gleixner, Steven Rostedt

[-- Attachment #1: Type: text/plain, Size: 365 bytes --]

On Sat, May 17, 2014 at 12:18:06AM +0800, Lai Jiangshan wrote:
> so the scheduler/set_cpus_allowed_ptr()/cpu_active_mask should be the first
> place to fix.

I'm not arguing about that, not to mention that this is userspace
exposed and nobody protects that.

But I was expecting kernel stuff that calls it on hotplug to be
serialized thusly, but apparently not so.

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [tip:sched/core] sched: Fix hotplug vs set_cpus_allowed_ptr()
  2014-05-16  3:50         ` Lai Jiangshan
  2014-05-16  9:35           ` Peter Zijlstra
  2014-05-16 11:57           ` Peter Zijlstra
@ 2014-05-19 13:07           ` tip-bot for Lai Jiangshan
  2014-05-22 12:26           ` [tip:sched/core] sched: Fix hotplug vs. set_cpus_allowed_ptr() tip-bot for Lai Jiangshan
  3 siblings, 0 replies; 50+ messages in thread
From: tip-bot for Lai Jiangshan @ 2014-05-19 13:07 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, peterz, tglx, laijs

Commit-ID:  b961b5eea6f90e269bfdec7cd0dac98b008ae817
Gitweb:     http://git.kernel.org/tip/b961b5eea6f90e269bfdec7cd0dac98b008ae817
Author:     Lai Jiangshan <laijs@cn.fujitsu.com>
AuthorDate: Fri, 16 May 2014 11:50:42 +0800
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Mon, 19 May 2014 21:47:34 +0900

sched: Fix hotplug vs set_cpus_allowed_ptr()

Lai found that:

  WARNING: CPU: 1 PID: 13 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x2d/0x4b()
  ...
  migration_cpu_stop+0x1d/0x22

was caused by set_cpus_allowed_ptr() assuming that cpu_active_mask is
always a sub-set of cpu_online_mask.

This isn't true since 5fbd036b552f ("sched: Cleanup cpu_active
madness").

So set active and online at the same time to avoid this particular
problem.

Fixes: 5fbd036b552f ("sched: Cleanup cpu_active madness")
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/53758B12.8060609@cn.fujitsu.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/cpu.c        | 6 ++++--
 kernel/sched/core.c | 1 -
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index a9e710e..247979a 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -726,10 +726,12 @@ void set_cpu_present(unsigned int cpu, bool present)
 
 void set_cpu_online(unsigned int cpu, bool online)
 {
-	if (online)
+	if (online) {
 		cpumask_set_cpu(cpu, to_cpumask(cpu_online_bits));
-	else
+		cpumask_set_cpu(cpu, to_cpumask(cpu_active_bits));
+	} else {
 		cpumask_clear_cpu(cpu, to_cpumask(cpu_online_bits));
+	}
 }
 
 void set_cpu_active(unsigned int cpu, bool active)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 44e00ab..86f3890 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5076,7 +5076,6 @@ static int sched_cpu_active(struct notifier_block *nfb,
 				      unsigned long action, void *hcpu)
 {
 	switch (action & ~CPU_TASKS_FROZEN) {
-	case CPU_STARTING:
 	case CPU_DOWN_FAILED:
 		set_cpu_active((long)hcpu, true);
 		return NOTIFY_OK;

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [tip:sched/core] sched: Fix hotplug vs. set_cpus_allowed_ptr()
  2014-05-16  3:50         ` Lai Jiangshan
                             ` (2 preceding siblings ...)
  2014-05-19 13:07           ` [tip:sched/core] sched: Fix hotplug vs set_cpus_allowed_ptr() tip-bot for Lai Jiangshan
@ 2014-05-22 12:26           ` tip-bot for Lai Jiangshan
  2014-05-22 22:02             ` Srivatsa S. Bhat
  3 siblings, 1 reply; 50+ messages in thread
From: tip-bot for Lai Jiangshan @ 2014-05-22 12:26 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, peterz, srivatsa.bhat,
	rafael.j.wysocki, toshi.kani, ego, akpm, wangyun, tglx, laijs,
	paul.gortmaker

Commit-ID:  6acbfb96976fc3350e30d964acb1dbbdf876d55e
Gitweb:     http://git.kernel.org/tip/6acbfb96976fc3350e30d964acb1dbbdf876d55e
Author:     Lai Jiangshan <laijs@cn.fujitsu.com>
AuthorDate: Fri, 16 May 2014 11:50:42 +0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 22 May 2014 10:21:31 +0200

sched: Fix hotplug vs. set_cpus_allowed_ptr()

Lai found that:

  WARNING: CPU: 1 PID: 13 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x2d/0x4b()
  ...
  migration_cpu_stop+0x1d/0x22

was caused by set_cpus_allowed_ptr() assuming that cpu_active_mask is
always a sub-set of cpu_online_mask.

This isn't true since 5fbd036b552f ("sched: Cleanup cpu_active madness").

So set active and online at the same time to avoid this particular
problem.

Fixes: 5fbd036b552f ("sched: Cleanup cpu_active madness")
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael wang <wangyun@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/53758B12.8060609@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/cpu.c        | 6 ++++--
 kernel/sched/core.c | 1 -
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index a9e710e..247979a 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -726,10 +726,12 @@ void set_cpu_present(unsigned int cpu, bool present)
 
 void set_cpu_online(unsigned int cpu, bool online)
 {
-	if (online)
+	if (online) {
 		cpumask_set_cpu(cpu, to_cpumask(cpu_online_bits));
-	else
+		cpumask_set_cpu(cpu, to_cpumask(cpu_active_bits));
+	} else {
 		cpumask_clear_cpu(cpu, to_cpumask(cpu_online_bits));
+	}
 }
 
 void set_cpu_active(unsigned int cpu, bool active)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 44e00ab..86f3890 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5076,7 +5076,6 @@ static int sched_cpu_active(struct notifier_block *nfb,
 				      unsigned long action, void *hcpu)
 {
 	switch (action & ~CPU_TASKS_FROZEN) {
-	case CPU_STARTING:
 	case CPU_DOWN_FAILED:
 		set_cpu_active((long)hcpu, true);
 		return NOTIFY_OK;

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [tip:sched/core] sched: Fix hotplug vs. set_cpus_allowed_ptr()
  2014-05-22 12:26           ` [tip:sched/core] sched: Fix hotplug vs. set_cpus_allowed_ptr() tip-bot for Lai Jiangshan
@ 2014-05-22 22:02             ` Srivatsa S. Bhat
  0 siblings, 0 replies; 50+ messages in thread
From: Srivatsa S. Bhat @ 2014-05-22 22:02 UTC (permalink / raw)
  To: mingo, hpa, linux-kernel, torvalds, peterz, rafael.j.wysocki,
	srivatsa.bhat, toshi.kani, ego, akpm, wangyun, tglx, laijs,
	paul.gortmaker
  Cc: tip-bot for Lai Jiangshan, linux-tip-commits

On 05/22/2014 05:56 PM, tip-bot for Lai Jiangshan wrote:
> Commit-ID:  6acbfb96976fc3350e30d964acb1dbbdf876d55e
> Gitweb:     http://git.kernel.org/tip/6acbfb96976fc3350e30d964acb1dbbdf876d55e
> Author:     Lai Jiangshan <laijs@cn.fujitsu.com>
> AuthorDate: Fri, 16 May 2014 11:50:42 +0800
> Committer:  Ingo Molnar <mingo@kernel.org>
> CommitDate: Thu, 22 May 2014 10:21:31 +0200
> 
> sched: Fix hotplug vs. set_cpus_allowed_ptr()
> 
> Lai found that:
> 
>   WARNING: CPU: 1 PID: 13 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x2d/0x4b()
>   ...
>   migration_cpu_stop+0x1d/0x22
> 
> was caused by set_cpus_allowed_ptr() assuming that cpu_active_mask is
> always a sub-set of cpu_online_mask.
> 
> This isn't true since 5fbd036b552f ("sched: Cleanup cpu_active madness").
> 
> So set active and online at the same time to avoid this particular
> problem.
> 
> Fixes: 5fbd036b552f ("sched: Cleanup cpu_active madness")
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Michael wang <wangyun@linux.vnet.ibm.com>
> Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> Cc: Toshi Kani <toshi.kani@hp.com>
> Link: http://lkml.kernel.org/r/53758B12.8060609@cn.fujitsu.com
> Signed-off-by: Ingo Molnar <mingo@kernel.org>

Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

Regards,
Srivatsa S. Bhat

> ---
>  kernel/cpu.c        | 6 ++++--
>  kernel/sched/core.c | 1 -
>  2 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index a9e710e..247979a 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -726,10 +726,12 @@ void set_cpu_present(unsigned int cpu, bool present)
> 
>  void set_cpu_online(unsigned int cpu, bool online)
>  {
> -	if (online)
> +	if (online) {
>  		cpumask_set_cpu(cpu, to_cpumask(cpu_online_bits));
> -	else
> +		cpumask_set_cpu(cpu, to_cpumask(cpu_active_bits));
> +	} else {
>  		cpumask_clear_cpu(cpu, to_cpumask(cpu_online_bits));
> +	}
>  }
> 
>  void set_cpu_active(unsigned int cpu, bool active)
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 44e00ab..86f3890 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5076,7 +5076,6 @@ static int sched_cpu_active(struct notifier_block *nfb,
>  				      unsigned long action, void *hcpu)
>  {
>  	switch (action & ~CPU_TASKS_FROZEN) {
> -	case CPU_STARTING:
>  	case CPU_DOWN_FAILED:
>  		set_cpu_active((long)hcpu, true);
>  		return NOTIFY_OK;
> 


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: workqueue: WARN at at kernel/workqueue.c:2176
  2014-05-16 16:29               ` Peter Zijlstra
@ 2014-05-27 14:18                 ` Jason J. Herne
  2014-05-27 14:26                   ` Peter Zijlstra
  0 siblings, 1 reply; 50+ messages in thread
From: Jason J. Herne @ 2014-05-27 14:18 UTC (permalink / raw)
  To: Peter Zijlstra, Lai Jiangshan
  Cc: Sasha Levin, Tejun Heo, LKML, Dave Jones, Ingo Molnar,
	Thomas Gleixner, Steven Rostedt

On 05/16/2014 12:29 PM, Peter Zijlstra wrote:
> On Sat, May 17, 2014 at 12:18:06AM +0800, Lai Jiangshan wrote:
>> so the scheduler/set_cpus_allowed_ptr()/cpu_active_mask should be the first
>> place to fix.
>
> I'm not arguing about that, not to mention that this is userspace
> exposed and nobody protects that.
>
> But I was expecting kernel stuff that calls it on hotplug to be
> serialized thusly, but apparently not so.
>

Was a final patch posted for this issue? The discussion made it sound 
like there were still a few things to figure out before we could resolve 
this bug. I can recreate this as needed and I'm happy to test any patches.

-- 
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: workqueue: WARN at at kernel/workqueue.c:2176
  2014-05-27 14:18                 ` Jason J. Herne
@ 2014-05-27 14:26                   ` Peter Zijlstra
  2014-05-29 16:23                     ` Jason J. Herne
  0 siblings, 1 reply; 50+ messages in thread
From: Peter Zijlstra @ 2014-05-27 14:26 UTC (permalink / raw)
  To: Jason J. Herne
  Cc: Lai Jiangshan, Sasha Levin, Tejun Heo, LKML, Dave Jones,
	Ingo Molnar, Thomas Gleixner, Steven Rostedt

On Tue, May 27, 2014 at 10:18:31AM -0400, Jason J. Herne wrote:
> On 05/16/2014 12:29 PM, Peter Zijlstra wrote:
> >On Sat, May 17, 2014 at 12:18:06AM +0800, Lai Jiangshan wrote:
> >>so the scheduler/set_cpus_allowed_ptr()/cpu_active_mask should be the first
> >>place to fix.
> >
> >I'm not arguing about that, not to mention that this is userspace
> >exposed and nobody protects that.
> >
> >But I was expecting kernel stuff that calls it on hotplug to be
> >serialized thusly, but apparently not so.
> >
> 
> Was a final patch posted for this issue? The discussion made it sound like
> there were still a few things to figure out before we could resolve this
> bug. I can recreate this as needed and I'm happy to test any patches.


https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=sched/urgent&id=6acbfb96976fc3350e30d964acb1dbbdf876d55e

which should make its way to Linus soonish I suppose.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: workqueue: WARN at at kernel/workqueue.c:2176
  2014-05-27 14:26                   ` Peter Zijlstra
@ 2014-05-29 16:23                     ` Jason J. Herne
  2014-06-03 11:24                       ` Lai Jiangshan
  0 siblings, 1 reply; 50+ messages in thread
From: Jason J. Herne @ 2014-05-29 16:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Lai Jiangshan, Sasha Levin, Tejun Heo, LKML, Dave Jones,
	Ingo Molnar, Thomas Gleixner, Steven Rostedt

On 05/27/2014 10:26 AM, Peter Zijlstra wrote:
> On Tue, May 27, 2014 at 10:18:31AM -0400, Jason J. Herne wrote:
>> On 05/16/2014 12:29 PM, Peter Zijlstra wrote:
>>> On Sat, May 17, 2014 at 12:18:06AM +0800, Lai Jiangshan wrote:
>>>> so the scheduler/set_cpus_allowed_ptr()/cpu_active_mask should be the first
>>>> place to fix.
>>>
>>> I'm not arguing about that, not to mention that this is userspace
>>> exposed and nobody protects that.
>>>
>>> But I was expecting kernel stuff that calls it on hotplug to be
>>> serialized thusly, but apparently not so.
>>>
>>
>> Was a final patch posted for this issue? The discussion made it sound like
>> there were still a few things to figure out before we could resolve this
>> bug. I can recreate this as needed and I'm happy to test any patches.
>
>
> https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=sched/urgent&id=6acbfb96976fc3350e30d964acb1dbbdf876d55e
>
> which should make its way to Linus soonish I suppose.
>

I applied the patch on top of c7208164e66f63e3ec1759b98087849286410741 
and I am still hitting the problem.
Should I have applied to a different branch/commit to pick up any other 
needed changes?

Patch applied:

diff --git a/kernel/cpu.c b/kernel/cpu.c
index a9e710e..247979a 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -726,10 +726,12 @@ void set_cpu_present(unsigned int cpu, bool present)
 
 void set_cpu_online(unsigned int cpu, bool online)
 {
-	if (online)
+	if (online) {
 		cpumask_set_cpu(cpu, to_cpumask(cpu_online_bits));
-	else
+		cpumask_set_cpu(cpu, to_cpumask(cpu_active_bits));
+	} else {
 		cpumask_clear_cpu(cpu, to_cpumask(cpu_online_bits));
+	}
 }
 
 void set_cpu_active(unsigned int cpu, bool active)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 44e00ab..86f3890 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5076,7 +5076,6 @@ static int sched_cpu_active(struct notifier_block *nfb,
 				      unsigned long action, void *hcpu)
 {
 	switch (action & ~CPU_TASKS_FROZEN) {
-	case CPU_STARTING:
 	case CPU_DOWN_FAILED:
 		set_cpu_active((long)hcpu, true);
 		return NOTIFY_OK;

Here is the output from the recreation using this patch:

[ 3634.146233] ------------[ cut here ]------------
[ 3634.146238] WARNING: at kernel/workqueue.c:2176
[ 3634.146239] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat_ipv4 
nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack 
xt_CHECKSUM iptable_mangle bridge stp llc ip6table_filter ip6_tables 
ebtable_nat ebtables iscsi_tcp libiscsi_tcp libiscsi 
scsi_transport_iscsi qeth_l2 tape_3590 tape tape_class vhost_net tun 
vhost macvtap macvlan eadm_sch qeth ccwgroup zfcp scsi_transport_fc 
scsi_tgt qdio dasd_eckd_mod dasd_mod dm_multipath [last unloaded: kvm]
[ 3634.146260] CPU: 6 PID: 28009 Comm: kworker/7:0 Not tainted 3.15.0-rc7 #1
[ 3634.146263] Workqueue: \xffffff80           (null)
[ 3634.146264] task: 000000025def32e0 ti: 000000026dca0000 task.ti: 
000000026dca0000
[ 3634.146266] Krnl PSW : 0404c00180000000 000000000015ad1a 
(process_one_work+0x2e6/0x4c0)
[ 3634.146272]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 
PM:0 EA:3
Krnl GPRS: 0000000000000000 0000000000bc649a 00000002764b0980 
0000000000b94f40
[ 3634.146275]            0000000000b94f40 0000000000000000 
0000000000000000 0000000000bc6496
[ 3634.146277]            0000000000000000 000000008b65b600 
000000008b657000 000000008b657018
[ 3634.146278]            00000002764b0980 0000000000b94f40 
000000026dca3dd0 000000026dca3d70
[ 3634.146287] Krnl Code: 000000000015ad0e: 95001000		cli	0(%r1),0
            000000000015ad12: a774fece		brc	7,15aaae
           #000000000015ad16: a7f40001		brc	15,15ad18
           >000000000015ad1a: 92011000		mvi	0(%r1),1
            000000000015ad1e: a7f4fec8		brc	15,15aaae
            000000000015ad22: e31003180004	lg	%r1,792
            000000000015ad28: 58301024		l	%r3,36(%r1)
            000000000015ad2c: a73a0001		ahi	%r3,1
[ 3634.146299] Call Trace:
[ 3634.146301] ([<000000000015ace8>] process_one_work+0x2b4/0x4c0)
[ 3634.146303]  [<000000000015c100>] worker_thread+0x178/0x39c
[ 3634.146305]  [<0000000000164ba6>] kthread+0x10e/0x128
[ 3634.146310]  [<000000000072d026>] kernel_thread_starter+0x6/0xc
[ 3634.146312]  [<000000000072d020>] kernel_thread_starter+0x0/0xc
[ 3634.146313] Last Breaking-Event-Address:
[ 3634.146315]  [<000000000015ad16>] process_one_work+0x2e2/0x4c0
[ 3634.146316] ---[ end trace 03f51c9126c24171 ]---

I don't think this output provides anything new. Please let me know if I 
can gather any more data.

-- 
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: workqueue: WARN at at kernel/workqueue.c:2176
  2014-05-29 16:23                     ` Jason J. Herne
@ 2014-06-03 11:24                       ` Lai Jiangshan
  2014-06-03 12:45                         ` Lai Jiangshan
                                           ` (2 more replies)
  0 siblings, 3 replies; 50+ messages in thread
From: Lai Jiangshan @ 2014-06-03 11:24 UTC (permalink / raw)
  To: jjherne
  Cc: Peter Zijlstra, Sasha Levin, Tejun Heo, LKML, Dave Jones,
	Ingo Molnar, Thomas Gleixner, Steven Rostedt

Hi, Jason

Could you test again after running the following command?
(Unless Peter has already asked you to test with it, or he objects now.)

echo NO_TTWU_QUEUE > /sys/kernel/debug/sched_features

Thanks a lot.

Hi, Peter,

I found something strange during review (review only, not tested yet):

__migrate_task()
{
...
	/*
	 * If we're not on a rq, the next wake-up will ensure we're
	 * placed properly.
	 */
	if (p->on_rq) {
		dequeue_task(rq_src, p, 0);
		set_task_cpu(p, dest_cpu);
		enqueue_task(rq_dest, p, 0);
		check_preempt_curr(rq_dest, p, 0);
	}
...
}

The comment is incorrect if TTWU_QUEUE is enabled.
The task can be woken up even while p->on_rq == 0; in that case:
	p->wake_entry is added to the rq,
	p->state is TASK_WAKING,
	p->on_rq is 0.

In this case __migrate_task() fails to migrate the task!

Going back to the workqueue for a higher-level analysis:

task1				cpu#4			task3
workqueue_cpu_up_callback()
							wake_up_process(worker1)
							  ttwu_queue_remote() #queue worker1 to cpu#4
set_cpus_allowed_ptr()
  sets worker's cpus_allowed to
  cpumask_of(5)
				#stopper_task
				__migrate_task()
finds p->on_rq is 0,
does nothing and returns
set_cpus_allowed_ptr() return 0

In this case, the WARN_ON() in process_one_work() hit.

Thanks,
Lai


On 05/30/2014 12:23 AM, Jason J. Herne wrote:
> On 05/27/2014 10:26 AM, Peter Zijlstra wrote:
>> On Tue, May 27, 2014 at 10:18:31AM -0400, Jason J. Herne wrote:
>>> On 05/16/2014 12:29 PM, Peter Zijlstra wrote:
>>>> On Sat, May 17, 2014 at 12:18:06AM +0800, Lai Jiangshan wrote:
>>>>> so the scheduler/set_cpus_allowed_ptr()/cpu_active_mask should be the first
>>>>> place to fix.
>>>>
>>>> I'm not arguing about that, not to mention that this is userspace
>>>> exposed and nobody protects that.
>>>>
>>>> But I was expecting kernel stuff that calls it on hotplug to be
>>>> serialized thusly, but apparently not so.
>>>>
>>>
>>> Was a final patch posted for this issue? The discussion made it sound like
>>> there were still a few things to figure out before we could resolve this
>>> bug. I can recreate this as needed and I'm happy to test any patches.
>>
>>
>> https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=sched/urgent&id=6acbfb96976fc3350e30d964acb1dbbdf876d55e
>>
>> which should make its way to Linus soonish I suppose.
>>
> 
> I applied the patch on top of c7208164e66f63e3ec1759b98087849286410741 and I am still hitting the problem.
> Should I have applied to a different branch/commit to pick up any other needed changes?
> 
> Patch applied:
> 
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index a9e710e..247979a 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -726,10 +726,12 @@ void set_cpu_present(unsigned int cpu, bool present)
> void set_cpu_online(unsigned int cpu, bool online)
> {
> - if (online)
> + if (online) {
> cpumask_set_cpu(cpu, to_cpumask(cpu_online_bits));
> - else
> + cpumask_set_cpu(cpu, to_cpumask(cpu_active_bits));
> + } else {
> cpumask_clear_cpu(cpu, to_cpumask(cpu_online_bits));
> + }
> }
> void set_cpu_active(unsigned int cpu, bool active)
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 44e00ab..86f3890 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5076,7 +5076,6 @@ static int sched_cpu_active(struct notifier_block *nfb,
> unsigned long action, void *hcpu)
> {
> switch (action & ~CPU_TASKS_FROZEN) {
> - case CPU_STARTING:
> case CPU_DOWN_FAILED:
> set_cpu_active((long)hcpu, true);
> return NOTIFY_OK;
> 
> Here is the output from the recreation using this patch:
> 
> [ 3634.146233] ------------[ cut here ]------------
> [ 3634.146238] WARNING: at kernel/workqueue.c:2176
> [ 3634.146239] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack xt_CHECKSUM iptable_mangle bridge stp llc ip6table_filter ip6_tables ebtable_nat ebtables iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi qeth_l2 tape_3590 tape tape_class vhost_net tun vhost macvtap macvlan eadm_sch qeth ccwgroup zfcp scsi_transport_fc scsi_tgt qdio dasd_eckd_mod dasd_mod dm_multipath [last unloaded: kvm]
> [ 3634.146260] CPU: 6 PID: 28009 Comm: kworker/7:0 Not tainted 3.15.0-rc7 #1
> [ 3634.146263] Workqueue: \xffffff80           (null)
> [ 3634.146264] task: 000000025def32e0 ti: 000000026dca0000 task.ti: 000000026dca0000
> [ 3634.146266] Krnl PSW : 0404c00180000000 000000000015ad1a (process_one_work+0x2e6/0x4c0)
> [ 3634.146272]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
> Krnl GPRS: 0000000000000000 0000000000bc649a 00000002764b0980 0000000000b94f40
> [ 3634.146275]            0000000000b94f40 0000000000000000 0000000000000000 0000000000bc6496
> [ 3634.146277]            0000000000000000 000000008b65b600 000000008b657000 000000008b657018
> [ 3634.146278]            00000002764b0980 0000000000b94f40 000000026dca3dd0 000000026dca3d70
> [ 3634.146287] Krnl Code: 000000000015ad0e: 95001000        cli    0(%r1),0
>            000000000015ad12: a774fece        brc    7,15aaae
>           #000000000015ad16: a7f40001        brc    15,15ad18
>           >000000000015ad1a: 92011000        mvi    0(%r1),1
>            000000000015ad1e: a7f4fec8        brc    15,15aaae
>            000000000015ad22: e31003180004    lg    %r1,792
>            000000000015ad28: 58301024        l    %r3,36(%r1)
>            000000000015ad2c: a73a0001        ahi    %r3,1
> [ 3634.146299] Call Trace:
> [ 3634.146301] ([<000000000015ace8>] process_one_work+0x2b4/0x4c0)
> [ 3634.146303]  [<000000000015c100>] worker_thread+0x178/0x39c
> [ 3634.146305]  [<0000000000164ba6>] kthread+0x10e/0x128
> [ 3634.146310]  [<000000000072d026>] kernel_thread_starter+0x6/0xc
> [ 3634.146312]  [<000000000072d020>] kernel_thread_starter+0x0/0xc
> [ 3634.146313] Last Breaking-Event-Address:
> [ 3634.146315]  [<000000000015ad16>] process_one_work+0x2e2/0x4c0
> [ 3634.146316] ---[ end trace 03f51c9126c24171 ]---
> 
> I don't think this output provides anything new. Please let me know if I can gather any more data.
> 


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: workqueue: WARN at at kernel/workqueue.c:2176
  2014-06-03 11:24                       ` Lai Jiangshan
@ 2014-06-03 12:45                         ` Lai Jiangshan
  2014-06-03 14:28                           ` Peter Zijlstra
  2014-06-03 14:16                         ` Peter Zijlstra
  2014-06-04  2:28                         ` workqueue: WARN at at kernel/workqueue.c:2176 Lai Jiangshan
  2 siblings, 1 reply; 50+ messages in thread
From: Lai Jiangshan @ 2014-06-03 12:45 UTC (permalink / raw)
  To: jjherne
  Cc: Peter Zijlstra, Sasha Levin, Tejun Heo, LKML, Dave Jones,
	Ingo Molnar, Thomas Gleixner, Steven Rostedt


Hi, Peter,

I rewrote the analysis. (scheduler_ipi() must be called before the stopper
task, so the workqueue part of the old analysis may be wrong.)


I found something strange during review (review only, not tested yet):

int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
{
...
	if (p->on_rq) {
		struct migration_arg arg = { p, dest_cpu };
		/* Need help from migration thread: drop lock and wait. */
		task_rq_unlock(rq, p, &flags);
		stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
		tlb_migrate_finish(p->mm);
		return 0;
	}
...
}

This branch fails to migrate a woken-up task with p->on_rq == 0 when TTWU_QUEUE is enabled:
	p->wake_entry is added to the rq,
	p->state is TASK_WAKING
	p->on_rq is 0

In this case set_cpus_allowed_ptr() fails to migrate the woken-up task!

Going back to the workqueue for a higher-level analysis:

task1					task2					cpu#4
					wake_up_process(worker1)
					  ttwu_queue_remote() #queue worker1
					  to cpu#4
workqueue_cpu_up_callback(cpu=5)

set_cpus_allowed_ptr(worker1)
  set worker's cpuallowed to
  cpumask_of(5)
  see worker1->on_rq = 0,
  do not migrate it.
										scheduler_ipi()
										set worker1->on_rq = 1

										wq_worker_waking_up(worker1)
										fails to hit the WARN_ON()
										  because WORKER_UNBOUND is not cleared.
set_cpus_allowed_ptr() return 0
clear WORKER_UNBOUND for worker1

In this case, the WARN_ON() in process_one_work() hits.

Thanks,
Lai


The following change may help.
---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 268a45e..1a198a5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4530,7 +4530,7 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
 		goto out;
 
 	dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
-	if (p->on_rq) {
+	if (p->on_rq || p->state == TASK_WAKING) {
 		struct migration_arg arg = { p, dest_cpu };
 		/* Need help from migration thread: drop lock and wait. */
 		task_rq_unlock(rq, p, &flags);


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: workqueue: WARN at at kernel/workqueue.c:2176
  2014-06-03 11:24                       ` Lai Jiangshan
  2014-06-03 12:45                         ` Lai Jiangshan
@ 2014-06-03 14:16                         ` Peter Zijlstra
  2014-06-04  2:27                           ` Lai Jiangshan
  2014-06-04  2:28                         ` workqueue: WARN at at kernel/workqueue.c:2176 Lai Jiangshan
  2 siblings, 1 reply; 50+ messages in thread
From: Peter Zijlstra @ 2014-06-03 14:16 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: jjherne, Sasha Levin, Tejun Heo, LKML, Dave Jones, Ingo Molnar,
	Thomas Gleixner, Steven Rostedt

[-- Attachment #1: Type: text/plain, Size: 3071 bytes --]

On Tue, Jun 03, 2014 at 07:24:38PM +0800, Lai Jiangshan wrote:
> Hi, Jason
> 
> Could you test again after running the following command?
> (unless Peter has already asked you to test with it, or has asked you not to)
> 
> echo NO_TTWU_QUEUE > /sys/kernel/debug/sched_features
> 
> Thanks a lot.
> 
> Hi, Peter,
> 
> I found something strange by review (just by review, no test yet)
> 
> __migrate_task()
> {
> ...
> 	/*
> 	 * If we're not on a rq, the next wake-up will ensure we're
> 	 * placed properly.
> 	 */
> 	if (p->on_rq) {
> 		dequeue_task(rq_src, p, 0);
> 		set_task_cpu(p, dest_cpu);
> 		enqueue_task(rq_dest, p, 0);
> 		check_preempt_curr(rq_dest, p, 0);
> 	}
> ...
> }
> 
> The comment is incorrect when TTWU_QUEUE is enabled.
> The task has been woken up even though p->on_rq == 0 in this case:
> 	p->wake_entry is added to the rq,
> 	p->state is TASK_WAKING
> 	p->on_rq is 0
> 
> In this case __migrate_task() fails to migrate the task!
> 
> Going back to the workqueue for a higher-level analysis:
> 
> task1				cpu#4			task3
> workqueue_cpu_up_callback()
> 							wake_up_process(worker1)
> 							  ttwu_queue_remote() #queue worker1 to cpu#4
> set_cpus_allowed_ptr()
>   set worker's cpuallowed to
>   cpumask_of(5)
> 				#stopper_task
> 				__migrate_task()
> 				finds p->on_rq is 0,
> 				do nothing return
> set_cpus_allowed_ptr() return 0
> 
> In this case, the WARN_ON() in process_one_work() hits.

Hmm, yes I think you're right. A queued wakeup can miss an affinity
change like that.

Something like the below ought to cure that I suppose..

---
 kernel/sched/core.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 240aa83e73f5..0708ee21632f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1521,17 +1521,32 @@ static int ttwu_remote(struct task_struct *p, int wake_flags)
 }
 
 #ifdef CONFIG_SMP
+static void ttwu_queue_remote(struct task_struct *p, int cpu)
+{
+	if (llist_add(&p->wake_entry, &cpu_rq(cpu)->wake_list))
+		smp_send_reschedule(cpu);
+}
+
 static void sched_ttwu_pending(void)
 {
 	struct rq *rq = this_rq();
 	struct llist_node *llist = llist_del_all(&rq->wake_list);
 	struct task_struct *p;
+	int cpu;
 
 	raw_spin_lock(&rq->lock);
 
 	while (llist) {
 		p = llist_entry(llist, struct task_struct, wake_entry);
 		llist = llist_next(llist);
+
+		if (unlikely(!cpumask_test_cpu(rq->cpu, tsk_cpus_allowed(p)))) {
+			cpu = select_fallback_rq(rq->cpu, p);
+			set_task_cpu(p, cpu);
+			ttwu_queue_remote(p, cpu);
+			continue;
+		}
+
 		ttwu_do_activate(rq, p, 0);
 	}
 
@@ -1579,12 +1594,6 @@ void scheduler_ipi(void)
 	irq_exit();
 }
 
-static void ttwu_queue_remote(struct task_struct *p, int cpu)
-{
-	if (llist_add(&p->wake_entry, &cpu_rq(cpu)->wake_list))
-		smp_send_reschedule(cpu);
-}
-
 bool cpus_share_cache(int this_cpu, int that_cpu)
 {
 	return per_cpu(sd_llc_id, this_cpu) == per_cpu(sd_llc_id, that_cpu);

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: workqueue: WARN at at kernel/workqueue.c:2176
  2014-06-03 12:45                         ` Lai Jiangshan
@ 2014-06-03 14:28                           ` Peter Zijlstra
  2014-06-04  1:47                             ` Lai Jiangshan
  0 siblings, 1 reply; 50+ messages in thread
From: Peter Zijlstra @ 2014-06-03 14:28 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: jjherne, Sasha Levin, Tejun Heo, LKML, Dave Jones, Ingo Molnar,
	Thomas Gleixner, Steven Rostedt

[-- Attachment #1: Type: text/plain, Size: 1357 bytes --]

On Tue, Jun 03, 2014 at 08:45:39PM +0800, Lai Jiangshan wrote:
> 
> Hi, Peter,
> 
> I rewrote the analyse. (scheduler_ipi() must be called before stopper-task,
> so the part for workqueue of the old analyse maybe be wrong.)

But I don't think there is any guarantee we'll do the wakeup before
running the stop work.

Suppose the initial task gets queued and the remote CPU gets sent the
interrupt; meanwhile we do the stopper-work wakeup without queueing, since
the set_cpus_allowed_ptr() isn't crossing LLC boundaries.

Now, the remote cpu preempts/schedules before the interrupt hits and
runs the stop task.

At which point we'll run __migrate_task() while the task is still queued
on the wake list.

> ---
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 268a45e..1a198a5 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4530,7 +4530,7 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
>  		goto out;
>  
>  	dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
> -	if (p->on_rq) {
> +	if (p->on_rq || p->state == TASK_WAKING) {
>  		struct migration_arg arg = { p, dest_cpu };
>  		/* Need help from migration thread: drop lock and wait. */
>  		task_rq_unlock(rq, p, &flags);

So while this will close the window somewhat, I don't think it's entirely
closed.

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: workqueue: WARN at at kernel/workqueue.c:2176
  2014-06-03 14:28                           ` Peter Zijlstra
@ 2014-06-04  1:47                             ` Lai Jiangshan
  0 siblings, 0 replies; 50+ messages in thread
From: Lai Jiangshan @ 2014-06-04  1:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: jjherne, Sasha Levin, Tejun Heo, LKML, Dave Jones, Ingo Molnar,
	Thomas Gleixner, Steven Rostedt

On 06/03/2014 10:28 PM, Peter Zijlstra wrote:
> On Tue, Jun 03, 2014 at 08:45:39PM +0800, Lai Jiangshan wrote:
>>
>> Hi, Peter,
>>
>> I rewrote the analyse. (scheduler_ipi() must be called before stopper-task,
>> so the part for workqueue of the old analyse maybe be wrong.)
> 
> But I don't think there is any guarantee we'll do the wakeup before
> running the stop work.

You are right, but the race window in my old analysis is too narrow to
hit the WARN_ON(), so I wrote a new analysis showing a much bigger window
that can hit the WARN_ON() in workqueue.c.

> 
> Suppose the initial task gets queued, and the thing gets send the
> interrupt, meanwhile we'll do the stopper work wakeup !queueing, the
> set_cpus_allowed_ptr() isn't crossing llc boundaries.
> 
> Now, the remote cpu preempts/schedules before the interrupt hits and
> runs the stop task.
> 
> At which point we'll run __migrate_task() while the task is still queued
> on the wake list.
> 
>> ---
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 268a45e..1a198a5 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -4530,7 +4530,7 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
>>  		goto out;
>>  
>>  	dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
>> -	if (p->on_rq) {
>> +	if (p->on_rq || p->state == TASK_WAKING) {
>>  		struct migration_arg arg = { p, dest_cpu };
>>  		/* Need help from migration thread: drop lock and wait. */
>>  		task_rq_unlock(rq, p, &flags);
> 
> So while this will close the window somewhat, I don't think it's entirely
> closed.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: workqueue: WARN at at kernel/workqueue.c:2176
  2014-06-03 14:16                         ` Peter Zijlstra
@ 2014-06-04  2:27                           ` Lai Jiangshan
  2014-06-04  6:49                             ` Peter Zijlstra
  0 siblings, 1 reply; 50+ messages in thread
From: Lai Jiangshan @ 2014-06-04  2:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: jjherne, Sasha Levin, Tejun Heo, LKML, Dave Jones, Ingo Molnar,
	Thomas Gleixner, Steven Rostedt

On 06/03/2014 10:16 PM, Peter Zijlstra wrote:
> On Tue, Jun 03, 2014 at 07:24:38PM +0800, Lai Jiangshan wrote:
>> Hi, Jason
>>
>> Could you test again after running the following command?
>> (unless Peter has already asked you to test with it, or has asked you not to)
>>
>> echo NO_TTWU_QUEUE > /sys/kernel/debug/sched_features
>>
>> Thanks a lot.
>>
>> Hi, Peter,
>>
>> I found something strange by review (just by review, no test yet)
>>
>> __migrate_task()
>> {
>> ...
>> 	/*
>> 	 * If we're not on a rq, the next wake-up will ensure we're
>> 	 * placed properly.
>> 	 */
>> 	if (p->on_rq) {
>> 		dequeue_task(rq_src, p, 0);
>> 		set_task_cpu(p, dest_cpu);
>> 		enqueue_task(rq_dest, p, 0);
>> 		check_preempt_curr(rq_dest, p, 0);
>> 	}
>> ...
>> }
>>
>> The comment is incorrect when TTWU_QUEUE is enabled.
>> The task has been woken up even though p->on_rq == 0 in this case:
>> 	p->wake_entry is added to the rq,
>> 	p->state is TASK_WAKING
>> 	p->on_rq is 0
>>
>> In this case __migrate_task() fails to migrate the task!
>>
>> Going back to the workqueue for a higher-level analysis:
>>
>> task1				cpu#4			task3
>> workqueue_cpu_up_callback()
>> 							wake_up_process(worker1)
>> 							  ttwu_queue_remote() #queue worker1 to cpu#4
>> set_cpus_allowed_ptr()
>>   set worker's cpuallowed to
>>   cpumask_of(5)
>> 				#stopper_task
>> 				__migrate_task()
>> 				finds p->on_rq is 0,
>> 				do nothing return
>> set_cpus_allowed_ptr() return 0
>>
>> In this case, the WARN_ON() in process_one_work() hits.
> 
> Hmm, yes I think you're right. A queued wakeup can miss an affinity
> change like that.
> 
> Something like the below ought to cure that I suppose..

As a non-scheduler developer, I can't find anything wrong with the patch
(I checked every use of on_rq in kernel/sched).

But I think __migrate_task() is a slow path compared to sched_ttwu_pending(),
so I would prefer to change set_cpus_allowed_ptr() and __migrate_task()
rather than sched_ttwu_pending().

Anyway, I agree with the change.
I hope Jason can test it in a few days.

> 
> ---
>  kernel/sched/core.c | 21 +++++++++++++++------
>  1 file changed, 15 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 240aa83e73f5..0708ee21632f 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1521,17 +1521,32 @@ static int ttwu_remote(struct task_struct *p, int wake_flags)
>  }
>  
>  #ifdef CONFIG_SMP
> +static void ttwu_queue_remote(struct task_struct *p, int cpu)
> +{
> +	if (llist_add(&p->wake_entry, &cpu_rq(cpu)->wake_list))
> +		smp_send_reschedule(cpu);
> +}
> +
>  static void sched_ttwu_pending(void)
>  {
>  	struct rq *rq = this_rq();
>  	struct llist_node *llist = llist_del_all(&rq->wake_list);
>  	struct task_struct *p;
> +	int cpu;
>  
>  	raw_spin_lock(&rq->lock);
>  
>  	while (llist) {
>  		p = llist_entry(llist, struct task_struct, wake_entry);
>  		llist = llist_next(llist);
> +
> +		if (unlikely(!cpumask_test_cpu(rq->cpu, tsk_cpus_allowed(p)))) {
> +			cpu = select_fallback_rq(rq->cpu, p);
> +			set_task_cpu(p, cpu);
> +			ttwu_queue_remote(p, cpu);
> +			continue;
> +		}
> +
>  		ttwu_do_activate(rq, p, 0);
>  	}
>  
> @@ -1579,12 +1594,6 @@ void scheduler_ipi(void)
>  	irq_exit();
>  }
>  
> -static void ttwu_queue_remote(struct task_struct *p, int cpu)
> -{
> -	if (llist_add(&p->wake_entry, &cpu_rq(cpu)->wake_list))
> -		smp_send_reschedule(cpu);
> -}
> -
>  bool cpus_share_cache(int this_cpu, int that_cpu)
>  {
>  	return per_cpu(sd_llc_id, this_cpu) == per_cpu(sd_llc_id, that_cpu);


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: workqueue: WARN at at kernel/workqueue.c:2176
  2014-06-03 11:24                       ` Lai Jiangshan
  2014-06-03 12:45                         ` Lai Jiangshan
  2014-06-03 14:16                         ` Peter Zijlstra
@ 2014-06-04  2:28                         ` Lai Jiangshan
  2014-06-04  6:48                           ` Peter Zijlstra
  2 siblings, 1 reply; 50+ messages in thread
From: Lai Jiangshan @ 2014-06-04  2:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: jjherne, Sasha Levin, Tejun Heo, LKML, Dave Jones, Ingo Molnar,
	Thomas Gleixner, Steven Rostedt

On 06/03/2014 07:24 PM, Lai Jiangshan wrote:
> Hi, Jason
> 
> Could you test again after running the following command?
> (unless Peter has already asked you to test with it, or has asked you not to)
> 
> echo NO_TTWU_QUEUE > /sys/kernel/debug/sched_features


Off-topic!

Why sched_features is in debugfs? I thought it is in procfs or sysfs.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: workqueue: WARN at at kernel/workqueue.c:2176
  2014-06-04  2:28                         ` workqueue: WARN at at kernel/workqueue.c:2176 Lai Jiangshan
@ 2014-06-04  6:48                           ` Peter Zijlstra
  0 siblings, 0 replies; 50+ messages in thread
From: Peter Zijlstra @ 2014-06-04  6:48 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: jjherne, Sasha Levin, Tejun Heo, LKML, Dave Jones, Ingo Molnar,
	Thomas Gleixner, Steven Rostedt

[-- Attachment #1: Type: text/plain, Size: 638 bytes --]

On Wed, Jun 04, 2014 at 10:28:50AM +0800, Lai Jiangshan wrote:
> On 06/03/2014 07:24 PM, Lai Jiangshan wrote:
> > Hi, Jason
> > 
> > Could you test again after running the following command?
> > (unless Peter has already asked you to test with it, or has asked you not to)
> > 
> > echo NO_TTWU_QUEUE > /sys/kernel/debug/sched_features
> 
> 
> Off-topic!
> 
> Why sched_features is in debugfs? I thought it is in procfs or sysfs.

Because it's for debugging. Note that the entire thing relies on
CONFIG_SCHED_DEBUG; if you build without that, it's gone.

It's not an API and it's not for production use; it's for debugging.

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: workqueue: WARN at at kernel/workqueue.c:2176
  2014-06-04  2:27                           ` Lai Jiangshan
@ 2014-06-04  6:49                             ` Peter Zijlstra
  2014-06-04  8:25                               ` Lai Jiangshan
  0 siblings, 1 reply; 50+ messages in thread
From: Peter Zijlstra @ 2014-06-04  6:49 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: jjherne, Sasha Levin, Tejun Heo, LKML, Dave Jones, Ingo Molnar,
	Thomas Gleixner, Steven Rostedt

[-- Attachment #1: Type: text/plain, Size: 732 bytes --]

On Wed, Jun 04, 2014 at 10:27:25AM +0800, Lai Jiangshan wrote:
> > Hmm, yes I think you're right. A queued wakeup can miss an affinity
> > change like that.
> > 
> > Something like the below ought to cure that I suppose..
> 
> As a non-scheduler developer, I can't find anything wrong with the patch
> (I checked every use of on_rq in kernel/sched).

You did fine finding that hole, so who knows, you might become one real
soon now ;-)

> but I think __migrate_task() is a slow path compared to sched_ttwu_pending(),
> so I would prefer to change set_cpus_allowed_ptr() and __migrate_task()
> rather than sched_ttwu_pending().

Yes, I agree, something there would be better, but I couldn't find
anything without holes in it.



[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: workqueue: WARN at at kernel/workqueue.c:2176
  2014-06-04  6:49                             ` Peter Zijlstra
@ 2014-06-04  8:25                               ` Lai Jiangshan
  2014-06-04  9:39                                 ` Peter Zijlstra
  2014-09-09 14:52                                 ` [tip:sched/core] sched: Migrate waking tasks tip-bot for Lai Jiangshan
  0 siblings, 2 replies; 50+ messages in thread
From: Lai Jiangshan @ 2014-06-04  8:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: jjherne, Sasha Levin, Tejun Heo, LKML, Dave Jones, Ingo Molnar,
	Thomas Gleixner, Steven Rostedt

On 06/04/2014 02:49 PM, Peter Zijlstra wrote:
> On Wed, Jun 04, 2014 at 10:27:25AM +0800, Lai Jiangshan wrote:
>>> Hmm, yes I think you're right. A queued wakeup can miss an affinity
>>> change like that.
>>>
>>> Something like the below ought to cure that I suppose..
>>
>> As a non-scheduler developer, I can't find anything wrong with the patch
>> (I checked every use of on_rq in kernel/sched).
> 
> You did fine finding that hole, so who knows, you might become one real
> soon now ;-)
> 
>> but I think __migrate_task() is a slow path compared to sched_ttwu_pending(),
>> so I would prefer to change set_cpus_allowed_ptr() and __migrate_task()
>> rather than sched_ttwu_pending().
> 
> Yes, I agree, something there would be better, but I couldn't find
> anything without holes in it.
> 
> 

I think the following code works (inspired by the sched_ttwu_pending() call in migration_call()).

If p->on_rq == 0 && p->state == TASK_WAKING in __migrate_task() after this
patch, it means cpus_allowed was changed before __migrate_task(), and other
scheduler movements happened between sched_ttwu_pending() and __migrate_task().


diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 268a45e..277f3bc 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4530,7 +4530,7 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
 		goto out;
 
 	dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
-	if (p->on_rq) {
+	if (p->on_rq || p->state == TASK_WAKING) {
 		struct migration_arg arg = { p, dest_cpu };
 		/* Need help from migration thread: drop lock and wait. */
 		task_rq_unlock(rq, p, &flags);
@@ -4656,6 +4656,7 @@ static int migration_cpu_stop(void *data)
 	 * be on another cpu but it doesn't matter.
 	 */
 	local_irq_disable();
+	sched_ttwu_pending();
 	__migrate_task(arg->task, raw_smp_processor_id(), arg->dest_cpu);
 	local_irq_enable();
 	return 0;

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: workqueue: WARN at at kernel/workqueue.c:2176
  2014-06-04  8:25                               ` Lai Jiangshan
@ 2014-06-04  9:39                                 ` Peter Zijlstra
  2014-06-05 10:54                                   ` Lai Jiangshan
  2014-09-09 14:52                                 ` [tip:sched/core] sched: Migrate waking tasks tip-bot for Lai Jiangshan
  1 sibling, 1 reply; 50+ messages in thread
From: Peter Zijlstra @ 2014-06-04  9:39 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: jjherne, Sasha Levin, Tejun Heo, LKML, Dave Jones, Ingo Molnar,
	Thomas Gleixner, Steven Rostedt

[-- Attachment #1: Type: text/plain, Size: 1598 bytes --]

On Wed, Jun 04, 2014 at 04:25:15PM +0800, Lai Jiangshan wrote:
> I think the following code works (inspired by the sched_ttwu_pending() call in migration_call()).
> 
> If p->on_rq == 0 && p->state == TASK_WAKING in __migrate_task() after this
> patch, it means cpus_allowed was changed before __migrate_task(), and other
> scheduler movements happened between sched_ttwu_pending() and __migrate_task().

Yes, this is better. Can you send a full patch with the below comment in
as well, then I can apply.

Thanks!

> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 268a45e..277f3bc 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4530,7 +4530,7 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
>  		goto out;
>  
>  	dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
> -	if (p->on_rq) {
> +	if (p->on_rq || p->state == TASK_WAKING) {
>  		struct migration_arg arg = { p, dest_cpu };
>  		/* Need help from migration thread: drop lock and wait. */
>  		task_rq_unlock(rq, p, &flags);
> @@ -4656,6 +4656,7 @@ static int migration_cpu_stop(void *data)
>  	 * be on another cpu but it doesn't matter.
>  	 */
>  	local_irq_disable();

	/*
	 * We need to explicitly wake pending tasks before running
	 * __migrate_task() such that we will not miss enforcing
	 * cpus_allowed during wakeups, see set_cpus_allowed_ptr()'s
	 * TASK_WAKING test.
	 */

> +	sched_ttwu_pending();
>  	__migrate_task(arg->task, raw_smp_processor_id(), arg->dest_cpu);
>  	local_irq_enable();
>  	return 0;

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: workqueue: WARN at at kernel/workqueue.c:2176
  2014-06-04  9:39                                 ` Peter Zijlstra
@ 2014-06-05 10:54                                   ` Lai Jiangshan
  2014-06-05 15:22                                     ` Jason J. Herne
                                                       ` (3 more replies)
  0 siblings, 4 replies; 50+ messages in thread
From: Lai Jiangshan @ 2014-06-05 10:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: jjherne, Sasha Levin, Tejun Heo, LKML, Dave Jones, Ingo Molnar,
	Thomas Gleixner, Steven Rostedt

The patch has not been tested by Jason yet, so I don't know whether it fixes
the problem. The changelog, including the "Reported-by:" and "Tested-by:" tags,
needs to be updated once the fix is confirmed.

------------

Subject: [PATCH] sched: migrate the waking tasks

The current code silently skips migrating a waking task when TTWU_QUEUE is enabled.

When a task is waking, it is pending on the wake_list of the rq, but it is
not on the runqueue (task->on_rq == 0). In this case, set_cpus_allowed_ptr()
and __migrate_task() will not migrate it because it is not on the runqueue.

This behavior is incorrect: the task has already been woken up, so it will
run on the wrong CPU, without correct placement, until the next wake-up or
cpus_allowed update.

To fix this problem, we need to put waking tasks on the runqueue (transfer
them to the runnable state) before migrating them.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 268a45e..d05a5a1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1474,20 +1474,24 @@ static int ttwu_remote(struct task_struct *p, int wake_flags)
 }
 
 #ifdef CONFIG_SMP
-static void sched_ttwu_pending(void)
+static void sched_ttwu_pending_locked(struct rq *rq)
 {
-	struct rq *rq = this_rq();
 	struct llist_node *llist = llist_del_all(&rq->wake_list);
 	struct task_struct *p;
 
-	raw_spin_lock(&rq->lock);
-
 	while (llist) {
 		p = llist_entry(llist, struct task_struct, wake_entry);
 		llist = llist_next(llist);
 		ttwu_do_activate(rq, p, 0);
 	}
+}
 
+static void sched_ttwu_pending(void)
+{
+	struct rq *rq = this_rq();
+
+	raw_spin_lock(&rq->lock);
+	sched_ttwu_pending_locked(rq);
 	raw_spin_unlock(&rq->lock);
 }
 
@@ -4530,6 +4534,11 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
 		goto out;
 
 	dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
+
+	/* Ensure it is on rq for migration if it is waking */
+	if (p->state == TASK_WAKING)
+		sched_ttwu_pending_locked(rq);
+
 	if (p->on_rq) {
 		struct migration_arg arg = { p, dest_cpu };
 		/* Need help from migration thread: drop lock and wait. */
@@ -4576,6 +4585,10 @@ static int __migrate_task(struct task_struct *p, int src_cpu, int dest_cpu)
 	if (!cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
 		goto fail;
 
+	/* Ensure it is on rq for migration if it is waking */
+	if (p->state == TASK_WAKING)
+		sched_ttwu_pending_locked(rq_src);
+
 	/*
 	 * If we're not on a rq, the next wake-up will ensure we're
 	 * placed properly.

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: workqueue: WARN at at kernel/workqueue.c:2176
  2014-06-05 10:54                                   ` Lai Jiangshan
@ 2014-06-05 15:22                                     ` Jason J. Herne
  2014-06-06 12:39                                     ` Jason J. Herne
                                                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 50+ messages in thread
From: Jason J. Herne @ 2014-06-05 15:22 UTC (permalink / raw)
  To: Lai Jiangshan, Peter Zijlstra
  Cc: Sasha Levin, Tejun Heo, LKML, Dave Jones, Ingo Molnar,
	Thomas Gleixner, Steven Rostedt

On 06/05/2014 06:54 AM, Lai Jiangshan wrote:
> The patch has not been tested by Jason yet, so I don't know whether it fixes
> the problem. The changelog, including the "Reported-by:" and "Tested-by:" tags,
> needs to be updated once the fix is confirmed.

I will test this one and get back to you as soon as possible.

-- 
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: workqueue: WARN at at kernel/workqueue.c:2176
  2014-06-05 10:54                                   ` Lai Jiangshan
  2014-06-05 15:22                                     ` Jason J. Herne
@ 2014-06-06 12:39                                     ` Jason J. Herne
  2014-06-06 13:36                                     ` Peter Zijlstra
  2014-06-09 14:01                                     ` Jason J. Herne
  3 siblings, 0 replies; 50+ messages in thread
From: Jason J. Herne @ 2014-06-06 12:39 UTC (permalink / raw)
  To: Lai Jiangshan, Peter Zijlstra
  Cc: Sasha Levin, Tejun Heo, LKML, Dave Jones, Ingo Molnar,
	Thomas Gleixner, Steven Rostedt

On 06/05/2014 06:54 AM, Lai Jiangshan wrote:
> The patch has not been tested by Jason yet, so I don't know whether it fixes
> the problem. The changelog, including the "Reported-by:" and "Tested-by:" tags,
> needs to be updated once the fix is confirmed.
>

With this patch, my workload ran overnight without hitting the warning.
This seems promising. I would like to run for a day or two more before
declaring success, though. Just to be sure :).

> ------------
>
> Subject: [PATCH] sched: migrate the waking tasks
>
> The current code silently skips migrating a waking task when TTWU_QUEUE is enabled.
>
> When a task is waking, it is pending on the wake_list of the rq, but it is
> not on the runqueue (task->on_rq == 0). In this case, set_cpus_allowed_ptr()
> and __migrate_task() will not migrate it because it is not on the runqueue.
>
> This behavior is incorrect: the task has already been woken up, so it will
> run on the wrong CPU, without correct placement, until the next wake-up or
> cpus_allowed update.
>
> To fix this problem, we need to put waking tasks on the runqueue (transfer
> them to the runnable state) before migrating them.
>
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> ---
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 268a45e..d05a5a1 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1474,20 +1474,24 @@ static int ttwu_remote(struct task_struct *p, int wake_flags)
>   }
>
>   #ifdef CONFIG_SMP
> -static void sched_ttwu_pending(void)
> +static void sched_ttwu_pending_locked(struct rq *rq)
>   {
> -	struct rq *rq = this_rq();
>   	struct llist_node *llist = llist_del_all(&rq->wake_list);
>   	struct task_struct *p;
>
> -	raw_spin_lock(&rq->lock);
> -
>   	while (llist) {
>   		p = llist_entry(llist, struct task_struct, wake_entry);
>   		llist = llist_next(llist);
>   		ttwu_do_activate(rq, p, 0);
>   	}
> +}
>
> +static void sched_ttwu_pending(void)
> +{
> +	struct rq *rq = this_rq();
> +
> +	raw_spin_lock(&rq->lock);
> +	sched_ttwu_pending_locked(rq);
>   	raw_spin_unlock(&rq->lock);
>   }
>
> @@ -4530,6 +4534,11 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
>   		goto out;
>
>   	dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
> +
> +	/* Ensure it is on rq for migration if it is waking */
> +	if (p->state == TASK_WAKING)
> +		sched_ttwu_pending_locked(rq);
> +
>   	if (p->on_rq) {
>   		struct migration_arg arg = { p, dest_cpu };
>   		/* Need help from migration thread: drop lock and wait. */
> @@ -4576,6 +4585,10 @@ static int __migrate_task(struct task_struct *p, int src_cpu, int dest_cpu)
>   	if (!cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
>   		goto fail;
>
> +	/* Ensure it is on rq for migration if it is waking */
> +	if (p->state == TASK_WAKING)
> +		sched_ttwu_pending_locked(rq_src);
> +
>   	/*
>   	 * If we're not on a rq, the next wake-up will ensure we're
>   	 * placed properly.
>

-- 
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: workqueue: WARN at at kernel/workqueue.c:2176
  2014-06-05 10:54                                   ` Lai Jiangshan
  2014-06-05 15:22                                     ` Jason J. Herne
  2014-06-06 12:39                                     ` Jason J. Herne
@ 2014-06-06 13:36                                     ` Peter Zijlstra
  2014-06-08  2:50                                       ` Lai Jiangshan
  2014-09-01  3:04                                       ` Lai Jiangshan
  2014-06-09 14:01                                     ` Jason J. Herne
  3 siblings, 2 replies; 50+ messages in thread
From: Peter Zijlstra @ 2014-06-06 13:36 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: jjherne, Sasha Levin, Tejun Heo, LKML, Dave Jones, Ingo Molnar,
	Thomas Gleixner, Steven Rostedt

On Thu, Jun 05, 2014 at 06:54:35PM +0800, Lai Jiangshan wrote:
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 268a45e..d05a5a1 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1474,20 +1474,24 @@ static int ttwu_remote(struct task_struct *p, int wake_flags)
>  }
>  
>  #ifdef CONFIG_SMP
> -static void sched_ttwu_pending(void)
> +static void sched_ttwu_pending_locked(struct rq *rq)
>  {
> -	struct rq *rq = this_rq();
>  	struct llist_node *llist = llist_del_all(&rq->wake_list);
>  	struct task_struct *p;
>  
> -	raw_spin_lock(&rq->lock);
> -
>  	while (llist) {
>  		p = llist_entry(llist, struct task_struct, wake_entry);
>  		llist = llist_next(llist);
>  		ttwu_do_activate(rq, p, 0);
>  	}
> +}
>  
> +static void sched_ttwu_pending(void)
> +{
> +	struct rq *rq = this_rq();
> +
> +	raw_spin_lock(&rq->lock);
> +	sched_ttwu_pending_locked(rq);
>  	raw_spin_unlock(&rq->lock);
>  }

OK, so this won't apply to a recent kernel.

> @@ -4530,6 +4534,11 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
>  		goto out;
>  
>  	dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
> +
> +	/* Ensure it is on rq for migration if it is waking */
> +	if (p->state == TASK_WAKING)
> +		sched_ttwu_pending_locked(rq);

So I would really rather like to avoid this if possible; it's doing full
remote queueing, exactly what we tried to avoid.

> +
>  	if (p->on_rq) {
>  		struct migration_arg arg = { p, dest_cpu };
>  		/* Need help from migration thread: drop lock and wait. */
> @@ -4576,6 +4585,10 @@ static int __migrate_task(struct task_struct *p, int src_cpu, int dest_cpu)
>  	if (!cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
>  		goto fail;
>  
> +	/* Ensure it is on rq for migration if it is waking */
> +	if (p->state == TASK_WAKING)
> +		sched_ttwu_pending_locked(rq_src);
> +
>  	/*
>  	 * If we're not on a rq, the next wake-up will ensure we're
>  	 * placed properly.

Oh man, another variant.. why did you change it again? And without
explanation for why you changed it.

I don't see a reason to call sched_ttwu_pending() with rq->lock held,
seeing as how we append to that list without it held.

I'm still thinking the previous version is good, can you explain why you
changed it?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: workqueue: WARN at at kernel/workqueue.c:2176
  2014-06-06 13:36                                     ` Peter Zijlstra
@ 2014-06-08  2:50                                       ` Lai Jiangshan
  2014-09-01  3:04                                       ` Lai Jiangshan
  1 sibling, 0 replies; 50+ messages in thread
From: Lai Jiangshan @ 2014-06-08  2:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: jjherne, Sasha Levin, Tejun Heo, LKML, Dave Jones, Ingo Molnar,
	Thomas Gleixner, Steven Rostedt

On 06/06/2014 09:36 PM, Peter Zijlstra wrote:
> On Thu, Jun 05, 2014 at 06:54:35PM +0800, Lai Jiangshan wrote:
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 268a45e..d05a5a1 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -1474,20 +1474,24 @@ static int ttwu_remote(struct task_struct *p, int wake_flags)
>>  }
>>  
>>  #ifdef CONFIG_SMP
>> -static void sched_ttwu_pending(void)
>> +static void sched_ttwu_pending_locked(struct rq *rq)
>>  {
>> -	struct rq *rq = this_rq();
>>  	struct llist_node *llist = llist_del_all(&rq->wake_list);
>>  	struct task_struct *p;
>>  
>> -	raw_spin_lock(&rq->lock);
>> -
>>  	while (llist) {
>>  		p = llist_entry(llist, struct task_struct, wake_entry);
>>  		llist = llist_next(llist);
>>  		ttwu_do_activate(rq, p, 0);
>>  	}
>> +}
>>  
>> +static void sched_ttwu_pending(void)
>> +{
>> +	struct rq *rq = this_rq();
>> +
>> +	raw_spin_lock(&rq->lock);
>> +	sched_ttwu_pending_locked(rq);
>>  	raw_spin_unlock(&rq->lock);
>>  }
> 
> OK, so this won't apply to a recent kernel.

Thank you for the review.

Has the code here already changed in recent kernels, or did my patch
touch too much to apply cleanly?

> 
>> @@ -4530,6 +4534,11 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
>>  		goto out;
>>  
>>  	dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
>> +
>> +	/* Ensure it is on rq for migration if it is waking */
>> +	if (p->state == TASK_WAKING)
>> +		sched_ttwu_pending_locked(rq);
> 
> So I would really rather like to avoid this if possible; it's doing full
> remote queueing, exactly what we tried to avoid.

set_cpus_allowed_ptr() is a slow path, so the overhead introduced by this
change is limited.

> 
>> +
>>  	if (p->on_rq) {
>>  		struct migration_arg arg = { p, dest_cpu };
>>  		/* Need help from migration thread: drop lock and wait. */
>> @@ -4576,6 +4585,10 @@ static int __migrate_task(struct task_struct *p, int src_cpu, int dest_cpu)
>>  	if (!cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
>>  		goto fail;
>>  
>> +	/* Ensure it is on rq for migration if it is waking */
>> +	if (p->state == TASK_WAKING)
>> +		sched_ttwu_pending_locked(rq_src);
>> +
>>  	/*
>>  	 * If we're not on a rq, the next wake-up will ensure we're
>>  	 * placed properly.
> 
> Oh man, another variant.. why did you change it again? And without
> explanation for why you changed it.
> 
> I don't see a reason to call sched_ttwu_pending() with rq->lock held,
> seeing as how we append to that list without it held.

sched_ttwu_pending() requires rq->lock to do its actual work.

I swapped the order of "llist_del_all(&rq->wake_list)" and "raw_spin_lock(&rq->lock);",
which slightly extends the critical section covered by rq->lock.

> 
> I'm still thinking the previous version is good, can you explain why you
> changed it?

There was a window in the previous version:

sched_ttwu_pending();
	<---------------- window here; the task can be in WAKING state again.
__migrate_task();


When I thought about it more deeply, the current code was still correct for
all of migration_cpu_stop()'s callers, but it would need a big chunk of
comments to explain why. And I felt nervous about that window: it would
become a bug if something else changed without taking the window into
account. I don't want to leave fragile code behind.

The new version of the patch is more straightforward and self-documenting.

Migration: if the task has been woken up, migrate it; otherwise it is the
next wakeup's responsibility. The original code simply assumed
p->on_rq <==> woken up. The new patch just fixes up p->on_rq before
testing it.


How about this (a slight change that leaves the original sched_ttwu_pending() untouched)?

---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 268a45e..cd224ea 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1474,6 +1474,18 @@ static int ttwu_remote(struct task_struct *p, int wake_flags)
 }
 
 #ifdef CONFIG_SMP
+static void sched_ttwu_pending_locked(struct rq *rq)
+{
+	struct llist_node *llist = llist_del_all(&rq->wake_list);
+	struct task_struct *p;
+
+	while (llist) {
+		p = llist_entry(llist, struct task_struct, wake_entry);
+		llist = llist_next(llist);
+		ttwu_do_activate(rq, p, 0);
+	}
+}
+
 static void sched_ttwu_pending(void)
 {
 	struct rq *rq = this_rq();
@@ -4530,7 +4542,7 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
 		goto out;
 
 	dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
-	if (p->on_rq) {
+	if (p->on_rq || p->state == TASK_WAKING) {
 		struct migration_arg arg = { p, dest_cpu };
 		/* Need help from migration thread: drop lock and wait. */
 		task_rq_unlock(rq, p, &flags);
@@ -4576,6 +4588,10 @@ static int __migrate_task(struct task_struct *p, int src_cpu, int dest_cpu)
 	if (!cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
 		goto fail;
 
+	/* Ensure it is on rq for migration if it is waking */
+	if (p->state == TASK_WAKING)
+		sched_ttwu_pending_locked(rq_src);
+
 	/*
 	 * If we're not on a rq, the next wake-up will ensure we're
 	 * placed properly.
> 
> 
> 
> 
> 
> 
> .
> 


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: workqueue: WARN at at kernel/workqueue.c:2176
  2014-06-05 10:54                                   ` Lai Jiangshan
                                                       ` (2 preceding siblings ...)
  2014-06-06 13:36                                     ` Peter Zijlstra
@ 2014-06-09 14:01                                     ` Jason J. Herne
  2014-06-10  1:21                                       ` Lai Jiangshan
  3 siblings, 1 reply; 50+ messages in thread
From: Jason J. Herne @ 2014-06-09 14:01 UTC (permalink / raw)
  To: Lai Jiangshan, Peter Zijlstra
  Cc: Sasha Levin, Tejun Heo, LKML, Dave Jones, Ingo Molnar,
	Thomas Gleixner, Steven Rostedt

On 06/05/2014 06:54 AM, Lai Jiangshan wrote:
> ------------
>
> Subject: [PATCH] sched: migrate the waking tasks
>
> Current code silently skips migrating a waking task when TTWU_QUEUE is enabled.
>
> When a task is waking, it is pending on the wake_list of the rq, but
> it is not queued (task->on_rq == 0). In this case, set_cpus_allowed_ptr()
> and __migrate_task() will not migrate it because it is not queued.
>
> This behavior is incorrect: because the task has already been woken up, it
> will be running on the wrong CPU without correct placement until the next
> wake-up or cpus_allowed update.
>
> To fix this problem, we need to make the waking tasks queued (transition
> the waking tasks to running state) before migrating them.
>
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> ---
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 268a45e..d05a5a1 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1474,20 +1474,24 @@ static int ttwu_remote(struct task_struct *p, int wake_flags)
>   }
>
>   #ifdef CONFIG_SMP
> -static void sched_ttwu_pending(void)
> +static void sched_ttwu_pending_locked(struct rq *rq)
>   {
> -	struct rq *rq = this_rq();
>   	struct llist_node *llist = llist_del_all(&rq->wake_list);
>   	struct task_struct *p;
>
> -	raw_spin_lock(&rq->lock);
> -
>   	while (llist) {
>   		p = llist_entry(llist, struct task_struct, wake_entry);
>   		llist = llist_next(llist);
>   		ttwu_do_activate(rq, p, 0);
>   	}
> +}
>
> +static void sched_ttwu_pending(void)
> +{
> +	struct rq *rq = this_rq();
> +
> +	raw_spin_lock(&rq->lock);
> +	sched_ttwu_pending_locked(rq);
>   	raw_spin_unlock(&rq->lock);
>   }
>
> @@ -4530,6 +4534,11 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
>   		goto out;
>
>   	dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
> +
> +	/* Ensure it is on rq for migration if it is waking */
> +	if (p->state == TASK_WAKING)
> +		sched_ttwu_pending_locked(rq);
> +
>   	if (p->on_rq) {
>   		struct migration_arg arg = { p, dest_cpu };
>   		/* Need help from migration thread: drop lock and wait. */
> @@ -4576,6 +4585,10 @@ static int __migrate_task(struct task_struct *p, int src_cpu, int dest_cpu)
>   	if (!cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
>   		goto fail;
>
> +	/* Ensure it is on rq for migration if it is waking */
> +	if (p->state == TASK_WAKING)
> +		sched_ttwu_pending_locked(rq_src);
> +
>   	/*
>   	 * If we're not on a rq, the next wake-up will ensure we're
>   	 * placed properly.
>

FYI, this patch appears to fix the problem. I was able to run for 3 days 
without hitting the warning.

I see that you guys are still discussing the details of the fix. When 
you decide on a final solution I'm happy to retest. Just be sure to ask 
:). It is hard to tell what to test with so many patches and code 
snippets flying around all the time.

Happy coding.

-- 
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: workqueue: WARN at at kernel/workqueue.c:2176
  2014-06-09 14:01                                     ` Jason J. Herne
@ 2014-06-10  1:21                                       ` Lai Jiangshan
  2014-06-16  1:30                                         ` Lai Jiangshan
  0 siblings, 1 reply; 50+ messages in thread
From: Lai Jiangshan @ 2014-06-10  1:21 UTC (permalink / raw)
  To: jjherne
  Cc: Peter Zijlstra, Sasha Levin, Tejun Heo, LKML, Dave Jones,
	Ingo Molnar, Thomas Gleixner, Steven Rostedt

On 06/09/2014 10:01 PM, Jason J. Herne wrote:
> On 06/05/2014 06:54 AM, Lai Jiangshan wrote:
>> ------------
>>
>> Subject: [PATCH] sched: migrate the waking tasks
>>
>> Current code silently skips migrating a waking task when TTWU_QUEUE is enabled.
>>
>> When a task is waking, it is pending on the wake_list of the rq, but
>> it is not queued (task->on_rq == 0). In this case, set_cpus_allowed_ptr()
>> and __migrate_task() will not migrate it because it is not queued.
>>
>> This behavior is incorrect: because the task has already been woken up, it
>> will be running on the wrong CPU without correct placement until the next
>> wake-up or cpus_allowed update.
>>
>> To fix this problem, we need to make the waking tasks queued (transition
>> the waking tasks to running state) before migrating them.
>>
>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>> ---
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 268a45e..d05a5a1 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -1474,20 +1474,24 @@ static int ttwu_remote(struct task_struct *p, int wake_flags)
>>   }
>>
>>   #ifdef CONFIG_SMP
>> -static void sched_ttwu_pending(void)
>> +static void sched_ttwu_pending_locked(struct rq *rq)
>>   {
>> -    struct rq *rq = this_rq();
>>       struct llist_node *llist = llist_del_all(&rq->wake_list);
>>       struct task_struct *p;
>>
>> -    raw_spin_lock(&rq->lock);
>> -
>>       while (llist) {
>>           p = llist_entry(llist, struct task_struct, wake_entry);
>>           llist = llist_next(llist);
>>           ttwu_do_activate(rq, p, 0);
>>       }
>> +}
>>
>> +static void sched_ttwu_pending(void)
>> +{
>> +    struct rq *rq = this_rq();
>> +
>> +    raw_spin_lock(&rq->lock);
>> +    sched_ttwu_pending_locked(rq);
>>       raw_spin_unlock(&rq->lock);
>>   }
>>
>> @@ -4530,6 +4534,11 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
>>           goto out;
>>
>>       dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
>> +
>> +    /* Ensure it is on rq for migration if it is waking */
>> +    if (p->state == TASK_WAKING)
>> +        sched_ttwu_pending_locked(rq);
>> +
>>       if (p->on_rq) {
>>           struct migration_arg arg = { p, dest_cpu };
>>           /* Need help from migration thread: drop lock and wait. */
>> @@ -4576,6 +4585,10 @@ static int __migrate_task(struct task_struct *p, int src_cpu, int dest_cpu)
>>       if (!cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
>>           goto fail;
>>
>> +    /* Ensure it is on rq for migration if it is waking */
>> +    if (p->state == TASK_WAKING)
>> +        sched_ttwu_pending_locked(rq_src);
>> +
>>       /*
>>        * If we're not on a rq, the next wake-up will ensure we're
>>        * placed properly.
>>
> 
> FYI, this patch appears to fix the problem. I was able to run for 3 days without hitting the warning.

Thank you for testing; it proves that we found the root cause.
Your tests are the most important part, and the coding comes second, so let it move forward step by step.

Thanks,
Lai

> 
> I see that you guys are still discussing the details of the fix. When you decide on a final solution I'm happy to retest. Just be sure to ask :). It is hard to tell what to test with so many patches and code snippets flying around all the time.
> 
> Happy coding.
> 


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: workqueue: WARN at at kernel/workqueue.c:2176
  2014-06-10  1:21                                       ` Lai Jiangshan
@ 2014-06-16  1:30                                         ` Lai Jiangshan
  0 siblings, 0 replies; 50+ messages in thread
From: Lai Jiangshan @ 2014-06-16  1:30 UTC (permalink / raw)
  To: jjherne, Peter Zijlstra
  Cc: Sasha Levin, Tejun Heo, LKML, Dave Jones, Ingo Molnar,
	Thomas Gleixner, Steven Rostedt

Hi, Peter

Ping...

thanks,
Lai

On 06/10/2014 09:21 AM, Lai Jiangshan wrote:
> On 06/09/2014 10:01 PM, Jason J. Herne wrote:
>> On 06/05/2014 06:54 AM, Lai Jiangshan wrote:
>>> ------------
>>>
>>> Subject: [PATCH] sched: migrate the waking tasks
>>>
>>> Current code silently skips migrating a waking task when TTWU_QUEUE is enabled.
>>>
>>> When a task is waking, it is pending on the wake_list of the rq, but
>>> it is not queued (task->on_rq == 0). In this case, set_cpus_allowed_ptr()
>>> and __migrate_task() will not migrate it because it is not queued.
>>>
>>> This behavior is incorrect: because the task has already been woken up, it
>>> will be running on the wrong CPU without correct placement until the next
>>> wake-up or cpus_allowed update.
>>>
>>> To fix this problem, we need to make the waking tasks queued (transition
>>> the waking tasks to running state) before migrating them.
>>>
>>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>>> ---
>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>> index 268a45e..d05a5a1 100644
>>> --- a/kernel/sched/core.c
>>> +++ b/kernel/sched/core.c
>>> @@ -1474,20 +1474,24 @@ static int ttwu_remote(struct task_struct *p, int wake_flags)
>>>   }
>>>
>>>   #ifdef CONFIG_SMP
>>> -static void sched_ttwu_pending(void)
>>> +static void sched_ttwu_pending_locked(struct rq *rq)
>>>   {
>>> -    struct rq *rq = this_rq();
>>>       struct llist_node *llist = llist_del_all(&rq->wake_list);
>>>       struct task_struct *p;
>>>
>>> -    raw_spin_lock(&rq->lock);
>>> -
>>>       while (llist) {
>>>           p = llist_entry(llist, struct task_struct, wake_entry);
>>>           llist = llist_next(llist);
>>>           ttwu_do_activate(rq, p, 0);
>>>       }
>>> +}
>>>
>>> +static void sched_ttwu_pending(void)
>>> +{
>>> +    struct rq *rq = this_rq();
>>> +
>>> +    raw_spin_lock(&rq->lock);
>>> +    sched_ttwu_pending_locked(rq);
>>>       raw_spin_unlock(&rq->lock);
>>>   }
>>>
>>> @@ -4530,6 +4534,11 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
>>>           goto out;
>>>
>>>       dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
>>> +
>>> +    /* Ensure it is on rq for migration if it is waking */
>>> +    if (p->state == TASK_WAKING)
>>> +        sched_ttwu_pending_locked(rq);
>>> +
>>>       if (p->on_rq) {
>>>           struct migration_arg arg = { p, dest_cpu };
>>>           /* Need help from migration thread: drop lock and wait. */
>>> @@ -4576,6 +4585,10 @@ static int __migrate_task(struct task_struct *p, int src_cpu, int dest_cpu)
>>>       if (!cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
>>>           goto fail;
>>>
>>> +    /* Ensure it is on rq for migration if it is waking */
>>> +    if (p->state == TASK_WAKING)
>>> +        sched_ttwu_pending_locked(rq_src);
>>> +
>>>       /*
>>>        * If we're not on a rq, the next wake-up will ensure we're
>>>        * placed properly.
>>>
>>
>> FYI, this patch appears to fix the problem. I was able to run for 3 days without hitting the warning.
> 
> Thank you for testing; it proves that we found the root cause.
> Your tests are the most important part, and the coding comes second, so let it move forward step by step.
> 
> Thanks,
> Lai
> 
>>
>> I see that you guys are still discussing the details of the fix. When you decide on a final solution I'm happy to retest. Just be sure to ask :). It is hard to tell what to test with so many patches and code snippets flying around all the time.
>>
>> Happy coding.
>>
> 
> 


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: workqueue: WARN at at kernel/workqueue.c:2176
  2014-06-06 13:36                                     ` Peter Zijlstra
  2014-06-08  2:50                                       ` Lai Jiangshan
@ 2014-09-01  3:04                                       ` Lai Jiangshan
  2014-09-03 15:15                                         ` Peter Zijlstra
  1 sibling, 1 reply; 50+ messages in thread
From: Lai Jiangshan @ 2014-09-01  3:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: jjherne, Sasha Levin, Tejun Heo, LKML, Dave Jones, Ingo Molnar,
	Thomas Gleixner, Steven Rostedt

Hi, Peter

Could you make a patch for it, please? Jason J. Herne's testing showed that
we addressed the bug, but the fix is not in the kernel yet, and new, highly
related reports have come up again.

I don't want to argue any more; whatever form the patch takes, I will
accept it. Please add the following tags to your patch:

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Reported-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Tested-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Acked-by: Lai Jiangshan <laijs@cn.fujitsu.com>


Thanks,
Lai

On 06/06/2014 09:36 PM, Peter Zijlstra wrote:
> On Thu, Jun 05, 2014 at 06:54:35PM +0800, Lai Jiangshan wrote:
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 268a45e..d05a5a1 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -1474,20 +1474,24 @@ static int ttwu_remote(struct task_struct *p, int wake_flags)
>>  }
>>  
>>  #ifdef CONFIG_SMP
>> -static void sched_ttwu_pending(void)
>> +static void sched_ttwu_pending_locked(struct rq *rq)
>>  {
>> -	struct rq *rq = this_rq();
>>  	struct llist_node *llist = llist_del_all(&rq->wake_list);
>>  	struct task_struct *p;
>>  
>> -	raw_spin_lock(&rq->lock);
>> -
>>  	while (llist) {
>>  		p = llist_entry(llist, struct task_struct, wake_entry);
>>  		llist = llist_next(llist);
>>  		ttwu_do_activate(rq, p, 0);
>>  	}
>> +}
>>  
>> +static void sched_ttwu_pending(void)
>> +{
>> +	struct rq *rq = this_rq();
>> +
>> +	raw_spin_lock(&rq->lock);
>> +	sched_ttwu_pending_locked(rq);
>>  	raw_spin_unlock(&rq->lock);
>>  }
> 
> OK, so this won't apply to a recent kernel.
> 
>> @@ -4530,6 +4534,11 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
>>  		goto out;
>>  
>>  	dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
>> +
>> +	/* Ensure it is on rq for migration if it is waking */
>> +	if (p->state == TASK_WAKING)
>> +		sched_ttwu_pending_locked(rq);
> 
> So I would really rather like to avoid this if possible; it's doing full
> remote queueing, exactly what we tried to avoid.
> 
>> +
>>  	if (p->on_rq) {
>>  		struct migration_arg arg = { p, dest_cpu };
>>  		/* Need help from migration thread: drop lock and wait. */
>> @@ -4576,6 +4585,10 @@ static int __migrate_task(struct task_struct *p, int src_cpu, int dest_cpu)
>>  	if (!cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
>>  		goto fail;
>>  
>> +	/* Ensure it is on rq for migration if it is waking */
>> +	if (p->state == TASK_WAKING)
>> +		sched_ttwu_pending_locked(rq_src);
>> +
>>  	/*
>>  	 * If we're not on a rq, the next wake-up will ensure we're
>>  	 * placed properly.
> 
> Oh man, another variant.. why did you change it again? And without
> explanation for why you changed it.
> 
> I don't see a reason to call sched_ttwu_pending() with rq->lock held,
> seeing as how we append to that list without it held.
> 
> I'm still thinking the previous version is good, can you explain why you
> changed it?
> 
> 
> 
> 
> 
> 
> .
> 


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: workqueue: WARN at at kernel/workqueue.c:2176
  2014-09-01  3:04                                       ` Lai Jiangshan
@ 2014-09-03 15:15                                         ` Peter Zijlstra
  2014-09-04  2:22                                           ` Lai Jiangshan
  0 siblings, 1 reply; 50+ messages in thread
From: Peter Zijlstra @ 2014-09-03 15:15 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: jjherne, Sasha Levin, Tejun Heo, LKML, Dave Jones, Ingo Molnar,
	Thomas Gleixner, Steven Rostedt

On Mon, Sep 01, 2014 at 11:04:23AM +0800, Lai Jiangshan wrote:
> Hi, Peter
> 
> Could you make a patch for it, please? Jason J. Herne's testing showed that
> we addressed the bug, but the fix is not in the kernel yet, and new, highly
> related reports have come up again.
> 
> I don't want to argue any more; whatever form the patch takes, I will
> accept it. Please add the following tags to your patch:

Well; I said http://marc.info/?l=linux-kernel&m=140187009016179 was a
good patch and only asked you to add a comment and make it a nice patch
which I could apply: http://marc.info/?l=linux-kernel&m=140187477317886&w=2

Instead you posted an entirely different patch again.

So how about the below?

---
Subject: sched: Migrate waking tasks
From: Lai Jiangshan <laijs@cn.fujitsu.com>
Date: Wed, 4 Jun 2014 16:25:15 +0800

Current code can fail to migrate a waking task (silently) when TTWU_QUEUE is
enabled.

When a task is waking, it is pending on the wake_list of the rq, but it is not
queued (task->on_rq == 0). In this case, set_cpus_allowed_ptr() and
__migrate_task() will not migrate it because its invisible to them.

This behavior is incorrect, because the task has been already woken, it will be
running on the wrong CPU without correct placement until the next wake-up or
update for cpus_allowed.

To fix this problem, we need to finish the wakeup (so they appear on
the runqueue) before we migrate them.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Reported-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Tested-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/538ED7EB.5050303@cn.fujitsu.com
---
 kernel/sched/core.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4642,7 +4642,7 @@ int set_cpus_allowed_ptr(struct task_str
 		goto out;
 
 	dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
-	if (task_on_rq_queued(p)) {
+	if (task_on_rq_queued(p) || p->state == TASK_WAKING) {
 		struct migration_arg arg = { p, dest_cpu };
 		/* Need help from migration thread: drop lock and wait. */
 		task_rq_unlock(rq, p, &flags);
@@ -4775,6 +4775,12 @@ static int migration_cpu_stop(void *data
 	 * be on another cpu but it doesn't matter.
 	 */
 	local_irq_disable();
+	/*
+	 * We need to explicitly wake pending tasks before running
+	 * __migrate_task() such that we will not miss enforcing cpus_allowed
+	 * during wakeups, see set_cpus_allowed_ptr()'s TASK_WAKING test.
+	 */
+	sched_ttwu_pending();
 	__migrate_task(arg->task, raw_smp_processor_id(), arg->dest_cpu);
 	local_irq_enable();
 	return 0;


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: workqueue: WARN at at kernel/workqueue.c:2176
  2014-09-03 15:15                                         ` Peter Zijlstra
@ 2014-09-04  2:22                                           ` Lai Jiangshan
  2014-09-04  6:39                                             ` Peter Zijlstra
  0 siblings, 1 reply; 50+ messages in thread
From: Lai Jiangshan @ 2014-09-04  2:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: jjherne, Sasha Levin, Tejun Heo, LKML, Dave Jones, Ingo Molnar,
	Thomas Gleixner, Steven Rostedt

On 09/03/2014 11:15 PM, Peter Zijlstra wrote:
> On Mon, Sep 01, 2014 at 11:04:23AM +0800, Lai Jiangshan wrote:
>> Hi, Peter
>>
>> Could you make a patch for it, please? Jason J. Herne's testing showed that
>> we addressed the bug, but the fix is not in the kernel yet, and new, highly
>> related reports have come up again.
>>
>> I don't want to argue any more; whatever form the patch takes, I will
>> accept it. Please add the following tags to your patch:
> 
> Well; I said http://marc.info/?l=linux-kernel&m=140187009016179 was a
> good patch and only asked you to add a comment and make it a nice patch
> which I could apply: http://marc.info/?l=linux-kernel&m=140187477317886&w=2
> 

I posted that patch as a quick fix for early testing...
It didn't match my own criteria, though those may have diverged from the
community's.

I'm sorry for having pushed my personal preferences onto the community, with
the result that you had to write the comment for my patch, which was my
responsibility.

> Instead you posted an entirely different patch again.
> 
> So how about the below?

Acked

> 
> ---
> Subject: sched: Migrate waking tasks
> From: Lai Jiangshan <laijs@cn.fujitsu.com>
> Date: Wed, 4 Jun 2014 16:25:15 +0800
> 
> Current code can fail to migrate a waking task (silently) when TTWU_QUEUE is
> enabled.
> 
> When a task is waking, it is pending on the wake_list of the rq, but it is not
> queued (task->on_rq == 0). In this case, set_cpus_allowed_ptr() and
> __migrate_task() will not migrate it because it is invisible to them.
> 
> This behavior is incorrect, because the task has been already woken, it will be
> running on the wrong CPU without correct placement until the next wake-up or
> update for cpus_allowed.
> 
> To fix this problem, we need to finish the wakeup (so they appear on
> the runqueue) before we migrate them.
> 
> Reported-by: Sasha Levin <sasha.levin@oracle.com>
> Reported-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
> Tested-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Link: http://lkml.kernel.org/r/538ED7EB.5050303@cn.fujitsu.com
> ---
>  kernel/sched/core.c |    8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4642,7 +4642,7 @@ int set_cpus_allowed_ptr(struct task_str
>  		goto out;
>  
>  	dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
> -	if (task_on_rq_queued(p)) {
> +	if (task_on_rq_queued(p) || p->state == TASK_WAKING) {

Unrelated question: why do we have to stop the CPU even when the task is
not running?


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: workqueue: WARN at at kernel/workqueue.c:2176
  2014-09-04  2:22                                           ` Lai Jiangshan
@ 2014-09-04  6:39                                             ` Peter Zijlstra
  0 siblings, 0 replies; 50+ messages in thread
From: Peter Zijlstra @ 2014-09-04  6:39 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: jjherne, Sasha Levin, Tejun Heo, LKML, Dave Jones, Ingo Molnar,
	Thomas Gleixner, Steven Rostedt

On Thu, Sep 04, 2014 at 10:22:08AM +0800, Lai Jiangshan wrote:
> >  	dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
> > -	if (task_on_rq_queued(p)) {
> > +	if (task_on_rq_queued(p) || p->state == TASK_WAKING) {
> 
> Unrelated question: why do we have to stop the CPU even when the task is
> not running?

Strictly speaking it's not needed, I suppose; this was probably done to
reduce complexity. Setting affinity is a rare operation, after all.

Maybe Ingo can remember; this is certainly from before I started poking
at things.. :-)

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [tip:sched/core] sched: Migrate waking tasks
  2014-06-04  8:25                               ` Lai Jiangshan
  2014-06-04  9:39                                 ` Peter Zijlstra
@ 2014-09-09 14:52                                 ` tip-bot for Lai Jiangshan
  2014-09-10  7:38                                   ` Kirill Tkhai
  1 sibling, 1 reply; 50+ messages in thread
From: tip-bot for Lai Jiangshan @ 2014-09-09 14:52 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, sasha.levin, hpa, mingo, torvalds, peterz, tglx,
	laijs, jjherne

Commit-ID:  5cd038f53ed9ec7a17ab7d536a727363080f4210
Gitweb:     http://git.kernel.org/tip/5cd038f53ed9ec7a17ab7d536a727363080f4210
Author:     Lai Jiangshan <laijs@cn.fujitsu.com>
AuthorDate: Wed, 4 Jun 2014 16:25:15 +0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 9 Sep 2014 06:47:27 +0200

sched: Migrate waking tasks

Current code can fail to migrate a waking task (silently) when TTWU_QUEUE is
enabled.

When a task is waking, it is pending on the wake_list of the rq, but it is not
queued (task->on_rq == 0). In this case, set_cpus_allowed_ptr() and
__migrate_task() will not migrate it because it is invisible to them.

This behavior is incorrect: the task has already been woken, so it will run
on the wrong CPU without correct placement until the next wake-up or update
of cpus_allowed.

To fix this problem, we need to finish the wakeup (so the task appears on
the runqueue) before we migrate it.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Reported-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Tested-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/538ED7EB.5050303@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a814b3c..78e5c83 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4666,7 +4666,7 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
 		goto out;
 
 	dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
-	if (task_on_rq_queued(p)) {
+	if (task_on_rq_queued(p) || p->state == TASK_WAKING) {
 		struct migration_arg arg = { p, dest_cpu };
 		/* Need help from migration thread: drop lock and wait. */
 		task_rq_unlock(rq, p, &flags);
@@ -4799,6 +4799,12 @@ static int migration_cpu_stop(void *data)
 	 * be on another cpu but it doesn't matter.
 	 */
 	local_irq_disable();
+	/*
+	 * We need to explicitly wake pending tasks before running
+	 * __migrate_task() such that we will not miss enforcing cpus_allowed
+	 * during wakeups, see set_cpus_allowed_ptr()'s TASK_WAKING test.
+	 */
+	sched_ttwu_pending();
 	__migrate_task(arg->task, raw_smp_processor_id(), arg->dest_cpu);
 	local_irq_enable();
 	return 0;


* Re: [tip:sched/core] sched: Migrate waking tasks
  2014-09-09 14:52                                 ` [tip:sched/core] sched: Migrate waking tasks tip-bot for Lai Jiangshan
@ 2014-09-10  7:38                                   ` Kirill Tkhai
  2014-09-10  7:53                                     ` Peter Zijlstra
  0 siblings, 1 reply; 50+ messages in thread
From: Kirill Tkhai @ 2014-09-10  7:38 UTC (permalink / raw)
  To: mingo, hpa, sasha.levin, linux-kernel, torvalds, peterz, tglx,
	jjherne, laijs, linux-tip-commits

09.09.2014, 18:54, "tip-bot for Lai Jiangshan" <tipbot@zytor.com>:
> Commit-ID:  5cd038f53ed9ec7a17ab7d536a727363080f4210
> Gitweb:     http://git.kernel.org/tip/5cd038f53ed9ec7a17ab7d536a727363080f4210
> Author:     Lai Jiangshan <laijs@cn.fujitsu.com>
> AuthorDate: Wed, 4 Jun 2014 16:25:15 +0800
> Committer:  Ingo Molnar <mingo@kernel.org>
> CommitDate: Tue, 9 Sep 2014 06:47:27 +0200
>
> sched: Migrate waking tasks
>
> Current code can fail to migrate a waking task (silently) when TTWU_QUEUE is
> enabled.
>
> When a task is waking, it is pending on the wake_list of the rq, but it is not
> queued (task->on_rq == 0). In this case, set_cpus_allowed_ptr() and
> __migrate_task() will not migrate it because it is invisible to them.
>
> This behavior is incorrect: the task has already been woken, so it will run
> on the wrong CPU without correct placement until the next wake-up or update
> of cpus_allowed.
>
> To fix this problem, we need to finish the wakeup (so the task appears on
> the runqueue) before we migrate it.
>
> Reported-by: Sasha Levin <sasha.levin@oracle.com>
> Reported-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
> Tested-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Link: http://lkml.kernel.org/r/538ED7EB.5050303@cn.fujitsu.com
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> ---
>  kernel/sched/core.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index a814b3c..78e5c83 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4666,7 +4666,7 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
>  		goto out;
>
>	dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
> -	if (task_on_rq_queued(p)) {
> +	if (task_on_rq_queued(p) || p->state == TASK_WAKING) {
>  		struct migration_arg arg = { p, dest_cpu };
>  		/* Need help from migration thread: drop lock and wait. */
>  		task_rq_unlock(rq, p, &flags);


About migration_cpu_stop():

> @@ -4799,6 +4799,12 @@ static int migration_cpu_stop(void *data)
>  	* be on another cpu but it doesn't matter.
>  	*/
>  	local_irq_disable();
> +	/*
> +	* We need to explicitly wake pending tasks before running
> +	* __migrate_task() such that we will not miss enforcing cpus_allowed
> +	* during wakeups, see set_cpus_allowed_ptr()'s TASK_WAKING test.
> +	*/
> +	sched_ttwu_pending();
>	__migrate_task(arg->task, raw_smp_processor_id(), arg->dest_cpu);
>	local_irq_enable();
>	return 0;

It looks like we do not need this hunk, because the IPI happens earlier than
the stop class begins executing migration_cpu_stop().

In the first hunk the check "p->state == TASK_WAKING" is under pi_lock,
so if the task is really waking then the IPI has already been sent.

So, the first hunk is enough here; the second is not needed.

Kirill

P.S. Formatting may be in punk style


* Re: [tip:sched/core] sched: Migrate waking tasks
  2014-09-10  7:38                                   ` Kirill Tkhai
@ 2014-09-10  7:53                                     ` Peter Zijlstra
  0 siblings, 0 replies; 50+ messages in thread
From: Peter Zijlstra @ 2014-09-10  7:53 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: mingo, hpa, sasha.levin, linux-kernel, torvalds, tglx, jjherne,
	laijs, linux-tip-commits


On Wed, Sep 10, 2014 at 11:38:05AM +0400, Kirill Tkhai wrote:
> 09.09.2014, 18:54, "tip-bot for Lai Jiangshan" <tipbot@zytor.com>:
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index a814b3c..78e5c83 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -4666,7 +4666,7 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
> >  		goto out;
> >
> >	dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
> > -	if (task_on_rq_queued(p)) {
> > +	if (task_on_rq_queued(p) || p->state == TASK_WAKING) {
> >  		struct migration_arg arg = { p, dest_cpu };
> >  		/* Need help from migration thread: drop lock and wait. */
> >  		task_rq_unlock(rq, p, &flags);
> 
> 
> About migration_cpu_stop():
> 
> > @@ -4799,6 +4799,12 @@ static int migration_cpu_stop(void *data)
> >  	* be on another cpu but it doesn't matter.
> >  	*/
> >  	local_irq_disable();
> > +	/*
> > +	* We need to explicitly wake pending tasks before running
> > +	* __migrate_task() such that we will not miss enforcing cpus_allowed
> > +	* during wakeups, see set_cpus_allowed_ptr()'s TASK_WAKING test.
> > +	*/
> > +	sched_ttwu_pending();
> >	__migrate_task(arg->task, raw_smp_processor_id(), arg->dest_cpu);
> >	local_irq_enable();
> >	return 0;
> 
> It looks like we do not need this hunk, because IPI happens earlier then
> stop class begins migration_cpu_stop() execution.
> 
> In the first hunk the check "p->state == TASK_WAKING" is under pi_lock,
> so if the task is really waking then the IPI is already set.
> 
> So, the first hunk is enough here, the second is not need.

If the cpu is idle and has TIF_POLLING_NRFLAG we'll never send the IPI;
then again, the wake from idle required to start running the stop task
will also flush that pending queue. So you're probably right.

I'll leave it in though, better safe than sorry, and it's not a critical
fast path. Good thinking though.



end of thread, other threads:[~2014-09-10  7:53 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-12 18:58 workqueue: WARN at at kernel/workqueue.c:2176 Sasha Levin
2014-05-12 20:01 ` Tejun Heo
2014-05-13  2:19   ` Lai Jiangshan
2014-05-13  2:17     ` Sasha Levin
2014-05-14 16:52       ` Jason J. Herne
2014-05-16  3:50         ` Lai Jiangshan
2014-05-16  9:35           ` Peter Zijlstra
2014-05-16  9:56             ` Lai Jiangshan
2014-05-16 10:29               ` Peter Zijlstra
2014-05-16 10:15             ` Peter Zijlstra
2014-05-16 10:16               ` Peter Zijlstra
2014-05-16 10:39                 ` Peter Zijlstra
2014-05-16 11:57           ` Peter Zijlstra
2014-05-16 12:08             ` Tejun Heo
2014-05-16 12:14               ` Thomas Gleixner
2014-05-16 12:16                 ` Tejun Heo
2014-05-16 16:18             ` Lai Jiangshan
2014-05-16 16:29               ` Peter Zijlstra
2014-05-27 14:18                 ` Jason J. Herne
2014-05-27 14:26                   ` Peter Zijlstra
2014-05-29 16:23                     ` Jason J. Herne
2014-06-03 11:24                       ` Lai Jiangshan
2014-06-03 12:45                         ` Lai Jiangshan
2014-06-03 14:28                           ` Peter Zijlstra
2014-06-04  1:47                             ` Lai Jiangshan
2014-06-03 14:16                         ` Peter Zijlstra
2014-06-04  2:27                           ` Lai Jiangshan
2014-06-04  6:49                             ` Peter Zijlstra
2014-06-04  8:25                               ` Lai Jiangshan
2014-06-04  9:39                                 ` Peter Zijlstra
2014-06-05 10:54                                   ` Lai Jiangshan
2014-06-05 15:22                                     ` Jason J. Herne
2014-06-06 12:39                                     ` Jason J. Herne
2014-06-06 13:36                                     ` Peter Zijlstra
2014-06-08  2:50                                       ` Lai Jiangshan
2014-09-01  3:04                                       ` Lai Jiangshan
2014-09-03 15:15                                         ` Peter Zijlstra
2014-09-04  2:22                                           ` Lai Jiangshan
2014-09-04  6:39                                             ` Peter Zijlstra
2014-06-09 14:01                                     ` Jason J. Herne
2014-06-10  1:21                                       ` Lai Jiangshan
2014-06-16  1:30                                         ` Lai Jiangshan
2014-09-09 14:52                                 ` [tip:sched/core] sched: Migrate waking tasks tip-bot for Lai Jiangshan
2014-09-10  7:38                                   ` Kirill Tkhai
2014-09-10  7:53                                     ` Peter Zijlstra
2014-06-04  2:28                         ` workqueue: WARN at at kernel/workqueue.c:2176 Lai Jiangshan
2014-06-04  6:48                           ` Peter Zijlstra
2014-05-19 13:07           ` [tip:sched/core] sched: Fix hotplug vs set_cpus_allowed_ptr() tip-bot for Lai Jiangshan
2014-05-22 12:26           ` [tip:sched/core] sched: Fix hotplug vs. set_cpus_allowed_ptr() tip-bot for Lai Jiangshan
2014-05-22 22:02             ` Srivatsa S. Bhat
