linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] workqueue: Ensure that cpumask set for pools created after boot
@ 2017-05-10 16:48 Michael Bringmann
  2017-05-10 17:33 ` Tejun Heo
  0 siblings, 1 reply; 26+ messages in thread
From: Michael Bringmann @ 2017-05-10 16:48 UTC (permalink / raw)
  To: Tejun Heo, Lai Jiangshan, linux-kernel; +Cc: Nathan Fontenot, Michael Bringmann


On NUMA systems with dynamic processors, the content of the cpumask
may change over time.  As new processors are added via DLPAR operations,
workqueues are created for them.  This patch ensures that the pools
created for new workqueues will be initialized with a cpumask before
the first worker is created, attached, and woken up.  If the mask is
not set up, then the kernel will crash when 'wakeup_process' is unable
to find a valid CPU to which to assign the new worker.

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
---
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index c74bf39..6091069 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3366,6 +3366,8 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
 	copy_workqueue_attrs(pool->attrs, attrs);
 	pool->node = target_node;
 
+	cpumask_copy(pool->attrs->cpumask, cpumask_of(smp_processor_id()));
+
 	/*
 	 * no_numa isn't a worker_pool attribute, always clear it.  See
 	 * 'struct workqueue_attrs' comments for detail.

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH] workqueue: Ensure that cpumask set for pools created after boot
  2017-05-10 16:48 [PATCH] workqueue: Ensure that cpumask set for pools created after boot Michael Bringmann
@ 2017-05-10 17:33 ` Tejun Heo
  2017-05-15 15:48   ` Michael Bringmann
  0 siblings, 1 reply; 26+ messages in thread
From: Tejun Heo @ 2017-05-10 17:33 UTC (permalink / raw)
  To: Michael Bringmann; +Cc: Lai Jiangshan, linux-kernel, Nathan Fontenot

Hello,

On Wed, May 10, 2017 at 11:48:17AM -0500, Michael Bringmann wrote:
> 
> On NUMA systems with dynamic processors, the content of the cpumask
> may change over time.  As new processors are added via DLPAR operations,
> workqueues are created for them.  This patch ensures that the pools
> created for new workqueues will be initialized with a cpumask before
> the first worker is created, attached, and woken up.  If the mask is
> not set up, then the kernel will crash when 'wakeup_process' is unable
> to find a valid CPU to which to assign the new worker.
> 
> Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
> ---
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index c74bf39..6091069 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -3366,6 +3366,8 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
>  	copy_workqueue_attrs(pool->attrs, attrs);
>  	pool->node = target_node;
>  
> +	cpumask_copy(pool->attrs->cpumask, cpumask_of(smp_processor_id()));

What prevents a cpu getting added right here tho?

Maybe the right thing to do is protecting the whole thing with hotplug
readlock?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] workqueue: Ensure that cpumask set for pools created after boot
  2017-05-10 17:33 ` Tejun Heo
@ 2017-05-15 15:48   ` Michael Bringmann
  2017-05-16 15:55     ` Tejun Heo
  0 siblings, 1 reply; 26+ messages in thread
From: Michael Bringmann @ 2017-05-15 15:48 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Lai Jiangshan, linux-kernel, Nathan Fontenot, Michael Bringmann

Hello:

On 05/10/2017 12:33 PM, Tejun Heo wrote:
> Hello,
> 
> On Wed, May 10, 2017 at 11:48:17AM -0500, Michael Bringmann wrote:
>>
>> On NUMA systems with dynamic processors, the content of the cpumask
>> may change over time.  As new processors are added via DLPAR operations,
>> workqueues are created for them.  This patch ensures that the pools
>> created for new workqueues will be initialized with a cpumask before
>> the first worker is created, attached, and woken up.  If the mask is
>> not set up, then the kernel will crash when 'wakeup_process' is unable
>> to find a valid CPU to which to assign the new worker.
>>
>> Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
>> ---
>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>> index c74bf39..6091069 100644
>> --- a/kernel/workqueue.c
>> +++ b/kernel/workqueue.c
>> @@ -3366,6 +3366,8 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
>>  	copy_workqueue_attrs(pool->attrs, attrs);
>>  	pool->node = target_node;
>>  
>> +	cpumask_copy(pool->attrs->cpumask, cpumask_of(smp_processor_id()));
> 
> What prevents a cpu getting added right here tho?

PowerPC has only one control path to add/remove CPUs via DLPAR operations.
Even so, the underlying code is protected through multiple locks.

> 
> Maybe the right thing to do is protecting the whole thing with hotplug
> readlock?

The operation is already within a hotplug readlock when performing DLPAR
add/remove.  Adding a CPU to the system, requires it to be brought online.
Removing a CPU from the system, requires it to be taken offline.  These
involve calls to cpu_up / cpu_down, which go through _cpu_up / _cpu_down,
which acquire the hotplug locks, among others along the path of execution.

The locks are acquired before getting to the workqueue code, the pool
creation/attachment code (which is where the cpu mask needs to be set),
or trying to wakeup the initial created task in 'sched.c'.

> 
> Thanks.
> 

Regards,
Michael

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:       (512) 466-0650
mwb@linux.vnet.ibm.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] workqueue: Ensure that cpumask set for pools created after boot
  2017-05-15 15:48   ` Michael Bringmann
@ 2017-05-16 15:55     ` Tejun Heo
  2017-05-23 19:44       ` Michael Bringmann
  0 siblings, 1 reply; 26+ messages in thread
From: Tejun Heo @ 2017-05-16 15:55 UTC (permalink / raw)
  To: Michael Bringmann; +Cc: Lai Jiangshan, linux-kernel, Nathan Fontenot

Hello, Michael.

On Mon, May 15, 2017 at 10:48:04AM -0500, Michael Bringmann wrote:
> >> --- a/kernel/workqueue.c
> >> +++ b/kernel/workqueue.c
> >> @@ -3366,6 +3366,8 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
> >>  	copy_workqueue_attrs(pool->attrs, attrs);
> >>  	pool->node = target_node;
> >>  
> >> +	cpumask_copy(pool->attrs->cpumask, cpumask_of(smp_processor_id()));
> > 
> > What prevents a cpu getting added right here tho?
> 
> PowerPC has only one control path to add/remove CPUs via DLPAR operations.
> Even so, the underlying code is protected through multiple locks.

The more I look at the patch, the less sense it seems to make.  So,
whenever we create a new pool, we ignore the requested cpumask and
override it with the cpumask of the current thread?

> > Maybe the right thing to do is protecting the whole thing with hotplug
> > readlock?
> 
> The operation is already within a hotplug readlock when performing DLPAR
> add/remove.  Adding a CPU to the system, requires it to be brought online.
> Removing a CPU from the system, requires it to be taken offline.  These
> involve calls to cpu_up / cpu_down, which go through _cpu_up / _cpu_down,
> which acquire the hotplug locks, among others along the path of execution.
> 
> The locks are acquired before getting to the workqueue code, the pool
> creation/attachment code (which is where the cpu mask needs to be set),
> or trying to wakeup the initial created task in 'sched.c'.

A new unbound workqueue and thus unbound pool can also be created from
paths outside cpu hotplug, so get_unbound_pool() can race against
hotplug.  Can you please explain the failures that you see in more
detail?  I'm sure your patch works around the issue somehow but it
doesn't look like the right fix.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] workqueue: Ensure that cpumask set for pools created after boot
  2017-05-16 15:55     ` Tejun Heo
@ 2017-05-23 19:44       ` Michael Bringmann
  2017-05-23 19:49         ` Tejun Heo
  0 siblings, 1 reply; 26+ messages in thread
From: Michael Bringmann @ 2017-05-23 19:44 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Lai Jiangshan, linux-kernel, Nathan Fontenot



On 05/16/2017 10:55 AM, Tejun Heo wrote:
> Hello, Michael.
> 
> On Mon, May 15, 2017 at 10:48:04AM -0500, Michael Bringmann wrote:
>>>> --- a/kernel/workqueue.c
>>>> +++ b/kernel/workqueue.c
>>>> @@ -3366,6 +3366,8 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
>>>>  	copy_workqueue_attrs(pool->attrs, attrs);
>>>>  	pool->node = target_node;
>>>>  
>>>> +	cpumask_copy(pool->attrs->cpumask, cpumask_of(smp_processor_id()));
>>>
>>> What prevents a cpu getting added right here tho?
>>
>> PowerPC has only one control path to add/remove CPUs via DLPAR operations.
>> Even so, the underlying code is protected through multiple locks.
> 
> The more I look at the patch, the less sense it seems to make.  So,
> whenever we create a new pool, we ignore the requested cpumask and
> override it with the cpumask of the current thread?

No.  As I mentioned previously, the operation/problem occurs within a DLPAR
hotplug add/remove operation.  This is happening to a node which previously
did not have any CPUs associated to it -- we are trying to add more resources
to an LPAR / partition.  At this point, the cpumask for the node is empty / zero.
Sorry for not being more clear on this point earlier.
 
>>> Maybe the right thing to do is protecting the whole thing with hotplug
>>> readlock?
>>
>> The operation is already within a hotplug readlock when performing DLPAR
>> add/remove.  Adding a CPU to the system, requires it to be brought online.
>> Removing a CPU from the system, requires it to be taken offline.  These
>> involve calls to cpu_up / cpu_down, which go through _cpu_up / _cpu_down,
>> which acquire the hotplug locks, among others along the path of execution.
>>
>> The locks are acquired before getting to the workqueue code, the pool
>> creation/attachment code (which is where the cpu mask needs to be set),
>> or trying to wakeup the initial created task in 'sched.c'.
> 
> A new unbound workqueue and thus unbound pool can also be created from
> paths outside cpu hotplug, so get_unbound_pool() can race against
> hotplug.  Can you please explain the failures that you see in more
> detail?  I'm sure your patch works around the issue somehow but it
> doesn't look like the right fix.

We fill in an empty cpumask field with a guaranteed non-empty value.
I verified that the incoming cpumask in the attrs was zero at this point
preceding the failure.  If we proceed without putting in a useful value,
we go to 'wake_up_process()' (kernel/sched/core.c) next to wakeup the new
worker for the new unbound pool.  While there, the code runs through
'select_task_rq()' and invokes cpumask_any() on a copy of the cpumask.
Unfortunately, running that function over an empty/non-initialized cpumask
returns an index beyond the end of the list, resulting shortly thereafter
in an instruction/data fetch exception.

If you have a suggestion for an alternate non-empty value to use, I would
be happy to try it.

> 
> Thanks.
> 

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:       (512) 466-0650
mwb@linux.vnet.ibm.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] workqueue: Ensure that cpumask set for pools created after boot
  2017-05-23 19:44       ` Michael Bringmann
@ 2017-05-23 19:49         ` Tejun Heo
  2017-05-23 20:09           ` Michael Bringmann
  0 siblings, 1 reply; 26+ messages in thread
From: Tejun Heo @ 2017-05-23 19:49 UTC (permalink / raw)
  To: Michael Bringmann; +Cc: Lai Jiangshan, linux-kernel, Nathan Fontenot

Hello, Michael.

On Tue, May 23, 2017 at 02:44:23PM -0500, Michael Bringmann wrote:
> On 05/16/2017 10:55 AM, Tejun Heo wrote:
> > Hello, Michael.
> > 
> > On Mon, May 15, 2017 at 10:48:04AM -0500, Michael Bringmann wrote:
> >>>> --- a/kernel/workqueue.c
> >>>> +++ b/kernel/workqueue.c
> >>>> @@ -3366,6 +3366,8 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
> >>>>  	copy_workqueue_attrs(pool->attrs, attrs);
> >>>>  	pool->node = target_node;
> >>>>  
> >>>> +	cpumask_copy(pool->attrs->cpumask, cpumask_of(smp_processor_id()));
> >>>
> >>> What prevents a cpu getting added right here tho?
> >>
> >> PowerPC has only one control path to add/remove CPUs via DLPAR operations.
> >> Even so, the underlying code is protected through multiple locks.
> > 
> > The more I look at the patch, the less sense it seems to make.  So,
> > whenever we create a new pool, we ignore the requested cpumask and
> > override it with the cpumask of the current thread?
> 
> No.  As I mentioned previously, the operation/problem occurs within a DLPAR
> hotplug add/remove operation.  This is happening to a node which previously

But that's what the code is doing.  Whenever it creates a new unbound
pool, it ends up ignoring the requested cpumask and overwrites it with
the cpumask containing self.

> did not have any CPUs associated to it -- we are trying to add more resources
> to an LPAR / partition.  At this point, the cpumask for the node is empty / zero.
> Sorry for not being more clear on this point earlier.
...
> > A new unbound workqueue and thus unbound pool can also be created from
> > paths outside cpu hotplug, so get_unbound_pool() can race against
> > hotplug.  Can you please explain the failures that you see in more
> > detail?  I'm sure your patch works around the issue somehow but it
> > doesn't look like the right fix.
> 
> We fill in an empty cpumask field with a guaranteed non-empty value.
> I verified that the incoming cpumask in the attrs was zero at this point
> preceding the failure.  If we proceed without putting in a useful value,
> we go to 'wake_up_process()' (kernel/sched/core.c) next to wakeup the new
> worker for the new unbound pool.  While there, the code runs through
> 'select_task_rq()' and invokes cpumask_any() on a copy of the cpumask.
> Unfortunately, running that function over an empty/non-initialized cpumask
> returns an index beyond the end of the list, resulting shortly thereafter
> in an instruction/data fetch exception.
> 
> If you have a suggestion for an alternate non-empty value to use, I would
> be happy to try it.

Can you please post the backtrace of the problematic worker pool being
created (WARN_ON empty cpumask while creating a new pool)?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] workqueue: Ensure that cpumask set for pools created after boot
  2017-05-23 19:49         ` Tejun Heo
@ 2017-05-23 20:09           ` Michael Bringmann
  2017-05-23 20:10             ` Tejun Heo
  0 siblings, 1 reply; 26+ messages in thread
From: Michael Bringmann @ 2017-05-23 20:09 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Lai Jiangshan, linux-kernel, Nathan Fontenot

To confirm, you want the WARN_ON(cpumask_any(pool->attrs->cpumask) >= NR_CPUS)
at the point where I place my current patch?

On 05/23/2017 02:49 PM, Tejun Heo wrote:
> Hello, Michael.
> 
> On Tue, May 23, 2017 at 02:44:23PM -0500, Michael Bringmann wrote:
>> On 05/16/2017 10:55 AM, Tejun Heo wrote:
>>> Hello, Michael.
>>>
>>> On Mon, May 15, 2017 at 10:48:04AM -0500, Michael Bringmann wrote:
>>>>>> --- a/kernel/workqueue.c
>>>>>> +++ b/kernel/workqueue.c
>>>>>> @@ -3366,6 +3366,8 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
>>>>>>  	copy_workqueue_attrs(pool->attrs, attrs);
>>>>>>  	pool->node = target_node;
>>>>>>  
>>>>>> +	cpumask_copy(pool->attrs->cpumask, cpumask_of(smp_processor_id()));
>>>>>
>>>>> What prevents a cpu getting added right here tho?
>>>>
>>>> PowerPC has only one control path to add/remove CPUs via DLPAR operations.
>>>> Even so, the underlying code is protected through multiple locks.
>>>
>>> The more I look at the patch, the less sense it seems to make.  So,
>>> whenever we create a new pool, we ignore the requested cpumask and
>>> override it with the cpumask of the current thread?
>>
>> No.  As I mentioned previously, the operation/problem occurs within a DLPAR
>> hotplug add/remove operation.  This is happening to a node which previously
> 
> But that's what the code is doing.  Whenever it creates a new unbound
> pool, it ends up ignoring the requested cpumask and overwrites it with
> the cpumask containing self.
> 
>> did not have any CPUs associated to it -- we are trying to add more resources
>> to an LPAR / partition.  At this point, the cpumask for the node is empty / zero.
>> Sorry for not being more clear on this point earlier.
> ...
>>> A new unbound workqueue and thus unbound pool can also be created from
>>> paths outside cpu hotplug, so get_unbound_pool() can race against
>>> hotplug.  Can you please explain the failures that you see in more
>>> detail?  I'm sure your patch works around the issue somehow but it
>>> doesn't look like the right fix.
>>
>> We fill in an empty cpumask field with a guaranteed non-empty value.
>> I verified that the incoming cpumask in the attrs was zero at this point
>> preceding the failure.  If we proceed without putting in a useful value,
>> we go to 'wake_up_process()' (kernel/sched/core.c) next to wakeup the new
>> worker for the new unbound pool.  While there, the code runs through
>> 'select_task_rq()' and invokes cpumask_any() on a copy of the cpumask.
>> Unfortunately, running that function over an empty/non-initialized cpumask
>> returns an index beyond the end of the list, resulting shortly thereafter
>> in an instruction/data fetch exception.
>>
>> If you have a suggestion for an alternate non-empty value to use, I would
>> be happy to try it.
> 
> Can you please post the backtrace of the problematic worker pool being
> created (WARN_ON empty cpumask while creating a new pool)?
> 
> Thanks.
> 

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:       (512) 466-0650
mwb@linux.vnet.ibm.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] workqueue: Ensure that cpumask set for pools created after boot
  2017-05-23 20:09           ` Michael Bringmann
@ 2017-05-23 20:10             ` Tejun Heo
  2017-05-24 16:30               ` Michael Bringmann
  2017-05-24 23:39               ` Michael Bringmann
  0 siblings, 2 replies; 26+ messages in thread
From: Tejun Heo @ 2017-05-23 20:10 UTC (permalink / raw)
  To: Michael Bringmann; +Cc: Lai Jiangshan, linux-kernel, Nathan Fontenot

Hello,

On Tue, May 23, 2017 at 03:09:07PM -0500, Michael Bringmann wrote:
> To confirm, you want the WARN_ON(cpumask_any(pool->attrs->cpumask) >= NR_CPUS)
> at the point where I place my current patch?

Yeah, cpumask_weight() probably is a bit more intuitive but I'm
curious why we're creating workqueues for a node before cpus come
online.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] workqueue: Ensure that cpumask set for pools created after boot
  2017-05-23 20:10             ` Tejun Heo
@ 2017-05-24 16:30               ` Michael Bringmann
  2017-05-24 23:39               ` Michael Bringmann
  1 sibling, 0 replies; 26+ messages in thread
From: Michael Bringmann @ 2017-05-24 16:30 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Lai Jiangshan, linux-kernel, Nathan Fontenot



On 05/23/2017 03:10 PM, Tejun Heo wrote:
> Hello,
> 
> On Tue, May 23, 2017 at 03:09:07PM -0500, Michael Bringmann wrote:
>> To confirm, you want the WARN_ON(cpumask_any(pool->attrs->cpumask) >= NR_CPUS)
>> at the point where I place my current patch?
> 
> Yeah, cpumask_weight() probably is a bit more intuitive but I'm
> curious why we're creating workqueues for a node before cpus come
> online.
> 
> Thanks.
> 

I am in the middle of another test, but I did find this test crash log from
one of my earlier tests.  The system was configured for Shared Processors,
booting with 16 or so VPs, and then I was adding and removing them, and hit
this crash.  I will get the other log later.

[    8.599437] Unable to handle kernel paging request for unaligned access at address 0xc0000003c52231cf
[    8.599443] Faulting instruction address: 0xc00000000049c54c
[    8.599450] Oops: Kernel access of bad area, sig: 7 [#1]
[    8.599454] SMP NR_CPUS=2048 
[    8.599455] NUMA 
[    8.599458] pSeries
[    8.599463] Modules linked in:
[    8.599470] CPU: 35 PID: 1 Comm: swapper/0 Not tainted 4.10.0-rc6_VPHNt010+ #19
[    8.599475] task: c0000005f93c0001 task.stack: c000000bf8100000
[    8.599480] NIP: c00000000049c54c LR: c000000000101814 CTR: c0000000001190d0
[    8.599485] REGS: c000000bf8103520 TRAP: 0600   Not tainted  (4.10.0-rc6_VPHNt010+)
[    8.599490] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>
[    8.599495]   CR: 28108e44  XER: 0000000b
[    8.599501] CFAR: c000000000101810 DAR: c0000003c52231cf DSISR: 00000000 SOFTE: 0 
[    8.599501] GPR00: c0000000001017dc c000000bf81037a0 c000000000fd4c00 c0000005ef7c15a0 
[    8.599501] GPR04: c0000005ef7c15a0 c0000003c52231cf 0000000000000000 0000000000000000 
[    8.599501] GPR08: c000000001014c00 69665f716573006e fa00000000000000 0000000000000000 
[    8.599501] GPR12: c0000000001190d0 c00000000e5e3b00 c00000000000d718 0000000000000000 
[    8.599501] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
[    8.599501] GPR20: 0000000000000000 0000000000000000 c000000000c62b00 0000000000004000 
[    8.599501] GPR24: 00000003c45bfc57 c00000000100dbe0 0000000000000000 c000000000c62b00 
[    8.599501] GPR28: 00000003c45bfc57 c0000005ef7c1d4c 0000000000000800 c0000005ef7c1580 
[    8.634917] NIP [c00000000049c54c] llist_add_batch+0xc/0x40
[    8.634923] LR [c000000000101814] try_to_wake_up+0x3e4/0x500
[    8.634928] Call Trace:
[    8.634931] [c000000bf81037a0] [c0000000001017dc] try_to_wake_up+0x3ac/0x500 (unreliable)
[    8.634939] [c000000bf8103820] [c0000000000e5508] create_worker+0x148/0x250
[    8.679666] [c000000bf81038c0] [c0000000000e986c] alloc_unbound_pwq+0x3cc/0x4d0
[    8.679673] [c000000bf8103960] [c0000000000e9e9c] apply_wqattrs_prepare+0x2bc/0x330
[    8.679679] [c000000bf8103a10] [c0000000000e9f74] apply_workqueue_attrs_locked+0x64/0xd0
[    8.679685] [c000000bf8103a80] [c0000000000ea4f4] apply_workqueue_attrs+0x64/0xa0
[    8.679692] [c000000bf8103b00] [c0000000000ec19c] __alloc_workqueue_key+0x1cc/0x680
[    8.679700] [c000000bf8103be0] [c000000000bcb2a4] __machine_initcall_pseries_pseries_dlpar_init+0x50/0x8c
[    8.679707] [c000000bf8103c40] [c00000000000cef0] do_one_initcall+0x60/0x1c0
[    8.679714] [c000000bf8103d00] [c000000000bb423c] kernel_init_freeable+0x2a8/0x390
[    8.679719] [c000000bf8103dc0] [c00000000000d734] kernel_init+0x24/0x150
[    8.679726] [c000000bf8103e30] [c00000000000b4e8] ret_from_kernel_thread+0x5c/0x74
[    8.679731] Instruction dump:
[    8.706288] 60420000 7c832378 4e800020 60000000 60000000 60000000 60000000 60000000 
[    8.706296] 60000000 e9250000 f9240000 7c0004ac <7d4028a8> 7c2a4800 40c20010 7c6029ad 
[    8.706307] ---[ end trace b6256c8c7d38d99b ]---
[    8.706311] 
[   10.706438] Kernel panic - not syncing: Fatal exception
[   10.706704] Rebooting in 10 seconds..




-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:       (512) 466-0650
mwb@linux.vnet.ibm.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] workqueue: Ensure that cpumask set for pools created after boot
  2017-05-23 20:10             ` Tejun Heo
  2017-05-24 16:30               ` Michael Bringmann
@ 2017-05-24 23:39               ` Michael Bringmann
  2017-05-25 15:03                 ` Tejun Heo
  1 sibling, 1 reply; 26+ messages in thread
From: Michael Bringmann @ 2017-05-24 23:39 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Lai Jiangshan, linux-kernel, Nathan Fontenot


On 05/23/2017 03:10 PM, Tejun Heo wrote:
> Hello,
> 
> On Tue, May 23, 2017 at 03:09:07PM -0500, Michael Bringmann wrote:
>> To confirm, you want the WARN_ON(cpumask_any(pool->attrs->cpumask) >= NR_CPUS)
>> at the point where I place my current patch?
> 
> Yeah, cpumask_weight() probably is a bit more intuitive but I'm
> curious why we're creating workqueues for a node before cpus come
> online.
> 
> Thanks.
> 

And here is the full WARN_ONCE(!cpumask(pool->attrs->cpumask), "message")
following by the crash in kernel/sched/core.c, as I removed the patch as
well to demonstrate what would happen in the 4.12 kernel on powerpc.

1) Boot with Shared CPUs (2 CPUs, 8VPs) / Shared Memory (20G)
2) numactl -H
3) Hot-add 16 VPs to system, and run 'numactl -H'
4) Hot-remove 16 VPs from system.  Hit WARN_ONCE message in
   workqueue.c:get_unbound_pool(), followed by the crash in
   kernel/sched/core.c, as I also removed the patch that 'fixed'
   the cpumask.

-----------------------Beginning of Log-------------------------------------

Red Hat Enterprise Linux Server 7.3 (Maipo)
Kernel 4.12.0-rc1.wi91275_debug_03.ppc64le+ on an ppc64le

ltcalpine2-lp20 login: root
Password: 
Last login: Wed May 24 18:45:40 from oc1554177480.austin.ibm.com
[root@ltcalpine2-lp20 ~]# numactl -H
available: 2 nodes (0,6)
node 0 cpus:
node 0 size: 0 MB
node 0 free: 0 MB
node 6 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
node 6 size: 19858 MB
node 6 free: 16920 MB
node distances:
node   0   6 
  0:  10  40 
  6:  40  10 
[root@ltcalpine2-lp20 ~]# numactl -H
available: 2 nodes (0,6)
node 0 cpus:
node 0 size: 0 MB
node 0 free: 0 MB
node 6 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191
node 6 size: 19858 MB
node 6 free: 16362 MB
node distances:
node   0   6 
  0:  10  40 
  6:  40  10 
[root@ltcalpine2-lp20 ~]# [  321.310943] workqueue:get_unbound_pool has empty cpumask for pool attrs
[  321.310961] ------------[ cut here ]------------
[  321.310997] WARNING: CPU: 184 PID: 13201 at kernel/workqueue.c:3375 alloc_unbound_pwq+0x5c0/0x5e0
[  321.311005] Modules linked in: rpadlpar_io rpaphp dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag sg pseries_rng ghash_generic gf128mul xts vmx_crypto binfmt_misc ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod
[  321.311097] CPU: 184 PID: 13201 Comm: cpuhp/184 Not tainted 4.12.0-rc1.wi91275_debug_03.ppc64le+ #8
[  321.311106] task: c000000408961080 task.stack: c000000406394000
[  321.311113] NIP: c000000000116c80 LR: c000000000116c7c CTR: 0000000000000000
[  321.311121] REGS: c0000004063977b0 TRAP: 0700   Not tainted  (4.12.0-rc1.wi91275_debug_03.ppc64le+)
[  321.311128] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE>
[  321.311150]   CR: 28000082  XER: 00000000
[  321.311159] CFAR: c000000000a2dc80 SOFTE: 1 
[  321.311159] GPR00: c000000000116c7c c000000406397a30 c0000000013ae900 000000000000003b 
[  321.311159] GPR04: c000000408961a38 0000000000000006 00000000a49e41e5 ffffffffa4a5a483 
[  321.311159] GPR08: 00000000000062cc 0000000000000000 0000000000000000 c000000408961a38 
[  321.311159] GPR12: 0000000000000000 c00000000fb38c00 c00000000011e858 c00000040a902ac0 
[  321.311159] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
[  321.311159] GPR20: c000000406394000 0000000000000002 c000000406394000 0000000000000000 
[  321.311159] GPR24: c000000405075400 c000000404fc0000 0000000000000110 c0000000015a4c88 
[  321.311159] GPR28: 0000000000000000 c0000004fe256000 c0000004fe256008 c0000004fe052800 
[  321.311290] NIP [c000000000116c80] alloc_unbound_pwq+0x5c0/0x5e0
[  321.311298] LR [c000000000116c7c] alloc_unbound_pwq+0x5bc/0x5e0
[  321.311305] Call Trace:
[  321.311310] [c000000406397a30] [c000000000116c7c] alloc_unbound_pwq+0x5bc/0x5e0 (unreliable)
[  321.311323] [c000000406397ad0] [c000000000116e30] wq_update_unbound_numa+0x190/0x270
[  321.311334] [c000000406397b60] [c000000000118eb0] workqueue_offline_cpu+0xe0/0x130
[  321.311345] [c000000406397bf0] [c0000000000e9f20] cpuhp_invoke_callback+0x240/0xcd0
[  321.311355] [c000000406397cb0] [c0000000000eab28] cpuhp_down_callbacks+0x78/0xf0
[  321.311365] [c000000406397d00] [c0000000000eae6c] cpuhp_thread_fun+0x18c/0x1a0
[  321.311376] [c000000406397d30] [c0000000001251cc] smpboot_thread_fn+0x2fc/0x3b0
[  321.311386] [c000000406397dc0] [c00000000011e9c0] kthread+0x170/0x1b0
[  321.311397] [c000000406397e30] [c00000000000b4f4] ret_from_kernel_thread+0x5c/0x68
[  321.311406] Instruction dump:
[  321.311413] 3d42fff0 892ac565 2f890000 40fefd98 39200001 3c62ff89 3c82ff6c 3863d590 
[  321.311437] 38847cb0 992ac565 48916fc9 60000000 <0fe00000> 4bfffd70 60000000 60420000 
[  321.311462] ---[ end trace 9f7c1cd616b26de8 ]---
[  321.318347] Unable to handle kernel paging request for unaligned access at address 0xc0000003c5577ebf
[  321.318448] Faulting instruction address: 0xc00000000055ec8c
[  321.318457] Oops: Kernel access of bad area, sig: 7 [#1]
[  321.318462] SMP NR_CPUS=2048 
[  321.318463] NUMA 
[  321.318468] pSeries
[  321.318473] Modules linked in: rpadlpar_io rpaphp dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag sg pseries_rng ghash_generic gf128mul xts vmx_crypto binfmt_misc ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod
[  321.318524] CPU: 184 PID: 13201 Comm: cpuhp/184 Tainted: G        W       4.12.0-rc1.wi91275_debug_03.ppc64le+ #8
[  321.318532] task: c000000408961080 task.stack: c000000406394000
[  321.318537] NIP: c00000000055ec8c LR: c0000000001312d4 CTR: c000000000145d50
[  321.318544] REGS: c000000406397690 TRAP: 0600   Tainted: G        W        (4.12.0-rc1.wi91275_debug_03.ppc64le+)
[  321.318551] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>
[  321.318563]   CR: 28000024  XER: 00000000
[  321.318571] CFAR: c0000000001312d0 DAR: c0000003c5577ebf DSISR: 00000000 SOFTE: 0 
[  321.318571] GPR00: c000000000131298 c000000406397910 c0000000013ae900 c0000004b6d22820 
[  321.318571] GPR04: c0000004b6d22820 c0000003c5577ebf 0000000000000000 00000004f1230000 
[  321.318571] GPR08: 0000000d8ddb1ea7 0000000000000000 0000000000000008 c000000408961aa8 
[  321.318571] GPR12: c000000000145d50 c00000000fb38c00 c00000000011e858 c00000040a902ac0 
[  321.318571] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
[  321.318571] GPR20: c000000406394000 0000000000000002 0000000000004000 c000000000fb7700 
[  321.318571] GPR24: c0000000013f5d00 c0000000013f9d48 0000000000000000 c0000004b6d230e8 
[  321.318571] GPR28: 0000000000000004 00000003c45bfc57 0000000000000800 c0000004b6d22800 
[  321.318664] NIP [c00000000055ec8c] llist_add_batch+0xc/0x40
[  321.318670] LR [c0000000001312d4] try_to_wake_up+0x524/0x850
[  321.318675] Call Trace:
[  321.318679] [c000000406397910] [c000000000131298] try_to_wake_up+0x4e8/0x850 (unreliable)
[  321.318689] [c000000406397990] [c000000000111bf8] create_worker+0x148/0x220
[  321.318696] [c000000406397a30] [c000000000116ae8] alloc_unbound_pwq+0x428/0x5e0
[  321.318705] [c000000406397ad0] [c000000000116e30] wq_update_unbound_numa+0x190/0x270
[  321.318713] [c000000406397b60] [c000000000118eb0] workqueue_offline_cpu+0xe0/0x130
[  321.318721] [c000000406397bf0] [c0000000000e9f20] cpuhp_invoke_callback+0x240/0xcd0
[  321.318729] [c000000406397cb0] [c0000000000eab28] cpuhp_down_callbacks+0x78/0xf0
[  321.318737] [c000000406397d00] [c0000000000eae6c] cpuhp_thread_fun+0x18c/0x1a0
[  321.318745] [c000000406397d30] [c0000000001251cc] smpboot_thread_fn+0x2fc/0x3b0
[  321.318754] [c000000406397dc0] [c00000000011e9c0] kthread+0x170/0x1b0
[  321.318762] [c000000406397e30] [c00000000000b4f4] ret_from_kernel_thread+0x5c/0x68
[  321.318769] Instruction dump:
[  321.318775] 60420000 38600000 4e800020 60000000 60420000 7c832378 4e800020 60000000 
[  321.318790] 60000000 e9250000 f9240000 7c0004ac <7d4028a8> 7c2a4800 40c20010 7c6029ad 
[  321.318808] ---[ end trace 9f7c1cd616b26de9 ]---
[  321.322303] 
[  323.322505] Kernel panic - not syncing: Fatal exception
[  323.429027] Rebooting in 10 seconds..




-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:       (512) 466-0650
mwb@linux.vnet.ibm.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] workqueue: Ensure that cpumask set for pools created after boot
  2017-05-24 23:39               ` Michael Bringmann
@ 2017-05-25 15:03                 ` Tejun Heo
  2017-05-25 15:07                   ` Tejun Heo
  0 siblings, 1 reply; 26+ messages in thread
From: Tejun Heo @ 2017-05-25 15:03 UTC (permalink / raw)
  To: Michael Bringmann; +Cc: Lai Jiangshan, linux-kernel, Nathan Fontenot

Hello, Michael.

On Wed, May 24, 2017 at 06:39:49PM -0500, Michael Bringmann wrote:
> [  321.310961] ------------[ cut here ]------------
> [  321.310997] WARNING: CPU: 184 PID: 13201 at kernel/workqueue.c:3375 alloc_unbound_pwq+0x5c0/0x5e0
> [  321.311005] Modules linked in: rpadlpar_io rpaphp dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag sg pseries_rng ghash_generic gf128mul xts vmx_crypto binfmt_misc ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod
> [  321.311097] CPU: 184 PID: 13201 Comm: cpuhp/184 Not tainted 4.12.0-rc1.wi91275_debug_03.ppc64le+ #8
> [  321.311106] task: c000000408961080 task.stack: c000000406394000
> [  321.311113] NIP: c000000000116c80 LR: c000000000116c7c CTR: 0000000000000000
> [  321.311121] REGS: c0000004063977b0 TRAP: 0700   Not tainted  (4.12.0-rc1.wi91275_debug_03.ppc64le+)
> [  321.311128] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE>
> [  321.311150]   CR: 28000082  XER: 00000000
> [  321.311159] CFAR: c000000000a2dc80 SOFTE: 1 
> [  321.311159] GPR00: c000000000116c7c c000000406397a30 c0000000013ae900 000000000000003b 
> [  321.311159] GPR04: c000000408961a38 0000000000000006 00000000a49e41e5 ffffffffa4a5a483 
> [  321.311159] GPR08: 00000000000062cc 0000000000000000 0000000000000000 c000000408961a38 
> [  321.311159] GPR12: 0000000000000000 c00000000fb38c00 c00000000011e858 c00000040a902ac0 
> [  321.311159] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
> [  321.311159] GPR20: c000000406394000 0000000000000002 c000000406394000 0000000000000000 
> [  321.311159] GPR24: c000000405075400 c000000404fc0000 0000000000000110 c0000000015a4c88 
> [  321.311159] GPR28: 0000000000000000 c0000004fe256000 c0000004fe256008 c0000004fe052800 
> [  321.311290] NIP [c000000000116c80] alloc_unbound_pwq+0x5c0/0x5e0
> [  321.311298] LR [c000000000116c7c] alloc_unbound_pwq+0x5bc/0x5e0
> [  321.311305] Call Trace:
> [  321.311310] [c000000406397a30] [c000000000116c7c] alloc_unbound_pwq+0x5bc/0x5e0 (unreliable)
> [  321.311323] [c000000406397ad0] [c000000000116e30] wq_update_unbound_numa+0x190/0x270
> [  321.311334] [c000000406397b60] [c000000000118eb0] workqueue_offline_cpu+0xe0/0x130
> [  321.311345] [c000000406397bf0] [c0000000000e9f20] cpuhp_invoke_callback+0x240/0xcd0
> [  321.311355] [c000000406397cb0] [c0000000000eab28] cpuhp_down_callbacks+0x78/0xf0
> [  321.311365] [c000000406397d00] [c0000000000eae6c] cpuhp_thread_fun+0x18c/0x1a0
> [  321.311376] [c000000406397d30] [c0000000001251cc] smpboot_thread_fn+0x2fc/0x3b0
> [  321.311386] [c000000406397dc0] [c00000000011e9c0] kthread+0x170/0x1b0
> [  321.311397] [c000000406397e30] [c00000000000b4f4] ret_from_kernel_thread+0x5c/0x68

wq_update_unbound_numa() should have never called into
alloc_unbound_pwq() w/ empty node cpu mask.  It should have fallen
back to the dfl_pwq.  It looks like I just messed up the logic there
from the initial commit of the feature.  Can you please see whether
the following fixes the problem?

Thanks.

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index c74bf39ef764..e2c248d5223a 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3559,29 +3559,21 @@ static struct pool_workqueue *alloc_unbound_pwq(struct workqueue_struct *wq,
  * stable.
  *
  * Return: %true if the resulting @cpumask is different from @attrs->cpumask,
- * %false if equal.
+ * %false if equal.  On %false return, the content of @cpumask is undefined.
  */
 static bool wq_calc_node_cpumask(const struct workqueue_attrs *attrs, int node,
 				 int cpu_going_down, cpumask_t *cpumask)
 {
 	if (!wq_numa_enabled || attrs->no_numa)
-		goto use_dfl;
+		return false;
 
 	/* does @node have any online CPUs @attrs wants? */
 	cpumask_and(cpumask, cpumask_of_node(node), attrs->cpumask);
 	if (cpu_going_down >= 0)
 		cpumask_clear_cpu(cpu_going_down, cpumask);
 
-	if (cpumask_empty(cpumask))
-		goto use_dfl;
-
-	/* yeap, return possible CPUs in @node that @attrs wants */
-	cpumask_and(cpumask, attrs->cpumask, wq_numa_possible_cpumask[node]);
-	return !cpumask_equal(cpumask, attrs->cpumask);
-
-use_dfl:
-	cpumask_copy(cpumask, attrs->cpumask);
-	return false;
+	return !cpumask_empty(cpumask) &&
+		!cpumask_equal(cpumask, attrs->cpumask);
 }
 
 /* install @pwq into @wq's numa_pwq_tbl[] for @node and return the old pwq */

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH] workqueue: Ensure that cpumask set for pools created after boot
  2017-05-25 15:03                 ` Tejun Heo
@ 2017-05-25 15:07                   ` Tejun Heo
  2017-05-25 15:30                     ` Michael Bringmann
  0 siblings, 1 reply; 26+ messages in thread
From: Tejun Heo @ 2017-05-25 15:07 UTC (permalink / raw)
  To: Michael Bringmann; +Cc: Lai Jiangshan, linux-kernel, Nathan Fontenot

On Thu, May 25, 2017 at 11:03:53AM -0400, Tejun Heo wrote:
> wq_update_unbound_numa() should have never called into
> alloc_unbound_pwq() w/ empty node cpu mask.  It should have fallen
> back to the dfl_pwq.  It looks like I just messed up the logic there
> from the initial commit of the feature.  Can you please see whether
> the following fixes the problem?

Can you please try the following instead.  On the second thought, I
don't think the current logic is wrong.  If this fixes the issue,
somehow your setup is having a situation where online cpumask for a
node is a proper superset of possible cpumask for the node.

Thanks.

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index c74bf39ef764..4da5ff649ff8 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3559,13 +3559,13 @@ static struct pool_workqueue *alloc_unbound_pwq(struct workqueue_struct *wq,
  * stable.
  *
  * Return: %true if the resulting @cpumask is different from @attrs->cpumask,
- * %false if equal.
+ * %false if equal.  On %false return, the content of @cpumask is undefined.
  */
 static bool wq_calc_node_cpumask(const struct workqueue_attrs *attrs, int node,
 				 int cpu_going_down, cpumask_t *cpumask)
 {
 	if (!wq_numa_enabled || attrs->no_numa)
-		goto use_dfl;
+		return false;
 
 	/* does @node have any online CPUs @attrs wants? */
 	cpumask_and(cpumask, cpumask_of_node(node), attrs->cpumask);
@@ -3573,15 +3573,13 @@ static bool wq_calc_node_cpumask(const struct workqueue_attrs *attrs, int node,
 		cpumask_clear_cpu(cpu_going_down, cpumask);
 
 	if (cpumask_empty(cpumask))
-		goto use_dfl;
+		return false;
 
 	/* yeap, return possible CPUs in @node that @attrs wants */
 	cpumask_and(cpumask, attrs->cpumask, wq_numa_possible_cpumask[node]);
-	return !cpumask_equal(cpumask, attrs->cpumask);
 
-use_dfl:
-	cpumask_copy(cpumask, attrs->cpumask);
-	return false;
+	return !cpumask_empty(cpumask) &&
+		!cpumask_equal(cpumask, attrs->cpumask);
 }
 
 /* install @pwq into @wq's numa_pwq_tbl[] for @node and return the old pwq */

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH] workqueue: Ensure that cpumask set for pools created after boot
  2017-05-25 15:07                   ` Tejun Heo
@ 2017-05-25 15:30                     ` Michael Bringmann
  2017-06-06 16:18                       ` Michael Bringmann
  0 siblings, 1 reply; 26+ messages in thread
From: Michael Bringmann @ 2017-05-25 15:30 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Lai Jiangshan, linux-kernel, Nathan Fontenot

I will try that patch shortly.  I also updated my patch to be conditional
on whether the pool's cpumask attribute was empty.  You should have received
V2 of that patch by now.

As to your remark about 'proper subset of possible cpumask for the node',
would that not be the case when we are removing VPs?

On 05/25/2017 10:07 AM, Tejun Heo wrote:
> On Thu, May 25, 2017 at 11:03:53AM -0400, Tejun Heo wrote:
>> wq_update_unbound_numa() should have never called into
>> alloc_unbound_pwq() w/ empty node cpu mask.  It should have fallen
>> back to the dfl_pwq.  It looks like I just messed up the logic there
>> from the initial commit of the feature.  Can you please see whether
>> the following fixes the problem?
> 
> Can you please try the following instead.  On the second thought, I
> don't think the current logic is wrong.  If this fixes the issue,
> somehow your setup is having a situation where online cpumask for a
> node is a proper superset of possible cpumask for the node.
> 
> Thanks.
> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index c74bf39ef764..4da5ff649ff8 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -3559,13 +3559,13 @@ static struct pool_workqueue *alloc_unbound_pwq(struct workqueue_struct *wq,
>   * stable.
>   *
>   * Return: %true if the resulting @cpumask is different from @attrs->cpumask,
> - * %false if equal.
> + * %false if equal.  On %false return, the content of @cpumask is undefined.
>   */
>  static bool wq_calc_node_cpumask(const struct workqueue_attrs *attrs, int node,
>  				 int cpu_going_down, cpumask_t *cpumask)
>  {
>  	if (!wq_numa_enabled || attrs->no_numa)
> -		goto use_dfl;
> +		return false;
> 
>  	/* does @node have any online CPUs @attrs wants? */
>  	cpumask_and(cpumask, cpumask_of_node(node), attrs->cpumask);
> @@ -3573,15 +3573,13 @@ static bool wq_calc_node_cpumask(const struct workqueue_attrs *attrs, int node,
>  		cpumask_clear_cpu(cpu_going_down, cpumask);
> 
>  	if (cpumask_empty(cpumask))
> -		goto use_dfl;
> +		return false;
> 
>  	/* yeap, return possible CPUs in @node that @attrs wants */
>  	cpumask_and(cpumask, attrs->cpumask, wq_numa_possible_cpumask[node]);
> -	return !cpumask_equal(cpumask, attrs->cpumask);
> 
> -use_dfl:
> -	cpumask_copy(cpumask, attrs->cpumask);
> -	return false;
> +	return !cpumask_empty(cpumask) &&
> +		!cpumask_equal(cpumask, attrs->cpumask);
>  }
> 
>  /* install @pwq into @wq's numa_pwq_tbl[] for @node and return the old pwq */
> 
> 

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:       (512) 466-0650
mwb@linux.vnet.ibm.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] workqueue: Ensure that cpumask set for pools created after boot
  2017-05-25 15:30                     ` Michael Bringmann
@ 2017-06-06 16:18                       ` Michael Bringmann
  2017-06-06 18:09                         ` Tejun Heo
  0 siblings, 1 reply; 26+ messages in thread
From: Michael Bringmann @ 2017-06-06 16:18 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Lai Jiangshan, linux-kernel, Nathan Fontenot



On 05/25/2017 10:30 AM, Michael Bringmann wrote:
> I will try that patch shortly.  I also updated my patch to be conditional
> on whether the pool's cpumask attribute was empty.  You should have received
> V2 of that patch by now.

Let's try this again.

The hotplug problem goes away with the changes that you provided earlier, and
shown in the patch below.  I kept this change to get_unbound_pool' as a just
in case to explain the crash in the event that it occurs again:

    if (!cpumask_weight(pool->attrs->cpumask))
        cpumask_copy(pool->attrs->cpumask, cpumask_of(smp_processor_id()));

I could also insert 

    BUG(!cpumask_weight(pool->attrs->cpumask, cpumask_of(smp_processor_id()));

at that place, but I really prefer not to crash the system if there is a workaround.


> On 05/25/2017 10:07 AM, Tejun Heo wrote:
>> On Thu, May 25, 2017 at 11:03:53AM -0400, Tejun Heo wrote:
>>> wq_update_unbound_numa() should have never called into
>>> alloc_unbound_pwq() w/ empty node cpu mask.  It should have fallen
>>> back to the dfl_pwq.  It looks like I just messed up the logic there
>>> from the initial commit of the feature.  Can you please see whether
>>> the following fixes the problem?
>>
>> Can you please try the following instead.  On the second thought, I
>> don't think the current logic is wrong.  If this fixes the issue,
>> somehow your setup is having a situation where online cpumask for a
>> node is a proper superset of possible cpumask for the node.
>>
>> Thanks.
>>
>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>> index c74bf39ef764..4da5ff649ff8 100644
>> --- a/kernel/workqueue.c
>> +++ b/kernel/workqueue.c
>> @@ -3559,13 +3559,13 @@ static struct pool_workqueue *alloc_unbound_pwq(struct workqueue_struct *wq,
>>   * stable.
>>   *
>>   * Return: %true if the resulting @cpumask is different from @attrs->cpumask,
>> - * %false if equal.
>> + * %false if equal.  On %false return, the content of @cpumask is undefined.
>>   */
>>  static bool wq_calc_node_cpumask(const struct workqueue_attrs *attrs, int node,
>>  				 int cpu_going_down, cpumask_t *cpumask)
>>  {
>>  	if (!wq_numa_enabled || attrs->no_numa)
>> -		goto use_dfl;
>> +		return false;
>>
>>  	/* does @node have any online CPUs @attrs wants? */
>>  	cpumask_and(cpumask, cpumask_of_node(node), attrs->cpumask);
>> @@ -3573,15 +3573,13 @@ static bool wq_calc_node_cpumask(const struct workqueue_attrs *attrs, int node,
>>  		cpumask_clear_cpu(cpu_going_down, cpumask);
>>
>>  	if (cpumask_empty(cpumask))
>> -		goto use_dfl;
>> +		return false;
>>
>>  	/* yeap, return possible CPUs in @node that @attrs wants */
>>  	cpumask_and(cpumask, attrs->cpumask, wq_numa_possible_cpumask[node]);
>> -	return !cpumask_equal(cpumask, attrs->cpumask);
>>
>> -use_dfl:
>> -	cpumask_copy(cpumask, attrs->cpumask);
>> -	return false;
>> +	return !cpumask_empty(cpumask) &&
>> +		!cpumask_equal(cpumask, attrs->cpumask);
>>  }
>>
>>  /* install @pwq into @wq's numa_pwq_tbl[] for @node and return the old pwq */
>>
>>

> Can you please post the messages with the debug patch from the prev
> thread?  In fact, let's please continue on that thread.  I'm having a
> hard time following what's going wrong with the code.

Are these the failure logs that you requested?


Red Hat Enterprise Linux Server 7.3 (Maipo)
Kernel 4.12.0-rc1.wi91275_debug_03.ppc64le+ on an ppc64le

ltcalpine2-lp20 login: root
Password: 
Last login: Wed May 24 18:45:40 from oc1554177480.austin.ibm.com
[root@ltcalpine2-lp20 ~]# numactl -H
available: 2 nodes (0,6)
node 0 cpus:
node 0 size: 0 MB
node 0 free: 0 MB
node 6 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
node 6 size: 19858 MB
node 6 free: 16920 MB
node distances:
node   0   6 
  0:  10  40 
  6:  40  10 
[root@ltcalpine2-lp20 ~]# numactl -H
available: 2 nodes (0,6)
node 0 cpus:
node 0 size: 0 MB
node 0 free: 0 MB
node 6 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191
node 6 size: 19858 MB
node 6 free: 16362 MB
node distances:
node   0   6 
  0:  10  40 
  6:  40  10 
[root@ltcalpine2-lp20 ~]# [  321.310943] workqueue:get_unbound_pool has empty cpumask for pool attrs
[  321.310961] ------------[ cut here ]------------
[  321.310997] WARNING: CPU: 184 PID: 13201 at kernel/workqueue.c:3375 alloc_unbound_pwq+0x5c0/0x5e0
[  321.311005] Modules linked in: rpadlpar_io rpaphp dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag sg pseries_rng ghash_generic gf128mul xts vmx_crypto binfmt_misc ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod
[  321.311097] CPU: 184 PID: 13201 Comm: cpuhp/184 Not tainted 4.12.0-rc1.wi91275_debug_03.ppc64le+ #8
[  321.311106] task: c000000408961080 task.stack: c000000406394000
[  321.311113] NIP: c000000000116c80 LR: c000000000116c7c CTR: 0000000000000000
[  321.311121] REGS: c0000004063977b0 TRAP: 0700   Not tainted  (4.12.0-rc1.wi91275_debug_03.ppc64le+)
[  321.311128] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE>
[  321.311150]   CR: 28000082  XER: 00000000
[  321.311159] CFAR: c000000000a2dc80 SOFTE: 1 
[  321.311159] GPR00: c000000000116c7c c000000406397a30 c0000000013ae900 000000000000003b 
[  321.311159] GPR04: c000000408961a38 0000000000000006 00000000a49e41e5 ffffffffa4a5a483 
[  321.311159] GPR08: 00000000000062cc 0000000000000000 0000000000000000 c000000408961a38 
[  321.311159] GPR12: 0000000000000000 c00000000fb38c00 c00000000011e858 c00000040a902ac0 
[  321.311159] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
[  321.311159] GPR20: c000000406394000 0000000000000002 c000000406394000 0000000000000000 
[  321.311159] GPR24: c000000405075400 c000000404fc0000 0000000000000110 c0000000015a4c88 
[  321.311159] GPR28: 0000000000000000 c0000004fe256000 c0000004fe256008 c0000004fe052800 
[  321.311290] NIP [c000000000116c80] alloc_unbound_pwq+0x5c0/0x5e0
[  321.311298] LR [c000000000116c7c] alloc_unbound_pwq+0x5bc/0x5e0
[  321.311305] Call Trace:
[  321.311310] [c000000406397a30] [c000000000116c7c] alloc_unbound_pwq+0x5bc/0x5e0 (unreliable)
[  321.311323] [c000000406397ad0] [c000000000116e30] wq_update_unbound_numa+0x190/0x270
[  321.311334] [c000000406397b60] [c000000000118eb0] workqueue_offline_cpu+0xe0/0x130
[  321.311345] [c000000406397bf0] [c0000000000e9f20] cpuhp_invoke_callback+0x240/0xcd0
[  321.311355] [c000000406397cb0] [c0000000000eab28] cpuhp_down_callbacks+0x78/0xf0
[  321.311365] [c000000406397d00] [c0000000000eae6c] cpuhp_thread_fun+0x18c/0x1a0
[  321.311376] [c000000406397d30] [c0000000001251cc] smpboot_thread_fn+0x2fc/0x3b0
[  321.311386] [c000000406397dc0] [c00000000011e9c0] kthread+0x170/0x1b0
[  321.311397] [c000000406397e30] [c00000000000b4f4] ret_from_kernel_thread+0x5c/0x68
[  321.311406] Instruction dump:
[  321.311413] 3d42fff0 892ac565 2f890000 40fefd98 39200001 3c62ff89 3c82ff6c 3863d590 
[  321.311437] 38847cb0 992ac565 48916fc9 60000000 <0fe00000> 4bfffd70 60000000 60420000 
[  321.311462] ---[ end trace 9f7c1cd616b26de8 ]---
[  321.318347] Unable to handle kernel paging request for unaligned access at address 0xc0000003c5577ebf
[  321.318448] Faulting instruction address: 0xc00000000055ec8c
[  321.318457] Oops: Kernel access of bad area, sig: 7 [#1]
[  321.318462] SMP NR_CPUS=2048 
[  321.318463] NUMA 
[  321.318468] pSeries
[  321.318473] Modules linked in: rpadlpar_io rpaphp dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag sg pseries_rng ghash_generic gf128mul xts vmx_crypto binfmt_misc ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod
[  321.318524] CPU: 184 PID: 13201 Comm: cpuhp/184 Tainted: G        W       4.12.0-rc1.wi91275_debug_03.ppc64le+ #8
[  321.318532] task: c000000408961080 task.stack: c000000406394000
[  321.318537] NIP: c00000000055ec8c LR: c0000000001312d4 CTR: c000000000145d50
[  321.318544] REGS: c000000406397690 TRAP: 0600   Tainted: G        W        (4.12.0-rc1.wi91275_debug_03.ppc64le+)
[  321.318551] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>
[  321.318563]   CR: 28000024  XER: 00000000
[  321.318571] CFAR: c0000000001312d0 DAR: c0000003c5577ebf DSISR: 00000000 SOFTE: 0 
[  321.318571] GPR00: c000000000131298 c000000406397910 c0000000013ae900 c0000004b6d22820 
[  321.318571] GPR04: c0000004b6d22820 c0000003c5577ebf 0000000000000000 00000004f1230000 
[  321.318571] GPR08: 0000000d8ddb1ea7 0000000000000000 0000000000000008 c000000408961aa8 
[  321.318571] GPR12: c000000000145d50 c00000000fb38c00 c00000000011e858 c00000040a902ac0 
[  321.318571] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
[  321.318571] GPR20: c000000406394000 0000000000000002 0000000000004000 c000000000fb7700 
[  321.318571] GPR24: c0000000013f5d00 c0000000013f9d48 0000000000000000 c0000004b6d230e8 
[  321.318571] GPR28: 0000000000000004 00000003c45bfc57 0000000000000800 c0000004b6d22800 
[  321.318664] NIP [c00000000055ec8c] llist_add_batch+0xc/0x40
[  321.318670] LR [c0000000001312d4] try_to_wake_up+0x524/0x850
[  321.318675] Call Trace:
[  321.318679] [c000000406397910] [c000000000131298] try_to_wake_up+0x4e8/0x850 (unreliable)
[  321.318689] [c000000406397990] [c000000000111bf8] create_worker+0x148/0x220
[  321.318696] [c000000406397a30] [c000000000116ae8] alloc_unbound_pwq+0x428/0x5e0
[  321.318705] [c000000406397ad0] [c000000000116e30] wq_update_unbound_numa+0x190/0x270
[  321.318713] [c000000406397b60] [c000000000118eb0] workqueue_offline_cpu+0xe0/0x130
[  321.318721] [c000000406397bf0] [c0000000000e9f20] cpuhp_invoke_callback+0x240/0xcd0
[  321.318729] [c000000406397cb0] [c0000000000eab28] cpuhp_down_callbacks+0x78/0xf0
[  321.318737] [c000000406397d00] [c0000000000eae6c] cpuhp_thread_fun+0x18c/0x1a0
[  321.318745] [c000000406397d30] [c0000000001251cc] smpboot_thread_fn+0x2fc/0x3b0
[  321.318754] [c000000406397dc0] [c00000000011e9c0] kthread+0x170/0x1b0
[  321.318762] [c000000406397e30] [c00000000000b4f4] ret_from_kernel_thread+0x5c/0x68
[  321.318769] Instruction dump:
[  321.318775] 60420000 38600000 4e800020 60000000 60420000 7c832378 4e800020 60000000 
[  321.318790] 60000000 e9250000 f9240000 7c0004ac <7d4028a8> 7c2a4800 40c20010 7c6029ad 
[  321.318808] ---[ end trace 9f7c1cd616b26de9 ]---
[  321.322303] 
[  323.322505] Kernel panic - not syncing: Fatal exception
[  323.429027] Rebooting in 10 seconds..


Regards,

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:       (512) 466-0650
mwb@linux.vnet.ibm.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] workqueue: Ensure that cpumask set for pools created after boot
  2017-06-06 16:18                       ` Michael Bringmann
@ 2017-06-06 18:09                         ` Tejun Heo
  2017-06-12 14:47                           ` Michael Bringmann
  0 siblings, 1 reply; 26+ messages in thread
From: Tejun Heo @ 2017-06-06 18:09 UTC (permalink / raw)
  To: Michael Bringmann; +Cc: Lai Jiangshan, linux-kernel, Nathan Fontenot

Hello,

On Tue, Jun 06, 2017 at 11:18:36AM -0500, Michael Bringmann wrote:
> On 05/25/2017 10:30 AM, Michael Bringmann wrote:
> > I will try that patch shortly.  I also updated my patch to be conditional
> > on whether the pool's cpumask attribute was empty.  You should have received
> > V2 of that patch by now.
> 
> Let's try this again.
> 
> The hotplug problem goes away with the changes that you provided earlier, and

So, that means we're ending up in situations where NUMA online is a
proper superset of NUMA possible.

> shown in the patch below.  I kept this change to get_unbound_pool' as a just
> in case to explain the crash in the event that it occurs again:
> 
>     if (!cpumask_weight(pool->attrs->cpumask))
>         cpumask_copy(pool->attrs->cpumask, cpumask_of(smp_processor_id()));
> 
> I could also insert 
> 
>     BUG(!cpumask_weight(pool->attrs->cpumask, cpumask_of(smp_processor_id()));
> 
> at that place, but I really prefer not to crash the system if there is a workaround.

I'm not sure because it doesn't make any logical sense and it's not
right in terms of correctness.  The above would be able to enable CPUs
which are explicitly excluded from a workqueue.  The only fallback
which makes sense is falling back to the default pwq.

> > Can you please post the messages with the debug patch from the prev
> > thread?  In fact, let's please continue on that thread.  I'm having a
> > hard time following what's going wrong with the code.
> 
> Are these the failure logs that you requested?
> 
> 
> Red Hat Enterprise Linux Server 7.3 (Maipo)
> Kernel 4.12.0-rc1.wi91275_debug_03.ppc64le+ on an ppc64le
> 
> ltcalpine2-lp20 login: root
> Password: 
> Last login: Wed May 24 18:45:40 from oc1554177480.austin.ibm.com
> [root@ltcalpine2-lp20 ~]# numactl -H
> available: 2 nodes (0,6)
> node 0 cpus:
> node 0 size: 0 MB
> node 0 free: 0 MB
> node 6 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
> node 6 size: 19858 MB
> node 6 free: 16920 MB
> node distances:
> node   0   6 
>   0:  10  40 
>   6:  40  10 
> [root@ltcalpine2-lp20 ~]# numactl -H
> available: 2 nodes (0,6)
> node 0 cpus:
> node 0 size: 0 MB
> node 0 free: 0 MB
> node 6 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191
> node 6 size: 19858 MB
> node 6 free: 16362 MB
> node distances:
> node   0   6 
>   0:  10  40 
>   6:  40  10 
> [root@ltcalpine2-lp20 ~]# [  321.310943] workqueue:get_unbound_pool has empty cpumask for pool attrs
> [  321.310961] ------------[ cut here ]------------
> [  321.310997] WARNING: CPU: 184 PID: 13201 at kernel/workqueue.c:3375 alloc_unbound_pwq+0x5c0/0x5e0
> [  321.311005] Modules linked in: rpadlpar_io rpaphp dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag sg pseries_rng ghash_generic gf128mul xts vmx_crypto binfmt_misc ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod
> [  321.311097] CPU: 184 PID: 13201 Comm: cpuhp/184 Not tainted 4.12.0-rc1.wi91275_debug_03.ppc64le+ #8
> [  321.311106] task: c000000408961080 task.stack: c000000406394000
> [  321.311113] NIP: c000000000116c80 LR: c000000000116c7c CTR: 0000000000000000
> [  321.311121] REGS: c0000004063977b0 TRAP: 0700   Not tainted  (4.12.0-rc1.wi91275_debug_03.ppc64le+)
> [  321.311128] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE>
> [  321.311150]   CR: 28000082  XER: 00000000
> [  321.311159] CFAR: c000000000a2dc80 SOFTE: 1 
> [  321.311159] GPR00: c000000000116c7c c000000406397a30 c0000000013ae900 000000000000003b 
> [  321.311159] GPR04: c000000408961a38 0000000000000006 00000000a49e41e5 ffffffffa4a5a483 
> [  321.311159] GPR08: 00000000000062cc 0000000000000000 0000000000000000 c000000408961a38 
> [  321.311159] GPR12: 0000000000000000 c00000000fb38c00 c00000000011e858 c00000040a902ac0 
> [  321.311159] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
> [  321.311159] GPR20: c000000406394000 0000000000000002 c000000406394000 0000000000000000 
> [  321.311159] GPR24: c000000405075400 c000000404fc0000 0000000000000110 c0000000015a4c88 
> [  321.311159] GPR28: 0000000000000000 c0000004fe256000 c0000004fe256008 c0000004fe052800 
> [  321.311290] NIP [c000000000116c80] alloc_unbound_pwq+0x5c0/0x5e0
> [  321.311298] LR [c000000000116c7c] alloc_unbound_pwq+0x5bc/0x5e0
> [  321.311305] Call Trace:
> [  321.311310] [c000000406397a30] [c000000000116c7c] alloc_unbound_pwq+0x5bc/0x5e0 (unreliable)
> [  321.311323] [c000000406397ad0] [c000000000116e30] wq_update_unbound_numa+0x190/0x270
> [  321.311334] [c000000406397b60] [c000000000118eb0] workqueue_offline_cpu+0xe0/0x130
> [  321.311345] [c000000406397bf0] [c0000000000e9f20] cpuhp_invoke_callback+0x240/0xcd0
> [  321.311355] [c000000406397cb0] [c0000000000eab28] cpuhp_down_callbacks+0x78/0xf0
> [  321.311365] [c000000406397d00] [c0000000000eae6c] cpuhp_thread_fun+0x18c/0x1a0
> [  321.311376] [c000000406397d30] [c0000000001251cc] smpboot_thread_fn+0x2fc/0x3b0
> [  321.311386] [c000000406397dc0] [c00000000011e9c0] kthread+0x170/0x1b0
> [  321.311397] [c000000406397e30] [c00000000000b4f4] ret_from_kernel_thread+0x5c/0x68
> [  321.311406] Instruction dump:
> [  321.311413] 3d42fff0 892ac565 2f890000 40fefd98 39200001 3c62ff89 3c82ff6c 3863d590 
> [  321.311437] 38847cb0 992ac565 48916fc9 60000000 <0fe00000> 4bfffd70 60000000 60420000 

The only way offlining can lead to this failure is when wq numa
possible cpu mask is a proper subset of the matching online mask.  Can
you please print out the numa online cpu and wq_numa_possible_cpumask
masks and verify that online stays within the possible for each node?
If not, the ppc arch init code needs to be updated so that cpu <->
node binding is establish for all possible cpus on boot.  Note that
this isn't a requirement coming solely from wq.  All node affine (thus
percpu) allocations depend on that.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] workqueue: Ensure that cpumask set for pools created after boot
  2017-06-06 18:09                         ` Tejun Heo
@ 2017-06-12 14:47                           ` Michael Bringmann
  2017-06-12 16:14                             ` Tejun Heo
  0 siblings, 1 reply; 26+ messages in thread
From: Michael Bringmann @ 2017-06-12 14:47 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Lai Jiangshan, linux-kernel, Nathan Fontenot

Hello:

On 06/06/2017 01:09 PM, Tejun Heo wrote:
> Hello,
> 
> On Tue, Jun 06, 2017 at 11:18:36AM -0500, Michael Bringmann wrote:
>> On 05/25/2017 10:30 AM, Michael Bringmann wrote:
>>> I will try that patch shortly.  I also updated my patch to be conditional
>>> on whether the pool's cpumask attribute was empty.  You should have received
>>> V2 of that patch by now.
>>
>> Let's try this again.
>>
>> The hotplug problem goes away with the changes that you provided earlier, and
> 
> So, that means we're ending up in situations where NUMA online is a
> proper superset of NUMA possible.
> 
>> shown in the patch below.  I kept this change to get_unbound_pool' as a just
>> in case to explain the crash in the event that it occurs again:
>>
>>     if (!cpumask_weight(pool->attrs->cpumask))
>>         cpumask_copy(pool->attrs->cpumask, cpumask_of(smp_processor_id()));
>>
>> I could also insert 
>>
>>     BUG(!cpumask_weight(pool->attrs->cpumask, cpumask_of(smp_processor_id()));
>>
>> at that place, but I really prefer not to crash the system if there is a workaround.
> 
> I'm not sure because it doesn't make any logical sense and it's not
> right in terms of correctness.  The above would be able to enable CPUs
> which are explicitly excluded from a workqueue.  The only fallback
> which makes sense is falling back to the default pwq.

What would that look like?  Are you sure that would always be valid?
In a system that is hot-adding and hot-removing CPUs?

>>> Can you please post the messages with the debug patch from the prev
>>> thread?  In fact, let's please continue on that thread.  I'm having a
>>> hard time following what's going wrong with the code.
>>
>> Are these the failure logs that you requested?
>>
>>
>> Red Hat Enterprise Linux Server 7.3 (Maipo)
>> Kernel 4.12.0-rc1.wi91275_debug_03.ppc64le+ on an ppc64le
>>
>> ltcalpine2-lp20 login: root
>> Password: 
>> Last login: Wed May 24 18:45:40 from oc1554177480.austin.ibm.com
>> [root@ltcalpine2-lp20 ~]# numactl -H
>> available: 2 nodes (0,6)
>> node 0 cpus:
>> node 0 size: 0 MB
>> node 0 free: 0 MB
>> node 6 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
>> node 6 size: 19858 MB
>> node 6 free: 16920 MB
>> node distances:
>> node   0   6 
>>   0:  10  40 
>>   6:  40  10 
>> [root@ltcalpine2-lp20 ~]# numactl -H
>> available: 2 nodes (0,6)
>> node 0 cpus:
>> node 0 size: 0 MB
>> node 0 free: 0 MB
>> node 6 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191
>> node 6 size: 19858 MB
>> node 6 free: 16362 MB
>> node distances:
>> node   0   6 
>>   0:  10  40 
>>   6:  40  10 
>> [root@ltcalpine2-lp20 ~]# [  321.310943] workqueue:get_unbound_pool has empty cpumask for pool attrs
>> [  321.310961] ------------[ cut here ]------------
>> [  321.310997] WARNING: CPU: 184 PID: 13201 at kernel/workqueue.c:3375 alloc_unbound_pwq+0x5c0/0x5e0
>> [  321.311005] Modules linked in: rpadlpar_io rpaphp dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag sg pseries_rng ghash_generic gf128mul xts vmx_crypto binfmt_misc ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod
>> [  321.311097] CPU: 184 PID: 13201 Comm: cpuhp/184 Not tainted 4.12.0-rc1.wi91275_debug_03.ppc64le+ #8
>> [  321.311106] task: c000000408961080 task.stack: c000000406394000
>> [  321.311113] NIP: c000000000116c80 LR: c000000000116c7c CTR: 0000000000000000
>> [  321.311121] REGS: c0000004063977b0 TRAP: 0700   Not tainted  (4.12.0-rc1.wi91275_debug_03.ppc64le+)
>> [  321.311128] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE>
>> [  321.311150]   CR: 28000082  XER: 00000000
>> [  321.311159] CFAR: c000000000a2dc80 SOFTE: 1 
>> [  321.311159] GPR00: c000000000116c7c c000000406397a30 c0000000013ae900 000000000000003b 
>> [  321.311159] GPR04: c000000408961a38 0000000000000006 00000000a49e41e5 ffffffffa4a5a483 
>> [  321.311159] GPR08: 00000000000062cc 0000000000000000 0000000000000000 c000000408961a38 
>> [  321.311159] GPR12: 0000000000000000 c00000000fb38c00 c00000000011e858 c00000040a902ac0 
>> [  321.311159] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
>> [  321.311159] GPR20: c000000406394000 0000000000000002 c000000406394000 0000000000000000 
>> [  321.311159] GPR24: c000000405075400 c000000404fc0000 0000000000000110 c0000000015a4c88 
>> [  321.311159] GPR28: 0000000000000000 c0000004fe256000 c0000004fe256008 c0000004fe052800 
>> [  321.311290] NIP [c000000000116c80] alloc_unbound_pwq+0x5c0/0x5e0
>> [  321.311298] LR [c000000000116c7c] alloc_unbound_pwq+0x5bc/0x5e0
>> [  321.311305] Call Trace:
>> [  321.311310] [c000000406397a30] [c000000000116c7c] alloc_unbound_pwq+0x5bc/0x5e0 (unreliable)
>> [  321.311323] [c000000406397ad0] [c000000000116e30] wq_update_unbound_numa+0x190/0x270
>> [  321.311334] [c000000406397b60] [c000000000118eb0] workqueue_offline_cpu+0xe0/0x130
>> [  321.311345] [c000000406397bf0] [c0000000000e9f20] cpuhp_invoke_callback+0x240/0xcd0
>> [  321.311355] [c000000406397cb0] [c0000000000eab28] cpuhp_down_callbacks+0x78/0xf0
>> [  321.311365] [c000000406397d00] [c0000000000eae6c] cpuhp_thread_fun+0x18c/0x1a0
>> [  321.311376] [c000000406397d30] [c0000000001251cc] smpboot_thread_fn+0x2fc/0x3b0
>> [  321.311386] [c000000406397dc0] [c00000000011e9c0] kthread+0x170/0x1b0
>> [  321.311397] [c000000406397e30] [c00000000000b4f4] ret_from_kernel_thread+0x5c/0x68
>> [  321.311406] Instruction dump:
>> [  321.311413] 3d42fff0 892ac565 2f890000 40fefd98 39200001 3c62ff89 3c82ff6c 3863d590 
>> [  321.311437] 38847cb0 992ac565 48916fc9 60000000 <0fe00000> 4bfffd70 60000000 60420000 
> 
> The only way offlining can lead to this failure is when wq numa
> possible cpu mask is a proper subset of the matching online mask.  Can
> you please print out the numa online cpu and wq_numa_possible_cpumask
> masks and verify that online stays within the possible for each node?
> If not, the ppc arch init code needs to be updated so that cpu <->
> node binding is establish for all possible cpus on boot.  Note that
> this isn't a requirement coming solely from wq.  All node affine (thus
> percpu) allocations depend on that.

The ppc arch init code already records all nodes used by the CPUs visible in
the device-tree at boot time into the possible and online node bindings.  The
problem here occurs when we hot-add new CPUs to the powerpc system -- they may
require nodes that are mentioned by the VPHN hcall, but which were not used
at boot time.

I will run a test that dumps these masks later this week to try to provide
the information that you are interested in.

Right now we are having a discussion on another thread as to how to properly
set the possible node mask at boot given that there is no mechanism to hot-add
nodes to the system.  The latest idea appears to be adding another property
or two to define the maximum number of nodes that should be added to the
possible / online node masks to allow for dynamic growth after boot.

> 
> Thanks.
> 

Thanks.

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:       (512) 466-0650
mwb@linux.vnet.ibm.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] workqueue: Ensure that cpumask set for pools created after boot
  2017-06-12 14:47                           ` Michael Bringmann
@ 2017-06-12 16:14                             ` Tejun Heo
  2017-06-12 17:10                               ` Michael Bringmann
  0 siblings, 1 reply; 26+ messages in thread
From: Tejun Heo @ 2017-06-12 16:14 UTC (permalink / raw)
  To: Michael Bringmann; +Cc: Lai Jiangshan, linux-kernel, Nathan Fontenot

Hello,

On Mon, Jun 12, 2017 at 09:47:31AM -0500, Michael Bringmann wrote:
> > I'm not sure because it doesn't make any logical sense and it's not
> > right in terms of correctness.  The above would be able to enable CPUs
> > which are explicitly excluded from a workqueue.  The only fallback
> > which makes sense is falling back to the default pwq.
> 
> What would that look like?  Are you sure that would always be valid?
> In a system that is hot-adding and hot-removing CPUs?

The reason why we're ending up with empty masks is because
wq_calc_node_cpumask() is assuming that the possible node cpumask is
always a superset of online (as it should).  We can trigger a fat
warning there if that isn't so and just return false from that
function.

> > The only way offlining can lead to this failure is when wq numa
> > possible cpu mask is a proper subset of the matching online mask.  Can
> > you please print out the numa online cpu and wq_numa_possible_cpumask
> > masks and verify that online stays within the possible for each node?
> > If not, the ppc arch init code needs to be updated so that cpu <->
> > node binding is establish for all possible cpus on boot.  Note that
> > this isn't a requirement coming solely from wq.  All node affine (thus
> > percpu) allocations depend on that.
> 
> The ppc arch init code already records all nodes used by the CPUs visible in
> the device-tree at boot time into the possible and online node bindings.  The
> problem here occurs when we hot-add new CPUs to the powerpc system -- they may
> require nodes that are mentioned by the VPHN hcall, but which were not used
> at boot time.

We need all the possible (so, for cpus which aren't online yet too)
CPU -> node mappings to be established on boot.  This isn't just a
requirement from workqueue.  We don't have any synchronization
regarding cpu <-> numa mapping in memory allocation paths either.

> I will run a test that dumps these masks later this week to try to provide
> the information that you are interested in.
> 
> Right now we are having a discussion on another thread as to how to properly
> set the possible node mask at boot given that there is no mechanism to hot-add
> nodes to the system.  The latest idea appears to be adding another property
> or two to define the maximum number of nodes that should be added to the
> possible / online node masks to allow for dynamic growth after boot.

I have no idea about the specifics of ppc but at least the code base
we have currently expect all possible cpus and nodes and their
mappings to be established on boot.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] workqueue: Ensure that cpumask set for pools created after boot
  2017-06-12 16:14                             ` Tejun Heo
@ 2017-06-12 17:10                               ` Michael Bringmann
  2017-06-12 17:32                                 ` Tejun Heo
  0 siblings, 1 reply; 26+ messages in thread
From: Michael Bringmann @ 2017-06-12 17:10 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Lai Jiangshan, linux-kernel, Nathan Fontenot



On 06/12/2017 11:14 AM, Tejun Heo wrote:
> Hello,
> 
> On Mon, Jun 12, 2017 at 09:47:31AM -0500, Michael Bringmann wrote:
>>> I'm not sure because it doesn't make any logical sense and it's not
>>> right in terms of correctness.  The above would be able to enable CPUs
>>> which are explicitly excluded from a workqueue.  The only fallback
>>> which makes sense is falling back to the default pwq.
>>
>> What would that look like?  Are you sure that would always be valid?
>> In a system that is hot-adding and hot-removing CPUs?
> 
> The reason why we're ending up with empty masks is because
> wq_calc_node_cpumask() is assuming that the possible node cpumask is
> always a superset of online (as it should).  We can trigger a fat
> warning there if that isn't so and just return false from that
> function.

What would that look like?  I should be able to test it on top of the
other changes / corrections.

>>> The only way offlining can lead to this failure is when wq numa
>>> possible cpu mask is a proper subset of the matching online mask.  Can
>>> you please print out the numa online cpu and wq_numa_possible_cpumask
>>> masks and verify that online stays within the possible for each node?
>>> If not, the ppc arch init code needs to be updated so that cpu <->
>>> node binding is establish for all possible cpus on boot.  Note that
>>> this isn't a requirement coming solely from wq.  All node affine (thus
>>> percpu) allocations depend on that.
>>
>> The ppc arch init code already records all nodes used by the CPUs visible in
>> the device-tree at boot time into the possible and online node bindings.  The
>> problem here occurs when we hot-add new CPUs to the powerpc system -- they may
>> require nodes that are mentioned by the VPHN hcall, but which were not used
>> at boot time.
> 
> We need all the possible (so, for cpus which aren't online yet too)
> CPU -> node mappings to be established on boot.  This isn't just a
> requirement from workqueue.  We don't have any synchronization
> regarding cpu <-> numa mapping in memory allocation paths either.
> 
>> I will run a test that dumps these masks later this week to try to provide
>> the information that you are interested in.
>>
>> Right now we are having a discussion on another thread as to how to properly
>> set the possible node mask at boot given that there is no mechanism to hot-add
>> nodes to the system.  The latest idea appears to be adding another property
>> or two to define the maximum number of nodes that should be added to the
>> possible / online node masks to allow for dynamic growth after boot.
> 
> I have no idea about the specifics of ppc but at least the code base
> we have currently expect all possible cpus and nodes and their
> mappings to be established on boot.

Hopefully, the new properties will fix the holes in the current implementation
with regard to hot-add.

> 
> Thanks.
> 

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:       (512) 466-0650
mwb@linux.vnet.ibm.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] workqueue: Ensure that cpumask set for pools created after boot
  2017-06-12 17:10                               ` Michael Bringmann
@ 2017-06-12 17:32                                 ` Tejun Heo
  2017-06-13 20:04                                   ` Michael Bringmann
  0 siblings, 1 reply; 26+ messages in thread
From: Tejun Heo @ 2017-06-12 17:32 UTC (permalink / raw)
  To: Michael Bringmann; +Cc: Lai Jiangshan, linux-kernel, Nathan Fontenot

Hello,

On Mon, Jun 12, 2017 at 12:10:49PM -0500, Michael Bringmann wrote:
> > The reason why we're ending up with empty masks is because
> > wq_calc_node_cpumask() is assuming that the possible node cpumask is
> > always a superset of online (as it should).  We can trigger a fat
> > warning there if that isn't so and just return false from that
> > function.
> 
> What would that look like?  I should be able to test it on top of the
> other changes / corrections.

So, the function looks like the following now.

  static bool wq_calc_node_cpumask(const struct workqueue_attrs *attrs, int node,
				   int cpu_going_down, cpumask_t *cpumask)
  {
	  if (!wq_numa_enabled || attrs->no_numa)
		  goto use_dfl;

	  /* does @node have any online CPUs @attrs wants? */
A:	cpumask_and(cpumask, cpumask_of_node(node), attrs->cpumask);
	  if (cpu_going_down >= 0)
		  cpumask_clear_cpu(cpu_going_down, cpumask);

B:	if (cpumask_empty(cpumask))
		  goto use_dfl;

	  /* yeap, return possible CPUs in @node that @attrs wants */
C:	cpumask_and(cpumask, attrs->cpumask, wq_numa_possible_cpumask[node]);
	  return !cpumask_equal(cpumask, attrs->cpumask);

  use_dfl:
	  cpumask_copy(cpumask, attrs->cpumask);
	  return false;
  }

A is calculating the target cpumask to use using the online map.  B
falls back to dfl mask if the intersection is empty.  C calculates the
eventual mask to use from the intersection of possible mask and what's
requested.  The assumption is that because possible is a superset of
online, C's result can't be smaller than A.

So, what we can do is if to calculate the possible intersection,
compare it against the online intersection, and if the latter is
bigger, trigger a big fat warning and return false there.

> > I have no idea about the specifics of ppc but at least the code base
> > we have currently expect all possible cpus and nodes and their
> > mappings to be established on boot.
> 
> Hopefully, the new properties will fix the holes in the current implementation
> with regard to hot-add.

Yeah, that's the only proper fix here.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] workqueue: Ensure that cpumask set for pools created after boot
  2017-06-12 17:32                                 ` Tejun Heo
@ 2017-06-13 20:04                                   ` Michael Bringmann
  2017-06-13 20:10                                     ` Tejun Heo
  0 siblings, 1 reply; 26+ messages in thread
From: Michael Bringmann @ 2017-06-13 20:04 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Lai Jiangshan, linux-kernel, Nathan Fontenot

[-- Attachment #1: Type: text/plain, Size: 4044 bytes --]

Hello:

On 06/12/2017 12:32 PM, Tejun Heo wrote:
> Hello,
> 
> On Mon, Jun 12, 2017 at 12:10:49PM -0500, Michael Bringmann wrote:
>>> The reason why we're ending up with empty masks is because
>>> wq_calc_node_cpumask() is assuming that the possible node cpumask is
>>> always a superset of online (as it should).  We can trigger a fat
>>> warning there if that isn't so and just return false from that
>>> function.
>>
>> What would that look like?  I should be able to test it on top of the
>> other changes / corrections.
> 
> So, the function looks like the following now.
> 
>   static bool wq_calc_node_cpumask(const struct workqueue_attrs *attrs, int node,
> 				   int cpu_going_down, cpumask_t *cpumask)
>   {
> 	  if (!wq_numa_enabled || attrs->no_numa)
> 		  goto use_dfl;
> 
> 	  /* does @node have any online CPUs @attrs wants? */
> A:	cpumask_and(cpumask, cpumask_of_node(node), attrs->cpumask);
> 	  if (cpu_going_down >= 0)
> 		  cpumask_clear_cpu(cpu_going_down, cpumask);
> 
> B:	if (cpumask_empty(cpumask))
> 		  goto use_dfl;
> 
> 	  /* yeap, return possible CPUs in @node that @attrs wants */
> C:	cpumask_and(cpumask, attrs->cpumask, wq_numa_possible_cpumask[node]);
> 	  return !cpumask_equal(cpumask, attrs->cpumask);
> 
>   use_dfl:
> 	  cpumask_copy(cpumask, attrs->cpumask);
> 	  return false;
>   }
> 
> A is calculating the target cpumask to use using the online map.  B
> falls back to dfl mask if the intersection is empty.  C calculates the
> eventual mask to use from the intersection of possible mask and what's
> requested.  The assumption is that because possible is a superset of
> online, C's result can't be smaller than A.
> 
> So, what we can do is if to calculate the possible intersection,
> compare it against the online intersection, and if the latter is
> bigger, trigger a big fat warning and return false there.

So the revisions to the function would look like, correct?

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index c74bf39..5d7674e 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3564,19 +3564,28 @@ static struct pool_workqueue *alloc_unbound_pwq(struct workqueue_struct *wq,
 static bool wq_calc_node_cpumask(const struct workqueue_attrs *attrs, int node,
 				 int cpu_going_down, cpumask_t *cpumask)
 {
+	cpumask_t	onl_targ_cm;
+
 	if (!wq_numa_enabled || attrs->no_numa)
 		goto use_dfl;
 
 	/* does @node have any online CPUs @attrs wants? */
-	cpumask_and(cpumask, cpumask_of_node(node), attrs->cpumask);
+	cpumask_and(&onl_targ_cm, cpumask_of_node(node), attrs->cpumask);
 	if (cpu_going_down >= 0)
-		cpumask_clear_cpu(cpu_going_down, cpumask);
+		cpumask_clear_cpu(cpu_going_down, &onl_targ_cm);
 
-	if (cpumask_empty(cpumask))
+	if (cpumask_empty(&onl_targ_cm))
 		goto use_dfl;
 
 	/* yeap, return possible CPUs in @node that @attrs wants */
 	cpumask_and(cpumask, attrs->cpumask, wq_numa_possible_cpumask[node]);
+
+	if (cpumask_weight(&onl_targ_cm) > cpumask_weight(cpumask)) {
+		printk(KERN_INFO "WARNING: WQ cpumask: onl intersect > "
+						"possible intersect\n");
+		return false;
+	}
+
 	return !cpumask_equal(cpumask, attrs->cpumask);
 
 use_dfl:

> 
>>> I have no idea about the specifics of ppc but at least the code base
>>> we have currently expect all possible cpus and nodes and their
>>> mappings to be established on boot.
>>
>> Hopefully, the new properties will fix the holes in the current implementation
>> with regard to hot-add.
> 
> Yeah, that's the only proper fix here.

I incorporated other changes to try to fill in the possible map more accurately,
and with only the above modification to workqueue.c, I ran a hot-add CPU test
to add 8 VPs to a powerpc Shared CPU configuration.  It produced a lot of the
warning messages -- a lot more than I was expecting.  But it did not crash.
I have attached a compressed copy of the log file, here.

> 
> Thanks.
> 

Thanks.

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:       (512) 466-0650
mwb@linux.vnet.ibm.com

[-- Attachment #2: boot-run.log.xz --]
[-- Type: application/x-xz, Size: 14284 bytes --]

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH] workqueue: Ensure that cpumask set for pools created after boot
  2017-06-13 20:04                                   ` Michael Bringmann
@ 2017-06-13 20:10                                     ` Tejun Heo
  2017-06-28 21:15                                       ` Michael Bringmann
  0 siblings, 1 reply; 26+ messages in thread
From: Tejun Heo @ 2017-06-13 20:10 UTC (permalink / raw)
  To: Michael Bringmann; +Cc: Lai Jiangshan, linux-kernel, Nathan Fontenot

Hello,

On Tue, Jun 13, 2017 at 03:04:30PM -0500, Michael Bringmann wrote:
> @@ -3564,19 +3564,28 @@ static struct pool_workqueue *alloc_unbound_pwq(struct workqueue_struct *wq,
>  static bool wq_calc_node_cpumask(const struct workqueue_attrs *attrs, int node,
>  				 int cpu_going_down, cpumask_t *cpumask)
>  {
> +	cpumask_t	onl_targ_cm;
> +
>  	if (!wq_numa_enabled || attrs->no_numa)
>  		goto use_dfl;
>  
>  	/* does @node have any online CPUs @attrs wants? */
> -	cpumask_and(cpumask, cpumask_of_node(node), attrs->cpumask);
> +	cpumask_and(&onl_targ_cm, cpumask_of_node(node), attrs->cpumask);
>  	if (cpu_going_down >= 0)
> -		cpumask_clear_cpu(cpu_going_down, cpumask);
> +		cpumask_clear_cpu(cpu_going_down, &onl_targ_cm);
>  
> -	if (cpumask_empty(cpumask))
> +	if (cpumask_empty(&onl_targ_cm))
>  		goto use_dfl;
>  
>  	/* yeap, return possible CPUs in @node that @attrs wants */
>  	cpumask_and(cpumask, attrs->cpumask, wq_numa_possible_cpumask[node]);
> +
> +	if (cpumask_weight(&onl_targ_cm) > cpumask_weight(cpumask)) {
> +		printk(KERN_INFO "WARNING: WQ cpumask: onl intersect > "
> +						"possible intersect\n");
> +		return false;
> +	}

Yeah, alternatively, we can just add right before returning,

	if (WARN_ON(cpumask_empty(cpumask)))
		return false;

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] workqueue: Ensure that cpumask set for pools created after boot
  2017-06-13 20:10                                     ` Tejun Heo
@ 2017-06-28 21:15                                       ` Michael Bringmann
  2017-06-28 21:24                                         ` Tejun Heo
  0 siblings, 1 reply; 26+ messages in thread
From: Michael Bringmann @ 2017-06-28 21:15 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Lai Jiangshan, linux-kernel, Nathan Fontenot

I will try that patch tomorrow.  My only concern about that is the use of WARN_ON().
As I may have mentioned in my note of 6/27, I saw about 600 instances of the warning
message just during boot of the PowerPC kernel.  I doubt that we want to see that on
an ongoing basis.

Michael

On 06/13/2017 03:10 PM, Tejun Heo wrote:
> Hello,
> 
> On Tue, Jun 13, 2017 at 03:04:30PM -0500, Michael Bringmann wrote:
>> @@ -3564,19 +3564,28 @@ static struct pool_workqueue *alloc_unbound_pwq(struct workqueue_struct *wq,
>>  static bool wq_calc_node_cpumask(const struct workqueue_attrs *attrs, int node,
>>  				 int cpu_going_down, cpumask_t *cpumask)
>>  {
>> +	cpumask_t	onl_targ_cm;
>> +
>>  	if (!wq_numa_enabled || attrs->no_numa)
>>  		goto use_dfl;
>>  
>>  	/* does @node have any online CPUs @attrs wants? */
>> -	cpumask_and(cpumask, cpumask_of_node(node), attrs->cpumask);
>> +	cpumask_and(&onl_targ_cm, cpumask_of_node(node), attrs->cpumask);
>>  	if (cpu_going_down >= 0)
>> -		cpumask_clear_cpu(cpu_going_down, cpumask);
>> +		cpumask_clear_cpu(cpu_going_down, &onl_targ_cm);
>>  
>> -	if (cpumask_empty(cpumask))
>> +	if (cpumask_empty(&onl_targ_cm))
>>  		goto use_dfl;
>>  
>>  	/* yeap, return possible CPUs in @node that @attrs wants */
>>  	cpumask_and(cpumask, attrs->cpumask, wq_numa_possible_cpumask[node]);
>> +
>> +	if (cpumask_weight(&onl_targ_cm) > cpumask_weight(cpumask)) {
>> +		printk(KERN_INFO "WARNING: WQ cpumask: onl intersect > "
>> +						"possible intersect\n");
>> +		return false;
>> +	}
> 
> Yeah, alternatively, we can just add right before returning,
> 
> 	if (WARN_ON(cpumask_empty(cpumask)))
> 		return false;
> 
> Thanks.
> 

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:       (512) 466-0650
mwb@linux.vnet.ibm.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] workqueue: Ensure that cpumask set for pools created after boot
  2017-06-28 21:15                                       ` Michael Bringmann
@ 2017-06-28 21:24                                         ` Tejun Heo
  2017-07-26 15:25                                           ` Michael Bringmann
  0 siblings, 1 reply; 26+ messages in thread
From: Tejun Heo @ 2017-06-28 21:24 UTC (permalink / raw)
  To: Michael Bringmann; +Cc: Lai Jiangshan, linux-kernel, Nathan Fontenot

On Wed, Jun 28, 2017 at 04:15:09PM -0500, Michael Bringmann wrote:
> I will try that patch tomorrow.  My only concern about that is the use of WARN_ON().
> As I may have mentioned in my note of 6/27, I saw about 600 instances of the warning
> message just during boot of the PowerPC kernel.  I doubt that we want to see that on
> an ongoing basis.

Yeah, we can either do pr_err_once() or WARN_ON_ONCE(), so that we
don't print that out constantly.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] workqueue: Ensure that cpumask set for pools created after boot
  2017-06-28 21:24                                         ` Tejun Heo
@ 2017-07-26 15:25                                           ` Michael Bringmann
  2017-07-26 19:16                                             ` Tejun Heo
  0 siblings, 1 reply; 26+ messages in thread
From: Michael Bringmann @ 2017-07-26 15:25 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Lai Jiangshan, linux-kernel, Nathan Fontenot, Michael Bringmann

Hello, Tejun:
    Do you need anything else from me regarding this patch?
Or are you good to commit it upstream?
    Thanks.
Michael

On 06/28/2017 04:24 PM, Tejun Heo wrote:
> On Wed, Jun 28, 2017 at 04:15:09PM -0500, Michael Bringmann wrote:
>> I will try that patch tomorrow.  My only concern about that is the use of WARN_ON().
>> As I may have mentioned in my note of 6/27, I saw about 600 instances of the warning
>> message just during boot of the PowerPC kernel.  I doubt that we want to see that on
>> an ongoing basis.
> 
> Yeah, we can either do pr_err_once() or WARN_ON_ONCE(), so that we
> don't print that out constantly.
> 
> Thanks.
> 

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:       (512) 466-0650
mwb@linux.vnet.ibm.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] workqueue: Ensure that cpumask set for pools created after boot
  2017-07-26 15:25                                           ` Michael Bringmann
@ 2017-07-26 19:16                                             ` Tejun Heo
  2017-07-27 17:04                                               ` Michael Bringmann
  0 siblings, 1 reply; 26+ messages in thread
From: Tejun Heo @ 2017-07-26 19:16 UTC (permalink / raw)
  To: Michael Bringmann; +Cc: Lai Jiangshan, linux-kernel, Nathan Fontenot

On Wed, Jul 26, 2017 at 10:25:08AM -0500, Michael Bringmann wrote:
> Hello, Tejun:
>     Do you need anything else from me regarding this patch?
> Or are you good to commit it upstream?
>     Thanks.

Hmmm... you were planning to try it and we wanted to convert it to
WARN_ONCE?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] workqueue: Ensure that cpumask set for pools created after boot
  2017-07-26 19:16                                             ` Tejun Heo
@ 2017-07-27 17:04                                               ` Michael Bringmann
  0 siblings, 0 replies; 26+ messages in thread
From: Michael Bringmann @ 2017-07-27 17:04 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Lai Jiangshan, linux-kernel, Nathan Fontenot

Sorry, I did try it.  I must have forgotten to notify you of its success.
Will next post an updated patch using 'pr_warn_once'.  It prints a message
during system initialization, by the way.

Thanks.


On 07/26/2017 02:16 PM, Tejun Heo wrote:
> On Wed, Jul 26, 2017 at 10:25:08AM -0500, Michael Bringmann wrote:
>> Hello, Tejun:
>>     Do you need anything else from me regarding this patch?
>> Or are you good to commit it upstream?
>>     Thanks.
> 
> Hmmm... you were planning to try it and we wanted to convert it to
> WARN_ONCE?
> 
> Thanks.
> 

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:       (512) 466-0650
mwb@linux.vnet.ibm.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2017-07-27 17:04 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-10 16:48 [PATCH] workqueue: Ensure that cpumask set for pools created after boot Michael Bringmann
2017-05-10 17:33 ` Tejun Heo
2017-05-15 15:48   ` Michael Bringmann
2017-05-16 15:55     ` Tejun Heo
2017-05-23 19:44       ` Michael Bringmann
2017-05-23 19:49         ` Tejun Heo
2017-05-23 20:09           ` Michael Bringmann
2017-05-23 20:10             ` Tejun Heo
2017-05-24 16:30               ` Michael Bringmann
2017-05-24 23:39               ` Michael Bringmann
2017-05-25 15:03                 ` Tejun Heo
2017-05-25 15:07                   ` Tejun Heo
2017-05-25 15:30                     ` Michael Bringmann
2017-06-06 16:18                       ` Michael Bringmann
2017-06-06 18:09                         ` Tejun Heo
2017-06-12 14:47                           ` Michael Bringmann
2017-06-12 16:14                             ` Tejun Heo
2017-06-12 17:10                               ` Michael Bringmann
2017-06-12 17:32                                 ` Tejun Heo
2017-06-13 20:04                                   ` Michael Bringmann
2017-06-13 20:10                                     ` Tejun Heo
2017-06-28 21:15                                       ` Michael Bringmann
2017-06-28 21:24                                         ` Tejun Heo
2017-07-26 15:25                                           ` Michael Bringmann
2017-07-26 19:16                                             ` Tejun Heo
2017-07-27 17:04                                               ` Michael Bringmann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).