* [PATCH] workqueue: initialize cpumask of wq_numa_possible_cpumask
@ 2014-07-06 17:21 Yasuaki Ishimatsu
  2014-07-07  0:19 ` Lai Jiangshan
  0 siblings, 1 reply; 7+ messages in thread
From: Yasuaki Ishimatsu @ 2014-07-06 17:21 UTC (permalink / raw)
  To: tj, laijs; +Cc: linux-kernel

When hot-adding and onlining a CPU, a kernel panic occurs with the following
call trace.

 BUG: unable to handle kernel paging request at 0000000000001d08
 IP: [<ffffffff8114acfd>] __alloc_pages_nodemask+0x9d/0xb10
 PGD 0
 Oops: 0000 [#1] SMP
 ...
 Call Trace:
  [<ffffffff812b8745>] ? cpumask_next_and+0x35/0x50
  [<ffffffff810a3283>] ? find_busiest_group+0x113/0x8f0
  [<ffffffff81193bc9>] ? deactivate_slab+0x349/0x3c0
  [<ffffffff811926f1>] new_slab+0x91/0x300
  [<ffffffff815de95a>] __slab_alloc+0x2bb/0x482
  [<ffffffff8105bc1c>] ? copy_process.part.25+0xfc/0x14c0
  [<ffffffff810a3c78>] ? load_balance+0x218/0x890
  [<ffffffff8101a679>] ? sched_clock+0x9/0x10
  [<ffffffff81105ba9>] ? trace_clock_local+0x9/0x10
  [<ffffffff81193d1c>] kmem_cache_alloc_node+0x8c/0x200
  [<ffffffff8105bc1c>] copy_process.part.25+0xfc/0x14c0
  [<ffffffff81114d0d>] ? trace_buffer_unlock_commit+0x4d/0x60
  [<ffffffff81085a80>] ? kthread_create_on_node+0x140/0x140
  [<ffffffff8105d0ec>] do_fork+0xbc/0x360
  [<ffffffff8105d3b6>] kernel_thread+0x26/0x30
  [<ffffffff81086652>] kthreadd+0x2c2/0x300
  [<ffffffff81086390>] ? kthread_create_on_cpu+0x60/0x60
  [<ffffffff815f20ec>] ret_from_fork+0x7c/0xb0
  [<ffffffff81086390>] ? kthread_create_on_cpu+0x60/0x60

In my investigation, I found that the root cause is wq_numa_possible_cpumask.
All entries of wq_numa_possible_cpumask are allocated with
alloc_cpumask_var_node(), which does not zero the memory, and they are then
used without being initialized. So these entries contain garbage values.

When a CPU is hot-added and onlined, wq_update_unbound_numa() is called.
wq_update_unbound_numa() calls alloc_unbound_pwq(), which in turn calls
get_unbound_pool(). In get_unbound_pool(), worker_pool->node is set as
follows:

#kernel/workqueue.c
3592         /* if cpumask is contained inside a NUMA node, we belong to that node */
3593         if (wq_numa_enabled) {
3594                 for_each_node(node) {
3595                         if (cpumask_subset(pool->attrs->cpumask,
3596                                            wq_numa_possible_cpumask[node])) {
3597                                 pool->node = node;
3598                                 break;
3599                         }
3600                 }
3601         }

But wq_numa_possible_cpumask[node] does not hold the correct cpumask, so the
wrong node is selected. As a result, the kernel panics.
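
To make the failure mode concrete, here is a toy userspace illustration using
made-up two-node masks (the values are hypothetical, not data from the crashed
machine); the stale bits in node 0's never-zeroed mask make the subset test
pick the wrong node:

	#include <stdio.h>

	int main(void)
	{
		/* pool bound to CPUs 4-7, which belong to node 1 */
		unsigned long pool_mask = 0xf0UL;

		unsigned long node_mask[2] = {
			0xffUL,	/* node 0: mask never zeroed, stale garbage bits */
			0xf0UL,	/* node 1: populated correctly (CPUs 4-7) */
		};
		int node;

		/* same subset test that get_unbound_pool() performs */
		for (node = 0; node < 2; node++) {
			if ((pool_mask & ~node_mask[node]) == 0) {
				/* prints "pool->node = 0", i.e. the wrong node */
				printf("pool->node = %d\n", node);
				return 0;
			}
		}
		printf("no node matched\n");
		return 0;
	}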

With this patch, all entries of wq_numa_possible_cpumask are allocated with
zalloc_cpumask_var_node() so that they start out zeroed, and the panic no
longer occurs.

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
---
 kernel/workqueue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 6203d29..b393ded 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3338,7 +3338,7 @@ struct workqueue_attrs *alloc_workqueue_attrs(gfp_t gfp_mask)
 	attrs = kzalloc(sizeof(*attrs), gfp_mask);
 	if (!attrs)
 		goto fail;
-	if (!alloc_cpumask_var(&attrs->cpumask, gfp_mask))
+	if (!zalloc_cpumask_var(&attrs->cpumask, gfp_mask))
 		goto fail;

 	cpumask_copy(attrs->cpumask, cpu_possible_mask);



* Re: [PATCH] workqueue: initialize cpumask of wq_numa_possible_cpumask
  2014-07-06 17:21 [PATCH] workqueue: initialize cpumask of wq_numa_possible_cpumask Yasuaki Ishimatsu
@ 2014-07-07  0:19 ` Lai Jiangshan
  2014-07-07  0:33   ` Yasuaki Ishimatsu
  2014-07-07  0:47   ` [resend PATCH] " Yasuaki Ishimatsu
  0 siblings, 2 replies; 7+ messages in thread
From: Lai Jiangshan @ 2014-07-07  0:19 UTC (permalink / raw)
  To: Yasuaki Ishimatsu; +Cc: tj, linux-kernel

On 07/07/2014 01:21 AM, Yasuaki Ishimatsu wrote:
> When hot-adding and onlining CPU, kernel panic occurs, showing following
> call trace.
> 
>  BUG: unable to handle kernel paging request at 0000000000001d08
>  IP: [<ffffffff8114acfd>] __alloc_pages_nodemask+0x9d/0xb10
>  PGD 0
>  Oops: 0000 [#1] SMP
>  ...
>  Call Trace:
>   [<ffffffff812b8745>] ? cpumask_next_and+0x35/0x50
>   [<ffffffff810a3283>] ? find_busiest_group+0x113/0x8f0
>   [<ffffffff81193bc9>] ? deactivate_slab+0x349/0x3c0
>   [<ffffffff811926f1>] new_slab+0x91/0x300
>   [<ffffffff815de95a>] __slab_alloc+0x2bb/0x482
>   [<ffffffff8105bc1c>] ? copy_process.part.25+0xfc/0x14c0
>   [<ffffffff810a3c78>] ? load_balance+0x218/0x890
>   [<ffffffff8101a679>] ? sched_clock+0x9/0x10
>   [<ffffffff81105ba9>] ? trace_clock_local+0x9/0x10
>   [<ffffffff81193d1c>] kmem_cache_alloc_node+0x8c/0x200
>   [<ffffffff8105bc1c>] copy_process.part.25+0xfc/0x14c0
>   [<ffffffff81114d0d>] ? trace_buffer_unlock_commit+0x4d/0x60
>   [<ffffffff81085a80>] ? kthread_create_on_node+0x140/0x140
>   [<ffffffff8105d0ec>] do_fork+0xbc/0x360
>   [<ffffffff8105d3b6>] kernel_thread+0x26/0x30
>   [<ffffffff81086652>] kthreadd+0x2c2/0x300
>   [<ffffffff81086390>] ? kthread_create_on_cpu+0x60/0x60
>   [<ffffffff815f20ec>] ret_from_fork+0x7c/0xb0
>   [<ffffffff81086390>] ? kthread_create_on_cpu+0x60/0x60
> 
> In my investigation, I found the root cause is wq_numa_possible_cpumask.
> All entries of wq_numa_possible_cpumask is allocated by
> alloc_cpumask_var_node(). And these entries are used without initializing.
> So these entries have wrong value.
> 
> When hot-adding and onlining CPU, wq_update_unbound_numa() is called.
> wq_update_unbound_numa() calls alloc_unbound_pwq(). And alloc_unbound_pwq()
> calls get_unbound_pool(). In get_unbound_pool(), worker_pool->node is set
> as follow:
> 
> #kernel/workqueue.c
> 3592         /* if cpumask is contained inside a NUMA node, we belong to that node */
> 3593         if (wq_numa_enabled) {
> 3594                 for_each_node(node) {
> 3595                         if (cpumask_subset(pool->attrs->cpumask,
> 3596                                            wq_numa_possible_cpumask[node])) {
> 3597                                 pool->node = node;
> 3598                                 break;
> 3599                         }
> 3600                 }
> 3601         }
> 
> But wq_numa_possible_cpumask[node] does not have correct cpumask. So, wrong
> node is selected. As a result, kernel panic occurs.
> 
> By this patch, all entries of wq_numa_possible_cpumask are allocated by
> zalloc_cpumask_var_node to initialize them. And the panic disappeared.
> 
> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

Hi, Yasuaki

All cpumasks in the wq_numa_possible_cpumask array are allocated in
wq_numa_init():

	for_each_node(node)
		BUG_ON(!alloc_cpumask_var_node(&tbl[node], GFP_KERNEL,
				node_online(node) ? node : NUMA_NO_NODE));

	[snip...]

	wq_numa_possible_cpumask = tbl;
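
The rest of wq_numa_init() then only ever sets bits in those masks; roughly
like this (paraphrased from memory, not the exact code):

	for_each_possible_cpu(cpu) {
		node = cpu_to_node(cpu);
		if (WARN_ON(node == NUMA_NO_NODE))
			return;
		/* bits are only set here; garbage left over from an
		 * unzeroed allocation is never cleared */
		cpumask_set_cpu(cpu, tbl[node]);
	}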

I don't see how this patch zeroes all the entries of
wq_numa_possible_cpumask: the hunk changes alloc_workqueue_attrs(), which
allocates attrs->cpumask, not the wq_numa_possible_cpumask table.

Or perhaps I have misunderstood.

Thanks,
Lai

> ---
>  kernel/workqueue.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 6203d29..b393ded 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -3338,7 +3338,7 @@ struct workqueue_attrs *alloc_workqueue_attrs(gfp_t gfp_mask)
>  	attrs = kzalloc(sizeof(*attrs), gfp_mask);
>  	if (!attrs)
>  		goto fail;
> -	if (!alloc_cpumask_var(&attrs->cpumask, gfp_mask))
> +	if (!zalloc_cpumask_var(&attrs->cpumask, gfp_mask))
>  		goto fail;
> 
>  	cpumask_copy(attrs->cpumask, cpu_possible_mask);
> 
> .
> 



* Re: [PATCH] workqueue: initialize cpumask of wq_numa_possible_cpumask
  2014-07-07  0:19 ` Lai Jiangshan
@ 2014-07-07  0:33   ` Yasuaki Ishimatsu
  2014-07-07  1:04     ` Lai Jiangshan
  2014-07-07  0:47   ` [resend PATCH] " Yasuaki Ishimatsu
  1 sibling, 1 reply; 7+ messages in thread
From: Yasuaki Ishimatsu @ 2014-07-07  0:33 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: tj, linux-kernel

(2014/07/07 9:19), Lai Jiangshan wrote:
> On 07/07/2014 01:21 AM, Yasuaki Ishimatsu wrote:
>> When hot-adding and onlining CPU, kernel panic occurs, showing following
>> call trace.
>>
>>   BUG: unable to handle kernel paging request at 0000000000001d08
>>   IP: [<ffffffff8114acfd>] __alloc_pages_nodemask+0x9d/0xb10
>>   PGD 0
>>   Oops: 0000 [#1] SMP
>>   ...
>>   Call Trace:
>>    [<ffffffff812b8745>] ? cpumask_next_and+0x35/0x50
>>    [<ffffffff810a3283>] ? find_busiest_group+0x113/0x8f0
>>    [<ffffffff81193bc9>] ? deactivate_slab+0x349/0x3c0
>>    [<ffffffff811926f1>] new_slab+0x91/0x300
>>    [<ffffffff815de95a>] __slab_alloc+0x2bb/0x482
>>    [<ffffffff8105bc1c>] ? copy_process.part.25+0xfc/0x14c0
>>    [<ffffffff810a3c78>] ? load_balance+0x218/0x890
>>    [<ffffffff8101a679>] ? sched_clock+0x9/0x10
>>    [<ffffffff81105ba9>] ? trace_clock_local+0x9/0x10
>>    [<ffffffff81193d1c>] kmem_cache_alloc_node+0x8c/0x200
>>    [<ffffffff8105bc1c>] copy_process.part.25+0xfc/0x14c0
>>    [<ffffffff81114d0d>] ? trace_buffer_unlock_commit+0x4d/0x60
>>    [<ffffffff81085a80>] ? kthread_create_on_node+0x140/0x140
>>    [<ffffffff8105d0ec>] do_fork+0xbc/0x360
>>    [<ffffffff8105d3b6>] kernel_thread+0x26/0x30
>>    [<ffffffff81086652>] kthreadd+0x2c2/0x300
>>    [<ffffffff81086390>] ? kthread_create_on_cpu+0x60/0x60
>>    [<ffffffff815f20ec>] ret_from_fork+0x7c/0xb0
>>    [<ffffffff81086390>] ? kthread_create_on_cpu+0x60/0x60
>>
>> In my investigation, I found the root cause is wq_numa_possible_cpumask.
>> All entries of wq_numa_possible_cpumask is allocated by
>> alloc_cpumask_var_node(). And these entries are used without initializing.
>> So these entries have wrong value.
>>
>> When hot-adding and onlining CPU, wq_update_unbound_numa() is called.
>> wq_update_unbound_numa() calls alloc_unbound_pwq(). And alloc_unbound_pwq()
>> calls get_unbound_pool(). In get_unbound_pool(), worker_pool->node is set
>> as follow:
>>
>> #kernel/workqueue.c
>> 3592         /* if cpumask is contained inside a NUMA node, we belong to that node */
>> 3593         if (wq_numa_enabled) {
>> 3594                 for_each_node(node) {
>> 3595                         if (cpumask_subset(pool->attrs->cpumask,
>> 3596                                            wq_numa_possible_cpumask[node])) {
>> 3597                                 pool->node = node;
>> 3598                                 break;
>> 3599                         }
>> 3600                 }
>> 3601         }
>>
>> But wq_numa_possible_cpumask[node] does not have correct cpumask. So, wrong
>> node is selected. As a result, kernel panic occurs.
>>
>> By this patch, all entries of wq_numa_possible_cpumask are allocated by
>> zalloc_cpumask_var_node to initialize them. And the panic disappeared.
>>
>> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>
> Hi, Yasuaki
>
> All cpumasks in the wq_numa_possible_cpumask array are allocated in
> wq_numa_init():
>
> 	for_each_node(node)
> 		BUG_ON(!alloc_cpumask_var_node(&tbl[node], GFP_KERNEL,
> 				node_online(node) ? node : NUMA_NO_NODE));
>
> 	[snip...]
>
> 	wq_numa_possible_cpumask = tbl;
>
> I didn't find out how does this patch make the all entries of
> wq_numa_possible_cpumask zeroed.

Sorry, my mistake. I will resend soon.

Thanks,
Yasuaki Ishimatsu.

>
> Or I misunderstood.
>
> Thanks,
> Lai
>
>> ---
>>   kernel/workqueue.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>> index 6203d29..b393ded 100644
>> --- a/kernel/workqueue.c
>> +++ b/kernel/workqueue.c
>> @@ -3338,7 +3338,7 @@ struct workqueue_attrs *alloc_workqueue_attrs(gfp_t gfp_mask)
>>   	attrs = kzalloc(sizeof(*attrs), gfp_mask);
>>   	if (!attrs)
>>   		goto fail;
>> -	if (!alloc_cpumask_var(&attrs->cpumask, gfp_mask))
>> +	if (!zalloc_cpumask_var(&attrs->cpumask, gfp_mask))
>>   		goto fail;
>>
>>   	cpumask_copy(attrs->cpumask, cpu_possible_mask);
>>
>> .
>>
>




* [resend PATCH] workqueue: initialize cpumask of wq_numa_possible_cpumask
  2014-07-07  0:19 ` Lai Jiangshan
  2014-07-07  0:33   ` Yasuaki Ishimatsu
@ 2014-07-07  0:47   ` Yasuaki Ishimatsu
  2014-07-07 14:01     ` [PATCH wq/for-3.16-fixes] workqueue: zero cpumask of wq_numa_possible_cpumask on init Tejun Heo
  1 sibling, 1 reply; 7+ messages in thread
From: Yasuaki Ishimatsu @ 2014-07-07  0:47 UTC (permalink / raw)
  To: Lai Jiangshan, tj; +Cc: linux-kernel

When hot-adding and onlining a CPU, a kernel panic occurs with the following
call trace.

  BUG: unable to handle kernel paging request at 0000000000001d08
  IP: [<ffffffff8114acfd>] __alloc_pages_nodemask+0x9d/0xb10
  PGD 0
  Oops: 0000 [#1] SMP
  ...
  Call Trace:
   [<ffffffff812b8745>] ? cpumask_next_and+0x35/0x50
   [<ffffffff810a3283>] ? find_busiest_group+0x113/0x8f0
   [<ffffffff81193bc9>] ? deactivate_slab+0x349/0x3c0
   [<ffffffff811926f1>] new_slab+0x91/0x300
   [<ffffffff815de95a>] __slab_alloc+0x2bb/0x482
   [<ffffffff8105bc1c>] ? copy_process.part.25+0xfc/0x14c0
   [<ffffffff810a3c78>] ? load_balance+0x218/0x890
   [<ffffffff8101a679>] ? sched_clock+0x9/0x10
   [<ffffffff81105ba9>] ? trace_clock_local+0x9/0x10
   [<ffffffff81193d1c>] kmem_cache_alloc_node+0x8c/0x200
   [<ffffffff8105bc1c>] copy_process.part.25+0xfc/0x14c0
   [<ffffffff81114d0d>] ? trace_buffer_unlock_commit+0x4d/0x60
   [<ffffffff81085a80>] ? kthread_create_on_node+0x140/0x140
   [<ffffffff8105d0ec>] do_fork+0xbc/0x360
   [<ffffffff8105d3b6>] kernel_thread+0x26/0x30
   [<ffffffff81086652>] kthreadd+0x2c2/0x300
   [<ffffffff81086390>] ? kthread_create_on_cpu+0x60/0x60
   [<ffffffff815f20ec>] ret_from_fork+0x7c/0xb0
   [<ffffffff81086390>] ? kthread_create_on_cpu+0x60/0x60

In my investigation, I found that the root cause is wq_numa_possible_cpumask.
All entries of wq_numa_possible_cpumask are allocated with
alloc_cpumask_var_node(), which does not zero the memory, and they are then
used without being initialized. So these entries contain garbage values.

When a CPU is hot-added and onlined, wq_update_unbound_numa() is called.
wq_update_unbound_numa() calls alloc_unbound_pwq(), which in turn calls
get_unbound_pool(). In get_unbound_pool(), worker_pool->node is set as
follows:

#kernel/workqueue.c
3592         /* if cpumask is contained inside a NUMA node, we belong to that node */
3593         if (wq_numa_enabled) {
3594                 for_each_node(node) {
3595                         if (cpumask_subset(pool->attrs->cpumask,
3596                                            wq_numa_possible_cpumask[node])) {
3597                                 pool->node = node;
3598                                 break;
3599                         }
3600                 }
3601         }

But wq_numa_possible_cpumask[node] does not hold the correct cpumask, so the
wrong node is selected. As a result, the kernel panics.

With this patch, all entries of wq_numa_possible_cpumask are allocated with
zalloc_cpumask_var_node() so that they start out zeroed, and the panic no
longer occurs.

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
---
  kernel/workqueue.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index b393ded..81e6511 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4879,7 +4879,7 @@ static void __init wq_numa_init(void)
  	BUG_ON(!tbl);

  	for_each_node(node)
-		BUG_ON(!alloc_cpumask_var_node(&tbl[node], GFP_KERNEL,
+		BUG_ON(!zalloc_cpumask_var_node(&tbl[node], GFP_KERNEL,
  				node_online(node) ? node : NUMA_NO_NODE));

  	for_each_possible_cpu(cpu) {



* Re: [PATCH] workqueue: initialize cpumask of wq_numa_possible_cpumask
  2014-07-07  0:33   ` Yasuaki Ishimatsu
@ 2014-07-07  1:04     ` Lai Jiangshan
  2014-07-07  1:20       ` Yasuaki Ishimatsu
  0 siblings, 1 reply; 7+ messages in thread
From: Lai Jiangshan @ 2014-07-07  1:04 UTC (permalink / raw)
  To: Yasuaki Ishimatsu; +Cc: tj, linux-kernel

On 07/07/2014 08:33 AM, Yasuaki Ishimatsu wrote:
> (2014/07/07 9:19), Lai Jiangshan wrote:
>> On 07/07/2014 01:21 AM, Yasuaki Ishimatsu wrote:
>>> When hot-adding and onlining CPU, kernel panic occurs, showing following
>>> call trace.
>>>
>>>   BUG: unable to handle kernel paging request at 0000000000001d08
>>>   IP: [<ffffffff8114acfd>] __alloc_pages_nodemask+0x9d/0xb10
>>>   PGD 0
>>>   Oops: 0000 [#1] SMP
>>>   ...
>>>   Call Trace:
>>>    [<ffffffff812b8745>] ? cpumask_next_and+0x35/0x50
>>>    [<ffffffff810a3283>] ? find_busiest_group+0x113/0x8f0
>>>    [<ffffffff81193bc9>] ? deactivate_slab+0x349/0x3c0
>>>    [<ffffffff811926f1>] new_slab+0x91/0x300
>>>    [<ffffffff815de95a>] __slab_alloc+0x2bb/0x482
>>>    [<ffffffff8105bc1c>] ? copy_process.part.25+0xfc/0x14c0
>>>    [<ffffffff810a3c78>] ? load_balance+0x218/0x890
>>>    [<ffffffff8101a679>] ? sched_clock+0x9/0x10
>>>    [<ffffffff81105ba9>] ? trace_clock_local+0x9/0x10
>>>    [<ffffffff81193d1c>] kmem_cache_alloc_node+0x8c/0x200
>>>    [<ffffffff8105bc1c>] copy_process.part.25+0xfc/0x14c0
>>>    [<ffffffff81114d0d>] ? trace_buffer_unlock_commit+0x4d/0x60
>>>    [<ffffffff81085a80>] ? kthread_create_on_node+0x140/0x140
>>>    [<ffffffff8105d0ec>] do_fork+0xbc/0x360
>>>    [<ffffffff8105d3b6>] kernel_thread+0x26/0x30
>>>    [<ffffffff81086652>] kthreadd+0x2c2/0x300
>>>    [<ffffffff81086390>] ? kthread_create_on_cpu+0x60/0x60
>>>    [<ffffffff815f20ec>] ret_from_fork+0x7c/0xb0
>>>    [<ffffffff81086390>] ? kthread_create_on_cpu+0x60/0x60
>>>
>>> In my investigation, I found the root cause is wq_numa_possible_cpumask.
>>> All entries of wq_numa_possible_cpumask is allocated by
>>> alloc_cpumask_var_node(). And these entries are used without initializing.
>>> So these entries have wrong value.
>>>
>>> When hot-adding and onlining CPU, wq_update_unbound_numa() is called.
>>> wq_update_unbound_numa() calls alloc_unbound_pwq(). And alloc_unbound_pwq()
>>> calls get_unbound_pool(). In get_unbound_pool(), worker_pool->node is set
>>> as follow:
>>>
>>> #kernel/workqueue.c
>>> 3592         /* if cpumask is contained inside a NUMA node, we belong to that node */
>>> 3593         if (wq_numa_enabled) {
>>> 3594                 for_each_node(node) {
>>> 3595                         if (cpumask_subset(pool->attrs->cpumask,
>>> 3596                                            wq_numa_possible_cpumask[node])) {
>>> 3597                                 pool->node = node;
>>> 3598                                 break;
>>> 3599                         }
>>> 3600                 }
>>> 3601         }
>>>
>>> But wq_numa_possible_cpumask[node] does not have correct cpumask. So, wrong
>>> node is selected. As a result, kernel panic occurs.
>>>
>>> By this patch, all entries of wq_numa_possible_cpumask are allocated by
>>> zalloc_cpumask_var_node to initialize them. And the panic disappeared.

Hi, Yasuaki

You said the panic disappeared with the old patch; how could that be, given
that the old patch turned out to be incorrect?

Does the panic occur rarely enough that it only appeared to be gone?

How did you test the new one?

From a review point of view, we definitely need to use
zalloc_cpumask_var_node() instead of alloc_cpumask_var_node() in
wq_numa_init().
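
For completeness, my understanding of the difference between the two
allocators (a sketch from memory of the CONFIG_CPUMASK_OFFSTACK=y case, not a
verbatim copy of lib/cpumask.c):

	bool zalloc_cpumask_var_node(cpumask_var_t *mask, gfp_t flags, int node)
	{
		/* same allocation, but __GFP_ZERO guarantees that the new
		 * mask comes back with every bit cleared */
		return alloc_cpumask_var_node(mask, flags | __GFP_ZERO, node);
	}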

So for the new patch:

Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>

Thanks,
Lai

>>>
>>> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>>
>> Hi, Yasuaki
>>
>> All cpumasks in the wq_numa_possible_cpumask array are allocated in
>> wq_numa_init():
>>
>>     for_each_node(node)
>>         BUG_ON(!alloc_cpumask_var_node(&tbl[node], GFP_KERNEL,
>>                 node_online(node) ? node : NUMA_NO_NODE));
>>
>>     [snip...]
>>
>>     wq_numa_possible_cpumask = tbl;
>>
>> I didn't find out how does this patch make the all entries of
>> wq_numa_possible_cpumask zeroed.
> 
> Sorry. I mistook. I will resend soon.
> 
> Thanks,
> Yasuaki Ishimatsu.
> 
>>
>> Or I misunderstood.
>>
>> Thanks,
>> Lai
>>
>>> ---
>>>   kernel/workqueue.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>>> index 6203d29..b393ded 100644
>>> --- a/kernel/workqueue.c
>>> +++ b/kernel/workqueue.c
>>> @@ -3338,7 +3338,7 @@ struct workqueue_attrs *alloc_workqueue_attrs(gfp_t gfp_mask)
>>>       attrs = kzalloc(sizeof(*attrs), gfp_mask);
>>>       if (!attrs)
>>>           goto fail;
>>> -    if (!alloc_cpumask_var(&attrs->cpumask, gfp_mask))
>>> +    if (!zalloc_cpumask_var(&attrs->cpumask, gfp_mask))
>>>           goto fail;
>>>
>>>       cpumask_copy(attrs->cpumask, cpu_possible_mask);
>>>
>>> .
>>>
>>
> 
> 
> .
> 



* Re: [PATCH] workqueue: initialize cpumask of wq_numa_possible_cpumask
  2014-07-07  1:04     ` Lai Jiangshan
@ 2014-07-07  1:20       ` Yasuaki Ishimatsu
  0 siblings, 0 replies; 7+ messages in thread
From: Yasuaki Ishimatsu @ 2014-07-07  1:20 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: tj, linux-kernel

(2014/07/07 10:04), Lai Jiangshan wrote:
> On 07/07/2014 08:33 AM, Yasuaki Ishimatsu wrote:
>> (2014/07/07 9:19), Lai Jiangshan wrote:
>>> On 07/07/2014 01:21 AM, Yasuaki Ishimatsu wrote:
>>>> When hot-adding and onlining CPU, kernel panic occurs, showing following
>>>> call trace.
>>>>
>>>>    BUG: unable to handle kernel paging request at 0000000000001d08
>>>>    IP: [<ffffffff8114acfd>] __alloc_pages_nodemask+0x9d/0xb10
>>>>    PGD 0
>>>>    Oops: 0000 [#1] SMP
>>>>    ...
>>>>    Call Trace:
>>>>     [<ffffffff812b8745>] ? cpumask_next_and+0x35/0x50
>>>>     [<ffffffff810a3283>] ? find_busiest_group+0x113/0x8f0
>>>>     [<ffffffff81193bc9>] ? deactivate_slab+0x349/0x3c0
>>>>     [<ffffffff811926f1>] new_slab+0x91/0x300
>>>>     [<ffffffff815de95a>] __slab_alloc+0x2bb/0x482
>>>>     [<ffffffff8105bc1c>] ? copy_process.part.25+0xfc/0x14c0
>>>>     [<ffffffff810a3c78>] ? load_balance+0x218/0x890
>>>>     [<ffffffff8101a679>] ? sched_clock+0x9/0x10
>>>>     [<ffffffff81105ba9>] ? trace_clock_local+0x9/0x10
>>>>     [<ffffffff81193d1c>] kmem_cache_alloc_node+0x8c/0x200
>>>>     [<ffffffff8105bc1c>] copy_process.part.25+0xfc/0x14c0
>>>>     [<ffffffff81114d0d>] ? trace_buffer_unlock_commit+0x4d/0x60
>>>>     [<ffffffff81085a80>] ? kthread_create_on_node+0x140/0x140
>>>>     [<ffffffff8105d0ec>] do_fork+0xbc/0x360
>>>>     [<ffffffff8105d3b6>] kernel_thread+0x26/0x30
>>>>     [<ffffffff81086652>] kthreadd+0x2c2/0x300
>>>>     [<ffffffff81086390>] ? kthread_create_on_cpu+0x60/0x60
>>>>     [<ffffffff815f20ec>] ret_from_fork+0x7c/0xb0
>>>>     [<ffffffff81086390>] ? kthread_create_on_cpu+0x60/0x60
>>>>
>>>> In my investigation, I found the root cause is wq_numa_possible_cpumask.
>>>> All entries of wq_numa_possible_cpumask is allocated by
>>>> alloc_cpumask_var_node(). And these entries are used without initializing.
>>>> So these entries have wrong value.
>>>>
>>>> When hot-adding and onlining CPU, wq_update_unbound_numa() is called.
>>>> wq_update_unbound_numa() calls alloc_unbound_pwq(). And alloc_unbound_pwq()
>>>> calls get_unbound_pool(). In get_unbound_pool(), worker_pool->node is set
>>>> as follow:
>>>>
>>>> #kernel/workqueue.c
>>>> 3592         /* if cpumask is contained inside a NUMA node, we belong to that node */
>>>> 3593         if (wq_numa_enabled) {
>>>> 3594                 for_each_node(node) {
>>>> 3595                         if (cpumask_subset(pool->attrs->cpumask,
>>>> 3596                                            wq_numa_possible_cpumask[node])) {
>>>> 3597                                 pool->node = node;
>>>> 3598                                 break;
>>>> 3599                         }
>>>> 3600                 }
>>>> 3601         }
>>>>
>>>> But wq_numa_possible_cpumask[node] does not have correct cpumask. So, wrong
>>>> node is selected. As a result, kernel panic occurs.
>>>>
>>>> By this patch, all entries of wq_numa_possible_cpumask are allocated by
>>>> zalloc_cpumask_var_node to initialize them. And the panic disappeared.
>
> Hi, Yasuaki
>

> You said the panic disappeared with the old patch, how did it happen
> since the old patch was considered incorrect?
>
> Did the panic happen so rarely that it was mistaken disappeared?
>
> How did you test the new one?

I tested the new patch and confirmed that the panic disappeared.

The patch is a one-liner, so after testing the correct patch (the new one), I
recreated it in my tree by hand instead of using git format-patch/am. In doing
so I mistakenly produced the old patch and sent that.

So I tested the new patch but never tested the old one.
Sorry for my mistake.

>
> In the point of review, we definitely need to use zalloc_cpumask_var_node()
> instead of alloc_cpumask_var_node() in wq_numa_init().
>
> So for the new patch:
>

> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Thanks for your review.

Thanks,
Yasuaki Ishimatsu

>
> Thanks,
> Lai
>
>>>>
>>>> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>>>
>>> Hi, Yasuaki
>>>
>>> All cpumasks in the wq_numa_possible_cpumask array are allocated in
>>> wq_numa_init():
>>>
>>>      for_each_node(node)
>>>          BUG_ON(!alloc_cpumask_var_node(&tbl[node], GFP_KERNEL,
>>>                  node_online(node) ? node : NUMA_NO_NODE));
>>>
>>>      [snip...]
>>>
>>>      wq_numa_possible_cpumask = tbl;
>>>
>>> I didn't find out how does this patch make the all entries of
>>> wq_numa_possible_cpumask zeroed.
>>
>> Sorry. I mistook. I will resend soon.
>>
>> Thanks,
>> Yasuaki Ishimatsu.
>>
>>>
>>> Or I misunderstood.
>>>
>>> Thanks,
>>> Lai
>>>
>>>> ---
>>>>    kernel/workqueue.c | 2 +-
>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>>>> index 6203d29..b393ded 100644
>>>> --- a/kernel/workqueue.c
>>>> +++ b/kernel/workqueue.c
>>>> @@ -3338,7 +3338,7 @@ struct workqueue_attrs *alloc_workqueue_attrs(gfp_t gfp_mask)
>>>>        attrs = kzalloc(sizeof(*attrs), gfp_mask);
>>>>        if (!attrs)
>>>>            goto fail;
>>>> -    if (!alloc_cpumask_var(&attrs->cpumask, gfp_mask))
>>>> +    if (!zalloc_cpumask_var(&attrs->cpumask, gfp_mask))
>>>>            goto fail;
>>>>
>>>>        cpumask_copy(attrs->cpumask, cpu_possible_mask);
>>>>
>>>> .
>>>>
>>>
>>
>>
>> .
>>
>




* [PATCH wq/for-3.16-fixes] workqueue: zero cpumask of wq_numa_possible_cpumask on init
  2014-07-07  0:47   ` [resend PATCH] " Yasuaki Ishimatsu
@ 2014-07-07 14:01     ` Tejun Heo
  0 siblings, 0 replies; 7+ messages in thread
From: Tejun Heo @ 2014-07-07 14:01 UTC (permalink / raw)
  To: Yasuaki Ishimatsu; +Cc: Lai Jiangshan, linux-kernel

Applied to wq/for-3.16-fixes with slightly updated $SUBJ.

Thanks.
------ 8< ------
From 5a6024f1604eef119cf3a6fa413fe0261a81a8f3 Mon Sep 17 00:00:00 2001
From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Date: Mon, 7 Jul 2014 09:56:48 -0400

When hot-adding and onlining a CPU, a kernel panic occurs with the following
call trace.

  BUG: unable to handle kernel paging request at 0000000000001d08
  IP: [<ffffffff8114acfd>] __alloc_pages_nodemask+0x9d/0xb10
  PGD 0
  Oops: 0000 [#1] SMP
  ...
  Call Trace:
   [<ffffffff812b8745>] ? cpumask_next_and+0x35/0x50
   [<ffffffff810a3283>] ? find_busiest_group+0x113/0x8f0
   [<ffffffff81193bc9>] ? deactivate_slab+0x349/0x3c0
   [<ffffffff811926f1>] new_slab+0x91/0x300
   [<ffffffff815de95a>] __slab_alloc+0x2bb/0x482
   [<ffffffff8105bc1c>] ? copy_process.part.25+0xfc/0x14c0
   [<ffffffff810a3c78>] ? load_balance+0x218/0x890
   [<ffffffff8101a679>] ? sched_clock+0x9/0x10
   [<ffffffff81105ba9>] ? trace_clock_local+0x9/0x10
   [<ffffffff81193d1c>] kmem_cache_alloc_node+0x8c/0x200
   [<ffffffff8105bc1c>] copy_process.part.25+0xfc/0x14c0
   [<ffffffff81114d0d>] ? trace_buffer_unlock_commit+0x4d/0x60
   [<ffffffff81085a80>] ? kthread_create_on_node+0x140/0x140
   [<ffffffff8105d0ec>] do_fork+0xbc/0x360
   [<ffffffff8105d3b6>] kernel_thread+0x26/0x30
   [<ffffffff81086652>] kthreadd+0x2c2/0x300
   [<ffffffff81086390>] ? kthread_create_on_cpu+0x60/0x60
   [<ffffffff815f20ec>] ret_from_fork+0x7c/0xb0
   [<ffffffff81086390>] ? kthread_create_on_cpu+0x60/0x60

In my investigation, I found that the root cause is wq_numa_possible_cpumask.
All entries of wq_numa_possible_cpumask are allocated with
alloc_cpumask_var_node(), which does not zero the memory, and they are then
used without being initialized. So these entries contain garbage values.

When a CPU is hot-added and onlined, wq_update_unbound_numa() is called.
wq_update_unbound_numa() calls alloc_unbound_pwq(), which in turn calls
get_unbound_pool(). In get_unbound_pool(), worker_pool->node is set as
follows:

3592         /* if cpumask is contained inside a NUMA node, we belong to that node */
3593         if (wq_numa_enabled) {
3594                 for_each_node(node) {
3595                         if (cpumask_subset(pool->attrs->cpumask,
3596                                            wq_numa_possible_cpumask[node])) {
3597                                 pool->node = node;
3598                                 break;
3599                         }
3600                 }
3601         }

But wq_numa_possible_cpumask[node] does not hold the correct cpumask, so the
wrong node is selected. As a result, the kernel panics.

With this patch, all entries of wq_numa_possible_cpumask are allocated with
zalloc_cpumask_var_node() so that they start out zeroed, and the panic no
longer occurs.

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
Fixes: bce903809ab3 ("workqueue: add wq_numa_tbl_len and wq_numa_possible_cpumask[]")
---
 kernel/workqueue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 6f5f9c7..35974ac 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4880,7 +4880,7 @@ static void __init wq_numa_init(void)
 	BUG_ON(!tbl);
 
 	for_each_node(node)
-		BUG_ON(!alloc_cpumask_var_node(&tbl[node], GFP_KERNEL,
+		BUG_ON(!zalloc_cpumask_var_node(&tbl[node], GFP_KERNEL,
 				node_online(node) ? node : NUMA_NO_NODE));
 
 	for_each_possible_cpu(cpu) {
-- 
1.9.3



end of thread, other threads:[~2014-07-07 14:01 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-06 17:21 [PATCH] workqueue: initialize cpumask of wq_numa_possible_cpumask Yasuaki Ishimatsu
2014-07-07  0:19 ` Lai Jiangshan
2014-07-07  0:33   ` Yasuaki Ishimatsu
2014-07-07  1:04     ` Lai Jiangshan
2014-07-07  1:20       ` Yasuaki Ishimatsu
2014-07-07  0:47   ` [resend PATCH] " Yasuaki Ishimatsu
2014-07-07 14:01     ` [PATCH wq/for-3.16-fixes] workqueue: zero cpumask of wq_numa_possible_cpumask on init Tejun Heo
