* Re: [PATCH] Hotplug: fix the bug that the system is down,when memory is not in node0 and cpu is logically hotadded.
  2015-06-17 15:46 [PATCH] Hotplug: fix the bug that the system is down,when memory is not in node0 and cpu is logically hotadded gongzg
@ 2015-06-17  8:19 ` Frans Klaver
  0 siblings, 0 replies; 5+ messages in thread
From: Frans Klaver @ 2015-06-17  8:19 UTC
  To: gongzg; +Cc: tj, linux-kernel, SongXiumiao

On Wed, Jun 17, 2015 at 5:46 PM, gongzg <gongzhaogang@inspur.com> wrote:
> From: GongZhaogang <gongzhaogang@inspur.com>
>
> By analysing the bug function call trace,we find that create_worker function
> will alloc the memory from node0.Because node0 is offline,the allocation is
> failed.Then we add a condition to ensure the node is online and
> system can alloc memory from a node that is online.
>
> Follow is the bug information:

>
> Signed-off-by: SongXiumiao <songxiumiao@inspur.com>

I still think you should add at least one space after your periods and
commas, and that you should be able to shorten your subject line.
Explain the exact bug in your commit message.

I also just noticed that the From/Signed-off-by names and emails don't
match. If SongXiumiao wrote the patch, From: should contain that name
and email. And since you are sending it, you will have to add your
sign-off as well.

So here's a suggestion of how your commit message could look:

Subject: [PATCH] hotplug: fix oops when adding cpu
From: SongXiumiao <songxiumiao@inspur.com>

If memory is not in node0 and a cpu is logically hotadded, the kernel oopses ...

By analyzing ...

Here's the backtrace:
...

Signed-off-by: SongXiumiao <songxiumiao@inspur.com>
Signed-off-by: GongZhaogang <gongzhaogang@inspur.com>

Thanks,
Frans


> ---
>  kernel/workqueue.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 586ad91..22d194c 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -3253,7 +3253,8 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
>         if (wq_numa_enabled) {
>                 for_each_node(node) {
>                         if (cpumask_subset(pool->attrs->cpumask,
> -                                          wq_numa_possible_cpumask[node])) {
> +                                   wq_numa_possible_cpumask[node])
> +                                              && node_online(node)) {
>                                 pool->node = node;
>                                 break;
>                         }
> --
> 1.7.1
>


* [PATCH] Hotplug: fix the bug that the system is down,when memory  is not in node0 and cpu is logically hotadded.
@ 2015-06-17 15:46 gongzg
  2015-06-17  8:19 ` Frans Klaver
  0 siblings, 1 reply; 5+ messages in thread
From: gongzg @ 2015-06-17 15:46 UTC
  To: tj, linux-kernel; +Cc: GongZhaogang, SongXiumiao

From: GongZhaogang <gongzhaogang@inspur.com>

By analysing the bug function call trace,we find that create_worker function 
will alloc the memory from node0.Because node0 is offline,the allocation is
failed.Then we add a condition to ensure the node is online and
system can alloc memory from a node that is online.

Follow is the bug information:
[root@localhost ~]# echo 1 > /sys/devices/system/cpu/cpu90/online
[  225.611209] smpboot: Booting Node 2 Processor 90 APIC 0x40
[18446744029.482996] kvm: enabling virtualization on CPU90
[  225.725503] TSC synchronization [CPU#43 -> CPU#90]:
[  225.730952] Measured 672516581900 cycles TSC warp between CPUs, turning off TSC clock.
[  225.739800] tsc: Marking TSC unstable due to check_tsc_sync_source failed
[  225.755126] BUG: unable to handle kernel paging request at 0000000000001b08
[  225.762931] IP: [<ffffffff81182597>] __alloc_pages_nodemask+0xb7/0x940
[  225.770247] PGD 449bb0067 PUD 46110e067 PMD 0
[  225.775248] Oops: 0000 [#1] SMP
[  225.778875] Modules linked in: xt_CHECKSUM ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntracd
[  225.868198] CPU: 43 PID: 5400 Comm: bash Not tainted 4.0.0-rc4-bug-fixed-remove #16
[  225.876754] Hardware name: Insyde Brickland/Type2 - Board Product Name1, BIOS Brickland.05.04.15.0024 02/28/2015
[  225.888122] task: ffff88045a3d8da0 ti: ffff880446120000 task.ti: ffff880446120000
[  225.896484] RIP: 0010:[<ffffffff81182597>]  [<ffffffff81182597>] __alloc_pages_nodemask+0xb7/0x940
[  225.906509] RSP: 0018:ffff880446123918  EFLAGS: 00010246
[  225.912443] RAX: 0000000000001b00 RBX: 0000000000000010 RCX: 0000000000000000
[  225.920416] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000002052d0
[  225.928388] RBP: ffff880446123a08 R08: ffff880460eca0c0 R09: 0000000060eca101
[  225.936361] R10: ffff88046d007300 R11: ffffffff8108dd31 R12: 000000000001002a
[  225.944334] R13: 00000000002052d0 R14: 0000000000000001 R15: 00000000000040d0
[  225.952306] FS:  00007f9386450740(0000) GS:ffff88046db60000(0000) knlGS:0000000000000000
[  225.961346] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  225.967765] CR2: 0000000000001b08 CR3: 00000004612a3000 CR4: 00000000001407e0
[  225.975735] Stack:
[  225.977981]  00000000002052d0 0000000000000000 0000000000000003 ffff88045a3d8da0
[  225.986291]  ffff880446123988 ffffffff811c7f81 ffff88045a3d8da0 0000000000000000
[  225.994597]  000080d000000002 ffff88046d005500 000000000003000f 002052d0002052d0
[  226.002904] Call Trace:
[  226.005645]  [<ffffffff811c7f81>] ? alloc_pages_current+0x91/0x100
[  226.012557]  [<ffffffff811d27c3>] ? deactivate_slab+0x383/0x400
[  226.019173]  [<ffffffff811d3957>] new_slab+0xa7/0x460
[  226.024826]  [<ffffffff81678c75>] __slab_alloc+0x310/0x470
[  226.030960]  [<ffffffff8130caf6>] ? get_from_free_list+0x46/0x60
[  226.037679]  [<ffffffff8108dd31>] ? alloc_worker+0x21/0x50
[  226.043812]  [<ffffffff811d46c1>] kmem_cache_alloc_node_trace+0x91/0x250
[  226.051299]  [<ffffffff8108dd31>] alloc_worker+0x21/0x50
[  226.057236]  [<ffffffff8108ff23>] create_worker+0x53/0x1e0
[  226.063357]  [<ffffffff81092092>] alloc_unbound_pwq+0x2a2/0x510
[  226.069974]  [<ffffffff810924b4>] wq_update_unbound_numa+0x1b4/0x220
[  226.077076]  [<ffffffff81092828>] workqueue_cpu_up_callback+0x308/0x3d0
[  226.084468]  [<ffffffff8109784e>] notifier_call_chain+0x4e/0x80
[  226.091084]  [<ffffffff8109796e>] __raw_notifier_call_chain+0xe/0x10
[  226.098189]  [<ffffffff810774f3>] cpu_notify+0x23/0x50
[  226.103929]  [<ffffffff81077878>] _cpu_up+0x188/0x1a0
[  226.109574]  [<ffffffff81077919>] cpu_up+0x89/0xb0
[  226.114923]  [<ffffffff8166fba0>] cpu_subsys_online+0x40/0x90
[  226.121350]  [<ffffffff814386dd>] device_online+0x6d/0xa0
[  226.127382]  [<ffffffff814387a5>] online_store+0x95/0xa0
[  226.133322]  [<ffffffff814358a8>] dev_attr_store+0x18/0x30
[  226.139457]  [<ffffffff8126d76d>] sysfs_kf_write+0x3d/0x50
[  226.145586]  [<ffffffff8126cc1a>] kernfs_fop_write+0x12a/0x180
[  226.152109]  [<ffffffff811f1bb7>] vfs_write+0xb7/0x1f0
[  226.157853]  [<ffffffff810232bc>] ? do_audit_syscall_entry+0x6c/0x70
[  226.164954]  [<ffffffff811f2835>] SyS_write+0x55/0xd0
[  226.170595]  [<ffffffff81681f09>] system_call_fastpath+0x12/0x17
[  226.177306] Code: 30 97 00 89 45 bc 83 e1 0f b8 22 01 32 01 01 c9 d3 f8 83 e0 03 89 9d 6c ff ff ff 83 e3 10 89 45 c0 0f 85 6d 01 00 00 48 8b 45 88 <48> 83 78 08 00 0f 84 51 01 00 00 b8 01
[  226.199175] RIP  [<ffffffff81182597>] __alloc_pages_nodemask+0xb7/0x940
[  226.206576]  RSP <ffff880446123918>
[  226.210471] CR2: 0000000000001b08
[  226.227939] ---[ end trace 30d753e1e1124696 ]---
[  226.412591] Kernel panic - not syncing: Fatal exception
[  226.430948] Kernel Offset: disabled
[  226.434845] drm_kms_helper: panic occurred, switching back to text console
[  226.618325] ---[ end Kernel panic - not syncing: Fatal exception
[  226.625047] ------------[ cut here ]------------
[  226.630213] WARNING: CPU: 43 PID: 5400 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x5d/0x60()
[  226.640999] Modules linked in: xt_CHECKSUM ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntracd
[  226.730275] CPU: 43 PID: 5400 Comm: bash Tainted: G      D         4.0.0-rc4-bug-fixed-remove #16
[  226.740189] Hardware name: Insyde Brickland/Type2 - Board Product Name1, BIOS Brickland.05.04.15.0024 02/28/2015
[  226.751558]  0000000000000000 00000000aa535e80 ffff88046db63d58 ffffffff8167aa08
[  226.759865]  0000000000000000 0000000000000000 ffff88046db63d98 ffffffff810772da
[  226.768173]  ffff88046db63d98 0000000000000000 ffff88046d615380 000000000000002b
[  226.776480] Call Trace:
[  226.779212]  <IRQ>  [<ffffffff8167aa08>] dump_stack+0x45/0x57
[  226.785657]  [<ffffffff810772da>] warn_slowpath_common+0x8a/0xc0
[  226.792367]  [<ffffffff8107740a>] warn_slowpath_null+0x1a/0x20
[  226.798886]  [<ffffffff8104a64d>] native_smp_send_reschedule+0x5d/0x60
[  226.806182]  [<ffffffff810b4fe5>] trigger_load_balance+0x145/0x1b0
[  226.813093]  [<ffffffff810a348c>] scheduler_tick+0x9c/0xe0
[  226.819228]  [<ffffffff810e0a21>] update_process_times+0x51/0x60
[  226.825946]  [<ffffffff810f0925>] tick_sched_handle.isra.18+0x25/0x60
[  226.833143]  [<ffffffff810f09a4>] tick_sched_timer+0x44/0x80
[  226.839467]  [<ffffffff810e1737>] __run_hrtimer+0x77/0x1d0
[  226.845590]  [<ffffffff810f0960>] ? tick_sched_handle.isra.18+0x60/0x60
[  226.852980]  [<ffffffff810e1b13>] hrtimer_interrupt+0x103/0x230
[  226.859596]  [<ffffffff8104d3d9>] local_apic_timer_interrupt+0x39/0x60
[  226.866883]  [<ffffffff81684d85>] smp_apic_timer_interrupt+0x45/0x60
[  226.873982]  [<ffffffff81682ded>] apic_timer_interrupt+0x6d/0x80
[  226.880690]  <EOI>  [<ffffffff81675abe>] ? panic+0x1c3/0x204
[  226.887036]  [<ffffffff81675ab7>] ? panic+0x1bc/0x204
[  226.892682]  [<ffffffff81018949>] oops_end+0x109/0x120
[  226.898422]  [<ffffffff81675285>] no_context+0x2ee/0x366
[  226.904359]  [<ffffffff81675370>] __bad_area_nosemaphore+0x73/0x1cc
[  226.911361]  [<ffffffff816756ae>] bad_area+0x44/0x4c
[  226.916910]  [<ffffffff81062b1a>] __do_page_fault+0x2ea/0x420
[  226.923331]  [<ffffffff81062c81>] do_page_fault+0x31/0x70
[  226.929364]  [<ffffffff81683f08>] page_fault+0x28/0x30
[  226.935106]  [<ffffffff8108dd31>] ? alloc_worker+0x21/0x50
[  226.941235]  [<ffffffff81182597>] ? __alloc_pages_nodemask+0xb7/0x940
[  226.948430]  [<ffffffff81182705>] ? __alloc_pages_nodemask+0x225/0x940
[  226.955725]  [<ffffffff811c7f81>] ? alloc_pages_current+0x91/0x100
[  226.962624]  [<ffffffff811d27c3>] ? deactivate_slab+0x383/0x400
[  226.969239]  [<ffffffff811d3957>] new_slab+0xa7/0x460
[  226.974885]  [<ffffffff81678c75>] __slab_alloc+0x310/0x470
[  226.981015]  [<ffffffff8130caf6>] ? get_from_free_list+0x46/0x60
[  226.987727]  [<ffffffff8108dd31>] ? alloc_worker+0x21/0x50
[  226.993851]  [<ffffffff811d46c1>] kmem_cache_alloc_node_trace+0x91/0x250
[  227.001340]  [<ffffffff8108dd31>] alloc_worker+0x21/0x50
[  227.007275]  [<ffffffff8108ff23>] create_worker+0x53/0x1e0
[  227.013404]  [<ffffffff81092092>] alloc_unbound_pwq+0x2a2/0x510
[  227.020019]  [<ffffffff810924b4>] wq_update_unbound_numa+0x1b4/0x220
[  227.027112]  [<ffffffff81092828>] workqueue_cpu_up_callback+0x308/0x3d0
[  227.034502]  [<ffffffff8109784e>] notifier_call_chain+0x4e/0x80
[  227.041117]  [<ffffffff8109796e>] __raw_notifier_call_chain+0xe/0x10
[  227.048219]  [<ffffffff810774f3>] cpu_notify+0x23/0x50
[  227.053961]  [<ffffffff81077878>] _cpu_up+0x188/0x1a0
[  227.059597]  [<ffffffff81077919>] cpu_up+0x89/0xb0
[  227.064950]  [<ffffffff8166fba0>] cpu_subsys_online+0x40/0x90
[  227.071372]  [<ffffffff814386dd>] device_online+0x6d/0xa0
[  227.077395]  [<ffffffff814387a5>] online_store+0x95/0xa0
[  227.083332]  [<ffffffff814358a8>] dev_attr_store+0x18/0x30
[  227.089460]  [<ffffffff8126d76d>] sysfs_kf_write+0x3d/0x50
[  227.095589]  [<ffffffff8126cc1a>] kernfs_fop_write+0x12a/0x180
[  227.102108]  [<ffffffff811f1bb7>] vfs_write+0xb7/0x1f0
[  227.107850]  [<ffffffff810232bc>] ? do_audit_syscall_entry+0x6c/0x70
[  227.114950]  [<ffffffff811f2835>] SyS_write+0x55/0xd0
[  227.120595]  [<ffffffff81681f09>] system_call_fastpath+0x12/0x17
[  227.127306] ---[ end trace 30d753e1e1124697 ]---

Signed-off-by: SongXiumiao <songxiumiao@inspur.com>
---
 kernel/workqueue.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 586ad91..22d194c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3253,7 +3253,8 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
 	if (wq_numa_enabled) {
 		for_each_node(node) {
 			if (cpumask_subset(pool->attrs->cpumask,
-					   wq_numa_possible_cpumask[node])) {
+				    wq_numa_possible_cpumask[node])
+					       && node_online(node)) {
 				pool->node = node;
 				break;
 			}
-- 
1.7.1



* Re: [PATCH] Hotplug: fix the bug that the system is down,when memory is not in node0 and cpu is logically hotadded.
  2015-05-08 15:23 ` Tejun Heo
@ 2015-05-11  1:41   ` Gu Zheng
  0 siblings, 0 replies; 5+ messages in thread
From: Gu Zheng @ 2015-05-11  1:41 UTC
  To: Tejun Heo, Song Xiumiao
  Cc: linux-kernel, yanxiaofeng, fandd, liuchangsheng, Gong Zhaogang,
	Lai Jiangshan, kamezawa.hiroyu

Hi TJ, Song,

Sorry for the late reply.

On 05/08/2015 11:23 PM, Tejun Heo wrote:

> Cc'ing Lai, Gu and Kamezawa as they've been working in the area for a
> while now.  Gu, is this related to what you've been working on?


Yes, they are the same issue, and we are still working on it. Please refer to
the following for details:
https://lkml.org/lkml/2015/4/24/143
https://lkml.org/lkml/2015/2/27/145
https://lkml.org/lkml/2015/3/25/989

Regards,
Gu

> 
> Thanks.
> 
> On Fri, May 08, 2015 at 07:16:40PM +0800, Song Xiumiao wrote:
>> From: songxiumiao <songxiumiao@inspur.com>
>>
>> By analysing the bug function call trace,we find that create_worker
>> function will alloc the memory from node0.Because node0 is offline,
>> the allocation is failed. Then we add a condition to ensure the node
>> is online and system can alloc memory from a node that is online.
>>
>> Follow is the bug information:
>> [...]
>>
>> Signed-off-by: Song Xiumiao <songxiumiao@inspur.com>
>> Signed-off-by: Gong Zhaogang <gongzhaogang@inspur.com>
>> Tested-by: Liu Changsheng <liuchangsheng@inspur.com>
>> Reviewed-by: xiaofeng.yan <xiaofeng.yan@inspur.com>
>> Reviewed-by: Fan Dongdong <fandd@inspur.com>
>> ---
>>  kernel/workqueue.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>> index 586ad91..cae6277 100644
>> --- a/kernel/workqueue.c
>> +++ b/kernel/workqueue.c
>> @@ -3253,7 +3253,8 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
>>  	if (wq_numa_enabled) {
>>  		for_each_node(node) {
>>  			if (cpumask_subset(pool->attrs->cpumask,
>> -					   wq_numa_possible_cpumask[node])) {
>> +					   wq_numa_possible_cpumask[node]) &&
>> +					   node_online(node)) {
>>  				pool->node = node;
>>  				break;
>>  			}
>> -- 
>> 1.9.1
>>
>>
> 




* Re: [PATCH] Hotplug: fix the bug that the system is down,when memory is not in node0 and cpu is logically hotadded.
  2015-05-08 11:16 Song Xiumiao
@ 2015-05-08 15:23 ` Tejun Heo
  2015-05-11  1:41   ` Gu Zheng
  0 siblings, 1 reply; 5+ messages in thread
From: Tejun Heo @ 2015-05-08 15:23 UTC
  To: Song Xiumiao
  Cc: linux-kernel, yanxiaofeng, fandd, liuchangsheng, Gong Zhaogang,
	Lai Jiangshan, Gu Zheng, kamezawa.hiroyu

Cc'ing Lai, Gu and Kamezawa as they've been working in the area for a
while now.  Gu, is this related to what you've been working on?

Thanks.

On Fri, May 08, 2015 at 07:16:40PM +0800, Song Xiumiao wrote:
> From: songxiumiao <songxiumiao@inspur.com>
> 
> By analysing the bug function call trace,we find that create_worker
> function will alloc the memory from node0.Because node0 is offline,
> the allocation is failed. Then we add a condition to ensure the node
> is online and system can alloc memory from a node that is online.
> 
> Follow is the bug information:
> [root@localhost ~]# echo 1 > /sys/devices/system/cpu/cpu90/online
> [  225.611209] smpboot: Booting Node 2 Processor 90 APIC 0x40
> [18446744029.482996] kvm: enabling virtualization on CPU90
> [  225.725503] TSC synchronization [CPU#43 -> CPU#90]:
> [  225.730952] Measured 672516581900 cycles TSC warp between CPUs, turning off TSC clock.
> [  225.739800] tsc: Marking TSC unstable due to check_tsc_sync_source failed
> [  225.755126] BUG: unable to handle kernel paging request at 0000000000001b08
> [  225.762931] IP: [<ffffffff81182597>] __alloc_pages_nodemask+0xb7/0x940
> [  225.770247] PGD 449bb0067 PUD 46110e067 PMD 0
> [  225.775248] Oops: 0000 [#1] SMP
> [  225.778875] Modules linked in: xt_CHECKSUM ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntracd
> [  225.868198] CPU: 43 PID: 5400 Comm: bash Not tainted 4.0.0-rc4-bug-fixed-remove #16
> [  225.876754] Hardware name: Insyde Brickland/Type2 - Board Product Name1, BIOS Brickland.05.04.15.0024 02/28/2015
> [  225.888122] task: ffff88045a3d8da0 ti: ffff880446120000 task.ti: ffff880446120000
> [  225.896484] RIP: 0010:[<ffffffff81182597>]  [<ffffffff81182597>] __alloc_pages_nodemask+0xb7/0x940
> [  225.906509] RSP: 0018:ffff880446123918  EFLAGS: 00010246
> [  225.912443] RAX: 0000000000001b00 RBX: 0000000000000010 RCX: 0000000000000000
> [  225.920416] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000002052d0
> [  225.928388] RBP: ffff880446123a08 R08: ffff880460eca0c0 R09: 0000000060eca101
> [  225.936361] R10: ffff88046d007300 R11: ffffffff8108dd31 R12: 000000000001002a
> [  225.944334] R13: 00000000002052d0 R14: 0000000000000001 R15: 00000000000040d0
> [  225.952306] FS:  00007f9386450740(0000) GS:ffff88046db60000(0000) knlGS:0000000000000000
> [  225.961346] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  225.967765] CR2: 0000000000001b08 CR3: 00000004612a3000 CR4: 00000000001407e0
> [  225.975735] Stack:
> [  225.977981]  00000000002052d0 0000000000000000 0000000000000003 ffff88045a3d8da0
> [  225.986291]  ffff880446123988 ffffffff811c7f81 ffff88045a3d8da0 0000000000000000
> [  225.994597]  000080d000000002 ffff88046d005500 000000000003000f 002052d0002052d0
> [  226.002904] Call Trace:
> [  226.005645]  [<ffffffff811c7f81>] ? alloc_pages_current+0x91/0x100
> [  226.012557]  [<ffffffff811d27c3>] ? deactivate_slab+0x383/0x400
> [  226.019173]  [<ffffffff811d3957>] new_slab+0xa7/0x460
> [  226.024826]  [<ffffffff81678c75>] __slab_alloc+0x310/0x470
> [  226.030960]  [<ffffffff8130caf6>] ? get_from_free_list+0x46/0x60
> [  226.037679]  [<ffffffff8108dd31>] ? alloc_worker+0x21/0x50
> [  226.043812]  [<ffffffff811d46c1>] kmem_cache_alloc_node_trace+0x91/0x250
> [  226.051299]  [<ffffffff8108dd31>] alloc_worker+0x21/0x50
> [  226.057236]  [<ffffffff8108ff23>] create_worker+0x53/0x1e0
> [  226.063357]  [<ffffffff81092092>] alloc_unbound_pwq+0x2a2/0x510
> [  226.069974]  [<ffffffff810924b4>] wq_update_unbound_numa+0x1b4/0x220
> [  226.077076]  [<ffffffff81092828>] workqueue_cpu_up_callback+0x308/0x3d0
> [  226.084468]  [<ffffffff8109784e>] notifier_call_chain+0x4e/0x80
> [  226.091084]  [<ffffffff8109796e>] __raw_notifier_call_chain+0xe/0x10
> [  226.098189]  [<ffffffff810774f3>] cpu_notify+0x23/0x50
> [  226.103929]  [<ffffffff81077878>] _cpu_up+0x188/0x1a0
> [  226.109574]  [<ffffffff81077919>] cpu_up+0x89/0xb0
> [  226.114923]  [<ffffffff8166fba0>] cpu_subsys_online+0x40/0x90
> [  226.121350]  [<ffffffff814386dd>] device_online+0x6d/0xa0
> [  226.127382]  [<ffffffff814387a5>] online_store+0x95/0xa0
> [  226.133322]  [<ffffffff814358a8>] dev_attr_store+0x18/0x30
> [  226.139457]  [<ffffffff8126d76d>] sysfs_kf_write+0x3d/0x50
> [  226.145586]  [<ffffffff8126cc1a>] kernfs_fop_write+0x12a/0x180
> [  226.152109]  [<ffffffff811f1bb7>] vfs_write+0xb7/0x1f0
> [  226.157853]  [<ffffffff810232bc>] ? do_audit_syscall_entry+0x6c/0x70
> [  226.164954]  [<ffffffff811f2835>] SyS_write+0x55/0xd0
> [  226.170595]  [<ffffffff81681f09>] system_call_fastpath+0x12/0x17
> [  226.177306] Code: 30 97 00 89 45 bc 83 e1 0f b8 22 01 32 01 01 c9 d3 f8 83 e0 03 89 9d 6c ff ff ff 83 e3 10 89 45 c0 0f 85 6d 01 00 00 48 8b 45 88 <48> 83 78 08 00 0f 84 51 01 00 00 b8 01
> [  226.199175] RIP  [<ffffffff81182597>] __alloc_pages_nodemask+0xb7/0x940
> [  226.206576]  RSP <ffff880446123918>
> [  226.210471] CR2: 0000000000001b08
> [  226.227939] ---[ end trace 30d753e1e1124696 ]---
> [  226.412591] Kernel panic - not syncing: Fatal exception
> [  226.430948] Kernel Offset: disabled
> [  226.434845] drm_kms_helper: panic occurred, switching back to text console
> [  226.618325] ---[ end Kernel panic - not syncing: Fatal exception
> [  226.625047] ------------[ cut here ]------------
> [  226.630213] WARNING: CPU: 43 PID: 5400 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x5d/0x60()
> [  226.640999] Modules linked in: xt_CHECKSUM ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntracd
> [  226.730275] CPU: 43 PID: 5400 Comm: bash Tainted: G      D         4.0.0-rc4-bug-fixed-remove #16
> [  226.740189] Hardware name: Insyde Brickland/Type2 - Board Product Name1, BIOS Brickland.05.04.15.0024 02/28/2015
> [  226.751558]  0000000000000000 00000000aa535e80 ffff88046db63d58 ffffffff8167aa08
> [  226.759865]  0000000000000000 0000000000000000 ffff88046db63d98 ffffffff810772da
> [  226.768173]  ffff88046db63d98 0000000000000000 ffff88046d615380 000000000000002b
> [  226.776480] Call Trace:
> [  226.779212]  <IRQ>  [<ffffffff8167aa08>] dump_stack+0x45/0x57
> [  226.785657]  [<ffffffff810772da>] warn_slowpath_common+0x8a/0xc0
> [  226.792367]  [<ffffffff8107740a>] warn_slowpath_null+0x1a/0x20
> [  226.798886]  [<ffffffff8104a64d>] native_smp_send_reschedule+0x5d/0x60
> [  226.806182]  [<ffffffff810b4fe5>] trigger_load_balance+0x145/0x1b0
> [  226.813093]  [<ffffffff810a348c>] scheduler_tick+0x9c/0xe0
> [  226.819228]  [<ffffffff810e0a21>] update_process_times+0x51/0x60
> [  226.825946]  [<ffffffff810f0925>] tick_sched_handle.isra.18+0x25/0x60
> [  226.833143]  [<ffffffff810f09a4>] tick_sched_timer+0x44/0x80
> [  226.839467]  [<ffffffff810e1737>] __run_hrtimer+0x77/0x1d0
> [  226.845590]  [<ffffffff810f0960>] ? tick_sched_handle.isra.18+0x60/0x60
> [  226.852980]  [<ffffffff810e1b13>] hrtimer_interrupt+0x103/0x230
> [  226.859596]  [<ffffffff8104d3d9>] local_apic_timer_interrupt+0x39/0x60
> [  226.866883]  [<ffffffff81684d85>] smp_apic_timer_interrupt+0x45/0x60
> [  226.873982]  [<ffffffff81682ded>] apic_timer_interrupt+0x6d/0x80
> [  226.880690]  <EOI>  [<ffffffff81675abe>] ? panic+0x1c3/0x204
> [  226.887036]  [<ffffffff81675ab7>] ? panic+0x1bc/0x204
> [  226.892682]  [<ffffffff81018949>] oops_end+0x109/0x120
> [  226.898422]  [<ffffffff81675285>] no_context+0x2ee/0x366
> [  226.904359]  [<ffffffff81675370>] __bad_area_nosemaphore+0x73/0x1cc
> [  226.911361]  [<ffffffff816756ae>] bad_area+0x44/0x4c
> [  226.916910]  [<ffffffff81062b1a>] __do_page_fault+0x2ea/0x420
> [  226.923331]  [<ffffffff81062c81>] do_page_fault+0x31/0x70
> [  226.929364]  [<ffffffff81683f08>] page_fault+0x28/0x30
> [  226.935106]  [<ffffffff8108dd31>] ? alloc_worker+0x21/0x50
> [  226.941235]  [<ffffffff81182597>] ? __alloc_pages_nodemask+0xb7/0x940
> [  226.948430]  [<ffffffff81182705>] ? __alloc_pages_nodemask+0x225/0x940
> [  226.955725]  [<ffffffff811c7f81>] ? alloc_pages_current+0x91/0x100
> [  226.962624]  [<ffffffff811d27c3>] ? deactivate_slab+0x383/0x400
> [  226.969239]  [<ffffffff811d3957>] new_slab+0xa7/0x460
> [  226.974885]  [<ffffffff81678c75>] __slab_alloc+0x310/0x470
> [  226.981015]  [<ffffffff8130caf6>] ? get_from_free_list+0x46/0x60
> [  226.987727]  [<ffffffff8108dd31>] ? alloc_worker+0x21/0x50
> [  226.993851]  [<ffffffff811d46c1>] kmem_cache_alloc_node_trace+0x91/0x250
> [  227.001340]  [<ffffffff8108dd31>] alloc_worker+0x21/0x50
> [  227.007275]  [<ffffffff8108ff23>] create_worker+0x53/0x1e0
> [  227.013404]  [<ffffffff81092092>] alloc_unbound_pwq+0x2a2/0x510
> [  227.020019]  [<ffffffff810924b4>] wq_update_unbound_numa+0x1b4/0x220
> [  227.027112]  [<ffffffff81092828>] workqueue_cpu_up_callback+0x308/0x3d0
> [  227.034502]  [<ffffffff8109784e>] notifier_call_chain+0x4e/0x80
> [  227.041117]  [<ffffffff8109796e>] __raw_notifier_call_chain+0xe/0x10
> [  227.048219]  [<ffffffff810774f3>] cpu_notify+0x23/0x50
> [  227.053961]  [<ffffffff81077878>] _cpu_up+0x188/0x1a0
> [  227.059597]  [<ffffffff81077919>] cpu_up+0x89/0xb0
> [  227.064950]  [<ffffffff8166fba0>] cpu_subsys_online+0x40/0x90
> [  227.071372]  [<ffffffff814386dd>] device_online+0x6d/0xa0
> [  227.077395]  [<ffffffff814387a5>] online_store+0x95/0xa0
> [  227.083332]  [<ffffffff814358a8>] dev_attr_store+0x18/0x30
> [  227.089460]  [<ffffffff8126d76d>] sysfs_kf_write+0x3d/0x50
> [  227.095589]  [<ffffffff8126cc1a>] kernfs_fop_write+0x12a/0x180
> [  227.102108]  [<ffffffff811f1bb7>] vfs_write+0xb7/0x1f0
> [  227.107850]  [<ffffffff810232bc>] ? do_audit_syscall_entry+0x6c/0x70
> [  227.114950]  [<ffffffff811f2835>] SyS_write+0x55/0xd0
> [  227.120595]  [<ffffffff81681f09>] system_call_fastpath+0x12/0x17
> [  227.127306] ---[ end trace 30d753e1e1124697 ]---
> 
> Signed-off-by: Song Xiumiao <songxiumiao@inspur.com>
> Signed-off-by: Gong Zhaogang <gongzhaogang@inspur.com>
> Tested-by: Liu Changsheng <liuchangsheng@inspur.com>
> Reviewed-by: xiaofeng.yan <xiaofeng.yan@inspur.com>
> Reviewed-by: Fan Dongdong <fandd@inspur.com>
> ---
>  kernel/workqueue.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 586ad91..cae6277 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -3253,7 +3253,8 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
>  	if (wq_numa_enabled) {
>  		for_each_node(node) {
>  			if (cpumask_subset(pool->attrs->cpumask,
> -					   wq_numa_possible_cpumask[node])) {
> +					   wq_numa_possible_cpumask[node]) &&
> +					   node_online(node)) {
>  				pool->node = node;
>  				break;
>  			}
> -- 
> 1.9.1
> 
> 

-- 
tejun


* [PATCH] Hotplug: fix the bug that the system is down,when memory is not in node0 and cpu is logically hotadded.
@ 2015-05-08 11:16 Song Xiumiao
  2015-05-08 15:23 ` Tejun Heo
  0 siblings, 1 reply; 5+ messages in thread
From: Song Xiumiao @ 2015-05-08 11:16 UTC (permalink / raw)
  To: tj, linux-kernel
  Cc: yanxiaofeng, fandd, liuchangsheng, songxiumiao, Gong Zhaogang

From: songxiumiao <songxiumiao@inspur.com>

Analysing the call trace of the bug shows that create_worker() tries
to allocate memory from node0. Because node0 is offline, the
allocation fails. Add a condition to ensure that the node is online,
so that the system only allocates memory from an online node.
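
The node selection described above can be modelled in user space. The
following is only a rough sketch, not the kernel code: the struct, the
sketch_* names, the 64-bit CPU bitmask and the -1 fallback (standing in
for the kernel's NUMA_NO_NODE) are all assumptions made for
illustration. It shows how requiring the node to be online changes
which node a pool ends up bound to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NO_NODE (-1)	/* hypothetical stand-in for NUMA_NO_NODE */

/* Hypothetical, simplified model of a NUMA node: one bit per CPU. */
struct sketch_node {
	uint64_t possible_cpus;	/* CPUs that may ever belong to this node */
	bool online;		/* true if the node is currently online */
};

/* True if every CPU in 'sub' is also set in 'super' (models cpumask_subset()). */
static bool sketch_subset(uint64_t sub, uint64_t super)
{
	return (sub & ~super) == 0;
}

/*
 * Return the first node whose possible CPUs cover 'pool_cpus'.
 * With 'check_online' set, offline nodes are skipped, which mirrors
 * the extra node_online() condition in the patch. If no node
 * qualifies, SKETCH_NO_NODE is returned.
 */
static int sketch_pick_node(const struct sketch_node *nodes, int nr_nodes,
			    uint64_t pool_cpus, bool check_online)
{
	for (int node = 0; node < nr_nodes; node++) {
		if (!sketch_subset(pool_cpus, nodes[node].possible_cpus))
			continue;
		if (check_online && !nodes[node].online)
			continue;
		return node;
	}
	return SKETCH_NO_NODE;
}
```

In this model, if node 0 is offline and covers the pool's CPUs, the
unchecked variant returns 0 while the checked variant skips it and
either finds another covering node or returns SKETCH_NO_NODE, leaving
the choice of node to the allocator.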

The bug information follows:
[root@localhost ~]# echo 1 > /sys/devices/system/cpu/cpu90/online
[  225.611209] smpboot: Booting Node 2 Processor 90 APIC 0x40
[18446744029.482996] kvm: enabling virtualization on CPU90
[  225.725503] TSC synchronization [CPU#43 -> CPU#90]:
[  225.730952] Measured 672516581900 cycles TSC warp between CPUs, turning off TSC clock.
[  225.739800] tsc: Marking TSC unstable due to check_tsc_sync_source failed
[  225.755126] BUG: unable to handle kernel paging request at 0000000000001b08
[  225.762931] IP: [<ffffffff81182597>] __alloc_pages_nodemask+0xb7/0x940
[  225.770247] PGD 449bb0067 PUD 46110e067 PMD 0
[  225.775248] Oops: 0000 [#1] SMP
[  225.778875] Modules linked in: xt_CHECKSUM ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntracd
[  225.868198] CPU: 43 PID: 5400 Comm: bash Not tainted 4.0.0-rc4-bug-fixed-remove #16
[  225.876754] Hardware name: Insyde Brickland/Type2 - Board Product Name1, BIOS Brickland.05.04.15.0024 02/28/2015
[  225.888122] task: ffff88045a3d8da0 ti: ffff880446120000 task.ti: ffff880446120000
[  225.896484] RIP: 0010:[<ffffffff81182597>]  [<ffffffff81182597>] __alloc_pages_nodemask+0xb7/0x940
[  225.906509] RSP: 0018:ffff880446123918  EFLAGS: 00010246
[  225.912443] RAX: 0000000000001b00 RBX: 0000000000000010 RCX: 0000000000000000
[  225.920416] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000002052d0
[  225.928388] RBP: ffff880446123a08 R08: ffff880460eca0c0 R09: 0000000060eca101
[  225.936361] R10: ffff88046d007300 R11: ffffffff8108dd31 R12: 000000000001002a
[  225.944334] R13: 00000000002052d0 R14: 0000000000000001 R15: 00000000000040d0
[  225.952306] FS:  00007f9386450740(0000) GS:ffff88046db60000(0000) knlGS:0000000000000000
[  225.961346] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  225.967765] CR2: 0000000000001b08 CR3: 00000004612a3000 CR4: 00000000001407e0
[  225.975735] Stack:
[  225.977981]  00000000002052d0 0000000000000000 0000000000000003 ffff88045a3d8da0
[  225.986291]  ffff880446123988 ffffffff811c7f81 ffff88045a3d8da0 0000000000000000
[  225.994597]  000080d000000002 ffff88046d005500 000000000003000f 002052d0002052d0
[  226.002904] Call Trace:
[  226.005645]  [<ffffffff811c7f81>] ? alloc_pages_current+0x91/0x100
[  226.012557]  [<ffffffff811d27c3>] ? deactivate_slab+0x383/0x400
[  226.019173]  [<ffffffff811d3957>] new_slab+0xa7/0x460
[  226.024826]  [<ffffffff81678c75>] __slab_alloc+0x310/0x470
[  226.030960]  [<ffffffff8130caf6>] ? get_from_free_list+0x46/0x60
[  226.037679]  [<ffffffff8108dd31>] ? alloc_worker+0x21/0x50
[  226.043812]  [<ffffffff811d46c1>] kmem_cache_alloc_node_trace+0x91/0x250
[  226.051299]  [<ffffffff8108dd31>] alloc_worker+0x21/0x50
[  226.057236]  [<ffffffff8108ff23>] create_worker+0x53/0x1e0
[  226.063357]  [<ffffffff81092092>] alloc_unbound_pwq+0x2a2/0x510
[  226.069974]  [<ffffffff810924b4>] wq_update_unbound_numa+0x1b4/0x220
[  226.077076]  [<ffffffff81092828>] workqueue_cpu_up_callback+0x308/0x3d0
[  226.084468]  [<ffffffff8109784e>] notifier_call_chain+0x4e/0x80
[  226.091084]  [<ffffffff8109796e>] __raw_notifier_call_chain+0xe/0x10
[  226.098189]  [<ffffffff810774f3>] cpu_notify+0x23/0x50
[  226.103929]  [<ffffffff81077878>] _cpu_up+0x188/0x1a0
[  226.109574]  [<ffffffff81077919>] cpu_up+0x89/0xb0
[  226.114923]  [<ffffffff8166fba0>] cpu_subsys_online+0x40/0x90
[  226.121350]  [<ffffffff814386dd>] device_online+0x6d/0xa0
[  226.127382]  [<ffffffff814387a5>] online_store+0x95/0xa0
[  226.133322]  [<ffffffff814358a8>] dev_attr_store+0x18/0x30
[  226.139457]  [<ffffffff8126d76d>] sysfs_kf_write+0x3d/0x50
[  226.145586]  [<ffffffff8126cc1a>] kernfs_fop_write+0x12a/0x180
[  226.152109]  [<ffffffff811f1bb7>] vfs_write+0xb7/0x1f0
[  226.157853]  [<ffffffff810232bc>] ? do_audit_syscall_entry+0x6c/0x70
[  226.164954]  [<ffffffff811f2835>] SyS_write+0x55/0xd0
[  226.170595]  [<ffffffff81681f09>] system_call_fastpath+0x12/0x17
[  226.177306] Code: 30 97 00 89 45 bc 83 e1 0f b8 22 01 32 01 01 c9 d3 f8 83 e0 03 89 9d 6c ff ff ff 83 e3 10 89 45 c0 0f 85 6d 01 00 00 48 8b 45 88 <48> 83 78 08 00 0f 84 51 01 00 00 b8 01
[  226.199175] RIP  [<ffffffff81182597>] __alloc_pages_nodemask+0xb7/0x940
[  226.206576]  RSP <ffff880446123918>
[  226.210471] CR2: 0000000000001b08
[  226.227939] ---[ end trace 30d753e1e1124696 ]---
[  226.412591] Kernel panic - not syncing: Fatal exception
[  226.430948] Kernel Offset: disabled
[  226.434845] drm_kms_helper: panic occurred, switching back to text console
[  226.618325] ---[ end Kernel panic - not syncing: Fatal exception
[  226.625047] ------------[ cut here ]------------
[  226.630213] WARNING: CPU: 43 PID: 5400 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x5d/0x60()
[  226.640999] Modules linked in: xt_CHECKSUM ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntracd
[  226.730275] CPU: 43 PID: 5400 Comm: bash Tainted: G      D         4.0.0-rc4-bug-fixed-remove #16
[  226.740189] Hardware name: Insyde Brickland/Type2 - Board Product Name1, BIOS Brickland.05.04.15.0024 02/28/2015
[  226.751558]  0000000000000000 00000000aa535e80 ffff88046db63d58 ffffffff8167aa08
[  226.759865]  0000000000000000 0000000000000000 ffff88046db63d98 ffffffff810772da
[  226.768173]  ffff88046db63d98 0000000000000000 ffff88046d615380 000000000000002b
[  226.776480] Call Trace:
[  226.779212]  <IRQ>  [<ffffffff8167aa08>] dump_stack+0x45/0x57
[  226.785657]  [<ffffffff810772da>] warn_slowpath_common+0x8a/0xc0
[  226.792367]  [<ffffffff8107740a>] warn_slowpath_null+0x1a/0x20
[  226.798886]  [<ffffffff8104a64d>] native_smp_send_reschedule+0x5d/0x60
[  226.806182]  [<ffffffff810b4fe5>] trigger_load_balance+0x145/0x1b0
[  226.813093]  [<ffffffff810a348c>] scheduler_tick+0x9c/0xe0
[  226.819228]  [<ffffffff810e0a21>] update_process_times+0x51/0x60
[  226.825946]  [<ffffffff810f0925>] tick_sched_handle.isra.18+0x25/0x60
[  226.833143]  [<ffffffff810f09a4>] tick_sched_timer+0x44/0x80
[  226.839467]  [<ffffffff810e1737>] __run_hrtimer+0x77/0x1d0
[  226.845590]  [<ffffffff810f0960>] ? tick_sched_handle.isra.18+0x60/0x60
[  226.852980]  [<ffffffff810e1b13>] hrtimer_interrupt+0x103/0x230
[  226.859596]  [<ffffffff8104d3d9>] local_apic_timer_interrupt+0x39/0x60
[  226.866883]  [<ffffffff81684d85>] smp_apic_timer_interrupt+0x45/0x60
[  226.873982]  [<ffffffff81682ded>] apic_timer_interrupt+0x6d/0x80
[  226.880690]  <EOI>  [<ffffffff81675abe>] ? panic+0x1c3/0x204
[  226.887036]  [<ffffffff81675ab7>] ? panic+0x1bc/0x204
[  226.892682]  [<ffffffff81018949>] oops_end+0x109/0x120
[  226.898422]  [<ffffffff81675285>] no_context+0x2ee/0x366
[  226.904359]  [<ffffffff81675370>] __bad_area_nosemaphore+0x73/0x1cc
[  226.911361]  [<ffffffff816756ae>] bad_area+0x44/0x4c
[  226.916910]  [<ffffffff81062b1a>] __do_page_fault+0x2ea/0x420
[  226.923331]  [<ffffffff81062c81>] do_page_fault+0x31/0x70
[  226.929364]  [<ffffffff81683f08>] page_fault+0x28/0x30
[  226.935106]  [<ffffffff8108dd31>] ? alloc_worker+0x21/0x50
[  226.941235]  [<ffffffff81182597>] ? __alloc_pages_nodemask+0xb7/0x940
[  226.948430]  [<ffffffff81182705>] ? __alloc_pages_nodemask+0x225/0x940
[  226.955725]  [<ffffffff811c7f81>] ? alloc_pages_current+0x91/0x100
[  226.962624]  [<ffffffff811d27c3>] ? deactivate_slab+0x383/0x400
[  226.969239]  [<ffffffff811d3957>] new_slab+0xa7/0x460
[  226.974885]  [<ffffffff81678c75>] __slab_alloc+0x310/0x470
[  226.981015]  [<ffffffff8130caf6>] ? get_from_free_list+0x46/0x60
[  226.987727]  [<ffffffff8108dd31>] ? alloc_worker+0x21/0x50
[  226.993851]  [<ffffffff811d46c1>] kmem_cache_alloc_node_trace+0x91/0x250
[  227.001340]  [<ffffffff8108dd31>] alloc_worker+0x21/0x50
[  227.007275]  [<ffffffff8108ff23>] create_worker+0x53/0x1e0
[  227.013404]  [<ffffffff81092092>] alloc_unbound_pwq+0x2a2/0x510
[  227.020019]  [<ffffffff810924b4>] wq_update_unbound_numa+0x1b4/0x220
[  227.027112]  [<ffffffff81092828>] workqueue_cpu_up_callback+0x308/0x3d0
[  227.034502]  [<ffffffff8109784e>] notifier_call_chain+0x4e/0x80
[  227.041117]  [<ffffffff8109796e>] __raw_notifier_call_chain+0xe/0x10
[  227.048219]  [<ffffffff810774f3>] cpu_notify+0x23/0x50
[  227.053961]  [<ffffffff81077878>] _cpu_up+0x188/0x1a0
[  227.059597]  [<ffffffff81077919>] cpu_up+0x89/0xb0
[  227.064950]  [<ffffffff8166fba0>] cpu_subsys_online+0x40/0x90
[  227.071372]  [<ffffffff814386dd>] device_online+0x6d/0xa0
[  227.077395]  [<ffffffff814387a5>] online_store+0x95/0xa0
[  227.083332]  [<ffffffff814358a8>] dev_attr_store+0x18/0x30
[  227.089460]  [<ffffffff8126d76d>] sysfs_kf_write+0x3d/0x50
[  227.095589]  [<ffffffff8126cc1a>] kernfs_fop_write+0x12a/0x180
[  227.102108]  [<ffffffff811f1bb7>] vfs_write+0xb7/0x1f0
[  227.107850]  [<ffffffff810232bc>] ? do_audit_syscall_entry+0x6c/0x70
[  227.114950]  [<ffffffff811f2835>] SyS_write+0x55/0xd0
[  227.120595]  [<ffffffff81681f09>] system_call_fastpath+0x12/0x17
[  227.127306] ---[ end trace 30d753e1e1124697 ]---

Signed-off-by: Song Xiumiao <songxiumiao@inspur.com>
Signed-off-by: Gong Zhaogang <gongzhaogang@inspur.com>
Tested-by: Liu Changsheng <liuchangsheng@inspur.com>
Reviewed-by: xiaofeng.yan <xiaofeng.yan@inspur.com>
Reviewed-by: Fan Dongdong <fandd@inspur.com>
---
 kernel/workqueue.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 586ad91..cae6277 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3253,7 +3253,8 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
 	if (wq_numa_enabled) {
 		for_each_node(node) {
 			if (cpumask_subset(pool->attrs->cpumask,
-					   wq_numa_possible_cpumask[node])) {
+					   wq_numa_possible_cpumask[node]) &&
+					   node_online(node)) {
 				pool->node = node;
 				break;
 			}
-- 
1.9.1




end of thread, other threads:[~2015-06-17  8:19 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-17 15:46 [PATCH] Hotplug: fix the bug that the system is down,when memory is not in node0 and cpu is logically hotadded gongzg
2015-06-17  8:19 ` Frans Klaver
  -- strict thread matches above, loose matches on Subject: below --
2015-05-08 11:16 Song Xiumiao
2015-05-08 15:23 ` Tejun Heo
2015-05-11  1:41   ` Gu Zheng
