Re: [PATCH] sched_groups are expected to be circular linked list, make it so right after allocation

* Re: [PATCH] sched_groups are expected to be circular linked list, make it so right after allocation
  2012-05-09 10:38 [PATCH] sched_groups are expected to be circular linked list, make it so right after allocation Igor Mammedov
@ 2012-05-09 10:21 ` Jiang Liu
  2012-05-09 11:44   ` Igor Mammedov
  2012-05-09 10:35 ` [tip:sched/urgent] sched: Fix KVM and ia64 boot crash due to sched_groups circular linked list assumption tip-bot for Igor Mammedov
  2012-05-09 11:41 ` [PATCH] sched_groups are expected to be circular linked list, make it so right after allocation Peter Zijlstra
  2 siblings, 1 reply; 39+ messages in thread
From: Jiang Liu @ 2012-05-09 10:21 UTC
  To: Igor Mammedov
  Cc: linux-kernel, a.p.zijlstra, mingo, pjt, tglx, seto.hidetoshi

Hi Igor,
	Thanks for fixing this bug! We hit the same issue on an IA64
system too. That system boots with 2.6.32, but cannot boot with any
3.x.x kernel. We only found the root cause today.
	--gerry

On 05/09/2012 06:38 PM, Igor Mammedov wrote:
> If one CPU fails to boot and the boot CPU gives up waiting for it, and
> another CPU is then booted, the kernel may crash with the following oops:
> 
> [  723.865765] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
> [  723.866616] IP: [<ffffffff812c3630>] __bitmap_weight+0x30/0x80
> [  723.866616] PGD 7ba91067 PUD 7a205067 PMD 0
> [  723.866616] Oops: 0000 [#1] SMP
> [  723.898527] CPU 1
> ...
> [  723.898527] Pid: 1221, comm: offV2.sh Tainted: G        W    3.4.0-rc4+ #213 Red Hat KVM
> [  723.898527] RIP: 0010:[<ffffffff812c3630>]  [<ffffffff812c3630>] __bitmap_weight+0x30/0x80
> [  723.898527] RSP: 0018:ffff88007ab9dc18  EFLAGS: 00010246
> [  723.898527] RAX: 0000000000000003 RBX: 0000000000000000 RCX: 0000000000000000
> [  723.898527] RDX: 0000000000000018 RSI: 0000000000000100 RDI: 0000000000000018
> [  723.898527] RBP: ffff88007ab9dc18 R08: 0000000000000000 R09: 0000000000000020
> [  723.898527] R10: 0000000000000004 R11: 0000000000000000 R12: ffff88007c06ed60
> [  723.898527] R13: ffff880037a94000 R14: 0000000000000003 R15: ffff88007c06ed60
> [  723.898527] FS:  00007f1d6a7d8700(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000
> [  723.898527] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  723.898527] CR2: 0000000000000018 CR3: 000000007bb7f000 CR4: 00000000000007e0
> [  723.898527] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  723.898527] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [  723.898527] Process offV2.sh (pid: 1221, threadinfo ffff88007ab9c000, task ffff88007b358000)
> [  723.898527] Stack:
> [  723.898527]  ffff88007ab9dcc8 ffffffff8108b9b6 ffff88007ab9dc58 ffff88007b4f2a00
> [  723.898527]  ffff88007c06ed60 0000000000000003 000000037ab9dc58 0000000000010008
> [  723.898527]  ffffffff81a308e8 0000000000000003 ffff88007b489cc0 ffff880037b6bd20
> [  723.898527] Call Trace:
> [  723.898527]  [<ffffffff8108b9b6>] build_sched_domains+0x7b6/0xa50
> [  723.898527]  [<ffffffff8108bea9>] partition_sched_domains+0x259/0x3f0
> [  723.898527]  [<ffffffff810c4485>] cpuset_update_active_cpus+0x85/0x90
> [  723.898527]  [<ffffffff81084f65>] cpuset_cpu_active+0x25/0x30
> [  723.898527]  [<ffffffff81545b45>] notifier_call_chain+0x55/0x80
> [  723.898527]  [<ffffffff8107e59e>] __raw_notifier_call_chain+0xe/0x10
> [  723.898527]  [<ffffffff81058be0>] __cpu_notify+0x20/0x40
> [  723.898527]  [<ffffffff8153af08>] _cpu_up+0xc7/0x10e
> [  723.898527]  [<ffffffff8153af9b>] cpu_up+0x4c/0x5c
> 
> The crash happens in init_sched_groups_power(), which expects sched_groups
> to be a circular linked list. However, that is not always true:
> sched_groups preallocated in __sdt_alloc() are initialized in
> build_sched_groups(), which may exit early
> 
>         if (cpu != cpumask_first(sched_domain_span(sd)))
>                 return 0;
> 
> without initializing the sd->groups->next field.
> 
> Fix the bug by initializing the next field right after the sched_group is allocated.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
>  kernel/sched/core.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 0533a68..e5212ae 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6382,6 +6382,8 @@ static int __sdt_alloc(const struct cpumask *cpu_map)
>  			if (!sg)
>  				return -ENOMEM;
>  
> +			sg->next = sg;
> +
>  			*per_cpu_ptr(sdd->sg, j) = sg;
>  
>  			sgp = kzalloc_node(sizeof(struct sched_group_power),
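
For readers following the quoted patch, the failure mode can be sketched in
userspace C. This is only an illustrative model, not kernel code: struct group,
alloc_group() and ring_length() are hypothetical names, and calloc() stands in
for the zeroing allocation done by kzalloc_node(). It shows why a zeroed group
breaks a ring walk and why a self-link right after allocation makes the walk
safe even when the later initialization is skipped.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct sched_group. */
struct group {
	struct group *next;	/* expected to form a circular list */
};

/*
 * Allocate a zeroed group. With self_link set, mimic the patch by
 * pointing the group at itself, so it is a valid one-element ring.
 */
static struct group *alloc_group(int self_link)
{
	struct group *g = calloc(1, sizeof(*g));

	if (g && self_link)
		g->next = g;
	return g;
}

/*
 * Walk the ring with a do/while loop that stops on returning to head.
 * Returns the number of groups, or -1 on a NULL link, which is where
 * code that assumes a ring would dereference a NULL pointer.
 */
static int ring_length(const struct group *head)
{
	const struct group *g = head;
	int n = 0;

	assert(head != NULL);
	do {
		if (!g)
			return -1;
		n++;
		g = g->next;
	} while (g != head);
	return n;
}
```

In this model, a group from alloc_group(0) corresponds to a sched_group whose
initialization was skipped by the early return, and ring_length() reports the
broken link instead of crashing. A group from alloc_group(1) is a one-element
ring, so the walk terminates after a single step.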

