* Re: [PATCH v5] x86, cpu-hotplug: fix llc shared map unreleased during cpu hotplug
2014-09-17 7:17 [PATCH v5] x86, cpu-hotplug: fix llc shared map unreleased during cpu hotplug Wanpeng Li
@ 2014-09-21 23:11 ` Wanpeng Li
2014-09-23 4:46 ` Kamezawa Hiroyuki
2014-09-23 9:37 ` Borislav Petkov
2 siblings, 0 replies; 9+ messages in thread
From: Wanpeng Li @ 2014-09-21 23:11 UTC (permalink / raw)
To: Wanpeng Li, Ingo Molnar, hpa, Peter Zijlstra
Cc: Ingo Molnar, x86, Borislav Petkov, Yasuaki Ishimatsu,
David Rientjes, Prarit Bhargava, Steven Rostedt, Toshi Kani,
linux-kernel
Ping Ingo, Peter Z, HPA,
On 14-9-17 3:17 PM, Wanpeng Li wrote:
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
> IP: [..] find_busiest_group
> PGD 5a9d5067 PUD 13067 PMD 0
> Oops: 0000 [#3] SMP
> [...]
> Call Trace:
> load_balance
> ? _raw_spin_unlock_irqrestore
> idle_balance
> __schedule
> schedule
> schedule_timeout
> ? lock_timer_base
> schedule_timeout_uninterruptible
> msleep
> lock_device_hotplug_sysfs
> online_store
> dev_attr_store
> sysfs_write_file
> vfs_write
> SyS_write
> system_call_fastpath
>
> This bug can be triggered by repeatedly hot-adding and hot-removing a
> large number of Xen domain0 vcpus.
>
> The last level cache (llc) shared map is built during cpu up, and the
> sched domain build routine uses it to set up the sched domain cpu
> topology. However, the llc shared map is not released during cpu
> disable, which leads to an invalid sched domain cpu topology. This patch
> fixes it by releasing the llc shared map correctly during cpu disable.
>
> Reviewed-by: Toshi Kani <toshi.kani@hp.com>
> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> Tested-by: Linn Crosetto <linn@hp.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
> ---
> v4 -> v5:
> * add the description when the bug can occur
> v3 -> v4:
> * simplify backtrace
> v2 -> v3:
> * simplify backtrace
> v1 -> v2:
> * fix subject line
>
> arch/x86/kernel/smpboot.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 5492798..0134ec7 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1292,6 +1292,9 @@ static void remove_siblinginfo(int cpu)
>
>  	for_each_cpu(sibling, cpu_sibling_mask(cpu))
>  		cpumask_clear_cpu(cpu, cpu_sibling_mask(sibling));
> +	for_each_cpu(sibling, cpu_llc_shared_mask(cpu))
> +		cpumask_clear_cpu(cpu, cpu_llc_shared_mask(sibling));
> +	cpumask_clear(cpu_llc_shared_mask(cpu));
>  	cpumask_clear(cpu_sibling_mask(cpu));
>  	cpumask_clear(cpu_core_mask(cpu));
>  	c->phys_proc_id = 0;
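For illustration, the effect of the three added lines can be modelled in
user space. This is only a minimal sketch under stated assumptions, not
kernel code: struct sketch_mask, llc_shared[] and the sketch_* helpers are
hypothetical stand-ins for struct cpumask, cpu_llc_shared_mask() and the
cpumask_* helpers used in the hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user-space stand-in for the kernel's per-CPU cpumasks. */
#define SKETCH_NR_CPUS 128
#define SKETCH_WORDS   (SKETCH_NR_CPUS / 64)

struct sketch_mask { uint64_t bits[SKETCH_WORDS]; };

/* One LLC-shared mask per CPU, playing the role of cpu_llc_shared_mask(cpu). */
static struct sketch_mask llc_shared[SKETCH_NR_CPUS];

static void sketch_set(struct sketch_mask *m, int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	m->bits[cpu / 64] |= UINT64_C(1) << (cpu % 64);
}

static void sketch_clear_cpu(struct sketch_mask *m, int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	m->bits[cpu / 64] &= ~(UINT64_C(1) << (cpu % 64));
}

static bool sketch_test(const struct sketch_mask *m, int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	return (m->bits[cpu / 64] >> (cpu % 64)) & 1;
}

/* CPU up: link the CPU with every listed LLC peer, in both directions. */
static void sketch_cpu_up(int cpu, const int *llc_peers, int n)
{
	sketch_set(&llc_shared[cpu], cpu);
	for (int i = 0; i < n; i++) {
		sketch_set(&llc_shared[cpu], llc_peers[i]);
		sketch_set(&llc_shared[llc_peers[i]], cpu);
	}
}

/* CPU down: mirror the three lines added to remove_siblinginfo(). */
static void sketch_cpu_down(int cpu)
{
	/* Drop this CPU from the mask of every CPU it shared the LLC with. */
	for (int sibling = 0; sibling < SKETCH_NR_CPUS; sibling++)
		if (sketch_test(&llc_shared[cpu], sibling))
			sketch_clear_cpu(&llc_shared[sibling], cpu);
	/* Then empty the departing CPU's own mask. */
	memset(&llc_shared[cpu], 0, sizeof(llc_shared[cpu]));
}
```

In this model, calling sketch_cpu_down(2) after CPUs 0, 1 and 2 were
brought up as LLC peers leaves no reference to CPU 2 in the masks of
CPUs 0 and 1, so a later sketch_cpu_up() starts from a clean state.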
* Re: [PATCH v5] x86, cpu-hotplug: fix llc shared map unreleased during cpu hotplug
2014-09-17 7:17 [PATCH v5] x86, cpu-hotplug: fix llc shared map unreleased during cpu hotplug Wanpeng Li
2014-09-21 23:11 ` Wanpeng Li
@ 2014-09-23 4:46 ` Kamezawa Hiroyuki
2014-09-23 6:36 ` Wanpeng Li
2014-09-23 9:37 ` Borislav Petkov
2 siblings, 1 reply; 9+ messages in thread
From: Kamezawa Hiroyuki @ 2014-09-23 4:46 UTC (permalink / raw)
To: Wanpeng Li, Ingo Molnar, hpa, Peter Zijlstra
Cc: Ingo Molnar, x86, Borislav Petkov, Yasuaki Ishimatsu,
David Rientjes, Prarit Bhargava, Steven Rostedt, Toshi Kani,
linux-kernel
(2014/09/17 16:17), Wanpeng Li wrote:
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
> IP: [..] find_busiest_group
> PGD 5a9d5067 PUD 13067 PMD 0
> Oops: 0000 [#3] SMP
> [...]
> Call Trace:
> load_balance
> ? _raw_spin_unlock_irqrestore
> idle_balance
> __schedule
> schedule
> schedule_timeout
> ? lock_timer_base
> schedule_timeout_uninterruptible
> msleep
> lock_device_hotplug_sysfs
> online_store
> dev_attr_store
> sysfs_write_file
> vfs_write
> SyS_write
> system_call_fastpath
>
> This bug can be triggered by hot add and remove large number of xen
> domain0's vcpus repeatedly.
>
> Last level cache shared map is built during cpu up and build sched domain
> routine takes advantage of it to setup sched domain cpu topology, however,
> llc shared map is unreleased during cpu disable which lead to invalid sched
> domain cpu topology. This patch fix it by release llc shared map correctly
> during cpu disable.
>
> Reviewed-by: Toshi Kani <toshi.kani@hp.com>
> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> Tested-by: Linn Crosetto <linn@hp.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Yasuaki reported that this can also happen on our real hardware:
https://lkml.org/lkml/2014/7/22/1018

Our case is described here.
==
Here is an example from my system.
My system has 4 sockets, each socket has 15 cores, and HT is enabled.
In this case, the cores of each socket are numbered as follows:

         | CPU#
Socket#0 | 0-14 , 60-74
Socket#1 | 15-29, 75-89
Socket#2 | 30-44, 90-104
Socket#3 | 45-59, 105-119

Then llc_shared_mask of CPU#30 has 0x3fff80000001fffc0000000.
It means that the last level cache of Socket#2 is shared with
CPU#30-44 and 90-104.

When hot-removing socket#2 and #3, the cores of each socket are numbered
as follows:

         | CPU#
Socket#0 | 0-14 , 60-74
Socket#1 | 15-29, 75-89

But llc_shared_mask is not cleared, so llc_shared_mask of CPU#30 still
has 0x3fff80000001fffc0000000.

After that, when hot-adding socket#2 and #3, the cores of each socket are
numbered as follows:

         | CPU#
Socket#0 | 0-14 , 60-74
Socket#1 | 15-29, 75-89
Socket#2 | 30-59
Socket#3 | 90-119

Then llc_shared_mask of CPU#30 becomes 0x3fff8000fffffffc0000000.
It means that the last level cache of Socket#2 is shared with CPU#30-59
and 90-104, so the mask has the wrong value.

At first, I cleared the hot-removed CPU number's bit from llc_shared_map
when hot-removing the CPU. But Borislav suggested that the problem will
disappear if a re-added CPU is assigned the same CPU number, and that
llc_shared_map must not be changed.
==

So, please take this patch.
Thanks,
-Kame
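The renumbering case above can be replayed with a small user-space
model. This is a hedged sketch, not the kernel's implementation:
struct demo_mask and the demo_* helpers are hypothetical, and only
the CPU ranges quoted in the report (30-44 and 90-104 at boot, 30-59
after hot-add) are taken from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 128-bit mask; the kernel uses struct cpumask instead. */
struct demo_mask { uint64_t lo, hi; };

/* Set bits first..last (inclusive) in the mask. */
static void demo_set_range(struct demo_mask *m, int first, int last)
{
	assert(0 <= first && first <= last && last < 128);
	for (int cpu = first; cpu <= last; cpu++) {
		if (cpu < 64)
			m->lo |= UINT64_C(1) << cpu;
		else
			m->hi |= UINT64_C(1) << (cpu - 64);
	}
}

static bool demo_test(const struct demo_mask *m, int cpu)
{
	assert(cpu >= 0 && cpu < 128);
	return cpu < 64 ? (m->lo >> cpu) & 1 : (m->hi >> (cpu - 64)) & 1;
}

/*
 * Replay the report for the LLC mask of CPU#30.
 * clear_on_remove == false models the behaviour described in the report,
 * clear_on_remove == true models clearing the mask at hot-remove time.
 */
static struct demo_mask demo_replay_cpu30(bool clear_on_remove)
{
	struct demo_mask llc = {0, 0};

	/* Boot: socket#2 holds CPUs 30-44 and their HT siblings 90-104. */
	demo_set_range(&llc, 30, 44);
	demo_set_range(&llc, 90, 104);

	/* Hot-remove socket#2 and #3. */
	if (clear_on_remove)
		llc = (struct demo_mask){0, 0};

	/* Hot-add: socket#2 is now numbered 30-59. */
	demo_set_range(&llc, 30, 59);
	return llc;
}
```

With clear_on_remove set to false, the returned mask still contains
CPUs 90-104 next to 30-59, which is the stale state described above.
With it set to true, only 30-59 remain.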
* Re: [PATCH v5] x86, cpu-hotplug: fix llc shared map unreleased during cpu hotplug
2014-09-23 4:46 ` Kamezawa Hiroyuki
@ 2014-09-23 6:36 ` Wanpeng Li
2014-09-23 7:56 ` Kamezawa Hiroyuki
0 siblings, 1 reply; 9+ messages in thread
From: Wanpeng Li @ 2014-09-23 6:36 UTC (permalink / raw)
To: Kamezawa Hiroyuki, Wanpeng Li, Ingo Molnar, hpa, Peter Zijlstra
Cc: Ingo Molnar, x86, Borislav Petkov, Yasuaki Ishimatsu,
David Rientjes, Prarit Bhargava, Steven Rostedt, Toshi Kani,
linux-kernel
Hi Kamezawa,
On 14-9-23 12:46 PM, Kamezawa Hiroyuki wrote:
> (2014/09/17 16:17), Wanpeng Li wrote:
>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
>> IP: [..] find_busiest_group
>> PGD 5a9d5067 PUD 13067 PMD 0
>> Oops: 0000 [#3] SMP
>> [...]
>> Call Trace:
>> load_balance
>> ? _raw_spin_unlock_irqrestore
>> idle_balance
>> __schedule
>> schedule
>> schedule_timeout
>> ? lock_timer_base
>> schedule_timeout_uninterruptible
>> msleep
>> lock_device_hotplug_sysfs
>> online_store
>> dev_attr_store
>> sysfs_write_file
>> vfs_write
>> SyS_write
>> system_call_fastpath
>>
>> This bug can be triggered by hot add and remove large number of xen
>> domain0's vcpus repeatedly.
>>
>> Last level cache shared map is built during cpu up and build sched domain
>> routine takes advantage of it to setup sched domain cpu topology, however,
>> llc shared map is unreleased during cpu disable which lead to invalid sched
>> domain cpu topology. This patch fix it by release llc shared map correctly
>> during cpu disable.
>>
>> Reviewed-by: Toshi Kani <toshi.kani@hp.com>
>> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>> Tested-by: Linn Crosetto <linn@hp.com>
>> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
> Yasuaki reported this can happen on our real hardware.
> https://lkml.org/lkml/2014/7/22/1018
>
> Our case is here.
> ==
> Here is an example from my system.
> My system has 4 sockets, each socket has 15 cores, and HT is enabled.
> In this case, the cores of each socket are numbered as follows:
>
> | CPU#
> Socket#0 | 0-14 , 60-74
> Socket#1 | 15-29, 75-89
> Socket#2 | 30-44, 90-104
> Socket#3 | 45-59, 105-119
> Then llc_shared_mask of CPU#30 has 0x3fff80000001fffc0000000.
> It means that last level cache of Socket#2 is shared with
> CPU#30-44 and 90-104.
> When hot-removing socket#2 and #3, each core of sockets is numbered
> as follows:
>
> | CPU#
> Socket#0 | 0-14 , 60-74
> Socket#1 | 15-29, 75-89
> But llc_shared_mask is not cleared. So llc_shared_mask of CPU#30 remains
> having 0x3fff80000001fffc0000000.
> After that, when hot-adding socket#2 and #3, each core of sockets is
> numbered as follows:
>
> | CPU#
> Socket#0 | 0-14 , 60-74
> Socket#1 | 15-29, 75-89
> Socket#2 | 30-59
> Socket#3 | 90-119
> Then llc_shared_mask of CPU#30 becomes 0x3fff8000fffffffc0000000.
> It means that last level cache of Socket#2 is shared with CPU#30-59
> and 90-104. So the mask has wrong value.
> At first, I cleared hot-removed CPU number's bit from llc_shared_map
> when hot removing CPU. But Borislav suggested that the problem will
> disappear if readded CPU is assigned same CPU number. And llc_shared_map
> must not be changed.
> ==
>
> So, please.
As I mentioned before, we still observe the call trace after Yasuaki's
patch is applied.
https://lkml.org/lkml/2014/7/29/40
Actually, I would prefer to merge both patches: one to fix the llc shared
map not being released during hotplug, and the other to assign the same
CPU number to a re-added CPU.
Regards,
Wanpeng Li
> Thanks,
> -Kame
* Re: [PATCH v5] x86, cpu-hotplug: fix llc shared map unreleased during cpu hotplug
2014-09-23 6:36 ` Wanpeng Li
@ 2014-09-23 7:56 ` Kamezawa Hiroyuki
0 siblings, 0 replies; 9+ messages in thread
From: Kamezawa Hiroyuki @ 2014-09-23 7:56 UTC (permalink / raw)
To: Wanpeng Li, Wanpeng Li, Ingo Molnar, hpa, Peter Zijlstra
Cc: Ingo Molnar, x86, Borislav Petkov, Yasuaki Ishimatsu,
David Rientjes, Prarit Bhargava, Steven Rostedt, Toshi Kani,
linux-kernel
(2014/09/23 15:36), Wanpeng Li wrote:
> Hi Kamezawa,
> On 14-9-23 12:46 PM, Kamezawa Hiroyuki wrote:
>> (2014/09/17 16:17), Wanpeng Li wrote:
>>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
>>> IP: [..] find_busiest_group
>>> PGD 5a9d5067 PUD 13067 PMD 0
>>> Oops: 0000 [#3] SMP
>>> [...]
>>> Call Trace:
>>> load_balance
>>> ? _raw_spin_unlock_irqrestore
>>> idle_balance
>>> __schedule
>>> schedule
>>> schedule_timeout
>>> ? lock_timer_base
>>> schedule_timeout_uninterruptible
>>> msleep
>>> lock_device_hotplug_sysfs
>>> online_store
>>> dev_attr_store
>>> sysfs_write_file
>>> vfs_write
>>> SyS_write
>>> system_call_fastpath
>>>
>>> This bug can be triggered by hot add and remove large number of xen
>>> domain0's vcpus repeatedly.
>>>
>>> Last level cache shared map is built during cpu up and build sched domain
>>> routine takes advantage of it to setup sched domain cpu topology, however,
>>> llc shared map is unreleased during cpu disable which lead to invalid sched
>>> domain cpu topology. This patch fix it by release llc shared map correctly
>>> during cpu disable.
>>>
>>> Reviewed-by: Toshi Kani <toshi.kani@hp.com>
>>> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>>> Tested-by: Linn Crosetto <linn@hp.com>
>>> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
>> Yasuaki reported this can happen on our real hardware.
>> https://lkml.org/lkml/2014/7/22/1018
>>
>> Our case is here.
>> ==
>> Here is an example from my system.
>> My system has 4 sockets, each socket has 15 cores, and HT is enabled.
>> In this case, the cores of each socket are numbered as follows:
>>
>> | CPU#
>> Socket#0 | 0-14 , 60-74
>> Socket#1 | 15-29, 75-89
>> Socket#2 | 30-44, 90-104
>> Socket#3 | 45-59, 105-119
>> Then llc_shared_mask of CPU#30 has 0x3fff80000001fffc0000000.
>> It means that last level cache of Socket#2 is shared with
>> CPU#30-44 and 90-104.
>> When hot-removing socket#2 and #3, each core of sockets is numbered
>> as follows:
>>
>> | CPU#
>> Socket#0 | 0-14 , 60-74
>> Socket#1 | 15-29, 75-89
>> But llc_shared_mask is not cleared. So llc_shared_mask of CPU#30 remains
>> having 0x3fff80000001fffc0000000.
>> After that, when hot-adding socket#2 and #3, each core of sockets is
>> numbered as follows:
>>
>> | CPU#
>> Socket#0 | 0-14 , 60-74
>> Socket#1 | 15-29, 75-89
>> Socket#2 | 30-59
>> Socket#3 | 90-119
>> Then llc_shared_mask of CPU#30 becomes 0x3fff8000fffffffc0000000.
>> It means that last level cache of Socket#2 is shared with CPU#30-59
>> and 90-104. So the mask has wrong value.
>> At first, I cleared hot-removed CPU number's bit from llc_shared_map
>> when hot removing CPU. But Borislav suggested that the problem will
>> disappear if readded CPU is assigned same CPU number. And llc_shared_map
>> must not be changed.
>> ==
>>
>> So, please.
>
> As I mentioned before, we still observe calltrace after Yasuaki's patch
> applied.
> https://lkml.org/lkml/2014/7/29/40
>
Yes.
I just wanted to say that we need your patch, by showing a real hardware
case. Sorry for the confusion; I just reused his explanation of the problem.
I know Yasuaki's original attempt cleared the llc_shared map, as your
patch does.
> Actually I prefer to merge both patches, one for fix llc shared map
> unreleased during hotplug and the other one for assign same CPU number
> to readded CPU.
>
I agree.
Thanks,
-Kame
* Re: [PATCH v5] x86, cpu-hotplug: fix llc shared map unreleased during cpu hotplug
2014-09-17 7:17 [PATCH v5] x86, cpu-hotplug: fix llc shared map unreleased during cpu hotplug Wanpeng Li
2014-09-21 23:11 ` Wanpeng Li
2014-09-23 4:46 ` Kamezawa Hiroyuki
@ 2014-09-23 9:37 ` Borislav Petkov
2014-09-23 23:48 ` Wanpeng Li
2 siblings, 1 reply; 9+ messages in thread
From: Borislav Petkov @ 2014-09-23 9:37 UTC (permalink / raw)
To: Wanpeng Li
Cc: Ingo Molnar, hpa, Peter Zijlstra, Ingo Molnar, x86,
Yasuaki Ishimatsu, David Rientjes, Prarit Bhargava,
Steven Rostedt, Toshi Kani, linux-kernel
On Wed, Sep 17, 2014 at 03:17:52PM +0800, Wanpeng Li wrote:
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
> IP: [..] find_busiest_group
> PGD 5a9d5067 PUD 13067 PMD 0
> Oops: 0000 [#3] SMP
> [...]
> Call Trace:
> load_balance
> ? _raw_spin_unlock_irqrestore
> idle_balance
> __schedule
> schedule
> schedule_timeout
> ? lock_timer_base
> schedule_timeout_uninterruptible
> msleep
> lock_device_hotplug_sysfs
> online_store
> dev_attr_store
> sysfs_write_file
> vfs_write
> SyS_write
> system_call_fastpath
>
> This bug can be triggered by hot add and remove large number of xen
> domain0's vcpus repeatedly.
>
> Last level cache shared map is built during cpu up and build sched domain
> routine takes advantage of it to setup sched domain cpu topology, however,
> llc shared map is unreleased during cpu disable which lead to invalid sched
> domain cpu topology. This patch fix it by release llc shared map correctly
> during cpu disable.
>
> Reviewed-by: Toshi Kani <toshi.kani@hp.com>
> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> Tested-by: Linn Crosetto <linn@hp.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
* Re: [PATCH v5] x86, cpu-hotplug: fix llc shared map unreleased during cpu hotplug
2014-09-23 9:37 ` Borislav Petkov
@ 2014-09-23 23:48 ` Wanpeng Li
2014-09-24 7:52 ` Ingo Molnar
0 siblings, 1 reply; 9+ messages in thread
From: Wanpeng Li @ 2014-09-23 23:48 UTC (permalink / raw)
To: Borislav Petkov, Wanpeng Li
Cc: Ingo Molnar, hpa, Peter Zijlstra, Ingo Molnar, x86,
Yasuaki Ishimatsu, David Rientjes, Prarit Bhargava,
Steven Rostedt, Toshi Kani, linux-kernel
On 14-9-23 5:37 PM, Borislav Petkov wrote:
> On Wed, Sep 17, 2014 at 03:17:52PM +0800, Wanpeng Li wrote:
>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
>> IP: [..] find_busiest_group
>> PGD 5a9d5067 PUD 13067 PMD 0
>> Oops: 0000 [#3] SMP
>> [...]
>> Call Trace:
>> load_balance
>> ? _raw_spin_unlock_irqrestore
>> idle_balance
>> __schedule
>> schedule
>> schedule_timeout
>> ? lock_timer_base
>> schedule_timeout_uninterruptible
>> msleep
>> lock_device_hotplug_sysfs
>> online_store
>> dev_attr_store
>> sysfs_write_file
>> vfs_write
>> SyS_write
>> system_call_fastpath
>>
>> This bug can be triggered by hot add and remove large number of xen
>> domain0's vcpus repeatedly.
>>
>> Last level cache shared map is built during cpu up and build sched domain
>> routine takes advantage of it to setup sched domain cpu topology, however,
>> llc shared map is unreleased during cpu disable which lead to invalid sched
>> domain cpu topology. This patch fix it by release llc shared map correctly
>> during cpu disable.
>>
>> Reviewed-by: Toshi Kani <toshi.kani@hp.com>
>> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>> Tested-by: Linn Crosetto <linn@hp.com>
>> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
> Reviewed-by: Borislav Petkov <bp@suse.de>
>
Thanks.
Ingo, Peter Z, HPA,
Could this patch make the 3.18 merge window?
Regards,
Wanpeng Li
* Re: [PATCH v5] x86, cpu-hotplug: fix llc shared map unreleased during cpu hotplug
2014-09-23 23:48 ` Wanpeng Li
@ 2014-09-24 7:52 ` Ingo Molnar
2014-09-24 8:18 ` Wanpeng Li
0 siblings, 1 reply; 9+ messages in thread
From: Ingo Molnar @ 2014-09-24 7:52 UTC (permalink / raw)
To: Wanpeng Li
Cc: Borislav Petkov, Wanpeng Li, Ingo Molnar, hpa, Peter Zijlstra,
x86, Yasuaki Ishimatsu, David Rientjes, Prarit Bhargava,
Steven Rostedt, Toshi Kani, linux-kernel
* Wanpeng Li <kernellwp@gmail.com> wrote:
>
> On 14-9-23 5:37 PM, Borislav Petkov wrote:
> >On Wed, Sep 17, 2014 at 03:17:52PM +0800, Wanpeng Li wrote:
> >>BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
> >>IP: [..] find_busiest_group
> >>PGD 5a9d5067 PUD 13067 PMD 0
> >>Oops: 0000 [#3] SMP
> >>[...]
> >>Call Trace:
> >>load_balance
> >>? _raw_spin_unlock_irqrestore
> >>idle_balance
> >>__schedule
> >>schedule
> >>schedule_timeout
> >>? lock_timer_base
> >>schedule_timeout_uninterruptible
> >>msleep
> >>lock_device_hotplug_sysfs
> >>online_store
> >>dev_attr_store
> >>sysfs_write_file
> >>vfs_write
> >>SyS_write
> >>system_call_fastpath
> >>
> >>This bug can be triggered by hot add and remove large number of xen
> >>domain0's vcpus repeatedly.
> >>
> >>Last level cache shared map is built during cpu up and build sched domain
> >>routine takes advantage of it to setup sched domain cpu topology, however,
> >>llc shared map is unreleased during cpu disable which lead to invalid sched
> >>domain cpu topology. This patch fix it by release llc shared map correctly
> >>during cpu disable.
> >>
> >>Reviewed-by: Toshi Kani <toshi.kani@hp.com>
> >>Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> >>Tested-by: Linn Crosetto <linn@hp.com>
> >>Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
> >Reviewed-by: Borislav Petkov <bp@suse.de>
> >
>
> Thanks.
>
>
> Ingo, Peter Z, HPA,
>
>
> Could this patch catch up with 3.18 merge window?
Please also add the real-hardware reports to the changelog.
Thanks,
Ingo
* Re: [PATCH v5] x86, cpu-hotplug: fix llc shared map unreleased during cpu hotplug
2014-09-24 7:52 ` Ingo Molnar
@ 2014-09-24 8:18 ` Wanpeng Li
0 siblings, 0 replies; 9+ messages in thread
From: Wanpeng Li @ 2014-09-24 8:18 UTC (permalink / raw)
To: Ingo Molnar
Cc: Borislav Petkov, Wanpeng Li, Ingo Molnar, hpa, Peter Zijlstra,
x86, Yasuaki Ishimatsu, David Rientjes, Prarit Bhargava,
Steven Rostedt, Toshi Kani, linux-kernel
On 9/24/14, 3:52 PM, Ingo Molnar wrote:
> * Wanpeng Li <kernellwp@gmail.com> wrote:
>
>> On 14-9-23 5:37 PM, Borislav Petkov wrote:
>>> On Wed, Sep 17, 2014 at 03:17:52PM +0800, Wanpeng Li wrote:
>>>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
>>>> IP: [..] find_busiest_group
>>>> PGD 5a9d5067 PUD 13067 PMD 0
>>>> Oops: 0000 [#3] SMP
>>>> [...]
>>>> Call Trace:
>>>> load_balance
>>>> ? _raw_spin_unlock_irqrestore
>>>> idle_balance
>>>> __schedule
>>>> schedule
>>>> schedule_timeout
>>>> ? lock_timer_base
>>>> schedule_timeout_uninterruptible
>>>> msleep
>>>> lock_device_hotplug_sysfs
>>>> online_store
>>>> dev_attr_store
>>>> sysfs_write_file
>>>> vfs_write
>>>> SyS_write
>>>> system_call_fastpath
>>>>
>>>> This bug can be triggered by hot add and remove large number of xen
>>>> domain0's vcpus repeatedly.
>>>>
>>>> Last level cache shared map is built during cpu up and build sched domain
>>>> routine takes advantage of it to setup sched domain cpu topology, however,
>>>> llc shared map is unreleased during cpu disable which lead to invalid sched
>>>> domain cpu topology. This patch fix it by release llc shared map correctly
>>>> during cpu disable.
>>>>
>>>> Reviewed-by: Toshi Kani <toshi.kani@hp.com>
>>>> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>>>> Tested-by: Linn Crosetto <linn@hp.com>
>>>> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
>>> Reviewed-by: Borislav Petkov <bp@suse.de>
>>>
>> Thanks.
>>
>>
>> Ingo, Peter Z, HPA,
>>
>>
>> Could this patch catch up with 3.18 merge window?
> Please also add the real-hardware reports to the changelog.
I have just sent out the latest version.
Regards,
Wanpeng Li
>
> Thanks,
>
> Ingo