linux-kernel.vger.kernel.org archive mirror
* [PATCH v5] x86, cpu-hotplug: fix llc shared map unreleased during cpu hotplug
@ 2014-09-17  7:17 Wanpeng Li
  2014-09-21 23:11 ` Wanpeng Li
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Wanpeng Li @ 2014-09-17  7:17 UTC (permalink / raw)
  To: Ingo Molnar, hpa, Peter Zijlstra
  Cc: Ingo Molnar, x86, Borislav Petkov, Yasuaki Ishimatsu,
	David Rientjes, Prarit Bhargava, Steven Rostedt, Toshi Kani,
	linux-kernel, Wanpeng Li

BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
IP: [..] find_busiest_group
PGD 5a9d5067 PUD 13067 PMD 0
Oops: 0000 [#3] SMP
[...]
Call Trace:
load_balance
? _raw_spin_unlock_irqrestore
idle_balance
__schedule
schedule
schedule_timeout
? lock_timer_base
schedule_timeout_uninterruptible
msleep
lock_device_hotplug_sysfs
online_store
dev_attr_store
sysfs_write_file
vfs_write
SyS_write
system_call_fastpath

This bug can be triggered by repeatedly hot-adding and hot-removing a large
number of Xen domain0 vCPUs.

The last level cache (LLC) shared map is built during CPU up, and the sched
domain build routine uses it to set up the sched domain CPU topology.
However, the LLC shared map is not released during CPU disable, which leads
to an invalid sched domain CPU topology. This patch fixes it by releasing
the LLC shared map correctly during CPU disable.

Reviewed-by: Toshi Kani <toshi.kani@hp.com>
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Tested-by: Linn Crosetto <linn@hp.com>
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
---
v4 -> v5:
 * add the description when the bug can occur
v3 -> v4:
 * simplify backtrace
v2 -> v3:
 * simplify backtrace 
v1 -> v2:
 * fix subject line
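
The bookkeeping described in the changelog can be modeled in userspace.
This is only a sketch, not kernel code: it assumes a hypothetical 8-CPU
system and represents each CPU's LLC shared mask as one 64-bit word.
NR_CPUS, llc_shared[], model_cpu_up() and model_cpu_down() are
illustrative names; model_cpu_down() mirrors the three lines this patch
adds to remove_siblinginfo().

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 8	/* hypothetical small system, for illustration only */

/* llc_shared[i] has bit j set when CPU j shares a last level cache with CPU i. */
static uint64_t llc_shared[NR_CPUS];

/*
 * Model of CPU up: link cpu with every CPU in llc_group.
 * llc_group is the set of online CPUs sharing the cache, including cpu itself.
 */
static void model_cpu_up(int cpu, uint64_t llc_group)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	for (int sibling = 0; sibling < NR_CPUS; sibling++) {
		if (!(llc_group & (UINT64_C(1) << sibling)))
			continue;
		llc_shared[cpu] |= UINT64_C(1) << sibling;
		llc_shared[sibling] |= UINT64_C(1) << cpu;
	}
}

/*
 * Model of CPU down with the cleanup applied: remove cpu from the mask of
 * every sibling, then clear cpu's own mask.
 */
static void model_cpu_down(int cpu)
{
	uint64_t mask;

	assert(cpu >= 0 && cpu < NR_CPUS);
	mask = llc_shared[cpu];
	for (int sibling = 0; sibling < NR_CPUS; sibling++)
		if (mask & (UINT64_C(1) << sibling))
			llc_shared[sibling] &= ~(UINT64_C(1) << cpu);
	llc_shared[cpu] = 0;
}
```

For example, after model_cpu_up(0, 0x1) and model_cpu_up(1, 0x3) both
masks hold 0x3; model_cpu_down(1) then leaves llc_shared[0] at 0x1 and
llc_shared[1] at 0, so no stale bit survives for a later CPU up.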

 arch/x86/kernel/smpboot.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 5492798..0134ec7 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1292,6 +1292,9 @@ static void remove_siblinginfo(int cpu)
 
 	for_each_cpu(sibling, cpu_sibling_mask(cpu))
 		cpumask_clear_cpu(cpu, cpu_sibling_mask(sibling));
+	for_each_cpu(sibling, cpu_llc_shared_mask(cpu))
+		cpumask_clear_cpu(cpu, cpu_llc_shared_mask(sibling));
+	cpumask_clear(cpu_llc_shared_mask(cpu));
 	cpumask_clear(cpu_sibling_mask(cpu));
 	cpumask_clear(cpu_core_mask(cpu));
 	c->phys_proc_id = 0;
-- 
1.9.1



* Re: [PATCH v5] x86, cpu-hotplug: fix llc shared map unreleased during cpu hotplug
  2014-09-17  7:17 [PATCH v5] x86, cpu-hotplug: fix llc shared map unreleased during cpu hotplug Wanpeng Li
@ 2014-09-21 23:11 ` Wanpeng Li
  2014-09-23  4:46 ` Kamezawa Hiroyuki
  2014-09-23  9:37 ` Borislav Petkov
  2 siblings, 0 replies; 9+ messages in thread
From: Wanpeng Li @ 2014-09-21 23:11 UTC (permalink / raw)
  To: Wanpeng Li, Ingo Molnar, hpa, Peter Zijlstra
  Cc: Ingo Molnar, x86, Borislav Petkov, Yasuaki Ishimatsu,
	David Rientjes, Prarit Bhargava, Steven Rostedt, Toshi Kani,
	linux-kernel

Ping Ingo, Peter Z, HPA,

On 14-9-17 3:17 PM, Wanpeng Li wrote:
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
> IP: [..] find_busiest_group
> PGD 5a9d5067 PUD 13067 PMD 0
> Oops: 0000 [#3] SMP
> [...]
> Call Trace:
> load_balance
> ? _raw_spin_unlock_irqrestore
> idle_balance
> __schedule
> schedule
> schedule_timeout
> ? lock_timer_base
> schedule_timeout_uninterruptible
> msleep
> lock_device_hotplug_sysfs
> online_store
> dev_attr_store
> sysfs_write_file
> vfs_write
> SyS_write
> system_call_fastpath
>
> This bug can be triggered by hot add and remove large number of xen
> domain0's vcpus repeatedly.
>
> Last level cache shared map is built during cpu up and build sched domain 
> routine takes advantage of it to setup sched domain cpu topology, however, 
> llc shared map is unreleased during cpu disable which lead to invalid sched 
> domain cpu topology. This patch fix it by release llc shared map correctly
> during cpu disable.
>
> Reviewed-by: Toshi Kani <toshi.kani@hp.com>
> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> Tested-by: Linn Crosetto <linn@hp.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
> ---
> v4 -> v5:
>  * add the description when the bug can occur
> v3 -> v4:
>  * simplify backtrace
> v2 -> v3:
>  * simplify backtrace 
> v1 -> v2:
>  * fix subject line
>
>  arch/x86/kernel/smpboot.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 5492798..0134ec7 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1292,6 +1292,9 @@ static void remove_siblinginfo(int cpu)
>  
>  	for_each_cpu(sibling, cpu_sibling_mask(cpu))
>  		cpumask_clear_cpu(cpu, cpu_sibling_mask(sibling));
> +	for_each_cpu(sibling, cpu_llc_shared_mask(cpu))
> +		cpumask_clear_cpu(cpu, cpu_llc_shared_mask(sibling));
> +	cpumask_clear(cpu_llc_shared_mask(cpu));
>  	cpumask_clear(cpu_sibling_mask(cpu));
>  	cpumask_clear(cpu_core_mask(cpu));
>  	c->phys_proc_id = 0;



* Re: [PATCH v5] x86, cpu-hotplug: fix llc shared map unreleased during cpu hotplug
  2014-09-17  7:17 [PATCH v5] x86, cpu-hotplug: fix llc shared map unreleased during cpu hotplug Wanpeng Li
  2014-09-21 23:11 ` Wanpeng Li
@ 2014-09-23  4:46 ` Kamezawa Hiroyuki
  2014-09-23  6:36   ` Wanpeng Li
  2014-09-23  9:37 ` Borislav Petkov
  2 siblings, 1 reply; 9+ messages in thread
From: Kamezawa Hiroyuki @ 2014-09-23  4:46 UTC (permalink / raw)
  To: Wanpeng Li, Ingo Molnar, hpa, Peter Zijlstra
  Cc: Ingo Molnar, x86, Borislav Petkov, Yasuaki Ishimatsu,
	David Rientjes, Prarit Bhargava, Steven Rostedt, Toshi Kani,
	linux-kernel

(2014/09/17 16:17), Wanpeng Li wrote:
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
> IP: [..] find_busiest_group
> PGD 5a9d5067 PUD 13067 PMD 0
> Oops: 0000 [#3] SMP
> [...]
> Call Trace:
> load_balance
> ? _raw_spin_unlock_irqrestore
> idle_balance
> __schedule
> schedule
> schedule_timeout
> ? lock_timer_base
> schedule_timeout_uninterruptible
> msleep
> lock_device_hotplug_sysfs
> online_store
> dev_attr_store
> sysfs_write_file
> vfs_write
> SyS_write
> system_call_fastpath
> 
> This bug can be triggered by hot add and remove large number of xen
> domain0's vcpus repeatedly.
> 
> Last level cache shared map is built during cpu up and build sched domain
> routine takes advantage of it to setup sched domain cpu topology, however,
> llc shared map is unreleased during cpu disable which lead to invalid sched
> domain cpu topology. This patch fix it by release llc shared map correctly
> during cpu disable.
> 
> Reviewed-by: Toshi Kani <toshi.kani@hp.com>
> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> Tested-by: Linn Crosetto <linn@hp.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>

Yasuaki reported that this can happen on our real hardware:
https://lkml.org/lkml/2014/7/22/1018

Here is our case.
==
Here is an example from my system.
My system has 4 sockets, each socket has 15 cores, and HT is enabled.
In this case, the CPUs of each socket are numbered as follows:

         | CPU#
Socket#0 | 0-14 , 60-74
Socket#1 | 15-29, 75-89
Socket#2 | 30-44, 90-104
Socket#3 | 45-59, 105-119

Then the llc_shared_mask of CPU#30 is 0x3fff80000001fffc0000000.
It means that the last level cache of Socket#2 is shared by
CPU#30-44 and 90-104.

After hot-removing socket#2 and #3, the CPUs of each socket are numbered
as follows:

         | CPU#
Socket#0 | 0-14 , 60-74
Socket#1 | 15-29, 75-89

But llc_shared_mask is not cleared, so the llc_shared_mask of CPU#30 still
has 0x3fff80000001fffc0000000.

After that, when hot-adding socket#2 and #3, the CPUs of each socket are
numbered as follows:

         | CPU#
Socket#0 | 0-14 , 60-74
Socket#1 | 15-29, 75-89
Socket#2 | 30-59
Socket#3 | 90-119

Then the llc_shared_mask of CPU#30 becomes 0x3fff8000fffffffc0000000.
It means that the last level cache of Socket#2 is shared by CPU#30-59
and 90-104, so the mask has a wrong value.

At first, I cleared the hot-removed CPU's bit from llc_shared_map
when hot-removing a CPU. But Borislav suggested that the problem will
disappear if a re-added CPU is assigned the same CPU number. And
llc_shared_map must not be changed.
==
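
The renumbering scenario above can be reproduced with a small userspace
model. This is a sketch under stated assumptions, not kernel code:
llc_shared[][], mark(), link_llc(), unlink_cpu() and count_shared() are
hypothetical names, and the model only tracks which CPUs are recorded as
sharing a cache, so it does not reproduce the hex values quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_CPUS 120	/* 4 sockets x 15 cores x 2 threads, as in the report */

/* llc_shared[i][j] is true when CPU j is recorded as sharing an LLC with CPU i. */
static bool llc_shared[NR_CPUS][NR_CPUS];

/* Add CPUs lo..hi (inclusive) to a socket membership set. */
static void mark(bool member[NR_CPUS], int lo, int hi)
{
	assert(0 <= lo && lo <= hi && hi < NR_CPUS);
	for (int cpu = lo; cpu <= hi; cpu++)
		member[cpu] = true;
}

/* CPU up for a whole socket: record every pair of members as sharing an LLC. */
static void link_llc(const bool member[NR_CPUS])
{
	for (int i = 0; i < NR_CPUS; i++)
		for (int j = 0; j < NR_CPUS; j++)
			if (member[i] && member[j])
				llc_shared[i][j] = true;
}

/* CPU down with the cleanup: drop cpu from its siblings' masks, then clear its own. */
static void unlink_cpu(int cpu)
{
	for (int sibling = 0; sibling < NR_CPUS; sibling++)
		if (llc_shared[cpu][sibling])
			llc_shared[sibling][cpu] = false;
	memset(llc_shared[cpu], 0, sizeof(llc_shared[cpu]));
}

/* Number of CPUs recorded in cpu's LLC shared mask. */
static int count_shared(int cpu)
{
	int n = 0;

	for (int other = 0; other < NR_CPUS; other++)
		n += llc_shared[cpu][other];
	return n;
}
```

Linking 30-44 and 90-104, skipping unlink_cpu() on removal, and then
linking 30-59 leaves count_shared(30) at 45 with llc_shared[30][90] still
set. Calling unlink_cpu() for each removed CPU before the re-add gives 30.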

So, please apply this patch.

Thanks,
-Kame

* Re: [PATCH v5] x86, cpu-hotplug: fix llc shared map unreleased during cpu hotplug
  2014-09-23  4:46 ` Kamezawa Hiroyuki
@ 2014-09-23  6:36   ` Wanpeng Li
  2014-09-23  7:56     ` Kamezawa Hiroyuki
  0 siblings, 1 reply; 9+ messages in thread
From: Wanpeng Li @ 2014-09-23  6:36 UTC (permalink / raw)
  To: Kamezawa Hiroyuki, Wanpeng Li, Ingo Molnar, hpa, Peter Zijlstra
  Cc: Ingo Molnar, x86, Borislav Petkov, Yasuaki Ishimatsu,
	David Rientjes, Prarit Bhargava, Steven Rostedt, Toshi Kani,
	linux-kernel

Hi Kamezawa,
On 14-9-23 12:46 PM, Kamezawa Hiroyuki wrote:
> (2014/09/17 16:17), Wanpeng Li wrote:
>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
>> IP: [..] find_busiest_group
>> PGD 5a9d5067 PUD 13067 PMD 0
>> Oops: 0000 [#3] SMP
>> [...]
>> Call Trace:
>> load_balance
>> ? _raw_spin_unlock_irqrestore
>> idle_balance
>> __schedule
>> schedule
>> schedule_timeout
>> ? lock_timer_base
>> schedule_timeout_uninterruptible
>> msleep
>> lock_device_hotplug_sysfs
>> online_store
>> dev_attr_store
>> sysfs_write_file
>> vfs_write
>> SyS_write
>> system_call_fastpath
>>
>> This bug can be triggered by hot add and remove large number of xen
>> domain0's vcpus repeatedly.
>>
>> Last level cache shared map is built during cpu up and build sched domain
>> routine takes advantage of it to setup sched domain cpu topology, however,
>> llc shared map is unreleased during cpu disable which lead to invalid sched
>> domain cpu topology. This patch fix it by release llc shared map correctly
>> during cpu disable.
>>
>> Reviewed-by: Toshi Kani <toshi.kani@hp.com>
>> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>> Tested-by: Linn Crosetto <linn@hp.com>
>> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
> Yasuaki reported this can happen on our real hardware. 
> https://lkml.org/lkml/2014/7/22/1018
>
> Our case is here.
> ==
> Here is a example on my system.
> My system has 4 sockets and each socket has 15 cores and HT is enabled.
> In this case, each core of sockes is numbered as follows:
>
>           | CPU#
> Socket#0 | 0-14 , 60-74
> Socket#1 | 15-29, 75-89
> Socket#2 | 30-44, 90-104
> Socket#3 | 45-59, 105-119
> Then llc_shared_mask of CPU#30 has 0x3fff80000001fffc0000000.
> It means that last level cache of Socket#2 is shared with
> CPU#30-44 and 90-104.
> When hot-removing socket#2 and #3, each core of sockets is numbered
> as follows:
>
>           | CPU#
> Socket#0 | 0-14 , 60-74
> Socket#1 | 15-29, 75-89
> But llc_shared_mask is not cleared. So llc_shared_mask of CPU#30 remains
> having 0x3fff80000001fffc0000000.
> After that, when hot-adding socket#2 and #3, each core of sockets is
> numbered as follows:
>
>           | CPU#
> Socket#0 | 0-14 , 60-74
> Socket#1 | 15-29, 75-89
> Socket#2 | 30-59
> Socket#3 | 90-119
> Then llc_shared_mask of CPU#30 becomes 0x3fff8000fffffffc0000000.
> It means that last level cache of Socket#2 is shared with CPU#30-59
> and 90-104. So the mask has wrong value.
> At first, I cleared hot-removed CPU number's bit from llc_shared_map
> when hot removing CPU. But Borislav suggested that the problem will
> disappear if readded CPU is assigned same CPU number. And llc_shared_map
> must not be changed.
> ==
>
> So, please.

As I mentioned before, we still observe the call trace with Yasuaki's patch
applied.
https://lkml.org/lkml/2014/7/29/40

Actually, I would prefer to merge both patches: one to fix the llc shared
map not being released during hotplug, and the other to assign the same CPU
number to a re-added CPU.

Regards,
Wanpeng Li

> Thanks,
> -Kame

* Re: [PATCH v5] x86, cpu-hotplug: fix llc shared map unreleased during cpu hotplug
  2014-09-23  6:36   ` Wanpeng Li
@ 2014-09-23  7:56     ` Kamezawa Hiroyuki
  0 siblings, 0 replies; 9+ messages in thread
From: Kamezawa Hiroyuki @ 2014-09-23  7:56 UTC (permalink / raw)
  To: Wanpeng Li, Wanpeng Li, Ingo Molnar, hpa, Peter Zijlstra
  Cc: Ingo Molnar, x86, Borislav Petkov, Yasuaki Ishimatsu,
	David Rientjes, Prarit Bhargava, Steven Rostedt, Toshi Kani,
	linux-kernel

(2014/09/23 15:36), Wanpeng Li wrote:
> Hi Kamezawa,
> On 14-9-23 12:46 PM, Kamezawa Hiroyuki wrote:
>> (2014/09/17 16:17), Wanpeng Li wrote:
>>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
>>> IP: [..] find_busiest_group
>>> PGD 5a9d5067 PUD 13067 PMD 0
>>> Oops: 0000 [#3] SMP
>>> [...]
>>> Call Trace:
>>> load_balance
>>> ? _raw_spin_unlock_irqrestore
>>> idle_balance
>>> __schedule
>>> schedule
>>> schedule_timeout
>>> ? lock_timer_base
>>> schedule_timeout_uninterruptible
>>> msleep
>>> lock_device_hotplug_sysfs
>>> online_store
>>> dev_attr_store
>>> sysfs_write_file
>>> vfs_write
>>> SyS_write
>>> system_call_fastpath
>>>
>>> This bug can be triggered by hot add and remove large number of xen
>>> domain0's vcpus repeatedly.
>>>
>>> Last level cache shared map is built during cpu up and build sched domain
>>> routine takes advantage of it to setup sched domain cpu topology, however,
>>> llc shared map is unreleased during cpu disable which lead to invalid sched
>>> domain cpu topology. This patch fix it by release llc shared map correctly
>>> during cpu disable.
>>>
>>> Reviewed-by: Toshi Kani <toshi.kani@hp.com>
>>> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>>> Tested-by: Linn Crosetto <linn@hp.com>
>>> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
>> Yasuaki reported this can happen on our real hardware.
>> https://lkml.org/lkml/2014/7/22/1018
>>
>> Our case is here.
>> ==
>> Here is a example on my system.
>> My system has 4 sockets and each socket has 15 cores and HT is enabled.
>> In this case, each core of sockes is numbered as follows:
>>
>>            | CPU#
>> Socket#0 | 0-14 , 60-74
>> Socket#1 | 15-29, 75-89
>> Socket#2 | 30-44, 90-104
>> Socket#3 | 45-59, 105-119
>> Then llc_shared_mask of CPU#30 has 0x3fff80000001fffc0000000.
>> It means that last level cache of Socket#2 is shared with
>> CPU#30-44 and 90-104.
>> When hot-removing socket#2 and #3, each core of sockets is numbered
>> as follows:
>>
>>            | CPU#
>> Socket#0 | 0-14 , 60-74
>> Socket#1 | 15-29, 75-89
>> But llc_shared_mask is not cleared. So llc_shared_mask of CPU#30 remains
>> having 0x3fff80000001fffc0000000.
>> After that, when hot-adding socket#2 and #3, each core of sockets is
>> numbered as follows:
>>
>>            | CPU#
>> Socket#0 | 0-14 , 60-74
>> Socket#1 | 15-29, 75-89
>> Socket#2 | 30-59
>> Socket#3 | 90-119
>> Then llc_shared_mask of CPU#30 becomes 0x3fff8000fffffffc0000000.
>> It means that last level cache of Socket#2 is shared with CPU#30-59
>> and 90-104. So the mask has wrong value.
>> At first, I cleared hot-removed CPU number's bit from llc_shared_map
>> when hot removing CPU. But Borislav suggested that the problem will
>> disappear if readded CPU is assigned same CPU number. And llc_shared_map
>> must not be changed.
>> ==
>>
>> So, please.
> 
> As I mentioned before, we still observe calltrace after Yasuaki's patch
> applied.
> https://lkml.org/lkml/2014/7/29/40
> 
Yes.
I just wanted to say that we need your patch, by showing a real hardware case.
Sorry for the confusion; I just reused his explanation of the problem.

I know Yasuaki's original attempt was to clear the llc_shared map, as you do.

> Actually I prefer to merge both patches, one for fix llc shared map
> unreleased during hotplug and the other one for assign same CPU number
> to readded CPU.
> 
I agree. 

Thanks,
-Kame



* Re: [PATCH v5] x86, cpu-hotplug: fix llc shared map unreleased during cpu hotplug
  2014-09-17  7:17 [PATCH v5] x86, cpu-hotplug: fix llc shared map unreleased during cpu hotplug Wanpeng Li
  2014-09-21 23:11 ` Wanpeng Li
  2014-09-23  4:46 ` Kamezawa Hiroyuki
@ 2014-09-23  9:37 ` Borislav Petkov
  2014-09-23 23:48   ` Wanpeng Li
  2 siblings, 1 reply; 9+ messages in thread
From: Borislav Petkov @ 2014-09-23  9:37 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Ingo Molnar, hpa, Peter Zijlstra, Ingo Molnar, x86,
	Yasuaki Ishimatsu, David Rientjes, Prarit Bhargava,
	Steven Rostedt, Toshi Kani, linux-kernel

On Wed, Sep 17, 2014 at 03:17:52PM +0800, Wanpeng Li wrote:
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
> IP: [..] find_busiest_group
> PGD 5a9d5067 PUD 13067 PMD 0
> Oops: 0000 [#3] SMP
> [...]
> Call Trace:
> load_balance
> ? _raw_spin_unlock_irqrestore
> idle_balance
> __schedule
> schedule
> schedule_timeout
> ? lock_timer_base
> schedule_timeout_uninterruptible
> msleep
> lock_device_hotplug_sysfs
> online_store
> dev_attr_store
> sysfs_write_file
> vfs_write
> SyS_write
> system_call_fastpath
> 
> This bug can be triggered by hot add and remove large number of xen
> domain0's vcpus repeatedly.
> 
> Last level cache shared map is built during cpu up and build sched domain 
> routine takes advantage of it to setup sched domain cpu topology, however, 
> llc shared map is unreleased during cpu disable which lead to invalid sched 
> domain cpu topology. This patch fix it by release llc shared map correctly
> during cpu disable.
> 
> Reviewed-by: Toshi Kani <toshi.kani@hp.com>
> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> Tested-by: Linn Crosetto <linn@hp.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>

Reviewed-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: [PATCH v5] x86, cpu-hotplug: fix llc shared map unreleased during cpu hotplug
  2014-09-23  9:37 ` Borislav Petkov
@ 2014-09-23 23:48   ` Wanpeng Li
  2014-09-24  7:52     ` Ingo Molnar
  0 siblings, 1 reply; 9+ messages in thread
From: Wanpeng Li @ 2014-09-23 23:48 UTC (permalink / raw)
  To: Borislav Petkov, Wanpeng Li
  Cc: Ingo Molnar, hpa, Peter Zijlstra, Ingo Molnar, x86,
	Yasuaki Ishimatsu, David Rientjes, Prarit Bhargava,
	Steven Rostedt, Toshi Kani, linux-kernel


On 14-9-23 5:37 PM, Borislav Petkov wrote:
> On Wed, Sep 17, 2014 at 03:17:52PM +0800, Wanpeng Li wrote:
>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
>> IP: [..] find_busiest_group
>> PGD 5a9d5067 PUD 13067 PMD 0
>> Oops: 0000 [#3] SMP
>> [...]
>> Call Trace:
>> load_balance
>> ? _raw_spin_unlock_irqrestore
>> idle_balance
>> __schedule
>> schedule
>> schedule_timeout
>> ? lock_timer_base
>> schedule_timeout_uninterruptible
>> msleep
>> lock_device_hotplug_sysfs
>> online_store
>> dev_attr_store
>> sysfs_write_file
>> vfs_write
>> SyS_write
>> system_call_fastpath
>>
>> This bug can be triggered by hot add and remove large number of xen
>> domain0's vcpus repeatedly.
>>
>> Last level cache shared map is built during cpu up and build sched domain
>> routine takes advantage of it to setup sched domain cpu topology, however,
>> llc shared map is unreleased during cpu disable which lead to invalid sched
>> domain cpu topology. This patch fix it by release llc shared map correctly
>> during cpu disable.
>>
>> Reviewed-by: Toshi Kani <toshi.kani@hp.com>
>> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>> Tested-by: Linn Crosetto <linn@hp.com>
>> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
> Reviewed-by: Borislav Petkov <bp@suse.de>
>

Thanks.


Ingo, Peter Z, HPA,


Could this patch make the 3.18 merge window?

Regards,
Wanpeng Li



* Re: [PATCH v5] x86, cpu-hotplug: fix llc shared map unreleased during cpu hotplug
  2014-09-23 23:48   ` Wanpeng Li
@ 2014-09-24  7:52     ` Ingo Molnar
  2014-09-24  8:18       ` Wanpeng Li
  0 siblings, 1 reply; 9+ messages in thread
From: Ingo Molnar @ 2014-09-24  7:52 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Borislav Petkov, Wanpeng Li, Ingo Molnar, hpa, Peter Zijlstra,
	x86, Yasuaki Ishimatsu, David Rientjes, Prarit Bhargava,
	Steven Rostedt, Toshi Kani, linux-kernel


* Wanpeng Li <kernellwp@gmail.com> wrote:

> 
> On 14-9-23 5:37 PM, Borislav Petkov wrote:
> >On Wed, Sep 17, 2014 at 03:17:52PM +0800, Wanpeng Li wrote:
> >>BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
> >>IP: [..] find_busiest_group
> >>PGD 5a9d5067 PUD 13067 PMD 0
> >>Oops: 0000 [#3] SMP
> >>[...]
> >>Call Trace:
> >>load_balance
> >>? _raw_spin_unlock_irqrestore
> >>idle_balance
> >>__schedule
> >>schedule
> >>schedule_timeout
> >>? lock_timer_base
> >>schedule_timeout_uninterruptible
> >>msleep
> >>lock_device_hotplug_sysfs
> >>online_store
> >>dev_attr_store
> >>sysfs_write_file
> >>vfs_write
> >>SyS_write
> >>system_call_fastpath
> >>
> >>This bug can be triggered by hot add and remove large number of xen
> >>domain0's vcpus repeatedly.
> >>
> >>Last level cache shared map is built during cpu up and build sched domain
> >>routine takes advantage of it to setup sched domain cpu topology, however,
> >>llc shared map is unreleased during cpu disable which lead to invalid sched
> >>domain cpu topology. This patch fix it by release llc shared map correctly
> >>during cpu disable.
> >>
> >>Reviewed-by: Toshi Kani <toshi.kani@hp.com>
> >>Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> >>Tested-by: Linn Crosetto <linn@hp.com>
> >>Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
> >Reviewed-by: Borislav Petkov <bp@suse.de>
> >
> 
> Thanks.
> 
> 
> Ingo, Peter Z, HPA,
> 
> 
> Could this patch catch up with 3.18 merge window?

Please also add the real-hardware reports to the changelog.

Thanks,

	Ingo


* Re: [PATCH v5] x86, cpu-hotplug: fix llc shared map unreleased during cpu hotplug
  2014-09-24  7:52     ` Ingo Molnar
@ 2014-09-24  8:18       ` Wanpeng Li
  0 siblings, 0 replies; 9+ messages in thread
From: Wanpeng Li @ 2014-09-24  8:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Borislav Petkov, Wanpeng Li, Ingo Molnar, hpa, Peter Zijlstra,
	x86, Yasuaki Ishimatsu, David Rientjes, Prarit Bhargava,
	Steven Rostedt, Toshi Kani, linux-kernel


On 9/24/14, 3:52 PM, Ingo Molnar wrote:
> * Wanpeng Li <kernellwp@gmail.com> wrote:
>
>> On 14-9-23 5:37 PM, Borislav Petkov wrote:
>>> On Wed, Sep 17, 2014 at 03:17:52PM +0800, Wanpeng Li wrote:
>>>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
>>>> IP: [..] find_busiest_group
>>>> PGD 5a9d5067 PUD 13067 PMD 0
>>>> Oops: 0000 [#3] SMP
>>>> [...]
>>>> Call Trace:
>>>> load_balance
>>>> ? _raw_spin_unlock_irqrestore
>>>> idle_balance
>>>> __schedule
>>>> schedule
>>>> schedule_timeout
>>>> ? lock_timer_base
>>>> schedule_timeout_uninterruptible
>>>> msleep
>>>> lock_device_hotplug_sysfs
>>>> online_store
>>>> dev_attr_store
>>>> sysfs_write_file
>>>> vfs_write
>>>> SyS_write
>>>> system_call_fastpath
>>>>
>>>> This bug can be triggered by hot add and remove large number of xen
>>>> domain0's vcpus repeatedly.
>>>>
>>>> Last level cache shared map is built during cpu up and build sched domain
>>>> routine takes advantage of it to setup sched domain cpu topology, however,
>>>> llc shared map is unreleased during cpu disable which lead to invalid sched
>>>> domain cpu topology. This patch fix it by release llc shared map correctly
>>>> during cpu disable.
>>>>
>>>> Reviewed-by: Toshi Kani <toshi.kani@hp.com>
>>>> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>>>> Tested-by: Linn Crosetto <linn@hp.com>
>>>> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
>>> Reviewed-by: Borislav Petkov <bp@suse.de>
>>>
>> Thanks.
>>
>>
>> Ingo, Peter Z, HPA,
>>
>>
>> Could this patch catch up with 3.18 merge window?
> Please also add the real-hardware reports to the changelog.

I have just sent out the latest version.

Regards,
Wanpeng Li

>
> Thanks,
>
> 	Ingo

