From: Wanpeng Li <kernellwp@gmail.com>
To: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Wanpeng Li <wanpeng.li@linux.intel.com>,
	Ingo Molnar <mingo@redhat.com>,
	hpa@zytor.com, Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>,
	x86@kernel.org, Borislav Petkov <bp@alien8.de>,
	Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>,
	David Rientjes <rientjes@google.com>,
	Prarit Bhargava <prarit@redhat.com>,
	Steven Rostedt <srostedt@redhat.com>,
	Toshi Kani <toshi.kani@hp.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5] x86, cpu-hotplug: fix llc shared map unreleased during cpu hotplug
Date: Tue, 23 Sep 2014 14:36:07 +0800
Message-ID: <542114D7.3030605@gmail.com>
In-Reply-To: <5420FB25.8050102@jp.fujitsu.com>

Hi Kamezawa,
On 14-9-23 12:46 PM, Kamezawa Hiroyuki wrote:
> (2014/09/17 16:17), Wanpeng Li wrote:
>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
>> IP: [..] find_busiest_group
>> PGD 5a9d5067 PUD 13067 PMD 0
>> Oops: 0000 [#3] SMP
>> [...]
>> Call Trace:
>> load_balance
>> ? _raw_spin_unlock_irqrestore
>> idle_balance
>> __schedule
>> schedule
>> schedule_timeout
>> ? lock_timer_base
>> schedule_timeout_uninterruptible
>> msleep
>> lock_device_hotplug_sysfs
>> online_store
>> dev_attr_store
>> sysfs_write_file
>> vfs_write
>> SyS_write
>> system_call_fastpath
>>
>> This bug can be triggered by repeatedly hot-adding and hot-removing a
>> large number of Xen domain0 vcpus.
>>
>> The last level cache (llc) shared map is built during cpu up, and the build
>> sched domain routine uses it to set up the sched domain cpu topology.
>> However, the llc shared map is not released during cpu disable, which leads
>> to an invalid sched domain cpu topology. This patch fixes it by releasing
>> the llc shared map correctly during cpu disable.
>>
>> Reviewed-by: Toshi Kani <toshi.kani@hp.com>
>> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>> Tested-by: Linn Crosetto <linn@hp.com>
>> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
> Yasuaki reported this can happen on our real hardware. 
> https://lkml.org/lkml/2014/7/22/1018
>
> Our case is here.
> ==
> Here is an example on my system.
> My system has 4 sockets, each socket has 15 cores, and HT is enabled.
> In this case, the cores of each socket are numbered as follows:
>
>           | CPU#
> Socket#0 | 0-14 , 60-74
> Socket#1 | 15-29, 75-89
> Socket#2 | 30-44, 90-104
> Socket#3 | 45-59, 105-119
> Then the llc_shared_mask of CPU#30 is 0x3fff80000001fffc0000000.
> It means that the last level cache of Socket#2 is shared by
> CPU#30-44 and 90-104.
> After hot-removing socket#2 and #3, the cores of the remaining sockets
> are numbered as follows:
>
>           | CPU#
> Socket#0 | 0-14 , 60-74
> Socket#1 | 15-29, 75-89
> But llc_shared_mask is not cleared, so the llc_shared_mask of CPU#30
> still holds 0x3fff80000001fffc0000000.
> After that, when hot-adding socket#2 and #3 again, the cores of each
> socket are numbered as follows:
>
>           | CPU#
> Socket#0 | 0-14 , 60-74
> Socket#1 | 15-29, 75-89
> Socket#2 | 30-59
> Socket#3 | 90-119
> Then the llc_shared_mask of CPU#30 becomes 0x3fff8000fffffffc0000000.
> It means that the last level cache of Socket#2 is shared by CPU#30-59
> and 90-104, so the mask has a wrong value.
> At first, I cleared the hot-removed CPU's bit from llc_shared_map
> when hot-removing a CPU. But Borislav suggested that the problem would
> disappear if a re-added CPU is assigned the same CPU number, and that
> llc_shared_map must not be changed.
> ==
>
> So, please.
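
To make the sequence quoted above easier to follow, here is a minimal
userspace sketch that replays it for CPU#30. It is only a model: struct
mask128, mask_set_range(), mask_has() and replay_cpu30() are hypothetical
names, not kernel interfaces, and it assumes that bringing a CPU up only
ever sets bits in the mask. It works with CPU numbers rather than the hex
values printed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 128-bit CPU mask; the kernel's cpumask type is not used here. */
struct mask128 { uint64_t w[2]; };

/* Set every bit from first to last, inclusive. */
void mask_set_range(struct mask128 *m, int first, int last)
{
    assert(0 <= first && first <= last && last < 128);
    for (int cpu = first; cpu <= last; cpu++)
        m->w[cpu / 64] |= UINT64_C(1) << (cpu % 64);
}

/* Return true if cpu is set in the mask. */
bool mask_has(const struct mask128 *m, int cpu)
{
    return (m->w[cpu / 64] >> (cpu % 64)) & 1;
}

/* Replay the quoted sequence for the llc shared mask of CPU#30. */
struct mask128 replay_cpu30(void)
{
    struct mask128 m = { {0, 0} };

    /* Boot: socket#2 holds CPU#30-44 and 90-104. */
    mask_set_range(&m, 30, 44);
    mask_set_range(&m, 90, 104);

    /* Hot-remove of socket#2 and #3: nothing is cleared, m is unchanged. */

    /* Hot-add: socket#2 now holds CPU#30-59 (assumed to be ORed in). */
    mask_set_range(&m, 30, 59);
    return m;
}
```

After replay_cpu30(), mask_has() still reports CPU#90-104 as sharing a
cache with CPU#30, although those numbers now belong to socket#3.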

As I mentioned before, we still observe the call trace after applying
Yasuaki's patch.
https://lkml.org/lkml/2014/7/29/40

Actually, I would prefer to merge both patches: one to fix the llc shared
map not being released during hotplug, and the other to assign the same
CPU number to a re-added CPU.
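
For reference, the idea of releasing the llc shared map on cpu disable can
be sketched in userspace as below. This is not the patch itself, only a
model under stated assumptions: llc_map, map_set(), map_clear(), map_test(),
llc_link() and llc_release() are hypothetical names, and NR_CPUS is fixed
at 128 just to cover the 120-CPU example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NR_CPUS 128   /* assumption: enough for the 120-CPU example */

/* Hypothetical stand-in for the per-CPU llc shared map. */
static uint64_t llc_map[NR_CPUS][NR_CPUS / 64];

void map_set(uint64_t *m, int cpu)
{
    m[cpu / 64] |= UINT64_C(1) << (cpu % 64);
}

void map_clear(uint64_t *m, int cpu)
{
    m[cpu / 64] &= ~(UINT64_C(1) << (cpu % 64));
}

bool map_test(const uint64_t *m, int cpu)
{
    return (m[cpu / 64] >> (cpu % 64)) & 1;
}

/* CPU up: record that a and b share a last level cache (symmetric). */
void llc_link(int a, int b)
{
    assert(a >= 0 && a < NR_CPUS && b >= 0 && b < NR_CPUS);
    map_set(llc_map[a], b);
    map_set(llc_map[b], a);
}

/* CPU disable: drop cpu from every sibling's map, then empty its own map. */
void llc_release(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    for (int sibling = 0; sibling < NR_CPUS; sibling++)
        if (map_test(llc_map[cpu], sibling))
            map_clear(llc_map[sibling], cpu);
    memset(llc_map[cpu], 0, sizeof(llc_map[cpu]));
}
```

With this model, calling llc_release(90) after llc_link(30, 90) leaves no
bit for CPU#90 in the map of CPU#30, so a later cpu up starts from a clean
map instead of inheriting stale siblings.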

Regards,
Wanpeng Li

> Thanks,
> -Kame

