linux-kernel.vger.kernel.org archive mirror
* [PATCH v2] perf/x86/intel/uncore: Remove hardcoded socket 0 in Haswell init code
@ 2017-01-05 15:09 Prarit Bhargava
  2017-01-09  6:43 ` [tip:perf/urgent] perf/x86/intel/uncore: Fix hardcoded socket 0 assumption in the " tip-bot for Prarit Bhargava
  2017-01-11 11:15 ` tip-bot for Prarit Bhargava
  0 siblings, 2 replies; 3+ messages in thread
From: Prarit Bhargava @ 2017-01-05 15:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Prarit Bhargava, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86, Peter Zijlstra, Kan Liang, Borislav Petkov, Harish Chegondi

In case anyone finds the actual panic interesting ...

BUG: unable to handle kernel paging request at 00000000006563a1
IP: [<ffffffff8101b582>] hswep_uncore_cpu_init+0x52/0xa0
PGD 0 [    2.313897]
Oops: 0000 [#1] SMP
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.9.0 #1
Hardware name: NEC Express5800/T120f [N8100-2285Y]/GA-7WESV-NJ, BIOS 5.0.4009 08/01/2016
task: ffff88002bdb8000 task.stack: ffffc90000014000
RIP: 0010:[<ffffffff8101b582>]  [<ffffffff8101b582>] hswep_uncore_cpu_init+0x52/0xa0
RSP: 0000:ffffc90000017db8  EFLAGS: 00010206
RAX: 0000000000656369 RBX: 0000000000000000 RCX: 0000000000001e03
RDX: ffff88002b224780 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffc90000017dc8 R08: 000000000001c880 R09: ffffffff813667e1
R10: ffff880030c1c880 R11: 0000000000000000 R12: 0000000000000000
R13: ffffffff81c1c090 R14: afafafafafafafaf R15: afafafafafafafaf
FS:  0000000000000000(0000) GS:ffff880030c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000006563a1 CR3: 000000002fc07000 CR4: 00000000001406b0
Stack:
 ffffc90000017dc8 00000000352bd002 ffffc90000017e00 ffffffff81da17f8
 0000000000000000 ffffffff81da16f9 00000000000000f0 afafafafafafafaf
 afafafafafafafaf ffffc90000017e78 ffffffff81002190 ffffc90000017e00
Call Trace:
 [<ffffffff81da17f8>] intel_uncore_init+0xff/0x2e6
 [<ffffffff81da16f9>] ? uncore_type_init+0x158/0x158
 [<ffffffff81002190>] do_one_initcall+0x50/0x190
 [<ffffffff810af27b>] ? parse_args+0x27b/0x460
 [<ffffffff81d9c357>] kernel_init_freeable+0x1a5/0x249
 [<ffffffff81d9ba27>] ? set_debug_rodata+0x12/0x12
 [<ffffffff81702010>] ? rest_init+0x80/0x80
 [<ffffffff8170201e>] kernel_init+0xe/0x110
 [<ffffffff8170f715>] ret_from_fork+0x25/0x30
Code: 1a d5 00 39 15 cc 1c c0 00 7e 06 89 15 c4 1c c0 00 48 98 48 8b 15 d7 c3 f7 00 48 8d 04 40 48 8d 04 c2 48 8b 40 10 48 85 c0 74 1b <8b> 70 38 48 8b 78 10 48 8d 4d f4 ba 94 00 00 00 e8 b9 db 38 00
RIP  [<ffffffff8101b582>] hswep_uncore_cpu_init+0x52/0xa0
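
As a quick sanity check on the trace above, the faulting bytes marked in the
Code: line, "<8b> 70 38", decode as a 32-bit load from 0x38(%rax) into %esi.
With the RAX value from the register dump, that should reproduce the address
in CR2. The sketch below only redoes that arithmetic in userspace; the
function and macro names are made up for illustration and are not kernel
symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the register dump in the oops. */
#define OOPS_RAX 0x0000000000656369ULL
#define OOPS_CR2 0x00000000006563a1ULL

/* Effective address of a [base + disp8] memory operand (disp8 is sign-extended). */
uint64_t effective_address(uint64_t base, int8_t disp8)
{
	return base + (uint64_t)(int64_t)disp8;
}

/*
 * "8b 70 38": opcode 8b (load into a 32-bit register), ModRM 70
 * (base %rax, destination %esi, 8-bit displacement), displacement 0x38.
 * Returns 1 when the decoded operand matches the reported fault address.
 */
int oops_fault_matches(void)
{
	return effective_address(OOPS_RAX, 0x38) == OOPS_CR2;
}
```

The match suggests that RAX held a bogus, non-NULL pointer value (it passed
the preceding "test %rax,%rax" check) rather than a valid device pointer.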

----8<----

On multi-socket Intel v3 processor systems (aka Haswell), the kdump kernel
can panic in hswep_uncore_cpu_init() when the original panic occurs on any
socket other than socket 0.  This has been possible since 9d85eb9119f4
("x86/smpboot: Make logical package management more robust") corrected the
physical ID to logical ID mapping of the threads.

hswep_uncore_cpu_init() is hard coded for physical socket 0 and if the
system is kdump'ing on any other socket the logical package value will be
incorrect.  The code should not use 0 as the physical ID, and should use
the boot cpu's logical package ID in this calculation.
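
To make the failure mode concrete, here is a minimal userspace sketch of a
physical-to-logical package table. It is not the kernel implementation: the
table size, NO_PKG and both function names are assumptions for illustration
only. It assumes logical IDs are handed out in bring-up order and that a
lookup for a package that was never brought up yields a negative value.

```c
#include <assert.h>

#define MAX_PKGS 4	/* hypothetical table size for this sketch */
#define NO_PKG   (-1)	/* hypothetical "no translation" marker */

/* phys_to_logical[p] is the logical id assigned to physical package p. */
static int phys_to_logical[MAX_PKGS] = { NO_PKG, NO_PKG, NO_PKG, NO_PKG };
static int next_logical;

/* Assign the next free logical id the first time a physical package is seen. */
int bring_up_pkg(int phys)
{
	if (phys < 0 || phys >= MAX_PKGS)
		return NO_PKG;
	if (phys_to_logical[phys] == NO_PKG)
		phys_to_logical[phys] = next_logical++;
	return phys_to_logical[phys];
}

/* Lookup only; returns NO_PKG for a package that was never brought up. */
int lookup_logical_pkg(int phys)
{
	if (phys < 0 || phys >= MAX_PKGS)
		return NO_PKG;
	return phys_to_logical[phys];
}
```

In this model, a kdump kernel that comes up only on physical package 1 calls
bring_up_pkg(1) and gets logical id 0, while lookup_logical_pkg(0) returns
NO_PKG. Using that result as an array index is invalid, whereas the logical
id recorded for the boot CPU is always a usable index.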

v2: switched to using boot_cpu_data.logical_proc_id

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Harish Chegondi <harish.chegondi@intel.com>
---
 arch/x86/events/intel/uncore_snbep.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c
index e6832be714bc..dae2fedc1601 100644
--- a/arch/x86/events/intel/uncore_snbep.c
+++ b/arch/x86/events/intel/uncore_snbep.c
@@ -2686,7 +2686,7 @@ static int hswep_pcu_hw_config(struct intel_uncore_box *box, struct perf_event *
 
 void hswep_uncore_cpu_init(void)
 {
-	int pkg = topology_phys_to_logical_pkg(0);
+	int pkg = boot_cpu_data.logical_proc_id;
 
 	if (hswep_uncore_cbox.num_boxes > boot_cpu_data.x86_max_cores)
 		hswep_uncore_cbox.num_boxes = boot_cpu_data.x86_max_cores;
-- 
1.7.9.3


* [tip:perf/urgent] perf/x86/intel/uncore: Fix hardcoded socket 0 assumption in the Haswell init code
  2017-01-05 15:09 [PATCH v2] perf/x86/intel/uncore: Remove hardcoded socket 0 in Haswell init code Prarit Bhargava
@ 2017-01-09  6:43 ` tip-bot for Prarit Bhargava
  2017-01-11 11:15 ` tip-bot for Prarit Bhargava
  1 sibling, 0 replies; 3+ messages in thread
From: tip-bot for Prarit Bhargava @ 2017-01-09  6:43 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: peterz, bp, jolsa, prarit, vincent.weaver, tglx, hpa, acme,
	kan.liang, eranian, alexander.shishkin, harish.chegondi,
	linux-kernel, mingo, torvalds

Commit-ID:  fa37361e291bfe528872b9aef5c8644a3fc7ff20
Gitweb:     http://git.kernel.org/tip/fa37361e291bfe528872b9aef5c8644a3fc7ff20
Author:     Prarit Bhargava <prarit@redhat.com>
AuthorDate: Thu, 5 Jan 2017 10:09:25 -0500
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 7 Jan 2017 08:54:38 +0100

perf/x86/intel/uncore: Fix hardcoded socket 0 assumption in the Haswell init code

On multi-socket Intel v3 processor systems (aka Haswell), kdump can crash
in hswep_uncore_cpu_init():

  BUG: unable to handle kernel paging request at 00000000006563a1
  IP: [<ffffffff8101b582>] hswep_uncore_cpu_init+0x52/0xa0

The crash was introduced by the following commit:

  9d85eb9119f4 ("x86/smpboot: Make logical package management more robust")

... which corrected the physical ID to logical ID mapping of the threads.

But hswep_uncore_cpu_init() is hard coded for physical socket 0, so if the
panic occurs on any other socket and the system kdumps there, the logical
package value will be incorrect, crashing the kdump kernel.

The code should not use 0 as the physical ID, and should use the boot
CPU's logical package ID in this calculation.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Harish Chegondi <harish.chegondi@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1483628965-2890-1-git-send-email-prarit@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/events/intel/uncore_snbep.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c
index e6832be..dae2fed 100644
--- a/arch/x86/events/intel/uncore_snbep.c
+++ b/arch/x86/events/intel/uncore_snbep.c
@@ -2686,7 +2686,7 @@ static struct intel_uncore_type *hswep_msr_uncores[] = {
 
 void hswep_uncore_cpu_init(void)
 {
-	int pkg = topology_phys_to_logical_pkg(0);
+	int pkg = boot_cpu_data.logical_proc_id;
 
 	if (hswep_uncore_cbox.num_boxes > boot_cpu_data.x86_max_cores)
 		hswep_uncore_cbox.num_boxes = boot_cpu_data.x86_max_cores;


* [tip:perf/urgent] perf/x86/intel/uncore: Fix hardcoded socket 0 assumption in the Haswell init code
  2017-01-05 15:09 [PATCH v2] perf/x86/intel/uncore: Remove hardcoded socket 0 in Haswell init code Prarit Bhargava
  2017-01-09  6:43 ` [tip:perf/urgent] perf/x86/intel/uncore: Fix hardcoded socket 0 assumption in the " tip-bot for Prarit Bhargava
@ 2017-01-11 11:15 ` tip-bot for Prarit Bhargava
  1 sibling, 0 replies; 3+ messages in thread
From: tip-bot for Prarit Bhargava @ 2017-01-11 11:15 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: bp, peterz, mingo, linux-kernel, hpa, eranian, kan.liang, prarit,
	torvalds, vincent.weaver, tglx, alexander.shishkin, jolsa, acme,
	harish.chegondi

Commit-ID:  6d6daa20945f3f598e56e18d1f926c08754f5801
Gitweb:     http://git.kernel.org/tip/6d6daa20945f3f598e56e18d1f926c08754f5801
Author:     Prarit Bhargava <prarit@redhat.com>
AuthorDate: Thu, 5 Jan 2017 10:09:25 -0500
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Wed, 11 Jan 2017 12:13:21 +0100

perf/x86/intel/uncore: Fix hardcoded socket 0 assumption in the Haswell init code

hswep_uncore_cpu_init() uses a hardcoded physical package id 0 for the boot
CPU. This works as long as the boot CPU is actually on physical package 0,
which is normally the case after power on / reboot.

But it fails with a NULL pointer dereference when a kdump kernel is started
on a secondary socket which has a different physical package id because the
logical package translation for physical package 0 does not exist.

Use the logical package id of the boot CPU instead of hard coded 0.

[ tglx: Rewrote changelog once more ]

Fixes: cf6d445f6897 ("perf/x86/uncore: Track packages, not per CPU data")
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Harish Chegondi <harish.chegondi@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1483628965-2890-1-git-send-email-prarit@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/events/intel/uncore_snbep.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c
index e6832be..dae2fed 100644
--- a/arch/x86/events/intel/uncore_snbep.c
+++ b/arch/x86/events/intel/uncore_snbep.c
@@ -2686,7 +2686,7 @@ static struct intel_uncore_type *hswep_msr_uncores[] = {
 
 void hswep_uncore_cpu_init(void)
 {
-	int pkg = topology_phys_to_logical_pkg(0);
+	int pkg = boot_cpu_data.logical_proc_id;
 
 	if (hswep_uncore_cbox.num_boxes > boot_cpu_data.x86_max_cores)
 		hswep_uncore_cbox.num_boxes = boot_cpu_data.x86_max_cores;
