linux-kernel.vger.kernel.org archive mirror
* [PATCH] x86,mm: print likely CPU at segfault time
@ 2021-07-19 19:00 Rik van Riel
  2021-07-19 19:20 ` Dave Hansen
  2021-07-21 20:36 ` Thomas Gleixner
  0 siblings, 2 replies; 8+ messages in thread
From: Rik van Riel @ 2021-07-19 19:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Hansen, Andy Lutomirski, kernel-team, Peter Zijlstra,
	Ingo Molnar, Borislav Petkov, x86

From 14d31a44a5186c94399dc9518ba80adf64c99772 Mon Sep 17 00:00:00 2001
From: Rik van Riel <riel@butters.home.surriel.com>
Date: Mon, 19 Jul 2021 14:49:17 -0400
Subject: [PATCH] x86,mm: print likely CPU at segfault time

In a large enough fleet of computers, it is common to have a few bad
CPUs. Those can often be identified by seeing that some commonly run
kernel code (that runs fine everywhere else) keeps crashing on the
same CPU core on a particular bad system.

One of the observed failure modes is that either the instruction pointer,
or some register used to specify the address of data to be fetched, gets
corrupted, resulting in something like a kernel page fault, NULL pointer
dereference, NX violation, or similar.

Those kernel failures are often preceded by similar-looking userspace
failures. It would be useful to know if those are also happening on
the same CPU cores, to get a little more confirmation that it is indeed
a hardware issue.

Adding a printk to show_signal_msg() achieves that purpose. It isn't
perfect since the task might get rescheduled on another CPU between
when the fault hit and when the message is printed, but it should be
good enough to show correlation between userspace and kernel errors
when dealing with a bad CPU.

$ ./segfault
Segmentation fault (core dumped)
$ dmesg | grep segfault
segfault[1349]: segfault at 0 ip 000000000040113a sp 00007ffc6d32e360 error 4 in segfault[401000+1000] on CPU 0
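
A minimal reproducer along these lines (illustrative only, not the
actual test program) is just a read through a NULL pointer:

	/* segfault.c -- hypothetical test; build with: gcc -o segfault segfault.c */
	int main(void)
	{
		/* volatile keeps the compiler from optimizing the load away */
		volatile int *p = NULL;

		/* user-mode read of an unmapped page: the "error 4" above */
		return *p;
	}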

Signed-off-by: Rik van Riel <riel@surriel.com>
---
 arch/x86/mm/fault.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index b2eefdefc108..dd6c89c23a3a 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -777,6 +777,8 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code,
 
 	print_vma_addr(KERN_CONT " in ", regs->ip);
 
+	printk(KERN_CONT " on CPU %d", raw_smp_processor_id());
+
 	printk(KERN_CONT "\n");
 
 	show_opcodes(regs, loglvl);
-- 
2.24.1




* Re: [PATCH] x86,mm: print likely CPU at segfault time
  2021-07-19 19:00 [PATCH] x86,mm: print likely CPU at segfault time Rik van Riel
@ 2021-07-19 19:20 ` Dave Hansen
  2021-07-19 19:34   ` Rik van Riel
  2021-07-21 20:36 ` Thomas Gleixner
  1 sibling, 1 reply; 8+ messages in thread
From: Dave Hansen @ 2021-07-19 19:20 UTC (permalink / raw)
  To: Rik van Riel, linux-kernel
  Cc: Dave Hansen, Andy Lutomirski, kernel-team, Peter Zijlstra,
	Ingo Molnar, Borislav Petkov, x86

On 7/19/21 12:00 PM, Rik van Riel wrote:
> In a large enough fleet of computers, it is common to have a few bad
> CPUs. Those can often be identified by seeing that some commonly run
> kernel code (that runs fine everywhere else) keeps crashing on the
> same CPU core on a particular bad system.

I've encountered a few of these kinds of things over the years.  This is
*definitely* useful.  What you've proposed here is surely the simplest
thing we could print and probably also offers the best bang for our buck.

The only other thing I thought of is that it might be nice to print out
the core id instead of the CPU id.  If there are hardware issues with a
CPU, they're likely to affect both threads.  Seeing two different "CPUs"
in an SMT environment might tempt some folks to think it's not a
core-level hardware issue.

If it's as trivial as:

	printk(KERN_CONT " on cpu/core %d/%d",
		raw_smp_processor_id(),
		topology_core_id(raw_smp_processor_id()));

it would be handy.  But, it's also not hard to look at 10 segfaults, see
that they happened only on 2 CPUs and realize that hyperthreading is
enabled.
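
As an aside, whether two logical CPUs share a core is easy to check from
userspace via sysfs; a small sketch (purely illustrative, not part of the
patch) that prints the core id of each CPU named on the command line:

	/* coreof.c -- hypothetical helper: "./coreof 0 4" */
	#include <stdio.h>

	int main(int argc, char **argv)
	{
		char path[96];
		FILE *f;
		int i, core;

		for (i = 1; i < argc; i++) {
			snprintf(path, sizeof(path),
				 "/sys/devices/system/cpu/cpu%s/topology/core_id",
				 argv[i]);
			f = fopen(path, "r");
			if (!f || fscanf(f, "%d", &core) != 1)
				return 1;
			printf("cpu%s -> core %d\n", argv[i], core);
			fclose(f);
		}
		return 0;
	}

Two logical CPUs on the same socket that report the same core id are
hyperthread siblings.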

Either way, this patch moves things in the right direction, so:

Acked-by: Dave Hansen <dave.hansen@linux.intel.com>


* Re: [PATCH] x86,mm: print likely CPU at segfault time
  2021-07-19 19:20 ` Dave Hansen
@ 2021-07-19 19:34   ` Rik van Riel
  2021-07-21 20:38     ` Thomas Gleixner
  0 siblings, 1 reply; 8+ messages in thread
From: Rik van Riel @ 2021-07-19 19:34 UTC (permalink / raw)
  To: Dave Hansen, linux-kernel
  Cc: Dave Hansen, Andy Lutomirski, kernel-team, Peter Zijlstra,
	Ingo Molnar, Borislav Petkov, x86


On Mon, 2021-07-19 at 12:20 -0700, Dave Hansen wrote:

> If it's as trivial as:
> 
>         printk(KERN_CONT " on cpu/core %d/%d",
>                 raw_smp_processor_id(),
>                 topology_core_id(raw_smp_processor_id()));
> 
> it would be handy.  But, it's also not hard to look at 10 segfaults,
> see that they happened only on 2 CPUs and realize that hyperthreading
> is enabled.

One problem with topology_core_id() is that, on a
multi-socket system, the core number may not be unique:
each package typically numbers its cores from 0, so
socket 0 and socket 1 can each have a core 0.

That is why I ended up going with just the CPU number.
It's pretty easy to put two and two together afterwards.

Thanks for your quick patch review.

-- 
All Rights Reversed.



* Re: [PATCH] x86,mm: print likely CPU at segfault time
  2021-07-19 19:00 [PATCH] x86,mm: print likely CPU at segfault time Rik van Riel
  2021-07-19 19:20 ` Dave Hansen
@ 2021-07-21 20:36 ` Thomas Gleixner
  2021-07-24  1:38   ` Rik van Riel
  1 sibling, 1 reply; 8+ messages in thread
From: Thomas Gleixner @ 2021-07-21 20:36 UTC (permalink / raw)
  To: Rik van Riel, linux-kernel
  Cc: Dave Hansen, Andy Lutomirski, kernel-team, Peter Zijlstra,
	Ingo Molnar, Borislav Petkov, x86

Rik,

On Mon, Jul 19 2021 at 15:00, Rik van Riel wrote:
>
> Adding a printk to show_signal_msg() achieves that purpose. It isn't
> perfect since the task might get rescheduled on another CPU between
> when the fault hit and when the message is printed, but it should be
> good enough to show correlation between userspace and kernel errors
> when dealing with a bad CPU.

we could collect the cpu number in do_*_addr_fault() before interrupts
are enabled and just hand it through. There are only a few callchains
which end up in __bad_area_nosemaphore().
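
A rough sketch of what that could look like (an illustration of the
suggestion, not code from this thread; the extra parameter is
hypothetical):

	/* In do_user_addr_fault(), while interrupts are still off: */
	int fault_cpu = raw_smp_processor_id();	/* exact, not just likely */

	/* ...handed through the few call chains that reach the error path: */
	__bad_area_nosemaphore(regs, error_code, address, pkey, si_code,
			       fault_cpu);	/* hypothetical extra argument */

	/* ...so that show_signal_msg() prints the captured value: */
	printk(KERN_CONT " on CPU %d", fault_cpu);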

Thanks,

        tglx


* Re: [PATCH] x86,mm: print likely CPU at segfault time
  2021-07-19 19:34   ` Rik van Riel
@ 2021-07-21 20:38     ` Thomas Gleixner
  0 siblings, 0 replies; 8+ messages in thread
From: Thomas Gleixner @ 2021-07-21 20:38 UTC (permalink / raw)
  To: Rik van Riel, Dave Hansen, linux-kernel
  Cc: Dave Hansen, Andy Lutomirski, kernel-team, Peter Zijlstra,
	Ingo Molnar, Borislav Petkov, x86

On Mon, Jul 19 2021 at 15:34, Rik van Riel wrote:

> On Mon, 2021-07-19 at 12:20 -0700, Dave Hansen wrote:
>
>> If it's as trivial as:
>> 
>>         printk(KERN_CONT " on cpu/core %d/%d",
>>                 raw_smp_processor_id(),
>>                 topology_core_id(raw_smp_processor_id()));
>> 
>> it would be handy.  But, it's also not hard to look at 10 segfaults,
>> see that they happened only on 2 CPUs and realize that hyperthreading
>> is enabled.
>
> One problem with topology_core_id() is that, on a
> multi-socket system, the core number may not be unique.

Just add topology_physical_package_id() and you have a complete picture.
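
Roughly, combining all three (a sketch; essentially what the follow-up
patch below ends up doing):

	int cpu = raw_smp_processor_id();

	printk(KERN_CONT " on CPU %d (core %d, socket %d)", cpu,
	       topology_core_id(cpu),
	       topology_physical_package_id(cpu));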

Thanks,

        tglx


* Re: [PATCH] x86,mm: print likely CPU at segfault time
  2021-07-21 20:36 ` Thomas Gleixner
@ 2021-07-24  1:38   ` Rik van Riel
  0 siblings, 0 replies; 8+ messages in thread
From: Rik van Riel @ 2021-07-24  1:38 UTC (permalink / raw)
  To: Thomas Gleixner, linux-kernel
  Cc: Dave Hansen, Andy Lutomirski, kernel-team, Peter Zijlstra,
	Ingo Molnar, Borislav Petkov, x86


On Wed, 2021-07-21 at 22:36 +0200, Thomas Gleixner wrote:
> Rik,
> 
> On Mon, Jul 19 2021 at 15:00, Rik van Riel wrote:
> > 
> > Adding a printk to show_signal_msg() achieves that purpose. It isn't
> > perfect since the task might get rescheduled on another CPU between
> > when the fault hit and when the message is printed, but it should be
> > good enough to show correlation between userspace and kernel errors
> > when dealing with a bad CPU.
> 
> we could collect the cpu number in do_*_addr_fault() before interrupts
> are enabled and just hand it through. There are only a few callchains
> which end up in __bad_area_nosemaphore().

We could, but do we really want to add that to the hot path
for page faults, when segfaults are so rare?

I suspect the simple patch I sent will be good enough to
identify a bad CPU, even if only 3 out of 4 userspace crashes
get attributed to the right CPU...

I would be happy to write a patch that does what you want
though, so you can compare them side by side :)

-- 
All Rights Reversed.



* Re: [PATCH] x86,mm: print likely CPU at segfault time
  2022-08-02 20:09 Rik van Riel
@ 2022-08-03 14:49 ` Dave Hansen
  0 siblings, 0 replies; 8+ messages in thread
From: Dave Hansen @ 2022-08-03 14:49 UTC (permalink / raw)
  To: Rik van Riel, linux-kernel
  Cc: x86, kernel-team, Dave Hansen, Thomas Gleixner, Dave Jones,
	Andy Lutomirski

On 8/2/22 13:09, Rik van Riel wrote:
> Add a printk to show_signal_msg() to print the CPU, core, and socket

Nit:     ^ printk(), please

> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -782,6 +782,12 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code,
>  
>  	print_vma_addr(KERN_CONT " in ", regs->ip);
>  
> +	printk(KERN_CONT " on CPU %d (core %d, socket %d)",
> +	       raw_smp_processor_id(),
> +	       topology_core_id(raw_smp_processor_id()),
> +	       topology_physical_package_id(raw_smp_processor_id()));

This seems totally sane to me.  I have found myself looking through
customer-provided *oopses* more than once trying to figure out if the
same CPU cores were at fault.  This extends that to userspace crashes
too.  I've also found myself trying to map back from logical CPU numbers
to core and package.

One nit: Preempt is enabled here, right?  I understand that this thing
is fundamentally racy, but if we did:

	int cpu = raw_smp_processor_id();

it would make it internally *consistent*.  Without that, we could
theoretically get three different raw_smp_processor_id()'s.  It might
even make the code look a wee bit nicer.

The changelog here is great, but a couple of comments would also be nice:

	/* This is a racy snapshot, but is better than nothing: */
	int cpu = raw_smp_processor_id();
...
	/*
	 * Dump the likely CPU where the fatal segfault happened.  This
	 * can help identify buggy pieces of hardware.
	 */
	printk(KERN_CONT " on CPU %d (core %d, socket %d)", cpu,
	       topology_core_id(cpu),
	       topology_physical_package_id(cpu));

If you want to wait a bit and see if you get any other comments, this
seems like something we can suck in after the merge window.


* [PATCH] x86,mm: print likely CPU at segfault time
@ 2022-08-02 20:09 Rik van Riel
  2022-08-03 14:49 ` Dave Hansen
  0 siblings, 1 reply; 8+ messages in thread
From: Rik van Riel @ 2022-08-02 20:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, kernel-team, Dave Hansen, Thomas Gleixner, Dave Jones,
	Andy Lutomirski

In a large enough fleet of computers, it is common to have a few bad CPUs.
Those can often be identified by seeing that some commonly run kernel code,
which runs fine everywhere else, keeps crashing on the same CPU core on one
particular bad system.

However, the failure modes in CPUs that have gone bad over the years are
often oddly specific, and the only bad behavior seen might be segfaults
in programs like bash, python, or various system daemons that run fine
everywhere else.

Add a printk to show_signal_msg() to print the CPU, core, and socket
at segfault time. This is not perfect, since the task might get rescheduled
on another CPU between when the fault hit and when the message is printed,
but in practice this has been good enough to help us identify several bad
CPU cores.

segfault[1349]: segfault at 0 ip 000000000040113a sp 00007ffc6d32e360 error 4 in segfault[401000+1000] on CPU 0 (core 0, socket 0)

Signed-off-by: Rik van Riel <riel@surriel.com>
CC: Dave Jones <dsj@fb.com>
---
 arch/x86/mm/fault.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index fad8faa29d04..47faf7c0041e 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -782,6 +782,12 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code,
 
 	print_vma_addr(KERN_CONT " in ", regs->ip);
 
+	printk(KERN_CONT " on CPU %d (core %d, socket %d)",
+	       raw_smp_processor_id(),
+	       topology_core_id(raw_smp_processor_id()),
+	       topology_physical_package_id(raw_smp_processor_id()));
+
+
 	printk(KERN_CONT "\n");
 
 	show_opcodes(regs, loglvl);
-- 
2.37.1



