* [PATCH 0/3] Print CPU at segfault time
@ 2022-08-11  2:49 ira.weiny
  2022-08-11  2:49 ` [PATCH 1/3] x86,mm: print likely " ira.weiny
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: ira.weiny @ 2022-08-11  2:49 UTC (permalink / raw)
  To: Rik van Riel, Borislav Petkov, Dave Hansen
  Cc: Ira Weiny, x86, linux-kernel, kernel-team

From: Ira Weiny <ira.weiny@intel.com>

Changes from RFC:
	Drop patch 1 as I misunderstood the code and it is not needed for this
		use case
	Combine patches 2 and 5 into a new patch 3 which stores the CPU only
		for page faults

Rik reported that knowing which CPUs are seeing faults can help in
determining which CPUs are failing in a large data center.[1]

Storing the CPU at exception entry time allows this print to report the CPU
which actually took the exception.  This may still not be the CPU which is
failing, but it should be closer.

Dave and Boris recognized that the auxiliary pt_regs work I did for the PKS
series could help to store this value and avoid passing the CPU throughout the
fault handler call stack.

The patches I sent for RFC had a lot more overhead than is needed to report the
CPU in a page fault.[2]  After those discussions it became clear that a generic
save/restore of the auxiliary pt_regs is overkill for the current use case.
Skip that overhead and only store the CPU in the page fault path.

[1] https://lore.kernel.org/all/20220805101644.2e674553@imladris.surriel.com/
[2] https://lore.kernel.org/lkml/20220805173009.3128098-1-ira.weiny@intel.com/

Ira Weiny (2):
  x86/entry: Add auxiliary pt_regs space
  x86/mm: Store CPU info on exception entry

Rik van Riel (1):
  x86,mm: print likely CPU at segfault time

 arch/x86/Kconfig                 |  4 ++++
 arch/x86/entry/calling.h         | 19 +++++++++++++++++++
 arch/x86/entry/entry_64.S        | 22 ++++++++++++++++++++++
 arch/x86/entry/entry_64_compat.S |  6 ++++++
 arch/x86/include/asm/ptrace.h    | 19 +++++++++++++++++++
 arch/x86/kernel/asm-offsets_64.c | 15 +++++++++++++++
 arch/x86/kernel/head_64.S        |  6 ++++++
 arch/x86/mm/fault.c              | 18 ++++++++++++++++++
 8 files changed, 109 insertions(+)


base-commit: d4252071b97d2027d246f6a82cbee4d52f618b47
-- 
2.35.3



* [PATCH 1/3] x86,mm: print likely CPU at segfault time
  2022-08-11  2:49 [PATCH 0/3] Print CPU at segfault time ira.weiny
@ 2022-08-11  2:49 ` ira.weiny
  2022-08-11  2:49 ` [PATCH 2/3] x86/entry: Add auxiliary pt_regs space ira.weiny
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: ira.weiny @ 2022-08-11  2:49 UTC (permalink / raw)
  To: Rik van Riel, Borislav Petkov, Dave Hansen
  Cc: Ira Weiny, Dave Jones, x86, linux-kernel, kernel-team

From: Rik van Riel <riel@surriel.com>

In a large enough fleet of computers, it is common to have a few bad CPUs.
Those can often be identified by seeing that some commonly run kernel code,
which runs fine everywhere else, keeps crashing on the same CPU core on one
particular bad system.

However, the failure modes in CPUs that have gone bad over the years are
often oddly specific, and the only bad behavior seen might be segfaults
in programs like bash, python, or various system daemons that run fine
everywhere else.

Add a printk() to show_signal_msg() to print the CPU, core, and socket
at segfault time. This is not perfect, since the task might get rescheduled
on another CPU between when the fault hit and when the message is printed,
but in practice this has been good enough to help us identify several bad
CPU cores.

segfault[1349]: segfault at 0 ip 000000000040113a sp 00007ffc6d32e360 error 4 in segfault[401000+1000] on CPU 0 (core 0, socket 0)

This printk can be controlled through the /proc/sys/debug/exception-trace
sysctl.

Tested-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Rik van Riel <riel@surriel.com>
CC: Dave Jones <dsj@fb.com>
---
 arch/x86/mm/fault.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index fa71a5d12e87..dbc6a2e08a96 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -769,6 +769,8 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code,
 		unsigned long address, struct task_struct *tsk)
 {
 	const char *loglvl = task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG;
+	/* This is a racy snapshot, but it's better than nothing. */
+	int cpu = raw_smp_processor_id();
 
 	if (!unhandled_signal(tsk, SIGSEGV))
 		return;
@@ -782,6 +784,14 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code,
 
 	print_vma_addr(KERN_CONT " in ", regs->ip);
 
+	/*
+	 * Dump the likely CPU where the fatal segfault happened.
+	 * This can help identify faulty hardware.
+	 */
+	printk(KERN_CONT " on CPU %d (core %d, socket %d)", cpu,
+	       topology_core_id(cpu), topology_physical_package_id(cpu));
+
+
 	printk(KERN_CONT "\n");
 
 	show_opcodes(regs, loglvl);
-- 
2.35.3



* [PATCH 2/3] x86/entry: Add auxiliary pt_regs space
  2022-08-11  2:49 [PATCH 0/3] Print CPU at segfault time ira.weiny
  2022-08-11  2:49 ` [PATCH 1/3] x86,mm: print likely " ira.weiny
@ 2022-08-11  2:49 ` ira.weiny
  2022-08-11  2:49 ` [PATCH 3/3] x86/mm: Store CPU info on exception entry ira.weiny
  2022-09-02  3:15 ` [PATCH 0/3] Print CPU at segfault time Ira Weiny
  3 siblings, 0 replies; 5+ messages in thread
From: ira.weiny @ 2022-08-11  2:49 UTC (permalink / raw)
  To: Rik van Riel, Borislav Petkov, Dave Hansen
  Cc: Ira Weiny, Dave Hansen, Dan Williams, Peter Zijlstra,
	Thomas Gleixner, Andy Lutomirski, x86, linux-kernel, kernel-team

From: Ira Weiny <ira.weiny@intel.com>

Rik van Riel reports that knowledge of where a fault hits is valuable in
detecting CPU failures in large data centers.[0]  Auxiliary pt_regs space
is a useful place to store the CPU, and it avoids passing additional data
through the exception call stacks.

Two possible places for preserving this state were originally
considered, irqentry_state_t or pt_regs.[1]  pt_regs was much more
complicated and was potentially fraught with unintended consequences.[2]
However, Andy Lutomirski came up with a way to hide additional values on
the stack which could be accessed as "extended_pt_regs".[3]  This method
allows any function that already has access to pt_regs to reach the extra
information without expanding the use of irqentry_state_t, while leaving
pt_regs intact for compatibility with outside tools like BPF.
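
As a rough sketch of the idea (the real definitions appear in the diff
below), the auxiliary space sits immediately below pt_regs on the stack,
so container_of() can recover it from any pt_regs pointer:

	struct pt_regs_extended {
		struct pt_regs_auxiliary aux;
		struct pt_regs pt_regs __aligned(8);
	};

	static inline struct pt_regs_extended *
	to_extended_pt_regs(struct pt_regs *regs)
	{
		return container_of(regs, struct pt_regs_extended, pt_regs);
	}

	/* Any C code holding a pt_regs pointer can then do: */
	struct pt_regs_auxiliary *aux = &to_extended_pt_regs(regs)->aux;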

Prepare the assembly code to add a hidden auxiliary pt_regs space.  To
keep things simple, the assembly code only adds space on the stack as
defined by the C code which needs it.  The use of this space is left to
the C code, which must select ARCH_HAS_PTREGS_AUXILIARY to enable this
support.

Each nested exception gets another copy of this auxiliary space, allowing
for any number of levels of exception handling.
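
For example, with one exception nested inside another, the stack ends up
looking roughly like this (a sketch in the style of the NMI stack diagram
further down; the stack grows downward):

	+---------------------------------+
	| pt_regs (outer exception)       |
	+---------------------------------+
	| pt_regs_auxiliary (outer copy)  |
	+---------------------------------+
	| outer handler stack usage       |
	+---------------------------------+
	| pt_regs (nested exception)      |
	+---------------------------------+
	| pt_regs_auxiliary (nested copy) |
	+---------------------------------+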

Initially the space is left empty, which results in no code changes
because ARCH_HAS_PTREGS_AUXILIARY is not set.  Subsequent patches adding
data to pt_regs_auxiliary must select ARCH_HAS_PTREGS_AUXILIARY or a build
failure will occur.  Gating on ARCH_HAS_PTREGS_AUXILIARY also avoids
introducing 2 instructions (addq/subq) on every entry call when the extra
space is not needed.

32-bit is specifically excluded.

Peter, Thomas, Andy, Dave, and Dan all suggested parts of this patch or
aided in its development.

[0] https://lore.kernel.org/all/20220805101644.2e674553@imladris.surriel.com/
[1] https://lore.kernel.org/lkml/CALCETrVe1i5JdyzD_BcctxQJn+ZE3T38EFPgjxN1F577M36g+w@mail.gmail.com/
[2] https://lore.kernel.org/lkml/874kpxx4jf.fsf@nanos.tec.linutronix.de/#t
[3] https://lore.kernel.org/lkml/CALCETrUHwZPic89oExMMe-WyDY8-O3W68NcZvse3=PGW+iW5=w@mail.gmail.com/

Cc: Rik van Riel <riel@surriel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Forward port from PKS series
	https://lore.kernel.org/lkml/20220419170649.1022246-18-ira.weiny@intel.com/
---
 arch/x86/Kconfig                 |  4 ++++
 arch/x86/entry/calling.h         | 19 +++++++++++++++++++
 arch/x86/entry/entry_64.S        | 22 ++++++++++++++++++++++
 arch/x86/entry/entry_64_compat.S |  6 ++++++
 arch/x86/include/asm/ptrace.h    | 18 ++++++++++++++++++
 arch/x86/kernel/asm-offsets_64.c | 15 +++++++++++++++
 arch/x86/kernel/head_64.S        |  6 ++++++
 7 files changed, 90 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f9920f1341c8..5a0b6ee49cb4 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1872,6 +1872,10 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS
 
 	  If unsure, say y.
 
+config ARCH_HAS_PTREGS_AUXILIARY
+	depends on X86_64
+	bool
+
 choice
 	prompt "TSX enable mode"
 	depends on CPU_SUP_INTEL
diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index f6907627172b..b7515f8b0092 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -65,6 +65,25 @@ For 32-bit we have the following conventions - kernel is built with
  * for assembly code:
  */
 
+#ifdef CONFIG_ARCH_HAS_PTREGS_AUXILIARY
+
+.macro PUSH_PTREGS_AUXILIARY
+	/* add space for pt_regs_auxiliary */
+	subq $PTREGS_AUX_SIZE, %rsp
+.endm
+
+.macro POP_PTREGS_AUXILIARY
+	/* remove space for pt_regs_auxiliary */
+	addq $PTREGS_AUX_SIZE, %rsp
+.endm
+
+#else
+
+#define PUSH_PTREGS_AUXILIARY
+#define POP_PTREGS_AUXILIARY
+
+#endif
+
 .macro PUSH_REGS rdx=%rdx rcx=%rcx rax=%rax save_ret=0
 	.if \save_ret
 	pushq	%rsi		/* pt_regs->si */
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 9953d966d124..4f9f7f5cb563 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -362,7 +362,9 @@ SYM_CODE_END(xen_error_entry)
 		movq	$-1, ORIG_RAX(%rsp)	/* no syscall to restart */
 	.endif
 
+	PUSH_PTREGS_AUXILIARY
 	call	\cfunc
+	POP_PTREGS_AUXILIARY
 
 	/* For some configurations \cfunc ends up being a noreturn. */
 	REACHABLE
@@ -472,7 +474,9 @@ SYM_CODE_START(\asmsym)
 
 	movq	%rsp, %rdi		/* pt_regs pointer */
 
+	PUSH_PTREGS_AUXILIARY
 	call	\cfunc
+	POP_PTREGS_AUXILIARY
 
 	jmp	paranoid_exit
 
@@ -535,7 +539,9 @@ SYM_CODE_START(\asmsym)
 	 * stack.
 	 */
 	movq	%rsp, %rdi		/* pt_regs pointer */
+	PUSH_PTREGS_AUXILIARY
 	call	vc_switch_off_ist
+	POP_PTREGS_AUXILIARY
 	movq	%rax, %rsp		/* Switch to new stack */
 
 	ENCODE_FRAME_POINTER
@@ -547,7 +553,9 @@ SYM_CODE_START(\asmsym)
 
 	movq	%rsp, %rdi		/* pt_regs pointer */
 
+	PUSH_PTREGS_AUXILIARY
 	call	kernel_\cfunc
+	POP_PTREGS_AUXILIARY
 
 	/*
 	 * No need to switch back to the IST stack. The current stack is either
@@ -584,7 +592,9 @@ SYM_CODE_START(\asmsym)
 	movq	%rsp, %rdi		/* pt_regs pointer into first argument */
 	movq	ORIG_RAX(%rsp), %rsi	/* get error code into 2nd argument*/
 	movq	$-1, ORIG_RAX(%rsp)	/* no syscall to restart */
+	PUSH_PTREGS_AUXILIARY
 	call	\cfunc
+	POP_PTREGS_AUXILIARY
 
 	/* For some configurations \cfunc ends up being a noreturn. */
 	REACHABLE
@@ -838,7 +848,9 @@ SYM_CODE_START_LOCAL(exc_xen_hypervisor_callback)
 	movq	%rdi, %rsp			/* we don't return, adjust the stack frame */
 	UNWIND_HINT_REGS
 
+	PUSH_PTREGS_AUXILIARY
 	call	xen_pv_evtchn_do_upcall
+	POP_PTREGS_AUXILIARY
 
 	jmp	error_return
 SYM_CODE_END(exc_xen_hypervisor_callback)
@@ -1062,7 +1074,9 @@ SYM_CODE_START_LOCAL(error_entry)
 .Lerror_entry_from_usermode_after_swapgs:
 
 	/* Put us onto the real thread stack. */
+	PUSH_PTREGS_AUXILIARY
 	call	sync_regs
+	POP_PTREGS_AUXILIARY
 	RET
 
 	/*
@@ -1119,7 +1133,9 @@ SYM_CODE_START_LOCAL(error_entry)
 	 * as if we faulted immediately after IRET.
 	 */
 	leaq	8(%rsp), %rdi			/* arg0 = pt_regs pointer */
+	PUSH_PTREGS_AUXILIARY
 	call	fixup_bad_iret
+	POP_PTREGS_AUXILIARY
 	mov	%rax, %rdi
 	jmp	.Lerror_entry_from_usermode_after_swapgs
 SYM_CODE_END(error_entry)
@@ -1229,7 +1245,9 @@ SYM_CODE_START(asm_exc_nmi)
 
 	movq	%rsp, %rdi
 	movq	$-1, %rsi
+	PUSH_PTREGS_AUXILIARY
 	call	exc_nmi
+	POP_PTREGS_AUXILIARY
 
 	/*
 	 * Return back to user mode.  We must *not* do the normal exit
@@ -1265,6 +1283,8 @@ SYM_CODE_START(asm_exc_nmi)
 	 * +---------------------------------------------------------+
 	 * | pt_regs                                                 |
 	 * +---------------------------------------------------------+
+	 * | (Optionally) pt_regs_auxiliary                          |
+	 * +---------------------------------------------------------+
 	 *
 	 * The "original" frame is used by hardware.  Before re-enabling
 	 * NMIs, we need to be done with it, and we need to leave enough
@@ -1443,7 +1463,9 @@ end_repeat_nmi:
 
 	movq	%rsp, %rdi
 	movq	$-1, %rsi
+	PUSH_PTREGS_AUXILIARY
 	call	exc_nmi
+	POP_PTREGS_AUXILIARY
 
 	/* Always restore stashed SPEC_CTRL value (see paranoid_entry) */
 	IBRS_EXIT save_reg=%r15
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 682338e7e2a3..7f1e670f7b06 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -117,7 +117,9 @@ SYM_INNER_LABEL(entry_SYSENTER_compat_after_hwframe, SYM_L_GLOBAL)
 .Lsysenter_flags_fixed:
 
 	movq	%rsp, %rdi
+	PUSH_PTREGS_AUXILIARY
 	call	do_SYSENTER_32
+	POP_PTREGS_AUXILIARY
 	/* XEN PV guests always use IRET path */
 	ALTERNATIVE "testl %eax, %eax; jz swapgs_restore_regs_and_return_to_usermode", \
 		    "jmp swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV
@@ -212,7 +214,9 @@ SYM_INNER_LABEL(entry_SYSCALL_compat_after_hwframe, SYM_L_GLOBAL)
 	UNTRAIN_RET
 
 	movq	%rsp, %rdi
+	PUSH_PTREGS_AUXILIARY
 	call	do_fast_syscall_32
+	POP_PTREGS_AUXILIARY
 	/* XEN PV guests always use IRET path */
 	ALTERNATIVE "testl %eax, %eax; jz swapgs_restore_regs_and_return_to_usermode", \
 		    "jmp swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV
@@ -351,6 +355,8 @@ SYM_CODE_START(entry_INT80_compat)
 	UNTRAIN_RET
 
 	movq	%rsp, %rdi
+	PUSH_PTREGS_AUXILIARY
 	call	do_int80_syscall_32
+	POP_PTREGS_AUXILIARY
 	jmp	swapgs_restore_regs_and_return_to_usermode
 SYM_CODE_END(entry_INT80_compat)
diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index f4db78b09c8f..5a9c85893459 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -2,6 +2,7 @@
 #ifndef _ASM_X86_PTRACE_H
 #define _ASM_X86_PTRACE_H
 
+#include <linux/container_of.h>
 #include <asm/segment.h>
 #include <asm/page_types.h>
 #include <uapi/asm/ptrace.h>
@@ -91,6 +92,23 @@ struct pt_regs {
 /* top of stack page */
 };
 
+/*
+ * NOTE: Features which add data to pt_regs_auxiliary must select
+ * ARCH_HAS_PTREGS_AUXILIARY.  Failure to do so will result in a build failure.
+ */
+struct pt_regs_auxiliary {
+};
+
+struct pt_regs_extended {
+	struct pt_regs_auxiliary aux;
+	struct pt_regs pt_regs __aligned(8);
+};
+
+static inline struct pt_regs_extended *to_extended_pt_regs(struct pt_regs *regs)
+{
+	return container_of(regs, struct pt_regs_extended, pt_regs);
+}
+
 #endif /* !__i386__ */
 
 #ifdef CONFIG_PARAVIRT
diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
index 9b698215d261..413fe632445b 100644
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -4,6 +4,7 @@
 #endif
 
 #include <asm/ia32.h>
+#include <asm/ptrace.h>
 
 #if defined(CONFIG_KVM_GUEST)
 #include <asm/kvm_para.h>
@@ -60,5 +61,19 @@ int main(void)
 	DEFINE(stack_canary_offset, offsetof(struct fixed_percpu_data, stack_canary));
 	BLANK();
 #endif
+
+#ifdef CONFIG_ARCH_HAS_PTREGS_AUXILIARY
+	/* Size of Auxiliary pt_regs data */
+	DEFINE(PTREGS_AUX_SIZE, sizeof(struct pt_regs_extended) -
+				sizeof(struct pt_regs));
+#else
+	/*
+	 * Adding data to struct pt_regs_auxiliary requires setting
+	 * ARCH_HAS_PTREGS_AUXILIARY
+	 */
+	BUILD_BUG_ON((sizeof(struct pt_regs_extended) -
+		      sizeof(struct pt_regs)) != 0);
+#endif
+
 	return 0;
 }
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index d860d437631b..3a41273acb1c 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -398,8 +398,10 @@ SYM_CODE_START_NOALIGN(vc_boot_ghcb)
 	movq    %rsp, %rdi
 	movq	ORIG_RAX(%rsp), %rsi
 	movq	initial_vc_handler(%rip), %rax
+	PUSH_PTREGS_AUXILIARY
 	ANNOTATE_RETPOLINE_SAFE
 	call	*%rax
+	POP_PTREGS_AUXILIARY
 
 	/* Unwind pt_regs */
 	POP_REGS
@@ -479,7 +481,9 @@ SYM_CODE_START_LOCAL(early_idt_handler_common)
 	UNWIND_HINT_REGS
 
 	movq %rsp,%rdi		/* RDI = pt_regs; RSI is already trapnr */
+	PUSH_PTREGS_AUXILIARY
 	call do_early_exception
+	POP_PTREGS_AUXILIARY
 
 	decl early_recursion_flag(%rip)
 	jmp restore_regs_and_return_to_kernel
@@ -508,7 +512,9 @@ SYM_CODE_START_NOALIGN(vc_no_ghcb)
 	/* Call C handler */
 	movq    %rsp, %rdi
 	movq	ORIG_RAX(%rsp), %rsi
+	PUSH_PTREGS_AUXILIARY
 	call    do_vc_no_ghcb
+	POP_PTREGS_AUXILIARY
 
 	/* Unwind pt_regs */
 	POP_REGS
-- 
2.35.3



* [PATCH 3/3] x86/mm: Store CPU info on exception entry
  2022-08-11  2:49 [PATCH 0/3] Print CPU at segfault time ira.weiny
  2022-08-11  2:49 ` [PATCH 1/3] x86,mm: print likely " ira.weiny
  2022-08-11  2:49 ` [PATCH 2/3] x86/entry: Add auxiliary pt_regs space ira.weiny
@ 2022-08-11  2:49 ` ira.weiny
  2022-09-02  3:15 ` [PATCH 0/3] Print CPU at segfault time Ira Weiny
  3 siblings, 0 replies; 5+ messages in thread
From: ira.weiny @ 2022-08-11  2:49 UTC (permalink / raw)
  To: Rik van Riel, Borislav Petkov, Dave Hansen
  Cc: Ira Weiny, Thomas Gleixner, x86, linux-kernel, kernel-team

From: Ira Weiny <ira.weiny@intel.com>

x86 has auxiliary pt_regs space available to store information on the
stack during exceptions.  This information is easier to obtain and store
from C code than from assembly.

Knowing which CPU took a page fault is useful in determining where bad
CPUs are in a large data center.

Store the CPU on page fault entry and use it later.
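
Condensed from the diff below, the flow is to record the CPU right after
irqentry_enter() and read it back when the signal message is printed:

	/* on page fault entry, right after irqentry_enter() */
	to_extended_pt_regs(regs)->aux.cpu = raw_smp_processor_id();

	/* later, in show_signal_msg() */
	int cpu = to_extended_pt_regs(regs)->aux.cpu;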

Cc: Rik van Riel <riel@surriel.com>
Suggested-by: Borislav Petkov <bp@alien8.de>
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from RFC:
	New patch combining patches 2 and 5 from the original series, modified
	per Boris/Thomas: eliminate the generic calls to save the CPU and
	call only from exc_page_fault
---
 arch/x86/Kconfig              |  2 +-
 arch/x86/include/asm/ptrace.h |  1 +
 arch/x86/mm/fault.c           | 12 ++++++++++--
 3 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5a0b6ee49cb4..e4a04406be2c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1874,7 +1874,7 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS
 
 config ARCH_HAS_PTREGS_AUXILIARY
 	depends on X86_64
-	bool
+	def_bool y
 
 choice
 	prompt "TSX enable mode"
diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 5a9c85893459..b403b469996f 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -97,6 +97,7 @@ struct pt_regs {
  * ARCH_HAS_PTREGS_AUXILIARY.  Failure to do so will result in a build failure.
  */
 struct pt_regs_auxiliary {
+	int cpu;
 };
 
 struct pt_regs_extended {
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index dbc6a2e08a96..0aa420cd7874 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -768,9 +768,9 @@ static inline void
 show_signal_msg(struct pt_regs *regs, unsigned long error_code,
 		unsigned long address, struct task_struct *tsk)
 {
+	struct pt_regs_auxiliary *aux_pt_regs = &to_extended_pt_regs(regs)->aux;
 	const char *loglvl = task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG;
-	/* This is a racy snapshot, but it's better than nothing. */
-	int cpu = raw_smp_processor_id();
+	int cpu = aux_pt_regs->cpu;
 
 	if (!unhandled_signal(tsk, SIGSEGV))
 		return;
@@ -1507,6 +1507,13 @@ handle_page_fault(struct pt_regs *regs, unsigned long error_code,
 	}
 }
 
+noinstr static void aux_pt_regs_save_cpu(struct pt_regs *regs)
+{
+	struct pt_regs_auxiliary *aux_pt_regs = &to_extended_pt_regs(regs)->aux;
+
+	aux_pt_regs->cpu = raw_smp_processor_id();
+}
+
 DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
 {
 	unsigned long address = read_cr2();
@@ -1550,6 +1557,7 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
 	 */
 	state = irqentry_enter(regs);
 
+	aux_pt_regs_save_cpu(regs);
 	instrumentation_begin();
 	handle_page_fault(regs, error_code, address);
 	instrumentation_end();
-- 
2.35.3



* Re: [PATCH 0/3] Print CPU at segfault time
  2022-08-11  2:49 [PATCH 0/3] Print CPU at segfault time ira.weiny
                   ` (2 preceding siblings ...)
  2022-08-11  2:49 ` [PATCH 3/3] x86/mm: Store CPU info on exception entry ira.weiny
@ 2022-09-02  3:15 ` Ira Weiny
  3 siblings, 0 replies; 5+ messages in thread
From: Ira Weiny @ 2022-09-02  3:15 UTC (permalink / raw)
  To: Rik van Riel, Borislav Petkov, Dave Hansen; +Cc: x86, linux-kernel, kernel-team

On Wed, Aug 10, 2022 at 07:49:00PM -0700, Ira wrote:
> From: Ira Weiny <ira.weiny@intel.com>

Any feedback on this series?  Or have Rik's patches been accepted instead?

Ira


