linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andy Lutomirski <luto@amacapital.net>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Borislav Petkov <bp@alien8.de>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-arch <linux-arch@vger.kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ingo Molnar <mingo@kernel.org>,
	Joel Fernandes <joel@joelfernandes.org>,
	Greg KH <gregkh@linuxfoundation.org>,
	"gustavo@embeddedor.com" <gustavo@embeddedor.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	"paulmck@kernel.org" <paulmck@kernel.org>,
	Josh Triplett <josh@joshtriplett.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	Frederic Weisbecker <frederic@kernel.org>,
	Dan Carpenter <dan.carpenter@oracle.com>,
	Masami Hiramatsu <mhiramat@kernel.org>
Subject: Re: [PATCH v3 02/22] x86,mce: Delete ist_begin_non_atomic()
Date: Wed, 19 Feb 2020 14:48:34 -0800	[thread overview]
Message-ID: <772ACE2A-FD8B-492F-960E-981ECC72E283@amacapital.net> (raw)
In-Reply-To: <3908561D78D1C84285E8C5FCA982C28F7F57E302@ORSMSX115.amr.corp.intel.com>


> On Feb 19, 2020, at 2:33 PM, Luck, Tony <tony.luck@intel.com> wrote:
> 
> 
>> 
>> One big question here: are memory failure #MC exceptions synchronous
>> or can they be delayed?   If we get a memory failure, is it possible
>> that the #MC hits some random context and not the actual context where
>> the error occurred?
> 
> There are a few cases:
> 1) SRAO (Software recoverable action optional) [Patrol scrub or L3 cache eviction]
> These aren't synchronous with any core execution. Using machine check to signal
> was probably a mistake - compounded by it being broadcast :-(  Could pick any CPU
> to handle (actually choose the first to arrive in do_machine_check()). That guy should
> arrange to soft offline the affected page. Every CPU can return to what they were doing
> before.

You could handle this by sending IPI-to-self and dealing with it in the interrupt handler. Or even wake a high-priority kthread or workqueue. irq_work may help. Relying on task_work or the non_atomic stuff seems silly - you can’t rely on anything about the interrupted context, and the context is more or less irrelevant anyway.

> 
> 2) SRAR (Software recoverable action required)
> These are synchronous. Starting with Skylake they may be signaled just to the thread
> that hit the poison. Earlier generations broadcast.

Here’s where dealing with one that came from kernel code is just nasty, right?

I would argue that, if IF=0, killing the machine is reasonable.  If IF=1, we should be okay.  Actually making this work sanely is gross, and arguably the goal should be minimizing grossness.

Perhaps, if we came from kernel mode, we should IPI-to-self and use a special vector that is idtentry, not apicinterrupt.  Or maybe even do this for entries from usermode just to keep everything consistent.

>    2a) Hit in ring3 code ... we want to offline the page and SIGBUS the task(s)
>    2b) Memcpy_mcsafe() ... kernel has a recovery path. "Return" to the recovery code instead of to the original RIP.
>    2c) copy_from_user ... not implemented yet. We are in kernel, but would like to treat this like case 2a
> 
> 3) Fatal
> Always broadcast. Some bank has MCi_STATUS.PCC==1. System must be shutdown.

Easy :)

It would be really, really nice if NMI was masked in MCE context.

> 
> -Tony

  reply	other threads:[~2020-02-19 22:48 UTC|newest]

Thread overview: 99+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-19 14:47 [PATCH v3 00/22] tracing vs world Peter Zijlstra
2020-02-19 14:47 ` [PATCH v3 01/22] hardirq/nmi: Allow nested nmi_enter() Peter Zijlstra
2020-02-19 15:31   ` Steven Rostedt
2020-02-19 16:56     ` Borislav Petkov
2020-02-19 17:07       ` Peter Zijlstra
2020-02-20  8:41   ` Will Deacon
2020-02-20  9:19   ` Marc Zyngier
2020-02-20 13:18   ` Petr Mladek
2020-02-19 14:47 ` [PATCH v3 02/22] x86,mce: Delete ist_begin_non_atomic() Peter Zijlstra
2020-02-19 17:13   ` Borislav Petkov
2020-02-19 17:21     ` Andy Lutomirski
2020-02-19 17:33       ` Peter Zijlstra
2020-02-19 22:12         ` Andy Lutomirski
2020-02-19 22:33           ` Luck, Tony
2020-02-19 22:48             ` Andy Lutomirski [this message]
2020-02-20  7:39           ` Peter Zijlstra
2020-02-19 17:42       ` Borislav Petkov
2020-02-19 17:46         ` Peter Zijlstra
2020-02-19 14:47 ` [PATCH v3 03/22] x86: Replace ist_enter() with nmi_enter() Peter Zijlstra
2020-02-20 10:54   ` Borislav Petkov
2020-02-20 12:11     ` Peter Zijlstra
2020-02-19 14:47 ` [PATCH v3 04/22] x86/doublefault: Make memmove() notrace/NOKPROBE Peter Zijlstra
2020-02-19 15:36   ` Steven Rostedt
2020-02-19 15:40     ` Peter Zijlstra
2020-02-19 15:55       ` Steven Rostedt
2020-02-19 15:57       ` Peter Zijlstra
2020-02-19 16:04         ` Peter Zijlstra
2020-02-19 16:12           ` Steven Rostedt
2020-02-19 16:27             ` Paul E. McKenney
2020-02-19 16:34               ` Peter Zijlstra
2020-02-19 16:46                 ` Paul E. McKenney
2020-02-19 17:05               ` Steven Rostedt
2020-02-20 12:17         ` Borislav Petkov
2020-02-20 12:37           ` Peter Zijlstra
2020-02-19 15:47   ` Steven Rostedt
2020-02-19 14:47 ` [PATCH v3 05/22] rcu: Make RCU IRQ enter/exit functions rely on in_nmi() Peter Zijlstra
2020-02-19 16:31   ` Paul E. McKenney
2020-02-19 16:37     ` Peter Zijlstra
2020-02-19 16:45       ` Paul E. McKenney
2020-02-19 17:03       ` Peter Zijlstra
2020-02-19 17:42         ` Paul E. McKenney
2020-02-19 17:16     ` [PATCH] rcu/kprobes: Comment why rcu_nmi_enter() is marked NOKPROBE Steven Rostedt
2020-02-19 17:18       ` Joel Fernandes
2020-02-19 17:41       ` Paul E. McKenney
2020-02-20  5:54       ` Masami Hiramatsu
2020-02-19 14:47 ` [PATCH v3 06/22] rcu: Rename rcu_irq_{enter,exit}_irqson() Peter Zijlstra
2020-02-19 16:38   ` Paul E. McKenney
2020-02-19 14:47 ` [PATCH v3 07/22] rcu: Mark rcu_dynticks_curr_cpu_in_eqs() inline Peter Zijlstra
2020-02-19 16:39   ` Paul E. McKenney
2020-02-19 17:19     ` Steven Rostedt
2020-02-19 14:47 ` [PATCH v3 08/22] rcu,tracing: Create trace_rcu_{enter,exit}() Peter Zijlstra
2020-02-19 15:49   ` Steven Rostedt
2020-02-19 15:58     ` Peter Zijlstra
2020-02-19 16:15       ` Steven Rostedt
2020-02-19 16:35         ` Peter Zijlstra
2020-02-19 16:44           ` Paul E. McKenney
2020-02-20 10:34             ` Peter Zijlstra
2020-02-20 13:58               ` Paul E. McKenney
2020-02-19 14:47 ` [PATCH v3 09/22] sched,rcu,tracing: Avoid tracing before in_nmi() is correct Peter Zijlstra
2020-02-19 15:50   ` Steven Rostedt
2020-02-19 15:50   ` Steven Rostedt
2020-02-19 14:47 ` [PATCH v3 10/22] x86,tracing: Add comments to do_nmi() Peter Zijlstra
2020-02-19 15:51   ` Steven Rostedt
2020-02-19 14:47 ` [PATCH v3 11/22] perf,tracing: Prepare the perf-trace interface for RCU changes Peter Zijlstra
2020-02-19 14:47 ` [PATCH v3 12/22] tracing: Employ trace_rcu_{enter,exit}() Peter Zijlstra
2020-02-19 15:52   ` Steven Rostedt
2020-02-19 14:47 ` [PATCH v3 13/22] tracing: Remove regular RCU context for _rcuidle tracepoints (again) Peter Zijlstra
2020-02-19 15:53   ` Steven Rostedt
2020-02-19 16:43   ` Paul E. McKenney
2020-02-19 16:47     ` Peter Zijlstra
2020-02-19 17:05       ` Peter Zijlstra
2020-02-19 17:21         ` Steven Rostedt
2020-02-19 17:40           ` Paul E. McKenney
2020-02-19 18:00             ` Steven Rostedt
2020-02-19 19:05               ` Paul E. McKenney
2020-02-19 14:47 ` [PATCH v3 14/22] perf,tracing: Allow function tracing when !RCU Peter Zijlstra
2020-02-19 14:47 ` [PATCH v3 15/22] x86/int3: Ensure that poke_int3_handler() is not traced Peter Zijlstra
2020-02-19 14:47 ` [PATCH v3 16/22] locking/atomics, kcsan: Add KCSAN instrumentation Peter Zijlstra
2020-02-19 15:46   ` Steven Rostedt
2020-02-19 16:03     ` Peter Zijlstra
2020-02-19 16:50       ` Paul E. McKenney
2020-02-19 16:54         ` Peter Zijlstra
2020-02-19 17:36           ` Paul E. McKenney
2020-02-19 14:47 ` [PATCH v3 17/22] asm-generic/atomic: Use __always_inline for pure wrappers Peter Zijlstra
2020-02-19 14:47 ` [PATCH v3 18/22] asm-generic/atomic: Use __always_inline for fallback wrappers Peter Zijlstra
2020-02-19 16:55   ` Paul E. McKenney
2020-02-19 17:06     ` Peter Zijlstra
2020-02-19 17:35       ` Paul E. McKenney
2020-02-19 14:47 ` [PATCH v3 19/22] compiler: Simple READ/WRITE_ONCE() implementations Peter Zijlstra
2020-02-19 14:47 ` [PATCH v3 20/22] locking/atomics: Flip fallbacks and instrumentation Peter Zijlstra
2020-02-19 14:47 ` [PATCH v3 21/22] x86/int3: Avoid atomic instrumentation Peter Zijlstra
2020-02-19 14:47 ` [PATCH v3 22/22] x86/int3: Ensure that poke_int3_handler() is not sanitized Peter Zijlstra
2020-02-19 16:06   ` Dmitry Vyukov
2020-02-19 16:30     ` Peter Zijlstra
2020-02-19 16:51       ` Peter Zijlstra
2020-02-19 17:20       ` Peter Zijlstra
2020-02-20 10:37         ` Dmitry Vyukov
2020-02-20 12:06           ` Peter Zijlstra
2020-02-20 16:22             ` Dmitry Vyukov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=772ACE2A-FD8B-492F-960E-981ECC72E283@amacapital.net \
    --to=luto@amacapital.net \
    --cc=bp@alien8.de \
    --cc=dan.carpenter@oracle.com \
    --cc=frederic@kernel.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=gustavo@embeddedor.com \
    --cc=jiangshanlai@gmail.com \
    --cc=joel@joelfernandes.org \
    --cc=josh@joshtriplett.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mhiramat@kernel.org \
    --cc=mingo@kernel.org \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).