From: Andy Lutomirski <luto@amacapital.net>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Borislav Petkov <bp@alien8.de>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-arch <linux-arch@vger.kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ingo Molnar <mingo@kernel.org>,
	Joel Fernandes <joel@joelfernandes.org>,
	Greg KH <gregkh@linuxfoundation.org>,
	"gustavo@embeddedor.com" <gustavo@embeddedor.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	"paulmck@kernel.org" <paulmck@kernel.org>,
	Josh Triplett <josh@joshtriplett.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	Frederic Weisbecker <frederic@kernel.org>,
	Dan Carpenter <dan.carpenter@oracle.com>,
	Masami Hiramatsu <mhiramat@kernel.org>
Subject: Re: [PATCH v3 02/22] x86,mce: Delete ist_begin_non_atomic()
Date: Wed, 19 Feb 2020 14:48:34 -0800
Message-ID: <772ACE2A-FD8B-492F-960E-981ECC72E283@amacapital.net>
In-Reply-To: <3908561D78D1C84285E8C5FCA982C28F7F57E302@ORSMSX115.amr.corp.intel.com>


> On Feb 19, 2020, at 2:33 PM, Luck, Tony <tony.luck@intel.com> wrote:
> 
> 
>> 
>> One big question here: are memory failure #MC exceptions synchronous
>> or can they be delayed?   If we get a memory failure, is it possible
>> that the #MC hits some random context and not the actual context where
>> the error occurred?
> 
> There are a few cases:
> 1) SRAO (Software recoverable action optional) [Patrol scrub or L3 cache eviction]
> These aren't synchronous with any core execution. Using machine check to signal
> was probably a mistake - compounded by it being broadcast :-(  Could pick any CPU
> to handle (actually choose the first to arrive in do_machine_check()). That guy should
> arrange to soft offline the affected page. Every CPU can return to what they were doing
> before.

You could handle this by sending an IPI-to-self and dealing with it in the interrupt handler, or even by waking a high-priority kthread or workqueue. irq_work may help. Relying on task_work or the non_atomic stuff seems silly - you can’t rely on anything about the interrupted context, and the context is more or less irrelevant anyway.
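
To make the SRAO idea concrete, here is a rough user-space model of "first CPU to arrive owns the event and only queues deferred work; everyone else returns". It is not kernel code: struct srao_event, srao_event_init() and srao_arrive() are names made up for this sketch, and the work_queued flag merely stands in for whatever deferral mechanism (irq_work, workqueue, kthread) would actually be used.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NO_OWNER (-1)

/* Hypothetical per-event state shared by all CPUs that took the broadcast. */
struct srao_event {
	atomic_int owner;        /* CPU elected to handle the event, or NO_OWNER */
	atomic_bool work_queued; /* stand-in for a queued irq_work/workqueue item */
};

static void srao_event_init(struct srao_event *ev)
{
	atomic_init(&ev->owner, NO_OWNER);
	atomic_init(&ev->work_queued, false);
}

/*
 * Called by every CPU that receives the broadcast. Returns true only for
 * the first arrival, which records that the soft-offline work is pending.
 * Nothing here depends on the interrupted context.
 */
static bool srao_arrive(struct srao_event *ev, int cpu)
{
	int expected = NO_OWNER;

	if (!atomic_compare_exchange_strong(&ev->owner, &expected, cpu))
		return false; /* someone else already owns it; just return */

	atomic_store(&ev->work_queued, true);
	return true;
}
```

The point of the model is only that the elected CPU does no page handling in the exception itself; the real work would run later from a sane context.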

> 
> 2) SRAR (Software recoverable action required)
> These are synchronous. Starting with Skylake they may be signaled just to the thread
> that hit the poison. Earlier generations broadcast.

Here’s where dealing with one that came from kernel code is just nasty, right?

I would argue that, if IF=0, killing the machine is reasonable.  If IF=1, we should be okay.  Actually making this work sanely is gross, and arguably the goal should be minimizing grossness.

Perhaps, if we came from kernel mode, we should IPI-to-self and use a special vector that is idtentry, not apicinterrupt.  Or maybe even do this for entries from usermode just to keep everything consistent.
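
Roughly, the SRAR policy I have in mind looks like the sketch below. This is a stand-alone illustration, not a patch: struct fake_regs, enum srar_action and srar_policy() are invented for the example, and only the IF bit position (bit 9 of the flags register) and the CPL-in-CS check are taken from the architecture.

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_EFLAGS_IF 0x200u /* interrupt enable flag, bit 9 */

/* Hypothetical, cut-down view of the saved register state. */
struct fake_regs {
	uint64_t flags;
	uint16_t cs;
};

enum srar_action {
	SRAR_SIGBUS_TASK,    /* from user mode: offline the page, SIGBUS the task */
	SRAR_DEFER_RECOVERY, /* from kernel, IF=1: defer, e.g. via IPI-to-self */
	SRAR_PANIC,          /* from kernel, IF=0: give up */
};

/* CPL lives in the low two bits of CS; 3 means user mode. */
static bool from_user(const struct fake_regs *regs)
{
	return (regs->cs & 3) == 3;
}

static bool irqs_were_on(const struct fake_regs *regs)
{
	return (regs->flags & X86_EFLAGS_IF) != 0;
}

static enum srar_action srar_policy(const struct fake_regs *regs)
{
	if (from_user(regs))
		return SRAR_SIGBUS_TASK;
	if (!irqs_were_on(regs))
		return SRAR_PANIC;
	return SRAR_DEFER_RECOVERY;
}
```

If user-mode entries also went through the self-IPI path for consistency, the first branch would simply fold into the deferred case.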

>    2a) Hit in ring3 code ... we want to offline the page and SIGBUS the task(s)
>    2b) Memcpy_mcsafe() ... kernel has a recovery path. "Return" to the recovery code instead of to the original RIP.
>    2c) copy_from_user ... not implemented yet. We are in kernel, but would like to treat this like case 2a
> 
> 3) Fatal
> Always broadcast. Some bank has MCi_STATUS.PCC==1. System must be shutdown.

Easy :)

It would be really, really nice if NMI was masked in MCE context.

> 
> -Tony
