From: Ingo Molnar <mingo@elte.hu>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: "Huang, Ying" <ying.huang@intel.com>,
	huang ying <huang.ying.caritas@gmail.com>,
	Len Brown <lenb@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Andi Kleen <andi@firstfloor.org>,
	"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
	Andi Kleen <ak@linux.intel.com>,
	"Wu, Fengguang" <fengguang.wu@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Borislav Petkov <bp@alien8.de>
Subject: Re: [PATCH 5/9] HWPoison: add memory_failure_queue()
Date: Wed, 25 May 2011 16:08:08 +0200
Message-ID: <20110525140808.GD19118@elte.hu>
In-Reply-To: <987664A83D2D224EAE907B061CE93D5301D5BF823C@orsmsx505.amr.corp.intel.com>


* Luck, Tony <tony.luck@intel.com> wrote:

> In your proposed solution, we'd generate an event that would be 
> handled by some process/daemon ... but how would we ensure that the 
> affected process does not run in the meantime? Could we create 
> some analogous method to the ptrace stopped state, and hand control 
> of the affected process to the daemon that gets the event?

OK, I think there is a bit of a misunderstanding here - which is not 
really a surprise: we made generic arguments all along, with very few 
specifics.

The RAS daemon would deal with 'slow' policy actions for fully 
recovered events. It would also log events so that people can do 
post-mortem analysis and the like.

The main point of defining events here is so that there's a single 
method of transport and a single flexible method of defining and 
extracting events.

Some of the event processing would occur in the kernel: in code that 
knows about memory_failure() and calls it while making sure the 
affected task does not execute any further user-space instructions.

Some of the code would execute *very* early and in a very atomic way, 
still in NMI context: panicking the box if the error is severe enough.

Neither of these steps is something the RAS daemon can, or would want 
to, handle.
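To make the split between these stages concrete, here is a minimal 
sketch of routing an event by severity. The severity levels, stage 
names and ras_route() are hypothetical illustrations, not the 
kernel's actual MCE severity grading or any existing API.

```c
#include <assert.h>

/* Hypothetical severity levels, not the kernel's real MCE grading. */
enum ras_severity {
	RAS_SEV_CORRECTED,	/* fully recovered by hardware */
	RAS_SEV_RECOVERABLE,	/* data lost, but the page can be isolated */
	RAS_SEV_FATAL,		/* machine state cannot be trusted */
};

/* Hypothetical handling stages, from most to least urgent. */
enum ras_stage {
	RAS_STAGE_NMI_PANIC,	/* decide immediately, still in NMI context */
	RAS_STAGE_KERNEL,	/* memory_failure()-style handling before user space runs */
	RAS_STAGE_DAEMON,	/* slow policy: logging, post-mortem, human review */
};

/* Pick the earliest stage that must act on an event of this severity. */
static enum ras_stage ras_route(enum ras_severity sev)
{
	switch (sev) {
	case RAS_SEV_FATAL:
		return RAS_STAGE_NMI_PANIC;
	case RAS_SEV_RECOVERABLE:
		return RAS_STAGE_KERNEL;
	case RAS_SEV_CORRECTED:
	default:
		assert(sev == RAS_SEV_CORRECTED);	/* no other values are defined */
		return RAS_STAGE_DAEMON;
	}
}
```

In a real design the recoverable and corrected events would still be 
logged for the daemon as well; the routing only decides who has to 
act first.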

The RAS tools would use the regular perf facilities to set up and 
configure the various RAS-related events. They'd handle the 
'severity' config bits, they'd initiate testing (error injection), etc.

Ideally the RAS daemon and tools would do what syslog does (and 
more), with more structured events. At the end of the day most of the 
'policy action' is taken by humans anyway, who want to look at some 
ASCII output. So printk() integration and obvious ASCII output for 
everything are important along the way.
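As an illustration of "structured event, but obvious ASCII output", 
here is a minimal sketch of a memory error record rendered as a 
single log line. The struct ras_mem_event layout, its field names and 
the line format are all hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical structured record; field names are illustrative only. */
struct ras_mem_event {
	int		cpu;
	uint64_t	phys_addr;
	const char	*severity;	/* e.g. "corrected", "recoverable", "fatal" */
	const char	*action;	/* what the kernel did about it */
};

/*
 * Render the event as one ASCII line, the kind of thing printk()/syslog
 * would show. Returns the snprintf() result: the length the full line
 * needs, which is >= len if the output was truncated.
 */
static int ras_event_format(const struct ras_mem_event *ev, char *buf, size_t len)
{
	assert(ev != NULL && buf != NULL);
	return snprintf(buf, len,
			"RAS: cpu %d: memory error at 0x%" PRIx64 " severity=%s action=%s",
			ev->cpu, ev->phys_addr, ev->severity, ev->action);
}
```

Tools can consume the structured fields directly, while a human 
reading the log sees the same event as plain text.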

> 2) The memory error was found in certain special sections of the
>    kernel for which recovery is possible (e.g. while copying to/from
>    user memory, perhaps also page copy and page clear).
> 
> Here I don't have a solution. TIF_MCE_NOTIFY isn't checked when 
> returning from do_machine_check() to kernel code.

Well, since we are already in interrupt context (albeit in a very 
atomic NMI context), sending a self-IPI is not strictly necessary. We 
could fix up the return address and jump to the right handler 
straight away during the IRET.
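The "fix up the return address" idea can be sketched in user space, 
in the spirit of the kernel's exception-table fixups for faulting 
user-space accesses: look up the interrupted instruction and, if it 
has a registered recovery handler, rewrite the saved instruction 
pointer so the IRET lands there. struct fake_regs, struct 
fixup_entry and fixup_return_address() are hypothetical stand-ins, 
not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the register frame the CPU restores on IRET. */
struct fake_regs {
	uintptr_t ip;	/* where execution resumes after the handler returns */
};

/* Hypothetical table entry: one per instruction allowed to hit a memory error. */
struct fixup_entry {
	uintptr_t insn;		/* address of the risky instruction, e.g. in a copy loop */
	uintptr_t handler;	/* recovery code that returns an error instead */
};

/*
 * If the interrupted instruction has a fixup, rewrite the saved return
 * address so execution resumes directly in the recovery code.
 * Returns 1 if a fixup was applied, 0 if there is none (not recoverable here).
 */
static int fixup_return_address(struct fake_regs *regs,
				const struct fixup_entry *table, size_t n)
{
	assert(regs != NULL);
	for (size_t i = 0; i < n; i++) {
		if (table[i].insn == regs->ip) {
			regs->ip = table[i].handler;
			return 1;
		}
	}
	return 0;
}
```

No interrupt is sent: the redirection takes effect on the very return 
from the exception handler, which is why it avoids the APIC delivery 
latency mentioned below.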

A self-IPI might also not execute *immediately* - there's always the 
chance of APIC-related delays.

> In a CONFIG_PREEMPT=y kernel, all of the recoverable cases ought to 
> be in places where pre-emption is allowed ... so perhaps we can 
> also use the stop-and-switch option here?

Yes, these are generally preemptible cases - and if they are not, we 
can make the error fatal (we do not have to handle *every* complex 
case; giving up is a fair answer as well, and we really do not want 
rarely executed code to be complex).

But you don't need to stop-and-switch: just stack-nesting on top of 
whatever preemptible code was running there would be enough, wouldn't 
it? That keeps the task from executing until we have decided whether 
it can continue or not.

Thanks,

	Ingo

