From: Ingo Molnar <mingo@elte.hu>
To: Huang Ying <ying.huang@intel.com>
Cc: huang ying <huang.ying.caritas@gmail.com>,
Len Brown <lenb@kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Andi Kleen <andi@firstfloor.org>,
"Luck, Tony" <tony.luck@intel.com>,
"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
Andi Kleen <ak@linux.intel.com>,
"Wu, Fengguang" <fengguang.wu@intel.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Borislav Petkov <bp@alien8.de>
Subject: Re: [PATCH 5/9] HWPoison: add memory_failure_queue()
Date: Mon, 23 May 2011 13:01:51 +0200
Message-ID: <20110523110151.GD24674@elte.hu>
In-Reply-To: <4DD9C8B9.5070004@intel.com>

* Huang Ying <ying.huang@intel.com> wrote:
> > That's where 'active filters' come into the picture - see my other mail
> > (that was in the context of unidentified NMI errors/events) where I
> > outlined how they would work in this case and elsewhere. Via active filters
> > we could share most of the code, gain access to the events and still have
> > kernel driven policy action.
>
> Is it something like the following?
>
> - The NMI handler runs for the hardware error; the hardware error
> information is collected and put into the perf ring buffer as an 'event'.
Correct.
Note that for MCE errors we want the 'persistent event' framework Boris has
posted: we want these events to be buffered up to a point even if there is no
tool listening in on them:
- this gives us boot-time MCE error coverage
- this protects us against a logging daemon being restarted and events
getting lost
> - Some 'active filters' are run for each 'event' in NMI context.
Yeah. Whether it's a 'filter' written in human-readable ASCII or really just a
callback you register with that event is secondary - both would work.
> - Some operations cannot be done in the NMI handler, so they are deferred to
> an IRQ handler (can be done with something like irq_work).
Yes.
> - Some other 'active filters' are run for each 'event' in IRQ context.
> (For memory error, we can call memory_failure_queue() here).
Correct.
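The flow agreed on above can be sketched roughly as follows. This is a plain userspace C simulation, not kernel code: struct hw_event, nmi_handle(), irq_work_run() and both filter functions are hypothetical names made up for illustration, and the real irq_work and memory_failure_queue() interfaces are only referred to in comments.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event record; the field names are illustrative only. */
struct hw_event {
	uint64_t phys_addr;
	bool recoverable;
};

#define PENDING_MAX 8

/* Events deferred from the simulated NMI stage to the simulated IRQ stage. */
static struct hw_event pending[PENDING_MAX];
static size_t pending_count;

/* Counters so the effect of each stage can be observed. */
static unsigned int nmi_filter_runs;
static unsigned int recovery_requests;

/* Stage 1 filter: restricted to what would be safe in NMI context. */
static bool nmi_filter(const struct hw_event *ev)
{
	nmi_filter_runs++;
	return ev->recoverable;	/* true: needs follow-up in IRQ context */
}

/* Stage 2 filter: stands in for a call such as memory_failure_queue(). */
static void irq_filter(const struct hw_event *ev)
{
	(void)ev;
	recovery_requests++;
}

/* Simulated NMI handler: run the NMI-safe filter, defer the rest. */
static bool nmi_handle(struct hw_event ev)
{
	if (!nmi_filter(&ev))
		return true;	/* nothing to defer */
	if (pending_count == PENDING_MAX)
		return false;	/* deferral queue is full */
	pending[pending_count++] = ev;
	return true;
}

/* Simulated irq_work callback: drain the deferred events. */
static void irq_work_run(void)
{
	for (size_t i = 0; i < pending_count; i++)
		irq_filter(&pending[i]);
	pending_count = 0;
}
```

In this sketch a corrected (non-recoverable == false) event only passes through the first stage, while a recoverable one is picked up again when irq_work_run() is called.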
> Where some 'active filters' are kernel built-in, some 'active filters' can be
> customized via kernel command line or by user space.
Yes.
> If my understanding above is correct, I think this is a general and
> complex solution. It is a little hard for a user to understand which 'active
> filters' are in effect. He may need some runtime assistance to understand the
> code (maybe /sys/events/active_filters, which lists all filters currently in
> effect), because that is hard to do just by reading the source code. Anyway,
> this is a design style choice.
I don't think it's complex: the built-in rules are in plain sight (can be in
the source code or can even be explicitly registered callbacks), and the rules
installed via configuration or tooling will be as complex as the admin or tool
wants them to be.
> There are still some issues that I don't know how to solve in the above
> framework.
>
> - If two processes request the same type of hardware error events, one
> hardware error event will be copied to two ring buffers (one for each
> process), but the 'active filters' should be run only once for each
> hardware error event.
With persistent events 'active filters' should only be attached to the central
persistent event.
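As a rough illustration of that point, here is a userspace C sketch in which the filter hangs off one central event and listeners only receive copies. All names (central_event(), subscribe(), struct subscriber) are hypothetical and do not correspond to the perf or persistent-event code.

```c
#include <stddef.h>

#define MAX_SUBSCRIBERS 4
#define SUB_BUF_SIZE 16

/* One per listening process: a private copy of the event stream. */
struct subscriber {
	int events[SUB_BUF_SIZE];
	size_t count;
};

static struct subscriber subscribers[MAX_SUBSCRIBERS];
static size_t nr_subscribers;
static unsigned int filter_runs;

/* The 'active filter' attached to the central event. */
static void active_filter(int event_id)
{
	(void)event_id;
	filter_runs++;
}

/* Register a listener; returns its index, or -1 if the table is full. */
static int subscribe(void)
{
	if (nr_subscribers == MAX_SUBSCRIBERS)
		return -1;
	subscribers[nr_subscribers].count = 0;
	return (int)nr_subscribers++;
}

/* Central event: run the filter once, then copy to every listener. */
static void central_event(int event_id)
{
	active_filter(event_id);
	for (size_t i = 0; i < nr_subscribers; i++) {
		struct subscriber *s = &subscribers[i];

		if (s->count < SUB_BUF_SIZE)
			s->events[s->count++] = event_id;
	}
}
```

With two listeners registered, a single central_event() call runs the filter once and both listeners see the event.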
> - How to deal with ring-buffer overflow? For example, the ring buffer is
> full of corrected memory errors, and now a recoverable memory error occurs
> but cannot be put into the perf ring buffer because of the overflow. How
> should the recoverable memory error be handled?
The solution is to make it large enough. With *every* queueing solution there
will be some sort of queue size limit.
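To make the size limit concrete, here is a minimal userspace C sketch of a bounded queue that counts what it has to drop, so the loss is at least visible and the size can be tuned. RING_SIZE and the struct ring layout are assumptions for illustration, not the perf ring buffer format.

```c
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 4	/* deliberately tiny so the overflow is easy to see */

struct ring {
	int slots[RING_SIZE];
	size_t head;		/* total number of events pushed */
	size_t tail;		/* total number of events popped */
	unsigned long lost;	/* events dropped because the ring was full */
};

/* Add an event; on overflow the new event is dropped and counted. */
static bool ring_push(struct ring *r, int ev)
{
	if (r->head - r->tail == RING_SIZE) {
		r->lost++;
		return false;
	}
	r->slots[r->head % RING_SIZE] = ev;
	r->head++;
	return true;
}

/* Remove the oldest event; returns false if the ring is empty. */
static bool ring_pop(struct ring *r, int *ev)
{
	if (r->head == r->tail)
		return false;
	*ev = r->slots[r->tail % RING_SIZE];
	r->tail++;
	return true;
}
```

Pushing six events into this four-slot ring leaves lost at 2; once a consumer pops an entry, pushes succeed again.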
Thanks,
Ingo