From: Don Zickus <dzickus@redhat.com>
To: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>,
	linux-edac <linux-edac@vger.kernel.org>,
	Borislav Petkov <bp@suse.de>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Len Brown <lenb@kernel.org>, Tony Luck <tony.luck@intel.com>,
	Tomasz Nowicki <tomasz.nowicki@linaro.org>,
	"Chen, Gong" <gong.chen@linux.intel.com>,
	Wolfram Sang <wsa@the-dreams.de>, Lv Zheng <lv.zheng@intel.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org,
	Huang Ying <ying.huang@intel.com>
Subject: Re: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader
Date: Tue, 28 Apr 2015 14:44:28 -0400
Message-ID: <20150428184428.GF98296@redhat.com>
In-Reply-To: <20150428162229.GH19025@pd.tnic>

On Tue, Apr 28, 2015 at 06:22:29PM +0200, Borislav Petkov wrote:
> On Tue, Apr 28, 2015 at 11:35:21AM -0400, Don Zickus wrote:
> > Your solution seems much simpler. :-)
> 
> ... and I love simpler :-)
> 
> > I followed up in another email stating I mis-spoke.  I forgot this still
> > uses the NMI_LOCAL shared NMI.  So every perf NMI, will also call the GHES
> > handler to make sure NMIs did not piggy back each other.  So I don't believe
> 
> And this is something we should really fix - perf and RAS should
> not have anything to do with each other. But I don't know the NMI
> code to even have an idea how. I don't even know whether we can
> differentiate NMIs, hell, I can't imagine the hardware giving us a
> different NMI reason through get_nmi_reason(). Maybe that byte returned
> from NMI_REASON_PORT is too small and hangs on too much legacy crap to
> even be usable. Questions over questions...

:-)  Well, let me first clear up some of your questions. 

RAS doesn't go through the legacy ports (i.e. get_nmi_reason()).  Instead it
triggers the external NMI through a different bit (in the IOAPIC, I think).

The NMI code has no idea which ioremap'ed address APEI uses to map the
error handling register that GHES reads, unlike the legacy port, which is
always port 0x61.

So NMI is basically a shared interrupt, with no ability to discern who sent
the interrupt (and, even worse, no ability to know how _many_ were sent, as
the NMI is edge triggered instead of level triggered).  As a result we rely
on the NMI handlers to talk to their address space/registers to determine if
they were the source of the interrupt.
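
To make the shared-NMI point concrete, here is a rough userspace sketch (not
kernel code).  The names nmi_source, poll_source and dispatch_nmi are made up
for illustration, and the "pending" flag is only a stand-in for whatever
register a real handler would read.  It shows why every handler has to be
polled on each NMI, even after one has already claimed it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one NMI source.  'pending' stands in for the
 * status register a real handler would have to read. */
struct nmi_source {
	const char *name;
	bool pending;
	unsigned int reads;	/* how often the "register" was polled */
};

/* Returns 1 if this source raised the NMI and handled it, 0 otherwise. */
static int poll_source(struct nmi_source *src)
{
	src->reads++;
	if (!src->pending)
		return 0;
	src->pending = false;
	return 1;
}

/* One edge-triggered NMI arrived: it carries no sender and no count.
 * Every source is polled, even after one has claimed the event, because
 * a second source may have fired in the same window. */
static int dispatch_nmi(struct nmi_source *srcs, size_t n)
{
	int handled = 0;

	assert(srcs != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		handled += poll_source(&srcs[i]);
	return handled;
}
```

In this model the cost is visible in the 'reads' counter: a source with a slow
register gets polled on every NMI, whether or not it had anything to do with it.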

Now I can agree that perf and RAS have nothing to do with each other, but
they both use NMI to interrupt.  Perf is fortunate enough to be internal to
each cpu and therefore needs no global lock unlike GHES (hence part of the
problem).

The only way to determine who sent the NMI is to have each handler read its
register, which is time consuming for GHES.

Of course, we could go back to playing tricks knowing that external NMIs
like GHES and IO_CHECK/SERR are only routed to one cpu (cpu0 mainly) and
optimize things that way, but that inhibits the bsp cpu hotplugging folks.



I also played tricks, like last year's patchset that split the
nmi_handlers into LOCAL and EXTERNAL queues.  Perf would be part of the
LOCAL queue while GHES was part of the EXTERNAL queue.  The thought was to
never touch the EXTERNAL queue if perf claimed an NMI.  This led to all
sorts of missed external NMIs, so it didn't work out.
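
A toy model of that failure mode, assuming a perf event and a GHES error land
in the same window and get merged into one edge-triggered NMI.  This is not
the code from that patchset; nmi_state, run_local, run_external and
dispatch_split are invented names:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state: one pending flag per source. */
struct nmi_state {
	bool perf_pending;	/* LOCAL queue source */
	bool ghes_pending;	/* EXTERNAL queue source */
};

/* LOCAL queue: cheap, per-cpu check. */
static int run_local(struct nmi_state *st)
{
	if (!st->perf_pending)
		return 0;
	st->perf_pending = false;
	return 1;
}

/* EXTERNAL queue: the expensive register read. */
static int run_external(struct nmi_state *st)
{
	if (!st->ghes_pending)
		return 0;
	st->ghes_pending = false;
	return 1;
}

/* Split policy: skip the EXTERNAL queue once LOCAL has claimed the NMI. */
static int dispatch_split(struct nmi_state *st)
{
	int handled = run_local(st);

	assert(handled == 0 || handled == 1);
	if (handled)
		return handled;
	return run_external(st);
}
```

With both flags set, dispatch_split() returns after perf claims the NMI and
the GHES flag stays set.  Since the two events arrived as a single edge,
nothing comes along later to trigger another look at the EXTERNAL queue.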


Anyway, any ideas or thoughts for improvement are always welcome. :-)


Cheers,
Don

> 
> > the NMI reason lock is called a majority of the time (except when the NMI is
> > swallowed, but that is under heavy perf load...).
> 
> ..
> 
> > We both agree the mechanics of the spinlock are overkill here and cause much
> > cache contention.  Simplifying it to just 'reads' and return removes most of
> > the problem.
> 
> Right.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> ECO tip #101: Trim your mails when you reply.
> --
