linux-kernel.vger.kernel.org archive mirror
From: Don Zickus <dzickus@redhat.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: x86@kernel.org, "Peter Zijlstra" <peterz@infradead.org>,
	ak@linux.intel.com, gong.chen@linux.intel.com,
	LKML <linux-kernel@vger.kernel.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Frédéric Weisbecker" <fweisbec@gmail.com>,
	"Steven Rostedt" <rostedt@goodmis.org>
Subject: Re: [PATCH 1/5] x86, nmi:  Add new nmi type 'external'
Date: Wed, 7 May 2014 12:02:51 -0400
Message-ID: <20140507160251.GQ39568@redhat.com>
In-Reply-To: <20140507153854.GA14926@gmail.com>

On Wed, May 07, 2014 at 05:38:54PM +0200, Ingo Molnar wrote:
> 
> * Don Zickus <dzickus@redhat.com> wrote:
> 
> > I noticed when debugging a perf problem on a machine with GHES enabled,
> > perf seemed slow.  I then realized that the GHES NMI routine was taking
> > a global lock all the time to inspect the hardware.  This contended
> > with all the local perf counters which did not need a lock.  So each cpu
> > accidentally was synchronizing with itself when using perf.
> > 
> > This is because the way the nmi handler works.  It executes all the handlers
> > registered to a particular subtype (to deal with nmi sharing).  As a result
> > the GHES handler was executed on every PMI.
> > 
> > Fix this by creating a new nmi type called NMI_EXT, which is used by
> > handlers that need to probe external hardware and require a global lock
> > to do so.
> > 
> > Now the main NMI handler can check the internal NMI handlers first and
> > then the external ones if nothing is found.
> > 
> > This makes perf a little faster again on those machines with GHES enabled.
> 
> So what happens if GHES asserts an NMI at the same time a PMI 
> triggers?
> 
> If the perf PMI executes and indicates that it has handled something, 
> we don't execute the GHES handler, right? Will the GHES re-trigger the 
> NMI after we return?

In my head, I had thought they would be queued up and things would work
out fine.  But I guess in theory, if a PMI NMI comes in and, before the cpu
can accept it, a GHES NMI comes in as well, then the GHES NMI may get
dropped.  That would not be good.  Though the race window would be very
small.

I don't have a good idea how to handle that.
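
To make the race concrete, here is a rough user-space model of the
two-tier dispatch described in the patch.  It is only a sketch, not the
kernel code: struct nmi_sources, run_internal_handlers(),
run_external_handlers() and dispatch_nmi() are hypothetical names made up
for illustration, and the "pending" flags stand in for real hardware
status bits.

```c
#include <stdbool.h>

/* Hypothetical model of the NMI sources seen by one CPU (not kernel code). */
struct nmi_sources {
	bool pmi_pending;	/* CPU-local perf counter overflow */
	bool ghes_pending;	/* external hardware error (GHES) */
};

/* Internal handlers inspect only CPU-local state, so no lock is needed. */
static int run_internal_handlers(struct nmi_sources *s)
{
	int handled = 0;

	if (s->pmi_pending) {
		s->pmi_pending = false;
		handled++;
	}
	return handled;
}

/* External handlers probe shared hardware; a real one would take a
 * global lock around this check. */
static int run_external_handlers(struct nmi_sources *s)
{
	int handled = 0;

	if (s->ghes_pending) {
		s->ghes_pending = false;
		handled++;
	}
	return handled;
}

/* Two-tier dispatch: external handlers run only if no internal handler
 * claimed the NMI. */
static int dispatch_nmi(struct nmi_sources *s)
{
	int handled = run_internal_handlers(s);

	if (handled)
		return handled;
	return run_external_handlers(s);
}
```

If both flags are set when dispatch_nmi() runs, the PMI is handled and
the function returns 1 with ghes_pending still set.  Unless something
raises another NMI, nobody looks at the GHES event, which is the drop
scenario above.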

On the flip side, we have the same exact problem, today, with the other
common external NMIs (SERR, IO).  If a PCI SERR comes in at the same time
as a PMI, then it gets dropped.  Worse, it doesn't get re-enabled and
blocks future SERRs (just found this out two weeks ago because of a dirty
perf status register on boot).

Again, I don't have a solution to juggle between PMI performance and
reliable delivery.  We could do away with the spinlocks and go back to
single cpu delivery (like it used to be).  Then devise a mechanism to
switch delivery to another cpu upon hotplug.
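
As a very rough illustration of that last idea (a user-space sketch
only; MAX_CPUS, cpu_online[], ext_nmi_cpu, should_poll_external() and
cpu_going_offline() are all hypothetical names, and the step that
actually reroutes hardware NMI delivery is left out):

```c
#include <stdbool.h>

#define MAX_CPUS 8		/* hypothetical size for this model */

static bool cpu_online[MAX_CPUS];
static int ext_nmi_cpu;		/* the one cpu that polls external sources */

/* Only the designated cpu probes external hardware, so the probe needs
 * no global lock. */
static bool should_poll_external(int cpu)
{
	return cpu == ext_nmi_cpu;
}

/* Model of a hotplug callback: if the designated cpu goes away, hand
 * external NMI duty to the lowest-numbered cpu still online.
 * Returns the cpu now responsible, or -1 if none is left. */
static int cpu_going_offline(int cpu)
{
	int i;

	cpu_online[cpu] = false;
	if (cpu != ext_nmi_cpu)
		return ext_nmi_cpu;

	for (i = 0; i < MAX_CPUS; i++) {
		if (cpu_online[i]) {
			ext_nmi_cpu = i;
			return i;
		}
	}
	ext_nmi_cpu = -1;
	return -1;
}
```

With cpus 0-3 online and cpu 0 designated, taking cpu 0 offline moves
the duty to cpu 1; taking any other cpu offline changes nothing.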

Thoughts?

Cheers,
Don
