Re: [PATCH 1/5] x86, nmi: Add new nmi type 'external'

From: Don Zickus <dzickus@redhat.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: x86@kernel.org, "Peter Zijlstra" <peterz@infradead.org>,
	ak@linux.intel.com, gong.chen@linux.intel.com,
	LKML <linux-kernel@vger.kernel.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Frédéric Weisbecker" <fweisbec@gmail.com>,
	"Steven Rostedt" <rostedt@goodmis.org>
Subject: Re: [PATCH 1/5] x86, nmi:  Add new nmi type 'external'
Date: Wed, 7 May 2014 12:02:51 -0400
Message-ID: <20140507160251.GQ39568@redhat.com>
In-Reply-To: <20140507153854.GA14926@gmail.com>

On Wed, May 07, 2014 at 05:38:54PM +0200, Ingo Molnar wrote:
> 
> * Don Zickus <dzickus@redhat.com> wrote:
> 
> > I noticed when debugging a perf problem on a machine with GHES enabled,
> > perf seemed slow.  I then realized that the GHES NMI routine was taking
> > a global lock all the time to inspect the hardware.  This contended
> > with all the local perf counters which did not need a lock.  So each cpu
> > accidentally was synchronizing with itself when using perf.
> > 
> > This is because the way the nmi handler works.  It executes all the handlers
> > registered to a particular subtype (to deal with nmi sharing).  As a result
> > the GHES handler was executed on every PMI.
> > 
> > Fix this by creating a new nmi type called NMI_EXT, which is used by
> > handlers that need to probe external hardware and require a global lock
> > to do so.
> > 
> > Now the main NMI handler can check the internal NMI handlers first and
> > then the external ones if nothing is found.
> > 
> > This makes perf a little faster again on those machines with GHES enabled.
> 
> So what happens if GHES asserts an NMI at the same time a PMI 
> triggers?
> 
> If the perf PMI executes and indicates that it has handled something, 
> we don't execute the GHES handler, right? Will the GHES re-trigger the 
> NMI after we return?

In my head, I had thought they would be queued up and things would work out
fine.  But I guess in theory, if a PMI NMI comes in and, before the cpu can
accept it, a GHES NMI comes in too, then the GHES one may get dropped.  That
would not be good.  Though the race window would be very small.
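
To make the race concrete, here is a rough user-space sketch of the two-pass
dispatch next to the old run-everything dispatch.  It is not the kernel code:
every name in it (nmi_class, dispatch_two_pass, dispatch_all, raise_events,
the pending flags) is made up for illustration, and it assumes the PMI and
the GHES event end up being delivered as a single NMI.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the internal and external handler lists. */
enum nmi_class { NMI_CLASS_LOCAL, NMI_CLASS_EXT, NMI_CLASS_MAX };

typedef int (*nmi_handler_fn)(void);

/* Simulated hardware state: which sources currently have an event pending. */
static bool pmi_pending;
static bool ghes_pending;
static int ghes_lock_taken;	/* how often the "global lock" path was entered */

/* Local handler: no lock needed, just checks its own state. */
static int pmi_handler(void)
{
	if (!pmi_pending)
		return 0;
	pmi_pending = false;
	return 1;
}

/* External handler: would take a global lock to probe the hardware. */
static int ghes_handler(void)
{
	ghes_lock_taken++;
	if (!ghes_pending)
		return 0;
	ghes_pending = false;
	return 1;
}

static nmi_handler_fn handlers[NMI_CLASS_MAX][2] = {
	[NMI_CLASS_LOCAL] = { pmi_handler, NULL },
	[NMI_CLASS_EXT]   = { ghes_handler, NULL },
};

/* Run every handler registered for one class, return how many claimed it. */
static int run_class(enum nmi_class c)
{
	int handled = 0;

	for (size_t i = 0; i < 2; i++)
		if (handlers[c][i])
			handled += handlers[c][i]();
	return handled;
}

/* Two-pass scheme: external handlers only run if no local one claimed it. */
int dispatch_two_pass(void)
{
	int handled = run_class(NMI_CLASS_LOCAL);

	if (handled)
		return handled;
	return run_class(NMI_CLASS_EXT);
}

/* Old scheme: every handler runs on every NMI. */
int dispatch_all(void)
{
	return run_class(NMI_CLASS_LOCAL) + run_class(NMI_CLASS_EXT);
}

/* Set up the simulated sources before delivering one NMI. */
void raise_events(bool pmi, bool ghes)
{
	pmi_pending = pmi;
	ghes_pending = ghes;
	ghes_lock_taken = 0;
}

bool ghes_still_pending(void)
{
	return ghes_pending;
}
```

With both sources raised, dispatch_two_pass() stops after the PMI handler, so
the GHES event stays pending with nobody looking at it, while dispatch_all()
services both at the cost of entering the locked path on every NMI.  That is
the performance versus reliable delivery trade-off in miniature.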

I don't have a good idea how to handle that.

On the flip side, we have the same exact problem, today, with the other
common external NMIs (SERR, IO).  If a PCI SERR comes in at the same time
as a PMI, then it gets dropped.  Worse, it doesn't get re-enabled and
blocks future SERRs (just found this out two weeks ago because of a dirty
perf status register on boot).

Again, I don't have a solution to juggle between PMI performance and
reliable delivery.  We could do away with the spinlocks and go back to
single cpu delivery (like it used to be), then devise a mechanism to
switch delivery to another cpu upon hotplug.

Thoughts?

Cheers,
Don