Hi,

> From: Borislav Petkov [mailto:bp@alien8.de]
> Sent: Wednesday, April 29, 2015 4:14 PM
> Subject: Re: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader
> 
> On Wed, Apr 29, 2015 at 12:49:59AM +0000, Zheng, Lv wrote:
> > > > We absolutely want to use atomic_add_unless() because it saves us
> > > > the expensive
> > > >
> > > > 	LOCK; CMPXCHG
> > > >
> > > > if the value was already 1. Which is exactly what this patch is trying
> > > > to avoid - a thundering herd of cores CMPXCHGing a global variable.
> > >
> > > IMO, on most architectures, the "cmp" part should behave just like the explicit "if" check you've added.
> > > And on some architectures, if the "xchg" doesn't happen, the "cmp" part won't even cause a pipeline hazard.
> 
> Even if CMPXCHG is being split into several microops, they all still
> need to flow down the pipe and require resources and tracking. And you
> only know at retire time what the CMP result is and can "discard" the
> XCHG part. Provided the uarch is smart enough to do that.
> 
> This is probably why CMPXCHG needs 5,6,7,10,22,... cycles depending on
> uarch and vendor, if I can trust Agner Fog's tables. And I bet those
> numbers are best-case only and in real life they probably turn out
> even worse.
> 
> CMP needs only 1. On almost every uarch and vendor. And even that cycle
> probably gets hidden with a good branch predictor.

Is there any comparable cycle data for LL/SC (e.g. on MIPS)?

> 
> > If you mean the LOCK prefix, I understand now.
> 
> And that makes it several times worse: 22, 40, 80, ... cycles.

I'm OK with that then, as long as the code stays readable.
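
To check that I read the point correctly, here is a minimal user-space
sketch in C11 of the "plain compare first, locked CMPXCHG only if needed"
pattern. It is not the patch itself: try_become_reader() and
release_reader() are made-up names, and atomic_int only stands in for the
kernel's atomic_t. As far as I understand, atomic_add_unless(v, 1, 1)
gives the same effect because it reads the value before attempting the
cmpxchg.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns true for exactly one caller until release_reader() is called.
 */
static bool try_become_reader(atomic_int *v)
{
	int expected = 0;

	/* Cheap path: plain load + compare, no locked instruction issued. */
	if (atomic_load_explicit(v, memory_order_relaxed) != 0)
		return false;

	/* Slow path: only callers that observed 0 pay for the locked CMPXCHG. */
	return atomic_compare_exchange_strong(v, &expected, 1);
}

/* Hypothetical helper: let the next caller become the reader. */
static void release_reader(atomic_int *v)
{
	atomic_store(v, 0);
}
```

With this shape, cores that arrive while the flag is already 1 only do a
load and a compare on what is normally a shared cache line, instead of
each one issuing a locked read-modify-write on the global variable.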

Thanks and best regards
-Lv
