Hi,

> From: Borislav Petkov [mailto:bp@alien8.de]
> Sent: Wednesday, April 29, 2015 4:14 PM
> Subject: Re: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader
>
> On Wed, Apr 29, 2015 at 12:49:59AM +0000, Zheng, Lv wrote:
> > > > We absolutely want to use atomic_add_unless() because we get to save us
> > > > the expensive
> > > >
> > > >     LOCK; CMPXCHG
> > > >
> > > > if the value was already 1. Which is exactly what this patch is trying
> > > > to avoid - a thundering herd of cores CMPXCHGing a global variable.
> > >
> > > IMO, on most architectures, the "cmp" part should work just like what you've done with "if".
> > > And on some architectures, if the "xchg" doesn't happen, the "cmp" part won't even cause a pipeline hazard.
>
> Even if CMPXCHG is being split into several microops, they all still
> need to flow down the pipe and require resources and tracking. And you
> only know at retire time what the CMP result is and can "discard" the
> XCHG part. Provided the uarch is smart enough to do that.
>
> This is probably why CMPXCHG needs 5,6,7,10,22,... cycles depending on
> uarch and vendor, if I can trust Agner Fog's tables. And I bet those
> numbers are best-case only and in real-life they probably tend to fall
> out even worse.
>
> CMP needs only 1. On almost every uarch and vendor. And even that cycle
> probably gets hidden with a good branch predictor.

Is there any comparable cycle data for LL/SC (MIPS)?

> > If you mean the LOCK prefix, I understand now.
>
> And that makes several times worse: 22, 40, 80, ... cycles.

I'm OK with it then, as long as the code stays readable.

Thanks and best regards
-Lv

> --
> Regards/Gruss,
>     Boris.
>
> ECO tip #101: Trim your mails when you reply.
> --
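
For readers following the thread, here is a rough userspace sketch of the "check first, CMPXCHG only if needed" idea discussed above. It is not the kernel's atomic_add_unless() implementation and not the code from the patch; it uses C11 <stdatomic.h>, and the names add_unless(), in_nmi_sketch, try_enter() and leave() are made up for illustration. The point it tries to show: when the counter already holds the "unless" value, the function returns after a plain load and never issues the locked compare-exchange.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace analogue of an "add unless" operation:
 * add 'a' to '*v' unless '*v' equals 'u'. Returns true if the add
 * happened. When '*v' is already 'u', only a load is performed and
 * the compare-exchange is skipped entirely.
 */
static bool add_unless(atomic_int *v, int a, int u)
{
	int cur = atomic_load_explicit(v, memory_order_relaxed);

	while (cur != u) {
		/* On failure, 'cur' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &cur, cur + a))
			return true;
	}
	return false;
}

/* Illustrative flag, assumed to play the role of a "reader active" marker. */
static atomic_int in_nmi_sketch;

/* Only the caller that moves the flag from 0 to 1 gets to proceed. */
static bool try_enter(void)
{
	return add_unless(&in_nmi_sketch, 1, 1);
}

/* Release the flag so that a later caller can enter. */
static void leave(void)
{
	atomic_store(&in_nmi_sketch, 0);
}
```

With this shape, every caller that arrives while the flag is 1 pays for a load and a compare, not for a locked read-modify-write on a shared cache line. Whether the saving matters on a given uarch is exactly the question debated above.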