All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.ibm.com>
To: Junchang Wang <junchangwang@gmail.com>
Cc: Akira Yokosawa <akiyks@gmail.com>, perfbook@vger.kernel.org
Subject: Re: [Possible BUG] count_lim_atomic.c fails on POWER8
Date: Thu, 25 Oct 2018 08:24:50 -0700	[thread overview]
Message-ID: <20181025152450.GS4170@linux.ibm.com> (raw)
In-Reply-To: <CABoNC81fnRn3H3wB+Zx7WG7QPw7qLa13pVcUzxQ-zdX3vtaxuA@mail.gmail.com>

On Thu, Oct 25, 2018 at 10:09:22PM +0800, Junchang Wang wrote:
> On Thu, Oct 25, 2018 at 5:45 PM Paul E. McKenney <paulmck@linux.ibm.com> wrote:
> >
> > On Thu, Oct 25, 2018 at 10:11:18AM +0800, Junchang Wang wrote:
> > > Hi Akira,
> > >
> > > Thanks for the mail. My understanding is that PPC uses LL/SC to
> > > emulate CAS by using a tiny loop. Unfortunately, the LL/SC loop itself
> > > could fail (due to, for example, context switches) even if *ptr equals
> > > to old. In such a case, a CAS instruction in actually should return a
> > > success. I think this is what the term "spurious fail" describes. Here
> > > is a reference:
> > > http://liblfds.org/mediawiki/index.php?title=Article:CAS_and_LL/SC_Implementation_Details_by_Processor_family
> >
> > First, thank you both for your work on this!  And yes, my cmpxchg() code
> > is clearly quite broken.
> >
> > > It seems that __atomic_compare_exchange_n() provides option "weak" for
> > > performance. I tested these two solutions and got the following
> > > results:
> > >
> > >                            1      4      8     16     32    64
> > > my patch (ns)     35    34    37    73    142  281
> > > strong (ns)          39    39    41    79    158  313
> >
> > So strong is a bit slower, correct?
> >
> > > I tested the performance of count_lim_atomic by varying the number of
> > > updaters (./count_lim_atomic N uperf) on a 8-core PPC server. The
> > > first row in the table is the result when my patch is used, and the
> > > second row is the result when the 4th argument of the function is set
> > > to false(0). It seems performance improves slightly if option "weak"
> > > is used. However, there is no performance boost as we expected. So
> > > your solution sounds good if safety is one major concern because
> > > option "weak" seems risky to me :-)
> > >
> > > Another interesting observation is that the performance of LL/SC-based
> > > CAS instruction deteriorates dramatically when the number of working
> > > threads exceeds the number of CPU cores.
> >
> > If weak is faster, would it make sense to return (~o), that is,
> > the bitwise complement of the expected arguement, when the weak
> > __atomic_compare_exchange_n() fails?  This would get the improved
> > performance (if I understand your results above) while correctly handling
> > the strange (but possible) case where o==n.
> >
> > Does that make sense, or am I missing something?
> 
> Hi Paul and Akira,
> 
> Yes, the weak version is faster. The solution looks good. But when I
> tried to use the following patch
> 
> #define cmpxchg(ptr, o, n) \
> ({ \
>         typeof(*ptr) old = (o); \
>         (__atomic_compare_exchange_n(ptr, (void *)&old, (n), 1,
> __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))? \
>                                 (o) : (~o); \
> })
> 
> gcc complains of my use of complement symbol
> 
> ../api.h:769:12: error: wrong type argument to bit-complement
>      (o) : (~o); \
>               ^
> 
> Any suggestions?

You might need to do this for the macro argument: "(~(o))".

Another possibility is ((o) + 1), which would work for pointers as well
as for integers.

							Thanx, Paul


      parent reply	other threads:[~2018-10-25 23:58 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-20 15:53 [Possible BUG] count_lim_atomic.c fails on POWER8 Akira Yokosawa
2018-10-20 16:36 ` Paul E. McKenney
2018-10-24 15:53   ` Junchang Wang
2018-10-24 22:05     ` Akira Yokosawa
2018-10-24 22:29       ` Akira Yokosawa
2018-10-25  2:18         ` Junchang Wang
2018-10-25  2:11       ` Junchang Wang
2018-10-25  9:45         ` Paul E. McKenney
2018-10-25 12:23           ` Akira Yokosawa
2018-10-25 14:09           ` Junchang Wang
2018-10-25 15:17             ` Akira Yokosawa
2018-10-25 22:04               ` Akira Yokosawa
2018-10-26  0:58                 ` Junchang Wang
2018-10-27 14:56                   ` Akira Yokosawa
     [not found]                     ` <20181028001723.GJ4170@linux.ibm.com>
2018-10-28 12:08                       ` Junchang Wang
2018-10-28 13:19                         ` Paul E. McKenney
2018-10-28 13:22                         ` Akira Yokosawa
2018-10-28 14:24                       ` Akira Yokosawa
2018-10-28 16:43                         ` Paul E. McKenney
2018-10-29 14:45                           ` Akira Yokosawa
2018-10-29 15:30                             ` Paul E. McKenney
2018-10-26  1:12                 ` Akira Yokosawa
2018-10-26 11:34                   ` Akira Yokosawa
2018-10-26 16:06                     ` Junchang Wang
2018-10-25 15:24             ` Paul E. McKenney [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181025152450.GS4170@linux.ibm.com \
    --to=paulmck@linux.ibm.com \
    --cc=akiyks@gmail.com \
    --cc=junchangwang@gmail.com \
    --cc=perfbook@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.