linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Richard Henderson <rth@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew MacLeod <amacleod@redhat.com>,
	paulmck@linux.vnet.ibm.com, Torvald Riegel <triegel@redhat.com>,
	Jan Kara <jack@suse.cz>, LKML <linux-kernel@vger.kernel.org>,
	linux-ia64@vger.kernel.org, dsterba@suse.cz, ptesarik@suse.cz,
	rguenther@suse.de, gcc@gcc.gnu.org
Subject: Re: Memory corruption due to word sharing
Date: Fri, 10 Feb 2012 11:27:34 -0800	[thread overview]
Message-ID: <4F356FA6.4000104@redhat.com> (raw)
In-Reply-To: <CA+55aFxoW0zexd4Zy33gToK9V0UU6wbQFFogEyFPPiGuNq2GGQ@mail.gmail.com>

On 02/03/2012 12:00 PM, Linus Torvalds wrote:
>    do {
>      load-link %r,%m
>      if (r == value)
>          return 0;
>      add
>    } while (store-conditional %r,%m)
>    return 1;
> 
> and it is used to implement two *very* common (and critical)
> reference-counting use cases:
> 
>  - decrement ref-count locklessly, and if it hits free, take a lock
> atomically (ie "it would be a bug for anybody to ever see it decrement
> to zero while holding the lock")
> 
>  - increment ref-count locklessly, but if we race with the final free,
> don't touch it, because it's gone (ie "if somebody decremented it to
> zero while holding the lock, we absolutely cannot increment it again")
> 
> They may sound odd, but those two operations are absolutely critical
> for most lock-less refcount management. And taking locks for reference
> counting is absolutely death to performance, and is often impossible
> anyway (the lock may be a local lock that is *inside* the structure
> that is being reference-counted, so if the refcount is zero, then you
> cannot even look at the lock!)
> 
> In the above example, the load-locked -> store-conditional would
> obviously be a cmpxchg loop instead on architectures that do cmpxchg
> instead of ll/sc.
> 
> Of course, it you expose some intrinsic for the whole "ll/sc" model
> (and you then turn it into cmpxchg on demand), we could literally
> open-code it.

We can't expose the ll/sc model directly, due to various target-specific
hardware constraints (e.g. no other memory references, i.e. no spills).
But the "weak" compare-exchange is sorta-kinda close.

A "weak" compare-exchange is a ll/sc pair, without the loop for restart
on sc failure.  Thus a weak compare-exchange can fail for arbitrary reasons.
You do, however, get back the current value in the memory, so you can loop
back yourself and try again.

So that loop above becomes the familiar expression as for CAS:

  r = *m;
  do {
    if (r == value)
      return;
    n = r + inc;
  } while (!atomic_compare_exchange(m, &r, n, /*weak=*/true,
				    __ATOMIC_RELAXED, __ATOMIC_RELAXED));

which, unlike with the current __sync_val_compare_and_swap, does not
result in a nested loop for the ll/sc architectures.

Yes, the ll/sc architectures can technically do slightly better by writing
this as inline asm, but the gain is significantly less than before.  Given
that the line must be in L1 cache already, the redundant load from M in
the ll insn should be nearly negligible.

Of course for architectures that implement cas, there is no difference
between "weak" and "strong".


r~

  parent reply	other threads:[~2012-02-10 19:35 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-02-01 15:19 Memory corruption due to word sharing Jan Kara
2012-02-01 15:34 ` Markus Trippelsdorf
2012-02-01 16:37 ` Colin Walters
2012-02-01 16:56   ` Linus Torvalds
2012-02-01 17:11     ` Jiri Kosina
2012-02-01 17:37       ` Linus Torvalds
2012-02-01 17:41       ` Michael Matz
2012-02-01 18:09         ` David Miller
2012-02-01 18:45           ` Jeff Law
2012-02-01 19:09             ` Linus Torvalds
2012-02-02 15:51               ` Jeff Garzik
2012-02-01 18:57           ` Linus Torvalds
2012-02-01 19:04           ` Peter Bergner
2012-02-01 18:52         ` Linus Torvalds
2012-02-02  9:35           ` Richard Guenther
2012-02-02  9:37           ` Richard Guenther
2012-02-02 13:43           ` Michael Matz
2012-02-01 16:41 ` Linus Torvalds
2012-02-01 17:42   ` Torvald Riegel
2012-02-01 19:40     ` Jakub Jelinek
2012-02-01 20:01       ` Linus Torvalds
2012-02-01 20:16         ` Jakub Jelinek
2012-02-01 20:44           ` Linus Torvalds
2012-02-02 15:58             ` Aldy Hernandez
2012-02-02 16:28               ` Michael Matz
2012-02-02 17:51                 ` Linus Torvalds
2012-02-01 20:19         ` Linus Torvalds
2012-02-02  9:46           ` Richard Guenther
2012-02-01 19:44     ` Boehm, Hans
2012-02-01 19:54       ` Jeff Law
2012-02-01 19:47     ` Linus Torvalds
2012-02-01 19:58       ` Alan Cox
2012-02-01 20:41       ` Torvald Riegel
2012-02-01 20:59         ` Linus Torvalds
2012-02-01 21:24           ` Torvald Riegel
2012-02-01 21:55             ` Linus Torvalds
2012-02-01 21:25           ` Boehm, Hans
2012-02-01 22:27             ` Linus Torvalds
2012-02-01 22:45           ` Paul E. McKenney
2012-02-01 23:11             ` Linus Torvalds
2012-02-02 18:42               ` Paul E. McKenney
2012-02-02 19:08                 ` Linus Torvalds
2012-02-02 19:37                   ` Paul E. McKenney
2012-02-03 16:38                     ` Andrew MacLeod
2012-02-03 17:16                       ` Linus Torvalds
2012-02-03 19:16                         ` Andrew MacLeod
2012-02-03 20:00                           ` Linus Torvalds
2012-02-03 20:19                             ` Paul E. McKenney
2012-02-06 15:38                             ` Torvald Riegel
2012-02-10 19:27                             ` Richard Henderson [this message]
2012-02-02 11:19           ` Ingo Molnar
2012-02-01 21:04       ` Boehm, Hans
2012-02-02  9:28         ` Bernd Petrovitsch
2012-02-01 17:08 ` Torvald Riegel
2012-02-01 17:29   ` Linus Torvalds
2012-02-01 20:53     ` Torvald Riegel
2012-02-01 21:20       ` Linus Torvalds
2012-02-01 21:37         ` Torvald Riegel
2012-02-01 22:18           ` Boehm, Hans
2012-02-02 11:11 ` James Courtier-Dutton
2012-02-02 11:24   ` Richard Guenther
2012-02-02 11:13 ` David Sterba
2012-02-02 11:23   ` Richard Guenther
2012-02-03  6:45 ` DJ Delorie
2012-02-03  9:37   ` Richard Guenther
2012-02-03 10:03     ` Matthew Gretton-Dann
2012-02-01 17:52 Dennis Clarke

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4F356FA6.4000104@redhat.com \
    --to=rth@redhat.com \
    --cc=amacleod@redhat.com \
    --cc=dsterba@suse.cz \
    --cc=gcc@gcc.gnu.org \
    --cc=jack@suse.cz \
    --cc=linux-ia64@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=ptesarik@suse.cz \
    --cc=rguenther@suse.de \
    --cc=torvalds@linux-foundation.org \
    --cc=triegel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).