linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Dave Hansen <dave.hansen@intel.com>
Cc: "Chen, Tim C" <tim.c.chen@intel.com>,
	Ingo Molnar <mingo@redhat.com>, Davidlohr Bueso <dbueso@suse.de>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Jason Low <jason.low2@hp.com>,
	Michel Lespinasse <walken@google.com>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Waiman Long <waiman.long@hp.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: performance delta after VFS i_mutex=>i_rwsem conversion
Date: Mon, 6 Jun 2016 13:46:23 -0700	[thread overview]
Message-ID: <CA+55aFxH_7wjo_BgUPK5iomWedE2=DaUZVX-yruHOWEk7OTiHQ@mail.gmail.com> (raw)
In-Reply-To: <5755D671.9070908@intel.com>

On Mon, Jun 6, 2016 at 1:00 PM, Dave Hansen <dave.hansen@intel.com> wrote:
>
> I tracked this down to the differences between:
>
>         rwsem_spin_on_owner() - false roughly 1% of the time
>         mutex_spin_on_owner() - false roughly 0.05% of the time
>
> The optimistic rwsem and mutex code look quite similar, but there is one
> big difference: a hunk of code in rwsem_spin_on_owner() stops the
> spinning for rwsems, but isn't present for mutexes in any form:
>
>>         if (READ_ONCE(sem->owner))
>>                 return true; /* new owner, continue spinning */
>>
>>         /*
>>          * When the owner is not set, the lock could be free or
>>          * held by readers. Check the counter to verify the
>>          * state.
>>          */
>>         count = READ_ONCE(sem->count);
>>         return (count == 0 || count == RWSEM_WAITING_BIAS);
>
> If I hack this out, I end up with:
>
>         d9171b9(mutex-original): 689179
>         9902af7(rwsem-hacked  ): 671706 (-2.5%)
>
> I think it's safe to say that this accounts for the majority of the
> difference in behavior.

So my gut feel is that we do want to have the same heuristics for
rwsems and mutexes (well, modulo possible actual semantic differences
due to the whole shared-vs-exclusive issues).

And I also suspect that the mutexes have gotten a lot more performance
tuning done on them, so it's likely the correct thing to try to make
the rwsem match the mutex code rather than the other way around.

I think we had Jason and Davidlohr do mutex work last year, let's see
if they agree on that "yes, the mutex case is the likely more tuned
case" feeling.

The fact that your performance improves when you do that obviously
then also validates the assumption that the mutex spinning is the
better optimized one.

> So, as it stands today in 4.7-rc1, mutexes end up yielding higher
> performance under contention.  But, they don't let them system go very
> idle, even under heavy contention, which seems rather wrong.  Should we
> be making rwsems spin more, or mutexes spin less?

I think performance is what matters. The fact that it performs better
with spinning is a big mark for spinning more.

Being idle under load is _not_ something we should see as a good
thing. Yes, yes, it would be lower power, but lock contention is *not*
a low-power load. Being slow under lock contention just tends to make
for more lock contention, and trying to increase idle time is almost
certainly the wrong thing to do.

Spinning behavior tends to have a secondary advantage too: it is a
hell of a lot nicer to do performance analysis on. So if you get lock
contention on real loads (as opposed to some extreme
unlink-microbenchmark), I think a lot of people will be happier seeing
the spinning behavior just because it helps pinpoint the problem in
ways idling does not.

So I think everything points to: "make rwsems do the same thing
mutexes do". But I'll let it locking maintainers pipe up. Peter? Ingo?

              Linus

  reply	other threads:[~2016-06-06 20:46 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-06 20:00 performance delta after VFS i_mutex=>i_rwsem conversion Dave Hansen
2016-06-06 20:46 ` Linus Torvalds [this message]
2016-06-06 21:13   ` Waiman Long
2016-06-06 21:20     ` Linus Torvalds
2016-06-07  3:22       ` Valdis.Kletnieks
2016-06-07 15:22         ` Waiman Long
2016-06-08  8:58     ` Ingo Molnar
2016-06-09 10:25       ` Ingo Molnar
2016-06-09 18:14         ` Dave Hansen
2016-06-09 20:10           ` Chen, Tim C
2016-06-06 21:15   ` Al Viro
2016-06-06 21:46     ` Linus Torvalds
2016-06-06 22:07       ` Al Viro
2016-06-06 23:50         ` Linus Torvalds
2016-06-06 23:59           ` Linus Torvalds
2016-06-07  0:29             ` Linus Torvalds
2016-06-07  0:40           ` Al Viro
2016-06-07  0:44             ` Al Viro
2016-06-07  0:58             ` Al Viro
2016-06-07  0:58             ` Linus Torvalds
2016-06-07  1:19               ` Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CA+55aFxH_7wjo_BgUPK5iomWedE2=DaUZVX-yruHOWEk7OTiHQ@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=dave.hansen@intel.com \
    --cc=dbueso@suse.de \
    --cc=jason.low2@hp.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=tim.c.chen@intel.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=waiman.long@hp.com \
    --cc=walken@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).