From: Dave Hansen <dave.hansen@intel.com>
To: "Chen, Tim C" <tim.c.chen@intel.com>,
	Ingo Molnar <mingo@redhat.com>, Davidlohr Bueso <dbueso@suse.de>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Jason Low <jason.low2@hp.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Michel Lespinasse <walken@google.com>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Waiman Long <waiman.long@hp.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	LKML <linux-kernel@vger.kernel.org>
Subject: performance delta after VFS i_mutex=>i_rwsem conversion
Date: Mon, 6 Jun 2016 13:00:49 -0700
Message-ID: <5755D671.9070908@intel.com>

tl;dr: Mutexes spin more than rwsems, which makes mutexes perform
better when contending ->i_rwsem.  But, mutexes do this at the cost of
not sleeping much, even with tons of contention.

Should we do something to keep rwsem and mutex performance close to each
other?  If so, should mutexes be sleeping more or rwsems sleeping less?

---

I was doing some performance comparisons between 4.5 and 4.7 and noticed
a weird "blip" in unlink[1] performance where it got worse between 4.5
and 4.7:

	https://www.sr71.net/~dave/intel/rwsem-vs-mutex.png

There are two things to notice here:
1. Although the "1" (4.5) and "2" (4.7) lines nearly converge at high
   cpu counts, 4.5 outperforms 4.7 by almost 2x at low contention.
2. 4.7 has lots of idle time.  4.5 eats lots of cpu spinning.

That was a 160-thread Westmere (~4 years old) system, but I moved to a
smaller system (4 cores, Skylake) for more focused testing, and there
bisected the regression down to the i_mutex-to-rwsem switch.  From here
on out, I tested two commits:

	[d9171b9] 1 commit before 9902af7
	[9902af7] "parallel lookups: actual switch to rwsem"

unlink takes the rwsem for write, so it should see the same level of
contention as the mutex did.  But, at 4 threads, the unlink performance was:

	d9171b9(mutex): 689179
	9902af7(rwsem): 498325 (-27.7%)
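
(For reference, the conversion did not change the call sites;
inode_lock() simply went from wrapping i_mutex to taking the write
side of the rwsem.  Roughly, from 4.7's include/linux/fs.h:)

	static inline void inode_lock(struct inode *inode)
	{
		down_write(&inode->i_rwsem);
	}

	static inline void inode_unlock(struct inode *inode)
	{
		up_write(&inode->i_rwsem);
	}

So 4 unlinking threads are 4 writers pounding the parent directory's
i_rwsem, the same way they pounded i_mutex before.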

I tracked this down to the differences between:

	rwsem_spin_on_owner() - false roughly 1% of the time
	mutex_spin_on_owner() - false roughly 0.05% of the time

The optimistic rwsem and mutex code look quite similar, but there is one
big difference: a hunk of code in rwsem_spin_on_owner() stops the
spinning for rwsems, but isn't present for mutexes in any form:

>         if (READ_ONCE(sem->owner))
>                 return true; /* new owner, continue spinning */
> 
>         /*
>          * When the owner is not set, the lock could be free or
>          * held by readers. Check the counter to verify the
>          * state.
>          */
>         count = READ_ONCE(sem->count); 
>         return (count == 0 || count == RWSEM_WAITING_BIAS);
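
For contrast, the mutex spin loop has no such bail-out: when the owner
goes away, it just retries the acquisition and keeps spinning.  Trimmed
down from 4.7's mutex_optimistic_spin() (ww_mutex and the RT-task
check omitted):

	while (true) {
		struct task_struct *owner;

		/* If there's an owner, wait for it to release or sleep. */
		owner = READ_ONCE(lock->owner);
		if (owner && !mutex_spin_on_owner(lock, owner))
			break;	/* owner scheduled out; go sleep */

		/* No (visible) owner: try to take the lock right now. */
		if (mutex_try_to_acquire(lock)) {
			mutex_set_owner(lock);
			osq_unlock(&lock->osq);
			return true;
		}

		/* Didn't get it; spin some more. */
		cpu_relax_lowlatency();
	}

There is no "could be held by readers" case for a mutex, so a NULL
owner never makes it give up.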

If I hack this out, I end up with:

	d9171b9(mutex-original): 689179
	9902af7(rwsem-hacked  ): 671706 (-2.5%)
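
(The hack is nothing clever; roughly, it replaces the count check with
the unconditional answer the mutex code would give:

	-	count = READ_ONCE(sem->count);
	-	return (count == 0 || count == RWSEM_WAITING_BIAS);
	+	return true; /* keep spinning, like mutexes do */

Note this is a diagnostic hack only: it keeps spinning even when the
lock is held by readers who aren't going anywhere.)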

I think it's safe to say that this accounts for the majority of the
difference in behavior.  It also shows that the _actual_
i_mutex=>i_rwsem conversion is the issue here, and not some other part
of the parallel lookup patches.

So, as it stands today in 4.7-rc1, mutexes end up yielding higher
performance under contention.  But they don't let the system go very
idle, even under heavy contention, which seems rather wrong.  Should we
be making rwsems spin more, or mutexes spin less?

--

1. The test in question here is creating/unlinking many files in the
same directory:
https://github.com/antonblanchard/will-it-scale/blob/master/tests/unlink1.c
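
The per-thread loop in that test boils down to something like this
(paraphrased, not verbatim; "tmpdir" stands in for the test's temp
directory):

	char file[PATH_MAX];

	/* every thread's file lives in the same parent directory */
	sprintf(file, "%s/unlink1_%d", tmpdir, getpid());

	while (1) {
		int fd = open(file, O_RDWR | O_CREAT, 0600);
		close(fd);
		unlink(file);	/* write-locks the shared parent */
		(*iterations)++;
	}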
