* [PATCH RFC 0/3] mutex: Improve mutex performance by doing less atomic-ops & spinning
@ 2013-04-04 14:54 Waiman Long
From: Waiman Long @ 2013-04-04 14:54 UTC
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Paul E. McKenney,
	David Howells, Dave Jones, Clark Williams, Peter Zijlstra
  Cc: Davidlohr Bueso, Waiman Long, linux-kernel, Chandramouleeswaran, Aswin

This patch set collects 3 mutex-related patches aimed at improving
mutex performance, especially on systems with a large number of CPUs.
This is achieved by doing fewer atomic operations and less mutex
spinning (when CONFIG_MUTEX_SPIN_ON_OWNER is on).

The first patch reduces the number of atomic operations executed. It
can produce a dramatic performance improvement in the AIM7 benchmark
on systems with a large number of CPUs. For example, there was a more
than 3X improvement in the high_systime workload with a 3.7.10 kernel
on an 8-socket x86-64 system with 80 cores. The 3.8 kernels, on the
other hand, are no longer mutex limited for that workload, so the
performance improvement there is only about 1%.
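
As an illustration of what "fewer atomic operations" can mean, here is
a minimal userspace sketch. It is not the code from patch 1. It assumes
the general test-before-atomic idea: read the lock word with a plain
load and attempt the atomic operation only when the lock looks free.
All names (sketch_mutex, sketch_mutex_trylock, ...) are hypothetical,
and C11 <stdatomic.h> stands in for the kernel's atomic primitives.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace lock word: 1 = unlocked, 0 = locked. */
struct sketch_mutex {
	atomic_int count;
};

static void sketch_mutex_init(struct sketch_mutex *m)
{
	atomic_init(&m->count, 1);
}

/*
 * Try to take the lock. The relaxed load can be satisfied from a
 * shared copy of the cache line. The compare-and-swap, which needs
 * exclusive ownership of the line, is attempted only when the lock
 * looks free.
 */
static bool sketch_mutex_trylock(struct sketch_mutex *m)
{
	int expected = 1;

	if (atomic_load_explicit(&m->count, memory_order_relaxed) != 1)
		return false;	/* looks held: skip the atomic operation */

	return atomic_compare_exchange_strong_explicit(
		&m->count, &expected, 0,
		memory_order_acquire, memory_order_relaxed);
}

static void sketch_mutex_unlock(struct sketch_mutex *m)
{
	atomic_store_explicit(&m->count, 1, memory_order_release);
}
```

With many CPUs contending, the waiters that fail the plain read never
issue the compare-and-swap, so they do not pull the cache line away
from the lock holder.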

Patches 2 and 3 represent different ways to reduce mutex spinning. Of
the two, patch 3 is better, both from a performance perspective and
because it needs no change to the mutex data structure. See the
individual patch descriptions for more information on those patches.
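
To make the "one spinning task per mutex" idea concrete, here is a
rough userspace sketch. It is not taken from patch 2. It assumes the
restriction is enforced with one extra flag per mutex, which is the
kind of data structure change mentioned above. The names spin_gate,
has_spinner, spin_gate_enter and spin_gate_leave are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical extra per-mutex field; the name is not from the patch. */
struct spin_gate {
	atomic_bool has_spinner;
};

/* Returns true if the caller became the single allowed spinner. */
static bool spin_gate_enter(struct spin_gate *g)
{
	bool expected = false;

	/* Cheap read first, so tasks that lose do not issue an atomic op. */
	if (atomic_load_explicit(&g->has_spinner, memory_order_relaxed))
		return false;

	return atomic_compare_exchange_strong_explicit(
		&g->has_spinner, &expected, true,
		memory_order_acquire, memory_order_relaxed);
}

/* Called by the spinner when it gets the lock or gives up spinning. */
static void spin_gate_leave(struct spin_gate *g)
{
	atomic_store_explicit(&g->has_spinner, false, memory_order_release);
}
```

In this sketch a task that fails spin_gate_enter() would skip the
spinning phase and go to sleep on the wait queue instead.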

The table below shows the performance impact on the AIM7 benchmark with
a 3.8.5 kernel running on the same 8-socket system mentioned above:

+--------------+------------------------------------------------------+
|   Workload   |              Mean % Change 10-100 users              |
|              +-----------------+-----------------+------------------+
|              |   Patches 1+2   |   Patches 1+3   | Relative %Change |
+--------------+-----------------+-----------------+------------------+
| fserver      |     +1.7%       |      0.0%       |     -1.7%        |
| new_fserver  |     -0.2%       |     -1.5%       |     -1.2%        |
+--------------+-----------------+-----------------+------------------+
|   Workload   |             Mean % Change 100-1000 users             |
|              +-----------------+-----------------+------------------+
|              |   Patches 1+2   |   Patches 1+3   | Relative %Change |
+--------------+-----------------+-----------------+------------------+
| fserver      |    +18.6%       |    +43.4%       |    +21.0%        |
| new_fserver  |    +14.0%       |    +23.4%       |     +8.2%        |
+--------------+-----------------+-----------------+------------------+
|   Workload   |             Mean % Change 1100-2000 users            |
|              +-----------------+-----------------+------------------+
|              |   Patches 1+2   |   Patches 1+3   | Relative %Change |
+--------------+-----------------+-----------------+------------------+
| fserver      |    +11.6%       |     +5.1%       |     -5.8%        |
| new_fserver  |    +13.3%       |     +7.6%       |     -5.0%        |
+--------------+-----------------+-----------------+------------------+

So patch 2 is better at low (10-100 users) and high (1100-2000 users)
load, while patch 3 is better at intermediate (100-1000 users) load.
For other AIM7 workloads, patch 3 is generally better.
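
The Relative %Change column in the table above is not defined in this
message. As an assumption, it looks consistent with the change of
"Patches 1+3" relative to "Patches 1+2", both being measured against
the same unpatched baseline. A small sketch of that arithmetic, with a
hypothetical helper name:

```c
/*
 * Relative change of B versus A, in percent, where pct_a and pct_b are
 * percentage changes against a common baseline.
 * Assumption: this is how the Relative %Change column was derived.
 */
static double relative_pct_change(double pct_a, double pct_b)
{
	return ((100.0 + pct_b) / (100.0 + pct_a) - 1.0) * 100.0;
}
```

For the fserver row at 1100-2000 users, relative_pct_change(11.6, 5.1)
gives about -5.8, which matches the table. Other rows agree to within
roughly 0.1 percentage point, as would be expected if the column was
computed from unrounded data.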

Waiman Long (3):
  mutex: Make more scalable by doing less atomic operations
  mutex: restrict mutex spinning to only one task per mutex
  mutex: dynamically disable mutex spinning at high load

 arch/x86/include/asm/mutex.h |   16 ++++++++++++++++
 include/linux/mutex.h        |    3 +++
 kernel/mutex.c               |   21 ++++++++++++++++++---
 kernel/mutex.h               |    8 ++++++++
 kernel/sched/core.c          |   22 ++++++++++++++++++++++
 5 files changed, 67 insertions(+), 3 deletions(-)



Thread overview: 18+ messages
2013-04-04 14:54 [PATCH RFC 0/3] mutex: Improve mutex performance by doing less atomic-ops & spinning Waiman Long
2013-04-04 14:54 ` [PATCH RFC 1/3] mutex: Make more scalable by doing less atomic operations Waiman Long
2013-04-08 12:42   ` Ingo Molnar
2013-04-08 14:38     ` Linus Torvalds
2013-04-08 15:09       ` Ingo Molnar
2013-04-08 17:53       ` Waiman Long
2013-04-10 10:31         ` Ingo Molnar
2013-04-10 15:52           ` Waiman Long
2013-04-10 17:16             ` Ingo Molnar
2013-04-10 21:26               ` Waiman Long
2013-04-11  9:07                 ` Ingo Molnar
2013-04-10 14:09       ` Robin Holt
2013-04-10 15:46         ` Linus Torvalds
2013-04-08 17:42     ` Waiman Long
2013-04-10 10:28       ` Ingo Molnar
2013-04-10 15:47         ` Waiman Long
2013-04-04 14:54 ` [PATCH RFC 2/3] mutex: restrict mutex spinning to only one task per mutex Waiman Long
2013-04-04 14:54 ` [PATCH RFC 3/3] mutex: dynamically disable mutex spinning at high load Waiman Long
