* [PATCH tip/locking/core v10 0/7] locking/qspinlock: Enhance qspinlock & pvqspinlock performance
@ 2015-11-10  0:09 Waiman Long
  2015-11-10  0:09 ` [PATCH tip/locking/core v10 1/7] locking/qspinlock: Use _acquire/_release versions of cmpxchg & xchg Waiman Long
                   ` (6 more replies)
  0 siblings, 7 replies; 19+ messages in thread
From: Waiman Long @ 2015-11-10  0:09 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, H. Peter Anvin
  Cc: x86, linux-kernel, Scott J Norton, Douglas Hatch,
	Davidlohr Bueso, Waiman Long

v9->v10:
 - Broke patch 2 into two separate patches (suggested by PeterZ).
 - Changed the slowpath statistical counter code back to using debugfs
   while keeping the per-cpu counter setup.
 - Made some minor tweaks and added comments to the lock stealing
   and adaptive spinning patches.

v8->v9:
 - Added a new patch 2 that prefetches the cacheline of the next
   MCS node in order to reduce the MCS unlock latency.
 - Changed the slowpath statistical counters implementation in patch
   4 from atomic_t to per-cpu variables to reduce performance overhead
   and used sysfs instead of debugfs to return the consolidated counts
   and data.

v7->v8:
 - Annotated the use of each _acquire/_release variants in qspinlock.c.
 - Used the available pending bit in the lock stealing patch to disable
   lock stealing when the queue head vCPU is actively spinning on the
   lock to avoid lock starvation.
 - Restructured the lock stealing patch to reduce code duplication.
 - Verified that the waitcnt processing will be compiled away if
   QUEUED_LOCK_STAT isn't enabled.

v6->v7:
 - Removed arch/x86/include/asm/qspinlock.h from patch 1.
 - Removed the unconditional PV kick patch as it has been merged
   into tip.
 - Changed the pvstat_inc() API to add a new condition parameter.
 - Added comments and rearranged code in patch 4 to clarify where
   lock stealing happens.
 - In patch 5, removed the check for pv_wait count when deciding when
   to wait early.
 - Updated copyrights and email address.

v5->v6:
 - Added a new patch 1 to relax the cmpxchg and xchg operations in
   the native code path to reduce performance overhead on non-x86
   architectures.
 - Updated the unconditional PV kick patch as suggested by PeterZ.
 - Added a new patch to allow one lock stealing attempt at slowpath
   entry point to reduce performance penalty due to lock waiter
   preemption.
 - Removed the pending bit and kick-ahead patches as they didn't show
   any noticeable performance improvement on top of the lock stealing
   patch.
 - Simplified the adaptive spinning patch as the lock stealing patch
   allows more aggressive pv_wait() without much performance penalty
   in non-overcommitted VMs.

v4->v5:
 - Rebased the patch to the latest tip tree.
 - Corrected the comments and commit log for patch 1.
 - Removed the v4 patch 5 as PV kick deferment is no longer needed with
   the new tip tree.
 - Simplified the adaptive spinning patch (patch 6) & improved its
   performance a bit further.
 - Re-ran the benchmark test with the new patch.

v3->v4:
 - Patch 1: added a comment about a possible race condition in PV unlock.
 - Patch 2: simplified the pv_pending_lock() function as suggested by
   Davidlohr.
 - Moved the PV unlock optimization patch forward to patch 4 & reran
   the performance tests.

v2->v3:
 - Moved the deferred kicking enablement patch forward & moved back
   the kick-ahead patch to make the effect of kick-ahead more visible.
 - Reworked patch 6 to make it more readable.
 - Reverted to using state as a tri-state variable instead of
   adding an additional binary state variable.
 - Added performance data for different values of PV_KICK_AHEAD_MAX.
 - Added a new patch to optimize PV unlock code path performance.

v1->v2:
 - Took out the queued unfair lock patches.
 - Added a patch to simplify the PV unlock code.
 - Moved the pending bit and statistics collection patches to the front.
 - Kept vCPU kicking in pv_kick_node(), but deferred it to unlock time
   when appropriate.
 - Changed the wait-early patch to use adaptive spinning to better
   balance the different effects on normal and over-committed guests.
 - Added patch-to-patch performance changes to the patch commit logs.

This patchset tries to improve the performance of both regular and
over-committed VM guests. The adaptive spinning patch was inspired
by the "Do Virtual Machines Really Scale?" blog post from Sanidhya Kashyap.

Patch 1 relaxes the memory ordering of atomic operations by using the
less restrictive _acquire and _release variants of cmpxchg() and
xchg(). This will reduce performance overhead when qspinlock is ported
to other non-x86 architectures with weaker memory models.
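In userspace C11 terms, the distinction can be sketched as follows. This is only an illustration of the acquire/release orderings that the kernel's cmpxchg_acquire()/xchg_release() roughly correspond to, not the kernel code itself:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative sketch: on x86 these compile to the same LOCK-prefixed
 * instructions as fully ordered variants, but on arm64/powerpc the
 * weaker orderings avoid full memory barriers. */
static _Atomic unsigned int lock_val;   /* 0 = free, 1 = locked */

static bool trylock_acquire(void)
{
	unsigned int expected = 0;

	/* Acquire on success: accesses in the critical section cannot
	 * be reordered before the lock acquisition. */
	return atomic_compare_exchange_strong_explicit(&lock_val, &expected, 1,
						       memory_order_acquire,
						       memory_order_relaxed);
}

static void unlock_release(void)
{
	/* Release: critical-section accesses stay before the store. */
	atomic_store_explicit(&lock_val, 0, memory_order_release);
}
```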

Patch 2 attempts to prefetch the cacheline of the next MCS node to
reduce latency in the MCS unlock operation.
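The idea, roughly sketched below with GCC's __builtin_prefetch() standing in for the kernel's prefetchw(); the structure and function names here are illustrative, not the exact kernel code:

```c
#include <assert.h>
#include <stddef.h>

struct mcs_node {
	struct mcs_node *next;
	int locked;
};

/* Sketch of the prefetch idea: once a successor is known to have queued
 * behind us, issue a write prefetch for its node so that the eventual
 * store to next->locked in the unlock path hits a warm cacheline. */
static void spin_with_prefetch(struct mcs_node *node)
{
	struct mcs_node *next = node->next;

	if (next)
		__builtin_prefetch(next, 1 /* prefetch for write */);

	/* ... keep spinning for the lock; by the time it is handed to
	 * 'next', that node's cacheline is hopefully already local ... */
}
```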

Patch 3 removes a redundant read of the next pointer.
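Schematically, the change amounts to caching an already-loaded value instead of dereferencing node->next a second time; the names below are illustrative, not the kernel's exact code:

```c
#include <assert.h>
#include <stddef.h>

struct mcs_node {
	struct mcs_node *next;
	int locked;
};

/* Sketch: read node->next once and reuse the cached pointer, rather
 * than re-reading a (possibly cache-cold, concurrently updated) field. */
static struct mcs_node *hand_off(struct mcs_node *node)
{
	struct mcs_node *next = node->next;	/* single read */

	if (next)
		next->locked = 1;	/* reuse 'next', not node->next */
	return next;
}
```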

Patch 4 optimizes the PV unlock code path performance for x86-64
architecture.

Patch 5 allows the collection of various slowpath statistics counter
data that are useful to see what is happening in the system. Per-cpu
counters are used to minimize performance overhead.
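A minimal userspace sketch of the per-cpu counter scheme (array slots stand in for real per-cpu variables; counter names are made up for illustration):

```c
#include <assert.h>

#define NR_CPUS 8

enum qstat { QSTAT_LOCK_STEAL, QSTAT_PV_WAIT, QSTAT_NR };

/* Each CPU increments only its own slot, so the hot path touches no
 * shared cacheline and needs no atomics (in the kernel, preemption is
 * disabled around the increment). The read side sums all slots. */
static unsigned long qstats[NR_CPUS][QSTAT_NR];

static void qstat_inc(int cpu, enum qstat stat)
{
	qstats[cpu][stat]++;	/* cheap, contention-free fast path */
}

static unsigned long qstat_read(enum qstat stat)
{
	unsigned long sum = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		sum += qstats[cpu][stat];	/* consolidate on demand */
	return sum;
}
```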

Patch 6 allows one lock stealing attempt at slowpath entry. This causes
a pretty big performance improvement for over-committed VM guests.
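The shape of the change can be sketched as follows, with C11 atomics standing in for the kernel's lock word operations (names are illustrative):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static _Atomic unsigned int lock_word;	/* 0 = free, 1 = locked */

/* One trylock attempt made on slowpath entry, before queueing. */
static bool steal_lock(void)
{
	unsigned int old = 0;

	return atomic_compare_exchange_strong(&lock_word, &old, 1);
}

/* Sketch of limited lock stealing: a running vCPU entering the slowpath
 * may grab a momentarily free lock instead of queueing behind a possibly
 * preempted queue head, avoiding a long stall for everyone behind it. */
static void slowpath_entry(void)
{
	if (steal_lock())
		return;		/* stole the lock, skip queueing entirely */
	/* ... otherwise queue as an MCS node as usual ... */
}
```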

Patch 7 enables adaptive spinning in the queue nodes. This patch
leads to a further performance improvement in over-committed guests,
though it is not as big as that of the previous patch.
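The spinning policy can be sketched roughly as below; the vCPU state field and threshold value are illustrative stand-ins for the pv_node state and spin count used by the real patch:

```c
#include <assert.h>
#include <stdbool.h>

#define SPIN_THRESHOLD (1 << 10)	/* illustrative spin budget */

enum { VCPU_RUNNING, VCPU_HALTED };

struct pv_node {
	int state;	/* VCPU_RUNNING or VCPU_HALTED */
};

/* Sketch of adaptive spinning: a queued node keeps spinning only while
 * its predecessor vCPU looks runnable. Once the predecessor is halted
 * (likely preempted), further spinning just burns cycles, so the node
 * should call pv_wait() early instead. */
static bool should_keep_spinning(const struct pv_node *prev, int loops)
{
	if (loops >= SPIN_THRESHOLD)
		return false;			/* spin budget exhausted */
	return prev->state == VCPU_RUNNING;	/* wait early if halted */
}
```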

Waiman Long (7):
  locking/qspinlock: Use _acquire/_release versions of cmpxchg & xchg
  locking/qspinlock: prefetch next node cacheline
  locking/qspinlock: Avoid redundant read of next pointer
  locking/pvqspinlock, x86: Optimize PV unlock code path
  locking/pvqspinlock: Collect slowpath lock statistics
  locking/pvqspinlock: Allow limited lock stealing
  locking/pvqspinlock: Queue node adaptive spinning

 arch/x86/Kconfig                          |    8 +
 arch/x86/include/asm/qspinlock_paravirt.h |   59 ++++++
 include/asm-generic/qspinlock.h           |    9 +-
 kernel/locking/qspinlock.c                |   90 +++++++--
 kernel/locking/qspinlock_paravirt.h       |  252 +++++++++++++++++++++----
 kernel/locking/qspinlock_stat.h           |  293 +++++++++++++++++++++++++++++
 6 files changed, 648 insertions(+), 63 deletions(-)
 create mode 100644 kernel/locking/qspinlock_stat.h



Thread overview: 19+ messages
2015-11-10  0:09 [PATCH tip/locking/core v10 0/7] locking/qspinlock: Enhance qspinlock & pvqspinlock performance Waiman Long
2015-11-10  0:09 ` [PATCH tip/locking/core v10 1/7] locking/qspinlock: Use _acquire/_release versions of cmpxchg & xchg Waiman Long
2015-11-23 16:26   ` [tip:locking/core] locking/qspinlock: Use _acquire/_release() versions of cmpxchg() & xchg() tip-bot for Waiman Long
2015-11-10  0:09 ` [PATCH tip/locking/core v10 2/7] locking/qspinlock: prefetch next node cacheline Waiman Long
2015-11-23 16:27   ` [tip:locking/core] locking/qspinlock: Prefetch the " tip-bot for Waiman Long
2015-11-10  0:09 ` [PATCH tip/locking/core v10 3/7] locking/qspinlock: Avoid redundant read of next pointer Waiman Long
2015-11-23 16:27   ` [tip:locking/core] " tip-bot for Waiman Long
2015-11-10  0:09 ` [PATCH tip/locking/core v10 4/7] locking/pvqspinlock, x86: Optimize PV unlock code path Waiman Long
2015-11-23 16:27   ` [tip:locking/core] locking/pvqspinlock, x86: Optimize the " tip-bot for Waiman Long
2015-11-10  0:09 ` [PATCH tip/locking/core v10 5/7] locking/pvqspinlock: Collect slowpath lock statistics Waiman Long
2015-11-23  9:51   ` Peter Zijlstra
2015-11-25 19:08     ` Waiman Long
2015-12-04 12:00   ` [tip:locking/core] " tip-bot for Waiman Long
2015-11-10  0:09 ` [PATCH tip/locking/core v10 6/7] locking/pvqspinlock: Allow limited lock stealing Waiman Long
2015-11-10 16:03   ` Peter Zijlstra
2015-11-10 19:46     ` Waiman Long
2015-11-10 21:07       ` Peter Zijlstra
2015-11-10  0:09 ` [PATCH tip/locking/core v10 7/7] locking/pvqspinlock: Queue node adaptive spinning Waiman Long
2015-12-04 12:00   ` [tip:locking/core] " tip-bot for Waiman Long
