* [PATCH v2 0/5] locking/lockdep: Improve lockdep performance
@ 2018-10-02 20:19 Waiman Long
From: Waiman Long @ 2018-10-02 20:19 UTC
  To: Peter Zijlstra, Ingo Molnar, Will Deacon; +Cc: linux-kernel, Waiman Long

 v1->v2:
  - Minor tweaks to incorporate Ingo's comments.
  - Move class->ops from the lock_class structure to percpu array under
    CONFIG_DEBUG_LOCKDEP. That moves the increased memory consumption
    to CONFIG_DEBUG_LOCKDEP only.

Enabling CONFIG_LOCKDEP and other related debug options will greatly
reduce system performance. This patchset aims to reduce the performance
slowdown caused by the lockdep code.

Patch 1 just removes an inline function that wasn't used.

Patches 2 and 3 are minor tweaks to optimize the code.

Patch 4 makes class->ops a per-cpu counter and moves the stat counter
under CONFIG_DEBUG_LOCKDEP again.
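
As a rough illustration of the per-cpu counter idea in patch 4, here is a
user-space sketch, not the patch's code. Each CPU increments only its own
slot, so the hot path does not write to a counter shared by all CPUs, and
a reader sums the slots when the total is needed. The names class_ops,
class_ops_inc(), class_ops_read() and the two array sizes are hypothetical
and chosen only for this example; the kernel would use its own per-cpu
infrastructure instead of a plain two-dimensional array.

```c
#include <assert.h>

#define NR_CPUS_SKETCH    4	/* hypothetical CPU count */
#define NR_CLASSES_SKETCH 8	/* hypothetical number of lock classes */

/* One row of counters per CPU; a CPU only ever writes to its own row. */
static unsigned long class_ops[NR_CPUS_SKETCH][NR_CLASSES_SKETCH];

/* Count one operation on lock class idx, as seen from the given CPU. */
void class_ops_inc(int cpu, int idx)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	assert(idx >= 0 && idx < NR_CLASSES_SKETCH);
	class_ops[cpu][idx]++;
}

/* Report the total for a class by summing every CPU's private slot. */
unsigned long class_ops_read(int idx)
{
	unsigned long sum = 0;
	int cpu;

	assert(idx >= 0 && idx < NR_CLASSES_SKETCH);
	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		sum += class_ops[cpu][idx];
	return sum;
}
```

The trade-off in this sketch is that reads become more expensive (one
addition per CPU) while increments stay local, which suits a statistic
that is updated often and read rarely.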

Patch 5 moves the lock_release() call outside of the lock critical 
section.
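
The reordering in patch 5 can be pictured with a toy user-space
spinlock. This is only a sketch under stated assumptions: toy_lock,
toy_track_release() and the two unlock helpers are hypothetical stand-ins
for the real unlock paths and for lock_release(), and the plain bool
flags exist only to make the ordering observable in a single-threaded
demonstration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy spinlock plus a flag recording whether it is currently held. */
static atomic_flag toy_lock = ATOMIC_FLAG_INIT;
static bool toy_lock_held;

/* Set by the bookkeeping hook: was the lock still held when it ran? */
static bool tracked_while_held;

void toy_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&toy_lock,
						 memory_order_acquire))
		;	/* spin until the flag was previously clear */
	toy_lock_held = true;
}

/* Hypothetical stand-in for the debug bookkeeping done on release. */
static void toy_track_release(void)
{
	tracked_while_held = toy_lock_held;
}

/* Old ordering: the bookkeeping runs inside the critical section. */
bool toy_unlock_track_first(void)
{
	assert(toy_lock_held);
	toy_track_release();
	toy_lock_held = false;
	atomic_flag_clear_explicit(&toy_lock, memory_order_release);
	return tracked_while_held;
}

/* New ordering: drop the lock first, then do the bookkeeping. */
bool toy_unlock_release_first(void)
{
	assert(toy_lock_held);
	toy_lock_held = false;
	atomic_flag_clear_explicit(&toy_lock, memory_order_release);
	toy_track_release();
	return tracked_while_held;
}
```

In the first helper the bookkeeping cost is paid while other waiters are
still blocked; in the second it is paid after the lock is already
available to them, which shortens the lock hold time.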

Parallel kernel compilation tests (make -j <#cpu>, best of 3 runs)
with gcc8 were performed on 2 different systems:

 1) a 1-socket 22-core 44-thread Skylake system
 2) a 4-socket 72-core 144-thread Broadwell system

Four different kernel variants based on the 4.19-rc5 kernel were used:

 1) non-debug kernel (with minimal debug options enabled)
 2) pre-patch debug kernel  (CONFIG_LOCKDEP, !CONFIG_DEBUG_LOCKDEP)
 3) post-patch debug kernel (CONFIG_LOCKDEP, !CONFIG_DEBUG_LOCKDEP)
 4) post-patch debug kernel (CONFIG_LOCKDEP,  CONFIG_DEBUG_LOCKDEP)

Note that the debug kernels had more debug options enabled than just
LOCKDEP.

The build times for the four kernel variants were:

   System    Kernel 1    Kernel 2    Kernel 3    Kernel 4
   ------    --------    --------    --------    --------
  1-socket    6m06.0s     8m54.7s     8m34.9s     9m28.1s
  4-socket    4m09.2s     7m36.0s     5m38.8s     6m17.8s

Using the non-debug kernel build times as the baseline, the percentage
runtime increases of the other 3 kernel variants were:

   System    Kernel 2    Kernel 3    Kernel 4
   ------    --------    --------    --------    
  1-socket    +46.1%      +40.7%      +55.2%
  4-socket    +83.0%      +36.0%      +51.6%

Comparing just kernels 2 and 3, the patchset reduced the build times
by 3.7% and 25.7% on the 1-socket and 4-socket systems, respectively.
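
For anyone who wants to re-derive the percentages from the build-time
table, the arithmetic can be sketched as below. The helper names
to_seconds(), pct_increase() and pct_reduction() are hypothetical and
only used for this example.

```c
#include <assert.h>

/* Convert a build time given as minutes and seconds to seconds. */
double to_seconds(int minutes, double seconds)
{
	return minutes * 60.0 + seconds;
}

/* Percentage by which time t exceeds the baseline time base. */
double pct_increase(double base, double t)
{
	assert(base > 0.0);
	return (t - base) / base * 100.0;
}

/* Percentage by which the time drops going from before to after. */
double pct_reduction(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}
```

For example, pct_reduction(to_seconds(7, 36.0), to_seconds(5, 38.8))
evaluates to roughly 25.7, matching the 4-socket figure, and
pct_increase(to_seconds(6, 6.0), to_seconds(8, 54.7)) to roughly 46.1,
matching the 1-socket Kernel 2 entry.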

I think the last 2 patches yield most of the performance improvement.

Waiman Long (5):
  locking/lockdep: Remove add_chain_cache_classes()
  locking/lockdep: Eliminate redundant irqs check in __lock_acquire()
  locking/lockdep: Add a faster path in __lock_release()
  locking/lockdep: Make class->ops a percpu counter
  locking/lockdep: Call lock_release() after releasing the lock

 include/linux/lockdep.h            |   7 +-
 include/linux/rwlock_api_smp.h     |  16 ++--
 include/linux/spinlock_api_smp.h   |   8 +-
 kernel/locking/lockdep.c           | 113 ++++++++---------------------
 kernel/locking/lockdep_internals.h |  23 ++++++
 kernel/locking/lockdep_proc.c      |   2 +-
 6 files changed, 66 insertions(+), 103 deletions(-)

-- 
2.18.0



Thread overview: 17+ messages
2018-10-02 20:19 [PATCH v2 0/5] locking/lockdep: Improve lockdep performance Waiman Long
2018-10-02 20:19 ` [PATCH v2 1/5] locking/lockdep: Remove add_chain_cache_classes() Waiman Long
2018-10-03  7:30   ` [tip:locking/core] " tip-bot for Waiman Long
2018-10-02 20:19 ` [PATCH v2 2/5] locking/lockdep: Eliminate redundant irqs check in __lock_acquire() Waiman Long
2018-10-03  7:31   ` [tip:locking/core] locking/lockdep: Eliminate redundant IRQs " tip-bot for Waiman Long
2018-10-02 20:19 ` [PATCH v2 3/5] locking/lockdep: Add a faster path in __lock_release() Waiman Long
2018-10-03  7:31   ` [tip:locking/core] " tip-bot for Waiman Long
2018-10-02 20:19 ` [PATCH v2 4/5] locking/lockdep: Make class->ops a percpu counter Waiman Long
2018-10-03  7:48   ` Peter Zijlstra
2018-10-03  7:54     ` Ingo Molnar
2018-10-03  8:24       ` Peter Zijlstra
2018-10-03 13:57     ` Waiman Long
2018-10-03 17:07       ` Waiman Long
2018-10-04 10:14         ` Ingo Molnar
2018-10-04 13:05           ` Waiman Long
2018-10-09 10:54         ` [tip:locking/core] locking/lockdep: Make class->ops a percpu counter and move it under CONFIG_DEBUG_LOCKDEP=y tip-bot for Waiman Long
2018-10-02 20:19 ` [PATCH v2 5/5] locking/lockdep: Call lock_release() after releasing the lock Waiman Long
