* [PATCH v2 0/5] locking/lockdep: Improve lockdep performance
@ 2018-10-02 20:19 Waiman Long
  2018-10-02 20:19 ` [PATCH v2 1/5] locking/lockdep: Remove add_chain_cache_classes() Waiman Long
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Waiman Long @ 2018-10-02 20:19 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon; +Cc: linux-kernel, Waiman Long

 v1->v2:
  - Minor tweaks to incorporate Ingo's comments.
  - Move class->ops from the lock_class structure to percpu array under
    CONFIG_DEBUG_LOCKDEP. That moves the increased memory consumption
    to CONFIG_DEBUG_LOCKDEP only.

Enabling CONFIG_LOCKDEP and other related debug options will greatly
reduce system performance. This patchset aims to reduce the performance
slowdown caused by the lockdep code.

Patch 1 just removes an inline function that wasn't used.

Patches 2 and 3 are minor tweaks that optimize the code.

Patch 4 makes class->ops a per-cpu counter and moves the stat counter
under CONFIG_DEBUG_LOCKDEP again.

Patch 5 moves the lock_release() call outside of the lock critical 
section.

Parallel kernel compilation tests (make -j <#cpu>, best of 3 runs)
with gcc8 were performed on 2 different systems:

 1) a 1-socket 22-core 44-thread Skylake system
 2) a 4-socket 72-core 144-thread Broadwell system

Four different kernel variants based on the 4.19-rc5 kernel were used:

 1) non-debug kernel (with minimal debug options enabled)
 2) pre-patch debug kernel  (CONFIG_LOCKDEP, !CONFIG_DEBUG_LOCKDEP)
 3) post-patch debug kernel (CONFIG_LOCKDEP, !CONFIG_DEBUG_LOCKDEP)
 4) post-patch debug kernel (CONFIG_LOCKDEP,  CONFIG_DEBUG_LOCKDEP)

Note that the debug kernels had more debug options enabled than just
LOCKDEP.

The build times of all 4 kernel variants were:

   System    Kernel 1    Kernel 2    Kernel 3    Kernel 4
   ------    --------    --------    --------    --------
  1-socket    6m06.0s     8m54.7s     8m34.9s     9m28.1s
  4-socket    4m09.2s     7m36.0s     5m38.8s     6m17.8s

Using the non-debug kernel execution times as the baseline, the %
runtime increases of the other 3 kernel variants were:

   System    Kernel 2    Kernel 3    Kernel 4
   ------    --------    --------    --------    
  1-socket    +46.1%      +40.7%      +55.2%
  4-socket    +83.0%      +36.0%      +51.6%

Comparing just kernels 2 and 3, the patchset reduced the execution times
by 3.7% and 25.7% for the 1-socket and 4-socket systems respectively.
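
For reference, the percentages follow directly from the tables above:
8m54.7s is 534.7s and 6m06.0s is 366.0s, so the 1-socket kernel 2
increase is 534.7/366.0 - 1 = +46.1%, and the kernel 2 to kernel 3
reduction is (534.7 - 514.9)/534.7 = 3.7%. The 4-socket numbers work out
the same way: 456.0/249.2 - 1 = +83.0% and (456.0 - 338.8)/456.0 = 25.7%.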

I think the last 2 patches yield most of the performance improvement.

Waiman Long (5):
  locking/lockdep: Remove add_chain_cache_classes()
  locking/lockdep: Eliminate redundant irqs check in __lock_acquire()
  locking/lockdep: Add a faster path in __lock_release()
  locking/lockdep: Make class->ops a percpu counter
  locking/lockdep: Call lock_release() after releasing the lock

 include/linux/lockdep.h            |   7 +-
 include/linux/rwlock_api_smp.h     |  16 ++--
 include/linux/spinlock_api_smp.h   |   8 +-
 kernel/locking/lockdep.c           | 113 ++++++++---------------------
 kernel/locking/lockdep_internals.h |  23 ++++++
 kernel/locking/lockdep_proc.c      |   2 +-
 6 files changed, 66 insertions(+), 103 deletions(-)

-- 
2.18.0


* [PATCH v2 1/5] locking/lockdep: Remove add_chain_cache_classes()
  2018-10-02 20:19 [PATCH v2 0/5] locking/lockdep: Improve lockdep performance Waiman Long
@ 2018-10-02 20:19 ` Waiman Long
  2018-10-03  7:30   ` [tip:locking/core] " tip-bot for Waiman Long
  2018-10-02 20:19 ` [PATCH v2 2/5] locking/lockdep: Eliminate redundant irqs check in __lock_acquire() Waiman Long
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Waiman Long @ 2018-10-02 20:19 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon; +Cc: linux-kernel, Waiman Long

The inline function add_chain_cache_classes() is defined, but has no
caller. Just remove it.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/lockdep.c | 70 ----------------------------------------
 1 file changed, 70 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index dd13f865ad40..8f9de7cd11ab 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2147,76 +2147,6 @@ static int check_no_collision(struct task_struct *curr,
 	return 1;
 }
 
-/*
- * This is for building a chain between just two different classes,
- * instead of adding a new hlock upon current, which is done by
- * add_chain_cache().
- *
- * This can be called in any context with two classes, while
- * add_chain_cache() must be done within the lock owener's context
- * since it uses hlock which might be racy in another context.
- */
-static inline int add_chain_cache_classes(unsigned int prev,
-					  unsigned int next,
-					  unsigned int irq_context,
-					  u64 chain_key)
-{
-	struct hlist_head *hash_head = chainhashentry(chain_key);
-	struct lock_chain *chain;
-
-	/*
-	 * Allocate a new chain entry from the static array, and add
-	 * it to the hash:
-	 */
-
-	/*
-	 * We might need to take the graph lock, ensure we've got IRQs
-	 * disabled to make this an IRQ-safe lock.. for recursion reasons
-	 * lockdep won't complain about its own locking errors.
-	 */
-	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
-		return 0;
-
-	if (unlikely(nr_lock_chains >= MAX_LOCKDEP_CHAINS)) {
-		if (!debug_locks_off_graph_unlock())
-			return 0;
-
-		print_lockdep_off("BUG: MAX_LOCKDEP_CHAINS too low!");
-		dump_stack();
-		return 0;
-	}
-
-	chain = lock_chains + nr_lock_chains++;
-	chain->chain_key = chain_key;
-	chain->irq_context = irq_context;
-	chain->depth = 2;
-	if (likely(nr_chain_hlocks + chain->depth <= MAX_LOCKDEP_CHAIN_HLOCKS)) {
-		chain->base = nr_chain_hlocks;
-		nr_chain_hlocks += chain->depth;
-		chain_hlocks[chain->base] = prev - 1;
-		chain_hlocks[chain->base + 1] = next -1;
-	}
-#ifdef CONFIG_DEBUG_LOCKDEP
-	/*
-	 * Important for check_no_collision().
-	 */
-	else {
-		if (!debug_locks_off_graph_unlock())
-			return 0;
-
-		print_lockdep_off("BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!");
-		dump_stack();
-		return 0;
-	}
-#endif
-
-	hlist_add_head_rcu(&chain->entry, hash_head);
-	debug_atomic_inc(chain_lookup_misses);
-	inc_chains();
-
-	return 1;
-}
-
 /*
  * Adds a dependency chain into chain hashtable. And must be called with
  * graph_lock held.
-- 
2.18.0


* [PATCH v2 2/5] locking/lockdep: Eliminate redundant irqs check in __lock_acquire()
  2018-10-02 20:19 [PATCH v2 0/5] locking/lockdep: Improve lockdep performance Waiman Long
  2018-10-02 20:19 ` [PATCH v2 1/5] locking/lockdep: Remove add_chain_cache_classes() Waiman Long
@ 2018-10-02 20:19 ` Waiman Long
  2018-10-03  7:31   ` [tip:locking/core] locking/lockdep: Eliminate redundant IRQs " tip-bot for Waiman Long
  2018-10-02 20:19 ` [PATCH v2 3/5] locking/lockdep: Add a faster path in __lock_release() Waiman Long
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Waiman Long @ 2018-10-02 20:19 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon; +Cc: linux-kernel, Waiman Long

The static __lock_acquire() function has only two callers:

 1) lock_acquire()
 2) reacquire_held_locks()

In lock_acquire(), raw_local_irq_save() is called beforehand, so IRQs
must already be disabled. The check

	DEBUG_LOCKS_WARN_ON(!irqs_disabled())

is therefore redundant in this case. Move the check to
reacquire_held_locks() to eliminate the redundant code in the
lock_acquire() path.
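
For reference, a heavily abridged sketch of the lock_acquire() wrapper
(paraphrased from the 4.19-era code, not a verbatim quote) shows why
IRQs are guaranteed to be off by the time __lock_acquire() runs:

void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
		  int trylock, int read, int check,
		  struct lockdep_map *nest_lock, unsigned long ip)
{
	unsigned long flags;

	raw_local_irq_save(flags);	/* IRQs disabled from here on */
	check_flags(flags);

	current->lockdep_recursion = 1;
	__lock_acquire(lock, subclass, trylock, read, check,
		       irqs_disabled_flags(flags), nest_lock, ip, 0, 0);
	current->lockdep_recursion = 0;

	raw_local_irq_restore(flags);	/* IRQs restored only here */
}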

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/lockdep.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 8f9de7cd11ab..bd59163b0550 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -3192,6 +3192,10 @@ static int __lock_is_held(const struct lockdep_map *lock, int read);
 /*
  * This gets called for every mutex_lock*()/spin_lock*() operation.
  * We maintain the dependency maps and validate the locking attempt:
+ *
+ * The callers must make sure that IRQs are disabled before calling it,
+ * otherwise we could get an interrupt which would want to take locks,
+ * which would end up in lockdep again.
  */
 static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 			  int trylock, int read, int check, int hardirqs_off,
@@ -3209,14 +3213,6 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 	if (unlikely(!debug_locks))
 		return 0;
 
-	/*
-	 * Lockdep should run with IRQs disabled, otherwise we could
-	 * get an interrupt which would want to take locks, which would
-	 * end up in lockdep and have you got a head-ache already?
-	 */
-	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
-		return 0;
-
 	if (!prove_locking || lock->key == &__lockdep_no_validate__)
 		check = 0;
 
@@ -3473,6 +3469,9 @@ static int reacquire_held_locks(struct task_struct *curr, unsigned int depth,
 {
 	struct held_lock *hlock;
 
+	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
+		return 0;
+
 	for (hlock = curr->held_locks + idx; idx < depth; idx++, hlock++) {
 		if (!__lock_acquire(hlock->instance,
 				    hlock_class(hlock)->subclass,
-- 
2.18.0


* [PATCH v2 3/5] locking/lockdep: Add a faster path in __lock_release()
  2018-10-02 20:19 [PATCH v2 0/5] locking/lockdep: Improve lockdep performance Waiman Long
  2018-10-02 20:19 ` [PATCH v2 1/5] locking/lockdep: Remove add_chain_cache_classes() Waiman Long
  2018-10-02 20:19 ` [PATCH v2 2/5] locking/lockdep: Eliminate redundant irqs check in __lock_acquire() Waiman Long
@ 2018-10-02 20:19 ` Waiman Long
  2018-10-03  7:31   ` [tip:locking/core] " tip-bot for Waiman Long
  2018-10-02 20:19 ` [PATCH v2 4/5] locking/lockdep: Make class->ops a percpu counter Waiman Long
  2018-10-02 20:19 ` [PATCH v2 5/5] locking/lockdep: Call lock_release() after releasing the lock Waiman Long
  4 siblings, 1 reply; 17+ messages in thread
From: Waiman Long @ 2018-10-02 20:19 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon; +Cc: linux-kernel, Waiman Long

When __lock_release() is called, the most likely unlock scenario is
on the innermost lock in the chain.  In this case, we can skip some of
the checks and provide a faster path to completion.
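
As a self-contained illustration of the idea (a user-space toy with
made-up names, not kernel code): the per-task held-lock records behave
like a stack, so dropping the top entry is a plain pop, while dropping
an inner entry forces every record above it to be replayed, which is
roughly what reacquire_held_locks() does for lockdep:

#include <stdio.h>
#include <string.h>

#define MAX_DEPTH 8

static int held[MAX_DEPTH];		/* stand-in for curr->held_locks[] */
static int depth;			/* stand-in for curr->lockdep_depth */

static void acquire(int lock)
{
	held[depth++] = lock;		/* push */
}

static void release(int i)		/* i = index of the entry being dropped */
{
	int saved[MAX_DEPTH], old_depth = depth;

	memcpy(saved, held, sizeof(held));
	depth = i;			/* truncate the stack at i */

	if (i == old_depth - 1)		/* innermost lock: nothing above it */
		return;			/* fast path, no replay needed */

	for (int j = i + 1; j < old_depth; j++)
		acquire(saved[j]);	/* replay the records above i */
}

int main(void)
{
	acquire(1); acquire(2); acquire(3);
	release(2);			/* fast path: just pops lock 3 */
	release(0);			/* slow path: replays lock 2 on top */
	printf("depth=%d top=%d\n", depth, held[depth - 1]);
	return 0;
}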

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/lockdep.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index bd59163b0550..b7774ff2516f 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -3625,6 +3625,13 @@ __lock_release(struct lockdep_map *lock, int nested, unsigned long ip)
 	curr->lockdep_depth = i;
 	curr->curr_chain_key = hlock->prev_chain_key;
 
+	/*
+	 * The most likely case is when the unlock is on the innermost
+	 * lock. In this case, we are done!
+	 */
+	if (i == depth - 1)
+		return 1;
+
 	if (reacquire_held_locks(curr, depth, i + 1))
 		return 0;
 
@@ -3632,10 +3639,14 @@ __lock_release(struct lockdep_map *lock, int nested, unsigned long ip)
 	 * We had N bottles of beer on the wall, we drank one, but now
 	 * there's not N-1 bottles of beer left on the wall...
 	 */
-	if (DEBUG_LOCKS_WARN_ON(curr->lockdep_depth != depth - 1))
-		return 0;
+	DEBUG_LOCKS_WARN_ON(curr->lockdep_depth != depth-1);
 
-	return 1;
+	/*
+	 * Since reacquire_held_locks() would have called check_chain_key()
+	 * indirectly via __lock_acquire(), we don't need to do it again
+	 * on return.
+	 */
+	return 0;
 }
 
 static int __lock_is_held(const struct lockdep_map *lock, int read)
-- 
2.18.0


* [PATCH v2 4/5] locking/lockdep: Make class->ops a percpu counter
  2018-10-02 20:19 [PATCH v2 0/5] locking/lockdep: Improve lockdep performance Waiman Long
                   ` (2 preceding siblings ...)
  2018-10-02 20:19 ` [PATCH v2 3/5] locking/lockdep: Add a faster path in __lock_release() Waiman Long
@ 2018-10-02 20:19 ` Waiman Long
  2018-10-03  7:48   ` Peter Zijlstra
  2018-10-02 20:19 ` [PATCH v2 5/5] locking/lockdep: Call lock_release() after releasing the lock Waiman Long
  4 siblings, 1 reply; 17+ messages in thread
From: Waiman Long @ 2018-10-02 20:19 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon; +Cc: linux-kernel, Waiman Long

A sizable portion of the CPU cycles spent in __lock_acquire() is used
up by the atomic increment of the class->ops stat counter. By taking it out
from the lock_class structure and changing it to a per-cpu per-lock-class
counter, we can reduce the amount of cacheline contention on the class
structure when multiple CPUs are trying to acquire locks of the same
class simultaneously.

To limit the increase in memory consumption because of the percpu nature
of that counter, it is now put back under the CONFIG_DEBUG_LOCKDEP
config option. So the memory consumption increase will only occur if
CONFIG_DEBUG_LOCKDEP is defined. The lock_class structure, however,
is reduced in size by 16 bytes on 64-bit archs after ops removal and
a minor restructuring of the fields.
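
As a rough user-space analogue of the technique (made-up names and
pthreads instead of per-cpu variables; not the kernel code), each
execution context increments its own cacheline-aligned slot, so the
hot path needs no atomic read-modify-write and no shared cacheline,
and a reader simply sums the slots much like debug_class_ops_read()
walks the possible CPUs:

#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

struct ops_slot {
	volatile unsigned long ops;	/* crude stand-in for this_cpu data */
} __attribute__((aligned(64)));		/* one cacheline per slot */

static struct ops_slot slot[NTHREADS];

static void class_ops_inc(int tid)
{
	slot[tid].ops++;		/* plain increment, no atomic RMW */
}

static unsigned long class_ops_read(void)
{
	unsigned long sum = 0;

	for (int i = 0; i < NTHREADS; i++)	/* like for_each_possible_cpu() */
		sum += slot[i].ops;
	return sum;
}

static void *worker(void *arg)
{
	for (int i = 0; i < 1000000; i++)
		class_ops_inc((int)(long)arg);
	return NULL;
}

int main(void)
{
	pthread_t t[NTHREADS];

	for (long i = 0; i < NTHREADS; i++)
		pthread_create(&t[i], NULL, worker, (void *)i);
	for (int i = 0; i < NTHREADS; i++)
		pthread_join(t[i], NULL);

	printf("ops = %lu\n", class_ops_read());
	return 0;
}

The sum is exact here because it is read only after the writers have
finished; while writers are running it is merely approximate, which is
good enough for a stat counter.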

This patch also fixes a bug in the increment code as the counter is of
the unsigned long type, but atomic_inc() was used to increment it.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 include/linux/lockdep.h            |  7 +------
 kernel/locking/lockdep.c           | 11 ++++++++---
 kernel/locking/lockdep_internals.h | 23 +++++++++++++++++++++++
 kernel/locking/lockdep_proc.c      |  2 +-
 4 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index b0d0b51c4d85..1fd82ff99c65 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -99,13 +99,8 @@ struct lock_class {
 	 */
 	unsigned int			version;
 
-	/*
-	 * Statistics counter:
-	 */
-	unsigned long			ops;
-
-	const char			*name;
 	int				name_version;
+	const char			*name;
 
 #ifdef CONFIG_LOCK_STAT
 	unsigned long			contention_point[LOCKSTAT_POINTS];
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index b7774ff2516f..57e3f153474f 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -138,7 +138,7 @@ static struct lock_list list_entries[MAX_LOCKDEP_ENTRIES];
  * get freed - this significantly simplifies the debugging code.
  */
 unsigned long nr_lock_classes;
-static struct lock_class lock_classes[MAX_LOCKDEP_KEYS];
+struct lock_class lock_classes[MAX_LOCKDEP_KEYS];
 
 static inline struct lock_class *hlock_class(struct held_lock *hlock)
 {
@@ -435,6 +435,7 @@ unsigned int max_lockdep_depth;
  * Various lockdep statistics:
  */
 DEFINE_PER_CPU(struct lockdep_stats, lockdep_stats);
+DEFINE_PER_CPU(unsigned long [MAX_LOCKDEP_KEYS], lock_class_ops);
 #endif
 
 /*
@@ -1391,7 +1392,9 @@ static void print_lock_class_header(struct lock_class *class, int depth)
 
 	printk("%*s->", depth, "");
 	print_lock_name(class);
-	printk(KERN_CONT " ops: %lu", class->ops);
+#ifdef CONFIG_DEBUG_LOCKDEP
+	printk(KERN_CONT " ops: %lu", debug_class_ops_read(class));
+#endif
 	printk(KERN_CONT " {\n");
 
 	for (bit = 0; bit < LOCK_USAGE_STATES; bit++) {
@@ -3226,7 +3229,9 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 		if (!class)
 			return 0;
 	}
-	atomic_inc((atomic_t *)&class->ops);
+
+	debug_class_ops_inc(class);
+
 	if (very_verbose(class)) {
 		printk("\nacquire class [%px] %s", class->key, class->name);
 		if (class->name_version > 1)
diff --git a/kernel/locking/lockdep_internals.h b/kernel/locking/lockdep_internals.h
index d459d624ba2a..536012e81d04 100644
--- a/kernel/locking/lockdep_internals.h
+++ b/kernel/locking/lockdep_internals.h
@@ -155,6 +155,8 @@ struct lockdep_stats {
 };
 
 DECLARE_PER_CPU(struct lockdep_stats, lockdep_stats);
+DECLARE_PER_CPU(unsigned long [MAX_LOCKDEP_KEYS], lock_class_ops);
+extern struct lock_class lock_classes[MAX_LOCKDEP_KEYS];
 
 #define __debug_atomic_inc(ptr)					\
 	this_cpu_inc(lockdep_stats.ptr);
@@ -179,9 +181,30 @@ DECLARE_PER_CPU(struct lockdep_stats, lockdep_stats);
 	}								\
 	__total;							\
 })
+
+static inline void debug_class_ops_inc(struct lock_class *class)
+{
+	int idx;
+
+	idx = class - lock_classes;
+	__this_cpu_inc(lock_class_ops[idx]);
+}
+
+static inline unsigned long debug_class_ops_read(struct lock_class *class)
+{
+	int idx, cpu;
+	unsigned long ops = 0;
+
+	idx = class - lock_classes;
+	for_each_possible_cpu(cpu)
+		ops += per_cpu(lock_class_ops[idx], cpu);
+	return ops;
+}
+
 #else
 # define __debug_atomic_inc(ptr)	do { } while (0)
 # define debug_atomic_inc(ptr)		do { } while (0)
 # define debug_atomic_dec(ptr)		do { } while (0)
 # define debug_atomic_read(ptr)		0
+# define debug_class_ops_inc(ptr)	do { } while (0)
 #endif
diff --git a/kernel/locking/lockdep_proc.c b/kernel/locking/lockdep_proc.c
index 3dd980dfba2d..3d31f9b0059e 100644
--- a/kernel/locking/lockdep_proc.c
+++ b/kernel/locking/lockdep_proc.c
@@ -68,7 +68,7 @@ static int l_show(struct seq_file *m, void *v)
 
 	seq_printf(m, "%p", class->key);
 #ifdef CONFIG_DEBUG_LOCKDEP
-	seq_printf(m, " OPS:%8ld", class->ops);
+	seq_printf(m, " OPS:%8ld", debug_class_ops_read(class));
 #endif
 #ifdef CONFIG_PROVE_LOCKING
 	seq_printf(m, " FD:%5ld", lockdep_count_forward_deps(class));
-- 
2.18.0


* [PATCH v2 5/5] locking/lockdep: Call lock_release() after releasing the lock
  2018-10-02 20:19 [PATCH v2 0/5] locking/lockdep: Improve lockdep performance Waiman Long
                   ` (3 preceding siblings ...)
  2018-10-02 20:19 ` [PATCH v2 4/5] locking/lockdep: Make class->ops a percpu counter Waiman Long
@ 2018-10-02 20:19 ` Waiman Long
  4 siblings, 0 replies; 17+ messages in thread
From: Waiman Long @ 2018-10-02 20:19 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon; +Cc: linux-kernel, Waiman Long

Currently, lock_acquire() is called before acquiring the lock and
lock_release() is called before releasing the lock. As a result,
the execution time of lock_release() is added to the lock hold time
reducing locking throughput, especially for spinlocks and rwlocks which
tend to have a much shorter lock hold time.

As lock_release() does not touch any shared data that is protected by
the lock, it does not actually need to run before the lock is released.
So the lock_release() calls for spinlocks and rwlocks are now postponed
until after the lock has been released.
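
The throughput effect can be sketched with a self-contained user-space
toy (made-up names, a pthread mutex instead of a kernel lock, and a
busy loop standing in for the lockdep work; not kernel code). With
INSIDE set to 1 the simulated bookkeeping extends the hold time seen by
the other thread; with INSIDE set to 0 it runs after the unlock and the
contended loop should finish noticeably faster on a multi-core machine:

#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define INSIDE 1			/* 1: bookkeeping inside the lock, 0: after unlock */
#define ITERS  200000

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static unsigned long counter;		/* the data actually protected by the lock */

static unsigned long bookkeeping(unsigned long x)
{
	for (int i = 0; i < 200; i++)	/* stand-in for the lock_release() work */
		x = x * 2654435761UL + 1;
	return x;
}

static void *worker(void *arg)
{
	unsigned long acc = (unsigned long)arg;

	for (int i = 0; i < ITERS; i++) {
		pthread_mutex_lock(&m);
		counter++;		/* the "real" critical section */
		if (INSIDE)
			acc = bookkeeping(acc);
		pthread_mutex_unlock(&m);
		if (!INSIDE)
			acc = bookkeeping(acc);
	}
	return (void *)acc;		/* keep the busy loop observable */
}

int main(void)
{
	pthread_t t[2];
	struct timespec a, b;

	clock_gettime(CLOCK_MONOTONIC, &a);
	for (long i = 0; i < 2; i++)
		pthread_create(&t[i], NULL, worker, (void *)i);
	for (int i = 0; i < 2; i++)
		pthread_join(t[i], NULL);
	clock_gettime(CLOCK_MONOTONIC, &b);

	printf("counter=%lu  %.3f s\n", counter,
	       (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9);
	return 0;
}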

Signed-off-by: Waiman Long <longman@redhat.com>
---
 include/linux/rwlock_api_smp.h   | 16 ++++++++--------
 include/linux/spinlock_api_smp.h |  8 ++++----
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/linux/rwlock_api_smp.h b/include/linux/rwlock_api_smp.h
index 86ebb4bf9c6e..b026940a0962 100644
--- a/include/linux/rwlock_api_smp.h
+++ b/include/linux/rwlock_api_smp.h
@@ -215,63 +215,63 @@ static inline void __raw_write_lock(rwlock_t *lock)
 
 static inline void __raw_write_unlock(rwlock_t *lock)
 {
-	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	do_raw_write_unlock(lock);
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	preempt_enable();
 }
 
 static inline void __raw_read_unlock(rwlock_t *lock)
 {
-	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	do_raw_read_unlock(lock);
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	preempt_enable();
 }
 
 static inline void
 __raw_read_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
 {
-	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	do_raw_read_unlock(lock);
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	local_irq_restore(flags);
 	preempt_enable();
 }
 
 static inline void __raw_read_unlock_irq(rwlock_t *lock)
 {
-	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	do_raw_read_unlock(lock);
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	local_irq_enable();
 	preempt_enable();
 }
 
 static inline void __raw_read_unlock_bh(rwlock_t *lock)
 {
-	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	do_raw_read_unlock(lock);
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	__local_bh_enable_ip(_RET_IP_, SOFTIRQ_LOCK_OFFSET);
 }
 
 static inline void __raw_write_unlock_irqrestore(rwlock_t *lock,
 					     unsigned long flags)
 {
-	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	do_raw_write_unlock(lock);
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	local_irq_restore(flags);
 	preempt_enable();
 }
 
 static inline void __raw_write_unlock_irq(rwlock_t *lock)
 {
-	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	do_raw_write_unlock(lock);
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	local_irq_enable();
 	preempt_enable();
 }
 
 static inline void __raw_write_unlock_bh(rwlock_t *lock)
 {
-	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	do_raw_write_unlock(lock);
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	__local_bh_enable_ip(_RET_IP_, SOFTIRQ_LOCK_OFFSET);
 }
 
diff --git a/include/linux/spinlock_api_smp.h b/include/linux/spinlock_api_smp.h
index 42dfab89e740..fcb84df0678b 100644
--- a/include/linux/spinlock_api_smp.h
+++ b/include/linux/spinlock_api_smp.h
@@ -147,32 +147,32 @@ static inline void __raw_spin_lock(raw_spinlock_t *lock)
 
 static inline void __raw_spin_unlock(raw_spinlock_t *lock)
 {
-	spin_release(&lock->dep_map, 1, _RET_IP_);
 	do_raw_spin_unlock(lock);
+	spin_release(&lock->dep_map, 1, _RET_IP_);
 	preempt_enable();
 }
 
 static inline void __raw_spin_unlock_irqrestore(raw_spinlock_t *lock,
 					    unsigned long flags)
 {
-	spin_release(&lock->dep_map, 1, _RET_IP_);
 	do_raw_spin_unlock(lock);
+	spin_release(&lock->dep_map, 1, _RET_IP_);
 	local_irq_restore(flags);
 	preempt_enable();
 }
 
 static inline void __raw_spin_unlock_irq(raw_spinlock_t *lock)
 {
-	spin_release(&lock->dep_map, 1, _RET_IP_);
 	do_raw_spin_unlock(lock);
+	spin_release(&lock->dep_map, 1, _RET_IP_);
 	local_irq_enable();
 	preempt_enable();
 }
 
 static inline void __raw_spin_unlock_bh(raw_spinlock_t *lock)
 {
-	spin_release(&lock->dep_map, 1, _RET_IP_);
 	do_raw_spin_unlock(lock);
+	spin_release(&lock->dep_map, 1, _RET_IP_);
 	__local_bh_enable_ip(_RET_IP_, SOFTIRQ_LOCK_OFFSET);
 }
 
-- 
2.18.0


* [tip:locking/core] locking/lockdep: Remove add_chain_cache_classes()
  2018-10-02 20:19 ` [PATCH v2 1/5] locking/lockdep: Remove add_chain_cache_classes() Waiman Long
@ 2018-10-03  7:30   ` tip-bot for Waiman Long
  0 siblings, 0 replies; 17+ messages in thread
From: tip-bot for Waiman Long @ 2018-10-03  7:30 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: peterz, will.deacon, hpa, linux-kernel, longman, torvalds,
	a.p.zijlstra, tglx, mingo

Commit-ID:  44318d5b07be7d7cfe718aa22ea3b2577361a0b5
Gitweb:     https://git.kernel.org/tip/44318d5b07be7d7cfe718aa22ea3b2577361a0b5
Author:     Waiman Long <longman@redhat.com>
AuthorDate: Tue, 2 Oct 2018 16:19:16 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 3 Oct 2018 08:46:02 +0200

locking/lockdep: Remove add_chain_cache_classes()

The inline function add_chain_cache_classes() is defined, but has no
caller. Just remove it.

Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/1538511560-10090-2-git-send-email-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/locking/lockdep.c | 70 ------------------------------------------------
 1 file changed, 70 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index e406c5fdb41e..fa82d55279fe 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2148,76 +2148,6 @@ static int check_no_collision(struct task_struct *curr,
 	return 1;
 }
 
-/*
- * This is for building a chain between just two different classes,
- * instead of adding a new hlock upon current, which is done by
- * add_chain_cache().
- *
- * This can be called in any context with two classes, while
- * add_chain_cache() must be done within the lock owener's context
- * since it uses hlock which might be racy in another context.
- */
-static inline int add_chain_cache_classes(unsigned int prev,
-					  unsigned int next,
-					  unsigned int irq_context,
-					  u64 chain_key)
-{
-	struct hlist_head *hash_head = chainhashentry(chain_key);
-	struct lock_chain *chain;
-
-	/*
-	 * Allocate a new chain entry from the static array, and add
-	 * it to the hash:
-	 */
-
-	/*
-	 * We might need to take the graph lock, ensure we've got IRQs
-	 * disabled to make this an IRQ-safe lock.. for recursion reasons
-	 * lockdep won't complain about its own locking errors.
-	 */
-	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
-		return 0;
-
-	if (unlikely(nr_lock_chains >= MAX_LOCKDEP_CHAINS)) {
-		if (!debug_locks_off_graph_unlock())
-			return 0;
-
-		print_lockdep_off("BUG: MAX_LOCKDEP_CHAINS too low!");
-		dump_stack();
-		return 0;
-	}
-
-	chain = lock_chains + nr_lock_chains++;
-	chain->chain_key = chain_key;
-	chain->irq_context = irq_context;
-	chain->depth = 2;
-	if (likely(nr_chain_hlocks + chain->depth <= MAX_LOCKDEP_CHAIN_HLOCKS)) {
-		chain->base = nr_chain_hlocks;
-		nr_chain_hlocks += chain->depth;
-		chain_hlocks[chain->base] = prev - 1;
-		chain_hlocks[chain->base + 1] = next -1;
-	}
-#ifdef CONFIG_DEBUG_LOCKDEP
-	/*
-	 * Important for check_no_collision().
-	 */
-	else {
-		if (!debug_locks_off_graph_unlock())
-			return 0;
-
-		print_lockdep_off("BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!");
-		dump_stack();
-		return 0;
-	}
-#endif
-
-	hlist_add_head_rcu(&chain->entry, hash_head);
-	debug_atomic_inc(chain_lookup_misses);
-	inc_chains();
-
-	return 1;
-}
-
 /*
  * Adds a dependency chain into chain hashtable. And must be called with
  * graph_lock held.

* [tip:locking/core] locking/lockdep: Eliminate redundant IRQs check in __lock_acquire()
  2018-10-02 20:19 ` [PATCH v2 2/5] locking/lockdep: Eliminate redundant irqs check in __lock_acquire() Waiman Long
@ 2018-10-03  7:31   ` tip-bot for Waiman Long
  0 siblings, 0 replies; 17+ messages in thread
From: tip-bot for Waiman Long @ 2018-10-03  7:31 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: will.deacon, a.p.zijlstra, hpa, mingo, longman, tglx, torvalds,
	linux-kernel, peterz

Commit-ID:  8ee10862476ef8b9e81e5b521205fd5c620b4ffb
Gitweb:     https://git.kernel.org/tip/8ee10862476ef8b9e81e5b521205fd5c620b4ffb
Author:     Waiman Long <longman@redhat.com>
AuthorDate: Tue, 2 Oct 2018 16:19:17 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 3 Oct 2018 08:46:02 +0200

locking/lockdep: Eliminate redundant IRQs check in __lock_acquire()

The static __lock_acquire() function has only two callers:

 1) lock_acquire()
 2) reacquire_held_locks()

In lock_acquire(), raw_local_irq_save() is called beforehand, so IRQs
must already be disabled. The check:

	DEBUG_LOCKS_WARN_ON(!irqs_disabled())

is therefore redundant in this case. Move the check to
reacquire_held_locks() to eliminate the redundant code in the
lock_acquire() path.

Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/1538511560-10090-3-git-send-email-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/locking/lockdep.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index fa82d55279fe..a5d7db558928 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -3193,6 +3193,10 @@ static int __lock_is_held(const struct lockdep_map *lock, int read);
 /*
  * This gets called for every mutex_lock*()/spin_lock*() operation.
  * We maintain the dependency maps and validate the locking attempt:
+ *
+ * The callers must make sure that IRQs are disabled before calling it,
+ * otherwise we could get an interrupt which would want to take locks,
+ * which would end up in lockdep again.
  */
 static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 			  int trylock, int read, int check, int hardirqs_off,
@@ -3210,14 +3214,6 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 	if (unlikely(!debug_locks))
 		return 0;
 
-	/*
-	 * Lockdep should run with IRQs disabled, otherwise we could
-	 * get an interrupt which would want to take locks, which would
-	 * end up in lockdep and have you got a head-ache already?
-	 */
-	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
-		return 0;
-
 	if (!prove_locking || lock->key == &__lockdep_no_validate__)
 		check = 0;
 
@@ -3474,6 +3470,9 @@ static int reacquire_held_locks(struct task_struct *curr, unsigned int depth,
 {
 	struct held_lock *hlock;
 
+	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
+		return 0;
+
 	for (hlock = curr->held_locks + idx; idx < depth; idx++, hlock++) {
 		if (!__lock_acquire(hlock->instance,
 				    hlock_class(hlock)->subclass,

* [tip:locking/core] locking/lockdep: Add a faster path in __lock_release()
  2018-10-02 20:19 ` [PATCH v2 3/5] locking/lockdep: Add a faster path in __lock_release() Waiman Long
@ 2018-10-03  7:31   ` tip-bot for Waiman Long
  0 siblings, 0 replies; 17+ messages in thread
From: tip-bot for Waiman Long @ 2018-10-03  7:31 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: will.deacon, torvalds, tglx, hpa, mingo, longman, linux-kernel,
	a.p.zijlstra, peterz

Commit-ID:  ce52a18db45842f5b992851a552bd7f6acb2241b
Gitweb:     https://git.kernel.org/tip/ce52a18db45842f5b992851a552bd7f6acb2241b
Author:     Waiman Long <longman@redhat.com>
AuthorDate: Tue, 2 Oct 2018 16:19:18 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 3 Oct 2018 08:46:03 +0200

locking/lockdep: Add a faster path in __lock_release()

When __lock_release() is called, the most likely unlock scenario is
on the innermost lock in the chain.  In this case, we can skip some of
the checks and provide a faster path to completion.

Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/1538511560-10090-4-git-send-email-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/locking/lockdep.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index a5d7db558928..511d30f88bce 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -3626,6 +3626,13 @@ __lock_release(struct lockdep_map *lock, int nested, unsigned long ip)
 	curr->lockdep_depth = i;
 	curr->curr_chain_key = hlock->prev_chain_key;
 
+	/*
+	 * The most likely case is when the unlock is on the innermost
+	 * lock. In this case, we are done!
+	 */
+	if (i == depth-1)
+		return 1;
+
 	if (reacquire_held_locks(curr, depth, i + 1))
 		return 0;
 
@@ -3633,10 +3640,14 @@ __lock_release(struct lockdep_map *lock, int nested, unsigned long ip)
 	 * We had N bottles of beer on the wall, we drank one, but now
 	 * there's not N-1 bottles of beer left on the wall...
 	 */
-	if (DEBUG_LOCKS_WARN_ON(curr->lockdep_depth != depth - 1))
-		return 0;
+	DEBUG_LOCKS_WARN_ON(curr->lockdep_depth != depth-1);
 
-	return 1;
+	/*
+	 * Since reacquire_held_locks() would have called check_chain_key()
+	 * indirectly via __lock_acquire(), we don't need to do it again
+	 * on return.
+	 */
+	return 0;
 }
 
 static int __lock_is_held(const struct lockdep_map *lock, int read)

* Re: [PATCH v2 4/5] locking/lockdep: Make class->ops a percpu counter
  2018-10-02 20:19 ` [PATCH v2 4/5] locking/lockdep: Make class->ops a percpu counter Waiman Long
@ 2018-10-03  7:48   ` Peter Zijlstra
  2018-10-03  7:54     ` Ingo Molnar
  2018-10-03 13:57     ` Waiman Long
  0 siblings, 2 replies; 17+ messages in thread
From: Peter Zijlstra @ 2018-10-03  7:48 UTC (permalink / raw)
  To: Waiman Long; +Cc: Ingo Molnar, Will Deacon, linux-kernel

On Tue, Oct 02, 2018 at 04:19:19PM -0400, Waiman Long wrote:
> +DEFINE_PER_CPU(unsigned long [MAX_LOCKDEP_KEYS], lock_class_ops);

> @@ -179,9 +181,30 @@ DECLARE_PER_CPU(struct lockdep_stats, lockdep_stats);
>  	}								\
>  	__total;							\
>  })
> +
> +static inline void debug_class_ops_inc(struct lock_class *class)
> +{
> +	int idx;
> +
> +	idx = class - lock_classes;
> +	__this_cpu_inc(lock_class_ops[idx]);
> +}
> +
> +static inline unsigned long debug_class_ops_read(struct lock_class *class)
> +{
> +	int idx, cpu;
> +	unsigned long ops = 0;
> +
> +	idx = class - lock_classes;
> +	for_each_possible_cpu(cpu)
> +		ops += per_cpu(lock_class_ops[idx], cpu);
> +	return ops;
> +}

Would it make sense to stick that in struct lockdep_stats ?

A little something like:

struct lockdep_stats {
	/* ... */
	int	lock_class_ops[MAX_LOCKDEP_KEYS];
};

and then use:

	debug_atomic_inc(lock_class_ops[idx]);


* Re: [PATCH v2 4/5] locking/lockdep: Make class->ops a percpu counter
  2018-10-03  7:48   ` Peter Zijlstra
@ 2018-10-03  7:54     ` Ingo Molnar
  2018-10-03  8:24       ` Peter Zijlstra
  2018-10-03 13:57     ` Waiman Long
  1 sibling, 1 reply; 17+ messages in thread
From: Ingo Molnar @ 2018-10-03  7:54 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Waiman Long, Ingo Molnar, Will Deacon, linux-kernel


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, Oct 02, 2018 at 04:19:19PM -0400, Waiman Long wrote:
> > +DEFINE_PER_CPU(unsigned long [MAX_LOCKDEP_KEYS], lock_class_ops);
> 
> > @@ -179,9 +181,30 @@ DECLARE_PER_CPU(struct lockdep_stats, lockdep_stats);
> >  	}								\
> >  	__total;							\
> >  })
> > +
> > +static inline void debug_class_ops_inc(struct lock_class *class)
> > +{
> > +	int idx;
> > +
> > +	idx = class - lock_classes;
> > +	__this_cpu_inc(lock_class_ops[idx]);
> > +}
> > +
> > +static inline unsigned long debug_class_ops_read(struct lock_class *class)
> > +{
> > +	int idx, cpu;
> > +	unsigned long ops = 0;
> > +
> > +	idx = class - lock_classes;
> > +	for_each_possible_cpu(cpu)
> > +		ops += per_cpu(lock_class_ops[idx], cpu);
> > +	return ops;
> > +}
> 
> Would it make sense to stick that in struct lockdep_stats ?
> 
> A little something like:
> 
> struct lockdep_stats {
> 	/* ... */
> 	int	lock_class_ops[MAX_LOCKDEP_KEYS];
> };

Did you shrink the 'long' to 'int' intentionally?

Thanks,

	Ingo

* Re: [PATCH v2 4/5] locking/lockdep: Make class->ops a percpu counter
  2018-10-03  7:54     ` Ingo Molnar
@ 2018-10-03  8:24       ` Peter Zijlstra
  0 siblings, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2018-10-03  8:24 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Waiman Long, Ingo Molnar, Will Deacon, linux-kernel

On Wed, Oct 03, 2018 at 09:54:46AM +0200, Ingo Molnar wrote:
> 
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Tue, Oct 02, 2018 at 04:19:19PM -0400, Waiman Long wrote:
> > > +DEFINE_PER_CPU(unsigned long [MAX_LOCKDEP_KEYS], lock_class_ops);
> > 
> > > @@ -179,9 +181,30 @@ DECLARE_PER_CPU(struct lockdep_stats, lockdep_stats);
> > >  	}								\
> > >  	__total;							\
> > >  })
> > > +
> > > +static inline void debug_class_ops_inc(struct lock_class *class)
> > > +{
> > > +	int idx;
> > > +
> > > +	idx = class - lock_classes;
> > > +	__this_cpu_inc(lock_class_ops[idx]);
> > > +}
> > > +
> > > +static inline unsigned long debug_class_ops_read(struct lock_class *class)
> > > +{
> > > +	int idx, cpu;
> > > +	unsigned long ops = 0;
> > > +
> > > +	idx = class - lock_classes;
> > > +	for_each_possible_cpu(cpu)
> > > +		ops += per_cpu(lock_class_ops[idx], cpu);
> > > +	return ops;
> > > +}
> > 
> > Would it make sense to stick that in struct lockdep_stats ?
> > 
> > A little something like:
> > 
> > struct lockdep_stats {
> > 	/* ... */
> > 	int	lock_class_ops[MAX_LOCKDEP_KEYS];
> > };
> 
> Did you shrink the 'long' to 'int' intentionally?

nah, was because all of lockdep_stats is int and I didn't pay much
attention.

* Re: [PATCH v2 4/5] locking/lockdep: Make class->ops a percpu counter
  2018-10-03  7:48   ` Peter Zijlstra
  2018-10-03  7:54     ` Ingo Molnar
@ 2018-10-03 13:57     ` Waiman Long
  2018-10-03 17:07       ` Waiman Long
  1 sibling, 1 reply; 17+ messages in thread
From: Waiman Long @ 2018-10-03 13:57 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Will Deacon, linux-kernel

On 10/03/2018 03:48 AM, Peter Zijlstra wrote:
> On Tue, Oct 02, 2018 at 04:19:19PM -0400, Waiman Long wrote:
>> +DEFINE_PER_CPU(unsigned long [MAX_LOCKDEP_KEYS], lock_class_ops);
>> @@ -179,9 +181,30 @@ DECLARE_PER_CPU(struct lockdep_stats, lockdep_stats);
>>  	}								\
>>  	__total;							\
>>  })
>> +
>> +static inline void debug_class_ops_inc(struct lock_class *class)
>> +{
>> +	int idx;
>> +
>> +	idx = class - lock_classes;
>> +	__this_cpu_inc(lock_class_ops[idx]);
>> +}
>> +
>> +static inline unsigned long debug_class_ops_read(struct lock_class *class)
>> +{
>> +	int idx, cpu;
>> +	unsigned long ops = 0;
>> +
>> +	idx = class - lock_classes;
>> +	for_each_possible_cpu(cpu)
>> +		ops += per_cpu(lock_class_ops[idx], cpu);
>> +	return ops;
>> +}
> Would it make sense to stick that in struct lockdep_stats ?
>
> A little something like:
>
> struct lockdep_stats {
> 	/* ... */
> 	int	lock_class_ops[MAX_LOCKDEP_KEYS];
> };
>
> and then use:
>
> 	debug_atomic_inc(lock_class_ops[idx]);
>
Yes, I can certainly do that. Thanks for pointing this out. Will post an
updated patch just for this one.

Cheers,
Longman


* Re: [PATCH v2 4/5] locking/lockdep: Make class->ops a percpu counter
  2018-10-03 13:57     ` Waiman Long
@ 2018-10-03 17:07       ` Waiman Long
  2018-10-04 10:14         ` Ingo Molnar
  2018-10-09 10:54         ` [tip:locking/core] locking/lockdep: Make class->ops a percpu counter and move it under CONFIG_DEBUG_LOCKDEP=y tip-bot for Waiman Long
  0 siblings, 2 replies; 17+ messages in thread
From: Waiman Long @ 2018-10-03 17:07 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Will Deacon, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1318 bytes --]

On 10/03/2018 09:57 AM, Waiman Long wrote:
> On 10/03/2018 03:48 AM, Peter Zijlstra wrote:
>> On Tue, Oct 02, 2018 at 04:19:19PM -0400, Waiman Long wrote:
>>> +DEFINE_PER_CPU(unsigned long [MAX_LOCKDEP_KEYS], lock_class_ops);
>>> @@ -179,9 +181,30 @@ DECLARE_PER_CPU(struct lockdep_stats, lockdep_stats);
>>>  	}								\
>>>  	__total;							\
>>>  })
>>> +
>>> +static inline void debug_class_ops_inc(struct lock_class *class)
>>> +{
>>> +	int idx;
>>> +
>>> +	idx = class - lock_classes;
>>> +	__this_cpu_inc(lock_class_ops[idx]);
>>> +}
>>> +
>>> +static inline unsigned long debug_class_ops_read(struct lock_class *class)
>>> +{
>>> +	int idx, cpu;
>>> +	unsigned long ops = 0;
>>> +
>>> +	idx = class - lock_classes;
>>> +	for_each_possible_cpu(cpu)
>>> +		ops += per_cpu(lock_class_ops[idx], cpu);
>>> +	return ops;
>>> +}
>> Would it make sense to stick that in struct lockdep_stats ?
>>
>> A little something like:
>>
>> struct lockdep_stats {
>> 	/* ... */
>> 	int	lock_class_ops[MAX_LOCKDEP_KEYS];
>> };
>>
>> and then use:
>>
>> 	debug_atomic_inc(lock_class_ops[idx]);
>>
> Yes, I can certainly do that. Thanks for pointing this out. Will post an
> updated patch just for this one.
>
> Cheers,
> Longman
>
Attached is the updated patch 4. Please let me know if you guys are OK
with that.

Cheers,
Longman


[-- Attachment #2: v3-0004-locking-lockdep-Make-class-ops-a-percpu-counter.patch --]
[-- Type: text/x-patch, Size: 5411 bytes --]

From e615f62dfa155cbf9c04318f146570047f2c0c0c Mon Sep 17 00:00:00 2001
From: Waiman Long <longman@redhat.com>
Date: Wed, 3 Oct 2018 11:36:24 -0400
Subject: [PATCH v3 4/5] locking/lockdep: Make class->ops a percpu counter

A sizable portion of the CPU cycles spent in __lock_acquire() is used
up by the atomic increment of the class->ops stat counter. By taking it out
from the lock_class structure and changing it to a per-cpu per-lock-class
counter, we can reduce the amount of cacheline contention on the class
structure when multiple CPUs are trying to acquire locks of the same
class simultaneously.

To limit the increase in memory consumption because of the percpu nature
of that counter, it is now put back under the CONFIG_DEBUG_LOCKDEP
config option. So the memory consumption increase will only occur if
CONFIG_DEBUG_LOCKDEP is defined. The lock_class structure, however,
is reduced in size by 16 bytes on 64-bit archs after ops removal and
a minor restructuring of the fields.

This patch also fixes a bug in the increment code as the counter is of
the unsigned long type, but atomic_inc() was used to increment it.

 v3: Move lock_class_ops[] into lockdep_stats structure.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 include/linux/lockdep.h            |  7 +------
 kernel/locking/lockdep.c           | 11 ++++++++---
 kernel/locking/lockdep_internals.h | 27 +++++++++++++++++++++++++++
 kernel/locking/lockdep_proc.c      |  2 +-
 4 files changed, 37 insertions(+), 10 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index b0d0b51c4d85..1fd82ff99c65 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -99,13 +99,8 @@ struct lock_class {
 	 */
 	unsigned int			version;
 
-	/*
-	 * Statistics counter:
-	 */
-	unsigned long			ops;
-
-	const char			*name;
 	int				name_version;
+	const char			*name;
 
 #ifdef CONFIG_LOCK_STAT
 	unsigned long			contention_point[LOCKSTAT_POINTS];
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index b7774ff2516f..57e3f153474f 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -138,7 +138,7 @@ static struct lock_list list_entries[MAX_LOCKDEP_ENTRIES];
  * get freed - this significantly simplifies the debugging code.
  */
 unsigned long nr_lock_classes;
-static struct lock_class lock_classes[MAX_LOCKDEP_KEYS];
+struct lock_class lock_classes[MAX_LOCKDEP_KEYS];
 
 static inline struct lock_class *hlock_class(struct held_lock *hlock)
 {
@@ -435,6 +435,7 @@ unsigned int max_lockdep_depth;
  * Various lockdep statistics:
  */
 DEFINE_PER_CPU(struct lockdep_stats, lockdep_stats);
+DEFINE_PER_CPU(unsigned long [MAX_LOCKDEP_KEYS], lock_class_ops);
 #endif
 
 /*
@@ -1391,7 +1392,9 @@ static void print_lock_class_header(struct lock_class *class, int depth)
 
 	printk("%*s->", depth, "");
 	print_lock_name(class);
-	printk(KERN_CONT " ops: %lu", class->ops);
+#ifdef CONFIG_DEBUG_LOCKDEP
+	printk(KERN_CONT " ops: %lu", debug_class_ops_read(class));
+#endif
 	printk(KERN_CONT " {\n");
 
 	for (bit = 0; bit < LOCK_USAGE_STATES; bit++) {
@@ -3226,7 +3229,9 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 		if (!class)
 			return 0;
 	}
-	atomic_inc((atomic_t *)&class->ops);
+
+	debug_class_ops_inc(class);
+
 	if (very_verbose(class)) {
 		printk("\nacquire class [%px] %s", class->key, class->name);
 		if (class->name_version > 1)
diff --git a/kernel/locking/lockdep_internals.h b/kernel/locking/lockdep_internals.h
index d459d624ba2a..88c847a41c8a 100644
--- a/kernel/locking/lockdep_internals.h
+++ b/kernel/locking/lockdep_internals.h
@@ -152,9 +152,15 @@ struct lockdep_stats {
 	int	nr_find_usage_forwards_recursions;
 	int	nr_find_usage_backwards_checks;
 	int	nr_find_usage_backwards_recursions;
+
+	/*
+	 * Per lock class locking operation stat counts
+	 */
+	unsigned long lock_class_ops[MAX_LOCKDEP_KEYS];
 };
 
 DECLARE_PER_CPU(struct lockdep_stats, lockdep_stats);
+extern struct lock_class lock_classes[MAX_LOCKDEP_KEYS];
 
 #define __debug_atomic_inc(ptr)					\
 	this_cpu_inc(lockdep_stats.ptr);
@@ -179,9 +185,30 @@ DECLARE_PER_CPU(struct lockdep_stats, lockdep_stats);
 	}								\
 	__total;							\
 })
+
+static inline void debug_class_ops_inc(struct lock_class *class)
+{
+	int idx;
+
+	idx = class - lock_classes;
+	__debug_atomic_inc(lock_class_ops[idx]);
+}
+
+static inline unsigned long debug_class_ops_read(struct lock_class *class)
+{
+	int idx, cpu;
+	unsigned long ops = 0;
+
+	idx = class - lock_classes;
+	for_each_possible_cpu(cpu)
+		ops += per_cpu(lockdep_stats.lock_class_ops[idx], cpu);
+	return ops;
+}
+
 #else
 # define __debug_atomic_inc(ptr)	do { } while (0)
 # define debug_atomic_inc(ptr)		do { } while (0)
 # define debug_atomic_dec(ptr)		do { } while (0)
 # define debug_atomic_read(ptr)		0
+# define debug_class_ops_inc(ptr)	do { } while (0)
 #endif
diff --git a/kernel/locking/lockdep_proc.c b/kernel/locking/lockdep_proc.c
index 3dd980dfba2d..3d31f9b0059e 100644
--- a/kernel/locking/lockdep_proc.c
+++ b/kernel/locking/lockdep_proc.c
@@ -68,7 +68,7 @@ static int l_show(struct seq_file *m, void *v)
 
 	seq_printf(m, "%p", class->key);
 #ifdef CONFIG_DEBUG_LOCKDEP
-	seq_printf(m, " OPS:%8ld", class->ops);
+	seq_printf(m, " OPS:%8ld", debug_class_ops_read(class));
 #endif
 #ifdef CONFIG_PROVE_LOCKING
 	seq_printf(m, " FD:%5ld", lockdep_count_forward_deps(class));
-- 
2.18.0


* Re: [PATCH v2 4/5] locking/lockdep: Make class->ops a percpu counter
  2018-10-03 17:07       ` Waiman Long
@ 2018-10-04 10:14         ` Ingo Molnar
  2018-10-04 13:05           ` Waiman Long
  2018-10-09 10:54         ` [tip:locking/core] locking/lockdep: Make class->ops a percpu counter and move it under CONFIG_DEBUG_LOCKDEP=y tip-bot for Waiman Long
  1 sibling, 1 reply; 17+ messages in thread
From: Ingo Molnar @ 2018-10-04 10:14 UTC (permalink / raw)
  To: Waiman Long; +Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel


* Waiman Long <longman@redhat.com> wrote:

> diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
> index b0d0b51c4d85..1fd82ff99c65 100644
> --- a/include/linux/lockdep.h
> +++ b/include/linux/lockdep.h
> @@ -99,13 +99,8 @@ struct lock_class {
>  	 */
>  	unsigned int			version;
>  
> -	/*
> -	 * Statistics counter:
> -	 */
> -	unsigned long			ops;
> -
> -	const char			*name;
>  	int				name_version;
> +	const char			*name;

'name' gets moved by the patch - better structure packing on 64-bit kernels?

Looks good to me otherwise.

Thanks,

	Ingo

* Re: [PATCH v2 4/5] locking/lockdep: Make class->ops a percpu counter
  2018-10-04 10:14         ` Ingo Molnar
@ 2018-10-04 13:05           ` Waiman Long
  0 siblings, 0 replies; 17+ messages in thread
From: Waiman Long @ 2018-10-04 13:05 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel

On 10/04/2018 06:14 AM, Ingo Molnar wrote:
> * Waiman Long <longman@redhat.com> wrote:
>
>> diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
>> index b0d0b51c4d85..1fd82ff99c65 100644
>> --- a/include/linux/lockdep.h
>> +++ b/include/linux/lockdep.h
>> @@ -99,13 +99,8 @@ struct lock_class {
>>  	 */
>>  	unsigned int			version;
>>  
>> -	/*
>> -	 * Statistics counter:
>> -	 */
>> -	unsigned long			ops;
>> -
>> -	const char			*name;
>>  	int				name_version;
>> +	const char			*name;
> 'name' gets moved by the patch - better structure packing on 64-bit kernels?
>
> Looks good to me otherwise.
>
> Thanks,
>
> 	Ingo

Yes, that is done on purpose to pack the 2 integers together to remove
two 4-byte holes on 64-bit archs.

Cheers,
Longman


* [tip:locking/core] locking/lockdep: Make class->ops a percpu counter and move it under CONFIG_DEBUG_LOCKDEP=y
  2018-10-03 17:07       ` Waiman Long
  2018-10-04 10:14         ` Ingo Molnar
@ 2018-10-09 10:54         ` tip-bot for Waiman Long
  1 sibling, 0 replies; 17+ messages in thread
From: tip-bot for Waiman Long @ 2018-10-09 10:54 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: longman, peterz, hpa, mingo, linux-kernel, tglx, will.deacon,
	torvalds, a.p.zijlstra

Commit-ID:  8ca2b56cd7da98fc8f8d787bb706b9d6c8674a3b
Gitweb:     https://git.kernel.org/tip/8ca2b56cd7da98fc8f8d787bb706b9d6c8674a3b
Author:     Waiman Long <longman@redhat.com>
AuthorDate: Wed, 3 Oct 2018 13:07:18 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 9 Oct 2018 09:56:33 +0200

locking/lockdep: Make class->ops a percpu counter and move it under CONFIG_DEBUG_LOCKDEP=y

A sizable portion of the CPU cycles spent in __lock_acquire() is used
up by the atomic increment of the class->ops stat counter. By taking it out
from the lock_class structure and changing it to a per-cpu per-lock-class
counter, we can reduce the amount of cacheline contention on the class
structure when multiple CPUs are trying to acquire locks of the same
class simultaneously.

To limit the increase in memory consumption because of the percpu nature
of that counter, it is now put back under the CONFIG_DEBUG_LOCKDEP
config option. So the memory consumption increase will only occur if
CONFIG_DEBUG_LOCKDEP is defined. The lock_class structure, however,
is reduced in size by 16 bytes on 64-bit archs after ops removal and
a minor restructuring of the fields.

This patch also fixes a bug in the increment code as the counter is of
the 'unsigned long' type, but atomic_inc() was used to increment it.

Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/d66681f3-8781-9793-1dcf-2436a284550b@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 include/linux/lockdep.h            |  7 +------
 kernel/locking/lockdep.c           | 11 ++++++++---
 kernel/locking/lockdep_internals.h | 27 +++++++++++++++++++++++++++
 kernel/locking/lockdep_proc.c      |  2 +-
 4 files changed, 37 insertions(+), 10 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index b0d0b51c4d85..1fd82ff99c65 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -99,13 +99,8 @@ struct lock_class {
 	 */
 	unsigned int			version;
 
-	/*
-	 * Statistics counter:
-	 */
-	unsigned long			ops;
-
-	const char			*name;
 	int				name_version;
+	const char			*name;
 
 #ifdef CONFIG_LOCK_STAT
 	unsigned long			contention_point[LOCKSTAT_POINTS];
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 511d30f88bce..a0f83058d6aa 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -139,7 +139,7 @@ static struct lock_list list_entries[MAX_LOCKDEP_ENTRIES];
  * get freed - this significantly simplifies the debugging code.
  */
 unsigned long nr_lock_classes;
-static struct lock_class lock_classes[MAX_LOCKDEP_KEYS];
+struct lock_class lock_classes[MAX_LOCKDEP_KEYS];
 
 static inline struct lock_class *hlock_class(struct held_lock *hlock)
 {
@@ -436,6 +436,7 @@ unsigned int max_lockdep_depth;
  * Various lockdep statistics:
  */
 DEFINE_PER_CPU(struct lockdep_stats, lockdep_stats);
+DEFINE_PER_CPU(unsigned long [MAX_LOCKDEP_KEYS], lock_class_ops);
 #endif
 
 /*
@@ -1392,7 +1393,9 @@ static void print_lock_class_header(struct lock_class *class, int depth)
 
 	printk("%*s->", depth, "");
 	print_lock_name(class);
-	printk(KERN_CONT " ops: %lu", class->ops);
+#ifdef CONFIG_DEBUG_LOCKDEP
+	printk(KERN_CONT " ops: %lu", debug_class_ops_read(class));
+#endif
 	printk(KERN_CONT " {\n");
 
 	for (bit = 0; bit < LOCK_USAGE_STATES; bit++) {
@@ -3227,7 +3230,9 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 		if (!class)
 			return 0;
 	}
-	atomic_inc((atomic_t *)&class->ops);
+
+	debug_class_ops_inc(class);
+
 	if (very_verbose(class)) {
 		printk("\nacquire class [%px] %s", class->key, class->name);
 		if (class->name_version > 1)
diff --git a/kernel/locking/lockdep_internals.h b/kernel/locking/lockdep_internals.h
index d459d624ba2a..88c847a41c8a 100644
--- a/kernel/locking/lockdep_internals.h
+++ b/kernel/locking/lockdep_internals.h
@@ -152,9 +152,15 @@ struct lockdep_stats {
 	int	nr_find_usage_forwards_recursions;
 	int	nr_find_usage_backwards_checks;
 	int	nr_find_usage_backwards_recursions;
+
+	/*
+	 * Per lock class locking operation stat counts
+	 */
+	unsigned long lock_class_ops[MAX_LOCKDEP_KEYS];
 };
 
 DECLARE_PER_CPU(struct lockdep_stats, lockdep_stats);
+extern struct lock_class lock_classes[MAX_LOCKDEP_KEYS];
 
 #define __debug_atomic_inc(ptr)					\
 	this_cpu_inc(lockdep_stats.ptr);
@@ -179,9 +185,30 @@ DECLARE_PER_CPU(struct lockdep_stats, lockdep_stats);
 	}								\
 	__total;							\
 })
+
+static inline void debug_class_ops_inc(struct lock_class *class)
+{
+	int idx;
+
+	idx = class - lock_classes;
+	__debug_atomic_inc(lock_class_ops[idx]);
+}
+
+static inline unsigned long debug_class_ops_read(struct lock_class *class)
+{
+	int idx, cpu;
+	unsigned long ops = 0;
+
+	idx = class - lock_classes;
+	for_each_possible_cpu(cpu)
+		ops += per_cpu(lockdep_stats.lock_class_ops[idx], cpu);
+	return ops;
+}
+
 #else
 # define __debug_atomic_inc(ptr)	do { } while (0)
 # define debug_atomic_inc(ptr)		do { } while (0)
 # define debug_atomic_dec(ptr)		do { } while (0)
 # define debug_atomic_read(ptr)		0
+# define debug_class_ops_inc(ptr)	do { } while (0)
 #endif
diff --git a/kernel/locking/lockdep_proc.c b/kernel/locking/lockdep_proc.c
index 3dd980dfba2d..3d31f9b0059e 100644
--- a/kernel/locking/lockdep_proc.c
+++ b/kernel/locking/lockdep_proc.c
@@ -68,7 +68,7 @@ static int l_show(struct seq_file *m, void *v)
 
 	seq_printf(m, "%p", class->key);
 #ifdef CONFIG_DEBUG_LOCKDEP
-	seq_printf(m, " OPS:%8ld", class->ops);
+	seq_printf(m, " OPS:%8ld", debug_class_ops_read(class));
 #endif
 #ifdef CONFIG_PROVE_LOCKING
 	seq_printf(m, " FD:%5ld", lockdep_count_forward_deps(class));
