* [PATCH 0/5] locking/lockdep: Improve lockdep performance
@ 2018-09-28 17:53 Waiman Long
  2018-09-28 17:53 ` [PATCH 1/5] locking/lockdep: Remove add_chain_cache_classes() Waiman Long
                   ` (5 more replies)
  0 siblings, 6 replies; 19+ messages in thread
From: Waiman Long @ 2018-09-28 17:53 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon; +Cc: linux-kernel, Waiman Long

Enabling CONFIG_LOCKDEP and other related debug options will greatly
reduce system performance. This patchset aims to reduce the performance
slowdown caused by the lockdep code.

Patch 1 just removes an inline function that wasn't used.

Patches 2 and 3 are minor tweaks to optimize the code.

Patch 4 makes class->ops a per-cpu counter.

Patch 5 moves the lock_release() call outside of a lock critical section.

Parallel kernel compilation tests (make -j <#cpu>) were performed on
2 different systems:

 1) a 1-socket 22-core 44-thread Skylake system
 2) a 4-socket 72-core 144-thread Broadwell system

The build times with pre-patch and post-patch debug kernels were:

   System      Pre-patch     Post-patch    %Change
   ------      ---------     ----------    -------
  1-socket      8m53.9s        8m41.2s      -2.4%
  4-socket      7m27.0s        5m31.0s      -26%

I think it is the last 2 patches that yield most of the performance
improvement.

Waiman Long (5):
  locking/lockdep: Remove add_chain_cache_classes()
  locking/lockdep: Eliminate redundant irqs check in __lock_acquire()
  locking/lockdep: Add a faster path in __lock_release()
  locking/lockdep: Make class->ops a percpu counter
  locking/lockdep: Call lock_release after releasing the lock

 include/linux/lockdep.h          |   2 +-
 include/linux/rwlock_api_smp.h   |  16 +++---
 include/linux/spinlock_api_smp.h |   8 +--
 kernel/locking/lockdep.c         | 120 ++++++++++++---------------------------
 4 files changed, 48 insertions(+), 98 deletions(-)

-- 
1.8.3.1



* [PATCH 1/5] locking/lockdep: Remove add_chain_cache_classes()
  2018-09-28 17:53 [PATCH 0/5] locking/lockdep: Improve lockdep performance Waiman Long
@ 2018-09-28 17:53 ` Waiman Long
  2018-09-28 17:53 ` [PATCH 2/5] locking/lockdep: Eliminate redundant irqs check in __lock_acquire() Waiman Long
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 19+ messages in thread
From: Waiman Long @ 2018-09-28 17:53 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon; +Cc: linux-kernel, Waiman Long

The inline function add_chain_cache_classes() is defined, but has no
caller. Just remove it.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/lockdep.c | 70 ------------------------------------------------
 1 file changed, 70 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index dd13f86..8f9de7c 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2148,76 +2148,6 @@ static int check_no_collision(struct task_struct *curr,
 }
 
 /*
- * This is for building a chain between just two different classes,
- * instead of adding a new hlock upon current, which is done by
- * add_chain_cache().
- *
- * This can be called in any context with two classes, while
- * add_chain_cache() must be done within the lock owener's context
- * since it uses hlock which might be racy in another context.
- */
-static inline int add_chain_cache_classes(unsigned int prev,
-					  unsigned int next,
-					  unsigned int irq_context,
-					  u64 chain_key)
-{
-	struct hlist_head *hash_head = chainhashentry(chain_key);
-	struct lock_chain *chain;
-
-	/*
-	 * Allocate a new chain entry from the static array, and add
-	 * it to the hash:
-	 */
-
-	/*
-	 * We might need to take the graph lock, ensure we've got IRQs
-	 * disabled to make this an IRQ-safe lock.. for recursion reasons
-	 * lockdep won't complain about its own locking errors.
-	 */
-	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
-		return 0;
-
-	if (unlikely(nr_lock_chains >= MAX_LOCKDEP_CHAINS)) {
-		if (!debug_locks_off_graph_unlock())
-			return 0;
-
-		print_lockdep_off("BUG: MAX_LOCKDEP_CHAINS too low!");
-		dump_stack();
-		return 0;
-	}
-
-	chain = lock_chains + nr_lock_chains++;
-	chain->chain_key = chain_key;
-	chain->irq_context = irq_context;
-	chain->depth = 2;
-	if (likely(nr_chain_hlocks + chain->depth <= MAX_LOCKDEP_CHAIN_HLOCKS)) {
-		chain->base = nr_chain_hlocks;
-		nr_chain_hlocks += chain->depth;
-		chain_hlocks[chain->base] = prev - 1;
-		chain_hlocks[chain->base + 1] = next -1;
-	}
-#ifdef CONFIG_DEBUG_LOCKDEP
-	/*
-	 * Important for check_no_collision().
-	 */
-	else {
-		if (!debug_locks_off_graph_unlock())
-			return 0;
-
-		print_lockdep_off("BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!");
-		dump_stack();
-		return 0;
-	}
-#endif
-
-	hlist_add_head_rcu(&chain->entry, hash_head);
-	debug_atomic_inc(chain_lookup_misses);
-	inc_chains();
-
-	return 1;
-}
-
-/*
  * Adds a dependency chain into chain hashtable. And must be called with
  * graph_lock held.
  *
-- 
1.8.3.1



* [PATCH 2/5] locking/lockdep: Eliminate redundant irqs check in __lock_acquire()
  2018-09-28 17:53 [PATCH 0/5] locking/lockdep: Improve lockdep performance Waiman Long
  2018-09-28 17:53 ` [PATCH 1/5] locking/lockdep: Remove add_chain_cache_classes() Waiman Long
@ 2018-09-28 17:53 ` Waiman Long
  2018-10-02  9:06   ` Ingo Molnar
  2018-09-28 17:53 ` [PATCH 3/5] locking/lockdep: Add a faster path in __lock_release() Waiman Long
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Waiman Long @ 2018-09-28 17:53 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon; +Cc: linux-kernel, Waiman Long

The static __lock_acquire() function has only two callers:

 1) lock_acquire()
 2) reacquire_held_locks()

In lock_acquire(), raw_local_irq_save() is called before hand. So
irqs must have been disabled. So the check

	DEBUG_LOCKS_WARN_ON(!irqs_disabled())

is kind of redundant in thise case. So move the above check
to reacquire_held_locks() to eliminate redundant code in the
lock_acquire path.
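
For reference, a sketch of the lock_acquire() caller of that era
(paraphrased; tracing hooks and recursion guards are abbreviated):

	void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
			  int trylock, int read, int check,
			  struct lockdep_map *nest_lock, unsigned long ip)
	{
		unsigned long flags;

		raw_local_irq_save(flags);	/* IRQs are off from here on */
		check_flags(flags);

		current->lockdep_recursion = 1;
		__lock_acquire(lock, subclass, trylock, read, check,
			       irqs_disabled_flags(flags), nest_lock, ip, 0, 0);
		current->lockdep_recursion = 0;
		raw_local_irq_restore(flags);
	}

Since raw_local_irq_save() always precedes the __lock_acquire() call
on this path, the irqs_disabled() warning there can never fire.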

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/lockdep.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 8f9de7c..add0468 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -3192,6 +3192,10 @@ void lockdep_init_map(struct lockdep_map *lock, const char *name,
 /*
  * This gets called for every mutex_lock*()/spin_lock*() operation.
  * We maintain the dependency maps and validate the locking attempt:
+ *
+ * The callers must make sure that IRQs are disabled before calling it.
+ * otherwise we could get an interrupt which would want to take locks,
+ * which would end up in lockdep again.
  */
 static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 			  int trylock, int read, int check, int hardirqs_off,
@@ -3209,14 +3213,6 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 	if (unlikely(!debug_locks))
 		return 0;
 
-	/*
-	 * Lockdep should run with IRQs disabled, otherwise we could
-	 * get an interrupt which would want to take locks, which would
-	 * end up in lockdep and have you got a head-ache already?
-	 */
-	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
-		return 0;
-
 	if (!prove_locking || lock->key == &__lockdep_no_validate__)
 		check = 0;
 
@@ -3473,6 +3469,9 @@ static int reacquire_held_locks(struct task_struct *curr, unsigned int depth,
 {
 	struct held_lock *hlock;
 
+	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
+		return 0;
+
 	for (hlock = curr->held_locks + idx; idx < depth; idx++, hlock++) {
 		if (!__lock_acquire(hlock->instance,
 				    hlock_class(hlock)->subclass,
-- 
1.8.3.1



* [PATCH 3/5] locking/lockdep: Add a faster path in __lock_release()
  2018-09-28 17:53 [PATCH 0/5] locking/lockdep: Improve lockdep performance Waiman Long
  2018-09-28 17:53 ` [PATCH 1/5] locking/lockdep: Remove add_chain_cache_classes() Waiman Long
  2018-09-28 17:53 ` [PATCH 2/5] locking/lockdep: Eliminate redundant irqs check in __lock_acquire() Waiman Long
@ 2018-09-28 17:53 ` Waiman Long
  2018-10-02  9:03   ` Ingo Molnar
  2018-09-28 17:53 ` [PATCH 4/5] locking/lockdep: Make class->ops a percpu counter Waiman Long
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Waiman Long @ 2018-09-28 17:53 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon; +Cc: linux-kernel, Waiman Long

When __lock_release() is called, the most likely unlock scenario is
on the innermost lock in the chain.  In this case, we can skip some of
the checks and provide a faster path to completion.
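
As a concrete illustration (with hypothetical locks a and b), the
common nesting pattern releases locks in the reverse order of
acquisition, so the lock being released is almost always the last
one acquired:

	spin_lock(&a);
	spin_lock(&b);		/* b is now the innermost held lock */
	/* ... critical section ... */
	spin_unlock(&b);	/* the common case __lock_release() sees */
	spin_unlock(&a);

Only an out-of-order release (unlocking a before b above) needs the
slower reacquire_held_locks() path.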

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/lockdep.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index add0468..ca002c0 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -3625,6 +3625,13 @@ static int __lock_downgrade(struct lockdep_map *lock, unsigned long ip)
 	curr->lockdep_depth = i;
 	curr->curr_chain_key = hlock->prev_chain_key;
 
+	/*
+	 * The most likely case is when the unlock is on the innermost
+	 * lock. In this case, we are done!
+	 */
+	if (i == depth - 1)
+		return 1;
+
 	if (reacquire_held_locks(curr, depth, i + 1))
 		return 0;
 
@@ -3632,10 +3639,14 @@ static int __lock_downgrade(struct lockdep_map *lock, unsigned long ip)
 	 * We had N bottles of beer on the wall, we drank one, but now
 	 * there's not N-1 bottles of beer left on the wall...
 	 */
-	if (DEBUG_LOCKS_WARN_ON(curr->lockdep_depth != depth - 1))
-		return 0;
+	DEBUG_LOCKS_WARN_ON(curr->lockdep_depth != depth - 1);
 
-	return 1;
+	/*
+	 * Since reacquire_held_locks() would have called check_chain_key()
+	 * indirectly via __lock_acquire(), we don't need to do it again
+	 * on return.
+	 */
+	return 0;
 }
 
 static int __lock_is_held(const struct lockdep_map *lock, int read)
-- 
1.8.3.1



* [PATCH 4/5] locking/lockdep: Make class->ops a percpu counter
  2018-09-28 17:53 [PATCH 0/5] locking/lockdep: Improve lockdep performance Waiman Long
                   ` (2 preceding siblings ...)
  2018-09-28 17:53 ` [PATCH 3/5] locking/lockdep: Add a faster path in __lock_release() Waiman Long
@ 2018-09-28 17:53 ` Waiman Long
  2018-09-28 20:25   ` kbuild test robot
                     ` (2 more replies)
  2018-09-28 17:53 ` [PATCH 5/5] locking/lockdep: Call lock_release after releasing the lock Waiman Long
  2018-10-02  9:06 ` [PATCH 0/5] locking/lockdep: Improve lockdep performance Ingo Molnar
  5 siblings, 3 replies; 19+ messages in thread
From: Waiman Long @ 2018-09-28 17:53 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon; +Cc: linux-kernel, Waiman Long

A sizable portion of the CPU cycles spent in __lock_acquire() is used
up by the atomic increment of the class->ops stat counter. By changing
it to a per-cpu counter, we can reduce the amount of cacheline
contention on the class structure when multiple CPUs are trying to
acquire locks of the same class simultaneously.

This patch also fixes a bug in the increment code: the counter is of
type unsigned long, but atomic_inc(), which operates on a 32-bit
atomic_t, was used to increment it.
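
As a standalone illustration of the pattern (names here are
illustrative, not the exact patch code), each CPU increments its own
slot without any atomic read-modify-write, and readers sum across the
CPUs:

	static DEFINE_PER_CPU(unsigned long, ops_count);

	static inline void ops_inc(void)
	{
		/* local, non-atomic increment: no cross-CPU cacheline bouncing */
		__this_cpu_inc(ops_count);
	}

	static unsigned long ops_read(void)
	{
		unsigned long sum = 0;
		int cpu;

		for_each_possible_cpu(cpu)
			sum += per_cpu(ops_count, cpu);
		return sum;
	}

The sum is only approximately consistent while updates are in flight,
which is fine for a statistics counter.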

Signed-off-by: Waiman Long <longman@redhat.com>
---
 include/linux/lockdep.h  |  2 +-
 kernel/locking/lockdep.c | 18 ++++++++++++++----
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index b0d0b51..f8bf705 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -102,7 +102,7 @@ struct lock_class {
 	/*
 	 * Statistics counter:
 	 */
-	unsigned long			ops;
+	unsigned long __percpu		*pops;
 
 	const char			*name;
 	int				name_version;
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index ca002c0..7a0ed1d 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -139,6 +139,7 @@ static inline int debug_locks_off_graph_unlock(void)
  */
 unsigned long nr_lock_classes;
 static struct lock_class lock_classes[MAX_LOCKDEP_KEYS];
+static DEFINE_PER_CPU(unsigned long [MAX_LOCKDEP_KEYS], lock_class_ops);
 
 static inline struct lock_class *hlock_class(struct held_lock *hlock)
 {
@@ -784,11 +785,14 @@ static bool assign_lock_key(struct lockdep_map *lock)
 		dump_stack();
 		return NULL;
 	}
-	class = lock_classes + nr_lock_classes++;
+	class = lock_classes + nr_lock_classes;
 	debug_atomic_inc(nr_unused_locks);
 	class->key = key;
 	class->name = lock->name;
 	class->subclass = subclass;
+	class->pops = &lock_class_ops[nr_lock_classes];
+	nr_lock_classes++;
+
 	INIT_LIST_HEAD(&class->lock_entry);
 	INIT_LIST_HEAD(&class->locks_before);
 	INIT_LIST_HEAD(&class->locks_after);
@@ -1387,11 +1391,15 @@ static inline int usage_match(struct lock_list *entry, void *bit)
 
 static void print_lock_class_header(struct lock_class *class, int depth)
 {
-	int bit;
+	int bit, cpu;
+	unsigned long ops = 0UL;
+
+	for_each_possible_cpu(cpu)
+		ops += *per_cpu(class->pops, cpu);
 
 	printk("%*s->", depth, "");
 	print_lock_name(class);
-	printk(KERN_CONT " ops: %lu", class->ops);
+	printk(KERN_CONT " ops: %lu", ops);
 	printk(KERN_CONT " {\n");
 
 	for (bit = 0; bit < LOCK_USAGE_STATES; bit++) {
@@ -3226,7 +3234,9 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 		if (!class)
 			return 0;
 	}
-	atomic_inc((atomic_t *)&class->ops);
+
+	__this_cpu_inc(*class->pops);
+
 	if (very_verbose(class)) {
 		printk("\nacquire class [%px] %s", class->key, class->name);
 		if (class->name_version > 1)
-- 
1.8.3.1



* [PATCH 5/5] locking/lockdep: Call lock_release after releasing the lock
  2018-09-28 17:53 [PATCH 0/5] locking/lockdep: Improve lockdep performance Waiman Long
                   ` (3 preceding siblings ...)
  2018-09-28 17:53 ` [PATCH 4/5] locking/lockdep: Make class->ops a percpu counter Waiman Long
@ 2018-09-28 17:53 ` Waiman Long
  2018-10-02  9:08   ` Ingo Molnar
  2018-10-02  9:06 ` [PATCH 0/5] locking/lockdep: Improve lockdep performance Ingo Molnar
  5 siblings, 1 reply; 19+ messages in thread
From: Waiman Long @ 2018-09-28 17:53 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon; +Cc: linux-kernel, Waiman Long

Currently, lock_acquire() is called before acquiring the lock and
lock_release() is called before releasing the lock. As a result,
the execution time of lock_release() is added to the lock hold time
reducing locking throughput, especially for spinlocks and rwlocks which
tend to have a much shorter lock hold time.

As lock_release() is not going to update any shared data that needs
protection from the lock, we don't actually need to call it before
releasing the lock. So the lock_release() calls are now postponed to
after releasing the lock for spinlocks and rwlocks.
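
Schematically, for the spinlock case (see the actual changes below):

	/* before: lockdep bookkeeping is inside the critical section */
	spin_release(&lock->dep_map, 1, _RET_IP_);	/* lock still held */
	do_raw_spin_unlock(lock);			/* lock released */

	/* after: bookkeeping runs while other CPUs can already acquire */
	do_raw_spin_unlock(lock);			/* lock released */
	spin_release(&lock->dep_map, 1, _RET_IP_);

This is safe because, as noted above, spin_release() only updates the
current task's lockdep state, not any data protected by the lock.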

Signed-off-by: Waiman Long <longman@redhat.com>
---
 include/linux/rwlock_api_smp.h   | 16 ++++++++--------
 include/linux/spinlock_api_smp.h |  8 ++++----
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/linux/rwlock_api_smp.h b/include/linux/rwlock_api_smp.h
index 86ebb4b..b026940 100644
--- a/include/linux/rwlock_api_smp.h
+++ b/include/linux/rwlock_api_smp.h
@@ -215,63 +215,63 @@ static inline void __raw_write_lock(rwlock_t *lock)
 
 static inline void __raw_write_unlock(rwlock_t *lock)
 {
-	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	do_raw_write_unlock(lock);
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	preempt_enable();
 }
 
 static inline void __raw_read_unlock(rwlock_t *lock)
 {
-	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	do_raw_read_unlock(lock);
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	preempt_enable();
 }
 
 static inline void
 __raw_read_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
 {
-	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	do_raw_read_unlock(lock);
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	local_irq_restore(flags);
 	preempt_enable();
 }
 
 static inline void __raw_read_unlock_irq(rwlock_t *lock)
 {
-	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	do_raw_read_unlock(lock);
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	local_irq_enable();
 	preempt_enable();
 }
 
 static inline void __raw_read_unlock_bh(rwlock_t *lock)
 {
-	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	do_raw_read_unlock(lock);
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	__local_bh_enable_ip(_RET_IP_, SOFTIRQ_LOCK_OFFSET);
 }
 
 static inline void __raw_write_unlock_irqrestore(rwlock_t *lock,
 					     unsigned long flags)
 {
-	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	do_raw_write_unlock(lock);
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	local_irq_restore(flags);
 	preempt_enable();
 }
 
 static inline void __raw_write_unlock_irq(rwlock_t *lock)
 {
-	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	do_raw_write_unlock(lock);
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	local_irq_enable();
 	preempt_enable();
 }
 
 static inline void __raw_write_unlock_bh(rwlock_t *lock)
 {
-	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	do_raw_write_unlock(lock);
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
 	__local_bh_enable_ip(_RET_IP_, SOFTIRQ_LOCK_OFFSET);
 }
 
diff --git a/include/linux/spinlock_api_smp.h b/include/linux/spinlock_api_smp.h
index 42dfab8..fcb84df 100644
--- a/include/linux/spinlock_api_smp.h
+++ b/include/linux/spinlock_api_smp.h
@@ -147,32 +147,32 @@ static inline void __raw_spin_lock(raw_spinlock_t *lock)
 
 static inline void __raw_spin_unlock(raw_spinlock_t *lock)
 {
-	spin_release(&lock->dep_map, 1, _RET_IP_);
 	do_raw_spin_unlock(lock);
+	spin_release(&lock->dep_map, 1, _RET_IP_);
 	preempt_enable();
 }
 
 static inline void __raw_spin_unlock_irqrestore(raw_spinlock_t *lock,
 					    unsigned long flags)
 {
-	spin_release(&lock->dep_map, 1, _RET_IP_);
 	do_raw_spin_unlock(lock);
+	spin_release(&lock->dep_map, 1, _RET_IP_);
 	local_irq_restore(flags);
 	preempt_enable();
 }
 
 static inline void __raw_spin_unlock_irq(raw_spinlock_t *lock)
 {
-	spin_release(&lock->dep_map, 1, _RET_IP_);
 	do_raw_spin_unlock(lock);
+	spin_release(&lock->dep_map, 1, _RET_IP_);
 	local_irq_enable();
 	preempt_enable();
 }
 
 static inline void __raw_spin_unlock_bh(raw_spinlock_t *lock)
 {
-	spin_release(&lock->dep_map, 1, _RET_IP_);
 	do_raw_spin_unlock(lock);
+	spin_release(&lock->dep_map, 1, _RET_IP_);
 	__local_bh_enable_ip(_RET_IP_, SOFTIRQ_LOCK_OFFSET);
 }
 
-- 
1.8.3.1



* Re: [PATCH 4/5] locking/lockdep: Make class->ops a percpu counter
  2018-09-28 17:53 ` [PATCH 4/5] locking/lockdep: Make class->ops a percpu counter Waiman Long
@ 2018-09-28 20:25   ` kbuild test robot
  2018-09-28 20:31     ` Waiman Long
  2018-09-28 20:42   ` kbuild test robot
  2018-10-02  9:39   ` Peter Zijlstra
  2 siblings, 1 reply; 19+ messages in thread
From: kbuild test robot @ 2018-09-28 20:25 UTC (permalink / raw)
  To: Waiman Long
  Cc: kbuild-all, Peter Zijlstra, Ingo Molnar, Will Deacon,
	linux-kernel, Waiman Long


Hi Waiman,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/locking/core]
[also build test ERROR on v4.19-rc5 next-20180928]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Waiman-Long/locking-lockdep-Remove-add_chain_cache_classes/20180929-031820
config: i386-randconfig-s2-201838 (attached as .config)
compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   kernel/locking/lockdep_proc.c: In function 'l_show':
>> kernel/locking/lockdep_proc.c:71:34: error: 'struct lock_class' has no member named 'ops'; did you mean 'pops'?
     seq_printf(m, " OPS:%8ld", class->ops);
                                     ^~

vim +71 kernel/locking/lockdep_proc.c

068135e63 kernel/lockdep_proc.c Jason Baron       2007-02-10  57  
a8f24a397 kernel/lockdep_proc.c Ingo Molnar       2006-07-03  58  static int l_show(struct seq_file *m, void *v)
a8f24a397 kernel/lockdep_proc.c Ingo Molnar       2006-07-03  59  {
8109e1de8 kernel/lockdep_proc.c Li Zefan          2009-08-17  60  	struct lock_class *class = list_entry(v, struct lock_class, lock_entry);
068135e63 kernel/lockdep_proc.c Jason Baron       2007-02-10  61  	struct lock_list *entry;
f510b233c kernel/lockdep_proc.c Peter Zijlstra    2009-01-22  62  	char usage[LOCK_USAGE_CHARS];
a8f24a397 kernel/lockdep_proc.c Ingo Molnar       2006-07-03  63  
8109e1de8 kernel/lockdep_proc.c Li Zefan          2009-08-17  64  	if (v == &all_lock_classes) {
94c61c0ae kernel/lockdep_proc.c Tim Pepper        2007-10-11  65  		seq_printf(m, "all lock classes:\n");
94c61c0ae kernel/lockdep_proc.c Tim Pepper        2007-10-11  66  		return 0;
94c61c0ae kernel/lockdep_proc.c Tim Pepper        2007-10-11  67  	}
94c61c0ae kernel/lockdep_proc.c Tim Pepper        2007-10-11  68  
a8f24a397 kernel/lockdep_proc.c Ingo Molnar       2006-07-03  69  	seq_printf(m, "%p", class->key);
a8f24a397 kernel/lockdep_proc.c Ingo Molnar       2006-07-03  70  #ifdef CONFIG_DEBUG_LOCKDEP
a8f24a397 kernel/lockdep_proc.c Ingo Molnar       2006-07-03 @71  	seq_printf(m, " OPS:%8ld", class->ops);
a8f24a397 kernel/lockdep_proc.c Ingo Molnar       2006-07-03  72  #endif
df60a8441 kernel/lockdep_proc.c Stephen Hemminger 2008-08-15  73  #ifdef CONFIG_PROVE_LOCKING
df60a8441 kernel/lockdep_proc.c Stephen Hemminger 2008-08-15  74  	seq_printf(m, " FD:%5ld", lockdep_count_forward_deps(class));
df60a8441 kernel/lockdep_proc.c Stephen Hemminger 2008-08-15  75  	seq_printf(m, " BD:%5ld", lockdep_count_backward_deps(class));
df60a8441 kernel/lockdep_proc.c Stephen Hemminger 2008-08-15  76  #endif
a8f24a397 kernel/lockdep_proc.c Ingo Molnar       2006-07-03  77  
f510b233c kernel/lockdep_proc.c Peter Zijlstra    2009-01-22  78  	get_usage_chars(class, usage);
f510b233c kernel/lockdep_proc.c Peter Zijlstra    2009-01-22  79  	seq_printf(m, " %s", usage);
a8f24a397 kernel/lockdep_proc.c Ingo Molnar       2006-07-03  80  
068135e63 kernel/lockdep_proc.c Jason Baron       2007-02-10  81  	seq_printf(m, ": ");
068135e63 kernel/lockdep_proc.c Jason Baron       2007-02-10  82  	print_name(m, class);
068135e63 kernel/lockdep_proc.c Jason Baron       2007-02-10  83  	seq_puts(m, "\n");
068135e63 kernel/lockdep_proc.c Jason Baron       2007-02-10  84  
068135e63 kernel/lockdep_proc.c Jason Baron       2007-02-10  85  	list_for_each_entry(entry, &class->locks_after, entry) {
068135e63 kernel/lockdep_proc.c Jason Baron       2007-02-10  86  		if (entry->distance == 1) {
2429e4ee7 kernel/lockdep_proc.c Huang, Ying       2008-06-13  87  			seq_printf(m, " -> [%p] ", entry->class->key);
068135e63 kernel/lockdep_proc.c Jason Baron       2007-02-10  88  			print_name(m, entry->class);
068135e63 kernel/lockdep_proc.c Jason Baron       2007-02-10  89  			seq_puts(m, "\n");
068135e63 kernel/lockdep_proc.c Jason Baron       2007-02-10  90  		}
a8f24a397 kernel/lockdep_proc.c Ingo Molnar       2006-07-03  91  	}
a8f24a397 kernel/lockdep_proc.c Ingo Molnar       2006-07-03  92  	seq_puts(m, "\n");
a8f24a397 kernel/lockdep_proc.c Ingo Molnar       2006-07-03  93  
a8f24a397 kernel/lockdep_proc.c Ingo Molnar       2006-07-03  94  	return 0;
a8f24a397 kernel/lockdep_proc.c Ingo Molnar       2006-07-03  95  }
a8f24a397 kernel/lockdep_proc.c Ingo Molnar       2006-07-03  96  

:::::: The code at line 71 was first introduced by commit
:::::: a8f24a3978c5f82419e1c90dc90460731204f46f [PATCH] lockdep: procfs

:::::: TO: Ingo Molnar <mingo@elte.hu>
:::::: CC: Linus Torvalds <torvalds@g5.osdl.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation



* Re: [PATCH 4/5] locking/lockdep: Make class->ops a percpu counter
  2018-09-28 20:25   ` kbuild test robot
@ 2018-09-28 20:31     ` Waiman Long
  0 siblings, 0 replies; 19+ messages in thread
From: Waiman Long @ 2018-09-28 20:31 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kbuild-all, Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel

On 09/28/2018 04:25 PM, kbuild test robot wrote:
> Hi Waiman,
>
> Thank you for the patch! Yet something to improve:
>
> [auto build test ERROR on tip/locking/core]
> [also build test ERROR on v4.19-rc5 next-20180928]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
>
> url:    https://github.com/0day-ci/linux/commits/Waiman-Long/locking-lockdep-Remove-add_chain_cache_classes/20180929-031820
> config: i386-randconfig-s2-201838 (attached as .config)
> compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
> reproduce:
>         # save the attached .config to linux build tree
>         make ARCH=i386 
>
> All errors (new ones prefixed by >>):
>
>    kernel/locking/lockdep_proc.c: In function 'l_show':
>>> kernel/locking/lockdep_proc.c:71:34: error: 'struct lock_class' has no member named 'ops'; did you mean 'pops'?
>      seq_printf(m, " OPS:%8ld", class->ops);
>                                      ^~

OK, I missed the ops reference in lockdep_proc.c, which is apparently
not built with my debug kernel config. I will fix that in the next
version.
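
A possible shape of the fix, assuming a small helper that sums the
per-cpu counts (debug_class_ops_read() is a hypothetical name, not
necessarily what v2 will use):

	#ifdef CONFIG_DEBUG_LOCKDEP
		seq_printf(m, " OPS:%8ld", debug_class_ops_read(class));
	#endif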

-Longman


* Re: [PATCH 4/5] locking/lockdep: Make class->ops a percpu counter
  2018-09-28 17:53 ` [PATCH 4/5] locking/lockdep: Make class->ops a percpu counter Waiman Long
  2018-09-28 20:25   ` kbuild test robot
@ 2018-09-28 20:42   ` kbuild test robot
  2018-10-02  9:39   ` Peter Zijlstra
  2 siblings, 0 replies; 19+ messages in thread
From: kbuild test robot @ 2018-09-28 20:42 UTC (permalink / raw)
  To: Waiman Long
  Cc: kbuild-all, Peter Zijlstra, Ingo Molnar, Will Deacon,
	linux-kernel, Waiman Long


Hi Waiman,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/locking/core]
[also build test ERROR on v4.19-rc5 next-20180928]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Waiman-Long/locking-lockdep-Remove-add_chain_cache_classes/20180929-031820
config: x86_64-randconfig-u0-09290404 (attached as .config)
compiler: gcc-5 (Debian 5.5.0-3) 5.4.1 20171010
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   kernel/locking/lockdep_proc.c: In function 'l_show':
>> kernel/locking/lockdep_proc.c:71:34: error: 'struct lock_class' has no member named 'ops'
     seq_printf(m, " OPS:%8ld", class->ops);
                                     ^

vim +71 kernel/locking/lockdep_proc.c

068135e63 kernel/lockdep_proc.c Jason Baron       2007-02-10  57  
a8f24a397 kernel/lockdep_proc.c Ingo Molnar       2006-07-03  58  static int l_show(struct seq_file *m, void *v)
a8f24a397 kernel/lockdep_proc.c Ingo Molnar       2006-07-03  59  {
8109e1de8 kernel/lockdep_proc.c Li Zefan          2009-08-17  60  	struct lock_class *class = list_entry(v, struct lock_class, lock_entry);
068135e63 kernel/lockdep_proc.c Jason Baron       2007-02-10  61  	struct lock_list *entry;
f510b233c kernel/lockdep_proc.c Peter Zijlstra    2009-01-22  62  	char usage[LOCK_USAGE_CHARS];
a8f24a397 kernel/lockdep_proc.c Ingo Molnar       2006-07-03  63  
8109e1de8 kernel/lockdep_proc.c Li Zefan          2009-08-17  64  	if (v == &all_lock_classes) {
94c61c0ae kernel/lockdep_proc.c Tim Pepper        2007-10-11  65  		seq_printf(m, "all lock classes:\n");
94c61c0ae kernel/lockdep_proc.c Tim Pepper        2007-10-11  66  		return 0;
94c61c0ae kernel/lockdep_proc.c Tim Pepper        2007-10-11  67  	}
94c61c0ae kernel/lockdep_proc.c Tim Pepper        2007-10-11  68  
a8f24a397 kernel/lockdep_proc.c Ingo Molnar       2006-07-03  69  	seq_printf(m, "%p", class->key);
a8f24a397 kernel/lockdep_proc.c Ingo Molnar       2006-07-03  70  #ifdef CONFIG_DEBUG_LOCKDEP
a8f24a397 kernel/lockdep_proc.c Ingo Molnar       2006-07-03 @71  	seq_printf(m, " OPS:%8ld", class->ops);
a8f24a397 kernel/lockdep_proc.c Ingo Molnar       2006-07-03  72  #endif
df60a8441 kernel/lockdep_proc.c Stephen Hemminger 2008-08-15  73  #ifdef CONFIG_PROVE_LOCKING
df60a8441 kernel/lockdep_proc.c Stephen Hemminger 2008-08-15  74  	seq_printf(m, " FD:%5ld", lockdep_count_forward_deps(class));
df60a8441 kernel/lockdep_proc.c Stephen Hemminger 2008-08-15  75  	seq_printf(m, " BD:%5ld", lockdep_count_backward_deps(class));
df60a8441 kernel/lockdep_proc.c Stephen Hemminger 2008-08-15  76  #endif
a8f24a397 kernel/lockdep_proc.c Ingo Molnar       2006-07-03  77  
f510b233c kernel/lockdep_proc.c Peter Zijlstra    2009-01-22  78  	get_usage_chars(class, usage);
f510b233c kernel/lockdep_proc.c Peter Zijlstra    2009-01-22  79  	seq_printf(m, " %s", usage);
a8f24a397 kernel/lockdep_proc.c Ingo Molnar       2006-07-03  80  
068135e63 kernel/lockdep_proc.c Jason Baron       2007-02-10  81  	seq_printf(m, ": ");
068135e63 kernel/lockdep_proc.c Jason Baron       2007-02-10  82  	print_name(m, class);
068135e63 kernel/lockdep_proc.c Jason Baron       2007-02-10  83  	seq_puts(m, "\n");
068135e63 kernel/lockdep_proc.c Jason Baron       2007-02-10  84  
068135e63 kernel/lockdep_proc.c Jason Baron       2007-02-10  85  	list_for_each_entry(entry, &class->locks_after, entry) {
068135e63 kernel/lockdep_proc.c Jason Baron       2007-02-10  86  		if (entry->distance == 1) {
2429e4ee7 kernel/lockdep_proc.c Huang, Ying       2008-06-13  87  			seq_printf(m, " -> [%p] ", entry->class->key);
068135e63 kernel/lockdep_proc.c Jason Baron       2007-02-10  88  			print_name(m, entry->class);
068135e63 kernel/lockdep_proc.c Jason Baron       2007-02-10  89  			seq_puts(m, "\n");
068135e63 kernel/lockdep_proc.c Jason Baron       2007-02-10  90  		}
a8f24a397 kernel/lockdep_proc.c Ingo Molnar       2006-07-03  91  	}
a8f24a397 kernel/lockdep_proc.c Ingo Molnar       2006-07-03  92  	seq_puts(m, "\n");
a8f24a397 kernel/lockdep_proc.c Ingo Molnar       2006-07-03  93  
a8f24a397 kernel/lockdep_proc.c Ingo Molnar       2006-07-03  94  	return 0;
a8f24a397 kernel/lockdep_proc.c Ingo Molnar       2006-07-03  95  }
a8f24a397 kernel/lockdep_proc.c Ingo Molnar       2006-07-03  96  

:::::: The code at line 71 was first introduced by commit
:::::: a8f24a3978c5f82419e1c90dc90460731204f46f [PATCH] lockdep: procfs

:::::: TO: Ingo Molnar <mingo@elte.hu>
:::::: CC: Linus Torvalds <torvalds@g5.osdl.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation



* Re: [PATCH 3/5] locking/lockdep: Add a faster path in __lock_release()
  2018-09-28 17:53 ` [PATCH 3/5] locking/lockdep: Add a faster path in __lock_release() Waiman Long
@ 2018-10-02  9:03   ` Ingo Molnar
  0 siblings, 0 replies; 19+ messages in thread
From: Ingo Molnar @ 2018-10-02  9:03 UTC (permalink / raw)
  To: Waiman Long; +Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel


* Waiman Long <longman@redhat.com> wrote:

> When __lock_release() is called, the most likely unlock scenario is
> on the innermost lock in the chain.  In this case, we can skip some of
> the checks and provide a faster path to completion.
> 
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  kernel/locking/lockdep.c | 17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index add0468..ca002c0 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -3625,6 +3625,13 @@ static int __lock_downgrade(struct lockdep_map *lock, unsigned long ip)
>  	curr->lockdep_depth = i;
>  	curr->curr_chain_key = hlock->prev_chain_key;
>  
> +	/*
> +	 * The most likely case is when the unlock is on the innermost
> +	 * lock. In this case, we are done!
> +	 */
> +	if (i == depth - 1)
> +		return 1;
> +
>  	if (reacquire_held_locks(curr, depth, i + 1))
>  		return 0;
>  
> @@ -3632,10 +3639,14 @@ static int __lock_downgrade(struct lockdep_map *lock, unsigned long ip)
>  	 * We had N bottles of beer on the wall, we drank one, but now
>  	 * there's not N-1 bottles of beer left on the wall...
>  	 */
> -	if (DEBUG_LOCKS_WARN_ON(curr->lockdep_depth != depth - 1))
> -		return 0;
> +	DEBUG_LOCKS_WARN_ON(curr->lockdep_depth != depth - 1);
>  
> -	return 1;
> +	/*
> +	 * Since reacquire_held_locks() would have called check_chain_key()
> +	 * indirectly via __lock_acquire(), we don't need to do it again
> +	 * on return.
> +	 */
> +	return 0;

Minor nit:

	s/depth - 1/depth-1

for slightly better readability.

Thanks,

	Ingo


* Re: [PATCH 2/5] locking/lockdep: Eliminate redundant irqs check in __lock_acquire()
  2018-09-28 17:53 ` [PATCH 2/5] locking/lockdep: Eliminate redundant irqs check in __lock_acquire() Waiman Long
@ 2018-10-02  9:06   ` Ingo Molnar
  0 siblings, 0 replies; 19+ messages in thread
From: Ingo Molnar @ 2018-10-02  9:06 UTC (permalink / raw)
  To: Waiman Long; +Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel


* Waiman Long <longman@redhat.com> wrote:

> The static __lock_acquire() function has only two callers:
> 
>  1) lock_acquire()
>  2) reacquire_held_locks()
> 
> In lock_acquire(), raw_local_irq_save() is called before hand. So
> irqs must have been disabled. So the check

s/before hand/
 /beforehand

s/irqs
 /IRQs

> 
> 	DEBUG_LOCKS_WARN_ON(!irqs_disabled())
> 
> is kind of redundant in thise case. So move the above check
> to reacquire_held_locks() to eliminate redundant code in the
> lock_acquire path.

s/thise
 /this

s/lock_acquire path
 /lock_acquire() path
 
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  kernel/locking/lockdep.c | 15 +++++++--------
>  1 file changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 8f9de7c..add0468 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -3192,6 +3192,10 @@ void lockdep_init_map(struct lockdep_map *lock, const char *name,
>  /*
>   * This gets called for every mutex_lock*()/spin_lock*() operation.
>   * We maintain the dependency maps and validate the locking attempt:
> + *
> + * The callers must make sure that IRQs are disabled before calling it.
> + * otherwise we could get an interrupt which would want to take locks,
> + * which would end up in lockdep again.

Spelling nit: a comma after the first line, like it was in the original version:

> -	/*
> -	 * Lockdep should run with IRQs disabled, otherwise we could
> -	 * get an interrupt which would want to take locks, which would
> -	 * end up in lockdep and have you got a head-ache already?
> -	 */

Thanks,

	Ingo


* Re: [PATCH 0/5] locking/lockdep: Improve lockdep performance
  2018-09-28 17:53 [PATCH 0/5] locking/lockdep: Improve lockdep performance Waiman Long
                   ` (4 preceding siblings ...)
  2018-09-28 17:53 ` [PATCH 5/5] locking/lockdep: Call lock_release after releasing the lock Waiman Long
@ 2018-10-02  9:06 ` Ingo Molnar
  2018-10-02 13:57   ` Waiman Long
  5 siblings, 1 reply; 19+ messages in thread
From: Ingo Molnar @ 2018-10-02  9:06 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel, Thomas Gleixner


* Waiman Long <longman@redhat.com> wrote:

> Enabling CONFIG_LOCKDEP and other related debug options will greatly
> reduce system performance. This patchset aims to reduce the performance
> slowdown caused by the lockdep code.
> 
> Patch 1 just removes an inline function that wasn't used.
> 
> Patches 2 and 3 are minor tweaks to optimize the code.
> 
> Patch 4 makes class->ops a per-cpu counter.
> 
> Patch 5 moves the lock_release() call outside of a lock critical section.
> 
> Parallel kernel compilation tests (make -j <#cpu>) were performed on
> 2 different systems:
> 
>  1) a 1-socket 22-core 44-thread Skylake system
>  2) a 4-socket 72-core 144-thread Broadwell system
> 
> The build times with pre-patch and post-patch debug kernels were:
> 
>    System      Pre-patch     Post-patch    %Change
>    ------      ---------     ----------    -------
>   1-socket      8m53.9s        8m41.2s      -2.4%
>   4-socket      7m27.0s        5m31.0s      -26%
> 
> I think it is the last 2 patches that yield most of the performance
> improvement.

Impressive speedup!

Mind including the non-lockdep numbers as well, for reference?

Thanks,

	Ingo


* Re: [PATCH 5/5] locking/lockdep: Call lock_release after releasing the lock
  2018-09-28 17:53 ` [PATCH 5/5] locking/lockdep: Call lock_release after releasing the lock Waiman Long
@ 2018-10-02  9:08   ` Ingo Molnar
  0 siblings, 0 replies; 19+ messages in thread
From: Ingo Molnar @ 2018-10-02  9:08 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel, Thomas Gleixner


* Waiman Long <longman@redhat.com> wrote:

> Subject: locking/lockdep: Call lock_release after releasing the lock

s/lock_release
 /lock_release()

> Currently, lock_acquire() is called before acquiring the lock and
> lock_release() is called before releasing the lock. As a result,
> the execution time of lock_release() is added to the lock hold time
> reducing locking throughput, especially for spinlocks and rwlocks which
> tend to have a much shorter lock hold time.
> 
> As lock_release() is not going to update any shared data that needs
> protection from the lock, we don't actually need to call it before
> releasing the lock. So the lock_release() calls are now postponed to
> after releasing the lock for spinlocks and rwlocks.

Nice optimization!

Thanks,

	Ingo


* Re: [PATCH 4/5] locking/lockdep: Make class->ops a percpu counter
  2018-09-28 17:53 ` [PATCH 4/5] locking/lockdep: Make class->ops a percpu counter Waiman Long
  2018-09-28 20:25   ` kbuild test robot
  2018-09-28 20:42   ` kbuild test robot
@ 2018-10-02  9:39   ` Peter Zijlstra
  2018-10-02  9:55     ` Ingo Molnar
  2 siblings, 1 reply; 19+ messages in thread
From: Peter Zijlstra @ 2018-10-02  9:39 UTC (permalink / raw)
  To: Waiman Long; +Cc: Ingo Molnar, Will Deacon, linux-kernel

On Fri, Sep 28, 2018 at 01:53:20PM -0400, Waiman Long wrote:
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index ca002c0..7a0ed1d 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -139,6 +139,7 @@ static inline int debug_locks_off_graph_unlock(void)
>   */
>  unsigned long nr_lock_classes;
>  static struct lock_class lock_classes[MAX_LOCKDEP_KEYS];
> +static DEFINE_PER_CPU(unsigned long [MAX_LOCKDEP_KEYS], lock_class_ops);

> @@ -1387,11 +1391,15 @@ static inline int usage_match(struct lock_list *entry, void *bit)
>  
>  static void print_lock_class_header(struct lock_class *class, int depth)
>  {
> -	int bit;
> +	int bit, cpu;
> +	unsigned long ops = 0UL;
> +
> +	for_each_possible_cpu(cpu)
> +		ops += *per_cpu(class->pops, cpu);
>  
>  	printk("%*s->", depth, "");
>  	print_lock_name(class);
> -	printk(KERN_CONT " ops: %lu", class->ops);
> +	printk(KERN_CONT " ops: %lu", ops);
>  	printk(KERN_CONT " {\n");
>  
>  	for (bit = 0; bit < LOCK_USAGE_STATES; bit++) {

That is an awful lot of storage for a stupid number. Some archs
(sparc64) are bzImage size-constrained and this will hurt them.
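
For scale (assuming 64-bit longs and the usual MAX_LOCKDEP_KEYS of
8192, i.e. 1 << MAX_LOCKDEP_KEYS_BITS):

	8192 slots * 8 bytes = 64 KiB added to the static per-cpu area
	in the image, then replicated per CPU at boot; the 144-thread
	box from the cover letter would pay 144 * 64 KiB = 9 MiB.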

Ingo, do you happen to remember what that number was good for?
Can't we simply ditch it?


* Re: [PATCH 4/5] locking/lockdep: Make class->ops a percpu counter
  2018-10-02  9:39   ` Peter Zijlstra
@ 2018-10-02  9:55     ` Ingo Molnar
  2018-10-02 14:10       ` Waiman Long
  0 siblings, 1 reply; 19+ messages in thread
From: Ingo Molnar @ 2018-10-02  9:55 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Waiman Long, Ingo Molnar, Will Deacon, linux-kernel


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, Sep 28, 2018 at 01:53:20PM -0400, Waiman Long wrote:
> > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> > index ca002c0..7a0ed1d 100644
> > --- a/kernel/locking/lockdep.c
> > +++ b/kernel/locking/lockdep.c
> > @@ -139,6 +139,7 @@ static inline int debug_locks_off_graph_unlock(void)
> >   */
> >  unsigned long nr_lock_classes;
> >  static struct lock_class lock_classes[MAX_LOCKDEP_KEYS];
> > +static DEFINE_PER_CPU(unsigned long [MAX_LOCKDEP_KEYS], lock_class_ops);
> 
> > @@ -1387,11 +1391,15 @@ static inline int usage_match(struct lock_list *entry, void *bit)
> >  
> >  static void print_lock_class_header(struct lock_class *class, int depth)
> >  {
> > -	int bit;
> > +	int bit, cpu;
> > +	unsigned long ops = 0UL;
> > +
> > +	for_each_possible_cpu(cpu)
> > +		ops += *per_cpu(class->pops, cpu);
> >  
> >  	printk("%*s->", depth, "");
> >  	print_lock_name(class);
> > -	printk(KERN_CONT " ops: %lu", class->ops);
> > +	printk(KERN_CONT " ops: %lu", ops);
> >  	printk(KERN_CONT " {\n");
> >  
> >  	for (bit = 0; bit < LOCK_USAGE_STATES; bit++) {
> 
> That is an awful lot of storage for a stupid number. Some archs
> (sparc64) are bzImage size-constrained and this will hurt them.
> 
> Ingo, do you happen to remember what that number was good for?

Just a spur-of-the-moment statistic to satisfy curiosity, and it's useful to see how 'busy' a 
particular class is, right?

> Can't we simply ditch it?

We certainly could. Do we have roughly equivalent metrics to arrive at this number via other 
methods?

Thanks,

	Ingo


* Re: [PATCH 0/5] locking/lockdep: Improve lockdep performance
  2018-10-02  9:06 ` [PATCH 0/5] locking/lockdep: Improve lockdep performance Ingo Molnar
@ 2018-10-02 13:57   ` Waiman Long
  0 siblings, 0 replies; 19+ messages in thread
From: Waiman Long @ 2018-10-02 13:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel, Thomas Gleixner

On 10/02/2018 05:06 AM, Ingo Molnar wrote:
> * Waiman Long <longman@redhat.com> wrote:
>
>> Enabling CONFIG_LOCKDEP and other related debug options will greatly
>> reduce system performance. This patchset aims to reduce the performance
>> slowdown caused by the lockdep code.
>>
>> Patch 1 just removes an inline function that wasn't used.
>>
>> Patches 2 and 3 are minor tweaks to optimize the code.
>>
>> Patch 4 makes class->ops a per-cpu counter.
>>
>> Patch 5 moves the lock_release() call outside of a lock critical section.
>>
>> Parallel kernel compilation tests (make -j <#cpu>) were performed on
>> 2 different systems:
>>
>>  1) a 1-socket 22-core 44-thread Skylake system
>>  2) a 4-socket 72-core 144-thread Broadwell system
>>
>> The build times with pre-patch and post-patch debug kernels were:
>>
>>    System      Pre-patch     Post-patch    %Change
>>    ------      ---------     ----------    -------
>>   1-socket      8m53.9s        8m41.2s      -2.4%
>>   4-socket      7m27.0s        5m31.0s      -26%
>>
>> I think it is the last 2 patches that yield most of the performance
>> improvement.
> Impressive speedup!
>
> Mind including the non-lockdep numbers as well, for reference?
>
> Thanks,
>
> 	Ingo

OK, I will include the non-lockdep numbers for comparison. However, the
debug kernel has other debugging code enabled as well, so the slowdown
won't be attributable to lockdep alone.

-Longman



* Re: [PATCH 4/5] locking/lockdep: Make class->ops a percpu counter
  2018-10-02  9:55     ` Ingo Molnar
@ 2018-10-02 14:10       ` Waiman Long
  2018-10-02 14:28         ` Peter Zijlstra
  0 siblings, 1 reply; 19+ messages in thread
From: Waiman Long @ 2018-10-02 14:10 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra; +Cc: Ingo Molnar, Will Deacon, linux-kernel

On 10/02/2018 05:55 AM, Ingo Molnar wrote:
> * Peter Zijlstra <peterz@infradead.org> wrote:
>
>> On Fri, Sep 28, 2018 at 01:53:20PM -0400, Waiman Long wrote:
>>> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
>>> index ca002c0..7a0ed1d 100644
>>> --- a/kernel/locking/lockdep.c
>>> +++ b/kernel/locking/lockdep.c
>>> @@ -139,6 +139,7 @@ static inline int debug_locks_off_graph_unlock(void)
>>>   */
>>>  unsigned long nr_lock_classes;
>>>  static struct lock_class lock_classes[MAX_LOCKDEP_KEYS];
>>> +static DEFINE_PER_CPU(unsigned long [MAX_LOCKDEP_KEYS], lock_class_ops);
>>> @@ -1387,11 +1391,15 @@ static inline int usage_match(struct lock_list *entry, void *bit)
>>>  
>>>  static void print_lock_class_header(struct lock_class *class, int depth)
>>>  {
>>> -	int bit;
>>> +	int bit, cpu;
>>> +	unsigned long ops = 0UL;
>>> +
>>> +	for_each_possible_cpu(cpu)
>>> +		ops += *per_cpu(class->pops, cpu);
>>>  
>>>  	printk("%*s->", depth, "");
>>>  	print_lock_name(class);
>>> -	printk(KERN_CONT " ops: %lu", class->ops);
>>> +	printk(KERN_CONT " ops: %lu", ops);
>>>  	printk(KERN_CONT " {\n");
>>>  
>>>  	for (bit = 0; bit < LOCK_USAGE_STATES; bit++) {
>> That is an aweful lot of storage for a stupid number. Some archs
>> (sparc64) are bzImage size constrained and this will hurt them.
>>
>> Ingo, do you happen to remember what that number was good for?
> Just a spur of the moment statistics to satisfy curiousity, and it's useful to see how 'busy' a 
> particular class is, right?
>
>> Can't we simply ditch it?
> We certainly could. Do we have roughly equivalent metrics to arrive at this number via other 
> methods?
>
> Thanks,
>
> 	Ingo


One alternative is to group it under CONFIG_DEBUG_LOCKDEP again. This
metric was originally under CONFIG_DEBUG_LOCKDEP, but was moved to
CONFIG_LOCKDEP when trying to make other lock debugging statistics
per-cpu counters. It was probably because this metric is per lock class
while the rest are global.

By doing so, you incur the memory cost only when CONFIG_DEBUG_LOCKDEP is
defined.
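
A minimal sketch of that arrangement (the helper name is hypothetical;
an actual patch may differ):

	#ifdef CONFIG_DEBUG_LOCKDEP
	static DEFINE_PER_CPU(unsigned long [MAX_LOCKDEP_KEYS], lock_class_ops);

	static inline void debug_class_ops_inc(struct lock_class *class)
	{
		__this_cpu_inc(lock_class_ops[class - lock_classes]);
	}
	#else
	static inline void debug_class_ops_inc(struct lock_class *class) { }
	#endif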

What do you think?

Cheers,
Longman




* Re: [PATCH 4/5] locking/lockdep: Make class->ops a percpu counter
  2018-10-02 14:10       ` Waiman Long
@ 2018-10-02 14:28         ` Peter Zijlstra
  2018-10-02 18:53           ` Ingo Molnar
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Zijlstra @ 2018-10-02 14:28 UTC (permalink / raw)
  To: Waiman Long; +Cc: Ingo Molnar, Ingo Molnar, Will Deacon, linux-kernel

On Tue, Oct 02, 2018 at 10:10:48AM -0400, Waiman Long wrote:
> One alternative is to group it under CONFIG_DEBUG_LOCKDEP again. This
> metric was originally under CONFIG_DEBUG_LOCKDEP, but was moved to
> CONFIG_LOCKDEP when trying to make other lock debugging statistics
> per-cpu counters. It was probably because this metric is per lock class
> while the rest are global.
> 
> By doing so, you incur the memory cost only when CONFIG_DEBUG_LOCKDEP is
> defined.
> 
> What do you think?

Works for me.


* Re: [PATCH 4/5] locking/lockdep: Make class->ops a percpu counter
  2018-10-02 14:28         ` Peter Zijlstra
@ 2018-10-02 18:53           ` Ingo Molnar
  0 siblings, 0 replies; 19+ messages in thread
From: Ingo Molnar @ 2018-10-02 18:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Waiman Long, Ingo Molnar, Will Deacon, linux-kernel, Thomas Gleixner


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, Oct 02, 2018 at 10:10:48AM -0400, Waiman Long wrote:
> > One alternative is to group it under CONFIG_DEBUG_LOCKDEP again. This
> > metric was originally under CONFIG_DEBUG_LOCKDEP, but was moved to
> > CONFIG_LOCKDEP when trying to make other lock debugging statistics
> > per-cpu counters. It was probably because this metric is per lock class
> > while the rest are global.
> > 
> > By doing so, you incur the memory cost only when CONFIG_DEBUG_LOCKDEP is
> > defined.
> > 
> > What do you think?
> 
> Works for me.

Sounds good to me too!

Thanks,

	Ingo

