linux-kernel.vger.kernel.org archive mirror
* [RFC 0/4] locking: Add new lock contention tracepoints (v2)
@ 2022-03-01  1:04 Namhyung Kim
  2022-03-01  1:04 ` [PATCH 1/4] locking: Add lock contention tracepoints Namhyung Kim
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Namhyung Kim @ 2022-03-01  1:04 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng
  Cc: LKML, Thomas Gleixner, Steven Rostedt, Mathieu Desnoyers,
	Byungchul Park, Paul E. McKenney, Arnd Bergmann, linux-arch, bpf,
	Radoslaw Burny

Hello,

There have been some requests for low-overhead kernel lock contention
monitoring.  The kernel already has CONFIG_LOCK_STAT to provide such an
infrastructure, either via /proc/lock_stat or via tracepoints directly.

However it is not lightweight and is hard to use in production.  So I'm
trying to add new tracepoints for lock contention and to use them as a
base to build a new monitoring system.

* Changes in v2
 - do not use lockdep infrastructure
 - add flags argument to lock:contention_begin tracepoint

As we don't want to increase the size of the lock data structures, it's
hard to carry the name of a lock or its class.  Instead we can use the
caller IP to identify contended locks.  I had to modify some places to
make that information meaningful.

Also, I added a flags argument to the lock:contention_begin tracepoint
to classify the lock in question.  The lower 2 bytes carry the task
state the waiter goes to: TASK_RUNNING (0) for spinlocks, or other
values for sleeping locks (mutex, rwsem, ...).  The upper 2 bytes carry
additional info such as whether it's a reader-writer lock, real-time,
and so on.
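
For illustration, a consumer of the tracepoint could decode the flags
word roughly like below (just a sketch; the LCB_F_* bits are the ones
defined in patch 01 and the variable names are made up):

	unsigned int task_state = flags & 0xffff;  /* TASK_RUNNING, TASK_UNINTERRUPTIBLE, ... */
	bool for_read  = flags & LCB_F_READ;       /* reader side of a rwlock/rwsem */
	bool for_write = flags & LCB_F_WRITE;      /* writer side */
	bool is_rt     = flags & LCB_F_RT;         /* rt_mutex based lock */
	bool is_percpu = flags & LCB_F_PERCPU;     /* percpu rwsem */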

Patch 01 adds the tracepoints in a new file, together with two new
wrapper functions.  The file contains the definitions of all locking
tracepoints, including the lockdep/lock_stat ones.  The wrappers are
needed because some kinds of locks are defined in header files which
cannot include the tracepoint headers directly due to circular
dependencies.
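
To give an idea of the intended use, a header-side call site looks like
below (this is what patch 02 does for the generic qrwlock read path):

	/* The slowpath will decrement the reader count, if necessary. */
	LOCK_CONTENTION_BEGIN(lock, LCB_F_READ);
	queued_read_lock_slowpath(lock);
	LOCK_CONTENTION_END(lock);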

Patch 02 actually installs the tracepoints in the locking code.  To
minimize the overhead, they were added in the slow paths only.  As
spinlocks are defined in the arch headers, I couldn't handle them all;
I've added them to the generic queued spinlock and rwlock only.  Each
arch can add the tracepoints later.

Patches 03 and 04 update the mutex and rwsem code to pass proper
caller IPs.  Getting the caller IP at the tracepoint won't work if any
of the locking code is not inlined, so the IP is passed from the API
function down to the internal ones.
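
The pattern is basically like below (a simplified sketch of what the
mutex change in patch 03 does):

	void __sched mutex_lock(struct mutex *lock)
	{
		might_sleep();

		if (!__mutex_trylock_fast(lock))
			__mutex_lock_slowpath(lock, _RET_IP_);	/* pass caller ip down */
	}

	static noinline void __sched
	__mutex_lock_slowpath(struct mutex *lock, unsigned long ip)
	{
		__mutex_lock(lock, TASK_UNINTERRUPTIBLE, 0, NULL, ip);
	}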

This series is based on the current tip/locking/core, and you can get
it from the 'locking/tracepoint-v2' branch in my tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git


Thanks,
Namhyung


Namhyung Kim (4):
  locking: Add lock contention tracepoints
  locking: Apply contention tracepoints in the slow path
  locking/mutex: Pass proper call-site ip
  locking/rwsem: Pass proper call-site ip

 include/asm-generic/qrwlock.h   |  5 ++
 include/asm-generic/qspinlock.h |  3 ++
 include/linux/lock_trace.h      | 31 +++++++++++++
 include/linux/lockdep.h         | 29 +++++++++++-
 include/trace/events/lock.h     | 43 ++++++++++++++++-
 kernel/locking/Makefile         |  2 +-
 kernel/locking/lockdep.c        |  1 -
 kernel/locking/mutex.c          | 44 ++++++++++--------
 kernel/locking/percpu-rwsem.c   | 11 ++++-
 kernel/locking/rtmutex.c        | 12 ++++-
 kernel/locking/rwbase_rt.c      | 11 ++++-
 kernel/locking/rwsem.c          | 81 ++++++++++++++++++++-------------
 kernel/locking/tracepoint.c     | 21 +++++++++
 13 files changed, 235 insertions(+), 59 deletions(-)
 create mode 100644 include/linux/lock_trace.h
 create mode 100644 kernel/locking/tracepoint.c


base-commit: cd27ccfc727e99352321c0c75012ab9c5a90321e
-- 
2.35.1.574.g5d30c73bfb-goog



* [PATCH 1/4] locking: Add lock contention tracepoints
  2022-03-01  1:04 [RFC 0/4] locking: Add new lock contention tracepoints (v2) Namhyung Kim
@ 2022-03-01  1:04 ` Namhyung Kim
  2022-03-01  1:04 ` [PATCH 2/4] locking: Apply contention tracepoints in the slow path Namhyung Kim
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: Namhyung Kim @ 2022-03-01  1:04 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng
  Cc: LKML, Thomas Gleixner, Steven Rostedt, Mathieu Desnoyers,
	Byungchul Park, Paul E. McKenney, Arnd Bergmann, linux-arch, bpf,
	Radoslaw Burny

This adds two new lock contention tracepoints:

 * lock:contention_begin
 * lock:contention_end

The lock:contention_begin tracepoint takes a flags argument to classify
locks.  I found it useful to pass the task state the waiter goes to, as
it tells whether the given lock is busy-waiting (spinlock) or sleeping
(mutex or semaphore).  The flags also carry info on whether it's a
reader-writer lock, real-time, and per-cpu.

Move the tracepoint definitions into a separate file so that we can use
some of them without lockdep.  Also, the lock_trace.h header was added
to provide access to the tracepoints from header files (like spinlock.h)
which cannot include the tracepoint definitions directly.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 include/linux/lock_trace.h  | 31 +++++++++++++++++++++++++++
 include/trace/events/lock.h | 42 ++++++++++++++++++++++++++++++++++++-
 kernel/locking/Makefile     |  2 +-
 kernel/locking/lockdep.c    |  1 -
 kernel/locking/tracepoint.c | 21 +++++++++++++++++++
 5 files changed, 94 insertions(+), 3 deletions(-)
 create mode 100644 include/linux/lock_trace.h
 create mode 100644 kernel/locking/tracepoint.c

diff --git a/include/linux/lock_trace.h b/include/linux/lock_trace.h
new file mode 100644
index 000000000000..d84bcc9570a4
--- /dev/null
+++ b/include/linux/lock_trace.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef __LINUX_LOCK_TRACE_H
+#define __LINUX_LOCK_TRACE_H
+
+#include <linux/tracepoint-defs.h>
+
+DECLARE_TRACEPOINT(contention_begin);
+DECLARE_TRACEPOINT(contention_end);
+
+#define LCB_F_READ	(1U << 31)
+#define LCB_F_WRITE	(1U << 30)
+#define LCB_F_RT	(1U << 29)
+#define LCB_F_PERCPU	(1U << 28)
+
+extern void lock_contention_begin(void *lock, unsigned long ip,
+				  unsigned int flags);
+extern void lock_contention_end(void *lock);
+
+#define LOCK_CONTENTION_BEGIN(_lock, _flags)				\
+	do {								\
+		if (tracepoint_enabled(contention_begin))		\
+			lock_contention_begin(_lock, _RET_IP_, _flags);	\
+	} while (0)
+
+#define LOCK_CONTENTION_END(_lock)					\
+	do {								\
+		if (tracepoint_enabled(contention_end))			\
+			lock_contention_end(_lock);			\
+	} while (0)
+
+#endif /* __LINUX_LOCK_TRACE_H */
diff --git a/include/trace/events/lock.h b/include/trace/events/lock.h
index d7512129a324..7bca0a537dbd 100644
--- a/include/trace/events/lock.h
+++ b/include/trace/events/lock.h
@@ -5,11 +5,12 @@
 #if !defined(_TRACE_LOCK_H) || defined(TRACE_HEADER_MULTI_READ)
 #define _TRACE_LOCK_H
 
-#include <linux/lockdep.h>
 #include <linux/tracepoint.h>
 
 #ifdef CONFIG_LOCKDEP
 
+#include <linux/lockdep.h>
+
 TRACE_EVENT(lock_acquire,
 
 	TP_PROTO(struct lockdep_map *lock, unsigned int subclass,
@@ -81,6 +82,45 @@ DEFINE_EVENT(lock, lock_acquired,
 #endif
 #endif
 
+TRACE_EVENT(contention_begin,
+
+	TP_PROTO(void *lock, unsigned long ip, unsigned int flags),
+
+	TP_ARGS(lock, ip, flags),
+
+	TP_STRUCT__entry(
+		__field(void *, lock_addr)
+		__field(unsigned long, ip)
+		__field(unsigned int, flags)
+	),
+
+	TP_fast_assign(
+		__entry->lock_addr = lock;
+		__entry->ip = ip;
+		__entry->flags = flags;
+	),
+
+	TP_printk("%p %pS (%x)", __entry->lock_addr,  (void *) __entry->ip,
+		  __entry->flags)
+);
+
+TRACE_EVENT(contention_end,
+
+	TP_PROTO(void *lock),
+
+	TP_ARGS(lock),
+
+	TP_STRUCT__entry(
+		__field(void *, lock_addr)
+	),
+
+	TP_fast_assign(
+		__entry->lock_addr = lock;
+	),
+
+	TP_printk("%p", __entry->lock_addr)
+);
+
 #endif /* _TRACE_LOCK_H */
 
 /* This part must be outside protection */
diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile
index d51cabf28f38..d212401adcdc 100644
--- a/kernel/locking/Makefile
+++ b/kernel/locking/Makefile
@@ -3,7 +3,7 @@
 # and is generally not a function of system call inputs.
 KCOV_INSTRUMENT		:= n
 
-obj-y += mutex.o semaphore.o rwsem.o percpu-rwsem.o
+obj-y += mutex.o semaphore.o rwsem.o percpu-rwsem.o tracepoint.o
 
 # Avoid recursion lockdep -> KCSAN -> ... -> lockdep.
 KCSAN_SANITIZE_lockdep.o := n
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 50036c10b518..08f8fb6a2d1e 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -60,7 +60,6 @@
 
 #include "lockdep_internals.h"
 
-#define CREATE_TRACE_POINTS
 #include <trace/events/lock.h>
 
 #ifdef CONFIG_PROVE_LOCKING
diff --git a/kernel/locking/tracepoint.c b/kernel/locking/tracepoint.c
new file mode 100644
index 000000000000..d6f5c6c1d7bd
--- /dev/null
+++ b/kernel/locking/tracepoint.c
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#include <linux/lock_trace.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/lock.h>
+
+/* these are exported via LOCK_CONTENTION_{BEGIN,END} macro */
+EXPORT_TRACEPOINT_SYMBOL_GPL(contention_begin);
+EXPORT_TRACEPOINT_SYMBOL_GPL(contention_end);
+
+void lock_contention_begin(void *lock, unsigned long ip, unsigned int flags)
+{
+	trace_contention_begin(lock, ip, flags);
+}
+EXPORT_SYMBOL_GPL(lock_contention_begin);
+
+void lock_contention_end(void *lock)
+{
+	trace_contention_end(lock);
+}
+EXPORT_SYMBOL_GPL(lock_contention_end);
-- 
2.35.1.574.g5d30c73bfb-goog



* [PATCH 2/4] locking: Apply contention tracepoints in the slow path
  2022-03-01  1:04 [RFC 0/4] locking: Add new lock contention tracepoints (v2) Namhyung Kim
  2022-03-01  1:04 ` [PATCH 1/4] locking: Add lock contention tracepoints Namhyung Kim
@ 2022-03-01  1:04 ` Namhyung Kim
  2022-03-01  8:43   ` Peter Zijlstra
  2022-03-01  9:03   ` Peter Zijlstra
  2022-03-01  1:04 ` [PATCH 3/4] locking/mutex: Pass proper call-site ip Namhyung Kim
  2022-03-01  1:04 ` [PATCH 4/4] locking/rwsem: " Namhyung Kim
  3 siblings, 2 replies; 17+ messages in thread
From: Namhyung Kim @ 2022-03-01  1:04 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng
  Cc: LKML, Thomas Gleixner, Steven Rostedt, Mathieu Desnoyers,
	Byungchul Park, Paul E. McKenney, Arnd Bergmann, linux-arch, bpf,
	Radoslaw Burny

Add the lock contention tracepoints in various lock function slow
paths.  Note that each arch can define its spinlock differently, so I
only added them to the generic queued spinlock and rwlock for now.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 include/asm-generic/qrwlock.h   |  5 +++++
 include/asm-generic/qspinlock.h |  3 +++
 include/trace/events/lock.h     |  1 +
 kernel/locking/mutex.c          |  4 ++++
 kernel/locking/percpu-rwsem.c   | 11 ++++++++++-
 kernel/locking/rtmutex.c        | 12 +++++++++++-
 kernel/locking/rwbase_rt.c      | 11 ++++++++++-
 kernel/locking/rwsem.c          | 16 ++++++++++++++--
 8 files changed, 58 insertions(+), 5 deletions(-)

diff --git a/include/asm-generic/qrwlock.h b/include/asm-generic/qrwlock.h
index 7ae0ece07b4e..9735c39b05bb 100644
--- a/include/asm-generic/qrwlock.h
+++ b/include/asm-generic/qrwlock.h
@@ -12,6 +12,7 @@
 #include <linux/atomic.h>
 #include <asm/barrier.h>
 #include <asm/processor.h>
+#include <linux/lock_trace.h>
 
 #include <asm-generic/qrwlock_types.h>
 
@@ -80,7 +81,9 @@ static inline void queued_read_lock(struct qrwlock *lock)
 		return;
 
 	/* The slowpath will decrement the reader count, if necessary. */
+	LOCK_CONTENTION_BEGIN(lock, LCB_F_READ);
 	queued_read_lock_slowpath(lock);
+	LOCK_CONTENTION_END(lock);
 }
 
 /**
@@ -94,7 +97,9 @@ static inline void queued_write_lock(struct qrwlock *lock)
 	if (likely(atomic_try_cmpxchg_acquire(&lock->cnts, &cnts, _QW_LOCKED)))
 		return;
 
+	LOCK_CONTENTION_BEGIN(lock, LCB_F_WRITE);
 	queued_write_lock_slowpath(lock);
+	LOCK_CONTENTION_END(lock);
 }
 
 /**
diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h
index d74b13825501..986b96fadbf9 100644
--- a/include/asm-generic/qspinlock.h
+++ b/include/asm-generic/qspinlock.h
@@ -12,6 +12,7 @@
 
 #include <asm-generic/qspinlock_types.h>
 #include <linux/atomic.h>
+#include <linux/lock_trace.h>
 
 #ifndef queued_spin_is_locked
 /**
@@ -82,7 +83,9 @@ static __always_inline void queued_spin_lock(struct qspinlock *lock)
 	if (likely(atomic_try_cmpxchg_acquire(&lock->val, &val, _Q_LOCKED_VAL)))
 		return;
 
+	LOCK_CONTENTION_BEGIN(lock, 0);
 	queued_spin_lock_slowpath(lock, val);
+	LOCK_CONTENTION_END(lock);
 }
 #endif
 
diff --git a/include/trace/events/lock.h b/include/trace/events/lock.h
index 7bca0a537dbd..9b285083f88f 100644
--- a/include/trace/events/lock.h
+++ b/include/trace/events/lock.h
@@ -6,6 +6,7 @@
 #define _TRACE_LOCK_H
 
 #include <linux/tracepoint.h>
+#include <linux/lock_trace.h>
 
 #ifdef CONFIG_LOCKDEP
 
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 5e3585950ec8..756624c14dfd 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -30,6 +30,8 @@
 #include <linux/debug_locks.h>
 #include <linux/osq_lock.h>
 
+#include <trace/events/lock.h>
+
 #ifndef CONFIG_PREEMPT_RT
 #include "mutex.h"
 
@@ -626,6 +628,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
 		waiter.ww_ctx = ww_ctx;
 
 	lock_contended(&lock->dep_map, ip);
+	trace_contention_begin(lock, ip, state);
 
 	if (!use_ww_ctx) {
 		/* add waiting tasks to the end of the waitqueue (FIFO): */
@@ -688,6 +691,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
 	}
 	raw_spin_lock(&lock->wait_lock);
 acquired:
+	trace_contention_end(lock);
 	__set_current_state(TASK_RUNNING);
 
 	if (ww_ctx) {
diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c
index c9fdae94e098..4049b79b3dcc 100644
--- a/kernel/locking/percpu-rwsem.c
+++ b/kernel/locking/percpu-rwsem.c
@@ -9,6 +9,7 @@
 #include <linux/sched/task.h>
 #include <linux/sched/debug.h>
 #include <linux/errno.h>
+#include <trace/events/lock.h>
 
 int __percpu_init_rwsem(struct percpu_rw_semaphore *sem,
 			const char *name, struct lock_class_key *key)
@@ -171,9 +172,12 @@ bool __sched __percpu_down_read(struct percpu_rw_semaphore *sem, bool try)
 	if (try)
 		return false;
 
+	trace_contention_begin(sem, _RET_IP_,
+			       LCB_F_READ | LCB_F_PERCPU | TASK_UNINTERRUPTIBLE);
 	preempt_enable();
 	percpu_rwsem_wait(sem, /* .reader = */ true);
 	preempt_disable();
+	trace_contention_end(sem);
 
 	return true;
 }
@@ -224,8 +228,13 @@ void __sched percpu_down_write(struct percpu_rw_semaphore *sem)
 	 * Try set sem->block; this provides writer-writer exclusion.
 	 * Having sem->block set makes new readers block.
 	 */
-	if (!__percpu_down_write_trylock(sem))
+	if (!__percpu_down_write_trylock(sem)) {
+		unsigned int flags = LCB_F_WRITE | LCB_F_PERCPU | TASK_UNINTERRUPTIBLE;
+
+		trace_contention_begin(sem, _RET_IP_, flags);
 		percpu_rwsem_wait(sem, /* .reader = */ false);
+		trace_contention_end(sem);
+	}
 
 	/* smp_mb() implied by __percpu_down_write_trylock() on success -- D matches A */
 
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 8555c4efe97c..e49f5d2a232b 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -24,6 +24,8 @@
 #include <linux/sched/wake_q.h>
 #include <linux/ww_mutex.h>
 
+#include <trace/events/lock.h>
+
 #include "rtmutex_common.h"
 
 #ifndef WW_RT
@@ -1652,10 +1654,16 @@ static int __sched rt_mutex_slowlock(struct rt_mutex_base *lock,
 static __always_inline int __rt_mutex_lock(struct rt_mutex_base *lock,
 					   unsigned int state)
 {
+	int ret;
+
 	if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current)))
 		return 0;
 
-	return rt_mutex_slowlock(lock, NULL, state);
+	trace_contention_begin(lock, _RET_IP_, LCB_F_RT | state);
+	ret = rt_mutex_slowlock(lock, NULL, state);
+	trace_contention_end(lock);
+
+	return ret;
 }
 #endif /* RT_MUTEX_BUILD_MUTEX */
 
@@ -1718,9 +1726,11 @@ static __always_inline void __sched rtlock_slowlock(struct rt_mutex_base *lock)
 {
 	unsigned long flags;
 
+	trace_contention_begin(lock, _RET_IP_, LCB_F_RT | TASK_RTLOCK_WAIT);
 	raw_spin_lock_irqsave(&lock->wait_lock, flags);
 	rtlock_slowlock_locked(lock);
 	raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
+	trace_contention_end(lock);
 }
 
 #endif /* RT_MUTEX_BUILD_SPINLOCKS */
diff --git a/kernel/locking/rwbase_rt.c b/kernel/locking/rwbase_rt.c
index 6fd3162e4098..8a28f1195c58 100644
--- a/kernel/locking/rwbase_rt.c
+++ b/kernel/locking/rwbase_rt.c
@@ -136,10 +136,16 @@ static int __sched __rwbase_read_lock(struct rwbase_rt *rwb,
 static __always_inline int rwbase_read_lock(struct rwbase_rt *rwb,
 					    unsigned int state)
 {
+	int ret;
+
 	if (rwbase_read_trylock(rwb))
 		return 0;
 
-	return __rwbase_read_lock(rwb, state);
+	trace_contention_begin(rwb, _RET_IP_, LCB_F_READ | LCB_F_RT | state);
+	ret = __rwbase_read_lock(rwb, state);
+	trace_contention_end(rwb);
+
+	return ret;
 }
 
 static void __sched __rwbase_read_unlock(struct rwbase_rt *rwb,
@@ -246,12 +252,14 @@ static int __sched rwbase_write_lock(struct rwbase_rt *rwb,
 	if (__rwbase_write_trylock(rwb))
 		goto out_unlock;
 
+	trace_contention_begin(rwb, _RET_IP_, LCB_F_WRITE | LCB_F_RT | state);
 	rwbase_set_and_save_current_state(state);
 	for (;;) {
 		/* Optimized out for rwlocks */
 		if (rwbase_signal_pending_state(state, current)) {
 			rwbase_restore_current_state();
 			__rwbase_write_unlock(rwb, 0, flags);
+			trace_contention_end(rwb);
 			return -EINTR;
 		}
 
@@ -265,6 +273,7 @@ static int __sched rwbase_write_lock(struct rwbase_rt *rwb,
 		set_current_state(state);
 	}
 	rwbase_restore_current_state();
+	trace_contention_end(rwb);
 
 out_unlock:
 	raw_spin_unlock_irqrestore(&rtm->wait_lock, flags);
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index acde5d6f1254..a1a17af7f747 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -27,6 +27,7 @@
 #include <linux/export.h>
 #include <linux/rwsem.h>
 #include <linux/atomic.h>
+#include <trace/events/lock.h>
 
 #ifndef CONFIG_PREEMPT_RT
 #include "lock_events.h"
@@ -1209,9 +1210,14 @@ static struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem)
 static inline int __down_read_common(struct rw_semaphore *sem, int state)
 {
 	long count;
+	void *ret;
 
 	if (!rwsem_read_trylock(sem, &count)) {
-		if (IS_ERR(rwsem_down_read_slowpath(sem, count, state)))
+		trace_contention_begin(sem, _RET_IP_, LCB_F_READ | state);
+		ret = rwsem_down_read_slowpath(sem, count, state);
+		trace_contention_end(sem);
+
+		if (IS_ERR(ret))
 			return -EINTR;
 		DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
 	}
@@ -1255,8 +1261,14 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
  */
 static inline int __down_write_common(struct rw_semaphore *sem, int state)
 {
+	void *ret;
+
 	if (unlikely(!rwsem_write_trylock(sem))) {
-		if (IS_ERR(rwsem_down_write_slowpath(sem, state)))
+		trace_contention_begin(sem, _RET_IP_, LCB_F_WRITE | state);
+		ret = rwsem_down_write_slowpath(sem, state);
+		trace_contention_end(sem);
+
+		if (IS_ERR(ret))
 			return -EINTR;
 	}
 
-- 
2.35.1.574.g5d30c73bfb-goog



* [PATCH 3/4] locking/mutex: Pass proper call-site ip
  2022-03-01  1:04 [RFC 0/4] locking: Add new lock contention tracepoints (v2) Namhyung Kim
  2022-03-01  1:04 ` [PATCH 1/4] locking: Add lock contention tracepoints Namhyung Kim
  2022-03-01  1:04 ` [PATCH 2/4] locking: Apply contention tracepoints in the slow path Namhyung Kim
@ 2022-03-01  1:04 ` Namhyung Kim
  2022-03-01  9:05   ` Peter Zijlstra
  2022-03-01  1:04 ` [PATCH 4/4] locking/rwsem: " Namhyung Kim
  3 siblings, 1 reply; 17+ messages in thread
From: Namhyung Kim @ 2022-03-01  1:04 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng
  Cc: LKML, Thomas Gleixner, Steven Rostedt, Mathieu Desnoyers,
	Byungchul Park, Paul E. McKenney, Arnd Bergmann, linux-arch, bpf,
	Radoslaw Burny

__mutex_lock_slowpath() and friends are declared noinline, so _RET_IP_
in them reports their caller as mutex_lock(), which is not meaningful.
Pass the ip from mutex_lock() to have the actual caller info in the
trace.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 kernel/locking/mutex.c | 40 ++++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 756624c14dfd..126b014098f3 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -254,7 +254,7 @@ static void __mutex_handoff(struct mutex *lock, struct task_struct *task)
  * We also put the fastpath first in the kernel image, to make sure the
  * branch is predicted by the CPU as default-untaken.
  */
-static void __sched __mutex_lock_slowpath(struct mutex *lock);
+static void __sched __mutex_lock_slowpath(struct mutex *lock, unsigned long ip);
 
 /**
  * mutex_lock - acquire the mutex
@@ -282,7 +282,7 @@ void __sched mutex_lock(struct mutex *lock)
 	might_sleep();
 
 	if (!__mutex_trylock_fast(lock))
-		__mutex_lock_slowpath(lock);
+		__mutex_lock_slowpath(lock, _RET_IP_);
 }
 EXPORT_SYMBOL(mutex_lock);
 #endif
@@ -947,10 +947,10 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
  * mutex_lock_interruptible() and mutex_trylock().
  */
 static noinline int __sched
-__mutex_lock_killable_slowpath(struct mutex *lock);
+__mutex_lock_killable_slowpath(struct mutex *lock, unsigned long ip);
 
 static noinline int __sched
-__mutex_lock_interruptible_slowpath(struct mutex *lock);
+__mutex_lock_interruptible_slowpath(struct mutex *lock, unsigned long ip);
 
 /**
  * mutex_lock_interruptible() - Acquire the mutex, interruptible by signals.
@@ -971,7 +971,7 @@ int __sched mutex_lock_interruptible(struct mutex *lock)
 	if (__mutex_trylock_fast(lock))
 		return 0;
 
-	return __mutex_lock_interruptible_slowpath(lock);
+	return __mutex_lock_interruptible_slowpath(lock, _RET_IP_);
 }
 
 EXPORT_SYMBOL(mutex_lock_interruptible);
@@ -995,7 +995,7 @@ int __sched mutex_lock_killable(struct mutex *lock)
 	if (__mutex_trylock_fast(lock))
 		return 0;
 
-	return __mutex_lock_killable_slowpath(lock);
+	return __mutex_lock_killable_slowpath(lock, _RET_IP_);
 }
 EXPORT_SYMBOL(mutex_lock_killable);
 
@@ -1020,36 +1020,36 @@ void __sched mutex_lock_io(struct mutex *lock)
 EXPORT_SYMBOL_GPL(mutex_lock_io);
 
 static noinline void __sched
-__mutex_lock_slowpath(struct mutex *lock)
+__mutex_lock_slowpath(struct mutex *lock, unsigned long ip)
 {
-	__mutex_lock(lock, TASK_UNINTERRUPTIBLE, 0, NULL, _RET_IP_);
+	__mutex_lock(lock, TASK_UNINTERRUPTIBLE, 0, NULL, ip);
 }
 
 static noinline int __sched
-__mutex_lock_killable_slowpath(struct mutex *lock)
+__mutex_lock_killable_slowpath(struct mutex *lock, unsigned long ip)
 {
-	return __mutex_lock(lock, TASK_KILLABLE, 0, NULL, _RET_IP_);
+	return __mutex_lock(lock, TASK_KILLABLE, 0, NULL, ip);
 }
 
 static noinline int __sched
-__mutex_lock_interruptible_slowpath(struct mutex *lock)
+__mutex_lock_interruptible_slowpath(struct mutex *lock, unsigned long ip)
 {
-	return __mutex_lock(lock, TASK_INTERRUPTIBLE, 0, NULL, _RET_IP_);
+	return __mutex_lock(lock, TASK_INTERRUPTIBLE, 0, NULL, ip);
 }
 
 static noinline int __sched
-__ww_mutex_lock_slowpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
+__ww_mutex_lock_slowpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx,
+			 unsigned long ip)
 {
-	return __ww_mutex_lock(&lock->base, TASK_UNINTERRUPTIBLE, 0,
-			       _RET_IP_, ctx);
+	return __ww_mutex_lock(&lock->base, TASK_UNINTERRUPTIBLE, 0, ip, ctx);
 }
 
 static noinline int __sched
 __ww_mutex_lock_interruptible_slowpath(struct ww_mutex *lock,
-					    struct ww_acquire_ctx *ctx)
+				       struct ww_acquire_ctx *ctx,
+				       unsigned long ip)
 {
-	return __ww_mutex_lock(&lock->base, TASK_INTERRUPTIBLE, 0,
-			       _RET_IP_, ctx);
+	return __ww_mutex_lock(&lock->base, TASK_INTERRUPTIBLE, 0, ip, ctx);
 }
 
 #endif
@@ -1094,7 +1094,7 @@ ww_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 		return 0;
 	}
 
-	return __ww_mutex_lock_slowpath(lock, ctx);
+	return __ww_mutex_lock_slowpath(lock, ctx, _RET_IP_);
 }
 EXPORT_SYMBOL(ww_mutex_lock);
 
@@ -1109,7 +1109,7 @@ ww_mutex_lock_interruptible(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 		return 0;
 	}
 
-	return __ww_mutex_lock_interruptible_slowpath(lock, ctx);
+	return __ww_mutex_lock_interruptible_slowpath(lock, ctx, _RET_IP_);
 }
 EXPORT_SYMBOL(ww_mutex_lock_interruptible);
 
-- 
2.35.1.574.g5d30c73bfb-goog



* [PATCH 4/4] locking/rwsem: Pass proper call-site ip
  2022-03-01  1:04 [RFC 0/4] locking: Add new lock contention tracepoints (v2) Namhyung Kim
                   ` (2 preceding siblings ...)
  2022-03-01  1:04 ` [PATCH 3/4] locking/mutex: Pass proper call-site ip Namhyung Kim
@ 2022-03-01  1:04 ` Namhyung Kim
  3 siblings, 0 replies; 17+ messages in thread
From: Namhyung Kim @ 2022-03-01  1:04 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng
  Cc: LKML, Thomas Gleixner, Steven Rostedt, Mathieu Desnoyers,
	Byungchul Park, Paul E. McKenney, Arnd Bergmann, linux-arch, bpf,
	Radoslaw Burny

For some reason, __down_read_common() was not inlined on my system, so
I could only see its caller down_read() in the tracepoint.  It should
pass the IP of the actual caller instead.  Let's add new variants of
the LOCK_CONTENDED macros that pass _RET_IP_ to the lock function, and
make the rwsem down functions take an ip argument.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 include/linux/lockdep.h | 29 ++++++++++++++++-
 kernel/locking/rwsem.c  | 69 ++++++++++++++++++++++-------------------
 2 files changed, 65 insertions(+), 33 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 467b94257105..6aca885f356c 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -453,7 +453,16 @@ do {								\
 		lock_contended(&(_lock)->dep_map, _RET_IP_);	\
 		lock(_lock);					\
 	}							\
-	lock_acquired(&(_lock)->dep_map, _RET_IP_);			\
+	lock_acquired(&(_lock)->dep_map, _RET_IP_);		\
+} while (0)
+
+#define LOCK_CONTENDED_IP(_lock, try, lock)			\
+do {								\
+	if (!try(_lock)) {					\
+		lock_contended(&(_lock)->dep_map, _RET_IP_);	\
+		lock(_lock, _RET_IP_);				\
+	}							\
+	lock_acquired(&(_lock)->dep_map, _RET_IP_);		\
 } while (0)
 
 #define LOCK_CONTENDED_RETURN(_lock, try, lock)			\
@@ -468,6 +477,18 @@ do {								\
 	____err;						\
 })
 
+#define LOCK_CONTENDED_RETURN_IP(_lock, try, lock)		\
+({								\
+	int ____err = 0;					\
+	if (!try(_lock)) {					\
+		lock_contended(&(_lock)->dep_map, _RET_IP_);	\
+		____err = lock(_lock, _RET_IP_);		\
+	}							\
+	if (!____err)						\
+		lock_acquired(&(_lock)->dep_map, _RET_IP_);	\
+	____err;						\
+})
+
 #else /* CONFIG_LOCK_STAT */
 
 #define lock_contended(lockdep_map, ip) do {} while (0)
@@ -476,9 +497,15 @@ do {								\
 #define LOCK_CONTENDED(_lock, try, lock) \
 	lock(_lock)
 
+#define LOCK_CONTENDED_IP(_lock, try, lock) \
+	lock(_lock, _RET_IP_)
+
 #define LOCK_CONTENDED_RETURN(_lock, try, lock) \
 	lock(_lock)
 
+#define LOCK_CONTENDED_RETURN_IP(_lock, try, lock) \
+	lock(_lock, _RET_IP_)
+
 #endif /* CONFIG_LOCK_STAT */
 
 #ifdef CONFIG_PROVE_LOCKING
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index a1a17af7f747..eafb0faaed0d 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -1207,13 +1207,14 @@ static struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem)
 /*
  * lock for reading
  */
-static inline int __down_read_common(struct rw_semaphore *sem, int state)
+static inline int __down_read_common(struct rw_semaphore *sem, int state,
+				     unsigned long ip)
 {
 	long count;
 	void *ret;
 
 	if (!rwsem_read_trylock(sem, &count)) {
-		trace_contention_begin(sem, _RET_IP_, LCB_F_READ | state);
+		trace_contention_begin(sem, ip, LCB_F_READ | state);
 		ret = rwsem_down_read_slowpath(sem, count, state);
 		trace_contention_end(sem);
 
@@ -1224,19 +1225,19 @@ static inline int __down_read_common(struct rw_semaphore *sem, int state)
 	return 0;
 }
 
-static inline void __down_read(struct rw_semaphore *sem)
+static inline void __down_read(struct rw_semaphore *sem, unsigned long ip)
 {
-	__down_read_common(sem, TASK_UNINTERRUPTIBLE);
+	__down_read_common(sem, TASK_UNINTERRUPTIBLE, ip);
 }
 
-static inline int __down_read_interruptible(struct rw_semaphore *sem)
+static inline int __down_read_interruptible(struct rw_semaphore *sem, unsigned long ip)
 {
-	return __down_read_common(sem, TASK_INTERRUPTIBLE);
+	return __down_read_common(sem, TASK_INTERRUPTIBLE, ip);
 }
 
-static inline int __down_read_killable(struct rw_semaphore *sem)
+static inline int __down_read_killable(struct rw_semaphore *sem, unsigned long ip)
 {
-	return __down_read_common(sem, TASK_KILLABLE);
+	return __down_read_common(sem, TASK_KILLABLE, ip);
 }
 
 static inline int __down_read_trylock(struct rw_semaphore *sem)
@@ -1259,12 +1260,13 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
 /*
  * lock for writing
  */
-static inline int __down_write_common(struct rw_semaphore *sem, int state)
+static inline int __down_write_common(struct rw_semaphore *sem, int state,
+				      unsigned long ip)
 {
 	void *ret;
 
 	if (unlikely(!rwsem_write_trylock(sem))) {
-		trace_contention_begin(sem, _RET_IP_, LCB_F_WRITE | state);
+		trace_contention_begin(sem, ip, LCB_F_WRITE | state);
 		ret = rwsem_down_write_slowpath(sem, state);
 		trace_contention_end(sem);
 
@@ -1275,14 +1277,14 @@ static inline int __down_write_common(struct rw_semaphore *sem, int state)
 	return 0;
 }
 
-static inline void __down_write(struct rw_semaphore *sem)
+static inline void __down_write(struct rw_semaphore *sem, unsigned long ip)
 {
-	__down_write_common(sem, TASK_UNINTERRUPTIBLE);
+	__down_write_common(sem, TASK_UNINTERRUPTIBLE, ip);
 }
 
-static inline int __down_write_killable(struct rw_semaphore *sem)
+static inline int __down_write_killable(struct rw_semaphore *sem, unsigned long ip)
 {
-	return __down_write_common(sem, TASK_KILLABLE);
+	return __down_write_common(sem, TASK_KILLABLE, ip);
 }
 
 static inline int __down_write_trylock(struct rw_semaphore *sem)
@@ -1397,17 +1399,17 @@ void __init_rwsem(struct rw_semaphore *sem, const char *name,
 }
 EXPORT_SYMBOL(__init_rwsem);
 
-static inline void __down_read(struct rw_semaphore *sem)
+static inline void __down_read(struct rw_semaphore *sem, unsigned long ip)
 {
 	rwbase_read_lock(&sem->rwbase, TASK_UNINTERRUPTIBLE);
 }
 
-static inline int __down_read_interruptible(struct rw_semaphore *sem)
+static inline int __down_read_interruptible(struct rw_semaphore *sem, unsigned long ip)
 {
 	return rwbase_read_lock(&sem->rwbase, TASK_INTERRUPTIBLE);
 }
 
-static inline int __down_read_killable(struct rw_semaphore *sem)
+static inline int __down_read_killable(struct rw_semaphore *sem, unsigned long ip)
 {
 	return rwbase_read_lock(&sem->rwbase, TASK_KILLABLE);
 }
@@ -1422,12 +1424,12 @@ static inline void __up_read(struct rw_semaphore *sem)
 	rwbase_read_unlock(&sem->rwbase, TASK_NORMAL);
 }
 
-static inline void __sched __down_write(struct rw_semaphore *sem)
+static inline void __sched __down_write(struct rw_semaphore *sem, unsigned long ip)
 {
 	rwbase_write_lock(&sem->rwbase, TASK_UNINTERRUPTIBLE);
 }
 
-static inline int __sched __down_write_killable(struct rw_semaphore *sem)
+static inline int __sched __down_write_killable(struct rw_semaphore *sem, unsigned long ip)
 {
 	return rwbase_write_lock(&sem->rwbase, TASK_KILLABLE);
 }
@@ -1472,7 +1474,7 @@ void __sched down_read(struct rw_semaphore *sem)
 	might_sleep();
 	rwsem_acquire_read(&sem->dep_map, 0, 0, _RET_IP_);
 
-	LOCK_CONTENDED(sem, __down_read_trylock, __down_read);
+	LOCK_CONTENDED_IP(sem, __down_read_trylock, __down_read);
 }
 EXPORT_SYMBOL(down_read);
 
@@ -1481,7 +1483,8 @@ int __sched down_read_interruptible(struct rw_semaphore *sem)
 	might_sleep();
 	rwsem_acquire_read(&sem->dep_map, 0, 0, _RET_IP_);
 
-	if (LOCK_CONTENDED_RETURN(sem, __down_read_trylock, __down_read_interruptible)) {
+	if (LOCK_CONTENDED_RETURN_IP(sem, __down_read_trylock,
+				     __down_read_interruptible)) {
 		rwsem_release(&sem->dep_map, _RET_IP_);
 		return -EINTR;
 	}
@@ -1495,7 +1498,8 @@ int __sched down_read_killable(struct rw_semaphore *sem)
 	might_sleep();
 	rwsem_acquire_read(&sem->dep_map, 0, 0, _RET_IP_);
 
-	if (LOCK_CONTENDED_RETURN(sem, __down_read_trylock, __down_read_killable)) {
+	if (LOCK_CONTENDED_RETURN_IP(sem, __down_read_trylock,
+				     __down_read_killable)) {
 		rwsem_release(&sem->dep_map, _RET_IP_);
 		return -EINTR;
 	}
@@ -1524,7 +1528,7 @@ void __sched down_write(struct rw_semaphore *sem)
 {
 	might_sleep();
 	rwsem_acquire(&sem->dep_map, 0, 0, _RET_IP_);
-	LOCK_CONTENDED(sem, __down_write_trylock, __down_write);
+	LOCK_CONTENDED_IP(sem, __down_write_trylock, __down_write);
 }
 EXPORT_SYMBOL(down_write);
 
@@ -1536,8 +1540,8 @@ int __sched down_write_killable(struct rw_semaphore *sem)
 	might_sleep();
 	rwsem_acquire(&sem->dep_map, 0, 0, _RET_IP_);
 
-	if (LOCK_CONTENDED_RETURN(sem, __down_write_trylock,
-				  __down_write_killable)) {
+	if (LOCK_CONTENDED_RETURN_IP(sem, __down_write_trylock,
+				     __down_write_killable)) {
 		rwsem_release(&sem->dep_map, _RET_IP_);
 		return -EINTR;
 	}
@@ -1596,7 +1600,7 @@ void down_read_nested(struct rw_semaphore *sem, int subclass)
 {
 	might_sleep();
 	rwsem_acquire_read(&sem->dep_map, subclass, 0, _RET_IP_);
-	LOCK_CONTENDED(sem, __down_read_trylock, __down_read);
+	LOCK_CONTENDED_IP(sem, __down_read_trylock, __down_read);
 }
 EXPORT_SYMBOL(down_read_nested);
 
@@ -1605,7 +1609,8 @@ int down_read_killable_nested(struct rw_semaphore *sem, int subclass)
 	might_sleep();
 	rwsem_acquire_read(&sem->dep_map, subclass, 0, _RET_IP_);
 
-	if (LOCK_CONTENDED_RETURN(sem, __down_read_trylock, __down_read_killable)) {
+	if (LOCK_CONTENDED_RETURN_IP(sem, __down_read_trylock,
+				     __down_read_killable)) {
 		rwsem_release(&sem->dep_map, _RET_IP_);
 		return -EINTR;
 	}
@@ -1618,14 +1623,14 @@ void _down_write_nest_lock(struct rw_semaphore *sem, struct lockdep_map *nest)
 {
 	might_sleep();
 	rwsem_acquire_nest(&sem->dep_map, 0, 0, nest, _RET_IP_);
-	LOCK_CONTENDED(sem, __down_write_trylock, __down_write);
+	LOCK_CONTENDED_IP(sem, __down_write_trylock, __down_write);
 }
 EXPORT_SYMBOL(_down_write_nest_lock);
 
 void down_read_non_owner(struct rw_semaphore *sem)
 {
 	might_sleep();
-	__down_read(sem);
+	__down_read(sem, _RET_IP_);
 	__rwsem_set_reader_owned(sem, NULL);
 }
 EXPORT_SYMBOL(down_read_non_owner);
@@ -1634,7 +1639,7 @@ void down_write_nested(struct rw_semaphore *sem, int subclass)
 {
 	might_sleep();
 	rwsem_acquire(&sem->dep_map, subclass, 0, _RET_IP_);
-	LOCK_CONTENDED(sem, __down_write_trylock, __down_write);
+	LOCK_CONTENDED_IP(sem, __down_write_trylock, __down_write);
 }
 EXPORT_SYMBOL(down_write_nested);
 
@@ -1643,8 +1648,8 @@ int __sched down_write_killable_nested(struct rw_semaphore *sem, int subclass)
 	might_sleep();
 	rwsem_acquire(&sem->dep_map, subclass, 0, _RET_IP_);
 
-	if (LOCK_CONTENDED_RETURN(sem, __down_write_trylock,
-				  __down_write_killable)) {
+	if (LOCK_CONTENDED_RETURN_IP(sem, __down_write_trylock,
+				     __down_write_killable)) {
 		rwsem_release(&sem->dep_map, _RET_IP_);
 		return -EINTR;
 	}
-- 
2.35.1.574.g5d30c73bfb-goog



* Re: [PATCH 2/4] locking: Apply contention tracepoints in the slow path
  2022-03-01  1:04 ` [PATCH 2/4] locking: Apply contention tracepoints in the slow path Namhyung Kim
@ 2022-03-01  8:43   ` Peter Zijlstra
  2022-03-01 18:03     ` Namhyung Kim
  2022-03-01  9:03   ` Peter Zijlstra
  1 sibling, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2022-03-01  8:43 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng, LKML,
	Thomas Gleixner, Steven Rostedt, Mathieu Desnoyers,
	Byungchul Park, Paul E. McKenney, Arnd Bergmann, linux-arch, bpf,
	Radoslaw Burny

On Mon, Feb 28, 2022 at 05:04:10PM -0800, Namhyung Kim wrote:
> @@ -80,7 +81,9 @@ static inline void queued_read_lock(struct qrwlock *lock)
>  		return;
>  
>  	/* The slowpath will decrement the reader count, if necessary. */
> +	LOCK_CONTENTION_BEGIN(lock, LCB_F_READ);
>  	queued_read_lock_slowpath(lock);
> +	LOCK_CONTENTION_END(lock);
>  }
>  
>  /**
> @@ -94,7 +97,9 @@ static inline void queued_write_lock(struct qrwlock *lock)
>  	if (likely(atomic_try_cmpxchg_acquire(&lock->cnts, &cnts, _QW_LOCKED)))
>  		return;
>  
> +	LOCK_CONTENTION_BEGIN(lock, LCB_F_WRITE);
>  	queued_write_lock_slowpath(lock);
> +	LOCK_CONTENTION_END(lock);
>  }

> @@ -82,7 +83,9 @@ static __always_inline void queued_spin_lock(struct qspinlock *lock)
>  	if (likely(atomic_try_cmpxchg_acquire(&lock->val, &val, _Q_LOCKED_VAL)))
>  		return;
>  
> +	LOCK_CONTENTION_BEGIN(lock, 0);
>  	queued_spin_lock_slowpath(lock, val);
> +	LOCK_CONTENTION_END(lock);
>  }

Can you please stick that _inside_ the slowpath? You really don't want
to inline that.


* Re: [PATCH 2/4] locking: Apply contention tracepoints in the slow path
  2022-03-01  1:04 ` [PATCH 2/4] locking: Apply contention tracepoints in the slow path Namhyung Kim
  2022-03-01  8:43   ` Peter Zijlstra
@ 2022-03-01  9:03   ` Peter Zijlstra
  2022-03-01 14:45     ` Steven Rostedt
                       ` (2 more replies)
  1 sibling, 3 replies; 17+ messages in thread
From: Peter Zijlstra @ 2022-03-01  9:03 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng, LKML,
	Thomas Gleixner, Steven Rostedt, Mathieu Desnoyers,
	Byungchul Park, Paul E. McKenney, Arnd Bergmann, linux-arch, bpf,
	Radoslaw Burny

On Mon, Feb 28, 2022 at 05:04:10PM -0800, Namhyung Kim wrote:

> @@ -171,9 +172,12 @@ bool __sched __percpu_down_read(struct percpu_rw_semaphore *sem, bool try)
>  	if (try)
>  		return false;
>  
> +	trace_contention_begin(sem, _RET_IP_,
> +			       LCB_F_READ | LCB_F_PERCPU | TASK_UNINTERRUPTIBLE);

That is a bit unwieldy, isn't it ?

>  	preempt_enable();
>  	percpu_rwsem_wait(sem, /* .reader = */ true);
>  	preempt_disable();
> +	trace_contention_end(sem);
>  
>  	return true;
>  }
> @@ -224,8 +228,13 @@ void __sched percpu_down_write(struct percpu_rw_semaphore *sem)
>  	 * Try set sem->block; this provides writer-writer exclusion.
>  	 * Having sem->block set makes new readers block.
>  	 */
> -	if (!__percpu_down_write_trylock(sem))
> +	if (!__percpu_down_write_trylock(sem)) {
> +		unsigned int flags = LCB_F_WRITE | LCB_F_PERCPU | TASK_UNINTERRUPTIBLE;
> +
> +		trace_contention_begin(sem, _RET_IP_, flags);
>  		percpu_rwsem_wait(sem, /* .reader = */ false);
> +		trace_contention_end(sem);
> +	}
>  
>  	/* smp_mb() implied by __percpu_down_write_trylock() on success -- D matches A */
>  

Wouldn't it be easier to stick all that inside percpu_rwsem_wait() and
have it only once? You can even re-frob the wait loop such that the
tracepoint can use current->__state or something.

diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c
index c9fdae94e098..ca01f8ff88e5 100644
--- a/kernel/locking/percpu-rwsem.c
+++ b/kernel/locking/percpu-rwsem.c
@@ -154,13 +154,16 @@ static void percpu_rwsem_wait(struct percpu_rw_semaphore *sem, bool reader)
 	}
 	spin_unlock_irq(&sem->waiters.lock);
 
+	set_current_state(TASK_UNINTERRUPTIBLE);
+	trace_contention_begin(sem, _RET_IP_, LCB_F_PERCPU | LCB_F_WRITE*!reader);
 	while (wait) {
-		set_current_state(TASK_UNINTERRUPTIBLE);
 		if (!smp_load_acquire(&wq_entry.private))
 			break;
 		schedule();
+		set_current_state(TASK_UNINTERRUPTIBLE);
 	}
 	__set_current_state(TASK_RUNNING);
+	trace_contention_end(sem);
 }
 
 bool __sched __percpu_down_read(struct percpu_rw_semaphore *sem, bool try)

> diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
> index 8555c4efe97c..e49f5d2a232b 100644
> --- a/kernel/locking/rtmutex.c
> +++ b/kernel/locking/rtmutex.c
> @@ -24,6 +24,8 @@
>  #include <linux/sched/wake_q.h>
>  #include <linux/ww_mutex.h>
>  
> +#include <trace/events/lock.h>
> +
>  #include "rtmutex_common.h"
>  
>  #ifndef WW_RT
> @@ -1652,10 +1654,16 @@ static int __sched rt_mutex_slowlock(struct rt_mutex_base *lock,
>  static __always_inline int __rt_mutex_lock(struct rt_mutex_base *lock,
>  					   unsigned int state)
>  {
> +	int ret;
> +
>  	if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current)))
>  		return 0;
>  
> -	return rt_mutex_slowlock(lock, NULL, state);
> +	trace_contention_begin(lock, _RET_IP_, LCB_F_RT | state);
> +	ret = rt_mutex_slowlock(lock, NULL, state);
> +	trace_contention_end(lock);
> +
> +	return ret;
>  }
>  #endif /* RT_MUTEX_BUILD_MUTEX */
>  
> @@ -1718,9 +1726,11 @@ static __always_inline void __sched rtlock_slowlock(struct rt_mutex_base *lock)
>  {
>  	unsigned long flags;
>  
> +	trace_contention_begin(lock, _RET_IP_, LCB_F_RT | TASK_RTLOCK_WAIT);
>  	raw_spin_lock_irqsave(&lock->wait_lock, flags);
>  	rtlock_slowlock_locked(lock);
>  	raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
> +	trace_contention_end(lock);
>  }

Same, if you do it one level in, you can have the tracepoint itself look
at current->__state. Also, you seem to have forgotten to trace the
return value. Now you can't tell if the lock was acquired, or was denied
(ww_mutex) or we were interrupted.

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 8555c4efe97c..18b9f4bf6f34 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1579,6 +1579,8 @@ static int __sched __rt_mutex_slowlock(struct rt_mutex_base *lock,
 
 	set_current_state(state);
 
+	trace_contention_begin(lock, _RET_IP_, LCB_F_RT);
+
 	ret = task_blocks_on_rt_mutex(lock, waiter, current, ww_ctx, chwalk);
 	if (likely(!ret))
 		ret = rt_mutex_slowlock_block(lock, ww_ctx, state, NULL, waiter);
@@ -1601,6 +1603,9 @@ static int __sched __rt_mutex_slowlock(struct rt_mutex_base *lock,
 	 * unconditionally. We might have to fix that up.
 	 */
 	fixup_rt_mutex_waiters(lock);
+
+	trace_contention_end(lock, ret);
+
 	return ret;
 }
 
@@ -1683,6 +1688,8 @@ static void __sched rtlock_slowlock_locked(struct rt_mutex_base *lock)
 	/* Save current state and set state to TASK_RTLOCK_WAIT */
 	current_save_and_set_rtlock_wait_state();
 
+	trace_contention_begin(lock, _RET_IP_, LCB_F_RT);
+
 	task_blocks_on_rt_mutex(lock, &waiter, current, NULL, RT_MUTEX_MIN_CHAINWALK);
 
 	for (;;) {
@@ -1703,6 +1710,8 @@ static void __sched rtlock_slowlock_locked(struct rt_mutex_base *lock)
 		set_current_state(TASK_RTLOCK_WAIT);
 	}
 
+	trace_contention_end(lock, 0);
+
 	/* Restore the task state */
 	current_restore_rtlock_saved_state();
 


* Re: [PATCH 3/4] locking/mutex: Pass proper call-site ip
  2022-03-01  1:04 ` [PATCH 3/4] locking/mutex: Pass proper call-site ip Namhyung Kim
@ 2022-03-01  9:05   ` Peter Zijlstra
  2022-03-01 14:53     ` Steven Rostedt
  2022-03-01 18:25     ` Namhyung Kim
  0 siblings, 2 replies; 17+ messages in thread
From: Peter Zijlstra @ 2022-03-01  9:05 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng, LKML,
	Thomas Gleixner, Steven Rostedt, Mathieu Desnoyers,
	Byungchul Park, Paul E. McKenney, Arnd Bergmann, linux-arch, bpf,
	Radoslaw Burny

On Mon, Feb 28, 2022 at 05:04:11PM -0800, Namhyung Kim wrote:
> The __mutex_lock_slowpath() and friends are declared as noinline and
> _RET_IP_ returns its caller as mutex_lock which is not meaningful.
> Pass the ip from mutex_lock() to have actual caller info in the trace.

Blergh, can't you do a very limited unwind when you do the tracing
instead? 3 or 4 levels should be plenty fast and sufficient.


* Re: [PATCH 2/4] locking: Apply contention tracepoints in the slow path
  2022-03-01  9:03   ` Peter Zijlstra
@ 2022-03-01 14:45     ` Steven Rostedt
  2022-03-01 18:14       ` Namhyung Kim
  2022-03-01 18:11     ` Namhyung Kim
  2022-03-14 21:44     ` Namhyung Kim
  2 siblings, 1 reply; 17+ messages in thread
From: Steven Rostedt @ 2022-03-01 14:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Namhyung Kim, Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng,
	LKML, Thomas Gleixner, Mathieu Desnoyers, Byungchul Park,
	Paul E. McKenney, Arnd Bergmann, linux-arch, bpf, Radoslaw Burny

On Tue, 1 Mar 2022 10:03:54 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
> index 8555c4efe97c..18b9f4bf6f34 100644
> --- a/kernel/locking/rtmutex.c
> +++ b/kernel/locking/rtmutex.c
> @@ -1579,6 +1579,8 @@ static int __sched __rt_mutex_slowlock(struct rt_mutex_base *lock,
>  
>  	set_current_state(state);
>  
> +	trace_contention_begin(lock, _RET_IP_, LCB_F_RT);

I guess one issue with this is that _RET_IP_ will return the rt_mutex
address if this is not inlined, making the _RET_IP_ rather useless.

Now, if we can pass the _RET_IP_ into __rt_mutex_slowlock() then we could
reference that.

-- Steve


> +
>  	ret = task_blocks_on_rt_mutex(lock, waiter, current, ww_ctx, chwalk);
>  	if (likely(!ret))
>  		ret = rt_mutex_slowlock_block(lock, ww_ctx, state, NULL, waiter);
> @@ -1601,6 +1603,9 @@ static int __sched __rt_mutex_slowlock(struct rt_mutex_base *lock,
>  	 * unconditionally. We might have to fix that up.
>  	 */
>  	fixup_rt_mutex_waiters(lock);
> +
> +	trace_contention_end(lock, ret);
> +
>  	return ret;
>  }


* Re: [PATCH 3/4] locking/mutex: Pass proper call-site ip
  2022-03-01  9:05   ` Peter Zijlstra
@ 2022-03-01 14:53     ` Steven Rostedt
  2022-03-01 19:47       ` Peter Zijlstra
  2022-03-01 18:25     ` Namhyung Kim
  1 sibling, 1 reply; 17+ messages in thread
From: Steven Rostedt @ 2022-03-01 14:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Namhyung Kim, Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng,
	LKML, Thomas Gleixner, Mathieu Desnoyers, Byungchul Park,
	Paul E. McKenney, Arnd Bergmann, linux-arch, bpf, Radoslaw Burny

On Tue, 1 Mar 2022 10:05:12 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, Feb 28, 2022 at 05:04:11PM -0800, Namhyung Kim wrote:
> > The __mutex_lock_slowpath() and friends are declared as noinline and
> > _RET_IP_ returns its caller as mutex_lock which is not meaningful.
> > Pass the ip from mutex_lock() to have actual caller info in the trace.  
> 
> Blergh, can't you do a very limited unwind when you do the tracing
> instead? 3 or 4 levels should be plenty fast and sufficient.

Is there a fast and sufficient way that works across architectures?

Could objtool help here?

-- Steve


* Re: [PATCH 2/4] locking: Apply contention tracepoints in the slow path
  2022-03-01  8:43   ` Peter Zijlstra
@ 2022-03-01 18:03     ` Namhyung Kim
  0 siblings, 0 replies; 17+ messages in thread
From: Namhyung Kim @ 2022-03-01 18:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng, LKML,
	Thomas Gleixner, Steven Rostedt, Mathieu Desnoyers,
	Byungchul Park, Paul E. McKenney, Arnd Bergmann, linux-arch, bpf,
	Radoslaw Burny

Hi Peter,

On Tue, Mar 1, 2022 at 12:44 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Feb 28, 2022 at 05:04:10PM -0800, Namhyung Kim wrote:
> > @@ -80,7 +81,9 @@ static inline void queued_read_lock(struct qrwlock *lock)
> >               return;
> >
> >       /* The slowpath will decrement the reader count, if necessary. */
> > +     LOCK_CONTENTION_BEGIN(lock, LCB_F_READ);
> >       queued_read_lock_slowpath(lock);
> > +     LOCK_CONTENTION_END(lock);
> >  }
> >
> >  /**
> > @@ -94,7 +97,9 @@ static inline void queued_write_lock(struct qrwlock *lock)
> >       if (likely(atomic_try_cmpxchg_acquire(&lock->cnts, &cnts, _QW_LOCKED)))
> >               return;
> >
> > +     LOCK_CONTENTION_BEGIN(lock, LCB_F_WRITE);
> >       queued_write_lock_slowpath(lock);
> > +     LOCK_CONTENTION_END(lock);
> >  }
>
> > @@ -82,7 +83,9 @@ static __always_inline void queued_spin_lock(struct qspinlock *lock)
> >       if (likely(atomic_try_cmpxchg_acquire(&lock->val, &val, _Q_LOCKED_VAL)))
> >               return;
> >
> > +     LOCK_CONTENTION_BEGIN(lock, 0);
> >       queued_spin_lock_slowpath(lock, val);
> > +     LOCK_CONTENTION_END(lock);
> >  }
>
> Can you please stick that _inside_ the slowpath? You really don't want
> to inline that.

I can move it into the slow path with caller ip.

Thanks,
Namhyung


* Re: [PATCH 2/4] locking: Apply contention tracepoints in the slow path
  2022-03-01  9:03   ` Peter Zijlstra
  2022-03-01 14:45     ` Steven Rostedt
@ 2022-03-01 18:11     ` Namhyung Kim
  2022-03-14 21:44     ` Namhyung Kim
  2 siblings, 0 replies; 17+ messages in thread
From: Namhyung Kim @ 2022-03-01 18:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng, LKML,
	Thomas Gleixner, Steven Rostedt, Mathieu Desnoyers,
	Byungchul Park, Paul E. McKenney, Arnd Bergmann, linux-arch, bpf,
	Radoslaw Burny

On Tue, Mar 1, 2022 at 1:04 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Feb 28, 2022 at 05:04:10PM -0800, Namhyung Kim wrote:
>
> > @@ -171,9 +172,12 @@ bool __sched __percpu_down_read(struct percpu_rw_semaphore *sem, bool try)
> >       if (try)
> >               return false;
> >
> > +     trace_contention_begin(sem, _RET_IP_,
> > +                            LCB_F_READ | LCB_F_PERCPU | TASK_UNINTERRUPTIBLE);
>
> That is a bit unwieldy, isn't it ?
>
> >       preempt_enable();
> >       percpu_rwsem_wait(sem, /* .reader = */ true);
> >       preempt_disable();
> > +     trace_contention_end(sem);
> >
> >       return true;
> >  }
> > @@ -224,8 +228,13 @@ void __sched percpu_down_write(struct percpu_rw_semaphore *sem)
> >        * Try set sem->block; this provides writer-writer exclusion.
> >        * Having sem->block set makes new readers block.
> >        */
> > -     if (!__percpu_down_write_trylock(sem))
> > +     if (!__percpu_down_write_trylock(sem)) {
> > +             unsigned int flags = LCB_F_WRITE | LCB_F_PERCPU | TASK_UNINTERRUPTIBLE;
> > +
> > +             trace_contention_begin(sem, _RET_IP_, flags);
> >               percpu_rwsem_wait(sem, /* .reader = */ false);
> > +             trace_contention_end(sem);
> > +     }
> >
> >       /* smp_mb() implied by __percpu_down_write_trylock() on success -- D matches A */
> >
>
> Wouldn't it be easier to stick all that inside percpu_rwsem_wait() and
> have it only once? You can even re-frob the wait loop such that the
> tracepoint can use current->__state or something.
>
> diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c
> index c9fdae94e098..ca01f8ff88e5 100644
> --- a/kernel/locking/percpu-rwsem.c
> +++ b/kernel/locking/percpu-rwsem.c
> @@ -154,13 +154,16 @@ static void percpu_rwsem_wait(struct percpu_rw_semaphore *sem, bool reader)
>         }
>         spin_unlock_irq(&sem->waiters.lock);
>
> +       set_current_state(TASK_UNINTERRUPTIBLE);
> +       trace_contention_begin(sem, _RET_IP_, LCB_F_PERCPU | LCB_F_WRITE*!reader);
>         while (wait) {
> -               set_current_state(TASK_UNINTERRUPTIBLE);
>                 if (!smp_load_acquire(&wq_entry.private))
>                         break;
>                 schedule();
> +               set_current_state(TASK_UNINTERRUPTIBLE);
>         }
>         __set_current_state(TASK_RUNNING);
> +       trace_contention_end(sem);
>  }

Looks good.  I'll make similar changes in other places too.

One general concern with moving it inside is that it makes _RET_IP_
useless.  If we can pass the caller ip to the slowpath function,
it'd be fine.

>
> > diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
> > index 8555c4efe97c..e49f5d2a232b 100644
> > --- a/kernel/locking/rtmutex.c
> > +++ b/kernel/locking/rtmutex.c
> > @@ -24,6 +24,8 @@
> >  #include <linux/sched/wake_q.h>
> >  #include <linux/ww_mutex.h>
> >
> > +#include <trace/events/lock.h>
> > +
> >  #include "rtmutex_common.h"
> >
> >  #ifndef WW_RT
> > @@ -1652,10 +1654,16 @@ static int __sched rt_mutex_slowlock(struct rt_mutex_base *lock,
> >  static __always_inline int __rt_mutex_lock(struct rt_mutex_base *lock,
> >                                          unsigned int state)
> >  {
> > +     int ret;
> > +
> >       if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current)))
> >               return 0;
> >
> > -     return rt_mutex_slowlock(lock, NULL, state);
> > +     trace_contention_begin(lock, _RET_IP_, LCB_F_RT | state);
> > +     ret = rt_mutex_slowlock(lock, NULL, state);
> > +     trace_contention_end(lock);
> > +
> > +     return ret;
> >  }
> >  #endif /* RT_MUTEX_BUILD_MUTEX */
> >
> > @@ -1718,9 +1726,11 @@ static __always_inline void __sched rtlock_slowlock(struct rt_mutex_base *lock)
> >  {
> >       unsigned long flags;
> >
> > +     trace_contention_begin(lock, _RET_IP_, LCB_F_RT | TASK_RTLOCK_WAIT);
> >       raw_spin_lock_irqsave(&lock->wait_lock, flags);
> >       rtlock_slowlock_locked(lock);
> >       raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
> > +     trace_contention_end(lock);
> >  }
>
> Same, if you do it one level in, you can have the tracepoint itself look
> at current->__state. Also, you seem to have forgotten to trace the
> return value. Now you can't tell if the lock was acquired, or was denied
> (ww_mutex) or we were interrupted.

Right, thanks for pointing that out.  Will add that.

Thanks,
Namhyung

>
> diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
> index 8555c4efe97c..18b9f4bf6f34 100644
> --- a/kernel/locking/rtmutex.c
> +++ b/kernel/locking/rtmutex.c
> @@ -1579,6 +1579,8 @@ static int __sched __rt_mutex_slowlock(struct rt_mutex_base *lock,
>
>         set_current_state(state);
>
> +       trace_contention_begin(lock, _RET_IP_, LCB_F_RT);
> +
>         ret = task_blocks_on_rt_mutex(lock, waiter, current, ww_ctx, chwalk);
>         if (likely(!ret))
>                 ret = rt_mutex_slowlock_block(lock, ww_ctx, state, NULL, waiter);
> @@ -1601,6 +1603,9 @@ static int __sched __rt_mutex_slowlock(struct rt_mutex_base *lock,
>          * unconditionally. We might have to fix that up.
>          */
>         fixup_rt_mutex_waiters(lock);
> +
> +       trace_contention_end(lock, ret);
> +
>         return ret;
>  }
>
> @@ -1683,6 +1688,8 @@ static void __sched rtlock_slowlock_locked(struct rt_mutex_base *lock)
>         /* Save current state and set state to TASK_RTLOCK_WAIT */
>         current_save_and_set_rtlock_wait_state();
>
> +       trace_contention_begin(lock, _RET_IP_, LCB_F_RT);
> +
>         task_blocks_on_rt_mutex(lock, &waiter, current, NULL, RT_MUTEX_MIN_CHAINWALK);
>
>         for (;;) {
> @@ -1703,6 +1710,8 @@ static void __sched rtlock_slowlock_locked(struct rt_mutex_base *lock)
>                 set_current_state(TASK_RTLOCK_WAIT);
>         }
>
> +       trace_contention_end(lock, 0);
> +
>         /* Restore the task state */
>         current_restore_rtlock_saved_state();
>


* Re: [PATCH 2/4] locking: Apply contention tracepoints in the slow path
  2022-03-01 14:45     ` Steven Rostedt
@ 2022-03-01 18:14       ` Namhyung Kim
  0 siblings, 0 replies; 17+ messages in thread
From: Namhyung Kim @ 2022-03-01 18:14 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long,
	Boqun Feng, LKML, Thomas Gleixner, Mathieu Desnoyers,
	Byungchul Park, Paul E. McKenney, Arnd Bergmann, linux-arch, bpf,
	Radoslaw Burny

Hi Steve,

On Tue, Mar 1, 2022 at 6:45 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Tue, 1 Mar 2022 10:03:54 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
>
> > diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
> > index 8555c4efe97c..18b9f4bf6f34 100644
> > --- a/kernel/locking/rtmutex.c
> > +++ b/kernel/locking/rtmutex.c
> > @@ -1579,6 +1579,8 @@ static int __sched __rt_mutex_slowlock(struct rt_mutex_base *lock,
> >
> >       set_current_state(state);
> >
> > +     trace_contention_begin(lock, _RET_IP_, LCB_F_RT);
>
> I guess one issue with this is that _RET_IP_ will return the rt_mutex
> address if this is not inlined, making the _RET_IP_ rather useless.
>
> Now, if we can pass the _RET_IP_ into __rt_mutex_slowlock() then we could
> reference that.

Right, that's what I did in patches 03 and 04 for mutex and rwsem.
But I forgot to update rt_mutex; will fix.
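
For illustration only, a minimal sketch of that approach -- every
function and type name below is made up for the example, not taken
from the actual patches:

	/*
	 * The slow path takes the caller ip as an argument so the
	 * tracepoint records the real call site rather than the lock
	 * API function itself.  (Flags argument omitted for brevity.)
	 */
	static noinline void __foo_lock_slowpath(struct foo_lock *lock,
						 unsigned long ip)
	{
		trace_contention_begin(lock, ip, 0);
		/* ... block until the lock is acquired ... */
		trace_contention_end(lock);
	}

	/* The exported API captures _RET_IP_ (its own caller) and passes it down. */
	void foo_lock(struct foo_lock *lock)
	{
		if (!foo_lock_fastpath(lock))
			__foo_lock_slowpath(lock, _RET_IP_);
	}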

Thanks,
Namhyung


* Re: [PATCH 3/4] locking/mutex: Pass proper call-site ip
  2022-03-01  9:05   ` Peter Zijlstra
  2022-03-01 14:53     ` Steven Rostedt
@ 2022-03-01 18:25     ` Namhyung Kim
  1 sibling, 0 replies; 17+ messages in thread
From: Namhyung Kim @ 2022-03-01 18:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng, LKML,
	Thomas Gleixner, Steven Rostedt, Mathieu Desnoyers,
	Byungchul Park, Paul E. McKenney, Arnd Bergmann, linux-arch, bpf,
	Radoslaw Burny

On Tue, Mar 1, 2022 at 1:05 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Feb 28, 2022 at 05:04:11PM -0800, Namhyung Kim wrote:
> > The __mutex_lock_slowpath() and friends are declared as noinline and
> > _RET_IP_ returns its caller as mutex_lock which is not meaningful.
> > Pass the ip from mutex_lock() to have actual caller info in the trace.
>
> Blergh, can't you do a very limited unwind when you do the tracing
> instead? 3 or 4 levels should be plenty fast and sufficient.

Are you talking about getting rid of the ip from the tracepoints?
Having stack traces as well is good, but it'd be nice if the tracepoint
provided a way to identify the lock without them too.

Thanks,
Namhyung


* Re: [PATCH 3/4] locking/mutex: Pass proper call-site ip
  2022-03-01 14:53     ` Steven Rostedt
@ 2022-03-01 19:47       ` Peter Zijlstra
  2022-03-04  7:28         ` Namhyung Kim
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2022-03-01 19:47 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Namhyung Kim, Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng,
	LKML, Thomas Gleixner, Mathieu Desnoyers, Byungchul Park,
	Paul E. McKenney, Arnd Bergmann, linux-arch, bpf, Radoslaw Burny

On Tue, Mar 01, 2022 at 09:53:54AM -0500, Steven Rostedt wrote:
> On Tue, 1 Mar 2022 10:05:12 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Mon, Feb 28, 2022 at 05:04:11PM -0800, Namhyung Kim wrote:
> > > The __mutex_lock_slowpath() and friends are declared as noinline and
> > > _RET_IP_ returns its caller as mutex_lock which is not meaningful.
> > > Pass the ip from mutex_lock() to have actual caller info in the trace.  
> > 
> > Blergh, can't you do a very limited unwind when you do the tracing
> > instead? 3 or 4 levels should be plenty fast and sufficient.
> 
> Is there a fast and sufficient way that works across architectures?

The normal stacktrace API? Or the fancy new arch_stack_walk() which is
already available on most architectures you actually care about and
risc-v :-)

Remember, this is the contention path, we're going to stall anyway,
doing a few levels of unwind shouldn't really hurt at that point.

Anyway; when I wrote that this morning, I was thinking:

	unsigned long ips[4];
	stack_trace_save(ips, 4, 0);
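
A slightly expanded, purely illustrative sketch of the same idea --
it assumes lock and flags are in scope at the contention point and is
not part of the posted series:

	unsigned long ips[4];
	unsigned int nr;

	/* with skipnr == 0 the trace starts at the caller of stack_trace_save() */
	nr = stack_trace_save(ips, ARRAY_SIZE(ips), 0);
	if (nr)
		trace_contention_begin(lock, ips[0], flags);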


> Could objtool help here?

There's a contradiction there... objtool is still x86_64 only :-/

IIRC there's been work on s390, arm64 and power objtool, but so far none
of them has actually made it in.


* Re: [PATCH 3/4] locking/mutex: Pass proper call-site ip
  2022-03-01 19:47       ` Peter Zijlstra
@ 2022-03-04  7:28         ` Namhyung Kim
  0 siblings, 0 replies; 17+ messages in thread
From: Namhyung Kim @ 2022-03-04  7:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Ingo Molnar, Will Deacon, Waiman Long,
	Boqun Feng, LKML, Thomas Gleixner, Mathieu Desnoyers,
	Byungchul Park, Paul E. McKenney, Arnd Bergmann, linux-arch, bpf,
	Radoslaw Burny

On Tue, Mar 1, 2022 at 11:47 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Mar 01, 2022 at 09:53:54AM -0500, Steven Rostedt wrote:
> > On Tue, 1 Mar 2022 10:05:12 +0100
> > Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > > On Mon, Feb 28, 2022 at 05:04:11PM -0800, Namhyung Kim wrote:
> > > > The __mutex_lock_slowpath() and friends are declared as noinline and
> > > > _RET_IP_ returns its caller as mutex_lock which is not meaningful.
> > > > Pass the ip from mutex_lock() to have actual caller info in the trace.
> > >
> > > Blergh, can't you do a very limited unwind when you do the tracing
> > > instead? 3 or 4 levels should be plenty fast and sufficient.
> >
> > Is there a fast and sufficient way that works across architectures?
>
> The normal stacktrace API? Or the fancy new arch_stack_walk() which is
> already available on most architectures you actually care about and
> risc-v :-)
>
> Remember, this is the contention path, we're going to stall anyway,
> doing a few levels of unwind shouldn't really hurt at that point.
>
> Anyway; when I wrote that this morning, I was thinking:
>
>         unsigned long ips[4];
>         stack_trace_save(ips, 4, 0);

When I collected stack traces from BPF, the traces already contained
3 or 4 entries from the BPF internals, so I had to increase the size
to 8 and skip the first 4.  But it didn't add noticeable overhead in
my test.
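
For reference, a minimal BPF-side sketch of what is described above.
The tracepoint name assumes this series is applied, the skip/size
values just mirror the numbers mentioned, and a real program would
aggregate the stacks in a map; this is not the actual program used:

	#include <linux/bpf.h>
	#include <bpf/bpf_helpers.h>

	SEC("tracepoint/lock/contention_begin")
	int on_contention_begin(void *ctx)
	{
		__u64 ips[8];

		/*
		 * The low byte of the flags argument is the number of
		 * leading frames to skip (the BPF/tracing glue itself).
		 */
		bpf_get_stack(ctx, ips, sizeof(ips), 4);
		return 0;
	}

	char LICENSE[] SEC("license") = "GPL";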

Thanks,
Namhyung


* Re: [PATCH 2/4] locking: Apply contention tracepoints in the slow path
  2022-03-01  9:03   ` Peter Zijlstra
  2022-03-01 14:45     ` Steven Rostedt
  2022-03-01 18:11     ` Namhyung Kim
@ 2022-03-14 21:44     ` Namhyung Kim
  2 siblings, 0 replies; 17+ messages in thread
From: Namhyung Kim @ 2022-03-14 21:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng, LKML,
	Thomas Gleixner, Steven Rostedt, Mathieu Desnoyers,
	Byungchul Park, Paul E. McKenney, Arnd Bergmann, linux-arch, bpf,
	Radoslaw Burny

On Tue, Mar 1, 2022 at 1:04 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Feb 28, 2022 at 05:04:10PM -0800, Namhyung Kim wrote:
> > @@ -1718,9 +1726,11 @@ static __always_inline void __sched rtlock_slowlock(struct rt_mutex_base *lock)
> >  {
> >       unsigned long flags;
> >
> > +     trace_contention_begin(lock, _RET_IP_, LCB_F_RT | TASK_RTLOCK_WAIT);
> >       raw_spin_lock_irqsave(&lock->wait_lock, flags);
> >       rtlock_slowlock_locked(lock);
> >       raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
> > +     trace_contention_end(lock);
> >  }
>
> Same, if you do it one level in, you can have the tracepoint itself look
> at current->__state.

So I tried this by reading the task state in the tracepoint, like below:

+       TP_fast_assign(
+               __entry->lock_addr = lock;
+               __entry->flags = flags | get_current_state();
+       ),

But I sometimes see unrelated values that contain __TASK_TRACED or
__TASK_STOPPED, and some unexpected values like TASK_UNINTERRUPTIBLE
for rwlocks.  Maybe I missed something.

Anyway, I think it's confusing and complicates things unnecessarily.
It'd probably be better to use new flags like LCB_F_SPIN and LCB_F_WAIT.
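
A sketch of that direction -- only the LCB_F_SPIN/LCB_F_WAIT names come
from the paragraph above, and the call sites are purely illustrative:

	/* spinning slow paths (e.g. queued spinlock/rwlock) */
	trace_contention_begin(lock, _RET_IP_, LCB_F_SPIN);

	/* sleeping slow paths (mutex, rwsem, rtmutex, ...) */
	trace_contention_begin(lock, _RET_IP_, LCB_F_RT | LCB_F_WAIT);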

Thanks,
Namhyung


Thread overview: 17+ messages
2022-03-01  1:04 [RFC 0/4] locking: Add new lock contention tracepoints (v2) Namhyung Kim
2022-03-01  1:04 ` [PATCH 1/4] locking: Add lock contention tracepoints Namhyung Kim
2022-03-01  1:04 ` [PATCH 2/4] locking: Apply contention tracepoints in the slow path Namhyung Kim
2022-03-01  8:43   ` Peter Zijlstra
2022-03-01 18:03     ` Namhyung Kim
2022-03-01  9:03   ` Peter Zijlstra
2022-03-01 14:45     ` Steven Rostedt
2022-03-01 18:14       ` Namhyung Kim
2022-03-01 18:11     ` Namhyung Kim
2022-03-14 21:44     ` Namhyung Kim
2022-03-01  1:04 ` [PATCH 3/4] locking/mutex: Pass proper call-site ip Namhyung Kim
2022-03-01  9:05   ` Peter Zijlstra
2022-03-01 14:53     ` Steven Rostedt
2022-03-01 19:47       ` Peter Zijlstra
2022-03-04  7:28         ` Namhyung Kim
2022-03-01 18:25     ` Namhyung Kim
2022-03-01  1:04 ` [PATCH 4/4] locking/rwsem: " Namhyung Kim
