linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/2] locking: Add new lock contention tracepoints (v3)
@ 2022-03-16 22:45 Namhyung Kim
  2022-03-16 22:45 ` [PATCH 1/2] locking: Add lock contention tracepoints Namhyung Kim
                   ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Namhyung Kim @ 2022-03-16 22:45 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng
  Cc: LKML, Thomas Gleixner, Steven Rostedt, Byungchul Park,
	Paul E. McKenney, Mathieu Desnoyers, Arnd Bergmann,
	Radoslaw Burny, linux-arch, bpf

Hello,

There have been some requests for low-overhead kernel lock contention
monitoring.  The kernel has CONFIG_LOCK_STAT to provide such
infrastructure, either via /proc/lock_stat or via tracepoints directly.

However, it is not lightweight and is hard to use in production.  So
I'm adding new tracepoints for lock contention and using them as a
base to build a new monitoring system.

* Changes in v3
 - move the tracepoints deeper in the slow path
 - remove the caller ip
 - don't use task state in the flags

* Changes in v2
 - do not use lockdep infrastructure
 - add flags argument to lock:contention_begin tracepoint

I added a flags argument to contention_begin to classify the locks in
question.  It can tell whether a lock is a spinlock, a reader-writer
lock, or a mutex.  Combined with a stacktrace, users can identify
which lock is contended.

Patch 01 adds the tracepoints and moves their definition into
mutex.c so that they are available without lockdep.

Patch 02 actually installs the tracepoints in the locking code.
To minimize the overhead, they are added only in the slow paths.  As
spinlocks are defined in the arch headers, I couldn't handle them all;
I've added the tracepoints to the generic queued spinlocks and
rwlocks only.  Each arch can add them later.
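As a rough illustration of that placement strategy (the uncontended
fast path emits nothing; only the slow path is bracketed by the event
pair), here is a userspace toy sketch.  The trace_* functions and the
toy lock below are illustrative stand-ins, not kernel code:

```c
#include <stdio.h>

/* Stand-ins for the lock:contention_begin/end tracepoints, counting
 * emitted events so the behavior is observable in userspace. */
static int begin_count, end_count;

static void trace_contention_begin(void *lock, unsigned int flags)
{
	begin_count++;
	printf("contention_begin: %p (flags=%x)\n", lock, flags);
}

static void trace_contention_end(void *lock, int ret)
{
	end_count++;
	printf("contention_end: %p (ret=%d)\n", lock, ret);
}

static int lock_held;	/* toy lock state; a real lock uses atomics */

static void toy_lock(void)
{
	if (!lock_held) {
		/* fast path: lock taken immediately, no events at all */
		lock_held = 1;
		return;
	}
	/* slow path: only here is the wait bracketed by the event pair */
	trace_contention_begin(&lock_held, 0);
	/* ... a real implementation would spin or sleep here ... */
	trace_contention_end(&lock_held, 0);
}
```

An uncontended toy_lock() call emits no events; a "contended" call
emits exactly one begin/end pair, which is how the patches keep the
uncontended fast path free of tracing overhead.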

This series is based on the current tip/locking/core, and you can get
it from the 'locking/tracepoint-v3' branch in my tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git


Thanks,
Namhyung


Namhyung Kim (2):
  locking: Add lock contention tracepoints
  locking: Apply contention tracepoints in the slow path

 include/trace/events/lock.h   | 54 +++++++++++++++++++++++++++++++++--
 kernel/locking/lockdep.c      |  1 -
 kernel/locking/mutex.c        |  6 ++++
 kernel/locking/percpu-rwsem.c |  3 ++
 kernel/locking/qrwlock.c      |  9 ++++++
 kernel/locking/qspinlock.c    |  5 ++++
 kernel/locking/rtmutex.c      | 11 +++++++
 kernel/locking/rwbase_rt.c    |  3 ++
 kernel/locking/rwsem.c        |  9 ++++++
 kernel/locking/semaphore.c    | 14 ++++++++-
 10 files changed, 110 insertions(+), 5 deletions(-)


base-commit: cd27ccfc727e99352321c0c75012ab9c5a90321e
-- 
2.35.1.894.gb6a874cedc-goog


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/2] locking: Add lock contention tracepoints
  2022-03-16 22:45 [PATCH 0/2] locking: Add new lock contention tracepoints (v3) Namhyung Kim
@ 2022-03-16 22:45 ` Namhyung Kim
  2022-03-17  2:31   ` Steven Rostedt
  2022-03-17 13:32   ` Mathieu Desnoyers
  2022-03-16 22:45 ` [PATCH 2/2] locking: Apply contention tracepoints in the slow path Namhyung Kim
  2022-03-17 17:32 ` [PATCH 0/2] locking: Add new lock contention tracepoints (v3) Hyeonggon Yoo
  2 siblings, 2 replies; 26+ messages in thread
From: Namhyung Kim @ 2022-03-16 22:45 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng
  Cc: LKML, Thomas Gleixner, Steven Rostedt, Byungchul Park,
	Paul E. McKenney, Mathieu Desnoyers, Arnd Bergmann,
	Radoslaw Burny, linux-arch, bpf

This adds two new lock contention tracepoints like below:

 * lock:contention_begin
 * lock:contention_end

The lock:contention_begin tracepoint takes a flags argument to
classify locks.  I found it useful to identify what kind of lock is
being traced: spinning or sleeping, reader-writer, real-time, or
per-cpu.

Move tracepoint definitions into mutex.c so that we can use them
without lockdep.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 include/trace/events/lock.h | 54 ++++++++++++++++++++++++++++++++++---
 kernel/locking/lockdep.c    |  1 -
 kernel/locking/mutex.c      |  3 +++
 3 files changed, 54 insertions(+), 4 deletions(-)

diff --git a/include/trace/events/lock.h b/include/trace/events/lock.h
index d7512129a324..2a3df36d4fdb 100644
--- a/include/trace/events/lock.h
+++ b/include/trace/events/lock.h
@@ -5,11 +5,21 @@
 #if !defined(_TRACE_LOCK_H) || defined(TRACE_HEADER_MULTI_READ)
 #define _TRACE_LOCK_H
 
-#include <linux/lockdep.h>
+#include <linux/sched.h>
 #include <linux/tracepoint.h>
 
+/* flags for lock:contention_begin */
+#define LCB_F_SPIN	(1U << 0)
+#define LCB_F_READ	(1U << 1)
+#define LCB_F_WRITE	(1U << 2)
+#define LCB_F_RT	(1U << 3)
+#define LCB_F_PERCPU	(1U << 4)
+
+
 #ifdef CONFIG_LOCKDEP
 
+#include <linux/lockdep.h>
+
 TRACE_EVENT(lock_acquire,
 
 	TP_PROTO(struct lockdep_map *lock, unsigned int subclass,
@@ -78,8 +88,46 @@ DEFINE_EVENT(lock, lock_acquired,
 	TP_ARGS(lock, ip)
 );
 
-#endif
-#endif
+#endif /* CONFIG_LOCK_STAT */
+#endif /* CONFIG_LOCKDEP */
+
+TRACE_EVENT(contention_begin,
+
+	TP_PROTO(void *lock, unsigned int flags),
+
+	TP_ARGS(lock, flags),
+
+	TP_STRUCT__entry(
+		__field(void *, lock_addr)
+		__field(unsigned int, flags)
+	),
+
+	TP_fast_assign(
+		__entry->lock_addr = lock;
+		__entry->flags = flags;
+	),
+
+	TP_printk("%p (flags=%x)", __entry->lock_addr, __entry->flags)
+);
+
+TRACE_EVENT(contention_end,
+
+	TP_PROTO(void *lock, int ret),
+
+	TP_ARGS(lock, ret),
+
+	TP_STRUCT__entry(
+		__field(void *, lock_addr)
+		__field(int, ret)
+	),
+
+	TP_fast_assign(
+		__entry->lock_addr = lock;
+		__entry->ret = ret;
+	),
+
+	TP_printk("%p (ret=%d)", __entry->lock_addr, __entry->ret)
+);
 
 #endif /* _TRACE_LOCK_H */
 
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 50036c10b518..08f8fb6a2d1e 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -60,7 +60,6 @@
 
 #include "lockdep_internals.h"
 
-#define CREATE_TRACE_POINTS
 #include <trace/events/lock.h>
 
 #ifdef CONFIG_PROVE_LOCKING
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 5e3585950ec8..ee2fd7614a93 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -30,6 +30,9 @@
 #include <linux/debug_locks.h>
 #include <linux/osq_lock.h>
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/lock.h>
+
 #ifndef CONFIG_PREEMPT_RT
 #include "mutex.h"
 
-- 
2.35.1.894.gb6a874cedc-goog



* [PATCH 2/2] locking: Apply contention tracepoints in the slow path
  2022-03-16 22:45 [PATCH 0/2] locking: Add new lock contention tracepoints (v3) Namhyung Kim
  2022-03-16 22:45 ` [PATCH 1/2] locking: Add lock contention tracepoints Namhyung Kim
@ 2022-03-16 22:45 ` Namhyung Kim
  2022-03-17 13:45   ` Mathieu Desnoyers
                     ` (2 more replies)
  2022-03-17 17:32 ` [PATCH 0/2] locking: Add new lock contention tracepoints (v3) Hyeonggon Yoo
  2 siblings, 3 replies; 26+ messages in thread
From: Namhyung Kim @ 2022-03-16 22:45 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng
  Cc: LKML, Thomas Gleixner, Steven Rostedt, Byungchul Park,
	Paul E. McKenney, Mathieu Desnoyers, Arnd Bergmann,
	Radoslaw Burny, linux-arch, bpf

Add the lock contention tracepoints in various lock function slow
paths.  Note that each arch can define spinlocks differently, so I've
added them only to the generic qspinlock for now.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 kernel/locking/mutex.c        |  3 +++
 kernel/locking/percpu-rwsem.c |  3 +++
 kernel/locking/qrwlock.c      |  9 +++++++++
 kernel/locking/qspinlock.c    |  5 +++++
 kernel/locking/rtmutex.c      | 11 +++++++++++
 kernel/locking/rwbase_rt.c    |  3 +++
 kernel/locking/rwsem.c        |  9 +++++++++
 kernel/locking/semaphore.c    | 14 +++++++++++++-
 8 files changed, 56 insertions(+), 1 deletion(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index ee2fd7614a93..c88deda77cf2 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -644,6 +644,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
 	}
 
 	set_current_state(state);
+	trace_contention_begin(lock, 0);
 	for (;;) {
 		bool first;
 
@@ -710,6 +711,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
 skip_wait:
 	/* got the lock - cleanup and rejoice! */
 	lock_acquired(&lock->dep_map, ip);
+	trace_contention_end(lock, 0);
 
 	if (ww_ctx)
 		ww_mutex_lock_acquired(ww, ww_ctx);
@@ -721,6 +723,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
 err:
 	__set_current_state(TASK_RUNNING);
 	__mutex_remove_waiter(lock, &waiter);
+	trace_contention_end(lock, ret);
 err_early_kill:
 	raw_spin_unlock(&lock->wait_lock);
 	debug_mutex_free_waiter(&waiter);
diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c
index c9fdae94e098..833043613af6 100644
--- a/kernel/locking/percpu-rwsem.c
+++ b/kernel/locking/percpu-rwsem.c
@@ -9,6 +9,7 @@
 #include <linux/sched/task.h>
 #include <linux/sched/debug.h>
 #include <linux/errno.h>
+#include <trace/events/lock.h>
 
 int __percpu_init_rwsem(struct percpu_rw_semaphore *sem,
 			const char *name, struct lock_class_key *key)
@@ -154,6 +155,7 @@ static void percpu_rwsem_wait(struct percpu_rw_semaphore *sem, bool reader)
 	}
 	spin_unlock_irq(&sem->waiters.lock);
 
+	trace_contention_begin(sem, LCB_F_PERCPU | (reader ? LCB_F_READ : LCB_F_WRITE));
 	while (wait) {
 		set_current_state(TASK_UNINTERRUPTIBLE);
 		if (!smp_load_acquire(&wq_entry.private))
@@ -161,6 +163,7 @@ static void percpu_rwsem_wait(struct percpu_rw_semaphore *sem, bool reader)
 		schedule();
 	}
 	__set_current_state(TASK_RUNNING);
+	trace_contention_end(sem, 0);
 }
 
 bool __sched __percpu_down_read(struct percpu_rw_semaphore *sem, bool try)
diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index ec36b73f4733..b9f6f963d77f 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -12,6 +12,7 @@
 #include <linux/percpu.h>
 #include <linux/hardirq.h>
 #include <linux/spinlock.h>
+#include <trace/events/lock.h>
 
 /**
  * queued_read_lock_slowpath - acquire read lock of a queue rwlock
@@ -34,6 +35,8 @@ void queued_read_lock_slowpath(struct qrwlock *lock)
 	}
 	atomic_sub(_QR_BIAS, &lock->cnts);
 
+	trace_contention_begin(lock, LCB_F_READ | LCB_F_SPIN);
+
 	/*
 	 * Put the reader into the wait queue
 	 */
@@ -51,6 +54,8 @@ void queued_read_lock_slowpath(struct qrwlock *lock)
 	 * Signal the next one in queue to become queue head
 	 */
 	arch_spin_unlock(&lock->wait_lock);
+
+	trace_contention_end(lock, 0);
 }
 EXPORT_SYMBOL(queued_read_lock_slowpath);
 
@@ -62,6 +67,8 @@ void queued_write_lock_slowpath(struct qrwlock *lock)
 {
 	int cnts;
 
+	trace_contention_begin(lock, LCB_F_WRITE | LCB_F_SPIN);
+
 	/* Put the writer into the wait queue */
 	arch_spin_lock(&lock->wait_lock);
 
@@ -79,5 +86,7 @@ void queued_write_lock_slowpath(struct qrwlock *lock)
 	} while (!atomic_try_cmpxchg_acquire(&lock->cnts, &cnts, _QW_LOCKED));
 unlock:
 	arch_spin_unlock(&lock->wait_lock);
+
+	trace_contention_end(lock, 0);
 }
 EXPORT_SYMBOL(queued_write_lock_slowpath);
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index cbff6ba53d56..65a9a10caa6f 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -22,6 +22,7 @@
 #include <linux/prefetch.h>
 #include <asm/byteorder.h>
 #include <asm/qspinlock.h>
+#include <trace/events/lock.h>
 
 /*
  * Include queued spinlock statistics code
@@ -401,6 +402,8 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 	idx = node->count++;
 	tail = encode_tail(smp_processor_id(), idx);
 
+	trace_contention_begin(lock, LCB_F_SPIN);
+
 	/*
 	 * 4 nodes are allocated based on the assumption that there will
 	 * not be nested NMIs taking spinlocks. That may not be true in
@@ -554,6 +557,8 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 	pv_kick_node(lock, next);
 
 release:
+	trace_contention_end(lock, 0);
+
 	/*
 	 * release the node
 	 */
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 8555c4efe97c..7779ee8abc2a 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -24,6 +24,8 @@
 #include <linux/sched/wake_q.h>
 #include <linux/ww_mutex.h>
 
+#include <trace/events/lock.h>
+
 #include "rtmutex_common.h"
 
 #ifndef WW_RT
@@ -1579,6 +1581,8 @@ static int __sched __rt_mutex_slowlock(struct rt_mutex_base *lock,
 
 	set_current_state(state);
 
+	trace_contention_begin(lock, LCB_F_RT);
+
 	ret = task_blocks_on_rt_mutex(lock, waiter, current, ww_ctx, chwalk);
 	if (likely(!ret))
 		ret = rt_mutex_slowlock_block(lock, ww_ctx, state, NULL, waiter);
@@ -1601,6 +1605,9 @@ static int __sched __rt_mutex_slowlock(struct rt_mutex_base *lock,
 	 * unconditionally. We might have to fix that up.
 	 */
 	fixup_rt_mutex_waiters(lock);
+
+	trace_contention_end(lock, ret);
+
 	return ret;
 }
 
@@ -1683,6 +1690,8 @@ static void __sched rtlock_slowlock_locked(struct rt_mutex_base *lock)
 	/* Save current state and set state to TASK_RTLOCK_WAIT */
 	current_save_and_set_rtlock_wait_state();
 
+	trace_contention_begin(lock, LCB_F_RT);
+
 	task_blocks_on_rt_mutex(lock, &waiter, current, NULL, RT_MUTEX_MIN_CHAINWALK);
 
 	for (;;) {
@@ -1712,6 +1721,8 @@ static void __sched rtlock_slowlock_locked(struct rt_mutex_base *lock)
 	 */
 	fixup_rt_mutex_waiters(lock);
 	debug_rt_mutex_free_waiter(&waiter);
+
+	trace_contention_end(lock, 0);
 }
 
 static __always_inline void __sched rtlock_slowlock(struct rt_mutex_base *lock)
diff --git a/kernel/locking/rwbase_rt.c b/kernel/locking/rwbase_rt.c
index 6fd3162e4098..ec7b1fda7982 100644
--- a/kernel/locking/rwbase_rt.c
+++ b/kernel/locking/rwbase_rt.c
@@ -247,11 +247,13 @@ static int __sched rwbase_write_lock(struct rwbase_rt *rwb,
 		goto out_unlock;
 
 	rwbase_set_and_save_current_state(state);
+	trace_contention_begin(rwb, LCB_F_WRITE | LCB_F_RT);
 	for (;;) {
 		/* Optimized out for rwlocks */
 		if (rwbase_signal_pending_state(state, current)) {
 			rwbase_restore_current_state();
 			__rwbase_write_unlock(rwb, 0, flags);
+			trace_contention_end(rwb, -EINTR);
 			return -EINTR;
 		}
 
@@ -265,6 +267,7 @@ static int __sched rwbase_write_lock(struct rwbase_rt *rwb,
 		set_current_state(state);
 	}
 	rwbase_restore_current_state();
+	trace_contention_end(rwb, 0);
 
 out_unlock:
 	raw_spin_unlock_irqrestore(&rtm->wait_lock, flags);
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index acde5d6f1254..465db7bd84f8 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -27,6 +27,7 @@
 #include <linux/export.h>
 #include <linux/rwsem.h>
 #include <linux/atomic.h>
+#include <trace/events/lock.h>
 
 #ifndef CONFIG_PREEMPT_RT
 #include "lock_events.h"
@@ -1014,6 +1015,8 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
 	raw_spin_unlock_irq(&sem->wait_lock);
 	wake_up_q(&wake_q);
 
+	trace_contention_begin(sem, LCB_F_READ);
+
 	/* wait to be given the lock */
 	for (;;) {
 		set_current_state(state);
@@ -1035,6 +1038,7 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
 
 	__set_current_state(TASK_RUNNING);
 	lockevent_inc(rwsem_rlock);
+	trace_contention_end(sem, 0);
 	return sem;
 
 out_nolock:
@@ -1042,6 +1046,7 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
 	raw_spin_unlock_irq(&sem->wait_lock);
 	__set_current_state(TASK_RUNNING);
 	lockevent_inc(rwsem_rlock_fail);
+	trace_contention_end(sem, -EINTR);
 	return ERR_PTR(-EINTR);
 }
 
@@ -1109,6 +1114,8 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 wait:
 	/* wait until we successfully acquire the lock */
 	set_current_state(state);
+	trace_contention_begin(sem, LCB_F_WRITE);
+
 	for (;;) {
 		if (rwsem_try_write_lock(sem, &waiter)) {
 			/* rwsem_try_write_lock() implies ACQUIRE on success */
@@ -1148,6 +1155,7 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 	__set_current_state(TASK_RUNNING);
 	raw_spin_unlock_irq(&sem->wait_lock);
 	lockevent_inc(rwsem_wlock);
+	trace_contention_end(sem, 0);
 	return sem;
 
 out_nolock:
@@ -1159,6 +1167,7 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 	raw_spin_unlock_irq(&sem->wait_lock);
 	wake_up_q(&wake_q);
 	lockevent_inc(rwsem_wlock_fail);
+	trace_contention_end(sem, -EINTR);
 	return ERR_PTR(-EINTR);
 }
 
diff --git a/kernel/locking/semaphore.c b/kernel/locking/semaphore.c
index 9ee381e4d2a4..e3c19668dfee 100644
--- a/kernel/locking/semaphore.c
+++ b/kernel/locking/semaphore.c
@@ -32,6 +32,7 @@
 #include <linux/semaphore.h>
 #include <linux/spinlock.h>
 #include <linux/ftrace.h>
+#include <trace/events/lock.h>
 
 static noinline void __down(struct semaphore *sem);
 static noinline int __down_interruptible(struct semaphore *sem);
@@ -209,6 +210,7 @@ static inline int __sched __down_common(struct semaphore *sem, long state,
 								long timeout)
 {
 	struct semaphore_waiter waiter;
+	bool tracing = false;
 
 	list_add_tail(&waiter.list, &sem->wait_list);
 	waiter.task = current;
@@ -220,18 +222,28 @@ static inline int __sched __down_common(struct semaphore *sem, long state,
 		if (unlikely(timeout <= 0))
 			goto timed_out;
 		__set_current_state(state);
+		if (!tracing) {
+			trace_contention_begin(sem, 0);
+			tracing = true;
+		}
 		raw_spin_unlock_irq(&sem->lock);
 		timeout = schedule_timeout(timeout);
 		raw_spin_lock_irq(&sem->lock);
-		if (waiter.up)
+		if (waiter.up) {
+			trace_contention_end(sem, 0);
 			return 0;
+		}
 	}
 
  timed_out:
+	if (tracing)
+		trace_contention_end(sem, -ETIME);
 	list_del(&waiter.list);
 	return -ETIME;
 
  interrupted:
+	if (tracing)
+		trace_contention_end(sem, -EINTR);
 	list_del(&waiter.list);
 	return -EINTR;
 }
-- 
2.35.1.894.gb6a874cedc-goog



* Re: [PATCH 1/2] locking: Add lock contention tracepoints
  2022-03-16 22:45 ` [PATCH 1/2] locking: Add lock contention tracepoints Namhyung Kim
@ 2022-03-17  2:31   ` Steven Rostedt
  2022-03-17 13:32   ` Mathieu Desnoyers
  1 sibling, 0 replies; 26+ messages in thread
From: Steven Rostedt @ 2022-03-17  2:31 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long,
	Boqun Feng, LKML, Thomas Gleixner, Byungchul Park,
	Paul E. McKenney, Mathieu Desnoyers, Arnd Bergmann,
	Radoslaw Burny, linux-arch, bpf

On Wed, 16 Mar 2022 15:45:47 -0700
Namhyung Kim <namhyung@kernel.org> wrote:

> This adds two new lock contention tracepoints like below:
> 
>  * lock:contention_begin
>  * lock:contention_end
> 
> The lock:contention_begin tracepoint takes a flags argument to
> classify locks.  I found it useful to identify what kind of lock is
> being traced: spinning or sleeping, reader-writer, real-time, or
> per-cpu.
> 
> Move tracepoint definitions into mutex.c so that we can use them
> without lockdep.
> 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  include/trace/events/lock.h | 54 ++++++++++++++++++++++++++++++++++---
>  kernel/locking/lockdep.c    |  1 -
>  kernel/locking/mutex.c      |  3 +++
>  3 files changed, 54 insertions(+), 4 deletions(-)
> 
> diff --git a/include/trace/events/lock.h b/include/trace/events/lock.h
> index d7512129a324..2a3df36d4fdb 100644
> --- a/include/trace/events/lock.h
> +++ b/include/trace/events/lock.h
> @@ -5,11 +5,21 @@
>  #if !defined(_TRACE_LOCK_H) || defined(TRACE_HEADER_MULTI_READ)
>  #define _TRACE_LOCK_H
>  
> -#include <linux/lockdep.h>
> +#include <linux/sched.h>
>  #include <linux/tracepoint.h>
>  
> +/* flags for lock:contention_begin */
> +#define LCB_F_SPIN	(1U << 0)
> +#define LCB_F_READ	(1U << 1)
> +#define LCB_F_WRITE	(1U << 2)
> +#define LCB_F_RT	(1U << 3)
> +#define LCB_F_PERCPU	(1U << 4)
> +
> +
>  #ifdef CONFIG_LOCKDEP
>  
> +#include <linux/lockdep.h>
> +
>  TRACE_EVENT(lock_acquire,
>  
>  	TP_PROTO(struct lockdep_map *lock, unsigned int subclass,
> @@ -78,8 +88,46 @@ DEFINE_EVENT(lock, lock_acquired,
>  	TP_ARGS(lock, ip)
>  );
>  
> -#endif
> -#endif
> +#endif /* CONFIG_LOCK_STAT */
> +#endif /* CONFIG_LOCKDEP */
> +
> +TRACE_EVENT(contention_begin,
> +
> +	TP_PROTO(void *lock, unsigned int flags),
> +
> +	TP_ARGS(lock, flags),
> +
> +	TP_STRUCT__entry(
> +		__field(void *, lock_addr)
> +		__field(unsigned int, flags)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->lock_addr = lock;
> +		__entry->flags = flags;
> +	),
> +
> +	TP_printk("%p (flags=%x)", __entry->lock_addr, __entry->flags)

Perhaps make this into:

	TP_printk("%p (flags=%s)", __entry->lock_addr,
		__print_flags(__entry->flags, "|",
			{LCB_F_SPIN,   "spin" },
			{LCB_F_READ,   "read" },
			{LCB_F_WRITE,  "write" },
			{LCB_F_RT,     "rt" },
			{LCB_F_PERCPU, "percpu" }
			))

That way, the user doesn't need to figure out what the numbers mean.
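For reference, the decoding that __print_flags() would produce can be
mimicked in plain userspace C roughly as follows.  This is an
illustrative sketch of the name-joining behavior, not the kernel
implementation:

```c
#include <string.h>

/* flags for lock:contention_begin, as in the patch */
#define LCB_F_SPIN	(1U << 0)
#define LCB_F_READ	(1U << 1)
#define LCB_F_WRITE	(1U << 2)
#define LCB_F_RT	(1U << 3)
#define LCB_F_PERCPU	(1U << 4)

static const char * const lcb_flag_names[] = {
	"spin", "read", "write", "rt", "percpu",
};

/* Join the names of all set bits with '|', like __print_flags() does. */
static char *lcb_flags_str(unsigned int flags, char *buf, size_t len)
{
	buf[0] = '\0';
	for (unsigned int i = 0; i < 5; i++) {
		if (!(flags & (1U << i)))
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, lcb_flag_names[i], len - strlen(buf) - 1);
	}
	return buf;
}
```

With this, a raw flags value like 0x3 renders as "spin|read" instead
of a bare hex number.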

-- Steve

			
> +);
> +
> +TRACE_EVENT(contention_end,
> +
> +	TP_PROTO(void *lock, int ret),
> +
> +	TP_ARGS(lock, ret),
> +
> +	TP_STRUCT__entry(
> +		__field(void *, lock_addr)
> +		__field(int, ret)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->lock_addr = lock;
> +		__entry->ret = ret;
> +	),
> +
> +	TP_printk("%p (ret=%d)", __entry->lock_addr, __entry->ret)
> +);
>  
>  #endif /* _TRACE_LOCK_H */
>  
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 50036c10b518..08f8fb6a2d1e 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -60,7 +60,6 @@
>  
>  #include "lockdep_internals.h"
>  
> -#define CREATE_TRACE_POINTS
>  #include <trace/events/lock.h>
>  
>  #ifdef CONFIG_PROVE_LOCKING
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 5e3585950ec8..ee2fd7614a93 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -30,6 +30,9 @@
>  #include <linux/debug_locks.h>
>  #include <linux/osq_lock.h>
>  
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/lock.h>
> +
>  #ifndef CONFIG_PREEMPT_RT
>  #include "mutex.h"
>  



* Re: [PATCH 1/2] locking: Add lock contention tracepoints
  2022-03-16 22:45 ` [PATCH 1/2] locking: Add lock contention tracepoints Namhyung Kim
  2022-03-17  2:31   ` Steven Rostedt
@ 2022-03-17 13:32   ` Mathieu Desnoyers
  2022-03-17 16:07     ` Steven Rostedt
  1 sibling, 1 reply; 26+ messages in thread
From: Mathieu Desnoyers @ 2022-03-17 13:32 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long,
	Boqun Feng, linux-kernel, Thomas Gleixner, rostedt,
	Byungchul Park, paulmck, Arnd Bergmann, Radoslaw Burny,
	linux-arch, bpf

----- On Mar 16, 2022, at 6:45 PM, Namhyung Kim namhyung@kernel.org wrote:

> This adds two new lock contention tracepoints like below:
> 
> * lock:contention_begin
> * lock:contention_end
> 
> The lock:contention_begin tracepoint takes a flags argument to
> classify locks.  I found it useful to identify what kind of lock is
> being traced: spinning or sleeping, reader-writer, real-time, or
> per-cpu.
> 
> Move tracepoint definitions into mutex.c so that we can use them
> without lockdep.
> 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
> include/trace/events/lock.h | 54 ++++++++++++++++++++++++++++++++++---
> kernel/locking/lockdep.c    |  1 -
> kernel/locking/mutex.c      |  3 +++
> 3 files changed, 54 insertions(+), 4 deletions(-)
> 
> diff --git a/include/trace/events/lock.h b/include/trace/events/lock.h
> index d7512129a324..2a3df36d4fdb 100644
> --- a/include/trace/events/lock.h
> +++ b/include/trace/events/lock.h
> @@ -5,11 +5,21 @@
> #if !defined(_TRACE_LOCK_H) || defined(TRACE_HEADER_MULTI_READ)
> #define _TRACE_LOCK_H
> 
> -#include <linux/lockdep.h>
> +#include <linux/sched.h>
> #include <linux/tracepoint.h>
> 
> +/* flags for lock:contention_begin */
> +#define LCB_F_SPIN	(1U << 0)
> +#define LCB_F_READ	(1U << 1)
> +#define LCB_F_WRITE	(1U << 2)
> +#define LCB_F_RT	(1U << 3)
> +#define LCB_F_PERCPU	(1U << 4)

Unless there is a particular reason for using preprocessor defines here, the
following form is typically better because it does not pollute the preprocessor
defines, e.g.:

enum lock_contention_flags {
        LCB_F_SPIN =   1U << 0,
        LCB_F_READ =   1U << 1,
        LCB_F_WRITE =  1U << 2,
        LCB_F_RT =     1U << 3,
        LCB_F_PERCPU = 1U << 4,
};

Thanks,

Mathieu

> +
> +
> #ifdef CONFIG_LOCKDEP
> 
> +#include <linux/lockdep.h>
> +
> TRACE_EVENT(lock_acquire,
> 
> 	TP_PROTO(struct lockdep_map *lock, unsigned int subclass,
> @@ -78,8 +88,46 @@ DEFINE_EVENT(lock, lock_acquired,
> 	TP_ARGS(lock, ip)
> );
> 
> -#endif
> -#endif
> +#endif /* CONFIG_LOCK_STAT */
> +#endif /* CONFIG_LOCKDEP */
> +
> +TRACE_EVENT(contention_begin,
> +
> +	TP_PROTO(void *lock, unsigned int flags),
> +
> +	TP_ARGS(lock, flags),
> +
> +	TP_STRUCT__entry(
> +		__field(void *, lock_addr)
> +		__field(unsigned int, flags)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->lock_addr = lock;
> +		__entry->flags = flags;
> +	),
> +
> +	TP_printk("%p (flags=%x)", __entry->lock_addr, __entry->flags)
> +);
> +
> +TRACE_EVENT(contention_end,
> +
> +	TP_PROTO(void *lock, int ret),
> +
> +	TP_ARGS(lock, ret),
> +
> +	TP_STRUCT__entry(
> +		__field(void *, lock_addr)
> +		__field(int, ret)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->lock_addr = lock;
> +		__entry->ret = ret;
> +	),
> +
> +	TP_printk("%p (ret=%d)", __entry->lock_addr, __entry->ret)
> +);
> 
> #endif /* _TRACE_LOCK_H */
> 
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 50036c10b518..08f8fb6a2d1e 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -60,7 +60,6 @@
> 
> #include "lockdep_internals.h"
> 
> -#define CREATE_TRACE_POINTS
> #include <trace/events/lock.h>
> 
> #ifdef CONFIG_PROVE_LOCKING
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 5e3585950ec8..ee2fd7614a93 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -30,6 +30,9 @@
> #include <linux/debug_locks.h>
> #include <linux/osq_lock.h>
> 
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/lock.h>
> +
> #ifndef CONFIG_PREEMPT_RT
> #include "mutex.h"
> 
> --
> 2.35.1.894.gb6a874cedc-goog

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


* Re: [PATCH 2/2] locking: Apply contention tracepoints in the slow path
  2022-03-16 22:45 ` [PATCH 2/2] locking: Apply contention tracepoints in the slow path Namhyung Kim
@ 2022-03-17 13:45   ` Mathieu Desnoyers
  2022-03-17 16:10     ` Steven Rostedt
  2022-03-18 21:34     ` Namhyung Kim
  2022-03-17 18:19   ` Hyeonggon Yoo
  2022-03-18 12:55   ` Boqun Feng
  2 siblings, 2 replies; 26+ messages in thread
From: Mathieu Desnoyers @ 2022-03-17 13:45 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long,
	Boqun Feng, linux-kernel, Thomas Gleixner, rostedt,
	Byungchul Park, paulmck, Arnd Bergmann, Radoslaw Burny,
	linux-arch, bpf

----- On Mar 16, 2022, at 6:45 PM, Namhyung Kim namhyung@kernel.org wrote:

> Add the lock contention tracepoints in various lock function slow
> paths.  Note that each arch can define spinlocks differently, so I've
> added them only to the generic qspinlock for now.
> 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
> kernel/locking/mutex.c        |  3 +++
> kernel/locking/percpu-rwsem.c |  3 +++
> kernel/locking/qrwlock.c      |  9 +++++++++
> kernel/locking/qspinlock.c    |  5 +++++
> kernel/locking/rtmutex.c      | 11 +++++++++++
> kernel/locking/rwbase_rt.c    |  3 +++
> kernel/locking/rwsem.c        |  9 +++++++++
> kernel/locking/semaphore.c    | 14 +++++++++++++-
> 8 files changed, 56 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index ee2fd7614a93..c88deda77cf2 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -644,6 +644,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state,
> unsigned int subclas
> 	}
> 
> 	set_current_state(state);
> +	trace_contention_begin(lock, 0);

This should be LCB_F_SPIN rather than the hardcoded 0.

> 	for (;;) {
> 		bool first;
> 
> @@ -710,6 +711,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state,
> unsigned int subclas
> skip_wait:
> 	/* got the lock - cleanup and rejoice! */
> 	lock_acquired(&lock->dep_map, ip);
> +	trace_contention_end(lock, 0);
> 
> 	if (ww_ctx)
> 		ww_mutex_lock_acquired(ww, ww_ctx);
> @@ -721,6 +723,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state,
> unsigned int subclas
> err:
> 	__set_current_state(TASK_RUNNING);
> 	__mutex_remove_waiter(lock, &waiter);
> +	trace_contention_end(lock, ret);
> err_early_kill:
> 	raw_spin_unlock(&lock->wait_lock);
> 	debug_mutex_free_waiter(&waiter);
> diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c
> index c9fdae94e098..833043613af6 100644
> --- a/kernel/locking/percpu-rwsem.c
> +++ b/kernel/locking/percpu-rwsem.c
> @@ -9,6 +9,7 @@
> #include <linux/sched/task.h>
> #include <linux/sched/debug.h>
> #include <linux/errno.h>
> +#include <trace/events/lock.h>
> 
> int __percpu_init_rwsem(struct percpu_rw_semaphore *sem,
> 			const char *name, struct lock_class_key *key)
> @@ -154,6 +155,7 @@ static void percpu_rwsem_wait(struct percpu_rw_semaphore
> *sem, bool reader)
> 	}
> 	spin_unlock_irq(&sem->waiters.lock);
> 
> +	trace_contention_begin(sem, LCB_F_PERCPU | (reader ? LCB_F_READ :
> LCB_F_WRITE));
> 	while (wait) {
> 		set_current_state(TASK_UNINTERRUPTIBLE);
> 		if (!smp_load_acquire(&wq_entry.private))
> @@ -161,6 +163,7 @@ static void percpu_rwsem_wait(struct percpu_rw_semaphore
> *sem, bool reader)
> 		schedule();
> 	}
> 	__set_current_state(TASK_RUNNING);
> +	trace_contention_end(sem, 0);

So for the reader-writer locks and percpu rwsems, the "trace contention end" will always
have ret=0. Likewise for qspinlock, qrwlock, and rtlock. It seems to be a waste of trace
buffer space to always have space for a return value that is always 0.

Sorry if I missed prior discussions of that topic, but why introduce this single
"trace contention begin/end" muxer tracepoint with flags rather than per-locking-type
tracepoints? The per-locking-type tracepoints could be tuned to only have the fields
that are needed for each locking type.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


* Re: [PATCH 1/2] locking: Add lock contention tracepoints
  2022-03-17 13:32   ` Mathieu Desnoyers
@ 2022-03-17 16:07     ` Steven Rostedt
  2022-03-17 16:37       ` Mathieu Desnoyers
  0 siblings, 1 reply; 26+ messages in thread
From: Steven Rostedt @ 2022-03-17 16:07 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Namhyung Kim, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, linux-kernel, Thomas Gleixner,
	Byungchul Park, paulmck, Arnd Bergmann, Radoslaw Burny,
	linux-arch, bpf

On Thu, 17 Mar 2022 09:32:55 -0400 (EDT)
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> Unless there is a particular reason for using preprocessor defines here, the
> following form is typically better because it does not pollute the preprocessor
> defines, e.g.:
> 
> enum lock_contention_flags {
>         LCB_F_SPIN =   1U << 0,
>         LCB_F_READ =   1U << 1,
>         LCB_F_WRITE =  1U << 2,
>         LCB_F_RT =     1U << 3,
>         LCB_F_PERCPU = 1U << 4,
> };

If you do this, then to use __print_flags(), you'll also need to add:

TRACE_DEFINE_ENUM(LCB_F_SPIN);
TRACE_DEFINE_ENUM(LCB_F_READ);
TRACE_DEFINE_ENUM(LCB_F_WRITE);
TRACE_DEFINE_ENUM(LCB_F_RT);
TRACE_DEFINE_ENUM(LCB_F_PERCPU);

Which does slow down boot up slightly.

-- Steve

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] locking: Apply contention tracepoints in the slow path
  2022-03-17 13:45   ` Mathieu Desnoyers
@ 2022-03-17 16:10     ` Steven Rostedt
  2022-03-17 16:43       ` Mathieu Desnoyers
  2022-03-18 21:34     ` Namhyung Kim
  1 sibling, 1 reply; 26+ messages in thread
From: Steven Rostedt @ 2022-03-17 16:10 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Namhyung Kim, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, linux-kernel, Thomas Gleixner,
	Byungchul Park, paulmck, Arnd Bergmann, Radoslaw Burny,
	linux-arch, bpf

On Thu, 17 Mar 2022 09:45:28 -0400 (EDT)
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> > *sem, bool reader)
> > 		schedule();
> > 	}
> > 	__set_current_state(TASK_RUNNING);
> > +	trace_contention_end(sem, 0);  
> 
> So for the reader-write locks, and percpu rwlocks, the "trace contention end" will always
> have ret=0. Likewise for qspinlock, qrwlock, and rtlock. It seems to be a waste of trace
> buffer space to always have space for a return value that is always 0.
> 
> Sorry if I missed prior dicussions of that topic, but why introduce this single
> "trace contention begin/end" muxer tracepoint with flags rather than per-locking-type
> tracepoint ? The per-locking-type tracepoint could be tuned to only have the fields
> that are needed for each locking type.

Per-locking-type tracepoints will also add a bigger footprint.

One extra byte is not an issue. These are just the tracepoints. You can still
attach your own specific LTTng trace events that ignore the zero
parameter, and multiplex into specific types of trace events on your
end.

I prefer the current approach as it keeps the tracing footprint down.

-- Steve

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] locking: Add lock contention tracepoints
  2022-03-17 16:07     ` Steven Rostedt
@ 2022-03-17 16:37       ` Mathieu Desnoyers
  2022-03-18 20:58         ` Namhyung Kim
  0 siblings, 1 reply; 26+ messages in thread
From: Mathieu Desnoyers @ 2022-03-17 16:37 UTC (permalink / raw)
  To: rostedt
  Cc: Namhyung Kim, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, linux-kernel, Thomas Gleixner,
	Byungchul Park, paulmck, Arnd Bergmann, Radoslaw Burny,
	linux-arch, bpf

----- On Mar 17, 2022, at 12:07 PM, rostedt rostedt@goodmis.org wrote:

> On Thu, 17 Mar 2022 09:32:55 -0400 (EDT)
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
>> Unless there is a particular reason for using preprocessor defines here, the
>> following form is typically better because it does not pollute the preprocessor
>> defines, e.g.:
>> 
>> enum lock_contention_flags {
>>         LCB_F_SPIN =   1U << 0,
>>         LCB_F_READ =   1U << 1,
>>         LCB_F_WRITE =  1U << 2,
>>         LCB_F_RT =     1U << 3,
>>         LCB_F_PERCPU = 1U << 4,
>> };
> 
> If you do this, then to use __print_flags() you'll also need to add:
> 
> TRACE_DEFINE_ENUM(LCB_F_SPIN);
> TRACE_DEFINE_ENUM(LCB_F_READ);
> TRACE_DEFINE_ENUM(LCB_F_WRITE);
> TRACE_DEFINE_ENUM(LCB_F_RT);
> TRACE_DEFINE_ENUM(LCB_F_PERCPU);
> 
> Which does slow down boot up slightly.

So it looks like there is (currently) a good reason for going with the #define.

As a side discussion, I keep finding it odd that this adds overhead at boot. I suspect
this is implemented as a linked list which needs to be iterated over at boot time.

With a few changes to these macros, these linked lists could be turned into arrays,
and thus remove the boot-time overhead.

Thanks,

Mathieu

> 
> -- Steve

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] locking: Apply contention tracepoints in the slow path
  2022-03-17 16:10     ` Steven Rostedt
@ 2022-03-17 16:43       ` Mathieu Desnoyers
  0 siblings, 0 replies; 26+ messages in thread
From: Mathieu Desnoyers @ 2022-03-17 16:43 UTC (permalink / raw)
  To: rostedt
  Cc: Namhyung Kim, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, linux-kernel, Thomas Gleixner,
	Byungchul Park, paulmck, Arnd Bergmann, Radoslaw Burny,
	linux-arch, bpf

----- On Mar 17, 2022, at 12:10 PM, rostedt rostedt@goodmis.org wrote:

> On Thu, 17 Mar 2022 09:45:28 -0400 (EDT)
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
>> > *sem, bool reader)
>> > 		schedule();
>> > 	}
>> > 	__set_current_state(TASK_RUNNING);
>> > +	trace_contention_end(sem, 0);
>> 
>> So for the reader-writer locks and percpu rwsems, the "trace contention end"
>> will always have ret=0. Likewise for qspinlock, qrwlock, and rtlock. It
>> seems a waste of trace buffer space to always reserve room for a return
>> value that is always 0.
>> 
>> Sorry if I missed prior discussions of the topic, but why introduce this
>> single "trace contention begin/end" muxer tracepoint with flags rather than
>> per-locking-type tracepoints? A per-locking-type tracepoint could be tuned
>> to carry only the fields that are needed for its locking type.
> 
> Per-locking-type tracepoints will also add a bigger footprint.

If you are talking about code and data size footprint in the kernel, yes, we agree
there.

> 
> One extra byte is not an issue.

The implementation uses a 32-bit integer.

But given that this only traces contention, shrinking the event size is
probably not as important as it would be if we were tracing every uncontended
lock/unlock.

> These are just the tracepoints. You can still attach your own specific
> LTTng trace events that ignore the zero parameter, and multiplex into
> specific types of trace events on your end.

Indeed, I could, as I do for system call entry/exit tracing. But I suspect it would
not be worth it for contended locks, because I don't expect those events to be frequent
enough in the trace to justify the added code/data footprint, as you pointed out.

> 
> I prefer the current approach as it keeps the tracing footprint down.

Likewise. I just wanted to make sure this was done with the trace-buffer vs.
kernel code/data overhead trade-off in mind.

Thanks,

Mathieu

> 
> -- Steve

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/2] locking: Add new lock contention tracepoints (v3)
  2022-03-16 22:45 [PATCH 0/2] locking: Add new lock contention tracepoints (v3) Namhyung Kim
  2022-03-16 22:45 ` [PATCH 1/2] locking: Add lock contention tracepoints Namhyung Kim
  2022-03-16 22:45 ` [PATCH 2/2] locking: Apply contention tracepoints in the slow path Namhyung Kim
@ 2022-03-17 17:32 ` Hyeonggon Yoo
  2022-03-18 21:12   ` Namhyung Kim
  2 siblings, 1 reply; 26+ messages in thread
From: Hyeonggon Yoo @ 2022-03-17 17:32 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long,
	Boqun Feng, LKML, Thomas Gleixner, Steven Rostedt,
	Byungchul Park, Paul E. McKenney, Mathieu Desnoyers,
	Arnd Bergmann, Radoslaw Burny, linux-arch, bpf

On Wed, Mar 16, 2022 at 03:45:46PM -0700, Namhyung Kim wrote:
> Hello,
> 
> There have been some requests for low-overhead kernel lock contention
> monitoring.  The kernel has CONFIG_LOCK_STAT to provide such an
> infrastructure, either via /proc/lock_stat or tracepoints directly.
> 
> However, it's not lightweight and is hard to use in production.  So I'm
> trying to add new tracepoints for lock contention and to use them as a
> base to build a new monitoring system.

Hello Namhyung,
I like this series very much.
Lock contention becomes much more visible without serious overhead.

For the series:
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

How would you use these tracepoints, is there a script you use?
For testing, I just wrote a simple bpftrace script:

$ sudo bpftrace -e 'BEGIN
{
	printf("Collecting lockstats... Hit Ctrl-C to end.\n");
}

tracepoint:lock:contention_begin
{
	@start_us[tid] = nsecs / 1000;
}

tracepoint:lock:contention_end
{
	if (args->ret == 0) {
		@stats[kstack] = hist(nsecs / 1000 - @start_us[tid]);
	}
}

END
{
	clear(@start_us);
}'

And it shows its distribution of slowpath wait time like below. Great.

@stats[
    __traceiter_contention_end+76
    __traceiter_contention_end+76
    queued_spin_lock_slowpath+556
    _raw_spin_lock+108
    rmqueue_bulk+80
    rmqueue+1060
    get_page_from_freelist+372
    __alloc_pages+208
    alloc_pages_vma+160
    alloc_zeroed_user_highpage_movable+72
    do_anonymous_page+92
    handle_pte_fault+320
    __handle_mm_fault+252
    handle_mm_fault+244
    do_page_fault+340
    do_translation_fault+100
    do_mem_abort+76
    el0_da+60
    el0t_64_sync_handler+232
    el0t_64_sync+420
]:
[2, 4)                 1 |@                                                   |
[4, 8)                30 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@     |
[8, 16)               25 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@             |
[16, 32)              33 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[32, 64)              29 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@       |
[64, 128)             13 |@@@@@@@@@@@@@@@@@@@@                                |
[128, 256)             2 |@@@                                                 |


@stats[
    __traceiter_contention_end+76
    __traceiter_contention_end+76
    rwsem_down_write_slowpath+1216
    down_write_killable+100
    do_mprotect_pkey.constprop.0+176
    __arm64_sys_mprotect+40
    invoke_syscall+80
    el0_svc_common.constprop.0+76
    do_el0_svc+52
    el0_svc+48
    el0t_64_sync_handler+164
    el0t_64_sync+420
]: 
[1]                   13 |@@@@@@@@@@@                                         |
[2, 4)                42 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                |
[4, 8)                 5 |@@@@                                                |
[8, 16)               10 |@@@@@@@@                                            |
[16, 32)              60 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[32, 64)              41 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                 |
[64, 128)             40 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                  |
[128, 256)            14 |@@@@@@@@@@@@                                        |
[256, 512)             7 |@@@@@@                                              |
[512, 1K)              6 |@@@@@                                               |
[1K, 2K)               2 |@                                                   |
[2K, 4K)               1 |                                                    |

@stats[
    __traceiter_contention_end+76
    __traceiter_contention_end+76
    queued_spin_lock_slowpath+556
    _raw_spin_lock+108
    futex_wake+168
    do_futex+200
    __arm64_sys_futex+112
    invoke_syscall+80
    el0_svc_common.constprop.0+76
    do_el0_svc+52
    el0_svc+48
    el0t_64_sync_handler+164
    el0t_64_sync+420
]:
[0]                    3 |                                                    |
[1]                 2515 |@                                                   |
[2, 4)              8747 |@@@@@                                               |
[4, 8)             17052 |@@@@@@@@@@                                          |
[8, 16)            46706 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                       |
[16, 32)           82105 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[32, 64)           46918 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                       |
[64, 128)             99 |                                                    |
[128, 256)             1 |                                                    |
[256, 512)             8 |                                                    |
[512, 1K)              0 |                                                    |
[1K, 2K)               0 |                                                    |
[2K, 4K)               0 |                                                    |
[4K, 8K)               0 |                                                    |
[8K, 16K)              0 |                                                    |
[16K, 32K)             5 |                                                    |
[32K, 64K)            12 |                                                    |
[64K, 128K)           34 |                                                    |
[128K, 256K)          68 |                                                    |
[256K, 512K)           7 |                                                    |


> * Changes in v3
>  - move the tracepoints deeper in the slow path
>  - remove the caller ip
>  - don't use task state in the flags
> 
> * Changes in v2
>  - do not use lockdep infrastructure
>  - add flags argument to lock:contention_begin tracepoint
> 
> I added a flags argument to the contention_begin tracepoint to classify the
> lock in question.  It can tell whether it's a spinlock, reader-writer lock
> or a mutex.  With a stacktrace, users can identify which lock is contended.
>
> Patch 01 adds the tracepoints and moves the definition to the mutex.c file
> so that they are visible without lockdep.
> 
> Patch 02 actually installs the tracepoints in the locking code.  To
> minimize the overhead, they were added separately in the slow paths.  As
> spinlocks are defined in the arch headers, I couldn't handle them all;
> I've only added them to the generic queued spinlocks and rwlocks.  Each
> arch can add the tracepoints later.
>
> This series is based on the current tip/locking/core, and you can get it
> from the 'locking/tracepoint-v3' branch in my tree at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
> 
> 
> Thanks,
> Namhyung
> 
> 
> Namhyung Kim (2):
>   locking: Add lock contention tracepoints
>   locking: Apply contention tracepoints in the slow path
> 
>  include/trace/events/lock.h   | 54 +++++++++++++++++++++++++++++++++--
>  kernel/locking/lockdep.c      |  1 -
>  kernel/locking/mutex.c        |  6 ++++
>  kernel/locking/percpu-rwsem.c |  3 ++
>  kernel/locking/qrwlock.c      |  9 ++++++
>  kernel/locking/qspinlock.c    |  5 ++++
>  kernel/locking/rtmutex.c      | 11 +++++++
>  kernel/locking/rwbase_rt.c    |  3 ++
>  kernel/locking/rwsem.c        |  9 ++++++
>  kernel/locking/semaphore.c    | 14 ++++++++-
>  10 files changed, 110 insertions(+), 5 deletions(-)
> 
> 
> base-commit: cd27ccfc727e99352321c0c75012ab9c5a90321e
> -- 
> 2.35.1.894.gb6a874cedc-goog
> 

-- 
Thank you, You are awesome!
Hyeonggon :-)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] locking: Apply contention tracepoints in the slow path
  2022-03-16 22:45 ` [PATCH 2/2] locking: Apply contention tracepoints in the slow path Namhyung Kim
  2022-03-17 13:45   ` Mathieu Desnoyers
@ 2022-03-17 18:19   ` Hyeonggon Yoo
  2022-03-18 21:43     ` Namhyung Kim
  2022-03-18 12:55   ` Boqun Feng
  2 siblings, 1 reply; 26+ messages in thread
From: Hyeonggon Yoo @ 2022-03-17 18:19 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long,
	Boqun Feng, LKML, Thomas Gleixner, Steven Rostedt,
	Byungchul Park, Paul E. McKenney, Mathieu Desnoyers,
	Arnd Bergmann, Radoslaw Burny, linux-arch, bpf

On Wed, Mar 16, 2022 at 03:45:48PM -0700, Namhyung Kim wrote:
> Adding the lock contention tracepoints in various lock function slow
> paths.  Note that each arch can define spinlocks differently; I have
> only added them to the generic qspinlock for now.
> 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  kernel/locking/mutex.c        |  3 +++
>  kernel/locking/percpu-rwsem.c |  3 +++
>  kernel/locking/qrwlock.c      |  9 +++++++++
>  kernel/locking/qspinlock.c    |  5 +++++
>  kernel/locking/rtmutex.c      | 11 +++++++++++
>  kernel/locking/rwbase_rt.c    |  3 +++
>  kernel/locking/rwsem.c        |  9 +++++++++
>  kernel/locking/semaphore.c    | 14 +++++++++++++-
>  8 files changed, 56 insertions(+), 1 deletion(-)
>

[...]

> diff --git a/kernel/locking/semaphore.c b/kernel/locking/semaphore.c
> index 9ee381e4d2a4..e3c19668dfee 100644
> --- a/kernel/locking/semaphore.c
> +++ b/kernel/locking/semaphore.c
> @@ -32,6 +32,7 @@
>  #include <linux/semaphore.h>
>  #include <linux/spinlock.h>
>  #include <linux/ftrace.h>
> +#include <trace/events/lock.h>
>  
>  static noinline void __down(struct semaphore *sem);
>  static noinline int __down_interruptible(struct semaphore *sem);
> @@ -209,6 +210,7 @@ static inline int __sched __down_common(struct semaphore *sem, long state,
>  								long timeout)
>  {
>  	struct semaphore_waiter waiter;
> +	bool tracing = false;
>  
>  	list_add_tail(&waiter.list, &sem->wait_list);
>  	waiter.task = current;
> @@ -220,18 +222,28 @@ static inline int __sched __down_common(struct semaphore *sem, long state,
>  		if (unlikely(timeout <= 0))
>  			goto timed_out;
>  		__set_current_state(state);
> +		if (!tracing) {
> +			trace_contention_begin(sem, 0);
> +			tracing = true;
> +		}
>  		raw_spin_unlock_irq(&sem->lock);
>  		timeout = schedule_timeout(timeout);
>  		raw_spin_lock_irq(&sem->lock);
> -		if (waiter.up)
> +		if (waiter.up) {
> +			trace_contention_end(sem, 0);
>  			return 0;
> +		}
>  	}
>  
>   timed_out:
> +	if (tracing)
> +		trace_contention_end(sem, -ETIME);
>  	list_del(&waiter.list);
>  	return -ETIME;
>  
>   interrupted:
> +	if (tracing)
> +		trace_contention_end(sem, -EINTR);
>  	list_del(&waiter.list);
>  	return -EINTR;
>  }

Why not simply remove the tracing variable and call trace_contention_begin()
earlier, as in rwsem? We can ignore it if ret != 0.

-- 
Thank you, You are awesome!
Hyeonggon :-)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] locking: Apply contention tracepoints in the slow path
  2022-03-16 22:45 ` [PATCH 2/2] locking: Apply contention tracepoints in the slow path Namhyung Kim
  2022-03-17 13:45   ` Mathieu Desnoyers
  2022-03-17 18:19   ` Hyeonggon Yoo
@ 2022-03-18 12:55   ` Boqun Feng
  2022-03-18 13:24     ` Hyeonggon Yoo
  2022-03-18 16:43     ` Peter Zijlstra
  2 siblings, 2 replies; 26+ messages in thread
From: Boqun Feng @ 2022-03-18 12:55 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long, LKML,
	Thomas Gleixner, Steven Rostedt, Byungchul Park,
	Paul E. McKenney, Mathieu Desnoyers, Arnd Bergmann,
	Radoslaw Burny, linux-arch, bpf

On Wed, Mar 16, 2022 at 03:45:48PM -0700, Namhyung Kim wrote:
[...]
> @@ -209,6 +210,7 @@ static inline int __sched __down_common(struct semaphore *sem, long state,
>  								long timeout)
>  {
>  	struct semaphore_waiter waiter;
> +	bool tracing = false;
>  
>  	list_add_tail(&waiter.list, &sem->wait_list);
>  	waiter.task = current;
> @@ -220,18 +222,28 @@ static inline int __sched __down_common(struct semaphore *sem, long state,
>  		if (unlikely(timeout <= 0))
>  			goto timed_out;
>  		__set_current_state(state);
> +		if (!tracing) {
> +			trace_contention_begin(sem, 0);

This looks a little ugly ;-/ Maybe we can rename __down_common() to
___down_common() and implement __down_common() as:

	static inline int __sched __down_common(...)
	{
		int ret;
		trace_contention_begin(sem, 0);
		ret = ___down_common(...);
		trace_contention_end(sem, ret);
		return ret;
	}

Thoughts?

Regards,
Boqun

> +			tracing = true;
> +		}
>  		raw_spin_unlock_irq(&sem->lock);
>  		timeout = schedule_timeout(timeout);
>  		raw_spin_lock_irq(&sem->lock);
> -		if (waiter.up)
> +		if (waiter.up) {
> +			trace_contention_end(sem, 0);
>  			return 0;
> +		}
>  	}
>  
>   timed_out:
> +	if (tracing)
> +		trace_contention_end(sem, -ETIME);
>  	list_del(&waiter.list);
>  	return -ETIME;
>  
>   interrupted:
> +	if (tracing)
> +		trace_contention_end(sem, -EINTR);
>  	list_del(&waiter.list);
>  	return -EINTR;
>  }
> -- 
> 2.35.1.894.gb6a874cedc-goog
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] locking: Apply contention tracepoints in the slow path
  2022-03-18 12:55   ` Boqun Feng
@ 2022-03-18 13:24     ` Hyeonggon Yoo
  2022-03-18 13:28       ` Hyeonggon Yoo
  2022-03-18 16:43     ` Peter Zijlstra
  1 sibling, 1 reply; 26+ messages in thread
From: Hyeonggon Yoo @ 2022-03-18 13:24 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Namhyung Kim, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, LKML, Thomas Gleixner, Steven Rostedt,
	Byungchul Park, Paul E. McKenney, Mathieu Desnoyers,
	Arnd Bergmann, Radoslaw Burny, linux-arch, bpf

On Fri, Mar 18, 2022 at 08:55:32PM +0800, Boqun Feng wrote:
> On Wed, Mar 16, 2022 at 03:45:48PM -0700, Namhyung Kim wrote:
> [...]
> > @@ -209,6 +210,7 @@ static inline int __sched __down_common(struct semaphore *sem, long state,
> >  								long timeout)
> >  {
> >  	struct semaphore_waiter waiter;
> > +	bool tracing = false;
> >  
> >  	list_add_tail(&waiter.list, &sem->wait_list);
> >  	waiter.task = current;
> > @@ -220,18 +222,28 @@ static inline int __sched __down_common(struct semaphore *sem, long state,
> >  		if (unlikely(timeout <= 0))
> >  			goto timed_out;
> >  		__set_current_state(state);
> > +		if (!tracing) {
> > +			trace_contention_begin(sem, 0);
> 
> This looks a little ugly ;-/

I agree this can be simplified a bit.

> Maybe we can rename __down_common() to
> ___down_common() and implement __down_common() as:
> 
> 	static inline int __sched __down_common(...)
> 	{
> 		int ret;
> 		trace_contention_begin(sem, 0);
> 		ret = ___down_common(...);
> 		trace_contention_end(sem, ret);
> 		return ret;
> 	}
> 
> Thoughts?
>

But IMO inlining tracepoints is generally not a good idea. It will
increase the kernel size a lot.

> Regards,
> Boqun
> 
> > +			tracing = true;
> > +		}
> >  		raw_spin_unlock_irq(&sem->lock);
> >  		timeout = schedule_timeout(timeout);
> >  		raw_spin_lock_irq(&sem->lock);
> > -		if (waiter.up)
> > +		if (waiter.up) {
> > +			trace_contention_end(sem, 0);
> >  			return 0;
> > +		}
> >  	}
> >  
> >   timed_out:
> > +	if (tracing)
> > +		trace_contention_end(sem, -ETIME);
> >  	list_del(&waiter.list);
> >  	return -ETIME;
> >  
> >   interrupted:
> > +	if (tracing)
> > +		trace_contention_end(sem, -EINTR);
> >  	list_del(&waiter.list);
> >  	return -EINTR;
> >  }
> > -- 
> > 2.35.1.894.gb6a874cedc-goog
> > 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] locking: Apply contention tracepoints in the slow path
  2022-03-18 13:24     ` Hyeonggon Yoo
@ 2022-03-18 13:28       ` Hyeonggon Yoo
  0 siblings, 0 replies; 26+ messages in thread
From: Hyeonggon Yoo @ 2022-03-18 13:28 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Namhyung Kim, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, LKML, Thomas Gleixner, Steven Rostedt,
	Byungchul Park, Paul E. McKenney, Mathieu Desnoyers,
	Arnd Bergmann, Radoslaw Burny, linux-arch, bpf

On Fri, Mar 18, 2022 at 01:24:24PM +0000, Hyeonggon Yoo wrote:
> On Fri, Mar 18, 2022 at 08:55:32PM +0800, Boqun Feng wrote:
> > On Wed, Mar 16, 2022 at 03:45:48PM -0700, Namhyung Kim wrote:
> > [...]
> > > @@ -209,6 +210,7 @@ static inline int __sched __down_common(struct semaphore *sem, long state,
> > >  								long timeout)
> > >  {
> > >  	struct semaphore_waiter waiter;
> > > +	bool tracing = false;
> > >  
> > >  	list_add_tail(&waiter.list, &sem->wait_list);
> > >  	waiter.task = current;
> > > @@ -220,18 +222,28 @@ static inline int __sched __down_common(struct semaphore *sem, long state,
> > >  		if (unlikely(timeout <= 0))
> > >  			goto timed_out;
> > >  		__set_current_state(state);
> > > +		if (!tracing) {
> > > +			trace_contention_begin(sem, 0);
> > 
> > This looks a little ugly ;-/
> 
> I agree this can be simplified a bit.
> 
> > Maybe we can rename __down_common() to
> > ___down_common() and implement __down_common() as:
> > 
> > 	static inline int __sched __down_common(...)
> > 	{
> > 		int ret;
> > 		trace_contention_begin(sem, 0);
> > 		ret = ___down_common(...);
> > 		trace_contention_end(sem, ret);
> > 		return ret;
> > 	}
> > 
> > Thoughts?
> >
> 
> But IMO inlining tracepoints is generally not a good idea.
> Will increase kernel size a lot.
>

Ah, it's already inlined. Sorry.

> > Regards,
> > Boqun
> > 
> > > +			tracing = true;
> > > +		}
> > >  		raw_spin_unlock_irq(&sem->lock);
> > >  		timeout = schedule_timeout(timeout);
> > >  		raw_spin_lock_irq(&sem->lock);
> > > -		if (waiter.up)
> > > +		if (waiter.up) {
> > > +			trace_contention_end(sem, 0);
> > >  			return 0;
> > > +		}
> > >  	}
> > >  
> > >   timed_out:
> > > +	if (tracing)
> > > +		trace_contention_end(sem, -ETIME);
> > >  	list_del(&waiter.list);
> > >  	return -ETIME;
> > >  
> > >   interrupted:
> > > +	if (tracing)
> > > +		trace_contention_end(sem, -EINTR);
> > >  	list_del(&waiter.list);
> > >  	return -EINTR;
> > >  }
> > > -- 
> > > 2.35.1.894.gb6a874cedc-goog
> > > 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] locking: Apply contention tracepoints in the slow path
  2022-03-18 12:55   ` Boqun Feng
  2022-03-18 13:24     ` Hyeonggon Yoo
@ 2022-03-18 16:43     ` Peter Zijlstra
  2022-03-18 21:55       ` Namhyung Kim
  1 sibling, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2022-03-18 16:43 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Namhyung Kim, Ingo Molnar, Will Deacon, Waiman Long, LKML,
	Thomas Gleixner, Steven Rostedt, Byungchul Park,
	Paul E. McKenney, Mathieu Desnoyers, Arnd Bergmann,
	Radoslaw Burny, linux-arch, bpf

On Fri, Mar 18, 2022 at 08:55:32PM +0800, Boqun Feng wrote:
> On Wed, Mar 16, 2022 at 03:45:48PM -0700, Namhyung Kim wrote:
> [...]
> > @@ -209,6 +210,7 @@ static inline int __sched __down_common(struct semaphore *sem, long state,
> >  								long timeout)
> >  {
> >  	struct semaphore_waiter waiter;
> > +	bool tracing = false;
> >  
> >  	list_add_tail(&waiter.list, &sem->wait_list);
> >  	waiter.task = current;
> > @@ -220,18 +222,28 @@ static inline int __sched __down_common(struct semaphore *sem, long state,
> >  		if (unlikely(timeout <= 0))
> >  			goto timed_out;
> >  		__set_current_state(state);
> > +		if (!tracing) {
> > +			trace_contention_begin(sem, 0);
> 
> This looks a little ugly ;-/ Maybe we can rename __down_common() to
> ___down_common() and implement __down_common() as:
> 
> 	static inline int __sched __down_common(...)
> 	{
> 		int ret;
> 		trace_contention_begin(sem, 0);
> 		ret = ___down_common(...);
> 		trace_contention_end(sem, ret);
> 		return ret;
> 	}
> 
> Thoughts?

Yeah, that works, except I think he wants a few extra
__set_current_state() calls, like so:

diff --git a/kernel/locking/semaphore.c b/kernel/locking/semaphore.c
index 9ee381e4d2a4..e2049a7e0ea4 100644
--- a/kernel/locking/semaphore.c
+++ b/kernel/locking/semaphore.c
@@ -205,8 +205,7 @@ struct semaphore_waiter {
  * constant, and thus optimised away by the compiler.  Likewise the
  * 'timeout' parameter for the cases without timeouts.
  */
-static inline int __sched __down_common(struct semaphore *sem, long state,
-								long timeout)
+static __always_inline int ___down_common(struct semaphore *sem, long state, long timeout)
 {
 	struct semaphore_waiter waiter;
 
@@ -227,15 +226,28 @@ static inline int __sched __down_common(struct semaphore *sem, long state,
 			return 0;
 	}
 
- timed_out:
+timed_out:
 	list_del(&waiter.list);
 	return -ETIME;
 
- interrupted:
+interrupted:
 	list_del(&waiter.list);
 	return -EINTR;
 }
 
+static __always_inline int __down_common(struct semaphore *sem, long state, long timeout)
+{
+	int ret;
+
+	__set_current_state(state);
+	trace_contention_begin(sem, 0);
+	ret = ___down_common(sem, state, timeout);
+	__set_current_state(TASK_RUNNING);
+	trace_contention_end(sem, ret);
+
+	return ret;
+}
+
 static noinline void __sched __down(struct semaphore *sem)
 {
 	__down_common(sem, TASK_UNINTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] locking: Add lock contention tracepoints
  2022-03-17 16:37       ` Mathieu Desnoyers
@ 2022-03-18 20:58         ` Namhyung Kim
  0 siblings, 0 replies; 26+ messages in thread
From: Namhyung Kim @ 2022-03-18 20:58 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: rostedt, Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long,
	Boqun Feng, linux-kernel, Thomas Gleixner, Byungchul Park,
	paulmck, Arnd Bergmann, Radoslaw Burny, linux-arch, bpf

Hi Mathieu and Steve,

On Thu, Mar 17, 2022 at 9:37 AM Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
>
> ----- On Mar 17, 2022, at 12:07 PM, rostedt rostedt@goodmis.org wrote:
>
> > On Thu, 17 Mar 2022 09:32:55 -0400 (EDT)
> > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> >
> >> Unless there is a particular reason for using preprocessor defines here, the
> >> following form is typically better because it does not pollute the preprocessor
> >> defines, e.g.:
> >>
> >> enum lock_contention_flags {
> >>         LCB_F_SPIN =   1U << 0,
> >>         LCB_F_READ =   1U << 1,
> >>         LCB_F_WRITE =  1U << 2,
> >>         LCB_F_RT =     1U << 3,
> >>         LCB_F_PERCPU = 1U << 4,
> >> };
> >
> > If you do this, then to use __print_flags() you'll also need to add:
> >
> > TRACE_DEFINE_ENUM(LCB_F_SPIN);
> > TRACE_DEFINE_ENUM(LCB_F_READ);
> > TRACE_DEFINE_ENUM(LCB_F_WRITE);
> > TRACE_DEFINE_ENUM(LCB_F_RT);
> > TRACE_DEFINE_ENUM(LCB_F_PERCPU);
> >
> > Which does slow down boot up slightly.
>
> So it looks like there is (currently) a good reason for going with the #define.

Thanks for your suggestions.  I'll go with #define this time, and we can
convert it to an enum later (hopefully after the boot-time overhead is resolved).

Thanks,
Namhyung

>
> As a side-discussion, I keep finding it odd that this adds overhead on boot. I suspect
> this is also implemented as a linked list which needs to be iterated over at boot-time.
>
> With a few changes to these macros, these linked lists could be turned into arrays,
> and thus remove the boot-time overhead.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/2] locking: Add new lock contention tracepoints (v3)
  2022-03-17 17:32 ` [PATCH 0/2] locking: Add new lock contention tracepoints (v3) Hyeonggon Yoo
@ 2022-03-18 21:12   ` Namhyung Kim
  0 siblings, 0 replies; 26+ messages in thread
From: Namhyung Kim @ 2022-03-18 21:12 UTC (permalink / raw)
  To: Hyeonggon Yoo
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long,
	Boqun Feng, LKML, Thomas Gleixner, Steven Rostedt,
	Byungchul Park, Paul E. McKenney, Mathieu Desnoyers,
	Arnd Bergmann, Radoslaw Burny, linux-arch, bpf

Hello Hyeonggon,

On Thu, Mar 17, 2022 at 10:32 AM Hyeonggon Yoo <42.hyeyoo@gmail.com> wrote:
>
> On Wed, Mar 16, 2022 at 03:45:46PM -0700, Namhyung Kim wrote:
> > Hello,
> >
> > There have been some requests for low-overhead kernel lock contention
> > monitoring.  The kernel has CONFIG_LOCK_STAT to provide such an
> > infrastructure, either via /proc/lock_stat or tracepoints directly.
> >
> > However, it's not lightweight and is hard to use in production.  So I'm
> > trying to add new tracepoints for lock contention and to use them as a
> > base to build a new monitoring system.
>
> Hello Namhyung,
> I like this series very much.
> Lock contention becomes much more visible without serious overhead.
>
> For the series:
> Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

Thanks!

>
> How would you use these tracepoints, is there a script you use?

Not yet, but I'm thinking of something similar to your script.
I'll probably extend the 'perf lock' command to produce this kind of
output, though it won't have lock names and will use avg/min/max wait
times instead of a histogram.

Thanks,
Namhyung


> For testing, I just wrote a simple bpftrace script:
>
> $ sudo bpftrace -e 'BEGIN
> {
>         printf("Collecting lockstats... Hit Ctrl-C to end.\n");
> }
>
> tracepoint:lock:contention_begin
> {
>         @start_us[tid] = nsecs / 1000;
> }
>
> tracepoint:lock:contention_end
> {
>         if (@start_us[tid] && args->ret == 0) {
>                 @stats[kstack] = hist(nsecs / 1000 - @start_us[tid]);
>         }
>         delete(@start_us[tid]);
> }
>
> END
> {
>         clear(@start_us);
> }'
>
> And it shows the distribution of slowpath wait times like below. Great.
>
> @stats[
>     __traceiter_contention_end+76
>     __traceiter_contention_end+76
>     queued_spin_lock_slowpath+556
>     _raw_spin_lock+108
>     rmqueue_bulk+80
>     rmqueue+1060
>     get_page_from_freelist+372
>     __alloc_pages+208
>     alloc_pages_vma+160
>     alloc_zeroed_user_highpage_movable+72
>     do_anonymous_page+92
>     handle_pte_fault+320
>     __handle_mm_fault+252
>     handle_mm_fault+244
>     do_page_fault+340
>     do_translation_fault+100
>     do_mem_abort+76
>     el0_da+60
>     el0t_64_sync_handler+232
>     el0t_64_sync+420
> ]:
> [2, 4)                 1 |@                                                   |
> [4, 8)                30 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@     |
> [8, 16)               25 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@             |
> [16, 32)              33 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [32, 64)              29 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@       |
> [64, 128)             13 |@@@@@@@@@@@@@@@@@@@@                                |
> [128, 256)             2 |@@@                                                 |
>
>
> @stats[
>     __traceiter_contention_end+76
>     __traceiter_contention_end+76
>     rwsem_down_write_slowpath+1216
>     down_write_killable+100
>     do_mprotect_pkey.constprop.0+176
>     __arm64_sys_mprotect+40
>     invoke_syscall+80
>     el0_svc_common.constprop.0+76
>     do_el0_svc+52
>     el0_svc+48
>     el0t_64_sync_handler+164
>     el0t_64_sync+420
> ]:
> [1]                   13 |@@@@@@@@@@@                                         |
> [2, 4)                42 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                |
> [4, 8)                 5 |@@@@                                                |
> [8, 16)               10 |@@@@@@@@                                            |
> [16, 32)              60 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [32, 64)              41 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                 |
> [64, 128)             40 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                  |
> [128, 256)            14 |@@@@@@@@@@@@                                        |
> [256, 512)             7 |@@@@@@                                              |
> [512, 1K)              6 |@@@@@                                               |
> [1K, 2K)               2 |@                                                   |
> [2K, 4K)               1 |                                                    |
>
> @stats[
>     __traceiter_contention_end+76
>     __traceiter_contention_end+76
>     queued_spin_lock_slowpath+556
>     _raw_spin_lock+108
>     futex_wake+168
>     do_futex+200
>     __arm64_sys_futex+112
>     invoke_syscall+80
>     el0_svc_common.constprop.0+76
>     do_el0_svc+52
>     el0_svc+48
>     el0t_64_sync_handler+164
>     el0t_64_sync+420
> ]:
> [0]                    3 |                                                    |
> [1]                 2515 |@                                                   |
> [2, 4)              8747 |@@@@@                                               |
> [4, 8)             17052 |@@@@@@@@@@                                          |
> [8, 16)            46706 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                       |
> [16, 32)           82105 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [32, 64)           46918 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                       |
> [64, 128)             99 |                                                    |
> [128, 256)             1 |                                                    |
> [256, 512)             8 |                                                    |
> [512, 1K)              0 |                                                    |
> [1K, 2K)               0 |                                                    |
> [2K, 4K)               0 |                                                    |
> [4K, 8K)               0 |                                                    |
> [8K, 16K)              0 |                                                    |
> [16K, 32K)             5 |                                                    |
> [32K, 64K)            12 |                                                    |
> [64K, 128K)           34 |                                                    |
> [128K, 256K)          68 |                                                    |
> [256K, 512K)           7 |                                                    |
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] locking: Apply contention tracepoints in the slow path
  2022-03-17 13:45   ` Mathieu Desnoyers
  2022-03-17 16:10     ` Steven Rostedt
@ 2022-03-18 21:34     ` Namhyung Kim
  1 sibling, 0 replies; 26+ messages in thread
From: Namhyung Kim @ 2022-03-18 21:34 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long,
	Boqun Feng, linux-kernel, Thomas Gleixner, rostedt,
	Byungchul Park, paulmck, Arnd Bergmann, Radoslaw Burny,
	linux-arch, bpf

On Thu, Mar 17, 2022 at 6:45 AM Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
>
> ----- On Mar 16, 2022, at 6:45 PM, Namhyung Kim namhyung@kernel.org wrote:
>
> > Adding the lock contention tracepoints in various lock function slow
> > paths.  Note that each arch can define spinlocks differently, so I
> > only added it to the generic qspinlock for now.
> >
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> > kernel/locking/mutex.c        |  3 +++
> > kernel/locking/percpu-rwsem.c |  3 +++
> > kernel/locking/qrwlock.c      |  9 +++++++++
> > kernel/locking/qspinlock.c    |  5 +++++
> > kernel/locking/rtmutex.c      | 11 +++++++++++
> > kernel/locking/rwbase_rt.c    |  3 +++
> > kernel/locking/rwsem.c        |  9 +++++++++
> > kernel/locking/semaphore.c    | 14 +++++++++++++-
> > 8 files changed, 56 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> > index ee2fd7614a93..c88deda77cf2 100644
> > --- a/kernel/locking/mutex.c
> > +++ b/kernel/locking/mutex.c
> > @@ -644,6 +644,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state,
> > unsigned int subclas
> >       }
> >
> >       set_current_state(state);
> > +     trace_contention_begin(lock, 0);
>
> This should be LCB_F_SPIN rather than the hardcoded 0.

I don't think so.  LCB_F_SPIN is for spinlocks, indicating that
it's spinning on a CPU, and its value is not 0.

>
> >       for (;;) {
> >               bool first;
> >
> > @@ -710,6 +711,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state,
> > unsigned int subclas
> > skip_wait:
> >       /* got the lock - cleanup and rejoice! */
> >       lock_acquired(&lock->dep_map, ip);
> > +     trace_contention_end(lock, 0);
> >
> >       if (ww_ctx)
> >               ww_mutex_lock_acquired(ww, ww_ctx);
> > @@ -721,6 +723,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state,
> > unsigned int subclas
> > err:
> >       __set_current_state(TASK_RUNNING);
> >       __mutex_remove_waiter(lock, &waiter);
> > +     trace_contention_end(lock, ret);
> > err_early_kill:
> >       raw_spin_unlock(&lock->wait_lock);
> >       debug_mutex_free_waiter(&waiter);
> > diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c
> > index c9fdae94e098..833043613af6 100644
> > --- a/kernel/locking/percpu-rwsem.c
> > +++ b/kernel/locking/percpu-rwsem.c
> > @@ -9,6 +9,7 @@
> > #include <linux/sched/task.h>
> > #include <linux/sched/debug.h>
> > #include <linux/errno.h>
> > +#include <trace/events/lock.h>
> >
> > int __percpu_init_rwsem(struct percpu_rw_semaphore *sem,
> >                       const char *name, struct lock_class_key *key)
> > @@ -154,6 +155,7 @@ static void percpu_rwsem_wait(struct percpu_rw_semaphore
> > *sem, bool reader)
> >       }
> >       spin_unlock_irq(&sem->waiters.lock);
> >
> > +     trace_contention_begin(sem, LCB_F_PERCPU | (reader ? LCB_F_READ :
> > LCB_F_WRITE));
> >       while (wait) {
> >               set_current_state(TASK_UNINTERRUPTIBLE);
> >               if (!smp_load_acquire(&wq_entry.private))
> > @@ -161,6 +163,7 @@ static void percpu_rwsem_wait(struct percpu_rw_semaphore
> > *sem, bool reader)
> >               schedule();
> >       }
> >       __set_current_state(TASK_RUNNING);
> > +     trace_contention_end(sem, 0);
>
> So for the reader-writer locks and percpu rwsems, the "trace contention end" will always
> have ret=0. Likewise for qspinlock, qrwlock, and rtlock. It seems to be a waste of trace
> buffer space to always reserve room for a return value that is always 0.

Right, I think it'd be better to have a new tracepoint for the error cases
and get rid of the return value in the contention_end.

Like contention_error or contention_return ?

>
> Sorry if I missed prior dicussions of that topic, but why introduce this single
> "trace contention begin/end" muxer tracepoint with flags rather than per-locking-type
> tracepoint ? The per-locking-type tracepoint could be tuned to only have the fields
> that are needed for each locking type.

No prior discussions on that topic and thanks for bringing it out.

Having per-locking-type tracepoints will help if you want to filter
out specific types of locks efficiently.  Otherwise it'd be simpler
for users to have a single set of tracepoints to handle all locking
types like the existing lockdep tracepoints do.

As it's in a contended path already, I think a little extra overhead
is acceptable, and the flags tell which type of lock is being traced,
so you can filter on them anyway.
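To make the filtering point concrete, here is a small sketch of decoding
such a flags field in a post-processing script.  The LCB_F_* bit values
below are assumptions for illustration only and must be checked against
the actual kernel definitions before use (a mutex passes flags=0 in this
series):

```python
# Assumed bit layout -- illustrative only; verify against the kernel's
# trace/events/lock.h before relying on these values.
LCB_F_SPIN   = 1 << 0
LCB_F_READ   = 1 << 1
LCB_F_WRITE  = 1 << 2
LCB_F_RT     = 1 << 3
LCB_F_PERCPU = 1 << 4

def decode_flags(flags):
    """Turn a contention_begin flags value into a readable lock type."""
    names = [name for bit, name in [(LCB_F_SPIN, "SPIN"),
                                    (LCB_F_READ, "READ"),
                                    (LCB_F_WRITE, "WRITE"),
                                    (LCB_F_RT, "RT"),
                                    (LCB_F_PERCPU, "PERCPU")]
             if flags & bit]
    return "|".join(names) if names else "NONE"

print(decode_flags(LCB_F_READ | LCB_F_PERCPU))  # percpu rwsem reader
```

A tool could drop events whose decoded type doesn't match a user-supplied
filter, at the cost of reading every event first.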

Thanks,
Namhyung


* Re: [PATCH 2/2] locking: Apply contention tracepoints in the slow path
  2022-03-17 18:19   ` Hyeonggon Yoo
@ 2022-03-18 21:43     ` Namhyung Kim
  0 siblings, 0 replies; 26+ messages in thread
From: Namhyung Kim @ 2022-03-18 21:43 UTC (permalink / raw)
  To: Hyeonggon Yoo
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long,
	Boqun Feng, LKML, Thomas Gleixner, Steven Rostedt,
	Byungchul Park, Paul E. McKenney, Mathieu Desnoyers,
	Arnd Bergmann, Radoslaw Burny, linux-arch, bpf

On Thu, Mar 17, 2022 at 11:19 AM Hyeonggon Yoo <42.hyeyoo@gmail.com> wrote:
>
> On Wed, Mar 16, 2022 at 03:45:48PM -0700, Namhyung Kim wrote:
> > Adding the lock contention tracepoints in various lock function slow
> > paths.  Note that each arch can define spinlocks differently, so I
> > only added it to the generic qspinlock for now.
> >
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> >  kernel/locking/mutex.c        |  3 +++
> >  kernel/locking/percpu-rwsem.c |  3 +++
> >  kernel/locking/qrwlock.c      |  9 +++++++++
> >  kernel/locking/qspinlock.c    |  5 +++++
> >  kernel/locking/rtmutex.c      | 11 +++++++++++
> >  kernel/locking/rwbase_rt.c    |  3 +++
> >  kernel/locking/rwsem.c        |  9 +++++++++
> >  kernel/locking/semaphore.c    | 14 +++++++++++++-
> >  8 files changed, 56 insertions(+), 1 deletion(-)
> >
>
> [...]
>
> > diff --git a/kernel/locking/semaphore.c b/kernel/locking/semaphore.c
> > index 9ee381e4d2a4..e3c19668dfee 100644
> > --- a/kernel/locking/semaphore.c
> > +++ b/kernel/locking/semaphore.c
> > @@ -32,6 +32,7 @@
> >  #include <linux/semaphore.h>
> >  #include <linux/spinlock.h>
> >  #include <linux/ftrace.h>
> > +#include <trace/events/lock.h>
> >
> >  static noinline void __down(struct semaphore *sem);
> >  static noinline int __down_interruptible(struct semaphore *sem);
> > @@ -209,6 +210,7 @@ static inline int __sched __down_common(struct semaphore *sem, long state,
> >                                                               long timeout)
> >  {
> >       struct semaphore_waiter waiter;
> > +     bool tracing = false;
> >
> >       list_add_tail(&waiter.list, &sem->wait_list);
> >       waiter.task = current;
> > @@ -220,18 +222,28 @@ static inline int __sched __down_common(struct semaphore *sem, long state,
> >               if (unlikely(timeout <= 0))
> >                       goto timed_out;
> >               __set_current_state(state);
> > +             if (!tracing) {
> > +                     trace_contention_begin(sem, 0);
> > +                     tracing = true;
> > +             }
> >               raw_spin_unlock_irq(&sem->lock);
> >               timeout = schedule_timeout(timeout);
> >               raw_spin_lock_irq(&sem->lock);
> > -             if (waiter.up)
> > +             if (waiter.up) {
> > +                     trace_contention_end(sem, 0);
> >                       return 0;
> > +             }
> >       }
> >
> >   timed_out:
> > +     if (tracing)
> > +             trace_contention_end(sem, -ETIME);
> >       list_del(&waiter.list);
> >       return -ETIME;
> >
> >   interrupted:
> > +     if (tracing)
> > +             trace_contention_end(sem, -EINTR);
> >       list_del(&waiter.list);
> >       return -EINTR;
> >  }
>
> why not simply remove the tracing variable and call trace_contention_begin()
> earlier, like in rwsem?  We can ignore it if ret != 0.

Right, will change.  But we should not ignore the return value.

Thanks,
Namhyung


* Re: [PATCH 2/2] locking: Apply contention tracepoints in the slow path
  2022-03-18 16:43     ` Peter Zijlstra
@ 2022-03-18 21:55       ` Namhyung Kim
  2022-03-18 22:07         ` Steven Rostedt
  0 siblings, 1 reply; 26+ messages in thread
From: Namhyung Kim @ 2022-03-18 21:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Boqun Feng, Ingo Molnar, Will Deacon, Waiman Long, LKML,
	Thomas Gleixner, Steven Rostedt, Byungchul Park,
	Paul E. McKenney, Mathieu Desnoyers, Arnd Bergmann,
	Radoslaw Burny, linux-arch, bpf

Hello,

On Fri, Mar 18, 2022 at 9:43 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Mar 18, 2022 at 08:55:32PM +0800, Boqun Feng wrote:
> > On Wed, Mar 16, 2022 at 03:45:48PM -0700, Namhyung Kim wrote:
> > [...]
> > > @@ -209,6 +210,7 @@ static inline int __sched __down_common(struct semaphore *sem, long state,
> > >                                                             long timeout)
> > >  {
> > >     struct semaphore_waiter waiter;
> > > +   bool tracing = false;
> > >
> > >     list_add_tail(&waiter.list, &sem->wait_list);
> > >     waiter.task = current;
> > > @@ -220,18 +222,28 @@ static inline int __sched __down_common(struct semaphore *sem, long state,
> > >             if (unlikely(timeout <= 0))
> > >                     goto timed_out;
> > >             __set_current_state(state);
> > > +           if (!tracing) {
> > > +                   trace_contention_begin(sem, 0);
> >
> > This looks a little ugly ;-/ Maybe we can rename __down_common() to
> > ___down_common() and implement __down_common() as:
> >
> >       static inline int __sched __down_common(...)
> >       {
> >               int ret;
> >               trace_contention_begin(sem, 0);
> >               ret = ___down_common(...);
> >               trace_contention_end(sem, ret);
> >               return ret;
> >       }
> >
> > Thoughts?
>
> Yeah, that works, except I think he wants a few extra
> __set_current_state()'s like so:

Not anymore, I decided not to because of noise in the task state.

Also I'm considering two tracepoints for the return path to reduce
the buffer size as Mathieu suggested.  Normally it'd return with 0
so we can ignore it in the contention_end.  For non-zero cases,
we can add a new tracepoint to save the return value.

Thanks,
Namhyung

>
> diff --git a/kernel/locking/semaphore.c b/kernel/locking/semaphore.c
> index 9ee381e4d2a4..e2049a7e0ea4 100644
> --- a/kernel/locking/semaphore.c
> +++ b/kernel/locking/semaphore.c
> @@ -205,8 +205,7 @@ struct semaphore_waiter {
>   * constant, and thus optimised away by the compiler.  Likewise the
>   * 'timeout' parameter for the cases without timeouts.
>   */
> -static inline int __sched __down_common(struct semaphore *sem, long state,
> -                                                               long timeout)
> +static __always_inline int ___down_common(struct semaphore *sem, long state, long timeout)
>  {
>         struct semaphore_waiter waiter;
>
> @@ -227,15 +226,28 @@ static inline int __sched __down_common(struct semaphore *sem, long state,
>                         return 0;
>         }
>
> - timed_out:
> +timed_out:
>         list_del(&waiter.list);
>         return -ETIME;
>
> - interrupted:
> +interrupted:
>         list_del(&waiter.list);
>         return -EINTR;
>  }
>
> +static __always_inline int __down_common(struct semaphore *sem, long state, long timeout)
> +{
> +       int ret;
> +
> +       __set_current_state(state);
> +       trace_contention_begin(sem, 0);
> +       ret = ___down_common(sem, state, timeout);
> +       __set_current_state(TASK_RUNNING);
> +       trace_contention_end(sem, ret);
> +
> +       return ret;
> +}
> +
>  static noinline void __sched __down(struct semaphore *sem)
>  {
>         __down_common(sem, TASK_UNINTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);


* Re: [PATCH 2/2] locking: Apply contention tracepoints in the slow path
  2022-03-18 21:55       ` Namhyung Kim
@ 2022-03-18 22:07         ` Steven Rostedt
  2022-03-19  0:11           ` Namhyung Kim
  0 siblings, 1 reply; 26+ messages in thread
From: Steven Rostedt @ 2022-03-18 22:07 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Boqun Feng, Ingo Molnar, Will Deacon,
	Waiman Long, LKML, Thomas Gleixner, Byungchul Park,
	Paul E. McKenney, Mathieu Desnoyers, Arnd Bergmann,
	Radoslaw Burny, linux-arch, bpf

On Fri, 18 Mar 2022 14:55:27 -0700
Namhyung Kim <namhyung@kernel.org> wrote:

> > > This looks a little ugly ;-/ Maybe we can rename __down_common() to
> > > ___down_common() and implement __down_common() as:
> > >
> > >       static inline int __sched __down_common(...)
> > >       {
> > >               int ret;
> > >               trace_contention_begin(sem, 0);
> > >               ret = ___down_common(...);
> > >               trace_contention_end(sem, ret);
> > >               return ret;
> > >       }
> > >
> > > Thoughts?  
> >
> > Yeah, that works, except I think he wants a few extra
> > __set_current_state()'s like so:  
> 
> Not anymore, I decided not to because of noise in the task state.
> 
> Also I'm considering two tracepoints for the return path to reduce
> the buffer size as Mathieu suggested.  Normally it'd return with 0
> so we can ignore it in the contention_end.  For non-zero cases,
> we can add a new tracepoint to save the return value.

I don't think you need two tracepoints, but one that you can override.

We have eprobes that let you create a trace event on top of a current trace
event that can limit or extend what is traced in the buffer.

And I also have custom events that can be placed on top of any existing
tracepoint that has full access to what is sent into the tracepoint, not
just what is available to the trace event:

  https://lore.kernel.org/all/20220312232551.181178712@goodmis.org/

-- Steve


* Re: [PATCH 2/2] locking: Apply contention tracepoints in the slow path
  2022-03-18 22:07         ` Steven Rostedt
@ 2022-03-19  0:11           ` Namhyung Kim
  2022-03-22  5:31             ` Namhyung Kim
  0 siblings, 1 reply; 26+ messages in thread
From: Namhyung Kim @ 2022-03-19  0:11 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Boqun Feng, Ingo Molnar, Will Deacon,
	Waiman Long, LKML, Thomas Gleixner, Byungchul Park,
	Paul E. McKenney, Mathieu Desnoyers, Arnd Bergmann,
	Radoslaw Burny, linux-arch, bpf

On Fri, Mar 18, 2022 at 3:07 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Fri, 18 Mar 2022 14:55:27 -0700
> Namhyung Kim <namhyung@kernel.org> wrote:
>
> > > > This looks a little ugly ;-/ Maybe we can rename __down_common() to
> > > > ___down_common() and implement __down_common() as:
> > > >
> > > >       static inline int __sched __down_common(...)
> > > >       {
> > > >               int ret;
> > > >               trace_contention_begin(sem, 0);
> > > >               ret = ___down_common(...);
> > > >               trace_contention_end(sem, ret);
> > > >               return ret;
> > > >       }
> > > >
> > > > Thoughts?
> > >
> > > Yeah, that works, except I think he wants a few extra
> > > __set_current_state()'s like so:
> >
> > Not anymore, I decided not to because of noise in the task state.
> >
> > Also I'm considering two tracepoints for the return path to reduce
> > the buffer size as Mathieu suggested.  Normally it'd return with 0
> > so we can ignore it in the contention_end.  For non-zero cases,
> > we can add a new tracepoint to save the return value.
>
> I don't think you need two tracepoints, but one that you can override.
>
> We have eprobes that let you create a trace event on top of a current trace
> event that can limit or extend what is traced in the buffer.
>
> And I also have custom events that can be placed on top of any existing
> tracepoint that has full access to what is sent into the tracepoint, not
> just what is available to the trace event:
>
>   https://lore.kernel.org/all/20220312232551.181178712@goodmis.org/

Thanks for the info.  But it's unclear to me whether it provides the
custom event under the same name or a different one.  Can I use both
the original and the custom events at the same time?

Thanks,
Namhyung


* Re: [PATCH 2/2] locking: Apply contention tracepoints in the slow path
  2022-03-19  0:11           ` Namhyung Kim
@ 2022-03-22  5:31             ` Namhyung Kim
  2022-03-22 12:59               ` Steven Rostedt
  0 siblings, 1 reply; 26+ messages in thread
From: Namhyung Kim @ 2022-03-22  5:31 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Boqun Feng, Ingo Molnar, Will Deacon,
	Waiman Long, LKML, Thomas Gleixner, Byungchul Park,
	Paul E. McKenney, Mathieu Desnoyers, Arnd Bergmann,
	Radoslaw Burny, linux-arch, bpf

On Fri, Mar 18, 2022 at 5:11 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Fri, Mar 18, 2022 at 3:07 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Fri, 18 Mar 2022 14:55:27 -0700
> > Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > > > > This looks a little ugly ;-/ Maybe we can rename __down_common() to
> > > > > ___down_common() and implement __down_common() as:
> > > > >
> > > > >       static inline int __sched __down_common(...)
> > > > >       {
> > > > >               int ret;
> > > > >               trace_contention_begin(sem, 0);
> > > > >               ret = ___down_common(...);
> > > > >               trace_contention_end(sem, ret);
> > > > >               return ret;
> > > > >       }
> > > > >
> > > > > Thoughts?
> > > >
> > > > Yeah, that works, except I think he wants a few extra
> > > > __set_current_state()'s like so:
> > >
> > > Not anymore, I decided not to because of noise in the task state.
> > >
> > > Also I'm considering two tracepoints for the return path to reduce
> > > the buffer size as Mathieu suggested.  Normally it'd return with 0
> > > so we can ignore it in the contention_end.  For non-zero cases,
> > > we can add a new tracepoint to save the return value.
> >
> > I don't think you need two tracepoints, but one that you can override.
> >
> > We have eprobes that let you create a trace event on top of a current trace
> > event that can limit or extend what is traced in the buffer.
> >
> > And I also have custom events that can be placed on top of any existing
> > tracepoint that has full access to what is sent into the tracepoint, not
> > just what is available to the trace event:
> >
> >   https://lore.kernel.org/all/20220312232551.181178712@goodmis.org/
>
> Thanks for the info.  But it's unclear to me whether it provides the
> custom event under the same name or a different one.  Can I use both
> the original and the custom events at the same time?

I've read the code and understood that it's a separate event that can
be used alongside the original.  Then I think we can leave the
tracepoint with the return value and let users customize it for their
needs later.

Thanks,
Namhyung


* Re: [PATCH 2/2] locking: Apply contention tracepoints in the slow path
  2022-03-22  5:31             ` Namhyung Kim
@ 2022-03-22 12:59               ` Steven Rostedt
  2022-03-22 16:39                 ` Namhyung Kim
  0 siblings, 1 reply; 26+ messages in thread
From: Steven Rostedt @ 2022-03-22 12:59 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Boqun Feng, Ingo Molnar, Will Deacon,
	Waiman Long, LKML, Thomas Gleixner, Byungchul Park,
	Paul E. McKenney, Mathieu Desnoyers, Arnd Bergmann,
	Radoslaw Burny, linux-arch, bpf

On Mon, 21 Mar 2022 22:31:30 -0700
Namhyung Kim <namhyung@kernel.org> wrote:

> > Thanks for the info.  But it's unclear to me whether it provides the
> > custom event under the same name or a different one.  Can I use both
> > the original and the custom events at the same time?

Sorry, missed your previous question.

> 
> I've read the code and understood that it's a separate event that can
> be used together.  Then I think we can leave the tracepoint with the
> return value and let users customize it for their needs later.

Right, thanks for looking deeper at it.

-- Steve


* Re: [PATCH 2/2] locking: Apply contention tracepoints in the slow path
  2022-03-22 12:59               ` Steven Rostedt
@ 2022-03-22 16:39                 ` Namhyung Kim
  0 siblings, 0 replies; 26+ messages in thread
From: Namhyung Kim @ 2022-03-22 16:39 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Boqun Feng, Ingo Molnar, Will Deacon,
	Waiman Long, LKML, Thomas Gleixner, Byungchul Park,
	Paul E. McKenney, Mathieu Desnoyers, Arnd Bergmann,
	Radoslaw Burny, linux-arch, bpf

On Tue, Mar 22, 2022 at 5:59 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Mon, 21 Mar 2022 22:31:30 -0700
> Namhyung Kim <namhyung@kernel.org> wrote:
>
> > > Thanks for the info.  But it's unclear to me whether it provides the
> > > custom event under the same name or a different one.  Can I use both
> > > the original and the custom events at the same time?
>
> Sorry, missed your previous question.

No problem!

>
> >
> > I've read the code and understood that it's a separate event that can
> > be used alongside the original.  Then I think we can leave the
> > tracepoint with the return value and let users customize it for their
> > needs later.
>
> Right, thanks for looking deeper at it.

And thanks for your review.  I'll post a v4 soon.

Thanks,
Namhyung


end of thread, other threads:[~2022-03-22 16:40 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-16 22:45 [PATCH 0/2] locking: Add new lock contention tracepoints (v3) Namhyung Kim
2022-03-16 22:45 ` [PATCH 1/2] locking: Add lock contention tracepoints Namhyung Kim
2022-03-17  2:31   ` Steven Rostedt
2022-03-17 13:32   ` Mathieu Desnoyers
2022-03-17 16:07     ` Steven Rostedt
2022-03-17 16:37       ` Mathieu Desnoyers
2022-03-18 20:58         ` Namhyung Kim
2022-03-16 22:45 ` [PATCH 2/2] locking: Apply contention tracepoints in the slow path Namhyung Kim
2022-03-17 13:45   ` Mathieu Desnoyers
2022-03-17 16:10     ` Steven Rostedt
2022-03-17 16:43       ` Mathieu Desnoyers
2022-03-18 21:34     ` Namhyung Kim
2022-03-17 18:19   ` Hyeonggon Yoo
2022-03-18 21:43     ` Namhyung Kim
2022-03-18 12:55   ` Boqun Feng
2022-03-18 13:24     ` Hyeonggon Yoo
2022-03-18 13:28       ` Hyeonggon Yoo
2022-03-18 16:43     ` Peter Zijlstra
2022-03-18 21:55       ` Namhyung Kim
2022-03-18 22:07         ` Steven Rostedt
2022-03-19  0:11           ` Namhyung Kim
2022-03-22  5:31             ` Namhyung Kim
2022-03-22 12:59               ` Steven Rostedt
2022-03-22 16:39                 ` Namhyung Kim
2022-03-17 17:32 ` [PATCH 0/2] locking: Add new lock contention tracepoints (v3) Hyeonggon Yoo
2022-03-18 21:12   ` Namhyung Kim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).