From: Namhyung Kim <namhyung@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@kernel.org>, Will Deacon <will@kernel.org>,
Waiman Long <longman@redhat.com>,
Boqun Feng <boqun.feng@gmail.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
Steven Rostedt <rostedt@goodmis.org>,
Byungchul Park <byungchul.park@lge.com>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Radoslaw Burny <rburny@google.com>, Tejun Heo <tj@kernel.org>,
rcu@vger.kernel.org, cgroups@vger.kernel.org,
linux-btrfs@vger.kernel.org, intel-gfx@lists.freedesktop.org,
paulmck@kernel.org
Subject: Re: [RFC 00/12] locking: Separate lock tracepoints from lockdep/lock_stat (v1)
Date: Tue, 8 Feb 2022 11:14:00 -0800 [thread overview]
Message-ID: <CAM9d7cjVgqefx28uvZAv4GLUO1u78QKqMwjRfuyLHYKWJkWF_Q@mail.gmail.com> (raw)
In-Reply-To: <20220208184208.79303-1-namhyung@kernel.org>
Oops, I used the wrong email address of Paul. Sorry about that!
I'll resend with a new address soon.
Thanks,
Namhyung
On Tue, Feb 8, 2022 at 10:42 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Hello,
>
> There have been some requests for low-overhead kernel lock contention
> monitoring. The kernel has CONFIG_LOCK_STAT to provide such an infra
> either via /proc/lock_stat or tracepoints directly.
>
> However it's not light-weight and hard to be used in production. So
> I'm trying to separate out the tracepoints and using them as a base to
> build a new monitoring system.
>
> As the lockdep and lock_stat provide good hooks in the lock functions,
> it'd be natural to reuse them. Actually I tried to use lockdep as is
> but disables the functionality at runtime (initialize debug_locks = 0,
> lock_stat = 0). But it still has unacceptable overhead and the
> lockdep data structures also increase memory footprint unnecessarily.
>
> So I'm proposing a separate tracepoint-only configuration and keeping
> lockdep_map only with minimal information needed for tracepoints (for
> now, it's just name). And then the existing lockdep hooks can be used
> for the tracepoints.
>
> The patch 01-06 are preparation for the work. In a few places in the
> kernel, they calls lockdep annotation explicitly to deal with
> limitations in the lockdep implementation. In my understanding, they
> are not needed to analyze lock contention.
>
> To make matters worse, they rely on the compiler optimization (or
> macro magic) so that it can get rid of the annotations and their
> arguments when lockdep is not configured.
>
> But it's not true any more when lock tracepoints are added and it'd
> cause build errors. So I added #ifdef guards for LOCKDEP in the code
> to prevent such errors.
>
> In the patch 07 I mechanically changed most of code that depend on
> CONFIG_LOCKDEP or CONFIG_DEBUG_LOCK_ALLOC to CONFIG_LOCK_INFO. It
> paves the way for the codes to be shared for lockdep and tracepoints.
> Mostly, it makes sure that locks are initialized with a proper name,
> like in the patch 08 and 09.
>
> I didn't change some places intentionally - for example, timer and
> workqueue depend on LOCKDEP explicitly since they use some lockdep
> annotations to work around the "held lock freed" warnings. The ocfs2
> directly accesses lockdep_map.key so I didn't touch the code for now.
> And RCU was because it generates too much overhead thanks to the
> rcu_read_lock(). Maybe I need to revisit some of them later.
>
> I added CONFIG_LOCK_TRACEPOINTS in the patch 10 to make it optional.
> I found that it adds 2~3% of overhead when I ran `perf bench sched
> messaging` even when the tracepoints are disabled. The benchmark
> creates a lot of processes and make them communicate by socket pairs
> (or pipes). I measured that around 15% of lock acquisition creates
> contentions and it's mostly for spin locks (alc->lock and u->lock).
>
> I ran perf record + report with the workload and it showed 50% of the
> cpu cycles are in the spin lock slow path. So it's highly affected by
> the spin lock slow path. Actually LOCK_CONTENDED() macro transforms
> the spin lock code (and others) to use trylock first and then fall
> back to real lock function if failed. Thus it'd add more (atomic)
> operations and cache line bouncing for the contended cases:
>
> #define LOCK_CONTENDED(_lock, try, lock) \
> do { \
> if (!try(_lock)) { \
> lock_contended(&(_lock)->dep_map, _RET_IP_); \
> lock(_lock); \
> } \
> lock_acquired(&(_lock)->dep_map, _RET_IP_); \
> } while (0)
>
> If I modify the macro not to use trylock and to call the real lock
> function directly (so the lock_contended tracepoint would be called
> always, if enabled), the overhead goes down to 0.x% when the
> tracepoints are disabled.
>
> I don't have a good solution as long as we use LOCK_CONTENDED() macro
> to separate the contended locking path. Maybe we can make it call
> some (generic) slow path lock function directly after failing trylock.
> Or move the lockdep annotations into the each lock function bodies and
> get rid of the LOCK_CONTENDED() macro entirely. Ideas?
>
> Actually the patch 11 handles the same issue on the mutex code. The
> fast version of mutex trylock was attempted only if LOCKDEP is not
> enabled and it affects the mutex lock performance in the uncontended
> cases too. So I partially reverted the change in the patch 07 to use
> the fast functions with lock tracepoints too. Maybe we can use it
> with LOCKDEP as well?
>
> The last patch 12 might be controversial and I'd like to move the
> lock_acquired annotation into the if(!try) block in the LOCK_CONTEDED
> macro so that it can only be called when there's a contention.
>
> Eventually I'm mostly interested in the contended locks only and I
> want to reduce the overhead in the fast path. By moving that, it'd be
> easy to track contended locks with timing by using two tracepoints.
>
> It'd change lock hold time calculation in lock_stat for the fast path,
> but I assume time difference between lock_acquire and lock_acquired
> would be small when the lock is not contended. So I think we can use
> the timestamp in lock_acquire. If it's not acceptable, we might
> consider adding a new tracepoint to track the timing of contended
> locks.
>
> This series base on the current tip/locking/core and you get it from
> 'locking/tracepoint-v1' branch in my tree at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
>
>
> Thanks,
> Namhyung
>
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Byungchul Park <byungchul.park@lge.com>
> Cc: rcu@vger.kernel.org
> Cc: cgroups@vger.kernel.org
> Cc: linux-btrfs@vger.kernel.org
> Cc: intel-gfx@lists.freedesktop.org
>
>
> Namhyung Kim (12):
> locking: Pass correct outer wait type info
> cgroup: rstat: Make cgroup_rstat_cpu_lock name readable
> timer: Protect lockdep functions with #ifdef
> workqueue: Protect lockdep functions with #ifdef
> drm/i915: Protect lockdep functions with #ifdef
> btrfs: change lockdep class size check using ks->names
> locking: Introduce CONFIG_LOCK_INFO
> locking/mutex: Init name properly w/ CONFIG_LOCK_INFO
> locking: Add more static lockdep init macros
> locking: Add CONFIG_LOCK_TRACEPOINTS option
> locking/mutex: Revive fast functions for LOCK_TRACEPOINTS
> locking: Move lock_acquired() from the fast path
>
> drivers/gpu/drm/drm_connector.c | 7 +-
> drivers/gpu/drm/i915/i915_sw_fence.h | 2 +-
> drivers/gpu/drm/i915/intel_wakeref.c | 3 +
> drivers/gpu/drm/i915/selftests/lib_sw_fence.h | 2 +-
> .../net/wireless/intel/iwlwifi/iwl-trans.c | 4 +-
> .../net/wireless/intel/iwlwifi/iwl-trans.h | 2 +-
> drivers/tty/tty_ldsem.c | 2 +-
> fs/btrfs/disk-io.c | 4 +-
> fs/btrfs/disk-io.h | 2 +-
> fs/cifs/connect.c | 2 +-
> fs/kernfs/file.c | 2 +-
> include/linux/completion.h | 2 +-
> include/linux/jbd2.h | 2 +-
> include/linux/kernfs.h | 2 +-
> include/linux/kthread.h | 2 +-
> include/linux/local_lock_internal.h | 18 +-
> include/linux/lockdep.h | 170 ++++++++++++++++--
> include/linux/lockdep_types.h | 8 +-
> include/linux/mmu_notifier.h | 2 +-
> include/linux/mutex.h | 12 +-
> include/linux/percpu-rwsem.h | 4 +-
> include/linux/regmap.h | 4 +-
> include/linux/rtmutex.h | 14 +-
> include/linux/rwlock_api_smp.h | 4 +-
> include/linux/rwlock_rt.h | 4 +-
> include/linux/rwlock_types.h | 11 +-
> include/linux/rwsem.h | 14 +-
> include/linux/seqlock.h | 4 +-
> include/linux/spinlock_api_smp.h | 4 +-
> include/linux/spinlock_rt.h | 4 +-
> include/linux/spinlock_types.h | 4 +-
> include/linux/spinlock_types_raw.h | 28 ++-
> include/linux/swait.h | 2 +-
> include/linux/tty_ldisc.h | 2 +-
> include/linux/wait.h | 2 +-
> include/linux/ww_mutex.h | 6 +-
> include/media/v4l2-ctrls.h | 2 +-
> include/net/sock.h | 2 +-
> include/trace/events/lock.h | 4 +-
> kernel/cgroup/rstat.c | 7 +-
> kernel/locking/Makefile | 1 +
> kernel/locking/lockdep.c | 40 ++++-
> kernel/locking/mutex-debug.c | 2 +-
> kernel/locking/mutex.c | 22 ++-
> kernel/locking/mutex.h | 7 +
> kernel/locking/percpu-rwsem.c | 2 +-
> kernel/locking/rtmutex_api.c | 10 +-
> kernel/locking/rwsem.c | 4 +-
> kernel/locking/spinlock.c | 2 +-
> kernel/locking/spinlock_debug.c | 4 +-
> kernel/locking/spinlock_rt.c | 8 +-
> kernel/locking/ww_rt_mutex.c | 2 +-
> kernel/printk/printk.c | 14 +-
> kernel/rcu/update.c | 27 +--
> kernel/time/timer.c | 8 +-
> kernel/workqueue.c | 13 ++
> lib/Kconfig.debug | 14 ++
> mm/memcontrol.c | 7 +-
> mm/mmu_notifier.c | 2 +-
> net/core/dev.c | 2 +-
> net/sunrpc/svcsock.c | 2 +-
> net/sunrpc/xprtsock.c | 2 +-
> 62 files changed, 391 insertions(+), 180 deletions(-)
>
>
> base-commit: 1dc01abad6544cb9d884071b626b706e37aa9601
> --
> 2.35.0.263.gb82422642f-goog
>
next prev parent reply other threads:[~2022-02-08 19:14 UTC|newest]
Thread overview: 53+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-02-08 18:41 [RFC 00/12] locking: Separate lock tracepoints from lockdep/lock_stat (v1) Namhyung Kim
2022-02-08 18:41 ` [PATCH 01/12] locking: Pass correct outer wait type info Namhyung Kim
2022-02-08 18:41 ` [PATCH 02/12] cgroup: rstat: Make cgroup_rstat_cpu_lock name readable Namhyung Kim
2022-02-08 18:46 ` Tejun Heo
2022-02-08 19:16 ` Namhyung Kim
2022-02-08 23:51 ` Namhyung Kim
2022-02-08 18:41 ` [PATCH 03/12] timer: Protect lockdep functions with #ifdef Namhyung Kim
2022-02-08 19:36 ` Steven Rostedt
2022-02-08 20:29 ` Namhyung Kim
2022-02-08 21:19 ` Steven Rostedt
2022-02-08 18:42 ` [PATCH 04/12] workqueue: " Namhyung Kim
2022-02-08 18:48 ` Tejun Heo
2022-02-08 19:17 ` Namhyung Kim
2022-02-08 19:38 ` Steven Rostedt
2022-02-08 18:42 ` [PATCH 05/12] drm/i915: " Namhyung Kim
2022-02-08 18:51 ` Jani Nikula
2022-02-08 19:22 ` Namhyung Kim
2022-02-09 13:49 ` Jani Nikula
2022-02-09 16:27 ` Steven Rostedt
2022-02-09 19:28 ` Namhyung Kim
2022-02-08 18:42 ` [PATCH 06/12] btrfs: change lockdep class size check using ks->names Namhyung Kim
2022-02-08 19:03 ` David Sterba
2022-02-08 18:42 ` [PATCH 07/12] locking: Introduce CONFIG_LOCK_INFO Namhyung Kim
2022-02-08 18:42 ` [PATCH 08/12] locking/mutex: Init name properly w/ CONFIG_LOCK_INFO Namhyung Kim
2022-02-08 18:42 ` [PATCH 09/12] locking: Add more static lockdep init macros Namhyung Kim
2022-02-08 18:42 ` [PATCH 10/12] locking: Add CONFIG_LOCK_TRACEPOINTS option Namhyung Kim
2022-02-08 18:42 ` [PATCH 11/12] locking/mutex: Revive fast functions for CONFIG_LOCK_TRACEPOINTS Namhyung Kim
2022-02-09 8:40 ` Peter Zijlstra
2022-02-09 20:15 ` Namhyung Kim
2022-02-08 18:42 ` [PATCH 12/12] locking: Move lock_acquired() from the fast path Namhyung Kim
2022-02-08 19:14 ` Namhyung Kim [this message]
2022-02-09 9:09 ` [RFC 00/12] locking: Separate lock tracepoints from lockdep/lock_stat (v1) Peter Zijlstra
2022-02-09 18:19 ` Waiman Long
2022-02-09 18:29 ` Mathieu Desnoyers
2022-02-09 19:02 ` Waiman Long
2022-02-09 19:17 ` Mathieu Desnoyers
2022-02-09 19:37 ` Waiman Long
2022-02-09 19:22 ` Namhyung Kim
2022-02-09 19:28 ` Mathieu Desnoyers
2022-02-09 19:45 ` Namhyung Kim
2022-02-09 19:56 ` Mathieu Desnoyers
2022-02-09 20:17 ` Waiman Long
2022-02-10 0:27 ` Namhyung Kim
2022-02-10 2:12 ` Waiman Long
2022-02-10 9:33 ` Peter Zijlstra
2022-02-10 0:32 ` Namhyung Kim
2022-02-10 9:13 ` Peter Zijlstra
2022-02-10 19:14 ` Paul E. McKenney
2022-02-10 19:27 ` Waiman Long
2022-02-10 20:10 ` Paul E. McKenney
2022-02-11 5:57 ` Namhyung Kim
2022-02-11 5:55 ` Namhyung Kim
2022-02-11 10:39 ` Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAM9d7cjVgqefx28uvZAv4GLUO1u78QKqMwjRfuyLHYKWJkWF_Q@mail.gmail.com \
--to=namhyung@kernel.org \
--cc=boqun.feng@gmail.com \
--cc=byungchul.park@lge.com \
--cc=cgroups@vger.kernel.org \
--cc=intel-gfx@lists.freedesktop.org \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=longman@redhat.com \
--cc=mathieu.desnoyers@efficios.com \
--cc=mingo@kernel.org \
--cc=paulmck@kernel.org \
--cc=peterz@infradead.org \
--cc=rburny@google.com \
--cc=rcu@vger.kernel.org \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
--cc=tj@kernel.org \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).