From: Marco Elver <elver@google.com>
To: elver@google.com, Peter Zijlstra <peterz@infradead.org>,
Frederic Weisbecker <frederic@kernel.org>,
Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Jiri Olsa <jolsa@redhat.com>, Namhyung Kim <namhyung@kernel.org>,
Dmitry Vyukov <dvyukov@google.com>,
linux-perf-users@vger.kernel.org, x86@kernel.org,
linux-sh@vger.kernel.org, kasan-dev@googlegroups.com,
linux-kernel@vger.kernel.org
Subject: [PATCH 0/8] perf/hw_breakpoint: Optimize for thousands of tasks
Date: Thu, 9 Jun 2022 13:30:38 +0200
Message-ID: <20220609113046.780504-1-elver@google.com>
The hw_breakpoint subsystem's code has seen little change in over 10
years. In that time, systems with hundreds of CPUs have become common,
and the perf subsystem has improved: using breakpoints on thousands of
concurrent tasks should be a supported use case.
The breakpoint constraints accounting algorithm is the major bottleneck
in doing so:
1. task_bp_pinned() has been O(#tasks), and called twice for each CPU.
2. Everything is serialized on a global mutex, 'nr_bp_mutex'.
This series first optimizes task_bp_pinned() to take only O(1) on
average, and then reworks synchronization to allow concurrency when
checking and updating breakpoint constraints for tasks. Along the way,
smaller micro-optimizations and cleanups are included because they
seemed obvious when reading the code, although their effect is likely
insignificant.
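To illustrate the difference between an O(#tasks) scan and an O(1)
average lookup, here is a minimal userspace sketch. It is not the code
from this series: struct bp, bp_add(), pinned_linear(), pinned_hashed()
and the NBUCKETS table size are all hypothetical names chosen for the
example, and a simple chained hash table keyed by the task pointer
stands in for whatever data structure the kernel patches actually use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 1024u /* hypothetical table size; must be a power of two */

/* Hypothetical stand-in for a task-targeted breakpoint. */
struct bp {
	const void *task;     /* owning task, used as the lookup key */
	int weight;           /* number of slots this breakpoint consumes */
	struct bp *next_all;  /* link in the single global list */
	struct bp *next_hash; /* link in this task's hash bucket */
};

static struct bp *all_bps;           /* every breakpoint, in one list */
static struct bp *buckets[NBUCKETS]; /* the same breakpoints, hashed by task */

static size_t hash_task(const void *task)
{
	uintptr_t p = (uintptr_t)task;

	/* Cheap pointer mix; any reasonable hash would do here. */
	return (size_t)((p >> 4) ^ (p >> 14)) & (NBUCKETS - 1);
}

/* Register a breakpoint in both structures. */
void bp_add(struct bp *bp)
{
	size_t b;

	assert(bp != NULL && bp->task != NULL);
	b = hash_task(bp->task);

	bp->next_all = all_bps;
	all_bps = bp;

	bp->next_hash = buckets[b];
	buckets[b] = bp;
}

/* Old shape: walk every breakpoint of every task to count one task's slots. */
int pinned_linear(const void *task)
{
	int count = 0;

	for (const struct bp *it = all_bps; it; it = it->next_all)
		if (it->task == task)
			count += it->weight;
	return count;
}

/* New shape: walk only the bucket the task hashes to (O(1) on average). */
int pinned_hashed(const void *task)
{
	int count = 0;

	for (const struct bp *it = buckets[hash_task(task)]; it; it = it->next_hash)
		if (it->task == task) /* skip colliding tasks in the same bucket */
			count += it->weight;
	return count;
}
```

Both functions return the same count; only the amount of data walked
differs. With thousands of tasks, the linear version touches every
breakpoint in the system on each call, while the hashed version touches
only the entries that share a bucket with the queried task.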
The result is (on a system with 256 CPUs) that we go from:
| $> perf bench -r 30 breakpoint thread -b 4 -p 64 -t 64
[ ^ more aggressive benchmark parameters took too long ]
| # Running 'breakpoint/thread' benchmark:
| # Created/joined 30 threads with 4 breakpoints and 64 parallelism
| Total time: 236.418 [sec]
|
| 123134.794271 usecs/op
| 7880626.833333 usecs/op/cpu
... to -- with all optimizations:
| $> perf bench -r 30 breakpoint thread -b 4 -p 64 -t 64
| # Running 'breakpoint/thread' benchmark:
| # Created/joined 30 threads with 4 breakpoints and 64 parallelism
| Total time: 0.071 [sec]
|
| 37.134896 usecs/op
| 2376.633333 usecs/op/cpu
On the test system used, that's an effective speedup of ~3315x per op,
which is close to the theoretical ideal achievable through
optimizations in hw_breakpoint.c. For reference, with constraints
accounting disabled:
| $> perf bench -r 30 breakpoint thread -b 4 -p 64 -t 64
| # Running 'breakpoint/thread' benchmark:
| # Created/joined 30 threads with 4 breakpoints and 64 parallelism
| Total time: 0.067 [sec]
|
| 35.286458 usecs/op
| 2258.333333 usecs/op/cpu
At this point, the current implementation is only ~5% slower than the
theoretical ideal. However, given that constraints accounting cannot
realistically be disabled, this is likely as far as we can push it.
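For anyone who wants to re-derive the ratios from the usecs/op figures
quoted above, the arithmetic can be sketched as below. The helper names
speedup() and overhead_percent() are made up for this example and are
not part of perf or the kernel.

```c
#include <assert.h>

/* How many times faster "after" is than "before", per operation. */
double speedup(double before_usecs, double after_usecs)
{
	assert(after_usecs > 0.0);
	return before_usecs / after_usecs;
}

/* How much slower "measured" is than "ideal", in percent. */
double overhead_percent(double measured_usecs, double ideal_usecs)
{
	assert(ideal_usecs > 0.0);
	return (measured_usecs / ideal_usecs - 1.0) * 100.0;
}
```

With the numbers from the benchmark runs, speedup(123134.794271,
37.134896) comes out at roughly 3316, and overhead_percent(37.134896,
35.286458) at roughly 5.2, which matches the ~3315x and ~5% figures
given in the text.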
Marco Elver (8):
perf/hw_breakpoint: Optimize list of per-task breakpoints
perf/hw_breakpoint: Mark data __ro_after_init
perf/hw_breakpoint: Optimize constant number of breakpoint slots
perf/hw_breakpoint: Make hw_breakpoint_weight() inlinable
perf/hw_breakpoint: Remove useless code related to flexible
breakpoints
perf/hw_breakpoint: Reduce contention with large number of tasks
perf/hw_breakpoint: Optimize task_bp_pinned() if CPU-independent
perf/hw_breakpoint: Clean up headers
arch/sh/include/asm/hw_breakpoint.h | 5 +-
arch/x86/include/asm/hw_breakpoint.h | 5 +-
include/linux/hw_breakpoint.h | 1 -
include/linux/perf_event.h | 3 +-
kernel/events/hw_breakpoint.c | 374 +++++++++++++++++++--------
5 files changed, 276 insertions(+), 112 deletions(-)
--
2.36.1.255.ge46751e96f-goog