* [PATCH v6 0/3] perf/core: addressing 4x slowdown during per-process profiling of STREAM benchmark on Intel Xeon Phi
@ 2017-08-02  8:11 Alexey Budankov
From: Alexey Budankov @ 2017-08-02  8:11 UTC
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin
  Cc: Andi Kleen, Kan Liang, Dmitri Prokhorov, Valery Cherepennikov,
	Mark Rutland, Stephane Eranian, David Carrillo-Cisneros,
	linux-kernel

Hi,

By default, the userspace perf tool opens per-cpu task-bound events
when sampling, so for N logical events requested by the user, the tool
will open N * NR_CPUS events.

In the kernel, we mux events with a hrtimer, periodically rotating the
flexible group list and trying to schedule each group in turn. We skip 
groups whose cpu filter doesn't match. So when we get unlucky, we can 
walk N * (NR_CPUS - 1) groups pointlessly for each hrtimer invocation.
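
To put a number on the worst case above, here is a toy user-space model
(not kernel code). It assumes every logical event gets exactly one group per
CPU and that one hrtimer tick walks the whole flexible list; the function
name and the sample value N = 4 are made up for illustration.

```python
# Toy model, not kernel code: counts the groups skipped on one mux tick.
def wasted_visits(n_events, nr_cpus, current_cpu):
    # Assumption: the tool opens one group per (logical event, CPU) pair.
    flexible_groups = [(event, cpu)
                       for event in range(n_events)
                       for cpu in range(nr_cpus)]
    # Walk the whole list; a group whose CPU filter does not match
    # the current CPU is visited and then skipped.
    return sum(1 for _event, cpu in flexible_groups if cpu != current_cpu)


# N = 4 events on a 272-CPU system: 4 * (272 - 1) skipped groups.
print(wasted_visits(n_events=4, nr_cpus=272, current_cpu=0))  # 1084
```

Only N of the N * NR_CPUS groups can ever be scheduled on the current CPU,
so in this model the rest of the walk is pure overhead.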

This has been observed to result in significant overhead when running
the STREAM benchmark on 272-core Xeon Phi systems.

One way to avoid this is to place our events into an rb tree sorted by
CPU, so that our hrtimer can skip to the current CPU's list and ignore
everything else. 
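
As a rough illustration of the lookup idea, the sketch below keeps groups
sorted by CPU and finds the current CPU's run with a binary search. It is a
user-space model only: a sorted Python list stands in for the rb tree, and
the class and method names are hypothetical, not taken from the patches.

```python
import bisect


class GroupIndex:
    """Groups kept sorted by CPU; a sorted list stands in for the rb tree."""

    def __init__(self):
        self._cpus = []    # CPU keys, kept sorted
        self._groups = []  # group labels, parallel to self._cpus

    def insert(self, cpu, group):
        # Insert after any existing entries for the same CPU, so groups
        # of one CPU keep their insertion order.
        pos = bisect.bisect_right(self._cpus, cpu)
        self._cpus.insert(pos, cpu)
        self._groups.insert(pos, group)

    def groups_for(self, cpu):
        # Binary search to the contiguous run of entries for this CPU;
        # groups bound to other CPUs are never visited.
        lo = bisect.bisect_left(self._cpus, cpu)
        hi = bisect.bisect_right(self._cpus, cpu)
        return self._groups[lo:hi]


index = GroupIndex()
for event in range(3):
    for cpu in range(4):
        index.insert(cpu, f"ev{event}@cpu{cpu}")

print(index.groups_for(2))  # ['ev0@cpu2', 'ev1@cpu2', 'ev2@cpu2']
```

Inserting into a Python list is linear, so this only models the lookup side;
a real rb tree keeps insertion logarithmic as well.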

This patch set moves event groups into rb trees and implements 
skipping to the current CPU's list on hrtimer interrupt.

The patch set was tested on Xeon Phi using perf_fuzzer and the tests
from https://github.com/deater/perf_event_tests

The patches are meant to be applied in the order listed below. They are
split into three logical parts to simplify the review process.

Thanks,
Alexey

---
 Alexey Budankov (3):
	perf/core: use rb trees for pinned/flexible groups
	perf/core: use context tstamp_data for skipped events on mux interrupt
	perf/core: add mux switch to skip to the current CPU's events list on mux interrupt

 include/linux/perf_event.h |  54 +++--
 kernel/events/core.c       | 584 +++++++++++++++++++++++++++++++++------------
 2 files changed, 473 insertions(+), 165 deletions(-)

Thread overview: 76+ messages
2017-08-02  8:11 [PATCH v6 0/3] perf/core: addressing 4x slowdown during per-process profiling of STREAM benchmark on Intel Xeon Phi Alexey Budankov
2017-08-02  8:13 ` [PATCH v6 1/3] perf/core: use rb trees for pinned/flexible groups Alexey Budankov
2017-08-03 13:00   ` Peter Zijlstra
2017-08-03 20:30     ` Alexey Budankov
2017-08-04 14:36       ` Peter Zijlstra
2017-08-07  7:17         ` Alexey Budankov
2017-08-07  8:39           ` Peter Zijlstra
2017-08-07  9:13             ` Peter Zijlstra
2017-08-07 15:32               ` Alexey Budankov
2017-08-07 15:55                 ` Peter Zijlstra
2017-08-07 16:27                   ` Alexey Budankov
2017-08-07 16:57                     ` Peter Zijlstra
2017-08-07 17:39                       ` Andi Kleen
2017-08-07 18:12                         ` Peter Zijlstra
2017-08-07 18:13                       ` Alexey Budankov
2017-08-15 17:28           ` Alexey Budankov
2017-08-23 13:39             ` Alexander Shishkin
2017-08-23 14:18               ` Alexey Budankov
2017-08-29 13:51             ` Alexander Shishkin
2017-08-30  8:30               ` Alexey Budankov
2017-08-30 10:18                 ` Alexander Shishkin
2017-08-30 10:30                   ` Alexey Budankov
2017-08-30 11:13                     ` Alexander Shishkin
2017-08-30 11:16                 ` Alexey Budankov
2017-08-31 10:12                   ` Alexey Budankov
2017-08-31 10:12             ` Alexey Budankov
2017-08-04 14:53       ` Peter Zijlstra
2017-08-07 15:22         ` Alexey Budankov
2017-08-02  8:15 ` [PATCH v6 2/3]: perf/core: use context tstamp_data for skipped events on mux interrupt Alexey Budankov
2017-08-03 13:04   ` Peter Zijlstra
2017-08-03 14:00   ` Peter Zijlstra
2017-08-03 15:58     ` Alexey Budankov
2017-08-04 12:36       ` Peter Zijlstra
2017-08-03 15:00   ` Peter Zijlstra
2017-08-03 18:47     ` Alexey Budankov
2017-08-04 12:35       ` Peter Zijlstra
2017-08-04 12:51         ` Peter Zijlstra
2017-08-04 14:25           ` Alexey Budankov
2017-08-04 14:23         ` Alexey Budankov
2017-08-10 15:57     ` Alexey Budankov
2017-08-22 20:47       ` Peter Zijlstra
2017-08-23  8:54         ` Alexey Budankov
2017-08-31 17:18           ` [RFC][PATCH] perf: Rewrite enabled/running timekeeping Peter Zijlstra
2017-08-31 19:51             ` Stephane Eranian
2017-09-05  7:51               ` Stephane Eranian
2017-09-05  9:44                 ` Peter Zijlstra
2017-09-01 10:45             ` Alexey Budankov
2017-09-01 12:31               ` Peter Zijlstra
2017-09-01 11:17             ` Alexey Budankov
2017-09-01 12:42               ` Peter Zijlstra
2017-09-01 21:03             ` Vince Weaver
2017-09-04 10:46             ` Alexey Budankov
2017-09-04 12:08               ` Peter Zijlstra
2017-09-04 14:56                 ` Alexey Budankov
2017-09-04 15:41                   ` Peter Zijlstra
2017-09-04 15:58                     ` Peter Zijlstra
2017-09-05 10:17                     ` Alexey Budankov
2017-09-05 11:19                       ` Peter Zijlstra
2017-09-11  6:55                         ` Alexey Budankov
2017-09-05 12:06                       ` Alexey Budankov
2017-09-05 12:59                         ` Peter Zijlstra
2017-09-05 16:03                         ` Peter Zijlstra
2017-09-06 13:48                           ` Alexey Budankov
2017-09-08  8:47                           ` Alexey Budankov
2018-03-12 17:43                             ` [tip:perf/core] perf/cor: Use RB trees for pinned/flexible groups tip-bot for Alexey Budankov
2017-08-02  8:16 ` [PATCH v6 3/3]: perf/core: add mux switch to skip to the current CPU's events list on mux interrupt Alexey Budankov
2017-08-18  5:17 ` [PATCH v7 0/2] perf/core: addressing 4x slowdown during per-process profiling of STREAM benchmark on Intel Xeon Phi Alexey Budankov
2017-08-18  5:21   ` [PATCH v7 1/2] perf/core: use rb trees for pinned/flexible groups Alexey Budankov
2017-08-23 11:17     ` Alexander Shishkin
2017-08-23 17:23       ` Alexey Budankov
2017-08-18  5:22   ` [PATCH v7 2/2] perf/core: add mux switch to skip to the current CPU's events list on mux interrupt Alexey Budankov
2017-08-23 11:54     ` Alexander Shishkin
2017-08-23 18:12       ` Alexey Budankov
2017-08-22 20:21   ` [PATCH v7 0/2] perf/core: addressing 4x slowdown during per-process profiling of STREAM benchmark on Intel Xeon Phi Peter Zijlstra
2017-08-23  8:54     ` Alexey Budankov
2017-08-31 10:12     ` Alexey Budankov
