From: Andi Kleen <andi@firstfloor.org>
To: acme@kernel.org
Cc: jolsa@kernel.org, linux-kernel@vger.kernel.org
Subject: Optimize perf stat for large number of events/cpus
Date: Fri, 15 Nov 2019 21:52:17 -0800
Message-ID: <20191116055229.62002-1-andi@firstfloor.org>

[v7: Address review feedback. Fix python script problem
reported by 0day. Drop merged patches.]

This patch kit optimizes perf stat for a large number of events 
on systems with many CPUs and PMUs.

Profiling shows that most of the overhead comes from the IPIs sent to
all the target CPUs. We can optimize this by using sched_setaffinity
to set the affinity to a target CPU once and then doing
the perf operation for all events on that CPU. This requires
some restructuring, but cuts the setup time quite a bit.
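
A minimal sketch of the batching idea, not the code from the patches:
the loop is CPU-major, so the thread migrates once per CPU and every
event operation then runs locally. The names for_each_cpu_batched,
event_op_fn and count_op are hypothetical and only serve the
illustration. If pinning fails, the operations are still issued, just
without the locality benefit.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stddef.h>

/* Hypothetical callback: one perf operation (open, read, enable,
 * close, ...) for event index `ev` on CPU `cpu`. */
typedef void (*event_op_fn)(int ev, int cpu, void *ctx);

/* Example callback for the sketch: just counts how often it is called. */
void count_op(int ev, int cpu, void *ctx)
{
    (void)ev;
    (void)cpu;
    ++*(int *)ctx;
}

/*
 * Run `op` for every event on every CPU in `cpus`, trying to pin the
 * calling thread to each CPU once before touching its events.
 * Returns the number of CPUs the thread was successfully pinned to.
 */
int for_each_cpu_batched(const int *cpus, size_t ncpus, int nevents,
                         event_op_fn op, void *ctx)
{
    cpu_set_t saved, target;
    int pinned = 0;
    /* Remember the original affinity so it can be restored afterwards. */
    int have_saved = sched_getaffinity(0, sizeof(saved), &saved) == 0;

    for (size_t i = 0; i < ncpus; i++) {
        CPU_ZERO(&target);
        CPU_SET(cpus[i], &target);
        /* One migration per CPU instead of one remote call per event. */
        if (sched_setaffinity(0, sizeof(target), &target) == 0)
            pinned++;

        /* Pinned or not, the operations are still carried out. */
        for (int ev = 0; ev < nevents; ev++)
            op(ev, cpus[i], ctx);
    }

    if (have_saved)
        sched_setaffinity(0, sizeof(saved), &saved);
    return pinned;
}
```

A caller would pass the sorted CPU list and a callback that performs
the actual syscall or ioctl on the file descriptor for (event, CPU).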

In theory we could go further by parallelizing these setups
too, but that would be much more complicated, and for now batching
per CPU seems to be sufficient. At some point, with many more cores,
parallelization or a better bulk perf setup API might be needed though.

In addition, perf does a lot of redundant /sys accesses with
many PMUs, which can also be expensive. This is optimized as well.
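
One way to avoid repeated /sys lookups is to remember the result of an
existence check per path. The following is only a sketch under that
assumption and not the implementation in the patch kit.
cached_file_exists, path_cache_syscalls and the table sizes are
hypothetical names and values chosen for the example.

```c
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

#define CACHE_SLOTS 64   /* arbitrary size for this sketch */
#define PATH_LEN    256  /* longer paths are simply not cached */

struct path_cache_entry {
    char path[PATH_LEN];
    bool used;
    bool exists;
};

static struct path_cache_entry path_cache[CACHE_SLOTS];

/* Counts real access() calls, to make the saving visible. */
unsigned long path_cache_syscalls;

/* Simple string hash (djb2 style) to pick a slot. */
static unsigned int hash_path(const char *s)
{
    unsigned int h = 5381;

    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/*
 * Return whether `path` exists, asking the kernel only the first time
 * a given path is seen. Direct-mapped: a colliding path evicts the
 * previous entry in its slot.
 */
bool cached_file_exists(const char *path)
{
    struct path_cache_entry *e = &path_cache[hash_path(path) % CACHE_SLOTS];
    bool exists;

    if (e->used && strcmp(e->path, path) == 0)
        return e->exists;

    exists = access(path, F_OK) == 0;
    path_cache_syscalls++;

    if (strlen(path) < PATH_LEN) {
        strcpy(e->path, path);
        e->exists = exists;
        e->used = true;
    }
    return exists;
}
```

The trade-off is that cached answers can go stale if files appear or
disappear, which is acceptable only when the directory is not expected
to change during the run.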

On a large test case (>700 events with many weak groups) on a 94 CPU
system I go from

real	0m8.607s
user	0m0.550s
sys	0m8.041s

to 

real	0m3.269s
user	0m0.760s
sys	0m1.694s

so shaving ~6 seconds of system time, at slightly more cost
in perf stat itself. On a 4 socket system the savings
are more dramatic:

real	0m15.641s
user	0m0.873s
sys	0m14.729s

to 

real	0m4.493s
user	0m1.578s
sys	0m2.444s

so an 11s difference in the user visible setup time.

Also available in 

git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/stat-scale-10

v1: Initial post.
v2: Rebase. Fix some minor issues.
v3: Rebase. Address review feedback. Fix one minor issue
v4: Modified based on review feedback. Now it maintains
all_cpus per evlist. There is still a need for cpu_index iteration
to get the correct index for indexing the file descriptors.
Fix bug with unsorted cpu maps, now they are always sorted.
Some cleanups and refactoring.
v5: Split patches. Redo loop iteration again. Fix cpu map
merging for uncore. Remove duplicates from cpumaps. Add unit
tests.
v6: Address review feedback. Fix some bugs. Add more comments.
Merge one invalid patch split.
v7: Address review feedback. Fix python scripting (thanks 0day)
Minor updates.

-Andi



Thread overview: 28+ messages
2019-11-16  5:52 Andi Kleen [this message]
2019-11-16  5:52 ` [PATCH v7 01/12] perf pmu: Use file system cache to optimize sysfs access Andi Kleen
2019-11-16  5:52 ` [PATCH v7 02/12] perf affinity: Add infrastructure to save/restore affinity Andi Kleen
2019-11-16  5:52 ` [PATCH v7 03/12] perf cpumap: Maintain cpumaps ordered and without dups Andi Kleen
2019-11-16  5:52 ` [PATCH v7 04/12] perf evlist: Maintain evlist->all_cpus Andi Kleen
2019-11-16  5:52 ` [PATCH v7 05/12] perf evsel: Add iterator to iterate over events ordered by CPU Andi Kleen
2019-11-16  5:52 ` [PATCH v7 06/12] perf evsel: Add functions to close evsel on a CPU Andi Kleen
2019-11-16  5:52 ` [PATCH v7 07/12] perf stat: Use affinity for closing file descriptors Andi Kleen
2019-11-16  5:52 ` [PATCH v7 08/12] perf stat: Factor out open error handling Andi Kleen
2019-11-16  5:52 ` [PATCH v7 09/12] perf stat: Use affinity for opening events Andi Kleen
2019-11-20 15:07   ` Jiri Olsa
2019-11-20 22:31     ` Andi Kleen
2019-11-21 15:13       ` Jiri Olsa
2019-11-16  5:52 ` [PATCH v7 10/12] perf stat: Use affinity for reading Andi Kleen
2019-11-20 15:12   ` Jiri Olsa
2019-11-16  5:52 ` [PATCH v7 11/12] perf evsel: Add functions to enable/disable for a specific CPU Andi Kleen
2019-11-16  5:52 ` [PATCH v7 12/12] perf stat: Use affinity for enabling/disabling events Andi Kleen
2019-11-20 15:16 ` Optimize perf stat for large number of events/cpus Jiri Olsa
  -- strict thread matches above, loose matches on Subject: below --
2019-11-21  0:15 Andi Kleen
2019-11-21 12:47 ` Andi Kleen
2019-11-21 14:32   ` Arnaldo Carvalho de Melo
2019-11-27 15:16 ` Arnaldo Carvalho de Melo
2019-11-27 15:43   ` Arnaldo Carvalho de Melo
2019-11-27 23:26     ` Andi Kleen
2019-11-28  0:01       ` Arnaldo Carvalho de Melo
2019-11-12  0:59 Andi Kleen
2019-11-07 18:16 Andi Kleen
2019-11-05  0:25 Andi Kleen
