linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Liang, Kan" <kan.liang@linux.intel.com>
To: Ian Rogers <irogers@google.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Ahmad Yasin <ahmad.yasin@intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Stephane Eranian <eranian@google.com>,
	Andi Kleen <ak@linux.intel.com>,
	Perry Taylor <perry.taylor@intel.com>,
	Samantha Alt <samantha.alt@intel.com>,
	Caleb Biggers <caleb.biggers@intel.com>,
	Weilin Wang <weilin.wang@intel.com>,
	Edward Baker <edward.baker@intel.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>, Namhyung Kim <namhyung@kernel.org>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Florian Fischer <florian.fischer@muhq.space>,
	Rob Herring <robh@kernel.org>,
	Zhengjun Xing <zhengjun.xing@linux.intel.com>,
	John Garry <john.g.garry@oracle.com>,
	Kajol Jain <kjain@linux.ibm.com>,
	Sumanth Korikkar <sumanthk@linux.ibm.com>,
	Thomas Richter <tmricht@linux.ibm.com>,
	Tiezhu Yang <yangtiezhu@loongson.cn>,
	Ravi Bangoria <ravi.bangoria@amd.com>,
	Leo Yan <leo.yan@linaro.org>,
	Yang Jihong <yangjihong1@huawei.com>,
	James Clark <james.clark@arm.com>,
	Suzuki Poulouse <suzuki.poulose@arm.com>,
	Kang Minchul <tegongkang@gmail.com>,
	Athira Rajeev <atrajeev@linux.vnet.ibm.com>,
	linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1 00/40] Fix perf on Intel hybrid CPUs
Date: Wed, 26 Apr 2023 09:53:45 -0400	[thread overview]
Message-ID: <bff481ba-e60a-763f-0aa0-3ee53302c480@linux.intel.com> (raw)
In-Reply-To: <20230426070050.1315519-1-irogers@google.com>



On 2023-04-26 3:00 a.m., Ian Rogers wrote:
> TL;DR: hybrid doesn't crash, json metrics work on hybrid on both PMUs
> or individually, event parsing doesn't always scan all PMUs, more and
> new tests that also run without hybrid, less code.
> 
> The first patches were previously posted to improve metrics here:
> "perf stat: Introduce skippable evsels"
> https://lore.kernel.org/all/20230414051922.3625666-1-irogers@google.com/
> "perf vendor events intel: Add xxx metric constraints"
> https://lore.kernel.org/all/20230419005423.343862-1-irogers@google.com/
> 
> Next are some general test improvements.
> 
> Next event parsing is rewritten to not scan all PMUs for the benefit
> of raw and legacy cache parsing, instead these are handled by the
> lexer and a new term type. This ultimately removes the need for the
> event parser for hybrid to be recursive as legacy cache can be just a
> term. Tests are re-enabled for events with hyphens, so AMD's
> branch-brs event is now parsable.
> 
> The cputype option is made a generic pmu filter flag and is tested
> even on non-hybrid systems.
> 
> The final patches address specific json metric issues on hybrid, in
> both the json metrics and the metric code. They also bring in a new
> json option to not group events when matching a metricgroup, this
> helps reduce counter pressure for TopdownL1 and TopdownL2 metric
> groups. The updates to the script that updates the json are posted in:
> https://github.com/intel/perfmon/pull/73
> 
> The patches add slightly more code than they remove, in areas like
> better json metric constraints and tests, but in the core util code,
> the removal of hybrid is a net reduction:
>  20 files changed, 631 insertions(+), 951 deletions(-)
> 
> There's specific detail with each patch, but for now here is the 6.3
> output followed by that from perf-tools-next with the patch series
> applied. The tool is running on an Alderlake CPU on an elderly 5.15
> kernel:
> 
> Events on hybrid that parse and pass tests:
> '''
> $ perf-6.3 version
> perf version 6.3.rc7.gb7bc77e2f2c7
> $ perf-6.3 test
> ...
>   6.1: Test event parsing                       : FAILED!
> ...
> $ perf test
> ...
>   6: Parse event definition strings             :
>   6.1: Test event parsing                       : Ok
>   6.2: Parsing of all PMU events from sysfs     : Ok
>   6.3: Parsing of given PMU events from sysfs   : Ok
>   6.4: Parsing of aliased events from sysfs     : Skip (no aliases in sysfs)
>   6.5: Parsing of aliased events                : Ok
>   6.6: Parsing of terms (event modifiers)       : Ok
> ...
> '''
> 
> No event/metric running with json metrics and TopdownL1 on both PMUs:
> '''
> $ perf-6.3 stat -a sleep 1
> 
>  Performance counter stats for 'system wide':
> 
>          24,073.58 msec cpu-clock                        #   23.975 CPUs utilized             
>                350      context-switches                 #   14.539 /sec                      
>                 25      cpu-migrations                   #    1.038 /sec                      
>                 66      page-faults                      #    2.742 /sec                      
>         21,257,199      cpu_core/cycles/                 #  883.009 K/sec                     
>          2,162,192      cpu_atom/cycles/                 #   89.816 K/sec                     
>          6,679,379      cpu_core/instructions/           #  277.457 K/sec                     
>            753,197      cpu_atom/instructions/           #   31.287 K/sec                     
>          1,300,647      cpu_core/branches/               #   54.028 K/sec                     
>            148,652      cpu_atom/branches/               #    6.175 K/sec                     
>            117,429      cpu_core/branch-misses/          #    4.878 K/sec                     
>             14,396      cpu_atom/branch-misses/          #  598.000 /sec                      
>        123,097,644      cpu_core/slots/                  #    5.113 M/sec                     
>          9,241,207      cpu_core/topdown-retiring/       #      7.5% Retiring                 
>          8,903,288      cpu_core/topdown-bad-spec/       #      7.2% Bad Speculation          
>         66,590,029      cpu_core/topdown-fe-bound/       #     54.1% Frontend Bound           
>         38,397,500      cpu_core/topdown-be-bound/       #     31.2% Backend Bound            
>          3,294,283      cpu_core/topdown-heavy-ops/      #      2.7% Heavy Operations          #      4.8% Light Operations         
>          8,855,769      cpu_core/topdown-br-mispredict/  #      7.2% Branch Mispredict         #      0.0% Machine Clears           
>         57,695,714      cpu_core/topdown-fetch-lat/      #     46.9% Fetch Latency             #      7.2% Fetch Bandwidth          
>         12,823,926      cpu_core/topdown-mem-bound/      #     10.4% Memory Bound              #     20.8% Core Bound               
> 
>        1.004093622 seconds time elapsed
> 
> $ perf stat -a sleep 1
> 
>  Performance counter stats for 'system wide':
> 
>          24,064.65 msec cpu-clock                        #   23.973 CPUs utilized             
>                384      context-switches                 #   15.957 /sec                      
>                 24      cpu-migrations                   #    0.997 /sec                      
>                 71      page-faults                      #    2.950 /sec                      
>         19,737,646      cpu_core/cycles/                 #  820.192 K/sec                     
>        122,018,505      cpu_atom/cycles/                 #    5.070 M/sec                       (63.32%)
>          7,636,653      cpu_core/instructions/           #  317.339 K/sec                     
>         16,266,629      cpu_atom/instructions/           #  675.955 K/sec                       (72.50%)
>          1,552,995      cpu_core/branches/               #   64.534 K/sec                     
>          3,208,143      cpu_atom/branches/               #  133.314 K/sec                       (72.50%)
>            132,151      cpu_core/branch-misses/          #    5.491 K/sec                     
>            547,285      cpu_atom/branch-misses/          #   22.742 K/sec                       (72.49%)
>         32,110,597      cpu_atom/TOPDOWN_RETIRING.ALL/   #    1.334 M/sec                     
>                                                   #     18.4 %  tma_bad_speculation      (72.48%)
>        228,006,765      cpu_atom/TOPDOWN_FE_BOUND.ALL/   #    9.475 M/sec                     
>                                                   #     38.1 %  tma_frontend_bound       (72.47%)
>        225,866,251      cpu_atom/TOPDOWN_BE_BOUND.ALL/   #    9.386 M/sec                     
>                                                   #     37.7 %  tma_backend_bound      
>                                                   #     37.7 %  tma_backend_bound_aux    (72.73%)
>        119,748,254      cpu_atom/CPU_CLK_UNHALTED.CORE/  #    4.976 M/sec                     
>                                                   #      5.2 %  tma_retiring             (73.14%)
>         31,363,579      cpu_atom/TOPDOWN_RETIRING.ALL/   #    1.303 M/sec                       (73.37%)
>        227,907,321      cpu_atom/TOPDOWN_FE_BOUND.ALL/   #    9.471 M/sec                       (63.95%)
>        228,803,268      cpu_atom/TOPDOWN_BE_BOUND.ALL/   #    9.508 M/sec                       (63.55%)
>        113,357,334      cpu_core/TOPDOWN.SLOTS/          #     30.5 %  tma_backend_bound      
>                                                   #      9.2 %  tma_retiring           
>                                                   #      8.7 %  tma_bad_speculation    
>                                                   #     51.6 %  tma_frontend_bound     
>         10,451,044      cpu_core/topdown-retiring/                                            
>          9,687,449      cpu_core/topdown-bad-spec/                                            
>         58,703,214      cpu_core/topdown-fe-bound/                                            
>         34,540,660      cpu_core/topdown-be-bound/                                            
>            154,902      cpu_core/INT_MISC.UOP_DROPPING/  #    6.437 K/sec                     
> 
>        1.003818397 seconds time elapsed
> '''

Thanks for the fixes. That should work for -M or --topdown options.
But I don't think the above output is better than the 6.3 for the
*default* of perf stat?

- The multiplexing in the atom core messes up the other events.
- The "M/sec" seems useless for the Topdown events.
- The tma_* is not a generic name.
  "Retiring" is much better than "tma_retiring" as a generic annotation.
  It should works for both X86 and Arm.

As the default, it's better to provide a clean and generic ouptput for
the end users.

If the users want to know more details, they can use -M or --topdown
options. The events/formats are expected to be different among ARCHs.

Also, there should be a bug for all atom Topdown events. They are
displayed twice.

Thanks,
Kan

  parent reply	other threads:[~2023-04-26 13:53 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-26  7:00 [PATCH v1 00/40] Fix perf on Intel hybrid CPUs Ian Rogers
2023-04-26  7:00 ` [PATCH v1 01/40] perf stat: Introduce skippable evsels Ian Rogers
2023-04-26 23:26   ` Yasin, Ahmad
2023-04-27  0:37     ` Ian Rogers
2023-04-27  2:03       ` Ian Rogers
2023-04-27 18:52   ` Liang, Kan
2023-04-27 20:21     ` Ian Rogers
2023-04-27 21:00       ` Namhyung Kim
2023-04-27 21:09         ` Ian Rogers
2023-04-26  7:00 ` [PATCH v1 02/40] perf vendor events intel: Add alderlake metric constraints Ian Rogers
2023-04-26  7:00 ` [PATCH v1 03/40] perf vendor events intel: Add icelake " Ian Rogers
2023-04-27 19:06   ` Liang, Kan
2023-04-27 20:22     ` Ian Rogers
2023-04-26  7:00 ` [PATCH v1 04/40] perf vendor events intel: Add icelakex " Ian Rogers
2023-04-26  7:00 ` [PATCH v1 05/40] perf vendor events intel: Add sapphirerapids " Ian Rogers
2023-04-26  7:00 ` [PATCH v1 06/40] perf vendor events intel: Add tigerlake " Ian Rogers
2023-04-26  7:00 ` [PATCH v1 07/40] perf stat: Avoid segv on counter->name Ian Rogers
2023-04-27 19:11   ` Liang, Kan
2023-04-27 19:34     ` Arnaldo Carvalho de Melo
2023-04-26  7:00 ` [PATCH v1 08/40] perf test: Test more sysfs events Ian Rogers
2023-04-27 19:38   ` Liang, Kan
2023-04-27 20:23     ` Ian Rogers
2023-04-26  7:00 ` [PATCH v1 09/40] perf test: Use valid for PMU tests Ian Rogers
2023-04-27 19:39   ` Liang, Kan
2023-04-26  7:00 ` [PATCH v1 10/40] perf test: Mask config then test Ian Rogers
2023-04-27 19:39   ` Liang, Kan
2023-04-26  7:00 ` [PATCH v1 11/40] perf test: Test more with config_cache Ian Rogers
2023-04-27 19:40   ` Liang, Kan
2023-04-26  7:00 ` [PATCH v1 12/40] perf test: Roundtrip name, don't assume 1 event per name Ian Rogers
2023-04-27 19:44   ` Liang, Kan
2023-04-26  7:00 ` [PATCH v1 13/40] perf parse-events: Set attr.type to PMU type early Ian Rogers
2023-04-27 20:00   ` Liang, Kan
2023-04-26  7:00 ` [PATCH v1 14/40] perf print-events: Avoid unnecessary strlist Ian Rogers
2023-04-27 20:01   ` Liang, Kan
2023-04-26  7:00 ` [PATCH v1 15/40] perf parse-events: Avoid scanning PMUs before parsing Ian Rogers
2023-04-27 20:06   ` Liang, Kan
2023-04-26  7:00 ` [PATCH v1 16/40] perf test: Validate events with hyphens in Ian Rogers
2023-04-27 20:08   ` Liang, Kan
2023-04-26  7:00 ` [PATCH v1 17/40] perf evsel: Modify group pmu name for software events Ian Rogers
2023-04-27 20:12   ` Liang, Kan
2023-04-26  7:00 ` [PATCH v1 18/40] perf test: Move x86 hybrid tests to arch/x86 Ian Rogers
2023-04-27 21:42   ` Liang, Kan
2023-04-26  7:00 ` [PATCH v1 19/40] perf test x86 hybrid: Don't assume evlist order Ian Rogers
2023-04-26  7:00 ` [PATCH v1 20/40] perf parse-events: Support PMUs for legacy cache events Ian Rogers
2023-04-26  7:00 ` [PATCH v1 21/40] perf parse-events: Wildcard " Ian Rogers
2023-04-26 10:11   ` James Clark
2023-04-27  5:50     ` Ian Rogers
2023-04-27 21:02       ` Ian Rogers
2023-04-26  7:00 ` [PATCH v1 22/40] perf print-events: Print legacy cache events for each PMU Ian Rogers
2023-04-26  7:00 ` [PATCH v1 23/40] perf parse-events: Support wildcards on raw events Ian Rogers
2023-04-26  7:00 ` [PATCH v1 24/40] perf parse-events: Remove now unused hybrid logic Ian Rogers
2023-04-26  7:00 ` [PATCH v1 25/40] perf parse-events: Minor type safety cleanup Ian Rogers
2023-04-26  7:00 ` [PATCH v1 26/40] perf parse-events: Add pmu filter Ian Rogers
2023-04-26  7:00 ` [PATCH v1 27/40] perf stat: Make cputype filter generic Ian Rogers
2023-04-26  7:00 ` [PATCH v1 28/40] perf test: Add cputype testing to perf stat Ian Rogers
2023-04-26  7:00 ` [PATCH v1 29/40] perf test: Fix parse-events tests for >1 core PMU Ian Rogers
2023-04-26  7:00 ` [PATCH v1 30/40] perf parse-events: Support hardware events as terms Ian Rogers
2023-04-26  7:00 ` [PATCH v1 31/40] perf parse-events: Avoid error when assigning a term Ian Rogers
2023-04-26  7:00 ` [PATCH v1 32/40] perf parse-events: Avoid error when assigning a legacy cache term Ian Rogers
2023-04-26  7:00 ` [PATCH v1 33/40] perf parse-events: Don't auto merge hybrid wildcard events Ian Rogers
2023-04-26  7:00 ` [PATCH v1 34/40] perf parse-events: Don't reorder atom cpu events Ian Rogers
2023-04-26  7:00 ` [PATCH v1 35/40] perf metrics: Be PMU specific for referenced metrics Ian Rogers
2023-04-26  7:00 ` [PATCH v1 36/40] perf metric: Json flag to not group events if gathering a metric group Ian Rogers
2023-04-26  7:00 ` [PATCH v1 37/40] perf stat: Command line PMU metric filtering Ian Rogers
2023-04-26  7:00 ` [PATCH v1 38/40] perf vendor events intel: Correct alderlake metrics Ian Rogers
2023-04-26  7:00 ` [PATCH v1 39/40] perf jevents: Don't rewrite metrics across PMUs Ian Rogers
2023-04-26  7:00 ` [PATCH v1 40/40] perf metrics: Be PMU specific in event match Ian Rogers
2023-04-26 13:53 ` Liang, Kan [this message]
2023-04-26 21:09 ` [PATCH v1 00/40] Fix perf on Intel hybrid CPUs Arnaldo Carvalho de Melo
2023-04-26 21:33   ` Arnaldo Carvalho de Melo
2023-04-26 22:07     ` Liang, Kan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bff481ba-e60a-763f-0aa0-3ee53302c480@linux.intel.com \
    --to=kan.liang@linux.intel.com \
    --cc=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=ahmad.yasin@intel.com \
    --cc=ak@linux.intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=atrajeev@linux.vnet.ibm.com \
    --cc=caleb.biggers@intel.com \
    --cc=edward.baker@intel.com \
    --cc=eranian@google.com \
    --cc=florian.fischer@muhq.space \
    --cc=irogers@google.com \
    --cc=james.clark@arm.com \
    --cc=john.g.garry@oracle.com \
    --cc=jolsa@kernel.org \
    --cc=kjain@linux.ibm.com \
    --cc=leo.yan@linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=perry.taylor@intel.com \
    --cc=peterz@infradead.org \
    --cc=ravi.bangoria@amd.com \
    --cc=robh@kernel.org \
    --cc=samantha.alt@intel.com \
    --cc=sumanthk@linux.ibm.com \
    --cc=suzuki.poulose@arm.com \
    --cc=tegongkang@gmail.com \
    --cc=tmricht@linux.ibm.com \
    --cc=weilin.wang@intel.com \
    --cc=yangjihong1@huawei.com \
    --cc=yangtiezhu@loongson.cn \
    --cc=zhengjun.xing@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).