From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Ian Rogers <irogers@google.com>
Cc: "Kim Phillips" <kim.phillips@amd.com>,
	"Arnaldo Carvalho de Melo" <acme@redhat.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Mark Rutland" <mark.rutland@arm.com>,
	"Alexander Shishkin" <alexander.shishkin@linux.intel.com>,
	"Jiri Olsa" <jolsa@redhat.com>,
	"Namhyung Kim" <namhyung@kernel.org>,
	"Vijay Thakkar" <vijaythakkar@me.com>,
	"Andi Kleen" <ak@linux.intel.com>,
	"John Garry" <john.garry@huawei.com>,
	"Kan Liang" <kan.liang@linux.intel.com>,
	"Yunfeng Ye" <yeyunfeng@huawei.com>,
	"Jin Yao" <yao.jin@linux.intel.com>,
	"Martin Liška" <mliska@suse.cz>, "Borislav Petkov" <bp@suse.de>,
	"Jon Grimm" <jon.grimm@amd.com>,
	"Martin Jambor" <mjambor@suse.cz>,
	"Michael Petlan" <mpetlan@redhat.com>,
	"William Cohen" <wcohen@redhat.com>,
	"Stephane Eranian" <eranian@google.com>,
	linux-perf-users <linux-perf-users@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 3/4] perf vendor events amd: Add recommended events
Date: Fri, 4 Sep 2020 16:28:39 -0300	[thread overview]
Message-ID: <20200904192839.GG3753976@kernel.org> (raw)
In-Reply-To: <CAP-5=fW8zLRgxXgY-CFUnU0HiiZMKhJ6b1znMxT8WeLO5Z-wZA@mail.gmail.com>

On Thu, Sep 03, 2020 at 10:48:15PM -0700, Ian Rogers wrote:
> On Thu, Sep 3, 2020 at 11:27 AM Kim Phillips <kim.phillips@amd.com> wrote:
> > On 9/3/20 1:19 AM, Ian Rogers wrote:
> > > On Tue, Sep 1, 2020 at 3:10 PM Kim Phillips <kim.phillips@amd.com> wrote:
> > >> The nps1_die_to_dram event may need perf stat's --metric-no-group
> > >> switch if the number of available data fabric counters is less
> > >> than the number it uses (8).

> > > These are really excellent additions! Does:
> > > "MetricConstraint": "NO_NMI_WATCHDOG"
> > > solve the grouping issue? Perhaps the MetricConstraint needs to be
> > > named more generically to cover this case as it seems sub-optimal to
> > > require the use of --metric-no-group.

> > That metric uses data fabric (DFPMC/amd_df) events, not Core PMC
> > events, which the watchdog uses, so NO_NMI_WATCHDOG wouldn't have
> > an effect.  The event is defined as an approximation anyway.
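
A minimal sketch of the invocation under discussion (the metric name and the
--metric-no-group switch come from the patch text quoted above; the -a/sleep
workload is only illustrative and not from the original thread):

  # Let perf try to schedule the metric's data fabric events as one group;
  # this can fail or multiplex if fewer than 8 DF counters are available:
  perf stat -a -M nps1_die_to_dram sleep 2

  # Ungrouped counting, the fallback suggested in the patch description:
  perf stat -a --metric-no-group -M nps1_die_to_dram sleep 2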

> > I'll have to get back to you on the other items.

> > Thanks for your review!
 
> NP, more nits than anything else.
 
> Acked-by: Ian Rogers <irogers@google.com>

Thanks, applied, testing notes added to the cset:

    Committer testing:
    
    On an AMD Ryzen 3900X system:
    
    Before:
    
      # perf list all_dc_accesses   all_tlbs_flushed   l1_dtlb_misses   l2_cache_accesses_from_dc_misses   l2_cache_accesses_from_ic_misses   l2_cache_hits_from_dc_misses   l2_cache_hits_from_ic_misses   l2_cache_misses_from_dc_misses   l2_cache_misses_from_ic_miss   l2_dtlb_misses   l2_itlb_misses   sse_avx_stalls   uops_dispatched   uops_retired   l3_accesses   l3_misses | grep -v "^Metric Groups:$" | grep -v "^$"
      #
    
    After:
    
      # perf list all_dc_accesses   all_tlbs_flushed   l1_dtlb_misses   l2_cache_accesses_from_dc_misses   l2_cache_accesses_from_ic_misses   l2_cache_hits_from_dc_misses   l2_cache_hits_from_ic_misses   l2_cache_misses_from_dc_misses   l2_cache_misses_from_ic_miss   l2_dtlb_misses   l2_itlb_misses   sse_avx_stalls   uops_dispatched   uops_retired   l3_accesses   l3_misses | grep -v "^Metric Groups:$" | grep -v "^$" | grep -v "^recommended:$"
      all_dc_accesses
           [All L1 Data Cache Accesses]
      all_tlbs_flushed
           [All TLBs Flushed]
      l1_dtlb_misses
           [L1 DTLB Misses]
      l2_cache_accesses_from_dc_misses
           [L2 Cache Accesses from L1 Data Cache Misses (including prefetch)]
      l2_cache_accesses_from_ic_misses
           [L2 Cache Accesses from L1 Instruction Cache Misses (including
            prefetch)]
      l2_cache_hits_from_dc_misses
           [L2 Cache Hits from L1 Data Cache Misses]
      l2_cache_hits_from_ic_misses
           [L2 Cache Hits from L1 Instruction Cache Misses]
      l2_cache_misses_from_dc_misses
           [L2 Cache Misses from L1 Data Cache Misses]
      l2_cache_misses_from_ic_miss
           [L2 Cache Misses from L1 Instruction Cache Misses]
      l2_dtlb_misses
           [L2 DTLB Misses & Data page walks]
      l2_itlb_misses
           [L2 ITLB Misses & Instruction page walks]
      sse_avx_stalls
           [Mixed SSE/AVX Stalls]
      uops_dispatched
           [Micro-ops Dispatched]
      uops_retired
           [Micro-ops Retired]
      l3_accesses
           [L3 Accesses. Unit: amd_l3]
      l3_misses
           [L3 Misses (includes Chg2X). Unit: amd_l3]
      #
    
      # perf stat -a -e all_dc_accesses,all_tlbs_flushed,l1_dtlb_misses,l2_cache_accesses_from_dc_misses,l2_cache_accesses_from_ic_misses,l2_cache_hits_from_dc_misses,l2_cache_hits_from_ic_misses,l2_cache_misses_from_dc_misses,l2_cache_misses_from_ic_miss,l2_dtlb_misses,l2_itlb_misses,sse_avx_stalls,uops_dispatched,uops_retired,l3_accesses,l3_misses sleep 2
    
       Performance counter stats for 'system wide':
    
           433,439,949      all_dc_accesses                                               (35.66%)
                   443      all_tlbs_flushed                                              (35.66%)
             2,985,885      l1_dtlb_misses                                                (35.66%)
            18,318,019      l2_cache_accesses_from_dc_misses                                     (35.68%)
            50,114,810      l2_cache_accesses_from_ic_misses                                     (35.72%)
            12,423,978      l2_cache_hits_from_dc_misses                                     (35.74%)
            40,703,103      l2_cache_hits_from_ic_misses                                     (35.74%)
             6,698,673      l2_cache_misses_from_dc_misses                                     (35.74%)
            12,090,892      l2_cache_misses_from_ic_miss                                     (35.74%)
               614,267      l2_dtlb_misses                                                (35.74%)
               216,036      l2_itlb_misses                                                (35.74%)
                11,977      sse_avx_stalls                                                (35.74%)
           999,276,223      uops_dispatched                                               (35.73%)
         1,075,311,620      uops_retired                                                  (35.69%)
             1,420,763      l3_accesses
               540,164      l3_misses
    
           2.002344121 seconds time elapsed
    
      # perf stat -a -e all_dc_accesses,all_tlbs_flushed,l1_dtlb_misses,l2_cache_accesses_from_dc_misses,l2_cache_accesses_from_ic_misses sleep 2
    
       Performance counter stats for 'system wide':
    
           175,943,104      all_dc_accesses
                   310      all_tlbs_flushed
             2,280,359      l1_dtlb_misses
            11,700,151      l2_cache_accesses_from_dc_misses
            25,414,963      l2_cache_accesses_from_ic_misses
    
           2.001957818 seconds time elapsed
    
      #
    
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
    Signed-off-by: Kim Phillips <kim.phillips@amd.com>
    Acked-by: Ian Rogers <irogers@google.com>
    Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
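
The percentages in parentheses in the first run above indicate that perf had
to time-multiplex the core-PMU events (more events were requested than the
core PMU has counters), so each count is scaled up from the fraction of the
run during which the event was actually measured; the amd_l3 events and the
shorter second run fit on the available counters and therefore show no
percentage. A hypothetical follow-up (not part of the original test log) that
pins a subset of the events into one group so its members are always counted
together:

  # Count these three events as a single group; if the group cannot be
  # scheduled on the PMU all at once, perf reports <not counted> instead
  # of multiplexing:
  perf stat -a -e '{l2_dtlb_misses,l2_itlb_misses,sse_avx_stalls}' sleep 2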

Thread overview: 28+ messages
2020-09-01 22:09 [PATCH 1/4] perf vendor events amd: Add L2 Prefetch events for zen1 Kim Phillips
2020-09-01 22:09 ` [PATCH 2/4] perf vendor events amd: Add ITLB Instruction Fetch Hits event " Kim Phillips
2020-09-03  6:03   ` Ian Rogers
2020-09-04 19:19     ` Arnaldo Carvalho de Melo
2020-09-01 22:09 ` [PATCH 3/4] perf vendor events amd: Add recommended events Kim Phillips
2020-09-03  6:19   ` Ian Rogers
2020-09-03 18:26     ` Kim Phillips
2020-09-04  5:48       ` Ian Rogers
2020-09-04 19:28         ` Arnaldo Carvalho de Melo [this message]
2020-09-01 22:09 ` [PATCH 4/4] perf vendor events amd: Enable Family 19h users by matching Zen2 events Kim Phillips
2020-09-03  6:20   ` Ian Rogers
2020-09-04 19:33     ` Arnaldo Carvalho de Melo
2020-09-03  5:40 ` [PATCH 1/4] perf vendor events amd: Add L2 Prefetch events for zen1 Ian Rogers
2020-09-04 19:18 ` Arnaldo Carvalho de Melo
