linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH] perf c2c report: count remote loads correctly
       [not found] <6282053a-d813-8638-531d-56e852d582a2@foss.arm.com>
@ 2020-09-01 17:28 ` Arnaldo Carvalho de Melo
       [not found] ` <20200901201717.GA656443@krava>
  1 sibling, 0 replies; 2+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-09-01 17:28 UTC (permalink / raw)
  To: Al Grant, Jiri Olsa, Joe Mario
  Cc: David Ahern, Don Zickus, Namhyung Kim, linux-perf-users,
	andi.kleen, Linux Kernel Mailing List

Em Thu, Aug 20, 2020 at 02:48:58PM +0100, Al Grant escreveu:
> "perf c2c report" can show load counts for cache lines, which don't match
> the actual number of load samples, e.g. as displayed by "perf script". This
> is specific to "Remote Any cache hit" loads. Firstly, these loads are
> counted twice, because if the "remote" flag is set, rmt_dram is always
> incremented, and then rmt_hitm or rmt_hit may also be incremented. These are
> then totalled in the overall load count, causing double-counting. "Remote
> Any cache hit" should not increment rmt_dram. Instead, use LVLNUM to
> discriminate between remote cache and remote DRAM. Also, non-HITM loads to
> remote cache are not being counted as hits (the last column in the cache
> line report is zero), when the SNOOP field is unset. This causes
> under-reporting of the load count. The code currently only increments
> counters if the SNOOP field is set to either HIT or HITM. Instead, for
> access to remote cache (as indicated by LVLNUM), increment rmt_hitm if
> SNOOP=HITM, increment rmt_hit otherwise.

Hi Joe, Jiri, can you please take a look and provide your Acked-by or
better, Reviewed-by?

- Arnaldo
 
> From: Al Grant <al.grant@arm.com>

Al, please provide Signed-off-by: lines for code your write,

Thanks,

- Arnaldo
 
>  tools/perf/util/mem-events.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
> index ea0af0bc4314..c6bb86fd4330 100644
> --- a/tools/perf/util/mem-events.c
> +++ b/tools/perf/util/mem-events.c
> @@ -332,11 +332,13 @@ int c2c_decode_stats(struct c2c_stats *stats, struct
> mem_info *mi)
>         u64 lvl    = data_src->mem_lvl;
>         u64 snoop  = data_src->mem_snoop;
>         u64 lock   = data_src->mem_lock;
> +       u64 lvlnum = data_src->mem_lvl_num;
>         /*
>          * Skylake might report unknown remote level via this
>          * bit, consider it when evaluating remote HITMs.
>          */
>         bool mrem  = data_src->mem_remote;
> +       bool mmem = (lvlnum == PERF_MEM_LVLNUM_RAM || lvlnum ==
> PERF_MEM_LVLNUM_PMEM);
>         int err = 0;
> 
>  #define HITM_INC(__f)          \
> @@ -383,7 +385,7 @@ do {                                \
> 
>                         if ((lvl & P(LVL, REM_RAM1)) ||
>                             (lvl & P(LVL, REM_RAM2)) ||
> -                            mrem) {
> +                            (mrem && mmem)) {
>                                 stats->rmt_dram++;
>                                 if (snoop & P(SNOOP, HIT))
>                                         stats->ld_shared++;
> @@ -394,11 +396,11 @@ do {                              \
> 
>                 if ((lvl & P(LVL, REM_CCE1)) ||
>                     (lvl & P(LVL, REM_CCE2)) ||
> -                    mrem) {
> -                       if (snoop & P(SNOOP, HIT))
> -                               stats->rmt_hit++;
> -                       else if (snoop & P(SNOOP, HITM))
> +                    (mrem && !mmem)) {
> +                       if (snoop & P(SNOOP, HITM))
>                                 HITM_INC(rmt_hitm);
> +                       else
> +                               stats->rmt_hit++;
>                 }
> 
>                 if ((lvl & P(LVL, MISS)))

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [PATCH] perf c2c report: count remote loads correctly
       [not found]       ` <0ffdeb34-6fa7-380e-3183-8ce1022ed21c@foss.arm.com>
@ 2020-09-02 13:47         ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 2+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-09-02 13:47 UTC (permalink / raw)
  To: Al Grant
  Cc: Joe Mario, Jiri Olsa, linux-perf-users, Andi Kleen,
	Linux Kernel Mailing List

Em Wed, Sep 02, 2020 at 01:09:42PM +0100, Al Grant escreveu:
> On 01/09/2020 22:26, Joe Mario wrote:
> > On 9/1/20 4:46 PM, Jiri Olsa wrote:
> > > On Tue, Sep 01, 2020 at 10:17:19PM +0200, Jiri Olsa wrote:
> > > > On Thu, Aug 20, 2020 at 02:48:58PM +0100, Al Grant wrote:
> > > > I'm getting following when trying to apply the patch:

> > > > patching file util/mem-events.c
> > > > patch: **** malformed patch at line 83: struct mem_info *mi)

> > > > > "perf c2c report" can show load counts for cache lines, which
> > > > > don't match the actual number of load samples, e.g. as
> > > > > displayed by "perf script". This is specific to "Remote Any
> > > > > cache hit" loads. Firstly, these loads are counted twice,
> > > > > because if the "remote" flag is set, rmt_dram is always
> > > > > incremented, and then rmt_hitm or rmt_hit may also be
> > > > > incremented. These are then totalled in the overall load
> > > > > count, causing double-counting. "Remote Any cache hit" should
> > > > > not increment rmt_dram. Instead, use LVLNUM to discriminate
> > > > > between remote cache and remote DRAM. Also, non-HITM loads to
> > > > > remote cache are not being counted as hits (the last column in
> > > > > the cache line report is zero), when the SNOOP field is unset.
> > > > > This causes under-reporting of the load count. The code
> > > > > currently only increments counters if the SNOOP field is set
> > > > > to either HIT or HITM. Instead, for access to remote cache (as
> > > > > indicated by LVLNUM), increment rmt_hitm if SNOOP=HITM,
> > > > > increment rmt_hit otherwise.

> > > > seems like you described more issues in here,
> > > > could you please put them in separate patches,
> > > > so it's easier to review?

> > > > Andi, Joe,
> > > > any idea about changes described above?

> > Hi Al:
> > Can you provide a simple reproducer for this?
> > When I try it, the "perf c2c report" shows the same counts for the load samples as the "perf script" output does.
> > 
> > For example:
> > Here's the output from the "perf c2c report"
> > # head -6 c2c.out
> >     =================================================
> >                 Trace Event Information
> >     =================================================
> >       Total records                     :      32923
> >       Locked Load/Store Operations      :        306
> >       Load Operations                   :       5392
> > 
> > And here's the grep count from the "perf script" output:
> >     # egrep -c 'mem-stores|mem-loads' script.out
> >     32923
> >     # egrep -c "LCK Yes" script.out
> >     306
> >     # grep -c mem-loads script.out
> >     5392

> "Load Operations" is correct, the problem is with some of the
> counts of more specific load types. E.g. in this output:

>   Load Operations                   :        782
>   Loads - uncacheable               :          0
>   Loads - IO                        :          0
>   Loads - Miss                      :         18
>   Loads - no mapping                :          0
>   Load Fill Buffer Hit              :         85
>   Load L1D hit                      :        171
>   Load L2D hit                      :         16
>   Load LLC hit                      :         86
>   Load Local HITM                   :          0
>   Load Remote HITM                  :        110
>   Load Remote HIT                   :          0
>   Load Local DRAM                   :         86
>   Load Remote DRAM                  :        320

> LFB, L1D, L2D, LLC, local DRAM, remote HIT/HITM and remote DRAM are
> distinct places so their individual counts should total up to no more
> than the total load count. But 85+171+16+86+110+86+320 is 874 which
> exceeds the total load count by 92. If you add in "Loads - Miss"
> (which is "L3 Miss" in perf script - it's not specific about where
> it hit) the excess is 110.

> # grep -c "Remote Any cache hit" script.out
> 111

> # grep -c "Remote Any cache hit|SNP HitM" script.out
> 110

> # grep -c "Remote RAM" script.out
> 209

> # grep -c "Remote" script.out
> 320

> Loads which hit remote cache are being counted as both remote HIT/HITM
> and remote DRAM. perf c2c is reporting 320 "Load Remote DRAM" but this
> is the count of all remote loads, including "Remote Any cache hit".
> "Load Remote DRAM" should match "grep -c "Remote RAM" i.e. 209 not
> 320.  The 110 excess in the total matches the count of "Remote Any
> cache hit|SNP HitM" as it is these which are counted twice. (There is
> one "SNP N/A" which isn't counted, that's the other issue I mentioned)

> There are knock-on effects in the detailed cache line reporting, but I
> believe it's all due to the same root cause where we have some records
> triggering two counts which should be mutually exclusive.

Would you consider adding a tools/perf/tests/ entry to do these checks,
i.e. run some synthetic workload, record some of these events, then do
this math, to check that perf is doing the right thing?

The starting point is something like:

  $ ls -la tools/perf/tests/sw-clock.c 
  -rw-rw-r--. 1 acme acme 3326 Aug 14 09:00 tools/perf/tests/sw-clock.c
  $ 
  $ wc -l tools/perf/tests/sw-clock.c 
  149 tools/perf/tests/sw-clock.c
  $ 
  $ perf test clock
  25: Software clock events period values                   : Ok
  $ perf test -v clock
  Couldn't bump rlimit(MEMLOCK), failures may take place when creating BPF maps, etc
  25: Software clock events period values                   :
  --- start ---
  test child forked, pid 3455348
  mmap size 528384B
  mmap size 528384B
  test child finished with 0
  ---- end ----
  Software clock events period values: Ok
  $

This way, everytime we go sending patches upstream and do:

  # perf test

We make sure the changes in the current batch of patches being submitted
don't regress,

Thanks for your consideration and for your patches and detailed
discussion,

Regards,

- Arnaldo

P.S.: The SNOOPX fix for tools/ is already in Linus tree, he even
noticed that now the kernel UAPI header is out of sync and needs that
fix :-)

https://lore.kernel.org/lkml/CAHk-=wifL-04oOF3RAbX9Odyfgz4zc4dE=pq-QL+2C-aTxUmqw@mail.gmail.com/

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2020-09-02 13:47 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <6282053a-d813-8638-531d-56e852d582a2@foss.arm.com>
2020-09-01 17:28 ` [PATCH] perf c2c report: count remote loads correctly Arnaldo Carvalho de Melo
     [not found] ` <20200901201717.GA656443@krava>
     [not found]   ` <20200901204614.GC656443@krava>
     [not found]     ` <be482330-5532-6c34-ed0a-1c6f79e365fc@redhat.com>
     [not found]       ` <0ffdeb34-6fa7-380e-3183-8ce1022ed21c@foss.arm.com>
2020-09-02 13:47         ` Arnaldo Carvalho de Melo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).