From: Leo Yan <leo.yan@linaro.org>
To: Joe Mario <jmario@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>,
	Ali Saidi <alisaidi@amazon.com>,
	linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, german.gomez@arm.com,
	benh@kernel.crashing.org, Nick.Forrington@arm.com,
	alexander.shishkin@linux.intel.com, andrew.kilroy@arm.com,
	james.clark@arm.com, john.garry@huawei.com,
	Jiri Olsa <jolsa@kernel.org>,
	kjain@linux.ibm.com, lihuafei1@huawei.com, mark.rutland@arm.com,
	mathieu.poirier@linaro.org, mingo@redhat.com,
	namhyung@kernel.org, peterz@infradead.org, will@kernel.org
Subject: Re: [PATCH v8 0/4] perf: arm-spe: Decode SPE source and use for perf c2c
Date: Sun, 22 May 2022 14:15:33 +0800	[thread overview]
Message-ID: <20220522061533.GA715382@leoy-ThinkPad-X240s> (raw)
In-Reply-To: <32e5a3b7-9294-bbd5-0ae4-b5c04eb4e0e6@redhat.com>

Hi Joe,

On Thu, May 19, 2022 at 11:16:53AM -0400, Joe Mario wrote:

[...]

> Hi Leo:
> Thanks for getting this working on ARM.  I do have a few comments.
> 
> I built and ran this on an ARM Neoverse-N1 system with 2 NUMA nodes.
> 
> Comment 1:
> When I run "perf c2c report", the "Node" field is marked "N/A".  It's supposed to show the NUMA node where the data address for the cacheline resides.  That's important both to see what node hot data resides on and whether that data is getting lots of cross-NUMA accesses.

Good catch.  Will fix it.

> Comment 2:
> I'm assuming you're identifying the contended cachelines using the "peer" load response, which indicates the load was resolved from a "peer" cpu's cacheline.  Please confirm.

Yeah, "peer" is ambiguous.  AFAIK, a "peer" load can come from:
- The local node, from a peer CPU's cache (either the same cluster or a peer cluster);
- A remote node, from a peer CPU's cache line, or even from *remote DRAM*.

> If that's true, is it possible to identify if that "peer" response was on the local or remote numa node?  

Good point.  Yes, we can do this.  So far, remote accesses are
accounted in the metric "rmt_hit", which should be the same as the
remote peer loads; but we have no metric yet to account for local
peer loads.  It would not be hard to add a "lcl_peer" metric.
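To make the distinction concrete, here is a minimal sketch (not perf's actual implementation; all names and the two-node topology are hypothetical) of how peer load samples could be bucketed into "lcl_peer" and "rmt_peer" counts by comparing the sampling CPU's NUMA node against the node that supplied the data:

```python
from collections import Counter

def account_peer_loads(samples, cpu_to_node):
    """samples: iterable of (cpu, data_node, is_peer_snoop) tuples.
    cpu_to_node: hypothetical map from CPU id to its NUMA node."""
    stats = Counter()
    for cpu, data_node, is_peer in samples:
        if not is_peer:
            continue  # only snoop-peer loads are accounted here
        if cpu_to_node[cpu] == data_node:
            stats["lcl_peer"] += 1   # peer cache on the local node
        else:
            stats["rmt_peer"] += 1   # peer cache (or DRAM) on a remote node
    return stats

# Made-up two-socket topology: CPUs 0-1 on node 0, CPUs 2-3 on node 1.
cpu_to_node = {0: 0, 1: 0, 2: 1, 3: 1}
samples = [
    (0, 0, True),    # CPU0 served from node 0 -> local peer
    (2, 0, True),    # CPU2 served from node 0 -> remote peer
    (3, 1, True),    # CPU3 served from node 1 -> local peer
    (1, 0, False),   # not a peer snoop; ignored
]
stats = account_peer_loads(samples, cpu_to_node)
print(dict(stats))   # -> {'lcl_peer': 2, 'rmt_peer': 1}
```

The key point is that the split needs only two inputs that perf already has per sample: the sampling CPU and the node of the data source.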

> I ask because being able to identify both local and remote HitM's on Intel X86_64 has been quite valuable.  That's because remote HitM's are costly and because it helps the viewer see if they need to optimize their cpu affinity or what node their hot data resides on.

Thanks a lot for the info.  This means at least I should refine the
shared cache line distribution Pareto output for remote peer accesses;
I will do some experiments for the enhancement.

> Last Comment:
> There's a row in the Pareto table that has incorrect column alignment.
> Look at row 80 below in the truncated snippet of output.  It has an extra field inserted at the beginning.
> I also show what the corrected output should look like.
> 
> Incorrect row 80:
>     71	=================================================
>     72	      Shared Cache Line Distribution Pareto      
>     73	=================================================
>     74	#
>     75	# ----- HITM -----    Snoop  ------- Store Refs ------  ------- CL --------                      
>     76	# RmtHitm  LclHitm     Peer   L1 Hit  L1 Miss      N/A    Off  Node  PA cnt        Code address
>     77	# .......  .......  .......  .......  .......  .......  .....  ....  ......  ..................
>     78	#
>     79	  -------------------------------------------------------------------------------
>     80	      0        0        0     4648        0        0    11572            0x422140
>     81	  -------------------------------------------------------------------------------
>     82	    0.00%    0.00%    0.00%    0.00%    0.00%   44.47%    0x0   N/A       0            0x400ce8
>     83	    0.00%    0.00%   10.26%    0.00%    0.00%    0.00%    0x0   N/A       0            0x400e48
>     84	    0.00%    0.00%    0.00%    0.00%    0.00%   55.53%    0x0   N/A       0            0x400e54
>     85	    0.00%    0.00%   89.74%    0.00%    0.00%    0.00%    0x8   N/A       0            0x401038
> 
> 
> Corrected row 80:
>     71	=================================================
>     72	      Shared Cache Line Distribution Pareto      
>     73	=================================================
>     74	#
>     75	# ----- HITM -----    Snoop  ------- Store Refs -----   ------- CL --------                       
>     76	# RmtHitm  LclHitm     Peer   L1 Hit  L1 Miss     N/A     Off  Node  PA cnt        Code address
>     77	# .......  .......  .......  .......  .......  ......   .....  ....  ......  ..................
>     78	#
>     79	  -------------------------------------------------------------------------------
>     80	       0        0     4648        0        0    11572            0x422140
>     81	  -------------------------------------------------------------------------------
>     82	    0.00%    0.00%    0.00%    0.00%    0.00%   44.47%    0x0   N/A       0            0x400ce8
>     83	    0.00%    0.00%   10.26%    0.00%    0.00%    0.00%    0x0   N/A       0            0x400e48
>     84	    0.00%    0.00%    0.00%    0.00%    0.00%   55.53%    0x0   N/A       0            0x400e54
>     85	    0.00%    0.00%   89.74%    0.00%    0.00%    0.00%    0x8   N/A       0            0x401038

Hmm...  On my side, I used the command below to output the Pareto
view, but I cannot see the "CL" column; the "CL" column is only shown
in TUI mode, not in "--stdio" mode.  Could you share how to reproduce
this issue?

$ ./perf c2c report -i perf.data.v3 -N

=================================================
      Shared Cache Line Distribution Pareto      
=================================================
#
#        ----- HITM -----    Snoop  ------- Store Refs ------  --------- Data address ---------                      --------------- cycles ---------------    Total       cpu                                     Shared                              
#   Num  RmtHitm  LclHitm     Peer   L1 Hit  L1 Miss      N/A              Offset  Node  PA cnt        Code address  rmt hitm  lcl hitm      load      peer  records       cnt                  Symbol             Object             Source:Line  Node{cpus %peers %stores}
# .....  .......  .......  .......  .......  .......  .......  ..................  ....  ......  ..................  ........  ........  ........  ........  .......  ........  ......................  .................  ......................  ....
#
  -------------------------------------------------------------------------------
      0        0        0    56183        0        0    26534            0x420180
  -------------------------------------------------------------------------------
           0.00%    0.00%   99.85%    0.00%    0.00%    0.00%                 0x0   N/A       0            0x400bd0         0         0      1587      4034   188785         2  [.] 0x0000000000000bd0  false_sharing.exe  false_sharing.exe[bd0]   0{ 1  87.4%    n/a}  1{ 1  12.6%    n/a}
           0.00%    0.00%    0.00%    0.00%    0.00%   54.56%                 0x0   N/A       0            0x400bd4         0         0         0         0    14476         2  [.] 0x0000000000000bd4  false_sharing.exe  false_sharing.exe[bd4]   0{ 1    n/a   0.2%}  1{ 1    n/a  99.8%}
           0.00%    0.00%    0.00%    0.00%    0.00%   45.44%                 0x0   N/A       0            0x400bf8         0         0         0         0    12058         2  [.] 0x0000000000000bf8  false_sharing.exe  false_sharing.exe[bf8]   0{ 1    n/a  70.3%}  1{ 1    n/a  29.7%}
           0.00%    0.00%    0.15%    0.00%    0.00%    0.00%                0x20   N/A       0            0x400c64         0         0      2462      2451     4835         2  [.] 0x0000000000000c64  false_sharing.exe  false_sharing.exe[c64]   0{ 1  11.9%    n/a}  1{ 1  88.1%    n/a}

  -------------------------------------------------------------------------------
      1        0        0     2571        0        0    69861            0x420100
  -------------------------------------------------------------------------------
           0.00%    0.00%    0.00%    0.00%    0.00%  100.00%                 0x8   N/A       0            0x400c08         0         0         0         0    69861         2  [.] 0x0000000000000c08  false_sharing.exe  false_sharing.exe[c08]   0{ 1    n/a  62.1%}  1{ 1    n/a  37.9%}
           0.00%    0.00%  100.00%    0.00%    0.00%    0.00%                0x20   N/A       0            0x400c74         0         0       834       641     6576         2  [.] 0x0000000000000c74  false_sharing.exe  false_sharing.exe[c74]   0{ 1  93.2%    n/a}  1{ 1   6.8%    n/a}

I very much appreciate your testing and suggestions!

Leo

