From: Leo Yan <leo.yan@linaro.org>
To: Joe Mario <jmario@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>,
	Ali Saidi <alisaidi@amazon.com>, linux-kernel@vger.kernel.org,
	linux-perf-users@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, german.gomez@arm.com,
	benh@kernel.crashing.org, Nick.Forrington@arm.com,
	alexander.shishkin@linux.intel.com, andrew.kilroy@arm.com,
	james.clark@arm.com, john.garry@huawei.com,
	Jiri Olsa <jolsa@kernel.org>, kjain@linux.ibm.com,
	lihuafei1@huawei.com, mark.rutland@arm.com,
	mathieu.poirier@linaro.org, mingo@redhat.com,
	namhyung@kernel.org, peterz@infradead.org, will@kernel.org
Subject: Re: [PATCH v8 0/4] perf: arm-spe: Decode SPE source and use for perf c2c
Date: Sun, 22 May 2022 14:15:33 +0800
Message-ID: <20220522061533.GA715382@leoy-ThinkPad-X240s>
In-Reply-To: <32e5a3b7-9294-bbd5-0ae4-b5c04eb4e0e6@redhat.com>

Hi Joe,

On Thu, May 19, 2022 at 11:16:53AM -0400, Joe Mario wrote:

[...]

> Hi Leo:
> Thanks for getting this working on ARM.  I do have a few comments.
>
> I built and ran this on an ARM Neoverse-N1 system with 2 NUMA nodes.
>
> Comment 1:
> When I run "perf c2c report", the "Node" field is marked "N/A".  It's
> supposed to show the NUMA node where the data address for the
> cacheline resides.  That's important both to see what node hot data
> resides on and if that data is getting lots of cross-NUMA accesses.

Good catch.  Will fix it.

> Comment 2:
> I'm assuming you're identifying the contended cachelines using the
> "peer" load response, which indicates the load was resolved from a
> "peer" cpu's cacheline.  Please confirm.

Yeah, though "peer" on its own is ambiguous.  AFAIK, a "peer" load can
come from:

- The local node, when the data is in a peer CPU's cache (either the
  same cluster or a peer cluster);
- A remote node, when the data is in a remote CPU's cache, or even in
  *remote DRAM*.

> If that's true, is it possible to identify if that "peer" response
> was on the local or remote NUMA node?

Good point.  Yes, we can do this.  So far, remote accesses are
accounted in the "rmt_hit" metric, which should match the remote peer
loads; but we have no metric yet that accounts for local peer loads.
It would not be hard to add an "lcl_peer" metric.
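To make that concrete, below is a rough, untested sketch of how such
accounting could separate the two cases.  The "lcl_peer"/"rmt_peer"
counter names are hypothetical; PERF_MEM_SNOOPX_PEER is the flag added
by this series, and PERF_MEM_REMOTE_REMOTE already exists in
include/uapi/linux/perf_event.h.  The eventual patch may well look
different:

/*
 * Rough sketch only, not tested.  The counters are hypothetical; the
 * flag tests follow the perf_mem_data_src layout in the uapi header
 * plus the new PERF_MEM_SNOOPX_PEER bit from this series.
 */
#include <linux/types.h>
#include <linux/perf_event.h>

struct peer_stats {
	__u32	lcl_peer;	/* loads served from a peer cache, local node */
	__u32	rmt_peer;	/* loads served from a peer cache, remote node */
};

static void account_peer_load(struct peer_stats *stats,
			      union perf_mem_data_src src)
{
	/* Only snoop responses flagged as peer transfers count here. */
	if (!(src.mem_snoopx & PERF_MEM_SNOOPX_PEER))
		return;

	/* The remote bit separates local from remote peer accesses. */
	if (src.mem_remote == PERF_MEM_REMOTE_REMOTE)
		stats->rmt_peer++;
	else
		stats->lcl_peer++;
}

The "rmt_peer" count could then be cross-checked against the existing
"rmt_hit" accounting mentioned above.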
> I ask because being able to identify both local and remote HitM's on
> Intel X86_64 has been quite valuable.  That's because remote HitM's
> are costly and because it helps the viewer see if they need to
> optimize their cpu affinity or what node their hot data resides on.

Thanks a lot for the info.  This means that, at a minimum, I should
refine the shared cache line distribution pareto for remote peer
accesses; I will do some experiments for that enhancement.

> Last Comment:
> There's a row in the Pareto table that has incorrect column alignment.
> Look at row 80 below in the truncated snippet of output.  It has an
> extra field inserted in it at the beginning.
> I also show what the corrected output should look like.
>
> Incorrect row 80:
> 71  =================================================
> 72        Shared Cache Line Distribution Pareto
> 73  =================================================
> 74  #
> 75  #        ----- HITM -----    Snoop  ------- Store Refs ------  ------- CL --------
> 76  #   RmtHitm  LclHitm     Peer   L1 Hit  L1 Miss      N/A    Off  Node  PA cnt  Code address
> 77  # .......  .......  .......  .......  .......  .......  .....  ....  ......  ..................
> 78  #
> 79    -------------------------------------------------------------------------------
> 80        0        0        0     4648        0        0  11572          0x422140
> 81    -------------------------------------------------------------------------------
> 82     0.00%    0.00%    0.00%    0.00%    0.00%   44.47%    0x0   N/A       0  0x400ce8
> 83     0.00%    0.00%   10.26%    0.00%    0.00%    0.00%    0x0   N/A       0  0x400e48
> 84     0.00%    0.00%    0.00%    0.00%    0.00%   55.53%    0x0   N/A       0  0x400e54
> 85     0.00%    0.00%   89.74%    0.00%    0.00%    0.00%    0x8   N/A       0  0x401038
>
> Corrected row 80:
> 71  =================================================
> 72        Shared Cache Line Distribution Pareto
> 73  =================================================
> 74  #
> 75  #        ----- HITM -----    Snoop  ------- Store Refs -----  ------- CL --------
> 76  #   RmtHitm  LclHitm     Peer   L1 Hit  L1 Miss     N/A    Off  Node  PA cnt  Code address
> 77  # .......  .......  .......  .......  .......  ......  .....  ....  ......  ..................
> 78  #
> 79    -------------------------------------------------------------------------------
> 80        0        0     4648        0        0  11572          0x422140
> 81    -------------------------------------------------------------------------------
> 82     0.00%    0.00%    0.00%    0.00%    0.00%  44.47%    0x0   N/A       0  0x400ce8
> 83     0.00%    0.00%   10.26%    0.00%    0.00%   0.00%    0x0   N/A       0  0x400e48
> 84     0.00%    0.00%    0.00%    0.00%    0.00%  55.53%    0x0   N/A       0  0x400e54
> 85     0.00%    0.00%   89.74%    0.00%    0.00%   0.00%    0x8   N/A       0  0x401038

Hmm... On my side I used the command below to output the pareto view,
but I cannot see the "CL" column: it is only shown in TUI mode, not
with "--stdio".  Could you share how to reproduce this issue?

$ ./perf c2c report -i perf.data.v3 -N

  =================================================
        Shared Cache Line Distribution Pareto
  =================================================
  #
  #        ----- HITM -----    Snoop  ------- Store Refs ------  --------- Data address ---------  --------------- cycles ---------------  Total  cpu  Shared
  #   Num  RmtHitm  LclHitm     Peer   L1 Hit  L1 Miss      N/A  Offset  Node  PA cnt  Code address  rmt hitm  lcl hitm  load  peer  records  cnt  Symbol  Object  Source:Line  Node{cpus %peers %stores}
  # .....  .......  .......  .......  .......  .......  .......  ..................  ....  ......  ..................  ........  ........  ........  ........  .......  ........  ......................  .................  ......................  ....
  #
    -------------------------------------------------------------------------------
        0        0        0    56183        0        0    26534  0x420180
    -------------------------------------------------------------------------------
     0.00%  0.00%   99.85%  0.00%  0.00%    0.00%  0x0   N/A  0  0x400bd0  0  0  1587  4034  188785  2  [.] 0x0000000000000bd0  false_sharing.exe  false_sharing.exe[bd0]  0{ 1  87.4%    n/a}  1{ 1  12.6%    n/a}
     0.00%  0.00%    0.00%  0.00%  0.00%   54.56%  0x0   N/A  0  0x400bd4  0  0     0     0   14476  2  [.] 0x0000000000000bd4  false_sharing.exe  false_sharing.exe[bd4]  0{ 1    n/a   0.2%}  1{ 1    n/a  99.8%}
     0.00%  0.00%    0.00%  0.00%  0.00%   45.44%  0x0   N/A  0  0x400bf8  0  0     0     0   12058  2  [.] 0x0000000000000bf8  false_sharing.exe  false_sharing.exe[bf8]  0{ 1    n/a  70.3%}  1{ 1    n/a  29.7%}
     0.00%  0.00%    0.15%  0.00%  0.00%    0.00%  0x20  N/A  0  0x400c64  0  0  2462  2451    4835  2  [.] 0x0000000000000c64  false_sharing.exe  false_sharing.exe[c64]  0{ 1  11.9%    n/a}  1{ 1  88.1%    n/a}
    -------------------------------------------------------------------------------
        1        0        0     2571        0        0    69861  0x420100
    -------------------------------------------------------------------------------
     0.00%  0.00%    0.00%  0.00%  0.00%  100.00%  0x8   N/A  0  0x400c08  0  0     0     0   69861  2  [.] 0x0000000000000c08  false_sharing.exe  false_sharing.exe[c08]  0{ 1    n/a  62.1%}  1{ 1    n/a  37.9%}
     0.00%  0.00%  100.00%  0.00%  0.00%    0.00%  0x20  N/A  0  0x400c74  0  0   834   641    6576  2  [.] 0x0000000000000c74  false_sharing.exe  false_sharing.exe[c74]  0{ 1  93.2%    n/a}  1{ 1   6.8%    n/a}

I very much appreciate your testing and suggestions!

Leo