linux-kernel.vger.kernel.org archive mirror
From: Adam Li <adamli@os.amperecomputing.com>
To: Leo Yan <leo.yan@linaro.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>, Namhyung Kim <namhyung@kernel.org>,
	Like Xu <likexu@tencent.com>, Ian Rogers <irogers@google.com>,
	Alyssa Ross <hi@alyssa.is>, Kajol Jain <kjain@linux.ibm.com>,
	Li Huafei <lihuafei1@huawei.com>,
	German Gomez <german.gomez@arm.com>,
	James Clark <james.clark@arm.com>,
	Kan Liang <kan.liang@linux.intel.com>,
	Ali Saidi <alisaidi@amazon.com>,
	linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 9/11] perf c2c: Sort on peer snooping for load operations
Date: Thu, 19 May 2022 17:06:18 +0800
Message-ID: <e87bc3d7-6664-a1b7-faee-6117aa1d121c@os.amperecomputing.com>
In-Reply-To: <20220518061221.GA430350@leoy-ThinkPad-X240s>

Hi Leo,

Thanks for the update.
On 5/18/2022 2:12 PM, Leo Yan wrote:
 
> Please note, in the total statistics, all remote accesses will be
> accounted into metric "rmt_hit", so "rmt_hit" includes accesses to
> remote DRAM or any upward cache levels, because we cannot distinguish
> them.
>

I agree that "Load Remote HIT" makes more sense than "Load Remote DRAM".
 
> From my experiment, with this update the output looks promising
> for peer accesses, and it is easier to inspect false sharing.
> 
> As you might see I have prepared a git repo:
> https://git.linaro.org/people/leo.yan/linux-spe.git/ branch:
> perf_c2c_arm_spe_peer_v3, which contains the updated patches for both
> memory flag setting and perf c2c related patches.
> 
> Could you confirm if the updated code works for you or not?
> 

I tested the v3 patches (branch perf_c2c_arm_spe_peer_v3) on a 2P Altra system.

Compared with v2, "Snoop Peer" is a better indicator of cache false sharing
for the 'false_sharing.exe' test case.

Below are the details:

# perf c2c record -- numactl -m 0 ./false_sharing.exe 2
183 mticks, reader_thd (thread 2), on node 0 (cpu 78).
195 mticks, reader_thd (thread 3), on node 1 (cpu 124).
546 mticks, lock_th (thread 0), on node 0 (cpu 0).
562 mticks, lock_th (thread 1), on node 1 (cpu 123).
[ perf record: Woken up 36 times to write data ]
[ perf record: Captured and wrote 72.440 MB perf.data ]

# perf c2c report -d peer --coalesce tid,pid,iaddr,dso -N --stdio
Warning:
Arm SPE CONTEXT packets not found in the traces.
Matching of TIDs to SPE events could be inaccurate.
Warning:
AUX data detected collision  20 times out of 168!

=================================================
  Total records                     :    1198728
  Locked Load/Store Operations      :          0
  Load Operations                   :    1031196
  Loads - uncacheable               :          0
  Loads - IO                        :          0
  Loads - Miss                      :          0
  Loads - no mapping                :          0
  Load Fill Buffer Hit              :          0
  Load L1D hit                      :     970636
  Load L2D hit                      :        292
  Load LLC hit                      :       2477
  Load Local HITM                   :          0
  Load Remote HITM                  :          0
  Load Remote HIT                   :      56459
  Load Local DRAM                   :       1332
  Load Remote DRAM                  :          0
  Load MESI State Exclusive         :       1332
  Load MESI State Shared            :          0
  Load LLC Misses                   :      57791
  Load access blocked by data       :          0
  Load access blocked by address    :          0
  Load HIT Peer                     :      58814
  LLC Misses to Local DRAM          :        2.3%
  LLC Misses to Remote DRAM         :        0.0%
  LLC Misses to Remote cache (HIT)  :       97.7%
  LLC Misses to Remote cache (HITM) :        0.0%
  Store Operations                  :     167532
  Store - uncacheable               :          0
  Store - no mapping                :          0
  Store L1D Hit                     :          0
  Store L1D Miss                    :          0
  Store No available memory level   :     167532
  No Page Map Rejects               :       1234
  Unable to parse data source       :          0

=================================================
    Global Shared Cache Line Event Information
=================================================
  Total Shared Cache Lines          :         45
  Load HITs on shared lines         :     226254
  Fill Buffer Hits on shared lines  :          0
  L1D hits on shared lines          :     166010
  L2D hits on shared lines          :          4
  Load HITs on peer cache lines     :      58814
  LLC hits on shared lines          :       2455
  Locked Access on shared lines     :          0
  Blocked Access on shared lines    :          0
  Store HITs on shared lines        :      96403
  Store L1D hits on shared lines    :          0
  Store No available memory level   :      96403
  Total Merged records              :      96403

=================================================
                 c2c details
=================================================
  Events                            : arm_spe_0/ts_enable=1,load_filter=1,store_filter=1,min_latency=30/
                                    : dummy:u
                                    : memory
  Cachelines sort on                : Snoop Peers
  Cacheline data grouping           : offset,tid,pid,iaddr,dso

=================================================
           Shared Data Cache Line Table
=================================================
#
#        ----------- Cacheline ----------    Snoop  ------- Load Hitm -------    Snoop    Total    Total    Total  --------- Stores --------  ----- Core Load Hit -----  - LLC Load Hit --  - RMT Load Hit --  --- Load Dram ----
# Index             Address  Node  PA cnt     Peer    Total  LclHitm  RmtHitm     Peer  records    Loads   Stores    L1Hit   L1Miss      N/A       FB       L1       L2    LclHit  LclHitm    RmtHit  RmtHitm       Lcl       Rmt
# .....  ..................  ....  ......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  ........  .......  ........  ........
#
      0            0x420180   N/A       0   95.53%        0        0        0    56183   246056   219522    26534        0        0    26534        0   161914        0       106        0     56176        0      1326         0
      1            0x420100   N/A       0    4.37%        0        0        0     2571    76437     6576    69861        0        0    69861        0     4005        0      2335        0       236        0         0         0
[...]
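
As a sanity check on the numbers above, here is a minimal Python sketch that
recomputes the percentages from the raw counts in the report. It assumes that
the "LLC Misses to ..." percentages are taken over "Load LLC Misses", and that
the "Snoop Peer" percentage of a cache line is its peer count divided by the
total "Load HIT Peer". Both are my reading of the output, not something I
checked in the perf source. The variable names and the pct() helper are
hypothetical.

```python
# Counts copied from the "perf c2c report" output above.
load_remote_hit = 56459
load_local_dram = 1332
load_llc_misses = 57791
load_hit_peer = 58814

# "Snoop Peer" counts for the two hottest cache lines in the table.
peer_by_line = {"0x420180": 56183, "0x420100": 2571}


def pct(part, whole, digits):
    """Return part/whole as a percentage rounded to `digits` decimals."""
    return round(100.0 * part / whole, digits)


# Assumption: LLC misses are the sum of remote hits and local DRAM loads.
llc_miss_sum = load_remote_hit + load_local_dram

# Assumption: these percentages are relative to "Load LLC Misses".
local_dram_pct = pct(load_local_dram, load_llc_misses, 1)
remote_hit_pct = pct(load_remote_hit, load_llc_misses, 1)

# Assumption: per-line "Snoop Peer" % is relative to "Load HIT Peer".
peer_pct = {addr: pct(cnt, load_hit_peer, 2) for addr, cnt in peer_by_line.items()}

print(llc_miss_sum, local_dram_pct, remote_hit_pct, peer_pct)
```

Under these assumptions the recomputed values agree with the report: the two
cache lines at 0x420180 and 0x420100 account for nearly all of the peer
snoops, which is what I would expect for this false-sharing test.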

Thanks,
-adam

Thread overview: 18+ messages
2022-05-08  9:23 [PATCH v2 00/11] perf c2c: Support display for Arm64 Leo Yan
2022-05-08  9:23 ` [PATCH v2 01/11] perf mem: Add stats for store operation with no available memory level Leo Yan
2022-05-08  9:23 ` [PATCH v2 02/11] perf c2c: Add dimensions for 'N/A' metrics of store operation Leo Yan
2022-05-08  9:23 ` [PATCH v2 03/11] perf c2c: Update documentation for store metric 'N/A' Leo Yan
2022-05-08  9:23 ` [PATCH v2 04/11] perf mem: Add statistics for peer snooping Leo Yan
2022-05-08  9:23 ` [PATCH v2 05/11] perf c2c: Add dimensions for peer load operations Leo Yan
2022-05-08  9:23 ` [PATCH v2 06/11] perf c2c: Use explicit names for display macros Leo Yan
2022-05-08  9:23 ` [PATCH v2 07/11] perf c2c: Rename dimension from 'percent_hitm' to 'percent_costly_snoop' Leo Yan
2022-05-08  9:23 ` [PATCH v2 08/11] perf c2c: Refactor node header Leo Yan
2022-05-08  9:23 ` [PATCH v2 09/11] perf c2c: Sort on peer snooping for load operations Leo Yan
2022-05-13  9:05   ` [PATCH v2 9/11] " Adam Li
2022-05-18  6:12     ` Leo Yan
2022-05-19  9:06       ` Adam Li [this message]
2022-05-22 13:27         ` Leo Yan
2022-05-08  9:23 ` [PATCH v2 10/11] perf c2c: Update documentation for new display option 'peer' Leo Yan
2022-05-08  9:23 ` [PATCH v2 11/11] perf c2c: Use 'peer' as default display for Arm64 Leo Yan
2022-05-19 14:19 ` [PATCH v2 00/11] perf c2c: Support " James Clark
2022-05-22  6:28   ` Leo Yan
