All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 00/11] perf c2c: Support display for Arm64
@ 2022-05-08  9:23 Leo Yan
  2022-05-08  9:23 ` [PATCH v2 01/11] perf mem: Add stats for store operation with no available memory level Leo Yan
                   ` (11 more replies)
  0 siblings, 12 replies; 18+ messages in thread
From: Leo Yan @ 2022-05-08  9:23 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Like Xu, Ian Rogers, Alyssa Ross, Kajol Jain, Li Huafei,
	German Gomez, James Clark, Kan Liang, Ali Saidi,
	linux-perf-users, linux-kernel
  Cc: Leo Yan

Arm64 Neoverse CPUs supports data source in Arm SPE trace, this allows
us to detect cache line contention and transfers.

Unlike x86 architecture, Arm SPE trace data cannot provide 'HITM'
snooping flag, Ali Said has a patch set "perf: arm-spe: Decode SPE
source and use for perf c2c" [1] which introduces 'peer' flag and
synthesizes memory samples with this flag.

Based on patch set [1], this patch set is to finish the second half work
to consume the 'peer' flag in perf c2c tool, it adds an extra display
'peer' mode.

Patches 01, 02 and 03 are to support 'N/A' metrics for store operations.

Patches 04 and 05 adds statistics and dimensions for memory samples with
peer flag.

Patches 06, 07, 08 are for refactoring, it refines the code with more
general naming so this can allow us to easier to extend display modes
but not strictly bound to HITM tags.

Patches 09, 10 and 11 are to extend display 'peer' mode, it also updates
the document and also changes to use 'peer' mode as default mode on
Arm64 arches.

This patch set has been verified for both x86 and Arm64 memory samples.

The display result with x86 memory samples:

  =================================================
             Shared Data Cache Line Table          
  =================================================
  #
  #        ----------- Cacheline ----------      Tot  ------- Load Hitm -------    Snoop    Total    Total    Total  --------- Stores --------  ----- Core Load Hit -----  - LLC Load Hit --  - RMT Load Hit --  --- Load Dram ----
  # Index             Address  Node  PA cnt     Hitm    Total  LclHitm  RmtHitm     Peer  records    Loads   Stores    L1Hit   L1Miss      N/A       FB       L1       L2    LclHit  LclHitm    RmtHit  RmtHitm       Lcl       Rmt
  # .....  ..................  ....  ......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  ........  .......  ........  ........
  #
        0      0x55c8971f0080     0    1967   66.14%      252      252        0        0     6044     3550     2494     2024      470        0      528     2672       78        20      252         0        0         0         0
        1      0x55c8971f00c0     0       1   33.86%      129      129        0        0      914      914        0        0        0        0      272      374       52        87      129         0        0         0         0

  =================================================
        Shared Cache Line Distribution Pareto      
  =================================================
  #
  #        ----- HITM -----    Snoop  ------- Store Refs ------  --------- Data address ---------                      --------------- cycles ---------------    Total       cpu                                     Shared                               
  #   Num  RmtHitm  LclHitm     Peer   L1 Hit  L1 Miss      N/A              Offset  Node  PA cnt        Code address  rmt hitm  lcl hitm      load      peer  records       cnt                  Symbol             Object              Source:Line  Node
  # .....  .......  .......  .......  .......  .......  .......  ..................  ....  ......  ..................  ........  ........  ........  ........  .......  ........  ......................  .................  .......................  ....
  #
    -------------------------------------------------------------------------------
        0        0      252        0     2024      470        0      0x55c8971f0080
    -------------------------------------------------------------------------------
             0.00%   12.30%    0.00%    0.00%    0.00%    0.00%                 0x0     0       1      0x55c8971ed3e9         0      1313       863         0     1222         3  [.] 0x00000000000013e9  false_sharing.exe  false_sharing.exe[13e9]   0
             0.00%    0.79%    0.00%   90.51%    0.00%    0.00%                 0x0     0       1      0x55c8971ed3e2         0      1800       878         0     3029         3  [.] 0x00000000000013e2  false_sharing.exe  false_sharing.exe[13e2]   0
             0.00%    0.00%    0.00%    9.49%  100.00%    0.00%                 0x0     0       1      0x55c8971ed3f4         0         0         0         0      662         3  [.] 0x00000000000013f4  false_sharing.exe  false_sharing.exe[13f4]   0
             0.00%   86.90%    0.00%    0.00%    0.00%    0.00%                0x20     0       1      0x55c8971ed447         0       141       103         0     1131         2  [.] 0x0000000000001447  false_sharing.exe  false_sharing.exe[1447]   0

    -------------------------------------------------------------------------------
        1        0      129        0        0        0        0      0x55c8971f00c0
    -------------------------------------------------------------------------------
             0.00%  100.00%    0.00%    0.00%    0.00%    0.00%                0x20     0       1      0x55c8971ed455         0        88        94         0      914         2  [.] 0x0000000000001455  false_sharing.exe  false_sharing.exe[1455]   0


The display result with Arm SPE memory samples:

  =================================================
             Shared Data Cache Line Table          
  =================================================
  #
  #        ----------- Cacheline ----------    Snoop  ------- Load Hitm -------    Snoop    Total    Total    Total  --------- Stores --------  ----- Core Load Hit -----  - LLC Load Hit --  - RMT Load Hit --  --- Load Dram ----
  # Index             Address  Node  PA cnt     Peer    Total  LclHitm  RmtHitm     Peer  records    Loads   Stores    L1Hit   L1Miss      N/A       FB       L1       L2    LclHit  LclHitm    RmtHit  RmtHitm       Lcl       Rmt
  # .....  ..................  ....  ......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  ........  .......  ........  ........
  #
        0      0xaaaac17d6000   N/A       0  100.00%        0        0        0       99    18851    18851        0        0        0        0        0    18752        0         0        0         0        0         0         0

  =================================================
        Shared Cache Line Distribution Pareto      
  =================================================
  #
  #        ----- HITM -----    Snoop  ------- Store Refs ------  --------- Data address ---------                      --------------- cycles ---------------    Total       cpu                                    Shared                       
  #   Num  RmtHitm  LclHitm     Peer   L1 Hit  L1 Miss      N/A              Offset  Node  PA cnt        Code address  rmt hitm  lcl hitm      load      peer  records       cnt                  Symbol            Object      Source:Line  Node
  # .....  .......  .......  .......  .......  .......  .......  ..................  ....  ......  ..................  ........  ........  ........  ........  .......  ........  ......................  ................  ...............  ....
  #
    -------------------------------------------------------------------------------
        0        0        0       99        0        0        0      0xaaaac17d6000
    -------------------------------------------------------------------------------
             0.00%    0.00%    6.06%    0.00%    0.00%    0.00%                0x20   N/A       0      0xaaaac17c25ac         0         0        43       375    18469         2  [.] 0x00000000000025ac  memstress         memstress[25ac]   0
             0.00%    0.00%   93.94%    0.00%    0.00%    0.00%                0x29   N/A       0      0xaaaac17c3e88         0         0       173       180      135         2  [.] 0x0000000000003e88  memstress         memstress[3e88]   0

[1] https://lore.kernel.org/lkml/20220504184850.24986-1-alisaidi@amazon.com/

Changes from v1:
* Update patches 01, 02 and 03 to support 'N/A' metrics for store
  operations, so can align with the patch set [1] for store samples.


Leo Yan (11):
  perf mem: Add stats for store operation with no available memory level
  perf c2c: Add dimensions for 'N/A' metrics of store operation
  perf c2c: Update documentation for store metric 'N/A'
  perf mem: Add statistics for peer snooping
  perf c2c: Add dimensions for peer load operations
  perf c2c: Use explicit names for display macros
  perf c2c: Rename dimension from 'percent_hitm' to
    'percent_costly_snoop'
  perf c2c: Refactor node header
  perf c2c: Sort on peer snooping for load operations
  perf c2c: Update documentation for new display option 'peer'
  perf c2c: Use 'peer' as default display for Arm64

 tools/perf/Documentation/perf-c2c.txt |  30 ++-
 tools/perf/builtin-c2c.c              | 363 ++++++++++++++++++++------
 tools/perf/util/mem-events.c          |  14 +-
 tools/perf/util/mem-events.h          |   2 +
 4 files changed, 323 insertions(+), 86 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-05-22 13:28 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-08  9:23 [PATCH v2 00/11] perf c2c: Support display for Arm64 Leo Yan
2022-05-08  9:23 ` [PATCH v2 01/11] perf mem: Add stats for store operation with no available memory level Leo Yan
2022-05-08  9:23 ` [PATCH v2 02/11] perf c2c: Add dimensions for 'N/A' metrics of store operation Leo Yan
2022-05-08  9:23 ` [PATCH v2 03/11] perf c2c: Update documentation for store metric 'N/A' Leo Yan
2022-05-08  9:23 ` [PATCH v2 04/11] perf mem: Add statistics for peer snooping Leo Yan
2022-05-08  9:23 ` [PATCH v2 05/11] perf c2c: Add dimensions for peer load operations Leo Yan
2022-05-08  9:23 ` [PATCH v2 06/11] perf c2c: Use explicit names for display macros Leo Yan
2022-05-08  9:23 ` [PATCH v2 07/11] perf c2c: Rename dimension from 'percent_hitm' to 'percent_costly_snoop' Leo Yan
2022-05-08  9:23 ` [PATCH v2 08/11] perf c2c: Refactor node header Leo Yan
2022-05-08  9:23 ` [PATCH v2 09/11] perf c2c: Sort on peer snooping for load operations Leo Yan
2022-05-13  9:05   ` [PATCH v2 9/11] " Adam Li
2022-05-18  6:12     ` Leo Yan
2022-05-19  9:06       ` Adam Li
2022-05-22 13:27         ` Leo Yan
2022-05-08  9:23 ` [PATCH v2 10/11] perf c2c: Update documentation for new display option 'peer' Leo Yan
2022-05-08  9:23 ` [PATCH v2 11/11] perf c2c: Use 'peer' as default display for Arm64 Leo Yan
2022-05-19 14:19 ` [PATCH v2 00/11] perf c2c: Support " James Clark
2022-05-22  6:28   ` Leo Yan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.