* [PATCH v3 00/11] perf c2c: Support display for Arm64
@ 2022-05-18  5:57 Leo Yan
  2022-05-18  5:57 ` [PATCH v3 01/11] perf mem: Add stats for store operation with no available memory level Leo Yan
                   ` (11 more replies)
  0 siblings, 12 replies; 16+ messages in thread
From: Leo Yan @ 2022-05-18  5:57 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Like Xu, Alyssa Ross, Ian Rogers, Kajol Jain, Adam Li, Li Huafei,
	German Gomez, James Clark, Kan Liang, Ali Saidi, Joe Mario,
	linux-perf-users, linux-kernel
  Cc: Leo Yan

Arm64 Neoverse CPUs support data source in Arm SPE trace, which allows
us to detect cache line contention and transfers.

Unlike the x86 architecture, Arm SPE trace data cannot provide the
'HITM' snooping flag.  Ali Saidi has a patch set v9 "perf: arm-spe:
Decode SPE source and use for perf c2c" [1] which introduces the 'peer'
flag and synthesizes memory samples with this flag.

Based on patch set [1], this patch set finishes the second half of the
work: it consumes the 'peer' flag in the perf c2c tool and adds an extra
'peer' display mode.

Patches 01, 02 and 03 are to support 'N/A' metrics for store operations.

Patches 04 and 05 add statistics and dimensions for memory samples with
the peer flag.

Patches 06, 07 and 08 are refactoring: they refine the code with more
general naming, which makes it easier to extend the display modes so
that they are not strictly bound to HITM tags.

Patches 09, 10 and 11 add the 'peer' display mode, update the
documentation, and make 'peer' the default display mode on Arm64.

This patch set has been verified for both x86 and Arm64 memory samples.

The display result with x86 memory samples:

  =================================================
             Shared Data Cache Line Table          
  =================================================
  #
  #        ----------- Cacheline ----------      Tot  ------- Load Hitm -------    Snoop    Total    Total    Total  --------- Stores --------  ----- Core Load Hit -----  - LLC Load Hit --  - RMT Load Hit --  --- Load Dram ----
  # Index             Address  Node  PA cnt     Hitm    Total  LclHitm  RmtHitm     Peer  records    Loads   Stores    L1Hit   L1Miss      N/A       FB       L1       L2    LclHit  LclHitm    RmtHit  RmtHitm       Lcl       Rmt
  # .....  ..................  ....  ......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  ........  .......  ........  ........
  #
        0      0x55c8971f0080     0    1967   66.14%      252      252        0        0     6044     3550     2494     2024      470        0      528     2672       78        20      252         0        0         0         0
        1      0x55c8971f00c0     0       1   33.86%      129      129        0        0      914      914        0        0        0        0      272      374       52        87      129         0        0         0         0

  =================================================
        Shared Cache Line Distribution Pareto      
  =================================================
  #
  #        ----- HITM -----    Snoop  ------- Store Refs ------  --------- Data address ---------                      --------------- cycles ---------------    Total       cpu                                     Shared                               
  #   Num  RmtHitm  LclHitm     Peer   L1 Hit  L1 Miss      N/A              Offset  Node  PA cnt        Code address  rmt hitm  lcl hitm      load      peer  records       cnt                  Symbol             Object              Source:Line  Node
  # .....  .......  .......  .......  .......  .......  .......  ..................  ....  ......  ..................  ........  ........  ........  ........  .......  ........  ......................  .................  .......................  ....
  #
    -------------------------------------------------------------------------------
        0        0      252        0     2024      470        0      0x55c8971f0080
    -------------------------------------------------------------------------------
             0.00%   12.30%    0.00%    0.00%    0.00%    0.00%                 0x0     0       1      0x55c8971ed3e9         0      1313       863         0     1222         3  [.] 0x00000000000013e9  false_sharing.exe  false_sharing.exe[13e9]   0
             0.00%    0.79%    0.00%   90.51%    0.00%    0.00%                 0x0     0       1      0x55c8971ed3e2         0      1800       878         0     3029         3  [.] 0x00000000000013e2  false_sharing.exe  false_sharing.exe[13e2]   0
             0.00%    0.00%    0.00%    9.49%  100.00%    0.00%                 0x0     0       1      0x55c8971ed3f4         0         0         0         0      662         3  [.] 0x00000000000013f4  false_sharing.exe  false_sharing.exe[13f4]   0
             0.00%   86.90%    0.00%    0.00%    0.00%    0.00%                0x20     0       1      0x55c8971ed447         0       141       103         0     1131         2  [.] 0x0000000000001447  false_sharing.exe  false_sharing.exe[1447]   0

    -------------------------------------------------------------------------------
        1        0      129        0        0        0        0      0x55c8971f00c0
    -------------------------------------------------------------------------------
             0.00%  100.00%    0.00%    0.00%    0.00%    0.00%                0x20     0       1      0x55c8971ed455         0        88        94         0      914         2  [.] 0x0000000000001455  false_sharing.exe  false_sharing.exe[1455]   0


The display result with Arm SPE memory samples:

  =================================================
             Shared Data Cache Line Table          
  =================================================
  #
  #        ----------- Cacheline ----------    Snoop  ------- Load Hitm -------    Snoop    Total    Total    Total  --------- Stores --------  ----- Core Load Hit -----  - LLC Load Hit --  - RMT Load Hit --  --- Load Dram ----
  # Index             Address  Node  PA cnt     Peer    Total  LclHitm  RmtHitm     Peer  records    Loads   Stores    L1Hit   L1Miss      N/A       FB       L1       L2    LclHit  LclHitm    RmtHit  RmtHitm       Lcl       Rmt
  # .....  ..................  ....  ......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  ........  .......  ........  ........
  #
        0      0xaaaac17d6000   N/A       0  100.00%        0        0        0       99    18851    18851        0        0        0        0        0    18752        0        99        0         0        0         0         0

  =================================================
        Shared Cache Line Distribution Pareto      
  =================================================
  #
  #        ----- HITM -----    Snoop  ------- Store Refs ------  --------- Data address ---------                      --------------- cycles ---------------    Total       cpu                                    Shared                       
  #   Num  RmtHitm  LclHitm     Peer   L1 Hit  L1 Miss      N/A              Offset  Node  PA cnt        Code address  rmt hitm  lcl hitm      load      peer  records       cnt                  Symbol            Object      Source:Line  Node
  # .....  .......  .......  .......  .......  .......  .......  ..................  ....  ......  ..................  ........  ........  ........  ........  .......  ........  ......................  ................  ...............  ....
  #
    -------------------------------------------------------------------------------
        0        0        0       99        0        0        0      0xaaaac17d6000
    -------------------------------------------------------------------------------
             0.00%    0.00%    6.06%    0.00%    0.00%    0.00%                0x20   N/A       0      0xaaaac17c25ac         0         0        43       375    18469         2  [.] 0x00000000000025ac  memstress         memstress[25ac]   0
             0.00%    0.00%   93.94%    0.00%    0.00%    0.00%                0x29   N/A       0      0xaaaac17c3e88         0         0       173       180      135         2  [.] 0x0000000000003e88  memstress         memstress[3e88]   0

[1] https://lore.kernel.org/lkml/20220517020326.18580-1-alisaidi@amazon.com/

Changes from v2:
* Updated patch 04 to account both the cache level metrics and ld_peer
  when the PEER flag is set;
* Updated the documentation for metric 'rmt_hit', which is accounted
  for all remote accesses (including remote DRAM and any upward caches).

Changes from v1:
* Updated patches 01, 02 and 03 to support 'N/A' metrics for store
  operations, to align with the patch set [1] for store samples.


Leo Yan (11):
  perf mem: Add stats for store operation with no available memory level
  perf c2c: Add dimensions for 'N/A' metrics of store operation
  perf c2c: Update documentation for store metric 'N/A'
  perf mem: Add statistics for peer snooping
  perf c2c: Add dimensions for peer load operations
  perf c2c: Use explicit names for display macros
  perf c2c: Rename dimension from 'percent_hitm' to
    'percent_costly_snoop'
  perf c2c: Refactor node header
  perf c2c: Sort on peer snooping for load operations
  perf c2c: Update documentation for new display option 'peer'
  perf c2c: Use 'peer' as default display for Arm64

 tools/perf/Documentation/perf-c2c.txt |  34 ++-
 tools/perf/builtin-c2c.c              | 357 ++++++++++++++++++++------
 tools/perf/util/mem-events.c          |  25 +-
 tools/perf/util/mem-events.h          |   2 +
 4 files changed, 331 insertions(+), 87 deletions(-)

-- 
2.25.1



* [PATCH v3 01/11] perf mem: Add stats for store operation with no available memory level
  2022-05-18  5:57 [PATCH v3 00/11] perf c2c: Support display for Arm64 Leo Yan
@ 2022-05-18  5:57 ` Leo Yan
  2022-05-18  5:57 ` [PATCH v3 02/11] perf c2c: Add dimensions for 'N/A' metrics of store operation Leo Yan
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 16+ messages in thread
From: Leo Yan @ 2022-05-18  5:57 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Like Xu, Alyssa Ross, Ian Rogers, Kajol Jain, Adam Li, Li Huafei,
	German Gomez, James Clark, Kan Liang, Ali Saidi, Joe Mario,
	linux-perf-users, linux-kernel
  Cc: Leo Yan

Sometimes we don't know exactly which memory (or cache) level a store
operation happens on; the memory level flag is set to PERF_MEM_LVL_NA
in this case.  A practical example is Arm SPE AUX trace, which sets this
flag for all store operations because cache level info is absent.

This patch adds a new item "st_na" to structure c2c_stats to count
store operations with no available cache level.
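
For illustration only, here is a minimal standalone sketch of the
counting described above.  The names prefixed with sk_/SK_ are
hypothetical stand-ins, not the perf structures or the PERF_MEM_LVL_*
definitions, and the bit values are assumptions made for the sketch.

```c
#include <stdint.h>

/* Illustrative stand-ins for the PERF_MEM_LVL_* bits (values assumed). */
enum {
	SK_LVL_NA   = 1u << 0,	/* memory level not available */
	SK_LVL_HIT  = 1u << 1,
	SK_LVL_MISS = 1u << 2,
	SK_LVL_L1   = 1u << 3,
};

/* Hypothetical subset of the store counters kept per cache line. */
struct sk_store_stats {
	uint32_t store;		/* all stores */
	uint32_t st_l1hit;	/* stores that hit L1 */
	uint32_t st_l1miss;	/* stores that missed L1 */
	uint32_t st_na;		/* stores with no available memory level */
};

/* Account one store sample according to its memory level flags. */
static void sk_count_store(struct sk_store_stats *stats, uint64_t lvl)
{
	stats->store++;

	if ((lvl & SK_LVL_HIT) && (lvl & SK_LVL_L1))
		stats->st_l1hit++;
	if ((lvl & SK_LVL_MISS) && (lvl & SK_LVL_L1))
		stats->st_l1miss++;
	if (lvl & SK_LVL_NA)
		stats->st_na++;
}
```

A trace that only ever reports SK_LVL_NA for stores would therefore
leave st_l1hit and st_l1miss at zero and put every store into st_na.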

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/mem-events.c | 3 +++
 tools/perf/util/mem-events.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index db5225caaabe..5dca1882c284 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -621,6 +621,8 @@ do {				\
 		}
 		if (lvl & P(LVL, MISS))
 			if (lvl & P(LVL, L1)) stats->st_l1miss++;
+		if (lvl & P(LVL, NA))
+			stats->st_na++;
 	} else {
 		/* unparsable data_src? */
 		stats->noparse++;
@@ -647,6 +649,7 @@ void c2c_add_stats(struct c2c_stats *stats, struct c2c_stats *add)
 	stats->st_noadrs	+= add->st_noadrs;
 	stats->st_l1hit		+= add->st_l1hit;
 	stats->st_l1miss	+= add->st_l1miss;
+	stats->st_na		+= add->st_na;
 	stats->load		+= add->load;
 	stats->ld_excl		+= add->ld_excl;
 	stats->ld_shared	+= add->ld_shared;
diff --git a/tools/perf/util/mem-events.h b/tools/perf/util/mem-events.h
index 916242f8020a..8a8b568baeee 100644
--- a/tools/perf/util/mem-events.h
+++ b/tools/perf/util/mem-events.h
@@ -63,6 +63,7 @@ struct c2c_stats {
 	u32	st_noadrs;           /* cacheable store with no address */
 	u32	st_l1hit;            /* count of stores that hit L1D */
 	u32	st_l1miss;           /* count of stores that miss L1D */
+	u32	st_na;               /* count of stores with memory level is not available */
 	u32	load;                /* count of all loads in trace */
 	u32	ld_excl;             /* exclusive loads, rmt/lcl DRAM - snp none/miss */
 	u32	ld_shared;           /* shared loads, rmt/lcl DRAM - snp hit */
-- 
2.25.1



* [PATCH v3 02/11] perf c2c: Add dimensions for 'N/A' metrics of store operation
  2022-05-18  5:57 [PATCH v3 00/11] perf c2c: Support display for Arm64 Leo Yan
  2022-05-18  5:57 ` [PATCH v3 01/11] perf mem: Add stats for store operation with no available memory level Leo Yan
@ 2022-05-18  5:57 ` Leo Yan
  2022-05-18  5:57 ` [PATCH v3 03/11] perf c2c: Update documentation for store metric 'N/A' Leo Yan
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 16+ messages in thread
From: Leo Yan @ 2022-05-18  5:57 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Like Xu, Alyssa Ross, Ian Rogers, Kajol Jain, Adam Li, Li Huafei,
	German Gomez, James Clark, Kan Liang, Ali Saidi, Joe Mario,
	linux-perf-users, linux-kernel
  Cc: Leo Yan

Now that we have the 'st_na' statistic for store operations, add
dimensions for the 'N/A' (no available memory level) metrics and the
associated percentage calculation for the single cache line view.
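
As a rough sketch of the percentage calculation, assuming the value
shown for an offset is that offset's share of the cache line's total
for the same counter (sk_percent is a hypothetical helper, not a
function from builtin-c2c.c):

```c
#include <stdint.h>

/*
 * Share of 'part' in 'total', as a percentage.  A zero total yields 0
 * so that cache lines without any such stores do not divide by zero.
 */
static double sk_percent(uint32_t part, uint32_t total)
{
	return total > 0 ? (double)part * 100.0 / total : 0.0;
}
```

Under that assumption, an offset holding 1 of a cache line's 4 'N/A'
stores would be reported as 25.00% in the 'N/A' column.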

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/builtin-c2c.c | 80 ++++++++++++++++++++++++++++++++++++----
 1 file changed, 73 insertions(+), 7 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index fbbed434014f..c8230c48125f 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -653,6 +653,7 @@ STAT_FN(lcl_hitm)
 STAT_FN(store)
 STAT_FN(st_l1hit)
 STAT_FN(st_l1miss)
+STAT_FN(st_na)
 STAT_FN(ld_fbhit)
 STAT_FN(ld_l1hit)
 STAT_FN(ld_l2hit)
@@ -677,7 +678,8 @@ static uint64_t total_records(struct c2c_stats *stats)
 
 	total    = ldcnt +
 		   stats->st_l1hit +
-		   stats->st_l1miss;
+		   stats->st_l1miss +
+		   stats->st_na;
 
 	return total;
 }
@@ -899,6 +901,7 @@ PERCENT_FN(rmt_hitm)
 PERCENT_FN(lcl_hitm)
 PERCENT_FN(st_l1hit)
 PERCENT_FN(st_l1miss)
+PERCENT_FN(st_na)
 
 static int
 percent_rmt_hitm_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
@@ -1024,6 +1027,37 @@ percent_stores_l1miss_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
 	return per_left - per_right;
 }
 
+static int
+percent_stores_na_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
+			struct hist_entry *he)
+{
+	int width = c2c_width(fmt, hpp, he->hists);
+	double per = PERCENT(he, st_na);
+	char buf[10];
+
+	return scnprintf(hpp->buf, hpp->size, "%*s", width, PERC_STR(buf, per));
+}
+
+static int
+percent_stores_na_color(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
+			struct hist_entry *he)
+{
+	return percent_color(fmt, hpp, he, percent_st_na);
+}
+
+static int64_t
+percent_stores_na_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
+		      struct hist_entry *left, struct hist_entry *right)
+{
+	double per_left;
+	double per_right;
+
+	per_left  = PERCENT(left, st_na);
+	per_right = PERCENT(right, st_na);
+
+	return per_left - per_right;
+}
+
 STAT_FN(lcl_dram)
 STAT_FN(rmt_dram)
 
@@ -1351,7 +1385,7 @@ static struct c2c_dimension dim_tot_stores = {
 };
 
 static struct c2c_dimension dim_stores_l1hit = {
-	.header		= HEADER_SPAN("---- Stores ----", "L1Hit", 1),
+	.header		= HEADER_SPAN("--------- Stores --------", "L1Hit", 2),
 	.name		= "stores_l1hit",
 	.cmp		= st_l1hit_cmp,
 	.entry		= st_l1hit_entry,
@@ -1366,8 +1400,16 @@ static struct c2c_dimension dim_stores_l1miss = {
 	.width		= 7,
 };
 
+static struct c2c_dimension dim_stores_na = {
+	.header		= HEADER_SPAN_LOW("N/A"),
+	.name		= "stores_na",
+	.cmp		= st_na_cmp,
+	.entry		= st_na_entry,
+	.width		= 7,
+};
+
 static struct c2c_dimension dim_cl_stores_l1hit = {
-	.header		= HEADER_SPAN("-- Store Refs --", "L1 Hit", 1),
+	.header		= HEADER_SPAN("------- Store Refs ------", "L1 Hit", 2),
 	.name		= "cl_stores_l1hit",
 	.cmp		= st_l1hit_cmp,
 	.entry		= st_l1hit_entry,
@@ -1382,6 +1424,14 @@ static struct c2c_dimension dim_cl_stores_l1miss = {
 	.width		= 7,
 };
 
+static struct c2c_dimension dim_cl_stores_na = {
+	.header		= HEADER_SPAN_LOW("N/A"),
+	.name		= "cl_stores_na",
+	.cmp		= st_na_cmp,
+	.entry		= st_na_entry,
+	.width		= 7,
+};
+
 static struct c2c_dimension dim_ld_fbhit = {
 	.header		= HEADER_SPAN("----- Core Load Hit -----", "FB", 2),
 	.name		= "ld_fbhit",
@@ -1471,7 +1521,7 @@ static struct c2c_dimension dim_percent_lcl_hitm = {
 };
 
 static struct c2c_dimension dim_percent_stores_l1hit = {
-	.header		= HEADER_SPAN("-- Store Refs --", "L1 Hit", 1),
+	.header		= HEADER_SPAN("------- Store Refs ------", "L1 Hit", 2),
 	.name		= "percent_stores_l1hit",
 	.cmp		= percent_stores_l1hit_cmp,
 	.entry		= percent_stores_l1hit_entry,
@@ -1488,6 +1538,15 @@ static struct c2c_dimension dim_percent_stores_l1miss = {
 	.width		= 7,
 };
 
+static struct c2c_dimension dim_percent_stores_na = {
+	.header		= HEADER_SPAN_LOW("N/A"),
+	.name		= "percent_stores_na",
+	.cmp		= percent_stores_na_cmp,
+	.entry		= percent_stores_na_entry,
+	.color		= percent_stores_na_color,
+	.width		= 7,
+};
+
 static struct c2c_dimension dim_dram_lcl = {
 	.header		= HEADER_SPAN("--- Load Dram ----", "Lcl", 1),
 	.name		= "dram_lcl",
@@ -1618,8 +1677,10 @@ static struct c2c_dimension *dimensions[] = {
 	&dim_tot_stores,
 	&dim_stores_l1hit,
 	&dim_stores_l1miss,
+	&dim_stores_na,
 	&dim_cl_stores_l1hit,
 	&dim_cl_stores_l1miss,
+	&dim_cl_stores_na,
 	&dim_ld_fbhit,
 	&dim_ld_l1hit,
 	&dim_ld_l2hit,
@@ -1632,6 +1693,7 @@ static struct c2c_dimension *dimensions[] = {
 	&dim_percent_lcl_hitm,
 	&dim_percent_stores_l1hit,
 	&dim_percent_stores_l1miss,
+	&dim_percent_stores_na,
 	&dim_dram_lcl,
 	&dim_dram_rmt,
 	&dim_pid,
@@ -2149,6 +2211,7 @@ static void print_c2c__display_stats(FILE *out)
 	fprintf(out, "  Store - no mapping                : %10d\n", stats->st_noadrs);
 	fprintf(out, "  Store L1D Hit                     : %10d\n", stats->st_l1hit);
 	fprintf(out, "  Store L1D Miss                    : %10d\n", stats->st_l1miss);
+	fprintf(out, "  Store No available memory level   : %10d\n", stats->st_na);
 	fprintf(out, "  No Page Map Rejects               : %10d\n", stats->nomap);
 	fprintf(out, "  Unable to parse data source       : %10d\n", stats->noparse);
 }
@@ -2171,6 +2234,7 @@ static void print_shared_cacheline_info(FILE *out)
 	fprintf(out, "  Blocked Access on shared lines    : %10d\n", stats->blk_data + stats->blk_addr);
 	fprintf(out, "  Store HITs on shared lines        : %10d\n", stats->store);
 	fprintf(out, "  Store L1D hits on shared lines    : %10d\n", stats->st_l1hit);
+	fprintf(out, "  Store No available memory level   : %10d\n", stats->st_na);
 	fprintf(out, "  Total Merged records              : %10d\n", hitm_cnt + stats->store);
 }
 
@@ -2193,10 +2257,10 @@ static void print_cacheline(struct c2c_hists *c2c_hists,
 		fprintf(out, "\n");
 	}
 
-	fprintf(out, "  -------------------------------------------------------------\n");
+	fprintf(out, "  ----------------------------------------------------------------------\n");
 	__hist_entry__snprintf(he_cl, &hpp, hpp_list);
 	fprintf(out, "%s\n", bf);
-	fprintf(out, "  -------------------------------------------------------------\n");
+	fprintf(out, "  ----------------------------------------------------------------------\n");
 
 	hists__fprintf(&c2c_hists->hists, false, 0, 0, 0, out, false);
 }
@@ -2213,6 +2277,7 @@ static void print_pareto(FILE *out)
 		    "cl_lcl_hitm,"
 		    "cl_stores_l1hit,"
 		    "cl_stores_l1miss,"
+		    "cl_stores_na,"
 		    "dcacheline";
 
 	perf_hpp_list__init(&hpp_list);
@@ -2664,6 +2729,7 @@ static int build_cl_output(char *cl_sort, bool no_source)
 		"percent_lcl_hitm,"
 		"percent_stores_l1hit,"
 		"percent_stores_l1miss,"
+		"percent_stores_na,"
 		"offset,offset_node,dcacheline_count,",
 		add_pid   ? "pid," : "",
 		add_tid   ? "tid," : "",
@@ -2850,7 +2916,7 @@ static int perf_c2c__report(int argc, const char **argv)
 		     "tot_recs,"
 		     "tot_loads,"
 		     "tot_stores,"
-		     "stores_l1hit,stores_l1miss,"
+		     "stores_l1hit,stores_l1miss,stores_na,"
 		     "ld_fbhit,ld_l1hit,ld_l2hit,"
 		     "ld_lclhit,lcl_hitm,"
 		     "ld_rmthit,rmt_hitm,"
-- 
2.25.1



* [PATCH v3 03/11] perf c2c: Update documentation for store metric 'N/A'
  2022-05-18  5:57 [PATCH v3 00/11] perf c2c: Support display for Arm64 Leo Yan
  2022-05-18  5:57 ` [PATCH v3 01/11] perf mem: Add stats for store operation with no available memory level Leo Yan
  2022-05-18  5:57 ` [PATCH v3 02/11] perf c2c: Add dimensions for 'N/A' metrics of store operation Leo Yan
@ 2022-05-18  5:57 ` Leo Yan
  2022-05-18  5:57 ` [PATCH v3 04/11] perf mem: Add statistics for peer snooping Leo Yan
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 16+ messages in thread
From: Leo Yan @ 2022-05-18  5:57 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Like Xu, Alyssa Ross, Ian Rogers, Kajol Jain, Adam Li, Li Huafei,
	German Gomez, James Clark, Kan Liang, Ali Saidi, Joe Mario,
	linux-perf-users, linux-kernel
  Cc: Leo Yan

The 'N/A' metric has been added for store operations; update the
documentation to reflect the changes in the report table.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/Documentation/perf-c2c.txt | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/tools/perf/Documentation/perf-c2c.txt b/tools/perf/Documentation/perf-c2c.txt
index 3b6a2c84ea02..6f69173731aa 100644
--- a/tools/perf/Documentation/perf-c2c.txt
+++ b/tools/perf/Documentation/perf-c2c.txt
@@ -189,9 +189,10 @@ For each cacheline in the 1) list we display following data:
   Total stores
   - sum of all store accesses
 
-  Store Reference - L1Hit, L1Miss
+  Store Reference - L1Hit, L1Miss, N/A
     L1Hit - store accesses that hit L1
     L1Miss - store accesses that missed L1
+    N/A - store accesses with memory level is not available
 
   Core Load Hit - FB, L1, L2
   - count of load hits in FB (Fill Buffer), L1 and L2 cache
@@ -210,8 +211,9 @@ For each offset in the 2) list we display following data:
   HITM - Rmt, Lcl
   - % of Remote/Local HITM accesses for given offset within cacheline
 
-  Store Refs - L1 Hit, L1 Miss
-  - % of store accesses that hit/missed L1 for given offset within cacheline
+  Store Refs - L1 Hit, L1 Miss, N/A
+  - % of store accesses that hit L1, missed L1 and N/A (no available) memory
+    level for given offset within cacheline
 
   Data address - Offset
   - offset address
-- 
2.25.1



* [PATCH v3 04/11] perf mem: Add statistics for peer snooping
  2022-05-18  5:57 [PATCH v3 00/11] perf c2c: Support display for Arm64 Leo Yan
                   ` (2 preceding siblings ...)
  2022-05-18  5:57 ` [PATCH v3 03/11] perf c2c: Update documentation for store metric 'N/A' Leo Yan
@ 2022-05-18  5:57 ` Leo Yan
  2022-05-23 12:38   ` Arnaldo Carvalho de Melo
  2022-05-18  5:57 ` [PATCH v3 05/11] perf c2c: Add dimensions for peer load operations Leo Yan
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 16+ messages in thread
From: Leo Yan @ 2022-05-18  5:57 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Like Xu, Alyssa Ross, Ian Rogers, Kajol Jain, Adam Li, Li Huafei,
	German Gomez, James Clark, Kan Liang, Ali Saidi, Joe Mario,
	linux-perf-users, linux-kernel
  Cc: Leo Yan

The flag PERF_MEM_SNOOPX_PEER has been added to support cache snooping
from a peer cache line, which can come from a peer core, a peer cluster,
or a remote NUMA node.

This patch adds statistics for the flag PERF_MEM_SNOOPX_PEER.  Note that
we treat PERF_MEM_SNOOPX_PEER as affiliated info which works together
with the cache level statistics.  Therefore, when the flag
PERF_MEM_SNOOPX_PEER is set, we account the load operation for both the
cache level's metric (e.g. ld_l2hit, ld_llchit, etc.) and the metric
'ld_peer'.
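
For illustration only, a minimal standalone sketch of this double
accounting.  It covers just the L2 and L3 cases and leaves out the HITM,
DRAM and remote paths; the sk_/SK_ names are hypothetical stand-ins, not
the perf structures or flag definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cache levels for this sketch; perf uses PERF_MEM_LVL_* bits. */
enum sk_level { SK_LEVEL_L2, SK_LEVEL_L3 };

/* Hypothetical subset of the load counters. */
struct sk_load_stats {
	uint32_t ld_l2hit;	/* loads that hit L2 */
	uint32_t ld_llchit;	/* loads that hit the last level cache */
	uint32_t ld_peer;	/* loads served by a peer cache */
};

/*
 * Account one load sample: the cache level metric is always bumped,
 * and ld_peer is bumped in addition when the peer flag is set.
 */
static void sk_count_load(struct sk_load_stats *stats,
			  enum sk_level level, bool peer)
{
	switch (level) {
	case SK_LEVEL_L2:
		stats->ld_l2hit++;
		break;
	case SK_LEVEL_L3:
		stats->ld_llchit++;
		break;
	}

	if (peer)
		stats->ld_peer++;
}
```

One consequence of this scheme is that ld_peer overlaps with the cache
level counters rather than partitioning them, so it should not be added
on top of them when summing loads.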

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Ali Saidi <alisaidi@amazon.com>
---
 tools/perf/util/mem-events.c | 22 +++++++++++++++++++---
 tools/perf/util/mem-events.h |  1 +
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index 5dca1882c284..9de0eb3a1200 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -525,6 +525,7 @@ int c2c_decode_stats(struct c2c_stats *stats, struct mem_info *mi)
 	u64 op     = data_src->mem_op;
 	u64 lvl    = data_src->mem_lvl;
 	u64 snoop  = data_src->mem_snoop;
+	u64 snoopx = data_src->mem_snoopx;
 	u64 lock   = data_src->mem_lock;
 	u64 blk    = data_src->mem_blk;
 	/*
@@ -567,18 +568,28 @@ do {				\
 			if (lvl & P(LVL, IO))  stats->ld_io++;
 			if (lvl & P(LVL, LFB)) stats->ld_fbhit++;
 			if (lvl & P(LVL, L1 )) stats->ld_l1hit++;
-			if (lvl & P(LVL, L2 )) stats->ld_l2hit++;
+			if (lvl & P(LVL, L2)) {
+				stats->ld_l2hit++;
+
+				if (snoopx & P(SNOOPX, PEER))
+					stats->ld_peer++;
+			}
 			if (lvl & P(LVL, L3 )) {
 				if (snoop & P(SNOOP, HITM))
 					HITM_INC(lcl_hitm);
 				else
 					stats->ld_llchit++;
+
+				if (snoopx & P(SNOOPX, PEER))
+					stats->ld_peer++;
 			}
 
 			if (lvl & P(LVL, LOC_RAM)) {
 				stats->lcl_dram++;
 				if (snoop & P(SNOOP, HIT))
 					stats->ld_shared++;
+				else if (snoopx & P(SNOOPX, PEER))
+					stats->ld_peer++;
 				else
 					stats->ld_excl++;
 			}
@@ -597,10 +608,14 @@ do {				\
 		if ((lvl & P(LVL, REM_CCE1)) ||
 		    (lvl & P(LVL, REM_CCE2)) ||
 		     mrem) {
-			if (snoop & P(SNOOP, HIT))
+			if (snoop & P(SNOOP, HIT)) {
 				stats->rmt_hit++;
-			else if (snoop & P(SNOOP, HITM))
+			} else if (snoop & P(SNOOP, HITM)) {
 				HITM_INC(rmt_hitm);
+			} else if (snoopx & P(SNOOPX, PEER)) {
+				stats->rmt_hit++;
+				stats->ld_peer++;
+			}
 		}
 
 		if ((lvl & P(LVL, MISS)))
@@ -661,6 +676,7 @@ void c2c_add_stats(struct c2c_stats *stats, struct c2c_stats *add)
 	stats->ld_l1hit		+= add->ld_l1hit;
 	stats->ld_l2hit		+= add->ld_l2hit;
 	stats->ld_llchit	+= add->ld_llchit;
+	stats->ld_peer		+= add->ld_peer;
 	stats->lcl_hitm		+= add->lcl_hitm;
 	stats->rmt_hitm		+= add->rmt_hitm;
 	stats->tot_hitm		+= add->tot_hitm;
diff --git a/tools/perf/util/mem-events.h b/tools/perf/util/mem-events.h
index 8a8b568baeee..4879b841c841 100644
--- a/tools/perf/util/mem-events.h
+++ b/tools/perf/util/mem-events.h
@@ -75,6 +75,7 @@ struct c2c_stats {
 	u32	ld_l1hit;            /* count of loads that hit L1D */
 	u32	ld_l2hit;            /* count of loads that hit L2D */
 	u32	ld_llchit;           /* count of loads that hit LLC */
+	u32	ld_peer;             /* count of loads that hit peer core or cluster cache */
 	u32	lcl_hitm;            /* count of loads with local HITM  */
 	u32	rmt_hitm;            /* count of loads with remote HITM */
 	u32	tot_hitm;            /* count of loads with local and remote HITM */
-- 
2.25.1



* [PATCH v3 05/11] perf c2c: Add dimensions for peer load operations
  2022-05-18  5:57 [PATCH v3 00/11] perf c2c: Support display for Arm64 Leo Yan
                   ` (3 preceding siblings ...)
  2022-05-18  5:57 ` [PATCH v3 04/11] perf mem: Add statistics for peer snooping Leo Yan
@ 2022-05-18  5:57 ` Leo Yan
  2022-05-18  5:57 ` [PATCH v3 06/11] perf c2c: Use explicit names for display macros Leo Yan
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 16+ messages in thread
From: Leo Yan @ 2022-05-18  5:57 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Like Xu, Alyssa Ross, Ian Rogers, Kajol Jain, Adam Li, Li Huafei,
	German Gomez, James Clark, Kan Liang, Ali Saidi, Joe Mario,
	linux-perf-users, linux-kernel
  Cc: Leo Yan

This patch adds dimensions for peer load operations, including a
dimension for the total statistics of the metric 'ld_peer' and
dimensions for the single cache line view.

As with the HITM metrics, this patch also adds a dimension for the mean
value of peer load operations.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Ali Saidi <alisaidi@amazon.com>
---
 tools/perf/builtin-c2c.c | 87 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 84 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index c8230c48125f..b0695cfe793f 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -55,6 +55,7 @@ struct c2c_hists {
 struct compute_stats {
 	struct stats		 lcl_hitm;
 	struct stats		 rmt_hitm;
+	struct stats		 ld_peer;
 	struct stats		 load;
 };
 
@@ -154,6 +155,7 @@ static void *c2c_he_zalloc(size_t size)
 
 	init_stats(&c2c_he->cstats.lcl_hitm);
 	init_stats(&c2c_he->cstats.rmt_hitm);
+	init_stats(&c2c_he->cstats.ld_peer);
 	init_stats(&c2c_he->cstats.load);
 
 	return &c2c_he->he;
@@ -253,6 +255,8 @@ static void compute_stats(struct c2c_hist_entry *c2c_he,
 		update_stats(&cstats->rmt_hitm, weight);
 	else if (stats->lcl_hitm)
 		update_stats(&cstats->lcl_hitm, weight);
+	else if (stats->ld_peer)
+		update_stats(&cstats->ld_peer, weight);
 	else if (stats->load)
 		update_stats(&cstats->load, weight);
 }
@@ -658,6 +662,7 @@ STAT_FN(ld_fbhit)
 STAT_FN(ld_l1hit)
 STAT_FN(ld_l2hit)
 STAT_FN(ld_llchit)
+STAT_FN(ld_peer)
 STAT_FN(rmt_hit)
 
 static uint64_t total_records(struct c2c_stats *stats)
@@ -899,6 +904,7 @@ static double percent_ ## __f(struct c2c_hist_entry *c2c_he)			\
 
 PERCENT_FN(rmt_hitm)
 PERCENT_FN(lcl_hitm)
+PERCENT_FN(ld_peer)
 PERCENT_FN(st_l1hit)
 PERCENT_FN(st_l1miss)
 PERCENT_FN(st_na)
@@ -965,6 +971,37 @@ percent_lcl_hitm_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
 	return per_left - per_right;
 }
 
+static int
+percent_ld_peer_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
+		       struct hist_entry *he)
+{
+	int width = c2c_width(fmt, hpp, he->hists);
+	double per = PERCENT(he, ld_peer);
+	char buf[10];
+
+	return scnprintf(hpp->buf, hpp->size, "%*s", width, PERC_STR(buf, per));
+}
+
+static int
+percent_ld_peer_color(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
+		       struct hist_entry *he)
+{
+	return percent_color(fmt, hpp, he, percent_ld_peer);
+}
+
+static int64_t
+percent_ld_peer_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
+		    struct hist_entry *left, struct hist_entry *right)
+{
+	double per_left;
+	double per_right;
+
+	per_left  = PERCENT(left, ld_peer);
+	per_right = PERCENT(right, ld_peer);
+
+	return per_left - per_right;
+}
+
 static int
 percent_stores_l1hit_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
 			   struct hist_entry *he)
@@ -1213,6 +1250,7 @@ __func(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp, struct hist_entry *he)	\
 MEAN_ENTRY(mean_rmt_entry,  rmt_hitm);
 MEAN_ENTRY(mean_lcl_entry,  lcl_hitm);
 MEAN_ENTRY(mean_load_entry, load);
+MEAN_ENTRY(mean_peer_entry, ld_peer);
 
 static int
 cpucnt_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
@@ -1360,6 +1398,14 @@ static struct c2c_dimension dim_rmt_hitm = {
 	.width		= 7,
 };
 
+static struct c2c_dimension dim_ld_peer = {
+	.header		= HEADER_BOTH("Snoop", "Peer"),
+	.name		= "ld_peer",
+	.cmp		= ld_peer_cmp,
+	.entry		= ld_peer_entry,
+	.width		= 7,
+};
+
 static struct c2c_dimension dim_cl_rmt_hitm = {
 	.header		= HEADER_SPAN("----- HITM -----", "Rmt", 1),
 	.name		= "cl_rmt_hitm",
@@ -1376,6 +1422,14 @@ static struct c2c_dimension dim_cl_lcl_hitm = {
 	.width		= 7,
 };
 
+static struct c2c_dimension dim_cl_ld_peer = {
+	.header		= HEADER_BOTH("Snoop", "Peer"),
+	.name		= "cl_ld_peer",
+	.cmp		= ld_peer_cmp,
+	.entry		= ld_peer_entry,
+	.width		= 7,
+};
+
 static struct c2c_dimension dim_tot_stores = {
 	.header		= HEADER_BOTH("Total", "Stores"),
 	.name		= "tot_stores",
@@ -1520,6 +1574,15 @@ static struct c2c_dimension dim_percent_lcl_hitm = {
 	.width		= 7,
 };
 
+static struct c2c_dimension dim_percent_ld_peer = {
+	.header		= HEADER_BOTH("Snoop", "Peer"),
+	.name		= "percent_ld_peer",
+	.cmp		= percent_ld_peer_cmp,
+	.entry		= percent_ld_peer_entry,
+	.color		= percent_ld_peer_color,
+	.width		= 7,
+};
+
 static struct c2c_dimension dim_percent_stores_l1hit = {
 	.header		= HEADER_SPAN("------- Store Refs ------", "L1 Hit", 2),
 	.name		= "percent_stores_l1hit",
@@ -1602,7 +1665,7 @@ static struct c2c_dimension dim_node = {
 };
 
 static struct c2c_dimension dim_mean_rmt = {
-	.header		= HEADER_SPAN("---------- cycles ----------", "rmt hitm", 2),
+	.header		= HEADER_SPAN("--------------- cycles ---------------", "rmt hitm", 3),
 	.name		= "mean_rmt",
 	.cmp		= empty_cmp,
 	.entry		= mean_rmt_entry,
@@ -1625,6 +1688,14 @@ static struct c2c_dimension dim_mean_load = {
 	.width		= 8,
 };
 
+static struct c2c_dimension dim_mean_peer = {
+	.header		= HEADER_SPAN_LOW("peer"),
+	.name		= "mean_peer",
+	.cmp		= empty_cmp,
+	.entry		= mean_peer_entry,
+	.width		= 8,
+};
+
 static struct c2c_dimension dim_cpucnt = {
 	.header		= HEADER_BOTH("cpu", "cnt"),
 	.name		= "cpucnt",
@@ -1672,8 +1743,10 @@ static struct c2c_dimension *dimensions[] = {
 	&dim_tot_hitm,
 	&dim_lcl_hitm,
 	&dim_rmt_hitm,
+	&dim_ld_peer,
 	&dim_cl_lcl_hitm,
 	&dim_cl_rmt_hitm,
+	&dim_cl_ld_peer,
 	&dim_tot_stores,
 	&dim_stores_l1hit,
 	&dim_stores_l1miss,
@@ -1691,6 +1764,7 @@ static struct c2c_dimension *dimensions[] = {
 	&dim_percent_hitm,
 	&dim_percent_rmt_hitm,
 	&dim_percent_lcl_hitm,
+	&dim_percent_ld_peer,
 	&dim_percent_stores_l1hit,
 	&dim_percent_stores_l1miss,
 	&dim_percent_stores_na,
@@ -1704,6 +1778,7 @@ static struct c2c_dimension *dimensions[] = {
 	&dim_mean_rmt,
 	&dim_mean_lcl,
 	&dim_mean_load,
+	&dim_mean_peer,
 	&dim_cpucnt,
 	&dim_srcline,
 	&dim_dcacheline_idx,
@@ -2202,6 +2277,7 @@ static void print_c2c__display_stats(FILE *out)
 	fprintf(out, "  Load LLC Misses                   : %10d\n", llc_misses);
 	fprintf(out, "  Load access blocked by data       : %10d\n", stats->blk_data);
 	fprintf(out, "  Load access blocked by address    : %10d\n", stats->blk_addr);
+	fprintf(out, "  Load HIT Peer                     : %10d\n", stats->ld_peer);
 	fprintf(out, "  LLC Misses to Local DRAM          : %10.1f%%\n", ((double)stats->lcl_dram/(double)llc_misses) * 100.);
 	fprintf(out, "  LLC Misses to Remote DRAM         : %10.1f%%\n", ((double)stats->rmt_dram/(double)llc_misses) * 100.);
 	fprintf(out, "  LLC Misses to Remote cache (HIT)  : %10.1f%%\n", ((double)stats->rmt_hit /(double)llc_misses) * 100.);
@@ -2229,6 +2305,7 @@ static void print_shared_cacheline_info(FILE *out)
 	fprintf(out, "  Fill Buffer Hits on shared lines  : %10d\n", stats->ld_fbhit);
 	fprintf(out, "  L1D hits on shared lines          : %10d\n", stats->ld_l1hit);
 	fprintf(out, "  L2D hits on shared lines          : %10d\n", stats->ld_l2hit);
+	fprintf(out, "  Load HITs on peer cache lines     : %10d\n", stats->ld_peer);
 	fprintf(out, "  LLC hits on shared lines          : %10d\n", stats->ld_llchit + stats->lcl_hitm);
 	fprintf(out, "  Locked Access on shared lines     : %10d\n", stats->locks);
 	fprintf(out, "  Blocked Access on shared lines    : %10d\n", stats->blk_data + stats->blk_addr);
@@ -2257,10 +2334,10 @@ static void print_cacheline(struct c2c_hists *c2c_hists,
 		fprintf(out, "\n");
 	}
 
-	fprintf(out, "  ----------------------------------------------------------------------\n");
+	fprintf(out, "  -------------------------------------------------------------------------------\n");
 	__hist_entry__snprintf(he_cl, &hpp, hpp_list);
 	fprintf(out, "%s\n", bf);
-	fprintf(out, "  ----------------------------------------------------------------------\n");
+	fprintf(out, "  -------------------------------------------------------------------------------\n");
 
 	hists__fprintf(&c2c_hists->hists, false, 0, 0, 0, out, false);
 }
@@ -2275,6 +2352,7 @@ static void print_pareto(FILE *out)
 	cl_output = "cl_num,"
 		    "cl_rmt_hitm,"
 		    "cl_lcl_hitm,"
+		    "cl_ld_peer,"
 		    "cl_stores_l1hit,"
 		    "cl_stores_l1miss,"
 		    "cl_stores_na,"
@@ -2727,6 +2805,7 @@ static int build_cl_output(char *cl_sort, bool no_source)
 		c2c.use_stdio ? "cl_num_empty," : "",
 		"percent_rmt_hitm,"
 		"percent_lcl_hitm,"
+		"percent_ld_peer,"
 		"percent_stores_l1hit,"
 		"percent_stores_l1miss,"
 		"percent_stores_na,"
@@ -2737,6 +2816,7 @@ static int build_cl_output(char *cl_sort, bool no_source)
 		"mean_rmt,"
 		"mean_lcl,"
 		"mean_load,"
+		"mean_peer,"
 		"tot_recs,"
 		"cpucnt,",
 		add_sym ? "symbol," : "",
@@ -2913,6 +2993,7 @@ static int perf_c2c__report(int argc, const char **argv)
 		     "dcacheline_count,"
 		     "percent_hitm,"
 		     "tot_hitm,lcl_hitm,rmt_hitm,"
+		     "ld_peer,"
 		     "tot_recs,"
 		     "tot_loads,"
 		     "tot_stores,"
-- 
2.25.1



* [PATCH v3 06/11] perf c2c: Use explicit names for display macros
  2022-05-18  5:57 [PATCH v3 00/11] perf c2c: Support display for Arm64 Leo Yan
                   ` (4 preceding siblings ...)
  2022-05-18  5:57 ` [PATCH v3 05/11] perf c2c: Add dimensions for peer load operations Leo Yan
@ 2022-05-18  5:57 ` Leo Yan
  2022-05-18  5:57 ` [PATCH v3 07/11] perf c2c: Rename dimension from 'percent_hitm' to 'percent_costly_snoop' Leo Yan
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 16+ messages in thread
From: Leo Yan @ 2022-05-18  5:57 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Like Xu, Alyssa Ross, Ian Rogers, Kajol Jain, Adam Li, Li Huafei,
	German Gomez, James Clark, Kan Liang, Ali Saidi, Joe Mario,
	linux-perf-users, linux-kernel
  Cc: Leo Yan

The perf c2c tool assumes that the HITM snoop type is available and
depends heavily on it to detect false sharing of cache lines;
unfortunately, HITM is not supported on some architectures.

Essentially, the perf c2c tool looks for very costly snooping
operations that indicate false sharing, so it does not have to stick
to HITM tags; other snooping types (e.g. SNOOPX_PEER) can be used as
well.

For this reason, this patch renames the HITM related display macros
with the suffix '_HITM', so they stay distinct when display types for
other snooping types are added later.
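
As an illustration only (not part of this patch), the renamed macros
can be pictured with the sketch below.  The enum values and labels
follow the diff; the enum tag 'display_mode' is added for readability,
and parse_display() is a hypothetical stand-in for setup_display()
that returns the mode instead of storing it in c2c.display.

```c
#include <string.h>

/* Display modes, each named after the snoop type it sorts on. */
enum display_mode {
	DISPLAY_LCL_HITM,
	DISPLAY_RMT_HITM,
	DISPLAY_TOT_HITM,
	DISPLAY_MAX,
};

/* Human-readable label per mode, indexed by the enum above. */
static const char *display_str[DISPLAY_MAX] = {
	[DISPLAY_LCL_HITM] = "Local",
	[DISPLAY_RMT_HITM] = "Remote",
	[DISPLAY_TOT_HITM] = "Total",
};

/*
 * Hypothetical helper: map the -d/--display argument to a mode.
 * A NULL argument selects the default ("tot"); an unknown string
 * yields -1.
 */
static int parse_display(const char *str)
{
	const char *display = str ? str : "tot";

	if (!strcmp(display, "tot"))
		return DISPLAY_TOT_HITM;
	if (!strcmp(display, "rmt"))
		return DISPLAY_RMT_HITM;
	if (!strcmp(display, "lcl"))
		return DISPLAY_LCL_HITM;

	return -1;
}
```

With the '_HITM' suffix in place, a mode for another snoop type can be
added to the enum later without being mistaken for a HITM mode.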

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Ali Saidi <alisaidi@amazon.com>
---
 tools/perf/builtin-c2c.c | 58 ++++++++++++++++++++--------------------
 1 file changed, 29 insertions(+), 29 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index b0695cfe793f..47da9ede644b 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -114,16 +114,16 @@ struct perf_c2c {
 };
 
 enum {
-	DISPLAY_LCL,
-	DISPLAY_RMT,
-	DISPLAY_TOT,
+	DISPLAY_LCL_HITM,
+	DISPLAY_RMT_HITM,
+	DISPLAY_TOT_HITM,
 	DISPLAY_MAX,
 };
 
 static const char *display_str[DISPLAY_MAX] = {
-	[DISPLAY_LCL] = "Local",
-	[DISPLAY_RMT] = "Remote",
-	[DISPLAY_TOT] = "Total",
+	[DISPLAY_LCL_HITM] = "Local",
+	[DISPLAY_RMT_HITM] = "Remote",
+	[DISPLAY_TOT_HITM] = "Total",
 };
 
 static const struct option c2c_options[] = {
@@ -805,15 +805,15 @@ static double percent_hitm(struct c2c_hist_entry *c2c_he)
 	total = &hists->stats;
 
 	switch (c2c.display) {
-	case DISPLAY_RMT:
+	case DISPLAY_RMT_HITM:
 		st  = stats->rmt_hitm;
 		tot = total->rmt_hitm;
 		break;
-	case DISPLAY_LCL:
+	case DISPLAY_LCL_HITM:
 		st  = stats->lcl_hitm;
 		tot = total->lcl_hitm;
 		break;
-	case DISPLAY_TOT:
+	case DISPLAY_TOT_HITM:
 		st  = stats->tot_hitm;
 		tot = total->tot_hitm;
 	default:
@@ -1179,15 +1179,15 @@ node_entry(struct perf_hpp_fmt *fmt __maybe_unused, struct perf_hpp *hpp,
 			advance_hpp(hpp, ret);
 
 			switch (c2c.display) {
-			case DISPLAY_RMT:
+			case DISPLAY_RMT_HITM:
 				ret = display_metrics(hpp, stats->rmt_hitm,
 						      c2c_he->stats.rmt_hitm);
 				break;
-			case DISPLAY_LCL:
+			case DISPLAY_LCL_HITM:
 				ret = display_metrics(hpp, stats->lcl_hitm,
 						      c2c_he->stats.lcl_hitm);
 				break;
-			case DISPLAY_TOT:
+			case DISPLAY_TOT_HITM:
 				ret = display_metrics(hpp, stats->tot_hitm,
 						      c2c_he->stats.tot_hitm);
 				break;
@@ -1543,9 +1543,9 @@ static struct c2c_dimension dim_tot_loads = {
 };
 
 static struct c2c_header percent_hitm_header[] = {
-	[DISPLAY_LCL] = HEADER_BOTH("Lcl", "Hitm"),
-	[DISPLAY_RMT] = HEADER_BOTH("Rmt", "Hitm"),
-	[DISPLAY_TOT] = HEADER_BOTH("Tot", "Hitm"),
+	[DISPLAY_LCL_HITM] = HEADER_BOTH("Lcl", "Hitm"),
+	[DISPLAY_RMT_HITM] = HEADER_BOTH("Rmt", "Hitm"),
+	[DISPLAY_TOT_HITM] = HEADER_BOTH("Tot", "Hitm"),
 };
 
 static struct c2c_dimension dim_percent_hitm = {
@@ -2016,15 +2016,15 @@ static bool he__display(struct hist_entry *he, struct c2c_stats *stats)
 	c2c_he = container_of(he, struct c2c_hist_entry, he);
 
 	switch (c2c.display) {
-	case DISPLAY_LCL:
+	case DISPLAY_LCL_HITM:
 		he->filtered = filter_display(c2c_he->stats.lcl_hitm,
 					      stats->lcl_hitm);
 		break;
-	case DISPLAY_RMT:
+	case DISPLAY_RMT_HITM:
 		he->filtered = filter_display(c2c_he->stats.rmt_hitm,
 					      stats->rmt_hitm);
 		break;
-	case DISPLAY_TOT:
+	case DISPLAY_TOT_HITM:
 		he->filtered = filter_display(c2c_he->stats.tot_hitm,
 					      stats->tot_hitm);
 		break;
@@ -2047,13 +2047,13 @@ static inline bool is_valid_hist_entry(struct hist_entry *he)
 		return true;
 
 	switch (c2c.display) {
-	case DISPLAY_LCL:
+	case DISPLAY_LCL_HITM:
 		has_record = !!c2c_he->stats.lcl_hitm;
 		break;
-	case DISPLAY_RMT:
+	case DISPLAY_RMT_HITM:
 		has_record = !!c2c_he->stats.rmt_hitm;
 		break;
-	case DISPLAY_TOT:
+	case DISPLAY_TOT_HITM:
 		has_record = !!c2c_he->stats.tot_hitm;
 		break;
 	default:
@@ -2750,11 +2750,11 @@ static int setup_display(const char *str)
 	const char *display = str ?: "tot";
 
 	if (!strcmp(display, "tot"))
-		c2c.display = DISPLAY_TOT;
+		c2c.display = DISPLAY_TOT_HITM;
 	else if (!strcmp(display, "rmt"))
-		c2c.display = DISPLAY_RMT;
+		c2c.display = DISPLAY_RMT_HITM;
 	else if (!strcmp(display, "lcl"))
-		c2c.display = DISPLAY_LCL;
+		c2c.display = DISPLAY_LCL_HITM;
 	else {
 		pr_err("failed: unknown display type: %s\n", str);
 		return -1;
@@ -2844,9 +2844,9 @@ static int setup_coalesce(const char *coalesce, bool no_source)
 		return -1;
 
 	if (asprintf(&c2c.cl_resort, "offset,%s",
-		     c2c.display == DISPLAY_TOT ?
+		     c2c.display == DISPLAY_TOT_HITM ?
 		     "tot_hitm" :
-		     c2c.display == DISPLAY_RMT ?
+		     c2c.display == DISPLAY_RMT_HITM ?
 		     "rmt_hitm,lcl_hitm" :
 		     "lcl_hitm,rmt_hitm") < 0)
 		return -ENOMEM;
@@ -3003,11 +3003,11 @@ static int perf_c2c__report(int argc, const char **argv)
 		     "ld_rmthit,rmt_hitm,"
 		     "dram_lcl,dram_rmt";
 
-	if (c2c.display == DISPLAY_TOT)
+	if (c2c.display == DISPLAY_TOT_HITM)
 		sort_str = "tot_hitm";
-	else if (c2c.display == DISPLAY_RMT)
+	else if (c2c.display == DISPLAY_RMT_HITM)
 		sort_str = "rmt_hitm";
-	else if (c2c.display == DISPLAY_LCL)
+	else if (c2c.display == DISPLAY_LCL_HITM)
 		sort_str = "lcl_hitm";
 
 	c2c_hists__reinit(&c2c.hists, output_str, sort_str);
-- 
2.25.1



* [PATCH v3 07/11] perf c2c: Rename dimension from 'percent_hitm' to 'percent_costly_snoop'
  2022-05-18  5:57 [PATCH v3 00/11] perf c2c: Support display for Arm64 Leo Yan
                   ` (5 preceding siblings ...)
  2022-05-18  5:57 ` [PATCH v3 06/11] perf c2c: Use explicit names for display macros Leo Yan
@ 2022-05-18  5:57 ` Leo Yan
  2022-05-18  5:57 ` [PATCH v3 08/11] perf c2c: Refactor node header Leo Yan
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 16+ messages in thread
From: Leo Yan @ 2022-05-18  5:57 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Like Xu, Alyssa Ross, Ian Rogers, Kajol Jain, Adam Li, Li Huafei,
	German Gomez, James Clark, Kan Liang, Ali Saidi, Joe Mario,
	linux-perf-users, linux-kernel
  Cc: Leo Yan

Use more general naming for the main sort dimension, so that sorting is
not tied to the HITM snoop type and can be extended to support other
costly snooping operations.  To that end, rename the dimension and its
helpers with the prefix 'percent_costly_snoop'.
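
As a rough sketch of what the renamed dimension computes (simplified,
not the code in builtin-c2c.c): the share of the selected snoop counter
that one cacheline contributes to the total.  The struct snoop_stats,
the enum tag and the function costly_snoop_percent() are hypothetical
reductions of struct c2c_stats, the display enum and
percent_costly_snoop(); the zero-total guard is an assumption of this
sketch.

```c
/* Display modes as named by the previous patch in this series. */
enum display_mode {
	DISPLAY_LCL_HITM,
	DISPLAY_RMT_HITM,
	DISPLAY_TOT_HITM,
};

/* Hypothetical, reduced counterpart of struct c2c_stats. */
struct snoop_stats {
	unsigned int lcl_hitm;
	unsigned int rmt_hitm;
	unsigned int tot_hitm;
};

/*
 * Percentage of the costly snoops, for the given display mode, that
 * belong to one cacheline ('cl') out of all cachelines ('total').
 * Returns 0 when the total is zero to avoid dividing by zero.
 */
static double costly_snoop_percent(enum display_mode mode,
				   const struct snoop_stats *cl,
				   const struct snoop_stats *total)
{
	unsigned int st = 0, tot = 0;

	switch (mode) {
	case DISPLAY_RMT_HITM:
		st  = cl->rmt_hitm;
		tot = total->rmt_hitm;
		break;
	case DISPLAY_LCL_HITM:
		st  = cl->lcl_hitm;
		tot = total->lcl_hitm;
		break;
	case DISPLAY_TOT_HITM:
		st  = cl->tot_hitm;
		tot = total->tot_hitm;
		break;
	default:
		break;
	}

	return tot ? (double)st * 100.0 / tot : 0.0;
}
```

Because the function name no longer mentions HITM, supporting another
costly snoop type only needs one more case in the switch.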

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Ali Saidi <alisaidi@amazon.com>
---
 tools/perf/builtin-c2c.c | 40 ++++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 47da9ede644b..ace7ead4ab75 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -792,7 +792,7 @@ percent_color(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
 	return hpp_color_scnprintf(hpp, "%*.2f%%", width - 1, per);
 }
 
-static double percent_hitm(struct c2c_hist_entry *c2c_he)
+static double percent_costly_snoop(struct c2c_hist_entry *c2c_he)
 {
 	struct c2c_hists *hists;
 	struct c2c_stats *stats;
@@ -832,8 +832,8 @@ static double percent_hitm(struct c2c_hist_entry *c2c_he)
 })
 
 static int
-percent_hitm_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
-		   struct hist_entry *he)
+percent_costly_snoop_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
+			   struct hist_entry *he)
 {
 	struct c2c_hist_entry *c2c_he;
 	int width = c2c_width(fmt, hpp, he->hists);
@@ -841,20 +841,20 @@ percent_hitm_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
 	double per;
 
 	c2c_he = container_of(he, struct c2c_hist_entry, he);
-	per = percent_hitm(c2c_he);
+	per = percent_costly_snoop(c2c_he);
 	return scnprintf(hpp->buf, hpp->size, "%*s", width, PERC_STR(buf, per));
 }
 
 static int
-percent_hitm_color(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
-		   struct hist_entry *he)
+percent_costly_snoop_color(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
+			   struct hist_entry *he)
 {
-	return percent_color(fmt, hpp, he, percent_hitm);
+	return percent_color(fmt, hpp, he, percent_costly_snoop);
 }
 
 static int64_t
-percent_hitm_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
-		 struct hist_entry *left, struct hist_entry *right)
+percent_costly_snoop_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
+			 struct hist_entry *left, struct hist_entry *right)
 {
 	struct c2c_hist_entry *c2c_left;
 	struct c2c_hist_entry *c2c_right;
@@ -864,8 +864,8 @@ percent_hitm_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
 	c2c_left  = container_of(left, struct c2c_hist_entry, he);
 	c2c_right = container_of(right, struct c2c_hist_entry, he);
 
-	per_left  = percent_hitm(c2c_left);
-	per_right = percent_hitm(c2c_right);
+	per_left  = percent_costly_snoop(c2c_left);
+	per_right = percent_costly_snoop(c2c_right);
 
 	return per_left - per_right;
 }
@@ -1542,17 +1542,17 @@ static struct c2c_dimension dim_tot_loads = {
 	.width		= 7,
 };
 
-static struct c2c_header percent_hitm_header[] = {
+static struct c2c_header percent_costly_snoop_header[] = {
 	[DISPLAY_LCL_HITM] = HEADER_BOTH("Lcl", "Hitm"),
 	[DISPLAY_RMT_HITM] = HEADER_BOTH("Rmt", "Hitm"),
 	[DISPLAY_TOT_HITM] = HEADER_BOTH("Tot", "Hitm"),
 };
 
-static struct c2c_dimension dim_percent_hitm = {
-	.name		= "percent_hitm",
-	.cmp		= percent_hitm_cmp,
-	.entry		= percent_hitm_entry,
-	.color		= percent_hitm_color,
+static struct c2c_dimension dim_percent_costly_snoop = {
+	.name		= "percent_costly_snoop",
+	.cmp		= percent_costly_snoop_cmp,
+	.entry		= percent_costly_snoop_entry,
+	.color		= percent_costly_snoop_color,
 	.width		= 7,
 };
 
@@ -1761,7 +1761,7 @@ static struct c2c_dimension *dimensions[] = {
 	&dim_ld_rmthit,
 	&dim_tot_recs,
 	&dim_tot_loads,
-	&dim_percent_hitm,
+	&dim_percent_costly_snoop,
 	&dim_percent_rmt_hitm,
 	&dim_percent_lcl_hitm,
 	&dim_percent_ld_peer,
@@ -2663,7 +2663,7 @@ static int ui_quirks(void)
 		nodestr = "CL";
 	}
 
-	dim_percent_hitm.header = percent_hitm_header[c2c.display];
+	dim_percent_costly_snoop.header = percent_costly_snoop_header[c2c.display];
 
 	/* Fix the zero line for dcacheline column. */
 	buf = fill_line("Cacheline", dim_dcacheline.width +
@@ -2991,7 +2991,7 @@ static int perf_c2c__report(int argc, const char **argv)
 		     "dcacheline,"
 		     "dcacheline_node,"
 		     "dcacheline_count,"
-		     "percent_hitm,"
+		     "percent_costly_snoop,"
 		     "tot_hitm,lcl_hitm,rmt_hitm,"
 		     "ld_peer,"
 		     "tot_recs,"
-- 
2.25.1



* [PATCH v3 08/11] perf c2c: Refactor node header
  2022-05-18  5:57 [PATCH v3 00/11] perf c2c: Support display for Arm64 Leo Yan
                   ` (6 preceding siblings ...)
  2022-05-18  5:57 ` [PATCH v3 07/11] perf c2c: Rename dimension from 'percent_hitm' to 'percent_costly_snoop' Leo Yan
@ 2022-05-18  5:57 ` Leo Yan
  2022-05-18  5:57 ` [PATCH v3 09/11] perf c2c: Sort on peer snooping for load operations Leo Yan
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 16+ messages in thread
From: Leo Yan @ 2022-05-18  5:57 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Like Xu, Alyssa Ross, Ian Rogers, Kajol Jain, Adam Li, Li Huafei,
	German Gomez, James Clark, Kan Liang, Ali Saidi, Joe Mario,
	linux-perf-users, linux-kernel
  Cc: Leo Yan

The node header array contains 3 items, one for each of the 3 flavors
of node access info.  To extend sorting to other snooping types rather
than always sticking to HITMs, the second header string
"Node{cpus %hitms %stores}" needs to be adjustable (e.g. changed to
"Node{cpus %peers %stores}").

For this reason, this patch changes the node header array to three
flat variables and uses a switch-case in setup_nodes_header(), which
makes it easier to alter the header string.
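
The selection logic can be sketched in isolation as below.  This is a
simplified illustration working on plain strings rather than
struct c2c_header; node_header() is a hypothetical helper, and
returning NULL for an unknown value is a choice made for this sketch
(the patch leaves dim_node.header untouched in that case).

```c
#include <stddef.h>

/*
 * Hypothetical helper: pick the node column header for a node_info
 * flavor (0, 1 or 2).  Returns NULL for any other value.
 */
static const char *node_header(int node_info)
{
	switch (node_info) {
	case 0:
		return "Node";
	case 1:
		/* The flavor that later needs a per-display-mode string. */
		return "Node{cpus %hitms %stores}";
	case 2:
		return "Node{cpu list}";
	default:
		return NULL;
	}
}
```

With a switch, case 1 can later choose between several strings based on
the display mode, which a flat array indexed by node_info cannot do.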

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Ali Saidi <alisaidi@amazon.com>
---
 tools/perf/builtin-c2c.c | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index ace7ead4ab75..757a79442a52 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -1651,12 +1651,6 @@ static struct c2c_dimension dim_dso = {
 	.se		= &sort_dso,
 };
 
-static struct c2c_header header_node[3] = {
-	HEADER_LOW("Node"),
-	HEADER_LOW("Node{cpus %hitms %stores}"),
-	HEADER_LOW("Node{cpu list}"),
-};
-
 static struct c2c_dimension dim_node = {
 	.name		= "node",
 	.cmp		= empty_cmp,
@@ -2144,9 +2138,27 @@ static int resort_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
 	return 0;
 }
 
+static struct c2c_header header_node_0 = HEADER_LOW("Node");
+static struct c2c_header header_node_1 = HEADER_LOW("Node{cpus %hitms %stores}");
+static struct c2c_header header_node_2 = HEADER_LOW("Node{cpu list}");
+
 static void setup_nodes_header(void)
 {
-	dim_node.header = header_node[c2c.node_info];
+	switch (c2c.node_info) {
+	case 0:
+		dim_node.header = header_node_0;
+		break;
+	case 1:
+		dim_node.header = header_node_1;
+		break;
+	case 2:
+		dim_node.header = header_node_2;
+		break;
+	default:
+		break;
+	}
+
+	return;
 }
 
 static int setup_nodes(struct perf_session *session)
-- 
2.25.1



* [PATCH v3 09/11] perf c2c: Sort on peer snooping for load operations
  2022-05-18  5:57 [PATCH v3 00/11] perf c2c: Support display for Arm64 Leo Yan
                   ` (7 preceding siblings ...)
  2022-05-18  5:57 ` [PATCH v3 08/11] perf c2c: Refactor node header Leo Yan
@ 2022-05-18  5:57 ` Leo Yan
  2022-05-18  5:57 ` [PATCH v3 10/11] perf c2c: Update documentation for new display option 'peer' Leo Yan
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 16+ messages in thread
From: Leo Yan @ 2022-05-18  5:57 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Like Xu, Alyssa Ross, Ian Rogers, Kajol Jain, Adam Li, Li Huafei,
	German Gomez, James Clark, Kan Liang, Ali Saidi, Joe Mario,
	linux-perf-users, linux-kernel
  Cc: Leo Yan

In addition to the three existing display options 'tot', 'rmt' and
'lcl', this patch adds a new option 'peer' to sort on cache hits from
peer snooping.

When displaying with the option 'peer', the "Shared Data Cache Line
Table" and "Shared Cache Line Distribution Pareto" are both sorted on
the metric "ld_peer".  As a result, the 'peer' display looks as below:

  # perf c2c report -d peer --coalesce tid,pid,iaddr,dso -N --stdio

  [...]

  =================================================
             Shared Data Cache Line Table
  =================================================
  #
  #        ----------- Cacheline ----------    Snoop  ------- Load Hitm -------    Snoop    Total    Total    Total  --------- Stores --------  ----- Core Load Hit -----  - LLC Load Hit --  - RMT Load Hit --  --- Load Dram ----
  # Index             Address  Node  PA cnt     Peer    Total  LclHitm  RmtHitm     Peer  records    Loads   Stores    L1Hit   L1Miss      N/A       FB       L1       L2    LclHit  LclHitm    RmtHit  RmtHitm       Lcl       Rmt
  # .....  ..................  ....  ......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  ........  .......  ........  ........
  #
        0      0xaaaac17d6000   N/A       0  100.00%        0        0        0       99    18851    18851        0        0        0        0        0    18752        0         0        0         0        0         0         0

  =================================================
        Shared Cache Line Distribution Pareto
  =================================================
  #
  #        ----- HITM -----    Snoop  ------- Store Refs ------  --------- Data address ---------                                                  --------------- cycles ---------------    Total       cpu                                    Shared
  #   Num  RmtHitm  LclHitm     Peer   L1 Hit  L1 Miss      N/A              Offset  Node  PA cnt      Pid                Tid        Code address  rmt hitm  lcl hitm      load      peer  records       cnt                  Symbol            Object      Source:Line  Node{cpus %peers %stores}
  # .....  .......  .......  .......  .......  .......  .......  ..................  ....  ......  .......  .................  ..................  ........  ........  ........  ........  .......  ........  ......................  ................  ...............  ....
  #
    -------------------------------------------------------------------------------
        0        0        0       99        0        0        0      0xaaaac17d6000
    -------------------------------------------------------------------------------
             0.00%    0.00%    3.03%    0.00%    0.00%    0.00%                0x20   N/A       0     3603     3603:memstress      0xaaaac17c25ac         0         0        41       376     9314         2  [.] 0x00000000000025ac  memstress         memstress[25ac]   0{ 2 100.0%    n/a}
             0.00%    0.00%    3.03%    0.00%    0.00%    0.00%                0x20   N/A       0     3603     3606:memstress      0xaaaac17c25ac         0         0        44       375     9155         1  [.] 0x00000000000025ac  memstress         memstress[25ac]   0{ 1 100.0%    n/a}
             0.00%    0.00%   45.45%    0.00%    0.00%    0.00%                0x29   N/A       0     3603     3603:memstress      0xaaaac17c3e88         0         0       175       180       70         2  [.] 0x0000000000003e88  memstress         memstress[3e88]   0{ 2 100.0%    n/a}
             0.00%    0.00%   48.48%    0.00%    0.00%    0.00%                0x29   N/A       0     3603     3606:memstress      0xaaaac17c3e88         0         0       170       180       65         1  [.] 0x0000000000003e88  memstress         memstress[3e88]   0{ 1 100.0%    n/a}

  [...]
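
To illustrate why the cacheline above is listed although it has no
HITM records, here is a simplified sketch of the two decisions that
depend on the display mode: which key to resort on and whether a
cacheline counts as shared.  The sort keys and the HITM/peer checks
follow the diff; struct cl_stats, resort_key() and
is_shared_cacheline() are hypothetical reductions of struct c2c_stats,
setup_coalesce() and resort_shared_cl_cb().

```c
#include <stdbool.h>
#include <stddef.h>

enum display_mode {
	DISPLAY_LCL_HITM,
	DISPLAY_RMT_HITM,
	DISPLAY_TOT_HITM,
	DISPLAY_SNP_PEER,
};

/* Hypothetical, reduced counterpart of struct c2c_stats. */
struct cl_stats {
	unsigned int lcl_hitm;
	unsigned int rmt_hitm;
	unsigned int ld_peer;
};

/* Sort key used for resorting the cachelines in each display mode. */
static const char *resort_key(enum display_mode mode)
{
	switch (mode) {
	case DISPLAY_TOT_HITM:
		return "tot_hitm";
	case DISPLAY_RMT_HITM:
		return "rmt_hitm,lcl_hitm";
	case DISPLAY_LCL_HITM:
		return "lcl_hitm,rmt_hitm";
	case DISPLAY_SNP_PEER:
		return "ld_peer";
	}

	return NULL;
}

/*
 * A cacheline counts as shared if it has peer loads in 'peer' mode,
 * or any local/remote HITM in the HITM modes.
 */
static bool is_shared_cacheline(enum display_mode mode,
				const struct cl_stats *s)
{
	if (mode == DISPLAY_SNP_PEER)
		return s->ld_peer != 0;

	return s->lcl_hitm != 0 || s->rmt_hitm != 0;
}
```

For the sample row above (99 peer loads, 0 HITMs), the cacheline is
counted as shared only when the display mode is 'peer'.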

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Ali Saidi <alisaidi@amazon.com>
---
 tools/perf/builtin-c2c.c | 63 +++++++++++++++++++++++++++++++---------
 1 file changed, 49 insertions(+), 14 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 757a79442a52..3bd422c5e8ae 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -117,13 +117,15 @@ enum {
 	DISPLAY_LCL_HITM,
 	DISPLAY_RMT_HITM,
 	DISPLAY_TOT_HITM,
+	DISPLAY_SNP_PEER,
 	DISPLAY_MAX,
 };
 
 static const char *display_str[DISPLAY_MAX] = {
-	[DISPLAY_LCL_HITM] = "Local",
-	[DISPLAY_RMT_HITM] = "Remote",
-	[DISPLAY_TOT_HITM] = "Total",
+	[DISPLAY_LCL_HITM] = "Local HITMs",
+	[DISPLAY_RMT_HITM] = "Remote HITMs",
+	[DISPLAY_TOT_HITM] = "Total HITMs",
+	[DISPLAY_SNP_PEER] = "Snoop Peers",
 };
 
 static const struct option c2c_options[] = {
@@ -816,6 +818,11 @@ static double percent_costly_snoop(struct c2c_hist_entry *c2c_he)
 	case DISPLAY_TOT_HITM:
 		st  = stats->tot_hitm;
 		tot = total->tot_hitm;
+		break;
+	case DISPLAY_SNP_PEER:
+		st  = stats->ld_peer;
+		tot = total->ld_peer;
+		break;
 	default:
 		break;
 	}
@@ -1191,6 +1198,10 @@ node_entry(struct perf_hpp_fmt *fmt __maybe_unused, struct perf_hpp *hpp,
 				ret = display_metrics(hpp, stats->tot_hitm,
 						      c2c_he->stats.tot_hitm);
 				break;
+			case DISPLAY_SNP_PEER:
+				ret = display_metrics(hpp, stats->ld_peer,
+						      c2c_he->stats.ld_peer);
+				break;
 			default:
 				break;
 			}
@@ -1546,6 +1557,7 @@ static struct c2c_header percent_costly_snoop_header[] = {
 	[DISPLAY_LCL_HITM] = HEADER_BOTH("Lcl", "Hitm"),
 	[DISPLAY_RMT_HITM] = HEADER_BOTH("Rmt", "Hitm"),
 	[DISPLAY_TOT_HITM] = HEADER_BOTH("Tot", "Hitm"),
+	[DISPLAY_SNP_PEER] = HEADER_BOTH("Snoop", "Peer"),
 };
 
 static struct c2c_dimension dim_percent_costly_snoop = {
@@ -2022,6 +2034,10 @@ static bool he__display(struct hist_entry *he, struct c2c_stats *stats)
 		he->filtered = filter_display(c2c_he->stats.tot_hitm,
 					      stats->tot_hitm);
 		break;
+	case DISPLAY_SNP_PEER:
+		he->filtered = filter_display(c2c_he->stats.ld_peer,
+					      stats->ld_peer);
+		break;
 	default:
 		break;
 	}
@@ -2050,6 +2066,8 @@ static inline bool is_valid_hist_entry(struct hist_entry *he)
 	case DISPLAY_TOT_HITM:
 		has_record = !!c2c_he->stats.tot_hitm;
 		break;
+	case DISPLAY_SNP_PEER:
+		has_record = !!c2c_he->stats.ld_peer;
 	default:
 		break;
 	}
@@ -2139,7 +2157,10 @@ static int resort_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
 }
 
 static struct c2c_header header_node_0 = HEADER_LOW("Node");
-static struct c2c_header header_node_1 = HEADER_LOW("Node{cpus %hitms %stores}");
+static struct c2c_header header_node_1_hitms_stores =
+		HEADER_LOW("Node{cpus %hitms %stores}");
+static struct c2c_header header_node_1_peers_stores =
+		HEADER_LOW("Node{cpus %peers %stores}");
 static struct c2c_header header_node_2 = HEADER_LOW("Node{cpu list}");
 
 static void setup_nodes_header(void)
@@ -2149,7 +2170,10 @@ static void setup_nodes_header(void)
 		dim_node.header = header_node_0;
 		break;
 	case 1:
-		dim_node.header = header_node_1;
+		if (c2c.display == DISPLAY_SNP_PEER)
+			dim_node.header = header_node_1_peers_stores;
+		else
+			dim_node.header = header_node_1_hitms_stores;
 		break;
 	case 2:
 		dim_node.header = header_node_2;
@@ -2223,13 +2247,15 @@ static int setup_nodes(struct perf_session *session)
 }
 
 #define HAS_HITMS(__h) ((__h)->stats.lcl_hitm || (__h)->stats.rmt_hitm)
+#define HAS_PEER(__h) ((__h)->stats.ld_peer)
 
 static int resort_shared_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
 {
 	struct c2c_hist_entry *c2c_he;
 	c2c_he = container_of(he, struct c2c_hist_entry, he);
 
-	if (HAS_HITMS(c2c_he)) {
+	if ((c2c.display != DISPLAY_SNP_PEER && HAS_HITMS(c2c_he)) ||
+	    (c2c.display == DISPLAY_SNP_PEER && HAS_PEER(c2c_he))) {
 		c2c.shared_clines++;
 		c2c_add_stats(&c2c.shared_clines_stats, &c2c_he->stats);
 	}
@@ -2404,7 +2430,7 @@ static void print_c2c_info(FILE *out, struct perf_session *session)
 		fprintf(out, "%-36s: %s\n", first ? "  Events" : "", evsel__name(evsel));
 		first = false;
 	}
-	fprintf(out, "  Cachelines sort on                : %s HITMs\n",
+	fprintf(out, "  Cachelines sort on                : %s\n",
 		display_str[c2c.display]);
 	fprintf(out, "  Cacheline data grouping           : %s\n", c2c.cl_sort);
 }
@@ -2561,7 +2587,7 @@ static int perf_c2c_browser__title(struct hist_browser *browser,
 {
 	scnprintf(bf, size,
 		  "Shared Data Cache Line Table     "
-		  "(%lu entries, sorted on %s HITMs)",
+		  "(%lu entries, sorted on %s)",
 		  browser->nr_non_filtered_entries,
 		  display_str[c2c.display]);
 	return 0;
@@ -2767,6 +2793,8 @@ static int setup_display(const char *str)
 		c2c.display = DISPLAY_RMT_HITM;
 	else if (!strcmp(display, "lcl"))
 		c2c.display = DISPLAY_LCL_HITM;
+	else if (!strcmp(display, "peer"))
+		c2c.display = DISPLAY_SNP_PEER;
 	else {
 		pr_err("failed: unknown display type: %s\n", str);
 		return -1;
@@ -2848,6 +2876,7 @@ static int build_cl_output(char *cl_sort, bool no_source)
 static int setup_coalesce(const char *coalesce, bool no_source)
 {
 	const char *c = coalesce ?: coalesce_default;
+	const char *sort_str = NULL;
 
 	if (asprintf(&c2c.cl_sort, "offset,%s", c) < 0)
 		return -ENOMEM;
@@ -2855,12 +2884,16 @@ static int setup_coalesce(const char *coalesce, bool no_source)
 	if (build_cl_output(c2c.cl_sort, no_source))
 		return -1;
 
-	if (asprintf(&c2c.cl_resort, "offset,%s",
-		     c2c.display == DISPLAY_TOT_HITM ?
-		     "tot_hitm" :
-		     c2c.display == DISPLAY_RMT_HITM ?
-		     "rmt_hitm,lcl_hitm" :
-		     "lcl_hitm,rmt_hitm") < 0)
+	if (c2c.display == DISPLAY_TOT_HITM)
+		sort_str = "tot_hitm";
+	else if (c2c.display == DISPLAY_RMT_HITM)
+		sort_str = "rmt_hitm,lcl_hitm";
+	else if (c2c.display == DISPLAY_LCL_HITM)
+		sort_str = "lcl_hitm,rmt_hitm";
+	else if (c2c.display == DISPLAY_SNP_PEER)
+		sort_str = "ld_peer";
+
+	if (asprintf(&c2c.cl_resort, "offset,%s", sort_str) < 0)
 		return -ENOMEM;
 
 	pr_debug("coalesce sort   fields: %s\n", c2c.cl_sort);
@@ -3021,6 +3054,8 @@ static int perf_c2c__report(int argc, const char **argv)
 		sort_str = "rmt_hitm";
 	else if (c2c.display == DISPLAY_LCL_HITM)
 		sort_str = "lcl_hitm";
+	else if (c2c.display == DISPLAY_SNP_PEER)
+		sort_str = "ld_peer";
 
 	c2c_hists__reinit(&c2c.hists, output_str, sort_str);
 
-- 
2.25.1



* [PATCH v3 10/11] perf c2c: Update documentation for new display option 'peer'
  2022-05-18  5:57 [PATCH v3 00/11] perf c2c: Support display for Arm64 Leo Yan
                   ` (8 preceding siblings ...)
  2022-05-18  5:57 ` [PATCH v3 09/11] perf c2c: Sort on peer snooping for load operations Leo Yan
@ 2022-05-18  5:57 ` Leo Yan
  2022-05-18  5:57 ` [PATCH v3 11/11] perf c2c: Use 'peer' as default display for Arm64 Leo Yan
  2022-05-23  8:43 ` [PATCH v3 00/11] perf c2c: Support " Jiri Olsa
  11 siblings, 0 replies; 16+ messages in thread
From: Leo Yan @ 2022-05-18  5:57 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Like Xu, Alyssa Ross, Ian Rogers, Kajol Jain, Adam Li, Li Huafei,
	German Gomez, James Clark, Kan Liang, Ali Saidi, Joe Mario,
	linux-perf-users, linux-kernel
  Cc: Leo Yan

Now that the new display option 'peer' has been introduced, update the
documentation to reflect it.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Ali Saidi <alisaidi@amazon.com>
---
 tools/perf/Documentation/perf-c2c.txt | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/tools/perf/Documentation/perf-c2c.txt b/tools/perf/Documentation/perf-c2c.txt
index 6f69173731aa..df9536be856b 100644
--- a/tools/perf/Documentation/perf-c2c.txt
+++ b/tools/perf/Documentation/perf-c2c.txt
@@ -109,7 +109,8 @@ REPORT OPTIONS
 
 -d::
 --display::
-	Switch to HITM type (rmt, lcl) to display and sort on. Total HITMs as default.
+	Switch to HITM type (rmt, lcl) or peer snooping type (peer) to display
+	and sort on. Total HITMs (tot) as default.
 
 --stitch-lbr::
 	Show callgraph with stitched LBRs, which may have more complete
@@ -174,12 +175,18 @@ For each cacheline in the 1) list we display following data:
   Cacheline
   - cacheline address (hex number)
 
-  Rmt/Lcl Hitm
+  Rmt/Lcl Hitm (For display with HITM types)
   - cacheline percentage of all Remote/Local HITM accesses
 
+  Snoop Peer (For display with peer type)
+  - cacheline percentage of peer accesses
+
   LLC Load Hitm - Total, LclHitm, RmtHitm
   - count of Total/Local/Remote load HITMs
 
+  Snoop Peer
+  - count of peer accesses
+
   Total records
   - sum of all cachelines accesses
 
@@ -201,7 +208,9 @@ For each cacheline in the 1) list we display following data:
   - count of LLC load accesses, includes LLC hits and LLC HITMs
 
   RMT Load Hit - RmtHit, RmtHitm
-  - count of remote load accesses, includes remote hits and remote HITMs
+  - count of remote load accesses, includes remote hits and remote HITMs;
+    on Arm Neoverse cores, RmtHit accounts for all remote accesses,
+    including remote DRAM or any upward cache level in the remote node
 
   Load Dram - Lcl, Rmt
   - count of local and remote DRAM accesses
@@ -211,6 +220,9 @@ For each offset in the 2) list we display following data:
   HITM - Rmt, Lcl
   - % of Remote/Local HITM accesses for given offset within cacheline
 
+  Snoop Peer
+  - % of peer accesses for given offset within cacheline
+
   Store Refs - L1 Hit, L1 Miss, N/A
   - % of store accesses that hit L1, missed L1 and N/A (no available) memory
     level for given offset within cacheline
@@ -227,8 +239,9 @@ For each offset in the 2) list we display following data:
   Code address
   - code address responsible for the accesses
 
-  cycles - rmt hitm, lcl hitm, load
-    - sum of cycles for given accesses - Remote/Local HITM and generic load
+  cycles - rmt hitm, lcl hitm, load, peer
+    - sum of cycles for given accesses - Remote/Local HITM, generic load and
+      peer access
 
   cpu cnt
     - number of cpus that participated on the access
@@ -251,7 +264,8 @@ The 'Node' field displays nodes that accesses given cacheline
 offset. Its output comes in 3 flavors:
   - node IDs separated by ','
   - node IDs with stats for each ID, in following format:
-      Node{cpus %hitms %stores}
+      Node{cpus %hitms %stores} (For display with HITM types)
+      Node{cpus %peers %stores} (For display with "peer" type)
   - node IDs with list of affected CPUs in following format:
       Node{cpu list}
 
-- 
2.25.1


* [PATCH v3 11/11] perf c2c: Use 'peer' as default display for Arm64
  2022-05-18  5:57 [PATCH v3 00/11] perf c2c: Support display for Arm64 Leo Yan
                   ` (9 preceding siblings ...)
  2022-05-18  5:57 ` [PATCH v3 10/11] perf c2c: Update documentation for new display option 'peer' Leo Yan
@ 2022-05-18  5:57 ` Leo Yan
  2022-05-23  8:43 ` [PATCH v3 00/11] perf c2c: Support " Jiri Olsa
  11 siblings, 0 replies; 16+ messages in thread
From: Leo Yan @ 2022-05-18  5:57 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Like Xu, Alyssa Ross, Ian Rogers, Kajol Jain, Adam Li, Li Huafei,
	German Gomez, James Clark, Kan Liang, Ali Saidi, Joe Mario,
	linux-perf-users, linux-kernel
  Cc: Leo Yan

Since the Arm64 arch doesn't support the HITM flag, use 'peer' as the
default if the user doesn't specify a display type; other arches still
use 'tot' as the default display type.
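
The selection rule described above can be sketched in isolation as
follows. This is only an illustration: pick_display() is a hypothetical
helper that does not exist in builtin-c2c.c, and it assumes the
architecture string is the one recorded in the perf.data header, as the
patch reads it via perf_env__arch().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: choose the display type.
 * 'user_display' is the value passed with -d/--display, or NULL if the
 * option was not given. 'arch' is the architecture string taken from
 * the perf.data header.
 */
static const char *pick_display(const char *user_display, const char *arch)
{
	assert(arch != NULL);

	/* An explicit user choice always wins. */
	if (user_display)
		return user_display;

	/* Arm64 provides no HITM flag, so default to peer snooping. */
	if (!strcmp(arch, "arm64"))
		return "peer";

	/* Every other arch keeps the existing default. */
	return "tot";
}
```

With this rule, pick_display(NULL, "arm64") yields "peer", while an
explicit "rmt" from the user is returned unchanged on any arch.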

Suggested-by: Ali Saidi <alisaidi@amazon.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/builtin-c2c.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 3bd422c5e8ae..042431b7b6ba 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -2785,7 +2785,7 @@ static int setup_callchain(struct evlist *evlist)
 
 static int setup_display(const char *str)
 {
-	const char *display = str ?: "tot";
+	const char *display = str;
 
 	if (!strcmp(display, "tot"))
 		c2c.display = DISPLAY_TOT_HITM;
@@ -2971,9 +2971,6 @@ static int perf_c2c__report(int argc, const char **argv)
 	data.path  = input_name;
 	data.force = symbol_conf.force;
 
-	err = setup_display(display);
-	if (err)
-		goto out;
 
 	err = setup_coalesce(coalesce, no_source);
 	if (err) {
@@ -2994,6 +2991,22 @@ static int perf_c2c__report(int argc, const char **argv)
 		goto out;
 	}
 
+	/*
+	 * Use 'tot' as the default display type if the user doesn't specify
+	 * one; since the Arm64 platform doesn't support the HITM flag, use
+	 * 'peer' as its default display type instead.
+	 */
+	if (!display) {
+		if (!strcmp(perf_env__arch(&session->header.env), "arm64"))
+			display = "peer";
+		else
+			display = "tot";
+	}
+
+	err = setup_display(display);
+	if (err)
+		goto out_session;
+
 	session->itrace_synth_opts = &itrace_synth_opts;
 
 	err = setup_nodes(session);
-- 
2.25.1


* Re: [PATCH v3 00/11] perf c2c: Support display for Arm64
  2022-05-18  5:57 [PATCH v3 00/11] perf c2c: Support display for Arm64 Leo Yan
                   ` (10 preceding siblings ...)
  2022-05-18  5:57 ` [PATCH v3 11/11] perf c2c: Use 'peer' as default display for Arm64 Leo Yan
@ 2022-05-23  8:43 ` Jiri Olsa
  2022-05-23 12:43   ` Arnaldo Carvalho de Melo
  11 siblings, 1 reply; 16+ messages in thread
From: Jiri Olsa @ 2022-05-23  8:43 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Namhyung Kim, Like Xu,
	Alyssa Ross, Ian Rogers, Kajol Jain, Adam Li, Li Huafei,
	German Gomez, James Clark, Kan Liang, Ali Saidi, Joe Mario,
	linux-perf-users, linux-kernel

On Wed, May 18, 2022 at 01:57:18PM +0800, Leo Yan wrote:
> Arm64 Neoverse CPUs supports data source in Arm SPE trace, this allows
> us to detect cache line contention and transfers.
> 
> Unlike x86 architecture, Arm SPE trace data cannot provide 'HITM'
> snooping flag, Ali Said has a patch set v9 "perf: arm-spe: Decode SPE
> source and use for perf c2c" [1] which introduces 'peer' flag and
> synthesizes memory samples with this flag.
> 
> Based on patch set [1], this patch set is to finish the second half work
> to consume the 'peer' flag in perf c2c tool, it adds an extra display
> 'peer' mode.
> 
> Patches 01, 02 and 03 are to support 'N/A' metrics for store operations.
> 
> Patches 04 and 05 adds statistics and dimensions for memory samples with
> peer flag.
> 
> Patches 06, 07, 08 are for refactoring, it refines the code with more
> general naming so this can allow us to easier to extend display modes
> but not strictly bound to HITM tags.
> 
> Patches 09, 10 and 11 are to extend display 'peer' mode, it also updates
> the document and also changes to use 'peer' mode as default mode on
> Arm64 arches.
> 
> This patch set has been verified for both x86 and Arm64 memory samples.
> 
> The display result with x86 memory samples:
> 
>   =================================================
>              Shared Data Cache Line Table          
>   =================================================
>   #
>   #        ----------- Cacheline ----------      Tot  ------- Load Hitm -------    Snoop    Total    Total    Total  --------- Stores --------  ----- Core Load Hit -----  - LLC Load Hit --  - RMT Load Hit --  --- Load Dram ----
>   # Index             Address  Node  PA cnt     Hitm    Total  LclHitm  RmtHitm     Peer  records    Loads   Stores    L1Hit   L1Miss      N/A       FB       L1       L2    LclHit  LclHitm    RmtHit  RmtHitm       Lcl       Rmt
>   # .....  ..................  ....  ......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  ........  .......  ........  ........
>   #
>         0      0x55c8971f0080     0    1967   66.14%      252      252        0        0     6044     3550     2494     2024      470        0      528     2672       78        20      252         0        0         0         0
>         1      0x55c8971f00c0     0       1   33.86%      129      129        0        0      914      914        0        0        0        0      272      374       52        87      129         0        0         0         0
> 
>   =================================================
>         Shared Cache Line Distribution Pareto      
>   =================================================
>   #
>   #        ----- HITM -----    Snoop  ------- Store Refs ------  --------- Data address ---------                      --------------- cycles ---------------    Total       cpu                                     Shared                               
>   #   Num  RmtHitm  LclHitm     Peer   L1 Hit  L1 Miss      N/A              Offset  Node  PA cnt        Code address  rmt hitm  lcl hitm      load      peer  records       cnt                  Symbol             Object              Source:Line  Node
>   # .....  .......  .......  .......  .......  .......  .......  ..................  ....  ......  ..................  ........  ........  ........  ........  .......  ........  ......................  .................  .......................  ....
>   #
>     -------------------------------------------------------------------------------
>         0        0      252        0     2024      470        0      0x55c8971f0080
>     -------------------------------------------------------------------------------
>              0.00%   12.30%    0.00%    0.00%    0.00%    0.00%                 0x0     0       1      0x55c8971ed3e9         0      1313       863         0     1222         3  [.] 0x00000000000013e9  false_sharing.exe  false_sharing.exe[13e9]   0
>              0.00%    0.79%    0.00%   90.51%    0.00%    0.00%                 0x0     0       1      0x55c8971ed3e2         0      1800       878         0     3029         3  [.] 0x00000000000013e2  false_sharing.exe  false_sharing.exe[13e2]   0
>              0.00%    0.00%    0.00%    9.49%  100.00%    0.00%                 0x0     0       1      0x55c8971ed3f4         0         0         0         0      662         3  [.] 0x00000000000013f4  false_sharing.exe  false_sharing.exe[13f4]   0
>              0.00%   86.90%    0.00%    0.00%    0.00%    0.00%                0x20     0       1      0x55c8971ed447         0       141       103         0     1131         2  [.] 0x0000000000001447  false_sharing.exe  false_sharing.exe[1447]   0
> 
>     -------------------------------------------------------------------------------
>         1        0      129        0        0        0        0      0x55c8971f00c0
>     -------------------------------------------------------------------------------
>              0.00%  100.00%    0.00%    0.00%    0.00%    0.00%                0x20     0       1      0x55c8971ed455         0        88        94         0      914         2  [.] 0x0000000000001455  false_sharing.exe  false_sharing.exe[1455]   0
> 
> 
> The display result with Arm SPE memory samples:
> 
>   =================================================
>              Shared Data Cache Line Table          
>   =================================================
>   #
>   #        ----------- Cacheline ----------    Snoop  ------- Load Hitm -------    Snoop    Total    Total    Total  --------- Stores --------  ----- Core Load Hit -----  - LLC Load Hit --  - RMT Load Hit --  --- Load Dram ----
>   # Index             Address  Node  PA cnt     Peer    Total  LclHitm  RmtHitm     Peer  records    Loads   Stores    L1Hit   L1Miss      N/A       FB       L1       L2    LclHit  LclHitm    RmtHit  RmtHitm       Lcl       Rmt
>   # .....  ..................  ....  ......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  ........  .......  ........  ........
>   #
>         0      0xaaaac17d6000   N/A       0  100.00%        0        0        0       99    18851    18851        0        0        0        0        0    18752        0        99        0         0        0         0         0
> 
>   =================================================
>         Shared Cache Line Distribution Pareto      
>   =================================================
>   #
>   #        ----- HITM -----    Snoop  ------- Store Refs ------  --------- Data address ---------                      --------------- cycles ---------------    Total       cpu                                    Shared                       
>   #   Num  RmtHitm  LclHitm     Peer   L1 Hit  L1 Miss      N/A              Offset  Node  PA cnt        Code address  rmt hitm  lcl hitm      load      peer  records       cnt                  Symbol            Object      Source:Line  Node
>   # .....  .......  .......  .......  .......  .......  .......  ..................  ....  ......  ..................  ........  ........  ........  ........  .......  ........  ......................  ................  ...............  ....
>   #
>     -------------------------------------------------------------------------------
>         0        0        0       99        0        0        0      0xaaaac17d6000
>     -------------------------------------------------------------------------------
>              0.00%    0.00%    6.06%    0.00%    0.00%    0.00%                0x20   N/A       0      0xaaaac17c25ac         0         0        43       375    18469         2  [.] 0x00000000000025ac  memstress         memstress[25ac]   0
>              0.00%    0.00%   93.94%    0.00%    0.00%    0.00%                0x29   N/A       0      0xaaaac17c3e88         0         0       173       180      135         2  [.] 0x0000000000003e88  memstress         memstress[3e88]   0
> 
> [1] https://lore.kernel.org/lkml/20220517020326.18580-1-alisaidi@amazon.com/
> 
> Changes from v2:
> * Updated patch 04 to account metrics for both cache level and ld_peer
>   for PEER flag;
> * Updated document for metric 'rmt_hit' which is accounted for all
>   remote accesses (include remote DRAM and any upward caches).

LGTM

Acked-by: Jiri Olsa <jolsa@kernel.org>

thanks,
jirka

> 
> Changes from v1:
> * Updated patches 01, 02 and 03 to support 'N/A' metrics for store
>   operations, so can align with the patch set [1] for store samples.
> 
> 
> Leo Yan (11):
>   perf mem: Add stats for store operation with no available memory level
>   perf c2c: Add dimensions for 'N/A' metrics of store operation
>   perf c2c: Update documentation for store metric 'N/A'
>   perf mem: Add statistics for peer snooping
>   perf c2c: Add dimensions for peer load operations
>   perf c2c: Use explicit names for display macros
>   perf c2c: Rename dimension from 'percent_hitm' to
>     'percent_costly_snoop'
>   perf c2c: Refactor node header
>   perf c2c: Sort on peer snooping for load operations
>   perf c2c: Update documentation for new display option 'peer'
>   perf c2c: Use 'peer' as default display for Arm64
> 
>  tools/perf/Documentation/perf-c2c.txt |  34 ++-
>  tools/perf/builtin-c2c.c              | 357 ++++++++++++++++++++------
>  tools/perf/util/mem-events.c          |  25 +-
>  tools/perf/util/mem-events.h          |   2 +
>  4 files changed, 331 insertions(+), 87 deletions(-)
> 
> -- 
> 2.25.1
> 

* Re: [PATCH v3 04/11] perf mem: Add statistics for peer snooping
  2022-05-18  5:57 ` [PATCH v3 04/11] perf mem: Add statistics for peer snooping Leo Yan
@ 2022-05-23 12:38   ` Arnaldo Carvalho de Melo
  2022-05-23 12:46     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 16+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-05-23 12:38 UTC (permalink / raw)
  To: Leo Yan
  Cc: Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Like Xu, Alyssa Ross, Ian Rogers,
	Kajol Jain, Adam Li, Li Huafei, German Gomez, James Clark,
	Kan Liang, Ali Saidi, Joe Mario, linux-perf-users, linux-kernel

Em Wed, May 18, 2022 at 01:57:22PM +0800, Leo Yan escreveu:
> Since the flag PERF_MEM_SNOOPX_PEER is added to support cache snooping
> from peer cache line, it can come from a peer core, a peer cluster, or
> a remote NUMA node.
> 
> This patch adds statistics for the flag PERF_MEM_SNOOPX_PEER.  Note, we
> take PERF_MEM_SNOOPX_PEER as an affiliated info, it needs to cooperate
> with cache level statistics.  Therefore, we account the load operations
> for both the cache level's metrics (e.g. ld_l2hit, ld_llchit, etc.) and
> the metric 'ld_peer' when flag PERF_MEM_SNOOPX_PEER is set.
> 
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> Tested-by: Ali Saidi <alisaidi@amazon.com>

util/mem-events.c: In function ‘c2c_decode_stats’:
util/mem-events.c:536:17: error: ‘PERF_MEM_SNOOPX_PEER’ undeclared (first use in this function); did you mean ‘PERF_MEM_SNOOPX_FWD’?


Should I fix this as suggested by the compiler?

  536 | #define P(a, b) PERF_MEM_##a##_##b
      |                 ^~~~~~~~~
util/mem-events.c:562:46: note: in expansion of macro ‘P’
  562 |                                 if (snoopx & P(SNOOPX, PEER))
      |                                              ^
util/mem-events.c:536:17: note: each undeclared identifier is reported only once for each function it appears in
  536 | #define P(a, b) PERF_MEM_##a##_##b
      |                 ^~~~~~~~~
util/mem-events.c:562:46: note: in expansion of macro ‘P’
  562 |                                 if (snoopx & P(SNOOPX, PEER))
      |                                              ^
make[4]: *** [/var/home/acme/git/perf/tools/build/Makefile.build:96: /tmp/build/perf/util/mem-events.o] Error 1
make[4]: *** Waiting for unfinished jobs....
  LD      /tmp/build/perf/util/scripting-engines/perf-in.o
make[3]: *** [/var/home/acme/git/perf/tools/build/Makefile.build:139: util] Error 2
make[2]: *** [Makefile.perf:664: /tmp/build/perf/perf-in.o] Error 2
make[1]: *** [Makefile.perf:240: sub-make] Error 2
make: *** [Makefile:113: install-bin] Error 2
make: Leaving directory '/var/home/acme/git/perf/tools/perf'

 Performance counter stats for 'make -k BUILD_BPF_SKEL=1 CORESIGHT=1 PYTHON=python3 O=/tmp/build/perf -C tools/perf install-bin':

    31,749,639,340      cycles:u
    57,052,398,827      instructions:u            #    1.80  insn per cycle

       2.123830023 seconds time elapsed

       7.146520000 seconds user
       1.707080000 seconds sys


⬢[acme@toolbox perf]$
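
For reference, a minimal sketch of why the compiler reports this
identifier: P() pastes its two arguments into one token, so
P(SNOOPX, PEER) becomes PERF_MEM_SNOOPX_PEER, which is presumably only
declared once the base patch set [1] mentioned in the cover letter is
applied. The 0x02 value below and the snoopx_is_peer() helper are
assumptions made so the sketch builds on its own; they are not taken
from this series. Substituting PERF_MEM_SNOOPX_FWD, as the compiler
suggests, would likely test a different flag.

```c
#include <assert.h>

/*
 * Assumption: this value stands in for the definition that the base
 * patch set adds to the perf_event.h UAPI header. It is defined here
 * only so that the sketch compiles without that header.
 */
#ifndef PERF_MEM_SNOOPX_PEER
#define PERF_MEM_SNOOPX_PEER 0x02
#endif

/* Same token-pasting helper as in util/mem-events.c. */
#define P(a, b) PERF_MEM_##a##_##b

/* Hypothetical helper: returns non-zero when the peer flag is set. */
static int snoopx_is_peer(unsigned long long snoopx)
{
	/* P(SNOOPX, PEER) expands to PERF_MEM_SNOOPX_PEER. */
	return (snoopx & P(SNOOPX, PEER)) != 0;
}
```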

> ---
>  tools/perf/util/mem-events.c | 22 +++++++++++++++++++---
>  tools/perf/util/mem-events.h |  1 +
>  2 files changed, 20 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
> index 5dca1882c284..9de0eb3a1200 100644
> --- a/tools/perf/util/mem-events.c
> +++ b/tools/perf/util/mem-events.c
> @@ -525,6 +525,7 @@ int c2c_decode_stats(struct c2c_stats *stats, struct mem_info *mi)
>  	u64 op     = data_src->mem_op;
>  	u64 lvl    = data_src->mem_lvl;
>  	u64 snoop  = data_src->mem_snoop;
> +	u64 snoopx = data_src->mem_snoopx;
>  	u64 lock   = data_src->mem_lock;
>  	u64 blk    = data_src->mem_blk;
>  	/*
> @@ -567,18 +568,28 @@ do {				\
>  			if (lvl & P(LVL, IO))  stats->ld_io++;
>  			if (lvl & P(LVL, LFB)) stats->ld_fbhit++;
>  			if (lvl & P(LVL, L1 )) stats->ld_l1hit++;
> -			if (lvl & P(LVL, L2 )) stats->ld_l2hit++;
> +			if (lvl & P(LVL, L2)) {
> +				stats->ld_l2hit++;
> +
> +				if (snoopx & P(SNOOPX, PEER))
> +					stats->ld_peer++;
> +			}
>  			if (lvl & P(LVL, L3 )) {
>  				if (snoop & P(SNOOP, HITM))
>  					HITM_INC(lcl_hitm);
>  				else
>  					stats->ld_llchit++;
> +
> +				if (snoopx & P(SNOOPX, PEER))
> +					stats->ld_peer++;
>  			}
>  
>  			if (lvl & P(LVL, LOC_RAM)) {
>  				stats->lcl_dram++;
>  				if (snoop & P(SNOOP, HIT))
>  					stats->ld_shared++;
> +				else if (snoopx & P(SNOOPX, PEER))
> +					stats->ld_peer++;
>  				else
>  					stats->ld_excl++;
>  			}
> @@ -597,10 +608,14 @@ do {				\
>  		if ((lvl & P(LVL, REM_CCE1)) ||
>  		    (lvl & P(LVL, REM_CCE2)) ||
>  		     mrem) {
> -			if (snoop & P(SNOOP, HIT))
> +			if (snoop & P(SNOOP, HIT)) {
>  				stats->rmt_hit++;
> -			else if (snoop & P(SNOOP, HITM))
> +			} else if (snoop & P(SNOOP, HITM)) {
>  				HITM_INC(rmt_hitm);
> +			} else if (snoopx & P(SNOOPX, PEER)) {
> +				stats->rmt_hit++;
> +				stats->ld_peer++;
> +			}
>  		}
>  
>  		if ((lvl & P(LVL, MISS)))
> @@ -661,6 +676,7 @@ void c2c_add_stats(struct c2c_stats *stats, struct c2c_stats *add)
>  	stats->ld_l1hit		+= add->ld_l1hit;
>  	stats->ld_l2hit		+= add->ld_l2hit;
>  	stats->ld_llchit	+= add->ld_llchit;
> +	stats->ld_peer		+= add->ld_peer;
>  	stats->lcl_hitm		+= add->lcl_hitm;
>  	stats->rmt_hitm		+= add->rmt_hitm;
>  	stats->tot_hitm		+= add->tot_hitm;
> diff --git a/tools/perf/util/mem-events.h b/tools/perf/util/mem-events.h
> index 8a8b568baeee..4879b841c841 100644
> --- a/tools/perf/util/mem-events.h
> +++ b/tools/perf/util/mem-events.h
> @@ -75,6 +75,7 @@ struct c2c_stats {
>  	u32	ld_l1hit;            /* count of loads that hit L1D */
>  	u32	ld_l2hit;            /* count of loads that hit L2D */
>  	u32	ld_llchit;           /* count of loads that hit LLC */
> +	u32	ld_peer;             /* count of loads that hit peer core or cluster cache */
>  	u32	lcl_hitm;            /* count of loads with local HITM  */
>  	u32	rmt_hitm;            /* count of loads with remote HITM */
>  	u32	tot_hitm;            /* count of loads with local and remote HITM */
> -- 
> 2.25.1

-- 

- Arnaldo

* Re: [PATCH v3 00/11] perf c2c: Support display for Arm64
  2022-05-23  8:43 ` [PATCH v3 00/11] perf c2c: Support " Jiri Olsa
@ 2022-05-23 12:43   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 16+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-05-23 12:43 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Leo Yan, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Alexander Shishkin, Namhyung Kim, Like Xu, Alyssa Ross,
	Ian Rogers, Kajol Jain, Adam Li, Li Huafei, German Gomez,
	James Clark, Kan Liang, Ali Saidi, Joe Mario, linux-perf-users,
	linux-kernel

Em Mon, May 23, 2022 at 10:43:47AM +0200, Jiri Olsa escreveu:
> On Wed, May 18, 2022 at 01:57:18PM +0800, Leo Yan wrote:
> > Arm64 Neoverse CPUs supports data source in Arm SPE trace, this allows
> > us to detect cache line contention and transfers.
> > 
> > Unlike x86 architecture, Arm SPE trace data cannot provide 'HITM'
> > snooping flag, Ali Said has a patch set v9 "perf: arm-spe: Decode SPE
> > source and use for perf c2c" [1] which introduces 'peer' flag and
> > synthesizes memory samples with this flag.
> > 
> > Based on patch set [1], this patch set is to finish the second half work
> > to consume the 'peer' flag in perf c2c tool, it adds an extra display
> > 'peer' mode.

Ok, I'll look at the base patch set...

> > Patches 01, 02 and 03 are to support 'N/A' metrics for store operations.
> > 
> > Patches 04 and 05 adds statistics and dimensions for memory samples with
> > peer flag.
> > 
> > Patches 06, 07, 08 are for refactoring, it refines the code with more
> > general naming so this can allow us to easier to extend display modes
> > but not strictly bound to HITM tags.
> > 
> > Patches 09, 10 and 11 are to extend display 'peer' mode, it also updates
> > the document and also changes to use 'peer' mode as default mode on
> > Arm64 arches.
> > 
> > This patch set has been verified for both x86 and Arm64 memory samples.
> > 
> > The display result with x86 memory samples:
> > 
> >   =================================================
> >              Shared Data Cache Line Table          
> >   =================================================
> >   #
> >   #        ----------- Cacheline ----------      Tot  ------- Load Hitm -------    Snoop    Total    Total    Total  --------- Stores --------  ----- Core Load Hit -----  - LLC Load Hit --  - RMT Load Hit --  --- Load Dram ----
> >   # Index             Address  Node  PA cnt     Hitm    Total  LclHitm  RmtHitm     Peer  records    Loads   Stores    L1Hit   L1Miss      N/A       FB       L1       L2    LclHit  LclHitm    RmtHit  RmtHitm       Lcl       Rmt
> >   # .....  ..................  ....  ......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  ........  .......  ........  ........
> >   #
> >         0      0x55c8971f0080     0    1967   66.14%      252      252        0        0     6044     3550     2494     2024      470        0      528     2672       78        20      252         0        0         0         0
> >         1      0x55c8971f00c0     0       1   33.86%      129      129        0        0      914      914        0        0        0        0      272      374       52        87      129         0        0         0         0
> > 
> >   =================================================
> >         Shared Cache Line Distribution Pareto      
> >   =================================================
> >   #
> >   #        ----- HITM -----    Snoop  ------- Store Refs ------  --------- Data address ---------                      --------------- cycles ---------------    Total       cpu                                     Shared                               
> >   #   Num  RmtHitm  LclHitm     Peer   L1 Hit  L1 Miss      N/A              Offset  Node  PA cnt        Code address  rmt hitm  lcl hitm      load      peer  records       cnt                  Symbol             Object              Source:Line  Node
> >   # .....  .......  .......  .......  .......  .......  .......  ..................  ....  ......  ..................  ........  ........  ........  ........  .......  ........  ......................  .................  .......................  ....
> >   #
> >     -------------------------------------------------------------------------------
> >         0        0      252        0     2024      470        0      0x55c8971f0080
> >     -------------------------------------------------------------------------------
> >              0.00%   12.30%    0.00%    0.00%    0.00%    0.00%                 0x0     0       1      0x55c8971ed3e9         0      1313       863         0     1222         3  [.] 0x00000000000013e9  false_sharing.exe  false_sharing.exe[13e9]   0
> >              0.00%    0.79%    0.00%   90.51%    0.00%    0.00%                 0x0     0       1      0x55c8971ed3e2         0      1800       878         0     3029         3  [.] 0x00000000000013e2  false_sharing.exe  false_sharing.exe[13e2]   0
> >              0.00%    0.00%    0.00%    9.49%  100.00%    0.00%                 0x0     0       1      0x55c8971ed3f4         0         0         0         0      662         3  [.] 0x00000000000013f4  false_sharing.exe  false_sharing.exe[13f4]   0
> >              0.00%   86.90%    0.00%    0.00%    0.00%    0.00%                0x20     0       1      0x55c8971ed447         0       141       103         0     1131         2  [.] 0x0000000000001447  false_sharing.exe  false_sharing.exe[1447]   0
> > 
> >     -------------------------------------------------------------------------------
> >         1        0      129        0        0        0        0      0x55c8971f00c0
> >     -------------------------------------------------------------------------------
> >              0.00%  100.00%    0.00%    0.00%    0.00%    0.00%                0x20     0       1      0x55c8971ed455         0        88        94         0      914         2  [.] 0x0000000000001455  false_sharing.exe  false_sharing.exe[1455]   0
> > 
> > 
> > The display result with Arm SPE memory samples:
> > 
> >   =================================================
> >              Shared Data Cache Line Table          
> >   =================================================
> >   #
> >   #        ----------- Cacheline ----------    Snoop  ------- Load Hitm -------    Snoop    Total    Total    Total  --------- Stores --------  ----- Core Load Hit -----  - LLC Load Hit --  - RMT Load Hit --  --- Load Dram ----
> >   # Index             Address  Node  PA cnt     Peer    Total  LclHitm  RmtHitm     Peer  records    Loads   Stores    L1Hit   L1Miss      N/A       FB       L1       L2    LclHit  LclHitm    RmtHit  RmtHitm       Lcl       Rmt
> >   # .....  ..................  ....  ......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  ........  .......  ........  ........
> >   #
> >         0      0xaaaac17d6000   N/A       0  100.00%        0        0        0       99    18851    18851        0        0        0        0        0    18752        0        99        0         0        0         0         0
> > 
> >   =================================================
> >         Shared Cache Line Distribution Pareto      
> >   =================================================
> >   #
> >   #        ----- HITM -----    Snoop  ------- Store Refs ------  --------- Data address ---------                      --------------- cycles ---------------    Total       cpu                                    Shared                       
> >   #   Num  RmtHitm  LclHitm     Peer   L1 Hit  L1 Miss      N/A              Offset  Node  PA cnt        Code address  rmt hitm  lcl hitm      load      peer  records       cnt                  Symbol            Object      Source:Line  Node
> >   # .....  .......  .......  .......  .......  .......  .......  ..................  ....  ......  ..................  ........  ........  ........  ........  .......  ........  ......................  ................  ...............  ....
> >   #
> >     -------------------------------------------------------------------------------
> >         0        0        0       99        0        0        0      0xaaaac17d6000
> >     -------------------------------------------------------------------------------
> >              0.00%    0.00%    6.06%    0.00%    0.00%    0.00%                0x20   N/A       0      0xaaaac17c25ac         0         0        43       375    18469         2  [.] 0x00000000000025ac  memstress         memstress[25ac]   0
> >              0.00%    0.00%   93.94%    0.00%    0.00%    0.00%                0x29   N/A       0      0xaaaac17c3e88         0         0       173       180      135         2  [.] 0x0000000000003e88  memstress         memstress[3e88]   0
> > 
> > [1] https://lore.kernel.org/lkml/20220517020326.18580-1-alisaidi@amazon.com/
> > 
> > Changes from v2:
> > * Updated patch 04 to account metrics for both cache level and ld_peer
> >   for PEER flag;
> > * Updated document for metric 'rmt_hit' which is accounted for all
> >   remote accesses (include remote DRAM and any upward caches).
> 
> LGTM
> 
> Acked-by: Jiri Olsa <jolsa@kernel.org>
> 
> thanks,
> jirka
> 
> > 
> > Changes from v1:
> > * Updated patches 01, 02 and 03 to support 'N/A' metrics for store
> >   operations, so they align with the patch set [1] for store samples.
> > 
> > 
> > Leo Yan (11):
> >   perf mem: Add stats for store operation with no available memory level
> >   perf c2c: Add dimensions for 'N/A' metrics of store operation
> >   perf c2c: Update documentation for store metric 'N/A'
> >   perf mem: Add statistics for peer snooping
> >   perf c2c: Add dimensions for peer load operations
> >   perf c2c: Use explicit names for display macros
> >   perf c2c: Rename dimension from 'percent_hitm' to
> >     'percent_costly_snoop'
> >   perf c2c: Refactor node header
> >   perf c2c: Sort on peer snooping for load operations
> >   perf c2c: Update documentation for new display option 'peer'
> >   perf c2c: Use 'peer' as default display for Arm64
> > 
> >  tools/perf/Documentation/perf-c2c.txt |  34 ++-
> >  tools/perf/builtin-c2c.c              | 357 ++++++++++++++++++++------
> >  tools/perf/util/mem-events.c          |  25 +-
> >  tools/perf/util/mem-events.h          |   2 +
> >  4 files changed, 331 insertions(+), 87 deletions(-)
> > 
> > -- 
> > 2.25.1
> > 

-- 

- Arnaldo


* Re: [PATCH v3 04/11] perf mem: Add statistics for peer snooping
  2022-05-23 12:38   ` Arnaldo Carvalho de Melo
@ 2022-05-23 12:46     ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 16+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-05-23 12:46 UTC (permalink / raw)
  To: Leo Yan
  Cc: Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Like Xu, Alyssa Ross, Ian Rogers,
	Kajol Jain, Adam Li, Li Huafei, German Gomez, James Clark,
	Kan Liang, Ali Saidi, Joe Mario, linux-perf-users, linux-kernel

Em Mon, May 23, 2022 at 09:38:31AM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Wed, May 18, 2022 at 01:57:22PM +0800, Leo Yan escreveu:
> > The flag PERF_MEM_SNOOPX_PEER was added to describe cache snooping from
> > a peer cache line; the data can come from a peer core, a peer cluster,
> > or a remote NUMA node.
> > 
> > This patch adds statistics for the flag PERF_MEM_SNOOPX_PEER.  Note
> > that PERF_MEM_SNOOPX_PEER is treated as auxiliary information that
> > accompanies the cache level statistics.  Therefore, when the flag is
> > set, a load operation is counted in both the cache level's metric
> > (e.g. ld_l2hit, ld_llchit) and the metric 'ld_peer'.
> > 
> > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > Tested-by: Ali Saidi <alisaidi@amazon.com>
> 
> util/mem-events.c: In function ‘c2c_decode_stats’:
> util/mem-events.c:536:17: error: ‘PERF_MEM_SNOOPX_PEER’ undeclared (first use in this function); did you mean ‘PERF_MEM_SNOOPX_FWD’?

So I kept the first three patches:

⬢[acme@toolbox perf]$ git log --oneline -5
12aeaaba087d6d92 (HEAD -> perf/core) perf c2c: Update documentation for store metric 'N/A'
550b4d6f9a7e5ddc perf c2c: Add dimensions for 'N/A' metrics of store operation
9845063710725424 perf mem: Add stats for store operation with no available memory level
508c9fbce0d30da3 perf build: Error for BPF skeletons without LIBBPF
0869331fbaa2c11c Merge remote-tracking branch 'torvalds/master' into perf/core
⬢[acme@toolbox perf]$

These are applied with Jiri's ack; the rest waits for clarification about
the v9 discussion on the base patch set.

- Arnaldo
 
> 
> Should I fix this as suggested by the compiler?
> 
>   536 | #define P(a, b) PERF_MEM_##a##_##b
>       |                 ^~~~~~~~~
> util/mem-events.c:562:46: note: in expansion of macro ‘P’
>   562 |                                 if (snoopx & P(SNOOPX, PEER))
>       |                                              ^
> util/mem-events.c:536:17: note: each undeclared identifier is reported only once for each function it appears in
>   536 | #define P(a, b) PERF_MEM_##a##_##b
>       |                 ^~~~~~~~~
> util/mem-events.c:562:46: note: in expansion of macro ‘P’
>   562 |                                 if (snoopx & P(SNOOPX, PEER))
>       |                                              ^
> make[4]: *** [/var/home/acme/git/perf/tools/build/Makefile.build:96: /tmp/build/perf/util/mem-events.o] Error 1
> make[4]: *** Waiting for unfinished jobs....
>   LD      /tmp/build/perf/util/scripting-engines/perf-in.o
> make[3]: *** [/var/home/acme/git/perf/tools/build/Makefile.build:139: util] Error 2
> make[2]: *** [Makefile.perf:664: /tmp/build/perf/perf-in.o] Error 2
> make[1]: *** [Makefile.perf:240: sub-make] Error 2
> make: *** [Makefile:113: install-bin] Error 2
> make: Leaving directory '/var/home/acme/git/perf/tools/perf'
> 
>  Performance counter stats for 'make -k BUILD_BPF_SKEL=1 CORESIGHT=1 PYTHON=python3 O=/tmp/build/perf -C tools/perf install-bin':
> 
>     31,749,639,340      cycles:u
>     57,052,398,827      instructions:u            #    1.80  insn per cycle
> 
>        2.123830023 seconds time elapsed
> 
>        7.146520000 seconds user
>        1.707080000 seconds sys
> 
> 
> ⬢[acme@toolbox perf]$
> 
> > ---
> >  tools/perf/util/mem-events.c | 22 +++++++++++++++++++---
> >  tools/perf/util/mem-events.h |  1 +
> >  2 files changed, 20 insertions(+), 3 deletions(-)
> > 
> > diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
> > index 5dca1882c284..9de0eb3a1200 100644
> > --- a/tools/perf/util/mem-events.c
> > +++ b/tools/perf/util/mem-events.c
> > @@ -525,6 +525,7 @@ int c2c_decode_stats(struct c2c_stats *stats, struct mem_info *mi)
> >  	u64 op     = data_src->mem_op;
> >  	u64 lvl    = data_src->mem_lvl;
> >  	u64 snoop  = data_src->mem_snoop;
> > +	u64 snoopx = data_src->mem_snoopx;
> >  	u64 lock   = data_src->mem_lock;
> >  	u64 blk    = data_src->mem_blk;
> >  	/*
> > @@ -567,18 +568,28 @@ do {				\
> >  			if (lvl & P(LVL, IO))  stats->ld_io++;
> >  			if (lvl & P(LVL, LFB)) stats->ld_fbhit++;
> >  			if (lvl & P(LVL, L1 )) stats->ld_l1hit++;
> > -			if (lvl & P(LVL, L2 )) stats->ld_l2hit++;
> > +			if (lvl & P(LVL, L2)) {
> > +				stats->ld_l2hit++;
> > +
> > +				if (snoopx & P(SNOOPX, PEER))
> > +					stats->ld_peer++;
> > +			}
> >  			if (lvl & P(LVL, L3 )) {
> >  				if (snoop & P(SNOOP, HITM))
> >  					HITM_INC(lcl_hitm);
> >  				else
> >  					stats->ld_llchit++;
> > +
> > +				if (snoopx & P(SNOOPX, PEER))
> > +					stats->ld_peer++;
> >  			}
> >  
> >  			if (lvl & P(LVL, LOC_RAM)) {
> >  				stats->lcl_dram++;
> >  				if (snoop & P(SNOOP, HIT))
> >  					stats->ld_shared++;
> > +				else if (snoopx & P(SNOOPX, PEER))
> > +					stats->ld_peer++;
> >  				else
> >  					stats->ld_excl++;
> >  			}
> > @@ -597,10 +608,14 @@ do {				\
> >  		if ((lvl & P(LVL, REM_CCE1)) ||
> >  		    (lvl & P(LVL, REM_CCE2)) ||
> >  		     mrem) {
> > -			if (snoop & P(SNOOP, HIT))
> > +			if (snoop & P(SNOOP, HIT)) {
> >  				stats->rmt_hit++;
> > -			else if (snoop & P(SNOOP, HITM))
> > +			} else if (snoop & P(SNOOP, HITM)) {
> >  				HITM_INC(rmt_hitm);
> > +			} else if (snoopx & P(SNOOPX, PEER)) {
> > +				stats->rmt_hit++;
> > +				stats->ld_peer++;
> > +			}
> >  		}
> >  
> >  		if ((lvl & P(LVL, MISS)))
> > @@ -661,6 +676,7 @@ void c2c_add_stats(struct c2c_stats *stats, struct c2c_stats *add)
> >  	stats->ld_l1hit		+= add->ld_l1hit;
> >  	stats->ld_l2hit		+= add->ld_l2hit;
> >  	stats->ld_llchit	+= add->ld_llchit;
> > +	stats->ld_peer		+= add->ld_peer;
> >  	stats->lcl_hitm		+= add->lcl_hitm;
> >  	stats->rmt_hitm		+= add->rmt_hitm;
> >  	stats->tot_hitm		+= add->tot_hitm;
> > diff --git a/tools/perf/util/mem-events.h b/tools/perf/util/mem-events.h
> > index 8a8b568baeee..4879b841c841 100644
> > --- a/tools/perf/util/mem-events.h
> > +++ b/tools/perf/util/mem-events.h
> > @@ -75,6 +75,7 @@ struct c2c_stats {
> >  	u32	ld_l1hit;            /* count of loads that hit L1D */
> >  	u32	ld_l2hit;            /* count of loads that hit L2D */
> >  	u32	ld_llchit;           /* count of loads that hit LLC */
> > +	u32	ld_peer;             /* count of loads that hit peer core or cluster cache */
> >  	u32	lcl_hitm;            /* count of loads with local HITM  */
> >  	u32	rmt_hitm;            /* count of loads with remote HITM */
> >  	u32	tot_hitm;            /* count of loads with local and remote HITM */
> > -- 
> > 2.25.1
> 
> -- 
> 
> - Arnaldo

-- 

- Arnaldo
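
A minimal sketch of the accounting this patch describes, for readers
following the build error above. Everything prefixed DEMO_ or demo_ is
hypothetical and exists only for illustration: the bit values are made
up, the HITM branch of the real code is left out, and the D() helper
merely imitates the token-pasting P() macro from the diff. It shows a
load being counted once in its cache level metric and once more in
ld_peer when the peer flag is set. It also shows why P(SNOOPX, PEER)
fails to compile when the pasted name is not defined in the headers
being used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the PERF_MEM_* bits; values are illustrative only. */
#define DEMO_MEM_LVL_L2      0x1u
#define DEMO_MEM_LVL_L3      0x2u
#define DEMO_MEM_SNOOPX_PEER 0x1u

/*
 * Token-pasting helper in the style of the patch's P() macro.
 * D(SNOOPX, PEER) expands to DEMO_MEM_SNOOPX_PEER, so the build breaks
 * with "undeclared identifier" if that name is missing.
 */
#define D(a, b) DEMO_MEM_##a##_##b

/* Reduced, hypothetical version of struct c2c_stats. */
struct demo_stats {
	uint32_t ld_l2hit;	/* loads that hit L2 */
	uint32_t ld_llchit;	/* loads that hit the last level cache */
	uint32_t ld_peer;	/* loads served by a peer cache */
};

/* Count one load: the cache level metric first, then ld_peer if flagged. */
static void demo_account_load(struct demo_stats *stats,
			      uint64_t lvl, uint64_t snoopx)
{
	assert(stats != NULL);

	if (lvl & D(LVL, L2)) {
		stats->ld_l2hit++;
		if (snoopx & D(SNOOPX, PEER))
			stats->ld_peer++;
	}

	if (lvl & D(LVL, L3)) {
		stats->ld_llchit++;
		if (snoopx & D(SNOOPX, PEER))
			stats->ld_peer++;
	}
}
```

Calling demo_account_load(&s, D(LVL, L2), D(SNOOPX, PEER)) on a zeroed
struct increments both ld_l2hit and ld_peer, which is the "account both
metrics" behaviour described in the commit message.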


Thread overview: 16+ messages
2022-05-18  5:57 [PATCH v3 00/11] perf c2c: Support display for Arm64 Leo Yan
2022-05-18  5:57 ` [PATCH v3 01/11] perf mem: Add stats for store operation with no available memory level Leo Yan
2022-05-18  5:57 ` [PATCH v3 02/11] perf c2c: Add dimensions for 'N/A' metrics of store operation Leo Yan
2022-05-18  5:57 ` [PATCH v3 03/11] perf c2c: Update documentation for store metric 'N/A' Leo Yan
2022-05-18  5:57 ` [PATCH v3 04/11] perf mem: Add statistics for peer snooping Leo Yan
2022-05-23 12:38   ` Arnaldo Carvalho de Melo
2022-05-23 12:46     ` Arnaldo Carvalho de Melo
2022-05-18  5:57 ` [PATCH v3 05/11] perf c2c: Add dimensions for peer load operations Leo Yan
2022-05-18  5:57 ` [PATCH v3 06/11] perf c2c: Use explicit names for display macros Leo Yan
2022-05-18  5:57 ` [PATCH v3 07/11] perf c2c: Rename dimension from 'percent_hitm' to 'percent_costly_snoop' Leo Yan
2022-05-18  5:57 ` [PATCH v3 08/11] perf c2c: Refactor node header Leo Yan
2022-05-18  5:57 ` [PATCH v3 09/11] perf c2c: Sort on peer snooping for load operations Leo Yan
2022-05-18  5:57 ` [PATCH v3 10/11] perf c2c: Update documentation for new display option 'peer' Leo Yan
2022-05-18  5:57 ` [PATCH v3 11/11] perf c2c: Use 'peer' as default display for Arm64 Leo Yan
2022-05-23  8:43 ` [PATCH v3 00/11] perf c2c: Support " Jiri Olsa
2022-05-23 12:43   ` Arnaldo Carvalho de Melo
